A Grid-based Architecture for LIDAR Data Interpolation

SU Lin(a), TAO Jinhua(a), LI Shukai(a)

(a) Institute of Remote Sensing Applications/CAS, Demonstration Center for Spaceborne Remote Sensing/CNSA, State Key Laboratory of Remote Sensing Science, Beijing 100101, China

ABSTRACT
LiDAR data interpolation is an important step in LiDAR data processing. Interpolation algorithms have high time complexity, and as datasets grow it is becoming difficult to handle them with existing centralized processing. Emerging Grid technologies offer a cost-effective solution for scientific problems that involve large datasets and complex analyses. In this paper we present a grid-based architecture that coordinates various resources as a LiDAR data processing platform, using centralized task scheduling and parallel computing. LiDAR data is partitioned into appropriately sized slices whose boundaries overlap with those of neighboring slices. The scheduler sends these slices to the nodes, and computation is sped up by processing them on the nodes in parallel. The interpolation algorithms are exposed through a grid service interface. In addition, it is beneficial to make these capabilities available to the community in a unified framework through a portal, so that scientists can focus on their scientific work and need not be concerned with the implementation of the underlying infrastructure.

Key words: Grid computing; LiDAR; Parallel computing

International Symposium on Photoelectronic Detection and Imaging 2007: Related Technologies and Applications, edited by Liwei Zhou, Proc. of SPIE Vol. 6625, 66250F, (2008) · 0277-786X/08/$18 · doi: 10.1117/12.790773

1. INTRODUCTION
Airborne LiDAR (Light Detection And Ranging), also termed airborne laser scanning (ALS), is one of many laser remote sensing techniques [1]. By measuring the round-trip time ∆t of a laser pulse emitted from the sensor to a reflecting surface and back again, the distance d from the sensor to the surface is determined using the known speed of light c in air: d = c·∆t/2. To measure the topography of the Earth's surface from an aircraft, typically an airplane, the position and attitude of the platform have to be known. These parameters are determined with GPS and an IMU [2]. Through periodic deflection of the emitting direction across the flight path by an oscillating or rotating mirror, combined with the forward motion of the aircraft, a dense cloud of points is sampled on the Earth's surface in the form of a swath. Multiple flight lines are necessary to cover a large area. Because it generates 3D data directly, with high spatial resolution and accuracy, LiDAR data is becoming popular for the reconstruction of digital elevation models [3] and virtual city models [4].

With improvements in LiDAR systems comes an increase in the volume of LiDAR data. LiDAR data interpolation is an important step in LiDAR data processing, and interpolation algorithms have high time complexity. Interpolation of high-point-density LiDAR data now constitutes the bottleneck in the overall process [5]. The interpolation of large datasets, currently performed on a desktop PC with software packages available and familiar to most users, presents a significant challenge at these data volumes. The high point density pushes the computational limits of processing systems and makes grid interpolation difficult for most users, who lack computing and software resources. At present, the popularity and rate of acquisition of LiDAR data far outpace the resources available to users who wish to work with these data [5].

Emerging Grid technologies offer a cost-effective solution for scientific problems that involve large datasets and complex analyses. Grid technologies enable large-scale resource sharing and data management, collaborative and distributed applications, and high-performance computing for solving large-scale computational and data-intensive problems. There are few reports on applying grid computing to LiDAR data processing. Reference 5 describes a scientific workflow approach that coordinates various resources as data analysis pipelines. It presents a three-tier architecture for LiDAR interpolation and analysis, providing high-performance processing of point-intensive datasets using a portal, a scientific workflow engine and Grid technologies. In this paper we present a grid-based architecture that coordinates various resources as a LiDAR data processing platform, using centralized task scheduling and parallel computing.

The rest of this paper is organized as follows. Section 2 gives a brief overview of grid computing technology. Section 3 describes the architecture in terms of its component model, data model and parallel algorithm. We conclude in section 4.
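As a small numeric illustration of the range equation d = c·∆t/2 given at the start of this section, the following Python sketch converts a round-trip time into a distance. The refractive index of air and the sample time are illustrative assumptions, not values from this paper.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s
N_AIR = 1.0003            # assumed refractive index of air (illustrative value)


def lidar_range(round_trip_time_s, c=C_VACUUM / N_AIR):
    """Return the sensor-to-surface distance d = c * dt / 2 in metres."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return c * round_trip_time_s / 2.0


# A pulse that returns after 6.67 microseconds corresponds to roughly 1 km.
d = lidar_range(6.67e-6)
```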

2. GRID COMPUTING
The term grid, coined in the mid-1990s in the academic world, was originally proposed to denote a distributed computing system that would provide computing services on demand, just as conventional power and water grids do. During the last few years, as the technology evolved and the grid concept began to be explored in commercial endeavours, some slight but meaningful changes have been made to its original definition. Today, a widely accepted definition states that a “grid” is a system that “coordinates resources that are not subject to centralized control; using standard, open, general-purpose interfaces and protocols; to deliver non-trivial qualities of service” [6].

OGSA (Open Grid Services Architecture) is the de facto standard software architecture for grid computing systems. The grid service is the most important concept in OGSA: it is an independent computation unit. An OGSA grid service is a potentially transient Web service based on grid protocols and described with WSDL (Web Services Description Language), which gives it a uniform interface and uniform data interchange protocols. OGSA is service oriented because it delivers functionality through loosely coupled, interacting grid services.

3. SYSTEM ARCHITECTURE
For the end user, LiDAR data interpolation involves three main computational steps: querying point cloud datasets, interpolating the data using interpolation algorithms, and visualizing or downloading the results. Below we elaborate on each of the processing steps in detail.


3.1 Software component architecture
Considering the user's requirements and OGSA, we propose the software component architecture shown in figure 1. From top to bottom, its layers are:

Application Layer (Grid Portal)
OGSA Layer
Hosting Environment
Physical Infrastructure

Figure 1: software component architecture diagram

The physical infrastructure layer is the aggregation of hardware that supplies computational power, storage capability or data transfer capability. These resources may run different operating systems, reside in different locations and belong to different organizations, and they are connected to each other directly or indirectly over the Internet. The hosting environment is the aggregation of software above the physical infrastructure, including the operating system, web server, application server and so on. The physical infrastructure and the hosting environment together constitute the computer resources of the grid system. Each computational or storage resource is called a grid node. The OGSA layer is the data processing layer; details are described below. A web-based grid portal serves as the application layer, gathering the user's processing requirements and returning the processing results. It is the front-end user interface.

3.2 Component model
The diagram in Figure 2 illustrates how the main grid components are connected to each other and which roles they play in the proposed solution. The massive amount of LiDAR data is organized in a relational database on the database server to provide a unified structure. The metadata database records the descriptive data of the LiDAR data. After logging in, the user enters processing parameters and submits them to the grid system. These parameters are first sent to the OGSA toolkit, which passes the processing task to the scheduler. The scheduler then schedules the remote computing and storage resources. The subset query returns all points that lie within a bounding box selected by the user on the portal. The query is performed on the database server using standard SQL (Structured Query Language). The returned data is partitioned into slices; the partitioning depends on the number of computational nodes, the computational power of each node and the communication capacity among nodes. Each slice is sent to a grid computational node, where the interpolation step, which is time consuming, is performed. Part of the processing results is returned to the user through the grid portal, and the other part is stored on the database server as permanent data, so as to minimize the processing time when the same operation is requested again. The user can choose to visualize or download the results from the portal.
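The bounding-box subset query described above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the database server, and the table name laser_point_demo, the sample rows and the function name are hypothetical.

```python
import sqlite3


def query_bounding_box(conn, table, xmin, ymin, xmax, ymax):
    """Return (x, y, z, intensity) rows that lie inside the bounding box."""
    # Table names cannot be bound as SQL parameters, so validate the identifier.
    if not table.isidentifier():
        raise ValueError("invalid table name")
    sql = (
        f"SELECT x, y, z, intensity FROM {table} "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY id"
    )
    return conn.execute(sql, (xmin, xmax, ymin, ymax)).fetchall()


# In-memory stand-in for the LiDAR database (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE laser_point_demo ("
    "id INTEGER PRIMARY KEY, x REAL, y REAL, z REAL, intensity REAL)"
)
conn.executemany(
    "INSERT INTO laser_point_demo (x, y, z, intensity) VALUES (?, ?, ?, ?)",
    [
        (1.0, 1.0, 100.0, 0.5),
        (5.0, 9.0, 101.0, 0.7),
        (12.0, 3.0, 99.0, 0.2),   # outside the box in x
        (10.0, 10.0, 102.0, 0.9),  # on the box corner, included by BETWEEN
        (-1.0, 4.0, 98.0, 0.1),   # outside the box in x
    ],
)

subset = query_bounding_box(conn, "laser_point_demo", 0.0, 0.0, 10.0, 10.0)
```

On a production database the same predicate would normally be backed by an index on x and y so that the query does not scan the whole point table.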


Figure 2: Component model diagram

3.3 Data model
The important part of the data model is shown in figure 3.

Figure 3: LiDAR database model diagram

There are three important tables in the data model. Table “scanning area” is the metadata table; it records the name, boundary and coordinates of the scanning area, together with the name of the table that stores its laser points and the name of the table that stores its TIN. Each scanning area has one record in this table. Table “laser point” records the x, y, z and intensity of a laser point; each laser point has one record in this table. Table “TIN” records the Triangulated Irregular Network generated from the laser points. In order to save storage space, every scanning area has one table for all of its laser points and one table for the TIN of that area. Field id is the primary key in all three tables. All tables are managed in the Oracle DBMS (Database Management System) and accessed with standard SQL.

3.4 Computation considerations
The goal of grid computing is to find a minimal execution time, the makespan. Finding the minimal makespan has been shown to be an NP-complete (nondeterministic polynomial-time complete) problem [5]. The grid provides the ability to run applications across a heterogeneous, geographically dispersed set of computers. Rather than running on a single computer, the application can take advantage of the larger set of resources in the grid. LiDAR data processing is both computation intensive and data intensive, so we consider it from two aspects: computation and data.

In scheduling, the most important step in grid-enabling an application is to determine whether the calculations can be done in parallel. Not all problems can be converted into parallel calculations. Designing a parallel algorithm is very difficult if each computation depends on the prior result. Reducing the dependency on prior computations is therefore one of the basic rules in designing parallel algorithms.

When considering applications that may be split into multiple parts for parallel execution on a grid, the amount of data that must be sent to the node performing a calculation, and the time required to send it, must be taken into account. The ideal case is an application that can be split into small work units that require little input data and produce small amounts of output data. Sending this data, along with the executable file, to the grid node doing the work is part of the function of most grid systems. Reducing the amount of transferred data and balancing the load across nodes is another basic rule of designing parallel algorithms. In LiDAR data processing, however, large amounts of input and output data are involved, which can cause complications and inefficiencies. The parallel grid interpolation algorithm is presented in figure 4.

[Figure 4 flow: LiDAR point cloud and meta database → Segment point cloud → Transmit data to each computational node → Create TIN, then Grid interpolation (on each node, in parallel) → Combine results from each node → LiDAR database]

Figure 4: grid interpolation parallel processing diagram
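To make the makespan objective from the computation considerations concrete, the sketch below assigns work units to nodes of different speeds with a greedy longest-task-first heuristic. This heuristic is a common approximation chosen here for illustration; it is not the scheduler of the proposed system, and the function name, task costs and node speeds are hypothetical.

```python
def lpt_schedule(task_costs, node_speeds):
    """Greedy makespan heuristic for nodes with different speeds.

    Tasks are taken in order of decreasing cost, and each one is given to
    the node on which it would finish earliest. Returns the per-node task
    lists and the resulting makespan. This is a heuristic and does not
    guarantee the minimal makespan.
    """
    if not node_speeds:
        raise ValueError("at least one node is required")
    finish = [0.0] * len(node_speeds)       # current finish time of each node
    assignment = [[] for _ in node_speeds]  # task ids assigned to each node
    order = sorted(range(len(task_costs)), key=lambda i: -task_costs[i])
    for task_id in order:
        best = min(
            range(len(node_speeds)),
            key=lambda n: finish[n] + task_costs[task_id] / node_speeds[n],
        )
        finish[best] += task_costs[task_id] / node_speeds[best]
        assignment[best].append(task_id)
    return assignment, max(finish)


# Five work units of different cost, one fast node (speed 2) and one slow node.
assignment, makespan = lpt_schedule([8, 7, 6, 5, 4], [2.0, 1.0])
```

For this input the heuristic yields a makespan of 11 time units, while the total work divided by the total speed gives a lower bound of 10, which shows the gap a heuristic may leave.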

There are many algorithms for creating a grid digital elevation model (DEM) from LiDAR data, but interpolating the grid DEM from a TIN is generally considered the better one in terms of precision and efficiency. The steps in figure 4 are as follows.

• Segment point cloud: The LiDAR point cloud is split into subsets. The subsets differ in size because the nodes differ in computing power and the links among nodes differ in data transfer capacity. The boundaries of neighboring subsets overlap. The metadata is accessed in this step.

• Transfer data to each computational node: Because not all nodes have large storage capacity, some computational nodes have no local data storage, so the subset data must be transferred to the computational nodes. It is important to arrange for subjobs to access the nearest copy of the data according to the network configuration, so that the network does not become the bottleneck for such a grid application.

• Create TIN and grid interpolation: These are the most time-consuming steps in DEM creation. Taken as a whole, the two steps execute in parallel across nodes, while within each node they execute serially. The overall execution time is thereby reduced.

• Combine results of each node: The result from each node is one part of the whole; combining the partial results forms the complete result. Boundary processing is the primary work in this step.
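For the grid interpolation step, one common way to obtain a grid value from a TIN is linear interpolation inside the triangle that contains the grid node, using barycentric weights. The paper does not state which interpolation formula is used, so the following Python sketch is an assumption made for illustration; the function name and sample triangle are hypothetical.

```python
def interpolate_in_triangle(p, a, b, c, eps=1e-12):
    """Linearly interpolate z at p = (x, y) inside triangle a, b, c.

    Each vertex is an (x, y, z) tuple. Returns None when p lies outside
    the triangle, so a caller can try the next TIN facet.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights of p with respect to the three vertices.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -eps:
        return None  # p is outside this triangle
    return w1 * z1 + w2 * z2 + w3 * z3


# The three vertices lie on the plane z = 10 + x + 2y.
tri = ((0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0))
z_inside = interpolate_in_triangle((2.0, 3.0), *tri)
z_outside = interpolate_in_triangle((9.0, 9.0), *tri)
```

Because the interpolation is linear, the value at (2, 3) agrees with the plane through the three vertices, and a query point outside the facet is reported as None rather than extrapolated.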

In conclusion, part of the interpolation algorithm can be performed in parallel, while other parts, such as segmenting the point cloud and combining the results, cannot. Point cloud interpolation is thus a mix of independent and dependent computations.
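The split between serial and parallel parts can be sketched as below. This is a simplified illustration under several assumptions: strips are cut along x only, a thread pool stands in for the grid nodes, and a local mean of z stands in for TIN creation and grid interpolation. All names and constants are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

CELL = 1.0    # grid spacing along x (assumed)
RADIUS = 1.5  # search radius of the stand-in interpolator (assumed)


def segment(points, weights, overlap):
    """Serial step: cut x-strips with core widths proportional to the node
    weights; each strip also receives the points within `overlap` of its core."""
    xs = [p[0] for p in points]
    xmin, xmax = min(xs), max(xs)
    total = float(sum(weights))
    tasks, left = [], xmin
    for i, w in enumerate(weights):
        last = i == len(weights) - 1
        right = xmax if last else left + (xmax - xmin) * w / total
        pts = [p for p in points if left - overlap <= p[0] <= right + overlap]
        tasks.append((left, right, last, pts))
        left = right
    return tasks


def interpolate_slice(task):
    """Parallel step (stand-in): mean z of the points near each grid cell
    whose centre lies in the strip core."""
    left, right, last, pts = task
    cells = {}
    for ix in range(math.ceil(left / CELL), math.floor(right / CELL) + 1):
        xc = ix * CELL
        if xc == right and not last:
            continue  # a cell on the shared boundary belongs to the next strip
        near = [z for (x, _y, z) in pts if abs(x - xc) <= RADIUS]
        if near:
            cells[ix] = sum(near) / len(near)
    return cells


def run(points, weights, overlap=2.0):
    """Segment serially, interpolate in parallel, combine serially."""
    tasks = segment(points, weights, overlap)
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        partials = list(pool.map(interpolate_slice, tasks))
    dem = {}
    for part in partials:
        dem.update(part)
    return dem


points = [(float(i), 0.0, 100.0 + (i * 7 % 5)) for i in range(13)]
dem_parallel = run(points, [2, 1, 1])  # three nodes, the first twice as fast
dem_serial = run(points, [1])          # single node for comparison
```

Since the overlap is at least as large as the search radius, every boundary cell sees the same neighbouring points as in the single-node run, which is why the two results can be compared cell by cell.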

4. CONCLUSION
In this paper we have described a grid-based architecture for LiDAR data processing. It makes it possible to share LiDAR data and processing over the Internet, and it improves the overall performance of point interpolation. Because there are few references in this domain, much research remains to be done on advanced computing for LiDAR data processing. We plan to extend this work to the whole LiDAR data processing chain and to improve the overall performance with more advanced processing methods.

REFERENCES
[1] Measures, R. M., Laser remote sensing: Fundamentals and applications, Malabar, Fla.: Krieger Pub., 1992.
[2] Ip, A., El-Sheimy, Performance analysis of integrated sensor orientation, International Archives of Photogrammetry and Remote Sensing, Istanbul, Turkey, 2004.
[3] Sithole, G., Experimental comparison of filter algorithms for bare-Earth extraction from airborne laser scanning point clouds, ISPRS Journal of Photogrammetry and Remote Sensing, 2004.
[4] Kaartinen, H., Accuracy of 3D city models: EuroSDR comparison, International Archives of Photogrammetry and Remote Sensing, Enschede, The Netherlands, 2005.
[5] V.N. Alexandrov et al., A Three Tier Architecture for LiDAR Interpolation and Analysis, ICCS, 2006.
[6] IBM, Grid computing in research and education, 2005.
