Design of a Geotechnical Information Architecture Using Web Services∗
Roger Zimmermann1, J.P. Bardet2, Wei-Shinn Ku1, Jianping Hu2, Jennifer Swift2
1 Department of Computer Science, University of Southern California, Los Angeles, CA 90089, {rzimmerm, wku}@usc.edu
2 Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA 90089, {bardet, jianhu, jswift}@usc.edu
∗ This research is supported by the National Science Foundation grants ITR-0219463 and EEC-9529152.
Abstract
We present an information architecture using Web services for exchanging and utilizing geotechnical information, which is of critical interest to a large number of municipal, state and federal agencies as well as private enterprises involved with civil infrastructures. For example, in the case of soil liquefaction hazard assessment, insurance companies rely on the availability of geotechnical data for evaluating potential earthquake risks and the consequent insurance premiums. The exchange of geotechnical information is currently hampered by the lack of a common data format and service infrastructure. We propose an infrastructure of Web services which handles geotechnical data via an XML format. We report on the design and some initial experiences.
1 Introduction
Geotechnical information on soil deposits is critical for our civil infrastructure. Local, state and federal agencies, universities, and companies need this information for a variety of civil engineering and urban policy applications, including land usage and development, and mapping of natural hazards such as soil liquefaction and earthquake ground motions. Geotechnical boreholes, a foremost example of geotechnical information, are vertical holes drilled in the ground to obtain samples of soil and rock materials and to determine the stratigraphy, groundwater conditions and/or engineering soil properties [6]. In spite of rather costly drilling operations, boreholes remain the most popular and economical means to obtain subsurface information. These types of data range from basic borehole logs containing a visual inspection report of soil cuttings to sophisticated composite boreholes combining visual inspection with in-situ, laboratory geotechnical and geophysical tests. Figure 1 shows an example transcript of the Standard Penetration Test (SPT), a particular type of geotechnical borehole test.
Significant amounts of geotechnical borehole data are generated in the field from engineering projects each year. The data and results of boreholes are usually published and released as hardcopy reports, without the digital data from the field test. Naturally, sharing data via the traditional hardcopy reports is slow and results in errors when the data is converted back to digital form for further processing. With the recent ubiquity of communication networks, particularly the Internet, the trend toward electronic storage and exchange of geotechnical borehole data has accelerated. So far these efforts have often been uncoordinated.
Geotechnical borehole data is complex in that it contains both well-structured and semi-structured elements. In Figure 1, for example, the Summary field contains free-form text, while some of the other columns are well defined. Storage and exchange therefore require an efficient data format that can accommodate the diversity of geotechnical borehole data. Currently, there is no accepted common format for exchanging these data among researchers and practitioners. To date, the most commonly used formats for representing geotechnical borehole data include the Association of Geotechnical and Geoenvironmental Specialists (AGS) format [1], the Log ASCII Standard (LAS) [5] for well logging in petroleum engineering, and the National Geotechnical Experimental Site (NGES) format [4], which adopts the AGS data dictionary and expands it to cover more research-oriented geotechnical tests. Another research project was initiated at the US Army Engineer Waterways Experiment Station [9] to establish a standard electronic data format for geotechnical and geological exploration in order to automate data interchange. In addition, the Pacific Earthquake Engineering Research Center at Berkeley (PEER) and the Consortium of Organizations for Strong-Motion Observation Systems (COSMOS) commenced a project to create a geotechnical data dictionary and exchange format specifically for the digital dissemination of geotechnical data [8]. Recently, we have proposed the use of XML as the preferred format for storage and exchange of borehole data [3].
XML offers many advantages over other data formats for borehole data. Its tree structure and flexible syntax are well suited for describing constantly evolving borehole data. However, in addition to new data formats for borehole data, there is much research to be done to improve the exchange and utilization of geotechnical information over the Internet. There is a need for an infrastructure for sharing and manipulating this valuable information. This paper describes our efforts to design and implement such an infrastructure based on an extensible set of Web services. The services that are initially required are (1) file storage and (2) query & exchange. With these facilities practitioners in the field can directly store newly acquired data in a repository, while data customers are able to access these data sets in a uniform way. Beyond these two basic services we have also designed a (3) visualization capability that allows an automated conversion of XML geotechnical data into a graphical view similar to the traditional hardcopy format. The output is presented as Scalable Vector Graphics (SVG).
The rest of this paper is organized as follows. Section 2 introduces the overall architecture of our system. We continue in Section 3 with a detailed description of the services. Conclusions and future research directions are contained in Section 4.
Figure 1. Example of a boring log showing stratigraphy, physical sampling record and Standard Penetration Test (SPT) blow counts.
2 Geotechnical Data Management and Exchange (GDME)
Our proposed architecture for the Geotechnical Data Management and Exchange (GDME) system makes extensive use of Web services to provide access to these valuable data sets for expert practitioners as well as the general public. We will now describe each component of the system in detail.
2.1 GDME Overview
Figure 2 illustrates the architecture of the GDME system. Multiple, distributed geotechnical data archives are accessible via the Internet. These repositories are under different administrative control; for example, the US Geological Survey provides some of its information to the public. We distinguish two types of archives on the basis of what data access they allow: read-write (RW) or read-only (RO). RW archives host three geotechnical Web services, an XSLT (XSL Transformations) processor, and a database (with XML and spatial data query support). The three Web services that are implemented within the application server provide the interface for distributed applications to store (File Storage Web Service, FSWS), query and retrieve (Query & Exchange Web Service, QEWS) and visualize (Visualization Web Service, VWS) the geotechnical information. RO archives implement only the QEWS Web service to access their database. The services use the Simple Object Access Protocol (SOAP) as the communication protocol between the server and clients. A complete Web Service Description Language (WSDL) file of the system provides two pieces of information: an application-level service description (abstract interface), and the specific protocol-dependent details that users must follow to access the service at a specified concrete service endpoint. These services will be registered with a Universal Description, Discovery and Integration (UDDI) server, from which users can retrieve information about them over the Internet. The client program can use its service proxy (static binding) to request these services from the GDME system.
Figure 2. The proposed Geotechnical Data Management and Exchange architecture composed of multiple, distributed data archives. Some archives are read-only while others allow reading and writing. Web services provide the programmatic interface to the data. [Diagram labels: Geotechnical User Application Program; Network; Archive (Read/Write Access) with FSWS, QEWS and VWS; Archive (Read-Only Access) with QEWS; Temporary Storage; Database with XML Extension and Spatial Extension; XSLT Transform; DTD, XML Data; Queries, XML Data; SVG; Map Service (Microsoft TerraServer .NET).]
2.1.1 GDME Web Services Functionality
Geotechnical data may be added to an archive in two different ways. First, an onsite database administrator may insert the data directly into the local database. In this case insert operations are not available via Web services and we consider this archive read-only within the GDME architecture. Second, geotechnical data may be uploaded into a database via the file storage Web service. Hence the repository is considered read-write. Any uploaded XML file is first placed in a temporary space where it is validated against the Document Type Definition (DTD) for geotechnical data sets. If the file is accepted, the data is then stored in the main database.
The stored data can be queried and downloaded via the QEWS interface. Currently we support a number of predefined queries that are further described in Section 3.2. Note that each repository contains a unique set of geotechnical data. So that a user application need not contact each repository directly, we implement a distributed query mechanism that automatically forwards queries to other known archives and collects the results before returning the data to the application. Such forwarding mechanisms can be effective, as demonstrated by the SkyQuery project [7]. At the current stage the query results are complete borehole data sets. In our local repository we have added the DB2 Spatial Extender to support spatial queries on the geotechnical data. For example, users can query boreholes based on their geographical location.
Finally, the visualization Web service, VWS, translates the original geotechnical data files into graphical data images (in SVG format) via an XSLT processor. We use SVG because it is in essence XML and is hence easily compatible with the overall architecture.
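The XML-to-SVG translation performed by the VWS can be illustrated with a small sketch. The system described here uses an XSLT processor for this step; the sketch below instead builds the SVG with Python's standard ElementTree module, purely to show the shape of the transformation. The borehole element and attribute names (`borehole`, `layer`, `top`, `bottom`, `description`) and the drawing scale are assumptions made for illustration and are not taken from the project's DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical borehole document; the element and attribute names are
# illustrative and are not taken from the project's DTD.
BOREHOLE_XML = """
<borehole id="B-1">
  <layer top="0.0" bottom="1.5" description="Fill"/>
  <layer top="1.5" bottom="4.0" description="Silty sand"/>
  <layer top="4.0" bottom="9.0" description="Stiff clay"/>
</borehole>
"""

SVG_NS = "http://www.w3.org/2000/svg"
PIXELS_PER_METER = 40  # assumed drawing scale


def borehole_to_svg(xml_text: str) -> str:
    """Draw each soil layer as a labelled rectangle in a vertical column."""
    borehole = ET.fromstring(xml_text)
    layers = borehole.findall("layer")
    # Assumes at least one layer is present.
    depth = max(float(layer.get("bottom")) for layer in layers)

    ET.register_namespace("", SVG_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": "300",
        "height": str(int(depth * PIXELS_PER_METER)),
    })
    for layer in layers:
        top = float(layer.get("top"))
        bottom = float(layer.get("bottom"))
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": "0", "y": str(top * PIXELS_PER_METER),
            "width": "60", "height": str((bottom - top) * PIXELS_PER_METER),
            "fill": "none", "stroke": "black",
        })
        label = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
            "x": "70", "y": str((top + bottom) / 2 * PIXELS_PER_METER),
        })
        label.text = layer.get("description")
    return ET.tostring(svg, encoding="unicode")


svg_text = borehole_to_svg(BOREHOLE_XML)
```

Because the output is itself XML, it can be parsed, validated and transmitted with the same tooling as the borehole data, which is the compatibility argument made above for choosing SVG.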
3 Geotechnical Web Services Functionality
The three main geotechnical Web services of the system provide a number of specialized methods for programmatic access. These methods follow the W3C standardized datatypes and structures of the SOAP specification to provide an easily usable interface for client program construction. We will now describe the overall functionality of each Web service.
3.1 Geotechnical File Storage Web Service
The geotechnical file storage service provides client programs with an interface to upload their geotechnical XML files into the main database. All files are stored in XML format and at the same time meta-data is extracted and saved. The meta-data includes specific elements of the imported files to facilitate querying (described in Section 3.2). This service is subject to user authentication because at present only registered expert users are allowed to make changes to the database.
We currently utilize the DB2 relational database from IBM with an XML Extender. With DB2 there are two different ways to store XML data. The first one is named the “column” method. It extracts elements of interest from an XML file and stores them in table columns as meta-data. The XML file itself is also stored in its entirety in a column of type XML in the main table. Hence, the meta-data columns can be rapidly accessed when users want to query and retrieve XML information in the database. The second way to store XML data is named the “collection” method, which completely decomposes the elements of an XML file into columns of multiple tables. The “collection” process is bi-directional, which means that the XML files can be regenerated after individual elements have been updated, and that new XML files can even be generated from existing tables. Both the “column” and “collection” methods require Document Access Definition (DAD) files to map XML file structures to database tables and to specify the access and storage methods.
Verification of the Geotechnical XML Files. Two levels of verification are performed on uploaded files. The first level is executed automatically by checking the XML data against a borehole DTD to ensure that the uploaded file is well-formed and valid with respect to the DTD. The DTD defines geotechnical tags that allow the full description of data provided by our partner organizations (California Department of Transportation, California Geological Survey, Los Angeles Department of Water and Power, etc.). If a file does not pass the first level of verification, it is rejected and an error condition is raised. Otherwise the file is stored in a temporary directory and a notification email is sent to an expert in that category of geotechnical files. The expert verifies the content of the uploaded file and decides whether it can be stored in the main database or should be returned to the author for clarification. After passing the two levels of verification, the file is imported into the database and its meta-data is extracted.
Users can download the DTD files from the project website for guidance when constructing their geotechnical XML files. They can also use the verification interface on our website for remote checks.
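The idea behind the “column” storage method can be sketched as follows. This is not the DB2 XML Extender and it uses no DAD file; it substitutes Python's built-in sqlite3 module as a stand-in database, and the tag names (`project`, `site`, `test_type`, `provider`) and table layout are hypothetical. The parsing step only checks well-formedness; it does not perform the DTD validation described above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical borehole file; tag names are illustrative only.
SAMPLE = """<borehole>
  <project>Example Project</project>
  <site>Example Site</site>
  <test_type>SPT</test_type>
  <provider>Example Agency</provider>
</borehole>"""

META_TAGS = ("project", "site", "test_type", "provider")


def store_column_style(conn, xml_text):
    """Store the whole document plus a few extracted meta-data columns."""
    # Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    root = ET.fromstring(xml_text)
    meta = [root.findtext(tag) for tag in META_TAGS]
    cur = conn.execute(
        "INSERT INTO borehole (project, site, test_type, provider, doc) "
        "VALUES (?, ?, ?, ?, ?)",
        (*meta, xml_text),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE borehole ("
    "id INTEGER PRIMARY KEY, project TEXT, site TEXT, "
    "test_type TEXT, provider TEXT, doc TEXT)"
)
row_id = store_column_style(conn, SAMPLE)

# Query on a meta-data column, then return the intact document.
doc = conn.execute(
    "SELECT doc FROM borehole WHERE test_type = ?", ("SPT",)
).fetchone()[0]
```

The point of the design is visible in the final query: the search touches only the small meta-data columns, while the document itself is returned unchanged.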
3.2 Geotechnical Query & Exchange Web Service
The main purpose of the query and exchange Web service is to facilitate the dissemination of these valuable geotechnical data sets and encourage their use in broad and novel applications. Once XML files are uploaded into the main database as described in Section 3.1, they are available for dissemination. A data user can query the geotechnical files in the main database on meta-data attributes, such as the name of the data provider or the geographical location of the borehole, to find the files of interest. The files can then be downloaded with the GetGeoFile method. A graphical version of the data in SVG format can also be obtained (see Section 3.3 for further details).
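As an illustration of what a client request for the GetGeoFile method might look like on the wire, the following sketch builds a SOAP 1.1 envelope with Python's standard library. Only the method name GetGeoFile comes from the text; the service namespace `urn:example:gdme` and the `fileId` parameter are hypothetical, and the actual message layout is determined by the service's WSDL file.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace
GDME_NS = "urn:example:gdme"  # hypothetical service namespace


def build_get_geo_file_request(file_id: str) -> bytes:
    """Return a serialized SOAP envelope that calls GetGeoFile."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{GDME_NS}}}GetGeoFile")
    # "fileId" is an assumed parameter name, used only for illustration.
    ET.SubElement(call, f"{{{GDME_NS}}}fileId").text = file_id
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


request = build_get_geo_file_request("borehole-001")
```

In practice a client would rarely assemble the envelope by hand; a service proxy generated from the WSDL description would produce an equivalent message.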
Predefined Queries of the Data Exchange Web Service. In the current GDME version, we have identified six parameters for extracting particular sets of geotechnical files from the main database; these parameters are also indexed as meta-data. Client programs can use these parameters in the query interface to find the geotechnical files of interest. At this point we restrict the query interface to these six parameters to gain some experience with our architecture and to simplify the distributed processing of queries. Spatial query constructs are supported within the database management system if a spatial extender is available.
1. Project Name: The name of the geotechnical project.
2. Site: The field site where borehole testing has been carried out.
3. Location: A region specified by longitude and latitude.
4. Testing Type: The field investigation type (SPT, CPT, etc.).
5. Project Date: The start date of the project.
6. Data Provider: The organization or agency that produced the data.
Recall that the GDME system is an evolving federation of autonomous and geographically distributed geotechnical databases on the Internet. To facilitate the querying process, we require a client application program to contact only one of the GDME nodes; this node then forwards the query to, and collects results from, the other GDME databases that are part of the federation. Each node maintains a list of other known GDME nodes. Currently there is no automated mechanism for a new node to join the federation and we plan to address this issue in the future. A node recognizes when it receives a query from another GDME node and will not forward the query any further, so as to avoid loops in query processing. The result of a query is a list of the borehole data sets that satisfy the query predicate. Each data set is identified with a link descriptor similar to a URL.
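The forwarding rule described above can be sketched in a few lines. This is an in-memory model rather than the deployed Web service: the `Node` class, the `from_node` flag and the `gdme://` link descriptors are hypothetical, and the query is reduced to a single parameter (the testing type) for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Meta-data records held locally: (link descriptor, testing type).
    records: list = field(default_factory=list)
    # Other federation members known to this node.
    peers: list = field(default_factory=list)

    def query(self, testing_type: str, from_node: bool = False) -> list:
        """Return link descriptors of matching data sets."""
        links = [link for link, kind in self.records if kind == testing_type]
        # A query that arrived from another node is answered locally and
        # never forwarded again, which prevents loops in the federation.
        if not from_node:
            for peer in self.peers:
                links.extend(peer.query(testing_type, from_node=True))
        return links


# Three nodes that all know each other, so the peer graph contains cycles.
a = Node("a", [("gdme://a/bh1", "SPT")])
b = Node("b", [("gdme://b/bh7", "SPT"), ("gdme://b/bh8", "CPT")])
c = Node("c", [("gdme://c/bh3", "CPT")])
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]

result = a.query("SPT")  # ['gdme://a/bh1', 'gdme://b/bh7']
```

Because forwarding is limited to one hop, the contacted node reaches only the peers on its own list; the scheme therefore relies on each node's list covering the federation.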
All the qualifying links are collected from the distributed GDME nodes and returned to the user application to form the result of the query. If the user application decides to perform any processing on the data sets, it can download the data directly from each individual node via the link descriptor.
Using the Web Service of TerraServer. Our query interface allows boreholes to be searched by geographical location. A background map with superimposed borehole locations is a compelling way to present the search results, as illustrated in Fig. 3. However, rather than storing aerial maps in the system, the functionality of the Microsoft TerraServer is enlisted [2]. The TerraServer provides maps (aerial imagery and topographical maps) covering the United States with up to one meter resolution. Furthermore, it provides Web service interface methods to obtain these background maps. Our system then plots the geotechnical borehole locations on top of the map according to their longitude and latitude.
Figure 3. Geotechnical borehole locations superimposed onto an aerial map image (San Fernando Valley, Los Angeles).
3.3 Geotechnical Visualization Web Service
Visualizing geotechnical XML files is very useful for practitioners such as geologists and civil engineers who need to analyze the data. We use the Apache Xalan XSLT processor in combination with Extensible Stylesheet Language (XSL) files for this purpose. The processor translates XML files into SVG descriptions according to a set of XSL rules. After the translation, the SVG data is transmitted to the client and the user may display and analyze it with, for example, a web browser and the Adobe SVG Viewer plug-in (Fig. 4). Data may be visualized not only from the main database, but also from temporary data provided by users who do not have database write privileges. Again, users must construct their geotechnical XML files in accordance with the DTD files that are published by our system. They may then upload the data and use the visualization method to translate the file into the SVG format.
Figure 4. Borehole SVG file that was automatically generated from an XML data file.
4 Conclusions and Future Research Directions
We have presented some ongoing research results on the development of Web services for exchanging and utilizing geotechnical information across a variety of applications critical to our civil infrastructure systems. The proposed Web services are presently being tested in realistic work environments in collaboration with municipal, state, and federal agencies, as well as international partners. It is likely that our first prototype of geotechnical Web services will evolve in different directions depending on the needs of our research collaborators. We anticipate that the concepts developed in this research will help pave the way for the creation of a virtual repository of geotechnical information operated by a consortium of local, state and federal agencies.
Acknowledgements. We would like to thank Yu-Ling Hsueh, Chirag Trivedi, Xiaohui He and Manan Shah for their help with the system implementation.
References
[1] AGS. Electronic Transfer of Geotechnical and Geoenvironmental Data. Association of Geotechnical and Geoenvironmental Specialists, 3rd Edition, United Kingdom, 1999.
[2] T. Barclay, J. Gray, E. Strand, S. Ekblad, and J. Richter. TerraService.NET: An Introduction to Web Services. Technical Report MSR-TR-2002-53, Microsoft Research, June 2002.
[3] J. Bardet, J. Hu, S. Codet, and J. Swift. Data Storage and Exchange Format of Borehole Information. Submitted for publication, 2003.
[4] J. Benoît and A. Lutenegger. National Geotechnical Experimentation Sites, NGES. ASCE Geotechnical Special Publication, (93):408, April 2000.
[5] K. Heslop, J. Karst, S. Prensky, and D. Schmitt. Log ASCII Standard Document #1 – File Structures. Canadian Well Logging Society, Calgary, Alberta, June 2000. Version 3.
[6] R. Hunt. Geotechnical Engineering Investigation Manual. McGraw-Hill, New York, 1984.
[7] T. Malik, A. S. Szalay, T. Budavari, and A. R. Thakar. SkyQuery: A Web Service Approach to Federate Databases. In Proceedings of the First Biennial Conference on Innovative Data Systems Research (CIDR 2003), pages 188–196, Asilomar, CA, January 5-8, 2003.
[8] U. Survey. Archiving and Web Dissemination of Geotechnical Data Website. COSMOS/PEER-LL 2L02, USC. URL http://geoinfo.usc.edu/gvdc/.
[9] US Army Waterways Experiment Station. Standard Data Format for Geotechnical / Geological Exploration, 1998. Project 98.005, URL http://tsc.army.mil/products/geotech/geocover.asp.