Ontology-Based Geospatial Web Query System
Nancy Wiegand and Naijun Zhou
University of Wisconsin–Madison

1. Introduction
This abstract describes our system, which provides DBMS-type querying over heterogeneous geospatial data sets distributed over the Web. Our work contributes to the Semantic Web and to geospatial interoperability. It is carried out in the context of a proposed statewide land information system, which will consist of both spatial and nonspatial data that remain on local servers. One of the major obstacles in querying distributed geospatial data is semantic heterogeneity. We focus on resolving semantic heterogeneity at the value level, to accommodate distributed data sets that have conceptually similar attributes whose values are drawn from diverse domains. We describe our query system, including a dynamic approach for ontology integration between a central ontology and local domains. Our system accommodates geospatial data, and we propose further designs for full spatial querying.

Our work extends XML Web-based querying to heterogeneous and geospatial data. Research in Database Management Systems (DBMSs) has evolved from centralized DBMSs to querying XML Web data sources. However, most DBMS research on Web querying has focused on nonspatial data. Historically, DBMS research has proceeded separately from research in GIScience, although the two technologies have eventually been combined [e.g., SC03, RSV02]. While the DBMS community is now working to extend querying over distributed Web sources, the spatial community is actively pursuing spatial interoperability [Bis98, GEF+99]. The OpenGIS Consortium has proposed the Geography Markup Language (GML) [CCL+01] as a standard means to express spatial data, and spatial ontologies are being proposed [e.g., FEA+02].
Our work aims to integrate current research on DBMS querying over the Web with spatial data needs and present a design to incorporate spatial data processing.

We also propose methods of semantic integration, which is needed for the thematic elements of spatial data. When local jurisdictions create data sets independently, the results are highly heterogeneous. To query over regional and larger areas, these data need to be integrated. For example, Smart Growth legislation requires comprehensive analyses over disparate local data for land use planning. We present a solution that automatically resolves semantic heterogeneity over attributes having hierarchical domains. Our work contributes to querying over the Semantic Web. This paper presents our system architecture and ontology solution and discusses our current and future work on spatial elements. Our system is built on top of a general-purpose XML Internet DBMS, and our methods are generalizable to other application areas and domains.

2. XML Technologies

2.1 XML Data Representation
The semantics carried by XML element tags are what make it possible to query data distributed over the Web. For example, Figure 1 shows a parcel record from ArcView files marked up in XML. For a feature-based approach, we combined the nonspatial data from the .dbf file with the spatial coordinates from the .shp file. GML also uses a feature-based approach, although it has additional spatial elements.

[Figure 1 not reproduced: the XML markup of the parcel record was lost in extraction, leaving only scattered attribute and coordinate values.]

Figure 1. Example Data in XML
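The feature-based combination of .dbf attributes and .shp coordinates described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the system's code: because the Figure 1 markup did not survive, every element name except extent_of (which the paper uses for its computed bounding box) is hypothetical, and reading the .dbf and .shp files themselves is left to a shapefile library, so the function takes an already-loaded attribute row and polygon ring.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def bounding_box(coords: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def parcel_to_xml(attributes: Dict[str, object], coords: List[Point]) -> ET.Element:
    """Build one XML feature from a .dbf attribute row and a .shp polygon ring.

    Hypothetical element names: landparcel, polygon, coordinates, and the
    min_x/min_y/max_x/max_y children. Only extent_of comes from the paper.
    """
    feature = ET.Element("landparcel")
    # Nonspatial part: one child element per .dbf field (names must be valid XML names).
    for field, value in attributes.items():
        ET.SubElement(feature, field).text = str(value)
    # Spatial part: the ring as a space-separated list of "x,y" pairs.
    polygon = ET.SubElement(feature, "polygon")
    ET.SubElement(polygon, "coordinates").text = " ".join(f"{x},{y}" for x, y in coords)
    # Precomputed bounding box, stored as the extent_of element.
    extent = ET.SubElement(feature, "extent_of")
    for tag, value in zip(("min_x", "min_y", "max_x", "max_y"), bounding_box(coords)):
        ET.SubElement(extent, tag).text = str(value)
    return feature


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    row = {"parcel_id": "P-001", "owner": "Example"}
    ring = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0), (0.0, 0.0)]
    print(ET.tostring(parcel_to_xml(row, ring), encoding="unicode"))
```

Storing the extent next to the coordinates lets a query reject most features with four comparisons before any exact geometric test is run.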

2.2 XML DBMSs
Native XML and XML-enabled DBMSs are being developed [CRZ03]. XML Internet DBMSs process queries over distributed XML data on the Web. These systems are more powerful than HTML search engines, which only search on keywords and return URLs. Niagara [NDM+01], for example, processes text-in-context queries and returns actual query results. However, Niagara has no semantic integration facilities, nor does it process spatial data. We modify the system to add these capabilities.

2.3 XML Query Languages
XML query languages such as XQuery [BCF+03] will eventually be able to process spatial data using user-defined functions. For example, if a Point_In_Polygon function were defined, the following XQuery would find the feature element that contains a specified point [Cha01, Wie02]:

    (: Point_In_Polygon is an assumed user-defined function; $point_location is the query point :)
    for $b in doc("EauClaire.xml")//*
    where local:Point_In_Polygon($point_location, $b/extent_of)
    return $b

Figure 2. Spatial Function in XQuery

Notably, many types of spatial queries can already be processed over data represented according to the full GML specification using the element tags alone. For example, to find all polygon parcel features that contain inner boundaries (i.e., holes), one can write in XQuery/XPath:

    doc("EauClaire.xml")//landparcel[polygon/innerBoundaryIs]
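The paper treats Point_In_Polygon as a hypothetical user-defined function and does not give its body. The Python sketch below is one plausible reading, not the authors' implementation: a cheap test against a precomputed bounding box (the role extent_of plays in Figure 2), followed by an exact even-odd ray-casting test that also excludes points falling inside inner boundaries, i.e., the holes that the innerBoundaryIs query locates. The feature layout (a dict with extent_of, outer, and optional holes keys) is an assumption made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]
Extent = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def point_in_extent(p: Point, extent: Extent) -> bool:
    """Cheap bounding-box test, the role a precomputed extent_of can play."""
    min_x, min_y, max_x, max_y = extent
    return min_x <= p[0] <= max_x and min_y <= p[1] <= max_y


def point_in_ring(p: Point, ring: Ring) -> bool:
    """Even-odd ray casting: toggle on each edge crossed by a ray from p toward +x."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed;
        # this also guarantees y1 != y2 in the division below.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(p: Point, outer: Ring, holes: Sequence[Ring] = ()) -> bool:
    """Inside the outer boundary and outside every inner boundary (hole)."""
    return point_in_ring(p, outer) and not any(point_in_ring(p, h) for h in holes)


def features_containing(p: Point, features: List[Dict]) -> List[Dict]:
    """Filter on the stored extent first, then apply the exact test."""
    return [
        f for f in features
        if point_in_extent(p, f["extent_of"])
        and point_in_polygon(p, f["outer"], f.get("holes", ()))
    ]


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    parcels = [{"id": "A", "extent_of": (0, 0, 10, 10), "outer": square, "holes": [hole]}]
    print([f["id"] for f in features_containing((2, 2), parcels)])  # ['A']
    print([f["id"] for f in features_containing((5, 5), parcels)])  # [] (inside the hole)
```

Points lying exactly on an edge may be classified either way by this simple rule; a production implementation would need an explicit boundary policy.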

3. Ontology Approach
Ontologies have been proposed as a solution for semantic integration [Fen01, FEA+02, TW02]. One of our prior solutions creates theme-level and value-level ontologies in XML and uses semi-automatic methods to create mapping tables between the ontologies and local schemas [CR03, CRS+02, WZC+03]. This method yields the most accurate and precise mappings between values and can be used to compare values at any level of subcategories. However, it requires advance intervention by local domain experts, and the mappings must be stored. We therefore developed another approach to ontology mapping that uses a machine learning technique and enables dynamic, automatic mappings for hierarchical domains, that is, domains whose categories can be defined by their subcategories. This approach suits land use coding systems, which often have main categories (e.g., agriculture or residential) that contain defining subcategories. Most ontology mapping techniques are based on string comparisons using a dictionary (e.g., WordNet [RS95]) or on logical reasoning [HT01]. Our approach needs neither an additional dictionary nor a description language; instead, we use variations of a Naïve Bayes classifier.

4. System Architecture
Our full system architecture is presented in Figure 3. We add an ontology system and other capabilities to Niagara.

4.1 Overview
We require an XML metadata file that contains jurisdiction type, jurisdiction name, and theme as the minimal metadata to identify each distributed data source. Such information can be drawn from FGDC metadata files represented in XML or can be created as new XML files. Base source data are indexed by Niagara; we additionally index and store metadata information in separate metadata indexes. A user query first enters our Ontology System (Figure 3). Queries are rewritten to local terms either by consulting the pre-stored ontology mappings or by using dynamic algorithms. For the latter, although multiple mappings may be highly relevant, we currently choose only the highest-ranked match. Subqueries are then sent to the Niagara Query Engine for execution. We post-process query results to provide statistical aggregations and present a spatial display using MapObjects.

5. Spatial Data
Our initial spatial data representation, shown in Figure 1, can be considered a type of GML.
So far, we consider polygon data with x, y coordinates that are listed as a subelement of the feature. These coordinates, along with our calculated bounding-box element (extent_of), can be used for spatial processing either directly or through user-defined functions in a query language, as shown in Figure 2.

[Figure 3 diagram not reproduced; its labeled components include data nodes, a GML data crawler, XML data indices and metadata indices, separate spatial and nonspatial data indexes keyed by ID, the Ontology System (GeoSpace query rewrite and ontology mappings, or agreements), Niagara's search engine and query engine, result aggregation, and spatial display.]

Figure 3. System Architecture

Full spatial and nonspatial querying can be accommodated in our Web query system (Figure 4). We assume that the distributed spatial data will be in some form of GML. To facilitate full spatial processing, spatial and nonspatial data would be indexed separately, and queries would access the appropriate indexes. An ID is needed to relate the nonspatial and spatial data at the feature level.

[Figure 4 diagram not reproduced; its labels include GML data, R+ tree bounding-box indexing, ID indexing, and spatial querying.]

Figure 4. GML Spatial Data Indexing and Querying

Acknowledgement
This work was partially supported by the Digital Government Program of NSF, Grant No. 091489.

References
[Bis98] Y. Bishr, “Overcoming the Semantic and Other Barriers to GIS Interoperability”, IJGIS, 12(4), 1998, pp. 299-314.
[BCF+03] S. Boag, D. Chamberlin, M. F. Fernandez, D. Florescu, J. Robie, and J. Siméon (Editors), “XQuery 1.0: An XML Query Language”, W3C Working Draft, 2 May 2003.
[CCL+01] S. Cox, A. Cuthbert, R. Lake, and R. Martell (Editors), “Geography Markup Language (GML) 2.0”, OpenGIS Implementation Specification, 20 February 2001.
[Cha01] D. Chamberlin, email consultation, 2001.
[CR03] I.F. Cruz and A. Rajendran, “Semantic Data Integration in Hierarchical Domains”, IEEE Intelligent Systems, 18(2), March-April 2003, pp. 66-73.
[CRS+02] I.F. Cruz, A. Rajendran, W. Sunna, and N. Wiegand, “Handling Semantic Heterogeneities Using Declarative Agreements”, Proc. ACM GIS, 2002, pp. 168-174.
[CRZ03] A. Chaudhri, A. Rashid, and R. Zicari (Editors), XML Data Management: Native XML and XML-Enabled Database Systems, Addison-Wesley, 2003.
[Fen01] D. Fensel, Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce, Springer-Verlag, Berlin, 2001.
[FEA+02] F. Fonseca, M. Egenhofer, P. Agouris, and G. Camara, “Using Ontologies for Integrated Geographic Information Systems”, Transactions in GIS, 6(3), 2002, pp. 231-257.
[GEF+99] M. Goodchild, M. Egenhofer, R. Fegeas, and C. Kottman, Interoperating Geographic Information Systems, Kluwer Academic Publishers, Boston, 1999.
[HT01] F. Hakimpour and S. Timpf, “Using Ontologies for Resolution of Semantic Heterogeneity in GIS”, 4th AGILE Conference on Geographic Information Science, April 19-21, 2001.
[NDM+01] J. Naughton, D. DeWitt, D. Maier, and others, “The Niagara Internet Query System”, IEEE Data Engineering Bulletin, 24(2), 2001, pp. 27-33.
[RSV02] P. Rigaux, M. Scholl, and A. Voisard, Spatial Databases with Application to GIS, Morgan Kaufmann, 2002.
[RS95] R. Richardson and A.F. Smeaton, “Using WordNet in a Knowledge-Based Approach to Information Retrieval”, Working Paper CA-0395, School of Computer Applications, Dublin City University, Ireland, 1995.
[SC03] S. Shekhar and S. Chawla, Spatial Databases: A Tour, Prentice Hall, 2003.
[TW02] A. Theobald and G. Weikum, “The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking”, Proc. EDBT 2002, pp. 477-495.
[Wie02] N. Wiegand, “Investigating XQuery for Querying Across Database Object Types”, SIGMOD Record, 31(2), June 2002, pp. 28-33.
[WZC+03] N. Wiegand, N. Zhou, and I.F. Cruz, “A Web Query System for Heterogeneous Geospatial Data”, Proc. Scientific and Statistical Database Management (SSDBM), July 2003, pp. 262-265.
