Automatic search of geospatial features for disaster and emergency ...

International Journal of Applied Earth Observation and Geoinformation 12 (2010) 409–418


Automatic search of geospatial features for disaster and emergency management

Chuanrong Zhang a,∗, Tian Zhao b, Weidong Li a,c

a Department of Geography and Center for Environmental Sciences and Engineering, University of Connecticut, Storrs, CT 06269, USA
b Department of Computer Science, University of Wisconsin, Milwaukee 53201, USA
c College of Resources and Environment, Huazhong Agricultural University, Wuhan 430070, China

Article info

Article history: Received 26 June 2009; Accepted 20 May 2010

Keywords: WFS (Web Feature Service); Data interoperability; Disaster management; Semantic web; Ontology; Natural language

Abstract

Although the fast development of OGC (Open Geospatial Consortium) WFS (Web Feature Service) technologies has undoubtedly improved the sharing and synchronization of feature-level geospatial information across diverse resources, literature shows that there are still apparent limitations in the current implementation of OGC WFSs. Currently, the implementation of OGC WFSs only emphasizes syntactic data interoperability via standard interfaces and cannot resolve semantic heterogeneity problems in geospatial data sharing. To help emergency responders and disaster managers find new ways of efficiently searching for needed geospatial information at the feature level, this paper aims to propose a framework for automatic search of geospatial features using Geospatial Semantic Web technologies and natural language interfaces. We focus on two major tasks: (1) intelligent geospatial feature retrieval using Geospatial Semantic Web technologies; (2) a natural language interface to a geospatial knowledge base and web feature services over the Semantic Web. Based on the proposed framework we implemented a prototype. Results show that it is practical to directly discover desirable geospatial features from multiple semantically heterogeneous sources using Geospatial Semantic Web technologies and natural language interfaces.

Published by Elsevier B.V.

∗ Corresponding author. Tel.: +1 860 486 3656; fax: +1 860 486 1348. E-mail addresses: [email protected] (C. Zhang), [email protected] (T. Zhao), [email protected] (W. Li). 0303-2434/$ – see front matter. Published by Elsevier B.V. doi:10.1016/j.jag.2010.05.004

1. Introduction

Disaster and emergency management requires instant access to diverse data to make quick decisions and take instantaneous actions. Timely and accurate geographic information from easily accessible databases is fundamental to quick response and emergency service dispatch (Coppock, 1995; Abdalla and Tao, 2005) and can greatly improve the decision-making process, potentially save lives, and aid citizens (Zlatanova and Li, 2008). However, it is often difficult to obtain even basic geographic information in real time from the Internet (Zerger and Smith, 2003). Experience suggests that the real barrier to emergency response and disaster management is not a lack of data (Monmonier and Giordano, 1998; U.S. House Select Bipartisan Committee to Investigate the Preparation for and Response to Hurricanes Katrina and Rita, 2006). There is a huge amount of data stored in different formats in various emergency response communities (Donkervoort et al., 2008; Levinsohn, 2000). The bottlenecks are in most cases the difficulties of intelligently searching and integrating heterogeneous geospatial information (Abdalla et al., 2007; White House, 2006; Mansourian et al., 2006). It is ineffective to use a general Internet search engine to search for geospatial data. Better methods are needed for effective search and sharing of semantically heterogeneous geospatial data (Li et al., 2008; Yang et al., 2008; Wiegand and García, 2007).

Another important issue blocking disaster and emergency management applications from quick acquisition and integration of geospatial data over the web is that most practices have focused on web data sharing at the file level. That is, to share and exchange geospatial information, users must request entire datasets or data files from different data sources via online downloading. There are several problems with file-level data sharing systems. An important one is that file-level data sharing makes it difficult to provide feature-level data search, access, and exchange in real time over the web. By feature level, we mean accessing and exchanging data at the level of an individual feature (e.g. a location represented by a point, a line, or a polygon) rather than at the file level, as is the common practice in data sharing with the implemented SDI (Spatial Data Infrastructure) (e.g. Crompvoets et al., 2004). Disaster and emergency management applications must download a whole data file for analysis, even if they need only several features of the file or are interested in only a small portion of it. Downloading entire datasets or data files increases the time needed for data acquisition and analysis and slows decision-making. Therefore, file-level data sharing is insufficient to meet the demands of disaster and emergency applications that need quick access to the most up-to-date feature-level data.


OGC WFSs have recently become popular for the development of distributed systems at the feature level over the Internet for time-critical applications (OGC, 2005; Zhang and Li, 2005; Zhang et al., 2007). Although the fast development of OGC WFS technologies has undoubtedly improved the sharing and synchronization of feature-level geospatial information across diverse resources, literature shows that there are limitations in the current implementation of OGC WFSs. The existing implementations of OGC WFSs only emphasize syntactic data interoperability via standard interfaces, and cannot resolve semantic heterogeneity problems in geospatial data sharing (OGC, 2006; Lutz and Klien, 2006). With OGC WFSs it is quite difficult to perform an intelligent content-based search, and users cannot correctly utilize the discovered WFSs without additional human assistance or programming. To overcome the aforementioned problems and help emergency responders and disaster managers find new ways of efficiently searching for the needed geospatial information at the feature level, this paper proposes a framework for automatic search of geospatial features using Geospatial Semantic Web technologies and natural language interfaces. Efforts are underway to facilitate geospatial data sharing using ontologies and Geospatial Semantic Web technologies for emergency response and disaster management (e.g. Athanasis et al., 2009; Bakillah et al., 2007; Lan et al., 2008; Klien et al., 2005; Pundt, 2008; Xu and Zlatanova, 2007; Zhang, 2007). This paper focuses on directly searching geospatial data at the feature level using a natural language query for disaster and emergency response applications. A solution and some algorithms were developed to improve automatic geospatial feature data discovery, and a natural language interface for semantic queries was also provided.
The authors of this paper have been working on automatically searching geospatial data at the feature level over the Web for several years and have made some progress. A GML (Geography Markup Language) approach for developing geographical databases to enable geospatial data interoperability at the feature level was first introduced by Zhang et al. (2003). Peng and Zhang (2004) then proposed using GML as a coding and data transporting mechanism to achieve data interoperability, using SVG to display GML data on the Web, and using WFSs as a data query mechanism to access and retrieve geospatial data at the feature level in real time on the web. Later, Zhang and Li (2005) implemented a prototype for emergency applications to query, extract, create, delete, update, and map geographic features stored in OGC simple feature data-stores using OGC WFSs and WMSs (Web Map Services). Although the use of open standards and OGC web service technologies offers the potential to overcome the heterogeneity problems of legacy GIS and to share geospatial data at the feature level, these technologies cannot resolve semantic heterogeneity problems in geospatial data sharing. Thus Zhang et al. (2007) proposed Geospatial Semantic Web technologies that use ontologies to search and access geospatial data and services based on their content instead of keywords in the metadata. To avoid the time-consuming and error-prone task of manually building ontologies, they proposed an approach to develop OWL (Web Ontology Language) ontologies from existing UML data models and developed an algorithm for transforming a UML model into an OWL ontology knowledge base (Zhang et al., 2008). To enable more efficient and powerful discovery and use of geospatial data at the semantic level, Zhao et al. (2008) suggested a method to enable RDF (Resource Description Framework) ontology queries on spatial data from WFS services by rewriting user queries to WFS getFeature requests. Recently, Zhang et al. (2010a,b) further developed geospatial feature discovery algorithms and geospatial web feature service composition algorithms for feature-level geospatial data sharing and proposed a partition-refinement algorithm for heterogeneous ontology integration for developing interoperable Spatial Decision Support Systems (SDSS).

Although these studies all address important pieces of the major challenge, namely automatic search of geospatial data at the feature level over the web, many issues remain to be solved to fully realize this goal. The intention of this paper is to introduce the recent progress made by the authors on this topic: (1) defining spatial rules for automatic derivation of implicit spatial relations and intelligent geospatial feature retrieval; (2) enabling semantic queries in natural language rather than as structured queries, to extend the targeted users from those who are familiar with GIS and Geospatial Semantic Web technologies to the general public and emergency responders.

2. A motivating example

Suppose an earthquake with a magnitude of 6.8 struck Mansfield, CT, USA. The earthquake caused large-scale human injury and destruction of property in the region. To take immediate rescue actions, the Emergency Response Center (ERC) in Connecticut needs accurate, up-to-date data, such as data describing population, physical geography, political boundaries, infrastructure and other aspects of an area, which may come from many sources at federal, state, and local levels. However, the emergency event does not allow enough time to gather these resources. For example, suppose the Department of Environmental Protection in Connecticut keeps Connecticut town boundary layer data on one server and the Department of Transportation in Connecticut holds Connecticut road network data on another server. To obtain the road information of Mansfield, the most common practice is as follows. First, the emergency responders download the entire GIS data files they need from both websites; then they use GIS and take several steps to process these downloaded files. They select the needed Mansfield street features by combining information from the Connecticut road network file and the Connecticut town boundary file using GIS spatial analysis functions. This process requires that the emergency responders have GIS skills and be familiar with GIS software.

The other scenario assumes that a standards-based Spatial Data Infrastructure (SDI) has been developed in Connecticut to facilitate the exchange and sharing of different formats of geospatial data. Under this scenario, the Connecticut town boundary data and road network data are distributed using standard OGC WFSs over the Internet from two servers. Although the OGC standards have been adopted under this scenario, it is still difficult to perform an intelligent query that directly gets the needed street features of Mansfield because of semantic heterogeneity problems. To get the detailed street information of interest for Mansfield, the procedure under this scenario is as follows. First, the emergency responders send the GetCapabilities query to both WFS servers. By manually interpreting the query results, they understand that the Feature Type “ct:Road” line strings and “connecticut:Town” polygons are what they want. Then, they send one query to one WFS server to find out the location of Mansfield. After that, they issue a query to the other WFS server to fetch the instances of street features in Mansfield that they are interested in. With OGC WFSs, although it is possible to query individual geospatial features remotely, it is difficult to implement the above processes automatically and perform intelligent content-based search. The OGC WFS description only allows for the specification of the syntax of basic service contents such as operation metadata, FeatureType list, and filter capabilities, and it provides no semantic description of the meaning of the contents of GetCapabilities, DescribeFeatureType, and GetFeature.
For example, without the human interpretation of the syntax string “ct:MjRoad”, which semantically means “major road network data in Connecticut”, there is no way to automate the process of searching for “major roads in Connecticut” based on the WFS description alone. The emergency responders also cannot utilize the discovered WFSs without additional human assistance or programming. It would be desirable for emergency responders, who usually do not have many GIS skills, to automatically search for these street features from multiple sources by using natural language such as “Find the streets in the town of Mansfield”. In this paper, a geospatial semantic web framework is proposed to facilitate automatic and intelligent search of geospatial features from multiple sources over the web.

Fig. 1. A framework for automatic search of geospatial features.
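The first step of the manual WFS workflow in this section (sending GetCapabilities and reading the advertised feature type names) might be sketched as follows. This is only an illustration: the server URL and the sample capabilities document are hypothetical, and the sketch assumes the WFS 1.1.0 key-value-pair encoding and the `http://www.opengis.net/wfs` namespace. It shows that the response yields bare name strings such as `ct:Road`, which still need human interpretation.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Namespace used by WFS 1.1.0 capabilities documents (assumed version).
WFS_NS = {"wfs": "http://www.opengis.net/wfs"}


def capabilities_url(base_url: str) -> str:
    """Build a GetCapabilities request in key-value-pair form."""
    params = {"service": "WFS", "version": "1.1.0", "request": "GetCapabilities"}
    return base_url + "?" + urlencode(params)


def feature_type_names(capabilities_xml: str) -> List[str]:
    """Extract the advertised feature type names from a capabilities document."""
    root = ET.fromstring(capabilities_xml)
    path = "wfs:FeatureTypeList/wfs:FeatureType/wfs:Name"
    return [name.text for name in root.findall(path, WFS_NS)]


# Hypothetical, heavily trimmed response used only for illustration.
SAMPLE = """<wfs:WFS_Capabilities xmlns:wfs="http://www.opengis.net/wfs" version="1.1.0">
  <wfs:FeatureTypeList>
    <wfs:FeatureType><wfs:Name>ct:Road</wfs:Name></wfs:FeatureType>
    <wfs:FeatureType><wfs:Name>connecticut:Town</wfs:Name></wfs:FeatureType>
  </wfs:FeatureTypeList>
</wfs:WFS_Capabilities>"""

if __name__ == "__main__":
    print(capabilities_url("http://example.org/wfs"))  # hypothetical server
    print(feature_type_names(SAMPLE))
```

Nothing in the parsed names says that one of them denotes roads and the other towns; that is the semantic gap the proposed framework targets.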

3. Methods

3.1. The framework for automatic search of geospatial features

The aim of the framework proposed here is to make the knowledge of heterogeneous geospatial datasets immediately accessible to emergency responders, who may not have much GIS experience. Fig. 1 illustrates the proposed framework for automatic search of geospatial features. The framework is based on a Service-Oriented Architecture, which is composed of service provider, service broker, and service client (Zhang et al., 2007). The service providers must maintain local ontologies to ensure semantic interoperability. A local ontology refers to the semantics used by a data provider. The OGC WFSs are used to publish feature-level data from heterogeneous databases. They are mapped to local OWL (Web Ontology Language) ontologies to provide a semantic-based view of the services, which spans from the abstract description of the capabilities of the services to the actual feature data contents. The ontology server is used to create mappings of equivalent or related classes and properties in the local ontologies. It keeps the taxonomy of geospatial terminologies and maintains the consistency of the different local ontologies. Because OWL is based on Description Logics (DL), we use a DL-based reasoner and inference rules to build a knowledge base for automatic service queries. To provide flexible query capability, we provide a friendly query interface that allows emergency responders to type query questions in easily understandable natural language. The parser layer is used to convert query questions into SPARQL queries. SPARQL is used to access the OWL knowledge base and find geospatial features from multiple WFS sources through the knowledge base.

The main advantages of the framework are: (1) the developed system can recognize and represent the implicit and explicit meaning of the heterogeneous geospatial data content and can achieve data interoperability at the semantic level; (2) the system can understand the semantic meaning of a query, and can convert the natural language query into a WFS-understandable query; (3) the system can give appropriate answers to natural language queries, even if key query terms are not included in geospatial data contents or metadata. With the proposed framework, disaster responders do not have to manually parse their ideas and process heterogeneous GIS databases. They also do not have to download a whole data file for analysis if they need only several features of the file or are interested in only a small area of it. Instead, they can automatically obtain the needed geospatial features over the Internet. The following sections introduce the main technologies applied in the framework.

3.2. The OWL ontology knowledge base

3.2.1. Mapping OGC WFS feature descriptions into local OWL ontologies

OGC WFS descriptions for GetCapabilities, DescribeFeatureType, and GetFeature operations are mapped to local OWL ontologies to provide richer semantic specification. These descriptions contain (1) a variety of GML objects such as coordinate reference systems, geometry, topology, time, and units of measure for describing geospatial features; (2) a collection of geospatial feature types; (3) a set of feature attributes, which can be described as a 〈name, type, value〉 tuple; and (4) a group of geospatial feature instances. These descriptions are mapped to local OWL ontologies using the following mapping rules: (1) OWL classes are mapped to the GML feature types and GML class objects; (2) OWL properties are mapped to feature attributes and relations between features, which correspond to feature associations in ISO 19109; (3) instances of OWL classes are mapped to individual geospatial features. We use a simple linear search algorithm to map these descriptions to OWL ontologies. It operates by checking all elements in OGC WFS description files one at a time in sequence until an element is found. Once an element is found, it is inserted into a queue. Then the algorithm pulls an element from the beginning of the queue and maps the element into OWL ontologies. It processes one element at a time in sequence until the queue is empty and every OGC WFS description has been mapped to OWL ontologies. Once the elements of the OGC WFS descriptions have been mapped to OWL ontologies, a WFS client can distinguish all FeatureTypes and FeatureProperties in GetCapabilities, DescribeFeatureType, and GetFeature operations. The client can also properly interpret the query results of FeatureTypes and FeatureProperties without additional interpretation of the results. Therefore, the client can automatically identify which FeatureTypes and FeatureProperties are required for the kind of search desired, transform those FeatureTypes and FeatureProperties, if necessary, to the appropriate (string) form, and interpret the FeatureTypes and FeatureProperties of a returned GML result.

3.2.2. Using DL (Description Logics) to represent knowledge

The proposed framework uses a DL-based knowledge base for automatic geospatial feature search. To provide means for dealing with spatial descriptions used in OGC WFS operator descriptions, such as equals, disjoint, intersects, touches, contains, and crosses, we develop an extended DL formalism to reason about spatial relations (Zhang et al., 2010b). The extended DL provides modeling constructs that can be used to represent spatial relations as defined roles. The reasoning tasks depend on the extended DL formalism for representing spatial knowledge. Spatial reasoning can be done by deriving spatial relations from the given knowledge, such as the existing Connecticut road network and Connecticut town boundary maps. The popular and well-known set of RCC8 spatial relations for regions (polygons) is adopted in the extended DL formalism (Randell et al., 1992). Because most data in disaster management also involve spatial relations between points and lines, we extended the polygon RCC8 relations to points and lines. The TBox of the knowledge base holds the spatial relation names of the different RCC, point, and line species, structured in a hierarchy of object properties. These names are used to assert selected spatial relations between individual geospatial features in the ABox of the knowledge base. Using these base spatial relations, further spatial knowledge can be expressed as a union of different possible base spatial relations.

The notation of DL used to describe the knowledge base can be directly translated to the OWL-DL syntax. Assuming knowledge about a transit system is to be defined, we give an example to show how to use DL to express the knowledge base.

Example 1. The knowledge “A TransitRoute has at least two TransitStops that are TransitPointFeatureEvent.” is stated in DL as:

TransitRoute ⊑ (≥2 hasTransitStop.TransitPointFeatureEvent)

Using OWL/RDF syntax the above knowledge would be written as:
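One possible RDF/XML serialization of the axiom in Example 1 (TransitRoute ⊑ ≥2 hasTransitStop.TransitPointFeatureEvent) can be generated with the sketch below. It is an assumption-laden illustration rather than the listing used in the prototype: it assumes the OWL 2 qualified cardinality vocabulary (`owl:minQualifiedCardinality`, `owl:onClass`), a hypothetical namespace `http://example.org/transit#`, and a hypothetical helper named `min_cardinality_axiom`.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/transit#"  # hypothetical ontology namespace


def min_cardinality_axiom(cls: str, prop: str, filler: str, n: int) -> str:
    """Serialize 'cls is a subclass of (>= n prop.filler)' as RDF/XML."""
    for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
        ET.register_namespace(prefix, uri)
    root = ET.Element(f"{{{RDF}}}RDF")
    owl_class = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + cls})
    sub = ET.SubElement(owl_class, f"{{{RDFS}}}subClassOf")
    restriction = ET.SubElement(sub, f"{{{OWL}}}Restriction")
    ET.SubElement(restriction, f"{{{OWL}}}onProperty", {f"{{{RDF}}}resource": BASE + prop})
    card = ET.SubElement(
        restriction,
        f"{{{OWL}}}minQualifiedCardinality",
        {f"{{{RDF}}}datatype": XSD + "nonNegativeInteger"},
    )
    card.text = str(n)
    ET.SubElement(restriction, f"{{{OWL}}}onClass", {f"{{{RDF}}}resource": BASE + filler})
    return ET.tostring(root, encoding="unicode")


AXIOM_XML = min_cardinality_axiom(
    "TransitRoute", "hasTransitStop", "TransitPointFeatureEvent", 2
)

if __name__ == "__main__":
    print(AXIOM_XML)
```

The restriction node carries the three parts of the DL expression: the role (`owl:onProperty`), the lower bound, and the filler class (`owl:onClass`).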

Most basic spatial relations can be geometrically computed and asserted in the ABox of an OWL DL knowledge base. With knowledge reasoning, software can detect implicit information in formal conceptual models for geospatial domain objects and reason about the relations in geospatial data; thus the proposed framework can support powerful queries.

3.2.3. Spatial rules for implicit knowledge

Even though, in principle, all spatial relations can be geometrically computed and asserted in the ABox of an OWL DL knowledge base, we do not do so in the proposed framework, because asserting all spatial relations that hold among geospatial features can easily result in a very large knowledge base and thereby decrease the performance of the system. In the proposed framework we therefore keep only a minimal representation of the basic relations in the ABox of an OWL DL knowledge base, and we use spatial rules to infer or calculate the relations that are not represented when they are requested at runtime. We define spatial rules for reasoning over spatial relations between objects in space. These spatial reasoning rules can be used as deduction rules for automatic derivation of implicit spatial relations. We define these rules based on the literature on spatial reasoning (e.g. Sun and Li, 2005). For example, the facts that the town of Mansfield is located inside the state of Connecticut and the state of Connecticut is inside the New England region imply that the town of Mansfield is also inside the New England region. We give some examples of the rules used in the proposed framework. Note that each of the rules below has the form:

conclusion :: condition 1, condition 2, ..., condition n

The semantics of a rule is that if all the conditions are true, then so is the conclusion.

Rule 1: Transitivity of left (right) of, above, behind, inside, east, west, north, northwest, northeast, southwest, and southeast. This rule indicates the transitivity of some of the relations.
Let x denote any relation in {left (right) of, above, behind, inside, east, west, north, northwest, northeast, southwest, southeast}. We have the following rule for each such x:

A x C :: A x B, B x C

Rule 2: This rule captures the interaction between the relations involving left-of, above, behind, and the relation involving overlaps. Let x denote any of the relation symbols left-of, above, and behind. We have the following rule for each such x:

A x D :: A x B, B overlaps C, C x D

Rule 3: This rule captures the interaction between the relations involving left-of, above, behind, outside, and the relation involving inside. Let x denote any relation symbol in {left-of, above, behind, outside}. We have the following two rules for each such x:

(a) A x C :: A inside B, B x C
(b) A x C :: A x B, C inside B

Rule (b) is redundant for the case when x is the relation symbol outside; for the other cases (a) and (b) are independent.

Rule 4: Symmetry of equal, overlaps, externally connected, and disjoint. This rule captures the symmetry of equal, overlaps, externally connected, and disjoint. Let x denote any of these relations. We have the following rule for each such x:

A x B :: B x A

Rule 5: Inverse property. This rule says that the following directional relations are inverses of each other:

A North B :: B South A
A Northeast B :: B Southwest A
A Northwest B :: B Southeast A
A East B :: B West A
A left B :: B right A
A Above B :: B Below A

Rule 6: This rule allows one to deduce that two objects are outside each other if one of them is to the left of, or above, or behind the other object. Let x denote any of the relation symbols in {left-of, above, behind}. We have the following rule for each such x:

A outside B :: A x B

One characteristic of these rules is that they enable systems built on them to infer implicitly represented knowledge from explicit knowledge in the knowledge base. These rules can help employ reasoning to support intelligent queries. By defining the above rules, the implicit knowledge of spatial relations can be obtained. Spatial reasoning is necessary because emergency responders prefer qualitative reasoning to precise quantitative measurements when querying. For example, emergency responders are in most cases interested in whether Mansfield is located inside Connecticut, instead of whether Mansfield has a smaller latitude than Connecticut. Such flexible queries can be answered by using the above rules.
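How such deduction rules derive implicit relations can be illustrated with a small forward-chaining sketch. It is not the reasoner used in the prototype: facts are plain (subject, relation, object) tuples, relation names such as `left-of` and `externally-connected` are illustrative spellings, and only Rules 1, 4, 5, and 6 are covered.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)

# Relation sets taken from Rules 1, 4, 5 and 6; the spellings are illustrative.
TRANSITIVE = {"left-of", "right-of", "above", "behind", "inside", "east", "west",
              "north", "northwest", "northeast", "southwest", "southeast"}
SYMMETRIC = {"equal", "overlaps", "externally-connected", "disjoint"}
INVERSE = {"north": "south", "northeast": "southwest", "northwest": "southeast",
           "east": "west", "left-of": "right-of", "above": "below"}
IMPLIES_OUTSIDE = {"left-of", "above", "behind"}


def infer(facts: Set[Fact]) -> Set[Fact]:
    """Apply the rules repeatedly until no new fact can be derived."""
    inverse = dict(INVERSE)
    inverse.update({v: k for k, v in INVERSE.items()})  # inverses hold both ways
    known = set(facts)
    while True:
        new: Set[Fact] = set()
        for a, rel, b in known:
            if rel in SYMMETRIC:            # Rule 4
                new.add((b, rel, a))
            if rel in inverse:              # Rule 5
                new.add((b, inverse[rel], a))
            if rel in IMPLIES_OUTSIDE:      # Rule 6
                new.add((a, "outside", b))
            if rel in TRANSITIVE:           # Rule 1
                for b2, rel2, c in known:
                    if b2 == b and rel2 == rel:
                        new.add((a, rel, c))
        if new <= known:
            return known
        known |= new


if __name__ == "__main__":
    base = {("Mansfield", "inside", "Connecticut"),
            ("Connecticut", "inside", "New England")}
    print(("Mansfield", "inside", "New England") in infer(base))
```

Because only the two asserted facts need to be stored, the derived fact is computed on demand, which mirrors the minimal-ABox design choice described in this section.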

3.3. Query geospatial features from multiple heterogeneous sources

3.3.1. Semantic query

To allow emergency responders to simultaneously access semantically heterogeneous geospatial features in different datasets/files that may be located at different data servers, we develop a semantic query solution for the search of geospatial features over the network. The semantic queries deliver geospatial features as a single response from all necessary data sources in a timely fashion with minimal or no human assistance. Under the solution, we do not assume that emergency responders have detailed knowledge of data sources. The semantic queries are expressed using SPARQL select-queries, which define a set of triple patterns using variables. With SPARQL semantic queries, spatial relations such as DWithin and Beyond can be expressed as user-defined functions within the query. For example, we can define a property nearby as follows:

?a nearby ?b :− ?a geometry ?geom, filter (DWithin(?geom, ?b, 0.01))
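The effect of the nearby property can be sketched as a simple predicate. The sketch assumes point geometries and planar Euclidean distance in coordinate units, which simplifies the general DWithin operator; the feature names and coordinates are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dwithin(geom: Point, target: Point, distance: float) -> bool:
    """True if geom lies within `distance` coordinate units of target."""
    return math.hypot(geom[0] - target[0], geom[1] - target[1]) <= distance


def nearby(features: Dict[str, Point], target: Point, distance: float = 0.01) -> List[str]:
    """Names of the features whose geometry satisfies DWithin(geom, target, distance)."""
    return [name for name, geom in features.items() if dwithin(geom, target, distance)]


if __name__ == "__main__":
    # Hypothetical point features.
    features = {"shelter": (-88.005, 43.002), "hospital": (-88.2, 43.0)}
    print(nearby(features, (-88.0, 43.0)))
```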

Fig. 2. The main procedure of the semantic query solution.

We can use this property in a triple such as ?a nearby (−88, 43), which will return geometries that are within a distance of 0.01 unit from the coordinate (−88, 43). The semantic queries overcome problems encountered in traditional text/keyword queries. For example, traditional text/keyword queries require an exact word from the query to appear in the searched GML data/metadata. If a mistake is made, or a word is used in a different form/name (synonyms/hyponyms) than in the data/metadata, users may not find the right answer. The semantic queries overcome these problems and can improve the query processing capabilities of GML data based on semantics derived from ontologies. The semantic queries attempt to obtain geospatial features without knowing their detailed syntactic structure. Unlike syntactic queries such as XPath and XQuery queries for GML, which only support retrieval of explicit data based on syntactic information, the semantic queries enable retrieval of both explicit and implicitly derived information based on the syntactic and semantic information contained in the data. The basic idea of the query solution is to refine the query initially provided by emergency responders, to make it understandable to the software program, and to send it to WFS servers as WFS queries. After the results are analyzed, the query is given lexical context and expanded. The main procedure of the semantic query solution is illustrated in Fig. 2: (1) parse the user queries in natural language, and translate them to SPARQL queries; (2) decompose the queries into a sequence of sub-queries; (3) automatically rewrite SPARQL sub-queries into WFS queries to multiple WFS servers; (4) send WFS queries to remote servers and retrieve a set of GML responses; (5) backwards chain/combine the retrieved GML results by performing spatial calculations within the knowledge base; (6) deliver the retrieved geospatial features as a single response to emergency responders.
Here we use the query rewriting algorithms developed in Zhao et al. (2008) to rewrite SPARQL sub-queries into WFS queries (Fig. 3(a) and (b)). The query rewriting algorithms have two parts. The first part applies inference rules to the body of a SPARQL query so that RDF triples with object properties are replaced by RDF triples with datatype properties. An inference rule i is applicable to a triple t if i.head matches t via a variable substitution s such that s(i.head) = t. The second part rewrites the resulting query to WFS getFeature requests.
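The applicability test s(i.head) = t and the final rewriting step can be illustrated with the following sketch. It is not the algorithm of Fig. 3: triples are plain tuples, variables are strings starting with `?`, body variables are not renamed apart, and the GetFeature request assumes the WFS 1.1.0 key-value-pair parameters `typeName` and `bbox` with a hypothetical server URL.

```python
from typing import Dict, List, Optional, Sequence, Tuple
from urllib.parse import urlencode

Triple = Tuple[str, str, str]


def match_head(head: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return a substitution s with s(head) == triple, or None if none exists."""
    s: Dict[str, str] = {}
    for h, t in zip(head, triple):
        if h.startswith("?"):
            if s.setdefault(h, t) != t:  # variable already bound to another term
                return None
        elif h != t:                     # constants must agree exactly
            return None
    return s


def apply_rule(triple: Triple, head: Triple, body: Sequence[Triple]) -> List[Triple]:
    """Replace the triple by the instantiated rule body when the rule applies."""
    s = match_head(head, triple)
    if s is None:
        return [triple]
    return [(s.get(a, a), s.get(p, p), s.get(o, o)) for a, p, o in body]


def get_feature_url(base_url: str, type_name: str,
                    bbox: Tuple[float, float, float, float]) -> str:
    """Build a GetFeature request restricted to a bounding box."""
    params = {"service": "WFS", "version": "1.1.0", "request": "GetFeature",
              "typeName": type_name, "bbox": ",".join(str(v) for v in bbox)}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    head = ("?a", "nearby", "?b")
    body = [("?a", "geometry", "?geom")]
    print(apply_rule(("?x", "nearby", "?y"), head, body))
    print(get_feature_url("http://example.org/wfs", "ct:Road",
                          (-72.3, 41.7, -72.2, 41.8)))
```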


Fig. 3. Query rewriting and search algorithms.

To support efficient WFS feature search, we index the available WFS features in an index file. We use the following fields for the index file: (1) URI of the WFS server, (2) feature name, (3) feature property name, (4) geometry type of the feature, and (5) bounding box of the geometry. The ontology server maintains the index file with these fields and domains, and application ontologies with mappings to these features and their properties. To find geospatial feature data, the discovery algorithm considers a WFS description as a consistent collection of restrictions over the named properties of a WFS, such as URI of the WFS server, Feature Type name, Feature Property name, Geometry Type of Feature, and Bounding Box of Geometry. Fig. 3(c) shows the WFS feature search algorithm. The appropriate WFS features have been found by matching the descriptions required to solve the query with the descriptions of providers through the parameters. The search algorithm first uses the query Bounding Box of the Geometry parameter to narrow down the list of services in the repository. It gets all those services that produce at

least matched Bounding Box of Geometry (all WFSs that are located within the geography limitation). From those services it further narrows down the list of services by Geometry Type of Features, then by Feature Type names, and finally by Property names. All the description parameters provided by WFS must be equivalent to or subsume the required description parameters in the query. Whenever an exactly equivalent match is found, it is recorded with the highest score. Otherwise, according to the degree of match detected, it is recorded with a lower score. We calculate the score using the following math formula: s(i) = ST (T, T  ) ∗ WT + SP (P, P  ) ∗ WP + SG (G, G ) ∗ WG +SB (B, B ) ∗ WB ,

(1)

where s(i) is the calculated score, ST is the similarity score between two feature classes T and Tˇı, SP is the similarity score between two feature property sets, SG is the similarity score between two Geometry Type of Feature sets, SB is the similarity score between two Bounding Box of Geometry sets, and WP , WG , WB are the weights


Fig. 4. A basic query example using SPARQL.

of the property set, geometry type and bounding box, respectively. The similarity score $S_T$ of the two feature classes $T$ and $T'$ can be based on the number of inheritance levels between them. For example, $S_T(T, T') = 1$ if $T$ is a direct subclass of $T'$, $S_T(T, T') = 0$ if they have no inheritance relation, and we assign a number between 0 and 1 if $T$ extends $T'$ indirectly. $W_T$ is the weight of the similarity score of feature classes. The similarity score $S_P$ of two property sets is computed the same way, except that we add up the similarity scores between the matched properties in the two sets to produce the similarity score of the two sets. $S_G$ is computed the same way as $S_T$. $S_B$ is calculated as follows:

$$S_B(B, B') = \mathrm{sizeOf}(\mathrm{overlapAreaOf}(B, B')) \tag{2}$$
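The narrowing steps and the scoring in Eqs. (1) and (2) can be illustrated with a minimal Python sketch. It is not the prototype's code. The class hierarchy, the `IndexEntry` record, the function names and the unit default weights are all hypothetical. Several choices are assumptions made only for illustration: indirect inheritance is scored as 1/levels, identical classes score 1, geometry types are compared by equality, a property match counts 1 per shared property name, and the bounding box filter keeps any overlapping box rather than only contained ones.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

# Hypothetical feature class hierarchy (child -> parent), for illustration only.
PARENT: Dict[str, str] = {
    "Highway": "Road",
    "Road": "TransportationFeature",
    "Airport": "TransportationFeature",
}

@dataclass(frozen=True)
class IndexEntry:
    """One row of the index file: the five indexed fields of a WFS feature."""
    server_uri: str
    feature_type: str
    properties: FrozenSet[str]
    geometry_type: str
    bbox: Box

def class_similarity(t: str, t_req: str, parents: Dict[str, str] = PARENT) -> float:
    """S_T: 1 for an identical class or direct subclass, 0 if unrelated.
    Assumption: an indirect subclass scores 1 / (number of inheritance levels)."""
    if t == t_req:
        return 1.0
    levels, cur = 0, t
    while cur in parents:
        cur = parents[cur]
        levels += 1
        if cur == t_req:
            return 1.0 / levels
    return 0.0

def overlap_area(a: Box, b: Box) -> float:
    """S_B as in Eq. (2): area of the intersection of two bounding boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0

def score(entry: IndexEntry, q_type: str, q_props: FrozenSet[str], q_geom: str,
          q_box: Box, w_t: float = 1.0, w_p: float = 1.0,
          w_g: float = 1.0, w_b: float = 1.0) -> float:
    """Eq. (1): weighted sum of the four similarity scores."""
    s_t = class_similarity(entry.feature_type, q_type)
    s_p = float(len(entry.properties & q_props))  # assumption: 1 per matched name
    s_g = class_similarity(entry.geometry_type, q_geom, parents={})  # equality only
    s_b = overlap_area(entry.bbox, q_box)
    return s_t * w_t + s_p * w_p + s_g * w_g + s_b * w_b

def search(index: List[IndexEntry], q_type: str, q_props: FrozenSet[str],
           q_geom: str, q_box: Box) -> List[IndexEntry]:
    """Narrow by bounding box, geometry type, feature type, then properties; rank by score."""
    hits = [e for e in index if overlap_area(e.bbox, q_box) > 0]
    hits = [e for e in hits if e.geometry_type == q_geom]
    hits = [e for e in hits if class_similarity(e.feature_type, q_type) > 0]
    hits = [e for e in hits if q_props <= e.properties]
    return sorted(hits, key=lambda e: score(e, q_type, q_props, q_geom, q_box),
                  reverse=True)

# Usage with made-up index entries and a query for "Road" line features.
index = [
    IndexEntry("http://example.org/wfs-a", "Highway",
               frozenset({"name", "geometry"}), "LineString", (0, 0, 2, 2)),
    IndexEntry("http://example.org/wfs-b", "Road",
               frozenset({"name"}), "LineString", (0, 0, 1, 1)),
    IndexEntry("http://example.org/wfs-c", "Airport",
               frozenset({"name"}), "Point", (0, 0, 2, 2)),
]
ranked = search(index, "Road", frozenset({"name"}), "LineString", (0, 0, 2, 2))
```

Because Eq. (2) uses the raw overlap area, the bounding box term can dominate the sum unless $W_B$ is chosen with the coordinate units in mind; the unit weights above are placeholders, not values from the paper.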

3.3.2. Query interface

The SPARQL semantic query solution may present a challenge to emergency responders who have no training in SPARQL or in the data sets that they wish to query. To help them acquire the needed information from the knowledge base, we develop a natural language processing (NLP) interface that allows them to express their needs without knowledge of ontologies or SPARQL. The interface takes input queries as natural language expressions and sends queries to multiple data sources through the ontology-based knowledge base.

The ambiguity of natural language poses many challenges for the development of the NLP interface. Transforming natural language queries into formal SPARQL queries is difficult because the vocabulary of the natural language must be correctly mapped to the vocabulary of the knowledge base. In the proposed framework we therefore use the application and domain-specific ontologies to produce a lexicon for translating user input. The lexicon is constructed automatically from the ontologies in the ontology server, which creates the verb, noun and prepositional phrases from the relations in the ontological structure. We use the Stanford Parser (http://nlp.stanford.edu/software/lexparser.shtml) to provide a syntax tree for the natural language query. Based on the syntax tree we extract the sequence of the main word categories: Noun (N), Verb (V), Preposition (P), Wh-Word (Q), and Conjunction (C). We generate a query skeleton from the extracted word categories and then try to match it with the synonym-enhanced triples in the ontologies. The matching is controlled by the domain and range information of the ontology. After identifying all possible triples in the query skeleton and combining them with the ontology's resources, we create OntologyTriples, which are represented by entities in the ontologies, from the composed triples. The OntologyTriples are used to generate SPARQL queries.

The lexicon used in the framework is composed of three sources: (1) ontology entities in the ontology server, including ontology classes (concepts), ontology properties (relations), and ontology instances (individuals), which are used to limit the ambiguities and errors in the natural language interactions; (2) general dictionaries, such as WordNet, which are used to enlarge the vocabulary of the ontology and help map user vocabulary to ontology vocabulary; (3) application-specific synonyms, such as user-defined synonyms, which are used to define application jargon and abbreviations. The lexicon can be updated automatically from the ontology server for the specific application and domain of use: words are inserted into the lexicon with the semantic value (classes) that they inherit from the ontologies used by the applications. By using the NLP interface, emergency responders can intuitively find the requested geospatial features without special GIS or programming training. The interface takes advantage of semantic markup and ontologies to interpret natural language queries and exposes semantic WFS services directly to emergency responders.
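The translation pipeline described in this section (word categories, query skeleton, lexicon matching, SPARQL generation) can be sketched roughly in Python as follows. This is only an illustration under stated assumptions and does not reproduce the prototype. The input is a hand-tagged sentence with Penn Treebank part-of-speech tags standing in for parser output. The `ex:` namespace, the lexicon entries, the `ex:within` and `ex:name` properties and all function names are hypothetical. The domain and range checks that control matching in the framework are omitted.

```python
from typing import List, Optional, Tuple

# Map Penn Treebank tag prefixes to the five word categories used in the text.
TAG_TO_CATEGORY = [("NN", "N"), ("VB", "V"), ("IN", "P"), ("W", "Q"), ("CC", "C")]

# Hypothetical lexicon: word -> (kind of ontology entity, entity name).
# "streets" and "roads" act as synonyms for the same ontology class.
LEXICON = {
    "streets": ("class", "ex:Road"),
    "roads": ("class", "ex:Road"),
    "town": ("class", "ex:Town"),
    "in": ("property", "ex:within"),
}

Triple = Tuple[str, str, str]

def skeleton(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only words in the main categories N, V, P, Q, C."""
    result = []
    for word, tag in tagged:
        for prefix, category in TAG_TO_CATEGORY:
            if tag.startswith(prefix):
                result.append((category, word))
                break
    return result

def to_triples(skel: List[Tuple[str, str]]) -> List[Triple]:
    """Match skeleton words against the lexicon and compose triples."""
    triples: List[Triple] = []
    variables: List[str] = []
    pending: Optional[str] = None  # property waiting for its object
    for category, word in skel:
        kind, entity = LEXICON.get(word.lower(), (None, None))
        if category == "P":
            pending = entity if kind == "property" else None
        elif category == "N" and kind == "class":
            var = f"?v{len(variables)}"
            triples.append((var, "a", entity))
            if pending and variables:
                triples.append((variables[-1], pending, var))
            variables.append(var)
            pending = None
        elif category == "N" and variables:
            # Assumption: an unknown noun names the most recent variable.
            triples.append((variables[-1], "ex:name", f'"{word}"'))
    return triples

def to_sparql(triples: List[Triple]) -> str:
    """Serialize the composed triples as a SPARQL SELECT query."""
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
    return ("PREFIX ex: <http://example.org/onto#>\n"
            f"SELECT ?v0 WHERE {{\n{body}\n}}")

# "Find the streets in the town of Mansfield", tagged by hand.
tagged = [("Find", "VB"), ("the", "DT"), ("streets", "NNS"), ("in", "IN"),
          ("the", "DT"), ("town", "NN"), ("of", "IN"), ("Mansfield", "NNP")]
skel = skeleton(tagged)
triples = to_triples(skel)
query = to_sparql(triples)
print(query)
```

In this sketch the synonym handling is just a dictionary lookup; a lexicon built from ontology entities, WordNet and user-defined synonyms as described above would feed the same lookup step.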

4. Some results of an implemented prototype

Based on the proposed framework we implemented a prototype that allows emergency responders to automatically search for geospatial features from multiple sources using natural language. In the implemented prototype, semantically heterogeneous data (roads, town boundaries, and airports in Connecticut) were served by three different WFS servers. Geoserver (http://geoserver.sourceforge.net/html/index.php) was used to publish geospatial data using OGC WFSs. Apache Tomcat


Fig. 5. The natural language interface for the implemented prototype.

server was employed as a container for the WFS servers and ontology servers. The prototype uses Jena (http://jena.sourceforge.net/), a Semantic Web framework for Java, to access ontology definitions, analyze their consistency, and infer the knowledge needed for sending WFS requests. A spatial ontology server was developed based on Joseki (http://www.joseki.org/) to answer ontology queries in SPARQL form. The client was developed using OpenLayers to render the ontology query results as graphic maps. The Stanford Parser was used as the natural language parser. Our experiments show that with the implemented prototype it is possible to directly search geospatial features from semantically heterogeneous sources using semantic web technologies. Fig. 4 illustrates one basic query example using SPARQL, which retrieves the geometry of towns in New London county by the county name “New London” (http://boyang.cs.uwm.edu:8080/ijaeog/basic.html). In this example, the geometry of towns in “New London” county is queried from storage in the ontology server. Because the OWL ontologies in the ontology server provide richer semantic specifications for OGC WFSs, software programs can properly interpret the query results and automatically identify the desired feature types and feature properties. Thus, it is possible to perform intelligent content-based search of WFSs at the semantic level. Our prototype provides an intuitive and easy-to-use interface that allows emergency responders to query the ontology-based knowledge base using natural language. Fig. 5 shows the NLP interface of the prototype to query street features in Mansfield by

using the sentence “Find the streets in the town of Mansfield” (http://boyang.cs.uwm.edu:8080/ijaeog/). This query is parsed and translated to SPARQL queries. Note that the returned streets are those that lie within the polygon of the town of Mansfield, and that the results come from spatial analysis of two files on two separate WFS servers: the Connecticut road file and the Connecticut town boundary file. Retrieving street features is slower because a large number of instances have to be rendered in the browser. The main procedure in the example is as follows: users send natural language queries, which are converted into ontology queries in SPARQL. The SPARQL queries are rewritten to WFS getFeature requests using the DL reasoning ability or inference rules if necessary. At the system initialization phase, the system takes a list of known WFSs and sends a getCapabilities request to each of the services. The returned capabilities documents are parsed to obtain the list of features supported by each web service. The name, the bounding box, and the description of each service are also stored. For each discovered feature, a describeFeatureType request is sent to obtain its properties and their types, including the geometry types. The next step is to compare the feature names and their descriptions with the ontology classes defined earlier in order to associate the features with the application ontology classes. The same task is then repeated for the application ontology properties. Internally, one ontology class is generated for each WFS feature type and one ontology instance is generated for each WFS feature instance. The ontology class simulates the WFS feature so that it contains all the feature properties as ontology


properties. The ontology instance includes other feature information, including the server URL, bounding box, description, geometry type, and a list of property names and their types. Based on the mapping between WFS features and domain/application ontologies, subclass relations to the generated ontology classes are added and equivalent ontology properties are merged. The built-in reasoning algorithm of Jena is used to infer the feature instances (i.e. route segments and stops) that users would like to retrieve. When users query roads, the implemented prototype takes the request and processes it using the discovery algorithms explained earlier to retrieve a list of street features suitable for answering the query.

5. Conclusion

Emergency responders need instant access to diverse data to make quick decisions and take prompt actions. Although OGC WFSs make it possible to search for and access feature-level information in a distributed Internet environment, it is very difficult to automate the discovery of these geospatial features when semantic disparity exists. OGC WFSs only allow for the specification of the syntax of the data content and do not support the specification of the data semantics. Therefore, based on the OGC WFS descriptions alone, there is no way to use natural language to automatically retrieve the desired data sets from two semantically heterogeneous sources (a road data source and a town data source) by simply expressing the request as “find the streets in the town of Mansfield”. To facilitate automatic and intelligent search of geospatial features, in this paper we propose a geospatial semantic web framework that is capable of processing such requests. We focus on two major tasks: (1) intelligent geospatial feature retrieval using Geospatial Semantic Web technologies; (2) a natural language interface to the geospatial knowledge base and web feature services over the Semantic Web.
The proposed framework brings together the advantages of natural language querying and the power of the geospatial semantic web, and it can capture and analyze geospatial feature information beyond the purely lexical and syntactic level. Based on the proposed framework a prototype has been implemented. The results of the prototype show that it is possible to directly discover desirable geospatial features from multiple semantically heterogeneous sources using geospatial semantic technologies. With geospatial semantic web technologies, geospatial features become machine understandable, and the retrieval of semantically heterogeneous features can be precise enough that the results of user queries are immediately useful, without further processing or weeding out of irrelevant information. This overcomes the low recall/precision problem faced by text/keyword-based search, which does not capture the underlying semantics of data and forces users to express their queries in the data sources’ vocabulary and syntax. By providing a natural language interface for accessing geospatial feature information, the proposed framework can dramatically lower the barrier to entry to the Semantic Web for disaster and emergency management applications. It is unnecessary for emergency responders to understand formal ontologies and precisely defined vocabularies. Despite these advantages, the proposed framework has limitations, such as the difficulty of efficiently handling large geographical knowledge bases. The search in the implemented prototype becomes slow for features with a large number of instances.

Acknowledgments

We thank the three anonymous reviewers for their comments on the manuscript. This research is partially supported by USA NSF


grant no. 0616957. The authors bear sole responsibility for all of the viewpoints presented in the paper.