Available online at www.sciencedirect.com
Procedia Computer Science 5 (2011) 920–925
Artifact 2011
Improving intelligence through use of Natural Language Processing. A comparison between NLP interfaces and traditional visual GIS interfaces.
Davide Calì1, Antonio Condorelli2, Santo Papa3, Marius Rata4, Luca Zagarella5
1 [email protected], 2 [email protected], 3 [email protected], 4 [email protected], 5 [email protected]
Abstract
This paper aims at a comparative analysis between accessing GIS using traditional visual interfaces and new ones, which involve a Natural Language Processing (NLP) approach that provides support for the interaction between users and the information system. Lately, GIS have been used at a very large scale; also, users of these systems differ in many aspects, such as technical skills, interests, goals, base of knowledge, etc. Among GIS features, the location systems play a big role for ITS. To optimize urban mobility it has become more and more crucial to be able to offer location-based services that are very accurate and available even for non-expert use. In this scenario NLP allows the use of imprecise expressions and queries, of synonyms and of qualitative expressions in a very simple way, proposing itself as a powerful interface enabler. The metrics defined and considered by the authors to be relevant, from the points of view mentioned above, are: accessibility (technical skills required), speed (time required to identify the desired results) and quality (human perception of mistakes and the standardization of the formulas used to perform the queries). Experiments are performed using a traditional GIS interface (ArcGIS) and an NLP-based interface called SQL and provided by the company Ula. Ula's technology is being used for the project i-Tour, which is developed under the 7th framework with the financial support of the [...].
Keywords: NLP; GIS; A.I.
Introduction
The use of geographical information systems (GIS) for managing, analyzing and presenting data with reference to the underlying cartographic geography has grown rapidly in recent years. A key concern in GIS applications is the storage, handling and presentation of data from various domains of activity at a common set of geospatial coordinates, so that the available resources or data can be identified and accessed swiftly. The need for coherent and contextual use of geographic information among different stakeholders, such as departments in public administrations, formed the basis for a number of initiatives aimed at sharing spatial information, e.g., the INfrastructure for SPatial InfoRmation in Europe (INSPIRE) [8, 10]. This move towards a common GIS for many different applications is sometimes referred to as Spatial Data Infrastructure (SDI). However, because information has been accessed through different and incompatible GIS software, users have adapted to a specific terminology, and the acquired database knowledge (structure and contents) and familiarity with a given visual GIS interface have not been easily transferable to alternative GIS software. This is in spite of a large set of common questions asked by GIS users. The massive volume of information that needs to be stored in such a system discourages the user from identifying the desired information with a “category by category” approach. The situation is different if the user can perform
1877–0509 © 2011 Published by Elsevier Ltd. Open access under CC BY-NC-ND license. Selection and/or peer-review under responsibility of Prof. Elhadi Shakshuki and Prof. Muhammad Younas. doi:10.1016/j.procs.2011.07.128
queries in natural language on a system that takes responsibility for identifying the key components of the specified queries and provides either the sought results or a user-friendly mechanism for filtering the request/results.
The paper is organized as follows: 1) a general overview of how traditional GIS applications work; 2) a general presentation of how an NLP approach can be applied to GIS; 3) a presentation of the i-Tour system and how it works; 4) a comparative analysis of the results obtained from real-life scenarios; 5) conclusions and remarks.
1. OVERVIEW ON THE INTERACTION WITH TRADITIONAL GIS SYSTEMS
GIS technology is a computer-based data collection, storage, and analysis tool that combines previously unrelated information into easily understandable maps. It can also perform analytical functions and then present the results visually as maps, tables or graphs, allowing users to see the issues and then select the best course of action. Combined with the Internet, GIS offers a consistent and cost-effective tool for the sharing and analysis of geographic data among government agencies, private industry, non-profit organizations, and the general public. The themes in the graphic below are only a small example of the wide array of information that can be viewed or analyzed via GIS.
Fig. 1. GIS Themes
GIS is used to display and analyze spatial data that are tied to databases. Maps can be drawn from the database and data can be referenced from the maps. When a database is updated, the associated map can be updated as well. Two main problems of current GIS are that they often do not perform the functions the users really want, and that users do not know exactly how to translate their needs into operations that the system can execute. The cost of training people in the use of a new GIS is currently estimated to be of the same order of magnitude as the acquisition of the necessary hardware and software (Mark & Frank [1]). In complex scenarios (e.g. those involving Intelligent Transport Systems, ITS) it becomes crucial to introduce technologies able to reduce the learning or training time of users and to increase the speed of the information retrieval process.
1.1 GIS GUI ACCESSIBILITY
The goal of GIS interface design is to facilitate communication between the users and the GIS. The existence of different conceptualizations of space has given rise to competing theoretical and technical models for the design and use of GIS interfaces. Generally, three types of spaces are interwoven in desktop GIS applications: the GUI (Graphic User Interface) work space; the geographic space; and the map space. Integrating these three spaces into one interface can be challenging:
A) Information in the map space and information in the general GUI work space must be dealt with simultaneously through the same sort of spatial concepts and the same sensory channels, without users confusing one with the other.
B) The inherent cognitive difference between small-scale spaces and large-scale spaces creates further interaction needs. The operations for locomotion and changes of size (zoom in/out) provided by the map space are limited to a small set, which cannot fully satisfy the needs of navigation in geographic spaces.
1.2 GIS USER KNOWLEDGE
Ideally, the user interface of a computer program should present the users with concepts that are consistent with the users’ mental models of that phenomenon in the real world (Norman [16]). Hence, the better the users’ mental models are understood, the better the user interface design will become. In the context of GIS, Nyerges [8] divided user knowledge into problem domain knowledge and tool domain knowledge. The problem domain knowledge in GIS (also called spatial knowledge) can be further divided into two
subdomains: the conventional spatial knowledge (gained from daily experience) and the professional spatial knowledge, the latter being deeper and broader than the former (Nyerges [8]). In other words, for users to be able to query GIS software and applications and obtain what they want, they need good skills with the GUI and software functions, as well as a good level of knowledge about the data (layers and database structures, contents and terminology).
2. HOW NLP APP WORKS
Natural language appears to be an optimal substitute for formal query languages, allowing users to access databases (DBs) according to their own familiar concepts and requirements. However, the sharp contrast between the explicit and structured way in which information is stored in DBs and the inherent vagueness and implicitness of natural language semantics (and, more generally, the way users conceptualize the goal-oriented information they search for) has inhibited this substitution. The results presented in this paper show that developments in NLP are bringing the use of natural language as a substitute close to fruition. Systems based on “semantic grammars” were quite popular in past decades but have recently been replaced by systems using one or more layers of some intermediate representation language. The user’s query, for example “How can I get to X, by car, in less than two hours”, is translated into a set of clauses, “I want to get to X” ^ “I travel by car” ^ “Journey time smaller or equal (