Databases and Natural Language Interfaces

Porfírio P. Filipe 1, Nuno J. Mamede 2

1 Inst. Sup. de Eng. de Lisboa, R. Conselheiro Emídio Navarro, 1949-014 Lisboa, Portugal
2 CSTC/Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal

Abstract. A Natural Language Interface for Databases allows users of multimedia kiosks to formulate natural language questions. User questions are first translated into a logic language and subsequently into Structured Query Language (SQL), which is processed by a database management system to return the answer. This paper focuses on the translation stage. Special attention is devoted to the conceptual model, a relational database that organizes all the data supporting the translation process. The translation algorithm is presented and commented examples are used to better understand its functioning.

Keywords. language engineering, natural language interface for databases, conceptual model, relational database, type hierarchy, translation

1 Motivation

One of the main characteristics of multimedia kiosks is their familiar visual appearance, which reduces the complexity of communication between man and machine to a minimum. The anthroponomical synchronisation of image, video, audio, and text is one of the crucial factors in “seducing” the user into trying the system. Another fundamental characteristic is usefulness: on first use, the user must feel that the system can be useful. This is only possible with a well-designed interface through which information can be easily accessed without having to learn another vast and complex communication language (the one used by the system).

In spite of the large variety of existing systems, a standard for these interfaces does not yet exist. As a consequence, the user fully understands a system only after a certain amount of time. Another criticism of this interaction method is that in a traditional system, with navigation through several successive windows, the needed information cannot always be obtained. This occurs either because the information does not exist (and the system is unable to tell the user so) or because the user does not know the system’s language well enough to extract it (it may take too many steps to get there).

What browsing alternative could overcome these limitations? The answer could be a Natural Language Interface for Databases (NLIDB). The evolution of technology has driven the continuous development of NLIDBs, especially in the area of natural language processing, exploring architectures that transform NLIDBs into relational agents and integrating language and graphics to exploit the advantages of both modalities [1][5].

Ordinary citizens often need to access information kept in databases. Almost all relational database management systems use SQL’s SELECT instruction as the standard query interface, but SQL is cumbersome for “normal” users.


An NLIDB can translate questions from natural language into SELECT instructions. The questions formulated by the user contain two types of information: (i) the information to be found, i.e., what the user expects in the reply; and (ii) the conditions the reply must satisfy. The translation process handles each type of information differently.

Replies supplied to the user may have presentation requirements (text format, use of graphs or videos, …) to clarify the answer, make it more pleasant, or complete it. If, for example, the reply includes an address, a presentation similar to the one used on postcards will certainly be appreciated.

2 System Architecture

The system’s architecture (see Figure 1) is based on an Intermediate Representation Language (LIL): the natural language question is transformed into an intermediate logical query before the final translation into an SQL query. This language expresses the meaning of the sentence in terms of high-level concepts, independent of the database structure [2][3].

The architecture consists of two large modules. The first module (the linguistic component) controls natural language processing: a question is submitted and successively transformed by morphological, syntactic, and semantic analysis. One or more LIL expressions are obtained at the end of this process, corresponding to the possible interpretations of the initial question. Given the size of the domain and the flexibility of natural language, there will usually be several interpretations of the same question. The second module handles the connection with the database, translating the LIL expressions into SQL expressions and sending them to the database management system to produce the answers.

The main advantage of this architecture is the complete separation between the linguistic component and the database knowledge. Portability to other relational databases is achieved by reconfiguring the conceptual model. The translation proceeds in stages: natural language into a syntactic tree, then into a logical formula, and finally into a SELECT instruction. Communication with the database management system, which generates the replies, is carried out through an ODBC driver.

[Figure 1 (block diagram). Recoverable labels: Natural Language Question (Portuguese, French, English, ...); Morphological Analysis; Syntactic Analysis; Semantic Analysis; Words Information; Syntactic Structure (syntactic tree); Logical Query (LIL language); LIL/SQL Translation; SQL Query (SELECT); DBMS; Domain Database; Conceptual Model (Type Hierarchy, General Representation, Translation Data); Question Discovery Context; Answer Interpretation; Consult; Answer.]

Fig. 1. Architecture of a Natural Language Interface for Databases.

The conceptual model (CM) is the component of an NLIDB that symbolically represents the constraints associated with the application’s domain [6]. The conceptual model has two components:

(i) a type hierarchy, which contains information on inheritance between classes of entities; and
(ii) a general representation, which is an explicit representation of the domain’s conceptual constraints.

During semantic analysis it is necessary to verify that every question respects the conceptual constraints of the domain database, which may help in choosing an adequate interpretation of the original question. For example, the question “which hotels have swimming pools with salt-water” has two possible interpretations: “salt-water” can be related to the “hotel” or to the “swimming-pool”. A request to the conceptual model asking whether the “hotel” entity has the “salt-water” property will fail if the fact “hotel has salt-water” is irrelevant and, consequently, not represented in the conceptual model. When a question does not respect these domain constraints, we say it is semantically incorrect.

3 The Source Language

The Logic Interface Language (LIL) is inspired by the formalism presented in the MASQUE/SQL project [3] and was first described in [9][11]. LIL’s syntax is similar to that of first-order logic, and it has the expressive power of a predicate logic that allows real-world concepts to be represented:

(a) A term may be a constant, a variable, or a function symbol applied to a tuple of terms. Constants and variables represent world objects, including abstract objects such as events and situations. Examples of constants are Ritz, sauna, -15, and 1997; examples of variables are _3 and X.
(b) A primitive formula contains a predicate symbol (written as a term) and one or two arguments (terms). A primitive formula is written as pred_symbol(term1[,term2]). Examples of primitive formulas are hotel(X) and have(X,sauna).
(c) A LIL expression contains a set of LIL formulas separated by commas (denoting conjunction), with the syntax formula1,…,formulan. An example of a LIL expression is author(X), write(X,book).
(d) The logical connectives conjunction (&) and disjunction (V) combine LIL formulas. The valid syntax is V(pred1(term1,…),…,predn(termn1,…)) and &(pred1(term1,…),…,predn(termn1,…)). An example of a formula with logical connectives is &(have(X,restaurant),have(X,sauna)).
(e) Predefined formulas have the following syntax: EXACT(term1,termcte) (equal), SUP(term1,termcte) (greater), and INF(term1,termcte) (lesser).

4 The Object Language

The questions posed to an NLIDB may contain three kinds of information, used to identify: (a) the properties that are relevant to the answer; (b) the selection condition; and (c) how the reply is to be sorted. All these components are optional, i.e., they may be unspecified. For example, in the question “which hotels have sauna?”, the relevant property is the “hotel name” and the selection condition is “have sauna”. Note that the original question does not identify any relevant property, so it must be deduced during the translation process.

The translation from LIL into SQL involves identifying the above components and transforming them into the syntax of the SELECT instruction. Figure 2 describes the syntax of the SELECT instruction, relating it to the contents of the natural language question: (a) the names (of columns) or expressions (that involve columns) that follow the SELECT keyword specify the properties that are relevant to the reply; (b) the tables that follow the FROM keyword specify the entities referred to in the question; (c) the logical expressions that follow the WHERE and HAVING keywords handle the conditions the reply must satisfy; (d) the ORDER BY and GROUP BY keywords are followed by the definition of the output sorting; (e) the FOR UPDATE OF clause is meaningless to the translation process.

[Figure 2 (annotated SELECT syntax). Recoverable labels: Answer Properties; Question Entities; Answer Condition; Answer Organization; Don't Interest.]

Fig. 2. Syntax of the SELECT instruction (SQL language).

The WHERE component is the most important for our objectives. It may be necessary to use the HAVING component instead of the WHERE component when the logical expression calls SQL functions.

5 Translation

The LIL-SQL translator [10] is based on several mapping tables, which are highly dependent on the database organization. This process is very efficient and, more importantly, it can be used with any relational database: only the contents of the conceptual model need to be updated. The question “which hotels do have sauna?” is translated into the LIL expression hotel(X), have(X,sauna), which is then transformed into SQL, for example:

SELECT HOTEL.NAME FROM HOTEL WHERE HOTEL.QT_SAUNA>0

The answer may be produced using nominal lists of tourism resources, texts, and graphics.

5.1 Auxiliary Data Structure

The data structure that supports the LIL/SQL translation contains information that depends on the implementation of the domain database.

Fig. 3. Relational data model of the expanded conceptual model.


Instead of creating a new data structure, we decided to extend the relational database that stores the conceptual model. The entity-relationship diagram of that structure is presented in Figure 3. The tables and columns used to support translation (shaded) depend on how each concept (already belonging to the conceptual model) is represented in the domain database.

Representing Symbols

The representation of each symbol includes a field that keeps its type. In the simplest case, the type of a symbol corresponds to an SQL type such as INTEGER, DATE, or CHAR. We use the _TYPE_ column of the SYMBOL table to represent the type of each symbol. The meaning of the SYMBOL._TYPE_ column is the following: (a) if the symbol is a property, it denotes the SQL data type; (b) if the symbol is an entity, it denotes whether the symbol is represented by a view or a table; (c) if the symbol is an equivalent symbol, it denotes a format function.

The SYMBOL table also has the _CONDITION_ column, which represents a condition with the following meaning: (a) if SYMBOL._TYPE_ is VIEW, the symbol is a view and the _CONDITION_ value contains the SELECT instruction used to generate the view; (b) in the remaining cases, the _CONDITION_ value is a restriction (an SQL logical condition).

Representing Associations

The requirement that each association’s type be represented is satisfied by the ASSOCIATION._TYPE_ column, which expresses the association’s direction (direct or inverse) and the side of N-ary associations having cardinality N: (a) the character ‘D’ in the ASSOCIATION._TYPE_ column denotes a direct association; (b) the character ‘I’ denotes an inverse association. These associations are represented in the conceptual model only to extend the natural language vocabulary available to formulate questions.

When several associations have the same arguments, only the association defined first is represented in the ASSOCIATION._ASSOC_ column. These associations, called base associations, have type ‘B’ (column ASSOCIATION._TYPE_) instead of ‘D’. All associations with the same arguments, direct or inverse, are considered equivalent and can be converted into the corresponding base association (the arguments of associations in the inverse direction are swapped). When there are multiple associations with distinct meanings, one of the entities has to be defined as equivalent in the EQ_SYMBOL table [7].

Representing Default Attributes and Default Entities

Questions that have a set of entities as their reply are common, for example “which hotels are located in Lisbon?”. This question cannot be directly translated into a SELECT instruction, because the hotel property that should be part of the reply has to be determined first. This corresponds to determining the relevant column in the HOTEL table. The DEFAULT table (a detail of the SYMBOL table) helps the translator determine the properties implicit in a question. This table records which property (SYMBOL2 column) represents an entity (SYMBOL1 column) by default. Default entities and default properties are represented in the same way: column DEFAULT.SYMBOL2 contains the default entity associated with the property represented in column DEFAULT.SYMBOL1.
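The association and default conventions described above can be illustrated with a small in-memory sketch. It is not the paper's schema: the `Association` record, its `base` field (standing in for ASSOCIATION._ASSOC_), the `own` and `belong_to` associations, and the `DEFAULTS` dictionary are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    name: str  # association symbol, e.g. "have"
    kind: str  # 'B' (base), 'D' (direct) or 'I' (inverse), as in ASSOCIATION._TYPE_
    base: str  # assumed stand-in for ASSOCIATION._ASSOC_: the base association


# Hypothetical rows: "have" is the base association, "own" a direct
# equivalent, and "belong_to" the same association in the inverse direction.
ASSOCIATIONS = {
    "have": Association("have", "B", "have"),
    "own": Association("own", "D", "have"),
    "belong_to": Association("belong_to", "I", "have"),
}

# Hypothetical DEFAULT rows: SYMBOL1 -> SYMBOL2.
DEFAULTS = {"hotel": "name", "name": "hotel"}


def to_base(assoc, arg1, arg2):
    """Convert any equivalent association into its base association.

    Arguments of an inverse association are swapped, as the text describes.
    """
    row = ASSOCIATIONS[assoc]
    if row.kind == "I":
        arg1, arg2 = arg2, arg1
    return (row.base, arg1, arg2)


def default_of(symbol):
    """Return the default property of an entity (or default entity of a property)."""
    return DEFAULTS[symbol]
```

With these rows, `to_base("belong_to", "sauna", "hotel")` and `to_base("own", "hotel", "sauna")` both reduce to the base fact `("have", "hotel", "sauna")`, so only the base association needs a translation entry.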


Representing Values

We represent in the conceptual model values that belong to the domain database. This is done to minimize the mismatch between the domain database and the conceptual model. Ideally, when the definition of the domain database is generated from the conceptual model, the data necessary to support the translation contains only meta-information about the domain database (mainly definitions of database keys). Whenever it is necessary to adapt the NLIDB to an existing domain, we also have to represent some data belonging to the domain database, such as “codes” and “abbreviations”.

The relational conceptual model was enlarged with the OCCURRENCE table, a detail of the SYMBOL table. The OCCURRENCE.OC_ENTITY column refers to an entity, the OCCURRENCE.OC_ATTRIBUTE column refers to a relevant attribute, the OCCURRENCE.CODE column contains one code or abbreviation valid for that attribute, and the OCCURRENCE.VALUE column contains the represented data. For example, the fact “red vehicle” is represented in the database as ‘R’ if the OCCURRENCE.OC_ENTITY column refers to the ‘vehicle’ entity, the OCCURRENCE.OC_ATTRIBUTE column refers to the ‘color’ property, the OCCURRENCE.CODE column holds the code ‘R’, and the OCCURRENCE.VALUE column contains the string ‘red’.

Representing Proper Names

The representation of proper names is similar to the representation of values. An entity’s proper name is stored in the OCCURRENCE table. To guide the search at execution time, at least one proper name for each entity must be included in this table. Representing proper names to support translation is only feasible in domains with few proper names. Otherwise, it is necessary to search the domain database at execution time to find the table and column to which the proper name corresponds. To translate the question “which are the books by José Saramago?” one has to determine that ‘José Saramago’ is the proper name of a writer. To find which table and column contain that proper name, it is necessary to search the OCCURRENCE.VALUE column. The result of this search is the pair formed by the OC_ATTRIBUTE and OC_ENTITY columns of the OCCURRENCE table.

Representing Keys

A key is a set of columns that uniquely identifies a row of a table in the relational model. When the key belongs to the table, we say that it is the primary key of that table. When a table contains the primary key of another table, we say it is a foreign key. A foreign key is a way of relating tables. The representation of primary and foreign keys is essential to determine the translation of associations. Primary keys are represented in the KEY_PK table, and foreign keys in the KEY_FK table.

5.2 Auxiliary Functions

The LIL/SQL translation needs to know how each concept is represented in the domain database. To help achieve that goal, we defined five functions that always return a character string.

Function TRANSLATION(X,Y,Z)

All the arguments are symbols. The first argument refers to an association, the second to an entity, and the third to an entity, a property, a property value, or a proper name. Alternatively, the ‘_’ character may be used in any position to express the concept “any symbol”. This function returns the translation of the fact (part of the question) specified in the arguments.
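As a minimal sketch of the TRANSLATION contract, the function can be imitated with a flat lookup table in which ‘_’ matches any symbol. The real function derives its result from the conceptual model tables; the `MAPPINGS` list below is a hypothetical stand-in, populated only with the facts used in the paper's hotel examples.

```python
WILDCARD = "_"

# Hypothetical mapping rows: (association, entity, symbol) -> SQL condition.
# An empty string means the fact holds structurally (the table simply has
# that column) and adds no condition.
MAPPINGS = [
    (("have", "hotel", "sauna"), "HOTEL.QT_SAUNA>0"),
    (("have", "hotel", "pool"), "HOTEL.POOL='S'"),
    (("have", "hotel", "name"), ""),
]


def translation(assoc, entity, symbol):
    """Return the SQL fragment for a fact; '_' in any position matches any symbol."""
    query = (assoc, entity, symbol)
    for key, sql in MAPPINGS:
        if all(q == WILDCARD or q == k for q, k in zip(query, key)):
            return sql  # first matching row wins (a choice made for this sketch)
    raise KeyError(f"no translation for {query}")
```

For example, `translation("_", "hotel", "pool")` matches the second row regardless of the association used in the question.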


Function CLASS(X)

Using the column CLASSIFICATION.DESG_CLASS, this function reports whether the argument, a symbol, is represented as a table or as a column in the domain database.¹

Function DEFAULT(X)

This function returns the default property or default entity associated with the argument, an entity or a property: if the argument is a property it returns the default entity, and if the argument is an entity it returns the default property. The returned symbol is the one stored in column DEFAULT.SYMBOL2 for the row whose DEFAULT.SYMBOL1 column holds the argument.

Function EQUIVALENT(X)

The argument can be either a property or an entity, and the return value is the expression that was used to define the argument as an equivalent symbol of another symbol.² If there is an equivalent symbol, the function returns the value (the name of the equivalent function) stored in the CLASSIFICATION.DESG_CLASS column and the arguments stored in the EQ_SYMBOL table. Otherwise it returns the argument itself.

Function FORMATTER(X)

This function provides a function to correctly display the reply: the argument is a property, and the return value is the function name stored in the SYMBOL._TYPE_ column.

5.3 The Translation Algorithm

The translation of LIL into SQL rests on the assumption that the final SQL expression can be obtained by appending the partial translations of each formula belonging to a LIL expression. LIL formulas with a variable argument identify properties, columns, or tables containing values the reply must exhibit. The translation of these formulas defines the SELECT or FROM clauses, while the remaining formulas define the FROM or WHERE clauses, i.e., the conditions the reply must satisfy.

After translating all the formulas belonging to a LIL expression, it is necessary to verify whether the FROM clause contains all the tables referenced in the SELECT and WHERE clauses. When a table is found to be missing, a comma and the missing table are appended to the FROM clause. A similar operation must also be performed to identify tables whose columns are not referenced in the SELECT clause.

The translation process uses eight rules: one to substitute variables by their types; two to translate primitive formulas; one to translate formulas that have logical connectives; one to handle predefined formulas; two to guarantee that all the referenced tables and columns belong to the FROM and SELECT clauses; and one to handle formatting.

Rule 1: Variable Substitution

During the translation process it is assumed that variables are replaced by their types, i.e., the names of the classes they belong to. The LIL expression pred1(X1), pred2(X1,term2), after applying the substitution X1/pred1 (assuming pred1 is the type of the variable X1), is transformed into pred2(pred1,term2). When the substitution takes place we will refer to the type of X1 as X1cte.
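Rule 1 can be sketched as follows. The representation is an assumption made for illustration: a formula is a `(predicate, arguments)` pair and variables are wrapped in a hypothetical `Var` class, since the paper does not fix a concrete data structure. The sketch substitutes variables only inside binary formulas and leaves the unary typing formulas untouched, because Rule 2 still needs to know that their argument was a variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A LIL variable such as X or _3 (hypothetical wrapper type)."""
    name: str


def variable_types(formulas):
    """Map each variable name to the predicate of the unary formula that types it."""
    types = {}
    for pred, args in formulas:
        if len(args) == 1 and isinstance(args[0], Var):
            types.setdefault(args[0].name, pred)
    return types


def substitute(formulas):
    """Apply Rule 1: replace typed variables in binary formulas by their types."""
    types = variable_types(formulas)
    result = []
    for pred, args in formulas:
        if len(args) == 2:
            args = tuple(
                types.get(a.name, a) if isinstance(a, Var) else a for a in args
            )
        result.append((pred, args))
    return result


# hotel(X), have(X,sauna)  ->  hotel(X), have(hotel,sauna)
expression = [("hotel", (Var("X"),)), ("have", (Var("X"), "sauna"))]
substituted = substitute(expression)
```

A variable with no typing formula is left as it is, which mirrors the “try to apply rule 1” wording of the algorithm.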

¹ This description is not complete, but a full description can be found in [6].

² The symbols’ classification is stored in the column CLASSIFICATION.DESG_CLASS. If it is a function, the table EQ_SYMBOL contains a row referring to each argument.


Rule 2: Unary Primitive Formulas

The translation of a primitive formula with one argument (prd(trm)) identifies a table or an expression that involves columns. There are two possibilities, depending on the value returned by CLASS(prd):

(a) If the returned value is ‘Table’, then the value returned by EQUIVALENT(prd) is concatenated, with a comma, to the FROM clause. If trm is a constant (a proper name), then the value returned by TRANSLATION(_,prd,trm) is concatenated, with the ‘AND’ operator, to the WHERE clause.

(b) If the returned value is ‘Column’, then:
    If the LIL expression contains a formula of the type pred2(trm2,prd):
        If trm is a constant (value of a property), then the value returned by TRANSLATION(pred2,trm2cte,prd(trm)) is concatenated, with the ‘AND’ operator, to the WHERE clause.
        If trm is a variable, then:
            if the value returned by EQUIVALENT(prd) contains a function symbol, that value is concatenated, with a comma, to the SELECT clause;
            otherwise, EQUIVALENT(trm2cte).EQUIVALENT(prd) is evaluated and its value is concatenated, with a comma, to the SELECT clause.
    If the LIL expression does not contain a formula of the type pred2(trm2,prd):
        if the value returned by EQUIVALENT(prd) contains a function symbol, that value is concatenated, with a comma, to the SELECT clause;
        otherwise, the value returned by DEFAULT(prd).EQUIVALENT(prd) is concatenated, with a comma, to the SELECT clause.

Rule 3: Binary Primitive Formulas

The translation of a primitive formula with two arguments (prd(trm1,trm2)) returns an expression, the evaluation of TRANSLATION(prd,trm1cte,trm2cte), that must be concatenated, using the ‘AND’ operator, to the WHERE clause.

Rule 4: Logical Connectives

The translation of a formula using the logical connective and, &(prd1(trm2,trm3),…,prda(trmb,trmc)), returns an expression, the evaluation of TRANSLATION(prd1cte,trm2cte,trm3cte) AND … AND TRANSLATION(prdacte,trmbcte,trmccte), that must be concatenated, using the ‘AND’ operator, to the WHERE clause.

The translation of a formula using the logical connective or, V(prd1(trm2,trm3),…,prda(trmb,trmc)), returns an expression, the evaluation of (TRANSLATION(prd1cte,trm2cte,trm3cte) OR … OR TRANSLATION(prdacte,trmbcte,trmccte)), that must be concatenated, using the ‘AND’ operator, to the WHERE clause.
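Rules 3 and 4 both reduce to building a WHERE fragment and appending it with AND. The sketch below assumes variables have already been substituted by their types (Rule 1) and uses a hypothetical `CONDITIONS` dictionary in place of the TRANSLATION function; the `QT_RESTAURANT` column is invented for illustration.

```python
# Hypothetical stand-in for TRANSLATION(prd, trm1cte, trm2cte).
CONDITIONS = {
    ("have", "hotel", "pool"): "HOTEL.POOL='S'",
    ("have", "hotel", "sauna"): "HOTEL.QT_SAUNA>0",
    ("have", "hotel", "restaurant"): "HOTEL.QT_RESTAURANT>0",  # invented column
}


def translation(pred, arg1, arg2):
    return CONDITIONS[(pred, arg1, arg2)]


def translate_connective(connective, formulas):
    """Rule 4: join the translations of the sub-formulas with AND or OR.

    Following the rule text, only the OR form is wrapped in parentheses.
    """
    parts = [translation(pred, a, b) for pred, (a, b) in formulas]
    if connective == "&":
        return " AND ".join(parts)
    if connective == "V":
        return "(" + " OR ".join(parts) + ")"
    raise ValueError(f"unknown connective: {connective!r}")


def append_where(where, condition):
    """Rules 3 and 4: concatenate a condition to the WHERE clause with AND."""
    if not condition:
        return where  # an empty translation leaves the clause unchanged
    return condition if not where else f"{where} AND {condition}"


pool_or_sauna = translate_connective(
    "V", [("have", ("hotel", "pool")), ("have", ("hotel", "sauna"))]
)
```

Parenthesising the OR group matters as soon as another condition is appended, because AND binds more tightly than OR in SQL.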
Rule 5: Predefined Formulas

The translation of a predefined formula with the EXACT predicate (EXACT(trm,cte)) concatenates, with the ‘AND’ operator, to the WHERE clause the value returned after evaluating an expression. The expression depends on the LIL expression: (a) if it contains a formula of the type pred2(trm2,trm), the evaluated expression is EQUIVALENT(trm2cte).EQUIVALENT(trmcte) = cte; (b) otherwise, the expression DEFAULT(trmcte).EQUIVALENT(trmcte) = cte is evaluated.

Formulas with the SUP predicate (SUP(trm,cte)) and the INF predicate (INF(trm,cte)) are translated in the same way, with the comparison operator changed:

SUP: (a) EQUIVALENT(trm2cte).EQUIVALENT(trmcte) > cte; (b) DEFAULT(trmcte).EQUIVALENT(trmcte) > cte.
INF: (a) EQUIVALENT(trm2cte).EQUIVALENT(trmcte) < cte; (b) DEFAULT(trmcte).EQUIVALENT(trmcte) < cte.

Rule 6: Missing Tables

If the FROM clause does not contain all the tables referenced in the SELECT and WHERE clauses, then all the missing tables are appended, using commas, to the FROM clause.

Rule 7: Missing Columns

This rule searches for the tables that are referenced in the LIL expression with a variable argument and that do not have any of their columns included in the SELECT clause. Every table found (named T) is processed as follows: the value returned after evaluating EQUIVALENT(T).DEFAULT(T) is concatenated, with a comma, to the SELECT clause.

Rule 8: Optional Formatting

This rule applies the FORMATTER function to each column of the SELECT clause, or to the values returned by the equivalence functions, obtaining the final format for the properties.

Algorithm

    Start with an empty SELECT SQL query
    With each LIL formula do
    begin
        Try to apply rule 1
        Select the appropriate rule from the set {2,3,4,5}
        Apply the selected rule
    end
    Try to apply rule 6 and rule 7
    Apply rule 8
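The algorithm above can be sketched end to end for the simple cases. This is a simplified illustration, not the Edite implementation: the dictionaries stand in for the CLASS, EQUIVALENT, DEFAULT, and TRANSLATION functions, formulas are plain tuples, variables are assumed to be single uppercase letters, equivalent functions and proper-name constants are not handled, and Rule 8 (formatting) is omitted.

```python
import re

# Hypothetical conceptual-model data covering the hotel examples.
CLASS = {"hotel": "Table", "name": "Column", "star": "Column"}
EQUIVALENT = {"hotel": "HOTEL", "name": "NAME", "star": "STAR"}
DEFAULT = {"hotel": "name", "name": "hotel", "star": "hotel"}
TRANSLATION = {
    ("have", "hotel", "name"): "",
    ("have", "hotel", "star"): "",
    ("have", "hotel", "pool"): "HOTEL.POOL='S'",
    ("have", "hotel", "sauna"): "HOTEL.QT_SAUNA>0",
}
OPERATORS = {"EXACT": "=", "SUP": ">", "INF": "<"}


def is_var(term):
    # Assumption of this sketch: variables are single uppercase letters.
    return isinstance(term, str) and len(term) == 1 and term.isupper()


def translate(formulas):
    select, tables, where = [], [], []
    # Rule 1: a unary formula pred(VAR) gives VAR the type pred.
    types = {f[1]: f[0] for f in formulas if len(f) == 2 and is_var(f[1])}

    def owner(prop):
        # Entity trm2cte of a formula pred2(trm2, prop), if one exists.
        for f in formulas:
            if len(f) == 3 and f[0] not in OPERATORS and f[2] == prop:
                return types.get(f[1], f[1])
        return None

    def cond(pred, a, b):
        return TRANSLATION[(pred, types.get(a, a), types.get(b, b))]

    for f in formulas:
        head = f[0]
        if head in ("&", "V"):                      # Rule 4
            parts = [p for p in (cond(*sub) for sub in f[1]) if p]
            where.append(" AND ".join(parts) if head == "&"
                         else "(" + " OR ".join(parts) + ")")
        elif head in OPERATORS:                     # Rule 5
            prop, value = f[1], f[2]
            entity = owner(prop) or DEFAULT[prop]
            where.append(f"{EQUIVALENT[entity]}.{EQUIVALENT[prop]}"
                         f"{OPERATORS[head]}{value}")
        elif len(f) == 3:                           # Rule 3
            where.append(cond(*f))
        elif CLASS[head] == "Table":                # Rule 2(a)
            tables.append(EQUIVALENT[head])
        else:                                       # Rule 2(b), variable argument
            entity = owner(head) or DEFAULT[head]
            select.append(f"{EQUIVALENT[entity]}.{EQUIVALENT[head]}")

    # Rule 6: every table referenced in SELECT or WHERE must be in FROM.
    referenced = [col.split(".")[0] for col in select]
    for c in where:
        referenced += re.findall(r"\b([A-Z_][A-Z0-9_]*)\.", c)
    for t in referenced:
        if t not in tables:
            tables.append(t)

    # Rule 7: a table typed by a variable with no selected column
    # contributes its default column.
    for f in formulas:
        if len(f) == 2 and is_var(f[1]) and CLASS.get(f[0]) == "Table":
            table = EQUIVALENT[f[0]]
            if not any(col.startswith(table + ".") for col in select):
                select.append(f"{table}.{EQUIVALENT[DEFAULT[f[0]]]}")

    sql = f"SELECT {', '.join(select)} FROM {', '.join(tables)}"
    conditions = [c for c in where if c]
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


EXAMPLE_1 = [("name", "X"), ("hotel", "Y"), ("have", "Y", "name"),
             ("have", "Y", "star"), ("EXACT", "star", 5)]
EXAMPLE_2 = [("hotel", "X"),
             ("V", [("have", "X", "pool"), ("have", "X", "sauna")])]
```

`translate(EXAMPLE_1)` yields `SELECT HOTEL.NAME FROM HOTEL WHERE HOTEL.STAR=5`. `translate(EXAMPLE_2)` yields `SELECT HOTEL.NAME FROM HOTEL WHERE (HOTEL.POOL='S' OR HOTEL.QT_SAUNA>0)`, with the OR group parenthesised as Rule 4 specifies.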

6 Examples

To demonstrate the application of the LIL/SQL translation algorithm we present two annotated examples. A question mark is used to represent an empty SQL clause, so the initial SELECT query is SELECT ? FROM ?.

“What are the names of the hotels having five stars?”

LIL expression: name(X), hotel(Y), have(Y,name), have(Y,star), EXACT(star,5)

Initial SQL query: SELECT ? FROM ?

Translation of name(X), using Rule 2: since the call to function CLASS(name) returns ‘Column’, and the LIL expression also contains have(Y,name), the substitution rule produces have(hotel,name). The expression EQUIVALENT(hotel).EQUIVALENT(name) is evaluated and its value (HOTEL.NAME) is appended to the SELECT clause. The new SQL query is:

SELECT HOTEL.NAME FROM ?

Translation of hotel(Y), using Rule 2: since the call to function CLASS(hotel) returns ‘Table’, the expression EQUIVALENT(hotel) is evaluated and its value (HOTEL) is appended to the FROM clause. The new SQL query is:

SELECT HOTEL.NAME FROM HOTEL

Translation of have(Y,name), using Rule 3: since the LIL expression also contains hotel(Y), the substitution rule produces have(hotel,name). The expression TRANSLATION(have,hotel,name) is evaluated and its value (an empty string, assuming that table HOTEL has the NAME column) is appended to the WHERE clause, so the SQL query remains unchanged.

Translation of have(Y,star), using Rule 3: since the LIL expression also contains hotel(Y), the substitution rule produces have(hotel,star). The expression TRANSLATION(have,hotel,star) is evaluated and its value (an empty string, assuming that table HOTEL has the STAR column) is appended to the WHERE clause, so the SQL query remains unchanged.

Translation of EXACT(star,5), using Rule 5: since the LIL expression also contains have(Y,star), the substitution rule produces have(hotel,star). The expression EQUIVALENT(hotel).EQUIVALENT(star)=5 is evaluated and its value (HOTEL.STAR=5) is appended to the WHERE clause. The final SQL query is:

SELECT HOTEL.NAME FROM HOTEL WHERE HOTEL.STAR=5

The result of evaluating FORMATTER(NAME), in this case CHAR(60), is the information used to format the answer.

“Which hotels have swimming-pool or sauna?”

We assume that the “hotel” entity is represented by the HOTEL table, which has the QT_SAUNA column to store the number of saunas. The fact “hotel has swimming-pool” is represented by the keyword ‘S’ in the POOL column.

LIL expression:

hotel(X), V(have(X,pool), have(X,sauna))

Initial SQL query:

SELECT ? FROM ?

Translation of hotel(X), using Rule 2: since the call to the function CLASS(hotel) returns ‘Table’, the expression EQUIVALENT(hotel) is evaluated and its value (HOTEL) is appended to the FROM clause. The new SQL query is:

SELECT ? FROM HOTEL

Translation of V(have(X,pool),have(X,sauna)), using Rule 4: since the LIL expression also contains hotel(X), the substitution rule produces:

have(hotel,pool) → TRANSLATION(HAVE, HOTEL, POOL) → HOTEL.POOL='S'
have(hotel,sauna) → TRANSLATION(HAVE, HOTEL, SAUNA) → HOTEL.QT_SAUNA>0

The obtained values are joined with the OR keyword and appended to the WHERE clause. The new SQL query is:

SELECT ? FROM HOTEL WHERE HOTEL.POOL='S' OR HOTEL.QT_SAUNA > 0

Determination of the omitted column, using Rule 7: the expression EQUIVALENT(HOTEL).DEFAULT(HOTEL) is evaluated and its value (HOTEL.NAME) is appended to the SELECT clause. The final SQL query is:

SELECT HOTEL.NAME FROM HOTEL WHERE HOTEL.POOL='S' OR HOTEL.QT_SAUNA > 0
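As a quick sanity check, the two final queries of this section can be run against a toy HOTEL table using Python's standard sqlite3 module. The rows, the hotel names, and the ‘N’ code for “no pool” are invented for illustration; only the column names and the ‘S’ code come from the examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HOTEL ("
    "NAME CHAR(60), STAR INTEGER, POOL CHAR(1), QT_SAUNA INTEGER)"
)
# Invented sample rows: (name, stars, pool code, number of saunas).
conn.executemany(
    "INSERT INTO HOTEL VALUES (?, ?, ?, ?)",
    [
        ("Hotel A", 5, "S", 0),
        ("Hotel B", 3, "N", 2),
        ("Hotel C", 5, "N", 0),
    ],
)


def run(sql):
    """Execute a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]


five_star = run("SELECT HOTEL.NAME FROM HOTEL WHERE HOTEL.STAR=5")
pool_or_sauna = run(
    "SELECT HOTEL.NAME FROM HOTEL "
    "WHERE HOTEL.POOL='S' OR HOTEL.QT_SAUNA > 0"
)
```

With these rows, the first query selects Hotel A and Hotel C, and the second selects Hotel A (pool) and Hotel B (sauna).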

The result of evaluating FORMATTER(NAME), in this case CHAR(60), is the information used to format the answer.

7 Conclusion

The algorithm presented in this paper was implemented and is a module of Edite, a multilingual (Portuguese, French, English, and Spanish) natural language front-end for relational databases. Edite answers written questions about tourism resources by transforming them into SQL queries. The answers depend on the type of question: they can be nominal lists of resources, text, images, or graphics. Currently, the database contains 53000 tourism resources, organized as 253 distinct types, corresponding to 209 database tables.

The main goal of Edite, as an NLIDB, is to give users the capability of obtaining information stored in a database [4]. The user is not required to learn an artificial communication language and can formulate questions in his or her own native language. Our solution has the advantage of being database independent [8].

References

1. Allen, J. 1995. “Natural Language Understanding”. The Benjamin/Cummings Publishing Company, Inc.

2. Androutsopoulos, I., Ritchie, G., Thanisch, P. 1993. “An Efficient and Portable Natural Language Query Interface for Relational Databases”. Proceedings of the 6th International Conference on Industrial & Engineering Applications of Artificial Intelligence and Expert Systems, Edinburgh, U.K., pages 327-330. Gordon and Breach Publishers Inc., Langhorne, PA, U.S.A.

3. Androutsopoulos, I. 1993. “Interfacing a Natural Language Front-End to a Relational Database (MSc thesis)”. Technical paper 11, Dept. of AI, Univ. of Edinburgh.

4. Androutsopoulos, I. 1994. “Natural Language Interfaces - An Introduction”. Journal of Natural Language Engineering, Cambridge University Press.

5. Cohen, P.R. 1991. “The Role of Natural Language in a Multimodal Interface”. Technical Note 514, Computer Dialogue Laboratory, SRI International.

6. Filipe, P. 1999. “Sistema de Interrogações em Língua Natural para Bases de Dados: Modelo Conceptual, Aquisição de Vocabulário e Tradução”. M.Sc. Dissertation. Instituto Superior Técnico, Universidade Técnica de Lisboa.

7. Filipe, P., Mamede, N. 1999. “Aquisição de Vocabulário num Sistema de Interrogações em Língua Natural para Bases de Dados”. Actas do IV Encontro para o Processamento Computacional da Língua Portuguesa Escrita e Falada, Évora.

8. Grosz, B. J., Appelt, D. E., Martin, P. A., Pereira, C. N. 1987. “TEAM: An Experiment in the Design of Transportable Natural-Language Interfaces”. Artificial Intelligence 32, pages 173-243. Elsevier Science Publishers B.V. (North-Holland).

9. Marques, L. 1996. “Edite - Um Sistema de Acesso a Base de Dados em Língua natural Análise Morfológica, Sintáctica e Semântica”. M.Sc. Dissertation. Instituto Superior Técnico, Universidade Técnica de Lisboa.

10. Reis, P., Mamede, N. 1996. “LIL-SQL. Processamento de Interrogações LIL por Tradução para SQL”. Technical Report. Grupo de Sistemas e Serviços Telemáticos, INESC.

11. Reis, P., Mamede, N., Matias, J. 1997. “Edite – A Natural Language Interface to Databases: a New Dimension for an Old Approach”. Proceedings of the Fourth International Conference on Information and Communication Technology in Tourism, ENTER’97, Edinburgh, Scotland.