International Journal of Advancements in Technology (IJoAT)

http://ijict.org/

ISSN 0976-4860

An Approach for Solving of Natural Language Queries and Transliteration using Multi-Agent System

Nilesh M. Shelke, Rajiv Dharaskar
RTM Nagpur University, Nagpur

Vilas Thakre
Sant Gadge Baba Amravati University, Amravati

Abstract
Since the invention of the computer, the research community has tried to narrow the communication gap between computers and humans, and continuous, consistent efforts have been made to develop Natural Language Interfaces to Databases (NLIDBs). A general user is not an expert in SQL but knows everyday English or a native language. The majority of school-going children pursue their education in regional languages, among which Hindi is the most prominent. Schools are being provided with computer education facilities and internet connectivity so that the vast educational resources already available, and those still to be developed by the schools themselves, can be shared among them. In the current research work an NLIDB has been developed in which a question is asked in simple, everyday human language such as English, and the answer is given in the same language and in Hindi. An innovative architecture has been developed that makes use of software agents, one of today's most active technologies.

Keywords: NLP, NLI, SQL, Agents

1. Introduction
Continuous efforts are being made by the research community to advance the field of Natural Language Processing (NLP), and natural language interfaces have long been an active area of research. Asking questions of a database in natural language is a more user-friendly way of searching it than writing a query in the restricted syntax of SQL.
Although the questions and vocabulary that a particular natural language interface accepts are limited in some way, users are more comfortable writing questions in a natural fashion than learning the keywords and syntax of SQL. A number of researchers have developed NLIDBs. This paper presents an interface module that converts a user's query given in natural language into the corresponding SQL command and gives the answer in a local language such as Hindi. Typically, natural language database interfaces (NLDIs) use grammatical and/or statistical parsing. The application of intelligent agents to the design of information retrieval systems has drawn some attention in recent years. To the best of our knowledge, no such system based on agent technologies has so far been developed for Indian languages [7], and this paper explores research work in this direction.

A software agent is an intelligent program that acts as a user's personal assistant. Software agents endowed with the property of mobility are called mobile agents.

Vol 1, No 1 (June 2010) © IJoAT

Mobile agents perform a user's task by migrating to, and executing on, several hosts connected to the network [2,3,4]. In computer science, a mobile agent is a composition of software and data that is able to migrate (move) from one computer to another autonomously and continue its execution on the destination computer. Mobility is thus the defining property of such agents: an agent that can migrate from one machine to another is called a mobile agent [5], and agents that cannot move are called stationary agents. Software agents are expected to play an important role in the information technology world in solving complex real-world problems, and a number of related works have been done, mainly in the research fields of artificial intelligence, software engineering, databases and network computing. Agents behave intelligently, autonomously, cooperatively and socially, much as humans do, to solve problems or to support human users. The goal of this research work is to analyze the utility of agents in a real-time environment. Perhaps the most general way in which the term agent is used is to denote a hardware or (more usually) software-based computer system that enjoys the following properties [1]:

• Autonomy: agents operate without the direct intervention of humans or others, and have some kind of control over their actions and internal state;
• Social ability: agents interact with other agents and (possibly) humans via some kind of agent communication language;
• Reactivity: agents perceive their environment (which may be the physical world, a user via a graphical user interface, a collection of other agents, the Internet, or perhaps all of these combined) and respond in a timely fashion to changes that occur in it. An agent may therefore spend most of its time in a kind of sleep state from which it wakes when certain changes in its environment (such as the arrival of new e-mail) call for it;
• Proactivity: agents do not simply act in response to their environment; they are able to exhibit goal-directed behaviour by taking the initiative;
• Temporal continuity: agents are continuously running processes (either active in the foreground or sleeping/passive in the background), not once-only computations or scripts that map a single input to a single output and then terminate;
• Goal orientedness: an agent is capable of handling complex, high-level tasks. The agent itself should decide how such a task is best split into smaller sub-tasks, and in which order and in which way these sub-tasks are best performed;
• Benevolence: the assumption that agents do not have conflicting goals, and that every agent will therefore always try to do what is asked of it;
• Rationality: (crudely) the assumption that an agent will act in order to achieve its goals and will not act in such a way as to prevent its goals being achieved, at least insofar as its beliefs permit;
• Adaptivity: an agent should be able to adjust itself to the habits, working methods and preferences of its user;
• Collaboration: an agent should not unthinkingly accept (and execute) instructions, but should take into account that the human user makes mistakes (e.g. gives an order that contains conflicting goals), omits important information and/or provides ambiguous information. For instance, an agent should check things by asking the user questions, or use a built-up user model to solve such problems. An agent should even be allowed to refuse to execute certain tasks, for instance because they would put an unacceptably high load on network resources or would cause damage to other users.

2. Related Work
To empower the general public through access to information and knowledge, organized efforts must be made to develop relevant content in local languages and to provide local-language capabilities in utility software. There are several artificial languages for manipulating the data in a database, but using them requires knowledge of the database structure, the language syntax, and so on. One area of research on query interfaces focuses on improving usability. In an NLIDB, a user can formulate query expressions in a natural language, and the answer is presented in the same language. The history of NLIDBs goes back to the 1960s, and research activity peaked in the 1980s. At that time, the development of a domain- and language-independent NLIDB module seemed a realistic task, but the prototype projects showed that building a natural language interface is a much more complex task than had been expected.

3. Agent Architecture
An agent architecture has been selected for implementing the above work. It offers:
• a framework for integrating a community of software agents in a distributed environment;
• flexible, adaptable interactions among distributed components through the delegation of tasks, data requests and triggers;
• natural, mobile, multimodal user interfaces to distributed services.

3.1 Software Agents vs. JINI, KQML, CORBA, e-speak etc.
Software agents were found to be more advantageous for the above implementation. Distributed object technologies such as OMG's CORBA, Microsoft's DCOM, Sun's JINI and HP's e-speak, and distributed agent approaches such as KQML/KIF and FIPA's ACL, all share common properties:
• The architecture provides an object repository containing interface specifications for the available objects. In CORBA it is called the ORB, in DCOM the Object Repository, and in JINI and e-speak a lookup service.
• When an object requires the service of another, it queries the repository to find an object by a specified name, id, attribute or service, and then interacts with that object under the control of its own code. The requesting object decides which objects it will interact with and how the interactions will occur, and is thus responsible for choosing, monitoring and maintaining the interaction session.
From the standpoint of flexibility, modularity and a dynamic community, these approaches have serious problems: since all interaction code resides in the objects themselves, the individual entities must be rewired to take account of interactions with new components whenever new objects are added to the community.


3.2 Software Agents Work on the Principle of Delegated Computing

Fig. 1: Delegated Computation





A key distinguishing feature of the software agent architecture is its delegated computing model, which enables both human users and software agents to express their requests in terms of what is to be done, without specifying who is to do the work or how it should be performed. In the software agent architecture, control of how interaction and communication occur among agents is the product of cooperation between four distinct knowledge sources:
1. The requester, which specifies a goal to the Facilitator and provides advice on how it should be met.
2. The providers, which register their capabilities with the Facilitator and know what services they can provide and the limits on their ability to do so.
3. The Facilitator, which maintains a list of available provider agents and a set of general strategies for meeting goals.
4. Meta-agents, which contain domain- or goal-specific knowledge and strategies that the Facilitator uses as an aid; this knowledge is employed to foster cooperation among a set of agents.
The Facilitator matches a request to the agent or agents providing that service, delegates the task to them, coordinates their efforts, and delivers the results to the requester.

Thus, in this architecture, any new agent becomes usable by the other agents as soon as it registers with the Facilitator agent. Based on this architecture, we have developed the architecture of our experimental setup, discussed in Section 4.
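The delegated model of Section 3.2 can be illustrated with a minimal Python sketch. It is not the paper's implementation: the Facilitator class, its register/request methods and the string-in, string-out agents are all hypothetical, and meta-agents and requester advice are left out. What it shows is that a requester names a capability rather than a provider, so an agent registered later becomes usable without changing any other agent.

```python
from typing import Callable, Dict, List

Agent = Callable[[str], str]  # hypothetical: an agent maps a text payload to a text result


class Facilitator:
    """Hypothetical facilitator: providers register capabilities, requesters state goals."""

    def __init__(self) -> None:
        self._registry: Dict[str, List[Agent]] = {}

    def register(self, capability: str, provider: Agent) -> None:
        # A provider advertises a capability; no requester has to be rewired.
        self._registry.setdefault(capability, []).append(provider)

    def request(self, capability: str, payload: str) -> List[str]:
        # The requester names what it wants done; the facilitator chooses who does it.
        providers = self._registry.get(capability, [])
        if not providers:
            raise LookupError(f"no registered agent provides {capability!r}")
        # Delegate to every matching provider and collect the results.
        return [provider(payload) for provider in providers]


facilitator = Facilitator()
facilitator.register("tokenize", lambda text: "|".join(text.split()))
# An agent registered later is usable at once by any requester.
facilitator.register("count", lambda text: str(len(text.split())))

print(facilitator.request("tokenize", "what are the movies"))  # ['what|are|the|movies']
print(facilitator.request("count", "what are the movies"))     # ['4']
```

Keeping the registry inside the facilitator is what removes the rewiring problem described in Section 3.1: requesters never hold references to providers.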


Fig. 2: Adding the agents on the fly

4. Experimental Setup

Fig. 3: System architecture comprising the Facilitator Agent, User Interface Agent, Conversational Agent, Parser Agent, Database Agent and Converter Agent. Interaction among agents is possible only through the Facilitator Agent.

4.1 User Interface Agent
This agent provides a graphical interface. The main tasks of the User Interface Agent are:
• cooperating with the Facilitator Agent and sending it the accepted information;
• displaying the results to the user;
• displaying any errors found in the query.
It takes a natural language query from the user, such as 'What are the movies of actor "Amitabh Bachchan" in the year 2010?', and shows the results.

4.2 Parser Agent
The main tasks of the Parser Agent are the following.

4.2.1 Tokenization: As a first step in processing a query, the processing tokens have to be determined. One of the simplest approaches to tokenization defines word symbols and inter-word symbols: all characters that are neither letters nor digits are considered inter-word symbols. The inter-word symbols are ignored during this phase, and the remaining sequences of word symbols are the processing tokens. As a result, it is not possible to search for punctuation marks such as hyphens and question marks.

4.2.2 Stop Word Removal: Stop words are words with little meaning that are removed from the query. Words may carry little meaning from a frequency point of view or from a conceptual point of view. Removing stop words for conceptual reasons can be done with a stop list that enumerates all words with little meaning, typically function words such as "the", "it" and "a". A separate list is attached at the end of the paper in Appendix A.

4.2.3 Morphological Normalization: Morphological normalization reduces the words of a query to a common form so that their morphological variants can be matched; this can be done by dictionary lookup. For example, for the query "In what year was the film 'Kaun' released?", dictionary lookup reduces the inflected form "released" to its base form "release".

4.2.4 Synonym Normalization: Synonyms may also be conflated to one processing token during indexing and automatic query formulation. For example, the query 'What are the movies of Amitabh Bachchan?' has the same meaning as 'What are the films of Amitabh Bachchan?', and "stars", "casts" and "actors" are likewise equivalent words.

The first step in our method is thus extracting the keywords from the English sentence, a task driven by information about the database. The keywords we extract are words that refer to table names, field names, operators, etc. Some of the keywords we look for, and their corresponding outputs, are listed in Table 1.

TABLE 1: Sample extracted keywords and their replacements for the SQL query
Keywords | Output after preprocessing
Display, What, List, Retrieve | Select
Employees | Employees
Names | Name
Whose, That, Which, Who | Whose
Greater, Bigger, Older, More | >

Suppose the query given is: What are the movies together with the actor as 'Salman Khan' and the director as 'Sooraj Barjatya'? After the Parser Agent performs the above steps, the query becomes: Select|Films|Actor|'Salman Khan'|Director|'Sooraj Barjatya'
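The four preprocessing steps and the Table 1 replacements can be strung together as in the Python sketch below. The stop list, lemma dictionary, synonym table and films-domain keywords (film, actor, director) are small hypothetical stand-ins; only the Select, Whose and > replacements come from Table 1. Quoted phrases are kept whole as values, a detail the tokenization rule of 4.2.1 does not spell out. On the example query it produces the same token sequence as the Parser Agent output given above.

```python
import re

# Hypothetical resources for illustration; the paper's own stop list is in its Appendix A.
STOP_WORDS = {"a", "an", "the", "it", "is", "are", "as", "and", "with", "together", "of", "in"}
LEMMAS = {"movies": "movie", "films": "film", "actors": "actor", "released": "release"}
SYNONYMS = {"movie": "film", "star": "actor", "cast": "actor"}
KEYWORDS = {
    # Replacements taken from Table 1.
    "display": "Select", "what": "Select", "list": "Select", "retrieve": "Select",
    "whose": "Whose", "that": "Whose", "which": "Whose", "who": "Whose",
    "greater": ">", "bigger": ">", "older": ">", "more": ">",
    # Hypothetical schema names for the films domain.
    "film": "Films", "actor": "Actor", "director": "Director",
}

# A quoted phrase is kept as one value token; otherwise a token is a run of
# letters/digits, so every other character acts as an inter-word symbol.
TOKEN_RE = re.compile(r"'([^']*)'|([A-Za-z0-9]+)")


def preprocess(query: str) -> str:
    out = []
    for match in TOKEN_RE.finditer(query):
        value, word = match.groups()
        if value is not None:                # quoted value: keep verbatim
            out.append(f"'{value}'")
            continue
        word = word.lower()
        if word in STOP_WORDS:               # 4.2.2 stop-word removal
            continue
        word = LEMMAS.get(word, word)        # 4.2.3 morphological normalization
        word = SYNONYMS.get(word, word)      # 4.2.4 synonym normalization
        if word in KEYWORDS:                 # Table 1 keyword replacement
            out.append(KEYWORDS[word])
        # Assumption: any other word is dropped.
    return "|".join(out)


query = ("What are the movies together with the actor as 'Salman Khan' "
         "and the director as 'Sooraj Barjatya'?")
print(preprocess(query))
# Select|Films|Actor|'Salman Khan'|Director|'Sooraj Barjatya'
```

Note how "movies" passes through two stages: the lemma dictionary reduces it to "movie", and the synonym table then maps "movie" to "film" before the schema keyword lookup.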


The query is now passed to the Database Agent.

4.3 Database Agent
Once all the keywords and their relative order are available from preprocessing, this information is fed into the Database Agent, which holds information about the database such as the table names and field names. The Database Agent views the final SQL query as consisting of two components: the Objects component and the optional Conditions component. It determines whether a Conditions component is present by looking for the corresponding indicator, which is Whose. If the indicator exists, the agent splits the input string into two components: the words that occur before the indicator and the words that occur after it. The rationale is that most human queries are structured similarly to the examples in Fig. 3. However, some sentences do not follow this rule; in such cases the system can be set up to iterate back and forth with the user, using pattern matching, to find a similar sentence that matches the required structure.

4.3.1 Objects Component
Regardless of whether a Conditions component is present, the Objects component always exists. The Database Agent iterates through it to find all the fields mentioned in it, and then checks which tables each field is associated with, using its information about the database. If a field is associated with multiple tables, such as name or age, the agent iterates through the tables mentioned in the whole string and eliminates candidates based on which tables appear in it. This yields the first part of the query:
SELECT [tablenames.fieldnames] FROM [tables]

4.3.2 Conditions Component
Once the Objects component has been processed, and if the program determines that a Conditions component exists, it iterates through the Conditions component to find the fields mentioned in it.
For each field found, it takes the operator and the value closest to that field and associates them together. It then follows the same process as for the Objects component to find the tables associated with that field. If there are multiple fields in the Conditions component, it joins them with AND and repeats the process for each. Once it has iterated through all the fields, it collects all the tables associated with the fields and links them using the keys it already knows. With this information it builds the second part of the SQL query in the format:
WHERE [table linkings] AND [tablenames.fieldnames][operators][values]
The substrings are then concatenated to give the entire query.
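The Objects/Conditions procedure of 4.3.1 and 4.3.2 can be sketched in Python as follows. Everything about the schema is assumed: the DEFAULTS, FIELD_OWNER and LINKS tables, the hypothetical key columns Actor.Actorid and Film_actor.Actor_id, and the rule that, when no Whose indicator is present, the first token is the object and the rest are conditions. The helpers get_table_name and default are loose, hypothetical stand-ins for the GETTABLENAME and DEFAULT functions the paper describes later in this section, and "closest operator and value" is simplified to "the next ones in the token stream".

```python
# Hypothetical films schema; column names follow Fig. 5 where it gives them.
DEFAULTS = {"Films": "title", "Actor": "name", "Director": "name"}  # DEFAULT table
FIELD_OWNER = {"title": "Films", "year": "Films"}                  # field -> table
LINKS = {  # extra tables and key predicates that connect each table to Films
    "Films": ([], []),
    "Director": ([], ["Director.Directorid = Films.Director_id"]),
    "Actor": (["Film_actor"], ["Films.Filmid = Film_actor.Film_id",
                               "Actor.Actorid = Film_actor.Actor_id"]),
}
OPERATORS = {">", "<", "="}


def get_table_name(x):
    """Like GETTABLENAME: a field gives 'Table.field', a table gives True."""
    if x in DEFAULTS:
        return True
    field = x.lower()
    return f"{FIELD_OWNER[field]}.{field}"


def default(x):
    """Like DEFAULT: a table gives 'Table.default_attribute'."""
    return f"{x}.{DEFAULTS[x]}"


def resolve(x):
    ref = get_table_name(x)
    return default(x) if ref is True else ref


def build_sql(parsed):
    tokens = parsed.split("|")
    if tokens[0] != "Select":
        raise ValueError("expected a Select query")
    body = tokens[1:]
    if "Whose" in body:                      # explicit Conditions indicator
        cut = body.index("Whose")
        objects, conditions = body[:cut], body[cut + 1:]
    else:                                    # assumption: first token is the object
        objects, conditions = body[:1], body[1:]

    columns = [resolve(x) for x in objects]
    predicates, field, op = [], None, "="
    for tok in conditions:                   # pair each field with the next op/value
        if tok in OPERATORS:
            op = tok
        elif tok.startswith("'"):
            if field is None:
                raise ValueError(f"value {tok} has no field")
            predicates.append(f"{field} {op} {tok}")
            field, op = None, "="
        else:
            field = resolve(tok)

    used = []                                # tables in order of first mention
    for ref in columns + [p.split()[0] for p in predicates]:
        table = ref.split(".")[0]
        if table not in used:
            used.append(table)
    tables, links = list(used), []
    if len(used) > 1:                        # link every table to Films via its keys
        if "Films" not in tables:
            tables.insert(0, "Films")
        for table in used:
            extra, joins = LINKS[table]
            tables += [t for t in extra if t not in tables]
            links += [j for j in joins if j not in links]

    sql = f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"
    where = links + predicates
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


print(build_sql("Select|Films|Actor|'Salman Khan'|Director|'Sooraj Barjatya'"))
```

For the Fig. 5 example this builds a query over Films, Actor, Director and Film_actor; unlike the figure, it also links Actor to Film_actor, so the actor condition actually restricts the films returned.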

Fig. 4: Logic of choosing the right field name from the closest table name (figure example: Select Films.name from Films, Actor, Director, Films_actor)


Input query: What are the movies together with the actor as 'Salman Khan' and the director as 'Sooraj Barjatya'?
Parser Agent output: Select|Films|Actor|'Salman Khan'|Director|'Sooraj Barjatya'
Database Agent, Objects component: Select name from Films, Actor, Director, Films_actor
Database Agent, table linkings: Films.Filmid=Film_Actor.Film_Id, Director.Directorid = Films.Director_id
Final SQL: Select title from Films, Actor, Director, Film_actor where Films.Filmid=Film_Actor.Film_id and Director.Directorid = Films.Director_id and Actor.name= 'Salman Khan' and Director.name='Sooraj Barjatya'

Fig. 5: The entire translation process
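As printed in Fig. 5, the final query links Films to Film_actor and Director to Films, but never links Actor to Film_actor, so the condition on Actor.name does not restrict which films are returned. The Python sketch below checks this against an in-memory SQLite database using the standard sqlite3 module. The column names Actorid and Actor_id are hypothetical (the figure does not give them), the table name is spelled Film_actor throughout, and the rows are toy data for illustration only, not the paper's test database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Director (Directorid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Actor (Actorid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Films (Filmid INTEGER PRIMARY KEY, title TEXT, Director_id INTEGER);
CREATE TABLE Film_actor (Film_id INTEGER, Actor_id INTEGER);
INSERT INTO Director VALUES (1, 'Sooraj Barjatya');
INSERT INTO Actor VALUES (1, 'Salman Khan'), (2, 'Shahid Kapoor');
INSERT INTO Films VALUES (1, 'Maine Pyar Kiya', 1), (2, 'Hum Aapke Hain Koun', 1), (3, 'Vivah', 1);
INSERT INTO Film_actor VALUES (1, 1), (2, 1), (3, 2);
""")

# Final query of Fig. 5 (table name spelled Film_actor throughout).
FIG5 = ("SELECT title FROM Films, Actor, Director, Film_actor "
        "WHERE Films.Filmid = Film_actor.Film_id "
        "AND Director.Directorid = Films.Director_id "
        "AND Actor.name = 'Salman Khan' AND Director.name = 'Sooraj Barjatya'")
# Same query plus the missing Actor-to-Film_actor link.
LINKED = FIG5 + " AND Actor.Actorid = Film_actor.Actor_id"


def titles(sql):
    # Sort so the comparison does not depend on SQLite's row order.
    return sorted(row[0] for row in conn.execute(sql))


print(titles(FIG5))    # ['Hum Aapke Hain Koun', 'Maine Pyar Kiya', 'Vivah']
print(titles(LINKED))  # ['Hum Aapke Hain Koun', 'Maine Pyar Kiya']
```

Vivah appears in the first result only because the Actor row for Salman Khan is combined with every film; adding the key link removes it.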

Consider Table 2, named DEFAULT, which lists all the tables together with the default attributes of each table.
TABLE 2: Representation of Default Attributes of the Entities

DEFAULT SYMBOL1 SYMBOL2

The Database Agent interacts with the main data of the desired domain and generates the SQL equivalent of the natural language query entered by the user. The output of the Parser Agent gives clear indications of the columns and conditions required in the final SQL. The SQL is generated from the underlying database structure and a set of expert rules for query building, which requires interpreting the natural language patterns produced by parsing. The following auxiliary functions were created to assist the conversion of the parsed statement into an SQL statement.
Function GETTABLENAME(X)


If the argument is a field, the function finds the table it belongs to and returns Tablename.Fieldname; if the argument is a table name, it simply returns the Boolean value true.
Function DEFAULT(X)
This function retrieves, from the table DEFAULT, the default attribute of the entity associated with the argument. It returns the answer as Tablename.Fieldname.

4.4 Facilitator Agent
The Facilitator Agent is a blackboard server agent responsible for coordinating agent communication and control, and for providing a global data store to its client agents. It maintains a registry of agent service and data declarations, and all communication between client agents passes through the blackboard. This establishes a high-level interface to each agent, which the Facilitator uses when communicating with the agent and when delegating service requests to it.

4.5 Conversational Agent
For many years the idea that a computer could actually engage in a conversation with a human being was considered science fiction. Since then, creating a computer that can understand and communicate using natural language has been a major thrust of scientists worldwide. This has led to the development of conversational agents: computer-based agents that can participate in natural human dialogue with a user. The implication of this technology, even while it is still in its infancy, is that a machine rather than a human operator can engage in a conversation. The tasks of the Conversational Agent are:
1. The user enters a natural language statement into the user interface, which passes it to the Conversational Agent.
2. The Conversational Agent extracts attributes and their associated values from the user input.
• If no attributes can be identified, the Conversational Agent engages in a dialogue to guide the user through a process aimed at capturing an attribute and its value.
• If the user fails to cooperate with the Conversational Agent after several attempts, the session ends.

4.6 Converter Agent
Transliteration is the representation of the words of one language in the script of another, i.e., the transcription of one alphabet into another. Some other interesting definitions are:

• The representation of characters or words of one language by corresponding characters or words of another language [16].
• A systematic way to convert characters in one alphabet, or phonetic sounds, into another alphabet [17].
• The translation of text from one writing system into another in which the writing conventions of the target writing system are applied; the transliterated text should read naturally in the target script [18].
• A letter-for-letter or sound-for-letter spelling of a word to represent a word in another language [19].


Transliteration is particularly common in scientific and news articles when they refer to named entities or events of a language different from the one used to write the articles. In this paper we present a novel approach to transliterating named entities in Hindi to English. In multilingual processing, transliteration must be used for words in categories such as:
1. Names of people, organizations, etc. (e.g. Anil Kapoor (अनिल कपूर), Amitabh Bachchan (अमिताभ बच्चन) or Microsoft (माइक्रोसॉफ्ट)).
2. Names of cities, such as Nagpur (नागपुर) or Bangalore (बंगलोर).
3. Names of movies, such as Kaun (कौन).
The problem can be stated formally as a sequence labeling problem from the alphabet of one language to that of another. Consider a source language word $x_1 x_2 \ldots x_i \ldots x_N$, where each $x_i$ is treated as a word in the observation sequence [1]. Let the equivalent target language orthography of the same word be $y_1 y_2 \ldots y_i \ldots y_N$, where each $y_i$ is treated as a label in the label sequence. The task is to generate a valid target language word (label sequence) for the source language word (observation sequence) [15]:
$$x_i \rightarrow y_i, \qquad i = 1, \ldots, N$$
Here $y_i$ is the valid target language symbol for the source language symbol $x_i$. The following options were considered for transliterating Hindi words into English:
• Shrilipi, software used mostly for print-media work. It is licensed, platform dependent and application dependent.
• Government-related software such as ISM, with which one can write in a single font because the ASCII values are kept the same for English and the remaining codes are used for Hindi. It is licensed, platform dependent and application dependent, and uses its own encoding for the Hindi letters.
• Script-based fonts supported by Windows or other application developers. These are platform and application dependent.
• Krutidev, a freeware font that uses the same standard keyboard layout.
Of these, the freeware Krutidev was the best option. Fig. 6 shows how English letters can be made equivalent to the Hindi words.
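A letter matrix of the kind shown in Fig. 6 amounts to a lookup table. The Python sketch below decodes Krutidev keystrokes, which the Krutidev font renders as Devanagari glyphs, into Unicode Devanagari; for example, Nagpur is typed as ukxiqj. It assumes a small, partial key table covering only the letters in these examples, to be checked against the font's actual chart, and it is a deterministic longest-match lookup rather than a trained sequence labeler: each source unit $x_i$ is simply mapped to one target unit $y_i$.

```python
# Partial, hand-picked Krutidev key-to-Unicode table (assumed; verify
# against the font chart before relying on it). Only keys needed below.
KRUTIDEV_TO_UNICODE = {
    "kS": "\u094c",  # vowel sign AU (typed as aa-sign + ai-sign)
    "ks": "\u094b",  # vowel sign O (typed as aa-sign + e-sign)
    "k": "\u093e",   # vowel sign AA
    "q": "\u0941",   # vowel sign U
    "a": "\u0902",   # anusvara
    "d": "\u0915",   # KA
    "x": "\u0917",   # GA
    "i": "\u092a",   # PA
    "c": "\u092c",   # BA
    "u": "\u0928",   # NA
    "j": "\u0930",   # RA
    "y": "\u0932",   # LA
}
# Try longer keys first so "kS" wins over "k".
KEYS = sorted(KRUTIDEV_TO_UNICODE, key=len, reverse=True)


def krutidev_to_unicode(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        for key in KEYS:
            if text.startswith(key, i):
                out.append(KRUTIDEV_TO_UNICODE[key])
                i += len(key)
                break
        else:                      # unmapped character: pass it through
            out.append(text[i])
            i += 1
    return "".join(out)


for typed in ("ukxiqj", "caxyksj", "dkSu"):
    print(typed, "->", krutidev_to_unicode(typed))
```

A complete converter would need the full key chart and would also have to reorder the short-i sign (ि), which Krutidev types before its consonant but Unicode stores after it.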


Fig. 6: Matrix for Transliteration of English letters to Hindi

Fig. 7: Sample screen shot from the software developed

5. Results and Discussions
A total of 100 queries were posed by users during testing. 26 of these questions were disregarded because they were not covered by the films domain of the prototype system. Of the 74 suitable questions, 48 resulted in a correct SQL query. Analysis of these questions and the generated queries revealed the following:
• 42 of the suitable questions processed by the system gave correct results that displayed only relevant information, for example "Who are the actors of the film 'Maine Pyar Kiya'?"
• 6 of the suitable questions processed by the system gave correct results but also displayed information not asked for in the question.
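The reported counts can be expressed as rates. The percentages below are derived here from the figures above; the paper itself does not state them.

```latex
\begin{aligned}
100 - 26 &= 74 \text{ suitable questions}, \qquad 42 + 6 = 48 \text{ correct SQL queries},\\
\frac{48}{74} &\approx 64.9\% \text{ of suitable questions yielded a correct SQL query},\\
\frac{42}{74} &\approx 56.8\% \text{ were answered with only relevant information},\\
\frac{74 - 48}{74} &= \frac{26}{74} \approx 35.1\% \text{ did not yield a correct SQL query}.
\end{aligned}
```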

6. Conclusions
Research in the information retrieval field has recognized the need for agent-based systems. We have proposed a multi-agent system that runs several agents locally on each user's computer and supports communication between them to help with common tasks. We believe that the social behavior of multi-agent systems will enhance the performance of information retrieval systems in both recall and precision. The main goal of the work was to give users the ability to obtain information stored in a database without having to learn an artificial communication language: questions can be formulated in the user's own native language. Our solution has the advantage of being database independent.

While the results are promising, clearly more research is needed to develop this system further. A much larger survey of database users would be needed to collect a wider range of example natural language questions and so exploit the interactive capability of all the agents. A more interactive system would therefore require extensive user involvement during development. It would also be beneficial to conduct more research on a range of real-life databases, to discover whether databases with different types of content still share similarities in their schemas, and how this relates to the types of questions their users ask. In future, the following issues need to be taken into consideration:
• handling questions other than wh- questions (and give/show/obtain/display requests);
• dealing with light syntactic/semantic constructions and vague prepositions;
• including an approach for handling scope, in particular for representation and disambiguation;
• an adequate treatment of quantifiers, in particular those translated into aggregation, comparison and negation operators;
• handling and reacting appropriately to out-of-scope questions.

7. References
[1]

[2]

[3]

[4]

[5]

"Agent Management," FIPA 1997 Specification, part 1, version 2.0, Foundation for Intelligent Physical Agents, October 1998.
