Jour of Adv Research in Dynamical & Control Systems, 13-Special Issue, September 2017
Developing a Natural Language Interface with Semantic Matching to Complex Data
K. Karthika, M.Phil Student, Department of Information Technology, Vels University, Pallavaram, Chennai, India. E-mail: [email protected]
R. Devi, Assistant Professor, Department of Information Technology, Vels University, Pallavaram, Chennai, India. E-mail: [email protected]@gmail.com
Abstract--- Information technology plays an important role in our lives, and databases are a major source of information. Databases and database technology have a major impact on the rapidly growing use of computers. To retrieve information from a database, a query must be formulated in such a way that the computer can understand it and produce the output the user requires. SQL conventions are followed by almost all query languages for RDBMS systems. However, not everybody is able to write SQL queries, because users may not be aware of the structure of the database. There is therefore a need for non-expert users to query relational databases in their natural language, without working directly with the values of the attributes. This idea of using natural language instead of SQL has promoted the development of Natural Language Interface to the Database (NLIDB) systems. The need for NLIDB systems is increasing day by day as more people access information through web browsers, personal digital assistants and cell phones. This paper introduces an intelligent interface for the database. The theory presented shows that our NLIDB system is guaranteed to map a natural language query to the corresponding SQL query.
Keywords--- Data Mining, NLIDB, Fuzzy Logic, Data Reduction, Prediction.
I. Introduction
Data Mining: Data mining is the process of applying analysis methods to data with the intention of uncovering hidden patterns. Data mining technology has been used for many years in many fields, by businesses, scientists and governments. It is used to sift through volumes of data, such as airline passenger trip information, population data and marketing data, to generate market research reports, although that reporting is sometimes not considered to be data mining. In simple terms, data mining collects or fetches part of the values from the whole database according to the user's requirements, and it plays a vital role in retrieving data or files from a larger database. It mainly involves four classes of tasks:
Classification: arranges the data into predefined groups.
Clustering: similar to classification, but the groups are not predefined, so the algorithm tries to group similar items together.
Regression: attempts to find a function that models the data with the least error.
Association rule learning: searches for relationships between variables.
Figure 1
ISSN 1943-023X
Data mining is also known as a collection of techniques for efficiently discovering previously unknown, novel and valid patterns in large database systems.
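As an illustration of the regression task named in the introduction (finding a function that models the data with the least error), the following is a minimal sketch of an ordinary least-squares line fit. The function name `fit_line` and the sample points are hypothetical and chosen only for illustration; they are not part of the system described in this paper.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

The closed-form solution is used here so the sketch needs no external library; a real data mining workload would normally rely on an established numerical package.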
II. Literature Review
[1]. Reiter and C. Mellish [1992] define a classification of natural language interfaces to databases based on their architectures, presented at the Annual Meeting of the Association for Computational Linguistics.
[3]. Popescu [2004] discusses modern natural language interfaces to databases, composing statistical parsing with semantic tractability.
[4]. M. Minock [2004] explains the modular generation of relational query paraphrases, in the Journal of Language and Computation special issue on Formal Aspects of NLG.
[5]. Anirudh Khanna, Bhagwan Das, Bishwajeet Pandey, DMA Hussain and Vishal Jain [2016] discuss upgrading the Quick Script platform to create natural language based IoT systems.
[7]. K. Javubar Sathick and A. Jaya [2015] describe natural language to SQL generation for semantic knowledge extraction in social web sources.
[10]. Juwahae, el.delphio and G. Martin hinghis [1999] give an introduction to datasets over natural languages, explaining how natural language relates to complex data.
[11]. Androutsopoulos, I., Ritchie, G. and Thanisch, P. [1993] discuss MASQUE, an efficient and portable natural language query interface for relational databases.
[12]. Minock, M. [2007] demonstrates a step towards realizing Codd's vision of rendezvous with the casual user, at the International Conference on Very Large Databases (VLDB-2007), Demonstration Session, Vienna, Austria.
[14]. Bagnasco, C., Bresciani, P., Magnini, B. and Strapparava, C. [1996] explain natural language interpretation for public administration database querying in the TAMIC demonstrator.
[15]. Chu, W., Yang, H., Chiang, K., Minock, M., Chow, G. and Larson, C. [1996] discuss CoBase, a scalable and extensible cooperative information system, in the Journal of Intelligent Information Systems.
III. Data Reduction
1. Reducing the number of attributes.
• Data cube aggregation: applying roll-up, slice or dice operations.
• Removing irrelevant attributes: attribute selection (filtering and wrapper methods) and searching the attribute space.
• Principal component analysis (numeric attributes only): searching for a lower-dimensional space that can best represent the data.
2. Reducing the number of attribute values.
• Binning (histograms): reducing the number of attribute values by grouping them into intervals (bins).
• Clustering: grouping values into clusters.
• Aggregation or generalization.
3. Reducing the number of tuples.
• Sampling.
Figure 2: Collection of Data Reducing
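Two of the data reduction techniques listed in this section, binning and sampling, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `equal_width_bins` and `sample_tuples` and the example ages are hypothetical, and equal-width intervals are only one of several possible binning strategies.

```python
import random


def equal_width_bins(values, n_bins):
    """Map each numeric value to the index of its equal-width interval (bin)."""
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 when all values are equal, to avoid dividing by zero.
    width = (hi - lo) / n_bins or 1.0
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def sample_tuples(rows, k, seed=0):
    """Reduce the number of tuples by drawing k rows without replacement."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.sample(rows, k)


ages = [18, 22, 25, 31, 40, 47, 58, 60]  # hypothetical attribute values
print(equal_width_bins(ages, 3))         # eight distinct values reduced to three bins
print(sample_tuples(list(range(100)), 5))
```

Binning reduces the number of distinct attribute values (here from eight ages to three interval labels), while sampling reduces the number of tuples and leaves the attributes unchanged.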
IV. Research Methods
NLIDB
Natural Language Interface to the Database Systems: A database is difficult to access for people who have no knowledge of a database language. The idea of using natural language instead of SQL has therefore triggered the development of a new type of processing method. If users are not required to learn a formal language, they can pose a query in their own native language, and the burden of learning structured query language is removed. A natural language interface to a database system helps in many ways. With such a system, everyone can collect information from a database, and it may also change the way people think about the information held in a database. In personal digital assistant and cell phone environments, the screen is not as wide as a computer screen. Filling in a form with many fields requires the user to navigate through the screen or to scroll through list-box values. With a natural language interface to the database, the only work needed is to type the question, much like an SMS. Because users query the database in their own native language, they do not need to spend time learning the system's communication language.
Fuzzy Logic
This paper describes a fuzzy logic-based language processing method, which has been applied to the NLP process in order to recognize modifiable human language. We begin by presenting some of the basic concepts of fuzzy logic systems. Fuzzy logic is one of the concepts used in the induction process in data mining. It is a logic system for reasoning that is approximate rather than exact. The fundamental unit of fuzzy logic is the fuzzy set. In classical set theory, an element either belongs or does not belong to a set. In a fuzzy set, an element can belong to the set to a degree between 0 and 1.
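The contrast between a classical set and a fuzzy set can be made concrete with a small sketch. The example concept ("high salary"), the thresholds and the function names below are hypothetical and serve only to illustrate graded membership; the paper does not specify particular membership functions.

```python
def crisp_high_salary(salary, threshold=50000.0):
    """Classical set: a salary either belongs to the set (1.0) or does not (0.0)."""
    return 1.0 if salary >= threshold else 0.0


def fuzzy_high_salary(salary, low=30000.0, high=60000.0):
    """Fuzzy set: membership rises linearly from 0 at `low` to 1 at `high`."""
    if salary <= low:
        return 0.0
    if salary >= high:
        return 1.0
    return (salary - low) / (high - low)


for s in (25000, 45000, 49999, 50000, 70000):
    print(s, crisp_high_salary(s), fuzzy_high_salary(s))
```

With the crisp set, salaries of 49999 and 50000 fall on opposite sides of the boundary even though they are almost identical. The fuzzy set assigns them nearly the same degree of membership, which is closer to how a phrase such as "high salary" is used in natural language.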
Our purpose is to create a system that can learn, from a linguistic corpus, the fuzzy semantic relations between the concepts represented by words, and use such relations to process the word sequences generated by text recognition systems. In particular, the system should be able to predict the words that a text recognition system failed to recognize, which helps to increase the accuracy of a search based on user text. It will also serve as the first stage of deep semantic processing of user input by providing the "semantic relatedness" between the recognized words.
Applications of Fuzzy Logic in Data Mining
• Fuzzy logic allows complex dynamic data systems to be modeled in a more intuitive way.
• It does not need to draw large amounts of data from the database.
• Due to its interoperability and simplicity, it is used to compute with words and allows rules to be modeled in near-natural language.
• Its main advantage is that whenever new data are added to the system, there is no need to retrain the system.
• It offers ease of modeling reasoning, the ability to deal with uncertainty and nonlinearity, ease of implementation and the use of linguistic variables.
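One simple way to approximate the idea of learning fuzzy semantic relations from a corpus is to count how often words occur in the same sentence and to normalize the count to a degree between 0 and 1. The sketch below is an assumption-laden illustration, not the method of this paper: the co-occurrence measure, the function names and the three-sentence corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def learn_relations(sentences):
    """Estimate a relatedness degree in [0, 1] for each pair of co-occurring words."""
    word_count = Counter()
    pair_count = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_count.update(words)
        pair_count.update(frozenset(p) for p in combinations(sorted(words), 2))
    # Degree = shared sentences / sentences containing the rarer word of the pair.
    return {
        pair: n / min(word_count[w] for w in pair)
        for pair, n in pair_count.items()
    }


def relatedness(relations, a, b):
    """Look up the learned degree for a word pair; unseen pairs get 0.0."""
    return relations.get(frozenset((a, b)), 0.0)


def predict_missing(relations, context, candidates):
    """Pick the candidate word most related to the recognized context words."""
    return max(candidates,
               key=lambda c: sum(relatedness(relations, c, w) for w in context))


corpus = [  # hypothetical training sentences
    "show employee salary",
    "list employee salary by department",
    "show student marks",
]
relations = learn_relations(corpus)
print(relatedness(relations, "employee", "salary"))
print(predict_missing(relations, ["employee"], ["salary", "marks"]))
```

In this toy corpus "employee" and "salary" always appear together, so an unrecognized word next to "employee" is predicted to be "salary" rather than "marks". A practical system would need a much larger corpus and a more careful relatedness measure.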
V. Proposed Method
Fusion of Fuzzy Logic with NLIDB
Algorithm
• Get the natural language query as input.
• Preprocess the input to extract its values and meaning.
• Search for the relevant query that matches the preprocessed input values.
• Fetch the relevant values from the database.
• Finally, generate the required result or output for the non-structured query, i.e., for the user's normal natural language.
Input
A general, ordinary natural language query, rather than an SQL query, is given as input.
Output
The desired action is carried out by processing the structured query language query that corresponds to the natural language given as input.
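The algorithm, input and output described in this section can be sketched end to end as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the schema, the stop-word list and all function names are hypothetical, and approximate string matching from Python's standard `difflib` module stands in for the fuzzy semantic matching step.

```python
import difflib
import sqlite3

# Hypothetical schema and stop words, used only for this illustration.
SCHEMA = {"employees": ["name", "salary", "department"]}
STOPWORDS = {"show", "me", "the", "of", "all", "and", "list", "get"}


def preprocess(query):
    """Lower-case the query, strip punctuation and drop stop words."""
    tokens = [t.strip("?.,!") for t in query.lower().split()]
    return [t for t in tokens if t and t not in STOPWORDS]


def match_term(token, vocabulary, cutoff=0.7):
    """Return the closest schema term for a token, or None (approximate match)."""
    hits = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def to_sql(query):
    """Map a natural language query to a SELECT statement over the schema."""
    tokens = preprocess(query)
    tables = [match_term(t, list(SCHEMA)) for t in tokens]
    table = next((t for t in tables if t), None)
    if table is None:
        return None
    columns = []
    for token in tokens:
        column = match_term(token, SCHEMA[table])
        if column and column not in columns:
            columns.append(column)
    # Identifiers come only from SCHEMA, never directly from user text.
    column_list = ", ".join(columns) or "*"
    return "SELECT " + column_list + " FROM " + table


def run(query, conn):
    """Fetch the rows for a natural language query, or [] if nothing matched."""
    sql = to_sql(query)
    return conn.execute(sql).fetchall() if sql else []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER, department TEXT)")
conn.execute("INSERT INTO employees VALUES ('Asha', 52000, 'IT')")
print(to_sql("Show the names and salary of all employes"))
print(run("Show the names and salary of all employes", conn))
```

The misspelled word "employes" is still matched to the `employees` table, and "names" to the `name` column, which shows how approximate matching lets a non-expert user query the database without knowing the exact schema terms. Conditions such as "salary above 50000" are outside the scope of this sketch.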
VI. Results and Discussions
In this work we obtained the required output from natural language input. We have also seen how natural language is processed to identify a relevant structured query language query and to provide valuable and relevant results for non-query inputs.
VII. Future Work
In future, extra importance will be given to extracting more accurate results for the natural language query. This can be implemented and enhanced using other data mining techniques such as regression, clustering and neural networks. This paper is the first to provide evidence that statistical parsers can support NLIs. It also identifies the quandary associated with appropriately training a statistical parser: without special training for each database system the parser makes numerous errors, but creating a massive labeled corpus of questions for every database is prohibitively expensive.
References
[1] Reiter, E. and Mellish, C. Using Classification of Natural Language Interfaces to Databases based on the Architectures. Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, 1992, 265–272.
[2] Minock, M. Managing fine-grained representations of completeness. Technical Report 04.05, University of Umeå, Umeå, Sweden, 2004.
[3] Popescu, A. Modern Natural Language Interfaces to Databases. Composing Statistical Parsing with Semantic Tractability, University of Washington, 2004.
[4] Minock, M. Modular generation of relational query paraphrases (to appear). Journal of Language and Computation special issue on Formal Aspects of NLG (2004).
[5] Khanna, A., Das, B., Pandey, B., Hussain, D.M.A. and Jain, V. A Discussion about Upgrading the Quick Script Platform to Create Natural Language based IoT Systems. Indian Journal of Science and Technology 9 (46) (2016).
[6] Androutsopoulos, I., Ritchie, G.D. and Thanisch, P. Natural language interfaces to databases: An introduction. Natural Language Engineering 1 (1) (1995) 29–81.
[7] Sathick, K.J. and Jaya, A. Natural language to SQL generation for semantic knowledge extraction in social web sources. Indian Journal of Science and Technology 8 (1) (2015) 1–10.
[8] Popescu, A.M., Etzioni, O. and Kautz, H. Towards a theory of natural language interfaces to databases. Proceedings of the 8th International Conference on Intelligent User Interfaces, 2003, 149–157.
[9] ELF Software, ELF Software Documentation Series (2002).
[10] Juwahae, el.delphio and Martin hinghis, G. Introduction to the Datasets Over Natural Languages (1999).
[11] Androutsopoulos, I., Ritchie, G. and Thanisch, P. MASQUE/SQL: An Efficient and Portable Natural Language Query Interface for Relational Databases. Database technical paper, Department of AI, University of Edinburgh, 1993.
[12] Minock, M.J. A STEP towards realizing Codd's vision of rendezvous with the casual user. Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB Endowment, 2007, 1358–1361.
[13] Minock, M. Natural language access to relational databases through STEP. Department of Computing Science, University of Umeå, 2004.
[14] Bagnasco, C., Bresciani, P., Magnini, B. and Strapparava, C. Natural language interpretation for public administration database querying in the TAMIC demonstrator. Applications of Natural Language to Information Systems, 1996.
[15] Chu, W.W., Yang, H., Chiang, K., Minock, M., Chow, G. and Larson, C. CoBase: A scalable and extensible cooperative information system. Journal of Intelligent Information Systems 6 (2-3) (1996) 223–259.
[16] Alshawi, H., Carter, D., Crouch, R., Pulman, S., Rayner, M. and Smith, A. Clare: A contextual reasoning and cooperative response framework for the core language engine, 1994.
[17] Binot, J., Debille, L., Sedlock, D. and Vandecapelle, B. The Natural Language Interfaces. A New Philosophy, SunExpert Magazine, 1991.
[18] Boldasov, M. and Sokolova, G.E. QGen – Generation Module for the Register Restricted In-BASE System. 4th International Conference on Computational Linguistics and Intelligent Text Processing, 2003, 465–476.
[19] Microsoft English Query Tutorials available with standard installation in SQL SERVER 7.0 or higher.
[20] Ceri, S., Gottlob, G. and Tanca, L. Logic Programming and Databases (Surveys in Computer Science) (Hardcover), 1990.
[21] Ceri, S., Gottlob, G. and Wiederhold, G. Efficient database access from PROLOG. IEEE Transactions on Software Engineering 15 (2) (1989) 153–164.
[22] Ceri, S. and Pelagatti, G. Distributed Databases: Principles and Systems. McGraw-Hill, New York, 1984.
[23] Clifford, J. Natural Language Querying of Historical Databases. Computational Linguistics 14 (4) (1988) 10–34.
[24] Clifford, J. The Formal Semantics and Pragmatics for Natural Language Querying. The Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, Cambridge, England, 1990.
[25] Clifford, J. and Warren, D.S. The Formal Semantics for Time in Databases. ACM Transactions on Database Systems 8 (2) (1983) 215–254.
[26] Codd, E.F. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13 (6) (1970) 377–387.
[27] Codd, E.F. Seven Steps to RENDEZVOUS with the Casual User. J.W. Klimbie and K.L. Koffeman, editors, Data Base Management, 1974.
[28] Cohen, P.R. The Role of Natural Language in a Multimodal Interface. SRI International Technical Note 514, Computer Dialogue Laboratory, 1991.
[29] Copestake, A. and Sparck Jones, K. The Natural Language Interfaces to Databases. The Knowledge Engineering Review 5 (4) (1990) 225–249.
[30] Damerau, F. Operating statistics for the transformational question answering system. American Journal of Computational Linguistics 7 (1981) 30–42.