Jour of Adv Research in Dynamical & Control Systems, Vol. 10, No. 4, 2018
Survey on Query Expansion Techniques in Word Net Application
Dr. D. Akila, Associate Professor, Department of Information Technology, School of Computing Sciences, VELS (VISTAS), Chennai.
S. Sathya, Assistant Professor, Department of Information Technology, School of Computing Sciences, VELS (VISTAS), Chennai.
Dr. G. Suseendran, Assistant Professor, Department of Information and Technology, School of Computing Sciences, Vels University, Chennai, India.
Abstract--- Query expansion reformulates the original user query in order to enhance overall retrieval performance. Its objective is to broaden the query by adding terms that match the original terms. Most query expansion techniques, however, do not take the terms themselves into account while expanding the query, so the vagueness and ambiguity of the query terms can reduce precision when the answer is retrieved. This article presents a broad survey of the query expansion techniques proposed by researchers, together with their disadvantages. The survey covers automatic query expansion approaches that are corpus-specific, linguistic, search-log based and query-specific. Some techniques use a lexical resource such as WordNet, while others use web data and search logs for query expansion. The following aspects are discussed in this paper: the importance of query expansion in information retrieval, the important steps of automatic query expansion, the approaches to automatic query expansion and their comparison, and the research directions and critical issues of automatic query expansion.
Keywords--- MeSH, WordNet, AQE, IR, Semantics, Query Expansion.
I. Introduction
Information Retrieval (IR) has become a vital process in research. With the help of IR, knowledge-based information can be searched for and retrieved from a database [1]. IR also covers the storage, representation and retrieval of information [2], and it concentrates mainly on the organization and retrieval of information in large database collections [3]. IR emerged jointly with the database field. In the traditional setting, a huge number of databases and unstructured documents are assumed, without any associated schema. The process of returning appropriate documents in response to a user query is called IR. The World Wide Web (WWW) is considered a suitable mode of interacting with other resources, and IR has made the web a useful and productive tool [4]. Examples of information retrieval systems (IRS) are online document-management systems and online library catalogues, in which files are kept well organized. In the WWW, each HTML page is considered a document; the issue is to locate the right document in an appropriate time. Nowadays WWW technology has become an information repository for references of knowledge. Several challenges can be addressed when IR is combined with databases, Natural Language Processing (NLP), web mining techniques, machine learning, etc. Query inaccuracy makes an IRS ineffective. Keywords should be used to obtain the most appropriate results from an IRS, and if the user is not sure about the required content, the IRS will not produce relevant results [5]. Vocabulary mismatch is another reason for the ineffective performance of an IRS, and it can be considered the fundamental problem of IR [6]. It arises in natural language, where the same idea or concept can be expressed with entirely different words.
Several approaches deal with this problem: query expansion, interactive query refinement, relevance feedback, search results clustering, word sense disambiguation and re-ranking. Expanding the query with other related words is considered the best approach [7]. The problem of ineffectiveness can be reduced by applying the above techniques [5]. Hence, this paper concentrates on reviewing intelligent information retrieval implementations that use datasets from two test collections. The research mainly focuses on FIRE and ClueWeb, though other test collections are available.
ISSN 1943-023X Received: 5 Feb 2018/Accepted: 15 Mar 2018
II. Literature Survey
Query expansion [9] is a powerful tool for improving query statements and the retrieval of documents relevant to the search.
A) Concept Network Based Query Expansion
In [10], Hoeber et al. discussed an approach grounded in conceptual semantic theories, in which a concept network knowledge base is used to expand the queries. New queries are derived from the parts of the concept network that match the query terms. The semantic similarity among concepts is expressed using a directed graph model called the Conceptual Word Cluster Space Graph (CWCSG). Here, the user's query is extended in order to meet the user's search needs accurately. Compared with the WordNet synonym dictionary [11], this method delivers better performance, but the quality of the concept network is a very important factor. In [12], the terms whose weights exceed a threshold value are derived, and the relevant terms are selected from the top n documents. Akrivas et al. performed query expansion by adding only semantic entities, without considering the terms. The inclusion relation is constructed from a semantic encyclopedia and its semantic relations, and the approach is combined with the user's profile [13]. In [14], the top-ranked documents are examined for classification information, and documents with the same classification information are grouped together to form clusters. In [15], query expansion and relevance feedback were combined in an approach based on keyword querying, in order to obtain a purely concept-based information search.
B) Query Expansion Based on Term Weighting
In [16], Lee et al. implemented a method in which a domain ontology and the user's profile are used to rank the documents. The query is expanded according to the user's interests. The expanded query is then matched against the retrieved documents to sort and rank them.
In [17], Rahgozar et al. improved information retrieval performance by 7% on the MAP criterion using pseudo-relevance feedback. The retrieved documents are assumed to be alike at the initial stage. They are then used to extract further similar terms for querying and for the re-weighting process. Because the user's context is closely related to the top documents, weighting the set by its context helps efficient retrieval. Mittal et al. [18] match numerical terms approximately, using a fuzzy triangular membership function for the weighting. Lin et al. [19] used the vector space model to represent both queries and documents. A fuzzy rule is proposed to define the degree of importance, and it is used to weight the additional query terms. Hust et al. [20] discussed the similarity between concepts based on new and old queries; here the terms are extracted from the old queries. These approaches use global feedback to expand the query.
C) Query Expansion Based on Word Sense Disambiguation (WSD)
In [21], Li et al. used semantic relatedness measures and WordNet to expand the query. The word sense disambiguation technique was used to analyse the meaning of a term and its context. From the recovered concepts, the expanded query terms are produced from WordNet, giving a 7% improvement in precision. In [22], Mittal et al. expand the query by analyzing the similarity of ambiguous terms to the other terms in the query. First the OWA (ordered weighted averaging) operator is calculated, and the sense with the highest similarity score is fixed as the suitable meaning of the term. The query is then expanded using this sense together with implicit feedback. The query is optimized in this way, and better performance along with better precision was achieved.
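The sense selection step described for [22] can be illustrated with a minimal Python sketch of the standard OWA operator, in which the weights are applied to the scores after sorting them in descending order. The function names, the weight vector and the similarity scores below are hypothetical and chosen only for illustration; they are not taken from [22], whose exact procedure may differ.

```python
from typing import Mapping, Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def pick_sense(sense_scores: Mapping[str, Sequence[float]],
               weights: Sequence[float]) -> str:
    """Return the candidate sense whose similarity scores have the highest OWA value."""
    return max(sense_scores, key=lambda sense: owa(sense_scores[sense], weights))


# Hypothetical similarity of each sense of "bank" to the other query terms
# ("river", "water", "loan"); the numbers are illustrative assumptions.
scores = {
    "bank_financial": [0.2, 0.1, 0.9],
    "bank_river": [0.8, 0.7, 0.1],
}
weights = [0.5, 0.3, 0.2]  # assumed weight vector
best = pick_sense(scores, weights)
print(best)
```

With these assumed numbers the river sense is selected, because two of its three similarity scores are high, whereas the financial sense has only one high score.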
In [23], Tayal et al. expand polysemous words by automatically selecting and incorporating the terms that are close to the context sense. A graph-based structure is constructed for the query terms in this approach. Similar nodes are added to the query as additional terms to increase the performance. For large WSD tasks, Lapata et al. [24] proposed a graph-based algorithm and examined how the choice of lexicon and of connectivity measure influences the performance.
In the approach proposed by Valli et al. [25], WSD is used along with a thesaurus, WordNet and an ontology to retrieve the information, and precision and recall are improved effectively. In [26], Parapar et al. represented the queries as logical formulas, considering different connectives and various types of linguistic information from WordNet.
D) Word Net
WordNet is an online lexical reference system developed at Princeton University. Nouns, verbs, adjectives and adverbs are grouped into synonym sets (synsets). The synsets are in turn organized into senses. Synsets are further related to other synsets higher or lower in the hierarchy, as defined by different types of relations. The most common relations are Hyponym/Hypernym (i.e., the Is-A relation) and Meronym/Holonym (i.e., the Part-of relation). There are nine noun and several verb Is-A hierarchies (adjectives and adverbs are not organized into Is-A hierarchies) [22].
E) MeSH
MeSH is a hierarchy of biological and medical terms suggested by the U.S. National Library of Medicine (NLM). MeSH terms are organized in Is-A taxonomies, with more general terms ("drugs and chemicals") higher in a taxonomy than more specific terms ("aspirin") [22]. Every MeSH term is described by several properties, the most important of them being the MeSH Heading (MH), the Scope Note and the Entry Terms. In this work, Entry Terms are treated as synonyms [22].
F) Semantic Comparison
Several techniques for determining the semantic comparison (similarity) between terms have been proposed in the literature, and some of them have been tested on WordNet. We present an assessment of a more comprehensive and up-to-date set of techniques, and we also examine cross-ontology methods. Comparison measures apply only to nouns (and verbs in WordNet) and to Is-A relations.
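To illustrate how an Is-A hierarchy of the kind found in WordNet or MeSH supports a comparison between two terms, the following minimal Python sketch counts the Is-A edges separating them. The tiny hierarchy, the function names and the 1 / (1 + distance) scoring are assumptions made for illustration only; the hierarchy is hand-made and is not extracted from WordNet or MeSH.

```python
from typing import Dict, List, Optional

# Hypothetical child -> parent (Is-A) links; not taken from WordNet or MeSH.
IS_A: Dict[str, Optional[str]] = {
    "entity": None,
    "substance": "entity",
    "drug": "substance",
    "aspirin": "drug",
    "ibuprofen": "drug",
    "organism": "entity",
    "dog": "organism",
}


def path_to_root(term: str) -> List[str]:
    """Follow Is-A links from a term up to the root of the hierarchy."""
    path = [term]
    while IS_A[path[-1]] is not None:
        path.append(IS_A[path[-1]])
    return path


def edge_distance(a: str, b: str) -> int:
    """Number of Is-A edges between a and b via their lowest common ancestor."""
    depth_in_b = {node: i for i, node in enumerate(path_to_root(b))}
    for i, node in enumerate(path_to_root(a)):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError("terms share no common ancestor")


def path_similarity(a: str, b: str) -> float:
    """Map the edge distance to a score in (0, 1]; identical terms score 1.0."""
    return 1.0 / (1 + edge_distance(a, b))


print(edge_distance("aspirin", "ibuprofen"))  # siblings under "drug"
print(edge_distance("aspirin", "dog"))        # related only through "entity"
```

In this toy hierarchy the two drugs are two edges apart, while "aspirin" and "dog" are five edges apart, so the former pair receives the higher score.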
Hierarchical properties such as unity, individuality and variance do not exist for adverbs and adjectives. Semantic comparison techniques are categorized into the following main groups:
1. Edge Counting Techniques
2. Information Content Techniques
3. Hybrid measure
4. Feature based measure
III. Query Expansion
Query expansion is a method of adding new terms to the original query so as to achieve better retrieval performance. The four different ways of performing query expansion are listed below:
• Manual (the user chooses the expansion terms).
• Interactive (the system suggests expansion terms to the user).
• Automatic (the entire process is invisible to the user).
• Hybrid (a mixture of more than one query expansion method).
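As a concrete illustration of adding related terms to a query in the automatic case, here is a minimal Python sketch. The synonym table SYNONYMS and the function expand_query are hypothetical; the table is a hand-made stand-in for a lexical resource such as WordNet and is not drawn from it.

```python
from typing import Dict, List

# Hypothetical synonym table standing in for a lexical resource such as WordNet.
SYNONYMS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "buy": ["purchase"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    """Return the original query terms followed by any related terms found."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for candidate in synonyms.get(term, []):
            if candidate not in expanded:  # avoid duplicate expansion terms
                expanded.append(candidate)
    return expanded


print(expand_query("buy used car"))
```

The original terms are kept in front so that a later weighting step can still distinguish them from the added terms.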
A) Automatic Query Expansion (AQE)
AQE became popular after the evaluation results obtained at the Text Retrieval Conference (TREC) series, and it is now considered the most common and preferred method. Many researchers have implemented this technique and reported noticeable improvements in IR. AQE is considered one of the most promising techniques for boosting the retrieval effectiveness of document ranking. Even MySQL, Google Enterprise and Lucene provide AQE facilities that users can turn on or off.
B) Important Steps in Automatic Query Expansion
Based on the review, automatic query expansion is divided into four steps, which are described briefly below.
1) Pre-processing of Raw Data
The data used for expanding the user's query is converted into a more effective format for further processing.
It involves the following steps:
a. Text extraction from documents such as MS Word, HTML and PDF files.
b. Tokenization (individual words are extracted).
c. Stop word removal (common words such as prepositions and articles are removed).
d. Stemming of words.
e. Word weighting (each word is assigned a score that reflects its importance in every document).
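Steps b to e above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the stop word list is a small hand-picked sample, stem is a crude suffix stripper rather than an established stemmer, and the weight is plain relative term frequency within one document. Text extraction (step a) is assumed to have been done already, and all names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Small illustrative stop word list (assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Step b: split lower-cased text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word: str) -> str:
    """Step d: crude suffix stripping; a real system would use a proper stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> List[str]:
    """Steps b to d: tokenize, drop stop words (step c), then stem."""
    return [stem(token) for token in tokenize(text) if token not in STOP_WORDS]


def term_weights(text: str) -> Dict[str, float]:
    """Step e: weight each term by its relative frequency in the document."""
    terms = preprocess(text)
    if not terms:
        return {}
    counts = Counter(terms)
    return {term: count / len(terms) for term, count in counts.items()}


print(preprocess("The dogs are barking in the parks"))
print(term_weights("retrieval of documents and retrieval systems"))
```

Relative frequency is used here only to keep the sketch short; a weighting that also accounts for how many documents contain the term would need the whole collection as input.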
2) Query Term Features Generation and its Ranking
The system generates candidate expansion features for the query and ranks them. The properties of the query terms form the basis for feature generation. Because the query is expanded with only a few expansion features, ranking is very important. The inputs to this stage are the user query and the data. The output is a set of expansion features together with their corresponding scores.
3) Selection of Query Term Features
After the query term features have been ranked, the top-most elements are selected for query expansion. These elements are considered individually, without regard to the mutual dependencies among the expansion features.
4) Query Formulation and Reformulation
Query formulation and reformulation is the final stage of query expansion. It concerns how the expanded query is submitted to the information retrieval system in order to obtain effective results. A weight is assigned to each feature that describes the expanded query, which is known as query term reweighting. AQE has many applications, such as Multimedia Information Retrieval, Question Answering, Information Filtering, Mobile Search, Cross-Language Information Retrieval, Expert Finding, Federated Search, Slot-based Document Retrieval, etc.
5) Categorization of Query Expansion Techniques
The five major classes of automatic query expansion techniques are listed below:
1. Query based techniques [27]
• Distribution difference based techniques
• Model based techniques
• Document summarization based techniques
2. Corpus based techniques [27]
• Concept term based techniques
• Term clustering based techniques
3. Linguistic based techniques [27]
• Stemming based techniques
• Ontology browsing based techniques
• Syntactic parsing based techniques
4. Web data based techniques [27]
• Anchor Text based techniques
• Wikipedia based techniques
5. Search log data based techniques [27]
• Related queries based techniques
• Exploiting query documents relationship based techniques
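Steps 2 to 4 described earlier in this section (generation and ranking of expansion features, their selection, and query reformulation with reweighting) can be sketched as follows, in the spirit of pseudo-relevance feedback. The scoring by document frequency among the top-ranked documents, the expansion_weight parameter and all function names are assumptions made for this illustration; the surveyed techniques use a variety of other scoring functions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_features(query_terms: List[str],
                  top_docs: List[List[str]]) -> List[Tuple[str, float]]:
    """Step 2: score candidate terms by the fraction of top documents containing them."""
    doc_freq: Counter = Counter()
    for doc in top_docs:
        doc_freq.update(set(doc) - set(query_terms))  # skip the original terms
    n = len(top_docs)
    scored = [(term, df / n) for term, df in doc_freq.items()]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def select_features(ranked: List[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Step 3: keep the k best features, each considered individually."""
    return ranked[:k]


def reformulate(query_terms: List[str],
                selected: List[Tuple[str, float]],
                expansion_weight: float = 0.5) -> Dict[str, float]:
    """Step 4: original terms keep weight 1.0; expansion terms are down-weighted."""
    weighted = {term: 1.0 for term in query_terms}
    for term, score in selected:
        weighted[term] = expansion_weight * score
    return weighted


# Hypothetical query and already pre-processed top-ranked documents.
query = ["jaguar", "speed"]
top_docs = [
    ["jaguar", "cat", "speed", "prey"],
    ["jaguar", "cat", "habitat"],
    ["jaguar", "car", "speed"],
]
ranked = rank_features(query, top_docs)
expanded = reformulate(query, select_features(ranked, k=2))
print(expanded)
```

Keeping the original terms at full weight and the added terms at a lower weight is one simple way to limit the query drift that ambiguous expansion terms can cause.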
IV. Conclusion
The performance of an information retrieval system can be enhanced using automatic query expansion methods. There is still no single efficient method for handling the vocabulary problem in information retrieval. Several kinds of techniques are available, namely query-specific, linguistic, corpus-specific, search-log based and web-data based techniques. These methods suit different requirements such as computational efficiency, query type, availability of external data and features of the underlying ranking system. Many experiments on automatic query expansion have been reported, and this existing research has achieved a remarkable improvement in retrieval effectiveness, with gains in recall and precision.
References
[1] Sharma, M. and Patel, R. A survey on information retrieval models, techniques and applications. International Journal of Emerging Technology and Advanced Engineering 3 (11) (2013) 542-545.
[2] Sy, M.F., Ranwez, S., Montmain, J., Regnault, A., Crampes, M. and Ranwez, V. User centered and ontology based information retrieval system for life sciences. BMC Bioinformatics 13 (1) (2012) 1471-2105.
[3] Roshdi, A. and Roohparvar, A. Information Retrieval Techniques and Applications. International Journal of Computer Networks and Communications Security 3 (9) (2015) 373-377.
[4] Silberschatz, A., Korth, H.F. and Sudarshan, S. Database system concepts. New York: McGraw-Hill, 1997, 915-943.
[5] Ooi, J., Ma, X., Qin, H. and Liew, S.C. A survey of query expansion, query suggestion and query refinement techniques. IEEE 4th International Conference on Software Engineering and Computer Systems (ICSECS), 2015, 112-117.
[6] Xu, J. and Croft, W.B. Query expansion using local and global document analysis. Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, 1996, 4-11.
[7] Carpineto, C. and Romano, G. A survey of automatic query expansion in information retrieval. ACM Computing Surveys (CSUR) 44 (1) (2012) 1-50.
[8] Li, Y., Bandar, Z.A. and McLean, D. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering 15 (4) (2003) 871-882.
[9] Abdelmgeid Amin, A. Using a query expansion technique to improve document retrieval. International Journal Information Technologies and Knowledge 2 (2008).
[10] Hoeber, O., Yang, X.D. and Yao, Y. Conceptual query expansion. International Conference on Atlantic Web Intelligence, 2005, 190-196.
[11] Peng, M., Lin, Q., Tian, Y., Yang, M., Xiao, Y. and Ni, B. Query expansion based on Conceptual Word Cluster Space Graph. IEEE 5th International Conference on New Trends in Information Science and Service Science (NISS), 2011, 128-133.
[12] Jain, A., Mittal, K. and Sabharwal, S. Conceptual weighing Query Expansion on user profiles. National Conference on Communication Technologies & its impact on Next Generation Computing CTNGC Proceedings published by International Journal of Computer Applications (IJCA), 2012.
[13] Akrivas, G., Wallace, M., Andreou, G., Stamou, G. and Kollias, S. Context-Sensitive Semantic Query Expansion. IEEE International Conference on Artificial Intelligence Systems (ICAIS), 2002, 109-114.
[14] Kang, J.W., Kang, H.K., Ko, M.C., Jeon, H.S. and Nam, J. A Term Cluster Query Expansion Model Based on Classification Information in Natural Language Information Retrieval. IEEE International Conference on Artificial Intelligence and Computational Intelligence (AICI), 2010, 172-176.
[15] Chang, C.H. and Hsu, C.C. Integrating query expansion and conceptual relevance feedback for personalized web information retrieval. Computer Networks and ISDN Systems 30 (1-7) (1998) 621-623.
[16] Hahm, G.J., Yi, M.Y., Lee, J.H. and Suh, H.W. A personalized query expansion approach for engineering document retrieval. Advanced Engineering Informatics 28 (4) (2014) 344-359.
[17] Karisani, P., Rahgozar, M. and Oroumchian, F. A query term re-weighting approach using document similarity. Information Processing & Management 52 (3) (2016) 478-489.
[18] Tayal, D.K., Sabharwal, S., Jain, A. and Mittal, K. Intelligent query expansion for the queries including numerical terms. Proceedings of International Journal of Computer Applications, 2012, 35-39.
[19] Lin, H.C., Wang, L.H. and Chen, S.M. Query expansion for document retrieval by mining additional query terms. Information and Management Sciences 19 (1) (2008) 17-30.
[20] Hust, A., Klink, S., Junker, M. and Dengel, A. Query Expansion for Web Information Retrieval. In GI Jahrestagung, 2002, 176-182.
[21] Zhang, J., Deng, B. and Li, X. Concept based query expansion using wordnet. In Proceedings of the 2009 international e-conference on advanced science and technology, 2009, 52-55.
[22] Mittal, K. and Jain, A. Word Sense Disambiguation Method using Semantic Similarity Measures and OWA Operator. ICTACT Journal on Soft Computing 5 (2) (2015) 896-904.
[23] Jain, A., Mittal, K. and Tayal, D.K. Automatically incorporating context meaning for query expansion using graph connectivity measures. Progress in Artificial Intelligence 2 (2-3) (2014) 129-139.
[24] Navigli, R. and Lapata, M. An experimental study of graph connectivity for unsupervised word sense disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (4) (2010) 678-692.
[25] Barathi, M. and Valli, S. Ontology based query expansion using word sense disambiguation. International Journal of Computer Science and Information Security 7 (2) (2010).
[26] Parapar, D., Barreiro, Á. and Losada, D.E. Query expansion using word net with a logical model of information retrieval. IADIS AC, 2005, 487-494.
[27] Singh, J., Sharan, A. and Siddiqi, S. A literature survey on automatic query expansion for effective retrieval task. International Journal of Advanced Computer Research 3 (3) (2013) 170-178.
[28] Muralidharan, M. and Mayil, V.V. A Survey: Efficient Semantic Comparison & Effective Information Retrieval System. International Journal of Trend in Research and Development (IJTRD), 2016, 33-36.