Fuzzy Information Retrieval in WWW

Fuzzy Information Retrieval in WWW: A Survey Shruti Kohli Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India E-mail: [email protected]

Ankit Gupta Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India E-mail: [email protected]

Biographical Notes: Dr. Shruti Kohli is an Assistant Professor in the Department of Computer Science at Birla Institute of Technology, Mesra, Ranchi, India. She received a Master's degree in Operational Research from the University of Delhi, India, in 2001 and a Master's degree in Computer Application from IGNOU, India, in 2002. She received her M.Phil. in Operational Research from the University of Delhi, India, in 2004 and her Ph.D. (in Technology) from Birla Institute of Technology, Mesra, Ranchi, India, in 2012. Her doctoral research was in web intelligence. Her areas of interest include information retrieval, operational research, data mining and web analytics. She has presented papers at many international and national conferences and has been a resource person in DST-sponsored FDPs. She is an active blogger with a strong interest in mobile app development. She has conducted mobile app workshops at the institute and currently runs the Mobile Incubator Cell in her college. She is an active member of IEEE, IAENG International Society for Engineers and Soft Computing Research Society.

Ankit Gupta received his B.Tech. degree in Computer Science and Engineering in 2004 from Uttar Pradesh Technical University, Lucknow, India, and his M.Tech. in Computer Science in 2012 from Birla Institute of Technology, Mesra, Ranchi, India. He is currently enrolled as a doctoral candidate in the Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India. His research focuses on information retrieval, web analytics and web mining. He has attended many national-level workshops, seminars, faculty development programs and conferences.

Abstract: Information Retrieval has been integral to the storage and retrieval of meaningful information since the early age of computing. The last fifteen years have seen a drastic paradigm shift from classical database retrieval to web-based information retrieval. The World Wide Web has established itself as one of the primary modes of information storage, sharing and searching. The size of the WWW and the number of its users have made retrieval and retrieval systems far more complicated as well as more sophisticated, and have thus posed a grand challenge to researchers and developers: to design a modern Information Retrieval system that returns query results matching the user's requirements effectively and efficiently. This survey paper attempts to identify some of the challenges faced by modern retrieval systems in the effective retrieval of information, and the different fuzzy logic methodologies that try to make this mammoth and complex task simpler.

Keywords: Fuzzy logic; Web Intelligence; Information Retrieval; Semantic Web

1. Introduction

Information Retrieval systems have been in use for a very long time, although the mode of operation has changed from the "table of contents" of a book to fully automated indexing and retrieval of a variety of information. In recent times, libraries were the first to use this kind of automated system. But as more information goes online, it poses many challenges to the architects of modern Information Retrieval systems, given that there are about 12.6 billion indexed web pages and approximately 3.5 billion internet users [1-2]. In fact, the WWW has revolutionized the way we gather, process and use information. It has reformed the way we handle daily aspects of our lives such as business, education and commerce [3]. According to a report by McKinsey and Company, the overall value of internet search is estimated at around $780 billion [4].

In the year 2000, the term Web Intelligence (WI) was coined by Zhong et al. [3], who described Web Intelligence as "..a new direction for scientific research and development to explore the fundamental roles as well as practical impacts of Artificial Intelligence and advanced information technology on the next generation of web empowered products systems and services and activities." The overall goal of WI is to make the end user's experience a pleasant one. Some critical domains of WI are web prediction, web mining, web prefetching, the Semantic Web and information retrieval.

Web prediction algorithms seek to discover patterns that allow them to foresee subsequent web user demands. Applications of this type of prediction algorithm include recommendation systems, e-commerce, content personalization and the reduction of user-perceived latency by means of web prefetching and cache pre-validation [5]. In web prefetching, the web browser requests objects before the user requires them. To do so, the browser relies on hints provided by a prediction engine. Web usage mining (WUM) systems are specially designed to analyze data representing the usage of a particular web site. WUM can model user behavior and can therefore forecast the future movements of users [111].
Online prediction is one WUM application. Information retrieval is the activity of retrieving meaningful information relevant to a specific information need from a collection of information resources. Searches can be based on metadata or full-text indexing. The Semantic Web is a collective movement initiated by the World Wide Web Consortium (W3C). Its objective is to promote the use of common data formats on the web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims to convert the current web, dominated by unstructured and semi-structured documents, into a "web of data" [112].

The objective of this survey paper is to discuss the various fuzzy logic methodologies being used in the area of Information Retrieval. Some survey papers on mining have already been presented in [6-8], but these articles focused primarily on soft computing methods for web mining problems and discussed only a fraction of Information Retrieval; they were also written when the web was still in its infancy. Some survey papers and books on Information Retrieval [9-11] have also appeared in the recent past, but the use of fuzzy logic methodologies in solving web IR problems remains untouched in these works. Kobayashi et al. [12] published scholarly work in the area of web and IR, but their work focused mainly on the growth of the internet and the different technologies used for Information Retrieval.

The rest of the paper is organized as follows: Section 2 provides a brief introduction to Information Retrieval, its models and the differences between classical and web-based IR.
Section 3 describes some of the major challenges faced by any web-based Information Retrieval system, Section 4 gives an insight into the use of various Soft Computing methodologies in IR, Section 5 explains some of the fuzzy logic methodologies in use in the area of Information Retrieval, and Section 6 concludes the paper with some possible future directions. [Space for Figure 1]

2. Information Retrieval (IR)

Information Retrieval is a way to access information available on the internet, intranets, databases and data repositories. An Information Retrieval system takes the user's query as input and returns a set of documents sorted by their relevance to the query. IR systems are usually based on the segmentation of documents and queries into index terms; relevance is computed from the index terms they have in common as well as from other information such as characteristics of the documents (e.g., number of words, hyperlinks between papers or bibliographic references) or some probabilistic information [13].

2.1 Information Retrieval: A Brief Introduction

An Information Retrieval process typically consists of four small modules [8][12]:
1. Indexing: Described as "generation of document representation." An index is a collection of terms with pointers to places where information about the documents can be found. Four approaches to indexing documents on the web are (i) human or manual indexing, (ii) automatic indexing, (iii) intelligent or agent-based indexing, and (iv) metadata, RDF and annotation-based indexing.
2. Querying: Expression of user preferences through natural language or terms connected by logical operators.
3. Evaluation: Performing the matching between the user query and the document representation.
4. User profile construction: Storage of terms representing user preferences, primarily to enhance retrieval in future accesses by the user.

Fig. 2 describes the different steps of a typical IR process, while Fig. 3 illustrates various aspects of IR.

[Space for Figure-2] [Space for Figure-3]

Formally, an Information Retrieval system based on the conventional fuzzy set model can be defined as a quadruple <T, Q, D, F> [15], where:
1- T is the set of index terms used to represent queries and documents.
2- Q is the set of queries that can be recognized by the system. Each query q∊Q is a legitimate Boolean expression composed of index terms and the logical operators AND, OR and NOT.
3- D is a set of documents, D = {d1, d2, …, dn}. Each document di∊D is represented by ((t1,ei1), (t2,ei2), …, (tm,eim)), where eij denotes the weight of term tj in document di and may take any value between 0 and 1, with 1≤i≤n and 1≤j≤m. The degrees of strength eij of term tj in document di are determined either subjectively, by the author of the document, or objectively, by some algorithmic procedure [10].
4- F is a retrieval function given by:

$$F: D \times Q \to [0,1] \tag{1}$$

F assigns to each pair (d, q) a number in the closed interval [0,1]. This number is a measure of the similarity between the document d and the query q and is called the document value of d with respect to q. The retrieval function can be defined as follows: (i) for each term ti in a query, F(d, ti) is defined as the weight of ti in document d; (ii) logical operators are then evaluated according to certain predefined operator rules.

A fundamental aim of any information retrieval system is to find the relevant information in a pool of knowledge while rejecting as much non-relevant information as possible. There are two primary measures of the effectiveness of an IR system, Recall and Precision, described as follows. Let D be a set of documents and q a query. Let TP ⊆ D be the set of relevant retrieved documents, FP ⊆ D the set of retrieved documents that are not relevant to q, and FN ⊆ D the set of documents that are relevant to the user's query but are not retrieved by the IR system. Then Recall (R) and Precision (P) can be calculated as:

$$R = \frac{|TP|}{|TP| + |FN|} \tag{2}$$

and

$$P = \frac{|TP|}{|TP| + |FP|} \tag{3}$$

Precision is the fraction of retrieved documents that are relevant to a query, while Recall is the fraction of "truly" relevant documents that are actually retrieved. Precision is a measure of the "soundness" of the IR system, while Recall provides a measure of its "completeness" [113]. Each measure has its advantages and disadvantages. To balance the two, a combined measure for evaluating an IR system, the F-measure, was introduced. It is given as:

$$F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{(\beta^2 \cdot P) + R} \tag{4}$$

Usually, the value of the variable β is taken as 1; the resulting measure is called the F1-measure, which is also the harmonic mean of Recall and Precision.
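The retrieval function F of equation (1) and the measures of equations (2) to (4) can be sketched together in a few lines of Python. This is a minimal illustration, not the implementation of any system surveyed here: the tuple-based query encoding, the function names, and the choice of min, max and complement as the "predefined rules" for AND, OR and NOT are assumptions made for the example.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical query encoding (an assumption of this sketch): an index term
# (str) or a nested tuple ("AND", q1, q2, ...), ("OR", q1, q2, ...), ("NOT", q1).
Query = Union[str, tuple]


def document_value(doc: Dict[str, float], query: Query) -> float:
    """Return F(d, q) in [0, 1] for a document given as {term: weight}."""
    if isinstance(query, str):
        # F(d, t) is the weight of term t in d (0 if the term is absent).
        return doc.get(query, 0.0)
    op, operands = query[0], query[1:]
    values = [document_value(doc, sub) for sub in operands]
    # Assumed operator rules: min for AND, max for OR, 1 - x for NOT.
    if op == "AND":
        return min(values)
    if op == "OR":
        return max(values)
    if op == "NOT" and len(values) == 1:
        return 1.0 - values[0]
    raise ValueError(f"unsupported query node: {query!r}")


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall as in equations (3) and (2)."""
    tp = len(retrieved & relevant)   # relevant and retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but not retrieved
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure as in equation (4); beta = 1 gives the harmonic mean."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    doc = {"fuzzy": 0.8, "retrieval": 0.6, "web": 0.3}
    query = ("AND", "fuzzy", ("OR", "retrieval", ("NOT", "web")))
    print(document_value(doc, query))

    relevant = {f"d{i}" for i in range(25)}       # 25 relevant documents
    retrieved = {f"d{i}" for i in range(5, 35)}   # 20 relevant + 10 others
    p, r = precision_recall(retrieved, relevant)
    print(p, r, f_measure(p, r))
```

With 20 relevant and 10 non-relevant documents retrieved out of 25 relevant ones, the sketch yields a precision of 20/30 and a recall of 20/25. Other operator rules (for example product and probabilistic sum) could be substituted in `document_value` without changing the rest of the code.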

Example 1: Assume we have a knowledge base of 100 documents, of which only 25 are relevant to the user. If the Information Retrieval system returns 20 relevant documents and 10 non-relevant documents, the recall is 20/25 (80%) and the precision is 20/30 (66.67%).

2.2 Different IR Models

Four types of IR model are used to model solutions to retrieval problems [16][17]:
1- Boolean Model: This is the first model of Information Retrieval introduced in the literature. Initial IR models were based on Boolean queries and set theory. Documents are represented as sets of terms and queries are Boolean expressions on terms [136]. The retrieval mechanism performs an exact match, classifying documents that satisfy the Boolean query as relevant and all other documents as irrelevant. This model is used by virtually all commercial textual document retrieval systems.
2- Vector Space Model: Documents and queries are represented as vectors in the space of all possible index terms. The document vector consists of weights based on term frequencies in the collection, while the query vectors are binary vectors on the terms. Matching is based on a similarity measure between the documents and the query.
3- Probabilistic Model: Documents are represented as binary vectors. The queries are vectors of terms with weights based on the estimated probability of relevance of documents containing those terms.
4- Semantic Model: Semantic IR models provide solutions to the IR problems that arise in the Semantic Web. The term Semantic Web was coined by Berners-Lee et al. [61] in 2001. According to the World Wide Web Consortium, "The semantic web provides a common framework that allows data to be shared and reused across application, enterprise and community boundaries." In other words, the Semantic Web can be described as an extension of the current web. Information in the Semantic Web is expected to be presented in a well-defined format, enabling people and computers to work in cooperation. Data in the Semantic Web is defined and linked in such a way that it can be used for more effective automation, discovery, integration and reuse across applications. The data and the resulting information can be processed and shared by automated tools as well as by people [62].
The fundamental concept of the Semantic Web is to extend the current human-readable web into machine-readable form by encoding some of the semantics of available resources. The functional architecture of the Semantic Web aims to move beyond syntax-based resource storage and retrieval. It will enable modern computer systems to process, search, present and integrate the content of these web-based resources in an intelligent and meaningful manner [63].

[Space for Table-1]

The Boolean, vector space and probabilistic models are considered the classical IR models [9-10]. Other variants such as Language Models, Linguistic Models, the Extended Boolean Model and the Belief Network model have also been introduced, but most of them are extensions of the classical IR models discussed above. A brief taxonomy of IR models is given in Table 1. After the popularization of the Web, the semantic model of IR was also proposed. These different IR models cater to different user needs; no IR model is entirely satisfactory in all scenarios. Attempts to establish the superiority of one of the two statistical models, vector space and probabilistic, over the other have been inconclusive: some results favored the vector space model and others the probabilistic approach [10]. Vector space models have proved efficient for similarity search and relevance feedback when a suitable weighting function is available. The probabilistic approach works best where some prior information about the data is available [14].

2.3 Difference between Classical and Web-based IR

Two types of Information Retrieval system are proposed in the literature: (i) classical and (ii) web-based. The two are separated by a thin line. A classical IR system deals with offline databases, while a web-based IR system deals with retrieval on the Web. But as the data of most organizations moves online, the difference between these two approaches is shrinking. Broadly, both have come under mining approaches [38].

Classical IR deals with the offline data of entities such as companies and organizations. The advantage here is that most users of this data are familiar with the systems, as the majority of them are more or less part of the entities concerned. Most such IR systems therefore build on prior knowledge of the expected query types and indexing types, which makes it relatively easy to develop models. The problem with web-based IR systems is that they mostly deal with online data retrieval, and their primary users are familiar with neither the data nor the system. The challenge lies in inferring the expected query type from inexperienced, unfamiliar users and then mapping it to a particular document index. Besides this, because an offline classical IR system deals with a particular data type, an expected target audience and specific queries, the mapping between user query and document index can be done quite effectively. On the other hand, the web is vast and dynamic; its data is distributed, heterogeneous, semi-structured, time-varying and high-dimensional; and the information it contains is imprecise, incorrect, inconsistent, uncertain, partially true and fast-changing (refer to Fig. 1). Developing an IR system that meets these different needs is therefore a significant challenge.

The paragraphs above explain the difference between classical and web-based IR systems. They make clear that a system developed for classical IR is not well suited to the challenges posed by web-based systems, and they point to the need for web-based IR systems that cater to the needs of the system as well as of the users described above. Much research has been done on classical IR systems based on fuzzy logic, but research on web-based IR systems using fuzzy logic is still at an early stage. This paper deals with some of the major research work related to web-based IR systems using fuzzy logic.

3. Challenges of Web Based Information Retrieval System

3.1 Evolution of Web

Today's web is the result of continuous and concurrent research and development of a consortium of technologies proposed by individuals and groups [49]. The first recorded description of social interaction based on computer networking is attributed to J. C. R. Licklider of MIT, who in August 1962 [159] wrote a series of memos discussing the concept of a galactic network. The concept of the ARPANET, the predecessor of the Internet, was published by Roberts in 1967 [160]. The first public demonstration of this new technology took place at the International Computer Communication Conference in October 1972 [49]. During the 1990s, the concepts of hypertext and the World Wide Web changed the face of the then Internet. These concepts led to the era of Web 1.0. Web 1.0 consisted of static web pages, and the responsibility for creating information and content on these pages lay with the companies and organizations that created and maintained them. End users of Web 1.0 were considered consumers of information, with no role in creating or updating it. The second era of the web, Web 2.0, is considered both a technology and a usage paradigm. Murugesan [161] defined Web 2.0 as "..of technologies, business strategies, and social trends." Web 2.0 empowers end users to control and create content on the web and to participate in various web-related activities so as to relate to other users. The third era of the web, Web 3.0, is still in its infancy. Silva et al. [162] describe Web 3.0 as "..consists of new generation of web applications that will have specific core technologies to support them." The technologies that will affect the development of Web 3.0 include the semantic web, web 3D, the media-centric web, the social web, and the pervasive and ubiquitous web.

3.2 Challenges

Today's web is a combination of technologies, content and persons. Typical web data contains text, images, audio, video and the links between them. We categorize the challenges encountered in the web-based IR process into three domains, namely the data domain, the information domain and the user query domain (Fig. 1):

3.2.1 Data Domain

Web data is distributed, heterogeneous, semi-structured, time-varying and high-dimensional:

- Distributed: Web data is usually scattered across servers located in different geographical regions.
- Heterogeneous: Web data comes as images, text, audio and video, i.e., in different modes, which makes it heterogeneous.
- Semi-structured: The data is available in many different formats and is structured as the website administrator sees fit. Although user preferences are taken into account when structuring the data, the enormous number of users makes it impossible to standardize the data to fit every user.
- High-dimensional: The web is growing exponentially. Today's web data instances are far more numerous than those we were used to observing when studying patterns.
- Time-varying: The web is full of data that is not regularly updated even though the underlying information has changed. This leads to more inconsistency in the information.

3.2.2 Information Domain

To understand the different areas of challenge in this domain, consider the following statements:
1. Apple trees are found in the middle Himalayas.
2. Apple tree height is 20 feet, but I can't tell you this for sure.
3. Apple tree height can reach 200 feet.
4. Apple tree height is 20+ feet.
5. Apple production has varied from 20 kg to 200 kg over the last 100 years.

- Imprecise: Consider statement 1. This information is certain but not precise. The Himalayas cover hundreds of thousands of square kilometres and consist of many ranges, so the statement does not say exactly where apple trees can be found.
- Uncertain: Consider statement 2. This information is uncertain, though it is specific.
- Incorrect: Consider statement 3. This information is incorrect.
- Partial truth: Consider statement 4. This information is correct, but it does not cover different aspects of the height of an apple tree, such as whether the tree is pruned or dwarf, or which species it belongs to.
- Inconsistent: Consider statement 5. This shows inconsistency in the apple production of a particular orchard.
- Fast-changing: The best example is news sites, where the information changes frequently, sometimes every 10 seconds.

In fact, the web is full of this kind of data and information, which makes it difficult to retrieve related and relevant information.

3.2.3 User Query Domain

As already discussed, today's web has billions of web pages. Besides this, today's web is very unstructured and messy, and contains incomplete, inconsistent, incorrect and fast-changing information [19]. Today's IR systems also have to deal with many different types of search problems, such as [20]:
- Users are not conscious of their exact information needs.
- Information needs change as the user receives information during the query.
- Asking the system about information needs is not usually easy.
- Search criteria might depend on the user's personal goals.

For example, if a user searches for "Apple", he might be asking the search engine about the fruit, but the search engine might instead return results for Apple Inc., an IT giant. As another example, a Ph.D. candidate might issue search requests like "find recent publication written by Zadeh" or "find highly cited publication talking about description logic." In this case "highly", "recent" and even "about" are so-called fuzzy or vague predicates, since they can hardly be given a precise definition in the way the predicate "written by" can [21]. Lucarella [22] pointed out that imprecise and uncertain information in an IR system environment comes from three major aspects: the representation of users' queries, the representation of documents, and the relevance relationship between users' queries and documents. With this in mind, most IR systems are developed around the concept of relevance [16].

[Space for Table 2]
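Vague predicates such as "recent" and "highly cited" can be modelled as fuzzy membership functions that return a degree in [0, 1] rather than a yes/no answer. The sketch below is illustrative only: the breakpoints (publications up to 2 years old are fully recent and those older than 10 years are not recent; 50 to 500 citations for "highly cited"), the reference year 2014, the function names, and the use of min to combine the two predicates are assumptions chosen for the example, not values taken from the literature surveyed here.

```python
def falling_ramp(x: float, full: float, zero: float) -> float:
    """Membership 1 up to `full`, decreasing linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising_ramp(x: float, zero: float, full: float) -> float:
    """Membership 0 up to `zero`, increasing linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def recent(year: int, current_year: int = 2014) -> float:
    """Degree to which a publication year counts as 'recent' (assumed breakpoints)."""
    return falling_ramp(current_year - year, full=2, zero=10)


def highly_cited(citations: int) -> float:
    """Degree to which a citation count is 'highly cited' (assumed breakpoints)."""
    return rising_ramp(citations, zero=50, full=500)


def recent_and_highly_cited(year: int, citations: int, current_year: int = 2014) -> float:
    """Conjunction of both vague predicates, using min as the assumed AND."""
    return min(recent(year, current_year), highly_cited(citations))
```

Under these assumed breakpoints a 2008 paper is "recent" to degree 0.5, so documents can be ranked by how well they satisfy the query instead of being cut off at an arbitrary year.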

4. Information Retrieval and Soft Computing

A number of Artificial Intelligence (AI) methodologies (Fig. 1) work together to achieve the consolidated goal of Web Intelligence. Soft Computing is a domain of AI that comprises areas such as Fuzzy Logic, Artificial Neural Networks, Rough Sets and Genetic Algorithms. These methods are considered more flexible and less computationally demanding than other AI techniques. According to Zadeh [18], the guiding principle of Soft Computing is "to exploit the tolerance for imprecision, uncertainty, partial truth, and approximation to achieve tractability, robustness, low solution cost and better rapport with reality." Fuzzy sets provide a natural framework for dealing with uncertainty. Artificial Neural Networks are widely used for modeling complex functions and provide learning and generalization capabilities. Genetic algorithms are efficient search and optimization tools. Rough sets help in granular computation and knowledge discovery [7]. Apart from these soft computing tools, Natural Language Processing (NLP) and Swarm Intelligence are areas of ongoing research for smooth information retrieval. NLP is an area of AI that gathers knowledge about human behavior and understanding and then maps this understanding to new tools and techniques that make computer systems act accordingly. Swarm intelligence is the study of the collective behavior of individuals working in a group in a decentralized but self-organized environment. Cerulo et al. [11] established in their survey paper that the best methodology for dealing with uncertainty is fuzzy theory.

5. Different Fuzzy Approaches towards IR

Fuzzy logic has been used extensively in the domain of Information Retrieval since the early days of networking. It has become more relevant today because of its inherent ability to handle the complex problems posed by the exponentially increasing size of the web and of web information. The last decade witnessed progress in Information Retrieval in the form of various retrieval mechanisms for better and more efficient query-document mapping, targeted at static, offline data repositories such as company and institutional databases; the field has since turned its core focus towards online/offline, dynamic and fast-growing content. The challenges discussed in Section 3 clearly demonstrate that today's information retrieval systems need to evolve and follow different strategies to cope with these new and fundamental challenges. Globalization has forced the corporate as well as the academic world to go global, and their data with them, prompting them to convert their offline static data into online dynamic data. With this in mind, this paper sets out to identify the different methodologies of the Information Retrieval paradigm currently in practice.

This paper mainly takes into account the advances introduced in the literature in this field over the last 20 years. Based on the data available, we divide the last 20 years into three parts: (i) 1991-2003, (ii) 2001-2007 and (iii) 2006-2014. Table 2 consolidates the scholarly work on IR using fuzzy logic during this period. We have categorized the related work into eight sub-sections. The first sub-section gives details of work based on the fuzzy concept network. The second and third sub-sections give some insight into formal concept analysis and fuzzy formal concept analysis, with a short description of fuzzy ontologies. The fourth sub-section describes various fuzzy operators used for effective IR. The fifth sub-section describes fuzzy agents. Although research in the area of agents dates back to the early 1990s, most of the fuzzy integration into agent-based IR research was done in the last decade, so this sub-section mostly includes literature from 2000 onwards. The sixth sub-section presents work done in the core area of the Semantic Web. The seventh sub-section covers fuzzy logic work related to fuzzy genes, fuzzy clustering, multiple related ontologies and response time. The eighth sub-section gives a brief insight into rough sets and their integration with fuzzy logic.
5.1 Fuzzy Concept Network (FCN)

The term "concept" refers to the identification of the significant domain terms of a document. It may be considered an extension of the indexing process. One of the earliest works describing the "concept" was proposed in [23], where the author tried to extract pertinent noun groups from documents. This "concept" was further extended and proposed as the Fuzzy Concept Network in [24] for information retrieval from a knowledge base. An FCN consists of nodes and directed links (Fig. 4). The nodes represent concept frames, while the directed links represent semantic relationships between concepts. A node can represent a concept or a document. A real value µ∊[0,1] is associated with each directed link and reflects the strength of the semantic association between the nodes. Typically, a knowledge base K can be viewed as a directed graph (C, T, L), where C is the set of concepts that represent meaningful entities in the domain, T is the set of admissible relation types and L ⊆ C × T × C is the set of links between concepts. Accordingly, given t∊T, a link can be defined as a binary fuzzy relation l∊L as:

$$l = \{\, \mu_1(c,r)/(c,r) \mid c, r \in C \,\} \tag{5}$$

with a membership function µ1: C × C → [0,1] indicating, for each pair (c,r), a measure of the strength of the semantic link between the two concepts. The notation (c,r) means that c and r are linked to a degree given by µ1(c,r). With the help of this model and multivalued fuzzy logic (Modus Ponens and Modus Tollens), the authors presented FIRST: Fuzzy Information Retrieval System. Fuzzy relations as links between the concepts of a concept network were proposed in [25]. Four types of relationship were described:
1. Fuzzy Positive Association: It relates concepts that have a similar meaning in some context, i.e., that are typically used in the same context (e.g., person ↔ individual, person ↔ address).
2. Fuzzy Negative Association: It relates concepts that are complementary (e.g., male ↔ female), incompatible (e.g., unemployed ↔ freelancer) or antonyms (e.g., small ↔ large).
3. Fuzzy Generalization: One concept is regarded as a generalization of another concept if it includes that concept in an analytic sense (e.g., person → student) or a partitive sense (e.g., machine → screw), or if it includes that concept (e.g., vehicle → car).
4. Fuzzy Specialization: It is the inverse of the generalization relation (e.g., student → person).

[Space for Figure-4] [Space for Figure-5]

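Each member of a relation has a numeric value between 0 and 1 associated with it that expresses the strength of the relationship. Its mathematical representation is shown in Table 3.

[Space for Table 3]

These relations hold subject to the following restrictions:

$$1.\quad \mu_P(x,y) \neq 0 \;\rightarrow\; \mu_N(x,y) = 0,\quad \mu_G(x,y) = 0,\quad \mu_S(x,y) = 0,\quad \mu_P(y,x) = \mu_P(x,y) \tag{6}$$

$$2.\quad \mu_N(x,y) \neq 0 \;\rightarrow\; \mu_P(x,y) = 0,\quad \mu_G(x,y) = 0,\quad \mu_S(x,y) = 0,\quad \mu_N(y,x) = \mu_N(x,y) \tag{7}$$

$$3.\quad \mu_G(x,y) \neq 0 \;\rightarrow\; \mu_P(x,y) = 0,\quad \mu_N(x,y) = 0,\quad \mu_S(x,y) = 0,\quad \mu_S(y,x) = \mu_G(x,y) \tag{8}$$

$$4.\quad \mu_S(x,y) \neq 0 \;\rightarrow\; \mu_P(x,y) = 0,\quad \mu_N(x,y) = 0,\quad \mu_G(x,y) = 0,\quad \mu_G(y,x) = \mu_S(x,y) \tag{9}$$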
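Restrictions (6) to (9) say that at most one of the four relations may hold for an ordered pair of concepts, that positive and negative association are symmetric, and that generalization and specialization are inverses of each other. A minimal sketch of a consistency check is given below; the dictionary layout, the function names and the example concepts are assumptions made for illustration and do not come from [25].

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
# Assumed layout: one {(x, y): degree} mapping per relation "P", "N", "G", "S".
Relations = Dict[str, Dict[Pair, float]]

# Relation that must hold for (y, x) with the same degree as `kind` on (x, y).
MIRROR = {"P": "P", "N": "N", "G": "S", "S": "G"}


def degree(relations: Relations, kind: str, x: str, y: str) -> float:
    """Membership degree of (x, y) in relation `kind`; 0 if not stored."""
    return relations.get(kind, {}).get((x, y), 0.0)


def violations(relations: Relations) -> List[str]:
    """List every breach of restrictions (6)-(9) found in `relations`."""
    problems = []
    for kind, pairs in relations.items():
        for (x, y), mu in pairs.items():
            if mu == 0:
                continue  # the restrictions only constrain non-zero degrees
            # A non-zero degree excludes the other three relations on (x, y).
            for other in MIRROR:
                if other != kind and degree(relations, other, x, y) != 0:
                    problems.append(f"{kind} and {other} both hold for ({x}, {y})")
            # P and N are symmetric; G and S mirror each other.
            if degree(relations, MIRROR[kind], y, x) != mu:
                problems.append(
                    f"{MIRROR[kind]}({y}, {x}) should equal {kind}({x}, {y})"
                )
    return problems


if __name__ == "__main__":
    network = {
        "P": {("person", "individual"): 0.9, ("individual", "person"): 0.9},
        "G": {("person", "student"): 0.8},
        "S": {("student", "person"): 0.8},
    }
    print(violations(network))
```

A knowledge base that stores, say, both a positive and a negative association for the same pair would be reported by `violations`, which makes such a check a natural guard when links are entered by hand.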

This model considers not only the directly linked concepts but also traverses one or more links to find related concepts. Besides the four relationships mentioned above, another relationship named "context relationship" was proposed between a concept representing a dialogue context and the dependent relationship between two concepts. The Extended Fuzzy Concept Network, consisting of nodes and directed links, was proposed in [26]. Each directed link connects two concepts, or connects a concept ci to a document dj, based on the following condition:

$$c_i \xrightarrow{(\mu,\,A)} X, \quad \text{where } A \in \{P, N, G, S\},\ X \in \{c_j, d_j\} \text{ and } \mu \in [0,1]. \tag{10}$$
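To make the link structure of (10) concrete, the sketch below models one directed link as a small Python record. The class name `Link`, its field names and the helper `link_types` are hypothetical and chosen only for illustration; [26] does not prescribe any data structure. Four relationship types combined with two kinds of target give 4 × 2 kinds of link.

```python
from dataclasses import dataclass
from itertools import product

RELATIONS = ("P", "N", "G", "S")        # the four fuzzy relationships A
TARGET_KINDS = ("concept", "document")  # X is either a concept c_j or a document d_j


@dataclass(frozen=True)
class Link:
    """One directed link c_i --(mu, A)--> X in the sense of (10)."""
    source: str       # the concept c_i
    target: str       # the concept c_j or document d_j
    target_kind: str  # "concept" or "document"
    mu: float         # degree of relevance in [0, 1]
    relation: str     # one of P, N, G, S

    def __post_init__(self) -> None:
        # Reject labels that fall outside the ranges allowed by (10).
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation {self.relation!r}")
        if self.target_kind not in TARGET_KINDS:
            raise ValueError(f"unknown target kind {self.target_kind!r}")
        if not 0.0 <= self.mu <= 1.0:
            raise ValueError("mu must lie in [0, 1]")


def link_types():
    """Every (relation, target kind) combination a link can have."""
    return list(product(RELATIONS, TARGET_KINDS))


# Hypothetical example: concept "car" is positively associated with document "d1".
example = Link(source="car", target="d1", target_kind="document", mu=0.6, relation="P")
print(len(link_types()))
```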

A total of 8 combinations of directed links are possible. Every directed link in an extended fuzzy concept network is labeled with a pair of values (µ, FR), where µ denotes the degree of relevance and FR denotes the fuzzy relationship between the concepts or between a concept and a document. In different application domains, the kind of relationship between any two concepts may be defined differently from various points of view. Thus, if only one of several possible relationships between concepts can be kept in the knowledge base, the resulting knowledge base is restrictive in nature and appropriate only for some specific application areas. With this in mind, an upgraded approach is presented in [27], where a multi-relationship fuzzy association replaces the single fuzzy association between concepts and concepts/documents. According to [27], a multi-relationship fuzzy concept network is denoted as MRFCN(E, L), where E is a set of nodes, each node standing for a concept or a document, and L is a set of directed edges between nodes. For this approach, Equation 10 is modified: (µ, A) is replaced with the quadruple (<µP, P>, <µN, N>, <µG, G>, <µS, S>), where µP, µN, µG and µS describe the fuzzy associations between the concepts, thus giving a very high level of membership among index terms and the query keyword. A comparative diagram of the concept network and the extended concept network is given in Fig. 4 and Fig. 5. The evolution of the concept network is given in Figure 6. [Space for Figure 6]

5.2 Formal Concept analysis (FCA)

Formal concept analysis was introduced by Wille [28] in 1982 for analyzing and structuring a domain of interest. It is a formal technique for data analysis and knowledge representation. It defines formal contexts to represent relationships between objects and attributes in a domain. From these, FCA can generate formal concepts and interpret the corresponding concept lattice, making information retrieval much easier. Formal concept analysis provides both a conceptual framework and a mathematical framework. The former is responsible for structuring, analyzing and visualizing data to make them more interactive and more understandable, while the latter gives a precise mathematical foundation to the data to support several activities in different research fields [29]. Based on lattice theory [30], formal concept analysis has distinguished itself as a major area of research that has found relevance in a variety of domains such as computer science, linguistics, medicine, sociology, psychology and mathematics. In computer science, it is extensively used in research domains such as software engineering, requirement analysis, component retrieval, information retrieval, conceptual modeling, artificial intelligence and object-oriented databases. During the last decade it has also established itself as a well-founded methodological approach for the construction of ontologies for semantic web development. Formal concept analysis can serve as a guideline for ontology building because it allows the identification of concepts by factoring out their commonalities while preserving the specialization relationships between concepts. The remaining part of this section covers the basics of formal concept analysis, followed by some of the scholarly research work done [Table 2]. Some terminologies related to FCA are [32]:

Definition: Formal Context: A formal context is a triple (O, A, R), where O and A are two sets of elements called objects and attributes, respectively, and R is a binary relation between O and A.

Definition: Formal Concept: Given a context (O, A, R), let E and I be two sets such that E ⊆ O and I ⊆ A. Then consider the dual sets E' and I', i.e. the set of attributes applying to all the objects belonging to E and the set of objects having all the attributes belonging to I, respectively:

$$E' = \{\, a \in A \mid oRa \ \ \forall o \in E \,\} \tag{11}$$

$$I' = \{\, o \in O \mid oRa \ \ \forall a \in I \,\} \tag{12}$$
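The two derivation operators in (11) and (12) translate almost directly into code. The sketch below is a brute-force Python illustration, assuming a small context stored as a set of (object, attribute) pairs. The function names and the toy context (three documents, three terms) are made up for this example. It collects every pair (E, I) with E' = I and I' = E by closing each subset of objects, which is exponential in the number of objects and only suitable for tiny contexts.

```python
from itertools import combinations


def attrs_of(objects, relation, all_attrs):
    """E' as in (11): the attributes shared by every object in E."""
    return frozenset(a for a in all_attrs
                     if all((o, a) in relation for o in objects))


def objs_of(attrs, relation, all_objs):
    """I' as in (12): the objects that have every attribute in I."""
    return frozenset(o for o in all_objs
                     if all((o, a) in relation for a in attrs))


def formal_concepts(all_objs, all_attrs, relation):
    """All pairs (E, I) with E' = I and I' = E, found by brute force."""
    concepts = set()
    for size in range(len(all_objs) + 1):
        for subset in combinations(sorted(all_objs), size):
            intent = attrs_of(subset, relation, all_attrs)  # E'
            extent = objs_of(intent, relation, all_objs)    # E''
            concepts.add((extent, intent))
    return concepts


# Toy context (hypothetical): which term occurs in which document.
objs = {"d1", "d2", "d3"}
attrs = {"fuzzy", "lattice", "web"}
R = {("d1", "fuzzy"), ("d1", "lattice"), ("d2", "fuzzy"), ("d3", "web")}

concepts = formal_concepts(objs, attrs, R)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Practical systems use dedicated lattice-construction algorithms instead of enumerating subsets; the sketch is only meant to show what the operators compute.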

A formal concept of the context (O, A, R) is a pair (E, I) such that E ⊆ O, I ⊆ A and the following conditions hold:

$$E' = I, \qquad I' = E. \tag{13}$$

The sets E and I represent the extensional and intensional components of the concept and are referred to as the extent and the intent of the concept, respectively. The extension covers all objects belonging to the concept, while the intension comprises all attributes valid for all those objects [31]. In information retrieval, objects and attributes correspond to documents and terms, respectively. This mapping makes it easy to use the concepts of FCA in information retrieval. As discussed earlier, FCA is used in a variety of areas, so it is important to map its concepts to the desired application area. Given two concepts (E1, I1) and (E2, I2) of a context (O, A, R), an inheritance relation (≤) can be established according to the following condition:

$$(E_1, I_1) \le (E_2, I_2) \iff E_1 \subseteq E_2 \quad (\text{iff } I_2 \subseteq I_1) \tag{14}$$

(E1, I1) is called a subconcept of (E2, I2), and (E2, I2) is called a superconcept of (E1, I1).

Definition: Concept Lattice: Given a context (O, A, R), consider the set of all concepts of this context, indicated as L(O, A, R). Then

$$(\mathcal{L}(O, A, R), \le) \tag{15}$$

is a complete lattice called the concept lattice (or Galois lattice). '≤' represents a partial order relation and, because the lattice is complete, every set of concepts has a greatest lower bound and a least upper bound. The concept lattice also has two special nodes, the maximum (labelled with ┬) and the minimum (labelled with ┴). During the years 2001-2006 [Table 2], formal concept analysis was extensively used to improve the retrieval of information. Formal concept analysis was used to develop domain ontologies with the help of the similarity graph. Anna Formica [32] explained how to map the concept lattice to a similarity graph. The author established that similar attributes are more important than common objects, which helps greater ontology integration. [Space for Figure-7]

5.3 Fuzzy Formal Concept analysis

Fuzzy formal concept analysis (FFCA) is a generalization of FCA for modeling uncertain information [34]. FFCA can support ontology construction when some information is more relevant than other data, or semantic web search when the user is not sure about what he/she is looking for. FFCA provides a mathematical framework that can support the construction of formal ontologies in the presence of uncertain data for the development of the semantic web [32]. Tho et al. [35] combined fuzzy logic and FCA to propose FFCA, in which uncertain information is directly represented by a real-valued membership in the range [0,1]. After the introduction of FFCA, research attention shifted towards it.

5.3.1 FFCA-Some Terminologies

Definition: Fuzzy Formal Context: A fuzzy formal context is a triple K = (G, M, I = φ(G × M)), where G is a set of objects, M is a set of attributes, and I is a fuzzy set on the domain G × M. Each relation (g, m) ∊ I has a membership value µ(g, m) in [0,1].

Fuzzy Representation of object: Each object O in a fuzzy formal context K can be represented by a fuzzy set Φ(O) as

$$\Phi(O) = \{A_1(\mu_1), A_2(\mu_2), \ldots, A_m(\mu_m)\} \tag{16}$$

where {A1, A2, …, Am} is the set of attributes in K and µi is the membership of O with attribute Ai in K. Φ(O) is called the fuzzy representation of O. An α-cut can be set to eliminate relations that have a low membership value. Since each relationship between an object and an attribute is represented as a membership value in a fuzzy formal context, the intersection of these membership values is their minimum (the intersection of fuzzy sets A and B is given by µA∩B(x) = min(µA(x), µB(x))).

Definition: Fuzzy Formal Concept: Given a fuzzy formal context K = (G, M, I) and a confidence threshold T, define:

$$A^* = \{\, m \in M \mid \forall g \in A : \mu(g,m) \ge T \,\} \quad \text{for } A \subseteq G \tag{17}$$

and

$$B^* = \{\, g \in G \mid \forall m \in B : \mu(g,m) \ge T \,\} \quad \text{for } B \subseteq M. \tag{18}$$

A fuzzy formal concept of a fuzzy formal context (G, M, I) with a confidence threshold T is a pair (Af = φ(A), B) where A ⊆ G, B ⊆ M, A* = B and B* = A. Each object g ∊ φ(A) has a membership µg defined as

$$\mu_g = \min_{m \in B} \mu(g, m) \tag{19}$$
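Definitions (17)-(19) can be illustrated with a few lines of Python. This is a minimal sketch, assuming the fuzzy context is stored as a dictionary from (object, attribute) pairs to membership values with missing pairs treated as 0. The function names, the toy memberships and the threshold T = 0.5 are assumptions of the example, not values taken from [35].

```python
def star_of_objects(A, mu, attributes, T):
    """A* as in (17): attributes whose membership is at least T for every object in A."""
    return frozenset(m for m in attributes
                     if all(mu.get((g, m), 0.0) >= T for g in A))


def star_of_attributes(B, mu, objects, T):
    """B* as in (18): objects whose membership is at least T for every attribute in B."""
    return frozenset(g for g in objects
                     if all(mu.get((g, m), 0.0) >= T for m in B))


def is_fuzzy_concept(A, B, mu, objects, attributes, T):
    """True when A* = B and B* = A for the threshold T."""
    return (star_of_objects(A, mu, attributes, T) == frozenset(B)
            and star_of_attributes(B, mu, objects, T) == frozenset(A))


def object_memberships(A, B, mu):
    """mu_g as in (19): the minimum membership of g over the intent B.

    The source sets mu_g = 1 when B is empty, which is the default used here.
    """
    return {g: min((mu.get((g, m), 0.0) for m in B), default=1.0) for g in A}


# Hypothetical fuzzy context: documents as objects, terms as attributes.
objects = {"d1", "d2", "d3"}
attributes = {"fuzzy", "retrieval"}
mu = {
    ("d1", "fuzzy"): 0.8, ("d1", "retrieval"): 0.6,
    ("d2", "fuzzy"): 0.9, ("d2", "retrieval"): 0.7,
    ("d3", "fuzzy"): 0.3, ("d3", "retrieval"): 0.9,
}
T = 0.5

A = {"d1", "d2"}
B = star_of_objects(A, mu, attributes, T)
print(sorted(B), is_fuzzy_concept(A, B, mu, objects, attributes, T))
print(object_memberships(A, B, mu))
```

In this toy context d3 is excluded from the extent because its membership for "fuzzy" falls below T, which is the thresholding effect of the confidence value.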

Here µ(g, m) is the membership value between object g and attribute m defined in I. If B = {}, then µg = 1 for every g. A and B are the extent and intent of the formal concept (φ(A), B), respectively. Further, Tho et al. [35] introduced the fuzzy concept lattice to calculate the similarity between a concept and its subconcept.

Definition: Fuzzy Formal Concept Cardinality: Since the fuzziness of a fuzzy formal concept is represented by the membership values of the objects of the concept, the cardinality of a fuzzy formal concept Kf = (φ(A), B) is defined as |Kf| = |φ(A)|.

Definition: Fuzzy Formal Concept Similarity: The similarity of a fuzzy formal concept Kf1 = (φ(A1), B1) and its subconcept Kf2 = (φ(A2), B2) is defined as E(Kf1, Kf2) = E(φ(A1), φ(A2)).

5.3.2 Fuzzy Ontologies

In classical literature, ontology is considered the study of the various types of things that exist. Chandrasekaran et al. [120] defined ontology as: "Ontologies are content theories about the sorts of objects, properties of objects, and relations between objects that are possible in a specified domain of knowledge." They further established that various kinds of information retrieval systems need domain ontologies to organize information and direct the search process. In the semantic web architecture, ontologies are specified in a machine-readable format and represent various aspects of domain knowledge by defining concepts, attributes of concepts and relationships between the concepts, with the taxonomic is-a relationship [119]. The last decade has witnessed new advances in integrating fuzzy logic for knowledge representation on the semantic web [122]. One result of this integration is fuzzy formal concept analysis (FFCA), discussed in the preceding section. FFCA takes a fuzzy formal context as input and gives a fuzzy concept lattice as output [121], which is then used to develop fuzzy ontologies. Most of the research work on generating fuzzy concept lattices uses two major approaches, namely i) the α-cut approach [123], and ii) the fuzzy closure operator [124]. Cross et al. [121] compared these two techniques on bioinformatics data (a gene annotation data file) and concluded that the fuzzy closure approach produces an extremely large number of fuzzy concepts compared to the thresholding approach. They also concluded that the extents produced by the threshold approach are a subset of those produced by the fuzzy closure approach [119]. In one of the most recent works on FFCA, Maio et al. [33] used FFCA for a better ontology-based retrieval mechanism supporting data organization and visualization, and proposed a better navigation model. A detailed study of formal concept analysis in knowledge processing can be found in [60]. Some other scholarly work done in the area of FFCA and IR is given in Table 2.

5.4 Fuzzy Operators

While the FCA and FFCA approaches use lattices for a better mapping of keywords to index terms, operators use mathematical expressions such as the geometric mean and the arithmetic mean for the same purpose. They can therefore also be considered an essential component of the information retrieval function 'F' described earlier in Section 2. MAX and MIN are the two earliest fuzzy operators introduced in the literature, and many other operators were introduced over time. These operators are divided into two broad categories based on their domain of use: T-operators and averaging operators. T-operators are generally used for logical operations like AND and OR, while averaging operators are used in multi-criteria decision making when there are multiple sources of information consisting of various experts. A detailed list of fuzzy operators is given in Table 4. In the earlier era of computing, variants of the t-norm and t-conorm operators dominated the information retrieval domain. Although some averaging operators such as p-norm, Waller-Kraft and infinite-one were also introduced, their approach was mainly based on "anding" and "oring" of the terms. The theory of T-operators starts with the MAX and MIN operators. Some basic terminology for understanding the behavioral properties of these operators is given in [15]; the main definitions are as follows.

Definition: An operator Ө is single operand dependent if Ө(x, y) is either x or y for all x, y ∊ [0,1]. It is called partially single operand dependent when one of the following conditions is satisfied: (i) Ө(0, x) = Ө(x, 0) = 0 or x, (ii) Ө(1, x) = Ө(x, 1) = 1 or x.

Definition: An operator Ө is negatively compensatory if Ө(x, y) is less than MIN(x, y) or greater than MAX(x, y) for all x, y ∊ [0,1]. When Ө(x, y) is less than MIN(x, y) or greater than MAX(x, y) only in some value ranges of x and y, it is called partially negatively compensatory.

Definition: An operator Ө is positively compensatory if Ө(x, y) is greater than MIN(x, y) and less than MAX(x, y) for all x, y ∊ [0,1], with the exception that Ө(x, x) is equal to x.

The working of these operators can be understood with the help of an example.

Example 2: Suppose we have two documents d1 and d2, each represented by two pairs of index terms and their weights:

Document 1, d1 = {(Desktop, 0.40), (Laptop, 0.40)},
Document 2, d2 = {(Desktop, 0.99), (Laptop, 0.39)},
Query, q1 = Desktop AND Laptop

Using the MIN operator for the AND operation returns the values 0.40 and 0.39 for documents d1 and d2, and thus d1 is retrieved with a higher rank than d2. However, d2 should be the obvious choice for this query: its Desktop weight is far higher, while its Laptop weight is only marginally lower.

Example 3: Suppose that we have two documents d3 and d4 and a query q3 as follows:

Document 3, d3 = {(t1, 0), (t2, .8), (t3, 1), …, (t100, 1)},
Document 4, d4 = {(t1, 0), (t2, .1), (t3, .1), …, (t100, 1)},
Query, q3 = t2 OR t100
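Both examples can be reproduced in a few lines. The Python sketch below scores the documents of Example 2 with MIN (for AND) and those of Example 3 with MAX (for OR). For contrast it also applies the arithmetic mean, used here only as one simple operator that meets the definition of positively compensatory; that choice is an assumption of this sketch and not a recommendation taken from [15]. Only the weights stated explicitly in the examples are included, and the elided terms t4 to t99 are left out.

```python
from statistics import mean

# Example 2: index-term weights as given in the text.
d1 = {"desktop": 0.40, "laptop": 0.40}
d2 = {"desktop": 0.99, "laptop": 0.39}

# Example 3: only the explicitly listed weights are reproduced.
d3 = {"t1": 0.0, "t2": 0.8, "t3": 1.0, "t100": 1.0}
d4 = {"t1": 0.0, "t2": 0.1, "t3": 0.1, "t100": 1.0}


def score(doc, terms, operator):
    """Aggregate the weights of the query terms with the given operator."""
    return operator([doc.get(term, 0.0) for term in terms])


and_terms = ["desktop", "laptop"]  # q1 = Desktop AND Laptop
or_terms = ["t2", "t100"]          # q3 = t2 OR t100

for name, op in (("MIN", min), ("mean", mean)):
    print("AND", name, score(d1, and_terms, op), score(d2, and_terms, op))

for name, op in (("MAX", max), ("mean", mean)):
    print("OR ", name, score(d3, or_terms, op), score(d4, or_terms, op))
```

Under MIN, d1 ranks above d2 even though d2 matches Desktop almost perfectly, and under MAX, d3 and d4 tie. With the mean, d2 and d3 come out ahead because every operand contributes to the score.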

In the above query, the degrees of satisfaction of documents d3 and d4 are the same, although a careful examination shows that document d3 is a more appropriate answer to the query q3. These two examples give a brief view of the working of fuzzy operators: both use the basic MAX and MIN operations and expose the deficiencies of that approach. In 1988, Yager [36] introduced the ordered weighted averaging (OWA) operator, where weights are assigned to index terms to calculate the information retrieval function 'F'. Many modifications of OWA operators were then introduced and are currently in use, such as geometric mean averaging operators and ordered weighted geometric operators. Lee et al. [15] investigated the various behavioral aspects of T-operators and concluded that Zadeh's basic fuzzy operators MAX and MIN are inappropriate for a model of an information retrieval system, as illustrated by the two examples above. Renata et al. [37] compared various aggregation operators for selecting applicants for a Ph.D. program and concluded that averaging operators have their own domain of discourse and work best when used in the environment for which they were developed. A detailed theory of T-operators is given in [40], and the theory of weighted averaging operators is given in [39][40]. [Space for Table 4]

5.5 Intelligent Agents

In the context of computer science and artificial intelligence, intelligent agents can be defined as programmable software capable of performing autonomous operations (based on some designated task) on behalf of a user or another program [144]. They are also known as wizards, software agents or multi-agent systems [138]. Wooldridge et al. [137] proposed the following characteristics of an agent:
1. Autonomy: Agents are autonomous entities that operate with negligible interference from external factors such as humans and have some level of control over their internal state and actions [139].
2. Social Ability: Agents have their own language, an agent communication language, to communicate among themselves or with other entities [141].
3. Reactivity: Agents observe their environment (which may be a user via a graphical user interface, the physical world, the Internet, a collection of other agents, or perhaps all of these combined) and respond in a timely fashion to changes that occur in it.
4. Pro-activeness: Agents are supposed to take the initiative and exhibit goal-directed behavior. They do not simply respond to their environment.
5. Mobility: The capability of an agent to move around an electronic network [143].
6. Veracity: The presumption that an agent will not deliberately communicate false information [140].
7. Benevolence: Agents will try to pursue their own objectives (for which they are designed) without any conflicting goals among themselves [142].
8. Rationality: The assumption that an agent will act in order to achieve its goals and will not act in such a way as to prevent its goals being achieved, at least insofar as its beliefs permit [140].
Because of their usefulness, agents are used in a variety of application domains such as data communication, concurrent system research, robotics and user interface design [137][151], supply chain management, scheduling and control, manufacturing systems [146], medical applications [147], and information retrieval from the web.

Intelligent agents for the Internet are called information agents [145]. Klusch [145] defined an information agent as "..an autonomous, computational software entity (an intelligent agent) that has access to one or more, heterogeneous and geographically distributed information sources, and which pro-actively acquires, mediates, and maintains relevant information on behalf of users or other agents, preferably just in-time." Cortez et al. [152] proposed an agent-based IR approach built on a combination of fuzzy logic and a Bayesian network. The approach retrieves meaningful information from web pages of "interest" by filtering the desired pages with the help of the Bayesian network and then using multiple agents, namely a domain facilitator agent, a management system agent, an agent communication channel and a wrapper agent. Cesarano et al. [149] proposed an agent-based framework for improved information retrieval from the web. The proposed framework consisted of a Search Engine Wrapper, a Web Spider Agent, a Document Preprocessor and a Miner Agent. A short survey of multi-agent systems for information retrieval in the semantic web can be found in [150]. Ropero et al. [148] proposed an intelligent agent based on fuzzy logic for information extraction. The agent was used along with the vector space model to propose a term weighting scheme. For the experiment, a web page was divided into 'N' objects. Each object was divided into sets of 'M' standard questions. From these 'M' sets, index terms were extracted using the vector space model. The weights of these index terms become the input to a fuzzy inference engine which, based on some fuzzy rules, helps in preparing an index term database that is then used for classification and information retrieval. Some scholarly work related to agents is provided in Table 2.

5.6 Fuzzy Solution to Semantic Web IR

The core of the semantic web contains a number of fundamental formal models, languages and technologies for interoperability and reuse of information, including RDF, RDFS, the OWL family of languages, the WSML family of languages and SPARQL [64]. With the rapid development of personalized IR, the user profile plays an important role. Han et al. [65] proposed a fuzzy clustering method for the construction of an ontology-based user profile (FCOU). This method combines fuzzy clustering techniques with optimization techniques to develop an ontology-based user profile. It employs an augmented Lagrangian function to create a fuzzy clustering model for the construction of the user profile. A breakthrough in fuzzy ontology construction using fuzzy XML models was proposed by Zhang et al. [66]. They showed that classical ontology construction approaches are not sufficient for handling the imprecise and uncertain information commonly found in many application domains. To overcome this drawback, they proposed a formal approach for constructing fuzzy ontologies from fuzzy XML models. In 2011, Bobillo [67] proposed a concrete methodology to represent fuzzy ontologies using OWL 2 annotation properties. This approach is based on a procedure to represent vague information within current standard languages and tools. The paper again supported the view that fuzzy queries are well suited to handling vague and imprecise knowledge [68]. Stoilos et al. [118] extended OWL, the web ontology language, with fuzzy set theory to propose a new language called fuzzy OWL, which aims to enhance search query results by capturing and representing vague and imprecise information. Jin et al. [69] proposed a fuzzy approach that formulates a search request by tightly combining fuzziness with the user's subjective importance weights over multiple search properties. In this paper, a special ranking mechanism based on the weighted fuzzy query representation is proposed. This ranking method is general and unique rather than arbitrary. Search results are based on matching the search intention, and the approach is reported to yield better results than existing approaches.

5.7 Miscellaneous work

5.7.1 Fuzzy Genes

Bautista et al. [133] proposed combining Genetic Algorithms and Fuzzy Logic to improve IR systems. Fuzzy logic concepts were used to propose a new term weighting scheme in the 'genes' of the chromosomes, where 'genes' are the terms extracted from the documents and are defined as [133]:

Definition: A gene G is a pair G(t, w), where 't' is a term and 'w' is the weight associated with the term for each document in the collection where the term appears at least once (otherwise the weight of the term is assumed to be zero).

In the fuzzy weighting scheme, Bautista et al. [133] used a machine learning approach to select the most relevant terms. A user feedback module guides the genetic algorithm to provide a suitable fuzzy classification. A 'fuzzy gene' can be described as follows: Let the user's evaluation be denoted by ui. Let Ω be the ordered set of documents, Ω = {D1, …, Dk, Dl, …, Dm}. The sets of good and bad documents are selected from Ω as ΩG = {D1, …, Dk} and ΩB = {Dl, …, Dm}, respectively. Let T = {t1, …, tn} be the set of symbols extracted from Ω and xij be the relative frequency of symbol tj in document Di. Then a fuzzy gene G is a pair G(t, ῆ), where ῆ is a fuzzy number characterized by the membership function

$$\mu_{\tilde{\eta}}(x) = \begin{cases} 0, & x \le 0 \\ e^{-\frac{1}{2}\left(\frac{x-\alpha}{\beta}\right)^{2}}, & x > 0 \end{cases} \tag{20}$$

The values of α and β are based on the estimation of the expected value of xj in good and bad documents and are given as follows.

i) Parameter α: The parameter is evaluated as the weighted average of the relative occurrence frequency of a symbol tj in good and bad documents, denoted by $\bar{x}_j$ and $\bar{x}'_j$, respectively:

$$\bar{x}_j = \frac{\sum_{i=1}^{k} (u_i \cdot x_{ij})}{\sum_{i=1}^{k} u_i} \tag{21}$$

and

$$\bar{x}'_j = \frac{\sum_{i=k+1}^{m} \big((1-u_i) \cdot x_{ij}\big)}{\sum_{i=k+1}^{m} (1-u_i)} \tag{22}$$

ii) Parameter β: It can be calculated as

$$\beta_j^{2} = \frac{\sum_{i=1}^{k} (u_i x_{ij} - \bar{x}_j)^{2}}{\sum_{i=2}^{k} u_i} \tag{23}$$

for relevant documents, and

$$\beta_j^{2} = \frac{\sum_{i=k+1}^{m} \big((1-u_i)\, x_{ij} - \bar{x}'_j\big)^{2}}{\sum_{i=k+2}^{m} (1-u_i)} \tag{24}$$

for non-relevant documents.
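As a rough illustration of how a fuzzy gene could be evaluated, the Python sketch below reads (20) as a Gaussian-shaped membership with a centre and a spread parameter, and computes the centre for good and bad documents as the weighted averages in (21) and (22). The function names, the handling of x = 0, the sample evaluations and the choice to pass the spread in directly (rather than derive it from (23) and (24)) are all assumptions of this sketch and should not be read as the implementation of [133].

```python
import math


def membership(x, centre, spread):
    """Gaussian-shaped membership in the spirit of (20).

    Assumption: non-positive frequencies get membership 0.
    """
    if x <= 0:
        return 0.0
    return math.exp(-0.5 * ((x - centre) / spread) ** 2)


def centre_good(u, x):
    """Weighted average as in (21): u holds user evaluations of the good
    documents, x the relative frequencies of one term in those documents."""
    return sum(ui * xi for ui, xi in zip(u, x)) / sum(u)


def centre_bad(u, x):
    """Weighted average as in (22), weighting each bad document by (1 - u_i)."""
    return sum((1 - ui) * xi for ui, xi in zip(u, x)) / sum(1 - ui for ui in u)


# Hypothetical feedback for one term: two good and two bad documents.
u_good, x_good = [1.0, 0.5], [0.2, 0.4]
u_bad, x_bad = [0.0, 0.5], [0.1, 0.3]

c_good = centre_good(u_good, x_good)
c_bad = centre_bad(u_bad, x_bad)

# Membership of a new relative frequency in the "good" fuzzy number,
# with an illustrative spread of 0.1.
print(c_good, c_bad, membership(0.25, c_good, 0.1))
```

A frequency equal to the centre receives membership 1, and the membership decays smoothly as the frequency moves away from it, at a rate controlled by the spread.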

These fuzzy genes are then used for document evaluation.

5.7.2 Fuzzy Clustering

Cluster analysis is an integral part of data mining and knowledge discovery systems. It can be defined as a family of methodologies that divide a data set A into B subsets (clusters) that are pair-wise disjoint and nonempty and that reproduce A via the union of the partitioned sets. One of the most popular and widely used fuzzy clustering algorithms is fuzzy c-means (FCM), introduced in [131]. Kraft et al. [130] used the FCM clustering algorithm to capture the relationships among index terms. They used fuzzy logic rules to represent the association relationships between index terms, and these rules form the basis of the association mechanism. The system applies the fuzzy logic rules to modify the user's original query after it has been entered. The work of Kraft et al. [130] was extended by Horng et al. [132], who proposed a fuzzy agglomerative hierarchical clustering algorithm for clustering documents. The algorithm returns the center value of each document cluster. A set of constructed fuzzy logic rules is then applied to modify the user's query for query expansion. The fuzzy logic rules can represent three kinds of fuzzy relationships between index terms (fuzzy positive association, fuzzy specialization and fuzzy generalization). This helps guide the IR system to retrieve the documents relevant to the user's request.

5.7.3 Multiple related ontologies

Leite et al. [129] tried to overcome the shortcomings of using a single conceptual structure to model the knowledge base and to generate the ontologies discussed earlier. In their paper, a document is treated as a collection of different domain names, explicitly expressed by distinct ontologies. The paper uses three kinds of relationships, namely i) fuzzy specialization, ii) fuzzy generalization and iii) fuzzy positive association. The experiment uses a sample document collection from the Agrometeorology domain in Brazil, a query set and ontologies. The results obtained improved on those of conventional approaches.

5.7.4 Response time

Most of the work mentioned above does not take into consideration the bandwidth and the capabilities of the end user's system, which affect the overall response time. Olshefski et al. [128] established that response time is a major factor in end-user satisfaction with any web application. Ajayi et al. [127] proposed a model that uses fuzzy logic to improve the overall response time while taking into account limited bandwidth and system characteristics (processor speed, memory size, resolution, availability of anti-virus, etc.).

5.8 Rough Set

Rough set theory [59] is an extension of set theory for data analysis in the presence of inexact, uncertain or vague information. The combination of rough set theory and FCA provides an interesting framework for semantic web development and for the definition of hybrid similarity measures for ontology mapping, alignment and integration [60]. Although this area of research is sometimes used in conjunction with FCA, on its own it has little relevance to information retrieval, so we do not discuss it in detail. Formica [70] showed how rough set theory can be employed in combination with fuzzy formal concept analysis to perform semantic web search and discovery of information on the web. According to this proposal, when the required data are not modeled by formal concepts, the user can search and discover information on the web that is closer to his/her preferences by following a twofold approach. Poelmans et al. [109] provided a survey on fuzzy and rough formal concept analysis. A comparative analysis of FCA and rough set theory can be found in [117].

6. Conclusion and Future Direction

Information retrieval is one of the most important areas of web intelligence. Most million-dollar e-commerce and social media sites depend heavily on effective and efficient information retrieval for their revenue generation. In other words, this domain is responsible for billions of dollars in the world economy. Fuzzy logic is playing an important role in making this task simpler. Its inherent power and high level of flexibility make it possible to design and develop information retrieval systems that can keep up with users' ever-growing demands. Considering the vast area of information retrieval research, the following are some of the key areas that need to be kept in mind while developing any information retrieval system.

a) Context: A context can have different meanings at different instants of time. A user's context and the environment in which the user is seeking information both play an important role in retrieving information. Even the most advanced technology cannot fully match the user's context of thinking and the surrounding environment. As far as fuzzy context is concerned, it is still a complex task to build a fuzzy lattice from a given context. Research on decomposing the fuzzy lattice into small crisp lattices is under way, but it is still in its infancy.

b) Integration: As discussed in previous sections, different soft computing tools have their specific areas of research, and fuzzy logic is mainly used for uncertainty. There is a need to integrate different tools with fuzzy logic to provide more efficient results. There are some instances of using fuzzy logic concepts with other soft computing methodologies, but this collaboration needs to be exploited more, in theory as well as in practice.

c) Architecture: Work related to information retrieval is concentrated on the logical part, while hardware architecture is considered a domain of database management systems. Future work can shift towards designing the architecture of information storage with information retrieval in mind. Besides this, the last 2-3 years have seen a tremendous shift of users from personal computers to mobile devices, resulting in a change of hardware as well as software architecture, and thus a change in the indexing plan of documents, making modern information retrieval systems work in a much more widespread architecture.

d) Data representation: The WWW is full of information, but the problem lies with the format of the stored information. The format of the stored information is at the disposal of the database administrator, and it is very difficult to keep the data in many different formats for different target audiences without replicating it. For example, the data may be stored as a pie chart in .jpg format while some users want it in Excel format. There is a need to develop systems that can extract information from various sources and formats and map it to the desired information. This will help in more effective information retrieval.

e) Prediction: Much research is going on in web prefetching and web prediction to fetch the expected information in advance. There are many different ways to gather prefetching information, such as the user's demographic information, but the use of proxy servers has not yet been exploited for this purpose. Since most people connect to the Internet through proxy servers, especially at work, this area has much potential.

f) Computing Time: The work we came across on information retrieval deals with the efficient use of fuzzy logic to retrieve information that satisfies the user's request. Computation time is still considered the domain of computer hardware architecture and database management systems. In the future, computing time can be taken into account while designing any fuzzy information retrieval system, to add efficiency to the effective retrieval of information.

g) Domain Specific: Research in this area is highly domain specific and limited to particular domains such as travel, tourism and online shopping stores. Every domain has a different level of uncertainty, so it is difficult to design a system that works across domains, but there is a need to expand the work on fuzzy logic to include more domains such as medicine, education and small shopping sites.

Acknowledgement

This work has been partially funded under grant F.No.42-134/2013(SR) in the Major Research Project Scheme of the University Grants Commission, India.

References

[1] http://www.worldwidewebsize.com/ retrieved on 18.01.2013.
[2] http://www.internetworldstats.com/stats.htm retrieved on 18.01.2013.
[3] Ning Zhong, Liu Jiming, Y. Y. Yao, S. Ohsuga, "Web Intelligence," in Proceedings of the 24th Annual International Computer Software and Applications Conference, COMPSAC 2000.
[4] A report on "Impact of Internet technologies: Search" by McKinsey & Company, July 2011.
[5] Domenech, J., de la Ossa, B., Sahuquillo, J., Gil, J. A., & Pont, A. (2012). A taxonomy of web prediction algorithms. Expert Systems with Applications, 39(9), 8496-8502.
[6] Arotaritei, D., & Mitra, S. (2004). Web mining: a survey in the fuzzy framework. Fuzzy Sets and Systems, 148(1), 5-19.
[7] Pal, S. K., Talwar, V., & Mitra, P. (2002). Web mining in soft computing framework: relevance, state of the art and future directions. Neural Networks, IEEE Transactions on, 13(5), 1163-1177.
[8] Mitra, S., Pal, S. K., & Mitra, P. (2002). Data mining in soft computing framework: a survey. IEEE Transactions on Neural Networks, 13(1), 3-14.
[9] Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1, p. 6). Cambridge: Cambridge University Press.
[10] Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval (Vol. 463). New York: ACM Press.
[11] Cerulo, L., & Canfora, G. (2004). A taxonomy of information retrieval models and tools. CIT. Journal of Computing and Information Technology, 12(3), 175-194.
[12] Kobayashi, M., & Takeda, K. (2000). Information retrieval on the web. ACM Computing Surveys (CSUR), 32(2), 144-173.
[13] Ferrández, A. (2011). Lexical and syntactic knowledge for information retrieval. Information Processing & Management, 47(5), 692-705.
[14] Djoerd Hiemstra, "Information Retrieval Models," published in John Davies, Ayse Goker, Margaret Graham (Eds.), Information Retrieval: Searching in the 21st Century, Wiley, 2009.
[15] Ho Kim, M., Ho Lee, J., & Joon Lee, Y. (1993). Analysis of fuzzy operators for high quality information retrieval. Information Processing Letters, 46(5), 251-256.
[16] Salton, G. (1989). Automatic Text Processing: The Transformation, Analysis, and Retrieval of. Addison-Wesley.
[17] Singhal, A. (2001). Modern information retrieval: A brief overview. IEEE Data Eng. Bull., 24(4), 35-43.
[18] Zadeh, L. A. (1997). What is soft computing?. Soft Computing, 1(1), 1-1.
[19] Stefan Schlobach, Craig A. Knoblock, "Dealing with the Messiness of web data," Web Semantics: Science, Services and Agents on the World Wide Web, 14 (2012) 1.
[20] Belkin, N. J. (2008, June). Some (what) grand challenges for information retrieval. In ACM SIGIR Forum (Vol. 42, No. 1, pp. 47-54). ACM.
[21] L. A. Zadeh, "Fuzzy Sets," Information and Control 8(3) (1965) 338-353.
[22] Lucarella, D. (1990, March). Uncertainty in information retrieval: An approach based on fuzzy sets. In Computers and Communications, 1990. Conference Proceedings., Ninth Annual International Phoenix Conference on (pp. 809-814). IEEE.
[23] Bruandet, M. F. (1987, November). Outline of a knowledge base model for an intelligent information retrieval system. In Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 33-43). ACM.
[24] Lucarella, D., & Morara, R. (1991). FIRST: Fuzzy information retrieval system. Journal of Information Science, 17(2), 81-91.
[25] Kracker, M. (1992, March). A fuzzy concept network model and its applications. In Fuzzy Systems, 1992., IEEE International Conference on (pp. 761-768). IEEE.
[26] Chen, S. M., & Horng, Y. J. (1999). Fuzzy query processing for document retrieval based on extended fuzzy concept networks. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 29(1), 96-104.
[27] Chen, S. M., Horng, Y. J., & Lee, C. H. (2003). Fuzzy information retrieval based on multirelationship fuzzy concept networks. Fuzzy Sets and Systems, 140(1), 183-205.
[28] R. Wille, "Restructuring Lattice Theory: An approach based on hierarchies of concepts," in I. Rival (Ed.), Proceedings of Ordered Sets, Reidel, Dordrecht, Boston, 1982, pp. 445-470.

[29] Tilley, T., Cole, R., Becker, P., & Eklund, P. (2005). A survey of formal concept analysis support for software engineering activities. In Formal Concept Analysis (pp. 250-271). Springer Berlin Heidelberg.
[30] G. Birkhoff, "Lattice theory," American Mathematical Society, Providence RI, 1967.
[31] B. Ganter, R. Wille, "Formal concept analysis: Mathematical foundation," Springer, Berlin, 1999.
[32] Formica, A. (2006). Ontology-based concept similarity in formal concept analysis. Information Sciences, 176(18), 2624-2641.
[33] De Maio, C., Fenza, G., Loia, V., & Senatore, S. (2012). Hierarchical web resources retrieval by exploiting fuzzy formal concept analysis. Information Processing & Management, 48(3), 399-418.
[34] Bělohlávek, R., Outrata, J., & Vychodil, V. (2008). Fast factorization by similarity of fuzzy concept lattices with hedges. International Journal of Foundations of Computer Science, 19(02), 255-269.
[35] Tho, Q. T., Hui, S. C., Fong, A. C. M., & Cao, T. H. (2006). Automatic fuzzy ontology generation for semantic web. Knowledge and Data Engineering, IEEE Transactions on, 18(6), 842-856.
[36] Yager, R. R. (1988). On ordered weighted averaging aggregation operators in multicriteria decisionmaking. Systems, Man and Cybernetics, IEEE Transactions on, 18(1), 183-190.
[37] Smolíková, R., & Wachowiak, M. P. (2002). Aggregation operators for selection problems. Fuzzy Sets and Systems, 131(1), 23-34.
[38] Kohli, S., & Gupta, A. (2014, January). A Survey on Web Information Retrieval Inside Fuzzy Framework. In Proceedings of the Third International Conference on Soft Computing for Problem Solving (pp. 433-445). Springer India.
[39] Herrera, F., Herrera-Viedma, E., & Chiclana, F. (2003). A study of the origin and uses of the ordered weighted geometric operator in multicriteria decision making. International Journal of Intelligent Systems, 18(6), 689-707.
[40] R. R. Yager, Janusz Kacprzyk, Gleb Beliakov, "Recent developments in the ordered weighted averaging operators: Theory and Practice," Studies in Fuzziness and Soft Computing, vol. 205, 2011.
[41] Zadeh, L. A. (1973). Outline of a new approach to the analysis of complex systems and decision processes. Systems, Man and Cybernetics, IEEE Transactions on, (1), 28-44.
[42] Bandler, W., & Kohout, L. (1980). Fuzzy power sets and fuzzy implication operators. Fuzzy Sets and Systems, 4(1), 13-30.
[43] Giles, R. (1976). Łukasiewicz logic and fuzzy set theory. International Journal of Man-Machine Studies, 8(3), 313-327.
[44] Weber, S. (1983). A general concept of fuzzy connectives, negations and implications based on t-norms and t-conorms. Fuzzy Sets and Systems, 11(1), 103-113.
[45] D. Dubois, H. Prade, "New results about properties and semantics of fuzzy set-theoretic operators," Fuzzy Sets, Plenum Press, New York, 1986, 59-75.
[46] Yager, R. R. (1980). On a general class of fuzzy connectives. Fuzzy Sets and Systems, 4(3), 235-242.
[47] Yandong, Y. (1985). Triangular norms and TNF-sigma-algebras. Fuzzy Sets and Systems, 16(3), 251-264.
[48] J. Dombi, "A general class of fuzzy connectives," Fuzzy Sets and Systems 4 (1980) 235-242.
[49] Leiner, Barry M., Vinton G. Cerf, David D. Clark, Robert E. Kahn, Leonard Kleinrock, Daniel C. Lynch, Jon Postel, Larry G. Roberts, and Stephen Wolff. "A brief history of the Internet." ACM SIGCOMM Computer Communication Review 39, no. 5 (2009): 22-31.
[50] J. L. Marichal, "Aggregation operators for multicriteria decision aid," Ph.D. Dissertation, University De Liege, 1999.
[51] Chiclana F, Herrera F, Herrera-Viedma E, "The ordered weighted geometric operator: properties and application," in Proceedings of the 8th International Conference on Information Processing and Management of Uncertainty in Knowledge Based Systems, Madrid, Spain, 2000, pp. 985-991.

[52] Chen, S. J., & Chen, S. M. (2005). Fuzzy information retrieval based on geometric-mean averaging operators. Computers & Mathematics with Applications, 49(7), 1213-1231.
[53] M. E. Smith, "Aspects of the P-norm model of information retrieval: syntactic query generation, efficiency and theoretical properties," Ph.D. Dissertation, Cornell University, 1990.
[54] Waller, W. G., & Kraft, D. H. (1979). A mathematical model of a weighted Boolean retrieval system. Information Processing & Management, 15(5), 235-245.
[55] Salton, G., Fox, E. A., & Wu, H. (1983). Extended Boolean information retrieval. Communications of the ACM, 26(11), 1022-1036.
[56] Dubois, D., Fargier, H., & Prade, H. (1996). Refinements of the maximin approach to decision-making in a fuzzy environment. Fuzzy Sets and Systems, 81(1), 103-122.
[57] Yager, R. R. (1997). On the analytic representation of the Leximin ordering and its application to flexible constraint propagation. European Journal of Operational Research, 102(1), 176-192.
[58] M. Sugeno, "Theory of fuzzy integrals and its applications," Ph.D. thesis, Tokyo Institute of Technology, Tokyo, 1974.
[59] Pawlak, Z. (1982). Rough sets. International Journal of Computer & Information Sciences, 11(5), 341-356.
[60] Poelmans, J., Ignatov, D. I., Kuznetsov, S. O., & Dedene, G. (2013). Formal concept analysis in knowledge processing: A survey on applications. Expert Systems with Applications, 40(16), 6538-6560.
[61] Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific American, 284(5), 28-37.
[62] Ajaya Chakravarty, "Mining the semantic web," in Proceedings of the first AKT doctoral colloquium, 2005.
[63] http://challenge.semanticweb.org/ retrieved on 26.01.2013.
[64] Breslin, J. G., O'Sullivan, D., Passant, A., & Vasiliu, L. (2010). Semantic Web computing in industry. Computers in Industry, 61(8), 729-741.
[65] Han, L., & Chen, G. (2009). A fuzzy clustering method of construction of ontology-based user profiles. Advances in Engineering Software, 40(7), 535-540.
[66] Zhang, F., Ma, Z. M., & Yan, L. (2013). Construction of fuzzy ontologies from fuzzy XML models. Knowledge-Based Systems, 42, 20-39.
[67] Bobillo, F., & Straccia, U. (2011). Fuzzy ontology representation using OWL 2. International Journal of Approximate Reasoning, 52(7), 1073-1094.
[68] Fenza, G., & Senatore, S. (2010). Friendly web services selection exploiting fuzzy formal concept analysis. Soft Computing, 14(8), 811-819.
[69] Jin, H., Ning, X., Jia, W., Wu, H., & Lu, G. (2008). Combining weights with fuzziness for intelligent semantic web search. Knowledge-Based Systems, 21(7), 655-665.
[70] Formica, A. (2012). Semantic Web search based on rough sets and Fuzzy Formal Concept Analysis. Knowledge-Based Systems, 26, 40-47.
[71] G. Salton, M. J. McGill, "Introduction to modern information retrieval," McGraw-Hill, New York (1983).
[72] Salton, G., Fox, E. A., & Wu, H. (1983). Extended Boolean information retrieval. Communications of the ACM, 26(11), 1022-1036.
[73] Kraft, D. H., & Buell, D. A. (1983). Fuzzy sets and generalized Boolean retrieval systems. International Journal of Man-Machine Studies, 19(1), 45-56.
[74] Burkowski, F. (1992). "Retrieval activities in a database consisting of heterogeneous collections of structured text," in Proceedings of the 15th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'92), pp. 112-125.
[75] Waller, W. G., & Kraft, D. H. (1979). A mathematical model of a weighted Boolean retrieval system. Information Processing & Management, 15(5), 235-245.
[76] Rocchio, J. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART Retrieval System: Experiments in Automatic Document Processing, pp. 313-323. Prentice Hall.

[77] Berry, M. W., Drmac, Z., & Jessup, E. R. (1999). Matrices, vector spaces, and information retrieval. SIAM Review, 41(2), 335-362.
[78] R. Wilkinson and P. Hingston. Using the cosine measure in a neural network for document retrieval. In Proc. of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 202-210, Chicago, USA, Oct 1991.
[79] Turtle, H., & Croft, W. B. (1991). Evaluation of an inference network-based retrieval model. ACM Transactions on Information Systems (TOIS), 9(3), 187-222.
[80] Maron, M., & Kuhns, J. (1960). On relevance, probabilistic indexing and information retrieval. Journal of the Association for Computing Machinery, 7, 216-244.
[81] Fuhr, N. (1989). Models for retrieval with probabilistic indexing. Information Processing & Management, 25(1), 55-72.
[82] Robertson, S. E. (1977). The probability ranking principle in IR. Journal of Documentation, 33(4), 294-304.
[83] Robertson, S. E., & Jones, K. S. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3), 129-146.
[84] Bookstein, A., & Swanson, D. R. (1974). Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5), 312-316.
[85] Ponte, J. M., & Croft, W. B. (1998, August). A language modeling approach to information retrieval. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 275-281). ACM.
[86] Metzler, D., & Croft, W. B. (2004). Combining the language model and inference network approaches to retrieval. Information Processing & Management, 40(5), 735-750.
[87] Stumme, G., & Maedche, A. (2001, August). FCA-Merge: Bottom-up merging of ontologies. In IJCAI (Vol. 1, pp. 225-230).
[88] Formica, A., & Missikoff, M. (2002). Concept similarity in SymOntos: an enterprise ontology management tool. The Computer Journal, 45(6), 583-594.
[89] Jiang, G., Ogasawara, K., Endoh, A., & Sakurai, T. (2003). Context-based ontology building support in clinical domains using formal concept analysis. International Journal of Medical Informatics, 71(1), 71-81.
[90] Bain, M. (2003). Inductive construction of ontologies from formal concept analysis. In AI 2003: Advances in Artificial Intelligence (pp. 88-99). Springer Berlin Heidelberg.
[91] Yannis Kalfoglou, Srinandan Dasmahapatra, Yun-Heh Chen-Burger, "FCA in knowledge technologies: experience and opportunities," Second International Conference on Formal Concept Analysis, ICFCA 2004, Sydney, Australia, February 23-26, 2004. Concept Lattices, Lecture Notes in Computer Science Volume 2961, 2004, pp. 252-260.
[92] Cimiano, P., Hotho, A., Stumme, G., & Tane, J. (2004). Conceptual knowledge processing with formal concept analysis and ontologies. In Concept Lattices (pp. 189-207). Springer Berlin Heidelberg.
[93] Hwang, S. H., Kim, H. G., & Yang, H. S. (2005). A FCA-based ontology construction for the design of class hierarchy. In Computational Science and Its Applications–ICCSA 2005 (pp. 827-835). Springer Berlin Heidelberg.
[94] Zhou, B., Hui, S. C., & Chang, K. (2005). A formal concept analysis approach for web usage mining. In Intelligent Information Processing II (pp. 437-441). Springer US.
[95] Gerd Stumme, "Ontology Merging with Formal Concept Analysis," Dagstuhl Seminar Proceedings 04391, 2005.
[96] Formica, A. (2008). Concept similarity in Formal Concept Analysis: An information content approach. Knowledge-Based Systems, 21(1), 80-87.
[97] H. Haav, "A semi-automatic method to ontology design by using FCA," in V. Snasel, R. Belohlavek (Eds.), Proceedings of Concept Lattices and their Applications (CLA), Ostrava, Czech Republic, 2004.
[98] Tho, Q. T., Hui, S. C., Fong, A. C. M., & Cao, T. H. (2006). Automatic fuzzy ontology generation for semantic web. Knowledge and Data Engineering, IEEE Transactions on, 18(6), 842-856.

[99] Chiclana, F., Herrera-Viedma, E., Herrera, F., & Alonso, S. (2004). Induced ordered weighted geometric operators and their use in the aggregation of multiplicative preference relations. International Journal of Intelligent Systems, 19(3), 233-255.
[100] Hong, W. S., Chen, S. J., Wang, L. H., & Chen, S. M. (2007). A new approach for fuzzy information retrieval based on weighted power-mean averaging operators. Computers & Mathematics with Applications, 53(12), 1800-1819.
[101] Xu, Z. S., & Da, Q. L. (2002). The ordered weighted geometric averaging operators. International Journal of Intelligent Systems, 17(7), 709-716.
[102] Wen Zhou, Zong-tian Liu, Yan Zhao, "Ontology learning by clustering based on fuzzy Formal concept analysis," in Proceedings of the 31st Annual International Computer Software and Applications Conference (COMPSAC 2007).
[103] Yang, K. M., Kim, E. H., Hwang, S. H., & Choi, S. H. (2008). Fuzzy concept mining based on formal concept analysis. International Journal of Computers, (3), 279-290.
[104] Peici Fang, Siyao Zheng, "A research on Fuzzy formal concept analysis based collaborative filtering recommendation system," in Proceedings of the Second International Symposium on Knowledge Acquisition and Modeling, 2009.
[105] Zheng, S., Zhou, Y., & Martin, T. (2009, September). A new method for fuzzy formal concept analysis. In Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology-Volume 03 (pp. 405-408). IEEE Computer Society.
[106] G. Birkhoff, "Lattice theory," American Mathematical Society, Providence RI, 1967.
[107] Singh, P. K., & Kumar, C. A. (2012). A method for decomposition of fuzzy formal context. Procedia Engineering, 38, 1852-1857.
[108] Xu, X., Wu, Y., & Chen, J. (2010, October). Fuzzy FCA based ontology mapping. In Networking and Distributed Computing (ICNDC), 2010 First International Conference on (pp. 181-185). IEEE.
[109] Poelmans, J., Ignatov, D. I., Kuznetsov, S. O., & Dedene, G. (2014). Fuzzy and rough formal concept analysis: a survey. International Journal of General Systems, 43(2), 105-134.
[110] G. J. Klir, B. Yuan, "Fuzzy Sets and fuzzy logic: Theory and applications," Prentice-Hall, Upper Saddle River, NJ (1995).
[111] Jalali, M., Mustapha, N., Sulaiman, M. N. B., & Mamat, A. (2009). OPWUMP: An Architecture for Online Predicting in WUM-Based Personalization System. In Advances in Computer Science and Engineering (pp. 838-841). Springer Berlin Heidelberg.
[112] http://semanticweb.co/ retrieved on 10.09.2014.
[113] Ceri, S., Bozzon, A., Brambilla, M., Valle, E. D., Fraternali, P., & Quarteroni, S. (2013). Web Information Retrieval. Springer Publishing Company, Incorporated.
[114] Casasús-Estellés, T., & Yager, R. R. (2014, January). Fuzzy Concepts in Small Worlds and the Identification of Leaders in Social Networks. In Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 37-45). Springer International Publishing.
[115] Krajči, S. (2014). Social Network and Formal Concept Analysis. In Social Networks: A Framework of Computational Intelligence (pp. 41-61). Springer International Publishing.
[116] Belohlavek, R., & Konecny, J. (2013, June). Toward reduction of formal fuzzy context. In IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013 Joint (pp. 221-225). IEEE.
[117] Yao, Y. (2004, January). A comparative study of formal concept analysis and rough set theory in data analysis. In Rough Sets and Current Trends in Computing (pp. 59-68). Springer Berlin Heidelberg.
[118] Stoilos, G., Stamou, G. B., Tzouvaras, V., Pan, J. Z., & Horrocks, I. (2005, November). Fuzzy OWL: Uncertainty and the Semantic Web. In OWLED.
[119] Cross, V., & Kandasamy, M. (2011, June). Fuzzy concept lattice construction: a basis for building fuzzy ontologies. In Fuzzy Systems (FUZZ), 2011 IEEE International Conference on (pp. 1743-1750). IEEE.
[120] Chandrasekaran, B., Josephson, J. R., & Benjamins, V. R. (1999). What are ontologies, and why do we need them?. IEEE Intelligent Systems, 14(1), 20-26.
[121] Cross, V., Kandasamy, M., & Yi, W. (2011, March). Comparing two approaches to creating fuzzy concept lattices. In Fuzzy Information Processing Society (NAFIPS), 2011 Annual Meeting of the North American (pp. 1-6). IEEE.
[122] Sanchez, E. (Ed.). (2006). Fuzzy logic and the semantic web. Elsevier.
[123] Chen, W., Yang, Q., Zhu, L., & Wen, B. (2009, October). Research on automatic fuzzy ontology generation from fuzzy context. In Intelligent Computation Technology and Automation, 2009. ICICTA'09. Second International Conference on (Vol. 2, pp. 764-767). IEEE.
[124] De Maio, C., Fenza, G., Loia, V., & Senatore, S. (2009, August). Towards an automatic fuzzy ontology generation. In Fuzzy Systems, 2009. FUZZ-IEEE 2009. IEEE International Conference on (pp. 1044-1049). IEEE.
[125] Belohlavek, R., De Baets, B., Outrata, J., & Vychodil, V. (2010). Computing the lattice of all fixpoints of a fuzzy closure operator. Fuzzy Systems, IEEE Transactions on, 18(3), 546-557.
[126] Majidian, A., & Martin, T. (2009, September). Extracting Taxonomies from Data-a Case Study using Fuzzy Formal Concept Analysis. In Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT'09. IEEE/WIC/ACM International Joint Conferences on (Vol. 3, pp. 191-194). IET.
[127] Ajayi, A. O., Aderounmu, G. A., & Soriyan, H. A. (2010). An adaptive fuzzy information retrieval model to improve response time perceived by e-commerce clients. Expert Systems with Applications, 37(1), 82-91.
[128] Olshefski, D., & Nieh, J. (2006, June). Understanding the management of client perceived response time. In ACM SIGMETRICS Performance Evaluation Review (Vol. 34, No. 1, pp. 240-251). ACM.
[129] de Leite, M. A., & Ricarte, I. L. (2008, November). Fuzzy information retrieval model based on multiple related ontologies. In Tools with Artificial Intelligence, 2008. ICTAI'08. 20th IEEE International Conference on (Vol. 1, pp. 309-316). IEEE.
[130] Kraft, D. H., Chen, P., & Mikulcic, A. (2000, May). Combining fuzzy clustering and fuzzy inferencing in information retrieval. In Fuzzy Systems, 2000. FUZZ IEEE 2000. The Ninth IEEE International Conference on (Vol. 1, pp. 375-380). IEEE.
[131] Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2), 191-203.
[132] Horng, Y. J., Chen, S. M., Chang, Y. C., & Lee, C. H. (2005). A new method for fuzzy information retrieval based on fuzzy hierarchical clustering and fuzzy inference techniques. Fuzzy Systems, IEEE Transactions on, 13(2), 216-228.
[133] Martin-Bautista, M. J., Vila, M. A., Sanchez, D., & Larsen, H. L. (2000). Fuzzy genes: improving the effectiveness of information retrieval. In Evolutionary Computation, 2000. Proceedings of the 2000 Congress on (Vol. 1, pp. 471-478). IEEE.
[134] Horng, Y. J., Chen, S. M., & Lee, C. H. (2003). Automatically constructing multirelationship fuzzy concept networks for document retrieval. Applied Artificial Intelligence, 17(4), 303-328.
[135] Kim, K. J., & Cho, S. B. (2001, July). A personalized web search engine using fuzzy concept network with link structure. In IFSA World Congress and 20th NAFIPS International Conference, 2001. Joint 9th (Vol. 1, pp. 81-86). IEEE.
[136] Bordogna, G., Bosc, P., & Pasi, G. (1996, February). Fuzzy inclusion in database and information retrieval query interpretation. In Proceedings of the 1996 ACM symposium on Applied Computing (pp. 547-551). ACM.
[137] Wooldridge, M., & Jennings, N. R. (1995). Intelligent agents: Theory and practice. The Knowledge Engineering Review, 10(02), 115-152.
[138] Turban, E., Aronson, J., & Liang, T. P. (2005). Decision Support Systems and Intelligent Systems, 7th Edition. Pearson Prentice Hall.
[139] Castelfranchi, C. (1995). Guarantees for autonomy in cognitive agent architecture. In Intelligent Agents (pp. 56-70). Springer Berlin Heidelberg.
[140] Galliers, J. R. (1988). A theoretical framework for computer models of cooperative dialogue, acknowledging multi-agent conflict (Doctoral dissertation, Open University).
[141] Genesereth, M. R., & Ketchpel, S. P. (1994). Software agents. Commun. ACM, 37(7), 48-53.
[142] Rosenschein, J. S., & Genesereth, M. R. (1985). Deals among rational agents (pp. 91-99). Department of Computer Science, Stanford University.
[143] White, J. E. (1994). Telescript technology: The foundation for the electronic marketplace. General Magic white paper, 282.
[144] Goodwin, R. (1995). Formalizing properties of agents. Journal of Logic and Computation, 5(6), 763-781.
[145] Klusch, M. (2001). Information agent technology for the internet: A survey. Data & Knowledge Engineering, 36(3), 337-372.
[146] Shen, W., & Norrie, D. H. (1999). Agent-based systems for intelligent manufacturing: a state-of-the-art survey. Knowledge and Information Systems, 1(2), 129-156.
[147] Lee, C. S., & Wang, M. H. (2008). Ontological fuzzy agent for electrocardiogram application. Expert Systems with Applications, 35(3), 1223-1236.
[148] Ropero, J., Gómez, A., Carrasco, A., & León, C. (2012). A Fuzzy Logic intelligent agent for Information Extraction: Introducing a new Fuzzy Logic-based term weighting scheme. Expert Systems with Applications, 39(4), 4567-4581.
[149] Cesarano, C., d'Acierno, A., & Picariello, A. (2003, November). An intelligent search agent system for semantic information retrieval on the internet. In Proceedings of the 5th ACM international workshop on Web information and data management (pp. 111-117). ACM.
[150] Cheng, X., Xie, Y., & Yang, T. (2008, December). Study of Multi-Agent Information Retrieval Model in Semantic Web. In Education Technology and Training, 2008. and 2008 International Workshop on Geoscience and Remote Sensing. ETT and GRS 2008. International Workshop on (Vol. 2, pp. 636-639). IEEE.
[151] Agah, A., & Tanie, K. (2000). Intelligent graphical user interface design utilizing multiple fuzzy agents. Interacting with Computers, 12(5), 529-542.
[152] Cortés, J. C. R., & Sheremetov, L. B. (2003). Fuzzy Bayesian Classifier: a Multi-Agent System for Information Retrieval in the Web. In Neural Networks and Soft Computing (pp. 444-449). Physica-Verlag HD.
[153] Vrettos, S., & Stafylopatis, A. (2001). A fuzzy rule-based agent for web retrieval-filtering. In Web Intelligence: Research and Development (pp. 448-453). Springer Berlin Heidelberg.
[154] Tang, Y., Zhang, Y. Q., Kandel, A., Lin, T. Y., & Yao, Y. Y. (2004). Personalized Search Agents Using Data Mining and Granular Fuzzy Techniques. In Enhancing the Power of the Internet (pp. 207-223). Springer Berlin Heidelberg.
[155] Teuteberg, F. (2003). Intelligent Agents for Document Categorization and Adaptive Filtering Using a Neural Network Approach and Fuzzy Logic. In Knowledge-Based Information Retrieval and Filtering from the Web (pp. 231-250). Springer US.
[156] Herrera-Viedma, E., Porcel, C., López, A. G., Olvera, M. D., & Anaya, K. (2004). A fuzzy linguistic multi-agent model for information gathering on the web based on collaborative filtering techniques. In Advances in Web Intelligence (pp. 3-12). Springer Berlin Heidelberg.
[157] Ropero, J., Gómez, A., León, C., & Carrasco, A. (2007). Information extraction in a set of knowledge using a fuzzy logic based intelligent agent. In Computational Science and Its Applications–ICCSA 2007 (pp. 811-820). Springer Berlin Heidelberg.
[158] Loia, V., Luongo, P., Senatore, S., & Sessa, M. I. (2002). Info-Miner: bridging agent technology with approximate information retrieval. In Advances in Soft Computing–AFSS 2002 (pp. 459-465). Springer Berlin Heidelberg.
[159] Licklider, J. C. R., & Clark, W. E. (1962, May). On-line man-computer communication. In Proceedings of the May 1-3, 1962, spring joint computer conference (pp. 113-128). ACM.
[160] Roberts, L. G. (1967, January). Multiple computer networks and intercomputer communication. In Proceedings of the first ACM symposium on Operating System Principles (pp. 3-1). ACM.
[161] Murugesan, S. (2007). Understanding Web 2.0. IT Professional, 9(4), 34-41.
[162] Silva, J. M., Mahfujur Rahman, A. S. M., & El Saddik, A. (2008, October). Web 3.0: a vision for bridging the gap between real and virtual. In Proceedings of the 1st ACM international workshop on Communicability design and evaluation in cultural and ecological multimedia system (pp. 9-14). ACM.

[Figure 1: block diagram. Input: Data/Web Space (Text, Image, Audio, Video). Data Domain: 1. Distributed 2. Heterogeneous 3. Semistructured 4. Time Varying 5. High Dimensional. Information Domain: 1. Imprecise 2. Incorrect 3. Inconsistent 4. Uncertain 5. Partial truth 6. Fast Changing. User Domain: 1. Unfamiliar 2. Frequent change in need 3. Based on personalized goals 4. Not conscious. Soft Computing Tools (Consortium of Methodology): Fuzzy Logic, Artificial Neural Network, Genetic Algorithm, Probabilistic Reasoning. Roles shown: Modeling system for imprecision, uncertainty; Learning and Generalization; Search Optimization; Complex, Heterogeneous optimization. Further blocks: Artificial Intelligence, Natural Language Processing, Swarm Intelligence, Unstructured Query, Big Data Optimization. Output to the user for further action/interpretation/decision.]

Fig. 1 Web Information Retrieval: Challenges and Solutions

[Figure 2: flow diagram with the blocks Information Need, Documents, Indexing, Query Formulation, Indexed Documents, Query, Matching, Feedback, Retrieved Documents.]

Fig. 2 Information Retrieval Processes [14]

[Figure 3: Information Retrieval. Areas: 1. Text Mining 2. Image Mining 3. Multimedia Content. Process: 1. Querying 2. Indexing 3. Evaluation 4. User Profile construction. Types: 1. Classical 2. Web Based.]

Fig. 3 IR and its different Aspects

[Figures 4 and 5: networks of concept nodes (c1 to c7), document nodes (d1 to d4) and a query node Q, joined by edges with weights between 0 and 1; in Fig. 5 the edges carry tuples of weights.]

Fig. 4 A Fuzzy Concept Network

Fig. 5 A Multi-relationship fuzzy concept network

Fig. 7 FCA Vs FFCA [33]

[Figure 6: stages shown: Concept; Fuzzy Concept Network; Fuzzy Relations in Concept Network; Extended Fuzzy Concept Network; Multirelation Fuzzy Concept Network.]

Fig. 6 Evolution of Fuzzy Concept Network [38]
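
To make the idea behind a fuzzy concept network (Fig. 4) concrete, the following minimal Python sketch stores concept-to-concept relevance degrees in a matrix, derives implicit links with a max-min transitive closure, and then expands a document's concept degrees through that closure. It is an illustration in the spirit of the concept-matrix approaches surveyed in [24]-[27], not the exact formulation of any of those papers. The function names and the numeric degrees are hypothetical and are not read from the figures.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(matrix):
    """Repeat M <- max(M, M o M) until no degree changes."""
    current = [row[:] for row in matrix]
    while True:
        composed = max_min_compose(current, current)
        merged = [
            [max(x, y) for x, y in zip(row_cur, row_comp)]
            for row_cur, row_comp in zip(current, composed)
        ]
        if merged == current:
            return current
        current = merged


def document_relevance(doc_degrees, closure):
    """Expand a document's explicit concept degrees through the closure."""
    n = len(closure)
    return [
        max(min(doc_degrees[k], closure[k][j]) for k in range(n))
        for j in range(n)
    ]


# Hypothetical network with three concepts: c1 -> c2 (0.8), c2 -> c3 (0.6).
concept_matrix = [
    [1.0, 0.8, 0.0],
    [0.0, 1.0, 0.6],
    [0.0, 0.0, 1.0],
]
closure = transitive_closure(concept_matrix)

# A hypothetical document indexed only under c1 with degree 0.9.
print(document_relevance([0.9, 0.0, 0.0], closure))
```

In this toy network the closure adds an implicit link from c1 to c3 whose degree is the weaker of the two links on the path, so a document indexed only under c1 still receives a non-zero degree for a query on c3.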

Table 1. A Taxonomy of Information Retrieval Models

Boolean model (exact match)
Methodology: Set-theoretic approach
Index term: Binary
Query: Conventional Boolean expressions
Similarity between document $d_j$ and query $q$: $sim(d_j, q) = 1$ if $\exists\, q_{cc} \mid (q_{cc} \in q_{dnf}) \wedge (\forall k_i,\ g_i(d_j) = g_i(q_{cc}))$, and $sim(d_j, q) = 0$ otherwise
Advantages: 1. Easy implementation 2. First model of IR 3. Considered computationally efficient 4. Good expressive power
Drawbacks: 1. Retrieval is not based on ranking 2. A document is either retrieved or not 3. Too few or too many retrievals
Variants: 1. Standard Boolean (Salton et al. [71]) 2. Extended Boolean (Salton et al. [72]) 3. Fuzzy Set (Kraft et al. [73]) 4. Region Models (Burkowski [74]) 5. Generalized Boolean Model (Waller et al. [75])

Vector space approach (statistical model, best match)
Methodology: Algebraic
Index term: Positive, non-binary
Query: t-dimensional vector
Similarity between document $d_j$ and query $q$: $sim(d_j, q) = \dfrac{\vec{d_j} \cdot \vec{q}}{|\vec{d_j}| \times |\vec{q}|}$
Advantages: 1. Term weighting scheme improves IR 2. Partial matching allowed 3. Based on cosine measure 4. Ranking mechanism
Drawbacks: 1. Index terms are assumed mutually independent 2. Limited expressive power 3. Computationally intensive 4. Lacks simplicity
Variants: 1. Vector space model (Salton et al. [71]) 2. Support Vector Machine (Rocchio [76]) 3. Latent Semantic Indexing (Berry et al. [77]) 4. Neural Network (Wilkinson et al. [78])

Probabilistic approach (statistical model, best match)
Methodology: Probabilistic
Index term: Binary
Query: In the form of a subset of index terms
Similarity between document $d_j$ and query $q$: $sim(d_j, q) = \dfrac{P(R \mid d_j)}{P(\bar{R} \mid d_j)}$, where $R$ is the set of relevant documents, $\bar{R}$ is the set of non-relevant documents, $P(R \mid d_j)$ is the probability that $d_j$ is relevant to $q$, and $P(\bar{R} \mid d_j)$ is the probability that $d_j$ is non-relevant to $q$
Advantages: 1. Documents are ranked in order of their probability of relevance
Drawbacks: 1. Prior knowledge needed 2. All weights are binary 3. Adoption of the independence assumption for index terms
Variants: 1. Bayesian Network (Turtle et al. [79]) 2. The probabilistic indexing model (Maron et al. [80], Fuhr [81]) 3. The probabilistic retrieval model (Robertson [82], Robertson et al. [83]) 4. 2-Poisson model (Bookstein et al. [84]) 5. Language Model (Ponte et al. [85], Metzler et al. [86]) 6. Belief Network

Semantic model
Use of RDF, XML, OWL, etc.
Variants: 1. Ontology-based 2. Vector space model based 3. Concept based
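
The three similarity measures summarised in Table 1 can be illustrated with a minimal Python sketch. It assumes documents are given either as sets of index terms (Boolean model) or as term-to-weight dictionaries (vector space model), and that the probability of relevance is already known (probabilistic model). The function names and the toy data are hypothetical. As a simplifying assumption, each conjunctive component of the query lists only the terms that occur in the query rather than every index term.

```python
import math


def boolean_sim(doc_terms, query_dnf):
    """Return 1 if some conjunctive component of the DNF query matches, else 0.

    Each component maps a term to 1 (must occur) or 0 (must not occur).
    """
    for component in query_dnf:
        if all((term in doc_terms) == bool(required)
               for term, required in component.items()):
            return 1
    return 0


def cosine_sim(doc_vec, query_vec):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * query_vec.get(term, 0.0)
              for term, weight in doc_vec.items())
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_d * norm_q)


def odds_sim(p_relevant):
    """Odds P(R|d) / P(not R|d), assuming P(not R|d) = 1 - P(R|d)."""
    if not 0.0 <= p_relevant < 1.0:
        raise ValueError("p_relevant must lie in [0, 1)")
    return p_relevant / (1.0 - p_relevant)


# Hypothetical query: (fuzzy AND web) OR (fuzzy AND NOT web).
query_dnf = [{"fuzzy": 1, "web": 1}, {"fuzzy": 1, "web": 0}]
print(boolean_sim({"fuzzy", "retrieval"}, query_dnf))
print(cosine_sim({"fuzzy": 1.0, "web": 1.0}, {"fuzzy": 1.0}))
print(odds_sim(0.75))
```

The Boolean function can only answer 0 or 1, which is the "retrieved or not" drawback listed in the table, whereas the cosine and odds functions return graded scores that can be used to rank documents.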

Table 2: Some scholarly work related to IR using fuzzy logic

| Concept | Year Range | Research Work | Area |
|---|---|---|---|
| Fuzzy Concept Networks | | Lucarella [24] | Document retrieval |
| | | Cracker [25] | Concept network |
| | | Shyi-Chen et al. [26] | Document retrieval |
| | | Shyi-Ming et al. [27] | Document retrieval |
| | | Kim et al. [135] | Personalized web search engine |
| | | Horng et al. [134] | Document retrieval |
| Formal Concept Analysis | 1991-2003 | Stumme [87] | Ontology merging |
| | | Formica et al. [88] | Enterprise tool |
| | | Jiang et al. [89] | Context-based development |
| | | Bain [90] | Ontology construction |
| | | Kalfoglou et al. [91] | Review of FCA |
| | | Cimiano [92] | FCA and ontology |
| | | Hwang et al. [93] | Ontology construction |
| | | Zhou et al. [94] | Web usage mining |
| | | Stumme [95] | Ontology merging |
| | | Formica [32] | Ontology merging |
| | | Formica [96] | Ontology development |
| | | Haav [97] | Ontology development |
| Weighted Operators | | Shi-Jay Chen [52] | Geometric mean averaging operator |
| | | Chiclana et al. [51] | Ordered weighted geometric operator |
| | | Renata Smolikova [37] | Aggregation operator |
| | | F. Herrera et al. [39] | Study of OWGO |
| | | F. Chiclana et al. [99] | Induced ordered weighted geometric operator |
| | | Won-Sin Hong et al. [100] | Weighted power mean averaging operator |
| | | Z.S. Xu et al. [101] | Ordered weighted geometric averaging operators |
| Fuzzy Agents | 2000-2007 | Vrettos et al. [153] | Web filtering |
| | | Loia et al. [158] | Data gathering |
| | | Teuteberg [155] | Document categorization |
| | | Cortez et al. [152] | Web filtering |
| | | Tang et al. [154] | Personalized search agent |
| | | Herrera-Viedma et al. [156] | Data gathering |
| | | Ropero et al. [157] | Information extraction |
| Fuzzy Formal Concept Analysis | 2006-2014 | Quan Thanh Tho [35] | Ontology generation |
| | | Wen Zhou, Zong-tian Liu, Yan Zhao [102] | Clustering |
| | | Kyoung-Mo Yang et al. [103] | Fuzzy concept |
| | | Peici Fang et al. [104] | Recommendation system |
| | | Siyao Zheng et al. [105] | Ontology generation |
| | | A. Formica [106] | Fuzzy concept |
| | | Prem Kumar Singh et al. [107] | Context |
| | | Xiaoliang Xu et al. [108] | Ontology mapping |
| | | Anna Formica [70] | Web search |
| | | Carmen De Maio [33] | Web search |
| | | Fenza et al. [68] | Web search |
| | | Casasús-Estellés et al. [114] | Mining social network |
| | | Krajči [115] | Mining social network |
| | | Chen et al. [123] | Ontology generation |
| | | Maio et al. [124] | Ontology generation |
| | | Majidian et al. [126] | Extraction of useful taxonomic relations |

Table 3: Fuzzy Associations

| Sr. No. | Name of the Fuzzy Relation | Symbol | Description | Relations Followed |
|---|---|---|---|---|
| 1 | Fuzzy Positive Association | µP | µP : C × C → [0,1] | Reflexive, symmetric, max-*-transitive |
| 2 | Fuzzy Negative Association | µN | µN : C × C → [0,1] | Anti-reflexive, symmetric, max-*-nontransitive |
| 3 | Fuzzy Generalization | µG | µG : C × C → [0,1] | Anti-reflexive, antisymmetric, max-*-transitive |
| 4 | Fuzzy Specialization | µS | µS : C × C → [0,1] | Anti-reflexive, antisymmetric, max-*-transitive |
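The properties named in the "Relations Followed" column of Table 3 can be checked mechanically for a fuzzy relation stored as a square matrix over the concept set C. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the t-norm `*` defaults to MIN, and antisymmetry is taken in the strict sense that µ(a,b) > 0 and µ(b,a) > 0 cannot both hold for a ≠ b. Other definitions of fuzzy antisymmetry exist and would need a different check.

```python
def is_reflexive(R, tol=1e-9):
    """mu(c, c) = 1 for every concept c."""
    return all(abs(R[i][i] - 1.0) <= tol for i in range(len(R)))

def is_anti_reflexive(R, tol=1e-9):
    """mu(c, c) = 0 for every concept c."""
    return all(abs(R[i][i]) <= tol for i in range(len(R)))

def is_symmetric(R, tol=1e-9):
    """mu(a, b) = mu(b, a) for all pairs."""
    n = len(R)
    return all(abs(R[i][j] - R[j][i]) <= tol for i in range(n) for j in range(n))

def is_antisymmetric(R, tol=1e-9):
    """Assumed definition: for a != b, mu(a, b) and mu(b, a) are not both positive."""
    n = len(R)
    return all(not (R[i][j] > tol and R[j][i] > tol)
               for i in range(n) for j in range(n) if i != j)

def is_max_star_transitive(R, t_norm=min, tol=1e-9):
    """mu(a, c) >= max over b of t_norm(mu(a, b), mu(b, c)) for all a, c."""
    n = len(R)
    return all(
        R[i][k] + tol >= max(t_norm(R[i][j], R[j][k]) for j in range(n))
        for i in range(n) for k in range(n)
    )
```

A matrix such as [[1, 0.8, 0.5], [0.8, 1, 0.5], [0.5, 0.5, 1]] passes the reflexive, symmetric and max-min-transitive checks, which is the profile Table 3 gives for a fuzzy positive association.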

Table 4: Fuzzy Operators

T-Operators

| Sr. No. | Operator | T-norm (for MIN) | T-conorm (for MAX) |
|---|---|---|---|
| 1 | Zadeh [41] | $MIN(x, y)$ | $MAX(x, y)$ |
| 2 | Probabilistic operator [42] | $x \cdot y$ | $x + y - x \cdot y$ |
| 3 | Lukasiewicz logic [43] | $MAX(x + y - 1, 0)$ | $MIN(x + y, 1)$ |
| 4 | Hamacher [44] | $\dfrac{\lambda xy}{1 - (1 - \lambda)(x + y - xy)}$ | $\dfrac{\lambda(x + y) + xy(1 - 2\lambda)}{\lambda + xy(1 - \lambda)}$ |
| 5 | Dubois [45] | $\dfrac{xy}{MAX(x, y, \lambda)}$ | $1 - \dfrac{(1 - x)(1 - y)}{MAX(1 - x, 1 - y, \lambda)}$ |
| 6 | Weber [44] | $MAX\left(\dfrac{x + y - 1 + \lambda xy}{1 + \lambda}, 0\right)$ | $MIN(x + y + \lambda xy, 1)$ |
| 7 | Yager [46] | $MAX\left(1 - ((1 - x)^p + (1 - y)^p)^{1/p}, 0\right)$ | $MIN\left((x^p + y^p)^{1/p}, 1\right)$ |
| 8 | Yu Yandong [47] | $MAX((1 + \lambda)(x + y - 1) - \lambda xy, 0)$ | $MIN(x + y + \lambda xy, 1)$ |
| 9 | Dombi [48] | $\dfrac{1}{1 + \left[\left(\frac{1}{x} - 1\right)^{\lambda} + \left(\frac{1}{y} - 1\right)^{\lambda}\right]^{1/\lambda}}$ | $\dfrac{1}{1 + \left[\left(\frac{1}{x} - 1\right)^{-\lambda} + \left(\frac{1}{y} - 1\right)^{-\lambda}\right]^{-1/\lambda}}$ |
| 10 | Weber [44] | $\begin{cases} x & \text{if } y = 1 \\ y & \text{if } x = 1 \\ 0 & \text{otherwise} \end{cases}$ | $\begin{cases} x & \text{if } y = 0 \\ y & \text{if } x = 0 \\ 1 & \text{otherwise} \end{cases}$ |
| 11 | | $\dfrac{xy}{x + y - xy}$ | $\dfrac{x + y - 2xy}{1 - xy}$ |

Here $\lambda$ and $p$ are the parameters of the respective operator families.

Averaging Operators

| Operator | Definition |
|---|---|
| Ordered weighted averaging operator [49] | $F(a_1, a_2, \ldots, a_n) = \sum_{i=1}^{n} w_i b_i$ |
| Quasi-arithmetic means [50] | $F(x) = h^{-1}\left[\dfrac{1}{n} \sum_{i=1}^{n} h(x_i)\right]$ |
| Quasi-linear means [50] | $F_w(x) = h^{-1}\left[\sum_{i=1}^{n} w_i h(x_i)\right]$ |
| Ordered weighted geometric operator [51] | $G(a_1, a_2, \ldots, a_n) = \prod_{i=1}^{n} b_i^{w_i}$ |
| Geometric mean averaging operator [52] | $F(d_i, q_{AND}) = \left(\prod_{j=1}^{m} e_{ij}\right)^{1/m}$; $F(d_i, q_{OR}) = 1 - \left(\prod_{j=1}^{m} (1 - e_{ij})\right)^{1/m}$ |
| Infinite-one operator [53] | $F(d_i, q_{AND}) = \gamma \cdot \min(e_{i1}, e_{i2}, \ldots, e_{im}) + (1 - \gamma) \cdot \dfrac{\sum_{j=1}^{m} e_{ij}}{m}$; $F(d_i, q_{OR}) = \gamma \cdot \max(e_{i1}, e_{i2}, \ldots, e_{im}) + (1 - \gamma) \cdot \dfrac{\sum_{j=1}^{m} e_{ij}}{m}$ |
| Waller-Kraft operator [54] | $F(d_i, q_{AND}) = z \cdot \min(e_{i1}, e_{i2}, \ldots, e_{im}) + (1 - z) \cdot \max(e_{i1}, e_{i2}, \ldots, e_{im})$; $F(d_i, q_{OR}) = z \cdot \min(e_{i1}, e_{i2}, \ldots, e_{im}) + (1 - z) \cdot \max(e_{i1}, e_{i2}, \ldots, e_{im})$ |
| P-norm operator [55] | $F(d_i, q_{AND}) = 1 - \left[\dfrac{\sum_{j=1}^{m} (1 - e_{ij})^p}{m}\right]^{1/p}$; $F(d_i, q_{OR}) = \left[\dfrac{\sum_{j=1}^{m} e_{ij}^p}{m}\right]^{1/p}$ |
| Leximin ordering [56-57] | $F_{Leximin}(x) = \sum_{i=1}^{n} w_i b_i$ |
| Sugeno integral [58] | $S_{\mu}(x) = \bigvee_{i=1}^{n} \left[x_{(i)} \wedge \mu(\{(i), \ldots, (n)\})\right]$ |
| Ordered weighted MAX and MIN operators [58] | $OWMAX_w(x) = \bigvee_{i=1}^{n} (w_i \wedge x_{(i)})$; $OWMIN_{w'}(x) = \bigwedge_{i=1}^{n} (w'_i \vee x_{(i)})$ |
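To make a few of the operators in Table 4 concrete, the sketch below implements some T-norm/T-conorm pairs and some of the averaging operators used to score a document against an AND or OR query. It is an illustration under stated assumptions, not a reference implementation: all function names are hypothetical, `e` stands for the list of term weights $e_{i1}, \ldots, e_{im}$ of one document, `gamma` stands for the mixing coefficient of the infinite-one operator, and the OWA and OWG weights are assumed to sum to 1 and to be applied to the arguments sorted in descending order.

```python
import math

# T-norm / T-conorm pairs
def t_prob(x, y):
    return x * y

def s_prob(x, y):
    return x + y - x * y

def t_lukasiewicz(x, y):
    return max(x + y - 1.0, 0.0)

def s_lukasiewicz(x, y):
    return min(x + y, 1.0)

def t_yager(x, y, p):
    return max(1.0 - ((1.0 - x) ** p + (1.0 - y) ** p) ** (1.0 / p), 0.0)

def s_yager(x, y, p):
    return min((x ** p + y ** p) ** (1.0 / p), 1.0)

# Ordered weighted operators: b_i is the i-th largest argument
def owa(values, weights):
    b = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, b))

def owg(values, weights):
    b = sorted(values, reverse=True)
    return math.prod(v ** w for w, v in zip(weights, b))

# Query evaluation operators over the term weights e of one document
def p_norm_and(e, p):
    return 1.0 - (sum((1.0 - x) ** p for x in e) / len(e)) ** (1.0 / p)

def p_norm_or(e, p):
    return (sum(x ** p for x in e) / len(e)) ** (1.0 / p)

def infinite_one_and(e, gamma):
    return gamma * min(e) + (1.0 - gamma) * sum(e) / len(e)

def infinite_one_or(e, gamma):
    return gamma * max(e) + (1.0 - gamma) * sum(e) / len(e)
```

With p = 1 the P-norm AND and OR scores both reduce to the arithmetic mean of the term weights, and as p grows they move toward MIN and MAX respectively, which is why a single parameter can tune how strictly a Boolean query is interpreted.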