User feedback based metasearching using neural network
Rashid Ali & Iram Naim
International Journal of Machine Learning and Cybernetics, ISSN 1868-8071, Volume 6, Number 2. Int. J. Mach. Learn. & Cyber. (2015) 6:265-275. DOI 10.1007/s13042-013-0212-2
ORIGINAL ARTICLE
Received: 21 October 2012 / Accepted: 24 October 2013 / Published online: 17 November 2013
© Springer-Verlag Berlin Heidelberg 2013
Abstract Metasearch engines are web services that receive user queries and dispatch them to multiple crawler based search engines. After this, they collect the returned search results, reorder them and present the reordered list to the end user. To combine the results from different search engines, a metasearch engine may use different rank aggregation techniques to aggregate the various rankings of the search results into an overall ranking. If different rank aggregation techniques are used to collate the search results, the results of metasearching for the same query may vary even for the same set of participating search engines. In this paper, we discuss a metasearching technique that utilizes neural network based rank aggregation. Here, we formulate the rank aggregation problem as a function approximation problem. As multilayer perceptrons are considered universal approximators, we use a multilayer perceptron for rank aggregation. We compare the performance of the neural network based method with four other methods, namely a rough set based method, a modified rough set based method, Borda's method and a Markov chain based method (MC2), using three independent evaluators. Experimentally, we find that the neural network based method performs better than each of these four methods.
R. Ali (corresponding author)
College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
e-mail: [email protected]

I. Naim
Department of Computer Engineering and Information Technology, M.J.P. Rohilkhand University, Bareilly, Uttar Pradesh, India
e-mail: [email protected]
Keywords Metasearching · User feedback · Neural network · Correlation coefficient · Performance evaluation
1 Introduction

Web searching is one of the most popular activities on the Internet, and a large number of public Web search engines are available for the purpose. No search engine is perfect. Since different search engines use different indexing algorithms and have different coverage of the web, search results from different search engines vary even in response to the same query. Hence, it is a good idea to combine the results of different search engines through metasearching so as to increase coverage. Metasearch engines try to combine the benefits of the individual search engines. They are test beds of innovation. Metasearch engines do not have a database of their own: they simply collect the search results of different search engines in response to a query, reorder them and present them to the user. This saves a lot of computing time and intellectual effort. The ranked results from the different search engines may be combined into a single overall ranking using any rank aggregation technique. If a user's feedback is used to combine these ranked search results, we call such metasearching user feedback based metasearching. Here, we discuss a user feedback based metasearch system that uses a neural network. In this system, user feedback is obtained for combining the results of the participating search engines for each query in a training set. We may obtain a user's feedback in many ways. For example, we may provide the user a form with the union of the search results returned by the participating search engines for each query in the training set and ask the user to rank the search results according to his preference on that
form. But this exercise is too demanding for the user. A casual user may not bother to provide correct feedback, which will affect the quality of our metasearch system. It is also expensive in terms of the time taken. Alternatively, the user feedback may be obtained implicitly, by watching the actions of a user on the union of search results presented to him in response to his query. A good number of user queries may then constitute the training set. The neural network is then trained using the user feedback for each query in the training set. The trained neural network may then be used for combining the results of the participating search engines for any new query. The important contributions of this work may be summarized as follows:

(1) the formulation of the rank aggregation problem as a function approximation problem,
(2) the use of a neural network, popularly known as a universal approximator, for rank aggregation,
(3) the application of neural network based rank aggregation to metasearching,
(4) a comparison of the performance of the neural network based metasearching with two soft computing based rank aggregation techniques, namely rough set based rank aggregation and modified rough set based rank aggregation,
(5) a comparison of the performance of the neural network based metasearching with two classical rank aggregation techniques, namely Borda's method and the MC2 method,
(6) an evaluation of the performance of these five systems using feedback from three independent evaluators, and
(7) a statistical analysis of this evaluation using the modified Spearman rank order correlation coefficient.
The rest of the paper is organized as follows. In Sect. 2, we briefly discuss related work in the area. In Sect. 3, we discuss user feedback based metasearching using a neural network. We present the results of our experiments in Sect. 4. Finally, we conclude in Sect. 5.
2 Related work

Metasearching is widely discussed in the literature, and a good number of studies have been performed on it [1-7]. In [1], Aslam and Montague used the classical Borda's method [8] for metasearching. In [2], Montague and Aslam proposed a Condorcet method called Condorcet-fuse for metasearching. In [3], Vogt and Cottrell used the linear combination (LC) model for combining the results from different systems. In [4], Dwork et al. proposed a number of Markov chain based rank aggregation methods (MC1, MC2 etc.) and used them for metasearching; they also discussed spam fighting. In [5], Renda and Straccia compared rank based methods (like Borda's method and the Markov chain based methods) with score based methods (like the LC method) for metasearching. They concluded that, while the Markov chain based methods contended with the score based methods for metasearching, Borda's count method performed worse than the score based methods. Soft computing techniques have been popular for their tolerance to imprecision and uncertainty and have been applied in many real life applications [9-14]. In the context of rank aggregation and metasearching, the use of soft computing techniques like fuzzy logic, genetic algorithms, rough sets etc. has been reported in [15-24]. If we use human intelligence for metasearching, we can hope for better results, and such an approach may also be expected to help in spam fighting. Therefore, some of the studies on metasearching [6, 23, 24] used user feedback. In [6], the authors proposed a rough set based learning algorithm for user feedback based metasearching. In that work, the algorithm learnt ranking rules for the overall ranking from the user feedback available for the queries in the training set, and the learned rules were then used to estimate the overall ranking for the queries in the test set, for which user feedback was not available. A few limitations of the rough set based algorithm were discussed in [25], where a modified version of the rough set based rank aggregation was proposed. In [7], the authors discussed the development of a metasearch engine based on the modified rough set based aggregation. Here, we discuss a metasearch system that uses one of the popular soft computing techniques, the neural network, for user feedback based metasearching.
3 User feedback based metasearching

Here, we discuss user feedback based metasearching using a neural network. First, we discuss the basics of an artificial neural network. Then, we discuss the method used to obtain the user feedback. After this, we discuss the overall details of metasearching using a neural network.
3.1 Neural network
An artificial neural network (ANN, in short) is a mathematical or computational model inspired by biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation [26]. Neural networks can be used to model complex relationships between inputs and outputs or to find patterns in data. Discussions of a few important recent works that use neural networks can be found in [27-30]. The multilayer feed-forward neural network, or multilayer perceptron (MLP, in short), is perhaps the simplest type of
artificial neural network. In this network, information moves in only one direction, forward, from the input nodes, through the hidden nodes, if any, to the output nodes [31]. There are no cycles or loops in the network. This class of networks consists of multiple layers of computational units, usually interconnected in a feed-forward way. Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications, the units of these networks apply a sigmoid function as the activation function. The universal approximation theorem for neural networks states that every continuous function that maps intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multilayer perceptron with just one hidden layer. This result holds only for restricted classes of activation functions, e.g. the sigmoidal functions. In the training phase, a set of input-output pairs is provided to the neural network. The back propagation algorithm is used to train multilayer feed-forward ANNs: the artificial neurons, organized in different layers, send their signals "forward", and then the error between the actual output and the predicted output is propagated "backwards" [32].

3.2 Implicit user feedback

Here, the combination of search results from the different participating search engines is presented to the user, and the user feedback is obtained implicitly by watching the actions of the user on the search results. The implicit feedback of the user can be characterized by a vector (V, T, P, S, B, E, C) [33], which consists of the following:

(a) the sequence V in which the user visits the documents,
(b) the time t_i that the user spends examining document i,
(c) whether or not the user prints document i,
(d) whether or not the user saves document i,
(e) whether or not the user bookmarks document i,
(f) whether or not the user e-mails document i to someone, and
(g) the number of words that the user copies and pastes elsewhere.
Once we have the feedback, the weighted sum $r_j$ for each document $j$ selected by the user may be computed as follows:

$$r_j = w_V \left(\frac{1}{2}\right)^{v_j - 1} + w_T \frac{t_j}{t_j^{max}} + w_P\, p_j + w_S\, s_j + w_B\, b_j + w_E\, e_j + w_C \frac{c_j}{c_j^{total}} \qquad (1)$$

where $t_j^{max}$ represents the maximum time a user is expected to spend in examining document $j$, and $c_j^{total}$ is the total number of words in document $j$. Here, $w_V$, $w_T$, $w_P$, $w_S$, $w_B$, $w_E$ and $w_C$, all lying between 0 and 1, are the respective weights given to the seven components of the feedback vector. The sum $r_j$ represents the importance of document $j$ in the eyes of the user. Sorting the documents on the descending values of their weighted sum $r_j$ produces a sequence, say $R_U$, which is in fact a ranking of the documents on the basis of the user's feedback.
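To fix ideas, the computation of Eq. (1) and of $R_U$ can be sketched in a few lines of Python. This fragment is our illustration, not code from the paper: the `Feedback` container and the example weight values are assumptions, since the paper leaves the weights as free parameters in [0, 1].

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    """Implicit feedback components (V, T, P, S, B, E, C) for one document j."""
    v: int        # position of the document in the user's visit sequence (1 = first)
    t: float      # time spent examining the document
    t_max: float  # maximum time the user is expected to spend on it
    p: int        # 1 if printed, else 0
    s: int        # 1 if saved, else 0
    b: int        # 1 if bookmarked, else 0
    e: int        # 1 if e-mailed to someone, else 0
    c: int        # number of words copied and pasted elsewhere
    c_total: int  # total number of words in the document

# Illustrative weights in [0, 1]; the paper does not fix particular values.
W = dict(wV=1.0, wT=0.8, wP=0.5, wS=0.5, wB=0.5, wE=0.5, wC=0.7)

def score(f: Feedback) -> float:
    """Weighted sum r_j of Eq. (1) for one document."""
    return (W['wV'] * (0.5 ** (f.v - 1))
            + W['wT'] * (f.t / f.t_max)
            + W['wP'] * f.p + W['wS'] * f.s + W['wB'] * f.b + W['wE'] * f.e
            + W['wC'] * (f.c / f.c_total))

def feedback_ranking(feedback: dict) -> list:
    """R_U: documents sorted on descending r_j."""
    return sorted(feedback, key=lambda j: score(feedback[j]), reverse=True)
```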
3.3 Metasearching using neural network

The whole process of metasearching using a neural network is performed in two phases, namely (1) a training phase and (2) a test phase. In the training phase, we present the collection of search results obtained from the different participating search engines to the user and obtain the user feedback implicitly by watching the actions of the user on the results. We train the neural network with this user feedback. In the test phase, the result of metasearching is returned using the trained neural network.

In the training phase, the neural network is trained using the training data. Here, the user first issues a query, which is passed to each of the participating search engines. Then, we collect the top few results from each of the participating search engines, and these top few search results are combined and presented to the user. Let us assume the cardinality of the union U of all the ranked lists from the different search engines is n. The user feedback on the set U is obtained implicitly as discussed in Sect. 3.2. Then, we build a rank table (say RT) of size n × (m + 1) using the m ranked lists from the m participating search engines and the one user feedback based ranking R_U. Here, n is the number of documents present in the union of search results and m + 1 is the total number of rankings; the rank table has one column per ranking. We place a value −k in the cell (i, j) if document i ∈ U is present at the kth position in the jth ranking. The argument behind this is that we are converting a set of ranked lists into a set of scored lists: by giving a score −k, where k is the position of the document in a ranked list, we make sure that each document obtains a score according to its position in the ranked lists. If document i ∈ U is not present in the jth ranking at all, then we place a value −(n + 1) in the cell (i, j) to ensure that such a document gets the least score in the jth column. After this, a normalized rank table (NRT) is built by normalizing the real valued scores in the rank table (RT) as follows:

$$\text{Normalized value} = \frac{\text{Actual value} - \text{Min value}}{\text{Max value} - \text{Min value}} \qquad (2)$$

where Actual value is the value present in the RT at position (i, j), and Min value and Max value are the minimum and maximum values present in the RT, respectively.
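The construction of the RT and NRT is mechanical enough to sketch directly. The following Python fragment is our illustration of the procedure just described, not the authors' code; the function names and the data layout (a dict from document to its row of scores) are ours.

```python
def build_rank_table(rankings, universe):
    """Rank table RT: one row per document in the union U, one column per
    ranking. Cell (i, j) holds -k if document i is at position k (1-based)
    in ranking j, and -(n + 1) if document i does not appear in ranking j."""
    n = len(universe)
    table = {}
    for doc in universe:
        table[doc] = [-(r.index(doc) + 1) if doc in r else -(n + 1)
                      for r in rankings]
    return table

def normalize(table):
    """NRT: map every score of the RT into [0, 1] using Eq. (2)."""
    values = [v for row in table.values() for v in row]
    lo, hi = min(values), max(values)
    return {doc: [(v - lo) / (hi - lo) for v in row]
            for doc, row in table.items()}
```

In the training phase the `rankings` argument carries the m engine rankings plus R_U; in the test phase it carries only the m engine rankings.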
As a neural network gives better performance with normalized values, we perform this normalization so as to get homogeneous values between 0 and 1 for all the rankings. We take the first m columns of this NRT to provide the values for the input nodes of the neural network and the (m + 1)th column of the NRT to provide the target output of the neural network, and we train the neural network. We repeat the whole process for a good number of queries in the training set. We validate the system by performing cross validation. After the training of the neural network, we save the set of neural network layer weights.

In the test phase, for any new query, we take the m ranked lists from the m participating search engines to construct a rank table (RT) of size n × m, where n is the number of documents present in the union of search results. Here, for the new query, we do not have the user feedback based ranking, so we may assume an (m + 1)th column having all values zero as the initial values. We convert the RT into an NRT by following the same procedure as in the training phase. We estimate the values of the attribute corresponding to the desired overall ranking in the NRT using the trained neural network with the saved layer weights. We then convert the NRT back into an RT. The column corresponding to the user feedback based ranking in the RT yields the predicted score for each of the n documents. We sort these n documents in descending order of their scores to obtain the aggregated ranked list R_ag. This aggregated ranked list of results is then displayed as the result of metasearching.
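Putting the two phases together, a minimal sketch might look as follows. The paper describes a two-hidden-layer perceptron trained with back propagation but does not name an implementation, so the choice of scikit-learn's MLPRegressor, its parameters, and the NRT dict layout from the previous sketch are all our assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# One network maps the m engine-derived scores of a document to its
# user-feedback score (hidden layer sizes follow Sect. 4: 5 and 10).
net = MLPRegressor(hidden_layer_sizes=(5, 10), activation='logistic',
                   solver='sgd', max_iter=5000)

def train(training_nrts):
    """Training phase: stack the rows of all training NRTs; the first m
    columns feed the input nodes, the (m + 1)th column is the target."""
    rows = np.vstack([np.array(list(nrt.values())) for nrt in training_nrts])
    net.fit(rows[:, :-1], rows[:, -1])

def metasearch(test_nrt):
    """Test phase: predict a feedback score for every document of the new
    query's NRT (m columns only) and sort descending to obtain R_ag."""
    docs = list(test_nrt)
    scores = net.predict(np.array([test_nrt[d] for d in docs]))
    return [doc for _, doc in sorted(zip(scores, docs), reverse=True)]
```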
4 Experiments and results

For the training of the metasearch system, we submit a query to the participating search engines and obtain the search results. We issue a common query to seven participating public search engines: AltaVista, Ask, Excite, Google, HotBot, Lycos and Yahoo. We obtain seven differently ranked lists of search results from these seven search engines. Then, we collect the top few results (say, the top-10) from every participating search engine. These top few search results from the participating search engines are combined and presented to the user for obtaining the user's feedback. From the user feedback, we obtain the user feedback based ranking of the search results. Then, with the seven ranked lists from the seven search engines and the one user feedback based ranked list, we build a rank table of size n × 8, where n is the number of documents present in the union of search results. Then, we normalize this RT. For experimentation, we have taken the same 15 queries in the training set that were used in [34-37]. These queries are listed in Table 1.
Table 1 List of queries used in the training phase

1  Measuring search quality
2  Mining access patterns from web logs
3  Pattern discovery from web transactions
4  Distributed association rule mining
5  Document categorization query generation
6  Term vector database
7  Client-directory-server model
8  Similarity measure for resource discovery
9  Hypertextual web search
10 IP routing in satellite networks
11 Focussed web crawling
12 Concept based relevance feedback for information retrieval
13 Parallel sorting neural network
14 Spearman rank order correlation coefficient
15 Web search query benchmark
Therefore, in the training phase, we have 15 NRTs. For training the multilayer perceptron, from each of these 15 NRTs we take the first seven columns as the input values for the input nodes of the neural network and the remaining eighth column as the target value for the output node. In the architecture of our MLP, the number of input nodes is 7 and the number of output nodes is 1. The number of hidden nodes in the first hidden layer is 5 and the number of hidden nodes in the second hidden layer is 10. From this, we have 35 (7 × 5) weights at the first hidden layer, 50 (5 × 10) weights at the second hidden layer, and 10 (10 × 1) weights at the output layer, so the total number of layer weights for this neural network is 95. After training the neural network, we obtain the values of these 95 layer weights and save all of them. These weights are used for designing the neural network that categorizes test examples. We performed 5-fold cross validation to validate our system. After this validation, the neural network is fully trained to categorize any new example. For any new query, we first build the normalized rank table and then use the trained neural network with the saved layer weights to predict the output score for each document in the union of search results. This normalized rank table may be converted back into the rank table. Now, we may convert the scores of the search results from the rank table into a ranking of the search results by sorting the results in descending order of their scores. This ranking is the overall ranking of the search results, which is subsequently returned as the result of metasearching.
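To make the weight count concrete, the following sketch materializes the 7-5-10-1 architecture and checks that the inter-layer weight matrices hold 95 entries. The use of scikit-learn and the dummy training data are our assumptions for illustration; the library also maintains bias terms, which the count of 95 above does not include.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# 7 input nodes -> 5 hidden -> 10 hidden -> 1 output node.
net = MLPRegressor(hidden_layer_sizes=(5, 10), activation='logistic', solver='sgd')
net.fit(np.random.rand(20, 7), np.random.rand(20))  # dummy fit to materialize weights

# coefs_ holds one weight matrix per layer: shapes (7, 5), (5, 10), (10, 1).
counts = [w.size for w in net.coefs_]
print(counts, sum(counts))  # [35, 50, 10] 95
```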
In the test set, we take 10 new queries. These queries are shown in Table 2.

Table 2 List of test queries

1  Data visualization
2  Learning algorithm
3  Correlation coefficient
4  Search engine
5  Optimization tool
6  Lower bound
7  Rank aggregation
8  Meta search
9  Computer graphics
10 Neural network
We pass each of the 10 queries one by one to the seven search engines and collect the top-10 results from each of these seven search engines. On the basis of these results, we build the RT and the NRT. After the formation of the NRTs of these ten queries, we use the values of the NRT of each query as input to the trained neural network and obtain the output values. We convert each of these 10 NRTs back into an RT. Then, we sort the documents in descending order of their scores in each rank table and obtain the overall ranking for each query in the test set.

For the evaluation of the performance of the neural network based metasearch system, we take the union of search results for each query from the test set. We provide this union of search results to three judges and obtain three different rankings of the search results according to the choices of the judges. These ranked lists are called the user 1, user 2 and user 3 ranking lists. We also implement an MC2 based metasearch system, a Borda's method based metasearch system, a rough set based metasearch system and a modified rough set based metasearch system. Then, we obtain the overall ranking for each of the queries in the test set using each of these four systems. After this, we calculate the modified Spearman rank order correlation coefficient [33] between the aggregated ranked lists from each of the five implemented metasearch systems and each of the three independent evaluators' rankings. The correlation coefficient measures the agreement between each independent evaluator's ranking and the results of each of the five metasearch systems. Thus, the value of the correlation coefficient is a measure of the search quality of each of these five metasearch systems: the higher the value of the correlation coefficient, the better the metasearch system. This evaluation of the five metasearch systems is an example of user feedback based evaluation of search systems. An excellent discussion of different evaluation techniques may be found in [38].
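As a reference point, the classical Spearman rank order correlation coefficient over two rankings of the same n documents can be computed as below. This sketch is ours; the modified coefficient of [33] adapts the classical formula to lists that need not rank exactly the same documents, and that adaptation is not reproduced here.

```python
def spearman(rank_a, rank_b):
    """Classical Spearman rank-order correlation: rank_a and rank_b are
    lists of the same documents in preference order (n >= 2)."""
    n = len(rank_a)
    pos_b = {doc: k for k, doc in enumerate(rank_b)}
    d2 = sum((k - pos_b[doc]) ** 2 for k, doc in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Identical rankings give 1.0; a fully reversed ranking gives -1.0.
assert spearman(['a', 'b', 'c'], ['a', 'b', 'c']) == 1.0
assert spearman(['a', 'b', 'c'], ['c', 'b', 'a']) == -1.0
```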
For example, for the query "Learning Algorithm", when we collect the top-10 search results from the seven search engines, the cardinality of the union set, say U-set, is 43. The uniform resource locators (URLs) present in this U-set are shown in Table 3.

Table 3 List of URLs in the U-set for the query "Learning Algorithm"

1  http://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
2  http://en.wikipedia.org/wiki/Machine_learning
3  http://www.education.com/reference/article/algorithms-learning
4  http://dli.iiit.ac.in/ijcai/IJCAI-2003/PDF/085.pdf
5  http://www.sensenetworks.com/mve_algorithm.php
6  http://www.cis.hut.fi/ahonkela/dippa/node36.html
7  http://www.cse.iitb.ac.in/saketh/phdthesis.pdf
8  http://www.cs.uiuc.edu/~madhu/learning.pdf
9  http://eprints.iisc.ernet.in/4968
10 http://en.wikipedia.org/wiki/Q-learning
11 http://ask.com/wiki/Supervised_learning
12 http://people.revoledu.com/kardi/tutorial/Learning/index.html
13 http://www.huomah.com/Search-Engines/Algorithm-Matters/SEO-Hig
14 http://wwwipd.ira.uka.de/~prechelt/NIPS_bench.html
15 http://gromgull.net/blog/2010/03/the-machine-learningalgorithm
16 http://research.yahoo.com/Machine_Learning
17 http://hunch.net/~vw/
18 http://www.computer.org/portal/web/csdl/doi/10.1109/TAI.2000.8.
19 http://www.computer.org/portal/web/csdl/abs/proceedings/icpr/2
20 http://www.research.ibm.com/infoecon/paps/html/qtree/node4.html
21 http://en.wikipedia.org/wiki/Learning_to_rank
22 http://ttic.uchicago.edu/~smale/papers/online_learning_algorithms
23 http://www.spb.com/pocketpc-software/flashcards/learningalgorithm
24 http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf
25 http://www.fon.hum.uva.nl/paul/papers/etgla.pdf
26 http://www.popsci.com/technology/article/2010-09/video-usingnew-learning-algorithm-archer-bot-learns-how-aim-and-shootbow-and-arrow
27 http://www.ntu.edu.sg/home/egbhuang/c-elm.pdf
28 http://www.inference.phy.cam.ac.uk/itprnn/book.html
29 http://www.almaden.ibm.com/cs/projects/algorithms
30 http://www.cs.uiuc.edu/~madhu/learning.pdf
31 http://www.stat.psu.edu/~jiali/course/stat597e/notes2/percept.pdf
32 http://www.stanford.edu/class/cs229/notes/cs229-notes2.pdf
33 http://lasa.epfl.ch
34 http://out.uclv.edu.cu/dgi/Tutorial.pdf
35 http://umichrl.pbworks.com/w/page/7597581/Algorithms-ofReinforcement-Learning
36 http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume8
37 http://www.cse.unsw.edu.au/~waleed/tclass/
38 http://www.rtc-idyll.com/shell_dyll/contents/academic_publications/bung_1992_dynamic_learning_algos_prague/bung_1992_dynamic_learning_algorithms.html
39 http://www.answers.com/topic/machine-learning
40 http://users.ics.tkk.fi/ahonkela/dippa/node36.html
41 http://www.icgst.com/AIML05/papers/P1120535137.pdf
42 http://citeseerx.ist.psu.edu/viewdoc/summary
43 http://cseweb.ucsd.edu/~yfreund/papers/LargeMarginsUsingPerceptron.pdf
Table 4 Top-10 URLs of user 1's ranking for the query "Learning Algorithm"

1  http://www.cis.hut.fi/ahonkela/dippa/node36.html
2  http://www.spb.com/pocketpc-software/flashcards/learningalgorithm
3  http://ask.com/wiki/Supervised_learning
4  http://www.computer.org/portal/web/csdl/abs/proceedings/icpr/2
5  http://people.revoledu.com/kardi/tutorial/Learning/index.html
6  http://www.huomah.com/Search-Engines/Algorithm-Matters/SEO-Hig
7  http://dli.iiit.ac.in/ijcai/IJCAI-2003/PDF/085.pdf
8  http://www.research.ibm.com/infoecon/paps/html/qtree/node4.html
9  http://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
10 http://wwwipd.ira.uka.de/~prechelt/NIPS_bench.html

Table 7 Top-10 URLs of the ranked list from the Borda's method based metasearch system for the query "Learning Algorithm"

1  http://en.wikipedia.org/wiki/Machine_learning
2  http://www.education.com/reference/article/algorithms-learning
3  http://www.inference.phy.cam.ac.uk/itprnn/book.html
4  http://umichrl.pbworks.com/w/page/7597581/Algorithms-ofReinforcement-Learning
5  http://www.stat.psu.edu/~jiali/course/stat597e/notes2/percept.pdf
6  http://ask.com/wiki/Supervised_learning
7  http://www.cse.iitb.ac.in/saketh/phdthesis.pdf
8  http://hunch.net
9  http://gromgull.net/blog/2010/03/the-machine-learningalgorithm
10 http://www.fon.hum.uva.nl/paul/papers/etgla.pdf
Table 5 Top-10 URLs of user 2's ranking for the query "Learning Algorithm"

1  http://www.huomah.com/Search-Engines/Algorithm-Matters/SEO-Hig
2  http://wwwipd.ira.uka.de/~prechelt/NIPS_bench.html
3  http://dli.iiit.ac.in/ijcai/IJCAI-2003/PDF/085.pdf
4  http://www.sensenetworks.com/mve_algorithm.php
5  http://www.ntu.edu.sg/home/egbhuang/c-elm.pdf
6  http://www.inference.phy.cam.ac.uk/itprnn/book.html
7  http://www.computer.org/portal/web/csdl/abs/proceedings/icpr/2
8  http://www.research.ibm.com/infoecon/paps/html/qtree/node4.html
9  http://www.almaden.ibm.com/cs/projects/algorithms
10 http://www.cs.uiuc.edu/~madhu/learning.pdf

Table 8 Top-10 URLs of the ranked list from the MC2 based metasearch system for the query "Learning Algorithm"

1  http://en.wikipedia.org/wiki/Machine_learning
2  http://www.education.com/reference/article/algorithms-learning
3  http://ask.com/wiki/Supervised_learning
4  http://www.computer.org/portal/web/csdl/abs/proceedings/icpr/2
5  http://www.research.ibm.com/infoecon/paps/html/qtree/node4.html
6  http://en.wikipedia.org/wiki/Learning_to_rank
7  http://ttic.uchicago.edu/~smale/papers/online_learning_algorithms
8  http://people.revoledu.com/kardi/tutorial/Learning/index.html
9  http://dli.iiit.ac.in/ijcai/IJCAI-2003/PDF/085.pdf
10 http://www.huomah.com/Search-Engines/Algorithm-Matters/SEO-Hig
Table 6 Top-10 URLs of user 3's ranking for the query "Learning Algorithm"

1  http://www.spb.com/pocketpc-software/flashcards/learningalgorithm
2  http://www.cis.hut.fi/ahonkela/dippa/node36.html
3  http://www.inference.phy.cam.ac.uk/itprnn/book.html
4  http://www.cse.iitb.ac.in/saketh/phdthesis.pdf
5  http://umichrl.pbworks.com/w/page/7597581/Algorithms-ofReinforcement-Learning
6  http://www.fon.hum.uva.nl/paul/papers/etgla.pdf
7  http://hunch.net
8  http://ask.com/wiki/Supervised_learning
9  http://gromgull.net/blog/2010/03/the-machine-learningalgorithm
10 http://www.stat.psu.edu/~jiali/course/stat597e/notes2/percept.pdf
We provide the list of results in the U-set to three judges and obtain three different user feedback based ranked lists according to the choices of the judges. The user 1's ranking list, which contains user 1's top-10 preferences from the U-set, is shown in Table 4; the user 2's and user 3's ranking lists are shown in Tables 5 and 6, respectively. The top-10 search results for the query "Learning Algorithm" obtained using the Borda's method based metasearch system are shown in Table 7, those obtained using the MC2 based metasearch system in Table 8, those obtained using the rough set based metasearch system in Table 9, those obtained using the modified rough set based metasearch system in Table 10, and those obtained using the neural network based metasearch system in Table 11.
Table 9 Top-10 URLs of the ranked list from the rough set based metasearch system for the query "Learning Algorithm"

1  http://en.wikipedia.org/wiki/Machine_learning
2  http://www.education.com/reference/article/algorithms-learning
3  http://ask.com/wiki/Supervised_learning
4  http://www.computer.org/portal/web/csdl/abs/proceedings/icpr/2
5  http://people.revoledu.com/kardi/tutorial/Learning/index.html
6  http://ttic.uchicago.edu/~smale/papers/online_learning_algorithms
7  http://www.huomah.com/Search-Engines/Algorithm-Matters/SEO-Hig
8  http://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
9  http://dli.iiit.ac.in/ijcai/IJCAI-2003/PDF/085.pdf
10 http://www.sensenetworks.com/mve_algorithm.php
Table 10 Top-10 URLs of the ranked list from the modified rough set based metasearch system for the query "Learning Algorithm"

1  http://en.wikipedia.org/wiki/Machine_learning
2  http://en.wikipedia.org/wiki/Machine_learning
3  http://ask.com/wiki/Supervised_learning
4  http://people.revoledu.com/kardi/tutorial/Learning/index.html
5  http://www.huomah.com/Search-Engines/Algorithm-Matters/SEO-Hig
6  http://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
7  http://dli.iiit.ac.in/ijcai/IJCAI-2003/PDF/085.pdf
8  http://www.sensenetworks.com/mve_algorithm.php
9  http://www.cis.hut.fi/ahonkela/dippa/node36.html
10 http://www.cse.iitb.ac.in/saketh/phdthesis.pdf
Table 11 Top-10 URLs of the ranked list from the neural network based metasearch system for the query "Learning Algorithm"

1  http://www.computer.org/portal/web/csdl/abs/proceedings/icpr/2
2  http://www.almaden.ibm.com/cs/projects/algorithms
3  http://www.cs.uiuc.edu/~madhu/learning.pdf
4  http://www.stat.psu.edu/~jiali/course/stat597e/notes2/percept.pdf
5  http://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
6  http://www.sensenetworks.com/mve_algorithm.php
7  http://www.cis.hut.fi/ahonkela/dippa/node36.html
8  http://www.stanford.edu/class/cs229/notes/cs229-notes2.pdf
9  http://www.fon.hum.uva.nl/paul/papers/etgla.pdf
10 http://www.stanford.edu/class/cs229/notes/cs229-notes2.pdf
To compare the performance of the neural network based metasearch system with that of the other four metasearch systems, we computed the values of the modified Spearman rank order correlation coefficient between each evaluator's ranking and the ranking of the search results returned by each of the five metasearch systems. Values of the modified Spearman correlation coefficient between user 1's ranking and the rankings from the Borda count, MC2, rough set, modified rough set and neural network based metasearch systems are listed in Table 12 and pictorially represented in Fig. 1. The corresponding values for user 2's ranking are listed in Table 13 and represented in Fig. 2, and those for user 3's ranking are listed in Table 14 and represented in Fig. 3. Results of the combined evaluation of the five metasearch systems, namely the Borda's method based, MC2 based, rough set based, modified rough set based and neural network based metasearch systems, by the three independent evaluators are shown in Table 15 and pictorially represented in Fig. 4. From Table 15 and Fig. 4, it is clear that the neural network based metasearch system performs well in comparison to the other four metasearch systems, as it has higher values of the modified Spearman rank order correlation coefficient.
Table 12 Results of evaluation of the five metasearch systems by the independent evaluator user 1

Test query   Borda count   MC2         Rough set   Modified rough set   Neural network
1            0.711515      0.809524    0.800696    0.856522             0.907500
2            0.731131      0.722304    0.734722    0.744526             0.847944
3            0.783092      0.769577    0.813236    0.821691             0.812381
4            0.795476      0.732604    0.778159    0.772253             0.830923
5            0.844283      0.755190    0.865022    0.907792             0.794121
6            0.690893      0.743224    0.766228    0.794444             0.860000
7            0.740257      0.827288    0.840606    0.857836             0.722944
8            0.736121      0.765833    0.799643    0.782440             0.730298
9            0.817811      0.750031    0.784303    0.763602             0.772818
10           0.754955      0.765041    0.695944    0.727792             0.778922
AVG          0.7505534     0.7640616   0.7878459   0.8028898            0.8057851
Fig. 1 Results of evaluation by user 1 (bar chart of the modified Spearman rank correlation coefficient for the Borda count, MC2, rough set, modified rough set and neural network based metasearch systems)
Table 13 Results of evaluation of the five metasearch systems by the independent evaluator user 2

Test query   Borda count   MC2         Rough set   Modified rough set   Neural network
1            0.710130      0.810714    0.801739    0.857217             0.908810
2            0.725536      0.714297    0.726716    0.736520             0.855519
3            0.784408      0.768884    0.808385    0.816285             0.812500
4            0.797738      0.729271    0.773214    0.760165             0.822469
5            0.829730      0.745687    0.854978    0.897576             0.790119
6            0.687917      0.744145    0.761111    0.794737             0.864211
7            0.748713      0.835621    0.845281    0.860606             0.722511
8            0.741268      0.773333    0.805714    0.787202             0.741250
9            0.812365      0.741776    0.779800    0.755597             0.783840
10           0.756618      0.765916    0.702078    0.734892             0.786274
AVG          0.7514423     0.7640616   0.7859016   0.8000797            0.8087503
Fig. 2 Results of evaluation by user 2 (bar chart of the modified Spearman rank correlation coefficient for the Borda count, MC2, rough set, modified rough set and neural network based metasearch systems)
5 Conclusion

In this paper, we discussed user feedback based metasearching using a neural network. We obtained the feedback from the user implicitly by watching the user's actions on the search results returned in response to his query. We trained the multilayer perceptron using the user feedback for each query in the training set. Once the system is trained, we may perform metasearching using the trained multilayer perceptron for any new query. Therefore, the neural network based approach is very useful, since once the training is over it models the user's preference for metasearching without the actual user's involvement.
Table 14 Results of evaluation of the five metasearch systems by the independent evaluator user 3

Test query   Borda count   MC2         Rough set   Modified rough set   Neural network
1            0.710476      0.818333    0.790956    0.850609             0.913214
2            0.727321      0.718709    0.729493    0.737663             0.846320
3            0.785855      0.773874    0.817949    0.826819             0.814405
4            0.797262      0.746146    0.792994    0.776923             0.829588
5            0.941095      0.750219    0.862597    0.903810             0.789869
6            0.685655      0.743224    0.766667    0.798977             0.865263
7            0.738603      0.823529    0.834892    0.853853             0.728788
8            0.743290      0.768810    0.808690    0.786250             0.721964
9            0.820079      0.746779    0.781301    0.760225             0.791248
10           0.745660      0.763540    0.698788    0.731602             0.785131
AVG          0.7515305     0.7653163   0.7884327   0.8026731            0.808579

Table 15 Results of combined evaluation of the five metasearch systems

Test query   Borda count   MC2         Rough set   Modified rough set   Neural network
1            0.710707      0.812857    0.797797    0.854782             0.909841
2            0.727996      0.718436    0.730310    0.739569             0.849927
3            0.784451      0.770778    0.813191    0.821598             0.813095
4            0.796825      0.736007    0.781455    0.769780             0.827666
5            0.838369      0.750365    0.860865    0.903059             0.791369
6            0.688155      0.743531    0.764668    0.796052             0.863158
7            0.742524      0.828812    0.840259    0.857431             0.724747
8            0.740226      0.769325    0.804682    0.785297             0.731170
9            0.816751      0.746195    0.781801    0.759808             0.782640
10           0.752414      0.764832    0.698903    0.731928             0.783442
AVG          0.759841      0.764138    0.787393    0.801880             0.807704
Fig. 3 Results of evaluation by user 3 (bar chart of the modified Spearman rank correlation coefficient for the Borda count, MC2, rough set, modified rough set and neural network based metasearch systems)

Fig. 4 Results of combined evaluation (bar chart of the modified Spearman rank correlation coefficient for the five metasearch systems)
To test the performance of the neural network based metasearch system, we used a set of 10 new queries and the same set of seven participating public search engines. We evaluated the search results of the neural network based metasearch system using three independent evaluators. We also obtained the results from four other metasearch systems, namely the Borda's method based, MC2 based, rough set based and modified rough set based metasearch systems, for the same set of 10 new queries and the same set of seven participating public search engines, and we evaluated their search results using the three independent evaluators. We used the modified Spearman rank correlation coefficient to measure the correlation between the ranked search results from these metasearch systems and the ranking by an evaluator. Experimentally, we find that the neural network based metasearch system outperforms the other four metasearch systems. Our results can be relied upon as long as we receive correct feedback from the user and from the independent evaluators; if, for any reason, we are not able to obtain correct feedback, the accuracy of our system is compromised. In the future, we may obtain feedback for the same query from different users and perform an efficient aggregation to obtain the collective feedback of the group of users; this aggregated feedback may then be used to obtain the results of the neural network based metasearch system. Moreover, in future work, the number of queries in the training set and the test set may be increased to study the effect on the relative accuracy of these systems. It is also an interesting research direction to use queries from different fields in the training set and then evaluate the performance of the metasearch system using test queries from these different fields.
References

1. Aslam JA, Montague M (2001) Models for metasearch. In: Croft WB, Harper DJ, Kraft DH, Zobel J (eds) Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, pp 276-284
2. Montague M, Aslam JA (2002) Condorcet fusion for improved retrieval. In: Proceedings of the 11th International Conference on Information and Knowledge Management, pp 538-548
3. Vogt CC, Cottrell GW (1999) Fusion via a linear combination of scores. Inf Retrieval 1(3):151-173
4. Dwork C, Kumar R, Naor M, Sivakumar D (2001) Rank aggregation methods for the web. In: Proceedings of the Tenth International Conference on World Wide Web, pp 613-622
5. Renda ME, Straccia U (2003) Web metasearch: rank vs. score based rank aggregation methods. In: Proceedings of the 18th Annual Symposium on Applied Computing, pp 841-846
6. Ali R, Sufyan Beg MM (2007) A learning algorithm for metasearching using rough set theory. In: Proceedings of the 10th International Conference on Computer and Information Technology (ICCIT 2007), IEEE Press, Dhaka, Bangladesh, pp 361-366
7. Ali R, Saxena A, Gupta R, Sufyan Beg MM (2011) Myriad: a novel user feedback based metasearch engine. In: Proceedings of the 2011 International Conference on Control, Robotics and Cybernetics (ICCRC 2011), vol 1, New Delhi, India, pp 163-167
8. Borda JC (1781) Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des Sciences
9. Cheng CT, Wang WC, Xu DM, Chau KW (2008) Optimizing hydropower reservoir operation using hybrid genetic algorithm and chaos. Water Resour Manage 22(7):895-909
10. Jia W, Ling B, Chau KW, Heutte L (2008) Palmprint identification using restricted fusion. Appl Math Comput 205(2):927-934
11. Xie JX, Cheng CT, Chau KW, Pei YZ (2006) A hybrid adaptive time-delay neural network model for multi-step-ahead prediction of sunspot activity. Int J Environ Pollut 28(3-4):364-381
12. Lin JY, Cheng CT, Chau KW (2006) Using support vector machines for long-term discharge prediction. Hydrol Sci J 51(4):599-612
13. Wu CL, Chau KW, Li YS (2009) Predicting monthly streamflow using data-driven models coupled with data-preprocessing techniques. Water Resour Res 45:8432. doi:10.1029/2007WR006737
14. Zhang J, Chau KW (2009) Multilayer ensemble pruning via novel multi-sub-swarm particle swarm optimization. J Univers Comput Sci 15(4):840-858
15. Beg MMS, Ahmad N (2003) Soft computing techniques for rank aggregation on the World Wide Web. World Wide Web Int J 6(1):5-22
16. Ahmad N, Beg MMS (2002) Improved methods for rank aggregation on the World Wide Web. In: Proceedings of the International Conference on Knowledge Based Computer Systems (KBCS 2002), Mumbai, India, Dec 19-21, pp 193-202
17. Beg MMS, Ahmad N (2002) Improved Shimura technique for rank aggregation on the World Wide Web. In: Proceedings of the 5th International Conference on Information Technology (CIT 2002), Bhubaneswar, India, Dec 21-24
18. Beg MMS (2004) Parallel rank aggregation for the World Wide Web. In: Proceedings of the International Conference on Intelligent Sensing and Information Processing (ICISIP 2004), IEEE Press, Chennai, India, Jan 4-7, pp 385-390
19. Beg MMS, Ahmad N (2002) Genetic algorithm based rank aggregation for the Web. In: Proceedings of the 6th International Conference on Computer Science and Informatics, a track at the 6th Joint Conference on Information Sciences (JCIS 2002), Durham, NC, USA, Mar 8-13, pp 329-333
20. Ahmad N, Beg MMS (2002) Fuzzy logic based rank aggregation methods for the World Wide Web. In: Proceedings of the International Conference on Artificial Intelligence in Engineering and Technology (ICAIET 2002), Malaysia, June 17-18, pp 363-368
21. Ali R, Beg MMS (2009) A comparative study of rough set and fuzzy set based rank aggregation techniques for the Web. Int J Inf Process 3(1):78-91
22. Ali R, Beg MMS (2007) Rough set based rank aggregation for the Web. In: Proceedings of the 3rd Indian International Conference on Artificial Intelligence (IICAI-07), Pune, India, Dec 17-19, pp 683-698
23. Ali R, Beg MMS (2008) User feedback based metasearching using rough set theory. Int J Fuzzy Syst Rough Syst 1(2):45-56
24. Ali R, Beg MMS (2008) User feedback based metasearching using rough set theory. In: Proceedings of the 2008 International Conference on Information and Knowledge Engineering (IKE'08), a track at the 2008 World Congress in Computer Science, Computer Engineering and Applied Computing (WORLDCOMP'08), Las Vegas, USA, July 14-17, pp 489-495
25. Ali R, Sufyan Beg MM (2009) Modified rough set based aggregation for effective evaluation of web search systems. In: Proceedings of the 28th North American Fuzzy Information Processing Society Annual Conference (NAFIPS 2009), IEEE Press, Cincinnati, Ohio, USA, June
26. Haykin S (1999) Neural networks. Prentice-Hall International Inc.
27. Barakat M, Lefebvre D, Khalil M, Druaux F, Mustapha O (2013) Parameter selection algorithm with self adaptive growing neural network classifier for diagnosis issues. Int J Mach Learn Cybern 4(3):217-233
28. Zheng H, Wang H (2012) Improving pattern discovery and visualisation with self-adaptive neural networks through data transformations. Int J Mach Learn Cybern 3(3):173-182
29. Wang X, Dong C-R, Fan T-G (2007) Training T-S norm neural networks to refine weights for fuzzy if-then rules. Neurocomputing 70(13-15):2581-2587
30. Tsang E, Wang X, Yeung D (2000) Improving learning accuracy of fuzzy decision trees by hybrid neural networks. IEEE Trans Fuzzy Syst 8(5):601-614
31. Fausett L (1994) Fundamentals of neural networks: architectures, algorithms and applications. Prentice-Hall, New Jersey
32. Freeman JA, Skapura DM (2005) Neural networks: algorithms, applications and programming techniques. Pearson Education Private Ltd.
33. Beg MMS (2002) On measurement and enhancement of web search quality. PhD thesis, Department of Electrical Engineering, IIT Delhi, India
34. Beg MMS, Ahmad N (2007) Web search enhancement by mining user actions. Inf Sci 177(23):5203-5218
35. Ali R, Beg MMS (2007) A framework for evaluating web search systems. WSEAS Trans Syst 6(2):257-264
36. Ali R, Beg MMS (2009) Automatic performance evaluation of web search systems using rough set based rank aggregation. In: Proceedings of the IEEE Workshop on Recent Trends in Human Computer Interaction, Springer, Allahabad, India, Jan 19-21, pp 344-358
37. Ali R, Beg MMS (2006) A comprehensive model for web search evaluation. In: Proceedings of the 5th WSEAS International Conference on Circuits, Systems, Electronics, Control & Signal Processing (CSECS '06), Dallas, USA, Nov 1-3, pp 159-164
38. Ali R, Beg MMS (2011) An overview of web search evaluation methods. Comput Electr Eng 37(6):835-848