Optimization mechanism for Web Search Results using Topic Knowledge

Yannis Panagis and Evangelos Sakkopoulos*
Computer Engineering & Informatics Dept, University of Patras, 26500 Patras, Greece
Email: [email protected]
Email: [email protected]
*Corresponding author

John Garofalakis and Athanasios Tsakalidis
Research Academic Computer Technology Institute, N. Kazantzaki Str., 26504 Patras, Greece
Email: [email protected]
Email: [email protected]
Abstract Web searching is one of the most frequent activities among the Internet community, but perhaps also one of the most complicated, because of the abundance of resulting information. Users often find themselves puzzled before long ranked result lists that are compiled without considering their individual preferences and needs. In this work, we present a mechanism that re-ranks and groups search results on the user's side according to his/her explicit and implicit choices. Furthermore, to minimize the response-time cost of personalization, a caching strategy is introduced. A web environment prototype has been developed to exemplify the potential of the proposed mechanism, and a user assessment has been conducted to verify its effectiveness. The results have been efficient and the feedback encouraging.
Keywords: Category based personalization, Personalized Search, Web Searching Biographical notes: John Garofalakis, born in 1959, obtained his Ph.D. from the Department of Computer Engineering and Informatics (CEID), University of Patras, Greece, in 1990, and his Diploma in Electrical Engineering from the National Technical University of Athens, Greece, in 1983. He is currently Associate Professor in CEID and manager of the Telematics Center Department at the Research and Academic Computer Technology Institute of Greece. His research interests include performance evaluation, distributed systems and algorithms, Internet technologies and applications. He has published over 65 papers in various journals and refereed conferences and is the author of several books and lecture notes in the Greek language. Yannis Panagis was born in Greece, in 1978. He is currently a PhD candidate at the Computer Engineering and Informatics Department at the University of Patras and a member of Research Unit 5 of the RA Computer Technology Institute. Yannis holds an MSc from the same Department, where he has also completed his undergraduate studies. His interests span the areas of Data Structures, String Processing Algorithms and Web Engineering, where he has published papers in international journals and conferences. He has also co-authored two book chapters. Evangelos Sakkopoulos was born in Greece, in 1977. He is currently a PhD candidate at the Computer Engineering and Informatics Department, University of Patras, Greece and a member of Research Unit 5 of the RA Computer Technology Institute. He received the M.Sc. degree with honors and the diploma of Computer Engineering and Informatics at the same institution. His research interests include Web Services, Web Engineering, Web Usage Mining, Web based Education, Web Searching and Intranets.
He has more than 20 publications in international journals and conferences in these areas. Athanasios Tsakalidis has been a Professor in the Department of Computer Engineering and Informatics of the University of Patras since 1993. He completed his Diploma in Mathematics in 1973 (University of Thessaloniki), and his studies and PhD in Informatics in 1980 and 1983 respectively (University of Saarland, Germany). From 1983 to 1989, he worked as a researcher at the University of Saarland, where he was a student and collaborator of Prof. Kurt Mehlhorn (Director of the Max Planck Institute of Informatics in Germany). He is one of the contributors of the 'Handbook of Theoretical Computer Science' (Elsevier and MIT Press, 1990). His main interests are Data Structures, Graph Algorithms, Computational Geometry, Multimedia, Information Retrieval and Bioinformatics. Professor Tsakalidis has authored books, chapters, and numerous publications in international journals and conferences, with particular contributions to the solution of elementary problems in the area of data structures.
1. Introduction The Internet has evolved into a rapidly growing universal information source. To seek information in this ever-increasing amount of data efficiently, users turn to popular Web search engines such as [17][18][19]. Unfortunately, users often spend a long time and a lot of effort to successfully locate the information they are really interested in. This happens especially to users with limited web-searching experience, who have difficulties in efficiently defining the knowledge domain of the requested information. In this work, we introduce an optimized mechanism that takes advantage of users' individual choices to group, and as a result re-rank, the results into topics of interest. This is achieved using two actions. First, topic selection is available to users in order to define a set of preferred groups of results. Furthermore, a history profile of previously chosen topics is kept in order to implicitly detect user preferences. To maximize the personalization effect, a multilevel adaptive web [3] profile is utilized, initially presented for educational environments in [7] by some of the authors. The users' recorded behavior in the recent past helps to forecast their interests and to categorize search results accordingly. After the completion of the standard search engine's activities, a high-performance classifier [6][13] is used for the final re-ranking and topic-based grouping of results. In fact, the classifying module is available in multiple instances. In the proposed work, each user may additionally train his/her personal classifier instance apart from using the generic one, to achieve training that is further targeted and specialized to the user's individual interests. An illustrative example, together with the initial search results of a popular search engine, can be found in Figure 1. The proposed mechanism's results are found on the left-hand side of the figure under the specific topic title.
On the right-hand side, the default Google Search engine results are displayed. Furthermore, the implemented mechanism supports two user-defined operation modes. To deliver fast categorization results, the first mode utilizes only the summary text of the search results. For users who insist on a safer, topic-based ranking, a second mode is available based on the full text of each resulting page. This intuition has been verified during the user-based experimental assessment. In the implementation, we have further optimized the proposed mechanism's response time with an intelligent caching policy. The proposed solution is presented and discussed as follows. In section 2, previous work is presented concerning techniques and mechanisms that facilitate searching, enhance the search experience and provide means of personalization. Section 3 presents the mechanism based on the classifier, as well as the user-driven training of the classifier instance. Next, in section 4 the personal profile technology is outlined. In section 5, functional specifications of the environment are discussed. Section 6 presents details of the caching policy. Section 7 includes the user feedback from the experimental evaluation of the solution. Finally, section 8 discusses conclusions and describes possible future steps.
2. Previous Work The research community has put a lot of effort into the personalization of web search results up to now. There are results which are efficient under particular circumstances, but research on this topic is ongoing. The proposed mechanism provides an additional approach, which relates to the previous works mentioned in the following. A number of attempts have been introduced that deal with the context-based improvement of the user's searching experience. For instance, topic indicators are returned with the results of Yahoo!, A9 and Google [17][18][19], based on the Open Directory Project category structure. There are also engines that provide results clustering, such as WiseNut, Adutopia and Vivisimo [20][21][22]. One step further, Teoma [23] clusters its results as well as providing query refinements. Metasearch environments implement strategies that map user queries to collections, as in [10][16]. Our optimized mechanism takes advantage of standard search-engine infrastructure and builds on top of it an extension transparent to the user. The aim is to personalize the result set after web searching and before rendering the list in the user's browser, using implicit and explicit user-specific information. A comparative presentation of information filtering and intelligent agent work is given in [8]. Letizia and Citeseer also build user profiles and recommend documents based on the knowledge acquired in the profiles. In contrast to the previous
methods, which filter search results that have already been retrieved, the proposed techniques give the user the opportunity to perform a quick topic re-ranking based on the result summaries only, minimizing in this way the response time of the whole procedure. Furthermore, no general profile is used in information filtering. In the personalized web search arena, personalized PageRank scores have been used to enable "topic sensitive" Web searches [9]. Experiments in that work concluded that they can improve a single Web search; nevertheless, no experiments based on an individual's context were conducted. On the other hand, in [5] techniques are introduced for "personally customized authority documents". This attempt follows the conventions of Kleinberg's HITS algorithm [11]. Popular search engines (e.g. Personalized Google Search) have recently introduced personalization features based on pre-categorization of pages, enabling them to save time. The proposed attempt differs in that we perform personalization over a small set of categories and document results based on the user's individual interests. Within this limited result set, the adaptive web techniques [3] utilized by our environment can help to improve the searching experience. The adaptive web makes it possible to deliver personalized views or versions of a hypermedia document, improving usability, and thus productivity, for users with diverse needs and knowledge backgrounds [4][15]. A comparison of various text categorization methods is given in [14]. We have utilized a support vector machine [6][13] as a classifier; details are provided in section 3 below.
Figure 1: Comparing results with personalization and Google standard interface
3. SVM Classification There are many cases where a series of different items with a degree of similarity has to be grouped and ranked into a list of predefined topics/categories. Such cases include the categorization of documents, the recognition of alphabetical characters, the categorization of pictures and the recognition of DNA chains. These problems are characterized by a large number of parameters and have proved both difficult and time-consuming to solve without the help of computers. Various algorithms have been developed for this purpose. Such solutions first learn from predetermined samples that have been ascertained to belong to the categories of interest. The algorithm utilized in the proposed work is the well-established Support Vector Machine (SVM). As its name suggests, SVM creates a vector that describes each of the input samples. It can then separate the vectors that contain appreciable information and classify them into categories. This is achieved by comparing the vectors against the samples used for training the algorithm. SVM has been shown in previous works to be both fast and accurate in text classification [6][13].
3.1. User-defined SVM Training To raise the accuracy of the proposed mechanism, it is essential to train the SVM algorithm with pre-selected samples of representative topics. In this case, the popular list of ODP topics was utilized. ODP topics are accompanied by a number of web pages (documents) available for training an SVM. The main concept of SVM is that the classification of a web-page result into a topic follows previously categorized samples. After taking the "advice" of the previous training, the algorithm decides the topic that the page should be grouped under.
At this point, an important aim is to avoid erroneous conclusions in future page-category comparisons. This is achieved if the assignment of web pages to categories is as representative as possible. Moreover, the more samples we provide to the algorithm for each category, the more likely it is to achieve a suitable match between web pages and categories. The training of the text classifier is crucial for the efficient operation of the system. In this work, the initial SVM training is performed before the first use of the system. At a second level, it is possible for users to provide additional, focused training of a personal classifier instance (see Figure 2). Each user may choose to feed the personal classifier with data reflecting his/her personal choices, in order to specialize it further for the categories of personal interest. We now describe the design and the training procedure in detail. Selecting suitable samples is a critical step in the SVM learning process, since these samples drive the categorization in subsequent stages; the success of the training procedure depends highly on their selection. For even more accurate results, the proposed personalized environment is designed to let users train the classifying algorithm. The user is able to define the number of samples that will be used, as well as the number of samples that have already been used for the category training; this way the system avoids reusing the same samples. The corresponding user form is shown in Figure 2. The training process includes mining of samples (from the ODP list, or even from Google Search or other engine results) and extraction of the most useful keywords.
3.2. SVM Implementation Details As mentioned above, we used the libSVM 2.4 library to implement the SVM algorithm. This library includes a training process for samples whose category is already known, as well as a category-prediction process. Here we are concerned with the training of the algorithm, which proceeds in a specific way: the first part of the training process creates a data file, and the second a training model that is used for categorization.
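As a rough illustration of this train-then-predict flow, the sketch below uses scikit-learn's linear SVM over tf-idf features as a stand-in for the paper's libSVM 2.4 pipeline; the toy documents and topic labels are invented, not taken from the ODP training data.

```python
# Minimal sketch of topic classification for search results (Section 3),
# assuming a linear SVM over tf-idf keyword features. This is NOT the
# paper's libSVM 2.4 setup; it only illustrates the same train/predict idea.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical training samples standing in for ODP category documents.
train_docs = [
    "java programming language class object compiler",
    "jvm bytecode inheritance interface method",
    "jaguar wildlife cat rainforest predator",
    "big cats habitat conservation species animal",
]
train_topics = ["Computers", "Computers", "Animals", "Animals"]

vectorizer = TfidfVectorizer()            # keyword extraction / weighting
X = vectorizer.fit_transform(train_docs)
clf = LinearSVC().fit(X, train_topics)    # train the SVM on labeled samples

def categorize(text):
    """Assign a search-result text (summary or full page) to a topic."""
    return clf.predict(vectorizer.transform([text]))[0]
```

A personal classifier instance would simply be a second model of the same kind, trained on the extra samples the user supplies.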
4. User Adaptation 4.1. Multi-Level Personal Profile The activities of a user, recorded explicitly or implicitly, are usually stored in a profile [1]. In this case, the profile adopted bases its conceptual schema on the work of [7]. The knowledge kept in the profile is crucial for the effective personalization of the system. In general, implicit recording of user actions requires only minimal user involvement. However, this information is difficult to track in a profile, as it is not straightforward which actions correspond to the user's knowledge, abilities and preferences. In addition, the stateless and transactionless nature of web applications poses extra constraints on recording the user. In the proposed solution, only the topics previously accessed are recorded in the profile. These actions indicate that the corresponding topic is important to the user. As a consequence, we introduce a significance factor, which depicts how interesting a specific topic is to an individual. This factor is computed as follows: factor of interest = (number of movements in the category) / (total number of movements). The resulting value of the factor appears next to the title of each topic in the results list. This factor is used for ranking the topic-based groups of results: topics with a higher factor appear first in the results list (an example is shown in Figure 1), while new categories are assigned the lowest factor value and appear last in the presented list. Within a topic, the resulting pages p are ranked according to a function of the form φ(ψ(p), ρ(p)). In this function, ψ(p) is the category that a page p belongs to, ρ(p) is the relevance-importance according to the ranking algorithm of the engine, and the function φ indicates how the final ranking is biased towards either the engine-independent ranking or the topic importance (e.g. φ(α, β) = (α + β)/2). In the implemented environment, results are presented based on a combination of alphabetical order and PageRank value [12] for the rankings α and β respectively.
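The significance factor and the example combination function above can be sketched as follows; the function names and the click-count inputs are illustrative assumptions, since the paper does not specify storage details.

```python
# Sketch of the profile's significance factor and the example bias
# function phi from Section 4.1. Input counts are hypothetical.
def interest_factor(clicks_in_category, total_clicks):
    """factor of interest = movements in the category / total movements."""
    return clicks_in_category / total_clicks if total_clicks else 0.0

def phi(alpha, beta):
    """Example combination from the paper: phi(a, b) = (a + b) / 2."""
    return (alpha + beta) / 2
```

With integer rank positions for α and β, a topic clicked 3 times out of 10 total movements gets factor 0.3, and phi simply averages the two rankings.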
4.2. Time-based Personalization Obsolescence
In order to prevent forgotten and obsolete notions from being maintained in the profile, the system supports a time-based distinction among users' actions. The property of obsolescence can easily be transformed so that it is based on the hypothesis that a uniform distribution of property appearance over time is more valuable than others [2]. Although this approach has been neither proved nor disproved, the influence of time clearly has a strong effect. In short, properties that have not been used for longer than ten days become disabled (though kept in the user's profile), indicating that the user may have no further interest in this keyword. After two months, these keywords are deleted as outdated for the user.
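The two-stage obsolescence rule above can be sketched as a small state function; the 60-day interpretation of "two months" and the data layout are assumptions.

```python
# Sketch of the time-based obsolescence rule (Section 4.2): profile
# properties unused for more than 10 days are disabled but kept, and
# deleted after two months (assumed here as 60 days).
from datetime import datetime, timedelta

def property_status(last_used, now):
    """Classify a profile property by the time since it was last used."""
    age = now - last_used
    if age > timedelta(days=60):
        return "deleted"    # outdated: removed from the profile
    if age > timedelta(days=10):
        return "disabled"   # kept in the profile but inactive
    return "active"
```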
Figure 2: User SVM training web form
Figure 3: Searching Input Form
5. Functional Details and Adaptive Specifications In this section, we present details of the functional modules of the environment, omitting the training procedure, which is standard. First, the two operation modes are discussed along with their adaptation features; then searching options and tools are outlined.
5.1. Full-Page or Summary-Only based Categorization The environment includes two different built-in topic categorization functions, allowing users to choose what is best for them at any step of their search session. One mode takes advantage of the full text of a web-page result; after analyzing it, the system decides where to group the specific result. The second mode uses only the summary text provided in the standard search-engine result lists. The overall search process is described in the following section. The full-text-based categorization mode yields more reliable topic assignments. The full-page body provides the most detailed description of the content as input to the categorization mechanism; in this case, all keywords contained in the page are considered in the categorization process. Additionally, this mode supports efficient re-categorization of a page in case its full-text body changes radically. Categorization takes into consideration any updates of the page, since its full text is examined each time before categorization (details depend on the caching policy described below). As a result, in this operation mode re-categorization is possible whenever necessary; the environment adapts efficiently to the new data and achieves better user guidance. Unfortunately, for page results with excessively large text size, exhaustive keyword extraction can become rather time-consuming. To limit the effects of this drawback, a page-size threshold is set: only the first 10KB of text is taken into consideration by the environment for page categorization. This decision limits delays in the categorization process without affecting system performance too much. Moreover, the content of the main web pages of some websites is not always representative, which can complicate or mislead the categorization: the information that would guide the categorization procedure may not be included in the main page (e.g. in typical portal home pages). To avoid such situations, we categorize only web pages whose text size exceeds a minimum bound.
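The two size rules above can be sketched together; the 10KB upper threshold is from this section, while the 100-character minimum bound is taken from the exclusion rule later described in Section 7.1 (an assumption that the same bound is meant here).

```python
# Sketch of the page-size rules for full-text categorization (Section 5.1).
# Constants follow the paper's stated thresholds; exact byte/char semantics
# are an assumption.
MAX_TEXT = 10 * 1024   # only the first 10KB of text is categorized
MIN_TEXT = 100         # assumed minimum bound (100 chars, cf. Section 7.1)

def text_for_categorization(page_text):
    """Return the text slice to categorize, or None if the page is too short."""
    if len(page_text) < MIN_TEXT:
        return None               # too little content to categorize safely
    return page_text[:MAX_TEXT]   # truncate overly long pages
```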
Every search engine provides, together with each page in the results list, a short excerpt containing the parts of the page relevant to the user-posed query. We call this text the summary, though it certainly does not constitute a page abstract. The proposed mechanism has a functional mode that utilizes this summary during the categorization procedure. In this case, the text being processed has a strictly limited size, which minimizes categorization response time. This mode allows fast categorization conclusions, since it is easy to isolate the keywords. When the summary is representative of the web-page content, it can provide satisfactory results; if the summary is descriptive enough, the categorization is considered fully trusted. The drawbacks of this mode derive from the limited length of the summary, which may lead to unsafe conclusions. Moreover, in some cases the summary is not available at all, and the mechanism cannot initiate the categorization process. Once a summary has been associated with a web-page link, its text remains constant, permanently registered in the data storage of the search engine; consequently, the environment cannot adapt the categorization to potentially changed page content. During the evaluation process, we verified that the full-text categorization mode leads to safer grouping of results into topics, mainly because it provides a satisfactory amount of keyword/term information. In contrast, summary-based topic re-ranking may mislead the final categorization propositions. Nevertheless, retrieving and searching the entire content is much slower in response time. The summary-based choice quickly produces useful conclusions and, provided the summary is sufficient in length and semantics, users seem to prefer this trade-off.
Figure 4: Categorized Presentation of Results vs typical Result
5.2. Searching procedure in the environment Initially, the user is presented with a simple interface, as depicted in Figure 3. Upon submission of a query, the proposed system performs a series of steps similar to those of the SVM training phase. First, it retrieves the resulting proposals using the Google Web Service API. A different search engine could be utilized without affecting the generality of the mechanism; Google was chosen because it provides a transparent programmable interface through a Web Service. Subsequently, we take advantage of multithreading techniques to improve system response time: requests are issued in parallel for the pages included in the initial query response. A wait-time threshold of seven (7) seconds has been set for a thread to complete; as soon as the threshold is exceeded, processing continues with the next requests.
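The parallel retrieval step with the 7-second threshold can be sketched as follows; the thread-pool size and the fetch helper are illustrative assumptions (the paper's implementation used the Google Web Service API rather than raw HTTP).

```python
# Sketch of multithreaded page retrieval with a per-page wait threshold
# (Section 5.2). Pool size and error handling are assumptions.
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request

TIMEOUT = 7  # seconds a thread may spend on one page

def fetch(url):
    """Download one result page (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=TIMEOUT) as resp:
        return url, resp.read()

def fetch_all(urls, fetcher=fetch):
    """Fetch pages in parallel; pages that time out or fail are skipped."""
    pages = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(fetcher, u) for u in urls]
        for fut in as_completed(futures):
            try:
                url, body = fut.result()
                pages[url] = body
            except Exception:
                pass  # exceeded threshold or failed: move on
    return pages
```

Passing a different `fetcher` makes the routine engine-agnostic, mirroring the paper's claim that another search engine could be substituted.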
In cases where certain pages are excessively long, and in order to avoid delaying the categorization computations, we have specified an upper threshold on their size. This limit is 50K characters, which allows the retrieval of a satisfactory amount of information for safe results without being time-consuming. After the content of the resulting pages has been retrieved, processing begins to analyze each page's text and detect keywords. For this purpose, we follow a process similar to the one described for the training operation: unnecessary code is removed, including HTML tags, client-side JavaScript or other script code, and a list of unnecessary stop words. Regular expressions are utilized to implement this functionality. In the final step, categorization of the web-page results into topics is performed. The implementation utilizes the algorithm found in the public libSVM 2.4 library. During the process, descriptors of each page are created, depicting its representative keywords. Finally, results are rendered dynamically on the client browser.
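The regex-based cleanup step can be sketched as below; the stop-word list is a tiny illustrative subset, not the one used in the implementation.

```python
# Sketch of keyword extraction via regular expressions (Section 5.2):
# strip script code and HTML tags, then drop stop words. The stop-word
# set here is an illustrative assumption.
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}

def extract_keywords(html):
    text = re.sub(r"(?is)<script.*?</script>", " ", html)  # drop script code
    text = re.sub(r"(?s)<[^>]+>", " ", text)               # drop HTML tags
    words = re.findall(r"[a-zA-Z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]
```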
6. Caching Policy To optimize system response time, a time-based caching policy is applied to retrieved results. The key concept is to avoid unnecessary page retrieval or categorization processing for pages that have recently been requested; the categorization of search results can be time-consuming, as it depends on text length. The intuition behind the caching policy lies in the fact that many web-page results remain unchanged, so it is not necessary either to download them or to categorize them repeatedly. For the former case, expiration dates are used to decide whether a page must be re-retrieved; this information can be found in a page's HTML tags, usually inserted automatically by many integrated development tools. As for the caching of categorization results, we store them with a timestamp and consider them valid for a two-day period. This interval adequately improves the response time for searches performed shortly after one another. The result-grouping information stored in the system cache can also be used by the remaining users, where applicable, further accelerating system response. As the number of categorized pages increases, so does the probability that a stored page will be requested again. We thus build a data warehouse of categorized web pages, which can be a precious aid to all system users. In this way, we manage to improve system performance without jeopardizing its validity. The two-day interval has produced efficient operational results and ensured acceptable response speeds during assessment.
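The two-day categorization cache can be sketched as a small timestamped store; the class name and dictionary layout are assumptions, only the 48-hour validity period comes from the text.

```python
# Sketch of the shared categorization cache (Section 6). Entries are keyed
# by URL and considered valid for two days; storage layout is an assumption.
from datetime import datetime, timedelta

CACHE_TTL = timedelta(days=2)

class CategoryCache:
    def __init__(self):
        self._entries = {}  # url -> (topic, timestamp)

    def put(self, url, topic, now):
        self._entries[url] = (topic, now)

    def get(self, url, now):
        """Return the cached topic, or None if missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        topic, ts = entry
        if now - ts > CACHE_TTL:
            del self._entries[url]  # expired: force re-categorization
            return None
        return topic
```

Because the cache is keyed only by URL, one user's categorization work benefits every other user who later requests the same page, as the section notes.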
7. User assessment In this section, we present system assessment results. We evaluated system performance for a number of different user cases, asking participants to perform web searches with our proposed mechanism. We provided online access to the system for students in a laboratory environment, and two separate experimental rounds were performed. The parameters that help us evaluate the usefulness of the proposed system are: • Categorization – Result Regrouping • Adaptation to the user's individual interests
7.1. Comparative Results To evaluate the proposed solution, we delivered a system version that presented to the evaluating user, side by side, the results of our system and of the Google Search engine, designed to facilitate immediate comparison. Our personalized environment is accompanied by a logging mechanism that records statistical information about the search results a user has preferred; in this way, we detect which solution the user found more effective. In order to satisfy user needs, the system should categorize the pages as accurately as possible. As already mentioned, the training of the classification algorithm plays a very important role in the categorization process: suitable samples must be selected and the training must be symmetric across all categories, in order to avoid erroneous conclusions. In the case of personalized categorization training, only the categories that interest the user receive extra training. In our case, the algorithm has been trained uniformly, using results from the ODP repository; hence we created a model that can handle each possible case and significantly supports the operation of the system. Under the assumption that, at the moment of result categorization, inactive pages (broken links) and pages without sufficient content (text length less than 100 characters) are excluded, we observed that our system is able to correctly estimate the majority of page topics. After two rounds of user-based laboratory evaluation, the categorization success rate is above
70%, since the algorithm predicts with considerable precision. This percentage rises further as the system is utilized by more users and the number of training examples increases, or as personalized training is performed. Topic-based optimization of results was the option primarily chosen by subjects during the experiments, and the categorization was judged positively for most submitted queries. The aggregated number of user clicks recorded during the experiments showed that 73% of them were on the categorized results of our system, a strong indication that our optimization mechanism provides a preferred personalized search-interface alternative for the users. A quantitative comparative graph is presented in Figure 5, where the accuracy of the two functional topic-assignment modes is shown: different query terms appear on the horizontal axis, and the vertical axis depicts the average result-categorization success percentage of each query over all experimental cases. For the presented queries, the full-page categorization mode succeeded in 86.67% of cases, while summary-based topic assignment succeeded in 54.17%.
7.2. User Interface Evaluation Particular attention has been paid to providing a user-friendly interface for the proposed system. For that reason, we avoided requiring a large amount of registration data, both in the registration process and in query submission.
Figure 5: Full Page vs Summary Grouping results (average categorization success percentage per query term: apple, book, airplane, java, jaguar, athens, and overall average; full-text vs summary-only modes)
System registration is simple, requiring only the insertion of a user name and password. When posing a query, the user has to fill in a) the number of results he/she wishes to receive per page and b) the searching operational mode of the query (full-text or summary-based). During the evaluation, users were supportive of the simplified yet efficient input forms; they found it easy to specify the search details and to continue without delays. User preferences are stored as preselections, available to guide them the next time they enter the system. Apart from the data insertion, we also tried to simplify the way results are presented to the user. The grouping-in-categories presentation provides a better perception of the retrieved results and allows users to proceed without misinterpreting their choices. The results include the short summary description provided by Google, further improving readability. The web-page descriptions are presented under their corresponding addresses, which, in combination with the predicted category, leads users to safer conclusions. Thus, we achieve suitable conditions for selecting the most suitable page, without redundant information.
8. Conclusions and Future Steps Searching is a daily activity for most Internet users. Unfortunately, the increasing number of web-page sources complicates information detection. As a matter of fact, many Internet users are often disappointed, being unable to use search engines effectively to locate the desired results. Selecting the appropriate keywords can sometimes be quite puzzling. Moreover, even when appropriate keywords are defined to describe a request, search engines return long result lists that are identical for all kinds of users, presented with no consideration of the user's personal preferences or needs. In this work, we present an optimized mechanism that personalizes a-posteriori the search results of a classic engine, using global and local topic knowledge. The main aim is to rank higher the topic groups of pages that interest the user most. This objective is achieved by categorizing the pages into a list of predefined topics, thus providing a more direct view of their content. The proposed solution uses a high-performance SVM classifier in a double role. First, a generic categorizing SVM instance supports all users; topic assignment to search results is performed adaptively, as the classifier is not used as a monolithic tool. Furthermore, the classifier is built into the environment in multiple personalized instances for individual users: users can perform extra training of their personal classifier instance beside using the generic one, personalizing the environment to handle more efficiently the documents and categories of their interest. Additionally, the proposed mechanism implements two functional modes, providing full-text-based and summary-based classification options.
Evaluation verified that these two modes obey a speed-versus-accuracy trade-off, leaving the choice to the user. Using the proposed system, the user can identify potentially interesting pages from a different viewpoint, since they are grouped into categories based on their content. Consequently, the user can select the desired page without additional delay, dispensing with unnecessary information. Overall, the proposed environment builds on the results of a popular search engine and assists the user in detecting and selecting the most interesting information. Previous user activity, preferences, choices and needs are stored as knowledge in an adaptive user profile. Such an approach is expected to play a significant role in future attempts to optimize search engine performance. Future work includes the incorporation of semantics-based categorization techniques; experimentation with additional caching policies to minimize the number of page downloads and categorization operations; peer-based personalization over the categories of retrieved results in community-based environments; and transformation of the technique to support classification and selection of annotated Web Service descriptions in a Web Services search environment.
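One such caching policy can be sketched as follows: remembering the category already assigned to a URL spares both the download and the classification when the same page reappears in later result lists. The LRU eviction rule and the capacity are our assumptions, offered only as one candidate among the "additional caching policies" mentioned above.

```python
from collections import OrderedDict

class CategoryCache:
    """LRU cache mapping a URL to its assigned category (sketch)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.entries = OrderedDict()  # url -> category, oldest first

    def get(self, url):
        if url not in self.entries:
            return None
        self.entries.move_to_end(url)  # mark as recently used
        return self.entries[url]

    def put(self, url, category):
        self.entries[url] = category
        self.entries.move_to_end(url)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

def categorize(url, cache, classify, download):
    """Return the category of a page, downloading and classifying
    it only on a cache miss."""
    cached = cache.get(url)
    if cached is not None:
        return cached  # no download, no classification needed
    category = classify(download(url))
    cache.put(url, category)
    return category
```

Any classifier and fetcher can be plugged in as `classify` and `download`; the cache only controls when they are invoked.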