
Recent Research Developments in Learning Technologies (2005)


Specialized Search Engines for E-learning

M. Arrigo*,1, M. Gentile1, D. Taibi1 and O. Di Giuseppe1

1 Italian National Research Council - Institute for Educational Technology - Via Ugo la Malfa, 153 90146 Palermo, ITALY

The web provides an enormous amount of information, and the diffusion of search engines has made this information accessible. Without search engines the web would not have been so successful. The most common search engines are keyword-based, but in general they have various limitations related to the quality of the search results. To overcome these limitations many authors suggest the use of a specialized search engine. In this paper we propose the use of a specialized search engine to assist students in their learning tasks. We introduce an iterative technique to build a taxonomy which is used to classify documents regarding a specific topic, and a tool that assists users in the taxonomy building process. Finally, we present an application example that, by means of a search engine, GPS and mobile learning technologies, allows students to access relevant information related to the cultural site they are visiting.

Keywords search engine; e-learning; mobile learning

1. Introduction

The aim of this paper is to present a technique which uses search engine technologies to personalize information for educational purposes. Recent studies have shown that many users are now accustomed to using a search engine to retrieve information of interest [1]. One objective of our study is to design a specialized search engine that searches for information on a specific topic. Even though generic search engines are the most widely used, they present some limitations when carrying out a task in a particular category. A search engine which is specialized in a particular topic usually produces better quality results than a general one. Moreover, the pages indexed by our search engine concern a particular domain, which greatly reduces the interference from irrelevant results that occurs with a general search engine and increases the quality of the search. The quality is further increased by the presence of a specific taxonomy that allows the system to carry out finer clustering of the information. In this way, and according to [2], the user can specialize the query by choosing a category of the designed taxonomy.

In this paper we present a framework that allows users to create a specialized search engine and its related taxonomy in an iterative and incremental way. The framework uses a traditional search engine, enhancing it with features that allow users to create a taxonomy, define criteria for document classification and tune the indexing results continuously. In addition, according to [3, 4], we provide an example where we have applied the specialized search engine. In particular, we have used the techniques in a mobile learning framework that combines wireless access via PDA with localization technologies, in order to offer a number of useful services and to access Web information related to the place and time of the observation. Finally, some findings in the learning context are also reported.

2. Specialized Search Engines

The amount of information available on the Internet is growing rapidly every day. The success of the web is strictly related to the availability of search engines that allow users to perform searches on this enormous data source. Nowadays, search engines are very popular, and students, too, are using them to retrieve relevant learning materials. However, even though generic search engines are the most commonly used, they present some limitations; for example, every search engine indexes only a small portion of the documents available on the Internet. Moreover, the results themselves are limited, because a general search engine often returns irrelevant documents. A search engine which is specialized in a particular topic usually produces better quality results than a general one. This happens for several reasons: first of all, a specialized search engine has a smaller and more manageable index because fewer pages are indexed, and consequently it can crawl the pages more frequently [2].

Searches performed on the Internet can be divided into two main categories: navigational search and informational search [5]. In a navigational search the user is looking for a page that they know exists, for example the home page of a university or of an organization in general. In this case, search engines are used to navigate to and reach the page. In an informational search, instead, the user is looking for information about a specific topic, and generally the information is not contained in a single page but is distributed over several documents. In this case, the search is very wide and the keywords provided by the student are more general than in a navigational search. As a result, search engines are used to obtain a set of links about the topic of interest.

To increase the quality of the results obtained, search engines can organize the indexed documents in a taxonomy structure. In this way, students can browse the taxonomy tree to find the relevant information, which can be useful in both search categories. For example, by using the taxonomy structure in an informational search, students can directly access a set of documents regarding a specific topic of interest.

* Corresponding author: Marco Arrigo, e-mail: [email protected], Phone: +39 0916809206

© FORMATEX 2005

3. The framework

3.1 Taxonomy Builder

Modern search engines use a taxonomic structure, such as a hierarchical topic catalog, in order to organize documents into related areas. Topic tagging improves the search experience in many ways; for example, it can be used to support hierarchical visualization and browsing aids. Moreover, a taxonomy is very often used together with a specialized search engine; in this way, the search engine only indexes the information sources which are closest to the defined taxonomy.

A typical approach to the classification of documents is to provide a set of training documents; this technique is called supervised learning. Supervised learning has been intensively studied for several decades in AI, machine learning, and pattern recognition, and, of late, in data warehousing and data mining. In supervised learning, the classifier first receives training data in which each item (a Web page, in our framework) is marked with a label or class from a hierarchical topic catalog. Once the classifier is trained, it is given unlabeled crawled documents and has to deduce the appropriate category.

In order to apply this technique we need to perform the following steps: define the taxonomy for a specific topic, choose the right documents for the training set and select the information sources to index. In carrying out these steps we have preferred to adopt an iterative and incremental approach, as we consider it more suitable than a sequential process: it allows finer control and produces better results.

In this paper we present a framework to create a specialized search engine and its related taxonomy. The tool communicates with a traditional search engine, and uses a web interface to allow users to manage all the phases of the creation of the specialized search engine and its taxonomy. The aim of this framework is not to propose new algorithms for the classifier, but to help users control the definition of the taxonomy and the supervised learning method through the same web interface.
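To make the iterative and incremental approach more concrete, the following is a minimal sketch of one possible refinement loop: train on the labeled documents, classify the crawled ones, let a reviewer correct the labels, and fold the corrections back into the training set. It is not the authors' implementation. The function names, the `review` callback (standing in for the user acting through the web interface) and the toy term-overlap classifier are all assumptions made for illustration, and the sketch assumes at least one labeled document is supplied.

```python
from collections import Counter, defaultdict


def train_term_counts(labeled):
    """Toy trainer (assumption): count term frequencies per category."""
    model = defaultdict(Counter)
    for text, category in labeled:
        model[category].update(text.lower().split())
    return model


def classify_by_overlap(text, model):
    """Toy classifier (assumption): pick the category whose training
    terms overlap most with the words of the document."""
    words = text.lower().split()
    return max(model, key=lambda cat: sum(model[cat][w] for w in words))


def refine(labeled, crawled, review, max_rounds=5):
    """Iterate: train, classify, collect user corrections, retrain.

    `review(text, guess)` is a hypothetical callback that returns the
    right category, or None if the reviewer accepts the guess.
    Returns the final predictions and the enlarged training set.
    """
    labeled = list(labeled)  # work on a copy of the training set
    predictions = {}
    for _ in range(max_rounds):
        model = train_term_counts(labeled)
        predictions = {text: classify_by_overlap(text, model) for text in crawled}
        corrections = []
        for text, guess in list(predictions.items()):
            right = review(text, guess)
            if right is not None and right != guess:
                corrections.append((text, right))
                predictions[text] = right  # the user's decision wins
        if not corrections:
            break  # the reviewer accepted every label
        labeled.extend(corrections)  # corrections become training examples
    return predictions, labeled
```

Each round stops as soon as the reviewer accepts every label, so the training set only grows with documents that the classifier actually got wrong.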


m-ICTE2005 http://www.formatex.org/micte2005


In our framework we propose different types of classifiers, starting from the growing number of statistical learning methods which have been applied to this problem in recent years, including regression models [6,7], nearest neighbour classifiers [8], Bayes belief networks [9,10], decision trees [6,10,11], rule learning algorithms [12,13], neural networks [13] and inductive learning techniques [14]. The user can verify the elements classified in a specific category and can validate them by navigating the results of a search. In particular, through the classic web search engine interface, the user can:

- insert/move/rename a category in the taxonomy tree
- insert the document as a representative example/counter-example for a specific topic
- analyze each result and decide if the document is in the correct category, and if not, move it to another existing category or create a new one
- decide whether a document is a non-indexable document for the specialized search engine
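The user operations listed above suggest a small tree data structure. The following sketch shows one way such a taxonomy could be represented in memory. It is an illustration only: the class names, method names and the use of URLs as document identifiers are assumptions, not part of the framework described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node of the taxonomy tree (hypothetical structure)."""
    name: str
    children: dict = field(default_factory=dict)        # child name -> Category
    documents: set = field(default_factory=set)         # URLs classified here
    examples: set = field(default_factory=set)          # representative examples
    counter_examples: set = field(default_factory=set)  # counter-examples


class Taxonomy:
    def __init__(self, root_name):
        self.root = Category(root_name)
        self.excluded = set()  # URLs flagged as non-indexable

    def find(self, path):
        """Return the node at a path such as ("Art History", "Baroque")."""
        path = tuple(path)
        if not path or path[0] != self.root.name:
            raise KeyError(path)
        node = self.root
        for name in path[1:]:
            node = node.children[name]
        return node

    def insert_category(self, parent_path, name):
        parent = self.find(parent_path)
        if name in parent.children:
            raise ValueError(f"category already exists: {name}")
        parent.children[name] = Category(name)

    def rename_category(self, path, new_name):
        path = tuple(path)
        parent = self.find(path[:-1])  # the root itself cannot be renamed
        if new_name in parent.children:
            raise ValueError(f"category already exists: {new_name}")
        node = parent.children.pop(path[-1])
        node.name = new_name
        parent.children[new_name] = node

    def move_category(self, path, new_parent_path):
        path, new_parent_path = tuple(path), tuple(new_parent_path)
        if new_parent_path[:len(path)] == path:
            raise ValueError("cannot move a category into its own subtree")
        new_parent = self.find(new_parent_path)
        if path[-1] in new_parent.children:
            raise ValueError(f"category already exists: {path[-1]}")
        node = self.find(path[:-1]).children.pop(path[-1])
        new_parent.children[node.name] = node

    def move_document(self, url, from_path, to_path):
        target = self.find(to_path)
        self.find(from_path).documents.discard(url)
        target.documents.add(url)

    def mark(self, url, path, counter_example=False):
        """Record a document as an example or counter-example for a category."""
        node = self.find(path)
        (node.counter_examples if counter_example else node.examples).add(url)

    def exclude(self, url):
        """Flag a document as non-indexable and drop it from every category."""
        self.excluded.add(url)
        stack = [self.root]
        while stack:
            node = stack.pop()
            node.documents.discard(url)
            stack.extend(node.children.values())
```

Addressing categories by their full path from the root keeps the operations unambiguous when two branches contain categories with the same name.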

Fig. 1 The Taxonomy Builder Architecture

In Figure 1 we present the architecture of the taxonomy builder. The framework adds a specific pair of trainer/classifier modules to a standard search engine architecture. The aim of these modules is to deduce the appropriate category of the crawled documents. To explain this architecture we consider the case of the nearest neighbour (NN) classifier. Starting from the training set documents, the NN trainer creates a representation of each category in the taxonomy tree. Then, in the pipeline, the NN classifier measures the similarity of each crawled document to the model of each category and labels the document with the categories nearest to it.

In Figure 2 we present a screenshot of the taxonomy builder web interface. On the left hand side is the taxonomy tree, which can be used to navigate through the taxonomy categories. When the user clicks on a category of the taxonomy tree, the taxonomy builder shows the documents belonging to that category. For each document the title, a brief summary of the contents, the complete URL and the size are shown. Moreover, there are three links for performing important operations on the document. These links are marked with an ellipse in Figure 2. Using the first of these, the “Move to…” link, the user can move the document into another taxonomy category. Generally, this happens when the user finds that the document was wrongly categorized by the search engine. The Counter-example link allows the user to mark the document as a counter-example for the selected category. The last operation, performed using the Exclude link, excludes the document from the search engine index. The taxonomy shown in this figure is about art history and is based on the dmoz open directory.
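The nearest neighbour trainer/classifier pair described for Figure 1 can be sketched as follows. The paper does not say how a category is represented or how similarity is measured, so this sketch assumes one centroid vector of term counts per category and cosine similarity; the function names are hypothetical as well.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts; a deliberately simple stand-in for indexing."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(training_set):
    """Trainer: build one centroid per category from (text, category) pairs."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, category in training_set:
        sums[category].update(vectorize(text))
        counts[category] += 1
    return {
        category: {term: w / counts[category] for term, w in vector.items()}
        for category, vector in sums.items()
    }


def classify(text, model, top_k=1):
    """Classifier: return the top_k categories most similar to the document."""
    vector = vectorize(text)
    ranked = sorted(model, key=lambda cat: cosine(vector, model[cat]), reverse=True)
    return ranked[:top_k]
```

Setting `top_k` above 1 mirrors the description of labelling a document with more than one nearby category; a real system would more likely weight terms (for example with TF-IDF) than use raw counts.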


Fig. 2 A screenshot of the Taxonomy Builder Interface

3.2 Application Example

In this section we present an application that we have developed using the search engine technique described above to support students’ learning activities during a visit to historical buildings. This project also uses technologies related to mobile devices and the Global Positioning System. Although a number of mobile learning applications for visiting museums and archaeological sites have been proposed in the last few years, these generally require the contents to be prepared and structured in advance. Moreover, the information must be kept up to date, for example with new historical sources or the latest news about opening times and events; otherwise the quality of the service deteriorates. Up-to-date information can be retrieved by the student using a search engine to access the Internet with a PDA, but the use of a traditional search engine brings students up against the limitations described in section 2.

In our study we propose the use of a specialized search engine that allows access to updated information sources and the acquisition of a set of semi-structured data. The search engine we have designed selects only documents from a set of validated art information sources on the web and can be used through a set of predefined specialized queries or free-text Internet searching. Searching modes are based on the location and on the taxonomy designed for document classification. Furthermore, according to [2], we improved the quality of the search by using a query specialization procedure that extends the user queries with a category of the designed taxonomy. Moreover, bearing in mind the limited hardware resources of mobile devices, we have developed an easy-to-use interface that enables users to surf the Internet and to obtain the information they are really looking for after only a few interactions.
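As an illustration of the query specialization procedure mentioned above, the following sketch extends a free-text query with a taxonomy category and filters a small in-memory index accordingly. The `category:"..."` field syntax, the function names, the document records and the example.org URLs are all hypothetical; the paper does not describe the query format actually used.

```python
def specialize_query(user_query, category_path):
    """Extend free-text terms with a taxonomy category restriction.

    The category:"..." field syntax is an assumption for illustration only.
    """
    terms = " ".join(user_query.split())  # normalize whitespace
    if not category_path:
        return terms
    label = "/".join(category_path)
    return f'{terms} category:"{label}"'.strip()


def search(index, user_query, category_path=()):
    """Return URLs of documents that contain every query term and are
    classified at or below the chosen taxonomy category."""
    terms = user_query.lower().split()
    prefix = tuple(category_path)
    hits = []
    for doc in index:
        in_category = tuple(doc["category"][:len(prefix)]) == prefix
        words = set(doc["text"].lower().split())
        if in_category and all(term in words for term in terms):
            hits.append(doc["url"])
    return hits


# Hypothetical index entries; in the application these would come from
# the validated art information sources.
INDEX = [
    {"url": "http://example.org/cathedral",
     "category": ("Art History", "Architecture"),
     "text": "cathedral opening times and guided tours"},
    {"url": "http://example.org/museum",
     "category": ("Art History", "Painting"),
     "text": "museum opening times and current exhibitions"},
]
```

For example, `search(INDEX, "opening times", ("Art History", "Architecture"))` keeps only the cathedral page, while the same query without a category matches both entries.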

4. Conclusion and ongoing work

Our study argues that a specialized search engine is to be preferred in a learning environment for retrieving information about a specific topic. Using a specialized search engine, students can access the information they require without encountering the interference from irrelevant results that is typical of a general search engine. Moreover, it is important for the search engine to provide a taxonomic structure that allows students to access the relevant categories directly. At present, different techniques allow documents to be associated with a category of the taxonomy. In our framework we propose different types of classifiers, starting from the growing number of statistical learning methods. In our ongoing work we would like to compare different classification techniques in order to perform this task in a more efficient and effective way.

References

[1] iProspect, Search engine user attitudes survey, on-line at http://www.iprospect.com/about/free-seminformation.htm, (2004).
[2] R. Steele, Techniques for Specialized Search Engines, Proceedings of Internet Computing '01, Las Vegas, USA, (2001).
[3] M. Arrigo, M. Gentile, D. Taibi, Mobile, location and search technologies in the visiting experience, Proceedings of the first International Conference on Telecommunications and Network Computers, San Sebastian, Spain, December 1-3, (2004).
[4] M. Arrigo, M. Gentile, D. Taibi, The use of search engine technologies to enhance an on-site learning experience, accepted to The 4th IASTED International Conference on WEB-BASED EDUCATION, Grindelwald, Switzerland, February 21-23, (2005).
[5] A. Broder, A Taxonomy of Web Search. SIGIR Forum 36, 2, (2002).
[6] N. Fuhr, S. Hartmann, G. Lustig, M. Schwantner, K. Tzeras, Air/x – a rule based multistage indexing system for large subject fields. Proceedings of RIAO’91, pp. 602-623, (1991).
[7] Y. Yang, C.G. Chute, A linear least squares fit mapping method for information retrieval from natural language texts. Proceedings of the 14th International Conference on Computational Linguistics (COLING 92), pp. 447-453, (1992).
[8] Y. Yang, C.G. Chute, An example-based mapping method for text categorization and retrieval. Proceedings of ACM Transactions on Information Systems (TOIS), pp. 253-277, (1994).
[9] K. Tzeras, S. Hartmann, Automatic indexing based on bayesian inference networks. Proceedings of 16th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'93), pp. 22-34, (1993).
[10] D.D. Lewis, M. Ringuette, Comparison of two learning algorithms for text categorization. Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), (1994).
[11] I. Moulinier, Is learning bias an issue on the text categorization problem? Technical report, LAFORIA-LIP6, Universite Paris 17, (1997).
[12] I. Moulinier, G. Raskinis, J. Ganascia, Text categorization: a symbolic approach. Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval, (1996).
[13] E. Wiener, J.O. Pedersen, A.S. Weigend, A neural network approach to topic spotting. Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval (SDAIR'95), (1995).
[14] David D. Lewis, Robert F. Schapire, James P. Callan, Ron Papka, Training algorithms for linear text classifiers. Proceedings of the 19th Annual International ACM SIGIR '96: Conference on Research and Development in Information Retrieval, pp. 298-306, (1996).
