Making E-Learning Better Through Machine Learning

Paramjeet S. Saini 1, Diego Sona 2, Sriharsha Veeramachaneni 2, and Marco Ronchetti 1

1 Department of Information and Communication Technology, University of Trento, Italy.
2 ITC-irst, Automated Reasoning Systems Division, Trento, Italy.

Abstract

Due to the massive information overload on the Web it is hard to index and reuse existing learning resources. Classifying learning resources according to domain specific concept hierarchies could address the problem of indexing and reusability. Manual classification is a tedious task and, as a result, automatic classifiers are in high demand. For this task we present an automated approach, based on machine learning techniques, that exploits hierarchical knowledge in order to classify learning resources within a given hierarchy of concepts. We show experimentally that using hierarchical information and the content of unclassified documents provides better accuracy.

Keywords: unsupervised hierarchical classification, e-learning resource management, bootstrapping, document clustering, Expectation Maximization.

1 Introduction

It is well known that producing learning material is both time consuming and expensive. It is therefore important to be able to reuse existing learning resources on the Web. Moreover, students would benefit if they were able to retrieve learning materials suited to their needs, even when such materials are provided by external sources. Due to the massive information overload on the Web, the main problem here is to find and manage the relevant learning resources. We believe that hierarchical organization of learning resources is a method to address this problem.

Figure 1: A snapshot of the concept hierarchy derived from the ACM Computing Curricula 2001 (e.g., Computer Science > Programming Fundamentals > Recursion, with leaf keywords such as recursive mathematical functions, divide and conquer, recursive backtracking, implementation).

Hierarchical classification according to some predefined categories has proved to be very useful in the e-Learning domain, where any learning resource can be defined in terms of related concepts. To deliver clear concepts about any subject area it is necessary to find precise relationships between the documents belonging to the various concepts. This goal can be attained if the learning resources to be delivered are arranged according to a predefined concept hierarchy. However, building a domain specific concept hierarchy is a challenging task: to create a concept hierarchy spanning all the aspects of a learning domain we need an ontology describing the domain in an exhaustive way.

Our main aim is to generate a hierarchy of concepts related to the "computer science" domain, and to populate its leaves with e-learning material taken from the Web. This concept hierarchy was generated according to the subject areas covered in the ontology that we derived from the ACM Computing Curricula 2001 for Computer Science (for details refer to http://www.computer.org/education/cc2001) [1]. In this hierarchy there are three layers of knowledge, i.e., Areas, Units, and Topics. Each of these layers is composed of labelled nodes, and the leaf nodes of the hierarchy are additionally labelled with keywords describing the subtopic (see a snapshot in Fig. 1; an illustrative sketch of one possible representation is given at the end of this section).

Our main task is to organize the learning resources within the given hierarchy of concepts, allocating the documents to the leaf nodes. Hence the classification task needs to determine the most relevant leaf for any input document, exploiting the hierarchical information, i.e., the descriptions (labels) of a node and those of its parent. Since manual filtering and classification are time consuming and expensive, we need a mechanism to automate the classification process.

Hierarchical document classification has been addressed many times within the Information Retrieval and Machine Learning communities (see for example [2, 3, 4, 5, 6]). All the proposed models, however, are based on supervised learning strategies, where classifiers are trained with a set of labelled training data. On the other hand, our task is to classify documents given a hierarchy of concepts built from scratch, without any example of classified documents. This is the so-called 'unsupervised' setting, i.e., only unlabelled objects are available. We can however use the knowledge about the classes, which is provided in terms of the class labels and the concept hierarchy. We refer to this task as "bootstrapping" [7]. We propose a convenient initialization of a simple Naïve Bayes classifier which takes into account the knowledge related to the keywords describing the classes and to the hierarchy. The generated classifier is then used for a preliminary categorization of documents. To improve the quality of the results we also propose a variation of the widely used Expectation Maximization (EM) algorithm that learns the classifier parameters using the unclassified data as well.

The paper is structured as follows: Section 2 describes our approach to the "unsupervised" classification, Section 3 describes the datasets used to test the models and the results of their evaluation. Finally, in Section 4 we conclude and give some directions for future work.
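To make the setting concrete, the following Python sketch (not part of the original paper; only an assumption about one possible representation) shows how a concept hierarchy with labelled nodes and keyword-bearing leaves could be encoded. The example nodes are taken from the snapshot in Fig. 1.

class ConceptNode:
    """One node of the concept hierarchy (Area, Unit, or Topic)."""
    def __init__(self, label, keywords=None, parent=None):
        self.label = label                # node description, e.g. "Recursion"
        self.keywords = keywords or []    # extra keywords, used at leaf (Topic) nodes
        self.parent = parent              # parent link, needed for "node+parent" labels
        self.children = []
        self.documents = []               # learning resources classified into this leaf

    def add_child(self, label, keywords=None):
        child = ConceptNode(label, keywords, parent=self)
        self.children.append(child)
        return child

    def label_words(self, include_parent=False):
        """Words describing this class, optionally extended with the parent's labels."""
        words = set(self.label.lower().split())
        words |= {w for k in self.keywords for w in k.lower().split()}
        if include_parent and self.parent is not None:
            words |= set(self.parent.label.lower().split())
        return words

# Example layers (Area -> Unit -> Topic), following Fig. 1
root = ConceptNode("Computer Science")
pf = root.add_child("Programming Fundamentals")
recursion = pf.add_child("Recursion",
                         keywords=["recursive mathematical functions",
                                   "divide and conquer",
                                   "recursive backtracking"])

The word sets produced by label_words are the kind of class descriptions that the keyword-initialized classifier of Section 2 starts from.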

2 Approach to Classification

The Naïve Bayes (NB) classifier (see [8, 9]) is a simple probabilistic approach used to classify objects into some predefined category. Usually the parameters of the classifier are learned from a set of classified examples, and then they are used to classify new objects. The classification is based on the probability of a given document d belonging to a class c:

    P(c|d) = P(c) \prod_{w \in d} P(w|c),    (1)

where P(w|c) is the class-conditioned word probability and P(c) is the class prior probability. Usually, these two probabilities are estimated from the statistics of a dataset of classified examples. In our case, however, there are no such examples. Therefore we initialize the parameters using the prior knowledge (i.e., the descriptions of the nodes). In particular, P(c) is set to 1/|C| and P(w|c) is taken as 0.9 if w is found as a class label, 0.1 otherwise.

This approach provides only minimal benefit in the classification process, for two reasons. Firstly, the classification can result in high rejection: since the classifier only uses the labels at the nodes, it can be difficult to disambiguate between different classes for many documents. Secondly, the accuracy is low because the small number of keywords at the nodes gives insufficient information.

The structural knowledge can also be used, by initializing the probabilities P(w|c) of a node using the labels at its parent node as well. In this case the ambiguity between classes lying in different sub-trees is reduced, so the remaining errors tend to be confined to classes within the same sub-tree.

Clearly, the Naïve Bayes approach only uses part of the knowledge we have. The remaining knowledge is contained in the unclassified documents. The EM algorithm has been adopted in order to also exploit this knowledge. In particular, the EM algorithm uses the classification results on the entire data set at one iteration to re-estimate the classifier parameters for the classification at the following iteration. The EM algorithm is initialized with the results of the Naïve Bayes classifier and then it iterates, re-classifying documents and learning new parameters (see Fig. 2).
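As a minimal sketch (not the authors' code), and assuming a simple bag-of-words document representation, the keyword-based initialization and the classification rule of Eq. (1) could look as follows; the function names are hypothetical.

import math

def init_keyword_nb(class_labels):
    """Initialize Naive Bayes parameters from the class descriptions alone.
    class_labels: dict mapping class name -> set of label/keyword words
    (optionally already extended with the parent's label words).
    Following the initialization above, P(c) = 1/|C| and
    P(w|c) = 0.9 if w appears among the class labels, 0.1 otherwise."""
    n = len(class_labels)
    prior = {c: 1.0 / n for c in class_labels}

    def word_prob(w, c):
        return 0.9 if w in class_labels[c] else 0.1

    return prior, word_prob

def classify(doc_words, prior, word_prob):
    """Return the class maximizing P(c) * prod_{w in d} P(w|c), computed in log space."""
    best_class, best_score = None, float("-inf")
    for c in prior:
        score = math.log(prior[c]) + sum(math.log(word_prob(w, c)) for w in doc_words)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

In the "node+parent" setting, the word set of each class is simply extended with the label words of its parent node before calling init_keyword_nb.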

• Initialize the classifier parameters with the NB parameters;
• EM training:
  – E-step: classify the documents;
  – M-step: compute new parameters using the classified documents;
• Iterate E and M steps until the classifier converges.
• Output: a NB classifier trained with unclassified data.

Figure 2: The EM algorithm starts from the initialized NB parameters and then iterates, learning from the data.

Labelling the entire set of documents with the outlined initialization of NB and then running EM exploits both the taxonomic knowledge and the content of the unlabelled documents, and improves the classification accuracy.
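A possible reading of the loop in Fig. 2, written as a hard-assignment EM sketch; the add-one (Laplace) smoothing in the M-step is an assumption, not a detail taken from the paper.

import math
from collections import Counter, defaultdict

def em_bootstrap(docs, classes, initial_classifier, vocab_size, max_iter=20):
    """docs: list of word lists; classes: list of class names;
    initial_classifier: function mapping a document to a class
    (e.g. the keyword-initialized NB sketched above)."""
    assignment = [initial_classifier(d) for d in docs]        # labelling from the NB initialization
    for _ in range(max_iter):
        # M-step: re-estimate P(c) and P(w|c) from the current hard labels.
        counts = defaultdict(Counter)
        sizes = Counter(assignment)
        for d, c in zip(docs, assignment):
            counts[c].update(d)
        prior = {c: (sizes[c] + 1) / (len(docs) + len(classes)) for c in classes}

        def word_prob(w, c):
            return (counts[c][w] + 1) / (sum(counts[c].values()) + vocab_size)

        # E-step: re-classify every document with the updated parameters.
        new_assignment = [
            max(classes, key=lambda c: math.log(prior[c])
                + sum(math.log(word_prob(w, c)) for w in d))
            for d in docs
        ]
        if new_assignment == assignment:                       # stop when the labels no longer change
            break
        assignment = new_assignment
    return assignment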

3 Experimental Evaluation

To evaluate the model in a real world scenario, we used a benchmark dataset for document classification on hierarchies (see [10]), modified according to the task we are addressing. This dataset is made up of a set of taxonomies selected from two well known Web directories (Google and LookSmart). The selected taxonomies were shrunk, taking only branches of depth three. The documents in the subtree of each branch were moved to the corresponding leaf of the branch, and the documents classified into internal categories were removed. Each concept hierarchy with the corresponding set of documents comes with a vocabulary (feature set), derived by removing stop-words, stemming the keywords to common roots, and performing a feature selection based on the notion of Shannon entropy. Documents were then represented as bags-of-words (words with frequencies). For a detailed description of the dataset and its preprocessing refer to [10].

The evaluation of the proposed model was done using the Micro-F1 measure [11]. This is a standard information retrieval measure that combines precision (i.e., the ratio between the number of correctly classified documents and the number of classified documents) and recall (i.e., the ratio between the number of correctly classified documents and the number of documents that should be classified). The reason for this choice is that we were interested in evaluating the model both for its classification accuracy and for its ability to reduce the ambiguity when classifying (rejection); a small computational sketch of this measure is given after Table 1.

3.1 Experimental Results

Experimental evaluation reveals that higher accuracy can be attained by the proposed model when hierarchical knowledge (the parent's labels) is taken into account (see Table 1). Using the labels of the parent together with the local labels reduces the classification error in the initialization phase, and this error can be further reduced by the iterative EM clustering.

                    Docs   Classes   Node labels        Node+Parent labels
                                     NB      NB+EM      NB      NB+EM
Neural disorder     1294   94        47.2%   42.2%      58.3%   50.9%
Archeology           428   12        28.0%   63.0%      29.2%   63.3%
Peripherals         3904   35        66.9%   64.7%      66.1%   66.0%
Common language     1792   23        65.0%   73.8%      66.7%   77.8%
Zoology             2340   26        29.3%   42.6%      30.8%   45.8%
Movies               517   26        36.7%   32.1%      38.4%   39.8%
Average                              45.5%   53.1%      48.3%   57.3%

Table 1: Evaluation of the classifiers using the Micro-F1 measure. The first two columns give the dataset statistics. The "Node labels" and "Node+Parent labels" column groups show the results when NB and EM are initialized with the node labels only or with the node and parent labels, respectively.
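For reference, a minimal sketch (an assumption, not the paper's evaluation code) of how a Micro-F1 value like those in Table 1 could be computed, following the precision and recall definitions given above and allowing an explicit rejection option:

def micro_f1(predictions, gold):
    """predictions: dict doc_id -> predicted class, or None when the classifier rejects;
    gold: dict doc_id -> true class.
    Precision = correct / classified; recall = correct / total documents."""
    classified = [d for d, c in predictions.items() if c is not None]
    correct = sum(1 for d in classified if predictions[d] == gold[d])
    precision = correct / len(classified) if classified else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)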

More precisely, there are many documents that do not contain the labels of the correct leaf category but do contain the labels of the corresponding parent category. There are also documents that contain the labels of two leaves, making it difficult to discriminate between the two classes; here too, using the labels of the parent can help. It is interesting to see that, in almost every case, NB initialized with both category and parent labels outperforms NB initialized with the node labels only. The same result holds for EM initialized with the corresponding NB results. This can be explained by the fact that, by using the labels of the parent during the initialization of the EM algorithm, we increase the probability of classifying the documents in the correct "side" of the tree, even if into the wrong leaf category. In this way, the probability of correcting the class label of these documents during the iterative classification is higher than when the documents are classified into a wrong leaf category whose parent differs from that of the correct leaf. Finally, although it is not clearly evident on the individual datasets whether it is better to use NB or EM, the average results show that EM is better than NB. The reason for EM outperforming NB is that it finds improved parameters using both the nodes' descriptions (labels) and the content of the documents. That is, aggregating the data according to their similarity guarantees a reduced standard deviation in the resulting model.

4 Conclusion and Future Work

We have presented an approach to the automatic classification of web documents into a hierarchy of concepts based on unsupervised learning. The proposed bootstrapping algorithm uses Expectation Maximization to correct the class labels generated by the Naïve Bayes classifier, which is initialized with the node labels. We also examined how hierarchical knowledge can improve the classifiers' accuracy. Experimental results show that the classification accuracy improves when knowledge from the parent (i.e., information provided by the hierarchy) is used.

In the future we want to investigate the extension of the model presented in [12], based on the 'shrinkage' technique. Our algorithms presently classify documents onto the leaf nodes only. We intend to explore the possibility of extending this model to classify documents onto internal nodes as well. In this scenario, we could explore more relations among concept nodes.

References

[1] Ronchetti, M. & Saini, P., Ontology-based metadata for e-learning in the computer science domain. IADIS e-Society Conference, 2003.
[2] Ceci, M. & Malerba, D., Web-pages classification into a hierarchy of categories. The 25th European Conference on Information Retrieval, 2003.
[3] Sun, A. & Lim, E.P., Hierarchical text classification and evaluation. ICDM, IEEE International Conference on Data Mining, 2001.
[4] Cheng, C., Tang, J., Wai-chee, A. & King, I., Hierarchical classification of documents with error control. Proceedings of PAKDD-01, 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2001.
[5] Ruiz, M. & Srinivasan, P., Hierarchical text categorization using neural networks. Information Retrieval, pp. 87–118, 2002.
[6] Koller, D. & Sahami, M., Hierarchically classifying documents using very few words. In D. Fisher, editor, ICML 1997, Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[7] Adami, G., Avesani, P. & Sona, D., Bootstrapping for hierarchical document classification. Proceedings of CIKM-03, 12th ACM International Conference on Information and Knowledge Management, ACM Press, New York, US, pp. 295–302, 2003.
[8] Mitchell, T., Machine Learning. McGraw Hill, 1997.
[9] Lewis, D.D., Naive (Bayes) at forty: The independence assumption in information retrieval. 10th European Conference on Machine Learning, 1998.
[10] Avesani, P., Girardi, C., Polettini, N. & Sona, D., TaxE: a testbed for hierarchical document classifiers. Technical Report T04-04-02, ITC-irst, 2004.
[11] Baeza-Yates, R. & Ribeiro-Neto, B., Modern Information Retrieval. Addison Wesley, 1999.
[12] McCallum, A. & Nigam, K., Text classification by bootstrapping with keywords, EM and shrinkage. In ACL Workshop for Unsupervised Learning in NLP, 1999.