Combining classifiers for harmful document filtering

Bruno GRILHERES¹,² & Stephan BRUNESSAUX¹ & Philippe LERAY²

¹ EADS DCS / IPDF, Parc d'affaire des Portes, 27104 Val de Reuil
{bruno.grilheres, stephan.brunessaux}@sysde.eads.net
² PSI Rouen / CNRS, Place Emile Blondel, 76130 Mont Saint Aignan
[email protected]

Abstract

In this paper, we describe the experiments carried out during the European research project NetProtect II, which aims at filtering harmful Web pages in order to protect children. These experiments focus on the combination of classifiers (relying on texts, images and addresses) dealing with heterogeneous classes (bomb-making, drugs, pornography, violence) for multimedia documents (composed of both semi-structured text and images). We test and compare different combination methods (voting methods, logical methods, k Nearest Neighbors, evidence-based k Nearest Neighbors, Naive Bayes, Artificial Neural Networks and Support Vector Machines) on a database of five thousand web pages. We show how learning-based methods, combined with the introduction of a priori knowledge about the classifiers, achieve better filtering performance than classical approaches (such as static black/white lists and single classifiers).
Introduction

The size of the Internet has grown considerably over the past years, and teenagers and children have ever easier access to the Web (HJKFF, 2002)¹. The problem of access to harmful documents soon appeared, followed by filtering systems designed to secure Internet access. Most of these systems rely on simple static lists of blocked web sites, filled in manually. Such methods are not suitable for volatile Web sites. Recently, automatic text and image classification have been applied to this domain, but only a few approaches (Denoyer et al., 2003) have explored the combination of text classification, image classification and static methods to detect harmful content. This is what is studied in this paper.

In this paper, we show how the combination of classifiers can increase the filtering efficiency on harmful web pages. We describe the experiments carried out during the European NetProtect II project (2003) within EADS DCS, in the framework of the Safer Internet Action Plan (2003). This project aims at providing an efficient filtering solution by combining multimedia classifiers on various tasks. The scope of the project is the detection of dangerous documents (drug consumption or sales, bomb-making) and shocking documents (pornography and violence) in 8 languages (Dutch, English, French, German, Greek, Italian, Portuguese, Spanish). Firstly, we present the different classifiers used in NetProtect II and the database that was used to train and test them. Then, we compare different combination methods known to be efficient (Bennett et al., 2002): voting strategies, rule-based methods and probabilistic methods. Finally, we show how learning the parameters of the combination function, together with a priori knowledge on the classifier features (classes and media treated), improves the harmful web page filtering performance.
¹ According to a survey (N2H2, 2003), the number of pornographic web pages on the Internet has been multiplied by 18 over the past 5 years. Moreover, 70% of teenagers (15-17) have already accessed harmful websites. Similarly, the number of violence-, drug- or bomb-related websites is increasing on the Web (ERCA, 2003).
Description of the individual classifiers

In this paper, we deal with the problem of combining heterogeneous classifiers (various classes, media and languages) developed in the NetProtect project. This section first describes the existing database that was used to train and test the individual classifiers, and then the main characteristics of these classifiers.

The NetProtect Database

For our tests, we used the NetProtect II database. This base is composed of more than 21000 dumped web pages retrieved during the project. For our preliminary tests, we kept 5029 Web pages in French and English (see Table 1). This choice was made because of the processing time needed to get the result of each classifier, and because our 7 classifiers (see section "Classifiers used in NetProtect II") were all able to deal with both French and English.

Language       B    ¬B   D    ¬D   P    ¬P   V    ¬V   E    G    Total
English (En)   40   182  145  203  393  332  190  110  453  563  2611
French (Fr)    113  223  205  183  292  155  187  150  440  470  2418

Table 1: NetProtect II database content

The database is filled as follows:
- Bomb-making related pages (B): pages dealing with home-made bomb or explosive recipes.
- Bomb counter-examples (¬B): pages relating to the historical use of bombs.
- Drug (D): websites promoting drug consumption or online drug sales.
- Drug counter-examples (¬D): sites describing the negative effects of drug consumption, and websites of fight-against-drugs associations.
- Pornography (P): sexually explicit websites.
- Pornography counter-examples (¬P): sites dealing with sexuality (AIDS, STDs) or selling underwear.
- Violence (V): crime, suicide, torture and violent websites.
- Violence counter-examples (¬V): sites describing violent phenomena and how to avoid them.
- Generic websites (G): generic websites on sport, finance, movies, news.
- Child-oriented websites (E): websites for children (games, cartoons, books and activities for children aged 3 to 15).
In this work, we randomly divided the database into three parts: 40% is used for training the single classifiers, 40% for learning the parameters of the combination function, and 20% for testing the combination methods. NB: this database was used by Denoyer et al. (2003) for classification tests on semi-structured documents (but only on the pornography category).

Classifiers used in NetProtect II

We describe in this paragraph the classifiers used in the NetProtect II project. It must be stressed that the choice of the classifiers was made in a previous project (NetProtect, 2001): the aim of this article is to describe how to combine them optimally, not how to make each of them better. For these tests, we consider two languages, French and English, and the four categories, i.e. classes (bomb-making B, drug D, pornography P, violence V). The prerequisite of the NetProtect II project was to use the following classifiers: 5 text classifiers (one implemented during a previous project), an address classifier and an image classifier (cf. Table 2). A description of text classification and image classification can be found at the end of this paper (in the Appendices). We describe in the following the particularities of each classifier:
- The text classifier (TEX) coming from NetProtect is an Artificial Neural Network (Weiner, 1995). It deals with the pornography category only, in French and English. This classifier was trained on a database other than the NetProtect II database. It returns a boolean result for each document (pornographic document or not).
- The four other text classifiers (SVMi) are two-class Support Vector Machines (Vapnik, 1995; Joachims, 1998). Each SVM is trained on Web pages relative to one category (bomb, drug, pornography, violence) in the two considered languages, using to-be-blocked pages, ambiguous pages and generic pages. The pretreatment consists in transforming web pages into raw text (no structural information is kept) and selecting the n most pertinent words (the n words with the strongest TFIDF per document). Short words (fewer than three characters) and long words (more than 20 characters) are also removed. We do not use an anti-dictionary, so as to avoid a language-recognition step (this decreases the performance only marginally). Each of these 4 classifiers treats one of the four categories in the two languages. Two-class SVM classifiers were chosen so that categories can be added over time without retraining everything; as a consequence, multi-class SVMs were not considered. Each Support Vector Machine returns a score, which is the probability of belonging to the considered category (a minimal sketch of such a classifier is given after Table 2).
- The address classifier (ADR) is based either on a regular-expression search on the name/address of the document (for instance: *sex*) or on a comparison to a static black list. This classifier was trained on a database other than the NetProtect II database. It deals with the four categories in the two considered languages and returns a boolean result (document blocked or not).
All these classifiers (except the address classifier) are supervised machine-learning classifiers: that is to say, they are trained on a learning set of samples (images or texts).
- The image classifier (IMG), developed by one of our partners, is also a machine-learning system. The documentation of the system mentions the use of a vectorial model (the features are shape, colors (skin-color detection) and texture). This classifier was trained on an image database by our partner. It is only able to detect pornographic documents and returns a boolean result (page pornographic or not).
        B   D   P   V   Boolean  Score
TEX             X       X
SVM1    X                        X
SVM2        X                    X
SVM3            X                X
SVM4                X            X
ADR     X   X   X   X   X
IMG             X       X
Table 2: Functionalities of the classifiers integrated in NetProtect II

The table above shows that all classifiers have specificities both in terms of output (either boolean = page harmful or not, or score = probability for a given page to be harmful) and in the classes/categories they are able to deal with (some of them treat only pornography, others all the considered categories). So, the problem consists in combining these specific classifiers (each able to deal with only some categories or types of media) to get the best filtering performance over the 4 considered categories.
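To make the SVMi setup concrete, the following is a minimal sketch of how such a per-category two-class text classifier could be built with scikit-learn. This is not the implementation used in the project: the names are illustrative, and scikit-learn's max_features selects terms by corpus frequency rather than by per-document TFIDF, so it only approximates the feature selection described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def make_category_classifier():
    """Two-class text classifier for one category (e.g. bomb-making):
    raw text -> TF-IDF over words of 3 to 20 characters, no stopword
    list (so no language-identification step is needed), and an SVM
    that outputs a class-membership probability, as the SVMi do."""
    vectorizer = TfidfVectorizer(
        token_pattern=r"(?u)\b\w{3,20}\b",  # drop very short/long words
        max_features=5000,                  # keep only the top terms
    )
    svm = SVC(kernel="linear", probability=True)  # score = P(to-block)
    return make_pipeline(vectorizer, svm)

# Hypothetical usage on one category's training split:
# clf = make_category_classifier()
# clf.fit(train_texts, train_labels)      # labels: 1 = to-block, 0 = other
# scores = clf.predict_proba(page_texts)[:, 1]
```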
Classifier Combination

The following describes existing approaches to classifier combination, existing comparisons of methods, and the methods we evaluate in this work.

Previous work on classifier combination

Several types of approaches can be followed (see Figure 1); indeed, the combination of classifiers can be done at different levels. The combination function can take as inputs: the forecast classes (level 1); the probability for a sample to be associated with each class (level 2); these probabilities plus information concerning the input data (level 3); or these probabilities, information on the input data, plus contextual information (level 4). A description of contextual information suitable for web page filtering is given by Bennett et al. (2002).
Figure 1: Several levels of combination

Voting strategies: these strategies consider the forecast classes as input of the combination function. In voting methods (Lecce et al., 2000), each classifier has a voting token, and a decision rule is used in case of conflicting votes (Majority Voting, Maximum, Median, and "dictatorial" voting in the case of classifier selection (Giacinto & Roli, 2000)).

Boolean and logical methods: these methods are based on logical rules relying on predicates (OR, AND) to combine the classifiers; e.g. if (ADR decides that the page is harmful AND TEX decides that the page is harmful) OR (SVM3 decides that the page is harmful), then the page is declared harmful. Such sets of decisions can be represented as a decision tree.

Probabilistic combination: these methods consist in computing a score or a probability of belonging to one of the predefined classes and applying a decision rule (they are often level-2 methods). For instance, Larkey and Croft (1996) propose a simple linear combination of scores, and Yang (2000) proposes to normalize the outputs of the different classifiers.

Learning the parameters of the combination function: the task of combining classifiers can itself be treated as a classification problem (the input vector contains the outputs of the individual classifiers, and sometimes other contextual information, and the output is the forecast class). Indeed, classification algorithms can be used to learn the parameters of the combination function: Behavior Knowledge Space, Bayesian averaging, k Nearest Neighbors, Bayesian networks, SVMs, neural networks, boosting and bagging can all be applied to this combination problem. In our work, we compared some of these methods; a description of the evaluated methods is given in the following, and a minimal sketch of the two simplest levels appears below.
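This sketch (not taken from the project's code) contrasts a level-1 majority vote over forecast classes with a level-2 learned combination, where scikit-learn's LogisticRegression stands in for the second-stage learners evaluated below:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def majority_vote(forecasts):
    """Level-1 combination: each classifier casts one boolean vote
    (True = page judged harmful); the majority decides."""
    return sum(forecasts) > len(forecasts) / 2

def train_level2_combiner(score_vectors, labels):
    """Level-2 combination: the vector of per-classifier scores
    (with 0/1 for boolean classifiers) becomes the input of a
    second-stage learner trained on pages held out from the
    training of the individual classifiers."""
    combiner = LogisticRegression()
    combiner.fit(np.asarray(score_vectors), np.asarray(labels))
    return combiner

# Hypothetical usage with the 7 NetProtect II classifiers:
# x = [adr, tex, img, svm1, svm2, svm3, svm4]   # booleans as 0/1, SVMs as scores
# combiner = train_level2_combiner(combination_set_x, combination_set_y)
# harmful = combiner.predict([x])[0]
```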
Existing comparison of combination methods
Giacinto and Roli (2000) compared several combination methods: a posteriori classifier selection, majority voting and Bayesian averaging. They come to the conclusion that their a posteriori combination method gets better results than the a priori selection methods proposed by Woods (1997), but lower results than majority voting or Bayesian averaging. Lecce et al. (2000) compared majority voting, Dempster-Shafer methods (Yamaoka, 1994) and Behaviour Knowledge Space. They conclude that the choice of the method depends on the correlation level between classifiers: the more the classifiers are correlated, the more a priori information is needed to increase the performance. Bennett, Dumais and Horvitz (2002) compared neural networks, Naive Bayes and Support Vector Machines for learning the combination function; on the Reuters corpus (Lewis, 1997), SVMs got the best results. Our work is close to Bennett et al.'s, but deals with the combination of heterogeneous classifiers (working on different media and classes). Indeed, we consider a multilingual classification task with classifiers working on several categories, whereas Bennett et al. dealt with combining textual classifiers working on the same data (with the same preprocessing and the same feature vectors) and on the same classes. Bennett et al. used contextual knowledge to improve the combination function, whereas we consider only a priori knowledge on the role and features of the classifiers³.

³ A priori knowledge that could be used to increase the filtering performance includes: the output classes of the classifier, the language or media supported by the classifier, its relative performance on each class, and the correlation rate between classifiers.

Evaluated methods

This paragraph describes the methods evaluated in our work. The following methods are level-1 or level-2 methods (taking as input either the forecast classes or the probability of a sample to belong to a given class):
- OR (OR): the page is considered harmful if at least one of our single classifiers (TEX OR ADR OR SVM1 OR SVM2 OR SVM3 OR SVM4 OR IMG) considers it harmful (for the SVMs, the page is considered harmful if the score exceeds a given threshold; the chosen threshold is 0.5).
- AND (AND): the page is considered harmful if all classifiers consider it harmful. This should very rarely be the case, because a page is unlikely to deal with pornography, violence, bomb-making and drugs at the same time. The chosen threshold for the SVMs is 0.5.
- Majority Voting (MV): the page is considered harmful if the majority of the single classifiers consider it harmful. The chosen threshold for the SVMs is 0.5.
- Logical Rules (RUL): we use a priori knowledge on the classifiers (the categories B, D, P, V and the type of information they deal with: addresses, texts, images). Whenever two classifiers dealing with different types of information but able to treat the same category both consider a page harmful, the page is likely to be harmful. This is the rule we used in our work (a sketch is given after this list).
- k Nearest Neighbors (kNN): this method consists in a majority vote over the k nearest neighbors of the considered page. We set the optimum k (the number of nearest neighbors) automatically on a subset of the training set (training the combination function on 2/3 and learning the optimum k on the remaining 1/3). The similarity measure between two documents is the Euclidean distance.
- k Nearest Neighbors based on the theory of evidence (DSkNN): we build on Denoeux's work (Denoeux, 1995) to increase the performance of our k Nearest Neighbors. This method considers each nearest neighbor as a piece of evidence concerning the document to classify. The evidence masses of the k nearest neighbors are combined using the Dempster-Shafer rule of combination (Dempster, 1967).
- Naive Bayes (NB): this classifier is trained by learning, on a learning set, the probability of the classes given the input vector. These probabilities are computed with Bayes' rule. The "naive" aspect comes from the hypothesis that the inputs are independent given the class. However, this strong hypothesis leads to good results in various classification tasks (Langley et al., 1992).
- Multi-Layer Perceptron (MLP): the tested neural network is an MLP with one hidden layer.
- Support Vector Machines: we used two different kernels, polynomial (PSVM) and Gaussian (GSVM). The degree of the polynomial and the bandwidth are set automatically on a subset of the training set (the 2/3-1/3 division is the same as for the k Nearest Neighbors). The SVM implementation used is the SVMToolbox developed by Canu et al. (2003).
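To make the RUL rule concrete, here is a minimal sketch of it in Python. The a priori knowledge table simply restates Table 2; the function name and data layout are illustrative, not the project's implementation:

```python
# A priori knowledge: the medium each classifier works on and the
# categories it treats (B = bomb, D = drug, P = pornography, V = violence).
CLASSIFIERS = {
    "TEX":  {"media": "text",    "categories": {"P"}},
    "SVM1": {"media": "text",    "categories": {"B"}},
    "SVM2": {"media": "text",    "categories": {"D"}},
    "SVM3": {"media": "text",    "categories": {"P"}},
    "SVM4": {"media": "text",    "categories": {"V"}},
    "ADR":  {"media": "address", "categories": {"B", "D", "P", "V"}},
    "IMG":  {"media": "image",   "categories": {"P"}},
}

def rul_combine(blocked):
    """A priori rule (RUL): the page is considered harmful whenever two
    classifiers that work on *different* media but share at least one
    category both flag it.  `blocked` maps classifier name -> bool."""
    flagged = [name for name, b in blocked.items() if b]
    for i, a in enumerate(flagged):
        for b in flagged[i + 1:]:
            same_category = CLASSIFIERS[a]["categories"] & CLASSIFIERS[b]["categories"]
            different_media = CLASSIFIERS[a]["media"] != CLASSIFIERS[b]["media"]
            if same_category and different_media:
                return True
    return False
```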
Results and Future Work

This section describes how the performance is measured, the results we obtained, the limits of the approach, and research perspectives.

Measuring the filtering efficiency

For each evaluated method, we compute the confusion matrix (see Table 3). When considering the problem of filtering web pages, the following measures are usually used: the blocking rate, that is to say the ability to block harmful web pages correctly, and the overblocking rate, that is to say the errors made on harmless web pages (harmless web pages considered as harmful by the system).
                          Forecast class
Real class                Harmful webpage    Harmless webpage
Really harmful webpage    Blocking           ¬Blocking
Really harmless webpage   Overblocking       ¬Overblocking

Table 3: Confusion matrix

To summarize the two measures, we use the good classification rate, computed as follows:
- Blocking = rate of harmful pages correctly blocked by the system
- Overblocking = rate of harmless pages blocked by the system (mistakes)
- GCR = good classification rate
- $N_d$ = number of harmful pages
- $N_{\neg d}$ = number of harmless pages

$$\mathrm{GCR} = \frac{N_d \cdot \mathrm{Blocking} + N_{\neg d} \cdot (1 - \mathrm{Overblocking})}{N}, \qquad N = N_d + N_{\neg d}$$
We also compute the confidence interval at 5% using the formula proposed by Bennani and Bossaert (1996), where $Z_{\alpha/2}$ is the quantile of the standard normal distribution (for $\alpha = 5\%$, $Z_{\alpha/2} = 1.96$):

$$I(\mathrm{GCR}, \alpha) = \frac{\mathrm{GCR} + \frac{Z_{\alpha/2}^2}{2N} \pm Z_{\alpha/2}\sqrt{\frac{\mathrm{GCR}(1-\mathrm{GCR})}{N} + \frac{Z_{\alpha/2}^2}{4N^2}}}{1 + \frac{Z_{\alpha/2}^2}{N}}$$
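Both quantities are straightforward to compute; a minimal Python sketch of the GCR and of this Wilson-style interval (Python's standard statistics.NormalDist supplies the normal quantile):

```python
import math
from statistics import NormalDist

def gcr(blocking, overblocking, n_harmful, n_harmless):
    """Good classification rate, as defined above:
    GCR = (Nd * Blocking + N¬d * (1 - Overblocking)) / N."""
    n = n_harmful + n_harmless
    return (n_harmful * blocking + n_harmless * (1.0 - overblocking)) / n

def gcr_interval(p, n, alpha=0.05):
    """Confidence interval around a rate p measured on n pages,
    following the formula in the text (z = 1.96 for alpha = 5%)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = p + z * z / (2 * n)
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (center - half) / denom, (center + half) / denom

# Hypothetical usage (the per-split page counts are not reproduced here):
# g = gcr(blocking=0.9407, overblocking=0.1850, n_harmful=nd, n_harmless=nnd)
# lo, hi = gcr_interval(g, nd + nnd)
```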
Results

This paragraph describes the results we obtained. They are summarized in Figure 2, which shows the GCR of each individual classifier on the four categories and of the evaluated combination methods.
Figure 2: Comparison of GCR for individual classifiers
Results obtained by individual classifiers on the category they are able to treat
Classifier  Treated category  Blocking  Overblocking  GCR     I(GCR, 5%)
SVM1        Bomb-making       0.7818    0.1282        0.8373  0.8135-0.8586
SVM2        Drug              0.8261    0.1850        0.8203  0.7956-0.8426
SVM3        Pornography       0.9407    0.1850        0.8885  0.8678-0.9063
SVM4        Violence          0.7027    0.2300        0.7302  0.7022-0.7565
TEX         Pornography       0.8444    0.0485        0.8889  0.8682-0.9067
IMG         Pornography       0.3407    0.0583        0.5903  0.5599-0.6200
Table 4: Results of the individual classifiers on their own categories

The table above shows that the filtering efficiency of the individual classifiers on the category they were trained for is reasonable: SVM1 achieves a good classification rate of 83.73% on the bomb-making category (B), SVM2 gets a GCR of 82.03% on the drug category, SVM3 gets a GCR of 88.85% on pornography (the number of training pages for this category was the highest) and SVM4 gets a GCR of 73.02% on the violence category. The TEX classifier gets good results on the pornography category (GCR = 88.89%), equivalent to SVM3 on that category. The image classifier has the lowest performance on the pornography category, with 59.03%.

Results obtained by individual classifiers on the 4 considered categories

Classifier  Blocking  Overblocking  GCR     I(GCR, 5%)
ADR         0.6486    0.0895        0.8255  0.8011-0.8475
SVM1        0.3303    0.1400        0.6881  0.6591-0.7157
SVM2        0.3514    0.1457        0.6910  0.6621-0.7186
SVM3        0.5736    0.2165        0.7154  0.6870-0.7422
SVM4        0.5195    0.1544        0.7398  0.7121-0.7657
TEX         0.4384    0.0101        0.8109  0.7858-0.8337
IMG         0.2102    0.0101        0.7368  0.7090-0.7629
Table 5: Results of the individual classifiers on all four categories

On the four categories, the address classifier gets the best performance among the individual classifiers (because it was designed to work with all four categories). The SVMs get lower results because each of them treats only one category. The ADR classifier, which gets the highest performance on the four categories, will therefore be taken as the reference.

Results obtained by combining the individual classifiers
Classifier  Blocking  Overblocking  GCR     I(GCR, 5%)
OR          0.9550    0.3304        0.7622  0.7352-0.7872
AND         0.0150    0             0.6803  0.6511-0.7872
MV          0.4655    0.1328        0.7368  0.7090-0.7629
RUL         0.5976    0.0188        0.8567  0.8340-0.8768
NB          0.7417    0.0404        0.8889  0.8682-0.9067
MLP         0.8018    0.0418        0.9074  0.8881-0.9237
kNN         0.7898    0.0404        0.9074  0.8881-0.9237
kNN+RUL     0.8048    0.0346        0.9133  0.8945-0.9290
DSkNN       0.7958    0.0303        0.9133  0.8945-0.9290
DSkNN+RUL   0.8048    0.0346        0.9133  0.8945-0.9290
PSVM        0.7958    0.0317        0.9123  0.8934-0.9281
PSVM+RUL    0.8108    0.0346        0.9152  0.8966-0.9307
GSVM        0.7928    0.0289        0.9133  0.8945-0.9290
GSVM+RUL    0.8048    0.0346        0.9133  0.8945-0.9290
Table 6: Results for combinations of individual classifiers

For our problem, the simple combination methods (logical or voting strategies: OR, AND, MV) get lower results than the best individual classifier (ADR). This is expected, because these methods are not suited to combining classifiers working on heterogeneous classes. However, a combination rule based on a priori knowledge of the classifiers (regarding the category and type of media they are able to deal with) achieves better results than the best individual classifier. We did not consider a priori information on classifier performance or correlation in these tests; a rule based on the knowledge of the low overblocking rates of TEX and IMG could also have been used to combine the classifiers.

The methods based on learning the parameters of the combination function get better results than our best individual classifier. They achieve a GCR between 88% and 92% (and many of them reach a plateau at 91.33%, which suggests that every classifier makes mistakes on a set of documents that are intrinsically difficult to classify: little text, no images, etc.). But as the confidence interval (about 3%) is large, it is not possible to draw definitive conclusions. Without adding any a priori information, the best results are obtained by the Gaussian SVM (GSVM, with an optimal bandwidth of 3) and the k Nearest Neighbors based on the theory of evidence, with a GCR reaching 91.33%. SVMs remain preferable for our problem, as they are more efficient in terms of computation time in the classification step. Our results confirm those of Bennett et al. concerning the good performance of SVMs for combining classifiers. Moreover, we noticed that DSkNN gets a better GCR than the simple kNN (as mentioned in (Denoeux, 1995)). The Multi-Layer Perceptron and the Naive Bayes classifier achieve lower performance. When we introduce a priori knowledge on the role and categories treated by the classifiers into the learned function, we always increase the performance (see kNN+RUL, PSVM+RUL, GSVM+RUL). The best method is in this case a polynomial SVM (with a polynomial of degree 2).

The approach developed in this paper is an alternative to black/white list approaches (which require a manual classification of new documents appearing on the Internet). Indeed, it combines text, image and address classification in a near-optimal way for harmful web page detection, by learning the parameters of the combination rule and by adding a priori knowledge on the capabilities and roles of the various classifiers.

Limits and Future Work

As mentioned previously, we have only tested level-1 and level-2 combination methods: we do not take into account knowledge on the input data or contextual knowledge for the classification. Information on the composition of a document (number of words, number of images, number of links to harmful web pages) could be used to further increase the filtering efficiency, and we plan to carry out more experiments on this (a sketch of such features is given below). Moreover, although we kept only the English and French languages for these tests, it would be interesting to carry out experiments on the 8 languages studied during the NetProtect II project; it would notably be instructive to observe the behavior of the combination for classifiers working on various languages (TEX works in only 5 languages, the SVMi on all 8). We will also study methods such as hierarchical combination, which can be used to combine heterogeneous classifiers efficiently.
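As an indication of what such level-3 contextual features could look like, here is a sketch under the assumption that pages are available as raw HTML; BeautifulSoup is used purely for illustration and is not part of the project's tool chain:

```python
from bs4 import BeautifulSoup

def composition_features(html):
    """Document-composition features mentioned above: simple counts
    describing the page (words, images, links) that could be appended
    to the combination function's input vector."""
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(separator=" ")
    return {
        "n_words": len(text.split()),
        "n_images": len(soup.find_all("img")),
        "n_links": len(soup.find_all("a")),
    }
```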
Conclusion

The work presented in this paper consists in combining classifiers to improve the filtering of harmful documents (web pages) for children. Given a set of heterogeneous classifiers (differing in the types of media treated: text, image, addresses; and in the classes treated: bomb-making, drug, pornography, violence), we show how to combine them optimally. We describe how learning the parameters of the combination function enables us to enhance the filtering efficiency of the system and to outperform classical approaches based on black/white lists or on single text/image classifiers. Among the combination methods evaluated, all the learning-based methods (Support Vector Machine, k Nearest Neighbors, Multi-Layer Perceptron, Naive Bayes) outperform the simple boolean approaches and voting strategies (which are not adapted to heterogeneous classifiers). We also show how adding a priori knowledge on the class(es) treated and media supported by each classifier enables us to construct extra rules (to be used in addition to the learned combination function). In future work, we will focus on adding extra information, such as a priori knowledge on classifier performance (overblocking, correlation between classifiers) or knowledge on the input (either the content of the page or its context).
Appendices

Text Classification

There exists a wide variety of methods for text classification. They are based on several representation modes for textual information: boolean, logical (Cohen, 1995), probabilistic (Amini, 2001; Fuhr, 1992), vectorial (Salton, 1978). The last model gives the best results and is the most used. It consists in representing a document in a space of predefined segments (at different granularity levels: letters, n-grams, words, sets of words, concepts). The choice of the best representation model is still an open question (the most used model, however, is the "bag of words"). The functions that associate documents with segments are also numerous, from the simple frequency of a segment in a document to more complex measures (number of segments per document, frequency of a segment in the whole corpus). The most used function is the normalized TFIDF measure, despite its simplicity (it takes into account neither semantics nor localization information). The main problem of this approach is the high dimensionality of the feature space (the number of distinct words in a corpus quickly reaches 20000 terms). Numerous methods, not detailed here, reduce the dimensionality of the model: anti-dictionaries, lemmatization (Porter, 1980), Zipf's law (Zipf, 1932), morphosyntactic analysis (Brill, 1995), information gain (Rogati et al., 2002), Latent Semantic Indexing (Deerwester et al., 2001), the Chi2 method (Rogati et al., 2002). The most common supervised classification algorithms for text classification are: artificial neural networks (Weiner et al., 1995), Bayesian networks (McCallum, 1998), k Nearest Neighbors, Support Vector Machines (Joachims, 1998), boosting, bagging. SVMs are put forward in several articles (Dumais et al., 1998; Yang, 1999).

Image Classification

The problem of image classification is close to that of text classification. Indeed, one has to define a representation mode into which the images are projected, as well as an association function between images and segments and a decision rule. The vectorial model is also the most used, and the most common segments are either chromatic histograms (Ogle et al., 1995) or a feature vector called the image signature (contrast, shapes (Flickner, 1995) and textures (Tamura, 1978)). Various supervised classification algorithms have been used for this task (kNN, logistic regression, decision trees, neural networks, SVM). A comparison can be found in (Dreiseitl et al., 2001).
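For illustration, a chromatic histogram of the kind cited above can be computed as follows. This is a minimal sketch with Pillow and NumPy, unrelated to the partner's actual image classifier:

```python
import numpy as np
from PIL import Image

def chromatic_histogram(path, bins=8):
    """Chromatic-histogram image signature: a joint RGB histogram with
    `bins` levels per channel, L1-normalized so that images of
    different sizes yield comparable feature vectors."""
    rgb = np.asarray(Image.open(path).convert("RGB"))
    hist, _ = np.histogramdd(
        rgb.reshape(-1, 3),
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    hist = hist.flatten()
    return hist / hist.sum()
```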
References

Al-Ani, A. & Deriche, M. (2002). A new technique for combining multiple classifiers using the Dempster-Shafer theory of evidence. Journal of Artificial Intelligence Research, 17, (pp. 333-361).
Amini, M.-R. (2001). Automatic Learning and Information Retrieval. PhD thesis, University of Paris 6.
Bennani, Y. & Bossaert, F. (1996). Predictive neural network for traffic disturbance detection in the telephone network. In Proceedings of IMACS-CESA'96.
Bennett, P., Dumais, S. & Horvitz, E. (2002). Probabilistic combination of text classifiers using reliability indicators: models and results. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland.
Brill, E. (1995). Unsupervised learning of disambiguation rules for part-of-speech tagging. In Proceedings of the Workshop on Very Large Corpora.
Canu, S., Grandvalet, Y. & Rakotomamonjy, A. (2003). SVM and kernel methods Matlab toolbox. PSI, INSA de Rouen, France.
Cohen, W. (1995). Learning to classify English text with ILP methods. In Advances in ILP. IOS Press.
Deerwester, P., Dumais, S. T., Furnas, G. W., Landauer, T. & Harshmann, R. (2001). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), (pp. 391-407).
Dempster, A. (1967). Upper and lower probabilities induced by a multivalued mapping. In AMS-38, (pp. 325-369).
Denoeux, T. (1995). A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Transactions on Systems, Man and Cybernetics, 25(5), (pp. 804-813).
Denoyer, L., Vittaut, J.-N., Gallinari, P., Brunessaux, S. & Brunessaux, S. (2003). Structured multimedia document classification. In Proceedings of DocEng 2003.
Dreiseitl, S., Ohno-Machado, L., Kittler, H., Vinterbo, S., Billhardt, H. & Binder, M. (2001). A comparison of machine learning methods for the diagnosis of pigmented skin lesions. Journal of Biomedical Informatics, (pp. 28-36).
Dumais, S. T., Platt, J., Heckerman, D. & Sahami, M. (1998). Inductive learning algorithms and representations for text categorization. In Proceedings of the 7th International Conference on Information and Knowledge Management (CIKM), (pp. 148-155).
European Research on Computer Affairs (2003). Children and Violence on the Web.
Flickner, M. (1995). Query by image and video content: the QBIC system. IEEE Computer, 28(9).
Fuhr, N. (1992). Probabilistic models in information retrieval. The Computer Journal, 35(3), (pp. 243-255).
Giacinto, G. & Roli, F. (2000). Methods for dynamic classifier selection. In Proceedings of the First International Workshop on Multiple Classifier Systems (MCS 2000), (pp. 177-189).
Henry J. Kaiser Family Foundation (2002). Teens online key facts.
Joachims, T. (1998). Text categorization with Support Vector Machines: learning with many relevant features. In Proceedings of the European Conference on Machine Learning (ECML), Springer.
Langley, P., Iba, W. & Thompson, K. (1992). An analysis of Bayesian classifiers. In Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA, (pp. 223-228). AAAI Press.
Larkey, L. & Croft, W. (1996). Combining classifiers in text categorization. In Proceedings of SIGIR'96, (pp. 289-297).
Lecce, V. D., Dimauro, G., Guerrerio, A., Impedovo, S., Pirlo, G. & Salzo, A. (2000). Classifier combination: the role of a-priori knowledge. In Proceedings of the 7th IWFHR'2000.
Lewis, D. D. (1997). Reuters-21578, distribution 1.0.
McCallum, A. & Nigam, K. (1998). A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization.
NetProtect (2001). A European prototype of Internet access filtering. http://np1.net-protect.org
NetProtect II (2001). A European tool for Internet access filtering. http://www.net-protect.org
Ogle, V. E. & Stonebraker, M. (1995). Chabot: retrieval from a relational database of images. IEEE Computer, 28(9).
Porter, M. (1980). An algorithm for suffix stripping. Program, 14(3), (pp. 130-137).
Rogati, M. & Yang, Y. (2002). High-performing feature selection for text classification. In Proceedings of ACM CIKM 2002.
Safer Internet Action Plan (2003). European Commission. http://www.saferinternet.org/
Salton, G. & Buckley, A. (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management.
Soucy, P. & Mineau, P. (2001). A simple feature selection method for text classification. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, (pp. 897-902).
Tamura, H., Mori, S. & Yamawaki, T. (1978). Textural features corresponding to visual perception. IEEE Transactions on Systems, Man and Cybernetics, 8(6), (pp. 130-137).
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag.
Weiner, E., Pedersen, J. O. & Weigend, A. (1995). A neural network approach to topic spotting. In Proceedings of SDAIR'95, (pp. 317-332).
Woods, K. (1997). Combination of multiple classifiers using local accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, (pp. 405-410).
Yamaoka, L. F. (1994). Integration of handwritten digit recognition results using evidential reasoning. In Proceedings of IWFHR-4, (pp. 456-463).
Yang, Y., Ault, T. & Pierce, T. (2000). Combining multiple learning strategies for effective cross validation. In Proceedings of ICML'00, Berkeley, (pp. 1167-1182).
Yang, Y. & Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of ACM SIGIR, Berkeley, (pp. 42-49).
Zipf, G. (1932). Selective Studies and the Principle of Relative Frequency in Language.