Wine Vinification prediction using Data Mining tools - wseas.us

COMPUTING and COMPUTATIONAL INTELLIGENCE

Wine Vinification prediction using Data Mining tools

JORGE RIBEIRO1, JOSÉ NEVES2, JUAN SANCHEZ3, MANUEL DELGADO4, JOSÉ MACHADO2, PAULO NOVAIS2

1 School of Technology and Management, 3 Agrarian School, Viana do Castelo Polytechnic Institute, Viana do Castelo, PORTUGAL, [email protected], [email protected], http://ww.ipvc.pt
2 Department of Computer Science, University of Minho, PORTUGAL, {Jneves,jmac,pjon}@di.uminho.pt, http://www.di.uminho.pt
4 Department of Electronic and Computation, University of Santiago de Compostela, SPAIN, [email protected], http://www.usc.es

Abstract: - The vinification process is one of the important stages of wine production and can influence the quality that the wine achieves. This quality is traditionally assessed on chemical samples by wine tasters, who analyze subjective parameters such as colour, foam, flavour and savour. This type of analysis is very important for the production of wine and for its successful marketing. Data Mining techniques are highly relevant in this field, both for revealing the importance of the numerous chemical parameters involved in wine production and for defining models that classify parameters, for example to determine the organoleptic parameters from the chemical parameters of the winemaking process. This paper presents Decision Trees, Artificial Neural Networks and Linear Regression as Data Mining techniques for classification and regression, in order to create models that predict the organoleptic parameters from the chemical parameters of the vinification process. The experiments were conducted using the new Microsoft SQL Server 2008 Business Intelligence Development Studio and an open-source Data Mining tool (WEKA). Very good results were achieved, with accuracies between 86% and 99% obtained for all models.

Key-Words: - Data Mining, Knowledge Discovery in Databases, Decision Trees, Artificial Neural Networks, Linear Regression, Wine Vinification Process.
1 Introduction
In recent years, Data Mining techniques [1] have become a powerful and easy-to-use means of analyzing relationships between the attributes of data sets. The high volume of data accumulated over time by the transactional systems of organizations creates a new challenge: extracting knowledge from the information stored in these systems. Through the process of Knowledge Discovery from Databases (KDD) [2], organizations can exploit the stored data, discovering relationships or affinities within it and understanding the behaviour of the agents that take part in the organization, such as customers, suppliers and sellers. Various tasks (selection, preprocessing, transformation, data mining and interpretation) are associated with this process of knowledge extraction, and their use is oriented to different objectives: classification, clustering, forecasting, optimization and summarization. The Data Mining task centres on the application of algorithms, for example artificial neural networks, decision trees, association rules and genetic algorithms, that extract affinities from the previously treated data. This work addresses the objectives of classification and regression using Decision Trees (DT) [3], Artificial Neural Networks (ANN) [4] and Linear Regression (LR) [5] as the Data Mining techniques applied to the vinification process of “green red” wine production [6]. This data set is characterized by the existence of chemical parameters/attributes

ISSN: 1790-5117

78

ISBN: 978-960-474-088-8


corresponding to the different samples (pH, CA, FCI, etc.) and organoleptic attributes (or subjective attributes) [7], which correspond to the wine tasters' evaluation of the sample obtained in the vinification process. In this context, this study has two objectives. The first is to create a classification model of the subjective attributes (savour, colour, flavour and foam) from the chemical parameters obtained during the vinification process; for this purpose we use DT and ANN. The second is to use LR to obtain mathematical functions that express the relationship between the chemical attributes and allow the prediction of the subjective parameters. Two Data Mining tools were used: an open-source one (WEKA) [8] and a proprietary one (Microsoft Business Intelligence Studio 2008) [9]. With this work we intend to demonstrate the potential of Data Mining techniques for extracting knowledge from databases, in particular for creating classification models of subjective attributes in the wine vinification process. With these models, wine production decision makers will have a tool to assist them in analyzing the chemical parameters, identifying the best conditions for consuming the wine and improving its quality. The paper is organized as follows: first, the basic concepts are introduced and the data is presented and described; then the data mining techniques applied are presented; next, the experiments performed are described and the results analyzed in terms of several criteria; finally, closing conclusions are drawn.

2 Materials and Methods
2.1 Wine vinification Data
This work adopted part of the data collected over four years during the wine production phase at a wine estate in the Minho Region (North of Portugal) that produces “green red” wines. The data covers three kinds of wine vinification [6]: Vinification by Fermentative Pellicular Maceration (C), Vinification by Carbonic Maceration (MC) and Vinification by Rotary Cube (CR). For each of the processes (C, MC and CR), tests were made with the following glues (clarification types) [10, 11]: polyvinylpolypyrrolidone, albumin, gelatin and casein, plus the “witness”, without any glue. These are denoted in Table 1 by the letters p, a, g, c and t respectively. The data set has two types of attributes: attributes with chemical characteristics and organoleptic/subjective attributes (colour, foam, flavour and savour) evaluated by the wine tasters for the classification of the chemical parameters of the samples. For the attributes with continuous values, Figure 1 shows the histograms used to analyze their distribution and to set the classes that convert attributes with continuous values into attributes with discrete values. The analysis of the histograms led to the classes A, B, C, D and E. The division of the continuous values into five classes was decided by the managers of the wine production. The subjective parameters are divided into two classes, "A" for "medium" and "B" for "good". As mentioned, the first objective is to create classification models for the subjective attributes: the aim is to analyze the variation of the chemical parameters and their predictive value for the subjective parameters in terms of the two classes "A" and "B".

Several physico-chemical attributes are associated with wine production [6, 7]. Among them, the attributes that the managers considered most relevant to the analysis of wine production, especially for “green red” wine [7], are: the pH, the absorbency at 420 nm (contribution to the yellow colour), the absorbency at 520 nm (contribution to the red colour), the absorbency at 620 nm (contribution to the blue colour), the Anthocyanins, the Chemical Age (CA) and the Folin-Ciocalteau Index (FCI). The fermentation sample (SFTM) corresponds to the time, in months, of the sample collection; it takes values from 6 to 36 months. A particularity of “green red” wine is that its chemical colour is determined by the sum of the absorbances at three wavelengths (420, 520 and 620 nm). For this reason the chemical colour parameter was removed from the data set.


Before attempting the DM modelling, the data was pre-processed. The original data set contained attributes with missing values. Since it was not possible to obtain the correct values, the incomplete records were discarded [12], leaving a total of 362 examples. The main features of the vinification data set are described in Table 1. The frequency distributions (histograms) of these variables are plotted in Figure 1. According to the managers of this wine estate, the classification of the wine’s quality was defined as a typical classification problem.

2.2 Decision Trees



The Decision Tree (DT) [3] is one of the most popular and efficient classification algorithms in Data Mining. It represents a set of rules that follow a hierarchy of classes or values, expresses simple conditional logic, and is graphically similar to a tree (Figure 2). A DT classifies an instance by routing it from the root node to a terminal node (leaf), which provides the classification: each node of the tree specifies a test on an attribute of the instance, and each branch descending from that node corresponds to one of the possible values of this attribute. An instance is classified by first testing the attribute specified by the root node and then following the branch corresponding to the value of that attribute in the instance, repeating the test at each node reached until a leaf is found.

Fig. 2 - Microsoft Decision Tree to predict the Colour parameter.

The most popular decision tree algorithms for classification are ID3, C4.5 and C5.0, proposed by Ross Quinlan [3]. The CART classification algorithm proposed by Breiman [5] is also widely adopted. In this study we use the C4.5 implementation in the WEKA tool [8] and the Microsoft Decision Trees algorithm [9], which is a hybrid of these algorithms (C4.5 and CART). C4.5 is a DT algorithm based on the concept of information gain, which represents the decrease in entropy caused by dividing a given data set according to an attribute. The attribute with the highest gain is chosen to divide the data set, and the recursive application of this procedure structures the data set with respect to the relevant attributes. J48 [8], a Java re-implementation of the C4.5 algorithm [13] that is part of the WEKA machine learning package [8], was used to induce the decision trees with the open-source tool. The other tool was the Microsoft Business Intelligence Studio 2008 [9].
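The information gain criterion described for C4.5 can be illustrated with a minimal Python sketch. It is not the J48 or Microsoft implementation: the function names are illustrative, the four samples and their labels are hypothetical, and C4.5 itself normalises this quantity into a gain ratio, a step omitted here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Decrease in entropy obtained by splitting the samples on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical samples (not taken from the paper's data set).
rows = [
    {"SFTM": 6, "vT": "C"},
    {"SFTM": 6, "vT": "MC"},
    {"SFTM": 36, "vT": "C"},
    {"SFTM": 36, "vT": "MC"},
]
labels = ["B", "B", "A", "A"]

# The attribute with the highest gain would be chosen to split the data set.
best = max(["SFTM", "vT"], key=lambda a: information_gain(rows, labels, a))
print(best)
```

In this toy example SFTM separates the two classes completely, so it is selected for the split, while the vinification type carries no information about the label.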

Fig. 3 (a), (b) - Scatter plot of the linear regression for the Colour attribute.

2.3 Linear Regression
The objective of Linear Regression (LR) [5] is to find a basis for predicting one variable, i.e. to find a function that represents the behaviour of the variables (Figure 3 (b)). LR methods permit the discrimination of the data by combining attributes, which is equivalent to



determining the separation lines between values. Regression typically requires that both dependent and independent variables be continuous and numeric. For this reason, we removed the non-numerical attributes from the data set when applying the LR model. In this study, we applied LR to obtain functions for predicting the subjective variables of the wine vinification data set.

2.4 Artificial Neural Networks
Artificial Neural Networks (ANN) [4] are simplified models of the central nervous system that constitutes the human brain. The application of ANN in the context of decision support is justified by the characteristics of learning and adaptation that are often considered unique to biological systems. The Multilayer Perceptron [4] is a popular architecture, where neurons are grouped in layers and only forward connections exist. In particular, Microsoft Neural Network [4] is a feedforward network that uses the weighted sum combination function [9], the tanh activation function in hidden nodes and the sigmoid function in output nodes. The number of hidden nodes in the hidden layer is set automatically by the algorithm. ANN are applied in several areas, in particular to problems of classification and prediction.

3 Results
Given the analysis of the wine vinification process, the experiments followed two strategies. The first is based on the classification objective using DT and ANN, and the second uses LR. The two approaches are compared in terms of predictive accuracy. As mentioned, the tools used were WEKA and Microsoft Business Intelligence Studio 2008. The classification models for wine vinification analysis were developed using the C4.5 algorithm [13]. To ensure the statistical significance of the results, 10 runs were applied in all tests, with the accuracy estimated using the Holdout method [14]. Training was carried out with both balanced and non-balanced training sets. In each simulation, the available data is randomly divided into mutually exclusive partitions: the training set, with 2/3 of the available data, used during the modelling phase, and the test set, with the remaining 1/3 of the examples, used after training to compute the accuracy values. A common tool for classification analysis is the confusion matrix [15], a structure of size N x N, where N denotes the number of possible classes. It is created by matching the values predicted by the Data Mining model against the desired results. In the presented experiments, J48 [8] with default parameters was used for inducing classification trees. Model training and validation were based on 10-fold cross-validation and evaluated by the number of correctly classified instances. The gain represents the interestingness score of the attribute.
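The holdout procedure described for the experiments (10 runs, each with a random 2/3 training and 1/3 test partition) can be sketched in Python as follows. This is a minimal illustration, not the procedure built into WEKA or the Microsoft tool: `holdout_split`, `repeated_holdout` and the `train_and_score` callback are hypothetical names, and the seeds are arbitrary.

```python
import random


def holdout_split(examples, train_fraction=2 / 3, seed=None):
    """Randomly divide the examples into mutually exclusive training and test sets."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def repeated_holdout(examples, train_and_score, runs=10, seed=0):
    """Average the accuracy returned by train_and_score over several random splits."""
    scores = []
    for run in range(runs):
        train, test = holdout_split(examples, seed=seed + run)
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)


# With the 362 examples of the data set, each run trains on 241 and tests on 121.
train, test = holdout_split(range(362), seed=0)
print(len(train), len(test))
```

Here `train_and_score` stands for any function that fits a model on the training partition and returns its accuracy on the test partition; averaging over the runs reduces the dependence of the estimate on one particular random split.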

Fig. 4 (a), (b) - ANN Model Viewer

3.1 Experimental Results
Table 2 presents the confusion matrices of the DT models for each tool, where the values denote the average of 10 runs. Both approaches have a predictive accuracy of about 90%. The experimental results show that, with either tool, using balanced training sets brings no improvement. The results reveal that Model 1 (Microsoft Decision Tree) is more accurate than Model 2 (WEKA Decision Tree). The predictive probability of the first model is centred



in the range 85% to 98% when predicting the subjective attributes of the vinification data set. The WEKA tool, on the other hand, correctly classifies between 83% and 92% of the instances. Despite the differences between the tools, both presented good accuracy.

Table 3 presents the most relevant attributes for the classification models obtained by applying the DT. With both tools, the attribute "SFTM" is selected as the most relevant for classifying the subjective attributes. The second most important attribute is the “clarification type”. For the classification of the attribute “Flavour” the most relevant attributes are the "vinification type" for the WEKA tool and the “vinification type and chemical age” for the Microsoft tool. Although the accuracy is similar for both tools (91.18% and 92.2%), the difference in the selection of attributes is explained by one tool analysing the correlation between attributes in more detail than the other. General rules of the type “IF ... THEN” can be deduced from a DT by following the path between the root and a leaf of the tree. From the tree in Figure 2 it can be derived that if SFTM equals 30 and CA equals “B” (between 0.32 and 0.4) then the colour is “B” (“good”) with a predicted probability of 99.39%. Table 4 presents some examples of the rules extracted from the DT by both tools.

Table 5 presents the results obtained by applying the ANN to create classification models for each subjective attribute. The prediction accuracy is high: in Table 5 it ranges from about 81% to 96%. The ANN consisted of one input layer with 10 neurons (one for each input variable), one hidden layer with 10 neurons and one output layer with 2 neurons (one for each class, "A" - “Medium” and "B" - “Good”). The learning rate and “momentum” (WEKA tool) were set at 0.2 and the learning time at 300 iterations. Figure 4 shows an example of how the ANN model can be interpreted. The most important attributes for the colour prediction are the SFTM and the CA. For example, if the SFTM is 6 or 20 (months), the probability of the colour being "good" is 96%, whereas if the SFTM is 24 the probability of the colour being "medium" is 95.6%.

The analysis of the results shows some differences between the DT and the ANN. When using the same input variables, the ANNs presented better performance than the DT models. For example, the accuracy for the Flavour attribute is 91.18% with the Microsoft Decision Tree and 92.2% with the WEKA DT, while the predicted probability is 96% with the Microsoft ANN and 91% with the WEKA ANN. Figure 4 shows two examples of how the neural network model can be interpreted, for the flavour attribute (a) and for the savour attribute (b). For example, the attributes that favour class "A" of the flavour attribute are the SFTM with the values 24 and 26 (with a probability of 96%) and the Anthocyanins in class "A" (less than 160), with a probability of being "A" of 94%. For class "B" they are the SFTM with the values 6 and 20 months (with a probability of 96%) and the chemical age in class "D" (more than 0.56), with a probability of being "B" of 90%.

As mentioned, the objective of LR is to find the function (Figure 3 (b)) that approximates the behaviour of the variables. The LR obtained with Microsoft Linear Regression for the savour attribute is shown in Figure 3 (b) and the function in Table 6. LR requires variables with continuous values. For this reason the non-numeric attributes, namely the clarification type and the vinification type, were removed from the data set. LR uses an interestingness score to rank and sort attributes in columns that contain continuous non-binary numeric data; the score was used to assess all input columns, to ensure consistency. Figure 3 (a) presents the Microsoft Data Mining scatter plot for the LR of the subjective attribute colour. The Microsoft Linear Regression algorithm is a variation of the Microsoft Decision Trees algorithm that calculates the linear relationship between a dependent and an independent variable and then uses that relationship for prediction. The scatter plot, in turn, allows visual assessment of the relationship between the response and predictor variables.

4 Conclusion and Further Work
This paper presented a study on the prediction of the organoleptic parameters (colour, foam, flavour and savour) from the chemical parameters in the wine vinification process. We used Decision Trees, Artificial Neural Networks and Linear Regression as Data Mining techniques to create



models for the classification and regression objectives of Data Mining. The experiments were conducted using the new Microsoft Business Intelligence Studio 2008 and the open-source WEKA tool. Accuracies between 86% and 99% were obtained, indicating that Data Mining models can be used to predict organoleptic attributes in the wine vinification process from chemical parameters. It was possible to create classification models for the subjective attributes and to identify the most relevant chemical attributes that influence their values. The ANN achieved a better performance than the DT; however, the accuracy of the DT was above 90%. Although the data set contains few attributes, quite good results were attained. In the future it would also be interesting to consider a new set of chemical parameters in the wine production. This work shows the advantages of using Data Mining tools to support decision making, in particular in the field of winemaking.

References:
[1] Han, J., Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[2] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Advances in Knowledge Discovery and Data Mining, The MIT Press, Massachusetts, USA, 1996.
[3] Quinlan, J.R., Induction of decision trees, Machine Learning, pp. 81-106, 1986.
[4] Venkatesh, E., Self-Organizing Map and multi-layer perceptron neural network based data mining to envisage agriculture cultivation, Journal of Computer Science, vol. 4, issue 6, p. 494(9), 2008.
[5] Breiman, L., Friedman, J., Olshen, R., Stone, C., Classification and Regression Trees, Wadsworth, Pacific Grove, 1984.
[6] Castillo-Sanchez, J.X., Arantes, J. et Maia, M.O., Étude de l'Évolution des Composés Phénoliques des Vins du Nord du Portugal Issus des Différents Processus de Vinification. In: Polyphenols Communications 96, Vol. I, 18th International Conference on Polyphenols, July 15-18, Bordeaux, pp. 55-56, 1996.
[7] Castillo-Sanchez, J.X., Mejuto, J.C., Garrido, J. and Garcia-Falcón, S., Influence of wine-making protocol and fining agents on the evolution of the anthocyanin content, color and general organoleptic quality of Vinhão wines, Food Chemistry, 97, 1, pp. 130-136, 2006.
[8] Witten, I.H., Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann Publishers, San Francisco, p. 369, 2000.
[9] Larson, B., Delivering Business Intelligence with Microsoft SQL Server 2008, McGraw-Hill Osborne Media, 2nd edition, 2008.
[10] OIV, Office Internationale de la Vigne et du Vin, Recueil des Méthodes Internationales d’Analyse des Vins et des Moûts, Paris, 1990.
[11] Papadopoulou C., Kalliopi, S., Ioannis, R., Potential Antimicrobial Activity of Red and White Wine Phenolic Extracts against Strains of Staphylococcus aureus, Escherichia coli and Candida albicans, Food Technol. Biotechnol. 43 (1), pp. 41-46, 2005.
[12] Pyle, D., Data Preparation for Data Mining, Morgan Kaufmann Publishers, 1999.
[13] Quinlan, J.R., Bagging, Boosting and C4.5, Proceedings of the Fourteenth National Conference on Artificial Intelligence.
[14] Souza, J., Matwin, S., Japkowicz, N., Evaluating Data Mining Models: A Pattern Language, Proceedings of the 9th Conference on Pattern Language of Programs, Illinois, USA, 2002.
[15] Kohavi, R., Provost, F., Glossary of Terms, Machine Learning, 30 (2/3), pp. 271-274, 1998.

Appendix

| Attribute | Domain values | Min | Max | A | B | C | D | E |
|---|---|---|---|---|---|---|---|---|
| Sample Fermentation, time in months (SFTM) | {6, 8, 12, 14, 24, 30, 36} | --- | --- | --- | --- | --- | --- | --- |
| Clarification type | {t, p, a, g, c} | --- | --- | --- | --- | --- | --- | --- |
| Vinification Type (vT) | {C, MC, CR} | --- | --- | --- | --- | --- | --- | --- |
| Chemical attributes | | | | | | | | |
| pH | --- | 3.29 | 3.9 | ≤ 3.45 | > 3.45, ≤ 3.56 | > 3.56, ≤ 3.63 | > 3.63, ≤ 3.7 | > 3.7 |
| Absorbency - A420 | --- | 0.26 | 1.21 | ≤ 0.42 | > 0.42, ≤ 0.5 | > 0.5, ≤ 0.6 | > 0.6, ≤ 0.7 | > 0.7 |
| Absorbency - A520 | --- | 0.31 | 2.48 | ≤ 0.6 | > 0.6, ≤ 0.75 | > 0.75, ≤ 0.97 | > 0.97, ≤ 1.27 | > 1.27 |
| Absorbency - A620 | --- | 0.1 | 0.5 | ≤ 0.16 | > 0.16, ≤ 0.19 | > 0.19, ≤ 0.23 | > 0.23, ≤ 0.28 | > 0.28 |
| Chemical Age - CA | --- | 0.2 | 0.9 | ≤ 0.32 | > 0.32, ≤ 0.4 | > 0.40, ≤ 0.48 | > 0.48, ≤ 0.56 | > 0.56 |
| Folin-Ciocalteau Index (FCI) | --- | 12.6 | 69.8 | ≤ 28 | > 28, ≤ 34 | > 34, ≤ 40 | > 40, ≤ 48 | > 48 |
| Anthocyanins - Ant (mg/L) | --- | 67.7 | 914.3 | ≤ 160 | > 160, ≤ 230 | > 230, ≤ 330 | > 330, ≤ 520 | > 520 |
| Subjective attributes | | | | | | | | |
| Savour | --- | 1.9 | 8 | ≤ 5.4 (*) | > 5.4 (**) | --- | --- | --- |
| Colour | --- | 2.8 | 8.7 | ≤ 6.2 (*) | > 6.2 (**) | --- | --- | --- |
| Foam | --- | 2 | 8.4 | ≤ 5.6 (*) | > 5.6 (**) | --- | --- | --- |
| Aroma | --- | 1.5 | 8.1 | ≤ 4.7 (*) | > 4.7 (**) | --- | --- | --- |

(*) Medium (**) Good

Table 1 - The main wine’s vinification indicators.

Fig.1 - Histograms for the attributes of the wine vinification data set.
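The conversion of continuous values into the classes A to E can be sketched in Python from the class boundaries listed in Table 1. This is a minimal illustration under the assumption that each class is bounded above inclusively (≤), as the table indicates; the helper names are hypothetical and only the pH boundaries and the savour threshold are shown.

```python
from bisect import bisect_left

# Inclusive upper bounds of classes A-D for pH, from Table 1;
# any value above the last bound falls in class E.
PH_UPPER_BOUNDS = [3.45, 3.56, 3.63, 3.70]
CLASSES = "ABCDE"


def discretize(value, upper_bounds):
    """Map a continuous value to its class letter using inclusive upper bounds."""
    return CLASSES[bisect_left(upper_bounds, value)]


def subjective_class(score, threshold):
    """Two-class split of a taster score: 'A' (medium) up to the threshold, else 'B' (good)."""
    return "A" if score <= threshold else "B"


print(discretize(3.50, PH_UPPER_BOUNDS))       # pH between 3.45 and 3.56
print(subjective_class(6.0, threshold=5.4))    # savour score above 5.4
```

`bisect_left` returns the index of the first bound that is greater than or equal to the value, which matches the "greater than the previous bound, less than or equal to this bound" intervals of the table.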

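| Attribute | Class | Model 1: A | Model 1: B | Model 1: Predict Probability | Model 2: A | Model 2: B | Model 2: Score | Model 2: Correctly Classified Instances |
|---|---|---|---|---|---|---|---|---|
| Colour | A | 37 | 7 | 85.36% | 140 | 26 | 0.85 | 83.3% |
| Colour | B | 16 | 48 | | 34 | 160 | | |
| Foam | A | 47 | 12 | 90.48% | 143 | 29 | 0.95 | 87.2% |
| Foam | B | 4 | 45 | | 17 | 171 | | |
| Flavour | A | 56 | 7 | 91.18% | 162 | 12 | 0.89 | 92.2% |
| Flavour | B | 3 | 42 | | 16 | 170 | | |
| Savour | A | 45 | 7 | 98.31% | 149 | 21 | 0.96 | 87.5% |
| Savour | B | 7 | 49 | | 24 | 166 | | |

Model 1 - Microsoft Decision Tree (Classification Matrix); Model 2 - WEKA Decision Tree (Confusion Matrix).

Table 2 - Confusion Matrix of the obtained models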
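As a worked check of how accuracy follows from a confusion matrix, the following Python sketch recomputes the share of correctly classified instances from the Model 2 (WEKA Decision Tree) counts in Table 2. The accuracy is the sum of the diagonal divided by the total number of instances, so it does not depend on whether rows or columns hold the predicted class; the variable and function names are illustrative.

```python
# Counts transcribed from the Model 2 (WEKA Decision Tree) columns of Table 2.
WEKA_DT_MATRICES = {
    "Colour": [[140, 26], [34, 160]],
    "Foam": [[143, 29], [17, 171]],
    "Flavour": [[162, 12], [16, 170]],
    "Savour": [[149, 21], [24, 166]],
}


def accuracy(matrix):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for attribute, matrix in WEKA_DT_MATRICES.items():
    print(f"{attribute}: {accuracy(matrix):.1%}")
```

The printed values (83.3%, 87.2%, 92.2% and 87.5%) agree with the Correctly Classified Instances column of Table 2.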



| Attribute | Model 1 - Microsoft Decision Tree | Model 2 - WEKA Decision Tree |
|---|---|---|
| Colour | Vinification Type and SFTM | SFTM |
| Foam | Clarification Type and SFTM | SFTM and Clarification Type |
| Flavour | Vinification Type and CA | SFTM |
| Savour | SFTM, Clarification Type and Vinification Type | SFTM and Clarification Type |

Table 3 - Most relevant attributes of Models 1 and 2

| Model 1 - Microsoft Decision Tree | Model 2 - WEKA Decision Tree |
|---|---|
| IF SFTM = 36 AND CA = "B" THEN COLOUR = "B" (99.36%) | IF SFTM = "20" AND vT = "CR" AND Ant = "E" THEN B (75%) |
| IF SFTM = 26 AND A420 = "B" THEN FOAM = "A" (99.81%) | IF SFTM = "20" AND cT = "g" AND vT = "CR" THEN A (70%) |
| IF CA = "A" AND Ant = "A" THEN FLAVOUR = "A" (99.35%) | IF SFTM = "26" AND Ant = "B" THEN A (17/1) |
| IF SFTM ≠ {8, 30, 36} AND vT = "C" AND CA = "D" THEN SAVOUR = "B" (98.98%) | IF SFTM = "14" AND cT = "g" AND vT = "MC" THEN B (4) |

Table 4 - Rules derived by the Data Mining Tools applying the Decision Trees technique
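To show how such IF ... THEN rules can be applied to a new sample, the sketch below encodes the four Model 1 rules of Table 4 as Python predicates. It is only an illustration: the dictionary keys, the `apply_rules` helper and the example sample are hypothetical, and the rules cover only the cases listed in the table, not the full decision trees.

```python
# Model 1 rules transcribed from Table 4:
# (target attribute, condition on the sample, predicted class, probability in %).
MODEL1_RULES = [
    ("Colour", lambda s: s["SFTM"] == 36 and s["CA"] == "B", "B", 99.36),
    ("Foam", lambda s: s["SFTM"] == 26 and s["A420"] == "B", "A", 99.81),
    ("Flavour", lambda s: s["CA"] == "A" and s["Ant"] == "A", "A", 99.35),
    ("Savour",
     lambda s: s["SFTM"] not in {8, 30, 36} and s["vT"] == "C" and s["CA"] == "D",
     "B", 98.98),
]


def apply_rules(sample, rules=MODEL1_RULES):
    """Return {target: (class, probability)} for every rule whose condition holds."""
    return {target: (label, prob)
            for target, condition, label, prob in rules
            if condition(sample)}


# Hypothetical sample, already discretized into the classes of Table 1.
sample = {"SFTM": 36, "CA": "B", "A420": "C", "Ant": "C", "vT": "CR"}
print(apply_rules(sample))
```

For this sample only the colour rule fires, so the other subjective attributes receive no prediction from these four rules.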

| Attribute | Model 3 - Microsoft ANN: Predict Probability | Model 4 - ANN WEKA: Score | Model 4 - ANN WEKA: Correctly Classified Instances |
|---|---|---|---|
| Colour | 81.21% | 0.80 | 82% |
| Foam | 94.94% | 0.87 | 85.3% |
| Flavour | 96% | 0.92 | 91.2% |
| Savour | 95.93% | 0.91 | 87.5% |

Table 5 - ANN Results

Model 5 - Microsoft Linear Regression

| Attribute | Model | Score |
|---|---|---|
| Colour | Colour = 6.241 - 0.060*(SFTM - 22.087) + 0.002*(AntmgL - 351.970) | 1.38 |
| Flavour | Flavour = 4.358 + 1.556*(CA - 0.455) - 0.141*(SFTM - 22.429) | 1.91 |
| Foam | Foam = 5.430 + 1.228*(pH - 3.571) + 1.583*(CA - 0.458) - 0.088*(SFTM - 22.270) | 1.41 |
| Savour | Savour = 5.133 - 0.123*(SFTM - 22.143) | 1.49 |

WEKA Linear Regression

| Attribute | Model | CC (*) |
|---|---|---|
| Colour | Colour = -0.0547 * SFTM + 1.473 * pH - 0.9412 * A420 + 0.95 * A520 - 0.01 * FCI + 0.0014 * Ant(mg/L) + 1.6907 | 0.71% |
| Flavour | Flavour = -0.1465 * SFTM + 1.9938 * pH + 0.4052 * A520 + 0.9344 * CA + 0.3105 | 0.88% |
| Foam | Foam = -0.0784 * SFTM + 2.4955 * pH + 0.829 * A520 + 2.2724 * CA - 0.0011 * Ant(mg/L) - 3.2165 | 0.76% |
| Savour | Savour = -0.1126 * SFTM + 2.0375 * pH - 1.7608 * A420 + 1.3292 * A520 + 0.106 | 0.79% |

(*) Correctly Classified Instances

Table 6 - Linear Regression results
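As a worked example of using one of the regression functions for prediction, the Python sketch below evaluates the Microsoft Linear Regression model for Savour from Table 6 and maps the result to the two classes with the 5.4 threshold of Table 1. The function names are illustrative, and combining the regression output with the class threshold is an assumption made here for illustration rather than a step reported in the paper.

```python
def savour_microsoft_lr(sftm):
    """Savour = 5.133 - 0.123 * (SFTM - 22.143), the Model 5 function from Table 6."""
    return 5.133 - 0.123 * (sftm - 22.143)


def savour_class(score, threshold=5.4):
    """Class 'A' (medium) up to the Table 1 threshold, class 'B' (good) above it."""
    return "A" if score <= threshold else "B"


for months in (6, 36):
    score = savour_microsoft_lr(months)
    print(months, round(score, 3), savour_class(score))
```

Because the SFTM coefficient is negative, the predicted savour score decreases as the sample ages: a 6-month sample scores about 7.12 (class B, good) and a 36-month sample about 3.43 (class A, medium).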
