A Multi-Stage Ensemble Data Mining Model to Predict ...

A Multi-Stage Ensemble Data Mining Model to Predict Ferritin Serum Levels

Mohammad A. Abedini, Iran University of Science & Technology, Department of Industrial Engineering, Tehran, Iran
Kamran Heidari, Department of Emergency Medicine, Loghman Hospital, Shahid Beheshti University of Medical Science, Tehran, Iran
Mohammadsadegh Mobin, Afshan Roshani, Aliea Afnan, Western New England University, Department of Industrial Engineering and Engineering Management, MA, USA

Abstract

Motivation: The ferritin serum level is one of the key factors in diagnosing diseases related to Iron Deficiency Anemia (IDA), which is one of the most common types of anemia. However, the ferritin serum level is often not measured, especially in the early stages of diagnosis, and assessing it is not always feasible in clinical laboratories.

Dataset Description

The dataset was obtained at TALEGHNI Hospital, Tehran, Iran. About 300 people were selected from the hospital patient list; after initial assessments by the Department of Clinical Diagnostics, a dataset of size 164 was selected.

The Proposed Model Framework

● Correlation-based feature selection
● Search method: Best first
→ Findings: Input Features for the Regression Model

Dataset Partitioning

Dataset: 164

Objectives:

Train Set: 114

In this research, we developed a multi-stage ensemble data mining model which predicts the ferritin serum level in a more efficient way.

Model Input & Output

Inputs (Features)

● CBC Test Result:

Summary of Proposed Model: The proposed model works as a Decision Support System (DSS) that takes Complete Blood Count (CBC) test results, together with demographic information about the patients, as inputs to predict ferritin serum levels. The model consists of three stages:
1. Select important features using correlation-based feature selection;
2. Train a decision tree as the base learner under four different ensemble regression approaches: Bagging, Additive regression, Rotation forest, and Random subspace;
3. Evaluate and compare these approaches using the correlation coefficient and root mean squared error criteria.

1. Red Blood Cells (RBC)
2. Hemoglobin (HG)
3. Hematocrit (HCT)
4. Mean Corpuscular Volume (MCV)
5. Mean Corpuscular Hemoglobin (MCH)
6. Mean Corpuscular Hemoglobin Concentration (MCHC)

● Demographic Characteristics:
7. Age
8. Sex

PROPOSED DATA MINING MODEL

Output: Ferritin Serum Level
Note: All features from the CBC test and the ferritin level are numeric.

Data Mining Software

Model Verification

2. Ensemble Learning

● Comparing ensemble models on Test Set

● Base Regression: REPTree
● Ensemble methods:
1. Bagging
2. Additive regression
3. Rotation forest
4. Random subspace
→ Findings: Four Different Trained Ensemble Regression Models
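To illustrate the bagging idea named above, here is a minimal sketch in plain Python. It is not the poster's Weka configuration: the one-split regression stump is a simplified stand-in for REPTree, and the function names (`fit_stump`, `fit_bagging`) and the defaults (`n_models=10`, `seed=0`) are assumptions made for illustration only.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a one-split regression tree (a simplified stand-in for REPTree)."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split exists: predict the overall mean
        overall = mean(y)
        return lambda row: overall
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def fit_bagging(X, y, n_models=10, seed=0):
    """Train base learners on bootstrap resamples and average their outputs."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: mean(m(row) for m in models)


# Toy data (hypothetical, not from the study).
X = [[1], [2], [3], [10], [11], [12]]
y = [1, 1, 1, 5, 5, 5]
predict = fit_bagging(X, y, n_models=25, seed=1)
print(predict([2]), predict([11]))
```

Averaging over bootstrap resamples reduces the variance of an unstable base learner such as a decision tree, which is the usual motivation for bagging.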

● Comparing the ensemble model with other powerful data mining tools, such as:

3. Model Selection

∘ Multilayer Perceptron Neural Network,
∘ RBF Neural Networks,
∘ Support Vector Machines (SVM),
∘ Linear Regression (LR).

● Evaluation criteria:
1. Correlation Coefficient
2. Root Mean Squared (RMS) Error
→ Findings: Final Model
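The two evaluation criteria can be computed as in the sketch below. It assumes the correlation coefficient is the Pearson correlation between observed and predicted values, which the poster does not state explicitly; the function names and the sample values are illustrative.

```python
import math


def correlation_coefficient(actual, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def rms_error(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical values for illustration only (not study data).
observed = [1.0, 2.0, 3.0]
estimated = [1.0, 2.0, 5.0]
print(correlation_coefficient(observed, estimated))
print(rms_error(observed, estimated))
```

A higher correlation coefficient and a lower RMS error both indicate a better fit, so a model that wins on both is preferred.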

Feature Selection Result

Using Correlation Based Feature Selection (CBFS) as the feature selection method and the best-first algorithm as the search option, four features were selected in the first stage:
1. HG
2. MCH
3. Age
4. Sex
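As background on how correlation-based feature selection scores a candidate subset, the sketch below computes the standard merit heuristic, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-to-target correlation and r_ff is the mean feature-to-feature correlation. The formula comes from the general CFS literature rather than from the poster, and using absolute Pearson correlation as the association measure is an assumption here; `pearson` and `cfs_merit` are illustrative names.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cfs_merit(feature_columns, target):
    """CFS merit of a feature subset, given as a list of columns."""
    k = len(feature_columns)
    # Mean absolute feature-to-target correlation.
    r_cf = sum(abs(pearson(col, target)) for col in feature_columns) / k
    # Mean absolute feature-to-feature correlation (0 for a single feature).
    pairs = list(combinations(feature_columns, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy columns (hypothetical): one relevant feature, one irrelevant feature.
target = [1.0, 2.0, 3.0, 4.0]
relevant = [1.0, 2.0, 3.0, 4.0]
irrelevant = [1.0, -1.0, -1.0, 1.0]
print(cfs_merit([relevant], target))
print(cfs_merit([relevant, irrelevant], target))
```

The merit rewards subsets whose features correlate with the target and penalizes redundancy among features; a search strategy such as best first then explores subsets to maximize it.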

Conclusions

Ensemble Learning Result

Summary of the Results: The results show that the bagging approach outperforms the others on both criteria. In this case study, the proposed model proved to be an efficient DSS for IDA diagnosis.

Considering RMS error and the correlation coefficient together as evaluation criteria, the bagging ensemble approach gave the largest performance improvement over the base regression method (REPTree). Additive Regression, in contrast, made no improvement on REPTree's performance.

1. Feature Selection

Test Set: 50
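The partition sizes reported on the poster (164 records split into 114 for training and 50 for testing) can be reproduced with a sketch like the following. The poster does not say how records were assigned to each set, so the seeded random shuffle, the function name `split_dataset`, and the seed value are assumptions.

```python
import random


def split_dataset(records, n_train=114, seed=42):
    """Shuffle the records and split them into train and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Record indices stand in for the 164 patient records.
train, test = split_dataset(range(164))
print(len(train), len(test))  # 114 50
```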

Regression

Weka is a collection of Machine Learning Algorithms for data mining tasks. It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

Model Selection Results

Criterion                  REPTree   Bagging   Additive reg.   Rotation forest   Random subspace
Correlation Coefficient    0.669     0.845     0.669           0.781             0.721
RMS Error                  1.040     0.936     1.040           0.986             0.990
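The comparison in the table can be checked mechanically. The sketch below copies the reported values and picks the method with the highest correlation coefficient and the one with the lowest RMS error; the variable names are illustrative.

```python
# (correlation coefficient, RMS error) as reported in the results table.
results = {
    "REPTree": (0.669, 1.040),
    "Bagging": (0.845, 0.936),
    "Additive regression": (0.669, 1.040),
    "Rotation forest": (0.781, 0.986),
    "Random subspace": (0.721, 0.990),
}

best_by_correlation = max(results, key=lambda name: results[name][0])
best_by_rms_error = min(results, key=lambda name: results[name][1])
print(best_by_correlation, best_by_rms_error)  # Bagging Bagging

# Methods that match the base learner exactly, i.e. bring no improvement.
no_gain = [name for name, scores in results.items()
           if name != "REPTree" and scores == results["REPTree"]]
print(no_gain)  # ['Additive regression']
```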

● The bagging approach improved the performance of REPTree, used as a weak learner, more than the other ensemble approaches did.
● The suggested model outperformed other powerful data mining tools.
● Besides diagnostic procedures, the model could be used as a DSS to help in ferritin serum level estimation and IDA diagnosis.
