International Journal of Computer Science and Information Security (IJCSIS), Vol. 17, No. 1, January 2019
A Proposed Model for Predicting Employees’ Performance Using Data Mining Techniques: Egyptian Case Study
Mona Nasr
Essam Shaaban
Ahmed Samir
Faculty of Computers & Inf., Helwan University, Egypt
[email protected]
Faculty of Computers & Inf., Beni-Suef University, Egypt
[email protected]
Faculty of Computers & Inf., Helwan University, Egypt
[email protected]
Abstract—Human Resources Management (HRM) has become one of the essential interests of managers and decision makers in almost all types of businesses, who adopt plans for correctly discovering highly qualified employees. Accordingly, management has become interested in the performance of these employees, especially in ensuring that the appropriate person is allocated to the suitable job at the right time. Hence, interest has been growing in the role of data mining (DM), whose objective is the discovery of knowledge from huge amounts of data. In this paper, DM techniques were used to build a classification model for predicting employees’ performance using a real dataset collected from the Ministry of Egyptian Civil Aviation (MOCA) through a questionnaire prepared and distributed to 145 employees. Three main DM techniques were used to build the classification model and to identify the factors that most positively affect performance: Decision Tree (DT), Naïve Bayes, and Support Vector Machine (SVM). To obtain a highly accurate model, several experiments were carried out with these techniques as implemented in the WEKA tool, enabling decision makers and human resources professionals to predict and enhance the performance of their employees.
Index Terms—Classification, C4.5 (J48), Data Mining, Employees’ Performance, HRM, MOCA, Naïve Bayes, SVM

I. INTRODUCTION
HRM has a leading role in determining an organization’s competitiveness and effectiveness, and hence its continuity. Organizations consider HRM as “people practices”. It is therefore the responsibility of HRM to allocate the best employees to the appropriate jobs at the right time, train and qualify them, build evaluation systems to monitor their performance, and try to retain potential talent [1].

With the advancement and growth of technology in business organizations, HR employees no longer need to handle massive amounts of data manually. These data are very important to decision makers, but it is a challenge to mine them and extract the most useful information [1]. This is where DM comes in. DM is a step in Knowledge Discovery in Databases (KDD) and is currently receiving a great deal of attention and use. It is regarded as a recently emerging analysis and prediction tool [2], owing to the existence of massive amounts of data containing a great deal of hidden, unknown knowledge. Knowledge can be extracted through various methods, one of which is DM. DM techniques support different tasks, such as classification, association, and clustering, that extract hidden knowledge from large amounts of data. Classification is a predictive DM technique: it predicts values of data using known results found in other data. It is a supervised learning technique in DM and machine learning, in which the class label (the target class) is known in advance. Building classification models from an input dataset is one of the most useful tasks in DM. Classification techniques commonly build models that are then used to predict future data trends [3]. Predictive models built by classification have the specific aim of predicting the unknown values of variables of interest from the known values of other variables [4].

In this connection, the main objectives of the present study, which aim to support decision makers in different locations in discovering potential talent among employees, are as follows:
o Gathering a dataset of predictive variables.
o Identifying the different factors that affect employees’ behavior and performance.
o Using the proposed DM classification techniques to construct a predictive model and to identify relationships between the most important factors affecting the overall efficiency of the model.

There are various data classification techniques, such as DT, SVM, the Naïve Bayes classifier, and others. In this paper, classification is carried out using these three main techniques. Other techniques, such as Neural Networks (NN) and K-Nearest Neighbors (KNN), can also be used for classification.
https://sites.google.com/site/ijcsis/ ISSN 1947-5500
comprehensive study of employee performance prediction models, and of the criteria these models measure, is presented based on the following literature:
The C4.5 (J48) technique belongs to the DT family. It can generate both a decision tree and its rule set, and it builds the tree so as to improve prediction accuracy. In addition, the models generated by C4.5 (J48) are easy to understand, because the extracted rules have a very explicit, uncomplicated interpretation, and the technique has the advantage of not needing any domain knowledge or parameter setting. The researcher can thus easily detect the variables that most affect the predicted target. J48 is the WEKA toolkit’s own implementation of C4.5 rev. 8, and it is the version used in this study [5].
Kirimi JM, Motur CA (2016) concentrated on collecting employees’ data at a public management development institute in Kenya through a user interface, generating a decision tree from the employees’ historical data, and identifying the relationship between DT accuracy and employees’ attributes. They also considered the possibility of constructing two or more prediction techniques for predicting employees’ performance and choosing the most suitable one for the organization [10].
The Naïve Bayes classifier, based on Bayes’ theorem, is another classification technique used for predicting a target class. It relies on probabilities in its calculations and, in addition, provides a useful perspective for understanding various learning algorithms that do not explicitly use probabilities [6]. The results of this classifier are reported to be accurate, effective, and sensitive to data recently added to the dataset [7].
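As a rough illustration of how such a classifier turns counts into a prediction, the sketch below trains a categorical Naïve Bayes model on a handful of made-up records. The toy records, the two attributes chosen, and the Laplace smoothing constant `alpha` are illustrative assumptions rather than data or settings from the study, and WEKA’s own Naïve Bayes implementation differs in detail.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    values_seen = defaultdict(set)        # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row, alpha=1.0):
    """Return the class with the highest log-posterior (Laplace-smoothed)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            denom = n_label + alpha * len(values_seen[i])
            score += math.log((count + alpha) / denom)  # log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: (ProfTrain, SatJob) -> Performance
rows = [("Yes", "Yes"), ("Yes", "No"), ("No", "No"),
        ("No", "Yes"), ("Yes", "Yes"), ("No", "No")]
labels = ["Excellent", "Very Good", "Good", "Good", "Excellent", "Good"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("Yes", "Yes")))
print(predict(model, ("No", "No")))
```

The "naïve" part is the assumption that attributes are independent given the class, which is why the per-attribute log-likelihoods are simply summed.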
Desouki M. S., Al-Daher J (2015) presented a study applying DM techniques such as DT, K-Nearest Neighbors (KNN), and SVM to the HRM field by analyzing the Performance Appraisal (PA) results provided by a multi-discipline academic research organization, in order to enhance the appraisal method and assess how well its practical implementation matches the objectives of the PA process. To achieve that, various DM tasks were used, such as clustering, classification, and prediction. The study concluded that DM tasks can be promising and important in human resource activities such as improving performance evaluation methods [11].
SVM is considered one of the most effective supervised machine learning techniques, with a straightforward structure and a high capacity for classification. Moreover, SVM is recognized as an appropriate technique in machine learning and DM for classification with both linear and non-linear decision boundaries, where it can produce highly accurate models [8]. SVM has several advantages: it places no upper limit on the number of attributes, and it relies on the kernel trick, which lets expert knowledge of the problem be built into the model by adjusting the kernel [9]. Sequential Minimal Optimization (SMO) is an SVM training algorithm, recognized as an efficient way of solving the underlying optimization problem, and can be considered a state-of-the-art approach for non-linear SVMs [10]. In this study, the SVM is trained on the dataset using the SMO algorithm to build the prediction model.
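For reference, the optimization problem that SMO addresses can be written in the standard soft-margin dual form. The notation here is the usual textbook one and is not taken from the paper: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, multipliers $\alpha_i$, penalty parameter $C$, and kernel $K$. A three-valued target such as the performance class would additionally need a multi-class scheme, for example pairwise classifiers.

```latex
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i,x_j)
\quad\text{subject to}\quad
0\le\alpha_i\le C,\qquad \sum_{i=1}^{n}\alpha_i y_i = 0

f(x)=\operatorname{sign}\Big(\sum_{i=1}^{n}\alpha_i y_i\,K(x_i,x)+b\Big)
```

SMO solves this problem by repeatedly picking two multipliers and optimizing them analytically while the others are held fixed, which keeps the equality constraint satisfied at every step.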
V. Kalaivani, M. Elamparithi (2014) applied DT techniques to predict employees’ performance, which was the objective of their research. DT is one of the most popular classification techniques; it creates both a tree and a rule set, building the model from a given dataset. There are various DT algorithms, such as ID3, C4.5, CART, Bagging, Random Forest, Rotation Forest, and CHAID. In their study, the C4.5, Bagging, and Rotation Forest algorithms, as implemented in the WEKA toolkit, were used. Experiments were performed on data collected from an institution [12].
This paper is organized in six sections. The first section is the introduction. The second section describes related work on HRM, DM in HR, and the classification techniques used for classification and prediction. The third section discusses the methodology adopted for constructing the proposed model, while section 4 presents the experiments carried out to generate the model. Section 5 presents results and discussion. Finally, section 6 ends with concluding remarks and future research directions.
H. Jantan, Norazmah Mat Y., and Mohamad Rozuan N. (2014) applied the SVM technique to the classification of employee achievement. The study aimed to investigate the effectiveness of SVM in detecting the data patterns needed to classify employee achievement. The accuracy of the SVM model was considered satisfactory but needed some enhancement to become higher [13].
Lipsa Sadath (2013) discussed the possibility of making decisions in an automated and intelligent manner using DM techniques and a rich employee database. The study concluded that the C4.5 technique had the highest accuracy. Its objective was to predict employees’ performance and apply the best Knowledge Management (KM) strategies, thereby achieving a stable HR system and a strong business [14].
II. LITERATURE REVIEW
Many studies have used DM classification techniques for generating rules and predicting certain attitudes in various fields of science [5]. Evaluating and predicting the efficiency of employees’ performance is therefore considered a critical issue, and the variables and criteria related to the efficiency of employee performance prediction models have been reviewed. In this section, a
Qasem et al. (2012) used DM techniques for building a classification model in order to predict the performance
employees’ performance. To achieve this objective, a generic guide is needed for developing the DM project lifecycle. It consists of the following steps: Problem Definition and Objective Structuring, Data Collection and Understanding, Data Preparation and Preprocessing, Modeling and Experiments, and Testing and Evaluation.
of new employees. Different DT techniques, such as the ID3 and C4.5 (J48) algorithms, were used for building the model, and several sets of classification rules were produced. The Naïve Bayes technique was used as another classifier. Three experiments were conducted on real data collected from several organizations to detect the factors that most affect employees’ performance. The results showed that job title was the factor that affected employees’ performance [15].
In general, classification proceeds in steps. The first step is the learning step, in which the model is built by analyzing a training dataset whose records belong to predefined classes; each record is assumed to belong to one predefined class. The second step estimates the accuracy of the model or classifier (validating the model) by testing it on a different dataset. If the classifier’s accuracy is considered acceptable, then in the third step the model can be applied to new, unseen data to predict the unknown class label, as shown in figure 1. The model therefore acts as a classifier in the decision-making process. Various classification techniques have been used for prediction, such as DT, Naïve Bayes, and SVM.
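The three steps can be sketched with a deliberately simple learner. The example below uses a one-attribute rule (each attribute value is mapped to its majority class) only to keep the code short; the records, the attribute name `ProfTrain`, and the split into training and test sets are hypothetical and do not come from the MOCA dataset.

```python
from collections import Counter, defaultdict

def learn_one_rule(rows, labels, attr):
    """Step 1 (learning): map each value of one attribute to its majority class."""
    by_value = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_value[row[attr]][label] += 1
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    return rule, default

def classify(rule, default, row, attr):
    """Apply the learned rule to a single record."""
    return rule.get(row[attr], default)

def accuracy(rule, default, rows, labels, attr):
    """Step 2 (validation): share of held-out records classified correctly."""
    hits = sum(classify(rule, default, r, attr) == y for r, y in zip(rows, labels))
    return hits / len(rows)

# Hypothetical training and test records
train_rows = [{"ProfTrain": v} for v in ["Yes", "Yes", "No", "No", "Yes"]]
train_labels = ["Excellent", "Excellent", "Good", "Good", "Very Good"]
test_rows = [{"ProfTrain": v} for v in ["Yes", "No", "No", "Yes"]]
test_labels = ["Excellent", "Good", "Very Good", "Excellent"]

rule, default = learn_one_rule(train_rows, train_labels, "ProfTrain")
test_accuracy = accuracy(rule, default, test_rows, test_labels, "ProfTrain")

# Step 3 (prediction): label a new, unseen record if the accuracy is acceptable
prediction = classify(rule, default, {"ProfTrain": "No"}, "ProfTrain")
print(rule, test_accuracy, prediction)
```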
Hamidah Ja., Abdul Razak Ha., and Zulaiha (2010a) presented an important study on the problems that talent management may face and that can be solved using various DM techniques. In this study, they addressed one talent management task, identifying potential talent, by predicting performance from past experience knowledge and proposing a suitable DM technique for the task [16].
Hamidah Ja., Abdul Razak Ha., and Zulaiha (2010b) used DT techniques to study how potential talent can be predicted. In this study, the C4.5 (J48) classification algorithm was the main technique used to produce the classification rule set for human talent performance records. The generated rules were then evaluated on new, unseen data to assess the accuracy of the prediction results [5].
Hamidah Ja., Abdul Razak Ha., and Zulaiha (2009) also discussed potential classification techniques for talent forecasting. In this study, they used various classification techniques such as DT, NN, and KNN, and focused on accuracy to find the technique most suitable for HR data. The results showed that DT had the highest accuracy and was therefore the most promising technique for talent forecasting in HRM. The dataset was collected from the academic staff of a higher education institution [17].
Figure 1. The Classification Process in DM
In general, this paper is an initial attempt to investigate DM tasks, especially classification, for supporting decision makers and HR professionals by identifying and studying the main employee factors that may positively affect performance. The paper applies some classification techniques to build a proposed model that supports the prediction of employees’ performance. The next sections present a comprehensive description of the study, specifying the methodology, the experiments and results, a discussion of the results, and finally conclusions and recommendations for future work.
A. Problem Definition and Objective Structuring
The first step in data mining is to understand and define the right problem and specify the objectives. Data miners should also equip themselves with domain knowledge to understand the nature of the problem, which greatly improves DM effectiveness and efficiency. Human resource management activities are very complicated, and thus few quantitative approaches have been employed in practice [2]. HRM at MOCA, and in most other public-sector bodies, uses traditional assessment techniques that do not give an accurate assessment of employees’ performance; as a result, performance cannot be predicted and talent cannot be discovered.
III. CONSTRUCTING THE CLASSIFICATION MODEL
The proposed methodology was adopted to build a classification model that studies certain factors that may affect and predict the
This research concentrates on presenting a proposed model that supports HRM and decision makers
to predict the performance of MOCA’s employees and to identify the employee factors that affect, and are associated with, good or bad performance. It also aims to detect the most suitable DM technique, that is, the one with the highest accuracy among the classification techniques used.
important advantage: it is available for free and has a simple GUI, so it can be used smoothly. The tools in the WEKA workbench are based on statistical evaluation of the models (algorithms). Consequently, the WEKA user can easily and flexibly compare the results and accuracies of the machine learning and DM algorithms applied to a given dataset, in order to find the most suitable algorithm for that dataset [18].
B. Data Collection and Understanding Process
The idea of this study is to build a classification model for predicting employees’ performance from a real dataset, so as to obtain real and significant results that support HR executives and decision makers. Collecting the required data called for a practical method. A questionnaire was therefore prepared and manually distributed to the employees of MOCA, covering several attributes that may affect and predict the performance class (the target class). The attributes requested for the training dataset were selected from factors related to employee performance, namely educational, personal, and professional factors (such as job title, age, rank, qualifications, and grade), as illustrated in table 1. These attributes are used to predict the employee’s performance (the target class) as Excellent, Very Good, or Good. The questionnaire was filled in by 145 employees from all sectors of MOCA, with various job titles, ages, and ranks, to obtain a representative sample.
Feature Selection
Feature selection is one of the main concepts of DM and machine learning. It is the process of selecting the useful variables in a dataset in order to improve the results of machine learning and make them more accurate. Using too many variables in a dataset can reduce predictive performance: the dataset may contain many features, some of which do not improve prediction accuracy and thus make the predictive model excessively complicated. Unnecessary variables should therefore be removed so that the model works efficiently. Deciding which variables to remove can be done manually, using domain knowledge, or automatically [19]. This paper aims to find the most important variables that may positively affect the accuracy of the employees’ performance prediction model, using feature selection algorithms supported in WEKA such as CorrelationAttributeEval, GainRatioAttributeEval, and ReliefFAttributeEval.
C. Data Preparation and Pre-processing
After the questionnaires were collected, the data were prepared. The raw data contained instances that were not applicable because of errors and anomalies, and these had to be discarded. The data were transferred to Excel sheets to review and modify the attribute types: some attributes needed to be changed from a numeric type to a categorical type, i.e. values expressed as ranges, for example the number of experience years and the service period (X3, X4) in table 1. Other attributes needed to be generalized to fewer discrete values than they originally had. For example, the faculty specialization attribute (X15) in table 1 contained values such as IT, CS, and MIS, which were all treated as a single value, IS, and so on. Data generalization is thus also one of the data reduction techniques. After the Excel sheet was prepared and processed, the file was converted to the ARFF format, which is compatible with the WEKA DM toolkit used to build the model.
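A minimal sketch of these preparation steps is shown below: numeric years are binned into ranges, specialization values are generalized, and the result is written as ARFF text. The five-year bin width, the attribute names without symbols, the relation name `employees`, and the three sample records are assumptions made for illustration; the paper does not state the actual ranges used.

```python
def to_range(years, width=5):
    """Discretize a number of years into a range label, e.g. 7 -> '5-9'."""
    low = (years // width) * width
    return f"{low}-{low + width - 1}"

# Generalization map from the text: IT, CS and MIS are treated as one value, IS.
GENERALIZE = {"IT": "IS", "CS": "IS", "MIS": "IS"}

def prepare(record):
    """Turn one raw questionnaire record into categorical values."""
    return {
        "ExpYears": to_range(record["ExpYears"]),
        "GenSpecial": GENERALIZE.get(record["GenSpecial"], record["GenSpecial"]),
        "Performance": record["Performance"],
    }

def to_arff(relation, rows):
    """Serialize rows of nominal values as ARFF text (all values single-quoted)."""
    names = list(rows[0])
    lines = [f"@relation {relation}", ""]
    for name in names:
        values = sorted({row[name] for row in rows})
        quoted = ",".join(f"'{v}'" for v in values)
        lines.append(f"@attribute {name} {{{quoted}}}")
    lines += ["", "@data"]
    lines += [",".join(f"'{row[name]}'" for name in names) for row in rows]
    return "\n".join(lines)

# Hypothetical raw records
raw = [
    {"ExpYears": 7, "GenSpecial": "CS", "Performance": "Excellent"},
    {"ExpYears": 12, "GenSpecial": "MIS", "Performance": "Very Good"},
    {"ExpYears": 3, "GenSpecial": "Law", "Performance": "Good"},
]
prepared = [prepare(r) for r in raw]
arff = to_arff("employees", prepared)
print(arff)
```

Quoting every nominal value keeps labels that contain spaces, such as Very Good, valid in the ARFF file.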
IV. MODELING AND EXPERIMENTS
The classification stage comes after the data have been prepared and preprocessed. Three classification techniques were used: SVM, DT, and the Naïve Bayes classifier. They were applied to the dataset to build the employees’ performance prediction model, to find the most suitable DM technique, and to find the variables listed in table 1 that most affect and predict employees’ performance. These variables consist of (A) professional information: job title, rank, number of experience years, number of service years at MOCA, number of companies previously worked for, salary, whether working conditions are comfortable, whether the employee is satisfied with the salary and with the job, and whether job-related training was received; (B) personal information: age, gender, and marital status; and (C) educational information: grade, degree, general specialization, and university type. All these variables are used to predict the target class (the performance of MOCA’s employees) as Excellent, Very Good, or Good.
The WEKA (Waikato Environment for Knowledge Analysis) toolkit is a machine learning platform developed by researchers at the University of Waikato and implemented in Java. It provides a unified package in a single application, giving users access to up-to-date DM and machine learning techniques. It covers several tasks, such as pre-processing, classification, clustering, association, and visualization. The WEKA tool has an
TABLE 1. THE USED ATTRIBUTES FOR PREDICTING THE TARGET CLASS (PERFORMANCE)

| Variable Symbol | Variable | Description |
| X1 | JobTitle | Employee’s Job Title |
| X2 | Rank | Employee’s Rank or Level |
| X3 | #ExpYears | No. of Working Experience Years |
| X4 | ServicePeriod | Service Period at MOCA (in Years) |
| X5 | #PrevCo. | No. of Previous Companies the employee worked for |
| X6 | SalRange | Range of Employee’s Salary |
| X7 | ComfWorkCond. | Working in comfortable conditions (in employee’s perspective). Answer with (Yes - No) |
| X8 | SatSalary | Existing satisfaction with the salary (in employee’s perspective). Answer with (Yes - No) |
| X9 | ProfTrain. | Existing trainings for the job (in employee’s perspective). Answer with (Yes - No) |
| X10 | SatJob | Existing satisfaction with the job (in employee’s perspective). Answer with (Yes - No) |
| X11 | Age | Employee’s Age |
| X12 | Gender | Employee’s Gender |
| X13 | MarStatus | Employee’s Marital Status |
| X14 | EduDegree | Employee’s Education Degree |
| X15 | GenSpecial. | General Specialization |
| X16 | UniType | Type of the University |
| X17 | Grade | Employee’s Graduation Grade |
| | Performance | Employee’s Performance either as informed or predicted. This is the target class |

A. First Experiment (E1): Using all variables of the dataset that may affect the performance (17 variables)
In the first experiment (E1), all variables of the dataset were considered and tested to measure the prediction accuracy of the three classification techniques. Table 2 shows the accuracy of performance prediction for each technique.

TABLE 2. ACCURACY PERCENTAGES FOR PREDICTION ALGORITHMS IN E1

| No. | Technique | Prediction Accuracy |
| 1 | C4.5 (J48) | 77.93 % |
| 2 | Naïve Bayes | 71.03 % |
| 3 | SVM | 81.38 % |

According to table 2, the results of E1 indicate that the SVM technique had the highest accuracy when all variables of the dataset were used, at 81.38%. The E1 results also indicate that all these variables have some effect on employees’ performance. The X9 variable had the greatest effect. Other variables that appeared in the decision tree generated by C4.5 (J48) were X3, X2, X10, X14, and others, which positively affected the performance.

The ProfTrain. (X9) variable was the most effective factor in employees’ performance. The results showed that employees who took training and courses related to their jobs performed better than those who did not.

B. Second Experiment (E2): Using the important variables resulting from the use of feature selection algorithms (10 variables)
In the next experiment (E2), feature selection algorithms were used to obtain, for each algorithm, the best feature subset of the whole dataset. The algorithms were CorrelationAttributeEval, GainRatioAttributeEval, and ReliefFAttributeEval, all of which are supported by the WEKA tool.

Table 3 shows, for each of these feature selection algorithms, the feature subset containing the 10 most important variables that positively affect employees’ performance, together with the prediction accuracy of each classification technique applied to that subset.

According to table 3, the E2 results indicate that SVM had the highest accuracy of the three classification techniques under all three feature selection algorithms, with accuracies of 84.14%, 82.76%, and 82.07% in descending order. The E2 results also show that using fewer variables as predictors of the target class, as in E2, produced higher accuracy than using all variables of the dataset, as in E1: the accuracies of all three classification techniques in E2, under every feature selection algorithm, were better than the corresponding ones in E1.

TABLE 3. ACCURACY PERCENTAGES FOR PREDICTION ALGORITHMS IN E2 BASED ON USING FEATURE SELECTION ALGORITHMS
| Feature Selection Algorithm | Produced Feature Subset | C4.5 (J48) | Naïve Bayes | SVM |
| CorrelationAttributeEval | [X2, X6, X9, X11, X10, X3, X4, X14, X12, X7] | 79.31 % | 73.10 % | 84.14 % |
| GainRatioAttributeEval | [X9, X2, X6, X3, X10, X11, X4, X1, X14, X7] | 79.31 % | 72.41 % | 82.07 % |
| ReliefFAttributeEval | [X3, X2, X6, X11, X4, X9, X14, X1, X10, X15] | 79.31 % | 73.79 % | 82.76 % |

who had fewer years of experience. Table 5 below illustrates this finding.

C. Third Experiment (E3): Using the most effective variables resulting from the tree generated using the Decision Tree technique (5 variables)
In this experiment (E3), the DT technique, using its C4.5 algorithm, was applied to generate a tree that shows the factors with the greatest effect on employees’ performance and ranks them by effect. The generated tree showed that the five variables that most affected performance were X9, X3, X2, X14, and X10, as illustrated in figure 2. This experiment makes it possible to determine whether reducing the number of variables affects the accuracy of the classifier.
C4.5 (J48) had the same accuracy with all three feature selection algorithms, even though they produced different feature subsets. Nevertheless, its accuracy in E2 is better than in E1. The Naïve Bayes classifier achieved its best prediction accuracy, 73.79%, with the ReliefFAttributeEval feature selection algorithm; its accuracy with each of the three feature selection algorithms in E2 was better than in E1. The SVM algorithm achieved its best prediction accuracy, 84.14%, with the CorrelationAttributeEval feature selection algorithm; its accuracy with each of the three feature selection algorithms in E2 was likewise better than in E1.
TABLE 4. ACCURACY PERCENTAGES FOR PREDICTION ALGORITHMS IN E3 BASED ON THE FIVE EFFECTIVE VARIABLES

| No. | Technique | Prediction Accuracy |
| 1 | C4.5 (J48) | 79.31 % |
| 2 | Naïve Bayes | 82.07 % |
| 3 | SVM | 86.90 % |
According to table 4, the E3 results indicate that the SVM technique had the highest prediction accuracy, 86.90%, when the five most effective factors were used. Reviewing the results of E1, E2, and E3 together, the SVM technique had the highest prediction accuracy in every experiment. Moreover, its prediction accuracy increased each time the number of variables was reduced.
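To put these percentages in concrete terms, the short calculation below converts the best SVM accuracy reported for each experiment into a number of correctly classified records. It assumes that all 145 questionnaire records were evaluated; the paper mentions discarding some inapplicable instances during preparation, so the evaluated count is an assumption, although the reported percentages are consistent with it.

```python
N_RECORDS = 145  # respondents reported in the paper; assumed to be the evaluated set

# Best SVM accuracy reported for each experiment (tables 2, 3 and 4)
svm_accuracy = {"E1": 81.38, "E2": 84.14, "E3": 86.90}

def implied_correct(percent, n=N_RECORDS):
    """Whole number of correct predictions that best matches a reported percentage."""
    return round(percent * n / 100)

counts = {exp: implied_correct(pct) for exp, pct in svm_accuracy.items()}

for exp, k in counts.items():
    # The count should reproduce the reported percentage after rounding.
    assert round(100 * k / N_RECORDS, 2) == svm_accuracy[exp]
    print(f"{exp}: {k} of {N_RECORDS} records classified correctly")
```

Under this assumption, the improvement from E1 to E3 corresponds to eight more records classified correctly (118 versus 126).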
The 10 variables of each produced feature subset were assigned a weight from 0 to 1 and sorted in descending order. All of them greatly affected employees’ performance, but the most effective factor differed from one feature selection algorithm to another, depending on the weight assigned by the algorithm. With the CorrelationAttributeEval algorithm, the most effective variable was the rank (X2), which had the greatest weight: employees with higher rank performed better than those with lower rank, although in some cases better performance did not require a higher rank, as shown in table 5.
The results of E3 answered the question of whether reducing the number of variables affects the accuracy of the classifier: in these experiments, using fewer variables gave equal or higher classifier accuracy. It is therefore very important to determine the variables that have the greatest effect on performance in order to obtain the highest prediction accuracy.
The ProfTrain. (X9) variable had the most positive effect on employees’ performance when the GainRatioAttributeEval algorithm was used, because it had the maximum gain ratio. Employees who underwent professional training and took courses related to their jobs performed better than those who did not. This factor was common to E1 and E2.
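The gain ratio of an attribute is its information gain with respect to the class divided by the entropy of the attribute itself. The sketch below computes it for two made-up Yes/No attributes; the six records are hypothetical and chosen only so that one attribute separates the classes much better than the other, and the code is an illustration of the measure rather than a reproduction of WEKA’s evaluator.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of an attribute about the class, divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the attribute
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical records
prof_train = ["Yes", "Yes", "Yes", "No", "No", "No"]
sat_salary = ["Yes", "No", "Yes", "No", "Yes", "No"]
performance = ["Excellent", "Excellent", "Very Good", "Good", "Good", "Good"]

ranking = sorted(
    {"ProfTrain": gain_ratio(prof_train, performance),
     "SatSalary": gain_ratio(sat_salary, performance)}.items(),
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

Sorting attributes by this score and keeping the top entries gives a ranked feature subset in the same spirit as the ten-variable subsets used in E2.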
The generated tree indicated that all five variables had some effect on employees’ performance, but the ProfTrain. (X9) variable had the greatest positive effect and was the root node of the tree, as shown in figure 2: employees who underwent professional training performed better than those who did not, as noted previously. Reviewing the results of E1, E2, and E3 together shows that the X9 variable was common to all experiments. The other variables in the generated tree were Rank, #ExpYears, EduDegree, and SatJob.
As shown in table 3, the #ExpYears (X3) variable had the greatest effect on employees’ performance when the ReliefFAttributeEval feature selection algorithm was used. Employees with more years of job-related experience performed better than those
The Rank and #ExpYears (X2, X3) variables had a great effect on employees’ performance: the experimental results showed that seniority plays an important role in the performance of MOCA’s employees. Employees with a higher rank, such as First rank, and more years of experience performed better than newer employees with lower ranks, such as Third rank, and fewer years of experience. Nevertheless, there was an exception to this rule. Employees with many years of experience related to the job who were newly hired by MOCA and placed at a lower financial rank, such as Third rank, under the civil service law, performed better than employees who held a higher rank because of a long service period but had few years of experience.

The EduDegree (X14) variable positively affected employees’ performance: employees with higher academic degrees performed better than those with lower academic qualifications. The figures and the rules generated from the decision tree showed that most MOCA employees with higher qualifications, such as a PhD or Master’s degree, had excellent performance regardless of their financial rank.
The SatJob (X10) variable also positively affected the employees’ performance. The experiments showed that employees who were generally satisfied with their job performed better than those who were not. Even if an employee had some years of experience related to his job, his performance would not reach Excellent, as that experience would otherwise promise, if he was not satisfied.
Figure 2. The decision tree generated using the C4.5 algorithm for E3 to predict employees’ performance
TABLE 5. CLASSIFICATION RULES GENERATED BY C4.5 ALGORITHM IN E3 FOR PREDICTING EMPLOYEES’ PERFORMANCE
V. RESULTS AND DISCUSSION
In this research, the accuracy of the DM classification techniques was measured as the average accuracy over 10-fold cross-validation, as supported by the WEKA toolkit. The results of the three experiments above, given in Tables II, III, and IV, showed that all three techniques had similar, moderate accuracy of more than 70%. Moderate accuracy can be considered acceptable in many cases. In all three experiments, the dataset produced satisfactory models for each of the three selected classification techniques.
The goal of this research was to identify the most suitable classification technique for the dataset used. Accordingly, the accuracy of the model was used to determine the most appropriate technique, with each model evaluated using 10-fold cross-validation. As shown in Tables II, III, and IV for the three experiments above, the SVM technique had the highest accuracy among the selected techniques in all experiments. As a result, the SVM technique was the most suitable classifier for the dataset.
The research found that several variables greatly affected the performance of MOCA’s employees. One of the variables with the highest effect is ProfTrain. (X9). The importance of professional training is reflected in the state’s recent trend toward employee rehabilitation and human resource development, enrolling employees in professional training to increase their performance. Other professional variables such as #ExpYears (X3) positively affected the employees’ performance, as shown in the results of the three experiments E1, E2, and E3. Experience played an important role in the performance of MOCA’s employees, provided that employees were continuously rehabilitated and supported with professional training and courses to enhance their performance. The professional variable ServicePeriod (X4) had a slight positive effect on performance. This impact appeared in E2, while in the other experiments, E1 and E3, it was not significant. The performance of seniors was high compared to that of juniors. Nevertheless, this was not a permanent rule, as shown in the next paragraph. Finally, the correlation between this variable and the Rank (X2) variable observed in E2 is natural, because an employee’s financial rank is based on his service period: moving from a lower rank to a higher one requires a specific number of years of service. This is required by HR law.
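The evaluation protocol used in this section, averaging accuracy over the folds of a 10-fold cross-validation, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names, the toy labels, and the majority-class stand-in classifier are hypothetical and replace the DT, Naïve Bayes, and SVM learners that were run in WEKA. WEKA's own procedure typically also shuffles and stratifies the data, which is omitted here.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_class_trainer(train_labels):
    """Stand-in 'classifier' (assumption): always predicts the most frequent label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common


def cross_val_accuracy(features, labels, k=10):
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_labels = [labels[i] for i in range(len(labels)) if i not in held_out]
        predict = majority_class_trainer(train_labels)
        correct = sum(predict(features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy labels; NOT the MOCA questionnaire data.
toy_labels = ["Good", "Good", "Good", "Excellent"] * 5
toy_features = [None] * len(toy_labels)  # the stand-in classifier ignores features

mean_accuracy = cross_val_accuracy(toy_features, toy_labels, k=10)
print(f"Mean 10-fold accuracy: {mean_accuracy:.2%}")
```

With 145 records, as in the questionnaire, `k_fold_indices(145, 10)` yields five folds of 15 records and five folds of 14, so every record is used for testing exactly once.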
On the other hand, the results showed that MOCA had hired new employees holding Master’s and PhD degrees, in addition to the top graduates of the faculties. These employees had advanced studies and capabilities and did not need a high rank or many years of experience to perform their tasks. Therefore, the EduDegree (X14) variable had a great effect on the performance, as shown in Table 5. Some personal variables such as Age (X11) slightly affected the employees’ performance, but without an obvious impact: sometimes performance increases with age, which brings experience, but at other times it decreases because older employees have less motivation and ambition than younger ones.
One of the professional variables that had a positive effect on the performance was SalRange (X6). Its impact appeared in E2, where employees with high salaries performed better than those with low salaries; money sometimes plays an important role in an employee’s performance. On the other hand, it did not play an effective role in E1 and E3. This may seem surprising, but it is natural because this research concerned employees in the public sector, where salaries are largely fixed for each financial rank. Salaries are based on seniority, and the employees know that well.
Last but not least, the results of the experiments showed that the professional variables, followed by the educational ones, had a much greater impact on the performance of MOCA’s employees than the personal ones.
As a final analysis of the accuracy of the classification models built in the three experiments, the prediction accuracy was higher in E3 than in E2 and E1 for all techniques used except C4.5 (J48), which had the same accuracy in E3 and E2, both higher than in E1. This might indicate that the fewer variables used in the classification process, the higher the classifier accuracy. Therefore, it is very important to determine the variables that greatly affect performance in order to get the highest prediction accuracy.
VI. CONCLUSION AND FUTURE WORK
Applying DM techniques to the different problem domains of the HRM field is an important and urgent issue, especially in the public sector in Egypt. It also broadens the horizons of academic and practical research on DM in HR, toward a high-performing government sector. This paper has concentrated on building a predictive model for the performance of MOCA’s employees using classification techniques, by studying and testing the factors that might positively affect their performance. Some of these factors greatly affected the performance prediction: ProfTrain. (X9) was found to be the most effective factor, followed by #ExpYears (X3). The SVM technique was found to be the most suitable classifier for building the predictive model, as it had the greatest prediction accuracy in all three experiments, with a highest accuracy of 86.90%. The WEKA toolkit was used to execute the experiments.
For decision makers and the HRM department, this model, or an enhanced one, can be used to predict the performance of potential talents who are candidates for promotion and the performance of recent applicants, so that action can be taken to avoid the risk of hiring low-performing employees, and so on.
As future work, it is recommended to extend the dataset with a greater number of employees to obtain higher accuracy for the predictive model. Other classification techniques such as Neural Networks (NN), fuzzy logic, and many others should also be tested to validate these findings and help select a more robust model.
Finally, once a suitable predictive model is generated, an application based on the generated rules could be developed for decision makers and HR officials to predict the performance of employees.
The manuscript has not been published elsewhere.
REFERENCES
[1] L. Sadath (2013), “Data Mining: A Tool for Knowledge Management in Human Resource”, International Journal of Innovative Technology and Exploring Engineering, Vol. 2, Issue 6, April 2013.
[2] G. K. Gupta (2006), “Introduction to Data Mining with Case Studies”, ISBN 81-203-3053-6.
[3] Al-Radaideh, Q. A., Al-Shawakfa, E. M., and Al-Najjar, M. I. (2006), “Mining Student Data using Decision Trees”, International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan, 2006.
[4] Surjeet K. Y., Brijesh B., Saurabh P. (2011), “Data Mining Applications: A comparative Study for Predicting Student's performance”, International Journal of Innovative Technology and Creative Engineering, Vol. 1, No. 12 (2011), pp. 13-19.
[5] Jantan, H., Hamdan, A. R., and Othman, Z. A. (2010b), “Human talent prediction in HRM using c4.5 classification algorithm”, International Journal on Computer Science and Engineering, 2 (08-2010), pp. 2526–2534.
[6] Islam, M. J., Wu, Q. M. J., Ahmadi, M., and Sid-Ahmed, M. A. (2010), “Investigating the Performance of Naive-Bayes Classifiers and K-Nearest Neighbor Classifiers”, Journal of Convergence Information Technology, Volume 5, Number 2, April 2010.
[7] Al-Radaideh, Q. A., Al-Nagi, E. (2012), “Using Data Mining Techniques to Build a Classification Model for Predicting Employees Performance”, International Journal of Advanced Computer Science and Applications, 3(2), pp. 144–151.
[8] S. Yasodha and P. S. Prakash (2012), “Data Mining Classification Technique for Talent Management using SVM”, the International Conference on Computing, Electronics and Electrical Technologies, 2012.
[9] Hua Hu, Jing Ye, and Chunlai Chai (2009), “A Talent Classification Method Based on SVM”, in International Symposium on Intelligent Ubiquitous Computing and Education, Chengdu, China, 2009, pp. 160-163.
[10] Kirimi J. M., Motur C. A. (2016), “Application of Data Mining Classification in Employee Performance Prediction”, International Journal of Computer Applications, Volume 146, No. 7, July 2016.
[11] Desouki M. S., Al-Daher J. (2015), “Using Data Mining Tools to Improve the Performance Appraisal Procedure, HIAST Case”, International Journal of Advanced Information in Arts, Science & Management, Vol. 2, No. 1, February 2015.
[12] V. Kalaivani, M. Elamparithi (2014), “An Efficient Classification Algorithms for Employee Performance Prediction”, International Journal of Research in Advent Technology, Vol. 2, No. 9, September 2014, E-ISSN: 2321-9637.
[13] Hamidah Jantan, Norazmah Mat Yusoff, and Mohamad Rozuan Noh (2014), “Towards Applying Support Vector Machine Algorithm in Employee Achievement Classification”, Proceedings of the International Conference on Data Mining, Internet Computing, and Big Data, Kuala Lumpur, Malaysia, 2014, ISBN: 978-1-941968-02-4, ©2014 SDIWC.
[14] Lipsa Sadath (2013), “Data Mining: A Tool for Knowledge Management in Human Resource”, International Journal of Innovative Technology and Exploring Engineering (IJITEE), Vol. 2, April 2013.
[15] Qasem et al. (2012), “Using Data Mining Techniques to Build a Classification Model for Predicting Employees Performance”, in (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 3, No. 2, 2012.
[16] Jantan, H., Hamdan, A. R., and Othman, Z. A. (2010a), “Knowledge Discovery Techniques for Talent Forecasting in Human Resource Application”, International Journal of Humanities and Social Science, 5(11), pp. 694-702.
[17] Jantan, H., Hamdan, A. R., and Othman, Z. A. (2009), “Classification Techniques for Talent Forecasting in Human Resource Management”, in 5th International Conference on Advanced Data Mining and Application (ADMA), Beijing, China, 2009, pp. 496-503.
[18] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten (2009), “The WEKA data mining software: an update”, ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[19] Pedro Domingos (2012), “A few useful things to know about machine learning”, Communications of the ACM, 55(10), pp. 78-87.