Recent Advances in Computers, Communications, Applied Social Science and Mathematics
Rule-based Genetic Algorithm for In-Service Training Curriculum Plan
DİDEM ABİDİN (1), ŞEN ÇAKIR (2)
(1) Tire Kutsan Vocational School, Ege University, E.Ü. Tire Kutsan MYO 35900 Tire, İzmir, TURKEY
(2) Department of Computer Engineering, Dokuz Eylül University, Tınaztepe Campus Buca, İzmir, TURKEY
Abstract: In many companies, employees attend training programs to keep up with innovations in their profession. Managers need to save time while employees are being trained, because the business must continue; therefore, training plans must be prepared carefully. Employees may not be at the same level of knowledge, so the training must be planned with optimum content for them. Preparing the curriculum plan manually may cause problems such as conflicts, and may waste the time of employees who already know the subject taught in the training. This study introduces software that provides an applicable solution to the curriculum planning problem of corporations by using the genetic algorithm technique. The system aims to find the optimum training set for the employees. The study differs from similar ones in its methodology: the fitness function of the genetic algorithm is obtained by using the rule base mechanism of an expert system. The software can be used to optimize a company's in-service training timetable.
Keywords: Genetic Algorithms, Rule Base, Curriculum Planning, Optimization.
1 Introduction
Using time effectively has become increasingly important in order to keep up with technological improvements and higher living standards. Timetables and curriculum plans, which serve this purpose, are of great importance in determining the workflow of companies and in making effective use of personnel and technical hardware. Thus, preparing curriculum plans for in-service training in companies has become a popular research area. However, preparing an optimum curriculum plan manually can be a complex and time-consuming problem. As in other timetabling problems, a corporation must take the constraints of each component into consideration (instructors and classrooms for a school, flight traffic for an airport, nurse rostering or the operating room timetable for a hospital [1], etc.).
Since timetabling problems are generally difficult to manage with linear solutions, evolutionary algorithms and stochastic search techniques are used for such complex problems. Genetic algorithms (GAs) are said to be among the most appropriate search methodologies for optimization problems and can be applied in many different areas. GAs were first suggested by Holland [2] in the 1970s and developed further by Goldberg in 1989 [3].
Together with genetic algorithms, the Expert System (ES) approach can also be used in solving curriculum planning problems. ESs are intelligent computer programs that can take the place of a human expert in a certain subject, making decisions and finding solutions to a problem by using their own inference mechanisms and human expertise data [11]. Application areas of ESs include medical treatment, engineering failure analysis, decision support, knowledge representation, climate forecasting, decision making and learning, chemical process control [10] and education.
Educational institutions use ESs more frequently than other organizations because preparing curriculum plans and schedules manually is a complex process. An ES mechanism can be integrated with a GA by using some of the components of the ES within the GA. This integration results in a
ISBN: 978-1-61804-030-5
“hybrid” system that can be used in optimization problems. In this study, only one component of the ES is used as a part of the system; for that reason, the system itself is not an ES but a rule-based GA. Many studies use the GA and ES techniques together. The application areas of most of these hybrid studies are product design [4], cost management [5], decision-making tools in fashion [6] and other sectors, and optimization problems [7][8]. Our contribution to the literature is the use of the rule base (knowledge base) component of an expert system in the fitness function of the genetic algorithm to prepare the curriculum plan. The training data of this study is the in-service education material of a software company. We aimed to obtain more effective solutions to the curriculum planning problem of companies, and the output format was planned to be a generic one such as XML.
The project consists of such a hybrid system for preparing a curriculum plan for the in-service training process of companies. It aims to integrate the rule base component of an expert system as the input data of a genetic algorithm used in the optimization phase. This software will be helpful for companies that have trouble preparing in-service training programs for their trainees. The study also presents a different application area for XML technology: the XML files include the rule base data used as input for the initial population of the genetic algorithm, and the timetable output can also be saved in XML format.
The system has a practical user interface that allows the user to add and delete the data residing in the system when needed. The courses are divided into modules, and instructors can add their courses and the modules of those courses. The order of the modules in the training is shaped according to the requirements of the trainees.
Some trainees may pass certain modules while others fail them; that is, not all trainees are successful in the same modules. This raises the problem of how to order the modules in the curriculum plan. After the data for a course and its modules has been provided, the system runs the genetic algorithm phase to optimize the order of the selected modules of the course.
2 Material and Method
The course material for in-service training is divided into modules. The modules of the courses selected for the training program can be added and dropped via a user interface. The duration of each module and its prerequisite modules (if any) must be determined clearly. This data forms the rule base of the system, which is treated as an ES component. The prerequisite modules of a module are represented as IF-THEN statements, as used in ES methodologies. The IF-THEN statements used for the prerequisite rules of the modules are shown in Fig. 1.

IF M1 THEN M2
IF M1 THEN M3
IF M1 AND M2 AND M3 THEN M4
IF M1 AND M3 AND M6 THEN M7
IF M1 AND M2 AND M4 AND M5 AND M8 THEN M9
…
IF M1 AND M14 THEN M16
IF M1 AND M2 AND M3 AND M12 THEN M20
IF M1 AND M2 AND M12 AND M15 AND M17 AND M20 THEN M22
…
IF M1 AND M3 AND M7 AND M27 THEN M28
IF M1 AND M3 AND M7 AND M27 AND M28 THEN M29
…
IF M1 AND M2 AND M3 AND M4 THEN M35
…

Fig. 1 – Prerequisite rules among the modules.
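As an illustration only, the first few rules of Fig. 1 can be held as a mapping from each module to its set of prerequisites. The names `PREREQUISITES`, `can_take` and `order_violations` are hypothetical and do not come from the original software; the sketch merely shows how an IF-THEN rule base of this shape can be queried.

```python
# Hypothetical encoding of four Fig. 1 rules: module -> set of prerequisite modules.
PREREQUISITES = {
    "M2": {"M1"},
    "M3": {"M1"},
    "M4": {"M1", "M2", "M3"},
    "M7": {"M1", "M3", "M6"},
}


def can_take(module, completed):
    """Return True when every prerequisite of `module` is in `completed`."""
    return PREREQUISITES.get(module, set()) <= set(completed)


def order_violations(sequence):
    """List (module, prerequisite) pairs where the prerequisite comes later.

    Prerequisites that are absent from `sequence` are ignored, since only
    the modules present in the plan can be reordered.
    """
    position = {module: index for index, module in enumerate(sequence)}
    return [
        (module, prereq)
        for module in sequence
        for prereq in sorted(PREREQUISITES.get(module, set()))
        if prereq in position and position[prereq] > position[module]
    ]
```

For example, `order_violations(["M4", "M1", "M2", "M3"])` reports three pairs, because all three prerequisites of M4 are scheduled after it.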
After the training program has been applied to the trainees once, the grades and the failed modules of the trainees are evaluated. Our system focuses on planning the curriculum for the second round. The failed modules are chosen for compensation training. Since some of the modules do not need to be repeated, the second training program does not have to follow the original module order. Instead, the modules are first arranged from the most failed to the least failed. The number of chosen modules determines the length of the chromosomes used in the GA. The initial population of the GA is produced from this first module order. While the GA runs, the fitness function applied to these chromosomes uses a rule set that includes the prerequisite information of each module, as mentioned above. The GA produces the most reliable and useful order of the modules to be used in the compensation training.
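The ordering step described above can be sketched as follows. The input format (a mapping from trainee to failed module numbers), the tie-breaking by module number, and the way the remaining individuals are produced by shuffling are all assumptions made for illustration; the paper does not specify them.

```python
import random
from collections import Counter


def initial_module_order(failed_by_trainee):
    """Order modules from most failed to least failed.

    `failed_by_trainee` maps a trainee id to the module numbers that trainee
    failed (hypothetical input format). Ties are broken by module number so
    that the result is deterministic.
    """
    counts = Counter(
        module
        for modules in failed_by_trainee.values()
        for module in set(modules)
    )
    return sorted(counts, key=lambda module: (-counts[module], module))


def initial_population(order, size, rng=random):
    """Build `size` permutations; the first one is the base order itself.

    Filling the rest with random shuffles is an assumption of this sketch.
    """
    population = [list(order)]
    while len(population) < size:
        individual = list(order)
        rng.shuffle(individual)
        population.append(individual)
    return population


failed = {"t1": [3, 7], "t2": [7], "t3": [2, 3, 7]}
order = initial_module_order(failed)  # chromosome length is len(order)
```

Here module 7 was failed by three trainees, module 3 by two and module 2 by one, so the base order is 7, 3, 2 and every chromosome has three genes.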
2.1 The Genetic Algorithm
The GA used in the project works on chromosomes composed of module numbers. Each module can appear in a chromosome only once; therefore, permutation encoding is used for chromosome encoding. The module numbers that make up the chromosome are taken from the XML file in which the prerequisite rules of the modules are saved.
2.1.1 GA Operators
In the project, a standard GA is used with one-point order crossover and two-point crossover. In one-point crossover, the parents exchange their genetic material at a randomly chosen splitting point. Because permutation encoding is used, the exchanged parts of the chromosomes are arranged so that no gene is repeated. In two-point crossover, the genes between two randomly chosen positions on the chromosome are inverted to obtain the new child for the next generation. The two crossover techniques are used in three different ways, which we call GA1, GA2 and GA3 respectively:
GA1 with 1-point order crossover,
GA2 with 2-point crossover,
GA3 with one of the two crossover methods (1-point order crossover or 2-point crossover) chosen randomly in each generation [12].
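A minimal sketch of the two operators, written from the description above rather than from the original implementation. It assumes that one-point order crossover keeps the head of the first parent and fills the tail with the missing genes in the order they occur in the second parent, and that the two-point operator reverses the segment between the two positions inclusively. All function names are hypothetical.

```python
import random


def one_point_order_crossover(parent_a, parent_b, cut):
    """Keep parent_a[:cut], then append the missing genes in parent_b's order."""
    head = list(parent_a[:cut])
    used = set(head)
    return head + [gene for gene in parent_b if gene not in used]


def two_point_inversion(parent, i, j):
    """Reverse the genes between positions i and j (inclusive)."""
    lo, hi = sorted((i, j))
    return list(parent[:lo]) + list(parent[lo:hi + 1])[::-1] + list(parent[hi + 1:])


def operator_for_generation(variant, rng=random):
    """GA1 and GA2 use a fixed operator; GA3 draws one of the two per generation."""
    if variant == "GA3":
        return rng.choice(["GA1", "GA2"])
    return variant


def make_child(operator, parent_a, parent_b, rng=random):
    """Apply the chosen operator; chromosomes need at least two genes."""
    n = len(parent_a)
    if operator == "GA1":
        return one_point_order_crossover(parent_a, parent_b, rng.randrange(1, n))
    i, j = rng.sample(range(n), 2)
    return two_point_inversion(parent_a, i, j)
```

Both operators return a permutation of the parent genes, so no repair step is needed afterwards: with parents 1 2 3 4 5 and 5 4 3 2 1 and a cut after the second gene, the first operator yields 1 2 5 4 3.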
Swap mutation is applied with various probabilities as the mutation method in the study, which helps with the parameter tuning task of the GA. When a chromosome is mutated, two genes (module numbers in this case) are swapped and a new child for the population is produced. The aim of the mutation process is to sustain the diversity of the population and to prevent the fitness values from converging rapidly to the maximum. The selection operator is the linear rank selection operator [9], as given in (1), where rankval_i represents the rank value of the ith individual in the population, N represents the population size and rank_i represents the order of the ith individual.
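\[
\mathit{rankval}_i = \frac{1}{N}\left( \mathit{min} + (\mathit{max} - \mathit{min})\,\frac{\mathit{rank}_i - 1}{N - 1} \right) \tag{1}
\]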
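The mutation and selection steps can be sketched as below. The sketch assumes that (1) is the usual linear ranking form, rankval_i = (1/N)(min + (max - min)(rank_i - 1)/(N - 1)), that rank N denotes the best individual, and that min and max are user-chosen constants; none of these details is stated explicitly in the text, and the function names are hypothetical.

```python
import random


def swap_mutation(chromosome, mrate, rng=random):
    """With probability `mrate`, swap two distinct genes; otherwise copy."""
    child = list(chromosome)
    if len(child) >= 2 and rng.random() < mrate:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child


def linear_rank_values(n, min_val, max_val):
    """Evaluate the assumed form of (1) for ranks 1..n (requires n >= 2)."""
    if n < 2:
        raise ValueError("linear ranking needs at least two individuals")
    return [
        (min_val + (max_val - min_val) * (rank - 1) / (n - 1)) / n
        for rank in range(1, n + 1)
    ]


def select_by_rank(ranked_population, min_val, max_val, rng=random):
    """Pick one individual; `ranked_population` is sorted worst to best."""
    weights = linear_rank_values(len(ranked_population), min_val, max_val)
    return rng.choices(ranked_population, weights=weights, k=1)[0]
```

With min = 0 and max = 2 the rank values sum to 1, so they can be used directly as selection probabilities; other choices of min and max only need to give non-negative weights with a positive total for `random.choices` to accept them.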
The elitism mechanism is also applied to protect the best individuals in the population by passing the best two individuals directly to the next generation. To obtain the best results from the GA, parameter tuning is done by changing the four basic parameters of the GA: population size, number of generations, crossover rate (crate) and mutation rate (mrate). The parameters used for the different scenarios (S1 – S16) are given in Table 1. These parameters are applied to all three GAs (GA1, GA2 and GA3), which differ in the crossover method used.
Table 1 – Parameter Tuning
Scenario  Pop. size  Generation  crate  mrate
S1        100        100         0.80   0.2
S2        100        100         0.80   0.3
S3        100        100         0.90   0.2
S4        100        100         0.90   0.3
S5        100        200         0.80   0.2
S6        100        200         0.80   0.3
S7        100        200         0.90   0.2
S8        100        200         0.90   0.3
S9        50         100         0.80   0.2
S10       50         100         0.80   0.3
S11       50         100         0.90   0.2
S12       50         100         0.90   0.3
S13       50         200         0.80   0.2
S14       50         200         0.80   0.3
S15       50         200         0.90   0.2
S16       50         200         0.90   0.3
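The sixteen scenarios of Table 1 form a full grid over two values of each parameter, so they can be generated rather than typed in. The dictionary keys below are hypothetical names chosen for this sketch.

```python
from itertools import product

# Values are listed in the order that reproduces the S1..S16 numbering of
# Table 1: mrate varies fastest, then crate, then generations, then pop_size.
POP_SIZES = [100, 50]
GENERATIONS = [100, 200]
CRATES = [0.80, 0.90]
MRATES = [0.2, 0.3]

SCENARIOS = {
    f"S{number}": {
        "pop_size": pop_size,
        "generations": generations,
        "crate": crate,
        "mrate": mrate,
    }
    for number, (pop_size, generations, crate, mrate) in enumerate(
        product(POP_SIZES, GENERATIONS, CRATES, MRATES), start=1
    )
}
```

Each scenario can then be run once per crossover variant (GA1, GA2, GA3), which gives the 48 runs reported in the results.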
2.1.2 The Fitness Function
The input of the fitness function is the XML file in which the prerequisite rules of the modules are saved. The XML file including the rules is given in Fig. 2.
Fig. 2 – XML File Including Rules among the Training Modules.
The XML file is converted to a sparse matrix in which, for each row, the prerequisite module numbers are marked with 1's. The fitness function works with penalty scores calculated for the modules on the chromosome. When a prerequisite of a module appears after the module in the chromosome, the penalty score is increased by an amount calculated from the weight value of the module. The chromosome with the highest penalty score is
defined as the worst individual of the population, so the GA tries to minimize the penalty scores. The chromosome with the minimum penalty score is the best individual of the generation.
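The penalty mechanism can be sketched as follows. The XML layout (`course`, `module`, `prerequisite` elements with `id` and `weight` attributes) is a hypothetical stand-in, since the actual schema of Fig. 2 is not reproduced in the text. The rule that each misplaced prerequisite adds the weight of the dependent module is likewise an assumption about how the weight enters the score.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file; the real schema of Fig. 2 may differ.
RULES_XML = """
<course>
  <module id="1" weight="1.0"/>
  <module id="2" weight="1.0"><prerequisite id="1"/></module>
  <module id="3" weight="2.0"><prerequisite id="1"/></module>
  <module id="4" weight="3.0">
    <prerequisite id="1"/><prerequisite id="2"/><prerequisite id="3"/>
  </module>
</course>
"""


def load_rules(xml_text):
    """Return (prereq, weight); prereq[m][p] == 1 when p is a prerequisite of m."""
    root = ET.fromstring(xml_text.strip())
    nodes = root.findall("module")
    ids = [int(node.get("id")) for node in nodes]
    prereq = {m: {p: 0 for p in ids} for m in ids}  # small 0/1 matrix as dict rows
    weight = {}
    for node in nodes:
        m = int(node.get("id"))
        weight[m] = float(node.get("weight", "1.0"))
        for child in node.findall("prerequisite"):
            prereq[m][int(child.get("id"))] = 1
    return prereq, weight


def penalty(chromosome, prereq, weight):
    """Sum weight[m] for every prerequisite of m that is placed after m."""
    position = {module: index for index, module in enumerate(chromosome)}
    total = 0.0
    for module in chromosome:
        for p, flag in prereq[module].items():
            if flag and p in position and position[p] > position[module]:
                total += weight[module]
    return total


prereq, weight = load_rules(RULES_XML)
best = min([[4, 3, 2, 1], [2, 1, 3, 4], [1, 2, 3, 4]],
           key=lambda c: penalty(c, prereq, weight))
```

Because the GA minimizes this score, a chromosome with penalty 0 respects every prerequisite rule among the modules it contains.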
3 Results
The system has two different outputs, one before and one after the optimization process. The first is the set of XML files of the lessons, which include the prerequisite rules among the modules of each lesson as entered by the instructor. The second is another set of XML files in which the timetable for the optimum sequencing of the chosen modules is saved. The XML files of the courses for the rule base and for the timetable are saved as distinct files. The GA is run for 16 different parameter combinations with 3 different crossover scenarios. The runtime results in milliseconds (ms) for the three crossover scenarios with the 16 parameter combinations are given in Table 2, Table 3, Table 4 and Table 5.

Table 2 – Runtime Results (ms) for Scenarios S1-S4
Scenario  GA1   GA2   GA3
S1        3796  4125  3796
S2        4156  4046  4125
S3        4187  4109  4171
S4        3828  4140  4156

Table 3 – Runtime Results (ms) for Scenarios S5-S8
Scenario  GA1   GA2   GA3
S5        8187  7593  8078
S6        8125  8093  8125
S7        8250  8218  8171
S8        8218  8125  8109

Table 4 – Runtime Results (ms) for Scenarios S9-S12
Scenario  GA1   GA2   GA3
S9        2171  2109  2140
S10       2109  2062  2171
S11       2218  2140  2187
S12       2156  2093  2146

Table 5 – Runtime Results (ms) for Scenarios S13-S16
Scenario  GA1   GA2   GA3
S13       4171  4000  4125
S14       4187  4125  4218
S15       4281  4093  4140
S16       3890  4062  4156

According to the results obtained, some crossover rates do not make a remarkable difference, whereas the crossover rate 0.80 gives the most effective results in terms of runtime when compared to the other scenarios (S1, S2, S5, S6, S9, S10). The crossover rate 0.90 does not give the most effective runtime results even when different mutation rates are tested. Another important point concerns GA3: when one-point order crossover and two-point crossover are chosen randomly, the best runtime results are always obtained with the crossover rate of 0.80 and the mutation rate of 0.2.
Fig. 3, Fig. 4 and Fig. 5 show the behavior of the fitness function for the scenarios S1 to S16 for GA1, GA2 and GA3 respectively. The fitness values have a weak point in all cases: they converge rapidly to the maximum. The scenarios with GA1 seem to give better results in terms of convergence to the maximum, which means that the diversity of the population lasts longer during the execution of the algorithm. The next step of the project is to focus on preventing this rapid convergence.
The project has been tested on the training data of a software company and obtained reasonable results while optimizing the in-service training timetable. The module orders obtained as the results of the program executions
were shown to the training program experts of the company for evaluation. The system produced results similar to the ones that the training program experts prepared manually. The tests are planned to be repeated with another type of in-service training data, for another company in a different sector with different training requirements.
Fig. 3 – Fitness Graphics for GA1
Fig. 4 – Fitness Graphics for GA2
Fig. 5 – Fitness Graphics for GA3

References
1. Cardoen B. et al., “Operating Room Planning and Scheduling: A Literature Review”, European Journal of Operational Research, Vol. 201, Elsevier, 2010, pp: 921-932.
2. Holland J.H., “Adaptation in Natural and Artificial Systems”, Cambridge, MIT Press, 1975.
3. Goldberg D.E., “Genetic Algorithms in Search, Optimization and Machine Learning”, Reading, Addison Wesley, 1989.
4. Chaoan L., “The Expert System of Product Design Based on CBR and GA”, International Conference on Computational Intelligence and Security Workshops, IEEE, 2007, pp: 144-147.
5. Chou J-S., “Generalised Linear Model-Based Expert System for Estimating the Cost of Transportation Projects”, Expert Systems with Applications 36, Elsevier, 2009, pp: 4253-4267.
6. Wong W.K. et al., “A Decision Support Tool for Apparel Coordination Through Integrating the Knowledge-Based Attribute Evaluation Expert System and the T-S Fuzzy Neural Network”, Expert Systems with Applications 36, Elsevier, 2009, pp: 2377-2390.
7. Kim J-S., “Development of a User-Friendly Expert System for Composite Laminate Design”, Composite Structures 79, Elsevier, 2007, pp: 76-83.
8. Chakravorty S. and Thukral M., “Choosing Distribution Sub Station Location Using Soft Computing Technique”, International Conference on Advances in Computing, Communication and Control (ICAC3’09), ACM, 2009, pp: 53-55.
9. Grefenstette J.J. and Baker J.E., “How Genetic Algorithms Work: A Critical Look at Implicit Parallelism”, Proceedings of the 3rd International Conference on Genetic Algorithms, San Mateo, CA, Morgan Kaufmann Publishers, 1989, pp: 20-27.
10. Verma Ms.N. et al., “An Approach Towards Designing of Car Troubleshooting Expert System”, International Journal of Computer Applications, Vol. 1, No. 23, 2010, pp: 65-67.
11. Giarratano J. and Riley G., “Expert Systems: Principles and Programming”, Fourth Edition, 2004, ISBN-13: 978-0534384470.
12. Tseng M-H. et al., “A Genetic Algorithm Rule-Based Approach for Land-Cover Classification”, ISPRS Journal of Photogrammetry & Remote Sensing, No. 63, 2008, pp: 202-212.