MultiCraft
International Journal of Business, Management and Social Sciences Vol. 1, No. 1, 2010, pp. 1-8
INTERNATIONAL JOURNAL OF BUSINESS, MANAGEMENT AND SOCIAL SCIENCES www.ijbmss-ng.com 2010 MultiCraft Limited. All rights reserved
Applying data mining and fuzzy technology on learning material recommendation mechanism
R.M. Chao1, Shaio Yan Huang2*, Jia-Nan Chang3
1 Graduate Institution of Information and Social Science, National United University, Miaoli, Taiwan
2 Department of Accounting and Information Technology, National Chung Cheng University, 168, University Road, Min-Hsiung Chia-Yi, Taiwan
3 Department of Information Management, Da-Yeh University, Datsuen, Taiwan
* Corresponding author’s e-mail:
[email protected]
Abstract
The rapid growth of e-Learning has caused learning object overload, in which teachers can no longer choose effectively among the learning objects they are exposed to. To address this problem, we introduce a recommendation mechanism that provides teachers with effective recommendations when applied to a Learning Content Management System (LCMS). The suggested procedure is based on association rule mining, associative classification and sequential pattern mining. In addition, the popularization of learning content management systems not only propels the overall development of knowledge learning but also moves the “Basic Learning Agent Service” of the LCMS toward the mainstream of the present e-learning market. This research also constructed a system called the “Knowledge-based Broker Service Centre” (KBSC), which allows students to submit specific questions online in natural language. Using Chinese learning material cutting, weighted keyword calculation and professional categorization, it automatically analyzes the nature of the student’s problem and searches the database for relevant information in order to list the most suitable specialists as the assigned coordinators for the student.
Keywords: Learning material, data mining, knowledge-based agent/broker, natural language, fuzzy sets theory

1. Introduction
Data mining is one step in the process of knowledge discovery in databases (Gnardellis and Boutsinas, 2001). As the name implies, it looks for knowledge in a database, and it draws not only on database and knowledge base techniques but also on related applications in artificial intelligence, machine learning, statistics, etc. Mining meaningful information out of a huge database helps policy-makers to make the most advantageous decisions.
Data mining technology may be used not only to help understand learners' study, course choice patterns and curriculum timetabling, but also to test new and emerging technologies for an ideal environment with a large number of electronic students in a virtual institution and an expansive curriculum delivery system. To obtain the greatest benefit, how the results are displayed is even more important (Berry and Linoff, 1997). This is of great benefit to e-learning, which has harnessed data mining technologies to organize learning communities and provide learning content recommendation based on student profiles (Shen, Wang & Shen, 2009; Shen & Shen, 2005). Many researchers regard data mining as Knowledge Discovery in Databases (KDD). Data mining has many successful cases in the literature and in business applications, for example in biotechnology, marketing, new product development and finance. However, it has been applied less often in education than in other fields (Merceron and Yacef, 2005). The literature on e-Learning focuses more on standardization than on compilation. Nevertheless, a Learning Content Management System (LCMS) stores a great deal of accessible data on assets (e.g. learner profiles, learning goals, learning events, learner emotions, learning knowledge-based objects) (Shen, et al., 2009). In an educational context, learning materials are designed on the basis of pre-determined expectations, and learners are evaluated on the extent to which they master these expectations (Askar & Altun, 2009). If we use those data well, we can give suitable recommendations when teachers compile courseware.
In this paper, courseware is likened to a film script. Good courseware not only transmits knowledge clearly but also helps students to understand it well. A key issue is how an LCMS can help teachers compile good courseware. We observe that teachers still have to pick and choose learning objects (materials, which we call assets below) while compiling. This is both time-consuming and burdensome, so teachers may unconsciously resist e-Learning. A number of studies emphasized the need to capture learners’ interaction patterns in order to personalize their learning process as they study through learning objects (Askar & Altun, 2009). In this study, the recommendation mechanism applies data mining technology from two aspects and provides teachers with the recommendations they want. The mechanism applies association rule mining to discover association rules within transaction records. Next, a classifier is developed with associative classification, and assets are classified according to Class-Association Rules. Teachers can compile courseware and find other assets by relying on the recommendations. The mechanism also applies sequential pattern mining to discover sequential patterns from transaction records. With those results, teachers can spend less effort on collecting and selecting assets and devote more professional knowledge to developing courseware. In contemporary education, emphasis is on meaningful learning, knowledge construction and self-directed learning (Kicken, et al., 2008; Loyens, et al., 2008; Brand-Gruwel, et al., 2009). To stimulate learners to construct knowledge in a meaningful way, students receive learning tasks and assignments that require them to identify information needs, locate information sources, extract and organize information from each source, and synthesize information from a variety of sources. This research is based on an LCMS that matches students’ problems with teachers’ professional specialties. Using fuzzy set theory and automatic document classification theories and techniques, it has built up a service center system (Zadeh, 1965). The LCMS enables students to submit questions online in “natural language” and automatically categorizes and analyzes them. After each matching procedure was completed, a questionnaire was given to examine the correctness of the search, and the responses served as data for subsequent adjustment of the system.

2. Methodology
This research is based on the LCMS of a regional education center, which provided original data on digital courses from January 1, 2006 to March 10, 2007. The framework of this research is shown in Figure 1.
[Figure: components shown are Material repository, Courseware repository, Data preprocessing, Association Rule Mining, Sequential pattern mining, Associative Classification, Rule repository, Classifier and Pattern repository]
Figure 1. Framework
2.1 Data pre-processing
Han and Kamber (2000) proposed that original data be preprocessed before mining. The data preprocessing consists of the four steps below:
• Data cleaning: First, we checked whether any asset lacked essential fields and filled in the missing values manually. For noisy data and duplicate records, we designed a method to either keep or delete them.
• Data integration: In this case, some assets were obtained by cutting ready-made digital contents, but those digital contents had no records in the LCMS. We therefore gathered those data, redesigned them and wrote them into the asset repository. Finally, to speed up data mining, we integrated the data of interest, such as course name and discipline, from the asset repository and the courseware repository into a new repository, the mining pool.
• Data transformation: Original data may come from two or more databases and be stored in different formats. To mine such data quickly, they must be transformed into a single format. In this study the original data were already well defined; hence we did not transform any data.
• Data reduction: This step decreases the quantity of data to be mined. Its methods include data visitation, data encoding, data reducing, and so on. Our original data are very simple; hence we did nothing in this step either.
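The cleaning and integration steps above can be illustrated with a minimal Python sketch. The field names (`asset_id`, `name`, `discipline`) and both function names are hypothetical, since the paper does not list the actual LCMS schema. Unlike the study, which filled in missing fields manually, this sketch simply skips incomplete records.

```python
# Hypothetical field names; the paper does not publish the LCMS schema.
ESSENTIAL_FIELDS = ("asset_id", "name", "discipline")


def clean_assets(records):
    """Skip records that lack an essential field and drop duplicate asset ids."""
    seen, cleaned = set(), []
    for rec in records:
        if any(not rec.get(field) for field in ESSENTIAL_FIELDS):
            continue  # the study repaired such records by hand instead
        if rec["asset_id"] in seen:
            continue  # duplicate record
        seen.add(rec["asset_id"])
        cleaned.append(rec)
    return cleaned


def build_mining_pool(assets, courseware):
    """Join courseware (course name -> asset ids) with cleaned assets.

    Each entry of the returned pool is one transaction: the assets used
    together in one courseware, plus the fields of interest.
    """
    by_id = {a["asset_id"]: a for a in assets}
    pool = []
    for course_name, asset_ids in courseware.items():
        known = [by_id[i] for i in asset_ids if i in by_id]
        if known:
            pool.append({
                "course": course_name,
                "disciplines": sorted({a["discipline"] for a in known}),
                "assets": [a["asset_id"] for a in known],
            })
    return pool
```

Treating each courseware as one transaction is an assumption of this sketch; it is one natural way to obtain the transaction records that the later mining steps operate on.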
2.2 Association rule mining
After the original data have been transformed into mining data, we apply association rule mining to mine association rules from the mining pool. The results are stored in the rule repository, and the recommendation mechanism recommends according to those rules.
• Parameter: There is no standard criterion for determining the optimal value of the minimum support. Generally speaking, it is determined by the real situation. We therefore performed an experiment in which we varied the minimum support from 0.1 to 0.9 in increments of 0.05 and discussed the results with the managers of the education center. When the minimum support was 25%, we obtained complete and clear rules. When it was 50%, the rules were best in terms of comprehensibility and quantity. When we raised it to 65%, the number of rules decreased markedly, yet we trusted them more than the others. Therefore, we decided to provide three minimum supports (25%, 50% and 65%) for teachers, who can select one from a choice list according to the real situation to obtain different recommendations.
• System presentation: The asset search function includes both general search and recommendation. General search: teachers search assets by fields, grades, retrieval scope (school, author, courseware, or none) and keywords. Recommendation: teachers enter an asset number and select one of the minimum supports, and the system recommends related assets from the rule repository.
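To make the role of the minimum support concrete, the following Python sketch mines pairwise association rules from transactions and looks up recommendations for one asset. It is a brute-force illustration restricted to rules between two single assets, not the mining algorithm used in the study; the function names and the confidence-based ordering of recommendations are assumptions.

```python
from collections import Counter
from itertools import combinations

# The three minimum supports offered to teachers in the study.
SUPPORT_LEVELS = (0.25, 0.50, 0.65)


def mine_pair_rules(transactions, min_support):
    """Return rules (lhs, rhs, support, confidence) between single assets."""
    n = len(transactions)
    item_count, pair_count = Counter(), Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields one rule in each direction.
        for lhs, rhs in ((a, b), (b, a)):
            rules.append((lhs, rhs, support, count / item_count[lhs]))
    return rules


def recommend(rules, asset_id):
    """List assets associated with asset_id, highest confidence first."""
    hits = [(rhs, conf) for lhs, rhs, _, conf in rules if lhs == asset_id]
    return [rhs for rhs, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]
```

Raising `min_support` shrinks the rule set, which mirrors the trade-off reported above: fewer rules at 65%, but rules backed by more transactions.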
2.3 Associative classification
All assets in the asset repository had been classified by discipline. Within each discipline there are still all sorts of multimedia, consisting of texts, pictures, voices and videos. If teachers want to exchange an asset for a similar one, they must look for candidates one by one. The purpose of classification is to find relations between multimedia and discipline and to help teachers find assets easily within a small range. In this study, Classification based on Multiple Class-Association Rules (CMAR) is adopted (Li, et al., 2002), and FPL (Frequent Pattern Lists) is used in place of the FP-tree to mine Class-Association Rules (CARs). Because FPL is derived from the FP-tree and does not modify the core of the classifier, its performance is not considered in this study.
• Parameters: Training data were from January 1, 2006 to November 30, 2006 and test data were from December 1, 2006 to March 10, 2007. The ratio of training data to test data was about 70% to 30%. The support threshold and confidence threshold for CARs mining were set to 2% and 50% through experiments. The database coverage threshold and the confidence difference threshold, which are the parameters for rule pruning, were four and 25%.
• System presentation: When a teacher clicks the blue triangle next to an asset name, other similar assets are shown in a new window. Clicking one of these assets shows its details. For example, when a teacher clicked the triangle of “Yses aquatic botanical garden”, the system listed other similar assets such as “Sha Shi aquatic botanical garden” and “Plants in campus” in the new window.
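The idea of a class-association rule can be sketched in Python as follows. This is a deliberately simplified stand-in, not CMAR and not FPL: it mines only single-feature rules of the form "feature implies class", uses the support and confidence thresholds quoted above as defaults, and classifies with the single most confident matching rule rather than with multiple rules. The feature names in the usage example are invented for illustration.

```python
from collections import Counter


def mine_cars(records, min_support=0.02, min_confidence=0.5):
    """Mine single-feature class-association rules.

    records: list of (set_of_features, class_label).
    Returns {(feature, label): (support, confidence)}.
    """
    n = len(records)
    feature_count, joint_count = Counter(), Counter()
    for features, label in records:
        for feature in features:
            feature_count[feature] += 1
            joint_count[(feature, label)] += 1
    rules = {}
    for (feature, label), count in joint_count.items():
        support = count / n
        confidence = count / feature_count[feature]
        if support >= min_support and confidence >= min_confidence:
            rules[(feature, label)] = (support, confidence)
    return rules


def classify(rules, features, default=None):
    """Pick the class of the most confident rule whose feature is present."""
    best = None
    for (feature, label), (_, confidence) in rules.items():
        if feature in features and (best is None or confidence > best[1]):
            best = (label, confidence)
    return best[0] if best else default
```

For example, with records such as `({"video", "pond"}, "Science")`, a new asset described by `{"video"}` would be assigned to the discipline that most confidently co-occurs with that feature, which narrows the range in which a teacher looks for similar assets.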
2.4 Sequential pattern mining
Association rule mining and associative classification give linear recommendations, but these are not enough. Teachers sometimes need a series of scenarios. For example, a teacher needs a frog’s description, sound and video, and needs to know how to arrange these assets for the life of a frog. Sequential pattern mining is a useful method for this problem. We apply FPL as the sequential pattern mining algorithm. Cheung and Fu (2004), Seno and Karypis (2005) and Cong and Liu (2002) recognize that the FP-tree can mine frequent patterns via a tree structure. FPL was improved from the FP-tree; therefore, it can do so as well.
• Analysis of original data: To find sequential patterns in every discipline, we first analyzed the compiling behavior of teachers and found that compiling was concentrated at the beginning of the semesters, especially between February and March and between August and September. To give teachers new information, we set a semester as the time series, consisting of the period between February and July and the period between September and January. The recommendation mechanism therefore runs FPL on the basis of time series and disciplines. The mining results are stored in the pattern repository.
• Parameter: According to our experiments, we obtained clear and complete rules when the minimum support was 2%. When we raised it above 6%, the results were useless because the itemsets of every rule contained fewer than four items.
• System presentation: Teachers can select a discipline (see Table 1) or enter an asset number to find sequential patterns in that discipline. The asset recommendation system then lists a series of sequential patterns according to the input value. Teachers can view details by clicking one of the assets. For example, when a teacher selected “Science and life technology” as the input value, the system returned four sequential patterns in ascending order.
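What distinguishes a sequential pattern from an association rule is that order matters. The following Python sketch counts ordered subsequences by brute force; it is not FPL, and the function name and the toy asset labels in the usage note are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_sequences(sequences, min_support, length=2):
    """Return ordered subsequences of the given length with their support.

    A subsequence keeps the order of the original sequence but need not be
    contiguous. Each input sequence counts at most once per pattern.
    """
    n = len(sequences)
    counts = Counter()
    for sequence in sequences:
        # combinations() preserves positional order, so each tuple is an
        # ordered subsequence; set() avoids double counting within a sequence.
        counts.update(set(combinations(sequence, length)))
    return {p: c / n for p, c in counts.items() if c / n >= min_support}
```

With the frog scenario, sequences such as `["description", "sound", "video"]` would yield the pattern `("description", "video")` but not `("video", "description")`, so the recommendation can suggest an arrangement order as well as a set of assets. In the study the sequences would additionally be grouped by semester and discipline before mining.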
2.5 Fuzzy sets theory
In ordinary set theory, the relationship between an element x and a set A can only be $x \in A$ or $x \notin A$. In reality, however, there are many ‘ambiguous’ and ‘paradoxical’ situations. To express this, Zadeh (1965) relaxed the absolute membership function of ordinary set theory and allowed the degree of membership of an element to take any value from 0 to 1. Membership is thus no longer confined to the binary opposition of the ordinary mathematical set (either 1 or 0); the membership function expresses the relationship between an element and its degree of membership.

Depiction of questions in fuzzy sets: For a student question Q, the set of key learning materials of the question and fuzzy set theory are used to depict Q as a keyword fuzzy set:

$$Q = \{(K_1, W_1), (K_2, W_2), (K_3, W_3), \ldots, (K_n, W_n)\} \qquad (1)$$

or

$$Q = \{(K_j, W_j) \mid K_j \in K\}, \quad j = 1, \ldots, n \qquad (2)$$

where
$K_j$: the j-th key learning material in student question Q
$W_j$: the weighted value of the j-th key learning material in student question Q
$n$: the number of key learning materials in student question Q

The description of each professional category is treated in the same way to generate the key learning material fuzzy set $C_i$:

$$C_i = \{(K_1, W_{i1}), (K_2, W_{i2}), (K_3, W_{i3}), \ldots, (K_n, W_{in})\} \qquad (3)$$

or

$$C_i = \{(K_j, W_{ij}) \mid K_j \in K\}, \quad j = 1, \ldots, n \qquad (4)$$

where
$K_j$: the j-th key learning material in professional category i
$W_{ij}$: the weighted value of the j-th key learning material in professional category i. The weighted value has already been calculated and pre-stored by the learning material collecting model in the key learning material database.
$n$: the number of key learning materials in professional category i
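The fuzzy sets Q and C_i can be represented directly as mappings from key learning material to weight. The Python sketch below compares them with a fuzzy Jaccard ratio (sum of minima over sum of maxima). The paper does not state which similarity measure it uses, so this measure, the function names and the example weights are all assumptions chosen for illustration.

```python
def fuzzy_similarity(q, c):
    """Fuzzy Jaccard similarity between two keyword fuzzy sets.

    q and c map a key learning material K_j to its weight in [0, 1].
    Intersection uses min, union uses max, as in Zadeh's definitions.
    """
    keys = set(q) | set(c)
    intersection = sum(min(q.get(k, 0.0), c.get(k, 0.0)) for k in keys)
    union = sum(max(q.get(k, 0.0), c.get(k, 0.0)) for k in keys)
    return intersection / union if union else 0.0


def best_category(q, categories):
    """Return the name of the category fuzzy set most similar to q."""
    return max(categories, key=lambda name: fuzzy_similarity(q, categories[name]))


# Illustrative weights only; not data from the study.
question = {"budget": 0.8, "audit": 0.4}
categories = {
    "Accounting": {"budget": 0.6, "audit": 0.9},
    "Marketing": {"brand": 0.7, "budget": 0.2},
}
```

Here `best_category(question, categories)` selects "Accounting", because the question shares two weighted keywords with that category and only one weakly weighted keyword with the other.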
2.6 Automatic classification
The main function of automatic classification is to use computer calculations to find the characteristics of documents and categorize them (Richardo and Berthier, 1999). Automatic classification first performs word arrangement on the sample information, transforms the information into learning material set data, and finds the characteristic key learning material set that represents each document. When testing or categorizing a new document, the same procedure is followed: the characteristic key learning material set of the document is found, weighted, and compared with the sets stored in the database. The category with the highest similarity value is assigned as the classification of the new document, which completes the automatic classification. There are three classical models for automatic classification: the Boolean Model, the Vector Space Model and the Probabilistic Model.

2.7 System structure
The system structure of the knowledge mediating service is shown in Fig 1. Its main models are the “learning material database collecting model”, the “question sorting model”, the “specialist matching model” and the “accuracy adjustment model”. Each functions as described below:
• Learning material database collecting model: Its main function is to collect all the characteristic key learning material sets and their weighted values from each professional category in the selected learning samples, for use by the question sorting model and the accuracy adjustment model.
• Question sorting model: Using automatic learning material cutting and similarity calculation on the customer’s natural language question, this model identifies the professional category relevant to the question.
• Specialist matching model: The result of the question sorting model is searched and compared in the specialist database, and the matching specialists are listed as the knowledge brokers.
• Accuracy adjustment model: The specialist group's assessment of the accuracy of question sorting is used for further adjustment.
[Figure: Knowledge-based Broker Service Center. Components shown: Customer, Question, question sorting model, key phrase collecting model, key phrase database, specialist matching model, specialist resources database, Specialist, knowledge broker, Questionnaire, Correct degree replying, accuracy adjustment model]
Figure 1. System structure diagram
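The flow from question sorting to specialist matching can be sketched end to end in Python. This sketch uses the Vector Space Model with cosine similarity, one of the three classical models named in Section 2.6; whether the KBSC uses this model is not stated, so that choice is an assumption. Whitespace tokenisation stands in for the Chinese learning material cutting step, and all names, weights and specialists below are hypothetical.

```python
import math
from collections import Counter


def keyword_weights(question, vocabulary):
    """Weight each known keyword by its relative frequency in the question."""
    tokens = [t for t in question.lower().split() if t in vocabulary]
    total = len(tokens)
    return {k: v / total for k, v in Counter(tokens).items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_specialists(question, categories, specialists, top_n=1):
    """Sort the question into categories, then list their specialists."""
    vocabulary = {k for weights in categories.values() for k in weights}
    q = keyword_weights(question, vocabulary)
    ranked = sorted(categories, key=lambda c: cosine(q, categories[c]), reverse=True)
    return [(c, specialists.get(c, [])) for c in ranked[:top_n]]


# Illustrative data only; not taken from the study.
categories = {
    "Accounting": {"budget": 0.6, "audit": 0.9},
    "Marketing": {"brand": 0.7, "campaign": 0.5},
}
specialists = {"Accounting": ["Specialist A"], "Marketing": ["Specialist B"]}
```

Calling `match_specialists("How should we audit the budget", categories, specialists)` returns the Accounting category with its specialist list. Setting `top_n=2` would return the two closest categories instead of one.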
3. Results
This research used a questionnaire survey to evaluate the performance of the recommendation mechanism. We randomly chose thirty-seven teachers who used our system as the sample. The distribution and return of the questionnaires are shown in Table 1.

Table 1. The condition of questionnaires implementation and retrieve
Total    Invalid sample    Valid sample    Valid sample rate
37       3                 34              92%

As Table 1 shows, there were three invalid samples and thirty-four valid samples, a valid sample rate of ninety-two percent. We referred to the literature and to models of recommendation systems when designing the questionnaire. It covers recommendation of assets, substitution, and recommendation of patterns. We collected all the questionnaires and set four measurements: Operation interface, Performance, Recommendation and Degree of confidence. All questions are descriptive and use a 5-point Likert scale, scored additively: 5 points for very agree, 4 points for agree, 3 points for no comment, 2 points for disagree and 1 point for very disagree. The results are shown in Table 2.

Table 2: The results of questionnaire investigation
Measurement             Mean    Standard Deviation
Operation interface     4.08    0.82
Performance             3.42    0.95
Recommendation          4.17    0.72
Degree of confidence    3.67    1.04

According to the questionnaire results, the Operation interface, Performance, Recommendation and Degree of confidence of the system were generally accepted by teachers, and most teachers were satisfied with the Recommendation. In addition, the test subjects of this experiment were a central management research and development center and its group of specialists: 48 students and 16 specialists. The start date was January 1, 2006 and the end date was March 10, 2007. The list of professional categories includes 19 management fields. The system calculated the similarities between a customer’s question and the professional categories and returned the most likely result.
The results covered a total of 41 applications, and in every case the professional categories determined fell within one or two categories. This shows that the system's results were focused rather than dispersed. Figure 2 shows the matching results:
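The sample figures in Table 1 can be checked with a few lines of Python, and the same sketch shows how Likert responses map to the scores summarised in Table 2. The raw responses were not published, so the list passed to `summarize` in the usage note is purely illustrative. The use of the sample standard deviation is also an assumption, since the paper does not say which form it reports.

```python
from statistics import mean, stdev

# Scoring scheme stated in the text.
LIKERT = {
    "very agree": 5,
    "agree": 4,
    "no comment": 3,
    "disagree": 2,
    "very disagree": 1,
}


def valid_sample_rate(total, invalid):
    """Return (number of valid samples, valid fraction of the total)."""
    valid = total - invalid
    return valid, valid / total


def summarize(responses):
    """Return (mean, sample standard deviation) of Likert responses."""
    scores = [LIKERT[r] for r in responses]
    return mean(scores), stdev(scores)


valid, rate = valid_sample_rate(37, 3)
print(valid, f"{rate:.1%}")  # 34 valid samples, 91.9%, which rounds to 92%
```

A call such as `summarize(["agree", "very agree", "no comment"])` returns a mean of 4 and a standard deviation of 1.0 for that made-up input.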
Figure 2. Matching results

4. Conclusion
We address the problem that acquiring assets is difficult and time-consuming. To solve it, a recommendation mechanism was developed that consists of association rule mining, associative classification and sequential pattern mining. The results indicate that the recommendation mechanism is beneficial to teachers. It provides three kinds of recommendation: relationships among assets, searching within the same class, and sequential patterns for scenarios. These recommendations help teachers compile courseware. Most teachers are satisfied with the recommendation mechanism and trust its recommendations. The questionnaires revealed that the mechanism can assist teachers in searching for assets. The results of this study can serve as a reference for other issues in e-learning. A key subject is how to develop meaningful courseware in e-learning. Courseware can enable learners to link it meaningfully to their knowledge framework and to realize other meanings and methods of use through concept mapping. It is therefore an interesting question how to develop courseware with concept mapping and how to describe propositions, concepts, links and labels within a SCO (Sharable Content Object). A further question is whether “meaningful learning” can support the recommendation mechanism to enhance its “meaning”. Through the discussion and summary of knowledge mediating theory and methods, this system is capable of handling application questions in natural language form, automatically gauging their professional category and recommending lists of group specialists. By applying the key phrase fuzzy set calculation to the similarity between the question and the professional categories, it can compare and match specialties and produce a list of group specialists that can be recommended to the broker center as references for case counseling.
However, some parts still need further improvement.
• The professional key phrase database relies mostly on manual maintenance and does not yet have a self-learning function. In the future, it could be combined with an artificial neural network so that its construction moves toward semi-automatic or automatic.
• If specialists' personal writings, other relevant counseling documents and specialty fuzzy sets were added to the established learning samples, the system could cross-compare three fuzzy sets: customer questions, professional categories and professional specialties. This would improve the matching quality of the system.
• For natural language processing, proper inference ability and Chinese syntactic structure analysis are not addressed in the present system. In many circumstances a combination of phrases may take on other meanings, and the system cannot cope with this. These conditions still lack solutions and require continued research.
A final observation: Active participation in contemporary society is becoming ever more ‘digitized’. Access to information, guidance and support concerning personal health; active citizenship and participation in national and local governance; control and safety surrounding personal finances and financial management; leisure and work: each is increasingly dependent upon access to digitised sources of information and communication (Wood, 2009). In the prosperous ‘digitized’ era of science and information technology, social change has confronted the development of information technology over recent decades, and a single view is clearly not enough to contain and explain it (Chao, 2008). All such life functions seem destined to become even more reliant upon digital technologies, and future changes in these technologies and in the uses we come to make of them will demand vigilance and flexibility from tomorrow’s citizens, who will need to keep themselves up to date and abreast of change if they are to maintain pace and place. Against this general scenario, achieving a solid understanding of why, when and how people
are able to seek help and advice, and having the knowledge we need in order to provide help that is accessible and functional take on a major significance (Wood, 2009). We have to start from the description and analysis of people and society (‘know’ their ‘source’), then use management and communication rules to become involved (‘lead’ their ‘way’), and finally apply information technology and its services (‘serve’ their ‘practice’) to realize the practice and effectiveness of information technology within behavioral science. In modern society, based on the discipline of “communication” and the use of “information technology”, which are hooked together and inseparable, “people” and “organizations” produce the phenomena of “dispersing” and “clustering” (Chao, 2008). Such observations, if accepted, help to underwrite and give credence to contemporary demands on schools to help to engender more independent and self-regulating learners; future citizens willing and able to meet the demands of lifelong learning (Rouet & Puustinen, 2009; Wood, 2009).

References
Askar, P., & Altun, A. 2009. CogSkillnet: An ontology-based representation of cognitive skills. Educational Technology & Society, Vol. 12, No. 2, pp. 240-253.
Berry, M. J. A., & Linoff, G. 1997. Data mining techniques: For marketing, sales, and customer support. NY: Wiley.
Brand-Gruwel, S., Wopereis, I., & Walraven, A. 2009. A descriptive model of information problem solving while using internet. Computers & Education, Vol. 53, pp. 1207-1217.
Chao, R. M. 2008. Between behaviour science and human-computer interaction: Information and society (First Edition, Academic Edition). Hwa Li Professional Publishing Co., Ltd., Taipei. ISBN: 978-957-784-257-2.
Cheung, Y. L., & Fu, A. W. C. 2004. Mining frequent itemsets without support threshold: With and without item constraints. IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 9, pp. 1052-1069.
Cong, G., & Liu, B. 2002. Speed-up iterative frequent itemset mining with constraint changes. Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 107-114.
Gnardellis, T., & Boutsinas, B. 2001. On experimenting with data mining in education. Οι Τεχνολογίες της Πληροφορίας και της Επικοινωνίας στην Εκπαίδευση, 2ο Πανελλήνιο Συνέδριο με Διεθνή Συμμετοχή, pp. 275-283.
Han, J., & Kamber, M. 2000. Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann.
Kicken, W., Brand-Gruwel, S., & Van Merrienboer, J. J. G. 2008. Scaffolding advice on task selection: A safe path toward self-directed learning in on-demand education. Journal of Vocational Education and Training, Vol. 60, No. 3, pp. 223-239.
Li, W., Han, J., & Pei, J. 2002. CMAR: Accurate and efficient classification based on multiple class-association rules. Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 369-376.
Loyens, S. M. M., Magda, J., & Rikers, R. M. J. P. 2008. Self-directed learning in problem based learning and its relationships with self-regulated learning. Educational Psychology Review, Vol. 20, pp. 411-427.
Merceron, A., & Yacef, K. 2005. Educational data mining: A case study. The 12th International Conference on Artificial Intelligence in Education.
Richardo, B. Y., & Berthier, R. N. 1999. Modern information retrieval. Addison Wesley Longman Limited.
Rouet, J. F., & Puustinen, M. 2009. Introduction to “Learning with ICT: New perspectives on help seeking and information searching”. Computers & Education, Vol. 53, No. 4, pp. 1011-1013.
Seno, M., & Karypis, G. 2005. Finding frequent patterns using length-decreasing support constraints. Data Mining and Knowledge Discovery, Vol. 10, No. 3, pp. 197-228.
Shen, L. P., & Shen, R. M. 2005. Ontology-based intelligent learning content recommendation service. International Journal of Continuing Engineering Education and Life-Long Learning, Vol. 15, No. 3-6, pp. 308-317.
Shen, L., Wang, M., & Shen, R. 2009. Affective e-Learning: Using “emotional” data to improve learning in pervasive learning environment. Educational Technology & Society, Vol. 12, No. 2, pp. 176-189.
Wood, D. 2009. Comments on “Learning with ICT: New perspectives on help seeking and information searching”. Computers & Education, Vol. 53, pp. 1048-1051.
Zadeh, L. A. 1965. Fuzzy sets. Information and Control, Vol. 8, pp. 338-353.
Biographical notes
Dr. Ruey-Ming Chao is an Assistant Professor in the Graduate School of Information and Social Science at National United University (NUU), Taiwan R.O.C. He received his Ph.D. degree in Management Information Systems (MIS) at Nova Southeastern University, USA in 1999. His current research interests include the general area of Knowledge Intensive Business Services (KIBS), Knowledge Management, and Digital Learning System Development & Management. In particular, he is interested in digital learning, and the development model and strategy of knowledge reuse to the enterprises.
Dr. Shaio Yan Huang is an Associate Professor of Accounting and Information Technology at the National Chung Cheng University, Taiwan. He is also a member of the International Affair and Education Board in Information Systems Audit and Control Association (ISACA), Taipei
Chapter. He published his research in International Journal of Accounting Auditing and Performance Evaluation, Journal of Applied Business Research, International Journal of Business System Research and Journal of Applied Management and Entrepreneurship. His main research interests are managerial accounting, accounting information system, computer auditing and financial accounting.
Jia-Nan Chang is a graduate student in the Department of Information Management at Da-Yeh University. Her research interests are Digital Learning and Knowledge Intensive Business Services.
Received August 2009
Accepted November 2009
Final acceptance in revised form November 2009