Self-Organizing Map Clustering Method for the

3 downloads 0 Views 286KB Size Report
clustering, prediction and visualization. While the tools used in. EDM include Waikato Environment for Knowledge Analysis. (WEKA), Rapid Miner, MATLAB, etc.
Self-Organizing Map Clustering Method for the Analysis of E-Learning Activities Musa Wakil Bara Department of Computer Science Mai Idris Alooma Polytechnic Geidam, Nigeria [email protected] Mohammed Maina Modu Department of Computer Science Mai Idris Alooma Polytechnic Geidam, Nigeria [email protected]

Abstract:- This research investigates the performance of SelfOrganizing Map (SOM) Clustering Method to analyze students’ e-learning activities with the aim to identify clusters of students who use the e-learning environment in similar ways using data obtained from the log files of their actions as input. The SOM clustering technique was used to group the students into three clusters. Learning behaviors of students in each cluster were analyzed; a relationship between students’ learning behaviour and their academic performance (Final Results) was investigated. The analysis shows that students in Cluster1, having the highest interactions frequency with the e-learning, also got the highest final score mean of 91.12%, this followed by Students in Cluster2 with less number of interactions than Cluster1 and final score mean of 75.65%. Finally, students in Cluster3 have least number of interactions than the remaining clusters with least final score mean of 36.57%. The research shows that, students who participate more in Forum activities perform higher, while students with lowest records in Forum activities have the lowest performance. The research found that Forum activity has significant factor on student’s course success but it is optional to students and no marks allocated to it. The research suggests that marks should be allocated to Forum activities to encourage students’ participations. Keywords— E-Learning, Educational Data Mining, SelfOrganizing Map, Clustering,.

I.

INTRODUCTION

The advancement in information technology has enhanced the effectiveness of web-based education (e-learning) system [1]. The e-learning system allows students from anywhere and at any time to carryout different learning activities such as viewing the learning materials, carrying out assignments, Forums participations and so on [2]. These activities usually take palace via Learning Management System (LMS), which provides access to the courses and facilitates communications between students and tutors or among the students. All students’ actions are stored in log files via the LMS. The students’ actions are characterized by meaningful learning which show the students’ learning behaviour.

978-1-5386-3761-6/18/$31.00 ©2018 IEEE

Nor Bahiah Ahmad Department of Sofware Engineering Universiti Teknologi Malaysia Johor Bahru, Malaysia [email protected] Hamisu Alhaji Ali Department of Computer Science Mai Idris Alooma Polytechnic Geidam, Nigeria [email protected]

The learning success of students depends on their learning behaviours. Some students are very active; some are medium active while some are low active. Students also vary in prioritizing the e-learning activities. Some students pay more attention on attendance and compulsory assignments only, while some will extend beyond compulsory activities by participating in discussion forums, participation in class exercise and searching for additional learning material. It is therefore very important to know and monitor the learning behaviour of each student in the e-learning. However, the rapid growth of data in the learning management system keeps the data growing everyday thus making it huge and unstructured [3], this makes it difficult to identify individual learning behaviour. Another issue in the LMS is that, it does not provide a means for knowledge discovery about the data in the system’s log files, thus making it challengeable to analyse students’ learning behaviour [4]. A proper method to extract and identify student learning behaviour from the LMS is highly required. Data mining is a method of knowledge discovery from databases. Data mining is the act of data analysis that involves the extraction useful knowledge on data that from different perspective, the process involves patterns, models and rules [5]; it can also be defined as the process that is concerned with the discovery of valuable and understandable information from massive databases. In e-learning, Data Mining (also called Educational Data Mining) is an emerging discipline, concerned with development of methods for discovering the uniqueness of data that come from educational database, and use the methods to for understanding learners and their learning skills [6]. These methods include classification, clustering, prediction and visualization. This paper is concerned with students’ involvement in elearning. It studies and analyzes the effects of students’ learning behaviors to their learning performance upon carrying out the e-learning activities. It also discovers which activity has more significance over the others. The research uses data obtained from log files of Computer Science undergraduate students taking a Data Structure Course at Faculty of

Computing, Universiti Teknologi Malaysia. The research is aimed to investigate the performance of Self-Organizing Map (SOM) clustering technique to obtain clusters of students who have similar way of interacting with the e-learning environment, and analyze the effects of their learning behaviors on their learning performance using their final semester result. II.

affect students’ performance. Reference [10] used educational data mining technique to improve the comprehensibility of the learning materials, students were clustered based on their behavioral usage of the e-learning, however, the researchers did not state which activities is more significant than the other. Fig. 2 depicts the steps of mining student activities in EDM.

ELEARNING AND EDUCATIONAL DATA MINING

Educational Environment

A. ELearning Reference [7] defined ELearning an educational method that involves the use of a wide range of technologies mostly online or computer-based in process of learning. E-learning does not completely replace the classroom setting but improves it by taking advantage of new content and delivery technologies to enable learning process. It is an environment in which learners interact with the learning materials, peers and lecturers [8]. The platform that supports the e-learning activities is known as Learning Management System (LMS) as in [9] where LMS was defined as a network-based technology that help in facilitating computerized learning, provision of an integrated platform for content delivery and content creators which act as central components of an enterprise e-learning. LMS stores the e-learning activities that are being carried out by students; the activities are stored in a log file and can be used to analyze the students’ habit as they use the e-learning. Examples of learning management system include Modular Object-Oriented Dynamic Learning Environment (Moodle), Felder and Silverman, Atutor, Blackboard, etc. [10]. Fig. 1 shows some benefits of learning management system (LMS)

ELearning Activities

Data Preprocessing

EDM Methods

Patterns

Interpret Fig. 2 Educational Data Mining Steps

III.

SELF-ORGANIZING MAP CLUSTERING METHOD

Clustering is a fundamental task in educational data mining. It is an unsupervised learning that groups data instances into subsets such that instances in one group are similar but differ with instances in other groups [12], [13] & [14]. Clustering technique was first applied in e-learning by Gwo-Jen Hwang [15], where a network-based and diagnostic system was implemented and uses dynamic programming approach for test sheets generation. In clustering techniques, the number of clusters in data partitioning can either be predefined by the researcher [16] or be obtained using cluster validity indices [12]. Fig. 1.Some Benefits of Learning Management System

B. Educational Data Mining Educational Data Mining (EDM) is an emerging discipline, concerned with development of methods for exploring the uniqueness of data that come from educational database, and use those methods to better understand learners and their learning settings [6]. These methods include classification, clustering, prediction and visualization. While the tools used in EDM include Waikato Environment for Knowledge Analysis (WEKA), Rapid Miner, MATLAB, etc. Educational data mining fits various research works in elearning such identifying students’ learning styles [6], prediction of students’ performance as in [10] & [11] and determining the situation, characteristics and phenomena that This research was carried out at Universiti Teknologi Malaysia Supported by Mai Idris Alooma Polytechnic Geidam Nigeria and Sponsored by Tertiary Education Trust Fund TETFund Nigeria.

Self-Organizing Map (SOM), also called Kohonen Map is an unsupervised learning algorithm invented by Professor Kohonen in 1982, and was described as a tool for “visualization and analysis” of high dimensional data [17]. There are two phases in SOM, the learning phase, where maps are built, network is organizes using competitive process on training set, and the second phase is (also called prediction phase) where new vectors are allocated a space to the converged map and new data can easily be classified or categorized. Reference [18] applied K-Means clustering method to group E-learners based on meaningful learning characteristics, however, emphasis was not given on the priority of the learning activities. In [14], different Fuzzy methods were used to identify learners behaviours, the researchers use days and

times to see how students read the learning materials. It does not break the eLearning activities. The idea of choosing SOM Clustering method is because of its advantages which include the mapping of high-dimensional data to low-dimensional data such as 1-D or 2-D. Another advantage of SOM is easier visualization of data, it also supports large amount of data and gives high mining accuracy.

3. Matching: Find the winning neuron I(x) with weight vector closed to input vector; 4. Updating: Apply the weight update equation Δwij =Ƞ(t)Tj,I(x)(xi-wij) 5. Continuation: Return to step 2 until the feature map stops changing.

V. IV.

METHODOLOGY

The raw data contains some features and information that have no effect on accuracy of the method used during the mining process. Before carrying out any analysis on the collected data, the data must first be preprocessed. The importance of preprocessing is to make the data clean from noisy data, missing values or inconsistent data. The major tasks involved in data preprocessing include data cleaning in which the raw data is being cleaned; these include removal of noisy data, filling the missing values and identifying outliers; data transformation which involved transforming the cleaned data into format that is readable by the data mining tool. In this research, the cleaned data was transformed to Comma Separated Version (CSV) before been exported to MATLAB for clustering. A. Data Preprocessing The raw data contains some features and information that have no effect on accuracy of the method used during the mining process. Before carrying out any analysis on the collected data, the data must first be preprocessed. The importance of preprocessing is to make the data clean from noisy data, missing values or inconsistent data. The major tasks involved in data preprocessing include data cleaning in which the raw data is being cleaned; these include removal of noisy data, filling the missing values and identifying outliers; data transformation which involved transforming the cleaned data into format that is readable by the data mining tool. In this research, the cleaned data was transformed to Comma Separated Version (CSV) before been exported to MATLAB for clustering. B. SOM Implementation After preprocessing the data, Self-Organizing Map (SOM) Clustering method was used to cluster the data. Cluster validation was conducted using Calinski-Harabasz Criterion and the optimum number of k (Clusters) was found to be three. SOM was launched in MATLAB using the nctool command in the command window. The network consists of six (6) inputs variables namely the Source_View, Resource_View, Assign_View, Assign_Submit, Forum_View and Forum_Discuss as shown in Fig.3, these are the e-learning activities carried out by the students during the semester. The algorithm for SOM Clustering is given below: 1. Initialization: Choose random values for initial weights wij; 2. Sampling: Draw a sample training input vector x from the input space;

EXPERIMENT AND RESULTS

A. Dataset The dataset was collected from the Universiti Teknologi Malaysia (UTM) Moodle LMS log records. It contains the activities of Computer Science undergraduate students from Faculty of Computing, Universiti Teknologi Malaysia during the period of first semester 2014/1015 session. The activities of 22 regular students were monitored from the beginning of the semester to the end. The semester lasted for fourteen weeks. The data was downloaded in excel format. The interface is the medium that is used to carry out the data collection from the log records. B. SOM Network Creation This implies the creation of neural network that would perform the SOM clustering. The network consists of a number of neurons that determine the map size; there are two ways of determining the map size. The first method is to determine the number of neurons using equation (1) (Jian Tian et al, 2015) [15]. (1) Where M is an integer and it represents the approximate number of neurons required in the network, N is the number of observations (number of students in this case) on which the research takes place. The second method is to set the number of neurons equals to the predefined number of clusters [16]. Each output neuron represents a cluster. In this case, the number of neurons was set to three (3), each neuron represents a cluster center. The network structure is shown in Fig. 3.

Fig. 3 SOM Network with 6 Inputs and 3 Outputs

C. Training the Network The network was then trained using the data at hand and the result obtained is shown in Fig. 4. The result shows three clusters, each cluster contains instances associated to it.

VI.

Fig. 4 Three Clusters and Associated Instances

D. Result Analysis From the experimental result, the students are grouped into three clusters. Each cluster contains students with similarity in interaction with the e-learning environment. Cluster1 contains students who have the highest number of actions thus; they are termed as Very Active Students. Students in Cluster2 have more actions than those in Cluster3 and they are termed as Active Students. Cluster3 students have lowest actions hits and are termed Non-Active, this categorization is based on students’ interactions and in line with the work of [10] categorization. On the other hand and based on students’ course success, Clusters are categorized as High Learners (Cluater1) with the highest marks 91.12% in their Final Scores, Medium Learners (Cluster2) with Final Score marks 75.65% and Low Learners (Cluster2) with the lowest Final Score marks of 36.57. This categorization is in line with Kardan and Cornati [19]. Fig. 5 shows the graphical representation of Actions hits means and the Final Score Means of each Cluster.

Fig. 5 Showing the Activities of Each Cluster and Their Final Marks

FINDINGS

From the experiments carried out, it was learnt that, students’ actions on e-learning activities differ according to their learning behaviors. As a result, the students are grouped into clusters such that each cluster contains students with similarities in learning behavior. The research discovered three categories of students, those that are Very Active (Cluster1), students of this group have higher number of interactions with the E-learning especially participation in Forum activities and they have the highest grades. Active (Cluster2), students of this group have average number of interactions with the Elearning and have average final grades, those that are Non-Active (Cluster3) students of this group have least number of interactions with the E-learning and they all failed the course. It can therefore be concluded that, students’ learning behaviour contributes to their learning performance. It was also learnt that, the E-learning activities differ in contributing to student’s learning performance. Students who participate more in Forum (Cluster1) have high chance to outperform the student who do not participate more in Forum. Likewise, students who failed to submit their assignments regularly have chance to fail the course. It was learnt that participation in Forum activities by students is optional and there is no mark allocated to it, this is why students feel reluctant to participate. However, if marks can be allocated to Forum participation, students can give it a priority and this would improve their learning performance. VII. CONCLUSION AND FUTURE WORK In this research, the implementation of SOM Clustering Method to group students according to their similarities in interaction with the E-learning was explained. Three clusters were obtained and are termed based on students’ learning success as High Learners (Cluster1) with Final Score of 91.12%, Medium Learners (Cluster2) with Final Score of 75.65% and Low Learners (Cluster3) with Final Score of 36.57% in line with Kardan and Cornati [19], as Very Active Students (Cluster1), Active Students (Cluster2) and Non-Active Students (Cluster3) in line with [20] categorization. The researchers also conducted analysis on the correlation between students’ learning behaviors and their course success (performance). The result showed that, students that fall in both ‘High Learners’ and ‘Very Active’ group (Cluster1) emerged to be best in their course success. The Final grade is A+ or an excellent pass for this group. Cluster2 contains students that are Active and High Learners, and passed with an A grade. Finally, Cluster3 students emerged to be both Low Active and Low

Learners with an F grade which means they failed the course. VIII. REFERENCES [1]

L. Xu, C. Jiang, J. Wang, J. Yuan, and Y. Ren, “Information Security in Big Data: Privacy and Data Mining,” IEEE Access, vol. 2, pp. 1149–1176, 2014.

[10]

A. Bogarín, C. Romero, R. Cerezo, and M. Sánchez-Santillán, “Clustering for improving educational process mining,” Proceedins Fourth Int. Conf. Learn. Anal. Knowl. - LAK ’14, pp. 11–15, 2014.

[11]

S. Sisovic, M. Matetic, and M. B. Bakaric, “Mining Student Data to Assess the Impact of Moodle Activities and Prior Knowledge on Programming Course Success,” Proc. 16th Int. Conf. Comput. Syst. Technol., pp. 366–373, 2015.

[12]

L. Rokach and O. Maimon, “Chapter 15— Clustering methods,” Data Min. Knowl. Discov. Handb., p. 32, 2010.

[2]

B. Baradwaj and S. Pal, “Mining educational data to analyze student’s performance,” Int. J. od Advamced Comput. Sci. Appl., vol. 2, no. 6, pp. 63–69, 2012.

[13]

[3]

Félix Castro, A. Vellido, À. Nebot, and F. Mugica, “Applying Data Mining Techniques to e-Learning Problems\r,” Stud. Comput. Intell., vol. 62, no. 2007, pp. 183–221, 2007.

A. S. Sabitha, D. Mehrotra, and A. Bansal, “Delivery of learning knowledge objects using fuzzy clustering,” Educ. Inf. Technol., 2015.

[14]

[4]

U. F. Alias, N. B. Ahmad, and S. Hasan, “Student Behavior Analysis Using Self-Organizng Map,” vol. 10, no. 23, pp. 1–9, 2013.

E. E. Technology and T. Benha, “Evaluation of E-Learners Behaviour using Different Fuzzy Clustering Models : A Comparative Study,” vol. 7, no. 2, pp. 131–140, 2010.

[15]

G. Hwang, "A conceptual map model for developing intelligent tutoring systems." Computers & Education 40.3 2003: 217-235.

[5]

U. Fayyad, “The KDD Process for Extracting Useful Knowledge from Volumes of Data,” vol. 39, no. 11, pp. 27–34, 1996.

[16]

[6]

V. Efrati, C. Limongelli, and F. Sciarrone, “A Data Mining Approach to the Analysis of Students’ Learning Styles in an eLearning Community: A Case Study,” Univers. Access HumanComputer …, pp. 289–300, 2014.

G. Akcapynar, A. Altun, and E. Cosgun, “Investigating students’ interaction profile in an online learning environment with clustering,” Proc. - IEEE 14th Int. Conf. Adv. Learn. Technol. ICALT 2014, pp. 109–111, 2014.

[17]

F. Bação and V. Lobo, “Introduction to Kohonen’s Self-Organising Maps,” Inst. Super. Estat. E Gest. Inf., p. 22, 2010.

[18]

N. Octaviania, Dewi, Mohd Shahizan Othmana, Norazah Yusofa, and Andri Pranoloc. "Applied clustering analysis for grouping behaviour of e-learning usage based on meaningful learning characteristics." Jurnal Teknologi., vol. 1, pp. 125–138, 2015.

[19]

S. Kardan and C. Conati, Kardan, Samad, and Cristina Conati. "A framework for capturing distinguishing user interaction behaviors in novel interfaces." Educational Data Mining 2011.

[20]

C. Romero, S. Ventura, and E. García, “Data mining in course management systems: Moodle case study and tutorial,” Comput. Educ., vol. 51, no. 1, pp. 368–384, 2008.

[7]

P. Dráždilová, G. Obadi, K. Slaninová, S. Al-Dubaee, J. Martinovič, and V. Snášel, “Computational intelligence methods for data analysis and mining of eLearning activities,” Stud. Comput. Intell., vol. 273, pp. 195–224, 2010.

[8]

K. Niu, Z. Niu, D. Liu, X. Zhao, and P. Gu, “A Personalized User Evaluation Model for Web-Based Learning Systems,” 2014.

[9]

N. H. M. Ariffin, H. A. Rahman, N. A. Alias, and J. Sardi, “A survey on factors affecting the utilization of a Learning Management System in a Malaysian higher education,” IC3e 2014 2014 IEEE Conf. e-Learning, e-Management e-Services, pp. 82–87, 2015.

Suggest Documents