A Decision Support System to improve e-Learning Environments

Marta Zorrilla ([email protected])
Diego García ([email protected])
Elena Álvarez ([email protected])

Marta Zorrilla and Diego García: Department of Mathematics, Statistics and Computation, University of Cantabria. Elena Álvarez: Department of Applied Mathematics and Computer Science, University of Cantabria. Avda. Los Castros s/n, Santander, Spain. Tel.: +34 942 20 20 63 / +34 942 20 14 20 / +34 942 20 18 44.

ABSTRACT
Nowadays, due to the lack of face-to-face contact, distance course instructors have real difficulties knowing who their students are, how they behave in the virtual course, what difficulties they find and what probability they have of passing the subject; in short, instructors need feedback which helps them improve the learning-teaching process. Although most Learning Content Management Systems (LCMS) offer a reporting tool, in general these do not give a clear vision of each student's academic progression. In this work, we propose a decision support system which helps instructors answer these and other questions using data mining techniques applied to data from LCMS databases. The goal of this system is that instructors do not require data mining knowledge: they only need to request a pattern or model, interpret the result and take the educational actions which they consider necessary.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Data mining

General Terms
Algorithms, Management, Design, Experimentation

Keywords
Data mining, Web mining, Data warehouse, E-learning, Distance education.

1. INTRODUCTION
In recent years, more and more universities and educational centers offer the possibility of enrolling in their degrees and masters in a semi-presential or completely virtual (online) way, in order to facilitate lifelong learning and make it compatible with other activities. In general, they use e-learning platforms such as learning content management systems (LCMS), intelligent tutoring systems, adaptive and intelligent web-based systems, etc. to support the learning and teaching process.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. EDBT 2010, March 22-26, 2010, Lausanne, Switzerland. Copyright 2010 ACM 978-1-60558-945-9/10/0003 ...$10.00.

A lot has been written and said about guidelines for designing virtual courses [3][5][25], and more and more instructors follow them with the aim of increasing the pass rate. But even if a course is well designed, it may not be suitably adapted to students' learning styles [10][14], or perhaps students feel under-attended or lost in the hyperspace of the course and require extra motivation [6]. Unfortunately, instructors have very few tools to monitor and track student activity in the platform and so detect and solve these problems. These systems offer some reporting tools which, in general, show raw data (number of accesses, time spent in the course, number of messages read, etc.) in a tabular format. As a consequence, getting a clear vision of each student's or group's academic progression during the course is difficult and time-consuming for instructors [8]. At the same time, these systems accumulate a vast amount of very valuable information which, if suitably treated, can be used to analyze students' behavior and the effectiveness of the course design, to predict students' performance and their final mark, to group students according to their preferences and, in short, to improve the educational process. It is here that decision support systems have a practical application of great interest for the educational community, since they are oriented to defining and measuring business key performance indicators (KPI), understanding their behavior, and processing, summarizing, reporting and distributing the relevant information on time.

We have already developed a tool called MATEP [36] for monitoring and analyzing learners' behavior in e-learning platforms, in particular for WebCT 4.0 (now Blackboard).
This tool uses data registered in a data warehouse which comes from the log files and the LCMS database (see Figure 1). MATEP provides instructors with useful, consistent and understandable information by means of expressive and easy-to-use static and dynamic reports, which are built by querying the data warehouse directly or through the OLAP cubes. With these reports, instructors know how their students progress in the virtual course, compare their activity with the average student activity, get an idea of their learning style

according to the resources students use, and assess the course design by examining the click stream generated by the students. But these reports do not yet answer other questions such as:

• Knowing students' profiles according to demographic and navigation information
• Grouping students according to their learning style
• Knowing drop-out students' profiles
• Predicting students' grades
• Finding out the questions which students fail most frequently
• And so on.

Questions such as these cannot be answered unless data mining techniques are used. For that reason, it is convenient that LCMSs add modules which support “intelligent techniques”. Currently, data mining tools (Weka [30], Keel [2], etc.) are normally designed more for power and flexibility than for simplicity. Most current data mining tools are too complex for educators to use, and their features go well beyond the scope of what an educator may want to do [23]. Consequently, these modules must incorporate data mining capabilities behind an intuitive and easy-to-use interface which requires choosing neither algorithms nor their parameters, and must offer good visualization facilities to make their results meaningful to educators and e-learning designers. As far as we know, there are two works in this direction, TADA-Ed [15] and the Moodle Data Mining Tool [23], but in both cases instructors must have some knowledge of data mining to use them.

Our proposal is to extend our data warehouse architecture in order to generate and store the data mining models [18]. That means choosing the variables that allow us to answer each question, specifying the pre-processing tasks they require, storing them suitably in the data warehouse, and determining the algorithm to be used, in such a way that instructors only need to interpret the result and take the educational actions which they consider necessary; finally, the data mining models obtained are stored in order to calculate incremental and refined patterns later.

The paper is organized as follows. Section 2 introduces data mining applied to the educational context and mentions works published in this area. Section 3 explains the proposed architecture for the decision support system. Section 4 gives details about how to obtain some patterns which are interesting for instructors. Finally, Section 5 summarizes the goals of the proposed work.

Figure 1. Extended MATEP architecture

2. RELATED WORK
Educational data mining (EDM) is an emergent discipline concerned with developing methods for exploring the unique types of data that come from the educational context [22]. In short, EDM is the application of data mining techniques in the area of education, with the aim of obtaining a better comprehension of students' learning process and of how they participate in it, in order to improve the quality of the educational system.

Data mining techniques are extensively used in other fields such as business, marketing, bioinformatics, science and so on, but the specific characteristics of data from e-learning environments make their application particular. One of these characteristics is the fact that it is difficult, or even impossible, to compare different methods or measures a posteriori and decide which is the best [16]. Take the example of building a system to transform handwritten documents into printed documents. This system has to discover the printed letters behind the handwritten ones, and it is possible to try several sets of measures or parameters and see what works best. Such an experimentation phase is difficult in the educational field because the data is very dynamic and can vary a lot among samples (different course designs, students with different skills, different methods of assessment, different resources used, etc.). This reduces the amount of data available to mine to that corresponding to the students enrolled in the course. Furthermore, as a consequence of not using more data than that stored in the database of the e-learning platform, data mining models lack context information. That means that we will obtain a model, but it will surely not be the best one. We would obtain more accurate patterns if we knew more about course details, or had background knowledge of the students or their interest in the course (this information could be obtained from surveys, for example).
One advantage is that the data sets are usually very clean, i.e., the values are correct, so few pre-processing tasks are required. There are a great number of works in which data mining techniques are used in order to understand learner behaviour [12][27], to recommend activities, topics, etc. [34], or to provide instructional messages to learners [29] with the aim of improving the effectiveness of the course, as well as to promote group-based collaborative learning [20], to predict students' performance [12], etc. A survey on the application of data mining to educational systems can be found in [22].

Other data mining fields related to our aim are interactive data mining and visual data mining. The first aims to investigate ways in which the user can become an integral part of the mining process [19][35]. The need for user inclusion is based on the premise that the concept of interestingness is subjective rather than objective and cannot therefore be defined in heuristic terms. Visual data mining focuses on integrating the user in the KDD process by means of effective and efficient visualization techniques, interaction capabilities and knowledge transfer [9][13][31]. Although our aim is to avoid instructors needing to know data mining techniques in order to take advantage of them, the advances in both areas will help us in two ways: in including the instructor's participation in the knowledge discovery process if necessary [4]; and in explaining the data mining results [33].

Finally, it must be mentioned that there are other research works which focus on analysing distance student data using other technologies such as data warehousing and OLAP, for example [26][32][37].

3. ARCHITECTURE
The proposed system, with the aim of being generic and usable for different e-learning platforms, is designed on a modular architecture, as can be observed in Figure 1. It will have at least the following modules:

• A module to read and gather data from the e-learning platform, carry out the pre-processing tasks required by the data mining algorithms, and store this data in the data warehouse database (this module gathers the ETL processes and the Data Staging Area)
• A module which wraps the data mining algorithms (Data Mining Module)
• A user-friendly interface oriented towards the analysis of results.

Three open-source data mining software packages, RapidMiner [17], Weka [30] and Keel [2], will mainly be used and tested for our proposal. We have chosen these tools because they are open-source, their algorithms can either be applied directly to a dataset from their own interface or called from one's own Java code, and all of them contain tools for data pre-processing, classification, regression, clustering, association rules and visualization. Furthermore, RapidMiner is currently the leading open-source data mining solution according to the KDnuggets Data Mining Software Usage poll in 2009 [21]. Although RapidMiner incorporates most of the Weka algorithms, it also contains some algorithms, especially in the area of descriptor selection, which are not available in other software. Finally, Keel will be used to build models using evolutionary algorithms which are not included in the other tools. With regard to commercial tools, BI SQL Server 2005 will be used, due to the fact that our data warehouse is developed with it.

The communication among modules will be done by means of XML files. Wherever possible, we use the standard Predictive Model Markup Language (PMML). This is an XML-based language which provides a way for applications to define statistical and data mining models and to share models between PMML-compliant applications. The user interface will be designed according to web standards.

Likewise, we also think that a data mining service [28] independent from this architecture would be useful, in such a way that instructors could benefit without having to deploy the whole system: they would only need to prepare the input data file according to a template, request the generation of the model and interpret the results.
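To illustrate the kind of XML exchanged between modules, the sketch below serializes a hypothetical clustering result as a minimal PMML-style document using Python's standard library; the model name, field names and centroid values are invented for illustration and are not part of the real system.

```python
import xml.etree.ElementTree as ET

def clustering_model_to_pmml(fields, centroids):
    """Serialize a k-means result as a minimal PMML-style XML document.

    fields: list of input variable names; centroids: one coordinate list
    per cluster. Values here are illustrative, not the real course model.
    """
    pmml = ET.Element("PMML", version="4.4")
    data_dict = ET.SubElement(pmml, "DataDictionary",
                              numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(data_dict, "DataField", name=name,
                      optype="continuous", dataType="double")
    model = ET.SubElement(pmml, "ClusteringModel",
                          modelName="StudentProfile",
                          functionName="clustering",
                          numberOfClusters=str(len(centroids)))
    for name in fields:
        ET.SubElement(model, "ClusteringField", field=name)
    for i, centre in enumerate(centroids):
        cluster = ET.SubElement(model, "Cluster", id=str(i))
        array = ET.SubElement(cluster, "Array",
                              n=str(len(centre)), type="real")
        array.text = " ".join(str(v) for v in centre)
    return ET.tostring(pmml, encoding="unicode")
```

A document built this way, e.g. `clustering_model_to_pmml(["Sessions", "TotalTime"], [[80.6, 1313.5], [146.4, 2290.5]])`, can be parsed back by any of the modules, which is precisely the role PMML plays between PMML-compliant applications.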

4. PROPOSAL OF PATTERNS
Initially, the set of models which we design uses descriptive techniques such as clustering and association. This allows us to gain an insight into students' characteristics and depict students' learning patterns. Later, we will deal with prediction and classification tasks, once we have determined which parameters are most relevant. We start by answering questions such as:

Knowing students’ profiles according to demographic and navigation information



Knowing drop-out students’ profile and successful students’ profile



Knowing session patterns



Grouping students according to their use of course resources



Finding out the questions which students fail more frequently



Discovering the resources which are commonly used together

As has been said, we have a data warehouse database, so the parameters from the LCMS database are already prepared for decision making. This means that the pre-processing tasks are reduced. Table 1 gathers the set of variables with which we can work. These can be obtained at different levels of aggregation. For example, the time a student spends can be calculated per session, per resource, per action, per week, etc.

Table 1. Some of the available variables.

Nº sessions                        Learner gender
Nº hits                            Learner academic level
Time spent                         Learner age
Delay among sessions               Nº chat rooms entered
Nº content pages viewed            Nº wiki pages edited
Nº messages posted to forum        Nº quizzes done
Nº messages read on forum          Nº quizzes passed
Nº messages replied to on forum    Nº quizzes failed
Nº messages sent by mail           Nº items in test
Nº messages read on mail           Nº attempts in item
Nº assignments read                Grade in item
Nº assignments submitted           Mark in each assessment, test, or assessed task
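As an illustration of the aggregation levels mentioned above (per session, per week, etc.), the following sketch totals per-session minutes into per-week figures; the (student, date, minutes) record layout is our own assumption, not the actual warehouse schema.

```python
from collections import defaultdict
from datetime import date

def time_spent_per_week(sessions):
    """Aggregate per-session minutes into ISO-week totals per student.

    sessions: iterable of (student_id, session_date, minutes) tuples;
    this record layout is illustrative only.
    """
    totals = defaultdict(float)
    for student, day, minutes in sessions:
        year, week, _ = day.isocalendar()  # ISO year and week number
        totals[(student, year, week)] += minutes
    return dict(totals)
```

For example, `time_spent_per_week([("s1", date(2009, 3, 2), 30), ("s1", date(2009, 3, 4), 15), ("s1", date(2009, 3, 9), 20)])` returns `{("s1", 2009, 10): 45.0, ("s1", 2009, 11): 20.0}`; the same grouping key can be swapped for a session or resource identifier to obtain the other aggregation levels.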

In general, numerical data will be discretized into categorical classes that are easier for the instructor to understand (categorical values are more user-friendly than precise magnitudes and ranges) [7]. For example, the mark could be divided into 4 classes: fail, pass, good and excellent, if the system has the criteria to do it. The equal-width method will generally be used to divide the range of an attribute into intervals; for example, the number of sessions could be divided into low, medium or high.

We define a session as a series of requests by the same identified student (user) from the moment he or she connects to the course until he or she disconnects or leaves it. We consider a hit to be each click on a web page. A resource is any tool available in the LMS such as mail, forum, wiki, and so on. Lastly, we define an action as any activity inside a resource, for example sending a mail, browsing content pages, etc.

Next, we show two possible templates for building the student profile and session profile, and a template for discovering the resources which are used together. For this case study we have used the data from a virtual course entitled "Introduction to multimedia methods". It is a subject of 6 ECTS which was taught in the first semester of 2009 at the largest virtual campus in Spain, called G9 (this group is composed of 9 Spanish universities, one of them the University of Cantabria). It is a practical subject in which a multimedia tool is taught. The course is designed by means of web pages conforming to SCORM [24] and includes some video tutorials, flash animations and interactive elements. It is registered in the Blackboard LMS. Although the number of students enrolled in the course was 80, only 45 did the first assignment, whose submission was due 15 days after the beginning of the course, and finally 37 students followed the course until the end.

First, we analyse the student profile. This template utilizes the following input parameters: gender, age, number of sessions in the course, time spent in the course, average sessions per week, and average time spent per week. These parameters were chosen in agreement with the instructor, since the model obtained will be explained in terms of them. We use EM (Expectation-Maximization) and KMeans as clustering algorithms [30].
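As a minimal sketch of the clustering step (the system itself relies on the Weka implementations [30], with EM run first to estimate the number of clusters), a plain k-means over numeric student attributes might look like this; the function name and toy data are ours:

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means over numeric tuples: returns centroids and assignments."""
    rnd = random.Random(seed)
    centroids = [list(p) for p in rnd.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # assignment step: attach every point to its nearest centroid
        for i, p in enumerate(points):
            assignment[i] = min(range(k),
                                key=lambda c: math.dist(p, centroids[c]))
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for i, p in enumerate(points) if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, assignment
```

Feeding it (age, sessions) tuples for the enrolled students with the k suggested by EM yields centroids analogous to those reported in Figure 2.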
Given that the EM algorithm provides a probability distribution that can be used as a similarity criterion to characterize the data, we utilize it in order to determine the number of clusters with which the KMeans algorithm will be executed (a required parameter). We show the result obtained with KMeans because it is easier to understand graphically and statistically. Each cluster is represented by its centroid, that is, the "average" of all its points. As can be extracted from Figure 2, 44% of the students are females of around 22 years old who carry out more and longer sessions than the rest. Males of about 24 represent 26% of the population; they dedicate less time per session (practically half) and their number of visits is lower. The youngest males (31%) have a behavior quite similar to the female group. The image on the right shows each cluster graphically, using the percentile in which each cluster variable value is located. Clusters 1 and 2 are very similar; the difference lies in the gender variable, which is not represented because it is not numerical.

Next, we show the template for discovering the resources which are commonly used together. It contains the following parameters: session id, and a Boolean variable indicating whether each one of the available resources in the course was visited: content-page, mail, discussion, chat, assignments, weblink, organizer, learning objectives, assessment, calendar and others.

Figure 2. Student profile.

                  Average     Cluster 0    Cluster 1    Cluster 2
Age               22.4872     24           22.1765      21.6667
Gender            Male        Male         Female       Male
TotalTime         1976.8065   1313.4806    2290.4953    2085.185
Sessions          128.0323    80.6032      146.4175     141.5108
AvgTimeWeek       115.8065    76.8806      134.26       122.1022
AvgSessionWeek    7.0323      4.3032       8.1233       7.7608
Instances                     26% (10)     44% (17)     31% (12)

We use the Apriori algorithm [1] (association rules) in this case, since its goal is to find frequent itemsets. Because some of the resources were used very little, the variables with a rate of use below 15% of the instances (sessions) were removed. The file used had 5666 instances. We executed the algorithm with a minimum support of 0.2 and a minimum confidence of 0.7. A support of 20% means that in 20% of all the sessions under analysis the resources in the antecedent and consequent of the rule are used together. A confidence of 70% means that 70% of the sessions which used the resources in the antecedent of the rule also used the resources which appear in the consequent.
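The frequent-itemset phase of Apriori can be sketched in a few lines of pure Python; the session transactions and threshold below are illustrative toy data, not the course data:

```python
def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets.

    transactions: list of sets of resource names; support is the fraction
    of transactions containing the itemset.
    """
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # level 1: frequent single resources
    singles = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in singles if support(s) >= min_support}
    frequent = {}
    while level:
        frequent.update((s, support(s)) for s in level)
        # join step: only unions of frequent k-itemsets can give frequent (k+1)-itemsets
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent
```

For instance, with sessions `[{"organizer", "discussion"}, {"organizer", "discussion", "mail"}, {"organizer"}, {"discussion"}]` and a minimum support of 0.5, the itemset {organizer, discussion} comes out with support 0.5; the confidence of a rule such as organizer ⇒ discussion is then computed in the second phase as support({organizer, discussion}) / support({organizer}).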

The minimum support parameter is derived from the frequent itemset calculation (the first phase of the algorithm, see Table 2), 0.114 in this case, and is established a little higher, at 0.2, in order to obtain more interesting rules; the confidence parameter is established slightly below the support of the most frequent itemset (organizer, with 80.3% in this case), in order to obtain rules in which other resources appear.

Table 2. Most interesting frequent itemsets

Resources                                      Percentage (%)
Organizer                                      80.3
Discussion                                     61.3
Other                                          43.5
Assignment                                     36.9
Content                                        31.8
Mail                                           19.4
Assignment, Discussion, Organizer              18.0
Content, Discussion, Organizer                 15.8
Content, Assignment, Organizer                 12.5
Assignment, Other, Discussion, Organizer       11.4

[Figure 3. Most interesting rules obtained about the use of resources. The figure lists association rules (among them R8, R16, R20, R23, R24, R27, R28, R31, R33, R41, R44 and R45) over the discussion, assignment, mail and content-page resources, each with its number of instances, support (0.24-0.49) and confidence (0.71-0.89).]

The proposed system will directly offer instructors the rules, although they will be able to modify this latter parameter and request a new rule set. Table 2 and Figure 3 illustrate the most interesting results obtained. Table 2 shows that students do not use all the resources in each session; indeed, using all the resources in a single session is quite infrequent, and only 11.4% of sessions used 4 resources. The organizer is the most used resource because it is the main page of the course, followed by the discussion, then others (announcements, calendar, urls, web-links, etc.), and finally assignments and content-pages. As can be observed in Figure 3, the forum is used in 61% of the sessions, and in only 27% of these sessions do students also access content-pages (R41). In the case of mail, this percentage is lower, only 20% (R23). There are 43% of sessions in which students access neither content-pages nor the assignment tool, and of these, 71% access the discussion tool (R45).

After this result, it is reasonable to want to know what the session pattern is. The template for obtaining the session profile uses the following input variables: time spent in the session (minutes), hits and time spent on content-pages, hits and time spent on collaborative resources (mail, discussion, chat) and on the rest of the resources of the course. The criterion used to build the clustering model was the same as the one utilised to generate the student profile. Observing Figure 4, we can discover that most sessions are very short (6 minutes) and generally focused on reading the forum (cluster 0). Sessions in which students spend more than half an hour are scarcely 14% (clusters 2 and 3), and although the number of hits in the discussion tool is the highest, most of the time is dedicated to content-pages and assignments. Finally, cluster 1 gathers brief sessions which can be considered as consulting visits.

Figure 4. Session profile.

                         Average   Cluster 0    Cluster 1   Cluster 2   Cluster 3
SessionTime              14.0658   6.0482       27.0222     51.8116     66.7015
hit_mail                 0.6873    0.6191       2.2519      0.9799      0.6741
hit_discussion           8.9338    7.1086       23.9259     17.4246     16.9726
hit_chat                 0.0625    0.0021       2.3185      0.0402      0.0373
hit_contentpage          1.4481    0.6111       2.363       0.8769      11.5572
hit_assignments          1.1112    0.5813       3.3111      6.2739      1.4975
hit_weblinks             0.0672    0.0184       0.6222      0.0804      0.4428
hit_organizer            2.3489    1.5293       3.9259      3.1206      10.7015
hit_learningobjectives   0.1315    0.0955       0.7333      0.1834      0.301
hit_other                1.0856    0.7269       5.3333      3.4648      1.5249
time_mail                0.725     0.5591       4.1852      1.2965      0.9502
time_discussion          3.1068    1.9746       5.8667      9.0101      9.6592
time_chat                0.0018    0            0.0741      0           0
time_contentpage         4.9652    1.7227       5.8963      3.2613      44.5
time_assignments         2.9017    0.6796       5.3111      27.7412     3.6517
time_weblinks            0.0321    0.0116       0.6296      0.0226      0.0821
time_organizer           0.6511    0.2741       1.1037      0.9573      4.6318
time_learningobjectives  0.0178    0.0148       0.0444      0.0201      0.0423
time_other               1.201     0.501        2.5407      8.3367      1.9254
Instances                          83% (4731)   2% (135)    7% (398)    7% (402)

These reports were shown to the instructor in charge of the course, and in her opinion they allow her to gain an insight into the characteristics of her students with regard to the time spent and the use of the resources available in the course. Although it is true that part of the learning process can be carried out without being connected, the interaction of the students with the different resources contains information that can improve it, and allows instructors to validate or refute hypotheses used in the design of the learning process. For example, knowing that there are few sessions in which students access content-pages leads the instructor to suppose that most students do not study while connected or, what would be worse, that they do not read the content-pages at all. This can alert instructors to, for example, a bad design of the content-pages. Likewise, knowing that the forum is visited in practically every session, and that it is the main resource used, obliges instructors to have the knowledge and skills for the suitable utilization of this learning tool. On the other hand, the instructor suggested that we improve the presentation of the results in order to make them more understandable, for example by adding an explanation similar to the one used in this work.

5. CONCLUSIONS
Adding intelligence to e-learning platforms means giving tools the ability to understand and profit from data (experience). Consequently, in this paper we present the proposal of a decision support system which helps distance instructors to know who their students are, how they work, how they use the virtual course, where they find problems and so on; in this way, instructors can act as soon as they detect any difficulty, for example by proposing new tasks, re-organizing the content-pages, adding new information or opening discussions. Likewise, we propose some questions that, in our opinion, are of interest to teaching staff, and show how the answers, obtained by means of data mining techniques, are very useful for improving the learning and teaching process. Lastly, we also suggest a modular architecture for its implementation.

This work presents two main challenges: firstly, to determine the input variables, the technique and the parameters with which to execute the algorithms to answer the teachers' questions appropriately; and secondly, to define a graphical interface which allows instructors to interpret the results easily. Regarding the first challenge, this work defines three templates which must be validated with data from other virtual courses in order to be considered adequate; with regard to the second, we are studying different research works carried out in this field, such as [9][13][31].

6. ACKNOWLEDGMENTS
The authors are deeply grateful to CEFONT, the department of the University of Cantabria responsible for LCMS maintenance, for their help and collaboration. Likewise, the authors gratefully acknowledge the valuable suggestions of the anonymous reviewers.

This work has been partially financed by the Spanish Ministry of Science and Technology under projects 'TIN2007-67466-C02-02' and 'TIN2008-05924'.

7. REFERENCES
[1] Agrawal, R. and Srikant, R. 1994. Fast Algorithms for Mining Association Rules in Large Databases. In: 20th International Conference on Very Large Data Bases, pp. 478-499.
[2] Alcalá-Fdez, J., Sánchez, L., García, S., Jesus, M.J., Ventura, S., Garrell, J.M., Otero, J., Romero, C., Bacardit, J., Rivas, V.M., Fernández, J.C. and Herrera, F. 2009. KEEL: A Software Tool to Assess Evolutionary Algorithms to Data Mining Problems. Soft Computing 13(3), pp. 307-318. doi: 10.1007/s00500-008-0323-y
[3] Álvarez, E. and Zorrilla, M. E. 2008. Orientaciones en el diseño y evaluación de un curso virtual para la enseñanza de aplicaciones informáticas. Revista Iberoamericana de Tecnologías del Aprendizaje (IEEE-RITA), 3(2), pp. 61-70. http://webs.uvigo.es/cesei/RITA/200811/
[4] Brin, S. and Page, L. 1999. Dynamic Data Mining: Exploring Large Rule Spaces by Sampling. Technical Report, Stanford InfoLab.
[5] Brown, A.R. and Bradley, D. 2005. Elements of Effective e-Learning Design. International Review of Research in Open and Distance Learning.
[6] Conrad, D. L. 2002. Engagement, excitement, anxiety and fear: Learners' experiences of starting an online course. American Journal of Distance Education, 16(4), pp. 205-226.
[7] Dougherty, J., Kohavi, R. and Sahami, M. 1995. Supervised and unsupervised discretization of continuous features. In: Int. Conf. on Machine Learning, Tahoe City, CA, pp. 194-202.
[8] Douglas, I. 2008. Measuring Participation in Internet Supported Courses. International Conference on Computer Science and Software Engineering, 5, pp. 714-717.
[9] Durand, N., Cremilleux, B. and Suzuki, E. 2006. Visualizing transactional data with multiple clusterings for knowledge discovery. 16th International Symposium on Methodologies for Intelligent Systems, Bari, Italy.
[10] Graf, S., Kinshuk and Liu, T. 2008. Identifying Learning Styles in Learning Management Systems by Using Indications from Students' Behaviour. Proc. of the 8th IEEE International Conference on Advanced Learning Technologies, July, Santander, Spain.
[11] Han, J. 2006. Data Mining: Concepts and Techniques. Morgan Kaufmann.
[12] Hung, J. and Zhang, K. 2008. Revealing Online Learning Behaviors and Activity Patterns and Making Predictions with Data Mining Techniques in Online Teaching. Journal of Online Learning and Teaching 8(4), pp. 426-436.
[13] Kreuseler, M. and Schumann, H. 2002. A flexible approach for visual data mining. IEEE Transactions on Visualization and Computer Graphics, 8(1), pp. 39-51.

[14] Krichen, J. 2007. Investigating Learning Styles in the Online Educational Environment. In: Proceedings of the 8th ACM SIG-Information Conference on Information Technology Education, Destin, Florida, USA, 18-20 October 2007, pp. 127-134.

[15] Merceron, A. and Yacef, K. 2005. TADA-Ed for Educational Data Mining. Interactive Multimedia Electronic Journal of Computer-Enhanced Learning, 7(1), May 2005.

[16] Merceron, A. and Yacef, K. 2008. Interestingness Measures for Association Rules in Educational Data. 1st International Conference on Educational Data Mining (EDM08), Montreal, Canada.

[17] Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M. and Euler, T. 2006. YALE: Rapid Prototyping for Complex Data Mining Tasks. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06).

[18] Millan, S., Zorrilla, M. E. and Menasalvas, E. 2005. Intelligent e-learning platforms infrastructure. XXXI Latin American Informatics Conference (CLEI'2005), Cali, Colombia.

[19] Pendharkar, P. C. 2003. Managing Data Mining Technologies in Organizations: Techniques and Applications. ISBN 1-59140-057-0.

[20] Perera, D., Kay, J., Koprinska, I., Yacef, K. and Zaïane, O. R. 2009. Clustering and Sequential Pattern Mining of Online Collaborative Learning Data. IEEE Transactions on Knowledge and Data Engineering, 21(6), pp. 759-772. doi: 10.1109/TKDE.2008.138

[21] Piatetsky-Shapiro, G. 2009. Data Mining Tools Used Poll. KDnuggets.com

[22] Romero, C. and Ventura, S. 2007. Educational Data Mining: A Survey from 1995 to 2005. Expert Systems with Applications, 33(1), pp. 135-146.

[23] Romero, C., Ventura, S., Espejo, P.G. and Hervas, C. 2008. Data Mining Algorithms to Classify Students. International Conference on Educational Data Mining, Canada.

[24] SCORM 2004 3rd Edition. The Sharable Content Object Reference Model, ADL, 2009.

[25] Steen, H. L. 2008. Effective eLearning Design. Journal of Online Learning and Teaching, 4(4). http://jolt.merlot.org/vol4no4/steen_1208.htm

[26] Silva, D.R. and Vieira, M.T.P. 2002. Using Data Warehouse and Data Mining Resources for Ongoing Assessment of Distance Learning. In: Proceedings of the IEEE Intl. Conf. on Advanced Learning Technologies (ICALT 2002).

[27] Talavera, L. and Gaudioso, E. 2004. Mining student data to characterize similar behaviour groups in unstructured collaboration spaces. In: Workshop on Artificial Intelligence in CSCL, 16th European Conference on Artificial Intelligence, pp. 17-23.

[28] Tsai, C. and Tsai, M. 2005. A dynamic Web service based data mining process system. The Fifth International Conference on Computer and Information Technology, 21-23 September 2005, pp. 1033-1039.

[29] Ueno, M. and Okamoto, T. 2007. Bayesian Agent in e-Learning. Proceedings of the Seventh IEEE International Conference on Advanced Learning Technologies (ICALT), pp. 282-284.

[30] Witten, I. H. and Frank, E. 2005. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition). Morgan Kaufmann. ISBN 0-12-088407-0.

[31] Wong, P., Whitney, P. and Thomas, J. 1999. Visualizing association rules for text mining. In: Proceedings of IEEE Information Visualization (INFOVIS). IEEE Computer Society Press.

[32] Hu, X. and Cercone, N. 2004. A data warehouse/online analytic processing framework for web usage mining and business intelligence reporting. International Journal of Intelligent Systems, 19(7), pp. 585-606.

[33] Yao, Y.Y., Zhao, Y. and Maguire, R.B. 2003. Explanation-oriented association mining using rough set theory. Proceedings of Rough Sets, Fuzzy Sets and Granular Computing, pp. 165-172.

[34] Zaïane, O. 2002. Building a recommender agent for e-learning systems. In: Proceedings of the International Conference on Computers in Education, pp. 55-59.

[35] Zhao, Y., Chen, Y.H. and Yao, Y.Y. 2008. User-centered interactive data mining. International Journal of Cognitive Informatics and Natural Intelligence (IJCiNi), 2(1), pp. 58-72.

[36] Zorrilla, M. and Álvarez, E. 2008. MATEP: Monitoring and Analysis Tool for e-Learning Platforms. Proceedings of the 8th IEEE International Conference on Advanced Learning Technologies, Santander, Spain.

[37] Zorrilla, M. E. 2009. Data Warehouse Technology for E-Learning. In: Methods and Supporting Technologies for Data Analysis, Studies in Computational Intelligence 225, pp. 1-20. D. Zakrzewska et al. (Eds.), Springer-Verlag, Berlin Heidelberg.