
A Framework for Decision Support for Learning Management Systems

Phelim Murnion1, Markus Helfert2
1School of Business, Galway-Mayo Institute of Technology, Galway, Ireland
2School of Computing, Dublin City University, Dublin, Ireland

Abstract: Learning Management Systems (LMS) provide a valuable platform for e-learning that offers great flexibility. However, compared to traditional learning environments, they are challenging and complex for decision-makers, both teachers and learners. At the same time, LMS environments offer opportunities for analysis by storing large quantities of data, such as web log files and data about students and content, which are not generally available in the traditional environment. Motivated by approaches in other domains, such as e-commerce and clinical management, in this article we propose to relate the complex decision environment to the possibilities of using large quantities of data. We review relevant literature on educational data mining (EDM) and, combining that with a standard data mining methodology, propose a conceptual framework that appropriately relates the methods of data mining to the settings of teaching and learning in a LMS environment. In contrast to other frameworks, our conceptual framework enables EDM research to be more integrated with the task domain. In our framework, teaching and learning activities and the decisions required to control those activities are addressed by relating the following three elements: pedagogy; learning activities; and decision-making. The significance of our work is that the framework enables us to compare different research studies and provides practical guidelines for developing EDM solutions. The framework also provides a number of further directions for researchers which follow naturally from a decision-centric perspective and from the full implementation of the contextual phases of the data mining life cycle.

Keywords: Educational Data Mining, Learning Management Systems, Decision Support, Programme Evaluation

1. Introduction

New technologies for learning are being developed and introduced at a rapid rate. Perhaps the most common of these is the Learning Management System (LMS), also known as a Course Management System (CMS) or Virtual Learning Environment (VLE). These systems offer a variety of tools that enable educators to distribute information to students, produce content material, prepare assignments, engage in discussions, and support collaborative learning with forums, chats, file storage areas, and news services. LMSs such as Moodle and Blackboard have recently become ubiquitous in tertiary/higher education, with (citing US statistics) almost 100% of institutions using LMS technology (Green 2010), and 60% using an approved campus-wide LMS (Pam Arroway 2010).

At the same time, teaching and learning in a LMS environment presents new challenges and opportunities. The teacher loses some of the advantages of the traditional learning environment, while the learner gains more freedom to make their own decisions. As a result, the decision-making environment for both teachers and learners becomes more complex. However, LMS environments also store large quantities of data, such as web log files and data about students and content, which are generally not available in the traditional environment. This combination of a complex decision-making environment and large quantities of raw data presents problems but also opportunities for practitioners (teachers) and researchers. Moreover, researchers in e-learning, particularly in the area of programme evaluation, have identified the need for more research based on the data in the LMS (Janossy 2008) to supplement traditional survey-based and experimental methods. In other problem domains, such as e-commerce and clinical management, this combination of a complex decision environment and large quantities of data has been addressed by information systems researchers using data mining and other analytical approaches.

Motivated by this observation, we focus on ‘Data Mining’, which involves the automatic extraction of implicit and interesting patterns from large data collections (Klosgen 2002). The field of e-Learning, with the large amounts of usage data automatically stored in the LMS web log files and the difficulty of making decisions in online learning, is well suited to data mining (Zaïane 2001). This approach, which has become established as “educational data mining” or EDM (Castro, Vellido et al. 2007), has been applied to a variety of problems in LMS environments. In data mining, it is an accepted principle that an understanding of the context or setting in which data mining is deployed is important (Shearer 2000), (Lavrac 2004), (Hofmann and Tierney 2009). In the specific field of EDM, several researchers have identified the need for a better integration with the teaching and learning context (Gaudioso and Talavera 2006), (Romero, Ventura et al. 2008).

The objective of this paper is to address decision support for LMSs by describing a conceptual framework that enables the integration of data mining methods with the context or settings of teaching and learning in a LMS environment. The main contribution is to address a limitation in existing EDM research with a framework which can guide future research and practice.

The remainder of the paper is structured as follows. In section two, we develop a description of the problem of integrating EDM with the teaching and learning context. Based on that description, we propose a set of categories for analysing EDM research. Using these categories, we present in section three a detailed analysis of the EDM literature. Drawing from this analysis, in section four we propose and describe a conceptual framework for an improved integration of EDM research with the learning context, followed by a discussion and comment on the proposed framework.

2. Related Work

In order to develop an appropriate basis for the analysis of the limitations in EDM research, in this section we review existing work, first by describing the EDM research field in general and then by examining relevant related work in data mining methodologies. The general perspective provides us with an overview of the problem of integrating EDM research with the teaching and learning context. The work on data mining methodologies allows us to convert that overview into a more detailed model, which provides the basis for the analysis in section three.

2.1. Educational Data Mining

Data Mining (also known as knowledge discovery from data) is a well-established approach for extracting patterns from large quantities of data (Berry and Linoff 2004), using a variety of statistical, machine-learning and other data-mining algorithms, in order to explore and understand the phenomena underneath the data or to support decision making (Peng, Kou et al. 2008). It has been applied successfully to a number of scientific and commercial domains (Lavrac 2004). Due to its origins in data analysis and machine learning, data mining research has tended towards a technical orientation, focussing on techniques/tasks such as data clustering, classification, association rule mining and sequential analysis (Peng, Kou et al. 2008). Educational data mining (EDM) is the application of the data mining approach to the different types of educational data (Romero and Ventura 2010), (Baker and Yacef 2009). It has been recognized that the field of eLearning in a LMS environment is well suited to the data mining approach (Zaïane 2001), due to two features: the large amounts of raw data (stored in the LMS web log files) and the complexity of the decision making problems (Peng, Kou et al. 2008). This EDM research approach (Castro, Vellido et al. 2007) has included investigations of a wide variety of situations in the LMS environment, including: evaluating learner activity (Zaïane 2001), (Muehlenbrock 2005), (Pahl 2006); providing support to educators (Gaudioso and Talavera 2006); and recommending student actions (Sacin, Agapito et al. 2009). The above-mentioned technical orientation is reflected in EDM as well, where papers have tended to focus on the application of these techniques to educational data. Despite the strongly technical orientation of EDM research, there have been a number of recommendations for more work on integrating data mining with the context of teaching and learning.
An early paper in the field describes the concept of “integrated web usage mining” (Zaïane 2001), in which a data mining system would be integrated into the eLearning system. A review paper in 2007 describes a model in which “data mining in educational systems is an iterative cycle of hypothesis formation, testing, and refinement. Mined knowledge should enter the loop of the system and guide, facilitate and enhance learning as a whole. Not only turning data into knowledge, but also filtering mined knowledge for decision making” (Romero and Ventura 2007). Our paper emphasises this view. However, the impact of these recommendations on the research work is not clear. Frequently, EDM research has been focussed on the techniques of data mining with little or no reference to the details of the learning context (Zaïane 2001), (Etchells, Nebot et al. 2006). In contrast, some researchers have based their EDM interventions on a comprehensive model of the learning context using established pedagogic theory on (for example) learner-content interaction (Pahl 2006) and group work (Perera, Kay et al. 2009). Of particular significance is the work of Elena Gaudioso and colleagues (Talavera and Gaudioso 2004), (Gaudioso and Talavera 2006) on collaborative learning, which describes not only a detailed educational context but also a purpose and role for the results of the EDM intervention within that context. On this basis, we postulate a spectrum of EDM research in terms of integration with the educational context, from studies with little or no integration with the context to studies which are highly integrated. However, the level of integration is unlikely to be a simple scalar variable. Any analysis of an ‘integration’ construct within the EDM literature requires a more concrete version of this construct. Since the relationship of data mining to the domain context is a feature of the data mining methodology adopted, an examination of data mining methodologies is a necessary next step.

2.2. Data Mining Methodologies

Data mining arose from a number of related computational and statistical approaches, which form a set of data mining methodologies. These methodologies can emphasise technical aspects rather than the processes and the relationship to the task domain (Peng, Kou et al. 2008). However, in order to structure the methodologies, research and practice in data mining have expanded in perspective to include steps along a data mining process or data mining model (figure 1). This model can be described as the ‘technical perspective’ on data mining and has been commonly used in EDM research (Luo 2001), (García E. 2007).

Figure 1: Data Mining Model: technical perspective

As data mining matured as a discipline and as data mining applications were implemented in a wider variety of problem domains, the technical steps were subsumed into a more comprehensive methodology known as the data mining cycle (e.g. figure 2). A number of data mining methodologies/cycles have been developed, but the process of developing these methodologies has exhibited two common features (Hofmann and Tierney 2009): the replacement of a sequence of steps with a cyclical, iterative process, and a greater focus on the connections between the data mining process and the underlying problem context. The most widely accepted model, known as the CRISP-DM (CRoss Industry Standard Process for Data Mining) Cycle (Shearer 2000), exhibits both features (figure 2).

Figure 2: CRISP-DM Cycle (Shearer 2000)

The first phase, Problem definition, is the start of the cycle. Based on an understanding of the task domain problem, a data mining problem or hypothesis can be derived. The next four phases (data exploration, data preparation, modelling, and evaluation) constitute the technical data mining steps, resulting in a model, knowledge or information which can be deployed. Finally, in the Deployment phase the results of the data mining are used to solve the originally defined problem. For a complete implementation of the cycle, Deployment and Problem definition must be appropriately connected, as the results of the deployment should be measured and fed back into problem definition for the next iteration of the cycle (Berry and Linoff 2004). On this basis, the extent of integration with the task domain consists of:

1. A definition of the problem to be solved;
2. A description of the deployment method(s); and
3. The proper inter-relationship between problem definition and deployment and the underlying task domain.
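The cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the phase names follow the wording used in this section, while the identifiers `Phase`, `TECHNICAL_PHASES`, `next_phase` and `contextual_phases` are hypothetical names introduced here and are not part of CRISP-DM itself.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    """The six phases named in the text, listed in cycle order."""
    PROBLEM_DEFINITION = 1
    DATA_EXPLORATION = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# The four phases the text describes as the technical data mining steps.
TECHNICAL_PHASES = {
    Phase.DATA_EXPLORATION,
    Phase.DATA_PREPARATION,
    Phase.MODELLING,
    Phase.EVALUATION,
}


def next_phase(phase: Phase) -> Phase:
    """Return the following phase; Deployment feeds back into Problem definition."""
    members = list(Phase)
    return members[(members.index(phase) + 1) % len(members)]


def contextual_phases() -> List[Phase]:
    """Phases that connect the cycle to the task domain (the non-technical ones)."""
    return [p for p in Phase if p not in TECHNICAL_PHASES]
```

The wrap-around in `next_phase` models the feedback from Deployment to Problem definition, and `contextual_phases` picks out the two phases on which the integration criteria listed above depend.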

2.3. Categorising EDM research

On the basis of the above definitions, we propose a maturity model for categorising EDM research into stages, representing development along an ‘integration’ axis (figure 3).

Figure 3: Integration with the Educational Context

The stages are incremental and hierarchical, each stage assuming the completion of the previous stage and representing a higher level of maturity. An EDM intervention at the Problem stage includes a definition of the problem, but no clear description of the deployment phase. In the Solution stage, both problem definition and deployment are described. In the Integrated stage, both problem definition and deployment are appropriately related in a model based on the task domain. On this basis, EDM research can be analysed by categorising each study into one of the three stages.
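The hierarchical nature of the stages can be made concrete with a short Python sketch. It assumes that each study has already been judged by a human reader against the three integration criteria; the names `Stage` and `classify` and the three boolean parameters are hypothetical and introduced only for illustration.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Maturity stages; a higher value means a higher level of integration."""
    PROBLEM = 1
    SOLUTION = 2
    INTEGRATED = 3


def classify(defines_problem: bool,
             describes_deployment: bool,
             related_to_task_domain: bool) -> Optional[Stage]:
    """Assign a study to a stage; each stage assumes the previous one is met."""
    if not defines_problem:
        return None  # the study does not reach even the Problem stage
    if not describes_deployment:
        return Stage.PROBLEM
    if not related_to_task_domain:
        return Stage.SOLUTION
    return Stage.INTEGRATED
```

Because the checks are applied in order, a study that relates its work to the task domain but gives no clear description of deployment is still placed at the Problem stage, which mirrors the incremental definition above.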

3. Analysis of EDM research

The maturity model described in figure 3 provides a set of criteria for analysing the EDM literature and forms the basis for a detailed analysis of research related to EDM. In order to select a sufficient number of papers, we define relevant papers as those that are recent, relate to the correct learning environment, and address a relevant educational research task. Recency is determined by selecting from the latest review paper. Environment and task are classifications defined in several of the existing review papers (Castro, Vellido et al. 2007), (Romero and Ventura 2010). The selection of the relevant subset of literature follows a four-step process:

1. Identify all review papers in the field: [1] (Castro, Vellido et al. 2007), [2] (Romero and Ventura 2007), [3] (Baker and Yacef 2009), and [4] (Romero and Ventura 2010).
2. Select the most recent review, which also has the largest number of papers (300), which is [4] above.
3. The selected review classifies references according to types of educational environment. According to the focus of this paper, we can dichotomously divide the types into:
   a. Learning Management Systems, which is the focus of this study (29 papers), and
   b. All other types (Traditional education, Web-based Education, Intelligent Tutoring Systems, Adaptive Educational Systems, Tests/Questionnaires, and Texts/Contents).
4. Using another typology popular in the field, we can categorise papers by educational task (Baker and Yacef 2009) or educational objective (Romero and Ventura 2010), again using a binary division consisting of:
   a. Supporting teaching and learning (which is the focus of this study), versus
   b. Supporting theory advancement (student models, domain content models, etc.).

Based on these steps, the twenty-nine papers per criterion 3.a. are sampled to select papers according to criterion 4.a. This will focus the review on papers relevant to this study: those which examine support for teaching and learning in a LMS environment.
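The two binary divisions (criteria 3 and 4) amount to a simple filter over the references of the selected review. The sketch below assumes each reference has been hand-labelled with an environment and a task; the `Paper` record, its field names, the label strings and the sample entries are all hypothetical placeholders, not data from the review.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Paper:
    citation: str
    environment: str  # e.g. "LMS" or any other environment type (criterion 3)
    task: str         # "support" (teaching and learning) or "theory" (criterion 4)


def select_relevant(papers: List[Paper]) -> List[Paper]:
    """Keep papers that meet criterion 3.a (LMS) and criterion 4.a (support)."""
    return [p for p in papers if p.environment == "LMS" and p.task == "support"]


# Hypothetical placeholder entries, for illustration only.
candidates = [
    Paper("Placeholder A", environment="LMS", task="support"),
    Paper("Placeholder B", environment="LMS", task="theory"),
    Paper("Placeholder C", environment="ITS", task="support"),
]

for paper in select_relevant(candidates):
    print(paper.citation)
```

Only the first placeholder passes both criteria; the other two are dropped for failing criterion 4.a and criterion 3.a respectively.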

Figure 4: EDM research; integration with the educational context

Out of the full sample of papers (per criterion 3.a), some are excluded because they do not meet criterion 4.a or for other practical reasons. Papers excluded are listed in Appendix A.

The analysis in figure 4 shows that EDM research is not highly developed in terms of integration with the educational context. At each stage in the development of the integration (vertical axis) there are fewer papers. The most notable gap is in the integration stage. Furthermore there is no clear development over time. On the one hand, all the papers in the Integration stage are in the later years. On the other, most of the papers in the Problem stage are also in the later years (5 for 2008).

4. Proposed Conceptual Framework

As our analysis in section three shows, there is a significant gap in EDM research, in the form of an inadequate model of the task domain of teaching and learning. In order to address this gap, we propose a conceptual framework that enables EDM research to be more integrated with the task domain. The basis for that framework is the related work on the data mining cycle, as outlined in section 4.1, and the refinement of that model for the special case of the task domain, which is teaching and learning, as outlined in section 4.2. Therefore, we propose an educational data mining cycle (figure 5) which incorporates a conceptual linkage (dashed triangle) to the task domain of teaching and learning.

Figure 5: Framework: An Educational Data Mining Cycle

4.1. Data Mining and the Framework

The CRISP-DM data mining model provides a generic template for the application of data mining. From the data mining model discussed in section two we can identify two key elements for our framework: Problem definition and Deployment. In the data mining model, these two phases enable the integration of the data mining process with the task domain. Any conceptual model must incorporate these elements in a way that is specific to the domain of teaching and learning yet generic across that domain. The review of the relevant research in section three provides evidence to support the importance of the elements of the data mining cycle identified above. In addition, the description of an integration maturity model in figure 3 and the analysis of the literature presented in figure 4 suggest that the main gap in existing research is in the integration of the two elements with each other and with the context of the task domain. On that basis, the elements of the conceptual framework should at least include those listed in table 1.

Table 1: Framework Elements

Element
1. Problem definition
2. Deployment
3. Task Domain

However, a further step in the construction of the model is to refine the description of the elements in table one by reference to the special characteristics of teaching and learning. For each of the three elements above, the model should specify a corresponding definition which is widely applicable across the domain of teaching and learning. That requires a brief examination of the educational task domain.

4.2. Educational Decision-making and the Framework

Existing best practice (as described in the analysis of the literature in section three) presents a basis for any improved conceptual framework. Of the EDM studies in that analysis (see figure 4), three papers fall into the most mature stage: Integration. All three studies provide a context, described in terms of learning theory, which informs, constrains and relates the data mining phases of problem definition and deployment. The first study (Jovanović, Duval et al. 2007) describes the context as ‘learning object context’: the interaction between learning content, student and activities, with the emphasis on the learning content. The second (Wang 2008) describes the context as the inter-relationship between student, content, and domain knowledge, with the emphasis on the student. The third (Perera, Kay et al. 2009) describes the context as collaborative learning (incorporating group work theory), with the emphasis on activities. Clearly, each study defines the context differently, and as a result each has a unique definition of problem definition and deployment.

However, a feature common to all three studies is their operational orientation. An educational system clearly is a set of operations (in which content, learner, and task interact) resulting in outcomes. But decision support approaches like data mining are not designed to support operations. Another element is required which relates operations to outcomes. This concept is addressed in another EDM study (Gaudioso and Talavera 2006), already mentioned in the related work. The context of the study is collaborative learning, which is described using an appropriate theoretical context. Based on that context, the operations (the problems to be solved) are described in terms of the creation of virtual communities. The critical difference between this and other EDM studies is the next element. The authors describe a separate process called adaptive collaborative support (ACS): the mechanism for ensuring that the virtual communities operate in a manner which meets the goals of collaborative learning theory, thus relating operations to outcomes. The data mining intervention is directed not at the operations of the collaborative groups themselves, but at providing decision support for the ACS function.

Motivated by this example, we suggest that a general model for our framework can be extrapolated from this particular study. Three elements are necessary in an educational system: an underlying pedagogy (theory); a set of teaching and learning activities (operations); and a related set of control decisions. It is towards the control decisions that EDM should be directed. This perspective of educational decision making and control allows us to extend the model described in table 1 by providing a definition for each element, resulting in table 2.

Table 2: Framework Elements and Definitions

Element               Definition
Task Domain           Learning Theory
Problem definition    Teaching & Learning Activities
Deployment            Control Decisions

Learning theory constrains the kind of learning and teaching activities that should occur and also provides the goals/outcomes which decide what control decisions to make. Learning and teaching activities generate the data which EDM approaches can turn into knowledge for decision-making (Romero and Ventura 2007). This control systems view has been examined in the educational literature from a number of different perspectives: learner control (Williams 2001), learning cybernetics (Liber 2003) and soft systems methods (Warwick 2008). This specification can be combined with the standard data mining cycle to provide a specific framework for this task domain, the Educational Data Mining Cycle in figure 5. Using this model, the deployment phase for EDM consists of providing knowledge for decision making processes. The decision making processes are part of systems which control the teaching and learning activities identified in problem definition.
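The relationship between the three framework elements can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not an implementation of the framework: the class names `ControlDecision` and `EducationalContext`, their fields, and the `deploy` and `uncontrolled_activities` methods are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ControlDecision:
    """A decision that controls one teaching and learning activity."""
    question: str
    controls_activity: str
    knowledge: List[str] = field(default_factory=list)  # mined findings received


@dataclass
class EducationalContext:
    learning_theory: str              # task domain
    activities: List[str]             # what problem definition refers to
    decisions: List[ControlDecision]  # where deployment delivers knowledge

    def deploy(self, activity: str, finding: str) -> int:
        """Deliver a mined finding to every decision controlling the activity.

        Returns the number of decisions that received the finding.
        """
        targets = [d for d in self.decisions if d.controls_activity == activity]
        for decision in targets:
            decision.knowledge.append(finding)
        return len(targets)

    def uncontrolled_activities(self) -> List[str]:
        """Activities with no control decision, so mined knowledge has no target."""
        controlled = {d.controls_activity for d in self.decisions}
        return [a for a in self.activities if a not in controlled]
```

In this sketch, mined knowledge is never attached to an activity directly; it always reaches the activity through a control decision, which is the design choice the framework argues for. An activity returned by `uncontrolled_activities` signals a gap between problem definition and deployment.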

5. Discussion and Concluding Remarks

The proposed framework makes a number of contributions. Firstly, the framework addresses the thesis raised at the start of this paper, providing a way to integrate the methods of data mining with the context of teaching and learning in a LMS environment. Secondly, it provides a bridge between EDM approaches and general educational theories. This enables EDM research to increase its impact on and relationship with other areas of educational technology research, a problem identified in an earlier review paper (Baker and Yacef 2009). The framework also provides a reference model for constructing further EDM interventions (based on the dashed triangle in the framework diagram, figure 5) that EDM designers can use to support the implementation of the data mining cycle in education. Finally, the framework provides further directions for researchers that follow from a decision-centric and control systems perspective. For example, EDM research has tended to focus on gathering knowledge to direct attention to a problem. In other problem domains, decision support tools are considered more useful when deployed at later steps in decision making, such as when devising or evaluating solutions (March and Hevner 2007).

A limitation of the framework is that it is a perspective, a way of examining EDM research, which requires further validation. In further research we aim to apply this framework within a live learning situation. This will provide further insight into the validity of the framework and its utility in enabling decision support in a LMS environment.

Appendix A

Table 3: Papers excluded from literature review (section three).

Paper                             Notes
(Baruque, Longo et al. 2007)      Portuguese (Brasil)
(Castro, Vellido et al. 2005)     Spanish
(Liu and Shih 2007)               Does not meet criterion 4a
(Matsuda, N. et al. 2007)         Wrong category (ITS)

References

Baker, R. and K. Yacef (2009). "The State of Educational Data Mining in 2009: A Review and Future Visions." Journal of Educational Data Mining 1(1): 3-17.

Baruque, C. B., C. J. Longo, et al. (2007). Analysing users' access logs in Moodle to improve e learning. Proceedings of the 2007 Euro American conference on Telematics and information systems. Faro, Portugal, ACM.

Berry, M. J. A. and G. S. Linoff (2004). Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Hoboken, NJ, USA, John Wiley & Sons, Incorporated.

Castro, F., A. Vellido, et al. (2005). Detecting atypical student behaviour on an e-learning system. Proc. Simposio Nacional de Tecnologías de la Información y las Comunicaciones en la Educación, Granada, Spain.

Castro, F., A. Vellido, et al. (2007). Applying Data Mining Techniques to e-Learning Problems. Evolution of Teaching and Learning Paradigms in Intelligent Environment. J. Kacprzyk, Springer Berlin Heidelberg. 62: 183-221.

Etchells, T. A., A. Nebot, et al. (2006). Learning what is important: Feature selection and rule extraction in a virtual course. European Symposium on Artificial Neural Networks, Bruseles, Belgium.

García E., R. C., Ventura S., Calders T. (2007). Drawbacks and solutions of applying association rule mining in learning management systems. International Workshop on Applying Data Mining in eLearning 2007.

Gaudioso, E. and L. Talavera (2006). Data mining to support tutoring in virtual learning communities: experiences and challenges. Data Mining in E-Learning. C. Romero and S. Ventura. Cordoba, WIT Press.

Green, K. C. (2010). The 2010 Campus Computing Survey. Campus Computing Survey.

Hofmann, M. and B. Tierney (2009). An Enhanced Data Mining Life Cycle. IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009, Nashville, TN, USA, IEEE.

Janossy, J. H., T. (2008). Proposed Model for Evaluating C/LMS Faculty Usage in Higher Education Institutions. Society for Information Technology & Teacher Education International Conference, Las Vegas, Nevada, USA.

Jovanović, J., E. Duval, et al. (2007). LOCO-Analyst: A Tool for Raising Teachers’ Awareness in Online Learning Environments. Creating New Learning Experiences on a Global Scale, Springer Berlin / Heidelberg. 4753: 112-126.

Klosgen, W., & Zytkow, J. (2002). Handbook of data mining and knowledge discovery. New York, Oxford University Press.

Lavrac, N., Motoda, H., Fawcett, T., Holte, R., Langley, P. & Adriaans, P. (2004). "Lessons learned from data mining applications and collaborative problem solving." Machine Learning 57(1-2): 13-34.

Liber, O. (2003). "Cybernetics, eLearning and the education system." International Journal of Learning Technology 1(1): 127-140.

Liu, F.-j. and B.-j. Shih (2007). Learning Activity-Based E-Learning Material Recommendation System. Multimedia Workshops, 2007. ISMW '07. Ninth IEEE International Symposium on.

Luo, O. R. Z. a. J. (2001). Towards Evaluating Learners’ Behaviour in a Web-Based Distance Learning Environment. Proc. IEEE International Conference on Advanced Learning Technologies (ICALT 2001), Madison, WI, USA.

March, S. T. and A. R. Hevner (2007). "Integrated decision support systems: A data warehousing perspective." Decision Support Systems 43: 1031-1043.

Matsuda, C. N., et al. (2007). Predicting students performance with SimStudent that learns cognitive skills from observation. International conference on Artificial Intelligence in Education, Amsterdam, Netherlands.

Muehlenbrock, M. (2005). Automatic Action Analysis in an Interactive Learning Environment. Workshop on Usage Analysis in Learning Systems at AIED-2005, Amsterdam.

Pahl, C. (2006). Data Mining for the Analysis of Content Interaction in Web-based Learning and Training Systems. Data Mining in E-Learning, WIT Press: 41-56.

Pam Arroway, E. D., Guangning Xu, Dan Updegrove (2010). EDUCAUSE Core Data Service Fiscal Year 2009 Summary Report. EDUCAUSE Core Data Service.

Peng, Y., G. Kou, et al. (2008). "A Descriptive Framework for the field Of Data Mining And Knowledge Discovery." International Journal of Information Technology & Decision Making 07(04).

Perera, D., J. Kay, et al. (2009). "Clustering and Sequential Pattern Mining of Online Collaborative Learning Data." IEEE Trans. on Knowl. and Data Eng. 21(6): 759-772.

Romero, C. and S. Ventura (2007). "Educational data mining: A survey from 1995 to 2005." Expert Systems with Applications(33): 135-146.

Romero, C. and S. Ventura (2010). "Educational Data Mining: A Review of the State of the Art." IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 40(6): 601-618.

Romero, C., S. Ventura, et al. (2008). Data mining algorithms to classify students. Int. Conf. Educ. Data Mining, Montreal, Canada.

Sacin, C. V., J. B. Agapito, et al. (2009). Recommendation in Higher Education Using Data Mining Techniques. 2nd International Conference on Educational Data Mining (EDM'09), Cordoba, Spain.

Shearer, C. (2000). "The CRISP-DM model: The new blueprint for data mining." Journal of Data Warehousing 5(4).

Talavera, L. and E. Gaudioso (2004). Mining Student Data To Characterize Similar Behavior Groups In Unstructured Collaboration Spaces. Workshop Artif. Intell. CSCL, Valencia, Spain.

Wang, F.-H. (2008). "Content Recommendation Based on Education-Contextualized Browsing Events for Web-Based Personalized Learning." Educational Technology & Society 11(4): 94-112.

Warwick, J. (2008). "A Case Study Using Soft Systems Methodology in the Evolution of a Mathematics Module." TMME 5(2&3): 269-290.

Williams, M. D. (2001). Learner-Control and Instructional Technologies. Handbook of Research for Educational Communications and Technology. D. H. Jonassen, AECT.

Zaïane, O., & Luo, J. (2001). Web usage mining for a better web-based learning environment. Proceedings of conference on advanced technology for education.
