Research, Reflections and Innovations in Integrating ICT in Education

Examining online learning processes based on log files analysis: A case study

Galit Ben-Zadok*1,2, Arnon Hershkovitz1, Rachel Mintz1,2 and Rafi Nachmias1

1 Knowledge Technology Lab, School of Education, Tel Aviv University, Israel
2 The Center for Educational Technology (CET), Tel-Aviv, Israel

The purpose of this study is to identify and examine learning processes, based on data extracted from log files that document the learners' actions within an online learning environment. For this purpose, the log files of four elementary school students who studied with a Web-based science module were examined and analysed. A Learnogram, a graphical representation tool that visualizes a student's learning process over time, was produced for each student. Based on the log files and the Learnograms, seven learning variables were defined and computed, reflecting the differences between the learning processes. This study serves as a basis for large-scale research aimed at understanding learning processes in Web-based learning environments using Web mining techniques. This information is valuable for educators, environment designers and instructors, enabling them to evaluate the students' learning processes and to fit the environments to students based on their actual behaviors and preferences.

Keywords Online learning processes; Log files analysis; Visualization tool; Web Mining

1. Introduction

Web-based learning environments provide students with new opportunities for learning. They are rich in instructional content and tools, and students can freely navigate between them according to their preferences and needs. Thus, the students are responsible for planning, carrying out and evaluating their own learning in these environments. In order to augment students' learning and improve the instructional design, educators need to learn more about these learning processes. However, it is not easy to do so with traditional research methodologies. While learning with such environments, students leave continuous hidden traces of their activity in the form of log file records, which document every action by three parameters: what action was taken, who took it, and when. The purpose of this study is to use the data stored in the log files to define, extract and evaluate variables that reflect learning processes in Web-based environments. To this end, logged data reflecting the activity of four students were visualized and analyzed, and seven variables characterizing the students' learning processes were defined.

2. Background

Web-based learning environments have been widely used in recent years, changing the way students think and learn [1]. These environments are rich in media and content, offering the students varied instructional tools they can learn with, such as: a) Tutorials, introducing new information taught sequentially; b) Drill and practice activities, during which learners examine their knowledge until mastery is achieved; c) Games, motivating practice; d) Simulations, in which learners engage with material utilizing inquiry learning; and e) Self-tests, enabling learners to evaluate their knowledge [2]. Offered these tools, students can freely navigate between them and control their learning processes according to their needs and preferences. This control is expressed, for example, as control over content, control over time and pace, and control over the learning sequence [3]. The independence and autonomy that students have in Web-based learning environments on the one hand, and the distance between the educators and the online learners on the other, increase the educators' need to better understand the students' learning processes in these environments. This understanding will enable them to evaluate the learning processes and to design the environments according to the students' behaviors and needs. However, tracing online learner behavior is not easy with traditional research methodologies, which can hardly cope with gathering information about distant learners [4]. Data Mining is an emerging methodology in the educational research field, which can help us enhance our understanding of learning processes in Web-based learning environments.
While learning with Web-based environments, students leave continuous hidden traces of their activity in the form of log file records, which document every action by three parameters: what action was taken (e.g., the page URL, the file downloaded), who took it (if the system requires login, this field will usually include the student identification), and when (exact date and time). Researchers use Data Mining techniques to analyze this data and to locate different aspects of learning behaviors, such as patterns of navigation, time spans and sequences, in order to provide a more effective learning environment. Web-based learning environments might also hold information about the student's profile (e.g., age, gender, grades). Integrating the information derived from the log data with the students' profiles can be valuable for both students and educators. It could be oriented towards students, in order to recommend activities, resources or links that would favor and improve their learning, or towards educators and instructors, in order to evaluate the structure of the course content and its effectiveness in the learning process and also to classify learners into groups based on their behaviors and needs [5]. Data mining has already been successfully applied in e-commerce and has recently begun to be used in e-learning, with promising results [6]. In this study, we focus on identifying, extracting and evaluating variables related to learning processes, using system log files and a visualization tool, the Learnogram, in order to learn more about the learning processes of online learners.

* Corresponding author: e-mail: [email protected], Phone: +972-3-6407795
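The three logged parameters described in the Background (what, who, when) can be illustrated with a minimal Python sketch. The tab-separated line format, the field order, the URLs, the student identifier and the timestamps below are all assumptions made for illustration; the actual log format of the module is not described in this paper.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    """One logged action: who did what, and when."""
    who: str        # student identification (present when the system requires login)
    what: str       # the action taken, e.g. the requested page URL
    when: datetime  # exact date and time of the action


def parse_line(line: str) -> LogRecord:
    """Parse one line of a hypothetical format: '<ISO timestamp>\\t<student id>\\t<URL>'."""
    when, who, what = line.rstrip("\n").split("\t")
    return LogRecord(who=who, what=what, when=datetime.fromisoformat(when))


# Invented sample lines, not taken from the study's data.
sample = [
    "2008-01-15T09:00:05\tstudent01\t/module/simulation",
    "2008-01-15T09:02:40\tstudent01\t/module/drill1",
]
records = [parse_line(line) for line in sample]

# Time elapsed between two consecutive actions of the same student, in seconds.
gap_seconds = (records[1].when - records[0].when).total_seconds()
```

Keeping each record as an immutable (who, what, when) triple makes it straightforward to group actions by student and sort them by time before any variable is computed.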

3. Methodology

3.1 The Learning Environment

A Web-based learning module in Earth Science for elementary school was chosen (see footnote 1). The module includes six different activities implementing four of the five instructional strategies defined by Alessi and Trollip [2]: simulation, self-test, drill and practice, and a game. The simulation is the information resource of the module and appears as the first activity; it is followed by three drill and practice activities and a game. On a separate page, the students may choose to take an online self-test. All of the activities except the simulation give the students automatic feedback on their answers.

3.2 Procedure

Log files of a class of about twenty 4th grade students who used the module were collected. For each student, a Learnogram was produced (see section 3.3), mapping the events documented in the log by their activity type (Simulation, Drill-1, Drill-2, Drill-3, Game, Test) over time. The authors examined these Learnograms and noticed a large variance between them in terms of, e.g., the length of time using the module, the intensity of actions, and the order of activities taken. Therefore, we chose four students whose Learnograms best represented the varied patterns, and these students' logs and Learnograms were analyzed. By examining the four students' logs and Learnograms, and by computing variables describing their learning, seven variables were chosen to best explain the diversity in these students' learning behavior. These variables are described in detail in the Results.

3.3 Learnograms

A Learnogram is a visual representation of learning variables over time [7]. The X-axis is the time axis, and the Y-axis changes according to the variable being represented. In our study, we mapped two variables on each Learnogram: a) Sequence of Learning – all the logged actions were mapped according to their activity type, hence the Y-axis represents the different activities within the module, arranged in the same order presented to the students (from bottom to top); b) Activity Completion – for each activity, its completion (i.e., were all the questions composing the activity correctly answered?) was checked just before the student moved to another task. A plus (+) sign indicates completion; a minus (-) sign was placed otherwise.
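The two mappings described above can be sketched in Python as follows. This is only an illustration of the idea, not the tool used in the study: the event tuples, their timing and the rule of skipping the simulation (which has no questions) when placing completion marks are assumptions.

```python
# Activity types in the order presented to the students (bottom to top on the Y-axis).
ACTIVITY_ORDER = ["Simulation", "Drill-1", "Drill-2", "Drill-3", "Game", "Test"]


def learnogram_points(events):
    """Map events to (x, y) points: x is minutes since session start, y is the activity row.

    Each event is a hypothetical tuple (seconds_since_start, activity, all_correct).
    """
    row = {name: index for index, name in enumerate(ACTIVITY_ORDER)}
    return [(seconds / 60.0, row[activity]) for seconds, activity, _ in events]


def completion_marks(events):
    """Place '+' or '-' at the last event before the student moves to another activity."""
    marks = []
    for i, (seconds, activity, all_correct) in enumerate(events):
        leaving = i == len(events) - 1 or events[i + 1][1] != activity
        # Assumption: the simulation gets no mark because it has no questions.
        if leaving and activity != "Simulation":
            marks.append((seconds / 60.0, activity, "+" if all_correct else "-"))
    return marks


# Invented events for illustration only.
events = [
    (0, "Simulation", False),
    (60, "Drill-1", False),
    (120, "Drill-1", True),
    (180, "Drill-2", False),
    (240, "Game", True),
]
points = learnogram_points(events)
marks = completion_marks(events)
```

The resulting points and marks could be passed to any plotting library to draw the Learnogram; the sketch stops at the data so that it stays independent of a particular charting tool.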

4. Results

The seven variables that were defined and computed for each student are: Time in the Module, Time on Task, Relative Time on Task, Resource Consumption, Completion Rate, Sequence of Learning and Time Segmentation. In this section the Learnograms are presented (Fig. 1) and the variables are described and analyzed. Values of the variables for the four students are given in Table 1 and Fig. 2.

1 The module is a part of the OFEK Web-based educational environment for elementary school students: http://ofek.cet.ac.il. The English version of this module: http://www.cet.ac.il/ofek/eng/unit11.asp


Fig. 1 The Learnograms of the four students: a) Rosie, b) Lenni, c) Zori, d) Kenny.

Table 1 Summary of six variables

Student                     Rosie       Lenni    Zori     Kenny
Time in the Module [min]    17.9        7.7      13.1     20.6
Time on Task [min]          10.2        6.3      6.5      16.2
Relative Time on Task       0.57        0.81     0.49     0.78
Resource Consumption        4           0        1        1
Completion Rate             5           5        2        5
Sequence of Learning        Nonlinear   Linear   Linear   Nonlinear

The first three variables are time-related. Time in the Module is defined as the total time measured between a student's first action (mouse click) and her or his last action within the same session of using the module. Time on Task is the time spent on the different activities, excluding the time in between activities. Relative Time on Task is the ratio of Time on Task to Time in the Module. There is a big difference in the total time spent by the four students: the maximum (20.6 minutes, Kenny) is about 2.7 times the minimum (7.7 minutes, Lenni). However, Lenni and Kenny dedicated about the same share of their time in the module to actually working on the activities (0.81 and 0.78, respectively), while Rosie and Zori spent much less of their total time on the activities (0.57 and 0.49). While the first two time-related variables might indicate the student's actual need for learning time, the ratio between them might point to the student's focus on task. Less focus on task might lead to poorer learning, hence this variable is very important for assessing learning processes.

The simulation in the learning module is the main source of information. To measure the extent to which the students exploited it, the variable Resource Consumption was defined, counting a student's visits to the simulation during his or her learning process. The simulation was consumed differently by the students: Lenni never used it, Zori and Kenny used it once at the beginning of the session, and Rosie watched it 4 times: once at the beginning of the session, once after failing to complete the test, once between the Drill-2 and Drill-3 activities, and once during the Drill-3 activity.

To understand the results of the students' learning, we defined the variable Completion Rate, which counts the total number of successfully completed activities when leaving the module. An activity is defined as “successfully completed” if all of its questions were answered correctly (regardless of the number of trials); this variable takes a value between 1 and 5 (the simulation is excluded since it does not have any questions to answer). This variable is mapped on the Learnograms (see section 3.3 Learnograms). All the students except Zori finished their session with all the activities completed. Zori did not complete Drill-2, the Game and the Test. A low Completion Rate might indicate low motivation to succeed.
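The definitions of the three time-related variables, Resource Consumption and Completion Rate can be sketched in Python as follows. The Visit structure, the sample values and the rule that an activity counts as completed according to its status at the last visit are assumptions made for illustration; they are not the computation used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Visit:
    """One continuous stay in an activity (hypothetical structure)."""
    activity: str
    start_min: float           # minutes since the first logged action
    end_min: float
    completed: Optional[bool]  # None for the simulation, which has no questions


def time_in_module(visits: List[Visit]) -> float:
    """Time between the first and the last logged action of the session."""
    return max(v.end_min for v in visits) - min(v.start_min for v in visits)


def time_on_task(visits: List[Visit]) -> float:
    """Time spent inside activities, excluding the gaps in between them."""
    return sum(v.end_min - v.start_min for v in visits)


def relative_time_on_task(visits: List[Visit]) -> float:
    return time_on_task(visits) / time_in_module(visits)


def resource_consumption(visits: List[Visit]) -> int:
    """Number of visits to the simulation."""
    return sum(1 for v in visits if v.activity == "Simulation")


def completion_rate(visits: List[Visit]) -> int:
    """Activities completed when leaving the module (simulation excluded).

    Assumption: the status recorded at the last visit to an activity decides.
    Visits are expected in chronological order.
    """
    last_status = {}
    for v in visits:
        if v.activity != "Simulation":
            last_status[v.activity] = bool(v.completed)
    return sum(last_status.values())


# Invented session for illustration only.
visits = [
    Visit("Simulation", 0.0, 2.0, None),
    Visit("Drill-1", 2.5, 4.5, True),
    Visit("Drill-2", 5.0, 6.0, False),
    Visit("Simulation", 6.5, 7.0, None),
    Visit("Drill-2", 7.5, 9.0, True),
    Visit("Game", 9.0, 10.0, False),
]
```

As a check against Table 1, Rosie's reported values give a Relative Time on Task of 10.2 / 17.9 ≈ 0.57, which matches the tabulated ratio.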


Sequence of Learning is a binary variable indicating whether the activities were done linearly or not. It is important to emphasize at this stage that the module's interface is simple and linear, hence any nonlinear consumption of it is somewhat surprising. However, two of the students (Rosie and Kenny) did not take the activities in the order in which they were presented. For Kenny (Fig. 1d), this nonlinearity is evident from his Learnogram, according to which he visited the simulation and then continued with the drills and the game; after a failure in the game he executed all the activities again, in the opposite order. After completing the activities successfully, he returned to the game, failed again and returned to previous activities. He completed these activities successfully and returned to the game, ending it successfully. Only then did he turn to the test, which he completed successfully. As for Zori (Fig. 1c), she started with the simulation and then continued to the activities, taking them in the order presented in the module; her Learnogram shows that during the Drill-3 activity she returned to the Drill-2 activity. An examination of the log file indicates that she returned only to check her answers. Therefore, we argue that Zori's Sequence of Learning is linear.

The seventh variable, Time Segmentation, is a vector representing the percentage of Time on Task spent on each of the four activity types. We suggest that this variable relates to time management. Fig. 2 presents the values of Time Segmentation for each student, reflecting different divisions of the learning time within the module. For example, Lenni didn’t spend any time in the simulation but spent almost half of his time on the test (48%). Kenny spent more time on the game than on any other activity (31%). Rosie spent the longest time on the test (36%), and Zori spent most of her time on the Drills. Comparing across students, Rosie dedicated the highest percentage of time to the simulation, Lenni to the test, Kenny to the game, and Zori to the Drill activities.
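Sequence of Learning and Time Segmentation can be sketched in Python as follows. Both functions are illustrative assumptions: the linearity test simply checks that the student never moves back to an earlier activity, with an optional, manually supplied set of visits to ignore (such as a return made only to check answers), and the sample sequences and durations are invented rather than taken from the students' logs.

```python
ACTIVITY_ORDER = ["Simulation", "Drill-1", "Drill-2", "Drill-3", "Game", "Test"]

# Grouping of the six activities into the four activity types.
ACTIVITY_TYPE = {
    "Simulation": "Simulation",
    "Drill-1": "Drills",
    "Drill-2": "Drills",
    "Drill-3": "Drills",
    "Game": "Game",
    "Test": "Test",
}


def is_linear(sequence, review_only=()):
    """True if the visited activities never go back in the presented order.

    sequence: activity names in the order visited.
    review_only: indices of visits to ignore (hypothetical manual annotation).
    """
    rank = {name: index for index, name in enumerate(ACTIVITY_ORDER)}
    kept = [rank[name] for i, name in enumerate(sequence) if i not in review_only]
    return all(earlier <= later for earlier, later in zip(kept, kept[1:]))


def time_segmentation(minutes_per_activity):
    """Percentage of Time on Task spent on each of the four activity types."""
    totals = {"Simulation": 0.0, "Drills": 0.0, "Game": 0.0, "Test": 0.0}
    for activity, minutes in minutes_per_activity.items():
        totals[ACTIVITY_TYPE[activity]] += minutes
    total_time = sum(totals.values())
    if total_time == 0:
        raise ValueError("Time on Task is zero; segmentation is undefined")
    return {kind: 100.0 * minutes / total_time for kind, minutes in totals.items()}


# Invented examples for illustration only.
back_and_forth = ["Simulation", "Drill-1", "Drill-2", "Drill-3", "Drill-2", "Drill-3", "Game"]
skipped_simulation = ["Drill-1", "Drill-2", "Drill-3", "Game", "Test"]
segmentation = time_segmentation(
    {"Simulation": 1.0, "Drill-1": 2.0, "Drill-2": 1.0, "Drill-3": 1.0, "Game": 2.0, "Test": 3.0}
)
```

The `review_only` argument reflects the kind of judgment described for Zori: a purely mechanical test would flag her return to Drill-2 as nonlinear, so a decision taken after inspecting the log has to be supplied explicitly.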

Fig. 2 Students’ Time Segmentation over the four activity types.

5. Discussion and future work

Many cognitive, meta-cognitive and affective aspects of learning that are relevant to the way students control their learning and implement it online, such as self-regulation, self-efficacy, and autonomy, might be reflected in the hidden traces they leave in log files [8, 9]. Revealing these patterns and inferring the learning processes from them is a very challenging task. In this study, we have demonstrated the potential of log file analysis for enhancing our understanding of the online learning process. When learning independently in Web-based learning environments, students utilize the system in different ways and implement varied learning strategies according to their goals, needs and preferences. Understanding these aspects of the learning processes is important for integrated formative evaluation and instructional design [10, 11]. The seven variables defined and extracted in this study from the log files and the Learnograms present some of the differences among students in the way they consume the online content (e.g., the order in which they take the activities compared with the order in which they are presented, the extent to which they use the information resource, the way they manage their time). Furthermore, they may hint at the students' motivation while learning (e.g., through the number of activities successfully completed and the focus on task). This study lays the foundations for wider research, in which we will define and extract additional variables from the log files and examine the relationships between these variables and cognitive and affective aspects of the learning process.
Educational Data Mining is an emerging research field, serving a range of educational goals within Web-based educational systems, such as: evaluation of learning and of the effectiveness of instructional designs, development of adaptive environments for students based on their actual behaviors, provision of feedback to both students and educators, and identification of irregular learning behaviors in the environments [12, 13]. Yet, we are only taking our first steps in the long journey that may finally enable us to portray the learning process using these digital traces.

Acknowledgements

This study is partially funded by the Center for Educational Technology (CET). We would like to thank Ofer Tiber and Josh Reuben for their valuable technical contribution.

References

[1] D. Mioduser and R. Nachmias. WWW in Education, in H. Adelsberger, B. Collis and M. Pawlowski (Eds.), Handbook on Information Technologies for education & Training (2002), pp. 23-43.
[2] S. Alessi and S. Trollip. Computer-Based Instruction: Methods and development (Englewood Cliffs, NJ: Prentice Hall, 1985).
[3] R. Sims and J. Hedberg. Dimensions of learner control: A reappraisal for interactive multimedia instruction. In H. Maurer (Ed.), Educational multimedia and hypermedia (Association for the Advancement of Computing in Education, USA, 1995).
[4] R. Nachmias and A. Hershkovitz. Using Web mining for understanding the behavior of the online learner. The International Workshop on Applying Data Mining in e-Learning (ADML'07), Crete, Greece, 2007.
[5] C. Romero, S. Ventura and E. Garcia. Data mining in course management systems: Moodle case study and tutorial. Computers and Education, 51, 1, pp. 368-384 (2008).
[6] C. Romero and S. Ventura. Educational data mining: A survey from 1995 to 2005. Expert Systems with Applications, 33, 1, pp. 135-146 (2007).
[7] R. Nachmias and A. Hershkovitz. Learning about the online learner. Workshop on Logging Traces of Web Activity: The Mechanics of Data Collection, WWW'2006, Edinburgh, Scotland, 2006.
[8] A.F. Hadwin, J.C. Nesbit, D. Jamieson-Noel, J. Code and P.H. Winne. Examining trace data to explore self-regulated learning. Metacognition and Learning, 2, pp. 107-124 (2007).
[9] M. Cocea and S. Weibelzahl. Cross-system validation of engagement prediction from log files. Second European Conference on Technology Enhanced Learning (EC-TEL 2007), Crete, Greece, 2007.
[10] C. Pahl. Data mining technology for the evaluation of learning content interaction, International journal of eLearning, 3, pp. 47-55 (2004).
[11] A. Hirami. The design and sequencing of e-learning interactions. A Grounded approach. International journal on e-learning, 1, 1 (2002).
[12] F. Castro, A. Vellido, A. Nebot and F. Mugica. Applying data mining techniques to e-learning problems. In L. C. Jain, T. Raymond and D. Tedman (Eds.), Evolution of Teaching and Learning Paradigms in Intelligent Environment (Berlin: Springer-Verlag, 2007), pp. 183-221.
[13] J. Srivastava, R. Cooley, M. Deshpande and P.T. Tan. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations, 1 (2000).
