Support System for e-Learning Environment Based on ... - IEEE Xplore

5 downloads 1032 Views 730KB Size Report
paper proposes a novel structure of a support system for e-. Learning infrastructure that is based on data representing learner's activities and processes.
Support System for e-Learning Environment Based on Learning Activities and Processes Dumitru Dan Burdescu, Marian Cristian Mihăescu

Costel Marian Ionaşcu, Bogdan Logofătu

Software Engineering Department University of Craiova Craiova, Romania

Statistics/CREDIS Department University of Craiova/Bucharest Craiova/Bucharest, Romania

Abstract—In e-Learning domain there has been observed an important effort towards designing and integrating support system software. These software systems may be fully integrated or run as services along e-Learning platforms and have as main goal increasing the effectiveness of the e-Learning processes. This paper proposes a novel structure of a support system for eLearning infrastructure that is based on data representing learner’s activities and processes. Data relevance is based on well discipline structuring based on concept maps. Markov chain modeling and classification is used as main intelligent procedure for data analysis. Keywords-e-Learning; support system; concept maps; Markov chain modeling;

I.

INTRODUCTION

Design, building and integrating a support system into an eLearning environment is one of the greatest challenges of the Information and Communication Technology (ICT) in Education domain. Huge amount of hypermedia may be accessed from Internet thus leading to large amounts of data regarding user behavior. This data may be used for intelligent user profiling, content reaching and classification, personalized intelligent interface. The main prerequisite that is used for building the support system regards the proper structuring of the studied discipline. An e-Learning infrastructure needs to be set up such that disciplines are presented in a structured way. For this purpose there are used concept maps [1]. Behind each concept from the structured presentation of the discipline there is a pool of quizzes that are accessed by learners. As business logic we use an approach based on the Markov Chain Model to learn and to characterize learner behaviors and learning processes from his/her interactions with a web based learning system. II.

RELATED WORK

First, confirm that you have the correct template for your paper size. This template has been tailored for output on the A4 paper size. If you are using US letter-sized paper, please close this file and download the file for “MSW_USltr_format”. The template is used to format your paper and style the text. All margins, column widths, line spaces, and text fonts are prescribed; please do not alter them. You may note peculiarities. For example, the head margin in this template measures proportionately more than is customary. This

measurement and others are deliberate, using specifications that anticipate your paper as one part of the entire proceedings, and not as an independent document. Please do not revise any of the current designations. A. e-Learning Infrastructure For running such a process the e-Learning infrastructure must have some characteristics. The process is designed to run at chapter level. This means a discipline needs to be partitioned into chapters. The chapter has to have assigned a concept map which may consist of about 20 concepts. Each concept has assigned a document and a set of quiz questions. Each concept and each quiz has a weight, depending of its importance in the hierarchy. It is supposed that the platform run for some time such that there is a history of performed activities for a large number of learners. Figure 1 presents a general e-Learning infrastructure for a discipline.

Figure 1. General structure of a discipline

Once a course manager has been assigned a discipline he has to set up its chapters by specifying their names and their associated concept maps. For each concept managers have the possibility of setting up one document and one pool of questions. When the discipline is fully set the learning process may start for learners. Any opening of the document and any test quiz that is taken by a learner is registered. The business logic of document retrieval tool wills using these data in order to determine the moment when it is able to determine the document (or the documents) that are considered to need more attention from the learner. The course manager specifies the number of questions that will be randomly extracted for creating a test or an exam.

978-1-4244-4840-1/10/$25.00 ©2010 IEEE

Let us suppose that for a chapter the professor created 50 test quizzes and he has set to 5 the number of quizzes that are randomly withdrawn for testing and 15 the number of quizzes that are randomly withdrawn for final exam. It means that when a student takes a test from this chapter 5 questions from the pool of test question are randomly withdrawn. When the student takes the final examination at the discipline from which the chapter is part, 15 questions are randomly withdrawn. This manner of creating tests and exams is intended to be flexible enough for the professor. B. Concept Maps Concept mapping may be used as a tool for understanding, collaborating, validating, and integrating curriculum content that is designed to develop specific competencies. Concept mapping, a tool originally developed to facilitate student learning by organizing key and supporting concepts into visual frameworks, can also facilitate communication among faculty and administrators about curricular structures, complex cognitive frameworks, and competency-based learning outcomes. To validate the relationships among the competencies articulated by specialized accrediting agencies, certification boards, and professional associations, faculty may find the concept mapping tool beneficial in illustrating relationships among, approaches to, and compliance with competencies [2]. Figure 2 is a typical example of a concept map. It addresses a simple question, “What is a plant?” and illustrates how crosslinks can be made and concepts organized. This is just one way the concepts listed on the left of the figure can be positioned in a hierarchical fashion to show relationships. Recent decades have seen an increasing awareness that the adoption of refined procedures of evaluation contributes to the enhancement of the teaching/learning process. In the past, the teacher’s evaluation of the pupil was expressed in the form of a final mark given on the basis of a scale of values determined both by the culture of the institution and by the subjective opinion of the examiner. This practice was rationalized by the idea that the principal function of school was selection - i.e. only the most fully equipped (outstanding) pupils were worthy of continuing their studies and going on to occupy the most important positions in society. According to this approach, the responsibility for failure at school was to be attributed exclusively to the innate (and, therefore, unalterable) intellectual capacities of the pupil. The learning/ teaching process was, then, looked upon in a simplistic, linear way: the teacher transmits (and is the repository of) knowledge, while the learner is required to comply with the teacher and store the ideas being imparted [3]. Usage of concept maps may be very useful for students when starting to learn about a subject. The concept map may bring valuable general overlook of the subject for the whole period of study. It may be advisable that at the very first meeting of students with the subject to include a concept map of the subject.

Figure 2. An Example of a Concept Map (Novak, The Institute for Human and Machine Cognition)

C. HMM - Hidden Markov Modeling Hidden Markov model (HMM) is a well-known approach to probabilistic sequence modeling and has been extensively applied to problems in speech recognition, motion analysis and shape classification [4, 5]. The Viterbi algorithm has been the most popular method for predicting optimal state sequence and its associated maximum posterior probability value. The BaumWelch re-estimation, a variant of the well-known EM algorithm, has been the predominant model estimation technique given a HMM topology. In HMM based learning, initialization is a crucial step due to the local behavior of the standard procedure used to estimate the HMM parameters. Training with inappropriate initializations may be trapped in a local minimum and should be avoided whenever possible. Another fundamental issue is the determination of the structure of the HMM. The choice of a good model structure is critical to the effectiveness of the learning process [5]. The problem of characterizing a user’s behavior on a web-site has gained popularity due to the rapid growth of the need to personalize and influence users’ browsing experience. Markov models have been found well suited for addressing such problems. In general, the input for these problems is a sequence of some web pages accessed by a user and the objective is to build Markov models that can characterize the user’s behavior [6, 7]. In our approach the input is reflected by the sequence of results regarding specific concepts. However, in most relevant work the web users’ behavior is modeled by a simple Markov chain where the individual states are directly associated with particular navigation links on web pages. In other words, there are no hidden states for representing different internal cognitive behaviors of users. In our approach each concept represents a state that is described by the set of quizzes that were answered. This paper aims at

exploiting the capabilities of hidden Markov models for building a support system. The main objective is to investigate HMM approach as main intelligent procedure that produces knowledge. There are addressed issues of model selection and initialization. III.

HMM MODELLING

A. Introduction to hidden Markov models A Markov process, is a mathematical model for the random evolution of a memory less system, that is, one for which the likelihood of a given future state, at any given moment, depends only on its present state, and not on any past states. This means the process has the Markov property. HMM is defined by a transition matrix which defines the transition probabilities between N hidden states and an emission matrix, which defines the probabilities of M discrete observations under each state. A vector of initial state probabilities is also necessary.

Thus, HMM may be defined as:   , , , where A is the transition matrix, which defines the transition probabilities between N hidden states, B is the emission matrix, which defines the probabilities of M discrete observations under each state and π is a vector of initial state probabilities. A typical HMM-based classifier adopts the maximumlikelihood criterion, where an unknown sequence O is assigned to the class showing the highest likelihood.

  arg    |  , where  is the HMM corresponding to the i-th class. This requires training C HMMs for a C-class problem. The probability term  |  can be computed with the well-known forward-backward procedure [8]. The learning process for a learner may be seen as a sequence of actions regarding the activities related to a concept. Sequential modeling is based on stochastic methods that form the sequences. The probabilistic model may be used for predicting subsequent visits as states in a Markov chain [9].

B. Learner’s classification methodology When learners interact with the e-Learning environment, the learning actions and processes are dynamically updated. That is why the learners are classified using a ContinuousTime Markov Chain which includes the learning processes of every learner and we denote the advanced learners’ probability as the transition probability from learner X to learner Y over a step period. The weight of a concept within a learner’s learning process is obtained taking into account the experiences of previous learners. The weight is high (which means is close to 1) for hard questions and is low (which means is close to 0) for easy questions. For example if from 1000 learners 100 answered correctly to a question this means this is a hard question with a high weight of 0.9. The average weights of questions that relate to a concept determine the weight of the concept, which is also between 0 and 1. Learners generally have different usage scenarios regarding learning activities. We think that we may differentiate among

classes of learners starting from performed activities. The content access sequence may be regarded as input data for a process of user modeling. The obtained user model may thereafter used to characterize the navigation sequence for the corresponding class members. The final goal is to obtain the model’s parameters such that any new user may be classified. Having in mind the above considerations the Hidden Markov Model (HMM) looks like a natural choice since it meets the requirements of modeling discrete temporal sequences like web content access actions. Each class of users will be characterized by its corresponding HMM. Discovery of new classes of users is concretized by defining new HMM. The first step of the overall framework is training. The input data is represented by the set of navigation sequence for all users. The Baum-Welch algorithm [10] is used to iteratively estimate the parameters of the HMM such that the likelihood of the sequences is maximized. The procedure is run for all user classes so that the parameters of corresponding HMM are estimated. In this way there may be obtained classes of user behaviors that are modeled in terms of transitions within a HMM. In order to be classified a learner must have a minimum required activity such that he may be classified. The forwardbackward procedure will determine to which existing HMM he belongs. At this point a model selection procedure takes place. Model selection procedure is based on two prerequisites: the topology of HMM and the number of hidden states. Learner’s sessions are made of performed actions of learners that are obtained by custom preprocessing of raw log data. Finding hidden states is not a trivial procedure. We suppose that there are several hidden states of psychological nature behind the observation sequences. Still, temporal order between different states may be very hard to prove. That is why, a generic Markov model is used such that relations between all hidden states are taken into consideration. Having hidden states of abstract nature the number of states is very hard to be determined. The classical Bayesian Information Criterion (BIC) is used as score function to evaluate each possible model structure. The following score functions [11] is used in model selection.     2   



log #

 is the negative log-likelihood function measuring the goodness of fit of the model described by  .  is the number of free parameters in  and # is the sample size taken into account. The goodness of fit and model complexity is the characteristics that need to be balanced. The goal is a low model complexity and high level of fit. In general a tradeoff between these two characteristics is performed through the usage of the previously presented score function. Using such a nested searching algorithm the final goal is to obtain solutions that are reasonable regarding the level of fit and acceptable computational complexity. Having a high level of optimization of the HMM is highly influenced by the initialization of the training procedure. Choosing inappropriately a starting point may often lead to a suboptimal solution because solution may be trapped in a local minimum. High complexity methods may be used to avoid such situation. Still, a feasible solution is to repeat the

Figure 3. Binary Search Tree Concept Map

procedure several times with random initial starting points and use the best result as final solution. This approach proved to be robust and reliable. HMM learning algorithms relies on two matrices A and B as presented in introduction. Random initialization of both matrices is not generally a good idea since it increases the computational complexity. Experiments have shown that transition matrix A has a greater influence in the final result than emission matrix B. That is why we cannot afford to have both matrices randomly initialized. At each trial matrix A is randomly initialized and matrix B is initialized to some default unique distribution. This initialization strategy proved to be simple and yet effective regarding obtained results. C. Effective student’s class discovery methodology Firstly, there is needed a formula for obtaining a weight for each concept. For this purpose, the following formula is used:

$  exp ) ,+ , where sk is knowledge weight a learner + gained after accessing concept k. Sk is the average knowledge weight for all learners who accessed the concept k. Thus, Wk is the weight of concept k. *

A learner’s matrix where learner’s relative accumulated knowledge is computed using the function: -. 

∑012 $. ., ∑012 $ .,

Where pia is a distance path between learners a and i. Cai is the concept that has been accessed by both learners a and i. k represents the learning objects that were accessed both by learners a and i.

Sai,k is normalized using the following formula:

4$., ) 4$,  1 , 67 4$., 8 4$, @ .,  3  0, :;?6 =

LWa,k is the learning weight for learner a regarding the concept k. Similarly, LWi,k is the learning weight for learner i regarding the concept k. Ak is the average of learning weights for all learners who had activity regarding the concept k. The above formulas are used as in [13] to compute the learner’s matrix. GH 67 I H 8 0 E

H @ F 1 :;?6 = E D J Where N is the number of learners and CCC AB is the revised transition probability matrix. The primitive stochastic matrix -K is:

CCC  AB 

== M J In this way different classes of users (A – advanced, B – average and C - beginners) may be obtained along with individual profiles derived from performed activities. C CCC AB  L  1 ) L

The recommendation procedure starts by an initial classification of students in three classes: A, B and C. Once the classes are obtained any new learner is classified and thus a corresponding HMM general model is obtained. The rating estimation is computed as in [14]. Finally, a concept is

recommended to the learner based on item-to-item collaborative filtering [15] from the set of concepts which were note accessed by the learner. IV.

EXPERIMENTAL RESULTS

The setup and experimental results were obtained on Tesys e-Learning platform [12]. On this platform there was set Algorithms and Data Structures discipline. The tests were performed for the chapter named Binary Search Trees. Figure 3 presents the concept map for Binary Search Trees chapter.

we denote as HMM A, HMM B and HMM C respectively. Random transition matrix initialization is employed to reach global optimal solution. The number of hidden states is selected by BIC using equation. Table 3 gives the normalized score function values under several different state numbers, which shows that the 6-state topology leads to minimal score function and is thus the best choice for the model. This leads to an optimal topology with a 6x6 transition matrix and a 6x7 emission matrix. TABLE III.

For each concept there is associated a set of quizzes. As learners start answering quizzes the table data regarding performed activities is continuously updated. Table 1 presents the structure of recorded activities. Table 2 presents a snapshot for the activities table. The total number of students of 400 and the total number of problems is 200. The time during which results have been taken is six months. In order to obtain relevant results we pruned noisy data. We considered that students for which the number of taken tests or the time spent for testing is close to zero are not interesting for our study and degrade performance and that is why all such records were deleted. After this step there remained only 268 instances. TABLE I.

STRUCTURE OF RECORDED ACTIVITIES

Event type

Details

login

start time and duration

view document

start time, duration, concept

self test

start time, duration, quizzes

messages

start time, duration, receiver categories

TABLE II. User id 10 11 12 13

SAMPLE ACTIVITIES TABLE

Event type

Details

login

10.10.2009 11:20, 25 min

view document self test

10.10.2009 11:21, 10 min, concept11 10.10.2009 11:31, 3 min, concept11 10.10.2009 11:34, 5 min, prof3, prof4

messages

The students were divided in three groups A, B and C according with results obtained in previous year of study at Computer Programming discipline. In order to have relevancy, groups A and C have 25% of students each, while group B has 50% of the students. In each group, a number of students are selected at random and their navigation sequences are used as the training data set while the others form the test data set. For this study there are 90 % of students are used for training and the remaining 10% are used for testing. The procedure is repeated ten times such as a tenfold cross validation is performed. For each group of students under study, a corresponding HMM is constructed to model their navigation behavior, which

BIC SCORE FUNCTION FOR DIFFERENT NUMBER OF STATES

No. of states

3

4

5

6

SBIC(A)

1.26

1.25

1.18

1.15

SBIC(B)

1.31

1.21

1.21

1.18

SBIC(C)

1.41

1.31

1.51

1.09

The parameters of the three HMMs are determined based on the training sequences. The classification performances are evaluated by applying these three models on test data. Test data is represented by actions which are not included in the training set. For each group of test data there are tested sequences with different lengths. Still, each HMM is used for all students disregarding the initial class in which they are places. This methodology is used in order to check if obtained models may distinguish between classes. The results showed higher log likelihood for the learners that actually belong to that class than other students. This is a clear indication that each obtained HMM obviously distinguishes between students that belong to corresponding class and all other students. For the test data the classification accuracy is 88.5% for class A, 93.31 for class B and 91.2% for class C. The obtained results showed an accuracy increase for students with longer sequences. This result is somehow expected since longer sequence means more data and therefore more predictable behavior. The in depth analysis may reveal activity length thresholds from where accuracy levels are very good. When sequence length is greater than 50 the accuracy of classification is at least 98%. Once we have the HMMs there may be obtained other relevant evaluations. It may be computed the equilibrium state probabilities that is associated with all three HMMs. The probabilities computations show that groups A and C have greater entropy than group B. This observation has a great importance regarding the predictability of user’s activities. Accordingly, the Markov chain may be obtained starting from the concept map. The Markov chain has only in-memory representation and represents the model that stands the basis for learner’s score computation. We assume students’ score in every concept are that student A is (0.8, 0.7, 0.7, 0.6) and student B is (0.8, 0.6). Both student A and B have same pre-testing score ”0.7”. Then the “SCORE” of student A is 0.7 and that of student B is 0.7. we can also compute the first arrival probability of student A is 1/4 and that of student B is 1. When evaluating student assessment, SCORE

is firstly considered and the second is the first arrival probability “f”. Observing the data of student A and B, student B’s assessment is higher. V.

CONCLUSIONS AND FUTURE WORK

We have created a procedure of data analysis which may be used for running a support system for an e-Learning environment. The procedure is based on learning activities performed by learners and uses processes analysis. An e-Learning platform has been set up such that a set of learners may study a discipline that is well structured. This means that the discipline is divided into chapters and that each chapter has assigned a concept map. More than this, each concept defined in the concept map has assigned a document and a set of quizzes. The platform on which the study may be performed needs to have built in capabilities of monitoring students activities. The business goal of the platform is to give students the possibility to download course materials, take tests or sustain final examinations and communicate with all involved parties. To accomplish this, four different roles were defined for the platform: sysadmin, secretary, professor and student. The main contribution of our proposed system is to provide instructors a course content design and management system. Through the internet technology, students are able to access and study course contents via browsers. The load of instructors has been reduced by using our system because our system is easy to operate and requires no professional background of computer technology. Besides, our system collects information of students while they browse course materials then with the collected information, system can automatically generate individual tutorial for each student as supplement for his weakness of study. This approach makes usage of HMM classification in webbased education systems. BIC is employed to select proper model topology and random initialization is used to guarantee global optimization. Our experiments show that the proposed approach is capable of assigning new student users to their corresponding categories with high accuracies. Although our proposed system is practically useful, we still hope to improve it in some ways. We are considering to add course adoption function into our current system then instructors can simply add a new course without designing all course units. We also hope to provide not only individual tutorial for students but also suggestions for modifying course units and problems for instructors.

provide facilities for adaptive assessment. In this vein, extending existing open source tools seems to be an interesting idea. By using the technology of asynchronous distance education, our proposed system helps students to break the limitation of time and spatial. Study becomes more convenient for students and the instructors’ load caused by new technology also has been reduced by our proposed system. We hope our proposed system can benefits both educators and students. The evaluating strategy traces student’s activities so that it can evaluate his/her performance in every concept. Furthermore, it is to acquire information from the learning process. For most students, it is just not possible to have consistent help through many tests at many different times. In addition, alternative assessment techniques in every concept also discourage cheating and plagiarism. According to the experiment’s analysis, the new evaluating strategy is usability and reliability. [1]

[2]

[3]

[4]

[5]

[6] [7]

[8] [9] [10]

[11] [12]

[13]

In this approach, we introduce an intelligent assessment methodology based on the students’ knowledge information. To this end it has been realized the representation of associations between test questions and learning concepts. Also, questions are distinguished in three levels of difficulty. Assessment is done at two levels, the concept and the goal level. In the assessment process, the difficulty level of questions is taken into account, which is not the case in existing systems. To technically achieve the above, expert systems technology is used. Very few e-learning environments

[14]

[15]

Joseph D. Novak & Alberto J. Cañas (2006). "The Theory Underlying Concept Maps and How To Construct and Use Them", Institute for Human and Machine”Cognition. E., McDaniel, B., Roth, and M. Miller, Concept Mapping as a Tool for Curriculum Design. Issues in Informing Science and Information Technology, 505-513, Flagstaff, AZ, June 16-19, 2005. L., Vecchia, M., Pedroni, Concept Maps as a Learning Assessment Tool. Issues in Informing Science and Information Technology, Volume 4, 2007. L. Rabiner, A tutorial on Hidden Markov Models and selected applications in speech recognition, Proc. IEEE, vol. 77, no. 2, pp. 257286, 1989. M. Bicego, V. Murino, Investigating Hidden Markov Model’s capabilities in 2D shape classification, IEEE Trans. PAMI, vol. 26, no. 2, pp. 281-286, 2004. R. Sarukkai, Link prediction and path analysis using Markov chains, Computer Networks 33: 377-386, 2000. M. Deshpande and G. Karypis, Selective Markov models for predicting web page accesses, ACM Trans. on Internet Technology 4(2): 163-184, 2004 G.D. Forney, Jr., “The Viterbi Algorithm”, Proc. IEEE, vol. 61, Mar. 1973, pp. 268 - 278. J.R.Norris, Markov Chains, Cambridge University Press,1997. L. E. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains", Ann. Math. Statist., vol. 41, no. 1, pp. 164–171, 1970. D. Hand et al., Principles of data mining, Cambridge, MIT Press, 2001. Burdescu, D.D., Mihăescu, M.C., (2006), “Tesys: e-Learning Application Built on a Web Platform”, in Proceedings of International Joint Conference on e-Business and Telecommunications, pp. 315-318, Setubal, Portugal. INSTICC Press. X.Song, B.L.Tseng, C.Y.Lin, and M.T.Sun, “Personalized Recommendation Driven by Information Flow”, International ACM SIGIR Conference on Research & Development on Information Retrieval, 2006. G. Adomavicius, “Incorporation Contextual Information in Recommender Systems Using a Multidimensional Approach”, ACM Transactions on Information Systems, Vol.23, No.1, pp. 103-145, 2005. G.Linden, B.Smith, J.York, “Amazon.com Recommendations: Item-toItem Collaborative Filtering”, Proc. 7, IEEE Internet Computing, Vol.7, No. 1s, pp.76-78, 2003.