Mining Categories of Learners by a Competitive Neural Network G. Castellano, A.M. Fanelli and T. Roselli Computer Science Department, University of Bari Via Orabona, 4 - 70126 Bari, Italy E-mail: [castellano, fanelli, roselli]@di.uniba.it

Abstract

This paper addresses the problem of user modeling, which is a crucial step in the development of Adaptive Hypermedia systems. In particular, we focus on adaptive educational hypermedia systems, where the users are learners. Learners are modeled in the form of categories that are extracted from empirical data, represented by responses to questionnaires, via a competitive neural network. The key feature of the proposed network is that it is able to adapt its structure during learning so that the appropriate number of categories is automatically revealed. The effectiveness of the proposed approach is shown on two questionnaires of different types.

1 Introduction

Adaptive hypermedia (AH) is a new direction of research within the area of adaptive systems [1], [2]. AH systems build a model of the user and use it for adaptation to the needs of that user, e.g. to adapt the content of a hypermedia page to the user's knowledge and goals, or to suggest relevant links to follow. AH systems are useful in any application area where the system is expected to be used by people with different knowledge, goals and backgrounds, and where the hyperspace is reasonably large. All AH systems are based on modeling the user and representing his information needs in the form of a profile. Hence "user modeling", which concerns building and updating the user model, represents a crucial step in the development of AH systems. These systems and their benefits have been reviewed in many studies regarding user modeling and evaluation [3], [4], [5].

In this paper we focus on a specific class of AH systems, adaptive educational hypermedia systems, where the users are learners with different knowledge, attitudes and backgrounds. In such a case, the system uses the information about the learner to determine the didactic material and the suitable path to provide to him [6]. The AH system must also be able to model the student even before the learning process begins, in order to tailor the

teaching strategies to his learning profile, interests and real needs. To derive a model for such users, we address the problem of mining categories of learners with "similar" interests and attitudes from empirical data in the form of questionnaire responses.

One of the simplest ways to automatically identify groups of similar learners is cluster analysis. Instead of deciding subjectively on the attribute values for each category of learner, cluster analysis detects groups of similar cases, and the attribute values for each cluster are derived from those of the cluster members. Members of the same cluster are similar in terms of their response data, and members closer to the cluster center are more typical of that category than those at a greater distance from the center. To this aim, a competitive neural network is adopted that performs cluster analysis of the data (in the form of questionnaire answers) to aggregate them into clusters. From each cluster a learner profile is directly derived.

Existing approaches to user profile extraction based on clustering techniques [2] require the number of clusters to be fixed in advance. Since the inherent structure of the data is unknown and a priori knowledge about the number of user categories is rarely available, the proper number of clusters must be established by expensive trial-and-error processes. The approach proposed in this work overcomes this limitation: learner profiles are extracted from the data by a competitive neural network that is able to adapt its structure during learning, so that the appropriate number of categories is automatically revealed.

The paper is organized as follows. In Section 2 we introduce the proposed strategy for mining learner profiles by categorization of questionnaire responses. In Section 3 we describe the competitive neural network and its learning algorithm. Section 4 presents experimental results on two questionnaires of different types. Finally, conclusions are drawn in Section 5.

2 Extraction of categories from data

The proposed approach is concerned with finding categories of learners from a set of data consisting of learners' responses to questions that may concern knowledge about a specific domain, background details and psychological attributes. Learner categories are extracted in the form of clusters, i.e. entities embracing collections of numerical data that exhibit some functional or descriptive commonalities. Formally, we assume the availability of a set of data concerning the responses of N students to a questionnaire made of n questions, each with multiple answers. We denote the dataset by:

D_N = { x(t) = (x_1(t), ..., x_n(t)) }_{t=1}^{N}

where x(t) represents the vector of responses given by the t-th student to the n questions. Each component x_i(t) is an integer denoting the index of the response provided by the t-th student to the i-th question. To embrace such response vectors into prototypes, we consider a competitive neural network with a learning algorithm that is able to perform clustering in the input space while determining the number of prototypes needed to properly model the data. In this way, the number of student categories and the attributes of each category are established simultaneously and automatically. The strategy for the extraction of learner categories is portrayed in Figure 1.

3 The competitive neural network

To find proper clusters in the input space, we use a single-layer neural network trained via a competitive learning algorithm with the ability to adapt the number of clusters as learning proceeds. The algorithm is a soft competitive learning algorithm, similar to [7], that introduces concepts of reward and punishment into the standard competitive learning scheme. Starting from an initial network structure based on a guessed maximum number of clusters (learner categories), given as a form of a priori knowledge, the algorithm dynamically reduces the network structure, so that a suitable number of clusters for representing the input data is selected automatically.
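The adaptive soft competitive scheme just described can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the random initialization, the exact linear-decay schedule, and the function name are assumptions.

```python
import numpy as np

def soft_competitive_clustering(X, H=10, alpha_w=0.25, alpha_r=0.15,
                                epochs=50, eps=1e-4, seed=0):
    """Sketch of frequency-sensitive competitive learning with rival
    penalization.  Parameter names (H, alpha_w, alpha_r, eps) follow the
    text; initialization and decay details are assumptions."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    C = rng.uniform(lo, hi, size=(H, X.shape[1]))  # H initial prototypes
    wins = np.ones(H)                              # winning occurrences
    for tau in range(epochs):
        decay = 1.0 - tau / epochs                 # assumed linear decay
        aw, ar = alpha_w * decay, alpha_r * decay
        C_old = C.copy()
        for x in X:
            d = np.linalg.norm(C - x, axis=1)      # Euclidean distances
            D = (wins / wins.sum()) * d            # frequency-sensitive scaling
            w, r = np.argsort(D)[:2]               # winner and its rival
            wins[w] += 1
            C[w] += aw * (x - C[w])                # reward the winner
            C[r] -= ar * (x - C[r])                # punish the rival
        if np.mean(np.linalg.norm(C - C_old, axis=1)) <= eps:
            break                                  # prototypes have stabilized
    keep = np.all((C >= lo) & (C <= hi), axis=1)   # prune out-of-range units
    return C[keep]
```

Because rival penalization gradually pushes superfluous units outside the input range, the number of prototypes returned (K ≤ H) is determined by the data rather than fixed in advance.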

Figure 1: Diagram of the process of mining categories from data.

Specifically, when an input vector x is presented to the network, its units compete and the unit whose weight vector c is closest to the input vector is chosen as the winner. The unit with the second closest weight vector is marked as the rival. The winner unit is then rewarded, i.e. its weight vector is updated so as to move closer to the current input vector. Conversely, the rival unit is punished, i.e. its weight vector is updated so as to move away from the input vector. Through this mechanism the weight vector of the rival unit is pushed far away from the cluster towards which the weight vector of the winner is moving, thus implicitly ensuring that each cluster is represented by only one weight vector. This reward-punishment mechanism, which gradually drives the weight vectors of extra units away from the distribution of the data, allows the appropriate number of weight vectors, and hence the number of clusters, to be selected automatically. The closeness between a pattern and a cluster center is measured in terms of the Euclidean distance scaled by the winning frequency of the unit, according to the Frequency-Sensitive Competitive Learning proposed in [8]. The advantage of incorporating a frequency-sensitive term is that underutilization of small clusters is avoided. The complete learning algorithm is summarized in Figure 2.

Summarizing, starting with H units, the network self-organizes its structure by automatically finding a set of K ≤ H units whose weight vectors c_k (k = 1, ..., K) represent the centers of clusters in the input space. Each cluster center is used to derive a profile that summarizes the typical attributes of the learners grouped in that cluster. Hence, the final number of network units provides the number of learner categories needed to properly model the data. In other words, by adapting its structure, the network adapts the categorization level to the available data.

BEGIN
1. τ := 0 /* epoch number */
2. Initialize randomly the weight vectors c_k, k = 1, ..., H.
3. Initialize the learning rates α_w and α_r for the winner and the rival, respectively, so that 0 ≤ α_r ≤ α_w ≤ 1.
4. REPEAT
   a) τ := τ + 1;
   b) For each training input vector x(t), t = 1, ..., N:
      - For k = 1, ..., H compute the distances:
        D(x(t), c_k) = (N_k / Σ_{h=1}^{H} N_h) d(x(t), c_k)
        where N_k is the cumulative number of winning occurrences for unit u_k and d(·,·) is the Euclidean distance.
      - Determine the winning unit u_w and its rival u_r according to:
        w = arg min_k D(x(t), c_k)
        r = arg min_{k ≠ w} D(x(t), c_k)
      - Update the number of winning occurrences for the winner: N_w := N_w + 1;
      - Update the weight vectors of the winning and the rival unit according to:
        c_w := c_w + α_w (x(t) − c_w)
        c_r := c_r − α_r (x(t) − c_r)
   c) Modify the learning rates α_w and α_r according to a linear decay;
5. UNTIL (1/H) Σ_{k=1}^{H} ||c_k^(τ+1) − c_k^(τ)|| ≤ ε
6. Remove all units u_k whose weight vector c_k falls outside the input range.
END

Figure 2: The competitive learning algorithm.

4 Experimental results

To illustrate the effectiveness of the proposed approach in mining a proper number of student prototypes that reflects the given data well, this section considers two questionnaires that differ in the type and number of questions. The first questionnaire is domain-independent, the second is domain-dependent. In both cases, the learning rates in the soft competitive learning algorithm were fixed at α_w = 0.25 and α_r = 0.15.

The clusters resulting from each run were evaluated in terms of compactness and separation using two classical validity measures, the intra-cluster distance and the inter-cluster distance, respectively. The intra-cluster distance represents an average of the distances between all pairs of vectors within the k-th cluster and is given by:

D_intra(k) = Σ_{x(t)∈C_k} Σ_{x(s)∈C_k, t≠s} d(x(t), x(s))² / (N_k (N_k − 1))

where N_k is the number of input vectors assigned to cluster C_k. This measure is inversely related to the compactness of a cluster, hence for a good clustering the intra-cluster values should be small. The inter-cluster distance represents an average of the distances between vectors in the k-th cluster and vectors in the h-th cluster. It is given by:

D_inter(k, h) = Σ_{x(t)∈C_k} Σ_{x(s)∈C_h} d(x(t), x(s))² / (N_k N_h)
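Both validity measures can be computed directly from their definitions. The following is a small sketch; the function names are ours, not from the original system.

```python
import numpy as np

def intra_cluster_distance(Xk):
    """D_intra(k): average squared Euclidean distance over all ordered
    pairs (t != s) of vectors within one cluster C_k."""
    Xk = np.asarray(Xk, dtype=float)
    Nk = len(Xk)
    sq = ((Xk[:, None, :] - Xk[None, :, :]) ** 2).sum(axis=-1)
    return sq.sum() / (Nk * (Nk - 1))   # diagonal (t == s) contributes zero

def inter_cluster_distance(Xk, Xh):
    """D_inter(k, h): average squared Euclidean distance between members
    of cluster C_k and members of cluster C_h."""
    Xk = np.asarray(Xk, dtype=float)
    Xh = np.asarray(Xh, dtype=float)
    sq = ((Xk[:, None, :] - Xh[None, :, :]) ** 2).sum(axis=-1)
    return sq.sum() / (len(Xk) * len(Xh))
```

For a compact, well-separated clustering, the first value should be small and the second large.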

For a good clustering, the inter-cluster distances should be high, since they measure the separation between clusters.

4.1 First questionnaire

The first experiment concerned the extraction of learner categories to be used in the development of an adaptive educational system about archeology. The dataset contains 50 response vectors obtained by submitting a questionnaire of 16 questions to a group of 50 students of our University who were regarded as potential users of the educational system. All the questions with the corresponding possible responses are given in Figure 3. The first group of questions concerns background details and psychological attributes of the students. The second group concerns the ability to use common types of software, and the questions in the last group try to ascertain preferences about the features required of an adaptive educational system.

1) Gender: a) Male b) Female
2) Are you a confident person? a) Yes b) No
3) Are you a quiet person? a) Yes b) No
4) Are you a shy person? a) Yes b) No
5) Are you a pessimistic person? a) Yes b) No
6) Do you watch television more than 15 hours a week? a) Yes b) No
7) Do you work with a computer more than 15 hours a week? a) Yes b) No
8) What is your level of knowledge about Windows? a) Beginner b) Expert
9) What is your level of knowledge about word processors? a) Beginner b) Expert
10) What is your level of knowledge about e-mail? a) Beginner b) Expert
11) What is your level of knowledge about the Web? a) Beginner b) Expert
12) What is your level of knowledge about MS Office? a) Beginner b) Expert
13) What is your level of knowledge about archeology? a) Beginner b) Expert
14) What type of assistance would you like to receive from an educational system? a) None b) Some c) All
15) In which way would you like to get assistance from the system? a) You ask the system questions when you need assistance b) The system assists you when you need it c) You call for assistance and get it automatically d) Other ways
16) How would you like the educational system to adapt to your knowledge? a) By changing the presentation of links in the document b) By changing the contents of the document c) By enabling access to all the available information with all the links

Figure 3: The first questionnaire.

Note that only one question (no. 13) concerns the level of knowledge about the specific teaching domain. In fact this test is domain-independent, aiming to ascertain the learner's individual characteristics. To obtain numeric input vectors to be processed by the network, all the responses were translated into integers denoting the index of the response given by each student. For example, the response vector

a b b a a b a b a a b b a c d c

was transformed into the following integer vector:

0 1 1 0 0 1 0 1 0 0 1 1 0 2 3 2
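This letter-to-index mapping is trivial to automate. The helper below is a hypothetical sketch, not part of the original system:

```python
def encode_responses(answers):
    """Map letter answers ('a', 'b', 'c', ...) to the integer response
    indices fed to the network (a -> 0, b -> 1, c -> 2, ...)."""
    return [ord(a) - ord('a') for a in answers]

# The example response vector from the text:
print(encode_responses("abbaababaabbacdc"))
# -> [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 2, 3, 2]
```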

To test the ability of the proposed learning algorithm to find the proper number of clusters needed to represent the structure in the input data, three runs of the algorithm were performed, each starting with a different initial guess for the number of clusters. Precisely, three networks were trained, having initial structures of 20, 15 and 10 units, respectively. Table 1 summarizes the results of these three trials (labeled A, B and C, respectively), in terms of the final number of prototypes found by the competitive learning algorithm and the validity measures computed on the resulting clusters.

It can be seen that in all the trials 8 prototypes were discovered through network learning. This demonstrates the robustness of the algorithm, which is able to find a suitable number of clusters regardless of the initial structure of the network. Also, the extracted clusters are quite good both in terms of compactness (intra-cluster distance) and separation (inter-cluster distance). In particular, trial B provides the best partitioning of the input space, with the best compromise between a small average intra-cluster distance and a large average inter-cluster distance. The extracted clusters were then translated into a linguistic form to provide learner profiles. As an example, Figure 4 gives a synthetic description of one of the 8 profiles derived from the clusters generated in the best trial (B).

Table 1: Results of clustering performed via the competitive learning algorithm on the first data set.

Trial | No. of initial prototypes | No. of extracted prototypes | Average intra-cluster distance | Average inter-cluster distance
A     | 20                        | 8                           | 0.6521                         | 0.8252
B     | 15                        | 8                           | 0.2948                         | 0.8619
C     | 10                        | 8                           | 0.6875                         | 0.9090

Learner profile no. 1: Female, not a confident or quiet person, but shy and pessimistic. She spends more than 15 hours a week watching television, but less time working with a computer. She is expert in Windows and word processors, but knows e-mail, the Web and MS Office only at a beginner level. She would like to get some assistance from an educational system, by calling for assistance and getting it automatically. She would like the system to adapt to her knowledge by changing the contents of a document.

Figure 4: One of the profiles extracted from learner responses to the questionnaire in Figure 3.

4.2 Second questionnaire

The second experiment concerned the extraction of learner categories to be used in the development of an adaptive educational system concerning mathematics. The dataset contains 100 response vectors obtained by submitting a questionnaire of 20 questions to a group of 100 students of the Computer Science Diploma Course at our University. Unlike the first questionnaire, the questions in this second questionnaire all aim to test the students' attitude to mathematics, by checking their ability to solve equations and inequalities. Some of the questions with the corresponding possible responses are given in Figure 5. This questionnaire is quite different from the first one, since all the questions check the student's knowledge about the subject being taught and there are no psychological questions. It is an example of a domain-dependent test.

As for the first questionnaire, all the responses were translated into integers denoting the index of the response given by each student. Due to the domain-dependent nature of the questions in this questionnaire, a learner may fail to answer some of them. To represent the "no answer" case in the numeric response vectors, we used the integer value −1.

As before, the competitive learning algorithm was run three times, starting with a network structure of 20, 15 and 10 units (initial number of clusters), respectively. Table 2 shows the average validity measures computed on the clusters generated by the competitive learning algorithm. They indicate that the intra-cluster distances for most clusters are quite small while the inter-cluster distances are quite large, meaning that the algorithm does a good job of finding compact and separated clusters that produce meaningful learner profiles. Also, as before, the robustness of the algorithm is evident: it always finds the same number of prototypes, irrespective of the initial guessed number of prototypes given to start the learning. In Figure 6 we provide a brief description of one of the 7 prototypes derived from the clusters generated by the algorithm in the best trial.

……………………………
11) The inequality |f(x)| < k is solved by requiring that:
a) {f(x) > 0; f(x) < k}
b) {f(x) < 0; f(x) > k} and {f(x) > 0; f(x) < −k}
c) {f(x) ≥ 0; f(x) < k} and {f(x) < 0; −f(x) < k}
d) −f(x) > k
………………………
14) The solutions of the inequality 1/(x² − 1) > 0 are:
a) every value of x
b) x > 0
c) x < −1 or x > 1
d) x ≠ −1 and x ≠ 1
………………………..
17) The equation that expresses the text "given two numbers such that the triple of the sum of the first number and the square of the second number is equal to the squared sum of the first number and the double of the second number" is:
a) 3x + y² = x² + 2y
b) 3(x + y²) = (x + 2y)²
c) 3(x + y²) = x² + (2y)²
d) none of the previous answers is correct
…………………….

Figure 5: Some questions of the second questionnaire.
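The −1 convention for unanswered questions described above can be folded into the same letter-to-index encoding. A hypothetical sketch:

```python
def encode_with_missing(answers):
    """Map letter answers ('a', 'b', ...) to integer indices; a skipped
    question (None) is encoded as -1, as described in the text."""
    return [-1 if a is None else ord(a) - ord('a') for a in answers]

# A learner who answered questions 1 and 3 but skipped question 2:
print(encode_with_missing(['a', None, 'c']))  # -> [0, -1, 2]
```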

Table 2: Results of clustering performed via the competitive learning algorithm on the second data set.

Trial | No. of initial prototypes | No. of extracted prototypes | Average intra-cluster distance | Average inter-cluster distance
A     | 20                        | 7                           | 0.2841                         | 0.7193
B     | 15                        | 7                           | 0.3646                         | 0.6850
C     | 10                        | 7                           | 0.3727                         | 0.6725

Learner profile no. 1: This type of learner, who gives a correct answer to all questions except questions no. 11 and 17, shows knowledge of the properties of elementary functions and of the solution of simple inequalities. Conversely, he is not able to translate natural language into mathematical formalism, nor to understand the meaning of some more complex concepts, such as absolute value, square root, etc.

Figure 6: One of the profiles extracted from learners' responses to the questionnaire in Figure 5.

5 Conclusions

In this paper a competitive neural network and its learning algorithm have been proposed for the extraction of learner categories from unlabeled data represented by responses to questionnaires. The categories are derived by clustering the data into groups that embrace similar responses. One main feature of the proposed categorization technique is the ability to extract the number of learner categories automatically. The preliminary experimental results presented on two different questionnaires show that the network is able to extract a proper number of clusters from the data, which results in well-separated and representative learner profiles.

Admittedly, the considered questionnaires are very simple, since only some of the important notions about the learner can be derived from the responses to their questions. As a consequence, since the quality of the resulting profiles depends heavily on the quality of the data, the derived profiles are not sufficiently complete to be used in an AH educational system. To tailor the instructional program to real needs, students should be modeled right from the start, by considering both domain-dependent notions already present (for a classification as beginner, intermediate or advanced) and domain-independent individual characteristics. Future work will be devoted to testing the proposed approach on more comprehensive questionnaires in order to obtain more complete and meaningful learner profiles.

The proposed approach to mining learner categories from data is the first step towards the development of an adaptive educational hypermedia environment that conforms to the attributes of individual learners by providing the best presentation of the learning material and suggesting the most relevant links to follow. Such an environment is intended to use the trained competitive network to classify new learners into one of the categories extracted during network learning. In this way, on entry to the system, the learner is assigned to a category depending on his response vector, and the system provides the presentation of the learning material best suited to that student category.
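Classifying a new learner at entry time amounts to a nearest-prototype rule over the extracted cluster centers. A minimal sketch of this step (not the deployed system):

```python
import numpy as np

def assign_category(x, centers):
    """Return the index of the prototype (cluster center) closest to a
    new learner's response vector, using plain Euclidean distance."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    d = np.linalg.norm(centers - x, axis=1)   # distance to each prototype
    return int(np.argmin(d))

# e.g. with two prototypes, a response vector near the second one:
print(assign_category([9, 8], [[0, 0], [10, 10]]))  # -> 1
```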

6 References

[1] S. Milne, E. Shiu and J. Cook, "Development of a model of user attributes and its implementation within an adaptive tutoring system", User Modeling and User-Adapted Interaction, 6:303-335, 1996.
[2] P. Brusilovsky, "Methods and techniques of adaptive hypermedia", User Modeling and User-Adapted Interaction, 6:87-129, 1996.
[3] J. Fink and A. Kobsa, "A review and analysis of commercial user modeling servers for personalization on the World Wide Web", User Modeling and User-Adapted Interaction, 10:209-249, 2000.
[4] P. Brusilovsky, A. Kobsa and J. Vassileva, eds., Adaptive Hypertext and Hypermedia, Kluwer Academic Publishers, Dordrecht, 1998.
[5] M. Specht, "Empirical evaluation of adaptive annotation in hypermedia", Proc. of ED-MEDIA 98, Freiburg, Germany, pp. 1327-1332, 1998.
[6] T. Roselli, "Artificial Intelligence can improve Hypermedia instructional technologies for learning", ACM Computing Surveys, 27(4):624-626.
[7] L. Xu, A. Krzyzak and E. Oja, "Rival Penalized Competitive Learning for clustering analysis, RBF net, and curve detection", IEEE Trans. on Neural Networks, 4(4):636-649, 1993.
[8] S.C. Ahalt, A.K. Krishnamurthy, P. Chen and D.E. Melton, "Competitive learning algorithms for vector quantization", Neural Networks, 3:277-290, 1990.