OpenAnswer, a framework to support teacher's management of open answers through peer assessment

Andrea Sterbini
Dept. of Computer Science, Sapienza University of Roma, Roma, Italy
[email protected]

Marco Temperini
Dept. of Computer, Control, and Management Engineering, Sapienza University of Roma, Roma, Italy
[email protected]
Abstract— Open-ended questions are an important means to support the analysis and assessment of students; they can be of extraordinary effectiveness for the assessment of the higher cognitive levels of Bloom's Taxonomy. On the other hand, assessing open answers (textual, freely shaped answers to a question) is a hard task. In this paper we describe an approach to open answer evaluation based on the use of peer assessment: in a social-collaborative e-learning setting implemented by the OpenAnswer web system, the students answer questions and rate others' (and possibly their own) answers, while the teacher marks a subset of the answers so as to allow the system to infer the remaining marks. The aim of our system is to ease the teacher's marking burden and allow for a more extensive use of open-ended questionnaires in her/his teaching activity.

Keywords—assessment, peer-assessment, social collaborative e-learning
I. INTRODUCTION
Open-ended questionnaires (where questions can be answered through free text writing) can play a crucial role in the evaluation and analysis of a learner's knowledge; from a pedagogical point of view, they can be of extraordinary effectiveness for the assessment of skills [1,2]. On the other hand, assessing open answers is a hard task. A good deal of research activity has been performed in this regard, applying a variety of Computer Science techniques (see Sec. II). In this paper we describe an approach to open answer evaluation based on the use of peer assessment in a social-collaborative e-learning setting; we implemented it in the OpenAnswer web system, where a student answers questions and rates others' (and possibly her/his own) answers, while the teacher marks a subset of the answers so as to allow the system to infer the remaining marks. The aim of our system is to ease the teacher's marking burden and allow for a more extensive use of open-ended questionnaires in her/his teaching activity. OpenAnswer allows to: 1) deliver open-ended questions to groups of students; 2) collect the open answers; 3) collect each student's peer assessment of some of the answers given by others (and possibly by her/himself); 4) manage a student model representing the learner's competence and her/his ability to
assess/self-assess; 5) support the teacher's analysis and grading of the answers, by iteratively suggesting the next answer to assess, so as to reach a configuration in which all the remaining answers can be graded based on the peer evaluations and the implicit endorsement derived from the teacher's marks. The student model is managed through a simple Bayesian-network based approach, grounded on the peer assessments given by the student and their correspondence to the final grading of the answers related to those assessments. The next-question-to-grade mentioned at point 5 above is selected by taking into consideration the grades actually given by the teacher, the student models computed on the basis of such grades, and the peer assessments available on the remaining answers. This process eventually brings the system to assign grades to the part of the answers still not graded by the teacher. This is done by exploiting the possible trust in the students' peer assessments, as pointed out by the underlying Bayesian network model. So a complete grading of the starting set of answers can be reached, saving the teacher a part of the direct grading work and making the use of open-ended questions more feasible in a didactic setting. We think that a good side effect of the use of OpenAnswer is in the students being exposed to assessment and self-assessment, which are high cognitive level activities in Bloom's Taxonomy [2]. In previous work we have dealt with the definition of the formal models deemed to support the marking process, with a Constraint Logic Programming based approach and with a Bayesian Networks based model [3,4,5]. We had not yet implemented a system able to let teachers use either model, nor had we performed real experimentations (limiting ourselves to simulated ones). Here we show the OpenAnswer web-based system, the implementation of the Bayesian network propagation and analysis, some basics of the underlying formal system, and an experimentation involving real students' data.

II. RELATED WORK
Data mining with natural language processing and concept mapping has been used to summarize questionnaires, extract customers' opinions and define product reputations [6,7]. Mainly, in these cases, marketing and commercial applications were pursued. An overview of educational applications of Data Mining technology is in [8].
In [9] concept mapping is used to define and apply so-called "coding schemes" for answers: answers are labeled by such schemes to be analyzed and classified. Classification is supported by semantic annotations labeling parts of the answers; labels are applied by human "coders" following the coding schemes. In [10] an implementation of a (partially semi-)automatic assessment of open answers is shown, based on the use of ontologies and semantic web technologies. The ontology formally defines the domain of knowledge related to the questions. Then the ontological labels that have to characterize the answers are expressed as "semantic annotations". When the answers too have been labeled in terms of the ontology, they can be analyzed by evaluating the similarity of their ontological labels against the question's ones. Such evaluation is followed by grading. In [11] open answers are represented by sets of algebraic transformations; they are analyzed to determine the implicit conceptions (and misconceptions) of the students, and to treat them in the manner of an intelligent tutoring system. An evaluation of this system has proven it worthy when applied to answers constituted by purely algebraic expressions, without intermixed text in natural language. The work exploits admirably the fact that the derivation and evaluation of expressions of a formal language is highly automatable [12], and a possible expansion of the work points at integrating the comments intermixed in the answer into the grading process.

III. THE OPENANSWER SYSTEM
OpenAnswer is a module integrated in a pre-existing PHP-based Learning Management System (named sLMS). In it, each course can be upgraded so as to use OpenAnswer services. A teacher in OpenAnswer can
a) define a questionnaire associated to a course;
b) define a questionnaire session, that is an occasion for administering the questionnaire to a group of students;
c) open/close the "answer time" for a session;
d) open/close the "peer-assessment time" for a session;
e) examine a session that has been peer-evaluated: here the teacher can assess and mark (some of) the given answers;
f) publish the session grading for the students.
In phase a) the teacher specifies some information on the questionnaire, namely 1) the number of open-ended questions, 2) how many answers will be peer-assessed by each peer, 3) the cardinality of each group of students (they have to be partitioned in groups of 10-25 students, otherwise the computational burden of the system would make its response time unfeasible), 4) whether the student could be presented with her/his own answer to be assessed (and with what probability: always, never, 33% or 66%), 5) whether the answers should be anonymous or bear the author's username. Moreover, the question(s) used in the questionnaire are selected from a repository (presently associated to the course, with no tag related to the topic). When a question is defined and added to the repository, a set of criteria is associated to it. Criteria are defined in a special repository. The association between a question and some criteria is important as it will help providing some directions to students during their peer-reviewing activity. When a question is selected for the questionnaire, the associated criteria are shown and some or all of them can be checked for the purposes of the specific questionnaire. In phase b) the teacher defines a session, basically by specifying what questionnaire will be used and how the groups of students are composed. In phase c) the questionnaire is submitted to the students, and the answers are entered. Once the answer period is closed, the peer-assessment can take place. Consider that these time-spans can last as long as the teacher wishes (from minutes to days); the only constraint is that the answer phase be closed before the peer-assessment starts. In phase e) the teacher starts the analysis of a session. In the present implementation of the system the main goal is to support experimentation, so two specific services are provided (which will probably not be part of the final release): 1) it is possible to "clone" a session that has already been peer-evaluated, and 2) it is possible to select the method (strategy) that will be used by the system during the marking phase for the session (cfr. next section for the different strategies that can be used to support the teacher and suggest the "best next answer to mark"). Having clones of the same (peer-evaluated) session allows applying and comparing different strategies. One of the selectable strategies is the "manual" one: the teacher will assess all the answers (and this also is a relevant option in the present experimental phase). In this phase the teacher can select a session to grade (and select the strategy to apply) and start marking. The teacher is presented with the list of answers for the session/group (Fig. 1).
Fig. 1. Guided correction: for each answer in the list, the probability distribution of being Wrong/Fair/Right (cfr. next section) is shown. E.g., the answer in the figure is more likely to be wrong or fair than right, according to what the system can infer so far during the grading phase.
The first question is the one suggested for grading (at any stage of grading, it is supposed to be the one that best helps the system converge towards the possibility of grading automatically the answers not yet graded by the teacher, according to the strategy in use). In OpenAnswer a Student Model (SM) is managed: the competence of the student on the subject matter at hand, and her/his ability to assess others' and own answers, are stored as the probability distributions of her/his K and J variables over the values Good/Fair/Bad (cfr. next section). Each session grading starts using the present values in the students' SMs; these values evolve, in a protected computing environment, according to the propagation mechanism occurring during the teacher's work on the session. So, after a session has been
graded, temporary new values for the students' SMs are stored: they are not directly used to update the global values of the students' SMs; rather, they can be used to do that if the teacher activates the updating process. This is because, in the present implementation of OpenAnswer, we can have clones of the same session, to be graded through different strategies, and we do not want the modifications of the SMs due to a grading done with a given strategy to affect the results of the same-session grading performed through another strategy; so, if several cloned sessions have been graded, when the temporary SMs produced by one of them are used to update the global students' SMs, this possibility is lost by all the other graded sessions.

IV. THE BAYESIAN NETWORK MODULE
In OpenAnswer a Yap Prolog ([13,14]) module supports the management of the Bayesian network. The module
• reads the run parameters from the DB;
• builds the Bayesian network corresponding to the peers' assessment network;
• introduces as evidence the currently available teacher's corrections;
• propagates the evidence through the Bayesian network to compute the probability distributions of the answers' correctness;
• selects the next answer to be corrected depending on the given strategy and the computed probabilities;
• writes back to the DB both the chosen student and the computed probabilities.

A. A simple Bayesian network model
The Bayesian network modeling the peer-assessment (defined through the CLP(BN) library available in Yap Prolog) is made of 3 finite-domain variables for each student:
K: Knowledge (good, fair, bad)
J: Judgment (good, fair, bad)
C: Correctness (right, fair, wrong)
The Knowledge variable is independent and based on the following probability distribution:

P(K):   good 0.5    fair 0.2    bad 0.01

Notice that the above and the following probability distribution tables are "synthetic", that is, they have not (yet) been extracted from actual experimental data. This is presently acceptable, we think, because our aim, for the moment, is to test whether the methodology works sufficiently well with the Bayesian approach. An important property of Bayesian networks is that such probability distributions can be learned from experimental data. Thus, later, we will obtain these distributions from the students' interaction with the system.
The 3 variables of our model are related through two Conditional Probability Tables (CPTs), which (for the moment, according to what is specified above) are as follows. The first table relates the student's knowledge about the questionnaire topic and the correctness of her/his answer:

P(C|K)             C
             right   fair   wrong
K   good      0.5    0.4    0.1
    fair      0.2    0.5    0.3
    bad       0.01   0.3    0.69

The rationale for this assumption is that the answer is open, therefore the student could not guess its content (for the moment we are ignoring plagiarism issues). Similarly, we model the Judgment variable as probabilistically dependent on Knowledge, the rationale being that judging the answers of peers is a higher cognitive activity than both knowing and using the knowledge needed to answer. In this we take inspiration from Bloom's taxonomy of cognitive abilities.

P(J|K)             J
             good    fair   bad
K   good      0.5    0.4    0.1
    fair      0.2    0.5    0.3
    bad       0.01   0.3    0.69
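To make the interplay of these tables concrete, the following is a minimal Python sketch (an illustration only, not the Yap Prolog/CLP(BN) module used by OpenAnswer) that takes the synthetic tables above and computes, by Bayes' rule, the posterior over a student's Knowledge once the teacher has marked her/his answer:

```python
# Minimal sketch: update one student's Knowledge from an observed Correctness,
# using the synthetic tables above (illustrative Python, not the CLP(BN) code).
K_VALUES = ("good", "fair", "bad")

P_K = {"good": 0.5, "fair": 0.2, "bad": 0.01}            # prior on Knowledge
P_C_GIVEN_K = {                                          # rows: K; columns: right/fair/wrong
    "good": {"right": 0.5,  "fair": 0.4, "wrong": 0.1},
    "fair": {"right": 0.2,  "fair": 0.5, "wrong": 0.3},
    "bad":  {"right": 0.01, "fair": 0.3, "wrong": 0.69},
}

def posterior_k_given_c(observed_c):
    """P(K | C = observed_c), obtained by Bayes' rule over the tables above."""
    unnormalized = {k: P_K[k] * P_C_GIVEN_K[k][observed_c] for k in K_VALUES}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}

# e.g. after the teacher marks the answer as "right":
print(posterior_k_given_c("right"))   # Knowledge shifts strongly towards "good"
```

In OpenAnswer the same kind of update is performed by CLP(BN) over the whole network of students, where the evidence also flows through the Choice variables described below.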
The variables modeling a student are connected to those of the other students whenever a peer-assessment takes place. The student is presented with some (e.g. 3) peer answers and should choose the best (most correct) one. We model the choice by defining 3 cases with respect to J: J = good: the student can discriminate among the peers' answers, i.e. (s)he can tell apart the different values of Correctness for the answers presented:
right != fair != wrong
J = fair: the student is able only to tell wrong answers from acceptable (fair or right) ones:
right = fair != wrong
J = bad: the student is not able to distinguish and chooses at random:
right = fair = wrong
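The three cases can be restated compactly (this formulation is ours, introduced here only to make the construction of the CPT below explicit): let φ_J denote how a student with judgment J perceives a Correctness value (the identity for J = good; right and fair collapsed into one class for J = fair; all values collapsed for J = bad). The chosen answer is then uniform over the answers whose perceived correctness is maximal:

P(Choice = q_i | J, C_1, ..., C_n) = 1[ φ_J(C_i) = max_j φ_J(C_j) ] / |{ j : φ_J(C_j) = max_k φ_J(C_k) }|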
The above rules are implemented in the Bayesian network through a CPT defining the probability of choosing answer q1, q2 or q3, depending on the student's Judgment (J) and on the correctness (C1, C2, C3) of the proposed answers: P(Choice | J, C1, C2, C3). The corresponding CPT can be easily obtained by generating all possible combinations of J, C1, C2, C3 and by assigning a uniform probability distribution only to the q1, q2, q3 answers that satisfy the above choice cases. Some examples follow:

J      C1      C2      C3      P(q1)   P(q2)   P(q3)
good   right   wrong   right   0.5     0       0.5
fair   right   fair    wrong   0.5     0.5     0
bad    right   wrong   fair    0.33    0.33    0.33
good   wrong   wrong   wrong   0.33    0.33    0.33
Similar CPTs can be computed for peer-assessments with fewer or more than 3 answers to assess.
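A small sketch of how such a table can be generated from the three cases is given below (illustrative Python under the stated rules, not the actual CLP(BN) code; the function names are ours):

```python
# Sketch: generate P(Choice | J, C1..Cn) by the rule described above — uniform
# probability over the answers that look "best" through the student's Judgment,
# zero elsewhere. Illustrative only; function names are ours.
from itertools import product

def perceived(j, c):
    """How a student with Judgment j perceives a Correctness value c."""
    rank = {"right": 2, "fair": 1, "wrong": 0}
    if j == "good":                       # right != fair != wrong
        return rank[c]
    if j == "fair":                       # right = fair != wrong
        return 0 if c == "wrong" else 1
    return 0                              # j == "bad": right = fair = wrong

def choice_distribution(j, cs):
    """One CPT row: P(Choice = q_i | J = j, correctness values cs)."""
    scores = [perceived(j, c) for c in cs]
    winners = [i for i, s in enumerate(scores) if s == max(scores)]
    return [round(1.0 / len(winners), 2) if i in winners else 0.0
            for i in range(len(cs))]

def build_choice_cpt(n=3):
    """All rows of the CPT, for n proposed answers."""
    return {(j, cs): choice_distribution(j, list(cs))
            for j in ("good", "fair", "bad")
            for cs in product(("right", "fair", "wrong"), repeat=n)}

# Reproduces the example rows above, e.g.:
# choice_distribution("good", ["right", "wrong", "right"])  ->  [0.5, 0.0, 0.5]
# choice_distribution("bad",  ["right", "wrong", "fair"])   ->  [0.33, 0.33, 0.33]
```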
The resulting Bayesian network interconnects all the student models through the Choice variables, and propagates the evidence coming from the teacher's corrections both:
• towards the corrected student's Knowledge (and thus to her/his Judgment),
• and, through the Choices in which the corrected answer was proposed, to the Judgment of the peers (and thus to their Knowledge and Correctness).

B. Selection strategies
Four selection strategies are available to suggest the next answer to be corrected (a sketch of the corresponding scoring criteria is given after the list):
• min_diff: the answer with the minimum difference between its maximum and minimum correctness probabilities is chosen. Rationale: the flatter the probability distribution of C, the more ambiguous the answer's correctness.
• max_entropy: the answer whose correctness probability distribution has maximum entropy is chosen. Rationale: its correctness is the most ambiguous one.
• max_wrong: the answer whose probability of being wrong is highest is chosen. Rationale: students would not deem it acceptable to fail "just because the Bayesian model said so", without the teacher actually checking their answer. Thus the teacher is forced to correct/check at least all the answers deduced "wrong". By grading these answers first we hope to collect enough information to deduce the rest without further work.
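The following sketch restates the criteria above as scoring functions (illustrative Python, not the Prolog module; names are ours):

```python
# Sketch of the scoring criteria used to pick the next answer to grade
# (illustrative Python; an answer's correctness distribution is assumed to be
# a dict such as {"right": 0.2, "fair": 0.3, "wrong": 0.5}).
import math

def min_diff(dist):
    """Difference between the largest and smallest probability (to be minimized)."""
    return max(dist.values()) - min(dist.values())

def entropy(dist):
    """Entropy of the correctness distribution (to be maximized)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def suggest_next(answers, strategy):
    """answers: {answer_id: correctness distribution} of the not-yet-graded answers."""
    if strategy == "min_diff":
        return min(answers, key=lambda a: min_diff(answers[a]))
    if strategy == "max_entropy":
        return max(answers, key=lambda a: entropy(answers[a]))
    if strategy == "max_wrong":
        return max(answers, key=lambda a: answers[a]["wrong"])
    raise ValueError("unknown strategy: " + strategy)
```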
V. INITIAL EXPERIMENTATION
Some initial experimentation has been done by collecting data at the end of the "Web programming languages" course, part of the Computer Engineering Bachelor curriculum of Sapienza University of Roma, Italy. The first data-set (named 'marco' in the rest of the paper) is made of:
• 1 proposed question,
• 10 students' answers,
• 1-3 peer assessments made by each student,
• marks given in the 0 – 10 range,
• all answers graded by the teacher.
A second data-set (named 'carlo' in the rest of the paper) has been made available by our colleague Carlo Giovannella of Tor Vergata University, Rome, Italy. The data has been collected in a Physics course, part of the Computer Engineering Master curriculum, and has been used for a different study on peer-assessment:
• 3 different topics were used,
• 6 related questions were proposed for each topic (of which the first 4 were mandatory and the last 2 optional),
• 8-40 students answered,
• 1-3 peer assessments were made by each student,
• marks were given in the 0 – 2.5 range,
• the teacher graded all answers.
Both data-sets actually belong to other peer-assessment experiments and collect the numeric marks awarded by the students to their peers' answers; thus some adaptation was required 1) to transform the teacher's marks to the Correctness domain used in OpenAnswer, 2) to compute the students' choices and 3) to map back the probability distributions to the Correctness domain. The student's choice is obtained by selecting the answer to which (s)he gave the highest mark. The teacher's votes are discretized by using the following mark ranges:

data-set    marco         carlo
wrong       0.0 – 5.5     0.0 – 1.0
fair        5.5 – 7.0     1.0 – 2.0
right       7.0 – 10.0    2.0 – 2.5
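The adaptation can be sketched as follows (illustrative Python; the range values are those of the table above, the function names are ours, and the handling of boundary marks is our assumption, since the ranges share their endpoints):

```python
# Sketch of the data-set adaptation described above (illustrative only):
# map a numeric teacher mark to the Correctness domain, and derive a student's
# "choice" as the peer answer to which (s)he gave the highest mark.
RANGES = {  # (label, lower bound, upper bound) per data-set, from the table above
    "marco": [("wrong", 0.0, 5.5), ("fair", 5.5, 7.0), ("right", 7.0, 10.0)],
    "carlo": [("wrong", 0.0, 1.0), ("fair", 1.0, 2.0), ("right", 2.0, 2.5)],
}

def discretize(mark, dataset):
    """Map a numeric teacher mark to wrong/fair/right using the data-set ranges.
    Boundary marks (e.g. exactly 5.5) are assigned to the lower label here;
    the table above leaves this choice implicit."""
    for label, low, high in RANGES[dataset]:
        if mark <= high:
            return label
    return "right"                      # marks at the very top of the scale

def choice_from_peer_marks(peer_marks):
    """peer_marks: {answer_id: numeric mark given by the assessing student}."""
    return max(peer_marks, key=peer_marks.get)
```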
The above ranges are also used to map probability distributions to Correctness values, which is a required step to evaluate the percentage of False Positives and False Negatives in our simulations.

VI. METHODOLOGY
The collected data-sets allow us to compare the different correction strategies and termination criteria with respect to:
• Length: percentage of answers graded.
• OK: percentage of the remaining answers deduced with a mark equivalent to the teacher's mark. Positive marks ("fair" and "right") are considered equivalent.
• False Positives: percentage of the remaining answers deduced with a mark better than the teacher's mark (e.g. "fair" or "right" instead of "wrong").
• False Negatives: percentage of the remaining answers deduced with a mark worse than the teacher's mark (e.g. "wrong" instead of "fair" or "right").

A. Mapping probability distributions to marks
To compute False Positives and False Negatives we first need the grades we would assign to the remaining answers, given the computed probability distributions. To map the probabilities to grades we first map each correctness label to its corresponding numeric range, then we compute the linear weighted combination:

M = P(C=wrong) * (MinWrong + MaxWrong)/2 + P(C=fair) * (MinFair + MaxFair)/2 + P(C=right) * (MinRight + MaxRight)/2
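A sketch of this mapping (illustrative Python using the 'marco' ranges of the table in Sec. V; the function names are ours):

```python
# Sketch of the probability-to-mark mapping described above (illustrative only):
# weight each label's range midpoint by its probability, then map M back.
MARCO_RANGES = [("wrong", 0.0, 5.5), ("fair", 5.5, 7.0), ("right", 7.0, 10.0)]

def expected_mark(dist, ranges=MARCO_RANGES):
    """The linear weighted combination M of the range midpoints."""
    midpoints = {label: (low + high) / 2.0 for label, low, high in ranges}
    return sum(p * midpoints[label] for label, p in dist.items())

def deduced_correctness(dist, ranges=MARCO_RANGES):
    """Map the numeric mark M back to the Correctness domain."""
    mark = expected_mark(dist, ranges)
    for label, low, high in ranges:
        if mark <= high:
            return label
    return ranges[-1][0]

# e.g. {"wrong": 0.4, "fair": 0.1, "right": 0.5} gives M = 5.975, i.e. "fair"
# (the worked example that follows in the text).
```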
and then we map the numeric mark obtained back to the corresponding discrete value. E.g., if we use the 'marco' data-set ranges and
• P(C=wrong) = 0.4,
• P(C=fair) = 0.1, and
• P(C=right) = 0.5,
M comes out as M = 1.1 + 0.625 + 4.25 = 5.975, i.e. C=fair.

B. Simulating the correction to evaluate the strategies
The correction is simulated by repeating the following steps:
• read the peer-assessments and the current marks and suggest the next best answer;
• introduce as evidence the teacher's mark for the suggested answer;
• test whether the correction can be stopped, else repeat.

C. Termination strategies
As we aim to obtain a good but short correction, we need a stopping criterion describing when the deduced marks for the remaining unmarked answers are good enough to stop.
Two termination strategies have been defined (by first mapping the probability distributions to marks in the Correctness domain, as explained above):
• no_wrong: there are no more remaining answers deduced "wrong". Rationale: see the max_wrong selection strategy.
• no_flip(N): the current marks are the same as in the previous N simulation steps (default N=1). Rationale: the deduced grades are stable.

TABLE I. SIMULATIONS BASED ON THE FIRST DATASET

TABLE II. SIMULATIONS BASED ON THE SECOND DATASET

VII. INITIAL RESULTS
Table I shows the results of the simulations done on the 'marco' data-set, listing for each <selection strategy, termination criterion> pair the percentage of answers corrected (Length), False Positives (FP), False Negatives (FN) and correct grades (OK, i.e. the percentage of marks inferred by the system in accord with the teacher's). From the table we notice that:
• The no_wrong termination criterion produces the longest correction (90%) on this data-set, obtaining 0% False Negatives (by design).
• No False Positives are present in this data-set.
• The highest percentage of OK is obtained with the max_entropy, max_wrong and min_diff selection strategies.
• The lowest percentage of False Negatives is obtained with the no_flip(3) criterion, yet the percentage is high.
Based on that, we could come to the conclusion that, even if this data-set is too limited to give us strong evidence, OpenAnswer is able to help the teacher by reducing the correction length by about 30% (the OK value). Table II, instead, shows the simulations conducted on the bigger 'carlo' data-set.
From Table II we may observe that:
• The best correction quality (OK) ranges from 30% to 40% for a rather good number of combinations.
• The no_wrong termination criterion produces very short corrections (Avg. Length = 18%), but only when paired with the max_wrong selection strategy. The quality of the correction in this case is around Avg. OK = 30%. Yet, the Avg. FP is 51%.
• The lowest percentage of Avg. FP + Avg. FN = 7% is (obviously) obtained at the expense of longer corrections (Avg. Length = 89%), when the no_wrong termination criterion is used.
• The number of False Negatives (negative marks deduced for good answers) is very low in all cases: at most Avg. FN = 8%.
• Even with the very short corrections (Avg. Length = 6-10%, corresponding to the no_flip(1) termination criterion), where just 1 or 2 corrections are made, OpenAnswer is able to correctly deduce a good percentage (Avg. OK = 40%) of the grades.
• The number of False Positives of the no_flip(1, 2, 3) corrections decreases when the correction size increases and more information is collected.
The conclusions regarding the second table are manifold. First, the <max_wrong, no_wrong> pair seems to be the best choice, as it gives short corrections (Avg. Length = 18%) with no FN (by design) and a good number of right deductions (Avg. OK = 31%), but with a slightly higher number of good marks given to bad answers (Avg. FP = 51%) than the other combinations. A second conclusion is that, even with the synthetic probability distributions (cfr. beginning of Sec. IV.A), OpenAnswer is able to help the teacher by reducing the correction load by at least 30%. A final conclusion stemming from these simulations is that surprisingly short corrections are able to produce a good picture of the class' knowledge.

VIII. CONCLUSIONS AND FUTURE WORK
Assessment of skills and knowledge is a crucial factor in systems supporting e-learning in general, and personalized e-learning [15,16,17] and social-collaborative e-learning [18,19,20] in particular. The use of open-ended questions and questionnaires is deemed important in the area, while it is one of the most work-intensive assessment methods for the teacher. We have presented the OpenAnswer web-based system, built to help the teacher mark open-answer questions through a simple Bayesian network model of the students' peer-assessment choices. The initial experimentation and simulation show that, even with a very simple model of grades and peer-assessment, the teacher can be helped in the correction of a set of open answers, by iteratively suggesting the most informative answers to correct first and spreading the information acquired through the marking onto the network made by the student models, the peer evaluations of the answers and the teacher's grading. Our future lines of research can be summarized as follows.

A. Using OpenAnswer as a group diagnostic tool
Teachers need to quickly assess how well the students know a topic, either to plan further explanations or to move on to other topics. As we have seen that even short corrections, in OpenAnswer, can give a sufficient understanding of the distribution of skills among the classmates, we plan to use it as a group diagnostic tool.

B. Learning the parameters of the model
The Bayesian network modeling the peer-assessment depends on:
• the CPTs describing the probabilistic dependencies of C and J with respect to K;
• the probability distribution of K.
Therefore, the next required step in this research is to collect enough data to learn the above CPTs and probability
distribution (for a total of 6+6+2 = 14 parameters) through machine learning of the Bayesian network parameters.

C. Using a more detailed Correctness domain
The CPT modeling the student's choice is easily computable for any domain size of the Correctness variable. Thus it is not difficult to extend OpenAnswer to use standard grades (such as the ones ranging through A-F of an Anglo-Saxon grading model, or the ones ranging between 5 and 10, more familiar to the authors). Yet, the domain size does impact the size of the K-C CPTs, which will in turn require us to collect more data to learn the CPTs.

D. Modeling more informative peer-assessment choices
Here we mean that the amount of information coming from the peers' activity can be increased (presently we just ask them to select the best out of several answers). Independently of the possible modifications of the number of answers proposed to the peer for evaluation, the planned improvements are:
• asking the peer to point out both the best and the worst answers in the sample;
• asking the peer to sort the answers in an order related to their correctness (such as from the worst to the best);
• asking the peer to grade individually each one of the proposed answers (according to a stated grading mechanism and to the defined evaluation criteria).

E. Student model and personalization
The Knowledge variable is a description of how much the student knows about the topic. Thus it can be used as a (yet very simple) student model. In our simulations we have used a "uniform" student model, without a specific representation for each involved student. As the teacher's marks entered as evidence propagate through the Bayesian network and update the Knowledge probability distributions of each student, we can update the initial student model by storing the final Knowledge distribution for each student. Even better, if enough data is available we can compute, by machine learning, the correct Knowledge distributions for each student and topic, which would allow us to finely personalize the grading process, by taking into account the specific strengths of an individual.

F. Modeling plagiarism and anonymity
For the moment, we are ignoring two common student behaviors: plagiarism of answers (from peers or from external sources) and lack of anonymity during peer-assessment. We intend to model these behaviors by:
• enhancing the Bayesian student model by introducing a "Cheater" and a "PeerKnown" variable, addressing the two aspects;
• enhancing the peer-assessment CPTs to model the different cases arising from blind/disclosed peer-assessment.
In this, the ability of the system to learn the CPTs after the experiments will be useful.
ACKNOWLEDGMENT
We would like to thank our colleague Carlo Giovannella, who made available to us the main data-set used for our simulations.

REFERENCES
[1] Palmer, Kevin, and Pete Richardson. 2003. "On-line assessment and free-response input - a pedagogic and technical model for squaring the circle." Proceedings of the 7th Computer Assisted Assessment Conference.
[2] Bloom, Benjamin S., Bertram B. Mesia, and David R. Krathwohl. 1964. "Taxonomy of Educational Objectives (two vols: The Affective Domain & The Cognitive Domain)." New York: David McKay.
[3] Sterbini, A. and M. Temperini. 2012. "Supporting Assessment of Open Answers in a Didactic Setting." In Advanced Learning Technologies (ICALT), 2012 IEEE 12th International Conference on, pp. 678-679. IEEE.
[4] Sterbini, Andrea, and Marco Temperini. 2012. "Correcting open-answer questionnaires through a Bayesian-network model of peer-based assessment." Information Technology Based Higher Education and Training (ITHET), 2012 International Conference on. IEEE.
[5] Sterbini, Andrea, and Marco Temperini. 2012. "Dealing with open-answer questions in a peer-assessment environment." In Advances in Web-Based Learning - ICWL 2012, LNCS 7558, pp. 240-248. Springer Berlin Heidelberg.
[6] Yamanishi, K. and H. Li. 2002. "Mining Open Answers in Questionnaire Data." IEEE Intelligent Systems, Sept-Oct 2002, pp. 58-63.
[7] Morinaga, S., Yamanishi, K., Tateishi, K. and T. Fukushima. 2002. "Mining product reputations on the Web." Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD'02, pp. 341-349. ACM, New York, NY, USA.
[8] Romero, C. and S. Ventura. 2010. "Educational Data Mining: A Review of the State of the Art." IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40:6, pp. 601-618.
[9] Jackson, K. and W. Trochim. 2002. "Concept mapping as an alternative approach for the analysis of open-ended survey responses." Organizational Research Methods, 5, Sage.
[10] Castellanos-Nieves, D., Fernández-Breis, J., Valencia-García, R., Martínez-Béjar, R., and M. Iniesta-Moreno. 2011. "Semantic Web Technologies for supporting learning assessment." Information Sciences, 181:9, pp. 1517-1537. Elsevier.
[11] El-Kechaï, N., Delozanne, É., Prévit, D., Grugeon, B., and F. Chenevotot. 2011. "Evaluating the Performance of a Diagnosis System in School Algebra." Proceedings of Advances in Web-Based Learning - ICWL 2011, LNCS 7048, pp. 263-272. Springer.
[12] Formisano, A., Omodeo, E.G. and M. Temperini. 2001. "Layered map reasoning: An experimental approach put to trial on sets." Electronic Notes in Theoretical Computer Science 48, pp. 1-28. Elsevier.
[13] Costa, V. Santos, et al. 2000. "YAP User's Manual." Universidade do Porto, version 4 (2000): 20.
[14] Costa, Vítor Santos, et al. 2002. "CLP(BN): Constraint logic programming for probabilistic knowledge." Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.
[15] Sterbini, A. and M. Temperini. 2010. "Selection and sequencing constraints for personalized courses." In Proc. 40th ASEE/IEEE Frontiers in Education Conference, Oct. 27-30 2010, Washington, DC, USA.
[16] Popescu, Elvira. 2010. "Adaptation Provisioning with respect to Learning Styles in a Web-Based Educational System: An Experimental Study." Journal of Computer Assisted Learning, Vol. 26(4), Wiley, ISSN: 0266-4909, pp. 243-257.
[17] Limongelli, C., Sciarrone, F., Temperini, M. and G. Vaste. 2011. "The Lecomps5 Framework for Personalized Web-Based Learning: a Teacher's Satisfaction Perspective." Computers in Human Behavior, 27:4, ISSN 0747-5632.
[18] Ivanova, M. and A. Popova. 2011. "Formal and Informal Learning Flows Cohesion in Web 2.0 Environment." International Journal of Information Systems and Social Change (IJISSC), 2/1, pp. 1-15. IGI Global.
[19] Popescu, E. 2012. "Providing Collaborative Learning Support with Social Media in an Integrated Environment." World Wide Web, Springer, ISSN: 1386-145X, DOI: 10.1007/s11280-012-0172-6.
[20] De Marsico, M., Sterbini, A. and M. Temperini. 2013. "A Framework to Support Social-Collaborative Personalized e-Learning." In M. Kurosu (Ed.), Human-Computer Interaction (Part II), 15th International Conference on Human-Computer Interaction, 21-26 July 2013, Las Vegas, Nevada, USA, LNCS 8005, pp. 351-360. Springer, Heidelberg.