Using Normative Theories to Design an ITS for English Capitalisation and Punctuation

Michael Mayo
Department of Computer Science
University of Canterbury
[email protected]

Abstract. Intelligent Tutoring Systems (ITSs) are a technology for supporting teachers. They are intelligent software programs equipped with subject knowledge, a student model, and teaching ability. This paper reports on the outcome of PhD research aimed at making ITSs more robust and effective by applying normative theories, such as Bayesian probability theory and decision theory, to the design and implementation of such systems. The approach is outlined by demonstrating how a normative ITS for English capitalisation and punctuation, CAPIT, was implemented. CAPIT was evaluated in a real classroom at Ilam School in June 2000, and the evaluation results show that normative methods are indeed a viable approach to the design of ITSs.

Keywords: Intelligent Tutoring Systems, Normative theories, Literacy skills.

1. Introduction

This paper gives a high-level overview of PhD research that has been conducted in the area of Intelligent Tutoring Systems (ITSs). The focus of the research was to develop a robust and computationally efficient approach to applying normative theories to ITS design and implementation. Normative theories are sound, rigorous general theories of rationality. The reasoning mechanisms of many existing ITSs, however, are largely ad hoc, and as a result they may behave in ways that are sub-optimal. An ITS whose behaviour obeys the rules of normative theories, on the other hand, is guaranteed to behave rationally and optimally with respect to the chosen normative theories.

To illustrate an approach to designing completely normative ITSs, we have developed CAPIT, an ITS that teaches basic capitalisation and punctuation skills to 8-10 year old school children. It is a constraint-based tutor that implements two normative theories: Bayesian probability theory for its student model, and decision theory for its tutorial action selection strategies. CAPIT was evaluated at Ilam School in June 2000, and a summary of the results is given in this paper. The interested reader is strongly encouraged to read [2,3] for more technical details about CAPIT's implementation and evaluation, as this is purely an overview paper.

2. Normative Theories

Decision theory and Bayesian probability theory are both instances of normative theories. A normative theory encompasses not only a set of rules, but also the set of logical consequences of those rules [1]. Normative theories can therefore be considered logically complete and consistent. Under the assumption that a rational agent acts logically, normative theories can be thought of as prescriptive models for rational behaviour. This is in direct contrast to descriptive models, which attempt to describe either the behaviour of an individual (such as an expert or a teacher) or a group of individuals

(e.g. a psychological theory derived from observations), which may be subject to logical inconsistencies or incompleteness (i.e. irrationality).

Subjective Bayesian probability theory is a normative theory for maintaining uncertain beliefs. It provides a rational means of updating beliefs as new evidence or facts become known. The best computational implementation of Bayesian probability theory is the Bayesian network [5], a directed acyclic graph (DAG) in which nodes represent uncertain variables and edges correspond to the causalities or relevancies between the variables. Figure 1 is an example of a Bayesian network. In this simple network, a Flat Tyre or having Slept In can cause me to be Late for Work. With this model, I can determine the probability of being Late for Work given my knowledge (or lack of knowledge) about the causes. Similarly, I can reason backwards from knowing that I am or am not Late for Work to calculate the probabilities of any of the causes. In this particular example, I know I have not slept in (therefore Slept In is instantiated to No), so the probability of being Late for Work given that I have not slept in can be computed.

Fig. 1. A simple Bayesian network with nodes Flat Tyre, Slept In, and Late for Work.

The most significant use of a Bayesian network is that it is convenient for computing Bayes' theorem on a large scale. Bayes' theorem is a rule relating one's belief in a hypothesis H before and after some evidence e is observed. Mathematically, the rule is expressed as:

P(H|e) = kP(e|H)P(H)

P(H) is the hypothesis' prior probability, P(H|e) is its posterior probability, and k is a normalising constant. In the example above, the hypothesis was Late for Work and the evidence was Slept In = No. The posterior probability was therefore P(Late for Work | Slept In = No). Any other variable like Flat Tyre can also be a hypothesis, e.g. P(Flat Tyre | Late for Work = Yes).

The advantage of Bayesian networks is that inference algorithms have been developed that can efficiently handle large and complex queries. In a real system, the evidence e will likely comprise a number of observations (perhaps even hundreds of them), and the hypothesis H may in fact be a large set of hypotheses. Real models are thus large and complex, and naïve methods of calculating posteriors become intractable. Bayesian networks have been developed to handle these types of queries efficiently and effectively.

Bayesian networks have a mathematical representation isomorphic to their graphical representation. In the general case, the mathematical representation is

P(B) = ∏_{i=1}^{n} P(X_i | PA(X_i))

where X1...Xn are all the variables in the network and PA(Xi) is the set of parents of Xi in the DAG. In the example depicted by Figure 1, the mathematical representation is:

P(B) = P(Late for Work | Flat Tyre, Slept In) P(Flat Tyre) P(Slept In)

To completely specify a Bayesian network, one needs to specify each of the conditional probability factors in the mathematical representation.

Whilst Bayesian nets are an effective scheme for representing and reasoning with uncertain beliefs, the theory says nothing about how behaviour should be determined. Normative action selection is the domain of decision theory [6]. Briefly, decision theory combines Bayesian probability theory with a representation of preference in order to select the action that, on average, will result in the most preferred outcome. This is called expected utility maximisation. Preferences are encoded as numeric utility functions, in which a real-valued number is assigned to each possible outcome of each potential action.

For example, suppose that I must decide whether or not to explain a particular concept to a student. My possible actions, therefore, are Explain It or Don't Explain It. Suppose that if I explain the concept, the student may, as a consequence, either Understand It or Not Understand It (assume there are no states of partial understanding). Certainly, if I choose not to explain the concept, he/she will fail to understand it. The first outcome (explaining the concept and having the student understand it), naturally, is the most preferred, and I assign a utility of 1.0 to it. The second outcome is the least preferred, as I have expended the energy and time to try and explain the concept but the student is no better off. I therefore assign a utility of -1.0 to it. Finally, I assign a utility of 0.0 to the status quo outcome of not explaining the concept at all, as it is more preferable than explaining the concept and having the student not understand it, but less preferable than explaining the concept and having the student understand it. This utility function is depicted in Figure 2.

Action            Outcome                  U(Action, Outcome)
Explain It        Understand It             1.0
Explain It        Doesn't Understand It    -1.0
Don't Explain It  Doesn't Understand It     0.0

Figure 2. A simple utility function.
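The following Python sketch (not part of the original system) encodes the Figure 2 utility function and the expected utility maximisation rule, using the same illustrative 0.7/0.3 probabilities that are worked through numerically in the next paragraph.

```python
# A minimal sketch of expected utility maximisation over the Figure 2
# utility function. The outcome probabilities are the illustrative values
# used in the text's worked example.

UTILITY = {
    ("explain", "understands"): 1.0,
    ("explain", "does_not_understand"): -1.0,
    ("dont_explain", "does_not_understand"): 0.0,
}

# P(outcome | action): not explaining leads to non-understanding with certainty.
OUTCOME_PROBS = {
    "explain": {"understands": 0.7, "does_not_understand": 0.3},
    "dont_explain": {"does_not_understand": 1.0},
}

def expected_utility(action: str) -> float:
    """Probability-weighted average of the utilities of an action's outcomes."""
    return sum(p * UTILITY[(action, outcome)]
               for outcome, p in OUTCOME_PROBS[action].items())

best = max(OUTCOME_PROBS, key=expected_utility)
print(best, expected_utility(best))  # explain 0.4
```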

To decide whether or not I should explain the concept, decision theory states that the expected utility of each action should be calculated and the action with maximum expected utility selected. Expected utility is simply the probability-weighted average of the utilities of each possible outcome. So, for example, if P(Outcome=Understand Concept|Action=Explained Concept) = 0.7 and P(Outcome=Not Understand Concept|Action=Explained Concept) = 0.3, then the expected utility of explaining the concept is 0.7(1.0) + 0.3(-1.0) = 0.4. On the other hand, the expected utility of not explaining the concept, in which there is only one outcome, is 0.0. Decision theory therefore states that, in this example, the concept should be explained.

3. CAPIT: An ITS for Capitalisation and Punctuation

An Intelligent Tutoring System (ITS) is a computer program designed to teach a particular subject domain to a student. Unlike conventional educational software (e.g. multimedia CD-ROMs) that encapsulates only subject matter, ITSs are also equipped with student models and pedagogical knowledge. The student model is basically a representation of the student, which at the very least is a set of beliefs about which concepts and/or skills the student has mastered. The pedagogical module is a representation of generic teaching knowledge. Its input is the student model, and its purpose is to adapt the teaching actions of the system to the student.

For example, if the student is weak in one particular area but strong in another, the pedagogical module might decide to focus its teaching efforts on the area of weakness.

CAPIT (Capitalisation And Punctuation Intelligent Tutor) [2,3] is an example of an ITS. It has been designed for school children in the 8-10 year old age group. CAPIT teaches a subset of the basic rules of English capitalisation and punctuation, such as the capitalisation of sentences and the basic uses of commas, periods, apostrophes and quotation marks. Indications are that the ITS motivates children to complete capitalisation and punctuation exercises significantly more than the traditional approach of using a textbook does.

Figure 3 depicts the main interface to CAPIT. Brief instructions relevant to the current problem are clearly displayed at the top of the main interface. This reduces cognitive load by enabling the child to focus on the current goals at any time without needing to remember them. Immediately below the instructions, and clearly highlighted, is the current problem. In this area, the child interacts with the system by moving the cursor, capitalising letters, and inserting punctuation marks. The child can provide input either by pointing and clicking the mouse, or by pressing intuitive key combinations such as Shift-M to capitalise the letter m. By requiring the cursor to be positioned at the point where the capital letter or punctuation mark is to go, the child's ability to locate errors as well as correct them is tested. To motivate the children and retain their attention, CAPIT also gives the student points for each correct solution, and displays simple animations as rewards.

Fig. 3. The tutor's main user interface.

The exercises that CAPIT poses are known as completion exercises, in which a fully unpunctuated, uncapitalised piece of text is displayed and the student is asked to perform the corrections. An example is depicted in Figure 4(a). This example comprises only a single sentence, but most of the problems in CAPIT contain 3-4 sentences. Figure 4(b) is an incorrect solution to the problem, which would generate a feedback message such as "The full stop should be within the quotation marks! Hint: look at the word books in your solution." Figure 4(c) is the correct solution to the problem. Feedback messages are typically short and relate to only a single mistake, but if the student wants more detailed explanatory information, she/he can click Why?

(a) the teacher said open your books
(b) The teacher said, "open your books".
(c) The teacher said, "Open your books."

Figure 4. (a) A problem, (b) a student's incorrect solution, and (c) the correct solution.

CAPIT is the second ITS to implement Ohlsson's Constraint-based Modelling (CBM) [4]. CBM was proposed in part because of the intractability of modelling approaches that try to infer students' mental processes from problem-solving steps, and in part because Ohlsson believes that diagnostic information is most readily available in the problem states that the student arrives at. A CBM tutor represents subject knowledge as a set of constraints of the form <Cr, Cs>, where Cr is the relevance condition and Cs is the satisfaction condition. The constraints define which problem states are consistent (or correct), and which are not.

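As an illustration of this representation, here is a minimal Python sketch, hypothetical rather than CAPIT's actual code, of how a <Cr, Cs> constraint might be encoded and checked using regular expressions (the formalism CAPIT uses, as described below). The constraint shown is the sentence-capitalisation rule discussed in the next paragraph; the class and function names are invented.

```python
import re
from dataclasses import dataclass

@dataclass
class Constraint:
    """A <Cr, Cs> constraint: both conditions are regular expressions."""
    name: str
    relevance: str     # Cr: selects the parts of the solution the constraint applies to
    satisfaction: str  # Cs: pattern every relevant part must match
    feedback: str

# Hypothetical encoding of "all sentences must begin with a capital letter":
# Cr matches the first word of each sentence; Cs requires a leading capital.
SENTENCE_CAPS = Constraint(
    name="sentence capitalisation",
    relevance=r"(?:^|[.!?]\s+)(\w+)",  # a word starting a sentence
    satisfaction=r"[A-Z]",             # its first letter must be a capital
    feedback="Sentences must begin with a capital letter.",
)

def check(constraint: Constraint, solution: str) -> bool:
    """True if every relevant match also satisfies Cs (or nothing is relevant)."""
    for match in re.finditer(constraint.relevance, solution):
        if not re.match(constraint.satisfaction, match.group(1)):
            return False  # violated: feedback on this constraint can be given
    return True

print(check(SENTENCE_CAPS, 'the teacher said, "Open your books."'))  # False
print(check(SENTENCE_CAPS, 'The teacher said, "Open your books."'))  # True
```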
A constraint is relevant to a problem state if its Cr is true. All constraints that are relevant to a problem state must also be satisfied for the problem state to be correct. Otherwise, the problem state is incorrect, and feedback can be given depending on which relevant constraints had their satisfaction condition violated.

In the current version of CAPIT, 45 problems and 25 constraints are represented. The problems are relevant to the constraints in roughly equal proportions, although a small number of constraints (such as capitalisation of sentences) are relevant to all the problems. The constraints cover rules such as ending sentences with a period, using an apostrophe for contractions and for denoting ownership, direct speech, and so on.

To illustrate a constraint from CAPIT, consider the rule that all sentences must begin with a capital letter. The Cr of this constraint would be "Any word starting a sentence". In other words, the constraint is only relevant to the first word of any sentence. The Cs of this constraint is "The word must start with a capital letter". When a student clicks Submit, CAPIT firstly determines if there are any words that start a sentence in the solution (in this case, there will be at least one such word in every solution). It then tries to match the Cs of the constraint to see if those particular words that the constraint is relevant to are capitalised. If any are not, the constraint is violated and the system can give the student feedback about capitalising sentences.

Another example is the constraint defining the correct punctuation of singular possessive nouns. In this case, the Cr is "The word is a singular possessive noun" and the Cs is "The word ends in 's". Since CAPIT knows which words in each problem are singular possessive nouns, it can easily determine if the constraint is satisfied. All 25 of the constraints in CAPIT are defined using the language of regular expressions, which is a compact means of specifying patterns occurring in text strings.

4. Normative Modelling in CAPIT

Normative theories were used to implement both CAPIT's student model and its pedagogical module [2]. The student model is a Bayesian network, and tutorial actions are selected by combining the output of the Bayesian network with a utility function in order to select the tutorial action with maximum expected utility. Unlike other Bayesian student modelling approaches, in which a Bayesian network is crafted initially by hand and then remains static during its use, CAPIT's Bayesian network was constructed automatically by machine learning and is then allowed to adapt dynamically on-line to the current student.

The Bayesian student model is sophisticated and comprises 50 uncertain variables. Each of the 25 constraints has two variables in the Bayesian network: the first (Li) represents the outcome of the last attempt at the ith constraint (which can be either satisfied, violated, violated but followed by a constraint-specific feedback message, or never attempted), and the second (Ni) represents the system's prediction about whether the constraint will be satisfied or violated on the next attempt. As the variables L1..L25 represent past actions of the student, their values are certain. However, the variables N1..N25 are unknown and therefore uncertain. The student model is thus a mechanism for reasoning from the student's most recent performance (L1..L25) in order to make a prediction about the student's future performance (N1..N25).
In other words, for the ith constraint, the purpose of the student model is to output the probabilities P(Ni=Satisfied) and P(Ni=Violated), which are the posterior probabilities of satisfaction and violation respectively. Figure 5 depicts the network.

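The following Python sketch illustrates the two-layer Li-to-Ni structure of the model. It is a deliberate simplification: in CAPIT the network structure and probabilities are learned from data and may link variables across constraints, whereas here each Ni is assumed to depend only on its own Li, and all probability values are invented.

```python
# A minimal sketch of the student model's prediction task: given the observed
# last-attempt outcome Li for each constraint, output P(Ni = violated).
# Simplifying assumption: Ni depends only on Li; values are invented.

L_OUTCOMES = ("satisfied", "violated", "violated_with_feedback", "never_attempted")

# Hypothetical P(Ni = violated | Li): a constraint just violated without
# feedback is assumed most likely to be violated again.
P_N_VIOLATED_GIVEN_L = {
    "satisfied": 0.10,
    "violated": 0.60,
    "violated_with_feedback": 0.35,
    "never_attempted": 0.45,
}

def predict_violation(last_outcomes: dict[int, str]) -> dict[int, float]:
    """Posterior P(Ni = violated) for each constraint i, given the Li values."""
    return {i: P_N_VIOLATED_GIVEN_L[outcome]
            for i, outcome in last_outcomes.items()}

# Example: constraint 1 was satisfied last time, constraint 2 was violated.
print(predict_violation({1: "satisfied", 2: "violated"}))
# {1: 0.1, 2: 0.6}
```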
Because the system initially has no knowledge about a new student, the starting student model is constructed from the past performance data of other students. Standard Bayesian network learning algorithms were applied to this data to construct the student model depicted in Figure 5. However, as the student interacts with the system and observations are accumulated about the current student, the network is dynamically adapted to maximise its ability to predict the student's future performance on each constraint.

Figure 5. The Bayesian student model in CAPIT, with nodes L1..L25 and N1..N25.

Two tutorial action selection procedures are implemented in CAPIT using decision theory. The first is next-problem selection: when a student completes or abandons the current problem, how does the system select the most appropriate next problem? Ideally, the next problem should not be too difficult (or else the student will become discouraged), nor should it be too easy (or the student will not be challenged). The solution is to use the Bayesian network to firstly predict the student's likely performance on each of the possible next problems, and then to feed those posterior probabilities into a decision-theoretic procedure that calculates the expected utility of each problem. The potential next problem with the highest expected utility is then selected. In CAPIT, a utility function has been defined based on the probable number of errors that the student will make. Specifically, problems that are expected to produce only one or two errors with high probability maximise expected utility, whilst problems where the likely number of errors is zero or greater than two with high probability have minimum utility. An analysis of this strategy with "simulated" students shows that suitable next problems are selected both for poor students who consistently violate many constraints, and for good students who consistently satisfy most constraints.

The second selection strategy concerns feedback messages. When a student submits a solution that violates many constraints, how should the ITS decide which one constraint to give feedback on first? Some violated constraints are more significant than others from a pedagogical perspective. The solution in this case is to use the Bayesian network to "pretend" that feedback was, in turn, given on each violated constraint. The system can then compute the posterior probability that the constraint will be violated again on the next attempt, with and without feedback. It turns out that the expected utility corresponds to the difference in these probabilities. More specifically, the expected utility of a feedback message is equal to the decrease in the posterior probability of violation that the message results in.
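The following Python sketch summarises both selection procedures. It is a hedged illustration, not CAPIT's implementation: the utility function over error counts is invented (the paper specifies only its shape), and `predict_errors` and `p_violated_after` stand in for queries against the Bayesian student model.

```python
# A minimal sketch of CAPIT's two decision-theoretic selection steps.
# Utility values are invented; the model-query functions are hypothetical.

def error_count_utility(expected_errors: float) -> float:
    """Invented utility: problems likely to yield one or two errors are
    preferred; too-easy (zero errors) or too-hard problems are not."""
    return 1.0 if 1.0 <= expected_errors <= 2.0 else 0.0

def select_next_problem(problems, predict_errors):
    """Pick the candidate problem with maximum expected utility."""
    return max(problems, key=lambda p: error_count_utility(predict_errors(p)))

def select_feedback(violated_constraints, p_violated_after):
    """Give feedback on the constraint whose message most reduces the
    posterior probability of re-violation (its expected utility)."""
    def gain(c):
        return p_violated_after(c, feedback=False) - p_violated_after(c, feedback=True)
    return max(violated_constraints, key=gain)

# Hypothetical usage with stubbed model predictions:
stub_errors = {"p1": 0.4, "p2": 1.5, "p3": 3.2}
print(select_next_problem(stub_errors, lambda p: stub_errors[p]))  # p2
```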

5. Classroom Evaluation of CAPIT

Three classes of 9-10 year olds at Ilam School in Christchurch, New Zealand, participated in a four-week evaluation of CAPIT [2]. The first class (Group A) did not use CAPIT at all. The purpose of this group was to provide a baseline for comparing the pre- and post-test results of the groups that did use CAPIT. The second class (Group B) used a "cut-down" version of CAPIT with randomised next-problem and feedback message selection, and the third class (Group C) used the full version of the tutor with normative student modelling and tutorial action selection. The groups using the tutor, B and C, had one 45-minute session per week for the duration of the study, and they worked in the same pairs each week. (Working in pairs was necessary because of the limited availability of computers.) Every interaction was logged.

Pre- and post-tests were also completed by the students working in the same pairs. The tests were comparable (and challenging) and consisted of eight completion exercises similar to those presented by CAPIT, but done manually with pencil and paper. The score for each test was calculated by subtracting the number of punctuation and capitalisation errors from the number of punctuation marks and capital letters required for a perfectly correct solution. The mean scores and standard deviations (the error bars) are shown in Figure 6.

Figure 6. Pre- and post-test mean scores for Groups A, B and C (error bars show standard deviations).

The mean pre-test score for Group C is almost 10% lower than that of Group B. Both Groups B and C show an improvement in mean test scores, although the improvement is more marked for Group C. Group A, the class that did not use the tutor, actually regressed.

Statistical significance tests were performed to compare the individually matched improvements of Groups B and C from pre-test to post-test. Because the same pair of students in each group completed both a pre- and a post-test, a one-tailed paired difference experiment was performed to gauge the significance of the improvement. With H0 being the proposition that a group did not improve, it was found that Group B improved with 95% confidence (α = 0.05, t = 1.86, critical value 1.75) while Group C improved with 99% confidence (α = 0.01, t = 3.4, critical value 2.6). The improvement is thus much more significant for Group C, the group using the normative version of CAPIT.

The constraint violation frequencies over time were also investigated. Each attempt at a problem was analysed, and the total proportion of violated constraints was calculated for each attempt. This was averaged over all students in each group, and the result is depicted in Figure 7. The scatter plot shows that Group C initially made more errors than Group B, but that the rate of constraint violation decreased much faster for that group. A cut-off point of 125 attempts was selected because approximately half of the pairs of students reached this number of attempts, and beyond this number statistical effects arising from the smaller number of pairs tend to corrupt the trend.
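As an aside, the one-tailed paired difference test described above is straightforward to reproduce. The following Python sketch uses SciPy on invented pre/post scores; it does not use the study's actual data.

```python
# A minimal sketch of a one-tailed paired difference test, as used in the
# evaluation. The scores below are invented, not the study's data.
from scipy import stats

pre = [55.0, 60.0, 48.0, 70.0, 62.0, 58.0]   # hypothetical pre-test scores (%)
post = [63.0, 66.0, 55.0, 74.0, 70.0, 61.0]  # hypothetical post-test scores (%)

# ttest_rel is two-sided by default; alternative="greater" tests H1: post > pre,
# i.e. H0 is the proposition that the group did not improve.
t_stat, p_value = stats.ttest_rel(post, pre, alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")
# Reject H0 at the alpha = 0.05 level if p_value < 0.05.
```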


These results, and further analysis reported in [2], lead to the conclusion that Group C was initially less able than Group B (i.e. made more errors and had a lower pre-test score), but learned the constraints at a faster rate as a result of the normative student model and action selection strategies.

Figure 7. Mean proportion of constraints violated on the nth attempt (N = 1 to 125) for Groups B and C, with linear trend lines. Group C: y = -0.0025x + 0.4209 (R² = 0.8264); Group B: y = -0.0016x + 0.3237 (R² = 0.6474).

6. Conclusion

The application of normative theories, namely Bayesian probability theory and decision theory, to ITS design and implementation has been introduced and described. A normative ITS, CAPIT, that teaches basic capitalisation and punctuation skills to 8-10 year old school children, has been detailed. CAPIT was evaluated in a real classroom, and the results show that students using the normative version of CAPIT learn constraints at a faster rate than students using a non-normative version of the same system.

References

1. Gärdenfors P. (1989). The dynamics of normative systems. In Martino A. (Ed.), Proceedings of the 3rd International Congress on Logica, Informatica, Diritto, pp. 293-299. Consiglio Nazionale delle Ricerche, Florence, 1989. Also published in A.A. Martino (Ed.), Expert Systems in Law, Elsevier, pp. 195-200, 1991.
2. Mayo M. and Mitrovic A. (2000). Optimising ITS behaviour with Bayesian networks and decision theory. To appear: International Journal of Artificial Intelligence in Education. Also available online at http://cbl.leeds.ac.uk/ijaied/.
3. Mayo M., Mitrovic A. and McKenzie J. (2000). CAPIT: An Intelligent Tutoring System for Capitalisation and Punctuation. In Kinshuk, Jesshope C. and Okamoto T. (Eds.), Advanced Learning Technology: Design and Development Issues, pp. 151-154. Los Alamitos, CA: IEEE Computer Society (ISBN 0-7695-0653-4).
4. Ohlsson S. (1994). Constraint-based student modelling. In Greer J.E. and McCalla G.I. (Eds.), Student Modelling: The Key to Individualized Knowledge-based Instruction, pp. 167-189. NATO.
5. Pearl J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (2nd ed.). Morgan Kaufmann.
6. Savage L. (1954). The Foundations of Statistics. Wiley.
