Inferring unobservable learning variables from students' help seeking behavior

Ivon Arroyo, Tom Murray, Beverly P. Woolf
Computer Science Research Center, University of Massachusetts Amherst
{ivon, tmurray, bev}@cs.umass.edu

Abstract. Student log files can be used to infer students' attitudes that affect learning. Starting from a correlation analysis that integrates survey-collected student attitudes with student behaviors while using the tutor, we constructed a Bayesian Network that infers students' attitudes and perceptions of the system.
1. Introduction

One of the primary components of an interactive learning environment is the help provided during problem solving activities. Some studies have found a link between help seeking and learning, suggesting that more help seeking results in more learning (Wood & Wood, 1999). However, there is growing evidence that students may have non-optimal help seeking behaviors, and that they seek and react to help depending on student characteristics, motivation, past experience and other factors (Aleven et al., 2003). For instance, we have found that elementary school males spend less time on hints than females (Arroyo et al., 2001). Further, we have evidence suggesting that hints demanding a high level of interaction (dragging and dropping, high levels of clicking, etc.) promote higher learning for females than for males (Arroyo et al., 2000), probably for the same reason (i.e., that males are willing to spend less time on hints).

It has also been suggested that different attitudes and beliefs may translate into different help seeking behaviors. For instance, Nelson-LeGall et al. (1990) discussed how students' help seeking depends on their confidence in the answer they achieved. It has also been argued that goal orientations while using the ILE might affect students' help seeking behavior: learning-oriented students tend to see the benefits of help seeking, while performance-oriented students perceive help seeking as a threat to self-worth and might thus refrain from seeking help (Ryan & Pintrich, 1997). Once negative attitudes towards help or towards the system are understood, and provided that these attitudes are known to the tutoring system, corrective actions may be taken.
However, many questions remain to be answered before making inferences about, and remediating, suboptimal use of help in tutoring systems: Is there a link between students' help seeking behaviors and their attitudes towards help and help seeking? How are students' perceptions of the system linked to actual learning, and to students' ways of interacting with the tutoring system? Can we build a student model from past student interactions that allows us to infer students' attitudes and perceptions just by observing their interactions with the system? And what actions might the tutoring system take to encourage positive attitudes? This paper explores these questions, with special emphasis on the technique of integrating survey data and log file data to build Bayesian Networks that diagnose hidden variables that affect students' learning, such as students' attitudes towards help or towards the system. This methodology can contribute to building tutoring systems that are responsive and adaptable to students' feelings and attitudes towards the system and towards help.
2. Domain: Wayang Outpost

Wayang Outpost is a multimedia web-based tutoring system for SAT (standardized achievement test) problems in high school mathematics (Arroyo et al., 2003). Multimedia is used to direct attention, animate parts of the solution when providing help, and introduce concepts with sound and animation when help is requested. When students request help by clicking on a help button, Wayang Outpost provides step-by-step instruction in the form of animations, which help students solve the current problem and teach concepts that transfer to later problems. Each SAT problem is associated with a solution plan, with hints and skills attached to steps in the solution; each step in the solution is associated with one skill in the knowledge base. When the student clicks the help button, the tutor explains the next step of the solution. Every interaction of the student with the system is logged in a server-side relational database, allowing later queries of help seeking behaviors such as time spent on hints, time and number of problems seen, etc.
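As a concrete illustration of the kind of queries such a log enables, here is a minimal sketch using an in-memory SQLite database; the table and column names are hypothetical, since the actual Wayang Outpost schema is not described here.

```python
# Hypothetical sketch of a server-side interaction log and the help seeking
# queries it supports. Schema and event names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE log_events (
    student_id INTEGER,
    problem_id INTEGER,
    event TEXT,            -- e.g. 'hint_request', 'answer_attempt'
    seconds_spent REAL)""")
conn.executemany(
    "INSERT INTO log_events VALUES (?, ?, ?, ?)",
    [(1, 10, "hint_request", 12.0), (1, 10, "hint_request", 8.5),
     (1, 10, "answer_attempt", 4.0), (1, 11, "answer_attempt", 20.0)])

# Per-student summary: total time spent on hints, and hints per problem seen
row = conn.execute("""
    SELECT student_id,
           SUM(CASE WHEN event = 'hint_request' THEN seconds_spent ELSE 0 END),
           SUM(event = 'hint_request') * 1.0 / COUNT(DISTINCT problem_id)
    FROM log_events
    GROUP BY student_id""").fetchone()
# row -> (1, 20.5, 1.0): 20.5 seconds on hints, one hint per problem on average
```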
3. Data sources

The data come from a population of 150 students, 15-17 years old, from two high schools in rural areas of Massachusetts. Students took a pretest and then used Wayang Outpost for about 2-3 hours. Students were provided headphones, as the help contains text and animations with audio. After using the tutor, students took a post-test and filled out a survey with questions about their perceptions of the system. Tables 1, 2 and 3 describe the instruments used at the end of the study, with code names for each question and the number of choices available as answers. There is a clear overlap between our variables and Vicente and Pain's (2002) motivational variables, created with a purpose similar to ours: they attempted to infer motivational variables from students' interactions with a tutoring system (by creating rules), concluding that it is feasible and proposing sample diagnosis rules based on a study with students. However, creating rules becomes more complicated as the number of motivational and observable variables increases, given the degree of uncertainty and the dependence among variables. In general, the variables we utilized specifically target attitudes and perceptions towards help. Still, tables 1, 2 and 3 show how our variables and Vicente and Pain's overlap.

Code | Choices | Question | Vicente and Pain (2002)
Learned? | 5 | Do you think you learned how to tackle SAT-Math problems by using the system? | Satisfaction, Sensory Interest
Like? | 5 | How much did you like the system? | Satisfaction
Helpful? | 5 | What did you think about the help in the system? | Satisfaction
Return? | 5 | Would you come back to the web site to use the system again if there were more problems and help for you to see? How many more times would you use it again? |
Table 1. Post-test questions for students' perceptions of the tutoring system

Code | Choices | Question | Vicente and Pain (2002)
Headphones? | 5 | How much did you use the audio for the explanations? | Sensory Interest, Effort
Table 2. Post-test question for students' interaction with the tutoring system

Attitude Code | Choices | Question | Vicente and Pain (2002)
Seriously try learn | 5 | How seriously did you try to learn from the tutoring system? | Effort
Get it over with | 2 | I just wanted to get the session over with, so I went as fast as possible without paying much attention. | Effort
Challenge | 2 | I wanted to challenge myself: I wanted to see how many I could get right, asking as little help as possible. | Independence, Challenge
Didn't care help | 2 | I wanted to get the correct answer at each problem, but I didn't care much about the help or about learning with the software. | Effort
Help Independence | 2 | I wanted to ask for help when necessary, but tried to become independent of help as time went by. | Independence
Other approaches | 2 | I wanted to see other approaches to solving the problem, and that is why I asked for help even if I got it right. | Cognitive Interest
Confirmatory help attitude | 2 | I didn't want to enter a wrong answer, so I asked for help before attempting an answer, even if I had a clear idea of what the answer could be. | Confidence
Table 3. Post-test questions for students' attitudes towards help and learning with the tutoring system
Figure 1. Hypotheses about the relationship between learning, attitudes, help seeking and other variables

Measures of learning. Two learning measures were considered. The first is students' perception of how much they learned (Learned? in table 1), collected from the survey after the tutoring session. The second is a 'Learning Factor' that describes how, on average, students' need for help decreases over subsequent problems during the tutoring session. Performance on each problem is defined in equation (1) as the 'expected' number of requested hints for the problem (the average over all subjects) minus the number of help requests the student made on the problem, divided by the expected number of requested hints^1:

P_p = \frac{E(H_p) - H_p}{E(H_p)} \qquad (1)

For instance, if students on average asked for 2 hints on a problem before answering it correctly, and the current student requested 3 hints, then the student performed 50% worse than expected, so performance is -0.5. If the student instead requested 2 hints, the behavior is average and performance is 0.0. If the student requested no hints at all, he/she performed 100% better than expected, and performance is 1.0. Ideally, these performance values should increase as the session progresses. The average difference in performance between pairs of subsequent problems (the (t+1)th and the (t)th problem) over the whole tutoring session thus becomes a measure of how students' need for help fades before they choose a correct answer; this observable measure of learning should be higher when students learn more (see equation 2). As an example, if a student saw only four problems, the learning factor would be

\frac{(P_2 - P_1) + (P_4 - P_3)}{2}

^1 This method makes different problems comparable to each other. The expected difficulty of a problem is always higher than zero, as there was no problem where, on average, students asked for no help at all. However, easy problems will have expected hint counts close to zero, producing extremely large absolute values of the performance measure. One possible 'trick' solution is to add 1 to both E(H_p) and H_p.
\text{Learning factor} = \frac{\sum_{p = 1, 3, 5, \ldots}^{\text{Probs. seen}} \left( P_{p+1} - P_p \right)}{\text{Probs. seen} / 2} \qquad (2)
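The two measures above can be sketched in a few lines of code. This is a minimal illustration, assuming `expected_hints` and `student_hints` are parallel per-problem lists; the function names are ours, not from the Wayang Outpost code.

```python
# Sketch of the performance measure (eq. 1) and learning factor (eq. 2).

def performance(expected, observed, smooth=False):
    """Eq. (1): relative deviation from the expected number of hints.
    With smooth=True, 1 is added to both terms (the footnoted 'trick')
    to avoid extreme values when expected hints are near zero."""
    if smooth:
        expected, observed = expected + 1, observed + 1
    return (expected - observed) / expected

def learning_factor(expected_hints, student_hints):
    """Eq. (2): average performance gain over consecutive problem pairs."""
    perf = [performance(e, h) for e, h in zip(expected_hints, student_hints)]
    # Pair up problems (1,2), (3,4), ... and average the pairwise gains
    gains = [perf[p + 1] - perf[p] for p in range(0, len(perf) - 1, 2)]
    return sum(gains) / (len(perf) / 2)

# The example from the text: expected 2 hints, student asked 3 -> -0.5
print(performance(2, 3))   # -0.5
```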
Given the dependent and independent variables above, the hypotheses may be re-stated as evaluating the existence of the links in figure 2.
4. Results

A correlation analysis was performed to look for links among the variables in figure 2. Figure 3 shows the significant correlations found among help seeking attitudes, help seeking behaviors, perceptions of the system, gender, and other student behaviors such as problems seen and how much students used the headphones. As expected, there are tight links among variables within a group, such as significant correlations among the variables that describe perceptions of the system (e.g., the correlation between Learned? and Return?, R=0.49).

Tables 6 and 7 show posterior probabilities the network infers for two contrasting students, given the observed values of the "leaf" nodes (learning factor, helped probs/total probs, hints per problem, std. hints per problem, problems per minute, total time in system, and time per problem).

Hidden node | Posterior probability
Learned? | 0.9352
Didn't care help | 0.0935
Headphones on? | 0.5503
Helpful? | 0.6186
Gender=female | 0.6285
Serious? | 0.7062
Like? | 0.9071
Challenge Attitude | 0.594
Help Independence | 0.6495
Confirmatory Help | 0.0824
Return? | 0.7827
Other Approaches | 0.2869
Get Over With | 0.0396
Table 6. Some inferred hidden nodes for a student who seems to use the system carefully

Observed "leaf" node | Observed value
Learning factor | Low
Helped probs/total probs | High
Hints per problem | Low
Std. Hints per problem | High
Problems per minute | High
Total time in system | Low
Time per problem | Low

Hidden node | Posterior probability
Learned? | 0.485052
Didn't care help | 0.413229
Headphones on? | 0.655983
Helpful? | 0.512603
Gender=female | 0.590023
Serious? | 0.556645
Like? | 0.920361
Challenge Attitude | 0.568902
Help Independence | 0.471927
Confirmatory Help | 0.35873
Return? | 0.613032
Other Approaches | 0.212891
Get Over With | 0.156031
Table 7. Some inferred hidden nodes for a student who doesn't seem to pay attention to help (probably slipping and making careless mistakes)

The network described and the corrective actions are being added to the Wayang Outpost system, and will be tested in Massachusetts public schools. Two versions of the system will be compared: an experimental version that includes inferences and corrective actions triggered when the probability of a certain attitude exceeds a threshold, and a control version without any reasoning about attitudes. We expect the former to lead to higher learning, better perceptions of the system, and better attitudes towards help at the end of the session. Future work involves an evaluation of the accuracy of the proposed Bayesian Network, computing the CPTs from 80% of the data and testing prediction accuracy for the hidden nodes on the remaining 20% of the students' data.

We recognize two main limitations of the BBN construction approach presented above. The first is that correlation does not determine causality, and some links were given a direction based on intuition. Because it is generally better for origin nodes to be causes of end nodes (Russell & Norvig, 2003), it is not clear that certain links have the best direction. Networks with alternative link directions may be compared on their predictive power, making the methodology sounder. It may turn out that changing the direction of certain links does not substantially alter predictive power, as in the work of Zhou and Conati (2003), but it would be worth knowing in any case. The second limitation is that the discretization of the variables may be unnecessary and may reduce predictive power. While keeping the structure of the network, continuous parameter learning can be used to learn continuous distributions (instead of the discrete distributions of a CPT). Future work consists of comparing the accuracy of the different models (discrete, continuous, different link directions) to see how much accuracy is gained by changing such characteristics of the model.
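The kind of posterior computation reported in tables 6 and 7 can be illustrated with a toy discrete network, inferring one hidden attitude from two discretized observable nodes by direct enumeration. The structure, node names and all probabilities below are made-up placeholders, not the actual network or its CPTs.

```python
# Toy posterior inference for one hidden attitude node ("didn't care about
# help") with two Low/High observable children. Probabilities are invented.

prior = {True: 0.3, False: 0.7}   # P(attitude)

# P(observable = 'High' | attitude) for each observed child node
p_high = {
    "hints_per_problem": {True: 0.2, False: 0.6},
    "learning_factor":   {True: 0.3, False: 0.7},
}

def posterior(evidence):
    """P(attitude | evidence), evidence like {'learning_factor': 'Low', ...}."""
    joint = {}
    for attitude, p in prior.items():
        for node, value in evidence.items():
            ph = p_high[node][attitude]
            p *= ph if value == "High" else 1 - ph
        joint[attitude] = p
    z = sum(joint.values())                  # normalize over attitude values
    return {a: p / z for a, p in joint.items()}

post = posterior({"hints_per_problem": "Low", "learning_factor": "Low"})
# post[True] is about 0.67 with these placeholder numbers
```

With real data, the CPT entries would be estimated from the discretized survey-plus-log dataset rather than hand-set as here.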
5. From a data-centric student model to pedagogical decisions

The next step is to produce an improved tutoring system that makes informed decisions based on the detection of specific attitudes towards help and perceptions of the system, adding the model to a non-experimental system (where the survey variables are not available and we would like to infer them from the learner's behavior). This is clearly the final goal. What still needs to be determined is what kind of corrective actions should be taken when negative attitudes or undesirable behaviors are inferred to be true. Table 4 suggests actions to initiate when certain attitudes are detected. Note that not all actions are remedial (i.e., they are not only responses to negative attitudes); some apply when positive attitudes are detected, such as reinforcement, encouragement to continue a certain behavior, or the suggestion of an alternative action.

When detecting an attitude of… | Perform the following action…
…not taking the program seriously OR a "get it over with" attitude | - Show past students' data to make this student reflect on their behavior: show past students' post-test scores for students who did not take the program seriously vs. those who did, saying that they won't improve if they don't take it seriously. - Increase some motivational component of the system. - Flag these students to a teacher, or start an on-line dialog asking why they do not seem to be taking it seriously and whether anything can be done to make it more motivating.
…Didn't care about help | - Show past students' data to make this student reflect on their behavior: "Please pay attention to the help. When students in other schools paid attention to help, they improved this much, but when they didn't, they only improved this much."
…Help Independence | - Positive reinforcement: congratulate them, and show a graph of how much less help they are requiring now than before.
…Confirmatory Help Attitude | - Show the possibility of an alternative behavior: "You seem to be asking for help even if you are pretty sure about your response. Let's see if next time you can make an attempt without seeing help." - Encouragement if they are doing well: "You can trust your answers!"

When the student believes that… | Perform the following remedial action…
…the system is not helpful BUT the observable learning factor is high | - Show the student his own data to demonstrate how his beliefs don't match reality: display a graph showing how much less help he is requiring now than before.
…he/she doesn't like the system OR he/she wouldn't use the system again | - Interrupt to get important feedback: "You don't seem to like the system very much… can you please tell us what you don't like about it?"

When there is a behavior of… | Perform the following remedial action…
…not wearing headphones | - Show the possibility of an alternative behavior: "Are you wearing your headphones? Sound is a very important part of the help we give you. You will learn much more if you listen to the sound."

Table 4. Remedial actions to trigger when specific attitudes are detected
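The threshold-triggered policy described above (act when an inferred attitude's probability exceeds a cutoff) can be sketched as follows; the attitude keys, messages and the 0.7 threshold are illustrative placeholders, not the system's actual values.

```python
# Sketch of a trigger policy: fire a corrective or reinforcing action for
# every attitude whose inferred posterior exceeds a threshold.

THRESHOLD = 0.7   # placeholder cutoff

ACTIONS = {
    "get_over_with": "Show past students' post-test data for serious vs. non-serious use.",
    "didnt_care_help": "Ask the student to pay attention to help; show improvement data.",
    "help_independence": "Congratulate; graph the student's declining help requests.",
    "confirmatory_help": "Suggest attempting an answer before requesting help.",
}

def triggered_actions(posteriors, threshold=THRESHOLD):
    """Return the action messages for attitudes inferred above the threshold."""
    return [ACTIONS[a] for a, p in posteriors.items()
            if a in ACTIONS and p > threshold]

msgs = triggered_actions({"get_over_with": 0.04, "help_independence": 0.65,
                          "confirmatory_help": 0.08, "didnt_care_help": 0.82})
# Only the "didn't care about help" action fires for this student
```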
7. Summary and conclusions

We showed how data collected from post-test surveys of students' perceptions of and attitudes towards the system can be merged with data on student interactions with the system to build a data-driven model that infers negative and positive attitudes of student users while they are using the system. A methodology was presented that describes how to move from a correlation analysis to a Bayesian student model that merges observable variables (such as time spent on hints) with hidden nodes that capture students' motivations, attitudes, perceptions, beliefs and other unobservable variables. It highlights how machine learning methods and classical statistical analysis can be combined to produce a valuable mapping from low-level log data to aspects that matter in the realm of student learning and cognition.
8. References

Aleven, V., Stahl, E., Schworm, S., Fischer, F., & Wallace, R. (2003). Help seeking and help design in interactive learning environments. Review of Educational Research.
Arroyo, I., Beck, J. E., Woolf, B. P., Beal, C. R., & Schultz, K. (2000). Macroadapting AnimalWatch to gender and cognitive differences with respect to hint interactivity and symbolism. Proceedings of the Fifth International Conference on Intelligent Tutoring Systems, Montreal, Canada, pp. 574-583. Springer.
Arroyo, I., Beck, J. E., Beal, C. R., Wing, R. E., & Woolf, B. P. (2001). Analyzing students' response to help provision in an elementary mathematics intelligent tutoring system. Help Provision and Help Seeking in Interactive Learning Environments, workshop at the Tenth International Conference on Artificial Intelligence in Education.
Arroyo, I., Walles, R., Beal, C. R., & Woolf, B. P. (2003). Tutoring for SAT-Math with Wayang Outpost. Advanced Technologies for Mathematics Education workshop, supplementary proceedings of the 11th International Conference on Artificial Intelligence in Education.
Beck, J., Woolf, B., & Beal, C. (2000). ADVISOR: A machine learning architecture for intelligent tutor construction. Proceedings of the Seventeenth National Conference on Artificial Intelligence.
Conati, C. (2002). Probabilistic assessment of user's emotions in educational games. Journal of Applied Artificial Intelligence, special issue on "Merging Cognition and Affect in HCI", 16(7-8), 555-575.
Mayo, M., & Mitrovic, A. (2001). Optimising ITS behaviour with Bayesian networks and decision theory. International Journal of Artificial Intelligence in Education, 12, 124-153.
Nelson-Le Gall, S., Kratzer, L., Jones, E., & DeCooke, P. (1990). Children's self-assessment of performance and task-related help seeking. Journal of Experimental Child Psychology, 49, 245-263.
Russell, S., & Norvig, P. (2003). Artificial Intelligence: A Modern Approach (2nd edition). Chapter 14: Probabilistic Reasoning Systems.
Ryan, A., & Pintrich, P. (1997). Should I ask for help? The role of motivation and attitudes in adolescents' help-seeking in math class. Journal of Educational Psychology, 89, 1-13.
de Vicente, A., & Pain, H. (2002). Informing the detection of the students' motivational state: an empirical study. Proceedings of the Sixth International Conference on Intelligent Tutoring Systems. Lecture Notes in Computer Science, Springer.
Wood, H., & Wood, D. (1999). Help seeking, learning and contingent tutoring. Computers & Education, 33(2-3), 153-169.
Zhou, X., & Conati, C. (2003). Inferring user goals from personality and behavior in a causal model of user affect. Proceedings of IUI 2003, International Conference on Intelligent User Interfaces, Miami, FL, USA, pp. 211-218.