Challenges of Using Observational Data to Determine the Importance of Example Usage

Yun Huang¹, José P. González-Brenes², and Peter Brusilovsky¹

¹ Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
{yuh43,peterb}@pitt.edu
² Pearson Research and Innovation Network, Philadelphia, PA, USA
[email protected]

Abstract. Educational interventions are often evaluated with randomized controlled trials, which can be very expensive to conduct. One of the promises of “Big Data” in education is to use non-experimental data to discover insights. We focus on studying the impact of example usage in a Java programming tutoring system using observational data. For this, we compare different formulations of a recently proposed generalized Knowledge Tracing framework called FAST. We discover that different formulations can have the same predictive performance, yet their coefficients may have opposite signs, which may lead researchers to contradictory conclusions. We discuss the implications of using fully data-driven approaches to study non-experimental data.

Keywords: Student modeling · Example usage · Knowledge Tracing

1 Introduction

Tutoring systems offer different kinds of educational content to help students. For example, programming tutoring systems store a variety of program examples and present them to students during the learning process [1]. Prior work has studied the impact of examples on learners using randomized controlled trials [8,10]. In this paper we study issues in evaluating example usage with observational data. The motivation for using non-experimental data is to enable cheaper experimentation. Our work differs from previous work [2,9,11] in that we focus on example usage rather than on general hints. More importantly, our work suggests a reconciliation of seemingly conflicting results in the literature: while some researchers [2,9] find that hints positively affect learning, others [11] find the opposite.

2 Approach

We focus on the effect of examples on student performance. Student performance is often modeled with the Knowledge Tracing algorithm [3]. Knowledge Tracing has four parameters: the probability of the student knowing the skill before practice (init), the learning rate (learn), and the probabilities of guessing and slipping (guess and slip). The guess and slip probabilities are often called emission probabilities. We consider different combinations of parameterizations of how example usage affects student performance. Prior work has parameterized only the learning parameter [9], or all four parameters [2]. We also consider fitting parameters per skill [2] or per student [9]. Fitting student-specific parameters allows us to control for student differences that can be possible confounders [2]. Prior work [5] has focused on the predictive performance of different parameterizations; we believe we are the first to examine the fitted parameters. For our analysis we use a student modeling toolkit called FAST [4], which incorporates features into Knowledge Tracing easily and efficiently. We define a binary feature Et that is activated at time t when a student requests an example immediately before the tth practice attempt. We use FAST's coefficient on the feature Et to measure the impact of example usage. For simplicity we report the inverse of the slip probability (1 − slip), so that positive coefficients indicate a higher probability of a correct response. Table 1 describes the different parameterizations we consider.

Table 1. Different FAST parameterizations and how each handles the example feature Et.

Model               How to handle example features Et?
Knowledge Tracing   can't handle features
InitScaff           init = logistic(i1 + i2 · Et)
LearnScaff          learn = logistic(l1 + l2 · Et)
EmitScaff           guess = logistic(g1 + g2 · Et); slip = logistic(s1 + s2 · Et)
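To make Table 1 concrete, the sketch below (in Python, with made-up coefficient values; this is not FAST's own code) shows how the binary example feature Et can enter each Knowledge Tracing parameter through a logistic link, and how the resulting probabilities drive the standard forward update of the knowledge estimate.

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def fast_params(E_t, coef):
    """Map the binary example feature E_t to KT parameters via a logistic link.
    Coefficient values are illustrative placeholders, not fitted values."""
    return {
        "init":  logistic(coef["i1"] + coef["i2"] * E_t),   # InitScaff
        "learn": logistic(coef["l1"] + coef["l2"] * E_t),   # LearnScaff
        "guess": logistic(coef["g1"] + coef["g2"] * E_t),   # EmitScaff
        "slip":  logistic(coef["s1"] + coef["s2"] * E_t),   # EmitScaff
    }

def forward_update(p_know, correct, p):
    """Standard Knowledge Tracing forward step: condition the knowledge estimate
    on the observed response, then apply the learning transition."""
    if correct:
        known = p_know * (1 - p["slip"])
        unknown = (1 - p_know) * p["guess"]
    else:
        known = p_know * p["slip"]
        unknown = (1 - p_know) * (1 - p["guess"])
    posterior = known / (known + unknown)
    return posterior + (1 - posterior) * p["learn"]

# Illustrative coefficients (not values reported in the paper).
coef = {"i1": -1.0, "i2": 0.3, "l1": -1.5, "l2": 0.5,
        "g1": -1.2, "g2": -0.4, "s1": -2.0, "s2": 0.6}
p_know = fast_params(E_t=0, coef=coef)["init"]
for E_t, correct in [(0, 1), (1, 0), (1, 1)]:   # E_t = 1: example viewed before this attempt
    p = fast_params(E_t, coef)
    p_correct = p_know * (1 - p["slip"]) + (1 - p_know) * p["guess"]  # predicted P(correct)
    p_know = forward_update(p_know, correct, p)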

3 Experimental Setup

Our data was collected from an online Java programming tutoring system that offers code examples and problems [6]. In each question, students are asked to predict the output of a provided Java program. In each code example, students can interactively explore line-by-line explanations. There are 43,696 question attempts and 62,494 example line clicks from 328 students on 124 questions, 110 examples, and 14 topics in total. For skill-specific models, we randomly selected 80% of the students for each skill for training and tested on the rest. For student-specific models, we train one model per student using the first half of the observations and predict the latter half. For all models, we perform 20 random restarts and pick the run with the maximum log likelihood on the training set to avoid local optima. We evaluate models using the Area Under the Curve (AUC) and report the mean (across skills or students) and the overall AUC. We compute 95% confidence intervals with the bootstrap across skills or students for the mean AUC and the feature coefficients. Knowledge Tracing has mean AUC 0.62(±0.03) and overall AUC 0.68 for skill-specific models, and mean AUC 0.54(±0.02) and overall AUC 0.65 for student-specific models. All FAST models with example features have statistically the same mean and overall AUC as the corresponding Knowledge Tracing baseline by paired t-tests (α=0.05). Do these different model formulations show the same impact of example usage? We proceed with our investigation.
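A rough sketch of this evaluation protocol, assuming per-skill held-out predictions have already been collected and using scikit-learn's roc_auc_score for the AUC:

import numpy as np
from sklearn.metrics import roc_auc_score

def mean_auc_with_ci(per_skill_preds, n_boot=1000, seed=0):
    """per_skill_preds: dict mapping skill -> (y_true, y_score) arrays on held-out data.
    Returns the mean AUC across skills and a bootstrap 95% confidence interval."""
    rng = np.random.default_rng(seed)
    aucs = np.array([roc_auc_score(y, s) for y, s in per_skill_preds.values()])
    boots = [rng.choice(aucs, size=len(aucs), replace=True).mean() for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return aucs.mean(), (lo, hi)

def overall_auc(per_skill_preds):
    """Pool all held-out predictions before computing a single AUC."""
    y = np.concatenate([y for y, _ in per_skill_preds.values()])
    s = np.concatenate([s for _, s in per_skill_preds.values()])
    return roc_auc_score(y, s)

The same functions apply to student-specific models by keying the dictionary by student instead of by skill.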

4 Empirical Results

Figures 1 and 2 report the relative effect of example usage for skill-specific and student-specific models, respectively. We report the difference between using an example or not for the different parameterizations of Table 1. For example, for guessing:

Δguess = logistic(g1 + g2 · 1) − logistic(g1 + g2 · 0)

On average, the EmitScaff model suggests that example activity has a negative association with performance, while the LearnScaff model suggests a positive association with learning. For example, for the skill Objects, the EmitScaff model suggests that viewing an example decreases the probability of succeeding, while the LearnScaff model suggests that it increases the learn probability.
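The Δ values follow directly from the fitted coefficients. A minimal sketch, with placeholder coefficients standing in for FAST's fitted values per skill or per student:

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def delta(intercept, feature_coef):
    """Change in the parameter when an example was viewed (E_t = 1) versus not (E_t = 0)."""
    return logistic(intercept + feature_coef) - logistic(intercept)

# Placeholder coefficients; in practice these are read from the fitted models.
g1, g2 = -1.2, -0.4
l1, l2 = -1.5, 0.5
s1, s2 = -2.0, 0.6
delta_guess = delta(g1, g2)            # Δguess = logistic(g1 + g2) − logistic(g1)
delta_learn = delta(l1, l2)
delta_correct = -delta(s1, s2)         # reported as 1 − slip, so positive means higher P(correct)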

Fig. 1. Skill-specific comparison

On average, both skill-specific and student-specific models suggest that example activity has a negative association with performance. However, they differ in that student-specific models suggest a negative association with the init probability and a stronger positive association with the learn probability. We need to further study the implications of these two formulations. Previous studies have reported both positive [2,9] and negative [11] effects of example usage. To reconcile the conflicting results in the literature and in our analysis, we hypothesize that two processes may co-exist: students may learn from examples, which can increase performance or knowledge, yet lower-ability students may be more likely to request an example. Future work may study this hypothesis.

Fig. 2. Student-specific comparison

5 Contributions and Conclusions

We report an example of fitting different model formulations on observational data. We discover that equally predictive models may lead to conflicting conclusions. The implication of our study is that the common practice of simply reporting coefficients for a single formulation may not be appropriate. In follow-up work [7] we extend this study and examine more closely the parameters learned from observational data.

References

1. Atkinson, R.K., Derry, S.J., Renkl, A., Wortham, D.: Learning from examples: Instructional principles from the worked examples research. Review of Educational Research 70(2), 181–214 (2000)
2. Beck, J.E., Chang, K., Mostow, J., Corbett, A.T.: Does help help? Introducing the Bayesian evaluation and assessment methodology. In: Woolf, B.P., Aïmeur, E., Nkambou, R., Lajoie, S. (eds.) ITS 2008. LNCS, vol. 5091, pp. 383–394. Springer, Heidelberg (2008)
3. Corbett, A., Anderson, J.: Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction (1995)
4. González-Brenes, J., Huang, Y., Brusilovsky, P.: General features in knowledge tracing: Applications to multiple subskills, temporal item response theory, and expert knowledge. In: Educational Data Mining (2014)
5. Gu, J., Wang, Y., Heffernan, N.T.: Personalizing knowledge tracing: Should we individualize slip, guess, prior or learn rate? In: Trausan-Matu, S., Boyer, K.E., Crosby, M., Panourgia, K. (eds.) ITS 2014. LNCS, vol. 8474, pp. 647–648. Springer, Heidelberg (2014)
6. Hsiao, I.-H., Sosnovsky, S., Brusilovsky, P.: Guiding students to the right questions: Adaptive navigation support in an e-learning system for Java programming. Journal of Computer Assisted Learning 26(4), 270–283 (2010)
7. Huang, Y., González-Brenes, J., Kumar, R., Brusilovsky, P.: A framework for multifaceted evaluation of student models. In: Educational Data Mining (2015)
8. Najar, A.S., Mitrovic, A., McLaren, B.M.: Adaptive support versus alternating worked examples and tutored problems: Which leads to better learning? In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P., Houben, G.-J. (eds.) UMAP 2014. LNCS, vol. 8538, pp. 171–182. Springer, Heidelberg (2014)
9. Sao Pedro, M., Baker, R., Gobert, J.: Incorporating scaffolding and tutor context into Bayesian knowledge tracing to predict inquiry skill acquisition. In: Educational Data Mining, Memphis, TN, pp. 185–192 (2013)
10. Schwonke, R., Wittwer, J., Aleven, V., Salden, R., Krieg, C., Renkl, A.: Can tutored problem solving benefit from faded worked-out examples? Proceedings of EuroCogSci 7, 59–64 (2007)
11. Velasquez, N.F., Goldin, I., Martin, T., Maughan, J.: Learning aid use patterns and their impact on exam performance in online developmental mathematics (2014)
