Open Educational Resource Assessments (OPERA)

Tamara Sumner¹, Kirsten Butcher², and Philipp Wetzler¹

¹ Institute of Cognitive Science, University of Colorado
² Department of Educational Psychology, University of Utah

{Sumner,Phillip.Wetzler}@colorado.edu, [email protected]

1 Background and Significance

"Share, Remix, Reuse — Legally", the tagline for Creative Commons, cogently captures the ethos of peer production. Through the rapid growth of open educational resources (OER), peer production has begun to play a major role in how we teach and learn. OER are teaching and learning resources that reside in the public domain or have been released under licensing schemes that allow their free use or customization by others. They encompass a multiplicity of media types, including lesson plans, animations, videos, scientific data, etc. OER can be created by scientific institutions, by university faculty, by K-12 teachers, or by learners. Here, we focus on K-12 teachers engaging in peer production for instructional purposes.

Central to the OER vision is the assumption that peer-production processes lead to a cycle of continuous improvement. Namely, educators find useful OER on the Web (reuse), adapt and/or combine them to better meet their needs (remix), and then share their new resources with others. However, these skills require two types of knowledge that individuals often lack: content knowledge and metacognitive skills [1]. To reuse OER, teachers need to make difficult, complex, and time-consuming judgments to assess how well OER suit their instructional purposes. Judgments about the quality and appropriateness of OER are influenced by the information present in the resource, structural and presentational aspects of the resource, and the user's content knowledge about the topic [2]. To adapt or remix OER, teachers need to think strategically about how to leverage the strengths and compensate for the weaknesses of particular resources (e.g., by clarifying learning objectives or adding reflective questions). To realize the transformative promise of OER, there is a critical need for reliable and scalable software tools that can help educators characterize the instructional quality of OER and use them more effectively in their own peer-produced resources.

2 Approach and Motivating Use Case

Our research is investigating how open educational resource assessments (OPERA) can increase the effectiveness of educators' peer production practices. Open educational resource assessments use sophisticated algorithms combining machine learning and natural language processing to automatically analyze OER along dimensions important to teaching and learning, such as "organized around learning goals" or "effective use of representations." The goal is not to produce a single thumbs-up-or-down decision on the overall quality of a resource, but to produce a rich profile characterizing the different strengths and weaknesses of a resource to aid human judgments. Rather than simply hypothesizing useful indicators, we have carefully studied the cognitive processes of skilled educators and designed software models to approximate their processes. We have developed: (a) a set of software models capable of assessing OER and approximating expert human judgments for a variety of quality indicators, and (b) a methodology for identifying and operationalizing potentially useful indicators through empirical studies of human decision-making processes.

To illustrate the potential impact of OPERA, consider a teacher creating a webquest to help students understand how changes in the water level of the Colorado River impact predators and prey. The teacher wants to include background readings on environmental adaptation and resources that enable students to use scientific models. Imagine that the search engine and the editing tools are OPERA-enhanced: the OPERA-enhanced search results note that one resource is from "a highly reputable sponsor" and that another makes "effective use of representations." The "reputable sponsor" resource provides an excellent background reading and a powerful instructional video; the "effective representations" resource will form the backbone of her webquest, since it contains a series of simulations that enable students to explore what-if scenarios for a fictional watershed. As she saves her webquest, the OPERA-enhanced editor notes that the webquest is not "organized around learning goals" and lacks clear "instructions" (Figure 1). She revises her webquest to include questions to guide student thinking and instructions on how to use the simulations and animations to explore questions about the relationship between water level and predators/prey.

Fig. 1. Mock-up illustrating possible quality profile visualization
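To make the notion of a quality profile concrete, the following Python sketch shows one way such a profile, like the one mocked up in Figure 1, could be represented and queried by an OPERA-enhanced editor or search tool. It is purely illustrative: the indicator names come from this paper, but the OperaProfile class, the score values, and the example URL are our assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a per-resource quality profile (cf. Figure 1).
# The indicator names come from the paper; the class, scores, and URL
# below are illustrative assumptions only.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OperaProfile:
    """One confidence score (0..1) per quality indicator for a resource."""
    resource_url: str
    scores: Dict[str, float] = field(default_factory=dict)

    def strengths(self, threshold: float = 0.5) -> List[str]:
        # Indicators the models judge to be present.
        return [name for name, s in self.scores.items() if s >= threshold]

    def weaknesses(self, threshold: float = 0.5) -> List[str]:
        # Indicators the models judge to be missing or weak.
        return [name for name, s in self.scores.items() if s < threshold]


# Example: the webquest from the use case, flagged as lacking
# clear goals and instructions (placeholder URL and scores).
profile = OperaProfile(
    resource_url="http://example.org/colorado-river-webquest",
    scores={
        "has instructions": 0.21,
        "has prestigious sponsor": 0.88,
        "indicates age range": 0.74,
        "identifies learning goal": 0.35,
        "organized for goals": 0.30,
    },
)
print("Strengths: ", profile.strengths())
print("Weaknesses:", profile.weaknesses())
```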

3 Results

As a first step towards realizing the OPERA vision, we have demonstrated the feasibility of automatic quality assessments for a single domain: high school Earth science. We conducted a series of experiments to identify an initial set of quality indicators, i.e., a set of criteria useful for assessing the instructional quality of OER for K-12 settings. Indicators were identified and characterized using a set of rich qualitative and quantitative data on expert evaluative processes, involving re-analyses of data sets from other projects as well as a lab study with science education experts. We then developed computational quality assessment models, one model for each indicator.


Initial models were created and evaluated following a standard supervised machine learning approach: models were trained using a carefully prepared corpus of annotated examples, then evaluated on previously unseen corpus examples. Details of the experimental and computational methodologies are published elsewhere [3]. Our test bed was 1,000 high-school-level educational resources drawn from the Digital Library for Earth System Education (www.DLESE.org).

Our machine learning models analyze a resource and determine whether or not a quality indicator is present. Every decision looks at a complete resource – containing multiple web pages, rich media, and PDF files – as a unit. Since machine learning models operate on numerical vectors, we build a vector representation of each resource by extracting a number of numerical and yes/no features. Some features are taken straight from the text (e.g., groups of words in the resource), some make use of non-textual elements (e.g., HTML structure), and others include the host domain (URL) or the sites linked to the resource. The system analyzes these vectors, generated from the training corpus, to learn a statistical model for each indicator.

We use a support vector machine approach to machine learning. Training parameters are chosen using cross-validation: we repeatedly build a model from one part of our training data and evaluate it on the rest, each time refining the parameters of the algorithm. We then compare the results to a simple baseline that always assumes the most common case (e.g., the "has instructions" indicator is present in 39% of resources, so if we always assume that a resource has no instructions, we would be correct in 61% of cases).

Table 1. Model Evaluation Performance Results

  Indicator                  Baseline   Models
  Has instructions             61%       78%
  Has prestigious sponsor      70%       81%
  Indicates age range          79%       87%
  Identifies learning goal     72%       81%
  Organized for goals          75%       83%

Table 1 shows the performance of the machine learning models relative to this baseline. Good improvements over the baseline were achieved on "has instructions" and "has prestigious sponsor," and moderate improvements on the "indicates age range," "organized for goals," and "identifies learning goal" indicators. These results are very encouraging: even using basic features, we classified many indicators well.
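For readers who want a concrete picture of the pipeline described above, the sketch below approximates it with scikit-learn: textual and simple non-textual features are combined into one vector per resource, a linear support vector machine is tuned by cross-validation, and its accuracy is compared against a majority-class baseline. The specific feature functions, toy corpus, and library are our assumptions for illustration; the published system [3] differs in its actual features, corpus, and implementation.

```python
# Minimal sketch, not the authors' implementation: text + non-textual
# features -> linear SVM, parameters tuned by cross-validation, compared
# against a majority-class ("most common case") baseline.
import numpy as np
from sklearn.svm import SVC
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer


def non_textual_features(texts):
    # Stand-ins for the paper's non-textual features (HTML structure, host
    # domain, links); here just simple counts for illustration.
    return np.array([[t.count("<a "), t.count("<img "), len(t)] for t in texts])


features = FeatureUnion([
    ("words", TfidfVectorizer(ngram_range=(1, 2))),            # groups of words
    ("structure", FunctionTransformer(non_textual_features)),  # HTML-based cues
])

model = Pipeline([("features", features), ("svm", SVC(kernel="linear"))])

# Toy corpus: each resource flattened to text and labeled for one indicator
# (e.g. "has instructions"); a real corpus would hold ~1,000 annotated resources.
resources = ["<p>Step 1: open the simulation...</p>", "<img src='map.png'>"] * 10
labels = [1, 0] * 10

# Tune SVM parameters by cross-validation.
search = GridSearchCV(model, {"svm__C": [0.1, 1, 10]}, cv=5)
search.fit(resources, labels)

# Majority-class baseline: always predict the most common label.
baseline = cross_val_score(DummyClassifier(strategy="most_frequent"),
                           resources, labels, cv=5).mean()
print(f"baseline accuracy: {baseline:.2f}")
print(f"SVM accuracy:      {search.best_score_:.2f}")
```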

References

1. Lin, L., Zabrucky, K.: Calibration of comprehension: Research and implications for education and instruction. Contemporary Educational Psychology 23, 345–391 (1998)
2. Sumner, T., Khoo, M., Recker, M., Marlino, M.: Understanding Educator Perceptions of "Quality" in Digital Libraries. In: 3rd ACM/IEEE Joint Conference on Digital Libraries, pp. 269–279. ACM Press, New York (2003)
3. Bethard, S., Wetzler, P., Butcher, K., Martin, J., Sumner, T.: Automatically Characterizing Resource Quality for Educational Digital Libraries. In: 9th ACM/IEEE Joint Conference on Digital Libraries, pp. 221–230. ACM Press, New York (2009)