Learning Strategies and Transfer in the Domain of Programming
Peter Pirolli and Margaret Recker University of California, Berkeley
Running head: Learning strategies and transfer
Abstract

We report two studies involving an intelligent tutoring system for Lisp (the CMU Lisp Tutor). In Experiment 1, we develop a model, based on production system theories of transfer and analogical problem solving, that accounts for effects of instructional examples, the transfer of cognitive skills across programming problems, and practice effects. In Experiment 2, we analyzed protocols collected from subjects as they processed instructional texts and examples before working with the Lisp Tutor, and protocols collected after subjects solved each programming problem. The results suggest that the acquisition of cognitive skills is facilitated by high degrees of metacognition, which include closer monitoring of states of knowledge, more self-generated explanation goals and strategies, and greater attention to the instructional structure. Improvement in skill acquisition is also strongly related to the generation of explanations connecting the example material to the abstract terms introduced in the text, the generation of explanations that focus on novel concepts, and more time spent planning solutions to novel task components. We also found that self-explanation has diminishing returns. Finally, reflection on problem solutions that focuses on understanding the abstractions underlying programs, or on understanding how programs work, seems to be related to improved learning.
Introduction

This paper reports our investigations of the evolution of knowledge in a complex learning environment in which learners work through instruction on computer programming using an intelligent tutoring system for Lisp. Although the technology involved is by no means common, the general structure of the learning environment is one that is commonplace in modern education: learners must process expository materials, including relevant examples, and then acquire problem-solving skill by working through sets of exercise problems. Relying on computational models of learning as a backdrop for our analyses, we are interested in examining some of the ways in which instruction and examples affect the development of expertise, how expertise transfers across programming tasks, and how differences in learners' interpretations of instruction and learners' reflections on their solutions affect the development of expertise.

Our work can be seen as flowing directly from three recent strands of research. The first (Anderson, Conrad, & Corbett, 1989) is a broad analysis of the acquisition of cognitive skill in studies of the CMU Lisp Tutor, an intelligent tutoring system for Lisp programming developed at Carnegie-Mellon University (Reiser, Anderson, & Farrell, 1985). Anderson et al. (1989) found a remarkable systematicity in the trajectory of development of complex programming skills, and were able to attribute a large proportion of this systematicity to a few simple learning mechanisms. Anderson et al. themselves, however, noted that they were not addressing admittedly important variations in learning due to learners' interpretations of the instructional texts associated with the Lisp Tutor. A second strand of background research is that of Pirolli (1991) on the transfer of knowledge from instructional examples of programs to novel skills for programming recursive functions, as well as the transfer of cognitive skills across programming problems. As noted by Pirolli (1991), his model of these results did not address how the learners varied in their encoding of the examples. The third strand of research is the study by Chi, Bassok, Lewis, Reimann, and Glaser (1989), which focused on analyses of differences in learners' explanations of instructional examples, and how these differences were related to subsequent performance on a set of physics problems. Our aim is to extend, refine, and integrate these complementary analyses.

Our empirical studies focus on a Lisp Tutor lesson on programming recursion, which typically takes place over the course of several hours and involves several programming assignments that students find difficult. Part of the eventual aim of this line of research is to develop formal models that provide integrated accounts of subject-matter understanding, problem solving, and cognitive skill acquisition. In the studies presented in this paper, we adopt the following strategy. We develop a model of knowledge transfer relations among examples and problem exercises and fit this model to data obtained with the Lisp Tutor.
One point of this venture is to establish that previously obtained effects extend to the current experimental paradigm. Another aim of this model fitting is to establish calibration and corroboration for our analysis of the task environment and relevant knowledge. Having established this analysis, we turn to relatively more exploratory data analyses of individual differences in learning strategies, and the relations among these strategies, subject-matter understanding, and problem-solving performance. Basically, we view these studies as an attempt to begin a systematic exploration of an intricate parameter space governing the complex learning environment surrounding the Lisp Tutor. This exploration is founded on the notion (suggested in Newell & Simon, 1972) that one can establish analyses of the first-order phenomena associated with particular knowledge-performance states, then of the second-order phenomena of learning transitions among these states, and then of the third-order phenomena governing learning to learn and learner-environment interactions. We assume that this approach is more sophisticated and scientifically productive than sets of independent experimental hypothesis tests or unsystematic protocol studies.
Production System Analysis of Program Writing

Our analyses of learning to program assume a central distinction between procedural and declarative knowledge, articulated in the ACT family of theories (Anderson, 1976, 1983, 1989), that is explicitly based on Ryle's (1949) distinction between knowing how and knowing that. The analyses assume that production rules can be used to represent procedural knowledge, or what we often call cognitive skill or know-how. Production rules represent potential transitions in cognitive state by identifying how specific conditions in the task environment and working memory evoke specific cognitive and physical actions (a schematic example is sketched below). Knowledge represented by production rules (or productions) is not inspectable by reflection. Declarative knowledge, on the other hand, is assumed to be the sort of knowledge that an agent can reflect upon, interpret, or elaborate.

In our early studies of programming recursive functions (Anderson, Pirolli, & Farrell, 1988; Pirolli, 1986; Pirolli & Anderson, 1985) we collected protocols from individual learners and experts working on a variety of problems from several available textbooks. Computer simulations, implemented as production systems, were developed to address data from individual episodes of problem solving. One purpose of developing these simulations was to refine our analysis of the structure of the task of writing recursive programs, and to refine an analysis of the knowledge and skills required to perform that task. The specific production system model used to guide the present set of data analyses was presented in detail in Pirolli (1986). This is also the production system underlying the ideal model used by the Lisp Tutor (Reiser, Anderson, & Farrell, 1985) for its lessons on recursion.
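To make this representational commitment concrete, the following is a schematic production in the spirit of the GRAPES-based models used here. The rule and its notation are our own illustrative reconstruction, not a rule taken from the Lisp Tutor's ideal model:

    ;; A schematic production, represented as data: if the goal is to code
    ;; a function known to be recursive, set subgoals to code its
    ;; terminating case(s) and its recursive case(s). The notation is ours.
    (defparameter *code-recursive-function*
      '(:if   ((goal    (code-function ?fun))
               (context (recursive ?fun)))
        :then ((set-subgoal (code-terminating-cases ?fun))
               (set-subgoal (code-recursive-cases ?fun)))))

Read as a condition-action pair, the rule fires when its goal and working-memory conditions are satisfied, and its action decomposes the goal into further subgoals.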
The Pirolli (1986) model addresses both the specification of programs from given problem statements and the abstraction of programming plans from self-generated examples of input-output relations. The specification of programs is characterized as the decomposition of a problem into subgoals, and the iterative application of programming plans to set new subgoals, until some elementary level of programming actions is reached. Such elementary actions largely involve the writing of code. Abstraction, in these models, is carried out by generating specific examples of inputs and outputs, and inducing the general relationship that characterizes the examples. This relationship takes the form of a new programming plan that can then be specified through further decomposition. Of particular importance in coding recursive functions is the abstraction associated with the induction of the recursive relation, which is the relationship between the final output produced by a function and the result produced by a recursive call to the function. For instance, in writing a recursive function to generate the factorial of n, or n!, one has to know that n! = n • (n - 1)! (a concrete function embodying this relation is sketched at the end of this section).

The notion that the writing of programs can be characterized by sets of programming goals and programming plans agrees with many other studies of programming (Rich, 1981; Spohrer, Soloway, & Pope, 1985; Waters, 1985). In our own models, programming plans are represented by production rules that decompose a programming goal into other subgoals, and elementary programming actions are represented by productions that match to internal goals and generate calls to perform the elementary actions. Abstraction is carried out by productions that search for, and recognize, patterns in sets of input-output examples and that generate abstract programming plans. We view such production system models as key to the study of factors that influence the acquisition, development, and transfer of cognitive skills. Indeed, one of the successes of our early simulation work (Anderson et al., 1988) was the development of programs that could track the evolution of cognitive skill for individual learners over the course of several problem-solving episodes.
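For instance, the recursive relation for factorial is directly visible in a minimal Lisp definition. The function below is our own illustration written from the mathematical relation given above (a factorial function of this kind appears later as Problem N2); it is not code drawn from the tutor's lesson:

    ;; Factorial: the recursive case encodes the recursive relation
    ;; n! = n * (n - 1)!, and the terminating case handles 0! = 1.
    (defun fact (n)
      (cond ((zerop n) 1)                ; terminating case
            (t (* n (fact (- n 1))))))   ; recursive relation

    ;; (fact 4) => 24

Inducing the second line of the cond — the recursive relation — is precisely the abstraction step the model singles out as difficult.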
The Learning Paradigm

Figure 1 sketches our general framework for the analysis of learning in our studies. For each programming lesson in these studies, subjects read through textbook material and then solved problems under the guidance of the Lisp Tutor. In this sort of situation, we assume that the learner uses his or her prior knowledge to actively interpret and explain the available instructional texts and examples. These interpretations and elaborations yield declarative knowledge that may become stored in the learner's memory. Following Chi et al. (1989), we call such interpretation and elaboration self-explanation. We think of this processing of instructional texts and examples as an expansion of recent text processing models (Kintsch, 1986; van Dijk & Kintsch, 1983) in which the objective of the learner is to construct a coherent mental model (Johnson-Laird, 1983) that interprets the presented material.
The declarative knowledge that gets constructed is the result of an interaction of the presented material and the constructive self-explanation strategies employed by a particular person.

Continuing with Figure 1, after reading through instructional materials, learners are presented with sets of programming problems as exercises with the Lisp Tutor. Solving these exercise problems involves some mix of familiar and novel subtasks. Familiar subtask situations evoke previously acquired production rules (cognitive skills). The effectiveness and efficiency of application of these productions is improved by such practice through strengthening mechanisms (Anderson, 1983; Singley & Anderson, 1989). Other subtask situations in a programming problem exercise will be novel. These situations cause problem-solving impasses (Newell, 1990) that can be resolved through the use of general problem-solving methods, such as analogy, and by making use of relevant declarative knowledge obtained from processing the instructional texts and examples. Declarative knowledge will be used interpretively to guide the search for novel solutions at problem-solving impasses. The effectiveness and efficiency of problem solving in these novel situations is, to a large degree, contingent on the particular declarative interpretations constructed by individuals for the given instructional texts and examples.

_______________________
Insert Figure 1 about here
_______________________

Problem solving at these impasses results in the acquisition of new task-specific cognitive skills represented as production rules. We assume that such skill acquisition occurs by some variant of the automatic knowledge compilation mechanisms proposed in ACT* (Anderson, 1987). The structure and content of new productions is directly determined by the structure and content of problem solving at impasses. New declarative knowledge about the domain may also arise as learners reflect on their problem solving or on the structure and rationale of their solutions.

Of the learning processes involved in Figure 1, only knowledge compilation mechanisms are assumed to be part of the cognitive architecture. Other learning processes, such as self-explanation and reflection, are learning strategies brought to bear by learners. These learning strategies are assumed to be cognitive skills acquired from previous experience, and they vary across learners.

New production rules (cognitive skills) originate in the operation of procedural knowledge over declarative knowledge sources. Instructional resources, such as texts and examples, can vary in the degree to which they afford interpretations that will be specifically relevant to overcoming future problem-solving impasses.
Interpretations of such instructional resources may also vary across learners, with some learners coming to more meaningful and generalizable interpretations than others. Variations in declarative knowledge sources can affect the ease with which problem-solving impasses are overcome, and the quality of the productions that result from knowledge compilation over that problem solving. Variations in the examples or texts presented to learners, and variations in self-explanation or reflection, will affect performance in novel situations and the skills derived from such performance. Thus, both pedagogical factors and subject factors can influence the acquisition and effectiveness of cognitive skills.

This framework of analysis is an elaboration of production system models of cognitive skill acquisition and transfer (Bovair, Kieras, & Polson, 1990; Kieras & Bovair, 1986; Pirolli, 1991; Singley & Anderson, 1989). Singley and Anderson (1989) noted four properties of the production rule representation of ACT* that are relevant to predictions of the transfer of cognitive skill from one task to another. First, productions are learned independently and transfer independently. Second, productions come into existence in an all-or-none manner through the process of knowledge compilation. Third, productions accrue strength with use, which predicts greater reliability and speed with practice. Fourth, production rules are partially variabilized, or abstract, which means that they can apply across situations. Collectively, these properties imply that one can identify which parts of a task will involve productions acquired earlier and their current strength, as well as the parts of a solution that will be novel and require the interpretive use of declarative representations. In the current studies, one can use the ideal model production rules of the Lisp Tutor to frame such transfer analyses.

The assumption that productions are acquired independently and transfer independently implies that the expected performance on a problem, in terms of errors or solution time, is just the sum of the expected errors or execution times of the individual productions involved in the solution. The expected error rates and times for individual productions improve as a function of learning and practice opportunities.1 The opportunities for individual productions can be identified by examining production system traces over problem exercise sequences (see Pirolli, 1991, for details).

This sort of analysis can be made more tractable through a few simplifications based on prior results. The more tractable analysis is probably more extensible to situations in which one does not have an intelligent tutoring system to provide detailed on-line behavioral recordings.
1In the current studies, as in similar studies of the Lisp Tutor (Anderson et al., 1989), we report error rates as measures of performance because they are more reliable than time measures. Our analyses of latency measures are consistent with the error analyses.
From previous studies of similar programming problems and from inspection of the ideal model (Pirolli, 1986), one can expect little variability in the problem space structure for the programming solutions we are studying. For our analysis, we use traces of ideal model productions along the minimal solution paths for our programming problems. These paths are almost always a proper subset of other variant solutions. On the basis of two other observations, we aggregate production firings in these traces into three major subsets. The first observation (Anderson et al., 1989) is that large performance improvements, of about 50%, occur at the initial learning opportunity for a production. The second observation (Pirolli, 1991) is that the learning functions for productions are improved when an instructional example illustrates problem-solving steps analogous to the productions. Improvements of over 60% at the initial production learning opportunity have been obtained, and the examples do not appear to affect the production learning rates (Pirolli, 1991). Based on these two observations, we partition production firings according to whether there have been prior learning opportunities for the productions, and according to whether production learning is expected to be facilitated by an available example.
Figure 2 illustrates how this is done. For a particular programming problem, one can identify the ideal model productions that may be used to write the program that solves the problem. A subset of these productions will intersect with productions that were used to write programs in earlier exercises. These are labelled old productions in Figure 2. Another subset of productions corresponds to novel problem situations in which new cognitive skills will be acquired. We call these new productions. The acquisition of some of these new cognitive skills will be facilitated because an earlier example program illustrates the relevant problem-solving step. As was done in Pirolli (1991), we identify these new example productions simply by determining which new productions would be used by the ideal model simulation to code the example.2 The remaining new productions are the new nonexample productions in Figure 2. The expected performance on a programming problem can then be treated as the sum of the performance over each subset of production firings. Within each subset partition, the expected performance is just the number of production firings in that subset times the mean expected performance for productions of that type. For instance, if the mean expected error rate on old productions is pO, on new example productions is pE, and on new nonexample productions is pN, and there are k old production firings, m new example production firings, and n new nonexample production firings in a problem solution, then the expected number of errors on the problem is just

k·pO + m·pE + n·pN.

2This does not mean that the theory assumes that subjects somehow encode productions directly from the example. Rather, for the purposes of analysis, we assume that if an ideal model production would be used to produce the example solution, then there is a higher probability that more effective declarative knowledge relevant to acquiring that production will be encoded. See Pirolli (1991) for more detail on the rationale for this assumption.
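To make the bookkeeping concrete, the expected-error computation can be written directly from the partition. The function below is a minimal sketch of that sum; the rates in the example call are hypothetical illustrations, not fitted values from the experiments:

    ;; Expected errors for one problem, given the number of firings in each
    ;; partition (k old, m new example, n new nonexample) and the mean error
    ;; rate for each production type.
    (defun expected-errors (k m n p-old p-ex p-nx)
      (+ (* k p-old) (* m p-ex) (* n p-nx)))

    ;; With hypothetical counts and rates:
    ;; (expected-errors 20 5 3 0.05 0.3 0.7) => approximately 4.6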
Prior Research on Individual Differences Using the Lisp Tutor

A recent paper by Anderson et al. (1989) reported an extensive analysis of data collected by the Lisp Tutor for its early lessons. In general, the results suggest that the ACT* representation of cognitive skills as productions captures many regularities in the learning data collected by the Lisp Tutor. Skill acquisition appears to proceed simply once declarative information is encoded and applied in a problem-solving context.

Anderson et al. (1989) performed factor analyses in search of individual differences in learning skills corresponding to Lisp Tutor productions. These factor analyses failed to reveal any indication that certain subsets of learners found certain subsets of productions easier to learn. In other words, although there might be interesting variability in the raw abilities of learners, and there is variability in the difficulty associated with learning specific productions, Anderson et al. could find no learner-by-production interactions. Factor analyses did reveal, however, that learners could be associated with two general learning attributes. One learning factor was associated with the ease of acquisition of new productions, and the other factor was associated with the retention of previously learned productions.

The Anderson et al. (1989) study did not include any measures concerning how learners acquire and develop the declarative information that is necessary for initial performance of novel programming tasks. Their analyses suggesting individual differences in production acquisition and retention could well be capturing important learner variability in abilities for constructing and applying useful declarative knowledge. Anderson et al. could not further analyze such factors because the Lisp Tutor provides no means for recording what subjects are doing as they process their instructional texts and examples. In the current studies, we record learners' think-aloud protocols as they work through their instructional texts and as they work with the Lisp Tutor.
Individual Differences in Self-explanation
A recent paper by Pirolli (1991) provides results that complement the Anderson et al. (1989) findings. Pirolli (1991) examined the acquisition of programming skill for recursive functions. This is a programming topic beyond those studied by Anderson et al. (1989), and known to be much more difficult than the topics that preceded it (Pirolli, 1986). Pirolli (1991) reported a study in which examples and their associated explanations were varied in an instructional text. Different pedagogical explanations of the same example program were found to produce substantial differences in learning efficiency. Specifically, learners showed a more rapid progression through a set of training problems involving the writing of recursive functions if they were presented with instruction containing an explanation of the example that focused on the underlying abstract structure of the example (how the functions are written). Less rapid learning was exhibited when the explanation focused on a trace of the execution of the example program (how the functions worked). This suggests that different interpretations of examples, communicated by the external texts, produced different declarative representations of the example solutions that had different degrees of utility in skill acquisition. However, the Pirolli (1991) work dealt only with analyses of pedagogical (treatment) effects, not subject effects or subject-treatment interactions.

Work by Chi et al. (1989) addressed important subject-related factors in learning from examples in their studies of the role of self-explanation in learning. One goal of the present paper is to replicate and extend the work of Chi et al. (1989) in the domain of programming. Their study examined how students explained worked-out examples in physics prior to solving problems. The analysis concentrated on the verbal protocols of students' self-explanations and examined how these correlated with subsequent problem-solving performance. The students were divided into groups of "Good" and "Poor" subjects, based on a post-hoc median split of their performance in solving a set of physics problems. Chi et al. (1989) found that, overall, the good learners made significantly more elaborations than the poor students and consequently spent more time studying the examples. The self-explanations of good students also exhibited significant qualitative differences. Good students generated more lines of protocol relating to particular ideas, they focused more on content relating to the current physics topic, and they attempted to relate steps in the worked examples to physics principles introduced in the textbook. Additionally, the good students seemed better able to detect their own comprehension failures. Chi et al. (1989) concluded that good students seem to be able to explain and justify steps in the worked examples. This enables them to infer conditions and consequences behind problem-solving goals in the examples. Good students also exhibit better monitoring of their understanding process, enabling them to detect and possibly resolve comprehension failures. These factors all lead to a richer, deeper understanding of the topic, contributing to better use of declarative knowledge in resolving problem-solving impasses.
Overview of the Experiments

We report two experiments in this paper. In Experiment 1, we manipulated the instructional examples available to subjects, and the sequences of problems encountered by subjects as they proceeded through a lesson on recursion in the Lisp Tutor. By using a production system analysis of examples and problems, specific predictions could be made about the effects of the examples on initial problem-solving performance. In a similar vein, transfer of skills and practice effects between problems could be isolated. Experiment 1 provides a baseline analysis of these pedagogical factors (effects of examples, transfer of skills, and practice effects) in the current paradigm, as well as a calibration of the analysis method outlined above. In Experiment 2, we largely replicated Experiment 1 with the additional collection of verbal protocols from subjects throughout the lesson on recursion. The purpose of this experiment was to investigate the effects of learner variability on skill acquisition. In particular, by analyzing subjects' self-explanation and reflection verbal protocols, we could isolate those strategies that correlated with successful programming performance.
Experiment 1

In Experiment 1, we focused on an analysis of transfer effects due to the structure of the instructional environment for students using the Lisp Tutor. Specifically, we examined the impact of examples on skill acquisition, the transfer of skills across tasks, and practice effects. Subjects were divided into two groups such that different examples of recursive functions were provided to each group, and all subjects used the Lisp Tutor to solve sets of programming problems. The production system models employed by the Lisp Tutor were used to identify components of programming solutions that were novel, and components that would evoke previously acquired skills. The production system model was also used to identify the degree of similarity between the available example programs and any given target problem. Following Kieras and Bovair (1986), we used two different fixed training sequences of problems to provide variation in transfer predictions.
Method

Subjects. Twenty adults participated in Experiment 1. Twelve of the subjects were Carnegie-Mellon University undergraduates who were enrolled for credit in a self-paced introductory Lisp programming course that used the Lisp Tutor. The remaining eight subjects were solicited through an advertisement in a University of California, Berkeley, student newspaper, and were paid for participating in the experiment. None of the subjects had prior Lisp programming experience. Assignment to groups was random.

Instructional materials. Subjects proceeded through a set of lessons on Lisp programming. For each lesson, subjects had to read a lesson booklet and had to solve a set of exercise problems using the Lisp Tutor. The lesson of interest was the lesson on elementary recursive functions that operate on numeric and list inputs. Before the recursion lesson, subjects worked through six lessons covering elementary Lisp functions, user-defined functions, predicates and conditionals, the use of user-defined sub-functions, prog structures and input-output, and iteration on numeric inputs. The booklets for these lessons were early drafts of chapters in the textbook by Anderson, Corbett, and Reiser (1987). The lesson booklet on recursion was also an early draft of the chapter on recursion in Anderson et al. (1987).

The design of the Lisp Tutor is presented in Reiser et al. (1985), and the background research and design of the recursion modules are presented in detail in Pirolli (1986). Anderson et al. (1989) provide a description of analyses of Lisp Tutor data that is similar in spirit to our own. The Lisp Tutor instructs using a model tracing methodology, which involves comparing a student's programming behavior to the behavior of the Lisp Tutor's internal ideal and buggy models. An ideal model is a GRAPES production system model of the programming skill to be acquired by subjects. A buggy model is a representation of common misconceptions and mistakes made by subjects. To anthropomorphize a little, the Lisp Tutor essentially watches what the student is doing as he or she programs and compares this behavior to what it knows about what good students (ideal models) or poor students (buggy models) might do. If the student's behavior corresponds to that of an ideal model, then the Lisp Tutor usually does not interfere (except in cases where some major program planning may be forthcoming, or where the Lisp Tutor needs to determine which of several goals the student is attempting). If, on the other hand, the student's behavior matches that of a buggy model, then the Lisp Tutor interrupts with feedback, hints, and help intended to get the student back on the correct solution path and to overcome the student's misconception.
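It may help to schematize this interaction. The sketch below is our own minimal caricature of one tutor cycle (the cycle is described more fully in the next paragraphs); it reduces the ideal model to the correct entry for the current goal and the buggy model to an association list pairing known incorrect entries with feedback messages. None of this is code from the Lisp Tutor itself:

    ;; One model-tracing cycle, drastically simplified (our own sketch).
    (defun model-trace-step (correct-entry buggy-alist entry error-count)
      (let ((bug (assoc entry buggy-alist :test #'equal)))
        (cond ((equal entry correct-entry)
               :set-new-goal)                 ; correct: begin a new cycle
              ((>= error-count 2)             ; third error in a row:
               :show-next-step)               ;   show the next step
              (bug                            ; recognized misconception:
               (format t "~a~%" (cdr bug))    ;   targeted feedback
               :retry-same-goal)
              (t                              ; unrecognized error:
               (format t "That entry is incorrect here.~%")
               :retry-same-goal))))

For example, (model-trace-step 'cdr '((car . "car extracts the first element, not the rest of the list.")) 'car 0) prints the buggy-model feedback and signals that the same goal should be reset for the next cycle.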
When a student does something that does not match anything in the system's ideal or buggy models, the system attempts to provide at least enough information that the student can figure out the next correct step. If the student shows evidence of floundering (i.e., by making an error three times in a row), the Lisp Tutor shows the student the next programming step.

More specifically, learners are presented with programming problems and must write code to solve those problems. The ideal and buggy models basically structure the space of problem solving. On each cycle, the tutor sets an internal programming goal and places the cursor over a slot on the screen where the next program decomposition will take place. The student enters some code (usually a single Lisp atom) and the tutor, using its internal ideal and buggy models, categorizes the input as correct or incorrect. Frequently, more than one correct program decomposition is possible at a goal. If the code is correct, the tutor sets a new goal internally and a new cycle begins. If the code is incorrect, the tutor provides feedback and resets the same goal for the next cycle. Unless the student makes an error, she simply types in her code as the tutor performs its internal categorization of inputs and setting of internal goals.

For the lesson on recursion, subjects had to solve 10 programming problems. Most of the problems and their solutions can also be found in Anderson et al. (1987). Four of the problems were designated as number recursion problems, because they required the definition of recursive functions that operate on integer inputs. These are labelled Problems N1 through N4. Four other problems required the definition of recursive functions that operate on lists. These problems are designated as list recursion problems and labelled Problems L1 through L4. Finally, two problems required the definition of recursive functions that involve a mix of list and numeric operations. These are called mixed problems and are labelled Problems M1 and M2. Descriptions of the programs associated with these problems are as follows:

• Power (Problem N1). A function that takes two whole number inputs, m and n, and computes m^n.

• Fact (Problem N2). A function that takes a whole number input n and computes n!.

• Sumodd (Problem N3). A function that takes as input a whole number n, and returns the sum of all odd numbers from zero up to, and including, n.

• Listnums (Problem N4). A function that takes a whole number n as input and returns a list of all numbers from zero to n.

• Intersect (Problem L1). A function that takes two list inputs, s1 and s2, representing sets, and returns a list representing the intersection of the input sets, s1 ∩ s2.

• Setdiff (Problem L2). A function that takes two list inputs, s1 and s2, representing sets, and returns a list representing the set difference of the input sets, s1 - s2.

• Add1nums (Problem L3). A function that takes an input list and returns a list containing all the numbers in the input list, each incremented by one, omitting any nonnumeric elements.

• Negnums (Problem L4). A function that takes an input list containing numbers and returns a list that contains only the negative numbers.

• Greaternum (Problem M1). A function that takes two inputs: a list of numbers and a maximum number. The function returns the first number in the input list that is larger than the given maximum; otherwise it returns the given maximum.

• Length (Problem M2). A function that takes a list input and returns a whole number indicating the number of elements in the input list.
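As an illustration of what these exercises demand, here is one plausible minimal solution to Problem L4 (Negnums). The tutor's ideal solutions are not reproduced in this paper, so this version is our own reconstruction from the problem description:

    ;; Negnums (Problem L4): a plausible minimal solution, reconstructed
    ;; from the problem description rather than taken from the tutor.
    (defun negnums (lst)
      (cond ((null lst) nil)                      ; terminating case
            ((minusp (car lst))                   ; keep a negative number
             (cons (car lst) (negnums (cdr lst))))
            (t (negnums (cdr lst)))))             ; skip a nonnegative number

    ;; (negnums '(3 -1 0 -5 2)) => (-1 -5)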
Two example programs were used in the texts for separate groups of subjects. One example was a list recursion example, called Carlist, that takes an input list, which contains embedded list elements, and returns the first element of each list element in the input list. The second example was a number recursion example, called Sumall, which takes a whole number as input and sums all numbers from zero to the input number.
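The paper describes the behavior of these two instructional examples, but their code does not appear in this excerpt; the definitions below are plausible reconstructions consistent with those descriptions, not the programs printed in the booklets:

    ;; Sumall: sums all numbers from zero to its whole-number input.
    (defun sumall (n)
      (cond ((zerop n) 0)                  ; terminating case
            (t (+ n (sumall (- n 1))))))   ; recursive case: n + sumall(n - 1)

    ;; Carlist: returns the first element of each embedded list in its input.
    (defun carlist (lst)
      (cond ((null lst) nil)               ; terminating case
            (t (cons (car (car lst))       ; first element of the first embedded list
                     (carlist (cdr lst))))))

    ;; (sumall 3)               => 6
    ;; (carlist '((a b) (c d))) => (A C)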
Procedure. Subjects were divided into groups such that 11 subjects received the text on recursion that included the list recursion example, and nine subjects received the booklet containing the number recursion example. After reading their texts on recursion, subjects solved 10 recursion programming problems using the Lisp Tutor. Two sequences of problems were used, and this grouping factor was crossed with the example grouping. A blocked sequence of problems was presented to 11 subjects, in which four number recursion problems were followed by four list recursion problems, followed by the two mixed problems. The specific problem sequence was N1, N2, N3, N4, L1, L2, L3, L4, M1, and M2. An intermixed sequence of problems was presented to the other nine subjects, in which the presentation of number and list recursion problems was interleaved and then followed by the two mixed problems. The specific problem sequence in this case was N1, L1, N2, L2, N3, L3, N4, L4, M1, and M2. Subjects read their texts and solved their Lisp Tutor problems individually. The texts were available throughout the sessions. An experimenter was present to help with system-related problems, but provided no instructional help.
Results and Discussion

For each programming problem, we summed the number of errors made by each subject. Error data were provided by the Lisp Tutor analysis of the number of incorrect entries of code corresponding to ideal model production rules (or, equivalently, the number of incorrect tries during each student-tutor interaction at a programming goal). Figure 3 presents the mean number of errors per problem across problem trials for the blocked sequence, for groups receiving either a list recursion example or a number recursion example. Figure 4 presents the same data for the intermixed problem sequence. Problems are labelled along the abscissa, and their left-to-right ordering corresponds to the presentation orders. The plotted data points correspond to the mean errors for the two example groups over the two problem presentation sequences, and the curves in Figures 3 and 4 plot the mean of the two example groups for each presentation sequence. At first glance, the data in Figures 3 and 4 appear to be highly varied, but we will show that they have an underlying pattern that can be fit rather well by a model that takes into account the transfer effects from declarative knowledge of examples to performance, the transfer of productions across problems, and strengthening effects. We begin with a general analysis of the trends in Figures 3 and 4, and then move to a quantitative model based on a production system analysis.
General Effects of Problem Sequencing and Examples

An ANOVA was conducted on the subject error data summarized in Figures 3 and 4, as transformed by a square-root transformation. This transformation was identified as appropriate from examination of the raw data and range statistics computed for a variety of standard transformations (Winer, 1962, pp. 218-222). As is suggested by Figures 3 and 4, there was a strong main effect of trials, F(9, 144) = 3.85, p < .0001, MSe = 2.49, and an interaction of problem presentation sequence (blocked or interleaved) with problem trial, F(9, 144) = 5.19, p < .0001. Interestingly, there was no main effect of presentation sequence, F(1, 16) = .30, MSe = 9.89, indicating that the two kinds of sequence did not produce substantially different performance overall. This lack of a main effect of problem sequencing is consistent with the production system account of transfer: within certain bounds, problem sequencing should have no effect on acquisition or transfer (Singley & Anderson, 1989, pp. 227-229).

_______________________
Insert Figures 3 and 4 about here
_______________________

Subjects appear to have performed better, overall, when the example was of the same type as the problems. On the number recursion problems, the number recursion example group scored a mean of 2.64 errors, and the list recursion example group a mean of 4.19 errors. In contrast, on the list recursion problems, the number recursion example group scored a mean of 6.25 errors, and the list recursion example group a mean of 4.02 errors. This interaction of example with problem type on the transformed error scores was significant, t(144) = 2.37, p < .01.

One of the critical components of learning to program recursion is developing the ability to plan the recursive cases of a recursive function. This is the portion of a recursive program that specifically involves formulating and coding the recursive calls to the function, and determining how the results of those calls will be used to construct a final result. On each recursion problem, the Lisp Tutor prompts the user with a menu to identify the correct general plan (stated in English) for the recursive cases. This is the only place where the tutor interrupts problem solving to ask for such general plans, and if the subject fails to identify the correct plan from among three distractor items, the tutor guides the subject through a set of examples and through the abstraction of that plan from the examples (see Pirolli, 1986, for details). Consequently, we can examine whether the instructional examples had an effect on subjects' abilities to accurately recognize correct plans for recursion in their problems.
Table 1 presents data for the first list recursion or number recursion problem encountered by subjects. The data are the number of subjects who recognize or fail to recognize the correct plan for the recursive cases in their programs. For subjects presented with the number recursion example, Table 1 shows that more recognized the correct plan for the first number recursion problem than for the first list recursion problem. For the list recursion example subjects, the pattern of plan recognition data is reversed. On list recursion problems, the effect of the example is significant, Fisher Exact p = .02, but on the number recursion problems, the effect is not significant, Fisher Exact p = .18. The interaction of example similarity with recognizing or failing to recognize the correct plan was significant by sign test, p < .01. Thus, it appears that having an example program that is similar to the one that is being written aids in the abstraction of appropriate programming plans. _______________________ Insert Table 1 about here _______________________
Production Rule Analysis of Effects of Examples, Transfer, and Strengthening

Our general analyses indicate that examples appear to have an influence on performance with the Lisp Tutor, but problem sequencing does not have an overall effect. We would now like to develop a more specific account of the variation in performance presented in Figures 3 and 4. For each programming problem presented to subjects, we know the ideal model production rules that could be used to solve the problem. To simplify our analysis somewhat, we will ignore the full space of productions that could be used on each problem, and simply note the productions that would be used to generate the minimum-length solution path. That is, for each problem, we identify a sequence of ideal model productions that would solve the problem.

Using this analysis, we can make predictions about the mixture of transfer effects expected for each problem. First, for each problem-production sequence, we can identify the new productions that subjects will have the opportunity to acquire versus the old productions that could have been acquired on earlier problems. The mixture of new vs. old productions that apply to a problem indicates the mixture of novel problem-solving steps vs. steps involving cognitive skills that transfer from earlier learning. This mixture provides an indication of the amount of expected transfer of procedural knowledge. Second, we can identify productions that might be acquired more easily because of the particular example available to a subject. To do this, we followed a method described in Pirolli (1991).
We determined the ideal model production rules that would be used in generating the example programs available in subjects' instruction. We then divided the full set of ideal model productions into those that occurred in the example and those that did not. The assumption was that subjects are more likely to encode declarative knowledge relevant to guiding performance on a novel task when the example solution is implicitly the result of similar operations. The mixture of example vs. nonexample productions that apply to a particular problem indicates the mixture of problem-solving steps for which there is relevant example material vs. steps involving no relevant example material. This mixture provides an indication of the expected transfer of declarative knowledge from the example material.

Using this two-way classification, we generated four predictor variables for each problem trial for each subject: (a) NEWEX, the number of new productions to be acquired on the problem that occurred in the example; (b) NEWNX, the number of new productions to be acquired that did not occur in the example; (c) OLDEX, the number of old productions that apply to the problem that also occurred in the example; and (d) OLDNX, the number of old productions that apply to the problem that did not occur in the example. These predictors, plus the TRIAL value for each problem, were entered into a regression following the procedure described by Lorch and Myers (1990), which is appropriate for repeated measures designs with non-orthogonal variables. Conceptually, this procedure involves calculating an individual regression analysis for each subject and determining whether the distributions of subjects' regression coefficients differ significantly from a null hypothesis prediction (generally a difference of zero).

Using the square-root transformed data, the regression yields a high coefficient of determination3 of R² = .80. Each of the predictors except OLDEX was significant. The coefficients and significance levels were: NEWEX = .29, t(19) = 2.82, p = .005; NEWNX = .71, t(19) = 3.06, p = .003; OLDNX = .12, t(19) = 4.28, p < .0001; and TRIAL = -.08, t(19) = 1.79, p = .045. Furthermore, the relationships among the magnitudes of the regression coefficients are sensible within our model of transfer. The coefficients associated with NEWEX, NEWNX, and OLDNX are all positive, indicating that the number of errors per problem increases with the number of problem-solving steps associated with each kind of production, and the coefficient associated with TRIAL is negative, indicating that errors per problem decrease with practice. More importantly, the magnitudes of the coefficients suggest that novel problem-solving steps (NEWEX, NEWNX) yield more errors than steps involving transfer of cognitive skills (OLDNX), and that the available examples yield predictable improvements on novel problem-solving steps (NEWEX < NEWNX).
3Calculated using Equation 7 in Kvålseth (1985).
We conducted an additional post-hoc analysis along these lines to identify the similarities between example programs and exercise problems. So far, we have categorized our recursion examples and problems according to whether they perform list operations, number operations, or some mixture of the two. This tripartite categorization obviously glosses over many other similarities and differences, and was developed before we realized that the sort of production system analysis we just conducted could be used to identify inter-problem similarity. Figure 5 presents a hierarchical clustering tree of the recursion problems and the example programs used in Experiment 1. The distance metric used to generate the cluster tree was developed by assuming that the similarity of two problems is captured by the proportion of productions shared by their respective solutions. In other words, inter-problem similarity is a function of the number of shared skill elements. For each problem we determined the minimal set, A, of productions that could be used to solve the problem. The distance of that problem from another problem requiring the minimal set of productions B was calculated by

distance(A, B) = 1 - similarity(A, B)    (1)

with

similarity(A, B) = |A ∩ B| / |A|    (2)

where |X| is the number of elements in set X.

_______________________
Insert Figure 5 about here
_______________________
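Equations 1 and 2 translate directly into code when each problem is represented as a set (here, a list) of production names. The sketch below is ours, and the production names in the example call are hypothetical:

    ;; Inter-problem distance (Equations 1 and 2), with problems
    ;; represented as lists of production names.
    (defun similarity (a b)
      (/ (length (intersection a b :test #'equal))
         (length a)))

    (defun distance (a b)
      (- 1 (similarity a b)))

    ;; With hypothetical production sets:
    ;; (distance '(p1 p2 p3 p4) '(p2 p3 p5)) => 1/2

Note that the measure is asymmetric: it normalizes by |A|, the production set of the problem whose distance from B is being assessed.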
Inspection of Figure 5 reveals that the number recursion problems do tend to cluster together, with the number recursion example centrally located in the cluster. The list recursion problems also cluster together, but the list recursion example is somewhat less central to the cluster. Figure 5 also shows that the mixed problems are actually closer to the list recursion problems than the number recursion problems.
Summary of Experiment 1

The analyses in Experiment 1 show that the production system analysis captures many important features determining the development of cognitive skill. To a good approximation, we can segment the stream of ongoing situations, cognition, and learning, and identify when new skills are being acquired, their degree of practice, and their transfer across situations. Furthermore, the production system formalism can be used to identify the relevance of particular examples to the acquisition of new skills on particular problems. Missing from this analysis is the contribution of subject-related factors, such as how subjects vary in the kinds and quantities of cognitive processing allocated to different aspects of the instructional task. The next experiment investigates some of these subject-related factors, focusing especially on self-regulatory learning strategies.
Experiment 2

Experiment 1 was conducted without the collection of verbal protocols. The Lisp Tutor itself provides data records that are comparable to verbal protocols for the problem-solving aspects of the learning task. However, the results of Chi et al. (1989) suggest that there is important variability in the ways that learners interpret instructional texts and examples. Furthermore, the work of VanLehn (1991) suggests that learners' reflections about their problems and solutions may also have an impact on learning. In order to examine self-explanation and reflection in more detail, we essentially replicated Experiment 1 with the additional collection of verbal protocols.
Method

Subjects. Twelve subjects were recruited through an advertisement in a University of California, Berkeley, newspaper and through posters. All subjects were either students or recent graduates and had completed at least one semester of college-level calculus. Half of the subjects had no prior programming experience. If a subject had any experience, it was at most a one-semester course that was not about Lisp and did not cover the topic of recursion. A test for differences in Lisp Tutor performance between subjects with and without prior programming experience was nonsignificant, t(10) < 1. Subjects were paid $4 per hour for their participation.
Instructional materials. The instructional materials for the non-recursion lessons in Experiment 2 were identical to those in Experiment 1, except that subjects in Experiment 2 did not work through the lesson covering iteration and prog structures. Although it is generally preferable to teach these topics before recursion, we eliminated the lesson because neither construct is used directly in the recursion lesson, because none of the subjects was using the Lisp Tutor to obtain course credit, and because of the time and cost pressures of the lengthy experimental procedure.

In the target lesson on recursion, subjects solved 10 programming problems. The instructional booklet for the lesson on recursion was completely rewritten in a sparse manner in order to stimulate subjects' generation of self-explanations. The instruction contained the following components (on separate pages):
1. An abstract description of the structure and function of the components of recursive functions.
2. An example program.
3. A description of the computational behavior generated by recursive functions.
4. A trace of the example program as it processes an input.
5. Some design heuristics for writing recursive functions.
6. A description of how the design heuristics were used in defining the original example.

The first, third, and fifth pages were text descriptions. The second page contained an example program, which varied across subjects. The example used in the instruction was the same as in Experiment 1: a number recursion (Sumall) or a list recursion (Carlist) function. The fourth and sixth pages were consistent with whatever example had been presented on Page 2. The design heuristics presented on Page 5 were English renditions of production rules in the Lisp Tutor ideal model for coding recursive functions. These rules decompose recursive functions into the terminating cases and the recursive cases. The recursive case is further planned in terms of the recursive relation and the recursive step. Additionally, the design heuristics suggest using concrete examples to assist in deriving the recursive case.
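One way to make this decomposition concrete is as a higher-order template. The booklet presented the heuristics in English, so the rendering below is our own device, not instructional text:

    ;; A skeletal rendition of the design heuristics (our own sketch): a
    ;; recursive function is planned as a terminating case plus a recursive
    ;; case, where the recursive case combines the current input (the
    ;; recursive relation) with a recursive call on a reduced input (the
    ;; recursive step).
    (defun make-recursive-function (terminating-p terminating-value
                                    combine reduce-input)
      (labels ((rec (input)
                 (if (funcall terminating-p input)
                     (funcall terminating-value input)   ; terminating case
                     (funcall combine input              ; recursive relation
                              (rec (funcall reduce-input input)))))) ; recursive step
        #'rec))

    ;; Instantiated for factorial:
    ;; (funcall (make-recursive-function #'zerop (constantly 1) #'* #'1-) 5) => 120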
The text material was constructed by first developing a propositional text base (van Dijk & Kintsch, 1983) employing concepts that were thought to be consistent with the declarative representations used in the ideal models of the Lisp Tutor. The surface form of each text was then constructed according to a set of principles for the design of technical manuals (Kieras, 1985). None of the text on Pages 1, 3, and 5 referred to any of the example material on Pages 2, 4, and 6. We assumed that this intentional lack of reference between text material and example material would increase any effects of learner self-explanations.

Procedure. The procedure in Experiment 2 was similar to that of Experiment 1. One difference was the sequencing of recursion problems. Recall that, from the results of Experiment 1, the sequencing of problems should not have an overall effect on performance. Using the same recursion problems as in Experiment 1, subjects received one of two sequences (crossed with the example factor). The first was a blocked sequence of four number recursion problems, followed by four list recursion problems, followed by four mixed recursion problems. The second was a blocked sequence of four list recursion problems, followed by four number recursion problems, followed by four mixed recursion problems.

A second difference was that during the target lesson on recursion, subjects were video-taped and asked to "think aloud" as they read through the instructional booklet and solved problems. Subjects were also asked to explain the examples in the instruction to themselves. The experimenter introduced the verbal protocol procedure (Ericsson & Simon, 1984) by first describing the process of thinking aloud and then illustrating the procedure by thinking aloud while solving an addition problem. Subjects then performed two warm-up tasks. The first required the solution of another addition problem, and the second involved coding a previously written Lisp function. Before reading the instruction booklet for the recursion lesson, subjects were instructed to read and think aloud and, specifically, to explain the examples to themselves. Before turning to pages containing example material, subjects were reminded to explain the examples.

After reading through the instructional material, subjects worked through 12 recursion problems on the Lisp Tutor. Subjects were allowed to refer back to the instructional booklet at any time during this phase.
They were instructed to continue giving verbal protocols as they worked through the problems. These verbal protocols formed the basis for the reflection analysis.
Results and Discussion

Problem Solving Performance

Before presenting our protocol analyses, we discuss some general analyses of problem-solving performance with the Lisp Tutor in Experiment 2. The square-root transformed mean errors per problem for each subject were entered into an ANOVA, with problem sequencing and instructional example as between-subjects factors, and the number of errors on the first eight problems (N1-N4 and L1-L4) as the repeated measures. Overall, there was no main effect of example type, F(1, 8) = .26, MSe = 7.65, or of sequence type, F(1, 8) = 1.61, nor an interaction of sequence with example type, F(1, 8) = .01. As before, there was a main effect of trial, F(7, 56) = 8.25, MSe = 1.16. There was no interaction of example with trials, F(7, 56) = 1.19, but there was an interaction of sequence type with trials, F(7, 56) = 3.57, p = .003, as in Experiment 1. The triple interaction of example by sequence by problems was not significant, F(7, 56) = 1.40.

Examining effects of the available example, the mean errors per problem on number recursion problems N1-N4 was 3.50 with the number recursion example and 5.04 with the list recursion example. The mean errors per problem on list recursion problems L1-L4 was 6.29 with the number recursion example and 5.71 with the list recursion example. Based on the ANOVA described above, a test for this interaction was significant, t(56) = 1.87, p = .03, showing, as in Experiment 1, that the examples improve performance on problems that are similar to the example.

We also analyzed the effects of sequencing on the error rates on the pooled N1-N4 problems and pooled L1-L4 problems. Recall that two groups of subjects received different sequences of problems. One group received problems N1-N4 followed by problems L1-L4 (sequence NL), whereas the second group received problems L1-L4 followed by problems N1-N4 (sequence LN). The mean errors per problem for problems N1-N4 was 5.50 in the NL sequence and 3.04 in the LN sequence. The mean errors per problem for problems L1-L4 was 6.33 in sequence NL and 5.67 in sequence LN. This interaction is significant, t(56) = 2.07, p = .02, although the error difference for the L1-L4 problems across sequencing is not. The finding of an interaction of problem type with sequencing, despite a lack of a main effect of problem sequencing, is consistent with the results of Experiment 1, and is another corroboration of a strong prediction of the production system theory of transfer. Problem sequencing has no discernible overall effect on performance on the entire set of eight problems. On the other hand, problems elicit fewer errors when they occur later in a problem sequence rather than earlier.
Learning Strategies and Transfer
Page 25
they occur later in a problem sequence rather than earlier. This interaction is predicted because fewer new productions are being acquired when a problem occurs later in the problem sequence.
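To make this identical-elements prediction concrete, here is a minimal sketch under our own simplifying assumptions (productions reduced to symbols with made-up names; the paper's actual analyses use the Lisp Tutor's ideal-model productions):

    ;; A problem's novel demand is the set of its ideal-model productions
    ;; not yet practiced on earlier problems in the sequence.
    (defun new-productions (problem-productions practiced)
      (set-difference problem-productions practiced))

    ;; The later a problem occurs, the larger PRACTICED has grown, so fewer
    ;; new productions remain and fewer errors are predicted. For example:
    ;; (new-productions '(cond-case null-test cdr-recursion)
    ;;                  '(cond-case null-test))
    ;; => (CDR-RECURSION)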
Protocol Coding Procedure for Individual Self-explanation Utterances

The self-explanation verbal protocols were transcribed and segmented into elaborations. An elaboration was defined as a pause-bounded utterance that was not a first reading of the text. Each elaboration was classified into a hierarchical typology of elaborations, described below, along with the instructional content that was being processed at the time of the elaboration. To code references to the instructional content being processed or referred to, we assigned a unique identifier to each proposition and concept in the propositional text base from which the instruction booklets were generated. A set of coding identifiers for the example program, the trace, and the design heuristics was also created. Our protocol coding scheme was partly based on that used by Chi et al. (1989). The top-level elaboration types were:

• Domain: statements about Lisp, programming, or recursion. This category is further elaborated below.
• Monitor: statements that concerned a subject's own state of understanding. For example, “I am definitely confused at this point.”
• Strategy: statements about a planned explanation strategy. For example, “I'll look at an example, maybe that will help.”
• Activity: statements concerning the instructional task or the instructional materials. For example, “This paragraph is too long.”
• Reread: statements that are verbatim re-readings of the instructional text or example.
• Other: the residual category.

The domain category, an important coding category in terms of shedding light on self-explanation processes, was further decomposed into three kinds of elaborations: (a) assertions, in which an elaboration was simply stated in the form of a declarative statement; (b) proposals, in which the statement proposed a possible hypothesis or alternative to consider; and (c) questions, which were elaborations stated in question form. Each of these three subcategories was further subdivided into the following coding categories:

• Operation: statements concerned with the operation of Lisp code. For example, “Ok, the first thing it does is null list.”
• Result: statements about the result of a computation. For example, “cdr gives you the last element of a list.”
• Input: statements concerning the input to a computation. For example, “When the function gets nil as a value...”
• Structure: statements about the structure of a function. For example, “you use cond when you're defining the recursive thing.”
• Is-a: statements that related a particular to a concept. For example, “you're using Carlist as its own helping function.”
• Reference: statements relating a concept to a particular. For example, “the recursive step is cdr list.”
• Purpose: statements about the purpose of a piece of code. For example, “cdr... in order to get it closer and closer to the terminating case.”
• Analogy: statements involving analogies to information outside of the programming domain. For example, "...like n factorial, we did this in math, is equal to n times n-1 factorial," which involved an analogy to the mathematical definition of factorial.
• Entail: statements describing the entailments of an action. For example, “when the answer is nil then it will stop.”
• Plan: statements about a programming plan. For example, "I see, we keep going and going into easier elements, more and more elementary steps until we know what to code.”
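For concreteness, one can picture each coded utterance as a record like the following. This is a hypothetical rendering of the scheme, not the authors' actual coding apparatus:

    (defstruct elaboration
      subject-id   ; subject who produced the utterance
      utterance    ; the transcribed, pause-bounded utterance
      content-ids  ; identifiers of the text-base propositions and concepts
                   ; (or example/trace/heuristic codes) being processed
      top-level    ; :domain, :monitor, :strategy, :activity, :reread, or :other
      domain-form  ; for :domain only: :assertion, :proposal, or :question
      domain-type) ; for :domain only: :operation, :result, :input, :structure,
                   ; :is-a, :reference, :purpose, :analogy, :entail, or :plan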
An inter-coder reliability measure was calculated by having two experimenters independently code a long protocol from a subsequent but similar experiment. The coders agreed 83% of the time in their coding assignments for the top-level categories, and 81% of the time when deciding what counted as a separable utterance within the protocol.

Protocol Coding Procedure for Explanation Episodes

Sequences of related elaborations were coded into episodes (or macro-codings). We considered a series of elaborations related to each other, and part of an episode, if the elaborations all referred to a similar comprehension or explanation goal. For each episode we recorded its length, in number of utterances, and the trigger for the episode. Two kinds of triggers were coded: (a) the subject may have exhibited a comprehension failure, or (b) the subject may have verbalized a self-imposed explanation goal.

Comparisons of Elaboration Types for Good vs Poor Subjects

Following Chi et al. (1989), we performed median-split analyses in which subjects were divided into a good group and a poor group using a post-hoc median split based on mean number of errors per problem on the Lisp Tutor. Several other analyses were attempted using other performance metrics for the median split, such as number of errors on new productions, but all such metrics were highly correlated with average errors per problem and yielded similar results. The mean errors per problem for the good subjects was 3.99, with a range of 1.83 to 5.25 errors per problem; for the poor subjects, the mean errors per problem was 7.19, with a range of 5.33 to 10.75 errors per problem.

Table 2 presents the mean number of elaborations in the top-level categories of our coding scheme, produced by good and poor subjects while performing self-explanations of the instructional text and examples. We performed a multivariate analysis of variance, with group (good or poor subjects) as a between-subjects independent variable and the number of elaborations in the various elaboration categories as dependent variables (excluding the residual other category of elaborations). The relevant p values for the groupwise comparisons for each category of elaborations are also presented in Table 2.

_______________________
Insert Table 2 about here
_______________________
Examination of Table 2 reveals that the mean numbers of elaborations for good subjects appear to be greater than those for the poor subjects in all categories except rereads of the text and the example. The overall multivariate contrast between groups is significant, F(10, 1) = 288.74, p < .05, showing an overall difference between groups in producing self-explanations. In processing the text, good subjects produced significantly more monitoring, strategy, and activity statements than poor subjects. Interestingly, these kinds of elaborations can be taken as indicators of various kinds of metacognition: monitoring statements indicate some awareness of comprehension, strategy statements indicate some awareness of possible ways of refining one's comprehension, and activity statements indicate some awareness of the structure of the instructional task. Although domain elaborations of the text material do not significantly differ between the good and poor subjects, the difference approaches marginal significance. Thus it appears that the major differences between good and poor subjects in processing the text are attributable to improved or increased metacognitive processes.

In processing the example material, good subjects produced significantly more domain and strategy elaborations than the poor subjects. Domain elaborations can be treated generally as an indicator of the amount of declarative content being generated while processing the available example. This is consistent with the notion that analogical processes play an important role in skill acquisition: having greater amounts of declarative knowledge about the domain of Lisp and recursion tied to representations of the example increases the probability of retrieving a relevant chunk of knowledge in novel problem-solving tasks. The difference in strategy elaborations during example processing between good and poor subjects again suggests that good subjects are more metacognitive, or more directed in the way they attempt to understand the example.

These analyses suggest that the good subjects are more metacognitive and generate more domain-related elaborations than poor subjects. An additional conspicuous feature of the self-explanation data is the rarity of incorrect elaborations. Although students had ample opportunity to draw incorrect generalizations and conclusions, they seldom did: only 1.8% of all elaborations were judged to be incorrect. Thus, it appears that making incorrect elaborations is not a key difference between groups; rather, the difference is simply a failure to generate information of importance.

Comparisons of Focus in Domain Elaborations for Good and Poor Subjects

A slightly more refined examination of the domain elaborations reveals some particularly striking differences between good and poor subjects. Table 3 presents mean frequencies for three kinds of domain elaborations made while processing the text and example for good and poor subjects.
The first category of elaborations in Table 3 is the mean number of reference and is-a elaborations (collectively called ties, because they connect the example to the text or vice versa) combined with analogy elaborations. The second category comprises domain statements that refer specifically to recursion-related concepts. The third category comprises domain statements that refer to some aspect of Lisp or Lisp programming that is not related to recursion. Most of the recursion-unrelated statements are about the surface form of the code in the example, such as the names of basic Lisp functions and the syntactic placement of atoms and parentheses, whereas the recursion-related elaborations are abstractions about the structure and processes implied by these surface forms.

_______________________
Insert Table 3 about here
_______________________
The means in Table 3 suggest that good subjects produced almost an order of magnitude more elaborations that involve connecting example material to the text material. Recall that the text material was very sparse and almost exclusively an abstract discussion of the structure, operation, and design heuristics of recursive functions. The good subjects also appear to have made more elaborations of the example that directly related to recursion than the poor subjects. Interestingly, the poor subjects appear to have made more elaborations about the example that were not directly related to recursion. Overall, the mean data in Table 3 suggest that the good subjects were focused on understanding the examples in terms of the material presented in the texts and on elaborating how the example was related to abstract notions of recursion, whereas the poor subjects focused on surface features of the example.

To test these differences, we performed another multivariate analysis of variance, with group (good, poor) as the independent variable, ties+analogy as one dependent variable, and an interaction indicator as another dependent variable, constructed to test whether the relative focus on recursion-related vs recursion-unrelated information varied across groups. The interaction indicator was constructed by taking the number of recursion-related elaborations, X_Ri, for each subject i, and the number of recursion-unrelated elaborations, X_Ui, computing the proportion of the total of these two categories devoted to recursion-related statements, and stabilizing the variances of these proportions with an arcsin transformation (Winer, 1962, p. 221). Specifically, we computed

    2 arcsin √( X_Ri / (X_Ri + X_Ui) )

for each subject and entered these data into the multivariate analysis of variance.
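As a small illustration of this computation (our sketch, using hypothetical names):

    ;; Variance-stabilizing arcsin transform of the proportion of a
    ;; subject's domain elaborations that are recursion related.
    (defun recursion-focus-index (related unrelated)
      (* 2 (asin (sqrt (/ related (+ related unrelated))))))

    ;; e.g., 37 recursion-related and 7 recursion-unrelated elaborations:
    ;; (recursion-focus-index 37 7) => approximately 2.32 radians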
This analysis showed that there was indeed a significant difference between good and poor subjects in making ties and analogy statements, F(1, 10) = 7.56, p = .02, and the proportion of recursion-related statements was greater for the good subjects than for the poor subjects, F(1, 10) = 5.91, p = .04. Together these differences imply that good subjects are more focused in generating elaborations that are directly related to the target of the lesson.

Another way of examining the degree to which good subjects are focused in their self-explanation is to examine the episode structure of the self-explanations. Recall that we identified such episodes as sequences of related self-explanations that were triggered by comprehension failures or self-imposed explanation goals. Good subjects produced a mean of 5.50 (s.d. = 6.15) self-explanation episodes, whereas the poor subjects produced a mean of 1.17 (s.d. = 1.94) self-explanation episodes, a significant difference, t(10) = 2.41, p = .02. This result again suggests that good subjects are more persistent and organized in achieving their goals for understanding the instruction.

The differences in explanation episodes between good and poor subjects are more interesting when one examines the events that trigger such episodes.4 Good subjects performed a mean of .35 (s.d. = .51) self-explanation episodes in reaction to comprehension failures, and the poor subjects performed a mean of .50 (s.d. = .83) such episodes. This difference is not significant, t(10) = .45, suggesting that good and poor subjects were equally reactive to comprehension failures. However, good subjects produced a mean of 3.07 (s.d. = 1.09) self-explanation episodes triggered by self-imposed explanation goals, whereas poor subjects produced only a mean of .67 (s.d. = 1.21), and this difference is strongly significant, t(10) = 3.43, p = .003. Thus, good subjects were more self-directed in generating explanations than poor subjects.

Questions and proposals (collectively called nonassertions) accounted for 27% of the domain elaborations across both good and poor subjects. The proportion of domain elaborations that were assertions did not differ across the two groups, t(10) = 1.41 (data were modified by an arcsin transformation), p = .09, although the trend suggested that good subjects were producing greater proportions of assertions than poor subjects.
4 In this particular analysis, we trimmed the data for one outlier subject in the good group. Specifically, the subject produced 18 self-explanation episodes, mainly because the Lisp Tutor was delayed in starting the first programming problem and the subject returned to explanation of the instructional text and examples. Her data were transformed to the grand mean + 2 standard deviations based on the data for the remaining 11 subjects.
Table 4 presents a breakdown of domain assertions by content type for the good and poor subjects. Aside from the differences in ties and analogy statements already discussed, none of the other differences between good and poor subjects in Table 4 are significant. Table 4 also presents the proportion of total assertions devoted to each content type, pooled across the good and poor subjects. It is clear that most of the elaborations produced by subjects concerned the operation of example code; this category, combined with elaborations about inputs and results of the code, accounts for 71% of the content. Only 7% of the content concerned the structure or purpose of the code, although Pirolli (1991) suggested that it is explanations of the structure and purpose of code that should be most effective.

_______________________
Insert Table 4 about here
_______________________

The Impact of Elaborations on the Acquisition of Cognitive Skills

We have shown that differences exist in the kinds of elaborations made by good and poor problem solvers, and that these differences are related to general improvements in problem solving with the Lisp Tutor. We now examine the relation of elaborations to performance measured at the grain size of ideal model productions in the Lisp Tutor. In particular, we look at the impact that high quality elaborations have on initial opportunities for coding new productions, as compared to old productions. In this analysis, subjects were categorized as High Elaborators or Low Elaborators, based on a post-hoc median split of subjects' mean number of ties and analogies. These kinds of elaborations were hypothesized to be particularly productive in understanding new instructional material. Then, as above, the mean number of errors made on each trial for coding new and old productions was recorded. Following Anderson et al. (1989), errors on the first two trials were pooled separately, but we pooled together trials 3 & 4, and we pooled trials 5, 6, 7, & 8 (basically yielding a logarithmic scale on trials). An ANOVA was conducted on the production error data (transformed by a square root transformation) with elaboration group (high, low) as a between-subjects factor, and production type (new, old) and trials 1, 2, 3 & 4, and 5, 6, 7 & 8 as the repeated measures. Not surprisingly, there was a strong main effect of elaboration type, F(1, 20) = 38.24, p = .0001, MSe = .07, a main effect of production type, F(1, 20) = 5.81, p < .05, and a strong main effect
of trials, F(3, 60) = 16.42, p = .0001, MSe = .05. Interestingly, there was an interaction of
elaboration type with production type, F(1, 20) = 6.51, p = .02. This interaction is evident in Table 5: higher amounts of focused elaborations that tie the example material to new concepts reduced error rates on new productions down to levels approaching those of old productions. No other interactions were significant. Note that significant interactions of elaboration type with trials [F(3, 60) = 1.47] or the triple interaction of elaboration type, production type, and trials [F(3, 60) = .75] would have indicated that elaborations were affecting the slopes of the learning curves associated with the practice of cognitive skills. These results suggest that elaborations improve the initial acquisition of cognitive skill, but do not affect subsequent improvement rates.

_______________________
Insert Table 5 about here
_______________________

Analysis of Individual Differences in Elaborations and Subsequent Novel Problem Solving

The median-split analyses of self-explanations involve some subtle methodological costs and benefits. The division and aggregation of subjects based on a criterion median performance level increases our ability to reject null hypotheses, but prohibits us from making more subtle inferences about patterns in the data. Furthermore, in the self-explanation analyses performed so far, and in the Chi et al. (1989) analyses, the use of the group labels good and poor implies a unidimensional characterization of the quality, adaptiveness, or utility of subjects' performance. Different subsets of students may be adapting to different collections of task constraints and goals. For instance, one subset of students may be adapting to a minimization of total task time without regard for errors, another group may focus on generating lengthy self-explanations at the cost of coherence, others may value problem solving with the tutor as more effective for learning than self-explanation, and so on. Furthermore, there may be different topologies of cognition and behavior that are equally adaptive for any particular set of task goals and constraints. With only 12 subjects, one cannot expect to have an extensive sample of the space of individual differences. However, we present data analyses indicating that the simple good vs poor subjects comparisons might be masking some interesting patterns.
The Effectiveness of Elaborations

Is more self-explanation necessarily better? From the Chi et al. (1989) study and our own analyses so far, the answer would appear to be a simple "yes." But consider the learning task in a little more detail. Assume that, ideally, all subjects attempted to exhibit stellar performance in solving problems with the Lisp Tutor. One way to improve one's performance is to try to understand the instructional texts and examples in a way that might lead to improved problem solving later. On the other hand, extensive amounts of self-explanation might begin to yield diminishing returns as far as later problem solving is concerned. After all, how much can one really elaborate on a six-page text before one becomes repetitive or begins to stray from the important concepts relevant to problem solving? Problem solving performance might still improve with additional self-explanations, but the expected effectiveness of each new elaboration may diminish as time goes on.

To explore this issue, we examined the relationship of errors with the Lisp Tutor to elaborations that explained the example material in terms of newly introduced concepts (ties) or that explained the material by analogy to extra-domain concepts (analogies). More specifically, we performed three regressions of errors on the first opportunity to acquire new productions with the Lisp Tutor, regressed on the summed ties + analogies elaborations made by subjects. The first was a linear regression, to examine whether the trend suggested that each additional elaboration produced a constant decrease in error rates. Specifically, we fit an equation of the form

    Errors = A + B • Elaborations

to the error data and the number of ties + analogies elaborations. The second analysis was an exponential regression, to examine whether the trend suggested a proportional decrease in errors with each additional ties + analogies elaboration. Specifically, we fit an equation of the form

    Errors = A • exp(-B • Elaborations)

The third analysis was a power function regression, to examine whether the trend suggested diminishing decreases in error rates with increasing numbers of elaborations. In other words, we used a power function fit to determine whether there was a negatively sloped marginal value function for elaborations. Specifically, we fit an equation of the form
    Errors = A • Elaborations^(-B)

The regressions for the exponential and power functions were performed by first transforming the original data appropriately (into either log-linear or log-log coordinates) and then performing ordinary least-squares regression. To obtain comparable coefficients of determination (R²), we used Equation 1 from Kvålseth (1985) on the untransformed data and predicted data.
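Spelled out (our gloss of this standard procedure; the paper reports only the untransformed equations), the transformed fitting equations, together with what we take Kvålseth's (1985) Equation 1 to be, are:

    \ln(\mathit{Errors}) = \ln A - B \cdot \mathit{Elaborations}
        \quad \text{(exponential model, log-linear coordinates)}

    \ln(\mathit{Errors}) = \ln A - B \cdot \ln(\mathit{Elaborations})
        \quad \text{(power model, log-log coordinates)}

    R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

The last expression, computed on the original (untransformed) scale, makes the R² values of the three models directly comparable.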
For the linear regression, R² = .28; for the exponential regression, R² = .28; and for the power function regression, R² = .37. If we accept the higher value of R² for the power function as a robust trend estimate, then it suggests that the derivative of the best-fit function decreases with increasing elaborations. There appears to be a decreasing marginal value for self-explanation: producing more elaborations is indeed related to improved problem solving performance, but the improvement in effectiveness is more rapid at lower numbers of elaborations than at higher numbers of elaborations.

If we imagine an idealized learner attempting to optimize by maximizing their global rate of learning over time, then such negatively sloped marginal value functions for the effectiveness of self-explanation have great significance. A basic decision facing the idealized rate-optimizing learner is choosing the point at which to stop elaborating on a particular segment of instruction. In general, the tradeoff is a competition between the net gains being made by elaborating the current segment of instruction (which continually decrease) and the net gains that would be made by moving on to the next segment of instruction (cf. Charnov, 1976).

Note that we are not saying that error rates on the Lisp Tutor are the only measure of the utility of self-explanations. But there is one intuitively compelling model of this law of diminishing returns that suggests we should expect it to show up for other performance or learning metrics. Specifically, generalizations of the exponential exhaustion models presented in Newell and Rosenbloom's (1981) analysis of practice-performance relations may be applicable to the relationship between self-explanation and performance. Assume that a performance measure is dependent on a set of independent task-achieving elements, such as goals and methods that solve those goals. Suppose that some subset of those elements can be improved by self-generated explanations of instructional material and examples. Further, assume that any particular elaboration can improve some proportion, α, of the n improvable methods. This would generally lead us to expect an exponential improvement in total performance, P, with elaborations, or

    P = β e^(-α n)
However, a sub-exponential rate of improvement, indistinguishable from a power function, would be predicted if the proportion of improvement, α, decreases somehow with increasing elaborations. There are several plausible ways that α might decrease. It may be that after the first "insightful" elaborations are generated, there is a tendency to persevere in paraphrasing or embellishing those early insights, without adding anything fundamentally new and at the expense of explaining other elements. It could also be that subjects do not persevere on their early explanations, but their chances of producing a new explanation for something already explained increase as time goes on (roughly speaking, the subject repeatedly samples the elements to be explained with replacement), and the improvement on a particular element is sublinear in the number of explanations for that element.
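One way to make this sub-exponential argument concrete (our derivation, not the authors'; here n is read as the number of elaborations): treat the improvement produced by each elaboration as proportional to both α and the remaining improvable performance,

    \frac{dP}{dn} = -\alpha(n)\, P .

With constant α this integrates to the exponential P = \beta e^{-\alpha n}. If instead each elaboration improves a shrinking proportion, say \alpha(n) = c/n, then

    P = \beta \exp\!\left(-c \int \frac{dn}{n}\right) = \beta e^{-c \ln n} = \beta\, n^{-c},

a power function, matching the better fit reported above.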
Dimensions Related to Improvements in Error Rates

Another subtlety in the relationship of self-explanation to Lisp Tutor performance can be seen by considering the different ways that subjects can allocate their learning efforts. Consider the possibility of a three-way trade-off among time spent performing self-explanations, time spent searching for a correct solution on novel task components, and errors on those novel task components. We might expect that subjects who perform few self-explanations and do not spend much time searching for or planning a novel solution would exhibit high error rates. We might then expect these error rates to be reduced either by allocating time to self-explanations or by investing time when a novel task is encountered. To illustrate this trade-off in our data, we have plotted, in Figure 6, the mean errors on the first encounter with new productions in the Lisp Tutor as a function of the mean time spent on those encounters and the total ties + analogies elaborations performed earlier by each subject. Figure 6 shows that, in general, subjects' average number of errors on encountering novel task components decreases with increases in ties + analogies and/or increases in time spent on those first encounters with novel tasks. Dividing the subjects in Figure 6 into three equal groups based on mean time (fast, medium, and slow), we found that three of the six good performers (lowest error rates) were in the fast group and produced above-median ties+analogies, and two of the six top performers were in the slow group and produced below-median ties+analogies. In contrast, four of the six poor performers (highest error rates) were in the medium group and produced below-median ties+analogies. In other words, good performers seemed to devote themselves either to high self-explanations and fast problem solving with the tutor, or to low self-explanations and
slow problem solving with the tutor, whereas the poor performers were low on self-explanations and not slow in problem solving. Specific case studies of individual differences among subjects' protocols are presented in Recker and Pirolli (1991).

_______________________
Insert Figure 6 about here
_______________________
Analysis of Reflection

Reflecting on one's own problem solving can take place at any time during a problem solving session. A subject may reflect after reading a problem statement, after a comprehension failure, during planning, and so on. However, the most common and effective use of reflection would seem to occur just after completing a problem, especially if the problem just solved is very novel. To collect potential reflection episodes, we concentrated on verbalizations that took place after the completion of one problem and before the beginning of the next. Subjects were not prompted in any way; hence any reflection that occurred was spontaneous. In addition, to focus on novel problem solving situations, we considered only the first eight problems. From the between-problem segments, we extracted and transcribed the verbal protocols of reflection episodes. Verbalizations were considered to be reflection episodes only if they were a deliberate analysis of a subject's own problem solving efforts or of the particular problems solved.

As a learning strategy, elaborations generated while reflecting on problem solutions have much in common with self-explanation. In self-explanation, a learner considers new, external representations, in our case text and examples, to build an encoding of the domain. In reflection, the learner operates on an internal representation of the domain and the just-completed program. For this reason, reflection protocols were coded in roughly the same manner as the self-explanation protocols. However, unlike the self-explanation protocols, no significant differences were found between the groups in the number of monitoring and strategy elaborations made.

Significant differences do exist between the good and poor groups in the kinds of domain elaborations made during reflection. The domain coding category captures reflections made about the problem solution and has two sub-categories: (a) abstraction and (b) elaboration. An abstraction involves abstracting a schema on the basis of similarities or differences between the just-solved problem and other solutions; in some cases, an existing schema might be discriminated. Such
abstraction usually concerns either structural features in the code or process features of recursive functions. The following is an example of an abstraction statement, made by a good subject on the negnums function:
The idea, first, in every case you have to run through the list before you get to the end and test if something is true or not. If it is then you have to put your thing in a list, maybe you have to modify it by adding one in one case. And if it's not then you just have to return your function as applied to the rest of the thing, and so each one is just a carbon copy of the others except for the names and the tests are a little changed but... In the following segment, the same subject has just solved her first number recursion problem (power) after solving 4 list recursion problems. She comments on their differences:
Oh, I know, it wasn't comparing 2 lists, that was why it was different. Because all the other ones up to this point had been comparing 2 lists or otherwise just scanning one list for certain elements... which is the same thing... I mean essentially just scanning a list or two... but this one wasn't, that was why it was different.
A domain elaboration is defined as a comment on the just-solved problem, without reference to other solutions. It is usually either a simple summary of the code or a process description of the program. The main distinction between the two is that the former is usually a rereading of the program using mostly Lisp-like vocabulary, whereas the latter is a description in English. The motivation behind the summary-process distinction is that the two may reflect qualitatively different explanations. A "summary" description may indicate a lack of deep understanding of recursion, since the subject is merely reading the solution at a syntactic level; however, it could also be argued that such a subject has simply begun to feel facile and comfortable in Lisp and prefers using its syntax. The following is an example of an elaboration in which the subject provides a good explanation of the process of recursion. This subject, a good subject, had the fifth lowest average number of errors per problem. The program (setdiff) finds set differences:
Defun Setdiff x y ...Cond.. null x..nil...member first you test car x is member in y and if so you just do difference all the other elements in x and y and if x is not empty nor is the first member of x in y then true case you insert first element of x into the list that results in Setdiff and the rest of the members of x in y.

The following is an example of an elaboration in which the subject provides a surface summary for a different program. This subject recorded the highest average number of errors per problem:
You have the function Negnums which has the parameter called x which is a list. You have 3 conditions, the first is, if the list is the empty set. If so return nil. If it's negative then you take the first letter of the list and insert it in front of the Cdr of the list, executed with Negnums. And then, if it's not negative then it has to be positive and so you just do the Cdr of x.

The first observation is that reflection occurred in about half of the opportunities we observed. There were 96 opportunities for reflection episodes (after each of eight problems for each of the 12 subjects), and reflection was observed in 42 of these cases. The number of reflection episodes does not differ significantly, t(10) = .89, between good subjects (mean = 4.83 episodes, s.d. = 3.60) and poor subjects (mean = 3.33 episodes, s.d. = 2.07). Good subjects tend to say more in their reflections (mean = 437 words, s.d. = 305.65) than poor subjects (mean = 240 words, s.d. = 170.46), although this difference is also not significant, t(10) = 1.37, p = .10. On the face of it, reflection appears to happen with the same frequency and duration across the two groups, although the trends suggest that the good subjects perform greater amounts of reflection.

Examination of the reflection episodes does, however, suggest that the good and poor subjects differed qualitatively in the kinds of reflections made. Five of the six good subjects performed abstractions or descriptions of the processes generated by their functions. In contrast, only one poor subject performed an abstraction and another poor subject gave a process description of the program. This difference is significant, Fisher Exact p = .045, with one subject excluded from the analysis because of no reflections. Poor subjects, when they did engage in reflection, largely gave verbatim rereads of the final solution. In addition, good subjects made more cross-problem comparisons. On average, for good subjects, 26% (s.d. = 27) of the elaborations made during reflections were cross-problem reflections, whereas poor subjects made an average of 2% (s.d. = 5) cross-problem reflections, a significant difference, t(10) = 2.00, p < .05. In
general, good subjects appeared to be more focused on the generation of abstractions and understanding the operation of their programs than poor subjects.
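For readers who want the programs behind these protocols, the following are minimal reconstructions consistent with the subjects' descriptions. They are our sketches, since the tutor's actual target solutions are not reproduced in this paper:

    ;; setdiff: the elements of list x that are not members of list y,
    ;; matching the first subject's process description above.
    (defun setdiff (x y)
      (cond ((null x) nil)
            ((member (car x) y) (setdiff (cdr x) y))
            (t (cons (car x) (setdiff (cdr x) y)))))

    ;; negnums: collect the negative numbers in list x, matching the second
    ;; subject's surface-level walk-through above.
    (defun negnums (x)
      (cond ((null x) nil)
            ((minusp (car x)) (cons (car x) (negnums (cdr x))))
            (t (negnums (cdr x)))))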
Summary of Experiment 2

In Experiment 2 we developed protocol analysis techniques to complement our production system analyses of the development of cognitive skills. The results suggest the following:

• Skill acquisition is facilitated by higher degrees of metacognition, sensitivity to the structure of instruction, and knowledge of goals and plans for learning.
• Skill acquisition is strongly related to the generation of explanations connecting the example material to the abstract concepts introduced in the text and to concepts from outside the domain of programming.
• Skill acquisition improves to the degree that explanations are focused on the novel content of the lesson.
• Error rates on novel task components may also be reduced by spending more time planning solutions to those components.
• The effect of self-explanation has diminishing returns.
• Reflections that focus on understanding the abstractions underlying programs, or on understanding how programs work, seem to be related to improved learning.
Finally, we should note that it could be argued that the protocol-giving instructions and the instructions to explain examples make some or all subjects work differently than if they were reading silently. It could also be argued that the ability to verbalize explanations modulates the effectiveness of self-explanations. None of these possibilities would seriously undermine the finding of a relationship between the output of self-explanations and learning, although they force one to consider the underlying complexity of possible causes. To be more precise, the finding of a correlation of self-explanation with subsequent learning could (at a minimum) be due to any one or more of the following factors: (a) subjects vary preexperimentally in self-explanation capability, (b) subjects vary preexperimentally in background knowledge, (c) subjects vary preexperimentally in disposition to self-explain, (d) subjects vary in the degree to which they heed the instructions to explain out loud, or (e) subjects vary in their ability to verbalize their self-explanations, which suggests either (e’) varying mental codes for self-explanations and/or (e’’) varying abilities to recode internal self-explanations. Unfortunately, we could not disentangle these hypotheses in our limited experimental design, but the fact remains that people who more effortlessly verbalize explanations also do better on subsequent learning. It should be noted that this propensity to verbalize explanations or states of comprehension extended beyond the portions of the materials that subjects were specifically instructed to explain out loud (the examples).
General Discussion

Our studies were carried out in a production system framework that allowed us to track the use and evolution of cognitive skills, and we tied this framework to a verbal protocol coding scheme that provided indicators of metacognitive processes and the generation of various kinds of declarative knowledge. In part, our results replicate and integrate the findings of Anderson et al. (1989), Pirolli (1991), and Chi et al. (1989). Like Anderson et al. (1989), we found that a production system analysis of acquisition, practice, and transfer provided a systematic account of much of the variation in performance. This analysis relied on a few simple assumptions about the nature of cognitive skills, their representation in a production system formalism, and mechanisms for knowledge compilation and production strengthening. Consistent with Pirolli (1991), we found that a production system analysis can also provide a metric for the facilitation of novel problem solving in the presence of example solutions. Our analysis of the self-explanations of good and poor subjects is also consistent with the results of Chi et al. (1989).

In addition to extending the results of Chi et al. (1989) to a new domain, we performed several more detailed analyses of individual differences that went beyond simple good-poor distinctions. We found that the relationship between self-explanation and learning is rather subtle. There appear to be diminishing returns as one continues to generate self-explanations. It also appears that there are several ways to adapt well to the learning task, and not all of them involve simply producing more self-explanations. Finally, it appears that reflection on one's problem solutions is an additional way to improve a declarative understanding of the domain and to improve subsequent skill acquisition.

Our studies investigated self-generated elaborations in the context of instructional materials that were very sparse, with no elaborations of the main points, nor any references to the example materials. Other studies show that author-provided elaborations can also vary in their effects on skill acquisition. Reder, Charney, and Morgan (1986) analyzed the impact of author-provided
elaborations on learning commands for a computer operating system. Reder et al. found that carefully chosen elaborations can indeed enhance learning and performance when the elaborations are directly related to components of the skill to be acquired. A related result comes from Pirolli's (1991) study of the effects of different author-provided explanations of example programs. Pirolli (1991) found that explanations that elaborated the underlying structure of a program produced more efficient learning than explanations that focused on how the code worked, and speculated that abstract representations of the program code were more directly useful in planning new solutions. Thus, it appears that carefully chosen instructional explanations can facilitate learning a particular task.

One may question, however, whether externally provided explanations are as effective as self-generated explanations. The answer is somewhat complicated. Studies of memory (e.g., Hirshman & Bjork, 1988) show a generation effect, in which self-generated items are better remembered than externally provided items. However, our studies suggest that learners have to generate the right kinds of elaborations, and not all learners are appropriately selective in their elaborations. Carefully crafted instructional explanations might ameliorate learning difficulties for those with poor self-explanation abilities, at the expense of the memory improvements due to generation effects for those with good self-explanation abilities. However, improvements in the measurement of individual differences in cognitive abilities and background knowledge (Pirolli & Wilson, 1992), combined with technologies that provide individualized instruction, such as intelligent tutoring systems, could be expected to provide the flexible pedagogy required to address such aptitude-treatment interactions (Recker & Pirolli, 1992).

Our analyses of self-explanation and reflection indicate that the student modelling component of the Lisp Tutor fails to capture some important individual differences. This suggests that the Lisp Tutor and similar intelligent tutoring systems might be more effective in student diagnosis and contingent tutoring strategies if they incorporated capabilities for measuring and modelling important variations in student learning strategies. However, such work probably requires substantial and fundamental work in basic measurement theory, and a better understanding of the degree to which learning strategies can be communicated and shaped. Developments in these areas probably require a better understanding of the learning strategies themselves.

Chi et al. (1989) suggested that recent artificial intelligence work on explanation-based learning (DeJong & Mooney, 1986; Mitchell, Keller, & Kedar-Cabelli, 1986) might provide a theoretical framework for understanding human self-explanation processes and their effects. In
their original form, explanation-based learning methods required that the learner have theories of the domain that were complete, consistent, and tractable. Clearly, human learners encountering a new lesson have domain theories that are incomplete, sometimes inconsistent, and perhaps intractable. Indeed, much of the ongoing work on explanation-based learning (e.g., DeJong & Mooney, 1986; Mitchell et al., 1986) is aimed at incorporating methods for the induction of new domain theories and at relaxing constraints on consistency and tractability.

We believe that the basic notion of learning from examples by some sort of explanation-based learning needs to be augmented by the theory of mental models (Johnson-Laird, 1983) and theories of discourse processing (Kintsch, 1986). We assume that the acquisition of procedural knowledge (cognitive skill) depends on evolving mental models of the task domain as well as mental models of learning in the domain (a form of meta-model). Mental models of the task domain include models of code structures, programming abstractions, their functionality, purposes, and operation, as well as mental models of programming tasks, such as how to write, simulate, and debug various kinds of programming structures. Such mental models are declarative structures representing various aspects of programs and programming, and they are operated on by procedural knowledge that constructs, manipulates, and interprets them. Mental models of learning basically structure how learners focus their efforts to improve their knowledge and performance; such models structure the processing of instructional discourse and examples as well as introspective reflection on problem solving performance.

Generally, we assume that a goal of self-explanation processes is to produce an integrated, coherent mental model of the programming situations and tasks described by instructional texts and examples. Instructional discourse is processed into a set of interrelated verbal propositions comprising a text base. These propositions are in turn used to attempt to construct a mental model representing the situation described by the text. This construction of mental models from discourse is made difficult by two factors. First, the texts typically describe programs and programming in a very abstract manner, and consequently the construction of concrete referents in the mental model is difficult or uncertain at best. Second, because it is instructional discourse, a number of new concepts and propositions are introduced for which the learner cannot possibly already have the machinery for directly constructing the corresponding interpretation in a mental model. For instance, it is unlikely that a learner can directly construct a mental model of the phrase "a recursive call" having never before seen a recursive call.
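For instance, the phrase gains a concrete referent only through code such as the following (a standard textbook illustration of our own, not text from the lesson):

    ;; fact was among the lesson's problems; the embedded call to fact
    ;; is what the instruction's phrase "a recursive call" denotes.
    (defun fact (n)
      (if (zerop n)
          1
          (* n (fact (- n 1)))))  ; <-- (fact (- n 1)) is a recursive call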
Examples can be used to solve both difficulties by providing concrete referents for abstract discourse and for newly introduced concepts and propositions. We assume that example programs are processed in a manner similar to explanation-based learning, which involves constructing a structure that explains how goal concepts are instantiated in a particular example by using available domain knowledge; learning mechanisms then lift new pieces of domain knowledge from the explanation structure. In the case of programming examples, one might imagine two sorts of explanation structures being generated. One kind of explanation might be directed at explaining how the example program satisfies the specifications for the program. A second sort might be directed at explaining how novel goal concepts introduced in the instructional discourse are instantiated in the example program.

Finally, we should point out that most of the results we have reported on the relations between self-explanations and learning with the Lisp Tutor are correlational in nature, as was also true in the study of Chi et al. (1989). Clearly, the next step would involve experimental manipulation of self-explanation strategies and examination of the effects of these manipulations on learning and performance.
References
Anderson, J.R. (1976). Language, memory, and thought. Hillsdale, NJ: Lawrence Erlbaum.

Anderson, J.R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

Anderson, J.R. (1987). Skill acquisition: The compilation of weak-method problem solutions. Psychological Review, 94, 192-210.

Anderson, J.R. (1989). A theory of the origins of human knowledge. Artificial Intelligence, 40, 313-351.

Anderson, J.R., Conrad, F., and Corbett, A. (1989). Skill acquisition and the Lisp Tutor. Cognitive Science, 13, 467-505.

Anderson, J.R., Corbett, A.T., and Reiser, B.J. (1987). Essential Lisp. Reading, MA: Addison-Wesley.

Anderson, J.R., Pirolli, P.L., and Farrell, R. (1988). Learning to program recursive functions. In M. Chi, R. Glaser, and M. Farr (Eds.), The nature of expertise (pp. 153-183). Hillsdale, NJ: Erlbaum.

Bovair, S., Kieras, D., and Polson, P.G. (1990). The acquisition and performance of text-editing skill: A cognitive complexity analysis. Human-Computer Interaction, 5, 1-48.

Charnov, E.L. (1976). Optimal foraging: The marginal value theorem. Theoretical Population Biology, 9, 129-136.

Chi, M.T.H., Bassok, M., Lewis, M.W., Reimann, P., and Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 13, 145-182.

DeJong, G. and Mooney, R. (1986). Explanation-based learning: An alternative view. Machine Learning, 1, 145-176.
Ericsson, K.A., and Simon, H.A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.

Hirshman, E. and Bjork, R.A. (1988). The generation effect: Support for a two-factor theory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 484-494.

Johnson-Laird, P.N. (1983). Mental models. Cambridge, MA: Harvard University Press.

Kieras, D.E. (1985). Improving the comprehensibility of a simulated technical manual (Tech. Rep. No. 20, TR-85/ONR-20). University of Michigan.

Kieras, D.E. and Bovair, S. (1986). The acquisition of procedures from text: A production system analysis of transfer of training. Journal of Memory and Language, 25, 507-524.

Kintsch, W. (1986). Learning from text. Cognition and Instruction, 3, 87-108.

Kvålseth, T.O. (1985). Cautionary note about R². The American Statistician, 39, 279-285.

Lorch, R.F. and Myers, J.L. (1990). Regression analyses of repeated measures data in cognitive research. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 149-157.

Mitchell, T.M., Keller, R.M., and Kedar-Cabelli, S.T. (1986). Explanation-based generalization: A unifying view. Machine Learning, 1, 47-80.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A. and Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum.

Newell, A. and Simon, H. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.

Pirolli, P. (1986). A cognitive model and computer tutor for programming recursion. Human-Computer Interaction, 2, 319-355.
Pirolli, P. (1991). Effects of examples and their explanation in a lesson on recursion: A production system analysis. Cognition and Instruction, 8, 207-259.

Pirolli, P. and Anderson, J.R. (1985). The role of learning from examples in the acquisition of recursive programming skills. Canadian Journal of Psychology, 39, 240-272.

Pirolli, P. and Wilson, M. (1992). Measuring learning strategies and understanding: A research framework. In C. Frasson, G. Gauthier, and G.I. McCalla (Eds.), Intelligent tutoring systems (pp. 539-558). Berlin: Springer-Verlag.

Recker, M. and Pirolli, P. (1991). Analyses of self-explanation verbal protocols: Description of a protocol coding scheme and representative protocols (Tech. Rep. CSM-5). Berkeley, CA: University of California, School of Education.

Recker, M. and Pirolli, P. (1992). Student strategies for learning from a computational environment. In C. Frasson, G. Gauthier, and G.I. McCalla (Eds.), Intelligent tutoring systems (pp. 382-394). Berlin: Springer-Verlag.

Reder, L.M., Charney, D.H., and Morgan, K.I. (1986). The role of elaborations in learning a skill from an instructional text. Memory and Cognition, 14, 64-78.

Reiser, B.J., Anderson, J.R., and Farrell, R.G. (1985). Dynamic student modelling in an intelligent tutor for Lisp programming. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 8-14). Los Altos, CA: Morgan Kaufmann.

Rich, C. (1981). Inspection methods in programming (Tech. Rep. AI-TR-604). Cambridge, MA: MIT.

Ryle, G. (1949). The concept of mind. New York: Barnes and Noble.

Schoenfeld, A.H., Smith, J.P., and Arcavi, A. (in press). Learning. In R. Glaser (Ed.), Advances in instructional psychology, Volume 4. Hillsdale, NJ: Lawrence Erlbaum.

Singley, M.K. and Anderson, J.R. (1989). Transfer of cognitive skill. Cambridge, MA: Harvard University Press.
Spohrer, J.C., Soloway, E., and Pope, E. (1985). A goal/plan analysis of buggy Pascal programs. Human-Computer Interaction, 1, 163-207.

van Dijk, T. and Kintsch, W. (1983). Strategies of discourse comprehension. New York: Academic Press.

VanLehn, K. (1991). Rule acquisition events in the learning of a cognitive skill. Cognitive Science, 15, 1-47.

Waters, R.C. (1985). A step towards the Programmer's Apprentice (Tech. Rep. No. 753). Cambridge, MA: MIT Artificial Intelligence Laboratory.

Winer, B.J. (1962). Statistical principles in experimental design. New York: McGraw-Hill.
Acknowledgments

A portion of Experiment 1 was conducted while the first author was funded by an IBM fellowship. Additional funding was provided by the National Science Foundation (award number IRI-9001233) to the first author, and by the University of California, Berkeley, Institute of Cognitive Studies. Portions of this research have been presented at meetings of the American Educational Research Association, April 1988, and at the Eleventh Annual Meeting of the Cognitive Science Society. We thank Matthew Lewis, Luciano Meira, Mitch Nathan, Michael Ranney, Kurt VanLehn, and an anonymous reviewer for comments on earlier drafts of this paper, and Kate Bielaczyc and Beatrice Lauman for assisting in the collection of data. We are also grateful to the Cognitive Stud Muffins (Dan Berger, Kate Bielaczyc, Vinod Goel, Susan Newman, and Mike Sipusic) for many useful suggestions at various stages of this research.
Table 1
Number of subjects recognizing or failing to recognize the general plan for the first problem of each type.

                         First Number Problem                 First List Problem
Example Type       Recognize Plan  Fail to Recognize   Recognize Plan  Fail to Recognize
Number                    4                5                   1                8
List                      2                9                   7                4
Table 2
Mean number of elaborations per subject for good and poor subjects during the processing of instructional texts and examples (p values for pairwise group contrasts from a MANOVA are in parentheses).

                            Text                            Example
Elaboration Type    Good    Poor   (p value)       Good    Poor   (p value)
Domain             12.33    5.50     (.148)       16.17    6.83     (.048)
Monitor             7.83    3.00     (.010)        7.67    1.67     (.158)
Strategy            8.50    3.33     (.008)         .50     .00     (.049)
Activity            5.50     .17     (.038)        3.00     .50     (.212)
Reread             10.33    9.50     (.888)         .00     .83     (.231)
Other               1.83     .33                    .67     .17
Total              47.33   18.67                  29.17   15.50
Table 3
Partial breakdown of mean number of domain elaborations for good and poor subjects into statements that tied the example to text material or were analogy statements, and statements related or unrelated to recursion.

                                    Group
Domain Elaboration Type       Good       Poor
Ties+Analogy                 11.33       1.67
Recursion Related            37.67      17.17
Recursion Unrelated           7.17       7.83
Table 4
Mean number of assertions by content type for the good and poor subjects in Experiment 2.

                       Group
Content Type     Good      Poor     Proportion of Total Content
Input             1.00      .00               .02
Result           10.17     2.33               .20
Operation        19.00    11.50               .49
Structure         1.33      .33               .03
Purpose           1.17     1.33               .04
Plan               .83     1.67               .04
Is-a              2.67      .83               .06
Reference         3.67     1.17               .08
Analogy           1.83      .17               .03
Entail             .83      .00               .01
Table 5
Mean number of errors on old and new productions for subjects with high and low numbers of ties and analogy elaborations.

                               Production Type
Elaboration Frequency        Old         New
High                         .17         .19
Low                          .31         .52
Figure Captions

Figure 1. A framework for the analysis of learning.
Figure 2. An analysis identifying the sets of ideal model productions that differentiate the major effects of declarative and procedural transfer.
Figure 3. Mean errors per problem in the blocked problem sequence in Experiment 1.
Figure 4. Mean errors per problem in the interleaved sequence in Experiment 1.
Figure 5. A hierarchical cluster analysis of the interproblem similarities based on an identical elements production analysis of minimal solution paths.
Figure 6. Subjects' mean errors on their first encounter with new productions in the Lisp Tutor as a function of the mean time spent in those encounters and total self-explanations that were concept ties or analogies.
[Figure 1: flow diagram relating instructional text and examples, via interpretation of the text and explanation of the examples, to declarative knowledge of the domain and tasks; interpretive use of declarative knowledge to guide search at impasses; application, acquisition, or strengthening of task-specific cognitive skills; and reflection on problem solutions.]

[Figure 2: diagram partitioning the knowledge required to solve a target problem into old productions, new nonexample productions, and new example productions, distinguishing transfer of procedural knowledge acquired on prior problems from transfer of declarative knowledge from instruction and examples.]
[Figure 3: plot of mean errors per problem (0-14) for the Number Example and List Example groups across the blocked problem sequence N1-N4, L1-L4, M1-M2.]
[Figure 4: plot of mean errors per problem (0-14) for the Number Example and List Example groups across the interleaved problem sequence N1, L1, N2, L2, N3, L3, N4, L4, M1, M2.]
[Figure 5: hierarchical cluster analysis of interproblem similarities for the problems sumodd, listnums, sumall, fact, power, length, carlist, negnums, intersect, setdiff, add1nums, and greaternum.]
[Figure 6: subjects' mean errors on first encounters with new productions plotted as a function of mean time (secs) spent on those encounters and total ties + analogies elaborations.]