An evaluation of object oriented example ... - Semantic Scholar

An Evaluation of Object Oriented Example Programs in Introductory Programming Textbooks Jürgen Börstler

Dept. of Computing Science Umeå University SE-90187 Umeå, Sweden

[email protected]

James H Paterson

School of Engineering and Computing Glasgow Caledonian University Glasgow G4 0BA, UK

Mark S Hall

CSEPA University of Wisconsin Colleges Wausau, WI 54401

Marie Nordström

Dept. of Computing Science Umeå University SE-90187 Umeå, Sweden

[email protected]

[email protected] Kate Sanders

Math/CS Department Rhode Island College Providence, RI 02908

[email protected]

Carsten Schulte

Didaktik der Informatik Freie Universität Berlin D-14195 Berlin, Germany

[email protected]

[email protected] Lynda Thomas

Dept. of Computer Science Aberystwyth University Aberystwyth SY23 2AX, UK

[email protected]

ABSTRACT Research shows that examples play an important role for cognitive skill acquisition. Students as well as teachers rank examples as important resources for learning to program. Therefore examples must be consistent with the principles and rules of the topics we are teaching. However, educators often struggle to find or develop objectoriented example programs of high quality. Common examples are often perceived as not fully faithful to all principles and guidelines of the object-oriented paradigm, or as not following general pedagogical principles and practices. Unless students are able to engage with good examples, they will not be able to tell desirable from undesirable properties in their own and others’ programs. In this paper we report on a study in which experienced educators reviewed a wide range of object-oriented examples for novices from popular textbooks. This review was accomplished using an on-line checklist that elicited responses on 10 quality factors. Results show that the evaluation instrument provides a sufficiently consistent set of responses to distinguish examples. The paper then goes on to examine some of the characteristics of good and bad examples and how this study will

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ITiCSE’09, July 6–9, 2009, Paris, France. Copyright 2009 ACM , ISBN 978-1-60558-886-5 ...$10.00.

influence the evolution of the evaluating instrument.

Categories and Subject Descriptors D.1.5 [Programming Techniques]: Object-oriented Programming; K.3.2 [Computers and Education]: Computer and Information Science Education—Computer science education

General Terms Principles, Guidelines, Examples

Keywords Example programs, check list, courseware, textbooks, assessment

1.

INTRODUCTION

Examples play an important role in teaching and learning programming. Students as well as teachers cite examples as the most helpful resource for learning to program [28]. This is also supported by research in cognitive science which confirms that “examples appear to play a central role in the early phases of cognitive skill acquisition” [47]. Research in cognitive load theory furthermore shows that an alternation between using worked examples and actual problem solving increases achievement of learning outcomes [44]. Examples work as role models; students use examples as templates for their own work. Examples must therefore be consistent with the principles and rules being taught and should not exhibit any undesirable properties or behavior. In other words, all examples should follow the very same principles, guidelines, and rules we expect our students to

_________________________________________________________________________________________________________ inroads — SIGCSE Bulletin Volume 41, 41, Number 4— December inroads — SIGCSE Bulletin --126 Volume Number 4 2009 — 2009 December 126--

eventually learn. If our examples do not do so consistently, students will have a difficult time recognizing patterns and telling an example’s surface properties from those that are structurally or conceptually important. In other words, it is important to present examples in a way that conveys their “message”, but at the same time be aware of what learners might actually see in an example [33]. Trafton and Reiser [45] note that in complex problem spaces (like programming), “[l]earners may learn more by solving problems with the guidance of some examples than solving more problems without the guidance of examples”. By continuously exposing students to “exemplary” examples, important properties are reinforced. Students will eventually gain enough experience to recognize general patterns which helps them to distinguish between good and bad designs. With carefully developed examples, we can reduce misinterpretations and prevent students from drawing inappropriate or undesired conclusions, e.g., by overgeneralization. This helps to prevent misconceptions, which might hinder students in their further learning [14, 24]. Although example programs are perceived as one of the most important tools for the teaching and learning of programming [28], there is very little research regarding their properties and usage. There is a large body of knowledge on program comprehension (e.g., [9, 11]) and software quality and measurement (e.g., [37]), but this is rarely applied in an educational setting [5, 30]. In this report, we describe a checklist-based approach for assessing object-oriented example programs for novices. The checklist covers technical, object-oriented and didactical qualities of example programs. The checklist has been used to evaluate common textbook examples. Our results show that the checklist helps to indicate strengths and weaknesses of example programs.

2.

RELATED WORK

Textbooks are an important component of teaching introductory programming. They are a major source for example programs and also work as a reference for how to solve specific problems. Although examples play an important role in the teaching and learning of programming, there are no systematic evaluations of textbook examples. De Raadt et al. [16] compared 40 introductory programming textbooks by measuring their amount of content particularly relevant for the textbook’s usefulness as a learning tool, like the number of pages covering examples, exercises, bibliographies, appendices, language reference, index, glossary, and other chapter content. Of the 14 books covering object-oriented programming (in Java), examples covered from 0% to 43% of the content. The authors conclude that Examples should concisely illustrate a technique. They should include line numbers for reference, though should preferably be as self-contained as possible, not requiring the reader to keep referring back to the accompanying text discussion. Better examples will often include the author’s comments maybe accompanied with some lines and arrows like the typical classroom blackboard example. Wu et al. [49] address the question of how examples are used and presented in high school textbooks (using BASIC

as a programming language). They categorize Presentation style based on the presence or absence of four major problem-solving steps: problem analysis, solution planning, coding and testing/debugging. An analysis of 16 textbooks, in total 967 examples, shows that only 2% of the examples display a thorough analysis of the problem statement (only 3 of 16 books). And, even worse, only 0.2% of the examples discuss how the programs can be tested and debugged. Malan and Halland [31] discussed the potential harm “bad” example programs might do when learning object-oriented programming. They identified four main problem areas, based on their own experiences as teachers; examples are either too abstract or too complex and examples are applying concepts inconsistently or even undermining the concept(s) being introduced. An important aspect of examples is misconceptions and how to avoid them. Holland et al. [25] outlined a number of student problems and how they might relate to properties in example programs, like object/class conflation, objects as simple records, and reference vs. object. Along with each problem they provide a pedagogical suggestion for avoiding potential misconceptions by choosing suitable examples. A similar problem is noted by Fleury [21], who discussed how students constructed their own rules by misapplying correct rules. She described certain cases how these studentconstructed rules can systematically lead students to incorrect solutions. In 2001, a discussion on ‘HelloWorld’-type examples was initiated in Communications of the ACM [48] which created a series of follow-ups on the object-orientedness of common introductory programming examples [12, 36, 18, 13]. Surprisingly, the main discussion has been on how to adjust the ‘HelloWorld’-example to better fulfill the characteristics of object-orientation, not on whether this is a good example at all. Hu [27] discusses the related problem of data-less objects (of which ‘HelloWorld’ is an instance). Using data-less objects is contradictory to the basic notion of objects when introducing them to novices. He calls them merely containers holding (static) methods and simulating a procedural way of programming. Although checklist-based assessment is a common approach for quality assurance in industry [10], it has rarely been used in educational contexts. Sanders and Thomas [41], for example, have reviewed typical novice misconceptions to develop checklists that educators can use when designing or grading student programs. The work presented in the present paper is based on earlier work on checklist-based evaluation of example programs by some of the co-authors [6, 7]. This earlier work mainly evaluated the feasibility of the approach and usefulness of the evaluation instrument. The present paper uses a refined evaluation instrument, taking into Nordstr¨ om’s heuristics for designing object oriented examples for novices [35], which are discussed in more detail at the end of Section 4.2. For the present work, we also extended the pool of reviewers beyond the original pilot group and looked at a much broader range of textbooks and examples.

3.

DESIRABLE PROPERTIES OF EXAMPLE PROGRAMS

In the literature the terms ‘example’ and ‘example program’ are often used interchangeably but can mean quite

_________________________________________________________________________________________________________ inroads — SIGCSE Bulletin Volume 41, 41, Number 4 —4 2009 December 127 -inroads — SIGCSE Bulletin -- 127 Volume Number — 2009 December

different things. In the context of the present work, we define example program as a complete description of a program, i.e., the actual source code plus all supporting explanations related to this particular program. For brevity, we usually omit ‘program’, when this is implicit from the context. A good example would then be an example that relates to the needs of novices in programming and object-orientation. Novices have no or only a limited frame of reference. They rarely have any prerequisite knowledge about (object-oriented) programming languages or the programming process. It is therefore important to carefully control cognitive load [15] and strip unnecessary complexity from the examples. • Example context should be easy to understand. • Examples must be small and/or focus on a single concept. • Object-oriented principles must be upheld and enforced. This leads to somewhat contradictory demands on examples. Upholding object-oriented principles does not necessarily lead to easy, straightforward solutions to small problems. In fact, the strength of object-orientation is in managing complexity. Kristen Nygaard, one of the originators of object-orientation, often stressed that object-orientation is a better problem solving tool for complex problems than for simple ones. In our opinion, an object-oriented example program must fulfill certain basic properties to be effective as an educational tool. It must • be technically correct, • be easy to understand (readable), • be a valid role model for an object-oriented program, • promote “object-oriented thinking”, and • emphasize programming as a problem solving process. However, implementing these characteristics is quite difficult. The educational quality of an example is not a definite, measurable quantity. Although some quality aspects, like code complexity, are quantifiable, it is very difficult to capture didactic and contextual properties [5]. This becomes apparent from the debates around the common ‘HelloWorld’ example [12, 13, 18, 27, 48], where object-orientedness is often argued about in technical terms, i.e. concrete syntax. From a didactic point of view, it might be argued that none of the proposed versions of ‘HelloWorld’ could serve as a good example, since the example as such seems artificial and not serving any “real” purpose. Which kind of application would need objects of this kind? Does it exemplify any important or general concept, besides concrete syntax? Even if the example fulfilled the criteria for good examples in a technical sense, its didactic qualities are still questionable. The design of examples for novices has constraints not present in the design of large-scale object-oriented systems. The size of examples, the repertoire of concepts and syntactical components, and the need to present concepts in isolation are limiting conditions. Furthermore, object thinking has to be promoted, and particular attention has to be given to the context or problem domain supplied with the example. Heuristics for designing examples for this particular situation are discussed by Nordstr¨ om [35]. Since these heuristics are closely related to the quality factors of our evaluation instrument, they will be further discussed in Section 4.2.

4.

METHOD

The work described here was carried out as an ITiCSE working group and based on earlier work by some of the

co-authors [6, 7]. An electronic checklist (described in Section 4.2) was administered for evaluating object-oriented example programs from common CS1 textbooks.

4.1

Example Programs

We investigated a large range of popular textbooks. The 13 textbooks from which we chose examples are listed in Table 1. Following VanDrunen’s approach [46], we investigated each text’s detailed table of contents to determine when various key concepts were introduced. In addition to that, we also looked into each text to get a feeling for actual focus and style of concept presentation. We observed a large variation from a strictly imperative first approach to an objects-first approach. Most texts (8 of 13) claim to follow some kind of objects-early or objectscentric approach or pedagogy in their back cover texts, prefaces or web pages. Six of the 13, however, don’t name objects in their titles, and on close inspection, those who claim to follow an objects-early or objects-centric approach sometimes have difficulties living up to their promises. Following this analysis, we classified these textbooks into two categories: those with a clear and early focus on object orientation (category OO) and those with a more traditional imperative first approach (category Trad ). The results of this classification are shown in Table 1. A full description of the key data concerning all 13 textbooks can be found in Appendix A. Table 1: Textbooks considered. BK [4] BS [8]

Pages 516 1210

Edition 4th 1st

DD [17]

1596

7th

Fa [20] GL [23]

870 625

5th 1st

Ho [26] LL [29]

1204 832

3rd 6th

MB [32]

1018

1st

NH [34] Ri [39] Ro [40]

1040 769 587

3rd 2nd 2nd

Sa [42] Wu [50]

1182 987

3rd 5th

Self declared as Objects first Fundamentals first Early classes and objects pedagogy — Balanced and flexible approach to mastering object-oriented principles Objects gradual True objectorientation Objects early but gently Objects first Object centric Modern objects first approach; class hierarchies from the beginning — Objects first

Category OO Trad Trad Trad Trad

OO Trad Trad OO OO Trad

Trad Trad

To reach a manageable but still representative set of examples, we focused on examples with comparable properties. Only complete examples (i.e. no code snippets) were considered. They should furthermore the first ones exemplifying certain high level concepts or ideas. Examples were then grouped into one of three groups for further analysis. To achieve a good balance of varying types of examples, we grouped them furthermore into three distinct groups, each

_________________________________________________________________________________________________________ inroads — SIGCSE Bulletin Volume 41, Number 4 —4 2009 December 128 -inroads — SIGCSE Bulletin -- 128 Volume 41, Number — 2009 December

focusing on different groups of concepts, as described below. FUDC: First user defined class. These examples reflect the first occurrence of a user defined class in a text. We consider these examples particularly important, since they “set the stage” for how students are expected to think about object-oriented class design. OOD: Multiple user defined classes. Examples in this group exemplify some kind of design decision/strategy. They show how existing classes can be “used” for defining new classes (inheritance, composition) or how designs can be made flexible (interfaces, polymorphism, ...). Examples in this group can be considered role models for determining relationships between classes. CS: Control structures. The main purpose of the examples in this group is exemplifying the usage of control structure (selection and repetition). One could argue that object-orientedness does not matter in this category, since this is not the purpose of the example. However, if students are supposed to eventually learn OOP (and not just the syntax of an OO language), it might be interesting to investigate this category further. In total, we selected 38 examples from the 13 textbooks described above. To bring this number down to a manageable number that can be handled by the working group, we grouped examples into mandatory (16) and optional (22) categories. The mandatory examples should be reviewed by all working group members. A summary of the distribution of examples per category can be found in Table 2. A full description of all examples can be found in Appendix B. Table 2: Example programs per category. Mandatory Optional P Category OO Trad OO Trad FUDC 3 3 0 5 11 OOD 3 3 3 11 20 CS 2 2 0 3 7 P 8 8 3 19 TOTAL 16 22 38

4.2

Didactic quality. A good example must be easy to understand and emphasize programming as a process. In this category, we take a didactic point of view; a good example must be easy to understand (see Table 5).

T1

Table 3: Technical quality (T1–T2). Correctness and Completeness. The code is bug free and the example is sufficiently complete. Examples of problems that might affect quality negatively: Actual syntactical errors; actual or potential run-time errors; code that does not adhere to the problem statement or requirements; etc.

T2

Readability and Style. The code is bug free and the example is sufficiently complete. Examples of problems that might affect quality negatively: Non-intuitive identifiers; inconsistent naming schemes; inconsistent or insufficient indentation; trivial comments (not adding information); excessive comments (making it difficult to discern code); etc.

Each quality factor can be assessed on a 7-point Likerttype scale from -3 (extremely poor) to +3 (excellent). In the review instrument, each quality factor is described briefly together with a list of typical problems. These problems were distilled from the literature on student problems or misconceptions and object-oriented design principles, guidelines and rules (see also sections 2 and 3). We then implemented an electronic survey using LimeSurvey [1]. Figure 1 shows an example screen for the questions in the category Technical Quality. The interested reader is invited to test the instrument at http://www.cs.umu.se/proj/limesurvey/. In addition to the 10 quality factors described in Table 3– Table 5, we also asked reviewers for their overall impression of example quality before and after the actual assessment (question codes I1 and I2, respectively). In two final open questions, reviewers could provide additional comments regarding example quality and the questionnaire itself.

Evaluation Instrument

Inspiration for our evaluation instrument was taken from the checklist-based evaluation by the Benchmarks for Science Literacy project [2]. We defined 10 quality factors that we grouped into three quality categories; technical quality, object-oriented quality, and didactic quality. Technical quality. A good example’s code must be technically sound and easy to read. The factors in this category capture properties that are independent of a particular programming paradigm, like correctness and readability (see Table 3). Object-oriented quality. A good example must be a valid role model for an object-oriented program and promote “OO thinking”. The factors in this category capture enforcement of commonly accepted principles, guidelines and heuristics of object-oriented design and programming (see Table 4).

Figure 1: Example questions (Technical quality). It is important to be able to discuss object oriented quality of examples without submitting to a particular line of


Table 4: Object-oriented quality (O1–O5). O1 Reasonable Abstractions. Abstractions are plausible from an OO modelling perspective as well as from a novice perspective.

D1

Examples of problems that might affect quality negatively: The problem is too trivial to motivate a computer program; describing the problem takes longer than solving it; the problem domain is not inclusive, e.g., by requiring certain prerequisite knowledge, focusing on localized concepts, domains, or using terms not commonly known to non-native speakers of the language; etc.

Examples of problems that might affect quality negatively: The units of abstraction are not intuitive; the units of abstraction have multiple responsibilities; etc.

O2

Reasonable State and Behaviour. State and behaviour make sense in the presented software world context. Examples of problems that might affect quality negatively: There are only get-/set-methods and print operations; methods have multiple responsibilities; when real world objects/phenomena are modelled, the model and the modelled are confused (model state and behaviour will usually differ from the real world objects); etc.

O3

Reasonable Class Relationships. Class relationships are modelled properly (the “right” class relationships are applied for the “right” reasons).

D2

D3

Exemplary OO code. The example is free of “code smells”.

Promotes “Object Thinking”. The example supports the notion of an OO program as a collection of collaborating objects. The example does not support common misconceptions. Examples of problems that might affect quality negatively: All classes are instantiated at most once; all attributes are of primitive types; many objects are static (i.e., not instantiated explicitly); works heavily with Singletons and/or class methods (like Math); works heavily with immutable objects (e.g., of type String); etc.

presentation (objects first/late, order of concepts, ..) and IDE/environment used. To support the design of exemplary examples for novices, Nordstr¨ om [35] proposes six heuristics (He[d]uristics1 ). These heuristics are based on accepted object-oriented principles and guidelines and the particular needs of novices. The major focus of the He[d]uristics is the mindset of the designer and not to regulate technical details. Many of the established detailed practices, like keeping all attributes private, not creating “God classes,” etc. are covered by Riel’s object-oriented design heuristics [38] or Fowler’s object-oriented design principles and code smells [22]. Our quality factors O1–O5 correspond to Nordstr¨ om’s He[d]uristics in the following way: 1. Model Reasonable Abstractions corresponds to O1. 1

Heuristics for the design of educational examples.

Well Balanced Cognitive Load. Explanations and supporting materials promote comprehension; they are neither simplistic, nor do they impose extraneous cognitive load. Examples of problems that might affect quality negatively: Lack of focus; too much new material in one step; incoherent set of new materials/issues/topics; irrelevant details; text is unstructured or difficult to navigate/orientate; too high/low a level of abstraction; lack of visuals (tables, graphs, diagrams, pictures, screenshots, ...); inappropriate visuals (e.g., off-topic or increasing cognitive load by split attention effect); etc.

Examples of problems that might affect quality negatively: Objects can be in incomplete/inconsistent states (e.g., when instantiation does not set all attributes appropriately); the main-method is mixed up with another abstraction; inadequate conditionals (that should be replaced by polymorphic solution); long methods; long parameter lists; long method chains (violating the Law of Demeter); implicit dependencies; duplicate code; unnecessarily complex concepts (e.g., anonymous classes, assertions) or syntactic sugar (like the ? : operator); etc.

O5

Process. An appropriate programming process is followed/described. Examples of problems that might affect quality negatively: The code falls from the blue sky, without motivation or rationale; the code is presented as the single correct solution; there is no discussion around the problem or (initial) analysis/design; verification/validation issues are completely neglected; etc.

Examples of problems that might affect quality negatively: Is-a and has-a relationships are confused; subclasses do not model subtypes; inheritance is used to model roles (instead of composition and delegation); parallel inheritance; there are superfluous or missing relationships; etc. Note: When no relationships are present and the example doesn’t call for any, Reasonable class relationships can be considered excellent.

O4

Table 5: Didactic quality (D1–D3). Sense of Purpose. Students can relate to the example’s domain and computer programming seems a reasonable way to solve the problem.

2. Model Reasonable Behaviour corresponds to O2. 3. Emphasize Client View is a component of O5. 4. Favour Composition over Inheritance is implicitly present in O3. 5. Use Exemplary Objects Only is a component of O4 and O5. 6. Make Inheritance Reflect Structural Relationships is a component of O3.

5.

RESULTS

In total, we received 215 valid reviews by 25 individuals performing between 1 and 38 reviews each. Of the 25 reviewers, 7 (R1, R2, R4–R8) submitted 16 or more reviews. Of the 38 examples, 18 (E1–E16, E21, E26) received 8 or more reviews. In the remainder of this paper, we focus our analysis on these major reviewers and examples. The 7 major reviewers account for 122 of the reviews (7 x E1–E16 and 5 x E21,E26). Details about reviews per reviewer and reviews per example can be found in Table 6 and Table 7, respectively. Table 8 summarizes the overall impression of the quality of the major examples. This shows that many textbook examples get less than half of the maximum score of +30. We also examined the examples with the highest and lowest total score per reviewer (columns Hi) and (column Lo, respectively). This further confirmed the ranking in Table 8 in that E9 (picked 2 times), E3 (picked 3 times), E12, E7, E2 and E10 were rated the highest; and E15 (3 times), E13 (2 times), E4, E8, E11 and E14 were rated lowest. The average ratings for overall impression before and after the actual review (I1 and I2, respectively) further corroborate the ranking. Interestingly, the overall impression seems to degrade during the review, in particular for the examples


Table 6: Number of reviews per reviewer. Reviewer Reviews R1 16 R2 38 R4 22 R5 18 R6 18 R7 21 R8 16 All other 18 reviewers 66 P 215 Table 7: Number of reviews per example. Reviews Example(s) 12 E1,E2,E13 11 E11,E12,E16 10 E3,E4,E6,E9,E10,E14,E15 9 E5,E7,E8 8 E21,E26 3 E19,E20,E25 1–2 E22–E24,E27–E38

Table 8: Summary of the main results for the major examples (with ≥ 8 reviews) and major reviewers (with ≥ 16 reviews). Ranking is top down according to average total score. ECa BCb Scorec Hi Lo I1d I2e E26 OOD OO 23.80 2.20 2.20 E9 OOD OO 20.29 2 2.00 1.71 E21 FUDC Trad 20.20 1.80 2.00 E12 OOD Trad 17.86 1 1.57 1.29 E3 FUDC OO 17.71 3 1.00 1.43 E7 OOD OO 16.57 1 1.43 0.86 E16 CS OO 16.00 1.71 1.29 E2 FUDC OO 15.71 1 1.00 0.71 E1 FUDC OO 14.43 1.00 1.00 E5 FUDC Trad 13.86 0.29 0.43 E10 OOD Trad 12.29 1 1.00 0.43 E6 FUDC Trad 9.57 0.00 -0.29 E4 FUDC Trad 9.00 1 -0.14 -0.71 E8 OOD Trad 6.43 1 0.00 -0.43 E11 OOD OO 5.00 1 0.14 -0.29 E14 CS Trad 1.14 1 0.00 -1.00 E13 CS OO -2.43 2 -1.43 -1.57 E15 CS Trad -2.43 3 -1.29 -1.57 a

that already have a low overall first impression (I1). We elaborate on that further in Section 6.1. So there does seem to be some degree of agreement on what is a good example and there were no examples that were rated both highest and lowest! As can be seen in Figure 2, some reviewers are either more or less generous than others. Averages can hide considerable variation, so we dug deeper and looked at the reviewers’ ratings for each example. Figure 3 shows that reviewers are “in the same ball park” with their ratings. Reviewer ratings largely follow a similar up and down pattern, but often on different levels. A certain amount of discrepancy can be observed, particularly with reviewer R5 (the most generous one) and to a lesser extent R1.

Figure 2: Average total scores per reviewer. Table 9 summarizes the mean values and standard deviations for all quality factors over all major examples and reviewers. The standard deviations are quite high. However, some of this variation might be due to reviewer generosity. When we concentrate on the O-factors, for example (see Figure 4), there is a similar pattern as for Figure 3, i.e. the reviewers often have the same ups and downs, just on different levels.

Example category as defined in Table 13 and Table 14. Book category as defined in Table 1. c Average total score ([-30..+30]) for the 10 quality factors T1–D3. d Average score ([-3..+3]) for overall impression before review start. e Average score ([-3..+3]) for overall impression after finishing the review. b

By looking at the average scores on all single factors for all examples (see Figure 5), we get some indication of whether or not our factors capture different aspects of example quality. When the lines converge we are seeing an example where all the quality factors match. Where they diverge we see an example that scores differently for different factors, see for example E14. If all the lines converged on every example, it would mean that the checklist elicited the same information redundantly. When results are similar and on a high level, see for example E9, we have a good allround example without any particular flaws. Experienced teachers develop a sense for “good” and “bad” (“I know it when I see it”). They intuitively recognize examples that exhibit desirable properties and deliver their message in a clear and unambiguous way. This intuitive sense of quality is captured by question I1 of our electronic survey. How our evaluation instrument affects this intuitive perception of quality is discussed further in Section 6.1.

6.

DISCUSSION

In this section we discuss some of our results in more detail. In Section 6.1, we look at the relationships among the quality factors and, in addition, consider some comments on the checklist in general. Section 6.2 discusses certain properties of example programs. In section 6.3, we look at specific issues the reviewers noted in their review comments. Finally, in Section 6.4, we look into reasons why examples scored particularly high or low on our quality factors.


inroads — SIGCSE Bulletin

- 132 -

Volume 41, Number 4 — 2009 December

O5

T1

T2

O1

T

I2

O O2

I1 # Pages

O4 O3

D2 D3

D1 D

Figure 6: Significant correlations (at level α = 0.01) between quality factors, quality categories (Σs) and I1 and I2 (crossing lines are dashed for readability). all went down. Twice as many of the reviews that changed by 2 went down (8 down and 4 up). It seems that the overall impression of quality degrades from the first impression to the impression after going through all quality factors. It could be argued that the checklist facilitates a closer examination and helps identifying actual or potential problems. It furthermore seems that the checklist helps finding agreement on the overall impression of example programs. In Figure 7, we can see how the ratings for overall impression change from initial (I1 to the left) to final (I2 to the right). The boxes in Figure 7 represent the middle 50% of the data. The upper and lower hinges extending from the boxes indicate the upper and lower quartiles of the data. A dark horizontal line inside a box represents the mean value and single bullets represent outliers. We can see that the pattern for I2 (right) looks much more similar for all reviewers. We therefore argue that the checklist not only helps finding actual or potential problems in the examples, it also facilitates more consistent evaluations.

Figure 7: Average ratings per reviewer on I1 (left) and I2 for all examples.

6.1.1

Additional Comments

An important design decision was to keep the checklist

small and easy to use. This meant, among other things, that the number of quality factors should be kept small. One consequence of this is that the quality factors are rather coarse-grained. Fine-grained checklists are less manageable, in that not all aspects will be applicable to all examples. Ideally the checklist should achieve a balance between, on the one hand, ease of use and manageability and, on the other hand, coverage of all the important issues associated with the examples. The responses given to the two open questions provide information on issues which reviewers identify with the checklist and with the examples themselves. The following observations were commonly made regarding the checklist: • the opportunity to make comments at each question or page of the checklist would be useful • it is difficult to grade OO quality for an example which has only one class • it is difficult to apply OO quality indicators to an example which is not object-oriented • there is no indication given of appropriate weighting for the individual quality indicators listed for each question • the individual quality indicators used in grading each question are not captured • it is difficult to grade examples which build on previous examples - reviewers generally had access only to the book pages which contained the examples, not to the whole book • the rating approach which should be taken is not clear. For example should the reviewer start with a rating of +3 and then punish examples for faults found, or start at 0 and add for good features and subtract for bad ones? Where reviewers provided comments on the examples they tended to make several points in the single open question field provided for that purpose at the end of the checklist. These comments often related to different aspects of the example, and as suggested by some reviewers, it may have been easier for reviewers to comment at the appropriate page of the checklist. Many comments could be related closely to quality indicators in the questions. For example, the comment “There are instance variables that are not initialized by the constructor” clearly suggests that the first quality indicator for question O4 (“Objects can be in incomplete/inconsistent states”) has been used in grading that question. However, the absence of any comment closely related to it does not necessarily imply that a particular quality indicator has not been discovered and used in grading the question. Several common issues emerge from the comments which cannot be easily related to the quality indicators. This suggests that there is scope for additional guidance in the quality indicators regarding these issues. The wording in the quality indicators emphasises guidance for the reviewer in identifying problems with examples, rather than in identifying good practice. Although it is not explicitly stated, this could imply that reviewers should start with the assumption that the example is excellent and then grade lower as problems are discovered. Reviewers, on the other hand generally made positive and negative comments on the examples. Some positive comments simply indicate the absence of problems or are the opposite to issues identified as problems, for example “interesting” as opposed to

_________________________________________________________________________________________________________ inroads — SIGCSE Bulletin - 133 Volume 41, Number — 2009 December inroads — SIGCSE Bulletin Volume 41, Number 4 —4 2009 December

“boring”, while other comments take a position on one of the issues where there is no clear preference among reviewers.

6.2

Table 10: Relative lengths of examples. E15, E14 short E13, E7, E6, E16, E1, E8, E10, E21 medium E2, E5, E4, E11 long E9, E12, E26 longer E3 longest

About the Examples

We computed the maximum, minimum, and median scores for each question on the 122 reviews of major examples by the core reviewers. In all cases, the maximum score was a 3 (on a scale from -3 to 3). The minimum score was -3 for 8 of the questions, and -2 for the rest (T1, Correctness; T2, Readability; O3, Class Relationships; and D1, Sense of Purpose). Overall, for all 1464 answers on these reviews, the median score was a 1. The median score on one question (O3, Class Relationships) was a perfect 3. This probably reflects the nature of the question, however, rather than the quality of the examples (see Footnote 2). The median score on five questions (T1, Correctness; T2, Readability; O1, Abstraction; O4, Code Smells; and D1, Sense of Purpose) was a 2. We can’t say that one reviewer’s 2 is necessarily the same as another’s, but broadly speaking, this probably means that the reviewers considered the quality of the examples along these particular dimensions to be, while not perfect, still pretty good. The median score on five questions was a 1 (O2, State and Behavior; D2, Process; D3, Cognitive Load; I1, Initial Impression; and I2, Final Overall Impression). This is more difficult to interpret. Some reviewers interpreted 0 as neutral and anything above that as positive. Others started with 3 and subtracted when they found flaws in an example; 1 for such a reviewer may not have been a positive review. Some reviewers reported that they had used different strategies on different examples. Certainly, these questions reflected areas in which the reviewers found flaws in many of the examples. The median for Question O5 (Object Thinking) was a 0, the lowest median value for any of the questions. This was in fact the weakest point for the examples. Specific problems related to O5 are discussed further in Section 6.4.5. The questions about the reviewers’ overall impressions (I1 and I2) give some further insights into the examples. For both of these, as indicated above, the median value was a 1. Six different examples received a perfect 3 for the initial impression, each from a single reviewer (E2, E7, E9, E12, and E27). Three of those (E2, E7, and E26) received a 3 for the final overall impression as well (again, each from a single reviewer). Two examples (E3 and E9) received a 3 for the final overall impression after starting with a 2 (again, each from a single reviewer). The reviewers’ overall impression of the examples went down on average after completing the checklist. When we look closer, the picture is not so clear, however. The scores went down in 34% (42) of the reviews, went up in 15.5% (19), and stayed the same in 50% of the reviews (61). This may reflect some conscious or unconscious bias in favor of giving consistent answers, or it may indicate that the ratings are on the whole quite stable.

6.2.1

Length of Examples

We calculated the length of each example (counting all pages, both code and text, in half-page units). The examples varied from 1.5 to 18.5 pages. We then sorted them by length and grouped the examples, including examples in the same group if the difference between successive items was ≤ 1. The results are shown in Table 10.

We also looked at each reviewer’s most and least favorite examples (see also Table 8). We found that three of the four longest were each listed by at least one reviewer as their favorite example. The two shortest examples were each listed by at least one reviewer as their least favorite example. There was no intersection between the list of favorites and the list of least favorites.

6.2.2

Visuals and Pedagogical Features

All the examples included discussion and text, but many of the examples also provided other supporting materials. These materials tended to be suited to the type of example. For instance, E1–E6 and E21 were all first examples of a user defined class and all except E5 include skeleton code. E13– E16 were all control structure examples, and all except E15 include a flowchart; all except E14 include skeleton code. All programmers have their own opinion on the “art” of commenting. Based on comments made by the reviewers during the working group, the commenting style could easily have affected the reviews about the code. The value of “block comments” versus line-by-line has been discussed in previous textbooks and literature. The commenting of methods ranged from prior to the definition to after the method definition. The commenting style can affect the readability of the actual code. Not surprisingly, both of the inheritance examples (E7 and E8) include an inheritance hierarchy, as do the polymorphism examples (E11 and E12). Only E9 of the collaborating classes examples (E9, E10, E26) shows an interaction diagram, this example also includes a discussion of class responsibilities. In their respective categories, E3 and E9 have amongst the most supporting materials, including instance diagrams, but there does not appear to be a correlation between materials and score on the questionnaire; for instance, within the same categories, E11 and E12 exhibit the same materials and their overall scores are very different, as do E13 and E16.

6.2.3

Importance of Cover Stories

An important part of an example program, besides the actual source code, is the text describing or explaining the example. Often this text is devoted to a context and makes up a kind of cover-story. This story links the example to a context, providing means to be accepted as a meaningful example, as it, for example, can be related to business, typical programming tasks, or everyday live. Thereby, a context provides motivation. Furthermore, it should directly support the understanding of the source code, and the concepts targeted by this example. From a theoretical point of view, we can distinguish three dimensions of understanding which can be supported by the cover story: (1) understanding of the text-structure, (2) understanding of the execution of the program, and (3) understanding of the intention of the program ([43]).


Text-structure relates to the way the source code of the example is divided into chunks—here the classes, their properties (attributes and methods), their relationships, and their naming. A cover story can help to establish a general understanding of the entities in the context, which are (partially) presented in the source code. Therefore the structure of this code is more easily to understand, because it is somehow already known / available in the mental model of the learner. The execution of the program (second dimension) focuses on the dynamic aspects of understanding the source-code. These are related to understanding the algorithms involved, the state and behavior of the objects during runtime. As before, a familiarity with the context and processes in the context should help to understand related processes in the example source code. The third dimension focuses on the intention of the program, its goals and aims. What problem should be solved by the source code? The cover story can engage students in thinking about the context in a way which raises a problem to be solved—and the program then is presenting a solution to this context-related issue or problem. These different functions of cover stories related to programming will be analyzed from a qualitative and a quantitative perspective. We can also analyse the role of cover stories quantitatively by using data from the instrument. The quality factors O1, O2, and D1 all address the context of examples, in different ways: • O1 asks for reasonable abstractions; examples should be plausible from an OO modeling perspective (the design captures the essential “issues” of the context!), and from a novice perspective (the design is understandable from an everyday perspective of the context). • O2 asks for reasonable state and behaviour: examples should make sense in the presented software worlds context. • D1 asks for the sense of purpose; whether students can see connections between the source code and the real world domain, and whether it is sensible to use a program in this context. Indeed, the quality factors O1, O2 and D1 are significantly correlated (at level α = 0.01), see Table 11. Table 11: Correlation between quality factors O1, O2 and D1. O1 and O2 0.55 O1 and D1 0.69 O2 and D1 0.54 We computed a new quality factor Cover Story, the sum of the above three factors. Cover Story is strongly correlated to both Initial Impression (I1), and Final Impression (I2). This suggests that reviewers believe that a good cover story is important for a good example. What are the concrete features of the best (and worst) cover stories? When comparing examples regarding the Cover Story quality factor, two examples stand out (E26 and E12). E26 introduces a project where several objects interact, and uses the perspective of abstraction and modularization.These principles are discussed using an example from developing

cars, and is then transfered to the source code example, a digital clock. E12 introduces Polymorphism and actually uses several contexts to describe polymorphic behavior. First context is chess pieces, then animals, followed by a hierarchy of staff members. On the other end of the spectrum were examples in which the context did not match the source code. Reviewers often thought, however, that with little changes the source code and context could have been brought in sync, which should improve the overall quality of the example. Qualitatively, we interpreted whether we think the cover stories are really useful for engaging students in thinking about issues which make them more aware and more focused on the topics to learn. Considering the three dimensions mentioned above (structure, execution, and intention), we found that only few examples make a substantial contribution to understand the execution. This may be due to the fact that mostly object oriented concepts were introduced, and therefore only few steps in algorithmic processing were involved. More generally, most examples included only simple methods, only a few method calls, and processing was often done by simple changes of values of numerical variables. Regarding the intention of the program, some cover stories raised questions to be solved with the program, but many examples simply presented a model of the context which could be explored, rather than an answer to a problem (besides the technical problem of writing a program related to that context). The dominant use of cover stories was to help students to understand the structure of the source code, by transferring names from the context to names of entities in the source code, and by structuring the source code so that it is in alignment with a “usual” mental model of the structure of the context.

6.2.4

Types of Contexts

The examples also use a range of contexts. A well-chosen context should ensure that the example is meaningful, inclusive, interesting, well-matched to the learning objective and does not require the introduction of numerous or complex concepts or ideas not related to the learning objective. A number of distinct types of context can be identified in the examples, see Table 12 for a summary: Table 12: Context types for examples. Context Example(s) Simulation of real E1,E2,E4,E6,E21,E26 world objects Real world data E3,E5,E7,E8,E12,E15,E16 Games E9 Graphics E10,E11 Artificial E13,E14 • Simulations of real world objects: for example E1 simulates a ticket machine. • Real world data: objects which might conceivably exist in a computer system. For example, E7 models a bank savings account. • Games: computer games or simulations of real world games. For example, E9 simulates a Nim game, and offers the possibility of extending the code later to cre-

_________________________________________________________________________________________________________ inroads — SIGCSE Bulletin Volume 41, Number 4 —4 2009 December inroads — SIGCSE Bulletin - 135 Volume 41, Number — 2009 December

ate a playable computer game. • Graphics: the aim of the program is to draw a picture of some kind on the screen and the objects are the graphical elements which make up the picture. For example, E10 draws a train composed of shapes on the screen. • Artificial: no discernible relationship with the real world and/or trivial functionality. For example, E13 moves a label across the screen and changes its color for no purpose other than to give a visual indication of the result of executing an if statement. If we compare these results with the scores for total average as well as overall impressions (I1 and I2) in Table 8, we can see that both the artificial examples scored very poorly, while the game example scored well. Neither of the graphics examples scored particularly well, while the other two categories, simulations and real world data, both contain examples which scored well. The quality factor most likely to be affected by the context is D1 (Sense of purpose). Our data in Figure 5 show the average scores for all factors for each example. We can see that the artificial examples score very poorly on OO quality, which brings the overall total down. However, the D1 scores for these examples are the lowest of all the examples even though they are higher than almost all the corresponding object-oriented quality factors.

6.3

Issues Identified by Reviewers

The following issues were identified by reviewers as problems in their additional comments: • “boring” examples • unrealistic simulation of a visual scenario • excessively long explanation or slow progression • use of concepts which have not been covered and will be explained later in the book • example misses out a key concept within the topic • instances variables listed at the end of the class – although this is a recognized coding convention ([3]) all reviewers who commented on this issue regarded it as a problem in a teaching context • example is not OO, all code is in main method of a single class • no example of the way in which a class can be used, for example by a client or test class or testing interactively in an IDE • code complexity unrelated to concept • in examples which use a graphical context, high focus on coding graphics and not much on the modeling aspects • textual or graphical output from code not illustrated • subclasses add no behaviour or override all superclass methods • use of control structure not motivated within OO context • inappropriate choice of control structure for problem statement • variable declarations within loops • unclear comments – the checklist refers to trivial or excessive comments but not to poorly written or unhelpful comments The following issues were identified by reviewers, but no clear preferences emerged from the comments: • Process: incremental improvement or upfront design? • Is the example written assuming the use of a particular

IDE? • Comment style: are JavaDoc comments used? • Other style issues: for example, opening brackets at end of line of code or on next line?

6.4

How Examples Meet Quality Factors

The quality factors capture the desired properties of good OO examples, but they do not explain how these desirable properties can be reached. Speaking in software terms, they provide an analysis of the requirements, but not a design for the solution. In this section we analyze high and low rated examples with regard to certain quality factors. The aim is to provide some more detailed insights into how to meet quality standards for OO examples. Note that these tips are based on ratings of the analyzed set of examples. We looked at 6 factors in more detail: T2, O2, O3, O4, O5, and D1. We mention those features by which some examples stand out against the others – they may seem obvious, but they are too seldom used.

6.4.1

T2 – Readability and Style

The examples were all clearly laid out with consistent formatting and indentation. The main differences in style were found in comments and in the order of class components in the code listings. No examples received a negative average score for this quality factor, suggesting that reviewers did not perceive the way examples dealt with these issues to be major problems. Example programs vary very widely in their style and amount of comments. In some example programs comments amount to more than 50% of the program text (e.g. E5), other don’t have comments at all (e.g. E7). Many textbooks also colour code their code, which can have a tremendous effect on readability. However, this is not always an advantage, since too many and too light colors still can make it quite difficult to discern the various parts of a program’s text. Colour coding is not necessarily positive; E5, for example, contains more comments (in JavaDoc) than many other examples. But still, the actual code stands out, by just two different styles (code in bold and comments in gray). Excessive commenting, in particular heavy use of JavaDoc can also affect D3 negatively, since it imposes extraneous cognitive load (the comments just repeat things and do not give meaningful additional information). Based on their comments it is apparent that reviewers took a negative view of the use of the coding convention which places instance variables after the public methods in the class listings. This structure may impair readability in a book format as variables appear to be introduced “out of thin air” unless the reader scans to the end of the listing. Where a class is also illustrated in a UML diagram, it is probably helpful to match the code structure to the diagram.

6.4.2

O2 – Reasonable State and Behavior

The good examples in our data set demonstrate rich behaviour and complex algorithmic processing through proper design of class relationships. The best examples included: • A polymorphic pay-method, which differed according to the type of staff member, and allowed interaction to the collection of members to pay everybody • A clock display that created rich behavior by using nested method calls of small and simple methods (including the management of “buffer overflow”)


In both cases, simple methods are used to create interesting and rich behavior – in this way demonstrating the usefulness of proper decomposition. The examples of our data set which were rated as bad in O2 produced such rich behaviour with longer and more complicated method bodies, often including hard to follow program flow due to complex selection structures (without in the end building any really surprising behaviour). Some examples which aimed to demonstrate the usefulness of conditionals, selection and looping chose to do this within a single method body, with few or no calls to user-defined methods.

6.4.3

O3 – Reasonable Class Relationships

The reviewers’ scores for this quality factor were influenced by the suggestion that class relationships can be considered excellent if no relationships are present and the example doesn’t call for any. An example which has a single class which is exercised using the interactive capability of an IDE such as BlueJ may have been rated highly because reviewers considered that no relationships are present or because they considered that a class exercised appropriately by a client represents a reasonable relationship. On the other hand, it is reasonable to assume that a single-class example in which all the code is in a main method can only have been rated highly for O3 on the basis of having no relationships. Several reviewers commented that a “not applicable” option would have been useful here. Nevertheless we can identify some features by which some examples stand out. • The example should contain at least one class which models a reasonable entity • A clear example should be given of how to exercise the classes, even if there is only one class. This can be done with a test/driver class with a main method, or by exercising interactively in an IDE which supports this. • Where an example has multiple collaborating classes a well described design process makes the abstractions and the reasons for relating classes clear. • Visualizations of the object structure at runtime, e.g. uml-like instance diagrams, are particularly helpful for relationships. Typical problems related to O3: • Modelling roles with subclasses. In some cases difference between roles and subtypes may seem rather subtle for CS1 students to appreciate, and inheritance makes sense at that level. However, it should not be too difficult to think of situations where subtypes are clearly appropriate. • Subclasses which do not add useful behaviour, or which override all the behaviour of the superclass. • Using protected without explaining the difference between that and other access modifiers.

6.4.4

O4 – Exemplary OO Code

O4 considered exemplary OO code, particularly whether the example is free of “code smells”. Many examples of code smells and other problems the reviewers should look for were explicitly mentioned in the questionnaire, like operations leaving objects in incomplete/inconsistent states, adding the main method to a problem class, long methods or parameter lists, etc. (see Table 4 for details).

Code smells or similar problems were found to some extent in many of the examples we reviewed: • I/O in class (toString, should return a String) • Return string and let caller decide what to do with data (E1) • Constructor does not initialize all instance variables (E5) • No constructor method (E8) • Partial implementation of actual behavior (E9) • Instance variables initialized by declaration, not constructor (E10) • Method definitions without implementations (E13) • Main method in object/problem class (E14, E15) • Duplication of code (E16) The examples with the poorest scores for this quality factor were E14 and E15, each of which consist of a single class which includes a main method, suggesting that this is the code smell which most affects reviewer’s perceptions for O4.

6.4.5

O5 – Object Thinking

Good examples meet the requirements for O5 in a number of ways: • Multiple instantiations of at least one class – this helps differentiate the idea of object from that of class. • Different types of objects, that fulfill a meaningful purpose (e.g. a clock display, which displays two number displays, and therefore is an object that has links to other objects) • Visualization of the object structure at runtime, e.g. UML-like instance diagrams. • Visualization of how method calls change the state of objects, or the overall change of the object structure, e.g interaction or sequence diagrams (only one example used this). • Methods that change attribute values, but in this process some state is taken into account, so that the change of attribute is still a simple operation, but a meaningful one. Note, such interaction of objects, where the interaction depends on state is a good anchor to discuss selection and logical operators. Good examples do this – and interestingly the badly rated examples in this category are those that discuss selection or similar basic language features for algorithmics in a OO context without providing a meaningful example. Typical problems related to O5: • discussing a particular detail of the language, paradigm etc. without a meaningful context. The class should model some real world entity in a way the student can relate the classes and objects to things in the real world. For instance a Die class modeling a single die with a maximum value of 6 is more meaningful at this stage than a Die class that models a collection of dice (without using an array or any other collection to store individual dice). • not instantiating objects, only using static methods to show some language features/constructs. It is better to make use of the changing state of objects during runtime to discuss these features. • not at least indicating how to run the code. Use a Driver class or an IDE that directly exercises classes and objects, rather than leaving the code standing alone.


6.4.6

D1 – Sense of Purpose

To meet the requirements for D1 an example must be about something students understand and should not introduce complexity that does not contribute to the point being made. The examples with the highest scores for D1 model real-world data (E5, E12) or implement a game (E9). Not surprisingly, examples with artificial contexts (E13, E14) score poorly for D1. However, D1 is not simply about context – the problem domain must be inclusive. For example, the traffic light simulation in E2 may not relate accurately to the real-world knowledge of some students as traffic light sequences vary from country to country. Typical problems related to D1: • fixating on the syntax and presenting features / concepts and their implementation in a programming language without motivating their usefulness in actually solving a programming task. For instance, an if statement that simply checks an age may be an OK example for an expert wishing to simply check the syntax, but it provides no sense of purpose. Summaries of the syntax of constructs can be given at the end of the chapter (or text) for quick reference. • making assumptions about students’ background. Frankly, problems that require mathematical expertise are not suitable for most of our students.

6.4.7

Issues Raised by this Discussion

Examples that intend to show relationships between objects may more naturally comply with these factors, but what of control structure examples? Three of the four control structure examples scored poorly on these factors. In these three the control structure was used in just one class: one did not show how to exercise the code, one put the control structure in the main method, and another had a main method in that same class. The “good” control structure example used two classes, and the control structure was clearly the right way to compute something useful (compound interest). It is possible to write meaningful examples that exhibit a sense of purpose and promote object thinking, even for control structures and other constructs that are often presented in a way that is mainly syntactical. The difficulty is that these examples may tend to be longer than simpler, but less useful examples. It is possible to solve this problem however, the control structure example discussed above actually includes two types of loop, each used in an appropriate situation. This is useful both as a pedagogic device for showing variation and also saves space. As discussed previously, our approach has made it very difficult for us to evaluate examples that build throughout the course of a textbook – where the final cognitive load may be high, but since it is introduced gradually that is not a problem.

7.

CONCLUSIONS AND FUTURE WORK

In this report, we have described a project in which a group of seven CS faculty, from six different institutions, used an online checklist to evaluate examples selected from a variety of CS1 Java textbooks and illustrating a variety of topics. The books fell into two broad categories, which we have labeled “object-oriented” and “traditional.” We selected three types of examples: the first user-defined class; collaborating classes; and control structures.

In this paper, we have analyzed 122 reviews from these seven reviewers. The books reviewed included recent editions of 10 widely used Java textbooks.3 They included an equal number of object-oriented and traditional books, and a balanced selection from the three different types of examples. Based on our analysis of these data, we concluded: • Textbook examples generally do not have significant technical problems. • Control structures examples are generally weak from an object-oriented point of view. Writing a control structures example that also has good OO qualities is not impossible, however, as one of the control examples scored quite well on the OO questions. • In general, the OO textbooks fared better than the traditional books. The OO books in fact were just given that name because they present OO material earlier; but it turns out that their examples were also rated more highly on OO qualities. • Reviewers gave an “overall rating” both before and after completing the substantive questions in the checklist. In half the cases, these ratings changed, most often by plus or minus 1, but occasionally by as much as 3 or 4. 34% of the reviews went down, and 16% went up. It seems that the checklist helped reviewers to think more deeply about the examples and might be useful as a quality assurance tool. An instrument of the kind presented here is useful for several reasons. For an individual, it is a way of learning about the qualities example programs must hold to be successful learning and teaching tools. It helps in the construction of examples for educational purposes. By selecting a few examples of key points from each book and evaluating them, faculty could use the checklist to help them in textbook selection. Finally, the community could use it in connection with a repository of code examples. A problem with all repositories is building a community of users, and one of the key issues is quality control [19]: users only take advantage of the repository if they believe the contents will be of value to them. If code examples were accompanied by reviews in some standard format, such as the one proposed here, users would have a way of quickly evaluating the materials they retrieve.

8.

ACKNOWLEDGMENTS

The authors would like to thank all additional reviewers, in particular Michael E. Caspersen, Thomas Johansson, and Lena Kallin Westin who contributed 10 or more reviews each.

9.

REFERENCES

[1] Limesurvey project homepage. http://www.limesurvey.org/, last visited 2009-08-26. [2] AAAS. Benchmarks for science literacy, a tool for curriculum reform, 1989. http://www.project2061. org/publications/bsl/default.htm, last visited 2009-08-26. [3] S. W. Ambler, A. Vermeulen, and G. Bumgardner. The Elements of Java Style. Cambridge University Press, 1999. 3 In total, we reviewed 13 textbooks, but the 122 reviews analyzed in the present paper cover only 10 of those texts.


[4] D. J. Barnes and M. K¨ olling. Objects First with Java. Prentice Hall, 4th edition, 2009. [5] J. B¨ orstler, M. Caspersen, and M. Nordstr¨ om. Beauty and the beast—toward a measurement framework for example program quality. Technical Report UMINF-07.23, Dept. of Computing Science, Ume˚ a University, Ume˚ a, Sweden, 2007. [6] J. B¨ orstler, H. B. Christensen, J. Bennedsen, M. Nordstr¨ om, L. K. Westin, J. E. Mostr¨ om, and M. E. Caspersen. Evaluating oo example programs for cs1. In ITiCSE’08: Proceedings of the 13th Annual Conference on Innovation and Technology in Computer Science Education, pages 47–52, 2008. [7] J. B¨ orstler, M. Nordstr¨ om, L. K. Westin, J. E. Mostr¨ om, H. B. Christensen, and J. Bennedsen. An evaluation instrument for object-oriented example programs for novices. Technical Report UMINF-08.09, Dept. of Computing Science, Ume˚ a University, Ume˚ a, Sweden, 2008. [8] R. Bravaco and S. Simonson. Java Programming – From the Ground Up. McGraw-Hill, 1st edition, 2010. [9] R. Brooks. Towards a theory of the comprehension of computer programs. Intl. Journal of Man-Machine Studies, 18(6):543–554, 1983. [10] B. Brykczynski. A survey of software inspection checklists. ACM SIGSOFT Software Engineering Notes, 24(1):82–89, 1999. [11] J. Burkhardt, F. D´etienne, and S. Wiedenbeck. Object-oriented program comprehension: Effect of expertise, task and phase. Empirical Software Engineering, 7(2):115–156, 2002. [12] CACM Forum. ‘Hello, World’ gets mixed greetings. Communications of the ACM, 45(2):11–15, 2002. [13] CACM Forum. For programmers, objects are not the only tools. Communications of the ACM, 48(4):11–12, 2005. [14] M. Clancy. Misconceptions and attitudes that infere with learning to program. In S. Fincher and M. Petre, editors, Computer Science Education Research, pages 85–100. Taylor & Francis, Lisse, The Netherlands, 2004. [15] R. Clark, F. Nguyen, and J. Sweller. Efficiency in Learning, Evidence-Based Guidelines to Manage Cognitive Load. Wiley & Sons, San Francisco, CA, USA, 2006. [16] M. de Raadt, R. Watson, and M. Toleman. Textbooks: Under inspection. Technical report, University of Southern Queensland, Department of Maths and Computing, Toowoomba, Australia, 2005. [17] H. M. Deitel and P. J. Deitel. Java – How to Program. Prentice Hall, 7th edition, 2007. [18] M. H. Dodani. Hello World! goodbye skills! Journal of Object Technology, 2(1):23–28, 2003. [19] S. H. Edwards, J. B¨ orstler, L. N. Cassel, M. S. Hall, and J. Hollingsworth. Developing a common format for sharing programming assignments. ACM SIGCSE Bulletin, 40(4):167–182, 2008. [20] J. Farrell. Java Programming. Thomson, 5th edition, 2010. [21] A. E. Fleury. Programming in Java: Student-constructed rules. In Proceedings of the

[22]

[23]

[24]

[25]

[26] [27] [28]

[29] [30] [31]

[32]

[33]

[34]

[35]

[36]

[37]

[38] [39] [40] [41]

thirty-first SIGCSE technical symposium on Computer science education, pages 197–201, 2000. M. Fowler. Refactoring: improving the design of existing code. Addison-Wesley Longman Publishing Co., Inc., 1999. M. Goldwasser and D. Letscher. Object-Oriented Programming in Python. Prentice Hall, 1st edition, 2008. M. Guzdial. Centralized mindset: A student problem with object-oriented programming. In Proceedings of the 26th Technical Symposium on Computer Science Education, pages 182–185, 1995. S. Holland, R. Griffiths, and M. Woodman. Avoiding object misconceptions. In Proceedings of the 28th Technical Symposium on Computer Science Education, pages 131–134, 1997. C. S. Horstmann. Big Java. Wiley, 3rd edition, 2008. C. Hu. Dataless objects considered harmful. Communications of the ACM, 48(2):99–101, 2005. E. Lahtinen, K. Ala-Mutka, and H. J¨ arvinen. A study of the difficulties of novice programmers. In Proceedings of the 10th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education, pages 14–18, 2005. J. Lewis and W. Loftus. Java – Software Solutions. Addison-Wesley, 6th edition, 2009. K. Magel. A Theory of Small Program Complexity. ACM SIGPLAN Notices, 17(3):37–45, 1982. K. Malan and K. Halland. Examples that can do harm in learning programming. In Companion to the 19th Annual Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 83–87, 2004. D. Malik and R. P. Burton. Java Programming – Guided Learning with Early Objects. Course Technology, 1st edition, 2009. J. Mason and D. Pimm. Generic Examples: Seeing the General in the Particular. Educational Studies in Mathematics, 15(3):277–289, 1984. J. Ni˜ no and F. A. Hosch. Introduction to Programming and Object Oriented Design Using Java. Wiley, 3rd edition, 2008. M. Nordstr¨ om. He[d]uristics—Heuristics for designing object oriented examples for novices. PhD thesis, Ume˚ a University, Ume˚ a, Sweden, 2009. N. Ourosoff. Primitive types in Java considered harmful. Communications of the ACM, 45(8):105–106, 2002. S. Purao and V. Vaishnavi. Product metrics for object-oriented systems. ACM Computing Surveys, 35(2):191–221, 2003. A. J. Riel. Object-Oriented Design Heuristics. Addison-Wesley, Reading, MA, 1996. D. D. Riley. The Object of Java. Addison-Wesley, 2nd edition, 2006. E. Roberts. Java – An Introduction to Computer Science. Addison-Wesley, 2nd edition, 2008. K. Sanders and L. Thomas. Checklists for grading object-oriented cs1 programs: Concepts and misconceptions. In Proceedings of the 12th annual SIGCSE conference on Innovation and technology in


computer science education, pages 166–170, 2007. [42] W. Savitch. Absolute Java. Addison-Wesley, 3rd edition, 2008. [43] C. Schulte. Block model: An educational model of program comprehension as a tool for a scholarly approach to teaching. In Proceedings of the 4th International Workshop on Computing Education Research, pages 149–160, 2008. [44] J. Sweller and G. Cooper. The use of worked examples as a substitute for problem solving in learning algebra. Cognition and Instruction, 2:59–89, 1995. [45] J. G. Trafton and B. J. Reiser. Studying examples and solving problems: Contributions to skill acquisition. Technical report, Naval HCI Research Lab, Washington, DC, USA, 1993. [46] T. VanDrunen. Java interfaces in cs 1 textbooks. In OOPSLA’06: Companion to the 21st ACM SIGPLAN Symposium on Object-Oriented Programming Systems, Languages, and Applications, pages 875–880, 2006. [47] K. VanLehn. Cognitive skill acquisition. Annual Review of Psychology, 47:513–539, 1996. [48] R. Westfall. ‘Hello, World’ considered harmful. Communications of the ACM, 44(10):129–130, 2001. [49] C.-C. Wu, J. M.-C. Lin, and K.-Y. Lin. A content analysis of programming examples in high school computer textbooks in taiwan. Journal of Computers in Mathematics and Science Teaching, 18(3):225–244, 1999. [50] C. T. Wu. A Comprehensive Introduction to Object-Oriented Programming with Java. McGraw-Hill, international edition, 2008.

with the number of pages (#pp) and factors combined: quality categories (Σ T, Σ O, and Σ D) and the total score for three items related to the cover story (CS, see Section 6.2.3). Table 13: Summary grams. Text Pages E1 BK 18–22 E2 NN 85–92 E3 Ho 85–90 E4 MB 185–192 E5 Ro 190–198 E6 BS 404–408 E7 Ho 438–442 E8 Fa 370–375 E9 NH 123–135 E10 Ro 332–338 E11 Ri 386–395

of mandatory example proCategory FUDC FUDC FUDC FUDC FUDC FUDC OOD-I OOD-I OOD-CC OOD-CC OOD-P

E12

LL

515–527

OOD-P

E13 E14 E15 E16

Ri LL Wu Ho

263–265 241–243 309–310 236–241

CS CS CS CS

Description Ticket machine Traffic signal Bank account Thermometer Student Die Savings account Employee types Simple Nim game Drawing train carts Painting and cutting cakes Firm with different types of staff If If While loop For loop

APPENDIX A. DESCRIPTION OF TEXTBOOKS A full description of the key data of all 13 textbooks used as sources for example programs can be found in Table 15. Column Topic order according TOC lists central topics in their order of detailed discussion. The numbers encode topics in the following way; 1=user defined classes, 2=multiple user defined classes (collaboration/composition), 3=inheritance, 4=polymorphism, 5=primitive data types, 6=control structures, 7=recursion. Entries in column Self declared approach are quotations from the textbooks’ back-cover texts, prefaces, or web pages.

B.

FULL DESCRIPTION OF EXAMPLES

Tables 13 and 14 summarize the key data for the examples used for evaluation. The acronyms in column Category refer to the example categories as described in Section 4.1; FUDC (first example of a user defined class), OOD (multiple used defined classes), and CS (control structures). Category OOD has been further refined into subcategories OODI (first example of inheritance), OOD-CC (first example of class composition/collaboration), and OOD-P (first example of polymorphism).

C.

CORRELATIONS OF QUALITY FACTORS

The correlations we obtained for all factors in the checklist (quality factors T1–D3, I1, and I2) are shown in Table 16. In addition to those we also performed a correlation analysis


Table 14: Summary of optional Text Pages Category E17 GL 214–221 FUDC E18 Fa 104–117 FUDC E19 DD 86–103 FUDC E20 Wu 152–162 FUDC E21 LL 190–194 FUDC E22 GL 309–310 OOD-I E23 GL 321–325 OOD-I E24 DD 442–461 OOD-I E25 BS 523–533 OOD-I E26 E27 E28 E29 E30 E31 E32

BK Ho MB DD Fa GL Sa

56–71 544–556 475–488 384–387 164–165 239-262 472–478

OOD-CC OOD-CC OOD-CC OOD-CC OOD-CC OOD-CC OOD-P

E33 E34 E35 E36 E37 E38

LL Ho MB Fa GL GL

528–536 556–577 351–357 250–252 136–139 185–192

OOD-P OOD OOD CS CS CS

example programs. Description TV controls Employee Gradebook Bicycle Die Sorted set Mailbox Employee types Special remote control Digital clock display Invoice + Address Fruit juice machine Employee + Date School + Address Mastermind game Different types of sales Polymorphic sorting ATM with GUI Fibonacci sequence Nested loops For loop Various loops



Wu [50]

987

1182

Savitch [42]

1040

Ni˜ no&Hosch [34] 769 587

1018

Malik&Burton [32]

Riley [39] Roberts [40]

1204 832

625

Goldwasser&Letscher [23]

Horstman [26] Lewis&Loftus [29]

870

1210 1596

Pages 516

Farrell [20]

Bravaco&Simonson [8] Deitel&Deitel [17]

Text Barnes&K¨ olling [4]

5th

3rd

2nd 2nd

3rd

1st

3rd 6th

1st

5th

1st 7th

Ed. 4th

2(1)

1(0)

2(2) 2(2)

2(2)

3(1)

5(3) 4(2)

6(0)

4(1)

2(1) 3(0)

Examples 2(1)

5,1,6,3,4,2,7

5,6,1,3,4,7 (2?)

1,5,2,6,3,4,6,7 5,6,1,2,7

1,6,2,3,4,7

5,1,2,6,3,4,7

1,5,6,4,3,2,7 5,1,2,6,3,4,7

5,6,1,2,3,7

5,1,2,6,5,3,4

5,6,7,1,2,3,4 5,1,6,2,3,4,7

Topic order according TOC 1,2,5,6,3,4

Y

N

Y N

Y

Y

N N

Y

Y

N N

N

N

Y N

Y

N

N Y

N

N

N N

Keywords in title Object Design/SE Y N

Objects first

Object centric Modern objects first approach; class hierarchies from the beginning —

Objects first

Balanced and flexible approach to mastering objectoriented principles Objects gradual True objectorientation Objects early but gently

—

Fundamentals first Early classes and objects pedagogy

Self declared approach Objects first

The word “object” cannot be found in the book’s description or features on its web page

There are further Java texts by the same author(s) Emphasizes specification before implementation GUI heavy Heavily based on ACM Java Task Force libraries

Very comprehensive; covers much more than just CS1&2 Features “Game Zone” allowing “students to experiment with gameconstruction logic” Python

Comments Based on BlueJ programming environment

Table 15: Summary of textbooks (examples in parentheses are mandatory).

_________________________________________________________________________________________________________ inroads — SIGCSE Bulletin Volume 41, Number 4 —4 2009 December 143 -inroads — SIGCSE Bulletin -- 143 Volume 41, Number — 2009 December #pp I1 T1 T2 O1 O2 O3 O4 O5 D1 D2 D3 I2 ΣT ΣO ΣD

I1 0.36

T1 0.17 0.36

T2 0.14 0.35 0.47

O1 0.52 0.72 0.37 0.28

O2 0.37 0.68 0.32 0.21 0.55

O3 0.17 0.41 0.22 0.21 0.35 0.25

O4 0.40 0.50 0.43 0.34 0.51 0.50 0.21

O5 0.47 0.60 0.22 0.20 0.61 0.56 0.21 0.47

D1 0.33 0.65 0.42 0.44 0.69 0.54 0.47 0.36 0.46

D2 0.44 0.52 0.34 0.22 0.56 0.42 0.36 0.37 0.46 0.52

D3 -0.02 0.37 0.44 0.54 0.30 0.20 0.12 0.28 0.24 0.37 0.37

I2 0.45 0.79 0.33 0.37 0.69 0.67 0.35 0.57 0.64 0.58 0.59 0.35

ΣT 0.18 0.42 0.85 0.87 0.37 0.31 0.25 0.45 0.24 0.50 0.33 0.57 0.41

ΣO 0.53 0.80 0.42 0.33 0.83 0.78 0.53 0.73 0.79 0.68 0.59 0.31 0.80 0.44

Table 16: Correlation matrix for quality factors and quality categories. ΣD 0.33 0.66 0.51 0.51 0.66 0.50 0.41 0.43 0.50 0.81 0.81 0.73 0.65 0.59 0.68

CS 0.48 0.80 0.43 0.36 0.88 0.81 0.42 0.54 0.64 0.87 0.59 0.34 0.76 0.46 0.90 0.77

An evaluation of object oriented example ... - Semantic Scholar

An evaluation of object oriented example ... - Semantic Scholar

Suggest Documents

An Evaluation Instrument for Object- Oriented Example Programs for

An Object-Oriented Deductive Language - Semantic Scholar

An Object-Oriented Frameworks-based ... - Semantic Scholar

An object-oriented optimization system - Semantic Scholar

An Object-Oriented Approach - Semantic Scholar

an object oriented shell for - Semantic Scholar

An object-oriented software development ... - Semantic Scholar

An Object-Oriented Framework with ... - Semantic Scholar

Object Wrapper: an Object-Oriented Interface for ... - Semantic Scholar

Object-Oriented Frameworks - Semantic Scholar

Object Oriented Interoperability - Semantic Scholar

Evaluation of Object-Oriented Reflective Models - Semantic Scholar

Object-Oriented Networking - Semantic Scholar

Evaluation of Object-Oriented Reflective Models - Semantic Scholar

Evaluation of object-oriented design patterns in ... - Semantic Scholar

Object-oriented Simulation and Evaluation of River ... - Semantic Scholar

Query Evaluation in an Object-Oriented Multimedia

An Object-oriented Design and Implementation of ... - Semantic Scholar

An Object-oriented Design and Implementation of ... - Semantic Scholar

Industrial Application of Object-Oriented ... - Semantic Scholar

Foundations of Object-Oriented Languages - Semantic Scholar

Software Cost Estimation of an object oriented ... - Semantic Scholar

An Efficient Measurement of Object Oriented ... - Semantic Scholar

An Object-oriented Design and Implementation of ... - Semantic Scholar