Objects Count so Count Objects!

Ewan Tempero
The University of Auckland, Auckland, New Zealand
[email protected]

Paul Denny
The University of Auckland, Auckland, New Zealand
[email protected]

Andrew Luxton-Reilly
The University of Auckland, Auckland, New Zealand
[email protected]

Paul Ralph
The University of Auckland, Auckland, New Zealand
[email protected]

ABSTRACT

One means to determine whether a student understands the fundamentals of good object-oriented design is to assess designs the student has created. However, providing reliable assessment of designs efficiently is difficult due to the many viable designs that are possible and the high level of expertise required. Consequently, design assessment tends to be limited to identifying the most basic of design problems. We propose a technique—"object counts"—that involves counting the objects created at runtime. This is more efficient than manual grading because the data is gathered automatically, and more reliable than using rubrics because it is based on objective data. The data is relevant because it captures the fundamental property of an object-oriented program—the creation of objects—and so provides good insight into the student's design decisions. This provides support for both summative and formative feedback. We demonstrate the technique on two corpora containing submissions for a typical first assignment of an introductory course on object-oriented design.

ACM Reference Format:
Ewan Tempero, Paul Denny, Andrew Luxton-Reilly, and Paul Ralph. 2018. Objects Count so Count Objects!. In ICER '18: 2018 International Computing Education Research Conference, August 13–15, 2018, Espoo, Finland. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3230977.3230985

1 INTRODUCTION

Assessing a student's object-oriented design is difficult. In part, this is because even small designs require a significant amount of time to understand. For example, Sanders and Thomas [17], in presenting guidelines for educators grading object-oriented programs, observe: "We realise that frequently when grading we are so rushed for time that we look at superficials: does it compile, does it fulfil the test cases, are there some comments?"

They found that working programs sometimes included subtle errors that suggest serious misconceptions, and that it took a great deal of time ("spending hours reading and re-reading each program") to identify these errors.

Assessment is also difficult because we have little in the way of objective criteria to use. Christensen [8] observes that many teachers provide feedback to students such as "Your design is not really object-oriented", but judging a design as either object-oriented or not is "a complex interplay between exact science, experience and craftsmanship, and personal taste".

In this paper we present a means to improve our ability to provide reliable, efficient and consistent assessment of object-oriented designs. It is conceptually simple, yet surprisingly powerful. The technique is based on counting the objects created at runtime. For example, consider an assignment for a simple payroll system that takes a file containing employee records and calculates income and taxes. We might expect a design to have an Employee class. If we ran the program on a file with 100 employee records, we would expect to see 100 Employee objects. If we did see 100 objects, we would have some assurance as to what this class is for. If we saw a different number (e.g. one), it would raise questions as to what the Employee class is really representing. So knowing the object count can give some insight into a design. The question is whether this generalises. Our research goal is therefore: to determine whether counting objects is useful for evaluating object-oriented designs in a software engineering education context.

The rest of the paper is organised as follows. In the next section, we present a more detailed motivating example of the problem we want to address. Section 3 discusses background material and related work. In Section 4, we discuss the counting objects approach in more detail, and Section 5 describes our methodology. We then present results from applying the technique to the submissions for two assignments requiring development of a small program (Section 6). We discuss the results in Section 7 and present our conclusions in Section 8.

2 MOTIVATION

Our aims are to improve the efficiency of assessing the quality of the object-oriented designs of students' programs, and to improve the quality of both formative and summative ("marking") assessments. As an example of the kind of assignment we are concerned with, consider the need for a program that will take two arguments from the command line: a dataset and a query.

Figure 1: Simplified class diagram for design1078 for the dependency analysis assignment. "C" means class and arrows indicate "uses" relationships.

The program has to apply the query to the dataset and output the results to standard output. The dataset consists of a set of records, one per row. Each record describes a "dependency" from a source "module" to a target "module". A dependency can have various properties, and different queries may apply only to dependencies that have those properties. An example of a query is to report, for each unique module, how many dependencies it is the source for (DepCount). Another example is to report, for each unique module, how many unique modules it has a dependency with (FanOut). The specific details of the assignment are available on the project website [20].

This assignment is an example of the standard assignment set for the introductory course on object-oriented design at The University of Auckland for many years. It can be completely implemented in only a few hundred lines of Java code; however, part of the assessment criteria given to the students is that their solutions must demonstrate their understanding of object-oriented design.

The difficulty we have faced with assessing this assignment is exactly that raised by Christensen. We see designs that to us are "not object-oriented", but providing feedback is difficult. Students who do not understand the object-oriented paradigm can neither produce object-oriented designs nor understand our criticism of their designs. We want to be able to provide concrete and specific feedback. We also want to be able to gather the data necessary to provide the feedback in less time than it would take to manually review all of the code.

Consider Figure 1, which shows a simplified UML class diagram for a design for the above assignment. This design, D1078, has several classes that interact with each other and so might be considered acceptable. However, further analysis raises some questions. The Counter class is not an obvious abstraction for the problem being solved, and in fact it has essentially all of the functionality, being larger (444 lines) than the other classes combined (273 lines in total). Further, the Query class, which does seem like a reasonable abstraction based on its name, does not in fact represent a query, but rather determines what query is required and invokes the appropriate method from the Counter class. These observations can be made fairly easily from brief inspections of the source code.

What is not so easy is assessing designs such as D1022 (Figure 2) and D1053 (Figure 3). These designs and D1078 have the same functionality (determined using automated tests).

These designs look better than D1078. There are more classes, and the class names correspond to reasonable abstractions for the problem being solved. Both use inheritance, with D1053 having an interface and D1022 having an interface and classes extending one another. The designs are obviously different from each other, but it is not clear how these differences might translate into different marks—it is not even obvious how to rank the two. It would seem reasonable to give both of these designs full marks.

However, assessing a submission based on just a (very much simplified) UML class diagram is problematic. Often a diagram is not available or lacks important details. There is also a concern that the diagram may contain errors, or have a misleading presentation. We need to identify the main design decisions that have been made in each case. Traditionally, this is done by examining the source code and piecing together the overall design. As noted in the introduction, Sanders and Thomas observed that this requires significant time and experience. The subjective nature of design quality means we cannot expect a fully-automatic procedure; nevertheless, having objective data that helps us more efficiently and effectively identify the main design decisions would be of great value. The question is, what data should we use? This paper explores whether object counts are useful.

3 BACKGROUND AND RELATED WORK

3.1 Object-Oriented Design

To assess object-oriented designs we must be clear as to what criteria we are assessing against. There have been many discussions as to what "object-oriented" means, but they generally discuss what this means in terms of programming language features, not designs. For example, Kay, who arguably has the best claim to having invented the term, states "I invented the term object-oriented, and I can tell you that C++ wasn't what I had in mind" [13], clearly referring to the design of the language. Stroustrup's discussion also applies to the language [19].

However, most commentators on object-oriented programming (including Kay and Stroustrup) are aware of the distinction between the language used to express the design of a program and the behaviour of the program as it executes. This view is commonly presented in textbooks. For example, Budd [5] comments "Working in an object-oriented language [...] is neither a necessary or sufficient condition for doing object-oriented programming. [...] the most important aspect of OOP is the creation of a universe of largely autonomous interacting agents." That is, the result of executing an object-oriented program is the creation of objects that send messages to each other.

3.2 Teaching Object-Oriented Design

How we assess design depends on what we teach about design. There is surprisingly little in textbooks for object-oriented programming on how to develop an object-oriented design, perhaps an indication of the difficulty of knowing what to teach. Cross reviewed studies of design cognition from an interdisciplinary and domain-independent view, and observed "Designers appear to be 'ill-behaved' problem solvers, in that they do not spend much time and attention on defining the problem." [9]

Figure 2: Simplified class diagram for design1022 from the CA corpus. “C” means class, “I” interface, open arrow heads indicate inheritance, and other arrows indicate “uses” relationships.

Figure 3: Simplified class diagram for design1053 from the CA corpus. "C" means class, "I" means interface, open arrow heads indicate inheritance, and other arrows indicate "uses" relationships.

He concluded "In design education we must therefore be very wary about importing models of behavior from other fields." Model curricula provide little help. Ralph examined model curricula from both the Association for Computing Machinery and the Association for Information Systems and concluded "[both model curricula] insufficiently cover how to generate design candidates." [16]

Textbooks mostly focus on what the different language features are and how to use them to create classes, but say little on how to decide which classes to create (e.g. [10, 23]). Of those that do discuss design, the most common technique presented seems to be the "noun/verb" method. For example, Barnes and Kölling [2, p.394] say "Classes in a system roughly correspond to nouns in the system's description. Methods correspond to verbs" (emphasis theirs). Budd [5] advocates responsibility-driven design [22]. This approach first identifies the responsibilities, grouping them into roles, and then identifies the objects that should play those roles. Beck and Cunningham introduced the Class-Responsibility-Collaborator (CRC) card technique for teaching object-oriented design [3]. Both Budd, and Barnes and Kölling, use this technique.

Christensen [8] identified three perspectives used in textbooks to define what an "object" is: the language perspective, which emphasises how the language constructs are used to describe objects; the model-centric perspective, where objects are part of the wider context (the model); and the responsibility-centric perspective, focusing on roles and responsibilities. He demonstrated how different perspectives lead to different designs, indicating how the choice of perspective in teaching impacts the kinds of designs students will produce.

3.3 Assessing Object-Oriented Design

Although several studies have investigated assessment of program quality (e.g. [18]), they have typically focused on code syntax and structure, such as Documentation, Presentation, Algorithms (comprising Flow and Expressions) and Structure (comprising decomposition and modularization), rather than the design of an object-oriented program.

Armstrong reviews research that characterises the fundamental ideas of object-oriented programming, and categorises object-oriented concepts into Structural elements (Abstraction, Class, Encapsulation, Inheritance, Objects) and Behavioural elements (Message Passing, Methods, Polymorphism) [1]. Sanders and Thomas developed a checklist for designing and grading object-oriented programs based on Armstrong's list of object-oriented concepts. Their checklist for "indications that a student understands basic OO concepts" has 23 items. One item is "Multiple instances of same class"; however, they do not suggest counting the number of instances created.

Studies of student understanding of object-oriented concepts have found several misconceptions: students do not have a clear understanding of the difference between classes and instances [11, 12]; they have difficulty modelling the real world as objects; they sometimes produce solutions that look like a procedural programming design rather than an object-oriented design; and they separate the data from the code that acts on that data [15]. In other words, they are not modelling objects as entities that have both data and behaviour.

In an evaluation of the object-oriented examples used in textbooks, Börstler et al. [4] identify five object-oriented qualities that are intended to capture accepted guidelines of object-oriented design. They found that the textbook examples typically had reasonable abstractions that plausibly modelled the problem domain, but few examples explicitly supported the idea that object-oriented programs are a collection of collaborating objects.

Turner et al. [21] asked students to engage in code review of object-oriented programs (in Java) developed by their peers. They provided the students with a rubric focusing on six dimensions (Style, Functionality, Decomposition, Encapsulation, Abstraction, and Testing Completeness). The study identified several misconceptions relating to the nature of object-oriented design, which were grouped into the categories of Decomposition, Encapsulation and Abstraction. Problems falling into both Decomposition and Abstraction relate to the design of the classes and how they were used to model the problem domain.

The difficulty with checklists or rubrics is that they typically have a number of items that need to be checked, which can be time-consuming to apply. Often items are subjective, cannot be treated in isolation, and assessing their interaction can require considerable expertise.

3.4 Software Metrics

When seeking an objective means on which to base assessment, it would seem reasonable to consider software metrics. However, there is little available describing experience with the use of such metrics, and even less on using them to assess design. For example, Cardell-Oliver described the use of metrics for summative and formative assessment, and also as a diagnostic tool to identify students' learning styles [6]. However, she did not address assessment of design.

Many software metrics have been proposed, including the so-called "CK metric suite" proposed by Chidamber and Kemerer [7], and many more based on the CK metrics. The difficulty with using such metrics is that they are applied only to a single class. For example, the CBO metric from the CK metric suite is intended to measure coupling for a class. To use this to assess a design requires gathering measurements for each class in the design and then interpreting those individual measurements. The interpretation seems as difficult as understanding the design by reading the code. The CK metric suite has six metrics, so a design with 10 classes would yield 60 measurements to be interpreted, making interpretation even more difficult.

Another issue is that measurements from most metrics are meant to be interpreted as "larger means lower quality." Typically, thresholds are provided, with the interpretation that measurements exceeding the thresholds indicate a poor design. However, most student assignments are quite small. Since the measurements often depend on program size, the corresponding measurements are often very low and so do not exceed the thresholds.

4 ASSESSING OBJECT-ORIENTED DESIGNS

4.1 Counting Objects

Our goal is to improve our ability to provide good quality summative and formative assessment. This requires efficiently identifying the important design decisions. One means of improving efficiency is to have a procedure for automatically gathering relevant data about a design. As noted in Section 3, existing software metrics apply only to individual classes, and the measurements require careful interpretation by experienced software designers. We need data that reflects the whole design, has a clear interpretation, and indicates the decisions that it encompasses.

We start from first principles. As noted previously, any object-oriented design should result in an executing system consisting of objects sending messages to each other. It follows that the objects that get created must reflect important design decisions that have been made. The basis of our assessment procedure can be simply stated as: count the number of objects created for each class during execution. This data, which we will colloquially refer to as "object counts", captures an important class of design decisions. The objects, and their interaction, must provide the prescribed functionality. The choice of objects must indicate how the designer has chosen to distribute the functionality in the design, in particular the roles the objects are meant to play in providing the functionality. (A minimal sketch of how such counts could be recorded appears at the end of this subsection.)

To count objects, the program must be executed and so inputs must be supplied. Like testing, the inputs used should provide reasonable coverage of the program. Unlike testing, we do not need to cover all possibilities; we are only concerned with inputs that cover all the cases where objects are created. Our experience is that a small number of inputs is sufficient for the kinds of assignments we address. The object counts are recorded for each input.

The object counts for the design are then compared with what might be reasonably expected in the problem domain. What might be reasonably expected is domain dependent, and so this requires that the instructor have a good understanding of the domain. Even within a given domain, there may be more than one interpretation of a reasonable number of objects. The instructor must interpret the object count with respect to such possibilities. This interpretation then forms the basis for developing the final summative or formative feedback. We illustrate the proposed approach below and provide more detail in Section 6.

Our assessment procedure has the following benefits:
• The data can be acquired automatically and quickly.
• The data indicates an important class of design decisions, decisions we argue are central to what it means to be "object-oriented."
• The interpretation of the data is made with respect to the problem being solved.
• The data can be used to justify feedback given to students.
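To make the kind of data we gather concrete, the following is a minimal, hand-instrumented sketch: a shared registry that each constructor of interest increments, printed at the end of a run. The Dependency class, the record method, and the example values are hypothetical and only illustrate the form of the data (per-class instance counts for one execution); our actual tool, described in Section 5.1, gathers the same data without modifying student code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical hand-instrumented counter: constructors call ObjectCounts.record(this),
// and the per-class totals are printed at the end of one execution (one input).
final class ObjectCounts {
    private static final Map<String, Integer> COUNTS = new TreeMap<>();

    static synchronized void record(Object o) {
        COUNTS.merge(o.getClass().getSimpleName(), 1, Integer::sum);
    }

    static synchronized void dump() {
        COUNTS.forEach((cls, n) -> System.out.println(cls + "\t" + n));
    }
}

// A hypothetical student class for the dependency analysis assignment.
class Dependency {
    final String source;
    final String target;

    Dependency(String source, String target) {
        this.source = source;
        this.target = target;
        ObjectCounts.record(this); // one increment per object created
    }
}

public class CountingDemo {
    public static void main(String[] args) {
        // One Dependency object per input record: with the 93-row dataset used in
        // Section 6, we would expect this class to show a count of 93 per execution.
        new Dependency("app.Parser", "app.Lexer");
        new Dependency("app.Parser", "util.Log");
        ObjectCounts.dump(); // prints: Dependency  2
    }
}
```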

4.2 Illustration

We illustrate our procedure with the examples from Section 2. First, we must decide on the inputs. The analysis program takes two inputs, the dataset and the query. The variation in the inputs comes from the contents of the dataset and the choice of query. We created a dataset whose records together represent all the variation we might expect in the data, which means we can use the same dataset for all executions. We then execute the implementation of the design seven times, once for each query, with that dataset.

Table 1 shows the object counts for D1078. The table shows the counts for the different inputs in columns corresponding to the different queries. The left-most column shows the names of the classes for which objects are created. For brevity, package names have been omitted. We also omit classes for which no objects are created. For example, CLI has only a single static method, the entry point method main, and so has no objects created (see Figure 1). As the data shows, for D1078 at most one object from each class is ever created.

Compare the object counts of D1078 to those of D1053 (Table 2). For this design, 95 objects are created for each input. To interpret this number, the assessor needs to be aware of the nature of the input. In particular, the dataset used has 93 rows (recall that each row represents a dependency). From this alone we can hypothesise that the Deps class represents an individual dependency rather than the collection suggested by its (plural) name, since there are always 93 instances of it created. If this is the case, then "Deps" is perhaps not the best choice of name. From the classes for which a single object is created for each input, we might also hypothesise which classes are responsible for which query. While this may have been guessed from their names, the object counts provide evidence to support that guess. Another point to note from the data is that several classes in Figure 3 have no object counts, such as NameChecker, SharedActions, LineProcessor, and CLI. There are two possible explanations. One is that these classes only have static methods (as was the case for CLI in D1078). The other is that they are involved in an inheritance relationship with other classes—more on this in Section 5.1.

Table 3 shows the object counts for D1022. This data reveals a significant difference between this design and D1053: this design does not have a representation for a dependency. As with D1053, we can hypothesise the relationship between some classes and the queries, but given that no more than one object is created per class for any execution, the decomposition appears more procedural than object-oriented. An argument can be made that D1022's decomposition based on objects is better than that of D1078.

We have made a number of observations about these designs without having to inspect any code. We have learned much about the similarities and differences between the designs, and gained indications of the key design decisions made in each case, based solely on knowing the number of objects created from each class. We know of no other software metric that can easily provide this kind of insight. That said, assessors may still need to inspect some of the code. The names of the classes play a role in the interpretation, but they could be inappropriately named. There are also some aspects of the design that are not visible from just counting objects. For example, abstract classes and interfaces are never instantiated and so would not be represented in the data.

The assessment for each design still has to be determined.

How this is done depends on the assessment criteria for the assignment and the learning outcomes for the course. For example, if the criterion is only to demonstrate a basic understanding of how to distribute functionality across classes, then all three designs may be given the same mark. If the criterion is to demonstrate how to create objects that play a recognisable role in the problem domain, then the designs would be ranked (lowest) D1078, D1022, D1053 (highest). These decisions can be justified directly from the object count data in terms of what it means to be an object-oriented program.

Perhaps of more value is how object counts can help provide formative feedback. The student who produced D1078 appears to be struggling with the general idea of how to create a set of objects that collaborate to provide functionality. For D1022, the student seems to understand the basics but missed a good opportunity by not representing dependencies. For D1053, the student might benefit from giving more thought to the names of classes.

5 METHODOLOGY

Our goal is to investigate whether object counts are useful for assessing students' designs. We address this with three specific research questions, as discussed below.

For any measurement to be useful for assessment, it needs variation. Student assignments for courses introducing object-oriented programming are typically small, on the order of a few hundred lines of code. They are often tightly constrained. It is therefore possible that there can be little variation in the choice of designs, and so little variation in the number of objects that might be created. Our first research question is therefore:

RQ1 Is there significant variation of object counts for student designs?

Furthermore, for object counts to be useful, differences in object counts should reflect differences in designs. Conversely, similar designs should have similar object counts. Our second and third research questions are therefore:

RQ2 Do significantly different object counts typically indicate significantly different design decisions?

RQ3 Do similar object counts typically indicate similar design decisions?

To investigate our research questions we need to be able to count objects. This requires a measurement instrument and something to measure. For this first study, we consider designs implemented in the Java programming language. The remainder of this section describes how we make the measurements and what we measure.

5.1 Counting Objects

The obvious starting point for counting objects is to count the number of times a constructor is executed. However, this has some problems.

One problem is that many constructors executed in a Java program were not written by the developer. If the design uses classes from the standard library, or any third-party libraries, then constructors for those classes will be executed. Since we want to assess the design as determined by the student, such classes should not be assessed, and so their objects should not be counted.

Use of inheritance in a design also complicates the analysis.

Table 1: Objects per class for design1078 (columns are the seven queries; the last column is the row total)

Class             Aggregates  DepCount  FanIn  FanOut  Static  Summary  Uses  Total
Counter                    1         1      1       1       1        1     1      7
FileReader                 1         1      1       1       1        1     1      7
Query                      1         1      1       1       1        1     1      7
ValueComparator            0         1      0       0       0        0     0      1
Total                      3         4      3       3       3        3     3

Table 2: Objects per class for design1053 (columns are the seven queries; the last column is the row total)

Class        Aggregates  DepCount  FanIn  FanOut  Static  Summary  Uses  Total
Aggregates            1         0      0       0       0        0     0      1
DepCount              0         1      0       0       0        0     0      1
Deps                 93        93     93      93      93       93    93    651
FanIn                 0         0      1       0       0        0     0      1
FanOut                0         0      0       1       0        0     0      1
Static                0         0      0       0       1        0     0      1
Summary               0         0      0       0       0        1     0      1
Uses                  0         0      0       0       0        0     1      1
queryCall             1         1      1       1       1        1     1      7
Total                95        95     95      95      95       95    95

Table 3: Objects per class for design1022 (columns are the seven queries; the last column is the row total)

Class        Aggregates  DepCount  FanIn  FanOut  Static  Summary  Uses  Total
Aggregates            1         0      0       0       0        0     0      1
DepCount              0         1      0       0       0        0     0      1
DepCount$1            0         1      0       0       0        0     0      1
FanIn                 0         0      1       0       0        0     0      1
FanOut                0         0      0       1       0        0     0      1
ReadFile              1         1      1       1       1        1     1      7
Static                0         0      0       0       1        0     0      1
Summary               0         0      0       0       0        1     0      1
Uses                  0         0      0       0       0        0     1      1
Total                 2         3      2       2       2        2     2

If a class Child inherits from class Parent, then when a constructor for Child executes it must also execute a constructor from Parent. So, while only one object is created, two constructors are called. Thus, simply counting constructor calls can result in an incorrect measurement. Instead, constructor calls of ancestor classes should not be counted.

Another consideration is that in Java all classes must have a constructor, and if the developer does not provide one, then the compiler generates one, the so-called default constructor. Objects from such classes will be created, but the constructor appears only in the compiled code (bytecode) and not in the source code.

Java also has the enum type. The way enums are implemented means that all of an enum's values are created as objects when the enum class is initialised. If a design has an enum with many values, the number of objects created for the enum can be larger than the number of objects from all the other classes combined, obscuring the roles played by those classes. As enums are meant to represent simple constants, we have decided not to count such objects.

The data presented in this paper was gathered with a tool developed using AspectJ [14], which is used to instrument the code to record when constructors are called. This allows the tool to capture every constructor call and execution. It then post-processes the results to remove unwanted data (e.g. standard library constructor calls) and to correctly count objects created in the context of inheritance, as discussed above. To deal with the issue of default constructors, the tool uses so-called post-compile-time (binary) weaving in AspectJ, meaning it is the bytecode of the classes, rather than the source code, that is instrumented. The tool is provided on the project website [20].
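For concreteness, the sketch below shows what such instrumentation might look like as a code-style AspectJ aspect. It is not the authors' tool, just a minimal illustration under stated assumptions: the student package name (assignment..) is a placeholder, and reporting is reduced to a shutdown hook. The advice counts an object only when the declaring class of the executing constructor equals the object's runtime class, which handles constructor chaining under inheritance, and it skips enum types; with post-compile (binary) weaving, compiler-generated default constructors are also matched because they exist in the bytecode.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: counts objects created by classes in the (hypothetical)
// student package "assignment", attributing each object to exactly one class.
public aspect ObjectCountingAspect {

    private static final Map<String, AtomicLong> COUNTS = new ConcurrentHashMap<>();

    static {
        // Print the per-class counts when the run (one query on one dataset) ends.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                COUNTS.forEach((cls, n) -> System.err.println(cls + "\t" + n))));
    }

    // Every constructor execution in student code; library classes are excluded
    // simply by not matching their packages.
    pointcut studentConstruction(): execution(assignment..*.new(..));

    after() returning: studentConstruction() {
        Object created = thisJoinPoint.getThis();
        Class<?> declaring = thisJoinPoint.getSignature().getDeclaringType();
        // A subclass constructor always chains to a superclass constructor, so one
        // object triggers several constructor executions. Counting only the execution
        // declared by the object's runtime class records each object exactly once,
        // against its most specific class. Enum constants are deliberately ignored.
        if (created != null
                && created.getClass() == declaring
                && !declaring.isEnum()) {
            COUNTS.computeIfAbsent(declaring.getName(), k -> new AtomicLong())
                  .incrementAndGet();
        }
    }
}
```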

5.2 Corpora

Two corpora, each consisting of multiple designs for a small system, are used in this study. They come from an assignment in two offerings of an undergraduate course in software engineering at The University of Auckland. The course introduces object-oriented design principles, and the assignment is the first in which students try to apply the principles. The stated main criterion for these assignments is that the submitted designs demonstrate that the student understands the principles of object-oriented programming.

One corpus, CA, has 86 submissions for the dependency analysis assignment described in Section 2 (D1078, D1022, and D1053 are members). Of these, 14 did not pass all of the automated tests, but in all but 4 cases the faults were assessed as minor (all are included). The object count data for CA was gathered by executing each submission seven times, once for each different query, on the same dataset of 93 lines of data, that is, 93 dependencies.

The second corpus, CB, also has 86 submissions, for an assignment that took two arguments: a file containing timesheet information and a command describing how to process the timesheets. The object count data for CB was gathered by executing each submission four times, once for each processing command, on the same dataset of 24 lines of data, that is, 24 timesheets. The majority of the submissions satisfied the correctness requirements.

These corpora are available from the project website [20].

Figure 4: Average objects created by CA designs per input, in order of average objects. (One outlier omitted.)

6 STUDY RESULTS

Figure 4 shows the results for the CA corpus. The values shown are the average object counts over the seven executions for a single design. The designs are ordered by average object count. The variation between executions is typically very small, so the average counts are representative of the design behaviours. The range for the chart is 0–1,292. The designs fall into 4 main groups: the first group (G1, 43 designs) has a range of 0–15, the second (G2, 24 designs) 86–102, the third (G3, 6 designs) 188–208, and the fourth (G4, 6 designs) 280–284. Not shown is an outlier (D1009), with an average of 17,847, because it would make the chart unreadable.

Figure 5 shows the results for the CB corpus, again showing the average number of objects per input. Three main groups are evident, with averages of 24–34 (38 designs), 48–56 (20 designs), and 76–78 (6 designs), although the groupings are less clear-cut than for CA.

Regarding RQ1, whether there is variation in the number of objects created by student designs, the answer is clearly yes.

To answer RQ2 and RQ3, further discussion of the problem domain is needed. Due to space constraints we will focus on CA. There are many choices a student could make, but two possible choices are: explicitly represent a dependency (with a class) or not, and explicitly represent a module or not. This gives four design options: no representation for either dependency or module (D1), representing dependency but not module (D2), representing module but not dependency (D3), or representing both (D4). Translating this into expectations for object counts, D1 would give zero objects per dependency, D2 gives one object, D3 two objects, and D4 three objects.

Figure 5: Average objects created by CB designs per input, in order of average objects.

This means that, for a dataset with 93 dependencies, we would expect a minimum of 93 objects for each query for design option D2, 186 for D3, and 279 for D4. For D1, the number of objects would be independent of the size of the dataset (but may depend on the query). If the rest of the required functionality is provided by just a few objects (e.g. 2–10), then the expected numbers of objects for the four design options match the groups G1, G2, G3, and G4 identified above in the data. This indicates that design choices are reflected in the object counts, answering RQ2.

Most designs show some variation across inputs, so there are not many with exactly the same total object counts. Three that do are D1053, D1065 and D1084, with a total object count of 665, corresponding to exactly 95 objects for each input. From the analysis above, we hypothesise that these all chose design option D2. All are similar in nature (see Table 2 for D1053). All have a class that represents a dependency (called Dependency in both D1065 and D1084). All have classes providing the functionality for the individual queries (the relevant classes are often named after the query), and all are similar in what they do. One point of difference is that D1065 and D1084 provide the required output in a shared parent class (named Query) whereas D1053 implements an interface (also Query) and each implementation provides its own output. So while the designs are similar, there is some variation. This indicates that designs with the same object count are not identical, but are generally similar (RQ3).

The above analysis could have been done without the object counts; however, it would have taken careful analysis of the source code. The object counts make it obvious, for example, that Deps in D1053, despite the plural, likely plays the same role as the classes named Dependency in the other designs. This is immediately apparent because 93 objects are created from each of these classes for all inputs. Having established this hypothesis, it can be quickly confirmed by scanning the source code.

Another advantage of object counts is that they highlight designs with unusual characteristics. For example, the design with the largest number of objects (1,292) in Figure 4 is D1050. For six of the seven inputs, the class FileLine has 930 objects created. The name suggests (confirmed by inspecting the code) that the class is intended to represent one line of data. There is no obvious explanation for why so many objects would be needed, suggesting there may be a problem with the design. In fact, it is a consequence of a complicated interaction between three classes (one an abstract superclass) that suggests confusion on the part of the student as to which class has the responsibility for reading the file.

Not shown in Figure 4 is D1009, which averages nearly 18,000 objects per input. The class that contributes most of these objects is Dependency (more than 17,000 for all inputs). Examining the source code for where Dependency objects were being created revealed that the only place was in the Dependency class itself—an extremely questionable design!

Neither of these designs was noticed as having such unusual characteristics when they were originally graded (using traditional grading processes based on rubrics), and even when the code was scrutinised for this paper it was not evident that they would behave so strangely. Without the object count data to point the way, we would not have been able to properly assess them, illustrating the usefulness of counting objects.

7 DISCUSSION

Our contribution is that determining how many instances of each class are created—counting objects—provides useful information for assessing a design. We base this on the view that an object-oriented program must create objects. The objects that get created are concrete indications of the decisions the student has made regarding how responsibilities have been assigned to which roles, which is necessary for understanding the design. It is like assessing a house by looking at the house, rather than trying to assess it from the blueprints. The objects that are created are the reification of the design.

The tool we have developed is a prototype, built with no thought to performance, yet it is able to gather the data for a corpus in a matter of seconds. We have applied the technique to other corpora, and to a larger design (JUnit), and encountered no performance issues.

We are not suggesting that "more objects means more object-oriented"! As D1050 and D1009 from CA indicate, "too many" objects can be a sign of a poorly conceived design. The object counts must be interpreted in the context of the problem being solved. In the case of CA, we interpret the object counts knowing there are 93 lines of data. This allows us, based just on the object counts, to hypothesise what design decisions have been made (e.g. D1–D4).

Nor are we suggesting that assessment should be done solely on the basis of object counts. The source code should still be examined. The difference is that instead of trying to work out from scratch what design decisions have been made, we inspect the code to confirm a hypothesis about the design based on the object counts. For example, D1014 of CA typically has 95 objects per input, suggesting it uses the D2 design option. However, the class with 93 objects is Module. We might hypothesise that this is a poorly named class and examine it to confirm this hypothesis. In fact, we find a mix of module, dependency, and query elements, providing a good source of information for formative feedback to the student.

What the object counts do is provide guidance as to what to look for in a design, such as the Module class for D1014, the FileLine class for D1050, or the Dependency class for D1009. Manually inspecting the source code may or may not reveal these design problems. In particular, D1050 looks reasonable if the use of FileLine is not closely examined.

Counting objects provides more useful information about a design than just statically considering the abstractions represented in the design. An inappropriate choice of name can mislead as to what abstraction a class represents, whereas the object count can convey more clearly what is intended (e.g. Deps for D1053). Furthermore, a design can have abstractions for which no objects are created; for example, D1053 has an interface Query. Focusing just on those classes that create objects simplifies the assessment task.

A consequence of the decisions described in Section 5.1 is that superclasses do not get counted. This may seem problematic if we regard inheritance as the defining characteristic of object-oriented programming. However, a program that creates few objects despite extensive inheritance may indicate a lack of understanding of object-oriented design. We advise assessing the use of inheritance separately from, and after, inspecting object counts. For example, D1022 from CA has 10 classes and an interface. However, it creates only two objects for six of the inputs, and three for the seventh. The interface in fact adds no value to the design, which overall is more procedural than object-oriented.

We define an object-oriented program as one that, when executing, has objects sending messages to each other; however, we focus on the objects, not the messages. While there may be value in considering the messages, and we originally planned to do so, we have found that there is considerable value in just looking at the object counts. Small experiments we have done suggest that interpreting message data is quite difficult, and so for now we are concentrating on just the objects.

A limitation of object counts is that they require an executable solution. In our case, we require that submissions be executable, in part because we use automatic marking to assess correctness. However, in courses on object-oriented design with no requirement to produce code, object counts may be of little help.

The assignments we have presented here are very similar in nature, raising the question of how generalisable object counts may be. However, we have applied object counts to two other assignments of similar size to CA and CB (one a board game, the other the server side of an interactive application) from four different student cohorts, and found the data to be as effective as our discussion here indicates. The measurements show similar degrees of variation to those seen in CA and CB, and have shown similar value for understanding the designs.

8 CONCLUSIONS

We have presented how object counts can be used to assess object-oriented designs. The assessment is based on objective data, meaning it is reliable. The data is gathered automatically, so it can be collected efficiently. A particular benefit is that the characteristics of a design revealed by its object counts reflect the design decisions made, and so provide an objective basis on which to give feedback to students. We believe object counts can be used for more than just student assessment, and we plan to explore their use in other contexts in the future.

REFERENCES

[1] Deborah J. Armstrong. 2006. The Quarks of Object-oriented Development. Commun. ACM 49, 2 (Feb. 2006), 123–128. https://doi.org/10.1145/1113034.1113040
[2] David J. Barnes and Michael Kölling. 2006. Objects First with Java: A Practical Introduction Using BlueJ (3rd ed.). Prentice Hall.
[3] Kent Beck and Ward Cunningham. 1989. A Laboratory for Teaching Object-Oriented Thinking. In Proceedings of OOPSLA '89: ACM Conference on Object-Oriented Programming Systems, Languages and Applications. 1–6.
[4] Jürgen Börstler, Marie Nordström, and James H. Paterson. 2011. On the Quality of Examples in Introductory Java Textbooks. Trans. Comput. Educ. 11, 1, Article 3 (Feb. 2011), 21 pages. https://doi.org/10.1145/1921607.1921610
[5] Timothy A. Budd. 2001. An Introduction to Object-Oriented Programming (3rd ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
[6] Rachel Cardell-Oliver. 2011. How Can Software Metrics Help Novice Programmers?. In Proceedings of the Thirteenth Australasian Computing Education Conference - Volume 114 (ACE '11). Australian Computer Society, Inc., Darlinghurst, Australia, 55–62. http://dl.acm.org/citation.cfm?id=2459936.2459943
[7] S. Chidamber and C. Kemerer. 1994. A Metrics Suite for Object Oriented Design. IEEE Transactions on Software Engineering 20, 6 (1994), 476–493.
[8] Henrik Baerbak Christensen. 2005. Implications of Perspective in Teaching Objects First and Object Design. In Proceedings of the 10th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (ITiCSE '05). ACM, New York, NY, USA, 94–98. https://doi.org/10.1145/1067445.1067474
[9] Nigel Cross. 2001. Design cognition: results from protocol and other empirical studies of design activity. In Design Knowing and Learning: Cognition in Design Education, C. Eastman, W. Newstetter, and M. McCracken (Eds.). Elsevier, Chapter 5, 79–103.
[10] Paul J. Deitel and Harvey Deitel. 2015. Java How To Program (Late Objects) (10th ed.). Pearson.
[11] Anna Eckerdal and Michael Thuné. 2005. Novice Java Programmers' Conceptions of "Object" and "Class", and Variation Theory. In Proceedings of the 10th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (ITiCSE '05). ACM, New York, NY, USA, 89–93. https://doi.org/10.1145/1067445.1067473
[12] Simon Holland, Robert Griffiths, and Mark Woodman. 1997. Avoiding Object Misconceptions. In Proceedings of the Twenty-eighth SIGCSE Technical Symposium on Computer Science Education (SIGCSE '97). ACM, New York, NY, USA, 131–134. https://doi.org/10.1145/268084.268132
[13] Alan Kay. 1997. The computer revolution hasn't happened yet. Keynote at OOPSLA. http://files.squeak.org/Media/AlanKay or https://www.youtube.com/watch?v=oKg1hTOQXoY, timecode 10m 34s.
[14] Gregor Kiczales, Erik Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, and William G. Griswold. 2001. An Overview of AspectJ. In Proceedings of the 15th European Conference on Object-Oriented Programming. 327–353. http://dl.acm.org/citation.cfm?id=646158.680006
[15] Rachel Or-Bach and Ilana Lavy. 2004. Cognitive Activities of Abstraction in Object Orientation: An Empirical Study. SIGCSE Bull. 36, 2 (June 2004), 82–86. https://doi.org/10.1145/1024338.1024378
[16] Paul Ralph. 2012. Improving Coverage of Design in Information Systems Education. In International Conference on Information Systems. AIS, Orlando, FL, USA.
[17] Kate Sanders and Lynda Thomas. 2007. Checklists for Grading Object-oriented CS1 Programs: Concepts and Misconceptions. In Proceedings of the 12th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (ITiCSE '07). ACM, New York, NY, USA, 166–170. https://doi.org/10.1145/1268784.1268834
[18] Martijn Stegeman, Erik Barendsen, and Sjaak Smetsers. 2014. Towards an Empirically Validated Model for Assessment of Code Quality. In Proceedings of the 14th Koli Calling International Conference on Computing Education Research (Koli Calling '14). ACM, New York, NY, USA, 99–108. https://doi.org/10.1145/2674683.2674702
[19] Bjarne Stroustrup. 1988. What Is Object-Oriented Programming? IEEE Softw. 5, 3 (May 1988), 10–20. https://doi.org/10.1109/52.2020
[20] Ewan Tempero, Paul Denny, Andrew Luxton-Reilly, and Paul Ralph. 2018. "Objects count so count objects!" Project Website. http://qualitas.cs.auckland.ac.nz/data/icer2018.
[21] Scott A. Turner, Ricardo Quintana-Castillo, Manuel A. Pérez-Quiñones, and Stephen H. Edwards. 2008. Misunderstandings About Object-oriented Design: Experiences Using Code Reviews. In Proceedings of the 39th SIGCSE Technical Symposium on Computer Science Education (SIGCSE '08). ACM, New York, NY, USA, 97–101. https://doi.org/10.1145/1352135.1352169
[22] Rebecca Wirfs-Brock, Brian Wilkerson, and Lauren Wiener. 1990. Designing Object Oriented Software. Prentice Hall.
[23] C. Thomas Wu. 2008. A Comprehensive Introduction to Object-Oriented Programming with Java. McGraw-Hill.