The Journal of Systems and Software 49 (1999) 33–42

www.elsevier.com/locate/jss

Cost estimation based on business models

Simon Moser *, Brian Henderson-Sellers 1, Vojislav B. Misic 2

Terrassenweg 18, Münsingen 3110, Switzerland

Received 21 July 1997; received in revised form 25 September 1997; accepted 7 December 1997

Abstract

Software development requires early and accurate cost estimation in order to enhance likely success. System complexity needs to be measured and then correlated with development effort. One of the best known approaches to such measurement-based estimation in the area of Information Systems is Function Point Analysis (FPA). Although it is reasonably well used in practice, FPA has been shown to be formally ambiguous and to have some serious practical deficiencies as well, mainly in the context of newly emerged object-oriented modeling approaches. This paper reports results from an empirical study undertaken in Swiss industry covering 36 projects. We observed that a new formally sound approach, the System Meter (SM) method, which explicitly takes reuse into account, predicts effort substantially better than FPA. © 1999 Elsevier Science Inc. All rights reserved.

1. Introduction

No longer is object technology (OT) confined to research environments (industry research departments, pilot projects and universities); for the past several years it has been emerging into the broad world of industrial and commercial software producers and has rapidly become the status quo. This does not mean that OT is a thoroughly mature technology; on the contrary, it is evolving at a higher pace than ever. One currently poorly developed component of OT is cost estimation, which provides project management advice for controlling cost, time and earned value. In the context of software development projects, those three parameters are roughly equivalent to the notions of (1) development effort, (2) time to delivery and (3) functionality 3 of the produced system. Controlling the three parameters has two aspects: first, we have to know where we stand and

* Corresponding author. Tel.: +41-31-721-6106; fax: +41-31-302-7489; e-mail: [email protected]
1 Professor of computer science (object technology) at Swinburne University of Technology in Hawthorn (Melbourne, Australia) and director of COTAR. E-mail: [email protected]
2 Currently in Hong Kong, on leave from the University of Belgrade, Yugoslavia. E-mail: [email protected]
3 The notion of quality is subsumed under the notion of functionality here, i.e. a part of a system that is below some degree of quality is not considered to contribute to functionality. We are aware of the fact that the two aspects of quantity (= functionality) and quality of a software product have to be treated separately in certain situations, but this is beyond the scope of this paper.

second, we have to know what to expect (this notion of control is adopted from DeMarco, 1982). Whereas the latter will lead us to estimation (the focus of this paper), we first briefly discuss the basis for both aspects, which is measurement. Measurement of time and effort is well understood, at least in theory. However, measurement of parameter (3), the size of a system from a black-box view, i.e. the user perspective, is both theoretically and practically nontrivial. Perhaps the most popular approach is that of Function Point Analysis. Both the original Function Point approach and recently proposed improvements, however, suffer from several theoretical and, more severely, practical deficiencies in the context of OT (Stathis and Jeffery, 1993). The details of our assessment of those deficiencies are given in Section 2. Based on the lessons learned, we took on the challenge of overcoming some of the problems of measuring (black-box) system size by proposing a new metric for cost estimation in object-oriented projects. This technique requires the existence of business models and their formal underpinning (cf. Moser et al., 1996 and Firesmith et al., 1997). See also Section 3. We also have a problem with the term size. It may be misleading because it is coupled with a non-software concept. Notwithstanding, we will use the term here, but at the same time invite the reader to take this notion as meaning something more comprehensive than is usually associated with software size (e.g. as measured by mere number of lines of code). Informally stated, we consider everything in a system as contributing to its size when it

0164-1212/99/$ - see front matter © 1999 Elsevier Science Inc. All rights reserved. PII: S0164-1212(99)00064-3


increases the number of elements necessary to describe the system. A formal definition of a newly proposed software metric of business model size called the System Meter (original proposal in Moser, 1995) is given in Moser et al. (1997) and used here as a cost predictor (the metric is explained only briefly in Section 4). Once a link between the size of business models and development effort has been established (Sections 3 and 4), one is quite naturally interested in assessing the extent or quality of the link. Our basic measure of this quality is the doubled standard deviation of the estimation error, indicated for example as ±20%, made with different approaches to measuring system size. In a field study undertaken in Swiss industry, data from 36 projects was gathered to determine the estimation error for Function Points and System Meters. In Section 5, the main contribution of this paper, we summarize the statistically analyzed results of this field study.

2. Assessment of existing approaches to measuring system functionality

Measuring system functionality, for example by sizing the business model of a system, is a technique that is only partially developed within the software metrics area. In the beginning, there was code and little else; hence all measures concentrated on code. Indeed, a number of software sizing and estimating techniques (e.g., COCOMO; Boehm, 1981 and Boehm et al., 1995) were, and still are, based on code measures, among which the Lines-Of-Code (LOC) measure enjoys the most widespread usage. LOC is a formally sound and very practical way to measure coded systems. However, it has a number of deficiencies. The most notable are: (1) several mutually inconsistent ways of counting lines are used, (2) the expressive power of the language used is not taken into account and (3) it is not possible to count the lines of code until the code is actually written, which is too late to be useful for non-maintenance estimation purposes.
Other measuring techniques are used, however, which cannot be ruled out so easily. One of the most widely used is Function Point Analysis (FPA), a technique for measuring (or, more precisely, rating) system complexity after the elaboration of a functional system model. There are a few less popular variations of FPA, namely the Function Weight and Function Bang measures (DeMarco, 1982), Feature Points (Jones, 1991) and Mk II FPA (Symons, 1993). We will not discuss these "sidelines" here in detail as they do not differ essentially from the standard FPA as described in IFPUG (1994). The principal idea behind it is very simple: In most business applications (i.e., information systems), most functions are performed by means of just four basic

operations on one or more records kept in persistent storage: record(s) may be created, read, updated, or deleted (which does not happen very often, actually). Each record consists of a number of fields with actual data values. We quantify the number of fields in each record, the distinct operations performed on these records, and the number of these operations required to carry out a business function, and then sum the results over all business functions, using appropriate weights at each intermediate step. The fixed weights were assigned empirically, although the generally accepted values have changed over time. Ironically, they do not contribute at all to effort prediction, as a recent study showed (Kitchenham and Känsälä, 1993), and may therefore be considered a "historical" burden. Nevertheless, the sum of all those weights may be used as the measure of the complexity (of functions) the system ought to provide. In order to estimate how long it will take to develop such a system, a number of factors related to the software production environment, the so-called influence or adjustment factors, must be taken into account; the number of these factors and their weights are not universally agreed upon (another historical burden), and they cannot be easily related to the other parts of the model. In particular, the influence of the programming paradigm used for implementation is rather hard to quantify. Mathematically, the influence factors are not well incorporated into the final so-called adjusted Function Points. Therefore many mathematical operations (such as taking averages, etc.) do not make sense on adjusted Function Points. Moreover, FPA cannot be easily automated, i.e. measures cannot be extracted automatically from design documents, although automated assistance is available. FPA is usually done by trained professionals, which introduces further biasing (Kemerer, 1993).
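As a concrete, deliberately simplified illustration of this counting scheme, the sketch below uses IFPUG-style function types with their customary "average complexity" weights; the weights and the sample counts are illustrative assumptions, not figures taken from this paper:

```python
# Simplified unadjusted Function Point count: each counted function is
# weighted by its IFPUG function type (average-complexity weights shown).
AVG_WEIGHTS = {
    "EI": 4,    # external inputs
    "EO": 5,    # external outputs
    "EQ": 4,    # external inquiries
    "ILF": 10,  # internal logical files
    "EIF": 7,   # external interface files
}

def unadjusted_fp(counts):
    """Sum weighted function counts; counts maps function type -> number."""
    return sum(AVG_WEIGHTS[ftype] * n for ftype, n in counts.items())

# A small hypothetical order-handling system:
fp = unadjusted_fp({"EI": 6, "EO": 3, "EQ": 4, "ILF": 2, "EIF": 1})
print(fp)  # 82
```

The adjustment factors criticized above would then scale this unadjusted sum, which is precisely the step that breaks the measurement-theoretic properties of the result.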
Note that, in spite of these drawbacks, FPA rating provides good estimates for business information systems, which fit the underlying paradigm well. The major drawbacks of FPA may then be summarized as: (a) inability to deal with non-database requirements, from which stems (b) the limited scope of systems for which it is applicable, and (c) unrelatedness to issues like reuse and the use of frameworks and libraries, which are of particular importance for object-oriented systems development. Furthermore, the application of FPA is tied to business models or equivalent descriptions of a system's functionality. It can, therefore, never be applied before a significant part (usually around 15%) of the time and budget available has already been spent. Nor is it applicable to the more detailed descriptions of the system's internals to augment estimation accuracy. This contrasts with the System Meter approach, which, in addition to being applicable to business models, has also been applied to both more preliminary models


(Moser and Nierstrasz, 1996) and code (Moser and Misic, 1997). In the object-oriented field, a simplified version of FPA exists (Object or Data Points; Sneed, 1994), which eliminates the functional part of FPA and considers just the structural class model part. The main objective here is to make the estimate available as early as possible, with the rationale that the class model is known well in advance of a complete functional model. The accuracy of the estimate is affected, but for estimation purposes an early but satisfactory estimate is better than a perfect yet late estimate. However, it is obvious that this technique has an even more limited scope than the original FPA, since the other drawbacks mentioned above are not addressed. Similar approaches that rely on unadjusted business model class counts are reported by Haynes and Henderson-Sellers (1996), with successful application to cost prediction of MIS projects. Misic and Tesic (1998) have analyzed this type of approach at the design level and found good correlation of effort to class counts, but better correlation to the more fine-grained method count. The approach described in detail in this paper can be viewed as a more advanced and generic extension to class counts. Recently, Boehm et al. (1995) have proposed yet another measure, the object points, as the basis of the improved COCOMO 2.0 estimation technique. Object points are generated in this technique by counting the number of screens, reports and 3GL modules involved in the application, summing them with appropriate weights similar to the FPA procedure, and adjusting the result for effects of reuse. However, these techniques cannot be applied before the design phase is over. Hence, they are unsuitable for early estimation purposes; a new general-purpose (i.e. domain-independent) object-oriented software metric for estimation is required.

3. Business models and standard software processes: prerequisites to estimation

Among the biggest obstacles to good estimation are missing product and process standards. Only for a process under control (CMM Level 3 and above) does the notion of "standardized product artifact" become a reality. Recent studies (Haynes and Henderson-Sellers, 1996 and Haynes et al., 1997) have shown that such standardization can be achieved, not only in traditional software development environments but also for object-oriented systems. Whilst an OO system is composed of classes with implemented operations, attributes and references to other classes, there are many higher-level artifacts that are currently used to describe and plan the building of these coded classes. Typically, we partition these higher-level artifacts into design and analysis artifacts. The design level is characterized by the fact that it is a simplified and thereby a clarified view of the coded system. Whatever type of design diagram is used, it must be remembered that all these diagram types simply abstract away certain details, while the system being viewed, regardless of the diagrammatic "lens" used, is still the same: all diagrams must reflect the same (designed) reality (or reality-to-be!). In contrast, the analysis or requirements level reflects an inherently different view: the view of the system as required and perceived by the system user. This view in turn may be subdivided into a real-world (or business-oriented) view and another, the application analysis view, closer to the computerized system. As the real-world or business models typically appear very early in the software development process, they seem most appropriate as a basis for early estimates. Efforts have been under way in recent years to "standardize" the concepts underlying the various higher-level artifacts developed to describe an object-oriented system. Here we propose a simplified version of such metamodeling and formalized approaches as an "artificial" non-graphical business modeling language, called DOME (Moser, 1996, cf. Fig. 1). A DOME model can be automatically extracted from CASE tools or established manually. Most likely, due to DOME's semantical simplifications, it will be compliant with the forthcoming OMG standard. A detailed description of the semantics of the language is beyond the scope of this paper. Instead, we provide a sample business model expressed in DOME as an illustration and working example (cf. Fig. 2). It makes use of only the class model and use case model constructs for the sake of simplicity. We assume that the semantics of the example is to a large extent self-explanatory. Just a few hints: the real-world system modeled in Fig. 2 mainly deals with orders.
Customers want to inquire about the item types offered; then, eventually, they want to order something; and, if we are unlucky, they want to cancel orders. A major distinction of domain classes in DOME, compared to classes in UML (Rational, 1997) or OML (Firesmith et al., 1997), is that the details of their attributes are ignored. Just a rough indication of the number of expected attributes per type is given. In addition to the standardization of business models, standardization (at least within an organisation) of the software development process is also essential. If we fail to provide such a process standard, the effort estimates will have no frame of reference. However, the standardization of the development process lags behind other standardisation efforts, probably because few valid approaches exist that incorporate modern object-oriented software development techniques (Henderson-Sellers et al., 1996). We


Fig. 1. The DOME business modeling language (main keywords bold). The attribute-count integer stands for the expected number of attributes of the class/type indicated. This integer indicates overlapping ranges as follows: 1 = 1–15, 2 = 6–35, etc. For example, by stating "contains two attributes 'String'" we model the fact that we expect a class to contain between 6 and 35 attributes of type String. Function types are "classified" methods that can be attributed to classes in an identical manner to data types. Function types are gained from factoring out commonalities in the concrete use cases. Function types can comprise single steps only (typically the basic steps of adding, changing, accessing or deleting information) or consist of repeated standard sequences of steps in concrete use cases. In that sense they correspond to the notion of abstract use cases as contained in the UML metamodel (Rational, 1997).

recommend either the recently proposed OPEN standard software process (Graham et al., 1997) or the more established BIO software process (Moser et al., 1996, available in German only). Both process frameworks are highly tailorable to fit into various existing corporate standards. The details of the process aspect of standardization are, however, beyond the scope of this paper.

Fig. 2. A sample business model in DOME.


4. Measuring business models: the System Meter approach

The System Meter measures the size (or complexity) of system descriptions. Fig. 3 depicts the very generic metamodel behind the System Meter in an ER-style diagram (using crow-foot notation for relationship cardinalities and arrows for subtype relationships). The three System Meter definitions, external, internal and total size, for a description object o based on this metamodel are straightforward:

(i) externalSize(o) = numberOfTokens(o.name)
(ii) internalSize(o) = Σ (x ∈ o.isWrittenBy) externalSize(x)
(iii) size(o) = externalSize(o) + internalSize(o)

As explained in more detail further below, the System Meter also distinguishes between
1. highly common reused description objects (language sub-category),
2. reused but non-publicly standardized description objects (library sub-category) and
3. project-specific description objects (project sub-category).
A last step before we sum up the individual size values is to adapt for reuse. It is well known that the issue of reuse has many facets. In this paper we will focus on only two issues which are important for estimation:
1. the determination of which parts of a business model are supported by already existing components (e.g. in the form of frameworks, parametrizable classes, packages, clusters, libraries), and
2. the reflection of this information in the system's size.
First, we have to enhance the business model with additional information about reuse. This information cannot be provided without some knowledge of the technology that is available to implement the system. The procedure is to ask, for each of the model items, whether the available technology has a reusable component that implements the model item. For the different model items this leads to the following questions:

Data type. Does the programming environment (GUI, programming language, database, communication software, etc.) support the data type specified?
Function type. Do we have ready-to-use components (GUI, programming language, database, communication software, etc.) that support the function type? For example, do we have a framework that provides the functionality of creating objects (= recording information) covering all layers (from GUI to the database)?
Domain class. Do we have a plug-in class (or package/cluster) that implements the required domain class? For example, do we have a ready-to-use technical class that implements a domain class "Address"? (NB: The implementation should support the required domain associations, too.)
Use case. Can we reuse an entire implemented subsystem that supports the business process described as a use case?
State-transition. Does the implementation of a reusable class or subsystem support the state-dependent behavior specified?
The answers to these questions may be simply "Yes" or "No". Often, however, we will observe cases where the answers are not straightforward. Still, the rule is that we have to decide on one of the two simple answers. The rationale behind this is that even if we allowed for a percentage answer (e.g. 75% reusable components for a required item) we would have a large amount of uncertainty and gut-feeling involved. Worse still, we would give an opportunity to introduce a means of manipulating reuse-modeling towards a desired estimate. Therefore, the rule is that a reusable component is either existing 4 or not. Now we may think about how to reflect reuse-modeling in the system's size. Before we give our solution, we review the kinds of objects (or items) that are involved in a business model description:
1. the language objects (i.e. the constructs provided by DOME),
2. the library objects (i.e. the ones supported by reusable components) and
3. the genuinely project-specific objects.
You need to understand all three kinds of objects in order to understand what the system is supposed to do. However, the contribution to the system's size of each of the kinds of objects differs. Language objects are assumed to be understood without effort, library objects just need to be known with respect to their interface, and only the project-specific objects need an understanding of both the interface and implementation. This is reflected by the following rules for summing up sizes:


Fig. 3. The generic metamodel behind the System Meter.


4 The anticipated effort of reusing the component should be in the range of 0–10% of the building-from-scratch effort, or else the component should not be considered to exist.
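Pulling together the size definitions (i)–(iii) and the language/library/project distinction, a minimal sketch of the reuse-adjusted sum might look as follows; the class names, the crude token counter and the example objects are our own illustrative assumptions, not the authors' tooling:

```python
# Sketch of the reuse-adjusted System Meter: language objects count nothing,
# library objects only their external size, project objects their full size.
LANGUAGE, LIBRARY, PROJECT = "language", "library", "project"

class DescriptionObject:
    def __init__(self, name, category, written_by=()):
        self.name = name                    # the object's own identifier
        self.category = category            # language / library / project
        self.written_by = list(written_by)  # objects used in its description

def tokens(text):
    return len(text.split())                # crude token count, illustration only

def external_size(obj):                     # definition (i)
    return tokens(obj.name)

def internal_size(obj):                     # definition (ii)
    return sum(external_size(x) for x in obj.written_by)

def size(obj):                              # definition (iii)
    return external_size(obj) + internal_size(obj)

def system_meter(objects):
    total = 0
    for obj in objects:
        if obj.category == LIBRARY:
            total += external_size(obj)     # only the interface must be known
        elif obj.category == PROJECT:
            total += size(obj)              # interface and implementation
    return total                            # language objects cost nothing

# "String" as a DOME language construct, "Address" as a reusable library
# class, "Order" as a project-specific class described using both:
string_t = DescriptionObject("String", LANGUAGE)
address = DescriptionObject("Address", LIBRARY)
order = DescriptionObject("Order", PROJECT, written_by=[string_t, address])
print(system_meter([string_t, address, order]))  # 0 + 1 + (1 + 2) = 4
```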


1. do not count language objects at all,
2. count only the externalSize of library objects,
3. count the full size of the project-specific objects.
Before we go into examples, two things need some clarification. First, for each development process the categorization of what are language, library and project-specific model items is different. Even the language objects, which may be used as such in many "normal" projects, may change their category in very special circumstances (e.g. when a new language is developed). In general, it is the key belief of reuse that the percentage of project-specific objects will decrease over time. 5 Second, we consider library objects only if they may actually be referenced in other parts of the system description. Therefore anonymous library objects are ignored. It is hoped that the reuse-adapted overall system size predicts the involved efforts better than the non-adapted size. That hope has turned into confidence through the results of our field study (cf. Section 5), which may be considered as a practical validation of the System Meter. Readers interested in a formal validation are referred to previously published research in Moser (1996). In summary, the formal assessment yielded positive results. Because we essentially just count the tokens involved in the description of systems, the System Meter is on an absolute scale; all statistical and mathematical operations can be applied and consequently System Meter values can be used safely in regression models (as in the field study described below).

5. Results of the field study

Before we evaluate the results, the criteria used to assess the estimation methods need to be briefly outlined. Obviously the main criterion should be (C1) yield estimates as precisely as possible. We will discuss this criterion below. But first, a few words on the second criterion, which is (C2) yield estimates as cheaply as possible.
Because the main estimation effort lies in the required pre-modeling, in our case the elaboration of a valid business model, the two evaluated methods, FPA and the DOME System Meter method, may be considered roughly equal. When regarding the measurement step of estimation, however, FPA is inherently less automatable and therefore more labour-intensive than the System Meter. The differences are still not considered substantial from the larger project perspective. The following observations and remarks may be of interest (even though not the focus of the work presented in this paper): In our survey an average of 19% of

the total development effort was spent before a stable business model was available. Different kinds of models showed other percentages. In general, there is a trade-off between criteria C2 and C1, i.e. the cheaper the modeling behind an estimation method, the less accurate it is. The need for cheaper estimates than those based on business models led to the application of the System Meter measurement rule on so-called preliminary analysis models (using a formal language PRE, described in Moser (1995), that is similar to DOME as shown in Fig. 1, though much less detailed). Preliminary analysis models are typically available after 5% of the overall development effort is spent. The corresponding PRE System Meter model size predicts the effort with ±34% accuracy. We are not aware of any other comparable measurement-based estimation method. Further discussion is, however, not an issue of this paper. Because the estimation methods compared here are equivalent with respect to C2, we may focus our attention entirely on criterion C1. In order to assess with respect to C1, one takes an estimate, called Xestim, of some parameter X with the method in question and, at the end of the process, measures the effective outcome of X, called Xeff. The relative error for a single project may then be defined as:

(Definition 1) estimation error ≡ (Xestim − Xeff) / Xeff

The estimate Xestim gained with a measurement-based method M is calculated using a trend function (usually a least-squares fitted polynomial) AM that takes the measured model size, S, as its argument:

(Definition 2) Xestim ≡ AM(S)

and therefore we have

(Formula 1) estimation errorM = (AM(S) − Xeff) / Xeff

The main advantage of using Formula 1 is that we can quantify the estimation error entirely using ex-post (= after the fact) measurements of S and Xeff.
In a large enough set of projects, we will expect a mean estimation error of zero, because over- and underestimates are equally probable when AM is a reasonably fitted approximation function. The standard deviation, however, will differ for different ways of measuring the model size. We, therefore, introduce the notion of approximation error, dAM, which is the doubled standard deviation of the estimation error made with method M observed in a set of projects P:

(Definition 3) dAM ≡ 2 × stddev(estimation errorM, P)
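Definitions 1–3 can be exercised end-to-end on values from Table 1 (here the DOME System Meter sizes and efforts of projects P1–P6). The quadratic trend form a·s + b·s² fitted by least squares mirrors the predictive models quoted later in this section, but restricting the fit to six projects is our own simplification for illustration:

```python
import numpy as np

# Model sizes S and effective efforts X_eff (DOME SM values, projects P1-P6).
S = np.array([1471, 2772, 5011, 1554, 2075, 3574], dtype=float)
X_eff = np.array([370, 800, 1556, 420, 579, 1050], dtype=float)

# Fit the trend function A_M(s) = a*s + b*s^2 by least squares (no constant).
design = np.column_stack([S, S ** 2])
(a, b), *_ = np.linalg.lstsq(design, X_eff, rcond=None)

X_estim = a * S + b * S ** 2        # Definition 2: X_estim = A_M(S)
err = (X_estim - X_eff) / X_eff     # Definition 1 / Formula 1
dA = 2 * err.std()                  # Definition 3: doubled standard deviation
print(f"dA_M = +/-{100 * dA:.1f}%")
```

On the full 37-project data set this is exactly the computation that yields the ±9% and ±20% figures reported in Figs. 4 and 5.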


5 Time, here, is not to be understood within a project, but in the course of several projects that may yield reusable library or even language objects.

The approximation error can then be used to assess the estimation capability of a sizing method for parameter X in the context of a project set P. Furthermore, for a single estimate made with method M, ±dAM can be interpreted as the range within which 95% of the expected outcomes should fall (according to the Gaussian distribution function). Therefore dAM is a measure to assess both individual and method-wide estimation errors. In the following discussion, we will use it to assess the method-wide estimation errors of the FPA and the DOME System Meter methods for estimating effort. Effort was measured with respect to the full standard BIO software process. Individual deviations from this standard development process were factored out using a normalization step. The details of this step are described in Moser (1996) and not discussed further in this paper. The analyzed set consists of 37 projects that vary in many aspects: The majority of projects were undertaken at three medium-sized information services companies (18, 9 and 4 projects). Additionally, there were two university projects, one single-person company project and three projects from a large company in the chemical industry. The total project efforts observed ranged from 1.5 person-months to more than 100 person-months. The average team sizes ranged from as few as 0.8 persons to slightly over five persons, and maximum team sizes from 1.2 to 10.5. Project completion ranges from December 1987 until October 1996. An overwhelming 26 projects used Smalltalk, seven used 4GLs and four used C++. The application domains varied from work-flow administration, land registry, statistics, taxation and registration of chemical formulae to decision support and management information systems. A majority of the projects (29 of 37) was built using a client/server architecture with a GUI client and a database server. Four projects explicitly had to deliver reusable frameworks.

Fig. 5. Function Point to effort correlation: dA(Function Point) = ±20%.

The data and corresponding values of dAM are shown as scatter charts in Figs. 4 and 5 as well as in tabular form in Table 1. Quite obviously, one might be tempted to consider the System Meter method as (at least) twice as good as Function Points. A careful statistical analysis, though, is more appropriate: Conte et al. (1986) recommend measuring MRE, the magnitude of relative estimation error, in a set of test projects and comparing the mean MRE values obtained by the two competing methods. Because we cannot use all 36 projects as test projects (we would then have no projects from which to derive the prediction models), we have randomly chosen 24 projects as the basis for the prediction models and the remaining 12 as test projects. A recently completed project could also be included, making the test suite 13 projects in total. The two predictive models, obtained using polynomial (quadratic) regression, are:

FPA: effort = 0.63 · s + 0.0000344 · s²
DOME SM: effort = 0.152 · s + 0.0000131 · s²

For further statistical analysis, we have to state a basic hypothesis as follows: (0-Hypothesis)

The mean MRE of DOME System Meters is equal to that of Function Points

Fig. 4. DOME System Meter to effort correlation: dA(DOME SM) = ±9%. Interestingly, the two criteria of precision (C1) and cost (C2) of estimation models seem to be linked by a "law of non-determinism". If we define the cost of estimation as the percentage of effort spent on modeling, PM, and the precision of estimates as dAM, then we observe that the product PM · dAM is almost constant for the two System Meter methods: for PRE SM, PM = 5 and dAM = 34, hence the product is 170; for DOME SM, PM = 19 and dAM = 9, hence the product is 171. Is this the seed for a new universal law of software estimation precision? Or is it a mere coincidence?

and a one-sided alternative hypothesis: (Alt-Hypothesis)

The mean MRE of DOME System Meters is lower than that of Function Points

We may then apply a statistical test measure on the test set that tells us which of the hypotheses is supported by the data. Because we have pairs of data, SM and FP values for the same 13 projects, and because the exact distribution of the data cannot be determined (due to the small sample), we choose the most robust of the available tests, the paired Wilcoxon-signed-ranksum test


Table 1
37 projects with effort and size values

Project Id.  Characteristics (a)  Size in DOME    Size in          Effort in
                                  system meters   function points  person days
P1           cpp                  1471            256              370
P2           4gl                  2772            528              800
P3           4gl                  5011            1084             1556
P4           4gl                  1554            274              420
P5           4gl                  2075            371              579
P6           st, c/s              3574            716              1050
P7           st, c/s              2592            491              753
P8           st, c/s              3058            579              946
P9           st, c/s              1949            318              610
P10          st, c/s, fw          5011            1009             1173
P11          st, c/s              6669            1600             2300
P12          st, c/s              181             28               45
P13          st, c/s              5975            1389             2000
P14          st, c/s              1054            176              251
P15          st, c/s              1430            239              341
P16          4gl                  5047            1126             1640
P17          st, c/s, fw          2994            487              505
P18          cpp                  1841            327              440
P19          st, c/s              5769            1326             1897
P20          st, c/s              3124            605              900
P21          st, c/s              2040            347              550
P22          st, c/s              4129            867              1269
P23          st, c/s              813             133              205
P24          st, c/s              2348            426              651
P25          st, c/s              2165            377              537
P26          4gl, c/s             5029            1095             1630
P27          st, c/s              1873            339              540
P28          4gl, c/s             4518            942              1290
P29          st, c/s              4161            842              1215
P30          cpp, c/s             391             65               109
P31          st, c/s, fw          1233            187              230
P32          cpp                  1332            229              315
P33          st, c/s              1065            180              280
P34          st, c/s              1225            200              308
P35          st, c/s              9552            2647             3462
P36          st, c/s, fw          1319            232              295
P37          st, c/s              6381            1423             1687

(a) st = Smalltalk, cpp = C++, 4gl = 4th generation language (Synon, ORACLE/FORMS), c/s = client/server application with GUI, fw = construction of a framework.
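The predictive models quoted earlier were obtained by quadratic regression on a randomly chosen 24-project subset that the paper does not identify. Purely to illustrate the technique, the sketch below fits the same model form, effort = a·s + b·s², to all 37 Table 1 rows (DOME System Meter sizes), so its coefficients will not match the quoted ones exactly:

```python
# Sizes (DOME System Meters) and efforts (person days) for the 37
# projects of Table 1.
sm = [1471, 2772, 5011, 1554, 2075, 3574, 2592, 3058, 1949, 5011, 6669,
      181, 5975, 1054, 1430, 5047, 2994, 1841, 5769, 3124, 2040, 4129,
      813, 2348, 2165, 5029, 1873, 4518, 4161, 391, 1233, 1332, 1065,
      1225, 9552, 1319, 6381]
effort = [370, 800, 1556, 420, 579, 1050, 753, 946, 610, 1173, 2300,
          45, 2000, 251, 341, 1640, 505, 440, 1897, 900, 550, 1269,
          205, 651, 537, 1630, 540, 1290, 1215, 109, 230, 315, 280,
          308, 3462, 295, 1687]

# Least-squares fit of effort = a*s + b*s^2 (no intercept term),
# solving the 2x2 normal equations directly.
s2 = sum(s * s for s in sm)
s3 = sum(s ** 3 for s in sm)
s4 = sum(s ** 4 for s in sm)
sy = sum(s * y for s, y in zip(sm, effort))
s2y = sum(s * s * y for s, y in zip(sm, effort))
det = s2 * s4 - s3 * s3
a = (sy * s4 - s2y * s3) / det
b = (s2 * s2y - s3 * sy) / det
print(f"effort ~ {a:.3f}*s + {b:.9f}*s^2")
```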

Table 2
FPA and DOME SM magnitudes of relative estimation error for 13 projects

Project Id.  FPA MRE (%)  DOME SM MRE (%)  abs. error difference (%)  sign  signed rank
P1           7.11         1.70             5.41                       +     7
P2           0.38         1.25             0.87                       -     -3
P10          22.07        7.95             14.12                      +     12
P13          0.61         1.07             0.46                       -     -2
P15          8.44         2.04             6.40                       +     8
P17          32.87        17.42            15.45                      +     13
P23          1.81         0.68             1.13                       +     4
P24          0.30         0.61             0.31                       -     -1
P29          3.06         1.93             1.14                       +     5
P31          21.54        10.75            10.79                      +     10
P32          11.85        1.55             10.30                      +     9
P33          0.50         1.92             1.42                       -     -6
P37          18.44        5.50             12.93                      +     11
Total                                                                       67

(Riedwyl, 1978). The idea is to calculate the absolute MRE differences, i.e. the DOME SM MRE minus the FP MRE, for each project, assign ranks to these absolute differences, make those ranks negative where FP showed a lesser error than DOME SM, and finally sum up all signed ranks. This ranksum is then compared to a threshold value that depends on the sample size; if the threshold is surpassed, the alternative hypothesis is supported (supported, nota bene, not proven). Table 2 shows the detailed MRE and ranking values of the 13 selected projects. The ranksum is 67, while the threshold for 13-item samples is 49. (This threshold is defined as a 95%-confidence-level one-sided quantile, meaning that the test result can be taken with 95% confidence.) This allows the following conclusion to be drawn:
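The ranking-and-summing procedure just described takes only a few lines of plain Python; running it on the Table 2 values reproduces the ranksum. (A full Wilcoxon implementation would also average the ranks of tied absolute differences, which this data set does not need.)

```python
# Paired MRE values (%) for the 13 test projects, taken from Table 2.
fpa  = [7.11, 0.38, 22.07, 0.61, 8.44, 32.87, 1.81, 0.30, 3.06, 21.54, 11.85, 0.50, 18.44]
dome = [1.70, 1.25,  7.95, 1.07, 2.04, 17.42, 0.68, 0.61, 1.93, 10.75,  1.55, 1.92,  5.50]

def signed_ranksum(a, b):
    """Rank |a_i - b_i| ascending (rank 1 = smallest), make the rank
    negative where b_i > a_i (here: where FP showed the lesser error),
    and sum the signed ranks."""
    diffs = [x - y for x, y in zip(a, b)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    total = 0
    for rank, i in enumerate(order, start=1):
        total += rank if diffs[i] > 0 else -rank
    return total

print(signed_ranksum(fpa, dome))  # 67, above the one-sided 95% threshold of 49
```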


(Test Result) 67 > 49, the 0-hypothesis is rejected.

We may, therefore, claim that there is statistical evidence, in the context of this sample, for DOME System Meters performing significantly better as an effort predictor than Function Points. A more comprehensive collection of data would be necessary to allow a more global statement.

6. Conclusions and further research

A field study covering 36 projects showed that a new software metric of size, the System Meter, applied on business models, predicts development effort significantly better than the traditional Function Point metric. The DOME System Meter cost estimation method, as it is called, has been applied in industry since mid-1996, and we constantly gather more empirical data [7]. Besides its successful application on business models, the System Meter has already proven useful on even earlier system models (Moser and Nierstrasz, 1996). The main advantages of the method are its generality and the fact that different project-specific degrees of reuse can be assessed. Furthermore, being essentially a token count, the measure has good formal properties, which allows it to be used in the definition of derived ratio measures of quality. The traditional Function Points, despite their undeniable merits, lack the above-mentioned properties, and their application therefore seems limited.

Ongoing research is focused on two areas: (1) making the System Meter operational at the level of software specifications (window and report designs and specifications) and (2) applying the System Meter to coded software, with the main focus on quality measurement. Quality measurement is currently centred on the structural design aspects of coupling and cohesion, which fit the "description object/dependency" paradigm underlying the System Meter well. While theoretical proposals for System Meter-based quality metrics already exist, their practical evaluation is still a topic of ongoing research.
Acknowledgements

Simon Moser wishes to acknowledge the financial support of the Swiss National Science Foundation. This is contribution number 97/34 of COTAR (Centre for Object Technology Applications and Research). We thank Dr. Rob Allen for his suggestions on improving the presentation of this article.

[7] For more information on how to practically apply this method and other modern software estimation techniques, contact The SEE Group (Software Estimation and Engineering) at [email protected]. The SEE Group is a newly formed software estimation and engineering non-profit interest group hosted by COTAR, the Centre of Object Technology Application and Research, at Swinburne University of Technology (Melbourne).

References

Boehm, B.W., 1981. Software Engineering Economics. Prentice-Hall, Englewood Cliffs.
Boehm, B.W., Clark, B., Horowitz, E., 1995. Cost models for future life cycle processes: COCOMO 2.0. Annals of Software Engineering 1 (1), 1-24.
Conte, S.D., Dunsmore, H.E., Shen, V.Y., 1986. Software Engineering Metrics and Models. Benjamin/Cummings, Menlo Park.
DeMarco, T., 1982. Controlling Software Projects. Prentice-Hall, Englewood Cliffs.
Firesmith, D., Henderson-Sellers, B., Graham, I., 1997. OPEN Modeling Language (OML) Reference Manual. SIGS Books, New York.
Graham, I.M., Henderson-Sellers, B., Younessi, H., 1997. The OPEN Process Specification. Addison-Wesley, Reading.
Haynes, P., Avotins, J., Henderson-Sellers, B., 1997. Classes as a size unit for cost estimation and productivity measurement (in preparation).
Haynes, P., Henderson-Sellers, B., 1996. Cost estimation of OO projects. American Programmer 9 (7), 35-41.
Henderson-Sellers, B., Graham, I.M., 1996. OPEN: toward method convergence? IEEE Computer 29 (4), 86-89.
IFPUG, 1994. Counting Practices Manual V4.0. Westerville, Ohio, USA.
Jones, T.C., 1991. Applied Software Measurement. McGraw-Hill, New York.
Kemerer, C.F., 1993. Reliability of function points measurement. Communications of the ACM 36 (2), 85-97.
Kitchenham, B., Känsälä, K., 1993. Inter-item correlations among function points. In: Proceedings of the International Software Metrics Symposium. IEEE CS Press, Los Alamitos, pp. 11-14.
Misic, V.B., Tesic, D.N., 1998. Estimation of effort and complexity. To appear in The Journal of Systems and Software.
Moser, S., 1995. Metamodels for object-oriented systems. Software - Concepts and Tools 16 (2), 63-80.
Moser, S., 1996. Measurement and estimation of software and software processes. Ph.D. thesis, University of Berne, Berne, Switzerland.
Moser, S., Cherix, R., Flueckiger, J., 1996. BI-CASE/OBJECT (BIO) V3. Bedag Informatik, Berne, Switzerland.
Moser, S., Henderson-Sellers, B., Misic, V.B., 1997. Measuring object-oriented business models. In: Proceedings of the TOOLS Pacific'97 Conference, Melbourne, Australia.
Moser, S., Misic, V.B., 1997. Measuring class coupling and cohesion: a formal metamodel approach. In: Proceedings of the APSEC'97 Conference, Hong Kong.
Moser, S., Nierstrasz, O., 1996. Measuring the effects of O-O frameworks on developer productivity. IEEE Computer 29 (9), 45-51.
Rational, 1997. UML Semantics, Version 1.0, 13 January 1997. http://www.rational.com.
Riedwyl, H., 1978. Angewandte mathematische Statistik in Wissenschaft, Administration und Technik. Paul Haupt Verlag, Bern, Switzerland.
Sneed, H., 1994. Calculating Software Costs Using Data (Object) Points. SES, Ottobrunn, Germany.
Stathis, J., Jeffery, D.R., 1993. An empirical study of Albrecht function points. In: Verner, J.M. (Ed.), Measurement for Improved IT Management: Proceedings of the First Australian Conference on Software Metrics, ACOSM'93. Australian Software Metrics Association, Sydney, pp. 96-117.
Symons, C.R., 1993. Software Sizing and Estimating: Mk II FPA. Wiley, New York.

Simon Moser is manager of software engineering methods at Bedag Informatik in Switzerland. Prior to this, in 1997, he was a research fellow funded by the Swiss National Science Foundation at Swinburne University of Technology in Melbourne. In his Ph.D. thesis, which was finished in 1996, he developed a new software metric of size, the System Meter, and empirically validated it in the context of cost estimation. He also developed tool support for measuring System Meters from various modelling languages (such as UML). The basis for this new development lies in the experience he gained from applying the Function Point Method (IFPUG) in over 100 projects since 1990. He has published and presented a number of scientific papers on the topic in journals and at conferences. Simon is a member of the ACM and the OPEN Consortium. In 1997 he founded and started the SEE Group initiative, a non-profit virtual organisation promoting sound software estimation and engineering practices.
