An Instrument for Assessing Software Measurement Programs

Empirical Software Engineering, 5, 183–200 (2000)

© 2000 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

MICHAEL BERRY
CSIRO Mathematical and Information Sciences (CMIS), Organisational Performance Measurement Group, North Ryde, Australia

ROSS JEFFERY
Centre for Advanced Empirical Software Research (CAESAR), School of Information Systems, University of New South Wales, Sydney 2052, Australia

Abstract. This paper reports on the development and validation of an instrument for the collection of empirical data on the establishment and conduct of software measurement programs. The instrument is distinguished by a novel emphasis on defining the context in which a software measurement program operates. This emphasis is perceived to be the key to 1) generating knowledge about measurement programs that can be generalised to various contexts, and 2) supporting a contingency approach to the conduct of measurement programs. A pilot study of thirteen measurement programs was carried out to trial the instrument. Analysis of this data suggests that collecting observations of software measurement programs with the instrument will lead to more complete knowledge of program success factors that will provide assistance to practitioners in an area that has proved notoriously difficult.

Keywords: Software measurement, software metric, metrics program, measurement program, success factors, assessment

1. Introduction

A software measurement program is the set of on-going organisational processes required to define, design, construct, implement, operate and maintain an information system for collecting, analysing and communicating measures of software processes, products and services. A program requires a combination of standards, procedures, techniques, and support tools, plus people who understand both the underlying software technology and human behaviour when confronted by change. A program produces a socio-technical system for achieving behavioural changes within the wider socio-technical system of the organisation. Socio-technical systems provide a significant challenge to the researcher who seeks to identify which of many variables contribute to the eventual success or failure of those systems. Even when this is possible for a particular case, it may be difficult to generalise this knowledge to other measurement programs. Much of the knowledge available to those implementing measurement programs has been generalised from the particular. The Software Measurement Program Survey Project (MPSP) has assembled these items of particular knowledge and incorporated them into an assessment instrument. The instrument thus enables a measurement program to be assessed against a comprehensive set of “success” factors. It is expected that, by using the MPSP instrument on many programs, those factors that are commonly associated with success and failure will be identified.


The current version of the instrument is available on the world-wide web at http://www.cmis.csiro.au/Mike.Berry/MPSPInstrument.htm. In addition to the questions and definitions used in the instrument, it is possible to review the argument for including each factor in the instrument and see a summary of the data collected from the thirteen measurement programs that were used as part of the validation of the instrument.

2. Research into Software Measurement Programs

Little of the measurement research literature to date has focussed on software measurement programs. Hall and Fenton (1994) call this a serious omission. The poor rate of program success (Niessink, 2000; Goldenson et al., 1999; Laframboise and Abran, 1996; Fenton, 1991; Daskalantonakis, 1992; Rubin, 1987; Holdsworth, 1994) and lack of measurement maturity (Dutta et al., 1999; Jeffery and Zucker, 1997) may be attributed to this omission. The poor outcomes in practice appear unrelated to the total level of research into software measurement. Zuse states that “more than one thousand measures were proposed by researchers and practitioners, and till today more than 5000 papers about software measurement are published” (1998). While the incidence of measurement has increased over the last ten years, particularly within the commercial IT sector, measurement in many organisations (Jeffery and Zucker, 1997) is immature when compared to, for example, the maturity models presented by Daskalantonakis et al. (1990), Slovin (1997) and ISO/IEC 15504 (1998). Fenton and Neil (1998) conclude that there is a mismatch between the directions of academic metrics research and the requirement of measurement programs to support managerial decision-making. Research to date is typified by single-organisation case studies. For example, the work by Grady and Caswell (1987) provides a well-documented case study of how one organisation established measurement. This case has provided the model for many other programs. Multi-organisational studies are rarer. Possibly the most comprehensive descriptive study of software measurement programs was published by Rifkin and Cox in 1991. These researchers conducted site surveys of eleven divisions of the following organisations: Contel, Hewlett Packard, Hughes Aircraft, IBM, McDonnell Douglas, NASA, NCR and TRW. These organisations were selected because of their reputations for excellent measurement practices. The study identified that the organisations had certain patterns of behaviour in common. The Rifkin and Cox study is important in that it was conducted across multiple organisations; however, all those organisations are located within military, telecommunications, aerospace, and engineering environments. Few of the success patterns identified by the authors relate to the organisational context in which the program operated. Many of the themes that they identify are process-related, and many of the organisations had processes in common. This may be a result of the process-orientation of the study and the common base of measurement publications from which the subject organisations were operating. The Rifkin and Cox study left the following questions unanswered:

• Are the patterns of behaviour causal or are they outcomes of some other cause?


• What is the importance of contextual variables in program success? Would the same patterns of behaviour emerge in different contexts, for example, the business and public administration environments?

Experience from research into failures of information systems suggests that the causes of measurement program failure have a social and behavioural basis (Sauer, 1999). Mostly, this research is factor-based, ranking variables in terms of their association with success. Sauer (1999) cautions that factor studies have been too simplistic for the complexity of the phenomenon. He suggests that a qualitative approach is needed to deal with the complexity. The MPSP project has attempted to support further empirical research into measurement programs by providing an instrument for multi-organisational research. The resulting instrument combines a factor-study approach, in order to support prediction, with a qualitative approach that seeks to understand the relationships between all variables. The factor-based approach was adopted to meet practitioners' need for practical advice; however, the imprecision in the definition of the factors suggests that caution needs to be exercised in applying the advice (Garrity and Sanders, 1998). It needs to be emphasised that the contribution of the MPSP project to empirical research is the MPSP instrument itself. We are not suggesting that the analyses of the observations from the pilot study are a significant contribution, except in so far as they support the assertion that characterisation of the context for measurement is essential.

3. The MPSP Project

CAESAR at UNSW and CMIS at CSIRO have been conducting a long-term research program into software measurement programs. The aim has been to develop a model of measurement program success from multi-organisational empirical data rather than from single-organisation case studies and anecdote. The MPSP is part of that program. The primary objective of the MPSP project has been to provide a validated instrument for the collection of research data about software measurement programs. The MPSP project consisted of the following activities:

1. Determining the information requirements in order to research measurement program success.

2. Designing the collection method to obtain that information.

In addition to the primary objective of supporting research, the MPSP instrument may be used by practitioners to assess their specific software measurement programs. Since the instrument has been developed using the available literature, practitioners can compare their program to the normative theory and identify strengths and weaknesses of their current program. Figure 1 (Context of the MPSP) illustrates the relationship between the MPSP instrument, empirical research into measurement and organisational measurement programs. A measurement program services the information requirements of the software life-cycle processes (ISO/IEC JTC1/SC7, 1995).


Figure 1. Context of the MPSP.

The measurement program is split into two activities: 1) measurement and analysis, and 2) program establishment and improvement. The measures collected and analysed by the measurement program result in information products to be used in the conduct and improvement of the software life-cycle processes. The MPSP instrument is developed from a model of program success based on the available literature. The instrument is used to collect observations of the performance and capability of a measurement program. Analysis of the observations flows back to the organisation's improvement activity. Additionally, when aggregated with the evaluations of many programs, the observations provide a resource for empirical research into software measurement programs. The research product flows to the literature, where it both provides a source of advice to practitioners and helps to improve the MPSP Model of Measurement Program Success.

4. The Design of the MPSP Instrument

This section discusses the theoretical basis for the design of the instrument and identifies the rationale for decisions that were made. The design of the MPSP data collection instrument consisted of the following activities:

1. Determining the information required in order to research measurement program success. This involved selection of a model of software measurement programs, identification of variables possibly associated with success and failure, and selection of a framework for observing and evaluating those variables in operation.


2. Creating the collection method to obtain that information. This involved selection of the method and selection of measures to operationalise the variables in the MPSP Model of Measurement Program Success, followed by design and validation of the data collection instrument.

4.1. Frameworks for the Evaluation of Software Measurement Programs

A research framework is required in order to select relevant variables from the large number of elements involved in a measurement program. The framework is then used to organise these variables and to eliminate redundancies. The framework chosen by a researcher to support their investigation will normally depend upon their discipline. However, a measurement program may be viewed from various disciplinary perspectives. The perspective chosen by the researcher limits the variables regarded as relevant to the problem (Garrity and Sanders, 1998; Lee, 1999; Orlikowski and Baroudi, 1991). The following frameworks were considered for use in the MPSP as they offer different levels of abstraction of measurement programs:

• A framework used for a descriptive study of metrics programs (Rifkin and Cox, 1991)



• A normative framework of software measurement (Bassman et al., 1994)



• A framework for project assessment (Slevin and Pinto, 1986)



• A Conceptual Model of Project Success (Selby et al., 1991)



• A framework for the assessment of Management Information Systems (Boynton and Zmud, 1984)



• The General Systems Model, described as an “Inter-disciplinary tool of description, analysis, and prediction useful for studying many types of interlinking phenomena” (Basil and Cook, 1974)

Because of its ability to consider many points of view, the General Systems Model was selected as the framework for the MPSP project. Support for this decision was found in the work by Laframboise and Abran (1996) and Comer and Chard (1993), who selected similar frameworks to examine measurement.

4.2. Observing Software Measurement

Observation of an instance of any object is achieved by assigning values to a set of indicators that provide quantitative or qualitative insights into the instance. Generalising from the indicators of the observed instances provides insights into the object class. Metrics enable values to be assigned to indicators, which are developed from the model of the object class that is to be observed. Thus, observing a software measurement program requires a model of that object class.


The MPSP is concerned with the program attributes leading to success or failure, and the model is therefore limited to those program attributes linked to success and failure. Indicators and metrics were chosen to evaluate these attributes. Most of the indicators of program attributes lack precise definition. For example, management commitment is frequently asserted to be essential for program success, but the properties of management commitment are difficult to define and measure. The MPSP instrument has had to rely mainly on the respondents' own definitions of many program attributes (e.g. management commitment). The consequent lack of precision suggested that a dichotomous indicator is the most appropriate. Thus, most program attributes are evaluated in terms of present/absent, done/not done, provided/not provided, and produced/not produced. Preliminary testing of the instrument found that the respondents had the most difficulty with the questions exploring the organisational systems within which the measurement program operated. They seemed to lack the mental models needed to understand the question and to respond appropriately. It was therefore necessary to define each variable briefly and offer the respondent a wide choice of responses. The example below from the MPSP instrument explores the Strategic Control property of the organisational system.

Strategic Control: This variable assesses the extent to which the enterprise and the software development group follow a predefined strategy laid down by management through the process of setting goals to be met by the operational groups. This is contrasted with an organisational model in which management ensures that each operational group is able to evolve and adapt by learning from experiences and by picking up cues from the environment.

                                        Enter Value 1–7
    With respect to the Enterprise      [   ]
    With respect to the IT Group        [   ]

    Coding Key: 1 = It always follows a predefined strategy ... 7 = It adapts as it learns
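To make the response formats concrete, the sketch below shows one way the two item types described above (dichotomous indicators and the seven-point context ratings such as Strategic Control) might be represented for later analysis. The class and field names, and the example ratings, are illustrative assumptions made for this sketch, not the MPSP's own definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DichotomousItem:
    """Present/absent, done/not done style success indicator."""
    text: str
    response: Optional[bool] = None      # True = present / done / provided


@dataclass
class OrdinalContextItem:
    """Seven-point organisational context rating, answered separately
    for the enterprise and for the IT group."""
    text: str
    anchor_low: str                      # meaning of a rating of 1
    anchor_high: str                     # meaning of a rating of 7
    enterprise: Optional[int] = None     # 1-7
    it_group: Optional[int] = None       # 1-7


# Example: the Strategic Control item shown above (values are illustrative only).
strategic_control = OrdinalContextItem(
    text="Extent to which a predefined strategy is followed",
    anchor_low="It always follows a predefined strategy",
    anchor_high="It adapts as it learns",
)
strategic_control.enterprise = 2
strategic_control.it_group = 5
```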

As the dependent variable of interest, measurement program success requires careful definition. However, the term “measurement program” is imprecise and, as the literature on IS success measurement shows, success and failure are not easy concepts to define and measure (Garrity and Sanders, 1998). There are potentially multiple levels of analysis: organisational, process and individual, and there are interactions between each level. Success is also time-dependent: of the nine programs classified as successful by the authors, at least four are known to have terminated. Two of these terminations occurred because the IT function was out-sourced. One was because the promoter of measurement and the management sponsor left the organisation at the same time as budget cuts came in. And one was because the organisation adopted a business model in which “billable hours” became the only measure of interest. In the face of such imprecision, it is not surprising that subjective measurement of success on some ordinal scale is often used.


Niessink (Niessink and van Vliet, 1999), discussing four measurement programs, describes them as “quite unsuccessful,” “rather successful” and “successful as long as the students were supporting the program.” In their 1993 paper (Jeffery and Berry, 1993), the authors used terms like “most successful,” “least successful” and “relatively successful” and noted that an organisation could be successful in one set of practices and weak in another. Attempts to provide objective definitions of measurement program success may be as open to criticism as subjective definitions. For example, Goldenson et al. set out to perform “wider, more rigorous” empirical testing of the determinants of success in software measurement programs (Goldenson et al., 1999). Success was defined as being a function of:

1. The extent to which measurement and analysis are regularly used to inform management and technical decision-making.

2. The extent to which improvements in an organization's performance can be attributed to the use of measurement and analysis in the organisation.

These concepts were operationalised through six and ten indicators respectively, presumably measured on common interval scales. The contribution of measurement to organisational improvement is probably unmeasurable with any confidence. With the large number of confounding factors between measurement and outcome, an assessment of the impact of measurement should usually be “don't know.” The contribution to decision-making appears to be based on an assumption of rational decision-making that is open to challenge on theoretical grounds (Beach, 1996) and also on the grounds of response bias. The cult of rationality is so dominant in the IT domain that respondents will feel compelled to answer that measures are used for decision-making when the actual impact of the measures may be negligible. Finally, in comparison with DeLone's widely referenced model of Information System Success (DeLone and McLean, 1992), the definition of success could be said to be incomplete in that it addresses only two of the six dimensions. An alternative approach to the definition of success, an interpretive approach, seeks to avoid the definition issue. In this approach, the measures of success are defined by the respondent and not by the researcher (Coe, 1998). This admits the possibility of multiple definitions of success, measured by many indicators. Thus the MPSP asks the respondent if they consider the program a success and if the management consider it a success. These indicators were backed up by other, more observable, indicators of success in order to eventually categorise the program as a success or failure. The MPSP uses a model in which the establishment and development of a measurement program usually proceeds through a number of program stages, from Initial Vision to Stable Production. Program failure may result from an inability to transition from one stage to the next. Back-tracking to an earlier stage, for example to redefine the program when the implementation stage fails, is regarded as a stage failure and not program failure. The other indicators of success were:


1. A software measurement program is probably a success if it has been in Stable Production for more than twelve months.

2. A program is probably a failure if it is cancelled at any stage. Cancellation means that there is no further work on establishing/improving the program and there is no systemic data collection and analysis at the organisational level. It is possible that a successful program could be terminated after achieving its defined objectives.

3. A program is probably a failure if its latest stage is a failure and that stage (or an earlier stage) is not currently being repeated.

4. Any stage is considered a failure if a transition is not made to the next stage within a certain elapsed time. For each stage, this time limit is defined as the mean stage duration plus 2 standard deviations, based on the observations of measurement programs collected to date. A judgement is required on whether the stalled stage will result in program failure.

(Note: some of the time limits used above are arbitrary and might need to be refined with experience.)

A similar approach has been used by Laframboise and Abran (1996) to classify measurement programs. They suggest that (1) a measurement program that cannot be established within two years is a failure and (2) a measurement program for which the products are not used, even though the program remains in place, can also be regarded as a failure.
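As a reading aid, the four MPSP rules above can be sketched as a simple classification function. This is only an illustration under stated assumptions: the stage limits (mean plus two standard deviations) are supplied by the caller, the field names are invented here, and rule 4 still requires a human judgement that code cannot make.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    months_in_stage: float
    completed: bool          # True if the program transitioned to the next stage
    limit_months: float      # mean stage duration + 2 standard deviations


def classify_program(stages: List[Stage],
                     cancelled: bool,
                     months_in_stable_production: float,
                     stage_being_repeated: bool) -> str:
    """Rough sketch of the success/failure rules listed above."""
    if cancelled:
        return "failure"                      # rule 2
    if months_in_stable_production > 12:
        return "success"                      # rule 1
    latest = stages[-1]
    stalled = (not latest.completed) and latest.months_in_stage > latest.limit_months
    if stalled and not stage_being_repeated:
        return "failure"                      # rules 3 and 4 (judgement still required)
    return "undetermined"                     # not enough evidence either way
```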

4.3. Threats to the Validity of Data Collected with MPSP

The collection method chosen for the MPSP is a sample survey of convenience within a natural setting. This decision was based upon the need to collect data relating to a large number of variables at the same time, the need for generalisability, and the resources available for the project. The disadvantages stemming from this decision are an inability to examine cause and effect, the inability to control for certain variables, and possible skewing of data due to the use of a sample of convenience. Respondents to the MPSP survey needed to be volunteers, and this presents an additional risk of error (Rosenthal and Rosnow, 1979). It is expected that people will only volunteer to respond to the survey if their program is a success. This conflicts with the need for the study to include a representative set of failed programs, as it is only by having a data set with both failed and successful programs that any association between program outcome and the program attributes can be determined. A tactic adopted for the MPSP to help capture data on failed programs was to break a measurement program down into a series of stages. Even on failed programs, at least the early stage(s) may be considered successes. It is more likely that volunteers can be found to talk about early successes, even though the eventual outcome of their program is a failure.

5. MPSP Instrument Testing and Validation

The preceding section discussed the design of the MPSP instrument. Having been designed and constructed, the MPSP survey instrument was validated by:


1. Testing the instrument using a Skirmish Test,

2. Revising the instrument,

3. Trialing the instrument in a Pilot Test, and then

4. Testing that the observations collected using the instrument can be effectively analysed.

These activities are discussed below.

5.1. Skirmish Test

In order to get a systematic assessment of the respondents' perceptions of each question, a survey about the MPSP instrument itself was used. For each question in the instrument, the respondent was asked to provide a rating on a seven-point ordinal scale for each of the following four questions:

1. How confident are you that you understand this question?

2. To what extent do you have the knowledge to answer this question?

3. How difficult was it to respond in terms of the alternatives given?

4. How relevant is this question to the subject of software metrics programs?

For each question, the mean rating across respondents was calculated and used to target problematical questions, which were then revised. Modifications were made to twenty of the seventy questions for which responses were received. The Skirmish Test highlighted the difficulty encountered in taking statements from the literature and converting them into questions for the purpose of a survey. The policy in the development of the questions was to retain as much as possible of the text of the original statement. In some cases, this resulted in a confusing loss of context. Where possible, when revising the survey, the problematical questions were supported with additional explanation or illustration. The time to complete the questions relating to a single defined stage of a measurement program was between 46 and 60 minutes. This included time for respondents to ask questions of the interviewer, for reading definitions, and for providing feedback on questions. This time is believed to be acceptable, given the large number of questions. There was no obvious reluctance on the part of the respondents to provide this amount of time.
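The targeting step described above can be illustrated with a short script: mean ratings are computed per question on the four seven-point scales and questions with unfavourable means are flagged for revision. The file name, column names and the cut-off at the scale midpoint are assumptions made for this sketch, not values taken from the MPSP.

```python
import pandas as pd

# One row per (respondent, question); the four rating columns correspond to
# the understanding, knowledge, difficulty and relevance questions above.
ratings = pd.read_csv("skirmish_ratings.csv")

scales = ["understanding", "knowledge", "difficulty", "relevance"]
means = ratings.groupby("question")[scales].mean()

# Flag questions whose mean falls on the unfavourable side of the midpoint (4):
# low understanding, knowledge or relevance, or high difficulty.
flagged = means[(means["understanding"] < 4) |
                (means["knowledge"] < 4) |
                (means["difficulty"] > 4) |
                (means["relevance"] < 4)]
print(flagged.round(2))
```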

5.2. MPSP Instrument Pilot Testing

Following revision in response to the skirmish testing, the instrument was subjected to pilot testing. The purpose of pilot testing the survey (Crockett, 1990) is to uncover any design problems, to assess the adequacy of the instructions and to determine how long the survey takes to complete. In addition, the variation in responses received during pilot testing may be used as a guide to determining the sample size for the fuller study.


The goals of the second stage of testing (pilot testing) were to:

1. Administer the survey to subjects who are more representative of the current population of people conducting measurement programs than the respondents in the skirmish test,

2. Verify that the changes to design and content following the skirmish test had been effective,

3. Verify that no more than one hour was required to respond to questions about a single stage of a measurement program,

4. Determine the level of interaction/intervention required of the interviewer, and

5. Provide data that can be used to test the data entry and storage facilities and the data analysis processes.

Thirteen organisations took part in the pilot test. Apart from all of them being members of the Australian Software Metrics Association (ASMA), they were considered to be representative of practitioners in the measurement area. Two of the respondents had answered a call for volunteers; the others were approached for interviews. There was an element of selection involved in that all were known to have had active programs and were expected to be cooperative. The organisations were based in the Australian Capital Territory and the states of New South Wales and Victoria. The pilot test satisfactorily demonstrated that the questions in the revised MPSP could be understood and responded to by a representative sample of measurement program practitioners. The greatest area of difficulty for the respondents was to assess and categorise their organisational context, and support was usually required from the interviewer to do this. Since their responses are critical to understanding the context in which each measurement program was being conducted, this remains a concern and may mean the instrument is not appropriate for self-administration.

5.3. Analysis of Pilot Survey Data

To demonstrate that the data gathered with the MPSP instrument is usable, an analysis was conducted of the limited set of data collected during the pilot study. It is important to note that the results of this analysis are presented here only as a demonstration of the analysis that might be performed in order to explore the degree of association between the variables and measurement program success. One purpose of a Pilot Test is to assist estimation of the required sample size. Based on the pilot survey data and using the approach of Kraemer and Thiemann (1987), to be 90% confident that an association between an indicator and program success can be identified, at least 175 cases are required. If a 70% confidence level is satisfactory, then 100 cases are sufficient. If only 55 cases are collected, then the confidence level drops to 50%.
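The sample sizes quoted above come from the pilot data and Kraemer and Thiemann's tables. As a rough illustration of the kind of calculation involved, the sketch below uses the standard Fisher z approximation to estimate the number of cases needed to detect a correlation-sized association; the effect sizes, alpha and power values are assumptions, not figures taken from the pilot study.

```python
from math import atanh, ceil
from statistics import NormalDist


def cases_needed(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for detecting a correlation-type effect size
    with a two-sided test, via the Fisher z transformation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)        # critical value of the two-sided test
    z_power = z(power)                # quantile corresponding to the desired power
    c = atanh(effect_size)            # Fisher z transform of the effect size
    return ceil(((z_alpha + z_power) / c) ** 2 + 3)


# Smaller effects require many more cases; illustrative effect sizes only.
for r in (0.2, 0.25, 0.3):
    print(f"effect size {r:.2f}: about {cases_needed(r)} cases")
```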


Most of the questions in the survey could be answered with a Yes or No response, depending upon whether the success indicator was present or absent. They were framed so that a “Yes” response is expected for a successful measurement program if the reported association between the indicator and measurement program success is correct. These questions formed two-by-two contingency tables. A number of questions employ a categorical scale for responses. Since there were fewer than 5 observations in each category cell, there was an insufficient number to validly employ the chi-square test (Spurr and Bonini, 1973). Therefore, further analysis depends on obtaining more data. However, single-variable associations were explored using contingency table analysis. Seventeen of the questions in the MPSP pilot survey required responses to be expressed as ordinal values. Using a non-parametric technique (the Mann-Whitney-Wilcoxon Rank Sum Test (Hamburg, 1977)), a statistically significant difference in responses was found for only one indicator, which related to Enterprise Managerial Style. Questions relating to the number of employees, the scope of the program and the number of people in the metrics team required responses to be expressed as interval values. These indicators were also analysed using the Mann-Whitney-Wilcoxon Rank Sum Test. Enterprise and IT Department size had an interesting (but not statistically significant) association with success. The binary, categorical and ordinal responses were subjected to contingency table analysis. The intention was to calculate indices of association between success indicators and measurement program success. Indices such as Lambda (Goodman and Kruskal, 1979; Judd et al., 1991) and MP (Multiple Probability) (Darlington, 1999) are measures of the extent to which predictions of success or failure can be improved based on knowledge of the presence of the success indicator. Without that knowledge, the best prediction would be based on knowledge of the relative frequencies of successful and failed programs. For example, if the calculated value of the MP index for the indicator related to Demonstration of Management Commitment is .68, then a prediction of either program success or failure could be improved by 68% over a prediction based only on the relative frequencies if it were known whether management had demonstrated their commitment to the program. Table 1 (Ability to Predict) shows indicators that provide an increase of more than 30% in the ability to predict measurement success, depending upon the value of the indicator. The column headed MP index contains an index of association calculated using the Multiple Probability (MP) method of contingency analysis. MP values range between zero and one on a ratio scale. The variables in Table 1 have been organised in descending order of the increase in predictive power. An entry in the column headed “+” indicates that there is evidence in the literature that the presence of the variable is positively associated with program success. If using the MP index to make inferences about a population, it is important that the sample is representative of the relative proportions in the population. In this analysis, the sample is not representative in that it contains more successful programs than might be expected. Note that the MP index only shows the increase in predictive power from knowing the value of the indicator over knowing the marginal frequencies. The predictive power does not imply a cause-and-effect relationship between a program variable and the program outcome. Nor does it suggest that any particular variable has a greater impact upon the outcome than any other variable.
The predictive power of each variable has been calculated in isolation from the other variables.
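The proportional-reduction-in-error idea behind both Lambda and the MP index can be illustrated with a small function. The sketch below computes the Goodman-Kruskal lambda for a two-by-two table of indicator value against program outcome; Darlington's MP method differs in detail, and the counts shown are invented for illustration.

```python
def goodman_kruskal_lambda(table):
    """Goodman-Kruskal lambda for predicting the column variable (program
    outcome) from the row variable (indicator present/absent).

    table: list of rows, e.g. [[a, b], [c, d]] with columns (success, failure).
    """
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    # Prediction errors when always guessing the overall modal outcome.
    errors_without = n - max(col_totals)
    # Prediction errors when guessing the modal outcome within each row.
    errors_with = sum(sum(row) - max(row) for row in table)
    if errors_without == 0:
        return 0.0
    return (errors_without - errors_with) / errors_without


# Hypothetical counts: rows = indicator (present, absent), columns = (success, failure).
table = [[7, 1],
         [2, 3]]
print(f"lambda = {goodman_kruskal_lambda(table):.2f}")   # 0.25 for these counts
```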


Table 1. Ability to predict successful measurement from context.

Context-Related Success Variables                                                  +   MP index
Degree of stability & certainty in IT group's environment                              0.73
Demonstration of Management Commitment                                                 0.68
Degree to which pursuit of goals is preferred to evolution & adaptation                0.64
Degree to which organisation is bureaucratic versus organic                            0.64
Organisation having a financial culture                                                0.58
Degree to which IT staff are motivated by money versus job interest                    0.52
Degree of stability & certainty in organisation's environment                          0.49
Organisation valuing acquisition & communication of information                        0.47
Presence of an emphasis on quality in the IT group                                     0.41
Program being justified as enabling improved decision-making                           0.41
Degree to which management style is authoritarian or democratic                        0.40
Presence of a culture in which errors are "decriminalised"                             0.33
Extent of balance between low-discretion & high-discretion work in organisation        0.33
Presence of an emphasis on individual people, communication and relationships          0.31

Table 2. 100% Co-varying responses.

Responses to Question A                          MP    Responses to Question B                       MP
Product measures collected                       0.2   Start small with a few well-chosen metrics    0
Project management measures collected            0     Start small with a few well-chosen metrics    0
Tools for automated data collection & analysis   0     Metrics database created                      0

The issue of co-variance in the responses is relevant because it is the basis of techniques, such as Principal Component Analysis, that can be used to reduce the number of questions in a survey. Table 2 (100% Co-varying Responses) shows three pairs of questions in which the response to Question A was always the same as the response to Question B. However, as the MP values show, these responses contributed little to the ability to predict success as, in nearly all cases, the response to these questions was “yes.” Table 3 (Co-varying Responses) shows three pairs of questions for which the ability to predict is higher and for which the responses co-varied in 90% or more of cases.

Table 3. ≥ 90% Co-varying responses.

Question A                              MP    Question B                                    MP
Management Stated Commitment            0.2   Program goal is to improve Decision-Making    0.4
Management Demonstrated Commitment      0.7   Program goal is to improve Decision-Making    0.4
Emphasis on Quality in the Enterprise   0.2   Decriminalisation of Errors                   0.3


For these pairs, it is also possible to propose a relationship based on the issues of interest (management and decisions, quality and errors). If the co-variance is repeated with a larger sample, these pairs may present an opportunity for reducing the question set.
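A sketch of the question-reduction idea: Principal Component Analysis applied to a matrix of yes/no responses shows how much of the variation a few components capture; strongly co-varying questions load on the same component and become candidates for merging. The response matrix below is invented for illustration, and with binary items a categorical technique might be preferred in practice.

```python
import numpy as np
from sklearn.decomposition import PCA

# Rows = measurement programs, columns = yes(1)/no(0) answers to co-varying questions.
responses = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])

pca = PCA()
pca.fit(responses)

# A large share of variance on the first component suggests redundant questions.
print(pca.explained_variance_ratio_.round(2))
```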

6. The Importance of Context

The analysis of observations from the Pilot Study highlights the importance of characterising the context of the program. If a software measurement system is an instance of an information system, then we must recognise that such systems “are like social institutions in that they are embedded in a complex web of social norms and practices” (Hirschheim and Klein, 1994, MISQ 18, quoted by Lee, 1999). During the preparation of the MPSP instrument, more than one hundred variables relating to program success were identified from a review of the literature; few of these had been empirically validated by systematic studies. Furthermore, few of the researchers systematically address context issues, raising questions about the generalisability of their research (Lee, 1999). The 1993 Jeffery and Berry paper identified the need to consider contextual variables in future work in this area. Accordingly, for the MPSP instrument, the 1993 success variables were augmented by variables relating to the organisational context. These were principally drawn from the works of Lawrence and Lorsch (1967) and Burrell and Morgan (1979) on organisational classification. The rationale for the inclusion of contextual variables is that the context in which a measurement program is conducted applies constraints to the program, contains risks for the program and provides opportunities. Context may be expected, therefore, to play an important part in measurement program success. The inclusion of contextual variables is supported on three other bases:

1. Application of the General Systems Model (Basil and Cook, 1974) requires an explicit consideration of context.

2. Many success stories come from similar contexts (for example, Daskalantonakis, 1992; Garrett, 1989; Miller, 1989; Grady and Caswell, 1987; Rifkin and Cox, 1991; Bassman et al., 1994; ISO/IEC JTC1/SC7, 1995; Pfleeger, 1993).

3. There are a relatively high number of “people-related” variables in the patterns of success found by Rifkin and Cox (1991). This points to the importance of considering the socio-technical system. The people provide the inputs, carry out the processes and evaluate the products. The organisation provides the contextual setting for the activities of the knowledge workers (Scarborough, 1999).

The following is a list of the principal contextual variables included in the MPSP instrument:

• The stability of the external environment for the organisation

• The sector of the economy in which the organisation operates

• The segment of the Information Technology industry serviced by the IT group of the organisation




• Organisational culture

• Organisation size

• Organisational managerial system

• The extent to which an emphasis on quality exists in the organisation

• The ability of the organisation to learn from experience

• The software processes in use

• The degree of management commitment to measurement

• The extent to which measurement is integrated into the organisation

• The justification for measurement used by the organisation.

The different levels of granularity and the obvious overlap between some of these variables stem from the way they have been taken from the literature. Future work concerning the assessment of measurement will attempt to structure them into a framework for the characterisation of context.

7. Conclusions

Researchers and authors have highlighted the incidence of failures, difficult successes and the lack of maturity in software measurement. Since measurement is a critical part of the infrastructure processes of software engineering organisations, it is essential that research be directed towards improving software measurement. The MPSP instrument will support that research. The MPSP project has produced a validated instrument for the collection of data about measurement programs with a novel emphasis on characterisation of the context. The instrument is of use to measurement program researchers and also to practitioners wanting to assess their organisation's measurement program. The instrument is appropriate for a structured interview situation and, with further work, might be used for self-assessment (a web-based survey instrument is currently being trialed). The data collected with the MPSP is amenable to statistical analysis. The analysis of data collected during the pilot testing of the instrument produced the following indications.

1. The data from the MPSP pilot study reinforces the view that data on contextual variables need to be collected because of their likely association with measurement program success.

2. The data also suggests that some of the many process-related variables nominated as important for success may have little impact upon program success. It is likely that they are important only in conjunction with other variables. These associations between variables will become clear only with more data. In particular, more data is required on failed programs if any valid research is to be carried out. Unfortunately, data on failed programs is hard to obtain.


3. Different patterns seem to be present in the Australian data (predominantly commercial and non-military government organisations) from those reported by researchers in the U.S. (predominantly from military, engineering, and telecommunications contexts). Further data is necessary, however, before any conclusive evidence can be presented.

The focus of the MPSP instrument is on organisational context and practices, and it supports factor-based research into measurement program success. However, parallel work in the area of information systems success measurement suggests that expectations of being able to explain and predict software measurement program success in terms of success factors may be unrealistic (Kanellis et al., 1998). Factor-based research should identify certain key factors that are associated with relative success and failure (depending on their definition). These will probably represent the "hygiene" factors: the things that should be right if the program is to have a chance of succeeding. Identifying these factors is useful work but will provide insufficient guidance to people seeking to improve their measurement. To provide this guidance, a complementary method is currently being trialed by one of the authors. This method assesses the relationship of measurement with a specific well-defined process (e.g. a key process area as defined in the CMM model), as enacted in a specific organisational context by specific individuals. The degree of specificity that can be achieved should make it easier to identify opportunities for improvement and to initiate action. In the trial of the targeted instrument, the MPSP instrument is also being used for its ability to characterise the organisational measurement program, since this should enable identification of organisational factors that might constrain proposed improvement actions.

References

Basil and Cook. 1974. The Management of Change. Maidenhead, U.K.: McGraw-Hill.
Bassman, M., McGarry, F., and Pajerski, R. 1994. Software Measurement Guidebook. Greenbelt, Maryland: Software Engineering Laboratory.
Beach, L. R. (Editor). 1996. Decision Making in the Workplace: A Unified Perspective. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Boynton, A., and Zmud, R. 1984. An analysis of critical success factors. Sloan Management Review (Summer): 17–27.
Burrell, G., and Morgan, G. 1979. Sociological Paradigms and Organisational Analysis. London: Heinemann.
Coe, L. 1998. Five small secrets to systems success. Garrity, E., and Sanders, G. (eds). Information Systems Success Measurement. Hershey: Idea Group Publishing.
Comer, P., and Chard, F. 1993. A measurement maturity model. Software Quality Journal 2: 277–89.
Crockett, R. A. 1990. An Introduction to Sample Surveys: A User's Guide. Melbourne: Australian Bureau of Statistics.
Darlington, R. B. Measures of association in crosstab tables. Available at: http://comp9.psych.cornell.edu/Darlington/crosstab/table0.htm. Accessed December, 1999.
Daskalantonakis, M. K., Yacobellis, R. H., and Basili, V. R. 1990. A method for assessing software measurement technology. Quality Engineering 3: 27–40.
Daskalantonakis, M. K. 1992. A practical view of software measurement and implementation experiences within Motorola. IEEE Transactions on Software Engineering, pp. 886–1001.
DeLone, W. H., and McLean, E. R. 1992. Information systems success: The quest for the dependent variable. Information Systems Research 3(1): 60–95.
Dutta, S., Lee, M., and Van Wassenhove, L. 1999. Software engineering in Europe: A study of best practices. IEEE Software: 82–89.
Fenton, N. 1991. Software Metrics: A Rigorous Approach. London: Chapman & Hall.


Fenton, N., and Neil, M. 1998. New directions in software metrics. Available at: http://www.agena.co.uk/new directions metrics/start.htm. Accessed April, 1999.
Garrett, W. A. 1989. Proving application development productivity and quality. Proc. 1989 Spring Conf. of IFPUG. San Diego: International Function Point Users Group.
Garrity, E., and Sanders, G. 1998. Dimensions of information systems success. Garrity, E., and Sanders, G. (eds). Information Systems Success Measurement. Hershey: Idea Group Publishing.
Goldenson, D., Gopal, A., and Mukhopadhyay, T. 1999. Determinants of success in software measurement programs: Initial results. Sixth International Software Metrics Symposium. (Nov 4–6). Boca Raton, Florida, USA. Los Alamitos, California: IEEE Computer Society.
Goodman, L. A., and Kruskal, W. H. 1979. Measures of Association for Cross Classifications. New York: Springer-Verlag.
Grady, R. B., and Caswell, D. L. 1987. Software Metrics: Establishing a Company-Wide Program. New Jersey: Prentice-Hall.
Hall, T., and Fenton, N. 1994. Implementing software metrics – the critical success factors. Software Quality Journal 3(4): 195–208.
Hamburg, M. 1977. Statistical Analysis for Decision Making. (2nd ed). USA: Harcourt Brace Jovanovich, Inc.
Holdsworth, J. 1994. Software Process Design: Out of the Tar Pit. Maidenhead: McGraw-Hill.
ISO/IEC JTC1/SC7. 1995. Information technology – Software life cycle processes. Geneva: International Organization for Standardization. ISO/IEC TR 12207:1995.
ISO/IEC JTC1/SC7. 1998. Information technology – Software process assessment. Geneva: International Organization for Standardization. ISO/IEC TR 15504:1998.
Jeffery, R., and Berry, M. 1993. A framework for evaluation and prediction of metrics program success. Proc. of the IEEE International Software Metrics Symposium. (May 17–21). Baltimore. Los Alamitos: IEEE Computer Society.
Jeffery, R., and Zucker, B. (Centre for Advanced Empirical Software Research). 1997. The state of practice in software metrics. Technical Report No. 97/1. Sydney, Australia: CAESAR, University of New South Wales.
Judd, D., Smith, E., and Kidder, L. 1991. Research Methods in Social Relations. (6th ed). Orlando, USA: Holt, Rinehart and Winston, Inc.
Kanellis, P., Lycett, M., and Paul, R. J. 1998. An interpretive approach to the measurement of information systems success. Garrity, E., and Sanders, G. (eds). Information Systems Success Measurement. Hershey: Idea Group Publishing.
Kraemer, H. C., and Thiemann, S. 1987. How Many Subjects? Statistical Power Analysis in Research. Newbury Park: Sage Publications.
Laframboise, L., and Abran, A. 1996. Grille d'evaluation des facteurs de risque d'un programme de mesures en Genie Logiciel. Proceedings of the Vision 96 Conference on Software Process Improvement. Montreal.
Lawrence, P. R., and Lorsch, J. W. 1967. Organisation and Environment: Managing Differentiation and Integration. Boston: Harvard Business School Press.
Lee, A. S. 1999. Researching MIS. Currie, L., and Galliers, B. (eds). Rethinking Management Information Systems. New York: Oxford UP, pp. 7–27.
Miller, J. C. 1989. Measurement using function point analysis. Proc. 1989 Spring Conference of IFPUG. San Diego, California: International Function Point Users Group.
Niessink, F., and van Vliet, H. 1999. Measurements should generate value, rather than data. Proceedings of the Sixth International Software Metrics Symposium. (Nov 4–6). Boca Raton, Florida. Los Alamitos, California: IEEE Computer Society.
Niessink, F. 2000. Perspectives on Improving Software Maintenance. Amsterdam: SIKS, Dutch Graduate School for Information and Knowledge Systems.
Orlikowski, W. J., and Baroudi, J. J. 1991. Studying information technology in organisations: Research approaches and assumptions. Information Systems Research 2: 1–28.
Pfleeger, S. 1993. Lessons learned in building a corporate metric program. IEEE Software (May): 67–74.
Rifkin, S., and Cox, C. 1991. Measurement in practice. Technical Report CMU/SEI-91-TR-16. Software Engineering Institute, Carnegie Mellon University.
Rosenthal, R., and Rosnow, R. 1979. The volunteer subject. Mowday, R. T., and Steers, R. M. (eds). Research in Organisations: Issues and Controversies. Santa Monica: Goodyear Publishing.
Rubin, H. A. 1987. Critical success factors for measurement programs. Proc. 1987 Spring Conference of IFPUG. Scottsdale, Arizona: International Function Point Users Group.


Sauer, C. 1999. Deciding the future for IS failures. Currie, L., and Galliers, B. (eds). Rethinking Management Information Systems. New York: Oxford UP, pp. 279–309.
Scarborough, H. 1999. The management of knowledge workers. Currie, L., and Galliers, B. (eds). Rethinking Management Information Systems. New York: Oxford UP, pp. 474–96.
Selby, R., Porter, A., Schmidt, and Berney. 1991. Metric driven analysis and feedback systems for enabling empirically guided software development. Proc. 13th International Conference on Software Engineering. IEEE Computer Society.
Slevin, D. P., and Pinto, J. K. 1986. The project implementation profile. Project Management Journal (September): 57–70.
Slovin, M. 1997. Measuring measurement maturity. IT Metrics Strategies III(4): 11–13.
Spurr, W. A., and Bonini, C. P. 1973. Statistical Analysis for Business Decisions. Illinois, USA: Richard D. Irwin.
Zuse, H. 1998. The history of software measurement. Available at: http://irb.cs.tu-berlin.de/~zuse/. Accessed April, 1999.

Mike Berry is currently a researcher within the Organisational Performance Measurement group of CSIRO Mathematical and Information Sciences. In this role, his current focus is on the establishment and evaluation of measurement. However, his previous work with CSIRO has involved software process improvement and software quality management. The majority of Mike's thirty-year career was spent in commercial software development as a programmer, systems analyst and technical consultant. Then a redundancy package gave him the stimulus and opportunity to explore other worlds. This led him to a partnership in a small software business, lecturing and research at UNSW, and now research and consulting with CSIRO. Mike's specialisation is in software metrics and the implementation of software measurement programs. He has been a practitioner and researcher in the area since 1988 and has presented papers on measurement at a number of forums. He is a founder-member of the Australian Software Metrics Association.

Ross Jeffery is Professor of Software Engineering in the School of Computer Science and Engineering and Director of the Centre for Advanced Software Engineering Research (CAESAR) at The University of New South Wales. Professor Jeffery was the first Head of the School of Information Systems at UNSW from 1989 to 1994. He was the founding Chairman of the Australian Software Metrics Association and also was instrumental in creating the Australian Conference on Information Systems, for which he was General Chair for the first two meetings. He served on the editorial board of the IEEE Transactions on Software Engineering for many years, and is an Associate Editor of the Journal of Empirical Software Engineering. He has served on the steering committee of the International Conference on Software Engineering, and is also on the editorial board of the Wiley International Series in Information Systems. He was honoured by the Australian Computer Society for his contribution to research in software engineering. His current research interests are in software engineering process and product modeling and improvement, software metrics, software technical and management reviews, and software resource modeling. His research has involved over fifty government and industry organizations over a period of 15 years. He has also held positions at The University of Queensland, University of Maryland, and University of Minnesota. He has authored/co-authored four books and over seventy research papers.