The Challenge of Metrics Implementation

Jakob H. Iversen (Department of Computer Science, Aalborg University)
Karlheinz Kautz (Institute for Informatics, Copenhagen Business School)

Abstract. This paper describes ten principles that may be valuable to a metrics implementation effort. The principles are based on the authors' practical experiences with metrics implementation in different companies. They are intended to serve as a guideline for companies wishing to implement metrics, and they are therefore presented as a pragmatic approach to metrics and illustrated with examples from the companies.

Keywords: Metrics, software process improvement, software development

1. Introduction

Metrics programs are a vital part of every serious software process improvement endeavor. Despite the reports on successful metrics implementation (Fenton & Pfleeger, 1997; Dekkers, 1999; Weinberg, 1993; Carleton et al., 1992), most companies experience great difficulties when trying to implement a metrics program as part of their software process improvement activities. Does this mean that the many experts who write the books and recommendations are wrong? Likely not. The truth seems to lie in the complexity and uncertainty of implementing metrics as part of the changes intended by an improvement initiative. Various writers recommend key elements for the implementation of metrics programs. Grady & Caswell (1987) derive their suggestions from Hewlett-Packard, a very large US-based IT provider. Rifkin & Cox (1991) have analyzed the eleven best measurement programs in the US, among them NASA, Contel and Hewlett-Packard. Pfleeger (1993) puts forward ten lessons learned, again from Contel, a large telecommunications organization, and Dekkers (1999) offers her seven 'secrets' for successful measurement program implementation based on her consulting work in North America. The factors presented by these authors largely overlap, and as examples we show a summary of Rifkin & Cox's work and Dekkers' advice in Table 1 and Table 2. Such guidelines and frameworks are useful, but to be successful an organization should tailor them to the situation at hand rather than follow them to the letter. All authors emphasize consideration of cultural issues. We have assisted in and analyzed the implementation of different metrics programs in a different cultural environment in Northern Europe. The organizations we have studied are small in comparison to the North American ones, although from a European perspective they range from large to very small. On this basis we have formulated nine principles for increasing the likelihood of successful metrics implementation (Table 3). The principles do not cover all aspects of metrics implementation, and following them does not guarantee successful metrics

implementation. Some of them have common characteristics, so great care and tailoring is required when applying them to a specific metrics project. The principles are intended as inspiration and help for practitioners who want to establish metrics programs in organizations.

Table 1. Recommendations for successful metrics programs (Rifkin and Cox, 1991). The recommendations are grouped into four pattern types: Measure, People, Program, and Implementation.
1. Start small
2. Use a rigorously defined set
3. Automate collection and reporting
4. Motivate managers
5. Set expectations
6. Involve all stakeholders
7. Educate and train
8. Earn trust
9. Take an evolutionary approach
10. Plan to throw one away
11. Get the right information to the right people
12. Strive for an initial success
13. Add value
14. Empower developers to use measurement information
15. Take a "whole process" view
16. Understand that adoption takes time

Table 2. Secrets of Highly Successful Measurement Programs (Dekkers, 1999).
1. Set solid objectives and plans for measurement
2. Make the measurement program part of the process, not a management "pet project"
3. Gain a thorough understanding of what measurement is all about, including benefits and limitations
4. Focus on cultural issues
5. Create a safe environment for reporting true data
6. A predisposition to change
7. A complementary suite of measures

Table 3. Summary of our principles
Knowledge: 1. Use improvement knowledge; 2. Use organizational knowledge
Organization: 3. Establish a project; 4. Establish incentive structures
Design: 5. Start by determining goals; 6. Start simple
Communication: 7. Publish objectives and collected data widely; 8. Facilitate debate
Usage: 9. Use the data

We have arranged our presentation according to five areas, namely knowing about improvement, organizing metrics programs, defining and collecting metrics, communicating metrics programs and results, and using metrics. We also illustrate our principles with concrete examples from two cases and discuss why, even if the principles are followed, it is still difficult to put metrics programs into operation. First we briefly introduce the theoretical background and our rationale for deploying metrics in practice. Then we present the metrics programs from the companies, followed by the principles and their application in action. We complete the paper with a discussion of how to tailor the principles to a specific context.

2. Background

It has to be underlined again: simply applying these principles is no guarantee of success. We have identified them as a result of our literature survey and our experience from developing and introducing metrics programs ourselves. We have applied some of them successfully in practice under the particular conditions of one case (Kautz, 1999). Thus we can confirm the results of earlier research on the introduction of metrics programs. Others we suggest informed by the above-referenced studies, on the basis of a thorough analysis of the problems we observed in the other case (Iversen & Mathiassen, 2000). The principles are very pragmatic in nature and might be criticized for lacking highly developed metrics theory as their foundation. But this is exactly the point here. Given the insight that measurement is an essential component of improvement, we agree with Larry Draffel (Draffel, 1994), then Director of the SEI, that it is more important to initiate the collection and analysis of data than to wait "until we get the 'right' measures". Of course, metrics have to be technically correct and meaningful, and theory should not be ignored. On the other hand, there is no need to be frightened by it. Once you have launched an initial program you might want to evolve it in different directions, and with growing experience your metrics program might be developed further. At that point, theory and advanced advice make sense. Then an academic definition that 'measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined

rules’ and that ‘we define measurement as a mapping from the empirical world to the formal, relational world’ and that a metrics as ‘a measure is the number or symbol assigned to an entity by this mapping in order to characterize an attribute’ as given by Fenton & Hall (1997) becomes understandable and invaluable. Advanced rules for mapping and the conditions for representing data to ensure meaningfulness in measurement will then turn out to be even more interesting. In general, the book by Fenton & Hall (1997) entitled ‘Software Metrics – A Rigorous & Practical Approach’ in that case contains helpful guidance in this respect We have not introduced the distinction between direct measurement of an attribute that does not depend on the measurement of another attribute like the length of a source code document, and indirect measurement that involves measures of one or more other attributes like module defect density, which can be defined as the number of defects in relation to module size. Such distinction may be considered academic in a simple metrics program, but might however be helpful for a sophisticated and complex program. Finally, the principles do not cover basics of measurements concerning measurement scales and scale types. Measurement theory distinguishes between nominal, ordinal, interval, ratio, and absolute scales and for these defines the mapping and computing rules which make sense. Such theory is important, but rather supplements than interferes with our principles. They are also applicable independently of whether a purely metrics driven improvement approach (Aaen et al., 2000) as represented by the ami-method is chosen or whether improvement is based on the well-known assessment and improvement approaches like CMM (Humphrey, 1989), Bootstrap (Kuvaja et al., 1994) or SPICE (El Emam, 1998) is used. The principles are both directed at measurement in the small, as we have seen for measuring single characteristics in small companies (Kautz, 1999), as well as for larger endeavors involving companies with several hundred software developers (Iversen & Mathiassen, 2000). We show that they can be used for measuring ‘hard’, engineering issues as in the three small companies and ‘soft’ human issues as in Danske Data earlier in the text. The crucial point is to adjust them to the environment and the situation at hand.

2.1. Case 1: Danske Data A/S

The metrics program in Danske Data (Iversen & Mathiassen, 2000; Chapter 5) was initiated to show a 10% improvement in efficiency from the SPI project. This was done by introducing a program to measure the following six factors: project productivity, quality, adherence to schedule, adherence to budget, customer satisfaction, and employee satisfaction. One of us was a member of the SPI group and closely followed the metrics program for almost three years. The setup of the program was that data was collected on all finished projects in each quarter, and then a report was published with these data. Since it was the CEO who defined the 10% goal for SPI, senior management was thought to be the target group for the metrics program. The intention was also to minimize disturbance of the development projects, and to collect as much data as possible automatically. The Danske Data metrics program was initiated in March 1997 and the first report to be made public within the organization was published two and a half years later in September 1999.

Table 4. Indicators of the metrics program in Danske Data
Project productivity: Resources used to develop the system relative to its size in function points.
Quality: Number of error reports, both absolute and relative to size in function points.
Adherence to schedule: Variation from agreed time of delivery, both absolute and relative to size in function points.
Adherence to budget: Variation from estimated use of resources.
Customer satisfaction: Satisfaction with the development process and the implemented solution (multiple choice questionnaire).
Employee satisfaction: Satisfaction with the development process (multiple choice questionnaire).

The metrics program was intended from the outset to measure the six indicators shown in Table 4. The idea was that by including an array of measures, the results would be better balanced and dysfunctional behavior would be less likely and have less impact. The major hurdle was to establish a sufficient level of data quality for senior management to be confident enough to make the reports public within the organization. One of the problems was the attempt to count function points automatically. This had to be abandoned after several tries because the results were not believed to reflect reality. Another problem was that the questionnaires to measure customer satisfaction covered questions relating to contractual agreements and to the entire course of the project, whereas those who answered the questionnaire were users who had only been involved in acceptance tests.
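As an illustration only, since the paper does not give Danske Data's exact formulas, the following sketch shows how two indicators of the kind listed in Table 4 (project productivity and adherence to schedule, both relative to size in function points) might be computed from per-project data; the project name, fields and figures are hypothetical.

```python
# Hypothetical sketch of computing two Table 4-style indicators per project.
from dataclasses import dataclass
from datetime import date

@dataclass
class Project:
    name: str
    function_points: float       # system size in function points
    effort_person_days: float    # resources used
    agreed_delivery: date
    actual_delivery: date

    def productivity(self) -> float:
        # Project productivity: size delivered relative to resources used.
        return self.function_points / self.effort_person_days

    def schedule_deviation_days(self) -> int:
        # Adherence to schedule: deviation from the agreed delivery date.
        return (self.actual_delivery - self.agreed_delivery).days

    def schedule_deviation_per_fp(self) -> float:
        # ... and the same deviation relative to size in function points.
        return self.schedule_deviation_days() / self.function_points

# Invented example project.
p = Project("loan-system", 420.0, 900.0, date(1999, 3, 1), date(1999, 4, 15))
print(f"{p.productivity():.2f} FP/person-day, "
      f"{p.schedule_deviation_days()} days late "
      f"({p.schedule_deviation_per_fp():.3f} days/FP)")
```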

2.2. Case 2: Three Small Companies

In order to illustrate metrics implementation in further detail we have chosen to include a case that is otherwise not a part of the SPI project. This has been done for three reasons. First, of the four companies in the SPI project, only Danske Data has actively worked on establishing a metrics program. By including an external case, we compensate for this. Second, with no more than five system developers and almost no administrative overhead, the three companies are extremely small compared to all of the companies in the project, and particularly to Danske Data. We are thus able to discuss the applicability of the principles in both large and small organizations. With small start-up software companies becoming increasingly prevalent, it is very relevant to discuss improvement for this type of organization. Third, one of us acted as consultant for that project, which provides easy and extensive access to data and knowledge. Each of the three companies (Kautz, 1998; Kautz, 1999) bases its business on developing one main product. Respectively, they develop a system for stochastic modeling and analysis, a system simulating oil reservoirs, and a system for managing pension funds and their members. They were all less than five years old when their improvement project started; none had more than five employees, and all of them worked as system developers. Due to a growing demand for variants of their products, each company independently experienced problems with controlling source code versions and a need to introduce configuration management. Resources were scarce, but the European Union's (EU) European Systems and Software Initiative (ESSI) program sponsored the project. At the request of the project sponsor, metrics to verify and validate the effect of the

improvement actions had to be elaborated and established. Originally, neither the project leaders nor the developers were convinced about the benefits of a metrics program. However, informed by the practical work with the new configuration and change request management routines, distrust was gradually removed, and a shared understanding of metrics and an opinion about what could be interesting to measure were formed. The metrics programs served different, dedicated purposes in each company. This is one reason why all employees adopted them. It made the effect of the introduced measures visible and resulted in improved planning and performance, improved working conditions and higher customer satisfaction. One company measured the number of fixed change requests delivered on time and the time used in review meetings and found that on-time delivery of change requests had increased from 45% to 77% and that review meeting time had been shortened by a factor of 4. Another found that the average time spent on a change request was twice what was expected. Although disappointed, they considered it a realistic figure, which allowed better planning and information for the customers. Although a certain imprecision and inaccuracy lay in this data – which underlines how carefully metrics should be used – the involved companies were confident that the metrics indicated interesting tendencies. They now plan to improve their metrics programs and will continue measuring.

Table 5. Results of the metrics programs for the three small companies
Company A: Chief developer effort on library development decreased. Library development effort by other staff increased from 8% to 16%. Efficiency had not decreased. Improved planning and greater flexibility in work organization.
Company B: The number of fixed change requests delivered on time increased from 45% to 77%. The time spent on code merging was reduced by more than a factor of 4. The weekly review meetings were shortened by a factor of 4. Improved planning and performance, improved working conditions, higher customer satisfaction.
Company C: The number of change requests increased considerably. The time spent on change request handling was determined more precisely. The preparation time for releases was reduced drastically. Improved planning and performance, improved working conditions, higher customer satisfaction.
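Before turning to the principles, a brief sketch of our own (not the companies' actual tooling) shows how a metric like the on-time change request rate in Table 5 could be derived from an existing change request log; the record format, identifiers and dates are invented.

```python
# Hypothetical sketch: deriving the "fixed change requests delivered on time"
# metric from an existing change request log. The companies aggregated their
# figures from change request databases; this record format is invented.
from datetime import date

change_requests = [
    # (request id, promised delivery, actual delivery)
    ("CR-101", date(1998, 5, 1), date(1998, 4, 28)),
    ("CR-102", date(1998, 5, 10), date(1998, 5, 14)),
    ("CR-103", date(1998, 6, 1), date(1998, 6, 1)),
]

on_time = sum(1 for _, promised, actual in change_requests if actual <= promised)
rate = on_time / len(change_requests)
print(f"Fixed change requests delivered on time: {rate:.0%}")
```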

3. Knowledge

Principle 1: Use improvement knowledge

Developing and deploying metrics programs call for different kinds of knowledge by the members of the organization as well as by the members of a possibly established improvement and measurement team. Implementing metrics not only requires the participants to know about the state of the art of software metrics. Metrics implementation is a very complex form of organizational change, so in order for the implementation to be successful, the implementers should be knowledgeable and experienced in matters of software process improvement, software development, software engineering, as well as organizational change.

Principle 1 in Action

In the case of the three small companies, the organizations originally had no knowledge about software process improvement and metrics programs. This was provided by external specialists. The overall improvement initiative was guided by the SPICE approach (El Emam et al., 1998) and the metrics program was inspired by the GQM paradigm (Basili & Rombach, 1988) and the ami method (Pulford et al., 1996), but executed in a very lean manner. On this basis, instead of developing an all too comprehensive metrics program, the practitioners and their consultants collaborated on developing simple, quantitative, but small-scale metrics programs. The consultants provided guidance in defining metrics and collecting data and helped prepare documents and reports. However, it is important to note that the consultants only provided assistance, not prescriptions. The members of the companies themselves accomplished the changes and implemented the new practices. The story in Danske Data is very similar. The practitioners did not have any prior knowledge of either SPI or metrics. Instead, the researchers and consultants stepped in and provided literature, lectures, and general advice. In addition, one of the other companies in the project had already been working on defining a metrics program, and some of the principles from this were transferred to Danske Data. In this way, we helped the metrics group gain the necessary knowledge and experience along the way. However, most of the interactions between the practitioners and the researchers took the form of the researchers providing some knowledge, and the practitioners then using this knowledge to make decisions about how to implement the metrics program.

Principle 2: Use organizational knowledge

Many metrics programs fail due to the limited understanding of the organizational context in which the program will operate. There is a difference between the work procedures as described in handbooks and the actual work practices. It is necessary to be aware of the needs of the various actors to collect data that makes sense to them. An understanding of the organizational politics is also most helpful. Involving as many of the employees affected by the intended changes as possible, or their representatives, is one way of obtaining this knowledge.

Principle 2 in Action

In the three small companies, knowledge about the organization was collected in the following way: the external consultants acted as analysts of the current practice and carried out interviews with the developers at the beginning of the project concerning the administration of different system versions, the handling of change requests and wishes, and the management and registration of resources. This provided valuable input for the development of the routines for handling configuration management and change requests, and it provided the knowledge necessary to define metrics and to gather data. The most important result of this analysis was to keep the metrics very simple, such that they would not interfere with the organizations' creativity and flexibility. It was also decided to aggregate the figures from existing data, which was easily accessible in the established configuration item libraries and the change request databases. This was seen as a compromise between the existing doubts and the promised benefits, and it decreased the burden on the developers, who were not used to collecting data about their daily work. In Danske Data, no special activities were aimed at increasing the researchers' general

understanding of the organization in relation to the metrics program. Instead, we relied on the practitioners' general understanding of the organization. This seemed reasonable since the SPI group had a very broad base in the organization. Several members were also project managers on development projects, and were thereby in daily contact with other project managers and developers and were thus able to discern what the rest of the organization felt about the initiative. Similarly to principle 1, where the practitioners gained knowledge along the way about SPI and metrics, the researchers gained knowledge along the way about the organization. Toward the end of the project, the difference in knowledge between researchers and practitioners was greatly reduced in these areas.

4. Organization

Principle 3: Establish a project.

Metrics programs do not come for free. They need attention and visibility and consume resources. Setting up the initiative as a formal project with responsibilities for planning, reporting progress, and with explicit success criteria supports these objectives. Organizing the introduction of several metrics in the context of a project in a stepwise manner will also contribute to the desired effects.

Principle 3 in Action

At first the metrics program in Danske Data was conducted as a concentrated effort, and within a few months it was possible to establish the foundation of the metrics program and begin the collection of data. The people working on the metrics program did so because they found the work interesting, and they were therefore very committed to the successful implementation of the program. However, when the program later became riddled with problems, this form of organization became inadequate. It became evident that it was necessary to establish a dedicated improvement project to improve the quality of the metrics program. This improvement project was a carefully planned and staffed effort with clear goals and success criteria. After about six months of work, the improvement project was successful, and the quality of the metrics program had increased enough to allow publication of the results in the organization. Establishing a formal project made the program far more visible in the organization and made it much easier for the participants to argue that adequate resources should be made available. The amount of missing or unreliable data was also measured, and it became a specific goal that these numbers should go down. In the three small companies, the introduction and utilization of the metrics program were clearly marked work packages with formally required deliverables as part of a defined improvement project. An organizational structure was decided on in which, in each company, one senior developer acted as a local project leader, with one of them being the overall project manager. The day-to-day management and work of the overall project was coordinated in weekly meetings between the project leaders and within the organizations, where both formal and informal matters were discussed. The request for documented deliverables enforced a certain discipline and product-orientation on the team. According to the project members, without the demand for documented results much less would have been put in writing and stored for future actions. The local project leaders also acted as principal project members and were, among other things, responsible for the establishment of the metrics. They were also accountable for the collection of the measurement data. In each company at least one additional developer was, to a varying extent, involved in providing metrics data. Against this background, and despite competing with the daily business, serious-mindedness characterized the introduction of the metrics program.

Principle 4: Establish incentive structures.

Those who report data to a metrics program need to see some form of advantage in the program. In the best case, the results of the metrics program are directly beneficial for the developers in their daily practice. However, in some cases a more indirect approach using a bonus system, awards, etc. may be applied to facilitate the uptake of the metrics program. It should be noted that bonus systems can often be easily circumvented and become dysfunctional. However, if designed carefully they may prove an efficient kick-start for metrics implementation. In addition, disadvantages and extra burdens must be avoided. When tools and procedures counteract efficient collection and reporting of data, or when they are not properly integrated into the developers' normal work practice, it is far less likely that sufficient data quality will be attained.

Principle 4 in Action

The Danske Data case illustrates the value of establishing incentives to improve data quality. All projects were required to record several kinds of data in the on-line project and hour registration system, but almost no projects recorded complete and accurate data, mainly because they saw no immediate use for the data they provided. A marked improvement of data quality was achieved by informing the project managers about what data they should report and how to do it, as well as by explaining the importance of the data they provided and showing results based on it. Moreover, when one of the divisions made the reporting of complete and accurate data part of the bonus system, a very clear incentive scheme was established for both the project managers and the division manager. The data quality rapidly improved in this division for the projects starting after the bonus system was established. This stood in contrast to an 'adverse' incentive that was also in place in Danske Data. Originally, the systems for registering data about the projects and the hours spent working on them were only useful for the daily practice of the economics department. The project managers saw no use in entering the data into the databases. They felt that it was not possible to submit data that would provide a full understanding of a development project. Only one estimated completion date could be entered, which meant that in cases where a project was re-estimated, the original estimate would either be lost or the new, realistic estimate would not be entered. With no clear guidance on how to cope with this situation, many project managers simply refrained from entering any data at all. A recent improvement initiative has been established to develop better systems and to alleviate these problems. In the three small companies no explicit bonus system or special awards were introduced to facilitate the uptake of the metrics program. However, the developers directly experienced the benefits of a metrics program as part of the improvement initiative. In one company the employees stated that working conditions had improved and that the correctness and integrity of the software items had increased. Fewer errors were reported from individual customers after

delivery. Thus, higher customer satisfaction could be documented. This was confirmed by the employees of another of the companies who stated that the work was experienced as much more professional and motivating as a result of a recorded process.
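As a hedged illustration of the kind of per-project data-completeness check that could feed a reporting incentive of the sort described above (the required fields, threshold and values are invented, not Danske Data's actual scheme):

```python
# Hypothetical sketch: checking completeness of the data a project reports,
# e.g. as input to a bonus scheme rewarding complete and accurate reporting.
REQUIRED_FIELDS = ["estimated_completion", "actual_completion",
                   "hours_spent", "function_points"]  # invented field names

def completeness(project_record: dict) -> float:
    """Fraction of required fields that are actually filled in."""
    filled = sum(1 for f in REQUIRED_FIELDS if project_record.get(f) is not None)
    return filled / len(REQUIRED_FIELDS)

# Invented example record with two missing fields.
record = {"estimated_completion": "1999-06-01", "actual_completion": None,
          "hours_spent": 1200, "function_points": None}
score = completeness(record)
print(f"Data completeness: {score:.0%}",
      "(meets target)" if score >= 0.9 else "(below target)")
```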

5. Design

Principle 5: Start by determining goals.

Successful implementation of metrics requires very clear goals from the outset; otherwise the effort becomes very difficult to manage and decisions on which measures to make must be made blindly. One approach that might help design the metrics program based on goals is the Goal-Question-Metric approach (Basili & Weiss, 1984). No matter which method is used, however, it is important that the goals and the derived metrics are meaningful both for those who deliver the data and, if they are not the same people, for those who use it.

Principle 5 in Action

In the case of the three companies, project objectives were formulated at the very beginning, as requested by the project sponsor. These were to 1) formalize and document procedures for configuration management in each company, 2) choose and implement configuration management software, and 3) establish a metrics program to measure the impact of the new process. Initially the following outcomes were expected: 1) improved procedures for configuration management of source code were expected to reduce the time spent on producing new versions and releases, correcting errors, and handling changes; 2) improved procedures for documentation of source code should allow all developers to work in all areas of the companies' products and should facilitate the extension of the development teams; and 3) improved procedures for testing source code should reduce the number of errors in the delivered products. On this basis, the companies were then able to decide which measures were most critical for them. Each company had different key objectives: Company A wanted to let all developers work on all parts of their product, thus reducing reliance on the chief developer; Company B wanted to increase the number of accepted change requests handled in a given time frame; Company C wanted to reduce the time it took to handle customer change requests and finalize releases for shipping. With these clear objectives the employees collected the necessary data. In Danske Data, the original goal was to show a 10% improvement in productivity. However, this was a very broad goal, which was very difficult to operationalize. When the six factors in Table 4 were defined, it helped, but the six factors were chosen to broaden the horizon of the metrics program rather than to support the stated goal. In addition, the ownership of the metrics program was problematic. Although the CEO had stated the 10% goal, he had, in fact, not requested a metrics program. When the metrics group started its work, they felt that they were fulfilling a vision set by management, but when the results started coming in, management did not seem enthusiastic about them. One of the problems was that the metrics program was defined before any improvement initiatives were launched. The metrics were therefore not very well suited to give information about the effect of specific improvement initiatives, but were more directed at the general state of the development process.
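To illustrate the goal-driven derivation suggested by the Goal-Question-Metric approach mentioned under Principle 5, here is a minimal structural sketch of our own; the goal, questions and metric names are invented, loosely modeled on Company B's objective, and are not the actual GQM tables used in the cases.

```python
# Hypothetical Goal-Question-Metric sketch (structure only; content invented,
# loosely inspired by Company B's objective of delivering change requests on time).
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    definition: str

@dataclass
class Question:
    text: str
    metrics: list[Metric] = field(default_factory=list)

@dataclass
class Goal:
    statement: str
    questions: list[Question] = field(default_factory=list)

goal = Goal(
    "Deliver accepted change requests within the estimated time frame",
    [Question("What share of fixed change requests is delivered on time?",
              [Metric("on_time_rate",
                      "fixed change requests delivered on or before the "
                      "promised date / all fixed change requests")]),
     Question("How long does handling a change request take?",
              [Metric("avg_handling_time",
                      "mean elapsed time from acceptance to delivery")])])

# Walk the derivation chain from goal to questions to metrics.
for q in goal.questions:
    for m in q.metrics:
        print(f"{goal.statement} -> {q.text} -> {m.name}")
```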

Principle 6: Start simple.

Systematically collecting data to use as a basis for decision-making is a difficult and complex undertaking. Therefore, start with a small set of metrics; perhaps even a single measure will be able to assess the fulfillment of the defined goals. It is, however, important to select the metrics that are most critical to measure. Another approach is to use existing data or to collect data automatically, thereby minimizing the disturbance of the development organization. In any case, the list of metrics and their definitions must be adapted to the conditions found in the organization and to the defined goals of the metrics program.

Principle 6 in Action

In the three small companies, the development of the metrics program took its starting point in the description of the originally expected results. The companies decided to measure what they individually considered most critical for them. The practitioners and the researchers developed simple, quantitative, but small-scale metrics programs. The figures were aggregated from existing data, which was easily accessible in the established configuration item libraries and the change request databases. For each company, basically only one metric was defined, to get an indication of whether the new practices and tools allowed all developers to work on all parts of the products, supported the delivery of accepted change requests within the estimated time frame, or had any influence on the time used to handle customer change requests and finalize releases for shipping, respectively. In all three companies data was accumulated with as much rigor and care as possible to secure credibility and precision. When the metrics effort in Danske Data was initiated, the goal was very ambitious. Six fairly complex factors were to be measured (Table 4), and all projects were required to report data from day one. This was an extremely ambitious undertaking, and as of yet not all the factors have been measured, and some have even been officially abandoned. This caused a great deal of frustration among the members of the metrics program. One particular event was the attempt to automatically compute function points (Albrecht, 1979) to measure the productivity of the developers. The project spent two man-months establishing the routines for collecting this information, but after counting function points in several application systems, it was very difficult to see any relationship between the perceived complexity of the systems and the number that the counting procedures had arrived at. It was decided to abandon the automatic function point counting. As a consequence, it became impossible to measure productivity, as there was no longer any measure of the output of the software development process.

6. Communication

Principle 7: Publish objectives and collected data widely.

It is important to communicate the objectives of the metrics program widely, and when metrics have been collected, it is essential to publish the results as broadly as possible and include as many of those affected by the results as possible. However, due consideration must be given to protecting individuals, and the metrics results should not be related to performance evaluations. That would make sub-optimization of data very likely, as everybody would make sure that their numbers look good. Measurements do provide views on sensitive issues. It is therefore necessary that the

results that are made public be based on data that are sufficiently valid and reliable to support fruitful discussions. Metrics will likely yield unpleasant results about the software operation. Being able to cope with such results and use them to improve the organization, rather than just figuring out who is to blame for the bad results, is an important part of the cultivation involved in implementing metrics programs.

Principle 7 in Action

In the three small companies no extra effort was made to publish the metrics data. The publication and distribution of the measurement data took place during the weekly co-ordination meetings in which all developers participated and where both formal and informal matters, among these the development and the actual results of the metrics programs, were openly communicated. This made an additional roll-out of the results in the companies unnecessary. In addition, the metrics data, even the part which documented no direct improvement, was provided to the project sponsor and made publicly available to software engineering practitioners – and academics – through publication in the journal Software Process – Improvement and Practice (Kautz, 1998) and the IEEE Software magazine (Kautz, 1999). The intention of the metrics group in Danske Data was to publish the measurement findings each quarter. However, it took over two years before the first measurement report was published. Management withheld the earlier reports because it was felt that the data validity was insufficient. During this period the metrics group recognized that one very obvious way to improve the program was to get the numbers published, so that those responsible for providing the data could see the use of doing so. However, even though the reports were not made public in an official report, there were still some efforts to communicate the collected data. The metrics group staged a road show to present the results of the measurements to managers and developers at the company's different sites across the country. The road show stirred interest and debate during the presentation sessions, although the results were completely anonymous and were only presented at a high level of abstraction. The road shows were successful enough, however, to convince the metrics group that the best way to improve the metrics program was to communicate widely about the data that was collected. This insight increased the frustration of not being able to make the reports public.

Principle 8: Facilitate debate.

In order to prevent the formation of myths about the data collected and its use, there should be a forum for discussion of the metrics program and its results. This could be in the form of an electronic message board, periodic meetings about the measurements, or putting the metrics program on the agenda of other meetings.

Principle 8 in Action

In all three small companies, the metrics were not only communicated but also discussed at the weekly co-ordination meetings, in individual sessions, or simply directly during normal workdays when the developers applied the new routines. The developers often had concrete questions when trying to accomplish their daily tasks. Their feedback led to a continuous evaluation of the procedures. The experience gained with the procedures informed their refinement, and the routines and the metrics were adjusted accordingly. The

responsibility for keeping the descriptions up to date lay with the project leaders in their capacity as champions for the changes and as trainers. The prompt reaction of the project leaders to the employees' proposals additionally supported a smooth introduction process. In Danske Data the public debate about the metrics program was hampered by the lack of public data. Not being able to talk about the actual results caused the organizational members either not to talk at all or to talk on a hearsay basis. The SPI group did establish an electronic discussion board, but it was hardly used at all. Some discussion did take place with the senior management group when they were given the reports to approve for publication. At these events, the data foundation, the actual results, and the presentation (report layout, etc.) were discussed. However, these discussions only took place in the meetings where the reports were presented. In addition, some project managers who felt that the system for reporting data about the projects and hours spent was inadequate also discussed with the SPI group how this could be improved. These discussions resulted in a description of how to fill in the relevant fields, and later in a project to replace or improve the existing system.

7. Usage

Principle 9: Use the data.

If the metrics results are not used to gain insight and as a basis for corrective actions, the metrics program will soon degenerate into a bureaucratic procedure that merely adds to the overhead of developing software without contributing to the continuous improvement of the software operation. If no consequences are drawn from poor measurement results, the metrics program is not likely to succeed. In drawing consequences, keep in mind that the entities involved in software development cannot be measured with absolute precision. Be aware of a certain imprecision and inaccuracy in the data and do not over-interpret it. However, being able to recognize trends, even in imprecise or incomplete data sets, can be more helpful than having no data at all.

Principle 9 in Action

The three small companies used the data in different ways. In company A the data showed that the chief developer spent fewer hours on development tasks; the time saved was used for business administration – a positive by-product of the improvement project. In the same company, the number of hours the chief developer spent tutoring other developers while they carried out their tasks had not changed. The tutoring hours remained constant, despite improved documentation. Thus, the company's aims were not supported in this case. However, the results were used when planning further work and new projects with respect to the required resources. In company B the time spent merging code after changes had been made was reduced to less than a third, from 90 to 20 minutes on average. A similar reduction was observed for the weekly review meeting of the test runs, from 120 to 30 minutes on average. This allowed for extra testing without spending more time. Finally, in company C the more precise examination and additional coding doubled the average time spent on a request. This figure was not surprising; the company considered it more realistic and could thus better plan and inform the customer about delivery of error corrections. Discussions with customers confirmed that they were more content with the company's service and software as a result of accurate estimates on

error-correction and an increased number of corrected errors. In all three companies the metrics data was thus used to continuously improve planning, performance, and working conditions, which in turn led to higher customer satisfaction. At the time we left Danske Data, the metrics program had not yet come into active use. One of the reasons for this was the lack of ownership of the program; nobody actively requested the data. The metrics group was convinced that they were in a vicious circle: the data reported were of poor quality, since those who reported them did not see any advantage in supplying accurate data in a timely manner. At the same time, the poor data quality caused management to be wary of making the results public. The final publication of a metrics report therefore made the metrics group much more positive about the future of the program.
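To illustrate the point about recognizing trends in imprecise data, the following is a sketch of our own (not taken from the cases, and with invented figures) that smooths a noisy quarterly metric with a simple moving average before judging the overall direction, rather than reacting to individual data points.

```python
# Hypothetical sketch: spotting a trend in a noisy quarterly metric without
# over-interpreting individual data points. All figures are invented.
def moving_average(values, window=3):
    """Simple moving average over the given window size."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Quarterly on-time delivery rates from imprecise, incomplete reporting.
quarterly_rates = [0.45, 0.52, 0.48, 0.61, 0.58, 0.70, 0.77]

smoothed = moving_average(quarterly_rates)
trend = "upward" if smoothed[-1] > smoothed[0] else "flat or downward"
print("Smoothed rates:", [round(v, 2) for v in smoothed])
print("Overall trend:", trend)
```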

8. Discussion

Even though the companies described here were aware of and adhered to many of the principles listed earlier, they still had problems in implementing a successful metrics program. This is especially true for Danske Data. There are several explanations for this. First, implementing a metrics program requires extensive cultural and organizational adaptations to a new way of thinking that is built upon making decisions based on measurable phenomena rather than gut feeling and intuition. Second, as measurements are made while the organization is improving, it becomes difficult to establish a baseline for the improvements. In Danske Data the purpose of the measurements was to show that a three-year SPI effort would yield a 10% increase in productivity. However, during the implementation of the program the organization changed considerably: it grew from about 300 to about 500 software developers and from three to five development sites. In the same period a number of improvement initiatives were launched that would significantly change the software development process as well as the productivity levels. This made it very difficult to establish a productivity level against which the improvements could be evaluated. The 10% increase could therefore not be verified. In the presented example projects, we took all the described principles into account. However, the fundamental factor was that each key element was adjusted to suit the companies. It is not simply a matter of following predefined checklists and guidelines. Debate is ongoing as to how and when to apply metrics, and which metrics to apply for evaluation (Humphrey, 1989; Paulish & Carleton, 1994; Brodman & Johnson, 1995; Fenton & Pfleeger, 1997; Kautz, 1999). The mentioned principles are certainly a good starting point, but they are interrelated in a complex, not generally predictable way, and they depend on the context in which they are applied. Thus their mere application is no guarantee of success. Several authors (e.g. Bjerknes, 1992; Wastell, 1992; Walsham, 1993) have warned against interpreting software technology in organizations in a simplistic way. They advocate strategies that emphasize the complex interrelationships, and this was taken care of in this project by tailoring the existing knowledge to the companies' needs. The application of the principles in our examples was guided by the insight that metrics application is not a context-free technical matter, but a process that takes place in a social environment. Thus, each company developed its own way of defining and using metrics. With the overall organizational structure unchanged, measurements of the work processes were not experienced as a contradiction to creativity and flexibility. On the contrary, these were supported, and a systematic approach to process improvement by using metrics was

acknowledged as motivating and as a vital part of professionalism. Apt metrics were not experienced as personal threats, but as useful help when monitoring the processes for further improvement. It has been difficult to identify the data for measuring improvements in the software development process quantitatively, but simple, existing metrics of real value to the companies proved to be a good starting point. Finally, in order to ensure the continued success of metrics programs after they have initially been implemented, their use should be continuously evaluated and improved. In this way it becomes possible to detect needs for new metrics to be added, for new ways of collecting the data, and so on. This evaluation can be performed in a number of different ways, such as through surveys and interviews. If the data is available in an on-line system such as an intranet, various statistics can easily be produced. This tenth principle, which we have, however, not thoroughly analyzed or applied in our two cases, might be formulated as: Evaluate your metrics program in order to improve it further.

9. References

Albrecht, A. J. (1979). Measuring Application Development Productivity. Proceedings of the IBM Applications Development Joint SHARE/Guide Symposium, Monterey, CA, pp. 83-92.
Basili, V. R. and Weiss, D. M. (1984). A Methodology for Collecting Valid Software Engineering Data. IEEE Transactions on Software Engineering, Vol. 10, No. 6, pp. 728-738.
Basili, V. R. and Rombach, H.-D. (1988). The TAME Project: Towards Improvement-Oriented Software Environments. IEEE Transactions on Software Engineering, Vol. 14, No. 6, pp. 758-773.
Bjerknes, G. (1992). Dialectical Reflections in Information Systems Development. Scandinavian Journal of Information Systems, Vol. 4, pp. 55-78.
Brodman, J. D. and Johnson, D. L. (1995). Return on Investment from Software Improvement as Measured by US Industry. Software Process – Improvement and Practice, Pilot Issue, pp. 35-47.
Carleton, A. D., Park, R. E., Goethert, W. B., Florac, W. A., Bailey, E. K., and Pfleeger, S. L. (1992). Software Measurement for DoD Systems: Recommendations for Initial Core Measures. SEI-92-TR-19, Software Engineering Institute, Pittsburgh, Pennsylvania.
Dekkers, C. A. (1999). The Secrets of Highly Successful Measurement Programs. Cutter IT Journal, Vol. 12, No. 4, pp. 29-35.
Draffel, L. (1994). Professionalism and the Software Business. IEEE Software, July 1994, p. 8.
El Emam, K. et al. (1998). SPICE – The Theory and Practice of Software Process Improvement and Capability Determination. IEEE Computer Society, Los Alamitos, CA, USA.
Fenton, N. E. and Pfleeger, S. L. (1997). Software Metrics – A Rigorous and Practical Approach. PWS Publishing Company.
Grady, R. B. and Caswell, D. (1987). Software Metrics: Establishing a Company-wide Program. Prentice Hall, Englewood Cliffs, NJ, USA.
Humphrey, W. S. (1989). Managing the Software Process. Addison-Wesley, Reading, USA.
Iversen, J. H. and Mathiassen, L. (2000). Lessons from Implementing a Software Metrics Program. Proceedings of the HICSS 33 Conference, Wailea, Maui, Hawaii.
Kautz, K. (1998). Software Process Improvement in Very Small Enterprises: Does It Pay Off? Software Process – Improvement and Practice, Vol. 4, No. 4, pp. 209-226.
Kautz, K. (1999). Making Sense of Measurements for Small Organizations. IEEE Software, Vol. 16, No. 2, pp. 14-20.
Kuvaja, P. et al. (1994). Software Process Assessment & Improvement – The Bootstrap Approach. Blackwell, Oxford, UK.
Paulish, D. J. and Carleton, A. D. (1994). Case Studies of Software Process Improvement Measurement. IEEE Computer, September 1994, pp. 50-57.
Pfleeger, S. L. (1993). Lessons Learned in Building a Corporate Metrics Program. IEEE Software, Vol. 10, No. 3.
Pulford, K. et al. (1996). A Quantitative Approach to Software Measurement – The ami Handbook. Addison-Wesley, Reading, USA.
Rifkin, S. and Cox, C. (1991). Measurement in Practice. Software Engineering Institute, Pittsburgh, PA.
Wastell, D. G. (1992). The Social Dynamics of Systems Development: Conflict, Change, and Organizational Politics. In Easterbrook, S. (ed.), CSCW: Cooperation and Conflict. Springer, London, UK.
Walsham, G. (1993). Interpreting Information Systems in Organizations. Wiley, Chichester, UK.
Weinberg, G. M. (1993). Quality Software Management: First-Order Measurement. Dorset House Publishing, New York, NY.
Aaen, I. et al. (2000). A Conceptual Map of Software Process Improvement (forthcoming).
