Goal-Oriented Software Assessment

David M. Weiss, David Bennett¹, John Y. Payseur, Pat Tendick, Ping Zhang
Avaya Labs Research, Basking Ridge, NJ
[email protected], [email protected], [email protected], [email protected], [email protected]

¹ Currently with David Bennett LLC, 134 Augusta Dr., Lincroft, NJ
ABSTRACT
Companies that engage in multi-site, multi-project software development continually face the problem of how to understand and improve their software development capabilities. We have defined and applied a goal-oriented process that enables such a company to assess the strengths and weaknesses of those capabilities. Our goals are to help a) decrease the time and cost to develop software, b) decrease the time needed to make changes to existing software, c) improve software quality, d) attract and retain a talented engineering staff, and e) facilitate more predictable management of software projects. In response to the variety of product requirements, market needs, and development environments, we selected a goal-oriented process, rather than a criteria-oriented process, to advance our strategy and ensure relevance of the results. We describe the design of the process, discuss results achieved, and present vulnerabilities of the methodology. The process includes both interviews with projects' personnel and analysis of change data. Several common issues have emerged from the assessments across multiple projects, enabling strategic investments in software technology. Teams report satisfaction with the outcome in that they act on the recommendations, ask for additional future assessments, and recommend the process to sibling organizations.
Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management - Cost estimation, Life cycle, Productivity, Software process models (e.g., CMM, ISO, PSP), Software quality assurance (SQA), Software measurement, Time estimation.
General Terms
Management, Measurement, Performance, Design, Experimentation, Standardization, Verification.
Keywords
Assessment, Software Management, Quality Improvement, Process, Goal-oriented Measurement
1. INTRODUCTION

1.1 Software Management Issues
What does it mean to manage the capacity to develop software effectively at a large company? Our company, Avaya Inc., has approximately 3,000 engineering and management associates involved in software development on more than 100 projects around the world, supporting nearly 1 million end customers. We need to understand the key technologies, processes, skills, and tools that support predictable development of software intended for a diverse marketplace. Each market segment targeted by the company has different development requirements and expectations. For emerging applications software, the marketplace is driven by rapid changes in technology and competitive threats that create a need for quick and responsive software development. Development for this marketplace has to be agile, with rapid prototyping and collaborative validation of initial requirements through lead customers. By contrast, to maintain an existing customer base of telecommunications products we must focus on careful attention to defect avoidance and reliable design, driven by the high cost of recalls, design to broad requirements, and customers' dependence on a highly available product. Both types of products are essential for our business survival; we face the challenge of creating a software development environment that meets both needs. To understand the state of software development in our company, we must be able to answer questions such as the following:

1. How is effective software development measured? What is our current performance against those measures?
2. How much are we investing in software? What does it buy us?
3. What are the key technologies that enable rapid development?
4. What approaches lead to reliable customer experience with software?
5. How can we create a work environment that attracts and retains the most talented people?
6. How can we quickly train new employees both in application domains and in standard company software development processes, especially in times when there is a large influx of new employees or a serious loss of experienced employees?

As an example, data from our assessment process, which we describe in the following sections, shows a characteristic learning curve for our projects, constant in shape but varying in scale. Notably, for some projects it takes as long as two years for new developers to become fully productive, whereas for other projects it may take only three months. (See Section 3 for more details on this effect.)

Figure 1: Assessment Process Summary. The process reflects the "Plan-Do-Check-Act" Shewhart cycle.
1. Commitment: a) Commit to assessment; b) Review with Director & managers; c) Clarify goals; d) Customize questions, data approach.
2. Data Gathering & Analysis: a) Conduct interviews; b) Propose alerts & observations, validating with managers prior to final report; c) Find trends in MR data, share with project until explanations for trends understood; d) Agree on tentative areas for work.
3. Feedback: a) Present alerts, observations, data analyses to organization; b) Continue to explore "why" in order to focus improvement projects better.
4. Improvement Implementation: a) Assessment team facilitates identification of experts to advise and target improvement projects; b) Organization acts on targeted areas; c) Plan for future assessment cycle.
1.2 Goals
Avaya is a large telecommunications equipment company with many software-intensive products. Our objective is to create an assessment process that provides useful feedback to individual software development project teams while at the same time identifying opportunities for investment in software technology across the corporation, where the cost/benefit ratio is large. Project-level feedback needs to be useful and actionable in the near term; corporate-level feedback needs to result in investments that pay back across many projects. Avaya employs engineers in locations worldwide, including New Jersey, Colorado, California, Massachusetts, Australia, the United Kingdom, France, Israel, and others. Teams in these locations are often faced with conflicting constraints, such as delivering code at predictable dates, delivering code with high levels of reliability and availability, and altering code in response to rapidly changing customer needs. Products need to meet interoperability requirements with new and existing company products as well as provide backward compatibility to large legacy code bases. In addition to market-focused new features, development teams must address non-functional requirements, such as usability, performance, security, serviceability, installability, and others. Some products need to be introduced quickly to counter niche offerings by competitors. Others comprise telecommunications infrastructure and must be both highly reliable and more functional than competitive offerings. Changes to the technology and to the market for telecommunications equipment occur so quickly that products must evolve to reflect them or become obsolete, possibly between project startup and market introduction. The software code bases range from single-person projects of a few thousand lines of code to multi-location projects with more than four million lines of code. Many company locations represent acquisitions of formerly independent companies. The combination of recent acquisitions, technology changes, and force reductions creates an environment of continual team re-formation. Consequently, software development environments are diverse in size, management style, tools, maturity, application domains, programming languages, configuration control methods, and process methodologies. These factors pose problems for a software assessment process, including:

- How can the process minimize the assessment cost to individual projects?
- How can the process allow for the diversity of software requirements across projects?
- How can the process provide near-term benefits so that conducting assessments continues to receive local support?
- How do we assure that the topics covered by assessments are complete and comprehensive?
- How can improvements be validated so that expenses associated with assessments are justified by improvements made?
- How can improvement proposals be prioritized given the limited availability of resources?
- How can the assessment process be made repeatable and predictable?
- How can the company use assessment results in support of its business strategy?
Over the past 5-10 years, R&D teams within our company addressed these factors by conducting organizational assessments using both the Software Engineering Institute Capability Maturity Model for Software (SEI/CMM) [19] and other assessment methodologies [11], [12], [13]. These methodologies start with a set of issues and practices that are considered by experts and/or industry consensus to be critical to the production of software, and then evaluate the extent to which interview and documentation evidence supports the presence or absence of the required criteria elements. In the case of the SEI/CMM, for example, feedback is provided that evaluates, for each of the 17 Key Process Areas (KPAs), the extent to which the organization meets pre-established criteria in the areas of activities and infrastructure, institutionalization, and goals. The KPAs are grouped into 5 levels, where the items in each level are a prerequisite for achieving the next higher level; e.g., the organization must attain level 2 before it can progress to level 3. In this way, problems at lower achievement levels prevent consideration of achievement at the higher levels. Organizations are scored on a cumulative scale such that if items at the beginning of the progression are not present, no credit is given for items further along the sequence.

Experience using these criteria-oriented assessment procedures was mixed. While many individual teams valued and acted on the feedback they received, problems were evident:

1. The cumulative nature of the SEI/CMM levels created frustration among teams where criteria were satisfied at higher levels of the model, but not at lower levels, preventing acknowledgement of strengths.
2. Management focused on the numerical ratings to the exclusion of implementing business-related improvements, causing some organizations to work toward optimizing assessment scores without achieving meaningful performance improvement.
3. Debate about the pros and cons of the criteria themselves consumed time that could have been spent more beneficially on implementing improvements.

In response to the preceding problems and the wide variety of product requirements, market needs, and development environments, we selected a goal-oriented process rather than a criteria-oriented process. A goal-oriented process, such as the goal/question/metric paradigm ([1], [24]), assesses software in the context of individual, project, and organizational goals rather than rating software practices against external criteria, as the SEI/CMM does. Goal orientation helps to reinforce the feeling within each organization that the recommended improvements will create benefit, and it eliminates much of the resistance to change that comes from long explanation and discussion of external assessment criteria. Programs for establishing goals and measures for software development in a large distributed corporation have been documented in [2], [3], [4], [9], [10], and [22].

As an example, nearly all projects want to decrease the length of time it takes them to produce their next release, known as the development interval. Some projects may have a quantitative goal, such as a 10% decrease, or delivery within two weeks of their planned delivery date. A criteria-oriented assessment is unlikely to tell them whether they will be late or not, or even to provide an estimate of their delivery date, or to suggest improvements they could make to improve their interval. A goal-oriented assessment will try to obtain the data needed to estimate their delivery date
and give them an assessment of whether they will meet their date or not. An effective project manager welcomes such feedback, especially if the feedback is aimed at helping to produce the product rather than grading the product development organization. One result is that project management becomes receptive to introducing new ideas, techniques, tools, and processes because they are introduced with the specific purpose of helping the project to meet its goals.

We call our process the Avaya Software Assessment Process (ASAP). It combines structured interviews with analysis of change data, combining aspects of CMM assessments and quantitative analyses such as those used by the SEL, TAME, and others [2], [3], [9], [10], [11], [12], [13]. The key aspects of ASAP are as follows:

- Derive assessment goals from the goals of the organization undergoing assessment, thereby providing understanding of and feedback on a wide variety of issues ranging from product quality to organizational morale to productivity to customer expectations.
- Focus both on understanding the strengths and weaknesses of individual projects and on understanding common problems and strengths across the entire company, thereby allowing both detailed recommendations for improvement at the project level and recommendations for large-scale investments at the corporate level.
- Use a combination of quantitative and qualitative analysis, thereby providing performance feedback expressed entirely in the context of the organization's own data, moving the organization to see more quickly the strategic value of implementing improvements.
- Perform quantitative analysis that enables validation of immediate findings and provides a baseline for comparison with future performance.
- Conduct structured interviews with a representative sample of project personnel, conducted by experienced software developers, software development managers, and software engineering researchers, which helps achieve complete coverage of issues and permits the assessors to follow particular lines of interest in the interviews.
- Refrain from computing an overall one-dimensional "total score," thereby eliminating resistance to changes based on honest disagreements over the details of assessment criteria.

Although our interviews focus on many of the same areas as criteria-oriented assessments, we have complemented interviews with quantitative analysis, and have adopted goal orientation rather than criteria orientation, to create a novel assessment process.
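As a concrete illustration of the goal orientation just described, the following minimal sketch shows how one of the goals mentioned above (reducing the development interval) might be decomposed in the goal/question/metric style of [1] and [24]. The specific questions and metric names are illustrative assumptions, not the actual ASAP question set.

```python
# Illustrative goal/question/metric decomposition for one project goal.
# The questions and metric names are hypothetical examples, not ASAP's.
goal = {
    "goal": "Reduce the development interval of the next release by 10%",
    "questions": [
        {
            "question": "How long does it take to implement a change (MR)?",
            "metrics": [
                "median days from MR creation to MR close",
                "average days spent in each MR lifecycle phase",
            ],
        },
        {
            "question": "Will the release meet its planned delivery date?",
            "metrics": [
                "open-MR backlog per phase over time",
                "MR arrival rate versus MR closure rate",
            ],
        },
    ],
}

def print_gqm(g):
    """Walk the goal -> questions -> metrics tree and print it."""
    print("Goal:", g["goal"])
    for q in g["questions"]:
        print("  Question:", q["question"])
        for m in q["metrics"]:
            print("    Metric:", m)

if __name__ == "__main__":
    print_gqm(goal)
```

The point of the decomposition is that every metric collected can be traced back to a question the organization actually cares about, which is what distinguishes a goal-oriented assessment from a criteria-oriented one.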
2. How do we structure the assessments?
Our assessment process has the following phases:
1. Commitment
2. Data gathering & analysis
3. Feedback
4. Improvement implementation

These phases reflect the Shewhart "plan-do-check-act" cycle [21], first documented in 1939 by Walter Shewhart and often regarded as a fundamental paradigm for organizational performance improvement planning and implementation. Most assessment processes, whether goal- or criteria-oriented, use some version of these phases. Figure 1 highlights the primary components of ASAP.
2.1 Commitment
The first step is to obtain commitment from the managers of the organization, starting with the senior manager responsible for the R&D team. We cannot and do not impose an assessment on a project; projects participate voluntarily. They do so only when their senior managers believe that the process will benefit them. This improves the commitment level of participants, and it eliminates (or reduces) the number of assessments undertaken in organizations unprepared to follow through with improvement resources. Prior to the start of assessing a specific organization, the assessment team and the organization's entire management team review the assessment process to ensure that expectations are set realistically with regard to the amount of time and resources required, the length of time needed, and the types of observations and recommendations that may result. To set the organization's expectations properly, and to convince them of the benefits of an assessment, we often use the results from assessments of other projects. Managers of the software team are asked to review the question set used for interviewing and to make any additions, deletions, or changes they think appropriate. This allows them to influence the goals of the assessment in a way that best reflects their own goals. The director of the software development team is also interviewed so that the goals of the organization can be made clear by the senior leadership. These goals are used during data gathering and analysis to shape questions and to direct inquiry into areas and issues related to achieving the goals. We rely on the experience and alertness of the interviewers to guide the interview when an issue related to the project's goals surfaces. The result of the commitment phase is a schedule for interviews, a set of development team members to be interviewed, and access to data amenable to quantitative analysis, such as change data maintained as part of a configuration control system.
2.2 Data Gathering & Analysis
Data gathering and analysis has two components: quantitative and qualitative. Quantitative analysis is performed through the application of statistical methods to databases of software change information; qualitative analysis is performed with structured interviews of engineering staff members coupled with analysis of project and process documentation. Additional people, such as marketing staff, are consulted as appropriate, and additional material is analyzed, such as any benchmarking conducted recently or any recent competitive marketing information.
2.2.1 Quantitative Data Gathering & Analysis
Analysis of software change data is important because value is added to (or removed from) software only through changes made to it. These changes are expensive and often risky. Therefore, the extent to which the business value of software is increasing or decreasing can be discerned through an analysis of change information. For additional discussion of this area, see [18]. The quantitative analysis is based on software change information found in each project's version control databases. This change
information is contained in database records called Modification Requests, or MRs. Each MR contains much of the information associated with one software change, such as the reason for the change, the engineers involved, analysis of the problem, identification of the code, a record of unit testing (if any), the history of decisions associated with the change, and other information designated by the project. Raw data is gathered using the MR database query functions. We use a statistical analysis tool, Splus, to support data analysis and visualization. Although the steps of the quantitative analysis can vary depending on the availability of information in the specific project database, we usually perform the following steps:

1. Obtain administrative database access accounts for each product generic of each project. Extract MR data from the appropriate change database and the source code database.
2. Understand the process flow and decision-making assumptions of the team around problem and change management.
3. Perform preliminary data cleaning to avoid double counting and to remove irrelevant MRs, based on feedback from the previous step.
4. For each MR, calculate metrics from raw data that are relevant to the goals and questions of the assessment process. Examples include time, effort, MR extent, developer experience, and the history of the files being changed.
5. Validate findings identified during interviews (e.g., evidence of improvements, process bottlenecks). Alternatively, statistical visualization and modeling tools can be used to display trends and relationships, and to address fundamental issues such as architectural design and effort estimation. See [8] for more discussion of these ideas.
6. Perform any additional analysis suggested by the data found so far, or by the goals of the organization.

Where data availability permits, we often try to analyze over a five or ten year time span. Such long-term analysis may reveal interesting trends, particularly with respect to long-term goals, but is also fraught with opportunities for misinterpreting those trends. Changes in productivity, quality, effort, and other characteristics are affected by numerous factors, such as changes in development processes, development technology, management focus, marketplace shifts, and staff turnover. It is critical to discuss explanations for trends with those familiar with the history of the project, especially if they have lived through that history. We constantly challenge ourselves and others to suggest hypotheses to explain any trends that we see, including the absence of effects that we think should be clear from the data but that are not evident. Avoiding misinterpretation requires understanding of statistical techniques, experience with software development, and knowledge of project history.

In parallel with the data analysis, we have developed a data mart to provide a standardized, flexible platform for future analyses. The data mart is based on the same data sources used for the initial quantitative analysis, but the data is transformed to enable easy reconstruction of previous analyses. Existing data mart development methodologies fit into the goal-oriented framework very well [14], [15].
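To make step 4 concrete, here is a minimal sketch of how per-MR interval metrics could be derived from timestamped phase-transition records. The record layout and field names (mr_id, phase, entered) are assumptions for illustration; the actual analyses used the projects' MR database query functions and Splus.

```python
from collections import defaultdict
from datetime import datetime

def phase_intervals(events):
    """events: one record per phase an MR entered, each a dict with
    mr_id, phase (1..7), and entered (a datetime).
    Returns {mr_id: {phase: days spent in that phase}}."""
    by_mr = defaultdict(list)
    for e in events:
        by_mr[e["mr_id"]].append((e["entered"], e["phase"]))
    intervals = {}
    for mr_id, recs in by_mr.items():
        recs.sort()  # chronological order of phase entries
        per_phase = {}
        for (start, phase), (end, _next_phase) in zip(recs, recs[1:]):
            per_phase[phase] = (end - start).days
        intervals[mr_id] = per_phase
    return intervals

# Tiny made-up example: MR-1 spent 7 days in phase 1 and 14 days in phase 2.
sample = [
    {"mr_id": "MR-1", "phase": 1, "entered": datetime(2001, 3, 1)},
    {"mr_id": "MR-1", "phase": 2, "entered": datetime(2001, 3, 8)},
    {"mr_id": "MR-1", "phase": 3, "entered": datetime(2001, 3, 22)},
]
print(phase_intervals(sample))  # {'MR-1': {1: 7, 2: 14}}
```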
2.2.2 Qualitative Analysis
The first step in the qualitative analysis phase is to interview the senior leadership of the software project team and to discuss the proposed assessment with the engineering managers. Goals of the organization are reviewed at this stage. (We have found that in many cases, simply discussing goals and objectives with engineering managers leads them to begin thinking more sharply about what their goals are and how those goals are reflected in plans. In one example, the initial management discussion led the managers to take a few weeks, prior to the initiation of any interviews, to clarify their project and organizational goals.) Once the management team understands the purpose and the method, we incorporate their feedback into the questions and we jointly select people to interview for the project. To date, there have been remarkably few changes to the specific questions, reflecting initial agreement that the base question set is complete and adequate.

Interview questions are structured into four major areas: Process, Performance, Product, and Organization/People. Appendix 1 contains the complete set of questions used. Examples of questions in each of these areas are:

Process: For each defined phase of development, what are the activities of that phase? How are intersystem interfaces managed? What criteria are used to decide when a new release may be shipped to customers? How are time, cost, and quality estimated and predicted? How is the code organized into work assignments? What are the top 3 risks the project faces?

Performance: What are the goals of the organization? To what extent are you and the organization meeting these goals? How satisfied are customers with the product, including end customers, channel customers, and internal customers? How much time does it take to make a change to the software (time to implement an MR)? For new feature MRs? For bug fix MRs?

Product: What is the market for the product? What type of software is it? Who are the competitors? Is there a documented, well-defined, enforceable software architecture? How is it mapped to business objectives? How many lines of non-comment source code are there in the last few releases, or equivalent customer delivery items?

Organization/People: How are training needs determined? How do new employees learn what is expected? What are the key roles in producing the product? How good is teamwork within the organization? Across organizations? How good is morale in the organization? Why do people leave or join the project or organization? What are people rewarded for?

In order to obtain repeatable results, we select a sample of project personnel to interview according to the following criteria. The sample includes at least one person representing each of the key roles in the software team. These typically include: director, group manager, project manager, developer, build/load engineer, integration tester, system tester, product introduction engineer, and sustaining engineer. Other specific roles that we consider are the customer (or dealer) and the market/product/offer manager. Every supervisory group in the development team is represented in the sample. The sample reflects a mix of experience, from those newly hired to the most experienced. Additional people are considered based on the specific situation. For example, is there a person who owns the architecture? Is there a senior engineer to whom many people look for guidance? Is there a person responsible for handoff quality? Is there a process owner? The sample usually includes about 1/3 of the people on the development team. Each interview takes 1 to 2 hours. We conduct the interviews with only one subject from the software team present and usually no more than 2 assessors, and we do not reveal outside the assessment team what answers were given by any particular interviewee.

Study of additional product, process, and market-related documents augments the interview part of the process. Documents are gathered prior to interviews based on the initial discussions with managers. These include project plans, test plans, design documents, architecture documents, product introduction plans, project meeting notes (including go/no-go gate decision meetings), process documents, and any marketing or benchmarking information available to help in understanding the software environment. As the R&D organization operates under an ISO 9001 certification for management of quality and process, many of these documents are readily available in corporate document database systems. Benchmarking or competitive positioning information is sought from organizations where recent benchmarking studies are available.
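As a rough illustration of the sampling rules described above (cover every key role and every supervisory group, include a mix of experience, and interview about one third of the team), the sketch below selects interviewees from a hypothetical staff list. The field names and the exact selection order are assumptions, not part of ASAP itself.

```python
import math
import random

def select_interviewees(staff, seed=0):
    """staff: list of dicts with name, role, group, and years_experience.
    Returns roughly one third of the team, covering every role and group."""
    rng = random.Random(seed)
    target = math.ceil(len(staff) / 3)
    chosen, roles_seen, groups_seen = [], set(), set()
    # Pass 1: guarantee at least one person per key role and supervisory group,
    # walking from least to most experienced so new hires are represented.
    for person in sorted(staff, key=lambda p: p["years_experience"]):
        if person["role"] not in roles_seen or person["group"] not in groups_seen:
            chosen.append(person)
            roles_seen.add(person["role"])
            groups_seen.add(person["group"])
    # Pass 2: top up to roughly one third of the team at random.
    remaining = [p for p in staff if p not in chosen]
    rng.shuffle(remaining)
    while len(chosen) < target and remaining:
        chosen.append(remaining.pop())
    return chosen
```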
2.3 Feedback to the Project
We report results at two organizational levels. First is a report for the project, which is kept confidential to the project team unless the team authorizes wider distribution; this is followed by a report to the senior management of R&D for the company. The project report is provided first to the managers of the software team and then, after any changes or corrections are made, to the entire team. The report to senior R&D management is provided to the R&D Vice Presidents of each of the major market and technology areas. This report focuses not on specific assessments, but on trends and software technology areas where common action across all teams would be beneficial.
2.4 Improvement Implementation
The improvement implementation phase depends on the areas identified for action. Some examples are presented in the following section on results. Generally, improvement implementation has been most effective where the senior management of the organization believes the area is of strategic importance, and where the assessment team has been able to bring expertise in from outside the organization to facilitate rapid planning of follow-on items.
3. What are some typical results?
A surprising outcome of the assessment process has been the emergence of common themes across project teams. We have conducted six assessments over a period of about eight months, covering teams located across the U.S. and, in one case, in Europe; many of these teams consisted of organizations that had
been recently acquired by the company. They ranged in size from about 20 to about 250 developers.
When we asked development teams about goals, there was a mixture of answers, ranging from “quality is what really matters to customers, and they would rather wait a little for us to get it right” on the one hand, to “we need to get this software into the marketplace more quickly or we won’t have a marketplace” on the other. Often, asking managers and engineers about their goals stimulated constructive thought among the team on what their goals are, and what they should be.
Therefore, we did not expect the degree of commonality in results that we actually encountered. From the company's viewpoint, the emergence of common themes is beneficial since it allows focus on investment in a smaller number of technology areas with some confidence that benefits will be applicable to many projects. Examples of areas where assessments found opportunities for software technology improvement include:

- Methods and tools used by projects for estimation of schedule, resources, and expenses associated with new development are ad hoc and, for some projects, are not producing repeatable and accurate results.
- Better use of automated testing tools would speed up the final phases of a development, and make the re-testing of features more predictable and complete.
- The use of architecture analysis and documentation would improve software team performance. Specifically, it would help to identify a) modules most likely to undergo change in the future, b) modules most likely to be relatively stable in the face of future change, and c) information required by new development team members to understand the code. This information includes key interfaces, information and decisions hidden by each module, and the scheme for partitioning the code so that the location of changes is apparent and changes can be made independently of one another.
- The creation and support by management of opportunities for teams to learn and to share ideas, tools, methods, and approaches would speed up initial project plans and improve the effectiveness of software teams.

At the project level, observations that the assessment team felt could have a negative short-term impact if not dealt with quickly were presented as "alerts." These could be issues such as staffing shortfalls for key positions, test tools that do not provide repeatable results, inadequate disaster recovery planning, or problems in inter-group misunderstanding of handoff criteria. Software technology investment opportunities included items such as test automation, use of better tools for requirements management, use of tools or methods not previously considered for project schedule and size estimation, use of new tools for configuration change control, and introduction of product-line engineering to enhance time-to-market and quality for the sections of code most likely to change (see [23] for a discussion of product-line engineering technology).

Figure 2: Development Cycle Time. Analysis validates substantial improvement in cycle time. (Average number of days in each of phases 1-7 for Versions 6, 8, and 9.)

The following are typical quantitative analyses. First, Figure 2 shows, for each of three projects, the relative amount of time taken by each phase of the problem resolution process for MRs. We have identified seven phases that an MR undergoes, including review, assignment, coding, testing, and closing. Each phase transition is timestamped by the change management system, allowing the calculation of intervals. The data that we used to produce Figure 2 spans many years of history for this code base, which is between 1 and 2 million lines of code, primarily C++, although some other languages are represented. The horizontal scale shows the number of days for an MR to move through a particular work phase, on average for that generic, or release, of software. Each project represents 1 or 2 years of work for a development team of 8-15 individuals. For example, the bottom project, Version 6, can be seen to take a few weeks to move an MR through the developer part of the cycle, while it can take up to 2 years to resolve fully and close problem reports. By contrast, the topmost and most recent project, Version 9, has implemented substantial improvements in the software change and project problem resolution process, reducing the entire interval to just a few weeks. Note that we focus on interval rather than on effort, since interval is of greater importance to the projects. Furthermore, each MR is usually assigned to a single developer, so interval is usually a ceiling on effort (one developer may work on several MRs concurrently, so effort cannot be directly computed from interval). Finally, the data shows that the number of lines changed through each MR remains relatively stable across versions. Accordingly, the observed reduction in days is not caused by a decrease in MR complexity.
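A minimal sketch of the aggregation behind a Figure 2-style chart follows: average days per MR lifecycle phase, grouped by software version. The input format is an assumption for illustration; per-MR phase durations could come from a computation like the one sketched in Section 2.2.1.

```python
from collections import defaultdict
from statistics import mean

def average_days_per_phase(mr_records):
    """mr_records: dicts with version, phase, and days_in_phase.
    Returns {(version, phase): average days}."""
    buckets = defaultdict(list)
    for rec in mr_records:
        buckets[(rec["version"], rec["phase"])].append(rec["days_in_phase"])
    return {key: mean(values) for key, values in buckets.items()}

# Made-up numbers purely to show the shape of the result.
sample = [
    {"version": "Version 6", "phase": 3, "days_in_phase": 40},
    {"version": "Version 6", "phase": 3, "days_in_phase": 60},
    {"version": "Version 9", "phase": 3, "days_in_phase": 5},
]
print(average_days_per_phase(sample))
# {('Version 6', 3): 50, ('Version 9', 3): 5}
```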
Figure 3: Backlog Queue Sizes. Analysis of MR queues reveals software process efficiency. (Number of MRs in queue in each phase, plotted by MR creation date from 1996 to 2001.)
Figure 3 shows, for the history of one code base, the queue sizes, or backlogs, of MRs waiting for action in each MR lifecycle phase. This analysis provides a benchmark against which future cycle time improvements and software environment changes can be judged.
Each curve shows the number of MRs in a particular phase awaiting actions that would move them to the next phase. This result was interesting because it validated individual project-level metrics showing that an improvement effort begun two years earlier had succeeded in stabilizing and controlling the workload represented by MRs waiting in queue for development action. Improvement actions centered on weekly monitoring of customer problem data combined with root cause analysis of individual problem reports.

Figure 4: Developer Learning Curve. After 2 years in one project, but only 3 months in another, developers become more efficient at making software changes. (Median number of open days per MR versus number of months since the developer's first MR, on a log-log scale.)

Figure 4 quantifies the length of time for a beginning engineer to become fully productive (the "learning curve"). The data used to produce the two curves come from two different projects. The upper curve corresponds to a project with more than 8,000 MRs, completed by 125 developers over a 10-year period. The lower curve corresponds to a project that is about 10 times smaller. The upper curve shows the median number of days it takes a developer to make a software change as the developer gains experience. For example, the peak in the top curve illustrates that for developers with 14 months of experience, it takes roughly 7 business days to analyze, code, and submit a change. As a comparison, the lower curve shows the same kind of learning phenomenon for a much smaller software product. Both axes are plotted in log scale to make the two curves visually comparable; the actual difference is much more pronounced when shown in real scale. Note that the shapes of the two curves are remarkably similar. We have come to think of this as the characteristic learning curve for our projects.

For the larger product in the chart, an interesting hypothesis is that after about two years, the amount of time needed to make a change stabilizes, and then efficiency improves steadily across the next six years of experience. When asked during interviews how long it took to become a code "expert," trusted to fix problems efficiently in any part of the code, the answer was 2 years. Additional responses indicated that project engineers who had been working with this code for more than 4 years achieved "super expert" status, confirming that efficiency continues to improve with experience, as revealed quantitatively by the chart. Similar reactions were received from developers who work on the smaller product.

For the larger product, we also asked a related question: is the learning curve effect visible when we plot developer efficiency as a function of the number of software changes made to the code base by the developer? The answer is yes. Our data showed a pronounced "knee" in the curve at 400 MRs, indicating that engineers with experience beyond 400 MRs are progressively more efficient than those with less experience.
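A minimal sketch of the computation behind Figure 4 follows: the median number of days an MR stays open, as a function of the developer's tenure in months since that developer's first MR. The field names are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median

def learning_curve(mrs):
    """mrs: list of dicts with developer, opened (datetime), and days_open.
    Returns {months of tenure: median days to complete an MR}."""
    first_mr = {}  # developer -> date of that developer's first MR
    for mr in sorted(mrs, key=lambda m: m["opened"]):
        first_mr.setdefault(mr["developer"], mr["opened"])
    by_tenure = defaultdict(list)
    for mr in mrs:
        months = (mr["opened"] - first_mr[mr["developer"]]).days // 30
        by_tenure[months].append(mr["days_open"])
    return {months: median(days) for months, days in sorted(by_tenure.items())}
```

Plotting the resulting tenure-versus-median-days pairs on log-log axes, as in Figure 4, makes projects of very different sizes visually comparable.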
4. What do we do with the results?
Once assessment results are available, we recommend actions that the project's management might take. To be effective, these actions require the following:
1. Commitment from the senior organization management prior to the start of the assessment.
2. Validation of observations and recommendations with the managers of the organization, including alignment of problems with organizational goals.
3. Ability to identify and bring in support resources with expertise in the specific problem areas as part of follow-on planning.
4. An on-site team member to facilitate the various meetings, agreements, resources, and logistics required to put improvement projects in place.

In the first step after completing the data analysis phase, the assessment team meets with the managers of the organization to share preliminary observations and to validate the accuracy of the conclusions. During this discussion, the managers have an opportunity to focus subsequent activity on those areas where they believe the biggest payback will occur, consistent with organizational goals. It is at this meeting that priorities are set for the software technology areas that, if improved, will have the most strategic payback, and that agreement is reached regarding which improvement areas are the most strategic to the organization in meeting its goals.

Feedback reports are designed to make planning easier by classifying observations according to planning urgency. For example, items that appear to call for near-term management attention are dubbed "alerts" and are presented first. Next, items are presented summarizing the important aspects of the software organization based on the question set and the quantitative data uncovered, including areas of strength. Where possible, areas of strength are validated with marketing or benchmarking information.

We present opportunities for improvement of the organization's software capability along with corresponding technology that could address the problems, and we give an indication of the extent to which each problem area has a known solution. At one end of the spectrum is a problem that has been solved by another software organization within the company and needs only time and energy for sharing and learning; an example might be an improved approach to planning for product trials. A more difficult problem might require the acquisition, training, and use of an available tool; an example here might be improving the level of automation of system verification testing, where an available tool needs to be selected and then adapted to the local environment before application in the next project. Finally, some problems might require experimentation with techniques that represent current software engineering research issues; an example here might be how to restructure most economically the code and associated documentation so that learning time for new employees is reduced and the time needed to make the most frequent changes is reduced. See [23].

Note that an important characteristic of our assessment process is that feedback reports, at either the project or the corporate level, do not include any comparison of the assessed organizations to absolute criteria: there are no "A"s or "F"s.
5. Success Factors
We find that the single most important factor contributing to the success of the assessment project is the mix and experience of the team members performing the assessment. The assessment team is part of Avaya Labs Research. Team members who are interviewers either have more than 20 years of experience in software development or are experienced in software engineering research. Other team members have expertise in statistical analysis and in data mining. Some members of the research team had previously managed software development and quality improvement projects involving the organizations and people being assessed. This experience level greatly aids in knowing what questions to ask, how to follow up, and how to distill the interview notes into recommendations that further the strategic interests of the organizations. As part of the feedback, we identify "Alerts" representing issues, problems, or observations that the team feels warrant near-term management attention. Identifying the items most critically in need of management attention requires assessment team members who have "worn the other guys' shoes." Managers are much more interested in where their current practices make their product vulnerable to problems than they are in recommendations expressed in the more abstract vocabulary of industry assessment criteria. Similarly, in distilling MR data into hypotheses regarding organizational performance, assessment team experience is called on to form sensible and possible explanations for the data. In almost every case, we need to iterate with the organizations to refine data analyses until all agree on the validity of the analysis. Getting the context right is essential, and experience is needed to understand the context of a complex software environment. An example of the pitfalls of data interpretation is how we came to understand one team's MR flow cycle. The data initially indicated that it was taking about 6 months for MRs to be approved once the MRs had entered the final verification phase. Only after checking this observation against experience and against other data, then reviewing the process several times with project principals, did we realize that the real approval intervals were not being captured in the database we were analyzing, but were kept elsewhere. In responding to the organization with questions and clarifications, the assessment team ensured that conclusions were sound and well understood by the organization.

A second factor for success is the presence of one of the assessment team members at the location where the bulk of the interviews are conducted, and where subject matter experts in MR databases are located. The on-site person is able to act as facilitator for the many logistical details inherent in this process, and also helps to "grease the skids" by explaining the purpose and resource requirements of the process to the many stakeholders. Most of the assessment team is in a location other than the one where the practitioners and clients work. In our case, the assessment team is based in New Jersey while the primary site for assessed projects is in Colorado. Industrial software teams are always working toward tight delivery deadlines, and the team member on-site is able to avoid many types of problems and delays that naturally occur when engineers, who are subject to unpredictable interruptions, are asked to participate in interviews and data analyses.

A third factor contributing to success is to bring expert resources to bear following the interview and data analysis phase to help assessed organizations introduce new technologies and procedures to their normal operations. For example, when an organization agreed that improving the accuracy of schedule and cost estimation was strategic, expertise was offered in two approaches, which were then quickly piloted by the organization. The approaches offered were the COCOMO estimation model [5] and the Delphi estimation method [6], [7], [17], [20]. For COCOMO, a COCOMO software tool was provided along with training to create an initial model. For the Delphi method, a facilitator was brought in who conducted two sessions in a "train the trainer" mode so that the team would be able to conduct future Delphi estimation sessions without the need for outside help. This software organization is now able to apply two additional technologies to future projects that directly address problems identified during the assessment process.

A fourth success enabler is that software change information is kept by each team so that multi-year statistical analyses can be performed. We found that some projects had more usable information than others, depending on how old the product was and how stable their configuration management implementation was. Those projects that had neither reengineered their configuration management toolset nor been moved across locations tended to have the data with the longest time frame. The types of information kept by each team depended on their tools, and on the extent to which they had performed root cause problem analysis in the past using change information.

A fifth success factor is the speed with which we feed back results to the organizations we assess. Typically we feed back preliminary results to project management within three or four weeks of conducting interviews, and final results to the complete project team within another two weeks.
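As an illustration of the estimation technology mentioned under the third success factor above, the sketch below computes a basic COCOMO estimate. The organic-mode coefficients are the published constants from Boehm [5]; the 32 KLOC input is a made-up example, and the actual pilot used a COCOMO tool with its own calibration rather than this simplified formula.

```python
def basic_cocomo(kloc, a=2.4, b=1.05, c=2.5, d=0.38):
    """Basic COCOMO, organic mode: returns (effort in person-months,
    schedule in calendar months) for a project of `kloc` thousand lines."""
    effort = a * kloc ** b
    schedule = c * effort ** d
    return effort, schedule

effort, schedule = basic_cocomo(32)  # hypothetical 32 KLOC increment
print(f"Estimated {effort:.0f} person-months over {schedule:.1f} months")
```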
6. What are the vulnerabilities of the method?
Performing a goal-oriented assessment of our sort depends for success on several pre-conditions. The quantitative analysis requires the availability of MR, or software change, information for each project team over time. In our company, each project keeps MRs with different information and in different formats, creating problems of consistency and reproducibility across projects. To manage this in the future, we have created a data mart to capture and transform change information in a consistent way and with consistent fields; however, to the extent that data has not been kept by the software version control system, analysis is still somewhat limited.

Improvements in the development environment make it difficult to obtain a multi-year history of software changes, as tools are upgraded. This problem cannot be avoided entirely, as the company expects to be implementing improvements (i.e., changes) to development environments and tools in order to improve process reproducibility, reduce rework, and improve development speed. When tools such as version control systems are upgraded, care must be taken to preserve as much data as possible.

Software teams may lack implementation of a practice area regarded as important to one of the criteria-based assessment methodologies. Some of these gaps may not surface in a goal-oriented assessment methodology. To mitigate this risk, an analysis was done comparing our initial question set with the 17 Key Process Areas of the SEI/CMM [19]. The flexibility inherent in our goal-oriented design addresses all of the structural elements of the CMM. The goal-oriented approach keeps the interviews, data analysis, and feedback more directly focused on items that are on the minds of the organizational managers. We also mitigated this risk through the use of assessment staff experienced in software development and criteria-based assessment methodologies.

We have so far performed these assessments on organizations of 75 people or fewer, and therefore cannot state from experience how the process would change with larger project teams. The process for interviews would not be different, and would require only additional interview sessions to be held. The quantitative analysis process is tailored to the goals of each organization, so that while the amount of time needed to understand each project varies according to the goals and complexity of the change management system, providing quantitative analysis for larger projects is not conceptually different from supporting a medium-sized project.

Finally, if this process is conducted within a medium to large corporation, a commitment is needed by engineering leadership to conduct the assessments on a continual basis so that teams benefit from and take credit for improvement activities. While it is helpful to perform a single cycle of assessments, with associated improvement projects, the payback is reinforced when management can see the improvements compound over a longer time frame. It could be difficult for an outside group to duplicate these results, especially if a high level of trust is not developed between the organizations and the assessment team, and if an integrated improvement planning infrastructure is not developed.
7. Summary and Conclusions
We defined and piloted a goal-oriented software assessment process. The design benefited from considerable experience within AT&T/Lucent Bell Laboratories over the past decade in conducting software organizational assessments using well known criteria such as the Software Engineering Institute's Capability Maturity Model for Software. The goal of the Avaya process is to identify and change areas of software technology and organizational practice within the company, leading to quantifiable improvements in speed, cost, and quality. Such improvements are problematic given that acquisitions and market-driven reorganization have created a mosaic of software teams worldwide that vary widely in their approaches to software engineering. Anecdotal feedback from assessed organizations indicates that using a goal-oriented approach results in higher motivation and interest than did criteria-based approaches such as the SEI/CMM. Baseline performance levels have been quantified for speed, engineering learning curves, development effort, and work queue backlogs. In most cases, improvement projects were undertaken to address problem areas from assessment results.
We found that asking questions about organizational goals often creates enabling conversations that help those organizations articulate, set, and deploy goals. Success in replicating these results will depend strongly on factors such as staffing the assessment team with people who have substantial experience and skill in software engineering, software assessments, and statistical analysis; placing a member of the team on-site during the planning and follow-up phases; and facilitating or brokering expertise to support rapid start-up of team improvement projects once assessment results are available.
8. Acknowledgements
The authors thank all those who took part in the assessments and took time away from busy schedules to work with us in the interest of improving software development in Avaya. We especially thank the following individuals who helped to plan, facilitate, organize, or otherwise participate in the process: Al Literati, Tom Tierney, Sue Harkreader, Lucy Sanders, Shannon Hogan, Charles Parker, Larry Haas, Jason Brown, Holly Welty, Roger Pascoe, Darren Turgeon, Lisa Givens, Betty Goebel, David Holland, Debbie Herring, Dave Beauregard, Sheila Higgins, Mark Flores, Pat Klem, Jim Ferenc, and Margaret Carmichael. We also thank Jon Bentley for his careful reading and many helpful suggestions on earlier drafts of this paper, and the anonymous referee who provided helpful comments.
9. References
[1] Basili, V.R., and Weiss, D.M., "A Methodology for Collecting Valid Software Engineering Data," IEEE Transactions on Software Engineering, Vol. 10, No. 3, November 1984, pp. 728-738.
[2] Basili, V.R., and Rombach, H.D., "The TAME Project: Towards Improvement-Oriented Software Environments," IEEE Transactions on Software Engineering, Vol. SE-14, No. 6, June 1988, pp. 758-773.
[3] Basili, V.R., and Green, S., "Software Process Evolution at the SEL," IEEE Software, July 1994, pp. 58-66.
[4] Birk, A., Hamman, D., Pfahl, D., Järvinen, J., Oivo, M., Vierimaa, M., and van Solingen, R., "The Role of GQM in the PROFES Improvement Methodology," CONQUEST 99.
[5] Boehm, B.W., Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, 1981.
[6] Dalkey, N.C., Delphi, P-3704, The Rand Corporation, Santa Monica, CA, 1967.
[7] Dalkey, N., Brown, and Cochran, The Delphi Method, III: Use of Self Ratings to Improve Group Estimates, RAND, RM-6115-PR, 1969. Available at www.rand.org/publications/classics/delphi3.pdf.
[8] Eick, S.G., et al., "Does Code Decay? Assessing the Evidence from Change Management Data," IEEE Transactions on Software Engineering, January 2001.
[9] Grady, R.B., and Caswell, D.L., Software Metrics: Establishing a Company-Wide Program, Prentice-Hall, Englewood Cliffs, NJ, August 1986.
[10] Grady, R.B., and Caswell, D., "Understanding HP's Software Development Processes through Software Metrics," HP Software Productivity Conference Proceedings (April 1984), pp. 3-38 to 3-54.
[11] Jones, C., "Measuring Programming Quality and Productivity," IBM Systems Journal, Vol. 17, No. 1 (1978), pp. 39-63.
[12] Jones, C., Programming Productivity: Issues for the Eighties, IEEE Computer Society Press, 1981.
[13] Jones, C., Software Assessments, Benchmarks, and Best Practices, Addison-Wesley, April 2000.
[14] Kimball, R., The Data Warehouse Toolkit, Wiley, 1996.
[15] Kimball, R., et al., The Data Warehouse Lifecycle Toolkit, Wiley, 1998.
[16] Lientz, B.P., and Swanson, E.B., Software Maintenance Management, Addison-Wesley, Reading, MA, 1980.
[17] Linstone, H.A., and Turoff, M., eds., The Delphi Method: Techniques and Applications, Addison-Wesley, Reading, MA, 1975.
[18] Mockus, A., Eick, S.G., Graves, T.L., and Karr, A.F., "On Measurement and Analysis of Software Changes," Tech. Rep., National Institute of Statistical Sciences, 1999.
[19] Paulk, M.C., Curtis, B., Chrissis, M.B., et al., Capability Maturity Model for Software, Version 1.1, Software Engineering Institute, CMU/SEI-93-TR-25, DTIC Number ADA263432, February 1993.
[20] Turoff, M., and Hiltz, S.R., Computer Based Delphi Processes, http://eies.njit.edu/~turoff/Papers/delphi3.html.
[21] Shewhart, W.A., Statistical Method from the Viewpoint of Quality Control, Graduate School, Department of Agriculture, Washington, 1939; Dover, 1986.
[22] Walston, C.E., and Felix, C.P., "A Method of Programming Measurement and Estimation," IBM Systems Journal, No. 1 (1977), pp. 54-73.
[23] Weiss, D.M., and Lai, C.T.R., Software Product-Line Engineering: A Family-Based Software Development Process, Addison-Wesley, June 1999.
[24] Weiss, D.M., and Basili, V.R., "Evaluating Software Development by Analysis of Changes: Some Data from the Software Engineering Laboratory," IEEE Transactions on Software Engineering, Vol. SE-11, No. 2 (Feb 1985), pp. 157-168.

Appendix 1: Questions used for interviews
Questions about Performance. These questions are designed to give us a picture of how well the project produces software.
1. What are your goals? What are the goals of the organization?
2. To what extent are you and the organization meeting these goals? How well did the last major release meet its goals?
3. How has the quality of the software changed over time, as measured by defects in the code, by field problems, or by other indicators?
4. How satisfied are customers with the product, including end customers, channel customers, and internal customers?
5. How much time does it take to make a change to the software (time to implement an MR)? For new feature MRs? For bug fix MRs?
6. How much effort does it take to make a change to the software (effort to implement an MR)? For new feature MRs? For bug fix MRs?

Questions about Process. These questions are designed to give us a picture of what activities and artifacts are used to produce software.
7. How are customer, channel, and business needs such as new features and cost identified and prioritized during planning?
8. For each defined phase of development, what are the activities of that phase? What inputs are required? What documents or other artifacts are produced as output? How is the output of the phase checked for acceptability? How is adherence to that checking assured?
9. What are the major handoff points (gates) in the product development cycle, where a handoff consists of one organization, such as systems engineering, baselining an artifact and providing it to another organization, such as development, for the next stage of activities?
10. How are intersystem interfaces managed? How are they consistent? What are the integration issues?
11. What criteria are used to decide when a new release may be shipped to customers? Who decides that the project is ready for customer release?
12. How are time, cost, and quality estimated and predicted?
13. Project managers: how do you keep track of the status and progress of the development? Developers: how do you report progress and work status to the project manager? What is the form of status and progress reports, and how often are they issued?
14. What tools do the managers use to manage the development?
15. What environment do the developers use to create code and other project artifacts, such as designs and test cases? In particular, what tools are used to develop the code? To test the code? To control the configuration of the code? To build the load?
16. What activities are used to build, package, and deliver a working system to the customers? What artifacts are used during those activities? How long does it take to build a load? How often are loads built?
17. How is the code organized into work assignments? How are the work assignments mapped into the directory structure used to maintain the source code?
18. What activities are the biggest wastes of time? The biggest bottlenecks? Most prone to rework?
19. What activities are the most effective?
20. What does the project do best? What are the project's strengths?
21. What are the top 3 risks the project faces? What activities were put in place to reduce or mitigate those risks?

Questions about Product. These questions are designed to tell us the characteristics of the code that is included in the product and its dependencies.
22. What is the name of the product? What is the designation of the next release? What prior versions have been released, and how widely deployed are they?
23. What is the market for the product? What type of software is it? Who are the competitors?
24. Is there a documented, well-defined, enforceable software architecture? How is it mapped to business objectives? How is it managed?
25. How much has the software architecture changed over time? How much do you expect it to change in the future?
26. In what language(s) is the code written?
27. How many lines of non-comment source code are there in the last few releases, or equivalent customer delivery items?
28. What platform is used for development?
29. What platform is used for the operational environment?
30. What other Avaya products does this one require for its operation? What other Avaya products require this one for their operation?

Questions about Organization and People. These questions are designed to tell us the characteristics of the organization that produces the product.
31. How are training needs determined? How is time allocated for training and development? How does the organization evaluate the effectiveness of training? What are the future skill needs and gaps?
32. How do new employees learn what is expected? How long does it take for a new person to become fully productive?
33. What are the assignments of groups and individuals on the organization chart? How is this likely to change in the future?
34. Show us on the organization chart which people or groups have responsibility for which components or modules of the software. How was this decided?
35. What are the key roles in producing the product?
36. What are the communication paths among the key roles?
37. How good is teamwork within the organization? Across organizations?
38. How good is the morale within the organization?
39. What is the turnover rate within the organization?
40. Why do people leave or join the project or organization?
41. What are people rewarded for?
42. Is decision making within the project effective?
43. How effective is the management of the project? Do managers know what is going on?