SOFTWARE MEASUREMENT EUROPEAN FORUM

Proceedings of

SMEF 2007 9 - 11 May 2007 Centro Congressi FRENTANI, ROME (ITALY)

EDITOR
Ton Dekkers, Shell Information Technology International B.V., The Netherlands

COVER PHOTO
Ton Dekkers


CONFERENCE OFFICERS
Software Measurement European Forum 2007

Conference Manager
Cristina Ferrarotti, Istituto Internazionale di Ricerca Srl., Italy

Conference Chairperson
Roberto Meli, DPO - Data Processing Organisation, Italy

Program Committee Chairperson
Ton Dekkers, Shell Information Technology International B.V., The Netherlands

Program Committee
Silvia Mara Abrahão, Universidad Politecnica de Valencia, Spain
Prof. Alain Abran, École de Technologie Supérieure / Université du Québec, Canada
Dr. Klaas van den Berg, University of Twente, The Netherlands
Dr. Luigi Buglione, ETS - Université du Québec / Atos Origin, Italy
Manfred Bundschuh, DASMA e.V., Germany
Prof. Gerardo Canfora, University of Sannio, Italy
Prof. Giovanni Cantone, University of Rome Tor Vergata, Italy
Carol Dekkers, Quality Plus Technologies, Inc., U.S.A.
Prof. Dr. Reiner Dumke, University of Magdeburg, Germany
Dr. Christof Ebert, Vector Consulting, Germany
Cao Ji, China Software Process Union, People's Republic of China
Dr. Thomas Fehlmann, Euro Project Office AG, Switzerland
Pekka Forselius, Software Technology Transfer Finland Oy, Finland
Cristine Green, EDS, Denmark
Susanne Hartkopf, Fraunhofer IESE, Germany
Harold van Heeringen, Sogeti Nederland B.V., The Netherlands
Heungshik Kim, Samsung, South Korea
Rob Kusters, Eindhoven University of Technology / Open University, The Netherlands
Miguel Lopez, University of Namur, Belgium
Dr. Nicoletta Lucchetti, SOGEI, Italy
Dr. Jürgen Münch, Fraunhofer IESE, Germany
Sandro Morasca, Università dell'Insubria, Italy
Pam Morris, Total Metrics, Australia
Serge Oligny, Bell Canada, Canada
Dr. Anthony Rollo, Software Measurement Services Ltd., United Kingdom
Luca Santillo, independent consultant, Italy
Habib Sedehi, University of Rome, Italy
Charles Symons, United Kingdom
Frank Vogelezang, Sogeti Nederland B.V., The Netherlands


Software measurement is sometimes mistakenly perceived by the business community of developers as a work overload, a luxury we cannot possibly afford in a strongly competitive business market. On the contrary, beyond being useful for measuring our own personal contribution to production and for making the right decisions on how to improve our own production process, software measurement turns out to be absolutely essential when such a process produces consequences at the corporate level, in other words, when it has to be matched with stakeholders' expectations. In this situation what is really needed is to be able to manage our own software project or service, as well as to be able to objectively report its actual state and its prospective developments to the customers, managers and auditors who are called upon to give their consent or to contribute to the decision-making process.

A critical factor for the successful collection and use of field measurements is the full integration of measurement into the other production processes: in other words, measurement should become indistinguishable and inseparable from production. To achieve this, measurement should be cost-effective, fast, unambiguous and user-friendly. Once information is gathered, it should be fed into and become part of the corporate decision-making and governance process, which should look at information as nourishment for quality decisions. It is therefore essential to ensure outstanding data quality as well as appropriate and accurate processing practices; likewise, derived information (indicators) should be properly distributed and displayed in the most appropriate format for the intended receivers.

International standards, both de jure and de facto, have long defined how raw data should be transformed into information that meets a specific need at a more general, abstract and complex level. The time is ripe for these standards to be put in place by defining the most appropriate working practice for each individual situation. Transforming raw elementary data into business information takes a systemic measurement approach that can be pursued by developing a corporate ICT Measurement System (a solid building block of the organisation), the foundations of which consist of Measurement Programmes (temporary initiatives through which the Measurement System is developed and enhanced).

Within any Measurement System it is important to acknowledge the role played by functional and technical metrics, each with its own application field. IFPUG Function Points and COSMIC Full Function Points are the most prevalent functional measurement methods currently available on the market. They work side by side with size estimation techniques, such as Early & Quick FP, which approximate the actual measurement ahead of time when there is neither the time nor the resources to allocate to a standard evaluation. To complete the picture of the Measurement System tools, mention should be made of basic productivity and benchmarking data sources such as the one made public by ISBSG. They are essential in designing market productivity models, supported by internal and specific business models, allowing software functional size measurement to be combined with the managerial variables (e.g. time, cost, effort and staff) that development, enhancement and supporting activities may require over time.

The more measurements and estimations are automated, the lower the cost of the measurement endeavour, which is key to the success of the Measurement System. The event will be a unique opportunity for all ICT professionals to share their knowledge and experience on software measurement topics and also to meet potential customers and partners from all over the world.


SMEF aspires to remain a leading international event in Europe for exchanging experience and improving knowledge on a subject which is increasingly critical for ICT governance:

• Measuring software products and processes to improve ICT Governance.
• Develop your competences in Software Measurement.
• Learn how to build complex indicators starting from raw elementary data.
• Learn how to set up an ICT Measurement System in your organisation.
• Meet some of the major software measurement experts in the world.
• Explore the state of the art in this evolving discipline.
• Compare your own experience with other participants' experience.
• Enter a worldwide professional network.
• Submit your own ideas to software measurement opinion leaders.

Cristina Ferrarotti
Roberto Meli
Ton Dekkers


TABLE OF CONTENTS
Software Measurement European Forum 2007

DAY ONE - MAY 9

A successful roadmap for building complex ICT indicators
Patrick Hamon, Roberto Meli .......... 1

Cost-Efficient Customisation of Software Cockpits by Reusing Configurable Control Components
Jens Heidrich, Jürgen Münch .......... 19

Performance Measurement and Governance of software projects within the Management System of an ICT company
Stefania Lombardi .......... 33

Adopting CMMI Measurement Program in a Small Research Based Software Organisation – A Case Study
Saqib Rehan, Saira Anwar, Manzil-e-Maqsood .......... 43

A case study of a successful measurement program as a key input into improving the development process
Pam Morris

Benefits from the Software Metric System after 3 years of practice
Guido Moretto .......... 55

Suggestions for Improving Measurement Plans: A BMP application in Spain
Juan Cuadrado-Gallego, Alain Abran, Luigi Buglione .......... 65

Measurement for improving accuracy of estimates: the case study of a small software organisation
Silvie Trudel .......... 79

How to effectively define and measure maintainability
Markus Pizka, Florian Deissenböck .......... 93

Tracking Software Degradation: The SPIP case study
Miguel Lopez, Naji Habra .......... 103

Study to Secure Reliability of Measurement Data through Application of Mathematics Theory
Sang-Pok Ko, Byeong-Kap Choi, Hak-Yong Kim, Yong-Shik Kim

DAY TWO - MAY 10

A Measurement Approach Integrating ISO 15939, CMMI and the ISBSG
Luc Bégnoche, Alain Abran, Luigi Buglione .......... 111

Advancing Functional Size Measurement: which size should we measure?
Charles Symons .......... 131

Changing from FPA to COSMIC: A transition framework
Harold van Heeringen .......... 143

Seizing and sizing SOA applications with COSMIC Function Points
Luca Santillo .......... 155

Approximate size measurement with the COSMIC method: Factors of influence
Frank Vogelezang, Theo Prins .......... 167

Early & Quick Function Points® v3.0: enhancements for a Publicly Available Method
Tommaso Iorio, Roberto Meli, Franco Perna .......... 179

Uncertainty of Software Requirements
Thomas Fehlmann, Luca Santillo .......... 199

A prototype tool to measure the data quality in Web portals
Angelica Caro, Juan Enriquez de Salamanca, Coral Calero, Mario Piattini .......... 209

A framework for semi-automated measurement of a software factory productivity
Andrea Bei, Fabio Rabini, Giovanni Ricciolio .......... 219

Software Maintenance Estimates Starting From Use Cases
Yara Maria Almeida Freire, Arnaldo Dias Belchior .......... 237

Beyond Development: estimating model for Run & Maintain cost
Ton Dekkers .......... 249

Improving Estimations in Agile Projects: Issues and Avenues
Luigi Buglione, Alain Abran .......... 265

Allowing for Task Uncertainties and Dependencies in Agile Release Planning
Kevin Logue, Kevin McDaid, Des Greer .......... 275

DAY THREE - MAY 11

Application Portfolio Management, the basics - How much software do I have
Marcel Rispens, Frank Vogelezang .......... 285

Earned Value Application in Programme and Portfolio Management
Cao Ji, Liu Liming, Wang Ning .......... 299

Function Point Chaos – Making sense of zero Function Point projects
Carol Dekkers .......... 307

ERP Repository: Background of the ISBSG ERP-Questionnaires
Ton Dekkers

Accuracy of Estimation Models with Discrete Parameter Values shown on the Example of COCOMO II
Thomas Harbich, Klaus Alisch .......... 313

Way to excellence in software development using Software Process Improvement Map and systematic project data collection
Pekka Forselius

BONUS

Guidelines for the Service-Development within Service-oriented Architectures
Andreas Schmietendorf, Reiner Dumke .......... 325

Abstracts .......... 335

Author's affiliations .......... 363

A successful roadmap for building complex ICT indicators Patrick Hamon, Roberto Meli

Abstract
The Practical Software Measurement framework (PSM) and the ISO/IEC 15939 standard describe a Measurement Information Model (MIM) based on a hierarchy of related concepts, starting from base measures of entity attributes and ending with information products that fulfil user information needs. In recent years, much progress has been achieved in the standardisation and diffusion of measurement methods for basic attributes of a software product (i.e. size, quality). Many other elementary attributes of the software process, such as effort, staff, duration and cost, have also been clearly defined in terms of measurement methods. It is possible to state that, at the basic attribute level, good practices are already in place and that the measurement results may be considered reliable. Problems arise when we want to build the path from base measures to information products related to specific measurable concepts. Even the definitions of the intermediate derived measures (like productivity, defect density, speed of delivery, etc.) are not shared and stable among different organisations. Moreover, in order to satisfy an information need at the managerial level, it is essential to clearly define the causal network between measurable concepts and the related chains involving base measures, derived measures and information products. This is not easy at all, and the high risk is to build a very nice indicator which is completely defined but not really representative of the phenomenon that it is intended to capture in numbers. This paper presents a roadmap and some hints to avoid traps and difficulties in building such measurement constructs, in order to deploy successful information products tailored to satisfy the information needs of ICT management. A dashboard tool is also used as an example of an accelerator for the indicator building and management processes.

1. A "system approach" to ICT Measurement
The measurement of ICT processes and of their related products and services is often perceived, by people in charge of production, as an unnecessary activity disturbing the primary workflow of system deployment. If it is practised at all, it is often done to fulfil contractual obligations, or sometimes to formally satisfy the prerequisites for a certification of quality or of a maturity level (e.g. the CMMI model). Measurement is rarely perceived as a management opportunity, more often as a threat. It is not easy to overcome this prejudice if measurement remains an optional and naïve task left to the personal initiative of individual analysts or project managers. A formalised ICT Measurement System makes it possible to position this "apparently useless" measurement activity in a context of business governance that, if nothing else, makes it explicit, integrated into the production processes, cost effective and valuable for the management and for the other parties involved.

Measurement may happen at different levels in the ICT organisation: at the project level it gives information about the progress and the state of a specific project; at the portfolio level it gives information about the global usage of organisational resources and the progress and state of the totality of projects; at the process level it gives information about the general behaviour of projects and the statistically derived trends over time.


Very often, the software engineering community uses terms like Metrics Program and Measurement Plan, but what are the differences among a Metrics Program, a Measurement Plan and an ICT Measurement System?

1.1. Metrics Program, Measurement Plan and ICT Measurement System

Metrics Program

A Metrics Program is an initiative to promote the usage of a measurement process in the organisation: it is a project-oriented effort; in other words, it is temporary. A Metrics Program has a specific and obtainable set of goals to be achieved within a limited and predefined budget and time. Resources are not assigned to a specific permanent organisational unit on a repetitive and regular basis, but are dynamically allocated to the Program according to a specific plan. A Metrics Program usually produces deliverables rather than continuous services.

Measurement Plan

A Measurement Plan is a project-related document describing all that is needed to implement the measurement process within a specific ICT project. It lives with the associated project and does not survive it.

ICT Measurement System

An ICT Measurement System [1] is something more than a process: it may be defined as a living operational system dedicated to managing the measurement aspects of ICT processes and products / services. It represents a governance tool both for the contractual relationships between clients and suppliers and for the internal production processes. An ICT Measurement System (Figure 1) is made up of various components: Processes, Organisational Structures, Information Systems, Competences, Motivations, Methods, Techniques, Tools, and Infrastructures. An ICT Measurement System is the only permanent entity capable of supplying measurement services to the projects that need them.

[Figure omitted: diagram of the ICT Measurement System, created and evolved through a start-up project and improvement initiatives; its components (Processes, Organisational Structures, Know-how, Motivations, Information Systems, Methods & Techniques, Tools / Infrastructures) supply measurement services and the Measurement Plan to top and senior management, line management, project and service managers, and technical support staff, exchanging services and products, measurements, evaluations and policies.]

Figure 1: ICT Measurement System

1.2. Resources and Standards for an ICT Measurement System
Several resources are available to guide the implementation of a measurement process. Unfortunately, they do not adequately cover the constitution of the ICT-MS as a company's permanent system, but they are nevertheless precious sources of information for some of the major components of that system. Specifically, there are some de facto and de jure standards that help identify the stakeholders, information requirements, activities, tools and deliverables of a measurement process. Figure 2, extracted from [2], clarifies the relations among the main standards and models.

Figure 2: Standards and Models

The ISO/IEC 15939 standard [3] and the Practical Software Measurement framework (PSM) [4][5][6] describe a Measurement Process and its related Measurement Information Model (MIM). These models are the reference guides for anyone involved in the field. They outline a project-oriented measurement process in terms of information models, activities and results. Other important available resources are: the Capability Maturity Model Integration (CMMI) [7], the "Guidelines to Software Measurement" by IFPUG [8] and the Software Engineering Body of Knowledge (SWEBOK) [9].

1.2.1. ISO/IEC 15939
Figure 3, extracted from the standard documentation, shows the flow of activities and their relations. The ISO approach mixes the activities needed to build up a permanent ICT-MS with the activities needed at the temporary project level. Actual practice suggests keeping them separate in order to be more effective and efficient. The cycle illustrated in Figure 3 could be covered at various levels: the project level and the portfolio level (both temporary), and the functional level and the ICT-MS level (both permanent). Information needs are different at all levels, and feedback loops have different timings.


Figure 3: Measurement activities according to ISO/IEC 15939:2002

1.2.2. Practical Software Measurement (PSM)
In this model, the differentiation between actions at the system level and actions at the project level is clearer, although a permanent system is not explicitly identified and suggested. Chapter 5 of the Practical Software and Systems Measurement guide is devoted to the 'Enterprise and Organisational Context', from which Figure 4 has been extracted. The PSM approach is still very "project" oriented, and the global organisational context is not covered at the same level of detail as the project level.

Figure 4: PSM – Enterprise & Organisational Context

1.2.3. CMMI
CMMI is a set of best practices used to help an organisation succeed. Best practices are grouped by activity into Process Areas such as Requirements, Risk, Configuration, Planning, etc. One of the Process Areas is "Measurement & Analysis", whose purpose is to develop and sustain a measurement capability that is used to support management information needs (Figure 5).


There are two Specific Goals related to measurement:
• Align Measurement and Analysis Activities.
• Provide Measurement Results.
Specific practices SP 1.x describe how to Align Measurement and Analysis Activities; specific practices SP 2.x describe how to Provide Measurement Results.

There are four major components in the measurement system:
• Measurement plan: the complete specification of the indicators.
• Measurement indicators: the information needed.
• Measurement repository: the quantitative knowledge of projects.
• Procedures & tools: data collection procedures and other automated support.

[Figure omitted: the Measurement & Analysis specific practices (SP 1.1 Establish Measurement Objectives, SP 1.2 Specify Measures, SP 1.3 Specify Data Collection Procedures, SP 1.4 Specify Analysis Procedures, SP 2.1 Collect Data, SP 2.2 Analyze Data, SP 2.3 Store Data & Results, SP 2.4 Communicate Results) arranged around the Measurement Plan, Measurement Indicators, Measurement Repository, and Procedures & Tools.]

Figure 5: CMM-I – PA "Measurement & Analysis"

1.2.4. Conclusions
To summarise, it is possible to state that there is a large amount of information and models about how to run a Metrics Program or initiative and about how to measure and analyse specific products and processes, but there is less documented knowledge about building a permanent ICT-MS. Specifically, there is not enough clarity about how to:
• Support a project manager in using measures to manage a project.
• Support a service manager in using measures to manage continuous services.
• Support a line manager in using measures to manage the business units.
• Create and maintain a permanent structure capable of supplying measurement services in an explicit and recognisable way and of evolving to follow the business needs.

1.3. Some ICT-MS components

1.3.1. Organisational Structures
In an ICT Measurement System there are at least three categories of people.
• Project: people who are developing and maintaining the software or system:
  o Provide objective information.
  o Provide subjective information.
  o Attend training.
  o Produce lessons-learned experience.
  o Use provided processes and models.


• Measurement & Analysis Team: people who understand, assess and refine, package: o Analyse experiences o Develop models and relationships o Produce standards and training o Provide feedback • Technical Support: people who maintain the information repository: o Write data collection procedure. o Establish database structure. o QA and feed back data. o Archive data and documents. 1.3.2. Competencies A good ICT-MS needs some high level tasks: • Understand the information needs. • Specify indicators. • Design reporting chain. • Perform indicators. • Automate data collection. • Administrate the repository. • Analyze historical data. • … To be able to manage these tasks, the M&A team should have a lot of skills to build measurement plan, Balanced Score cards, Tableaux de Bord, GQ(I)M (Goal Question (Indicator) Metric), SPC (Statistical Process Control), ETL (Extract Transform Load Techniques), Estimate, benchmark, causal analysis, Function Point Analysis, SLOC (Source Line Of Code), Earned Value Analysis, etc. 1.3.3. Measurement Information Model Either the Practical Software Measurement framework (PSM) and the ISO/IEC 15939 standard describe a Measurement Information Model (MIM) like the following (Figure 6) based on a hierarchy of related concepts starting from entity attribute base measures and ending to information products which fulfil user information needs. In the past recent years, many progresses have been achieved in the standardization and diffusion of measurement methods for basic attributes of a software product (i.e. size, quality). Many other software process elementary attributes have been clearly defined in terms of measurement methods like effort, staff, duration, costs etc. It is possible to state that, at the basic attribute level, good practices are already in place and that the measurement results may be considered reliable. Problems arise in building the path from base measures to information products related to specific measurable concepts. Even the definitions of the intermediate derived measures (like productivity, defect density, speed of delivery) are not shared and stable among different organisations.


Figure 6: Measurement Information Model

1.3.4. Information System architecture
Because there are many stakeholders in the process, the tools should be modular.

[Figure omitted: layered architecture with publication services on top, an indicators library, the M&A repository, and data collection services drawing on Excel files, XML, CSV, databases and tools, used by direction, project, quality & SEPG and integrator roles.]

Figure 7: Information System architecture



If we have a look at Figure 7, starting from the bottom, we can identify:
• Data Collection Services: in any company there are many sources that can provide data: Excel files (a lot of them!), sometimes XML or text files (CSV), usually many databases (SAP, ERP, …), and many tools (configuration management, change management, planning, requirements management, test, ...). It is very important that the M&A solution provides services that make it easy to build a new data collector and to reuse it at any time. It should be possible to adapt it to a new situation. The data collector should automate the collection of data, as frequently as you want, and wherever the sources are (a sketch of such a collector interface follows this list).
• M&A Repository: this is the knowledge database. It holds the histories of projects, which help to predict the future with accuracy. In general this is an SQL database.
• Indicators Library: once an indicator is specified, it must be possible to reuse it on another project. The other project could also reuse an extractor. The whole M&A system must be designed with a reuse approach. In large companies with many projects and a good level of process maturity, it is important to be able to build a project dashboard in a few seconds. In the library, it should also be possible to find templates of indicators, such as curve profiles.
• Publication Services: the purpose of this component is to provide the information products to authorised people, with security management. The dashboard must be accessible through the web, wherever the user is, and always up to date.
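As an illustration of the modularity argued for above, the following sketch uses hypothetical interfaces (they are not the API of Specula, Data Drill or any other M&A product) to show how data collectors for different sources could be made reusable behind a common interface feeding a single repository:

    # Hypothetical plug-in interface for reusable data collectors (illustrative only).
    import csv
    from abc import ABC, abstractmethod
    from typing import Dict, List

    class DataCollector(ABC):
        @abstractmethod
        def collect(self) -> List[Dict[str, str]]:
            """Return measurement records as a list of field/value dictionaries."""

    class CsvEffortCollector(DataCollector):
        """Reads effort bookings exported as CSV, e.g. from a time-tracking tool."""
        def __init__(self, path: str):
            self.path = path
        def collect(self) -> List[Dict[str, str]]:
            with open(self.path, newline="") as f:
                return list(csv.DictReader(f))

    class Repository:
        """Minimal stand-in for the M&A repository (the 'knowledge database')."""
        def __init__(self):
            self.records: List[Dict[str, str]] = []
        def store(self, collector: DataCollector) -> None:
            self.records.extend(collector.collect())

    # repo = Repository()
    # repo.store(CsvEffortCollector("effort_export.csv"))  # hypothetical file name

The point is the separation of concerns: a new data source only requires a new collector class, while the repository and the indicators built on top of it remain unchanged.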

2. How to build complex ICT indicators

2.1. Focusing Measurement on the Information Needs of Managers
If we go back to Figure 3, we see that "Information Needs" drive the planning, performance and evaluation activities within the measurement process. "Information Needs" are the requirements of the essential measurement activities in the "Core Measurement Process". Once information needs are defined, a "Measurement Plan" is developed by decomposing them into "Analysis Results and Performance Measures" (information products), which contain measures and associated guidance. These information products are delivered to managers and drive "Improvement Actions". By placing information needs outside the scope of the Core Measurement Process itself, the measurement process can be used to support a wide range of management and executive functions. While this makes measurement a flexible process, it also requires that information needs be clearly reviewed, prioritised and documented before initiating or expanding a measurement process. Because identifying information needs is a critical step in ensuring measurement process success, organisations must take the time to correctly select and define their management information needs.

This section presents common types of information needs. These common types are found in both commercial and government organisations involved in systems and software engineering. Further, many of them address the information needs of key practices from the CMMI. These common types should encourage you to extract potential measurement process requirements from your organisation's management, system or software, and support processes [10].

2.1.1. Identify Information Needs for Current Management Practices
Organisations often develop a culture that encompasses technical management and engineering functions. This culture develops as process, training, infrastructure and habit evolve. For example, in a systems acquisition program, a monthly meeting may be held to review the supplier's progress and address potential risks. This meeting is a result of the culture and process within the organisation, which has found that a periodic review of progress leads to a greater probability of on-time delivery. From a measurement perspective, the periodic meetings represent a set of candidate information needs that, when satisfied, make the culture more effective and efficient. The information needed at the periodic meetings becomes a formal "information need" of the measurement process.

One of the primary barriers to measurement adoption is the misalignment between how managers work and what information the measurement process provides. By examining how managers really work, and using that to drive the measurement process, you are more likely to achieve greater measurement process adoption and success.

2.1.2. Identify Measurements of "Requirements"
Without exception, systems and software managers must measure the requirements engineering activities of the life cycle. Measuring requirements engineering involves quantifying the progression of software requirements from concept to formulation to design to test. Assessing these requirements ensures that your product contains all required functionality. Typically, program plans and projections are based on estimates of the software requirements, which are used as the basis for software size estimates. Because estimating requirements plays such a large part in developing the initial program plan, it is imperative to monitor that requirements are proceeding as expected. Consider a scenario where you are developing 25% more requirements than you planned: every life cycle activity may then be over schedule and budget by 25% or more. It is advisable to measure the number of requirements that each software process generates or accepts. Measuring the number of system or top-level software requirements (i.e. "features" or "capabilities"), as well as the decomposition of system requirements into more detailed requirements, helps you to keep tabs on the scope of your program.

According to ISO/IEC 14143-1:1998 [11], user requirements may be decomposed into three types: functional, quality and technical requirements. Although it is possible and useful to tag every functional requirement with an identifier and to count them, we must admit that requirements are not all of the same "size". A requirement like "the system should manage all kinds of enquiry from the customers" is much bigger than "the user should be able to add a new article to the basket". In addition to measuring the number of functional requirements, then, it is very useful to capture a "weighted" measure of the same items. This is exactly what a functional measurement method (like IFPUG Function Points or COSMIC Full Function Points) is intended to do: functional size is a "normalised" measure of functional requirements. Every functional requirement is evaluated in terms of the number and weights of the Base Functional Components (BFC) which it contains. Monitoring each functional requirement with its size is a better way to control the software process. At project start or in the initial phases of a life cycle it is often not possible to measure BFCs, because they are too detailed to be evident at that time, so it is important to be able to estimate functional size using an alternative method. Early & Quick Function Points [12] is a highly performing estimation method available for that purpose and is particularly tailored to estimating requirements at different levels of "granularity".
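The following sketch illustrates why a weighted measure says more than a raw requirement count. The component types and weights are invented for the example and are not the official IFPUG or COSMIC counting rules:

    # Illustrative only: invented component types and weights, not IFPUG/COSMIC rules.
    ILLUSTRATIVE_WEIGHTS = {"input": 4, "output": 5, "query": 4, "internal_file": 10}

    def weighted_size(requirement_bfcs: dict) -> int:
        """Weight the Base Functional Components attributed to one requirement."""
        return sum(ILLUSTRATIVE_WEIGHTS[kind] * count for kind, count in requirement_bfcs.items())

    # Two requirements that both count as "1 requirement" but differ widely in size:
    add_article_to_basket = {"input": 1, "internal_file": 1}
    manage_all_customer_enquiries = {"input": 5, "output": 4, "query": 6, "internal_file": 3}

    print(weighted_size(add_article_to_basket))          # 14
    print(weighted_size(manage_all_customer_enquiries))  # 94

A plain count would treat both requirements as equal; the weighted size makes the difference in scope visible.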
One of the most common issues detected by requirements measures is “requirements creep”: the tendency to keep adding requirements to a program without considering how many additional resources or risks those new requirements represent.


In order to track differences between developed and planned requirements, it is also necessary to measure the status of each requirement as it moves through life cycle activities. A typical requirement status could be: defined, approved, allocated, designed, implemented, tested and verified. For example, in the CMMI Requirements Management process area, one of the typical work products identified for sub-process 1.3 "Manage Requirements Changes" is "requirements status". A measure that shows the status of all requirements is essential in monitoring program status and acts as a scorecard to illustrate that requirements are being implemented. Early in the program schedule, ensure that requirements become defined, approved and allocated as the system architecture is finalised. Near the end of the program schedule, you should see requirements move from implemented status to tested and then verified status. While valuable in detecting "requirements volatility", this measure also supports monitoring effort, configuration management and quality.

2.1.3. Identify Risks That Impacted Previous Programs
In most organisations, as well as in the experience of many managers, there is a history of project lessons that should not be repeated! This includes reasons why projects were never completed, or why projects were delivered late, over budget and without needed functionality. Historical risks in similar and recent software programs are prime candidates for measurement in the next (now current) program. With historical risks, focus on identifying the cause of the risk rather than the symptom or the response. For example, for a project where the software was delivered late, try to remember and uncover the specific software components that led to the lateness. Perhaps the reason for the program delay was that a key piece of COTS software was not available, or that the integration took longer than expected. Consider also that software managers are often aware of software problems, but only react when a schedule delay is required. In our own software development, we have been "burned" by technical and schedule problems related to our COTS vendors, and we have essentially let their problems impact our projects. We now take an "act early" approach, where our product manager immediately considers technical alternatives once a problem is identified. In your environment, consider whether you want to measure to detect technical problems for monitoring, or to take management action on known ones. In some cases, your measurement program will need to do both.

2.1.4. Identify Risks for the Current Program
During program planning, you may establish a risk management plan. For each risk, you typically develop risk identification, mitigation, impact and probability. A measurement process can support a risk management plan by identifying risks that need to be mitigated, and by quantifying the effect of mitigation activities. From a measurement perspective, risks can become information needs that drive the measurement process. The measurement process can, and should, address these needs. Since there are costs associated with measurement (as well as risk management), you may want to select a subset of risks to measure. You could use probability and estimated mitigation cost (or impact) as discriminators in selecting risks to measure. For example, you may choose to measure only high and medium probability risks where the associated mitigation cost is greater than $100K.
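A minimal sketch of the selection rule just mentioned; the field names and figures are assumptions for the example:

    # Sketch of the risk-selection rule described above (invented risk register).
    risks = [
        {"id": "R1", "probability": "high",   "mitigation_cost_usd": 250_000},
        {"id": "R2", "probability": "low",    "mitigation_cost_usd": 500_000},
        {"id": "R3", "probability": "medium", "mitigation_cost_usd": 80_000},
        {"id": "R4", "probability": "medium", "mitigation_cost_usd": 120_000},
    ]

    def worth_measuring(risk: dict, cost_threshold: int = 100_000) -> bool:
        """High or medium probability AND mitigation cost above the threshold."""
        return (risk["probability"] in ("high", "medium")
                and risk["mitigation_cost_usd"] > cost_threshold)

    selected = [r["id"] for r in risks if worth_measuring(r)]
    print(selected)  # ['R1', 'R4'] -- only these risks receive measurement resources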


A common risk within our organisation is that software developers will not spend as much time as planned on a given product baseline. In the past, we found that senior developers were temporarily "borrowed" for other product development or support activities, for example, to investigate the cause of a field report from a customer. To address this risk, we developed an information need related to ensuring that resource expenditures correspond to our business priorities, i.e. first things first.

2.1.5. Measure What You Are Trying To Improve
It is the authors' experience to frequently witness organisations that implement large-scale, institutional change without implementing the corresponding means for managing the resulting process. Tom DeMarco coined a phrase commonly heard in the world of software management: "If you don't measure it, you can't manage it." [Demarco 1982] So, before you take a small step to improve an isolated software task, or a large step to improve an entire process, make sure that you ask yourself how you will demonstrate actual process improvement. In addition to the desire to better manage your new or changed software process(es), be aware that there is an even more important reason to measure the processes that you are attempting to improve: to develop a quantitative understanding of why your software process behaves as it does. Developing a quantitative process understanding of this type requires being able to mathematically describe the primary process factors. Once this mathematical relationship is established, the next step is to monitor and control the effects of these process factors. Furthermore, your estimates will become more accurate as a result of this better understanding. Many measurement practitioners confuse a general quantitative understanding of their process with the quantitative management capability included in CMM® Level 4. In practice, though, organisations develop quantitative models for activities ranging from requirements engineering, inspections, defect detection and removal, to system testing and software release activities, and few of them are rated at Level 4. The point here is that by developing an understanding of your processes through measurement, you will be in a better position to estimate, control and manage them, and less likely to rely on subjective guessing. (In other words, don't wait until you are attempting a CMM Level 4 assessment to start measuring: start now and you'll be that much ahead of the game.)

2.1.6. Identify Software Quality Measures
Some years ago, a former market-leading technology company decided to counter its market slump by hiring a technology vice president. At the first meeting of his direct reports, he walked around the table, put an airsickness bag in front of each one and said, "Your schedules make me sick". He went on to say that schedules without quality don't mean anything. In essence, it doesn't matter how well you stick to the schedule if the system or software product is unusable by the customer. This VP knew what the market demands; it is unfortunate that more companies do not take quality seriously. If they did, they would focus on building a quality product rather than racing to get a substandard product to market quickly. System or software quality is more than measuring the quality of the end product. End product quality is the result of the systems and software quality activities employed during development. If you ignore the quality aspects of systems and software development, it is anybody's guess what the quality of the end product will be.


One technique for addressing software quality is the use of "quality gates". This involves establishing reasonable, and measurable, thresholds at several points during development, and then ensuring that the software or work products meet them before continuing. A quality gate could be: all requirements approved, all unit tests passed, all code inspected, all requirements (or a subset) tested.

2.1.7. Identify Assumptions Used in Preparing the Project Plan
A typical systems or software development program plan includes a number of assumptions about progress, quality and resources. Assumptions made during program planning are excellent information needs for the measurement process. If an assumption is not realised, then many of the resulting schedule and resource plans may need to be examined, or re-planned. For example, when performing an estimation using the Constructive Cost Model (COCOMO), the estimated number of lines of code (or another software sizing measure such as function points) is a primary driver in establishing the amount of effort. If your project exceeds the estimated number of lines of code during development, this may indicate that more effort is needed during the "downstream" software activities, such as unit testing, system testing or integration.

2.1.8. Identify Resources Consumed and Products Produced To Understand Process Performance
When initiating a measurement process, your organisation will typically not have historical process data. In such cases, one of your goals should be to understand the behaviour of the processes in the systems or software life cycle. Consider that each process in the life cycle consumes resources and produces a product (either an internal or a customer product). You should establish basic measurements to determine how many resources are being consumed and how much product is being produced. You might consider doing this for the processes that consume the majority of your budget or schedule.

2.1.9. Identify the Information Needed to Satisfy Organisational Policy
In many system or software shops, managers are required to use specific techniques in monitoring and controlling their programs. In large defense programs, for example, an earned value management system is required. When managers are required to use specific management techniques, the organisation should provide the data that managers need to effectively apply the technique. The measurement process is the method that the organisation uses to deliver the information that managers need to use the technique. For example, many defense organisations must use an earned value management system. In support of this, the measurement process should deliver the required cost and schedule status information to the program in the form of the cost performance index, schedule performance index, to-complete performance index, earned value (and its components), and variance at completion. Without a measurement process to ensure that the earned value data is collected, analysed and delivered in a timely fashion, even a very useful technique such as earned value can be inconsistently or (often) incorrectly applied. Program or site policy and standards documentation provide many information needs for the measurement process. While setting up a measurement process, analyse the systems and software management standards and policy to see what management techniques have been, or are being, mandated. Then, extract the information needs from these mandated standards and address them during measurement process implementation.
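The earned value indicators listed above reduce to a handful of standard formulas; the sketch below shows them with invented sample figures:

    # Standard earned value formulas; the sample figures are invented.
    BAC = 1_000_000.0   # budget at completion
    PV  = 400_000.0     # planned value of work scheduled to date
    EV  = 350_000.0     # earned value of work actually completed
    AC  = 420_000.0     # actual cost of the work completed

    CPI  = EV / AC                    # cost performance index      ~0.83
    SPI  = EV / PV                    # schedule performance index   0.875
    EAC  = BAC / CPI                  # estimate at completion       1,200,000
    VAC  = BAC - EAC                  # variance at completion       -200,000
    TCPI = (BAC - EV) / (BAC - AC)    # to-complete performance index ~1.12

    print(f"CPI={CPI:.2f} SPI={SPI:.2f} VAC={VAC:,.0f} TCPI={TCPI:.2f}")

The measurement process's job is to make sure that PV, EV and AC arrive consistently and on time, so that these indices can be trusted when a manager acts on them.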


2.1.10. Conclusion
Identifying information needs is the first step in establishing an effective measurement process. The techniques above provide tools for extracting the real information needs within your organisation. Once all information needs are identified, you can assign a relative priority to them, in case you need to balance the information needs against the resources available to the measurement process. The measurement process will refine those information needs into appropriate measurement activities, specific measures and information products. Over time, you should review the effectiveness of the information products, and of the individual information needs, as your organisation adopts new technology and processes. This approach to identifying information needs ensures that the measurement information you, and other managers, receive is effective in helping you monitor and control your programs. By focusing measurement on true information needs, managers are better armed to monitor and control their programs, and to assess the likelihood of an on-time and on-budget completion. In addition, by saving managers time in gathering and analysing the information they need to manage, measurement lets them spend more time on their real role: decision-making.

2.2. Useful hints and techniques (to do's - not to do's)
There are many good hints for being successful in deploying a Measurement & Analysis solution. Here are some important ones:
• Use GQ(I)M and causal networks to understand management expectations (see the sketch after this list).
• Measures must be used for decision making.
• Measures must be associated with business goals.
• Evaluate the cost of automated data collection versus non-automated collection.
• Train people on the tool AND on the indicators.
• Start small: six indicators are enough.
• Be simple. Avoid complexity as much as possible.
• After the creation of a Measurement System, managers will have to wait several months before having enough data: manage this.
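As a hedged illustration of the first hint, the following sketch shows one possible way to record a GQ(I)M derivation that links a business goal to a question, an indicator and its base measures; the goal, names and threshold are invented for the example:

    # Illustrative GQ(I)M record (invented content, not a prescribed notation).
    from datetime import date

    gqm_entry = {
        "goal": "Improve delivery reliability of enhancement projects",
        "question": "Are projects delivered within the committed schedule?",
        "indicator": "Schedule deviation per project (actual vs planned end date)",
        "base_measures": ["planned_start_date", "planned_end_date", "actual_end_date"],
        "decision_use": "Escalate when deviation exceeds 10% of planned duration",
    }

    def schedule_deviation(planned_start: date, planned_end: date, actual_end: date) -> float:
        """Deviation of the actual end date as a fraction of the planned duration."""
        planned_days = (planned_end - planned_start).days
        return (actual_end - planned_end).days / planned_days

    print(schedule_deviation(date(2007, 1, 8), date(2007, 5, 11), date(2007, 6, 1)))  # ~0.17

Writing the decision use down next to the indicator is what keeps the measure tied to the management expectation it was derived from.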

2.3. Some traps in building ICT complex dashboards

2.3.1. Common errors
There are many traps in building a Measurement & Analysis system. Below is a list of the most common errors met.
• The major one is to start without having the competencies. The M&A domain seems very simple to a non-expert: "everyone is able to build a graph with Excel, so I will succeed in providing graphs for my organisation". This is the first main cause of failure.
• Another big mistake is to forget that an efficient M&A system is agile. Most companies will spend a lot of time specifying indicators and dashboards without worrying about the ability of the solution to be agile. Indicators change very often, and people would like to have their new indicators last week! The danger is to end up with a very nice dashboard that cannot evolve. In a typical dashboard construction, building the indicators (which is more than 80% of the need) will usually take 20% of the resources, whereas the "look" (less than 20% of the need) will take 80% of the resources.
• Another mistake is to build only financial indicators, answering the needs of the general manager. Operational people cannot manage an activity such as defect detection when they are looking at a financial indicator. So each level of responsibility should have its own dashboard and indicators.

2.3.2. Other traps
• Not understanding the "real" causal dependencies among events, measures and indicators: using the right indicator for the wrong phenomenon.
• Using a bad graphical representation.
• Not understanding the "real" information need.
• Defining a set of "unobtainable" measures.
• Supporting "pure analysis" instead of "performance monitoring", or "theoretical research" instead of "management".
• Graphs should be selected to say something important for success (objectives, issues, risks, uncertainty, …).
• The indicator's update frequency must match the reactivity needed. If you need the data once a week in order to react in time, the indicator should not be updated every month.
• Indicators must be well defined so that everyone can understand them.
• Mind the indicators that can have ambiguous behaviour. For example, if you use a ratio, the result grows if the numerator grows or the denominator decreases: two causes for one result, so a bad interpretation could occur (see the short numeric example after this list).
• Avoid the encyclopaedia dashboard, with too many indicators. A good dashboard has up to 20 indicators, no more.
• Remember that financial indicators are far from the field. Once a project is over budget, it is too late. You ought to follow the number of tests, or requirements, instead.
• Do not let a summer intern build your decision tool. It is not a problem of technology, it is a problem of how to manage.
• Avoid home-made, complex solutions. Everyone would like to build their own M&A system with Excel, Access, HTML, etc., but in a large organisation you will lose a lot of time and money building something you could buy. And once the developer leaves, you will have trouble maintaining your system.
• Start small. The first M&A project should have a small number of indicators and a small perimeter. Then enlarge the scope.
• Give people the freedom to build their own indicators. The organisation should propose "official" indicators, but should also let people manage their own information needs, for example, to handle a specific constraint on a project. If middle management does not have this possibility, they will be frustrated and will probably continue to use Excel without capitalising the knowledge.
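A short numeric illustration of the ambiguous-ratio trap (invented figures):

    # The same increase in a defect-density ratio can have two different causes.
    baseline     = 30 / 300   # 0.10 defects/FP
    more_defects = 45 / 300   # 0.15 -- quality actually got worse
    less_size    = 30 / 200   # 0.15 -- same defects, but less functionality delivered
    print(baseline, more_defects, less_size)

Without looking at the numerator and denominator separately, the two situations are indistinguishable on the dashboard.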

3. Conclusions
Information needs are usually driven by business governance goals and are rarely coincident with base measures. The path from elementary attributes to complex information products is difficult and full of traps. In order to be successful, it is important to give the measurement & analysis process a "house" in organisational terms. An ICT Measurement System is a good place for it to live, since it is explicitly recognised, not temporary, well defined and supported by the management.


The present paper pointed out a list of suggestions about how to approach the construction of pertinent information products based on reliable causal networks and on agreed rules and base measures. This remains a challenging task for which the most valuable resource is the professionalism of people, supported by the right technology.

4. References
[1] GUFPI-ISMA, "Metriche del software - Esperienze e ricerche", FrancoAngeli, 2006.
[2] IFPUG, "IT Measurement: Practical Advice from the Experts", Addison-Wesley, 2002, ISBN 020174158X.
[3] ISO/IEC 15939:2002(E), "Software engineering - Software measurement process", ISO/IEC, 2002.
[4] McGarry J. et al., "Practical Software Measurement - Objective Information for Decision Makers", Addison-Wesley, 2004, ISBN 0201715163.
[5] "Practical Software and Systems Measurement Guidebook", version 4.0b, Department of Defense and U.S. Army, October 2000.
[6] Practical Software and Systems Measurement, http://www.psmsc.com/
[7] Carnegie Mellon Software Engineering Institute (SEI), http://www.sei.cmu.edu/cmmi/, http://www.sei.cmu.edu/cmmi/models/ss-cont-v1.1.doc, http://www.sei.cmu.edu/cmmi/models/ss-staged-v1.1.doc
[8] IFPUG, "Guidelines to Software Measurement", 2004.
[9] SWEBOK project, http://www.swebok.org
[10] Hamon Patrick, Baxter Peter, "Focusing Measurement on the Information Needs of Managers", SEPG 2007.
[11] ISO/IEC 14143-1:1998(E), "Information technology - Software measurement - Functional size measurement - Part 1: Definition of concept", ISO/IEC, 1998.
[12] Iorio Tommaso, Meli Roberto, Perna Franco, "Early & Quick Function Points® v3.0: enhancements for a Publicly Available Method", SMEF 2007, Rome (IT), 2007.
[13] Caputo Kim, "CMM® Implementation Guide", Addison-Wesley, 1998.
[14] Zachary G. Pascal, "Showstopper!", The Free Press/Macmillan, 1994, pp. 243-255.


5. Annex: Some dashboard graphical examples
(All screenshots extracted from the Data Drill product. More info at: http://www.spirula.fr/index.php?lang=uk)

[Dashboard screenshot examples omitted: management and performance indicator views.]

Cost-Efficient Customisation of Software Cockpits by Reusing Configurable Control Components Jens Heidrich, Jürgen Münch

Abstract
Detecting and reacting to critical project states in order to achieve planned goals is one key issue in performing a software development project successfully and staying competitive as a software development company. Software Cockpits, also known as Software Project Control Centres, support the management and controlling of software and system development projects and provide a single point of project control. One key element of such cockpits is the analysis, interpretation, and visualisation of measurement data in order to support various stakeholders for different purposes. Currently, many companies develop their own cockpits (typically simple dashboards) that are mainly based on spreadsheet applications. Alternatively, they use dashboard solutions that provide a fixed set of predefined functions for project control that cannot be sufficiently customised to their own development environment. Most of the more generic approaches for control centres offer only partial solutions and lack purpose- and role-oriented data interpretation and visualisation based on a flexible set of control techniques that can be customised according to planned goals in a cost-efficient way. A systematic approach for defining reusable, customisable control components and instantiating them according to different organisational goals and characteristics is basically missing. This article gives an overview of the Specula project control framework and illustrates how to employ it for implementing project control mechanisms. The focus is on how to describe and combine control components in a generic way and how to select the right components based on explicitly defined project goals. Furthermore, related approaches are discussed and the use of Specula as part of industrial case studies is described.

1. Introduction
One means for obtaining intellectual control over development processes and determining the performance of processes and the quality of products is to institutionalise measurement on the basis of explicit models. Companies have started to introduce so-called software cockpits, also known as Software Project Control Centres (SPCC) [16] or Project Management Offices (PMO) [17], for systematic quality assurance and management support. A software cockpit is comparable to an aircraft cockpit, which centrally integrates all relevant information for monitoring and controlling purposes. For example, a project manager can use it to get an overview of the project state, and a quality assurance manager can use it to check the quality of the software product. There exists a variety of such software cockpits in practice, gathering different measurement data and implementing different techniques for data interpretation and visualisation, ranging from simple dashboards to approaches supporting advanced controlling techniques and allowing for organisation-wide data collection, interpretation, and visualisation. An important success factor in the software engineering domain is that these solutions are customised to the specific goals (such as controlling), the organisational characteristics and needs, as well as the concrete project environment (e.g., available data and process maturity). For instance, for multi-disciplinary, distributed software development, measurement data has to be collected from different sources (locations) and formats. In this case, integration of data is crucial for getting a consistent picture of the project state. Supporting such customisation by pro-actively reusing configurable components promises cost-efficient set-up of software cockpits.

Specula (lat. watch tower) is a generic framework for implementing software cockpits within an organisation using reusable control components [8], [9]. It was developed at the University of Kaiserslautern and the Fraunhofer Institute for Experimental Software Engineering (IESE) and basically consists of four parts: (1) a logical architecture for implementing software cockpits [16], (2) a conceptual model formally describing the interfaces between data collection, data interpretation, and data visualisation, (3) an implementation of the conceptual model, including a construction kit of control components, and (4) a method of how to select control components according to explicitly stated control goals and set up project control for a specific development organisation. The evaluation of the Specula framework is currently being conducted in several industrial case studies as part of the Soft-Pit research project funded by the German Federal Ministry of Education and Research (http://www.soft-pit.de, grant number 01ISE07A). The project focuses on getting experience and methodological support for operationally introducing control centres into companies and projects.

Section 2 will present the Specula framework in more detail and illustrate its conceptual modules and the basic data flow between those modules. Section 3 will define reusable control components and show how to instantiate them in order to systematically collect, interpret, and visualise measurement data during project performance. Section 4 will illustrate the customisation process of the framework based on explicitly defined measurement goals and will present a sample scenario on how to apply this process in a real project environment. Section 5 will present related work in the field of software project control centres and project performance indicators. The article will conclude with a brief summary and discussion of future work and describe the continuous application of the Specula framework as part of industrial projects.

2. The Specula Framework Specula, also known as goal-oriented SPCC approach, is a state-of-the-art framework for project control. It implements a control centre as defined in [16]. One essential objective of Specula is to interpret and present collected measurement data in a goal-oriented way in order to optimise a measurement program and effectively detect plan deviations making use of freely configurable, reusable control components. Specula is largely based on the Quality Improvement Paradigm (QIP) and the GQM approach [2]. QIP is used to implement a project control feedback cycle and make use of experiences and knowledge gathered in order to reuse and customise control components. GQM is used to drive the selection process of finding the right control components according to explicitly defined measurement goals. Figure 1 illustrates the basic conceptual modules of the Specula framework. The customisation module is responsible for selecting and adapting the control components according to project goals and characteristics and defined measurement (control) goals. It is possible to include past experience (e.g., effort baselines, thresholds) in the selection and adaptation process. This experience is stored 1
A so-called Visualisation Catena (VC) is created, which formally describes how to collect, interpret, and visualise measurement data. The set of reusable control components from which the VC is instantiated basically consists of integrated project control techniques (for interpreting the data in the right way) and data visualisation mechanisms (for presenting the interpreted data in accordance with the role interested in the data).

The central processing module collects measurement data during project performance and interprets and visualises them according to the VC specification. Measurement data can be retrieved (semi-)automatically from project repositories or manually from data collection forms and (semi-)formal documents. Finally, charts and tables are produced to allow for online project control. A packaging module collects feedback from project stakeholders about the application of the control mechanisms and stores it in an experience base (e.g., whether a baseline worked, whether all plan deviations were detected, or whether countermeasures had a measurable effect). Using these modules, the Specula framework is able to specify a whole family of project control centres (which is comparable to a software product line for control centres).

Figure 1: Overview of the Specula Framework

Large parts of the framework are supported by a corresponding tool implementation, which currently completely automates the central processing module as well as parts of the customisation module (defining control components and creating a VC). The process of deriving a VC from a GQM plan, project goals, and characteristics is currently not tool-supported and has to be performed manually. Section 4 will give an overview of the performed steps and provide some guidance on how to systematically perform this task. Packaging is currently only supported as far as maintenance of control components is concerned (e.g., creating new components, changing existing ones, and adapting parameter settings).

3. Specification of Control Components

In order to control software development projects, especially projects using a distributed development process, goal-oriented interpretation and visualisation of collected measurement data is essential.
[8] describes the concept of a so-called Visualisation Catena (VC) for project control. A VC steers the process of measurement data collection, processing, and visualisation. The processing and interpretation of collected measurement data is usually related to a specific measurement purpose, like analysing effort deviations or guiding a project manager. A set of techniques and methods (from the repository of control components) is used by the VC for covering the specified measurement purpose. We refer to the applied techniques and methods as controlling functions in the following.

The visualisation and presentation of the processed and collected measurement data is related to the roles of the project that profit from using the data. The VC creates a set of custom-made controlling views, which present the data corresponding to the interests of the specified role, such as a high-level controlling view for a project manager and a detailed view of found defects for a quality assurance manager. The whole visualisation catena (including controlling functions and views) has to be adapted in accordance with the context characteristics and organisational environment of the software development project currently controlled. The visualisation catena is the central component of the Specula framework and is mainly responsible for the purpose-, role-, and context-oriented interpretation and visualisation of measurement data.

Figure 2: Components of a Visualisation Catena and their Types

Figure 2 gives an overview of all VC components and their corresponding types. Basically, a VC consists of elements responsible for data collection, interpretation, and visualisation. The Specula controlling framework distinguishes between the following components on the type level:

(T1) Data types describe the structure of incoming data and data that is further processed by the VC. For instance, a time series (a sequence of time stamp and corresponding value pairs) or a project plan (a hierarchical set of activities having a start and end date and an effort baseline) could be logical data types that could either be directly read in by the system or be the output of a data processing function.

(T2) Data access object packages describe the different ways concrete data types may be accessed. For instance, an XML package contains data access objects for reading data (having a certain data type) from an XML file, writing data to an XML file, or changing the contents of an XML file. A special package may be used, for instance, to automatically connect to an effort accounting system or bug tracking database. A data access object contains data type-specific parameters in order to access the data repositories. For instance, a URL, table name, user name, and password would have to be specified in order to retrieve data from an SQL database.
(T3) Web forms describe a concrete way to manage measurement data manually, involving user interaction. A web form manages a concrete data type. For instance, data may be added, existing data may be changed, or data may be removed completely. A web form also refers to other data types that are needed as input. For instance, if you want to enter effort data manually, you need the concrete activities of the project for which the effort is accounted. Moreover, a web form can be parameterised according to a project's and organisation's context.

(T4) Functions represent a certain controlling technique or method, which is used to process incoming data (like Earned Value Analysis, Milestone Trend Analysis, or Tolerance Range Checking). Usually, a function covers at least one measurement purpose, like monitoring a project's attribute, analysing data, or just comparing actual data with baselines, and has to be adapted in correspondence to the context. A function needs different data types as input, produces data of certain data types as output, and may be adapted to a concrete context through a set of parameters.

(T5) Views represent a certain way to present the (processed) data, like drawing a two-dimensional diagram or just a table with a certain number of rows and columns. Usually, a view is associated with one or more roles of the project, like the project manager or the quality assurance manager, and has to be adapted to the project context through a set of parameters. A view visualises different data types and may refer to other views in order to create a hierarchy of views. The latter may, for instance, be used to create a view for a certain project role that consists of a set of sub-views.

In addition, the following components are distinguished on the instance level:

(I1) Data entries instantiate data types and represent the concrete content of measurement data that are processed by the control centre. We basically distinguish between external and internal data. External data must be read in or imported from an external location, or manually entered into the system. Each external data object has to be specified explicitly by a data entry containing, for instance, the start and end time and the interval at which the data should be collected. In addition, the data access object package that should be used to access the external data has to be specified. Internal data are the outcome of functions (executed by the control centre). They are implicitly specified by the function producing the corresponding data type as output and therefore need no explicit specification and representation as a data entry. External as well as internal data may be used as input for instances of functions or views if their corresponding data types are compatible.

(I2) Web form instances provide web-based forms to manually manage measurement data for data entries. All mandatory input data type slots of the instantiated web form have to be filled with concrete data entries and all mandatory parameters have to be set accordingly. If measurement data is managed by an external system (e.g., a company-wide bug tracking system) and consequently read in via an external data access object (as specified in a data entry), no web form instance needs to be instantiated.
(I3) Function instances apply the instantiated function to a certain set of data entries filling the mandatory input slots of the function. A function instance processes (external and internal) data and produces (internal) output data as specified by the function, which can be further processed by other function instances or visualised by view instances. All mandatory function parameters have to be set accordingly.

(I4) Finally, view instances apply the instantiated view to a certain set of data entries filling the corresponding mandatory data type slots of the view. A view instance may refer to other (sub-)view instances in order to build up a hierarchy of views. For instance, if a certain view is suited for the project manager as well as for the quality assurance manager, it may be modelled as one sub-view that is contained in the project manager's view as well as in the quality assurance manager's view.
Each component of a VC and its corresponding type contains explicitly specified checks that may be used to test whether the specification is complete and consistent, whether data are read in correctly, whether function instances can be computed accurately, and whether view instances can be created successfully.

A visualisation catena comprises the whole data processing and presentation chain needed to build up a purpose-, role-, and context-oriented view on the incoming data. It consists of a set of data entries, each having exactly one active data access object for accessing incoming data, a set of web form instances for managing the defined data entries, a set of function instances for processing externally collected and internally processed data, and finally, a set of view instances for visualising the processing results. Comprehensive practical examples of Visualisation Catenae for different project goals may be found in [9]. A more formal specification of all components may be found in [7].
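
To make the concept more tangible, the following minimal Python sketch shows how such a chain of data entries, function instances, and view instances could be wired together and executed. All class, function, and variable names are illustrative assumptions made for this article; they do not reproduce the actual Specula implementation or its interfaces.

from typing import Callable

class DataEntry:
    """Collects external measurement data (here: via a supplied callable)."""
    def __init__(self, name: str, collect: Callable[[], object]):
        self.name, self.collect = name, collect

class FunctionInstance:
    """Processes input data and produces internal output data."""
    def __init__(self, name: str, inputs: list, compute: Callable[..., object]):
        self.name, self.inputs, self.compute = name, inputs, compute
    def run(self, results: dict) -> object:
        return self.compute(*(results[i.name] for i in self.inputs))

class ViewInstance:
    """Presents (processed) data to a project role."""
    def __init__(self, name: str, inputs: list, render: Callable[..., str]):
        self.name, self.inputs, self.render = name, inputs, render
    def run(self, results: dict) -> str:
        return self.render(*(results[i.name] for i in self.inputs))

def execute(entries, functions, views):
    results = {e.name: e.collect() for e in entries}      # data collection
    for f in functions:                                   # data interpretation
        results[f.name] = f.run(results)
    return {v.name: v.run(results) for v in views}        # data visualisation

# Tiny example: planned vs. actual effort per activity (invented numbers).
plan = DataEntry("plan", lambda: {"design": 20, "coding": 40})
actual = DataEntry("actual", lambda: {"design": 25, "coding": 35})
dev = FunctionInstance("deviation", [plan, actual],
                       lambda p, a: {k: a[k] - p[k] for k in p})
view = ViewInstance("effort view", [dev],
                    lambda d: ", ".join(f"{k}: {v:+d}" for k, v in d.items()))
print(execute([plan, actual], [dev], [view])["effort view"])   # design: +5, coding: -5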

4. Selection of Control Components

The previous section defined reusable control components and their instantiation as a project-specific Visualisation Catena. For a goal-oriented selection of control components, a structured approach is needed that describes how to systematically derive control components from project goals and characteristics. The Specula framework makes use of GQM in order to define measurement goals in a structured way. GQM provides a template for defining measurement goals, systematically derives questions that help to make statements about the goals, and finally derives metrics in order to help answer the stated questions. In order to complete such a measurement plan for a concrete project, each metric can further be described by a data collection specification (DCS) basically making statements on who or which tool has to collect the measurement data at which point in time of the project from which data source. [9] describes usage scenarios on how to derive a GQM plan from a control goal and how to define a VC that is consistent with the defined goals. In this article, we will present the approach used for selecting reusable control components in more detail.

Figure 3: Relationships between GQM and the Visualisation Catena Concept

Figure 3 presents an overview of all relationships between a GQM plan, its DCS, and a VC.

(A) Data entries collect measurement data for GQM metrics according to the DCS. If the data has to be collected manually, a web form instance is used to implement the DCS in addition. For instance, if the DCS states that the start and end date of an activity shall be collected from an MS Project file, a corresponding data entry is defined and a web form instance implements importing the project plan from the file. If a central repository exists for such plan information, the data entry would directly connect to the repository (via a matching data access object) and retrieve the measurement data on a regular basis as specified. In the latter case, no web form instance would be needed.

(B) Function instances compute metric values if a metric has to be computed from other metrics. For instance, if a cost performance index is computed for an Earned Value Analysis [6], the budgeted costs of work performed and the actual costs of work performed are needed. This complex metric would be computed from the two simpler ones by a function instance. A function instance could also compute answers for GQM questions by taking into account all metrics assigned to the question and applying an interpretation model to all metric values. For instance, if the question were to ask for the cost performance of the project, a function instance would assess the values of the cost performance index accordingly. In analogy, a function instance could assess the attainment of a GQM goal by assessing the answers of all assigned questions using an interpretation model. For instance, if the goal was to evaluate the project plan with respect to schedule and cost, a function instance could assess the answers of the corresponding questions asking for cost and schedule performance and cost and schedule variance.

(C) View instances visualise the answers to GQM questions. A chart is produced or tables are displayed illustrating the metric results of the corresponding questions and the interpretation model used to answer the question. Basically, this means that the results of function instances and data entries are visualised by selected view instances. For instance, the cost performance and schedule performance index could be visualised as a line chart in which good and bad index values are marked accordingly. Moreover, a view instance could visualise the assessment of the GQM goal by visualising the combined answers of all assigned questions.
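
As a small worked example of the complex metric mentioned in (B): in Earned Value Analysis the cost performance index (CPI) is the ratio of the budgeted cost of work performed (BCWP) to the actual cost of work performed (ACWP). The concrete numbers below are invented for illustration only.

def cost_performance_index(bcwp: float, acwp: float) -> float:
    """CPI = budgeted cost of work performed / actual cost of work performed."""
    return bcwp / acwp

cpi = cost_performance_index(bcwp=80_000, acwp=100_000)
print(f"CPI = {cpi:.2f}")   # CPI = 0.80, i.e. below 1: the work performed cost more than budgeted
# An interpretation model answering the GQM question on cost performance could,
# for instance, flag CPI values below a chosen threshold such as 0.9.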

Figure 4: Identification of Reusable Control Components based on GQM

Utilising the relationships discussed above, we may use the (semi-)formal description of a GQM plan to identify reusable control components and instantiate a corresponding VC accordingly. [5] gives a formal definition of a GQM plan and specifies several additional attributes that could be used to identify control components from a repository. Figure 4 illustrates the main attributes of a GQM plan specification and their influence on selecting the right control components from which the VC is instantiated. The high-level steps of this selection process are as follows:

(S1) Assign each metric to a data type. The instantiated data entry will contain the metric values later on. Identify matching data types based on the metric definition, the object to measure, and the quality attribute. For each simple metric (which is not computed out of other metrics), instantiate the data type and create a corresponding data entry. Use the data collection specification to determine the start time, end time, and interval when the data should be collected. Use the object and data source information in order to assign a corresponding data access object. Activate checks if a certain validation strategy matches those implemented in the data type. If the metric has to be collected manually (that is, no repository exists where the measurement data is stored), identify a web form based on the data source and attach the instantiated web form to the data entry. If no matching data type or web form can be identified, create a new control component matching the requirements of the measurement plan.

(S2) For each complex metric (which is computed from other metrics), identify a function that is able to compute the metric based on the metric definition, the object to measure, and the quality attribute. If the algorithm cannot be matched to a single function, try to identify helper functions to compute the metric values. For instance, if a function checks whether actual effort values stay within a certain tolerance range and your project only defines one effort baseline in the project plan, scale the baseline in order to compute an upper and lower bound using a corresponding function. Instantiate the identified functions by first filling all input data slots with data entries or results of other function instances. Finally, parameterise the function instances according to the metric definition. For instance, if the tolerance range should be 10% above or below the effort baseline, set the corresponding parameter of a scaling function instance accordingly. Activate checks if a certain validation strategy matches those implemented in a function. If no matching function can be identified, create a new control component matching the metric definition.

(S3) If an interpretation model is described in the GQM plan that defines how to formally answer a question, identify a function implementing this model based on the object and quality attribute addressed in order to compute the answers to the question. Instantiate the identified functions by filling all input data slots with data entries or results of other function instances assigned to the question. Parameterise the function instances according to the interpretation model. If no matching function can be identified, create a new control component matching the interpretation model. If a question is comprised of sub-questions, identify functions for each sub-question first and aggregate the answers of the parent question using a corresponding function.
(S4) Visualise the answers to the question by identifying a set of views based on the kind of answers to the question and the data visualisation specifications of the measurement plan (if any). A view may be identified by brute force, i.e., by searching for views that visualise the corresponding data type of (i) a function computing answers for the question, (ii) a function computing (complex) metric values for the question, or (iii) a data entry collecting (simple) metric values for the question (see the sketch after these steps).
Instantiate the identified views by filling all input data slots with data entries or results of function instances assigned to the question. Parameterise the view instances according to the data presented (e.g., title and axis description, size, and colour). If no matching view can be identified, create a new control component matching the interpretation model. If a question is comprised of sub-questions, identify views for each sub-question first and reference them in the visualisation of the parent question using a view that is able to align views hierarchically.

(S5) If an interpretation model is described in the GQM plan that defines how to formally assess goal attainment, identify a function implementing this model based on the object and quality focus addressed in order to attain the measurement goal. Instantiate the functions in analogy to the 3rd step.

(S6) Visualise goal attainment by identifying a set of views based on the kind of assessment of the goal and the data visualisation specifications of the measurement plan (if any). Instantiate the views in analogy to the 4th step.
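
A brute-force view search as mentioned in step (S4) could, for instance, look as follows. The class and repository names are assumptions made only for this illustration; they are not part of the Specula tool.

from dataclasses import dataclass

@dataclass
class View:
    name: str
    accepted_data_types: set   # data types this view is able to visualise

def candidate_views(produced_data_types: set, repository: list) -> list:
    """Return all views that can visualise at least one data type produced for the
    question (by its function instances or data entries)."""
    return [v for v in repository if v.accepted_data_types & produced_data_types]

repository = [
    View("Hierarchical Line Chart", {"TimeSeries", "ToleranceRangeSeries"}),
    View("Simple Table", {"Table"}),
]
# Data types produced by the function instances / data entries assigned to the question:
matches = candidate_views({"ToleranceRangeSeries"}, repository)
print([v.name for v in matches])   # ['Hierarchical Line Chart']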

Figure 5: Sample Selection of Reusable Control Components

The selection process described above is largely performed manually. It could be partly automated depending on the degree of formality of the corresponding measurement plan. However, currently, performing this process requires a deeper understanding of the measurement program that should allow for project control and of the components of the repository that may be reused. The control components within the repository depend on the organisation (and the very project that should be controlled). Some components may be more general and applicable for several companies, whereas others may be very specific and implement organisation-specific control strategies. This is also related to the different kinds of components in the repository. For instance, one control component may implement a (fairly) complex control technique (like Earned Value Analysis) and another component may just provide some simple data processing to support other functions (like scaling a time series or converting between different data types).

Figure 5 shows a very simple example of how to select reusable control components. The GQM goal was to evaluate the effort plan with respect to plan deviation. The one and only question asked was about the absolute effort deviation per activity. A complex metric defined the deviation as the amount by which an actual effort value is above the effort baseline or below 10% of this baseline. Consequently, two simple metrics were defined and operationalised by corresponding data collection specifications. The baseline should be extracted from a project plan stored in an MS Project file, so a corresponding web form collecting project plan information and a data type representing the project plan including the effort baseline are instantiated. Furthermore, a function is applied to extract the effort baseline from the project plan. The actual effort data should be extracted from the company-wide effort accounting system including the effort per person and activity. A data entry is instantiated that connects to the accounting system using a corresponding data access object. A function is applied to aggregate the effort data for each activity across all persons. In order to detect effort plan deviations below 10% of the baseline, a lower effort bound must be computed. Therefore, a scaling function is instantiated and parameterised with 10%. In order to compute the complex metric, a tolerance range checking function is applied that computes the effort plan deviation accordingly. Finally, a view is instantiated in order to graphically display the results of the assigned function instances and data entries.
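
The following sketch mimics the processing chain of this example in plain Python. The function and variable names are illustrative, not the actual Specula components; it reads the lower bound as "10% below the baseline" (an interpretation of the text), and the numbers are invented.

from collections import defaultdict

def aggregate_effort(records):
    """Sum reported effort (person, activity, effort) per activity across all persons."""
    totals = defaultdict(float)
    for _person, activity, effort in records:
        totals[activity] += effort
    return dict(totals)

def scale(baseline, factor):
    """Scaling function, e.g. factor=0.9 to obtain a bound 10% below the baseline."""
    return {activity: value * factor for activity, value in baseline.items()}

def tolerance_range_check(actual, upper, lower):
    """Deviation if the actual effort lies above the baseline or below the lower bound."""
    return {a: not (lower[a] <= actual.get(a, 0.0) <= upper[a]) for a in upper}

baseline = {"design": 100.0, "coding": 200.0}               # extracted from the project plan
reported = [("Ann", "design", 60), ("Bob", "design", 55),   # from the effort accounting system
            ("Ann", "coding", 150)]
actual = aggregate_effort(reported)
deviation = tolerance_range_check(actual, baseline, scale(baseline, 0.9))
print(deviation)   # {'design': True, 'coding': True}: design above plan, coding below the lower bound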

Figure 6: Web-based Implementation of the Specula Framework

A visualisation of the example can be found in Figure 6. The screenshot is taken from the web-based Specula tool implementation and also contains other view instances of a project manager's controlling view.
Overall, four GQM goals were interpreted and visualised, including effort plan evaluation, defect slippage control, defect states and classes control, and schedule deviation. The VC consists of 8 view instances, 5 function instances, 9 data entries, and 9 web form instances. The Specula framework is currently used and developed as part of the Soft-Pit project and evaluated there in the context of industrial application scenarios. First evaluation results will be presented in [3].

5. Related Work

An overview of the state of the art in Software Project Control Centres can be found in [16]. The scope was defined as generic approaches for online data interpretation and visualisation on the basis of past experience. However, project dashboards were not included in this overview. In practice, many companies develop their own dashboards (mainly based on spreadsheet applications) or use dashboard solutions (e.g., test dashboards [20]) that provide a fixed set of predefined functions for project control (e.g., they deal with product quality only or solely focus on project costs) and are very specific to the company for which they were developed. Most of the existing, rather generic, approaches for control centres offer only partial solutions. Especially purpose- and role-oriented usage based on a flexible set of techniques and methods is not comprehensively supported. For instance, SME (Software Management Environment) [10], [14] offers a number of role-oriented views on analysed data, but has a fixed, built-in set of control indicators and corresponding visualisations. The SME successor WebME (Web Measurement Environment) [21] has a scripting language for customising the interpretation and visualisation process, but does not provide a generic set of applicable controlling functions. Unlike Provence [13] and PAMPA [19], the approaches Amadeus [18] and Ginger2 [22] offer a set of purpose-oriented controlling functions with a certain flexibility, but lack a role-oriented approach to data interpretation and visualisation. There also exist lightweight SPCC implementations (e.g., [4]) that concentrate on core metrics adapted to the corresponding project environment.

The indicators used to control a development project depend on the project's goals and the organisational environment. There exists no default set of indicators that is always used in all development projects in the same manner. According to [15], a "good" indicator has to (a) support analysis of the intended information need, (b) support the type of analysis needed, (c) provide the appropriate level of detail, (d) indicate a possible management action, and (e) provide timely information for making decisions and taking action. The concrete indicators that are chosen should be derived in a systematic way from the project goals [12], making use of, for instance, the Goal Question Metric (GQM) approach. Some examples of indicators used in practice can be found in [1] and [14]. With respect to controlling project cost, the Earned Value approach provides a set of commonly used indicators and interpretation rules [6]. With respect to product quality, there even exists an ISO standard [11]. However, the concrete usage of the proposed measures depends upon the individual organisation. Moreover, there exists no unique classification of project control indicators. One quite popular classification of general project management areas is given by the Project Management Body of Knowledge (PMBoK) [17]. The PMBoK distinguishes between nine areas, including project time, cost, and quality management.

The ideas behind GQM and the Quality Improvement Paradigm (QIP) [2] are well-proven concepts that are widely applied in practice today. [5] presents an approach based on GQM and QIP for creating and maintaining enhanced measurement plans, addressing data interpretation and visualisation informally; related work in this field is also presented there.

6. Conclusion and Future Work

The article presented the Specula controlling framework for setting up a project control mechanism in a systematic and goal-oriented way, profiting from experiences gathered. Reusable control components were defined and instantiated in order to illustrate how to define measurement-based project control mechanisms and how to instantiate them for the software development projects of a concrete organisation. A high-level process was shown that gives guidance on how to select the right control components for data collection, interpretation, and visualisation based on explicitly defined measurement goals. Moreover, a simple example was presented of how to apply generically defined control components.

The Specula framework implements a dynamic approach for project control; that is, measures and indicators are not predetermined and fixed for all projects. They are dynamically derived from measurement goals at the beginning of a development project. Existing control components can be systematically reused across projects (and partly across organisations if similar goals are concerned) or newly defined from scratch. Data is provided in a purpose- and role-oriented way; that is, a certain role sees only the measurement data visualisations that are needed to fulfil the specific purpose. Moreover, all project control activities are defined explicitly, are built upon reusable components, and are systematically performed throughout the whole project. A context-specific construction kit is provided, so that elements with a matching interface may be combined.

Further development and evaluation of the approach will take place in the context of the Soft-Pit project. The project includes the conduct of several industrial case studies with four different German companies, in which control centres and their deployment are evaluated. The project is mainly organised into three iterations focusing on different controlling aspects. An application of Specula in the first iteration showed the applicability in principle of the VC concept in an industrial environment [3]. The current iteration focuses more on the selection process of the reusable control components and will evaluate parts of the approach presented in this article. Future work will also concentrate on setting up a holistic control centre that integrates more aspects of engineering-style software development (e.g., monitoring of process-product dependencies, integration of supplier interfaces, and linking results to higher-level goals). The starting point for setting up such a control centre is usually a set of high-level business goals from which measurement programs and controlling instruments can be derived systematically. Thus, it would be possible to transparently monitor, assess, and optimise the effects of the business strategies performed (such as CMMI-based improvement programs).

7. Acknowledgements

We would like to thank Sonnhild Namingha from Fraunhofer IESE for reviewing a first version of the article. This work was supported in part by the German Federal Ministry of Education and Research (Soft-Pit Project, No. 01ISE07A).

8. References

[1] Agresti, W.; Card, D.; Church, V.: Manager's Handbook for Software Development; SEL 84-101. NASA Goddard Space Flight Center. Greenbelt, Maryland, November 1990.
[2] Basili, V.R.; Caldiera, G.; Rombach, D.: The Experience Factory. Encyclopaedia of Software Engineering 1, 1994, pp. 469-476.
[3] Ciolkowski, M.; Heidrich, J.; Münch, J.; Simon, F.; Radicke, M.: Evaluating Software Project Control Centers in Industrial Environments; International Symposium on Empirical Software Engineering and Measurement (ESEM) 2007, Madrid. (submitted)
[4] Daubner, B.; Henrich, A.; Westfechtel, B.: A Lightweight Tool Support for Integrated Software Measurement; Proceedings of the International Workshop on Software Metrics and DASMA Software Metrik Kongress IWSM/MetriKon 2006, Potsdam (Germany), 2006, pp. 67-80.
[5] Differding, C.: Adaptive measurement plans for software development; Fraunhofer IRB Verlag, PhD Theses in Experimental Software Engineering, 6, ISBN: 3-8167-5908-4, 2001.
[6] Fleming, Q.W.; Koppelmann, J.M.: Earned Value Project Management; Second Edition. Project Management Institute, Newtown Square, 2000.
[7] Heidrich, J.: Custom-made Visualization for Software Project Control; Technical Report 06/2003, Sonderforschungsbereich 501, University of Kaiserslautern, 2003.
[8] Heidrich, J.; Münch, J.: Goal-oriented Data Visualization with Software Project Control Centers; In: G. Büren, M. Bundschuh, R. Dumke (ed.): MetriKon 2005, Praxis der Software-Messung, Tagungsband des DASMA-Software-Metrik-Kongresses, Nov. 15-16, 2005, Kaiserslautern, Germany. Shaker, ISBN 3-8322-4615-0, 2005, pp. 65-75.
[9] Heidrich, J.; Münch, J.; Wickenkamp, A.: Usage Scenarios for Measurement-based Project Control; In: Ton Dekkers (ed.): Proceedings of the 3rd Software Measurement European Forum, May 10-12, 2006, Rome, Italy. SMEF 2006, pp. 47-60.
[10] Hendrick, R.; Kistler, D.; Valett, J.: Software Management Environment (SME) – Concepts and Architecture (Revision 1); NASA Goddard Space Flight Center Code 551, Software Engineering Laboratory Series Report SEL-89-103, Greenbelt, MD, USA, 1992.
[11] ISO 9126: Software Engineering – Product Quality; Technical Report. ISO/IEC TR 9126. Geneva, 2003.
[12] Kitchenham, B.A.: Software Metrics; Blackwell. Oxford, 1995.
[13] Krishnamurthy, B.; Barghouti, N.S.: Provence: A Process Visualization and Enactment Environment. Proceedings of the 4th European Software Engineering Conference, Lecture Notes in Computer Science 717; Springer: Heidelberg, Germany, 1993, pp. 451-465.
[14] McGarry, F.; Pajerski, R.; Page, G.; Waligora, S.; Basili, V.R.; Zelkowitz, M.V.: An Overview of the Software Engineering Laboratory; Software Engineering Laboratory Series Report SEL-94-005, Greenbelt, MD, USA, 1994.
[15] McGarry, J.; Card, D.; Jones, C.; Layman, B.; Clark, E.; Dean, J.; Hall, F.: Practical Software Measurement – Objective Information for Decision Makers; Addison-Wesley Professional, 1st edition, ISBN 4-320-09741-6, October 15, 2001.
[16] Münch, J.; Heidrich, J.: Software Project Control Centers: Concepts and Approaches. Journal of Systems and Software, 70 (1), 2003, pp. 3-19.
[17] Project Management Institute: A Guide to the Project Management Body of Knowledge (PMBOK® Guide), 2000 Edition. Project Management Institute, Four Campus Boulevard, Newtown Square, PA 19073-3299 USA, 2000.
[18] Selby, R.W.; Porter, A.A.; Schmidt, D.C.; Berney, J.: Metric-Driven Analysis and Feedback Systems for Enabling Empirically Guided Software Development. Proceedings of the 13th International Conference on Software Engineering, 1991, pp. 288-298.
[19] Simmons, D.B.; Ellis, N.C.; Fujihara, H.; Kuo, W.: Software Measurement – A Visualization Toolkit for Project Control and Process Improvement; Prentice Hall Inc: New Jersey, USA, 1998.
[20] SQS AG: SQS-Test-Professional, Component Process Performance Management to Monitor Test Quality and Error Rates; http://www.sqs.de/portfolio/tools/tools_ppm.htm, last checked Jan 11, 2006.
[21] Tesoriero, R.; Zelkowitz, M.V.: The Web Measurement Environment (WebME): A Tool for Combining and Modeling Distributed Data. Proceedings of the 22nd Annual Software Engineering Workshop (SEW), 1997.
[22] Torii, K.; Matsumoto, K.; Nakakoji, K.; Takada, Y.; Takada, S.; Shima, K.: Ginger2: An Environment for Computer-Aided Empirical Software Engineering. IEEE Transactions on Software Engineering 25(4), 1999, pp. 474-492.


Performance Measurement and Governance of software projects within the Management System of an ICT company

Stefania Lombardi

Abstract

The Company Management and Reporting System must guarantee effective information support to the Delivery Department in verifying the achievement of the Company's strategic goals. These goals are broken down into measurable goals used to report on the state of the IT Delivery Department from several points of view (time and cost performance, productivity, quality). The Performance Governance System is placed in this context. Since 2001, this System has collected measurement data on the performance of software projects (development, enhancement and maintenance), the core business of the Company, in accordance with an organisational model and a system of metrics and measures shared by the delivery lines and the business areas of the company and consistent with those of the market.

The System consists of a tool for continuous monitoring of running projects, for comparison of measures with internal and external benchmarks, and for support of cost estimation in the offering phase and in project planning, through the availability of a database constantly updated with the performance values obtained in many different technological and production environments. The database currently holds more than 430 projects, for about 700,000 FP of development and enhancement software projects (200,000 of the current year) and about 1,800,000 supported FP in maintenance (350,000 for the current year).

The data are collected by an incremental method, starting with a minimum set of data strictly necessary for the analysis of performance. Other classes of data are gradually added so as to follow the evolution of productivity rates with regard to the technologies used, the size of projects, the business areas, and the various kinds of professionals involved (whether internal or external suppliers). The adopted method was to begin with few elements and gradually improve data quality, the correlation with economics and with the indicators on software service quality, and later with other factors influencing productivity, up to the present process, which allows analysing the productivity trend and benchmarking against the average productivity rate and against the Best-in-Class projects using the same technologies.

The process provides for monthly reporting and a monitoring dashboard for the management, in which the performance of the production department is analysed together with the economic trend (using the balanced scorecard) and the customer satisfaction index.

1. Introduction

The company management system model requires the implementation and availability of accurate and flexible reporting, a prerequisite to guarantee suitable support to top management and to both the production department and the sales areas in verifying the achievement of the strategic and operating goals.

In this context the company reporting is based on a model following a "just in time" logic, reflecting the organisation's capability to respond properly to a competitive environment, and on a process-based management control system able to split the goals into partial goals and responsibilities assigned to small teams or to single persons. Starting from these bases, the Organisation decided to adopt the balanced scorecard methodology. Such a model, in fact, offers powerful support in translating the business strategy into action and therefore in identifying the indicators that together give a complete representation of the achievement of the objectives.

The reporting supporting the management system model of the operating structure reflects this approach, examining several dimensions of performance simultaneously. Competitive success, the attainment of high-value outcomes for the shareholders, and customer satisfaction can reach optimal levels only in the presence of management tools that facilitate the exploitation of their interrelations.

2. The Organisation Strategy

A basic requisite of the business strategy regards the "set-up of the operating machine", which is realised through an organisational model focused on the customers' requirements, both acquired and expected, through efficiency improvement and a delivery excellence programme directed towards innovation, quality, operating excellence, and the attainment of synergies between the delivery lines and the Business Units. All strategic goals and the related implementing actions cover the dimensions of analysis provided by the balanced scorecard methodology and are synthetically represented in Figure 1.

The balanced scorecard covers four perspectives, each with its guiding question: the financial perspective (for financial success, how will we have to appear to our shareholders?), the customer perspective (how will we have to appear to our customers?), the internal processes perspective (in order to satisfy our shareholders and our customers, in which business processes do we need to be outstanding?), and the learning and growth perspective (how will we support our ability to change and to improve?).

Figure 1

Specifically, the strategic goal "set-up of the business operating machine" requires, in the internal processes perspective, tight efficiency targets for the production activities, which lead to the systematic analysis of productivity for software development, enhancement and maintenance.

The process is briefly shown in Figure 2.

Production performance governance, as depicted in Figure 2, proceeds through four steps: share the metrics and measurement system, collect and make the measurement data available, process and analyse the data, and evaluate the targets to be reached.

Figure 2

3. Scope of analysis and Performance Governance System description

The focus is primarily the total amount of software production (development, enhancement and maintenance) for each year; data collection therefore covers the year's project activities and is updated quarterly, in accordance with the economic management control phases (budget, forecast, final). This allows accounting for elementary data on measured deliverables along the key dimensions of financial analysis (order, contract, client, business area) and relating them both to economic data and to the key indicators on service level and customer satisfaction.

Professional effort is collected separately for each kind of activity, in accordance with a classification shared between the delivery lines and the marketing sector and aligned with acknowledged market classifications (for instance "IT Services Market Research Methodology and Definitions" by Gartner). The most relevant indicators are identified for each production activity. Specific productivity measures are carried out for software development, enhancement and maintenance according to a system of metrics applied within the delivery lines and aligned with the external market ones.

IFPUG Function Points (FP) are the software size metric used. Productivity is measured in FP/staff-day. Effort is normalised, if needed, considering one person-year equal to the internal standard value in person-days. For software development and enhancement, the productivity value is normalised with regard to the percentage of the Software Life Cycle (SLC) actually performed, with the aim of making projects with different SLC coverage comparable. For maintenance, software size refers to all supported FP.
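
As a hedged illustration of these definitions (the paper does not state the exact formulas: the standard person-year value and the direction of the SLC normalisation below are assumptions made only for this sketch):

STANDARD_PERSON_YEAR_DAYS = 220   # assumed internal standard value (person-days per person-year)

def effort_in_staff_days(person_years: float) -> float:
    return person_years * STANDARD_PERSON_YEAR_DAYS

def productivity(fp: float, effort_staff_days: float, slc_coverage: float = 1.0) -> float:
    """FP per staff-day, normalised to a full Software Life Cycle (SLC):
    the effort is scaled up to what a complete life cycle would have required."""
    full_slc_effort = effort_staff_days / slc_coverage
    return fp / full_slc_effort

# A project delivering 500 FP with 2 person-years of effort, covering 80% of the SLC:
print(round(productivity(500, effort_in_staff_days(2.0), slc_coverage=0.8), 2))   # 0.91 FP/staff-day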

4. The Data Base in the Performance Governance System

The database collects data of measured projects starting from 2002. Its originality consists in the fact that the running projects are the basic components of the system: they are constantly monitored during each year, following the timing of the company's economic management and control phases. Historical data of projects ended in previous years are another precious element for trend analysis and internal benchmarking. In this way the database is a basic component of the operational control system and of project management. The graph in Figure 3 shows the progressive growth of the data volumes, expressed as a percentage of all the activities of software development, enhancement and maintenance in the company.

Ratio per year of measured software projects to all software projects (% measured projects / total software projects), growing from 29% in 2002 to 75% in 2006.

Figure 3

At the end of last year the database held productivity data for more than 2,000 person-years, more than 430 projects and about 2,500,000 Function Points. Of these, about 1,900,000 FP refer to ended projects and 600,000 FP to running projects. About 700,000 FP concern development and enhancement software projects (200,000 in the last year) and about 1,800,000 supported FP concern maintenance (350,000 for the last year).

Table 1 shows the most important data collected. It distinguishes the initial minimum set of data gathered in the start-up phase of the process and during the first years from the evolution starting in 2006 (second phase). The first phase aimed at the progressive filling-up of information on the measured projects with a few primary data, taking care of quality in terms of completeness, consistency and accuracy. In the second phase, further elements influencing productivity are added, together with the possibility of aggregating data according to economic control criteria (Business Area/Customer/contract/order).

Table 1

Phase 1 (initial minimum set):
• Project-Id
• Production activities (in accordance with a standard classification)
• Effort (internal and external)
• Function Points
• Applied technologies (percentage of each language, application development tool, DBMS, operating system, …)
• SLC

Phase 2 (from 2006):
• Business Area/Customer/Contract/order
• Factors influencing productivity for development and enhancement (reuse, replication, skill, processing complexity, creeping requirements, documentation level, quality level (number of indicators), employee turnover)
• Factors influencing productivity in maintenance (number of operations, documentation level, skill, quality level (number of service levels), percentage of supported FP changes)
• Defects rate
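
An illustrative data model for one record of the database is sketched below; the field names are assumptions derived from Table 1, not the actual schema of the system.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MeasuredProject:
    # Phase 1: minimum data set
    project_id: str
    production_activity: str                 # standard activity classification
    effort_internal: float                   # staff-days
    effort_external: float                   # staff-days
    function_points: float
    technologies: dict = field(default_factory=dict)   # % per language / tool / DBMS / OS
    slc_coverage: float = 1.0
    # Phase 2: added from 2006 onwards
    business_area: Optional[str] = None
    customer: Optional[str] = None
    contract: Optional[str] = None
    order: Optional[str] = None
    productivity_factors: dict = field(default_factory=dict)   # reuse, skill, turnover, ...
    defect_rate: Optional[float] = None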

4.1. Criteria of analysis

Analysis is possible at the level of single projects and from several points of view and aggregations; the trend of productivity values is presented with reference to:
• typologies of productive activities;
• dimensions;
• technologies;
• working team composition (internal and/or external staff);
• market areas;
• concluded and ongoing projects.

Best-in-Class projects are furthermore identified for each point of view. Moreover, analysis is conducted both at a global level for the years as a whole and separately for each year. As the information has to be used for monitoring ongoing projects, the trend and benchmark for the current year are particularly important.

Here are some examples of analysis:

Development average productivity (FP/staff-day) per year, 2002-2006, for the classes Lev-3, Liv-4-9, Liv-10-18 and Liv-19-27.

Figure 4

PROD. MEDIA

PRODUTTIVITA' (FP/GG-PP)

FP: 8900

FP: 6050

FP: 9.065

0

150

300

450

600

750

900

1.050 1.200

1.350 1.500

1.650 1.800

1.950 2.100

2.250 2.400

2.550 2.700

2.850 3.000

3.150 3.300

3.450 3.600

3.750 3.900

4.050 4.200 4.350

4.500 4.650

4.800 4.950

5.100 5.250

5.400 5.550

5.700 5.850

6.000

DIMENSIONI (FP)

RUNNING PROJECTS

ENDED PROJECTS

Figure 5

4.2. Efficiency analysis in the Corporate Management System

The treatment and representation of IT sector productivity data for Corporate Management System purposes are based on what is analysed within the Production Performance Management described in the previous paragraphs and use all its primary data.

This representation supplements and completes what is produced and provided in that context to the IT delivery sectors and to Corporate Management, enabling data analysis at further aggregation levels (contract, customer, whole IT delivery) and a consultation integrated with data of further dimensions (financial, customer, learning and growth) dealt with in the corporate dashboard. For the purposes of the management dashboard and of the reports expected for the Delivery Review, some derived indicators are used:

• Max(Productivity of the previous scenario; Productivity of the final balance of the previous year): the highest productivity value among those detected in the previous scenario of the current year and in the final balance scenario of the previous year.
• Effort-weighted average(Productivity of the previous scenario; Productivity of the final balance of the previous year): the average, weighted on effort, of the productivity values detected in the previous scenario of the current year and in the final balance scenario of the previous year.
• Weighted average for the technologies operated by the contract: the average productivity over all measured projects, with reference to the typologies of activities and to the technologies operated by the projects related to the contract under consideration; the average productivity is weighted on effort.
• Weighted BIC average for the technologies operated by the contract: the average productivity over all measured Best-in-Class (BIC) projects, with reference to the typologies of activities and to the technologies operated by the projects related to the contract under consideration; the average BIC productivity is weighted on effort.

The weighted BIC average and the weighted average for the technologies operated by the contract represent the two benchmark values used for the monthly delivery review reporting and for the corporate dashboard. Two different presentations are displayed, pertaining respectively to the trend of productivity on the contract/customer and to the comparison with the reference benchmarks.

4.2.1. Productivity Trend on Contracts/Customers

The objective of these reports is to give a brief indication of the contract/customer productivity trends by taking into consideration the current year's observations and the relevant ones at the closing of the previous year. These indications are based on the gap found between the current productivity scenario, the former productivity scenario and the final balance of the previous year.
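
For illustration, the two scenario-based derived indicators defined in Section 4.2 could be computed as follows; the variable names are illustrative and the productivity and effort figures are invented.

def max_indicator(prod_previous_scenario: float, prod_final_previous_year: float) -> float:
    """Highest productivity among the previous scenario of the current year and the
    final balance scenario of the previous year."""
    return max(prod_previous_scenario, prod_final_previous_year)

def effort_weighted_average(prod_prev: float, effort_prev: float,
                            prod_final: float, effort_final: float) -> float:
    """Average of the two productivity values, weighted on effort."""
    return (prod_prev * effort_prev + prod_final * effort_final) / (effort_prev + effort_final)

print(max_indicator(0.85, 0.78))                                  # 0.85
print(round(effort_weighted_average(0.85, 400, 0.78, 600), 3))    # 0.808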

4.2.2. Comparison with internal and external targets

The objective is to highlight the productivity measures of the software projects in the contracts (or relating to the customers) compared with the benchmark values. These are calculated by analysing the productivity values found in projects which perform the same activities and use the same technologies as the projects of the considered contracts/customers. The benchmarks are given by:
• the average productivity of all the projects on the same technologies;
• the average productivity of the Best-in-Class (BIC) projects on the same technologies.

The charts and figures in Figure 6 are an example of the reporting provided in the Business Process Review sessions and in the monitoring dashboard.
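
One possible way to compute the two benchmark values is sketched below. The paper does not specify how Best-in-Class projects are selected; taking the top quartile by productivity is an assumption made only for this illustration, and all names and figures are invented.

def weighted_average(projects):
    total_effort = sum(p["effort"] for p in projects)
    return sum(p["productivity"] * p["effort"] for p in projects) / total_effort

def benchmarks(projects, technology):
    """Effort-weighted average productivity of all projects and of the BIC projects
    using the given technology."""
    same_tech = sorted((p for p in projects if p["technology"] == technology),
                       key=lambda p: p["productivity"], reverse=True)
    bic = same_tech[: max(1, len(same_tech) // 4)]   # assumed BIC rule: top quartile
    return weighted_average(same_tech), weighted_average(bic)

portfolio = [
    {"technology": "Java", "productivity": 0.9, "effort": 300},
    {"technology": "Java", "productivity": 0.7, "effort": 500},
    {"technology": "Java", "productivity": 1.1, "effort": 200},
    {"technology": "COBOL", "productivity": 0.5, "effort": 400},
]
avg_all, avg_bic = benchmarks(portfolio, "Java")
print(round(avg_all, 2), round(avg_bic, 2))   # 0.84 1.1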

Figure 6

5. Suggestions and conclusions

Main success factors:
• The first key element of success consists in the fact that, together with the historical data gathered from ended projects, data on current projects are also steadily gathered. The running projects are all monitored, both with respect to the trend of their own performance and in comparison with internal and external targets. This consequently allows a better governance process.
• The second success factor is the gradual propagation of the system. With this "step by step" method it was possible to proceed in accordance with the growth of data quality culture and sensibility within the delivery lines. The "step by step" improvement process presupposes the initial acceptance of few data (not all of high quality) and the subsequent drawing up and spread of a prototype of the governance model within the organisation, which was then evolved year by year into the present system.
• The third component of success can be found:
  o in the system's capability to permeate the operative control processes;
  o in the impulse given to the start-up of improvement plans based on the results of the analysis;
  o in the indications for the offering process.
In a few words, this means using the system in the day-by-day company management.

The evolution of the model follows two guidelines:
• in order to guarantee continuity of the service, the system must adapt itself to changes in the company related to new market scenarios and to organisational and structural changes;
• extension of the efficiency measures to other ICT-related activities (facility management, operations, IT consulting, CRM), with the objective of reaching total governance of the ICT contracts in a market context in which the traditional component of software development and maintenance is decreasing while the component of Business Management Services is increasing.

The system does not depend on the size of the company; thus it is applicable both to large and small businesses in the ICT sector and to the IT departments of companies or institutions that operate in other markets.



Adopting CMMI Measurement Program in a Small Research Based Software Organisation – A Case Study
Saqib Rehan, Saira Anwar, Manzil-e-Maqsood

Abstract
Quality enhancement depends on improvements and changes in an organisation. CMMI studies have consistently shown that improving the processes of a software organisation is highly valuable, and this motivated us to pursue the same benefits in research and development based software organisations: improvement is the way to enhance quality, otherwise the quality of the organisation will gradually decline. The Capability Maturity Model Integration (CMMI) is used here as the software process improvement strategy. The literature shows that CMMI has been adopted in small to large software development organisations. This paper describes the results of our study investigating the embedding of a CMMI measurement program in a small research-based software development organisation. It also identifies particular considerations to be made while assessing the capability of such organisations. On the basis of these considerations a new CMMI assessment tool was devised and used to perform a gap analysis in the target organisation, which resulted in recommendations for improvement in all key process areas of CMMI level 2. The study concluded with the development and pilot implementation of tailored processes for the Measurement and Analysis Key Process Area (KPA) in the target organisation. The comparison of results after reassessment shows that the capability of the target organisation has increased.

1. Introduction
It is evident that successful improvement can only occur when organisations apply those strategies that best fit their own context and particular development needs [10]. It is a well-known belief that processes are central to quality, and management often takes a process approach to measure and improve the quality of the organisation. ISO describes the process approach as a principle for achieving results more efficiently [16, 17]. ISO/IEC 14598-5 [12] is used by some organisations for the assessment of software products and processes from many divergent viewpoints, but at the same time the Capability Maturity Model Integration (CMMI) [2, 3] is well known for its best practices. The model is a combination of Systems Engineering (SE), Software Engineering (SW), Integrated Process and Product Development (IPPD), and Supplier Sourcing (SS) practices. The purpose of CMMI is twofold: it not only assesses the maturity of processes but also provides guidance towards the improvements needed in a software development organisation. While the ISO 9126 [13, 18] models are suited to the evaluation of internal and external product quality, the basic purpose of CMMI [2, 3] lies in the evaluation of processes and improvements. CMMI for Software is a framework for assessing the maturity of processes in an organisation. It is recognised as a software process improvement strategy and is typically used in software development organisations [6, 7, 8, 9].


The concept of improvement is not new in research and development based software organisations, and CMMI [2, 3] was chosen here because of its best-practices nature and its suitability for process assessment in organisations seeking improvement. This study is probably one of the earliest attempts to assess the adaptation of CMMI in research-based organisations for process improvement. The paper discusses the possible issues in adopting CMMI in a research-based development organisation and focuses on the adoption of processes for the Measurement and Analysis Key Process Area (KPA) of CMMI for research-based projects. The subsequent section gives the scope and objectives of the study. Section 3 describes the benefits attained. Section 4 discusses the research methodology and the gap analysis performed, along with their results. The targeted area for improvement, its pilot implementation results and the conclusions are discussed in the remaining sections.

2. Scope of project
The intent of this project is to assess processes and to increase the capability and maturity of a research-based organisation. To serve this purpose, a research and software development based organisation was selected and its issues were identified. The CMMI model was applied to assess the existing processes and to help incorporate the missing ones. Finally, the organisation's processes were tailored for the adoption of a CMMI measurement program.

3. Benefits of the project
The benefits of the undertaken project are mutual for the organisation and for the people working on it. The project allows us to understand:
• Where research organisations currently stand on the roadmap of a software process improvement plan.
• How mature the existing processes in the organisation are.
• Which specific practices need to be incorporated into the processes in order to reach a specific maturity level.
Answering these questions enables the organisation to determine how well the current targets are achieved, helps in setting future targets and provides a layout for further improvement.

4. Research site and methodology
For our study we selected the Center for Research in Urdu Language Processing (CRULP) as the research site. CRULP conducts research and development in the linguistic and computational aspects of Urdu and other languages of Pakistan. Research is conducted in three areas - speech, computational linguistics and script processing - and software is then developed according to the research done. The organisation usually follows the SDLC lifecycle to accomplish this activity. In the projects undertaken by CRULP a major portion of the time is consumed by research activities, after which the actual software development work is carried out based on the research done. The organisation's basic details are given in Table 1.


Table 1: Organisation details
  Organisation size:            50 employees (approximately), including developers and researchers
  Organisation maturity level:  Never assessed before
  Number of teams working:      Five
  Team details:                 A. Nokia, B. Machine Translation, C. Text to Speech Translation, D. Linguistics, E. Lexicon

The organisation has 20 processes in its Organisation's Set of Standard Processes (OSSP), which contains the information on the processes that guide the research and development activities in the organisation. As CRULP is a small research-based development organisation, the CMMI practices cannot be directly mapped onto the practices followed in CRULP. CMMI provides the flexibility to tailor processes according to the needs of the organisation and gives guidelines for developing processes for a software development organisation [5]. This capability of CMMI [2, 3] is exploited in the research and development based software organisation for its process improvement. This section first introduces the CMMI-based assessment and improvement methodology, then discusses the representation used to adopt CMMI in this project and the plan used to perform all the activities. Finally, the actual assessment performed is discussed in detail through the gap analysis and its results.

4.1. Project plan
The purpose of this activity was to perform the gap analysis for all key process areas (KPAs) of CMMI Level 2 (Managed level) in the chosen research-based software organisation and then to deal with the incorporation of measurement and analysis activities within the organisation. The high-level activities planned for process improvement are shown in Figure 1. Establishing sponsorship and establishing vision and scope were already part of the organisation and were achieved with the commitment of management. The study started with the development of the plan, performed the gap analysis, conducted the data gathering, built the model of the processes, developed the new processes, piloted them, rolled them out, and ended with the conduct of a mini evaluation. The implementation of the processes, their analysis and validation, and future actions are still in progress.


Figure 1: Project Plan for CMMI Assessment

4.2. CMMI assessment and representation
According to the SEI [1], the CMMI product suite was developed using a consensus-based approach to identifying and describing best practices in a variety of disciplines. Successful process-improvement initiatives must be driven by the objectives of the organisation. The CMMI staged representation contains 25 key process areas indicating the aspects of product development that are to be covered by organisational processes [1].

4.3. Issues
In general, research projects in a research and development organisation present some specific challenges due to their inherent nature. Some of these challenges are as follows:
• Varying nature of the project life cycle: In a research organisation the research and development activities are associated with each other. Although they can be performed in parallel, development by nature requires the research activity as its base. For this reason it becomes difficult to understand the exact nature of the project. Requirements are often ambiguous and not determined at the start, causing variance in the project life cycle.
• Deliverables and customer expectations: The research time span in such an organisation is difficult to establish, so projects often suffer from late delivery of the deliverables and from understating the customer's expectations.
In particular, process scaling and flexibility in the tailoring approach are necessary, and the organisation's processes should be structured to facilitate this. Another associated challenge is that the corporate infrastructure may make it difficult for these types of projects to address certain CMMI requirements by common means. Furthermore, the ownership of the research content in a project may be ambiguous, affecting the organisational process linkage.


In addition to these, other issues exist:
• Different teams work on different modules of the project: they perform the research, do the development, test the developed module and finally integrate the modules together. Since the projects are research oriented, the teams and the persons working on one module are the only ones familiar with the domain and the only people able to define and carry out the testing cycle. Although they perform testing and quality assurance activities, these activities are performed by the same teams that develop the module, so no explicit testing team exists, due to the special nature of the projects.
• In any organisation a Software Engineering Process Group (SEPG) provides the basis for process improvement activities. SEPG members not only assess the organisation's capability but also develop plans to incorporate improvements, coordinate those plans, help the organisation gain momentum and measure the effectiveness of their effort [11]. There was no SEPG for the management of research-based processes in the chosen organisation, so the foremost requirement was to establish an SEPG. One option for forming the SEPG is to assign dedicated resources to carry out its activities, but as the organisation is small it is difficult to have dedicated resources. The feasible solution, taken with the consent of management, was to form the SEPG by selecting resources from each team and assigning them the additional responsibility of process management.

4.4. Gap analysis
The target of the formed SEPG was to assess the organisation's current capability and maturity and to suggest improvements to the processes. Keeping all the issues in view, the SEPG decided to undertake the assessment activity, and CMMI was selected for the software process improvement program in the chosen research and development based organisation. The organisation had never been assessed for the maturity of its processes, and this task was undertaken by the SEPG for the first time. The projects done by CRULP were successful in that they provided the needed functionality, but problems of schedule slippage and budget overrun prevailed. The defined processes of CRULP provided evidence of the organisation being at level 1, so it was decided to assess the current processes against CMMI level 2 (Managed level) [4]. The SEPG found deficiencies in the processes during the assessment and, to obtain a clear picture of them, performed a gap analysis. This activity ascertained the lacking areas and determined the processes essential for raising the organisation's maturity. No tool was available from the Software Engineering Institute or any other source to perform a gap analysis for a small research-based development organisation [2, 3]. As a consequence an assessment questionnaire was developed and validated for the assessment of processes against CMMI Level 2 (Managed level), keeping in view the research-based nature of the projects and the practices given by CMMI. The questionnaire contains around 250 questions, all of which target the key process areas with their practices, sub-practices and goals. It highlights the practices that must be incorporated in any research and development organisation for the attainment of CMMI level 2 [2, 3]. The answers to the questions were given in three categories: "Yes" for a practice that is completely followed, "No" for a practice that is not followed at all, and "Sometimes" for a practice that exists in the organisation but is left out depending on current needs and time [15].


4.4.1. Gap analysis results
The SEPG collected the data through the questionnaire and found that the practices of the Measurement and Analysis and Supplier Agreement Management key process areas are not followed in the organisation. The results of the gap analysis are given in Table 2.

Table 2: Summary gap analysis results
  Key Process Area                          Activities   Yes   No   Sometimes
  Requirement Management                        40        22   18       0
  Project Planning                              61        51   10       0
  Project Monitoring and Control                42        19   18       5
  Supplier Agreement Management                 23         0   23       0
  Measurement and Analysis                      22         0   22       0
  Process and Product Quality Assurance         16        11    3       2
  Configuration Management                      21         9   12       0

Analysing the data in Table 2 makes it visible that the assessed organisation is currently not fully compliant with CMMI level 2 and that many practices belonging to several key process areas are not followed. These particular practices need to be addressed. The most lacking areas, as evident from the results, are Measurement and Analysis and Supplier Agreement Management, as shown in Figure 2. A sketch reproducing the percentages in Figure 2 from these counts follows the figure.

Figure 2: Comparison of key process areas for percentage compliance (RM: Requirement Management, PP: Project Planning, PMC: Project Monitoring and Control, SAM: Supplier Agreement Management, MA: Measurement and Analysis, PPQA: Process and Product Quality Assurance, CM: Configuration Management)
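For illustration only, the percentage-compliance values plotted in Figure 2 can be recomputed from the Table 2 counts; the short Python sketch below does this (the variable names are ours, not part of the paper).

# Recompute the yes/no/sometimes percentages per KPA from the Table 2 counts.
gap = {
    "RM":   (22, 18, 0),
    "PP":   (51, 10, 0),
    "PMC":  (19, 18, 5),
    "SAM":  (0, 23, 0),
    "MA":   (0, 22, 0),
    "PPQA": (11, 3, 2),
    "CM":   (9, 12, 0),
}

for kpa, (yes, no, sometimes) in gap.items():
    total = yes + no + sometimes
    print(f"{kpa:5s} yes={100*yes/total:5.1f}%  "
          f"no={100*no/total:5.1f}%  sometimes={100*sometimes/total:5.1f}%")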

5. Targeted improvement area
As the saying goes, "you cannot predict nor control what you cannot measure" [19]. After the gap analysis it was realised that incorporating a measurement program is necessary for the maturity of any organisation's processes. It was also noticed that the target organisation collects few metrics, does not collect those it has according to CMMI practices, and consequently does not use the collected metrics for decision making. This marks the area as the most promising one for the attainment of CMMI [2, 3] level 2. It was a non-compliant area, as shown in Figure 3: no specific process catered for measurement and analysis practices. The target was therefore to align the data collection mechanisms with the practices of the CMMI [2, 3] level 2 Measurement and Analysis KPA.

Figure 3: Non-compliance of measurement and analysis practices (before the new processes; key goals: Align Measurement and Analysis Activities, Provide Measurement Results, Institutionalise a Managed Process, Institutionalise a Defined Process)

5.1. Measurement and analysis KPA
Measurement enables the detection of trends and helps in anticipating problems; it thus provides better control of costs, reduces risks, improves quality and ensures that objectives are achieved. Effective measurement processes help software groups succeed by enabling them to understand their capabilities so that they can develop achievable plans for producing and delivering products or services. "Metrics that are well chosen, accurately collected and wisely applied intensify the motivation that improves the process within which people solve problems" [14]. Process measures, based on organisational goals and challenges, are aimed at providing support for decision making in an organisation [20]. One factor in the studied organisation was that management had a strong background in measurement and therefore wanted to incorporate a measurement program, whereas the teams working on the projects had limited grounding and were not aware of the effectiveness of a measurement program.

5.1.1. Purpose
The purpose of Measurement and Analysis is to develop and sustain a measurement capability that is used to support management information needs. The integration of measurement and analysis activities into the other project processes facilitates the following:
• Objective planning and estimating.
• Tracking actual performance against established plans and objectives.
• Identifying and resolving process-related issues.
• Providing a basis for incorporating measurement into additional processes in the future.


5.1.2. Processes
The Measurement and Analysis KPA of CMMI Level 2 has two specific goals:
• Align Measurement and Analysis Activities.
• Provide Measurement Results.
The processes written are based on the practices of these goals and are tailored to the needs of CRULP while keeping in view the research-oriented nature of the organisation [1, 2].

5.1.3. Metrics repository
The metrics repository is an essential part of the measurement program in any organisation. It must contain the measures, both base and derived, along with the formulas for calculating derived measures from base measures. The repository must also provide traceability from information needs to measures and to the data collected against those measures. A minimal sketch of such a repository is given below.
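The paper does not give the repository's actual schema, so the following is only a minimal sketch of one possible structure, with invented names; it also reflects the constraint mentioned in Section 6 that base measures must exist before they are used in a derived formula.

# Hypothetical sketch of a metrics repository: base and derived measures,
# derivation formulas, and traceability to information needs.
class MetricsRepository:
    def __init__(self):
        self.base = {}      # measure name -> list of collected values
        self.derived = {}   # measure name -> (formula, list of base measure names)
        self.needs = {}     # information need -> list of measure names

    def add_base(self, name):
        self.base.setdefault(name, [])

    def add_derived(self, name, formula, inputs):
        # A derived measure may only be defined on base measures that
        # are already in the repository.
        missing = [m for m in inputs if m not in self.base]
        if missing:
            raise ValueError(f"unknown base measures: {missing}")
        self.derived[name] = (formula, inputs)

    def record(self, name, value):
        self.base[name].append(value)

    def link_need(self, need, measures):
        self.needs[need] = measures   # traceability: information need -> measures

    def value(self, name):
        formula, inputs = self.derived[name]
        return formula(*(self.base[m][-1] for m in inputs))

# Example with invented measures: testing-cycle effort per fixed bug.
repo = MetricsRepository()
repo.add_base("testing_effort_hours")
repo.add_base("bugs_fixed")
repo.add_derived("effort_per_bug", lambda e, b: e / b,
                 ["testing_effort_hours", "bugs_fixed"])
repo.link_need("report on testing team effort", ["effort_per_bug"])
repo.record("testing_effort_hours", 120)
repo.record("bugs_fixed", 30)
print(repo.value("effort_per_bug"))   # -> 4.0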

6. Pilot implementing the processes
Pilot implementation of the newly developed processes was planned for one team of a single project at CRULP: the Machine Translation team. This team works on the machine translation of words and sentences from one language to another. There is a subdivision of the machine translation team based on the logical phases of machine translation, and another team of linguists supports each team in its development activities. All these teams work in the same way: they carry out the initial research and then do the development, and most of the time they are partly dependent on other teams' work. A specific example is the bugs reported by each group, which need to be evaluated within the teams so that the actual nature of each bug is identified. Doing this effectively requires a process, and measuring the efficiency of the process and of the work being done requires measuring the process. The processes for the Measurement and Analysis KPA start from an information need serving some organisational objective; this information need is mapped onto a set of measures for which data is collected, stored and then analysed to report the required information. As the only constant in the world is change, resistance to change is a known problem in improvement. We believe that resistance to any process change in any organisation is inevitable, but it can be minimised when, on one side, senior management's continuous support exists and, on the other, the SEPG is formed from the people who have to use the processes. In other words, the ownership of the processes is in the hands of the people using them and they are not imposed from outside. This strategy helped to establish, organisation wide, the sense that the benefits of the processes are in favour of all the people working there, rather than a single group imposing the processes as if it were the only one needing them. Thus the risk of resistance was minimised. The details of the pilot implementation of the processes follow.


6.1. Information need
The management of CRULP realised that, although data exist on a large scale, the organisation was not making any use of them. Management therefore raised the need for reports on the testing team effort and on the testing cycle time.

6.2. Area of piloting
The selected area for the pilot implementation of the measurement and analysis processes is the testing cycle: from the formulation of test cases to their execution, and from the identification of bugs to the bug-fixing process.

6.3. Pilot implementation plan
The pilot implementation of the processes was planned and then executed by following these steps:
• Processes, and the templates to be used with them, were developed and made available to all the concerned team members before starting the piloting activities.
• Initial meetings were held with the members of the SEPG, and the processes and templates were revised according to their suggestions and comments.
• As management had raised the initial information need, the templates were initially filled in by senior management.
• The team leads and the SEPG leads of the selected team were trained to use the processes.
• The tool used by the Machine Translation (MT) team members to report a bug and to report its fix was automated with the help of scripting; this tool is shared organisation wide through a shared repository.

The metrics repository was built for the overall measurement program to support the specification of measures and the subsequent analysis of the data collected against them. The repository ensures that all base measures are added before they are used in any formula for derived measures. It also ensures that the organisation takes full benefit of the measurement program by providing a collection of measures readily available for reference and for use with any new information need that comes up. The structure of the measurement repository is given in Figure 4.

Figure 4: Structure of measurement repository


For a shared and customisable bug repository, a web-based tool, 'Elips', was suggested. By tailoring and configuring the tool according to the needs of CRULP, it has been made part of the bug reporting and fixing process.

7. Comparison
After the pilot implementation of the processes, we conducted a reassessment of the Measurement and Analysis KPA of the CMMI Managed level. This reassessment was necessary to determine the capability and maturity of the processes that had been pilot implemented. The reassessment results are given in Table 3.

Table 3: Reassessment results
  Measurement and Analysis activities           Total   Yes   Sometimes   No
  Align Measurement and Analysis Activities       5      5        0        0
  Provide Measurement Results                     5      5        0        0
  Institutionalise a Managed Process             10      8        0        2
  Institutionalise a Defined Process              2      2        0        0

The reassessment results showed that, after the incorporation of the CMMI practices, the Measurement and Analysis key process area is compliant with CMMI level 2, as shown in Figure 5. The organisation is now not only using the CMMI practices for measurement and analysis but is also capable of using the generated data for decision making. The results were quite effective and led the organisation to an effective measurement program tailored to its needs and to its research-oriented nature, while integrating the CMMI practices.

Figure 5: Compliance of measurement and analysis practices with CMMI after the new processes


8. Conclusion
The research was carried out with the aim of improving the processes of a research-based development organisation while keeping in view the nature of its projects and the guidelines from CMMI. The research shows that CMMI is equally applicable in any organisation and even provides the flexibility needed for the assessment of research-based organisations. The activity gave an in-depth understanding of software process development, process capability assessment, pilot implementation and process improvement. The processes made for the target organisation can easily be used in other small research-based organisations with the needed modifications. The metrics repository made for CRULP can also be used by other organisations, as long as the basic scheme of the repository management is not changed.

9. Acknowledgement
The authors would like to thank the management of CRULP for their continued support throughout the software process assessment and improvement activities.

10. References
[1] CMMI-SE/SW v1.1, Staged Model, SEI Technical Report CMU/SEI-2002-TR-004, December 2001, available at www.sei.cmu.edu/cmmi/models/v1.1ippd-staged.doc
[2] CMMI-SE/SW, Staged Representation, Capability Maturity Model Integration (CMMI), Version 1.1, CMMI for Systems Engineering and Software Engineering, available at www.sei.cmu.edu/cmmi/models/v1.1se-sw-staged.doc
[3] Carnegie Mellon University, Software Engineering Institute, "Getting Started with CMMI Adoption", available at http://www.sei.cmu.edu/cmmi/adoption/cmmi-start.html
[4] Dennis M. Ahern, Jim Armstrong, Aaron Clouse, Jack R. Ferguson, Will Hayes, Kenneth E. Nidiffer, "CMMI SCAMPI Distilled: Appraisals for Process Improvement".
[5] SuZ Garcia, "Writing Effective Process Guidance", available at http://www.sei.cmu.edu/ttp/publications/toolkit/WritingEffectiveProcessGuidanceTutorial.html
[6] Timo Varkoi, Timo Mäkinen and Hannu Jaakkola, "Process Improvement Priorities in Small Software Companies", Tampere University of Technology, Information Technology, Pori, P.O. Box 30, FIN-28601 Pori, Finland.
[7] Aileen P. Cater-Steel, "Process Improvement in Four Small Software Companies", Proceedings of the 13th Australian Software Engineering Conference (ASWEC'01), 2001.
[8] Aileen P. Cater-Steel, "Low-rigour, Rapid Software Process Assessments for Small Software Development Firms", Proceedings of the 2004 Australian Software Engineering Conference (ASWEC'04), 2004.
[9] Judith G. Brodman and Donna L. Johnson, "What Small Businesses and Small Organizations Say About the CMM: Experience Report", IEEE, 1994.
[10] Pat Allen, Muthu Ramachandran and Hisham Abushama, "PRISMS: an Approach to Software Process Improvement for Small to Medium Enterprises", Proceedings of the Third International Conference on Quality Software (QSIC'03), 2003.
[11] Fowler, Priscilla; Rifkin, Stanley, "Software Engineering Process Group Guide", CMU/SEI-90-TR-024, Carnegie Mellon University, 1990.
[12] ISO/IEC 14598-5:1998, Information Technology - Software Product Evaluation - Part 5: Process for Evaluators, International Organization for Standardization, 1998.
[13] ISO/IEC 9126-1:2001, Software engineering - Product quality - Part 1: Quality model, International Organization for Standardization, 2001.
[14] Stephen H. Kan, "Metrics and Models in Software Quality Engineering", 2002.
[15] David Zubrow, William Hayes, Jane Siegel, Dennis Goldenson, "Maturity Questionnaire", CMU Special Report CMU/SEI-94-SR-7, 1994.
[16] ISO 9000:2000, Quality management systems - Fundamentals and vocabulary.
[17] ISO 9004:2000, Quality management systems - Guidelines for performance improvements.
[18] Scalet et al., "ISO/IEC 9126 and 14598 integration aspects: A Brazilian viewpoint", The Second World Congress on Software Quality, Yokohama, Japan, 2000.
[19] N.E. Fenton and S.L. Pfleeger, "Software Metrics: A Rigorous and Practical Approach", 1997.
[20] W.A. Florac, R.E. Park and A.D. Carleton, "Practical Software Measurement: Measuring for Process Management and Improvement", 1997.


Benefits from the Software Metric System after 3 years of practice
Guido Moretto

Abstract
This paper illustrates the experience gained by InfoCamere, the ICT consortium company at the service of the Italian Chambers of Commerce, in the implementation of a software measurement system. After describing the background against which InfoCamere operates and its key features, the paper gives an account of the motivation that led to the definition of a software measurement system (SMS) whose purpose is to improve the estimation process for project effort and, more generally, to apply quantitative metrics supporting software development and maintenance processes. The paper also illustrates the approach of a metric system conceived as a seamless activity supporting projects, and shows the organisational design along with the tools used to measure the software and estimate project effort. The efficacy of the effort estimation process is analysed based on the outcomes produced over a period of a few years. Finally, the paper winds up with a number of lessons learned that can help companies that need a permanent system for measuring software development and maintenance activities.

1. Introduction
Companies often do not have clear ideas about the way software measurement should be performed. You may hear people talking about "metric programs", which are thought to be required to activate software measurement, but the term itself generates the same misunderstanding as any initiative taking place in a limited period of time: a "program" is a coordinated set of projects, and projects by nature have a limited lifetime. A short-term measurement activity, while providing key quantitative information on the software management process, lacks the element of continuity necessary to trigger the valuable spiral of desired improvement that all management models should be aiming at. Many software measurement papers in the literature portray attractive metrics-driven improvement perspectives, both in terms of management and of customer relationship, reiterating that it is possible to set up indexes based on software size measurement to evaluate system efficacy. The business world is far more complex: any pervasive activity with a strong impact on business management issues always has a high mortality rate, because it needs to withstand the resistance to change of the business micro-units affected by the initiative, and because it takes a lot of time for these initiatives to start producing measurable outcomes; in the meanwhile they may even be cancelled before they deliver any results at all. The choice of a clear-cut strategy and objectives is instrumental in helping the metric system manager to show to top management the efficacy of the measurement method in a short time. This paper describes the experience of InfoCamere and the benefits derived from choosing a long-term software measurement system heavily built upon the project management process. However limited in its scope of software measurement use, our approach proved quite effective against its objectives and provided a better insight into the project process.


2. Context
InfoCamere is the ICT consortium company at the service of the Italian Chambers of Commerce, responsible for the design and management of the nationwide network of the Italian Chambers of Commerce. Its statutory mandate is the management of their key registers (register of companies, register of non-payments, statements of accounts, etc.). The customer base consists of the 103 Italian Chambers of Commerce, professional associations, trade associations, local and central public administrations, and business and economic information agencies. InfoCamere is accredited with CNIPA as a digital signature certification authority and, with over two million certified documents, is recognised as the leading digital certification authority in Italy. It provides certified mail and document administration services to local and central public administrations and to private companies. Some figures about InfoCamere:
• Over 500 operated servers.
• 80,000 GB of data.
• Over 8 million transactions per day.
InfoCamere operates a software fleet consisting of some 600 application products, i.e. clusters of items belonging to a given application and provided with a single lifecycle both in terms of definition and distribution (development-test-production), grouped into some 100 logical applications. The wide range of services on offer entails continuous project management activities, both for developing "green-field" projects and for providing enhancement maintenance to "brown-field" software projects.

3. InfoCamere Metric System
Faced with potential market developments, InfoCamere saw the need to review its production processes with a view to reassessing its competitive advantage within the market. The management agreed on a strategy that would favour, improve and consolidate the knowledge of its business performance, in particular in terms of its software production processes. For this purpose, in 2003 a project was launched to set up a Software Metric System, known as the "InfoCamere Metric System" (ISM). Along with the activation of the ISM, InfoCamere also launched a controlled process for project management purposes, involving business, technical and administrative staff in the definition, development and monitoring of projects designed to set up new products and services. The Project Control Room is headed by the Project Management Office, which is responsible for a wide spectrum of activities ranging from project management empowerment to project coordination and the definition of project monitoring. The definition and monitoring of the project flow allowed the ISM to monitor and evaluate all business activities in terms of software size estimation.


3.1. ISM set up

Figure 1: Organisational set-up of the ISM (C.E.O.; Project Management Office & Metric System; departments - Dep. Camere, Dep. Registro Imp, Dep. Prodotti Applicativi and other departments - each with its Metric Consultant)

The ISM (InfoCamere Metric System) is set up as a soft matrix in which a staff unit (the Metric System Officer) is responsible for functionally coordinating a group of experts (Metric Consultants) within the product lines.

3.1.1. Metric consultant
The Metric Consultant is selected among professionals working as analysts, acting as an interface between the functions explicitly required by the customer and the technical issues. Metric Consultants receive training on the revision and interpretation of user requirements and therefore on software sizing subjects, from functional measurement to effort estimation. Within the company these professionals form a Professional Community capable of delivering technical assistance and services within the Metric System, preferably within their own department but also across the board and outside it. Life-long training, known as the "Action Learning" program, is the key distinctive feature of the Professional Community; it is articulated into regular workshops where specific Function Point issues are dealt with and where "metric" subjects are explored extensively. These workshops are also seen as an opportunity to acknowledge the work of the MC as a building block of the company metric system.

3.1.2. Instruments
A number of instruments and tools have been developed, based on Calc (OpenOffice.org), to facilitate the work of Metric Consultants in the following tasks:
• Software measurement using the Early & Quick Function Point method.
• Software measurement using the IFPUG FP method.
• Project effort estimation (supporting the Project Manager).
• Assessment of congruity in software outsourcing contracts (supporting the Project Manager).


3.1.3. Project effort measurement
The model used for estimating project effort is particularly interesting.

Figure 2

Because estimation is a highly complex process that can hardly be framed within a single discrete mathematical model, it was necessary to define an effort estimation methodology capable of looking at several models jointly. The meta-model was initially based on FP statistical curves and on an international database (ISBSG), but over time other derived curves were incorporated based on company project data (project track records). In addition to FP/effort statistical models, other estimation models based on project peer review, WBS macro estimates and the PM's personal experience were introduced. The data produced by these models are evaluated by the PM, who assigns a weight to each result in order to compute a weighted average estimate (Figure 2). As project features can significantly affect the effort rate, the statistical models are supported by "a corrective factor related to project productivity"; these corrective factors are selected company-wide as those showing the strongest impact on the effort rate. A minimal numerical sketch of this weighted-average step is given below.
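As an illustration of the weighted-average step described above, the Python sketch below combines several model outputs into one estimate. The model names, weights, effort values and corrective factor are all invented for the example; the paper does not publish its actual curves or weighting scheme.

# Hypothetical sketch of the meta-model's final step: the PM weights the
# estimates coming from several models and applies a productivity
# corrective factor. All numbers below are invented for illustration.
def weighted_effort_estimate(estimates, weights, corrective_factor=1.0):
    """estimates/weights: dicts of model name -> effort (person-days) / weight."""
    total_weight = sum(weights.values())
    base = sum(estimates[m] * weights[m] for m in estimates) / total_weight
    return base * corrective_factor

estimates = {          # person-days produced by each model
    "ISBSG FP curve": 420,
    "internal FP curve": 380,
    "peer review": 450,
    "WBS macro estimate": 400,
    "PM experience": 390,
}
weights = {"ISBSG FP curve": 1, "internal FP curve": 3,
           "peer review": 2, "WBS macro estimate": 2, "PM experience": 2}

print(round(weighted_effort_estimate(estimates, weights, corrective_factor=1.1)))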


Figure 3

The use of a single company form for effort estimation purposes has increased the transparency of the estimation process. Thanks to the meta-model, the Project Manager is encouraged to formalise the estimation process used and, possibly, to review it based on track records.

3.1.4. Software measurement

Figure 4


For all projects under way it was agreed that software measurement should be performed at the early stage (Business Plan or Project Specifications) and at release time. Under specific circumstances, or upon request of the PM, further size estimates or measurements could take place during the project lifecycle. The size estimate performed at the early stage of the project aims at supporting the PM in estimating the required effort. The measurement activities performed at release time serve the purpose of expanding the database with aggregate records that can be exploited for further analysis.

4. Some figures
4.1. Project database
The definition of a project database was one of the very first activities of the Metric System. The project database includes the project size measurements (FP) performed throughout the different project stages and key project features: technology, skills, outsourcer. Through the Project Management Office it is possible to gather all project data systematically: effort data can easily be made available, as well as data on costs, time of delivery, progress reports, turnover and customer relationship, both as forecasts and as actuals. Thanks to the availability of such data it was possible to carry out productivity analyses in order to upgrade the estimation models on a yearly basis. At present, functional size measurements performed at release time are available for some thirty projects; the size of these projects ranges from 50 to 1,500 FP, and 75% of them are between 100 and 500 FP.

4.2. Analysis of measurement efficacy
The efficacy of a measurement system can be evaluated only ex post, that is, after its repeated application. It was therefore necessary to wait for the completion of a sufficient number of projects using the estimation model before performing such an analysis. For the projects that benefited from the support of the Metric System at the early stage, the estimation error was worked out; the results for the projects activated over the years are as follows:

Table 1: Projects estimated with the model - Mean Magnitude of Relative Error (MMRE) by project starting year
  Starting years   N. of projects   MMRE
  2002-03                15          36%
  2004                   13          16%
  2005                    7           9%

where MRE (Magnitude of Relative Error) = abs(actual effort - estimated effort) / estimated effort, and MMRE is its mean over the projects started in a given year. The estimation error started at 36%, when the Metric System was at an embryonic stage, and decreased year by year as the estimation model improved and the process became more refined. A minimal sketch of this computation is given below.
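The sketch below implements MRE/MMRE exactly as defined above (relative to the estimated effort, as in the paper); the project data are invented placeholders, not the paper's actual figures.

# MRE/MMRE as defined in the paper (relative to the *estimated* effort).
def mre(actual, estimated):
    return abs(actual - estimated) / estimated

def mmre(projects):
    # projects: list of (actual effort, estimated effort) pairs
    return sum(mre(a, e) for a, e in projects) / len(projects)

# Invented example data, grouped by starting year:
by_year = {
    "2004": [(410, 380), (220, 250), (150, 140)],
    "2005": [(300, 310), (505, 480)],
}
for year, projects in by_year.items():
    print(year, f"MMRE = {mmre(projects):.0%}")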


The same analysis was carried out for the projects that did not use the effort estimation model. The trend of the estimation error is as follows:

Table 2: Projects estimated without the model - Mean Magnitude of Relative Error (MMRE) by project starting year
  Starting year    N. of projects   MMRE
  2002-03                14          31%
  2004                    6          35%
  2005                    0           -

Projects that did not use the estimation model show an MMRE significantly higher than the values obtained with the model, and essentially unchanged over the years. To understand the situation better, all the key sub-sets for which data were available were studied further, and eight significant clusters were identified:

Table 3: Project clusters
  Cluster                      1    2    3    4    5    6    7    8
  Project Type                 C    C    C    C    I    I    I    I
  Lump Outsourcing Contract    no   no   yes  yes  no   no   yes  yes
  Dimensioned using FP         no   yes  no   yes  no   yes  no   yes

In the Project Type row, C = Contract and I = Investment. Two macro groups were therefore identified:

Figure 5

The first group contains the projects whose size was estimated with the FP method, whilst the second group contains the projects whose size was estimated with the FP method already at the early stage, irrespective of the business model used to estimate the effort.


Interestingly, the estimation error proved to be lower in all the projects for which it was possible to measure the software size at the early stage.

5. Lessons learned
Although the analysis is based on a limited number of cases, what emerges quite clearly is the link between FP size measurement at project roll-out and the efficacy of the effort estimation. This link becomes ever more self-evident as the projects for which it was possible to measure the function points at the early stage proved to be associated with a more effective effort estimation process. FP size measurement was applied to all the new projects launched from 2004 onwards. Size measurement was not applied to the following kinds of projects:
• Projects to which Function Point metrics could not be applied (technological adjustments, middleware, operating systems, etc.).
• Projects for which the PM failed to provide the information and documentation required to carry out the software size measurement.

5.1. Comparing estimation methods on efficacy
The business estimation model was applied to most of the projects undergoing functional size measurement (Figure 4 - Size measurement using FP). It can be assumed that, by defining an estimation meta-model consisting of several models, the PM is able to critically assess an estimate that would otherwise be based only on hands-on experience.

5.2. Clarity of project requirements
In order to carry out size measurement of the software using FP metrics, the functional requirements must be defined with sufficient clarity. At the early stage of the project, the size measurement process calls for the formalisation of the functional requirements to be released. In addition, the PM is required to review all the analysis steps together with the Metric Consultant, describing and explaining their content in the most exhaustive manner. The PM can therefore identify weaknesses or inefficiencies, agree on how to correct them and take account of them in the estimation process. An in-depth knowledge of "what to produce" leads to a thorough upgrading of the capacity to estimate effort.

5.3. Measuring is not enough!
Project management is built upon the accuracy of data and the reliability of measurement in terms of cost, time and amount of software to be released. Information proves useful only if it is consistent over time. The process supporting project management should ensure that all project dimensions are adequately taken into account and monitored over time. In addition, all the people working on projects should conform to a single operational procedure in a standard manner. For these requirements to be fully met, the company must be provided with procedures that are fully shared among the people working on projects, from sales to marketing, planning and control, and R&D.


The Metric System is not a "measurement island": it is an integral part of the whole project management process that the company has implemented over the last three years. Measuring is not enough; the lesson to be learned from our analysis is that the key to success is a systemic approach to quantitative issues in internal business processes.



Suggestions for Improving Measurement Plans: A BMP application in Spain
Juan José Cuadrado-Gallego, Luigi Buglione, Daniel Rodríguez

Abstract
In industry, time and cost are most often the two main (and often the only) dimensions of analysis against which a project is monitored and controlled, excluding other possible dimensions such as quality, risks, impact on society and stakeholders' viewpoints in a broader sense. Another issue of interest is the proper number of measures and indicators to implement in an organisation in order to optimise the two sides of the cost of quality (COQ - cost of quality - and CONQ - cost of non-quality). How can multiple concurrent control mechanisms across several dimensions of analysis be balanced? The Balancing Multiple Perspectives (BMP) approach has been designed to help project managers choose a set of project indicators from several concurrent viewpoints. After gathering experiences from Canada, Germany and Turkey, this paper presents the results of a new BMP application in Spain, using a list of 14 candidate measures and interviewing a double set of respondents from industry. Lessons learned for improving measurement plans are presented.

1. Introduction
A Software Engineering topic of discussion during the last 15 years has been the identification of the main causes of project failure; a few of these studies directly list the amount of Tracking and Control (T&C) resources, the lack of historical data and the limited ability of internal staff to estimate effort and cost as major items [1][2]. A well-known and often cited study is the Chaos Report by the Standish Group: in 1994, 52.7% of projects cost over 189% of their original estimates and only 16.2% of software projects were completed on time and on budget [3].

Figure 1: Percentage of IT Project Status (1994-2004) [3]

Ten years later - according to the newer Chaos Reports - the 2004 situation seems slightly improved, but the percentage of challenged projects remains fundamentally stable (around 50% of the surveyed projects), while the average cost and time overruns have decreased - although still remaining too high - by roughly one third and one half respectively.


Figure 2: Average % of IT Project Costs & Time Overruns (1994-2004) [3]

It is not only necessary to analyse high-level indicators such as the ones presented in Figures 1 and 2, but also to carry out a deeper root-cause analysis to explain these trends. Furthermore, greater attention must (or at least should) be paid to the ways a project could be made more profitable and less defect-prone, but often not as much to the project budget allocated to T&C activities. This issue is intimately linked not only with the Project Manager's role and skills (a role increasingly emphasised by the recent growing demand for Project Management certifications, such as the Project Management Institute's PMP (www.pmi.org/prod/groups/public/documents/info/pdc_pmp.asp), Prince2 (www.ogc.gov.uk/methods_prince_2.asp), AAPM (www.projectmanagementcertification.org) and IPMA (www.ipma.ch)), but also with all the roles involved in project effort and cost estimates. Relevant questions are therefore what percentage of the project budget is dedicated to those activities and how much it costs to track and control a software project. From an economic viewpoint, T&C costs can be seen as part of the Cost of Quality (COQ) - including prevention and appraisal costs - as the counterpart to the Cost of Non Quality (CONQ) - including internal and external failure costs (for a detailed list of the cost items to consider when computing COQ and CONQ, see [6][7]). Figure 3 represents the classical view of COQ and CONQ: the break-even point (BOP) is reached at time t, optimising the overall cost of quality (COQ+CONQ); the Total Quality Management (TQM) literature contains a huge number of references on the effects of balancing COQ and CONQ (for a large review of COQ-related issues see, for instance, [8]). Thus, each organisation will determine and optimise its own BOP in terms of time and money, balancing the available resources and taking into account the best number and type of measures to be managed for the T&C process, choosing among different perspectives. This point, however, is not easy to calculate. A minimal numerical illustration is sketched below.
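The cost curves below are invented for illustration (the paper gives no formulas behind Figure 3); the sketch simply shows how a break-even point and the minimum of COQ+CONQ could be located once the two curves are known.

# Illustrative only: assumed cost curves, not taken from the paper.
# COQ (prevention + appraisal) grows with the control/measurement effort t,
# while CONQ (internal + external failures) decreases with it.
def coq(t):
    return 2.0 * t                 # assumed: control costs grow linearly

def conq(t):
    return 100.0 / (1.0 + t)       # assumed: failure costs decay with control

ts = [i / 10 for i in range(1, 200)]
break_even = min(ts, key=lambda t: abs(coq(t) - conq(t)))
optimum = min(ts, key=lambda t: coq(t) + conq(t))
print(f"break-even point at t ~ {break_even:.1f} (COQ ~ CONQ ~ {coq(break_even):.1f})")
print(f"minimum total cost at t ~ {optimum:.1f} "
      f"(COQ+CONQ ~ {coq(optimum) + conq(optimum):.1f})")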

Figure 3: Relationships between COQ and CONQ over time

Also, well-known software process improvement models such as CMMI [4] and ISO 15504 [5] (see also www.isospice.com) require in their specific (basic) practices that the amount of effort and cost required by the T&C processes be taken into account, so that project managers can properly balance their available budget across the different project phases and processes. For instance, CMMI in its staged representation requires a Project Historical Database (PHD) at Maturity Level (ML) 3 in the OPD (Organisational Process Definition) process area, whose purpose is "to establish and maintain a usable set of organisational process assets [...]; the organisation's process asset library (PAL) is a collection of items [...] including [...] data". Furthermore, one level below, among the ML2 processes, PP (Project Planning) Specific Practice (SP) 1.4, sub-practice 3, states that it is necessary to "estimate effort and cost using models and/or historical data". There are a couple of issues to highlight:
• In the recurrent case in which an ML2 organisation does not have, and has not planned to create, its own PHD, a common practice is to run an extended "external" benchmarking process, even though MA (Measurement & Analysis) - another ML2 process - requires data collection as a fundamental enabler of the decision-making process.
• "And/or" means that ML2 organisations (but also organisations at higher MLs) usually adopt, in a non-critical manner, estimation models such as COCOMO [9] or SLIM [10]. Even if their usage were subject to a "calibration process", as described for instance in [9] (see Chapter 4), the current literature does not provide clear data about how much it costs to calibrate these models (see, for example, a recent thread on the IFPUG bulletin board: www.ifpug.org/discus/messages/1778/8469.html) and therefore whether they allow project managers to initially save time and cost before implementing their own database and estimation models.
In both cases, one direct consequence is a reduced probability of improving our estimation ability and therefore the overall profitability of the organisation. Thus, the problem is not solved but only shifted because, whatever (corrective) action is taken, we do not control and understand the proper level of costs to allocate in our budget for planning, monitoring and controlling projects in an organisation [11]. DeMarco [12] stated in 1995 that "metrics cost a ton of money. It costs a lot to collect them badly and a lot more to collect them well [...] At its best [...] metrics can inform and guide developers and help organisations to improve. At its worst, it can do actual harm. And there is an entire range between the two extremes". In one of the few studies carried out in the '90s proposing actual figures, Jones [13] reported the cost of measurement in projects to be approximately between 3% and 6% for internal project measurement and between 2.5% and 4.5% for external measurement. Again, two out of the ten problems leading to failure in the implementation of software measurement programs are reported by Rubin to be the intensive use of a single measure or, conversely, the use of too many [14]. What, then, is the issue surrounding measurement costs? Is it to reduce or cancel a portion of a measurement program in order to meet budgeted targets from an economic/financial viewpoint, or - more appropriately - to balance how the T&C process budget should be spent across several dimensions of analysis (e.g. quality, risk, ethics, user satisfaction, and so on)? Management tools such as the Balanced Scorecard (BSC) are based on multiple concurrent perspectives. In this paper, a procedure called Balancing Multiple Perspectives (BMP) is proposed to tackle this measurement issue; it could also be used as a tool to reinforce the choice of measures and indicators to support the design of strategic maps [15].
It includes a questionnaire with a list of 14 candidate measures organised into 4 sections (respondent profiles and viewpoints; measures; causal relationships; cost of the T&C process), with the objectives of representing the "as is" situation and determining what the "to be" situation should be, including cost figures that could be considered in future project budgets [16]. Since the maturity level in using and applying measurements can vary considerably among countries (e.g. educational programs in Software Engineering, ICT market demands, cultural resistance to measurement, etc.), it was decided to extend the experimentation to other countries in order to observe other possible attitudes and perceptions. BMP was proposed during 2005 and the first results were gathered and presented in the first semester of 2006 (H1/2006) from two sets of Canadian and German respondents, respectively MSc, BSc and PhD students and ICT professionals [17]. A second BMP experience was conducted in Turkey during 2006 with another two sets of respondents, Turkish ICT professionals and MSc or PhD graduates [18]. In this paper we present a Spanish BMP application on a sample from the Information and Communication Technology (ICT) industry. The aim of this study is twofold: on the one hand, to stimulate the discussion in the technical community about the proper level of costs for properly supporting the measurement process in order to achieve the established goals and, on the other hand, to identify possible elements for corrective/improvement actions, mainly working on the cause-effect links in the company's process strategic map. The paper is organised as follows: Section 2 presents the BMP, its objectives and the related procedure. Section 3 presents the results of the survey and the assumptions under which it was conducted, while in Section 4 the results are analysed and discussed. Finally, Section 5 reports our conclusions and some suggestions for future work.

2. BMP: Balancing Multiple Perspectives
The average percentage of a project budget dedicated to the T&C process is generally underestimated. A first piece of indirect evidence can be found in Gantt charts or Project Plans, where T&C is often planned as an activity and not as a process. Using the Plan-Do-Check-Act (PDCA) schema, the costs of the "Plan" phase are usually not considered, leaving out a series of micro-tasks concerning coordination activities during the project lifetime and the subsequent controls in the "Act" phase before arriving at the "Check" one. From a SPICE perspective, the least accomplished clause is Clause 8, the one about Measurement and Improvement (see [6] about the Italian situation). The ultimate corporate objective is (obviously) profitability - as also stated in the BSC approach. When quarterly fiscal results are strained, the counteraction is to reduce costs on projects and on cost-based activities, including whatever pertains to the "control" (and therefore measurement) sphere.

2.1. Objectives A key concept in the BMP approach [15] is that increasing performance does not need to be limited to reducing cost, but it can also be achieved by optimising through balancing the actual forces and energies at play within a project. While time and cost are the main analysis perspectives of interest to managers, other concurrent perspectives could be profitably be taken into account as well. It is obvious that increasing the number of controls increases the budget percentage allocated to T&C activities. Therefore, while maintaining both constraints (broadening the perspectives of analysis under the same project budget percentage for T&C activities), an interesting solution would be to balance the number of measurement controls across more than two perspectives. 7

See [6] about the Italian situation.

68

A basic mechanism behind BMP is to make more explicit trade-offs across several dimensions of analysis. For instance, if the priority is to pay more attention to time-to-market aspects, quality could suffer (in terms of product defect rate). Similarly, if the priority is to produce defect-free software products, a more adequate testing phase might be required, by increasing project costs while reducing the prospective project mark-up on the one hand, and, on the other, reducing the potential rework following the release through a lower defect rate.

2.2. The procedure The BMP procedure for controlling multiple concurrent dimensions consists of four steps (which could be performed jointly by a project manager and his quality assurance assistant): • Determine the dimensions of interest in the project: at least three dimensions – or four or five, as in EFQM [19], Baldrige [20] and BSC [21]. • Determine the list of the most representative measures associated with each dimension. • For each of the measures selected, identify which other control variables might be impacted negatively (e.g. counterproductive impacts: for instance, higher quality will often mean a greater initial cost or longer project duration; the same applies to cost and risk). • Determine the best combination of indicators and the causal relations between them to build a measurement plan for the project. It is not sufficient to perform steps (1) and (2) for designing a measurement plan within an organiseation, because in such a context, this produces only a list of measures (often project goalbased, and derived and classified by dimension of analysis; e.g. time, cost, quality, risk, ethics, user satisfaction). The added value from this list can be leveraged if relationships among those goals (measured and tracked against their measures over time) are established in the planning phase of this measurement plan, realising what Kaplan and Norton called the strategic map [21]. Hoffman recently asserted that “one problem comes from a lack of relationship between the metrics and what we want to measure […]. And a second problem is the overpowering side effects from the measurement programs” [22].

3. Application of the BMP Survey in Spain To corroborate and extend the lessons learned on the applicability of the BMP approach, a new trial was conducted in Spain, again taking into account two sets of questionnaires collected from Industry professionals from two companies. The description of the questionnaire used, the results and the related analyses are presented in this section. In addition, we seek insights from this trial on how to integrate such a procedure into project management activities. The questionnaire used is available on the SEMQ website8. It is composed of four main sections: (1) Respondent profiles and viewpoints; (2) Measures; (3) Causal Relationships and (4) Cost of the T&C process. A detailed list of the measures selected for the BMP questionnaire is presented in Table 1. The purpose was to obtain useful information about the current and desired measurement programs, both from technical and economic viewpoints.

8

www.geocities.com/lbu_measure/qestlime/bmp.htm

69

Table 1: A list of indicators from the BMP questionnaire QUESTION 1a 1 2 1b 3 1c 4 2 1 2 3 4 5 3a 1 2 4a 1 4b 1 4c 1

# DESCRIPTION Respondent profiles by project role (# and %) Experience profiles for current project role (# and %) # of analysis viewpoints (OLD) # of analysis viewpoints (NEW) # of selected measures (OLD) # of selected measures (NEW) # of affected viewpoints (NEW) Average (avg) number of measures by viewpoint (# and %) Ranking of selected measures by: abs value, respondent project role, analysis viewpoint List of causal relationships among measures Ranking of relationships by: abs value, respondent project role, analysis viewpoint % of respondents knowing the cost of M&C (monitoring and control) activities Max, Min, Avg and Med for the returned values (%) – OLD Max, Min, Avg and Med for the returned values (%) – NEW

3.1. Subjects of the Questionnaire The sample of this study consists of 15 Spanish professionals who had been involved in Software Engineering for years, and data was gathered between Q4/06 and Q1/07. In this paper, this sample will be referred as S1. The BMP questionnaire was provided to the respondents by the authors, who briefly outlined for them its main objectives and provided them with instructions for completing the questionnaire.

3.2. Questionnaire Results and Discussion In the following subsections, the results are presented for each of the respondents (R1, R2, etc.) against the measures listed in Table 19. 3.2.1. Question 1 – Respondent profiles and viewpoints Table 2: Respondent profiles by project role and experience for current project role 1a

In the project(s) you worked on, you contributed in the capacity of (stress your current role ): Role

R1

R2

R3

R4

R5

R6

R

R8

1a.1 R9

R10

R11

R12

R13

R14

R15

#

1a.2

%

7 Project Mgr

30

Team Lead.

4

5

1

1

Quality Ass.

3

Developer Tester

3 1

3

1

1

4 1

5

2

Avg (yrs)

0

Other

11

0,5 3 3

4 1

1

3 0,5

3

2

8

17,5

3

12

2,0

2

8

2,0

5

20

1,9

9

36

3,7

4

16

1,9

In terms of respondent demographics, the S1 respondents (n=15) were mostly Testers and Team Leaders (27%), followed by Project Managers, Developers and Quality Assistants (13%), while there was just a Systems Engineer. In terms of years of experience, Project Managers had an average 17.5 years of experience, Testers 3.7 years, Team Leaders and Quality Assistants 2 years while Developers and the Systems Engineering 1.9 years.

9

Note that only integers were used in percentages: therefore it is possible, due to rounding, that the sum of a series of values appears not exactly equal to 100%.

70

Table 3: Number of analysis viewpoints (current or past project) – Sample S1 1b

In the project(s) you R1 Viewpoint Time x Cost Quality x Risk Other(1) Other(2)

worked on, you contributed in the capacity of (stress your current role): R2 R3 R4 R5 R R R8 R9 R10 R11 R12 R13 R14 6 7 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

R15

#

%

Rank

x x x

14 12 10 2 2 1

67 57 48 10 10 5

1 2 3 4 4 6

Avg

2.9

1b.3

Tables 3 presents the viewpoints taken into account in the current or previous projects, and Table 4 the expectations of which viewpoints should be taken into account. The variables time and cost are currently the two most common viewpoints, followed by quality. Time was chosen by all respondents except R11 (quality assistant), who chose other alternative viewpoints. On average, the number of viewpoints chosen was 2.9, and therefore two viewpoints are usually considered for T&C activities for that dataset of respondents. Table 4: Number of analysis viewpoints (next project) – Sample S1 1c

How many viewpoints were usually managed R1 R2 R3 R4 R5 Viewpoint Time Cost Quality x Risk x x Other(1) x Other(2) x

for monitoring & controlling such project(s)? R R R8 R9 R10 R11 R12 R13 6 7 x x x

x x x

x

x x

x

R14

x x x

R15

x

#

%

Rank

1 1 4 9 2 2

5 5 19 43 10 10

5 5 2 1 3 3

Avg

1.1

1c.3

The second part of the question is what should be (or would be) added in terms of controls. Half of the respondents, no matter what their project role, felt that it was more urgent to consider the risk viewpoint in a structured way, followed by quality and then other perspectives (resources, in particular people; safety). With the exception of R11, respondents mentioned cost and quality as the two most important analysis viewpoints from his/her perspective, with no reference to time analysis. Moreover, respondents expressed the need for one further viewpoint, on average, for analyzing future projects. 3.2.2. Questions 2 – Measures The next group of questions concerned measures currently used/selected and those desired for future projects. We decided to propose a sufficiently standardised set of measures, i.e., the list of 67 measures/indicators suggested by the PSM (Practical Software & Systems Measurement) Guide, Version 4.0b [23]. This set of measures covers at least the four viewpoints suggested in the introductory paper on BMP (time, cost, quality and risk).

71

Figure 4: % of selected measures (Old/New) The overall number of currently selected measures was equal to 100% (67 out of 67), while 60 out of 67 measures would be introduced for a better control in future projects (90% of the total PSM proposed set). Next table presents the detailed figures by project role. Table 5: Number of selected measures by project role # Project Role Developer Project Manager Tester Team Leader Quality Assistant Systems Engineer

2 2 4 4 2 1

# OLD 41 30 57 67 25 21

# NEW 19 32 37 10 25 5

Avg # (OLD) 20 15 14 17 12 21

Avg # (NEW) 9 16 9 4 12 5

Comments c.a. 2:1 ratio between old/new measures c.a. 3:2 ratio between old/new measures c.a. 7:1 ratio between old/new measures c.a. 4:1 ratio between old/new measures

Table 5 proposes the distribution of selected measures (old and new) by viewpoint. There is a general tendency to be more focused on “old” measures, with a “conservative” approach not really devoted to apply new controls. In particular, the most defensive roles seem to be the Team Leaders group and the Systems Engineer, followed by Developers. Table 6: Affected viewpoints and average number of measures by viewpoint 2.3

Affected viewpoints and average number of measures by viewpoint T C Q R 269 245 279 136 Gen 44,83 40,83 46,50 22,67 24,07 13,36 26,42% % 27,41% % 212 196 213 107 Old 35,33 32,67 35,50 17,83 25,65 14,01 27,75% % 27,88% % 57 49 66 29 New 9,50 8,17 11,00 4,83 19,29 11,42 22,44% % 25,98% %

O1

O2 52 8,67

37 6,17

A bs Avg

5,11% 26 4,33

3,63% 10 1,67

% Abs Avg

3,40% 26 4,33

1,31% 27 4,50

% Abs Avg

10,24%

10,63%

%

Table 6 proposes the distribution of selected measures (old and new) by viewpoint. As observed from question (1), the most frequently chosen viewpoints overall are quality (27%), time (26%) and cost (24%), followed by risk (13%), and other secondary perspectives (9%). The same trend was observed also analyzing separately “old” and “new” measures. This is also shown graphically in Figure 5.

72

Figure 5: Viewpoints affected (Old/New) Staying with the measures, it is interesting to analyze which were selected more often, in terms of both currently used measures and desired measures. In order to show more significant data gathered, note that in tables from Table 7 up to Table 13 only measures selected at least by multiple respondents or assigned to more than a single perspective. Table 7: Measures selected, ranked and with detail by analysis viewpoint # Id. Category 34 1 25 52 53 2 15 20 4 16

Supportability-Mainten Milestone Performance Functional Size-Stabil Process Effectiveness Process Effectiveness Milestone Performance Personnel Envir.-Support Resour. Work Unit Progress Personnel

Measure Time to Restore Milestone Dates Requirements Defect Containm Rework Milestone Dates Effort Resource Utiliz. ProblReport Stat Staff Experience

Indicator SysFailures and Restoration Dev.Milestone Schedule Requirements Stability Req’s Def. discovered after Req Ph Dev.Effort by Activ.vs Tot.Rew.Eff Milestone Progress Staffing Level Resource Utilization PR Status Staff Experience

T

C

Q

R

6 13 8 7 9 13 5 7 7 7

7 8 3 7 7 7 9 10 4 6

9 5 12 6 7 4 8 6 11 8

6 2 3 4 2 3 4 3 3 4

O(1 O(2 ) ) Old New Tot 1 0 1 3 2 0 1 0 1 1

0 0 1 1 1 0 0 1 0 0

27 28 26 18 14 27 16 18 26 19

2 0 2 10 14 0 11 9 0 7

29 28 28 28 28 27 27 27 26 26

Table 7 presents the measures most selected, grouped by frequency of selection in four chunks. The most chosen measure is about the time to restore, but in the second chuck there are four measures about planning, requirements and defectability. It is interesting to note (see after also comments on Question 3) the selection of Staffing level and Resource Utilization in the third chuck, showing a huge attention on resources for a proper planning. The perspectives more frequently associated to these measures are Quality and Cost (not the opposite), showing a possible interesting signal about the priorities respondents provided in their answers. In the fourth chuck this indication has been stressed again with the Staff Experience, accompanied by the analysis of Problem Report (PR) status, in order to properly track the progress of the project related to the restorability from defects. Also here the most associated perspective is Quality before Cost. A general consideration is about the Risk perspective, resulting the less associated in general: this is an interesting signal about the role that risk has in the overall estimation process, often used more from a qualitative than quantitative viewpoint and it is possible to note it also looking back at Question 1b and 1c. Table 8: Measures selected, ranked and with detail by analysis viewpoint – Project Mgr # Id. Category 25 1 2 3 34

Functional Size-Stabil Milestone Performance Milestone Performance Milestone Performance Supportability -Mainten

Measure Requirements Milestone Dates Milestone Dates Milestone Dates Time to Restore

Indicator Requirements Stability Dev.Milestone Schedule Milestone Progress Maintenance Activities SysFailures and Restoration

73

T

C

Q

R

2 2 2 1 1

0 2 2 2 1

2 1 1 1 2

1 0 0 1 1

O(1 O(2 ) ) Old New Tot 1 0 0 0 0

0 0 0 0 0

6 5 5 2 5

0 0 0 3 0

6 5 5 5 5

Table 8 shows the five more selected measures by Project Managers. The stability of requirements is thought to be the most relevant, with a initial and evident impact on the following three milestone dates measures. Risk is slightly associated to those measures (50% of possible answers); measure #3 (maintenance activities) was the solely one with an association to risk as a new measure. It is possible to note that quite all measures are yet actually used, with a low tendency to adopt new measures. Table 9: Measures selected, ranked and with detail by analysis viewpoint – Developers # Id. Category 20 19 34 53 63

Envir.-Support Resour. Financial Performance Supportability -Mainten Process Effectiveness Customer Feedback

Measure Resource Utiliz. Earned Value Time to Restore Rework Survey Results

Indicator Resource Utilization Cost Profile w/Actual Costs SysFailures and Restoration Dev.Effort by Activ.vs Tot.Rew.Eff Customer Satisfaction Survey

T

C

Q

R

1 1 2 2 1

2 2 2 1 1

2 1 1 2 2

1 1 0 0 1

O(1 O(2 ) ) Old New Tot 0 0 0 0 0

0 0 0 0 0

4 5 5 2 4

2 0 0 3 1

6 5 5 5 5

Table 9 shows the selections by Developers. Resource Utilization is the most rated measure, followed by cost, time and customer issues. A couple of elements must be noted: measure #53 is mostly desired for future project and not always yet applied; measure #63 seems to be yet applied also to the development part of the project and “lived” by those people, not only by project managers as the responsible for the project in front of the external stakeholders or the internal Top Management. Table 10: Measures selected, ranked and with detail by analysis viewpoint – Team Leaders # Id. Category 7 8 16 2 4 22 53

Work Unit Progress Work Unit Progress Personnel Milestone Performance Work Unit Progress Physical Size-Stability Process Effectiveness

Measure ProblReport Stat ProblReport Stat Staff Experience Milestone Dates ProblReport Stat Lines of Code Rework

Indicator

T

C

Q

R

PR Status – Open Priority 1/2 by CI PR Status – Open Priority ½ by Type Staff Experience Milestone Progress PR Status SW Size by Config.Item Dev.Effort by Activ.vs Tot.Rew.Eff

1 2 2 3 2 1 2

1 1 2 3 0 2 3

3 3 3 0 3 3 1

2 1 0 0 1 0 0

O(1 O(2 ) ) Old New Tot 0 0 0 0 0 0 0

0 0 0 0 0 0 0

7 7 4 6 6 6 3

0 0 3 0 0 0 3

7 7 7 6 6 6 6

Table 10 shows two main chunks, with the Team Leaders choices: the first one mostly focused on PR status as well as on Staff Experience; the second one again on PR status, rework, milestone progress and software size (even if using LOC). Also here, rework (measure #53) is half associated both as an old/new measure. Table 11: Measures selected, ranked and with detail by analysis viewpoint – Testers # Id. Category 52 15 4 67 1 16 25 26 29 34 53 66

Process Effectiveness Personnel Work Unit Progress Customer Support Milestone Performance Personnel Functional Size-Stabil Functional Size-Stabil Functional Correctness Supportability -Mainten Process Effectiveness Customer Support

Measure Defect Containm Effort ProblReport Stat Req. for Support Milestone Dates Staff Experience Requirement s Requirements Defects Time to Restore Rework Req. for Support

Indicator Req’s Def. discovered after Req Ph Staffing Level PR Status Mean Response Time by Priority Dev.Milestone Schedule Staff Experience Requirements Stability Req.Stability by Type of Change Severity -1 defects status SysFailures and Restoration Dev.Effort by Activ.vs Tot.Rew.Eff Total Calls per Month by Priority

74

T

C

Q

R

4 2 2 2 3 3 3 2 2 2 3 2

2 3 2 2 2 1 1 2 2 2 2 2

2 4 4 3 3 2 3 2 2 3 2 2

2 1 1 1 1 2 1 1 2 2 0 1

O(1 O(2 ) ) Old New Tot 1 1 1 1 0 1 0 1 1 0 1 1

1 0 0 1 0 0 1 1 0 0 1 1

8 6 10 9 9 6 9 9 9 7 2 8

4 5 0 1 0 3 0 0 0 2 7 1

12 11 10 10 9 9 9 9 9 9 9 9

Table 11 presents the Testers’ choices, with four main chunks. The most selected measure is #52, showing a strong attention to timely evidence defects from the analysis phase in order to produce less and less defects when coding the software solution. But it is interesting the following selection (measure #15), denoting an interest to use the proper staff people for a certain testing activity (quality is the most associated perspective, more than cost), even if it’s half-rated as a new, desired measure, not always yet in place. Anyway, looking at all the other measures selected, it is possible to note that requirements are perceived more and more as the crucial element to work on before arriving to the CUT (Code and Unit Test) phase, in order to save time and money and increase the overall product quality for the Customers. Another common element noted also in this group is that the most desired new measure is #53 about rework, that – from a tester’s viewpoint – would mean to have more elements for properly plan testing activities along the whole software lifecycle and not only at the code level, reducing therefore also the number of PR. Table 12: Measures selected, ranked and with detail by analysis viewpoint – QA # Id. Category 20 1 2 13 14 16 25 29 30 34 35 52 53 64

Envir.-Support Resour. Milestone Performance Milestone Performance Personnel Personnel Personnel Functional Size-Stabil Functional Correctness Functional Correctness Supportability -Mainten Supportability -Mainten Process Effectiveness Process Effectiveness Customer Feedback

Measure Resource Utiliz. Milestone Dates Milestone Dates Effort Effort Staff Experience Requirements Defects Defects Time to Restore Time to Restore Defect Containm Rework Perform. Rating

Indicator Resource Utilization Dev.Milestone Schedule Milestone Progress Effort Allocation w/replan Effort Allocation by Dev.Activity Staff Experience Requirements Stability Severity -1 defects status Defect Density SysFailures and Restoration Mean Time to Repair or Fix Req’s Def. discovered after Req Ph Dev.Effort by Activ.vs Tot.Rew.Eff Composite Perfor.Award Scores

T

C

Q

R

1 2 1 1 1 1 1 1 1 0 0 1 1 0

1 0 0 1 1 1 0 0 0 1 1 0 0 0

2 1 1 0 0 1 2 1 1 1 1 1 1 1

1 1 2 1 1 0 0 0 0 0 0 0 1 1

O(1 O(2 ) ) Old New Tot 0 0 0 0 0 0 0 0 0 1 1 1 0 1

1 0 0 0 0 0 0 1 1 0 0 0 0 0

3 4 4 2 2 2 2 2 2 3 3 2 2 2

3 0 0 1 1 1 1 1 1 0 0 1 1 1

6 4 4 3 3 3 3 3 3 3 3 3 3 3

Table 12 proposes the selections from the two Quality Assistants interviewed. Resource Utilization was the most rated measure, followed by two milestone performance ones, yet applied in current projects. In the third chuck there are 11 measures about personnel (3), functionalities (3), supportability and maintenance (2), process effectiveness (2) and the customer feedback but with a particular view on the overall performance rating (measure #64), differently from the other stakeholders choosing measures from this group. Few new measures seem to be desired. Table 13: Measures selected, ranked and with detail by analysis viewpoint – Sys.Engineer # Id. Category 4 5 52 55

Work Unit Progress Work Unit Progress Process Effectiveness Technology Suitability

Measure ProblReport Stat ProblReport Stat Defect Containm Req. Coverage

Indicator PR Status PR Aging – Open PRs Req’s Def. discovered after Req Ph Critical Tech. Requirements

T

C

Q

R

1 1 0 1

1 1 1 0

1 1 1 1

1 0 1 1

O(1 O(2 ) ) Old New Tot 0 0 0 0

0 0 0 0

4 3 3 3

0 0 0 0

Last but not least, the viewpoint by the Systems Engineer interviewed. The greater attention is on PR, both in terms of status (measure #4) and their aging (measure #5), as well as about the frequencies in discovering defects after the requirement phase (measure #52). Finally, it was selected also an interesting non functional element (measure #55), where a Systems Engineer contributes in the Analysis phase. There is no new measure desired.

75

4 3 3 3

3.2.3. Questions 3 – Causal Relationships Question (3) was answered by quite all respondents (13 out of 15), and this consistent answer is a clear indication of how measures are spreadly used in their measurement programs. Two main relationships identified: staff experience & milestone progress (proposed by PM, QA and the Systems Engineer) and defectability & work unit/milestone progress (proposed by developers and testers). Crossing Question 3a (which measures?) and Question 3b (why), it appeared a more visible mid-long term view by the first group (PM, QA and Systems Engineer), probably having in their DNA this kind of approach, linking clearly cause and effects, while the answers provided by the other group for motivating their choices revealed what it could be called a “day by day” planning, where achieving milestones is the main goal to satisfy and report to their managers, through a reduced defectability. 3.2.4. Questions 4 – Cost of T&C process Concerning the current cost of the T&C process (question (4a)), only four respondents (2 project managers, 1 team leader and a tester) out of 15 stated that he/she had a rough idea about the cost. This answer was expected to be answered by most respondents. This can be a signal that this kind of project costs are not properly tracked during the project lifetime, but considered part of the more general “project management” cost item. About the “how much” (question (4b)), project managers were more prudent and realistic, providing a 15% indication, that’s the minimum value shown, while the tester provided the higher value. Table 14: T&C costs: Sample S1 Max Median Avg Min

Past-Current 30.00% 17.50% 20.00% 15.00%

Next 30.00% 20.00% 17.69% 5.00%

Difference 0.00% +2.50% -2.31% -10.00%

Concerning future projects (question (4c)), quite all respondents provided an expectation of budget allocation for T&C activities (13 out of 15): between 30% and 5% of the project budget, with an average of 18% (the respondent proposing a higher value was a team leader, the two project managers were also more prudent, confirming the 15% yet shown as the actual projects’ T&C cost). It is worth noting that the median is higher than the average value, showing a shared willing to slightly increase the budget for the T&C process, no matter the project role actually covered.

4. Conclusions and Future Work One of the problems when discussing Tracking and Control (T&C) in software projects is the amount of budget allocated in absolute terms, with little room for evaluating whether or not there is a proper balance in terms of perspectives for these controls. Usually, the two perspectives most often involved are time and cost, while others, such as quality, risk, safefy and so on, are occasionally taken into account, and possibly assigned the responsibility for any additional costs for new controls to implement on projects. But the key to optimising T&C activities, making projects more profitable, is not to eliminate controls, but to balance them, by attempting to cover and balance more viewpoints than simply time and cost.

76

This paper presented an application of the criteria for proper use of BMP (Balancing Multiple Perspectives), introducing a set of possible measures for data gathering and analysis based on the BMP questionnaire, which was tested by means of a samples composed of 15 experienced Spanish ICT professionals and working in large companies or as consultants. The initial results stressed that, in terms of desired perspectives, risk would be the first perspective to be implemented, followed by quality. Concerning measures, project managers would be more open to introducing new measures on projects, while team leaders pay more attention to not increasing costs and are quite conservative, as well as developers and the systems engineer interviewed. Again, the distribution of measures by viewpoint currently focuses more on the quality perspective (followed by time and cost), that is the same ranking also in terms of desired distribution. It is possible to observe that the measures more often selected from the proposed list have been assigned to the quality perspective, in particular “Systems Failure and Restoration” (measure #34). Another indication came from question 3 (causal relationships among measures): quite all people provide an answer and it is interesting to note the link stressed between staff experience and milestone progress, before considering defectability & work unit/milestone progresses, revealing the need to considering people as a starting element in the causal chain among SLC processes. Indirectly, this attention was noted also in the ranking of perspectives, where quality was unusually ranked as #1. Finally, concerning the cost of the T&C process (question 4), few people know how much really costs this process: the perception of how much is currently spent is probably higher than the reality (an average of 20% of the project budget), with an expectation for the future of a slight reduction (an average of 18%). Future work on BMP developments will involve further investigation through the application of the BMP questionnaire, and, after gathering an appropriate amount of data, a study of how to use the BMP as a tool to facilitate definition of the BSC strategy map in terms of the counter-effects of choosing indicators for each perspective, and of mapping them to the possible dimensions of analysis (e.g. time, cost, quality, risk, etc.) to achieve double-check balancing.

5. Acknowledgements We would like to thank Atos Origin Italy and the Spanish Ministry of Science and Technology for supporting this research (Project CICYT TIN2004-06689-C03).

6. References [1] [2] [3] [4]

[5]

[6]

Gilb T., Project Failure: Some Causes and Cures, Feb 29 2004, www.webster.edu/ftleonardwood/COMP5940/Student_Files/Project_Failure/ProjectFailure.pdf. NAO, Delivering successful IT-enabled business change, Report, UK National Audit Office, November 2006, www.nao.org.uk. Standish Group, The Chaos Chronicles version 3.0, The Standish Group, 2004. CMMI Product Team, CMMI for Development, Version 1.2, CMMI-DEV v1.2, CMU/SEI2006-TR-008, Technical Report, Software Engineering Institute, August 2006, www.sei.cmu.edu/publications/documents/06.reports/06tr008.html. ISO/IEC JTC1/SC7/WG10, TR 15504-5, Software Process Assessment - Part 5: An Assessment Model and indicators guidance, v.3.03, International Organiseation for Standardization, Genève, 1998. Thione L., La Qualità delle Imprese Italiane. Stato Attuale, Problemi e Prospettive, Monografia, Sincert, Dicembre 2005, www.sincert.it/docs/405RelQimprese1205.pdf.

77

[7] [8]

[9] [10] [11] [12] [13] [14] [15]

[16]

[17]

[18]

[19] [20] [21] [22]

[23]

Jagannathan S.R., Bhattacharya S. & Matawie K., Value Based Quality Engineering, TickIT International, No.1, 2005, pp.3-9 Schiffauerova, A. & Thomson, V., A review of research on cost of quality models and best practices, International Journal of Quality and Reliability Management, Vol.23, No.4, 2006, pp. 647-669, Emerald Group Publishing Ltd, www.mcgill.ca/files/mmm/CoQModels-BestPractices.pdf B.W. Boehm, C. Abts, A.W. Brown, et al., Software Cost Estimation with COCOMO II, Prentice Hall, 2000, ISBN 0130266922 Putnam, L.H. & Myers W., Five core metrics: the intelligence behind successful software management, Dorset House Publishing, 2003, ISBN 0-932633-55-2 Jones C., Software Project Management Practices: Failure Versus Success, Crosstalk, Vol.17, no.19, Oct. 2004, pp.5-9. www.stsc.hill.af.mil/crosstalk/2004/10/0410Jones.pdf Demarco T., Why Does Software Cost So Much? And Other Puzzles of the Information Age, Dorset House, 1995, ISBN 093263334X Jones C., Applied Software Measurement: assuring productivity and quality, 2/e, McGraw-Hill, 1996, ISBN 0070328269 Rubin H.A., The Top 10 Mistakes in IT Measurement, IT Metrics Strategies, Vol. II, No. 11, November 1996, www.cutter.com/benchmark/1996toc.html Buglione L. & Abran A., Multidimensional Project Management T&C - Related Measurement Issues, Proceedings of SMEF 2005, Software Measurement European Forum, 16-18 March 2005, Rome (Italy), pp. 205-214, www.dpo.it/smef2005/filez/proceedings.pdf Buglione L.& Abran A., Improving Measurement Plans from multiple dimensions: Exercising with Balancing Multiple Dimensions - BMP, 1st Workshop on "Methods for Learning Metrics", METRICS 2005, 11th IEEE International Software Metrics Symposium, 19-22 September 2005, Como (Italy), metrics2005.di.uniba.it/learining-metrics-workshop/Buglione.pdf Dumke R., Abran A. & Buglione L., Suggestions for Improving Measurement Plans: First results from a BMP application, Proceedings of SMEF2006, 3rd Software Measurement European Forum, 10-12 May 2006, Rome (Italy), pp. 209-224, www.dpo.it/smef2006/papers/b11.pdf Buglione L., Gencel C. & Efe P., Suggestions for Improving Measurement Plans: A BMP Application in Turkey, Proceedings of IWSM/METRIKON 2006, Potsdam (Germany), November 2-4, 2006, Shaker Verlag, ISBN 3-8322-5611-3, pp. 203-227 EFQM, The EFQM Excellence Model, European Foundation for Quality Management, 1999, efqm.org/publications/EFQM_Excellence_Model_2003.htm. NIST, Baldrige National Quality Program: Criteria for Performance Excellence, National Institute of Standards and Technology, 2007, www.quality.nist.gov. Kaplan R.S. & Norton D.P., Strategy Maps: Converting Intangible Assets into Tangible Outcomes, Harvard Business School Press, 2004, ISBN 1591391342 Hoffman D., The Darker Side of Metrics, Pacific Northwest Software Quality Conference, 2000, softwarequalitymethods.com/SQM/Summaries/DarkerSideMetrics.html. Dept. of Defense & US Army, PSM - Practical Software & Systems Measurement. A Foundation for Objective Project Management, Version 4.0c, March 2003, www.psmsc.org.

78

Measurement for improving accuracy of estimates: the case study of a small software organisation Sylvie Trudel

Abstract This paper describes the case study of a small Canadian software development organisation that has established and sustained a measurement program for its software activities, which includes functional size measurement using the COSMIC-FFP method. This company has been in operation for over 20 years, and has 11 employees, all directly involved in software projects. Their “not to exceed” estimate business model, guarantees that fixing all defects found by their customer are free of charge. Quality is absolutely not an issue since they deliver a new software release to their major customer every other week with less than one defect per release on average, which is usually fixed within three hours. For that reason, they do not require a defect management system. Thus, the motivation for a measurement program came from other issues such as the inaccuracy of initial estimates, commitment to quality and productivity by applying best practices and continuous improvement, and the desire to improve productivity due to the loss of potential contracts to offshore organisations. Many challenges were encountered and resolved while implementing the measurement program such as dealing with company growth, tuning of estimation models to improve accuracy between initial estimates and project actual performance data, and applying the required rigour to sustain measurement activities. This paper also describes a simple measurement plan. Measurement results presented were used to improve the accuracy of estimation models, in which the step-by-step approach is described. With more accurate estimates, several sound business decisions were made regarding future projects.

1. Introduction For many involved in software projects, accurate estimation is still perceived as an art, despite the fact that several estimation methodologies have been developed and published over the last years. Of course, the success of an estimation methodology implies collecting and analyzing accurate project measures that are later used in an estimation model. Some of these measures, namely effort and size, are known to be highly correlated as development effort is dependant on the software size. But these two factors alone do not guarantee accuracy of an estimation model and an organisation must stop and reflect in order to understand the other factors influencing the productivity of a software team. Adding factors to an estimation model may look like it increases its accuracy, but it could make it less accurate due to the error propagation inherent with each factor used [1]. Part of this thought process was made in a small Canadian software development company in order to come up with a more accurate estimation model. The need for accuracy came from their business model. The factors that were examined came from their product technology and their software process. This paper describes what was done and measured to improve the accuracy of their estimation model.

79

2. Company description 2.1. Company overview The small Canadian company was started 22 years ago by its president who is now acting as project manager. All of the 11 employees are developers, two of which are also analysts responsible for requirements development. They have no overhead but accounting (one day per week) and housekeeping are subcontracted. The company has up to six active customers, one of which is a large financial organisation managing loans for assets acquisition that makes up for an average of 80% of the company’s annual gross revenues. For that customer alone, the company develops and maintains a series of systems to support their sales and operations. One of these systems, a sophisticated ERP called SUM, is interfaced with 10 peripheral systems and has been developed and used for over 10 years. Maintaining and developing new features in SUM required seven person-years in 2006. They have a backlog of projects roughly defined and planned to keep the whole team busy for the next 6 to 8 months.

2.2. Business model The company’s business model is to provide each customer a “not to exceed” project estimate. If it takes less effort to develop a project than what was estimated, the customer is invoiced at a lower price. When it takes more effort to develop than what was estimated, the customer is invoiced the estimated price. Therefore, there is a strong motivation for accurate estimating because, if the estimation is too high, the customer may decide to outsource the project elsewhere, a situation which has occurred in the past. As well, if the estimation is too low, the company does not make any profit and may even have to absorb a loss. In addition, any defect found is to be fixed at the company’s expense. Therefore, there is a strong commitment to quality and it became a company goal to deliver defect free products, which can also be seen as a competitive advantage. Every hour spent performing activities during “Initiate project” and “Analyze and estimate” (see section 3.2 for process overview) is always billed to the project. As a result, the effort estimates must include all other types of activities: project management, software development and testing, documentation, packaging, and validation. Different hourly rates are used per activity depending on the skill sets required, i.e. individuals performing project management or analysis activities have significantly higher wages.

3. Process description 3.1. Process improvement initiative The company missed deadlines on several features given their short bi-weekly release cycle. They experienced cost overruns on half of their projects. Quality was not an issue since they usually have less than one defect per release, found once the release is in production and fixed within a three hour period. Nevertheless, some of their potential projects were lost to major outsourcing organisations in India from 2001-2002 and they needed to be more competitive. They had learned about the existence of the CMMI [2] and were concerned about applying its best practices to increase their efficiency and productivity. In November 2004, the company formally began to continuously improve its software process by hiring the author as an external consultant. They had a stable but undocumented process. They were not facing any big issues related to software development but small irritants were observed, namely in software development estimation and lack of formality in customer communication. The

80

path adopted was to learn about the CMMI, assess their practices against CMMI practices at a rate of one process area per month, and start improving their process on a continuous basis. The demand for features from their largest customer was growing, so was the team size. They wanted to continue to be results-oriented and needed to involve more team members into project coordination. The process was and still is project-oriented. They define a project as being a set of one or more related features laid out to develop or modify a part of existing or new modules. The average project effort is approximately 150 hours; some bigger projects attain more than 1,300 hours. They document user needs and requirements in simple text files which include interface mockups. They apply the Scrum methodology [3] for managing their projects and related detailed requirements. They use spreadsheets to gather planning data, design decisions, and test cases. Peer reviews are applied to selected deliverables. Effort is measured using a home-grown timesheet system called eMalaya.

3.2. Process overview The company’s process has nine phases (Figure 1), each of them being detailed in a set of activities directly producing or updating an output. Initiate project (define user needs)

No Go

Analyze and estimate Defects

Go

Manage project (coordinate work)

Sub-project assigned

No Go

Design, code and unit testing

Tests completed

Go Package new version

Install and validate version

Go

Release new version to production End of month

No Go: Defects Invoice projects (monthly)

Project completed Close project (retrospective)

Figure 1: Process overview

3.3. Measurement program The company already had a measurement program that included effort and schedule measures, and its main purpose was for billing the customers at the end of every month and at the end of the project and tracking R&D. However, no measurement plan existed that related the measures to the business goals because some of the required measures and indicators were not available. In the fall of 2006, it was decided to document the measurement plan as an exercise to understand the information needs of the manager and team members.

81

In small companies, when a new practice is introduced in the process, such as developing and maintaining a measurement plan, its advantages must be clear for the manager; otherwise he or she may decide quickly to abandon this new practice, especially if it seems cumbersome. So a very simple approach for documenting it had to be taken. It was decided to use the classic “GoalQuestion-Metric” (GQM) technique and to put the measurement plan into a spreadsheet with three worksheets in it: one for the goals (Table 1), one for the questions and indicators (Table 2), and one for basic measures (Table 3). Table 1: Company goals ID Goals Reason G1 Deliver projects within effort estimates Reach corporate goal of 30% gross margin. Ensure product quality and customer G2 Deliver defect free versions into production satisfaction, minimise rework. One of the challenges the manager and analysts encountered at the beginning of the measurement program was the continuous rigour and discipline required to feed the measurement program. At first, the main file used to monitor all projects, the project portfolio file, did not contain any of the measures or indicators and this data was simply kept in its source repository. It was decided early on to include these measures and indicators in the project portfolio file so the manager would be able to view a whole year of projects at a glance. Only then did the motivation to sustain the measurement program arise, because it looked simple, useful, and they knew exactly why these measures were taken and what decisions or actions should be taken based on indicator values.

4. Product description 4.1. Product overview SUM is deployed in 14 locations throughout Canada and is utilised by approximately 250 users on a daily basis. Three servers are used simultaneously to provide the required performance: Montreal (Quebec), Toronto (Ontario), and Calgary (Alberta). It is built on client-server architecture for Windows. Ten years ago, SUM was fully developed in Visual Fox Pro (VFP), including its database (see Figure 2 for Previous SUM Architecture). In 2003, the database was changed to MS-SQL Server, which was more secure and reliable than VFP, and they began a six-year reengineering plan in .Net C# (for both Windows and Web user interfaces), which includes refactoring the business logic and the data into separate layers to accommodate various interface types (Windows, Web, Mobile) without duplicating important portions of the source code that also requires more intensive testing (see Figure 3 for New SUM Architecture). Today, SUM is made of 310 windows and screens distributed in 11 internal modules and 10 external modules; it has over 1 million lines of code (see Table 4) and 474 database tables containing a total of 4,333 fields (see Table 5).

82

83

Manager+PM+Analyst

Manager+PM+Analyst

As soon as Scrum detailed estimated is done

All

After every release

"Defects" column

PM

Version file

Unit

- Re-estimate either plan or Scrum. When > 1, do a - If appropriate, advise customer of an retrospective. estimate change prior to beginning project.

- Verify that the process was When > 15%, adjust applied, especially on CRs. estimation model - Verify any encountered issue.

Manager+PM+Analyst

Stakeholders

Quarterly

Separate "Scrum effort" column

Scrum Master or PM

Project Portfolio file

Hour

Possible Actions

End of every project

Stored when

On top of the "Effort overrun" column

Manager

Project Portfolio file

%

G2

How many defects do we have per year and per release? Number of defects per release and total

Q4

Use conditional formatting to When > 15%, highlight in When Scrum estimate exceeds planned When > 1, highlight in highlight any overrun. red. effort of more than 5%, highlight in red. red.

"Effort overrun" column

Stored where

Q3

What project proportion What are the differences between the has an overrun > 5%? planned effort and the initial Scrum detailed estimate? (Number of projects of Planned effort – Scrum initial effort overrun>+5%)*100/ total number of projects G1 G1

Q2

Analysis Procedure

Manager

Project Portfolio file

%

For each project, what is the difference between actual effort and planned effort? (Actual effort (planned effort + CRs))*100 /(planned effort + CRs) G1

Q1

Responsible

Source of data

U of M

Goal

Formula

Indicators (questions)

ID

Table 2: Example of questions and indicators (related to goal #1)

User Interface Layer

VFP Database engine (local)

Synchronization package (VFP)

VFP Database engine (regional)

VFP Database engine (national)

Database Layer (SQL-Server)

Figure 2: Previous SUM Architecture

84 VFP, C# for Windows, or C# Web

VFP User Interface and Business Logic

Business Logic Layer

C#

Data Layer

SQL scripts, stored procedures, and user defined functions

Figure 3: New SUM Architecture

M2

M3

1 hr Employees Anatime

Precision Measured by Data source

Project plan

PM

1 hr

Hours

CR files

PM

1 hr

Hours

Scrum Works

Employees

1 hr

Hours

Scrum initial effort Per project

M4

Quality Verify Peer review Peer review Review by the assurance timesheets prior estimates to estimates to PM. to invoicing ensure that ensure that (monthly): right nothing was nothing was project, right forgotten. forgotten. task, consistent Validation Validation number of with customer. with customer. hours.

Data Timesheet must Project < As soon as a As soon as Scrum collection be entered 50 hrs = CR is initial effort is procedure every day manual only approved, completed, the enter it in the PM copies the Project > CR Follow-up effort value in the 50 hrs = FSM table in the project portfolio project plan. file.

Hours

Actual effort Planned effort Total effort for all CRs Per project Per project Per project

M1

U of M

Scope

Measures

ID

Table 3: Some of the basic measures (related to goal #1)

Table 4: Source code physical measures Language # lines, incl. comments # commented lines Comments ratio VFP 541,288 211,071 39% .Net C# 484,082 121,391 25% Note 1 SQL 100,098 18,800 19% Total: 1,125,468 351,262 31% Note 1:The database layer contains 398 Stored Procedures for 73,515 lines of code and 999 User Defined Functions for 20,327 lines of code. Table 5: Database structure physical measures DynamicNote 2 Static Total Number of tables 172 302 474 Number of attributes (fields) 2,751 1,582 4,333 Note 2:The content of Dynamic tables is synchronised every 15 minutes between the Montreal, Toronto, and Calgary servers, for redundancy and security reasons.

4.2. Product release cycle The product is released into production every other week. Supplemental releases may be required for bug fixing or the addition of an urgent feature. A new version releases several features from several projects. A project can also be developed iteratively and delivered over several product releases, in this case hidden or partially deployed project can be included in a release. For every release, the build master compiles a new version from the source code repository, installs it on the test environment and tests are performed for half a day (Thursday morning). If defects are found at this point, verbal communication occurs with the developer in charge who fixes the defect immediately, stores the modified code into the repository, and advises the build master who rebuilds a new version. These defects are not measured. When tests results show no defects, a readiness for validation notice is sent to their customer. Two team members supervise validation testing at the customer’s site during one to four hours (Thursday afternoon). At this point, any defect found is fixed Friday morning and the product is retested in the afternoon. These defects are not measured either. Once validated, the new version is released into production by the customer’s IT staff, using the installation procedure during the weekend. When defects are found after it has been sent to production, they are noted and measured in the “Version file”. These defects are fixed immediately and a new version is sent before noon that day. On average, fixing defects takes one hour, building the application requires 10 minutes, and deployment requires 30 minutes. This efficiency is possible using a home-grown deployment tool.

4.3. Product quality Delivering defect free products is one of the company’s main goals. In 2006, 35 releases of SUM were deployed, of which 17 had zero defects. In the other 18 releases, 28 defects were found and each of them was fixed within half a day. It represents an average of 0.8 defect per release. The main reasons for these astonishing results are: a robust software architecture, in-depth knowledge of the business domain, the framework used, and strong commitment to quality from management. Considered as a rare situation, this company does not use any defect management tool because there are simply not enough defects to manage.

85

5. Project estimation 5.1. Initial estimation process Before 2005, estimation was done on a task-effort basis. During analysis, a list of tasks was detailed to define the software work needed to accomplish the project in the four main software layers: user interfaces and reports, business logic, data and database. Every task was estimated by the analyst and a developer would validate the list of tasks and associated design and programming effort. A percentage was then added for testing and project management. As a result, half of the projects ended up exceeding estimates.

5.2. Functional size measurement Functional size measurement (FSM) using COSMIC-FFP [4] was introduced in the fall of 2005 as an ingredient to develop a new estimation model. FSM was then performed on 3 to 4 projects. In December 2005, as the “Guideline for sizing business application software using COSMIC-FFP” [5] was published, validation was applied to actual functional size and modifications were made resulting from a better understanding of the COSMIC-FFP method. The project plan template was modified to add a worksheet for FSM, which contains the list of functionalities with their associated data groups. The analyst was responsible to measure the size of current and new projects. When using the project plan template, the analyst identifies each functionality, its associated data groups and relevant data movements. Functional size is automatically calculated based on identified data movements.

5.3. Early estimation models based on FSM A productivity model was established based on functional size and actual effort. It was mostly used to validate estimates made on a task-effort basis. Significant differences in productivity models were observed among projects, ranging from 1.5 to 6 hours per size unit. Variation between estimates and project performance was sometimes more than 50%. This inaccuracy makes a big difference for a 150,000$ project vs. a 600,000$ project, where the potential customer would have accepted the first, but refused the second.

5.4. First observations on inaccurate results It was no surprise to find out that the most important factor for differences between effort estimates and projects actual effort was due to change requests (CR) that were not systematically estimated and measured. A so called “small request” made by the customer over the phone can become a 45% effort increase. A decision was taken to formally document any CR for approval by the customer and to monitor all CRs of a project in the project portfolio file. Also, effort for implementing CRs is entered in the timesheet system with specific description in order to isolate that effort easily. When effort from implementing CRs in a project was extracted, variations between actual effort and estimated effort were less than 27%. However, it was still significant and research was undertaken to improve accuracy.

86

6. Improving estimation models Intuitively, the analysts and project manager believed in the FSM approach for effort estimation and decided to refine the concept of measuring functional size without having to switch to the developer’s viewpoint. The following sections describe the detailed steps that were taken to come up with more accurate estimation models.

6.1. Step 1: assess reasons for inaccuracy from product and process Investigation helped explain this variation and it appeared that the main difference came from the technology used: VFP projects productivity averages 2.5 hours per size unit, and C# projects productivity averages 4.5 hours per size unit, once they thought the learning curve of team members was over. One of the difficulties was when several technologies (i.e., VFP for GUI and C# for business logic) needed to be integrated because it is more complicated and requires more effort, up to 6 hours per size unit. Also, a lot of effort had to be spent on creating stored procedures for new database tables. Many projects developing new features were actually using existing tables along with their stored procedures. In these cases, the required effort per size unit was lower than the productivity model of projects for which most tables needed to be created. The team productivity model behaved as if new development occurs only for the first functional process (i.e., creating GUI, business logic, data persistence, and database layers) and as if it switches to maintenance mode thereafter (i.e., stored procedures are already created and used in an evolutive or adaptative software maintenance). Training was provided by an expert on estimation based on FSM in fall of 2005. This expert described the following estimation ratios for software maintenance: • Add a new data movement = 100% of effort. • Delete a data movement = 10% of effort, mainly required for testing. • Modify portions of an existing data movement = 50% of effort. These ratios were applied on the estimation model itself (e.g., if the estimation model is 5 hours per size unit, deleting a data movement would require 0.5 hour and adding one or several attributes to a data group would require 2.5 hours for each affected data movement.) The functional size remains the same; an adjusted estimation model is simply applied on each data movement. The analysts tried to estimate using this technique but felt uncomfortable with it for the following reasons: • This method seemed appropriate only when the software architecture is in a single layer because it implies that all data movements related to that new data group are also new and have to be fully developed. • When developing in a multi-layer architecture, a new data group requires more effort to create when developing the first functional process, and less effort when reusing that data group data layer code and business layer code in any subsequent functional process. • When modifying existing data groups and data movements, such as adding attributes, there is a significant difference of effort due to the number of attributes affected, and thus the 50% ratio for maintenance needed to be redefined. Using the developer’s viewpoint was considered for FSM but the idea was quickly abandoned for fear of increasing measurement effort by having to measure all data movements for each of the software layers. Instead, it was decided to continue using the user’s viewpoint but try to establish the impact of reuse on each data movement.

87

6.2. Step 2: evaluate impact of reuse from software architecture layers
In-depth knowledge of the development process tied to the software architecture led the analysts to evaluate the impact of in-process reuse of basic software components. The analysts measured the ratio of effort required to develop several new functional processes requiring all new data groups. For example, such a functional process could be a screen to assign an employee to a department. On average, half of the effort is spent developing the business and data layers in C#, 20% is spent developing the database layer (SQL stored procedures and user-defined functions), and the rest is for the user interface, for which effort may vary depending on the technology used. The technology issue will be addressed later.
When a second functional process is developed, such as displaying the list of employees with their current department, all required components from the database layer and many of the components from the business logic and data layer already exist, so the developer simply has to reuse them. In such a case, the display (exit) of an employee/department record is considered a new data movement, but reading the employee, employee-department, and department tables is done with reusable components, for which minor adjustments in the business logic must be made, along with the effort required in the user interface for selection or filtering criteria.
A function that requires a minor change is defined as one adding one to three attributes to an existing data group, affecting all relevant data movements (e.g., adding a customer web site link in a user interface, which requires adding the web site field to the customer table, adding the business logic to validate the format of the web site link, and adding the field to the user interface to be displayed). All components exist, but the affected data movements require small modifications throughout all layers. The same principle applies for a major change, defined as adding more than three attributes to an existing data group, affecting all relevant data movements (e.g., adding a shipping address and shipping phone number of customers on the "display invoices" function). An example of the results obtained is shown in Table 6.

Table 6: Effort ratio per software layer
                                              Effort ratio
Software layer                    New     Reuse    Minor change    Major change
User interface                    30%      15%         10%             30%
Business logic and data (C#)      50%      10%         10%             30%
Database layer (SQL)              20%       0%         10%             10%
Total:                           100%      25%         30%             70%

6.3. Step 3: apply reusability factors to data movements
Taking reusability into account allows "weighting" each data movement (i.e., calculating a fraction of its size by applying the appropriate effort ratio: new = 100%, reuse = 25%, minor change = 30%, and major change = 70%). This activity is performed as part of FSM and is automated in the estimation spreadsheet, as shown in Figure 4.
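A minimal sketch of this weighting, using the effort ratios of Table 6, is shown below; the list of data movements is a hypothetical input, not the spreadsheet data of Figure 4.

```python
# Illustrative sketch: weight each data movement by its reuse category
# using the effort ratios of Table 6 (new=100%, reuse=25%, minor=30%, major=70%).
REUSE_FACTORS = {"new": 1.00, "reuse": 0.25, "minor": 0.30, "major": 0.70}

def weighted_size(data_movements):
    """data_movements: list of (cosmic_size, reuse_type) pairs, one per data movement."""
    return sum(size * REUSE_FACTORS[reuse_type] for size, reuse_type in data_movements)

movements = [(1, "new"), (1, "new"), (1, "reuse"), (1, "minor"), (1, "major")]
print(weighted_size(movements))  # 1 + 1 + 0.25 + 0.30 + 0.70 = 3.25 weighted size units
```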


[Figure 4 shows an excerpt of the estimation spreadsheet: for each data movement it records the module (e.g., "Create email/fax"), the functional process (e.g., "Display main window", "Maintain address book"), the trigger, the data group and the reuse type (new or reuse), together with the COSMIC-FFP Entry, Exit, Read and Write counts (52 FFP in total for the example), the reuse impact (-9.75) and the resulting weighted size (42.25).]

Figure 4: Example of FSM and weighted size

This reusability factor is applied during FSM and requires only one to two seconds per data group per functional process, which is negligible given that an average-size project requires approximately 1.5 hours to measure.

6.4. Step 4: establish estimation models per technology
To establish an estimation model based on the weighted size, several projects were measured, weighted, and then compared with actual effort. At mid-2006, the initial estimation models based on weighted size units (WSU) per technology were established (see Table 7).

Table 7: Initial estimation model based on weighted size per technology
Technology                       VFP     C# for Windows    C# Web
Estimation model (hours/WSU)     3.22    3.86              5.15

These estimation models were not readjusted until recently, when it was decided to follow up on their accuracy. Three C# projects and two VFP projects were estimated and measured using the technique described above (see Section 6.6).
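A sketch of how these initial models would be applied is given below; the rates come from Table 7, while the project size and technology are made-up inputs.

```python
# Sketch: apply the initial estimation models of Table 7 (hours per WSU).
HOURS_PER_WSU = {"VFP": 3.22, "C# Windows": 3.86, "C# Web": 5.15}

def estimated_effort(weighted_size_units, technology):
    return weighted_size_units * HOURS_PER_WSU[technology]

print(estimated_effort(50.0, "C# Windows"))  # 50.0 * 3.86 = 193.0 hours
```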

6.5. Step 5: adjust effort estimation with risk factors
The analysts and project manager also identified three risk factors that influenced their productivity on certain projects: technology (known or unknown), complexity (low, medium, high) related to knowledge of the customer's business domain and business processes, and the number of other stakeholders involved (none, a third party from the client, one or many vendors).


Contingency is added as a percentage of total estimated effort when risk is perceived. No contingency is required when the technology is known, the complexity is low, and no third party or vendor is involved. The sample of projects used in this case study did not require any contingency, as is the case for the majority of developed projects.
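A minimal sketch of such a contingency adjustment is shown below; the percentages are hypothetical, since the paper does not publish the organisation's actual contingency figures.

```python
# Illustrative sketch only: the contingency percentages below are hypothetical.
def contingency_rate(technology_known, complexity, other_stakeholders):
    rate = 0.0 if technology_known else 0.10
    rate += {"low": 0.0, "medium": 0.05, "high": 0.10}[complexity]
    rate += {"none": 0.0, "third party": 0.05, "vendors": 0.10}[other_stakeholders]
    return rate

base_effort = 200.0  # hours, from the weighted-size estimation model
total_effort = base_effort * (1 + contingency_rate(False, "high", "vendors"))
print(total_effort)  # 200 * 1.30 = 260.0 hours
```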

6.6. Step 6: validate effort estimation with actual data
The last step was to enter actual data in the project portfolio spreadsheet: actual functional size, estimated and actual effort, change request effort, and weighted size. Then, automated computations were done for productivity (actual hours per functional size unit) and estimation models (actual effort per WSU), the latter being monitored to readjust the overall estimation model on a periodic basis (see Table 8).

Table 8: Actual performance data for a sample of projects
Project  Techno-  Funct.  Weighted    Original  Actual   Overrun  Produc-   New
#        logy     size    size units  effort    effort   %        tivity    estimation
                  (FFP)   (WSU)       estimate  w/o CRs           model     model
                                      (hours)   (hours)           (Hr/FFP)  (Hr/WSU)
1        C# Win   218     159.0       598       567.4    -5%      2.6       3.6
2        C# Win    74      53.3       131       109.7    -16%     1.5       2.1
3        C# Win   124      89.5       223       236.9     6%      1.9       2.6
Average for C# Win:                                                2.0       2.8
Variance for C# Win:                                               0.3       0.6
4        VFP       47      42.0       102        78.7    -23%     1.7       1.9
5        VFP       66      55.5       155       138.3    -11%     2.1       2.5
Average for VFP:                                                   1.9       2.2
Variance for VFP:                                                  0.1       0.2
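The automated follow-up computations behind Table 8 reduce to a few ratios; the sketch below reproduces them for project 1, using the figures from the table.

```python
# Sketch of the computations behind Table 8, shown for project 1.
ffp, wsu = 218, 159.0           # functional size (FFP) and weighted size (WSU)
estimate, actual = 598, 567.4   # original estimate and actual effort w/o CRs (hours)

overrun = (actual - estimate) / estimate   # -5%
productivity_model = actual / ffp          # 2.6 hours per functional size unit
new_estimation_model = actual / wsu        # 3.6 hours per WSU

print(f"{overrun:.0%}  {productivity_model:.1f} Hr/FFP  {new_estimation_model:.1f} Hr/WSU")
```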

7. Preliminary results of the "weighted size" approach
The number of data points was not sufficient to conclude whether or not the weighted size approach succeeds in improving the accuracy of estimates against actual effort. However, the following preliminary observations came out of the measurement exercise:
• The average productivity for C# Windows projects went from 4.5 to 2.0 hours per COSMIC-FFP functional size unit, which is very good compared to similar projects found in the ISBSG repository [6]. This can be explained by three factors: i) the learning curve for the C# technology may not have been over last year when the initial estimation model was created; ii) six months ago, the manager dismissed an employee perceived to be a "net negative producing programmer" [7] and, after that employee left, the manager perceived an increase in the overall team productivity; iii) the software process is applied consistently.
• The productivity difference between C# for Windows and VFP projects seems to have decreased significantly, which may open new business opportunities.
• There seems to be a tendency to overestimate, which is desired to a certain extent, due to the business model.


8. Conclusion and future work
It is no surprise to observe that the inaccurately estimated projects are those for which the team did not have the discipline of formalising change requests. Even if it is unrealistic to expect an estimation model that is 100% reliable, early refinement results for C# projects with less than 16% variance are encouraging. Several other projects were being developed and measured at the time this article was being written, and the organisation will continue to monitor their actual performance data in order to readjust the estimation models on a periodic basis. However, if the weighted size approach does not result in increased accuracy, it may be abandoned in favour of the model of hours per COSMIC-FFP functional size unit.

9. Acknowledgements
The author wishes to thank Michel Martel, president, and Stéphan Laporte, analyst, of Analystik inc. ([email protected]; [email protected]) for their valuable collaboration and information sharing. Special thanks to Carmela Caterina, president of the Montreal SPIN, for her comments that improved the written English of this text.

10. References
[1] Santillo, L., "Error Propagation in Software Measurement and Estimation", in IWSM/Metrikon 2006 conference proceedings, Potsdam, Berlin, Germany, 2-3 November 2006.
[2] Chrissis, M.B., Konrad, M. and Shrum, S., "CMMI: Guidelines for Process Integration and Product Improvement", Addison-Wesley, the SEI Series in Software Engineering, Boston, 2003.
[3] Schwaber, K., "Agile Project Management with Scrum", Microsoft Press, Redmond, WA, 2004.
[4] Abran, A. et al., "COSMIC FFP Measurement Manual 2.2", January 2003, http://www.lrgl.uqam.ca/cosmic-ffp.
[5] Lesterhuis, A. and Symons, C., "Guideline for sizing business application software using COSMIC-FFP", the Common Software Measurement International Consortium, version 1.0, December 2005, http://www.lrgl.uqam.ca/cosmic-ffp.
[6] International Software Benchmarking Standards Group, http://www.isbsg.org.
[7] Schulmeyer, G.G., "The Net Negative Producing Programmer", http://www.pyxisinc.com/NNPP_Article.pdf, consulted on February 18th 2007.




How to effectively define and measure maintainability

Markus Pizka and Florian Deissenböck

Abstract
Maintainability and flexibility at the software level are of predominant importance for driving innovation at the business process level. However, existing definitions of maintainability, such as the Halstead Volume, McCabe's Cyclomatic Complexity or the SEI maintainability index, provide a very poor understanding of what maintainability is, how it can be assessed and, ultimately, how it can be controlled. This paper explains a new and more effective way to construct software product quality models. The key design principle is the strict separation of activities and properties of the system. This separation facilitates the identification of sound quality criteria and allows reasoning about their interdependencies and their effects. The application of this quality modelling approach in large-scale commercial software organisations helped to effectively reveal important quality shortcomings and raised awareness of the importance of long-term quality aspects among developers as well as managers.

1. Introduction
Virtually any software-dependent organisation has a vital interest in reducing its spending on software maintenance activities. This comes as no surprise, as the bulk of the life cycle costs of software systems are not consumed by the development of new software but by the continuous extension, adaptation, and bug fixing of existing software [21]. In addition to financial savings, for many organisations the time needed to complete a software maintenance task, such as an extension of an existing functionality, largely determines their ability to adapt their business processes to changing market situations or to implement innovative products and services. That is to say that, with the present and increasing dependency on large-scale software systems, the ability to change existing software in a timely and economical manner becomes increasingly critical for numerous enterprises in diverse sectors.

1.1. Maintainability – an ongoing confusion
The term most frequently associated with more flexible software and significantly reduced long-term costs is maintainability. But what is maintainability? Frequently found definitions of maintainability, such as "The effort needed to make specified modifications to a component implementation"1 or "a system is maintainable if the correction of minor bugs only requires minor efforts" [24], are obviously overly simplified. The latter is particularly confusing as it is in fact not a definition but a tautology. If one asked what a minor bug is, the answer would most certainly be "… its correction requires minor efforts". Besides these rather naive definitions, various metrics-based approaches try to define maintainability as compliance with a set of rules that correspond to measurable properties of the code, such as strong cohesion, limited coupling, etc. The general problem with this approach is the lack of a sound rationale for the selected criteria, which in turn tends to lead to discussions of some kind of technical beauty instead of effectively improving software maintenance.

1 SEI Open Systems Glossary


In 2003 we conducted a study on software maintenance practices in German software organisations [15]. While 60% of the 47 respondents said that they would consider software maintenance a "significant problem", only 20% performed specific checking for maintainability during quality assurance. The criteria used by these 20% to check for maintainability differed significantly and ranged from object-orientation, cyclomatic complexity [18], limited numbers of lines per method and descriptive identifier naming down to service-oriented architectures or OMG's model-driven architecture. Hence, there is little common ground on what "maintainability" actually is, how it can be assessed, and how it could be achieved.

1.2. "Maintainability" is misleading
This confusion can easily be explained and resolved by considering "maintainability" as a term. The "ility" ending is used to transform the adjective "maintainable" into a noun and thereby denote it as a property of a system. The adjective "maintainable" in turn denotes the assumption that the activity "to maintain" (verb) is a property of the object that we regard, i.e. a software system. However, the perception that the ability to maintain a system is a property of the system is very limited and neglects various other factors that have a strong influence on software maintenance activities, such as the qualification of maintainers, organisational knowledge management and adequate tools. This shortcoming is most obvious when it comes to "readability", since the ability to read is primarily not a property of the document or program to be read but a question of the skills of the reader. Therefore, we strongly argue that maintainability is not solely a property of a system but touches three different dimensions:
• The skills of the organisation performing software maintenance.
• Technical properties of the system under consideration.
• Requirements engineering.
In addition to skills and technical properties as discussed above, the third dimension – requirements engineering – plays an important role in defining "maintainability", because the question of how much flexibility one wants to have, at which points in a software system and for which purpose, sets the goals for constructing and later on assessing the required flexibility of the software system. For example, the strategy of constructing the shortest possible implementation without any flexibility, and restructuring or even rewriting the system in the event of significant changes, might yield lower maintenance effort than designing for flexibility in advance with numerous levels of indirection that might never be needed [25].

1.3. Setting the goal – cost-effectiveness
From a practical point of view, the ultimate goal of any effort to improve software maintenance practices has to be increased cost-effectiveness. While the questions of whether certain architectural styles or coding and documentation techniques comply with a certain standard, or are up to date, should be rather irrelevant, the question of how to minimise the time and budget needed per change request should be of paramount importance. Throughout this paper, we therefore assume that the "maintainability" requirement is to be cost-effective.
The quality model that we introduce below respects the three different dimensions of maintainability described above and is aligned with achieving cost-effectiveness by explicitly modelling maintenance activities, which are the main drivers of software maintenance costs, and putting them into relation with technical and organisational properties.
For the remainder of this paper, we will keep cost-effectiveness in mind as the requirements dimension of maintainability and focus on finding the important technical and organisational properties (i.e. dimensions 1 and 2) that influence software maintenance productivity.

1.4. Effective technical and organisational criteria
Certainly, programming and documentation guidelines as well as international standards [13] list various possible criteria for the technical dimension of "maintainability". However, the lack of adoption of these criteria is due to one or both of the following two shortcomings:
• Too general to be assessed (e.g. modifiability), or
• No sound justification (e.g. methods may not be longer than 30 lines).
Non-assessable criteria inherently cannot have any impact; unjustified ones are ignored. Therefore, effective criteria must be both well-founded and checkable. Note that we stress "checkable" instead of "measurable with a tool", since we carefully distinguish between automatic, semi-automatic and manual checking (i.e. inspections) and exploit all three possibilities.
The approach presented in this paper uses a top-down method to identify criteria that fulfil these requirements. The stepwise top-down refinement of goals into subgoals and down to checkable criteria helps to achieve completeness and allows reasoning about the criteria and their interplay. The starting point of this refinement is the breakdown of maintenance tasks into phases and activities according to [11]. Considering the diverse nature of activities, such as "problem understanding" and "testing", it becomes evident that the technical and organisational criteria that actually influence maintenance effort are numerous and diverse. Psychological effects, such as the broken window effect [23], deserve just as much attention as organisational issues (e.g. turnover) and properties of the code like the naming of identifiers [5]. Each of these aspects has a significant impact on maintenance activities and thereby on future maintenance costs.

2. Related work
Several groups have proposed metrics-based methods to measure attributes of software systems which are believed to affect maintenance [1][4]. Typically, these methods use a set of well-known metrics like lines of code, Halstead volume [10], or McCabe's Cyclomatic Complexity [18] and combine them into a single value, called a maintainability index. Although such indices may expose a correlation with economical experiences, they suffer from serious shortcomings. First, their intrinsic goal is to assess overall maintainability, which is, as we claimed above, of questionable use. Second, they limit themselves to properties that can be measured automatically by analyzing source code. Unfortunately, many essential quality issues, such as the usage of appropriate data structures and meaningful documentation, are semantic in nature and inherently cannot be analyzed automatically. Other important technical properties, such as useful documentation and normalised data models, are outside the scope of code analysis. Lastly, the indices and the underlying metrics frequently violate the most basic requirements of measurement theory [9][14]. Because of this, well-known metrics such as the Cyclomatic Complexity are neither sufficient nor necessary to indicate a quality defect.
As the technical dimension of maintainability is a quality attribute of a system, similar to security or safety [13], research on software maintenance adopted many ideas from the broader field of software quality. Quality models aim at describing complex quality criteria by breaking them down into more manageable sub-criteria.


Such models usually organise quality attributes in a tree with an abstract quality attribute like "maintainability" at the top and more concrete ones like "analyzability" or "changeability" on lower levels. The leaf factors are ideally detailed enough to be objectively assessed. The values determined by the metrics are then aggregated towards the root of the tree to obtain values for higher-level quality attributes. This method is often called the decompositional or Factor-Criteria-Metrics (FCM) approach and was first used by McCall [19] and Boehm [3].
Although these and more recent approaches [7][8][17] are superior to purely metrics-based approaches, they also fail to establish a broadly acceptable basis for quality assessments. The reasons for this are the prevalent yet unrealistic desire to condense complex quality attributes into a single value and the fact that these models typically limit themselves to a fixed number of model levels. For example, FCM's three-level structure is inadequate: high-level goals like usability cannot be broken down into measurable properties in only two steps. A further problem is their reluctance to include properties that cannot be measured automatically or that are related not to the product but to the associated organisation. For example, it is incomprehensible why none of the models known to us highlights the influence of organisational issues, like the existence of a configuration management process, on the overall maintenance effort.
Organisational issues are typically covered by process-based approaches to software quality like ISO 9000 or CMMI [20]. However, the underlying assumption that good processes entail high-quality products is a widely disputed misconception [16]. Well-defined processes help to achieve reliability and reproducibility in software projects, but the quality of the outcome still strongly depends on the actual skills, tools and criteria used during development.

3. Modelling criteria and their effects
To provide a solid foundation for assessing "maintainability", we developed a two-dimensional quality model that integrates and explains relevant technical and organisational criteria and describes their impact on actual maintenance activities. The following paragraphs describe the rationale and the structure of the model. Further details can be found in [28].

3.1. Acts and facts need to be separated
In an initial step we tried to answer the question "What are the factors that influence maintenance productivity?" by collecting relevant ideas from related work and building an FCM-like decompositional quality model from there. Unlike Dromey [7], we did not build the model bottom-up starting from the measurable criteria but tried to build it top-down, to ensure that all criteria considered relevant for maintenance productivity were collected, independently of how difficult their measurement or assessment could become. The incremental refinement of these factors showed that preserving a consistent model that adequately described the interdependencies between the various quality criteria soon became very hard. The reason for this was that our model, just like other well-known models, mixed up nodes of two very different kinds: activities and characteristics of the system. An example of this problem is shown in Figure 1, which shows the maintainability branch of Boehm's Software Quality Characteristics Tree [3].


Figure 1: Software Quality Characteristics

Though adjectives are used as descriptions, the nodes in the gray boxes refer to activities, whereas the uncoloured nodes describe system characteristics. So the model should rather read as: when we maintain a system we need to modify it, and this activity of modification is somehow influenced by the structuredness of the system. While this may not look important at first sight, we claim that this mixture of activities and characteristics is at the root of most problems encountered with known quality models. Because of this mixture, the semantics of the edges of the tree is unclear or at least ambiguous. And since the edges do not have a clear meaning, they neither indicate a sound explanation for the relation of two nodes nor can they be used to aggregate values. As the actual maintenance effort strongly depends on both the type of system and the kind of maintenance activity, it should be obvious that the need to distinguish between activities and characteristics is imperative.

3.2. Acts-facts matrix
The separation of activities from facts leads to a two-dimensional model that regards activities and facts as rows and columns of a matrix, with explanations for their interrelation as its elements. The selection of activities depends on the particular development and maintenance process of the organisation that uses the quality model. Here, we use the IEEE 1219 standard maintenance process [12] as an example. An excerpt from its activity breakdown structure is shown in Figure 2a. Now the edges of the activity tree have the clear meaning of task composition. The second dimension of the model, the facts about the situation, is modelled similarly to an FCM model but without activity-based nodes like readability (Figure 2b). Again, the semantics of the edges within this tree is unambiguous, though different from the activity tree.

Figure 2a: Example activity (a) and fact (b) trees - Maintenance


Figure 2b: Example activity (a) and fact (b) trees - FCM

Obviously, the granularity of the facts shown in the example is too coarse for a proper evaluation of the facts. In practice, we refine the situation tree stepwise down to detailed, tangible facts that we call atomic. Since many important atomic facts are semantic in nature and inherently not computable, we carefully distinguish three fact categories:
• Computable facts that can be extracted or measured with a tool.
• Facts that require manual inspection.
• Facts that can be computed to a limited extent, requiring additional manual inspection. One example of this is dead code analysis.
The interrelation between atomic facts and activities can best be expressed by a matrix, as depicted in the simplified Figure 3. The matrix points out how facts affect activities (here simplified as true/false), allows results to be aggregated from the atomic level onto higher levels in both trees thanks to the unambiguous semantics of the edges, and also allows the integrity of the model to be cross-checked. For example, the sample matrix states that tools do not affect coding, which is untrue and due to the incompleteness of the example, which does not consider tools like modern integrated development environments.

Figure 3: Matrix model explaining maintenance effort
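A minimal sketch of such an acts-facts matrix is given below, with Boolean impact flags as in the simplified Figure 3; the facts and activities listed are hypothetical examples, not the model used in the industrial projects.

```python
# Minimal sketch of an acts-facts matrix with true/false impact flags,
# as in the simplified Figure 3. Facts and activities are hypothetical examples.
facts = {
    "identifier naming":        {"analyse": True,  "implement": True,  "test": False},
    "code redundancy":          {"analyse": True,  "implement": True,  "test": True},
    "configuration management": {"analyse": False, "implement": True,  "test": True},
}

def facts_affecting(activity):
    """Aggregate along one activity: which facts influence it?"""
    return [fact for fact, impacts in facts.items() if impacts[activity]]

print(facts_affecting("test"))  # ['code redundancy', 'configuration management']
```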


4. Experiences
The meta-model described above was first designed and used to develop a quality model within a large-scale commercial project in the field of telecommunications. The system regarded and assessed in this context consisted of 3.5 million lines of code (MLOC) in C++, COBOL and Java, was 15 years old and was under active development by 50-100 developers who completed 150 change requests per year. The quality model was later refined, extended and used by three further large-scale and well-known industrial organisations. One of these organisations now uses this model as the foundation for designing its company-wide coding and documentation rules [26]. Another organisation used it to assess the state of a significant part of its core software landscape, consisting of more than 10 MLOC of legacy software written in COBOL and PL/I. Due to confidentiality requirements, detailed results of these practical experiences cannot be disclosed. However, we are able to summarise important results and describe selected aspects of the concrete model instances used in these practical settings to provide further information on the practical applicability of the model.

4.1. Relevant activities
As can be expected, the task of deriving the activity tree is usually fairly simple and is best accomplished by transforming the software development and maintenance process of the organisation, with its task, subtask and activity structure, into an integrated tree representation. The topmost node is usually either "software development" or "software maintenance". From there it splits into the typical tasks: analyze, design, implement, test (subtasks "prepare test environment", "perform test case", "interpret test results"), deploy ("build", "integrate", "install", "run"), document, and operate.
While all of these activities can easily be refined further into more detailed activities from a technical point of view, the reasonable refinement is in practice limited by the possibility of monitoring the actual effort. The maximum useful refinement is predetermined by the possible entries of the time recording system. The impact of changing a fact in the situation on a certain activity cannot be tracked if that activity is not recorded individually in the time recording system. Activities of this kind can only be used as a rationale for selecting a situation fact. In turn, the matrix model has helped to identify weaknesses in time recording systems and has already fostered more detailed time recording in some organisations, paving the way for a goal-oriented and truly economics-driven optimisation.

4.2. Facts about the situation
In all scenarios where the quality model has been used so far, the situation tree grew to at least 250 different facts about the situation, which sounds large and complicated but is actually not surprising, because the productivity of software maintenance activities is indeed severely influenced by many different aspects. Due to space constraints, we can only mention a few of them as examples.
First of all, the quality models used in practice regard not only the source code but also organisational aspects (skills, distribution of knowledge within the team, turnover, tools, structured processes, roles, test suite, coding and documentation rules, etc.), the documentation of the system, the data model and even the actual data that is stored in the information system. The latter is usually neglected in other quality models but is of high relevance. When performing changes on a large-scale system, one has to ensure that the change is not only compatible with the existing code but also with the existing (legacy) data. For example, even if certain data violates new assumptions, such as a minimum or a maximum allowed time-stamp, the existing data either needs to be transformed or the new functionality must be prepared for possible exceptions to its rules. In both cases, the legacy data significantly increases the time needed for the activities "impact analysis", "design", "implementation", and eventually "deploy".
Similarly, we and our industrial partners found various aspects of data models to be of particular importance. We therefore introduced metrics about the data model such as the maximum number of primary key attributes per table, the maximum number of attributes per table and the number of entries per table. In practical settings these facts proved to be highly useful in detecting duplicated attributes (e.g. in a table with more than 150 attributes), lack of normalization and further shortcomings that negatively influence various activities.
Among the many important facts of the source code subtree of the situation tree, we found two facts to be extremely helpful. The first is code redundancy; the second is the IF-ratio, i.e. the number of simple conditionals ("IF") per 1000 lines of code (kLOC). We argue that redundancy duplicates the effort needed for analysis, implementation and testing. As described in [27], we consider redundancy to be a main cost driver in software maintenance. In our practical work, we frequently found systems with 40% redundancy but also encountered systems with 90% redundancy. The IF-ratio proved to be a useful indicator for the algorithmic model behind a software system and its design. While well-structured systems typically expose a rate of 20-30 IFs per kLOC, we detected values of 60 and up to 100 IFs per kLOC in legacy systems. The main reason detected for these striking values was consistently change requests that were performed under strong time pressure by simply adding new cases and exceptions to existing algorithms. The consequence of this is that analyzing and testing such systems becomes extremely difficult and time-consuming.
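A rough sketch of how the IF-ratio could be computed from a set of source files is shown below. The counting is purely lexical, ignores neither comments nor string literals, and is therefore only an approximation of the measure described above; the file list is a placeholder.

```python
import re

# Rough sketch: lexical count of simple conditionals per 1000 lines of code (kLOC).
# Comments and string literals are not excluded, so this only approximates the IF-ratio.
def if_ratio(source_files):
    ifs, lines = 0, 0
    for path in source_files:
        with open(path, encoding="utf-8", errors="ignore") as f:
            for line in f:
                lines += 1
                ifs += len(re.findall(r"\bif\b", line, flags=re.IGNORECASE))
    return 1000.0 * ifs / lines if lines else 0.0

# Values around 20-30 correspond to the well-structured systems reported above,
# values of 60-100 to the legacy systems.
```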

4.3. Using the quality model for assessments
Using the proposed quality model to assess the state of large-scale software systems has proved to yield valuable information for both the technical staff and project and product managers. In all cases, the unusual criteria used in the model revealed severe weaknesses that had not been regarded before, e.g. 90% redundancy or constant values repeated 20 times on average. In addition to this, the integration of the two dimensions, activity and situation, in this model provided a common ground for members of the technical staff and managers to identify possible optimisations. The model allows developers to explain to managers the commercial impact of a property of the organisation or the system. In turn, managers use the model to identify the facts about the situation that drive unwanted costs of certain activities.

5. Conclusion
Although maintainability is undisputedly considered one of the fundamental quality attributes of software systems, the research community has not yet produced a sound and accepted definition or even a common understanding of what maintainability actually is. Substantiated by various examples, we showed that this shortcoming is due to the intrinsic problem that there simply is no such thing as "the maintainability of a software system". We showed that the factors that influence maintenance productivity must be put into context with particular activities. This notion is captured by our novel two-dimensional quality model for software maintenance, which maps facts about a development situation to maintenance activities and thereby highlights their relative influence. This model has been successfully applied in industrial settings to reveal significant economic optimization potential.

6. References
[1] G. M. Berns, "Assessing software maintainability", ACM Communications, 27(1), 1984.
[2] B. Boehm, "Software Engineering Economics", Prentice-Hall, 1981.
[3] B. W. Boehm et al., "Characteristics of Software Quality", North-Holland, 1978.
[4] D. Coleman, D. Ash, B. Lowther, and P. W. Oman, "Using metrics to evaluate software system maintainability", Computer, 27(8), 1994.
[5] F. Deissenböck and M. Pizka, "Concise and consistent naming", in IWPC 2005, pages 97-106, Washington, DC, USA, 2005, IEEE Computer Society.
[6] F. Deissenböck, M. Pizka, and T. Seifert, "Tool-supported realtime quality assessment", Pre-Proceedings of STEP 2005, Budapest, Hungary, 2005.
[7] R. G. Dromey, "A model for software product quality", IEEE Trans. Softw. Eng., 21(2), 1995.
[8] R. G. Dromey, "Cornering the chimera", IEEE Software, 13(1), 1996.
[9] N. Fenton, "Software measurement: A necessary scientific basis", IEEE Trans. Softw. Eng., 1994.
[10] M. Halstead, "Elements of Software Science", Elsevier Science Inc., New York, NY, USA, 1977.
[11] C. S. Hartzman and C. F. Austin, "Maintenance productivity", in CASCON 1993, IBM Press, 1993.
[12] IEEE 1219, "Software maintenance", Standard, IEEE, 1998.
[13] ISO 9126-1, "Software engineering - Product quality - Part 1: Quality model", ISO, 2003.
[14] C. Kaner and W. P. Bond, "Software engineering metrics: What do they measure and how do we know?", in METRICS 2004, IEEE CS Press, 2004.
[15] K. Katheder, "Studie zur Software-Wartung", Bachelor thesis, TU München, Germany, 2003.
[16] B. Kitchenham and S. L. Pfleeger, "Software quality: The elusive target", IEEE Software, 1996.
[17] R. Marinescu and D. Ratiu, "Quantifying the quality of object-oriented design: The factor-strategy model", in WCRE 2004, IEEE CS Press, 2004.
[18] T. J. McCabe, "A complexity measure", in ICSE 1976, IEEE CS Press, 1976.
[19] J. McCall and G. Walters, "Factors in Software Quality", The National Technical Information Service (NTIS), Springfield, VA, USA, 1977.
[20] M. Paulk, C. V. Weber, B. Curtis, and M. B. Chrissis, "The Capability Maturity Model: Guidelines for Improving the Software Process", Addison-Wesley, 1995.
[21] T. M. Pigoski, "Practical Software Maintenance", Wiley Computer Publishing, 1996.
[22] STSC, "Software Reengineering Assessment Handbook v3.0", Technical report, STSC, U.S. Department of Defense, March 1997.
[23] J. Q. Wilson and G. L. Kelling, "Broken windows", The Atlantic Monthly, 249(3), 1982.
[24] B. Wix and H. Balzert, editors, "Softwarewartung. Angewandte Informatik", BI Wissenschaftsverlag, 1988.
[25] M. Fowler, "Who needs an Architect?", IEEE Software, 20(5), pages 11-13, September 2003.
[26] F. Deissenböck, S. Wagner, M. Pizka, S. Teuchert, and J.-F. Girard, "An Activity-Based Quality Model for Maintainability", submitted to Int. Conf. on Software Maintenance, Paris, 2007.
[27] M. Pizka, "Code Normal Forms", NASA SEW-29, Greenbelt, MD, April 2005.
[28] M. Broy, F. Deissenböck, and M. Pizka, "Demystifying Maintainability", Proceedings of the 4th Workshop on Software Quality, ACM Press, Shanghai, China, 2006.




Tracking Software Degradation: The SPIP case study

Miguel Lopez, Naji Habra

Abstract
A main assumption behind the majority of refactoring works states, roughly, that the normal evolution of any software deteriorates its structure in such a way that, at a certain point, refactoring becomes interesting or even necessary to recover a quality structure. Of course, this general idea requires several clarifications about the software itself, the evolution process, the structural properties being deteriorated, etc. Given this, an important question is: how can we track this degradation in order to better apply refactoring recommendations? The current paper aims at answering this question by suggesting a set of structural metrics (based on fan-in, fan-out and cyclomatic complexity) that can highlight pieces of code whose degradation is so high that refactoring actions are necessary in order to break this deterioration. The paper also provides an empirical case study in which we experiment with these structural metrics.
This empirical study is achieved by investigating the evolution of one open-source CMS (i.e., SPIP) through a succession of 9 releases. The experiment follows the classical Goal-Question-Metric paradigm, which proposes to divide the experimentation into three levels: a conceptual level (determining the experimentation goal), an operational level (the questions that express the goal in an operational formulation) and a quantitative level (the metrics chosen to answer the questions). Our experiment is structured as follows:
• The goal is to check whether software evolution presents an identifiable point where refactoring becomes necessary and to identify that point.
• The questions issued from that goal are: what is well-structured software, how does this attribute evolve through the succession of releases, and at what point does that attribute become bad enough to require refactoring.
• To answer the above questions we select simple structural "metrics" computed on the dependency graph whose vertices are files and whose oriented edges represent the include relationship between files. Starting from such a graph, the following measurements are used: the cyclomatic complexity of the whole graph, the indegree of a vertex f (i.e., the number of files including the file f), the outdegree of a vertex f (i.e., the number of files included in the file f), and the ratio between indegree and outdegree for each file.
The direct finding of this first case study is the possibility of identifying a particular point where the structure becomes deteriorated to a severe degree, which could intuitively correspond to the point where experts judge that refactoring becomes necessary.

1. Introduction
Software degradation remains a poorly known and understood phenomenon in the software engineering community. Even though this phenomenon seems to be as expensive as high requirement change rates or ambiguous specifications, software degradation remains largely unexplored. However, it is beginning to be considered a serious issue that must be thoroughly investigated.


But what does software degradation mean? How can we define software degradation? According to [5], "A central feature of the evolution of large software systems is that change – which is necessary to add new functionality, accommodate new hardware, and repair faults – becomes increasingly difficult over time". Due to changes in business or technical needs, the software becomes more difficult to modify. The effort to adapt, correct, or modify a given piece of software increases as versions are released.
Software degradation, or code decay, appears to be involved in software maintenance. It is usually accepted that software degradation begins once the first version has been delivered, and that from that moment the software enters a degradation phase characterised by an increasing modification effort. The first case studies, dating from the end of the sixties, can be found in [10]. Today, an important community is involved in working out the principles of software evolution [11][6][4][3][14].
However, even if software evolution has a rich scientific literature, software degradation is difficult to highlight objectively. Some measures can be found in the state of the art that allow tracking a certain evolution. Moreover, those measures have been very useful in determining the very first formulation of the software evolution principles. Among these measurements, we find the traditional software metrics like lines of code, number of modules, and so on. Beyond these traditional measurements, others have been suggested by researchers in software degradation:
• Code decay indices [5].
• Number of programs and number of added programs [6].
• Measures on the call graph: a call is a single link between functions within a code module [4].
• Entropy measures [3].
Only the last two propositions, [4] and [3], take into account the interdependencies between software elements (modules, functions, …). We note this point because we are convinced that the software degradation problem can be understood (or better understood) if and only if the analysis, and therefore the measures that support this analysis, consider these interdependencies as a central property in terms of degradation.
Actually, that is the main position of the current paper: focusing on the dependencies that can be found within software. In other words, we will investigate the dependencies discovered within a source code and set up a means to track their evolution and/or degradation.
So, an important question is: how can we track this degradation in order to better apply refactoring recommendations? The current paper aims at answering this question by suggesting a set of structural metrics (based on fan-in, fan-out and cyclomatic complexity) that can highlight pieces of code whose degradation is so high that refactoring actions are necessary in order to break this deterioration. The paper also provides an empirical case study in which we experiment with these structural metrics.


This empirical study is achieved by investigating the evolution of one open-source CMS (i.e., SPIP) through a succession of 9 releases. The experiment follows the classical Goal-Question-Metric paradigm, which proposes to divide the experimentation into three levels: a conceptual level (determining the experimentation goal), an operational level (the questions that express the goal in an operational formulation) and a quantitative level (the metrics chosen to answer the questions). The direct finding of this first case study is the possibility of identifying a particular point where the structure becomes deteriorated to a severe degree, which could intuitively correspond to the point where experts judge that refactoring becomes necessary.

2. Problem Statement & Lehman's Laws
Lehman has suggested a set of eight laws [12] that have been verified in several ways during the last decades. Those laws, or statements, describe how software in the real world evolves from release to release. The laws emerged from a set of data obtained during a 1968 study of the software process [10], which led to an investigation of the evolution of OS/360 [11] and, over a period of twenty years, to the formulation of the eight Laws of Software Evolution. The current work is focused on Lehman's second law, which is stated as follows:

Table 1: Lehman's Second Law
Increasing Complexity: As a program is evolved its complexity increases unless work is done to maintain or reduce it.

This law results from the imposition of change upon change as the system is adapted to new needs. As the need for adaptation arises and changes are successively implemented, interactions and dependencies between the system elements increase in an unstructured pattern and lead to an increase in system complexity.
A first point to discuss concerning this law is the system level. Lehman suggests modelling the software within this second law as a whole system. Indeed, the changes operate as triggers that have an impact on the system elements and their relations. Observing the software at the system level should allow capturing such a phenomenon of increasing complexity. This system level eases the study of the interdependencies that are considered, in the current work, as a central property of the software in terms of its degradation.
A second point is the second part of the law's statement. Lehman suggests that some maintenance process must operate to reduce the increase in complexity. That point is very interesting even if it is not discussed here. Nevertheless, Lehman suggests modelling the software at the system level, and moreover he proposes to consider the maintenance effort as the means to prevent the increase (or degradation). Furthermore, this second point reduces the law to a maintenance problem. Indeed, the maintainer (or programmer) must understand the source code in order to modify it. The source code can be seen as a text that the maintainer/programmer must understand. So, the second law states that it becomes more and more difficult to understand and change this text, that is, the source code, if no maintenance effort is performed. The problem we face as practitioners is to monitor this complexity that hampers understanding and modification. How can we track the complexity evolution?


So, this second law treats software degradation in terms of complexity. Indeed, the more the software ages, the higher its complexity. But what does software complexity mean? A lot of work has been done on complexity [13][9][7][8][1]. According to these studies, software complexity has several definitions (computational, cyclomatic, information, …) depending on the goal of the investigation. Obviously, software complexity remains a multi-faceted concept that cannot be captured in one single definition, and providing such a definition is not the goal of the current paper. Within the current work, software complexity means the difficulty to maintain (understand and change) a given piece of software. This viewpoint is developed in [16]. The difficulty to maintain software is also closely related to Lehman's second law.
Software degradation is thus stated in terms of software complexity, software evolution, and the difficulty to maintain the software. Now, the question remains: how can we measure the increasing complexity that is described in Lehman's second law, and how can we highlight this degradation or increasing complexity?
Note that we use software degradation and increasing complexity as synonyms. Indeed, the more complex a software system is, the more difficult it is to maintain. Both are closely related concepts and, often, both concepts express the same single idea.

3. Empirical Study Description
3.1. Goal Question Metric Approach
The Goal-Question-Metric (GQM) approach is a top-down methodology for empirical studies that selects ad-hoc metrics on the basis of higher-level goals (Ref). This approach is a three-fold process:
• Conceptual level (goal): a goal is defined for an object, for a variety of reasons, with respect to various models of quality, from various points of view and relative to a particular environment.
• Operational level (question): a set of questions is used to define models of the object of study and then focuses on that object to characterise the assessment or achievement of a specific goal.
• Quantitative level (metric): a set of metrics, based on the models, is associated with every question in order to answer it in a measurable way.
It is important to note that GQM contributes to, but does not replace, theory elaboration and/or model construction.

3.2. GQM Applied
The application of the GQM method to the current study gives the following:
• Goals:
  o G1: Validate the hypothesis that software structure deteriorates over time and over releases (Lehman's second law of increasing complexity).
  o G2: Find a hint to determine the time t at which this happens (if any).


• Questions:
  o What is well-structured software?
    § Loosely coupled?
    § Of low complexity?
    § Sufficiently commented?
    § Software with design patterns?
    § Software with a reduced number of paths?
  o How do these attributes evolve through the succession of releases? Linearly? Geometrically?
  o Is there a particular, identifiable acceleration/stabilisation point?
• Metrics: to answer those questions, we suggest modelling the software as Lehman proposed, that is, as a whole system. Since the phenomenon we are trying to highlight is the deterioration of the software code and the increasing difficulty of maintaining it, the software elements we are interested in are the source code files. Indeed, software maintenance can be related to the degree of structure of the source code seen as a text. This text can have a certain degree of structure (distributed throughout interconnected files), which eases understanding and modification.
In PHP (the programming language of SPIP), one statement allows the code (the text) to be constructed in a structured way (distributed throughout different files): the include statement. The include statement1 provides the developer with a means to construct his/her source code by extracting a part of a file and putting it in another file. This other file is then included in the first one. When a file is included, the code it contains inherits the variable scope of the line on which the include occurs. Any variables available at that line in the calling file will be available within the called file, from that point forward. However, all functions and classes defined in the included file have global scope. Table 2 shows an example of an include statement where the code includes the prepend.php file.

Table 2: Include Example

The include statement and its synonyms are a basic means of constructing the source code in a structured way. Considering the set of called and calling files therefore provides a way to model the software as a whole. Indeed, the source code can be seen as a graph where the nodes are the files and the edges are include relations. This graph shows the dependencies between files in the source code and is a good candidate for modelling the software at the system level.

1 The include statement has some synonyms: include_once, require and require_once. All four statements are considered to have the same impact on the structure in this study.


Now, some graph metrics can be applied to study the evolution of the software. The current work focuses on the following graph metrics:
• # Nodes: the number of nodes (or files).
• # Edges: the number of edges (or include relations).
• Cyclomatic complexity, applied to the dependency graph in the following way:
  o e – n + 1, with e the number of edges and n the number of nodes.
  o This metric can be interpreted as the number of independent cycles of the dependency graph whose nodes are the files and whose edges are the include relations.
Note that the include statement allows the code to be constructed in a structured way. However, the more include statements are used in a given code base, the greater the number of cycles (paths) that can be traversed through this graph. This means that the difficulty of understanding (navigating the graph of files and include relations) and therefore of modifying the software is related to the number of include statements. The include statement allows a structured source code to be constructed, but beyond a given point it hinders maintainability. In that sense, we expect to find patterns in the evolution of the cyclomatic complexity of the whole dependency graph that allow observing Lehman's second law, that is, the increasing difficulty of understanding and modifying the software. In other words, we expect to find a pattern showing the increasing number of paths that traverse the software dependency graph made up of files and include statements.
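A sketch of how such a dependency graph and its cyclomatic number could be extracted from PHP sources is given below. The regular expression is a simplification (it only resolves literal file names), the directory path is a placeholder, and this is not the tooling actually used in the study.

```python
import os, re

# Sketch: build the file dependency graph from PHP include/require statements
# and compute its cyclomatic number e - n + 1. Only literal file names are
# resolved, so this is an approximation of real include resolution.
INCLUDE_RE = re.compile(
    r'\b(?:include|include_once|require|require_once)\s*\(?\s*["\']([^"\']+)["\']')

def dependency_graph(root):
    nodes, edges = set(), set()
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(".php"):
                continue
            nodes.add(name)
            with open(os.path.join(dirpath, name), encoding="utf-8", errors="ignore") as f:
                for target in INCLUDE_RE.findall(f.read()):
                    nodes.add(os.path.basename(target))
                    edges.add((name, os.path.basename(target)))
    return nodes, edges

nodes, edges = dependency_graph("spip-src/")  # placeholder path to a SPIP release
print(len(nodes), len(edges), len(edges) - len(nodes) + 1)  # n, e, cyclomatic number
```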

3.3. Results
Table 3 shows the data related to the 9 versions of SPIP [15].

Table 3: Data SPIP
Version    #nodes   #edges   complexity
v104         83      104        22
v11          85      109        25
v12          93      121        29
v13         100      138        39
v14pr5      122      264       143
v15pr4      125      302       178
v16         138      324       187
v17         191      393       203
v18         279      513       235

Figure 1 shows the same results as a line chart.

Figure 1: Line chart of the evolution of complexity, number of nodes and number of edges across the nine SPIP releases

We can observe that the increase in complexity and in the number of edges accelerates after v13.

4. Results Discussion
Firstly, this case study does not constitute formal evidence of degradation and of its breaking point. It is only a case study that offers some new considerations about software degradation.
Secondly, as we can see in Figure 1, the increase in complexity accelerates after version v13. This point could be the breaking point. It could mean that in version v14pr5 there is such a number of cycles (paths traversing the dependency graph) that it becomes more difficult for the maintainer (or programmer) to easily understand and modify the source code (that is, the text). As the software evolves, the number of ways to navigate from one file to another increases, which could increase the effort to understand and therefore to change the software.
Thirdly, Figure 1 shows an evolution as Lehman's second law states. Indeed, this law describes the increasing evolution of the complexity of a program. Until now, this second law had been studied empirically with metrics like lines of code, number of programs, entropy, and so on. In the current work, we proposed a metric of the complexity of the whole software seen as a dependency graph between files. From this viewpoint, this metric can be considered a good candidate to support Lehman's second law.
Fourthly, this measure takes the maintenance phenomenon into account. Indeed, the dependency graph models the software as the maintainer sees it, that is, a set of related files through which he/she must navigate in order to understand and thereafter modify the program.
Finally, this measure of complexity can also be applied to other languages. We will compute it for C and COBOL programs, and this will be the topic of our future work.


5. Conclusion
In the current work, we have proposed a new measure to highlight the increasing difficulty of understanding and modifying software as it is released. In the current case study, the number of cycles of the dependency graph shows that the increase is exponential, which corresponds with other observations [11]. Nevertheless, even if the observation is similar, the suggested measure is closer to the concept of software complexity in terms of maintenance and of Lehman's laws than the classical measures. Indeed, the dependency graph can be seen as a network that the maintainer navigates in order to understand and modify the software. The more cycles the maintainer must walk through, the less easily he/she will understand the software in order to modify it. However, the breaking point has not been formally identified. Some points in Figure 1 could be considered breaking points, but further investigation is needed to identify them.

6. References
[1] Abran, A., Lopez, M., and Habra, N., "An Analysis of the McCabe Cyclomatic Complexity Number", in IWSM Proceedings, Berlin, 2004.
[2] Blaine, J.D. and Kemmerer, R.A., "Complexity Measures for Assembly Language Programs", JSS, 5, 1985.
[3] Bianchi, A., Caivano, D., Lanubile, F. and Visaggio, G., "Evaluating Software Degradation through Entropy", Seventh International Software Metrics Symposium (METRICS'01), p. 210.
[4] Burd, E. and Munro, M., "An initial approach towards measuring and characterising software evolution", in Proceedings of WCRE'99, pages 168-174, IEEE Computer Society, 1999, http://citeseer.ist.psu.edu/burd99initial.html.
[5] Eick, S.G., Graves, T.L., and Karr, A.F., "Does Code Decay? Assessing the evidence from change management data", IEEE Transactions on Software Engineering, Vol. 27, No. 1, January 2001.
[6] Gall, H., Jazayeri, M., Klösch, R.G. and Trausmuth, G., "Software Evolution Observations Based on Product Release History", Proceedings of the Conference on Software Maintenance, 1997, pp. 160-166, http://citeseer.ist.psu.edu/gall97software.html.
[7] Gill, G., and Kemerer, C., "Cyclomatic Complexity Density and Software Maintenance Productivity", IEEE Transactions on Software Engineering, December 1991.
[8] Heimann, D., "Complexity and Defects in Software - A CASE Study", Proceedings of the 1994 McCabe Users Group Conference, May 1994.
[9] Kafura, D., and Reddy, G., "The Use of Software Complexity Metrics in Software Maintenance", IEEE Transactions on Software Engineering, March 1987.
[10] Lehman, M.M., "The Programming Process", IBM Res. Rep. RC 2722, IBM Res. Centre, Yorktown Heights, NY 10594, Sept. 1969.
[11] Lehman, M.M. and Belady, L.A., "Program Evolution - Processes of Software Change", Academic Press, London, 1985, pp. 538.
[12] Lehman, M.M. and Parr, F.N., "Program evolution and its impact on software engineering", Proceedings of the 2nd International Conference on Software Engineering, pp. 350-357, October 13-15, 1976, San Francisco, California, United States.
[13] McCabe, T.J., "A Complexity Measure", IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, October 1976.
[14] Parnas, D.L., "Software aging", Proceedings of the 16th International Conference on Software Engineering, pp. 279-287, May 16-21, 1994, Sorrento, Italy.
[15] SPIP, http://www.spip.org.
[16] Zuse, H., "Software Complexity - Measures and Methods", Walter De Gruyter Inc., December 1990.


A Measurement Approach Integrating ISO 15939, CMMI and the ISBSG Luc Bégnoche, Alain Abran, Luigi Buglione

Abstract
In recent years, a number of well-known groups have developed sets of best practices on software measurement, but from different perspectives. These best practices have been published in various documents, such as ISO 15939, the CMMI model and the ISBSG data repository. However, these documents were developed independently and, as a result, it is a challenge for a software engineering organization initiating a measurement program to work out a strategy that leverages the benefits of each while at the same time offsetting gaps. First, although ISO 15939 (Software Measurement Process) is an international standard which defines the activities and tasks that are necessary to implement a software measurement process, its activities and tasks are defined at a very high level, so additional support is necessary for ease of implementation. Second, while CMMI (Capability Maturity Model Integration) is a model which contains the essential elements of an effective software engineering process and is now strongly measurement-oriented, it provides guidance on which elements need measurement but does not provide specific guidelines for defining specific measures, nor does it support an international repository of project measurement results. Third, the International Software Benchmarking Standards Group (ISBSG) provides a repository of project data which may be used for benchmarking and for the development of estimation models. This paper proposes an approach to integrating resources such as ISO 15939, CMMI and the ISBSG data repository in support of a software engineering measurement program.

1. Introduction
Software engineering, like any other engineering discipline, can benefit from continuous improvement. This requires that the actual performance of a software engineering process be objectively evaluated and assessed against a baseline, that an improvement program be designed and implemented, and, finally, that the impact of any improvement made be objectively evaluated. A body of best practices has been published in various documents, such as ISO 15939 [1][2], CMMI [3] and the International Software Benchmarking Standards Group (ISBSG) data repository and its related glossary, data collection questionnaire and releases [4]. However, these documents were developed independently. For a software engineering organization initiating a measurement program, it is therefore challenging to work out a strategy to leverage the benefits of each, while at the same time offsetting gaps. But where does one begin when starting up a new software measurement program?
ISO 15939 is a must when the time comes to implement a software measurement program, as it covers all the activities and tasks necessary for a successful implementation. However, this international standard is not sufficient by itself, and additional knowledge coupled with considerable expertise is still needed. For instance, this international standard clearly states that it “does not catalogue software measures, nor does it provide a recommended set of measures to apply on software projects” [1].


Instead, it provides guidance for “defining a suitable set of measures that address specific information needs” [1]. It remains, however, that these information needs must be worked out and measures found to help meet them [2].
CMMI defines goals and practices covering multiple maturity levels and multiple process areas. These goals and practices may be used to provide more guidance about which elements of a software engineering process need measurement and to identify some of the information needs. This paper includes an analysis of CMMI in order to assess whether or not this model could be used, along with ISO 15939, as a starter kit for planning a software measurement process.
However, neither ISO 15939 nor CMMI provides detailed data which can be of immediate use to organizations for benchmarking or guidance purposes. Such data is, however, available from the ISBSG [4], which provides benchmarking standards based on ISO 15939, as well as a repository of over 4,000 projects as of early 2007. Could the ISBSG be used as a turnkey solution when the time comes to implement a software measurement process? This paper looks into this question as well.
Section 2 presents ISO 15939 and section 3 a mapping between ISO 15939 and CMMI. Section 4 presents the measurement view incorporated in the CMMI model and section 5 the ISBSG. Finally, our conclusions are presented in section 6.

2. ISO 15939 – An Overview
The ISO 15939 international standard documents the required components of a software measurement process and includes a number of appendices for additional guidance. The software measurement process is described in terms of activities and tasks only; properties such as entry criteria, exit criteria and work products are not defined. The appendices provide useful information, such as a measurement information model, examples of specific measures using the model, the work products that may be produced by the process and examples of criteria for evaluating some work products.

2.1. Software Measurement Process
The software measurement process consists of four activities (Figure 1):
• “Establish & sustain measurement commitment”: The scope is defined, the necessary commitment is established and resources are assigned.
• “Plan the measurement process”: Information needs are identified, information products are defined, measurement procedures are defined and supporting technologies are acquired.
• “Perform the measurement process”: Data are collected, meaningful information products are produced and results are communicated.
• “Evaluate measurement”: Information products are evaluated and potential improvements of the measurement process are identified.


Figure 1: Software Measurement Process Model [1]

Only the second and third activities are considered to constitute the “core measurement process”, which takes information needs as input and produces information products as output. These information products are used by measurement users as an objective basis for communication and decision-making.

2.2. Information Needs & Products
For each information need, there is a corresponding information product that satisfies it. An information product comprises one or more indicators and their associated interpretations. Information products are, however, conceptually far removed from the measured entities; this is why a detailed measurement information model needs to be defined (Figure 2). Starting from the measured attributes, measurement methods, measurement functions, algorithms and criteria are applied before actual values can be assigned and interpreted. However, in order to stay within the scope of this paper, only the information needs and products (indicators and interpretations) are considered.
In [2], the hierarchy of concepts in the Measurement Information Model illustrated in Figure 2 has been subdivided into three sets for ease of understanding (see Figure 3):
• Data collection: includes the measurement methods and the base measures.
• Data preparation: includes the agreed-upon mathematical formula and related labels (e.g. measurement functions and derived measures).
• Data analysis: includes the analysis models, indicators and interpretation.
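As a small illustration of these three sets (the numbers, the information need and the threshold are hypothetical, not taken from the standard), the flow from base measures to an indicator might look as follows:

# Illustrative walk through the Measurement Information Model
# (hypothetical information need: "is the delivered product reliable enough?").

# Data collection: base measures obtained by applying measurement methods.
defects_found = 42          # counted from the defect log
functional_size_fp = 350    # measured functional size, in function points

# Data preparation: a measurement function combines base measures
# into a derived measure.
defect_density = defects_found / functional_size_fp   # defects per FP

# Data analysis: an analysis model turns the derived measure into an
# indicator, and an interpretation answers the information need.
threshold = 0.15            # hypothetical organizational target
indicator = "acceptable" if defect_density <= threshold else "needs attention"
print(f"defect density = {defect_density:.3f} defects/FP -> {indicator}")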


Figure 2: ISO 15939 Measurement Information Model [1]

3. ISO 15939 vs. CMMI
ISO 15939 defines a software measurement process which takes information needs as input in order to produce useful information products as output. But is it possible to obtain guidance about those information needs? What if the measurement users, especially managers, do not know about software measurement and, consequently, about their own information needs?

3.1. Using Both
Fortunately, it is possible to create a starter kit by using both ISO 15939 and CMMI. Indeed, CMMI, version 1.2 [3], offers guidance about which elements of a software engineering process need measurement, and, because it is a software engineering process model applicable to both the software and the systems engineering domains, this model is used extensively as a process improvement model. Hence, it is possible to extract information needs from CMMI and use them as input for the core software measurement process defined in ISO 15939. This new information flow (see Figure 4) may be particularly useful for an emerging business that does not have personnel with the substantial knowledge and expertise required to drive a software measurement process. The new information flow would be used during task “5.2.2.1 Information needs for measurement shall be identified” [1].


Figure 3: Hierarchy of concepts in the Measurement Information Model [2]

Figure 4: CMMI and the Software Measurement Model


3.2. Methodology
ISO 15939’s task 5.2.2.1 states that the “information needs are based on: goals, constraints, risks, and problems (which originate from the technical and management processes)” [1]. In a context where CMMI is used, some of the information needs will be based on the goals and practices defined in that context. Within the scope of this paper, a CMMI goal or practice related to measurement is considered to be a goal or practice which:
• Generates data that could be analysed in order to produce an objective basis for communication or decision-making.
• Involves decision-making that would benefit from objective information.
• Explicitly requires measurement as part of the measurement process.
Unfortunately, the first and second criteria may generate many information needs that do not have the same relevance. Hence, it is important to distinguish the following levels of relevance:
• “Mentioned”, when the information need is based on the first and/or second criterion;
• “Recommended”, when the information need is expressed in terms of measurement without being explicitly required as a part of the measurement process;
• “Required”, when the information need is based on the third criterion.
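A possible way to record the outcome of such an extraction is sketched below; the practice identifiers and the wording of the needs are placeholders, not quotations from CMMI.

# Illustrative record structure for the extraction described above.
# Practice identifiers and texts are placeholders, not CMMI quotations.
from dataclasses import dataclass

@dataclass
class InformationNeed:
    cmmi_reference: str      # e.g. a practice or sub-practice number
    interest_area: str       # one of the measurement interest areas
    relevance: str           # "mentioned" | "recommended" | "required"
    statement: str           # the information need itself

needs = [
    InformationNeed("PP 1.4.x", "Project Management", "recommended",
                    "Estimate effort and cost using models or historical data"),
    InformationNeed("PMC 1.1.x", "Project Management", "recommended",
                    "Track actual project performance against the plan"),
]

required_only = [n for n in needs if n.relevance == "required"]
print(len(needs), "needs extracted,", len(required_only), "explicitly required")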

3.3. Measurement Interest Areas
To organize the extracted information needs, it is important to classify them. Here, the concept of “measurement interest areas” is used. This makes it possible to take a snapshot of each maturity level defined in CMMI without going into too much detail. The following measurement interest areas are based on combining the process group classifications from ISO 12207 and CMMI (see Figure 5):
• “Requirements”: a software life cycle area which involves requirements development, requirements analysis and requirements acceptance;
• “Analysis”: a software life cycle area which involves risk analysis and decision analysis;
• “Design & Implementation”: a software life cycle area which involves software design and software coding;
• “Verification & Validation (V&V)”: a software life cycle area which covers both verification and validation throughout the project life cycle, including testing activities, related to the verification of internal and external quality (e.g. quality models from ISO 9126);
• “Project Management”: a supporting area which covers project planning, monitoring and control for the whole life cycle;
• “Configuration Management”: a supporting area which covers versioning and baselines for the whole life cycle;
• “Quality Assurance”: a supporting area which covers the evaluation of activities and work products against a managed or defined process, with the purpose of improvement, for the whole life cycle;
• “Training”: a supporting area which covers training for the whole life cycle and for the other supporting areas.


Figure 5: Measurement Interest Areas – using the CMMI structure of software processes

3.4. QA vs. V&V
It is important to explain the distinction between verification and validation (V&V) and quality assurance (QA), since this distinction is not always clear to practitioners. Verification and validation constitute a set of activities that evaluates the quality of a specific software product. In the scope of a software measurement process, verification and validation activities measure the quality of a specific software product in order to support decision-making surrounding the improvement (correcting bugs, refactoring) of that product. Quality assurance takes a longer-term perspective and is not specific to a particular software product. Within the scope of a software measurement process, quality assurance makes use of the measures with a view to evaluating actual process performance against the managed or defined process, in order to support decision-making surrounding the improvement of the organization as a whole. Since the organization's mission is to deliver software products with built-in quality, quality assurance and verification/validation are both aimed at improving software product quality, but from significantly different points of view.

4. CMMI – An Analysis
For this paper, the staged representation of CMMI has been chosen: “The staged representation prescribes an order for implementing process areas according to maturity levels, which define the improvement path for an organization from the initial level to the optimising level” [3]. In the following sub-sections, Maturity Levels (MLs) 2 to 5 are analyzed in terms of information needs for a software measurement process. Since all organizations are considered to be at least at ML1 by default, the first level is not defined and not analyzed. Because these sub-sections are like snapshots, a detailed analysis of each level is documented in four tables (see Tables 1 to 4) at the end of this paper.

4.1. Level 2 – Managed
At the Managed level, an organization does not yet have a defined set of processes. Instead, it has processes that are planned, performed, measured (with some basic measures) and controlled. At this level, there is an emphasis on requirements management and project management to ensure that the software products satisfy the specified requirements and that they are delivered according to plan (cost and time). At this level (ML2), “Measurement & Analysis” is already a central concern. Hence, the answer to the question “When to begin?” is “As soon as possible”. CMMI does not explicitly require any information needs to be satisfied at ML2; however, some information needs are mentioned, with particular attention paid to project estimates (cost and time). At least some of these information needs should be addressed by the soon-to-be software measurement process.

Some of the extracted information needs among the measurement interest areas of ML2 are listed next and summarised in Figure 6. Note that one information need may give rise to multiple indicators and measures, which may, in turn, satisfy other information needs.
• Requirements:
o Need to know the degree of compliance of the requirements with established criteria.
o Need to evaluate the impact of requirements for commitment.
o Need to know the consistency of other work products vis-à-vis the requirements.
• Analysis:
o Need to evaluate the risk associated with a project.
• Configuration management:
o Need to evaluate the impact of change requests.
o Need to know the integrity of the baselines.
• Project management:
o Need to collect data about project effort, project cost, work product attributes and task attributes.
o Need to estimate effort and cost using models and/or historical data.
o Need to track the actual project performance (a small worked sketch follows this list).
o Need to know the effectiveness of corrective actions taken on identified issues.
• Quality Assurance:
o Need to evaluate the process as performed against the applicable process descriptions.
o Need to evaluate the work products against applicable descriptions.
References to more information needs from CMMI practices and sub-practices are listed in Table 1.
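As announced in the project-management item above, here is a small worked sketch of how the need to track actual project performance could be turned into an indicator; the plan and actual figures and the 10% threshold are hypothetical.

# Hypothetical ML2-style tracking indicator: effort and schedule variance.
planned_effort_h, actual_effort_h = 1200.0, 1380.0
planned_duration_d, actual_duration_d = 90, 104

effort_variance = (actual_effort_h - planned_effort_h) / planned_effort_h
schedule_variance = (actual_duration_d - planned_duration_d) / planned_duration_d

# A simple analysis model: flag the project when either variance
# exceeds a locally agreed threshold (here 10%, purely illustrative).
flagged = max(effort_variance, schedule_variance) > 0.10
print(f"effort +{effort_variance:.0%}, schedule +{schedule_variance:.0%}, "
      f"corrective action needed: {flagged}")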

Figure 6: Distribution of information needs at ML2


4.2. Level 3 – Defined
The Defined level is reached when an organization has a defined set of processes that are improved over time: “[…] at maturity level 3, processes are typically described more rigorously than at maturity level 2” [3]. Moreover, at Maturity Level 3 (ML3), processes are managed using detailed measures of the processes and work products. At ML3, information needs cover all measurement interest areas. However, most of the information needs (60%) come from “quality assurance” and “verification and validation”.
Some of the extracted information needs among the measurement interest areas of ML3 are listed next and summarised in Figure 7. Note that one information need may give rise to multiple indicators and measures, which may, in turn, satisfy other information needs:
• Requirements:
o Need to know the functional size of the requirements.
o Need to know the completeness, feasibility and verifiability of the requirements.
o Need to track technical performance requirements during the development effort.
• Analysis:
o Need to evaluate the risk associated with the requirements.
o Need to evaluate, categorise and prioritise identified risks using established criteria.
o Need to trigger a risk mitigation plan when an unacceptable level or threshold is reached.
o Need to compare alternative solutions using established criteria in order to select the best solution.
• Design and implementation:
o Need to know the degree of compliance of the design with established criteria.
o Need to know the consistency of the design vis-à-vis the requirements.
o Need to evaluate the completeness and coverage of all product component interfaces.
o Need to know the degree of compliance of the code with the design.
• Verification and validation:
o Collect data from peer reviews on the code.
o Collect results from unit testing.
o Collect data from peer reviews on the documentation.
o Need to evaluate assembled product components following product integration.
o Need to confirm correct operation at the operational site.
o Need to identify corrective actions by analyzing verification and validation data.
• Project management:
o Need to estimate the project’s planning parameters using the measurement repository.
o Need to manage the project using a set of specific measures.
• Quality assurance:
o Need to appraise processes, methods and tools periodically to identify strengths and weaknesses and to develop recommendations.
o Collect data from peer reviews on the common set of measures and procedures for storing and retrieving measures.
• Training:
o Collect data about training activities.
o Collect data about test results.
References to more information needs from CMMI practices and sub-practices are listed in Table 2.


Figure 7: Distribution of information needs at ML3

4.3. Level 4 – Quantitatively Managed
The Quantitatively Managed level is reached when detailed measures of quality and process performance are collected and statistically analyzed: “Quantitative objectives are based on the needs of the customer, end users, organization, and process implementers” [3]. Project management is achieved by establishing quantitative objectives and then by composing a project process that should reach these objectives, given the measured performance history of the sub-processes composing the project process. At Maturity Level 4 (ML4), information needs come only from “project management” and “quality assurance”. At this level, these two measurement interest areas are closely related, since project management is totally based on process performance, which is the object of quality assurance. It is outside the scope of this paper to list the advanced information needs of ML4 (Figure 8). References to more information needs from CMMI practices and sub-practices are listed in Table 3.
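The kind of statistical analysis implied at ML4 can be sketched as follows; the subprocess data are hypothetical and the 3-sigma control limits are one common choice, not something prescribed by CMMI.

# Illustrative process-performance baseline for one subprocess
# (e.g. peer-review speed), with simple 3-sigma control limits.
import statistics

review_rate_loc_per_h = [182, 195, 170, 188, 176, 201, 167, 190]  # hypothetical

mean = statistics.mean(review_rate_loc_per_h)
sigma = statistics.stdev(review_rate_loc_per_h)
lower, upper = mean - 3 * sigma, mean + 3 * sigma

new_observation = 243  # a newly measured instance of the subprocess
stable = lower <= new_observation <= upper
print(f"baseline {mean:.0f} +/- {3 * sigma:.0f} LOC/h; "
      f"observation {new_observation} within limits: {stable}")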

Figure 8: Distribution of information needs at ML4


4.4. Level 5 – Optimising
The Optimising level is reached when “processes are continually improved based on a quantitative understanding of the common causes of variation inherent in processes” [3]. “Maturity level 5 focuses on continually improving process performance through both incremental and innovative technological improvements” [3]. At Maturity Level 5 (ML5), information needs come only from “quality assurance”, since the aim is solely to improve processes. It is outside the scope of this paper to list the advanced information needs of ML5 (Figure 9). References to more information needs from CMMI practices and sub-practices are listed in Table 4.

Figure 9: Distribution of information needs at ML5

4.5. Overview
An overview of the information needs from all maturity levels may help in understanding the scope of a software measurement process (Figure 10)1.

Figure 10: Distribution of information needs in all maturity levels

1 Readers must take into account that a number of information needs overlap one another.


Initially considering only ML2 and ML3 as the target for process improvement, the most significant measurement interest areas are, in order of importance, “verification and validation”, “project management” and “quality assurance”. This gives a good idea of which information needs should be given the most consideration when initiating the implementation of a software measurement program.
Why is it better to consider ML2 and ML3, rather than ML2 only? The answer is that it would be irresponsible to ignore a measurement interest area like “verification and validation” when implementing a software measurement process. An emerging business needs to verify and validate the quality of its software products, and this is, in most organizations, a key concern to be addressed through a software measurement process. In addition, a few studies have investigated the maturity level equivalence, between ML2 and ML3, for those organizations already ISO 9001:2000-certified and implementing CMMI processes [5][6]; for instance, an ISO-certified organization must, to be certified, demonstrate that a process is in place to identify and eliminate the causes of non-conformities. This means that, for these organizations, there should be documented evidence of some measurement-intensive process areas (PAs), such as Causal Analysis & Resolution (CAR), which corresponds to Decision Analysis & Resolution (DAR) at ML3.
For this analysis, the staged representation of CMMI was chosen because it is easier to analyze. In the staged representation, process areas are categorised under maturity levels. Consequently, some process areas are ignored in the earlier stages of that representation. However, in the continuous representation, the maturity levels also exist within each process area. Consequently, all process areas that are relevant for a given business may be considered in the earlier stages if the continuous representation is chosen.
To end the discussion on the information needs that may be extracted from CMMI, it is important to keep in mind that CMMI only offers guidance, and that information needs should, above all, be business information needs.

5. ISBSG – As a Turnkey Solution

5.1. Introduction
The ISBSG is a not-for-profit organization created in 1994 “to develop the profession of software measurement by establishing a common vocabulary and understanding of terms” [4, 7, 8]. It groups together national software measurement associations, currently representing 13 different countries. The ISBSG software project repository provides “software development practitioners with industry output standards against which they may compare their aggregated or individual projects, and real data of international software development that can be analysed to help improve the management of IT resources by both business and government” [8]. To achieve these goals, the ISBSG makes available to the public a questionnaire [8] designed to help in collecting data about projects, including software functional size measured with any of the measurement standards recognised by the ISO (COSMIC-FFP functional size – ISO 19761, etc.). The ISBSG assembles this data in a repository and provides a sample of the data fields to practitioners and researchers in an Excel file, referred to below as the ISBSG MS-Excel data extract. The ISBSG data collection questionnaire available at www.isbsg.org includes a large amount of information about project staffing, effort by phase, development methods and techniques, etc. Moreover, the ISBSG provides a glossary of terms and measures to assist in the collection of project data into the repository and to standardise the way the collected data is analyzed and reported [7].


The ISBSG data collection questionnaire includes 7 sections, subdivided into several subsections (Figure 11).

Figure 11: Structure of the ISBSG Data Collection Questionnaire

The ISBSG project repository is mostly used for project productivity benchmarking and for building effort estimation models. In addition, Cheikhi, Abran and Buglione have investigated in [9] the extent to which the current ISBSG repository can be of use for benchmarking software product quality characteristics on the basis of ISO 9126. They also identify the subset of quality-related data fields made available by the ISBSG to industry and researchers, and illustrate its use for quality analysis. Therefore, even though the ISBSG data collection repository does not necessarily address the totality of the information needs of an organization, there are advantages to taking the ISBSG as a reference solution for initiating a software measurement program:
• The ISBSG offers an existing measurement framework that can facilitate faster implementation of the software measurement process, with industry-standardised definitions of base and derived measures throughout the project life cycle phases. This will align the internal project database repository with this international repository.
• The ISBSG offers a database repository with data from over 4,000 projects, which means that it already contains valuable data.
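To illustrate these two typical uses of the repository (productivity benchmarking and effort estimation), the sketch below works on a hypothetical extract; the records and their fields are invented and do not reproduce actual ISBSG data or field names.

# Illustrative use of an ISBSG-style extract for benchmarking and estimation.
# The project records below are hypothetical, not actual ISBSG data.
import math

projects = [  # (functional size in size units, total effort in hours)
    (120, 1900), (300, 4100), (450, 7300), (800, 11800), (1500, 26400),
]

# Productivity benchmark: project delivery rate, in hours per size unit.
pdr = sorted(effort / size for size, effort in projects)
median_pdr = pdr[len(pdr) // 2]
print(f"median PDR: {median_pdr:.1f} h per size unit")

# Simple estimation model effort = a * size^b, fitted in log-log space.
xs = [math.log(s) for s, _ in projects]
ys = [math.log(e) for _, e in projects]
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
den = sum((x - xbar) ** 2 for x in xs)
b = num / den
a = math.exp(ybar - b * xbar)
print(f"effort ~ {a:.1f} * size^{b:.2f}; estimate for size 600: {a * 600 ** b:.0f} h")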

5.2. Analysis
The ISBSG data collection questionnaire is divided into multiple sections containing, in all, one hundred and thirty-one (131) questions (some with a number of sub-questions). The analysis was made by categorising each question based on the measurement interest areas defined in this paper. The detailed analysis may be found in Table 5 at the end of the paper.

Briefly, the most important measurement interest areas in the ISBSG are “project management” and “quality assurance” (Figure 12). This is understandable, since the ISBSG mostly focuses on project cost and effort, identifying project types, allowing for the identification of the more productive practices and processes, etc.

Figure 12: Distribution of questions in the ISBSG Questionnaire

5.3. Comparison with CMMI
Since there is no documented one-to-one relationship between CMMI practices and ISBSG questions, our comparison has been made on the basis of the percentage of interest given to each measurement interest area (Figure 13). The highlights of this comparison are the following:
• The ISBSG focuses strictly on “project management” and “quality assurance”.
• The ISBSG lacks “verification and validation” data.
• The ISBSG does not consider “analysis” at all, not even risk analysis.

Figure 13: Comparison of ISBSG with CMMI

6. Conclusions
The analysis of CMMI from the perspective of a software measurement process illustrates that information needs from ML2 and ML3 can provide guidance in identifying business information needs. As a consequence, ISO 15939 and CMMI may be used together as a starter kit when planning a software measurement program and its related processes. However, even though CMMI particularly stresses “verification and validation”, it does not refer to, nor does it recommend, a specific quality model or a specific set of verification and validation measures. The design and selection of one or more quality models (and related measures) is therefore left to the organizations themselves. As a consequence, the information needs concerning verification and validation are stated in CMMI only at a very high level. To address this issue, ISO 9126 (of which the next release is being prepared by the ISO for publication as part of the ISO 25000 series) proposes and defines detailed quality models for internal quality, external quality and quality in use. Furthermore, ISO 9126 proposes an inventory of over 200 measures of software quality, but it is left to the organizations to select from these, which is, of course, a challenging task.
In addition, the ISBSG was identified as a candidate turnkey solution for a software measurement process. Organizations at a lower maturity level should, however, select only the subset of ISBSG measures that can realistically be collected in an organization initiating a measurement program, including measures concerning “project management” and “quality assurance”. Organizations interested in implementing the full set of ISO 9126 quality models (internal quality, external quality and quality in use) must select and add the relevant measures proposed in ISO 9126, parts 2 to 4. A possible joint usage of the three elements (ISO 15939, CMMI and the ISBSG) is presented in Figure 14.

Figure 14: CMMI, ISBSG and the Software Measurement Model

Finally, even if ISO 15939 and CMMI are used as a starter kit and the ISBSG as a turnkey solution, it is important to keep in mind that information needs must be identified both by and for the business.



7. References
[1] ISO, IS 15939:2002, Information technology – Software engineering – Software measurement process, International Organization for Standardization, Geneva, 2002.
[2] Abran, A., Al Qutaish, R., Desharnais, J.M., Habra, N., An Information Model for Software Quality Measurement with ISO Standards, International Conference on Software Development (SWDC-REK), University of Iceland, Reykjavik, Iceland, May 27 - June 1, 2005.
[3] CMMI Product Team, Capability Maturity Model Integration for Development (CMMI-DEV, V1.2), CMU/SEI-2006-TR-008, Technical Report, Software Engineering Institute, Pittsburgh (PA), August 2006.
[4] ISBSG, Data Collection Questionnaire: New Development, Redevelopment or Enhancement Sized Using COSMIC-FFP Function Points, version 5.9, International Software Benchmarking Standards Group, 24/05/2005.
[5] Paulk, M.C., A Comparison of ISO 9001 and the Capability Maturity Model for Software, Software Engineering Institute, CMU/SEI-94-TR-012, July 1994, URL: http://www.sei.cmu.edu/pub/documents/94.reports/pdf/tr12.94.pdf
[6] Mutafelija, B. & Stromberg, H., ISO 9001:2000 - CMMI v1.1 Mappings, July 2003, URL: http://www.sei.cmu.edu/cmmi/adoption/pdf/iso-mapping.pdf
[7] ISBSG, Glossary of Terms, version 5.9.1, International Software Benchmarking Standards Group, 28/02/2006.
[8] ISBSG, Data Collection Questionnaire: New Development, Redevelopment or Enhancement Sized Using COSMIC-FFP Function Points, version 5.9.1, International Software Benchmarking Standards Group, 28/02/2006.
[9] Cheikhi, L., Abran, A. & Buglione, L., The ISBSG Software Project Repository from the ISO 9126 Quality Perspective, ASQ Software Quality Professional, American Society for Quality, Vol. 9, No. 2, March 2007.


8. Appendix A - Tables

Table 1: Detailed analysis of CMMI Maturity Level 2 practices
Information Needs (CMMI Level 2) / Measurement Interest Areas

Requirements

Analysis

Design/ Implementation

Verification/ Validation

Configuration Management

Project Management

Quality Assurance

Training

Requirements Management 1.1.3, 1.2.2, 1.3.1, 1.3.3, 1.5.1

Mentioned Recommended Required Project Planning Mentioned

2.2.3

3.3

Recommended

1.2.2, 1.4.1, 1.4.3

Required Project Monitoring and Control Mentioned

2.1.1, 2.3.2

Recommended

1.1.1 to 1.1.5

Required Supplier Agreement Management Measurement and Analysis

This process area has not been considered in the analysis since the information needs extracted from all other process areas may also be applied to the suppliers. This process area has not been considered in the analysis since it takes the software measurement process and formulates it in terms of goals and practices. This process area is fully compatible with ISO 15939.

Process and Product Quality Assurance Mentioned

1.1.3, 1.2.3

Recommended Required Configuration Management Mentioned

2.1.1, 2.1.2, 3.2.1

Recommended Required Generic Goals Mentioned

2.8.1, 2.9

Recommended Required

Table 2: Detailed analysis of CMMI Maturity Level 3 practices Information Measurement Interest Areas Needs CMMI Level 3

Requirements

Analysis

Design/ Implementation

Requirements Development Mentioned Recommended Required

3.5.1 3.2.1, 3.3.1, 3.3.4 3.3.5

Technical Solution


Verification/ Validation

Configuration Management

Project Management

Quality Assurance

Training

Information Measurement Interest Areas Needs CMMI Level 3 Mentioned

Requirements

Analysis

Design/ Implementation

1.3.1

Recommended

Verification/ Validation

Configuration Management

Project Management

Quality Assurance

Training

3.1.3, 3.1.4, 3.2.5 2.1.3, 2.1.4, 3.1.2

Required Product Integration Mentioned

2.1.1

3.4.6

Recommended Required

3.3.1, 3.3.2

Verification Mentioned Recommended 2.2.2 to 2.2.7, 3.1.1, 3.1.2, 3.2.1 to 3.2.5

Required Validation Mentioned

2.1

Recommended Required

2.2.1 to 2.2.5

Organizational Process Focus Mentioned 1.2.5, 1.2.6, 2.4.6

Recommended Required

2.4.5

Organizational Process Definition Mentioned Recommended Required

1.4.1 to 1.4.8

Organizational Training Mentioned Recommended Required

2.2.1 to 2.2.4

Integrated Project Management Mentioned Recommended

1.3.7 1.2.2, 1.3.2, 1.4.3, 1.5.2

Required Risk Management Mentioned Recommended

2.2.1 to 2.2.3, 3.1.1

Required Decision Analysis and Resolution Mentioned

1.5.1, 1.6

Recommended Required Generic Goals


Information Measurement Interest Areas Needs CMMI Level 3

Requirements

Analysis

Design/ Implementation

Verification/ Validation

Configuration Management

Project Management

Quality Assurance

Training

Mentioned Recommended Required

3.2.1

Table 3: Detailed analysis of CMMI Maturity Level 4 practices Information Measurement Interest Areas Needs CMMI Level 4

Requirements

Analysis

Design/ Implementation

Verification/ Validation

Configuration Management

Project Management

Quality Assurance

Training

Organizational Process Performance Mentioned Recommended 1.2.1 to 1.2.4, 1.4.1, 1.4.2, 1.5.1, 1.5.2

Required Quantitative Project Management Mentioned Recommended 1.1.3 to 1.1.5, 1.2.3, 1.3.4, 1.4.1 to 1.4.4 2.1 to 2.4

Required

Table 4: Detailed analysis of CMMI Maturity Level 5 practices Information Measurement Interest Areas Needs CMMI Level 5

Requirements

Analysis

Design/ Implementation

Verification/ Validation

Configuration Management

Project Management

Quality Assurance

Organizational Innovation and Deployment Mentioned Recommended Required

1.2 2.3

Causal Analysis and Resolution Mentioned

1.1, 1.2

Recommended Required


Training

Table 5: Detailed analysis of the ISBSG questionnaire Information Measurement Interest Areas Types Design/ ISBSG

Requirements

Analysis

Implementation

Verification/ Validation

Configuration Management

Project Management

Quality Assurance

Training

Project Process Process Infrastructure Planning Specification Design Build or Programming Test Implementation or Installation Project Management and Monitoring

22, 24, 25 31

7

8, 9, 10, 11

12, 14, 15, 16, 17, 18, 19

13

24, 26

21, 23

29

30

29, 32

27, 28

35

36

35, 37

33, 34

40

41

40, 42

38, 39

46

44, 47

46, 48, 49

43, 45

50, 51, 52

Technology 53, 54, 55, 56, 57, 58, 59

General Information People and Work Effort Development Team

60, 65, 66

Customer and End Users

67, 68, 69, 70, 71

IT Operations

72, 73

Work Effort Validation

77, 78

61, 62, 63, 64

74, 75, 79, 80, 81

Product 83, 84, 85, 86, 87, 88

General Information COSMIC Project Functional Size New Development or Redevelopment Software Size

92, 93

Enhancement Software Size

98, 99, 100, 101, 102

Context of the Functional Size Measurement

105, 106

92, 93

89, 90, 91

94, 95, 96, 97

109

107

Experience of the Functional Counter

108, 110, 111, 112

114, 115

Project Completion General Information

124

User Satisfaction Survey

123

118, 119, 120

125, 126

Project Costs

128 129, 130, 133, 134

Cost Validation

130

131

113, 116, 117

Advancing Functional Size Measurement: Which size should we measure? Charles Symons

Abstract
A tape measure is calibrated in a standard unit of measure, e.g. the centimetre. But if you use a tape to measure an irregularly shaped object, then you have a choice of sizes that can be measured. The same is true for a piece of software, which has many possible functional sizes.
‘Traditional’ or ‘1st generation’ Functional Size Measurement (FSM) Methods, such as the IFPUG method and its variants, the MkII FPA method, etc., have not properly recognised this problem because they were designed to be applicable only in the domain of business application software – though the problem exists to a degree even when using these methods in this domain. The ‘2nd Generation’ COSMIC method was designed to be applicable to measure sizes of any ‘data-rich’ software, i.e. of business and real-time software. It has been applied to measure functional sizes of complex real-time software architectures such as those found in large-scale telecoms and avionics systems. In such environments the measurer is faced with sizing requirements that may be spread over multiple layers of software, that are expressed from different viewpoints and at different levels of granularity, and that will execute on multiple technical platforms. Consequently, many valid functional sizes can be measured for such software, all with the same unit of measure. Clearly, if this problem were to remain unresolved, it would be impossible to be sure that size measurements from different sources could be meaningfully compared, even on the same piece of software.
This paper will introduce the measurement challenges and will propose a set of concepts and standards designed to ensure the comparability of functional size measurements from different sources. This is vital if such measurements are to be used for reliable comparisons of performance, and for estimating and benchmarking. The proposals are valid for any FSM Method. Adoption of these concepts and standards by the software metrics community should lead to greater credibility and acceptance of FSM Methods, with potentially enormous benefits for the software industry. These concepts and standards have been developed by members of the COSMIC Measurement Practices Committee, on whose behalf this paper is presented.

1. Introduction
Perhaps the three best-known methods of measuring a functional size of software are the so-called ‘1st Generation’ IFPUG [1] and MkII FPA [2] methods and the ‘2nd Generation’ COSMIC [3] method. All three methods assume that the functional user requirements1 of some software to be

1 This paper uses the terminology of the standard ISO/IEC 14143-1:2007 ‘Functional Size Measurement – Part 1: Definition of Concepts’, notably for terms such as ‘functional user requirements’, ‘functional size’, ‘user’, ‘BFC’, etc. The paper assumes a basic understanding of functional size measurement methods and the terminology used.


measured can be analysed so as to identify certain types of ‘base functional components’ (or BFCs) defined by the method. Each method assumes a size unit of measure, e.g. a ‘function point’, and assigns a fixed number of size units to each BFC type. The functional size of a given piece of software according to the method is then obtained by adding up the sizes of all the BFCs identified in the software’s functional user requirements (or ‘FUR’).
As an example, all three methods define a type of component of functionality that is essentially the same, but is given a different name in each method, i.e. the ‘elementary process’ (IFPUG), the ‘logical transaction’ (MkII FPA) and the ‘functional process’ (COSMIC). Each method then classifies or sub-divides this component so as to define the method’s BFC types, which are unique to each method. The size units that are attributed to the BFCs are also unique to each method, but the underlying approach of ‘functional size measurement’ (FSM) methods is the same for all methods.
All FSM Methods therefore behave like any measurement method. They define a general size concept (‘functional size’, a size based purely on functionality, ignoring any technical or quality requirements) and a unit of measure2, e.g. ‘a COSMIC function point’, where the latter varies with the method.
2 To be strictly accurate, 1st Generation FSM Methods do not define a unit of measure. Neither the IFPUG method nor the MkII FPA method defines one ‘Function Point’. For example, IFPUG BFCs are assigned sizes from 3 to 15 Function Points.
So far, so good. However, a piece of software is like an irregular object and, as we shall discuss below, it has many possible sizes for its different dimensions. The general problem may be illustrated by supposing two pieces of software A and B and two associated functional size measurements SA and SB, measured using the same FSM Method. The question we must be able to answer is: ‘is the ratio SA / SB a valid and fair measure of the relative sizes, or is it meaningless because the sizes SA and SB are of different dimensions?’
FSM Methods in general have hitherto given only limited guidance on ‘which’ size to measure in various circumstances. The IFPUG method, for example, distinguishes procedures for measuring:
• The work-output of a development project (the size of the functionality delivered with the first installation of the software when the project is complete).
• The work-output of an enhancement project (the size of the functionality of some existing software that has been changed when the project is complete).
• An application (the size of the functionality of an installed application, which is updated every time the application is enhanced).
The MkII FPA method makes a similar set of distinctions. The COSMIC method, up to its current publicly available version 2.2, recognised that more parameters need to be considered to define a particular type of measurement. But with growing experience in use of the method, a better understanding has been gained of the challenges of functional size measurement, allowing more refinements to be introduced.
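A minimal sketch of the ‘identify the BFCs and add up their sizes’ approach described at the start of this section, using the COSMIC convention that each data movement contributes 1 CFP; the functional processes themselves are hypothetical.

# Minimal sketch of functional size measurement as described above:
# identify the BFCs of each functional process and add up their sizes.
# The COSMIC convention is used here (1 CFP per data movement);
# the functional processes themselves are hypothetical.
CFP_PER_DATA_MOVEMENT = 1

functional_processes = {
    "Add customer":     ["Entry", "Read", "Write", "Exit"],
    "Enquire on order": ["Entry", "Read", "Exit"],
}

sizes = {name: len(movements) * CFP_PER_DATA_MOVEMENT
         for name, movements in functional_processes.items()}
total_size = sum(sizes.values())
print(sizes, "total:", total_size, "CFP")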


The next version 3.0 of the method3, to be published in the first half of 2007, will include what COSMIC now believes to be a much more comprehensive set of parameters that must be considered when defining ‘which’ functional size should be measured for a given piece of software. The process of defining this set of parameters for a given measurement is known as determining the ‘Measurement Strategy’.
The purpose of this paper is to give a preview of the Measurement Strategy concepts and the process as they will be described in the COSMIC Measurement Manual version 3.0 when it is published. At the time of writing, the description given below has not been completely finalised, but it is believed that the final description will not change in any important respects. The ideas proposed herein have been developed by members of the COSMIC Measurement Practices Committee, on whose behalf this paper is presented.
It is important to recognise that the Measurement Strategy parameters and process are totally independent of the COSMIC method’s view of ‘how’ to measure a functional size of a piece of software. The process, the parameters and the proposed standards of the Measurement Strategy that determine ‘which’ size to measure should be valid in principle for any FSM Method, regardless of ‘how’ it measures a functional size. (Some of the parameters given below are described as ‘defined in the COSMIC Measurement Manual v3.0’ only because the definitions have been taken from that publication.)
Another point to emphasise is the economic importance of these Measurement Strategy parameters and proposed standards. Functional size measurements are routinely used as input to project estimating methods and as measures of work-output for software projects. It is therefore highly desirable for all parties involved in FSM measurement (e.g. suppliers and users of FSM Methods, benchmarking services, estimating tools, etc.) to agree on these parameters and proposed standards. Only with such agreements will it be possible to be sure that any two functional size measurements from different sources can reliably be compared, even if made with the same FSM Method.

2. The parameters of a Measurement Strategy for a piece of software
A given piece of software can have many possible functional sizes depending on several factors. The choice of which size to measure depends entirely on the Purpose of the measurement. As an example, consider two purposes:
A. A distributed application system is to be developed whose major components will execute on different technical hardware. The measurement purpose is to obtain sizes of each component separately that can be input to an estimating tool that will take into account the different technical platforms when estimating the total effort to develop the application.
B. The same application as in A will be developed partly from new code and partly from re-used or packaged software. The purpose is to measure the proportion of re-used functionality of the whole application (i.e. ignoring its breakdown into components) in order to compare the actual re-use against a target for re-use.

3 From version 3.0 of the method, its name will be simplified from the ‘COSMIC-FFP’ method to the ‘COSMIC’ method. The name of the unit of measure is being changed to a ‘COSMIC Function Point’, abbreviated as ‘CFP’.


The total size of the application resulting from these two purposes will not be identical, and the sizes of the constituents will of course be quite different.
Setting a Measurement Strategy involves determining three main parameters for the piece of software to be measured:
• The scope of the piece of software.
• The viewpoint of the user of the software.
• The level of granularity of the functional user requirements of the software to be measured.
Although these parameters should be determined in this sequence, each depends on the measurement purpose. Determining the purpose is therefore the first and most important step of setting a Measurement Strategy.
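One illustrative way to record these parameters for a given measurement is sketched below; the field values are hypothetical and this is not a format prescribed by the COSMIC Measurement Manual.

# Illustrative record of a Measurement Strategy; the values are hypothetical.
measurement_strategy = {
    "purpose": "size input to effort estimation for a distributed application",
    "scope": "newly developed functionality of the order-processing components",
    "level_of_decomposition": "major component per hardware platform",
    "functional_user_viewpoint": "peer applications and hardware devices",
    "level_of_granularity": "functional processes and their data movements",
}

for parameter, value in measurement_strategy.items():
    print(f"{parameter:>26}: {value}")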

2.1. The ‘scope’ of the software being measured
The ‘scope’ of a functional size measurement is defined in the COSMIC Measurement Manual as “the set of functional user requirements to be included in a specific functional size measurement instance”. The scope is determined by examining the two groups of parameters below.

2.1.1. The ‘level of decomposition’ of the software to be measured
This parameter is defined in the COSMIC Measurement Manual v3.0 as “any level of division of a piece of software showing its components, sub-components, etc”4. The reason this parameter is significant is that the functional sizes of the components of a piece of software ‘X’ cannot simply be added up to obtain the size of the whole piece ‘X’. This is because the size of any one component will include an allowance for the messages it exchanges with other components (according to all FSM Methods). When the sizes of the major components of a piece of software X are measured separately and these sizes are added up, their total will exceed the size of X measured as a whole, due to the size contributions of the inter-component message exchanges. Clearly, the finer the sub-division of the whole X (i.e. the more components it is divided into), the greater will be the disparity between the sum of the sizes of the components and the size of the same software X measured as a whole (a small worked illustration follows the list below).
Ideally one would like to standardise certain levels of decomposition, but these are very difficult to define precisely. Possible candidates for standard levels of decomposition that would be useful in practice, e.g. for estimating, are:
• A whole application.
• A major component of a distributed application that executes on a single hardware platform that is different from the platform on which other major components execute.
• A re-usable object-class.
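A small worked illustration of why component sizes do not add up to the size of the whole, using COSMIC-style counting of data movements for a hypothetical ‘register order’ functional process.

# Hypothetical functional process "register order", COSMIC-style counting
# (1 CFP per data movement).
# Measured on the whole application: Entry, Read, Write, Exit.
whole_application_cfp = len(["Entry", "Read", "Write", "Exit"])          # 4 CFP

# The same requirement measured on two separately deployed components:
# each component's size now includes its side of the inter-component exchange.
front_end_cfp = len(["Entry from user", "Exit to back end",
                     "Entry from back end", "Exit to user"])             # 4 CFP
back_end_cfp = len(["Entry from front end", "Read", "Write",
                    "Exit to front end"])                                # 4 CFP

print("whole:", whole_application_cfp, "CFP;",
      "sum of components:", front_end_cfp + back_end_cfp, "CFP")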

4 The reader’s attention is drawn to the definitions of two terms that must be carefully distinguished. ‘Level of decomposition’ relates to the software itself. ‘Level of granularity’ (defined later in this paper) relates to the description, e.g. the functional user requirements, of the software.


The problem with this set is the starting point: what one organization defines as an ‘application’ might not be considered as such by another organization. Other possible hierarchies of levels of decomposition are even harder to define precisely. If two organizations happen to define their hierarchies using the same terms, such as ‘system’, ‘sub-system’, ‘program’ and ‘module’, there can be no certainty that these levels correspond, since these terms can be interpreted in many ways.

2.1.2. The possible components of the delivered size
For any piece of software at any level of decomposition as defined above, the functionality of the software delivered by a project team (i.e. the team’s ‘work-output’) may consist of:
• Newly developed functionality.
• Changes to existing functionality.
• Existing functionality that has been re-used, unchanged5.
This classification into three groups is important in practice for purposes such as estimating, because each group will normally be associated with a different productivity.

The functional sizes that can be measured, and that should be distinguished, are as follows:
a) The size of newly developed functionality.
b) The size of functionality that has been changed.
c) The size of changes to functionality that has been changed.
d) The size of re-used, existing functionality that is delivered unchanged.
e) The size of functionality after it has been changed.

We can now see the limitations of the advice given by the IFPUG and MkII FPA methods on the measurement of ‘different types of sizes’. Both methods define the work-output of a ‘development project’ as newly-delivered functionality, without distinguishing whether it is newly-developed or re-used, existing functionality. Both methods define the work-output of an ‘enhancement project’ as the sum of added, modified and deleted functionality. However, the IFPUG method measures size b) above of an enhancement, whereas the MkII FPA method, which could measure size b), recommends instead measuring size c) as a better measure of the work-output of an enhancement project (as does the COSMIC method).
A high proportion of software projects nowadays involve some re-use of existing software in the form of application packages, of ‘COTS’ (Commercial Off-The-Shelf) software, or of re-usable object-classes, together with some newly-developed or changed software. Understanding the sizes of these various contributions to the total delivered size becomes increasingly important for performance measurement and estimating.
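A small worked illustration of the difference between sizes b) and c) above, under one plausible reading of those definitions and using COSMIC-style counting; the enhancement and its figures are hypothetical.

# Hypothetical enhancement of one functional process that had 6 data movements
# (6 CFP) before the project: 2 data movements are modified and 1 is added.
added, modified, deleted = 1, 2, 0
size_before = 6

# e) size of the functionality after it has been changed
size_after = size_before + added - deleted                 # 7 CFP
# b) size of the functionality that has been changed: here taken as the
#    whole touched functional process (one plausible reading)
size_b = size_after                                        # 7 CFP
# c) size of the changes themselves: added + modified + deleted data movements
size_c = added + modified + deleted                        # 3 CFP

print("b) changed functionality:", size_b, "CFP;  c) the changes:", size_c, "CFP")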

5 It might be argued that this third category ought to be split between (a) existing functionality that has been re-used unchanged and (b) changes to functionality that, before the project, was re-used, unchanged functionality. However, most often re-usable functionality is not changed, since doing so negates the benefits of re-usability. We therefore ignore this possible fourth category.


2.2. The ‘viewpoint’ of the ‘user’ of the software being measured
The 2007 edition of the ISO/IEC 14143-1 standard on functional size measurement concepts defines the ‘user’ of a piece of software as “any person or thing that communicates with or interacts with the software at any time”. If we now take a typical piece of real-time application software, such as the embedded application of a multi-function printer/copier machine, we can see that the following can be interpreted as ‘users’ of this application according to this definition:
• Any human operator of the machine who interacts indirectly with the application via buttons, displays, lights and such-like.
• Any of the hardware devices that interact directly with the application.
• Any peer application, such as the software of a PC which sends files to be printed, assuming the printer/copier is networked.
• An operating system, if there is one, on which the printer/copier application relies.
The meaning of the term ‘user’ is therefore not altogether satisfactory, in that for functional size measurement purposes the operating system is almost never thought of as a user of an application. The reverse is true: an application is invariably a user of an operating system (assuming there is one). The Functional User Requirements, or ‘FUR’, of an application would never normally include the requirements of the operating system as a ‘user’; for one thing, the FUR of an operating system are common to all applications that use it. This leaves us with three types of possible users of this application, namely human operators, hardware devices and peer applications.
Now consider the functionality of the printer/copier from the viewpoint of the human operator versus that of the hardware devices. Even without any detailed knowledge of the internal working of this device, it should be clear that each type of user ‘sees’ different functionality. Specifically, a human operator cannot ‘see’ all of the functions that the application must provide for the machine to work. Human operators may guess how certain functions work, such as the detectors of paper jams, ink running out and paper present/absent, the automatic tests that run on start-up, all the adjustments that follow from pressing a ‘darker/lighter’ button when copying, etc. But these functions result from direct interactions between the application and the many and various engineered hardware devices that the application drives, and many of these interactions are invisible to the human user.
If we were to measure the functional size of such an application from the human operator-user viewpoint and again from the viewpoint of the engineered hardware devices as users of the application, then we would find a much smaller functional size from the human viewpoint. Similarly, peer applications see an even more limited set of functionality (via the defined Application Program Interface) than the human user.
We conclude that the definition of the ‘user’ of a piece of software to be measured, and his/its ‘viewpoint’, needs to be more differentiated if any resulting functional size measurements are to be correctly interpreted. (This problem does not really exist for 1st Generation FSM Methods. These were designed to measure only ‘whole’ business applications and changes to them, and they provide rules for measuring from the viewpoint of human and peer application users. The need to solve this


problem became apparent when using the COSMIC method, which was designed to be applicable to business and real-time software at any level of decomposition, for which the users can be any of the three types given above.)
The simple solution defined in the COSMIC Measurement Manual version 3.0 is to introduce the concept of a ‘functional user’, defined as “a (type of) user that is a sender or intended recipient of data in the functional user requirements of the software to be measured”. This concept of the ‘functional user’ makes a direct link to the ‘FU’ in ‘FUR’.
In the vast majority of practical applications of FSM, the measured functional sizes are used for purposes related in some way to the effort to create or to modify the software to which the FUR apply. So if the purpose is to estimate the development effort of some new piece of software, the FUR to be measured will be those that the developer must satisfy. If, therefore, the purpose were to measure the development effort of the printer/copier application, the FUR to be measured would be those whose functional users are the hardware devices that the application must interact with directly. Measuring a set of FUR that describes only the indirect interactions of the application with a human user would give a functional size that is too small, bears no relation to the development effort, and would thus be unsuitable for estimating.
Sometimes, however, it is of great interest to measure a piece of embedded application software from the viewpoint of the human user as operator, rather than from the viewpoint of the hardware devices. Toivonen, for example, used the COSMIC method to measure the functionality offered by two mobile phones to their human functional users. He then related the functional sizes to the respective memories provided with the phones so as to compare their relative efficiency in their use of storage for the functions provided to human users [4].
Suppliers of benchmarking services and of estimating tools to the business application community have taken it as obvious that the functional sizes they require are those corresponding to the associated project development or enhancement effort, and they usually recommend 1st Generation FSM Methods to measure those sizes. These methods were designed to measure business application software only from the human user and peer application viewpoints, so the measurement methods are consistent with the related services. The dangers are clear, however, when suppliers of such services and tools extend their offering to the domain of real-time software. Not only are the 1st Generation FSM Methods stretched beyond their original design goals, but these methods give no guidance on which functional user viewpoint to assume in cases where there may be a choice.
Users of 2nd Generation FSM Methods such as COSMIC, which are designed to measure functional sizes in both the business and real-time domains, should adopt a standard that any functional size measurement must be accompanied by a definition of the type of functional user from whose viewpoint the measurement was made.

2.3. The ‘level of granularity’ of the FUR of the software being measured As the FUR of a piece of software evolve early in the life of a development project, often the need for more and more requirements is discovered and so the extent of the software appears to grow. This phenomenon, known as ‘scope creep’, is well understood, and is NOT what we are


talking about here. In this section we assume the overall scope of the FUR to be fixed. We will examine how the FUR of a piece of software to be measured are developed in more detail as the project progresses.
A 'level of granularity' has been defined in the COSMIC Measurement Manual v3.0 as follows: "Any level of magnification of the description of a piece of software (e.g. a statement of its requirements, or a description of the structure of the piece of software) such that each increased level of magnification of the description reveals the software's functionality at an increased and comparable level of detail".
As an analogy to illustrate this definition, consider three maps of a nation's road system:
• Map A shows only motorways and main highways.
• Map B shows all motorways, main and secondary roads (as in a motorist's atlas).
• Map C shows all roads of all types (as in a set of local district maps).
These maps reveal the details of the national road network at three different levels of granularity, each with its own map scale. Note that the overall scope of the maps (the nation) is the same in each case. If the purpose is to measure the total size of the national road network, then this can only be measured accurately using Map C. Similarly with software.
Suppose, as an example, we document the FUR of a piece of software, e.g. a simple order-processing system, from the viewpoint of its human functional users as the FUR evolve through four levels of granularity. The structure of the system might be summarised as follows:
• System Level: the order processing system.
• Sub-system Level: customer maintenance, product maintenance, order registration, invoicing and goods despatch sub-systems.
• Functional Process Level (or Elementary Process6 or Logical Transaction Level): for the Customer maintenance sub-system, the functional processes could be:
o Add customer.
o Modify customer.
o Delete customer.
o Enquire on customer.
Similarly for the other sub-systems, each being 'magnified' to expose its functional processes.
• Functional Process Component Level: when a functional process is magnified to the level of granularity at which functional sizes can be accurately measured, the Base Functional Components (or BFC's) revealed depend on the FSM Method. For example, the 'Add customer' functional process measured with the COSMIC method would probably be analysed to consist of four 'data movements' (the BFC-type of the COSMIC method):

6 The IFPUG method also requires the identification of 'Logical Files' at the Elementary Process level which, on further magnification, are broken down to the 'Record Type' level for measurement. These parallel levels of Base Functional Components are ignored for this discussion.


o Enter customer (for the entry of data about the new customer).
o Read customer (to check that the entered data is not for a customer that already exists).
o Write customer (to move the validated customer data to persistent storage).
o Exit messages (for error messages, or to indicate successful completion).
So the functional size of this functional process is 4 CFP (COSMIC Function Points).
All of the above should be familiar and obvious to any experienced user of any of the main 1st or 2nd Generation FSM Methods. The critical level of granularity on which these FSM Methods all rely to ensure that their rules for sizing are unambiguous is that of the 'functional process' (or 'elementary process' or 'logical transaction', depending on the method). As stated above, although the three methods have different definitions for this standard component of functionality, they are all striving to define the same concept. If there were no common agreement on the functional processes of a piece of software that must be measured, there would be no hope of achieving repeatable measurements with any FSM Method.
Functional sizes can of course be measured early in the life of a project, when the FUR have only been defined at a high level of granularity such as, say, the sub-system level. But it seems to be impossible to define any level of granularity higher than that of a functional process in a way that everyone could interpret in the same way. How do you define a 'sub-system', for example, in any universally meaningful way? Therefore, to measure a functional size of a sub-system, say, of some software being developed, the artefacts of the sub-system must be measured in some local way and a local process must be worked out and calibrated locally to scale those measurements to the level of granularity of functional processes and their components. Only functional sizes measured at, or scaled to, the level of functional processes and their components have any hope of being repeatable and universally understood. The existing COSMIC Measurement Manual, version 2.2, gives examples of methods of scaling sizes measured on software artefacts at high levels of granularity to sizes at the level of functional processes; version 3.0 will give more examples.
The COSMIC method's definition of a functional process, for example, is as follows: "An elementary component of a set of Functional User Requirements comprising a unique cohesive and independently executable set of data movement types. It is triggered by a movement of data (an Entry) from a functional user that informs the software that the functional user has identified a triggering event. It is complete when it has executed all that is required to be done in response to the triggering event type.

NOTE: In addition to informing the software that the event has occurred, the Entry triggered by the event may include data about an object of interest associated with the event."
A key feature of this definition is that a functional process is triggered by a (single) functional user (-type) detecting a (single) event (-type). A functional user cannot be, for example, a 'department' staffed with humans that handle multiple types of functional processes, or a 'control panel' that has many types of instruments as functional users. But what if the functional user is a piece of software? We have already seen from 2.1 above that a piece of software can be decomposed in multiple ways, e.g. from a 'system' or an 'application', down to an 'object-class' or to a 'module', depending on the technology. If a functional user can be a piece of software and the latter has no unambiguously definable level of decomposition, it follows that the level of granularity of any associated functional processes is not absolutely definable.


This conclusion is further illustrated by the following example from a major manufacturer of telecoms equipment and associated software. The diagram shows part of the functional user requirements (FUR) of a ‘Logical Network Element’ (LNE) and the details of the FUR as they are magnified at two lower levels of granularity. The point of interest of the analysis approach of this example is that the LNE is also decomposed at each level of granularity.

The diagram shows, at the highest level of granularity, a single functional process of Logical Network Element 1 (LNE1). As far as this functional process is concerned, LNE1 has two functional users at the same level of granularity, namely LNE2 and LNE3. These users are peer pieces of software. Some data enters LNE1 from LNE2 and some data is sent by LNE1 to LNE2 and to LNE3. Some data is also sent to and retrieved from storage by LNE1. At one lower level of granularity, the diagram shows that LNE1 is decomposed into four System Components, namely SC1 – SC4. In other words, at the SC level, there is no longer one measurement scope (of LNE1), but four scopes, one for each SC. At this level, the functional users of each System Component are either other System Components in LNE1 or are System Components within LNE2 and LNE3 (the diagram does not illustrate this latter aspect). The single functional process at the LNE1 level has been decomposed into three functional processes, one in each of the System Components SC1, SC2 and SC4. (We now see that SC3 does not participate in the functional process at the LNE1 level.) At the lowest level of granularity, we see that each System Component is decomposed into a number of Sub-systems. In total there are now nine measurements scopes within the one LNE. At this level, the functional users of any one Sub-system are either other Sub-systems within LNE1 or are Sub-systems in LNE2 or LNE3 (the latter are not illustrated). The single functional process at the LNE1 level has now been decomposed into nine functional processes at this lowest level of granularity, one in each Sub-system.


At each level of granularity, some data is moved to storage and some is retrieved from storage. The diagram shows which components of LNE1 are involved in this functionality as we decompose to lower levels of granularity. This diagram therefore illustrates that when a pure software architecture is viewed at different levels of granularity and is also decomposed at each level into its components, such that the functional users are always other pieces of software, functional processes can be defined at any level of granularity. The result of this analysis also shows that the size of the functionality shown in the diagram must increase as the detail of more components and functional processes is revealed at lower levels of granularity/decomposition. This ‘growth’ is analogous to what we have seen in the example of the road maps. As we move from a large-scale map to one of smaller scale showing more roads, so the size of the road network appears to increase, although the unit of measure for all maps (e.g. the kilometre) is the same. This finding is extremely important for organiseations involved in comparative benchmarking or in the supply of or use of estimating tools. When the functional users of some software to be measured include humans or engineered devices, it is possible to define the level of granularity of functional processes unambiguously because everyone can recognise a single human or a single engineered device. But in a context where the functional users are only pieces of software, there is no one unique level of granularity/decomposition at which functional processes can be defined as a basis for proceeding with measurement. Within the local context of the telecoms manufacturer that provided this example, it is possible to set an unambiguous local standard for the level at which sizes are measured. For this manufacturer, separate project teams start to develop software at the Sub-system level; Sub-systems are autonomous applications. It is therefore at this level of granularity/decomposition that the telecoms manufacturer wishes to measure functional sizes for project estimating purposes. But other organiseations might have quite different ideas on what is a ‘Sub-system’. So if measurements made in pure software architectures must be compared from different organiseations, very great care must be taken to check that the measurements have been made at comparable levels of granularity and of decomposition.
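As a rough illustration of this growth effect, the sketch below (in Python, purely illustrative and not part of the original example) counts the functional processes revealed in the LNE example at each level of granularity/decomposition and applies the COSMIC convention that a functional process comprises at least two data movements; the actual sizes would of course depend on the data movements identified at each level.

# Functional processes revealed in the LNE example at each level of
# granularity/decomposition (counts taken from the description above).
processes_per_level = {
    "Logical Network Element": 1,
    "System Component": 3,
    "Sub-system": 9,
}

MIN_CFP_PER_PROCESS = 2  # at least one Entry plus one Exit or Write per functional process

for level, count in processes_per_level.items():
    # The measured size can only grow as more detail is revealed at lower levels.
    print(f"{level}: {count} functional processes, minimum size {count * MIN_CFP_PER_PROCESS} CFP")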

2.4. The Measurement Strategy Process
The four elements of the measurement strategy process should be carefully considered before starting a measurement to ensure that the resulting size can be properly interpreted. The four elements are:
• (a) Establish the purpose of the measurement.
• (b) Define the scope of each piece of software to be separately measured, considering all the parameters described above.
• (c) Establish the viewpoint of the functional users of each piece of software that is to be measured.
• (d) Establish the level of granularity of the software artefacts to be measured and how, if necessary, to scale from the sizes of the artefacts to sizes at the level of granularity of functional processes.
Some iteration may be needed around steps (b), (c) and (d) when requirements are evolving and new details indicate the need to refine the definition of the scope(s) to be measured.
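One simple way to enforce this discipline is to record the four elements alongside every reported size. The sketch below is purely illustrative (the field names and example values are assumptions, drawing on the printer/copier example of section 2.2):

from dataclasses import dataclass

@dataclass
class MeasurementStrategy:
    """The four parameters that should accompany any reported functional size."""
    purpose: str               # e.g. "estimate development effort"
    scope: str                 # the piece(s) of software to be separately measured
    functional_users: str      # the functional user (-types) whose viewpoint is taken
    level_of_granularity: str  # e.g. "functional process", plus any local scaling rule

strategy = MeasurementStrategy(
    purpose="estimate development effort of the printer/copier application",
    scope="embedded application software only",
    functional_users="engineered hardware devices interacting directly with the application",
    level_of_granularity="functional process (single functional user detects single event)",
)
print(strategy)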


The great majority of functional size measurements are carried out for a purpose that is related to development effort in some way, e.g. for developer performance measurement, or for project estimating. In these cases, defining the measurement strategy should be very straightforward. The purpose and scope are usually easy to define, the functional users are the users for whom the developer must provide the functionality, and the level of granularity at which the measurements are required is that at which single functional users detect single events. But not all measurements fit this common pattern, so the measurement strategy parameters must be carefully defined in each case.

3. Summary and Conclusions
To ensure that functional size measurements can be interpreted unambiguously, there is a need for various standards that help define 'which' size of a piece of software has been measured. Such standards should be adopted by suppliers and users of FSM Methods, estimating tools, benchmarking services and such-like. Standards are needed in three areas:
1 To define standard measurement scopes:
o Standard levels of decomposition, e.g. an 'application', a major component of an application, an object-class.
o Standard types of delivered work-output (a new development, a set of changes to existing software, some existing software that is re-used, unchanged) and their respective possible size measurements.
2 To require that any functional size measurement should be accompanied by a statement of the viewpoint of the functional user (-types) from which the measurement was made.
3 To require that all functional size measurements be made, if possible, at the level of granularity at which functional processes and their components can be unambiguously identified. Where this is not possible, i.e. in pure software architectures where all the functional users of the software to be measured are other pieces of software, local standards must be established to ensure the comparability of the levels of granularity of measurements between those sharing data.
It is important to remember that these elements of a Measurement Strategy are not tied to the COSMIC FSM Method, but should be common to all FSM Methods. It is only the broader applicability and flexibility of the COSMIC method that has required these elements to be considered more carefully than with 1st Generation FSM Methods.

4. References
[1] Function Point Counting Practices Manual, release 4.2, The International Function Point Users Group, 2004.
[2] MkII Function Point Analysis, Counting Practices Manual, version 1.3.1, The United Kingdom Software Metrics Association, September 1998.
[3] The COSMIC Measurement Manual (The COSMIC Implementation Guide for ISO/IEC 19761:2003), version 2.2, January 2003.
[4] Toivonen, H., 'Defining Measures for Memory Efficiency of the Software in Mobile Terminals', International Workshop on Software Measurement, Magdeburg, Germany, October 2002.


Changing from FPA to COSMIC
A transition framework
Harold van Heeringen

Abstract
Many organisations are considering changing their functional size measurement method from FPA to COSMIC1, mainly because more and more projects become 'less sizeable' with FPA. Many of them refrain from doing so because they fear losing their experience base built on function points. This paper presents a framework to successfully change a functional size measurement method from FPA to COSMIC. The purpose of this paper is to offer some guidance to make it easier for organisations to transfer to COSMIC, while keeping the experience data that has been gathered with FPA. It continues the earlier work of Vogelezang and Lesterhuis (2003) and Desharnais, Abran & Cuadrado (2006). For this paper, 26 projects have been sized with both methods. An analysis is presented of the differences between measurements carried out in COSMIC and FPA. Furthermore, an analysis is presented of the outliers (the data points that do not correlate with the regression formula).

1. Introduction
Many organisations worldwide rely for their project estimation on the functional size measurement method Function Point Analysis (FPA). FPA is the worldwide standard functional sizing technique and has been used since the early eighties of the previous century as a method to estimate software development projects. Many organisations worldwide have been using the method for many years now and these organisations have often built up a history database with metrics based on FPA.
FPA was developed in an era in which software development environments were rather stable. In those days most applications were programmed in a 3GL programming language like Cobol, while the waterfall development method was used on a mainframe hardware platform. The counting guidelines of FPA are well applicable to systems with these characteristics. However, more and more organisations encounter problems in applying the FPA counting guidelines to more modern types of documentation, like for instance UML. This leads to more projects being rated 'uncountable', which leads to less reliable estimations and less confidence in functional size measurement (FSM) methods.
In the late nineties, an international consortium of scientists and practitioners decided to develop a new FSM method and called it COSMIC Full Function Points. COSMIC is a so-called 2nd generation FSM method [1] and is, like IFPUG FPA [2] and NESMA FPA [3][4], also an ISO certified functional sizing method [5]. COSMIC is now gaining more and more attention from the metrics community worldwide and is recognised as the successor of FPA in many organisations.

1 From version 3.0 of the COSMIC manual onwards [7], the method's name is revised from COSMIC-FFP to COSMIC and its size unit is renamed from cosmic functional size unit (cfsu) to COSMIC Function Point (CFP). These new names have been used in this paper.


2. Reasons to apply COSMIC instead of FPA
One of the complaints from FSM practitioners these days is that the FPA functional sizing method is becoming hard to apply to a number of new forms of functional requirement documentation. In this era of web applications and service oriented architectures, the guidelines to identify logical files, for instance, are sometimes hard to apply. Many applications do not even work with permanent data any more, so there will be no EIF's and ILF's present. This leads to less confidence in the method and, of course, to less confidence in the estimations based on this method.
COSMIC can often be applied in circumstances where FPA cannot. One of the reasons is the possibility to identify separate layers and/or peer components in an information system. Where FPA only considers a logical transaction from the start (for instance user input) until the end (for instance a write to the database), the concept of peer components makes it possible to size the different subsystems that carry out the functionality. COSMIC is therefore better suited to sizing applications with different technical components.
Furthermore, COSMIC offers the possibility to measure the size of software in other domains than the traditional business application domain. While FPA can only be used to size business applications, COSMIC claims to be applicable in the real-time software and infrastructure software domains, next to being applicable in the business application domain as well. Organisations that develop software in these two domains do not have a real choice when it comes to choosing the appropriate functional sizing method.
Another reason to consider a transfer to COSMIC is that with this method the size differences between separate functions can be expressed more accurately. In FPA, an EI function for instance gets 3, 4 or 6 function points. A complex one gets 6 points, but a very complex EI also gets 6 points. In COSMIC, the size of a function can be any number between two and (theoretically) infinity. It is therefore possible to state that function A is, for instance, twice as big as function B. Measurement becomes more accurate and this also has an effect on the whole Estimating and Performance Measurement process within organisations. This characteristic also makes it easier to carry out some form of scope management, like for instance SouthernSCOPE [6], making it easier to accurately define the scope that fits within the available budget.
Next to these arguments, COSMIC claims to be a more intuitive method, with fewer guidelines than FPA. This should make it easier to learn the method and to apply it correctly. Theoretically, this may be true. However, in our experience this argument is not entirely valid. We experience as much discussion in COSMIC analysis as in FPA and our research shows that there are no significant differences between the 'analysis speeds' of the methods. The launch of version 3.0 of the COSMIC Measurement Manual [7] and the existence of the Business Application Guideline [8] may prove that COSMIC is a faster method, but this needs further investigation.
So, why is not every organisation transferring its FSM to COSMIC at the moment? One of the reasons is certainly that the method is relatively unknown. Another reason can be that there are relatively few analysts available who know the method and who are able to do a good COSMIC analysis. Furthermore, the number of training facilities is small.
In addition, many organisations that do consider a transfer fear losing their experience base in FPA. In that case, they feel that they would throw away a lot of experience data that took a lot of work to collect. In section 7 of this paper a framework is presented that helps organisations to change their functional sizing method from FPA to COSMIC without losing their experience base. First, the differences and similarities between the two methods are discussed.


3. Differences and Similarities
Both FPA and COSMIC are ISO certified Functional Sizing Methods [1][5][3]. This means that both methods can be used to measure the functional size of software, independent of the technical implementation or the (skilled) individual conducting the analysis. However, measurements carried out with the two methods do not yield the same results. Unlike other areas where there are multiple measurement methods for the same metric (like for instance the length metric in metres or yards), it is not possible to apply a mathematically sound conversion formula. The reason for this is that at this moment it is not possible to make an exact conceptual mapping of the base functional components [9] that both methods measure. Gencel and colleagues [10] are currently conducting research on an FSM unification model, which may lead to an exact conceptual mapping between the methods in the future. The most important differences between the two methods are presented in Table 1.

Table 1: Main differences between FPA and COSMIC
Characteristic | NESMA/IFPUG FPA | COSMIC
Applicable domain | Business software | Business / real-time / infrastructure software
Data model required? | Required | Not required (but very handy)
Measurement of separate components | Not possible | Possible
Size limit per function | Yes | Size is not limited
Benchmarking data | Lots [11] (ISBSG R10 n=3108) | Few [11] (ISBSG R10 n=110)
Measurement of processing functionality | No | No, but local extension is possible
Early sizing | Based on data model | Based on process model

However, there are also a lot of similarities between the two methods. Both methods only size the functional user requirements out of the total set of requirements. Furthermore, the functional user requirements identified are broken down into functions by both methods. In FPA there are two types of functions: data functions and transactional functions. In COSMIC there is only one type of function: the functional process. The COSMIC functional process corresponds strongly with the transactional function in FPA. Schematically, both methods look like this:

[Figure 1: COSMIC schematically - functional users, functional processes and the data movements Entry (E), Exit (X), Read (R) and Write (W)]
[Figure 2: FPA schematically - users, transactional processes (EI, EO, EQ) and data functions (ILF, EIF)]

In detail, the methods can be compared in the following way [10]:

Table 2: Comparison of the two FSM methods
FSM Method | Data types | Data size | Transaction types | Transaction size
FPA | ILF, EIF | # RET's, # DET's | External Input (EI), External Output (EO), External Inquiry (EQ) | # DET's, # files referenced
COSMIC | Transient, Persistent | Part of functional process | Functional process: Entry, Exit, Read, Write | # data movements

In short: FPA sizes the logical entities (ILF's and EIF's) in the data model with a number of function points (limited to 15 points per entity). Furthermore, FPA identifies logical transactions (EI, EO, EQ) and sizes these with function points (limited to 7 points per transaction). COSMIC does not size the entities in the data model, but it does size the logical transactions and there is no size limit on the logical transactions. However, as the number of entities in the data model increases, the average number of data movements within the functional processes is likely to increase as well. This means that the data model that is explicitly sized within FPA is only implicitly sized in COSMIC.
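The contrast between the two sizing rules can be sketched in a few lines of Python. This is only an illustration of the size limits described above, not of the full FPA complexity rules; the example with 14 data movements is hypothetical.

# FPA assigns each function a value from a small fixed scale (an EI gets 3, 4 or 6 points,
# and no transaction exceeds 7 points); COSMIC counts data movements without an upper limit.
def fpa_ei_points(complexity: str) -> int:
    return {"low": 3, "average": 4, "high": 6}[complexity]

def cosmic_process_size(data_movements: int) -> int:
    return data_movements  # 1 CFP per Entry, Exit, Read or Write

# A very complex input process: rated 'high' in FPA, 14 identified data movements in COSMIC.
print(fpa_ei_points("high"), cosmic_process_size(14))  # 6 FP versus 14 CFP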


This leads to the assumption that no definite conceptual mapping is possible between the BFC-types of the two methods, but that a strong relationship between the outcomes measured with both methods is very likely [9]. A number of conversion formulas have already been reported in the past few years.

4. Previous Studies
Convertibility between functional sizing methods is a highly relevant topic, as is shown by the fact that a number of studies have reported a conversion formula. These studies have been carried out by measuring a number (N) of projects with both methods, while using the end user measurement viewpoint of COSMIC. The findings of these studies are presented in Table 3.

Table 3: Previous conversion studies
Author / year | Formula | Correlation | N
Fetcke (1999) [12] | Y(CFP) = 1.1 (IFPUG) - 7.6 | R² = 0.97 | 4
Vogelezang & Lesterhuis (2003) | Y(CFP) = 1.2 (NESMA) - 87; Y(CFP) = 0.75 (NESMA) - 2.6 (< 200 FP) | R² = 0.99 | 11
Desharnais & Abran (2006) [16] | Y(CFP) = 1.0 (IFPUG) - 3; Y(CFP) = 1.36 (IFPUG-TX) + 0 (transactions only) | R² = 0.93; R² = 0.98 | 14
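As a quick illustration of how these published formulas compare, the sketch below (not taken from any of the studies) applies them to an arbitrarily chosen size of 500 function points, treating the NESMA and IFPUG variants as interchangeable purely for the sake of the example.

# Published conversion formulas from Table 3, applied to one example size.
conversions = {
    "Fetcke (1999)":                  lambda fp: 1.1 * fp - 7.6,
    "Vogelezang & Lesterhuis (2003)": lambda fp: 1.2 * fp - 87,
    "Desharnais & Abran (2006)":      lambda fp: 1.0 * fp - 3,
}

fp = 500  # example project size in (NESMA/IFPUG) function points
for study, convert in conversions.items():
    print(f"{study}: {fp} FP -> {convert(fp):.0f} CFP")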

The findings of these studies make it clear that there is a high correlation between functional size measured in COSMIC and functional size measured in IFPUG or NESMA FPA, and that the conversion formula is in most cases close to one to one. However, in most of these studies a number of serious outliers have been reported.

5. New analysis
In 2006, Sogeti sized 26 projects in both FPA and COSMIC. In the COSMIC measurements, only the end user measurement viewpoint has been used, to make the outcomes of the analyses comparable. The measurements have been carried out by NESMA certified analysts and have also been reviewed by NESMA certified analysts. The analysts have a considerable amount of COSMIC experience as well. Most of the COSMIC analyses were reviewed by COSMIC entry level certified analysts, so the quality of the measurements should be high. Still, there is always the impact of the quality of the documentation. These measurements have been carried out as part of the Sogeti bidding process. This means that the documentation was delivered by client organisations requesting a price quotation based on the requirements described. In many cases the quality of the documentation was not very good and the analysts had to make a lot of assumptions during the analysis. The projects involved are all situated in the business application domain. Most of the organisations involved are banking, insurance and government organisations. The dataset is presented in Table 4.


Table 4: Dataset Sogeti analysis 2006
Project ID | #FP Nesma | #ILF | #EIF | #EI | #EO | #EQ | #CFP | #Func. Proc.
1 | 302 | 11 | 6 | 16 | 19 | 9 | 313 | 54
2 | 653 | 13 | 1 | 53 | 53 | 20 | 603 | 110
3 | 606 | 17 | 0 | 45 | 55 | 8 | 778 | 152
4 | 245 | 6 | 6 | 31 | 23 | 3 | 257 | 43
5 | 112 | 2 | 9 | 6 | 4 | 0 | 75 | 8
6 | 499 | 16 | 3 | 45 | 34 | 1 | 445 | 66
7 | 565 | 34 | 0 | 38 | 25 | 1 | 488 | 64
8 | 249 | 14 | 3 | 23 | 14 | 1 | 270 | 36
9 | 129 | 1 | 12 | 4 | 6 | 4 | 73 | 14
10 | 381 | 0 | 30 | 0 | 42 | 0 | 281 | 42
11 | 924 | 45 | 2 | 136 | 7 | 5 | 1144 | 143
12 | 1076 | 45 | 2 | 136 | 7 | 43 | 1448 | 181
13 | 412 | 14 | 1 | 19 | 21 | 11 | 509 | 51
14 | 279 | 11 | 4 | 20 | 20 | 1 | 286 | 44
15 | 279 | 11 | 4 | 20 | 20 | 1 | 352 | 44
16 | 136 | 3 | 0 | 13 | 11 | 2 | 137 | 25
17 | 135 | 3 | 2 | 0 | 0 | 0 | 120 | 15
18 | 874 | 32 | 0 | 95 | 39 | 13 | 925 | 159
19 | 61 | 1 | 4 | 1 | 6 | 0 | 66 | 7
20 | 1622 | 27 | 4 | 124 | 169 | 1 | 1864 | 223
21 | 627 | 23 | 1 | 58 | 25 | 22 | 714 | 113
22 | 586 | 31 | 0 | 75 | 30 | 2 | 620 | 118
23 | 741 | 34 | 0 | 49 | 51 | 13 | 893 | 113
24 | 498 | 21 | 0 | 63 | 39 | 6 | 530 | 104
25 | 286 | 12 | 1 | 20 | 23 | 4 | 252 | 35
26 | 334 | 6 | 8 | 26 | 27 | 3 | 301 | 34
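The conversion formula reported below can be reproduced from this dataset with an ordinary least-squares fit. The following sketch is illustrative (it assumes numpy is available) and uses the FP and CFP columns of Table 4:

import numpy as np

# (NESMA FP, CFP) pairs from Table 4, projects 1-26.
fp = np.array([302, 653, 606, 245, 112, 499, 565, 249, 129, 381, 924, 1076, 412,
               279, 279, 136, 135, 874, 61, 1622, 627, 586, 741, 498, 286, 334])
cfp = np.array([313, 603, 778, 257, 75, 445, 488, 270, 73, 281, 1144, 1448, 509,
                286, 352, 137, 120, 925, 66, 1864, 714, 620, 893, 530, 252, 301])

# Least-squares fit CFP = a * FP + b and the squared correlation coefficient.
a, b = np.polyfit(fp, cfp, 1)
r_squared = np.corrcoef(fp, cfp)[0, 1] ** 2
print(f"CFP = {a:.2f} (NESMA FP) {b:+.0f}, R² = {r_squared:.2f}")
# The paper reports CFP = 1.22 (NESMA FP) - 64 with R² = 0.97 for this dataset.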

Based on the dataset above, the correlation between FP and CFP is the following:

[Figure 3: Conversion formula FPA - COSMIC]
The conversion formula that can be calculated from this dataset is the following:
CFP = 1.22 (NESMA FP) - 64, R² = 0.97


This diagram shows almost the same results as those presented in previous studies. There is quite a high correlation, but the conversion formula deviates from those reported earlier. This supports the conclusion of Abran [16] that the correlation is often very high, but that there is some variation in the conversion formula across organisations. However, based on the fact that the dataset consists of projects from numerous different organisations, it can be concluded that a conversion formula with a high correlation coefficient can be found in any dataset with projects measured with both methods, at least as long as the software resides in the business application domain.
Based on the similarities and differences between the two methods, one would expect a relationship between the percentage of the total function points contributed by the data functions and the average number of CFP per functional process. If the percentage of data functions is low, this implies that there are not many files to be referenced in the system and this would result in a low number of Reads and Writes in COSMIC. One might therefore suspect that a low ratio of data functions to total FP would correspond to a low number of CFP per functional process. The following figure shows, however, that this is not the case.

[Figure 4: Relationship between the number of CFP per functional process and the percentage of function points that the data model delivers - scatter plot of average CFP per functional process (0-12) against the percentage of data function FP (0-60%), with trend line Y = 2.5X + 6.3, R² = 0.03]

So, there does not appear to be a relationship between the number of function points derived from the data model and the average number of CFP per function. If this is true, why is the correlation between FP and CFP then so high?
From theory and previous work, we would expect the number of COSMIC functional processes to be equal to the number of FPA elementary user transactions. However, this study shows different results. In a lot of projects we see that the number of COSMIC functional processes is higher than the number of FPA elementary user transactions. The main reason for this is that in IFPUG and


NESMA FPA there are particular guidelines for the so-called code tables. Code tables usually consist of only two attributes: a code and a description. In IFPUG these tables are not counted at all, and the associated functionality is also discarded. In NESMA, there is only one data function in total for all code table ILF's and one data function for all code table EIF's, and as a standard one EI, one EO and one EQ for the associated functionality. In COSMIC, however, there are different rules for counting code tables. Whether a code table is counted depends directly on whether it can be regarded as an object of interest to the user. If there is any functionality for the user to maintain the code table, COSMIC regards the table as 'of interest' to the user and the functionality to maintain the code table is counted as regular functionality. Furthermore, data movements containing data elements from these code table objects of interest are counted just like other data movements. For instance, the checking Reads of the code table are counted in other functional processes as well. This means that systems with a relatively high number of code tables, where these code tables are considered an object of interest, are likely to have a higher number of CFP per functional process and a higher number of functional processes in total. This can be an explanation for outliers where the number of CFP significantly exceeds the number expected from the function points.
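The effect of these counting rules can be made concrete with a small, purely illustrative sketch. The NESMA point values assume low-complexity functions, and each COSMIC maintenance process is assumed to consist of four data movements; both are assumptions made only for the sake of the example.

# Illustrative comparison of how N maintainable code tables contribute to the size.
def nesma_code_table_fp(n_tables: int) -> int:
    # One ILF-type data function plus one EI, one EO and one EQ in total,
    # regardless of the number of code tables (low-complexity point values assumed).
    return (7 + 3 + 4 + 3) if n_tables > 0 else 0

def cosmic_code_table_cfp(n_tables: int, movements_per_process: int = 4) -> int:
    # Each code table that is an object of interest gets its own maintenance
    # functional process, counted as regular functionality.
    return n_tables * movements_per_process

for n in (1, 5, 20):
    print(n, nesma_code_table_fp(n), cosmic_code_table_cfp(n))
# 1 table: 17 FP vs 4 CFP; 5 tables: 17 FP vs 20 CFP; 20 tables: 17 FP vs 80 CFP
# (checking Reads of the code tables in other functional processes would add further CFP)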

6. Outlier analysis
In this section the outliers of the study are analysed in order to make it possible to learn from them. In this study, outliers are defined as the data points that deviate more than 20% from the trend line shown in Figure 3. Of course, these data points are also partly responsible for the position of the trend line itself, but this is the only way possible to study this. From the dataset in Table 4, only projects 7, 9, 10, 16 and 19 are considered to be outliers.

Table 5: Outliers identified in the dataset
Project ID | FP Nesma | ILF | EIF | EI | EO | EQ | FP per trans. proc. | CFP | Func. Proc. | CFP/Func. proc. | Deviation trend line
7 | 565 | 34 | 0 | 38 | 25 | 1 | 5.1 | 488 | 55 | 8.9 | 22%
9 | 129 | 1 | 12 | 4 | 6 | 4 | 4.4 | 73 | 14 | 5.2 | 21.8%
10 | 381 | 0 | 30 | 0 | 42 | 0 | 5.5 | 281 | 42 | 6.7 | 29.9%
16 | 136 | 3 | 0 | 13 | 11 | 2 | 4.5 | 137 | 25 | 5.5 | 34.4%
19 | 61 | 1 | 4 | 1 | 6 | 0 | 4.9 | 66 | 7 | 9.4 | 533.4%
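The deviation column can be reproduced with a few lines of Python. The sketch below assumes that the deviation is the relative difference between the measured CFP and the CFP predicted by the Figure 3 trend line (CFP = 1.22 FP - 64); with that assumption it reproduces the percentages of Table 5.

# Relative deviation of the measured CFP from the CFP predicted by the trend line.
def deviation(fp: int, cfp: int) -> float:
    predicted = 1.22 * fp - 64
    return abs(cfp - predicted) / predicted

outliers = {7: (565, 488), 9: (129, 73), 10: (381, 281), 16: (136, 137), 19: (61, 66)}
for project, (fp, cfp) in outliers.items():
    d = deviation(fp, cfp)
    flag = " <- outlier" if d > 0.20 else ""
    print(f"project {project}: deviation {d:.1%}{flag}")
# project 7: 22.0%, project 9: 21.8%, project 10: 29.9%, project 16: 34.4%, project 19: 533.4%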

From this sample, it becomes clear that two out of five projects are relatively small. When looking at these projects in detail, we see that, in general, projects with a fairly small size in function points are even smaller in COSMIC. This is true for project 9. This is quite logical, because of the fixed points FPA gives for the data functions derived from the logical data model. In project 19, the high number of CFP per functional process makes up for the lack of points from the data model. Within the metrics community it is often said that it makes no sense to apply functional size metrics to small projects (below 150 FP). This is supported by recent research from Vogelezang and Prins [17].
Project 7 deviates from the trend line because fewer COSMIC functional processes than FPA logical transactions were identified, instead of the other way around. The high average number of CFP per functional process does not make up for the lack of data model function points and the difference in the number of functions.


When analysing the detailed FPA and COSMIC analyses, the following reasons come to mind:
• A lot of different elementary FPA functions are identified for the different processing towards a printer or a file. In COSMIC only an extra Exit (X) is counted for the different lay-out of the output towards the printer or a file.
• A lot of combo or list boxes are present that list attribute values of ILF's or EIF's. In NESMA FPA, each of these different combo or list boxes is counted as an individual EO (4 function points). In COSMIC the combo and list boxes are not counted separately, but are part of a bigger function. The fact that a number of values must be read and presented leads to one Read and one Exit data movement. So, each of these combo and list boxes results in a difference in the total number of functions, and there is only a small correction in the increase of the average CFP per functional process.
Project 16 deviates 34.4% from the trend line, but it does not look very strange. At first sight, the number of data functions is quite low (only 3 ILF's). There seems to be a fully filled CRUD matrix, resulting in 13 EI's, 11 EO's and 2 EQ's. The number of transactions is almost equal to the number of COSMIC functional processes. The deviation lies in the fact that the average number of CFP per functional process (5.5) is significantly lower than the average (7.1) and only slightly above the average number of FP per function (4.5).
Project 10 is an application in which only EIF's and EO's are present. The average number of FP per transaction is 5.5 whereas the average number of CFP per functional process is 6.7. Again, the lack of points for the data model in COSMIC is an important factor, as the difference between the average number of CFP per functional process and the average number of function points per transactional process is not high enough to compensate for this.
In short, the results of this analysis are the following:
• The conversion of small projects (