When is Software Ready for Production? Parallels with Automotive QS9000 Methods

Michael Ellims (1), Richard Evans (2), Keith M Hobley (3) and Ian Kendall (4)

(1) Pi Technology, Milton Hall, Ely Road, Milton, Cambridge, CB4 6WZ, United Kingdom. [email protected]
(2) Jaguar Cars Limited, W/1/014, Engineering Centre, Abbey Road, Whitley, Coventry, CV3 4LF, United Kingdom. [email protected]
(3) School of Computing, The University of Leeds, Leeds, LS2 9JT, United Kingdom. [email protected]
(4) Rolls-Royce & Bentley Motor Cars Limited, Crewe, Cheshire, CW1 3PL, United Kingdom. [email protected]
Abstract

QS9000 is an automotive industry quality system standard. It builds on ISO9001, interpreting it in the automotive context and augmenting it with requirements more specific to the automotive industry. While the practices defined in QS9000 are well understood and widely applied to physical vehicle components, QS9000 cannot currently be applied to software in any meaningful way. This paper introduces the concepts underlying QS9000, provides some background on the problem of determining software readiness for production, and proposes a possible solution to the problem within the QS9000 framework.
1 Introduction

Today's automotive environment is highly competitive, and quality is of fundamental importance to producing commercially successful motor vehicles. In 1995 Ford, General Motors and Chrysler jointly published the QS9000 [Chrysler 1995] document set, an automotive-specific interpretation of the international quality management standard ISO9001 [ISO 1994]. Since then, QS9000 has been adopted to varying degrees as a basis for ensuring the quality and "readiness for production" of components fitted to motor cars.
The Motor Industry Software Reliability Association (MISRA) is a consortium of motor manufacturers, suppliers and academic partners whose mission is "To provide assistance to the automotive industry in the application and creation within vehicle systems of safe and reliable software". In 1999 MISRA distributed questionnaires to gauge interest in the development of a software readiness for production metric based on QS9000. The response to this survey was positive, and the result was the creation of a MISRA working party. This group meets regularly to progress the work, an important contribution being trials carried out by group members on real projects within their organisations. This paper discusses the challenges of managing embedded software development in the automotive industry, introduces the content and concepts of QS9000, and goes on to discuss the work currently being undertaken by a MISRA working group in exploring the potential for a software readiness metric within the QS9000 framework.
2 Embedded Software in the Automotive Industry

Embedded software is an increasingly important aspect of motor vehicles. Software is involved in the control of most of the features in a modern vehicle, and with manufacturers continuously seeking to differentiate their products from those of the competition, it is clear that embedded software content will only increase over time. We consider two issues that can lead to problems: complexity and domain knowledge.

Software enables complex tasks to be performed. There is also complexity at the interfaces between systems (both between vehicle-based systems and between the vehicle and its environment), and wherever there is complexity, misunderstandings can cause problems. Traditionally, software has been used to enhance the performance of existing mechanical systems, e.g. electronic fuel injection replacing a carburettor. In these cases domain knowledge was high, having been built up over decades of development. However, software is increasingly being used for new applications for which the domain knowledge is not necessarily high. An example of this type of system is Adaptive Cruise Control (ACC), a feature which allows a vehicle to automatically cruise behind another vehicle, using radar to track the motion of the vehicle in front. Such a system involves complex interactions between many vehicle-based systems and is also new, so domain knowledge is limited. Increased complexity and a lack of domain knowledge together have an important influence on the potential for problems in software-based systems.

When these problems are realised in practice, it is often impossible for managers to understand the reasons why they have occurred. This can be for a number of reasons, for example: little visibility of the software development process; the intangible nature of software; and the lack of visibility of the factors which influence software. The situation is compounded by the need to go into great detail to explain the reasons for a particular software problem. All these factors conspire to equip managers poorly to control the risks associated with software development. The effects of software problems span both the cost and timeliness of software delivery (project risks) and the consequences of failure of the product in the hands of the customer (product risks). Another barrier facing managers in the automotive industry is that the approaches used for achieving quality and reliability have, in the main, been developed with physical components in mind, with an emphasis on manufacturing process capability. A quality assurance department applying these concepts to a programmable control unit will probably only perform checks for physical dimensional correctness.
3 The Motivation

At the outset of the "software readiness for production" work described in this paper, the vision was that the status of a given software development could be reported to managers graphically as a percentage completeness (see Figure 1). The idea was that by plotting the planned and actual progress on such a graph, the difference between the two would provide management with a meaningful indication of software progress. It therefore follows that the aims of the work described in this paper are to provide managers with the following:

• Visibility of the issues that affect software development, and a meaningful measure of the "readiness for production" of a given software development.
• Visibility of the impact of changes at any given stage in the software development lifecycle. The impact of changes can be illustrated very effectively on a graph such as Figure 1.
• An understanding of the dependencies throughout the development lifecycle, i.e. the impact of requirements quality/maturity on the software development, and the impact of validation and calibration activities performed by the vehicle manufacturer on the overall readiness for production of software.
[Figure 1 - Software completion metric expressed graphically: planned and actual progress curves, with completion (%) on the vertical axis against time to production (0-42 months) on the horizontal axis.]
4 An Introduction to QS9000

QS9000 is the US automotive industry's interpretation of ISO9001 and was jointly published by Ford, GM and Chrysler (now DaimlerChrysler) in 1995. It is true to the spirit of ISO9001, but extends the requirements to be more specific about how they should be met. QS9000 applies to the Original Equipment Manufacturers (OEMs), their subsidiaries, and their suppliers. QS9000 is a suite of documents consisting of the following publications:

• Quality System Requirements (QSR). The primary document in the QS9000 suite. Its foundation is the international quality system standard ISO9001; industry and company specific requirements have been added, along with references to the supporting documents in the suite.
• Quality System Assessment (QSA). Used in the audit of the QSR through a series of scored questions.
• Failure Mode and Effect Analysis (FMEA). Defines the technique for identifying and evaluating potential failure modes of a product and/or process and their effects.
• Production Part Approval Process (PPAP). Covers the generic requirements for production part approval. Its purpose is to determine whether all the engineering design requirements are properly understood by the supplier and are capable of being implemented prior to commencement of the first production run.
• Advanced Product Quality Planning (APQP). Provides a structured approach to the implementation of a Product Quality Plan which will support the development of the product to satisfy customers.
• Statistical Process Control (SPC). Covers the basic principles of SPC, e.g. common and special cause variation, and the use of control charts for both variable and attribute data.
• Measurement Systems Analysis (MSA). Provides instruction for selecting procedures to assess the quality of a measurement system, including both guidelines and specific procedures for assessment and study techniques.
5 Is QS9000 Applicable to Software?

In the previous sections we discussed the challenges that face managers responsible for projects with a software content. We have also introduced QS9000 as a framework for managing the quality of automotive vehicles, systems and components. In a broad sense QS9000 is applicable to software in that it defines the basis for a quality management system. However, it does not explicitly acknowledge software and does not refer to any specific basis for software quality management systems, e.g. ISO9000-3 [ISO 1997] and the TickIT scheme [TickIT 1994]. The main question is, therefore: can we extend or interpret aspects of QS9000 to apply to software development? In developing a software readiness for production metric we are interested in defining completion criteria, measuring progress with respect to completion, and reporting the results of these measurements to management. This means that the following parts of QS9000 are not directly relevant:

• Quality System Requirements
• Quality System Assessment
• Failure Mode and Effects Analysis
• Statistical Process Control
• Measurement Systems Analysis

This leaves the Advanced Product Quality Planning (APQP) and Production Part Approval Process (PPAP) documents, which are discussed further below.
5.1 Advanced Product Quality Planning (APQP)

The APQP document constitutes a structured method for defining and executing the actions necessary to ensure a product satisfies the customer. Responsibility for consistency with APQP is placed firmly on the supplier, and is required of all system, subsystem and component manufacturing locations.

APQP is based on a vehicle development lifecycle and defines what activities (to ensure quality) must be completed by the end of each development phase. It is based on the concept of continuous quality improvement, such that lessons learnt from one system's development should be fed back into the APQP process for future developments. In a sense it is the APQP document that ties the rest of the QS9000 documentation together in a temporal manner. It is also the document that introduces the requirement for the Production Part Approval Process (PPAP), which in turn defines the criteria for acceptability of a part for production.
5.2 Production Part Approval Process (PPAP)

The PPAP document defines an approach to determine whether all customer, engineering, design record and specification requirements are properly understood by the supplier. It must also show that the process has the potential to produce a product meeting these requirements during an actual production run at the quoted production rate. PPAP is therefore always required to have been completed prior to the first production shipment of a product. The PPAP document contains concepts of some potential usefulness with respect to software readiness for production. These are enumerated below.

• Submission levels. The customer (vehicle manufacturer) can decide to place a requirement for one of five submission levels on a supplier. Level 1 is the lowest submission level and requires the supplier to perform only self-certification. Level 5 is the most demanding, in that a declaration must be made and supported with considerable evidence that the part will meet its requirements.
• Process requirements. These relate to process capability and, although not directly related to the theme of this paper, they do introduce the notion of measurement and statistics, which could be interpreted in terms of a software process completion metric.
• Performance tests. These relate to the functional behaviour of parts, and as such this type of activity is directly relevant to software development, unlike certain dimensional tests which only apply to physical components.
• Part submission warrant (PSW). This is a certificate completed by the supplier to declare that a part is ready for production. If a part does not have an associated PSW, volume production may not be initiated.
6 Towards a Solution

Here we introduce a mechanism for producing a software measure that can be used for a "software PPAP". It is introduced here to canvass as much comment on the proposal as possible. The ideas are presented in a "pure" form, but it is assumed that changes will be made in practice as necessity dictates. Indeed, the current working document sets out principles that should be followed, not procedures. This is done because the working methods and procedures of any two software groups, even within the same company, can vary quite widely [Glass 2000]. This is of course due to a variety of factors, such as:

• application type, e.g. PC-based interface software vs. embedded control software;
• safety type, e.g. embedded radio control vs. brake-by-wire;
• complexity, e.g. light cluster control vs. engine control unit.
As an example of how processes may differ, consider the differences that may result between two embedded software projects with different safety integrity levels. The MISRA guidelines [MISRA 1994] state that "higher levels of integrity require more information and more rigorous application of software engineering techniques". If we perform a hazard analysis we may find that the radio application can be classified at Safety Integrity Level (SIL) one and the brake-by-wire application at SIL three or four. It then follows that each application would necessarily have different processes associated with it. With this in mind, consider the issue of software testing: for the radio we need only have a repeatable test plan, whereas for the brake system we are required to achieve 100% white-box test coverage. This in turn implies that different tools, recording methods and so on will also be required. A more extreme example is a project producing a prototype system for a purpose such as proof of concept. Does such a system require a formal process? Probably; however, common sense suggests that it should be less complex, to allow the maximum gain in experience for the lowest cost.
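To make the dependence of process content on integrity level concrete, the following sketch (in Python, purely for exposition) maps a SIL onto a set of required verification activities. The mapping and the activity names are hypothetical examples of our own; the actual requirements at each SIL would come from the MISRA guidelines and from project-specific hazard analysis, not from this sketch.

    # Illustrative only: a hypothetical mapping from SIL to required
    # verification activities. Real requirements would be derived from
    # the MISRA guidelines and project-specific hazard analysis.
    REQUIRED_ACTIVITIES = {
        1: {"repeatable test plan"},
        2: {"repeatable test plan", "structural coverage measured"},
        3: {"repeatable test plan", "100% white-box (statement) coverage"},
        4: {"repeatable test plan", "100% white-box (branch) coverage",
            "independent verification"},
    }

    def activities_for(sil):
        """Return the set of verification activities assumed for a SIL."""
        return REQUIRED_ACTIVITIES[sil]

    print(activities_for(1))  # e.g. the embedded radio application
    print(activities_for(4))  # e.g. the brake-by-wire application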
6.1 Assumptions

The ideas presented here start from the following two assumptions. These have been chosen as a starting point because most, if not all, suppliers of embedded software have made progress in this area already, and it seems prudent to build on existing work. Firstly, the software supplier has a quality system of some sort under which the software will be produced. The current working document specifies ISO9001, but an equivalent system would work as well. Secondly, the quality system defines the process (or how to define it), and hence the stages which the software will go through in the production of the final product. That is, there is some idealised model of the process in use against which progress can in fact be measured. There are no specific requirements for the development process which can be derived from QS9000 other than those relating to ISO9001. While the characteristics of a typical software development process will be influenced by publications such as the MISRA guidelines, the details of actually defining a software development process are outside the scope of this work, though the subject is discussed further in section 6.4. This reliance on a quality system and on a defined process in turn meets the main requirement of APQP (section 5.1), i.e. we have to define how we are to produce the product. To help clarify matters we shall use a simplified version of the V model [McDermid 1991], as shown in Figure 2, to illustrate the ideas being put forward.
[Figure 2 - the example V model: Requirements, Functional Specification, Overall Design and Module Design on the descending arm, Code at the base, and Unit Test, Integration, System Test and Acceptance Tests on the ascending arm.]
Assuming that we have defined the process and its stages, we should also have defined all the work products of those stages. In our example process the major work products include requirements documents, functional specifications, design documents and so on. There may also be other work products associated with each stage, for example checklists, review records and so on. It should of course be noted that work products are not limited solely to documents; the class of object referred to here as a work product may include spreadsheets, database files, log files and of course the final object code itself! It is expected that this set of work products would form at least the basis of the documentation set that is to be provided under the PPAP process (section 5.2, "Submission levels"). The exact set of documentation supplied is of course a matter to be decided between the two parties involved. To be able to measure progress against the defined process, each work product needs to have certain properties. These properties include features such as traceability, that is the ability to track changes over time and to associate different versions of work products with other associated work products in some manner. Most importantly, each work product also needs to have a well defined set of completion criteria; that is, we have to be able to know when a work product is complete (a sketch of what this might look like is given after the following list). There are several other issues that we wish to take into account as well:

• we wish to add minimal extra work;
• we want to fit in with what people/organisations already do;
• we want to distribute the work of generating the data so that it is integrated into the process rather than acting as an "add-on".
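As a minimal sketch of what recorded traceability and completion criteria might look like, consider the following Python fragment. The representation, field names and example criteria are hypothetical choices of ours; the scheme itself prescribes only the properties (traceability, well defined completion criteria), not any particular implementation.

    from dataclasses import dataclass, field

    @dataclass
    class WorkProduct:
        """One work product: a document, spreadsheet, log file, object code..."""
        name: str
        version: str
        traces_to: list = field(default_factory=list)  # upstream work products
        criteria: dict = field(default_factory=dict)   # criterion -> satisfied?

        def is_complete(self):
            # Complete only when criteria exist and all are satisfied.
            return bool(self.criteria) and all(self.criteria.values())

    # Hypothetical completion criteria for a module design document.
    design = WorkProduct(
        name="ABC module design", version="1.2",
        traces_to=["ABC functional specification v2.0"],
        criteria={"review held": True,
                  "review actions closed": True,
                  "under configuration control": False},
    )
    print(design.is_complete())  # False - one criterion still outstanding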
The idea behind the proposal is that it is only possible to reliably or objectively measure items for which a completely unambiguous measure can be defined. For example, it is not possible to directly measure the "quality" of code, because there is no single manner in which "quality" can unambiguously be defined. Much work has been done to try to find measures that may be indicative of quality, for example McCabe's cyclomatic complexity metric or Halstead's software science [Pressman 1997]. (The McCabe metric rests on the common assumption that increased complexity is positively correlated with an increased probability of error; Fenton [Fenton 1991] seems to dispute whether the metric even measures complexity.) However, while valuable for what they do show, such measures tend to capture only code-based attributes, that is, properties of existing code, rather than whether or not the code is complete or, more importantly, suitable for purpose. In contrast, the new metric is intended to gauge how "suitable" software is, based on the following idea: if the software has been produced according to the process model as defined by the project, then our confidence in it being correct should be directly related to how much of the process has been applied.
6.2 Requirements

How to deal with requirements is a thorny issue. The scheme laid out in the following sections implicitly assumes that there is some method of recording and tracking them, but explicitly avoids making any assumptions about how this should be done. If we take the V model development process shown in Figure 2, then in this instance it may be enough that requirements can be explicitly tracked from the requirements document into the functional specification document (or documents). The original proposal considered the use of individual requirements for tracking. This could be done, and it is being looked into further. However, we feel that there may be a number of drawbacks to this approach, which are detailed below.

• Additional work: currently many projects are not organised to track requirements separately from the rest of the process. To ask them to do so would add yet another set of paperwork on top of the tracking that already exists, or at least should!
• Granularity: consider, from the point of view of the customer, a requirement such as "provision for a CAN network suitable for control of auxiliary equipment will be provided". This may be a "good" requirement from the point of view of the customer (who may know nothing of CAN networks), but it leaves open a slew of questions for the software provider. The opposite can occur if a customer states a requirement for an SAE J1939 [SAE 2000] connection. Is the requirement that there is such a connection, or is the requirement stating that all messages defined by J1939 are implemented? The question here is, in effect: is this one requirement or fifty?
• Completeness: a complete system is often more than the sum of its requirements. By this we mean that there is usually functionality present in the system in addition to what is specified in the requirements document.
• Applicability: requirements may apply to a complete system (e.g. hardware, software, harness, sensors), not just to the software. If we are purely tracking requirements, then we need some means of separating out the different functional areas from the requirements.

All of the issues raised here may of course turn out not to be issues in practice. There is no question that requirements need to be tracked; what we need to be careful of is specifying how this is to be done, given that we need to cope with a great deal of process diversity.
6.3 Metric Outline

In this section we outline the basic mechanics of the ideas put forward in the framework above. As stated previously, each of the steps is expected to produce a known work product or products and to have a well defined end point. The working draft of the guidelines assumes that we can state that a work product is either complete or it is not. There is also the assumption that a later stage in the process cannot be complete if one of the previous stages is not complete. We also assume that we can broadly allocate how much effort each item of work will comprise, compared to the other stages. As an example, consider a single software module, which for the sake of argument we will call ABC. It is reasonable [Ellims 1999] to assume that the amount of work associated with each phase of the model process can be divided roughly as shown in Table 1, which ignores any management overhead (about 25%).
Work Item                 Weighting
Function specification    2/14
Module design             6/14
Code                      1/14
Unit test                 1/14
Integration test          2/14
System test               2/14
Total effort              14/14

Table 1 - example weighting values for work effort
Several things should be noted about this table. Firstly, we have made the assumption that for every module there will be an independent functional specification, which of course may not be the case. Also, we have used fractions for the weights, though percentage values could be used if desired. Finally, the effort values suit the process model we are using as an example and are probably not applicable to any process model that differs significantly from it. Each work item in the example comprises a number of separate work products. The details of these should be included in the QA plan and should form the focus of monitoring when tracking progress. At the start of the development process for the module, the completion value associated with it would be zero. As each step in the process is completed, the weight for that step is added to the total for that module. When all steps are completed, the final value for the module would be 14/14. At this point it is probably useful to demonstrate how we expect the metric to be applied. If only the functional specification is complete, the completion value for the module is 2/14; that is, about 14% of the total expected work has been completed. If all stages were complete except, for example, the unit testing, then the completion value would be 13/14. It would stay at that figure until the full process had been completed. Using this complete/not-complete concept, if the unit testing was subsequently performed and found errors in the code (but not the design), then the weight for the code would be removed from the final figure, reducing it to 12/14 (see section 6.4, "Change"). In addition, the weights for any items further along in the process, i.e. integration and system testing, also have to be removed. This would lower the completion value for the module to 8/14. For a complete system comprising N modules, the maximum total completion value would be N (each complete module contributing 14/14 = 1). The actual completion value for the system would be the sum of the completion values of all the modules that comprise the system. For example, if there were 10 modules, of which 4 were complete and no work had been done on the others, the project as a whole would be 40% complete. If, however, the functional specifications were in fact complete for the other six modules, this value would increase by 6 x 2/14 to about 48% for the whole project. This of course assumes that all modules require equivalent amounts of time and effort (see section 6.4, "Module size").
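The arithmetic above is simple enough to capture in a few lines of code. The following Python sketch reproduces the Table 1 weights, the complete/not-complete rule, and the knock-on loss of credit for downstream stages; the function names and data structures are illustrative choices of ours, not part of the working document.

    from fractions import Fraction

    # Stage weights from Table 1, in process order (management overhead ignored).
    STAGES = [
        ("Function specification", Fraction(2, 14)),
        ("Module design",          Fraction(6, 14)),
        ("Code",                   Fraction(1, 14)),
        ("Unit test",              Fraction(1, 14)),
        ("Integration test",       Fraction(2, 14)),
        ("System test",            Fraction(2, 14)),
    ]
    STAGE_NAMES = [name for name, _ in STAGES]

    def module_completion(done):
        """Sum the weights of the stages currently judged complete."""
        return sum(w for name, w in STAGES if name in done)

    def invalidate_from(done, failed_stage):
        """Pure complete/not-complete rule: a stage found faulty loses its
        credit, together with every stage downstream of it."""
        cutoff = STAGE_NAMES.index(failed_stage)
        return {name for name in done if STAGE_NAMES.index(name) < cutoff}

    # Module ABC: everything complete except unit test -> 13/14.
    abc = set(STAGE_NAMES) - {"Unit test"}
    print(module_completion(abc))            # 13/14

    # Unit testing then finds a coding error: Code, Integration test and
    # System test all lose their credit, dropping the module to 8/14.
    abc = invalidate_from(abc, "Code")
    print(module_completion(abc))            # 4/7, i.e. 8/14

    # Ten-module system: four modules fully complete, six with only their
    # functional specifications done -> (4 + 6 * 2/14) / 10, about 48%.
    values = [Fraction(1)] * 4 + [Fraction(2, 14)] * 6
    print(float(sum(values) / len(values)))  # 0.4857...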
6.4 Issues

There are of course a number of issues associated with the scheme put forward above, which we attempt to address at this point.

Process: the process needs to be defined in a quality plan (or APQP), and it defines how we are to produce the final product. This maps directly onto the requirement stated in section 5.1 that we specify "the actions necessary to ensure a product satisfies the customer". In addition, there are a number of points that would need to be explicitly taken care of in the process so that the mapping onto the PPAP is complete. The issue of "submission levels" (section 5.2) may be independent of process, in that suppliers enjoy different levels of trust. However, there would appear to be a good mapping between the concept of submission levels and the idea of a SIL for the software or system; that is, a higher SIL may necessarily imply a higher submission level. As noted at the beginning of section 6, the SIL also directly impacts the process requirements and hence, indirectly, the submission requirements. The issue of "performance tests" (section 5.2) is also a matter for the process. What is meant by "performance" in this context is rather vague; however, in our example system it may include any or all of the following: functional tests derived from requirements; operational tests derived from the overall design; unit test results; and code coverage. It may in fact be possible to assume that correct completion of all of these tests provides the analogy we are seeking.
Module size: not all modules are of a similar size; that is, they require varying amounts of work to bring to completion. This can be dealt with by assigning different overall weights to modules. For example, weights of one, two or three could be assigned, where a weight of two indicates a "normal" module, one a "small" module and three a "large" module. Engineering judgement will be needed to predict the overall effort required; in practice a normal module may be ranked one and an exceptional module three or more. The converse may also be true, and it may be possible for all modules to have a rank of one.

Rework: what if, when performing the unit tests, we find an error in the code? In the pure model it is assumed that the code then becomes incomplete again; that is, all the credit given to it is lost. However, this is not a serious matter for minor changes, as the time to correct the matter would be small, so the total time over which the drop in completion value has effect should also be small. Unless, of course, we don't actually get around to making the change for a while, in which case the drop in the completion value of the affected module will remain obvious, as intended.

Change: as for rework, this would result in a loss of all completion points. How big an effect this has depends on how far back up the chain (or V, see Figure 2) the change is introduced. For example, a change to the functional specification would result in the total loss of all completion points earned downstream of that point.

Variants: the effects of rework and change outlined above are an extreme view; that is, if something is changed then we cannot make any statements about whether it is finished or not. This "pure" view is taken in light of the experience that the code is always "90% done" (which the cynic can generalise to assume that 90% of everything is always done). Despite the justifiable sarcasm, it may be valid to assume that we only have to redo a small percentage of the work on any work product, and can remove points accordingly. For example, if we find a coding error during unit testing, the pure view would say the code is now not complete. However, if the affected function comprises 5% of the total code, then it may be valid to take the view that only 5% of the code needs rework, so only 5% of the points need to be removed. There are of course complications: if the function to be altered is called by other functions, does this imply that those functions also have to be considered changed? Possibly. The downside of taking this view of how to manage completion is that it may lead to hair-splitting, as well as multiplying the accounting and tracking difficulties. For example, if two functions require rework, how do we account for this? Do the functions interact? If they do, then when one has its changes complete, is it finished, or is it waiting on the other function to be reworked?
Usage: the information generated by the proposed metric is in itself reasonably useful. However, there may be several ways in which its usefulness could be enhanced. The simplest example would be to track and plot the completion value over time. It should also be possible to compare expected progress against actual recorded progress (section 3, second bullet point). Depending on the level of detail required, it might be of value to split the metrics recorded and to plot each work item separately; in this way any problem can be traced to the relevant work item. In addition, where change does occur, its effect can be seen against those work items having to be reworked (section 3, third bullet point). These concepts also fit in well with current ideas about how the values are to be calculated and tracked: the current model assumes that this information will be held in a spreadsheet of some form, and that as work products are completed the sheet or sheets will be updated.
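As a minimal illustration of such spreadsheet-style tracking, the fragment below compares planned and actual completion values at monthly reporting points, in the spirit of Figure 1. All the figures are invented purely for illustration.

    # Planned vs. actual completion (%) at monthly reporting points.
    # All figures below are invented for illustration.
    plan   = {1: 5, 2: 12, 3: 20, 4: 30, 5: 42}
    actual = {1: 4, 2: 10, 3: 14, 4: 29, 5: 35}

    for month in sorted(plan):
        slip = plan[month] - actual[month]
        flag = "  <-- behind plan" if slip > 0 else ""
        print(f"month {month}: planned {plan[month]}%, "
              f"actual {actual[month]}%{flag}")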
7 Conclusions

QS9000 offers a framework, based on ISO9001, for managing the quality of parts for volume production motor vehicles. While it has been developed for physical parts and not for software development, we have found a number of aspects which could be useful. The ideas presented here have in the main been developed by a small group of software engineers and managers from the automotive industry, as a response to the issues raised in sections 3 and 5. They represent our current thinking (as of September 2000) on how a metric for software readiness for production could be defined that:

• is development process independent;
• provides visibility to management of software development progress; and
• provides a means to formally sign off software as ready for production within the QS9000 framework.
The primary purpose of producing this paper has been to present these ideas to as wide an audience as possible and, most importantly, to entice a response from that audience as to the suitability and/or practicality of the ideas presented. The authors would like to take this opportunity to thank the MISRA organisation for providing a forum for the discussion of the ideas presented here (especially the exploding cream slices provided at lunch!). We would also like to thank the many other people who have contributed to these ideas.
8 References

[Chrysler 1995] Chrysler Corporation, Ford Motor Company and General Motors Corporation. QS9000: Quality System Requirements. Carwin Continuous Ltd, Thurrock, Essex, UK, 1995.
[Ellims 1999] Ellims, M. and Parkins, P. Unit Testing Techniques and Tool Support. SAE paper 1999-01-2842, 1999.
[Fenton 1991] Fenton, N.E. Software Metrics: A Rigorous Approach. Chapman and Hall, 1991. ISBN 0412404400.
[Glass 2000] Glass, R.L. Process Diversity and a Computing Old Wives'/Husbands' Tale. IEEE Software, Vol. 17, No. 4, July/August 2000.
[ISO 1994] ISO 9001:1994. Quality systems - Model for quality assurance in design, development, production, installation and servicing. 1994. ISBN 0580234398.
[ISO 1997] ISO 9000-3:1997. Quality management and quality assurance standards - Part 3: Guidelines for the application of ISO 9001:1994 to the development, supply, installation and maintenance of computer software. 1997. ISBN 0580293165.
[McDermid 1991] McDermid, J.A. and Rook, P. Software development process models (15/25). In: McDermid, J.A. (ed.), Software Engineer's Reference Book. Butterworth-Heinemann, 1991. ISBN 0750610409.
[MISRA 1994] The Motor Industry Software Reliability Association. Development Guidelines for Vehicle Based Software. Published by MIRA, 1994. ISBN 0952415607.
[Pressman 1997] Pressman, R.S. Software Engineering: A Practitioner's Approach, Fourth Edition (European adaptation by D. Ince). McGraw-Hill, 1997. ISBN 0077094115.
[SAE 2000] SAE J1939: Recommended Practice for Truck and Bus Control and Communications Network. Society of Automotive Engineers.
[TickIT 1994] TickIT project team. The TickIT Guide: A Guide to Software Quality System Construction and Certification using ISO 9001:1994, Issue 4.0, 1998. ISBN 0580289974.