The CRASH Report - 2014

(CAST Research on Application Software Health)

Summary of Key Findings


Contents

1. Introduction to CRASH
2. The CRASH Sample
3. Structural Quality Measurement
4. Observations on the Full CRASH Sample
5. Source, Shore, and Number of Users
6. CMMI Maturity Level
7. Development Method
8. Summary of Factors Affecting Health Factor Scores
Authors


Executive Summary

CRASH reports highlight trends in five structural quality characteristics, or health factors - Robustness, Performance, Security, Changeability, and Transferability. The data reported here are from the Appmarq benchmarking repository maintained by CAST, comprising 1316 applications submitted by 212 organizations from 12 industry sectors located primarily in the United States, Europe, and India. These applications totaled approximately 706 million lines of code. Statistical analysis found that:

• Applications suffering from violations of good architectural and coding practice that make them less robust are also likely to be less secure.

• With minor exceptions, the health factor scores have little relation to application size.
• CMMI Level 1 organizations produced applications with substantially lower structural quality on all health factors than applications developed in CMMI Level 2 or Level 3 organizations.
• Across all health factors, a mix of Agile and Waterfall methods produced higher scores than either Agile or Waterfall methods alone.
• The choice to develop applications in-house versus outsourced had no effect on health factor scores, while the choice to develop applications onshore versus offshore had very small effects on Changeability and Robustness.


706 million lines of code
1316 custom applications
212 organizations
12 industry sectors
11% of the applications are over a million LOC

1. Introduction to CRASH

This is the third biennial report produced by CAST on global trends in the structural quality of business application software. Structural quality refers to the engineering soundness of the architecture and coding of an application, rather than to the correctness with which it implements the customer's functional requirements. These reports highlight trends in five structural quality characteristics, or health factors - Robustness, Security, Performance, Transferability, and Changeability. Structural quality is measured as violations of rules representing good architectural and coding practice in each of these five areas.

Evaluating an application for violations of structural quality rules is critical, since such violations are difficult to detect through standard testing. Structural quality flaws are the defects most likely to cause operational problems such as outages, performance degradation, unauthorized access, or data corruption. CRASH reports provide an objective, empirical foundation for discussing the structural quality of software applications throughout industry and government. This report provides a brief summary of the important results from the full 2014 CRASH Report.

2. The CRASH Sample

The CRASH Report data are drawn from the Appmarq benchmarking repository maintained by CAST, comprising 1316 applications submitted by 212 organizations for the analysis and measurement of their structural quality characteristics. These applications totaled approximately 706 MLOC (million lines of code). These organizations are located primarily in the United States, Europe, and India. The sample includes 565 applications written primarily in Java-EE, 280 in COBOL, 127 in .NET, 77 in ABAP, 59 in Oracle Forms, 33 in Oracle ERP, 39 in C, 28 in C++, 24 in ASP, and 84 written in a mix of languages.

The sample is widely distributed across size categories and appears representative of the types of applications in business use. However, the applications usually submitted for structural analysis and measurement tend to be business critical systems, so we do not claim that this sample is statistically representative of all the world's business applications. Rather, it appears most representative of the mission critical subset of the custom application portfolio.

The smallest application accepted into the CRASH sample contains 10 KLOC (kilo, or thousand, lines of code). Within the CRASH sample, 28% of the applications are less than 50 KLOC, 33% contain between 50 KLOC and 200 KLOC, 29% contain between 201 KLOC and 1 MLOC, and 11% are over 1 MLOC, including 20 applications over 5 MLOC. At least one application in each language contained over 1 MLOC, with Java-EE, COBOL, and C having applications over 10 MLOC.

There are 12 industry sectors represented among the 212 organizations that submitted applications to the Appmarq repository. Financial services firms submitted 421 applications, 314 came from insurance, 187 from telecom, 169 from manufacturing, 56 from utilities, 56 from government agencies, 48 from retail, 41 from IT consulting, 40 from business service providers, 32 from independent software vendors, 22 from energy, and the remainder from a mix of other business sectors. Java-EE applications accounted for at least one-third of the applications in every industry segment except insurance. Several strong associations were observed between industry sectors and languages. For instance, the preponderance of COBOL applications were in financial services and insurance. ABAP applications were observed primarily in manufacturing, while C applications were most prominent in telecom and utilities.

3. Structural Quality Measurement

The following terms will be used throughout this report.

Structural Quality: The non-functional quality of a software application that indicates how well the code is written from an engineering perspective. It is sometimes referred to as technical quality or internal quality, and represents the extent to which the application is free from violations of good architectural or coding practice.

Structural quality - the extent to which the application is free from violations of good architectural or coding practice.

Structural Quality Health Factors: The CRASH data include five structural quality characteristics, which will be called health factors in this report. Scores for these health factors are computed on a scale of 1 (high risk) to 4 (low risk) by analyzing the application to detect violations of over 1200 good architectural and coding practices. Scoring is based on an algorithm that evaluates the number of times a violation occurred compared to the number of opportunities where it could have occurred, weighted by the severity of the violation and its relevance to each individual health factor. The quality characteristics analyzed in the CRASH Report are:

• Robustness: The stability and resiliency of an application and the likelihood of introducing defects when modifying it.
• Performance: The efficiency of the software with respect to processing time and resources used.
• Security: An application's ability to prevent unauthorized intrusions.
• Changeability: An application's ability to be easily and quickly modified.
• Transferability: The ease with which a new team can understand an application and become productive working on it.
• Total Quality Index: A composite score computed by aggregating scores from the five health factors listed above.

Violation: A structure in the source code that is inconsistent with good architectural or coding practice and has proven in the past to cause problems that affect either the cost or risk of an application.

Technical Debt: Technical debt represents the effort required to fix violations of good architectural and coding practices that remain in the code when an application is released. Technical debt is calculated only on violations that the organization intends to remediate. Like financial debts, technical debts incur interest in the form of extra costs accruing for a violation until it is remediated, such as the extra effort required to modify the code or inefficient use of hardware or network resources.
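The scoring rule described above - violations per opportunity, weighted by severity and relevance - can be sketched in a few lines. This is an illustrative approximation only: the weighting scheme, the rule names, and the linear mapping onto the 1-to-4 scale are invented for the example, not CAST's actual algorithm.

```python
# Illustrative sketch of a violation-density health-factor score.
# NOT CAST's proprietary algorithm: the severity weights, relevance
# weights, and the mapping onto the 1 (high risk) to 4 (low risk)
# scale are invented for this example.

def health_factor_score(rule_results, relevance):
    """rule_results: dicts with 'rule', 'violations', 'opportunities',
    and 'severity' (1 low .. 3 high).
    relevance: rule name -> weight (0..1) for this health factor."""
    weighted_density = 0.0
    total_weight = 0.0
    for r in rule_results:
        weight = r["severity"] * relevance.get(r["rule"], 0.0)
        if weight == 0 or r["opportunities"] == 0:
            continue  # rule irrelevant to this factor, or never applicable
        weighted_density += weight * (r["violations"] / r["opportunities"])
        total_weight += weight
    density = weighted_density / total_weight if total_weight else 0.0
    # Map density 0.0 (clean) .. 1.0 (every opportunity violated)
    # onto the 4 (low risk) .. 1 (high risk) scale.
    return 4.0 - 3.0 * min(density, 1.0)

results = [
    {"rule": "sql-injection", "violations": 0, "opportunities": 100, "severity": 3},
    {"rule": "deep-nesting", "violations": 10, "opportunities": 100, "severity": 1},
]
relevance = {"sql-injection": 1.0, "deep-nesting": 0.5}
print(health_factor_score(results, relevance))  # close to 4.0: few, low-severity violations
```

A violation-free application scores exactly 4.0 under this sketch; the severity weighting means a single high-severity violation pulls the score down more than several low-severity ones.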

TheCRASH CRASHReport Report--2014 2011/12 • Summary Key Findings The | Summary of of Key Findings

Architectural and coding flaws that reduce an application’s Robustness are often accompanied by flaws that make it less Secure.

4. Observations on the Full CRASH Sample

The distributions of scores for each of the health factors are presented in Figure 1. The distributions for Robustness, Security, and Changeability are negatively skewed, indicating that the preponderance of scores are in the upper range. Approximately 75% of scores for the operational risk factors of Robustness, Performance, and Security are above 3.0, compared to the lower distributions for the cost-related health factors of Changeability and Transferability. Among possible explanations are that fewer violations related to operational risk are released from development, or that these violations are prioritized for remediation over the cost-related factors of Transferability and Changeability.

Since the Total Quality Index is a composite of the five health factor scores, its distribution and descriptive statistics tend toward a mean among the statistics for each of its five component health factors, with the exception that its range and standard deviation are less than those of its components. Thus, the Total Quality Index exhibits less variation and is less affected by outliers or extreme scores.

Within these data, Security was strongly correlated with Robustness, which means that violations of good architectural and coding practice that reduce an application's Robustness are also very likely to be accompanied by the types of violations that make it less secure. Performance showed little correlation with the other health factors, contradicting the long-standing belief that changes which affect Performance positively will affect other software attributes negatively.

Figure 1. Distributions of Health Factor scores for full 2014 sample


The five health factor scores have little or no relation to size, with two language-specific exceptions.

Finally, the five health factor scores have little or no relation to size when analyzed across the entire CRASH sample. The only exceptions to this observation were negative relationships between size and Robustness in Java-EE and between size and Security in COBOL. As shown in Figure 2, the Security scores for COBOL applications decline as size increases. Although there are COBOL applications with lower Security scores in all size ranges, the decline in Security scores is dramatic for COBOL applications over 3 million lines of code.

The following sections will report on how various demographic factors affected structural quality in the CRASH sample. Since different numbers of violations were defined for each health factor in each language, demographic effects cannot be easily compared across applications written in different languages. Only the large sample of Java-EE applications contains a sufficient number of applications in each category of the various demographic variables to make statistically valid inferences from the data.

Figure 2. Scatterplot of Security scores with size in COBOL


There were no statistically significant differences between sourcing choices on any health factor scores.

Health factor scores for applications serving more than 5000 users are higher than scores for those serving 5000 or fewer users.

5. Source, Shore, and Number of Users

Of the 501 Java-EE applications that reported sourcing information, 224 were developed in-house, while 277 were outsourced. There were no significant differences in average size measured in lines of code between in-house and outsourced applications, and there were no statistically significant differences between sourcing choices on any health factor scores in the Java-EE sample. Although there were no mean differences on these health factors, there was substantial variation within each sourcing category, suggesting that factors other than application source might be affecting the health factor scores.

In the Java-EE sample, 387 applications were developed onshore while 114 were developed offshore. There were no statistically significant differences in scores for Performance, Security, or Transferability, or in the size of applications developed onshore or offshore. The only significant differences based on shoring choice indicated that applications developed onshore were slightly more changeable and robust. Although statistically significant, these differences were so small - accounting for less than 2% of the variation in the scores - that they have little practical significance.

In the Java-EE sample, 50 applications were reported to serve under 500 users, 37 applications served 500 to 5000 users, and 101 applications served more than 5000 users. Significant differences were found for all health factors based on the number of users served by the application. Across all health factors, these differences were accounted for by health factor scores for applications serving more than 5000 users being higher than scores for those serving 5000 or fewer users. Applications serving more than 5000 users are typically customer-facing applications. Therefore it is not surprising that greater effort would be focused on the structural quality of these applications, considering their risk to the business if they suffer operational problems or are difficult to maintain.

6. CMMI Maturity Level

In the Java-EE sample, 23 applications were developed in CMMI Level 1 organizations, 26 were developed in CMMI Level 2 organizations, and 32 were developed in CMMI Level 3 organizations. There were not enough CMMI Level 4 or Level 5 organizations in the sample to provide valid comparisons beyond CMMI Level 3. There were no significant differences in the sizes as measured in lines of code between the applications developed in organizations at any of the three CMMI levels. Figure 3 displays the distributions of scores for applications developed in CMMI Levels 1, 2, and 3 organizations for each health factor. Significant differences were observed among applications developed at different CMMI maturity levels on all health factors. The strongest effects were observed for Robustness, Security, and Changeability, accounting for between 20% and 28% of variation in the scores. The statistically significant impact of CMMI Maturity Level on Performance and Transferability was not as strong, but still accounted for between 11% and 12% of the variation in scores.
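The "% of variation accounted for" figures quoted here are effect sizes of the kind a one-way ANOVA yields: eta-squared, the between-group sum of squares divided by the total sum of squares. A minimal sketch with invented scores (not CRASH data):

```python
# Eta-squared: the share of total score variance explained by group
# membership (here, CMMI maturity level). The scores below are
# hypothetical, invented only to illustrate the computation.

def eta_squared(groups):
    """groups: list of lists of scores, one inner list per group."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

level1 = [2.6, 2.8, 2.7, 2.9]  # hypothetical Robustness scores
level2 = [3.3, 3.1, 3.4, 3.2]
level3 = [3.4, 3.2, 3.3, 3.5]
print(eta_squared([level1, level2, level3]))  # fraction of variance explained
```

An eta-squared of 0.20 to 0.28, as reported for Robustness, Security, and Changeability, would mean CMMI level alone accounts for roughly a fifth to a quarter of the spread in those scores.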


Applications developed by Level 1 organizations have significantly lower health factor scores than those developed in CMMI Level 2 or Level 3 organizations.

Figure 3. Health factor distributions for CMMI Levels 1, 2, and 3 applications (panels for Total Quality Index, Robustness, Performance, Security, Changeability, and Transferability, each showing score distributions for Levels 1, 2, and 3 on a 2.0 to 4.0 axis)


Health factor scores for the mix of Agile and Waterfall are higher than for Agile or Waterfall approaches used separately.

Further statistical analysis confirmed that the significant mean differences observed on each health factor resulted from applications developed by Level 1 organizations having significantly lower health factor scores than those developed in CMMI Level 2 or Level 3 organizations. No statistically significant differences were observed between the scores on any of the health factors for CMMI Level 2 and Level 3 organizations. These results are not surprising since the change from CMMI Level 1 to Level 2 involves controlling the most common impediments to successful software engineering practice such as unachievable commitments and volatile requirements. With these problems managed, developers are able to perform their work in a more orderly and professional manner, resulting in fewer mistakes during development. This change will have significant impact on the structural quality of the software. The growth from CMMI Level 2 to Level 3 is focused more on achieving an economy of scale from standardizing development practices, so it is not surprising that health factor scores were similar between these two levels. Nevertheless, these data offer definitive proof that process improvements can have strong positive effects on the structural quality of IT applications.

7. Development Method

In the Java-EE sample, 57 applications reported using Agile methods, 60 applications reported using Waterfall methods, 46 applications reported using a mix of Agile and Waterfall methods, and 21 projects reported using no method. Figure 4 displays the distributions of scores for applications using different development methods.

Significant differences were observed among development methods on all health factors. The strongest differences between development methods were observed for Robustness and Changeability, where they accounted for 14% to 15% of the variation in scores. Smaller but significant differences were observed for Security (9%) and for Performance and Transferability (5% to 6%). Additional statistical analyses confirmed that these differences were accounted for by higher health factor scores for the mix of Agile and Waterfall compared to scores for Agile or Waterfall approaches used separately, or not using a method. Scores for Agile and Waterfall methods did not differ significantly from each other on any of the health factors.

These results indicate that for large business critical applications the mix of Agile and Waterfall methods produces greater structural quality than other development methods, although for Performance and Transferability these differences are not large. The superiority of the Agile/Waterfall mix suggests that for these types of applications the greater emphasis on up-front design leads to better scores for the Robustness, Changeability, and Security of the application, and to a smaller extent for its Performance and Transferability.


Figure 4. Health factor distributions for development methods (panels for Total Quality Index, Robustness, Performance, Security, Changeability, and Transferability, each showing score distributions for Agile, Mix, None, and Waterfall on a 2.0 to 4.0 axis)

Scores for Agile and Waterfall methods did not differ significantly from each other on any of the health factors.


8. Summary of Factors Affecting Health Factor Scores

The strongest impacts on structural quality among all the demographic factors were for process maturity. CMMI Level 1 organizations produced applications with substantially lower scores on all health factors than applications developed in CMMI Level 2 or Level 3 organizations.

The impact of development method was not as great as that of process maturity, but it still affected all health factors. The mix of Agile and Waterfall methods produced higher scores than either Agile or Waterfall methods used alone, suggesting that for business critical applications the value of agile and iterative methods is enhanced by the up-front architectural and design activity that characterizes Waterfall methods.

The choice to develop applications in-house versus outsourcing them, or onshore versus offshore, had little to no significant effect on health factor scores.

These results provide definitive evidence for the value of process maturity and achieving the right mix of Agile and Waterfall methods in developing business critical applications. Structural quality on large business critical applications was best achieved when impediments to disciplined software engineering practices were removed and early design activity was integrated with short-cycle releases. The full CRASH report will include benchmark data for each language on each health factor, as well as data on the most frequently violated rules of good architectural and coding practice for each language.



Authors Dr. Bill Curtis Senior Vice President and Chief Scientist

Lev Lesokhin Executive Vice President, Strategy and Analytics

Dr. Bill Curtis is best known for leading development of the Capability Maturity Model (CMM). Prior to joining CAST, Bill was a Co-Founder of TeraQuest, the global leader in CMM-based services. Earlier he directed the Software Process Program at the Software Engineering Institute (SEI) at Carnegie Mellon University. He also directed research at MCC, at ITT's Programming Technology Center, in GE Space Division, and at the University of Washington. He is a Fellow of the Institute of Electrical and Electronics Engineers for his contributions to software process improvement and measurement.

Lev Lesokhin is responsible for CAST's market development, strategy, thought leadership, and product marketing. He has a passion for customer success, building the ecosystem, and advancing the state of the art in business technology. Lev comes to CAST from SAP's Global SME organization. Prior to SAP, Lev was a leader in the research team at the Corporate Executive Board, a consultant at McKinsey, and a member of technical staff at The MITRE Corporation. Lev holds an MBA from the Sloan School of Management at MIT and a B.S.E.E. from Rensselaer Polytechnic Institute.

Alexandra Szynkarski Product Marketing Manager

Stanislas Duthoit Research Associate

Alexandra Szynkarski is the product manager for CAST Highlight and research assistant in CAST Research Labs. Her research interests include comparative analysis of application technical quality across technologies and industry verticals, as well as measuring technical debt. Alexandra received an MS in international business administration from the Institut d’Administration des Entreprises in Nice, France.

Stanislas Duthoit is a research associate in CAST Research Labs. His interests include structural quality benchmarks and measuring software performance trends across the global application development community. Stanislas holds an MSc in Civil Systems and a Certificate in Management of Technology from UC Berkeley.
