On Building a Better Program Size Measure
Akito Monden¹, Shinji Uchida², Ken-ichi Matsumoto¹
¹ Graduate School of Information Science, Nara Institute of Science and Technology, {akito-m, matumoto}@is.naist.jp
² Department of Information Engineering, Nara National College of Technology
[email protected]
Abstract: Source Lines of Code (SLOC) is the most basic and widely used program size measure in software project management and quality assurance, although it depends heavily on the programmer who implemented the program. To build a better (i.e. programmer-independent) program size measure, this paper analyzes 9 independently built C programs written to the same functional specification, and finds that 3 base measures (the number of tokens, tokens of code clones, and function parameters) are useful for eliminating the programmer-dependent aspects of SLOC. With a new size measure built upon these 3 base measures, called Adjusted Length of Code (ALOC), the largest of the 9 programs was at most 1.22 times the size of the smallest, whereas with SLOC the difference was 3.16 times. Furthermore, ALOC showed at most a 1.60 times difference among another 6 independently built programs of an alternative specification, while SLOC showed a 4.66 times difference among these programs. These results suggest that the new measure ALOC can reduce the programmer-dependent aspects of program size and can be used as a better size measure in project management.

Keywords: Product Metrics, Source Lines of Code, Program Analysis
1 Introduction

Since the earliest days of software development, Source Lines of Code (SLOC; usually excluding comments and blank lines) has been the most basic measure of a software product. Various SLOC-based process measures have been defined and used for project management and quality assurance, such as defect density (defects per SLOC), test case density (test cases per SLOC), and productivity (SLOC per person-hour). However, SLOC is widely known to have a serious flaw: it depends heavily on the programmer who implemented the program. Even when the functional specification is the same, some programmers need far more SLOC than others to implement it. Nevertheless, software companies keep
IWSM/MetriKon 2010
using SLOC-based process measures because there is no practical alternative to SLOC.

Recently, Function Point (FP) has been proposed as a functional size measure, and has been used successfully as a basis for cost estimation, effort allocation, project scheduling, and so on. However, the use of FP is usually limited to the early phases of software development, since measuring FP requires additional effort. On the other hand, a simple product size measure like SLOC, which can be measured continuously and at low cost throughout development, is still needed, because project managers must track the growth of the product size to make sure that development is on course. This paper attempts to build a programmer-independent program size measure based on observations of independently built programs written to the same functional specification.

2 REQUIREMENTS FOR THE SIZE MEASURE
This section clarifies the requirements that the program size measure needs to satisfy. Let P be a program, S be the functional specification of P, and H be the implementer (programmer) of P. We write P(S, H) for a program implemented by H based on S. Then the Adjusted Length of Code ALOC(P), which is the programmer-independent size of P, should satisfy the following requirement.

[Req. 1] For an arbitrary S, Hi and Hj (i ≠ j), ALOC(P(S, Hi)) ≈ ALOC(P(S, Hj))

This requirement means that two independently built programs of the same specification should have almost the same size. Obviously, conventional SLOC does not satisfy this requirement. Req. 1 alone is not sufficient, however, because any function that returns a constant value satisfies it. We therefore add the following requirement.

[Req. 2] For arbitrary P and Q of different specifications, ALOC(P║Q) ≈ ALOC(P) + ALOC(Q)

where P║Q is the program obtained by concatenating Q to P. For practical use of the size measure, we also give the following requirement.

[Req. 3] For an arbitrary P, ALOC(P) ≈ E(SLOC(P))

where SLOC(P) is the SLOC of P, and E(x) is the expected value of x. This requirement means that ALOC(P) is nearly equal to the average SLOC of P's potential implementations. If Req. 3 is satisfied, project managers can directly substitute ALOC for SLOC as a better program size measure. For example, if a company has a productivity baseline of "10 SLOC per person-hour", it can be replaced by "10 ALOC per person-hour."
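The remark that a constant function satisfies Req. 1 but must be ruled out by Req. 2 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the degenerate measure `constant_size`, the placeholder program texts, and the 25% relative tolerance used to model "≈" are not taken from the paper.

```python
def constant_size(program: str) -> float:
    """Hypothetical degenerate measure: ignores the program entirely."""
    return 100.0


def approx_equal(a: float, b: float, rel_tol: float = 0.25) -> bool:
    """Model '≈' as a relative tolerance (the 25% threshold is an arbitrary assumption)."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


def satisfies_req1(size, p_i: str, p_j: str) -> bool:
    """Req. 1: two implementations of the same specification have almost the same size."""
    return approx_equal(size(p_i), size(p_j))


def satisfies_req2(size, p: str, q: str) -> bool:
    """Req. 2: the size of the concatenation is almost the sum of the sizes."""
    return approx_equal(size(p + q), size(p) + size(q))


# Placeholder program texts standing in for P(S, Hi), P(S, Hj) and Q.
p_by_h1 = "int main(void){return 0;}"
p_by_h2 = "int main(void)\n{\n    return 0;\n}\n"
q = "void f(void){}"

req1_ok = satisfies_req1(constant_size, p_by_h1, p_by_h2)  # 100 vs 100
req2_ok = satisfies_req2(constant_size, p_by_h1, q)        # 100 vs 200
print(req1_ok, req2_ok)
```

The constant measure passes the Req. 1 check trivially and fails the Req. 2 check, which is why both requirements are needed together.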
Software Measurement Conference
3 ANALYSIS OF INDEPENDENTLY-BUILT PROGRAMS
3.1 Materials

To identify a set of source code measures that can be used to eliminate the programmer-dependent aspects of SLOC, this section analyzes 9 independently built C programs written to the same functional specification. The functional specification, programmers and measurement tools are as follows.

[Specification] The program solves the 8-puzzle (a sliding-block puzzle on a 3×3 board) by breadth-first search. The program reads the initial state of the board from standard input, finds a solution, and outputs the sequence of states from the initial state to the goal. To avoid state explosion, the program must not inspect a state that has already appeared earlier in the search.

[Programmers] 9 master's course students of Nara Institute of Science and Technology (NAIST).

[Measurement tools] Resource Standard Metrics [4] was used for SLOC measurement. CCFinderX [1][3] was used for code clone measurement. Token Extractor [7] was used for token measurement.

Table 1 shows the characteristics of the 9 programs PA, PB, …, PI. The "Programming tips" column lists the programming tips, algorithms or data structures used in each program. "Line feed" indicates whether the programmer overuses line feeds. "SLOC" is Source Lines of Code, not including comments and blank lines. "Functions" is the number of functions in the program. "Function parameters" is the total number of parameters of all functions. "Tokens" is the number of tokens in the program. "Token types" is the number of unique tokens in the program. "Tokens of code clone" is the number of tokens covered by any code clone (duplicated code portion). "Coverage of code clone" is the percentage of tokens covered by any code clone (i.e. "Tokens of code clone" divided by "Tokens").

As shown in Table 1, SLOC varies from 116 to 366 (a 3.16 times difference). This supports our intuition that SLOC depends heavily on the programmer who implemented the program.

3.2 Analysis
3.2.1 Programming Style

As shown in Table 1, three programs (PD, PE and PF) overuse line feeds (or carriage returns). All of these programs contain line feeds before and after every brace "{" and "}", which increases SLOC. Also, some programmers put more line feeds (LFs) into variable declarations, e.g. writing "int i; [LF] int first; [LF] int move;" instead of writing "int i, first, move;" on one line. Such differences in programming style show that SLOC varies among implementations even if their specification, algorithms and programming tips are the same. To lessen the effect of programming style on program size, we decided to use the number of tokens instead of line counting as the basis of the program size measure.

P  | Programming tips            | Line feed | SLOC | Functions | Function parameters | Tokens | Token types | Tokens of code clone | Coverage of code clone
PA | queue, adjacency list, hash | -         | 165  | 9         | 7                   | 991    | 114         | 57                   | 5.8%
PB | queue, hash                 | -         | 203  | 8         | 7                   | 1406   | 154         | 351                  | 25.0%
PC | queue, adjacency list, hash | -         | 119  | 4         | 2                   | 883    | 100         | 153                  | 17.4%
PD | 3-dimension array           | Yes       | 329  | 1         | 0                   | 3342   | 107         | 2439                 | 73.0%
PE | queue, adjacency list, hash | Yes       | 116  | 5         | 2                   | 782    | 101         | 39                   | 5.0%
PF | queue, adjacency list, hash | Yes       | 159  | 4         | 1                   | 1090   | 109         | 312                  | 28.7%
PG | queue, adjacency list, hash | -         | 135  | 4         | 2                   | 965    | 119         | 130                  | 13.5%
PH | queue, hash                 | -         | 366  | 22        | 25                  | 2522   | 209         | 968                  | 38.4%
PI | queue, adjacency list, hash | -         | 168  | 8         | 8                   | 1274   | 138         | 254                  | 20.0%

Table 1: Characteristics of 8-puzzle programs

3.2.2 Granularity of modules

As shown in Table 1, some programs had more fine-grained modules than others. For example, program PH had 22 functions and 25 parameters (fine-grained), while the others had at most 9 functions and 8 parameters (coarse-grained). Since defining functions and passing arguments to function parameters requires a significant number of code lines and tokens, the granularity of modules has a significant impact on program size. Indeed, PH was the largest program in SLOC because of its many functions and parameters. To evaluate the effect of module granularity on program size, we use the number of functions and the number of function parameters as further base measures.
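The figures quoted from Table 1 can be re-derived from its raw columns. The short Python sketch below does this for the SLOC spread (largest value divided by smallest) and for the coverage of code clone; the dictionary layout and the helper name `spread` are illustrative assumptions, while the numbers are copied from Table 1.

```python
# Per program: (SLOC, functions, function parameters, tokens, token types,
#               tokens of code clone, coverage of code clone in % as reported in Table 1)
TABLE1 = {
    "PA": (165, 9, 7, 991, 114, 57, 5.8),
    "PB": (203, 8, 7, 1406, 154, 351, 25.0),
    "PC": (119, 4, 2, 883, 100, 153, 17.4),
    "PD": (329, 1, 0, 3342, 107, 2439, 73.0),
    "PE": (116, 5, 2, 782, 101, 39, 5.0),
    "PF": (159, 4, 1, 1090, 109, 312, 28.7),
    "PG": (135, 4, 2, 965, 119, 130, 13.5),
    "PH": (366, 22, 25, 2522, 209, 968, 38.4),
    "PI": (168, 8, 8, 1274, 138, 254, 20.0),
}


def spread(values):
    """'Times difference' between the largest and the smallest value."""
    values = list(values)
    return max(values) / min(values)


# SLOC spread across the nine implementations of the same specification.
sloc_spread = spread(row[0] for row in TABLE1.values())

# Coverage of code clone = tokens of code clone / tokens, as a percentage.
coverage = {name: 100.0 * row[5] / row[3] for name, row in TABLE1.items()}

print(f"SLOC spread: {sloc_spread:.2f}")
for name, pct in coverage.items():
    print(f"{name}: recomputed {pct:.1f}%, reported {TABLE1[name][6]}%")
```

The recomputed coverage values agree with the reported column to within about 0.1 percentage point; the remaining gaps presumably come from rounding in the measurement tools.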
3.2.3 Code clone

As shown in Table 1, some programs had many more code clones (duplicated code portions) than others, and this contributed to the variation in SLOC among programs. For example, program PD's coverage of code clone was 73%, while PE's was only 5%. Indeed, PD was the largest program in tokens because of its many code clones. To evaluate the effect of code clones, we use tokens of code clones and coverage of code clones as further base measures.

3.2.4 Algorithms and programming tips

As shown in the "Programming tips" column of Table 1, the programming tips (including algorithms and data structures) used varied among programs. The three largest programs (PB, PD and PH) did not use the "adjacency list" technique, which enables programmers to write compact code. This indicates that the lack of proper programming tips can increase program size. Although it is difficult to capture the lack of proper programming tips directly with any source code measure, code clone measures could be used as indirect measures. As shown in Table 1, all three programs PB, PD and PH had a significant amount of code clones. It can be considered that the lack of proper programming tips tends to increase the repetition of similar code fragments and thus increases the code clone measures.

4 DEFINITION AND DERIVATION OF SIZE MEASURE
Based on the analysis in the previous section, we decided to use Tokens, Functions, Function parameters, Tokens of code clones, and Coverage of code clones from Table 1 as base measures for building the programmer-independent size measure ALOC. We employ a plain linear regression model (equation (1) below) to build the ALOC measure, where the candidate predictor variables are the base measures.
\[ \mathrm{ALOC} = k_1 N_1 + k_2 N_2 + \cdots + k_n N_n + C \qquad (1) \]
where ALOC is the Adjusted Length of Code, N_j are the predictor variables, k_j are the partial regression coefficients, and C is a constant.

To estimate the regression coefficients k_j and the constant C of equation (1), we need to prepare the fit dataset carefully so that the resulting regression model satisfies Req. 1, 2 and 3 in Section 2. To satisfy Req. 2, we need a concatenated program P║Q where P and Q have different specifications. However, since all programs in Table 1 have the same specification, we instead prepared 9 double-sized programs PA║PA, PB║PB, PC║PC, …, PI║PI, so that we can expect ALOC(Px║Px) to be 2×ALOC(Px) if the code clone pairs between the former and latter halves of Px║Px are ignored. As desired outputs, we gave the average SLOC of PA, …, PI to all original-size programs (to satisfy Req. 1 and 3) and twice the average SLOC to the concatenated programs Px║Px.

Next, we selected the set of base measures to be included as predictor variables of equation (1). Considering that equation (1) combines the predictor variables additively to compute ALOC, we selected "tokens of code clones" instead of the ratio measure "coverage of code clones." We then excluded the measure "Functions" since it had a very high correlation (0.984) with "Function parameters." As a result, three base measures (tokens, tokens of code clones, and function parameters) were used as predictor variables.

Table 2 shows the resultant regression model. We confirmed that all coefficients and the constant are statistically significant (p < 0.01). This regression model is our program size measure ALOC.

Predictor variable    | Regression coefficient | Standardized regression coefficient
Tokens                | .393                   | 3.98
Function parameters   | -12.2                  | -.978
Tokens of code clones | -.422                  | -3.36
(Constant)            | 15.5                   |

Table 2: Resultant regression model

Figure 1 shows how the ALOC values computed by the model (y-axis) fit the desired outputs (x-axis). As shown in Figure 1, the computed ALOC satisfies Req. 1 to 3 for all programs PA, …, PI. Table 3 compares SLOC, desired ALOC (i.e. average SLOC) and computed ALOC for programs PA, …, PI. While SLOC varied from 116 to 366 (a 3.16 times difference), computed ALOC varied from 274 to 335, which is only a 1.22 times difference.
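As a cross-check, the model of Table 2 can be applied to the base measures of Table 1 in a few lines of Python. This is a sketch rather than the authors' tooling: the function name `aloc` and the data layout are assumptions, and because Table 2 reports rounded coefficients, the recomputed values should only be expected to match the reported computed ALOC values to within a line or two.

```python
def aloc(tokens: int, function_params: int, clone_tokens: int) -> float:
    """ALOC from the (rounded) coefficients reported in Table 2."""
    return 0.393 * tokens - 12.2 * function_params - 0.422 * clone_tokens + 15.5


# (tokens, function parameters, tokens of code clone) from Table 1.
MEASURES = {
    "PA": (991, 7, 57), "PB": (1406, 7, 351), "PC": (883, 2, 153),
    "PD": (3342, 0, 2439), "PE": (782, 2, 39), "PF": (1090, 1, 312),
    "PG": (965, 2, 130), "PH": (2522, 25, 968), "PI": (1274, 8, 254),
}

# Computed ALOC values as reported in the paper (Table 3).
REPORTED = {"PA": 296, "PB": 335, "PC": 274, "PD": 301, "PE": 282,
            "PF": 300, "PG": 316, "PH": 294, "PI": 312}

computed = {name: aloc(*m) for name, m in MEASURES.items()}
max_gap = max(abs(computed[n] - REPORTED[n]) for n in MEASURES)
aloc_spread = max(REPORTED.values()) / min(REPORTED.values())

for name in MEASURES:
    print(f"{name}: recomputed {computed[name]:.1f}, reported {REPORTED[name]}")
print(f"largest gap: {max_gap:.2f}, ALOC spread: {aloc_spread:.2f}")

# Doubling all three base measures (the Px║Px construction, ignoring clone
# pairs between the two halves) gives 2*ALOC - C, not exactly 2*ALOC.
doubled_pa = aloc(2 * 991, 2 * 7, 2 * 57)
```

Because the model contains the constant C = 15.5, doubling every base measure yields 2×ALOC(Px) − 15.5, so the fitted model can satisfy Req. 2 only approximately.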
P  | SLOC | ALOC (desired) | ALOC (computed)
PA | 165  | 296            | 296
PB | 203  | 296            | 335
PC | 119  | 296            | 274
PD | 329  | 296            | 301
PE | 116  | 296            | 282
PF | 159  | 296            | 300
PG | 135  | 296            | 316
PH | 366  | 296            | 294
PI | 168  | 296            | 312

Table 3: Comparison between SLOC and ALOC of 8-puzzle programs
Figure 1: Scatter plot of desired ALOC (x-axis, 0 to 700) versus computed ALOC (y-axis, 0 to 700)
5 EVALUATION OF SIZE MEASURE

5.1 Experiment with Alternative Specification
To evaluate the generality of the ALOC measure (i.e. the regression model) derived in the previous section, we measured another 6 independently built C programs PS, PT, …, PX written to an alternative specification. The functional specification of these programs is to translate a text stream using Huffman coding. The 6 programmers were 1 faculty member and 5 master's course students (all came from software companies) of NAIST. Table 4 shows the result of the experiment.

P  | SLOC | Function parameters | Tokens | Tokens of code clone | ALOC
PS | 334  | 7                   | 2135   | 1433                 | 165
PT | 153  | 5                   | 839    | 204                  | 198
PU | 499  | 0                   | 2411   | 1656                 | 264
PV | 107  | 0                   | 625    | 132                  | 205
PW | 130  | 3                   | 730    | 93                   | 226
PX | 113  | 1                   | 786    | 289                  | 190

Table 4: Source code measures of Huffman coding programs

As shown in Table 4, while SLOC varied from 107 to 499 (a 4.66 times difference), ALOC varied from 165 to 264, which is only a 1.60 times difference. This suggests that the ALOC measure derived in Section 4 can reduce the programmer-dependent aspects of program size for a different specification as well, and can be used as a better size measure in project management.

5.2 Threats to Validity
Here we discuss the threats to the validity of our work. We used only two functional specifications, with 9 and 6 implementations respectively. We need to analyze other programs and other specifications in future work. There are also other programming factors that can affect program size. For example, failing to use appropriate standard libraries can increase program size, because the programmer then needs to write the additional functionality by hand. We need to consider such factors in future research.

6 RELATED WORK
Software companies often use logical SLOC (which counts the number of statements rather than source lines) instead of physical SLOC to reduce the influence of programming style. We calculated logical SLOC for the 9 programs of Table 1. As a result, logical SLOC still varied from 56 to 235 (a 4.20 times difference). Therefore, logical SLOC is insufficient for reducing the programmer-dependent aspects of program size.

Halstead proposed a program size measure called "Volume" that takes the size of the vocabulary into account [2]. Halstead's Volume is given by N×log2(n), where N is the total number of tokens and n is the number of unique tokens. We calculated Volume for the 9 programs of Table 1. As a result, Volume varied from 5207 to 22530 (a 4.33 times difference). Therefore, Volume is not useful as a programmer-independent size measure.

Kusumoto et al. attempted to measure Function Points from source code in a specific application domain [5][6]. This can be an alternative approach to achieving our goal. Since that research is still in progress, further work is required before it can be used in practice.
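The Volume figures above can be reproduced from the "Tokens" and "Token types" columns of Table 1. The Python sketch below follows the formula exactly as stated in this section (N and n counted over all tokens); the function name `halstead_volume` and the data layout are illustrative assumptions.

```python
import math


def halstead_volume(total_tokens: int, unique_tokens: int) -> float:
    """Volume = N * log2(n), with N total tokens and n unique tokens."""
    return total_tokens * math.log2(unique_tokens)


# (tokens, token types) for the nine 8-puzzle programs, from Table 1.
TOKENS = {
    "PA": (991, 114), "PB": (1406, 154), "PC": (883, 100),
    "PD": (3342, 107), "PE": (782, 101), "PF": (1090, 109),
    "PG": (965, 119), "PH": (2522, 209), "PI": (1274, 138),
}

volumes = {name: halstead_volume(n_total, n_unique)
           for name, (n_total, n_unique) in TOKENS.items()}

smallest = min(volumes, key=volumes.get)
largest = max(volumes, key=volumes.get)
volume_spread = volumes[largest] / volumes[smallest]

print(f"min: {smallest} {volumes[smallest]:.0f}")
print(f"max: {largest} {volumes[largest]:.0f}")
print(f"spread: {volume_spread:.2f}")
```

With these inputs the smallest Volume belongs to PE and the largest to PD, matching the reported range of 5207 to 22530.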
7 Conclusion
To reduce the programmer-dependent aspects of program size, this paper first defined three requirements for a programmer-independent size measure. From an analysis of 9 independently built C programs written to the same functional specification, we found that 3 base measures (tokens, tokens of code clones, and function parameters) are useful for eliminating the programmer-dependent aspects of SLOC. With a new size measure, Adjusted Length of Code (ALOC), built upon these 3 base measures, the variation in size among these 9 programs was greatly reduced. To evaluate the generality of the ALOC measure, we also measured another 6 independently built C programs of an alternative specification. The result showed that the ALOC measure is also effective for a different specification and can be used as a better size measure in project management.

Our new program size measure ALOC can be measured automatically from source code and, since it satisfies Req. 3 of Section 2, project managers can easily substitute ALOC for SLOC as a better program size measure. In the future, it will be necessary to improve our measure based on analyses of other programs and other functional specifications.

Acknowledgement

Part of this work was conducted in the StagE Project, the Development of Next Generation IT Infrastructure, supported by the Ministry of Education, Culture, Sports, Science and Technology. Part of this work was also conducted under the Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (C) (22500028).

References

1. CCFinderX, http://www.ccfinder.net/
2. Halstead, M.H., "Elements of Software Science (Operating and programming systems series)", Elsevier Science Inc., New York, 1977.
3. Kamiya, T., Kusumoto, S., and Inoue, K., "CCFinder: A Multi-Linguistic Token-based Code Clone Detection System for Large Scale Source Code," IEEE Trans. Software Engineering, vol. 28, no. 7, pp. 654-670, 2002.
4. Resource Standard Metrics, http://msquaredtechnologies.com/m2rsm/
5. Kusumoto, S., Edagawa, T., and Higo, Y., "On an Automatic Function Point Measurement from Source Codes," In 2nd Workshop on Accountability and Traceability in Global Software Engineering (ATGSE2008), pp. 27-28, Dec. 2008.
6. Kusumoto, S., Imagawa, M., Inoue, K., Morimoto, S., Matsushita, K., and Tsuda, M., "Function Point Measurement from Java Program," In Proc. 24th International Conference on Software Engineering (ICSE2002), pp. 576-582, May 2002.
7. Token Extractor for C/C++ Programs, http://www.vector.co.jp/soft/winnt/prog/se482039.html