KSII The first International Conference on Internet (ICONI) 2009, December 2009 Copyright ⓒ 2009 KSII


Program Readability: A Proposed Software Metric

Paul S. Fisher1, Jinsuk Baek1, and Minho Jo2

1 Department of Computer Science, Winston-Salem State University, Winston-Salem, NC 27110 - USA
[e-mail: {fisherp, baekj}@wssu.edu]
2 Graduate School of Information Management and Security, Korea University, Seoul - South Korea
[e-mail: [email protected]]
*Corresponding author: Minho Jo

Abstract

This paper addresses the issue of readability, extending the ideas behind the Flesch Reading Ease score, but it proposes another approach, based upon pattern recognition and a measure of the locality of the patterns exhibited in the program text. We suggest that short patterns are easier to grasp than longer ones. We test this technique using a pattern recognition system we have developed, called Finite Inductive Sequences, and apply it first to English text to determine whether it matches the Flesch Reading Ease score, which intuitively it should. We then apply it to two C++ programs, one written by an expert programmer and the other written by a competent, but not expert, programmer. We derive comparative measures for these two programs, suggest some ideas related to these preliminary results, and suggest some additional considerations for this kind of work. The application of this would primarily be associated with development in the Agile environment, but it could be appropriate for any development or maintenance activity involving programming.

Keywords: Readability, pattern recognition, software complexity

1. Introduction

Over the past several decades, the area of software metrics has received considerable attention. The ultimate goal of this work is, and has been, to provide the entity charged with delivering a software project a quantitative measure [1] that can be applied consistently in that local domain to determine: 1) reliability and maintainability; 2) measures to control the development process; and 3) modules with potential instability. Given these goals, the objective of such efforts has centered on the ability to compare projects involving differing programming languages, coding styles, and even differing levels of project complexity. With quantitative measures in hand, the desire is then to be able to provide numbers that reflect the effort, resources, and time associated with a project.

In the Editor's note to [1], the Software Technology Center has added, from one of its reviewers, the following: "When the decision is made to choose a method to measure software complexity, there is no single method that will meet every need and the use of hard and fast rules may actually increase complexity." There have been numerous studies, and [2] references eleven such studies performed in the 1970's, for both large and small projects, to determine the efficacy of the contrasting complexity measures and their applicability when applied to determine the desired intent. In this same article, there are four references for the topic of the psychological complexity of computer programs, or the complexity of the programs introduced by the programmers.

This research was supported by the Brain Korea 21 program, Ministry of Education, Science and Technology, the Korean Government.


From this basis, the design of software has undergone a major evolution, where software design now resides on a spectrum between adaptive and predictive [3]. A design and implementation approach called Agile software development encompasses several techniques that minimize risk by incrementally developing software in iterations, where an iteration is a period of time allocated to planning, analysis, design, implementation, testing, and documentation. This period of time is from one to four weeks, and during any one iteration the software system is completed ready to deliver, even if the incremental enhancement to the project is small. Team size in Agile environments must necessarily remain small, as the team operates under verbal interaction using experienced developers. [3] suggests that there are still environments where the predictive approach is valid. So again, one size does not fit all.

There is one last idea that warrants examination in this introduction: the Refactoring [4] of software. The idea is to approach existing software with a set of transformations that allow a user to alter it by repeated application of these transformations. The idea is to make these transformations, then test the transformed software to make sure its functionality is not compromised. These successive transformations will eventually produce a more acceptable final product, which is amenable to the maintenance and alteration techniques associated with Agile methods and better fits a simpler description of structure.

The remainder of this paper will indirectly address both the Refactoring transformations and the Agile methods as we propose an idea for a readability criterion for software. We discuss this idea of readability since, at some point, someone will be tasked with the responsibility to look at code and make some changes, whether the changes are the Refactoring transforms or the alterations that occur due to Agile methods. In a sense, the Refactoring transforms are alterations that make the code more readable, but what is a quantitative measure of this readability? We will conclude this paper with some preliminary research on this idea. We want to ensure that the implementer of this software will maintain, under Agile methods or Refactoring, the objective of increasing the simplicity of the code based upon readability. This issue of readability involves ideas from the complexity environment and the pattern perception of the human cognitive system.


2. Reading Complexity

The process of implementing a system in software requires skills beyond basic programming languages and systems. One must have functional knowledge of the application, and in the Agile environment the developer does not need to have a clear picture of the end product, since the end is achieved by mutual consent over iterations of small steps. However, the ability to read and understand code is central, as the code must come from a design, and the design comes from the understanding of the participants, which may start in obscurity and over time evolve to the clarity necessarily encapsulated by the released application.

Readability has evolved in natural language to the point where many organizations use it to evaluate materials for potential adoption or for submission of documentation and manuals. Consider the following brief paragraph:

The proposed mechanism provides many advantages compared to the previous mechanisms when applied to the HMIPv6 network. First, it reduces message overhead, because hAAA or mAAA sends a message for establishing the security association only to the uniquely selected mAAA or sAAA rather than all or some subset of the neighboring AAA servers. In addition, the BS will not send periodic invite messages without any knowledge about the new network entities in its area. Those service entities including BS's or AAA servers are most likely to be actual BS's or AAA servers for a given MAP or MN owing to our accurate mobility prediction.

The above paragraph has a Flesch Reading Ease [5] score of 27.5, and its Flesch-Kincaid Grade Level [6] is twelve. The Introduction to this paper has a Flesch Reading Ease score of 25.4 and a grade level of twelve. Any score under 30 is considered to be very difficult. The higher the Flesch Reading Ease score, the easier the document is to read, and the recommendation is to have a Reading Ease score of between 60 and 70 for the typical reader. It seems to us that the quoted paragraph is much more difficult to read than the Introduction of this paper, since it requires considerable specific knowledge to understand what is written.

Computing professionals come in all skill levels, and Agile styles require experienced implementers. It is interesting to think about the skills that experience brings to an implementer. Perhaps the skills of experience have taught the master performer how to think about and verbalize in order to obtain clarity, visualize the application in terms of its composition and presentation, and see the implementation as a concept within the mind. In any case, those around the master performer must also be able to grasp the same material at some level to be a productive member of the team.

The idea then is to consider a reading measure for programs that would give a value indicating the level of skill necessary for an implementer to have in order to read and understand the program. This does not address the requisite knowledge necessary to understand the purpose of the application, only the idea that the individual with his/her level of skill can understand what is expressed by the instance of the program. For those who have taught or are teaching programming skills, we recognize the fact that teaching the student about the process of programming, and having them practice this skill in a problem domain where there is no familiarity, is difficult.
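For reference, the two Flesch measures discussed earlier in this section are computed from simple word, sentence, and syllable counts. The sketch below gives the standard published formulas; the syllable counter is an assumed helper, and since Microsoft Word (the tool that produced the scores quoted in this paper) uses its own internal counting rules, its results can differ slightly from a hand count.

```python
def flesch_scores(words, sentences, syllables):
    """Standard Flesch Reading Ease and Flesch-Kincaid Grade Level.

    words, sentences, syllables are total counts for the document;
    syllable counting varies between tools, so scores from different
    programs (e.g. Microsoft Word) may differ slightly.
    """
    wps = words / sentences        # average sentence length
    spw = syllables / words        # average syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level

# A document with long sentences and long words scores low (hard to read):
print(flesch_scores(words=250, sentences=10, syllables=450))   # ~(29.2, 15.4)
```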

3. Pattern Readability and Recognition

Research some years ago investigated the idea of small program complexity [7], and other research detailed measures that signal success in the areas of development, design, and maintenance [8][9][10][11][12]. Clearly, from these and many other articles, the results suggest that a major factor is the experience of the team members.

Returning to the paragraph quoted in Section 2, that paragraph has a very complex reading level on the Flesch Reading Ease index. In fact, to understand this paragraph, taken from a proposal for a modification to a wireless protocol, one would need to be significantly better prepared in wireless protocols than a casual reader. So one could design a collection of short descriptions, measure their reading index, and then present them to software personnel, asking them to start reading at the most complex level. At the end of reading each document, a quiz for understanding could be given, and if a sufficient score was achieved, then the individual would be adjudged to be an expert in this technology at the level of the passed quiz. Thinking about experience, the above approach would indicate the background knowledge of the individual concerning any project that was to be undertaken. If we could now couple this to a reading index of the code, we could approach the determination of programming experience by the same means.

So the objective is to determine a readability index for programs. A solution to this would be to measure the complexity of small programs, and then, as each level of complexity dropped, have the individual answer questions about the software; again, when the score was appropriate, the individual could be determined to be at that software skill level. Many organizations already approach the problem from the standpoint of a test. For example, Robert Half Technology requires each of its candidates for consideration for a temporary or permanent software position to take an exam given at three levels. Passing this exam will open the doors to placement for positions determined to be at the level of the exam passed. Failure at the lowest level will not allow the candidate to continue in this process. Organizations can also require employees to be certified in selected environments at some level. Again, these approaches strive to determine the individual's skill and understanding of a technology.

3.1 Finite Inductive Sequences (FI) and Readability

In this section, we describe a particular kind of sequence composed of symbols coming from some alphabet. The basis for this material is found in [13][14]. The primary structure of FI, called a ruling, is a finite state machine that can use a short driving sequence to generate another sequence that may be much longer. Sequences of symbols are said to be finitely inductive (FI) over a finite alphabet if the choice of any symbol at any particular position in the sequence depends immediately upon only the choices of symbols at the previous n points. The least such n is called the inductive base of the sequence. Given an FI sequence, an implicant is a pair (w, p) consisting of a word w over the alphabet and a symbol p of the alphabet such that w occurs at least once as a substring of the sequence, and whenever w occurs as a substring of the sequence there is a succeeding entry and it is p. The w is called the antecedent while p is called the consequent. An implicant is said to be in reduced form if no proper terminal segment of its antecedent is the antecedent of another implicant. We note the following:

a. Every finite sequence is finitely inductive.
b. For any finitely inductive sequence, the inductive base is the maximal length of the antecedents when considering all of its reduced form implicants.
c. If an FI sequence has inductive base n and an alphabet of k symbols, then k^n is an upper bound for the number of its reduced form implicants.


d. The implication for FI strings is that they are deterministic, and the inductive base for the ruling, as well as the antecedent length for each implicant, is a measure of the organizational complexity of the program, in just measuring the symbol arrangement.

Expression (2) is an example of a function table resulting from an FI sequence derived through a process called Factoring, meeting the requirement for an optimal storage system for streams of symbols representing various events. The system consists of a structure with n levels, although we will only be dealing with one-level structures. The rules are defined from incoming data representing an event according to a push-up formulation of the symbols. The inductive base is the maximum number of symbols in all rules for that level, less one. The rules are formulated much like those characterizing a Markov process. [We point out that the difference between the Markov model and FI sequences is that within the theory of FI sequences, the order of the sequence (i.e. its inductive base) can be altered to any a priori value desired, unlike the order of the Markov model.] That is, given some number n, the inductive base, and the string of characters representing an event, we can say that the previous n characters uniquely determine the next character in the string. As the inductive base is reduced, the number of levels will increase. The total number of rules remains nearly invariant as the number of levels changes. Consider the following string representing an event and sent to a system to be processed. This processing we call Factoring. [Note: we can extend FI from finite strings to infinite strings if the string becomes eventually periodic, such as the string in (1) [13].]

bbcb:abcabcc:abcabcc: . . .    (1)

This sequence comes from an alphabet of three symbols: a, b, c. It has a non-periodic starting sequence, bbcb, and a periodic part, abcabcc, and it is infinite in length. We can form rules for this sequence as a function of the inductive base. However, since the inductive base must be large enough to uniquely guarantee the next symbol, an inductive base of two or less would be unsuitable, since the occurrence of 'bc' does not specify the next symbol uniquely ('bc' is followed by both a 'b' and an 'a'). We illustrate the table that is generated when we allow the inductive base to be the minimal value required for each particular rule. We always assume reduced form rules. The first rule in (2) states that each time we see the antecedent 'bb', then a 'c' is determined, without ambiguity, to be the next symbol in the string.

bb → c, bbc → b, cb → a, a → b, ab → c, babc → a, bcabc → c, cc → a, ccabc → a    (2)

From (2) the inductive base is 5 (the largest of the antecedent lengths). Some of the rules have antecedents of length 1, 2, and 3. Now a complexity measure for programs would involve both the number of implicants by their antecedent size and the inductive base of the initial Factoring. For (1) the summary of complexity is shown in Table 1.

Table 1: Readability of the Sequence (1)

  # of Implicants: 9
  Implicants of Length:   1   2   3   4   5
  Count:                  1   4   1   1   2
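The reduced-form implicants and the inductive base of a finite sequence can be derived mechanically: for each position, grow the preceding context until it predicts the next symbol without ambiguity across all of its occurrences. The sketch below is a minimal Python illustration of that idea; it is not the authors' Factoring program, and it approximates the infinite sequence (1) by unrolling a few periods. Applied to (1) it reproduces the nine rules in (2) and the counts in Table 1.

```python
from collections import defaultdict

def reduced_implicants(seq):
    """Reduced-form implicants of a finite symbol sequence.

    For each position i, find the shortest antecedent w ending at i-1 such
    that every occurrence of w in the sequence is followed by the same
    symbol; (w, seq[i]) is then an implicant.  Positions whose entire
    prefix is still ambiguous are skipped.
    """
    seq, n = list(seq), len(seq)
    implicants = {}
    for i in range(1, n):
        for length in range(1, i + 1):
            w = tuple(seq[i - length:i])
            followers = {seq[j + length]              # symbol after each occurrence of w
                         for j in range(n - length)
                         if tuple(seq[j:j + length]) == w}
            if followers == {seq[i]}:                 # w determines the next symbol uniquely
                implicants[w] = seq[i]
                break
    return implicants

# Sequence (1), unrolled for a few periods.
rules = reduced_implicants("bbcb" + "abcabcc" * 4)
print(max(len(w) for w in rules))                     # inductive base: 5
counts = defaultdict(int)
for w in rules:
    counts[len(w)] += 1
print(dict(sorted(counts.items())))                   # {1: 1, 2: 4, 3: 1, 4: 1, 5: 2}
```

The single-level extraction shown here corresponds to the Markov-like, one-level ruling the authors use for the experiments in Section 4.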

4. Results

Using the idea described in Section 3.1, we tested the system against three English texts: 1) Section 1 of this paper, the Introduction; 2) several paragraphs from the same source as the paragraph quoted in Section 2, that is, from a proposal for an extension to a wireless protocol when the base station, node, and router are all mobile; and 3) paragraphs from a letter written by one of the authors to some of his grandchildren. The results are shown in Table 2.

In Table 2, the Number of Implicants is indicated for each of the documents: P is from the Wireless Proposal, I is the Introduction to this paper, and G is from the letter to the Grandchildren. The Flesch Reading Ease and the Flesch-Kincaid Grade Level came from Microsoft Word. The antecedent size counts the number of implicants for each antecedent length from Factoring the document. The last value is the Score, which indicates the complexity of the document in terms of our measure of its visual presentation. The score is essentially a grade point average for the lengths of the antecedents. For the values in Table 2, we use a multiplying factor of 5 for length 1, 4 for length 2, 3 for length 3, and so on. For the value associated with the Introduction at antecedent size 6, we combine it with the value for size 5, since the other two documents have no corresponding values. Summing these points, the largest value would have the implicants skewed toward the lower antecedent sizes, and we propose that this makes the patterns in the document easier to grasp than if the antecedents are longer. We provide an example of that in the next paragraph.

When there are few visual alterations in a document (symbol stream), the inductive base gets long. For example, if we process the string AABABCABCD as a repeating pattern, then the inductive base is four. If we change the sequence to AAAABAAAAAB, again as a repeating pattern, the inductive base becomes ten because of the lack of distinguishing symbols in the string.


Table 2: Calibration Results of FI for English Language Texts

                            P       I       G
  Number of Implicants      547     666     745
  Flesch Reading Ease       27.5    25.4    66.1
  Flesch-Kincaid Grade      12.0    12.0    9.7
  Counts, Ant. Size 6       0       2       0
  Counts, Ant. Size 5       8       2       4
  Counts, Ant. Size 4       17      16      29
  Counts, Ant. Size 3       94      113     141
  Counts, Ant. Size 2       240     375     398
  Counts, Ant. Size 1       188     159     173
  Score                     4066    4001    3949

  (P = Wireless Proposal, I = Introduction, G = Letter to the Grandchildren)

Figure 1: A Snippet from a Ruling (columns: L, the level in the Ruling; Implicant Number, here 2856 through 2874; Antecedent; and Consequent; all of the implicants shown are at level 0).

In [14] we reported on the Factoring of a musical score that had been transcribed for guitar. In that piece the inductive base was 64. If one moves from melodic to atonal music, then we believe the inductive base would be considerably reduced, due to the appearance of so many unique combinations from a small alphabet. The implication lies not in the semantic content of the material, but in the patterns of symbols alone: variety among the symbols in a string allows the human reader to more easily grasp and process the symbols into meaningful blocks.

In order to obtain these results, we wrote a simple lexical analyzer and replaced the words in the text by numeric values that were their index positions in the symbol table. We included all of the symbols from English that we thought we would need, and this turned out to be 95 such symbols. We included the space character, as this is an important delimiter, but we did not include its index in the output. Once this output was obtained, we processed the stream of numbers with the Factoring program to produce a storage structure called a Ruling. In the normal usage of FI, we would produce a ruling of several levels, usually no more than ten, and also keep the inductive base small, say four or five. However, for this application, we were interested in producing the equivalent of a Markov model for the string, where there is only one level. In this way we could consider the comparative complexity of the implicants on the same level. Figure 1 provides a few of the rules in the Ruling produced from one of the data files that we examined.

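The preprocessing step just described is easy to reproduce in outline. The sketch below is an assumption-laden stand-in for the authors' lexical analyzer: the real tool used a fixed table of 95 English symbols, while this version simply assigns each distinct token its index in order of first appearance and, as the paper describes, uses spaces only as delimiters and never emits them.

```python
import re

def tokenize_to_indices(text):
    """Replace each word or punctuation mark by its symbol-table index.

    The table is built in order of first appearance; repeated tokens map
    to the same index, which is what exposes the repeated patterns that
    Factoring turns into implicants.
    """
    symbol_table = {}
    stream = []
    for token in re.findall(r"[A-Za-z']+|[^\sA-Za-z']", text):
        if token not in symbol_table:
            symbol_table[token] = len(symbol_table)
        stream.append(symbol_table[token])
    return stream, symbol_table

indices, table = tokenize_to_indices("It reduces message overhead, because it reduces cost.")
print(indices)   # [0, 1, 2, 3, 4, 5, 6, 1, 7, 8]  ('reduces' repeats as index 1)
```

The resulting number stream is what would then be handed to the Factoring program (not reproduced here) to build the one-level Ruling.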

In Figure 1, the L indicates the level in the Ruling, and as shown, all of these implicants are in level 0. The Implicant Number identifies the implicant referenced by that row of the Ruling; these numbers carry no significance except to allow reference to the implicants. The next set of values represents the antecedents of the implicants. For example, implicant 2865 requires an antecedent of 11 symbols, as the two implicants numbered 2864 and 2865 differ only in their first position. The last column contains the consequents of the implicants. As can be seen from these two implicants, the consequents 8 and 244 both have the same pattern of symbols determining their exact locations in the stream of data being processed, and these two locations can be very far apart, except for the symbol that makes the antecedents distinct: the 38 and the 44. Again, many implicants with long antecedents imply that there are common substrings of symbols in the text. Many short antecedents suggest that the symbols are distinct and do not occur frequently, or, if they do occur frequently, that the subsequences in which they appear are identical.

Now, applying this technique to C++ programs, we have included the results from two programs, both doing the same task: they are both factoring programs. The programs were written by two individuals without collaboration. One of the individuals is a superb programmer, and the other is limited in experience.


Table 3: The Results of Comparing C++ Programs from an Experienced and an Inexperienced Programmer

  Antecedent Length:      14    13    12    11    10     9     8     7     6     5     4     3     2     1
  Novice (counts):         6     4    13    16    20    15    37    49   148   221   584   753  1080    18
  Professional (counts):   0     0     4     2     2     2    12    17    63    63   223   574  1096    19
  Inexp Norm:              2     1     4     5     7     5    12    17    50    75   197   254   364     6
  Exp Norm:                0     0     2     1     1     1     6     8    30    30   107   276   528     9

Factoring these two programs and looking at the lengths of the antecedents, we have the two sets of counts shown in Table 3. For example, the inexperienced programmer had 6 antecedents of length 14, while the more professional, or experienced, programmer had none. The norms shown in Table 3 are derived by dividing the counts by the number of implicants in each ruling and multiplying the result by 1000. Here the expectation is that, if the results of processing a program are the same as processing English text, then the inexperienced programmer's code would have a larger weight using the weighting scale used for the text, where more weight is given to the shorter implicants. This is indeed what we see from the data. The weighted results from these two examples show that the inexperienced programmer has a value of 9561, while the experienced programmer produced a weight of 10188.

It is also instructive to look at the actual values in Table 3. The inexperienced programmer generated more subsequences of similar code, resulting in an inductive base of 14, while the experienced programmer only required an inductive base of 12. Neither program exhibits many symbols that are unique, which would suggest a specific symbol generated for a local purpose and never used again. The experienced programmer generated far more short implicants than did the inexperienced programmer. This would imply that in reading the program an individual would find more structures that are local, while the inexperienced programmer has similar structures embedded in the code in many more locations, producing more implicants with longer antecedents. From a pattern distinction viewpoint, the program with shorter implicants should be easier to read.
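The normalized rows of Table 3 follow directly from the counts; the short sketch below recomputes them (counts divided by each ruling's total number of implicants, scaled by 1000 and rounded), reproducing the Inexp Norm and Exp Norm rows.

```python
# Implicant counts by antecedent length, 14 down to 1, from Table 3.
novice       = [6, 4, 13, 16, 20, 15, 37, 49, 148, 221, 584, 753, 1080, 18]
professional = [0, 0,  4,  2,  2,  2, 12, 17,  63,  63, 223, 574, 1096, 19]

def normalize(counts):
    """Counts divided by the ruling's implicant total, times 1000, rounded."""
    total = sum(counts)
    return [round(1000 * c / total) for c in counts]

print(normalize(novice))        # [2, 1, 4, 5, 7, 5, 12, 17, 50, 75, 197, 254, 364, 6]
print(normalize(professional))  # [0, 0, 2, 1, 1, 1, 6, 8, 30, 30, 107, 276, 528, 9]
```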

5. Conclusions

The work presented here is very preliminary and does not represent enough data to formally state a conclusion, except that these results suggest further verification would be useful in determining whether programs that score well on readability might be better from the standpoint of maintenance and development, where the development strategies involve an Agile programming approach. In the future we will also consider applying this technique to programs that have been refactored, to actually determine the change in their readability. We will also consider multiple programs from the same individual, to see how dependent readability is upon the topic matter represented by the software written. This research is still preliminary, and we are interested in considering this factor to aid in developing a program readability factor, and in how it can be used as a quantitative measure to guide the development of complex software systems.

References

[1] T. J. McCabe and A. H. Watson, "Software Complexity," Software Technology Support Center, Hill AFB, UT, December 1994, available at http://www.stsc.hill.af.mil/crosstalk/1994/12 complex.asp.
[2] M. M. Tanik, "A Comparison of Program Complexity Prediction Models," ACM SIGSOFT Software Engineering Notes, Vol. 5, No. 4, pp. 10–16, October 1980.
[3] B. Boehm and R. Turner, "Balancing Agility and Discipline: A Guide for the Perplexed," Addison-Wesley, 2003.
[4] M. Fowler, "Refactoring: Improving the Design of Existing Code," Addison-Wesley, 1999.
[5] R. Flesch, "A New Readability Yardstick," Journal of Applied Psychology, Vol. 32, No. 3, pp. 221–233, June 1948.
[6] "Flesch-Kincaid Readability Test," available at http://en.wikipedia.org/wiki/Flesch-Kincaid_Readability_Test.
[7] K. Magel, "A Theory of Small Program Complexity," ACM SIGPLAN Notices, Vol. 17, No. 3, pp. 37–45, March 1982.
[8] E. Chrysler, "Some Basic Determinants of Computer Programming Productivity," Communications of the ACM, Vol. 21, No. 6, pp. 472–483, June 1978.
[9] J. S. Reel, "Critical Success Factors in Software Projects," IEEE Software, Vol. 16, No. 3, pp. 18–23, May/June 1999.
[10] M. Feathers, "Before Clarity," IEEE Software, Vol. 21, No. 6, pp. 86–88, November/December 2004.
[11] R. D. Banker, S. M. Datar, and C. F. Kemerer, "A Model to Evaluate Variables Impacting the Productivity of Software Maintenance Projects," Management Science, Vol. 37, No. 1, pp. 1–18, January 1991.
[12] T. Field, "When BAD Things Happen to GOOD Projects," CIO, October 15, 1997.
[13] J. Case and P. S. Fisher, "Long Term Memory Modules," Bulletin of Mathematical Biology, Vol. 46, No. 2, 1984.
[14] P. S. Fisher, M. N. Novaes, and G. E. Mobus, "Pattern Recognition and Generation in Music," Proceedings of the Music Workshop of the 1990 European Conference on Artificial Intelligence, Stockholm, Sweden, 1990.
