LtRules: an Automated Software Library Usage Rule Extraction Tool

Chang Liu, School of EECS, Ohio University
En Ye, School of EECS, Ohio University
Debra J. Richardson, Bren School of ICS, University of California, Irvine
ABSTRACT
The need to manually specify temporal properties of software systems is a major barrier to wider adoption of software model checking, because specifying software temporal properties is a difficult, time-consuming, and error-prone process. To address this problem, we propose to automatically extract software library usage rules, which are one type of temporal specification. Our approach uses a model checker to check a set of software library usage rule candidates against known good programs using that library, and identifies valid rules based on the model checking results. These valid rules can help programmers learn about common software library usage. They can also be used to check new programs using the same library. We have implemented our approach in an Eclipse plug-in named LtRules, which can extract software library usage rules from C programs using BLAST as the underlying model checker.

Categories and Subject Descriptors
D.2.4 [Software Engineering]: Software/Program Verification

General Terms
Verification

Keywords
Software Library Usage Rule, Model Checking

Copyright is held by the author/owner. ICSE’06, May 20–28, 2006, Shanghai, China. ACM 1-59593-085-X/06/0005.

1. MOTIVATION
When developing software systems, programmers often use software libraries via their Application Programming Interfaces (APIs) to leverage functionality implemented in these libraries. A library API consists of a set of library functions through which library features can be accessed. There are often implicit constraints among API functions of the same library, which application software must obey to ensure proper execution of these API functions. We refer to these constraints as software library usage rules, or API usage rules. For example, in a library that provides file operations, there may be an API function that opens a file and returns a file handle, another API function that closes the file given a file handle, and yet another API function that writes text to the file given a file handle. An API usage rule may require that a file be opened before text can be written to it, or that a file not be written to after it is closed.

Because these usage rules are implicit and often buried in lengthy technical documents, programmers are prone to using API functions incorrectly. Such incorrect usage is hard to detect both manually and automatically, again because in the current state of practice, usage rules exist only implicitly and informally. With the help of model checking techniques [6], API usage rules can be specified as temporal properties that model checking tools can recognize, so programmers can use tools to verify whether programs using the API violate the rules.

Software model checking is a promising approach to improving software quality because it can search all possible execution paths for specific errors. However, the need to manually specify temporal properties of software systems is a major barrier to wider adoption of software model checking, because the specification of temporal properties is a difficult, time-consuming, and error-prone process. Many programmers are reluctant or unable to write temporal specifications. To address this problem, we propose to automatically extract temporal specifications representing API usage rules from known good programs. A known good program of a library is a complete program that uses the API of that library; has been tested, deployed, and used; and is generally known not to contain unacceptable defects.

The key idea of our approach is to take known good programs using the same API as oracles, use a model checker to verify API usage rule candidates against them, and extract valid rules from the rule candidates. These valid rules can help programmers learn about common software library usage. They can also be used to check new programs through the same model checking process. Since the API usage rules are automatically extracted, our approach can relieve programmers of the burden of writing temporal specifications, which will help make software model checking more accessible.

API usage rules represent intrinsic relationships among API functions of a software library. They may be stated explicitly or implied; they may be specified formally or described informally. Nevertheless, they always exist. Our observation is that even though existing program analysis techniques, including model checking, are not sophisticated enough to deduce such rules from source code in a practical fashion, pre-constructed usage rules verified against a large number of known good programs closely reflect library semantics, because they are consistent with what guided programmers in the development of those known good programs.

2. THE SOFTWARE LIBRARY USAGE RULE EXTRACTION TOOL LTRULES
We have implemented our approach in an Eclipse plug-in named LtRules¹, which can automatically extract API usage rules from C programs using BLAST [9] as the underlying model checker. The tool is available for download at http://www.ent.ohiou.edu/~liuc/ltrules. A screenshot of LtRules is shown in Figure 1.

¹ “Lt” in LtRules stands for “light”, as in lightweight formal methods.

Figure 1: A screenshot of the LtRules tool

Currently, LtRules can extract six categories of API usage rules:

1. The initialization rule: One API function is always invoked first among all the API functions in the group, such as initVerify() (or initSign()) in the java.security.Signature class in Sun’s standard Java class library.
2. The finalization rule: One API function is always invoked last among all the API functions in the group, such as Close() in the Socket class in the .NET Framework Class Library.
3. The push-pop rule: At any given point, one API function has always been invoked at least as many times as the other. The invocation count of an API function is the number of times it is actually executed in the program. For example, this rule holds between push() and pop() in a stack library.
4. The strict-alternation rule: Two API functions are always invoked in strict alternation, such as KeAcquireSpinLock() and KeReleaseSpinLock() in the Windows Driver Development Kit.
5. The function-pair rule: Two API functions are always invoked an equal number of times and also in strict alternation, such as ZwCreateFile() and ZwClose() in the Windows Driver Development Kit.
6. The adjoining-function rule: Two API functions are always invoked together in the same order, and no other API function can be invoked between them, such as Shutdown() and Close() in the Socket class in the .NET Framework Class Library.

Among these six categories, the first two deal with one API function while the other four deal with two API functions. The last four categories are related in the following ways:

• The adjoining-function rule implies the function-pair rule, which means that any two API functions satisfy the function-pair rule if they satisfy the adjoining-function rule, whereas the reverse may not be true.
• The function-pair rule implies the strict-alternation rule.
• The strict-alternation rule implies the push-pop rule.
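As an informal illustration of the four two-function categories, the sketch below checks each of them on a single recorded invocation trace. This is not how LtRules works: LtRules verifies rule candidates with BLAST over all execution paths, whereas a trace check covers only one run. The function names, the assumption that the trace contains only API functions of the group, and the exact formalization (for example, that strict alternation starts with the first function, as in the flag-based template) are assumptions made for this example.

```python
from typing import List, Sequence


def project(trace: Sequence[str], a: str, b: str) -> List[str]:
    """Keep only the invocations of a and b, preserving their order."""
    return [f for f in trace if f in (a, b)]


def push_pop(trace: Sequence[str], a: str, b: str) -> bool:
    """At every point, a has been invoked at least as many times as b."""
    balance = 0
    for f in project(trace, a, b):
        balance += 1 if f == a else -1
        if balance < 0:
            return False
    return True


def strict_alternation(trace: Sequence[str], a: str, b: str) -> bool:
    """a and b alternate, starting with a: a b a b ..."""
    expected = a
    for f in project(trace, a, b):
        if f != expected:
            return False
        expected = b if f == a else a
    return True


def function_pair(trace: Sequence[str], a: str, b: str) -> bool:
    """Strict alternation plus an equal number of invocations."""
    calls = project(trace, a, b)
    return strict_alternation(trace, a, b) and calls.count(a) == calls.count(b)


def adjoining(trace: Sequence[str], a: str, b: str) -> bool:
    """Every a is immediately followed by b, and every b immediately preceded by a."""
    for i, f in enumerate(trace):
        if f == a and (i + 1 >= len(trace) or trace[i + 1] != b):
            return False
        if f == b and (i == 0 or trace[i - 1] != a):
            return False
    return True


if __name__ == "__main__":
    trace = ["push", "push", "pop", "pop"]
    print(push_pop(trace, "push", "pop"))            # True
    print(strict_alternation(trace, "push", "pop"))  # False
```

On the trace above the push-pop rule holds while strict alternation does not, which matches the direction of the implication chain: a stricter rule can fail where a weaker one still holds, but not the other way around.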
For each category of rules, LtRules creates a corresponding template and represents it in BLAST’s specification language. For example, the strict-alternation rule template is shown in Figure 2. In the BLAST specification language, the keyword global defines a global variable (i.e., state); the keyword event is used to change global state and verify properties based on the execution of a C program; the keyword pattern specifies which API function activates an event; the keyword guard specifies the checks to be made before any action is taken when the pattern is matched; and the keyword action specifies the actions to be taken once the guard condition is satisfied. For example, the first event in Figure 2 specifies that if an API function named funcA is invoked in the program to be verified, then the value of the global variable flag must equal zero. If it does not, the specified property is violated; otherwise, flag is set to one. Note that in this template, funcA and funcB are symbolic names for API functions. They are replaced by concrete API function names later, when usage rule candidates are generated.

To use LtRules, programmers only need to provide known good C programs and a group of related API functions as input. They get valid API usage rules as output. Related API functions are a group of API functions provided by the same library that may have an impact on each other. These functions are often invoked together in a sequence to complete a task. Programmers are interested only in usage rules among these API functions.

The process of API usage rule extraction, which is shown in Figure 3, mainly consists of the following two steps:
global int flag = 0;

event {
  pattern { $? = funcA; }
  guard { flag == 0 }
  action { flag = 1; }
}

event {
  pattern { $? = funcB; }
  guard { flag == 1 }
  action { flag = 0; }
}

Figure 2: Specification template for the strict-alternation rule in the BLAST specification language

[Figure 3 diagram: a group of related API functions and the API rule templates are the inputs to API rule template instantiation, which produces API rule candidates; the rule candidates and the known good programs are the inputs to model checking, which produces valid API rules.]

Figure 3: The model checking based API usage rule extraction process
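The process in Figure 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the LtRules implementation: the names `instantiate`, `generate_candidates`, and `extract_valid_rules` are hypothetical, the `check` callback merely stands in for one model checker run on one candidate and one program, and treating each two-function candidate as an ordered pair is an assumption. Under that assumption, six functions yield 6 + 6 + 4 × 30 = 132 candidates, which is consistent with the count reported for the six-function OpenSSL example.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

ONE_FUNCTION_RULES = ("initialization", "finalization")
# Two-function categories, ordered from least strict to most strict.
TWO_FUNCTION_RULES = (
    "push-pop", "strict-alternation", "function-pair", "adjoining-function",
)

# A candidate is a rule category plus the concrete API function names.
Candidate = Tuple[str, Tuple[str, ...]]

# The strict-alternation template from Figure 2, with symbolic names.
STRICT_ALTERNATION_TEMPLATE = """global int flag = 0;
event { pattern { $? = funcA; } guard { flag == 0 } action { flag = 1; } }
event { pattern { $? = funcB; } guard { flag == 1 } action { flag = 0; } }
"""


def instantiate(template: str, func_a: str, func_b: str) -> str:
    """Replace the symbolic names funcA and funcB with concrete API names."""
    return template.replace("funcA", func_a).replace("funcB", func_b)


def generate_candidates(functions: Sequence[str]) -> List[Candidate]:
    """Enumerate every candidate for a group of related API functions."""
    candidates: List[Candidate] = [
        (rule, (f,)) for rule in ONE_FUNCTION_RULES for f in functions
    ]
    candidates += [
        (rule, pair)
        for rule in TWO_FUNCTION_RULES
        for pair in permutations(functions, 2)
    ]
    return candidates


def extract_valid_rules(
    functions: Sequence[str],
    programs: Sequence[str],
    check: Callable[[Candidate, str], bool],
) -> List[Candidate]:
    """Keep the candidates that every known good program satisfies."""
    valid: List[Candidate] = []
    for rule in ONE_FUNCTION_RULES:
        for f in functions:
            cand = (rule, (f,))
            # all() stops at the first violating program, so a failed
            # candidate is never checked against the remaining programs.
            if all(check(cand, p) for p in programs):
                valid.append(cand)
    for pair in permutations(functions, 2):
        for rule in TWO_FUNCTION_RULES:
            cand = (rule, pair)
            # A stricter rule is only checked if the weaker one held.
            if not all(check(cand, p) for p in programs):
                break
            valid.append(cand)
    return valid


if __name__ == "__main__":
    group = ["SSL_free", "SSL_new", "SSL_set_fd",
             "SSL_connect", "SSL_write", "SSL_read"]
    print(len(generate_candidates(group)))  # 132
```

The early exits are what keep the number of checker runs well below the number of candidates times the number of programs; the exact order in which LtRules schedules its BLAST runs may differ from this sketch.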
1. API rule template instantiation: For all related API functions in a group, a list of API usage rule candidates is generated by replacing the symbolic function names in existing rule templates with concrete API function names. The rule candidates represent all possible relationships among API functions in the group.

2. API rule candidate verification: BLAST is used to check all the rule candidates against all known good C programs to extract valid rules. The rule candidates and the programs are used as the input for BLAST during verification. Based on the model checking results, we can identify which rule candidates are valid, and thus extract the valid rules from the rule candidates. BLAST checks one rule candidate against one program at a time. A rule is valid only if it is satisfied by all known good programs. To reduce the number of model checker invocations, we introduce the following two optimizations.

• For the initialization rule and the finalization rule, all rule candidates are checked against the first known good program. Starting from the second program, only those rule candidates that have been satisfied by all previous programs are checked. Note that the order in which the known good programs are model checked does not affect the result of rule extraction.

• For the other four categories of rules, because of the implication relationships among the rules, we first check all candidates of the push-pop rule, the least strict rule, against all known good programs, and extract valid push-pop rules. For the strict-alternation rule, only those rule candidates with a valid corresponding push-pop rule are checked. For example, if two API functions satisfy the push-pop rule, the strict-alternation rule candidate for these two functions is checked next. Similarly, we check the other rules for the same two functions by following the strictness order: the strict-alternation rule, followed by the function-pair rule, followed by the adjoining-function rule.

3. AN EXAMPLE OF USAGE RULE EXTRACTION
LtRules was used to extract API usage rules for the Open Secure Sockets Layer (OpenSSL) library [1]. The experiment was performed on a Gateway E-2000 PC with a 2.4 GHz Pentium 4 CPU and 512 MB RAM, running the Fedora Core 4 Linux operating system. We used BLAST 1.0 in this experiment. We selected four applications using the OpenSSL library: CashCow, Slush, Pay-Pal Sender and Netcat SSL. These four OpenSSL applications had been released for more than three years and had been used by many users. Therefore, they were relatively stable and could serve as known good programs. We also selected a group of six API functions, all of which deal with the SSL/TLS sessions defined in the SSL_SESSION structure: SSL_free(), SSL_new(), SSL_set_fd(), SSL_connect(), SSL_write(), and SSL_read().

From the four OpenSSL applications, it took two minutes and fifty-six seconds to extract four valid OpenSSL API usage rules out of 132 rule candidates, which represented all possible relationships among the six OpenSSL API functions. BLAST was invoked 65 times during extraction. The four valid rules are listed below:

1. push-pop[SSL_new(), SSL_free()], which represents the push-pop rule between SSL_new() and SSL_free(). The following rules are presented in the same format.
2. push-pop[SSL_new(), SSL_set_fd()].
3. push-pop[SSL_new(), SSL_connect()].
4. strict-alternation[SSL_set_fd(), SSL_connect()].

All four valid API usage rules indicate common usages of these OpenSSL library functions and reflect their inherent semantics. For example, the first rule states that the SSL_new() function, which creates a new SSL structure for use with an SSL session, and the SSL_free() function, which frees an allocated SSL structure, satisfy the push-pop rule, i.e., at any given point in program execution, SSL_free() has been invoked no more times than SSL_new().
It is clear from the semantics of these two functions that if SSL_free() were invoked more times than SSL_new(), the program would attempt to free SSL structures that had already been released, which is inconsistent with the function's semantics. As another example, the last rule reflects that the SSL_set_fd() function, which assigns a socket to an SSL structure, and the SSL_connect() function, which starts an SSL session with a remote server application, are commonly used in strict alternation.

This experiment demonstrated that LtRules extracted valid API usage rules from several known good applications in a reasonable amount of time. Programmers can learn about common software library usage patterns from the extracted API rules. They can also use these rules to check new, unverified programs through the same model checking process.

4. RELATED WORK

4.1 Program invariants
Ernst et al. proposed an approach to automatically learn likely invariants involving program variables from dynamic traces [8]. Their approach deals with the value relationships among a program’s variables, while ours focuses on a program’s temporal properties, i.e., the execution relationships among API functions in a library.

4.2 SLAM
The SLAM [5] project developed at Microsoft Research aims at automatically checking that a C program correctly uses an API according to API usage rules. In the SLAM project, Ball et al. automatically create environment models (i.e., models of the kernel) via training [4]. They create abstractions of the API procedures by model checking several programs that use the same API. LtRules shares some common features with the training approach: it also uses known good programs for extraction and relies on a model checker. However, LtRules deals with interrelationships among API functions, while the training approach focuses on the API functions themselves. Therefore, these two approaches are complementary and can be used together. The API usage rules extracted by LtRules can be represented in SLIC, which could alleviate SLAM’s limitation of requiring manually written API usage rules.

4.3 Specification extraction
Many researchers have also attempted to automatically extract temporal properties from software systems [3, 11, 7, 10, 2, 12]. Unlike these techniques, LtRules uses a model checker to verify rule candidates against known good programs. Due to the exhaustive nature of model checking, the rules extracted by LtRules are guaranteed to be consistent with the known good programs, while other specification extraction techniques cannot guarantee this, with the exception of the technique in [2]. Although the technique in [2] can extract safe specifications, it is less automatic than LtRules: it requires the user to supply an exception predicate, an initial set of predicates, and a specification size in addition to the program, while LtRules only needs a group of related API functions and known good programs as input.

5. CONCLUSIONS
LtRules turns implicit API usage rules into explicit ones that are formally specified in a form recognizable to software model checkers. Unlike natural language documentation, these automatically extracted usage rules are guaranteed to be consistent with code. Most importantly, LtRules eliminates the need for programmers to manually write rule specifications if they wish to perform software model checking. This approach may have a positive impact on wider adoption of software model checkers.

6. REFERENCES
[1] OpenSSL: http://www.openssl.org/.
[2] R. Alur, P. Cerny, P. Madhusudan, and W. Nam. Synthesis of interface specifications for Java classes. In POPL ’05: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 98–109. ACM Press, 2005.
[3] G. Ammons, R. Bodik, and J. R. Larus. Mining specifications. In POPL ’02: Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 4–16. ACM Press, 2002.
[4] T. Ball, V. Levin, and F. Xie. Automatic creation of environment models via training. In TACAS ’04: Proceedings of the 10th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 93–107, 2004.
[5] T. Ball and S. K. Rajamani. The SLAM project: debugging system software via static analysis. In POPL ’02: Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 1–3. ACM Press, 2002.
[6] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 1999.
[7] D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: a general approach to inferring errors in systems code. In SOSP ’01: Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 57–72. ACM Press, 2001.
[8] M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE Transactions on Software Engineering, 27(2):1–25, Feb. 2001.
[9] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Software verification with BLAST. In SPIN ’03: Proceedings of the 10th International SPIN Workshop on Model Checking of Software, pages 235–239, 2003.
[10] W. Weimer and G. Necula. Mining temporal specifications for error detection. In TACAS ’05: Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 2005.
[11] J. Whaley, M. C. Martin, and M. S. Lam. Automatic extraction of object-oriented component interfaces. In ISSTA ’02: Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 218–228. ACM Press, 2002.
[12] J. Yang and D. Evans. Automatically inferring temporal properties for program evolution. In ISSRE ’04: Proceedings of the 15th IEEE International Symposium on Software Reliability Engineering, pages 340–341, 2004.