Utilizing Specification Testing in Review Task Trees for Rigorous Review of Formal Specifications∗

Shaoying Liu
Department of Computer Science, Faculty of Computer and Information Sciences
Hosei University, Tokyo, Japan
Email: [email protected]
http://wwwcis.k.hosei.ac.jp/~sliu/
Abstract

The Review Task Tree (RTT) is a comprehensible notation for expressing review tasks in the rigorous review of formal specifications. It has a mechanism for evaluating the final review result based on the review results of all the members (atomic tasks) of its minimal cutset. However, the notation does not provide any method for actually reviewing the atomic tasks. In this paper, we present a way to utilize a specification testing method for rigorous review of the atomic tasks. Strategies for generating test cases for each kind of review task in an RTT are described and explained with examples. A small case study is conducted using the proposed technique, and the result is analyzed to show the benefit and the potential limitations of the technique.
1 Introduction

There is a growing interest in adopting formal specifications in software development [1][2], but the trend is constrained by the lack of effective, practical techniques for validating and verifying formal specifications [3]. An obvious advantage of a formal specification over an informal one is that the formal specification can be rigorously analyzed. Several analysis techniques have been proposed in the literature, such as formal proof [4], specification animation [5], model checking [6], specification testing [7], and rigorous review [8]; among these, the technique most commonly used in practice that emphasizes the human role is rigorous review [9].

∗ This work is supported by the Ministry of Education, Culture, Sports, Science, and Technology of Japan under the Grant-in-Aid for Scientific Research on Priority Areas (No. 15017280).
Review is a traditional technique for static analysis of software to detect faults that may violate the consistency and validity of the software system. Basically, software review means checking through software either by a team or by an individual. Since software comprises both programs and their related documentation, such as functional specifications, abstract designs, and detailed designs, a review can be conducted at all levels of documentation [10][11][12][13]. Compared to formal proof, review is less rigorous, but it emphasizes the importance of the human role in verification and validation. Since software development is fundamentally a human activity involving many important judgements and decisions, review is an effective technique for software verification and validation in practical software engineering [14][9]. Furthermore, when dealing with non-terminating software and/or non-executable formal specifications (e.g., VDM, Z, SOFL), review has obvious advantages over techniques that require execution of the software system (e.g., testing). The question is how to improve the rigor and effectiveness of review techniques. When dealing with specifications that have no formal semantics, review techniques have to be applied intellectually, based on the reviewers' experience, and may not be supported systematically in depth. For formal specifications, however, more rigorous review techniques can be developed and applied, though to the best of our knowledge only a small amount of research has been reported in this area. The Cleanroom software engineering method emphasizes the use of review for verifying programs against their specifications, but does not seem to provide techniques for verifying the specifications themselves [15]. Semmens and Bryant describe a technique for detecting faults in requirements by translating semi-formal diagrams, such as entity relationship models,
data flow diagrams, and the related English text, into Z specifications [16]. Despite being proposed under the name Rigorous Review Technique, it uses formal specification as a means of helping the analyst identify errors in requirements specifications, and does not address how to review a formal specification itself. In our previous work on rigorous review [8], we proposed a property-based rigorous review approach to checking formal specifications, especially non-executable ones. In this approach, a Review Task Tree notation was proposed to express review tasks in a manner that allows the reviewer to concentrate on one review task at a time. However, a remaining problem with the review technique is the lack of specific and effective ways to review each property indicated by a review task. The reviewer has to read the property and decide whether it contains faults based on his or her understanding of the contents. This is usually easy for a simple property (a predicate involving simple arithmetic expressions), but can be hard, and even untrustworthy, for a sophisticated property (a predicate involving complicated operators and expressions). To make reviews more effective, we have found through several small case studies that "walking through with sample data" is effective in helping the reviewer understand a property and thereby detect its potential faults. Developing this idea into a systematic method, we propose to utilize the specification testing method reported in our previous publication [7] to help review the properties in an RTT. The combination of specification testing and RTTs benefits software verification and validation in two ways. First, it brings the dynamic character of specification testing into the static review process, allowing the reviewer to check every detail of the target property in an "operational" manner; thus not only can the internal consistency of a formal specification be verified, but its validity can also be checked. Second, the test cases generated for reviewing the properties of a specification can be reused as test cases for testing the implementation of the specification in a later phase, and can therefore help reduce the overall cost of program testing. We choose SOFL (Structured Object-oriented Formal Language) as the target specification language [17] for discussing the proposed review method, because it is one of the very few specification languages that show how to incorporate formal descriptions into a practical specification construction process, and it shares important features with the commonly used formal specification languages
VDM-SL (the Vienna Development Method Specification Language), Z, and classical data flow diagrams. Thus, the results of this paper can easily be applied to domains in which VDM, Z, or DFDs are often employed. For the sake of space, we give no introduction to SOFL; this should not affect the comprehension of the paper, and the reader can refer to our previous publications on SOFL for details [17][18]. The remainder of the paper is organized as follows. Section 2 gives a brief introduction to the RTT technique. Section 3 focuses on test case generation based on predicate structures, while Section 4 discusses how to generate test cases for variables of various types. In Section 5 we describe how to support the review process efficiently by means of hyperlinks. Section 6 presents a small case study and an evaluation of its result. Finally, in Section 7 we conclude the paper and point out future research.

2 Brief Introduction to RTT
As reported in our previous publication [8], a rigorous review of a specification takes four steps:

1. Derive all the necessary consistency properties to be reviewed from the specification, and express them as predicate expressions. They are usually organized in a table to facilitate the management of reviews.
2. Build a Review Task Tree (RTT) to present the review tasks for each property.
3. Perform reviews for all the necessary tasks in the tree.
4. Analyze the review results to determine whether faults are detected.

The consistency properties are the review targets, so they need to be obtained first. Of course, it is important to ensure that they are correctly expressed and well organized to facilitate reviews. Since a property may be too complicated to be reviewed as a whole, building an RTT for it helps to indicate clearly all the decomposed review tasks. An RTT usually has branch nodes denoting intermediate review tasks and leaf nodes representing atomic tasks, and all the intermediate tasks are semantically defined by the related atomic tasks. Therefore, the tasks that actually need to be reviewed in an RTT are only the atomic tasks denoted by leaf nodes. The review of each atomic task is mainly done by the reviewer based on his or her knowledge of predicate logic and the inference rules of
the logic, but it may also be supported by a software tool (we are working on such a tool, but it is still too primitive to be reported in this paper). After reviewing all the necessary tasks, the results need to be analyzed to determine whether faults have been detected. The effectiveness of the review approach is greatly dependent on the types and coverage of the consistency properties. There are many kinds of consistency properties for a specification, but the most important ones discussed in our previous publication include:

• Internal consistency
• Invariant-conformance consistency
• Satisfiability
• Integration consistency
[Figure 1. An RTT for the proof obligation P_o, decomposing it into the subtasks x: int, Implication, not x > 0, Exists, not x > 0 or Exists, y: int, Conjunction, y > x + 1, y < 100, and y > x + 1 and y < 100.]
Internal consistency ensures that the use of expressions does not violate the related syntactic and semantic constraints imposed by the specification language. Invariant-conformance consistency deals with the consistency between operations and the related type and/or variable invariants. Satisfiability is a property requiring that satisfactory output be generated from input by an operation under its precondition. Integration consistency ensures that integrating operations to form a more powerful operation keeps the interfaces of the operations consistent. Steps 1, 2, and 4 of the above review method were discussed thoroughly in our previous work, but how to perform reviews for all the necessary tasks in an RTT (i.e., step 3) was not addressed in depth. Let us look at a simple example of how to build an RTT for a target property and of the remaining problem with the review approach. Let process P, representing a transformation from input to output, be defined as follows:
process P(x: int) y: int
pre x > 0
post y > x + 1 and y < 100
end_process

To review the satisfiability of the process (i.e., to check whether process P is satisfiable), we derive the proof obligation for P, denoted by P_o, as follows:

P_o: forall[x: int] | x > 0 => exists[y: int] | y > x + 1 and y < 100
The proof obligation states that for any value x of type int (the integers), if it satisfies the precondition x > 0, then there must exist an output y of type int that satisfies the postcondition y > x + 1 and y < 100. Although it is obvious, even without a systematic review process, that this proof obligation cannot be discharged (implying that there is a fault in the specification), we use this simple example to explain our review approach for the sake of comprehensibility. To build a comprehensible RTT for reviewing this proof obligation, we first convert it into an equivalent quantified expression:
forall[x: int] | not x > 0 or exists[y: int] | y > x + 1 and y < 100
An RTT for reviewing this expression is drawn based on the rules described in detail in our previous work [8], as shown in Figure 1. The top review task is hold(P_o), meaning that the proof obligation holds; it is represented by the rectangle containing P_o in the tree. Note that the RTT is not just a graphical representation of the proof obligation: it presents all the review tasks graphically. The meaning of the RTT is interpreted as follows: the proof obligation P_o is true if the constraint type int is not empty (i.e., x being a member of int is true) and the Implication is also true given that int is not empty. The reason why we require checking that the constraint type int is not empty is that an empty constraint set in a universally quantified expression used in a functional specification of a real system is usually meaningless, although it is not a mistake in logic. The truth of Implication is ensured by the truth of one of the three expressions not x > 0, Exists, and not x > 0 or Exists. To ensure the truth of Exists, we must guarantee that int is not empty (i.e., y can be a member of int) and that Conjunction can be true. Furthermore, Conjunction is ensured by reviewing the three atomic tasks in the order from left to right: y > x + 1 can be true (i.e., it is satisfiable); y < 100 can be true; and y > x + 1 and y < 100 can be true.
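Although the fault here is visible by inspection, it can also be exposed by a small "walk through with sample data". The following sketch is ours, not part of any SOFL tool; Python and the finite sample range are illustrative assumptions, since int in the specification is unbounded.

    # A minimal sketch (our illustration): evaluate the proof obligation
    # P_o over a finite sample standing in for the unbounded type int.

    def pre(x):            # precondition of process P
        return x > 0

    def post(x, y):        # postcondition of process P
        return y > x + 1 and y < 100

    SAMPLE = range(-5, 200)    # hypothetical finite stand-in for int

    # P_o: forall[x: int] | pre(x) => exists[y: int] | post(x, y)
    violations = [x for x in SAMPLE
                  if pre(x) and not any(post(x, y) for y in SAMPLE)]

    print(violations[:3])   # [98, 99, 100]: for any x >= 98 there is no
                            # integer y with y > x + 1 and y < 100

Every x >= 98 violates the obligation, which is precisely the fault the review is expected to uncover.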
[Figure 2. The major components of RTT. The legend defines the meaning of each connector between a parent property A (or B) and its children:
• Property A (or B) can hold (or holds) if all of its child properties hold.
• Property A (or B) can hold (or holds) if one of its child properties holds.
• Property A (or B) can hold (or holds) if all of its child properties hold, reviewed in the order from left to right.
• Property A (or B) can hold (or holds) if one of its child properties holds, reviewed in the order from left to right.
• Property A (or B) can hold (or holds) if its only child property holds.
• Property B holds if its right child property holds under the assumption that its left child property holds.
• Property A (or B) can hold (or holds): an atomic property that has no decomposition.]
The explanations of the related components of an RTT are given in Figure 2. "Property A can hold" means that A is satisfiable, and "property B holds" means that B is a tautology. The problem now is how to help the reviewer review all the tasks in the RTT. An effective review method is expected to help the reviewer understand the meaning of the target predicate expression and make a correct decision about whether it involves faults. The judgement on the correctness of each predicate expression should be based not only on its logical correctness but also on its validity (i.e., whether it satisfies the user's conception of the requirements). Our experience and intuition indicate that people tend to learn and understand new things through simple examples. This has motivated us to adopt the principle of specification testing in RTTs to help review each atomic task and the entire RTT as a whole. The details of this technique are introduced in the following sections.

3 Review by generating test cases
The essential idea of specification testing is to provide effective strategies for generating test cases for testing the properties of a specification, as described in our previous publication [7]. Since each property to be tested is represented by a predicate expression, test cases are usually generated based on the structure of the property. This technique is well suited to our purpose of reviewing predicate expressions in an RTT. Since a task in an RTT may involve a single relation, known as an atomic expression, a conjunction, or a disjunction, we focus the discussion on test case generation for these three kinds of expressions.

3.1 Test case generation for atomic expressions
There are two kinds of tasks involving an atomic expression. Let P(a_1, a_2, ..., a_n) denote an atomic expression involving the free variables a_1, a_2, ..., a_n. Then one kind of task is that P can hold, denoted by can_hold(P), and the other kind is that P holds, expressed as hold(P). To review each task with the help of test cases, we need to generate test cases that help to determine both the logical correctness and the validity of the task. For example, when checking can_hold(P), we need to ensure that P is satisfiable and that P does meet the user's desired property. For this purpose, we put forward several criteria for test set generation, where a test set is a set of test cases for all the free variables in the atomic expression.
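Operationally, the two kinds of tasks differ as "satisfiable" versus "tautology". The sketch below is our illustration (the finite sample range is an assumption); over infinite types such a check can refute hold(P) and support can_hold(P), but cannot prove either.

    # can_hold(P): P is true for some assignment of its free variables.
    # hold(P):     P is true for every assignment of its free variables.

    SAMPLE = range(-100, 101)          # hypothetical finite sample

    def can_hold(P):
        return any(P(x, y) for x in SAMPLE for y in SAMPLE)

    def hold(P):
        return all(P(x, y) for x in SAMPLE for y in SAMPLE)

    P = lambda x, y: y > x + 1
    print(can_hold(P), hold(P))        # True False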
Criterion 1. Generate a test set T_s for can_hold(P) such that

T_s = {(a_1, a_2, ..., a_n) | a_1: T_1, a_2: T_2, ..., a_n: T_n & P(a_1, a_2, ..., a_n)} ≠ { }
where a_1, a_2, ..., a_n are all the free variables in expression P. Criterion 1 requires that a non-empty test set be generated such that each of its members satisfies the expression P(a_1, a_2, ..., a_n). In fact, this is another way of saying that P(a_1, a_2, ..., a_n) is satisfiable or that can_hold(P) is true, but the set comprehension provides more information, such as the types T_i (i=1..n) of the free variables, to help the reviewer understand the property P better. The types can be retrieved from the declarations of the variables a_i in the related specification, either manually by the reviewer or automatically by a software tool. However, we should bear in mind that this requirement imposes no specific constraint on the number of test cases in the test set, as long as the set is not empty. It is usually the responsibility of the reviewer to decide how many test cases are needed to review the logical correctness and the validity of the expression. For example, let P denote the atomic expression y > x + 1 in Figure 1. Since this expression involves the two free variables x and y, a test case is a pair of integers (their type) and a test set is a set of pairs of integers. A test of the expression, which includes a test set and the corresponding evaluations of the expression (a relation in this case), is shown in Table 1.
 x    y    P
 5    7    true
 10   15   true
 0    0    false

Table 1. A test for P
The important difference between a review using a test set and an actual test of the expression with that test set is that in a review it is the reviewer who evaluates (in fact reads, evaluates, and understands) the expression with the test cases. Therefore, the test set should be generated to facilitate the review in this regard.
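As a sketch of Criterion 1 (our illustration; the candidate pools are hypothetical reviewer choices), T_s can be built by filtering a pool of candidate pairs through P:

    from itertools import product

    # Criterion 1 sketch: build T_s for can_hold(y > x + 1) by filtering
    # reviewer-chosen candidate values; can_hold(P) is supported if T_s
    # is not empty.

    def P(x, y):
        return y > x + 1

    candidates_x = [0, 5, 10]          # hypothetical reviewer choices
    candidates_y = [0, 7, 15]

    T_s = {(x, y) for x, y in product(candidates_x, candidates_y) if P(x, y)}
    print(T_s != set())                # True: e.g. (5, 7) satisfies P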
Criterion 2. Generate a test set T_s for the task hold(P) such that

T_s = {(a_1, a_2, ..., a_n) | a_1: T_s1, a_2: T_s2, ..., a_n: T_sn & P(a_1, a_2, ..., a_n)} = T_s1 * T_s2 * ··· * T_sn
where T_si (i=1..n) is an arbitrary subset of the type T_i of variable a_i declared in the specification. Ideally, we should test whether the set comprehension T defined below is equal to the entire product of the types T_1, T_2, ..., T_n, that is,

T = {(a_1, a_2, ..., a_n) | a_1: T_1, a_2: T_2, ..., a_n: T_n & P(a_1, a_2, ..., a_n)} = T_1 * T_2 * ··· * T_n.

However, this is usually extremely difficult, if not impossible, because the original types T_i (i=1..n) can be quite big or even theoretically infinite (e.g., the integers or the real numbers). For this reason, a feasible way to test the property P is to create reasonably big but representative subsets of those types, such as the T_si (i=1..n) used in Criterion 2; the credibility and effectiveness of this method depend very much on how the subsets are chosen. There seems to be no perfect answer to this problem, but some special values in each type may need to be taken into account. A detailed description of how to select test cases from each data type available in SOFL is provided in Section 4.

It is worth noticing that an atomic predicate expression is usually not complicated, so the reviewer can often quickly find out whether the expression is valid (or holds) even without generating the set comprehension. The merit of generating the set comprehension, however, is that it provides a systematic way for a software tool to review atomic expressions automatically whenever possible.

Let us take the same predicate y > x + 1 used for Table 1 as an example to explain the idea of Criterion 2. Since the type of both x and y is the integer type int, we can generate the subset {-2147483648, 0, 2147483647} of type int and define the following set comprehension:

T_s = {(x, y) | x: {-2147483648, 0, 2147483647}, y: {-2147483648, 0, 2147483647} & y > x + 1} = {(-2147483648, 0), (-2147483648, 2147483647), (0, 2147483647)}

We then check whether T_s contains exactly the same elements as the product set T:

T = {(-2147483648, -2147483648), (-2147483648, 0), (-2147483648, 2147483647), (0, -2147483648), (0, 0), (0, 2147483647), (2147483647, -2147483648), (2147483647, 0), (2147483647, 2147483647)}

where -2147483648 and 2147483647 are the smallest and biggest 32-bit integers, respectively. Since the equality obviously does not hold, we have detected that requiring hold(P) is impossible, which indicates a fault related to P in the specification.
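The comparison required by Criterion 2 is mechanical once the representative subset is fixed; a minimal sketch of ours:

    from itertools import product

    # Criterion 2 sketch: hold(P) is refuted when the filtered set
    # comprehension differs from the full product of the chosen subsets.

    def P(x, y):
        return y > x + 1

    subset = [-2147483648, 0, 2147483647]     # representative subset of int

    full_product = set(product(subset, subset))
    T_s = {(x, y) for (x, y) in full_product if P(x, y)}

    print(T_s == full_product)   # False: (0, 0) fails y > x + 1, so
                                 # hold(y > x + 1) cannot be discharged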
3.2 Test case generation for compound expressions
It is possible to have a compound predicate expression in a review task of an RTT, such as can_hold(y > x + 1 and y < 100). All the compound expressions in a single review task of an RTT can be formed using the logical operators and and or (all the other operators, such as implication => and equivalence <=>, can be rewritten into equivalent expressions involving only and, or, and not). Therefore, we concentrate on test case generation only for conjunctions and disjunctions.
Criterion 3. Generate a test set T_s for the task can_hold(P_1 and P_2) such that

T_s = T_s1 ∩ T_s2 ≠ { }

where T_s1 and T_s2 are the test sets for P_1 and P_2, respectively, containing the same variables. Since the review of P_1 (or P_2) with test set T_s1 (or T_s2) has concentrated on the correctness of that property itself, the review of the task can_hold(P_1 and P_2) should focus on whether we can produce a non-empty intersection of the test sets of the involved properties (e.g., P_1 and P_2). For this purpose, a natural way to generate such an intersection is to reuse the test sets created for reviewing P_1 and P_2 and
check whether there are any common elements in both test sets. If the intersection of the existing test sets for P_1 and P_2 is empty, then it is necessary to explore other possible test sets for P_1 and P_2 until either a non-empty intersection is found or the reviewer can determine that forming a non-empty intersection is impossible. For instance, let T_s1 be a test set for the task can_hold(y > x + 1) in Figure 1 and T_s2 a test set for can_hold(y < 100), defined as follows:

T_s1 = {(5, 7), (10, 15), (0, 0)}
T_s2 = {(5, 7), (0, 110), (0, 0)}
Then the intersection T of these two test sets is

T = T_s1 ∩ T_s2 = {(5, 7), (0, 0)} ≠ { }

Note that dummy values are generated for variable x in test set T_s2, even though x is not involved in the expression y < 100, in order to apply Criterion 3; this does not affect the result of applying the criterion.
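A sketch of this intersection check, reusing the two test sets above (our illustration):

    # Criterion 3 sketch: review can_hold(P_1 and P_2) by intersecting the
    # test sets already generated for P_1 and P_2 (sets from the example).

    T_s1 = {(5, 7), (10, 15), (0, 0)}   # test set for can_hold(y > x + 1)
    T_s2 = {(5, 7), (0, 110), (0, 0)}   # test set for can_hold(y < 100);
                                        # the x-components are dummy values

    T = T_s1 & T_s2
    print(T)    # contains (5, 7) and (0, 0): non-empty, and (5, 7) satisfies
                # both y > x + 1 and y < 100, supporting can_hold(P_1 and P_2)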
Criterion 4. Generate a test set T_s for the task hold(P_1 or P_2) such that

T_s = T_s1 ∪ T_s2 = T_s1 * T_s2 * ··· * T_sn

where we assume that P_1 and P_2 involve the same variables a_1, a_2, ..., a_n whose types are T_1, T_2, ..., T_n, respectively; T_s1 and T_s2 are the test sets for P_1 and P_2, respectively, containing the same variables; and each T_si (i=1..n) in the product is a sufficiently big subset of type T_i. Theoretically, we should require that T_s = T_s1 ∪ T_s2 = T_1 * T_2 * ··· * T_n, but for the same reason as in Criterion 2, we have to choose representative subsets of the original types of the involved variables to form the product set T_s1 * T_s2 * ··· * T_sn. Since the review of P_1 (or P_2) with test set T_s1 (or T_s2) has concentrated on the correctness of that property itself, the review of the task hold(P_1 or P_2) should focus on whether the union of the test sets of the involved properties (e.g., P_1 and P_2) is the same as the product set T_s1 * T_s2 * ··· * T_sn. A systematic way to do so is first to generate all the representative subsets of the types to form their product set, and then to divide it into two subsets for testing properties P_1 and P_2, respectively (i.e., a subset on which P_1 holds and another on which P_2 holds).
If the union of both subsets is not equal to the product set T_s1 * T_s2 * ··· * T_sn, this indicates the existence of faults in the disjunction P_1 or P_2. Otherwise, hold(P_1 or P_2) is confirmed by the current test sets. Since the review of the task hold(P_1 and P_2) (or can_hold(P_1 or P_2)) can be done by reviewing only the subtasks hold(P_1) (or can_hold(P_1)) and hold(P_2) (or can_hold(P_2)), the RTTs of these two tasks do not require reviews of the compound expression P_1 and P_2 (or P_1 or P_2). Therefore, we do not need to discuss how to review them using test sets.
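Criterion 4 can be replayed in the same style. In this sketch (ours), the predicates and the representative subset are illustrative choices, not taken from the paper; they show how a missed corner of the product set reveals a fault in a disjunction:

    from itertools import product

    # Criterion 4 sketch: hold(P_1 or P_2) is refuted if the union of the
    # per-disjunct subsets does not cover the whole product set.

    def P1(x, y): return x > 0        # illustrative disjuncts
    def P2(x, y): return x < 0

    subset = [-10, 0, 10]             # representative subset for x and y
    product_set = set(product(subset, subset))

    S1 = {t for t in product_set if P1(*t)}   # subset on which P_1 holds
    S2 = {t for t in product_set if P2(*t)}   # subset on which P_2 holds

    print((S1 | S2) == product_set)   # False: every pair (0, y) is missed,
                                      # so hold(x > 0 or x < 0) fails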
3.3 Generating test cases for review of ordered tasks
An RTT may involve review tasks that must be reviewed in a certain order. These kinds of RTTs are usually derived from quantified expressions, as described in detail in our previous publication [8]. Let us take the RTT for the review task can_hold(forall[x: T] | Q(x)) as an example to explain the guideline for generating test cases for the review of ordered tasks. The RTT for can_hold(forall[x: T] | Q(x)) is given in Figure 3. According to the meaning of the triangle connector attached to the box containing the quantified expression, the review of the top-level task must be ensured by first reviewing the task hold(x: T) on the left of the connector and then reviewing the task can_hold(Q(x)) on the right. Reviewing the task hold(x: T) requires checking (1) whether the bound set T is valid and (2) whether T is empty. Reviewing the task can_hold(Q(x)) must depend upon the variable x bound to set T, but must also consider the other free variables possibly involved in Q(x). For example, suppose T denotes the integer type int and Q represents the expression x > y + 2, where x is the bound variable and y is a free variable; then hold(x: T) becomes hold(x: int) and can_hold(Q(x)) becomes can_hold(x > y + 2). While reviewing hold(x: int) is simple, reviewing can_hold(x > y + 2) requires attention to the generation of test cases for both x and y: x must be bound to int, while y can be freely chosen from its type (e.g., real). What we need to check in reviewing can_hold(x > y + 2) is whether the relation x > y + 2 can be true for every x in a representative subset of int. Criterion 5, given below, provides a guideline for generating test cases for reviewing such ordered tasks.
[Figure 3. The RTT for reviewing whether the universally quantified expression forall[x: T] | Q(x) can hold: the task x: T is reviewed first, followed by the task Q(x).]
Criterion 5. Suppose task Q is required to be reviewed after task P in an RTT, and T_p is the test set for reviewing task P; then generate a test set T_q for reviewing Q that uses all the test cases in T_p.

The reason why we require that the test set T_p be used within the test set T_q is quite apparent: in this way the dependency of Q upon P can be examined in the review, as required by the corresponding RTT. Consider the specialized RTT in Figure 3 as an example. Assume that we generate the following test set T_p for reviewing the task hold(x: int):
x: {-10, -1, 0, 1, 10}

Then, applying Criterion 5, we generate the following test set T_q for reviewing the task can_hold(x > y + 2):
T_q = {(x, y) | x: {-10, -1, 0, 1, 10}, y: {-20, 0, 5} & x > y + 2} = {(-10, -20), (-1, -20), (0, -20), (1, -20), (10, -20), (10, 0), (10, 5)}
Obviously, all the test cases for x are used in the test set T_q, and for each test case of x there always exists an integer y such that the condition x > y + 2 is satisfied, where {-20, 0, 5} is the set of test cases chosen for the free variable y. The fact that forall[x: int] | x > y + 2 can be true is thus confirmed by the review.
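The reuse demanded by Criterion 5 is easy to check mechanically; a sketch of ours over the example above:

    # Criterion 5 sketch: every test case in T_p (for the earlier task
    # hold(x: int)) must reappear among the x-components of T_q (for the
    # later task can_hold(x > y + 2)).

    T_p = [-10, -1, 0, 1, 10]        # test set for x, as in the example
    candidates_y = [-20, 0, 5]       # test cases for the free variable y

    T_q = {(x, y) for x in T_p for y in candidates_y if x > y + 2}

    assert {x for (x, y) in T_q} == set(T_p)   # every x in T_p is exercised
    print(sorted(T_q))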
4 Generation of test cases for variables

No matter whether a variable is involved in an atomic expression (e.g., x > y + 2), a compound expression (e.g., x > y + 2 and x < 100), or a quantified expression (e.g., forall[x: int] | x > y + 2), we always face the problem of how to choose test cases for the variable from its type when generating a test set for the relation or expression. The general principle for such test case generation is to try to choose representative values and boundary values from the type.
The representative values may vary from type to type, and it seems extremely difficult, if not impossible, to provide precise and fully trustworthy criteria for determining the representative values of each type. Nevertheless, on the basis of common practice, it is possible to define some commonly used values that are believed to be representative and effective in detecting faults. In this section, we propose boundary and representative values for the basic and compound built-in types adopted in SOFL and VDM, but for the sake of space we show how to generate test cases only for variables of type real and of set types; the other types can be dealt with in a similar manner. When generating test cases for variables of type real, apart from boundary and representative values we also need to consider the special value 0, as shown in the following table.

Boundary values:     low boundaries: Rminimum, ABSminimum
                     high boundary: Rmaximum
Special value:       0
Representative inset values:  {Rminimum + 1, ..., Rmaximum - 1} \ {0}

where Rminimum is the most negative real number, ABSminimum is the smallest absolute nonzero value, and Rmaximum is the biggest real number the computer can represent. For example, for IEEE 754 double-precision floating-point numbers we can take Rminimum = -1.7976931348623157E308, ABSminimum = 4.9E-324, and Rmaximum = 1.7976931348623157E308.

A set type can be defined in terms of an element type. Let Set = set of T be a set type whose elements are composed of values from type T. Then we propose the following specific rules:

Boundary values:     low boundaries: {} (empty set), {e} (a single-element set)
                     high boundary: a reasonably big set
Outside value:       e.g., the natural number 1
Representative inset values:  members of power(T)
Choosing the high boundary value of a set type may be difficult because it depends on T. A practical solution to this problem is to choose a reasonably big set as the high boundary value; this value must be determined by the tester based on the testing requirements. An outside value for a set can be a value of some type other than the type Set, for example, the natural number 1. A representative value can be any value of the power set of T, denoted by power(T).
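These rules can be tabulated in code; the sketch below is our illustration, with hypothetical helper names, following the two tables above and using the IEEE 754 double-precision constants for real:

    import sys
    from itertools import combinations

    # Hypothetical helpers enumerating the proposed test values for type
    # real and for a set type "set of T".

    R_MAXIMUM = sys.float_info.max     # 1.7976931348623157e+308
    R_MINIMUM = -sys.float_info.max
    ABS_MINIMUM = 5e-324               # smallest positive subnormal double

    def real_test_values():
        boundary = [R_MINIMUM, ABS_MINIMUM, R_MAXIMUM]
        special = [0.0]
        representative = [-1500.0, 2.75]   # arbitrary inset values, not 0
        return boundary + special + representative

    def set_test_values(elements, big=1000):
        low = [frozenset(), frozenset(elements[:1])]   # {} and {e}
        high = [frozenset(range(big))]                 # a reasonably big set
        outside = [1]                                  # e.g. a natural number
        powerset = [frozenset(c)                       # members of power(T)
                    for r in range(len(elements) + 1)
                    for c in combinations(elements, r)]
        return low + high + outside + powerset

    print(real_test_values())
    print(set_test_values([1, 2, 3]))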
5 Hyperlinks among properties, RTTs, and specifications
During the review of the tasks in an RTT, the reviewer often needs to refer to the corresponding properties in the property table and/or to "components" (e.g., type declarations, variable declarations, process specifications) in the specification in order to make correct judgements. To support the review process efficiently, it is important to build appropriate hyperlinks among the properties, the review tasks in RTTs, and the related items in specifications. To ensure appropriate links, we first build a formal model of the hyperlinks among the properties, review tasks, and specification items. Let PropertyTable denote the set of properties derived from a specification S; RTTset the set of RTTs derived from the properties in PropertyTable; Task the set of all the review tasks of the RTTs in RTTset; and Spec the set of components (e.g., type declarations, invariants, processes) of specification S. Then we define several mappings that serve as the foundation for building hyperlinks among them:

Pr: PropertyTable → RTTset
Rt: RTTset → 2^Task
Rp: Task → PropertyTable
Rs: Task → Spec
Ps: PropertyTable → Spec
Since an RTT is unique for a property in the property table, the function Pr is a one-to-one mapping. For a similar reason, Rt is also a one-to-one mapping from the set of RTTs to the power set of Task: an RTT corresponds to a unique set of review tasks and vice versa. However, since different tasks in the same RTT are related to the same property in the property table, Rp is a many-to-one mapping. In an RTT, the properties contained in different tasks may be derived from the same components of the specification (e.g., a process), so Rs is a many-to-one mapping as well. Likewise, Ps is also many-to-one. The hyperlinks among the properties in the property table, the tasks in the RTTs, and the components in the specification can be supported by a software tool.
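These mappings can be prototyped directly with dictionaries; the following toy model is ours (hypothetical data, not the planned tool):

    # A toy model of the hyperlink mappings: Pr and Rt are one-to-one;
    # Rp, Rs, and Ps are many-to-one. All names are hypothetical.

    Task = ["hold(P_o)", "can_hold(y > x + 1)", "can_hold(y < 100)"]

    Pr = {"property_1": "rtt_1"}                 # property -> its RTT
    Rt = {"rtt_1": frozenset(Task)}              # RTT -> its set of tasks
    Rp = {t: "property_1" for t in Task}         # task -> property
    Rs = {t: "process P" for t in Task}          # task -> spec component
    Ps = {"property_1": "process P"}             # property -> spec component

    # Following hyperlinks from a task back to its property and component:
    task = "can_hold(y > x + 1)"
    print(Rp[task], "|", Rs[task])               # property_1 | process P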
[Figure 4. An example of hyperlinks among the specification, RTTs, and properties. The figure links the RTT for P_o to row No. 1 of the property table (property: forall[x: int] | not x > 0 or exists[y: int] | y > x + 1 and y < 100; category: Satisfiability) and to the specification of process P; further rows (P_1/R_1, P_2/R_2, P_3/R_3) are elided.]
Although we have not completely implemented such a tool, the idea of the hyperlink mechanism is depicted in Figure 4. The RTT is derived from the property numbered 1 in the property table, which in turn is derived from the specification of process P in the overall specification. The link from the task can_hold(y > x + 1) to property 1 in the property table shows an instance of the mapping defined by function Rp, while the link from the task can_hold(y < 100) to process P in the specification indicates an instance of the mapping defined by function Rs. Furthermore, the link from property 1 to the top-level task hold(P_o) presents an instance of the mapping function Pr, and the link from property 1 to process P in the specification represents an instance of the mapping function Ps. We believe a software tool supporting the hyperlink mechanism is essential for facilitating the review process, and we therefore plan to extend our current prototype tool to support it in the near future.
6 A small case study

We have conducted a case study applying the proposed review method to an ATM (Automated Teller Machine) specification resulting from another recent project of ours. The ATM specification consists of an informal and semi-formal user requirements specification, a formal abstract design specification, and a formal detailed design specification [19]. We chose only the formal abstract design specification as the review target for our case study because it is the most suitable part for applying the proposed review method.
Category of errors                    Inserted errors   Detected errors   Detection rate
Variables                             3                 2                 67%
Data types                            25                24                96%
Internal consistency of processes     15                16                107%
Invariant-conformance consistency     1                 1                 100%
Satisfiability                        3                 2                 67%
Integration consistency               0                 0                 -
Original errors in expressions        -                 5                 -

Table 2. The summary of the case study result
The abstract design specification contains six modules, forty-two processes, and six CDFDs (condition data flow diagrams), and takes twenty-three pages in total. The result of the case study in terms of detected errors is summarized in Table 2. The table is divided into four columns: categories of errors, inserted errors, detected errors, and detection rate. The first column, counted from left to right, gives the list of error types; the second shows the number of errors in each category that were inserted before the case study; the third lists the number of errors detected in each category; and the fourth shows the detection rate for each category. The table may surprise the reader because some detection rates exceed 100%. For example, in the case of internal consistency of processes, one more error was found than were inserted, but this extra error is an original error in the specification rather than an inserted one. Since the preconditions of all the processes in the specification are given as true, no error concerned with integration consistency was inserted or detected. Encouragingly, the review also uncovered another five original errors in the specification; all of them are mistakes in using the correct forms of operators defined on compound data types, such as composite types. The case study has demonstrated that the proposed method is effective in helping the reviewer concentrate on each review task and in detecting faults, but it has also shown that effective application of the method requires that the reviewer be
familiar with the details of the specification, since correct review decisions may not be possible if the reviewer does not understand the semantics of the specification.

7 Conclusions
Building on the Review Task Tree approach to the verification and validation of formal specifications, we have presented a way to utilize test cases to support the review of tasks in Review Task Trees. We have put forward criteria for generating test cases to review every kind of task in a review task tree, together with criteria for generating representative and boundary values for variables of the important data types. To evaluate the effectiveness and identify potential weaknesses of our method, we conducted a case study reviewing part of an ATM system. The result shows that the method is effective in detecting faults, but it also indicates that the method would be difficult to apply for people who do not understand the specification. For this reason, the method has a high potential of being effectively incorporated into the peer review techniques commonly used in industry. We are building a prototype tool to support the RTT approach to reviewing formal specifications. Currently the tool can help draw an RTT for a given property and provides the necessary editing functions; as future work we plan to expand its functionality with an effective hyperlinking mechanism to facilitate efficient and accurate references to entities in the property table, the review task trees, and the original specification, in order to enhance the efficiency of reviews and reduce review mistakes. Furthermore, to evaluate the applicability of the review method to more complicated software systems, we are also interested in conducting larger-scale case studies with tool support in the future.

References
[1] Alan Wassyng and Mark Lawford. Lessons Learned from a Successful Implementation of Formal Methods in an Industrial Project. In Proceedings of FM 2003: the 12th International FME Symposium, to appear, Pisa, Italy, September 8-14, 2003.
[2] Toshiyuki Tanaka, Han-Myung Chang, and Keijiro Araki. Applications of Formal Methods on Domain Analysis of Practical Systems (in Japanese). In Proceedings of Foundation on Software Engineering 2000, pages 205-208, November 2000.
[3] Dan Craigen, Susan Gerhart, and Ted Ralston. Industrial Applications of Formal Methods to Model, Design and Analyze Computer Systems: An International Survey. Noyes Data Corporation, U.S.A., 1995.
[4] C. A. R. Hoare and He Jifeng. Unifying Theories of Programming. Prentice Hall, 1998.
[5] Tim Miller and Paul Strooper. Model-Based Specification Animation Using Testgraphs. In Chris George and Huaikou Miao, editors, Proceedings of the 4th International Conference on Formal Engineering Methods, Lecture Notes in Computer Science, pages 192-203. Springer-Verlag, October 2002.
[6] Gerard J. Holzmann. The Model Checker SPIN. IEEE Transactions on Software Engineering, 23(5), May 1997.
[7] Shaoying Liu. Verifying Consistency and Validity of Formal Specifications by Testing. In Jeannette M. Wing, Jim Woodcock, and Jim Davies, editors, Proceedings of the World Congress on Formal Methods in the Development of Computing Systems, Lecture Notes in Computer Science, pages 896-914, Toulouse, France, September 1999. Springer-Verlag.
[8] Shaoying Liu. A Rigorous Approach to Reviewing Formal Specifications. In Proceedings of the 27th Annual IEEE/NASA International Software Engineering Workshop, Greenbelt, Maryland, USA, December 4-6, 2002. IEEE Computer Society Press.
[9] Stefan Biffl and Michael Halling. Investigating the Defect Detection Effectiveness and Cost Benefit of Nominal Inspection Teams. IEEE Transactions on Software Engineering, 29(5), May 2003.
[10] M. E. Fagan. Design and Code Inspections to Reduce Errors in Program Development. IBM Systems Journal, 15(3):182-211, 1976.
[11] David L. Parnas and David M. Weiss. Active Design Reviews: Principles and Practices. Journal of Systems and Software, (7):259-265, 1987.
[12] Tom Gilb and Dorothy Graham. Software Inspection. Addison-Wesley, 1993.
[13] David A. Wheeler, Bill Brykczynski, and Reginald N. Meeson. Software Inspection: An Industry Best Practice. IEEE Computer Society Press, 1996.
[14] Karl E. Wiegers. Peer Reviews in Software: A Practical Guide. Addison-Wesley, 2002.
[15] Richard C. Linger and Carmen J. Trammell. Cleanroom Software Engineering: Theory and Practice. In Michael G. Hinchey and Jonathan P. Bowen, editors, Industrial-Strength Formal Methods in Practice, pages 351-372. Springer-Verlag, 1999.
[16] Lesley Semmens and Tony Bryant. Rigorous Review Technique. In Michael G. Hinchey and Jonathan P. Bowen, editors, Industrial-Strength Formal Methods in Practice, pages 231-254. Springer-Verlag, 1999.
[17] Shaoying Liu, A. Jeff Offutt, Chris Ho-Stuart, Yong Sun, and Mitsuru Ohba. SOFL: A Formal Engineering Methodology for Industrial Applications. IEEE Transactions on Software Engineering, 24(1):337-344, January 1998. Special Issue on Formal Methods.
[18] Jin Song Dong and Shaoying Liu. An Object Semantic Model of SOFL. In Keijiro Araki, Andy Galloway, and Kenji Taguchi, editors, Integrated Formal Methods 1999, pages 189-208, York, UK, June 28-29, 1999. Springer-Verlag.
[19] Shaoying Liu. A Case Study of Modeling an ATM Using SOFL. Technical Report HCIS-200301, Faculty of Computer and Information Sciences, Hosei University, Koganei-shi, Tokyo, Japan, 2003. http://cis.k.hosei.ac.jp/tr/