Developing Quality Software Systems Using the SOFL Formal Engineering Method∗

Shaoying Liu
Department of Computer Science
Faculty of Computer and Information Sciences
Hosei University, Tokyo, Japan
Email: [email protected]
URL: http://www.k.hosei.ac.jp/~sliu/

Abstract
Formal Engineering Methods are a bridge from Formal Methods to industrial applications. In this paper I describe the relation between formal engineering methods and formal methods, and present a specific formal engineering method, SOFL (Structured Object-Oriented Formal Language), for developing quality software systems. I explain how SOFL can be applied in practice through examples.
1 Introduction

Formal methods have made significant contributions to the establishment of the theoretical foundation for software development over the last thirty years by emphasizing the use of mathematical notation in writing system specifications and the employment of formal proof based on logical calculus for verifying designs and programs [1][2][3][4][5][6]. However, we should also clearly recognize the reality that formal methods are facing strong challenges to their acceptance by industrial users, although a few exceptions have been reported [7]. According to my observations, students who have studied formal methods at university hardly ever apply them in practice when they start to work in industry; some try initially, but most eventually give up. There may be various reasons for this unfortunate situation, but the fact itself indicates that education in formal methods alone is insufficient to solve the problem. I believe that formal methods need to be developed further to address many important engineering issues so that they can be applied to the development of real systems in industrial environments. For example, how can
∗ This work is supported in part by the Ministry of Education, Culture, Sports, Science and Technology of Japan under Grant-in-Aid for Scientific Research (B) (No. 11694173) and (C) (No. 11680368).
formal specifications, especially for large-scale and complex systems, be written so that they can be easily read, understood, modified, verified, validated, and transformed into designs and programs? How can the use of formal, semi-formal, and informal methods be balanced in a coherent manner to achieve the best quality assurance under practical scheduling and cost constraints? How can formal proof and testing, static analysis, or prototyping techniques be combined to achieve rigorous and effective approaches to the verification and validation of formal specifications, designs, and programs? How can the refinement from unexecutable specifications into executable programs be effectively supported? How can software development projects using formal methods be managed so that they can be well predicted before they are carried out, and well controlled during their implementation? And how can effective software tools supporting the use of formal methods be built so that the usability of formal methods can be improved and the productivity and reliability of systems can be enhanced?

To attack these problems, we may need to compromise some of the original principles of formal methods. If we treat formal methods as an effective “medicine” for curing the “disease” of software crisis, but one which is too bitter for most people to swallow at the moment, then a compromise solution is to provide a “sugar coating”. Although this may reduce the effectiveness of the “medicine”, it allows most people to take it and thereby achieves a trade-off between its effectiveness and accessibility. Since research to provide possible solutions to these questions addresses different aspects of the problem, I call this area Formal Engineering Methods (FEM).
In other words, formal methods focus on the power of mathematical notation and calculus to increase the rigor of software development, without paying much attention to human factors (e.g., capability, skills, and educational background) and other practical constraints (e.g., accuracy and completeness of requirements, changes in both specifications and programs, and the scale and complexity of systems), whereas formal engineering methods advocate the incorporation of mathematical notation into the software engineering process to substantially improve the comprehensibility and effectiveness of commonly used methods for the development of real systems in the industrial setting. In this sense, formal engineering methods are a further development of formal methods, not towards achievement of more expressive power, but towards improvement of their usability for industrial applications. They should support the integration of compatible informal/semi-formal and formal methods for the construction of formal specifications in a user-friendly manner, and rigorous but practical verification of software systems.

Over the last ten years, I have been working on a specific formal engineering method known as SOFL. SOFL stands for Structured Object-Oriented Formal Language; it achieves practicality of formal methods while preserving their rigorous features by integrating conventional Data Flow Diagrams [8], Petri Nets [9], VDM-SL [10], and the object-oriented approach [11] in a coherent manner for specification construction, and by integrating formal proof with static analysis and testing in order to review and test specifications. It also supports the use of refinement and evolution as the underlying principle for developing
formal design specifications from informal and then semi-formal user requirements specifications. By refinement I mean the technique for resolving nondeterminism in the specification, while by evolution I mean a technique for either extending or modifying the specification.

Figure 1: An illustration of formal engineering methods
[Diagram labels: Formal Methods; Formal Engineering Methods; Application of Formal Methods]

The remainder of the paper is organized as follows. Section 2 gives a detailed description of formal engineering methods. Section 3 introduces the SOFL specification language, while Section 4 focuses on the discussion of the SOFL method. Sections 5 and 6 describe, respectively, two verification and validation techniques: rigorous review and specification testing. Finally, in Section 7, I offer some concluding remarks and outline future research.
2 Formal Engineering Methods

Formal Engineering Methods aim to support the application of formal methods to the development of large-scale and complex computer systems; they are neither equivalent to applications of formal methods, nor equivalent to formal methods themselves. Rather, they are a bridge between formal methods and their application, as illustrated in Figure 1. How to build the bridge therefore becomes the primary task of formal engineering methods. As mentioned in the introduction, integrating formal notation with commonly used graphical notations, and formal proof with commonly used verification and validation techniques, may provide a solution to this problem. Such integration should aim at making formal methods accessible and usable. For this reason, the use of formalism should be limited to the level at which it helps to resolve ambiguity, and should be kept as simple as possible. In principle, formal engineering methods should have some or all of the following features:

— Adopting specification languages that properly integrate graphical notation, formal notation, and natural language. Graphical notation is suitable for describing the overall structure of a specification comprehensibly, while formal notation can be used to provide precise abstract definitions of the components involved in the graphical representation. Interpretation of the formal definitions in a natural language facilitates understanding of them.

— Employing rigorous but practical techniques for verifying and validating specifications and programs. Such techniques are usually achieved by integrating formal proof and commonly used verification techniques, such as testing [12], reviews [13], inspection [14], and model checking [15].

— Advocating the combination of prototyping and formal methods. A computer system has both dynamic and static features. The dynamic features, including the usability of a GUI and system performance, are apparent only during system operation. Prototyping can be effective in capturing the user requirements for some of the dynamic features in the early phases of system development and can provide a basis for developing an entire system using formal methods.

— Supporting both refinement and evolution rather than only strict refinement in developing specifications and programs. Evolution of a specification, at any level, means change, and such change does not necessarily satisfy strict refinement rules (of course, it sometimes does). The interesting point is how to control, support, and verify changes of specification during software development in a practical manner.

— Deploying techniques for constructing, understanding, and modifying specifications. For example, effective techniques for specification construction can be achieved by integrating existing requirements engineering techniques with formal specification techniques, and techniques in simulation and computer vision can be combined to create a visual simulation technique that helps in understanding a specification and exploring the potential behavior of the ultimate system.

— Adopting intelligent software tools to support formal specification and rigorous verification in such a way that the process of either building a specification or conducting a verification is guided by the tools, so that the developer is under the “control” of the tools and need not necessarily manipulate formal notations directly.
Thus, the efficiency, productivity, and correctness of software development can be substantially enhanced.

In summary, formal engineering methods embrace integrated specification, integrated verification, and all the supporting techniques, with the purpose of providing user-friendly formal notations and methods for automation of specification construction, transformation, and system verification and validation. To put it simply,

FEM = Integrated specification + Integrated verification + Supporting techniques

Effective formal engineering methods would have to be realized by means of supporting software tools. Fortunately, the precise syntax and semantics of formal specification languages provide great advantages over traditional informal or semi-formal languages, allowing the construction of powerful tools that support software development in depth and systematically. I believe that the quality of software tools is the key to success in making formal specification and rigorous verification techniques more accessible to industry at large.
3 The SOFL Specification Language

The SOFL specification language was designed by integrating Data Flow Diagrams (DFDs), Petri Nets, the formal specification language VDM-SL (Vienna Development Method - Specification Language), and the features of object-oriented programming languages [16][17]. A DFD is intended to describe the architecture of a specification (e.g., system requirements specification and design specification), while a Petri Net is used to provide an operational semantics for the DFD. Such a formalized DFD is called a CDFD (condition data flow diagram) in SOFL. To provide precise definitions for the components of the CDFD, such as processes, data flows, and stores, VDM-SL is adopted, with necessary extensions or modifications. To support data abstraction and object-oriented features, such as encapsulation, information hiding, reusability, and polymorphism, classes can be defined and their objects can be used as either data flows or stores in CDFDs.

A specification in SOFL is a hierarchy of CDFDs, resulting from decomposition of the processes involved. A CDFD at each level is associated with a module for formal definitions of the components of the CDFD, as shown in Figure 2. In fact, a module can be perceived as a self-contained structure with a certain behavior (or functionality) that is represented by the associated CDFD. A process models a transformation from its input to its output; a data flow indicates a data item moving from one process to another; and a store represents a data depository (such as a file or database).
A module plays the same role as both data dictionary and process specification in conventional data flow diagrams [8], but the difference is that all the definitions are given in a formal textual notation to achieve preciseness and conciseness: all the data flows and stores are defined with types and invariants, and processes are defined using either a pair of pre and postconditions or explicit specifications. The general structure of a process specification is as follows:
process ProcessName(input) output
ext ExternalVariable
pre PreCondition
post PostCondition
decom LowerLevelModuleName
explicit ExplicitSpecification
comment
end_process
module SYSTEM_Example;
  const, type, var, inv, behav,
  process P1; process P2; process P3; process P4;
end_module;

module P1_Decom;
  const, type, var, inv, behav,
  process P11; process P12; process P13;
end_module;

module P3_Decom;
  const, type, var, inv, behav,
  process P31; process P32;
end_module;

class S1;
  const, type, var, inv;
  method Init; method M1; method M2;
end_class;

class S2;
  const, type, var, inv;
  method Init; method M3; method M4;
end_class;

[CDFD diagrams: processes P1, P2, P3, P4 in the top-level CDFD; P11, P12, P13 in the CDFD of P1_Decom; P31, P32 in the CDFD of P3_Decom; data flow labels a, b, c, d, e, f, w, x, y, z and condition C(x).]
Figure 2: The outline of a SOFL specification

A process specification starts with a process name ProcessName, its input data flows input, and its output data flows output. If the process deals with external variables (state variables defined in the var section of the module), they need to be listed after the keyword ext. The functionality of the process can be defined by either a pair of pre and postconditions, usually known as an implicit specification, or an explicit specification ExplicitSpecification, which is formed by using predicate logic together with the sequential, conditional, and iteration constructs available in most programming languages. A complex process may be decomposed into a CDFD at the next lower level, whose associated module must be named after the keyword decom for good traceability in the documentation. To facilitate understanding of the formal specification of the process, informal comments may be written to provide a proper explanation. It is worth noting that all the parts of a process specification, except its signature (i.e., process name, input and output data flows), are optional.

A class in a specification is treated as a user-defined type from which objects can be instantiated for use as data flows and/or stores in CDFDs. A class has a structure similar to that of a module, but differs from a module in several ways. First, objects can be derived from a class but cannot be derived from a module. Second, a method in a class can have only one output, whereas a process may have many. Finally, a class has no associated CDFD, but a module is associated with a CDFD.
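As a rough illustration of how an implicit specification constrains an implementation, the following Python sketch wraps a candidate implementation with run-time checks of a precondition and a postcondition. The names `ProcessSpec`, `run`, and `int_sqrt` are hypothetical and are not part of SOFL; the sketch only mirrors the pre, post, decom, and comment parts of the process structure above, and it assumes a process with a single output.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Optional


def _true(*_args: Any) -> bool:
    """Default condition: every optional part may be omitted, so it holds trivially."""
    return True


@dataclass
class ProcessSpec:
    """Hypothetical Python stand-in for a process specification (not SOFL itself)."""
    name: str
    pre: Callable[..., bool] = _true    # predicate over the inputs
    post: Callable[..., bool] = _true   # predicate over the inputs followed by the output
    decom: Optional[str] = None         # name of the lower-level module, if any
    comment: str = ""

    def run(self, impl: Callable[..., Any], *inputs: Any) -> Any:
        """Run a candidate implementation and check it against pre and post."""
        if not self.pre(*inputs):
            raise ValueError(f"{self.name}: precondition violated for {inputs}")
        output = impl(*inputs)
        if not self.post(*inputs, output):
            raise AssertionError(f"{self.name}: postcondition violated")
        return output


# An implicit specification: it states what the result must satisfy, not how to compute it.
int_sqrt = ProcessSpec(
    name="IntSqrt",
    pre=lambda x: x >= 0,
    post=lambda x, r: r * r <= x < (r + 1) * (r + 1),
    comment="r is the integer square root of x",
)

result = int_sqrt.run(math.isqrt, 10)
```

Any implementation that satisfies the postcondition whenever the precondition holds is acceptable under this reading; an explicit specification would instead fix the algorithm.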
4 The SOFL Method

The SOFL method is composed of a software process model and techniques for specification and verification. The former provides overall control of the software development process, while the latter provides specific techniques to support the activities involved in the development process.
4.1 The process model
Development of software systems using SOFL combines the approaches of the waterfall model [18] and the transformation model [19], as shown in Figure 3. The process is similar to the waterfall model in emphasizing the necessity of requirements analysis, design, and coding, but differs from the conventional transformation model in that transformations from high-level specifications to low-level ones are not necessarily strict refinements: they can be either evolutions or refinements, depending on the phase of the development.

One of the important features of the process model is the use of a three-step approach for building formal specifications: informal, semi-formal, and formal specification. Developing an informal specification helps the analyst (the person who carries out requirements analysis) study the domain knowledge and identify what data items and operations are necessary for the system to be built. Transforming an informal specification into a semi-formal one helps the analyst clarify ambiguities in the concepts involved, understand the required operations in more detail, and set up a well-organized and comprehensible specification. Since informal and semi-formal specifications are suitable for use in communications between the user and the developer, they are normally used for documenting requirements. On the other hand, a complete formalization of the semi-formal specification allows the designer to understand the user requirements and clarify ambiguities in the semi-formal specifications, and it will also help to build a firm foundation for implementation. Therefore, a formal specification should be produced to represent the design, including both abstract design and detailed design.
The abstract design focuses on the architecture of the system and the functionality of related operations, which are suitable for being defined with pre and postconditions, whereas the detailed design is intended to provide algorithms and data structures to facilitate implementation using a specific programming language (e.g., Java). In fact, it is a common phenomenon nowadays that companies conducting requirements analysis and writing requirements specifications are different from the companies designing and implementing the systems, for economic reasons (e.g., the cost of design and implementation by another company is much lower). Therefore, there is a strong need for the companies conducting design and implementation to study the requirements thoroughly and clarify potential ambiguities. The three-step approach of the SOFL method can help enhance the quality of communication, understanding, and documentation in such a process.

The SOFL process model also emphasizes that transformations from informal specification to semi-formal specification and then to formal specification can be achieved by evolution, which means any of three activities: refinement, extension, and modification. It also requires that transformations from the abstract design to the detailed design and then to the final program be a strict
refinement process. Thus, it can be ensured that the final program meets the design specification and, hopefully, the requirements specification as well. To ensure the quality of evolution and refinement, rigorous but practical techniques are advocated to support validation and verification of specifications produced in various phases. Validation aims to make sure that the written specification accurately reflects the user’s conception of requirements, while verification ensures consistency within a specification or between different specifications.

Figure 3: The software development process using SOFL
[Diagram: Requirements analysis: Informal specification, then (Evolution, Validation) Semi-formal specification. Design: then (Evolution, Validation) Formal implicit specification (abstract design), then (Refinement, Verification) Formal explicit specification (detailed design). Coding: then (Refinement, Verification) program.]
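The difference between a strict refinement and an evolution step can be illustrated with a small Python sketch. It assumes the commonly used refinement rule for pre/postcondition specifications (the concrete specification accepts every input the abstract one accepts, and allows only outputs the abstract one allows), which the text above does not spell out, and it checks the rule by exhaustive enumeration over small finite domains. The function name `refines` and the example predicates are hypothetical.

```python
from typing import Callable, Iterable

Pre = Callable[[int], bool]
Post = Callable[[int, int], bool]


def refines(pre_abs: Pre, post_abs: Post,
            pre_con: Pre, post_con: Post,
            inputs: Iterable[int], outputs: Iterable[int]) -> bool:
    """Check, over finite domains, that the concrete spec refines the abstract one.

    Assumed rule: wherever the abstract precondition holds, the concrete
    precondition must hold too, and every output the concrete postcondition
    permits must also be permitted by the abstract postcondition.
    """
    output_list = list(outputs)
    for x in inputs:
        if not pre_abs(x):
            continue                      # outside the abstract spec's domain
        if not pre_con(x):
            return False                  # concrete spec rejects a legal input
        for y in output_list:
            if post_con(x, y) and not post_abs(x, y):
                return False              # concrete spec allows a forbidden output
    return True


def pre(x: int) -> bool:
    return x >= 0

# Abstract design: nondeterministic, any result at least as large as the input.
def abstract_post(x: int, y: int) -> bool:
    return y >= x

# Detailed design: the nondeterminism is resolved to one particular result.
def detailed_post(x: int, y: int) -> bool:
    return y == x + 1

# A modification (evolution) that changes the required behavior.
def modified_post(x: int, y: int) -> bool:
    return y == x - 1


INPUTS, OUTPUTS = range(0, 5), range(-1, 10)
is_refinement = refines(pre, abstract_post, pre, detailed_post, INPUTS, OUTPUTS)
is_evolution_only = not refines(pre, abstract_post, pre, modified_post, INPUTS, OUTPUTS)
```

Here the detailed design passes the check because it only narrows the choice of results, whereas the modified specification fails it and would therefore count as an evolution that needs validation rather than as a strict refinement.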
4.2 Example
I take a simplified ATM (Automated Teller Machine) as an example to illustrate how a system can be formalized by taking the three steps introduced above. First, an informal specification that contains all the desired functions is written as follows:

(1) Select the service of displaying the balance of an account or withdrawing from the account.
(2) Insert a cashcard and supply a password.
(3) If displaying the balance is selected, the current balance is displayed.
(4) If withdraw is selected, the amount of the money to withdraw is properly provided.
To gain a thorough understanding and clarify the potential ambiguities in the specification, we develop it into a semi-formal specification. Since the scale of the problem is rather small, I use a single module to model the system in which the functional specification of the desired processes, data types, and communication between processes are described. The functional specification is
given in the form of pre and postconditions, but in such a manner that both pre and postconditions are basically defined informally. Data types are defined using SOFL built-in data types or user-defined classes, but given types are allowed. The communication between processes is modeled using a CDFD, as shown in Figure 4.

Figure 4: The CDFD modeling an ATM
[Diagram labels: processes Receive_Command, Check_Password, Withdraw, Show_Balance; store 1 account_file; data flows balance, w_draw, sel, card_id, pass, account1, account2, pr_meg, amount, cash, e_msg, balance.]

The outline of the semi-formal specification of the system is given as follows:
module SYSTEM_ATM;
type
  Account = composed of
              account_no: nat0
              password: PassWord
              balance: real
            end
  PassWord = given; /* The type PassWord will be finalized in the formal specification. */
var
  ext #account_file: set of Account; /* This variable contains data in a file. */
behav CDFD_1; /* Assuming the CDFD of the ATM is numbered 1. */

process Init()
end_process; /* The initialization process has no specific function because there is no local store in the CDFD to initialize. */

process Receive_Command(balance: sign | w_draw: sign) sel: bool
...
end_process;

process Check_Password(card_id: nat0, sel: bool, pass: PassWord)
        account1: Account | pr_meg: string | account2: Account
...
end_process;

process Withdraw(amount: real, account1: Account) e_msg: string | cash: real
ext wr account_file
pre “account1” exists in “account_file”.
post if the balance of “account1” is greater than or equal to “amount”, then assign “amount” to cash and reduce “amount” from the balance of “account1”; otherwise, if “amount” is greater than the balance of “account1”, then issue a message to indicate the lack of sufficient money to withdraw from “account1”.
end_process;

process Show_Balance(account2: Account) balance: real
...
end_process;
end_module;

The next step is to design the system based on the semi-formal requirements specification. The focus of this step is on the architecture of the system and the formalization of each of its components. For this reason, the structure of the module, including its associated CDFD, processes, and data variables in the semi-formal specification, may need to be evolved to suit the design purposes (e.g., to achieve high efficiency and good usability of the system). Following this line we derive the following formal design specification, with necessary omissions for the sake of space:
module SYSTEM_ATM;
type
  ...
  PassWord = nat0; /* the precise definition of PassWord is finalized. */
  ...
inv forall[x: Account] 1000 account1.balance - amount)})
   or not exists[x: account_file] x = account1 and x.balance >= amount and
   e_meg = "The amount is too big")
comment
  If the balance of “account1” is greater than or equal to “amount”, then assign “amount” to cash and reduce “amount” from the balance of “account1”; otherwise, if the “amount” is greater than the balance of “account1”, then issue a message to indicate the lack of sufficient money to withdraw from “account1”.
end_process;
...
end_process;
end_module;

In this formal design specification we adopt the semi-formal specification in the comments of processes to help explain the meaning of the formal specification for future maintenance or system evolution. Also, we decide to define the given type PassWord of the semi-formal specification as nat0, because passwords are required to be natural numbers with four digits. In this particular case, we do not need to evolve any process or the entire CDFD, but this does not mean there is no need to do so in general. Note that several operators defined on set types and composite types are used, such as union(), diff(), and modify(). Briefly speaking, the operation union(x, y) is the union of two sets x and y; diff(x, y) yields the set whose elements belong to x but not y; and modify(x, f -> v1) yields a new composite object from the given composite object x by replacing the value of its field f with v1. A detailed discussion of these operators can be found in [16].
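To make the three operators concrete, here is a minimal Python sketch that models set values as frozensets and composite objects as immutable dataclasses. The Python functions `union`, `diff`, and `modify` only imitate the described behavior and are not SOFL's own definitions. The `withdraw` function is an illustrative reading of the informal comment of the Withdraw process, since part of the formal postcondition is omitted above, and its return convention (new file, cash, message) is an assumption.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Account:
    """Composite type with the three fields of Account in SYSTEM_ATM."""
    account_no: int
    password: int
    balance: float


def union(x: FrozenSet[Account], y: FrozenSet[Account]) -> FrozenSet[Account]:
    """All elements that belong to x or y."""
    return x | y


def diff(x: FrozenSet[Account], y: FrozenSet[Account]) -> FrozenSet[Account]:
    """Elements that belong to x but not y."""
    return x - y


def modify(obj: Account, **fields: object) -> Account:
    """New composite object equal to obj except for the given fields."""
    return replace(obj, **fields)


def withdraw(amount: float, account1: Account, account_file: FrozenSet[Account]
             ) -> Tuple[FrozenSet[Account], Optional[float], Optional[str]]:
    """Return (new account_file, cash, e_msg); exactly one of cash and e_msg is set."""
    if account1 not in account_file:
        raise ValueError("precondition violated: account1 is not in account_file")
    if account1.balance >= amount:
        updated = modify(account1, balance=account1.balance - amount)
        new_file = union(diff(account_file, frozenset({account1})),
                         frozenset({updated}))
        return new_file, amount, None
    return account_file, None, "The amount is too big"


acc = Account(account_no=1234, password=4321, balance=100.0)
other = Account(account_no=5678, password=8765, balance=50.0)
account_file = frozenset({acc, other})

new_file, cash, e_msg = withdraw(30.0, acc, account_file)
same_file, no_cash, too_big = withdraw(500.0, acc, account_file)
```

Because the values are immutable, the old and new contents of account_file can be compared directly, which is close in spirit to a postcondition relating the state before and after the process.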
5 Rigorous review

Review is a traditional technique for static analysis of software to detect faults that undermine its reliability. Basically, software review means checking through software in an appropriate manner, either by a team or by an individual. Since software means both the program and its related documentation, such as specification and design, a review can be conducted for documentation at all levels. Various review methods have been proposed and/or applied in practice under different names, such as active design reviews [13] and inspection [14], and more importantly, many studies have shown that detecting faults in specifications helps to substantially reduce the cost and risk of software projects [20]. When dealing with specifications with no formal semantics, the review techniques have to be applied on the basis of the reviewers’ experience and judgment, and may not be supported systematically in depth. However, for formal specifications more rigorous review techniques can be developed and applied. To make reviews effective and efficient, especially for large-scale and complex systems, it is important to use a systematic method that allows the reviewer to focus on one manageable component at a time and provides an automatic analysis based on the review results of all the related components. In this section I describe a specific technique for rigorous reviews of SOFL specifications with those features.

5.1 Steps for rigorous reviews
A review of a specification takes four steps:

1. Derive all the necessary properties to be reviewed from the specification, and express them as predicate expressions.
2. Build a Review Task Tree (RTT) to present all the review tasks graphically and logically.
3. Perform reviews for all necessary tasks in the tree.
4. Evaluate the review results to determine whether any fault is detected.

Review is intended not only to verify the consistency of the specification, but more importantly to validate the specification. For this reason, a review task tree must show a clear logical relation between the top task and other review tasks. The evaluation of review results of all the individual tasks will be properly used to determine whether the property under review is correct. There might be many kinds of properties for a specification, but the most important ones include

— internal consistency,
— satisfiability,
— validity.

Definition 1 A specification is said to be internally consistent if and only if there is no contradiction with the syntax and semantics of the SOFL language in the specification.

Definition 2 A specification is satisfiable if and only if there exists an output satisfying the postcondition for a given input satisfying the precondition.

Such a specification ensures the existence of a program that implements it.

Definition 3 A specification is valid if and only if it satisfies the user’s conception of requirements.

Since the conception of requirements is not a formal concept, the formalization of the “validity” concept is almost impossible. This presents a challenge to reviews of specifications for their validation, and there seems to be no radical solution to this problem yet.

5.2 Properties of specifications
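Of these properties, satisfiability (Definition 2) lends itself to a small mechanical illustration. The Python sketch below reads the definition as "for every input satisfying the precondition there exists an output satisfying the postcondition" and checks it by brute force over small finite domains. This is only an approximation for illustration, not the review technique described in this section, and the function names and example predicates are hypothetical.

```python
from typing import Callable, Iterable, List


def unsatisfied_inputs(pre: Callable[[int], bool],
                       post: Callable[[int, int], bool],
                       inputs: Iterable[int],
                       outputs: Iterable[int]) -> List[int]:
    """Inputs that satisfy pre but for which no candidate output satisfies post."""
    output_list = list(outputs)
    return [x for x in inputs
            if pre(x) and not any(post(x, y) for y in output_list)]


def is_satisfiable(pre: Callable[[int], bool],
                   post: Callable[[int, int], bool],
                   inputs: Iterable[int],
                   outputs: Iterable[int]) -> bool:
    """Finite-domain reading of Definition 2."""
    return not unsatisfied_inputs(pre, post, inputs, outputs)


def exact_root(x: int, y: int) -> bool:
    """Postcondition: y is an exact integer square root of x."""
    return y * y == x


INPUTS, OUTPUTS = range(0, 10), range(-3, 4)

# With a weak precondition, inputs such as 2 have no exact integer root.
weak_pre_ok = is_satisfiable(lambda x: x >= 0, exact_root, INPUTS, OUTPUTS)
witnesses = unsatisfied_inputs(lambda x: x >= 0, exact_root, INPUTS, OUTPUTS)

# Strengthening the precondition to perfect squares makes the spec satisfiable.
strong_pre_ok = is_satisfiable(lambda x: x in (0, 1, 4, 9), exact_root, INPUTS, OUTPUTS)
```

The list of unsatisfied inputs plays the role of concrete evidence for a negative review result, while an empty list over a finite sample gives no proof for the full domain.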
To conduct rigorous reviews of the properties described above, we first need to express them as predicate expressions. For the sake of space, I focus on the consistency between a process and the related invariant as an example to explain the procedure of a rigorous review. Let P(x; y; s) : [pre_P, post_P] denote a process and I = ∀x∈X · Q(x) a related invariant defined in module M. Then the consistency between P and I is defined as follows:

Definition 4 A process P and invariant I are consistent if and only if the following conditions hold:

(1) ¬(pre_P ∧ I ⇔ false)
(2) ¬(pre_P ∧ post_P ∧ I ⇔ false)

In other words, invariant I must not be violated before and after the execution of process P, according to its pre and postconditions, for the invariant is required to be sustained throughout the module and the related parts in the entire specification. To review these predicate expressions, we build a review task tree (RTT) derived from the expressions. Figure 5 shows the most important nodes of the review task tree notation. Each node represents a review task, defining what to do with a property, and it may be connected to “child nodes” in different ways, depending on the type of the node. There are two kinds of review tasks: one is “the property involved holds” and the other is “the property involved can hold”. The result of reviewing a task has three possibilities: positive, uncertain, and negative. A positive result means that no fault in the property involved is detected; an uncertain result provides no evidence to either support or deny the property; and a negative result indicates that the property contains faults. Let us take the process Withdraw and the only invariant, known as I, for type Account in the ATM system as an example to illustrate how an RTT can be built to support a review of their consistency. Figure 6 shows an RTT built based on the formal expression of the consistency: 1000