A Formal Approach to Software Components Classification and Retrieval
William C. Chu, Dept of Information Science, Feng Chia University, Taichung, Taiwan
Chao-Tsun Chang, Information Center, GHQ ROCAF, Taiwan
[email protected]
Chung-Shyan Liu*, Dept. of Infor & Computer Engr, Chung Yuan C. University, Chung Li, 32023, Taiwan, liucs@mars.ice.cycu.edu.tw
Hongji Yang, Computer Science Dept, De Montfort University, England, hjy@dmu.ac.uk
Abstract
In this paper, we propose an approach to reuse-based software development using formal methods. In our approach, each software component is annotated with a set of predicates that formally describe the component, and is classified using a faceted scheme. A user may retrieve components from the library using either keywords or predicates. When a component is retrieved, it is checked to determine whether it matches the requirements. Then, the component is integrated with the designed system, along its required functionalities. The integrated component/system is transformed into a Predicate/Transition net (PrT net) to perform consistency checking. If there is no inconsistency, the component may be adapted or incorporated directly. Otherwise, the conditions that cause the inconsistency are revealed. The user may then decide to search the library again by modifying the query specification and restarting the whole process, or to terminate the search.
1 Introduction
Software reuse has the potential to improve software quality and programmer productivity. Software reuse involves four steps: (1) definition, (2) retrieval, (3) adaptation, and (4) incorporation [7]. Therefore, it is essential that there exists a good software library, where the reusable components are properly classified so that a user can easily retrieve the needed components. It is also helpful if some guidance is available to assist a reuser in modifying and incorporating the retrieved components into a system. However, it is difficult to classify software components effectively and to retrieve them for reuse, because the concepts represented by a component, as well as the semantics of its interface, are usually not formally defined and are thus difficult to extract. Also, to correctly incorporate a software component into a system, consistency checking must be performed at different levels of abstraction. We cannot integrate two components that are conceptually consistent but inconsistent in implementation [2]. Consistency checking is difficult if the components are not formally specified.
In this paper, we propose an approach to reuse-based software development using formal methods. In our approach, each software component is annotated with a set of predicates to formally describe the component. The components are classified by the formal predicates using a faceted scheme. A user may retrieve components from the library using either keywords or predicates. When a component is retrieved, it is checked to determine whether it matches the requirements. Then, the component is integrated with the designed system, along its required functionalities. The integrated component/system is transformed into a Predicate/Transition net (PrT net) to perform consistency checking. If there is no inconsistency, the component may be adapted or incorporated directly. Otherwise, the conditions that cause the inconsistency are revealed. The user may decide to search the library again by modifying the query specification and restarting the whole process, or to terminate the search.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 describes our approach, which includes the definition of components and predicates, the classification model, the similarity measure, and the retrieving process. Section 4 proposes reuse-based software development using our model. Section 5 presents an example. Section 6 presents discussion and future work. Section 7 is the conclusion.
*Supported by a grant from NSC, ROC/Taiwan, under contract NSC85-2213-E033-002.
0730-3157/97 $10.00 © 1997 IEEE
2 Related Work
Basically, there are three approaches to software component classification and retrieval: the keywords/faceted scheme, information retrieval, and the semantic network based approach. In the keywords/faceted approach, a set of pre-defined keywords is provided for classifying the software components. Domain analysis must be performed first to identify the keywords [3]. The faceted scheme uses several facets, where each facet contains a keyword, to describe a software component [4]. In the faceted scheme, similarity between two components is computed using conceptual distance, which estimates the expected effort to adapt a software component to satisfy the design specification. In the information retrieval approach, keywords are extracted from the text automatically [5][6]. However, the extracted keywords do not always carry semantic information. In GURU [6], lexical affinities and their statistical distribution were used to automatically extract index information from natural-language documents. But the hierarchical agglomerative clustering used for classification may not be suitable for software components, since software components usually contain much more semantic information. A semantic network provides structural knowledge representation with inference power and is suitable for representing the concepts contained in software components. However, this method requires excessive manual labor to create a knowledge structure of even moderate size and can only support a narrow domain [7].
Recently, there have been proposals that combine different types of approaches. In [1], a hybrid approach that combines the faceted scheme and information retrieval was proposed. This approach extends the faceted scheme by associating a set of keywords with each facet, where the keywords and their significance can be extracted from code and documents using information retrieval. The similarity between two components is computed by counting the weighted significance of the matched keywords. In [7], a library system for software reuse, called AIRS, that combines the faceted scheme and semantic network was presented. In AIRS, the software components are represented using frame structures that together constitute a semantic network. The similarity between two components is measured by subsumption and closeness, both of which are manually assigned.
In [2], a formal method, based on semantic interface predicate/transition net, was proposed. In this approach, a software component is annotated with formal predicates, which may be obtained either from reverse engineering or directly from the design phase if a formal method is used for development. A formal method captures the semantics of a software component more precisely and thus provides more detailed information for the adaptation and integration of the component.
3 Our Approach
3.1 Condition, Components, Classes
The base objects in the library are called components, which are grouped into sets called classes. Conditions describe the attributes, behavior, or design information of components and/or classes. For example, a condition can be a description of "functionality", "performance complexity", "required O.S.", "pre-condition", "post-condition", "relation", etc. Conditions are the basis of our classification system. A condition is defined as a predicate. A predicate is either a relation, which represents the relation of attributes of components and/or classes, or a function that accepts a list of attribute parameters and returns either TRUE or FALSE. For example, the predicate Greater(x, 3) is a relation meaning that "x is greater than 3", and the predicate File-Open?(file) is a function that returns TRUE if file is opened and FALSE if it is closed.
A component is described by a finite set of related and/or unrelated conditions. A set of related conditions represents conditions with a dependency relationship, i.e. conditions that may have to be evaluated in some predefined order. A set of unrelated conditions requires no such order. For example, given two conditions for Stack components, Push(Stack-A) and Pop(Stack-A), condition Push(Stack-A) should hold before condition Pop(Stack-A) to prevent access to an empty stack.

class vending_machine {
protected:
  int total;
  int change;
  unsigned char button_AB_flag;
  char select_button;
  void compute_change();
  void button_switch(int money);
public:
  vending_machine()
    { total=0; change=0; button_AB_flag=0; select_button=' '; }
  void insert_coin(dime *dm);
  void insert_coin(quarter *qt);
  void press_button_A();
  void press_button_B();
  void coin_return();
};
//!Precondition:
//!CLASS(vending_machine)
//!Postcondition:
//!DATA_MEMBER(total, #INTEGER, #PROTECTED)
//!ASSIGN(total, 0)
//!DATA_MEMBER(change, #INT, #PROTECTED)
//!ASSIGN(change, 0)
//!DATA_MEMBER(button_AB_flag, #UNSIGNED CHAR,
//!  #PROTECTED)
//!ASSIGN(button_AB_flag, 0)
//!DATA_MEMBER(select_button, #CHAR, #PROTECTED)
//!ASSIGN(select_button, ' ')
//!MEMBER_FUNCTION(compute_change, #VOID,
//!  #PROTECTED, #VOID):VIRTUAL(#0)
//!MEMBER_FUNCTION(button_switch, #VOID,
//!  #PROTECTED, money, #INT):VIRTUAL(#0)
//!MEMBER_FUNCTION(insert_coin, #VOID, #PUBLIC,
//!  dm, # *dime):VIRTUAL(#0) .OR.
//!MEMBER_FUNCTION(insert_coin, #VOID, #PUBLIC,
//!  qt, # *quarter):VIRTUAL(#0)
//!MEMBER_FUNCTION(press_button_A, #VOID,
//!  #PUBLIC, #VOID):VIRTUAL(#0)
//!MEMBER_FUNCTION(press_button_B, #VOID,
//!  #PUBLIC, #VOID):VIRTUAL(#0)
//!MEMBER_FUNCTION(coin_return, #VOID, #PUBLIC,
//!  #VOID):VIRTUAL(#0)

Figure 1: A class declaration and its annotated predicates.
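The annotation lines in Figure 1 share a regular shape: a predicate name followed by an ordered attribute list. A condition could therefore be stored as a small record. The following is a minimal C++ sketch under that assumption. The names `Predicate`, `trim`, and `parse_annotation` are hypothetical and are not part of the approach described in this paper; the sketch handles only single-line annotations and ignores any trailing qualifier such as `:VIRTUAL(#0)`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one condition: a predicate name plus its
// attributes, kept in their original order.
struct Predicate {
    std::string name;
    std::vector<std::string> attrs;
};

// Strip leading and trailing blanks and tabs.
std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Parse a line such as "//!DATA_MEMBER(total, #INTEGER, #PROTECTED)".
// Returns false for lines that carry no predicate, e.g. "//!Precondition:".
bool parse_annotation(const std::string& line, Predicate& out) {
    const std::string marker = "//!";
    const std::size_t start = line.find(marker);
    if (start == std::string::npos) return false;
    const std::string body = trim(line.substr(start + marker.size()));

    const std::size_t open = body.find('(');
    if (open == std::string::npos) return false;
    // First ')' after '(' closes the attribute list; anything after it
    // (for example ":VIRTUAL(#0)") is ignored by this sketch.
    const std::size_t close = body.find(')', open);
    if (close == std::string::npos) return false;

    Predicate p;
    p.name = trim(body.substr(0, open));
    if (p.name.empty()) return false;

    // Split the attribute list on commas, preserving attribute order.
    const std::string args = body.substr(open + 1, close - open - 1);
    std::size_t pos = 0;
    while (true) {
        const std::size_t comma = args.find(',', pos);
        const std::size_t len =
            (comma == std::string::npos) ? std::string::npos : comma - pos;
        const std::string token = trim(args.substr(pos, len));
        if (!token.empty()) p.attrs.push_back(token);
        if (comma == std::string::npos) break;
        pos = comma + 1;
    }
    out = p;
    return true;
}
```

Keeping the attributes in a vector rather than a set is a deliberate choice in this sketch, since the order of attributes matters when predicates are compared.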
3.2 Classification Model
The information of a program can be divided into four levels: text level, syntactic level, semantic level, and conceptual level. If predicates can be defined for each level, then they can be applied at all levels of abstraction during retrieval and analysis. Some predicates, called primitive predicates, generally at the text and syntactic levels, can be reverse engineered and automatically generated through code analysis. For example, the predicates Public(), Private(), and Protected(), which indicate the access privilege of class members, can be generated automatically. Some predicates, especially those at the conceptual level and those related to design information, cannot be generated through code analysis and require input from system engineers or domain experts. Such predicates are called specified predicates. Finally, some predicates, called induced predicates, can be inferred from primitive and/or specified predicates.
The components are classified by their predicates using a faceted scheme, where each facet corresponds to one type of predicate as indicated above. Our classification model differs from the faceted scheme in the following ways:
1. In the faceted scheme, each facet represents a category of concepts. In our model, each facet represents a level of abstraction as well as a category of concepts.
2. In our model, the number of facets need not be fixed. The facets are implicit at retrieval and thus transparent to the user.
3.3 Similarity
The similarity between two components is computed by matching predicates. Suppose that component A has predicates $(a_1, a_2, \ldots, a_n)$ and component B has predicates $(b_1, b_2, \ldots, b_m)$; then the similarity between A and B is computed as

[similarity equation not recoverable from the source]

where $e_{ij}$ is the matched weight between predicate $a_i$ of component A and predicate $b_j$ of component B and is defined as:

$$e_{ij} = \begin{cases} 1 & \text{if } a_i = b_j \\ 0 & \text{if } a_i \neq b_j \end{cases} \quad \text{for } i = 1 \ldots n \text{ and } j = 1 \ldots m$$

The similarity is computed by counting the number of matched predicates between two components and then normalizing this number. Since a predicate may have more than one attribute associated with it, and if not all attributes are required to match exactly, the matched weight may be redefined as $E_{ij}$:

$$E_{ij} = \frac{\text{matched attributes}}{\text{total number of attributes}}$$

Now the matched weight may be a fraction instead of just 1 or 0. In our approach, we use $E_{ij}$ instead of $e_{ij}$.
When a user makes a query to the library, the query may be either in keywords or in predicates. If a query is made in keywords, it is transformed into predicates before matching begins. The predicates in the query are matched with the predicates of the components in the library, and the similarity between the query and the candidate components is computed according to the above formula. The similarity is 1 only if there is a component that matches the query perfectly.
3.4 Retrieving Process
With the assistance of our classification model, the reuse library can offer much more help than current reuse libraries. The query process of current reuse libraries is completely guided by reusers, i.e. the required information is input by reusers, with a lot of browsing effort involved. Thus, these libraries provide only limited help when the reusers have only a rough idea about the components. Also, these libraries can provide only very limited assistance when the reusers know the exact components that they want, because of the lack of proper formalism support. We call these libraries top-down reuse libraries. Based on the attributes, presented in predicates, of reusable components/classes, our reuse library can ask/guide reusers to input more detailed information. We call our library a top-down bottom-up reuse library. The steps of the retrieving process are as follows:
1. The reuser inputs keywords and possible attributes of the required components/classes.
2. Use the keywords to match the names of predicates and list the candidate components/classes attached to the matched predicates.
3. Ask the reuser to input information related to the attribute parameters of the matched predicates. Use the input information to filter the candidate components/classes. Apply this step iteratively until a small set of possible candidate components/classes has been generated.
4 Software Development with the Support of our Model
Components and classes in a reusable library can be used to build different types of software. The steps of the software development process with the support of our model are described below:
1. Components Specification Phase - A user states the requirements of the software to be developed (the target) in predicates if the design specification is well-defined. When the design specification is not clear, such as in exploratory programming, the user may make the query in keywords.
2. Retrieving Process - Based on the specified keywords and predicates, retrieve appropriate reusable components/classes from the reuse library.
3. Integration/Analysis Phase - The retrieved component is transformed into a PrT net and then integrated, along the required functionalities, with other parts of the designed system, also in PrT net. The integrated PrT net is analyzed to check whether there is any inconsistency. If so, the predicates that cause the inconsistency are revealed. It should be noted that the inconsistency checking can be fully automated. In other approaches, a reuser has to read the code and associated documents to find the inconsistency.
4. Verification and Adaptation Phase - Check whether the retrieved components need to be modified and then make the necessary modifications.
5. Integration Phase - Integrate the retrieved or modified components/classes, along the required functionality, with the designed system.
The above steps may be applied iteratively. If any inconsistency is detected, the software designer goes back to step 1, uses the detected inconsistent predicates to modify the predicate specification of the query, and then goes through the whole process again.
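The retrieving step relies on the similarity measure of Section 3.3, so a small sketch may help make the matched weight concrete. The C++ code below is illustrative only and rests on several assumptions that are not stated in this paper: predicates with different names have weight 0, attributes are compared position by position with the larger attribute count as the denominator, and the overall similarity is normalized by dividing the sum of best weights in both directions by n + m. That normalization was chosen only because it yields 1 for a perfect match. All type and function names (`Predicate`, `Component`, `matched_weight`, `best_weight`, `similarity`, `retrieve`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Predicate {
    std::string name;
    std::vector<std::string> attrs;  // order is significant
};

struct Component {
    std::string name;
    std::vector<Predicate> predicates;
};

// Matched weight E_ij = matched attributes / total number of attributes.
// Assumption: different predicate names never match.
double matched_weight(const Predicate& a, const Predicate& b) {
    if (a.name != b.name) return 0.0;
    const std::size_t total = std::max(a.attrs.size(), b.attrs.size());
    if (total == 0) return 1.0;  // same name, no attributes to compare
    const std::size_t common = std::min(a.attrs.size(), b.attrs.size());
    std::size_t matched = 0;
    for (std::size_t k = 0; k < common; ++k) {
        if (a.attrs[k] == b.attrs[k]) ++matched;
    }
    return static_cast<double>(matched) / static_cast<double>(total);
}

// Highest matched weight of p against any predicate in others.
double best_weight(const Predicate& p, const std::vector<Predicate>& others) {
    double best = 0.0;
    for (const Predicate& o : others) {
        best = std::max(best, matched_weight(p, o));
    }
    return best;
}

// Assumed normalization: sum of best weights in both directions over n + m.
double similarity(const std::vector<Predicate>& a,
                  const std::vector<Predicate>& b) {
    if (a.empty() && b.empty()) return 1.0;
    double sum = 0.0;
    for (const Predicate& p : a) sum += best_weight(p, b);
    for (const Predicate& p : b) sum += best_weight(p, a);
    return sum / static_cast<double>(a.size() + b.size());
}

// Rank library components by similarity to the query, best first.
std::vector<std::pair<std::string, double>> retrieve(
    const std::vector<Predicate>& query,
    const std::vector<Component>& library) {
    std::vector<std::pair<std::string, double>> ranked;
    for (const Component& c : library) {
        ranked.push_back(std::make_pair(c.name, similarity(query, c.predicates)));
    }
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<std::string, double>& x,
                        const std::pair<std::string, double>& y) {
                         return x.second > y.second;
                     });
    return ranked;
}
```

For instance, a query containing `DATA_MEMBER(sum, #INT, #PROTECTED)` compared with a component containing `DATA_MEMBER(total, #INT, #PROTECTED)` would give a matched weight of 2/3 for that pair, since two of the three attributes agree.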
5 An Example
For each basic block of a program, there are annotated predicates, a pre-condition and a post-condition, that describe the semantic meaning of the basic block. The functionalities and behaviors of a software component can thus be formally described by a set of predicates. The annotated predicates make a software component easier to understand. Also, inference rules may be defined for consistency checking when the component is integrated into a designed system.
In this section, an example, a vending machine, is used to illustrate our reuse-based formal approach to software development. We show how the behaviors and functions of a software component are specified in predicates, how the requirement predicates may be specified to retrieve a target component, how a user may be guided to retrieve the component that best matches the requirements, and how the similarity is computed and used.
In Figure 1, a C++ class declaration and its annotated predicates are shown. This C++ class declaration defines the interface of an example vending machine. It should be noted that some design information that is scattered in the source code may be captured by the predicates. For example, //!DATA_MEMBER(change, #INT, #PROTECTED) describes that change is a data member of the class. It also describes the data type and access mode of change. In Figure 2, the predicates for a member function, press_button_A(), are shown.

void vending_machine::press_button_A()
{
  if (total>=60) {
    select_button='A';
    total-=60;
  }
}
//!Precondition:
//!MEMBER_FUNCTION(press_button_A, #VOID, #PUBLIC,
//!  #VOID):VIRTUAL(#0)
//!Postcondition:
//!CONDITION(total>=60);
//!ASSIGN(select_flag, 'A'));
//!ASSIGN(total, total-60)

Figure 2: A member function definition and its annotated predicates.

Assume that a user needs to develop a vending machine, which has the following requirements:
• The vending machine sells two drinks, drink A and drink B, which a buyer may select using a button on the panel.
• The vending machine only accepts quarters.
• The price for both drink A and drink B is 60 cents. After a buyer makes a selection, the change will be given to the buyer automatically.
In Figure 3, a predicate/transition net representation of the vending machine is shown.

Figure 3: A Predicate/Transition net representation of a vending machine. (Net labels: Quarter; Coin inserted; Not enough money; Enough money; total = 0 (no change); change dropped.)

The user then searches the library for a component that matches the requirements. The user may query the library either in keywords or in predicates. Suppose that the user makes a query in keywords first and selects the keyword CLASS from the keyword list that is provided by the browser in some facet. This will match the predicate CLASS, which has the following attributes:

CLASS(name [, inherited mode, inherited class name])

The guidance agent should now prompt the user for more information, such as class name, inherited mode, and inherited class name, which are all attributes of the predicate CLASS. Suppose now that the class name entered by the user is Vending Machine, which matches a library component with the same name. Since each component has a set of annotated predicates, the predicates will now be used for further guidance. The predicates of the library component, vending machine, are shown in Figure 1. The component, vending machine, has several data members and member functions. Each data member is annotated with a predicate of the form shown below:

DATA_MEMBER(name, #data type, #access mode)

For example, the data member, total, is annotated with the predicates:

//!DATA_MEMBER(total, #INTEGER, #PROTECTED)
//!ASSIGN(total, 0)

The first predicate means that total is an integer and is declared as a protected member, which is its access mode. The second predicate means that when an object is instantiated, total has an initial value of zero. Each member function is annotated with predicates of the form below:

MEMBER_FUNCTION(name, #return type, #access mode [, #VOID | [, parameter name, #data type]]):VIRTUAL([#0|#1])

For example, the predicates annotated to the member function insert_coin() are shown in Figure 4. The pre-condition says that the member function is declared as public and virtual, and that it takes a variable qt of type quarter* and returns nothing. The post-condition says that after insert_coin() is called, total is incremented by 25.

//!Precondition:
//!MEMBER_FUNCTION(insert_coin, #VOID, #PUBLIC,
//!  qt, # *quarter):VIRTUAL(#0)
//!Postcondition:
//!ASSIGN(total, total+25)
void vending_machine::insert_coin(quarter *qt)
{
  total+=25;
}

Figure 4: The member function insert_coin and its annotated predicates.

Since the design specification of the required vending machine is described in PrT net and is shown in Figure 3, the user will answer the questions prompted by the guidance agent according to this PrT net. At the end of the query, the required vending machine is specified as shown in Figure 5.

//!Precondition:
//!CLASS(vending_machine)
//!Postcondition:
//!DATA_MEMBER(sum, #INT, #PROTECTED)
//!ASSIGN(sum, 0)
//!DATA_MEMBER(change, #INT, #PROTECTED)
//!ASSIGN(change, 0)
//!DATA_MEMBER(button_flag, #INT, #PROTECTED)
//!ASSIGN(button_flag, 0)
//!DATA_MEMBER(select_button, #CHAR, #PROTECTED)
//!ASSIGN(select_button, ' ')
//!MEMBER_FUNCTION(calculate, #VOID, #PUBLIC,
//!  #VOID):VIRTUAL(#0)
//!MEMBER_FUNCTION(switch, #VOID, #PROTECTED,
//!  money, #INT):VIRTUAL(#0)
//!MEMBER_FUNCTION(coin_entry, #VOID, #PROTECTED,
//!  qt, # *quarter):VIRTUAL(#0)
//!MEMBER_FUNCTION(button_A, #VOID, #PUBLIC,
//!  #VOID):VIRTUAL(#0)
//!MEMBER_FUNCTION(button_B, #VOID, #PUBLIC,
//!  #VOID):VIRTUAL(#0)
//!Precondition:
//!MEMBER_FUNCTION(coin_entry, #INT, #PROTECTED,
//!  qt, # *quarter):VIRTUAL(#0)
//!Postcondition:
//!ASSIGN(sum, sum+25)
//!RETURN(sum)
......

Figure 5: The predicates of the target vending machine that the user specifies.

Now, we may match this set of predicates with components in the library and retrieve the best matched component based on the similarity measure presented in the previous section. A predicate may contain more than one attribute, and some attributes of two predicates must agree exactly to avoid causing a consistency problem. For example, a double and a char* are not compatible and thus cannot be converted in most cases. Such attributes are denoted with # in a predicate. Also, the order of attributes must be preserved when matching predicates. In our approach, besides similarity, two other important indications may also be pointed out:
• Over specification: some predicates of the requirements are not matched.
• Under specification: some predicates of the candidate component are not matched.
Both conditions mean that an adaptation phase must be performed. The unmatched predicates may provide valuable information for the user to locate the place where modification must be performed. In over specification, the user must add or change some code of the component to meet the requirements. In under specification, the component may be reused directly by removing part of the code. Suppose that there is a vending machine class in the library and its design specification, in PrT net, is shown in Figure 6. The predicates of this candidate vending machine are shown in Figure 1. In Figure 7, the results of the match between the two vending machines are shown.
6 Discussion
6.1 Inference
When a user makes a query, it is very likely that the pre-condition of the query matches the pre-condition of some components, and the post-condition of the query matches the post-condition of some other components, but that no component in the library matches both the pre-condition and the post-condition of the query. However, it is possible that the pre-condition and post-condition of the query can be satisfied by composing several components. A query Q can be satisfied by the composition of a list of components $C_0, C_1, \ldots, C_n$ if the following composition rule holds:
1. The pre-condition of Q is $q_i$ and the post-condition is $q_o$.
2. The pre-condition of $C_k$ is $C_{ki}$ and the post-condition is $C_{ko}$, where $k = 0 \ldots n$.
3. $C_{0i} = q_i$
4. $C_{no} = q_o$
5. For $k = 0 \ldots n-1$, $C_{ko} = C_{(k+1)i}$
Thus, we can compose a new component to satisfy a query with the help of an inference engine. If a library is classified using only keywords, such composition is not possible. In the semantic network approach, such inference must be encoded in the semantic network, which makes the network extremely complex.

Figure 6: A Predicate/Transition net representation of a vending machine class in the library. (Net labels: Coin inserted; Enough money; Coin return; total = 0 (no change); total > 0, Calculate change; change dropped; total = 0.)

Figure 7: Comparison between the predicates of two vending machines.

6.2 Similarity Measure
The similarity defined in the previous section assumes that all predicates have the same weight. But it is possible that the predicates have different weights. Such weights may be manually assigned, such as conceptual distance in the faceted scheme and semantic network, or may be obtained using a statistical method, like frequency count in information retrieval. However, when different predicates are combined into a new predicate, some rules are needed to compute the weight of the new predicate. Such rules will be meaningful only if an extensive domain analysis is performed.
7 Conclusion
We have presented an approach to reuse-based software development using formal methods. In our approach, each software component is annotated with a set of predicates to formally describe the component. The components are classified by the predicates using a faceted scheme. A user may retrieve components from the library using either keywords or predicates. When a component is retrieved, it is integrated with the designed system as a Predicate/Transition net (PrT net) to perform consistency checking. We also presented an example using a vending machine. From the example, it can be seen that a formal method may describe the semantics of a component precisely and thus the accuracy of retrieval is very high. Also, our approach allows a guidance agent to be designed to assist a user in finding the best component that meets the requirements.
References
[1] C.T. Chang and C.S. Liu, "A Hybrid Approach to Object Library Classification and Retrieval", Proc IEEE Computer Software and Applications, COMPSAC-95, 1995, 278-283.
[2] W.C. Chu and H. Yang, "A Formal Method to Software Integration in Reuse", Proc IEEE Computer Software and Applications, COMPSAC-96, 1996, 343-348.
[3] R. Prieto-Diaz, "Domain Analysis for Reusability", Proc IEEE Computer Software and Applications, COMPSAC-87, 1987, 23-29.
[4] R. Prieto-Diaz, "Implementing Faceted Classification for Software Reuse", Communications of the ACM, Vol 34, No 5, May 1991, 89-97.
[5] W. Frakes and B. Nejmeh, "An Information System for Software Reuse", Proc of 10th Minnowbrook Workshop on Software Reuse, 1987, 142-151.
[6] Y. S. Maarek, D. M. Berry, and G. E. Kaiser, "An Information Retrieval Approach for Automatically Constructing Software Libraries", IEEE Transactions on Software Engineering, Vol 17, No 8, Aug 1991, 800-813.
[7] E. Ostertag, J. Hendler, R. Prieto-Diaz, and C. Braun, "Computing Similarity in a Reuse Library System: An AI-Based Approach", ACM Transactions on Software Engineering and Methodology, Vol 1, No 3, July 1992, 205-228.