
Retrieving Software Components Using Directed Replaceability Distance

Hironori Washizaki and Yoshiaki Fukazawa

Department of Information and Computer Science, Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
{washi, fukazawa}@fuka.info.waseda.ac.jp

Abstract. In component-based software development, a mechanism for retrieving the software components that best satisfy a user’s query is indispensable. However, conventional retrieval methods cannot evaluate the total characteristics of a component, because they either consider only a single aspect of the component or require an additional description. In this paper, we propose a new distance metric, “directed replaceability distance” (DRD), which represents in detail how different two components are from the viewpoint of structure, behavior, and granularity. We have developed a retrieval system which uses DRD as a measure of the difference between a user’s query prototype component and the components stored in a repository. We outline the concept of DRD and the usefulness of our retrieval system.

1 Introduction

Recently, software component technology, in which software is developed by combining components, has attracted attention because it can reduce development cost[1]. A software component is a unit of composition with contractually specified interfaces; it provides a certain function and can be independently exchanged. In a narrow sense, a software component is defined as one that is distributed in the form of object code (binary code) without source code. Under this definition, the internal structure of the software component is not available to the public. In this paper, “software component” is used in this narrow sense. Since it is natural to model and implement components in an object-oriented (OO) paradigm/language[1], we limit this study to components implemented in an OO language.

Researchers have noted that a technique for retrieving and selecting a component which satisfies a requirement specification has not yet been established[2]. Since software can be distributed over the Internet, the reuse of components over the Internet is emerging[2]. To enable such reuse, a component repository and a retrieval mechanism that appropriately supports the retrieval of components from the repository are necessary.

The important characteristics of components are the following[3]:

– Structure: the internal participants and how they collaborate
– Behavior: stateless behavior and behavior which relates to the states
– Granularity: the component size and the classification
– Encapsulation: the extent to which design/implementation decisions are hidden
– Nature: main stage used in the development process
– Accessibility to Source Code: the modifiability of the component

J. Bosch (Ed.): GCSE 2001, LNCS 2186, pp. 153–162, 2001. © Springer-Verlag Berlin Heidelberg 2001

We aim to reuse components that are distributed mainly in the form of object code and that are used at the implementation stage. Therefore, in designing the retrieval mechanism, it is not necessary to consider the two characteristics “nature” and “accessibility to source code”. “Encapsulation” is important because it is directly related to the testability of the component. However, users generally retrieve a component on the basis of its functionality, and the encapsulation of a component can be verified after retrieval. Therefore, “structure”, “behavior”, and “granularity” can be considered the important characteristics of a component in terms of retrieval.

2 Component Retrieval

Conventional retrieval approaches for software components in the wide sense can be classified into four groups: the automatic extraction approach (text-based approach), the specification-based approach, the similarity distance-based approach, and the type-based approach.

The automatic extraction approach is based on the automatic extraction of structural information from components[4]. The user’s queries are expressed as keywords corresponding to the names of interfaces, components, and so forth. This approach is effective when all source code of the components is available. However, when the source code is not available, as assumed in this paper, the extracted information is insufficient for retrieval[5].

The semi-formal specification-based approach is based on catalog information of the components[2]. The user’s queries are given as keywords which correspond to a specification in the catalog. In addition, the formal specification-based approach, which uses a semantic description of the component’s behavior, has been proposed[6]. In general, the description uses first-order predicate logic, so this approach guarantees precise adaptability to the user’s query. However, the preparation costs of both approaches become very large because additional descriptions are necessary.

The similarity distance-based approach is based on the similarity between a user’s query and the components stored in the repository. There are two major similarity evaluation methods: one using the class inheritance relation in the OO language[7] and one using the similarity of element names (names of interfaces, etc.)[5]. The user’s queries are given as a prototype of the component which satisfies the user’s requirement.

The type-based approach is based on the component type and the component interface type[8]. The user’s queries are given as type information expected to realize the user’s requirement.
Search results are classified according to adaptability, for example, exact match and generalized match, but a more detailed ranking within each match set cannot be obtained.

These conventional approaches consider only a single characteristic of the component when retrieving. Therefore, they cannot evaluate the total semantic adaptability of the component[2]. The retrieval mechanism should be able to consider two or more characteristics simultaneously when retrieving. In addition, not all components circulated over the Internet have additional specification descriptions[5]. Therefore, the retrieval mechanism should not require any information other than the components themselves.

3 Directed Replaceability Distance

We propose directed replaceability distance (DRD) as a distance metric that represents semantically the degree of difference between two components. Consider a situation in which a component $c_q$ is in use and is replaced with another component $c_s$ while the system requirements stay the same: the parts where $c_q$ is used must be modified. $DRD(c_q, c_s)$ indicates the adaptation cost necessary in such a situation. Here, regarding the surroundings of $c_q$, it is assumed that all interfaces of $c_q$ are uniformly used.

First, we define three primitive distances: the structural distance $DRD_S$, the behavioral distance $DRD_B$, and the granularity distance $DRD_G$. These primitive distances are normalized between 0 and 1, and correspond to the respective characteristics of the component under consideration: the structure, the behavior, and the granularity. Then, DRD is defined as a combination of $DRD_S$, $DRD_B$, and $DRD_G$ based on the dynamically weighted linear combination model[9]. With this combination model, the degree of consideration given to each of the three characteristics can be changed according to the user’s attention by changing the assignment of weight values. $DRD(c_q, c_s)$ is defined as follows.

$$DRD(c_q, c_s) ::= w_1\,DRD_S(c_q, c_s) + w_2\,DRD_B(c_q, c_s) + w_3\,DRD_G(c_q, c_s), \qquad \sum_{i=1}^{3} w_i = 1, \quad w_i \ge 0$$

The structural distance $DRD_S$ reflects the difference between the components’ names and the difference between the components’ interface structures (signatures). For example, consider the three interfaces $I_1$ to $I_3$ shown in Fig. 1, and assume that $I_1$ is in use. $I_2$ differs from $I_1$ in its argument type. However, as numeric types, int has a wider range than short. Therefore, the adaptation work necessary for replacing $I_1$ with $I_2$ is expected to consist almost entirely of adding a narrowing type cast at the parts where $I_1$ is used.
On the other hand, compared with $I_1$, $I_3$ is markedly different in terms of both the name and the argument type. Therefore, the adaptation cost when replacing $I_1$ with $I_2$ is smaller than that when replacing it with $I_3$. $DRD_S$ can be calculated from the sets of such interface structural differences which the components have before and after replacement.

structural example
  I1 : void setData(int aData)
  I2 : void setData(short aData)
  I3 : void setBar(String aData)

behavioral example
  I1 : { data = aData; }
  I2 : { data = new Integer(aData); }
  I3 : { }

Fig. 1. Examples of interfaces with different structures/behaviors

The behavioral distance $DRD_B$ reflects the difference between the components’ interface execution results and the difference between the types of the components’ readable properties whose value changes can be observed. ActiveX[11] and JavaBeans[12] are component systems that support the readable/writable property mechanism using the IDL definition or naming conventions. For example, consider the three interfaces $I_1$ to $I_3$ shown in Fig. 1, and assume that $I_1$ is in use. It is also assumed that these three interfaces have the common interface declaration of $I_1$, and that data is a property readable through the introspection mechanism provided by the target component system. When $I_1$ is invoked, a value change of the property whose type is int is observed. In the case of $I_2$, the type of the changed property is Integer. Therefore, $I_1$ and $I_2$ are considered to be similar in terms of behavior, because the types int and Integer are very similar. However, the invocation of $I_3$ does not bring about any change in the readable properties. Therefore, $I_3$ is significantly different from $I_1$ in terms of behavior. $DRD_B$ can be calculated from the sets of interface behavioral differences which the components have before and after replacement.

The granularity distance $DRD_G$ reflects the difference between component sizes and the difference between component interface execution times. For example, consider three components: Bar (component size: 10 kbytes, total interface execution time: 10 msec), Car (15 kbytes, 20 msec), and Dar (100 kbytes, 150 msec). Bar is assumed to be in use. Bar and Car are similar in terms of component size and total interface execution time. On the other hand, the values for Dar are large compared with those for Bar. Therefore, if the resource constraint is severe, replacing Bar with Dar is more difficult than replacing it with Car. $DRD_G$ can be calculated from the component granularity difference and the set of interface granularity differences.

In the following, only the structural distance is described precisely.
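The weighted linear combination that defines DRD can be illustrated with a minimal Python sketch. It assumes the three primitive distances have already been computed and normalized to [0, 1]; the function name `drd` and the example weights and distance values are hypothetical and are not taken from the paper.

```python
import math


def drd(weights, d_s, d_b, d_g):
    """Combine the structural, behavioral and granularity distances.

    weights is a triple (w1, w2, w3) with w_i >= 0 and w1 + w2 + w3 = 1.
    """
    if len(weights) != 3:
        raise ValueError("exactly three weights are required")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    for d in (d_s, d_b, d_g):
        if not 0.0 <= d <= 1.0:
            raise ValueError("primitive distances must lie in [0, 1]")
    w1, w2, w3 = weights
    return w1 * d_s + w2 * d_b + w3 * d_g


# Hypothetical values: a user who cares most about structure.
example = drd((0.5, 0.3, 0.2), d_s=0.2, d_b=0.4, d_g=0.1)
```

Shifting weight from one characteristic to another changes the ranking emphasis without recomputing the primitive distances.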
The structural distance is defined as follows, using the word distance $dw$ for the names of components and the element set distance $ds$ for the sets of interface structures. Here, $C_S$ is the structure of a component and $I_S$ is the interface structure.

$$C_S ::= \{\mathit{name} : String,\ \mathit{interfaces} : \{i_1 : I_S, \ldots, i_n : I_S\}\}$$

$$c_q, c_s : C_S, \qquad c_q = \{\mathit{name} = n_q,\ \mathit{interfaces} = is_q\}, \qquad c_s = \{\mathit{name} = n_s,\ \mathit{interfaces} = is_s\}$$

$$DRD_S(c_q, c_s) ::= \frac{dw(n_q, n_s) + 2\,ds(is_q, is_s)}{3}$$

The word distance $dw(w_q, w_s)$ between words $w_q$ and $w_s$ is defined as follows, using the longest common substring $w_p$. Here, $|w|$ is the length of the word $w$.

$$dw(w_q, w_s) ::= \begin{cases} \dfrac{|w_s|\,(|w_q| + |w_s| - 2|w_p|)}{(|w_s| + 1)\,(|w_q| + |w_s| + 2|w_s||w_p|)} & \\[2ex] 1 & (\text{if } w_p \text{ does not exist}) \end{cases}$$
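As a minimal sketch of the word distance and of how it enters the structural distance, the following Python code evaluates dw with the fraction |ws|(|wq|+|ws|−2|wp|) over (|ws|+1)(|wq|+|ws|+2|ws||wp|). It assumes that "wp does not exist" means the longest common substring has length zero; the function names are hypothetical, and the element set distance is passed in as a precomputed number.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def word_distance(wq: str, ws: str) -> float:
    """dw(wq, ws); directed, so swapping the arguments may change the value."""
    p = longest_common_substring_length(wq, ws)
    if p == 0:
        return 1.0  # no common substring
    q, s = len(wq), len(ws)
    return (s * (q + s - 2 * p)) / ((s + 1) * (q + s + 2 * s * p))


def structural_distance(name_q: str, name_s: str, ds_interfaces: float) -> float:
    """DRD_S from the component names and a precomputed ds value."""
    return (word_distance(name_q, name_s) + 2.0 * ds_interfaces) / 3.0


# Names taken from Fig. 1; identical names give distance 0.
same = word_distance("setData", "setData")
renamed = word_distance("setData", "setBar")
```

Because |ws| appears on its own in both numerator and denominator, the measure is not symmetric, which matches the "directed" nature of DRD.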

The element set distance $ds(R_q, R_s)$ between two element sets, $R_q$ before replacement and $R_s$ after replacement, is defined as follows. Here, $|R|$ is the number of elements of $R$. First, $f_1$ selects the mapping between $R_q$ and $R_s$ with the minimum total distance $dx(q, s)$, chosen according to the types of $q$ and $s$, over all pairs $\langle q, s \rangle$. Second, in the case that $|R_q| > |R_s|$, $f_2$ creates ordered pairs $\langle q', root \rangle$ with the root, according to the type of $q'$, for all elements $q'$ remaining in $R_q$ after calculating $f_1$. Conversely, in the case that $|R_q| < |R_s|$, $f_3$ creates ordered pairs $\langle root, s' \rangle$ with the root, according to the type of $s'$, for all elements $s'$ remaining in $R_s$ after calculating $f_1$. Finally, $ds$ sums $f_1$ through $f_3$ and divides the total by the larger of $|R_q|$ and $|R_s|$.

A situation in which the number of targets is greater than the number of queries is more desirable than one in which the number of targets is less than the number of queries. The definition of $ds$ reflects this preference, because $dx(q, root) > dx(root, q)$ always holds. The distance $dx(q, s)$ is $dis(q, s)$ or $dt(q, s)$, according to whether the type of $q$ and $s$ is the interface structure ($I_S$) or the normal type ($t$).

$$R_q = \{q_1 : x, \ldots, q_m : x\}, \qquad R_s = \{s_1 : x, \ldots, s_n : x\}$$

$$f_1(R_q, R_s) ::= \min \sum_{\langle q, s \rangle \in [R_q, R_s]} dx(q, s)$$

$$f_2(R_q, R_s) ::= \sum_{q' \in R_q - (R_q \cap R_s)} dx(q', root) \qquad (|R_q| > |R_s|)$$

$$f_3(R_q, R_s) ::= \sum_{s' \in R_s - (R_q \cap R_s)} dx(root, s') \qquad (|R_q| < |R_s|)$$

$$ds(R_q, R_s) ::= \frac{f_1(R_q, R_s) + f_2(R_q, R_s) + f_3(R_q, R_s)}{\max(|R_q|, |R_s|)}$$

$$root = \begin{cases} \{\mathit{name} = null,\ \mathit{signature} = \{\} \rightarrow root\} & (x = I_S) \\ root & (x = t) \end{cases}$$
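The three steps of the element set distance can be sketched in Python as follows. This is only an illustration under stated assumptions: the minimum-cost mapping of f1 is found by brute force over permutations (the paper does not say how the mapping is computed), two empty sets are assumed to have distance 0, and `ROOT`, `element_set_distance`, and `toy_dx` are hypothetical names. The toy distance is chosen only so that dx(q, root) > dx(root, q).

```python
from itertools import permutations

ROOT = object()  # hypothetical sentinel standing in for the root element


def element_set_distance(rq, rs, dx, root=ROOT):
    """Sketch of ds(Rq, Rs): best mapping (f1) plus root pairs (f2 or f3)."""
    m, n = len(rq), len(rs)
    if m == 0 and n == 0:
        return 0.0  # assumption: two empty sets are treated as identical

    k = min(m, n)
    best_total, best_used = None, ()
    # f1: try every way of mapping the smaller set into the larger one.
    for chosen in permutations(range(max(m, n)), k):
        if m >= n:  # chosen holds indices into rq, one per target
            total = sum(dx(rq[qi], rs[si]) for si, qi in enumerate(chosen))
        else:       # chosen holds indices into rs, one per query
            total = sum(dx(rq[qi], rs[si]) for qi, si in enumerate(chosen))
        if best_total is None or total < best_total:
            best_total, best_used = total, chosen

    used = set(best_used)
    if m > n:    # f2: leftover queries are paired with the root
        rest = sum(dx(rq[i], root) for i in range(m) if i not in used)
    elif m < n:  # f3: leftover targets are paired with the root
        rest = sum(dx(root, rs[j]) for j in range(n) if j not in used)
    else:
        rest = 0.0
    return (best_total + rest) / max(m, n)


def toy_dx(q, s):
    """Hypothetical element distance used only for this example."""
    if s is ROOT:
        return 1.0   # a query element with no counterpart
    if q is ROOT:
        return 0.5   # an extra target element
    return 0.0 if q == s else 1.0


fewer_targets = element_set_distance(["a", "b", "c"], ["a"], toy_dx)
more_targets = element_set_distance(["a"], ["a", "b", "c"], toy_dx)
```

With this toy distance, `more_targets` is smaller than `fewer_targets`, mirroring the stated preference for targets that offer more elements than the query. For larger sets, an assignment algorithm such as the Hungarian method would be a more practical substitute for the brute-force search.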

The interface structure $I_S$ is composed of the name of the interface and the functional type of the signature. The interface structural distance $dis(i_q, i_s)$ between interface structures $i_q$ and $i_s$ is defined as follows, using the word distance $dw$ between the names of the interfaces and the functional distance $df$ between the signatures of the interfaces.

$$I_S ::= \{\mathit{name} : String,\ \mathit{signature} : F\}$$

$$i_q, i_s : I_S, \qquad i_q = \{\mathit{name} = n_q,\ \mathit{signature} = sig_q\}, \qquad i_s = \{\mathit{name} = n_s,\ \mathit{signature} = sig_s\}$$

$$dis(i_q, i_s) ::= \frac{dw(n_q, n_s) + 2\,df(sig_q, sig_s)}{3}$$

The functional distance $df(f_q, f_s)$ between functional types $f_q$ and $f_s$ is defined as follows, using the element set distance $ds$ for the arguments and the normal type distance $dt$ for the return value. Here, $F$ is the functional type.

$$F ::= \{\mathit{params} : \{t_1 : t, \ldots, t_n : t\} \rightarrow \mathit{return} : t\}$$

$$f_q, f_s : F, \qquad f_q = \{\mathit{params} = p_q \rightarrow \mathit{return} = r_q\}, \qquad f_s = \{\mathit{params} = p_s \rightarrow \mathit{return} = r_s\}$$

$$df(f_q, f_s) ::= \frac{ds(p_s, p_q) + dt(r_q, r_s)}{2}$$

The value type (int, ...), the object type (Object, ...), and the value-wrapper type (Integer, ...) are enumerated as the normal types. Using the object-oriented type system[10], these types form a single Is-a graph with root at the top. We use the subclass relation as the subtyping relation of the object type.
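The way the interface structural distance is assembled from the word, functional, element set, and normal type distances can be sketched in Python as follows. The component distances are injected as callables; the stub distances and their numeric values are hypothetical placeholders, not the paper's definitions. Note that the parameter sets are passed to ds in the order (ps, pq), as written in the definition of df.

```python
def functional_distance(fq, fs, ds, dt):
    """df(fq, fs) for signatures given as (params, return_type) pairs."""
    params_q, return_q = fq
    params_s, return_s = fs
    # Parameters are compared in the order (ps, pq), the return value as (rq, rs).
    return (ds(params_s, params_q) + dt(return_q, return_s)) / 2.0


def interface_structural_distance(iq, i_s, dw, ds, dt):
    """dis(iq, is) for interfaces given as (name, signature) pairs."""
    name_q, sig_q = iq
    name_s, sig_s = i_s
    return (dw(name_q, name_s) + 2.0 * functional_distance(sig_q, sig_s, ds, dt)) / 3.0


# Hypothetical stub distances, used only to exercise the composition.
def stub_dw(a, b):
    return 0.0 if a == b else 1.0


def stub_ds(a, b):
    return 0.0 if a == b else 0.5


def stub_dt(a, b):
    return 0.0 if a == b else 0.25


# Interfaces from Fig. 1, encoded as (name, (parameter types, return type)).
I1 = ("setData", (("int",), "void"))
I2 = ("setData", (("short",), "void"))
I3 = ("setBar", (("String",), "void"))

d12 = interface_structural_distance(I1, I2, stub_dw, stub_ds, stub_dt)
d13 = interface_structural_distance(I1, I3, stub_dw, stub_ds, stub_dt)
```

Even with these crude stubs, replacing I1 with I2 scores lower than replacing it with I3, because only I3 also changes the interface name.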


Since the value-wrapper types have primitive values as the value types, we use the subset subtyping of those primitive values as the subtyping relation of the value-wrapper type. Fig. 2 shows a standard Is-a graph in the Java language.

[Figure: Is-a graph. Node labels: void, Object, long, Void, Date, int, Long, Time, short, Integer. Group labels: value type, value-wrapper type, object type. Depth labels: l(x)=1, l(x)=2. Legend: supertype, subtype.]

Fig. 2. Standard Is-a hierarchical graph

The subtyping relation is described as subtype
