2011 11th IEEE International Conference on Computer and Information Technology

Towards a Data Complexity Metric Set for Web Service Composition

Chengying Mao
1. School of Software and Communication Engineering, Jiangxi University of Finance and Economics, 330013 Nanchang, China
2. The State Key Laboratory of Software Engineering, Wuhan University, 430072 Wuhan, China
Email: [email protected]

Abstract: Web services technology and the corresponding software have been widely adopted in recent years. However, some new features of Web service-based software, such as heterogeneity and loose coupling, make its later maintenance and comprehension difficult. Complexity analysis of such systems helps to address this problem. At present, existing research mainly concerns complexity metrics for control flow. In this paper, we give a data complexity metric set as an effective complement. Data complexity can be measured from two perspectives: data traffic and data dependency. For the first, the volume of data flow is scaled by analyzing each service’s parameters and their data types. The second is implemented by analyzing the definition and use of variables in the BPEL program dependence graph. Based on the def-use pairs, metric subsets about degree, def-use chain, and entropy are addressed. The proposed metric set supports a fuller understanding of Web service-based systems. In addition, it can facilitate performance or defect analysis for this type of system.

Keywords: Web service composition; data complexity; data dependency; measurement; entropy

I. INTRODUCTION

In recent years, software as a service (SaaS) has become a new paradigm and has been widely accepted in both academia and industry. Guided by this philosophy, service-oriented architecture (SOA) [1] was proposed. In this software development model, the Web service [2] is becoming more and more widespread as an emerging technology. Although the technology can greatly facilitate the construction of large-scale software systems, quite a few problems of later maintenance have not been solved yet.

In the early stage of Web services technology, most research was concerned with the single Web service unit and the corresponding specifications for service description, discovery, and invocation. However, the function of a single Web service is limited. Real-world software development requires us to compose Web services to realize complex business logic. In general, Web service compositions (WSCs) [3] are distributed applications whose static structure and dynamic behavior differ in many respects from those of traditional software. For this reason, it is necessary to design a new complexity metric set for Web service composition.

Software measurement is an important sub-field of software engineering and plays a critical role in all software activities. In fact, it provides quantitative measures of the development, operation, and maintenance of software, supplying meaningful and timely management information. Since the internal structure of a service unit is not visible to the developers or maintainers of Web service-based software, it is very hard to perform effective measurement on Web service compositions [4]. To date, a systematic metrics framework for WSCs has not been constructed, and the related research is still at an embryonic stage.

Existing research on complexity metrics of WSCs mainly focuses on the control complexity of the composite business logic. The work can be divided into three parts. The first measures the control complexity of the source code of a single Web service unit. The second models the WSC’s business logic with a formal modeling language such as BPEL [5], BPMN, or Petri nets, and then measures the structural complexity of these formal models by adapting traditional metrics such as LOC, McCabe, and Halstead values [6]. The last adopts cognitive informatics to understand and measure the fundamental characteristics of WSCs. In our previous work, we also proposed some metrics for the control structure of WSCs with respect to two graphical description models [7,8], i.e., the service control flow graph (SCFG) and the Petri net. In this paper, we attempt to propose a data complexity metric framework for WSCs from two perspectives: data traffic and data dependency.

The paper is structured as follows. In the next section, we briefly introduce Web services and composition techniques. Then, some data traffic-based metrics for Web service compositions are addressed in Section 3. In Section 4, three data dependency-based metric subsets are proposed by analyzing the def-use relation in the BPEL program of a WSC. Finally, concluding remarks and future work are given in Section 5.

978-0-7695-4388-8/11 $26.00 © 2011 IEEE DOI 10.1109/CIT.2011.34

II. BACKGROUND

A. Web Service Composition

Different from the traditional software paradigm, service-oriented architecture provides a novel way to construct new systems or integrate legacy applications. In this architecture, software is not an entity installed on the customer’s computer but is treated as a service that is remotely invoked on demand. Hence, customers can freely choose services provided by different software suppliers to compose software. Nowadays, Web services technology has become the most representative approach to constructing software systems with a service-oriented architecture.

The W3C defines a Web service as “a software system designed to support interoperable machine-to-machine interaction over a network” [2]. To interact effectively, Web services should be well-encapsulated components. Web services can therefore be viewed as Web application programming interfaces (APIs) that are accessed over a network such as the Internet and executed on a remote system hosting the requested services. In general, this technology involves three kinds of stakeholders: the service provider, the registration center, and the service requestor. Service providers are responsible for developing services and publishing them to the registration center. In general, the service code runs on the provider’s Web server, and only the interface information is stored in the registration center. The service requestor uses the standard interface protocol to invoke a service by passing the specific context parameters and gathering the return value.

As mentioned above, software users choose Web services according to their functional requirements and then compose them to realize a concrete software system. The corresponding assembly technology is called Web services composition, and the implemented system is a Web service-based system. As a loosely coupled application, the business workflow of a Web service-based system is usually modeled with process languages such as WS-BPEL, BPML, and OWL-S. In these business specifications, Web service units are composed via specific logic such as sequence, branch, and parallel. In a Web service-based system, service interface information is usually written in the Web service description language (WSDL) [9], an XML format standard drawn up by the W3C. The data passed from one Web service to another is encapsulated in accordance with the simple object access protocol (SOAP) [10]. Since data passing is the most important interaction between two Web services, analyzing the data complexity of a Web services composition is valuable for understanding this new style of software system.

B. Related Work

In recent years, the complexity analysis of Web services has attracted a lot of attention and extensive discussion. For instance, J. Cardoso et al. and V. Gruhn et al. have surveyed several contributions for measuring business process models [11,12]. Here, we only briefly address the existing methods closely related to our work.

In general, the business logic of a Web service-based system is represented by business process modeling languages such as BPEL and OWL-S. It should not be neglected that business process code has special constructs such as flow and pick. Therefore, it is necessary to design new metrics for these process languages. In recent years, quite a few structural complexity metrics have been proposed for measuring the control flow of business processes in WSCs. For reasons of space, we do not explore this aspect in detail and instead concentrate on the complexity of data flow.

Data flow is another information flow in a software system, so data flow complexity is also an important issue worth considering. For a Web service-based system, data flow complexity is a key factor in the overall complexity measurement framework. In Web service composition, the data-flow complexity of a process increases with the complexity of its data structures, the number of formal parameters of activities, and the mappings between activities’ data [13]. In this respect, J. Cardoso classified data-flow complexity into three sub-metrics: data complexity, interface complexity, and interface integration complexity [14], and defined the concept of interface entropy (IE) to measure interface complexity for Web services. Here, we argue that data interaction is the most important factor from the global perspective of service composition. Therefore, we mainly consider data interaction complexity and use two novel methods to scale it.

On the other hand, XML is the fundamental technology for integrating diverse services, processes, and applications into a Web service-based system. Accordingly, the complexity of XML files can be measured to infer the data complexity of Web services. In reference [15], D. Basic and S. Misra presented a metric to assess the quality of Web services in terms of maintainability. In their measure, the operations and the parameters of each operation in the WSDL file are used as the important measurement items, and metrics such as arguments per operation (APO), operations per service (OPS), and data weight of WSDL (i.e., DW(wsdl)) have been proposed. Although our data complexity measure also uses the WSDL file as the reference object, it mainly analyzes the data transfer at service call sites in the BPEL program. In addition, Basic and Misra’s method is mainly used to evaluate the complexity of a single Web service, while our approach addresses the measurement of data interactions between Web services.

III. DATA TRAFFIC-BASED METRIC

In Web service-based software, Web services must communicate with each other. As shown in Figure 1, the data interchanged between them is unified by a standard protocol, i.e., the simple object access protocol (SOAP) [10]. In this protocol, a message is usually encased in an element named envelope, and an envelope includes a message header and a message body. The SOAP header is an optional element used for application-specific processing. The SOAP body element carries the content of the message and contains one or more child elements, i.e., parts. From the perspective of message composition, the elements of a message can be divided into two types: basic (primitive) type elements and complex type elements. The basic type refers to a primitive data type in the meta-model of XML schema, such as integer, float, and string. The complex type can be viewed as a composition of basic type elements, e.g., an array. Besides the parameter data, a SOAP message contains other data used for controlling or checking. However, our study of data complexity measurement focuses only on the data corresponding to the interface parameters of service units.
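The distinction between basic and complex elements can be made concrete with a minimal Python sketch. It counts the basic-type leaves of a nested message, which is the quantity that formulas (1) to (3) of this section call DTC, and then aggregates over invocations as in formulas (4) and (5). The nested dict/list encoding of a message, the function names, and the example parameters are assumptions made for illustration; they are not taken from the paper or from any SOAP library.

```python
from collections.abc import Mapping, Sequence


def dtc_element(elem):
    """Count the basic-type leaves of one message element.

    Assumed encoding: a dict or list stands for a complex element,
    anything else (str, int, float, bool, ...) is a basic element.
    """
    if isinstance(elem, (str, bytes)):      # strings are basic, not sequences
        return 1
    if isinstance(elem, Mapping):
        return sum(dtc_element(part) for part in elem.values())
    if isinstance(elem, Sequence):
        return sum(dtc_element(part) for part in elem)
    return 1


def dtc_invocation(elements):
    """DTC of one invocation: sum over all transferred elements."""
    return sum(dtc_element(e) for e in elements)


def dtc_wsc(invocations):
    """DTC of a whole composition: sum over all invocations."""
    return sum(dtc_invocation(inv) for inv in invocations)


def dtc_standard(invocations, n_ws):
    """Standard DTC: total DTC averaged over the n_ws*(n_ws-1)/2 service pairs.

    n_ws is the number of services, counting the composition itself.
    """
    if n_ws < 2:
        raise ValueError("at least two services are needed")
    return 2 * dtc_wsc(invocations) / (n_ws * (n_ws - 1))


# Hypothetical invocation: a two-field request plus a boolean return value.
register = [{"user": "alice", "password": "secret"}, True]
lookup = ["x", [1, 2]]
print(dtc_invocation(register))                 # 3
print(dtc_standard([register, lookup], n_ws=3))  # 2.0
```

Counting leaves rather than bytes keeps the measure independent of the concrete values, which matches the section's focus on parameter types instead of payload size.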

Figure 1. The sketch of data interaction between two Web services

First, we consider the data interaction complexity in a service request-response pair, i.e., a single invocation of a Web service. We argue that the more basic data is transferred between two Web services, the more complex the data interaction between them. Hence, the data transferring complexity (DTC) of a single service invocation can be expressed as follows:

$$DTC(inv) = \sum DTC_{complex}(elem) + \sum DTC_{basic}(elem) \qquad (1)$$

If a message element belongs to a basic data type, its data complexity is 1, that is,

$$DTC_{basic}(elem) = 1, \text{ if } elem \text{ is basic type data} \qquad (2)$$

In contrast, if the element belongs to a complex data type, its data complexity is the sum of the complexities of its parts. Here, each part should be a value of a basic data type; otherwise, the decomposition continues.

$$DTC_{complex}(elem) = \sum DTC_{basic}(elem.part) \qquad (3)$$

where $elem.part$ is basic type data. Under this measurement method, the data interaction complexity of the registration service invocation is 3. Based on the above metric, the data transferring complexity of the whole Web service composition can be measured as follows:

$$DTC(WSC) = \sum_{i} DTC(inv_i) \qquad (4)$$

where $inv_i$ refers to the $i$-th service invocation in the WSC. Note that there may be several invocations between two Web services. In this case, the transferring complexities of all of them are added to $DTC(WSC)$. Obviously, $DTC(WSC)$ is the sum of the service invocation complexities in the whole system, so a WSC with more service units tends to have a higher DTC value. However, sometimes we pay more attention to the average amount of data flowing between two Web services. This measurement is named standard data transferring complexity and can be computed as follows:

$$DTC_s(WSC) = \frac{2 \times \sum_{i} DTC(inv_i)}{N_{ws} \times (N_{ws} - 1)} \qquad (5)$$

where $N_{ws}$ represents the number of services in the whole Web service-based system, including the system itself since it is the biggest composed service.

IV. DATA DEPENDENCY-BASED METRIC

The above metric reflects the amount of data flowing between Web services, so it can be used to predict the performance of a Web service-based system. However, it cannot indicate the intensity of data dependence between two Web services. For example, in a system design blueprint, Web service A sends a large amount of data to another Web service B, but B uses only a small part of this message in its code. From the perspective of data transfer, these two Web services have tight data coupling, yet they have only loose data dependency. Consequently, the data traffic-based metric cannot reflect the real interaction in some situations. Here, we discuss some new metrics from the perspective of data dependency.

To simplify the discussion, we assume that the workflow of a WSC is described in the business process execution language (BPEL). The common data dependence [16] of variable definition and use also exists in BPEL programs. However, the variables in a BPEL program have special features such as data composition, so some extended analysis is necessary.

Definition 1 (Data Definition / Use). Let $v$ be a variable in a program. The assignment of a value to variable $v$ in statement $n_d$ is identified by a tuple $def(v, n_d)$. Meanwhile, the use of $v$ is identified by a tuple $use(v, n_u)$, where the value of $v$ is referenced in statement $n_u$ without being modified.

For BPEL programs, variable definition mainly occurs in the following cases: (1) a receiving activity, (2) the outputVariable clause of an invoking activity, and (3) in the element of an assignment activity. In general, the use of a variable can be divided into calculation use (c-use) and predicate use (p-use). In BPEL programs, c-use mainly appears in (1) a reply activity, (2) the inputVariable clause of an invoking activity, and (3) in the element of an assignment activity. The p-use of a variable usually appears in a predicate statement and can be found in the switch condition of a case or while statement, or in the transitionCondition of a state transition.

Definition 2 (Def-Use Pair). A definition-use (def-use for short) pair is an ordered pair $(n_d, n_u)$, where statement $n_d$ contains a definition of a variable $v$ that is used in statement $n_u$ of the program.

Definition 3 (Data Dependence). Suppose $(n_d, n_u)$ is a def-use pair w.r.t. variable $v$. Node $n_u$ is data dependent on node $n_d$ if there is no redefinition of $v$ on the path from $n_d$ to $n_u$. Accordingly, the path from $n_d$ to $n_u$ is called a clear def-use path w.r.t. variable $v$.

In our previous work [7], we performed control dependency analysis on BPEL programs and defined the concept of the BPEL program dependence graph (BPDG). Similar to the traditional program dependence graph, the BPDG reflects the control and data dependence relations between activity nodes in a BPEL program. There are four types of edges in a BPDG: control dependence, data dependence, synchronized dependence, and parameter/call mapping edges. In this paper, we are mainly concerned with the data dependence between Web services, so only data dependence and parameter mapping edges are taken into consideration. Here, the BPDG of a BPEL program is denoted as $G_{BPD}$, and the sets of data dependence edges and parameter mapping edges are denoted as $E_d$ and $E_{pm}$, respectively.
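Definitions 1 to 3 can be illustrated with a small Python sketch. It is deliberately restricted to a straight-line sequence of activities, which is an assumption made here to keep the example short; branches, flow, and loops in real BPEL would need a proper reaching-definitions analysis. The activity names, the variable names, and the `(name, defs, uses)` encoding are hypothetical.

```python
def def_use_pairs(activities):
    """Return (n_d, n_u, v) triples for a linear sequence of activities.

    activities: list of (name, defs, uses) in execution order, where defs
    and uses are sets of variable names. A pair is reported only when no
    redefinition of v lies between n_d and n_u (a clear def-use path).
    """
    last_def = {}   # variable -> activity holding its current definition
    pairs = []
    for name, defs, uses in activities:
        # Resolve uses first: an activity reads values defined earlier.
        for v in sorted(uses):
            if v in last_def:
                pairs.append((last_def[v], name, v))
        # Then record this activity's definitions, killing older ones.
        for v in defs:
            last_def[v] = name
    return pairs


# Hypothetical process: receive -> assign -> invoke -> assign -> reply.
process = [
    ("receive", {"order"}, set()),            # receiving activity defines
    ("assign1", {"request"}, {"order"}),      # assignment uses and defines
    ("invoke", {"result"}, {"request"}),      # inputVariable / outputVariable
    ("assign2", {"order"}, {"result"}),       # redefines "order"
    ("reply", set(), {"order"}),              # reply activity uses
]
for pair in def_use_pairs(process):
    print(pair)
```

In this example `reply` depends on `assign2` and not on `receive`, because the redefinition of `order` in `assign2` breaks the clear path required by Definition 3.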

Suppose $S_{dd}$ is the data dependency set in the BPDG, i.e., the set of data dependence edges together with the parameter mapping edges. It can be expressed as $S_{dd} = E_d \cup E_{pm} = \{\langle v_i, v_j \rangle \mid v_i, v_j \in V(G_{BPD})\}$, where $v_i$ is a variable definition node or actual parameter node, and $v_j$ is a variable use node or formal parameter node. Based on this representation, we can define the concept of def-use chain as below.

Definition 4 (Def-Use Chain). Suppose there exists a directed path of data dependence $p_{dd} = \langle v_1, v_2, \ldots, v_i, v_j, \ldots, v_m \rangle$ $(1 < m \le |V(G_{BPD})|)$ in $G_{BPD}$, such that each adjacent node pair $\langle v_i, v_j \rangle$ is an edge in the data dependence set or the parameter mapping set, i.e., $\langle v_i, v_j \rangle \in E_d \cup E_{pm}$. If the length of $p_{dd}$ cannot be increased further, the path is called a def-use chain.

Definition 5 (Independent Def-Use Chain Set). Suppose $p_{dd1}$ and $p_{dd2}$ are two def-use chains in $G_{BPD}$. If at least one node differs between $p_{dd1}$ and $p_{dd2}$, we say $p_{dd1}$ is independent from $p_{dd2}$. All these independent chains form the independent def-use chain set, denoted as $S_{iduc}$.

Based on the above definitions, we can give some metrics for data dependence in the BPDG. As shown in Table II, these metric items reflect the data flow among BPEL activity nodes.
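Definitions 4 and 5 can be sketched as an enumeration of maximal paths over the edge set $E_d \cup E_{pm}$. The Python sketch below rests on several assumptions that the definitions leave open: the graph is treated as acyclic (a guard only prevents endless recursion if a cycle appears), a chain starts at a node without incoming edges, and the length of a chain is its number of edges. The three summary values correspond to the MLC, ALC, and NDC items of Table II; all function and node names are hypothetical.

```python
from collections import defaultdict


def def_use_chains(edges):
    """Enumerate maximal directed paths over the given (src, dst) edges."""
    succ = defaultdict(list)
    nodes, has_pred = set(), set()
    for u, v in sorted(set(edges)):        # drop duplicate edges
        succ[u].append(v)
        has_pred.add(v)
        nodes.update((u, v))

    chains = []

    def extend(path):
        # Successors already on the path are skipped (cycle guard).
        nxt = [v for v in succ.get(path[-1], []) if v not in path]
        if not nxt:                        # path cannot grow any further
            chains.append(tuple(path))
            return
        for v in nxt:
            extend(path + [v])

    for start in sorted(nodes - has_pred):
        extend([start])
    return chains


def chain_metrics(chains):
    """MLC, ALC and NDC, with chain length taken as the number of edges."""
    if not chains:
        return {"MLC": 0, "ALC": 0.0, "NDC": 0}
    # Chains count as independent when their node sets differ.
    independent = {frozenset(c): c for c in chains}.values()
    lengths = [len(c) - 1 for c in independent]
    return {
        "MLC": max(lengths),
        "ALC": sum(lengths) / len(lengths),
        "NDC": len(lengths),
    }


edges = [("receive", "assign1"), ("assign1", "invoke"),
         ("invoke", "assign2"), ("assign2", "reply"), ("receive", "log")]
print(chain_metrics(def_use_chains(edges)))
# {'MLC': 4, 'ALC': 2.5, 'NDC': 2}
```

Enumerating every maximal path can grow exponentially on densely connected graphs, so for large BPDGs a dynamic-programming count over a topological order would be the more practical choice.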

A. Basic Metrics

Based on the BPEL program dependence graph, we can list some basic data dependency metrics as shown in Table I.

TABLE I. THE BASIC COUNT-BASED METRICS FOR DATA DEPENDENCY IN WEB SERVICES COMPOSITION.

| No. | Metric Acronym | Description |
| 1 | DIL | Data Interaction Liveness, i.e., the percentage of all nodes related with data definition and use in BPEL program. |
| 2 | MDN | Max Definition Number, i.e., the max times of variable definition in an activity in BPDG. |
| 3 | MUN | Max Use Number, i.e., the max times of variable use in an activity in BPDG. |
| 4 | ADT/AUT | Average Definition/Use Times, i.e., the average times of variable definition/use per activity node. |
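The count-based metrics of Table I can be sketched over an edge-list representation of the BPDG, following the formal definitions given in this subsection. In the Python sketch below, the function name, the edge-list encoding, and the example graph are assumptions made for illustration. One further assumption is labeled in the code: for DIL, a node that touches both a data dependence edge and a parameter mapping edge is counted once, which coincides with adding the two node counts whenever the two node sets are disjoint.

```python
from collections import Counter


def basic_metrics(nodes, ed, epm):
    """DIL, MDN, MUN and ADT/AUT for a graph given as node and edge lists.

    nodes: all activity nodes of the BPDG
    ed:    data dependence edges as (definition_node, use_node) pairs
    epm:   parameter mapping edges as (actual, formal) pairs
    """
    nodes = set(nodes)
    if not nodes:
        raise ValueError("the graph has no nodes")
    ed, epm = set(ed), set(epm)

    # Assumption: union of touched nodes, so shared nodes count once.
    touched = {n for edge in ed | epm for n in edge} & nodes
    dil = 100.0 * len(touched) / len(nodes)

    # MDN: largest group of data dependence edges sharing a source node.
    mdn = max(Counter(src for src, _ in ed).values(), default=0)
    # MUN: the same count taken over target (use) nodes.
    mun = max(Counter(dst for _, dst in ed).values(), default=0)

    adt = len(ed) / len(nodes)   # equal to AUT by construction
    return {"DIL": dil, "MDN": mdn, "MUN": mun, "ADT": adt, "AUT": adt}


nodes = ["receive", "assign1", "invoke", "partner_in",
         "reply", "wait1", "wait2", "empty"]
ed = [("receive", "assign1"), ("receive", "reply"),
      ("assign1", "invoke"), ("invoke", "reply")]
epm = [("invoke", "partner_in")]
print(basic_metrics(nodes, ed, epm))
# {'DIL': 62.5, 'MDN': 2, 'MUN': 2, 'ADT': 0.5, 'AUT': 0.5}
```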

The above metrics can be formally defined as below. Suppose the node set of the BPEL program dependence graph $G_{BPD}$ is $V(G_{BPD})$, and the sets of nodes corresponding to the edge sets $E_d$ and $E_{pm}$ are $V(E_d)$ and $V(E_{pm})$, respectively. Since DIL is the percentage of nodes related with data definition and use, it can be expressed as formula (6).

$$DIL = \frac{|V(E_d)| + |V(E_{pm})|}{|V(G_{BPD})|} \times 100\% \qquad (6)$$

The metric item MDN can be calculated with the following formula. Note that, for a data dependence edge $e_i = \langle v_{i1}, v_{i2} \rangle$, the source node $v_{i1}$ is a variable definition node and the target node $v_{i2}$ is a variable use node.

$$MDN = \max(|E|) \qquad (7)$$

where $E \subseteq E_d \wedge \forall e_i = \langle v_{i1}, v_{i2} \rangle, e_j = \langle v_{j1}, v_{j2} \rangle \in E, v_{i1} = v_{j1}$. MUN can be defined in a similar way by grouping the edges on their target nodes. In general, all defined variables will be used in a BPEL program, so ADT is equal to AUT. These two metrics can be calculated with the following formula.

$$ADT = AUT = \frac{|E_d|}{|V(G_{BPD})|} \qquad (8)$$

B. Def-Use Chain-Based Metrics

Although the above metrics consider data dependency, they merely calculate basic counts of variable definition and use for specific activity nodes. In this section, we give some metrics focusing on the data interaction (i.e., the def-use relation) between activity nodes in the BPDG. They build on the concepts of def-use chain and independent def-use chain set (Definitions 4 and 5).

TABLE II. THE DATA COMPLEXITY METRICS BASED ON THE REPRESENTATION OF DEF-USE CHAIN IN WSC.

| No. | Metric Acronym | Description |
| 1 | MLC | the Max Length of def-use Chains, i.e., the length of the longest def-use chain in BPDG. |
| 2 | ALC | the Average Length of def-use Chains, i.e., the average length of all def-use chains in BPDG. |
| 3 | NDC | Number of the independent def-use Chains, i.e., the cardinality of the independent def-use chain set in BPDG. |

Taking the metric ALC as an example, it can be formally expressed as below:

$$ALC = \frac{\sum_{i} len(p_{ddi})}{|S_{iduc}|} \qquad (9)$$

where $p_{ddi} \in S_{iduc}$, and $len(p_{ddi})$ represents the length of def-use chain $p_{ddi}$ $(1 \le i \le |S_{iduc}|)$.

C. Entropy-Based Metric

The above two kinds of data dependence metrics mainly concern local data interaction information, such as the degree of an activity node and the def-use chain. Consequently, these items cannot reflect the global complexity of data dependence in the BPEL program of a WSC. In general, entropy is an effective index for scaling the disorder or randomness of information. Here, we adopt this concept to measure the overall complexity of data interaction in Web service-based software, i.e., data dependence entropy.

Suppose there is a program dependence graph $G_{BPD}$ for the BPEL program of a WSC. The data dependence sub-graph, named $G_{BDD}$, is obtained by removing the control dependence edges from $G_{BPD}$. Obviously, $V(G_{BDD}) = V(G_{BPD})$ and $E(G_{BDD}) = E_d \cup E_{pm}$. In the data dependence sub-graph $G_{BDD}$, suppose the number of edges related to a specific activity node $v_i$ is $num_i$. The metric of data dependence entropy (DDE) can be calculated as below:

$$DDE = -\sum_{i=1}^{|V(G_{BDD})|} p_i \cdot \log_2 p_i \qquad (10)$$

where $p_i$ represents the ratio of the number of edges related with node $v_i$ to twice the number of all data dependence edges, that is, $p_i = \frac{num_i}{2(|E_d| + |E_{pm}|)}$, $1 \le i \le |V(G_{BDD})|$. According to the metric DDE, if its value is close to 0, there are very few variable invocations in the Web service composition. Otherwise, nearly all activity nodes involve variable definition or use in the WSC.
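Formula (10) can be checked numerically with a short Python sketch. It assumes the sub-graph is given as two edge lists, treats a node without any data dependence edge as contributing zero (the usual convention that $p \log_2 p$ tends to 0 as $p$ tends to 0), and returns 0 for a graph without such edges. The function name and the example graphs are hypothetical.

```python
import math
from collections import Counter


def data_dependence_entropy(ed, epm):
    """DDE over the data dependence sub-graph with edge lists ed and epm."""
    edges = list(ed) + list(epm)
    if not edges:
        return 0.0
    # num_i: number of edges incident to each node (in- plus out-degree).
    num = Counter()
    for u, v in edges:
        num[u] += 1
        num[v] += 1
    total = 2 * len(edges)           # each edge is counted at both ends
    return -sum((n / total) * math.log2(n / total) for n in num.values())


single = [("a", "b")]
chain = [("a", "b"), ("b", "c"), ("c", "d")]
star = [("hub", "n1"), ("hub", "n2"), ("hub", "n3"), ("hub", "n4")]
print(data_dependence_entropy(single, []))   # 1.0
print(data_dependence_entropy(chain, []))    # about 1.918
print(data_dependence_entropy(star, []))     # 2.0
```

Because the $p_i$ sum to 1, the value is bounded by $\log_2$ of the number of nodes that take part in data dependence, so it grows as more activity nodes become involved and as the edges are spread more evenly among them.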

V. SUMMARY AND OUTLOOK

Web service-based systems have become a mainstream software form in recent years. However, their maintenance and comprehension pose a great challenge to software engineers due to the characteristics of heterogeneity and loose coupling. Software measurement can relieve this pressure and facilitate the later maintenance and management of Web service-based software. Existing research mainly concerns the control flow complexity of the business process in a WSC, and data flow complexity is usually ignored. In this paper, two kinds of data complexity metrics have been presented. The data traffic-based metric describes the amount of data transferred between Web services in a WSC. The data amount is scaled by the data types of the parameters in the SOAP message. The data dependency-based metric set, on the other hand, is divided into three subsets, i.e., basic metrics, def-use chain-based metrics, and the entropy-based metric. All these metric items are derived from the data dependence analysis results (def-use pairs) of the WSC. The proposed data complexity metric set can be an effective complement to the control complexity metrics of WSCs.

Here, we have only made some preliminary explorations of the measurement of service software. Some work is worth further study. More specifically, the metric set needs to be supplemented in the future, and the combination of control complexity and data complexity metrics can be used to predict the performance or defects of Web service-based software systems.

ACKNOWLEDGMENT

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 60803046 and 61063013, the Natural Science Foundation of Jiangxi Province under Grant No. 2010GZS0044, the Science Foundation of Jiangxi Educational Committee under Grant No. GJJ10433, the Open Foundation of State Key Laboratory of Software Engineering under Grant No. SKLSE2010-08-23, and the Program for Outstanding Young Academic Talent in Jiangxi University of Finance and Economics.

REFERENCES

[1] M. H. Valipour, B. AmirZafari, K. N. Maleki, and N. Daneshpour, “A Brief Survey of Software Architecture Concepts and Service Oriented Architecture,” Proc. of 2nd IEEE International Conference on Computer Science and Information Technology (ICCSIT’09), 2009, pp. 34-38.
[2] W3C Web Services Activity, available from: http://www.w3.org/2002/ws/
[3] J. Koehler and B. Srivastava, “Web Service Composition: Current Solutions and Open Problems,” Proc. of ICAPS 2003 Workshop on Planning for Web Services, 2003, pp. 28-35.
[4] C. Mao, “Control and Data Complexity Metrics for Web Service Compositions,” Proc. of the 10th Int’l Conf. on Quality Software (QSIC’10), IEEE CS Press, 2010, pp. 349-352.
[5] OASIS WSBPEL Technical Committee, Web Services Business Process Execution Language, Version 2.0, available at http://docs.oasis-open.org/wsbpel/2.0/wsbpelv2.0.pdf
[6] J. Cardoso, “Control-flow Complexity Measurement of Processes and Weyuker’s Properties,” Proc. of the 6th International Enformatika Conference, Transactions on Enformatika, Systems Sciences and Engineering, 2005, vol. 8, pp. 213-218.
[7] C. Mao, “Slicing Web Service-based Software,” Proc. of SOCA’09, Taipei, Taiwan, Dec. 14-15, 2009, pp. 91-98.
[8] C. Mao, “Complexity Analysis for Petri Net-based Business Process in Web Service Composition,” Proc. of the 5th IEEE Int’l Symp. on Service-Oriented System Engineering (SOSE’10), IEEE CS Press, 2010, pp. 193-196.
[9] World Wide Web Consortium (W3C), Web Services Description Language (WSDL) Version 1.1, available at http://www.w3.org/TR/wsdl, 2001.
[10] World Wide Web Consortium (W3C), Simple Object Access Protocol, Version 1.2, available at http://www.w3.org/TR/soap12/, 2007.
[11] J. Cardoso, J. Mendling, G. Neumann, and H. A. Reijers, “A Discourse on Complexity of Process Models,” Proc. of the 4th International Conference on Business Process Management (BPM’06) Workshops, LNCS 4103, 2006, pp. 117-128.
[12] V. Gruhn and R. Laue, “Complexity Metrics for Business Process Models,” Proc. of the 9th International Conference on Business Information Systems (BIS’06), Lecture Notes in Informatics (LNI) 85, 2006, pp. 1-12.
[13] H. A. Reijers and I. T. P. Vanderfeesten, “Cohesion and Coupling Metrics for Workflow Process Design,” Proc. of the 2nd International Conference on Business Process Management (BPM’04), LNCS 3080, 2004, pp. 290-305.
[14] J. Cardoso, “About the Data-Flow Complexity of Web Processes,” Proc. of the 6th International Workshop on Business Process Modeling, Development, and Support, Porto, Portugal, 2005.
[15] D. Basic and S. Misra, “Data Complexity Metrics for XML Web Services,” Advances in Electrical and Computer Engineering, 2009, vol. 9, no. 2, pp. 9-15.
[16] K. J. Ottenstein, “Data-Flow Graphs as an Intermediate Program,” Ph.D. Dissertation, Computer Sciences Department, Purdue University, Lafayette, IN, 1978.
