Cost-Driven Web Service Selection Using Genetic Algorithm*
Lei Cao, Minglu Li, and Jian Cao
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 20030, China
{lcao, li-ml, cao-jian}@cs.sjtu.edu.cn
Abstract. Web services composition has been one of the hottest research topics. However, with the ever-increasing number of functionally similar web services being made available on the Internet, there is a need to distinguish them using a set of well-defined Quality of Service (QoS) criteria. Cost is the primary concern of many business processes. In this paper, we propose a new solution that uses a Genetic Algorithm (GA) for cost-driven web service selection. The GA is used to optimize a business process composed of many service agents (SAg). Each SAg corresponds to a collection of available web services provided by multiple service providers to perform a specific function. Service selection is an optimization process that takes into account the relationships among the services. In our experiments, the GA achieves better performance than a local service selection strategy. The global optimal solution might also be achieved with proper GA parameters.
* This paper has been supported by the 973 project (No. 2002CB312002) of China and the grand project of the Science and Technology Commission of Shanghai Municipality (No. 03dz15027).
X. Deng and Y. Ye (Eds.): WINE 2005, LNCS 3828, pp. 906–915, 2005. © Springer-Verlag Berlin Heidelberg 2005

1 Introduction

Web services are self-describing software applications that can be advertised, located, and used across the Internet using a set of standards such as SOAP, WSDL, and UDDI [1]. A single web service is most likely inadequate to serve customers’ business needs; it takes a selection of various web services composed together to form a business process. Web services composition [2] has been one of the hottest research topics in this new field. However, with the ever-increasing number of functionally similar web services being made available on the Internet, there is a need to distinguish them using a set of well-defined Quality of Service (QoS) criteria. QoS is a broad concept that encompasses a number of non-functional properties such as cost, response time, availability, reliability, and reputation [3]. These properties apply both to standalone web services and to composite web services (i.e., web services composed of other web services). Typically, cost and time are the two primary factors that customers are concerned about. The challenge is to select services that not only satisfy the individual requirements but also best fit the overall composed business process. Therefore, the entire business process needs to be optimized prior to execution.

The philosophy of Genetic Algorithm (GA) [4] mimics the evolutionary process of “the survival of the fittest” in nature. GA is a parallel global optimization algorithm with high robustness, and it is not restricted by properties of the optimization problem such as continuity and differentiability. Only the objective function and the corresponding fitness level influence the direction of the search, so it is well suited to complicated optimization problems that cannot be handled efficiently by traditional optimization algorithms (methods of calculus or techniques of exhaustive search).

In this paper, we take cost as the primary concern of many business processes. A web services composition is represented as an integer string, and the best composition is the string that leads to the lowest cost. A service selection model using GA is proposed to optimize a business process composed of many service agents (SAg). Each SAg corresponds to a collection of available web services provided by multiple service providers to perform a specific function. Service selection using GA is an optimization process that takes into account the relationships among the services.

The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the GA used in our web service selection model. In Section 4, a service selection case is presented. Finally, Section 5 concludes the paper.
2 Related Work

In [2], the authors present SELF-SERV, a framework for declarative web services composition using statecharts. Their service selection method uses a local selection strategy: the service for each task is selected independently of the other tasks of the composite service, so the result is only locally optimal. In [5], the authors study the end-to-end QoS issues of composite services by using a QoS broker that is responsible for coordinating the individual service components to meet the quality constraint. They design the service selection algorithms used by QoS brokers to meet the end-to-end QoS constraints. The service selection problem is modeled as the Multiple Choice Knapsack Problem (MCKP). The authors give three algorithms and recommend Pisinger’s algorithm as the best one, but their solution is usually too complex for run-time decisions. In [6], the authors present the Web Services Outsourcing Manager (WSOM) framework via a mathematical model for dynamic business process configuration using existing web services to meet customers’ requirements, and propose a novel mechanism that maps the service selection space to {0,1} in order to utilize a global optimization algorithm, the Genetic Algorithm. Binary encoding is adopted in the algorithm, but it is not human-readable, and when the business process is complicated the chromosome becomes too long. Moreover, the use of GA in service selection is not described in detail, especially how the relationships among the services affect the global optimization.
3 GA Utilized in Web Service Selection Model

3.1 Web Service Selection Model Architecture

The web service selection model is based on a multi-agent platform, which is composed of multiple components (see Fig. 1).
Fig. 1. Model architecture
[Figure labels: UDDI Registry (Local or Public); SOAP; SCAg (including GA module); IAg; UAg; ACL Channel; SAg-1, SAg-2, SAg-3, ..., SAg-N. Legend: SCAg: Service Composition Agent; IAg: Information Agent; UAg: UDDI Agent; SAg: Service Agent; ACL: Agent Communication Language.]
1. The Service Agent (SAg) corresponds to a list of web services that provide a specific function. Its capability depends on that function. An SAg holds all useful information about those web services, such as cost and response time. To facilitate dynamic selection, up-to-date information on the parameters that affect the decision to activate a web service must be gathered.
2. The UDDI agent (UAg) is the broker that the internal SAg use when querying an external UDDI registry. An SAg can get information about the required web services via the UAg.
3. The Information agent (IAg) is the information center of the model. All SAg must register themselves with it.
4. The Service composition agent (SCAg) is responsible for administering the composite business process as a services flow (SF). Using the modeling tool, users can model the SF in advance. The model is composed of many predefined or customized SAg. The GA module is the core of the SCAg and is described in the remainder of the paper. Before the composite business process is instantiated, the SCAg activates the GA module to optimize the service selection and obtain the final executable SF.
5. The ACL channel [7] is the communication bus among the agents mentioned above.

3.2 Problem Objective Function

When cost is the sole primary factor that customers are concerned about, service selection is equivalent to a single-objective optimization problem. There are mainly two kinds of service flow.

3.2.1 Pipeline Service Flow. For a pipeline services flow with N steps (N SAg in an execution path) \((S_1, S_2, \ldots, S_N)\), there is only one type of structure: the sequence. The control flow passes through all its SAg, and exactly one web service in each SAg is chosen and bound. The overall cost of the SF is the sum of the costs of all its components:

\[
\mathrm{cost}(SF) = \sum_{i=1}^{N} \mathrm{cost}(S_{ik}), \quad 1 \le k \le \mathrm{size}(S_i), \quad \mathrm{size}(S_i) = |S_i|, \quad i = 1, \ldots, N \tag{1}
\]

The objective function is: \(\min \mathrm{cost}(SF)\).
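As an illustration of Eq. (1), the following minimal Python sketch sums the cost of the one service chosen in each SAg. The function name `pipeline_cost`, the nested-list layout of the costs, and the example numbers are assumptions made for this sketch; service indices are 1-based, as in the notation \(S_{ik}\).

```python
from typing import Sequence


def pipeline_cost(costs: Sequence[Sequence[float]], selection: Sequence[int]) -> float:
    """Eq. (1): add up the cost of the service selected in every SAg.

    costs[i][k - 1] is the cost of the k-th web service of SAg i + 1, and
    selection[i] is the 1-based index k chosen for that SAg.
    """
    if len(costs) != len(selection):
        raise ValueError("exactly one selection per SAg is required")
    total = 0.0
    for sag_costs, k in zip(costs, selection):
        if not 1 <= k <= len(sag_costs):  # enforce 1 <= k <= size(S_i)
            raise ValueError(f"service index {k} is out of range")
        total += sag_costs[k - 1]
    return total


# Hypothetical three-step flow; the numbers are illustrative only.
example_costs = [[10, 12, 9], [20, 18], [5, 7, 6, 4]]
print(pipeline_cost(example_costs, [3, 2, 4]))  # 9 + 18 + 4 = 31.0
```

Without relationships among services, minimizing this sum reduces to picking the cheapest service in each SAg independently; the GA becomes relevant once constraints and discounts couple the choices.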
3.2.2 Hybrid Service Flow. For a hybrid services flow with N steps (no more than N SAg in an execution path) \((S_1, S_2, \ldots, S_N)\), we assume three types of structure: sequence, parallel, and conditional branching. The overall cost of the hybrid SF is described as follows:

\[
\mathrm{cost}(SF) = \sum_{i=1}^{N} \mathrm{cost}(S_{ik}) \cdot CF_i, \quad 1 \le k \le \mathrm{size}(S_i), \quad \mathrm{size}(S_i) = |S_i|, \quad CF_i \in \{0, 1\}, \quad i = 1, \ldots, N
\]

where \(CF_i = 1\) if the control flow passes through SAg i in the execution and \(CF_i = 0\) otherwise. For SAg linked by a conditional branching structure, only one of them is passed by the control flow in any one execution. Thus the hybrid SF has several different execution paths. The overall cost of each execution path can always be represented as the sum of the costs of a subset of its components:

\[
\mathrm{cost}(SF) = \sum_{i=1}^{M} \mathrm{cost}(S_{ik}), \quad 1 \le k \le \mathrm{size}(S_i), \quad \mathrm{size}(S_i) = |S_i|, \quad i = 1, \ldots, M, \quad M \le N \tag{2}
\]
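The role of the flags \(CF_i\) can be illustrated with a short Python sketch. It assumes that \(CF_i\) is given as a list of 0/1 values, one per SAg, describing a single execution path; the function name `execution_path_cost` and the example data are hypothetical.

```python
from typing import Sequence


def execution_path_cost(
    costs: Sequence[Sequence[float]],
    selection: Sequence[int],
    cf: Sequence[int],
) -> float:
    """Sum cost(S_ik) * CF_i over all SAg; SAg with CF_i = 0 contribute nothing."""
    if not (len(costs) == len(selection) == len(cf)):
        raise ValueError("costs, selection and cf must have the same length")
    return sum(sag[k - 1] * flag for sag, k, flag in zip(costs, selection, cf))


# Hypothetical flow with four SAg in which SAg 2 and SAg 3 are the two
# alternatives of one conditional branch.
costs = [[10, 12], [20, 18], [7, 9], [5, 4]]
selection = [1, 2, 1, 2]          # 1-based service index bound in each SAg
path_a = [1, 1, 0, 1]             # the branch goes through SAg 2
path_b = [1, 0, 1, 1]             # the branch goes through SAg 3
print(execution_path_cost(costs, selection, path_a))  # 10 + 18 + 4 = 32
print(execution_path_cost(costs, selection, path_b))  # 10 + 7 + 4 = 21
```

Each path touches only M = 3 of the N = 4 SAg, which is the subset sum written in Eq. (2).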
The objective function is also: \(\min \mathrm{cost}(SF)\).

3.3 GA Module

The philosophy of GA mimics the evolutionary process of “the survival of the fittest” in nature. A fitness function is first defined to evaluate the quality of an individual, called a chromosome, which represents a solution to the optimization problem. GA starts with a randomly initialized population. The population then evolves through a repeated routine, called a generation, in which GA employs operators borrowed from natural genetics, such as selection, crossover, and mutation, to improve the fitness of the individuals in the population. In each generation, chromosomes are evaluated by the fitness function. After a number of generations, highly fit individuals, which are analogous to good solutions to the given problem, emerge.

3.3.1 Solution Representation. The representation scheme determines how the problem is structured in GA and also influences the genetic operators that are used. Which kind of solution representation is used depends on the characteristics of the optimization problem. Traditional GA encodes each problem solution into a binary string, called a chromosome, which facilitates studying GA by the schema theorem. However, binary encoding is not human-readable, which makes it difficult to develop efficient genetic operators that make good use of specialized knowledge of the service selection problem. Integer encoding provides a convenient and natural way of expressing the mapping from representation to solution domain. With integer encoding, the interpretation of the solution representation for service selection is straightforward, so integer encoding is used in this paper. The solution to service selection is encoded into a vector of integers, where the i-th element of an individual is k if the k-th web service in SAg i is selected. For example, suppose the service flow has four SAg \(\{S_1, S_2, S_3, S_4\}\), each of which has four candidate services, and one possible individual is [1 4 2 3]. Then the corresponding composite service is \(\{S_{11}, S_{24}, S_{32}, S_{43}\}\), which means that the 1st service of \(S_1\), the 4th service of \(S_2\), the 2nd service of \(S_3\), and the 3rd service of \(S_4\) are selected concurrently.

3.3.2 Fitness Function. The fitness function is responsible for evaluating a given individual and returning a value that represents its worth as a candidate solution. The returned value is typically used by the selection operator to determine which individuals are allowed to move on to the next round of evolution and which are eliminated. Highly fit individuals, relative to the whole population, have a higher probability of being selected for mating, whereas less fit individuals have a correspondingly low probability of being selected. To facilitate the selection operation in GA, the global minimization problem is usually changed into a global maximization problem. By transforming Eq. (1) or Eq. (2), the proper fitness function for the service selection problem can be obtained:

\[
F =
\begin{cases}
U - \sum_{i=1}^{N} \mathrm{cost}(S_{ik}), & \mathrm{cost}(SF) < U, \quad 1 \le k \le \mathrm{size}(S_i) \\
0, & \mathrm{cost}(SF) \ge U
\end{cases}
\tag{3}
\]
where U is an appropriate positive number chosen to ensure that the fitness of all good individuals is positive in the feasible solution space. U can also be used to adjust the selection pressure of GA. When U is increased, the relative fitness advantage of good individuals is reduced, so the selection pressure decreases, which can prevent the evolution process from converging prematurely and getting trapped in local minima. However, too large a U slows down the evolution process and therefore increases the computing time.

3.3.3 Population Initialization. The population initialization creates the first population and determines the starting point for the GA search routine. An individual is generated by randomly selecting a web service for each SAg of the services flow, and the newly generated individual is immediately checked to see whether the corresponding solution satisfies the constraints. If any of the constraints is violated, the generated individual is regarded as invalid and discarded. The process is repeated until enough individuals are generated. A population size popsize ∈ [20, 100] is generally recommended.

3.3.4 Genetic Operators. GA uses the selection operator to select the superior and eliminate the inferior. Individuals are selected according to their fitness: the fitter they are, the more chances they have to reproduce. For the service selection problem, the popular roulette wheel method is used to select individuals to breed. This method selects individuals to reproduce based on their relative fitness, and the expected number of offspring is approximately proportional to that individual’s performance, i.e., if one individual is twice as fit as another, it is twice as likely to be selected. At the same time, it is desirable to retain the fittest individual in the offspring, so an elitism strategy is applied, which always passes the fittest individual to the next generation: the individual with the highest fitness does not take part in the crossover and mutation operations; instead it replaces the offspring with the lowest fitness generated by those operations.

Fig. 2. Crossover operator and mutation operator
[Figure content: parents S1 = [3 2 0 4 1 6 2 5 2 0] and S2 = [2 1 2 3 0 5 4 0 1 3]. Crossover result (crossover point after the fourth gene): S1' = [3 2 0 4 0 5 4 0 1 3] and S2' = [2 1 2 3 1 6 2 5 2 0]. Mutation result for S1' (mutation point at the eighth gene): S1'' = [3 2 0 4 0 5 4 6 1 3].]
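The fitness function of Eq. (3), the population initialization, and the roulette wheel selection with elitism described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `is_valid` callback standing in for the constraint check, the uniform fallback when all fitness values are zero, and the example data are all hypothetical. Genes are 1-based service indices, as in Section 3.3.1.

```python
import random
from typing import Callable, List, Sequence

Chromosome = List[int]  # gene i holds the 1-based service index chosen for SAg i + 1


def fitness(costs: Sequence[Sequence[float]], chrom: Chromosome, U: float) -> float:
    """Eq. (3): U minus the total cost, or 0 when the cost reaches U."""
    total = sum(sag[k - 1] for sag, k in zip(costs, chrom))
    return U - total if total < U else 0.0


def init_population(
    sizes: Sequence[int],
    popsize: int,
    rng: random.Random,
    is_valid: Callable[[Chromosome], bool] = lambda c: True,
) -> List[Chromosome]:
    """Draw random individuals and discard those that fail the validity check."""
    population: List[Chromosome] = []
    while len(population) < popsize:
        candidate = [rng.randint(1, n) for n in sizes]  # one service per SAg
        if is_valid(candidate):
            population.append(candidate)
    return population


def roulette_select(population, fits, rng: random.Random) -> Chromosome:
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fits)
    if total <= 0:  # degenerate wheel: uniform pick (an assumption of this sketch)
        return rng.choice(population)
    pick = rng.uniform(0, total)
    acc = 0.0
    for individual, f in zip(population, fits):
        acc += f
        if pick < acc:
            return individual
    return population[-1]  # guard against floating-point round-off


def apply_elitism(parents, parent_fits, offspring, offspring_fits) -> List[Chromosome]:
    """Replace the least fit offspring with a copy of the fittest parent."""
    best = max(range(len(parents)), key=lambda i: parent_fits[i])
    worst = min(range(len(offspring)), key=lambda i: offspring_fits[i])
    result = [list(c) for c in offspring]
    result[worst] = list(parents[best])
    return result


# Hypothetical two-step flow; U is the sum of the per-SAg maximum costs.
costs = [[10, 12, 9], [20, 18]]
U = 12 + 20
print(fitness(costs, [3, 2], U))  # 32 - (9 + 18)
```

With U chosen as the sum of the maximum costs, the individual that binds the most expensive service everywhere gets fitness 0 and is never drawn by the wheel as long as some other individual has positive fitness.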
GA uses the crossover operator to breed new individuals by stochastically exchanging parts of the chromosomes of the mated parents. A crossover probability pc ∈ [0.5, 1.0] is generally recommended. The purpose of crossover is to maintain the qualities of the solution set while exploring new regions of the feasible solution space. For the service selection problem, the single-point crossover operator is used: a single crossover point is generated at random in the mated parents, and the two parents then exchange the tail portions after the crossover point to create two new offspring (see Fig. 2). After each crossover operation, the offspring are immediately checked to see whether the corresponding solutions are valid. If any constraint is violated, both offspring are discarded and the crossover operation for the mated parents is retried. If valid offspring still cannot be obtained after a certain number of retries, the crossover operation for these two parents is abandoned to avoid a possible infinite loop.

GA uses the mutation operator to add diversity to the solution set by stochastically replacing an allele of a gene with another in some chromosomes. A mutation probability pm ∈ [0, 0.05] is generally recommended. The purpose of the mutation operation is to avoid losing useful information in the evolution process. From another point of view, the mutation operation can improve the local search performance of GA. Together, crossover and mutation carry out the global and local search of the solution space. For the service selection problem, a stochastically selected SAg is randomly bound to a web service different from the original one (see Fig. 2). After an offspring is mutated, it is also immediately checked to see whether the corresponding solution is valid. If any constraint is violated, the mutated offspring is discarded and the mutation operation for the offspring is retried. Because the mutation operation is carried out with a very small probability, the probability of generating an invalid solution is also very small. Even if an invalid solution is generated, a valid solution can easily be obtained through a number of retries.
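The two operators with their retry logic can be sketched in Python as follows. The function names, the `is_valid` callback, the retry limit of 10, and the choice to return unchanged copies when all retries fail are assumptions of this sketch; the paper only states that the operation is abandoned or retried. The deterministic helper `crossover_at` reproduces the example of Fig. 2.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = List[int]


def crossover_at(p1: Chromosome, p2: Chromosome, point: int) -> Tuple[Chromosome, Chromosome]:
    """Single-point crossover: the parents exchange the tails after `point` genes."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def crossover(p1, p2, rng: random.Random,
              is_valid: Callable[[Chromosome], bool] = lambda c: True,
              max_retries: int = 10) -> Tuple[Chromosome, Chromosome]:
    """Retry with a fresh random point until both offspring are valid."""
    for _ in range(max_retries):
        point = rng.randint(1, len(p1) - 1)  # keep at least one gene on each side
        c1, c2 = crossover_at(p1, p2, point)
        if is_valid(c1) and is_valid(c2):
            return c1, c2
    return list(p1), list(p2)  # give up: keep the parents (assumption)


def mutate(chrom: Chromosome, sizes: Sequence[int], rng: random.Random,
           is_valid: Callable[[Chromosome], bool] = lambda c: True,
           max_retries: int = 10) -> Chromosome:
    """Rebind one randomly chosen SAg to a different service (1-based genes)."""
    for _ in range(max_retries):
        i = rng.randrange(len(chrom))
        alternatives = [k for k in range(1, sizes[i] + 1) if k != chrom[i]]
        if not alternatives:  # this SAg has a single candidate, try another gene
            continue
        candidate = chrom[:i] + [rng.choice(alternatives)] + chrom[i + 1:]
        if is_valid(candidate):
            return candidate
    return list(chrom)  # give up: keep the individual unchanged (assumption)


# Parents from Fig. 2, crossover point after the fourth gene.
s1 = [3, 2, 0, 4, 1, 6, 2, 5, 2, 0]
s2 = [2, 1, 2, 3, 0, 5, 4, 0, 1, 3]
c1, c2 = crossover_at(s1, s2, 4)
print(c1)  # [3, 2, 0, 4, 0, 5, 4, 0, 1, 3]
print(c2)  # [2, 1, 2, 3, 1, 6, 2, 5, 2, 0]
```

Bounding the retries keeps both operators from looping forever when the constraint set leaves few valid neighbours.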
4 A Case Study and the Implementation

We assume that there are twenty tasks in a pipeline SF and that each of them can be accomplished by a service agent (SAg) (see Fig. 3). The SF may be a practical “Travel arrangement” composite service. Each kind of service might have multiple providers, and a service provider could provide several kinds of services. Some business entities may be competitors and do not work with each other. On the other hand, business entities would prefer to work with those within their alliance, and the alliance offers some degree of discount if customers select several of its members. Due to those business rules, there are relationships such as partnerships, alliances, and competition among the services. We must take those relationships into account when selecting the final services as part of the optimization process.
Fig. 3. A pipeline services flow
[Figure: Start, SAg-1, SAg-2, ..., SAg-N, and Finish arranged in sequence along the data and control flow and joined by logic nodes labeled A (And).]
4.1 Initial Data
From Table 1, we can see that each SAg has a number of candidate services between 5 and 15, predefined at random. Different choices of web service in each SAg may have different costs. We have marked each SAg’s maximum cost in red and its minimum cost in blue in Table 1.

Table 1. Services cost (a dash means that the SAg has no such candidate service)

      SAg1  SAg2  SAg3  SAg4  SAg5  SAg6  SAg7  SAg8  SAg9 SAg10 SAg11 SAg12 SAg13 SAg14 SAg15 SAg16 SAg17 SAg18 SAg19 SAg20
WS1    642  1025   435  1314  1763  2077   521  1451   603   808   822  1786   986  1483   781  1588   875  1536  1640  1437
WS2    683  1093   472  1393  1799  2075   539  1447   645   829   839  1795   947  1465   755  1553   870  1531  1644  1427
WS3    642  1063   474  1308  1732  2057   512  1410   661   814   839  1796   953  1408   759  1595   819  1525  1649  1439
WS4    691  1083   454  1367  1752  2089   549  1449   605   822   802  1769   943  1404   742  1528   868  1591  1692  1418
WS5    616  1091   407  1390  1714  2016   502  1452   604   885   801  1720   935  1483   790  1585   895  1551  1635  1403
WS6    663  1015   410  1349  1746  2061   580  1457     -   868   841  1787   975  1492   751  1545   884  1555  1619  1442
WS7    612  1078   480  1385  1768  2046   579  1427     -   811   870  1714   912  1465   758  1591   851  1575  1685  1495
WS8      -  1011   424  1350  1741  2092   525  1430     -   814   831  1756   927     -   775  1580   877  1518  1697  1408
WS9      -     -     -  1356  1732  2073     -  1443     -     -     -  1747   926     -   763  1591   858  1502  1660  1442
WS10     -     -     -  1327  1719  2051     -  1474     -     -     -  1782   909     -   778     -   852  1504  1671  1457
WS11     -     -     -     -  1749  2083     -  1408     -     -     -  1703   924     -   772     -   848  1532  1603  1478
WS12     -     -     -     -  1707  2090     -  1464     -     -     -  1765   960     -   767     -   832  1576  1692  1484
WS13     -     -     -     -     -  2079     -  1427     -     -     -     -   991     -   732     -   882     -  1697  1410
WS14     -     -     -     -     -  2098     -  1493     -     -     -     -   920     -   773     -     -     -     -  1408
WS15     -     -     -     -     -  2007     -  1432     -     -     -     -   991     -     -     -     -     -     -  1452
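The per-SAg extremes of Table 1 can be recomputed with a few lines of Python. The sketch below transcribes the table column by column; the assignment of the values that the flattened layout left without a column is inferred from their magnitude, and it is consistent with the totals of $24385 and $22777 and with the local selection sequence reported in the text. The variable names are assumptions of this sketch.

```python
# TABLE1[i] lists the costs of WS1, WS2, ... for SAg i + 1 (transcribed from Table 1).
TABLE1 = [
    [642, 683, 642, 691, 616, 663, 612],
    [1025, 1093, 1063, 1083, 1091, 1015, 1078, 1011],
    [435, 472, 474, 454, 407, 410, 480, 424],
    [1314, 1393, 1308, 1367, 1390, 1349, 1385, 1350, 1356, 1327],
    [1763, 1799, 1732, 1752, 1714, 1746, 1768, 1741, 1732, 1719, 1749, 1707],
    [2077, 2075, 2057, 2089, 2016, 2061, 2046, 2092, 2073, 2051, 2083, 2090, 2079, 2098, 2007],
    [521, 539, 512, 549, 502, 580, 579, 525],
    [1451, 1447, 1410, 1449, 1452, 1457, 1427, 1430, 1443, 1474, 1408, 1464, 1427, 1493, 1432],
    [603, 645, 661, 605, 604],
    [808, 829, 814, 822, 885, 868, 811, 814],
    [822, 839, 839, 802, 801, 841, 870, 831],
    [1786, 1795, 1796, 1769, 1720, 1787, 1714, 1756, 1747, 1782, 1703, 1765],
    [986, 947, 953, 943, 935, 975, 912, 927, 926, 909, 924, 960, 991, 920, 991],
    [1483, 1465, 1408, 1404, 1483, 1492, 1465],
    [781, 755, 759, 742, 790, 751, 758, 775, 763, 778, 772, 767, 732, 773],
    [1588, 1553, 1595, 1528, 1585, 1545, 1591, 1580, 1591],
    [875, 870, 819, 868, 895, 884, 851, 877, 858, 852, 848, 832, 882],
    [1536, 1531, 1525, 1591, 1551, 1555, 1575, 1518, 1502, 1504, 1532, 1576],
    [1640, 1644, 1649, 1692, 1635, 1619, 1685, 1697, 1660, 1671, 1603, 1692, 1697],
    [1437, 1427, 1439, 1418, 1403, 1442, 1495, 1408, 1442, 1457, 1478, 1484, 1410, 1408, 1452],
]

# Sum of the per-SAg maximum costs, used as the constant U of the fitness function.
U = sum(max(sag) for sag in TABLE1)

# Local selection: pick the cheapest service of every SAg independently (1-based index).
local_selection = [sag.index(min(sag)) + 1 for sag in TABLE1]
local_cost = sum(min(sag) for sag in TABLE1)

print(U)               # 24385
print(local_cost)      # 22777
print(U - local_cost)  # 1608
print(local_selection)
```

The difference U minus the local cost is the fitness that any GA result has to beat once discounts are taken into account.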
For Eq. (3), we take the sum of the maximum costs of all SAg as the constant U, which gives U = $24385. The individual fitness can then be regarded as how much money is saved after the SF execution if the travel agency has been given the amount U to arrange the traveler’s trip. If we use the local selection strategy, we get the following final service sequence:

{ SAg1:WS7, SAg2:WS8, SAg3:WS5, SAg4:WS3, SAg5:WS12, SAg6:WS15, SAg7:WS5, SAg8:WS11, SAg9:WS1, SAg10:WS1, SAg11:WS5, SAg12:WS11, SAg13:WS10, SAg14:WS4, SAg15:WS13, SAg16:WS4, SAg17:WS3, SAg18:WS9, SAg19:WS11, SAg20:WS5 }
Its overall cost is the sum of the minimum costs of all SAg, which equals $22777. In this case, the individual fitness is $1608. According to some given business rules, we define constraints and the corresponding actions (see Table 2). For a special individual, the fitness is modified with the given discount. An invalid individual is discarded. The expression {B(SAgi) = m, B(SAgj) = n} means that SAg i binds to its m-th service and SAg j binds to its n-th service concurrently.

Table 2. Constraint library

No.  CONSTRAINT                                               TYPE     ACTION
1    { B(SAg1)=1, B(SAg3)=2 }                                 Special  Modify (Discount=0.1)
2    { B(SAg12)=2, B(SAg15)=2 }                               Special  Modify (Discount=0.2)
3    { B(SAg2)=2, B(SAg4)=3, B(SAg5)=4 }                      Special  Modify (Discount=0.2)
4    { B(SAg9)=2, B(SAg18)=5 }                                Special  Modify (Discount=0.1)
5    { B(SAg8)=2, B(SAg10)=5 }                                Special  Modify (Discount=0.2)
6    { B(SAg3)=6, B(SAg5)=4, B(SAg7)=3, B(SAg11)=5 }          Invalid  Discard
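One way to evaluate the constraint library of Table 2 against a chromosome is sketched below in Python. The paper does not spell out how a discount is applied, so the rule used here (the discount fraction is taken off the summed cost of the services named in the matching constraint) is an assumption of this sketch and is not claimed to reproduce the reported fitness values. The names `CONSTRAINTS`, `matches`, `is_valid`, and `discounted_cost`, the uniform example costs, and the use of 1-based service numbers as written in Table 2 are likewise assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Each entry: ({SAg number: service number}, discount); discount None marks "Invalid".
CONSTRAINTS: List[Tuple[Dict[int, int], Optional[float]]] = [
    ({1: 1, 3: 2}, 0.1),
    ({12: 2, 15: 2}, 0.2),
    ({2: 2, 4: 3, 5: 4}, 0.2),
    ({9: 2, 18: 5}, 0.1),
    ({8: 2, 10: 5}, 0.2),
    ({3: 6, 5: 4, 7: 3, 11: 5}, None),
]


def matches(chrom: Sequence[int], bindings: Dict[int, int]) -> bool:
    """True if every SAg named in the constraint is bound to the listed service."""
    return all(chrom[sag - 1] == ws for sag, ws in bindings.items())


def is_valid(chrom: Sequence[int]) -> bool:
    """An individual is discarded if it matches any constraint of type Invalid."""
    return not any(d is None and matches(chrom, b) for b, d in CONSTRAINTS)


def discounted_cost(costs: Sequence[Sequence[float]], chrom: Sequence[int]) -> float:
    """Total cost after subtracting the discount of every matching Special constraint."""
    total = float(sum(sag[k - 1] for sag, k in zip(costs, chrom)))
    for bindings, discount in CONSTRAINTS:
        if discount is not None and matches(chrom, bindings):
            # Assumed rule: the discount applies to the services named in the constraint.
            total -= discount * sum(costs[sag - 1][ws - 1] for sag, ws in bindings.items())
    return total


# Hypothetical uniform costs: 20 SAg with 15 services of cost 100 each.
flat_costs = [[100] * 15 for _ in range(20)]
special = [1] * 20
special[2] = 2  # SAg1 -> WS1 and SAg3 -> WS2 trigger constraint 1
print(discounted_cost(flat_costs, special))  # 2000 minus 10% of the two services
```

The `is_valid` predicate has the shape expected by the validity checks of population initialization, crossover, and mutation described in Section 3.3.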
In this paper, the GA parameters are set as follows:

• Solution representation: integer encoding (serial number of services in SAg starting from zero)
• Population size: popsize = 40
• Crossover probability: pc = 0.8
• Mutation probability: pm = 0.05
• Maximum generations: Genmax = 2000
4.2 Evolution Procedure
The initial population is randomly generated. After Genmax generations, we obtain the final optimal solution:

[1 2 2 3 4 15 5 2 2 5 5 2 10 4 2 4 3 5 11 5]
It is also represented as

{ SAg1:WS1, SAg2:WS2, SAg3:WS2, SAg4:WS3, SAg5:WS4, SAg6:WS15, SAg7:WS5, SAg8:WS2, SAg9:WS2, SAg10:WS5, SAg11:WS5, SAg12:WS2, SAg13:WS10, SAg14:WS4, SAg15:WS2, SAg16:WS4, SAg17:WS3, SAg18:WS5, SAg19:WS11, SAg20:WS5 }.

The solution fitness is $3193.

Fig. 4. (a) GA evolution process (b) Multi-time GA optimization
[Figure: panel (a) has the horizontal axis “Evolution generation” (0 to 2000); panel (b) has the horizontal axis “Retry time” (0 to 100). The vertical axes are labeled “Fitness of the best individual” and “Cost ($)” and range from 0 to 4000.]
Fig. 4(a) shows the GA evolution process. The best fitness of the population increases rapidly at the beginning of the evolution process and then converges slowly. This also means that the overall cost of the SF generally decreases as the evolution proceeds. To obtain a better solution, the whole optimization process can be repeated a number of times (100 times in this paper, with a different initial population each time), and the best of all final solutions is selected as the ultimate solution to the service selection problem. Fig. 4(b) shows the result. The red line represents the solution obtained with the local selection strategy. The blue curve represents the best solutions of the 100 GA runs. All best solutions obtained with GA have better fitness than the solution obtained with the local selection strategy. In addition, the similarity of their fitness values indicates that our GA can find the global optimal solution to the service selection problem.
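The repeated-run procedure can be sketched as a small, self-contained Python program. This is an illustrative sketch under several assumptions rather than the authors' implementation: constraints and discounts are left out, the mutation probability is applied once per offspring, each run gets its own seed to obtain a different initial population, and the names `run_ga` and `best_of_runs` as well as the example costs are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def run_ga(costs: Sequence[Sequence[float]], U: float, rng: random.Random,
           popsize: int = 40, pc: float = 0.8, pm: float = 0.05,
           generations: int = 200) -> Tuple[List[int], float]:
    """One GA run without constraints; returns (best chromosome, best fitness)."""
    sizes = [len(sag) for sag in costs]

    def fit(ch: List[int]) -> float:
        return max(U - sum(sag[k - 1] for sag, k in zip(costs, ch)), 0)

    def select(pop: List[List[int]], fits: List[float]) -> List[int]:
        total = sum(fits)
        if total <= 0:
            return rng.choice(pop)
        pick, acc = rng.uniform(0, total), 0.0
        for ind, f in zip(pop, fits):
            acc += f
            if pick < acc:
                return ind
        return pop[-1]

    pop = [[rng.randint(1, n) for n in sizes] for _ in range(popsize)]
    for _ in range(generations):
        fits = [fit(ch) for ch in pop]
        elite = list(pop[max(range(popsize), key=fits.__getitem__)])
        children: List[List[int]] = []
        while len(children) < popsize:
            a, b = list(select(pop, fits)), list(select(pop, fits))
            if rng.random() < pc:  # single-point crossover
                cut = rng.randint(1, len(sizes) - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for ch in (a, b):
                if rng.random() < pm:  # rebind one SAg to a different service
                    i = rng.randrange(len(sizes))
                    if sizes[i] > 1:
                        ch[i] = rng.choice([k for k in range(1, sizes[i] + 1) if k != ch[i]])
                children.append(ch)
        children = children[:popsize]
        child_fits = [fit(ch) for ch in children]
        # Elitism: the fittest parent replaces the least fit offspring.
        children[min(range(popsize), key=child_fits.__getitem__)] = elite
        pop = children
    fits = [fit(ch) for ch in pop]
    best = max(range(popsize), key=fits.__getitem__)
    return pop[best], fits[best]


def best_of_runs(costs, U, runs: int = 100, seed: int = 0, **ga_kwargs):
    """Repeat the GA with different seeds and keep the fittest final solution."""
    results = [run_ga(costs, U, random.Random(seed + r), **ga_kwargs) for r in range(runs)]
    return max(results, key=lambda r: r[1]), results


# Hypothetical four-step flow; U is the sum of the per-SAg maximum costs.
demo_costs = [[10, 12, 9], [20, 18], [5, 7, 6, 4], [3, 8]]
demo_U = sum(max(sag) for sag in demo_costs)
(best_chrom, best_fit), all_runs = best_of_runs(
    demo_costs, demo_U, runs=5, seed=1, popsize=10, generations=20)
print(best_chrom, best_fit)
```

Because of elitism, the best fitness within a run never decreases from one generation to the next; taking the maximum over the runs then mirrors the selection of the ultimate solution described above.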
5 Conclusion and Future Work

A single web service is most likely inadequate to serve customers’ business needs; it takes a selection of various web services composed together to form a business process. Cost is the primary concern of many business processes. In this paper, we propose a new solution that uses a Genetic Algorithm (GA) for cost-driven web service selection. A web services composition is represented as an integer string, and the best composition is the string that leads to the lowest cost. Service selection is an optimization process that takes into account the relationships among the services. The experimental results show that GA achieves better performance than the local service selection strategy. The global optimal solution can be achieved with GA within a short time. In the future, we will consider more QoS factors besides cost as objectives of service selection and implement multi-objective global optimization using GA.
References

[1] F. Curbera et al. Unraveling the web services: an introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, Mar/Apr issue 2002.
[2] B. Benatallah, M. Dumas, Q. Z. Sheng, and A. H. Ngu. Declarative composition and peer-to-peer provisioning of dynamic web services. In Proceedings of the International Conference on Data Engineering (ICDE), pages 297–308, San Jose CA, USA, February 2002. IEEE Press.
[3] J. O'Sullivan, D. Edmond, and A. ter Hofstede. What's in a Service? Distributed and Parallel Databases, 12(2–3):117–133, September 2002.
[4] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach (Second Edition). Prentice Hall, 2002.
[5] Y. Liu, A. H. H. Ngu, and L. Zeng. QoS Computation and Policing in Dynamic Web Service Selection. In Proc. 13th Int. Conf. World Wide Web (WWW), May 2004.
[6] L.-J. Zhang and B. Li. Requirements Driven Dynamic Services Composition for Web Services and Grid Solutions. Journal of Grid Computing, Vol. 2, No. 2 (June 2004), pp. 121–140.
[7] FIPA Specifications (http://www.fipa.org)