2014 Fifth International Conference on Intelligent Systems, Modelling and Simulation
A New Dataset and Benchmark for Cloud Computing Service Composition
Amin Jula1, Hamid Nilsaz2, Elankovan Sundararajan3, Zalinda Othman4
1,3 Data Mining and Optimization Research Group, Centre for Artificial Intelligence, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi, 43600 Selangor, Malaysia
2 Department of Mathematics, Mahshahr branch, Islamic Azad University, Mahshahr, Iran
4 Centre of Software Technology and Management, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi, 43600 Selangor, Malaysia
{amin.jula, hamid.nilsaz}@gmail.com, {elan, zalinda}@ftsm.ukm.my
matching and security are only some of these problems. What brings success to a cloud supplier is customer satisfaction, and the quality of the provided services plays a significant role in it. Accordingly, maximizing quality of service (QoS) should be considered an ultimate goal for cloud suppliers. Reaching this goal requires a pre-defined procedure for selecting every required service from among a great number of similar services offered by many service providers. Finding an optimal solution for each required complicated service is therefore an NP-hard optimization problem. Hence, classic methods cannot solve the problem in reasonable time, and heuristic algorithms that reach near-optimal solutions in acceptable time are needed.

Since 2009, a considerable amount of research has been carried out, and researchers have proposed algorithms based on different algorithmic approaches. Many efforts have focused on optimizing classic algorithms and customizing them for the CCSC problem; work on graph-based algorithms [5-8] and basic non-heuristic methods [9-16] falls into this category. Effective attempts have also been made with intelligent algorithms and heuristics [17-25]. What is clearly missing, however, is a reliable and common benchmark for comparing the results of the proposed methods. Although some studies have used one or more datasets (e.g. QWS [26], WSDREAM [27] and tpds2012 [27]), it is very difficult, and sometimes impossible, to compare the quality and error rate of the obtained results, because common CCSC problems with identified optimal solutions are not used.

In this paper, a set of reliable CCSC problems called "CCSC_Benchmark" is introduced. CCSC_Benchmark includes problems of different sizes generated randomly from the WSDREAM dataset.
For each problem, the optimal solution is also specified to help algorithm designers evaluate their proposed methods and calculate the error rate. It is hoped that CCSC_Benchmark will be used by all researchers in the field, paving the way for future work.
Abstract- Cloud computing, as an effective computing approach, is increasingly attracting the attention of users with heavy processing demands. Cloud suppliers try to leave no service request unanswered by drawing on a large number of service providers. Selecting optimal unique services in cloud computing service composition (CCSC) is a noteworthy problem that must be addressed accurately and scrupulously. Considerable work has recently been done on solving CCSC; however, the lack of widely accepted and reliable CCSC problems for fair comparison of the proposed methods is a blind spot and should be treated as a high-priority problem. In this paper, a reliable set of ten CCSC problems (CCSC_Benchmark) is introduced, prepared for use as a "standard test" in future work. To complete the usability of the problem set, the best solutions of the generated problems are also found and presented, so that the error rate of proposed methods can be calculated.
Keywords-Cloud Computing; Service Composition; Service Selection; Benchmark; Best Practices; Standard Test; Dataset; Quality of Service Parameters; QoS
I. INTRODUCTION
In recent years, the wide introduction of cloud computing [1, 2] has drawn significant attention to its capabilities and facilities. Applying cloud computing requires research on the various issues it raises [3, 4]. Hence, researchers have started studying how to develop and improve different aspects of clouds. One important consequence of the global adoption of cloud computing is the emergence of the composite service concept. The demand for vastly different complicated services makes it impossible for service providers to prepare every required service individually. Therefore, cloud suppliers need service composition systems that are always ready to provide any required complicated service by putting together available services. Cloud computing service composition (CCSC) also faces several problems. Automaticity, functionality
2166-0662/14 $31.00 © 2014 IEEE DOI 10.1109/ISMS.2014.22
describe how WSDREAM is used to extract a dataset without missing values, and the second, B, will discuss how CCSC_Benchmark is generated from the extracted dataset.
The remainder of the paper is organized as follows. The cloud computing service composition problem is described in part II. Part III explains how the missing values of the dataset are imputed and how the proposed benchmark is generated. Finally, the conclusion is presented in part IV.
A. Imputation of Missing Values
As established in [28], response time (RT) and Cost are two of the most important QoS parameters affecting the quality of service in CCSC. Hence, the benchmark focuses on RT and Cost. WSDREAM is the dataset used, because its RT values are real-world data. However, it needs improvement because it contains a significant number of missing values: for many service providers, some variable values are missing. Since there are only 339 service providers, excluding all of these incomplete cases would reduce the size of the dataset and lead to a loss of power in data analysis [29]. A pragmatic way to cope with missing data is to impute these values. Imputation yields a complete dataset and makes it possible to apply analysis methods that require one [30]. A variety of traditional methods have been proposed to deal with missing values, including complete case analysis, mean substitution, regression imputation and so on. Multiple imputation is a general solution to the problem of missing data and can be considered an innovative approach compared with the traditional methods [31].
II. CLOUD COMPUTING SERVICE COMPOSITION PROBLEM (CCSC)
Providing an optimal composite service requires selecting an optimal candidate for every required single service. Whether a service is optimal depends on the customer constraints and the service QoS values. Because most single services are offered by a great number of service providers, and because the service pool contains an abundance of unique single services, selecting the optimal single services to be combined into a composite service is one of the most difficult problems in service composition; here it is called CCSC. The service composition problem in cloud computing can be stated as how to select unique single services from all services in the service pool such that the resulting composite service satisfies both the QoS requirements and the expected functionalities, on the basis of customer requirements. Due to the size of the search space, CCSC is treated as an NP-hard optimization problem. Suppose that composite service i, CS_i, is obtained by composing n single services SS_j, as defined in (1).
$$CS_i = [SS_1, SS_2, \ldots, SS_j, \ldots, SS_n] \tag{1}$$
If k QoS parameters are to be considered in finding the optimal single services, the quality of CS_i is calculated by (2), which serves as the objective function.
$$Q(CS_i) = \big(Q(SS_1), Q(SS_2), \ldots, Q(SS_n)\big) \tag{2}$$
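As a rough illustration of (1) and (2), the following Python sketch represents a composite service as a list of single services, each carrying k = 2 QoS values (RT and Cost). The class name, the function names and the search-space helper are illustrative assumptions and are not part of CCSC_Benchmark.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SingleService:
    """One concrete single service SS_j with k = 2 QoS values (assumed layout)."""
    name: str
    rt: float    # response time
    cost: float  # cost

    def qos(self) -> Tuple[float, float]:
        # Q(SS_j): the QoS vector of one single service
        return (self.rt, self.cost)


def composite_quality(cs: List[SingleService]) -> Tuple[Tuple[float, float], ...]:
    """Q(CS_i) in the sense of (2): the tuple of the QoS vectors of the selected services."""
    return tuple(ss.qos() for ss in cs)


def search_space_size(candidates_per_task: List[int]) -> int:
    """Number of distinct composite services when task j has candidates_per_task[j] candidates."""
    size = 1
    for count in candidates_per_task:
        size *= count
    return size


# Hypothetical composite service built from two single services.
cs_example = [SingleService("s1", 0.2, 3.0), SingleService("s2", 0.5, 1.0)]
quality = composite_quality(cs_example)
```

The helper `search_space_size` makes the growth of the search space concrete: with the same number of candidates for every task, the number of possible compositions grows exponentially in the number of tasks.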
In CCSC, the ultimate goal is to find the optimal composite service, that is, the one whose quality of service is maximized.
Figure 1. Summary of missing values of the WSDREAM dataset for RT
As an exploratory analysis before the imputation, we analyzed the patterns of missing values to obtain the number and percentage of missing values for each service and to detect the services with the highest percentage of missing values. The pie charts in Figure 1 show the number of services, service providers, and individual data values that have missing values. As shown in Figure 1, 4380 of the 5825 services (75.19%) have missing values, and missing values make up 5.1% of the whole dataset.
III. GENERATING THE BENCHMARK
As stated earlier, it is extremely important to provide a set of problems that all researchers accept for assessing their proposed methods and techniques. This part describes how CCSC_Benchmark (see footnote 1) is generated and how it can serve as a common evaluation criterion in CCSC research. It contains two subsections. The first, A, will
1 http://www.clouds-research.net/CCSC_Benchmark
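$$
\begin{aligned}
\mathrm{Optimal\_Solution}_{ResponseTime}(P) &= \mathrm{Min}\Big(\sum_{\text{all selected services } i} RT_i\Big) \\
\mathrm{Optimal\_Solution}_{Cost}(P) &= \mathrm{Min}\Big(\sum_{\text{all selected services } i} Cost_i\Big) \\
\mathrm{Optimal\_Solution}_{RT,Cost}(P) &= \mathrm{Min}\Big(\sum_{\text{all selected services } i} (0.5 \times RT_i + 0.5 \times Cost_i)\Big)
\end{aligned}
\tag{5}
$$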
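The three optimal solutions in (5) can be sketched in Python as follows, assuming that the objective is additive over the selected services and that exactly one candidate is chosen per required service. The toy problem, the function names and the relative-gap definition of the error rate are hypothetical and serve only as an illustration; they are not taken from CCSC_Benchmark.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Candidate = Tuple[float, float]  # (RT, Cost) of one candidate service


def optimal_solution(problem: Sequence[Sequence[Candidate]],
                     score: Callable[[Candidate], float]) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustively search all compositions.

    Returns (best objective value, index of the chosen candidate per task),
    where the objective is the sum of score(candidate) over the selection.
    """
    best_value = float("inf")
    best_choice: Tuple[int, ...] = ()
    for choice in product(*(range(len(task)) for task in problem)):
        value = sum(score(problem[t][c]) for t, c in enumerate(choice))
        if value < best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice


def error_rate(found: float, optimal: float) -> float:
    """Relative gap between a method's result and the known optimum (assumed definition)."""
    return (found - optimal) / optimal


# Hypothetical toy problem: 3 required services with 2 to 3 candidates each.
toy_problem = [
    [(0.4, 5.0), (0.9, 2.0)],
    [(1.2, 1.0), (0.3, 4.0), (0.6, 3.0)],
    [(0.5, 2.0), (0.7, 1.0)],
]

best_rt = optimal_solution(toy_problem, lambda s: s[0])
best_cost = optimal_solution(toy_problem, lambda s: s[1])
best_both = optimal_solution(toy_problem, lambda s: 0.5 * s[0] + 0.5 * s[1])
```

Under these assumptions the objective is a plain sum, so choosing the best candidate for each task independently would give the same optimum; the exhaustive search is kept because it still applies when constraints couple the tasks, at the price of a search space that grows exponentially.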
This work was partly supported by the Ministry of Higher Education Malaysia [grant number ERGS/1/2013/ICT07/UKM/02/3]. The authors would like to express their gratitude for the research grant.
REFERENCES
[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, et al., "A View of Cloud Computing," Communications of the ACM, vol. 53, pp. 50-58, Apr 2010.
[2] B. Hayes, "Cloud computing," Communications of the ACM, vol. 51, pp. 9-11, Jul 2008.
[3] P. Mell and T. Grance, "The NIST Definition of Cloud Computing," National Institute of Standards and Technology, U.S. Department of Commerce, 2011.
[4] T. Dillon, W. Chen, and E. Chang, "Cloud Computing: Issues and Challenges," in Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, 2010, pp. 27-33.
[5] N. R. Sabar, M. Ayob, R. Qu, and G. Kendall, "A graph coloring constructive hyper-heuristic for examination timetabling problems," Applied Intelligence, vol. 37, pp. 1-11, 2012.
[6] H. Liu, Z. B. Zheng, W. M. Zhang, and K. J. Ren, "A Global Graph-based Approach for Transaction and QoS-aware Service Composition," KSII Transactions on Internet and Information Systems, vol. 5, pp. 1252-1273, Jul 2011.
[7] J. Gekas and M. Fasli, "Automatic Web service composition based on graph network analysis metrics," in On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE, Pt 2, Proceedings, vol. 3761, R. Meersman, Z. Tari, M. S. Hacid, J. Mylopoulos, B. Pernici, O. Babaoglu, et al., Eds. Berlin: Springer-Verlag, 2005, pp. 1571-1587.
[8] J. Huang, Y. B. Liu, R. Z. Yu, Q. Duan, and Y. Tanaka, "Modeling and Algorithms for QoS-Aware Service Composition in Virtualization-Based Cloud Computing," IEICE Transactions on Communications, vol. E96B, pp. 10-19, Jan 2013.
[9] K. Kofler, I. ul Haq, and E. Schikuta, "A Parallel Branch and Bound Algorithm for Workflow QoS Optimization," in Parallel Processing, 2009. ICPP '09. International Conference on, 2009, pp. 478-485.
[10] M. Liu, M. R. Wang, W. M. Shen, N. Luo, and J. W. Yan, "A quality of service (QoS)-aware execution plan selection approach for a service composition process," Future Generation Computer Systems, vol. 28, pp. 1080-1089, Jul 2012.
[11] W. Shangguang, Z. Zibin, S. Qibo, Z. Hua, and Y. Fangchun, "Cloud model for service selection," in Computer Communications Workshops (INFOCOM WKSHPS), 2011 IEEE Conference on, 2011, pp. 666-671.
B. Generating the CCSC_Benchmark
CONCLUSION
Service composition, as one of the most important problems in cloud computing, has motivated researchers to propose novel techniques and methods for finding optimal solutions in less time. Despite the considerable research conducted, the lack of suitable benchmarks offering a common basis for evaluation and comparison has led researchers to create their own differing datasets and problems. In this paper, a new dataset is introduced by imputing the missing response-time values of WSDREAM and generating a Cost dataset of the same size as the improved WSDREAM; the new dataset therefore contains two sets of values, one for response time and one for cost. Then, based on the generated dataset, a set of CCSC problems called CCSC_Benchmark is generated and made available at http://www.clouds-research.net/CCSC_Benchmark to provide a common criterion for comparing proposed algorithms and techniques.
(4)
ACKNOWLEDGMENT
In the first step, services with 30% or more missing values (285 services) were excluded from the analysis. This led to a new dataset containing 5540 services. For multiple imputation, an iterative Markov chain Monte Carlo method with assumption is used. To make the imputation as effective as possible, the method is iterated 5 times, where 5 corresponds to the percentage of missing cases in the dataset [32]. To prepare Cost values for the dataset and the benchmark, an array of assumed Cost values was generated randomly so that the RT array and the Cost array have the same size. The result is a reliable dataset that contains neither missing values nor outliers, and that provides RT and Cost values together so as to cover both of the most important QoS parameters.
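The exploratory summary, the 30% exclusion rule and the generation of a Cost array of matching size can be sketched in plain Python as below. This is only an illustration under stated assumptions: the RT matrix is taken to have one row per service provider and one column per service, `None` marks a missing value, and the Cost range and seed are hypothetical. The Markov chain Monte Carlo multiple imputation itself is not implemented here.

```python
import random
from typing import Dict, List, Optional

# Assumed layout: rows are service providers, columns are services, None = missing.
Matrix = List[List[Optional[float]]]


def missing_fraction_per_service(rt: Matrix) -> List[float]:
    """Fraction of missing RT values in each service column."""
    rows, cols = len(rt), len(rt[0])
    return [sum(rt[r][c] is None for r in range(rows)) / rows for c in range(cols)]


def summarize_missing(rt: Matrix) -> Dict[str, float]:
    """Count services with at least one missing value and the overall missing percentage."""
    fractions = missing_fraction_per_service(rt)
    total = len(rt) * len(rt[0])
    missing = sum(value is None for row in rt for value in row)
    return {
        "services_with_missing": sum(f > 0 for f in fractions),
        "percent_missing_overall": 100.0 * missing / total,
    }


def drop_sparse_services(rt: Matrix, threshold: float = 0.30) -> Matrix:
    """Exclude service columns whose missing fraction is at or above the threshold."""
    fractions = missing_fraction_per_service(rt)
    keep = [c for c, f in enumerate(fractions) if f < threshold]
    return [[row[c] for c in keep] for row in rt]


def generate_cost(rows: int, cols: int, low: float = 1.0,
                  high: float = 100.0, seed: int = 0) -> List[List[float]]:
    """Random Cost array with the same size as the RT array (range and seed are hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]


# Hypothetical 4 x 3 RT matrix with some missing values.
rt_example: Matrix = [
    [1.0, None, 0.5],
    [2.0, None, None],
    [3.0, 1.0, 0.7],
    [1.5, 0.2, 0.9],
]
summary = summarize_missing(rt_example)
filtered = drop_sparse_services(rt_example)
cost = generate_cost(len(filtered), len(filtered[0]))
```

In this sketch the remaining missing entries of `filtered` would still have to be imputed before the RT and Cost arrays can be used together.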
IV.
(3)
[12] Y. Zhu, W. Li, J. Luo, and X. Zheng, "A novel two-phase approach for QoS-aware service composition based on history records," in Service-Oriented Computing and Applications (SOCA), 2012 5th IEEE International Conference on, 2012, pp. 1-8.
[13] Z. ur Rehman, F. K. Hussain, and O. K. Hussain, "Towards Multi-criteria Cloud Service Selection," in Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2011 Fifth International Conference on, 2011, pp. 44-48.
[14] Y. Qi and A. Bouguettaya, "Efficient Service Skyline Computation for Composite Service Selection," Knowledge and Data Engineering, IEEE Transactions on, vol. 25, pp. 776-789, 2013.
[15] Q. Wu, M. Zhang, R. Zheng, Y. Lou, and W. Wei, "A QoS-Satisfied Prediction Model for Cloud-Service Composition Based on a Hidden Markov Model," Mathematical Problems in Engineering, vol. 2013, p. 7, 2013.
[16] R. Karim, D. Chen, and A. Miri, "An End-to-End QoS Mapping Approach for Cloud Service Selection," in Services (SERVICES), 2013 IEEE Ninth World Congress on, 2013, pp. 341-348.
[17] A. Jula, Z. Othman, and E. Sundararajan, "A hybrid imperialist competitive-gravitational attraction search algorithm to optimize cloud service composition," in Memetic Computing (MC), 2013 IEEE Workshop on, 2013, pp. 37-43.
[18] Z. Ye, X. Zhou, and A. Bouguettaya, "Genetic Algorithm Based QoS-Aware Service Compositions in Cloud Computing," in Database Systems for Advanced Applications, vol. 6588, J. Yu, M. Kim, and R. Unland, Eds. Springer Berlin Heidelberg, 2011, pp. 321-334.
[19] S. A. Ludwig, "Clonal selection based genetic algorithm for workflow service selection," in Evolutionary Computation (CEC), 2012 IEEE Congress on, 2012, pp. 1-7.
[20] Y. Yang, Z. Mi, and J. Sun, "Game theory based iaas services composition in cloud computing environment," Advances in Information Sciences and Service Sciences, vol. 4, pp. 238-246, 2012.
[21] H. Jiang, C. K. Kwong, Z. Chen, and Y. C. Ysim, "Chaos particle swarm optimization and T–S fuzzy modeling approaches to constrained predictive control," Expert Systems with Applications, vol. 39, pp. 194-201, Jan 2012.
[22] J. X. Liao, Y. Liu, J. Y. Wang, and X. M. Zhu, "Service Composition Based on Niching Particle Swarm Optimization in Service Overlay Networks," KSII Transactions on Internet and Information Systems, vol. 6, pp. 1106-1127, Apr 2012.
[23] S. G. Wang, Q. B. Sun, H. Zou, and F. C. Yang, "Particle Swarm Optimization with Skyline Operator for Fast Cloud-based Web Service Composition," Mobile Networks & Applications, vol. 18, pp. 116-121, Feb 2013.
[24] X. Zhao, Z. Wen, and X. Li, "QoS-aware web service selection with negative selection algorithm," Knowledge and Information Systems, pp. 1-25, 2013.
[25] Q. Lie, W. Yan, and M. A. Orgun, "Cloud Service Selection Based on the Aggregation of User Feedback and Quantitative Performance Assessment," in Services Computing (SCC), 2013 IEEE International Conference on, 2013, pp. 152-159.
[26] E. Al-Masri and Q. H. Mahmoud, "Discovering the best web service: A neural network-based solution," in Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on, 2009, pp. 4250-4255.
[27] Z. Zibin, Z. Yilei, and M. R. Lyu, "Distributed QoS Evaluation for Real-World Web Services," in Web Services (ICWS), 2010 IEEE International Conference on, 2010, pp. 83-90.
[28] A. Jula, E. Sundararajan, and Z. Othman, "Cloud Computing Service Composition: A Systematic Literature Review," Expert Systems with Applications, 2013, doi: http://dx.doi.org/10.1016/j.eswa.2013.12.017.
[29] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, 2nd ed. New York: John Wiley & Sons, 2002.
[30] D. S. Bouhlila and F. Sellaouti, "Multiple imputation using chained equations for missing data in TIMSS: a case study," Large-scale Assessments in Education, vol. 1, pp. 1-33, 2013.
[31] J. A. C. Sterne, I. R. White, J. B. Carlin, M. Spratt, P. Royston, M. G. Kenward, et al., "Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls," Research Methods & Reporting, vol. 339, pp. 157-160, 2009.
[32] P. Royston and I. R. White, "Multiple Imputation by Chained Equations (MICE): Implementation in Stata," Journal of Statistical Software, vol. 45, pp. 1-20, 2011.