A Semantic-based Software-as-a-Service (SaaS) Discovery and Selection System

Y. M. Afify, I. F. Moawad, N. L. Badr, M. F. Tolba
Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
[email protected], [email protected], dr.nagwabadr, [email protected]

Abstract- With the proliferation of Software-as-a-Service in the cloud environment, users find it tiresome and time consuming to search for the right service that satisfies all their requirements. In addition, services may provide the same functionality but differ in their characteristics or in the Quality of Service (QoS) attributes they offer. In this paper, we propose a semantic-based system that facilitates the SaaS publication, discovery and selection processes. To achieve these goals, we developed a unified ontology that combines services domain knowledge, SaaS characteristics, QoS metrics and real SaaS offers. A hybrid service matchmaking algorithm is introduced based on the proposed ontology. It integrates semantic-based metadata and ontology-based matching. Prototypical implementation results demonstrate the effectiveness of the proposed system.

Keywords- Cloud Computing; SaaS; Publication; Discovery; Selection; Services Characteristics; QoS; Services Ontology
I. INTRODUCTION

Cloud computing has become a new paradigm for the provision of computing infrastructure (IaaS), platform (PaaS) or software (SaaS) as a service [1]. Cloud service identification and discovery remain a hard problem due to different service descriptions, non-standardized naming conventions and diverse features of cloud services [2, 3]. Even after services are successfully discovered, evaluating and comparing the available cloud offerings is a tedious and time-consuming task for users [4, 5]. Cloud providers typically publish their service descriptions, pricing policies and Service Level Agreement (SLA) rules on their websites in various formats. Hence, it is not easy to manually obtain and compare service configurations from cloud providers' websites and documentation [2].

Analysis of existing research work on cloud services revealed some crucial limitations: the need for a detailed ontology for each cloud computing layer; the incompatibility and lack of standardization in service publication; a selection process that depends solely on QoS attributes (except in [6]); and, finally, the absence of a complete system that tackles the publication, discovery and selection of SaaS services.
In order to address the above limitations, we propose a complete semantic-based system that governs the publication, discovery and selection of cloud SaaS services, achieving the following objectives:
• Developing a unified ontology that serves as a repository for real SaaS offers.
• Introducing a hybrid cloud service matchmaking algorithm based on the proposed unified ontology.
• Utilizing both the cloud service characteristics and QoS metrics in the selection phase.
• Assessing the effectiveness of the proposed system via prototypical implementation.
The remainder of this paper is organized as follows. Section 2 surveys related work. Section 3 introduces the unified ontology. Section 4 presents the proposed system architecture with a detailed description of its workflow. Section 5 presents the proposed hybrid service matchmaking algorithm in detail. The implementation details and evaluation are presented in Section 6. Finally, the conclusion and future work are presented in Section 7.

II. RELATED WORK

A. Cloud Service Discovery

Agent-based cloud computing is proposed in [7] to aid the development of software tools for cloud resource management. However, QoS parameters were not considered. In [8], cloud services are annotated for semantic-based discovery of relevant cloud services. That work depends on WSDL files; consequently, it cannot be directly applied to SaaS offers that differ in their characteristics. The ontology-based discovery architecture presented in [9] provides QoS-aware deployment of appliances. However, it focuses on IaaS service providers only. An unstructured P2P paradigm for service discovery in the cloud is introduced in [10]. However, QoS parameters were not considered in that paradigm. On the other hand, the authors of [11] investigated a QoS-aware service discovery method in an unstructured P2P network. However, practical applications that evaluate the feasibility of the method are left to future research.
A unified business service and cloud ontology with service querying capabilities is proposed in [4] to provide the mapping between business functions and the offered services in the cloud landscape. This work has the following limitations: a) queries are limited to exact matching of the business functions required by the user, and b) queries are expressed in the SPARQL language [12], which greatly limits the use of the ontology to experienced users only. The proposed system overcomes these limitations with the following features: a) it accepts a flat-text user request, which greatly facilitates use of the system and maximizes user acceptance, b) it semantically expands the user request and the service descriptions using WordNet [13], c) it exploits the richness of the ontology to find relevant services with related functionality, and d) it provides characteristics-based and QoS-based selection features.

B. Cloud Service Selection

The CloudCmp framework for cloud service comparison is presented in [14]. However, this approach does not include a complete service selection or recommendation algorithm. IaaS cloud service selection is formalized as a multi-criteria decision making (MCDM) problem in [15]. On the other hand, a QoS model for SaaS ERP is proposed in [16] and used in an MCDM system that recommends the most suitable SaaS ERP to the user. However, cloud characteristics were not considered in the selection process of either [15] or [16].

C. Cloud Service Discovery and Selection

An OWL-S based semantic cloud service discovery and selection system is proposed in [17]. However, QoS parameters were not considered. The SMICloud framework is proposed in [5], together with an Analytical Hierarchy Process (AHP) ranking mechanism for cloud service selection. However, no details were given on the discovery and matching processes, and the metrics focus on quantifiable metrics in the context of IaaS. Notably, existing work mainly focuses on IaaS services.
Despite the evident popularity of SaaS services, only a few studies focus on this subject, and their main objective is service selection. In contrast, the proposed system governs the publication, discovery and selection of cloud SaaS services, which have not received adequate research attention.
III. UNIFIED ONTOLOGY

The proposed ontology merges knowledge about the services domain, SaaS characteristics and QoS metrics, in addition to real offers. It serves as a semantic-based repository across the publication, discovery and selection processes.

The first contribution in this respect is the collection of the most important concepts in the SaaS services domain. The required services domain knowledge was collected from multiple resources: the Business Function Ontology (BFO) framework [18], cloud ontologies [19-21], and industry classification standards (United Nations Standard Products and Services Code® (UNSPSC®) [22] and North American Industry Classification System (NAICS) [23]). The unified ontology currently consists of 650 concepts that represent the domain knowledge for four SaaS application domains: Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), Collaboration and Document Management (DM).

Another contribution in this respect is modeling real service offers according to the developed ontology. Cloud service offers were collected through market research by manually visiting cloud providers' portals.

The ontology is represented in the Web Ontology Language (OWL) [24]. The Protégé 4.1 editor [25] was used to implement the ontology. Fig. 1 shows the main concepts of the unified ontology. Fig. 2 shows a snapshot of the services domain ontology. Business functions are represented as concepts (e.g., Reporting). Object properties are used to connect the service domain concepts and SaaS service concepts; they describe the business processes supported by each service (e.g., provides). SaaS characteristics [3] are represented as concepts, as shown in Fig. 3. They are connected to service offers using object properties (e.g., isFree). Service QoS attributes are described using datatype properties (e.g., hasResponseTime), which specify QoS values guaranteed by the service provider.

Fig. 1. Unified Ontology: Main Concepts

Fig. 2. Unified Ontology: Services Domain Ontology View
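To make the modeling idea concrete, the following is a minimal sketch of how a service offer with its three kinds of links could be held in memory. It is a plain Python stand-in, not OWL and not the authors' implementation: the class `ServiceOffer`, the helper `offers_providing`, and the two example offers are hypothetical, and only the property names `provides`, `isFree` and `hasResponseTime` come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceOffer:
    """Hypothetical in-memory stand-in for one SaaS offer in the ontology."""
    name: str
    # Object property "provides": business function concepts the offer supports.
    provides: set = field(default_factory=set)
    # Characteristic links such as "isFree" (simplified here to key/value pairs).
    characteristics: dict = field(default_factory=dict)
    # Datatype properties such as "hasResponseTime" (values guaranteed by the provider).
    qos: dict = field(default_factory=dict)

def offers_providing(offers, business_function):
    """Return the names of offers linked to a business function via 'provides'."""
    return [o.name for o in offers if business_function in o.provides]

# Two made-up offers, used only for illustration.
offers = [
    ServiceOffer("ExampleCRM", {"Reporting", "LeadManagement"},
                 {"isFree": False}, {"hasResponseTime": 0.8}),
    ServiceOffer("ExampleDM", {"DocumentSharing"},
                 {"isFree": True}, {"hasResponseTime": 1.2}),
]
reporting_offers = offers_providing(offers, "Reporting")
```

In an OWL ontology these links would be stated as property assertions on individuals; the sketch only illustrates which information is attached to an offer.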
Fig. 3. Unified Ontology: Service Characteristics View

IV. SYSTEM ARCHITECTURE

In addition to the web-based user interface and the unified ontology, the proposed system is composed of three main sub-systems. The proposed system architecture is shown in Fig. 4.

A. Service Registration Sub-system

This sub-system is composed of two modules: the catalogue manager and the service clustering modules.

1. Catalogue Manager Module

Cloud service providers register their services through a user-friendly web-based interface based on predefined parameters: provider name, service name, URL, description, features, application domain, price per month, characteristics, SLA. Then, the cloud providers map the service features to ontological concepts retrieved from the unified ontology. The catalogue manager is responsible for the preprocessing stage [26] of the service description. Preprocessing includes tokenization, stop word removal and stemming. It aims at unifying the service descriptions before the matching process. The catalogue manager is also responsible for accepting updates of the registered services. The WordNet ontology is then consulted to expand the service description using the token synonyms. The expanded service description is then stored in the ontology.

2. Service Clustering Module

This module is responsible for clustering the service offers based on their similar functionalities in order to expedite the retrieval of the most relevant SaaS services. The Agglomerative Hierarchical Clustering (AHC) [26] approach is used. If the similarity between two services is above a threshold value, then they belong to the same cluster. To compare two clusters containing many services, the average inter-similarity [26] of the two clusters is computed. The process continues until the clusters do not change for two successive iterations. After clustering, a cluster signature vector is created for each cluster, containing the key terms that best describe the services in this cluster.

B. Service Discovery Sub-system

This sub-system is composed of two modules: the semantic query processor and the functional matching modules.

1. Semantic Query Processor Module

The user enters a flat-text query using a user-friendly web-based interface. As with the service description, the semantic query processor pre-processes and expands the user query.

2. Functional Matching Module

The functional matching module is responsible for matching the expanded user query against the cluster signature vectors. The Vector Space Model (VSM) [26] is exploited to represent the query and cluster signature vectors. The term frequency-inverse document frequency (tf-idf) model [26] is used to calculate the weight vectors using (1):
$$ w_{t,c} = tf_{t,c} \cdot \log \frac{|C|}{|\{c' \in C \mid t \in c'\}|} \tag{1} $$
where $tf_{t,c}$ is the number of occurrences of term t in cluster c, $|C|$ is the total number of clusters and $|\{c' \in C \mid t \in c'\}|$ is the number of clusters that contain the term t. The similarity between cluster signature vector c and query q is calculated using the cosine similarity measure (2):
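$$ Sim(c, q) = \frac{\sum_{t} w_{t,c} \cdot w_{t,q}}{\sqrt{\sum_{t} w_{t,c}^{2}} \cdot \sqrt{\sum_{t} w_{t,q}^{2}}} \tag{2} $$

Fig. 4. Proposed System Architecture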
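The functional matching step can be sketched in a few lines of Python. This is only an illustration of tf-idf weighting (1) followed by cosine similarity (2), not the authors' implementation: the function names, the example cluster signatures and the query are hypothetical, and the query is weighted with the cluster-level document frequencies, which is an assumption the text does not spell out.

```python
import math
from collections import Counter

def tfidf_weights(term_counts, doc_freq, n_clusters):
    """Weight each term as tf * log(N / df), following (1).
    Terms that occur in no cluster are skipped (assumption)."""
    return {t: tf * math.log(n_clusters / doc_freq[t])
            for t, tf in term_counts.items() if doc_freq.get(t)}

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors, following (2)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def best_cluster(signatures, query_terms):
    """Score every cluster signature against the query; return the best match."""
    n = len(signatures)
    counts = {name: Counter(terms) for name, terms in signatures.items()}
    # Document frequency: number of clusters whose signature contains the term.
    doc_freq = Counter(t for c in counts.values() for t in c)
    query_vec = tfidf_weights(Counter(query_terms), doc_freq, n)
    scores = {name: cosine(tfidf_weights(c, doc_freq, n), query_vec)
              for name, c in counts.items()}
    return max(scores, key=scores.get), scores

# Hypothetical, already pre-processed signatures and query.
signatures = {
    "crm": ["lead", "contact", "report"],
    "erp": ["invoice", "inventory", "report"],
}
winner, scores = best_cluster(signatures, ["lead", "contact"])
```

Note that a term present in every cluster (here "report") receives weight zero, so it does not influence the ranking.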
Services that belong to the cluster with the maximum similarity are retrieved to be processed by the selection sub-system. If the maximum similarity is less than a threshold, the system accepts a refined request from the user.

C. Service Non-Functional Selection Sub-system

Cloud services differ greatly in their characteristics. Several cloud taxonomies [3, 27, 28] describe the common cloud service characteristics. To the best of the authors' knowledge, existing research work [5, 15, 16, 29, 30], with the exception of [6], focuses on QoS-based selection only and neglects the other cloud service characteristics. To complement the existing work, both the characteristics and the QoS metrics of the SaaS cloud services are employed in the selection phase. This sub-system is composed of two modules: characteristics-based filtering and QoS-based ranking.

1. Characteristics-based Filtering Module

The discovered services are filtered according to the characteristics that the user is interested in. Therefore, only the specified characteristics are considered in the matching process. We have a set of k services K = {s1, s2...sk} resulting from the discovery process, where k ≥ 1, and a set of n characteristics C = {c1, c2...cn} selected by the user as a base for non-functional matching, where n ≥ 1. The service characteristic values form the following k × n matrix V, where $v_{i,j}$ represents the value of characteristic j for service i.
$$
V = \begin{bmatrix}
v_{1,1} & v_{1,2} & \cdots & v_{1,n} \\
v_{2,1} & v_{2,2} & \cdots & v_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
v_{k,1} & v_{k,2} & \cdots & v_{k,n}
\end{bmatrix}
$$

with columns ordered as c1, c2...cn.

The user specifies the required service characteristic values R = {r1, r2…rn} from a predefined set extracted from the ontology. The user enters priority weights W = {w1, w2…wn} for the selected characteristics such that the weights sum to 1. The characteristics-based filtering involves comparing the user required values R to the service characteristic values matrix V to filter the discovered services.

In the proposed system, the cloud service characteristic value types are considered. Single value types include string-based (e.g. license) and enumeration types (e.g. openness). The relative ranking of the user required value against the service characteristic value is represented by $v_{i,j}/r_j$ [31]. In the case of a string-based characteristic, two cases exist. First, the required value is the same as the service value. Second, either the values are different or the service value is unknown for this characteristic. The relative ranking value is calculated using (3):

$$ v_{i,j}/r_j = \begin{cases} \ldots, & v_{i,j} = r_j \\ 0, & \text{otherwise} \end{cases} \tag{3} $$

In the case of an enumeration-typed characteristic, the values are positively ordered, i.e. values in a higher position are evaluated as better than the others. The relative ranking value is calculated using (4):

$$ v_{i,j}/r_j = \ldots \tag{4} $$

The characteristic-based matching values are calculated, and the results are then displayed to the user. Consequently, the user can specify x, which represents the number of services to be ranked, where 1 < x ≤ k. Finally, the first x services, ordered using (5), are promoted to the QoS-based ranking module.

$$ \max \ldots \tag{5} $$

2. QoS-based Ranking Module

SaaS selection among services that provide the same functionality but differ in their QoS attributes is a Multi-Criteria Decision-Making (MCDM) problem that involves criteria with interdependent relationships. The Analytical Hierarchy Process (AHP) [32] is used to rank the discovered services with matching characteristics.

In order to choose the best metrics upon which SaaS can be compared, a survey was conducted on recent research work [5, 15, 16, 29-31]. We selectively extracted 21 metrics as our evaluation model. The AHP SaaS ranking problem is modeled in Fig. 5. The hierarchy is for the selection parameter levels (QoS metrics) only, not the alternatives (service offers). Finally, the ranked services are displayed to the user.

Fig. 5. AHP hierarchy for SaaS QoS-based Ranking
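As a rough illustration of how AHP can turn pairwise judgments about QoS metrics into ranking weights, consider the sketch below. It is not the authors' procedure: the row geometric-mean approximation of the priority vector is one common choice and is assumed here, the weighted-sum scoring of services is likewise an assumption, and the comparison matrix, metric ordering and service scores are made up.

```python
import math

def ahp_priorities(pairwise):
    """Approximate the AHP priority vector of a reciprocal pairwise
    comparison matrix using normalized row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def rank_services(metric_scores, weights):
    """Order services by the weighted sum of their normalized metric scores
    (higher is better). The weighted sum is an assumption of this sketch."""
    totals = {name: sum(w * s for w, s in zip(weights, scores))
              for name, scores in metric_scores.items()}
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical judgment: metric 1 is three times as important as metric 2.
pairwise = [
    [1.0, 3.0],
    [1.0 / 3.0, 1.0],
]
weights = ahp_priorities(pairwise)

# Hypothetical normalized scores per service, in the same metric order.
ranking = rank_services({"offerA": [0.9, 0.2], "offerB": [0.5, 0.9]}, weights)
```

With a real hierarchy of 21 metrics, the same computation would be repeated per level, and a consistency check on each comparison matrix would normally be added.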
V. HYBRID SERVICE MATCHMAKING ALGORITHM

In order to cluster cloud SaaS services that provide related functionalities, a hybrid matchmaking algorithm is proposed that makes use of both semantic-based service metadata and ontology-based matching [33]. Fig. 6 presents the pseudocode of the proposed SaaS services matchmaking algorithm.
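Before the individual steps are described, a small Python sketch may help fix ideas about the building blocks this section uses. It is a sketch under stated assumptions, not the authors' code: the function names and default weights are hypothetical, the features similarity is taken as shared features over the smaller feature set as in (6), the information content uses the intrinsic leaves/subsumers form assumed for (8), the Lin measure follows its standard definition as in (7), and the overall score is the weighted sum in (12).

```python
import math

def features_similarity(features_1, features_2):
    """Common features divided by the size of the smaller feature set, as in (6)."""
    if not features_1 or not features_2:
        return 0.0
    return len(features_1 & features_2) / min(len(features_1), len(features_2))

def information_content(leaves, subsumers, max_leaves):
    """Intrinsic information content from taxonomy counts (assumed form of (8)).
    leaves: number of leaves under the concept; subsumers: its relative depth;
    max_leaves: number of leaves under the root."""
    return -math.log((leaves / subsumers + 1) / (max_leaves + 1))

def lin_similarity(ic_1, ic_2, ic_lcs):
    """Lin measure: twice the IC of the lowest common subsumer over the summed ICs."""
    if ic_1 + ic_2 == 0:
        return 0.0
    return 2 * ic_lcs / (ic_1 + ic_2)

def overall_similarity(semantic, features, hierarchical, a=0.4, b=0.3, c=0.3):
    """Weighted combination as in (12); the default weights are hypothetical."""
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("weights a, b and c must sum to 1")
    return a * semantic + b * features + c * hierarchical

# Hypothetical feature sets for two services.
feat = features_similarity({"Reporting", "Invoicing"}, {"Reporting", "Billing", "Payroll"})
root_ic = information_content(leaves=9, subsumers=1, max_leaves=9)
score = overall_similarity(semantic=0.5, features=1.0, hierarchical=0.0)
```

The root concept gets an information content of zero, which matches the intuition that the most general concept carries no information.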
Line 2: Semantic-based Matching

Using (2), the semantic similarity between the expanded service descriptions is calculated, resulting in $Sim_{sem}(s_1, s_2)$.

Lines 3-27: Ontology-based Matching

The novelty of this work is to make use of the richness of the ontology, represented in the object properties, in the context of cloud SaaS service discovery. Service features are modeled using object properties that relate the service concept to business function concepts. Ontology-based similarity matching comprises features and hierarchical similarity matching.

Line 3: Features Similarity

Features similarity denotes the common features provided by the two services. For example, for two services s1 and s2, the features similarity is calculated using (6):

$$ Sim_{feat}(s_1, s_2) = \frac{|OP_{s_1} \cap OP_{s_2}|}{\min(|OP_{s_1}|, |OP_{s_2}|)} \tag{6} $$

where $|OP_{s_1} \cap OP_{s_2}|$ represents the number of common object properties in the two services.

Lines 4-27: Hierarchical Similarity

Hierarchical similarity measures are utilized to find any ontological relationship between the different business functions supported by the two services. In the previous example, suppose that s1 has n unique business functions and s2 has m unique business functions. Semantic similarity between two business functions is calculated using one of the following cases:

Case 1: if two concepts have a child-parent relationship, then they are considered to have the similarity value 1.

Case 2: if two concepts are siblings, then distance similarity is irrelevant. Consequently, content-based similarity is computed using the Lin measure [34], which is calculated using (7):

$$ Sim_{Lin}(c_1, c_2) = \frac{2 \cdot IC(LCS(c_1, c_2))}{IC(c_1) + IC(c_2)} \tag{7} $$

Lin measures the ratio of the Information Content (IC) of the Lowest Common Subsumer (LCS) to the IC of each of the concepts. The computation model proposed in [34] is used. The concept's IC is calculated using (8):

$$ IC(c) = -\log \left( \frac{\dfrac{|leaves(c)|}{|subsumers(c)|} + 1}{max\_leaves + 1} \right) \tag{8} $$

where leaves(c) is the number of leaves of concept c, subsumers(c) is its relative depth, and max_leaves is the number of leaves corresponding to the root node of the hierarchy.

Case 3: the distance-based and content-based similarity models are integrated. The distance-based similarity calculates the minimum path of edges between the two concepts. The Resnik semantic similarity measure [35] is calculated using (9):

$$ Sim_{Res}(c_1, c_2) = 2D - \min\, len(c_1, c_2) \tag{9} $$

where D is the maximum depth of the ontology. Finally, the ontological measures are integrated using (10):

$$ Sim_{ont}(c_1, c_2) = \frac{Sim_{Res}(c_1, c_2) + Sim_{Lin}(c_1, c_2)}{2} \tag{10} $$

The ontological services similarity between s1 and s2 is calculated using (11):

$$ Sim_{hier}(s_1, s_2) = \frac{\sum_{i=1}^{n} \max_{1 \le j \le m} Sim_{ont}(c_i, c_j)}{n} \tag{11} $$

Lines 28-30: Overall Services Similarity

$$ Sim(s_1, s_2) = a \cdot Sim_{sem}(s_1, s_2) + b \cdot Sim_{feat}(s_1, s_2) + c \cdot Sim_{hier}(s_1, s_2) \tag{12} $$

where a, b and c are weights that reflect the importance of each similarity measure, and the weights sum to 1.

Algorithm: SaaS Services Matchmaking
Input: Two services s1 and s2
Output: Overall similarity $Sim(s_1, s_2)$
1. Begin
2. Calculate semantic similarity of service descriptions using (2)
3. Calculate the features similarity using (6)
4. Calculate the hierarchical services similarity using (11)
5. Sum = 0
6. For each concept ci in s1 where 1