2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Classification and Composition of QoS Attributes in Distributed, Heterogeneous Systems

Elisabeth Vinek

Peter Paul Beran and Erich Schikuta

CERN, CH-1211 Genève 23, Switzerland. Email: [email protected]

Workflow Systems and Technology Group University of Vienna Rathausstrasse 19/9, 1010 Vienna, Austria Email: {peter.beran,erich.schikuta}@univie.ac.at

Abstract—In large-scale distributed systems the selection of services and data sources to respond to a given request is a crucial task. Non-functional or Quality of Service (QoS) attributes need to be considered when there are several candidate services with identical functionality. Before applying any service selection optimization strategy, the system has to be analyzed in terms of QoS metrics, comparable to the statistics needed by a database query optimizer. This paper presents a classification approach for QoS attributes of system components, from which aggregation functions for composite services are derived. The applicability and usefulness of the approach is shown in a distributed system from a High-Energy Physics experiment posing a complex service selection challenge.

Index Terms—Service-Oriented Architectures, QoS-based Web Service Selection, Pattern-based Aggregation Functions, Blackboard-based Optimization.

978-0-7695-4395-6/11 $26.00 © 2011 IEEE. DOI 10.1109/CCGrid.2011.53

I. INTRODUCTION

In distributed, service-oriented systems, where a given functionality might be covered by multiple services differing only in non-functional qualities, the selection of a service to respond to a specific request is an increasingly important task. Additionally, responding to complex requests requires composing several atomic services, each of which may have several concrete instances. In order to optimally select and compose services, their environment and behavior need to be understood and represented in detail. Optimality has to be defined on a case-by-case basis, depending on the requirements of a specific system or user. Assuming that services from a single service class have the exact same functionality, it is crucial to take into account their non-functional properties (NFP), also referred to as QoS properties. The questions of how to represent, classify and collect those attributes can be addressed separately from any specific optimization objective. While most of the work in the area of service selection optimization focuses on the actual optimization objective and solution approaches, the present work emphasizes the step preceding the optimization, namely defining and aggregating system statistics according to service composition patterns.

Depending on the considered system and the viewpoint of the optimizer, NFPs can be more or less known. For example, in a highly dynamic environment without a central control instance, where services from various contributors and providers are registered, unregistered and possibly changed regularly, statistics are difficult to gather and analyze in a coherent way. In other scenarios, however, in systems that have well-set boundaries and responsibilities, there can be an abundance of non-functional attributes to be collected and analyzed. In any case, the system and its structure have to be described and understood before applying optimization strategies. In our previous work [1], an ontology defining the components of a distributed system has been developed, along with a taxonomy of service attributes, both functional and non-functional. In this paper, the QoS attributes are studied in more detail and the contributions can be summarized as follows.



• A comprehensive classification of QoS attributes is presented, aiming to help "scan" a distributed system in terms of applicable metrics.
• The aggregation behavior of QoS attributes in composition patterns – i.e. when concrete services are composed – is analyzed.
• The classification of QoS attributes and their aggregation functions are applied to a concrete application scenario, highlighting how goal functions can be derived, and presenting an XML Schema binding to illustrate the applicability of our approach.

The studied application scenario is an example of a service selection challenge in ATLAS, a High-Energy Physics experiment, where massive amounts of data are queried and have to be shipped over the network. The ATLAS TAG system [2], [3] is composed of distributed databases and services accessing them. When a request is issued by a physicist, services have to be selected and composed to build a workflow. As there is a central control instance and the system components and boundaries are well known, QoS attributes can be gathered and analyzed. The application scenario illustrates the usage of the system ontology, attribute classification and aggregation functions, and thus enables us to validate the presented approach. The remainder of this paper is organized as follows. Section II discusses related work. Section III introduces the motivating example. Section IV presents a classification schema for non-functional system attributes that can be used as input to an optimization process. In Section V service composition patterns are presented, and the corresponding aggregation

patterns of attributes are discussed. Section VI shows how the presented system ontology, attribute classification and aggregation functions can be used in our example scenario. Finally, Section VII concludes the work and gives an outlook on future efforts on this topic.

II. RELATED WORK

Applying optimization strategies to service selection and composition is a widely studied research topic. A general framework for dynamic service composition is presented by Zeng et al. [4]. QoS attributes are included in the framework, but the need to elaborate more on accurate QoS prediction is outlined as future work. Efficient algorithms for QoS-aware service composition are presented by Yu et al. [5] and Alrifai et al. [6]. The focus of these works is on the optimization process. Specific QoS metrics are provided as examples, but are not put into a generic context.

Badr et al. [7] propose an ontology to classify NFPs into four discriminative classes: execution, security, business and environmental. By adopting a weighted-sum approach with weights reflecting the relative importance (preference) of a property, they are able to fine-tune their web service selection process so as to fit the user expectations. Reiff-Marganiec et al. [8] present a generic model for capturing, measuring and evaluating non-functional aspects based on the inContext platform. These QoS criteria are classified into two categories, optional (soft) and mandatory (hard). Moreover, the authors present a relevance ranking algorithm that incorporates these soft and hard criteria in the decision process during the service lookup and selection phase by applying a weight to each criterion. Chaari et al. [9] present a non-functional-parameters-based framework for Web service discovery and selection. This approach builds on an ontology-based categorization of qualitative and quantitative parameters, defining four measurement scales for them: nominal, ordinal, interval and ratio. Using this classification, the syntactic description of a Web service – a WSDL document – can be enriched by a semantic description that covers NFPs.

Comparable to our work, Truong et al. [10] present an ontology-based approach for the classification and categorization of performance metrics that occur in Grid systems. They focus on the selection, composition and execution of Grid workflows with respect to the knowledge they gather by performance monitoring and analysis. A significant difference between their approach and ours is that they clearly focus on the activity and run-time aspect, while we also cover physical and logical infrastructure issues. In their understanding, metrics can be applied on multiple abstraction layers, such as workflow, workflow region, activity, invoked application and code region. Moreover, activities imply data as well as control dependencies that steer the workflow execution process. Similar to our approach, the authors in [10] also identified aggregation functions (e.g. sum, average) that have to be applied on higher abstraction levels to obtain aggregated metrics. Moreover, they defined performance metric categories such as execution time, counter, data movement, synchronization, ratio and temporal overhead.

Jaeger et al. [11] propose aggregation models for QoS dimensions based on composition patterns derived from workflow patterns. Concrete aggregation functions over several composition patterns are provided for execution time, costs, encryption, throughput and uptime probability. As opposed to our work, the authors start from the analysis of workflow patterns and provide chosen QoS metrics as examples. They argue that a previous classification of QoS attributes is not beneficial, because their definition is highly dependent on the environment and QoS metrics might be incompatible in terms of values and units. We show, however, that such a classification can be made, because some QoS attributes are most likely present in any considered system. Even if their concrete definition might vary and values might be computed by custom functions, the overall aggregation patterns remain valid. The two works are thus related, but the problem is examined from a different angle: in [11], the authors start from the pattern definition, whereas we depart from a sound, formal system description.
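The weighted-sum selection idea recurring in [7] and the weighted criteria ranking of [8] can be illustrated with a minimal sketch. The attribute names, weights and candidate values below are hypothetical; scores are computed over min-max-normalized QoS values, with the normalization direction depending on whether an attribute is to be maximized or minimized:

```python
# Minimal sketch of weighted-sum QoS scoring for service selection.
# All attribute names, weights and candidate values are illustrative only.

def normalize(values, maximize):
    # Min-max normalization; higher normalized value always means "better".
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    if maximize:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]

def select(candidates, weights, maximize):
    # candidates: {name: {attr: value}}, weights: {attr: w},
    # maximize: {attr: True if larger raw values are better}
    names = list(candidates)
    scores = {n: 0.0 for n in names}
    for attr, w in weights.items():
        norm = normalize([candidates[n][attr] for n in names], maximize[attr])
        for n, v in zip(names, norm):
            scores[n] += w * v
    return max(names, key=scores.get), scores

best, scores = select(
    {"svc_a": {"latency_ms": 120, "availability": 0.99},
     "svc_b": {"latency_ms": 80,  "availability": 0.95}},
    weights={"latency_ms": 0.6, "availability": 0.4},
    maximize={"latency_ms": False, "availability": True},
)
```

With these example weights the latency-dominated candidate wins; shifting weight toward availability would reverse the choice, which is exactly the kind of preference tuning [7] describes.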

III. MOTIVATING EXAMPLE: THE TAG USE CASE

TAGs are event-level metadata in the ATLAS experiment, used for the preselection of interesting events for further physics analysis. The advantage of TAGs is that they are relatively small, and a TAG-based event preselection considerably reduces the amount of data needed in the resource-intensive analysis process. TAGs are stored in relational databases distributed among ATLAS sites. In order to cover the TAG use cases, there exist several services accessing TAG data and implementing some further logic. These services are also distributed among sites, i.e. there are several instances of the same generic service. As the deployment of new versions is centrally controlled, the various instances are identical in terms of functional behavior, but their non-functional characteristics differ, as they run on heterogeneous machines at distributed sites with specific attributes.

When a user submits a request to the TAG system, the service instances (deployments) and the TAG data sources need to be selected and composed. As interactive services querying only small amounts of data coexist with services that need to access and transfer large amounts of data, the links between individual service deployments also need to be considered in the selection process. For some links, the prevailing time is the network latency, for others the throughput is crucial. The framework for modeling and optimizing the TAG system thus has to take several aspects into account and be flexible enough to adapt to a changing environment, as sites can join or leave the system, resources can be replaced, and even priorities and optimization objectives can change over time.

A practical TAG workflow example is a data extraction use case as depicted in Figure 1. Two services are needed for the extraction: Extract and TAG DB. There exist several deployments of them, running on various resources at different sites. This workflow is a single illustrative example from


the TAG system. There exist other workflows requiring other compositions, but a detailed discussion of all processes of the TAG system is beyond the scope of this paper. In order to illustrate the service selection process, the extract workflow has been decomposed into layers, as depicted in Figure 1.
• User (External) View: From the user's point of view, the process consists of only two steps, namely defining a query and submitting it to the extract service. The output consists of one or more TAG files containing only the events of interest satisfying the input query.
• System (Internal) View: The internal view includes all steps necessary to set up the composition in order to respond to a given request. The input query defined by the user is first analyzed in terms of required data and operations. The resulting list of required services and data is used as input to the deployment selection process. The selected deployments are then invoked and executed.
Based on this motivating example, a system ontology, an attribute classification and pattern-based attribute aggregation functions have been developed. Their applicability to the exemplified use case is shown in Section VI.
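The system (internal) view can be sketched as a short pipeline: analyze the query, select deployments, then invoke them. All function, service and deployment names below are illustrative placeholders, not the actual TAG system interfaces:

```python
# Sketch of the system (internal) view of the extract workflow.
# Names are illustrative; they do not mirror the real TAG services.

def analyze_query(query):
    # Derive the required services and data sources from the user query.
    # In the real system this is the query-analysis step; here it is fixed.
    return {"services": ["Extract", "TAG DB"], "data": ["tag_partition_1"]}

def choose_deployments(required, catalog):
    # Pick one deployment per required service. A real optimizer would rank
    # candidates by aggregated QoS values; this sketch takes the first one.
    return {svc: catalog[svc][0] for svc in required["services"]}

def run_extract(query, catalog):
    required = analyze_query(query)
    deployments = choose_deployments(required, catalog)
    # Invocation and execution would happen here; we return the selection.
    return deployments

catalog = {"Extract": ["extract@site1"],
           "TAG DB": ["tagdb@site2", "tagdb@site3"]}
selection = run_extract("SELECT events WHERE pt > 25", catalog)
```

The deployment-selection step is where the QoS classification and the pattern-based aggregation functions of the following sections come into play.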

IV. CLASSIFICATION AND DESCRIPTION OF NON-FUNCTIONAL (QOS) ATTRIBUTES

In this section, a system ontology and a classification of non-functional (QoS) attributes are introduced and described.

A. System Ontology

A system ontology has been developed in order to represent a distributed, service-oriented system with its main components and characteristics. The system model is presented in Figure 2 as a UML class diagram and described in the following.

Fig. 2. System Ontology (UML Class Diagram)

The System Ontology is composed of Components and the Attributes describing them. A component can be one of the following classes:
• A Provider is a Virtual Organization or simply a geographical site that participates in the considered distributed system by hosting at least one resource.
• A Resource is a physical or logical entity capable of hosting a service. Examples of resources are web servers, databases and virtual machines. A resource belongs to a provider, as expressed by the composition symbol in Figure 2.
• A Service is a generic functionality, described by functional attributes. The union of a system's services completely describes its functionality.
• A Deployment is the part invoked by a client. It is modeled as an association class between Resource and Service, i.e. it is a concrete instance of a generic service, running on a specific resource.

Additionally, a Link class is defined between components. When several deployments are composed to form a complex workflow, the link between them – generally referring to a network link – is important to consider, as the transfer of data has an impact on the overall performance. In the presented system ontology, the link is nevertheless defined at the component level, in order to keep its definition flexible in a concrete system description. In some systems, it might be possible to define and describe a link between deployments, while in others it might suffice to define links between providers or resources.

Each component is described by one or multiple discriminative attributes. Generally, attributes can be classified into functional and non-functional ones. Functional attributes describe the specific function of a component, generally including inputs, behavior and outputs. Thus, they describe what the specific component is supposed to accomplish, as opposed to how. Examples of functional attributes are access patterns and details of business logic. Non-functional or QoS attributes describe characteristics related to the operation of components, specifying how a component accomplishes a given functionality. Examples of QoS attributes include different performance metrics, availability and reliability. In the remainder of this work, these QoS attributes are further classified, and their usage is illustrated.

B. Types of QoS Attributes

As pointed out in Section II, many different QoS attributes can be found in the literature [5], [6], [7], [8], [9], [11]. Usually, a generic QoS computation model is presented, and a few concrete values are given as examples to illustrate the applicability of an approach. However, when facing an application scenario where it is required to gather statistics and identify the important ones as input to an optimization process, it is crucial to build an exhaustive list of those statistics. In order to do so, it is important to have a classification of appropriate attributes. In our previous work [1], an ontology has been proposed for both functional and non-functional attributes, based on their logical correlation to a domain. This ontology is now refined for non-functional attributes, resulting in a classification reflecting the different attribute types, their derivation and relations (uses, subclass, refines). The attribute classification is shown in Figure 3 and is detailed in the following. The first-level classification is made based on attribute properties, namely whether they are absolute-valued, relative-valued or computed. It is important to note that these categories are not mutually

exclusive, i.e. a concrete attribute can belong to more than one category, as the provided examples will point out.

Fig. 1. User & System View of the Extract Use Case (BPMN Activity Diagram)

Fig. 3. QoS Attribute Classification (UML Class Diagram)

• An Absolute Attribute is defined as a basic, single value that is either simply defined and set, measured, or estimated. It can be of different types, such as numerical (integer, double, float), logical (boolean), character-based (string), time-related (date, time, timestamp) or even a list (set, bag, map) of values. In case of numerical types, the value can have an optional unit (e.g. Byte, KB, MB for data size). The unit can also be undefined (e.g. the maximal number of concurrent sessions). If considered values have different units of the same type, conversion functions have to be applied.
– A Defined Attribute is a value that is set and normally not subject to changes. These attributes are usually attached to resources as defined in the system ontology. Examples include the CPU count, the available memory and disk space of a server, and the bandwidth of a network cable. We define these attributes as quasi-static, because their value could change without the whole resource being changed. For example, when a server is upgraded and gets more RAM, this single metric changes, but the resource remains the same, i.e. the values are usually static, but not necessarily throughout the whole lifetime of the component.
– A Measured Attribute is a basic, single and constantly changing value that can be determined dynamically at a given point in time. Examples include CPU load and network load as well as free RAM and free disk space. Different techniques are used for publishing the measured values. Polling gathers attribute values of system components by requesting them at a regular time interval. This can be achieved by using either a push-based (decentral) or a pull-based (central) notification mechanism. In the push-based case, the components publish their attribute values to a central instance, often referred to as registry or repository. In the pull-based case, the central instance collects the attribute values by querying all components. In case of Snapshot-based publishing, the attribute values are collected upon request, usually when the values are needed for the optimization process of a request. This technique makes it possible to take into account the state of a component at request time, e.g. in terms of load. In an application scenario, the presented techniques can be combined, depending on which is most appropriate for a given attribute.
– An Estimated Attribute is a basic value that is estimated instead of measured. By analyzing execution logs, usage profiles for all system components can be determined. Based on this analysis, attributes can then be estimated by assigning them to one of the usage profiles. Examples include CPU and network load. Two sources can be used for this estimation. In case of a historical estimation, past experiences are used to derive attributes by performing a sophisticated log analysis. In case of a prediction, known values are used to extrapolate others. For example, if the amount of data touched by a given query is known, it can be used to predict the execution time and the resource demand of a deployment for that particular query.

• A Relative Attribute is a QoS value that is defined in scales instead of absolute values, such as low-medium-high. Two subcategories have been identified.
– A User-Ranking Attribute is defined by users in systems where they are given the possibility to provide feedback after having performed operations. As this feedback can hardly be expressed in absolute values, usually scales are proposed and users are asked to rate their satisfaction, e.g. on an ordinal scale such as very good (1), good (2), bad (3).
– A System-Inherent Attribute is a level defined by the system provider instead of the user. For example, for the Amazon Elastic Compute Cloud (Amazon EC2) service, instance types are defined in the following categories: small, large and extra large [12]. Moreover, attributes like privacy or security can be rated as low, medium and high.
• A Computed Attribute is a value computed by using several absolute attributes and other values (e.g. an integration over time) as input. Computed attributes are possibly more meaningful and more accurate candidates for inclusion in the optimization process. They can be computed and updated at regular time intervals, on request, or upon changes of some values.
– A Time-based Attribute refers to a computed attribute that has a time aspect, i.e. the value is computed based on available time information. An example of such an attribute is the availability, which is defined for a certain time period as the uptime divided by the total time (uptime plus downtime), usually denoted in percent.
– A Size-based Attribute refers to a computed attribute that has a size aspect, i.e. the value is computed based on size information. An example of such an attribute is the reliability, which is defined for a certain number of invocations and is computed as the number of successful invocations divided by the number of all invocations, also denoted in percent.
– A Combined Attribute refers to a computed attribute that contains time- and size-based information, or any combination of other information. Examples include values for performance, e.g. in tasks per time unit or average data blocks per time unit.

C. Significance of QoS Attributes

Not all QoS attributes are equally important for an optimization process. There has to be the possibility to define the importance of attributes by assigning weights and priorities.
• A Weight – also referred to as intensity factor – is a measure of how resource-intensive – in terms of CPU, memory, or network usage – a service is. In case of physical devices we can use the Resource concept and weight CPU and memory attributes. In terms of network connections we can use the Link concept for network-related attributes, e.g. by weighting the latency sensitivity versus the bandwidth sensitivity.
• A Priority can be used to control the order in which attributes have to be considered in the overall optimization process. Usually, attributes with a high impact on the optimization process also have a high priority, while attributes that affect the optimization process less are assigned a low priority. In the presented Blackboard approach (see Section VI) the priority defines the order in which QoS attributes are considered in the optimization process.

V. COMPOSITION PATTERNS AND AGGREGATION OF ATTRIBUTES

As the use-case scenario of Section III shows, there exist workflows in which a multitude of congenerous attributes – and their atomic values – of different services, deployments, or resources have to be considered to obtain a composed and aggregated attribute value that is used for optimization purposes. Following the workflow patterns of Van der Aalst et al. [13] and the efforts of Jaeger et al. [11] in QoS aggregation for Web services, we identified seven composition patterns (wrapped, sequence, loop, choice, split, merge, sync) that emerge in such optimization scenarios, as exemplified in Section VI. Our patterns are depicted in Figure 4, where n denotes the number of components.
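Before turning to the composition patterns, the attribute categories of Section IV can be sketched as a small class hierarchy. This is a simplified, illustrative model; class and field names are ours, not a normative schema:

```python
# Simplified sketch of the QoS attribute taxonomy:
# absolute (defined/measured/estimated), relative, and computed attributes.

class QoSAttribute:
    def __init__(self, identifier, unit=""):
        self.identifier, self.unit = identifier, unit

class AbsoluteAttribute(QoSAttribute):
    # A basic, single value: defined, measured or estimated.
    def __init__(self, identifier, value, unit=""):
        super().__init__(identifier, unit)
        self.value = value

class RelativeAttribute(QoSAttribute):
    # A scale value, e.g. low/medium/high, user-ranked or system-inherent.
    SCALE = ("low", "medium", "high")
    def __init__(self, identifier, level):
        super().__init__(identifier)
        assert level in self.SCALE
        self.level = level

class ComputedAttribute(QoSAttribute):
    # Derived from other attributes via a computation function.
    def __init__(self, identifier, inputs, fn, unit=""):
        super().__init__(identifier, unit)
        self.inputs, self.fn = inputs, fn
    @property
    def value(self):
        return self.fn(*[a.value for a in self.inputs])

# A time-based computed attribute: availability from uptime and downtime.
uptime = AbsoluteAttribute("uptime", 990.0, unit="h")
downtime = AbsoluteAttribute("downtime", 10.0, unit="h")
availability = ComputedAttribute(
    "availability", [uptime, downtime],
    lambda up, down: 100.0 * up / (up + down), unit="%")
security = RelativeAttribute("security", "high")
```

Note that `availability` can belong to more than one category at once (it is computed, but its inputs may be measured or estimated), mirroring the non-exclusive first-level classification.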

A. Composition Patterns

• Standalone defines a basic building block, a single component, that can be composed with other components of the same type. Usually this takes place on the deployment level, where multiple deployments are composed to build sophisticated workflows. However, the composition can also take place on the resource level, where several physical server farms build up a logical, extended server farm. Strictly speaking, this is not a composition pattern itself; it only provides the base for a possible composition. Without loss of generality, we use the computation of the execution time (ET) of components (and their links) as a running example in the following discussion. The ET belongs to the class of absolute attributes and is usually estimated using historical data of former executions; it is therefore denoted as a function. Equation 1 presents the computation of the ET for a standalone (single) component.

ET_standalone = ET(C_1)   (1)

Fig. 4. Composition Patterns for System Components

• Wrapped defines a component that is wrapped by another component, e.g. a service (C_1) that calls an embedded "legacy" service (C_2) to perform a specific subtask. This pattern has no analogy in [11] or [13]. An attribute affected by this pattern is e.g. the availability; the total availability of this composition is the product of the availabilities of the involved components. For other attributes, other mathematical functions have to be applied. For example, the total ET is the sum of the ETs of all involved components and their links (Equation 2).

ET_wrapped = Σ_{i=1..n} ( ET(C_i) + ET(L_i) )   (2)

• Sequence is a queue of components where the output of each component is directly provided as input to the next component, e.g. C_1 provides its output via L_1 to C_2, and C_2 provides its output via L_2 to C_3. An attribute affected by this composition is e.g. the ET (Equation 3), which is the sum of the execution times of all involved components and links. According to [13], [14] this pattern is related to Pattern 1 (Sequence) of the Basic Control-Flow Patterns.

ET_sequence = Σ_{i=1..n} ET(C_i) + Σ_{j=1..n-1} ET(L_j)   (3)

• Loop describes a recurring – sometimes recursive – trail of component invocations, where a single component is called multiple times to accomplish a specific task. Assume a replication scenario in which two components (C_1, C_2) transform and replicate incoming data. Here, the total ET (Equation 4) is calculated as the execution time of both components times the replication degree of the data (#loops), if we assume that one loop handles exactly one copy. According to [13], [14] several congruent patterns can be identified, all belonging to the Iteration Control-Flow Patterns, such as Pattern 10 (Arbitrary Cycles), Pattern 21 (Structured Loop) and Pattern 22 (Recursion).

ET_loop = Σ_{l=1..#loops} Σ_{i=1..n} ( ET(C_i^l) + ET(L_i^l) )   (4)

• Fork is a branch that splits up a single data stream originating from a single component (C_1) into multiple data streams provided to multiple subsequent components (C_2, C_3, C_4). Hereby the data stream is either fully replicated or partially fragmented. In this composition pattern two facets emerge: a fork where only one of the subsequent components receives the data (Choice), or a fork where all subsequent components receive the data (Split).
– Choice is a logical xor-fork where only one of the subsequent components receives the data stream. Such a choice is possible when multiple follow-up components provide the same functionality and an optimization mechanism has to choose the "best" one regarding a specific requirement. Considering the ET, an optimizer has to find the subsequent component with the lowest ET (best case), i.e. it has to apply a minimum search. In the worst case, the ET is the sum of the execution times of the first component and the slowest of the components C_2, C_3 and C_4, as outlined in Equation 5. According to [13], [14] this pattern is related to Pattern 4 (Exclusive Choice) of the Basic Control-Flow Patterns.

ET_choice^best = ET(C_1) + min_{i=2..n} ( ET(L_{i-1}) + ET(C_i) )
ET_choice^worst = ET(C_1) + max_{i=2..n} ( ET(L_{i-1}) + ET(C_i) )   (5)

– Split is a logical and-fork where all subsequent components receive the data stream. In such a case the ET is determined by the slowest component, i.e. a maximum search has to be applied, as depicted in Equation 6. According to [13], [14] this pattern is related to Pattern 2 (Parallel Split) of the Basic Control-Flow Patterns.

ET_split = ET(C_1) + max_{i=2..n} ( ET(L_{i-1}) + ET(C_i) )   (6)

• Join is a conjunction of several data streams from multiple components (C_1, C_2, C_3) into a single data stream to a single component (C_4). This is usually the case when multiple components provide a partition of the full data set for a subsequent component. In this composition pattern two facets emerge: a join where only one component delivers data to the subsequent component (Merge), or a join where all components deliver data to the subsequent component (Sync).
– Merge is a logical xor-join where only one component delivers the data stream to the subsequent component. Regarding the ET, the component C_4 only has to wait for the completion of one of the components C_1, C_2 or C_3 (non-blocking wait); the component with the shortest execution time thus determines the overall execution time, as pointed out in Equation 7. According to [13], [14] this pattern is related to Pattern 5 (Simple Merge) of the Basic Control-Flow Patterns.

ET_merge = min_{i=1..n-1} ( ET(C_i) + ET(L_i) ) + ET(C_n)   (7)

– Sync is a logical and-join where all components have to deliver some data stream to the subsequent component, which has to wait for the arrival of all data packets (blocking wait). Regarding the ET, a maximum search has to be applied, as defined in Equation 8. According to [13], [14] this pattern is related to Pattern 3 (Synchronization) of the Basic Control-Flow Patterns.

ET_sync = max_{i=1..n-1} ( ET(C_i) + ET(L_i) ) + ET(C_n)   (8)

For the presented composition patterns, we have exemplified such aggregation functions for the attribute Execution Time (ET). These equations demonstrate how the ET – also referred to as C.C.ET (Configuration → Compute → Execution Time) according to [1] – is affected when used in different compositions.

VI. MODELING, GOAL FUNCTIONS AND XML BINDING OF THE MOTIVATING EXAMPLE

This section explains the usage of the presented system ontology, attribute classification and aggregation functions in the use case scenario introduced in Section III.

A. Representation of the TAG System

Figure 5 details the optimization view of the extraction workflow. It is an additional layer on top of the ones presented in Figure 1 (user and system view). This view details the optimization process and the usage of the attributes. As defined in the ontology and depicted in the lower part of Figure 5, all involved components have associated QoS attributes (A). The attributes considered in this example are provided in Table I. Specific attention has to be given to the weights or intensity factors: the links between interactive services are defined as latency-intensive, i.e. as little data is transferred, the delay is mainly determined by the network latency. The links between non-interactive services (e.g. extract and the database – asynchronous calls) are defined as bandwidth-intensive, i.e. as a large amount of data is transferred, the network throughput is the important measure, whereas the latency can be neglected. Moreover, user satisfaction has to be taken into account. If the performance of interactive services is poor, the whole system will be considered "slow", whereas for asynchronous services such as extract a few seconds – or even minutes, depending on the data size – more or less do not influence the overall system acceptance.

As shown in Figure 5, the following patterns are present in the workflow: split, sequence and sync. The aggregate values of all QoS attributes thus have to be computed based on the pattern-specific aggregation functions defined earlier. Although the components being composed are deployments, attributes of resources, services and providers also influence the process, because of their relations to deployments, as defined in the system ontology.

min

i=1...n−1

max (ET (Ci ) + ET (Li )) + ET (Cn ) (8)

i=1...n−1

B. Goal Functions and Optimization There can be various goal functions for distributed systems, depending on the optimizer view-point and system requirements. In a long-lasting experiment like ATLAS, it is even likely that objective functions and requirements might change over time. From the single user’s point of view, a common objective function is the minimization of the execution time of a request. Coming back to our example presented in the previous subsection, the objective can be stated as: “minimize the total execution time of the extract and TAG database deployments and their links.” The following patterns are involved, in this order: sequence (from extract deployment to split), split (in n branches), sequence (link, deployment,

B. Aggregation Functions Considering the presented composition patterns the attribute values for Absolute and Computed attributes – operating on the nominal scale – have to be calculated by applying a mathematical aggregation function such as average, sum, product, count, min, max, mode, mean, median, range or standard deviation. Sometimes a combination of these functions is needed, e.g. for the loop pattern when components are called several times. Moreover custom functions like BestMofN are of interest in case of an optimization task that requires to find the best M components among N available ones. For our composition

430
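To make the aggregation concrete, the four ET formulas (Equations 5–8) can be written as small helper functions. The list-based data layout (component and link execution times as plain numbers) is an assumption of this sketch, not part of the paper's model:

```python
def et_sequence(comps, links):
    """Sequence: the sum of all component and link execution times."""
    return sum(comps) + sum(links)

def et_split(comps, links):
    """Split / worst-case Choice (Eq. 5/6): comps[0] is the forking
    component C1; branch i consists of links[i-1] followed by comps[i],
    and the slowest branch dominates."""
    return comps[0] + max(l + c for l, c in zip(links, comps[1:]))

def et_merge(comps, links):
    """Merge, xor-join (Eq. 7): the fastest predecessor determines the
    ET; comps[-1] is the joining component Cn."""
    return min(c + l for c, l in zip(comps[:-1], links)) + comps[-1]

def et_sync(comps, links):
    """Sync, and-join (Eq. 8): the slowest predecessor determines the
    ET, since the join blocks until all data packets have arrived."""
    return max(c + l for c, l in zip(comps[:-1], links)) + comps[-1]
```

Note that the split and worst-case choice formulas coincide: in a parallel split all branches run, so the slowest one dominates, while in a choice the worst case is precisely that the slowest branch is the one taken.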

[Figure 5 shows the TAG Data Extraction Workflow as a UML object diagram with ontology classes: the Scientist invokes the Extract Service (deployment DE with service SE, resource RE and provider PE, reached via link LE); links L11 ... L1n connect to the TAG database deployments D11 ... D1n (TAG DB 1 ... n) with their services S11 ... S1n, resources R11 ... R1n and providers P11 ... P1n, together with links L21 ... L2n. Every component carries attached QoS attributes (A), and a Multi-Layered Blackboard steers the optimization view.]

Fig. 5. Optimization View of the Extract Use Case (System View mapped to System Ontology Classes)

| Attribute Name (Short Id) (Full Id [1])                    | Absolute               | Relative        | Computed | Significance | Symbol | Component  |
|------------------------------------------------------------|------------------------|-----------------|----------|--------------|--------|------------|
| Network Latency (NL) (Configuration.Network.NL)            | Defined                | -               | -        | -            | -      | Link       |
| Network Throughput (NT) (Configuration.Network.NT)         | Estimated (prediction) | -               | -        | -            | -      | Link       |
| Latency-Sensitivity (LS) (Metric&Statistic.Profiles.LS)    | -                      | System-Inherent | -        | Weight       | s_L    | Services   |
| Throughput-Sensitivity (TS) (Metric&Statistic.Profiles.TS) | -                      | System-Inherent | -        | Weight       | s_T    | Services   |
| Intensity Factor (IF) (Metric&Statistic.UsageStat.IF)      | -                      | User-Ranking    | -        | Weight       | λ      | Services   |
| Performance Indicator (PI) (Metric&Statistic.UsageStat.PI) | -                      | System-Inherent | Combined | -            | p      | Deployment |
| Execution Time (ET) (Configuration.Compute.ET)             | Estimated (historical) | -               | Combined | -            | ET     | Deployment |
| Load (L) (Metric&Statistic.UsageStat.L)                    | Measured (snapshot)    | -               | -        | -            | l      | Resource   |

TABLE I. QoS ATTRIBUTES CONSIDERED IN THE EXTRACT USE CASE
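For experimentation, the Table I classification can be captured as plain data. The field names ("kind", "derivation", ...) are illustrative stand-ins; the values are transcribed from the table:

```python
# QoS attributes of Table I as a dictionary keyed by short id.
QOS_ATTRIBUTES = {
    "NL": dict(full_id="Configuration.Network.NL", derivation="Defined",
               component="Link"),
    "NT": dict(full_id="Configuration.Network.NT",
               derivation="Estimated (prediction)", component="Link"),
    "LS": dict(full_id="Metric&Statistic.Profiles.LS", kind="Relative",
               significance="Weight", symbol="s_L", component="Services"),
    "TS": dict(full_id="Metric&Statistic.Profiles.TS", kind="Relative",
               significance="Weight", symbol="s_T", component="Services"),
    "IF": dict(full_id="Metric&Statistic.UsageStat.IF", kind="Relative",
               significance="Weight", symbol="lambda", component="Services"),
    "PI": dict(full_id="Metric&Statistic.UsageStat.PI", kind="Computed",
               symbol="p", component="Deployment"),
    "ET": dict(full_id="Configuration.Compute.ET", kind="Computed",
               derivation="Estimated (historical)", symbol="ET",
               component="Deployment"),
    "L": dict(full_id="Metric&Statistic.UsageStat.L",
              derivation="Measured (snapshot)", symbol="l",
              component="Resource"),
}

# Example query: all attributes attached to services.
service_attrs = [k for k, v in QOS_ATTRIBUTES.items()
                 if v["component"] == "Services"]
```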

link) and sync. Let ET(C) be the time spent executing on component C, and ET(L) the time spent on link L. ET(C) is a computed value, a function of the data size (defined by the query), the deployment performance indicator p and the current resource load l. ET(L) is a function of the data size, the network bandwidth, the network latency and the current network throughput, weighted by their respective intensity factors. The total execution time of the depicted workflow can be formulated as:

ET_total = ET(D_E) + ET(L_E) + max_{i=1...n} ( Σ_{j=1..k} λ_{Dji} · ET(D_ji) + Σ_{l=1..k+1} λ_{Lil} · ET(L_il) )    (9)

where n is the number of parallel branches (each with k deployments and k+1 links) and λ corresponds to the weight indicating the relative importance of a deployment or link in the overall process. For instance, a link between an interactive web service and a database will be weighted higher than a link between a database and a Grid service that runs over a longer time period anyway. The overall objective of the optimization problem is to minimize the total processing time. This simple example has been chosen to demonstrate how the QoS attributes can be combined, aggregated and used in formulating an optimization objective. In real-case scenarios like the TAG use case, several competing objective functions can exist (e.g. minimize the processing time of each individual request and maximize the overall system throughput, which can be mapped to minimizing the variance in processing times per data unit). Such a scenario can be mapped to a multi-objective optimization problem that will be described in detail in future work.

According to our optimization target – e.g. to minimize the execution time of an individual request – we use a multi-layered blackboard [15]. This blackboard approach covers the split-sequence-sync part of our workflow and addresses the NP-hard problem of service selection, in particular finding the most appropriate TAG databases for the query execution of a single request. This selection highly depends on factors given by the attributes outlined in Table I, but also covers criteria like data replication, data partitioning and data propagation. In this optimization we consider the selected databases, their interconnections (links), and all dependent resources, services and providers of the database deployments.

Usually, in heterogeneous environments, cost estimation is a difficult task because QoS attribute values often differ in their behavior or are of different data or unit types. Due to our system and attribute classification, in conjunction with the composition patterns applied on top of the attribute values, it is now easily possible to compare different selection choices – decision paths in the search tree – with each other. The cost estimation itself is accomplished by using the aggregated attribute values, computed by applying the mathematical cost functions according to the used composition pattern.
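As an illustration, the total-ET goal function (Equation 9) can be evaluated with a small helper. The data layout – each parallel branch given as a list of (weight, ET) pairs for its deployments and links – is an assumption of this sketch:

```python
def et_total(et_de, et_le, branches):
    """Total ET of the extract workflow (Eq. 9): the extract deployment
    and its link, plus the slowest of the n parallel branches, where each
    branch contributes the weighted sum of its deployment and link ETs."""
    return et_de + et_le + max(
        sum(weight * et for weight, et in branch) for branch in branches
    )
```

For example, with ET(DE) = ET(LE) = 1 and two branches, one contributing 1.0·2 + 0.5·4 = 4 and the other 1.0·3 = 3, the slower branch dominates and the total is 6.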


C. XML Schema Binding A schema binding is necessary to describe the abilities of the components (providers, resources, services, and deployments) and links that are part of the optimization view, as seen in Figure 5. For this purpose components have attributes attached that describe their abilities according to the classification outlined in Section IV. An XML schema is developed as an exchange format – much like BPEL for workflows – that can be used by the experts on the blackboard. In our approach such a schema mainly consists of two parts: Declaration and Definition. A Declaration (Part 1) describes the fundamental building blocks of our model, namely classes and attributes as defined in our system ontology and attribute classification. For the presented use-case scenario the attribute listing must at least cover the attributes in Table I. Moreover, the declaration also contains aggregation functions that are applied at the attribute level. An exemplary declaration document is given in Listing 1.

[Figure 6 shows three stacked blackboards – a Data layer, a Resources layer and an Operations layer – each refined by local iterations and connected through global iterations.]

Fig. 6. Multi-Layered Blackboard for TAG DB QEP

In our approach we extended the traditional notion of a blackboard system and divided the problem domain into three layers (Figure 6), each consisting of a single blackboard and tackling a different aspect of the query optimization problem. A blackboard can be seen as a shared information space where so-called experts place their ideas concerning a specific problem aspect. These experts are either humans, providing some kind of sophisticated algorithm, or machines, providing computational resources and hence defining the environment. Each blackboard uses an A* algorithm that offers possibilities to prune the search space and thus avoid exhaustive searching. It spawns a decision tree whose nodes represent partial solutions to the problem and their estimated costs. In a stepwise approach (local iteration) the algorithm starts at the root and expands nodes whenever the estimated costs associated with a node are lower than the costs of any solution that has already been found. Solutions which are heuristically known or proven to be inefficient are discarded automatically. After completion of a layer, the process continues with the next layer. However, it is possible to step back one layer if another solution path, due to changing conditions, now possesses the minimal costs (global iteration). As mentioned before, the multi-layered blackboard therefore consists of the following three layers: • A Data Blackboard covers all data-specific attributes concerning data schema, locality, distribution, movement, replication and partitioning. • A Resources Blackboard covers attributes concerning the assignment and allocation of computational resources, such as CPU, memory and network. • An Operations Blackboard covers attributes concerning the query optimization process itself by applying optimization rules for database operators such as selection, projection, join, sorting, grouping, aggregation, and so on.
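The pruning behaviour described above – expand a node only while its cost can still beat the best complete solution – can be sketched as a uniform-cost search (the A* heuristic is reduced to zero here, and the slot/candidate data model is an assumption of this sketch, not the paper's implementation):

```python
import heapq
from itertools import count

def best_selection(slots, candidates, cost):
    """Choose one candidate per slot (e.g. one TAG DB deployment per
    branch), minimizing the summed cost. Partial decision paths whose
    accumulated cost already reaches the best known complete solution
    are pruned instead of being expanded."""
    tie = count()  # tie-breaker so the heap never compares paths
    best_cost, best_path = float("inf"), None
    heap = [(0.0, next(tie), 0, [])]  # (cost so far, tie, slot idx, picks)
    while heap:
        g, _, i, path = heapq.heappop(heap)
        if g >= best_cost:            # prune this decision path
            continue
        if i == len(slots):           # complete solution found
            best_cost, best_path = g, path
            continue
        for option in candidates[slots[i]]:
            heapq.heappush(heap, (g + cost(slots[i], option),
                                  next(tie), i + 1, path + [option]))
    return best_path, best_cost
```

A real A* variant would add a lower-bound estimate of the remaining cost to `g` before pushing, which prunes even more aggressively while preserving optimality.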

  Classes: COMPONENT, LINK, ...
  Attribute: Execution Time (derivation: ESTIMATED, type: TIME-BASED)
  Aggregation functions:
    sequence: sum(i=1..n of fct(C[i])) + sum(j=1..n-1 of fct(L[j]))
    split:    fct(C[1]) + max(i=2..n of fct(L[i-1]) + fct(C[i]))
    sync:     max(i=1..n-1 of fct(C[i]) + fct(L[i])) + fct(C[n])
  ...

Listing 1. Declaration Document (Classes, Attributes and Compositions)
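The shape of such a declaration document can also be sketched programmatically. All element and attribute names below are hypothetical stand-ins, since the concrete schema is not fixed here:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Build a minimal declaration document with hypothetical element names.
decl = Element("declaration")
classes = SubElement(decl, "classes")
for name in ("COMPONENT", "LINK"):
    SubElement(classes, "class", name=name)

# One attribute entry with its per-pattern aggregation functions.
attr = SubElement(decl, "attribute", name="ExecutionTime",
                  derivation="ESTIMATED", type="TIME-BASED")
aggregations = {
    "sequence": "sum(i=1..n of fct(C[i])) + sum(j=1..n-1 of fct(L[j]))",
    "split": "fct(C[1]) + max(i=2..n of fct(L[i-1]) + fct(C[i]))",
    "sync": "max(i=1..n-1 of fct(C[i]) + fct(L[i])) + fct(C[n])",
}
for pattern, formula in aggregations.items():
    agg = SubElement(attr, "aggregation", pattern=pattern)
    agg.text = formula

xml_text = tostring(decl, encoding="unicode")
```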

A Definition (Part 2) covers the instantiation of the templates provided in the declaration part. It defines the concrete component infrastructure and the connections among the components in order to provide the business workflow that fulfils a given user query. • Infrastructure: In Figure 5 several components exist, such as five links (LE, L11, L1n, L21, L2n), three deployments (DE, D11, D1n) and their corresponding resources (RE, R11, R1n), services (SE, S11, S1n) and providers (PE, P11, P1n). One of these components, a TAG database, can be defined as outlined in Listing 2, specifying its attributes execution time (C.C.ET) and network bandwidth (C.N.B) according to the attribute ontology defined in [1].

REFERENCES

[1] E. Vinek, P. P. Beran, and E. Schikuta, "Mapping Distributed Heterogeneous Systems to a Common Language by Applying Ontologies," in PDCN 2011: Proceedings of the 10th IASTED International Conference on Parallel and Distributed Computing and Networks. IASTED/ACTA Press, February 2011.
[2] The ATLAS Collaboration, "The ATLAS Experiment," http://www.atlas.ch/.
[3] F. Viegas, D. Malon, J. Cranshaw, G. Dimitrov, M. Nowak, A. Nairz, L. Goossens, E. Gallas, C. Gamboa, A. Wong, and E. Vinek, "The ATLAS TAGS database distribution and management – Operational challenges of a multi-terabyte distributed database," Journal of Physics: Conference Series, vol. 219, no. 7, p. 072058, 2010. Available: http://stacks.iop.org/1742-6596/219/i=7/a=072058
[4] L. Zeng, A. H. Ngu, B. Benatallah, R. Podorozhny, and H. Lei, "Dynamic composition and optimization of Web services," Distrib. Parallel Databases, vol. 24, pp. 45–72, December 2008.
[5] T. Yu, Y. Zhang, and K.-J. Lin, "Efficient Algorithms for Web Services Selection with End-to-End QoS Constraints," ACM Trans. Web, vol. 1, May 2007. Available: http://doi.acm.org/10.1145/1232722.1232728
[6] M. Alrifai and T. Risse, "Combining Global Optimization with Local Selection for Efficient QoS-aware Service Composition," in Proceedings of the 18th International Conference on World Wide Web (WWW '09). New York, NY, USA: ACM, 2009, pp. 881–890.
[7] Y. Badr, A. Abraham, F. Biennier, and C. Grosan, "Enhancing Web Service Selection by User Preferences of Non-functional Features," in Proceedings of the 4th International Conference on Next Generation Web Services Practices. Washington, DC, USA: IEEE Computer Society, 2008, pp. 60–65.
[8] S. Reiff-Marganiec, H. Q. Yu, and M. Tilly, "Service Selection Based on Non-functional Properties," in Service-Oriented Computing – ICSOC 2007 Workshops, E. Di Nitto and M. Ripeanu, Eds. Berlin, Heidelberg: Springer-Verlag, 2009, pp. 128–138. Available: http://dx.doi.org/10.1007/978-3-540-93851-4_13
[9] S. Chaari, Y. Badr, F. Biennier, C. BenAmar, and J. Favrel, "Framework for Web Service Selection Based on Non-Functional Properties," International Journal of Web Services Practices, vol. 3, no. 2, pp. 94–109, 2008.
[10] H.-L. Truong, S. Dustdar, and T. Fahringer, "Performance metrics and ontologies for Grid workflows," Future Gener. Comput. Syst., vol. 23, pp. 760–772, July 2007.
[11] M. C. Jaeger, G. Rojec-Goldmann, and G. Muhl, "QoS Aggregation for Web Service Composition using Workflow Patterns," in Proceedings of the 8th IEEE International Enterprise Distributed Object Computing Conference. Washington, DC, USA: IEEE Computer Society, September 2004, pp. 149–159.
[12] Amazon Web Services, "Amazon EC2 Instance Types," http://aws.amazon.com/ec2/instance-types/, November 2010.
[13] W. M. P. van der Aalst, A. H. M. ter Hofstede, B. Kiepuszewski, and A. P. Barros, "Workflow Patterns," Distrib. Parallel Databases, vol. 14, pp. 5–51, July 2003.
[14] N. Russell, A. H. M. ter Hofstede, W. M. P. van der Aalst, and N. Mulyar, "Workflow Control-Flow Patterns: A Revised View," BPMcenter.org, Tech. Rep., 2006.
[15] P. P. Beran, W. Mach, R. Vigne, J. Mangler, and E. Schikuta, "A Heuristic Query Optimization Approach for Heterogeneous Environments," in Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGRID '10). Washington, DC, USA: IEEE Computer Society, 2010, pp. 542–546. Available: http://dx.doi.org/10.1109/CCGRID.2010.65

  DEPLOYMENT: TAG DB (Tier 0, CERN) & Oracle 11g
    C.C.ET: 125
    C.N.B: 1000
    ...

Listing 2. Definition Document (Infrastructure Part)

• Connections: Here the interconnections between components in terms of links are defined. Each connection is represented as a pairwise relation between a component and a link. Thus all composition patterns for the presented use case – such as the sequence, the split and the sync – can be derived from this connection definition. With the help of these XML bindings it is possible to describe the presented workflow in terms of the given infrastructure and the required interconnections, supporting the optimization of the user query.
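The claim that composition patterns can be derived from pairwise component–link connections can be sketched structurally. The triple-based data model is an assumption of this sketch, and distinguishing Merge from Sync would additionally require the join semantics (xor vs. and), which a purely structural view does not capture:

```python
from collections import defaultdict

def classify(connections):
    """connections: list of (source, link, target) triples.
    A component with several outgoing links forms a split, one with
    several incoming links a join; one in and one out is a sequence step."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for src, _link, dst in connections:
        out_deg[src] += 1
        in_deg[dst] += 1
    patterns = {}
    for node in set(out_deg) | set(in_deg):
        if out_deg[node] > 1:
            patterns[node] = "split"
        elif in_deg[node] > 1:
            patterns[node] = "join"
        else:
            patterns[node] = "sequence"
    return patterns
```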

VII. CONCLUSION

We presented several building blocks to formally define and represent a distributed, heterogeneous system in order to allow for a component selection optimization process. First, a system ontology developed in our previous work has been refined to include not only the components of a distributed system, but also the links between them – generally network links – that are important to consider in any optimization process. Second, the class of non-functional attributes of components has been analyzed in detail, resulting in a taxonomy based on attribute types, derivation and relations. This classification serves as a guideline for determining system attributes. Third, composition patterns affecting the aggregation of attributes have been defined. They are derived from workflow patterns because of their well-suited applicability to service selection tasks. Finally, QoS attribute aggregation functions have been defined for the presented patterns. A real-case scenario from a High-Energy Physics application has been described to illustrate how to integrate those fundamental building blocks. A detailed system model has been derived from the ontology, and QoS attributes have been identified based on the presented taxonomy. Moreover, a graph-based model of an example workflow revealing composition patterns, along with the necessary attribute aggregations, has been presented. Finally, a binding implementation has been suggested. We believe that these procedures with their underlying formal models can be applied to other application scenarios as well, as we plan to show in future work. Additionally, the model can be extended to include more patterns, and we plan to put a stronger focus on the implementation details of the component binding.

