Autonomic Service Oriented Architectures with MAWeS

Emilio Mancini, Massimiliano Rak and Umberto Villano
Abstract— The highly distributed nature and the load sensitivity of Service Oriented Architectures (SOA) make it very difficult to guarantee performance requirements under rapidly changing load conditions. This paper deals with the development of service-oriented autonomic systems that are capable of optimizing themselves using a feedforward approach, by exploiting automatically generated performance predictions. The MAWeS (MetaPL/HeSSE Autonomic Web Services) framework allows the development of self-tuning applications that proactively optimize themselves by simulating the execution environment. After a discussion of the possible design choices for the development of autonomic web services applications, a soft real-time test application is presented and the performance results obtained in three different execution scenarios are discussed.

Index Terms— Autonomic systems, MAWeS, Web Services, Service Oriented Architecture.
I. INTRODUCTION

Nowadays Service Oriented Architectures (SOA) and their Web Services implementations are among the most relevant software solutions. But while the technology standards (SOAP, WSDL, ...) and their implementations (Axis, .NET, ...) are mature enough today, the definition of good methodologies for the development of efficient services and web-based applications is still an open issue. The management of heterogeneous, geographically distributed systems is a hard task, and the development of applications with predictable performance is often impossible. In practice, the only solution to guarantee critical performance requirements seems to be the use of an architecture able to auto-configure and to auto-tune itself until the given requirements are met.

Autonomic computing aims to bring automated self-management capabilities into computing systems [1]–[4]. The name “autonomic” derives from the biological sciences, where it indicates the human autonomic nervous system. Autonomic capabilities are usually classified in four different categories: self-configuring, self-healing, self-optimizing and self-protecting. A self-configuring system can dynamically adapt to changing environments, a self-healing one can discover, diagnose and react to disruptions, a self-optimizing one can monitor and tune resources automatically, and a self-protecting system can anticipate, detect, identify and protect against threats. Even if there are currently many simple examples of application of autonomic concepts, and a toolkit for building autonomic systems is available from IBM, all known solutions are fairly “young” and rather unstable [5]–[7].

The majority of these solutions are based on reactive autonomicity or, stated another way, on feedback control. The continuous analysis of system logs and/or direct monitoring points out configuration or performance problems, and the system automatically reacts by applying new configurations or tuning policies. Recently a different approach has been proposed, which is based on predictive autonomicity, i.e., on feedforward control [7]. In this case, the system detects and forecasts cyclic variations in parameters and their impact on performance, and dynamically self-tunes, anticipating the need. However, application development tends to be harder, because the autonomic behavior needs to be designed in an integrated way with the application and the system.

In companion papers [8], and more thoroughly in [9], we have presented a framework (MAWeS) that may be integrated with the application. This framework makes it possible to predict application performance in the expected system status before its startup, and to optimize its execution under the future load conditions using both system- and application-dependent parameters. Our proposal is based on what we have called the MetaPL/HeSSE methodology [10]–[12]. This uses a meta-language (MetaPL) to describe the software and to identify the critical parameters that influence performance (e.g., bandwidth, number of application threads, ...), and exploits the predictions produced by a simulation tool (HeSSE) to optimize their values. The software is thus proactively adapted to forthcoming environment changes.

In the previously mentioned papers, [8] and [9], we presented the MAWeS framework and a toy application where autonomicity is exploited at application level, supporting self-tuning applications that extend a given client. In this paper, we instead describe the development process of self-optimizing predictive autonomic Web Services, exploiting the MAWeS framework at service level and exploring the more complex case of composite services. The rest of the paper is organized as follows. First we describe MetaPL, HeSSE and MAWeS, and review related work. Then we introduce the framework methodology and its application at service level. Finally, we present the experimental results, outline future work and draw our conclusions.
II. MAWeS METHODOLOGY

The MAWeS framework has been developed to support predictive autonomicity in Web Service-based architectures. It is based on two existing technologies: the MetaPL language [10], [13] and the HeSSE simulation environment [14]. The first is used to describe the software system and the interactions inside it; the second, to describe the system behavior and to predict performance using simulation.
A. The simulation tool: HeSSE

HeSSE is a simulation tool that makes it possible to reproduce the performance behavior of a wide range of distributed systems for a given application, under different computing and network load conditions. It allows the description of distributed heterogeneous systems by interconnecting simple components, each of which reproduces the performance behavior of a section of the complete system (for instance, a CPU or a network). These components have to reproduce both the functional and the temporal behavior of the subsystem they represent. In HeSSE, the functional behavior of a component is the set of services it exports to other components; connected components can therefore ask other components for services. The temporal behavior of a component describes the time spent servicing requests. System modeling is performed primarily at the logical architecture level. For example, the use of a processor to execute instructions is modeled as the total time spent computing, without considering the per-instruction behavior. The software behavior is instead described by traces, which can be obtained by running programs (augmented with “sensors”), or by applying suitable filters to MetaPL descriptions. New tools able to obtain traces from profiler logs are currently under testing.

B. The description meta-language: MetaPL

MetaPL is an XML-based meta-language for the description of parallel programs which, like other prototype-based languages, can be used when applications are in the design phase or are not completely available [10], [11], [13]. It is language-independent and can be adopted to support different programming paradigms. It is structured in layers, with a core that can be extended through Language Extensions, implemented as XML DTDs; these extensions introduce new constructs into the language. Starting from a MetaPL program description, a set of extensible filters makes it possible to produce different program views. Filters can be used to generate views for program comprehension or documentation purposes, but also to produce traces of program activity annotated with timing information. These traces are amenable to direct simulation in HeSSE, thus making it possible to obtain fairly reliable predictions of program performance, even at the very first steps of code development [15].

C. MAWeS

The MAWeS framework relies on MetaPL descriptions, describing the software structure, and on HeSSE configuration files, describing the hardware/software execution environment, to run HeSSE simulations and to obtain performance data. Through the execution of multiple simulations, corresponding to different parameterized configurations, it can choose the parameter set that optimizes the software execution according to one or more criteria (e.g., shortest execution time). Depending on whether the optimization choices are performed at the launch of an application or at run-time, the necessity to simulate a high number of system configurations may lead to increased execution latencies, or to a loss of reactivity of the autonomic system.
Fig. 1. The three layers of the MAWeS Framework (Frontend, Core, WSInterface). The client functionalities in the Frontend can be integrated in applicative software.
In order to reduce the simulation overhead to a minimum, a large set of information is collected off-line (i.e., during the set-up of the application). This makes it possible to reduce the number of simulations to be performed at run-time. In any case, the developer can help MAWeS to reduce the number of candidate configurations, and thus the simulation overheads, by indicating explicitly in the MetaPL code the upper and lower bounds for system parameters. For example, the developer can supply the maximum and minimum number of computing nodes to be used for executing the autonomic application, letting the autonomic framework decide, by examining HeSSE performance predictions, whether in the coming CPU and network conditions it is better to increase parallelism or to reduce the amount of communication.

MAWeS is structured in three layers (Fig. 1). The first one is the front-end, which contains the software modules used by final users to access the MAWeS services. The second one is the core, which includes the components that manage MetaPL files and make optimization decisions. The last one contains the Web Services used to obtain simulations and predictions through MetaPL and HeSSE. The MAWeS front-end includes a standard client application interface, MAWeSclient, providing the general services that can be used and extended to develop new applications. The MAWeS Core exploits the environment services (i.e., the services offered by the environment to monitor and to manage itself) and the MetaPL/HeSSE web services interface, using the application information contained in the MetaPL description to find out the optimal execution conditions. It is a software unit provided both as a web service and integrated into the MAWeSclient. The MetaPL/HeSSE WS interface defines a set of services that make it possible to automate the generation of performance predictions by MetaPL/HeSSE.

It is worth pointing out that all simulations and optimization steps are made on the hardware/software environment indicated by the developer through the MetaPL description. Possible successive changes in the execution environment involve corresponding modifications of the MetaPL code.

The activity diagram of Fig. 2 shows the actions of an application exploiting the MAWeSclient interface, which asks the framework for configuration optimizations and reconfigures itself.
Fig. 2. The activity diagram of a SOA application that uses MAWeS services.
The MetaPL description must suitably identify the set of parameters that can be modified by the optimization engine. MAWeS will automatically perform a set of simulations, varying the values of these parameters to find the optimal set of values. The user can specify the tunable parameters by means of the autonomic MetaPL extension, which defines new MetaPL elements for the Mapping section [8], [9]. The extension introduces into the meta-language the Autonomic tag, which describes the target simulation configurations that can be used for application execution (Fig. 3). Another tag, named Parameter, identifies a variable that affects the overall performance and whose optimal value has to be evaluated by the framework. The parameter domain can be specified in terms of a range (minimum and maximum), or in terms of a list of valid values. The Constrain tag makes it possible to restrict the parameter domains through mathematical relations between the parameters. As mentioned before, the use of narrow variation ranges may help to prevent high autonomic overheads. Finally, the Target tag identifies the simulation output parameter that is the optimization target (e.g., the response time to be minimized).
Fig. 3. Fragment of a MetaPL description of a Web Services client with autonomic extensions, defining the parameters processes and band and the constraint band/processes = 2.
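To make the optimization step concrete, the following sketch illustrates the kind of search that the description above implies: enumerate the candidate values of the tunable parameters, discard the combinations that violate the Constrain relation, simulate each remaining configuration, and keep the one that minimizes the Target. It is only an illustration of the idea, not MAWeS code: the Simulator interface stands in for the HeSSE web service front-end, and the value ranges are hypothetical (only the parameter names and the constraint of Fig. 3 are taken from the paper).

```java
import java.util.HashMap;
import java.util.Map;

public class ParameterSweep {

    /** Hypothetical stand-in for a HeSSEws invocation returning a predicted response time. */
    public interface Simulator {
        double predictResponseTime(Map<String, Integer> configuration);
    }

    /** Returns the configuration with the lowest predicted response time, or null if none is valid. */
    public Map<String, Integer> findBest(Simulator simulator) {
        Map<String, Integer> best = null;
        double bestTime = Double.MAX_VALUE;
        for (int processes = 2; processes <= 4; processes++) {     // illustrative range for 'processes'
            for (int band = 1; band <= 8; band++) {                // illustrative range for 'band'
                if (band != 2 * processes) continue;               // Constrain: band/processes = 2
                Map<String, Integer> candidate = new HashMap<>();
                candidate.put("processes", processes);
                candidate.put("band", band);
                double predicted = simulator.predictResponseTime(candidate);
                if (predicted < bestTime) {                        // Target: minimize the response time
                    bestTime = predicted;
                    best = candidate;
                }
            }
        }
        return best;
    }
}
```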
D. Related work

The majority of autonomic computing models rely on feedback control mechanisms to provide self-configuration and/or self-optimization capabilities (see for example [16]). The use of predictive autonomicity was first introduced in [7]. Further research is currently being performed on the use of feedforward control, and on the joint use of feedforward and feedback [4]. The adoption of simulation-based tools in the context of autonomic computing was introduced by the authors in [8] and [9]. Previously, the use of simulation for Web Services, and of Web Services interfaces to simulation, had been described in [17].

The use of composite services, i.e., of services exploiting several sub-services or sharing a task among them, has been studied mainly in the context of Workflow Management. Standard languages such as BPEL4WS, WSFL or XLANG have been developed to support service composition through choreography and orchestration techniques [18]. The difficulty of obtaining satisfactory performance in such systems makes them ideal candidates for MAWeS autonomic optimizations.
III. AUTONOMIC WEB SERVICES

Web applications are composed of two main elements: the Client, the software module that interacts directly with the final user, and the Services, a set of distributed functions and objects with a Web Service interface. Usually, but not necessarily, the client module maintains the application status during the execution and exploits the services for elaboration tasks (e.g., data manipulation, database access). The development of a new application consists of the definition of new clients, which interact with the final user, of a new set
of services, as reusable as possible, and of the aggregation of existing ones. Web Services systems are affected by significant call stack overheads, and are difficult to tune when the size of the hardware/software system grows, especially in the case of architectures based on composite services. In order to achieve good performance from a web application, it is possible to exploit self-optimization techniques at three different, non-exclusive levels:
• Server level: the autonomic system affects the underlying distributed system, i.e., the optimization actions are performed at the operating system and network levels;
• Service Provider level: the autonomic system affects the service provider, i.e., the optimization actions have an impact on the tuning of parameters and on the workload management policies of the set of offered services;
• Application level: the autonomic system affects user applications, modifying their resource use and the order of the actions they perform.
While the Server-level self-tuning activities are usually managed by system administrators, both Service- and Application-level optimizations can be adopted by web application developers. Our previous papers [8], [9] were focused on the use of MAWeS (MetaPL/HeSSE Autonomic Web Services) to provide autonomicity at Application level. In this case, the application developer exploits the predefined client provided by the framework. The client invokes the optimization services, and then the actual services with the optimal parameters: the control of the application performance is up to the client, as shown in Fig. 4(a).
Fig. 4. The application-level approach (a, client-side self-optimization) and the service-level approach (b, server-side self-optimization) in MAWeS. In (a), the web application client contains the logic needed to control the optimizations; in (b), each service chooses when to optimize its parameters.
An alternative to self-optimizing applications is the integration of the autonomic optimization logic directly into the services. In this case, the client application just aggregates the services, whose behavior self-adapts during the execution. The overall control of the self-tuning features lies in the called services, as shown in Fig. 4(b). The choice between the
two architectures is made on the basis of the particular service architecture that is to be provided with self-optimizing capabilities. In particular, in the case of composite services, it is better to use the latter method, which hides the service complexity from clients.

When the search for optimal parameters takes place inside the offered services, a common problem is choosing when to start the system performance analysis. We considered the following strategies for triggering the optimization:
• Deploy: optimization takes place only at service deployment;
• Service Call: each time a client calls the service, it self-optimizes, tuning itself to the current system status;
• Reactive: optimization takes place in correspondence with particular events, such as a timer expiration or a status notification from a load monitor.
The first strategy, Deploy, makes the tool unable to react to status changes in the environment: once the service is deployed, it is impossible to re-tune it. Hence it is suitable only for very stable systems, where the status conditions do not change significantly during the whole application execution cycle. The second strategy, Service Call, imposes a high overhead on the service execution: every time the service is started, a new set of simulations has to be performed. It can be useful in the case of long-running services, where the overhead is dwarfed by computation times, or, more generally, whenever the performance advantages expected from optimization are higher than the introduced overhead. The most flexible solution is the Reactive approach. In this case, optimization choices are performed only when they are really necessary, as the consequence of “significant” system status changes. The optimization trigger can be a notification event chosen by the application developer. This notification can be generated, for example, using a workload profile, which makes it possible to predict the times at which the system workload should change.

The autonomic Web Services architecture that will be considered in the rest of this paper is the server-side one of Fig. 4(b), with the MAWeS client integrated into the web service (Fig. 5). When, according to the optimization schedule that derives from one of the three strategies mentioned above, the MAWeS client starts the optimization services, the actions a-g in the figure are performed and a “fresh” set of optimal parameters is produced. On the other hand, calls from the final user’s web client trigger the actions 1-4 in the figure, which execute the end-user service using the latest known set of optimal parameters.

IV. TEST APPLICATION: WORKHOUR

In order to show by means of a practical example how the autonomic capabilities provided by MAWeS can be integrated in service-oriented architectures, we have developed a synthetic test application, WorkHour. This application has been implemented using two alternative architectures. The first is a call to an atomic web service, i.e., a service which does not call other web services; here, every service self-configures autonomously. The second, instead, is based on composite services.
Fig. 5. Integration of MAWeS in Web Services. The optimization cycle consists of steps a (start optimization), b (submit MetaPL description), c (create the project), d (generate traces), e (simulate, possibly repeated), f (return optimal parameters) and g (store optimal parameters); an end-user call consists of steps 1 (start service), 2 (read parameters), 3 (execute service) and 4 (end service).
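As a concrete illustration of the integration scheme of Fig. 5, the sketch below shows how a service might embed the optimization cycle: the MAWeS client refreshes the optimal parameter set on its own schedule (steps a-g), while end-user calls (steps 1-4) simply read the latest known set before executing the service logic. The class and the MawesClient interface are hypothetical, introduced here only for illustration; they are not the framework's actual API.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class AutonomicService {

    /** Hypothetical facade over the MAWeS front-end: submit a MetaPL description, get optimal parameters back. */
    public interface MawesClient {
        Map<String, Integer> optimize(String metaplDescription);
    }

    private final MawesClient mawes;
    private final String metaplDescription;
    private final AtomicReference<Map<String, Integer>> optimalParams = new AtomicReference<>();

    public AutonomicService(MawesClient mawes, String metaplDescription) {
        this.mawes = mawes;
        this.metaplDescription = metaplDescription;
    }

    /** Steps a-g: triggered at deploy time, on each call, or by a load-monitor event (the three strategies). */
    public void refreshOptimalParameters() {
        optimalParams.set(mawes.optimize(metaplDescription));
    }

    /** Steps 1-4: an end-user call reads the latest known optimal parameters and runs the service logic. */
    public void serve(Runnable serviceLogic) {
        Map<String, Integer> params = optimalParams.get();  // e.g. number of subservices, frame block size
        // ... configure the service with 'params' (omitted here), then execute it
        serviceLogic.run();
    }
}
```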
Many applications, especially in the entertainment area (video decoders, network gaming systems, ...), can tolerate a momentary loss of output quality in case of brief computing resource overloads. For instance, an “avatar” that moves in a video game world can choose a non-optimal path to reach its objective without appreciable degradation of the user experience, and a video streaming system can send or decode some frames at lower quality, but nevertheless guarantee its services. Often overload phenomena are not completely random, but follow some kind of empirical law; for instance, web or streaming servers experience an increase in the number of accesses every day at the same hours.

WorkHour is a synthetic test application representative of such soft real-time system behavior. It simulates the execution of a set of work units which should terminate within a predefined time slot, otherwise a (tolerable) output degradation is observed. Basically, WorkHour exposes a service that performs the conversion of a set of images to a compressed JPEG format using the standard Java image libraries. The faster the conversion is executed, the better the output quality. In practice, the optimization criterion to be used is the minimization of execution time or, equivalently, the maximization of the converted frame rate. In all the tests presented below, all frames have the same size (720x575 pixels). When the service is called, the back-end retrieves these images from an external repository and, after the computation has been performed, stores the resulting frames in the same repository. WorkHour thus entails network load to read and store the data from/to the repository (and for subservice calls, in the case of the composite-services architecture), plus CPU load to perform the conversion. As far as simulation is concerned, we make the simplifying assumption that the amount of data transmitted on the network and the CPU time required for conversion are constant for every image. We have implemented ad hoc components (Work Servers) that execute these units. The application has been designed using a web service architecture and the methodology described in the previous sections, with the structure shown in Fig. 5.
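A minimal sketch of the per-frame work unit is given below. The paper only states that the conversion uses the standard Java image libraries; the use of javax.imageio and the class and method names are our assumptions, and repository access is abstracted away behind the input stream.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

public class FrameConverter {

    /** Decodes one 720x575 source frame and re-encodes it as JPEG, returning the compressed bytes. */
    public byte[] convertFrame(InputStream frameData) throws IOException {
        BufferedImage frame = ImageIO.read(frameData);       // decode the frame fetched from the repository
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        ImageIO.write(frame, "jpeg", jpeg);                   // re-encode with the default JPEG writer
        return jpeg.toByteArray();                            // the caller stores the result back in the repository
    }
}
```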
In order to experiment with both the monolithic and the composite-services architecture, the WorkHour end-user service exposes one web interface with two services:
• Convert, which performs the computation atomically (i.e., without resorting to subservices);
• ConvertCS, which is not able to perform the computation on its own, but distributes the frames to be processed to one or more instances of Convert.
Depending on whether the client interface of WorkHour calls ConvertCS, which distributes the work to subservices, or Convert directly, which computes with no use of subservices, the conversion performed by WorkHour is obtained through a simple or a composite-service architecture. Figures 6 and 7 show two fragments of the MetaPL description of WorkHour, containing the web interface and the autonomic section, respectively.

V. EXPERIMENTAL RESULTS

In this section the WorkHour application is exploited to perform some experiments. The following case studies, corresponding to three different implementation scenarios, do not cover all possible working hardware/software configurations. In fact, they do not even try to do so, as exhaustive testing would be decidedly out of the scope of this paper. They are just examples designed to give the reader the flavor of the use of MAWeS for autonomic SOA development. In the first scenario, a monolithic system architecture with the Reactive strategy (optimization triggered by load-analysis notifications) is used to compare feedforward autonomicity, which is one of the distinctive features of MAWeS, to a more traditional feedback approach. In the two following scenarios, the version of WorkHour based on composite services is instead used. In scenario 2, the hardware is dedicated to WorkHour, which is the only source of CPU and network load, and the Service Call trigger strategy is adopted. Finally, in scenario 3 a non-dedicated system is considered and, due to the presence of external load, the Reactive trigger strategy is adopted.
Fig. 6. Fragment of the MetaPL description of WorkHour: the web interface (showing numframes and frameblock).
The tests that follow have been performed on a Rocks 4.1 cluster [19], composed of dual-processor nodes with 2.8 GHz Xeon CPUs and 1 GB of RAM, linked by Gigabit Ethernet. In all three scenarios considered, a dedicated node has been used to execute the MAWeS framework and to carry out the simulations.
Fig. 7. Fragment of the MetaPL description of WorkHour: the autonomic section.
Fig. 8. Activity diagram of WorkHour, scenario 1.
A. Scenario 1: simple services, feedforward vs. feedback

The first testing scenario compares the behavior of WorkHour under the feedforward approach to that under the more conventional feedback approach. The strategy chosen for triggering the optimizations is the Reactive one, with optimizations started by events produced by a load-monitoring facility. If autonomicity is based on feedback, these events signal the occurrence of load changes measured by the monitor. In the case of feedforward, they signal instead expected load changes in the near future, which could be recognized, for example, by analyzing current load changes and historical load data.

Fig. 8 shows the activity diagram of WorkHour in the first scenario. This figure, as well as the ones describing scenarios 2 and 3 presented in the following, is just a version of Fig. 2 where the Client Application (first column from the left in Fig. 2) is not shown, the Application Web Services (second column) are expanded in greater detail, and the MAWeS behavior (columns 3-6) is just sketched. In order to test the autonomic behavior, and to show how the services are able to adapt themselves to the environment, a CPU load is injected into all the computational nodes of the system (i.e., we apply an external activity to the target
environment, so that the service does not have exclusive use of the computational resources). The following tests assume that the external load has only two possible values, and that it changes at a given (known) time. As mentioned elsewhere, MAWeS is based on a feedforward approach, and the simulations needed for optimization are driven by profiles of the (expected) future load. However, it is possible to “hack” MAWeS for evaluation purposes, so as to adopt the more common feedback strategy, with optimizations driven by the measured load. The service response time is the index adopted to compare the different solutions.

Figure 9 shows the response of the same services, under the same external loads, without any load adaptation (No Adaptation), with the feedback strategy (Feedback) and with the feedforward strategy (FeedForward). The diagram reports the start time of each work unit on the x axis and its duration on the y axis. It turns out that the feedforward approach gives the best results. The feedback approach, instead, has a very high response time for a few work units (from 10 to about 18 s), and then optimizes its performance. This behavior is not surprising, as, when the optimization
is based on feedback, the system can optimize itself only after a short period of suboptimal performance. In other words, the reconfiguration is triggered by unwanted working conditions; only when these occur can the program find the optimal parameters through MAWeS.
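The difference between the two triggering modes compared in this scenario can be summarized by the sketch below: a feedback monitor reacts to a measured load change, while a feedforward monitor anticipates a change predicted from a workload profile. The class, the LoadMonitor interface and the 0.2 threshold are hypothetical, introduced only to illustrate the distinction; they are not part of MAWeS.

```java
public class OptimizationTrigger {

    /** Hypothetical monitor exposing both measured and profile-predicted CPU load (0.0 - 1.0). */
    public interface LoadMonitor {
        double currentCpuLoad();
        double expectedCpuLoad(long inMillis);  // predicted from historical/profile data
    }

    private final LoadMonitor monitor;
    private final Runnable startOptimization;   // e.g. the call that submits the MetaPL description to MAWeS
    private double lastSeenLoad;

    public OptimizationTrigger(LoadMonitor monitor, Runnable startOptimization) {
        this.monitor = monitor;
        this.startOptimization = startOptimization;
    }

    /** Feedback: re-optimize only after a significant measured change, i.e. after performance has already degraded. */
    public void onFeedbackTick() {
        double load = monitor.currentCpuLoad();
        if (Math.abs(load - lastSeenLoad) > 0.2) {   // illustrative threshold
            lastSeenLoad = load;
            startOptimization.run();
        }
    }

    /** Feedforward: re-optimize ahead of a predicted change (the native MAWeS mode). */
    public void onFeedforwardTick(long lookaheadMillis) {
        if (Math.abs(monitor.expectedCpuLoad(lookaheadMillis) - monitor.currentCpuLoad()) > 0.2) {
            startOptimization.run();
        }
    }
}
```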
Fig. 9. Execution time (ms) versus time (ms): comparison between the system evolution without MAWeS (no adaptation), with feedback MAWeS and with feedforward MAWeS.

Fig. 10. Activity diagram of WorkHour, scenario 2.
On the other hand, using the MAWeS feedforward approach, the user must monitor some index (such as the number of accesses to a site) and then study the system evolution through simulation, so as to find optimal parameter values. In this case, there is a suboptimal use of resources whenever it is not possible to predict effectively the evolution of the system. This happens in Figure 9 between 8 and 10 seconds of evolution, where the system has an anomalous behavior due to the misprediction. Apparently, the low execution times seem to indicate high performance, but in fact there is a very high rate of dropped frames.

B. Scenario 2: composite services, dedicated system, optimization on service call

The following is the first scenario designed to analyze the behavior of composite services. Here the system is dedicated, in that WorkHour is the only source of CPU and network load. In these conditions, probably the best strategy for triggering the autonomic optimizations is the Service Call one. Due to the absence of external load, there cannot be unexpected and sharp load variations, and the autonomic system can manage to keep the load evenly shared by suitably distributing incoming calls to subservices. As discussed below, the only drawback of this approach may be the increase of the service start latency due to autonomic overheads. For this reason, this strategy is advisable for services with a relatively long service time, which can dwarf the autonomic overheads. With optimization on Service Call, before the actual service is started, MAWeS is queried for the optimized parameters (in practice, the nodes where the request has to be serviced). The “reconfiguration” (which is not an actual reconfiguration, in this case) just applies the optimal configuration received from the autonomic framework, and the service is distributed accordingly to up to n subservices (Fig. 10).
The test shown here studies the evolution resulting from two service calls partially overlapping in time (Fig. 11). With optimization on service call, the optimization is made once and for all at service start. The addition of further load to nodes being used for servicing would entail a loss of the optimized working conditions; hence the system tries to service the second call using unloaded nodes (in this very simple case, only the third node in the figure). This optimization strategy is very simple and has the advantage of introducing autonomic overheads only at well-known moments in time, i.e., at the start of each service. However, it entails a growth of the service latency (the time needed to actually start the service). In order to point out this overhead, and to evaluate its detrimental effect on service times, Fig. 12 shows the Gantt chart of the whole system activity (clients, services, MAWeS components) for the above-described test. As shown in the figure, since MAWeS evaluates beforehand the possible CPU/network load configurations, in this case the simulator is invoked only twice for each service call, to simulate alternative CPU and network load configurations. This requires (using the current prototype implementation of the framework) a few seconds of activity, which is overhead that adds up to the actual service times. For long-running services (as in the case presented in the Gantt chart, where the first service runs for over 150 s), this is acceptable. In other cases, it is better to switch to the Reactive triggering strategy.
Fig. 11. Servicing two calls: the first service is performed on two nodes, the second only on the third node.
Fig. 12. Gantt chart of a system evolution in scenario 2 (rows: client, services on hosts 1-3, MAWeS client, MAWeS core, MetaPL manager, HeSSE simulator).

Fig. 13. Activity diagram of WorkHour, scenario 3.

C. Scenario 3: composite services, non-dedicated system, reactive optimization

The aim of this scenario is to show the behavior of MAWeS on a non-dedicated system, i.e., a system where some external, dynamically variable CPU/network load may be present on the hosts. In light of the above, as the load tends to vary during the processing of each service call, the autonomic optimization should be triggered by an external event, such as a signal from a load monitoring/prediction unit. Once the optimal configuration (optimal for the next time period, as MAWeS tries to anticipate the need) has been produced, a reconfiguration takes place. This leads to a possible modification of the set of subservices involved in the service of the call. Unlike in the previous case, the work has already been distributed before optimization, and the subservices are already working when the first reconfiguration is carried out. This behavior is represented graphically in the activity diagram of Fig. 13.

The test studies the evolution when two concurrent calls are serviced, as in the previous test. However, the service composition strategy now adopted is to exploit the maximum number of available hosts, even if they are already loaded. Figure 14 shows this behavior (Call 1-a and 2-a), before any reconfiguration is carried out. Figure 15 shows what happens if, starting from this system status, a synthetic CPU load is injected on one host (about 50% of the CPU on host c0-0), plus a moderate load on the network (about 5% of the theoretical bandwidth). Without reconfiguration (Fig. 15), the loaded host processes about 50 frames per second, and the other two hosts between 80 and 85; the service of the two calls terminates almost at the same time, after about 210 s.

Using MAWeS autonomic optimizations, when the load monitoring/analysis unit recognizes that a CPU load is expected, it triggers the simulation of alternative configurations and possibly the reconfiguration of the active load. In Fig. 14, after the reconfiguration, the subservices (Call 1-b and 2-b) are assigned to just one node. Figure 16, which shows the effect on response times, has to be compared to the previously considered Fig. 15. Now, a few seconds after the external load is injected (58 s), the load of node c0-0 is reassigned (the number of frames processed by c0-0 suddenly drops). The other two hosts keep processing about 110 frames per second, and terminate the service of the two calls after 115 and 145 s respectively, well in advance as compared to the system with no reconfiguration.

Fig. 14. Servicing two calls before (Call 1-a and Call 2-a) and after (Call 1-b and Call 2-b) a reactive optimization. In the first configuration each call exploits three subservices on distinct nodes; in the second, each call is serviced by a single subservice on one of the unloaded nodes.

Fig. 15. Frames converted per second: external load is injected in the interval from 50 to 165 s, and the system is not reconfigured.
Fig. 16. Frames converted per second: when the external load is injected, all computations are moved to hosts c0-1 and c0-2.

VI. FUTURE WORK

Even if a prototype version of MAWeS is currently available, and makes it possible to evaluate by experimentation the ideas on which the autonomic framework is based, we are convinced that our work on proactive autonomic optimization is only at its start. In particular, extensive testing remains to be done to assess the validity and the optimality of the optimizations carried out in systems that are larger and more widely distributed than the ones considered so far. At least in theory, scalability and stability of the system might be an issue, but we have not yet detected such problems in our tests. Currently, we are still involved in low-scale experiments, and we are working on the completion of a graphical development environment based on Eclipse [20].

As regards further fields of application for MAWeS, our interest is principally focused on applications based on mobile agents. The mobile-agent programming paradigm is an emerging approach to distributed programming, especially in the GRID and SOA contexts. In mobile agent systems, optimization problems are of great interest, due to the continuous changes of the execution contexts. An agent is able to suspend its own execution, to transfer itself to another agent-enabled host and to resume its execution at that destination. As a result, the platform usually does not have complete control over the state of the execution node. In practice, the only solution to guarantee critical requirements seems to be the use of an architecture able to auto-configure and to auto-tune itself, and this is the point where MAWeS comes into play. This will be one of our future research fields.

VII. CONCLUSIONS

In this paper, we have described the process of building service-oriented autonomic applications that are able to self-tune by means of the MAWeS framework. After an overview of the methodology on which MAWeS is based, we have considered the possible design choices for the development of autonomic web services applications. In the second part of the paper, we have presented a soft real-time synthetic test application, and shown some of the tests performed in the field, illustrating the obtained results. These results, though obtained on small-scale systems and without considering exhaustively all possible scenarios, point out that the approach is valid and deserves further investigation.

REFERENCES
[1] J. O. Kephart and D. M. Chess, “The vision of autonomic computing,” Computer, vol. 36, no. 1, pp. 41–50, 2003.
[2] K. P. Birman, R. van Renesse, and W. Vogels, “Adding high availability and autonomic behavior to web services,” in Proc. of the 26th International Conference on Software Engineering, Edinburgh, UK: IEEE Computer Society, May 2004, pp. 17–26.
[3] IBM Corp., An architectural blueprint for autonomic computing. USA: IBM Corp., Oct. 2004. [Online]. Available: www-3.ibm.com/autonomic/pdfs/ACBP2 2004-10-04.pdf
[4] Y. Zhang, A. Liu, and W. Qu, “Software architecture design of an autonomic system,” in Proc. of the 5th Australasian Workshop on Software and System Architectures, Melbourne, Australia, Apr. 2004, pp. 5–11.
[5] B. Jacob, S. Basu, A. Tuli, and P. Witten, A First Look at Solution Installation for Autonomic Computing. IBM Corp., July 2004. [Online]. Available: http://www.redbooks.ibm.com/redbooks/pdfs/sg247099.pdf
[6] B. Jacob, R. Lanyon-Hogg, D. K. Nadgir, and A. F. Yassin, A Practical Guide to IBM Autonomic Computing Toolkit. IBM Corp., Apr. 2004. [Online]. Available: http://www.redbooks.ibm.com/redbooks/pdfs/sg246635.pdf
[7] L. W. Russell, S. P. Morgan, and E. G. Chron, “Clockwork: A new movement in autonomic systems,” IBM Systems Journal, vol. 42, no. 1, pp. 77–84, 2003.
[8] E. Mancini, M. Rak, R. Torella, and U. Villano, “A simulation-based framework for autonomic web services,” in Proc. of the Eleventh International Conference on Parallel and Distributed Systems, Fukuoka, Japan, July 2005, pp. 433–437.
[9] ——, “Predictive autonomicity of web services in the MAWeS framework,” Journal of Computer Science, vol. 2, no. 6, pp. 513–520, 2006.
[10] N. Mazzocca, M. Rak, and U. Villano, “The MetaPL approach to the performance analysis of distributed software systems,” in Proc. of the 3rd Int. Workshop on Software and Performance. IEEE Press, 2002, pp. 142–149.
[11] E. Mancini, N. Mazzocca, M. Rak, and U. Villano, “Integrated tools for performance-oriented distributed software development,” in Proc. of the SERP’03 Conference, vol. 1, Las Vegas (NE), USA, June 2003, pp. 88–94.
[12] E. Mancini, N. Mazzocca, M. Rak, R. Torella, and U. Villano, “Performance-driven development of a web services application using MetaPL/HeSSE,” in Proc. of the 13th Euromicro Conference on Parallel, Distributed and Network-based Processing, Lugano, Switzerland, Feb. 2005, pp. 12–19.
[13] N. Mazzocca, M. Rak, and U. Villano, “MetaPL: a notation system for parallel program description and performance analysis,” in Parallel Computing Technologies, Lecture Notes in Computer Science, V. Malyshkin, Ed., vol. 2127. Berlin (DE): Springer-Verlag, 2001, pp. 80–93.
[14] ——, “The transition from a PVM program simulator to a heterogeneous system simulator: The HeSSE project,” in Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, J. Dongarra et al., Ed., vol. 1908. Berlin (DE): Springer-Verlag, 2000, pp. 266–273.
[15] B. Di Martino, E. P. Mancini, M. Rak, R. Torella, and U. Villano, “Cluster systems and simulation: from benchmarking to off-line performance prediction,” Concurrency and Computation: Practice and Experience, submitted for publication.
[16] Y. Diao, J. L. Hellerstein, S. Parekh, and J. P. Bigus, “Managing web server performance with AutoTune agents,” IBM Systems Journal, vol. 42, no. 1, pp. 136–149, 2003.
[17] S. Chandrasekaran, G. S. Silver, J. A. Miller, J. Cardoso, and A. P. Sheth, “XML-based modeling and simulation: web service technologies and their synergy with simulation,” in Proc. of the 2002 Winter Simulation Conference, vol. 1. San Diego, California, USA: ACM, Dec. 2002, pp. 606–615.
[18] W. M. P. van der Aalst, “Don’t go with the flow: Web services composition standards exposed,” IEEE Intelligent Systems, pp. 72–76, Jan. 2003. [Online]. Available: http://jmvidal.cse.sc.edu/library/aalst03a.pdf
[19] P. M. Papadopoulos, M. J. Katz, and G. Bruno, “NPACI Rocks: tools and techniques for easily deploying manageable Linux clusters,” Concurrency and Computation: Practice and Experience, vol. 15, pp. 707–725, 2002. [Online]. Available: http://www3.interscience.wiley.com/cgi-bin/abstract/104524123
[20] E. Gamma and K. Beck, Contributing to Eclipse: Principles, Patterns, and Plug-Ins. Addison Wesley, Oct. 2003.