Towards Autonomic Computing Middleware via Reflection

Gang HUANG, Tiancheng LIU, Hong MEI, Zizhan ZHENG, Zhao LIU, Gang FAN
School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
E-mail: {huanggang, liutch, zhengzzh, liuzhao, fangang}@sei.pku.edu.cn, [email protected]

Abstract

Autonomic computing middleware is a promising way to enable middleware-based systems to cope with the rapid and continuous changes of the Internet era. Technically, an autonomic computing middleware must provide three fundamental and challenging capabilities: how to monitor, reason about, and control the middleware platform and its applications. This position paper presents a reflection-based approach to autonomic computing middleware, following the philosophy that autonomic computing should focus on how to reason, while reflective computing supports how to monitor and control. In this approach, the states and behaviors of middleware-based systems can be observed and changed at runtime through reflective mechanisms embedded in the middleware platform. On the basis of reflection, autonomic computing facilities can be constructed to reason about the system and decide when and what to change. The approach is demonstrated on a reflective J2EE application server, which automatically optimizes itself in the standard J2EE benchmark test.

1. Introduction

The rapid evolution and pervasiveness of networked and distributed applications make middleware technologies proliferate [13]. This proliferation can be perceived from three dimensions. Firstly, middleware encapsulates more and more capabilities for managing underlying computing resources, functions that were traditionally considered the province of distributed operating systems. Secondly, although middleware originated from problems common to most distributed systems, it now implements many functions that are only usable in a specific application domain, such as finance, retail, or telecommunications. Thirdly, middleware provides facilities, such as component models, that help the development and deployment of distributed systems. Furthermore, the proliferation can also be perceived from the fact that a single middleware product provides far more functions and qualities than ever before. For example, J2EE (Java 2 Platform, Enterprise Edition) [14] provides JDBC (Java Database Connectivity), JTA (Java Transaction API), JMS (Java Message Service), RMI (Remote Method Invocation) and other functions that were traditionally provided by separate middleware products.

Nowadays, the extremely open and dynamic nature of the Internet makes user requirements and operating environments change frequently. This implies that the plentiful functions and qualities of middleware should themselves be able to change. Such changes usually have to be performed at runtime because many distributed systems attempt to provide 7x24 availability. Obviously, it is very difficult for administrators to manage middleware in a sea of rapid, continuous and real-time changes. As a result, the benefits of middleware proliferation may decrease drastically because of the increasing complexity and cost of middleware management.

Recently, autonomic computing has become a hot topic because it addresses the practical and urgent need for computer-based systems that manage themselves, promising customers a drastic decrease in the cost of management [7][8]. It is natural to equip middleware with the capability of autonomic computing, yielding what could be called autonomic computing middleware. Technically, autonomic computing middleware can be considered as middleware able to monitor (observe runtime states and behaviors), reason (analyze the states and behaviors and then decide when and what to change) and control (perform changes of runtime states and behaviors) itself. For a given middleware, which states and behaviors are observable and changeable at runtime is determined by the middleware itself, while the analysis and decision making depend not only on the middleware but also on the deployed applications and the responsibilities of the administrators. For example, the workload can be represented by the utilization of CPU and memory and adjusted by setting specific configuration parameters. But what counts as overload, and how to deal with it, may differ from one application or administrator to another.


Overload might be indicated by 60 percent CPU utilization and 80 percent memory utilization, or by other thresholds. It can be eliminated by reducing the size of the thread pool, decreasing the number of active component instances, or migrating a set of components to other hosts. In other words, autonomic computing middleware should provide a sophisticated, middleware-specific framework for monitoring and controlling itself, while providing a flexible framework for reasoning that depends on the middleware, the applications and the administrators.

This paper presents an approach to autonomic middleware based on reflective mechanisms. In this approach, the reflective mechanisms expose the up-to-date states and behaviors of the middleware platform and applications and allow them to be changed at runtime [10]. A set of facilities built on the reflective mechanisms is responsible for observing, aggregating and analyzing runtime information to determine the conditions for change, and for changing the runtime system under the guidance of policies.

The rest of the paper is organized as follows: Section 2 outlines the framework that implements an autonomic computing middleware on top of a reflective middleware; Section 3 presents a case study of how to enable a reflective middleware to optimize itself automatically; Section 4 introduces related work; and the last section concludes the paper and identifies future work.

2. Approach Overview

2.1. Autonomic Computing

Autonomic computing is a systematic approach to building computer-based systems that manage themselves without human intervention [7]. An autonomic computing system has four basic characteristics [8]: self-configuration frees people from adjusting the properties of the system in response to changes in the system and its environment; self-optimization frees people from tuning the system to achieve the best utilization of resources; self-healing frees people from discovering and recovering from, or preventing, system failures; and self-protection frees people from securing the system.

Figure 1. Technical Model of Autonomic Computing

Technically, an autonomic computing system is a normal computational system with autonomic ability. As shown in Figure 1, an autonomic computing system can be divided into two parts performing different computations: basic computation utilizes computer and network resources to solve problems in its application domain; autonomic computation, which distinguishes autonomic computing systems from other computational systems, is responsible for making the basic computation reliable, secure and efficient. In detail, the autonomic computation collects, measures and analyzes the states and behaviors of the basic computation, and then decides when and how to adjust them.

2.2. Reflective Computing

Reflection, also known as computational reflection, was introduced by B.C. Smith as a way to access and manipulate a LISP program as data during its execution [12]. Because it helps to achieve flexible and adaptive systems, reflection has since propagated into operating systems, distributed systems and middleware.

Figure 2. Technical Model of Reflective Computing

Figure 2 illustrates the fundamental concepts of reflection. A reflective system is a computational system with two levels. The base level consists of base entities that perform the usual functionality of the system, that is, the basic capability of a computational system regardless of whether it is reflective or not. In detail, the base level builds a model to represent the problem domain and then reasons about and manipulates this model to solve the problems. The meta level consists of meta entities that perform reflection on the system. It builds a model to represent the base level. This model, called the self-representation of the system, is causally connected with the base entities: changes to the base entities immediately lead to corresponding changes in the self-representation, and vice versa [9]. The computation at the meta level guarantees this causal connection between the self-representation and the base entities. A reflective system can thus be defined as a computational system with the ability, called reflection, to access and modify its internal states and behaviors through its causally connected self-representation.
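As a concrete illustration of the causal connection, the following minimal Java sketch shows a base-level entity and its meta-level self-representation: reading the meta object always reports the current base state, and writing through the meta object immediately changes the base entity. All names here are illustrative and are not taken from PKUAS or any other product.

```java
// Minimal sketch of a causally connected self-representation (illustrative names only).

/** Base-level entity: performs the usual functionality of the system. */
class ConnectionPool {
    private int capacity = 10;

    int getCapacity() { return capacity; }

    void resize(int newCapacity) { this.capacity = newCapacity; }
}

/** Meta-level entity: a self-representation causally connected to the base entity. */
class ConnectionPoolMetaObject {
    private final ConnectionPool base;

    ConnectionPoolMetaObject(ConnectionPool base) { this.base = base; }

    /** Introspection: observing the meta object reports the up-to-date base state. */
    int getCapacity() { return base.getCapacity(); }

    /** Intercession: changing the meta object immediately changes the base entity. */
    void setCapacity(int newCapacity) { base.resize(newCapacity); }
}

public class CausalConnectionDemo {
    public static void main(String[] args) {
        ConnectionPool pool = new ConnectionPool();
        ConnectionPoolMetaObject meta = new ConnectionPoolMetaObject(pool);

        System.out.println("observed capacity = " + meta.getCapacity()); // 10
        meta.setCapacity(20);                                            // change at the meta level
        System.out.println("base capacity now  = " + pool.getCapacity()); // 20
    }
}
```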

2.3. From Reflective to Autonomic


From the above analysis of autonomic computing systems and reflective systems, it can be concluded that both perform a special computation on top of the basic computation: autonomic computation and reflective computation respectively. Most importantly, autonomic computation can utilize reflective computation, as shown in Figure 3: firstly, reflective computation collects the states and behaviors of the basic computation; secondly, autonomic computation measures and analyzes these data and decides when and what to change; finally, reflective computation determines how to change and enforces the changes. Since reflective computing has been studied for more than twenty years and demonstrated in many computational systems, it provides a solid foundation for autonomic computing. At the same time, autonomic computing cannot substitute for reflective computing, because some monitoring and control work still requires human intervention.
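This division of labor can be sketched in a few lines of Java. The sketch below only illustrates the three-step pipeline of Figure 3 and is not real middleware code: the reflective layer only observes and enforces, while the autonomic part only analyzes and decides. The interface, the threshold and the adjustment rule are invented for the example.

```java
// Illustrative sketch of Figure 3 (not PKUAS code): reflection collects and enforces,
// autonomic computation analyzes and decides.

/** What reflective computation offers to autonomic computation. */
interface ReflectiveLayer {
    double observeCpuUtilization();       // step 1: collect states and behaviors
    int observeThreadPoolSize();
    void changeThreadPoolSize(int size);  // step 3: enforce the decided change
}

/** Autonomic computation: measures, analyzes and decides when and what to change. */
class AutonomicLoop {
    private final ReflectiveLayer reflection;
    private final double overloadThreshold;   // e.g. 0.9, chosen by the administrator

    AutonomicLoop(ReflectiveLayer reflection, double overloadThreshold) {
        this.reflection = reflection;
        this.overloadThreshold = overloadThreshold;
    }

    /** One iteration of the loop: monitor, reason, control. */
    void step() {
        double cpu = reflection.observeCpuUtilization();      // monitor (via reflection)
        int size = reflection.observeThreadPoolSize();

        if (cpu > overloadThreshold && size > 1) {             // reason (autonomic)
            reflection.changeThreadPoolSize(size - 1);         // control (via reflection)
        } else if (cpu < overloadThreshold / 2) {
            reflection.changeThreadPoolSize(size + 1);
        }                                                      // otherwise keep the current size
    }
}
```

In the framework described below, the observation and enforcement calls correspond to the reflective API, while the decision logic is the part that varies with applications and administrators.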

Figure 3. Autonomic Computing via Reflection

Reflective middleware is middleware that introduces reflection into its construction and management so as to make its internal states and behaviors accessible and modifiable at runtime [1]. For a reflective middleware, the autonomic capability can be implemented by the framework shown in Figure 4.

Figure 4. Autonomic Computing Middleware via Reflection

The framework consists of the reflective middleware, a set of autonomic facilities, a reflective programming model providing an application programming interface (the reflective API), and a graphical user interface. In the reflective middleware, the base entities are usually implemented as a set of components, which can be observed and adjusted independently. The meta entities can be accessed through the reflective API, which is always guarded by access control to prevent damage from hostile operations. Among the autonomic facilities, the Meta-Data Repository stores the information collected at runtime and aggregates it with special rules to support historical analysis, such as statistics. The Policy Repository stores a set of policies that help to make decisions. The Reasoning Engine analyzes up-to-date or historical data to identify when to change, and selects a policy to decide what to change. No matter how powerful the autonomic facilities are, the middleware still needs human intervention: through a GUI, people can manage the middleware platform and applications and add, remove or change policies.

The framework is demonstrated in a J2EE application server, called PKUAS (Peking University Application Server) [6][10]. Because of the length limitation of this paper, we only discuss how to enable PKUAS to optimize itself automatically. More technical details can be found in [6] and [10].
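The following sketch outlines how these facilities might fit together. It mirrors only the roles described above (an access-guarded reflective API, the Meta-Data Repository, the Policy Repository and the Reasoning Engine); the interfaces themselves are assumptions and do not reproduce PKUAS's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the autonomic facilities built on the reflective API. The roles follow
// the text above; the concrete interfaces are assumptions, not PKUAS source code.

/** Reflective API exposed by the meta entities, guarded by access control. */
interface ReflectiveApi {
    Object get(String metaDataName);
    void set(String metaDataName, Object value) throws SecurityException;
}

/** Meta-Data Repository: stores and aggregates runtime information for historical analysis. */
class MetaDataRepository {
    private final List<double[]> samples = new ArrayList<>();

    void record(double throughput, double responseTimeMs, double cpuUtilization) {
        samples.add(new double[] { throughput, responseTimeMs, cpuUtilization });
    }

    double averageCpuUtilization() {
        return samples.stream().mapToDouble(s -> s[2]).average().orElse(0.0);
    }
}

/** A policy maps an observed condition to a change performed through the reflective API. */
interface Policy {
    boolean applies(MetaDataRepository history);
    void enforce(ReflectiveApi api);
}

/** Policy Repository: policies are added, removed or changed by administrators via the GUI. */
class PolicyRepository {
    private final List<Policy> policies = new ArrayList<>();

    void add(Policy policy) { policies.add(policy); }

    List<Policy> all() { return policies; }
}

/** Reasoning Engine: identifies when to change and selects a policy deciding what to change. */
class ReasoningEngine {
    void step(MetaDataRepository history, PolicyRepository policies, ReflectiveApi api) {
        for (Policy policy : policies.all()) {
            if (policy.applies(history)) {
                policy.enforce(api);   // the actual change is carried out by the reflective layer
                return;
            }
        }
    }
}
```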

3. Self-Optimization of a Reflective J2EE Application Server

In PKUAS, self-optimization is considered to be the automatic management of computing, networking and storage resources to achieve pre-defined performance metrics, such as utilization, throughput and response time. Here, we discuss how to automatically adapt the size of the thread pool in PKUAS to achieve the best tradeoff between throughput and response time in ECperf, the standard benchmark for J2EE application servers. This self-optimization reduces the testing time of PKUAS by about 80%.
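The paper does not formalize what the "best tradeoff" between throughput and response time is; the small sketch below shows one hedged way such a goal could be encoded when comparing observed configurations. Every name and the comparison rule are assumptions made for illustration, not the metric actually used inside PKUAS.

```java
// One possible (assumed) encoding of the optimization goal: among configurations that
// keep the CPU below a limit, prefer higher throughput, then lower response time.

class Measurement {
    final int threadPoolSize;
    final int txRate;
    final double throughput;       // e.g., BBops/min reported by ECperf
    final double responseTimeMs;
    final double cpuUtilization;   // 0.0 .. 1.0

    Measurement(int threadPoolSize, int txRate,
                double throughput, double responseTimeMs, double cpuUtilization) {
        this.threadPoolSize = threadPoolSize;
        this.txRate = txRate;
        this.throughput = throughput;
        this.responseTimeMs = responseTimeMs;
        this.cpuUtilization = cpuUtilization;
    }

    /** Returns the better of two measurements under the assumed tradeoff. */
    static Measurement better(Measurement a, Measurement b, double cpuLimit) {
        if (a.cpuUtilization > cpuLimit) return b;   // overloaded configurations lose
        if (b.cpuUtilization > cpuLimit) return a;
        if (a.throughput != b.throughput) {
            return a.throughput > b.throughput ? a : b;
        }
        return a.responseTimeMs <= b.responseTimeMs ? a : b;
    }
}
```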

3.1. ECperf Testing without Self-optimization


ECperf simulates the typical transactions of e-business applications, with many clients sending requests simultaneously and repeatedly. The request generation rate (called the txRate) determines the number of concurrent client calls. The higher the txRate, the higher the throughput the J2EE application server achieves in the test, until the server is overloaded. The tester therefore has to try different txRates and different resource configurations to obtain the best throughput. Unfortunately, the txRate in the default ECperf cannot be changed at runtime, which means that it takes a very long time to obtain the best test result. As shown in Figure 5, the highest throughput of PKUAS appears at a txRate of 16, while the best tradeoff between throughput and response time appears at a txRate of 15 (testing computer configuration: Intel P4 2.8 GHz, 512 MB memory, Windows 2000 Server). It takes 1530 minutes, that is, 3 (resource configurations tried for a given txRate, at least) x 17 (different txRates tried) x 30 (minutes to execute ECperf once, at least), to get the results.


If we try to find the most precise resource configuration, many more tests have to be done. Moreover, such a tedious test has to be repeated on another machine with different computing power.

Figure 5. Throughput and Response Time of ECperf Running on PKUAS without Self-Optimization

3.2. Self-optimization Model for ECperf

In PKUAS, most resource management actions are performed by observing and changing meta data through reflection, such as the size of the thread pool and the size of the JDBC connection pool per PKUAS instance, the size of the EJB instance pool and the interval for passivating instances per container instance, the size of the connection pool and the size of the message buffer per interoperability protocol, and so on. PKUAS can optimize itself by selecting one or more performance metrics and setting the above meta data to satisfy them. After carefully investigating the meta data related to resource management, we find that the size of the thread pool is the most important factor affecting throughput and response time in ECperf.

PKUAS uses a thread pool to control the use of threads, as shown in Figure 6. Working threads are the real threads that serve incoming requests. The controller is a thread that creates and destroys working threads and dispatches incoming requests to them. The task queue buffers incoming requests when the working threads cannot handle them all simultaneously. In general, fewer threads give shorter response times and more threads give higher throughput, but too many threads cause competition among them, and the cost of thread switching also reduces the throughput. To choose a proper thread pool size, the processing capability of the server machine, the complexity of the business logic and other factors must be taken into consideration. Unfortunately, these factors often differ between systems, and even between deployment environments for the same system.

Figure 6. Self-Optimization Model for ECperf

The communication layer receives incoming requests and puts them into the task queue. The thread pool controller dispatches these requests, records runtime information such as the number of working threads and the number of idle threads, and stores this information in the ThreadPoolMetaObject. The container system records the response time of each request, counts the requests it handles, and stores this information in ContainerMetaObjects. Both the thread pool and the container system provide a reflective API through which the reasoning engine can access their meta data. The reasoning engine analyzes the statistical information obtained from the ThreadPoolMetaObject and the ContainerMetaObjects, decides whether to increase, decrease or keep the thread pool size according to the given policy, and finally writes the new pool size back to the ThreadPoolMetaObject. Each policy can be expressed in terms of throughput, response time and CPU utilization. The policy for ECperf is to achieve the best tradeoff between throughput and response time without overloading the CPU, e.g., beyond 90% utilization.
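A sketch of this reasoning step is given below. The meta-object interfaces are modeled on the names used above (ThreadPoolMetaObject, ContainerMetaObject), but their methods, the thresholds other than the 90% CPU limit, and the adjustment step are assumptions; PKUAS's actual policy may differ.

```java
// Hedged sketch of one reasoning step over the two meta objects named above.
// Method names, the response-time rule and the step size are assumptions.

interface ThreadPoolMetaObject {
    int getPoolSize();
    int getWorkingThreadCount();
    int getIdleThreadCount();
    void setPoolSize(int newSize);          // the change is enforced through reflection
}

interface ContainerMetaObject {
    double getAverageResponseTimeMs();      // recorded per request by the container system
    long getHandledRequestCount();
}

class ThreadPoolReasoner {
    private static final double CPU_LIMIT = 0.90;   // "without CPU overload, e.g., 90%"

    private final ThreadPoolMetaObject pool;
    private final ContainerMetaObject container;

    private double lastThroughput = -1;
    private double lastResponseTimeMs = -1;

    ThreadPoolReasoner(ThreadPoolMetaObject pool, ContainerMetaObject container) {
        this.pool = pool;
        this.container = container;
    }

    /** One reasoning step: decide whether to increase, decrease or keep the pool size. */
    void step(double currentThroughput, double cpuUtilization) {
        int size = pool.getPoolSize();
        double responseTimeMs = container.getAverageResponseTimeMs();
        boolean responseWorsened =
                lastResponseTimeMs > 0 && responseTimeMs > 1.5 * lastResponseTimeMs;

        if (cpuUtilization > CPU_LIMIT || responseWorsened) {
            pool.setPoolSize(Math.max(1, size - 1));   // shed threads when overloaded
        } else if (currentThroughput > lastThroughput && pool.getIdleThreadCount() == 0) {
            pool.setPoolSize(size + 1);                // growing the pool still pays off
        }                                              // otherwise keep the current size

        lastThroughput = currentThroughput;
        lastResponseTimeMs = responseTimeMs;
    }
}
```

In PKUAS the decision would additionally follow the policy selected from the Policy Repository; the sketch hard-codes a single rule for brevity.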

3.3. ECperf Testing with Self-optimization

With self-optimization, the ECperf test can be simplified and takes less time. The process is shortened to two steps: first, automatically finding the best configuration of PKUAS (i.e., the size of the thread pool) and of ECperf (i.e., the txRate) with self-optimization enabled, and then running ECperf again without self-optimization to generate the standard ECperf test result.


At the client side, because modification of the source code is not allowed by the ECperf specification, a standalone client, called the autonomic test client, is built to change the txRate and test time at runtime. The client starts ECperf with a txRate of 1. When PKUAS proposes to increase the txRate, the client interrupts ECperf and restarts it with the txRate increased by 1. When PKUAS proposes to run ECperf with a given txRate, the client interrupts ECperf and restarts it with that txRate.

At the server side, the reasoning engine monitors and analyzes the throughput, response time and CPU utilization, and can make three decisions. First, if the reasoning engine obtains a throughput value and the CPU utilization does not exceed 90%, it tries both increasing and decreasing the thread pool size. Second, if the throughput does not increase after changing the thread pool size and the CPU utilization is still below 90%, the reasoning engine tells the client to increase the txRate. Third, if the response time increases dramatically (e.g., by more than 50%) or the CPU utilization exceeds 90%, the reasoning engine tells the client to increase the txRate. If the response time or CPU utilization still increases, the previous txRate and thread pool size are the best configuration; if they decrease or stay unchanged, the reasoning engine has to try other thread pool sizes and a larger txRate. Once the best configuration is detected, the reasoning engine stops collecting and analyzing runtime information so that the standard ECperf test with the best configuration is not disturbed by the self-optimization.

Figure 7 shows the throughput and response time of this test, run on the same PC used for the test shown in Figure 5. Note that EJB calls, not Business Operations, are used as the counting unit; the two are not the same, but they have a linear relationship. According to the figure, the best tradeoff between throughput and response time appears at a txRate of 15, the same result as the manual test, while the CPU utilization stays between 30% and 80%. The first step takes about 270 minutes, that is, 18 (different txRates tried) x 15 (minutes for PKUAS to try different configurations, collect data and reason), and the second step takes 30 minutes to run ECperf and produce the standard report. The whole process therefore takes about 300 minutes, which is about 20% of the manual testing time.

Self-optimization itself consumes CPU time and other computing resources and thus affects the performance of the whole system to some degree. For ECperf at a txRate of 15, the throughput without self-optimization is 1458.67 BBops/min and the throughput with self-optimization is 1414.67 BBops/min; that is, self-optimization decreases the throughput by only about 3%.

Furthermore, self-optimization can be stopped and started on demand because it is configured as a plug-and-play service of PKUAS, so the performance impact can be controlled by the administrators.
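The three decisions described above can be summarized in code. The sketch below is a reconstruction from the prose, with assumed interfaces for the autonomic test client and the thread pool tuning; it is not the actual reasoning engine of PKUAS.

```java
// Reconstruction (from the prose above) of the three decisions made at the server side.
// The 90% CPU limit and the 50% response-time jump follow the text; the interfaces,
// method names and bookkeeping are assumptions for illustration.

interface AutonomicTestClient {
    void increaseTxRate();               // interrupt ECperf and restart it with txRate + 1
}

interface ThreadPoolTuner {
    /** Try both increasing and decreasing the pool size; true if throughput improved. */
    boolean tryNeighbouringPoolSizes();
}

class EcperfReasonEngine {
    private final AutonomicTestClient client;
    private final ThreadPoolTuner tuner;
    private double lastResponseTimeMs = -1;

    EcperfReasonEngine(AutonomicTestClient client, ThreadPoolTuner tuner) {
        this.client = client;
        this.tuner = tuner;
    }

    /** Called once a stable measurement is available for the current txRate and pool size. */
    void onMeasurement(double throughput, double responseTimeMs, double cpuUtilization) {
        boolean cpuOverloaded = cpuUtilization > 0.90;
        boolean responseJumped =
                lastResponseTimeMs > 0 && responseTimeMs > 1.5 * lastResponseTimeMs;
        lastResponseTimeMs = responseTimeMs;

        if (!cpuOverloaded && !responseJumped) {
            // Decision 1: CPU below 90% -- try both increasing and decreasing the pool size.
            boolean improved = tuner.tryNeighbouringPoolSizes();
            if (!improved) {
                // Decision 2: resizing no longer helps and the CPU is still below 90%.
                client.increaseTxRate();
            }
        } else {
            // Decision 3: the response time jumps dramatically (e.g., by more than 50%)
            // or the CPU exceeds 90%; the client is asked to increase the txRate once more
            // to confirm whether the previous configuration was the best one.
            client.increaseTxRate();
        }
    }
}
```

Once the best configuration is confirmed, the engine would stop collecting and analyzing data, as described above.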

Figure 7. Throughput and Response Time of ECperf Running on PKUAS with Self-Optimization

4. Related Work

Recently, an international workshop on autonomic computing was held [15], and a special issue appeared in the IBM Systems Journal [11]. The papers in the workshop and the special issue cover a wide range of topics, including middleware, networks, storage and grids, which indicates that autonomic computing is gaining attention from both industry and academia.

IBM has reviewed WebSphere from the perspective of autonomic computing [3]. WebSphere already supports many cases of autonomic computing, which are instructive for PKUAS. Although WebSphere does not seem to utilize reflection, we cannot compare it with PKUAS in depth because little information about its internal implementation is available.

JAGR is a self-recovering J2EE application server [2]. It adds a set of recovery-oriented computing components to JBoss, an open-source J2EE application server with some reflective mechanisms [5].


To discover failures, JAGR monitors the exceptions thrown by application and platform components in JBoss to identify potential failures, monitors the exceptions received by clients to confirm whether those potential failures are actual failures, and monitors the invocation path of each client request, aggregating the results to detect failures from anomalous paths. To recover from failures, JAGR builds a failure-path graph to identify which components should be rebooted when a given failure is discovered.

Diao et al. conducted an experiment on a self-optimizing web server [4]. They found that the utilization of CPU and memory is determined by two configuration parameters of the Apache web server, and built a set of agents on Apache that automatically adjust the two parameters to meet the resource utilization targets pre-defined by administrators. Compared to the self-optimization case of PKUAS, they select resource utilization as the performance metric, while PKUAS selects throughput and response time simultaneously; as a result, their detailed adaptation behaviors differ from those of PKUAS.

Although the above two systems utilize reflective mechanisms to some degree, they are not explicitly aware of reflection, and they do not systematically investigate a reflection-based approach to autonomic computing middleware, whereas autonomic computing should be studied and achieved in a systematic way [7][8].

5. Conclusion and Future Work

Autonomic computing middleware is a promising way to preserve and improve the benefits of middleware proliferation. This position paper presents an approach to achieving autonomic computing middleware through reflection. In detail, reflective mechanisms provide a causally connected self-representation of the middleware, which guarantees that changes to the self-representation immediately lead to corresponding changes in the system, and vice versa. Based on these reflective mechanisms, a set of facilities for analysis and planning can be easily constructed. To demonstrate the approach, we discuss how to enable a reflective J2EE application server to optimize itself automatically in the standard J2EE benchmark test.

Many issues remain before the autonomic computing middleware framework is complete, including the meta data warehouse and its online analysis, the automatic generation and evaluation of policies, and a generic decision maker. Based on this framework, more autonomic computing cases should be studied and evaluated.

Acknowledgement

This effort is supported by the Major State Basic Research and Development Program of China (973) under Grant No. 2002CB31200003; the National Natural Science Foundation of China under Grant No. 60233010 and 60125206; the National High-Tech Research and Development Plan of China under Grant No. 2001AA113060; the Major Project of Science and Technology Research of the Ministry of Education, P.R.C., under Grant No. 0214; and the IBM University Joint Study Program.

References

[1] Agha, G., editor. Special Issue on Adaptive Middleware, Communications of the ACM, 45(6), 2002.
[2] Candea, G., Kiciman, E., Zhang, S., Keyani, P., Fox, A. JAGR: An Autonomous Self-Recovering Application Server. In Proceedings of the Autonomic Computing Workshop and Fifth Annual International Workshop on Active Middleware Services (AMS'03), June 2003.
[3] Connor, J. Building e-business resiliency through autonomic computing. October 2002.
[4] Diao, Y., Hellerstein, J.L., Parekh, S., Bigus, J.P. Managing Web Server Performance with AutoTune Agents. IBM Systems Journal, Vol. 42, No. 1, 2003, pp. 136-149.
[5] Fleury, M., Reverbel, F. The JBoss Extensible Server. In Proceedings of IFIP/ACM Middleware 2003, LNCS 2672, pp. 344-373, 2003.
[6] Huang, G., Mei, H., Yang, F.Q. Runtime Software Architecture based on Reflective Middleware. Science in China (Series F), Vol. 47, No. 4, 2004.
[7] IBM. Autonomic Computing: IBM's Perspective on the State of Information Technology, http://www.ibm.com/research/autonomic, 2001.
[8] Kephart, J.O., Chess, D.M. The Vision of Autonomic Computing. IEEE Computer, January 2003, pp. 41-50.
[9] Maes, P. Concepts and Experiments in Computational Reflection. In Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA'87), Orlando, FL, USA, October 1987, pp. 147-155.
[10] Mei, H., Huang, G. PKUAS: An Architecture-based Reflective Component Operating Platform. Invited paper, 10th IEEE International Workshop on Future Trends of Distributed Computing Systems, Suzhou, China, 26-28 May 2004.
[11] Ritsko, J.J., editor. Special Issue on Autonomic Computing, IBM Systems Journal, Vol. 42, No. 1, 2003.
[12] Smith, B.C. Reflection and Semantics in Lisp. In Proceedings of ACM POPL'84, 1984, pp. 23-35.
[13] Soley, R. and the OMG Staff Strategy Group. Model Driven Architecture: OMG White Paper, Draft 3.2, http://www.omg.org/mda, November 27, 2000.
[14] Sun Microsystems. Java 2 Platform, Enterprise Edition Specification, Version 1.3, Proposed Final Draft 4, 2001.
[15] Titsworth, F., editor. Proceedings of the Autonomic Computing Workshop and Fifth Annual International Workshop on Active Middleware Services (AMS'03), IEEE Computer Society, June 2003.

