A Methodology & Tool for Determining Inter-component Dependencies Dynamically in J2EE Environments

Umesh Bellur
[email protected]
School of Information Technology, IIT Bombay

Abstract— The emergence of server-side component models such as J2EE has considerably simplified the construction of enterprise applications. However, the enterprise's dependence on these applications and their proliferation has shifted the complexity into managing the environment in which these applications operate. The difficulty in management is chiefly due to the inter-dependencies that exist between the components that make up these applications. Because of this coupling, changing one component can have unpredictable side effects on the performance and functional behavior of other components. Self-management, or autonomic computing, looks into techniques for self-configuration, self-healing and self-protection. We believe that in order for a middleware to be self-managing, it is first necessary to obtain and maintain its topology, which characterizes the components making up this complex distributed environment and their inter-dependencies. We present here a methodology and a tool to non-intrusively and dynamically extract and store the component topology from a distributed execution environment. The techniques presented in this paper have evolved primarily for a J2EE environment, although they can be applied without loss of generality to any distributed object system.

I. INTRODUCTION

eBusinesses are the norm today, and this has led to a whole host of advances to simplify the development of eBusiness applications. These include server-side component models such as J2EE and .NET, along with IDEs that can automatically generate all of the plumbing needed to build such applications, leaving the application developer with the sole task of adding business logic. However, the complexity of the production IT environment hosting these applications has skyrocketed, and this, coupled with the scarcity of human expertise to manage these environments, has prompted research into autonomic computing, the science of self-managed systems. These techniques have become even more crucial in an atmosphere of constant change prompted by frequent deployments introducing new application features. It is interesting to note that most of this complexity originates in the well-understood concept of coupling, which introduces dependencies between components either within an application or across applications. Hence, in order to build systems that can manage themselves, it is necessary to know oneself - the components that make up the application and their dependencies on other components. Design documentation such as message sequence charts rarely keeps up with changes in code and also proves useless when it comes to documenting dependencies

across applications. It is therefore imperative to have a means by which we can dynamically determine the component map, or topology, of such computing environments. In this paper we present a way to model this topology and to extract it dynamically from the production environment at runtime. This work is being done in the context of a larger autonomic computing project at IITB named LAMDA¹, for Lights out Automated Management of Distributed Applications. The project is a comprehensive effort at providing an integrated autonomic enterprise environment that includes self-configuration and self-healing [1].

¹This project is sponsored in part by the Intel IT Research Council via grant 20393 and by an IBM Faculty Award in Autonomic Computing.

A. Topology

For our purposes we have assumed a component-based application environment, which is not unreasonable in today's enterprise. We are currently working with J2EE technologies but are confident that these techniques can be extended without loss of generality to competing technologies such as .NET. The term 'Topology' is used here to represent the following:
• Physical infrastructure and its configuration. This covers the machines that make up a network, along with the network map specifying the network topology. Configuration details include physical configuration such as CPU, memory, disk, etc., and logical configuration such as OS versions, packages installed, etc.
• The static and dynamic view of the application components. By static we refer to the interfaces supported by each component and its packaging details, while by dynamic we refer to the deployment view of components relative to one another. This is necessary to eventually create analytical or simulation models for the purposes of self-configuration.
• Dependencies that exist between the application components, as well as those between application components and infrastructure (software, hardware and network).
The topology therefore characterizes the application and its execution environment, and provides a language for a common understanding of what an application is and what it depends on. The topology can now be used for every facet of autonomic computing - self-configuration, self-healing and self-protection.
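To make this notion concrete, the following is a minimal sketch of one plausible in-memory representation of such a topology: components as nodes and usage dependencies as weighted directed edges. The class and field names are ours, for illustration only, and are not part of the AutoTopology tool.

import java.util.*;

// A node in the topology: an application component and its static view.
final class Component {
    final String name;                       // e.g. an EJB or servlet name
    final String host;                       // machine it is deployed on
    final List<String> interfaces = new ArrayList<>();
    Component(String name, String host) { this.name = name; this.host = host; }
}

// The topology graph: directed usage edges annotated with a strength.
final class Topology {
    private final Map<Component, Map<Component, Double>> edges = new HashMap<>();

    // record that 'from' uses 'to' with the given dependency strength
    void addDependency(Component from, Component to, double strength) {
        edges.computeIfAbsent(from, k -> new HashMap<>()).put(to, strength);
    }

    Map<Component, Double> dependenciesOf(Component c) {
        return edges.getOrDefault(c, Collections.emptyMap());
    }
}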


We are currently developing a tool for fault localization in J2EE applications that models the failure dependencies of the application using its topology and uses them for failure diagnosis once the application is deployed. The topology extracted from these applications is also being used for performance prediction using queueing network models.
The rest of this paper is organized as follows. Section II presents a taxonomy of existing approaches to topology determination, along with related work. We present our dynamic topology determination methodology and the AutoTopology tool for J2EE applications in Section III. We then present a case study for a sample J2EE application (Duke's Bank) and the resulting output generated by AutoTopology in Section IV. We end the paper with directions for further research in LAMDA towards building autonomic systems that use this topology information.

II. TAXONOMY AND RELATED WORK

Most interesting issues in topology extraction revolve around determining dependencies between components, so we present below a taxonomy of the various dependency determination approaches and a survey of existing systems that use them. Dependency determination approaches can be classified broadly as static or dynamic, based on whether or not one needs to execute the code. Static approaches are obviously attractive in that we do not need to disturb a production environment to place probes. Design representations as well as source code can be used to find services and their inter-dependencies, as shown in [2] [3] [4]. For example, UML class diagrams and message sequence charts, which can be obtained by reverse engineering source code, depict interconnection patterns in designs. However, this approach is useful only when analyzable source is available, which is often not the case. More importantly, static approaches do NOT allow us to determine cross-application dependencies, since packaging and deployment is usually done on a per-application basis.
Dynamic approaches can further be categorized by the level of instrumentation as intrusive, semi-intrusive and non-intrusive. Intrusive techniques are those that rely on code instrumentation; dependencies are calculated by correlating data gathered as transactions flow through the various components [5] [6] [7]. This approach is unsuitable where there are multi-vendor components, due to interoperability concerns, and in places where code cannot be inserted into the system for security, licensing or other technical constraints. Semi-intrusive approaches do not need to instrument the application code; however, the middleware used for deploying the applications is instrumented to tag incoming requests and trace each request to determine the components used to serve a particular class of request [8] [9]. Faults can also be injected into the system and their propagation used to infer component dependencies [10]. The ADD project [11] at IBM discovers dependencies by perturbing the system and measuring the change in the system response. Such approaches are still intrusive to the middleware used for deployment. Non-intrusive approaches look on the system as a black box and involve tracing of

communications and later inferring causal paths and patterns from the trace. This method does not call for modifying either the application or the middleware code. In [12] these traces are obtained by sniffing on all the participating hosts (or by mirroring the communications taking place on all the ports to which the hosts are connected). The disadvantage of this method is that it cannot provide dependencies between components co-located on a single host or process. The outputs also depend on where in the communication stack the sniffer is placed. In general, we feel that such black-box approaches produce useful outputs only when they can extract data at the level of individual components in the applications - not just at the level of the container or process.
Our approach and tool can extract the topology of any J2EE application without instrumenting the application or the J2EE application server on which it is deployed. We make use of the JVMPI interface provided by all JVM implementations and the concept of Interceptors in J2EE application servers to generate call traces for the application. The topology for the application is generated from these call traces by executing the use cases of the application and combining the dependency information in the traces. The logging involved does incur overhead during extraction, but it can be stopped whenever the topology extractor is not in use, without restarting the J2EE application server or redeploying the application. Our approach is therefore mostly non-intrusive and dynamic.

III. DYNAMIC TOPOLOGY EXTRACTION METHODOLOGY AND AUTOTOPOLOGY TOOL

The components in a J2EE application may be deployed within the same JVM, in different JVMs on the same machine, or distributed over multiple machines on a network. In general, the inter-component communication can be viewed as crossing a layered application stack, as depicted in Figure 1. The component dependencies can be extracted by monitoring the communication at one or more of these layers in the application stack.

Fig. 1. Topology Extraction at various layers in the system.

Packet sniffing at the network layer can be used to monitor traffic between processes on multiple hosts. For example, a packet sniffer like Ethereal [13] can be used to capture the marshalled remote invocation object being passed to the RMI


port of the destination machine, which contains information about the remote object and the method to be invoked on it. However, this approach is not very useful, since we cannot determine the client-side object identity or the specific interface being invoked. At the JVM layer, the Java Virtual Machine Profiler Interface (JVMPI) can be used to log all method calls in the JVM, which can be used to extract intra-JVM dependencies. This approach alone does not extend to calls that cross the JVM boundary. Remote Method Invocation (RMI) is used to invoke methods on remote Java objects; we could write a custom RMI socket factory for sending/receiving data that logs all the details of each remote invocation, but the complexity of writing such a socket factory and the overhead of logging all RMI calls from the JVM make this approach unsuitable. At the application server layer, calls made on EJBs² can be intercepted using the Interceptor mechanism provided by the J2EE application server. The interceptors at the client and server side log their respective call information, which can be correlated later and can yield inter-JVM dependencies too.
It is clear from our research that more than one layer has to be handled individually and the results correlated to obtain the complete set of runtime dependencies. In particular, we use JVMPI at the JVM layer and J2EE interceptors at the application server layer to extract the application's runtime dependencies. The remainder of this section describes our methodology to establish the topology, which consists of the following steps:
1) Profiling and logging from within the JVM using JVMPI.
2) Interceptors and logging at the container level.
3) Node analysis to establish traces using these logs at each node.
4) Merging call traces across nodes to establish a single trace.
5) Generating the topology from this trace.

A. JVMPI Profiler

The Java Virtual Machine Profiler Interface (JVMPI) [14] is a two-way function call interface between the Java Virtual Machine and an in-process profiler agent, as shown in Figure 2. A profiler agent can register for various events such as class loads/unloads, method entries/exits, JVM shutdown, etc. The JVM sends an event by calling NotifyEvent with a JVMPI Event data structure as the argument. From this information we can determine both the client and target object identities involved in the invocation.

Fig. 2. JVMPI Profiler Agent.

²EJB stands for Enterprise Java Bean, a remotely invokable Java object in the J2EE standard.

JVMPI provides APIs to turn profiling on and off remotely and dynamically, so that we can avoid generating large logs without shutting down and restarting the application server.

B. J2EE Interceptors

An interceptor is a piece of code that intercepts calls made into an object of an intercepted class. J2EE application servers [15] allow developers to define interceptors that are hooked into method invocations and field accesses. The points where the interceptors are to be inserted can be specified in the configuration files of the application server.

Fig. 3. Server Interceptor in the JBoss Interceptor Stack.

When a method call is made on a remote component or object, the client container creates an instance of the invocation class that represents the client call. This object is passed through the list of configured client-side interceptors. We have written our own client-side interceptor that captures this invocation object to obtain the details of the call in progress, including the name of the target object, its location and the method to be invoked. Similarly, server-side interceptors process the invocations in the server-side container before routing them to the server component. In order to correlate the call information at the server and client side, we transmit a unique link identifier, along with the originating machine details, in the payload hashmap inside the invocation object, as the sketch below illustrates.
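The following is a condensed sketch of the shape such a client-side tracing interceptor takes. The types are simplified stand-ins written for this illustration; the actual JBoss 4.x interceptor and invocation classes, and the payload keys, differ in detail.

// Simplified stand-ins for the application server's interceptor SPI; the
// real JBoss 4.x types (org.jboss.proxy.Interceptor and
// org.jboss.invocation.Invocation) have richer signatures.
interface Invocation {
    String getTargetName();                 // name of the remote component
    String getMethodName();                 // method being invoked
    void putPayload(String key, Object v);  // travels with the call to the server
}

abstract class ClientInterceptor {
    protected ClientInterceptor next;       // next interceptor in the stack
    abstract Object invoke(Invocation inv) throws Throwable;
}

// Our tracing interceptor: log the outgoing call and tag it with a link
// identifier so the server-side interceptor's log entry can be correlated.
class TraceClientInterceptor extends ClientInterceptor {
    @Override
    Object invoke(Invocation inv) throws Throwable {
        String linkId = java.util.UUID.randomUUID().toString();
        String origin = java.net.InetAddress.getLocalHost().getHostName();
        // one log record per outgoing call: who is being called, and the link id
        System.out.printf("CLIENT-CALL target=%s method=%s linkId=%s%n",
                          inv.getTargetName(), inv.getMethodName(), linkId);
        inv.putPayload("lamda.linkId", linkId);   // hypothetical payload keys
        inv.putPayload("lamda.origin", origin);
        return next.invoke(inv);                  // continue down the stack
    }
}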


Fig. 4. Execution path types in a call trace.

Once the JVMPI profiler and the client-side and server-side interceptors are deployed (these interceptors can be dynamically (un)deployed on J2EE application servers), we drive the system through each use case. The generated logs are then analyzed to infer the runtime dependencies in the application in the form of a call trace, i.e., the sequence of events that occurred in the system while serving the request. The call trace models various execution sequences, viz. an unconditional sequential path, alternate paths based on some condition, concurrent paths on multiple threads, a join path for two or more concurrent paths, and a loop path for repeated calls to a set of interfaces which can be clubbed together (Figure 4). The analysis steps involved in extracting the topology from the logs generated by the profiling are discussed below.

C. Node Analysis

The logs generated by the profiler and the interceptors are analyzed together to determine the call trace for each node in the system. To correlate the entries in the interceptor logs and the profiler agent logs, we maintain a unique object identifier variable in our client-side and server-side interceptors, which is set whenever these interceptor instances are constructed. In the profiler, when we encounter a call on the interceptor, we fetch the value of the object identifier from JVMPI_EVENT_OBJECT_DUMP and log this object id together with the method called. For calls that cross machine boundaries, we use a link identifier to complete the call trace from the originating machine to the remote destination object. The JVMPI logs on the remote destination machine give us the details of the dependencies of this remote object, thus completing the end-to-end call trace, as shown in Figure 5.
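A hedged sketch of this per-node correlation step follows: join the interceptor log records with the profiler log records on the shared object identifier, keeping outward link ids so the merging step can resolve them later. The record layouts are our own simplification; the tool's actual log formats are internal and not published.

import java.util.*;

// One line from the interceptor log: the interceptor instance's object id,
// the remote target, and the link id it injected into the call.
record InterceptorRec(String objectId, String target, String method, String linkId) {}

// One line from the JVMPI profiler log: a method entry on some object.
record ProfilerRec(String objectId, String klass, String method) {}

class NodeAnalyzer {
    // profiler entries that land on an interceptor instance mark calls
    // leaving this JVM; the rest are intra-JVM dependencies
    static List<String> buildTrace(List<ProfilerRec> profilerLog,
                                   List<InterceptorRec> interceptorLog) {
        Map<String, InterceptorRec> byId = new HashMap<>();
        for (InterceptorRec r : interceptorLog) byId.put(r.objectId(), r);

        List<String> trace = new ArrayList<>();
        for (ProfilerRec p : profilerLog) {
            InterceptorRec i = byId.get(p.objectId());
            if (i != null)
                // outward link: left unresolved here, resolved during merging
                trace.add("CALL " + i.target() + "." + i.method()
                          + " OUTWARD-LINK " + i.linkId());
            else
                trace.add("CALL " + p.klass() + "." + p.method());
        }
        return trace;
    }
}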

Fig. 5. Profiler and Interceptor Logs.

The output of this step is a trace file generated for each node in the distributed system. These trace files may contain unresolved outward links to other machines, which are resolved in the next step to generate a single trace file for the entire application.

D. Merging call traces across nodes

The individual node traces generated by the Node Analyzer are then combined by the Trace Aggregator. It resolves inter-machine dependencies and generates the complete call trace for the use case executed; the details are shown in Algorithm 1. Algorithm 2 identifies redundant information in the form of sequences of repeated calls and represents each as a loop for simplicity. Alternate paths, arising from branch conditions or from different use cases sharing a common prefix trace, are also detected and marked here. The Trace Aggregator generates three XML files, namely Components.xml, Trace.xml, and ServerInfo.xml. Components.xml contains information on each application component: its name, machine location, type, list of interfaces, and the contentionable nature of the component. Trace.xml contains the call trace of the application execution. ServerInfo.xml contains the application-server-specific contentionable resources.

E. Generating topology from call traces

The call traces generated by the Trace Aggregator can be aggregated into a topology graph. The topology graph is a directed graph with nodes representing the components in the application and the direction of each edge indicating the usage dependency. For example, if a component A uses component B, then the graph will have an edge (A,B) representing this dependency. Each edge is also annotated with a dependency strength that represents the probability of component A using component B in the application. The following cases in the original call trace need to be considered while transforming it into a topology graph:
1) Sequential path: a sequential call-edge is mapped directly to the corresponding usage edge in the topology graph, with the usage probability remaining the same as before the call is made.
2) Alternate path: when the call trace has an AlternatePath node, the execution can take any of the available alternate paths, and the probability of usage splits at this node. If no additional information regarding the probability distribution over these alternate paths is known, a uniform distribution is assumed. If the same call-edge occurs multiple times in a call trace, then the probabilities of making that call need to be accumulated into a single value for the usage probability. Consider the following interesting cases:
   a) Nested alternate paths: if an alternate split is nested inside another alternate split, then the call-edge is taken only if both splits select the favorable path. Hence, the total usage probability is the product of their individual probabilities.
   b) Union across child subtrees: if the call-edge occurs multiple times (but not nested) in the call trace, then the total usage probability is the union of the events that either call-edge is taken. If they belong to two different paths of the same alternate split, the union reduces to a simple ADD, since the intersection of the events is empty.
3) Concurrent path: concurrent paths are executed in parallel, but are reduced in the same way as the sequential case.
4) Loop path: a loop path is treated as a single sequential path.
Algorithm 3 generates the topology graph for an individual use case. The AggregateAcrossAlternatePaths procedure aggregates the usage probability for common usage edges across the child alternate paths through an ADD operation, while AggregateAcrossChildren aggregates usage probabilities across child sequential paths using a UNION operation.
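A small numeric sketch of these combination rules may help; the combinator names below are ours, while Algorithm 3 applies the same rules over the call tree.

final class UsageProbability {
    // nested alternate splits: the edge is taken only if both splits choose
    // the favorable path, so the probabilities multiply
    static double nested(double outer, double inner) { return outer * inner; }

    // two branches of the same alternate split are mutually exclusive, so
    // the union degenerates to a plain ADD
    static double sameSplit(double p1, double p2) { return p1 + p2; }

    // independent occurrences in different subtrees: union of the two events
    static double union(double p1, double p2) { return p1 + p2 - p1 * p2; }

    public static void main(String[] args) {
        // an edge inside a 2-way split nested in a 3-way split:
        System.out.println(nested(1.0 / 3, 1.0 / 2));              // 0.1666...
        // the same edge also occurring on an unconditional path (p = 1):
        System.out.println(union(nested(1.0 / 3, 1.0 / 2), 1.0));  // 1.0
    }
}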


Algorithm 1 DoTraceAnalysis(startingMachineIP)
1: Build the DOM tree for the trace of the starting machine
2: for each unresolved destination machine DEST_IP do
3:   Read the log and build the DOM tree from the trace of DEST_IP
4:   for all outward links from the starting machine to DEST_IP do
5:     Find the corresponding inward link by matching the OutwardTraceLinkID of the starting machine with the InwardTraceLinkID from the DOM of the destination machine
6:     Replace the outward link node of the starting machine with the children of the inward trace link node of DEST_IP
7:   end for
8: end for
9: RemoveLoops(DocumentRoot) {Identify alternate paths and loops in the call trace}

Algorithm 2 RemoveLoops(treeNode)
1: Initialize: previousSiblings ← ∅
2: for each child node C of treeNode do
3:   if C.PathID = S.PathID for some S ∈ previousSiblings then
4:     if the nodes between S and C are repeated from C then
5:       for all matching node pairs s and c do
6:         repeat
7:           Compare the subtrees of s and c
8:         until a mismatch is found or both subtrees are empty
9:       end for
10:      if a mismatch was found then
11:        Create a new AlternatePath node with the two mismatching nodes as the alternate paths
12:      end if
13:      Delete the repeated nodes after C
14:      Create a new LoopPath with the repeated nodes from S to C as its child nodes
15:      Adjust the PathID of the matched nodes and their subtrees to indicate LoopPath or AlternatePath
16:    end if
17:  else
18:    previousSiblings ← previousSiblings ∪ {C}
19:  end if
20: end for

Algorithm 3 GenerateTopology(callTree, usageProbability)
1: for each child C of callTree do
2:   if C is an AlternatePath node then
3:     usageProbability = usageProbability / numberOfAlternatePaths
4:     for all alternate paths P of C do
5:       altUsageEdges[P] = GenerateTopology(P, usageProbability)
6:     end for
7:     childUsageEdges[C] = AggregateAcrossAlternatePaths(altUsageEdges) {ADD}
8:   else
9:     childUsageEdges[C] = GenerateTopology(C, usageProbability)
10:  end if
11: end for
12: usageEdges = AggregateAcrossChildren(childUsageEdges) {UNION}
13: return usageEdges

The topology of the application can then be generated by computing a weighted addition across all the use cases, using the probability of executing each use case as its weight. The use-case probabilities can be obtained from historical data or, in the simplest case, assumed to be uniform across all use cases.
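As a small illustration of this final weighting step (the method and map layout are ours): each use case contributes its per-edge usage probabilities scaled by the use-case weight.

import java.util.*;

final class UseCaseMerger {
    // perUseCase.get(u) maps an edge key like "A->B" to its usage
    // probability in use case u; weights[u] is that use case's probability
    static Map<String, Double> merge(List<Map<String, Double>> perUseCase,
                                     double[] weights) {
        Map<String, Double> topology = new HashMap<>();
        for (int u = 0; u < perUseCase.size(); u++) {
            for (Map.Entry<String, Double> e : perUseCase.get(u).entrySet())
                topology.merge(e.getKey(), weights[u] * e.getValue(), Double::sum);
        }
        return topology;
    }
}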

In the following section we present the output generated by our tool for a sample J2EE application.

IV. CASE STUDY - DUKE'S BANK J2EE APPLICATION

Duke's Bank demonstrates a selection of J2EE technologies working together to implement a simple on-line banking application. It uses EJBs and web components (JSPs and servlets), and uses a database to store the information. We have deployed the application over JBoss 4.0.2, an open-source


J2EE application server. A Components.xml is generated that gives information about all the components deployed on the various nodes in the system. The following information is listed for each component:
• Component type: Entity Bean, Session Bean or Message-Driven Bean
• Clustered or non-clustered
• Location of the component
• List of interfaces
• Resource utilization: CPU, memory and disk usage (not generated currently)
• Contentionable resources, e.g. thread pool, component object pool, etc.
The runtime dependencies in the application are captured as call traces, and the different types of execution paths are appropriately represented in Trace.xml. A sample call trace for the "List accounts of customer" use case is shown in Figure 6.

V. PERFORMANCE/CORRECTNESS OF THE EXTRACTION TOOL

We have made the following measurements with respect to the topology extraction tool:
• Profiling overhead: the increase in response time for those transactions that were profiled to discover dependencies. The overhead is caused mainly by the interceptors, which log information before handing control back to the data path.
• Scalability of analysis: the time taken to complete the analysis of the log files and build the dependency graph.
• Correctness of the tool: correctness can be measured as a combination of completeness, defined as the percentage of actual dependencies captured, and accuracy, defined as the percentage of reported dependencies that are genuine rather than false positives. Since our tool is deterministic, it does not generate false positives and our accuracy is 100 percent, so we focus only on completeness.
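Stated as formulas (our notation, consistent with the definitions above): if D_act is the set of actual dependencies and D_rep the set reported by the tool, then

    completeness = |D_rep ∩ D_act| / |D_act|
    accuracy     = |D_rep ∩ D_act| / |D_rep|

A deterministic tracer yields D_rep ⊆ D_act, so accuracy is 1 and completeness alone characterizes correctness.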

A. Test setup

Our testing and measurement was conducted on a 3-node distributed system which hosted the database on one node, the application server on the second node and the web server on the third node. Each node was configured with an Intel P3 1.5GHz processor and 256 MB of memory. The applications we used to perform the overhead measurements and scalability analysis were a set of benchmark J2EE applications that included PetStore from Sun Microsystems, Duke's Bank, and the ECperf and SPECjAppServer benchmarks.

Fig. 6. Call Trace generated by AutoTopology for the 'List Accounts' use case.

Once the call traces for all the use cases are generated and we have the associated probabilities of executing these use cases, we extract the topology graph from those traces (Figure 7)³.

Fig. 7. Topology graph generated from the 'List Accounts' use case.

Note that all information shown pictorially in the diagrams is also available in XML to be consumed by other tools that form part of the LAMDA suite. The topology tool also generates a ServerInfo.xml file listing the contentionable resources in the application server, such as thread pools, invoker pools, etc. Information about these resources is needed so that they can also be modeled as queues in the performance model.

³Only a small snapshot of the topology graph is shown here due to space constraints.

B. Results

The results of the profiling overhead study and the analysis scalability study are shown in Figure 8. The overhead appears to be linear in the number of sequential executions of the interceptors, which is intuitive. Note that even parallel log entries cause contention for the log file only if both components that are called in parallel are on the same node. As for the scalability of the analysis, it appears to be quite sensitive to the number of actual dependencies discovered beyond a point: the graph is flat for the first 7-10 dependencies, with a linear increase beyond that. However, even for a use case that shows about 30 dependencies, the total time taken to perform the analysis and output the topology graph is less than a second. Given that the inter-discovery interval is expected to be fairly large (of the order of several minutes to hours), the time taken for analysis is more than acceptable.
Completeness of the discovery is driven largely by the transactions that take place during profiling. Dependencies that are sensitive to the input data of a transaction (i.e., revealed only under specific inputs) may take longer to discover. For the use cases we had and the data ranges of the input parameters, we were able to generate ALL the dependencies for the benchmark applications. This was verified against a code review in which we manually created a topology graph for each application.


Fig. 8. Overhead and Scalability Results.

VI. USING TOPOLOGY INFORMATION

A. Topology to Predictive Performance Models

Once we have the topology, the next step is to obtain a performance model for performance prediction purposes. This will help us understand the QoS of the existing application and identify the bottlenecks to be tuned. For our purposes we have turned to layered queueing networks (LQNs) to create the performance models. The critical components of a performance model are the various queues, whose parameters have to be appropriately identified, and the service time distributions at the related queueing stations. In our case these queues represent points of contention such as thread pools and pieces of code protected by synchronization constructs such as monitors. Standard component models allow us to glean this information in a relatively painless manner. Figure 9 shows a sample analytical model derived from the topology. We have built tools to automatically convert a topology into an LQN performance model, which can then be solved using various existing techniques. The results of our translations are fairly accurate, as evidenced by the performance graph shown in Figure 10.

Fig. 9. Translating the Topology to an Analytical Performance Model.

Fig. 10. Graph of Response Time Versus Load.

B. From Prediction to Self Configuration

Once we are able to predict the QoS that can be obtained from an application, we can apply control-theoretic feedback techniques to configure the underlying infrastructure automatically, as shown in Figure 11. Our strategy is of course predicated on the assumption that the infrastructure whose configuration we are controlling can affect the performance significantly and therefore help us bring it under control. We are currently investigating various techniques for the configurator and hope to report results soon.

Fig. 11. Strategy for Self Configuration.

C. Topology Based Root Cause Isolation

In dealing with self-healing, it is our belief that the larger problem is one of root cause isolation, which will help us self-diagnose from the set of known symptoms to a small set of highly probable root causes. We therefore take the following approach toward self-diagnosis:


• Model the dependencies in the system and use them to account for fault propagation. Assumption: failure dependencies follow usage dependencies.
• Build a model based on the causality graph of components.
• Assume no failure data is available at the start.
• Use Bayesian Belief Networks (BBNs) for diagnosis. This helps us handle uncertainty and simultaneous failures.
We have developed models and tools to auto-generate BBNs from a topology graph; the results have been submitted for review.
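As a hedged illustration of the first and last points above, the sketch below derives a causal structure from usage edges (if A uses B, a failure of B may cause a failure of A) and scores a component's failure probability with a noisy-OR combination. Noisy-OR is one standard BBN parameterization, not necessarily the one used in LAMDA.

import java.util.*;

final class FailureModel {
    // parents.get(a) = components whose failure can explain a failure of a,
    // with an assumed propagation strength for each
    private final Map<String, Map<String, Double>> parents = new HashMap<>();

    // usage edge (a -> b) induces causal edge (b -> a); absent better data,
    // we reuse the usage probability as the propagation strength
    void addUsageEdge(String a, String b, double usageProbability) {
        parents.computeIfAbsent(a, k -> new HashMap<>()).put(b, usageProbability);
    }

    // noisy-OR: probability that 'a' fails given the set of failed components
    double failureProbability(String a, Set<String> failed) {
        double pNoCause = 1.0;
        for (Map.Entry<String, Double> e :
                parents.getOrDefault(a, Collections.<String, Double>emptyMap()).entrySet())
            if (failed.contains(e.getKey()))
                pNoCause *= 1.0 - e.getValue();
        return 1.0 - pNoCause;
    }
}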

VII. CURRENT STATUS AND FUTURE WORK

The AutoTopology project [16] can dynamically extract the topology of any J2EE application without instrumenting the application or the server used for deployment. The output of the tool is a topology graph with components as nodes and usage dependencies modelled by the edges, each with an associated usage probability. Other projects in LAMDA for self-healing and self-configuring systems work off this topology information. The system dependencies thus extracted are being used to model the application as a queueing network for performance prediction and capacity planning. The root-cause analysis project models the failure dependencies of the application using a Bayesian Belief Network (BBN) constructed from the topology graph. This Bayesian model is used for predicting the most probable set of faulty components given the failures observed in the application.

REFERENCES

[1] Umesh Bellur. Topology based automation of distributed applications management. In Proceedings of the Fourth International Workshop on Software and Performance, pages 171-173. ACM Press, 2004.
[2] Susan L. Graham, Peter B. Kessler, and Marshall K. McKusick. gprof: a call graph execution profiler. In SIGPLAN Symposium on Compiler Construction, pages 120-126, 1982.
[3] Hesham El-Sayed, Don Cameron, and Murray Woodside. Automation support for software performance engineering. In Proceedings of the 2001 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 301-311. ACM Press, 2001.
[4] Gordon P. Gu and Dorina C. Petriu. XSLT transformation from UML models to LQN performance models. In Proceedings of the Third International Workshop on Software and Performance, pages 227-234. ACM Press, 2002.
[5] G. Kaiser, P. Gross, G. Kc, J. Parekh, and G. Valetto. An approach to autonomizing legacy systems. In Workshop on Self-Healing, Adaptive and Self-MANaged Systems (SHAMAN 2002), 2002.
[6] Systems Management: Application Response Measurement. Open Group technical standard C807. http://www.opengroup.org/tech/management/arm/.
[7] Brian Tierney, William Johnston, Brian Crowley, Gary Hoo, Chris Brooks, and Dan Gunter. The NetLogger methodology for high performance distributed systems performance analysis. In HPDC '98: Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, page 260, Washington, DC, USA, 1998. IEEE Computer Society.
[8] Mike Y. Chen, Emre Kiciman, Eugene Fratkin, Armando Fox, and Eric Brewer. Pinpoint: Problem determination in large, dynamic internet services. In DSN '02: Proceedings of the 2002 International Conference on Dependable Systems and Networks, pages 595-604, Washington, DC, USA, 2002. IEEE Computer Society.
[9] R. Isaacs and P. Barham. Performance analysis in loosely-coupled distributed systems. In 7th CaberNet Radicals Workshop, October 2002.
[10] G. Kar, S. Bagchi, and J. L. Hellerstein. Dependency analysis in distributed systems using fault injection: Application to problem determination in an e-commerce environment. In DSOM '01: Proceedings of the Twelfth International Workshop on Distributed Systems: Operations & Management, France, October 2001. INRIA.
[11] G. Kar, A. Brown, and A. Keller. An active approach to characterizing dynamic dependencies for problem determination in a distributed environment. In Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management, pages 377-390, France, May 2001. INRIA.
[12] Marcos K. Aguilera, Jeffrey C. Mogul, Janet L. Wiener, Patrick Reynolds, and Athicha Muthitacharoen. Performance debugging for distributed systems of black boxes. In SOSP '03: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pages 74-89, New York, NY, USA, 2003. ACM Press.
[13] Brad Hards. A guided tour of Ethereal. Linux Journal, 2004(118):7, 2004.
[14] Sun Microsystems. Java Virtual Machine Profiler Interface (JVMPI). http://java.sun.com/j2se/1.5.0/docs/guide/jvmpi/jvmpi.html.
[15] JBoss. Application Server. http://www.jboss.org/products/jbossas.
[16] Sudhir Reddy. Dynamic topology extraction for component based enterprise application environments. Master's thesis, Kanwal Rekhi School of Information Technology, IIT Bombay, 2005.

