Special Issue on Ubiquitous Computing Security Systems
APPROACH TO DETERMINING AN EXTERNAL PROBLEM FOR SELF-HEALING
Jeongmin Park, Joonhoon Lee, Hyunsang Youn, and Eunseok Lee
School of Information and Communication Engineering, Sungkyunkwan University, Suwon 440-746, South Korea
ABSTRACT
Self-healing is a methodology for constructing a system that can detect faults, recover by itself, and return from an abnormal state to a normal state. Much attention has recently been focused on the self-healing ability to recognize problems arising in a target system. However, providing self-healing functionality imposes a considerable burden, such as analyzing the target system and analyzing the system environment for external problems. This paper therefore proposes a self-healing approach that uses the deployment diagram to determine problems arising in the external environment. The UML deployment diagram is widely used to specify the resources of a system and is generally produced in the system design phase. The approach consists of 1) analyzing the associations between software and hardware; 2) generating a monitor from the constraints in the deployment diagram; and 3) adding the monitor to the component after adapting it to the specific software architecture. As a proof of concept, we automatically generated a resource monitor and applied it to a video conference system. Using this example, we illustrate how the method detects anomalies.
Keywords: External problem, Problem determination, External state
1 INTRODUCTION
The complexity of the software execution environment poses new challenges for software developers. When computer systems operate abnormally, detecting and resolving the problem requires much time and effort. Software should therefore adapt without human intervention, that is, it should have a self-healing ability. Self-healing is concerned with the ability of a system to recover automatically from faults [1,2]. Self-healing components have been the subject of several studies. To construct a system that facilitates self-healing, Shin et al. [3,4,5] propose a self-healing component architecture. From the system's point of view, faults can be divided into two types: faults that occur in the software itself, and faults that arise from resources such as CPU usage, RAM usage, and bandwidth. However, the architecture of Shin et al. does not focus on faults arising from resources. In addition, the monitor for self-healing in the healing layer must be implemented by the developer, which requires additional effort in the software development process. In this paper, we describe an approach that generates the resource monitor automatically from a UML deployment diagram. The approach consists of the following steps:
UbiCC Journal – Volume 4
• Analyzing the associations between software and hardware in the UML deployment diagram of a component.
• Generating the resource monitor using the constraints specified by the designer in the diagram.
• Adding the monitor to the component after adapting it to the component's structure.
With our approach, the resource monitor can be generated automatically from the deployment diagram of a target system. This is useful when implementing the resource monitor for a component, because it reduces the additional work needed for monitoring resources. Component developers only need to modify parts of the automatically generated monitor for adaptation, and they can easily add healing strategies to it. To illustrate the approach, we applied our method to a video conference system. The monitor generated by our method worked correctly when a resource problem occurred. The next section of the paper describes related work. Section 3 presents the approach in more detail. Section 4 presents the evaluation of the approach. The paper ends with a summary in Section 5.
2 RELATED WORK
In this section, we present a self-healing component architecture [3,4,5] and an autonomic failure-detection algorithm [6], which is one of the failure detection methods.
2.1. Layered software architecture for a self-healing component
Each self-healing component consists of a healing layer and a service layer [3,4,5]. The service layer performs tasks requested by another task or component in the system. It contains active objects, connectors, and passive objects, which are accessed by active objects. An active object can execute another active object or a passive object. In contrast, a passive object is only called by an active object; it cannot run unless another object calls it. The connectors transfer messages to or from tasks and synchronize them. When the healing layer decides that an object in the service layer of the component has become sick, the healing process is launched via the connectors. The healing layer is composed of six objects, as follows.
• Component Monitor: This module observes the behavior of each object through messages from the connectors in the service layer.
• Component Reconfiguration Plan Generator: This module produces reconfiguration plans for when a fault occurs in the service layer. It also holds information about the objects in the service layer.
• Component Repair Plan Generator: This module constructs self-healing strategies for faulty objects. It holds recovery plans for each object in the service layer.
• Component Reconfiguration Executor, Component Repair Executor: These modules execute the plans generated by the plan generators.
• Component Self-healing Controller: This module controls the five modules above.
This architecture has the following features.
• The architecture can identify an object with faults.
• Healing strategies for each object are pre-made.
• The architecture does not deal with faults in fine detail: only faults that occurred in the component can be detected.
2.2. Autonomic Failure-Detection Algorithm
Mills et al. [6] proposed an algorithm that detects failures automatically. In this approach, the objects and devices to be observed periodically send a signal to the monitor, similar to a human heartbeat. One monitor can manage many components. It determines whether an object or device has a problem by checking the signal over time. Let $H_p$ denote the period of the signal. The maximum time for detecting a fault is then also $H_p$. Since a fault can occur at any time during the signal period, the average time for detecting a fault is $H_p / 2$. This algorithm can identify whether an object has a problem in a very short time. However, it incurs overhead, because the monitor and the objects must exchange the signal frequently.
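The latency figures above can be checked with a small simulation. The following Python sketch is only an illustration and is not taken from [6]: it assumes that heartbeats are expected at multiples of $H_p$, that a fault silences every later heartbeat, and that fault times are uniformly distributed. The function names are hypothetical.

```python
import random


def detection_latency(fault_time: float, period: float) -> float:
    """Time from the fault until the first heartbeat that fails to arrive.

    Heartbeats are expected at period, 2 * period, and so on. A fault at
    fault_time suppresses every later heartbeat, so the monitor notices
    the problem at the next expected heartbeat.
    """
    beats_before_fault = int(fault_time // period)
    next_expected_beat = (beats_before_fault + 1) * period
    return next_expected_beat - fault_time


def average_latency(period: float, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the mean detection latency for uniformly random fault times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        fault_time = rng.uniform(0.0, 100.0 * period)
        total += detection_latency(fault_time, period)
    return total / trials


if __name__ == "__main__":
    h_p = 1.0  # heartbeat period in seconds (assumed value)
    print("estimated average latency:", average_latency(h_p))
```

Under these assumptions every latency lies in the interval (0, $H_p$], and the estimated mean approaches $H_p / 2$ as the number of trials grows.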
3 PROPOSED APPROACH
In this paper, we present an improved self-healing component architecture that can recover from resource problems. We do not focus on internal problems, because they are covered by Shin et al. [3,4,5]. The resources in question can be independent of the software. The monitor measures the state of the resources periodically and decides whether self-healing policies should be applied. For this, we use a modified “heartbeat” algorithm in which the monitor sends the signal to the resources. Through this mechanism, the resource monitor can measure values and detect anomalies.
3.1. Architecture for generating resource monitor
The architecture can be divided into an analyzing phase and a generation phase. Figure 1 illustrates the flow of the structure.
• UML Deployment Diagram: This is the input of the architecture. The diagram is transformed into XMI (XML Metadata Interchange) [7,8].
• XMI Parser: The XMI parser analyzes the resource constraints and the associations with each resource. The outputs of the analyzing phase are the monitoring targets and the constraints, which are written in XML format.
• Monitor Template Generator: The monitor template generator takes the output of the XMI parser and generates a monitor template that detects problems in the devices or resources selected for monitoring. The template is implemented in a specific programming language.
• Configuring: The monitor template code needs to be modified for adaptation. The software developer configures it to fit the structure of the software.
• Resource Monitor: The resource monitor generated by the approach can be added to the software directly.
Figure 1: Architecture for generating resource monitor
3.2. Process of approach
In this section, we present the process, which is composed of four steps (Fig. 2).
• Step1 - Specifying the system using a UML deployment diagram: Initially, the software developer creates a deployment diagram (Fig. 3). The deployment diagram represents a static aspect of the system in the UML design model and illustrates the associations among components. The constraints proposed in the method are shown in Table 1. Method denotes the connection technique of the network or physical devices and is used for detecting abnormal terminations. Duration is the time allowed until a fault is detected, that is, the waiting time in the method; its initial value is 1 second. This value was used as the setting for the experiments and can be changed for any system environment.
Table 1: Constraints list
Contents  | Input                              | Unit
CPU usage | 0.0 ~ 1.0                          | Percent
Mem usage | 0.0 ~ 1.0                          | Percent
Heartbeat | 0.1 ~ 1.0                          | Second
Bandwidth | User defined minimum bandwidth     | KB/s
Method    | User defined connection type       |
Duration  | Duration time for detecting fault  | Second
• Step2 - Analyzing diagram: First, the nodes (for example, client and server) in the system are identified. Next, the resource constraints, such as those on CPU, memory, bandwidth, and heartbeat rate, are identified. The Parsing Engine parses the XMI information (Fig. 4) and generates XML for these two types of information.
• Step3 - Generating monitor template: In this step, the template for an executable resource monitor is generated from the information analyzed in the previous step. The Template Generator (TG) generates the monitor by analyzing the XML produced by the Parsing Engine. It also generates fault processing and anomaly detection routines for each constraint (Fig. 2).
Figure 2: Process of approach (4 steps)
Figure 3: Deployment diagram example
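The generation step (Step 3) can be pictured with a small template-based generator. The Python sketch below is an illustration under stated assumptions and is not the authors' Template Generator, which was implemented in C#: the constraint dictionary, the `ROUTINE` template, and the `check_...` naming scheme are all hypothetical.

```python
from string import Template

# Source template for one anomaly detection routine (hypothetical layout).
ROUTINE = Template(
    "def check_${name}(value):\n"
    "    # anomaly detection routine generated for ${name}\n"
    "    return value ${op} ${limit}\n"
)


def generate_monitor(constraints: dict) -> str:
    """Emit monitor source code with one routine per constraint.

    `constraints` maps a resource name to a (kind, limit) pair, where
    kind is "max" or "min". Each generated routine returns True when
    the measured value violates its constraint.
    """
    parts = []
    for name, (kind, limit) in constraints.items():
        op = ">" if kind == "max" else "<"
        parts.append(ROUTINE.substitute(name=name, op=op, limit=repr(limit)))
    return "\n".join(parts)


if __name__ == "__main__":
    source = generate_monitor({"cpu_usage": ("max", 0.8),
                               "bandwidth": ("min", 50.0)})
    print(source)
```

The emitted text is ordinary source code, so a developer can edit it by hand before adding it to the component, which corresponds to the configuring activity described in Section 3.1.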
3.3. Problem detection algorithm
In this section, we describe how the autonomic failure-detection algorithm was adapted for our approach (Fig. 5). The resource monitor in the self-healing layer judges the state of the system to be abnormal if it sends a signal to a device or resource and the reply does not return within the period. The state is also regarded as abnormal if a value of the resource violates a constraint. In that case, the self-healing layer should construct a reconfiguration plan and execute it. Unlike in the related work, $L_{max}$ and $L_{avg}$ (the maximum and average fault-detection latency) are 1.5 times longer than before, because the monitor sends the signal first. If a fault occurs just after the resource has replied, the monitor still regards the resource as normal and sends the signal for the next cycle. The resource has already failed, however, so that cycle is wasted. Therefore, our approach takes more time to detect faults than the related work.
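The detection rule described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the generated monitor itself: the `probe` callable, the `Limit` class, and the result strings are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Limit:
    """Allowed range for one resource value (hypothetical helper)."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def violated_by(self, value: float) -> bool:
        below = self.minimum is not None and value < self.minimum
        above = self.maximum is not None and value > self.maximum
        return below or above


def run_cycle(probe: Callable[[float], Optional[float]],
              limit: Limit, period: float) -> str:
    """Run one monitoring cycle.

    The monitor sends the signal first by calling probe(period). The probe
    is assumed to return the measured value, or None when the resource did
    not reply within `period` seconds.
    """
    value = probe(period)
    if value is None:
        return "abnormal: no reply within the period"
    if limit.violated_by(value):
        return "abnormal: constraint violated"
    return "normal"


if __name__ == "__main__":
    cpu_limit = Limit(maximum=0.8)
    print(run_cycle(lambda timeout: 0.35, cpu_limit, period=1.0))
    print(run_cycle(lambda timeout: 0.93, cpu_limit, period=1.0))
    print(run_cycle(lambda timeout: None, cpu_limit, period=1.0))
```

Keeping the probe behind a callable separates the decision rule from the way a particular device is queried, so the same cycle can be reused for CPU, memory, bandwidth, or a network peer.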
Figure 5: Error detection algorithm
Figure 4: XMI information and constraints model derived from a deployment diagram
• Step4 - Composing monitor (the last step of the process in Section 3.2): In this step, the developer modifies the resource monitor according to the software environment. The fault processing handlers or guidelines are already produced at the monitor template generation stage by the approach. The developer also customizes the parts that are needed and the parts that must be modified. Afterwards, the resource monitor is added to the self-healing layer or component.
3.4. Self-healing components including resource monitor
Resource monitoring is illustrated in Fig. 6, which shows the devices and the self-healing component architecture extended with resource monitoring. The self-healing component architecture that we modified to support resource monitoring was designed by Shin [3,4,5].
Figure 6: Proposed self-healing component architecture
The architecture contains the six objects used for healing the component and adds three objects for monitoring resources and reconfiguring. The added objects are the External Resource Monitor, the External Resource Reconfiguration Plan Generator, and the External Resource Reconfiguration Executor.
The External Resource Monitor checks the status of external devices and resources. The External Resource Reconfiguration Plan Generator makes reconfiguration plans for the service layer in accordance with the external situation. The External Resource Reconfiguration Executor executes the plans. The purpose of the External Resource Reconfiguration Plan Generator is to make plans that isolate the objects that are easily influenced by resources, so that other well-operating objects are not affected, similar to the component reconfiguration plans. The Self-healing Controller, which controls the objects in the self-healing layer, directs the resource reconfiguration executor to reconfigure the service layer. For external errors, it works in the same way and mitigates anomalies in the service layer by minimizing resource usage.
4 IMPLEMENTATION AND EVALUATION
To evaluate the algorithm, we prepared the basic design of a video-based conference system. The purpose of this system is to conduct a video-based conference successfully. During the meeting, the client should not be interrupted by problems external to the software. We generated the resource monitor automatically with the approach, applied it to the client of the video-based conference system, and checked whether the client detected errors that arose from problems external to the software.
4.1. Environments
To evaluate this approach, we implemented the clients of a video conferencing system on .NET Framework 2.0, using C# on MS Windows XP. We used Borland Together for UML modeling. The server was implemented with Java 2 SDK 1.4. The client additionally used DirectShow.NET for the video device. A deployment analyzer and a resource monitor template were also implemented in C#. Fig. 3 illustrates the deployment diagram that we used. Fig. 7 and Fig. 8 illustrate the prototypes of the Parsing Engine and the Template Generator.
Figure 7: Parsing Engine prototype
Figure 8: Template Generator prototype
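To give a rough idea of what the analysis step (Step 2) extracts, the Python sketch below reads monitoring targets and constraints from a small XML document. It is not the authors' C# Parsing Engine and it does not use real XMI: the element and attribute names (`deployment`, `node`, `constraint`, `resource`, `min`, `max`) form a simplified, hypothetical format invented for this illustration, and the limits mirror those listed in Table 2.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the exported diagram (hypothetical format, not XMI).
SAMPLE_XML = """
<deployment>
  <node name="client">
    <constraint resource="cpu" max="0.8"/>
    <constraint resource="memory" max="0.7"/>
    <constraint resource="bandwidth" min="50"/>
  </node>
  <node name="server"/>
</deployment>
"""


def extract_constraints(xml_text: str) -> dict:
    """Return {node name: {resource: {"min"/"max": limit}}}.

    Every node becomes a monitoring target; a node without constraint
    elements maps to an empty dictionary.
    """
    root = ET.fromstring(xml_text)
    targets = {}
    for node in root.findall("node"):
        limits = {}
        for item in node.findall("constraint"):
            bounds = {key: float(item.attrib[key])
                      for key in ("min", "max") if key in item.attrib}
            limits[item.attrib["resource"]] = bounds
        targets[node.attrib["name"]] = limits
    return targets


if __name__ == "__main__":
    for target, limits in extract_constraints(SAMPLE_XML).items():
        print(target, limits)
```

A real XMI export is namespaced and considerably more verbose, so an actual parser would need to follow the XMI schema produced by the modeling tool.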
4.2. Normal case
The resource monitor continues to monitor a resource as long as the resource performs its work without any anomalies.
4.3. Abnormal case
The monitor detects an abnormal state when a measured value is outside the normal range or when the connection with another resource is accidentally terminated. Figure 9 illustrates the case in which the CPU usage exceeded 80%. In this paper, we did not focus on self-healing strategies. Therefore, the strategies for healing the faulty state were provided by the administrator.
Figure 9: Detection of anomalies of CPU by monitor
4.4. Objective of evaluation and the results
The purpose of the evaluation is to determine whether the approach recognizes error situations within a designated time in the system to which it is applied, and to compare a system to which the approach is applied with one to which it is not when errors occur in the resources. To create extreme situations in the system, we used a benchmarking program and forced server termination. Additionally, to measure the failure-detection latency accurately, we added a routine to the client that immediately reports the time at which an error occurs and a routine to the resource monitor that prints the time of the error. The detection results for the various constraints are listed in Table 2. The error detection time for the CPU, measured 10 times, is shown in Fig. 10.
Table 2: Experimental results of the monitoring
Check list         | Constraints                  | Success of detection
CPU usage          | Max 80%                      | Success
Memory usage       | Max 70%                      | Success
Bandwidth usage    | Min 50 KB/s                  | Success
Network connection | Abnormal network termination | Success
Figure 10: Error detection time of resource monitor
As a result of the evaluation, the resource monitor detected violations of all four items for which constraints were set. Even though the average fault detection time varied, we were able to verify that the resource monitor detected each fault within the maximum fault detection time.
5 CONCLUSION
This paper proposed an approach that reduces the effort required of a self-healing developer and offered a software architecture that monitors the available resources. The production of resource monitors can be automated by using the deployment diagram. The advantages are listed below.
• The production of the resource monitor is automated.
• A strategy is in place for the case of faults in resources.
Until now, developers have had to make additional effort to implement the monitor that checks the resources for the software. In this study, however, we confirmed that resource monitors, which can be included in a self-healing component, can be generated automatically from a deployment diagram. To evaluate this, we built a prototype component and confirmed that the monitor operated correctly when an abnormal situation occurred.
However, we could not overcome the high overhead, since signals must be exchanged frequently if errors are to be detected. To solve this problem, a study on self-regulating the cycle of signal exchange between monitors is needed. Automating the self-healing strategies for recovering from faulty states also remains future work.
6 ACKNOWLEDGEMENT
This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (No. 2009-0077453) and by the Faculty Research Fund (2008) of Sungkyunkwan University. Corresponding author: Eunseok Lee.
7 REFERENCES
[1] B. Topol, D. Ogle, D. Pierson, J. Thoensen, J. Sweitzer, M. Chow, M. A. Hoffmann, P. Durham, R. Telford, S. Sheth, T. Studwell, "Automating problem determination: A first step toward self-healing computing system", IBM white paper (2003).
[2] D. Ghosh, R. Sharman, H. R. Rao, S. Upadhyaya, "Self-healing - survey and synthesis", Decision Support Systems in Emerging Economies, Vol. 42, Issue 4, pp. 2164-2185 (2007).
[3] Michael E. Shin, "Self-healing component in robust software architecture for concurrent and distributed systems", Science of Computer Programming, Vol. 57, No. 1, pp. 27-44 (2005).
[4] Michael E. Shin, Jung Hoon An, "Self-Reconfiguration in Self-Healing Systems", Proceedings of the Third IEEE International Workshop on EASE'06, pp. 89-98 (2006).
[5] Michael E. Shin, Jung Hoon An, "Self-reconfiguration in self-healing systems", Proceedings of the 3rd IEEE International Workshop on EASE'06, pp. 106-116 (2006).
[6] K. Mills, S. Rose, S. Quirolgico, M. Britton, C. Tan, "An autonomic failure-detection algorithm", ACM SIGSOFT Software Engineering Notes, Vol. 29, Issue 1, pp. 79-83 (2004).
[7] G. Booch, J. Rumbaugh, I. Jacobson, "The Unified Modeling Language User Guide", Addison Wesley, pp. 100-150 (1999).
[8] XMI Online Document, http://www.omg.org/xml