MOGINMA: AN AUTONOMOUS GRID NODE MONITORING AGENT
Arshad Ali, Fawad Nazir, Hamid Abbas Burki, Tallat Hussain Tarar
NUST Institute of Information Technology, 166-A, Street 9, Chaklala Scheme 3, Rawalpindi, Pakistan
Hafiz Farooq Ahmad, Hiroki Suguri
Communication Technologies, 2-15-28 Omachi, Aoba-Ku, Sendai 980-0804, Japan
ABSTRACT
The accelerated development of grid computing has positioned it as a promising next-generation computing platform. It enables the creation of virtual organizations (VOs) for sharing resources distributed across the world. To address complex resource monitoring and management issues, we propose an architecture for an autonomous grid node agent based on the Grid Monitoring Architecture (GMA). Much current grid monitoring software uses SNMP, which is inflexible: it does not provide a publish/subscribe model or filtering of the monitored data, and it does not support a push model for monitoring parameters, so it generates a lot of unwanted network traffic. Because our proposed architecture is based on GMA, it provides all of these functions. We argue that the architecture is autonomous because it provides equality, locality and self-containment in all MoGiNMA implementations (sub-systems). Equality, locality and self-containment are the main components of autonomic controllability in any sub-system that is part of an autonomous distributed system (ADS). We are carrying out the implementation of the proposed system, and our preliminary results show some performance constraints. The overall concept can lead toward autonomous resource management in dynamic systems such as the grid. We argue that the proposed system will provide a strong foundation for realizing an autonomic computing environment.
KEYWORDS
Grid Computing, Grid Monitoring Architecture (GMA), Grid Resource Management (GRM), Autonomous Distributed Systems (ADS)
1. INTRODUCTION
Computationally expensive calculations are usually performed on supercomputers, which cost millions to buy and maintain. Computing costs can be significantly reduced by utilizing existing resources that are scattered throughout the world, most of which are underutilized. The Grid [1] can unify these resources so that they can be used as a single resource. People can harness computing resources owned by others when those resources are idle. Sharing computing resources naturally increases the computing capacity available to all participants, since one research project may use many computing resources while other projects are not using computational power [5]. The lack of computing power and the high cost of supercomputers can be avoided by using the Grid. The Grid is an ambitious and exciting global effort to develop an environment in which individual users can access computers, databases and experimental facilities simply and transparently, without having to consider where those facilities are located. The Grid could in principle have access to parallel computers, clusters, farms, local Grids and even Internet computing solutions, and it would choose the suitable tool for a given calculation or application. The Grid is therefore the most generalized and globalized form of distributed computing one can imagine. The Grid has become a viable form of computing because of
IADIS International Conference WWW/Internet 2004
the rapid advancements in networks and the Internet. The Grid is made up of a variety of remote resources that are owned by many different people and organizations. A wide variety of geographically distributed resources, such as computers, supercomputers, storage systems and data sources, are unified to form a Grid. In distributed systems such as the Grid, resources are reserved and released dynamically, network links fail independently and unpredictably, and machines and servers connect and disconnect in an arbitrary way. As these systems become more and more complex, their complexity is outgrowing the ability of the people who administer them. To tackle this problem we are working towards the development of Autonomic Computing Systems [2]. Autonomic Computing Systems are self-managing systems which aim to increase reliability, autonomy and performance. The increase in reliability is achieved by designing systems to be self-protecting and self-healing. The increase in autonomy and performance is achieved by enabling systems to adapt to changing circumstances, using self-configuring and self-optimizing mechanisms. Self-healing is concerned with ensuring effective recovery when a fault occurs. Self-optimization means that a system is aware of its ideal performance, can measure its current performance against that ideal and has strategies for attempting improvements. A self-protecting system guards itself against accidental or malicious external attack, i.e. it is aware of probable threats and has ways of handling them. Self-configuring is a system's ability to readjust itself automatically to changing circumstances. This may simply be in support of ongoing development, or to assist in self-healing, self-optimization or self-protection.
2. MOGINMA ARCHITECTURE
2.1 MoGiNMA Grid Node Agent Architecture
The MoGiNMA architecture is shown in Figure 1. MoGiNMA is a software system which will be dynamically executed on each resource that is an integral part of a virtual organization in a Grid system. MoGiNMA comprises three basic information stores: the monitoring data archiving store, the event/trap information base and the ontology database. The monitoring data archiving store is based on an RDBMS and is used to archive historical monitored data on the MoGiNMA server (the machine on which MoGiNMA is running). The event/trap information base, which is based on text files, keeps a record of the consumers of monitored data to which the MoGiNMA server has to send traps and informs. The ontology (footnote 1) database store, which is based on XML databases, keeps the ontology information that MoGiNMA agents use to understand the messages passed among peer MoGiNMA agents. Besides the stores there are two other main modules, the listener module and the sender module. We discuss the listener module first. Its main function is to listen for user requests, interpret them according to the data in the ontology database and service them. Whenever a request comes to the MoGiNMA server, the server checks whether the request is from the registry or from some monitoring software. It then authenticates the user using the authentication module and checks the user's authorization information using the authorization module. If the requesting user's information is not held by that MoGiNMA server, the server communicates with other peers in its peer group to authenticate the user. After authentication and authorization the request is sent to the request interpreter module, which interprets the type of the request and then assigns the requested parameters to a specific module that can service the request.
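The request flow of the listener module described above (authenticate, authorize, interpret, dispatch) can be illustrated with a minimal Python sketch. It is not the MoGiNMA implementation: the names `Request`, `Listener`, `register_service`, `knows` and `handle` are hypothetical, credentials are plain strings for brevity, and the sketch assumes that authorization is always decided from local data even when a peer performs the authentication, which the text above does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    user: str
    credential: str
    kind: str                      # e.g. "monitoring", "archive", "ontology_update"
    params: dict = field(default_factory=dict)


class Listener:
    """Hypothetical listener: authenticate, authorize, interpret, dispatch."""

    def __init__(self, users, permissions, peers=()):
        self.users = users              # user -> credential known to this server
        self.permissions = permissions  # user -> set of allowed request kinds
        self.peers = list(peers)        # other Listener objects in the peer group
        self.services = {}              # request kind -> servicing function

    def register_service(self, kind, handler):
        # New services can be added at run time; the interpreter then
        # recognises the new request kind.
        self.services[kind] = handler

    def knows(self, user, credential):
        return self.users.get(user) == credential

    def authenticate(self, req):
        if req.user in self.users:
            return self.knows(req.user, req.credential)
        # User unknown locally: ask the peers in the peer group.
        return any(peer.knows(req.user, req.credential) for peer in self.peers)

    def handle(self, req):
        if not self.authenticate(req):
            return {"status": "denied", "reason": "authentication"}
        # Assumption: authorization data is local to this server.
        if req.kind not in self.permissions.get(req.user, set()):
            return {"status": "denied", "reason": "authorization"}
        handler = self.services.get(req.kind)
        if handler is None:
            return {"status": "error", "reason": "unknown request type"}
        return {"status": "ok", "result": handler(req.params)}
```

In this sketch a request from a user who is known only to a peer is still authenticated, because `authenticate` falls back to the peer group when the local user table has no entry for that user.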
This interpreter module has a thread pool which initiates the servicing modules, and each servicing module in turn has its own thread pool so that it can service multiple user requests at the same time. The request interpreter module is scalable: the MoGiNMA agent (footnote 2) can increase the number of services it provides, and the request interpreter is then informed of the new types of request that can occur. In this paper we discuss the three basic services which the MoGiNMA server can provide: the monitoring module service, the archived data access service and the ontology update service. The monitoring module service is used to get the monitoring
Footnote 1: An ontology is an explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
Footnote 2: The MoGiNMA server is the machine or device on which the MoGiNMA software is running. It is also called the MoGiNMA agent.
information from the MoGiNMA server, such as CPU utilization, memory utilization, process-level information, ARP cache information, network traffic and traffic analysis. The monitoring module in turn uses three other modules: the SNMP module, the shared libraries module and the command line options module. Depending on the type of request from the user, the monitoring module chooses one of these modules. If the request can be serviced by the SNMP daemon, the monitoring module forwards it to the SNMP module. If the request is to be serviced by shared libraries on Windows systems or by the flat files under the /proc directory on Linux, the monitoring module forwards it to the shared libraries module. If the request is to be serviced by command line options, the monitoring module forwards it to the command line options module. The archived data access module is used to access the monitored data which is archived by the continuous monitoring module and stored in the monitoring data archiving store; it provides the user with historical data and with aggregation results over the historical monitoring data of the MoGiNMA server. The ontology update module updates the ontology information in the ontology database store of each MoGiNMA server according to the ontology update messages sent by neighboring peer MoGiNMA servers. In this way all the peers share their knowledge, and every MoGiNMA server can understand each message from other peer MoGiNMA servers. Turning now to the sender module, it has three important parts: the continuous monitoring module, the decision making module and the inform module.
The continuous monitoring module continuously monitors MoGiNMA server parameters such as CPU usage, memory usage and network in/out traffic. These parameters can be dynamically configured and added for monitoring and for storing in the monitoring data archiving store. In the problem identification module thresholds can be set, for example: if CPU usage rises to some limit, send a trap to a specific consumer from the event/trap relation information base. If a problem is identified that needs to be resolved, the decision making module, which sits above the problem identification module, is informed about the problem and takes an appropriate decision. The decision making module is also scalable, and more modules can be added to make the decision making process more effective and intelligent. On top of the sender module there are three other modules which actually send messages to neighboring peer agents and consumers: the inform ontology updates module, the trap/inform daemon module and the help messages to peers module. The inform ontology updates module sends inform messages to its peers about ontology updates, so that all peers can update their ontology knowledge base and understand the messages which they receive from their neighboring peers. The trap/inform daemon module is responsible for sending traps to consumers who have subscribed as consumers in the trap/inform database store and whose information is present in the event/trap relation information base. The type and frequency of information traps should be defined at the time of service invocation. The help messages to peers module is responsible for sending request
messages to peers when a request is not understood by the MoGiNMA server, when there is a user authentication problem, or when there is a pattern in the behavior of the MoGiNMA server that is not understood.
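The threshold-based trap mechanism of the sender module can be sketched as a small publish/subscribe loop. This is only an illustration under stated assumptions: `TrapRegistry` is a hypothetical in-memory stand-in for the event/trap relation information base (which the paper describes as text files), a plain list stands in for the RDBMS archive, consumers are modelled as callbacks, and the rule "value strictly greater than threshold" is an assumed interpretation of "CPU usage goes up to some limit".

```python
class TrapRegistry:
    """Hypothetical stand-in for the event/trap relation information base."""

    def __init__(self):
        self._subscriptions = {}  # parameter name -> list of (threshold, callback)

    def subscribe(self, parameter, threshold, callback):
        # A consumer registers interest in one parameter and a threshold.
        self._subscriptions.setdefault(parameter, []).append((threshold, callback))

    def check(self, parameter, value):
        """Push a trap to every consumer whose threshold is exceeded."""
        sent = 0
        for threshold, callback in self._subscriptions.get(parameter, []):
            if value > threshold:  # assumed rule: strictly above the limit
                callback({"parameter": parameter,
                          "value": value,
                          "threshold": threshold})
                sent += 1
        return sent


def monitor_once(sample, registry, archive):
    """One pass of the continuous monitoring loop over a dict of readings.

    Every reading is archived; traps are pushed only for readings that
    cross a subscribed threshold. Returns the number of traps sent.
    """
    traps_sent = 0
    for parameter, value in sample.items():
        archive.append((parameter, value))
        traps_sent += registry.check(parameter, value)
    return traps_sent
```

For example, after `registry.subscribe("cpu", 90.0, consumer)`, a call to `monitor_once({"cpu": 95.0}, registry, archive)` invokes `consumer` once, while a reading of 50.0 is archived without any message being sent. This is the push behavior that avoids repeated polling by consumers.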
2.2 MoGiNMA Peer Group Architecture
Peer-to-peer (P2P) systems have emerged as a significant social and technical phenomenon over the past years. Two factors have fostered the recent explosive growth of such systems: first, the low cost and high availability of large numbers of computing and storage resources, and second, increased network connectivity. As these trends continue, the P2P paradigm is bound to become more popular. Unlike traditional distributed systems, P2P networks aim to aggregate large numbers of computers that join and leave the network frequently and that might not have permanent network (IP) addresses. In pure P2P systems, individual computers communicate directly with each other and share information and resources without using dedicated servers. We have also used a P2P distributed system, i.e., a system in which all nodes have identical responsibilities and all communication is symmetric. We have implemented an application which performs ontology sharing and communication among its peers and peer groups. Peer-to-peer computing offers several advantages over traditional distributed systems, such as automatic load balancing and self-organization. In our proposed system we use P2P to develop an autonomous monitoring system for grid monitoring and ontology sharing. We use a distributed ontology architecture in our implementation. In the process of ontology distribution, the ontology update messages are sent to peers based on semantics.
3. CONCLUSION
The Grid is an ambitious and exciting global effort to develop an environment in which individual users can access computers, databases and experimental facilities simply and transparently, without having to consider where those facilities are located. The resources are the most critical part of Grid systems. Grid systems are complex and highly scalable, so there is a need to develop systems that are autonomic [3] in nature and that aim at bringing a new level of automation through self-healing, self-optimizing, self-configuring and self-protecting functions [4]. This paper proposes an architecture for a Grid node agent which will reside on each execution node of the grid and make its operations autonomic. We are carrying out the implementation of the proposed system. This concept can provide a strong foundation for realizing an autonomic computing environment. Negotiation is a key to scalable and adaptive autonomous distributed systems. In future we plan to incorporate a negotiation protocol which will help MoGiNMA servers to interact with other peer MoGiNMA servers to resolve problems in a collaborative environment. Our negotiation platform will contain adaptive negotiation strategies. MoGiNMA nodes will have learning capabilities through ontology sharing so that they can easily adapt to new environments.
REFERENCES
[1] Ian Foster. The Grid: A New Infrastructure for 21st Century Science. February 2002.
[2] Kinji Mori. Autonomous Decentralized Systems: Concept, Data Field Architecture and Future Trends.
[3] Roy Sterritt, Dave Bustard. Autonomic Computing-A Means of Achieving Dependability.
[4] Roy Sterritt, Dave Bustard. Towards Autonomic Computing Environment.
[5] Ian Foster, Carl Kesselman. Grid Information Services for Distributed Resource Sharing.
[6] Helene N. Lim Choi Keung. Predicting the Performance of Globus Monitoring and Discovery Services Arizona State University. Peer-to-Peer Computing Scalable.
[7] M. A. Baker and G. Smith. GridRM: A Resource Monitoring system for the grid, International workshop on grid computing 2002.