The 7th International Conference for Internet Technology and Secured Transactions (ICITST-2012)
RACS: A Framework for Resource Aware Cloud Computing
Sheheryar Malik, Research Team OASIS, INRIA - Sophia Antipolis, Sophia Antipolis, France
[email protected]
Fabrice Huet, Research Team OASIS, INRIA - Sophia Antipolis, Sophia Antipolis, France
[email protected]
Abstract—Porting enterprise IT infrastructure to cloud-based solutions has raised many issues, particularly ones related to cloud computing itself. Every enterprise wants to use a reliable cloud infrastructure with a high level of performance while keeping cost as low as possible, and a model is needed to achieve this. In this paper, we introduce a framework that increases application performance and ensures a high level of reliability when a process or application is scheduled onto the cloud. Its implementation is a cloud scheduler module named Resource Aware Cloud Scheduling (RACS), which helps the scheduler make scheduling decisions on the basis of different characteristics of cloud resources. These characteristics can be reliability, network latency, bandwidth, error rate, topology, proximity, processing power, fault tolerance, memory availability, library availability, environment compatibility, and the monetary cost of the cloud services. RACS consists of multiple sub-modules, each responsible for its corresponding task.
Keywords- Cloud computing; Cloud scheduling; Resource awareness; Latency grouping; Network measurement; Reliability

I. INTRODUCTION
The use of cloud computing infrastructure is increasing day by day with the advent of novel cloud services. A large number of enterprises are moving their computing from in-house infrastructure to cloud infrastructure. A client enterprise can take advantage of the intensive computing capabilities and the scalable virtualized environment of cloud computing to execute its tasks; processing is then done on remote cloud computing nodes. At the same time, because the number of cloud service providers (cloud operators) is growing rapidly, a cloud user has many operators to choose from. Selecting one particular cloud operator is difficult, as an operator may be better than another in some cloud services and lag behind in others. A client enterprise can choose resources / virtual machines from different cloud operators on the basis of their strengths, but selecting virtual machines across operators on the basis of different resource characteristics is a complex process, and the cloud scheduler needs a sophisticated mechanism to carry it out. In this research, we propose a framework (Resource Aware Cloud Computing), which helps the cloud scheduler to choose
978-1-908320-08/7/$25.00©2012 IEEE
Denis Caromel, Department of Computer Science, University of Nice Sophia Antipolis, Sophia Antipolis, France
[email protected]
the cloud resources from the different cloud operators and to make scheduling decisions on the basis of certain characteristics. It describes a sophisticated mechanism for the dynamic selection of cloud resources. The scheduling decision can be made on the basis of resource characteristics such as reliability, network latency, bandwidth, error rate, topology, proximity, processing power, fault tolerance, memory availability, library availability, environment compatibility, and the monetary cost of the cloud service. All the functions of the scheduler are performed on the basis of these resource characteristics, so the main emphasis of our research is the scheduling of tasks on the cloud on the basis of resource characteristics. The rest of the paper is structured as follows. Section 2 gives the background and existing related work done in the area. Then in Section 3, we present our proposed model. In Section 4, we describe the results of the experimental evaluation of our proposed algorithm. We conclude in Section 5 and discuss the future research directions.

II. CONCEPTUAL FRAMEWORK: RESOURCE AWARE CLOUD COMPUTING
We propose a framework named Resource Aware Cloud Computing (RAC2), which helps the cloud scheduler to choose cloud resources from different data centers / cloud operators and to make scheduling decisions on the basis of certain characteristics. It describes a sophisticated mechanism for the dynamic selection of cloud resources. The implementation of the framework is the Resource Aware Cloud Scheduling (RACS) module, which assists the scheduler in making scheduling decisions. The scheduling decision is made on the basis of reliability, inter-node network latency, fault tolerance, and the monetary cost of the cloud service. The model of Resource Aware Cloud Computing (RAC2) is given in figure 1. In the figure we can see that a user submits a job to the cloud scheduler / manager along with its required characteristics.
The scheduler requests assistance from the RACS module, which has an implementation for each of these characteristics. With the help of the resource characteristics repository, RACS computes a result and returns it to the scheduler. The scheduler then schedules the job on the cloud infrastructure on the basis of the information provided by the RACS module.
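The request flow described above can be illustrated with a minimal Python sketch. It is not the RACS implementation: the class names (Node, JobRequest, ResourceCharacteristicsRepository, RACS, Scheduler), their fields, and the ranking rule (most reliable first, then cheapest) are assumptions made for illustration, and only two of the resource characteristics are modeled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One virtual machine as recorded in the repository (hypothetical schema)."""
    name: str
    operator: str
    reliability: float   # assumed scale: 0.0 (worst) to 1.0 (best)
    hourly_cost: float   # assumed monetary cost per hour


@dataclass
class JobRequest:
    """A job together with the characteristics required by the user (hypothetical)."""
    nodes_needed: int
    min_reliability: float = 0.0
    max_hourly_cost: float = float("inf")


class ResourceCharacteristicsRepository:
    """Stand-in for the resource characteristics repository."""

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def all_nodes(self):
        return list(self._nodes)


class RACS:
    """Answers scheduler queries using the repository."""

    def __init__(self, repository):
        self.repository = repository

    def recommend(self, request):
        # Keep only the nodes that satisfy the required characteristics.
        candidates = [
            node for node in self.repository.all_nodes()
            if node.reliability >= request.min_reliability
            and node.hourly_cost <= request.max_hourly_cost
        ]
        # Assumed ranking: most reliable first, cheapest first on ties.
        candidates.sort(key=lambda node: (-node.reliability, node.hourly_cost))
        if len(candidates) < request.nodes_needed:
            return []  # the request cannot be satisfied
        return candidates[:request.nodes_needed]


class Scheduler:
    """Asks RACS for a recommendation, then places the job on those nodes."""

    def __init__(self, racs):
        self.racs = racs

    def schedule(self, request):
        return [node.name for node in self.racs.recommend(request)]
```

In this sketch the scheduler never reads the repository directly; it only consumes the answer returned by RACS, which mirrors the division of roles in figure 1.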
Fig. 1. Resource Aware Cloud Computing Framework
In the framework, we address the following issues with respect to resource characteristics.
• Latency aware Node Grouping and Scheduling (LGS)
• Reliability based Scheduling and Fault Tolerance (RSFT)
• Cost aware Scheduling (CaS)

A. Latency aware Node Grouping and Scheduling (LGS)
Network latency is an important issue in distributed computing. It becomes more important when distributed tasks are executed on virtual machines that belong to different cloud operators or to physically distant clusters. Distributed applications generally need some inter-node (inter-virtual-machine) communication. To provide better throughput, latency among the communicating nodes (i.e. virtual machines) must be minimal. A solution is therefore needed that provides the cloud user with a group of nodes whose inter-node latency is as low as possible, while keeping its own overhead to a minimum. It should help the scheduler to schedule the cloud user's tasks on the basis of this grouping information.

1) Problem Statement - LGS: The problem statement has the following requirements: (1) group nodes on the basis of the connection value between nodes (i.e. latency) instead of the number of connections to a node; (2) nodes in a group must have minimal inter-node latency; (3) the algorithm should work on incomplete latency information, i.e. without measurements for all NxN node pairs (connections); (4) the groups should be pre-computed, so that user applications do not need to ask the algorithm to compute the groups every time they run; (5) the groups must be mutually exclusive, so that a node in one group does not appear in another group. If groups are not mutually exclusive, a node which has already been assigned to an application
can be a candidate to be assigned to another application because it appears in another group. Mutual exclusion is all the more necessary when a process wants to execute its sub-tasks on multiple groups. So the groups should be mutually exclusive to fulfill this requirement. In this part of the work we propose a model for latency based node grouping and task scheduling.

2) Related Work - LGS: Node grouping is most commonly referred to as community / group / cluster detection and has been studied in various fields, including the internet, the world wide web, social networks, biological networks, complex systems, citation networks, graph theory etc. There are different types of algorithms tailored to the needs of each discipline. Most of these algorithms group nodes on the basis of the connection count of a particular node, so nodes are related in terms of hop count. In our problem scenario, however, we group nodes on the basis of the connection value between nodes, which is determined by the latency between them. In general, a network community is a group of nodes with more interactions between its members than with the rest of the network [1]; a node in a group or community normally has more interactions inside than outside the group. Most of the existing algorithms also produce non-mutually exclusive groups. We have examined the existing algorithms against our problem statement, and none of them fulfills all of its requirements. The most established and widely used algorithms are: Distributed community detection model by Hui et al.
[2], which has three variations based on the SIMPLE, K-CLIQUE and MODULARITY methods. The Local Spectral Partitioning algorithm by Anderson et al. [3] is a graph partitioning algorithm. Flow-based Metis+MQI [4], proposed by Lang et al., is also a graph partitioning algorithm. Clauset has proposed an agglomerative algorithm [5] that greedily maximizes the modularity. Zhao devised a mechanism for hierarchical agglomerative clustering with respect to an ordering constraint [6]. Garcia devised a general framework for agglomerative hierarchical clustering algorithms [7], in which he tested and compared different existing algorithms. Newman has proposed an algorithm [8], [9] based on an edge removal mechanism. Our algorithm produces mutually exclusive groups according to our problem statement and performs well on incomplete latency information, with a complexity of O(n^2).

3) Proposed Solution - LGS: We propose an algorithm to group nodes with respect to their inter-node latencies [10]. This algorithm groups the nodes on the basis of incomplete information gathered during inter-node communication. The resource aware cloud scheduling module stores or updates the latency information in its repository and then executes the algorithm to create or update the groups on the basis of this
information. When a user's application requests from the scheduler a particular number of nodes with minimum inter-node latency, our proposed algorithm assists the scheduler in finding the most suitable node group to fulfill the request. As stated before, the proposed algorithm groups together the nodes with minimum inter-node latency, and it is able to do so with minimal latency information. Latency information is available only for those nodes which have communicated with other nodes. The algorithm does not rely on broadcasting to calculate the inter-node latency, as broadcasting consumes a lot of bandwidth and puts a burden on network traffic, and it does not send special messages to other nodes to calculate latency. Instead, it uses a piggyback technique to determine the latency between nodes which have already communicated. The algorithm produces multiple mutually exclusive groups with a variable number of nodes, so a node can appear in only one group. This is well suited to a process that wants to execute its sub-tasks on multiple groups. The scheduler finds the group or groups with the most suitable size for the requesting process. For a process which is not going to demand more resources during its execution, it should find a group whose size is close to the minimum number of nodes required. For a process which can demand scalable resources, the scheduler chooses a group which can support execution at the peak demanded scale. The algorithm has different configurations, which are chosen according to the situation, and its result normally differs for every configuration. The algorithm runs periodically and also whenever there is some change in the cloud, such as the migration of a process to another virtual machine or a change of IP address.

B. Reliability based Scheduling and Fault Tolerance (RSFT)
Generally, a client enterprise is concerned about the reliability of the cloud infrastructure and of the link to its resources, because the infrastructure is remote and the enterprise loses control over it. A cloud user wants to execute his computation tasks on reliable infrastructure and to have deterministic results of application execution. To achieve this, he wants to acquire highly reliable cloud services. For some specific types of applications (e.g. real time, financial, medical), reliability is an essential property. But errors are more likely in cloud computing, due to the undetermined latency and the loss of control over the computing nodes, and even more likely if cloud resources from different cloud operators are used. A reliability assessment mechanism for cloud resources is therefore required, and the scheduling decision should be made on the basis of resource reliability. There are some reliability assessment algorithms for real time systems, but they use timeliness only to check correctness; timeliness plays no role in the reliability assessment itself.

1) Problem Statement - RSFT: After analyzing the scenario and existing work we have come up with a problem statement.
The requirements to address in the problem statement are the following: (1) Reliability values should be continuous numbers rather than binary. (2) Reliability should be assessed by a central cloud management module. (3) It must be evaluated directly by the cloud manager from the application performance rather than from the feedback of peer nodes. (4) There should be different algorithms for different application types (at least for real time and non-real time applications). (5) Timeliness must have some role in the reliability assessment for real time applications. (6) The model should be able to perform both scheduling and fault tolerance on the basis of reliability values. In this part of the work we propose a model for reliability assessment and, consequently, for scheduling and fault tolerance on the basis of reliability values.

2) Related Work - RSFT: Most of the related work is in the areas of reputation assessment and reliability assessment. Most of the existing models / algorithms rely on binary trust values, whereas our problem statement calls for a continuous number representing reliability. Most existing algorithms do not distinguish between different types of applications. In most of the reliability assessment algorithms for real time systems, timeliness is checked only for the correctness of the result and has no direct role in the assessment of reliability. Most of the existing work targets peer-to-peer systems or volunteer computing systems, which are quite different from cloud computing, and assesses reliability mostly on the basis of feedback from peers. We have examined the existing algorithms against our problem statement, and none of them fulfills all of its requirements.
Some of the related algorithms / models are the following. Opera (OPEn ReputAtion), proposed by Nguyen and Shi [11], is a reputation based resource selection mechanism to improve resource efficiency in data centers. Sonnek et al. presented an adaptive reputation-based scheduling model [12] for large scale donation-based distributed infrastructure. Achim et al. presented a model for reputation based service selection in the cloud environment [13]. Hou et al. introduced a reputation based grid resource selection model [14]. Wang et al. presented a reliability driven reputation based scheduler for public resource computing [15]. Alunkal et al. proposed a model for reputation based grid resource selection [16], based on the concept of dynamic global trust. Azzedin and Maheswaran proposed a model for integrating trust into grid resource management systems [17], in which the grid is divided into autonomous administrative entities named grid domains. Sherwood et al. presented a distributed scheme for reputation assessment in NICE [18]. Zhang and Fang have proposed a reputation system for reliable service selection in P2P systems [19], which selects services based on feedback from peers and on the users' previous experience. Rahman et al. proposed a reputation based scheduling mechanism for workflow applications in peer-to-peer grids [20], which employs structured P2P indexing and networking techniques to create a grid overlay. Damiani et al. proposed a reliable resource selection mechanism in P2P networks based on reputation [21], in which reliability is assessed through distributed polling. The EigenTrust algorithm, proposed by Kamvar et al., performs reputation management in peer-to-peer file sharing networks [22]. PowerTrust is a reputation system for P2P systems [23], which uses a distributed ranking system to select the most reputable nodes. Swaminathan and Manimaran have proposed a scheduler for multiprocessor real time systems, which performs scheduling on the basis of reliability [24].

3) Proposed Solution - RSFT: We propose a model which enables a scheduler to schedule tasks on the basis of the adaptive reliability of nodes (virtual machines) [25], [26]. The model is for the reliability assessment of the cloud's computing instances. Its core is a set of reliability assessment algorithms, which compute the reliability of a computing instance on the basis of the type of application. Reliability assessment is adaptive: the value changes after every computing cycle and takes into account both the instant reliability and the previous reliability. If a virtual machine manages to produce a correct result (and, for a real time application, within the time limit), its reliability may increase, depending on the algorithm. If it fails to produce the correct result (or, for a real time application, a result within the time limit), its reliability decreases. A metric model is given for the reliability assessment. In our model, each compute instance has reliability values associated with it. The system assesses reliability for different types of applications. We have divided applications into two main categories, i.e. general applications and real time applications. Real time applications are further divided into two categories, i.e. soft real time applications and hard real time applications. Thus each compute instance has three reliability values, one for each application type.
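The adaptive behaviour described above can be sketched in Python as follows. This is only an illustration of the idea of a continuous, per-application-type reliability value that rises after a good cycle and falls after a bad one; it is not the assessment algorithm of [25], [26]. The update rule, the constants GAIN and PENALTY, the initial value of 0.5 and all names are assumptions.

```python
from dataclasses import dataclass, field

APP_TYPES = ("general", "soft_real_time", "hard_real_time")

# Assumed constants: how strongly one computing cycle moves the value.
GAIN = 0.10
PENALTY = {"general": 0.10, "soft_real_time": 0.20, "hard_real_time": 0.40}


def _initial_reliability():
    # Assumed neutral starting value for every application type.
    return {app_type: 0.5 for app_type in APP_TYPES}


@dataclass
class ComputeInstance:
    """A virtual machine with one reliability value per application type."""
    name: str
    reliability: dict = field(default_factory=_initial_reliability)


def cycle_succeeded(app_type, correct, elapsed, deadline=None):
    """A cycle succeeds if the result is correct and, for real time
    applications, if it was also delivered within the deadline."""
    if not correct:
        return False
    if app_type == "general":
        return True
    return deadline is not None and elapsed <= deadline


def update_reliability(instance, app_type, correct, elapsed, deadline=None):
    """Combine the previous reliability with the outcome of the latest cycle."""
    previous = instance.reliability[app_type]
    if cycle_succeeded(app_type, correct, elapsed, deadline):
        updated = previous + GAIN * (1.0 - previous)        # move towards 1.0
    else:
        updated = previous * (1.0 - PENALTY[app_type])      # move towards 0.0
    instance.reliability[app_type] = updated
    return updated
```

With these assumed constants, a correct but late result lowers the hard real time value more sharply than the soft real time one, and the value always stays between 0 and 1.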
We also have different reliability assessment algorithms for the three application categories. For general applications, we have a basic reliability assessment mechanism. For soft real time and hard real time applications we have a time based reliability assessment mechanism, in which not only the logical result of the application but also the time of delivery affects the reliability rating of a compute instance. The assessment mechanisms for soft and hard real time applications differ from each other, as the time requirements in hard real time applications are stricter than in soft real time applications.

C. Cost aware Scheduling (CaS)
Monetary cost is one of the major concerns for cloud users when using cloud computing infrastructure. They want higher throughput, reliability and quality of service, but at a cost as low as possible. In a federated cloud scenario, a user has a variety of resources available from various cloud vendors. He can choose a cloud vendor on the basis of cost, but it is very difficult to assess which cloud vendor will cost him less, as one vendor may offer one service at a cheaper rate while another vendor offers a different service at a cheaper rate. So a solution is needed to perform the automated selection of cloud
resources on the basis of their pricing model and user needs. In this part of the work we propose a cost based cloud scheduling model.

1) Problem Statement - CaS: After analyzing the scenario and existing work we have come up with a problem statement. The requirements to address in the problem statement are the following: (1) Assess the estimated time of execution or service utilization for an application. (2) Estimate the cost of resource utilization. (3) Provide the cloud user with a picture of the various pricing options offered by different cloud vendors.

2) Related Work - CaS: Very little work has been done in the area of cost estimation for cloud resources. Some of this work is as follows. Sharma et al. presented Kingfisher [27], a cost aware provisioning system which provides support for elasticity in the cloud. It does this by providing multiple mechanisms to reduce the transition time of reconfigurations and by optimizing the selection of virtual server configurations to minimize cost. Liu et al. presented a cost-aware resource selection model [28], which is based on the Weighted Set Covering Problem (WSCP) and follows the principle of spatial locality of data access. They apply a weighted greedy heuristic to produce an approximately optimal resource set for each task, and show that their approach can meet efficiency and economic demands.

3) Proposed Solution - CaS: We propose a model which helps the cloud scheduler to schedule on the cloud on the basis of the cost of service utilization [29]. Our model gives a cloud user a depiction of the monetary cost of the cloud resources which he is going to utilize. It tells the user the estimated comparative cost of cloud services from different cloud operators. It also informs him about the estimated level of quality of service, reliability and other factors along with the monetary cost.
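Requirements (1) to (3) of the problem statement can be illustrated with a small Python sketch: estimate the execution time on each offer, derive a cost, and present the options side by side. The Offer fields, the relative_speed factor used to scale a reference run time, and the billing rule (every started hour is charged in full) are assumptions made for this example; real operators use a variety of pricing models, and this is not the algorithm of [29].

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One pricing option of a cloud operator (hypothetical schema)."""
    operator: str
    instance_type: str
    price_per_hour: float
    relative_speed: float  # assumed: 1.0 = the reference machine, 2.0 = twice as fast


def estimate_hours(reference_hours, offer):
    """Requirement (1): estimated execution time on this offer."""
    return reference_hours / offer.relative_speed


def estimate_cost(reference_hours, offer, nodes=1):
    """Requirement (2): estimated cost, assuming each started hour is billed."""
    billed_hours = math.ceil(estimate_hours(reference_hours, offer))
    return billed_hours * offer.price_per_hour * nodes


def compare_offers(reference_hours, offers, nodes=1):
    """Requirement (3): all options with time and cost, cheapest first."""
    rows = [
        (o.operator, o.instance_type,
         estimate_hours(reference_hours, o),
         estimate_cost(reference_hours, o, nodes))
        for o in offers
    ]
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    offers = [
        Offer("operator-A", "small", 0.10, 1.0),
        Offer("operator-B", "large", 0.25, 2.0),
    ]
    for operator, kind, hours, cost in compare_offers(10, offers):
        print(f"{operator:12s} {kind:6s} {hours:6.2f} h  {cost:6.2f}")
```

Sorting by cost is only one possible presentation; a user who cares about completion time could sort the same rows by the estimated hours instead.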
We have made an algorithm which is responsible for performing all the tasks mentioned above. This algorithm works in two modes, i.e. interactive and automated. It gathers information from different cloud providers about their billing policies. When a user wants to schedule a job on the cloud, the algorithm first assesses the estimated time of utilization for the requested service. If the user has selected the interactive mode, the algorithm informs the user about the possible pricing options along with other parameters such as quality of service, throughput, reliability, security etc., and the user then chooses the cloud service / vendor. If the user has selected the automated mode, the user specifies the minimum and maximum levels of the different criteria, and the algorithm automatically assigns a cloud service with optimal cost to the user's request on the basis of these parameters.

III. FRAMEWORK DESIGN AND IMPLEMENTATION
The Resource Aware Cloud Computing framework consists of various models, algorithms and concepts. RACS is the implementation of these models and algorithms. RACS helps
Fig. 2. Resource Aware Cloud Computing Framework
the scheduler to make scheduling decisions on the basis of resource characteristics. It also helps the Resource Manager in making resource allocation, de-allocation, and node acquisition or release decisions. It aims to help the scheduler make better decisions and thereby increases the efficiency of the user's tasks on the cloud infrastructure. RACS is meant to be an integrated part of the cloud scheduler and has to work in harmony with it to make the scheduling decisions more prudent and efficient. We have integrated RACS with the ProActive scheduler for experimentation. RACS has different sub-modules, which implement the models and algorithms for the different resource characteristics. It also has a repository, which contains information about the cloud services available from different cloud operators and their characteristics. RACS enables an enterprise either to choose services deliberately or to rely on automatic service selection through the scheduler. RACS may ask the cloud scheduler to choose computing nodes from different cloud operators on the basis of different resource characteristic criteria. RAC2 has two modules named RACS Server and RACS Client. RACS Server is the main module, which makes all the decisions on the resource characteristics. Whenever RACS is referred to in this paper, it actually refers to RACS Server. RACS Client is installed on the computing nodes and is responsible for the monitoring related to resource characteristics, such as latency and fault monitoring.
The scenario of RACS assisting a scheduler is given in figure 2. A cloud user submits an application with the required criteria to the Scheduler. The Scheduler requests the RACS Server to reply with a result for the requested characteristics. The RACS Server asks the corresponding sub-module(s) to perform the action. The sub-module returns the result to the RACS Server, which ultimately sends it back to the Scheduler. The same procedure is applied if the user request is routed through the Cloud Resource Manager. In this case, the request is typically for allocation or de-allocation of resources based on resource characteristics. When the Scheduler receives the response from the RACS Server, it performs the scheduling on the resources located at the data center(s). The data centers offer a variety of computing nodes, typically virtual machines. In the case of IaaS, bare virtual machines are available, and in the case of PaaS, there are virtual machines with a platform provided by the cloud operator, such as Linux, Unix, Windows etc. On each computing node a RACS Client is installed, which performs the monitoring. RACS Clients do not have any direct involvement in the scheduling decisions. They work with the RACS Server to determine certain values for the characteristics of resources, such as latency estimation and fault monitoring for reliability assessment. Our framework consists of two major modules, i.e. RACS Server and RACS Client, as shown in figures 2 and 3. RACS Server is located along with the Scheduler, whereas RACS
Fig. 3. Resource Aware Cloud Computing Framework
Client is installed on each computing node.

A. RACS Server
RACS Server is the core module, which makes all the decisions on the resource characteristics. It performs two types of tasks. (1) It assists the cloud scheduler in making scheduling decisions on the basis of the resource characteristic criteria. For this it fetches the information stored in the Resource Characteristics Repository and provides it to the scheduler. (2) It executes the algorithms / procedures for the calculation and estimation of values or operations associated with the resource characteristics. These operations are typically independent of the Scheduler's requests. They are performed periodically or in response to events, e.g. inter-node latency estimation and grouping is performed periodically. There are three main sub-modules, which perform the main tasks related to resource characteristics. The sub-modules and components of the RACS Server, shown in the RACS architecture figure, are:
• Latency aware Node Grouping and Scheduling (LGS) Server
• Reliability based Scheduling and Fault Tolerance (RSFT) Server
• Cost aware Scheduling (CaS) Server
• RACS Manager
• Resource Characteristics Repository (RCR)
• RACS Clock

1) Latency aware Node Grouping and Scheduling (LGS) Server: This is the first of the main sub-modules of the RACS Server. It is a server type component and is responsible for node grouping and scheduling. The LGS Server performs two tasks: (1) grouping nodes with respect to their inter-node latency, and (2) helping the scheduler to schedule the user's tasks on the basis of the latency-based grouping information. The core of the module is the group discovery algorithm, which is responsible for grouping the nodes. The communication is done through
the LGS Controller. The Controller sets the configurations in the RCR and executes the group discovery algorithm for group computation. The result of the algorithm is again stored in the RCR. The Controller works with the Latency Monitor at the RACS Client to get the latency information between two communicating nodes, and this latency information is updated in the RCR. In the case of a scheduling request with a particular group size, the request is first forwarded from the Scheduler to the RACS Manager at the RACS Server. The RACS Manager forwards the request to the LGS Controller. The Controller finds an appropriate group in the RCR and returns the result to the Scheduler through the same channel.

2) Reliability based Scheduling and Fault Tolerance (RSFT) Server: This is the second sub-module of the RACS Server. In this module we assess the reliability of the computing nodes and then, on the basis of these reliability values, schedule the user's tasks and perform fault tolerance. The RSFT Server performs three main tasks: (1) assessing reliability values for the computing nodes on the basis of application type, (2) helping the cloud scheduler, at application scheduling time, to make the scheduling decision on the basis of the reliability values associated with the cloud nodes, and (3) helping to perform fault tolerance during application execution on the basis of node reliability values. The core of this sub-module is the set of reliability assessment algorithms, which are associated with different application types. All operations and communication are done through the RSFT Controller. The Controller is responsible for executing the algorithm for the assessment of the nodes' reliability, and it selects an algorithm according to the type of application. The corresponding algorithm finds the reliability values and stores them in the reliability store area of the RCR. When the scheduler requests the RACS Server for node(s) with the highest reliability, the request is forwarded to the RSFT Controller through the RACS Manager.
The Controller then checks the reliability store for the nodes with the highest level of reliability for the particular application type and returns the result to the Scheduler.

3) Cost aware Scheduling (CaS) Server: This is the third sub-module of the RACS Server. In this module we assess the estimated cost of utilizing a particular resource or service. The core of this module is the cost estimation service, which performs the following tasks: (1) assess the estimated time of execution for a particular application, and (2) estimate the cost of resource utilization among different options and different cloud operators. All operations and communication are done through the CaS Controller. The Controller is responsible for running the cost estimation service and returning the result to the scheduler through the RACS Manager.

4) RACS Manager: RACS Manager is the gateway to the RACS Server. All requests are routed through it. It is responsible for communication and for invoking operations on the sub-modules of the RACS Server. It works as an interface between the RACS Server and the RACS Client, Scheduler and Resource Manager.

5) Resource Characteristics Repository (RCR): RCR is a data store which keeps the information about the resources and
their characteristics. It holds the values that are used for scheduling and resource allocation. It obtains these values in three ways: (1) from the repository of the cloud resource manager, (2) from the calculations and estimations done by the RACS Server's sub-modules, and (3) from the user, in the form of configurations. It has different repository areas for different purposes; for example, the Reliability Store area stores the reliabilities of the nodes.

6) RACS Clock: The RACS Clock is part of the RACS Server, and its job is to synchronize the clocks of all cloud nodes. It works in conjunction with the Clock Tickers at the RACS Clients to keep the clocks of the cloud nodes synchronized with the Server.

B. RACS Client

The RACS Client is installed on the computing nodes and is responsible for monitoring resource characteristics such as latency and reliability. It collects resource-related information and sends it to the RACS Server, either periodically or at the request of the RACS Server. The RACS Client has two main components, each performing its own task, as shown in the RACS architecture figure. The components, or sub-modules, of the RACS Client are:
• Latency Monitor (LM)
• Fault Monitor (FM)

1) Latency Monitor: The Latency Monitor is responsible for monitoring the internode latency. It extracts the timestamp from the piggybacked message and compares the message emission time at the sender node with the reception time at its own node to find the latency. This information is periodically sent to the LGS Server at the RACS Server.

2) Fault Monitor: The Fault Monitor is responsible for monitoring faults that occur in the computing nodes. Our current implementation uses two methods, one dynamic and one static. In dynamic fault monitoring, we run the actual application and apply an Acceptance Test to it. The acceptance test verifies the correctness of the application at run time.
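The two client-side checks described so far, latency measurement from piggybacked timestamps and the dynamic acceptance test, can be sketched as follows. This is a minimal sketch under stated assumptions, not the RACS or ProActive code: the message layout (a dictionary with `sender` and `sent_at` fields) and the function names are hypothetical.

```python
import time


def one_way_latency(sent_at, received_at):
    """Latency in seconds between emission at the sender and reception here.

    The value is only meaningful if both clocks are synchronized, which is
    the role of the RACS Clock and the Clock Tickers.
    """
    return received_at - sent_at


def on_message(message, latencies, now=time.time):
    """Read the piggybacked timestamp and record the latency per sender.

    `message` is assumed (hypothetically) to be a dict carrying the sender id
    and the emission time alongside the application payload.
    """
    latency = one_way_latency(message["sent_at"], now())
    latencies.setdefault(message["sender"], []).append(latency)
    return latency


def dynamic_fault_check(run_application, acceptance_test):
    """Run the application and validate its result at run time.

    Returns True when the acceptance test accepts the result; a crash or a
    rejected result is reported as a fault.
    """
    try:
        result = run_application()
    except Exception:
        return False
    return bool(acceptance_test(result))


# Usage with a fixed reception time so the example is deterministic.
latencies = {}
msg = {"sender": "node-a", "sent_at": 100.0, "payload": b"data"}
on_message(msg, latencies, now=lambda: 100.25)
print(latencies)

print(dynamic_fault_check(lambda: sum(range(10)), lambda r: r == 45))
```

The recorded latencies would be what the Latency Monitor periodically reports to the RACS Server, and the boolean outcome of the acceptance test is the signal a reliability assessment could consume.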
In the static method, we run a selection script instead of executing the actual application. The selection script specifies the requirements and environment for application execution. If there is a problem, the selection script test fails and we reduce the reliability.

IV. DEPLOYMENT, EXPERIMENTS AND PERFORMANCE EVALUATION

We have deployed RACS with the ProActive Cloud by integrating it into the ProActive scheduler. ProActive is an open source cloud middleware. Our ProActive deployment has computing nodes at various data centers, including nodes on local clusters and on Amazon EC2. We have also deployed it along with ProActive on Grid5000 by acquiring nodes there. We evaluated RACS in a number of experiments with different configurations on a variety of platforms: the local cluster, ProActive cloud nodes, Amazon EC2 nodes, and Grid5000 nodes. These experiments were done for different
978-1-908320-08/7/$25.00©2012 IEEE
purposes depending on the resource characteristics. The results of these experiments have already been presented in published papers on the issues discussed in this paper.

V. ADVANTAGES

Our proposed model possesses many advantages. It:
• improves the performance of user applications running on the cloud.
• offers better reliability.
• gives users a feeling that they know something internal to the cloud.
• offers a fault tolerance mechanism specific to the type of application.
• improves sharing among multiple users.
• maximizes application throughput by minimizing the internode latencies in a distributed application.
• allows users to save monetary cost by
  – reducing the time of service utilization by minimizing the internode latencies in a distributed application. Less utilization time may mean lower monetary cost.
  – reducing the degree of replication by using more reliable resources. A higher degree of replication costs more.
  – utilizing less reliable resources (in the case of a reliability based cost calculation). Some applications may not need to be executed on very reliable resources, and a trade-off can be made on reliability to save cost.
  – using the CaS to obtain a cost estimate beforehand and thus selecting the cheaper resources.
• benefits the cloud vendor, because it
  – is energy efficient. Use of RACS brings higher application throughput and fewer failures, which result in less resource occupation time and save energy.
  – reduces resource waste, as resources are wasted due to failures. Resource waste can be reduced by scheduling tasks onto more reliable resources.
  – makes it easier to find the computing resources that are unreliable, so that the vendor can easily remove them. Unreliability can lead users to mistrust the cloud vendor.
  – reduces the degree of administration.
  – saves monetary cost due to the above mentioned factors.

VI. CONCLUSIONS

In this paper we have presented a framework for network and resource aware cloud computing. It helps the cloud scheduler to perform scheduling on the basis of three characteristics: latency, reliability, and monetary cost. We have built models and algorithms for these characteristics. In this paper we have given an introduction, related work, and model information for each of them. They are discussed here only in brief; details can be found in the individual papers on each issue.
REFERENCES

[1] J. Leskovec, K. J. Lang, and M. W. Mahoney, “Empirical comparison of algorithms for network community detection,” in Proceedings of the 19th International Conference on World Wide Web (WWW 2010), Apr. 2010, pp. 631–640.
[2] P. Hui, E. Yoneki, S.-Y. Chan, and J. Crowcroft, “Distributed community detection in delay tolerant networks,” in Proceedings of the 2nd ACM/IEEE International Workshop on Mobility in the Evolving Internet Architecture (MobiArch’07), Aug. 2007.
[3] R. Andersen, F. Chung, and K. Lang, “Local graph partitioning using pagerank vectors,” in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), Oct. 2006, pp. 475–486.
[4] K. Lang and S. Rao, “A flow-based method for improving the expansion or conductance of graph cuts,” in Proceedings of the 10th International IPCO Conference on Integer Programming and Combinatorial Optimization, ser. Lecture Notes in Computer Science, D. Bienstock and G. Nemhauser, Eds. Springer Berlin / Heidelberg, Jun. 2004, pp. 383–400.
[5] A. Clauset, “Finding local community structure in networks,” Physical Review E, vol. 72, Aug. 2005.
[6] H. Zhao and Z. Qi, “Hierarchical agglomerative clustering with ordering constraint,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (WKDD ’10), Jun. 2010, pp. 195–199.
[7] R. J. Gil-Garcia, J. M. Badia-Contelles, and A. Pons-Porrata, “A general framework for agglomerative hierarchical clustering algorithms,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), Aug. 2006, pp. 569–572.
[8] M. E. J. Newman, “Detecting community structure in networks,” Eur. Phys., vol. 38, pp. 321–330, 2004.
[9] ——, “Fast algorithm for detecting community structure in networks,” Physical Review E, vol. 69, 2004.
[10] S. Malik, F. Huet, and D. Caromel, “Latency based dynamic grouping aware cloud scheduling,” in Proceedings of the 26th IEEE International Conference on Advanced Information Networking and Applications Workshops, ser. AINA 2012, Mar. 2012, pp. 1190–1195.
[11] T. Nguyen and W. Shi, “Improving resource efficiency in data centers using reputation-based resource selection,” in Proceedings of the IEEE 2010 International Conference on Green Computing, Jul. 2010, pp. 389–396.
[12] J. Sonnek, A. Chandra, and J. Weissman, “Adaptive reputation-based scheduling on unreliable distributed infrastructure,” IEEE Transactions in Parallel and Distributed Systems, vol. 18, pp. 1551–1564, Nov. 2007. [Online]. Available: http://dx.doi.org/10.1109/TPDS.2007.1094
[13] O.-M. Achim, F. Pop, and V. Cristea, “Reputation based selection for services in cloud environments,” in Proceedings of the 2011 14th International Conference on Network-Based Information Systems, Sep. 2011, pp. 268–273.
[14] Z. Hou, X. Zhou, M. Wilde, J. Gu, and M. Hategan, “A runtime reputation based grid resource selection algorithm on the open science grid,” in Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems, ser. ICPADS ’09, Dec. 2009.
[15] X. Wang, C. S. Yeo, R. Buyya, and J. Su, “Reliability-driven reputation based scheduling for public-resource computing using ga,” in Proceedings of the 2009 IEEE International Conference on Advanced Information Networking and Applications, ser. AINA 2009, May 2009, pp. 411–418.
[16] B. K. Alunkal, I. Veljkovic, G.-v. Laszewski, and K. Amin, “Reputation-based grid resource selection,” in Proceedings of the ACM/IFIP Workshop on Adaptive Grid Middleware, Sep. 2003.
[17] F. Azzedin and M. Maheswaran, “Integrating trust into grid resource management systems,” in Proceedings of the IEEE International Conference on Parallel Processing, ser. ICPP ’02, Aug. 2002.
[18] R. Sherwood, S. Lee, and B. Bhattacharjee, “Cooperative peer groups in NICE,” Computer Networks, vol. 50, pp. 523–544, Mar. 2006.
[19] Y. Zhang and Y. Fang, “Fine grained reputation system for reliable service selection in peer-to-peer networks,” IEEE Transactions in Parallel and Distributed Systems, vol. 18, pp. 1134–1145, Aug. 2007.
[20] M. Rahman, R. Ranjan, and R. Buyya, “Reputation-based dependable scheduling of workflow applications in peer-to-peer grids,” Computer Networks, vol. 54, pp. 3341–3359, Dec. 2010. [Online]. Available: http://dx.doi.org/10.1016/j.comnet.2010.05.016
[21] E. Damiani, D. C. di Vimercati, S. Paraboschi, P. Samarati, and F. Violante, “A reputations-based approach for choosing reliable resources in peer-to-peer networks,” in Proceedings of the 9th ACM Conference on Computer and Communications Security, ser. CCS ’02, Nov. 2002, pp. 207–216.
[22] S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina, “The EigenTrust algorithm for reputation management in P2P networks,” in Proceedings of the 12th International Conference on World Wide Web, ser. WWW ’03, May 2003.
[23] R. Zhou and K. Hwang, “PowerTrust: A robust and scalable reputation system for trusted peer-to-peer computing,” IEEE Transactions in Parallel and Distributed Systems, vol. 18, pp. 460–473, Apr. 2007.
[24] S. Swaminathan and G. Manimaran, “A reliability-aware value-based scheduler for dynamic multiprocessor real-time systems,” in Proceedings of the 16th International Parallel and Distributed Processing Symposium, ser. IPDPS ’02, Apr. 2002.
[25] S. Malik and F. Huet, “Adaptive fault tolerance in real time cloud computing,” in Proceedings of the 2011 IEEE World Congress on Services, ser. SERVICES 2011, Jul. 2011, pp. 280–287.
[26] S. Malik, F. Huet, and D. Caromel, “Reliability assessment model for scheduling and fault tolerance in cloud computing,” in Submitted in the 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, ser. PDP 2013, Feb. 2013.
[27] U. Sharma, P. Shenoy, S. Sahu, and A. Shaikh, “A cost-aware elasticity provisioning system for the cloud,” in Proceedings of the 2011 31st International Conference on Distributed Computing Systems, ser. ICDCS ’11, Jun. 2011.
[28] W. Liu, F. Shi, W. Du, and H. Li, “A cost-aware resource selection for data-intensive applications in cloud-oriented data centers,” International Journal of Information Technology and Computer Science (IJITCS), vol. 1, pp. 10–17, Aug. 2011.
[29] S. Malik, F. Huet, and D. Caromel, “A model for cost aware cloud computing,” in Submitted in the 10th IEEE International Conference on Dependable, Autonomic and Secure Computing, ser. DASC 2012, Dec. 2012.