
Energy Efficient Data Placement Algorithm for Cloud Data Center

R. Kingsy Grace1,∗, D. Janani2, R. Manimegalai3, M. Geetha2 and J. Mahitha2

1 AP, Department of CSE, Sri Ramakrishna Engineering College, Coimbatore.
2 Students, Sri Ramakrishna Engineering College, Coimbatore.
3 Director-Research, Park College of Engineering and Technology, Coimbatore.
e-mail: [email protected], [email protected]

Abstract. Cloud storage is a model of networked enterprise storage in which data is stored in virtualized pools of storage and hosted by third parties. A data center is a centralized repository, either physical or virtual, for the storage, management, and dissemination of data and information. Hosting companies operate large data centers, and people who require their data to be hosted buy or lease storage capacity from them. As server energy demand doubles every 4 to 6 years, large amounts of CO2 are produced. It is estimated that servers consume 0.5% of the world's total electricity. The proposed algorithm, Frequency Based Partitioning Algorithm (FBPA) for replica placement, treats the servers in the data center as the nodes of a hypergraph. The hypergraph is partitioned using the hMETIS tool, and the best partition in which to place the data is identified based on each node's access frequency. To reduce energy consumption, the average query span, i.e. the average number of machines involved in processing a query, is minimized. The proposed algorithm is simulated using the GridSim Toolkit. The results show that FBPA reduces average query span by 8.97% and execution time by 55.17% when compared with the Pre-Replication Algorithm (PRA).

Keywords: Cloud computing, Cloud storage, GridSim, Pre-replication algorithm, Energy consumption.

1. Introduction

Cloud computing is Internet-based computing, whereby shared resources, software and information are provided to consumers on demand to execute applications [1]. A cloud data center is a centralized repository or a local cluster of nodes in which multiple copies of the same data are stored on different nodes [2]. Data management is the process of controlling data, which involves data pre-processing, formatting, data fusion, data storage, data analysis, query estimation and optimization. Data management is necessary in cloud and grid environments to provide interfaces between existing data storage and manipulation systems, identify missing functionality and verify application requirements [10]. Replication in computing ensures consistency between redundant resources by sharing information to improve the reliability, fault tolerance, accessibility and performance of operational source systems [8]. It enables IT organizations to give the business access to the most recent data irrespective of the complexity and diversity of the IT landscape. Data replication is necessary in the cloud for fast retrieval, for off-site real-time data copies without the off-site infrastructure and maintenance, and for rapid development. Data replication can also be used to reduce the time needed to access data [4].

Placing data on temporary local storage devices offers many advantages, but such data placements also require careful management of storage resources and data movement, i.e. allocating storage space, staging in input data, staging out generated data, and de-allocating local storage after the data is safely stored at the destination. Data placement is either embedded in the computation, which leads to some delay, or performed by simple scripts that do not have the privileges of a job [11].
The US Environmental Protection Agency (EPA) data center report mentions that the energy consumed by data centers doubled between 2000 and 2006, and estimates another twofold increase from 2007 to 2011 if servers are not used in an improved operational scenario [5]. This power-related cost includes investment, operating expenses, cooling costs and environmental impacts. The Frequency Based Partitioning Algorithm (FBPA) for replica placement is proposed to reduce energy consumption in cloud data centers. It places the data in the best partition based on the highest access frequency.

∗ Corresponding author

© Elsevier Publications 2014.

2. Related Work

Ashwin et al. have proposed replica selection and data placement algorithms to reduce the average query span, i.e. the number of machines involved in the execution of a query [4]. The algorithm in [4] identifies which data items to replicate and where to place them. It models the network as a hypergraph and uses the hMETIS tool for partitioning [9]. After the partitioning, the data items are replicated using the data placement algorithms. Four data placement algorithms, namely Iterative HPA (IHPA), Dense Subgraph-based (DS), Pre-replication (PR) and Local Move-Based Replication (LMBR), are proposed in [4]. Among the four, LMBR reduces energy consumption more than the others [4].

Nitesh et al. have proposed dynamic reconfiguration of the cluster based on the workload [5]. If the current workload falls below or rises above the specified threshold, cluster nodes are turned off or on. The system tries to reduce the energy consumption of data centers by reconfiguring the cluster. The algorithm checks whether enough space is available and turns nodes on or off as required. Two types of cluster reconfiguration, scaling up and scaling down, are used. Scaling up is triggered when the average utilization of the nodes goes beyond the specified threshold. The system achieves energy savings of 54% under low workloads and 33% under average workloads.

Resource inefficiency is caused by waste at different levels in data centers [6]. Open Reputational Model (OPERA) is a new trust model that allows users to query the reputation vector of any registered component. It also reduces management intervention and employs a monitoring tool.
Reputation gives information about the past behaviour of an entity and helps one decide which entities to trust. To demonstrate the effectiveness of OPERA, Tung et al. have integrated the OPERA trust model into the Hadoop scheduler. It helps to reduce the number of re-executed tasks and the execution time of Hadoop jobs in the presence of failures and heavy workloads. It improves energy efficiency by up to 53.32%.

Susan et al. have proposed a smart replication strategy called Sliding Window Replica Strategy (SWIN) [3] to minimize the amount of data transmitted and stored. The key idea of the sliding window replica scheme is to build a “sliding window”, that is, a set of files which will be used in the immediate future. The performance of the SWIN replica strategy is evaluated in an energy-efficient cluster called Sage.

Energy consumption has become a critical issue for data centers, triggered by the rise in energy costs, volatility in the supply and demand of energy and the widespread proliferation of power-hungry Information Technology (IT) equipment [7]. The approach in [7] combines cooling power provisioning with IT workload provisioning at a more granular level in a large data center by using Computer Room Air Conditioning units (CRACs) in thermal zones. The authors explore a novel approach to the coordinated management of IT systems and cooling infrastructure to meet joint power and temperature objectives. In particular, for a given total IT workload, the proposed method determines the optimal settings of the CRACs in a data center so as to minimize overall energy consumption while satisfying specified temperature constraints. Efficient Multi-site Data Movement in Distributed Environment yields energy savings of 10% for unconsolidated IT power and 50% for consolidated IT power.

3. FBPA: Frequency Based Partitioning Algorithm for Replica Placement

The proposed system aims to reduce energy consumption in cloud data centers through efficient data placement. Initially, the system treats the computers in the data center as the nodes of a hypergraph, which consists of vertices and hyperedges. After the hypergraph is constructed, it is partitioned using a Java based multilevel partitioning tool called hMETIS [9]. After the partitioning, the grid resources and the grid user entities are created using GridSim. The proposed Frequency Based Partitioning Algorithm (FBPA) for replica placement reduces the job execution time and the average query span, which leads to reduced energy consumption. The performance of FBPA is compared with PRA [4] on parameters such as number of partitions, average query span, number of queries and execution time.

The main idea of FBPA is to improve performance by reducing execution time and average query span. This is done by replicating nodes in the partitions based on their frequency, i.e. the number of times a node is accessed. The hypergraph H(V, E) is given as input to FBPA, where H is the hypergraph with V vertices and E hyperedges, together with N, the number of partitions, and C, the capacity of each partition [4]. The next step is to partition the hypergraph using a Hypergraph Partitioning Algorithm (HPA); the hMETIS partitioning tool is used as the HPA. The partitions formed are named G1, G2, G3, ... and are grouped under the name G. The network is created using GridSim. The frequency and score values are then computed for all vertices. The frequency is the number of times a node is accessed, and the score is the number of hyperedges that contain the node but do not contain any other node in the partition Gi [4]. The vertices are arranged in descending order of frequency and score. Spanning partitions are calculated to identify the hyperedges of a particular vertex. The Maxspan is calculated and the important node is moved to the Maxspan. The pseudocode for FBPA is shown in Figure 1.

Figure 1. FBPA Pseudocode.

4. Experimental Results

In this work, the problem of minimizing total energy consumption is addressed by reducing the number of resources consumed. The proposed replica placement algorithm, FBPA, computes the final partitioned hypergraph and the average query span. In FBPA, HPA is applied only once, for the initial partitioning, which leads to reduced execution time. In PRA [4], only score values are computed and HPA is applied twice to obtain the final partitions. The key parameters of the dataset are: (i) |D|, the number of data items, (ii) NQ, the number of queries, (iii) C, the partition capacity, (iv) N, the number of partitions, and (v) ADI, the query size (Average Data Items). For FBPA, average query span and job execution time are measured by varying these parameters. The performance of PRA and FBPA is compared in terms of (i) number of partitions vs. average query span, (ii) number of partitions vs. execution time (seconds), and (iii) number of queries vs. average query span.

Figure 2 compares the number of partitions with the average query span as the number of partitions varies from 5 to 25. The average query spans of PRA, FBPA and non-replication are compared.
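Because the comparisons in this section are stated in terms of average query span, a minimal sketch of how that metric might be computed for a given placement is shown here. It is illustrative only and is not the authors' GridSim code: the names `query_span`, `average_query_span` and `placement` are hypothetical, and the greedy covering used to count partitions when replicas exist is an assumption (finding the exact minimum is a set-cover problem).

```python
from typing import Dict, Iterable, List, Set


def query_span(query: Iterable[str], placement: Dict[str, Set[int]]) -> int:
    """Greedily count the partitions needed to cover every item of one query."""
    remaining = set(query)
    span = 0
    while remaining:
        # For each partition, count how many still-uncovered items it holds.
        coverage: Dict[int, int] = {}
        for item in remaining:
            for part in placement[item]:
                coverage[part] = coverage.get(part, 0) + 1
        if not coverage:
            raise ValueError("some queried item is not placed in any partition")
        # Pick the partition covering the most items (lowest id wins ties).
        best = max(sorted(coverage), key=lambda p: coverage[p])
        remaining = {i for i in remaining if best not in placement[i]}
        span += 1
    return span


def average_query_span(queries: List[Iterable[str]],
                       placement: Dict[str, Set[int]]) -> float:
    """Mean number of partitions touched per query (0.0 for an empty workload)."""
    if not queries:
        return 0.0
    return sum(query_span(q, placement) for q in queries) / len(queries)


# Hypothetical toy workload: three data items, two partitions, two queries.
placement = {"a": {0}, "b": {1}, "c": {1}}
queries = [["a", "b"], ["b", "c"]]
print(average_query_span(queries, placement))  # 1.5

# Replicating item "a" into partition 1 lets both queries run on one machine.
placement["a"].add(1)
print(average_query_span(queries, placement))  # 1.0
```

In this toy case the replica removes the only query that spans two partitions, which is the effect the span-based comparison is intended to capture.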
From Figure 2 it can be observed that the proposed FBPA is 8.97% and 14.04% more efficient than PRA [4] and non-replication, respectively. Figure 3 compares the number of partitions with the execution time as the number of partitions varies from 5 to 25. When the non-replication strategy is employed, the algorithm execution time is zero. Figure 3 shows the execution time of the proposed FBPA and the existing PRA. It can be observed that FBPA requires 55.17% less execution time than PRA. Figure 4 compares the number of queries with the average query span as the number of queries varies from 20 to 100. The result shows that FBPA is 3.96% and 4.04% more efficient than PRA and non-replication, respectively. From Figure 2 and Figure 3 it is clear that the proposed FBPA performs better than PRA and the non-replication strategy.
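The frequency-driven placement described in Section 3 can be sketched roughly as follows. This is an interpretation under stated assumptions, not the pseudocode of Figure 1: the initial partitioning is passed in as `home` (standing in for hMETIS output), access frequency is approximated by the number of hyperedges that touch a node, spans are counted against home partitions only, and the function name `fbpa_like_placement` and the tie-breaking rules are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def fbpa_like_placement(hyperedges: List[Set[str]],
                        home: Dict[str, int],
                        capacity: int) -> Dict[str, Set[int]]:
    """Replicate frequently accessed nodes into the partition they span into most."""
    # Start from the initial partitioning: every node lives in its home partition.
    placement = {v: {p} for v, p in home.items()}
    load = Counter(home.values())

    # Assumption: frequency = number of hyperedges (queries) that access the node.
    frequency = Counter(v for edge in hyperedges for v in edge)

    def score(v: str) -> int:
        # Hyperedges containing v with no other member in v's home partition.
        return sum(
            1 for edge in hyperedges
            if v in edge and all(home[u] != home[v] for u in edge if u != v)
        )

    # Descending frequency, then descending score; the name breaks ties.
    order = sorted(home, key=lambda v: (-frequency[v], -score(v), v))

    for v in order:
        # Count, per foreign partition, the hyperedges of v that reach into it.
        spans: Counter = Counter()
        for edge in hyperedges:
            if v in edge:
                for p in {home[u] for u in edge if u != v} - placement[v]:
                    spans[p] += 1
        # Only partitions with spare capacity can receive a replica.
        candidates = [p for p in sorted(spans) if load[p] < capacity]
        if not candidates:
            continue
        target = max(candidates, key=lambda p: spans[p])  # "max spanning" partition
        placement[v].add(target)
        load[target] += 1
    return placement


# Hypothetical toy input: four nodes in two partitions, capacity 3 each.
edges = [{"a", "b"}, {"a", "c"}, {"a", "d"}, {"c", "d"}]
home = {"a": 0, "b": 0, "c": 1, "d": 1}
result = fbpa_like_placement(edges, home, capacity=3)
print(sorted(result["a"]))  # [0, 1]
```

The sketch partitions once and then only adds replicas, mirroring the text's point that FBPA avoids a second HPA pass; how the real algorithm resolves ties and capacity conflicts is not specified in the text and is assumed here.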

Figure 2. Comparison of number of partitions with average query span.

Figure 3. Comparison of number of partitions with execution time.

Figure 4. Comparison of number of queries with average query span.

5. Conclusion

The proposed Frequency Based Partitioning Algorithm (FBPA) for replica placement effectively reduces energy consumption in the cloud data center. FBPA solves the data placement problem so as to minimize total energy consumption and total resource consumption. The proposed system identifies query span, the number of machines involved in executing a query, as having a direct and significant impact on total resource consumption, and the proposed algorithm uses this relation to solve the above problem. FBPA produces partitions of nodes by replicating the important nodes among the partitions based on the frequency (number of times accessed) of each node, which reduces the average query span. To evaluate the efficiency of the proposed system, its performance is compared with the existing data placement algorithm in terms of average query span, number of partitions, execution time, query size and number of queries. The simulation results show that FBPA has a lower execution time than PRA, since FBPA avoids repartitioning with HPA by placing the replica in the maximum spanning partition. FBPA reduces average query span by 8.97% and execution time by 55.17% when compared with the Pre-Replication Algorithm (PRA).

References

[1] R. Buyya, A. Beloglazov and J. Abawajy. Energy-efficient Management of Data Center Resources for Cloud Computing: A Vision, Architectural Elements, and Open Challenges. In: Proceedings of the 2010 International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2010, Las Vegas, USA.
[2] N. Bitar, S. Gringeri and T. J. Xia. Technologies and Protocols for Data Center and Cloud Networking. IEEE Communications Magazine, 2013, vol. 51, issue 9, pp. 24–31.
[3] Susan V. Vrbsky, Ming Lei, Karl Smith and Jeff Byrd. Data Replication and Power Consumption in Data Grids. IEEE 2nd Conference on Cloud Computing Technology and Science (CloudCom), 2010, Indianapolis, pp. 288–295.
[4] K. Ashwin Kumar, Amol Deshpande and Samir Khuller. Data Placement and Replica Selection for Improving Co-location in Distributed Environments. CoRR Technical Report arXiv:1302.4168, 2013.
[5] Nitesh Maheshwari, Radheshyam Nanduri and Vasudeva Varma. Dynamic Energy Efficient Data Placement and Cluster Reconfiguration Algorithm for MapReduce Framework. Future Generation Computer Systems, 2012, vol. 28, issue 1, pp. 119–127.
[6] Tung Nguyen and Weisong Shi. Improving Resource Efficiency in Data Centers Using Reputation-based Resource Selection. Sustainable Computing: Informatics and Systems, 2012, vol. 2, issue 3, pp. 389–396.
[7] R. Das, S. Yarlanki, H. Hamann, Jeffrey O. Kephart and V. Lopez. A Unified Approach to Coordinated Energy-Management in Data Centers. 7th International Conference on Network and Service Management (CNSM), 2011, pp. 1–5.
[8] Sang-Min Park, Jai-Hoon Kim, Young-Bae Ko and Won-Sik Yoon. Dynamic Data Replication Strategy Based on Internet Hierarchy BHR. Lecture Notes in Computer Science, Springer-Verlag, Heidelberg, 2004, vol. 3033, pp. 838–846.
[9] George Karypis and Vipin Kumar. Multilevel hypergraph partitioning: hMETIS – A Hypergraph Partitioning Package. Version 1.5.3. Department of Computer Science, University of Minnesota, 1998.
[10] S. Sakr, A. Liu, D. M. Batista and M. Alomari. A Survey of Large Scale Data Management Approaches in Cloud Environments. IEEE Communications Surveys and Tutorials, 2011, vol. 13, issue 3, pp. 311–336.
[11] T. Kosar. Data Placement in Widely Distributed Systems. Ph.D. dissertation, University of Wisconsin-Madison, Computer Science Department, United States, 2005.
