Skewly Replicating Hot Data to Construct a Power-efficient Storage Cluster

Lingwei Zhang1, Yuhui Deng1,2, Weiheng Zhu1, Jipeng Zhou1, Frank Wang3
1 Department of Computer Science, Jinan University, Guangzhou, 510632, P. R. China. E-mail: [email protected]
2 Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing, 100190, P. R. China
3 School of Computing, University of Kent, CT2 7NF, UK
Abstract

The exponential growth of data is presenting challenges to traditional storage systems. Component-based cluster storage systems, due to their high scalability, are becoming the architecture of next-generation storage systems. Cluster storage systems often use data replication to ensure high availability, fault tolerance, and load balance. However, this kind of data replication not only consumes a large amount of storage resources, but also increases energy consumption. This paper presents a power-aware data replication strategy that leverages data access behavior. The strategy uses the 80/20 rule (80% of data accesses often go to 20% of the storage space) to skewly replicate only the small amount of frequently accessed data. Furthermore, the storage nodes are divided into a hot node set and a cold node set. Hot nodes, which store a small amount of hot data copies, are always kept in an active state to guarantee the QoS of the system. Cold nodes, which store a large volume of infrequently accessed cold data, are placed in a low-power state, thus reducing the energy consumption of the cluster storage system. Simulation results show that the proposed strategy can effectively reduce the resource and energy consumption of the system while ensuring system performance.

Keywords: energy consumption; storage cluster; data replication; hotspot data
1. Introduction

As a result of the explosive growth of data, component-based cluster storage systems, due to their high scalability, are becoming the architecture of next-generation storage systems. However, the probability of system failure grows with the expansion of the system. Failures include hardware overheating, power failures, disk corruption, network wiring problems, maintenance, etc. (Pinheiro et al., 2007). Additionally, natural disasters and hacker attacks can also cause system failures. Those failures can reduce system availability to the range of 95% to 99.6% (Dahlin et al., 2003). Traditional data replication approaches can improve system availability, ensure fault tolerance, and maintain load balance by using multiple data copies. For example, GFS uses three data copies (Ghemawat et al., 2003). However, due to the explosive growth of data, it is challenging to maintain multiple copies of the whole dataset in existing storage systems. Data replication therefore deserves an in-depth reconsideration. The 80/20 rule was first proposed by the Italian economist Pareto. He stated that, for many events,
roughly 80% of the effects come from 20% of the causes (Pareto principle). Breslau et al. (1999) analyzed six traces and reported that the distribution of web page requests generally follows a Zipf-like distribution, where the relative probability of a request for the ith most popular page is proportional to 1/i^α, with α typically taking a value less than unity. Staelin et al. (1990) observed a very high locality of reference on extremely large file systems: some files in the file system experience a much higher skew of accesses than others. Gomez and Santonja (2002), by investigating several real traces, found that some data blocks are extremely hot and popular, while others are rarely or never accessed. Cherkasova and Ciardo (2000) characterized web workloads and showed that 10% of the files accessed on a server typically account for 90% of the server requests and 90% of the bytes transferred. Xie and Sun (2009) developed a file assignment strategy for parallel I/O systems by leveraging the 80/20 rule. The above studies indicate that most data requests access only a small part of the data files. This is also called a skewed data access pattern. We call this small portion of files hotspot data files. Since most of the data requests can be satisfied by this small set of hotspot data files, replicating only the hotspot data should maintain the system performance at an acceptable level while consuming minimal resources.

This paper presents a low-power data replication strategy for storage clusters based on data access behavior. The strategy only replicates the frequently accessed hotspot data, which is normally less than 20% of the whole data set in terms of the 80/20 rule. It divides the storage nodes into a hot node set and a cold node set. Hot nodes, which store a small amount of hot data copies, are always in an active state to guarantee the QoS of the system. The cold nodes, which store a large volume of infrequently accessed cold data, are placed in a low-power state, thus reducing the energy consumption of the cluster storage system with minimal resource consumption. Our key contributions are as follows: (1) We propose a power-aware data replication strategy for storage clusters by leveraging the skew of the data access pattern. (2) We divide the storage nodes into a cold node set and a hot node set in terms of the data access pattern. The hot nodes are employed to guarantee system QoS, while the cold nodes can be switched to a low-power state to save energy when appropriate. (3) We perform comprehensive simulations to evaluate and explore the impacts of different parameters on the system performance and power consumption.

The remainder of the paper is organized as follows. Section 2 introduces the related work. Section 3 describes the system architecture. Section 4 analyzes the system in detail, including the creation of data copies, data consistency, and the optimal number of data copies. Section 5 constructs a simulation system and evaluates the proposed method. Section 6 concludes the paper with remarks on its contributions.

2. Related work

In recent years, a large number of studies have focused on dynamic data replication technologies. Ranganathan et al. (2002) presented a dynamic and model-driven replication strategy that automatically produces copies in a decentralized fashion whenever this is required to improve the system availability.
In this model, all the peers are independent in making replication decisions and can create copies of the files they store. Yuan et al. (2007) proposed a dynamic data replication strategy that considers the storage capacity bottleneck of different data grid nodes and the bandwidth available between these nodes. Tang et al. (2005) presented two dynamic replication algorithms, simple bottom-up and aggregate bottom-up, to reduce the average response time. In the proposed architecture, each node at any middle tier provides resources to the lower-tier nodes as a server. A replication decision is made only at the dynamic replication scheduler, which maintains information about the data access history and client access patterns. PHFS (Khanli et al., 2011) uses predictive techniques to estimate the future usage of files and then pre-replicates the files in a hierarchical data grid on the path from source to client. Park et al. (2004) attempted to improve network locality by replicating files within the network region. Wolfson et al. (1997) presented an algorithm that changes the replication scheme as changes occur in the read-write pattern; the algorithm continuously moves the replication scheme towards an optimal one. Lamehamedi et al. (2002) presented a set of replica management services and protocols that offer high data availability, low bandwidth consumption, improved fault tolerance, and system scalability by considering the access cost and replication gains. Rehman et al. (2005) proposed to place a replica at a site where both the utility and the risk index are considered according to the current network load and user requests.

Green computing has been a hot research topic in the cluster computing community for many years. It is even more challenging for storage clusters because of the explosive growth of data. Fan et al. (2007) investigated the power consumption of a typical server and reported that a disk drive takes 12W. From a power standpoint, the power consumption of a single disk drive does not seem to be a problem. However, if hundreds or thousands of disk drives are put together, the total power consumption quickly becomes a big headache (Deng, 2011). One example shows that the storage subsystem accounts for 27% of the energy consumed in a data centre (Power, Heat, and Sledgehammer). To worsen the situation, this fraction is swiftly increasing as storage requirements are rising by 60% annually (Moore, 2002). Shafi et al. (2003) studied real web-server workloads from sports, e-commerce, financial, and Internet proxy clusters and found that the average server utilization varies between 11% and 50%. The reason for the low utilization is that the system has to be overprovisioned to guarantee performance during periods of peak load. This observation gives us opportunities to reduce the energy consumption of clusters.

Many research efforts have been invested in reducing the energy consumption of clusters. Verma et al. (2008) employed power management techniques such as dynamic consolidation and the dynamic power range enabled by low-power states to reduce the power consumption of high performance applications on modern power-efficient servers with virtualization support. Pinheiro et al. (2003) developed a system that dynamically turns cluster nodes on and off to handle the load imposed on the system; it makes reconfiguration decisions by considering the total workload imposed on the system and the power and performance implications of changing the current configuration. Elnozahy et al. (2003) employed various combinations of dynamic voltage scaling and node vary-on/vary-off to reduce the aggregate power consumption of a server cluster during periods of reduced workload. MISER (Ge et al., 2007) is a run-time DVFS scheduling system capable of providing fine-grained and performance-directed DVFS power management for a power-aware cluster. Huang and Feng (2009) proposed a run-time DVFS scheduling algorithm for a cluster system to reduce the energy consumption.
The β-algorithm (Hsu and Feng, 2009) is a run-time DVFS scheduling algorithm that can transparently and automatically reduce the power consumption while maintaining a specified level of performance. FAWN (Andersen et al., 2009) combines low-power CPUs with small amounts of local flash storage, and balances computation and I/O capabilities in order to offer low-power, efficient, and parallel data access on a large-scale cluster. Gordon (Caulfield et al., 2009) utilizes low-power processors and flash memory to reduce the power consumption and improve the performance of data-centric clusters. ECS2 (Huang et al., 2013) utilizes data redundancies and deferred writes to conserve energy for erasure-coded storage clusters: the parity blocks are buffered exclusively in active data nodes while the parity nodes are placed into a low-power mode, thus saving energy. Huang et al. (2005) proposed to dynamically place copies of
data in a file system's free blocks to reduce head positioning latencies. As one or more copies can be accessed in addition to the original data block, choosing the nearest copy can significantly improve I/O performance. The reduced disk access time also leads to 40–71% energy savings per access. This is the first work in the community that saves energy by leveraging data replication. However, that work only focuses on a single disk rather than taking a global view of a cluster system. In contrast to the existing work on data replication and energy saving of computer systems, this paper proposes, by leveraging the skew of data access patterns, to concentrate a small portion of the data (the frequently accessed data) into a few hot nodes that are maintained in an active state, and to switch the cold nodes to a low-power state, thus saving energy in a storage cluster.

3. Architecture
Fig.1. Replication-based low-power storage cluster architecture

Fig.1 shows the architecture of a replication-based, low-power storage cluster system. It is mainly composed of a metadata server and storage nodes. The metadata server is responsible for management and scheduling according to the load conditions of the storage nodes. The storage nodes are divided into a hot node set and a cold node set; each node in the hot node set is called a hot node and each node in the cold node set is called a cold node. The data stored in the hot nodes is called hotspot data, which is frequently accessed by clients and has multiple copies distributed across the hot nodes. The data stored in the cold nodes is only occasionally accessed by clients. Normally, the hotspot data is just a small part of the data set in the storage cluster system in terms of the 80/20 rule. Therefore, the number of hot nodes is normally much smaller than the number of cold nodes. Moreover, the number of nodes in these two sets can be dynamically adjusted by the system, and the nodes in the cold set are kept in a low-power state most of the time.

Fig.2 demonstrates the main replication modules in the storage cluster. They consist of the Replica Selector (RS), the Replica Catalog (RC), the Node Catalog (NC), the Replica Manager (RM), and the Local Replica Manager (LRM), where the RS, RC, NC, and RM reside in the metadata server and an LRM resides in each storage node. Different modules play different roles in the energy-efficient storage cluster. The RS receives requests issued by clients, retrieves the corresponding file number from each request, and then searches for the matching record in the RC according to the file number. After locating the record, the RS obtains the locations of all replicas of the file and sends them to the RM for processing. Each record in the RC stores the mapping from a logical file name to its physical file names and replica locations, where the logical file name is a global identification of a single data file used to provide clients with a specific file
service, and the physical file name is the actual file name on a specific node. According to this mapping, once we obtain a file number from a client request, the storage nodes that store the file can be identified. We can further determine whether the file is hotspot data according to the hot data identification strategy. If the file is hotspot data, multiple copies of it are distributed in the storage cluster, and all the copies can be located through the replica location field. Each record in the NC stores the mapping from a logical node name to the real data files stored on that node, together with a hot mark. The metadata server forwards requests to a specific storage node according to the load conditions, and determines whether the node is a hot node or a cold node in terms of the hot mark. If a request is forwarded to a cold node, the cold node is switched to the active state to handle the request. The RM is responsible for managing the load balancing of the storage cluster. It receives the locations of all replicas of the requested file from the RC, uses a load-balancing algorithm to find the storage node with the minimal workload, and forwards the request to that node. The RM can also dynamically replicate hotspot data to the hot node set. The LRM is located on each storage node in the cluster and is responsible for communicating with the RM on the metadata server. When it receives a request from the metadata server, it establishes a logical data path to the client for data transmission. In addition, the LRM also monitors the replication of hotspot data: when it is told by the RM to back up hotspot data, it accepts the file transmission and creates the file copy in its local storage.
Fig.2. Replication modules and flowchart of the proposed replication strategy

As illustrated in Fig.2, the interaction between clients and the storage nodes is mediated by the metadata server. When a client wants to access the storage nodes, firstly, it sends requests to the metadata server. When the RS module in the metadata server receives a request, it retrieves the file number from the request and then searches for the file number in the RC. If the file number matches a record in the RC, the access count of the file is increased; otherwise, the RS notifies the client that the file cannot be found. When the access frequency of a file reaches the predefined hot-file threshold, the RC marks this file as a hot file and the RM replicates it to a hot node. After that, the RS searches the NC in terms of the file number to identify the nodes that store the requested file. Secondly, the metadata server forwards the requests to the corresponding storage nodes in terms of the workloads and the scheduling strategy. Generally, the RM
prefers to select a hot node to which to forward the request. If the requested file is not in the hot node set but in the cold node set, the RM switches on the corresponding cold node that stores the required file to process the request. The required file is transferred to the client directly from the node that contains the data, without going through the metadata server. Since most of the requests are concentrated in the hot node set, the hot nodes are always in an active state to handle the requests, while the storage nodes in the cold node set are transferred to a low-power state to save energy. The cold nodes are switched from the low-power state to the active state to serve the corresponding requests when necessary. After satisfying the requests, the cold nodes are transferred back to the low-power state to reduce energy consumption. Since most of the requests are served by the hot nodes in terms of the 80/20 rule, this approach does not significantly affect system performance, while saving energy.
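To make the control flow of Fig.2 concrete, the following Python sketch outlines how a metadata server might dispatch one request using the RS, RC, NC, and RM roles described above. It is a minimal illustration, not the authors' implementation: the class name, dictionary fields, and the hot-file threshold are hypothetical choices.

```python
from collections import defaultdict

HOT_THRESHOLD = 3  # assumed: number of accesses before a file is treated as hot

class MetadataServer:
    """Minimal sketch of the RS/RC/NC/RM interplay described in Fig.2."""

    def __init__(self, file_locations, hot_nodes):
        # RC: file number -> {access count, hot flag, replica locations}
        self.rc = {f: {"count": 0, "hot": False, "replicas": list(nodes)}
                   for f, nodes in file_locations.items()}
        # NC: node number -> {hot mark, queue length, power state}; assumes at least one hot node
        self.nc = defaultdict(lambda: {"hot": False, "queue": 0, "active": False})
        for n in hot_nodes:
            self.nc[n] = {"hot": True, "queue": 0, "active": True}

    def handle_request(self, file_number):
        record = self.rc.get(file_number)
        if record is None:                       # RS: file number not found in the RC
            return None
        record["count"] += 1                     # RC: increase the number of file accesses
        if not record["hot"] and record["count"] >= HOT_THRESHOLD:
            record["hot"] = True                 # RM: replicate the new hot file to a hot node
            hot_node = min((n for n, s in self.nc.items() if s["hot"]),
                           key=lambda n: self.nc[n]["queue"])
            if hot_node not in record["replicas"]:
                record["replicas"].append(hot_node)
        # RM: forward the request to the least-loaded replica node
        target = min(record["replicas"], key=lambda n: self.nc[n]["queue"])
        if not self.nc[target]["active"]:
            self.nc[target]["active"] = True     # wake up a cold node when necessary
        self.nc[target]["queue"] += 1
        return target
```

A usage example would be `MetadataServer({1: [5]}, hot_nodes=[0, 1]).handle_request(1)`, which returns the node chosen to serve the request.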
4. System Model

4.1 Identifying Hotspot Data

Table 1. Characteristics of different cache replacement algorithms

LFU: It is based on the access counts of the cache lines. The cache lines which have been used least frequently are evicted. Unfortunately, recently active but currently cold cache lines tend to remain entrenched in the cache. Therefore, the inactive data increases the miss ratio and reduces the cache performance.

LRU: LRU evicts the cache lines used least recently, on the assumption that they will not be used in the near future. LRU is the most frequently used algorithm because it is simple and easy to implement, and offers very good performance.

SLRU: SLRU divides the cache into a probationary segment and a protected segment. Each segment arranges its lines from the most to the least recently accessed. Data from misses is appended to the most recently accessed end of the probationary segment. All hits in the cache are removed from their current position and appended to the most recently accessed end of the protected segment.
Fig.3. Mechanism of hot data identification

There are several typical cache replacement algorithms, including Least Frequently Used (LFU), Least Recently Used (LRU), and Segmented LRU (SLRU) (Karedla et al., 1994). Table 1 summarizes the characteristics of these cache replacement algorithms. In order to identify hot data effectively, we use
two fixed-length LRU lists, a hot list and a recent list, to identify the most frequently and recently accessed data. When the system receives a request, the corresponding data is recorded on the recent list. If data on the recent list is accessed again within a short period of time, it is promoted to the hot list. If the promoted data is already on the hot list, it is moved to the head of the hot list. If the hot list is full, the last data item on the hot list is degraded to the recent list. If the recent list is full, the last data item on the recent list is discarded. Fig.3 shows the mechanism of hot data identification, and it proved very effective in our experiments. When a client sends a file request, the RS receives the request and locates the record in the RC in terms of the file number. The first few data objects on the hot list are replicated by the RM from the node that stores them to k-1 other nodes, so that each of them has k copies.
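The two-list mechanism can be expressed compactly with ordered dictionaries. The following Python sketch is an illustrative implementation of the rules just described (promotion on a repeated access, degradation when the hot list is full, eviction when the recent list is full); the class name and default list sizes are our own choices rather than values from the paper, and "accessed again in a short period of time" is approximated by the bounded length of the recent list.

```python
from collections import OrderedDict

class HotDataIdentifier:
    """Two fixed-length LRU lists: a recent list and a hot list (cf. Fig.3)."""

    def __init__(self, hot_size=100, recent_size=1000):
        self.hot = OrderedDict()      # head = most recently promoted/accessed
        self.recent = OrderedDict()
        self.hot_size = hot_size
        self.recent_size = recent_size

    def access(self, file_number):
        if file_number in self.hot:
            # Already hot: move to the head of the hot list.
            self.hot.move_to_end(file_number, last=False)
        elif file_number in self.recent:
            # Accessed again while still on the recent list: promote to the hot list.
            del self.recent[file_number]
            self.hot[file_number] = True
            self.hot.move_to_end(file_number, last=False)
            if len(self.hot) > self.hot_size:
                demoted, _ = self.hot.popitem(last=True)   # tail of the hot list
                self._insert_recent(demoted)               # degrade to the recent list
        else:
            self._insert_recent(file_number)

    def _insert_recent(self, file_number):
        self.recent[file_number] = True
        self.recent.move_to_end(file_number, last=False)
        if len(self.recent) > self.recent_size:
            self.recent.popitem(last=True)                 # tail of the recent list is discarded

    def is_hot(self, file_number):
        return file_number in self.hot
```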
4.2 Optimal Number of Replicas

The traditional goal of data replication is to improve system reliability and concurrency. If one copy is not available due to hardware failures, the accesses going to that copy can be diverted to the remaining copies, thus improving the system reliability. Furthermore, multiple copies offer parallel access to increase the system concurrency. Therefore, the system reliability and concurrency grow with the number of copies. However, when the number of copies is increased beyond a certain point, the system reliability approaches one, and additional copies no longer enhance the reliability but only waste resources. Generally, the reliability of a distributed storage unit over a period T follows the exponential distribution:

F = e^{-λT}    (1)

where λ is the failure rate of the distributed storage unit (Li et al., 2012; Blake and Rodrigues, 2003). For example, if a data centre experiences 10 disk failures out of 1000 disks per year, the average failure rate is 0.01/year. According to the reliability equation, the data availability with k copies can be derived as (Li et al., 2012; Blake and Rodrigues, 2003):

a = 1 - (1 - e^{-λT})^k    (2)

where k is the number of copies, λ is the failure rate of the distributed storage unit, and T is the storage duration of the data with k copies. It can be found that, given the number of copies, the failure rate, and the data availability requirement, the storage duration can be computed as:

T = (1/λ) · ln( 1 / (1 - (1 - a)^{1/k}) )    (3)

Table 2 summarizes the storage duration with different failure rates λ, numbers of copies k, and data availability requirements a.

Table 2. The calculation of storage duration (years)

k    a          λ=0.1       λ=0.01      λ=0.001
1    99%        0.100503    1.005034    10.050336
1    99.99%     0.001000    0.010001    0.100005
1    99.9999%   0.000010    0.000100    0.001000
2    99%        1.053605    10.536052   105.360516
2    99.99%     0.100503    1.005034    10.050336
2    99.9999%   0.010005    0.100050    1.000500
3    99%        2.426366    24.263665   242.636649
3    99.99%     0.475276    4.752764    47.527644
3    99.9999%   0.100503    1.005034    10.050336
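As a worked check of equations (2) and (3), the snippet below reproduces the entries of Table 2 and also inverts the relation to find the smallest number of copies that meets an availability target for a given storage duration. It is a small illustrative calculation under the stated formulas, not code from the paper.

```python
import math

def storage_duration(k, availability, failure_rate):
    """Equation (3): how long k copies sustain the availability target (years)."""
    return (1.0 / failure_rate) * math.log(1.0 / (1.0 - (1.0 - availability) ** (1.0 / k)))

def min_copies(duration, availability, failure_rate, k_max=20):
    """Smallest k such that equation (2) meets the availability target over 'duration' years."""
    for k in range(1, k_max + 1):
        a = 1.0 - (1.0 - math.exp(-failure_rate * duration)) ** k
        if a >= availability:
            return k
    return None

# Reproduce one Table 2 entry: k=2, a=99%, lambda=0.1 -> about 1.053605 years.
print(round(storage_duration(2, 0.99, 0.1), 6))
# How many copies keep a file 99.99% available for one year at lambda=0.01? -> 2
print(min_copies(1.0, 0.9999, 0.01))
```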
4.3 Data Concurrency

Since this paper attempts to leverage multiple hot data copies to reduce the energy consumption of storage clusters, concurrency control is essential to maintain consistency between the source data and the replicated data. The concurrency control in our system covers data updates as well as nodes joining and leaving the hot node set. When a client wants to update a data file, it sends a request to the metadata server. The RS in the metadata server receives the request and locates the record in the RC in terms of the file number. The read/write lock is then checked to determine the availability of the file. If the file is locked, the request is blocked in the FIFO queue of the metadata server to guarantee data consistency, since one of the storage nodes is processing the file. The current request obtains the lock from the RM after the previous file operation is completed and the lock is released. The RM then propagates the update to all copies in the hot node set. In the RC, the dirty bit of any data copy that requires updating but is stored in the cold node set is marked, and the lock is then released. If the cold node containing such a data copy is later switched on to serve a read request, the dirty bit of the corresponding data copy in the RC is checked; if it is marked, the data copy is updated first. This approach can also lengthen the idle periods experienced by the cold nodes.

The system workload normally changes over time. If a large portion of the data residing in a cold node becomes hot in terms of the hot data identification mechanism, this node becomes a hot node. When the node joins the hot node set, the LRM of the node sends a join message containing the node number to the RM. The RM checks all the files on the node to confirm whether their dirty bits are set. If a dirty bit is set, the corresponding file copies on other nodes have been updated since the node left the hot node set. In order to maintain copy consistency, the new file copies are replicated from the cluster nodes to update the files on the newcomer. When the update is finished, the dirty bit is cleared and the newcomer is switched from the low-power state to the active state. Similarly, if most of the hot data in a hot node becomes cold, this hot node is evicted from the hot node set and joins the cold node set. When the node leaves the hot node set, the LRM of the node sends a leave message containing the node number to the RM. The RM sets the leave tag in the node record and transfers the node from the active state to the low-power state. After the node leaves the hot node set, the dirty bit must be set whenever a client updates the file copies, and the dirty bit is not cleared until the new file copies replace the old files on the node when it rejoins the hot node set.
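The update and dirty-bit handling described above can be summarised in a few lines of code. The sketch below is one reading of that description (a per-file lock, immediate propagation to hot-node copies, and deferred, dirty-bit-driven refreshes for copies on cold nodes); the names and data structures are illustrative, not taken from an implementation in the paper.

```python
import threading

class ReplicaRecord:
    def __init__(self, hot_copies, cold_copies):
        self.lock = threading.Lock()          # read/write lock in the RC (simplified to a mutex)
        self.hot_copies = set(hot_copies)     # nodes in the hot node set holding a copy
        self.cold_copies = set(cold_copies)   # nodes in the cold node set holding a copy
        self.dirty = set()                    # cold-node copies that missed an update
        self.version = 0

def push_update(node, version):
    """Placeholder for the file transfer performed by the LRM on 'node'."""
    pass

def update_file(record, new_version):
    """Client update: propagate to hot copies now, mark cold copies dirty."""
    with record.lock:                         # blocked while another node processes the file
        record.version = new_version
        for node in record.hot_copies:
            push_update(node, new_version)    # RM propagates to all copies in the hot node set
        record.dirty |= record.cold_copies    # cold copies are refreshed lazily

def read_from_cold_node(record, node):
    """A cold node is switched on to serve a read: refresh its copy first if dirty."""
    with record.lock:
        if node in record.dirty:
            push_update(node, record.version)
            record.dirty.discard(node)
    return record.version
```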
4.4 Bandwidth

We assume that there are N storage nodes in the cluster offering storage service, and that the cluster employs a 1000Mbps network to connect to the Internet. Traditionally, a well designed scheduler should be able to distribute the clients' requests evenly across the N storage nodes to maintain load balance, which implies that the network traffic of each node is 1000/N Mbps. However, as stated in the introduction, the requests going to the storage nodes often obey a Zipf-like distribution, which means that most of the data accesses focus on a small amount of hotspot data. This access pattern generates more opportunities for the cold storage nodes to stay idle. Unfortunately, it also increases the bandwidth and CPU consumption of the hot storage nodes, which could suffer a higher failure rate due to the intensive workloads. Therefore, it is necessary to monitor and manage the bandwidth usage of the storage nodes. The proposed replication strategy relies on the metadata server to distribute the requests to different
storage nodes in terms of the network traffic, thus avoiding overloading the hot storage nodes.

4.5 Power Consumption
Fig.4. Measured power consumption of a cluster node

Most computers offer multiple power states (e.g. active, standby, and halt), which consume different amounts of power. Table 3 summarizes the six power states, ranging from S0 to S5, defined by ACPI. Based on these definitions, we treat S0, S3, and S5 as the active state, the standby state, and the halt state, respectively. Fig.4 illustrates the power consumption of the different power states, measured on a typical cluster node with a power analyzer (Power Analyzer Datalogger), where A is the halt state, B is the boot-up phase, C is the active state, D is the standby state, E is the resume phase, F is the active state, and G is the shut-down phase.

Table 3. Different power states defined by ACPI

S0: A working state in which all devices are fully powered on.
S1: Maintains power to the CPU and RAM, but powers down other devices.
S2: Powers off the CPU and leaves other devices on.
S3: Switches off all devices except the memory. The operating system is suspended to memory, so the computer can be quickly woken up to the working state.
S4: Suspends the operating system to disk. When resuming, the system's previous running state must be loaded from disk into memory, so the recovery process takes more time than S3.
S5: Switches all devices off except the power supply unit.
Fig.5 further demonstrates the power state transitions of a typical cluster node. The halt state takes 2.8W. The power increases dramatically when the node boots, with a peak of 59.7W, and the booting process takes 84 seconds. The node is stable in the active state, where the power consumption is 40.2W. The node can be suspended to the standby state within one second with negligible power consumption, and the power stays at 3.5W in the standby state. It takes 10 seconds to resume the
node from the standby state to the active state, and the peak power reaches 51.9W during this period. It takes 24 seconds to shut down the node, and the peak power reaches 52.5W during this period.
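As a rough comparison of the two low-power states, the measured values above can be plugged into a simple energy estimate for one sleep/wake cycle. The calculation below assumes, for simplicity, that a transition draws its reported peak power for its whole duration (the paper reports only peak powers and durations), so it is an upper-bound style illustration rather than measured data, and the idle length is chosen arbitrarily.

```python
def cycle_energy(idle_seconds, sleep_power, enter_s, enter_power, resume_s, resume_peak):
    """Energy (J) to enter a low-power state, stay there for idle_seconds, and resume."""
    return enter_s * enter_power + idle_seconds * sleep_power + resume_s * resume_peak

idle = 300.0  # a five-minute idle period, chosen only for illustration
standby = cycle_energy(idle, sleep_power=3.5, enter_s=1, enter_power=0.0,   # suspend power reported as negligible
                       resume_s=10, resume_peak=51.9)
halt = cycle_energy(idle, sleep_power=2.8, enter_s=24, enter_power=52.5,
                    resume_s=84, resume_peak=59.7)
active = idle * 40.2  # simply staying active for the same period
print(round(standby), round(halt), round(active))  # roughly 1569, 7115, 12060 J
```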
Fig.5. Power state transition of a typical cluster node

According to the above evaluation, the power consumption of the standby state is very close to that of the halt state. However, it takes 84 seconds to switch the node from the halt state to the active state, and 24 seconds to transfer it back. In contrast, it only takes 10 seconds to switch the node from the standby state to the active state, and one second to put it back. Therefore, the standby state is more power efficient than the halt state, and we take the standby state as the low-power state in the following simulations.

4.6 Performance

Response time is a very important metric for measuring a cluster system. It is the time taken by a request from its initiation to its completion. There are many ways of reducing the response time or improving the system throughput for parallel I/O systems (Huang et al., 2005; Hsu et al., 2005; Lee et al., 2000). We explain how to calculate the response time of the storage cluster in this section.
Fig.6. Queue model employed in the storage cluster system

Fig.6 illustrates the queue model used in the storage cluster system. Firstly, the user interface receives the requests sent by clients and puts them into the metadata server's FIFO queue. Secondly, the metadata server uses a scheduling algorithm to distribute the requests to the queue of the corresponding storage node according to the workloads. Thirdly, each node retrieves the requests from its queue and performs the file read/write operations. In this process, we assume a set of files F = {f_1, …, f_n}. The characteristic of each file is f_i = (s_i, d_i), where s_i represents the size of the ith file and d_i denotes the disk on which the file is stored. A set of disks is described as D = {d_1, …, d_m}. The characteristic of each disk is d_j = (c_j, r_j, q_j), where c_j represents the capacity of the jth disk, r_j indicates its transfer rate, and q_j is the number of requests that have not yet been processed in its queue. A set of requests is denoted as R = {r_1, …, r_l}. The characteristic of each request is r_k = (a_k, b_k, f_k), where a_k represents the arrival time
of the kth request, b_k denotes the beginning time of processing the request, and f_k denotes the requested file. The calculation of the response time can be divided into the following three scenarios.

(A) When the kth request is forwarded to a node whose queue is empty and whose disk is idle, the response time of the request is equal to the I/O processing time of the file:

t(r_k) = t_seek + t_rotate + t_transfer    (4)

And the transfer time of the file is:

t_transfer = s_{f_k} / r_{d_j}    (5)

(B) If the node queue is empty but the disk is busy processing a previous request when the kth request arrives, the request has to wait for the disk to become idle:

t_busy(r_k) = b_k - a_k    (6)

Then, the response time is:

t(r_k) = t_seek + t_rotate + t_transfer + t_busy    (7)

(C) If the kth request arrives and q_j requests in the node queue have not yet been processed, the kth request needs to wait for the completion of all of them:

t_wait(r_k) = Σ_{i=1}^{q_j} t(r_i)    (8)

Then, the response time is:

t(r_k) = t_seek + t_rotate + t_transfer + t_busy + t_wait    (9)

When the workload of the cluster system is not intensive, the system can be transferred to an energy-saving state. In this scenario, the cold storage nodes are switched to a low-power state while the hot storage nodes stay in the active state to provide continuous service. Since most of the data accesses follow a Zipf-like distribution, most of the requests concentrate on the hot data, which is a small part of the overall data set. If a request occasionally accesses cold data, the corresponding cold storage node is switched to the active state to process the request. After completing the request, the cold storage node is transferred back to the low-power state.
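The three scenarios above can be folded into a single routine: the inapplicable terms are simply zero. The following Python sketch computes a per-request response time following equations (4)-(9); it is an illustrative reading of the queue model (seek and rotate times are taken as inputs, and the transfer time follows equation (5)), not the authors' simulator code.

```python
def response_time(arrival, file_size, transfer_rate, disk_free_at,
                  queued_service_times, t_seek=0.0, t_rotate=0.0):
    """Response time of one request under equations (4)-(9).

    arrival              a_k, arrival time of the request
    file_size            s_{f_k}, size of the requested file
    transfer_rate        r_{d_j}, transfer rate of the disk holding the file
    disk_free_at         time at which the disk finishes its current request
    queued_service_times service times of the q_j requests already waiting in the node queue
    """
    t_transfer = file_size / transfer_rate              # equation (5)
    t_busy = max(0.0, disk_free_at - arrival)           # equation (6): zero when the disk is idle
    t_wait = sum(queued_service_times)                  # equation (8): finish everything already queued
    return t_seek + t_rotate + t_transfer + t_busy + t_wait   # equations (4), (7), (9)

# Example: a 100 MB read on a 60 MB/s disk that becomes free 0.5 s after the request
# arrives, with two 1.8 s requests already queued -> about 5.77 s.
print(response_time(arrival=0.0, file_size=100.0, transfer_rate=60.0,
                    disk_free_at=0.5, queued_service_times=[1.8, 1.8]))
```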
5. Evaluation

5.1 Simulation Setup

In order to evaluate the system, we construct a simulator consisting of 16 storage nodes and 1 metadata server. The clients initiate requests and send them to the metadata server's FIFO queue. The metadata server retrieves a request from the FIFO queue, locates the storage nodes that hold copies of the requested file, and forwards the request to the node that has the fewest requests in its node queue. That node is then responsible for processing the request and performing the file read/write operation. According to a performance evaluation in (Deng, 2009), the read and write bandwidths of a single storage node are set to 60MB/s and 50MB/s, respectively. The system parameters used in the simulation are summarized in Table 4.
Table 4. System parameters used in the simulation

Number of nodes: 16
Bandwidth: 1000Mbps
File access pattern: Zipf-like distribution
File size distribution: Uniform file size (100MB)
File assignment strategy: Round-Robin
Number of files: 1000
Number of requests: 100000
Active power of each node: 60W
Standby power of each node: 4W
Read rate: 60MB/s
Write rate: 50MB/s
The simulation parameters used in our experiments include the transition time, the response time, and the inter-arrival time. The transition time is a threshold: once a node has been idle for a period longer than this threshold, the node is spun down in an effort to save energy. The response time is the time taken by a request from its initiation, through waiting, to its completion. The inter-arrival time denotes the time between the arrival of one request and the arrival of the next. The inter-arrival time in the simulation is drawn from a random distribution.
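The paper does not fix the inter-arrival distribution beyond calling it a random variable, so the following helper is only a placeholder showing where such a generator plugs into the simulated workload; the exponential (Poisson-arrival) choice and the mean value are assumptions, not settings from the paper.

```python
import random

def arrival_times(num_requests, mean_interarrival_s=0.5, seed=1):
    """Cumulative request arrival times with exponentially distributed gaps (assumed)."""
    rng = random.Random(seed)
    times, t = [], 0.0
    for _ in range(num_requests):
        t += rng.expovariate(1.0 / mean_interarrival_s)
        times.append(t)
    return times
```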
5.2 File Access Pattern

Fig.7. Distribution of file accesses

A Zipf-like distribution is used in the simulation to model the file popularity in the storage cluster. We assume that all the files are ranked in order of their popularity, where file i is the ith most popular
file. With the Zipf-like distribution, the access probability of the ith file is:

P_N(i) = Ω / i^α,    0 < α ≤ 1    (10)

where Ω = ( Σ_{j=1}^{N} 1/j^α )^{-1} and N is the total number of files in the cluster.
Therefore, Ω is a constant. According to equation (10), P_N(i) is inversely proportional to i^α, so the probability of accessing a specific file grows as the file number decreases. Fig.7 shows the distribution of file accesses with different values of α. It only depicts the top 20 most popular files because the remaining files are rarely accessed. The three curves in Fig.7 demonstrate the same trend: only a small portion of files are accessed with very high frequencies, while most of the other files are rarely accessed. Furthermore, the value of α has a great impact on the few files that have very small file numbers (e.g. file 0). This means that the larger the value of α is, the more requests access the most popular files. In order to explore the impacts of different file access patterns on the system behavior, we also generate a random distribution to evaluate the system in the following sections.
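For reference, the snippet below shows one way to draw file requests according to equation (10). It is an illustrative sketch (the simulator itself is not described at code level in the paper), and the values of N, α, and the seed are arbitrary choices.

```python
import random

def zipf_probabilities(num_files, alpha):
    """Equation (10): P_N(i) proportional to 1/i^alpha, normalised by the constant Omega."""
    weights = [1.0 / (i ** alpha) for i in range(1, num_files + 1)]
    omega = 1.0 / sum(weights)
    return [omega * w for w in weights]

def generate_requests(num_requests, num_files=1000, alpha=1.0, seed=42):
    """Draw file numbers (1..N) with Zipf-like popularity for the simulated workload."""
    rng = random.Random(seed)
    probs = zipf_probabilities(num_files, alpha)
    files = list(range(1, num_files + 1))
    return rng.choices(files, weights=probs, k=num_requests)

requests = generate_requests(100000, num_files=1000, alpha=0.8)
print(requests[:10])   # most draws should hit small file numbers
```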
5.3 Bandwidth Consumption

Breslau et al. (1999) reported, by investigating Web proxy traces, that the impact of file size on file popularity is negligible. Therefore, we employ a uniform file size for simplicity. However, we consider how the file size affects the bandwidth consumption of each node. The bandwidth consumption of each node is the total amount of data processed by the node divided by the entire system running time. Assuming that B is the bandwidth consumption, N is the number of requests served by the node, S is the file size of each request, and T is the entire system running time, the bandwidth consumption can be computed as:

B = (N × S) / T    (11)
Table 5. Impact of file size on the bandwidth consumption of storage nodes (Mbits/s)

Node    25MB  50MB  75MB  100MB  125MB  150MB  175MB  200MB
0       91    186   272   363    460    548    643    727
1       34    67    102   143    176    218    255    280
2       20    38    63    76     101    117    137    161
3       12    25    40    55     65     78     83     108
4       10    17    27    41     46     56     69     70
5       7     14    21    28     35     42     47     61
6       6     12    17    22     32     37     36     43
7       4     9     14    18     24     28     35     40
8       4     8     13    18     19     23     29     35
9       3     7     12    16     17     20     22     27
10      3     7     9     11     16     20     21     27
11      2     6     8     10     14     17     19     24
12      2     5     8     10     12     16     20     20
13      2     5     7     9      8      12     18     20
14      2     4     6     8      11     10     18     16
15      1     4     6     9      9      12     14     15
Total   203   414   625   837    1045   1254   1466   1674
Table 5 summarizes the impact of the file size on the bandwidth consumption of the storage nodes. It shows that the bandwidth consumption increases with the growth of the file size. For example, the bandwidth consumption of node 0 increases from 91 Mbits/s to 727 Mbits/s when the file size is increased from 25MB to 200MB. Another observation is that the bandwidth consumption grows as the node number decreases. For example, when the file size is set to 200MB, the bandwidth consumption increases from 15 Mbits/s on node 15 to 727 Mbits/s on node 0. This is because the files are ranked in order of their popularity and are sequentially assigned to the storage nodes according to this popularity, so the frequently accessed files are normally concentrated in the low-numbered nodes. Consequently, the bandwidth consumption of the low-numbered nodes is higher than that of the nodes with larger node numbers.
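Equation (11), together with a round-robin assignment of popularity-ranked files, explains the shape of Table 5. The short sketch below reproduces that calculation for one file size; the request trace can come from the Zipf generator shown earlier (or any list of file numbers), the assignment rule (file i on node i mod 16) is our reading of the round-robin strategy, and the byte-to-Mbit conversion is our own bookkeeping rather than a formula from the paper.

```python
def per_node_bandwidth(requests, num_nodes, file_size_mb, run_time_s):
    """Equation (11) applied per node: B = (N x S) / T, with popularity-ranked files
    assigned round-robin (file i assumed to live on node (i - 1) mod num_nodes)."""
    served_mb = [0.0] * num_nodes
    for file_number in requests:
        node = (file_number - 1) % num_nodes
        served_mb[node] += file_size_mb
    # Convert MBytes/s to Mbits/s (x 8), matching the units used in Table 5.
    return [8.0 * mb / run_time_s for mb in served_mb]

# Example (hypothetical run time): per_node_bandwidth(generate_requests(100000), 16, 100.0, 3600.0)
```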
Fig.8. Bandwidth consumption

Fig.8 shows the bandwidth consumption of each node in the storage cluster with different numbers of copies. Because we replicate the frequently accessed files in terms of the Zipf-like distribution, generally, as the number of copies increases, more of the low-numbered storage nodes are required to handle the requests. Fig.8 confirms this behavior of the bandwidth consumption. Therefore, if a popular file has multiple copies, the metadata server should distribute the requests across the different replica nodes, thus avoiding overloading a specific storage node and its bandwidth. Furthermore, when the random distribution is used, Fig.8 demonstrates that the bandwidth consumption is uniform across the cluster nodes, as expected.

5.4 System Power Consumption

The replication strategy proposed in this paper divides the storage nodes into a hot node set and a cold node set. The hot nodes are always in an active state to provide service, and the cold nodes are maintained in a low-power state most of the time to save energy. Once a request is forwarded to a cold node, the node is switched to the active state to serve the request. In order to avoid frequent power state switching, the node stays in the active state for a certain amount of time (defined as the transition time in Section 5.1) after completing the current requests. When the idle period reaches the predefined transition-time threshold, the node is transferred to the low-power state to conserve energy.
In our simulation, the storage cluster consists of 16 nodes. Each node takes 60W in the active state and 4W in the standby state. Initially, all the nodes are maintained in the active state, and the cold nodes are switched to the standby state after the transition time expires. We evaluate the strategy by adjusting several parameters, including the transition time, the value of α in the Zipf-like distribution, the number of copies, and the read/write ratio.

Initially, the system power reaches 960W when all nodes are in the active power state. If the cold nodes do not receive any requests during a transition time, they are switched to the standby state to save energy. After a transition time, the system power therefore decreases drastically, because a large number of cold nodes are switched to the standby state. When each request comes in, the system checks whether the accessed file is a hotspot data file. If the accessed file is not yet a hotspot data file but satisfies the hot data identification condition described in Section 4.1, it is regarded as a new hotspot data file and replicated to the hot nodes.

The number of copies and the read/write ratio have a negligible influence on the system power consumption. This is because, no matter how many copies a file has, these copies are only stored on hot nodes. In most cases, the number of copies does not increase or decrease the number of hot nodes, so the system power consumption is not tightly correlated with this parameter. Therefore, we only analyze the system power consumption as impacted by the transition time and the value of α in the Zipf-like distribution.

5.4.1. Transition Time

The transition time means that once a node has been idle, without any incoming requests, for a period longer than the given transition threshold, the node is spun down in an effort to save energy. Upon the arrival of a new request, the node is spun up to serve the request. The transition time has a great impact on the system power consumption, since it mainly affects the frequency of power state transitions of the cold nodes. Because the cold nodes can be switched to the standby state to save energy, the system power consumption is greatly affected by the state transition frequency. For example, if the transition time is 10 seconds, a cold node does not turn back to the standby state unless no requests are forwarded to its queue within the next 10 seconds. If it receives a request during these 10 seconds, it deals with the request and then waits for another 10 seconds before turning back to the standby state. This reduces the state transition frequency of the cold nodes.
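The transition-time policy can be captured by a small per-node state machine. The sketch below is an illustrative model of the rule just described (stay active while requests keep arriving within the transition window, otherwise drop to standby); the power values follow Table 4, while the event-driven structure and the class interface are our own simplification rather than the paper's simulator.

```python
ACTIVE_POWER = 60.0   # W, per Table 4
STANDBY_POWER = 4.0   # W, per Table 4

class ColdNode:
    """Tracks power state, energy, and transition count of one cold node."""

    def __init__(self, transition_time):
        self.transition_time = transition_time
        self.active = True          # all nodes start in the active state
        self.last_busy = 0.0        # time the node last finished serving a request
        self.energy = 0.0           # accumulated energy in joules
        self.transitions = 0
        self.clock = 0.0

    def advance(self, now):
        """Accrue energy up to 'now', dropping to standby once the node has been
        idle longer than the transition time."""
        while self.clock < now:
            if self.active and now - self.last_busy > self.transition_time:
                switch_at = self.last_busy + self.transition_time
                self.energy += (switch_at - self.clock) * ACTIVE_POWER
                self.clock = switch_at
                self.active = False
                self.transitions += 1
            else:
                power = ACTIVE_POWER if self.active else STANDBY_POWER
                self.energy += (now - self.clock) * power
                self.clock = now

    def serve(self, arrival, service_time):
        self.advance(arrival)
        if not self.active:
            self.active = True      # wake up the cold node for this request
            self.transitions += 1
        self.energy += service_time * ACTIVE_POWER
        self.clock = arrival + service_time
        self.last_busy = self.clock
```

A longer transition time makes the node linger in the active state, which trades extra energy for fewer transitions, which is exactly the behaviour examined in Fig.9 to Fig.11 and Table 6.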
Fig.9. System power affected by transition time (Zipf-like Distribution)
Fig.10. System power affected by transition time (Random Distribution)

Fig.9 demonstrates the system power consumption with different transition times (where α=1.0) when the Zipf-like distribution is used. Initially, all the nodes are maintained in the active state, and the system power reaches 960W. After a transition time, some storage nodes are switched to the standby state to save power. If the transition time is too short, the nodes experience frequent power state transitions, which results in workload fluctuation in the system. Fig.9 shows that the system power consumption with a short transition time changes more frequently than with a long transition time. However, if the transition time is too long, the client requests may hit more cold nodes during the transition time, keeping the cold nodes in the active state constantly and thereby resulting in high power consumption. In order to make the system work well, we should set an appropriate transition time.

Fig.10 illustrates the system power consumption affected by different transition times when the random distribution is employed to simulate the file accesses. The basic trend is very similar to that of Fig.9. However, the random distribution incurs more power consumption than the Zipf-like distribution. This is because the idle periods are distributed across all cluster nodes when the random distribution is used, which generates fewer opportunities for the nodes to save energy in contrast to the Zipf-like distribution.

Fig.11 shows the system power saving affected by the transition time and the different access patterns. When the Zipf-like distribution is used, α is equal to 1.0. The figure demonstrates that the power saving grows with the decrease of the transition time for both the Zipf-like distribution and the random distribution. Our experimental results indicate that, when using the Zipf-like distribution, the power saved with transition time = 10, transition time = 20, and transition time = 40 is 55.38%, 34.34%, and 12.13%, respectively. When the access pattern is changed from the Zipf-like distribution to the random distribution, the power savings decrease to 24.47%, 12.89%, and 3.64%, respectively. This means that the Zipf-like distribution generates more energy saving than the random distribution, because the Zipf-like distribution aggregates requests onto fewer storage nodes, thus generating more opportunities to reduce the energy consumption.
Fig.11. System power saving affected by transition time

5.4.2. The Value of α
Fig.12. System power affected by α value

The value of α in the Zipf-like distribution also has an impact on the system power consumption. From Fig.7, we learn that the larger the value of α is, the more requests access the most popular files. In other words, a larger value of α indicates that more requests access the hotspot data, which increases the sleep time of the cold nodes, so a larger part of the system remains in the low-power state. Conversely, if the value of α is too small, more requests access the cold nodes. More cold nodes are woken up to deal with these requests, so the system power consumption increases. As shown in Fig.12 (where transition time = 5), after the first 200 requests are handled, the system starts to identify hotspot data, and the value of α then begins to affect the system power consumption.
Generally, the power saving grows with the increase of the value of α. We also investigate the system power saving affected by the α value of the Zipf-like distribution. Our experimental results show that the power savings for α=0.6, α=0.8, and α=1 are 25.34%, 41.14%, and 53.68%, respectively.

5.5 Response Time
Fig.13. Response time affected by the number of copies

In the simulator, we set the read bandwidth and write bandwidth to 60MB/s and 50MB/s, respectively (Deng, 2009). According to equation (9), when the system is in a high-load state, the response time of a request is affected by five parameters: the seek time, rotate time, transfer time, busy time, and waiting time. Due to the large size of the files, the seek time and the rotate time are much smaller than the transfer time and are therefore negligible. Hence, the response time of each request can be approximately calculated by the following equation:
t(r_k) ≈ t_transfer + t_busy + t_wait    (12)

where the busy time t_busy refers to the time required for the disk to finish the current request and return to the idle state, the waiting time t_wait refers to the time required to finish all the requests already in the node queue, and the transfer time t_transfer is the time required to complete the data file transmission of the current request. In order to measure the response time, we calculate and record the average response time of every 50 requests. Since the transition time only impacts the system power consumption and not the response time, in the following evaluation we only consider the average response time as impacted by the number of copies, the read/write ratio, and the value of α.

The number of copies is an important parameter affecting the system response time. The metadata server schedules requests to the back-end storage nodes in terms of the system workload. If a file has multiple copies on different nodes, the load balancing of the metadata server can forward requests to a lightly loaded node, so the response time is short. According to the Zipf-like distribution, the requests have a high probability of accessing the popular files, so most of the requests focus on the hot nodes. From Fig.13 (where read/write ratio = 0.7, α=0.8), we can see that when the number of copies is 1, the average
response time increases rapidly. This is because the requests that access the popular files are concentrated on a few hot nodes, and subsequent requests experience a long waiting time in the node queues. However, when the number of copies is increased, the requests that access popular files can be distributed across multiple nodes, thus maintaining load balance among the nodes that store the popular files and reducing the response time.
Fig.14. Response time affected by read/write ratio
Fig.15. Response time affected by α value

In addition to the number of copies, we also vary the read/write ratio to evaluate the system behavior. Fig.14 demonstrates the response time affected by the read/write ratio (where number of copies = 2, α=0.8). We measure the average response time with read/write ratios of 0.3, 0.5, and 0.7. Since the read bandwidth is
higher than the write bandwidth, as defined in Table 4, the transfer time of read operations is smaller than that of write operations. Therefore, for the same number of requests, the average response time decreases with the increase of the read/write ratio.

The parameter α in the Zipf-like distribution also has an impact on the response time. Fig.15 demonstrates the response time affected by the value of α (where number of copies = 2, read/write ratio = 0.7). According to the Zipf-like distribution, the larger α is, the more requests access the most popular files. Since the popular files are stored in the hot nodes, the waiting time in the queues of the hot nodes grows with the increase of α, and the average response time increases accordingly. On the contrary, if α is too small, the probability that requests access the cold data stored in cold nodes increases.

It is easy to observe that the minimal average response time is about 2 seconds across Fig.13, Fig.14, and Fig.15. One may argue that this average response time is too long. According to Table 4, the read and write bandwidths are defined as 60MB/s and 50MB/s, respectively, and the file size in the simulation is a uniform 100MB. If we set the read/write ratio to 100% and 0%, the transfer time should be 1.67 seconds and 2 seconds, respectively. Therefore, an average response time of two seconds is reasonable, and it will be reduced significantly with the decrease of the file size.

5.6 Reliability discussion

Even though the reliability of disks has been significantly increased due to load/unload technology, the number of spin-down/up cycles a disk can tolerate is still limited (Zhu et al., 2005). When the spin-down/up approach is used to control the power states of disks, it results in an accelerated consumption of duty cycles (Bisson, 2007). This is even worse for high-end disks, since the spin-down/up times of high-end disks are much longer than those of mobile disks. The reason is that high-end disks are physically different from mobile disks. In order to reduce flexing under the stress of faster RPM and increased heat, high-end disks normally employ lower-capacity but heavy platters for continuous operation and higher vibration tolerance while serving I/O requests. They also often use different bearing, airflow, and filter designs.

Table 6. Number of power state transitions with different configurations of the storage cluster

         Data copies = 1       Data copies = 2       Data copies = 4       Data copies = 8
         T=10  T=20  T=40      T=10  T=20  T=40      T=10  T=20  T=40      T=10  T=20  T=40
α=0.6    3044  2056  1071      2701  1929  1024      2323  1738  985       1294  1100  698
α=0.8    2573  1815  1049      2536  1682  1010      1890  1457  962       933   815   615
α=1      1848  1588  924       1728  1518  898       1480  1218  836       685   611   493
The power state transition can incur a significant energy cost and time penalty. Although the increased energy cost and time penalty may be alleviated by using advanced methods, the impact on the reliability of computer components may not be. Frequently switching cluster nodes on and off has a significant impact on reliability beyond the disk drives. There are two reasons for this (Gianni, 1997). The first is that components expand as they heat up: when computer components with different thermal coefficients of expansion are held rigidly in place (e.g. IC legs anchored by solder on a printed circuit board), the expansion/contraction cycle can eventually cause premature failure of the electrical connections through fatigue. The second is the surge of power through delicate microscopic circuits when a computer is switched on. Therefore, power state transitions should be distributed across all the cluster nodes rather than concentrated on a few nodes in order to extend the life span of the cluster. Furthermore, it is not
worthwhile to exploit idle periods of only a couple of minutes, because of the increased number of power state transitions, even though such idle periods could be used for energy saving. Table 6 summarizes the total number of power state transitions across the 16 storage nodes with different configurations of the storage cluster in our experiments. It shows that the number of transitions decreases as the number of data copies, the transition time, and the α value of the Zipf-like distribution grow. This is because more hot data is concentrated on fewer hot nodes as the number of data copies, the transition time, and the α value increase, thus decreasing the number of transitions of the cold nodes. As discussed before, system reliability decreases with the increase of the number of power state transitions. Therefore, in our future work, we will investigate how to further reduce the number of power state transitions while maintaining system performance and power efficiency.
6. Conclusion

This paper proposes a data replication strategy for storage clusters by leveraging the 80/20 data access pattern. The strategy only replicates the frequently accessed data, which is a small portion of the overall data set, thus reducing the resource consumption of data replication. Furthermore, the strategy divides the storage nodes into a hot node set and a cold node set. The hot nodes, which store a small amount of hot data copies, are always in an active state to guarantee system performance. The cold nodes, which store a large volume of infrequently accessed cold data, are placed in a low-power state, thus reducing the energy consumption of the storage cluster. Comprehensive simulations are performed to evaluate the impacts of the transition time, the value of α in the Zipf-like distribution, the number of copies, and the read/write ratio on the bandwidth consumption, power consumption, and system response time. The experimental results demonstrate that the energy saving grows when the skew of the data access pattern increases, and the average response time decreases with the growth of the skew. When using this approach to reduce the energy consumption of storage clusters, we can maintain a balance between energy saving and system performance. Therefore, we believe that the proposed low-power data replication approach can be applied to different application scenarios and achieve a significant power reduction while guaranteeing system performance.
ACKNOWLEDGMENT
We would like to thank the anonymous reviewers for helping us refine this paper; their constructive comments and suggestions were very helpful. This work is supported by the National Natural Science Foundation (NSF) of China under grants No. 61272073, No. 61373125, and No. 61073064, the Key Program of the Natural Science Foundation of Guangdong Province (No. S2013020012865), the Scientific Research Foundation for the Returned Overseas Chinese Scholars (State Education Ministry), the Educational Commission of Guangdong Province (No. 2012KJCX0013), and the Open Research Fund of the Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCH201107).
References
Andersen D G, Franklin J, Kaminsky M, Phanishayee A, Tan L, Vasudevan V. FAWN: a fast array of wimpy nodes. In: Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, 2009.
ACPI.
Bisson T, Brandt S A, Long D D E. A Hybrid Disk-Aware Spin-Down Algorithm With I/O Subsystem Support. In: Proceedings of the IEEE International Performance, Computing, and Communications Conference (IPCCC 2007), 2007, p. 236-245.
Blake C, Rodrigues R. High availability, scalable storage, dynamic peer networks: pick two. In: Proceedings of the 9th Conference on Hot Topics in Operating Systems, 2003, p. 1-6.
Breslau L, Cao P, Fan L, et al. Web caching and Zipf-like distributions: evidence and implications. In: Proceedings of the 18th Conference on Computer Communications, 1999, p. 126-134.
Caulfield A M, Grupp L M, Swanson S. Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications. In: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '09), 2009.
Cherkasova L, Ciardo G. Characterizing temporal locality and its impact on web server performance. Technical Report HPL-2000-82, Hewlett Packard Laboratories, 2000.
Dahlin M, Chandra B B V, Gao L, et al. End-to-end WAN service availability. IEEE/ACM Transactions on Networking 2003;11(2):300-313.
Degani Y, Dudderar T D, Han B J, et al. Thermal Stress Relief with Power Management. In: Proceedings of the 1997 IEEE International Symposium on Electronics and the Environment, 1997.
Deng Y. Deconstructing Network Attached Storage systems. Journal of Network and Computer Applications 2009;32(5):1064-1072.
Deng Y. What is the Future of Disk Drives, Death or Rebirth? ACM Computing Surveys 2011;43(3):23.
Elnozahy E N M, Kistler M, Rajamony R. Energy-efficient server clusters. In: Proceedings of Power-Aware Computer Systems. Springer Berlin Heidelberg, 2003, p. 179-197.
Fan X, Weber W D, Barroso L A. Power provisioning for a warehouse-sized computer. In: Proceedings of the 34th Annual International Symposium on Computer Architecture, 2007, p. 13-23.
Ge R, Feng X, Feng W, Cameron K W. CPU MISER: a performance-directed, run-time system for power-aware clusters. In: Proceedings of the International Conference on Parallel Processing (ICPP 2007), 2007, p. 18.
Ghemawat S, Gobioff H, Leung S T. The Google file system. In: Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003, p. 29-43.
Gomez M E, Santonja V. Characterizing temporal locality in I/O workload. In: Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems, 2002.
Huang S, Feng W. Energy-efficient cluster computing via accurate workload characterization. In: Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009, p. 68-75.
Huang H, Hung W, Shin K G. FS2: dynamic data replication in free disk space for improving disk performance and energy consumption. In: Proceedings of the 20th ACM Symposium on Operating Systems Principles, 2005, p. 263-276.
Huang J, Zhang F, Qin X, Xie C. Exploiting redundancies and deferred writes to conserve energy in erasure-coded storage clusters. ACM Transactions on Storage 2013;9(2): Article 4.
Hsu C, Feng W. A power-aware run-time system for high-performance computing. In: Proceedings of the ACM/IEEE SC Conference, 2005.
Hsu W W, Smith A J, Young H C. The automatic improvement of locality in storage systems. ACM Transactions on Computer Systems 2005;23(4):424-473.
Karedla R, Love J S, Wherry B G. Caching Strategies to Improve Disk System Performance. IEEE Computer 1994;27(3):38-46.
Khanli L M, Isazadeh A, Shishavan T N. PHFS: a dynamic replication method to decrease access latency in the multi-tier data grid. Future Generation Computer Systems 2011;27(3):233-244.
Lamehamedi H, Szymanski B, Shentu Z, et al. Data replication strategies in grid environments. In: Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing, 2002, p. 378.
Lee L W, Scheuermann P, Vingralek R. File assignment in parallel I/O systems with minimal variance of service time. IEEE Transactions on Computers 2000;49(2):127-140.
Li W, Yang Y, Chen J, et al. A Cost-Effective Mechanism for Cloud Data Reliability Management Based on Proactive Replica Checking. In: Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2012, p. 564-571.
Moore F. More energy needed. Energy User News, November 25, 2002.
Pareto principle.
Park S M, Kim J H, Ko Y B, et al. Dynamic data grid replication strategy based on Internet hierarchy. In: Proceedings of Grid and Cooperative Computing, 2004, p. 838-846.
Pinheiro E, Bianchini R, Carrera E V, Heath T. Dynamic cluster reconfiguration for power and performance. In: Proceedings of the Workshop on Compilers and Operating Systems for Low Power, 2003, p. 75-93.
Pinheiro E, Weber W D, Barroso L A. Failure trends in a large disk drive population. In: Proceedings of the 5th USENIX Conference on File and Storage Technologies, 2007, p. 17-28.
Power Analyzer Datalogger.
Power, Heat, and Sledgehammer. White paper, Maximum Throughput, Inc., April 2002.
Rahman R M, Barker K, Alhajj R. Replica Placement in Data Grid: Considering Utility and Risk. In: Proceedings of the International Conference on Information Technology: Coding and Computing, 2005, p. 354-359.
Ranganathan K, Iamnitchi A, Foster I. Improving Data Availability through Dynamic Model-Driven Replication in Large Peer-to-Peer Communities. In: Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2002, p. 376.
Shafi H, Bohrer P J, Phelan J, et al. Design and validation of a performance and power simulator for PowerPC systems. IBM Journal of Research and Development 2003;47(5-6):641-651.
Staelin C, Garcia-Molina H. Clustering active disk data to improve disk performance. Tech. Rep. CSTR-283-90, Department of Computer Science, Princeton University, 1990.
Tang M, Lee B S, Yeo C K, et al. Dynamic replication algorithms for the multi-tier Data Grid. Future Generation Computer Systems 2005;21(5):775-790.
Verma A, Ahuja P, Neogi A. Power-aware dynamic placement of HPC applications. In: Proceedings of the 22nd Annual International Conference on Supercomputing, 2008, p. 175-184.
Wolfson O, Jajodia S, Huang Y. An adaptive data replication algorithm. ACM Transactions on Database Systems 1997;22(2):255-314.
Xie T, Sun Y. A File Assignment Strategy Independent of Workload Characteristic Assumptions. ACM Transactions on Storage 2009;5(3):10.
Yuan Y, Wu Y, Yang G, et al. Dynamic Data Replication based on Local Optimization Principle in Data Grid. In: Proceedings of the Sixth International Conference on Grid and Cooperative Computing, 2007, p. 815-822.
Zhu Q, Chen Z, Tan L, et al. Hibernator: Helping Disk Arrays Sleep Through the Winter. In: Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP 2005), 2005, p. 177-190.