An Energy-Aware File Relocation Strategy Based on File-Access Frequency and Correlations

Cheng Hu1 and Yuhui Deng1,2

1 Department of Computer Science, Jinan University, Guangzhou, 510632, China
2 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
hucheng [email protected], [email protected]

Abstract. Energy consumption has become a major challenge for traditional storage systems due to the explosive growth of data, and considerable research effort has been invested in reducing it. Traditionally, frequently accessed data are concentrated on a small set of hot storage nodes, and the remaining cold storage nodes are switched to a low-power state to save energy. However, because of the energy and time penalties involved, switching a cold storage node from a low-power state back to an active state costs extra energy and introduces additional delay. In contrast to the existing work, this paper proposes a Skew File Relocation (SFR) strategy that aggregates correlated cold files on the same cold storage node in addition to concentrating frequently accessed files on the hot nodes. Because correlated files are normally accessed together, SFR significantly reduces the number of power state transitions and lengthens the idle periods experienced by the cold storage nodes, thereby saving more energy and improving the system response time. Furthermore, three other relocation strategies are designed to explore the performance behavior of SFR. Experimental results demonstrate that SFR can significantly reduce energy consumption while maintaining system performance at an acceptable level.

Keywords: Energy Aware; File Relocation Strategy; File-Access Frequency; File-Access Correlations; Clustered Storage System

1 Introduction

Clustered storage systems built from commodity components, with their advantage of high scalability, are becoming the architecture of next-generation storage systems. However, with the explosive growth of data, such systems are becoming increasingly complex, consuming enormous energy and requiring vast storage resources. Deng [1] indicated that energy efficiency has become one of the most important challenges in designing disk drive storage systems. The EPA (U.S. Environmental Protection Agency) [2] estimated that about 61 billion kilowatt-hours (kWh) were consumed by data centers in 2006 (1.5 percent of total U.S. electricity consumption), with a total electricity cost of about $4.5 billion, and energy efficiency trends show that power consumption is growing at 18% per year.

To reduce the energy consumed by storage systems, many approaches dynamically change the power state of storage nodes [3-5]. The 80/20 rule [6] indicates that roughly 80% of the effects come from 20% of the causes, and Cherkasova and Ciardo [7] found that in web workloads, 90% of the requests go to 10% of the files. Because the storage nodes that store cold files are only occasionally accessed, these nodes can be switched to a low-power state, and the extended idle periods of those nodes yield energy savings. However, these approaches did not consider the impact of file-access correlations on system performance and energy consumption. In most cases, correlated files are accessed together, so if correlated files are placed on the same standby node, the number of wake-ups is reduced, which leads to a further reduction in energy consumption. Moreover, based on the 80/20 rule, we can maintain a minimal number of active storage nodes. As a result, the energy consumption of a clustered storage system can be significantly reduced.

Many empirical studies [8, 9] have shown that it is viable to identify file-access frequency and correlations. In this paper, we design an energy-aware Skew File Relocation (SFR) strategy and present a novel method to mine file-access frequency and correlations, which is crucial to realizing SFR. To explore the system behavior of the proposed strategy, we design three other strategies for comparison. Furthermore, we simulate a clustered storage system to evaluate these file relocation strategies. Experimental results demonstrate that our strategy can significantly reduce energy consumption while maintaining system performance at an acceptable level. Our main contributions are as follows:

1. We propose SFR, an energy-aware file relocation strategy that leverages both file-access frequency and correlations.
2. We present the Frequency and Correlations Mining (FCM) method to mine file-access frequency and correlations.
3. To evaluate SFR, we design three other strategies for comparison and perform a simulation that measures the energy consumption and response time of a clustered storage system.

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 introduces the clustered storage system, SFR, and the FCM method. Section 4 describes our simulator, in which the three comparison strategies are evaluated along with SFR. Finally, a summary is given in Section 5.

2 Related Work

2.1 Energy Saving of Clustered Storage Systems

Many efforts have been made to reduce the energy consumption of clustered storage systems. To improve the energy efficiency of server clusters, Chase et al. [4] designed an architecture for resource management in a hosting center that puts unused clusters into a sleep state. Pinheiro et al. [3] developed a system that dynamically turns cluster nodes on and off to handle the load imposed on the system. Verma et al. [10] implemented power management techniques to reduce the power consumption of high-performance applications on modern power-efficient servers with virtualization support, and Bostoen et al. [11] surveyed and classified a variety of energy-saving techniques. Krioukov et al. [12] designed a power-proportional cluster consisting of a power-aware cluster manager and a set of heterogeneous machines. Thereska et al. [13] presented Sierra, a power-proportional distributed storage subsystem for data centers that powers down servers during troughs; about 23% of power was saved in their experiment. Zhang et al. [5] presented a power-aware data replication strategy that leverages data access activities. Deng et al. [14] designed a power-aware web cluster scheduler that divides cluster nodes into an active group and a low-power group. Much work has also reduced energy consumption through task scheduling [15-17].

Power-saving storage systems based on dynamic voltage scaling (DVS) of processors have also been proposed [17, 19]. The intuition behind these power savings is that the energy consumption of a CPU is proportional to the square of its voltage [18]. In addition, reducing the energy consumption of hard disk drives is a widely used way to reduce the energy consumption of a storage system. RAID [20] is a well-known technology in this area, and EERAID [21], PARAID [22] and GRAID [23] are all based on RAID. There are also several approaches that reduce the energy consumption of hard disk drives by exploiting the skewed distribution of file-access frequency [24]. Iritani and Yokota [25] further proposed Placement of files for Latency and Energy Consumption Optimization (PLECO), a method that conserves energy by placing correlated files on the same hard disk drive.

Unlike work that ignores file-access correlations or aims only at reducing the energy consumption of CPUs or hard disk drives, this paper proposes a novel method to mine the correlations among accessed files. With this method we design an energy-aware file relocation strategy that can significantly reduce the energy consumption of clustered storage systems.

2.2 Mining File-Access Frequency and Correlations

To optimize I/O performance and mitigate the dramatic speed mismatch among cache, memory and hard disks, many early researchers proposed approaches that derive relationships between files from access sequences. Tait et al. [26] investigated a client-side cache management technique for detecting file access patterns, which they exploited to prefetch files from servers. Lei and Duchamp [27] extended this approach and introduced the last-successor predictor. Kroeger and Long [28] used traces of file system activity to compare four models: the last-successor prediction model, Finite Multi-Order Context modeling (FMOC), graph-based modeling, and an improved FMOC model called the Partitioned Context Model (PCM). To prefetch more files (not only the file accessed next, but also files accessed some time later) and further reduce I/O latencies, Kroeger and Long [29] modified PCM into a technique called Extended Partitioned Context Modeling. All these works show that accurate predictions of future access patterns can be made by studying past access patterns. Ishii et al. [30] added a memory access map to a prefetcher; this bitmap-like data structure records past memory accesses, from which access patterns can be detected. He et al. [31] advocated accumulating I/O information and exploring it to reveal data usage patterns. Jiang et al. [32] presented a disk-level prefetching scheme that leverages data layout and access history on disk drives to discover access patterns.

Several approaches also derive file-access correlations. Among them, FARMER [8] mines file-access correlations by leveraging file access sequences and the semantic distance among files. SUGOI (Search by Utilizing Groups Of Interrelated files in a task), a file search system, was introduced by Wu et al. [9]; it contains a task mining component that extracts tasks from file-access logs and then discovers the interrelations between them. In contrast to the existing work, this paper proposes a novel method for mining file-access frequency and correlations. The method is then employed to design an energy-aware file relocation strategy. To explore the system behavior of the proposed strategy, we design three other strategies for comparison. Furthermore, we simulate a clustered storage system to evaluate these file relocation strategies.

3 System Design

3.1 System Overview

System Architecture. A clustered storage system is designed in this paper. According to the 80/20 rule, hot files are a small part of the file set, so in our design storage nodes are divided into hot nodes and cold nodes. Hot files are relocated to the hot nodes, and cold files are relocated to the cold nodes together with other cold files correlated to them. We only consider the file-access correlations of cold files and place correlated cold files together: because the hot nodes never enter the standby state, whether correlated hot files share a storage node has no impact on system performance or energy consumption. The hot nodes are kept in the active state (or the idle state when there is no request) all the time, because accesses to hot files account for more than 80% of the total requests.
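The hot/cold split described above can be sketched as follows. This is a minimal sketch of the 80/20 heuristic; the function name and the use of raw access counts are our own illustration, not the paper's implementation:

```python
from collections import Counter

def split_hot_cold(access_log):
    """Split files into hot (top 20% by access count) and cold (the rest),
    following the 80/20 heuristic."""
    counts = Counter(access_log)                   # accesses per file
    ranked = [f for f, _ in counts.most_common()]  # most accessed first
    n_hot = max(1, len(ranked) // 5)               # top 20% of distinct files
    return set(ranked[:n_hot]), set(ranked[n_hot:])

hot, cold = split_hot_cold(["a"] * 10 + ["b"] * 5 + ["c", "d", "e"])
print(hot)  # {'a'}
```

With five distinct files, the single most accessed file forms the hot set and the other four are cold.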

Fig. 1. System architecture: clients access the system over the Internet; a metadata server manages the hot and cold storage nodes, which are interconnected by a LAN.

Fig. 2. Power state migration of a storage node: R/W requests drive a standby node to the active state; when no requests remain, the node enters the idle state, from which it can be suspended back to standby.

However, the cold nodes stay in the standby state unless a request is received; after all requests are finished and no subsequent requests arrive for a predetermined period of time, they are returned to the standby state. As depicted in Fig. 1, the system contains one metadata server and several storage nodes, which are divided into hot nodes and cold nodes. The metadata server is the manager of the system. To support the relocation strategies, every storage node contains a file relocation buffer in memory; only when this buffer is full are the buffered files written to the hard disk drive of the storage node. The intuition is that SFR relocates correlated cold files to the same storage node, so the files in a relocation buffer are related and can be flushed to the hard disk drive and stored sequentially; when these files are accessed together in the future, this layout dramatically reduces latency. We detail the process of file relocation in Section 3.2.
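The cold-node policy just described (serve requests, linger idle for a timeout, then suspend) can be captured in a small energy model. This is a simplified sketch under assumed parameters; wake-up delay is not modeled, and the function name is our own:

```python
P_ACTIVE, P_IDLE, P_STANDBY = 60.0, 40.2, 4.0  # watts (assumed values)
E_SUSPEND, E_WAKEUP = 4.0, 519.0               # joules per transition (assumed)
T_KEEPALIVE = 30.0                             # seconds idle before suspend

def node_energy(busy_periods):
    """Energy (J) of one cold node, given sorted non-overlapping
    (start, end) busy periods in seconds."""
    energy = 0.0
    for i, (start, end) in enumerate(busy_periods):
        energy += (end - start) * P_ACTIVE        # serving requests
        if i + 1 < len(busy_periods):
            gap = busy_periods[i + 1][0] - end    # quiet time until next burst
            if gap > T_KEEPALIVE:
                # Linger idle for the keep-alive window, then suspend and
                # stay in standby until the next request wakes the node.
                energy += T_KEEPALIVE * P_IDLE
                energy += (gap - T_KEEPALIVE) * P_STANDBY
                energy += E_SUSPEND + E_WAKEUP
            else:
                energy += gap * P_IDLE            # short gap: stay idle
    return energy
```

For two 10-second bursts separated by a 90-second quiet gap, the node idles for 30 s, then spends 60 s in standby plus one suspend/wake cycle; longer quiet gaps (which SFR creates by clustering correlated files) amortize the transition cost better.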

Power State Transitions of Nodes. In general, the storage nodes of a clustered storage system have three power states: the active state, the idle state and the standby state. The power state migration of a storage node is presented in Fig. 2. When the data on a storage node need to be accessed, the node is switched to the active state to serve the read/write requests. After the requests are completed and no subsequent requests are waiting, the node is transferred to the idle state. The power of the idle state is slightly lower than that of the active

Fig. 3. Files accessed in transactions: three successive 15-minute transactions (8:00-8:15, 8:15-8:30, 8:30-8:45), each grouping the files accessed in that interval (e.g., files a, b and c in the first).

state. The node is transferred back to the active state when a new request is received. To conserve energy, the node can be further suspended to the standby state, in which the CPU, RAM and hard disk drives of the node are all powered down. The power of a storage node in the standby state is much lower than that in the active or idle state. To serve requests after entering the standby state, the storage node must be woken up and resumed to the active state. Suspending a storage node to the standby state takes only a little time and energy. However, resuming a storage node from the standby state to the active state takes extra energy and time due to the energy penalty and time penalty.

3.2 Mining File-Access Frequency and Correlations

The FCM method is proposed to mine file-access frequency and correlations. The file-access frequency is easy to identify: FCM sorts all files by their number of accesses and, following the 80/20 rule, takes the top 20% of files as hot files and the remaining files as cold ones. Mining the file-access correlations, however, is not as simple. Wu et al. [9] presented a frequent-itemset task mining method for searching files in file systems; here we improve it to mine file-access correlations. The basic idea of frequent-itemset task mining is that files accessed within the same period of time are related to each other. FCM applies and improves this idea to mine file-access correlations.

First, FCM divides the file-access logs into several transactions. A transaction is the series of file accesses within a certain duration; we use 15 minutes as the transaction duration. Fig. 3 illustrates an instance of three successive transactions. As shown, in the first transaction (8:00-8:15), files a, b and c are accessed. To mine file-access correlations from transactions, we adopt Apriori [33], the best-known basic algorithm for mining frequent itemsets in a set of transactions; of course, other data-mining algorithms could also be used.

The Apriori algorithm uses two measures, support and confidence, to discover rules. In this paper we use the same two measures, but to quantify the correlation degree of two files. Take files FA and FB as an illustration: support is the probability that the two files appear together in a transaction, and confidence is the probability that file FB appears in a transaction given that file FA appears. We calculate support and confidence with a weighting factor α so that FCM can adapt to the latest file-access behavior; α depends on the order of a transaction. For example, in Fig. 3, if we take the time duration 8:00-8:15 as the first transaction and there are 3 transactions in total, the α of the transaction (8:00-8:15) is 1 + 1/3. The α of the i-th transaction is

α_i = 1 + i/n,    (1)

where i is the order of the transaction and n is the total number of transactions. The support of two files is

Support = (α_i + α_j + ... + α_k) / (α_1 + α_2 + ... + α_n),    (2)

and the confidence ("Conf" for short) of two files is

Conf = (α_i + α_j + ... + α_k) / (α_i + α_j + ... + α_k + ... + α_l).    (3)

In (2), the subscripts i, j, ..., k denote the orders of the transactions in which the two files appear together, and the divisor is the sum of α over all transactions. In (3), the numerator is the same, and the divisor is the sum of α over all transactions in which the first file appears. The file-access correlation of two files is represented as a rule of the form <FA → FB>. The maximum value of support is 1, reached when the two files appear together in every transaction; the maximum value of confidence is also 1, reached when the two files always appear together; the minimum value of both is 0. To filter file-access correlations, we set thresholds for the two measures: if the support or confidence of two files is below its threshold, the correlation is treated as 0. In other words, if the file-access correlation of two files is very weak, we regard it as a coincidence and assume there is no correlation between them.

When SFR relocates a cold file, for each cold node it uses FCM to obtain the confidences between the files in that node's relocation buffer and the file to be relocated, and then sums these confidences. The cold node with the maximum sum is selected as the best one, because it contains the most related files. Fig. 4 gives an example whose numbers are derived from Fig. 3; as shown, the best storage node for relocation is node B.
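Equations (1)-(3) can be implemented directly. Below is a minimal sketch; the function names and sample transactions are our own (only the first transaction {a, b, c} is taken from Fig. 3):

```python
def alpha(i, n):
    # Eq. (1): weight of the i-th of n transactions (1-indexed); later
    # transactions weigh more, so FCM adapts to recent access behavior.
    return 1 + i / n

def support_confidence(transactions, fa, fb):
    """Weighted support and confidence of <fa -> fb>, Eqs. (2)-(3).
    `transactions` is an ordered list of sets of file names."""
    n = len(transactions)
    w = [alpha(i, n) for i in range(1, n + 1)]
    both = sum(wi for wi, t in zip(w, transactions) if fa in t and fb in t)
    fa_any = sum(wi for wi, t in zip(w, transactions) if fa in t)
    return both / sum(w), (both / fa_any if fa_any else 0.0)

# Hypothetical transactions: a and b co-occur in the first two, a appears in all.
s, c = support_confidence([{"a", "b", "c"}, {"a", "b", "e"}, {"a", "d"}], "a", "b")
print(round(s, 2), round(c, 2))  # 0.6 0.6
```

Here the weights are 4/3, 5/3 and 2; a and b co-occur in transactions with weight 4/3 + 5/3 = 3 out of a total of 5, giving support 0.6, and a appears in all transactions, so the confidence is also 0.6.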

Fig. 4. File relocation of SFR. File e is to be relocated; its (support, confidence) values with files b, a, c and d are (0.6, 0.6), (0, 0), (0.27, 0.4) and (0.33, 1), respectively. Node A's relocation buffer holds files b and a (sum of confidence 0.6 + 0 = 0.6), while node B's buffer holds files c and d (sum of confidence 0.4 + 1 = 1.4), so node B is selected.
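The node-selection step in Fig. 4 amounts to an argmax over cold nodes of the summed confidences. A minimal sketch follows; the names are illustrative, and the confidence table mimics the Fig. 4 values:

```python
def pick_cold_node(cold_buffers, new_file, conf):
    """Return the cold node whose relocation buffer holds the files most
    correlated with `new_file`. `cold_buffers` maps node id -> set of
    buffered files; `conf(a, b)` is the mined confidence (0 if filtered)."""
    return max(cold_buffers,
               key=lambda node: sum(conf(f, new_file)
                                    for f in cold_buffers[node]))

# Confidences between buffered files and file "e", as in Fig. 4.
table = {("b", "e"): 0.6, ("a", "e"): 0.0, ("c", "e"): 0.4, ("d", "e"): 1.0}
conf = lambda a, b: table.get((a, b), 0.0)
best = pick_cold_node({"A": {"b", "a"}, "B": {"c", "d"}}, "e", conf)
print(best)  # B
```

Node A scores 0.6 and node B scores 1.4, so B is chosen, matching Fig. 4.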

4 Simulation Setup

We conduct a simulation experiment to evaluate all of the strategies. The experiment models a clustered storage system containing 16 storage nodes and one metadata server; two nodes serve as hot nodes and the other 14 as cold nodes. To reduce the number of wake-ups of cold nodes and provide good performance, after serving requests each cold node remains in the active state (or the idle state when there is no request to perform) for 30 seconds. The data-mining process is executed when the metadata server receives a file-access request and can be folded into the file information retrieval, so its effect on system performance should be negligible; we therefore exclude the execution time and energy overheads of the data-mining process from the experiment. In addition, files are relocated to a cold node only after the node has finished all requests and has not yet been suspended to the standby state; similarly, files are relocated to a hot node only when it is idle. Executing relocations therefore has almost no impact on system performance, and we also neglect the overheads of relocation in the experiment. We plan to perform more accurate experiments in future work.

To explore the system behavior of our proposed strategy SFR, we also implement three other file relocation strategies.

– High-Performance (HP) strategy is used as the baseline. HP does not divide the storage nodes into hot and cold ones, never suspends any storage node to the standby state, and never relocates files.
– File Relocate Once (FRO) relocates files only once. FCM first identifies the hot and cold files during a learning stage; FRO then executes the relocation a single time after this stage. Storage nodes are divided into hot and cold ones, hot files are relocated to hot nodes, and cold files are relocated to cold nodes. FRO does not leverage file-access correlations when relocating cold files.

Table 1. Simulation platform specs

OS:          Windows 7 Professional x64
RAM:         4GB DDR III
CPU:         Intel i3-3240
Hard drive:  500GB/5400rpm

Table 2. Characteristics of a simulated storage node

Parameter                                  Qualification  Value
Power (Watt)                               Active         60
                                           Idle           40.2
                                           Standby        4
Energy (Joule)                             Suspend        4
                                           Wake up        519
Delay (Second)                             Suspend        1
                                           Wake up        10
Hard disk drive transmission rate (MB/s)   Read           60
                                           Write          50
RAM transmission rate (GB/s)               Read/write     10
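The Table 2 parameters let us check when suspending a cold node pays off: standby saves power relative to idle, but each suspend/wake cycle costs transition energy. The following quick calculation is our own, using the table's values:

```python
P_IDLE, P_STANDBY = 40.2, 4.0     # watts, from Table 2
E_SUSPEND, E_WAKEUP = 4.0, 519.0  # joules per transition, from Table 2

# Suspending repays its transition energy only once the node would have
# stayed in standby longer than the break-even time.
break_even = (E_SUSPEND + E_WAKEUP) / (P_IDLE - P_STANDBY)
print(round(break_even, 1))  # 14.4 seconds
```

Since nodes linger idle for the 30-second keep-alive before suspending, a quiet gap must exceed roughly 30 + 14.4 ≈ 44 seconds before a suspend actually saves energy relative to staying idle, which is why lengthening cold-node quiet gaps by clustering correlated files matters.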

– Equal File Relocate (EFR) relocates files when a mismatch appears, namely hot file(s) on a cold node or cold file(s) on a hot node. If a hot file is on a cold node, EFR relocates it to a hot node in round-robin fashion; similarly, if a cold file is on a hot node, EFR relocates it to a cold node in round-robin fashion.
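EFR's round-robin placement can be sketched as follows (the class and method names are our own illustration):

```python
from itertools import cycle

class EFRRelocator:
    """Sketch of EFR: a mismatched file is moved to the opposite node set
    in round-robin order, ignoring file-access correlations."""
    def __init__(self, hot_nodes, cold_nodes):
        self._hot = cycle(hot_nodes)
        self._cold = cycle(cold_nodes)

    def target(self, file_is_hot):
        # A hot file found on a cold node goes to the next hot node in
        # turn; a cold file found on a hot node goes to the next cold node.
        return next(self._hot) if file_is_hot else next(self._cold)

r = EFRRelocator(["h1", "h2"], ["c1", "c2", "c3"])
print([r.target(True) for _ in range(3)])  # ['h1', 'h2', 'h1']
```

Unlike SFR, this placement never consults the mined confidences, so correlated cold files can end up scattered across different cold nodes.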

4.1 Platform Environment

We use a PC as the experimental platform; its detailed specs are shown in Table 1, and the parameters of the storage nodes used in our experiment are given in Table 2. The transmission rate of a hard disk drive is set according to a performance evaluation in [34], and the RAM transmission rate is the theoretical value of a DDR-III RAM at a frequency of 1.33GHz. The other parameters of the storage nodes are set based on the measured values in [5].

Real network file system traces are used in this paper. Such traces record files' access behaviors in a specific format. To capture real situations and ensure our study is general, we use three different traces: lair62b, home02 and deasna02. The characteristics of these traces are given in Table 3, where R/W represents read/write requests.

Table 3. Characteristics of the network file system traces

Trace name   Size   Time length   R/W per hour
lair62b      11GB   984 hours     63k/h
home02       48GB   2160 hours    260k/h
deasna02     32GB   960 hours     840k/h

As shown, deasna02 has the heaviest workload. For lack of space we do not give detailed information about these traces here; it can be obtained from the web page: http://www.eecs.harvard.edu/sos/traces.html.

4.2 Evaluation

We take the 8:00-18:00 traces of a weekday from each of these network file system traces. The 8:00-10:00 portion is used as the learning sample provided to FCM, so these two hours form the learning stage of FCM. After the learning stage, FCM keeps updating the file-access frequency and correlations by tracing file-access behaviors. All of the file relocation strategies mentioned above are adopted, and we then measure the average response time and energy consumption of the storage nodes over the remaining 8 hours of each trace. We set 0.1 as the support threshold and 0.4 as the confidence threshold. The threshold values will certainly affect the experimental results, but we do not discuss them here, because our goal is to demonstrate the feasibility of SFR.

Energy Consumption. The energy consumption of the storage nodes under the four file relocation strategies is compared in Fig. 5. As shown, SFR consumes the least energy on every trace. This is because waking cold nodes up from the standby state consumes a large amount of energy: if correlated files are scattered across different cold nodes, every one of those cold nodes that has been suspended to the standby state must be woken up. SFR relocates correlated cold files to the same cold node, which reduces the number of wake-ups and thus the energy consumption. HP consumes the most energy because all of its storage nodes work as hot nodes and none is ever suspended to the standby state. FRO relocates files based on file-access frequency and executes the relocation only once, after the learning stage. Although file-access behaviors change over time, FRO still saved more than 11% of the energy compared with HP, which reveals the locality of file access: file-access patterns persist for a long time. Compared with SFR, EFR does not select the optimal cold node for cold file relocation; it relocates files in round-robin fashion, so it cannot prevent correlated files from being scattered across different cold nodes, and it consumes more energy than SFR.

The experimental results demonstrate that the proposed file relocation strategy is effective. SFR, which leverages the file-access frequency and correlations,

Fig. 5. Energy consumption (MJ) comparison of the four file relocation strategies. lair62b: HP 18.5, FRO 14.4, EFR 13.3, SFR 11.8; home02: HP 18.5, FRO 14.2, EFR 12.8, SFR 11.0; deasna02: HP 18.5, FRO 16.4, EFR 14.8, SFR 13.1.

Fig. 6. Average response time (s) comparison of the four file relocation strategies. lair62b: HP ≈0, FRO 2.0955, EFR 0.1033, SFR 0.1019; home02: HP ≈0, FRO 1.7065, EFR 0.0216, SFR 0.0216; deasna02: HP ≈0, FRO 0.1493, EFR 0.0107, SFR 0.0106.

saved the most energy of all the strategies. Compared with the baseline HP, SFR reduces the energy consumption of the storage nodes by more than 29%.

Response Time. Fig. 6 and Fig. 7 show the average response time and the variance of the response time of the four file relocation strategies on the three traces. HP does not suspend any storage node to the standby state, so it has the best performance; note that the measured values for HP are so small that they are rounded to 0 by the statistical software. FRO cannot adapt to the latest file-access behaviors, because it executes the relocation only once after the learning stage, and its performance is the worst. EFR performs well because it leverages the file-access frequency. Furthermore, by relocating correlated cold files to the same cold node, SFR reduces the wake-ups needed to serve requests for cold files; and because waking up a cold node takes a long time relative to data transmission, SFR attains a lower average response time. The variance of the response time shows the same trend: HP is the steadiest strategy, SFR is next, EFR is slightly worse than SFR, and FRO is the worst.

Fig. 7. Variance of response time comparison of the four file relocation strategies. lair62b: HP 0, FRO 12.2042, EFR 0.9505, SFR 0.923; home02: HP 0, FRO 11.2672, EFR 0.2135, SFR 0.207; deasna02: HP 0, FRO 1.112, EFR 0.09516, SFR 0.09479.

5 Conclusion

In this paper we proposed an energy-aware file relocation strategy, SFR. By leveraging file-access frequency and correlations, SFR relocates files whenever a mismatch appears. In this strategy, storage nodes are divided into a hot node set and a cold node set. Because more than 80% of the total requests go to hot files, the hot nodes are always kept in the active state (or the idle state when there is no request) to preserve system performance, while the cold nodes are kept in the standby state unless requests are received; a cold node returns to the standby state once its file-access requests are completed and no subsequent requests arrive for a predetermined period of time. Furthermore, the FCM method is proposed to mine file-access frequency and correlations. To explore the system behavior of SFR, we implemented four file relocation strategies, including SFR, in a simulation experiment. Compared with the baseline, SFR reduced the energy consumption of the storage nodes by more than 29%. This demonstrates that SFR, which relocates files by leveraging file-access frequency and correlations, can significantly reduce energy consumption while maintaining system performance at an acceptable level.

For future work, we would like to adjust the number of hot nodes according to the workload (increasing it under a heavy workload and decreasing it under a light one), which should yield a further reduction in energy consumption and better service. The resulting changes in the hot/cold node sets would make it necessary to relocate files among nodes, which deserves careful study. Finally, it would be appealing to evaluate our strategy in a real implementation instead of a simulation.

Acknowledgments. This work is supported by the National Natural Science Foundation (NSF) of China under Grants No. 61572232 and No. 61272073, the key program of the Natural Science Foundation of Guangdong Province (No. S2013020012865), the Open Research Fund of the Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCH201401), the Fundamental Research Funds for the Central Universities, and the Science and Technology Planning Project of Guangdong Province (No. 2013B090200021). The corresponding author is Yuhui Deng from Jinan University.

References

1. Y. Deng: What is the Future of Disk Drives, Death or Rebirth? ACM Computing Surveys, Vol. 43, No. 3, Article 23, 2011.
2. R. Brown: Report to Congress on Server and Data Center Energy Efficiency: Public Law 109-431. Lawrence Berkeley National Laboratory, 2008.
3. E. Pinheiro, R. Bianchini, E. V. Carrera, and T. Heath: Dynamic cluster reconfiguration for power and performance. In: Compilers and Operating Systems for Low Power, Springer US, 2003, pp. 75-93.
4. J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle: Managing energy and server resources in hosting centers. ACM SIGOPS Operating Systems Review, Vol. 35, No. 5, ACM, 2001.
5. L. Zhang, Y. Deng, W. Zhu, J. Peng, and F. Wang: Skewly replicating hot data to construct a power-efficient storage cluster. Journal of Network and Computer Applications, Vol. 50, 2015, pp. 168-179.
6. Pareto Principle, http://en.wikipedia.org/wiki/Pareto_principle
7. L. Cherkasova and G. Ciardo: Characterizing temporal locality and its impact on web server performance. Technical Report HPL-2000-82, Hewlett Packard Laboratories, 2000.
8. P. Xia, D. Feng, H. Jiang, L. Tian, and F. Wang: FARMER: a novel approach to file access correlations mining and evaluation reference model for optimizing peta-scale file system performance. In: Proceedings of the 17th International Symposium on High Performance Distributed Computing, ACM, 2008.
9. Y. Wu, K. Otagiri, Y. Watanabe, and H. Yokota: A file search method based on intertask relationships derived from access frequency and RMC operations on files. In: Database and Expert Systems Applications, Springer Berlin Heidelberg, 2011.
10. A. Verma, P. Ahuja, and A. Neogi: Power-aware dynamic placement of HPC applications. In: Proceedings of the 22nd Annual International Conference on Supercomputing, ACM, 2008.
11. T. Bostoen, S. Mullender, and Y. Berbers: Power-reduction techniques for data-center storage systems. ACM Computing Surveys (CSUR), 45(3), 2013, Article 33.
12. A. Krioukov, et al.: NapSAC: Design and implementation of a power-proportional web cluster. ACM SIGCOMM Computer Communication Review, 41(1), 2011, pp. 102-108.
13. E. Thereska, A. Donnelly, and D. Narayanan: Sierra: practical power-proportionality for data center storage. In: Proceedings of the Sixth Conference on Computer Systems, ACM, 2011.
14. Y. Deng, Y. Hu, X. Meng, Y. Zhu, Z. Zhang, and J. Han: Predictively booting nodes to minimize performance degradation of a power-aware web cluster. Cluster Computing, Vol. 17, No. 4, 2014, pp. 1309-1322.
15. L. Mashayekhy, M. Nejad, D. Grosu, Q. Zhang, and W. Shi: Energy-aware scheduling of MapReduce jobs for big data applications. IEEE Transactions on Parallel and Distributed Systems, vol. PP, no. 99, 2014, pp. 1-1.
16. V. Ebrahimirad, M. Goudarzi, and A. Rajabi: Energy-aware scheduling for precedence-constrained parallel virtual machines in virtualized data centers. Journal of Grid Computing, 13(2), 2015, pp. 233-253.
17. Z. Tang, L. Qi, Z. Cheng, K. Li, S. U. Khan, and K. Li: An energy-efficient task scheduling algorithm in DVFS-enabled cloud environment. Journal of Grid Computing, 2015, pp. 1-20.
18. M. Weiser, B. Welch, A. Demers, and S. Shenker: Scheduling for reduced CPU energy. In: Mobile Computing, Springer US, 1996, pp. 449-471.
19. S. Zikos and H. D. Karatza: Performance and energy aware cluster-level scheduling of compute-intensive jobs with unknown service times. Simulation Modelling Practice and Theory, 19(1), 2011, pp. 239-250.
20. D. A. Patterson, G. Gibson, and R. H. Katz: A case for redundant arrays of inexpensive disks (RAID). In: Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data (SIGMOD '88), New York, NY, USA, 1988, pp. 109-116.
21. D. Li and J. Wang: EERAID: energy efficient redundant and inexpensive disk array. In: Proceedings of the 11th Workshop on ACM SIGOPS European Workshop (EW 11), New York, NY, USA, 2004.
22. C. Weddle, et al.: PARAID: a gear-shifting power-aware RAID. ACM Transactions on Storage (TOS), 3(3), 2007, Article 13.
23. B. Mao, et al.: GRAID: a green RAID storage architecture with improved energy efficiency and reliability. In: MASCOTS 2008, IEEE International Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems, IEEE, 2008.
24. D. Colarelli and D. Grunwald: Massive arrays of idle disks for storage archives. In: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, IEEE Computer Society Press, 2002.
25. M. Iritani and H. Yokota: Effects on performance and energy reduction by file relocation based on file-access correlations. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops, ACM, 2012.
26. C. D. Tait and D. Duchamp: Detection and exploitation of file working sets. In: Proceedings of the 11th International Conference on Distributed Computing Systems, IEEE, 1991.
27. H. Lei and D. Duchamp: An analytical approach to file prefetching. In: USENIX Annual Technical Conference, 1997.
28. T. M. Kroeger, and D.
D. E. Long: The case for efficient file access pattern modeling. Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on. IEEE, 1999. 29. T. M. Kroeger, and D. D. E. Long: Design and Implementation of a Predictive File Prefetching Algorithm. USENIX Annual Technical Conference, General Track. 2001. 30. Y. Ishii, M. Inaba, and K. Hiraki: Access map pattern matching for high performance data cache prefetch. Journal of Instruction-Level Parallelism 13 (2011): 1-24. 31. J. He, X. H. Sun, and R. Thakur: Knowac: I/o prefetch via accumulated knowledge. Cluster Computing (CLUSTER), 2012 IEEE International Conference on. IEEE, 2012. 32. S. Jiang, X. Ding, Y. Xu, and K. Davis: A prefetching scheme exploiting both data layout and access history on disk. ACM Transactions on Storage (TOS) 9.3 (2013): 10. 33. R. Agrawal, T. Imieliski, and A. Swami: Mining association rules between sets of items in large databases. ACM SIGMOD Record. Vol. 22. No. 2. ACM, 1993. 34. Y. Deng: Deconstructing network attached storage systems. Journal of Network and Computer Applications 32.5 (2009): 1064-1072.