Computing. Vol.94, No.1, 2012, pp.69-93.
Evaluating Disk Idle Behavior by Leveraging Disk Schedulers

Yuhui Deng (a,b), Kai Li (a), Lingwei Zhang (a), Ming Fang (a), Xinyu Huang (a)

(a) Department of Computer Science, Jinan University, 510632, P. R. China
(b) Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, P. R. China

Email: [email protected]; [email protected]
________________________________________________________________________

Disk idle behavior has a significant impact on the energy efficiency of disk storage systems. For example, accurately predicting or extending the idle periods experienced by disks can generate more opportunities to save energy. This paper employs trace-driven simulation to evaluate the impacts of different disk schedulers and queue length thresholds on disk idle behavior. The experimental results yield three findings: (1) Position based schedulers and long queue length thresholds can significantly reduce the maximal queue length and the average queue length. (2) Position based schedulers and long queue length thresholds generate more idle periods shorter than one second, but they do not affect the long idle periods contained in modern server workloads. (3) Disk idle periods demonstrate both self-similarity and weak long-range dependence, and the disk schedulers and queue length thresholds do impact the Hurst parameter and the correlation behavior of the workloads. The analysis results in this paper provide useful insights for designing and implementing energy efficient policies for disk drive based storage systems.
Keywords: Disk drive, disk idle time, self-similarity, long-range dependence
________________________________________________________________________

1. INTRODUCTION

Reducing the energy consumption of disk storage systems has long been a challenge in the community [2, 13, 28, 29]. Disk drives are the building blocks of those systems. Most disk drives support three power states: active, idle, and standby. Many research efforts have been invested in improving energy efficiency by leveraging these power states. Generally, the works can be classified into two categories: mobile systems [18, 25, 41] and high-end storage systems [7, 26, 32, 37, 42]. Dynamic Rotations Per Minute (DRPM) [17, 19] spins server disks at different speeds in correlation with workloads to save energy with very little perturbation in performance. The technologies involved in the above works all rely on the idle phases contained in the workloads to save energy, for example, by accurately predicting the idle periods or by extending the length of disk idle phases [11]. Therefore, understanding idle behavior is essential for designing or implementing energy efficient disk storage systems.

Modern disk drives maintain a queue by setting a queue threshold. A preferred threshold is normally three or four requests. When the threshold is reached, the corresponding disk controller and/or the device driver queue the incoming requests until the requests are processed [33]. Akyürek and Salem [1] investigated the queue length of three real traces: sakarya,
snake, and ballast. The sakarya trace was collected from a Sun Sparcstation 2 workstation running Sun-OS 4.1.1. The trace consists of two parts. The first part (sakarya1) was gathered when the disk was used to store executable files and libraries that were shared by 14 Sparcstations through NFS. The second part (sakarya2) was recorded when the disk was employed to store user files. The snake trace was collected from the disk which stored the root file system and the swap partition of a Hewlett-Packard 9000/720 workstation serving as a file server for nine HP 9000/720 diskless workstations. The ballast trace was collected from a network file server for 45 Sun 3 workstations. The file system on the traced disk contained primarily binary files and was shared by other workstations via NFS. Akyürek and Salem [1] reported that the mean queue length of the sakarya1 trace varies between 1.5 and 2.3. For the snake and ballast traces, the mean queue length is between 0.5 and 0.7, and for the sakarya2 trace, the mean queue length is shorter than 0.5. These numbers indicate that on average the arriving requests do see a queue. Occasionally, the queue can get longer because of bursty arrivals.

The response time of disk drives can be dramatically reduced by scheduling the pending requests in the queue. Over the last four decades, many scheduling algorithms have been proposed and implemented [14, 20, 34, 40]. The main goal of the schedulers is to dynamically order the pending requests and minimize the total positioning overhead by taking into account the various delays associated with disk accesses, while providing reasonable response times for individual requests [40]. Riska and Riedel [34] evaluated the correlations between disk response time and queue length. They discovered that a longer queue combined with an unfair scheduler that is given more information is the best and only solution to handle short but sharp overload conditions.

In this paper, we review the characteristics of disk schedulers and discuss the mechanisms of disk energy consumption. In contrast to the existing works, we analyze the impacts of different disk schedulers and queue length thresholds on disk idle periods by using seven real traces and adjusting queue length thresholds. Three quantitative methods are employed to evaluate the impacts. Our key contributions are as follows:
- We deconstruct traditional disk schedulers and analyze how the schedulers affect disk idle behavior rather than disk performance.
- From an idle behavior standpoint, we analyze the correlations between different queue length thresholds and schedulers, and evaluate how the queue length impacts disk idle behavior.
- We quantitatively and qualitatively evaluate the self-similarity contained in the disk idle periods generated by different schedulers and queue length thresholds.
The remainder of this paper is organized as follows. Section 2 introduces background knowledge, including disk schedulers, the mechanism of disk energy consumption, the correlations between disk schedulers and disk idle periods, and the theory of self-similarity. Section 3 presents our measurement environment and illustrates and analyzes the experimental results. A discussion of the work is presented in Section 4. Section 5 concludes the paper with remarks on its contributions to research in this field.
2. BACKGROUND

2.1 Disk Scheduler

A general storage subsystem can queue data access requests at two levels: at the disk drive and at the storage controller/device driver of the operating system. Modern disk drives maintain a queue by setting a queuing threshold. When the threshold is reached, the controller and/or the device driver queue the incoming requests until the requests are processed. Fig. 1 plots a typical storage subsystem with two queues: queue 1 at the storage controller and queue 2 at the disk drive.
Fig. 1. Request queue in a typical storage subsystem (queue 1 at the device driver/controller, queue 2 at the disk drive)
When a request is submitted to a disk drive through the corresponding file system, the response time consists of two parts: a service time and a queue time. Service time, which measures the time from start of service to completion, denotes the time required to serve an I/O request. Queue time is the time spent waiting for the request's turn to be served. Service time depends on the characteristics of the disk drive, the current location of the disk head, the request address, and the data length [9]. Queue time depends on service time and on the data access pattern, such as the inter-arrival time of requests. Consequently, decreasing the service time also reduces the corresponding queue time. Service time is normally dominated by seek time and rotational latency. A disk scheduler can reorder or rearrange the requests in the queue to reduce the seek time and the rotational latency, thus reducing the service time.
Over the last four decades, many scheduling algorithms have been proposed and implemented [3, 14, 20, 34, 40]. The First Come First Served (FCFS) disk scheduling policy performs I/O requests in arrival order, so every request is served without any starvation. This scheduler is easy to implement, and it is fair because the expected waiting time of a request is independent of its physical address (Cylinder/Head/Sector). However, FCFS often results in suboptimal performance and high mean queue time. The Shortest Position Time First (SPTF) scheduler executes the pending request in the working queue that is closest to the current disk head position, regardless of the moving direction of the disk head. SPTF decreases the high queue time of FCFS over a wide range of workloads by reducing the total seek time. The problem is that the disk head could linger over a subset of the cylinders in an attempt to perform all requests close to that area, thus starving any requests outside of that space. The SCAN algorithm serves requests along the path as the disk head shuttles from the outermost cylinder to the innermost cylinder and then back from the innermost to the outermost. Every request is performed during the scan in one of the two directions. Requests to the middle cylinders achieve better performance because the disk head passes over the center region at more regular intervals than the edges. Without sacrificing too much mean queue time, the SCAN algorithm reduces the variance of queue time, thus decreasing the probability of starvation in comparison with SPTF. Many variations of the SCAN algorithm have been developed. The reader is referred to [14] for a comprehensive treatment of scheduling algorithms. According to the above analysis, a basic premise of disk schedulers is that the queue is long enough that the schedulers can take advantage of it to reorder the requests.
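To make the three policies concrete, the following is a minimal sketch of their selection logic. Requests are modeled only by their target track; real schedulers also account for rotational position, request aging, and direction state, so this is an illustrative simplification rather than the exact schedulers used in Section 3.

```python
def fcfs_next(queue, head):
    # FCFS: serve strictly in arrival order.
    return queue.pop(0)

def sptf_next(queue, head):
    # SPTF (approximated here by seek distance): serve the pending request
    # closest to the head, in either direction. Risk: far requests starve.
    best = min(queue, key=lambda track: abs(track - head))
    queue.remove(best)
    return best

def scan_next(queue, head, direction):
    # SCAN: keep sweeping in the current direction; reverse at the end.
    ahead = [t for t in queue if (t - head) * direction >= 0]
    if not ahead:
        direction = -direction
        ahead = [t for t in queue if (t - head) * direction >= 0]
    best = min(ahead, key=lambda track: abs(track - head))
    queue.remove(best)
    return best, direction
```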
2.2 Energy Conservation and Disk Idle Time

2.2.1 Traditional Disk Drives

Fig. 2. Power state migration of disk drives (states: active, idle, standby; transitions (1)-(4) as described below)
Most modern disk drives have three power states: active, idle, and standby [10]. Each power state migration is labelled with a sequence number as defined in the following descriptions (see Fig. 2). A disk drive works in the active state when the disk platters spin at full speed.
(1) When a data access is completed and there is no succeeding request, the disk drive is transferred to the idle state, where the disk platters are still spinning, but the electronics may be partially unpowered, and the heads may be parked or unloaded.
(2) If the disk drive receives a request when it is in the idle state, the disk drive is transferred to the active state.
(3) If the disk drive remains in the idle state for a certain amount of time, it is spun down to the standby state, where the disk stops spinning and the head is moved off the disk.
(4) The disk drive is transferred back from the standby state to the active state by spinning up when a new request arrives.
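The transition rules above can be summarized as a small state machine. The sketch below is our own illustration with a hypothetical idle-to-standby timeout; the paper itself does not prescribe a timeout value here.

```python
IDLE_TIMEOUT = 15.0  # seconds of idleness before spin-down (assumed value)

def next_state(state, event, idle_elapsed=0.0):
    if state == "active" and event == "request_done":
        return "idle"        # transition (1)
    if state == "idle" and event == "request_arrives":
        return "active"      # transition (2)
    if state == "idle" and event == "tick" and idle_elapsed >= IDLE_TIMEOUT:
        return "standby"     # transition (3): spin down
    if state == "standby" and event == "request_arrives":
        return "active"      # transition (4): spin up
    return state             # all other events leave the state unchanged
```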
Table 1. Main energy related characteristics

Parameter            IBM 36Z15     IBM 73LZX    IBM 40GNX
Power (Active)       13.5 Watt     9.5 Watt     3.0 Watt
Power (Idle)         10.2 Watt     6.0 Watt     0.82 Watt
Power (Standby)      2.5 Watt      1.4 Watt     0.25 Watt
Energy (Spin Down)   13.0 Joule    10.0 Joule   0.4 Joule
Energy (Spin Up)     135.0 Joule   97.9 Joule   8.7 Joule
Time (Spin Down)     1.5 Sec       1.7 Sec      0.5 Sec
Time (Spin Up)       10.9 Sec      10.1 Sec     3.5 Sec
Carrera et al. [5] summarized the main energy related characteristics of three different IBM disk drives. In order to make this paper self-contained, we list the parameters in Table 1. The table demonstrates that disk drives in the standby state use considerably less energy than in the active state, but they have to be spun up to full speed before they can serve any requests. When a disk drive is spun up from the low power state (standby) to the high power state (active), it incurs a significant penalty in energy and time, because the disk platters have to be spun up to full speed before they can serve any requests and the heads have to be moved back, which requires servo calibration to accurately track the head as it moves over the drive. For example, an IBM 36Z15 (a server disk) has to spend 135 Joules and 10.9 seconds transferring the disk from the standby state to the active state. To justify this penalty, the energy saved by putting the disk in standby has to be greater than the energy needed to spin it up again, and the disk has to stay in the low power state for a sufficiently long period of time to compensate for the energy overhead [42].
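The break-even idle time implied by the last sentence can be made concrete. The sketch below uses a standard break-even model (our assumption, not a formula from the paper) together with the IBM 36Z15 numbers from Table 1:

```python
# Staying idle for T seconds costs P_idle * T. Spinning down instead costs
# E_down + E_up + P_standby * (T - t_down - t_up). The break-even T makes
# the two equal; longer idle periods favor spinning down.
p_idle, p_standby = 10.2, 2.5   # Watts (Table 1, IBM 36Z15)
e_down, e_up = 13.0, 135.0      # Joules (Table 1)
t_down, t_up = 1.5, 10.9        # seconds (Table 1)

t_breakeven = (e_down + e_up - p_standby * (t_down + t_up)) / (p_idle - p_standby)
print(f"break-even idle time: {t_breakeven:.1f} s")  # about 15.2 s
```

Under these assumptions, only idle periods of roughly 15 seconds or longer justify a spin-down on this server disk, which is consistent with the observation above that the disk must stay in the low power state long enough to compensate for the overhead.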
2.2.2 Dynamic Rotations Per Minute Disk Drives
Power state transition is an effective approach for laptop disk drives to save energy. This is because of the long idle periods and the relatively low penalty in energy and time (The IBM
40GNX listed in Table 1 is a laptop disk). However, frequently spinning disk drives up and down has a significant impact on reliability. Greenawalt [16] investigated this impact. He reported that a disk drive can last approximately 17 years if it runs continually. Unfortunately, spinning it up and down once an hour reduces the life span to 4.56 years; each up/down cycle causes the same wear as 3.75 hours of continual use. Modern disk drive manufacturers provide a duty cycle rating, which is the number of times the rotating media can be spun down before the chance of failure on spin-up increases to more than 50%. Therefore, when controlling a disk drive's power state transitions with a spin up/down policy, the policy must consider the accelerated consumption of duty cycles [3]. DRPM [17] is proposed to exploit the short idle periods of server workloads by dynamically modulating the rotational speed of disk drives. This method can provide large savings in power consumption with very little perturbation in delivered performance and duty cycles. DRPM checks the request queue of the disk drive to decide the speed transitions. If there are no requests pending, the DRPM disk drive reduces its RPM until it reaches a low watermark. If the percentage change in response time over a certain period of time grows beyond an upper tolerance level, the system increases the low watermark so that the RPM grows correspondingly.
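The DRPM control rule just described can be sketched as follows. This is a hypothetical illustration: the RPM levels, the one-level step, and the tolerance value are our assumptions, not parameters from the DRPM paper [17].

```python
RPM_LEVELS = [3600, 6000, 9000, 12000, 15000]  # assumed discrete speeds

def adjust_rpm(level, low_watermark, queue_empty, resp_time_change,
               tolerance=0.10):
    if resp_time_change > tolerance:
        # Response time degraded too much: raise the low watermark so the
        # drive is pushed back toward full speed.
        low_watermark = min(low_watermark + 1, len(RPM_LEVELS) - 1)
        level = max(level, low_watermark)
    elif queue_empty and level > low_watermark:
        level -= 1  # no pending requests: step the RPM down one level
    return level, low_watermark
```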
2.3 Disk Schedulers and Disk Idle Time

Fig. 3 Anatomy of disk schedulers and the corresponding disk idle periods: (a) five requests R1 (track 1500), R2 (track 2500), R3 (track 2000), R4 (track 3000), and R5 (track 5000) arrive with inter-arrival times T1-T4; (b) execution under FCFS with service times T1F-T4F, producing idle period TFidle; (c) execution under SSTF with service times T1S-T4S, producing the longer idle period TSidle
This section explains the impact of different disk schedulers on disk idle behavior. Fig. 3 shows the anatomy of disk idle periods affected by two different disk schedulers. Fig. 3(a) plots five requests arriving at the disk drive at different time points. The track numbers of the five requests are 1500 (R1), 2500 (R2), 2000 (R3), 3000 (R4), and 5000 (R5), respectively. T1, T2, T3, and T4 denote the inter-arrival times between the five requests. Fig. 3(b) depicts the five requests being executed in arrival order under a FCFS scheduler. The service time of a request is normally dominated by the disk seek time and the rotational latency when the size of the request is small. For simplicity, we assume that the service time of each request is proportional to the distance between the track numbers of the current request and the previous request. For example, the disk head is moved from track 0 to track 1500 to execute request R1, request R2 has to wait until the disk head is moved from track 1500 to track 2500, and so on. Therefore, the service time of R1 is 1.5 times that of R2. By analogy, the service time of R3 is only half that of R2, and the service time of R4 is two times that of R3. Because the inter-arrival times T1, T2, and T3 are shorter than the service times T1F, T2F, and T3F, the requests R2, R3, and R4 are delayed and have to wait in the queue until the previous request is finished. TFidle in Fig. 3(b) denotes the disk idle time, which is the time between the completion of R4 and the arrival of R5. Fig. 3(c) demonstrates the execution sequence of the five requests under a Shortest Seek Time First (SSTF) scheduler. When the disk is serving request R1, the requests R2, R3, and R4 arrive and wait in the queue. Because the disk head has to travel from track 1500 to track 2500 to serve request R2, and the track number of request R3 (track 2000) is on the path, the SSTF scheduler exchanges the execution sequence of R2 and R3, which reduces the service times of requests R2 and R4, thus increasing the idle time from TFidle to TSidle. Fig. 3 and the above analysis demonstrate that different disk schedulers can have a significant impact on disk idle behavior. In this case, the disk idle time is the time between the end and the beginning of two consecutive busy periods at the disk drive. In Section 3, we will employ real traces to evaluate and analyze the impact of different disk schedulers on disk idle behavior with quantitative methods.
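The five-request example can be replayed in a few lines of code. The sketch below adopts the paper's simplifying assumption (service time proportional to seek distance, head starting at track 0); the arrival times are hypothetical values chosen so that R2-R4 arrive while R1 is in service and R5 arrives much later, matching Fig. 3.

```python
# Arrival time (in "track-distance" time units) and target track per request.
arrivals = [(0, 1500), (200, 2500), (400, 2000), (600, 3000), (9000, 5000)]

def idle_time(pick):
    """Simulate one scheduler; return the total idle time before R5."""
    head, clock, idle, queue, i = 0, 0, 0, [], 0
    while i < len(arrivals) or queue:
        # Admit every request that has arrived by now.
        while i < len(arrivals) and arrivals[i][0] <= clock:
            queue.append(arrivals[i][1]); i += 1
        if not queue:                      # disk sits idle until next arrival
            idle += arrivals[i][0] - clock
            clock = arrivals[i][0]
            continue
        track = pick(queue, head)          # scheduler decision
        clock += abs(track - head)         # service time ~ seek distance
        head = track
    return idle

fcfs = lambda q, h: q.pop(0)
sstf = lambda q, h: q.pop(q.index(min(q, key=lambda t: abs(t - h))))
print(idle_time(fcfs), idle_time(sstf))    # prints: 5000 6000
```

Under the chosen timings, the SSTF ordering finishes the burst earlier and therefore leaves a longer idle gap before R5, which is exactly the TFidle-versus-TSidle effect shown in Fig. 3.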
2.4 Self-similarity

Self-similarity denotes that a certain property of an object is exactly or approximately similar to a part of itself with respect to scaling in space and/or time. Self-similarity is an important aspect of the workloads generated by computer systems and their components. The self-similarity process has been discussed and elaborated in much of the literature. In order to make this
paper self-contained, we borrow the theory from Leland et al. [24] and briefly present it in this section. The reader is referred to the original paper for a more comprehensive understanding.

The autocorrelation function (ACF) of a stochastic process describes the correlation between the process at different points in time. For a stationary time series $\{X_i\}$ ($i = 1, 2, \ldots, N$), the ACF at lag $k$ is:

$$\mathrm{ACF}(k) = \frac{E\{(X_i - u)(X_{i+k} - u)\}}{E\{(X_i - u)^2\}} \quad (k = 0, 1, 2, \ldots) \qquad (1)$$

where $u$ is the mean and $E\{(X_i - u)^2\}$ is the variance of $\{X_i\}$. The lag $k$ denotes the time separation between the occurrences $X_i$ and $X_{i+k}$. We assume that $\mathrm{ACF}(k) \sim k^{-\beta} L(k)$ as $k \to \infty$, where $0 < \beta < 1$ and $L$ is slowly varying, i.e., it satisfies $\lim_{t \to \infty} L(tx)/L(t) = 1$ for all $x > 0$.

A new stationary time series $X^{(m)} = (X_k^{(m)} : k = 1, 2, 3, \ldots)$ (for each $m = 1, 2, 3, \ldots$) is obtained by averaging the original series $X$ over non-overlapping blocks of size $m$, i.e., $X^{(m)}$ is given by

$$X_k^{(m)} = \frac{X_{km-m+1} + \cdots + X_{km}}{m} \quad \left(k = 1, 2, 3, \ldots, \frac{N}{m}\right).$$

The process $X$ is (exactly) second-order self-similar with Hurst parameter $H = 1 - \beta/2$ ($0 < \beta < 1$) if, for all $m = 1, 2, 3, \ldots$, the variance of $X^{(m)}$ satisfies $\mathrm{Var}(X^{(m)}) \propto m^{-\beta}$ and $\mathrm{ACF}^{(m)}(k) = \mathrm{ACF}(k)$ ($k \geq 0$). If $H = 1 - \beta/2$ ($0 < \beta < 1$) and $\mathrm{ACF}^{(m)}(k) \to \mathrm{ACF}(k)$ as $m \to \infty$, the process $X$ is (asymptotically) second-order self-similar.

The Hurst parameter is widely used as a way to quantify self-similarity in stochastic processes. The accuracy of Hurst parameter estimation depends on the time series itself and the estimation method [35]. There are a number of approaches which can be used to calculate the Hurst parameter [23]. We adopt the Variance method, the Periodogram method, and the Whittle method to perform the evaluation.

3. EXPERIMENTAL EVALUATION AND ANALYSIS

3.1 Measurement Environment
Table 2. Disk characteristics of Seagate-Cheetah15k5

Interface                          Ultra320 SCSI
Storage capacity (GByte)           146
Number of data surfaces            4
RPM                                15000
Sustained bandwidth (MBytes/sec)   Up to 125
Average seek time (ms)             3.5
Average read/write (ms)            4.0
Trace-driven simulation is a form of event-driven simulation in which the events are taken from a real system operating under conditions similar to the ones being simulated. By using a simulator and traces, we can explore the features of disk idle behavior in different environments and under a variety of workloads. DiskSim [4] is an efficient, accurate, highly configurable, trace-driven disk system simulator. Fig. 4 illustrates the storage components involved in the DiskSim simulator. It includes modules for device drivers, buses, controllers, adapters, and disk drives as building blocks to construct a typical secondary storage system. It also supports a number of external trace formats and internally generated synthetic workloads. The simulator provides hooks for inclusion in a larger scale system-level simulator. DiskSim has been validated both as part of a more comprehensive system-level model and as a standalone subsystem.

Fig. 4 Architecture of the DiskSim simulator (device driver, buses, controller, and disks)

Several experimentally validated disk models are distributed with DiskSim. The experimental results reported in this paper were generated by using a validated Seagate-Cheetah15k5 disk model. The detailed disk characteristics are summarized in Table 2. We used three schedulers, FCFS, Pri-vscan, and SPTF [40], to explore the disk idle behavior. The queue length threshold was set to 2, 4, 8, and 16. When necessary, we write the scheduler and the length of the
threshold together for brevity. For example, FCFS-2 denotes that the disk scheduler is FCFS and the threshold is 2 requests.
3.2 Characteristics of real traces
Table 3. Characteristics of the seven block-level traces

Trace name                       Mds1   Wdev0   Web1   Hm1    Rsrch1  Rsrch2  Src2-1
Number of requests               62604  176298  12484  10930  4012    157281  482
Read percentage (%)              58.84  23.13   37.57  78.75  0.17    84.94   10.97
Average read request size (KB)   4      13.3    9.2    20.5   13.2    4       5.2
Average write request size (KB)  11.5   8.4     9.6    22.3   8.4     4.2     24.8
To better understand the data access patterns generated by modern servers in data centers, Narayanan et al. [30] instrumented the core servers in Microsoft's data center to collect block-level traces in 2007. They traced 36 volumes containing 179 disks on 13 servers. Each server has two disk drives configured as a RAID1 for booting the server, and uses one or more RAID5 volumes for data. Windows Server 2003 SP2 is the operating system of all the servers. Data is stored through NTFS and accessed through a variety of interfaces including CIFS and HTTP. The traces cover one week. In our experiments, we extracted 7 one-day traces from the Microsoft trace and modified the trace format to meet the requirements of DiskSim. Table 3 illustrates the characteristics of the 7 block-level traces, where Mds, Wdev, Web, Hm, Rsrch, and Src indicate that the servers are used for media, test Web, Web/SQL, hardware monitoring, research projects, and source control, respectively.
Fig. 5 Request arrival rate of Mds1 (bucket sizes of 1, 6, and 60 seconds)

Fig. 6 Request arrival rate of Wdev0 (bucket sizes of 1, 6, and 60 seconds)

Fig. 7 Request arrival rate of Web1 (bucket sizes of 1, 6, and 60 seconds)

Fig. 8 Request arrival rate of Hm1 (bucket size of 60 seconds)

Fig. 9 Request arrival rate of Rsrch1 (bucket size of 60 seconds)

Fig. 10 Request arrival rate of Rsrch2 (bucket size of 60 seconds)
The arrival rate of requests measures how many requests arrive within a bucket, where a bucket is a specific time slot. Because the Src2-1 trace is very idle (482 requests for a whole day), we did not plot it here. Figs. 5-10 illustrate the request arrival rates of the remaining 6 traces. Since the Mds1, Wdev0, and Web1 traces are relatively intensive, we varied their bucket size from 1 second to 6 seconds and to 60 seconds.

Table 4. Arrival rate related characteristics of the traces
Trace    Bucket size   Maximal   Mean     Standard deviation
Mds1     1 second      86        2.3      6.9
         6 seconds     1410      34.2     58.7
         60 seconds    1844      159.4    282.5
Wdev0    1 second      217       1.7      7.5
         6 seconds     2004      13       75.7
         60 seconds    6019      111.3    343.9
Web1     1 second      25        0.3      1.6
         6 seconds     35        0.9      2.9
         60 seconds    672       5.7      30.3
Hm1      1 second      2         0.0025   0.065
         6 seconds     2         0.0025   0.065
         60 seconds    1063      14.1     81.1
Rsrch1   1 second      5         0.0042   0.1446
         6 seconds     5         0.0042   0.1446
         60 seconds    370       3.4      25.2
Rsrch2   1 second      4         0.0076   0.1504
         6 seconds     6         0.0076   0.1943
         60 seconds    731       24.4     89.9
Src2-1   1 second      3         0.0025   0.0867
         6 seconds     3         0.0025   0.0867
         60 seconds    3         0.0042   0.1118
Figs. 5-10 show the request arrival rate at different time granularities for a certain amount of time. The figures indicate that burstiness is the most obvious feature across the 6 traces. For the Wdev0 and Web1 traces (see Fig. 6 and Fig. 7), the bursts are short lived at fine time granularity and appear to smooth out as the time granularity increases, although a number of spikes can be observed to interrupt the smoothness. For example, the bursts of Wdev0 and Web1 at the 60-second granularity are much smoother than those at the 1-second granularity. We believe that the request arrival processes of Wdev0 and Web1 demonstrate self-similarity. The behavior of Mds1 illustrated in Fig. 5 is likely unpredictable, though we can observe bursts across the three time granularities. Figs. 8-10 show that the bursts contained in Hm1, Rsrch1, and Rsrch2 are separated by long idle periods. We will evaluate the 6 traces with quantitative methods to further explore the bursty behavior. Table 4 summarizes the arrival rate related characteristics of the 7 traces, including the maximal, mean, and standard deviation of the arrival rate; the unit is the number of requests per bucket. The last column of Table 4 indicates that the standard deviation of the 7 traces grows significantly with the increase of the time granularity. These statistics imply that the arrival rates of the 7 traces appear more bursty as the time granularity grows. Take Mds1 as an example: at the 1-second time scale, the standard deviation of the arrival rate is 6.9, but when the time granularity is increased to 60 seconds, the standard deviation increases to 282.5. The increased standard deviation indicates that the 7 traces all contain bursty behavior. Compared with the visual observations in Figs. 5-10, we believe the statistics in Table 4 are more convincing.
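The statistics in Table 4 are straightforward to reproduce from a list of arrival timestamps. The sketch below is our own minimal version, assuming the timestamps have already been extracted from the trace (in seconds from the start of the one-day window); the Microsoft trace parsing is omitted.

```python
from collections import Counter
from statistics import mean, pstdev

def bucket_stats(timestamps, bucket_size, duration=86400):
    # Count requests per bucket, keeping empty buckets as zeros so the
    # mean and standard deviation cover the whole one-day window.
    counts = Counter(int(t // bucket_size) for t in timestamps)
    rates = [counts.get(b, 0) for b in range(int(duration // bucket_size))]
    return max(rates), mean(rates), pstdev(rates)

# Example usage (trace_timestamps is a hypothetical list of arrival times):
# for size in (1, 6, 60):
#     print(size, bucket_stats(trace_timestamps, size))
```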
3.3 Performance evaluation
Fig. 11 Average response time (ms) as a function of the queue length threshold (2, 4, 8, 16) for the FCFS, Pri-vscan, and SPTF schedulers: (a) Mds1, (b) Wdev0, (c) Web1, (d) Hm1, (e) Rsrch1, (f) Rsrch2, (g) Src2-1
Note that the aim of this paper is to evaluate disk idle behavior under different disk schedulers and different queue length thresholds. Average response time is adopted to validate the impacts of the different schedulers, thresholds, and traces on disk performance. Fig. 11 shows the average response time of the 7 traces and gives three indications:

(1) The performance of disk scheduling mechanisms is very sensitive to the traces. This is because of the different data access patterns contained in the traces. For example, when the threshold is set to 2 and the scheduler is FCFS, the average response times of Web1 and Wdev0 are 1.48 ms and 496 ms, respectively.

(2) Simple FCFS produces the longest average response time, and position based schedulers such as SPTF yield the shortest average response time, which confirms the report given by [34].
(3) Queue length thresholds have a negligible impact on the FCFS scheduler, but a significant influence on the SPTF scheduler. Additionally, the results show that once the threshold grows beyond a certain length, further increases contribute little to disk performance.
3.4 Queue length evaluation
The queue length at disk drives has a significant impact on the efficiency of scheduling mechanisms. According to Little's Law, the average number of requests in a queue is equal to the average arrival rate multiplied by the average response time. Because the queue length reflects system performance, resource utilization, and workload intensity, we investigated the impacts of queue length thresholds and different schedulers on the queue length. The queue length threshold was set at the disk drive (Queue 2 in Fig. 1). The maximal and average queue lengths were collected at the device driver (Queue 1 in Fig. 1).
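As a back-of-the-envelope check of Little's Law against the paper's own numbers: Mds1 has a mean arrival rate of about 2.3 requests/s at the 1-second granularity (Table 4), and its average response time is on the order of 5 ms (the magnitude suggested by Fig. 11; the exact value here is our assumption). Then

$$N = \lambda R \approx 2.3~\text{req/s} \times 0.005~\text{s} \approx 0.012~\text{requests},$$

which is consistent with the sub-0.05 average queue lengths reported in Fig. 13 below.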
Fig. 12 Maximal queue length as a function of the queue length threshold (2, 4, 8, 16) for the FCFS, Pri-vscan, and SPTF schedulers: (a) Mds1, (b) Wdev0, (c) Web1, (d) Hm1, (e) Rsrch1, (f) Rsrch2, (g) Src2-1
Modern workloads tend to be bursty, which normally increases the queue length dramatically for a short period of time [34, 39]. Ruemmler and Wilkes [36] reported that workloads fluctuate severely and the queue length can reach as many as 1000 requests. Fig. 12 illustrates the maximal queue length with different schedulers across the 7 traces when the threshold is changed from 2 to 4, 8, and 16. The maximal queue length of Wdev0 reaches 3048 for the FCFS scheduler when the threshold is set to 4; this is the highest among the 7 traces. However, the value drops to 2759 when the scheduler is changed to SPTF with the same threshold. The same phenomenon can be observed across the 7 traces. Another trend illustrated by Fig. 12 is that the maximal queue length decreases with the growth of the threshold. This implies that an optimal scheduler and a relatively long threshold can reduce the bursts of the I/O workload to a certain degree.
Fig. 13 Average queue length as a function of the queue length threshold (2, 4, 8, 16) for the FCFS, Pri-vscan, and SPTF schedulers: (a) Mds1, (b) Wdev0, (c) Web1, (d) Hm1, (e) Rsrch1, (f) Rsrch2, (g) Src2-1
The average queue length is closely related to the sustained workload the disk drive experiences. Fig. 13 depicts the observed average queue length of the 7 traces with the three schedulers as a function of the queue length threshold. It demonstrates that different schedulers and queue length thresholds do have an impact on the average queue length. Generally, the average queue length decreases with the increase of the threshold, and the position based schedulers (e.g., SPTF) also reduce the average queue length. This is similar to the patterns illustrated in Fig. 12. It is interesting to observe that the average queue lengths of the 7 traces are smaller than 0.05, except for the Wdev0 trace. For the Wdev0 trace, the highest average queue length is 1.0025 when the FCFS scheduler is used and the threshold is 2. The length drops to 0.55 when the scheduler is changed to SPTF and the threshold is set to 16. This provides some evidence that most of the workloads are fairly idle.
3.5 Disk idle behavior

Fig. 14 The cumulative distribution of disk idle periods (idle period length in ms, up to 30000 ms) for all scheduler/threshold combinations (FCFS, Pri-vscan, SPTF with thresholds 2, 4, 8, 16): (a) Mds1, (b) Wdev0, (c) Web1, (d) Hm1, (e) Rsrch1, (f) Rsrch2, (g) Src2-1
Fig. 14 illustrates the cumulative distribution of disk idle periods for the different schedulers and thresholds. A basic observation is that the number of idle periods of all traces except Web1 is significantly affected by the scheduler and threshold, and using a specific scheduler and threshold can generate many idle periods. However, about 80% of the generated idle periods are shorter than 1000 ms across the 7 traces. This is reasonable because the service time of disk drives is normally at the millisecond level, so the idle periods squeezed out by reordering and scheduling the requests should be at the same level.
Table 5. Characteristics of the idle periods

           Number of idle periods     Average idle length (ms)
Trace      FCFS-2      SPTF-16        FCFS-2       SPTF-16
Mds1       28378       36339          4869         4863
Wdev0      59886       124103         3630         3574
Web1       11371       11699          8900         8881
Hm1        3445        8701           39719        35557
Rsrch1     2214        3538           171794       171615
Rsrch2     71001       123395         3409         3415
Src2-1     103         239            2878671      2370684
In order to further explore the idle behavior, we investigated the exact number of idle periods and the average idle length under the different schedulers and thresholds. The results show that the number of idle periods across the 7 traces is affected significantly by schedulers and thresholds. For example, for the Wdev0 trace, the number of idle periods increases from 59886 to 124103 when the policy is changed from FCFS-2 to SPTF-16. However, we observed a slight decrease of the average idle length when the scheduler is changed from FCFS to Pri-vscan and to SPTF, and the length also drops with the growth of the thresholds, which was not expected. Table 5 summarizes part of the experimental results.
Table 6. The number of idle periods which are longer than one second

Trace    (1000,10000] ms   (10000,20000] ms   >20000 ms
Mds1     4778              114                230
Wdev0    15152             516                667
Web1     1815              70                 356
Hm1      220               47                 99
Rsrch1   443               11                 31
Rsrch2   6914              24                 6
Src2-1   5                 0                  1

Fig. 15 Number of the idle periods which are shorter than one second, as a function of the queue length threshold (2, 4, 8, 16) for the FCFS, Pri-vscan, and SPTF schedulers: (a) Mds1, (b) Wdev0, (c) Web1, (d) Hm1, (e) Rsrch1, (f) Rsrch2, (g) Src2-1
We further investigated the number of idle periods distributed in the ranges (0,10] ms, (10,100] ms, (100,1000] ms, (1000,10000] ms, (10000,20000] ms, and (20000, ∞) ms. We found that the 7 traces have very long idle periods ranging from dozens of seconds to a few hours, and that different schedulers and thresholds do not affect the idle periods which are longer than one second. Table 6 lists the number of idle periods which are longer than one second. However, a specific scheduler can generate many short idle periods. This is reasonable because the service time of an I/O request is normally a few milliseconds, so the idle periods squeezed out by using traditional disk schedulers to reorder the requests in the queue should be at the same level. This also explains the decrease of the average idle length listed in Table 5. Fig. 15 plots the number of idle periods which are shorter than 1 second. A basic trend is that the numbers increase with the growth of the threshold, and different schedulers do affect the numbers. According to Table 4, Table 5, and Table 6, the characteristics of the inter-arrival times and disk idle times of the 7 traces demonstrate significant differences. For example, when the threshold is configured as 2 and the scheduler is FCFS, the ratios of the number of idle periods to the number of inter-arrival periods are 45% for Mds1, 34% for Wdev0, 91% for Web1, 32% for Hm1, 55% for Rsrch1, 45% for Rsrch2, and 21% for Src2-1, respectively. The reduction in the number of idle periods indicates that the arrival process of requests at the disk drive shows bursty behavior. Different ratios indicate different degrees of burstiness in the traces.
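For reference, the idle-period extraction and binning used throughout this section can be sketched as follows. This is our own minimal version: it assumes the busy intervals (service start/end times) have already been exported from the simulator as sorted (start, end) pairs in milliseconds.

```python
def idle_periods(busy_intervals):
    # An idle period is the gap between two consecutive busy periods.
    gaps, last_end = [], None
    for start, end in busy_intervals:
        if last_end is not None and start > last_end:
            gaps.append(start - last_end)
        last_end = end if last_end is None else max(last_end, end)
    return gaps

def bin_idle(gaps_ms):
    # Bin boundaries match the ranges used above (all in ms).
    edges = [10, 100, 1000, 10000, 20000]
    counts = [0] * (len(edges) + 1)
    for g in gaps_ms:
        counts[sum(g > e for e in edges)] += 1
    return counts  # (0,10], (10,100], ..., (10000,20000], >20000
```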
3.6 Self-similarity evaluation
Fig. 16 Hurst parameters of the disk idle periods estimated with the Variance method, Periodogram method, and Whittle estimator for every scheduler/threshold configuration (FCFS, Pri-vscan, SPTF with thresholds 2, 4, 8, 16): (a) Mds1, (b) Wdev0, (c) Web1, (d) Hm1, (e) Rsrch1, (f) Rsrch2
We employed the Variance method, the Periodogram method, and the Whittle estimator to calculate the Hurst parameters [23]; using different methods allows us to check the accuracy of the Hurst parameter estimation. The disk drive in the simulation was configured with the different disk schedulers and queue length thresholds. For smooth Poisson traffic, the Hurst parameter is 0.5. For self-similar series, the Hurst parameters are distributed between 0.5 and 1, and the degree of self-similarity grows with the value of the Hurst parameter. Fig. 16 shows the Hurst parameters of 6 traces with the different configurations and calculation methods. We did not calculate the Hurst parameters of trace Src2-1, because it contains very long idle times (482 requests over 24 hours). Fig. 16 demonstrates that the Hurst parameters of disk idle periods are higher than 0.5 across the 6 traces
with different configurations and different calculation methods. This confirms the self-similarity of the disk idle periods. The figure also implies that different schedulers and thresholds can change the Hurst parameters to a certain degree.
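As an illustration of one of the three estimators, the Variance method follows directly from the definitions in Section 2.4: aggregate the series over blocks of size m, fit log Var(X^(m)) against log m, read off the slope -beta, and set H = 1 - beta/2. The sketch below is a minimal version; for publication-quality estimates a vetted tool such as the one described in [23] should be preferred.

```python
import numpy as np

def hurst_variance(x, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    x = np.asarray(x, dtype=float)
    variances = []
    for m in block_sizes:
        n = len(x) // m
        blocks = x[: n * m].reshape(n, m).mean(axis=1)  # aggregated X^(m)
        variances.append(blocks.var())
    # Var(X^(m)) ~ m^(-beta): the log-log slope estimates -beta.
    slope = np.polyfit(np.log(block_sizes), np.log(variances), 1)[0]
    return 1.0 - (-slope) / 2.0  # H = 1 - beta/2
```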
3.7 Long-range dependence evaluation
Fig. 17 Autocorrelation function (ACF) of the idle periods, computed up to 1000 lags, for the FCFS, Pri-vscan, and SPTF schedulers with thresholds 2 and 16: (a) Mds1, (b) Wdev0, (c) Web1, (d) Hm1, (e) Rsrch1, (f) Rsrch2
The autocorrelation function ACF(k) defined by Eq. (1) in Section 2.4 measures the correlation of a random variable with itself at different time lags. Since this paper attempts to explore the correlations between idle periods, the autocorrelation function is adopted to determine whether the current disk idle interval is correlated with the following disk idle intervals. The value of ACF(k) can range between 1 (highly positive correlation) and -1 (highly negative correlation). Generally, ACF(k) gradually approaches zero with the increase of k; ACF(k) = 0 indicates that there is no autocorrelation at lag k. The decay rate of the ACF determines whether a time series is short-range dependent or long-range dependent: a slow decay rate implies long-range dependence, and vice versa. Fig. 17 plots the autocorrelation functions of the idle periods across the 6 traces with different schedulers and thresholds. As with the Hurst parameter, we did not plot the autocorrelation function of Src2-1 due to its very long idle times. We calculated up to 1000 lags to reveal the possibly slow decay rate of ACF(k). According to the decay rate, we believe that the disk idle periods illustrated in Fig. 17 show weak long-range dependence. It is interesting to observe that spikes of the autocorrelation function are distributed across all the plots in Fig. 17 except those of Wdev0 and Rsrch2. Please note that the Y axis of Fig. 17(f) is on a different scale. A positive value of the autocorrelation function indicates a strong temporal locality, i.e., a value of the random variable has a high probability to be followed by another value of the
same order of magnitude, while a negative value of the autocorrelation function implies the inverse [35]. Therefore, the higher the spike, the stronger the temporal locality. For example, the temporal locality of Hm1 is stronger than that of the other traces, because its spikes are much higher than those of the other 5 traces. Another implication of Fig. 17 is that different schedulers and thresholds do affect the correlation behavior. For example, changing the schedulers and thresholds generates more spikes, which indicates increased locality. According to the above analysis, we believe that the disk idle periods demonstrate weak long-range dependence and high temporal locality, and that the schedulers and thresholds affect the correlations contained in the idle periods.
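For completeness, the sample version of Eq. (1) used to produce curves like those in Fig. 17 can be written in a few lines. This is a minimal sketch; note that sample-ACF conventions differ slightly across tools (e.g., normalizing by N versus N - k).

```python
import numpy as np

def acf(x, max_lag=1000):
    # Sample autocorrelation of Eq. (1): covariance at lag k over variance.
    # Assumes len(x) > max_lag.
    x = np.asarray(x, dtype=float)
    u, var, n = x.mean(), x.var(), len(x)
    return [((x[: n - k] - u) * (x[k:] - u)).mean() / var
            for k in range(max_lag + 1)]
```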
4. DISCUSSIONS
Switching traditional disk drives between different power states is normally adopted to take advantage of long idle periods to save energy. Generally, the existing methods used to perform the power state transition can be classified into two categories [8]. The first is a simple timeout strategy, which has gained wide popularity and is currently implemented in many operating systems. Once a disk drive has been idle for a period of time longer than some given timeout threshold, the disk is spun down in an effort to save energy. Upon the arrival of a new request, the disk is spun up to serve the request. The timeout strategy offers good accuracy, but it wastes energy while the disk is waiting for the timeout period to expire. The second is dynamic prediction based on the behavior of applications, for example, a series of events that are likely to happen again in the future. This method shuts down disk drives immediately, eliminating the waiting time of the timeout strategy. However, such prediction turns out to be very difficult to achieve because of the large random variance observed in the lengths of sequential idle periods. The prediction method normally predicts the forthcoming idle periods by using a cumulative average of the previous idle periods [22]. Unfortunately, when a very long idle period occurs, the predicted value is often much lower than the actual idle period. This underestimation is undesirable for energy saving, especially when the predicted value is lower than the time penalty of the disk drive: in this case, the disk drive will stay in the active power state instead of entering the standby power state, resulting in a large amount of unnecessary energy consumption.
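A minimal sketch of such a predictor is shown below. It uses an exponentially weighted form of the cumulative average; the weighting, the break-even value, and the decision rule are our assumptions, and the exact averaging in [22] may differ.

```python
def should_spin_down(idle_history, breakeven=15.0, alpha=0.5):
    # Predict the next idle length from past ones, then spin down only if
    # the prediction exceeds the break-even time (cf. Section 2.2.1).
    prediction = 0.0
    for idle in idle_history:
        prediction = alpha * idle + (1 - alpha) * prediction
    return prediction > breakeven
```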
Our evaluation shows that modern workloads at the block level are bursty, and that the long idle periods contained in the workloads range from dozens of seconds to a few hours. Therefore, it is a challenge to make predictions for such workloads in the traditional manner.

Many research efforts have been invested in investigating computer system related workloads. By analyzing four sets of traces that span time scales of milliseconds through months, Gribble et al. [15] demonstrated that high-level file system events exhibit self-similar behavior, but only for short-term time scales of approximately under a day. Hsu and Smith [21] found that for time scales ranging from tens of milliseconds to tens, and sometimes even hundreds, of seconds, the I/O traffic of personal computers and servers is well represented by a self-similar process. Riska and Riedel [35] reported that disk drives in a wide range of applications and computing environments exhibit long-range dependence in inter-arrival time and request location. Ostring and Sirisena [31] further investigated the influence of long-range dependence on traffic prediction. They discovered that the prediction accuracy grows with the increase of the Hurst parameter, mainly because of the utilization of the specific short-term correlations contained in the traffic. In contrast to the existing works, our evaluation discovered that the idle periods demonstrate self-similarity and weak long-range dependence, and that different schedulers and thresholds affect both the Hurst parameter and the autocorrelation function. Therefore, we have two indications: (1) The length of disk idle periods is predictable, because one implication of long-range dependence is that future events in the process have a significant dependence on the event history. (2) Using a specific scheduler and threshold that improves the Hurst parameter should significantly improve the prediction accuracy of the idle periods, thus saving energy.

A traditional view is that the idle periods contained in server workloads are too short to compensate for the energy and time penalty incurred by switching server disk drives off and on [5, 42]. Therefore, DRPM [17] is designed to exploit those short idle periods to save energy, though the technique can also be used for the long idle periods. Our evaluation shows that the additional idle periods generated by specific schedulers and thresholds are shorter than one second, and that the number of generated idle periods is significantly affected by the schedulers and thresholds. We also discovered that position based schedulers and long queue length thresholds can significantly reduce the average queue length and the maximal queue length observed in the disk queue. This indicates that the intensity of the workloads and the bursts are both reduced. Therefore, we believe that DRPM disk drives can conserve more energy by using a position based scheduler and a long queue length threshold.
As discussed above, using a specific scheduler and queue length threshold can improve the prediction accuracy of workloads, thus saving more energy for DRPM disk drives. However, traditional disk schedulers are designed to leverage the characteristics of disk drives (e.g., reducing random accesses, minimizing seek time) and to optimize performance. Recently, it has been recognized that it is not necessary to instantly have a computer's maximum power available, as long as the Quality of Service (QoS) delivered satisfies a predetermined standard. Power consumption is now a metric on par with performance [38]. Therefore, an energy oriented scheduler is required to further boost the energy conservation of disk drives. Lu et al. [27] reported that normally an overwhelming portion of the resource capacity is employed to guarantee the performance of a small fraction of the requests. The requirement of resource capacity can be significantly reduced by relaxing the performance guarantees for this small portion of requests, while maintaining strong QoS guarantees for most of the requests. Therefore, they proposed to reshape the workloads to provide graduated, distribution based QoS guarantees. If we reconsider their work from a power standpoint, an energy oriented scheduler is feasible by setting different QoS levels for different requests and reshaping the workloads, because the worst case requirements are usually determined by a small portion of the workloads.

Flash memory is a non-volatile memory which can be electrically erased and reprogrammed. Its major advantages, such as small physical size, no mechanical components, lower power consumption, non-volatility, and high performance, have made it likely to replace disk drives in more and more systems where size, power, or performance are important [6, 12]. Generally, flash memory can play two roles in the existing computer system architecture. The first role is as an extension to RAM, a layer between RAM and the traditional disk drives. For example, a hybrid disk integrates NAND flash memory into a standard disk drive as a second level cache [3]. This architecture can extend the length of the idle periods and leave the disk drives in a low power state much longer by satisfying requests from the flash memory, thus saving energy. However, the hybrid disk must weigh the energy saving against the decrease in reliability, though it can strike a good balance among storage capacity, performance, and energy efficiency. We believe that the hybrid disk is a temporary solution. The second role is replacing the traditional disk drives as a new block storage medium. Unfortunately, due to the constraints of storage capacity and price, flash memory still has a long way to go to completely replace traditional disk drives, though it is an important and promising storage medium. Therefore, it remains important to understand the idle behavior of disk drives in order to build energy efficient disk storage systems.
5. CONCLUSIONS

This paper explores the impacts of different disk schedulers and queue length thresholds on disk idle behavior by using 7 real traces. Employing quantitative and qualitative methods, the experimental results give the following indications:

(1) Position based schedulers and a long queue length threshold can significantly reduce the maximal queue length and the average queue length experienced by disk drives. This indicates potential opportunities for DRPM disk drives to save energy.
(2) Position based schedulers and a long queue length threshold generate more idle periods which are shorter than one second, but they do not affect the long idle periods contained in modern server workloads.
(3) Disk idle periods demonstrate both self-similarity and weak long-range dependence. The disk schedulers and queue length thresholds do affect the Hurst parameter and the correlation behavior of the workloads. This implies that if we reshape the workloads to improve the Hurst parameters, we should be able to obtain a more accurate prediction of the workloads.

The analysis results in this paper should provide useful insights for designing or implementing energy efficient disk storage systems. A possible direction for future work is a power-aware disk scheduler that leverages the experimental results reported in this paper.
ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for helping us refine this paper. Their constructive comments and suggestions were very helpful. The work was supported by the National Natural Science Foundation (NSF) under grant No. 61073064, the Fundamental Research Funds for the Central Universities, and a start-up research fund from Jinan University. Any opinions, findings, and conclusions are those of the authors and do not necessarily reflect the views of the above agencies.
REFERENCES

[1] Akyürek S, Salem K (1995). Adaptive block rearrangement. ACM Transactions on Computer Systems 12(2): 89-121
[2] Battles B, Belleville C, Grabau S, Maurier J (2007). Reducing data center power consumption through efficient storage. NetApp White Paper.
[3] Bisson T, Brandt SA, Long DDE (2007). A hybrid disk-aware spin-down algorithm with I/O subsystem support. In: Proceedings of the IEEE International Performance, Computing, and Communications Conference 2007 (IPCCC 2007), pp 236-245
[4] Bucy J, Schindler J, Schlosser S, Ganger G, et al. (2008). The DiskSim simulation environment version 4.0 reference manual. CMU-PDL-08-101.
[5] Carrera E, Pinheiro E, Bianchini R (2003). Conserving disk energy in network servers. In: Proceedings of the 17th International Conference on Supercomputing, pp 86-97
[6] Chang L, Kuo T (2005). Efficient management for large-scale flash-memory storage systems with resource conservation. ACM Transactions on Storage 1(4): 381-418
[7] Colarelli D, Grunwald D (2002). Massive arrays of idle disks for storage archives. In: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pp 1-11
[8] Deng Y, Wang F, Helian N (2008). EED: Energy Efficient Disk drive architecture. Information Sciences 178(22): 4403-4417
[9] Deng Y (2009). Exploiting the performance gains of modern disk drives by enhancing data locality. Information Sciences 179(14): 2494-2511
[10] Deng Y, Pung B (2011). Conserving disk energy in virtual machine based environments by amplifying bursts. Computing 91(1): 3-21
[11] Deng Y (2011). What is the future of disk drives, death or rebirth? ACM Computing Surveys 43(3): article 23
[12] Deng Y, Zhou J (2011). Architectures and optimization methods of flash memory based storage systems. Journal of Systems Architecture 57(2): 214-227
[13] Fan X, Weber W, Barroso L (2007). Power provisioning for a warehouse-sized computer. In: Proceedings of the 34th Annual International Symposium on Computer Architecture, pp 13-23
[14] Geist R, Daniel S (1987). A continuum of disk scheduling algorithms. ACM Transactions on Computer Systems 5(1): 77-92
[15] Gribble S, Manku G, Roselli D, Brewer E, Gibson T, Miller E (1998). Self-similarity in file systems. In: Proceedings of the 1998 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pp 141-150
[16] Greenawalt PM (1994). Modeling power management for hard disks. In: Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 94), pp 62-66
[17] Gurumurthi S, Sivasubramaniam A, Kandemir M, Franke H (2003). Reducing disk power consumption in servers with DRPM. Computer 36(12): 59-66
[18] Helmbold D, Long D, Sconyers T, Sherrod B (2000). Adaptive disk spin-down for mobile computers. Mobile Networks and Applications 5(4): 285-297
[19] Hitachi (2009). Power and Acoustic Management. Hitachi White Paper. http://www.hitachigst.com/
[20] Hofri M (1980). Disk scheduling: FCFS vs. SSTF revisited. Communications of the ACM 23(11): 645-653
[21] Hsu W, Smith A (2003). Characteristics of I/O traffic in personal computer and server workloads. IBM Systems Journal 42(2): 347-372
[22] Hwang C, Wu A (2000). A predictive system shutdown method for energy saving of event-driven computation. ACM TODAES 5(2): 226-241
[23] Karagiannis T, Faloutsos M, Molle M (2003). A user-friendly self-similarity analysis tool. ACM SIGCOMM Computer Communication Review 33(3): 81-93
[24] Leland W, Taqqu M, Willinger W, Wilson D (1994). On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking 2(1): 1-15
[25] Li K, Kumpf R, Horton P, Anderson T (1994). Quantitative analysis of disk drive power management in portable computers. In: Proceedings of the USENIX Winter Conference, pp 279-291
[26] Li D, Wang J (2004). EERAID: energy-efficient redundant and inexpensive disk array. In: Proceedings of the 11th ACM SIGOPS European Workshop
[27] Lu L, Doshi K, Varman P (2008). Workload decomposition for QoS in hosted storage services. In: Proceedings of the 3rd Workshop on Middleware for Service Oriented Computing, pp 19-24
[28] Maximum Throughput, Inc. (2002). Power, Heat, and Sledgehammer.
[29] Moore F (2002). More power needed. Energy User News.
[30] Narayanan D, Donnelly A, Rowstron A (2008). Write off-loading: practical power management for enterprise storage. ACM Transactions on Storage 4(3): article 10
[31] Ostring S, Sirisena H (2001). The influence of long-range dependence on traffic prediction. In: Proceedings of the IEEE International Conference on Communications (ICC 2001), pp 1000-1005
[32] Pinheiro E, Bianchini R (2004). Energy conservation techniques for disk array-based servers. In: Proceedings of the 18th International Conference on Supercomputing, pp 68-78
[33] Riska A, Riedel E (2003). It's not fair - evaluating efficient disk scheduling. In: Proceedings of the 11th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'03), pp 288-295
[34] Riska A, Riedel E, Iren S (2004). Adaptive disk scheduling for overload management. In: Proceedings of the 1st International Conference on the Quantitative Evaluation of Systems (QEST04), pp 176-185
[35] Riska A, Riedel E (2006). Long-range dependence at the disk drive level. In: Proceedings of the 3rd International Conference on the Quantitative Evaluation of Systems, pp 41-50
[36] Ruemmler C, Wilkes J (1993). Unix disk access patterns. In: Proceedings of the Winter 1993 USENIX Technical Conference, pp 313-323
[37] Son S, Chen G, Kandemir M (2005). Disk layout optimization for reducing energy consumption. In: Proceedings of the 19th International Conference on Supercomputing, pp 274-283
[38] SPEC Power and Performance (2009). http://www.spec.org/power_ssj2008/
[39] Welsh M, Culler D (2003). Adaptive overload control for busy Internet servers. In: Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems (USITS'03)
[40] Worthington B, Ganger G, Patt Y (1994). Scheduling algorithms for modern disk drives. In: Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pp 241-251
[41] Zedlewski J, Sobti S, Garg N, Zheng F, Krishnamurthy A, Wang R (2003). Modeling hard-disk power consumption. In: Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST03), pp 217-230
[42] Zhu Q, Chen Z, Tan L, et al. (2005). Hibernator: helping disk arrays sleep through the winter. In: Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP 2005), pp 177-190