Evaluation of Disk-level Workloads at Different Time-scales

Alma Riska∗
College of William and Mary
Williamsburg, VA 23185
[email protected]

Erik Riedel∗
EMC Corporation
Cambridge, MA 02142
[email protected]

∗ The authors were Seagate Technology employees while conducting this research.

Abstract

In this paper, we characterize three different sets of disk-level traces collected from enterprise systems. The data sets differ in the granularity of the recorded information and are called, accordingly, the Millisecond, the Hour, and the Lifetime traces. We analyze the disk-level utilization, the availability of idleness, and the dynamics of the read and write traffic, over time and across an entire drive family. Our evaluation confirms that disk drives operate at moderate utilization and experience long stretches of idleness. The workload arriving at the disk is bursty across all time scales evaluated. Also, there is variability across drives of the same family, with a portion of them fully utilizing the available disk bandwidth for hours at a time.

1 Introduction

Driven by the ubiquitous need to store every piece of information digitally, storage systems have experienced tremendous growth and advancement in recent years. Today, storage system designers face challenging application and user requirements, such as achieving virtually 100% system-level data availability, closing the performance gap in the IO hierarchy, and reducing power consumption during operation. To meet these demands, storage systems are advancing the current state of the art in system software and architectures and deploying emerging technologies such as solid state drives [10].

Imperative to the successful deployment of new features and technologies in storage systems is the incorporation of accurate workload characterization into the design process. Consequently, significant research effort has been devoted to analyzing storage system workloads. For representative storage workload characterization, the workload has been analyzed at various levels of the data path, including the file system [3, 2], the device driver [16, 22], and the array controller [1]. System instrumentation is the most common approach to enable trace collection from live and experimental storage systems running general-purpose benchmarks such as TPC [19] and SPC [18], or more targeted ones such as Postmark [11] and IOZone [12]. A good set of traces obtained through such instrumentation can be found at the SNIA trace repository [17].

Any captured workload has its set of drawbacks and advantages. For example, experimental systems can be controlled and configured according to one's needs, allowing flexibility in the range of measured scenarios. But benchmarks often run stationary workloads, which is not the case in real systems. Instrumentation of live systems enables collection of realistic workloads. However, deploying the instrumentation in targeted systems is often tedious and limits the number of monitored systems, because of concerns that the measurements may interfere with the user traffic or may affect the integrity of the system. Consequently, the captured workloads may not be representative of the full spectrum of workloads in a family of applications and/or systems.

In this paper, we use three unique data sets obtained through three different measurement techniques from enterprise storage systems. The main distinction among these data sets is the granularity of the available information. The first data set, called the Millisecond traces, is measured using a non-invasive SCSI bus analyzer and contains fine-grained (i.e., millisecond-level) block-level workload information. The second data set, called the Hour traces, is obtained by instrumenting the storage array controller to periodically extract monitoring logs from drives in the field, and contains coarse-grained (i.e., hour-level) workload information.
The third data set, called the Lifetime traces, is obtained by extracting monitoring logs from the disk drives at the end of their life, upon return to the vendor, and contains workload information at a significantly coarser granularity (i.e., over the entire drive lifetime).

In this paper, we identify characteristics that apply over an entire family of enterprise disk drives by validating that they hold across all the data sets despite the differences in information granularity, measurement techniques, and measured populations. Our analysis suggests that disk drives in enterprise systems exhibit moderate utilization and mostly read traffic. However, variability is a salient characteristic across time and across drives. This means that over time the load seen by disk drives changes significantly, with the maximum being several orders of magnitude higher than the average and reaching the maximum data transfer rate of a drive. Similarly, there are non-negligible subsets of drives that process workloads different from the average, such as serving mostly writes or experiencing high utilization.

The rest of the paper is organized as follows. In Section 2, we present a high-level description of the data sets included in our evaluation. Section 3 presents the analysis of the three sets of data with respect to the load seen by the disk drives, their utilization level, the characteristics of the read and write traffic, and the read/write ratio. In Section 4, we discuss the results obtained from our characterization. We present related work in Section 5. We conclude with Section 6, which summarizes our work.
2 Description of the Data Sets
The three data sets in our evaluation, i.e., the Millisecond, the Hour, and the Lifetime traces, are distinguished by the time granularity of the recorded information and by the recorded information itself. Below, we describe in detail the information available in each data set and identify their advantages and disadvantages.
2.1 The Millisecond Traces
These traces are measured in real enterprise storage systems deployed in the field by attaching a bus analyzer to the storage system interconnects. This measurement technique is non-invasive and does not require any instrumentation. The bus analyzer collects electrical signals which are processed off-line once the measurement is complete. Each trace records the time of a request arrival and departure (at the millisecond level), the number of bytes requested, the type of request (read or write), and the logical block number of the request location. These
traces are collected in single- and multi-disk storage systems, each dedicated to a single application. For the multi-disk systems, the bus analyzer also identifies the disk each request belongs to. In each of the Millisecond traces, we characterize the workload arriving at only one disk, because most disks in an array behave very similarly and exhibit almost identical statistical characteristics. We have traces available from several enterprise systems, but for conciseness we present here the analysis of only four of them. Table 1 gives the size of our Millisecond traces.

  Server        Disks         Length    Number of
  Application                 (hours)   Requests
  Web           1 out of 1      7.3       114,814
  E-mail        1 out of 42    25       1,596,581
  Coding        1 out of 42    12         483,563
  User Acc.     1 out of 42    12         168,148

  Table 1. The size of the Millisecond traces.

The advantages of the Millisecond traces are associated with their level of detail. The available information allows one to derive exactly the arrival, service, and departure processes at the disk. Furthermore, the traces provide the request ordering and the logical locations, which makes it possible to understand the disk access pattern, temporal and spatial locality, and the burstiness present in the workload. These traces can be used to drive detailed disk-level simulators and to evaluate the efficiency of various disk-level optimization techniques.

The main drawbacks of these traces are their relatively short time span and the limited number of measured systems. As a result, there is uncertainty about the generality of the captured scenarios and the possibility that they may simply be non-representative cases.
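Because each Millisecond trace records per-request arrival and departure times, request size, read/write type, and logical block number, metrics such as the length of idle stretches can be derived directly from the records. The sketch below illustrates this on a hypothetical record layout; the field names and the toy trace are illustrative, not taken from the measured data.

```python
from dataclasses import dataclass

# Hypothetical record layout based on the fields described in the text:
# arrival and departure times (ms), request size (bytes), type (R/W), LBA.
@dataclass
class Request:
    arrival_ms: float
    departure_ms: float
    nbytes: int
    is_read: bool
    lba: int

def idle_stretches(requests, min_idle_ms=100.0):
    """Return the idle intervals (in ms) longer than min_idle_ms between
    the departure of one request and the arrival of the next."""
    reqs = sorted(requests, key=lambda r: r.arrival_ms)
    idle = []
    busy_until = reqs[0].departure_ms
    for r in reqs[1:]:
        gap = r.arrival_ms - busy_until
        if gap > min_idle_ms:
            idle.append(gap)
        busy_until = max(busy_until, r.departure_ms)
    return idle

# Toy trace: three requests with one long idle gap after the second.
trace = [
    Request(0.0, 5.0, 4096, True, 1000),
    Request(6.0, 12.0, 8192, False, 2048),
    Request(500.0, 507.0, 4096, True, 1000),
]
print(idle_stretches(trace))  # one idle stretch of 488 ms
```

The same per-request records also support reconstructing the arrival and service processes or replaying the trace in a disk-level simulator.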
2.2 The Hour Traces
SMART, which stands for Self-Monitoring, Analysis, and Reporting Technology, is part of modern disk drives and monitors disk operation via a set of attributes. The monitored attributes are recorded every two hours in a fixed-size log, in the form of a fixed-size frame. As a result, every new frame eventually overwrites an existing one. Extracting the SMART logs periodically from disk drives enables off-line construction of longer logs. The extracted SMART logs available to us hold about 120 frames each, which means that every log covers at most 240 hours of monitored disk activity. These SMART logs have been extracted every several weeks for up to three years from a large population of enterprise drives from a single storage system vendor¹. As a result, for every drive we have constructed, off-line, a log of monitored activity covering an extended period of time at the two-hour granularity. Note that the activity recorded for each drive has gaps, because the logs have been extracted less frequently than the SMART frames are overwritten (every 240 hours in our case).

Only a few SMART attributes contain workload information. Specifically, we use the number of bytes read and written during each recorded two-hour interval and the corresponding drive age. In our analysis, we distinguish the drive families, i.e., 15,000 RPM and 10,000 RPM, and the capacities. The 15K RPM family is usually deployed in systems with higher demands on performance and reliability than the 10K RPM family. We describe the size of the two sets of traces in Table 2.

The Hour traces consist of a set of time series, one per drive and per attribute (bytes read or written). Each time series is equal in length to the number of extracted frames. However, because of the gaps in the measurements, the time span of each time series is shorter than the measurement period, as indicated in Table 2.

  Drive     Capa-    Disks    Avg. span   Avg. Num.
  Family    city     in set   of data     of Frames
  15K RPM   Medium    7,193   1.42 yrs.   1,856
  15K RPM   Low       2,943   1.40 yrs.   1,525
  10K RPM   High     15,440   1.50 yrs.   3,478
  10K RPM   Medium   37,734   1.60 yrs.   3,420
  10K RPM   Low      13,624   1.66 yrs.   3,546

  Table 2. The size of the Hour traces.

The advantages of these traces include the large population of monitored drives and the length of the monitoring period. As a result, we can draw conclusions with regard to workload dynamics across time and across disks in a family. The drawback of the data set is associated with the nature of the recorded information.
The available information, in the form of the volume of bytes read and written at the two-hour granularity, does not allow reconstruction of the fine-grained per-request disk activity enabled by the Millisecond traces. Also, these traces are captured from disks deployed in the storage systems of a single vendor, which may limit the generality of the analysis.

¹ We have no information on the applications running on the measured systems or the storage configuration.
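The off-line reconstruction of a per-drive time series from periodically extracted SMART logs can be sketched as follows, assuming each extraction yields frames keyed by drive age at the two-hour granularity. The data layout and the values are hypothetical; overlapping frames collapse to a single entry, and slots never captured remain as gaps.

```python
# A sketch of off-line log reconstruction: each SMART extraction yields
# (drive_age_hours, bytes_read, bytes_written) frames.  Frames from
# successive extractions may overlap (the log is circular) or leave gaps
# (extraction slower than the 240-hour window).
def merge_extractions(extractions):
    """Merge frames keyed by drive age into one sorted time series.
    Overlapping frames collapse; uncaptured 2-hour slots stay missing."""
    series = {}
    for frames in extractions:
        for age, rd, wr in frames:
            series[age] = (rd, wr)  # later extraction wins on overlap
    return sorted(series.items())

# Two extractions: the second overlaps the first at age 104 and
# extends the series to age 106 (byte counts are made up).
first = [(100, 5, 1), (102, 7, 0), (104, 3, 2)]
second = [(104, 3, 2), (106, 9, 4)]
print(merge_extractions([first, second]))
```

Gaps in the merged series show up simply as missing age keys, which matches the paper's observation that the time span of a series exceeds the number of frames it holds.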
2.3 The Lifetime Traces
Upon return to the vendor, disk drives go through a test to determine their condition for warranty purposes. During this test, a set of attributes is extracted from the drives. These attributes contain up-to-date cumulative information on drive operation. Two of these attributes, the volumes of bytes read and written throughout the life of the drive, are used in our evaluation. The lifetime of a drive, measured in hours of operation, is also one of the collected attributes. As a result, these traces contain only a single value per attribute per drive.

Similar to the Hour traces, we categorize the drives by family, i.e., 15,000 RPM and 10,000 RPM. We also distinguish the drives by age: less or more than one month. The reason for the age-based categorization is to separate drives that have experienced only activities typical of infancy, such as drive integration into the system, from drives that have experienced regular activities in addition to the integration activities in infancy. Table 3 gives the size of our Lifetime traces.

  Drive     Total     Less than one   More than one
  Family    drives    month old       month old
  10K RPM   197,013    43,999         153,014
  15K RPM   108,649    19,557          89,092

  Table 3. The size of the Lifetime traces.

The main advantage of this data set is that it includes a large number of drives from the entire spectrum of enterprise applications and all storage system vendors. The drawbacks of the Lifetime traces are related to the granularity of the information: the cumulative attributes allow only calculation of the average read and/or write demand per drive. Another concern is possible bias in the sampling of this drive population, given that the drives have been returned and may have operational issues. The common belief is that issues with drive operation are correlated with the volume of work. However, the analysis in this paper suggests that this is not a valid concern for the set of drives in the Lifetime traces.
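Since the Lifetime traces provide only cumulative byte counters plus the hours of operation, the only derivable rate is a per-drive lifetime average. A minimal sketch, with illustrative numbers not taken from the paper:

```python
# The Lifetime traces give only cumulative counters, so the finest rate
# that can be derived is the lifetime average demand per drive.
def avg_demand_mb_per_hour(bytes_read, bytes_written, lifetime_hours):
    """Average read and write demand in MB/hour over the drive's life."""
    mb = 1024 * 1024
    return (bytes_read / mb / lifetime_hours,
            bytes_written / mb / lifetime_hours)

# Illustrative drive: 9 TB read, 3 TB written over one year (8,760 hours).
rd, wr = avg_demand_mb_per_hour(9 * 1024**4, 3 * 1024**4, 8760)
print(round(rd, 1), round(wr, 1))  # about 1077.3 MB/h read, 359.1 MB/h written
```

Any burstiness within the drive's lifetime is invisible in these averages, which is why the Millisecond and Hour traces are needed to assess workload dynamics.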
3 Workload Characterization
As described in the previous section, the data sets that we analyze here differ from each other with respect to the available information and its granularity. To generalize the workload characteristics, we calculate similar metrics for each data set, such as the amount of bytes read and written per unit of time and analyze
3.1