Elastic Queue: A Universal SSD Lifetime Extension Plug-in for Cache Replacement Algorithms

Yushi Liang

Yunpeng Chai

Renmin University of China {liangyushi, ypchai}@ruc.edu.cn

Ning Bao
National University of Singapore
[email protected]

Hengyu Chen
University of Florida
[email protected]

Yaohong Liu
Renmin University of China
[email protected]

Abstract

Flash-based solid-state drives (SSDs) are increasingly deployed as a second-level cache in storage systems because of the noticeable performance acceleration and their transparency to the original software. However, the frequent data updates of existing cache replacement algorithms (e.g., LRU, LIRS, and LARC) cause too many writes on SSDs, leading to short device lifetimes and high costs. SSD-oriented cache schemes that issue fewer SSD writes have fixed strategies for selecting cache blocks, so we cannot freely choose a suitable cache algorithm that adapts to application features for higher performance. Therefore, this paper proposes a universal SSD lifetime extension plug-in called Elastic Queue (EQ), which can cooperate with any cache algorithm to extend the lifetime of SSDs. EQ reduces the data updating frequency by elastically extending the eviction border of cache blocks, making SSD devices last much longer. Experimental results based on several real-world traces indicate that, for the original LRU, LIRS, and LARC schemes, adding the EQ plug-in reduces their SSD write amounts by a factor of 39.03 and improves their cache hit rates by 17.30% on average at the same time.

Categories and Subject Descriptors: D.4.2 [Storage Management]: Secondary storage

Keywords: Flash, SSD, Cache, Endurance, Lifetime

1. Introduction

Aiming at better quality of service, many companies deploy Flash-based solid-state drives (SSDs) as a second-level



cache to boost the performance of storage or database systems ([10, 21, 23, 29]). The SSD caching solution is both cost-effective and easy to configure for improving the performance of storage servers equipped with traditional hard disk drives (HDDs). Many widely used cache algorithms have been proposed over the years, such as LRU, LFU, LIRS ([15]), and ARC ([20]). These algorithms are designed according to different principles and have their own advantages in adapting to distinctive application features. Thus, across different application scenarios it is hard to find a universally outstanding cache scheme that always outperforms the others. We have conducted a series of experiments on the behaviors of LRU, LFU, LIRS, and ARC under different application environments. As Table 1 shows, LIRS achieves the highest hit rates for Random File Access and Video-On-Demand, but shows relatively poor performance for the other two applications, for which LFU and ARC achieve the highest hit ratios, respectively. Usually we can find the most appropriate cache algorithm for a certain application under the guidance of experience or a series of tests; this specific scheme can then be employed in the practical production environment to achieve the best performance.

Table 1: Hit ratios of four classic cache algorithms in different application scenarios (SSD cache size is set to 8% of the total storage space).

Application          LRU     LFU     LIRS    ARC
Cloud Storage        78.4%   83.6%   80.1%   80.4%
Random File Access   13.5%   10.1%   30.8%   24.5%
Web File Server      16.5%   18.1%   16.6%   18.3%
Video-On-Demand      16.0%   11.7%   22.5%   19.8%

However, a serious challenge that hinders SSD caches from being widely used lies in the limited write endurance of SSD devices ([7, 13]). For example, assume that the SSD lifetime is expected to be 5 years. Intel 910 and Micron P300, two of the most popular high-end enterprise SSDs, can only stand being written 7 to 10 times their own capacities each day. For example, the Intel 910 endures 14PB of lifetime writes,

so for 5 years of usage, its write amount for each day is limited to 7.67TB, which is about 10 times its 800GB capacity ([3, 28]). Such a strict write limitation of SSDs is not enough for the classical cache algorithms, such as LRU, LIRS, etc., which keep updating the cached contents frequently to maintain a high cache hit rate.

Some advanced cache algorithms have been proposed (e.g., SieveStore ([24]), LARC ([14]), L2ARC ([12]), WEC ([8]), and ETD-Cache ([9])) to alleviate the I/O stress on SSDs, since the traditional cache algorithms do not suit the new SSD devices well due to their heavy write pressure. Although these improvements take the endurance problem into account, the way they select blocks to cache is inherent and fixed (see Section 2 and Table 2 for more). Thus, we cannot freely choose the most appropriate cache algorithm according to application features, which is accompanied by a loss of performance. Therefore, none of the existing cache solutions can meet the following two conditions of a satisfactory SSD cache scheme at the same time: 1) adaptability to applications (SieveStore, etc. ×); 2) reducing SSD writes (LRU, etc. ×).

To solve this problem, we need a universal SSD lifetime enhancement solution that can be coupled with any cache replacement algorithm. In this case, we can choose the best cache algorithm for a given application scenario, gaining both the performance advantage of that cache scheme and a remarkable reduction of the write amount to SSDs. In this paper, we propose a universal plug-in named Elastic Queue, which can cooperate with any cache replacement algorithm and mitigate the write pressure on SSDs simultaneously. The existing cache replacement algorithms are responsible for selecting the contents that are most worth caching in a specific application environment, while Elastic Queue is in charge of pinning them in the SSD cache with extended eviction borders (see Section 3) to reduce unnecessary SSD write operations. Our experimental results based on several real-world traces indicate that, compared with the original LRU, LIRS, and LARC schemes, deploying the Elastic Queue plug-in reduces their SSD write amounts by a factor of 39.03 and promotes their cache hit rates by 17.30% on average at the same time. Among these experiments, two thirds of the cases achieve higher hit ratios, and the number of cache hits contributed by writing each block into SSDs is enlarged by a factor of 45.78.

The rest of this paper is organized as follows. The related work is presented in Section 2. Section 3 gives an overview of our proposed Elastic Queue plug-in for cache schemes. Then in Section 4, we elaborate on Elastic Queue in detail, followed by the quantitative evaluation in Section 5. Finally, Section 6 concludes this paper with a summary of our contributions.

2. Related Work

2.1 Write Endurance of SSD

In recent years, Flash-based solid-state drives (SSDs) have been growing rapidly in the storage market ([4, 5]). SSDs have the same block I/O interface as hard disk drives (HDDs) and large enough storage capacities (i.e., hundreds of GB or even TB). What is more, the performance of SSDs is much higher than that of traditional HDDs ([3, 28]). Thus, in many corporations, SSDs have been deployed in storage systems, especially as caches to boost performance ([1, 2, 10, 11, 21, 23]).

However, the biggest challenge of SSDs is their write limitation, i.e., once an SSD has been written over a certain number of times, it is no longer reliable. The limited write endurance of SSDs comes from two aspects. First, the flash chips, which store the data inside SSDs, have to be erased before re-writing, and the number of erase operations each storage cell in a flash chip can endure is very limited. Second, an erase unit is much larger than a read/write unit, which means that devices have to do extra writing work to migrate the valid data from the to-be-erased unit to other places ([7, 27]). These additional write operations shorten the lifetime of SSDs further.

In addition, unlike RAM or HDDs, SSDs' read and write performance is imbalanced. Writing to an SSD is about one order of magnitude slower than reading, and the erase operation applied before any re-writing is about two orders of magnitude slower than reading ([4]). Therefore, SSD writes, especially overwriting operations, are much slower than reads. Considering SSDs' limited lifetime and slow writes, cache algorithms deployed on SSDs should limit the frequency of data updates to achieve both good performance and long endurance.

2.2 Cache Replacement Algorithms

A large number of cache algorithms have been proposed over the past decades. In addition to the classical LRU algorithm, there are many other outstanding cache schemes, such as FBR ([26]), LRU-k ([22]), 2Q ([17]), LRFU ([19]), CAR ([6]), Clock-Pro ([16]), MQ ([32]), LIRS ([15]), and ARC ([20]). However, these algorithms are designed with the assumption of adopting RAM as the cache device, which does not have the write endurance problem of SSDs. Thus, these algorithms usually rely on frequent data updates to achieve high hit ratios, which makes SSDs wear out quickly. Along with the popularization of SSDs, there is more and more research work on SSD-based caches ([18, 25, 27, 30, 31]). Some works, such as LARC ([14]), L2ARC ([12]), SieveStore ([24]), WEC ([8]), and ETD-Cache ([9]), are proposed to limit the write amounts of SSD caches to extend their lifetime. Among them, LARC is representative because of its low overhead and good effects, so we choose it as one of our evaluation targets (see Section 5).

For the traditional cache algorithms (e.g., LRU, LFU, etc.), which tend to use recency, frequency, or other metrics to identify the popularity of blocks, we can choose the most appropriate one to achieve the highest performance in any practical application. However, although the optimized SSD cache solutions consider the SSD lifetime limitation in their design, they usually have fixed strategies for selecting cache targets (summarized in Table 2). Therefore, they cannot adapt to all kinds of applications to achieve the highest hit ratio for a given application scenario.

Table 2: Strategy of selecting cache targets of improved SSD cache algorithms.

Algorithm    Selection strategy of cache objects
SieveStore   Selects hot, high-frequency blocks by only admitting into SSDs blocks that have been accessed more than a threshold number of times.
LARC         Manages the physical cache according to LRU and filters continuously accessed hot blocks into the SSD cache with a ghost LRU queue.
L2ARC        Manages cached data in RAM with ARC and periodically loads the to-be-evicted blocks into a FIFO queue on SSDs.
WEC          Manages SSD-cached data in an LRU queue.
ETD-Cache    Filters data blocks for SSDs with LRU queues and a sampling method.

3. Overview of Elastic Queue

In Section 3.1, we first construct a unified priority queue model for cache replacement algorithms, which also reveals their common problem of introducing too many SSD writes and motivates us to propose a universal plug-in for cache replacement algorithms, i.e., Elastic Queue, whose principle is presented in Section 3.2.

3.1 Unified Priority Queue Model of Cache Algorithms

Although cache algorithms look very different in principles and forms, all of them can be regarded as instances of a unified priority queue model, which sorts cache blocks by their priority of staying in cache according to each algorithm's own way of prioritizing data. Blocks given the highest priority are the most qualified to stay in cache, farthest away from eviction, while those with the lowest priority are evicted once the cache space is in short supply. Assume that the solid frame in Fig. 1 stands for the SSD cache space and all blocks are placed in descending priority order. When blocks walk through the cache border (i.e., are evicted from the SSD cache), they fall into the dashed area and wait for the next round of caching admission. With this model, we can see that the ideal cache blocks are the ones described in Fig. 1(a): their access intervals are small enough that they are always hit inside the solid frame, i.e., inside the cache border, and caching these data does not introduce additional SSD writes.


Figure 1: Priority Queue Model. All cache algorithms can be abstracted as a form of priority queue: (a) stands for the ideal cache blocks, which supply cache hits without walking through the cache border (i.e., without eviction), while (b) happens more frequently in practice, generating a lot of cache replacement since the blocks often cross the cache border. Our solution (c) overcomes the shortcoming of (b) and reaches an effect close to (a) by applying an elastic border.

In practice, however, hot blocks are easily driven out across the cache border, since the access intervals of cache blocks are usually very unstable, even for the hottest blocks [8], as Fig. 1(b) shows. Given their good popularity, these blocks are very likely to return to the cache when they are hit again, and that makes SSDs suffer extra writes. In fact, the situation shown in Fig. 1(b) is very common in real-world traces. To illustrate this issue, we conducted an experiment to analyze the eviction behavior of LRU. Fig. 2 plots the percentage of top m% blocks in the set of evicted cache blocks. The top m% blocks are the first m% blocks in the list of blocks ordered by their total access counts over the whole trace, i.e., the most accessed blocks. Note that all the tested traces are from real-world applications or benchmarks (see their details in Section 5 and Table 3). Throughout the test, we found that a substantial portion of hot data was evicted during occasional long intervals. For example, 31.94% of the victims belong to the top 10% blocks on average for the five traces, and the corresponding values for the top 20%, 30%, and 50% blocks are 52.50%, 64.17%, and 78.95%, respectively. This analysis shows that although existing cache algorithms can effectively identify hot data blocks, their common problem lies in the lack of protection for the cached blocks, allowing them to pass through the border easily and generate a large number of cache updates.
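To make this abstraction concrete, the following is a minimal sketch of a cache policy expressed as a priority queue with a cache border (our illustration, not code from the paper; the class and method names are assumptions). The replacement policy only decides the ordering, and whatever falls past the border is evicted:

```python
from abc import ABC, abstractmethod

class PriorityQueueCache(ABC):
    """Unified model: a replacement policy is only a rule for ordering blocks;
    the cache border (= cache size) decides which ordered blocks stay cached."""

    def __init__(self, cache_size):
        self.cache_size = cache_size   # the cache border, in blocks
        self.queue = []                # block ids, highest priority first

    @abstractmethod
    def reorder(self, block):
        """Policy-specific: place `block` at its new position after an access."""

    def access(self, block):
        hit = block in self.queue            # queue never grows past the border
        if hit:
            self.queue.remove(block)
        self.reorder(block)
        evicted = self.queue[self.cache_size:]   # blocks pushed past the border
        del self.queue[self.cache_size:]
        return hit, evicted

class LRUCache(PriorityQueueCache):
    def reorder(self, block):
        self.queue.insert(0, block)    # most recently used = highest priority

cache = LRUCache(cache_size=2)
for blk in ["a", "b", "a", "c"]:
    print(blk, cache.access(blk))      # accessing "c" pushes "b" past the border
```

Every eviction that is later followed by a re-admission of the same block costs one extra SSD write, which is exactly the pattern of Fig. 1(b) that Elastic Queue targets.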

3.2 Principle of Elastic Queue

To solve the problem shown in Fig. 1(b), some popular blocks can be pinned in the SSD cache until they cross a new, delayed elastic border (EB for short), as Fig. 1(c) illustrates. Since the original cache border has no eviction effect on the pinned blocks, this solution prolongs the pinned blocks' staying time in SSDs and significantly reduces the in-and-out traffic across the cache boundary, thus achieving an effect similar to Fig. 1(a).


Figure 2: The percentage of the top m% popular blocks in the evicted block set. The victim set of LRU contains a considerable proportion of top popular blocks, indicating that many hot blocks walk through the cache border and generate large amounts of SSD writes.

Based on this observation, we propose a universal lifetime enhancement mechanism called Elastic Queue (EQ) that adds a protection mechanism for the already cached blocks. Elastic Queue can work with any cache algorithm as a plug-in, making use of the algorithm's rules to select the contents most worth caching in a specific application environment, while the plug-in itself is in charge of pinning the cached blocks in SSDs and extending their eviction borders elastically. Coupled with EQ, cache schemes can prevent evictions caused by the transient cooling of some popular blocks. As a result, popular blocks cached in SSDs stay much longer and contribute more hits, and the amount of SSD writes caused by unnecessary cache updates is sharply reduced.
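As a rough sketch of this principle (ours, with parameter names of our own choosing), the only change EQ makes to the eviction decision is that a pinned block is judged against its elastic border instead of the cache border:

```python
def should_evict(position, cache_border, elastic_extension=None):
    """Eviction test in the priority-queue model (position 0 = highest priority).

    - A general block is evicted as soon as it crosses the cache border.
    - A pinned block is evicted only when it crosses its own elastic border,
      which lies `elastic_extension` positions beyond the cache border.
    """
    if elastic_extension is None:                      # general (unpinned) block
        return position >= cache_border
    return position >= cache_border + elastic_extension   # pinned block

# A pinned block that cools down only briefly survives a transient demotion:
print(should_evict(position=1200, cache_border=1024))                        # True
print(should_evict(position=1200, cache_border=1024, elastic_extension=512)) # False
```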

4. Details of Elastic Queue

In this part, we first elaborate on the metadata management of Elastic Queue in Section 4.1. Then, two important modules, i.e., Block Pinning and Block Unpinning, are presented in Sections 4.2 and 4.3, respectively.


4.1 Metadata Management

The framework of Elastic Queue is illustrated in Fig. 3. Fig. 3(a) shows a full priority queue managed by a cache algorithm, including all the blocks in the system; the cache border is set according to the size of the SSD cache. Our proposed Elastic Queue plug-in introduces a new eviction border, i.e., the elastic border (EB), for the pinned blocks cached in SSDs to protect these blocks from unnecessary cache replacement, and Elastic Queue assigns an individual elastic border to each pinned block according to its popularity.


Figure 3: Metadata management of Elastic Queue: Elastic Queue (EQ) has an elastic length to hold both the hot blocks identified by an appropriate cache algorithm under a specific application environment and the pinned blocks cached in SSDs, which have elastic borders that let them stay longer in SSDs for fewer writes.

There are two kinds of blocks in Elastic Queue: general blocks and pinned blocks. Their main difference lies in that only the pinned blocks are actually stored on SSD devices. Once the priority of a general block drops and the block walks through the cache border, its metadata is immediately evicted from Elastic Queue and it becomes a grey block as marked in Fig. 3(a). The distance between the current location of a block in the queue and its eviction border (i.e., the elastic border for pinned blocks and the cache border for general blocks) is called the Distance-To-Border (DTB). For example, as shown in Fig. 3(a), block 6's DTB is 2, so if it is pushed back by two hot blocks, it will be discarded by Elastic Queue. The pinned block 7's elastic border is 4 block units away (i.e., the DTB of block 7 is 4). In order to reduce the overhead of maintaining such a long queue as in Fig. 3(a), we can remove the records of the evicted general blocks (i.e., the grey ones in Fig. 3(b)); any pinned block in Fig. 3(b) can still be mapped to its actual position in the full priority queue by recording the count of evicted general blocks ahead of it. In addition, a Block Pinning module and a Block Unpinning module are designed in EQ to select the most valuable cache blocks for the SSD cache and to determine the elastic border of each pinned block in SSDs, respectively.

We can use some examples to explain the detailed working mechanism of Elastic Queue. As shown in Fig. 4, assuming EQ is managed by LRU, blocks 1, 2, 3, 4, 5, 6, 7 are the most recently used (MRU) ones at the beginning and are located before the cache border. Blocks 1, 4, 6, a, b, c, d are already pinned in the SSD cache, while the other blocks are general blocks whose metadata are recorded. In these examples, general block 3 cannot be moved into the SSD when it is promoted to the MRU end of EQ, since the cache is full of pinned blocks. A hit on pinned block a drives general block 7 out of the cache border and makes it evicted directly. Then, although pinned block 6 takes the LRU position and should be pushed out of cache by the newly arriving block 8, it is protected by EQ until it reaches its own elastic border. Instead, the arrival of block 8 pushes block c beyond its elastic border; as a result, block c is unpinned and removed from the cache space. As a free slot in the SSD is now available, EQ will load the most valuable general block (e.g., block 8) into the SSD cache, pin it for protection, and assign an elastic border to the block. The crucial problems that Elastic Queue faces are (1) how to identify the most popular blocks that are worth protecting and (2) where their elastic borders should be located. These two problems are addressed by the Block Pinning and Block Unpinning modules of EQ, presented in the following two parts.

Figure 4: Some examples of the metadata management in Elastic Queue.
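The DTB bookkeeping of Fig. 3 can be sketched as follows (a simplified illustration of ours, not the paper's implementation: the real module only shifts the blocks that a promotion actually passes, while this sketch ages every tracked block uniformly):

```python
from dataclasses import dataclass

@dataclass
class BlockMeta:
    block_id: str
    pinned: bool   # pinned blocks are the ones physically stored on the SSD
    dtb: int       # Distance-To-Border: positions left before its eviction border

# The compressed queue of Fig. 3(b): general blocks are measured against the
# cache border, pinned blocks against their own (farther) elastic borders.
elastic_queue = [
    BlockMeta("6", pinned=False, dtb=2),   # two more promotions push it out
    BlockMeta("7", pinned=True,  dtb=4),   # its elastic border is 4 slots away
    BlockMeta("a", pinned=True,  dtb=3),
]

def after_promotions(queue, k):
    """k hot blocks were promoted ahead of these blocks; whoever runs out of DTB
    crosses its border (a general block is simply forgotten, a pinned block is
    unpinned and its SSD slot is freed)."""
    survivors, crossed = [], []
    for meta in queue:
        meta.dtb -= k
        (survivors if meta.dtb > 0 else crossed).append(meta)
    return survivors, crossed

elastic_queue, crossed = after_promotions(elastic_queue, k=2)
print([m.block_id for m in crossed])   # ['6']: the general block is evicted first
```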


4.2 Block Pinning

If a cache algorithm outperforms the others prominently in some applications, we can say that the way it chooses cache blocks is very effective in these scenarios. Hence, we can utilize this scheme for selecting blocks in the Block Pinning module. However, we cannot simply choose the hottest general block in the priority queue when a free slot appears in the SSD cache, because that only means the selected block is the hottest at one point in time. In fact, the locations of hot blocks in a priority queue usually fluctuate sharply, so a better solution is to observe for a while to obtain the priority change tendency of blocks and to predict their popularity more accurately. If a block usually appears near the high-priority end of the queue and shows no sign of decreasing popularity, it is believed to be a worth-caching block in the application scenario.

4.2.1 Priority Snapshot

Following the above idea, we design a Priority Snapshot mechanism to watch the popularity trend of data blocks with the help of the deployed cache algorithm. Periodically, we record the priority (i.e., the location in EQ) of all the general blocks by taking a snapshot of EQ. When Elastic Queue needs to pin a block in the SSD cache, it analyzes all the recorded snapshots and judges which block tends to remain popular and is worth staying in cache in the following period. In addition, only the most recent snapshots are reserved to lower the overhead.

Generally, when we need to select blocks from the general blocks in EQ, the Block Pinning module first selects the blocks with higher average priority. When the values of multiple blocks are close to each other, we utilize the Priority Snapshot mechanism for selection. In Fig. 5, the priority trend curves of two blocks are plotted; Elastic Queue takes nine snapshots at time points #1 to #9, which gives an approximate view of the priority variation tendency by applying linear regression analysis. The block in Fig. 5(a) has ascending priority values according to the EQ snapshots, indicating that it becomes more and more popular according to the deployed cache scheme. On the contrary, the block in Fig. 5(b) gets a lower priority each time a snapshot is taken, an obvious indication that the block is losing popularity. Therefore, although both blocks have similar and relatively high average priority values, the block in Fig. 5(a) is believed to be very helpful in generating cache hits in the following period and should be protected by Elastic Queue.

Figure 5: Examples of Priority Snapshot. When the average priority values of the blocks in (a) and (b) are similar, we can determine which one to cache according to their priority change tendencies: the block in (a) shows a general uptrend and the one in (b) is going downwards, so the former is regarded as the better one to keep in cache.
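The snapshot analysis can be sketched as a simple least-squares trend over a block's recorded queue positions (an illustration of ours; the paper does not spell out the exact scoring formula, so the priority normalization and tie-breaking below are assumptions):

```python
def pinning_score(snapshots, queue_len):
    """Score a general block from its recorded queue positions, one per snapshot
    (position 0 = highest priority); assumes at least two snapshots. A higher
    average priority is better, and a rising trend breaks ties (cf. Fig. 5)."""
    n = len(snapshots)
    prios = [1.0 - pos / queue_len for pos in snapshots]   # head of queue -> 1.0
    mean = sum(prios) / n
    # Least-squares slope of priority over snapshot index (linear regression);
    # a positive slope means the block's popularity is still growing.
    x_mean = (n - 1) / 2
    slope = sum((x - x_mean) * (p - mean) for x, p in enumerate(prios)) / \
            sum((x - x_mean) ** 2 for x in range(n))
    return mean, slope

# Two blocks with the same average priority but opposite trends:
rising  = [900, 700, 650, 500, 400, 350, 300, 250, 200]   # moving toward the head
falling = [200, 250, 300, 350, 400, 500, 650, 700, 900]
print(pinning_score(rising, queue_len=1000))    # positive slope -> pin this one
print(pinning_score(falling, queue_len=1000))   # negative slope -> skip
```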

4.2.2 Pinning Blocks

By counting the number of pinned blocks, the Block Pinning module can find out whether the SSD cache has available space. When a pinned cache block is unpinned (i.e., kicked out of the Elastic Queue), it is removed from the SSD device and leaves a free slot in the SSD cache to accept a new one. At this point, based on the aforementioned snapshot analysis, we can select the most popular and promising general block to pin in EQ and store in SSDs. The newly pinned block will be given an elastic border (see Section 4.3 for the detailed method).

4.2.3 Black List

If a cold block happens to be pinned in SSDs, it will occupy valuable SSD space for a relatively long time due to the protection mechanism of EQ, leading to lower cache hit rates. Therefore, we set up a blacklist for Elastic Queue in this module to prevent the re-entrance of unpopular data. If a block passes through its elastic border and is evicted from SSDs, it is appended to the blacklist and forbidden from entering SSDs again for a period of time. The blacklist of Elastic Queue has a fixed length and is managed in FIFO order, so the blocks in the blacklist are replaced after a decent interval, considering that their popularity may change as time goes by.
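A fixed-length FIFO blacklist is straightforward to realize; the sketch below (illustrative, with an arbitrarily chosen capacity) is one way to express it:

```python
from collections import deque

class Blacklist:
    """Fixed-length FIFO blacklist: a block evicted across its elastic border
    may not be pinned again until enough later evictions push it out."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)   # oldest entries fall off automatically

    def add(self, block_id):
        self.entries.append(block_id)

    def banned(self, block_id):
        return block_id in self.entries

bl = Blacklist(capacity=3)
for victim in ["x", "y", "z", "w"]:       # "x" is the oldest and gets pushed out
    bl.add(victim)
print(bl.banned("x"), bl.banned("w"))     # False True
```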

4.3 Block Unpinning

For pinned blocks in the SSD cache, we need to protect them from being driven out by extending their DTBs. However, some cold blocks will inevitably be mixed in with the cached blocks, and hot data will not stay popular forever. Thus, once a pinned block is squeezed past its elastic border by better blocks, it will be unpinned by the Block Unpinning module and will leave the cache device to make room for new popular ones. The default DTB of pinned blocks is an essential parameter, and its value is closely related to the active block number (ABN) of the storage system. Generally, most contents in a storage system are inactive; only a small part of the data is accessed within a past time window. We therefore use the number of these unique active data blocks to set an appropriate standard DTB value. The impacts of different DTB settings are discussed based on a series of experiments (see Section 5.3).
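For example, the default DTB could be derived from the ABN roughly as follows (a sketch under our own naming; the window contents and multiplier are assumptions, the latter being the 1x to 6x knob studied in Section 5.3):

```python
def default_dtb(recent_accesses, multiplier=1):
    """Default Distance-To-Border, derived from the active block number (ABN):
    the count of unique blocks touched in a recent window of the trace."""
    abn = len(set(recent_accesses))
    return multiplier * abn

trace_window = [7, 3, 7, 9, 3, 1, 7, 2]         # block ids seen in the last window
print(default_dtb(trace_window, multiplier=4))  # ABN = 5 unique blocks -> DTB = 20
```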

4.3.1 Priority Distribution

Different blocks certainly show different popularity, so we should not assign the same DTB to all of them. If a top data block is pinned, it should be well protected. Recalling that even the hottest data inevitably have some long access intervals, Elastic Queue should give such blocks longer DTBs to lower the risk of evicting them. In contrast, if a relatively less popular block is pinned, we do not need to maintain a very low eviction risk, so its DTB can be shortened a bit to save SSD cache resources. The challenge is how to classify the cached blocks accurately. Although the Priority Snapshot mechanism provides a good evaluation of block popularity according to the deployed cache scheme, we usually only record a few recent snapshots to decrease the overhead, since each snapshot must record the priority of all blocks in EQ. Therefore, in the Block Unpinning module, we use another statistical method with lower overhead to identify the priority distributions of blocks over a longer time and determine the DTB of each block accordingly. By counting the hits of every block in each segment of the elastic queue, we can observe the access interval distribution of a block. As Fig. 6 shows, block a leaves many footprints within or near the SSD cache border; its priority distribution indicates that it generally has short access intervals and is a hot block in the long run. Thus, we should grant block a a longer DTB for better protection. Meanwhile, block b has larger access intervals and is less popular than block a, so it deserves a relatively shorter DTB.


Figure 6: Priority distribution of two representative blocks. We count the access times of each pinned block when it is located in different segments of the full priority queue. In the long-term view, block a shows higher popularity because most of its hits are located in the head of the full priority queue, so it is given a longer DTB, while block b is less worthy of protection and gets a shorter DTB.

4.3.2 Data Classification

In Elastic Queue, we simplify the above method by treating the two parts before and after the cache border in EQ as two segments, and divide the cache blocks into the following three types:

1) Very hot data. These data blocks' accesses cluster inside the SSD cache space and have short intervals in most cases (e.g., block a in Fig. 6). The DTBs of these blocks are enlarged based on the default DTB value to lower the risk of losing them.

2) Stable data. This kind of data often enters the area between the cache border and the elastic border, and then returns to the head of EQ when it is hit (e.g., block b in Fig. 6). The popularity of these data blocks is stable, but the distance between their adjacent requests is often much larger than the depth of the SSD cache. In fact, the Elastic Queue mechanism is very effective for this kind of data block, keeping them in the SSD cache much longer than the original cache schemes would. Their feature is that the hit-count peak appears in the segment following the cache border, and we assign them the default DTB.

3) New data. For newly arriving blocks with a short recorded history, there is not enough access information to judge their popularity accurately. They may become very hot data, stable data, or bad data that will not be accessed for a long time. Therefore, we assign a reduced DTB value based on the default one for this kind of block.
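A minimal sketch of this three-way classification follows (illustrative only; the history threshold and the 2x and 1/2 scaling factors are our assumptions, not constants given in the paper):

```python
def classify_and_assign_dtb(hits_inside, hits_between, history_len,
                            default_dtb, min_history=8):
    """Assign a per-block DTB from its hit distribution over the two segments of
    the elastic queue (before / after the cache border)."""
    if history_len < min_history:
        return "new data", default_dtb // 2          # not enough evidence yet
    if hits_inside >= hits_between:
        return "very hot data", default_dtb * 2      # hits cluster inside the cache border
    return "stable data", default_dtb                # hits peak just behind the border

print(classify_and_assign_dtb(hits_inside=12, hits_between=3,
                              history_len=15, default_dtb=1000))  # ('very hot data', 2000)
print(classify_and_assign_dtb(hits_inside=2, hits_between=9,
                              history_len=11, default_dtb=1000))  # ('stable data', 1000)
```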

5. Evaluation

In order to evaluate the performance of the proposed Elastic Queue mechanism, we have implemented the representative LRU, LIRS, and LARC cache schemes, as well as our proposed Elastic Queue plug-in that enhances the SSD write endurance of all these schemes, in a trace-driven simulated caching

system. The simulation system is designed and developed by us in about 5,000 lines of Erlang, benefiting from parallel execution to accelerate the evaluation process. All the traces used in the evaluation are from real-world applications or benchmarks; they are listed in Table 3. To observe the performance of Elastic Queue thoroughly, we adopt the following three metrics:

• Cache hit ratio, which stands for performance. Note that the total hit ratio of the whole cache system, including both RAM and SSD, is adopted as the performance metric in this section. The RAM cache is set to a very small size compared with the SSD cache, so the SSD cache is the dominant factor for this metric.

• Amount of SSD written data, which directly reflects the lifetime of SSD devices.

• Write efficiency of SSD, which is defined as the total SSD hit count divided by the amount of all SSD writes. It indicates the average performance benefit of writing a single block into the SSD cache (see the short sketch after Table 3).

Table 3: Five real-world traces used for the analyses (the request size is 4KB).

Trace Name      Application Type    Request Count
as              File Server         215,678
cctv            Video-On-Demand     550,310
filebench-rfa   File Server         2,000,000
meta-join       Cloud Storage       554,561
data-slct       Cloud Storage       419,723
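For reference, the three metrics can be computed from the simulator's counters as follows (a sketch of ours with made-up numbers, not results from the paper):

```python
def cache_metrics(ram_hits, ssd_hits, total_requests, ssd_writes):
    """The three evaluation metrics: total hit ratio (RAM + SSD), SSD write
    amount, and write efficiency = SSD hits per block written to the SSD."""
    return {
        "total_hit_ratio": (ram_hits + ssd_hits) / total_requests,
        "ssd_write_amount": ssd_writes,
        "write_efficiency": ssd_hits / ssd_writes if ssd_writes else float("inf"),
    }

print(cache_metrics(ram_hits=1_000, ssd_hits=40_000,
                    total_requests=215_678, ssd_writes=5_000))
```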

5.1 Overall Results

In this part, the overall results of our proposed Elastic Queue mechanism for enhancing LRU, LIRS, and LARC are presented. The RAM size is set to 0.5% and the SSD size to 8% of the storage capacity. The results of LRU, LIRS, and LARC, and of their enhancements coupled with Elastic Queue, are compared in Fig. 7.

5.1.1 Cache Hit Ratio

Fig. 7(a), (b), and (c) plot the cache hit ratios of LRU, LIRS, and LARC, respectively. Elastic Queue influences the cache hit rates differently across traces. For cctv, filebench-rfa, and meta-join, Elastic Queue shows a noticeably positive effect, while for as and data-slct it brings slight decreases. In most cases, existing cache algorithms enhanced with Elastic Queue achieve somewhat higher cache hit rates than the original ones. For LRU, LIRS, and LARC under all five traces, Elastic Queue leads to higher hit rates in 66.67% of the cases, and the improvement of the cache hit ratio is 17.30% on average. The results indicate that the existing cache algorithms can achieve a higher average performance when coupled with the Elastic Queue plug-in, even though the data updating

frequency is reduced significantly (see the results on SSD writes below). That is to say, the average quality of the cached data does not decrease because cached data are updated less frequently; instead, it increases, because some low-quality data are forbidden from entering SSDs. The combined solution effectively selects the high-quality data and keeps them in the SSD cache for a long time.

5.1.2 SSD Write Amount

In Fig. 7(d), (e), and (f), the SSD write amounts are normalized to that of the as trace with Elastic Queue, and note that the Y axis is in logarithmic scale. From these figures, we can see that the SSD write amounts are decreased by at least an order of magnitude when Elastic Queue is deployed, for all the traces. For the trace meta-join, the SSD write reduction reaches more than two orders of magnitude for LRU and LIRS. Compared with the original cache algorithms, Elastic Queue reduces the amount of SSD written data by a factor of 39.03 on average. Considering its positive effects on cache hit rates in the above experiments, Elastic Queue extends the lifetime of the SSD devices used as a second-level cache by nearly 40 times without performance loss. Thus Elastic Queue can be deployed in enterprise storage systems to save a considerable part of the costs spent on SSD replacement.

5.1.3 SSD Write Efficiency

Write efficiency reflects the performance benefit of introducing one block into the SSD cache, so it is an important metric to evaluate cache schemes for devices with write limitations (e.g., SSDs). The results of write efficiency are illustrated in Fig. 7(g), (h), and (i), and the Y axes in these figures are also in logarithmic scale. It is easy to see that the Elastic Queue plug-in improves the write efficiency by at least an order of magnitude over the original cache algorithms in most cases, and the average growth is a factor of 45.78, benefiting from both the SSD write amount reduction and the increase of cache hit ratios. In a word, the experimental results in Fig. 7 demonstrate that the Elastic Queue plug-in does help the existing cache algorithms to reduce the SSD write amounts and to improve the write efficiency of the SSD cache significantly (i.e., both by more than an order of magnitude). What is more, Elastic Queue achieves these results while maintaining similar or even higher cache hit ratios at the same time.

5.2 Effectiveness Analyses of Introducing EQ

In this part, we give a detailed analysis of the effectiveness of adopting the Elastic Queue plug-in with existing cache algorithms. The analysis covers the following two aspects.

5.2.1 Improvement on No-hit Percentage

Our proposed Elastic Queue plug-in for existing cache algorithms extends the eviction border of cached blocks elastically, so as to prevent the already cached blocks from being evicted too easily. Thus the number of cached blocks that never get a chance to generate a single cache hit is much reduced. Therefore, as plotted in Fig. 8, we made a statistical analysis of the no-hit block percentage before and after equipping the Elastic Queue plug-in on top of the classical LRU scheme. There is an obvious decrease of the no-hit percentage when Elastic Queue is coupled with LRU; the decline across the five traces varies from 42.61% to 61.50%. This result shows that large amounts of useless or low-efficiency SSD writes are eliminated by the Elastic Queue plug-in.

Figure 7: Overall results: cache algorithms with the enhancement of Elastic Queue achieve a similar total hit ratio of the cache system and decrease the SSD write amount by about an order of magnitude at the same time. As a result, the write efficiency of the SSD cache increases significantly. Note that the Y axes in all the write amount and write efficiency figures are in logarithmic scale.

5.2.2 Hotness of Pinned Blocks

In the traditional view, the blocks evicted by cache schemes (i.e., those walking through the cache border shown in Fig. 3) are not popular enough to be kept in the cache devices. Thus, a concern about the Elastic Queue plug-in is that the pinned blocks between the cache border and the elastic border may be unpopular and lower the SSD cache hit rates. The Elastic Queue plug-in aims to protect popular blocks from being evicted too early due to occasional large access intervals. In fact, most of the pinned blocks located between the cache border and the elastic border remain popular in the following period. This can be observed from

the result of Fig. 9, which shows the average hit rates of the pinned blocks located between the cache border and the elastic border (i.e., their total hit counts divided by their block number) for the five traces. On average, 88.20% of these pinned blocks are hit, and the values for the five traces range from 78.67% to 98.91%.


Figure 8: No-hit block percentage improvement with the help of Elastic Queue: the EQ plug-in reduces worthless SSD writes significantly, extending the device lifetime.

This result indicates that Elastic Queue is effective in protecting popular data. Without Elastic Queue, these blocks would be discarded directly, and when they are accessed again, they would consume extra SSD writes for the same contents. Elastic Queue pins these blocks for a while, and 88.20% of them then return to the SSD cache space without putting any extra burden on SSD writes. This result also explains why plugging Elastic Queue into the original cache algorithms can reduce the SSD write amounts and improve write efficiency significantly.


Figure 9: On average, 88.20% of the pinned blocks between the cache border and the elastic border in Elastic Queue are hit again and return to a high-priority position, indicating that pinning these blocks for a longer time is effective. The re-writes these blocks would have incurred after eviction under the original cache algorithms are avoided by the Elastic Queue plug-in.

5.3 Impact of Default DTB

In this part, we conduct some experiments to show the influence of setting different default Distance-To-Border (DTB) values for pinned blocks under the trace as, which shows medium performance under Elastic Queue according to the

previous experimental results. DTB is an important parameter for the Elastic Queue mechanism, since it determines how much longer a pinned block can stay in the SSD cache before it is hit again (although different pinned blocks are assigned different DTBs based on the default value). Thus, DTB affects the quality of the cached data in SSDs. In the experiments, the default value of DTB in Elastic Queue ranges from 1 to 6 times the active block number (ABN) of the storage. As shown in Fig. 10(a) and (b), along with the increase of the default DTB value, the total cache hit ratio goes down slightly, while the SSD write amounts decrease sharply, especially when the value grows from 1x to 2x of the ABN. Fig. 10(c) indicates that the SSD write efficiency reaches its highest value when the threshold is set to 4 to 6 times the ABN, while a too-large threshold makes the write efficiency decrease. Therefore, the default value of DTB should be set large enough: a too-small threshold introduces little difference from the original cache algorithms, leading to too many SSD writes, while a too-large threshold lets cold blocks occupy most of the SSD cache for a long time, resulting in a decrease of data quality. In all the above experiments, we set the default value of DTB to 1x of the ABN for all the traces. In fact, different traces should be configured with different settings to achieve the best performance; for the trace as, considering both the write efficiency and the total hit ratio, we think that 4x to 5x of the ABN is appropriate in a practical environment.

5.4 Impact of SSD Size

The adaptability of Elastic Queue to different SSD size settings is evaluated in this part. We again adopt the trace as in the experiments, and the eviction threshold is set to 4x the ABN of as. As Fig. 11 plots, Elastic Queue works very well under all capacity settings of the SSD cache, ranging from 2% to 10% of the storage. As Fig. 11(a) shows, the total hit ratios of the original LRU and of LRU coupled with Elastic Queue (a.k.a. LRU+EQ) are close to each other under all the SSD size settings. For the SSD write amount, LRU+EQ always incurs about two orders of magnitude fewer SSD writes than LRU. Along with the increase of SSD sizes, there is a slight descending trend in the write amounts of the original LRU relative to LRU+EQ (see Fig. 11(b)), because larger SSD sizes lead to higher hit rates and fewer cache misses, which are the source of SSD writes for the original LRU. However, this phenomenon is not an advantage of LRU, because smaller SSD devices are more demanding of low write pressure to extend the device lifetime. Elastic Queue keeps stable cache hit rates and SSD write amounts under all SSD sizes, so LRU+EQ achieves high and stable write efficiency, which is one order of magnitude higher than the original LRU, as Fig. 11(c) plots. In a word, the Elastic Queue plug-in is effective under various capacity settings of the SSD cache.


Figure 10: Impacts of the default DTB: 1) it should be set to a large value, comparable with the active block number (ABN) of the storage, to get satisfactory effects; 2) larger values lead to both lower hit rates and less write pressure, so an appropriate value should be adopted according to the system performance requirements.


Figure 11: Impacts of SSD size: algorithms with the enhancement of Elastic Queue achieve similar cache hit ratios and more than an order of magnitude improvement in both reducing SSD write amounts and promoting write efficiency under various SSD size settings.

6. Conclusion

In this section, we conclude this paper with a summary of our contributions:

1) SSD caches cannot sustain the frequent data updates caused by traditional cache algorithms (e.g., LRU, LIRS, LARC, etc.) due to their limited write endurance, while some cache schemes optimized to reduce SSD writes have fixed strategies of selecting cache blocks, losing the freedom to choose the most appropriate cache scheme for the best performance. Therefore, we propose a universal SSD lifetime enhancement plug-in for cache algorithms called Elastic Queue (EQ), which can be coupled with any cache algorithm and thus combines the advantages of both kinds of solutions.

2) We propose a unified priority queue model for cache algorithms, and two methods, i.e., Priority Snapshot and Priority Distribution, are designed respectively to select popular data for the SSD cache and to classify the cached blocks

to determine their own eviction borders. These methods can work with any cache algorithm, fully utilizing the performance advantage of existing cache schemes. More importantly, they can form an accurate judgment about block popularity through a long-term, low-overhead observation of the behavior of blocks in the priority queue.

Acknowledgments This work was supported by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (No. 16XNLQ02), the National Natural Science Foundation of China (No. 61202115), the open research program of the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (No. CARCH201302), and the National High-tech R&D Program of China (863 Program) (Grant No. 2013AA013204). The corresponding author is Yunpeng Chai (E-mail: [email protected]).

References

[1] Linux bcache, 2015. http://bcache.evilpiepirate.org.

[2] Linux dm-cache, 2015. http://en.wikipedia.org/wiki/Dm-cache.

[3] Maximizing Throughput: Micron RealSSD P300 Solid State Drives. 2015. http://www.micron.com/~/media/Documents/Products/Product%20Flyer/ssd p300 flyer.pdf.

[4] A. Leventhal. Flash storage memory. Communications of the ACM, 51(7):47–51, 2008.

[5] D. G. Andersen and S. Swanson. Rethinking flash in the data center. IEEE Micro, 30(4):52–54, 2010.

[6] S. Bansal and D. S. Modha. CAR: Clock with adaptive replacement. In FAST, volume 4, pages 187–200, 2004.

[7] S. Boboila and P. Desnoyers. Write endurance in flash drives: Measurements and analysis. In FAST, pages 115–128, 2010.

[8] Y. Chai, Z. Du, X. Qin, and D. A. Bader. WEC: Improving durability of SSD cache drives by caching write-efficient data. IEEE Transactions on Computers, pages 3304–3316, Feb. 2015.

[9] N. Dai, Y. Chai, Y. Liang, and C. Wang. ETD-Cache: An expiration-time driven cache scheme to make SSD-based read cache endurable and cost-efficient. In Proceedings of the 12th ACM International Conference on Computing Frontiers, page 26. ACM, May 2015.

[10] EMC. EMC FAST Cache: A Detailed Review. Oct. 2011. http://www.emc.com/collateral/software/whitepapers/h8046-clariion-celerra-unified-fast-cache-wp.pdf.

[11] Facebook Flashcache, 2015. https://github.com/facebook/flashcache.

[12] B. Gregg. ZFS L2ARC. Oracle Blogs, July 22, 2008.

[13] L. M. Grupp, J. D. Davis, and S. Swanson. The bleak future of NAND flash memory. In Proceedings of the 10th USENIX Conference on File and Storage Technologies, pages 2–2. USENIX Association, 2012.

[14] S. Huang, Q. Wei, J. Chen, C. Chen, and D. Feng. Improving flash-based disk cache with lazy adaptive replacement. In Mass Storage Systems and Technologies (MSST), 2013 IEEE 29th Symposium on, pages 1–10. IEEE, 2013.

[15] S. Jiang and X. Zhang. LIRS: An efficient low inter-reference recency set replacement policy to improve buffer cache performance. ACM SIGMETRICS Performance Evaluation Review, 30(1):31–42, 2002.

[16] S. Jiang, F. Chen, and X. Zhang. CLOCK-Pro: An effective improvement of the CLOCK replacement. In USENIX Annual Technical Conference, General Track, pages 323–336, 2005.

[17] T. Johnson and D. Shasha. 2Q: A low overhead high performance buffer management replacement algorithm. In Proceedings of the 20th VLDB Conference, 1994.

[18] T. Kgil, D. Roberts, and T. Mudge. Improving NAND flash based disk caches. In Computer Architecture, 2008. ISCA '08. 35th International Symposium on, pages 327–338. IEEE, 2008.

[19] D. Lee, J. Choi, J.-H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. On the existence of a spectrum of policies that subsumes the least recently used (LRU) and least frequently used (LFU) policies. In ACM SIGMETRICS Performance Evaluation Review, volume 27, pages 134–143. ACM, 1999.

[20] N. Megiddo and D. S. Modha. ARC: A self-tuning, low overhead replacement cache. In FAST, volume 3, pages 115–130, 2003.

[21] NetApp. Optimizing Storage Performance and Cost with Intelligent Caching. Aug. 2010. http://www.netapp.com/us/system/pdf-reader.aspx?m=wp-7107.pdf&cc=us.

[22] E. J. O'Neil, P. E. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. ACM SIGMOD Record, 22(2):297–306, 1993.

[23] Oracle Corporation. Deploying Hybrid Storage Pools with Oracle Flash Technology and the Oracle Solaris ZFS File System – An Oracle White Paper. Aug. 2011. http://www.oracle.com/technetwork/server-storage/archive/o11-077-deploying-hsp-487445.pdf.

[24] T. Pritchett and M. Thottethodi. SieveStore: A highly-selective, ensemble-level disk cache for cost-performance. In ACM SIGARCH Computer Architecture News, volume 38, pages 163–174. ACM, 2010.

[25] J. Ren and Q. Yang. A new buffer cache design exploiting both temporal and content localities. In Distributed Computing Systems (ICDCS), 2010 IEEE 30th International Conference on, pages 273–282. IEEE, 2010.

[26] J. T. Robinson and M. V. Devarakonda. Data cache management using frequency-based replacement, volume 18. ACM, 1990.

[27] M. Saxena, M. M. Swift, and Y. Zhang. FlashTier: A lightweight, consistent and durable storage cache. In Proceedings of the 7th ACM European Conference on Computer Systems, pages 267–280. ACM, 2012.

[28] Intel Solid-State Drive 910 Series: Product Specification. Jun. 2012. http://www.intel.com/content/www/us/en/solid-state-drives/ssd-910-series-specification.html.

[29] M. Woods. Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine. Jan. 2013. http://www.oracle.com/technetwork/server-storage/engineered-systems/exadata/exadata-smart-flash-cache-366203.pdf.

[30] Q. Yang and J. Ren. I-CASH: Intelligently coupled array of SSD and HDD. In High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, pages 278–289. IEEE, 2011.

[31] Y. Zhang, G. Soundararajan, M. W. Storer, L. N. Bairavasundaram, S. Subbiah, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Warming up storage-level caches with Bonfire. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST 13), pages 59–72, 2013.

[32] Y. Zhou, Z. Chen, and K. Li. Second-level buffer cache management. IEEE Transactions on Parallel and Distributed Systems, 15(6):505–519, 2004.