Locality-driven MRC Construction and Cache Allocation

Jianyu Fu
National University of Defense Technology / Arizona State University
[email protected]

Dulcardo Arteaga
ParallelM, Inc., Sunnyvale, California
[email protected]

Ming Zhao
Arizona State University, Tempe, Arizona
[email protected]
ABSTRACT
Flash caches have been widely deployed in cloud computing environments to boost the performance of virtual machines (VMs), but they are usually oversubscribed among the VMs because the cache space is limited compared to the VMs' working set (WS) sizes, which makes cache allocation a challenging problem. Miss Ratio Curves (MRCs) can be used to manage cache partitioning among the VMs; however, traditional WS-based MRCs are constructed from all the data, including the data without good locality, which misleads the VMs into allocating unnecessary cache space to achieve their performance objectives. This paper presents QCache, a locality-driven solution for MRC construction and cache allocation. First, it proposes a new MRC design, RWS-based MRCs, which is constructed from the reuse working set and guides cache allocation based on the good-locality data. Second, on top of RWS-based MRCs, it provides two algorithms to optimize cache allocation among the VMs: one that improves the overall system performance and one that satisfies each VM's Quality-of-Service (QoS) target.

CCS CONCEPTS
• Information systems → Cloud based storage; Storage management;

KEYWORDS
Caching, flash memory, miss ratio curve, reuse working set

ACM Reference Format:
Jianyu Fu, Dulcardo Arteaga, and Ming Zhao. 2018. Locality-driven MRC Construction and Cache Allocation. In HPDC '18 Posters/Doctoral Consortium: The 27th International Symposium on High-Performance Parallel and Distributed Computing, June 11–15, 2018, Tempe, AZ, USA. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3220192.3220461

© 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5899-6/18/06. https://doi.org/10.1145/3220192.3220461

1 INTRODUCTION
Flash-memory-based caches are commonly deployed in cloud computing environments to improve the storage performance of virtual machines (VMs). However, due to the high cost of flash caches and their limited capacity compared to the applications' working set sizes, caches are usually oversubscribed among multiple VMs running together. How the shared cache space is allocated among the VMs has a great impact on their performance, since different VMs have different workload characteristics and thus different cache behaviors. Cache utility curves, such as miss ratio curves (MRCs), are effective tools for managing cache allocations. For a given workload, its MRC captures the workload's historical cache behavior by plotting the quantified relationship between its cache miss ratio and the cache size allocated to it [4].

MRC Construction. The classical method for constructing MRCs is based on the reuse distance metric. The reuse distance of a block reference is defined as the number of other unique references between two successive accesses to the same block, and it can be calculated using Mattson's stack algorithm [2]. For a workload's reference stream, the algorithm uses a stack to store all the unique references. Assuming Least Recently Used (LRU) is the replacement policy, the references in the stack are ordered from most to least recently accessed. To construct the MRC, the algorithm scans the reference stream, maintains the stack, calculates the reuse distance of each access, and finally generates a histogram of reuse distances. In practice, tree-based algorithms, which employ a balanced tree to quickly compute reuse distances and a hash table to accelerate lookups into the tree [3, 4], are usually adopted to reduce the construction cost. The basis of MRCs is that any individual reference hits the cache as long as the cache size is greater than the reference's reuse distance. The relationship between cache size and cache hit ratio (which can be derived from the cache miss ratio) can then be easily quantified, and each application can be allocated an appropriate amount of cache to fulfill its performance requirements.

WS-based MRCs. Traditional MRCs are constructed based on the Working Set (WS) model, where every reference operates on the stack and has its reuse distance calculated. For example, consider the reference sequence A0-B-D-E-G-A1-A2, where A0, A1, and A2 denote references to the same block A at different times. The reuse distances of the first five references are ∞, so they cannot hit the cache. A1's reuse distance is 4, which means it hits the cache only if the cache size is 5 or greater. Since there are no other references between A1 and A2, A2's reuse distance is 0, and it hits the cache even if the cache holds only one block. However, observe that B, D, E, and G increase A1's reuse distance (A1 needs at least 5 cache blocks to hit), yet they do not contribute to the overall cache hit ratio. The same observation from another perspective: if B, D, E, and G are not admitted into the cache and are not counted in A1's reuse distance calculation, then A1 can hit a cache of only one block. Because the reuse distance calculation in WS-based MRCs counts data that have poor locality and contribute little to the cache hit ratio, WS-based MRCs will mislead the applications to
allocate unnecessary cache space to achieve their performance objectives. When applications allocate even more cache space to store their low-locality data, the cache over-subscription is further aggravated, hurting both the overall system performance and the applications' service-level objective guarantees.
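The stack-based construction described above can be illustrated with a short sketch (a minimal Python version of Mattson's stack algorithm, not the paper's implementation; it uses a plain list, which is O(N·M), whereas practical implementations use a balanced tree and a hash table [3, 4]):

```python
from collections import defaultdict

def miss_ratio_curve(trace):
    """Compute an LRU miss ratio curve from a reference trace
    using Mattson's stack algorithm (simple list-based version)."""
    stack = []                    # LRU stack: most recently used at index 0
    hist = defaultdict(int)      # reuse distance -> number of references
    for block in trace:
        if block in stack:
            # Reuse distance = number of unique refs since the last access.
            hist[stack.index(block)] += 1
            stack.remove(block)
        # First-time references have infinite reuse distance (always miss).
        stack.insert(0, block)   # move block to the MRU position
    n = len(trace)
    # miss_ratio(c) = fraction of references with reuse distance >= c
    mrc, misses = [], n
    for c in range(len(stack) + 1):
        mrc.append(misses / n)
        misses -= hist.get(c, 0)
    return mrc

# The sequence from the text: A0-B-D-E-G-A1-A2.
# A1 (reuse distance 4) hits only when the cache holds >= 5 blocks;
# A2 (reuse distance 0) hits even with a single block.
mrc = miss_ratio_curve(["A", "B", "D", "E", "G", "A", "A"])
```

With this trace, the miss ratio drops from 1.0 at size 0 to 6/7 at size 1 (only A2 hits) and to 5/7 at size 5 (A1 hits as well), matching the example above.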
2 RWS-BASED MRC
CloudCache [1] introduced a new cache demand model, the Reuse Working Set (RWS), to capture only the data with good temporal locality, along with new cache admission policies that admit only the RWS into the cache to enable efficient on-demand cache management. However, there is still no known method for constructing MRCs that takes cache admission into account. To efficiently allocate cache space to VMs based on their good-locality data, we propose a new MRC design: RWS-based MRCs.
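As a minimal illustration of reuse-based admission (a simplified stand-in assuming an admit-on-Nth-access policy; CloudCache's actual admission policies are more elaborate), filtering a reference stream down to its RWS can be sketched as:

```python
from collections import defaultdict

def rws_filter(trace, n=2):
    """Return the sub-stream of references that a reuse-based admission
    policy would admit: a block enters the cache only once it has been
    referenced at least n times (simplified RWS admission sketch)."""
    counts = defaultdict(int)   # per-block reference counts
    admitted = []
    for block in trace:
        counts[block] += 1
        if counts[block] >= n:  # admit only reused blocks
            admitted.append(block)
    return admitted

# For the introduction's sequence A0-B-D-E-G-A1-A2, only the
# reused references A1 and A2 pass the filter; feeding this
# filtered stream to the reuse distance computation lets A1 hit
# with a single cache block, as observed in the introduction.
admitted = rws_filter(["A", "B", "D", "E", "G", "A", "A"])
```

The single-access blocks B, D, E, and G are filtered out, so they no longer inflate A's reuse distance.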
2.1 R3 Distance
To support RWS-based MRCs, we propose a new metric, the reused reference's reuse distance (hereafter called R3 distance). Unlike the traditional reuse distance, it is defined as the number of other unique reused references between two successive references to the same block. Its key difference from the reuse distance metric is that the R3 distance only captures the data with good locality; thus RWS-based MRCs only need to account for the cache space that a workload's good-locality data requires.

Table 1 shows an example of the calculation of R3 distances for a sample trace. For A0, B0, C0, and D0, since they are not admitted into the cache, their R3 distances are null (denoted "−"); for A1, B1, and D1, since they are admitted into the cache for the first time, their R3 distances are ∞; the other references, which have been admitted into the cache before, have their R3 distances calculated according to the definition.

Table 1: Calculation of R3 Distance
Time   0    1    2    3    4    5    6    7    8
Ref.   A0   B0   A1   A2   B1   C0   D0   D1   A3
R3D    −    −    ∞    0    ∞    −    −    ∞    2

2.2 Construction Algorithm
Based on the R3 distance metric, we construct RWS-based MRCs using a hash table and a splay tree. The hash table stores each block's address as the key and the block's access information (e.g., its reference count and the timestamp of its last reference) as the value. The splay tree is a self-adjusting binary search tree. Each node in the splay tree uses the timestamp of one reused reference as its search key, and also carries a weight, defined as the number of nodes in its right subtree, to compute R3 distances quickly.

3 CACHE ALLOCATION
In a cloud computing environment, multiple VMs typically run together on the same node, and efficiently allocating the shared cache space among them is challenging. In this paper, we focus on cache allocation for two important objectives: (1) optimizing the overall system performance; and (2) ensuring each VM's QoS target. We use RWS-based MRCs as the basis and adopt a different allocation algorithm for each objective.

Overall. This algorithm focuses on improving the overall system performance. After the RWS-based MRC of each VM is constructed, the MRCs are searched repeatedly with a greedy algorithm that selects the VM whose MRC currently has the maximum gradient and assigns it one unit of cache space, until the whole cache is allocated, so that the utilization of the flash cache is maximized.

QoS. This algorithm focuses on meeting the VMs' QoS targets. Each VM is given a reasonable QoS target, and the allocation scheme aims to assign each VM an appropriate cache size so that as many targets as possible are achieved. After the RWS-based MRC of each VM is constructed, the MRCs are likewise searched repeatedly with a greedy algorithm that selects the VM whose allocation currently minimizes the overall distance to the QoS targets and assigns it one unit of cache space, until the whole cache is allocated.

Figure 1 compares cache allocation using WS-based and RWS-based MRCs to optimize the overall system performance when 14 real-world workloads are replayed together. The results show that RWS-based MRCs improve the cache hit ratio by 3.2∼15.6% and reduce the flash write ratio by 51.2∼60.6%.

Figure 1: WS-based vs. RWS-based MRCs. (a) Hit Ratio (%) and (b) Flash Write Ratio (%) of WS-MRC and RWS-MRC at cache sizes of 34.5, 69, and 128.6 GB.

4 CONCLUSIONS AND FUTURE WORK
The MRC is a useful tool for guiding cache allocation, especially in multi-VM cloud computing environments, but WS-based MRCs tend to allocate unnecessary cache space to a workload to cache its low-locality data. This paper proposes locality-driven, RWS-based MRC construction and performs cache allocation based on the good-locality data. In the future, we will explore more cache allocation algorithms for different performance objectives on top of RWS-based MRCs, and conduct comprehensive experiments comparing our new design to traditional algorithms.

REFERENCES
[1] Dulcardo Arteaga, Jorge Cabrera, Jing Xu, Swaminathan Sundararaman, and Ming Zhao. 2016. CloudCache: On-demand Flash Cache Management for Cloud Computing. In FAST'16. USENIX, 355–369.
[2] Richard L. Mattson, Jan Gecsei, Donald R. Slutz, and Irving L. Traiger. 1970. Evaluation Techniques for Storage Hierarchies. IBM Systems Journal 9, 2 (1970), 78–117.
[3] Qingpeng Niu, James Dinan, Qingda Lu, and P. Sadayappan. 2012. PARDA: A Fast Parallel Reuse Distance Analysis Algorithm. In IPDPS'12. IEEE, 1284–1294.
[4] Carl A. Waldspurger, Nohhyun Park, Alexander T. Garthwaite, and Irfan Ahmad. 2015. Efficient MRC Construction with SHARDS. In FAST'15. USENIX, 95–110.