Modeling the Aging Process of Flash Storage by Leveraging Semantic I/O

Yuhui Deng (1), Lijuan Lu (2), Qiang Zou (3), Shuqiang Huang (4), Jipeng Zhou (1)

(1) Department of Computer Science, Jinan University, Guangzhou, 510632, P. R. China. E-mail: [email protected]
(2) School of Business Administration, South China University of Technology, Guangzhou, 510640, P. R. China
(3) School of Computer, Southwest University, Chongqing, 400715, P. R. China
(4) Network and Education Technology Center, Jinan University, Guangzhou, 510632, P. R. China

Abstract. The major advantages of flash memory, such as small physical size, no mechanical components, low power consumption, and high performance, have made it likely to replace magnetic disk drives in more and more systems. Many research efforts have been invested in employing flash memory to build high-performance and large-scale storage systems for data-intensive applications. However, the endurance cycle of flash memory has become one of the most important challenges in further facilitating flash memory based systems. This paper proposes to model the aging process of flash memory based storage systems constructed as a Redundant Array of Independent Disks (RAID) by leveraging semantic I/O. The model attempts to strike a balance between the program/erase cycles and the rebuilding process of RAID. The analysis results demonstrate that a highly skewed data access pattern ages the flash memory based RAID with an arbitrary aging rate, and that a properly chosen threshold of aging rate can prevent the system from aging under a uniform data access pattern. The analysis results in this paper provide useful insights for understanding and designing effective flash memory based storage systems.

Keywords: Flash memory; aging; RAID; rebuild; semantic I/O

1. Introduction

Flash memory is a non-volatile memory which can be electrically erased and reprogrammed. Its major advantages, such as small physical size, no mechanical components, low power consumption, non-volatility, and high performance, have made it likely to replace disk drives in more and more systems (e.g., digital cameras, MP3 players, mobile phones) where size, power, or performance is important [1].
There are two major types of flash memory available on the market, following different logic schemes: NOR and NAND. Compared with NOR flash memory, NAND flash memory has faster erase and write times, along with higher data density, which makes NAND flash a better candidate for data storage. A NAND flash memory is composed of a fixed number of blocks, where each block consists of a number of pages, and each page has a fixed-size main data area and a spare data area. Data on NAND flash memory is read or written at the page level, and erasing is performed at the block level [2]. Recently, more and more researchers have begun to investigate how to employ flash memory to build high-performance and large-scale storage systems for data-intensive applications [1, 3, 4, 5].
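The page/block asymmetry described above can be sketched with a toy model (our own illustrative code, not tied to any particular chip): reads and writes operate at page granularity, while an erase can only target a whole block, so a programmed page cannot be rewritten until its block is erased.

```python
# Illustrative model of NAND addressing granularity (assumed toy parameters;
# spare data area omitted): page-level read/write, block-level erase.

PAGES_PER_BLOCK = 64
PAGE_SIZE = 2048  # main data area in bytes

class Block:
    def __init__(self):
        # None marks an erased (writable) page.
        self.pages = [None] * PAGES_PER_BLOCK

    def write_page(self, page_no, data):
        # A programmed page cannot be rewritten until the block is erased.
        if self.pages[page_no] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_no] = data

    def read_page(self, page_no):
        return self.pages[page_no]

    def erase(self):
        # Erasing is all-or-nothing at block granularity.
        self.pages = [None] * PAGES_PER_BLOCK

b = Block()
b.write_page(0, b"hello")
assert b.read_page(0) == b"hello"
try:
    b.write_page(0, b"world")  # in-place rewrite is rejected
except ValueError:
    pass
b.erase()                      # the only way to make page 0 writable again
assert b.read_page(0) is None
```

The rejected in-place rewrite is exactly why updates must go out-of-place, as discussed next.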
Future Generation Computer Systems 32 (2014) 338–344
A very important feature of NAND flash is that pages cannot be rewritten in place. When a portion of data on a page is modified, the new version of the data must be written to an available page elsewhere, and the old version is invalidated. When the free storage capacity becomes low, garbage collection is triggered to recycle the invalidated pages. Because erasing is performed in blocks, the valid pages in the recycled blocks have to be copied elsewhere before the blocks are erased. Another important feature of NAND flash memory is its limited endurance: a block wears out after a specified number of program/erase cycles. For example, the NAND flash memory chip (K9NBG08U5A) has an endurance of 100K program/erase cycles [2]. Therefore, a poor garbage collection policy can quickly wear out a block, and hence a flash memory chip. Table 1 summarizes the characteristics of four typical NAND flash memories [2].

Table 1. Characteristics of typical NAND flash memories

Manufacturer        Samsung       Intel            AMD          FUJITSU
Type                K9NBG08U5A    JS29F16G08FANB1  Am30LV0064D  MBM30LV0128
Capacity            4G x 8 Bit    2G x 8 Bit       8M x 8 Bit   16M x 8 Bit
Page Size (Byte)    (2K + 64)     (2K + 64)        (512 + 16)   (512 + 16)
Block Size (Byte)   (128K + 4K)   (128K + 4K)      (8K + 256)   (16K + 512)
Random Read         25µs (Max)    25µs (Max)       N/A          10µs (Max)
Serial Read         50ns (Min)    25ns (Min)       -            -
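The out-of-place update and garbage-collection behavior described above can be sketched as follows (our own hedged toy model, not the paper's implementation). A per-block erase counter illustrates endurance: every garbage-collection pass costs one erase, so a poor victim-selection policy wears blocks out faster.

```python
# Toy flash model: out-of-place updates, invalidation, garbage collection,
# and a program/erase budget per block (all parameters are illustrative).

ENDURANCE = 100_000  # program/erase cycles, as for the K9NBG08U5A chip [2]

class Flash:
    def __init__(self, n_blocks=4, pages_per_block=4):
        # Each block is a list of (logical_page, data, valid) page records.
        self.blocks = [[] for _ in range(n_blocks)]
        self.pages_per_block = pages_per_block
        self.erase_count = [0] * n_blocks
        self.map = {}  # logical page number -> (block, page index)

    def write(self, lpn, data):
        # Out-of-place update: invalidate the old version instead of
        # rewriting the page in place.
        if lpn in self.map:
            blk, idx = self.map[lpn]
            old = self.blocks[blk][idx]
            self.blocks[blk][idx] = (old[0], old[1], False)
        blk = self._free_block()
        self.blocks[blk].append((lpn, data, True))
        self.map[lpn] = (blk, len(self.blocks[blk]) - 1)

    def read(self, lpn):
        blk, idx = self.map[lpn]
        return self.blocks[blk][idx][1]

    def _free_block(self):
        for i, b in enumerate(self.blocks):
            if len(b) < self.pages_per_block:
                return i
        return self._garbage_collect()

    def _garbage_collect(self):
        # Recycle the block with the fewest valid pages; its valid pages must
        # be copied elsewhere before the erase.  (A real FTL reserves spare
        # blocks so this copy-out always has room.)
        victim = min(range(len(self.blocks)),
                     key=lambda i: sum(1 for p in self.blocks[i] if p[2]))
        survivors = [p for p in self.blocks[victim] if p[2]]
        self.blocks[victim] = []
        self.erase_count[victim] += 1
        assert self.erase_count[victim] <= ENDURANCE, "block worn out"
        for lpn, data, _ in survivors:
            del self.map[lpn]
            self.write(lpn, data)
        return victim
```

With a small instance (`Flash(n_blocks=2, pages_per_block=2)`), repeatedly updating one hot logical page keeps invalidating pages and soon triggers an erase, which is exactly the skewed wear pattern that motivates wear-leveling.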
E(H). Based on this premise, we have:

    λc = 0    (10)

This means that for flash memory based storage systems, theoretically, highly skewed data access patterns would age the whole system with an arbitrary aging rate.

4.3 Uniform data access pattern

As discussed in Section 1, if the accesses are evenly distributed across the flash memory blocks, the lifespan of flash memory chips can be maximized. According to this observation, we can assume that each block has roughly the same heat. This can be achieved by replicating multiple copies of the hot data, thus distributing the program/erase operations over the replicas and alleviating the highly skewed accesses. Therefore, we have H ≅ E(H). The density of aged blocks is described as:

    ∂a(t)/∂t = −a(t) + λ × E(H) × a(t) × (1 − a(t))    (11)

where −a(t) indicates the density of blocks reviving from the aged blocks per time unit, and the second term represents the density of newly aged blocks, which is proportional to the effective aging rate λ, the number of semantic links emanating from each block E(H), and the probability (1 − a(t)) that a given link points to a fresh block. We also ignore all higher-order components in this equation.
We assume that the system reaches a stable state after a certain amount of time, namely ∂a(t)/∂t = 0. We then have:

    −a + λ × E(H) × a × (1 − a) = 0    (12)

There are two solutions to the above equation: a1 = 0, and a2 = (λ × E(H) − 1) / (λ × E(H)). Since a2 ≠ 0, it implies that λ × E(H) ≠ 1. Therefore, the threshold of aging rate is:

    λc = 1 / E(H)    (13)

Based on the above analysis, we have:

    a = 0,                                λ ≤ λc
    a = (λ × E(H) − 1) / (λ × E(H)),      λ > λc    (14)
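As a sanity check of the analysis, the stable states of Eq. (14) can be reproduced numerically. The sketch below (our own illustration; the value E(H) = 5 is an arbitrary assumption, not taken from the paper's data) integrates Eq. (11) with a simple forward-Euler step and confirms the threshold λc = 1/E(H) of Eq. (13).

```python
# Forward-Euler integration of da/dt = -a + lam * EH * a * (1 - a), Eq. (11).

def aged_density(lam, EH, a0=0.1, dt=0.01, steps=20_000):
    """Integrate the aged-block density and return a(t) at the final step."""
    a = a0
    for _ in range(steps):
        a += dt * (-a + lam * EH * a * (1.0 - a))
    return a

EH = 5.0          # assumed mean number of semantic links per block
lam_c = 1.0 / EH  # threshold aging rate, Eq. (13)

# Below the threshold the aged density dies out; above it, the density
# converges to the nonzero stable state (lam*EH - 1)/(lam*EH) of Eq. (14).
assert aged_density(0.5 * lam_c, EH) < 1e-6
assert abs(aged_density(2.0 * lam_c, EH) - 0.5) < 1e-6
```

At λ = 2λc we have λ × E(H) = 2, so the predicted stable density is (2 − 1)/2 = 0.5, which the integration reproduces.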
It gives us two indications: first, if the aging rate is higher than the threshold, the aging process will continue until the whole system becomes aged. Second, if the aging rate is lower than the threshold, the whole system will soon revive from the aging process.

5. Discussion

A typical flash memory based storage system involves two important layers: a Memory Technology Device (MTD) driver and a Flash Translation Layer (FTL) driver. The MTD driver mainly provides three low-level operations: read, write, and erase. Based on these low-level functionalities, the FTL is in charge of handling the specific characteristics of flash memory. The objective of the FTL driver is to provide transparent services for file systems to access flash memory as a block device, and to extend durability by using wear-leveling techniques. An FTL should contain at least four functions: logical-to-physical address mapping, wear-leveling, garbage collection, and power-off recovery. The functionalities of the FTL and MTD can be integrated into either the firmware inside a flash memory device (e.g., Disk on Module) or the file system (e.g., Memory Stick™). For example, ShiftFlash [30] is a flash memory based storage system with time-shifting functionality that makes it more robust and resilient. By monitoring and recording the modifications of the FTL mapping table, ShiftFlash enables the flash state to be reverted to any point-in-time (PiT) in the past.

The TRIM command is designed to enable the operating system to notify a Solid State Disk (SSD) which pages no longer contain valid data. During a deletion, the operating system marks the pages as free for new data and sends a TRIM command to the SSD to indicate the invalidation of the data. Based on this indication, the SSD knows not to relocate data from those pages during garbage collection. This results in fewer writes to the flash, reducing the overhead of garbage collection. TRIM has been integrated into the SATA subsystem of OpenSolaris and is supported by EXT4, NTFS, and BTRFS [31, 32, 33].
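The interaction between the FTL's logical-to-physical mapping and a TRIM hint can be sketched as follows (an illustrative page-level FTL of our own; the names are not from any real FTL). A trimmed logical page simply drops out of the mapping, so garbage collection no longer needs to relocate its physical page.

```python
# Minimal page-level FTL sketch: a logical-to-physical map with out-of-place
# writes, plus a TRIM-style hint that reduces garbage-collection copy work.
# (Data payloads are omitted; only the mapping bookkeeping is modeled.)

class SimpleFTL:
    def __init__(self):
        self.l2p = {}        # logical page number -> physical page number
        self.valid = set()   # physical pages that GC must preserve
        self.next_ppn = 0    # next free physical page (log-structured)

    def write(self, lpn):
        # Out-of-place update: the old physical page becomes garbage.
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid.discard(old)
        self.l2p[lpn] = self.next_ppn
        self.valid.add(self.next_ppn)
        self.next_ppn += 1

    def trim(self, lpn):
        # TRIM hint from the OS: the logical page no longer holds live data,
        # so GC can skip copying its physical page before an erase.
        ppn = self.l2p.pop(lpn, None)
        if ppn is not None:
            self.valid.discard(ppn)

    def gc_copy_cost(self):
        # Number of pages GC would have to relocate before erasing.
        return len(self.valid)

ftl = SimpleFTL()
for lpn in range(4):
    ftl.write(lpn)
ftl.trim(2)                     # file deleted: OS sends TRIM for its pages
assert ftl.gc_copy_cost() == 3  # one fewer page to relocate during GC
```

Without the TRIM hint, the mapping for logical page 2 would persist and its physical page would be copied during every garbage-collection pass, which is the extra wear TRIM avoids.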
The applications issue I/O requests by calling file system APIs (e.g., fread/fwrite). File systems normally manage the storage capacity as a linear array of fixed-size blocks. Therefore, a file access is converted into many block-level I/O requests by the file system, each of which contains a specific Logical Block Address (LBA) and a data block length. Consequently, the semantic I/O links are determined by the disk file systems residing above the corresponding disk drives. For example, the Log-structured File System (LFS) [34] delays, remaps, and clusters all data blocks into large, contiguous on-disk regions called segments, only writing large chunks to the disk, which exploits disk bandwidth for small files, metadata, and large files. The Fast File System (FFS) [35] determines the location of the last allocated block of a file and attempts to allocate the next contiguous disk block. When the blocks of a file are clustered, the number of disk I/Os can be reduced. C-FFS [36] adjacently clusters the data blocks of multiple small files, especially small files in the same directory, and moves them to and from the disk as a unit. Ext4 employs extents to track ranges of blocks. An extent is a contiguous run of physical blocks carrying data for a contiguous run of logical file blocks. The extents are kept in a B-tree indexed by the starting logical block number of each extent. This method is more space efficient than the traditional block pointers used by ext3/ext2, as the length of an extent can be up to 2^15 blocks in ext4. BTRFS [37] employs a B-tree to store data types and to point to information stored on the storage media. It aggressively aggregates writes and sends them in clusters, even if they are from uncorrelated files. This results in larger write operations and significantly decreases the number of write operations, thus reducing the wear caused by repetitively writing the same pages of flash memory.
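The ext4-style extent mapping just described can be made concrete with a small sketch (our own illustration; a sorted list with binary search stands in for ext4's extent B-tree, and the extent values are invented). Each extent maps a run of logical file blocks to a contiguous run of physical blocks, which is far more compact than one pointer per block.

```python
# Hypothetical extent map: (logical_start, length, physical_start) triples,
# sorted by logical_start; lookup resolves a logical file block to an LBA.

import bisect

extents = [(0, 8, 1000), (8, 4, 5000), (12, 16, 2000)]
starts = [e[0] for e in extents]

def logical_to_physical(lblock):
    """Resolve a logical file block (>= 0) to its physical block, or None."""
    i = bisect.bisect_right(starts, lblock) - 1
    lstart, length, pstart = extents[i]
    if lblock < lstart + length:
        return pstart + (lblock - lstart)
    return None  # hole: no physical block allocated for this offset

assert logical_to_physical(0) == 1000   # inside the first extent
assert logical_to_physical(9) == 5001   # offset 1 into the second extent
assert logical_to_physical(20) == 2008  # offset 8 into the third extent
```

Three triples here cover 28 file blocks; with per-block pointers the same mapping would need 28 entries, which is the space saving the text refers to.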
A copy-on-write technique ensures that blocks and extents are not overwritten in place. SFS [38] works in a similar way to the traditional LFS. It transforms all random writes at the file system level into sequential ones at the SSD level, as a way to exploit the maximum write bandwidth of the SSD. Furthermore, SFS collects data hotness statistics at the file block level and groups data accordingly. All these file systems focus on clustering small files or data blocks into contiguous disk blocks with different approaches, thus reducing the number of disk accesses. Different file systems have different semantic representations and different impacts on SSDs. However, we believe the different file system behaviors only impact the p(λ) in our proposed model.

6. Conclusion

This paper models the aging process of flash memory based RAID by leveraging different data access patterns. The model attempts to keep a balance between the program/erase cycles and the rebuilding process of RAID. The analysis results demonstrate that a highly skewed data access pattern ages the flash memory based RAID with an arbitrary aging rate, and that a properly chosen threshold of aging rate can prevent the system from aging under a uniform data access pattern. The analysis results in this paper should provide useful insights for designing or implementing a wear-leveling aware garbage collection policy.
ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for helping us refine this paper; their constructive comments and suggestions were very helpful. This work is supported by the National Natural Science Foundation (NSF) of China under grants No. 61272073 and No. 61073064, the Natural Science Foundation of Guangdong Province (No. S2011010001525), and the Open Research Fund of the Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCH201107).
References

[1] L. Chang, T. Kuo. Efficient management for large-scale flash-memory storage systems with resource conservation. ACM Transactions on Storage, Vol. 1, No. 4, 2005, pp. 381–418.
[2] Y. Deng, J. Zhou. Architectures and optimization methods of flash memory based storage systems. Journal of Systems Architecture, Vol. 57, No. 2, 2011, pp. 214–227.
[3] A. M. Caulfield, L. M. Grupp, S. Swanson. Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'09), 2009, pp. 217–228.
[4] J. He, J. Bennett, A. Snavely. DASH-IO: an empirical study of flash-based IO for HPC. In Proceedings of the 2010 TeraGrid Conference (TG'10), 2010.
[5] F. Chen, D. A. Koufaty, X. Zhang. Hystor: making the best use of solid state drives in high performance storage systems. In Proceedings of the International Conference on Supercomputing (ICS'11), 2011, pp. 22–32.
[6] D. Kwak, et al. Integration technology of 30 nm generation multi-level NAND flash for 64 Gb NAND flash memory. In Proceedings of the IEEE Symposium on VLSI Technology, 2007, pp. 12–13.
[7] A. Shimpi. Intel's 34 nm SSD Preview: Cheaper and Faster?
[8] Samsung SSD 840: Testing the Endurance of TLC NAND. http://www.anandtech.com/show/6459/samsung-ssd-840-testing-the-endurance-of-tlc-nand
[9] Samsung SSD 840 (250GB) Review. http://www.anandtech.com/show/6337/samsung-ssd-840-250gb-review/2
[10] A. Kadav, M. Balakrishnan, V. Prabhakaran, D. Malkhi. Differential RAID: rethinking RAID for SSD reliability. In Proceedings of the ACM EuroSys Conference (EuroSys'10), 2010.
[11] K. A. Smith, M. I. Seltzer. File system aging—increasing the relevance of file system benchmarks. In Proceedings of the 1997 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 1997, pp. 203–213.
[12] D. A. Patterson, G. A. Gibson, R. H. Katz. A case for redundant arrays of inexpensive disks (RAID). In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1988, pp. 109–116.
[13] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, D. A. Patterson. RAID: high-performance, reliable secondary storage. ACM Computing Surveys, Vol. 26, No. 2, 1994, pp. 145–185.
[14] M. Blaum, J. Brady, J. Bruck, J. Menon. EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures. IEEE Transactions on Computers, Vol. 44, No. 2, 1995, pp. 192–202.
[15] A. Kawaguchi, S. Nishioka, H. Motoda. A flash-memory based file system. In Proceedings of the USENIX Technical Conference, 1995, pp. 155–164.
[16] M. Chiang, C. Paul, R. Chang. Manage flash memory in personal communicate devices. In Proceedings of the IEEE International Symposium on Consumer Electronics, 1997.
[17] L. Chang, T. Kuo, S. Lo. Real-time garbage collection for flash-memory storage systems of real-time embedded systems. ACM Transactions on Embedded Computing Systems, Vol. 3, No. 4, 2004, pp. 837–863.
[18] G. Fu, A. Thomasian, C. Han, S. W. Ng. Rebuild strategies for redundant disk arrays. In Proceedings of the 12th NASA Goddard and 21st IEEE Conference on Mass Storage Systems and Technologies (MSST'04), 2004, pp. 223–226.
[19] S. Im, D. Shin. Flash-aware RAID techniques for dependable and high-performance flash memory SSD. IEEE Transactions on Computers, Vol. 60, No. 1, 2011, pp. 80–93.
[20] K. Park, D. Lee, Y. Woo, G. Lee, et al. Reliability and performance enhancement technique for SSD array storage system using RAID mechanism. In Proceedings of the 9th International Symposium on Communications and Information Technology (ISCIT), 2009.
[21] S. Rizvi, T. Chung. Data storage framework on flash memory based SSD RAID 0 for performance oriented applications. In Proceedings of the 2nd International Conference on Computer and Automation Engineering (ICCAE), 2010.
[22] L. Breslau, P. Cao, L. Fan, G. Phillips, S. Shenker. Web caching and Zipf-like distributions: evidence and implications. In Proceedings of the 18th Conference on Computer Communications, 1999, pp. 126–134.
[23] C. Staelin, H. Garcia-Molina. Clustering active disk data to improve disk performance. Tech. Rep. CS-TR-283-90, Department of Computer Science, Princeton University, 1990.
[24] G. R. Ganger, B. L. Worthington, R. Y. Hou, Y. N. Patt. Disk subsystem load balancing: disk striping vs. conventional data placement. In Proceedings of the Hawaii International Conference on System Sciences, January 1993, pp. 40–49.
[25] Y. Deng, F. Wang, N. Helian. EED: energy efficient disk drive architecture. Information Sciences, Vol. 178, No. 22, 2008, pp. 4403–4417.
[26] Storage Systems Program, HP Laboratories. http://tesla.hpl.hp.com/public_software/
[27] Z. Li, Z. Chen, Y. Zhou. Mining block correlations to improve storage performance. ACM Transactions on Storage, Vol. 1, No. 2, 2005, pp. 213–245.
[28] D. Roselli, J. R. Lorch, T. E. Anderson. A comparison of file system workloads. In Proceedings of the USENIX Annual Technical Conference, 2000, pp. 41–54.
[29] M. Opper, D. Saad. Advanced Mean Field Methods: Theory and Practice. The MIT Press, Cambridge, Massachusetts, 2001.
[30] P. Huang, K. Zhou, C. Wu. ShiftFlash: make flash-based storage more resilient and robust. Performance Evaluation, Vol. 68, No. 11, 2011, pp. 1193–1206.
[31] TRIM. http://en.wikipedia.org/wiki/TRIM
[32] SATA TRIM support in OpenSolaris. http://www.c0t0d0s0.org/archives/6792-SATA-TRIM-support-in-Opensolaris.html
[33] G. Kim, D. Shin. Performance analysis of SSD write using TRIM in NTFS and EXT4. In Proceedings of the 6th International Conference on Computer Sciences and Convergence Information Technology (ICCIT), 2011.
[34] M. Rosenblum, J. K. Ousterhout. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, Vol. 10, 1991, pp. 1–15.
[35] M. McKusick, W. Joy, S. Leffler, R. Fabry. A fast file system for UNIX. ACM Transactions on Computer Systems, Vol. 2, No. 3, 1984, pp. 181–197.
[36] G. Ganger, M. Kaashoek. Embedded inodes and explicit grouping: exploiting disk bandwidth for small files. In Proceedings of the Annual USENIX Technical Conference, 1997.
[37] J. Kára. Ext4, btrfs, and the others. In Proceedings of Linux-Kongress and OpenSolaris Developer Conference, 2009, pp. 99–111.
[38] C. Min, K. Kim, H. Cho, S. Lee, Y. Eom. SFS: random write considered harmful in solid state drives. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST'12), 2012.