Proceedings of the 3rd WSEAS International Conference on COMPUTER ENGINEERING and APPLICATIONS (CEA'09)

Short-Random Request Absorbing Structure with Volatile DRAM Buffer and Nonvolatile NAND Flash Memory

SEUNG-HO PARK, JUNG-WOOK PARK and SHIN-DUG KIM
Department of Computer Science, Yonsei University
134 Shinchon-Dong, Sudaemoon-Ku, Seoul, Republic of Korea
{shpark, pjppp}@parallel.yonsei.ac.kr, [email protected]

ISSN: 1790-5117, ISBN: 978-960-474-41-3

Abstract: This paper presents a short-random request absorbing structure built from a volatile DRAM buffer and nonvolatile NAND flash memory chips. The major weakness of NAND flash memory comes from frequent short, random writes spread over the whole logical address space, which degrade write performance. This occurs because NAND flash memory does not allow in-place overwriting, so additional blocks are required for block updates. When short-random writes frequently consume the limited number of update blocks, performance decreases significantly. A short-random absorbing DRAM buffer is therefore designed to reduce this overhead. Specifically, the blocks allocated for updates are divided into DRAM blocks and NAND chain-blocks according to the length of each write request, which avoids many unnecessary erase operations and page copy operations. Trace-based simulation results show that higher write performance is achieved for all traces while the block erase count is reduced with only 64 Mbytes of DRAM write buffer: the overall erase count falls to between roughly 24.35 percent and 8.08 percent of that of an SSD structure without any DRAM buffer. Thus, the proposed method can achieve scalable access performance while minimizing the erase count and extending device lifetime.

Key-Words: NAND flash memory, flash translation layer, disk buffer, DRAM, solid state disk

1. Introduction
Flash memory based solid state disks (SSDs) tend to replace traditional magnetic hard disk drives in computer systems because of the rapid increase in their storage space per cost. NAND flash memory is a nonvolatile storage medium valued for its small size, lack of mechanical parts, shock resistance, and low power consumption. However, the weakness of a NAND flash SSD mostly comes from frequent short and random writes over the whole logical address space, which can decrease write performance significantly. This phenomenon occurs because NAND flash memory does not allow in-place overwriting. Flash memory based SSDs also incur address translation overhead. In contrast to other storage devices, flash memory does not allow a page to be overwritten without erasing the block that contains it. To modify a page, the whole block including the requested page must be erased before the write operation can be performed. Since write and erase operations are slower than read operations, this “erase-before-write” limitation lowers the performance of flash memory.

The FTL (flash translation layer) is an intermediate software module between the host and the flash memory that reduces this performance degradation [1][2]. The main function of the FTL is to maintain a mapping table that remaps a write operation to an empty page prepared beforehand, which compensates for the “erase-before-write” limitation of flash memory [3]. Various FTL algorithms have been proposed to improve the overall performance of flash memory based storage systems.

To obtain higher performance from NAND flash memory, an interleaving technique is used in SSDs. Memory access performance scales as the degree of interleaving increases, although in practice it is limited by the access patterns of real applications. Increasing the degree of interleaving can also significantly increase the number of physical block erases, which degrades performance. In particular, for access patterns that spread over the whole logical address space, the multibank interleaving technique has a serious weakness: the number of update blocks is limited and the reclamation process carries overhead. Thus, short-random writes should be managed appropriately to minimize the overhead of maintaining a large logical block.

This work designs an SSD with a short-random absorbing DRAM buffer to reduce this overhead and analyzes the performance improvement over the access patterns of various applications. The blocks allocated for updates are divided into DRAM blocks and NAND chain-blocks according to the length of each write request. Short-random writes are absorbed by the DRAM write buffer, which can be overwritten in place. Consequently, several unnecessary erase operations and page copy operations are avoided. According to our simulation results, the overall erase count falls to between roughly 24.35 percent and 8.08 percent of that of the SSD structure without a DRAM buffer. Thus, the proposed method can achieve scalable access performance while minimizing the erase count and extending device lifetime.

The rest of this paper is organized as follows. Section 2 presents the background of current NAND flash memory trends and characteristics. Section 3 describes the short-random request absorbing structure for NAND flash memory with a small DRAM write buffer, together with its management algorithm. Section 4 provides the performance evaluation and analysis of the system. Finally, we conclude in Section 5.

2. Background
This section briefly examines the current trend of NAND flash memory technology and the inherent problems in obtaining high performance from it as a solid state disk. A NAND flash memory is composed of a fixed number of blocks, and each block is made up of several pages. Depending on the chip architecture, a block may consist of 128 pages, and each page may hold 4096 bytes of data area to store actual data [4]. The page is the basic unit of read and write operations in NAND flash memory, whereas the block is the basic unit of the erase operation; read, write, and erase are the three basic operations. This means that an erase is performed on a much larger unit than a read or a write.

One important characteristic of flash memory storage is “erase-before-write”: the block storing valid pages must be erased, and the valid pages must be re-copied into a new block [5]. For this type of update, a group of update blocks is maintained to handle write operations whenever a write request arrives from the host system. The latest copy of the data is considered valid, and the old data superseded by new data is considered invalid. Because the number of update blocks is limited, a reclamation process must be performed periodically to provide available blocks. To increase the overall storage capacity, a NAND flash memory based SSD is designed with multiple banks [6]. In addition, a single large chain-block is constructed as a set of adjacent physical blocks formed across several flash memory chips. As the size of the chain-block increases, the performance of the storage system also increases, but so do the numbers of block erases and merges. Thus, small random writes should be controlled efficiently for stable SSD performance.

Several FTL schemes have been proposed to overcome the lack of in-place overwriting. In the log block scheme, a single update block is assigned to each logical block to perform write operations [7]. However, when short random writes are generated frequently, block thrashing may occur. To solve this problem, random and sequential writes are managed differently [8]: for sequential writes, each block is associated with its own log block, whereas for random writes, data are collected and stored in the log blocks in a fully associative manner. Block thrashing can thereby be prevented to a certain extent. A method of classifying hot and cold data was also proposed [9][10], as in the superblock-based FTL, where adjacent blocks are collected to form a logical superblock and hot and cold data are separated within the logical block. By collecting cold data in specific blocks, unnecessary additional block erases can be avoided. Other research increases the performance of NAND flash SSDs with hybrid and mixed solid state disks [11][12], in which single level cell (SLC) and multi level cell (MLC) flash are combined. There, hot data are stored in a small volume of SLC to exploit its fast access and long life cycle.

In general, if small and hot data are mixed within a single large logical block, unnecessary flash memory operations occur during block reclamation and merge operations. Moreover, frequent flash block erases shorten the lifetime of the storage. In general-purpose storage such as SSDs, the size of a logically processed block is increased to maintain high parallelism, which allows a higher degree of interleaving and better performance. However, the chance that un-updated pages are mixed within the logical block also increases, so disk access patterns require more careful consideration. Building on this discussion of write patterns on NAND flash SSDs, we extend the approach to a multibank SSD structure that accesses multiple NAND flash memory banks in parallel while minimizing additional block erases, using a small DRAM write buffer in the disk structure.
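The page/block asymmetry described in this section can be illustrated with a toy model. The sketch below is not the paper's simulator; the class `NandBlock` and the function `update_page` are hypothetical names, and the sizes are the example figures quoted in the text (128 pages of 4096 bytes, that is, 512 KB per block). It shows that a programmed page cannot be rewritten until the whole block is erased, and that updating one page costs page copies plus one block erase.

```python
PAGES_PER_BLOCK = 128   # example value from the text
PAGE_SIZE = 4096        # bytes of data area per page (example value from the text)


class NandBlock:
    """Toy NAND block: a page is programmed once; only a whole-block erase frees it."""

    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count = 0

    def program(self, page_no, data):
        # In-place overwrite is not allowed on NAND flash.
        if self.pages[page_no] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_no] = data

    def erase(self):
        # Erase works on the whole block, never on a single page.
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1


def update_page(old_block, spare_block, page_no, data):
    """Out-of-place update: write new data and copy the other valid pages into an
    already erased spare block, then erase the old block.
    Returns the number of valid pages that had to be copied."""
    copies = 0
    for i, content in enumerate(old_block.pages):
        if i == page_no:
            spare_block.program(i, data)
        elif content is not None:
            spare_block.program(i, content)
            copies += 1
    old_block.erase()
    return copies
```

With three programmed pages, updating one of them through `update_page` copies the other two and erases the old block once; this is the kind of overhead the buffer in Section 3 is meant to avoid for short writes.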

3. Short-Random Absorbing Structure
This section describes in detail the proposed design and implementation of a NAND flash memory based SSD with a small DRAM write buffer. The system is composed of a small DRAM write buffer and a large capacity of multiple NAND flash memory chips.

3.1. Overall Structure
Write requests carry additional block management overhead, such as extra write operations on NAND flash pages and extra erase operations on NAND flash blocks, so a DRAM write buffer is adopted to reduce this overhead. When a write request arrives from the host, it is stored either in the DRAM write buffer or directly in the NAND flash chips, depending on its length. A write request shorter than the threshold length is considered a frequently used random request; otherwise it is considered a sequential request. This threshold length was chosen as 96KB.

Figure 1 shows the short-random request absorbing structure, which combines DRAM with flash memory. The structure consists of a DRAM write buffer for short-random write requests and multiple NAND flash memory chips for normal storage space. The small DRAM is responsible for managing short-random writes, and the multiple NAND flash chips provide parallel access for maximum performance. To access the flash memory chips in parallel, a multichannel flash memory controller is located above them. The size of a logical block can thus be a multiple of the physical block size, with one physical block per flash memory chip forming a chain-block. In contrast, the blocks that compose the DRAM write buffer can be kept small so that address translation remains flexible. Multiple DRAM blocks can therefore be allocated to a single logical block address when it contains a large amount of frequently updated data. The DRAM block management and NAND chain-block management modules handle these block allocations.

Write requests are separated by the threshold and stored either in the DRAM short-random absorbing buffer or in the multiple NAND flash memory chips. In this way, short random requests are collected and stored directly in fast volatile DRAM, while the other requests maintain the maximum interleaving level. When data are replaced from the short-random absorbing buffer, the data and the logical sector address are transmitted to the NAND flash memory. Because NAND flash memory does not allow a page to be overwritten, the data must be written to an already erased block, called an update block. A chain-block mapping table and page mapping tables are then used to maintain the location of each piece of data. For the DRAM absorbing buffer, we must know whether the desired page is located there or not, so a DRAM block mapping table is used to maintain the location information within the buffer.
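The length-based routing described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the class `AbsorbingBuffer`, its per-sector dictionary standing in for the DRAM block mapping table, and the invalidation of a stale DRAM copy on a long write are all assumptions made for the example. Only the 96KB threshold and the "shorter than the threshold goes to DRAM" rule come from the text.

```python
THRESHOLD_BYTES = 96 * 1024  # threshold length stated in the text


class AbsorbingBuffer:
    """Routes writes by length: short requests to DRAM, the rest to NAND."""

    def __init__(self, threshold=THRESHOLD_BYTES):
        self.threshold = threshold
        self.dram = {}         # hypothetical DRAM mapping: sector address -> data
        self.nand_writes = []  # (sector, data) pairs forwarded to the NAND chips

    def write(self, sector, data):
        if len(data) < self.threshold:
            # Short-random request: DRAM allows an in-place overwrite.
            self.dram[sector] = data
            return "dram"
        # Sequential request: goes to NAND with full interleaving.
        # Assumption of this sketch: a stale DRAM copy for the sector is dropped.
        self.dram.pop(sector, None)
        self.nand_writes.append((sector, data))
        return "nand"

    def read(self, sector):
        # A hit is served from DRAM; None means the data must be read from NAND.
        return self.dram.get(sector)
```

Repeated short writes to the same sector overwrite a single DRAM entry and generate no NAND traffic, which is the effect the structure relies on to save erase operations.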


Fig. 1. Short-Random write request absorbing structure

3.2. NAND and DRAM Block Management
The logical sector address generated by the host is eventually directed to either the NAND flash memory or the DRAM write buffer. First, the LBA (logical block address), in units of the logical block size, is calculated from the logical sector address. The size of the logical block is the same as the size of a NAND chain-block. For example, when the size of a single physical block of the NAND flash memory is 512KB and the SSD is composed of 16 NAND flash memory chips, the size of the NAND chain-block is 8MB. The size of a logical block is therefore also 8MB, and the volume occupied by the flash block mapping table becomes relatively small; while the SSD is in operation, the whole flash block mapping table can be kept loaded in RAM. Next, the actual DRAM block and NAND chain-block are found by using the logical sector address and the flash block mapping table. Finally, page mapping tables are used to find the desired page within the blocks.

Figure 2 shows the structure of the flash block mapping table. Typically, one data chain-block exists at each entry of the table. When an update is made in the logical block address space, the write operation is performed by allocating an available DRAM block or an available NAND chain-block to the logical block, taken from the shared DRAM block pool or the available NAND chain-block pool, respectively. When the length of a write request is shorter than the threshold value, a DRAM block is assigned, and when it is longer than the threshold value, a NAND chain-block is assigned. When write operations of various lengths occur within a single logical block, the two types of blocks can coexist, which lets each logical block adapt to varying write patterns. In Figure 2, a DRAM block is assigned for the update in LBA 3, where a write request shorter than the threshold is made, and a NAND chain-block is assigned for the update in LBA 6, where a write request longer than the threshold is made. In LBA 5 and LBA 9, a DRAM block and a NAND chain-block coexist. Because DRAM blocks allow in-place overwrites, unlike NAND chain-blocks, and short writes tend to be overwritten frequently, this block allocation technique spares the NAND flash blocks from frequent allocations and erases. Hence, the performance of the NAND flash memory based SSD can be further improved.

Fig. 2. Logical view of allocating blocks
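The address arithmetic and the allocation rule of this section can be written out as a short sketch. It uses the example figures from the text (512KB physical blocks, 16 chips, hence 8MB chain-blocks); the 512-byte sector size, the function names, and the set-based table entry are assumptions introduced for illustration only.

```python
from collections import defaultdict

SECTOR_SIZE = 512                          # bytes; assumed, not stated in the text
PHYSICAL_BLOCK = 512 * 1024                # 512KB physical block (example from the text)
NUM_CHIPS = 16                             # chips forming one chain-block (example from the text)
CHAIN_BLOCK = PHYSICAL_BLOCK * NUM_CHIPS   # 8MB logical block
THRESHOLD_BYTES = 96 * 1024                # threshold from Section 3.1

# Hypothetical block mapping table: LBA -> kinds of update blocks allocated to it.
block_table = defaultdict(set)


def sector_to_lba(sector):
    """Return (LBA, byte offset inside the logical block) for a logical sector address."""
    byte_addr = sector * SECTOR_SIZE
    return byte_addr // CHAIN_BLOCK, byte_addr % CHAIN_BLOCK


def allocate_for_write(sector, length_bytes, threshold=THRESHOLD_BYTES):
    """Pick a DRAM block for short writes and a NAND chain-block otherwise."""
    lba, _ = sector_to_lba(sector)
    kind = "dram" if length_bytes < threshold else "nand"
    block_table[lba].add(kind)   # both kinds may coexist in one logical block
    return lba, kind
```

Under these assumptions one logical block spans 16,384 sectors, so sector 16,384 is the first sector of LBA 1. A short write followed by a long write to the same logical block leaves both a DRAM block and a NAND chain-block in its table entry, as with LBA 5 and LBA 9 in Figure 2.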

3.3. LRU Block Reclamation
When there is no more available NAND chain-block or no more available DRAM block to be allocated, the block reclamation process is performed. As a result of block reclamation in these two situations, an available NAND chain-block or some available DRAM blocks are obtained. The two kinds of reclamation are therefore processed separately, as NAND chain-block reclamation and DRAM block reclamation, respectively. Figure 3 shows these processes.

NAND Chain-Block Reclamation: When a write request longer than the threshold value is made and there is no available NAND chain-block to be allocated, NAND chain-block reclamation is triggered. In this process, one logical block that contains an update NAND chain-block is chosen. Then the update NAND chain-block and the data NAND chain-block are merged into one data NAND chain-block. The logical block to be merged is the one whose update NAND chain-block was least recently used; in other words, we assume that the least recently updated logical block is unlikely to be updated again soon. A logical block is also merged in another case: when an update NAND chain-block has all of its pages filled, no more write operations can be performed on it, so the merge operation is carried out for that logical block only.

DRAM Block Reclamation: When a write request shorter than the threshold value arrives and there is no available DRAM block to be allocated to the LBA, DRAM block reclamation is triggered. The method of reclaiming DRAM blocks is similar to NAND chain-block reclamation. First, the logical block on which a short-random request was least recently written is found. Then all of its data are moved to a NAND chain-block to release the DRAM blocks. If the update NAND chain-block also becomes full while data are being moved from the DRAM blocks, the two NAND chain-blocks must be merged. The only difference from NAND chain-block reclamation is that no merge is needed when all DRAM blocks belonging to the logical block are full of data, because DRAM blocks allow overwriting.

Fig. 3. LRU block reclamation
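The victim selection shared by both reclamation processes, choosing the least recently written logical block when the pool is exhausted, can be sketched with Python's `collections.OrderedDict`. The class `LruBlockPool` is a hypothetical name, and the sketch simplifies by assuming one update block per logical block. The follow-up action (merging NAND chain-blocks, or moving DRAM data to a NAND chain-block) is left to the caller.

```python
from collections import OrderedDict


class LruBlockPool:
    """Tracks which logical blocks hold an update block, least recently written first."""

    def __init__(self, capacity):
        self.capacity = capacity     # number of update blocks in the pool
        self.owners = OrderedDict()  # LBA -> None, ordered from oldest to newest write

    def touch(self, lba):
        """Record a write to `lba`.

        Returns the LBA whose update block had to be reclaimed to make room,
        or None if a free block was available or `lba` already owned one."""
        victim = None
        if lba in self.owners:
            self.owners.move_to_end(lba)      # now the most recently written
        else:
            if len(self.owners) >= self.capacity:
                # Pool exhausted: reclaim from the least recently written block.
                victim, _ = self.owners.popitem(last=False)
            self.owners[lba] = None
        return victim
```

For example, with a pool of two blocks, writes to LBA 3, LBA 6, and LBA 3 again leave LBA 6 as the least recently written, so a subsequent write to LBA 9 reclaims the block of LBA 6. One such pool could be kept for NAND chain-blocks and another for DRAM blocks, matching the separate processing described above.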


Table 1. Input trace files access pattern

Patterns  Workload                  Write ratio  Total request amount  Write average length
P1        Windows Update            47.3 %       4,231,121             38.30
P2        Application Install       73.5 %       8,808,639             65.30
P3        File Download             99.5 %       4,208,347             125.22
P4        Disk Clean Manager        5.7 %        3,498,937             23.04
P5        File Compress Decompress  49.4 %       1,757,190             74.29
P6        General Usage             32.3 %       2,987,093             41.40
(unit: sectors)

Table 2. Buffer effect on block erase operations

Patterns  Without Buffer  With Buffer  Decrease ratio
P1        314,768         174,032      55.29 %
P2        718,992         418,112      58.15 %
P3        16,642          4,408        24.35 %
P4        79,184          64           8.08 %
P5        75,184          40,752       54.20 %
P6        271,856         104,576      38.47 %
(unit: the number of block erases)
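As a reading aid for Table 2, the "Decrease ratio" column appears to be the erase count with the buffer expressed as a percentage of the count without it. This interpretation is inferred from the P1 and P2 rows and is not stated explicitly in the text; the function name below is hypothetical.

```python
def remaining_erase_ratio(without_buffer, with_buffer):
    """Erase count with the buffer, as a percentage of the count without it."""
    return 100.0 * with_buffer / without_buffer


# Erase counts for two rows of Table 2: (without buffer, with buffer).
rows = {"P1": (314_768, 174_032), "P2": (718_992, 418_112)}
for name, (without, with_buf) in rows.items():
    print(name, f"{remaining_erase_ratio(without, with_buf):.2f} %")
# prints "P1 55.29 %" and "P2 58.15 %"
```

Under this reading, a smaller percentage means a larger saving in block erases.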

4. Experimental evaluation
An SSD simulator was developed to evaluate the overall performance of the storage system, which includes both the DRAM write buffer and the NAND flash memories. The capacity of the storage is chosen as 10 Gbytes, consisting of 16 NAND flash memory chips. The size of the DRAM write buffer is fixed at 64 Mbytes. The simulator can perform all kinds of NAND flash operations, such as page read, page write, and block erase, and these operations can be performed in parallel via multi-channel access to the NAND flash chips. The I/O traces used as input to the simulator were extracted from actual PC usage. The trace data were gathered from a 10 Gbyte hard disk drive with the Windows 2000 NTFS file system and are classified into six different patterns. Each pattern has distinguishing characteristics, such as the total number of requests, the ratio of write requests, and the average length of write requests. The traces cover several common computing tasks: Windows updates, installing applications, downloading files from the network, using a disk management utility, compressing and decompressing files, and general usage of several applications. Table 1 shows detailed information on the extracted trace files.

4.1. Decrease of Block Erase Operations
First, write requests are classified into short-random or long-sequential writes by a chosen threshold value. The short-random requests, which are shorter than this value, are written to the DRAM write buffer. The threshold length is set to 94Kbytes, which generates fewer erase operations than any other threshold value. Because write requests must eventually be performed on update NAND chain-blocks, they always cause block erases. Therefore, the more efficient the block management in the SSD is, the lower the number of block erases becomes; in the opposite case, the number of block erases increases. Also, since the number of block erases is closely related to the lifetime of NAND flash memory, it is better to perform fewer flash block erases when handling the same number of write requests. Table 2 shows the number of physical block erases with and without the short-random absorbing DRAM write buffer for each trace pattern. Depending on the trace pattern, the block erase count falls to between 58.15 percent and, at best, 8.08 percent of the count without the buffer, using only 64 Mbytes of DRAM. This result also means that write performance can be improved by reducing the additional erase operations and the page copying overhead incurred in merging and reclamation.

4.2. Read Performance Improvement
The short-random absorbing DRAM buffer not only reduces the NAND flash block erase count but also increases the read performance of the SSD. This is because frequently used short-random data can reside in the DRAM write buffer and can therefore be obtained directly from it without accessing the NAND flash memories. Figure 4 shows the performance improvement for each access pattern. Read performance improves by 1.14 percent to 22.98 percent.

Fig. 4. Read performance improvement


5. Conclusion
In this paper, we have proposed a short-random request absorbing mechanism that utilizes both a DRAM write buffer and large-capacity NAND flash memories for SSDs. Its performance advantage mainly comes from directing short-random write operations to the DRAM write buffer, which, unlike NAND flash memory, supports fast in-place overwriting without any erase operations. With the proposed short-random request absorbing structure and its block management, write performance can be improved by significantly reducing the physical NAND block erase count. Read performance can also be improved by the short-random absorbing buffer.

References:
[1] T.S. Chung, D.J. Park, S.W. Park, D.H. Lee, S.W. Lee and H.J. Song, “System Software for Flash Memory: A Survey”, EUC 2006, LNCS 4096, pp. 394-404, 2006.
[2] S.H. Lim and K.H. Park, “An Efficient NAND Flash File System for Flash Memory Storage”, IEEE Transactions on Computers, Vol. 55, No. 7, July 2006.
[3] S.Y. Kim and S.I. Jung, “A Log-based Flash Translation Layer for Large NAND flash memory”, Advanced Communication Technology, Vol. 3, pp. 1641-1644, February 2006.
[4] NAND Flash technical paper, SLC-Large block 8G bit, 1Gx8, K9K8G08U1A, Available from: http://www.samsung.com/global/business/semiconductor/productList, 2007.
[5] E. Gal and S. Toledo, “Algorithms and Data Structures for Flash Memories”, ACM Computing Surveys, Vol. 37, No. 2, pp. 138-163, June 2005.
[6] C. Park, P. Talawar, D. Won, M.J. Jung, J.B. Im, S. Kim and Y. Choi, “A High Performance Controller for NAND Flash-based Solid State Disk (NSSD)”, Non-Volatile Semiconductor Memory Workshop, IEEE NVSMW 2006, pp. 17-20, February 2006.
[7] J. Kim, J.M. Kim, S.H. Noh, S.L. Min and Y. Cho, “A Space-Efficient Flash Translation Layer for Compactflash Systems”, IEEE Transactions on Consumer Electronics, Vol. 48, No. 2, May 2002.
[8] S.W. Lee, D.J. Park, T.S. Chung, D.H. Lee, S.W. Park and H.J. Song, “A Log Buffer-Based Flash Translation Layer Using Fully-Associative Sector Translation”, ACM Transactions on Embedded Computing System, Vol. 6, No. 3, Article 18, 2007.
[9] J.W. Hsieh and T.W. Kuo, “Efficient Identification of Hot Data for Flash Memory Storage Systems”, ACM Transactions on Storage, Vol. 2, No. 1, pp. 22-40, February 2006.
[10] J.U. Kang, H. Jo, J.S. Kim and J. Lee, “A Superblock-based Flash Translation Layer for NAND Flash Memory”, Proceedings of the 6th ACM & IEEE International conference on Embedded software, October 22-25, 2006.
[11] L.P. Chang, “Hybrid Solid-State Disks: Combining Heterogeneous NAND Flash in Large SSDs”, Design Automation Conference ASPDAC, pp. 428-433, March 2008.
[12] S.H. Park, J.W. Park, J.M. Jeong, J.H. Kim and S.D. Kim, “A Mixed Flash Translation Layer Structure for SLC-MLC Combined Flash Memory System”, 1st International Workshop on Storage and I/O Virtualization, Performance, Energy, Evaluation and Dependability, February 2008.
