Efficient Allocation Algorithms for FLASH File Systems

By Li-Fu Chou

A thesis submitted to the Institute of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science, National Taiwan University, in partial fulfillment of the requirements for the degree of Master in Computer Science, 2004.
Abstract

Embedded systems have developed rapidly in recent years, and flash memory has become an essential building block because of its shock-resistance, low power consumption, and non-volatile nature. Since flash memory is a write-once and bulk-erase medium, an intelligent allocation algorithm is essential to provide applications with efficient storage service. In this paper, we propose three allocation algorithms: a First Come First Serve (FCFS) method, a First Re-arrival First Serve (FRFS) method, and an Online First Re-arrival First Serve (OFRFS) method. Both FCFS and OFRFS are online allocation mechanisms that provide deterministic performance for hard real-time systems. The FRFS method, an off-line mechanism, serves as the standard of comparison. The capability of the proposed mechanisms is demonstrated by a series of experiments and simulations. The experimental results indicate that FRFS provides superior performance when the data access pattern can be analyzed in advance, and that the online OFRFS method provides good performance through run-time estimation of access patterns.
Contents

1 Introduction
2 Flash Memory Allocation Model
  2.1 Flash Memory Systems
  2.2 Page Access Pattern
  2.3 Flash Memory Access
3 Algorithm
  3.1 First Come First Serve Allocation
  3.2 First Re-arrival First Serve Allocation
  3.3 Online First Re-arrival First Serve Allocation
  3.4 Hybrid Online Allocation
4 Experimental Results
  4.1 Regularity of Page Access Pattern
  4.2 Experimental Guidelines and Results
    4.2.1 Experimental Guidelines
    4.2.2 Experimental Results
5 Conclusion
List of Figures

2.1 There are 8 blocks in this example and each block has 4 cells, i.e., F = 8 and B = 4. Block b_k (0 ≤ k ≤ 7) has cells c_{4k}, c_{4k+1}, c_{4k+2}, and c_{4k+3}.
2.2 The modification of a file can be modeled as a series of page modifications.
2.3 A data access example when the number of blocks is 8 and the number of cells in each block is 4. c_2 is set to invalid when the page inside c_2 is modified.
3.1 The pseudo code for the First Come First Serve (FCFS) allocation algorithm.
3.2 A complete example of FCFS in which both F and B are 4. The initial status is in (a), and the status after each step of the allocation is shown from (b) to (n). The overall maximum number of requested blocks is 4.
3.3 The pseudo code of the First Re-arrival First Serve (FRFS) allocation algorithm.
3.4 A complete example of FRFS.
3.5 The status of the prediction table after each allocation step of OFRFS.
3.6 An example of how to determine the number of cells that an incoming page has to reserve.
3.7 The pseudo code of the Online First Re-arrival First Serve (OFRFS) allocation algorithm.
3.8 An example of OFRFS. The state of the prediction table at each moment is shown in Figure 3.5. The maximum number of requested blocks is 4.
4.1 The statistic graph of the first trace record.
4.2 The statistic graph of the second trace record.
4.3 The statistic graph of the third trace record.
4.4 The average number of requested blocks for the four allocation algorithms on the first trace file.
4.5 The average number of requested blocks for the four allocation algorithms on the second trace file.
4.6 The average number of requested blocks for the four allocation algorithms on the third trace file.

List of Tables

3.1 FRFS timestamps each page and computes its re-arrival time. Note that re-arrival time i means infinity.
Chapter 1
Introduction

The recent rapid development of embedded systems has changed many aspects of our daily life. More and more embedded systems are deployed in household appliances, office machinery, transportation vehicles, and industrial controllers. These tiny devices, helped by the increasing computing power of modern microprocessors, are able to perform and control complex operations. With this advancing embedded system technology, more and more "smart" devices can provide inexpensive and reliable control capability.

Flash memory has become a very important part of embedded systems because of its shock-resistance, low power consumption, and non-volatile nature. With recent technology breakthroughs in both capacity and reliability, more and more embedded applications use flash memory as their storage system. For example, manufacturing systems in factories must tolerate severe vibration, which may damage hard disks, so flash memory is a good candidate for such applications. Flash memory is also suitable for portable devices, which have a very limited power source due to restricted battery capacity, as it lengthens their operating time. In addition, a large portion of the manufacturing cost of an embedded system product lies in the flash memory storage. The efficient use of the flash memory system, including a good allocation algorithm to fully utilize flash memory, is therefore the motivation for this research.

Two issues dominate the efficiency of a flash memory storage system implementation: write-once and bulk-erasing. Since the existing data within a flash memory cell cannot be overwritten directly, a special "erase" operation must be performed before the same cell can be reused. The new version of the data is written to some other available "live" space, and the old version is then invalidated and considered "dead".
As data is repeatedly updated, the location of the data changes from time to time; this out-of-place update scheme is adopted by flash memory systems. A bulk-erase is initiated when the flash memory storage system has a large number of live and dead cells mixed together. A bulk-erase can involve a large amount of live-data copying, since the live data within the regions to be erased must be copied elsewhere before the erase. This garbage collection recycles the space occupied by
dead data. Various techniques have been proposed to improve the performance of garbage collection for flash memory [3, 7, 8]. Kawaguchi et al. proposed a cost-benefit policy [3], which uses a value-driven heuristic function as a block-recycling policy. Chiang et al. [8] refined the work by considering the locality in the run-time access patterns. Kwoun et al. [7] proposed to periodically move live data among blocks so that blocks have more even life-cycles.

Although researchers have proposed excellent garbage-collection policies, little work has been done on providing a deterministic performance guarantee for flash-memory storage systems. It has been shown that, without proper management, garbage collection could impose almost 40 seconds of blocking time on time-critical tasks [9]. As a result, Chang et al. [5] proposed a deterministic mechanism to provide predictable garbage collection. The focus of this research is to emphasize the importance of the allocation strategy, so that the system requires less time for garbage collection. In time-critical systems we must consider both efficient garbage collection and an efficient allocation mechanism. A good allocation algorithm greatly improves garbage collection time by reducing the number of flash memory blocks required to realize a data update pattern. This motivates us to develop intelligent data allocation algorithms that reduce the resource consumption of flash memory storage systems.

The rest of this paper is organized as follows: Chapter 2 introduces the flash memory operation and allocation model. Chapter 3 presents three allocation algorithms: a First Come First Serve (FCFS) method, a First Re-arrival First Serve (FRFS) method, and an Online First Re-arrival First Serve (OFRFS) method. Chapter 4 summarizes the experimental results and Chapter 5 concludes with possible future research directions.
Chapter 2
Flash Memory Allocation Model

This chapter describes the allocation model for flash memory file systems. There are two major architectures in flash memory design: NOR flash and NAND flash [2]. NOR flash is a kind of EEPROM, while NAND flash is designed for data storage; NAND flash also has a better price/capacity ratio than NOR flash. This thesis therefore focuses on NAND flash as storage for embedded systems. Below we describe the properties of flash memory systems and the allocation problem we would like to solve.
2.1 Flash Memory Systems
A flash memory system is a collection of many cells. Each cell in the flash memory has a unique id, so the cells are denoted by c_0, c_1, c_2, .... A fixed number of cells are grouped into a block; this number, the number of cells in a block, is denoted by B. We assume that there are F blocks in the system. Each block also has a unique id, and the blocks are denoted b_0, b_1, b_2, ..., b_{F−1}. As a result, the cells in block b_k are c_{B*k}, c_{B*k+1}, c_{B*k+2}, ..., c_{B*(k+1)−1}.

Each flash memory cell can be in one of three states: free, used, and invalid. A free cell has no data in it, a used cell has valid data written into it, and an invalid cell holds data that is no longer valid. A block can be in one of two states: active and inactive. An active block has valid data written in some of its cells, while an inactive block has only free or invalid cells.

Initially all cells are free. When data is written into a free cell, the cell becomes used. Unlike a disk file system, a used cell cannot be overwritten with other data. If we would like to rewrite data stored in a cell, we need to write it to another free cell and mark the original cell invalid. Invalid cells can be put back into the free state only through an "erase" operation. However, a block can be erased only when every cell of the block is free or invalid; that is, only inactive blocks can be erased. After the erase operation all cells of the block become free and new data can be written into them. Conversely, if any cell of a block is in the used state, the block is active, and we cannot erase it.

Figure 2.1: There are 8 blocks in this example and each block has 4 cells, i.e., F = 8 and B = 4. Block b_k (0 ≤ k ≤ 7) has cells c_{4k}, c_{4k+1}, c_{4k+2}, and c_{4k+3}.
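Under these definitions, the cell/block model can be sketched directly in Python. This is a minimal illustration; the class and method names (FlashModel, block_of, etc.) are ours, not from the thesis.

```python
# A sketch of the flash model above: F blocks of B cells each, where
# block b_k owns cells c_{B*k} .. c_{B*(k+1)-1}. Names are illustrative.

FREE, USED, INVALID = "free", "used", "invalid"

class FlashModel:
    def __init__(self, num_blocks, cells_per_block):
        self.F = num_blocks
        self.B = cells_per_block
        self.state = [FREE] * (self.F * self.B)   # one state per cell

    def block_of(self, cell):
        return cell // self.B          # cell c_i lives in block b_{i // B}

    def cells_of(self, block):
        return range(block * self.B, (block + 1) * self.B)

    def is_active(self, block):
        # A block is active iff at least one of its cells holds valid data.
        return any(self.state[c] == USED for c in self.cells_of(block))

    def erase(self, block):
        # Only inactive blocks (no used cells) may be erased.
        assert not self.is_active(block)
        for c in self.cells_of(block):
            self.state[c] = FREE

flash = FlashModel(num_blocks=8, cells_per_block=4)  # F = 8, B = 4 (Figure 2.1)
flash.state[5] = USED
print(flash.block_of(5))    # → 1 : cell c_5 belongs to block b_1
print(flash.is_active(1))   # → True
```

Erasing block b_1 here would trip the assertion until c_5 is invalidated first, mirroring the rule that only inactive blocks can be erased.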
2.2 Page Access Pattern
A file is divided into several pages, denoted p_0, p_1, p_2, .... Each file page has the same size and fits into one flash memory cell. The pages of a file are written into allocated flash memory cells. We assume that file operations are performed in pages, and we concentrate on those pages that are modified or written. As a result, a file modification operation can be modeled as a sequence of page (and flash memory cell) modifications. This sequence is called the page access pattern.
Figure 2.2: The modification of a file can be modeled as a series of page modifications.
2.3 Flash Memory Access
Having defined the file access pattern and the flash memory model, we now formally define access to the flash memory. Accesses fall into two categories: reading and writing. Since only the write operation changes the state of cells and the allocation status of the flash memory, we focus on write operations; that is, we simply ignore read operations in the page access pattern.

We divide the process of writing to the flash memory into three stages. In the first stage, we transform the file modification process into a page access pattern. In the second stage, we use a function f, called the page allocation function, to map each page in the page access pattern to a free cell. We apply f to the page access pattern obtained from the first stage and allocate a cell for each page. In the third stage, page p_k of the page access pattern is written into the memory cell c_{f(p_k)} decided in the second stage, and the state of cell c_{f(p_k)} is set to "used". If the same page appeared in the page access pattern before, we set the state of the cell it was previously allocated to "invalid". Figure 2.3 gives an example of this process: we put page p_k into cell c_k, and if cell c_k is already used, we search for the first free cell after c_k and put the page into it.
Figure 2.3: A data access example when the number of blocks is 8 and the number of cells in each block is 4. c_2 is set to invalid when the page inside c_2 is modified.

In the example above, we put pages p_2, p_4, p_5, p_6, and p_9 into cells c_2, c_4, c_5, c_6, and c_9 respectively. When page p_2 arrived again we tried to allocate c_2 for it, but since that cell is used we searched for the first free cell after c_2, found c_3, and allocated it to page p_2. In addition, since p_2 appeared before, we need to invalidate the cell p_2 was previously allocated.
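The naive placement policy of Figure 2.3 (try cell c_k for page p_k, else the first free cell after it, invalidating the page's old copy on rewrite) can be sketched as follows. The function name and return format are ours, for illustration only.

```python
# A sketch of the Figure 2.3 placement: page p_k is tried at cell c_k;
# if that cell is not free, the first free cell after it is used. A page's
# earlier copy is invalidated on rewrite. Names are illustrative.

def allocate(pattern, num_cells):
    state = ["free"] * num_cells        # per-cell state
    location = {}                       # page id -> cell currently holding it
    for page in pattern:
        # try cell c_page first, then scan forward for the first free cell
        cell = next(c for c in range(page, num_cells) if state[c] == "free")
        if page in location:            # page re-arrives: old copy dies
            state[location[page]] = "invalid"
        state[cell] = "used"
        location[page] = cell
    return location

# The Figure 2.3 pattern: p2 p4 p5 p6 p9, then p2 again (F = 8, B = 4).
print(allocate([2, 4, 5, 6, 9, 2], 32))   # → {2: 3, 4: 4, 5: 5, 6: 6, 9: 9}
```

On the second arrival of p_2, cell c_2 is already used, so the scan lands on c_3 and c_2 is invalidated, exactly as in the text.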
Chapter 3
Algorithm

This chapter describes three algorithms for the flash memory allocation problem: a First Come First Serve (FCFS) method, a First Re-arrival First Serve (FRFS) method, and an Online First Re-arrival First Serve (OFRFS) method.
3.1 First Come First Serve Allocation
The First Come First Serve allocation algorithm places pages according to their arrival time, with the first arriving page occupying the first available flash memory cell. This strategy is intuitive for two reasons. First, FCFS is very easy to implement: we simply place the pages without any complex computation. Second, it seems reasonable that the first incoming page will become invalid first, so under FCFS blocks could be reused at the earliest possible time; as a result FCFS may require fewer memory blocks for the same page access pattern.

The FCFS allocation algorithm uses a data structure called the block list to store the flash memory blocks. We allocate a block from the block list and put the incoming page into a cell of that block to finish the allocation process. As described earlier, each block of the flash memory has a unique identification number, and the blocks are denoted b_0, b_1, ..., b_{F−1}, where F is the total number of blocks. Initially the block list contains all the blocks of the flash memory, in increasing order of identification number.

FCFS places each page into memory according to its arrival time. That is, the first page is placed into the first position of the first block in the block list, the second page into the second position of the first block, and so on. In other words, we start placing pages into the second block of the block list only when the cells of the first block are all used. As a result we fill the blocks sequentially, one block at a time.

The flash memory recycles blocks when necessary. If a page re-appears in the access pattern, we set the cell it has been occupying to invalid and allocate a new cell for it using FCFS, as if it were a new incoming page. Formally, a block is active if and only if there is at least one used cell in it, i.e., a cell that contains valid data; a block is inactive only when all of its cells are either free or invalid. An inactive block can be erased and become ready for use again. If FCFS finds that the block the data previously occupied has become inactive, it moves this re-usable block to the end of the block list so it can be reused. As a result, the k-th incoming page of the access pattern is placed into the ((k − R*B)%B)-th cell of the ((k − R*B)/B)-th block in the block list, where B is the number of cells in a block and R is the number of blocks erased so far.

During the FCFS allocation procedure, we keep track of the total number of active blocks, i.e., blocks that have used cells in them. At the end, the maximum of these counts is the number of blocks requested. Figure 3.1 gives the FCFS pseudo code.
Initialize the block list.
for each page in the page access pattern {
    Set the position of the page according to its arrival time.
    Place the page into the ((position − R*B)%B)-th cell of the ((position − R*B)/B)-th block in the block list.
    If the page appeared before, mark the cell it previously resided in invalid.
    If the block the page previously resided in is now inactive, move it to the end of the block list to be reused.
    Count the number of active blocks in the block list, and compare it with the count from the previous iteration.
}
Figure 3.1: The pseudo code for the First Come First Serve (FCFS) allocation algorithm.

Figure 3.2 gives a complete example of FCFS allocation in which both F and B are 4. As shown in (a), the page access pattern is a b c b a a d b d a d d a. From (b) to (n), we place each page into a cell according to its arrival time. In (b), we place page a into cell c_0 because it is the first page in the access pattern; the number of active blocks is now 1. In (c) and (d), we place pages b and c into cells c_1 and c_2 respectively. Note that in (e) page b reappears, so after placing b into a new cell c_3 according to its arrival time we mark cell c_1 invalid (indicated by a slash in c_1). In (f), we do the same for page a: place it into cell c_4 according to its arrival time and mark the previously allocated cell c_0 invalid. The number of active blocks now increases to 2. From (g) to (i), we do the same for the following three pages: place each page into a cell and mark the cell it previously resided in invalid. The number of active blocks increases to 3 after (j). From (k) to (m), we do the same for the following three pages. The number of active blocks increases to 4 after (n), which is the maximum number of active blocks during the entire allocation process. Therefore the number of requested blocks is 4.
Figure 3.2: A complete example of FCFS in which both F and B are 4. The initial status is in (a), and the subsequent status after each step of the allocation is shown from (b) to (n). The overall maximum number of requested blocks is 4.
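The FCFS procedure can be simulated directly. The sketch below (function and variable names are ours, not from the thesis) replays an access pattern with the block-list mechanics described above and reports the maximum number of active blocks, reproducing the result of Figure 3.2.

```python
# A sketch of FCFS: pages are placed in arrival order; position k (with R
# blocks erased so far) maps to cell (k - R*B) % B of block (k - R*B) // B
# in the block list. Names are illustrative.
from collections import deque

FREE, INVALID = None, "«invalid»"       # cell-state sentinels

def fcfs(pattern, F, B):
    """Replay a page access pattern under FCFS; return the maximum
    number of simultaneously active blocks (the blocks requested)."""
    blocks = deque(range(F))            # block list, by identification number
    cells = {b: [FREE] * B for b in range(F)}
    location = {}                       # page -> (block id, cell index)
    k, R, max_active = 0, 0, 0          # arrival index, erased blocks, result
    for page in pattern:
        pos = k - R * B                 # net position within the block list
        blk = blocks[pos // B]
        cells[blk][pos % B] = page
        if page in location:            # re-arrival: invalidate the old copy
            ob, oc = location[page]
            cells[ob][oc] = INVALID
            if all(v is INVALID for v in cells[ob]):
                blocks.remove(ob)       # inactive block: erase and recycle
                blocks.append(ob)       # move to the end of the block list
                cells[ob] = [FREE] * B
                R += 1
        location[page] = (blk, pos % B)
        k += 1
        active = sum(1 for b in range(F)
                     if any(v is not FREE and v is not INVALID
                            for v in cells[b]))
        max_active = max(max_active, active)
    return max_active

print(fcfs(list("abcbaadbdadda"), F=4, B=4))   # → 4, as in Figure 3.2
```

Running it on the Figure 3.2 pattern gives 4 requested blocks, matching the walkthrough above.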
3.2 First Re-arrival First Serve Allocation
Despite the fact that FCFS is intuitive and easy to implement, it does not perform well in practice. Instead of placing pages according to their arrival time, we propose another strategy, called First Re-arrival First Serve (FRFS), that places pages according to their re-arrival time. A page re-arrives when the same page appears again in the page access pattern, and the re-arrival time is the time at which it re-arrives. If a page does not re-arrive in the access pattern, its re-arrival time is set to infinity. The intuition behind allocating cells by re-arrival time is that we want to reuse blocks as soon as possible, so that the allocation uses a minimal number of blocks. By placing first those pages that will be marked invalid first, the first used block can be reused at the earliest possible time.

FRFS uses the same block list data structure as FCFS. Initially the block list contains all blocks of the flash memory, sorted by their identification numbers. The FRFS algorithm has three stages. In the first stage FRFS computes the re-arrival time of each page by scanning through the entire access pattern. Note that FRFS needs to know the page access pattern in advance; otherwise, it cannot know the re-arrival time of each page. During this stage, we use an array to record the previous arrival time of each page, from which we obtain its re-arrival time. The re-arrival time of pages that do not appear again is set to infinity. Table 3.1 gives the computation of the re-arrival time. In the second stage FRFS allocates a cell for each page. We sort the page occurrences by their re-arrival time and assign each of them an ordinal number: the occurrence with the earliest re-arrival time is assigned 0, the one with the second earliest re-arrival time is assigned 1, and so on.
Table 3.1 illustrates an access pattern, the re-arrival time of each page, and the ordinal numbers they are assigned according to their re-arrival time.

Time:             1  2  3  4  5  6  7  8  9 10 11 12 13
Access Pattern:   a  b  c  b  a  a  d  b  d  a  d  d  a
Re-arrival Time:  5  4  i  8  6 10  9  i 11 13 12  i  i
Order:            1  0  9  3  2  5  4 10  6  8  7 11 12
Table 3.1: FRFS timestamps each page and computes its re-arrival time. Note that re-arrival time i means infinity.

In the third stage FRFS places the pages into the blocks according to the ordinal numbers assigned in the second stage, and the number of blocks requested by FRFS can be calculated. FRFS places each page according to its ordinal number: if its ordinal number is k, it is placed into the ((k − R*B)%B)-th cell of the ((k − R*B)/B)-th block in the block list, where B is the number of cells in a block and R is the number of blocks erased so far. Note that when a page re-arrives,
we need to set the status of the cell it previously resided in to invalid. If all cells of a block are invalid, the block is inactive, and it can then be erased and reused. We move this re-usable block to the end of the block list and increase R by one. During the entire allocation procedure we keep track of the number of blocks requested; at the end, the maximum of these counts is the number of blocks that will be needed by FRFS. Figure 3.3 gives the pseudo code for FRFS.
Initialize the block list.
Compute the re-arrival time of each page.
Compute the order according to the re-arrival time.
for each page in the page access pattern {
    Set the position of the page according to its order.
    Place the page into the ((position − R*B)%B)-th cell of the ((position − R*B)/B)-th block in the block list.
    If the same page appeared before, mark the cell it previously resided in invalid.
    If the block the page previously resided in is inactive, move it to the end of the block list to be reused.
    Count the number of active blocks in the block list, and compare it with the count from the previous iteration.
}
Figure 3.3: The pseudo code of the First Re-arrival First Serve (FRFS) allocation algorithm.

Figure 3.4 gives an example of FRFS. In this example F, the number of blocks in the flash memory, is 4, and B, the number of cells in each block, is 4. As shown in (a), the page access pattern is a b c b a a d b d a d d a. From (b) to (n), we place each page into a cell according to its ordinal number with respect to its re-arrival time. In (b), we place page a into cell c_1 because its ordinal number is 1; the number of active blocks is now 1. In (c) and (d), we place pages b and c into cells c_0 and c_9 respectively. After (d), the number of active blocks increases to 2. Note that in (e) page b reappears, so after placing b into a new cell c_3 we mark the previous cell c_0 invalid (indicated by a slash in c_0). In (f), we place page a into a new cell c_2 and mark its previous cell c_1 invalid. The number of active blocks increases to 3 after (g). In (h), we place page d into cell c_4; because it is a new page, we do not have to mark any cell invalid. It is interesting that in (i) block b_0 is empty. The reason is that when page b reappears, we mark its previously allocated cell c_3 invalid, and after doing so we discover that all cells in block b_0 are now invalid. As a result we erase this block and mark all its cells free. Furthermore, we move this block to the end of the block list and increase R, the number of erased blocks so far, to 1. The block list is now b_1, b_2, b_3, b_0, and the number of active blocks decreases to 2. In (j), we place the page
d into cell c_6, the ((6 − 4*1)%4)-th cell of the ((6 − 4*1)/4)-th block in the block list. From (k) to (l), we do the same for the following pages: place each page into a cell according to its ordinal number and mark the previously allocated cell invalid. In (m), page d reappears, so we mark its previously allocated cell c_7 invalid. After doing so, we discover that all cells in block b_1 are now invalid; therefore we erase this block, mark all its cells free, move it to the end of the block list, and increase R to 2. The number of active blocks then decreases to 1. In (n) we place page a into cell c_12 and mark its previously allocated cell c_8 invalid, after which the number of active blocks increases to 2. Over all these steps, the maximum number of active blocks required is 3.
Figure 3.4: A complete example of FRFS.
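The first two FRFS stages can be sketched in a few lines of Python; this fragment (names are ours, for illustration) reproduces the re-arrival times and ordinal numbers of Table 3.1. The third stage then places pages with the same block-list mechanics as FCFS, using the ordinal number in place of the arrival index.

```python
# Stage 1: for each occurrence, compute the (1-based) time at which its
# page re-arrives (infinity if it never does). Stage 2: rank occurrences
# by re-arrival time to get ordinal numbers. Names are illustrative.
INF = float("inf")

def rearrival_times(pattern):
    nxt = [INF] * len(pattern)
    last = {}                           # page -> index of its previous arrival
    for t, page in enumerate(pattern):
        if page in last:
            nxt[last[page]] = t + 1     # previous occurrence re-arrives now
        last[page] = t
    return nxt

pattern = list("abcbaadbdadda")
re_t = rearrival_times(pattern)

# Ordinal number = rank of each occurrence when sorted by re-arrival time
# (ties broken by arrival order).
order = sorted(range(len(pattern)), key=lambda i: (re_t[i], i))
ordinal = [0] * len(pattern)
for rank, i in enumerate(order):
    ordinal[i] = rank

print(ordinal)   # → [1, 0, 9, 3, 2, 5, 4, 10, 6, 8, 7, 11, 12]
```

The printed list matches the "Order" row of Table 3.1, and `re_t` matches its "Re-arrival Time" row with i rendered as infinity.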
3.3 Online First Re-arrival First Serve Allocation
Despite the fact that FRFS performs better than FCFS, as we will see in the next chapter on experimental results, FRFS may not be practical, since it needs to know the entire access pattern in order to make its allocation decisions. In practice it is usually impossible to obtain this global knowledge, and most of the time we are required to allocate a page in an online manner; that is, we must make an allocation decision as soon as a page request arrives. As a result, we have to modify FRFS so that it can adapt to the online scenario.

We observe that most page access patterns contain a certain amount of "regularity". By regularity we mean that the interval between appearances of the same page is more or less constant. In the next chapter we formally define this interval regularity and describe evidence that it does exist in file access traces collected from real systems. Based on this observation, we can modify the FRFS algorithm to exploit the interval regularity and build an online FRFS (OFRFS) that estimates the time at which a page will re-appear. Namely, we use the interval obtained from the history, together with the time the page last appeared, to estimate when the same page will reappear, and allocate a cell for it accordingly.

The online FRFS uses two essential data structures. The first is a block list, as in FCFS; initially the block list contains all the blocks of the flash memory, sorted by their identification numbers. The second is a prediction table that contains prediction information for all the pages that have appeared. In an actual implementation the prediction table would be placed in the random access memory of the embedded device, so that the prediction information can be accessed quickly.
Each page entry in the prediction table contains two important data: the estimated arrival interval of the page (denoted by α) and the last time it appeared (denoted by β). Initially, the prediction table is empty. When a new page appears, we set its estimated arrival interval α to a default value v, the mean of the intervals of all pages observed so far. Then we set the last arrival time β of this page to the current time, and insert this entry into the prediction table. If the incoming page is already present in the prediction table, we update the estimated interval as a linear combination of the previous α and the length of the interval between the current time and the previous arrival time β, since the latter is exactly the most recent interval at which this page re-appeared. Formally, we compute the new estimated interval length as in Equation 3.1, where α* is the new estimated interval length, t is the current time, and r is a constant between 0 and 1 [4].

α* = r·α + (1 − r)(t − β)    (3.1)
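Equation 3.1 is an exponentially weighted moving average of the inter-arrival interval: r weighs the old estimate against the newest observed interval t − β. A minimal sketch (the function name is ours, for illustration):

```python
# Equation 3.1 as code: blend the previous interval estimate alpha with
# the latest observed interval (t - beta), weighted by r. Illustrative.

def update_interval(alpha, beta, t, r=0.5):
    """Return the new estimated interval alpha* given the previous
    estimate alpha, previous arrival time beta, and current time t."""
    return r * alpha + (1 - r) * (t - beta)

# Page b from Figure 3.5: alpha = 2.64, last seen at time 2, re-arrives at 4.
print(round(update_interval(2.64, 2, 4), 2))   # → 2.32
```

With r = 0.5 this is exactly the "half old estimate plus half latest interval" rule used in the Figure 3.5 example.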
Finally, by definition, we set the previous arrival time β to the current time. Figure 3.5 gives an example of how we estimate the length of the next re-arrival interval as the sum of half of the previously estimated interval length and half of the latest interval length. In each prediction-table entry, the symbol on the left is the page identifier, the number in the middle is the estimated interval of the page (α), and the number on the right is the previous arrival time (β). The ratio constant r is set to 0.5 in this example.
Time  Page  Prediction table (page: α, β)
 1    a     a: 2.64, 1
 2    b     a: 2.64, 1   b: 2.64, 2
 3    c     a: 2.64, 1   b: 2.64, 2   c: 2.64, 3
 4    b     a: 2.64, 1   b: 2.32, 4   c: 2.64, 3
 5    a     a: 3.32, 5   b: 2.32, 4   c: 2.64, 3
 6    a     a: 2.16, 6   b: 2.32, 4   c: 2.64, 3
 7    d     a: 2.16, 6   b: 2.32, 4   c: 2.64, 3   d: 2.64, 7
 8    b     a: 2.16, 6   b: 3.16, 8   c: 2.64, 3   d: 2.64, 7
 9    d     a: 2.16, 6   b: 3.16, 8   c: 2.64, 3   d: 2.32, 9
10    a     a: 3.08, 10  b: 3.16, 8   c: 2.64, 3   d: 2.32, 9
11    d     a: 3.08, 10  b: 3.16, 8   c: 2.64, 3   d: 2.16, 11
12    d     a: 3.08, 10  b: 3.16, 8   c: 2.64, 3   d: 1.58, 12
13    a     a: 3.04, 13  b: 3.16, 8   c: 2.64, 3   d: 1.58, 12
Figure 3.5: The status of the prediction table after each allocation step of OFRFS.

Now we give a complete example of how OFRFS estimates the arrival interval in Figure 3.5. Initially the prediction table is empty. When the first page a appears at time 1, we set its estimated interval α to v and its last arrival time β to 1, and insert a into the prediction table. For ease of explanation we set v to 2.64 and r to 0.5 in the following description. When new pages b and c arrive at times 2 and 3 respectively, we set their estimated intervals α and last arrival times β accordingly, and insert them into the table. Now page b reappears at time 4. Since b is already in the table, we calculate its new estimated interval according to Equation 3.1: the new estimated interval α is 2.64/2 + (4 − 2)/2 = 2.32, and the last arrival time β is set to the current time, 4. When page a reappears at times 5 and 6, since it is already in the prediction table, we just update its estimated arrival interval α and last arrival time β. The new estimated arrival
interval α is 2.64/2 + (5 − 1)/2 = 3.32 and 3.32/2 + (6 − 5)/2 = 2.16 at time 5 and 6 respectively. The last arrival time β of a is set to 5 and 6 respectively. Now a new page d appeared at time 7, and we set its estimated arrival interval α to a default value 2.64, and the last arrival time β to 7. From time 8 to 13, we do the same for the following six pages – update the estimated arrival interval α and the last arrival time β. We now complete the online FRFS implementation after we know how to estimate the re-arrival time for each page. When a page appeared we first obtain its estimated re-arrival interval from the prediction table. Then we add the new estimated interval and the current time together as the predicted re-arrival time. When FRFS allocates a cell for the coming page, it needs to “skip” certain number of cells, for those pages that will have re-arrival time smaller than the current page. The reason is that by allocating the order the pages reappear, blocks of cells can be reused as soon as possible. Therefore, we need to count the total number of pages which will have smaller estimated re-arrival time. We now describe the process of counting the number of pages that will have smaller estimated re-arrival time than the current page, so that we may skip the right number of cells during allocation. When the current page arrives, its re-arrival time is the sum of its estimated interval α and the current time. Now for the other pages, by adding its estimated interval α to its last arrival time β, we obtain the time it will arrive and be allocated into a cell we reserved for it. However, the time this page is expected to leave the reserved cell is the sum of twice of their estimated interval α and the last arrival time β – the re-arrival time of this page after being allocated into the reserved cell. 
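The interval update of Equation 3.1 can be sketched with a dictionary-based prediction table. This is a minimal sketch with our own helper names, assuming v = 2.64 and r = 0.5 as in the running example:

```python
# Prediction table: page id -> (alpha, beta), i.e. (estimated re-arrival
# interval, last arrival time).
V_DEFAULT = 2.64   # default estimated interval v for unseen pages
R_RATIO = 0.5      # ratio constant r of Equation 3.1

table = {}

def update(page, now):
    """Record an arrival of `page` at time `now` in the prediction table."""
    if page not in table:
        table[page] = (V_DEFAULT, now)   # first arrival: use the default interval
    else:
        alpha, beta = table[page]
        # New estimate: r * previous estimate + (1 - r) * latest observed interval.
        table[page] = (R_RATIO * alpha + (1 - R_RATIO) * (now - beta), now)

# Replaying the start of Figure 3.5: a@1, b@2, c@3, then b re-arrives at 4.
for page, t in [("a", 1), ("b", 2), ("c", 3), ("b", 4)]:
    update(page, t)
print(table["b"])   # b's interval becomes 2.64/2 + (4 - 2)/2 = 2.32, beta = 4
```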
For each of the other pages p, we compare the sum of twice its estimated interval and its last arrival time with the sum of the estimated interval of the current page and the current time. If page p has the smaller re-arrival time, we reserve a cell for it. Figure 3.6 illustrates an example.
Figure 3.6: An example of how to determine the number of cells that an incoming page has to reserve.
Consider the example in Figure 3.6. The re-arrival time of the current page is simply the sum of its estimated interval α and the current time. Page a was allocated into some cell at t2, and is expected to re-arrive at time t3. If we reserve a cell for a now, we expect a to stay in the reserved cell until it leaves at t4. Since t4 < t6, page a will leave the cell we prepared before the current page does, so the decision to reserve a cell for it is valid. On the other hand, page b was allocated into some cell at time t1, and is expected to re-arrive at t5. If we reserve a cell for it, it is expected to leave the reserved cell at t7. However, since t7 > t6, it does not make sense to reserve a cell for b while allocating the current page.

During the estimation process we do not know the status of every page that will eventually appear; we only know the predicted re-arrival times of the pages that have appeared in the past. As a result, we simply count the number of known pages that have a smaller predicted re-arrival time than the current page. Let S denote this number of other pages that will reappear earlier than the current page. Knowing S, we look for the (S + 1)-th free cell in the block list and put the page into it. Note that when a page re-arrives, we need to set the status of the cell it previously resided in to invalid. If a block contains no written cell, the block is inactive and can be erased and reused, so we move this re-usable block to the end of the block list. During the entire allocation procedure we keep track of the number of blocks requested; at the end, the maximum of these block counts is the number of blocks needed by OFRFS. Figure 3.7 gives the pseudo code for OFRFS.

Figure 3.8 gives an example of OFRFS. The number of blocks in flash memory (F) is 4 and the number of cells in each block (B) is 4. The parameter v is set to 2.64 and r is 0.5.
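The reservation test of Figure 3.6 amounts to a simple count over the prediction table. A sketch, with illustrative names and the table represented as a dictionary from page id to (α, β):

```python
def skip_count(table, current_page, now):
    """Count pages expected to leave their reserved cells before the current
    page re-arrives; this is the number S of free cells to skip."""
    alpha_cur, _ = table[current_page]
    current_rearrival = now + alpha_cur
    s = 0
    for page, (alpha, beta) in table.items():
        if page == current_page:
            continue
        # Page p leaves its reserved cell at beta + 2*alpha (arrive once,
        # then re-arrive again); reserve a cell only if that happens first.
        if beta + 2 * alpha < current_rearrival:
            s += 1
    return s

# At time 2 in Figure 3.8, page b arrives: a would leave its reserved cell at
# 1 + 2*2.64 = 6.28, after b's re-arrival at 2 + 2.64 = 4.64, so S = 0.
print(skip_count({"a": (2.64, 1), "b": (2.64, 2)}, "b", 2))  # 0

# At time 7, the new page d arrives: b and c are expected to leave their
# reserved cells at 8.64 and 8.28, both before d's re-arrival at 9.64.
table = {"a": (2.16, 6), "b": (2.32, 4), "c": (2.64, 3), "d": (2.64, 7)}
print(skip_count(table, "d", 7))  # 2, so d goes into the third free cell
```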
As shown in Figure 3.8(a), the page access pattern is a b c b a a d b d a d d a. From Figure 3.8(b) to (n), we place each page into a cell according to its predicted re-arrival time; the state of the prediction table is shown in Figure 3.5. In Figure 3.8(b) we place page a into cell c0, since there is no known page with a smaller predicted re-arrival time, as indicated in Figure 3.5. For the next page b, page a has a predicted re-arrival time of 1 + 2 ∗ 2.64 = 6.28 and page b has a predicted re-arrival time of 2 + 2.64 = 4.64, so the number of other pages that will reappear earlier than page b is 0, and we place page b into the first free cell c1 in the block list. Similarly, we place the next page c into the first free cell c2, because pages a, b, and c have predicted re-arrival times 6.28, 7.28, and 5.64 respectively, and none of these is smaller than that of page c. In Figure 3.8(e) page b reappears with a predicted re-arrival time larger than that of page a, so we place it into the second free cell c4; the number of requested blocks increases to 2. In addition we mark its previously allocated cell c1 invalid, as indicated by a slash in the cell. In Figure 3.8(f) page a reappears, so we mark its previously allocated cell c0 invalid after placing it into the second free cell c5. In Figure 3.8(g) we place page a into cell c3 and mark its previously allocated cell c5 invalid. In (h), we place page d into the third free cell c8, and the number of requested blocks increases to 3.
 1: Initialize the block list.
 2: Initialize the prediction table.
 3: For each page in the page access pattern:
 4:     If the page did not appear before, set its estimated interval to the default value and insert it into the prediction table; otherwise, re-calculate the estimated interval and update the table.
 5:     Use the sum of the current time and the estimated interval as the predicted re-arrival time of the current page; use the sum of the previous arrival time and twice the estimated interval as the predicted re-arrival time of every other page.
 6:     Count the number of pages in the prediction table which have a smaller predicted re-arrival time than the current page; let this number be S.
 7:     Place the page into the (S+1)-th free cell in the block list.
 8:     If the same page appeared before, mark the cell it previously resided in invalid.
 9:     If the block the page previously resided in is now inactive, move it to the end of the block list to be reused.
10:     Record the number of active blocks in the block list, and compare it with the previous iteration.
Figure 3.7: The pseudo code of On-line First Re-arrival First Serve Allocation (OFRFS) algorithm.
It is interesting to note that in (i) block b1 is empty. The reason is that when we placed page b into cell c9 and marked its previously allocated cell c4 invalid, we discovered that there was no written cell left in block b1, so the block became inactive. As a result we erase this block and mark all of its cells free. Block b1 can now be reused; the number of active blocks decreases to 2, and the state of the block list is b0, b2, b3, b1. In (j), we do the same for page d: place it into the third free cell c12 and mark its previously allocated cell c8 invalid. After (j), the number of active blocks increases to 3. In (k), we place the incoming page a into cell c11 and mark its previously allocated cell c3 invalid. In (l), we place page d into the second free cell c13 and mark its previously allocated cell c12 invalid. From (m) to (n), we place the remaining two pages according to their predicted re-arrival times and mark their previously allocated cells invalid. The number of active blocks increases to 4 after (n). After these steps, we find that the maximum number of active blocks required is 4.
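The block-list bookkeeping in the walkthrough above (place into the (S+1)-th free cell, invalidate the old copy, recycle inactive blocks to the end of the list) can be sketched as follows; the classes and state encoding are our own simplification, not the thesis implementation:

```python
FREE, VALID, INVALID = 0, 1, 2

class Block:
    def __init__(self, cells_per_block):
        self.cells = [FREE] * cells_per_block

class BlockList:
    def __init__(self, cells_per_block):
        self.B = cells_per_block
        self.blocks = []        # ordered list of Block objects
        self.location = {}      # page id -> (Block, cell index)

    def _free_cells(self):
        # Free cells in block-list order.
        for blk in self.blocks:
            for i, state in enumerate(blk.cells):
                if state == FREE:
                    yield blk, i

    def place(self, page, skip):
        """Place `page` into the (skip+1)-th free cell (steps 7-9 of Fig. 3.7)."""
        old = self.location.get(page)
        free = list(self._free_cells())
        while len(free) <= skip:        # request a new block on demand
            self.blocks.append(Block(self.B))
            free = list(self._free_cells())
        blk, i = free[skip]
        blk.cells[i] = VALID
        self.location[page] = (blk, i)
        # Invalidate the previous copy; if its block holds no valid page any
        # more, erase it and move it to the end of the list for reuse.
        if old is not None:
            oblk, oi = old
            oblk.cells[oi] = INVALID
            if VALID not in oblk.cells:
                oblk.cells = [FREE] * self.B
                self.blocks.remove(oblk)
                self.blocks.append(oblk)

    def active_blocks(self):
        return sum(1 for blk in self.blocks
                   if any(s != FREE for s in blk.cells))

# A tiny illustration with B = 2: re-placing both pages eventually drains the
# first block, which is erased and recycled.
bl = BlockList(2)
bl.place("a", 0); bl.place("b", 0)   # first block fills: 1 active block
bl.place("a", 0)                     # a moves to a new block: 2 active blocks
bl.place("b", 0)                     # first block drains and is recycled: 1
print(bl.active_blocks())            # 1
```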
3.4
Hybrid Online Allocation
Although our online FRFS allocation algorithm allocates pages in an on-line manner with the aid of the prediction table, we do not have sufficient information to deduce the re-arrival time of pages that have very little history in the prediction table. When a new page arrives, we simply set its estimated arrival interval α to a default value v, the mean of the intervals of all pages observed in the past. This can seriously mislead the prediction: even when the page access pattern does have a certain regularity, the interval of one page can be very different from that of another, so setting the estimated interval of all pages to the same initial constant is questionable. To overcome this problem we modify our allocation algorithm as follows: if the number of times a page has appeared is small, we do not use the estimated interval from the prediction table to predict its re-arrival time; instead we allocate these pages with the naive FCFS method. We refer to this algorithm as the Hybrid Online Allocation Algorithm. It solves the problem of incorrect interval estimates caused by insufficient data in the prediction table.
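A minimal sketch of the hybrid decision, assuming we additionally keep a per-page appearance counter; the threshold value is an illustrative assumption of ours, since the thesis does not fix one:

```python
# Hybrid Online sketch: fall back to FCFS for pages with too little history.
MIN_OBSERVATIONS = 3   # hypothetical confidence threshold

def choose_skip(counts, page, now, ofrfs_skip):
    """Return the number of free cells to skip when allocating `page`.

    `counts` maps page id -> number of appearances so far; `ofrfs_skip` is a
    callable implementing the OFRFS skip count for well-observed pages.
    """
    if counts.get(page, 0) < MIN_OBSERVATIONS:
        return 0                  # too little history: FCFS, first free cell
    return ofrfs_skip(page, now)  # enough history: trust the prediction

counts = {"a": 1, "b": 5}
print(choose_skip(counts, "a", 10, lambda p, t: 7))  # 0 (FCFS fallback)
print(choose_skip(counts, "b", 10, lambda p, t: 7))  # 7 (OFRFS prediction)
```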
Figure 3.8: An example of OFRFS. The state of the prediction table at each moment is shown in Figure 3.5. The maximum number of requested blocks is 4.
Chapter 4

Experimental Results

This chapter describes our experimental results. Before presenting the performance of each allocation algorithm, we verify through experiments the regularity of the page access pattern, so as to justify the prediction method used in the online algorithms.
4.1
Regularity of Page Access Pattern
In order to predict the re-arrival time of a particular page, we must have certain confidence in the "regularity" of the page access pattern. In this section we formally define this regularity and demonstrate that it does exist in the trace files we collected from real systems.

To quantify this interval regularity, for each page we calculate the mean value and standard deviation of the intervals at which the same page reappears. For each page we then calculate the ratio of the standard deviation to the mean value of these intervals. This ratio is denoted by R, and a small value of R indicates that the page reappears in a predictable pattern. We use this ratio R as the criterion to judge the regularity of a page access pattern.

Since each page has its own ratio R of standard deviation to mean interval length, the distribution of R over all pages is a good indicator of how well we will be able to predict the re-arrival time. We construct the R histogram as follows. We first find the maximum ratio R among all pages and denote it by RM. The R domain is then divided into 100 equally sized sub-domains, denoted by r0, r1, ..., r99, where the range of rk is between RM ∗ k/100 and RM ∗ (k + 1)/100. We then construct the histogram by counting the number of pages that fall into each sub-domain. Figure 4.1(a) gives an example of the histogram.

To compute the probability that a page has an R value in a specified range, we assign a weight to each page, namely the frequency with which the page appears in the trace file. Each page then contributes its weight towards the overall distribution: we compute the sum of the weights of all pages in a sub-domain of R and divide it by the total weight of all pages. This quantity is indeed the probability that a page has an R value in the specified range.

Now we calculate the accumulated probability of these ratio values. We are interested in the following question: given a fixed probability p, what is the minimum ratio value R∗ such that the probability of the random variable R being smaller than or equal to R∗ is at least p? Formally, we require

Pr(R ≤ R∗) ≥ p.    (4.1)
For the sub-domain rk we sum up the probabilities of sub-domains r0, r1, ..., rk, and plot the accumulated probability in Figure 4.1(b).

We collected three trace files from three different real systems. The first trace file was collected from a FAT32 file system; the applications are a web browser and an email client. The second trace is a disk trace downloaded from the BYU Trace Distribution Center [1]; it is from a TPC-C database benchmark with 20 warehouses, with one client running one iteration, where the database system is Postgres 7.1.2 and the trace was collected with DTB v1.1 on a Redhat Linux 2.4.13 kernel. The third trace was collected from a FAT32 file system with Microsoft tracelog v5.0; the applications include a web browser, an editor, P2P software, and an email client.

For each of our trace records there are two statistics graphs, shown in Figures 4.1, 4.2, and 4.3. When we set p to 90%, R∗ equals 31, 3, and 49 respectively for the three trace records. We conclude that trace2 has higher regularity than trace1, which in turn has higher regularity than trace3.
(a) Histogram of the first trace file.
(b) Accumulated probability graph of the first trace file.
Figure 4.1: The statistic graph of the first trace record.
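The per-page regularity measure R can be computed directly from a page's arrival times. A sketch, using the sample statistics from Python's statistics module (the thesis does not specify which variance convention it used):

```python
from statistics import mean, stdev

def regularity(arrival_times):
    """R = stdev/mean of the intervals between consecutive arrivals of a page.

    Returns None when the page re-arrived too few times to measure spread.
    """
    intervals = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    if len(intervals) < 2:
        return None
    return stdev(intervals) / mean(intervals)

print(regularity([1, 3, 5, 7]))     # perfectly regular page: R = 0.0
print(regularity([1, 2, 5, 6, 9]))  # irregular page: R > 0
```

A small R means the intervals are tightly clustered around their mean, which is exactly when predicting the next re-arrival from past intervals is reliable.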
4.2
Experimental Guidelines and Results
This section describes the performance of each of our algorithms. We first describe the simulation parameters.
(a) Histogram of the second trace file.
(b) Accumulated probability graph of the second trace file.
Figure 4.2: The statistic graph of the second trace record.
(a) Histogram of the third trace file.
(b) Accumulated probability graph of the third trace file.
Figure 4.3: The statistic graph of the third trace record.
To simplify the discussion, we assume that the total number of blocks in the flash memory system is infinite, and we only focus on the number of blocks that are actually used. Based on current flash memory technology we assume that the number of cells in each block, B, is 32 [6]. Finally, we use three real trace files of disk write operations as our page access patterns, instead of randomly produced patterns; this lets us examine our algorithms in a realistic experimental setting. Note that a disk trace could cover much more data than the capacity of a flash memory system, so we limit the range of our page access pattern to fit within a flash file system. A file is divided into pages p0, p1, p2, ..., and we limit the total number of accessed pages to 65536. As a result the pages in a page access pattern are taken from the set {p0, p1, ..., p65535}.
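As an illustration only, a raw write offset from a disk trace could be mapped into this bounded page set as follows; the page size and the modulo wrap are our assumptions, since the thesis does not give the exact mapping:

```python
PAGE_SIZE = 512    # assumed write granularity in bytes (hypothetical)
MAX_PAGES = 65536  # bound on the page set, as in the experimental setup

def to_page(offset):
    """Map a byte offset in the trace to a page id in {0, ..., 65535}."""
    return (offset // PAGE_SIZE) % MAX_PAGES

print(to_page(1024))          # 2
print(to_page(65536 * 512))   # wraps back to 0
```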
4.2.1
Experimental Guidelines
We implemented four algorithms: First Come First Serve (FCFS), First Re-arrival First Serve (FRFS), Online First Re-arrival First Serve (OFRFS), and Hybrid Online (HO). For each trace file we compare their performance under different access pattern lengths N. We conducted experiments with N set to 2^n ∗ 256 for n = 1, 2, 3, ..., up to a maximum value that depends on the length of the trace file. For each of these page access patterns we run the four allocation algorithms 100 times and compare the average number of requested blocks.
4.2.2
Experimental Results
Figures 4.4, 4.5, and 4.6 plot the relation between the length of the page access pattern and the average number of requested blocks for each trace file.

Observations. From Figures 4.4, 4.5, and 4.6 we make the following observations. First, for each trace file the average number of requested blocks of FCFS is larger than that of FRFS. Secondly, the average number of requested blocks of OFRFS is less than that of FCFS in the first trace file, almost the same as that of FCFS in the second trace file, and actually larger than that of FCFS in the third trace file. Thirdly, the average number of requested blocks of HO is smaller than that of OFRFS in all trace files.

Explanations. There are two reasons why FRFS outperforms FCFS. First, FRFS permutes the pages of the page access pattern into the order given by their re-arrival times. Consequently, the pages that will become invalid earlier are put into the first blocks; when the last page in the first block arrives again, the block can immediately be erased and re-used. This accelerates the rate at which re-usable blocks are produced and decreases the number of requested blocks. Secondly, every page access pattern has some pages that do not re-appear, which we call infinite pages; these infinite pages occupy the cells they reside in forever. In the FRFS allocation algorithm, the infinite pages have infinite re-arrival times, so they are all allocated to the last possible blocks.
Figure 4.4: The average number of requested blocks for the four allocation algorithms on the first trace file.
Figure 4.5: The average number of requested blocks for the four allocation algorithms on the second trace file.
Figure 4.6: The average number of requested blocks for the four allocation algorithms on the third trace file.
In our implementation we allocate these infinite pages separately, in a different block area, so that they are not distributed among blocks that could otherwise be reused, which would prevent those blocks from being recycled. As a result FRFS avoids the situation where infinite pages interfere with the recycling process, and this also decreases the final number of requested blocks.

We observe that OFRFS performs quite differently on different trace files. The reason appears to be the regularity of the trace files: the regularity of trace2 is higher than that of trace1, which is higher than that of trace3. The average number of requested blocks of OFRFS is larger than that of FCFS in the third trace file because the third trace file has the least regularity among the three, and OFRFS cannot precisely predict the re-arrival time of each page. The performance of OFRFS is almost the same as that of FCFS in the second trace file, because when the trace file is regular the performance of FCFS also improves: for the same page access pattern length, the average number of requested blocks of FCFS in the second trace file is smaller than that in the first trace file, so the performance of OFRFS is similar to that of FCFS. On the other hand, the first trace file has medium regularity, and there OFRFS performs better than FCFS. It seems that OFRFS is less sensitive to regularity than FCFS is, so its performance does not degrade as much as that of FCFS.

The average number of requested blocks of HO is smaller than that of OFRFS in all trace files. HO eliminates the situation where we allocate a page according to a wrongly predicted re-arrival time caused by insufficient data in the prediction table. If we do not trust the estimated interval, we fall back to allocating the page with the FCFS scheme; only when we have sufficient data to make a reasonably correct interval estimation do we allocate pages according to the OFRFS scheme. This ensures that the performance of HO is at least as good as that of FCFS.
Chapter 5

Conclusion

This paper discusses efficient allocation algorithms for flash file systems. The goal is to allocate a flash memory cell for each incoming page of a file access pattern so that the number of blocks requested for the pattern is reduced. We implemented these algorithms and conducted experiments to compare their performance.

From the experimental results, and consistent with our intuition, the First Re-arrival First Serve (FRFS) method outperforms the First Come First Serve (FCFS) method. However, FRFS may not be practical, since it needs to know the entire access pattern in advance in order to make its allocation decisions. In practice this global knowledge is impossible to obtain, and most of the time we are required to allocate pages in an on-line manner. Nevertheless, FRFS illustrates a very important idea: if we can allocate pages according to their re-arrival times, blocks can be reused as soon as possible and the number of requested blocks decreases.

In our empirical study (Chapter 4) we demonstrated that file access patterns usually do have regularity. With this observation, we can predict the re-arrival time of each page from its history, which motivates us to allocate each page into a cell according to its predicted re-arrival time instead of its real re-arrival time. The Online First Re-arrival First Serve (OFRFS) method also exhibited good performance, as observed in Chapter 4. OFRFS performs especially well when the length of the file access pattern increases, since it can then predict the next arrival time of each page precisely and allocate pages accordingly. This reduces the number of requested blocks for the same access pattern when compared with FCFS.

The current scope of this paper does not consider the life cycle of flash memory. Most flash memories only guarantee a limited number of erase and re-write cycles; typical devices are guaranteed for up to 10,000 cycles [10].
Our allocation algorithms focus on the number of requested blocks and do not take the number of erase and re-write operations into consideration. If an allocation algorithm does not evenly distribute the erase and re-write operations over all cells, some cells may wear out much earlier than the others. Our future work therefore includes a more balanced cell allocation strategy by which the number of operations is balanced over all cells. This broader and more complex cost measurement model will certainly be more realistic, and hence more practical.
Bibliography

[1] BYU Trace Distribution Center. http://tds.cs.byu.edu/tds/index.jsp.

[2] Flash memory. http://en.wikipedia.org/wiki/Flash memory.

[3] A. Kawaguchi, S. Nishioka, and H. Motoda. A flash memory based file system. In Proceedings of the USENIX Technical Conference, 1995.

[4] Abraham Silberschatz, Peter Baer Galvin, and Greg Gagne. Operating System Concepts, Sixth Edition. John Wiley & Sons, Inc., 2003.

[5] Li-Pin Chang and Tei-Wei Kuo. A real-time garbage collection mechanism for flash-memory storage systems in embedded systems. In Proceedings of the 8th International Conference on Real-Time Computing Systems and Applications, 2002.

[6] Samsung Electronics Company. K9F2808U0B 16Mb*8 NAND flash memory data sheet.

[7] K. Han-Joon and L. Sang-goo. Memory management for flash storage system. In Proceedings of the Computer Software and Applications Conference, 1999.

[8] C. H. Paul, M. L. Chiang, and R. C. Chang. Manage flash memory in personal communicate devices. In Proceedings of the IEEE International Symposium on Consumer Electronics, 1997.

[9] Vipin Malik. JFFS2 is broken. Mailing list of the Memory Technology Device (MTD) Subsystem for Linux, June 28, 2001.

[10] Olaf Pfeiffer and Andy Ayre. Using flash memory in embedded applications. http://www.esacademy.com/faq/docs/flash/index.htm.