
An index rewriting scheme using compression for flash memory database systems Siwoo Byun, Moonhaeng Huh and Hoyoung Hwang Journal of Information Science 2007; 33; 398 originally published online Apr 10, 2007; DOI: 10.1177/0165551506076331 The online version of this article can be found at: http://jis.sagepub.com/cgi/content/abstract/33/4/398

Published by: http://www.sagepublications.com

On behalf of:

Chartered Institute of Library and Information Professionals


Downloaded from http://jis.sagepub.com at PENNSYLVANIA STATE UNIV on April 17, 2008 © 2007 Chartered Institute of Library and Information Professionals. All rights reserved. Not for commercial use or unauthorized distribution.

An index rewriting scheme using compression for flash memory database systems

Siwoo Byun and Moonhaeng Huh Department of Digital Media, Anyang University, Republic of Korea

Hoyoung Hwang Department of Media Engineering, Hansung University, Republic of Korea

Abstract. Flash memory is one of the best storage media for portable computers in mobile database environments. Its features include non-volatility, low power consumption, and fast read access, which make it a strong candidate as a major database storage component for portable computers. However, traditional index management schemes based on the B-Tree need to be improved because flash operations are relatively slow compared to RAM. To achieve this, we propose a new index rewriting scheme based on a compressed index called F-Tree. F-Tree-based index management improves index operation performance by compressing the pointers and keys in tree nodes and by rewriting the nodes without a slow erase operation during node insert/delete processing. Based on the results of the performance evaluation, we conclude that the F-Tree-based scheme outperforms the traditional schemes.

Keywords: tree indexing; portable devices; flash memory; mobile database; simulation

1. Introduction

Advances in computer and networking technologies have led to the extensive use of portable information devices such as PDAs (personal digital assistants), HPCs (handheld PCs), PPCs (pocket PCs), car navigation systems, mobile phones, and smart phones. Each information device may include several applications such as information management tools and a small database. Flash memory is one of the best candidates for supporting data management on small information devices in portable computing environments [1].

Correspondence to: Siwoo Byun, Department of Digital Media, Anyang University, 708–113, Anyang 5-dong, Manan-gu, Anyang-city, Kyonggi-do 430–714, Republic of Korea. Email: [email protected]

Journal of Information Science, 33 (4) 2007, pp. 398–415 © CILIP, DOI: 10.1177/0165551506076331


Recently, flash memory has become a critical component in building embedded systems and portable devices because of its non-volatile, shock-resistant, and power-economic nature. Its density and I/O performance have improved to a level at which it can be used not only as main storage for portable computers but also as mass storage for general computing systems [2]. Although flash memory is not as fast as RAM, it is a hundred times faster than a hard disk for read operations. The performance of flash memory in comparison to other media [3] is shown in Table 1. These attractive features make flash memory one of the best choices for portable information systems.

Like many other storage media, flash memories are treated as ordinary block-oriented storage media, and file systems are built over them. Although this has been very convenient for engineers building application systems over a flash memory-based file system, the inherent characteristics of flash memory also cause various unexpected system behaviors and overheads. Engineers and users may face significantly degraded system performance after certain periods of flash memory storage usage [2].

A flash memory is organized into many blocks, and each block consists of a fixed number of pages [4] (see Figure 1). A block is the smallest unit of an erase operation, while read and write operations are handled in pages. The typical sizes of blocks and pages are 16 KB and 512 B, respectively. Because flash memory is write-once, data is not overwritten in place when updated. Instead, the new version is written to free space, and the old versions are invalidated. This update strategy is called out-place update [5]. In other words, existing data on flash memory cannot be overwritten unless it is erased first. Initially, all pages on the flash memory are considered free. When a piece of data on a page is modified, the new version must be written to an available page. The pages storing the old versions are considered dead, while the page storing the newest version is considered live. Over time, the number of free pages shrinks, and the system must reclaim free pages for further writes. Because erasing is done in units of one block, the live pages in a recycled block must be copied to another location before the block can be erased. This slow erase operation degrades system performance.

Flash memory has two critical drawbacks. First, blocks of flash memory must be erased before they can be rewritten, because flash technology allows writes to toggle individual bits in one direction only. The erase operation, which takes much longer than the read or write operations, resets the memory cells to either all ones or all zeros. Second, the number of rewrite operations allowed for each memory block is limited, to under 1,000,000. This requires the flash management system to wear down all memory blocks as evenly as possible [6]. Due to these disadvantages, traditional index management technologies cannot easily be applied to flash memory databases in portable devices. Therefore, an index management system based on flash memory must make effective use of the advantages of flash memory while effectively overcoming its constraints.

The contributions of this paper are as follows:

• We investigate the characteristics of flash memory in portable devices in terms of data access patterns and performance bottlenecks.

Table 1. Performance comparison of storage media

Storage media     Volatility      I/O unit   Read time          Write time         Erase time
SRAM              Volatile        Byte       50 ns (1 B)        50 ns (1 B)        —
DRAM              Volatile        Byte       100 ns (1 B)       100 ns (1 B)       —
NOR Flash         Non-volatile    Byte       150 ns (1 B)       200 μs (1 B)       1 s (128 KB)
NAND-I Flash      Non-volatile    Page       12 μs (512 B)      200 μs (512 B)     2 ms (16 KB)
NAND-MLC Flash    Non-volatile    Page       20 μs (512 B)      300 μs (512 B)     2 ms (16 KB)
Hard disk         Non-volatile    Page       12.4 ms (512 B)    12.4 ms (512 B)    —



Fig. 1. Samsung Flash.

• We propose a new index structure called F-Tree to improve the performance of index operations in a flash memory database system. We present the notion of compressed index rewriting to handle slow write/erase operations.

• Finally, we evaluate the performance of our scheme in terms of index operation throughput and average response time.

The remainder of this paper is organized as follows. Section 2 describes previous research in the field. Section 3 presents our index management scheme in detail, together with the F-Tree structure. Section 4 details the simulation model used to evaluate performance and presents the results. Section 5 concludes the paper.
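The out-place update strategy and free-page reclamation described in this section can be modelled with a short sketch. This is an illustrative toy model (the class and method names are our own invention, not any real flash API); sizes follow the paper's 16 KB blocks of 512 B pages:

```python
# Toy model of flash out-place updates: pages are 'free', 'live', or 'dead';
# updates invalidate the old page, and reclamation copies live pages out of
# a victim block before erasing the whole block.

PAGES_PER_BLOCK = 32  # 16 KB block / 512 B page

class FlashModel:
    def __init__(self, num_blocks):
        self.state = [['free'] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.loc = {}       # logical id -> (block, page) of the newest version
        self.erases = 0

    def _find_free(self):
        for b, block in enumerate(self.state):
            for p, s in enumerate(block):
                if s == 'free':
                    return b, p
        raise RuntimeError('no free page: reclaim a block first')

    def write(self, logical_id):
        # Out-place update: never overwrite; mark the old version dead.
        if logical_id in self.loc:
            ob, op = self.loc[logical_id]
            self.state[ob][op] = 'dead'
        b, p = self._find_free()
        self.state[b][p] = 'live'
        self.loc[logical_id] = (b, p)

    def reclaim(self, victim):
        # Copy the victim block's live pages elsewhere, then erase it.
        live = [lid for lid, (b, p) in self.loc.items()
                if b == victim and self.state[b][p] == 'live']
        self.state[victim] = ['dead'] * PAGES_PER_BLOCK  # unavailable while copying
        for lid in live:
            self.write(lid)                              # rewrite in another block
        self.state[victim] = ['free'] * PAGES_PER_BLOCK
        self.erases += 1                                 # erase is per-block and slow

flash = FlashModel(num_blocks=2)
for i in range(8):
    flash.write(i)   # initial versions fill 8 pages of block 0
for i in range(8):
    flash.write(i)   # out-place updates: the 8 old pages become dead
dead = sum(s == 'dead' for blk in flash.state for s in blk)
```

After the update loop, eight pages are dead; calling `flash.reclaim(0)` copies the eight live pages into block 1 and frees block 0 with a single block erase, mirroring the reclamation cost described above.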

2. Background

Previous approaches to indexing in the literature can be divided into two categories: hard disk-based and main memory-based. The hard disk-based approach achieves high index performance by reducing expensive disk I/O and disk space. Slow disk access is much more expensive than fast CPU processing in the course of an index search. Therefore, in order to optimize disk access, a disk-based system sets the index node size to the block size and includes as many entries as possible in a node [7]. Two well known index structures are the B-Tree [8] for general indexing and the R-Tree [9] for spatial indexing. The R-Tree is usually implemented as a disk-based index structure for accessing large collections of spatial data. Insertion, deletion, and re-balancing often cause many sectors to be read and written back to the same locations. For disk storage systems, these operations are considered efficient, and R-Tree nodes are usually grouped in contiguous sectors on disk for further efficiency [5].

The disk-based index, however, can suffer performance degradation due to disk I/O bottlenecks in real-time applications, such as moving-object indexing in a GIS (Geographic Information System) database. Memory-based indexes were proposed to overcome this bottleneck. A memory-based index aims to reduce CPU execution time and memory space, since there is no disk I/O. In general, a memory-based indexing system outperforms a disk-based system in terms of index operation performance. However, memory-based systems can suffer from unreliable


data access due to system faults such as power failure. The well known index structure for memory-based systems is the T-Tree [8]. The T-Tree has the characteristics of the AVL-Tree, with O(log N) tree traversal, and also the characteristics of the B-Tree, holding many entries in a node for space efficiency. The T-Tree is considered a good structure for a memory-based index in terms of operation performance and space efficiency [10]. However, as shown in [11], the B-Tree can outperform the T-Tree because of concurrency control overhead. Thus, for performance reasons, memory-based systems generally use enhanced B-Trees as well as T-Trees.

In disk-based systems, each node block may contain a large number of keys, and the number of subtrees in each node may then also be large. The B-Tree is designed to branch out in this large number of directions and to contain many keys in each node so that the height of the tree is relatively small. This means that only a small number of nodes must be read from the disk to retrieve an item. The goal is quick access to the data, which means reading a very small number of records [12]. That is, the depth of the tree is very important, because the number of node accesses corresponds to the number of disk I/Os during tree traversal for a search or an insert. Thus, the disk-based system minimizes disk I/O cost by using a shallow and broad tree index. The memory-based system, on the other hand, does not require a shallow and broad tree index: the depth of the tree or the size of the tree node can be adjusted to improve index performance, and node access in fast RAM is very cheap.

Unfortunately, the index designs and implementations of disk-based or memory-based systems cannot be applied directly to a flash memory-based system, due to the following drawbacks of flash memory. First, flash memory cannot be overwritten unless it is erased first.
Note that writes and erases over flash memory take about 20 and 200 times longer than reads, respectively [13]. Thus, write and erase operations should be handled differently from reads in terms of execution cost. Second, frequently erasing particular locations of the flash memory can quickly shorten its overall lifetime, because each erasable unit has a limited cycle count for the erase operation. This requires the index management module to wear down all memory blocks as evenly as possible. Third, a write operation consumes nine times more energy than a read operation in portable devices, which have small batteries [10]. For these reasons, any direct application of a traditional index implementation to flash memory could result in severe performance degradation and significantly reduced reliability.

The B-Tree Flash Translation Layer (BFTL) [14] was proposed to improve the index operation performance of the B-Tree on flash memory storage devices. BFTL efficiently handles the fine-grained updates caused by B-Tree index access and reduces the number of redundant write operations to flash memory. BFTL was implemented directly over the flash translation layer (FTL), so no modifications to existing application systems are needed; it sits between the application layer and the flash memory block device emulated by the FTL. As shown in Figure 2, BFTL consists of a small reservation buffer and a node translation table. When applications insert or delete records, the newly generated records are temporarily held in the reservation buffer to reduce redundant writes. Since the reservation buffer holds only a limited number of records, the index units of the records must be committed (flushed) to the flash memory in a timely manner. The node translation table maintains the logical sector addresses of the index units of each B-Tree node, so that the index units can be collected more efficiently by smartly packing them into a few sectors.

Although BFTL achieves enhanced write performance, it requires an additional hardware implementation of the reservation buffer and the node translation table. Furthermore, search overhead can increase with frequent access to the reservation buffer and the node translation table. Both BFTL and our scheme aim to improve index operation performance for flash memory storage devices, but their focuses differ: BFTL aims to minimize the number of redundant write operations in the B-Tree, whereas our scheme aims to reuse tree nodes through index compression and node rewriting techniques in the B+-Tree, an enhanced index over the B-Tree. Our scheme sits between BFTL and the FTL, because it can compress and rewrite the packed index units of BFTL. Therefore, our scheme and BFTL can be merged to maximize performance synergy in flash memory storage systems.
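BFTL's buffering idea can be sketched as follows. This is our own illustrative reconstruction, not BFTL's actual interface (`BftlSketch`, `UNITS_PER_SECTOR`, and the method names are invented): per-node index units accumulate in a RAM reservation buffer, and a commit packs them into a few sector writes while the node translation table records which sectors hold each node's units.

```python
# Sketch of BFTL-style write buffering: index units for many B-Tree nodes
# collect in a small RAM reservation buffer; on commit they are packed into
# sectors, and the node translation table maps node ids to those sectors.

class BftlSketch:
    UNITS_PER_SECTOR = 4              # illustrative packing capacity

    def __init__(self, capacity):
        self.capacity = capacity      # reservation buffer size, in index units
        self.buffer = []              # pending (node_id, op, key) index units
        self.table = {}               # node translation table: node_id -> [sectors]
        self.sector_writes = 0        # flash sector writes actually issued

    def insert(self, node_id, key):
        self.buffer.append((node_id, 'insert', key))
        if len(self.buffer) >= self.capacity:
            self.commit()             # timely flush when the buffer fills

    def commit(self):
        # Pack buffered units UNITS_PER_SECTOR at a time, instead of issuing
        # one flash write per individual B-Tree node update.
        for i in range(0, len(self.buffer), self.UNITS_PER_SECTOR):
            chunk = self.buffer[i:i + self.UNITS_PER_SECTOR]
            sector = self.sector_writes
            self.sector_writes += 1
            for node_id, _op, _key in chunk:
                sectors = self.table.setdefault(node_id, [])
                if sector not in sectors:
                    sectors.append(sector)
        self.buffer = []

    def lookup_sectors(self, node_id):
        # A node read must visit every sector holding that node's index units,
        # which is the search overhead mentioned above.
        return self.table.get(node_id, [])

bftl = BftlSketch(capacity=8)
for k in range(8):
    bftl.insert(node_id=k % 2, key=k)   # 8 updates touching only 2 nodes
# the 8 buffered updates were flushed as 2 packed sector writes, not 8
```

The same structure also makes the trade-off visible: fewer writes, but a read of node 0 now has to gather index units scattered over both sectors.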

401

Siwoo Byun et al.

Fig. 2. Architecture of BFTL. (B-Tree-related applications and other applications sit above BFTL, whose reservation buffer and node translation table are governed by the commit policy; BFTL runs over the flash memory translation layer (FTL), which manages the flash memory.)

3. Flash memory-based index management scheme

In order to devise a new index management structure for flash memory, we focused on a B-Tree-based index, especially the B+-Tree (Figure 3), which is considered the most well known and efficient indexing scheme for both disk-based and memory-based database systems. The B+-Tree is an enhanced variant of the B-Tree and is considered more suitable than the B-Tree for flash memory. This is because the B+-Tree stores index entries in internal nodes and data entries in leaf nodes, whereas the B-Tree stores both index and data entries in internal nodes. Thus, the B+-Tree can search more quickly, with less I/O, than the B-Tree. In this respect, we propose a new index management structure called F-Tree (flash memory-based Tree), which is based on the B+-Tree and handles the characteristics of flash memory efficiently. Note that flash memory suffers from very slow write and erase operations compared to RAM. The main idea behind the F-Tree is the notion of compressed index rewriting.

3.1. Notion of compressed index rewriting

B+-Tree is a well known index tree designed for efficient index operations such as insertion, deletion, and search. The idea behind B+-Trees is that internal nodes can have a variable number of child

Fig. 3. An instance of B+-Tree.



nodes within some pre-defined range. As data is inserted into or removed from the data structure, the number of child nodes within a node varies, and internal nodes are coalesced or split so as to maintain the designed range. Because a range of child nodes is permitted, B+-Trees do not need re-balancing as frequently as other self-balancing binary search trees, but they may waste some space since nodes are not always entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation [15].

The B+-Tree is kept balanced by requiring that all leaf nodes have the same depth. This depth increases slowly as elements are added to the tree; an increase in the overall depth is infrequent and moves all leaf nodes one more hop away from the root. By maximizing the number of child nodes within each internal node, the height of the tree decreases, balancing occurs less often, and efficiency increases. The simulation in [8] shows that an average fill factor of 69% is optimal for a B+-Tree node; that is, packing too many entries into a node can degrade performance through frequent insert/delete operations caused by tree re-balancing.

However, a direct B+-Tree implementation on flash memory can degrade performance because of slow write operations, and can reduce system reliability because of frequent writes to the same location [5, 16]. Therefore, one of the most important goals in implementing the B+-Tree on flash memory is to reduce the number of write and erase operations. To achieve this goal, we propose a compressed index rewriting scheme that exploits separated field compression and one-way rewrite techniques. As mentioned above, flash memory must perform an erase operation prior to an actual write. Since an erase operation sets every bit to 1 to initialize a block, write operations may change individual bits only from 1 to 0, in a one-way fashion. That is, to change a bit from 0 to 1, another erase operation must be performed first; but to change a value from 1 to 0, flash memory can perform the rewrite without an erase [17]. By exploiting this special one-way rewrite feature, we can reduce write overhead and enhance both indexing performance and the lifetime of flash memory.

The one-way rewriting technique allows the same node to be written at least twice. This is achieved by compressing the original node prior to the first write operation. We achieved a 52% compression ratio using reordered field compression. Since the compressed data is allocated sequentially, the remaining 52% of the original node area can be reserved for another write operation. When the second write operation is requested, the first half of the node is invalidated and the compressed data of the second version is allocated sequentially. Unfortunately, if the compressed contents are larger than half of the original node size, we cannot exploit the one-way rewriting technique; however, most such cases can be avoided by properly tuning the level of the compression algorithm. We thus propose an enhanced B+-Tree index called F-Tree, which exploits one-way rewriting and reordered field compression. The internal and leaf node structures of the F-Tree index are illustrated in Figure 4.

We used the LZO [18] algorithm for node compression because LZO is not only simple but also has source code that is easy to handle. As shown in Table 2 [19], the LZO compression speed is about 4 MB/s and the decompression speed about 15 MB/s on a Pentium 133 MHz CPU. The compression ratio and speed can be controlled by varying the compression level of the LZO algorithm. We used the LZO-3 level in Table 2 and slightly enhanced the node compression ratio by regrouping the key and pointer fields in a similar sequence. Our simulation showed compression ratios of 55.6% and 54.7% for the internal node and the leaf node, respectively. These compression ratios enable the one-way rewriting module to rewrite the original node after the first write operation in compression mode. If the F-Tree exploited a compression algorithm more efficient than LZO, index operation performance could be enhanced even further.

Since the CPU and RAM are far faster than flash memory access, the compression and decompression overhead is negligible in our experiments; the main performance factor is the slow erase/write speed of flash memory, not the read speed of the CPU and RAM. In the F-Tree, node compression serves two purposes: to reduce slow erase/write operations through the one-way rewriting procedure, and to fit more entries (keys and pointers) into a node. For I/O performance in the F-Tree, the physical page size of flash memory corresponds to the size of one node; that is, one page stores one tree node.
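The one-way rewrite rule and the "two writes per page via compression" idea can be illustrated with a short sketch. Here zlib stands in for LZO (which is not in the Python standard library), and the page layout and names are our own assumptions:

```python
import zlib

PAGE_SIZE = 512  # one F-Tree node per flash page

def can_rewrite_without_erase(old: bytes, new: bytes) -> bool:
    # A flash write may only clear bits (1 -> 0); any 0 -> 1 transition
    # would require a slow block erase first.
    return len(old) == len(new) and all((o & n) == n for o, n in zip(old, new))

erased = b'\xff' * PAGE_SIZE          # an erased page: every bit is 1

node_v1 = bytes(range(64)) * 4        # 256 B of toy node contents
c1 = zlib.compress(node_v1, 9)        # stand-in for LZO node compression
assert len(c1) <= PAGE_SIZE // 2      # compressed node fits the first half

# First write: compressed node in the first half, the rest left erased (0xFF).
w1 = c1 + b'\xff' * (PAGE_SIZE - len(c1))
assert can_rewrite_without_erase(erased, w1)

# Second write: invalidate the first half by clearing it to zeros (1 -> 0
# only), and place the compressed second version in the still-erased half.
node_v2 = node_v1[:-1] + b'\x00'
c2 = zlib.compress(node_v2, 9)
assert len(c2) <= PAGE_SIZE // 2
w2 = bytes(PAGE_SIZE // 2) + c2 + b'\xff' * (PAGE_SIZE // 2 - len(c2))
assert can_rewrite_without_erase(w1, w2)   # node rewritten with no erase

# A third rewrite of the page would need 0 -> 1 changes, i.e. a block erase.
assert not can_rewrite_without_erase(w2, w1)
```

The check mirrors the text exactly: both writes succeed because every changed bit goes from 1 to 0, so the node is written twice into one page before any erase is needed.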


Fig. 4. The internal and leaf node structures of the F-Tree index (keys K1…Kq and tree pointers P1…Pq with key compression, pointer compression, and one-way rewriting within a 512-byte node; a search value X follows pointer Pi when Ki-1 < X ≤ Ki).

Record *FT-Search (int key) {
  R ← block containing root node of tree;
  N ← FM-Read (R);
  while ( Type(N) ≠ LEAF_NODE ) {            // N is not a leaf node of tree.
    q ← number of tree pointers in node N ;
    if ( key ≤ N.K1 ) N ← N.P1
    else if ( key > N.Kq-1 ) N ← N.Pq        // N is last child node.
    else { Search node N for an entry i such that N.Ki-1 < key ≤ N.Ki ; N ← N.Pi }
    N ← FM-Read (N);
  }  // end of while loop
  Search node N for entry (Ki,Pi) with key = Ki ;   // search leaf node N.
  if ( entry_found ) {
    N ← N.Pi ;
    return FM-Read (N) ;                     // read data record.
  } else {
    print_error ( "record with search field value key is not in the data file" );
    return NULL;
  }
}  // End of Function

Boolean FT-Insert (int key, record *rec) {
  R ← block containing root node of tree;
  N ← FM-Read (R);
  while ( Type(N) ≠ LEAF_NODE ) {            // N is not a leaf node of tree.


    q ← number of tree pointers in node N ;
    if ( key ≤ N.K1 ) N ← N.P1
    else if ( key > N.Kq-1 ) N ← N.Pq
    else { Search node N for an entry i such that N.Ki-1 < key ≤ N.Ki ; N ← N.Pi }
    N ← FM-Read (N);
  }  // end of while loop
  Search node N for entry (Ki,Pi) with key = Ki ;   // search leaf node N.
  if ( entry_found ) return false;           // record already in file, cannot insert.
  else {                                     // insert entry in the tree to point to record.
    Create entry (K,Pr) where Pr points to the new record rec;
    if ( leaf node N is not full ) {
      Insert entry (K,Pr) in correct position in leaf node N ;
      FM-Write (N);
    } else {                                 // split if leaf node is full
      Copy N to Temp ;                       // Temp is an oversize leaf node to hold extra entry.
      Insert entry (K,Pr) in Temp in correct position;
                                             // Temp now holds (Pleaf + 1) entries in RAM.
      New ← Create a new empty leaf node for the tree;
      New.Pnext ← N.Pnext ;
      J ← ⌈(Pleaf + 1)/2⌉ ;
      N ← First J entries in Temp ;
      N.Pnext ← New ;
      New ← Remaining entries in Temp ;
      Insert the leaf node New and key in correct position in parent internal node;
      FM-Write (N); FM-Write (New);
      // if parent is full, split it and propagate the split further up.
    }
    return true;
  }  // end of else
}  // End of Function

Boolean FM-Delete (int key) {
  R ← block containing root node of tree;
  N ← FM-Read (R);
  while ( Type(N) ≠ LEAF_NODE ) {            // N is not a leaf node of tree.
    q ← number of tree pointers in node N ;
    if ( key ≤ N.K1 ) N ← N.P1
    else if ( key > N.Kq-1 ) N ← N.Pq
    else { Search node N for an entry i such that N.Ki-1 < key ≤ N.Ki ; N ← N.Pi }
    N ← FM-Read (N);
  }  // end of while loop
  Search node N for entry (Ki,Pi) with key = Ki ;   // search leaf node N.
  if ( ! entry_found ) return false;         // key is not in tree, cannot delete.
  else {                                     // delete entry and record.
    Remove entry (Ki,Pi) in node N ; FM-Write (N);
    Remove Record which Pi points to ; FM-Write (Record);
    Decrease number of tree pointers in node N ;


    // underflow: if number of tree pointers is less than predefined minimum factor.
    if ( ! is_underflow(N) ) return true;
    // handle underflow.
    if ( Type(N) = ROOT_NODE ) {
      Collapse N ;
      Make the remaining child the new root ;  // so tree height decreases.
      FM-Write (N);
    } else {                                   // merge immediate neighbor nodes
      L ← number of entries in left node ;
      R ← number of entries in right node ;
      if ( L > minimum_size && R > minimum_size )  // minimum number of entries in a node
        return true;
      else {  // either L or R is less than minimum_size
        // Balance current node, since the node has too few entries due to the removal:
        Insert all the key values in the two nodes into a single node S; FM-Write (S);
        Merge with the neighbor whose pointer is the current node's parent; FM-Write (N);
        // Continue merging the internal nodes until a node with the correct size,
        // or the root node, is reached.
      }
    }  // end of merge
    return true;
  }  // end of else
}  // End of Function
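The leaf-overflow path of FT-Insert (copy to an oversized temporary, then split at J = ⌈(Pleaf + 1)/2⌉) can be sketched in runnable form. This is an illustrative fragment under our own assumptions (a tiny leaf capacity and plain key/pointer tuples), not the authors' implementation:

```python
import math

P_LEAF = 4  # illustrative leaf capacity; a real 512 B page holds more entries

def split_leaf(leaf, entry):
    """Insert (key, ptr) `entry` into a full leaf and split it in two.

    Mirrors FT-Insert's overflow path: build an oversized temporary holding
    (P_LEAF + 1) entries in RAM, keep the first J = ceil((P_LEAF + 1) / 2)
    entries in the old leaf N, and move the rest to a new right sibling.
    """
    temp = sorted(leaf + [entry])       # Temp holds (Pleaf + 1) entries
    j = math.ceil((P_LEAF + 1) / 2)     # J = ⌈(Pleaf + 1)/2⌉
    return temp[:j], temp[j:]           # old leaf N, new leaf New

left, right = split_leaf(
    [(10, 'a'), (20, 'b'), (30, 'c'), (40, 'd')], (25, 'e'))
# left keeps 3 entries and right gets 2; in F-Tree the separator key would
# then be inserted into the parent internal node, and both leaves written
# back with FM-Write (compressed, so the page can later be rewritten once).
```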

4. Performance evaluation

We compared the performance of the F-Tree to the well known B+-Tree index scheme by means of computer simulation. Alternative approaches include (1) measurement of an actual implementation, and (2) approximate analytic methods. With regard to the first approach, even assuming that the large programming effort of implementing a whole usable system could be surmounted, there remains a problem of consistent measurement: interference factors such as kernel interrupt handling, context switching, paging, cleaning, garbage collection, caching, prefetching, and buffering lead to unpredictable time delays and inaccurate results in a real system. If an accurate and robust analytic model could be constructed, performance evaluation would require far less programming endeavour and fewer computing resources than simulation. However, constructing an accurate analytic model is not easy, because the system has a complex and dynamic run-time nature that is hard to capture in clear mathematical expressions. Even if all the detailed run-time factors were included in an analytical model, a simulation study or an actual measurement would usually still be needed to validate the results of the analytic model.

4.1. Simulation setup

We compared the performance of the F-Tree index with two types of the B+-Tree index. The average fill factor of the F-Tree was fixed at 69%, which is a general value for the best performance [8]. We denote this F-Tree as FTR69. On the other hand, we divided the average fill factor of the B+-Tree into


69% for best performance and 97% for best space efficiency. We denote these B+-Trees as BTR69 and BTR97, respectively.

The simulation model was programmed using the CSIM [20] discrete-event simulation software and Microsoft Visual C++, and comparative computational experiments were carried out on a Pentium IV 2.8 GHz server with 512 MB RAM running the Microsoft Windows 2000 Server operating system. We used a closed queuing model of CSIM for the flash memory database system. As shown in Figure 5, our model is closed in the sense that the system keeps its workload constant through multiprogramming, so that the average number of active index operations stays the same.

The system model used for the simulation consists of four distinct components: the user operation generator (UOG), the index operation manager (IOM), the F-Tree manager (FTM), and the flash data manager (FDM). The UOG is responsible for generating user operations, modelled as sequences of read and write operations on a flash memory database. The IOM manages user operations from start to commitment, generates index operations such as search, insert, and delete, and sends them to a queue of scheduled requests. The FTM accepts index operation requests and processes each one in the F-Tree. The FDM handles flash memory I/O by compressing and rewriting internal or leaf nodes of the F-Tree.

The major simulation parameters are shown in Table 3. Some of the actual values used for the simulation parameters were taken from previous research [21], where they are fully justified. The primary performance metrics used in this study are the index operation throughput rate and the response time. The throughput rate is defined as the number of index operations successfully completed per second, and the response time is the time that elapses between the submission and the completion of an index operation.

4.2. Simulation results and their interpretation

We now analyze the results of the simulation experiments performed for the three index schemes: FTR69, BTR69, and BTR97. The experiments investigate the effect of the operation workload level on the performance of the three index management schemes as num_OPs varies. The search_ratio is set to a default value of 50%, and the insert_ratio and delete_ratio are each set to a default value of 25%.

Fig. 5. Queuing system model for simulation.

Journal of Information Science, 33 (4) 2007, pp. 398–415 © CILIP, DOI: 10.1177/0165551506076331


Siwoo Byun et al.

Table 3. Major simulation parameters and settings

Parameter           Description                                       Value
num_OPs             Number of index operations per second             300–2400 in steps of 300
RAM_delay           I/O time for reading an object in RAM             0.1 μs/byte
Flash_read_delay    I/O time for reading an object in flash memory    16 μs/page
Flash_write_delay   I/O time for writing an object in flash memory    250 μs/page
Flash_erase_delay   I/O time for erasing an object in flash memory    2 ms/block
Flash_page_size     Page size in flash memory                         512 bytes
search_ratio        Probability of search operation                   50%
insert_ratio        Probability of insert operation                   25%
delete_ratio        Probability of delete operation                   25%
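The asymmetry between these timings is what drives the results that follow. As a back-of-the-envelope check, the sketch below compares the cost of one node update with and without a block erase, using representative NAND timings (read ≈16 μs/page, program ≈250 μs/page, erase ≈2 ms/block); the per-update page counts are illustrative assumptions, not measured values from the paper.

```python
# Representative NAND flash timings (microseconds); assumptions for illustration.
FLASH_READ_US = 16      # per page
FLASH_WRITE_US = 250    # per page
FLASH_ERASE_US = 2000   # per block

def btree_insert_cost_us(pages_read=3):
    # conventional in-place update: traverse to the leaf,
    # then erase the block before rewriting the page
    return pages_read * FLASH_READ_US + FLASH_ERASE_US + FLASH_WRITE_US

def ftree_insert_cost_us(pages_read=3):
    # compressed rewrite: traverse, then write into clean space, no erase
    return pages_read * FLASH_READ_US + FLASH_WRITE_US

print(btree_insert_cost_us(), ftree_insert_cost_us())
```

Under these assumptions the erase alone accounts for the bulk of the in-place update cost, which is why a scheme that avoids erases on most inserts can dominate even after paying a compression overhead.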

The overall operation throughput as a function of num_OPs is presented in Figure 6, and the corresponding average operation response time is depicted in Figure 7. The average response time gradually increases with the workload level, mainly because index access contention grows as the workload level rises. In this experiment, the highest throughput is exhibited by FTR69, followed by BTR69 and BTR97.

In Figure 6, we observe that the performance of each scheme levels off or begins to degrade beyond a num_OPs of 1200 to 1500. When num_OPs is increased beyond that range, the number of active index operations currently being executed in the system appears to decrease slightly. This implies that adding more index operations beyond that range simply increases index access contention without yielding higher throughput. From this observation, we can claim that the performance of all three index management schemes is limited mainly by access contention on the slow write and erase operations triggered by insert and delete procedures.

In Figure 6, we also observe that the performance gain of FTR relative to BTR begins to shrink as num_OPs increases beyond 1500, although FTR achieves higher performance than BTR across the whole range of num_OPs by reducing the number of slow erase operations. When num_OPs reaches the highest data contention point of 1500, the operation throughput of FTR begins to decrease, which implies that FTR also experiences the negative effect of operation contention under a high workload environment.

Fig. 6. Index operation throughput (throughput vs. index operations/sec for BTR69, FTR69 and BTR97).


Fig. 7. Average response time (response time in ms vs. index operations/sec for BTR69, FTR69 and BTR97).

In Figure 7, we observe that the response times of FTR and BTR increase gradually as the workload level increases. Across the whole workload range, the response time curve of FTR stays below that of BTR. In terms of operation throughput, the performance gain of FTR relative to BTR reaches 23% at 2400 operations. This difference implies that a large portion of FTR's gain over BTR comes from compressed index rewriting: BTR inevitably delays most operations because of the slow erase operations, whereas FTR overcomes their negative effect by employing reordered field compression and rewriting techniques under the same conditions.

We also investigate the performance differences of FTR69, BTR69, and BTR97 for search, insert, and delete operations separately. The search operation throughput as a function of num_OPs is depicted in Figure 8, and its corresponding average response time in Figure 11. The simulation results

Fig. 8. Search throughput (throughput vs. index operations/sec for BTR69, FTR69 and BTR97).


Fig. 9. Insert throughput (throughput vs. index operations/sec for BTR69, FTR69 and BTR97).

Fig. 10. Delete throughput (throughput vs. index operations/sec for BTR69, FTR69 and BTR97).

indicate that BTR97 provides the best search performance. This is mainly because BTR97 uses a high fill factor of 97% for best space-efficiency, which shortens the depth of the index tree and thus reduces the number of node accesses. The performance gain of BTR97 relative to FTR69 reaches 20% at 5000 index operations.

The insert operation throughput and its corresponding average response time are depicted in Figures 9 and 12, respectively. Compared to search operations, insert operations are much more costly in the sense that they must perform the slow write and erase operations in flash memory. In this experiment, we observed that FTR69 showed a remarkable performance advantage over the other schemes across the whole range of num_OPs.


Fig. 11. Search response time (response time in ms vs. index operations/sec for BTR69, FTR69 and BTR97).

Fig. 12. Insert response time (response time in ms vs. index operations/sec for BTR69, FTR69 and BTR97).

For num_OPs above 600, FTR69 outperforms all the other schemes. In Figure 9, the performance gain of FTR69 relative to the other schemes reaches 27%. This gain comes from the compressed index rewriting described in Section 3.1: the mechanism reduces the number of erase operations and thereby shortens the node update delay caused by insert operations. The effectiveness of compressed index rewriting is confirmed by the simulation results in Figure 12, where the response time of FTR69 is 36% lower than those of the other schemes.
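The erase-count reduction behind this gain can be illustrated with a small amortization sketch. The page size comes from Table 3, but the compressed-node size, slots-per-page layout, and erase-on-full policy are illustrative assumptions, not F-Tree's exact mechanism.

```python
PAGE_BYTES = 512    # flash page size (Table 3)
NODE_BYTES = 128    # assumed size of one compressed node image

def erases_in_place(updates):
    # conventional in-place update: every rewrite needs a prior block erase
    return updates

def erases_with_rewriting(updates, node_bytes=NODE_BYTES):
    # append each new node version into remaining clean space in the page;
    # erase only once the page has no clean slot left
    slots = PAGE_BYTES // node_bytes
    return max(0, (updates - 1) // slots)

print(erases_in_place(100), erases_with_rewriting(100))
```

With four node versions per page, one hundred node updates trigger one hundred erases in place but only two dozen with rewriting, which is the kind of reduction that shortens the insert path in Figure 12.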


Fig. 13. Delete response time (response time in ms vs. index operations/sec for BTR69, FTR69 and BTR97).

The delete operation throughput and its corresponding average response time are depicted in Figures 10 and 13, respectively. Note that the throughput of the delete operation is higher than that of the insert operation, because delete operations only unlink data records instead of performing actual updates. As shown in Figure 10, the performance gain of FTR69 relative to the other schemes reaches 34%. There is a trade-off in employing compressed index rewriting, in the sense that the mechanism requires additional overhead to compress and uncompress index nodes. However, as our simulation results show, the superior performance of our scheme more than compensates for this overhead.
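The compress/uncompress overhead in question can be sketched as follows, using zlib as a stand-in for the LZO coder the paper references [18]. The node layout (fixed-width keys grouped together, then child pointers) is an illustrative assumption, not F-Tree's actual reordered-field format.

```python
import zlib

def pack_node(keys, pointers):
    # group keys together and pointers together ("reordered fields"),
    # which exposes runs of similar bytes to the compressor
    raw = b"".join(k.to_bytes(4, "big") for k in sorted(keys))
    raw += b"".join(p.to_bytes(4, "big") for p in pointers)
    return raw

def compress_node(raw):
    # a fast compression level: per-node CPU overhead is the trade-off here
    return zlib.compress(raw, level=1)

keys = list(range(1000, 1064))               # 64 dense 32-bit keys
ptrs = [0x2000 + 512 * i for i in range(65)] # 65 child page addresses
raw = pack_node(keys, ptrs)
packed = compress_node(raw)
assert zlib.decompress(packed) == raw        # lossless round trip
```

The compressed image is smaller than the raw node, so a rewritten node consumes less clean space per update and postpones the next erase; the compression call itself is the "additional overhead" that the erase savings must outweigh.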

5. Conclusions

We proposed a new index rewriting scheme based on F-Tree to achieve high index operation performance in flash memory-based database systems. Unlike previous B-Tree-based approaches, the F-Tree-based scheme improves operation performance by compressing pointers and keys in tree nodes and rewriting the nodes during insert/delete processing. We also proposed a simulation model based on a closed queuing system for the performance comparisons. Our simulation results show that the F-Tree-based scheme outperforms the traditional B-Tree-based schemes in terms of response time and operation throughput, especially in update-intensive environments. This is because the positive effect of index compression and rewriting overcomes the negative effect of the slow flash write/erase operations under the same conditions. Since F-Tree provides generic, efficient index management for flash memory-based systems, it can be widely employed in portable and embedded computing devices.

References

[1] S. Byun, M. Hur and H. Hwang, Flash memory lock management for portable information systems, International Journal of Cooperative Information Systems 15(3) (2006) 461–79.
[2] L. Chang and T. Kuo, An adaptive striping architecture for flash memory storage systems of embedded systems. In: Proceedings of the 8th IEEE Real-Time and Embedded Technology Symposium, San Jose, California, 24–27 September 2002 (IEEE Computer Society, 2002) 187–96.
[3] K. Yim, A novel memory hierarchy for flash memory based storage systems, Journal of Semiconductor Technology and Science 5(4) (2005) 262–9.
[4] C. Park, J. Seo, D. Seo, S. Kim and B. Kim, Cost-efficient memory architecture design of NAND flash memory embedded systems. In: Proceedings of the 21st International Conference on Computer Design, San Jose, California, 13–15 October 2004 (IEEE Computer Society, 2004) 474–9.
[5] C. Wu, L. Chang and T. Kuo, An efficient R-Tree implementation over flash-memory storage systems. In: E. Hoel and P. Rigaux (eds), Proceedings of the ACM GIS 2003, New Orleans, Louisiana, 2003 (ACM, New York, 2003) 17–24.
[6] H. Kim and S. Lee, A new flash memory management for flash storage system. In: Proceedings of the 23rd Annual International Computer Software and Applications Conference, Phoenix, Arizona, 1999 (IEEE Computer Society, 1999) 284–9.
[7] S.K. Cha, J.H. Park and B.D. Park, Xmas: an extensible main-memory storage system. In: F. Golshani and K. Makki (eds), Proceedings of the 6th ACM International Conference on Information and Knowledge Management, 10–14 November 1997 (ACM, New York, 1997) 356–62.
[8] R. Elmasri and S.B. Navathe, Fundamentals of Database Systems (Addison-Wesley, 1994).
[9] N. Beckmann, H.P. Kriegel, R. Schneider and B. Seeger, The R*-Tree: an efficient and robust access method for points and rectangles. In: H. Garcia-Molina and H.V. Jagadish (eds), Proceedings of the ACM SIGMOD International Symposium on the Management of Data, 23–25 May 1990 (ACM, New York, 1990) 322–31.
[10] C. Lee, K. Ahn and B. Hong, A study of performance decision factor for moving object database in main memory index. In: Y. Eom (ed.), Proceedings of the Korea Information Processing Society 2003 Spring Conference, Seoul, 10–11 May 2003 (KIPS, Seoul, 2003) 1575–8.
[11] H. Lu, Y. Yeung Ng and Z. Tang, T-Tree or B-Tree: main memory database index structure revisited. In: Proceedings of the 11th Australasian Database Conference, 31 January–3 February 2000 (IEEE Computer Society, 2000).
[12] Software Design Using C++ (2006). Available at: http://cis.stvincent.edu/carlsond/swdesign/btree/btree.html (accessed 5 February 2006).
[13] What is Flash? (2006). Available at: www.samsung.com/Products/Semiconductor/Flash/WhatisFlash/FlashStructure.htm (accessed 11 July 2006).
[14] C. Wu, L. Chang and T. Kuo, An efficient B-Tree layer for flash-memory storage systems. In: J. Chen and S.S. Hong (eds), Proceedings of the RTCSA, Tainan, Taiwan, 18–20 February 2003 (Springer, Berlin/Heidelberg, 2003) 409–30. [Springer, Lecture Notes in Computer Science, 2968/2004.]
[15] B-tree (2006). Available at: http://en.wikipedia.org/wiki/B-tree (accessed 5 February 2006).
[16] J. Nam and D. Park, The efficient design and implementation of the B-Tree on flash memory. In: M.H. Kim (ed.), Proceedings of the 32nd Korea Information Science Society Fall Conference, Seoul, 11–12 November 2005 (KISS, Seoul, 2005) 55–7.
[17] J. Jeong, S. Noh, S. Min and Y. Cho, A design and implementation of flash memory simulator, Journal of Korean Information Science: C 8(1) (2002) 36–45.
[18] LZO (2006). Available at: www.oberhumer.com/opensource/lzo/#download (accessed 5 February 2006).
[19] LZO (2006). Available at: www.oberhumer.com/opensource/lzo/lzodoc.php (accessed 5 February 2007).
[20] H. Schwetman, CSIM User's Guide for Use with CSIM Revision 16 (Microelectronics and Computer Technology Corporation, Austin, 1992).
[21] K. Yim and K. Koh, A study on flash memory based storage systems depending on design techniques. In: Proceedings of the 30th Korea Information Science Society Fall Conference, Seoul, 21–22 October 2003 (KISS, Seoul, 2003) 274–6.
