Inline Block Level Data De-duplication Technique for EXT4 File System

Rahul Shinde, Vinay Patil, Akshay Bhargava, Atul Phatak, and Amar More

MIT Academy of Engineering, Pune, India
{rahul.shindeat,vinay18.patil,bhargava.akshay14,atul.phatak5,amarmore2006}@gmail.com
http://www.mitaoe.ac.in
Abstract. Data centers, and the data they hold, grow day by day. Data is a key asset of an organization and is therefore backed up at regular intervals. Because of the huge data size, data de-duplication techniques are used to improve the utilization and life span of the disks. In data de-duplication, a single copy of the data is stored on disk by finding and eliminating the redundant copies. Nowadays EXT4 has become a popular file system, as it supports an increased file system size and improved performance. EXT4 can therefore be used to store backups, and data de-duplication can further increase the effective disk capacity and reduce the number of disk writes. In this paper we present a data de-duplication algorithm for the EXT4 file system. Using this algorithm, duplicate data is eliminated before it is actually written to the disk, and the extents in the EXT4 file system are arranged accordingly.

Keywords: File system, EXT4 file system, Data de-duplication, Data backup.
1 Introduction
For every data center, storage efficiency is a crucial factor: data is backed up regularly, which results in a huge amount of data being stored on disk, and hence there arises a need for data de-duplication. Data de-duplication [1] makes disks more affordable by eliminating redundant data, thereby increasing storage efficiency. In this method only a unique copy is stored on the disk; the remaining copies that are identical to the one already stored become references to that unique copy, and no extra storage space is allocated to them. Consider a backup server that takes backups on a weekly basis. Suppose in the first week 50 GB of data is stored, and in the next week the data grows to 70 GB, of which 50 GB is the same as the first week. Without de-duplication, the total data stored over the two weeks is 50 GB + 70 GB = 120 GB. With de-duplication, the actual data stored at the end of the second week is just 70 GB, since 50 GB of it is the same as in the first week, i.e. only modified data gets saved.
In the EXT4 file system [13], the physical block number is 48 bits wide and each block is 4 KB (2^12 bytes). Hence the maximum file system size can reach 2^48 * 2^12 = 2^60 bytes, i.e. 1 EB. Because of this large capacity, EXT4 can be used as the file system for backup servers, and by applying de-duplication a large amount of data can be saved on the disk. Compared to the previous EXT file systems, which have smaller file system capacities, EXT4's larger capacity makes it more suitable for a wide range of applications. The block allocation and inode allocation strategies [11] in EXT4 are also improved compared to previous EXT file systems, which makes EXT4 more efficient for backup applications. Section 2 reviews related work in the area of data de-duplication, Section 3 describes extents and the extent structure in the EXT4 file system, Section 4 presents the overall design of our system, and Section 5 gives its implementation details. Results are discussed in Section 6, followed by the conclusion.
2 Related Work
Data de-duplication has received a lot of interest in storage research and industry, and most of the existing work addresses various types and levels of de-duplication. Microsoft Storage Server [2] and EMC's Centera [3] use file-level de-duplication. Venti [4] performs de-duplication with respect to a fixed block size. The NetApp de-duplication [5] for file servers uses hashes to find duplicate data blocks; hash collisions are resolved by byte-by-byte comparison. This process runs in the background, so it is a post-process de-duplication system. Inline de-duplication at the block level [6, 7] has been implemented in the ext3 file system. SDFS [8] is a file system for Windows and Linux designed to support the unique needs of virtual environments, with enhanced functionality for VMware, Xen, and KVM. Extreme Binning [9] proposed a scalable de-duplication technique for chunk-based [10] file backup. We use an inline approach: the layer we add to the EXT4 file system implements inline block-level data de-duplication.
3 Structure of Extents in EXT4 File System
Extents [11–14] are an integral part of the EXT4 file system. They were introduced to improve the throughput of the file system through sequential read and write operations.
Fig. 1. EXT4 extent structure
Related features of extents are delayed allocation and persistent pre-allocation. Pre-allocation allocates space of a specific size at file creation time, while delayed allocation defers block allocation until the page is flushed. An extent consists of three parts: the starting logical block number that the extent covers, the length, i.e. the total number of blocks contained in the extent, and the starting physical block number (split into upper 16 bits and lower 32 bits), as shown in Fig. 1.
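To make the layout of Fig. 1 concrete, the on-disk extent record can be sketched as the following C structure. This is a simplified rendering of the Linux kernel's struct ext4_extent, with fixed-width stdint types standing in for the kernel's little-endian __le16/__le32 types:

#include <stdint.h>

/* On-disk extent record (cf. Fig. 1). */
struct ext4_extent {
    uint32_t ee_block;    /* first logical block the extent covers */
    uint16_t ee_len;      /* number of blocks covered by the extent */
    uint16_t ee_start_hi; /* upper 16 bits of the 48-bit physical block number */
    uint32_t ee_start_lo; /* lower 32 bits of the 48-bit physical block number */
};

/* Reassemble the 48-bit physical block number from its two halves. */
static inline uint64_t ext4_ext_pblock(const struct ext4_extent *ex)
{
    return ((uint64_t)ex->ee_start_hi << 32) | ex->ee_start_lo;
}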
Fig. 2. Extent tree for sparse file in EXT4 file system
An inode holds at most 4 extents inline; for huge or sparse files the extents are stored on disk in the form of an extent tree [13, 14], as shown in Fig. 2. This gives EXT4 an improved technique for accessing blocks during sequential reads/writes compared to its predecessors.
4 System Design
Our system eliminates redundant data before it reaches the disk. A user application collects data and sends it to the kernel for storage. The kernel invokes the write system call, which we intercept in the VFS layer, where the data is held in a buffer. The buffer is divided into 4 KB chunks [1], 4 KB being the standard block size in the EXT4 file system. For each block a hash value is calculated and stored in a hash table. Each time a 4 KB chunk arrives, its hash value is calculated and compared with the hash values already in the table. As shown in Fig. 3, the algorithm has to deal with two possibilities:

1. The hash value obtained is different from every hash value already stored in the hash table.
2. The hash value already exists in the hash table.

The system also handles extents, by verifying whether a block is contiguous with the previously encountered duplicate block. This is explained further in Section 5. A sketch of the chunking and hashing step follows.
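The following is a minimal user-space sketch of that step, assuming OpenSSL's MD5() for the per-block digest (an in-kernel implementation would use the kernel crypto API instead; on_chunk is a hypothetical callback standing in for the hash-table lookup):

#include <openssl/md5.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* standard EXT4 block size */

/* Split the write buffer into 4 KB chunks and hash each one.
 * A short final chunk is zero-padded here for simplicity. */
static void hash_chunks(const unsigned char *buf, size_t len,
                        void (*on_chunk)(const unsigned char *digest,
                                         const unsigned char *chunk))
{
    unsigned char chunk[CHUNK_SIZE];
    unsigned char digest[MD5_DIGEST_LENGTH];

    for (size_t off = 0; off < len; off += CHUNK_SIZE) {
        size_t n = len - off < CHUNK_SIZE ? len - off : CHUNK_SIZE;
        memset(chunk, 0, CHUNK_SIZE);
        memcpy(chunk, buf + off, n);
        MD5(chunk, CHUNK_SIZE, digest);   /* 128-bit digest per block */
        on_chunk(digest, chunk);          /* look up / insert in the hash table */
    }
}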
Fig. 3. EXT4 de-duplication system design
5 Implementation Details
Our de-duplication algorithm eliminates redundant data from the buffer itself, before it is written to the disk, and also handles the extents that provide sequential read/write access to blocks. At the generic level, the data held in the buffer is divided into chunks of 4 KB each. For each block a hash value is calculated and stored in the hash table. For this purpose we use two hash functions: MD5 (Message Digest 5) and FNV (Fowler–Noll–Vo). MD5 is used to calculate the hash value of each 4 KB data chunk; it returns a 128-bit value that is used to detect duplicate data. The main issue is how to organize the hash table so that hash values can be looked up efficiently. If the 128-bit values were stored sequentially, a lookup would take O(n) time; we therefore additionally apply the FNV algorithm to construct the hash table. FNV returns a 32-bit integer hash value, but if all 32 bits were used to index our data structure it would require 2^32 = 4 GB of memory. To optimize memory usage, the 32 bits are truncated to 21 bits [7], which reduces the memory requirement to 2^21 = 2 MB; with this 21-bit value, a total of 2^21 = 2,097,152 indices can be used to build the hash table. At every index a singly linked list is maintained, whose node structure is shown in Fig. 4. The node has four fields: the hash value; the block number, which is 6 bytes since block numbers in EXT4 are 48 bits [13]; a reference count, which records how many blocks refer to the unique stored copy; and the next pointer, which points to the next node. The node at the corresponding hash index is then filled in.
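As a sketch, the bucket-index computation and the node structure of Fig. 4 could look like this in C. The field widths follow the description above; FNV-1a is an assumption, since the paper only says "FNV":

#include <stdint.h>

#define HASH_BITS    21
#define HASH_BUCKETS (1u << HASH_BITS)   /* 2^21 = 2,097,152 indices */

/* One entry in the de-duplication database (cf. Fig. 4). */
struct dedup_node {
    unsigned char md5[16];     /* 128-bit MD5 of the 4 KB block */
    uint64_t pblock;           /* 48-bit EXT4 physical block number (6 bytes on disk) */
    uint32_t refcount;         /* how many blocks refer to this stored copy */
    struct dedup_node *next;   /* next node in this bucket's singly linked list */
};

static struct dedup_node *hash_table[HASH_BUCKETS];

/* 32-bit FNV-1a over the MD5 digest, truncated to 21 bits [7]. */
static uint32_t bucket_index(const unsigned char md5[16])
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (int i = 0; i < 16; i++) {
        h ^= md5[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h & (HASH_BUCKETS - 1);       /* keep the low 21 bits */
}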
Fig. 4. Node Structure for the de-duplication database
So each time a 4 KB chunk arrives, its hash value is calculated and compared with the hash values already stored in these nodes. As shown in Fig. 3, two possibilities may arise:

1. The value obtained is different from every hash value already stored in the table.
2. The value already exists in the hash table.

In the first case, the hash value is stored in a node, a normal write operation is performed, and the block is written to the disk. In the second case, the reference count is first incremented, to record that another block refers to the original block already saved. The block is then checked for contiguity with the previous duplicate block: if its block number is one greater than the previous block's, the block is merged into the current extent and is not written to the disk; otherwise a new extent is allocated for the duplicate block. In both cases the metadata, i.e. the inode table and the bitmaps, is updated. This decision logic is sketched in the listing below. Our system deals with the following scenarios for the blocks of a file:

1. Non-duplicate blocks are present before the first duplicate block.
2. One or more duplicate blocks are present.
3. A number of blocks are present after the duplicate block(s).

In the first case a single extent is allocated until the block count reaches 32768, since one extent can contain 32768 blocks [13]; after that the next extent is allocated, so that sequential reads/writes can be performed on the blocks contained in a single extent. The second and third cases are illustrated in Fig. 5. Suppose there are two files, a.txt and b.txt, each containing 7 blocks of 4 KB each. The hash value of each block of a.txt is calculated and stored in the hash table. When the second file is to be stored, the hash value of each of its 4 KB blocks is calculated and compared with the hash values already in the table. If a match is found, the hash value is not stored again; instead, the reference count in the node that already holds that hash value is incremented. If no match occurs, the hash value is stored in the hash table, and the same process is repeated for all the other blocks of the file.
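The following C fragment sketches that decision logic, building on the dedup_node, hash_table, and bucket_index definitions above. The functions write_block, extent_append, and extent_new are hypothetical names standing in for the file system's block-write and extent-management operations:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hooks into the file system layer. */
extern uint64_t write_block(const unsigned char *chunk);  /* writes the block, returns its number */
extern void extent_append(uint64_t pblock);               /* grow the current extent by one block */
extern void extent_new(uint64_t pblock);                  /* start a new extent at pblock */

static uint64_t prev_dup_pblock;   /* physical block of the previous duplicate, if any */
static bool have_prev_dup;

/* Process one 4 KB chunk: deduplicate it or write it out.
 * Metadata (inode table, bitmaps) is assumed to be updated by the hooks. */
void dedup_chunk(const unsigned char md5[16], const unsigned char *chunk)
{
    uint32_t idx = bucket_index(md5);
    for (struct dedup_node *n = hash_table[idx]; n; n = n->next) {
        if (memcmp(n->md5, md5, 16) == 0) {
            /* Case 2: duplicate. Reference the stored copy, skip the write. */
            n->refcount++;
            if (have_prev_dup && n->pblock == prev_dup_pblock + 1)
                extent_append(n->pblock);   /* contiguous: merge into current extent */
            else
                extent_new(n->pblock);      /* not contiguous: allocate a new extent */
            prev_dup_pblock = n->pblock;
            have_prev_dup = true;
            return;
        }
    }
    /* Case 1: new data. Write the block and record its hash. */
    struct dedup_node *n = calloc(1, sizeof *n);
    memcpy(n->md5, md5, 16);
    n->pblock = write_block(chunk);
    n->refcount = 1;
    n->next = hash_table[idx];
    hash_table[idx] = n;
}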
Fig. 5. Extent allocation for the blocks of a.txt and b.txt
Let us assume that the starting block number of a.txt is 1000 and that of b.txt is 2000. As shown in Fig. 5, blocks c and d of b.txt contain the same data as blocks of a.txt, so blocks p, q, r, and s are stored in the first extent, of length 4, starting at block number 2000. For blocks c and d a separate extent is allocated, of length 2, starting at block number 1002, and for t another extent is allocated, of length 1, at block number 2004.
6 Results
We deployed the algorithm by modifying the source code of the EXT4 file system in the Linux kernel. To obtain the low-level details of extent and block allocation by the EXT4 file system, we used ghex [12] together with the istat, fsstat, and blkcat commands [15].
Fig. 6. Inode information of a.txt
Fig. 7. Inode information of b.txt
Each inode entry starts with the value 0xA481, as shown in Fig. 6 and Fig. 7. Bytes 40-41 hold the magic number, statically defined as 0xF30A; bytes 42-43 give the number of extents in the file; bytes 52-55 the logical block number; bytes 56-57 the number of blocks in the extent; bytes 58-59 the upper 16 bits of the physical block address; and bytes 60-63 the lower 32 bits of the physical block address.

We gathered these results on an EXT4 partition of 10 GB with the default 4 KB block size. Assume that initially there are no files on the disk; we stored a first file, a.txt, containing 10 blocks. In a second file, b.txt, assume the 7th and 8th blocks are the same as those of a.txt. From the inode information of a.txt, the number of extents allocated is 1 (bytes 42-43) and the number of blocks in that extent is 0x000A = 10. The starting block number of this extent (bytes 60-63) is 0x0000862C = 34348, so the 7th and 8th blocks of a.txt have block numbers 34354 and 34355 respectively.

Let us analyze b.txt to check whether we get the same block numbers. The inode information of b.txt shows that 3 extents (bytes 42-43) are allocated for it. The first extent covers 6 blocks (bytes 56-57) and starts at block number 0x00008640 = 34368. The next 4 bytes give the logical block number of the 7th block. The shaded portion (0x0002) next to it indicates the next extent, which contains 2 blocks; the following shaded portion, 0x00008632 = 34354, is the starting block number of this extent, and the next block number is 34355. These are the same block numbers we obtained for the 7th and 8th logical blocks of a.txt. Had these two blocks not been duplicates, they would have been assigned the next free block numbers, 34374 and 34375; since they are duplicates, block numbers 34374 and 34375 are instead assigned to the remaining two blocks of b.txt, as seen in the shaded portions 0x0002 and 0x00008646 = 34374.

If we now compute the size of a.txt using the "ls -s" command, which gives the allocated size of each file in blocks, it is 40 KB, whereas the size of b.txt is 32 KB, since two of its blocks were duplicates. This confirms the working of the de-duplication algorithm.
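For reference, the byte offsets above can be decoded with a small user-space sketch like the following, operating on a raw inode image extracted with a hex editor or blkcat (the function and helper names are illustrative; the offsets follow the description above):

#include <stdint.h>
#include <stdio.h>

/* Little-endian field readers for the raw on-disk bytes. */
static uint16_t le16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the extent header and first extent from a raw 256-byte inode image. */
void dump_first_extent(const unsigned char inode[256])
{
    uint16_t magic   = le16(inode + 40);   /* should be 0xF30A */
    uint16_t entries = le16(inode + 42);   /* number of extents */
    uint32_t lblock  = le32(inode + 52);   /* logical block number */
    uint16_t len     = le16(inode + 56);   /* blocks in this extent */
    uint16_t hi      = le16(inode + 58);   /* upper 16 bits of physical block */
    uint32_t lo      = le32(inode + 60);   /* lower 32 bits of physical block */
    uint64_t pblock  = ((uint64_t)hi << 32) | lo;

    printf("magic=0x%04x extents=%u logical=%u len=%u physical=%llu\n",
           (unsigned)magic, (unsigned)entries, (unsigned)lblock,
           (unsigned)len, (unsigned long long)pblock);
}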
7 Conclusion
The outcome of our de-duplication approach for the EXT4 file system is encouraging: it saves a substantial amount of disk space by avoiding writing duplicate data to the disk in the first place. Our approach successfully handles the allocation of extents for contiguous and non-contiguous
duplicate blocks. Since duplicate blocks are identified before they are actually written to the disk, our algorithm also reduces the number of disk writes, which can be helpful for SSDs, where the number of writes is limited.
References

1. El-Shimi, A., Kalach, R., Kumar, A., Oltean, A., Li, J., Sengupta, S.: Primary Data Deduplication - Large Scale Study and System Design. In: Proc. USENIX ATC, Boston, MA (2012)
2. Windows Storage Server, http://technet.microsoft.com/en-us/library/gg232683WS.10.aspx
3. EMC Corporation: EMC Centera: Content Addressed Storage System, Data Sheet (2002)
4. Quinlan, S., Dorward, S.: Venti: A New Approach to Archival Storage. In: The First USENIX Conference on File and Storage Technologies (FAST 2002), vol. 2, pp. 89–101 (2002)
5. Alvarez, C.: NetApp De-duplication for FAS and V-Series Deployment and Implementation Guide. Technical Report TR-3505 (2011)
6. Brown, A., Kosmatka, K.: Block-level Inline Data De-duplication in EXT3. University of Wisconsin - Madison, Department of Computer Sciences (2010)
7. More, A., Shaikh, Z., Salve, V.: DEXT3: Block Level Inline De-duplication using EXT3 File System. In: Linux Symposium, p. 87 (2012)
8. Larabel, M.: SDFS: A File-System With Inline De-Duplication (2011)
9. Bhagwat, D., Eshghi, K., Long, D.D., Lillibridge, M.: Extreme Binning: Scalable, Parallel De-duplication for Chunk-based File Backup. In: IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS 2009), pp. 1–9. IEEE (2009)
10. Zhu, B., Li, K., Patterson, H.: Avoiding the Disk Bottleneck in the Data Domain De-duplication File System. In: FAST, vol. 8, pp. 269–282 (2008)
11. Cao, M., Santos, J.R., Dilger, A.: EXT4 Block and Inode Allocator Improvements. In: Linux Symposium, p. 263 (2008)
12. Fairbanks, K.D.: An Analysis of EXT4 for Digital Forensics. Digital Investigation 9, S118–S130 (2012)
13. Mathur, A., Cao, M., Bhattacharya, S., Dilger, A., Tomas, A., Vivier, L.: The New ext4 Filesystem: Current Status and Future Plans. In: Proceedings of the Linux Symposium, vol. 2, pp. 21–33 (2007)
14. Kadekodi, S., et al.: Taking Linux Filesystems to the Space Age: Space Maps in EXT4. In: Linux Symposium (2010)
15. http://computer-forensics.sans.org/blog/2010/12/20/digital-forensics-understanding-ext4-part-1-extents#part1-5