De-duplication in File Sharing Network

Divakar Yadav, Deepali Dani, and Preeti Kumari

Jaypee Institute of Information Technology, A-10, Sector-62, Noida (India)
[email protected],
[email protected],
[email protected]
Abstract. Redundant data transfer over a network is one of the major causes of traffic congestion today. In this paper, we propose an efficient and secure file sharing model that uses de-duplication technology to address it. A file sharing system based on de-duplication reduces bandwidth and storage requirements at both the client and the server machine: it does not download duplicate blocks that have already been downloaded. To secure client data, a three-tier architecture is proposed in this work. For this purpose the SHA-1 hash function is used, in which each 8 KB block of data is converted into a 20-byte digest. The design thus achieves a dramatic reduction in storage space for various workloads and hence reduces the time to perform backups in bandwidth-constrained environments.

Keywords: De-duplication, three-tier architecture, hash algorithm, bandwidth conservation, storage reduction.
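As a concrete illustration of the block fingerprinting described in the abstract, the following sketch (ours, not taken from the paper's implementation) splits data into 8 KB blocks and computes a 20-byte SHA-1 digest for each:

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8 KB blocks, as used in the paper


def block_digests(data: bytes) -> list[bytes]:
    """Split data into 8 KB blocks and return the SHA-1 digest of each block."""
    return [hashlib.sha1(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]
```

Identical blocks yield identical digests, which is what allows the system to detect and skip duplicate blocks.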
1 Introduction

Our idea is to exploit de-duplication techniques for online (on-the-fly) compression, whereas de-duplication was earlier used only in back-up operations. Moreover, existing file sharing systems such as BitTorrent [1] do not allow a user to update a version of his own file. In our implementation, while downloading a large file the system exploits Duplicate Transfer Detection (DTD), i.e. the client downloads only the non-redundant blocks from multiple sources [2].
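The DTD step can be sketched as follows (a minimal illustration of ours, assuming the client keeps a local store keyed by block digest; the names here are hypothetical, not from the paper):

```python
import hashlib


def blocks_to_fetch(remote_digests: list[bytes], local_store: dict) -> list[int]:
    """Duplicate Transfer Detection: given the digests of a remote file's
    blocks, return the indices of blocks NOT already in the local store.
    Only these blocks need to be downloaded from the peers."""
    return [i for i, d in enumerate(remote_digests) if d not in local_store]
```

For example, if the client already holds the first block of a file, only the remaining blocks are requested from the network.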
2 Proposed File Sharing Network System

The system works in a coordinator-peer architecture, where information about files resides on a central server while the peers hold the actual data to be downloaded. The server stores this information in database tables that are directly accessible only from the application server, not from the client. The server is optimized for computing hashes and has a scalable architecture to handle multiple requests. Clients are capable of returning blocks of shared files when requested remotely, and a client can make file sharing private by forming a peer group. Metadata is shared by the server and files are downloaded P2P. This ensures robustness during downloading and updating of files in the network.

S. Aluru et al. (Eds.): IC3 2011, CCIS 168, pp. 551–553, 2011. © Springer-Verlag Berlin Heidelberg 2011

The proposed algorithm for the duplicate finder (at the database server) is as follows:

Duplicate_Finder(file_name, file_hash_values)
Begin
    updatedVersion ← findLatestVersion(file_name)
    L = totalHashes(updatedVersion)
    M = totalHashes(file_name)
    TotalDedupeBlocks = 0
    If L > M OR L == M then
        For i = 1 to M
            If hash i of updatedVersion == file_hash_values[i] then
                TotalDedupeBlocks = TotalDedupeBlocks + 1
                record i in BlockNumbers
            Endif
        Endfor
        If TotalDedupeBlocks > 0 then
            UpdateFile(file_name)
            Return (updatedVersion, BlockNumbers, ClientIPAddress)
        Else
            Return NULL    // no latest version present at the server
        Endif
    Else    // L < M
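The L ≥ M branch of the duplicate finder can be sketched in executable form as follows (our illustration only; the server's database lookup is mocked here as a plain dictionary, and the return value is simplified to the list of duplicate block numbers):

```python
def duplicate_finder(file_name: str, file_hash_values: list[str],
                     versions: dict[str, list[str]]):
    """Compare the client's block hashes against the server's latest version
    of the file and return the block numbers that need not be transferred.
    `versions` stands in for the server's database: file name -> block hashes."""
    updated = versions.get(file_name)
    if updated is None:
        return None  # no version of this file known to the server
    L, M = len(updated), len(file_hash_values)
    if L >= M:
        dup_blocks = [i for i in range(M) if updated[i] == file_hash_values[i]]
        return dup_blocks if dup_blocks else None
    # L < M: the client copy has more blocks than the server's latest version
    return None
```

For instance, if the server's latest version has blocks with hashes h1, h2, h3 and the client submits h1, hX, only block 0 is reported as a duplicate; the client then fetches the remaining blocks from its peers.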