A Novel Security Schema for Distributed File Systems

Bager Zarei, Dept. of Computer Engineering, Islamic Azad University, Shabestar Branch, Iran
Mehdi Asadi, Dept. of Computer Engineering, Islamic Azad University, Khamneh Branch, Iran
Saeed Nourizadeh, Dept. of Computer Engineering, Islamic Azad University, Shabestar Branch, Iran
Shapour Jodi Begdillo, Dept. of Computer Engineering, Islamic Azad University, Parsabad Mogan Branch, Iran
Abstract- Distributed file systems are key tools that enable collaboration in any environment consisting of more than one computer. Security is a term that covers several concepts related to protecting data. Authentication refers to identifying the parties involved in using or providing file services. Authorization is the process of controlling actions in the system, both at the user level, such as reading and writing, and at the administrative level, such as setting quotas and moving data between servers. Communication security addresses the integrity and privacy of messages exchanged over networks vulnerable to attack. Storage security concerns the safety of data "at rest" on disks or other devices. These, however, are mechanistic approaches to security; at a higher level, the trust of all parties in each other and in the system components controls the security decisions. This paper is an endeavor to provide security in distributed file systems. Its main objective is an in-depth treatment of the basic concepts of distributed file systems, such as fault tolerance, fault recovery and scalability, and specifically security.
Keywords- Security, DFS (Distributed File System), Fault Tolerance, Fault Recovery, Load Balancing, DES (Data Encryption Standard)

I. INTRODUCTION
To provide security in distributed file systems, different protocols exist, such as SSL and Kerberos. These protocols use methods such as mutual authentication of client and server and authorization of clients to access and use file system services. In the authorization method, a client must hold a ticket or valid certificate in order to use file system services; the ticket includes information such as the username and the ticket's expiration time. The rest of this paper is organized as follows. Section 2 discusses the background and main purposes of file systems. Section 3 presents file system design issues from different points of view. Section 4 covers the design requirements of a distributed file system. Secure distributed file systems are discussed in Section 5. Finally, Section 6 concludes with our proposed solution for providing security in a distributed file system and compares our proposed method with existing systems for securing a DFS.

II. FILE SYSTEM BACKGROUND
File systems serve several main purposes:
• Storing information
• Retrieving information
• Sharing information

File systems are useful for long-term and persistent storage of information and are a way to organize information, usually in the form of files and a file hierarchy. A file consists of a mapping from a name to a block of information accessed as a logical group. Many ways exist to implement file systems, and many file systems have been implemented, but relatively few are widely used. Normally, file systems are implemented as part of the operating system's functionality, in the kernel. Applications use a common interface provided by the operating system for working with files, while the operating system controls the semantics of this file access.
III. FILE SYSTEM ISSUES
Some issues, common to most file systems, need addressing in the design of any new file system.

1. Sharing
Sharing files among users is important in any file system. Several ways exist to define the sharing semantics of a file system. UNIX semantics for file sharing dictate that users should immediately see the effects of all writes to a file. Thus, if user1 and user2 open the same file, user1 writes to the file, and user2 later reads from the file, then user2 should see what user1 wrote. Another method of file sharing is called session semantics, which guarantees that changes to a file are seen by other users only when the file is closed. If multiple users have the same file open, they may hold different or stale copies of the file when one of them writes to it. A third approach is to treat every file as read-only and immutable: every change to a file essentially creates a new immutable file, with a new file name referring to the new copy. Because all files are read-only, caching and consistency become much simpler to implement.
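As a toy illustration (not from the paper) of immutable-file semantics, each "write" in the C sketch below creates a fresh, read-only version under a new name; the ";N" version-suffix convention and file name are hypothetical.

/* Toy sketch of immutable-file semantics: a "write" never modifies an
 * existing file, it creates a new read-only version under a new name. */
#include <stdio.h>

int main(void)
{
    char name[64];

    for (int version = 1; version <= 3; version++) {
        snprintf(name, sizeof name, "report.txt;%d", version);
        FILE *f = fopen(name, "w");  /* a fresh file for every change */
        if (f) {
            fprintf(f, "contents of version %d\n", version);
            fclose(f);               /* from now on, treated as read-only */
        }
    }
    /* readers may cache any version forever: it will never change */
    return 0;
}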
Access Control
Access control in a file system means making sure users only access the file resources they are allowed to access, including read access to some files and full write access to others. Once users are authenticated, a mechanism must exist to control access to resources. In UNIX, file permissions can be set to any combination of read, write and execute. Each file is associated with a single owner and a single group, and only the owner can set the permissions. Three groups of permissions exist: one for the owner, one for the group, and one for everyone else. UNIX access control is difficult to use and does not promote sharing. We desired a better system for access control, preferring a system similar to AFS with access control lists, which allows normal users to create groups and set permissions on files.
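As a small illustration (not from the paper), the UNIX owner/group/other check just described can be written in a few lines of C; the uid/gid values in main are hypothetical.

/* Sketch of the UNIX permission check: pick the rwx triple that applies
 * to this user, then test the requested bit. */
#include <stdio.h>
#include <sys/stat.h>

static int may_access(mode_t mode, uid_t owner, gid_t group,
                      uid_t uid, gid_t gid, mode_t want /* 4=read, 2=write */)
{
    if (uid == owner)
        return ((mode >> 6) & want) == want;   /* owner triple */
    if (gid == group)
        return ((mode >> 3) & want) == want;   /* group triple */
    return (mode & want) == want;              /* everyone else */
}

int main(void)
{
    /* rw-r----- : owner may write, group may read, others get nothing */
    mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP;
    printf("group read:  %d\n", may_access(mode, 1000, 100, 1001, 100, 4));
    printf("other write: %d\n", may_access(mode, 1000, 100, 1002, 200, 2));
    return 0;
}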
2. Caching
Different media have different costs of access: cache memories near processors are very fast, local disks are slower, and remote disks can be slower still. The basic idea of caching is to keep copies of recently accessed data on a faster medium to speed up repeated access to the same data. One copy somewhere is usually considered the master copy, and every other copy is a secondary copy. All types of file systems need some form of caching. Local file systems cache disk blocks in memory to reduce the time needed to fetch blocks from a disk. Distributed file systems cache remote blocks or files locally to reduce the amount of network traffic. Cryptographic file systems cache decrypted blocks in order to operate on them.

Consistency
Changes made to a cached copy must eventually be propagated back to the master copy, and a cached copy can become out of date if the master copy is changed by another client. The general problem of keeping the cached copy consistent with the master copy is called cache consistency. To maintain consistency, a system needs a caching policy that determines what data is cached and when data is removed from the cache. The job of maintaining consistent caches can be given either to the clients or to the server. Clients can be responsible for periodically checking the validity of their caches; this has the disadvantage of frequent checks that may not be necessary. Another approach is to have the server maintain information about what each client has cached. When the server detects that a client holds something invalid in its cache, the server contacts the client and tells it to invalidate that particular file. One disadvantage of this server-centered approach is extra complexity in both the client and the server code.

Write Policy
When a cache block is modified, the changes can be pushed back to disk or to a remote server at different times, corresponding to different cache write policies. One method, called write-through, immediately sends all writes through the cache directly to the master copy. Another method is delayed-write, which postpones writes until a future time. A type of delayed write, called write-on-close, writes out the cache when the file is closed.
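The three policies can be sketched as a dispatch on a per-block dirty flag. The following C fragment is an illustrative sketch under our own naming, not the paper's implementation.

/* Illustrative sketch of the three write policies named above. */
#include <stdio.h>

enum write_policy { WRITE_THROUGH, DELAYED_WRITE, WRITE_ON_CLOSE };

struct cache_block {
    char data[4096];
    int  dirty;                      /* modified since the last flush? */
};

static void flush_to_master(struct cache_block *b)
{
    printf("flushing block\n");      /* stand-in for disk or network I/O */
    b->dirty = 0;
}

static void block_write(struct cache_block *b, enum write_policy p)
{
    b->dirty = 1;
    if (p == WRITE_THROUGH)
        flush_to_master(b);          /* push the change immediately */
    /* DELAYED_WRITE: a background flusher would write it back later */
}

static void file_close(struct cache_block *b, enum write_policy p)
{
    if (p == WRITE_ON_CLOSE && b->dirty)
        flush_to_master(b);          /* changes become visible on close */
}

int main(void)
{
    struct cache_block b = { .dirty = 0 };
    block_write(&b, WRITE_ON_CLOSE); /* no flush yet */
    file_close(&b, WRITE_ON_CLOSE);  /* flushed here */
    return 0;
}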
3. Fault Tolerance
A system is fault tolerant if it can work properly even when problems arise in parts of the system. File systems can be characterized as either stateful or stateless; the less implicit state contained in each request, the more fault tolerant a system can be. Availability is how readily accessible files are despite problems such as servers becoming unavailable or communication failures. A fault tolerant system should aim to provide high availability. One common way of providing fault tolerance and increasing availability is to replicate files, either on multiple disks on the same machine or on multiple machines. Replication greatly complicates the system, since consistency must be maintained among the copies; complete consistency is sometimes sacrificed for performance.
4. Scalability
Another design concern is scalability, which can be viewed in several ways. Traditionally, people have looked at how many simultaneous clients the file servers can handle. Once we leave local file systems and work with a network of machines, we also care about the number of machines we can work with. Scalability is the ability of the system to handle a large number of active clients and servers while still providing high performance for all involved.

IV. DESIGN REQUIREMENTS OF A DFS
A number of requirements need to be addressed in the design of a heterogeneous distributed file system, particularly in the areas of distribution, security and concurrency control.
• Distribution: A distributed file system should allow the distribution of the file service across a number of servers. In particular, the following requirements should be addressed:
  o Access transparency: clients should be unaware of the distribution of files across multiple servers.
  o Location transparency: clients should see a uniform file name space, independent of the server where the files are located.
  o Hardware and operating system heterogeneity: a distributed file system should provide a mechanism that allows multiple file servers, each running a different file system type, to interact with each other, thus providing one global file system to its clients.
• Security: Due to the inherent insecurity of the Internet, which is vulnerable to both eavesdropping and identity falsification, there is a need both to secure all communication and to verify the identity of all parties in a secure manner. The security architecture of a distributed file system should provide authentication of both client and server, in addition to securing all communication between them.
• Concurrency: As the file server may be accessed by multiple clients, the issue of concurrent access to files needs to be addressed. A distributed file system should provide a locking mechanism that allows both exclusive and shared access to files.

V. DISTRIBUTED SECURITY SYSTEMS
Distributed systems are more vulnerable to security breaches than centralized systems, as the system is comprised of potentially untrustworthy machines communicating over an insecure network. The following threats to a distributed system need to be protected against:
• An authorized user of the system gaining access to information that should be hidden from them.
• A user masquerading as someone else and so obtaining access to whatever that user is authorized to do; any actions they carry out are also attributed to the wrong person.
• Security controls being bypassed.
• Eavesdropping on a communication line, thus gaining access to confidential data.
• Tampering with the communication between objects, by modifying, inserting or deleting transmitted data.
• Lack of accountability due, for example, to inadequate identification of users.
These threats are prevented through the securing of all communication over the network, the correct authentication of both client and server, and the authorization of clients to access resources.

1. Secure Communication
Secure communication is provided through both the encryption and the digital signing of the data passed over the network.

Encryption
Encryption prevents eavesdropping on sensitive data transmitted over an insecure network. There are two types of encryption algorithms: asymmetric and symmetric. Asymmetric, or public-key, algorithms have two keys, one public and one private; data encrypted with one key is decrypted with the other. Symmetric, or private-key, encryption uses just one key, agreed between sender and recipient beforehand; this method is faster than asymmetric encryption. A common cryptographic technique is to encrypt each individual conversation with a separate key, called a session key. Session keys are useful because they eliminate the need to store keys between sessions, reducing the likelihood that a key might be compromised. The main problem with session keys, however, is their exchange between the two conversants. This is solved through the use of either a public-key cipher or a key agreement algorithm such as Diffie-Hellman.
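As a minimal sketch of per-conversation session keys, assuming OpenSSL's RAND_bytes as the randomness source (the paper does not name one), the fragment below generates a fresh DES-sized key per session; the public-key wrapping step is indicated only in a comment.

/* Hedged sketch: a fresh symmetric key per conversation. */
#include <stdio.h>
#include <openssl/rand.h>

int main(void)
{
    unsigned char session_key[8];    /* one DES-sized key per conversation */

    if (RAND_bytes(session_key, sizeof session_key) != 1) {
        fprintf(stderr, "no entropy available\n");
        return 1;
    }
    /* In a real exchange the key would now be sent under the peer's
     * public key, or agreed via Diffie-Hellman. Nothing is stored
     * between sessions, so compromise of one key exposes one session. */
    for (size_t i = 0; i < sizeof session_key; i++)
        printf("%02x", session_key[i]);
    putchar('\n');
    return 0;
}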
File Encryption
SSL provides session encryption, protecting against people listening in on the network; the files, however, are still seen in unencrypted form by the file server. In some situations, one may not want the system administrator of the file server to be able to read one's files. A user may also be concerned about someone breaking into the file server and gaining access to all of the files there. File encryption prevents the leak of information to someone who has access to the file server. Several significant differences exist between file encryption and session encryption. Files are persistent, so long-term security must be taken into account: it is not enough that no one can break the encryption now; it must remain infeasible, or at least sufficiently unlikely, that the file can be broken in the future. Another difference is that files can require random access, whereas most encryption techniques work only with a sequential stream of data.

Digital Signatures
Digitally signing data prevents the unauthorized modification of data during transmission. Rather than signing the whole message, which would add too much overhead, a unique digest is signed instead. This message digest is produced by a secure hashing algorithm, which outputs a fixed-length hash unique to that message. If the message is modified in any way, the digest changes, which can then be detected at the receiving end. Digitally signing data is accomplished through the use of a public-key cipher: the message digest is encrypted with the private key of the sender, which the recipient deciphers using the sender's public key. If the data was modified, or the digest was encrypted with a different private key, then the original digest and the digest calculated by the recipient will differ. This ensures both that the data was not modified during transmission and that it originates from the true sender.
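A small hedged example of the digest step, assuming SHA-256 as the secure hash (the paper does not name one); the private-key encryption of the digest is indicated only as a comment.

/* Illustrative sketch: the fixed-length digest that gets signed. */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main(void)
{
    const char *msg = "move /home/alice/report to server B";
    unsigned char digest[SHA256_DIGEST_LENGTH];

    /* Any change to msg changes this digest, which the receiver detects. */
    SHA256((const unsigned char *)msg, strlen(msg), digest);

    /* A real signer would now encrypt 'digest' with the sender's private
     * key (e.g. RSA); the recipient recomputes the hash and compares. */
    for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
        printf("%02x", digest[i]);
    putchar('\n');
    return 0;
}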
2. Authentication
Before a client is allowed to access any of the data stored on the file server, it must be able to prove its identity to the server. A secure mutual identification scheme was designed that does not require the transmission of passwords over the network and provides authentication of both client and server. This scheme is similar to the model provided by Kerberos, and is described below. Coulouris et al. state that a secret key is the equivalent of the password used to authenticate users in centralized systems. Therefore, possession of an authentication key based on the client's password can verify the identity of both the client and the server. The method used to convert the password into an encryption key is based on the method used in the PKCS #5 standard, whereby the key is generated by computing the hash of the password. A user identifier is used to identify the client for a particular session. This identifier can be encrypted with the authentication key to produce a token, as illustrated in Figure 1.
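Since the scheme hashes the password into an authentication key and then seals a user identifier with it (Figure 1), the following C sketch shows one plausible realization, assuming OpenSSL's legacy DES interface and SHA-256 as the hash; the key-truncation choice and all names are ours, not the paper's.

/* Hypothetical sketch: password -> authentication key -> token. */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>
#include <openssl/des.h>

static void make_token(const char *password, const char *user_id,
                       unsigned char token[8])
{
    unsigned char digest[SHA256_DIGEST_LENGTH];
    DES_cblock key;
    DES_key_schedule sched;
    unsigned char block[8] = {0};

    /* Authentication key = hash of the password (first 8 bytes here). */
    SHA256((const unsigned char *)password, strlen(password), digest);
    memcpy(key, digest, 8);
    DES_set_odd_parity(&key);
    DES_set_key_unchecked(&key, &sched);

    /* Token = user identifier encrypted under the authentication key.
     * A real system would handle identifiers longer than one block. */
    strncpy((char *)block, user_id, sizeof block);
    DES_ecb_encrypt((DES_cblock *)block, (DES_cblock *)token,
                    &sched, DES_ENCRYPT);
}

int main(void)
{
    unsigned char token[8];
    make_token("secret", "alice", token);
    for (int i = 0; i < 8; i++)
        printf("%02x", token[i]);
    putchar('\n');
    return 0;
}

(Build with -lcrypto; DES is deprecated in current OpenSSL and appears here only because the paper's implementation used it.)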
Fig. 1. Token generation

3. Authorization
Authorization is the granting of access to a particular resource. This prevents both unauthorized and authorized users from gaining access to information that should be protected from them. Authorization is commonly implemented using ACLs (Access Control Lists): a list of users associated with each file in the system, specifying who may access the file and how.
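A hedged sketch of such a per-file ACL in C follows; the entry layout and limits are illustrative.

/* Illustrative per-file ACL: a list of (user, rights) entries consulted
 * before any access. */
#include <string.h>

enum rights { ACL_READ = 1, ACL_WRITE = 2 };

struct acl_entry {
    char user[32];
    int  rights;                      /* bitwise OR of enum rights */
};

struct file_acl {
    struct acl_entry entry[16];
    int              count;
};

/* grant access only if the user appears with the requested right */
int acl_allows(const struct file_acl *acl, const char *user, int want)
{
    for (int i = 0; i < acl->count; i++)
        if (strcmp(acl->entry[i].user, user) == 0)
            return (acl->entry[i].rights & want) == want;
    return 0;                         /* not listed: no access */
}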
4. Kerberos
The name "Kerberos" comes from the mythological three-headed dog that guarded the entrance to Hades. Kerberos is a network authentication protocol developed at MIT. It is designed to provide authentication to client/server systems using secret-key cryptography, and is based on the secret-key distribution model originally developed by Needham and Schroeder. Keys are the basis of authentication in Kerberos and are typically a short sequence of bytes used for both encryption and decryption:

Encryption: plaintext + encryption key -> ciphertext
Decryption: ciphertext + decryption key -> plaintext
It was implemented in C on a Linux system. The DES encryption algorithm has been used to encrypt the messages. DES is a reversible operation that takes a 64-bit block and a 64-bit key and produces another 64-bit block. Usually the bits are numbered so that the most significant (first) bit of each block is numbered 1. UDP has been used as the transport layer protocol for Kerberos. Kerberos deals with three kinds of security objects:

Ticket: A token issued to a client by the Kerberos ticket-granting service for presentation to a particular server, verifying that the sender has recently been authenticated by Kerberos. Tickets include an expiry time and a newly generated session key for use by the client and the server.

Authenticator: A token constructed by the client and sent to the server to prove the identity of the user and the currency of any communication with a server. An authenticator can be used only once. It contains the client's name and a timestamp, and is encrypted with the appropriate session key.
Session Key: A secret key randomly generated by Kerberos and issued to a client for use when communicating with a particular server. Encryption is not mandatory for all communication with servers; the session key is used for encrypting communication with those servers that demand it and for encrypting all authenticators.
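These three objects could be represented roughly as follows; this is an illustrative C sketch, not Kerberos's actual encoding, and all field names and sizes are ours.

/* Hedged sketch of the three Kerberos security objects described above. */
#include <time.h>

typedef unsigned char session_key_t[8];  /* randomly generated by Kerberos */

struct ticket {                 /* issued by the ticket-granting service */
    char          client[64];   /* whom Kerberos recently authenticated */
    time_t        expiry;       /* tickets carry an expiry time */
    session_key_t key;          /* newly generated session key */
};

struct authenticator {          /* built by the client, usable only once */
    char   client[64];          /* the client's name */
    time_t timestamp;           /* proves the currency of the exchange */
    /* the whole structure is encrypted under the session key */
};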
Client processes must possess a ticket and a session key for each server that they use. It would be impractical to supply a new ticket and key for each client-server interaction, so most tickets are granted to clients with a lifetime of several hours, so that they can be used for interaction with a particular server until they expire.
• Advantages of Kerberos:
– Secure authentication
– Single sign-on
– Secure data flow
– Client and server mutual authentication
• Applications benefiting from Kerberos:
– File systems: NFS, AFS, DFS
– Shell access: login, rlogin, telnet, ssh
– File copy: ftp, rcp, scp, sftp
– Email: KPOP, IMAP
• Limitations of Kerberos:
– Does not explicitly protect against Trojan attacks.
– Is mainly intended for single-user workstations.
– The KDC can be a single point of failure.
5. Secure Sockets Layer
The SSL Handshake Protocol was developed by Netscape Communications Corporation to provide security and privacy over the Internet. The protocol supports server and client authentication. SSL is application independent, allowing protocols like HTTP, FTP and Telnet to be layered on top of it transparently. SSL is able to negotiate encryption keys, as well as authenticate the server, before data is exchanged by the higher-level application, and maintains the security and integrity of the transmission channel by using encryption, authentication and message authentication codes. The SSL Handshake Protocol consists of two phases, server authentication and client authentication, with the second phase being optional. In the first phase, the server, in response to a client's request, sends its certificate and its cipher preferences. The client then generates a master key, which it encrypts with the server's public key, and transmits the encrypted master key to the server. The server recovers the master key and authenticates itself to the client by returning a message encrypted with the master key. Subsequent data is encrypted with keys derived from this master key. In the optional second phase, the server sends a challenge to the client. The client authenticates itself to the server by returning its digital signature on the challenge, together with its public-key certificate. This is the standard protocol used by Web sites needing secure transactions (Fig. 2).

Fig. 2. Secure Sockets Layer
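For a concrete flavor, a minimal client using TLS (the modern successor of SSL) with OpenSSL looks roughly as follows; the host name is hypothetical and error handling is abbreviated.

/* Minimal TLS client sketch: certificate verification, handshake, then
 * encrypted application data. */
#include <stdio.h>
#include <openssl/ssl.h>
#include <openssl/bio.h>

int main(void)
{
    SSL_CTX *ctx = SSL_CTX_new(TLS_client_method());
    if (!ctx)
        return 1;
    SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL);  /* phase 1: check server cert */
    SSL_CTX_set_default_verify_paths(ctx);

    BIO *bio = BIO_new_ssl_connect(ctx);             /* TCP + TLS in one BIO */
    BIO_set_conn_hostname(bio, "fileserver.example.org:443");

    if (BIO_do_connect(bio) <= 0) {                  /* runs the handshake */
        fprintf(stderr, "handshake failed\n");
    } else {
        /* application data is now encrypted with keys derived from the
         * negotiated master secret */
        BIO_puts(bio, "GET / HTTP/1.0\r\n\r\n");
    }
    BIO_free_all(bio);
    SSL_CTX_free(ctx);
    return 0;
}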
VI. CONCLUSIONS AND PROPOSED SOLUTION
To achieve security in distributed file systems, different mechanisms exist, such as encryption, certificate authentication and digital signatures. DES is used for encryption and for protecting information against different kinds of attackers. In this paper, several common methods for establishing secure communication between client and server, such as SSL, Kerberos and mutual authentication, were explained. Finally, we propose a method for increasing security in distributed file systems that combines the methods discussed in this paper. Figures 3 and 4 give a view of our proposed schema, which is based on a certificate, a TGS (Ticket Granting Server) and an AS (Authentication Server); the result is a secure communication channel between client and server. We describe our system step by step, as implemented in this paper:

1. The client sends a message to the AS to obtain a certificate. The message includes the username, the authentication server name and the client's service request, sealed with the client's private key:
Message = {user-name, AS-name, SealedClientServiceRequest}

2. The AS checks its ACL (Access Control List) to determine the validity of the client. If the client is valid, the AS puts the request and the client's username on a buffer (client authentication). The TGS then takes the request and username from the buffer, finds the client's private key from the username and decrypts the client's request. If the client is entitled to the requested service, the TGS creates a certificate containing the username, the certificate expiration time and a session key:
Certificate = {user-name, certificate-expiration-time, session-key}

3. The TGS builds the following message, encrypts it with the server's private key, and sends it to the server:
Message = {Certificate}

4. The TGS builds another message, encrypts it with the client's private key, and sends it to the client:
Message = {session-key}

5. After receiving the message from the TGS, the client decrypts it with its own private key and obtains the session key. It then builds the following message and sends it to the server, encrypting its request with the session key obtained from the TGS:
Message = {user-name, Sealed-request}

6. After receiving the message from the client, the server checks the certificate associated with this client (received earlier from the TGS). If the certificate has not expired, the server uses the session key contained in the certificate to decrypt the client's request, performs the request, encrypts the result with the session key and sends it back. Finally, the client decrypts the result with the session key and uses it.

Fig. 3. Proposed security system architecture
Fig. 4. Message ordering in the proposed architecture

Our proposed system retains all of the benefits of Kerberos but reduces the number of messages exchanged to establish a secure communication between client and server from six to five. Since networked systems demand fast service, reducing the number of messages increases the speed of service, which is exactly what we want in distributed systems and networks.
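To make the five-message flow concrete, here is a hedged C sketch of the certificate object and the message sequence; all type and field names are illustrative, and the paper's DES-over-UDP transport is not reproduced.

/* Hedged sketch of the proposed schema's central object and message flow. */
#include <time.h>

struct certificate {                /* built by the TGS in step 2 */
    char          user_name[64];
    time_t        expiration;       /* checked by the server in step 6 */
    unsigned char session_key[8];   /* DES-sized key */
};

/* Message 1 (step 1): client -> AS   {user-name, AS-name, request sealed
 *                                     with the client's private key}     */
/* Message 2 (step 3): TGS -> server  {certificate}, encrypted with the
 *                                     server's private key               */
/* Message 3 (step 4): TGS -> client  {session-key}, encrypted with the
 *                                     client's private key               */
/* Message 4 (step 5): client -> server {user-name, request sealed with
 *                                       the session key}                 */
/* Message 5 (step 6): server -> client {result encrypted with the
 *                                       session key}                     */

/* Server-side check in step 6: serve only unexpired certificates. */
int certificate_valid(const struct certificate *c)
{
    return c->expiration > time(NULL);
}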
In the following table, two common methods used to create secure communication in networks are compared with our proposed architecture.

Criterion | Kerberos | SSL | Our Proposed Architecture
Key | Private key | Public key | Private key
Time synchronization | Synchronous | Asynchronous | Synchronous
Suitable application | Network environments | Ideal for the WWW | Network environments
Security | Passwords reside in users' minds, where they are usually not subject to attack | Certificates sit on a user's hard drive (even if encrypted), where they are subject to being cracked | Passwords reside in users' minds, where they are usually not subject to attack
Availability | Kerberos has always been open source and freely available | Uses patented material, so the service is not free | --
Number of interchanged messages | 6 messages | -- | 5 messages

Table 1. Comparison of different systems for creating secure communication
REFERENCES
[1] M. O'Connell and P. Nixon, "JFS: A Secure Distributed File System for Network Computers," 25th EUROMICRO '99 Conference, Informatics: Theory and Practice for the New Millennium, Milan, Italy, September 8-10, 1999. IEEE Computer Society, 1999.
[2] B. C. Reed, M. A. Smith, and D. Diklic, "Security Considerations When Designing a Distributed File System Using Object Storage Devices," IEEE Computer Society, 2003.
[3] J. R. Douceur and R. P. Wattenhofer, "Optimizing File Availability in a Secure Serverless Distributed File System," 20th Symposium on Reliable Distributed Systems (SRDS 2001), New Orleans, LA, USA, October 28-31, 2001. IEEE Computer Society, 2001.
[4] T. Anderson and L. Luan, "Security Practices in Distributed File Systems," The Second International Workshop for Asian Public Key Infrastructures, Taipei, Taiwan, October 30 to November 1, 2002.
[5] S. A. Banachowski, Z. N. J. Peterson, E. L. Miller, and S. A. Brandt, "Intra-file Security for a Distributed File System," Proceedings of the 10th Goddard Conference on Mass Storage Systems and Technologies / 19th IEEE Symposium on Mass Storage Systems, College Park, MD, April 2002, pp. 153-163.
[6] K. Fu, M. F. Kaashoek, and D. Mazieres, "Fast and Secure Distributed Read-Only File System," ACM Transactions on Computer Systems, February 2002.
[7] A. Godber, "Distributed File System Security with Kerberos," http://uberhip.com/godber/cse531
[8] Q. Xin, E. L. Miller, and T. J. E. Schwarz, "Evaluation of Distributed Recovery in Large-Scale Storage Systems," Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC), Honolulu, HI, June 2004, pp. 172-181.