
EH* - Extendible Hashing in a Distributed Environment

Victoria Hilford, Farokh B. Bastani
Department of Computer Science
University of Houston
Houston, TX 77204-3475
Email: [email protected]

Bojan Cukic
Department of Electrical and Computer Engineering
West Virginia University
P.O. Box 6104
Morgantown, WV 26506
Email: [email protected]

Abstract

In today's world of computers, dealing with huge amounts of data is not unusual. The need to distribute this data in order to increase its availability and the performance of accessing it is more urgent than ever. For these reasons it is necessary to develop scalable distributed data structures. In this paper we propose EH*, a distributed variant of the Extendible Hashing data structure. It consists of buckets of data that are spread across multiple servers and autonomous clients that can access these buckets in parallel. EH* is scalable in the sense that it grows gracefully, one bucket at a time, to a large number of servers. The communication overhead is relatively independent of the number of servers and clients in the system. EH* offers high query efficiency and good storage space utilization. The simulation results reveal that the method is comparable to LH*, introduced by Witold Litwin.

1. Introduction

A classic problem is how to store information so that it can be searched and retrieved efficiently. That is, given a set of records - each partitioned into a unique identifier called the key (K_i) and a "data part" containing all other desired information (D_i) - we want to retrieve arbitrary records from the set in the shortest possible time. A unique key is needed in each record to implement operations such as lookup, insert, delete, and update, as well as maintenance actions to handle overflow. The data part of each record contains one or more fields of arbitrary type, such as numbers, text, and dates. Hashing, a technique that mathematically converts a key into a storage address, offers one of the best methods of finding and retrieving information associated with a unique identifying key. Dynamic hashing techniques allow the storage to expand and contract with the number of record insertions and deletions. Three categories of dynamic hashing schemes are the Dynamic Hashing of Larson

[6], the Extendible Hashing of Fagin et al. [3], and the Linear Hashing of Litwin [7].

As the amount of data increases and as distributed computer systems become more available, the need for distributed data structures increases. In the past years, only a few distributed data structures have been introduced. In 1993, W. Litwin et al. [8] introduced LH*, an efficient, extensible, distributed variant of the Linear Hashing (LH) [7] data structure. It generalizes Linear Hashing to parallel or distributed RAM and disk files. LH* also allows parallel operations. LH*, just like LH, is a directoryless algorithm. It locates buckets through one of two algorithms based on the current split level and the bucket number. A client maintains what it thinks is the current split level and highest numbered bucket. Because the table can grow without the client knowing, the actual bucket may be located elsewhere. In 1993, R. Devine [1] introduced DDH (Distributed Dynamic Hashing), an extension of the Dynamic Hashing method. It proved to be a scalable distributed data structure as well. In 1994, B. Kroll et al. [5] introduced a distributed search tree (DRT) with good storage space utilization and high query efficiency.

Taking the centralized versions of data structures and creating their distributed variants is not straightforward; several obstacles occur. For example, in the case of Extendible Hashing [3], the directory element is the centralized component that cannot be distributed, but instead can be replicated. Keeping replicas consistent is an expensive operation. The distributed version of Linear Hashing [7], LH*, also has some drawbacks. Splits must be ordered for the clients, since the splits are required to follow the bucket numbering order within each level. Determining when a bucket can be split is not an autonomous decision that can be made by the affected server. The ordering of splits is imposed on the buckets to support the directoryless character. This restriction is inherent in LH* because of the need for all buckets to be located through one of two bucket location functions. For this reason a centralized split coordinator participates in the bucket split operation. The

split coordinator can create a hot-spot. Also, a bucket that is not allowed to split immediately must handle overflow locally. Further, since all the buckets on a level must split before the next level can start to be split, non-full buckets may be split prematurely. The distributed search tree, DRT, has parts of the tree that are replicated (at least the root or, more generally, the entry nodes into the tree). These replicated parts represent a bottleneck.

All these examples show that classical data structures are not easily amenable to distribution. Trade-offs are the only way centralized data structures can be transformed into distributed data structures. Therefore, much more research should be done in the area of distributed data structures. With all these factors in mind, we attempted to distribute the Extendible Hashing of Fagin et al. [3]. The directory data structure was replaced by a different, more efficient data structure. This new data structure, called Cache Tables, eliminates the multiple directory entries that point to the same bucket.

In the next section, the Extendible Hashing algorithm in its original form is discussed. In Section 3, a new data structure, EH*, is introduced. Section 4 presents the results of a simulation evaluation of EH*. We conclude and summarize the paper in Section 5.

2. Extendible Hashing

Extendible Hashing divides the storage into fixed-length blocks or buckets and allocates one bucket at a time as needed. It has two levels of organization: a first level of a directory denoted by D that consists of a linear array of bucket pointers, and a second level of buckets which holds the records or pairs of keys and record pointers. The directory, D, is headed by a value (D.H) called the global depth of the file. This value varies with the directory space expansion and contraction. Given a record r_i with the key K_i, the target bucket for the record is derived as follows. The pseudokey K'_i = \Psi(K_i) is first generated. K'_i is a string of binary digits b_j \in \{0, 1\} of the form K'_i = b_{w-1} ... b_1 b_0. Then the (D.H)-suffix bits of K'_i, considered as an integer, provide an index q of a directory element D_q. Let G denote the address generation function. Then, the index q is given by G, which is defined as

q = G(D.H, K'_i) = \sum_{j=0}^{D.H - 1} b_j 2^j

The directory element D_q contains a bucket pointer, denoted by D_q.B, which gives the bucket address into which the record r_i must be stored or looked up. This method requires two or more probes (one into the directory and at least one into the bucket) per lookup. A disadvantage of extendible hashing is that the directory size can grow exponentially in the number of record insertions unless the generated pseudokeys are uniformly distributed over the directory space.
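To make the address computation concrete, the following is a minimal sketch in Python. The paper leaves the pseudokey function \Psi unspecified, so SHA-256 is an assumption here, as are the helper names and the toy directory.

```python
import hashlib

def pseudokey(key: str, w: int = 32) -> int:
    # Psi: map the key K_i to a w-bit pseudokey K'_i = b_{w-1} ... b_1 b_0.
    # SHA-256 is an assumed choice; the paper does not fix Psi.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << w) - 1)

def G(global_depth: int, key: str) -> int:
    # G(D.H, K'_i): the D.H low-order (suffix) bits of the pseudokey,
    # read as an integer, index the directory element D_q.
    return pseudokey(key) & ((1 << global_depth) - 1)

# A lookup costs two or more probes: one into the directory D,
# then at least one into the bucket D_q.B that the entry points to.
directory = [["bucket", i] for i in range(1 << 3)]  # toy directory, D.H = 3
bucket = directory[G(3, "some key")]
```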

3. EH*

With the growing number of networked workstations that share information in a distributed manner, the arguments for a distributed data structure are compelling. First, distribution provides freedom from single node limitations that can create a bottleneck when multiple clients send requests to the same node. Second, easy scalability as defined in [2] can be achieved. This means that, starting with a system where one server manages a file of a specific size that is accessed by a specific number of clients at a specified rate, a scalable approach can efficiently manage a file that is n times larger and accessed by n times more clients at the same per-client rate, by adding servers and distributing the file across these servers. Furthermore, the response time of the clients' requests should be as good as in the one-server case. The three distributed approaches LH*, DDH, and DRT are scalable since the communication overhead is largely independent of the number of servers and clients in the system. EH*, the new distributed data structure introduced in this paper, is also scalable.

To overcome the problem of the directory not being able to fit in main memory due to its size, we take the following approach. We replace the directory structure with Cache Tables (eliminating the multiple directory entries pointing to the same bucket) that map keys to servers (buckets). This accomplishes two functions. First, the collapsing of the directory into these several tables (called Cache Tables) provides space-efficient storage. Second, the Cache Tables are not centralized as in the case of Extendible Hashing, and independent replicas do not need to be consistent. Cache Tables are kept by the clients and servers. These Cache Tables may become obsolete, but will never give incorrect information.

The following is a general description of the mechanics; the details of Cache Tables, the Client Algorithm, and the Server Algorithm are given in the next subsections. The environment consists of clients and servers. Clients issue INSERT REQUEST to insert keys and SEARCH REQUEST to retrieve keys. A client's request is sent to a server based on the Cache Tables information available at that time in the client. A server receives the client's request and checks its Cache Tables to determine if it is the correct server to process the request. If it is, the operation is performed. If the operation is an insert, either there is room to insert the key or a split will take place. The first scenario is depicted in Figure 1. In the second scenario, i.e., when a split takes place, the split flag in the INSERT ACK message is set so that the client can update its Cache Tables. In case no flags are set, the server does not have to send an INSERT ACK back to the client. If the operation is a search, a RETRIEVE ACK message is sent back to the client.
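The message types and flags named above can be pictured as fields of the messages exchanged between clients and servers. The sketch below is one hypothetical encoding in Python; the paper names the message types and flags but not their layout.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class MsgType(Enum):
    INSERT_REQUEST = auto()
    SEARCH_REQUEST = auto()
    INSERT_ACK = auto()
    RETRIEVE_ACK = auto()
    NEW_SERVER_INIT = auto()     # used for server initialization (Section 3.3)

@dataclass
class Message:
    kind: MsgType
    key: str = ""
    split_flag: bool = False             # set when the insert caused a split
    addressing_error_flag: bool = False  # set when the request was forwarded
    cache_table_updates: dict = field(default_factory=dict)
```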


[Figure: the Client sends an INSERT REQUEST to the correct Server, which has room to insert; the Server sends an INSERT ACK back to the Client.]

Figure 1. Client sends INSERT REQUEST to a correct Server

If this is not the correct server (which means that the client's Cache Tables are obsolete), an addressing error occurs, and the server forwards the client's request to another server according to its own Cache Tables information. The server receiving this forwarded message makes the same assessment of whether it is the server that should process the request. If it is not (meaning that this server's Cache Tables are obsolete as well), it reforwards the message. In a later subsection, we investigate the worst case for the number of forwarded messages. When a server A suffers an addressing error, the receiving server B sends a message back to server A with the addressing error flag set. This allows server A to update its Cache Tables. The final server at the end of this chain processes the request. If the request is an insert, the final server determines whether room is available. If room is not available, a split takes place. The final server sends a response to the client with the status of the operation. The scenario where the final server has room is depicted in Figure 2. In the INSERT ACK response sent back to the client, the addressing error flag is set to allow the client to update its Cache Tables. In the scenario where the final server is full and a split takes place, both the split flag and the addressing error flag are set in the INSERT ACK message.

[Figure: the Client sends an INSERT REQUEST to an incorrect Server; the request is forwarded from server to server until it reaches the final Server, which has room. Each forwarding server receives an INSERT ACK with the addressing error flag set to TRUE, and the final Server sends an INSERT ACK back to the Client.]

Figure 2. Client sends INSERT REQUEST to an incorrect Server

All of these messages add up to the cost of the operation (either insert or search). In this paper we do not deal with key deletions; this is a more complicated problem that can be part of future research. The same assumptions are made in the distributed version of Linear Hashing, LH* [8], and in the distributed search tree, DRT [5].

3.1. Cache Tables

We use the same notation as in the Extendible Hashing section. Thus, given a record r_i with the key K_i, the pseudokey K'_i is generated. Cache Tables have the characteristic that several levels of the file H (global depth of the file) coexist. This permits an obsolete view of the file to exist, the view being updated using lazy updates [4]. The penalty of using an obsolete view is an increase in the number of messages, not in the correctness of the algorithm. Let us call this range of levels MinLevel and MaxLevel. The meaning of these two levels is that the Cache Tables are searched using from the (MinLevel)-suffix bits of K'_i up to the (MaxLevel)-suffix bits of K'_i. The integer corresponding to these suffix bits is an index q in the Cache Tables with the information about the server (or bucket) into which the record r_i must be stored or searched; q is calculated using G, defined in the Extendible Hashing section. If the server computed from the MinLevel-suffix bits does not exist in the Cache Tables, then the number of suffix bits is incremented by one, and the process is repeated. By the time we reach the MaxLevel-suffix bits, we have found the corresponding server.

Initially MinLevel and MaxLevel are zero (i.e., no suffix bits are needed to insert keys into the Initial Server). When this Initial Server becomes full, an additional server is allocated. MinLevel and MaxLevel are then set to 1, meaning that the range of levels is 1-suffix bits. All the keys (belonging to the Initial Server) whose pseudokey 1-suffix bit is 1 are moved to the newly allocated server, let us call it server 1, while the keys whose pseudokey 1-suffix bit is 0 stay in the Initial Server, let us call it server 0.
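The level-by-level suffix search described above can be sketched as follows; this is a sketch assuming Cache Tables are stored as a mapping from (level, suffix) pairs to server identifiers.

```python
def find_server(cache_tables: dict, pk: int, min_level: int, max_level: int):
    # Try the MinLevel-suffix bits of the pseudokey first; on a miss,
    # add one suffix bit and retry, up to the MaxLevel-suffix bits.
    for level in range(min_level, max_level + 1):
        suffix = pk & ((1 << level) - 1)    # 'level' low-order bits of K'_i
        if (level, suffix) in cache_tables:
            return cache_tables[(level, suffix)]
    raise KeyError("an entry must exist by MaxLevel")

# After the first split (MinLevel = MaxLevel = 1), as in Section 3.4:
tables = {(1, 0): "server 0", (1, 1): "server 1"}
assert find_server(tables, 0b01001, 1, 1) == "server 1"  # 1-suffix bit is 1
assert find_server(tables, 0b01100, 1, 1) == "server 0"  # 1-suffix bit is 0
```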

3.2. Client Algorithm


Initially the client’s Cache Tables are empty, with MinLevel and MaxLevel being zero. A loop for all the client’s requests is entered. At the beginning of the loop, the Cache Tables are searched in order to determine the server to whom the request will be sent. Then the request is sent and the client waits for an acknowledgement. The final server in the chain of the possible forwarding servers will respond to the client. If the response does not indicate a split or an addressing error, then the client proceeds with its next request. If either a split or an addressing error occurs, then the client updates its Cache Tables with the information in the ACK response from the final server. Several clients can operate on this distributed file at the same time and all requests and split operations are performed concurrently.
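A compact rendering of this client loop, reusing the pseudokey and find_server helpers sketched earlier: the ACK field names are assumptions, and send_request stands in for whatever transport delivers the request and returns the final server's acknowledgement (or None when no flags are set).

```python
def client_loop(client, requests, send_request):
    # client carries: cache_tables (dict), min_level, max_level.
    for op, key in requests:                     # op: "insert" or "search"
        pk = pseudokey(key)
        server = find_server(client.cache_tables, pk,
                             client.min_level, client.max_level)
        ack = send_request(server, op, key)      # reply from the final server
        if ack and (ack.split_flag or ack.addressing_error_flag):
            # Lazy update: repair the local view from the ACK contents.
            client.cache_tables.update(ack.cache_table_updates)
            client.min_level = ack.min_level     # assumed ACK fields
            client.max_level = ack.max_level
```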

3.3. Server Algorithm

The server starts with empty Cache Tables and MinLevel and MaxLevel zero. It loops forever waiting for requests. Each request can come from either a client or another server (in the case of addressing errors). The server can receive requests to insert a key (INSERT REQUEST), to retrieve a key (RETRIEVE REQUEST), to initialize as a new server (NEW SERVER INIT), or an acknowledgement for a forwarded insert or retrieve message (INSERT ACK or RETRIEVE ACK). In the case of a request to insert or retrieve a key, the server first checks whether it is the correct server by checking its Cache Tables using its MinLevel and MaxLevel. If it is, it proceeds with the request. In the case of an insert request, a split of the server may occur. In the case of an acknowledgement for a forwarded insert or retrieve request, the server may receive information about its Cache Tables being obsolete, either due to a split or additional forwarding. With the information received, the server can update its Cache Tables, MinLevel, and MaxLevel. In the case of a server initialize request, the new server receives its buddy's Cache Tables, MinLevel, and MaxLevel information, plus the actual keys that hash to this new server. For efficiency reasons, all the information needed by the new server can be sent in one message and the keys belonging to this new server can be sent in another message (if possible). Some of the messages received by a server are depicted in Figures 1 and 2.

The operations that take place are executed in parallel. Multiple clients can insert and retrieve keys in parallel with each other. From the viewpoint of message reception, all servers operate atomically. Clients do not see the internal state of servers. Rather, the consistency points are defined by servers as they send and receive messages. All server operations are locally serialized through complete processing of a message before starting another. If an insertion causes a split, the affected servers must mutually ensure consistency of their parts of the distributed file.
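The insert path of this server loop might look like the following sketch; forward, split, and reply are assumed helpers, and the ownership test reuses find_server on the server's own (possibly obsolete) Cache Tables.

```python
def on_insert_request(server, key, reply):
    pk = pseudokey(key)
    owner = find_server(server.cache_tables, pk,
                        server.min_level, server.max_level)
    if owner != server.id:
        # Addressing error: forward; the receiver repeats this same test,
        # and an ACK with the addressing error flag set flows back.
        server.forward(owner, "INSERT_REQUEST", key, reply)
        return
    server.keys.add(pk)
    if len(server.keys) > server.capacity:
        server.split()                    # allocate a buddy (NEW SERVER INIT)
        reply(split_flag=True,            # and move the keys that hash to it
              cache_table_updates=dict(server.cache_tables))
    # With no flags to report, the INSERT ACK may be omitted entirely.
```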

3.4. File growth example

Let us assume, for simplicity, that the capacity of each server is 4 keys. In this example we use two clients that insert keys. We start with one server (the Initial Server) and MinLevel and MaxLevel zero. This means that both clients insert keys into the Initial Server until it gets full, causing a split. In the figures, the top part contains information about the MinLevel, MaxLevel, and Cache Tables, and the bottom part contains the bucket itself with the pseudokeys. The notation used for the Cache Tables is that pseudokeys ending in 0 or 1 (i.e., X) will hash to the Initial Server. When the split occurs, the Cache Tables, MinLevel, and MaxLevel are updated. The first 4 pseudokeys (Client

1 pseudokey 1 (00100), Client 2 pseudokeys 2 (10010) and 3 (01110), and Client 1 pseudokey 4 (01001)) are inserted into the Initial Server, according to the Cache Tables contents and the Client Algorithm. The file configuration before Client 1 inserts pseudokey 5 (01100) is shown in Figure 3.

[Figure: the Initial Server (A) holds pseudokeys 1 (00100), 2 (10010), 3 (01110), and 4 (01001); the Initial Server, Client 1, and Client 2 each have MinLevel = 0, MaxLevel = 0, and the Cache Table entry X -> Initial Server.]

Figure 3. File configuration after 4 keys inserted

When Client 1 sends the request to insert pseudokey 5 into the Initial Server, it causes the Initial Server to split. All the pseudokeys ending in X0 stay in the Initial Server (it becomes Server 0), and all the pseudokeys ending in X1 are moved to the new server (Server 1). MinLevel and MaxLevel are set to 1. Figure 4 shows the file configuration after the split and after Client 1 receives the response back (INSERT ACK with the split flag set).

[Figure: after the split, Server 0 (A) holds pseudokeys 1 (00100), 2 (10010), 3 (01110), and 5 (01100), and Server 1 (B) holds pseudokey 4 (01001). Both servers and Client 1 have MinLevel = 1, MaxLevel = 1 and the Cache Table entries X0 -> Server 0 (A) and X1 -> Server 1 (B), while Client 2 still has MinLevel = 0, MaxLevel = 0 and the obsolete entry X -> Initial Server.]

Figure 4. File configuration after the first split

This process continues and eventually MinLevel will be incremented.
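The split rule that drives this growth (keys whose newly exposed suffix bit is 1 move to the buddy) can be checked against Figure 4 with a few lines; the helper name and level convention are assumptions.

```python
def split_keys(keys: set, level: int):
    # At a split from 'level' to 'level' + 1, bit index 'level' of each
    # pseudokey becomes visible: 1 means the key moves to the buddy server.
    moved = {pk for pk in keys if (pk >> level) & 1}
    return keys - moved, moved

# The five pseudokeys from the example, at the moment the Initial Server
# (level 0) overflows:
keys = {0b00100, 0b10010, 0b01110, 0b01001, 0b01100}
stay, moved = split_keys(keys, 0)
assert moved == {0b01001}        # only key 4 (01001) ends in 1 (Figure 4)
assert stay == {0b00100, 0b10010, 0b01110, 0b01100}
```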

3.5. Chain length worst case analysis

The search of the Cache Tables using the MinLevel to MaxLevel range follows the traversal of a tree from the root (signifying the Initial Server) down towards the leaves, going through each level of the tree by using an additional

suffix bit. The worst possible situation is when all MaxLevel bits need to be traversed in order to satisfy the client's request. Then, the response to the request also triggers adjustment messages to the servers in the chain, in addition to the reply sent to the client. In reality, as seen during the simulation, the average case performance is close to ideal, which is 1 message per insert request and 2 messages per retrieve request.

4. Simulation results

In order to compare the performance of EH* and LH*, the simulation software package CSIM [9] was used. The simulation environment consists of a set of sites connected in a network and communicating with each other by sending and receiving messages. A site can be either a server that stores data or a client that initiates requests. Each site has an associated message queue for all incoming messages. It is assumed that the communication over the network is free of errors. Each server (assume a total of s servers) can store a constant number of keys, called the server's (bucket's) capacity (b). The organization of the keys within each server is irrelevant for the purpose of this simulation; it can be organized locally as a B+-tree, for example. One or two clients issue insert and retrieve requests in parallel.

The original message from the client is sent to the server in the client's current view of the file. If the receiving server is not the one that contains the key, then the message is forwarded to another server. This message forwarding continues until the correct server is reached. Then, an adjustment message is sent back from the final server to the client. This adjustment message updates the client's view of the current state of the file. One file performance measure is the number of messages exchanged between the client and server(s) in order to satisfy a client's request. The other file performance measure is the storage space utilization, defined as the ratio of the total number of keys (n) to the capacity bs, i.e., storage space utilization = n / (bs).

The initial configuration consists of one server. As keys are inserted, the servers split one at a time to make room for the additional keys. This process continues until a specified number of keys has been inserted. Statistics about the average number of messages per inserted key as seen by the client (AvMsgs) are accumulated. Also, information on the number of addressing errors incurred during the key insertions (Errs) is accumulated. Additional information about the average number of messages per insert in the case where each client's insert request is acknowledged by the final server (Msgs-ack) is tracked during the insert phase. Also, during the insert phase, storage utilization is tracked. After the insert phase, a retrieve phase is started. The client's view of the file is cleared, and then the client starts a

series of search requests. The performance measure in this case is the average number of messages per retrieved key (AvMsgs). The same experiments were carried out for the EH* and LH* techniques.

Table 1 presents the performance of the build and search phases of a file when one client first inserts 10,000 random keys and then resets its view and retrieves 1,000 random keys. This was repeated 100 times and the results were averaged. What is varied is the server's bucket capacity (Bkt Cap), from 62 to 8,000 keys, and the number of servers available (No of Bkts), from 255 to 2. The simulation results show that in the case where no acknowledgments were sent, the average number of messages per key insert is very close to the best possible - one message per insertion - for both methods. This performance difference is under 3% for both EH* and LH*. Table 1 also shows better insert performance for EH* than for LH*. In the case of one client, the EH* method incurs 0 addressing errors (because the client's view of the file is kept up to date at all times), compared to 94 down to 1 addressing errors for the LH* method. Even in the case when acknowledgments are used, EH* outperforms LH*. As far as the search performance is concerned, both methods produce results close to the best possible - two messages per retrieval. Statistics about storage utilization are also recorded in Table 1 for both methods. As a general observation, performance is better for larger bucket capacities for both methods.

Another experiment was a performance analysis of the algorithm when two clients concurrently access the file. The case of one client being less active than the other is simulated. In the Insert Ratio column of Table 2, N:1 is to be interpreted as how many keys the first client (Client 0, Active) inserts before the second client (Client 1, Less Active) is allowed to insert one key; N is referred to as the Insert Ratio. The first client always inserts 10,000 keys. Thus, with an Insert Ratio of 100 to 1, the second client only inserts 100 keys. All insert operations required an acknowledgment. The bucket capacity was set to 500 keys. Table 2 shows the average number of messages per insert (AvMsgs), the total number of addressing errors (Errs), and the percentage of addressing errors relative to the number of inserts (%Errs) for both EH* and LH*. The table shows that the performance of the second client degrades as it is made less active. This happens because the inserts by the first client expand the file, causing the second client's view of the file to become outdated. This results in the second, less active client experiencing an increased percentage of addressing errors (%Errs). Comparing EH* and LH*, EH* outperforms LH* in the case of the first client (the active one), but is outperformed in the case of the second client (the less active one).

Bkt Cap | No of Bkts | Errs (EH*/LH*) | Build Av Msgs (EH*/LH*) | Msgs-ack (EH*/LH*) | Search Av Msg (EH*/LH*) | Storage Util. (EH*/LH*)
25      | 1050       | 0 / 146        | 1.0566 / 1.2852         | 2.0566 / 2.2706    | 2.363 / 2.005           | 0.706 / 0.6240
62      | 255        | 0 / 94         | 1.0251 / 1.200          | 2.0251 / 2.111     | 2.167 / 2.007           | 0.640 / 0.6325
125     | 128        | 0 / 64         | 1.0127 / 1.064          | 2.0127 / 2.057     | 2.006 / 2.006           | 0.625 / 0.625
250     | 64         | 0 / 41         | 1.0063 / 1.033          | 2.0063 / 2.029     | 2.038 / 2.006           | 0.625 / 0.625
1000    | 16         | 0 / 14         | 1.0015 / 1.009          | 2.0015 / 2.007     | 2.004 / 2.004           | 0.625 / 0.625
4000    | 4          | 0 / 3          | 1.0003 / 1.002          | 2.0003 / 2.002     | 2.000 / 2.002           | 0.625 / 0.625
8000    | 2          | 0 / 1          | 1.0001 / 1.001          | 2.0001 / 2.001     | 2.000 / 2.001           | 0.625 / 0.625

Table 1. Build and search performance (10K inserts and 1K retrieves)

Insert Ratio | Client 0 (Active)                                  | Client 1 (Less Active)
             | Av Msg (EH*/LH*) | Errs (EH*/LH*) | %Errs (EH*/LH*) | Av Msg (EH*/LH*) | Errs (EH*/LH*) | %Errs (EH*/LH*)
1:1          | 2.0015 / 2.004   | 35 / 39        | 0.35% / 0.39%   | 2.0014 / 2.004   | 28 / 38        | 0.28% / 0.38%
10:1         | 2.0003 / 2.002   | 4 / 20         | 0.04% / 0.20%   | 2.0080 / 2.013   | 27 / 13        | 2.70% / 1.30%
100:1        | 2.0000 / 2.002   | 1 / 20         | 0.01% / 0.20%   | 2.1100 / 2.110   | 25 / 10        | 25.00% / 10.00%
1000:1       | 2.0000 / 2.002   | 0 / 20         | 0.00% / 0.20%   | 2.8000 / 2.500   | 8 / 5          | 80.00% / 50.00%

Table 2. Two clients (bucket capacity = 500)
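As a quick consistency check of the Storage Utilization column in Table 1: with n = 10,000 keys, a bucket capacity of b = 250, and s = 64 buckets, storage space utilization = n / (bs) = 10,000 / 16,000 = 0.625, which matches the corresponding row for both methods.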

5. Conclusion

With the increase in the amount of data that we deal with, the need to access it quickly and in parallel grows. At the heart of distributed algorithms are data structures, and the need to take a centralized version of a data structure and distribute it over a network is urgent. Criteria such as increased availability and performance in accessing the data are used. Scalability, in the sense that growth is graceful - one bucket (server) at a time, without loss in response time - is also sought. EH* is an efficient, scalable, distributed data structure. It provides new methods to be used in applications such as next generation databases, distributed dictionaries, bulletin boards, etc. The EH* analysis shows that it is efficient during insertions and retrievals, with performance close to optimal. It also offers good storage space utilization.

References

[1] R. Devine, Design and Implementation of DDH: A Distributed Dynamic Hashing Algorithm, In 4th Int. Conf. on Foundations of Data Organization and Algorithms, 1993, pp. 101-114.
[2] D.J. DeWitt, J.N. Gray, Parallel Database Systems: The Future of High Performance Database Systems, In Communications of the ACM, Vol. 35, No. 6, 1992, pp. 85-98.
[3] R. Fagin, J. Nievergelt, N. Pippenger, and H.R. Strong, Extendible Hashing - A Fast Access Method for Dynamic Files, In ACM Transactions on Database Systems, 1979, pp. 315-344.
[4] T. Johnson, P. Krishna, Lazy Updates for Distributed Search Structure, In ACM-SIGMOD Intl. Conference on Management of Data, 1993, pp. 337-346.
[5] B. Kroll, P. Widmayer, Distributing a Search Tree Among a Growing Number of Processors, In ACM-SIGMOD Intl. Conference on Management of Data, 1994, pp. 265-276.
[6] P. Larson, Dynamic Hashing, In BIT, Vol. 18(2), 1978, pp. 184-201.
[7] W. Litwin, Linear Hashing: A New Tool for File and Table Addressing, In Proc. of VLDB, 1980, pp. 212-223.


[8] W. Litwin, M-A. Neimat, and D. Schneider, LH*: Linear Hashing for Distributed Files, In ACM-SIGMOD Intl. Conference on Management of Data, May 1993, pp. 327-336.
[9] H. Schwetman, CSIM Reference Manual, Technical Report, MCC, Austin, Texas, 1992.

Acknowledgement

The authors would like to thank W. Litwin, M-A. Neimat, and D. Schneider for their support in simulating the LH*.
