Implementing Highly-Available WWW Servers based on ... - CiteSeerX

4 downloads 14898 Views 276KB Size Report
In this paper we investigate issues related to building highly-available World Wide ... namic Domain Name Server (DNS) and a WWW server based on passive ...
Implementing Highly-Available WWW Servers based on Passive Object Replication Roberto Baldoni, Simona Bonamoneta, Carlo Marchetti December 1998

Technical Report 14-98 Dipartimento di Informatica e Sistemistica Universita di Roma \La Sapienza"

Abstract In this paper we investigate issues related to building highly-available World Wide Web (WWW) servers on workstation clusters. We present a novel architecture that includes a dynamic Domain Name Server (DNS) and a WWW server based on passive object replication. This architecture allows to reduce the service down-time of a WWW server without impacting on the Hyper Text Transfer Protocol (HTTP) and, thus, on the WWW client software. We implemented this architecture in our department and we show some experimental results that analyze, given a batch of HTTP requests, the average response time to a request and the average time of WWW service unavailability due to object crashes.

Keywords: WEB computing, Domain Name System (DNS), High-Availability Services, Passive Object Replication, Primary-Backup Approach.

Authors address: Dipartimento di Informatica e Sistemistica, Via Salaria 113, I-00198 Roma, Italy. E-mail: fbaldoni,bonamone,[email protected] URL: http:// www.dis.uniroma1.it/~fbaldoni,bonamone,marchetg. 

1 Introduction World Wide Web (WWW) access has becoming widely popular in the recent years with the consequent exponential increasing of client connections. This e ect gave rise to the study and the implementation of scalable and high-performance WWW servers. These servers are usually built on a multi-workstation platform and their aim is to split the (Hyper Text Transfer Protocol) HTTP requests among the workstations in order to share, in a fair way, the load and keep as small as possible the average response time to a client request ([3, 10, 17]). An important initiative to build an high-performance WWW server comes from NCSA [17]. This architecture is based on a roundrobin policy for Uniform Resource Locator (URL) address resolution. Each URL is associated to a list of IP addresses corresponding to the servers providing that WWW service. This list is handled out by the Domain Name System (DNS). Each time a client asks for an address resolution to the DNS, the latter will return the IP address following a round-robin policy (1 ). Moreover,server IP addresses can be added to the list by the DNS system manager at any time. The redundancy of WWW servers provides also some degree of fault tolerance with respect to simple mirrored services. Distributed WWW servers as the ones considered above do not handle a replicated state. i.e., each workstation handles a set of HTTP requests and no information about this handling is given to the other components of the WWW architecture. Ensuring fair load sharing without handling a replicated state among the workstations providing the service is not sucient to design some key distributed WWW services like the ones, for example, including electronic commerce. The latter presents very fast growth over the Internet thanks to some \ad-hoc" software, like IBM's net.commerce [16], that allows companies to build virtual stores on WWW servers. Consider the situation in which a user lls out an HTML form to buy some products from a company that sells on WWW. The user sends the form data and gets back the transaction acknowledgment from the WWW server. If the latter crashes immediately after having sent the ack, this gives rise to several questions: will the user receive the product? For how many time the service will be unavailable to users and, more, how many purchase orders will the company The round-robin policy implemented by NCSA was possible thanks to some particular changes made on the software package Berckley Internet Name Domain (BIND) version 4.9.2 that was used to implement the DNS [1]. 1

2

loose because of the crash? Hence, from the viewpoint of the businesses, such WWW servers have to:

 Reduce the service down-time as small as possible (i.e., High-Availability);  Increase the fault-tolerance of the service as much as possible (i.e., the maximum number of consecutive failures that stop de nitely the service and require human actions to restart the service);

 Ensure that each workstation providing the service \sees" a consistent value for its own replicated state; Previous points can be achieved either by developing non-fault-tolerant software over redundant and expensive hardware (like Stratus and Tandem Systems) or by exploiting techniques based on redundant software. i.e., software replication [13]. Software replication is a low-cost technique providing fault-tolerance that requires objects and processes hosted on di erent workstations to (i) interact in order to maintain consistency among replicas of objects and (ii) to cooperate in order to give the illusion to clients that the service is provided by a single entity. Replication allows the service to survive despite a workstation crash. There are two main classes of replication techniques [4, 5, 13]: Primary-Backup replication and Active replication. In Primary-Backup replication (also named passive replication), a given object y is called primary and it is responsible for handling client requests and for maintaining the value of other y's replicas on the other sites consistent with its own value. Once the primary fails, other replicas must elect a new primary. Active replication requires all replicas to execute the same actions. The client sends an invocation to all the objects, each replica processes the invocation and sends back a result to the client. The client waits until it receives the rst reply. As all the replicas do execute the invocation, no special protocol is required upon the occurrence of a failure. On the other hand, active replication needs each replica to be deterministic in order to compute the same result. As shown in [20], deterministic behavior is hard to get in a multiuser environment. Considering the above example, if the WWW server would be implemented by a set of processes each handling a replica of an object (e.g., the database of the purchase orders), then we could de ne a process that acts like a primary and that noti es an order of a product to the other processes 3

composing the server before sending back the ack to the client. Moreover, once the current active replica of the object crashes, the non-crashed must cooperate in order to re-establish the service as quick as possible in order to reduce the service down-time. This paper focuses on the implementation of highly-available distributed WWW servers over the Internet based on passive object replication. We consider the server implemented over a set of workstations connected by a local area network (LAN). From a logical point of view, the WWW service is provided by a server realized by cooperating processes handling a set of replicated objects. Each workstation runs an instance of the process and handles a replica of each object. Passive replication seemed us to be the most suitable replication technique for the WWW environment as (i) it does not need a client to multicast its invocation to every replica (this is actually not allowed by the HTTP protocol which employs only one-to-one communication between client and server) and (ii) it allows non-deterministic computations. In order to meet previous third point, our implementation ensures updates of objects satisfying the linearizability consistency criterion [15]. This is due to the fact that linearizability is the simplest consistency criterion to be implemented. Intuitively, linearizability ensures that if two non-crashed replicas y1 and y2 receive two invocations Ia and Ib , and y1 handles Ia before, then y2 cannot handle Ib before Ia . From the point of view of the WWW architecture, the fact that the primary can change over the time imposes some modi cations. More speci cally, the DNS must be aware of the primary address currently providing the service. This implies the DNS table (the database used for address resolution) has to be dynamically updated. Note that DNS dynamical updates are compatible with the last software release of BIND (i.e., BIND 8) [1, 18]. Last but not least, the proposed architecture is not intrusive from the client viewpoint, in the sense that it does not change the HTTP protocol and, thus, the client software does not need any modi cations. This paper is structured in four section. Section 2 brie y describes the current WWW architecture, Section 3 introduces the proposed WWW architecture including the DNS supporting dynamic updates and the distributed WWW server based on passive object replication. Section 4 describes how a prototype of the architecture has been implemented in our department and some 4

experimental results concerning the average service down-time and the average response time to a client request. Section 5 concludes this paper.

2 Background: The World Wide Web The WWW architecture can be roughly split into three interacting entities (2 ). The client, running a WWW browser (e.g. NETSCAPE Navigator, Microsoft Explorer, Lynx etc.), the Domain Name System (DNS) and the WWW server. Note that while clients are single entities, servers and DNS can be realized by sets of entities. In particular, a server can be implemented by one or more processes and a DNS is structured as a distributed hierarchical database that allows the mapping between physical IP addresses and logical names of the services (URL). Hence, DNS is, basically, a database partitioned among its entities and each entity is responsible (i.e., it is authoritative ) for a portion of the DNS database. When someone asks for an address resolution, a cooperation is started between the DNS entities in order to reply to the query. A client wishing to retrieve some information from a WWW server, in the simplest scenario, it rst queries the DNS (resolve message in Figure 2) to have back the server IP address (address message in Figure 2), then contacts the server by establishing a TCP connection (HTTP request) on that address (request message in Figure 2). Finally, the server sends back the required information (HTTP reply) to the client (reply message in Figure 2). The previous protocol cannot work properly if a service has not been registered in its authoritative DNS entity (i.e., the entry corresponding to the pair hservice, physical addressi is not in the DNS global database). Thus, an initial o -line registration of the service is needed and is depicted in Figure 2 by the dashed line. This communication is realized by an email exchange or by a telephone call between the DNS system manager and the WWW service system manager. Then, the registration is \manually" executed by the DNS system manager in its portion of the DNS database. Note that a manual registration has to be done, for example, each time the IP physical address of a given WWW service changes. This is clearly a pitfall for implementing highly-available WWW servers. This is not an exhaustive description of the WWW architecture: many details (for example the caching system) are skipped for simplicity. There are many books describing such an architecture in any detail. 2

5

DNS e

lv

o es

ss

r

e dr

ad

request CLIENT

SERVER reply

Figure 1: A Simple WWW Interaction.

3 The Proposed Architecture The two key points of the proposed architecture are: (i) automatic (on-line) and certi ed updates of the DNS table and (ii) a WWW server implemented by processes that handle a common set of replicated objects following a primary-backup approach.

3.1 Dynamic DNS Updates Recent discussions about the DNS architecture have pointed out the need of dynamically updating the DNS table [18]. Dynamic updates allow some certi ed name-servers (3 ) to automatically modify the DNS database. For the most part, dynamic update functionality is used by programs implementing DHCP (i.e., Dynamic Host Con guration Protocol, [11]). These programs assign automatically and dynamically IP addresses to workstations which are booting. The usage of such programs would create a daily ow of updates that could be impossible to be manually handled. Our idea is to allow WWW servers (i.e., workstations which are not name-servers) belonging to a Certi ed Address List (CAL) to send registration messages to their authoritative DNS (e.g., upon the occurrence of a primary crash). Upon the receipt of a certi ed registration message, nameservers execute an automatic registration of the pair hservice, physical addressi in their database (message register in Figure 2). In such a case a rst registration is still necessary (CAL update message in Figure 2) in order to de ne which processes have to be included in the CAL of the their A name-server is the process managing a portion of the DNS database. In the literature \name-server" and \local DNS" are usually used as synonymous. 3

6

CA

L

DNS

da

te

ve

re

ol

gi

s re

up

st

ss

er

e dr

ad

request CLIENT

SERVER reply

Figure 2: The proposed WWW Interaction

authoritative DNS. The pseudo-code version of the name-server is given in Figure 3. It waits for the arrival of either a resolve (line 4) or a register (line 9) message (4 ). If a resolve message is received, then the RESOLVE NAME function is invoked (line 6). RESOLVE NAME is a function that performs a name resolution as described in [1] (5 ). Upon the completion of the name resolution, the required physical address is returned by RESOLVE NAME function and sent back to the sender (line 7). When a register message is received, it is checked if the sender process is authorized to dynamically update the DNS database (line 10) i.e., its physical address is contained in the local (authoritative) CAL for the speci ed service. In the armative, the UPDATE procedure is invoked. This procedure simply updates the local authoritative table of the DNS database by substituting the previous physical address with the new one. Otherwise, no update is performed at all.

3.2 WWW Server based on Passive Object Replication In this section we rst introduce the computation model which is actually similar to the CristianFetzer \Timed Asynchronous Model" [7]. Then we describe the failure detection service we implemented in order to detect crashes. Finally, the structure of the WWW server based on passive Note that manual update commands can be transformed in local register messages. Intuitively, RESOLVE NAME is a function such that, if the local authoritative DNS contains the required address, then it returns this address, otherwise it sends hierarchically and recursively a resolve message to another DNS and waits for a reply. When the reply is received, the physical address is returned. 4

5

7

procedure NAME SERVER() begin do forever select when receive (resolve,service name; sender add) from any ; begin physical add:=RESOLVE NAME(service name); send (address,physical add) to sender add; end when receive (register,hservice name; primary addi); sender add) from any if sender add 2 Certi ed Addresses List then UPDATE TABLES(hservice name; primary addi); endselect od end .

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13)

Figure 3: The name-sever pseudo-code.

replication is proposed.

3.2.1 The Model The system consists of a set of n > 1 server processes, a centralized DNS (6 ), a non-empty set of client processes and a non-empty set of objects. Each server process pi is augmented with a failure detector process fdi [8], has access to a local hardware clock perfectly synchronized with respect to real time (7 ) and handles a replica of each object (note that a server process, its failure detector and its replicas run on the same workstation). A server is able to o er to the clients a set of services. Each service is modeled as a remote method invocation of an object. In the following, we consider servers o ering only one service modeled by an object with only one method whose invocation modi es the state of the object every time it is invoked. The extension to multiple objects implementing multiple methods is trivial. Each pair of entities (both processes and DNS) in the system exchange messages over reliable For simplicity and without lost of generality, we will no longer consider the distributed nature of the DNS database in the following. 7 Bounded drifts with respect to real time are allowed in practical models. e.g., the Cristian-Fetzer \Timed Asynchronous Model" [7]. 6

8

and FIFO communication links. For each link connecting two server processes, it does exist a known time-constant  such that if processes pi and pj are not crashed, then a message sent from pi to pj at time t will be received by pj within t +  [4, 5, 7]. This is a reasonable assumption if server processes are connected over a local area network (see Section 4). Concerning failures, we are interested in analyzing the impact of crashes in WWW servers. As a consequence, we assume the DNS to be reliable and other entities follow a crash failure behavior [14]. In particular, we assume that if an entity (i.e., server process, failure detector process, object) fails, all the entities residing on the same workstation fail as well i.e., roughly speaking, a failure of an entity causes its hosting workstation to halt.

3.2.2 Failure Detection To pursue the aim of implementing highly-available servers, a server process needs to detect the failure of another one. In the past it has been argued that this detection should be decoupled by its client processes and implemented as a middleware service [8, 13, 19]. A server process can ask its failure detector for the most updated state (alive or crashed) of the other server processes and can take decision according to that knowledge. It is well known that detecting failures is dicult in asynchronous distributed systems, in particular, it is practically quite hard to decide if a process is slow or actually crashed (8 ). However, if a WWW server is implemented on a local area network, there are some guarantees on timed message delivery [7] and, as shown in [9], a failure detector process can be obtained by a simple protocol of independent assessment using multicast of messages. Each failure detector process multicasts an I'm alive message every  time-units to all the other failure detectors. A non-crashed failure detector fdi perceives the crash of pj if the time elapsed since the last receipt of a fdj I'm alive message is greater than  . In such a case the fdi view of the state of fdj switches from alive to crashed. In the model we consider, this implies the workstation hosting fdj has halted. From an operational viewpoint, exploiting a broadcast network (e.g. ETHERNET, TOKEN RING), implies the number of messages sent every  time-units to be n. In the case of a point-to8

From a theoretical point of view, this diculty gave rise to the FLP impossibility result [12].

9

point network, this number increases to n(n ? 1). This message trac can be reduced by observing that, in a crash failure model, is not necessary that each server independently detects the failure of all the others. Hence, attendance lists or a neighbor surveillance protocols [6] can be used to reduce to n the number of messages sent every  time-units.

3.2.3 Primary-Backup As said in the Introduction, in the primary-backup paradigm, one of the replicas (the primary ) plays a special role. It receives all the requests from clients, computes the reply and updates all non-crashed replica backups. Finally, the primary sends the results back to the client. Under the model of Section 3.2.1, this paradigm can tolerate up to n ? 1 server process crashes before halting the service [4]. The pseudo-code of a server process pi is shown in Figure 4. It rst checks if it has been designated to be the primary (line 4). If this is not the case, it enters the while loop and acts as a backup (line 5 { 12), otherwise it goes to line 13. In the while loop a backup waits for one of the following two events:

 If an update message is received from the primary, then it updates the object replica it handles (lines (7{8)),

 if a crash of the primary has been detected by fdi (FD(primary number)=dead - line (9)) then it has to designate a new primary calling the ELECTION function (line 10). The ELECTION function selects an integer j 2 [2; n] (9 ) such that

8i 2 [1; j ? 1] ) FD(i) = dead ^ FD(j ) = alive corresponds to the non-crashed server process pj with the lowest identi er (10 ). As we adopt independent assessment as failure detection mechanism, process pj has to wait  time-units before starting to act as a primary in order each process can complete the crash detection and the election procedure. We assume that initially primary number is 1 in each server process. Note that assumptions of our timed model are sucient to guarantee the agreement among the server processes on the identi er of the new primary. 9

10

10

Once the new primary has been elected, the primary goes to execute line 13. The primary as its rst action, noti es its address to the DNS by means of a register message (line 13), and then it starts the primary code which corresponds to the execution of an in nite loop (lines 14{26). Upon the receipt of a request message, it computes the object state update, the object reply, sends the update messages to the backups, updates its replica state and nally sends back the reply to the client. The interaction protocol proposed in Figure 4 is non-blocking, in the sense that, the primary does not wait for any update acknowledgment from the backups. As remarked in [5], this low-cost interaction can be meet only under assumptions of crash failure and FIFO and reliable channels.

init name server:= authoritative DNS address; procedure SERVER (primary number) begin % Backup code while (i =6 primary number) do begin select when receive (update; update data) from primary number COMMIT STATUS(status; update data); when FD(primary number)=dead primary number:=ELECTION(); endselect end send (register; hservice name; ii; i) to name server; do forever % Primary code begin receive (request; request data) from client i

update data :=COMPUTE STATUS UPDATE(status; request data); reply data :=COMPUTE REPLY(status; request data); if update data 6= null then

od end .

begin for j 2 f1; : : :; ng ? fig do if FD(j )= alive then send (update; update data) to j ; COMMIT STATUS(status; update data); end send (reply; reply data) to client; end Figure 4: The server process pi pseudo-code. 11

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) (22) (23) (24) (25) (26) (27)

Note that linearizability is ensured only if (i) the update (line 21) is executed atomically (i.e., all or none of the replicas will be updated) and (ii) the request message has a suciently rich semantic to allow a new primary to discover that a client request has already caused a replica update. The latter scenario can occur if a primary crashes just before the execution of line 24. In such a case, after the expiration of a time-out, the client could resubmit the request to the new primary which, in turn, has to be able to recognize that the corresponding update was already done. Upon the occurrence of the latter scenario the COMPUTE STATUS UPDATE function (line (17)) has to return a special null value such that the new primary immediately sends back the reply to the client (see test on line 19). Let us nally note that the only procedure doing side-e ects on the persistent part of the object is COMMIT STATUS.

4 Implementation and Experimental Results We realized a prototype of the WWW server over the 10Mbit Ethernet of our department connecting seven workstation SUN SPARCstation 10. The service provided by our server is a simple counter of client connections. This counter is a replicated object. Each time a client connects to the primary, a method \inc" is invoked. The service ensures that (i) for each pair of client requests a, b, if a arrives before b at the primary, then the counter value returned to a will be lower than the one returned to b. (ii) No counter value is skipped. We implemented such a simple service as our aim is to measure the average service down-time and the average response time of a client request which are actually independent from the computational complexity of processing of a client request. Failure detection middleware and communication primitives have been written in C using threads and BSD sockets, while we used C++ for implementing server processes and objects. For update messages sent by the primary to the backups on line 21 of Figure 4, we developed a communication primitive based on UDP multicast and sequence numbers which is able to atomically deliver messages to a set of workstations connected by a local area network.

4.1 Description of the Experiments Experiments were carried out injecting failures into the primary. In particular, we vary the number of primary failures f from 0 to 6 and  (see Section 3.2.2) from 250 to 1000 msec (by steps of 250 12

msec.). We measured the average response time of a client request (ART) and the average service down-time (ASDT). i.e., the time elapsed between the crash of a primary and the noti cation to the name-server of the new primary address by means of the register message. To reduce the impact of the variance of the network delay, DNS and HTTP clients were running on the same LAN of the server, where we experienced a value for  of 70 msec (11 ). Each point in the plots (Figure 5 and 6) has been computed as follows: for each pair hf;  i, we submitted a batch of 1000 HTTP requests to the server and evaluated ART and ASDT. This measurement has been repeated 20 times and, then, the nal values of ART and ASDT has been estimated.

4.2 Experimental Results Average response time. In order to point out the di erences between ART values in di erent scenarios, Figure 5 plots ART(%) as a function of f for di erent values of  , where ART(%) represents the ratio between the ART for a given pair hf;  i and the ART value in the absence of failures (which is actually independent from  ). As expected, the growth of ART(%) is linear with the number of failures. i.e., k failures lead to k elections and, thus, to increase the average response time of a factor proportional to k . Moreover, augmenting  , ART(%) increases because more time is needed to detect a primary failure and to elect a new primary.

Average service down-time. The last observation becomes more clear looking at Figure 6 showing ASDT(%) plotted as a function of  , where ASDT(%) represents the ratio between the ASDT value for a given  and the ASDT value for  equal to 0.25 sec. To have a quantitative idea of ASDT we experienced a value of 0.572 sec. for  = 0:25 sec. Previous plots can lead system designers to quantify and to reduce as much as possible the service down-time by setting a suitable value for  , having measured  provided by their local area networks. This is the maximum value we measured over several days (including high-trac hours) on the LAN of our department while experiments were running. 11

13

150 1.00 sec 0.75 sec 0.50 sec 0.25 sec

125

ART (%)

100

75

50

25

0

0

1

2

3

4

5

6

number of failures (f)

Figure 5: Average Response Time ART(%) as a function of the number of failures by varying  from 0.25 sec. to 1 sec. .

300

250

ASDT (%)

200

150

100

50

0 0.25

0.50

τ

0.75

1 [sec]

Figure 6: Average Service Down-Time ASDT(%) as a function of  .

14

5 Conclusion and Future Work A great challenge for companies is how to improve the reliability and the availability of their WWW servers only using their internal hardware, software and network resources. In this paper we have described an implementation of an highly-available WWW server based on passive object replication on workstation cluster. This implementation pointed out the necessity of dynamic updates of the DNS databases in order to achieve high availability. Hence, we presented a novel WWW architecture which allows a certi ed list of processes to dynamically update the DNS. This architecture reduces the service down-time of a WWW server compared to servers that do not adopt software replication and does not impact on the software of the WWW client being also compatible with the last version of name-server software. We are currently extending the work along two distinct directions: building a WWW server in an omission failure environment and using CORBA to manage replication of objects. In the rst direction, we weak the model of Section 3.2.1 allowing receive omissions of messages. This new model captures the behavior of heavy-loaded LAN where messages can be dropped by operating system kernels if their bu ers over ow. This brings additional complexity in all the WWW server entities. As far as the second direction is concerned, we are currently designing a new version of the WWW server based on CORBA. This version will allow object replicas and their server processes to reside everywhere in the workstation cluster and to fail independently. Finally, this paper can also be seen as an attempt to bring results and knowledge from distributed computing to the WWW world.

References [1] Albitz P.,Liu C., \Dns and Bind" 3rd Edition, O'Reilly,1998. [2] Almeida J., Almeida V., Yates D.J., \Measuring the Behaviour of a World-Wide Web Server", Computer Science Dep. of Boston University, Technical Report CS 96-025, October 29, 1996. [3] Andresen D., Ibarra O.H., Holmedahal V., Yang T., \Towards a Scalable Distributed WWW Server on Multicomputers", In Proc. of International Conference on Parallel Processing, Honolulu, pp. 850-856, 1996. 15

[4] Budhiraja N., Marzullo K., \Tradeo s in Implementing Primary-Backup Protocols", In Proc. of International Symposium on Reliable Distributed Systems, 1994. Also Cornell University, Tech. Rep. 92-1307, 1992. [5] Budhiraja N.,Marzullo K.,Schneider F.B.,Toueg S., \The Primary-Backup Approach", in Sape Mullender, Editor, Distributed Systems, ACM Press, 1993. [6] Cristian F., \Reaching Agreement on Processor-group Membership in Synchronous Distributed Systems", Distributed Computing, 4, pp. 175-187,1991. [7] Cristian F., Fetzer C, \The Timed Asynchronous System Model", In Proc. of the 28th Annual International Symposium on Fault-Tolerant Computing, Munich, Germany, 1998. [8] Chandra T.D., Hadzilacos V., and Toueg S., \The Weakest Failure Detector for Solving Consensus", Journal of the ACM, 43, 4, pp. 685{772, 1996. [9] Cristian F., Frank Schmuck \Agreeing on Processor Group Membership in Timed Asynchronous Distributed Systems", University of California San Diego, Technical Report CSE95428,1995. [10] Dias D.M., Kish W., Mukherjee R., Tewari R., \A Scalable and Highly Available Web Server", In Proc. 41st IEEE International Computer Society Conference (COMPCOM), Feb. pp. 85-92, 1996. [11] Droms, R., \Dynamic Host Con guration Protocol", Tech. Rep. RFC 1531, Bucknell University, October 1993. [12] Fisher M. J., Linch N. A., Paterson M.S., \Impossibility of Distributed Consensus in the Presence of one Faulty Process", Journal of the ACM, 32, 2, pp. 374{382, 1985. [13] Guerraoui R., Shiper A., \Fault-Tolerance by Replication in Distributed Systems", in Proc. Reliable Software Technologies { Ada-Europe'96, Springer Verlag,LNCS 1088, 1996. [14] Hadzilacos V., Toueg S., \The Fault-Tolerant Broadcasts and Related Problems", in Sape Mullender, Editor, Distributed Systems, ACM Press, 1993. 16

[15] Herlihy M., Wing J., \Linearizability: a Correctness Condition for Concurrent Objects", ACM Trans. on Prog. Lang. and Syst., 12,3:pp. 463-492, 1990. [16] IBM Corporation. IBM Net.Commerce. http://www.software.ibm.com/commerce/net.commerce. [17] Katz E.D., Butler M., McGrath R., \A Scalable HTTP Server:The NCSA Prototype", Computers Networks and ISDN Systems, 27, pp. 155{164, 1994. [18] Thomson S., Rekhter Y., Bound J., \Dynamic Updates in the Domain Name System", Tech. Rep. RFC 2136, Internet Engineering Task Force, April 1997. [19] van Renesse R., Minsky Y., Hayden M., \A Gossip-Based Failure Detection Service", In Proc. of Middleware '98, England 1998. [20] Cohen E., Wang Y. M. , Suri G., \ When piecewise determinism is almost true", in Proc. Paci c Rim International Symposium on Fault-Tolerant Systems, pp. 66-71, Dec. 1995. [21] Zou H., Jahanian F., \Real Time Primary-Backup (RTPB) Replication with Temporal Consistency Guarantees", In Proc. IEEE International Conference on Distributed Computing Systems, pp 48{56, 1998.

17

Suggest Documents