Client-centered Load Distribution: A Mechanism for Constructing Responsive Web Services.

Vittorio Ghini

Fabio Panzieri

Marco Roccetti

Technical Report UBLCS-2000-07 June 2000

Department of Computer Science University of Bologna Mura Anteo Zamboni, 7 40127 Bologna (Italy)

The University of Bologna Department of Computer Science Research Technical Reports are available in gzipped PostScript format via anonymous FTP from the area ftp.cs.unibo.it:/pub/TR/UBLCS or via WWW at URL http://www.cs.unibo.it/. Plain-text abstracts organized by year are available in the directory ABSTRACTS. All local authors can be reached via e-mail at the address [email protected]. Questions and comments should be addressed to [email protected].



Client-centered Load Distribution: A Mechanism for Constructing Responsive Web Services

Vittorio Ghini(1), Fabio Panzieri(1) and Marco Roccetti(1)

Technical Report UBLCS-2000-07
June 2000

Abstract

The success of a Web service is largely dependent on its responsiveness (i.e., its availability and timeliness) in the delivery of the information its users (clients) require. A practical approach to the provision of responsive Web services, recently proposed in the literature [7], is based on introducing redundancy in the service implementation, namely by replicating the service across a number of servers geographically distributed over the Internet; provided that the replica servers be maintained mutually consistent, service responsiveness can be guaranteed by dynamically binding the client to the "most convenient" replica (e.g., the nearest, lightly loaded, available replica, or the available replica with the least congested connection to the client). Based on this approach, we have developed a software mechanism that effectively meets the responsiveness requirement mentioned above. In essence, this mechanism, rather than binding a client to its most convenient replica server, engages all the available replicas in supplying a fragment of the Web document that the client requires. The size of the fragment a replica is requested to supply is dynamically evaluated on the basis of the response time that replica can provide its client with. In addition, the proposed mechanism can dynamically adapt to changes in both the network and the replica servers' status, thus tolerating possible replica or communication failures that may occur at run-time. In this paper, we describe the design, the implementation, and an experimental validation of that mechanism. The performance results we have obtained from our validation exercise illustrate the adequacy of the mechanism we propose.

(1) Università di Bologna, Dipartimento di Scienze dell'Informazione, Mura Anteo Zamboni 7, 40127 Bologna, Italy. E-mail: {ghini, panzieri, roccetti}@cs.unibo.it


1. Introduction

Responsiveness, i.e. high availability and timeliness, is a crucial issue in the design of a Web service, as, from the service user's perspective, a poorly responsive service can be virtually equivalent to an unavailable service.
A host of techniques, e.g. [1,2,3,6,8,9,10,11,12,14], has been proposed in the literature that address specifically the issue of providing highly available Web services. In general, these techniques are based on i) constructing a Web service out of replicated servers, locally distributed in a cluster of workstations, and ii) distributing the client request load among those servers. Thus, service availability is achieved through the redundancy inherent in the service implementation; in addition, the overall service throughput (i.e., the number of client requests per second that can be served) is optimized through careful distribution of the client request load among the clustered servers.
As discussed at length in [7], these techniques can only partially meet the high availability requirement mentioned above (e.g., they may be vulnerable to failures of the router/gateway that interfaces the service cluster to the rest of the network). In addition, these techniques cannot be deployed in order to meet the timeliness requirement that characterizes a responsive Web service; specifically, dealing with such issues as the client latency time over the network, which may notably affect the timeliness of a Web service, is beyond the control of these techniques.
In order to overcome these limitations, an alternative approach has been proposed in [7], and further developed in [4,5]. According to this approach, a responsive Web service can be provided by replicating servers across the Internet (rather than in a cluster of workstations). In the Internet context, a successful deployment of this approach will depend on the ability of achieving the following two principal goals: i) dynamically binding the client to the "most convenient" replica server, and ii) maintaining data consistency among the replica servers.
Unfortunately, neither of these two goals is easy to achieve. The dynamic binding of clients to replica servers can turn out to be difficult to implement, owing to the location-based naming scheme used in the Web. This scheme provides a one-to-one mapping (i.e., the Uniform Resource Locator - URL) between the name of a resource and a single physical copy of that resource; hence, dynamic binding of a client to distinct replica servers requires that the client-side software be adequately extended in order to be able to select an available replica at run-time. Maintaining replica consistency on a large geographical scale can be hard to achieve without affecting the overall service performance. In addition, the Internet environment is subject to (real or virtual) partitions that can prevent communications between functioning nodes; hence, within this environment, both clients and replica servers may hold mutually inconsistent views of which replica server is available and which is unavailable.
Owing to these observations, we have developed a mechanism for constructing responsive Web services that is implemented as an extension of the client-side software. This mechanism operates transparently to the client browser program, and its deployment requires no modifications to that program.
Specifically, our mechanism addresses the issue of providing the clients of a replicated Web service, constructed out of replica servers distributed over the Internet, with timely responses (issues of replica consistency fall outside the scope of this mechanism). The principal goal of our mechanism is to minimize what we term the User Response Time (URT), i.e. the time elapsed between the generation of a browser request for the retrieval of a Web page and the rendering of that page at the browser site.
In summary, rather than binding a client to its "most convenient" replica server, as proposed in [4,5,7], our mechanism intercepts each client browser request for a Web page, and fragments that request into a number of sub-requests for separate parts of that document. Each sub-request is issued to a different available replica server, concurrently. The replies received from the replica servers are reassembled at the client end to reconstruct the requested page, and then delivered to the client browser. Our mechanism is designed so as to adapt dynamically to state changes in both the network (e.g. route congestion, link failures) and the replica servers (e.g. replica overload, unavailability). To this end, our mechanism monitors periodically the available replica servers and selects, at run-time, those replicas to which the sub-requests can be sent, i.e. those replicas that can provide the requested page fragments within a time interval that allows our mechanism to minimize the URT. As our mechanism implements effectively load distribution of client requests among the available replicas, we have named it Client-Centered Load Distribution (C2LD).
The design of C2LD is based on an analytical model that we have developed. This model is essential in order to determine the size of the page fragment that can be requested to each replica, and the extent of the replica monitoring period C2LD is to use in order to execute a client request. We have validated this model by implementing it over the Internet, using four replica servers located in Italy, in the UK, and in the USA.
In this paper we describe the C2LD design, implementation and validation we have carried out. Specifically, in the following Section the C2LD analytical model is introduced. Section 3 describes the C2LD implementation we have developed. Section 4 discusses the experimental results we have obtained from that implementation; finally, Section 5 provides some concluding remarks.

2. Analytical Model

The timeliness requirement that our C2LD mechanism is to meet, in order to be effectively responsive, can be expressed by means of a User Specified Deadline (USD), i.e. a value that indicates the extent of time a user is willing to wait for a requested Web page to be rendered at his/her workstation.
Before proceeding in our discussion, it is worth observing that, firstly, as the USD value is user specified, it may differ from the actual response time a browser request may experience over the Internet (i.e. the URT previously introduced), at least in principle; thus, if a user sets an unrealistic USD, the C2LD mechanism returns an appropriate exception. Secondly, note that, in order to introduce no modifications in the software of commercial browsers that can make use of C2LD, in our implementation the USD is set by the user prior to the invocation of the browser, and then captured by the C2LD mechanism.
Owing to this USD timing constraint, each replica server that receives a sub-request for a page fragment must honour that sub-request within a time interval that allows the C2LD mechanism to reconstruct the requested page, out of all the received fragments, before the USD deadline expires. Thus, it is crucial that the C2LD mechanism assess accurately both the size of the fragment each replica is to supply, and the time intervals within which these fragments are to be received at the client site. To this end, we have developed an analytical model that enables this assessment. For the sake of clarity, this model is introduced below in three successive refinements; each refinement augments the model. The third refinement below is the one we have effectively implemented.

2.1. Model 1

Assume that exactly N_REP replica servers be available, and that each of these servers can provide its clients with a data rate DR_i, i ∈ {1,..,N_REP}. In addition, assume that the size of the requested Web page be known, and be equal to DS bytes. Given these (unrealistic) assumptions, a browser request for a DS-byte Web page can be fragmented into N_REP sub-requests for N_REP different fragments of that page. In order to minimize the time needed to retrieve the requested page, the size PS_i of the document fragment requested to the replica server i must be set equal to:

PS_i = DS \cdot \frac{DR_i}{\sum_{j=1}^{N_{REP}} DR_j} .    (1)

In order to retrieve the entire page within USD, the sum of the data rates DR_i, at which the different replica servers supply data to the client, must be such that:

\frac{DS}{\sum_{j=1}^{N_{REP}} DR_j} \leq USD .    (2)

In other words, Eq. (1) above states that the size of the document fragment requested to a replica server i must be proportional to the data rate DR_i provided by that replica. The denominator in Eq. (1) represents a normalization factor that takes into account the sum of the data rates of all the N_REP replica servers.


Formula (2) states that the set of all the N_REP replica servers must provide an aggregate data rate which is sufficient to download the entire document within USD time units. Finally, this first model indicates that the time interval URT_i required to download a fragment of PS_i bytes from each replica i is equal to:

URT_i = \frac{DS}{\sum_{j=1}^{N_{REP}} DR_j} ,  i \in \{1,..,N_{REP}\} .    (3)
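To make Model 1 concrete, the following minimal Java sketch (our own illustration with made-up page size and data rates, not part of the C2LD implementation) computes the fragment sizes of Eq. (1) and the common download time of Eq. (3):

// Sketch of Model 1 (Eqs. 1-3): split a DS-byte page among N_REP replicas
// in proportion to their (assumed fixed and known) data rates.
public class Model1 {
    public static void main(String[] args) {
        double ds = 500 * 1024;                  // DS: page size in bytes (example value)
        double[] dr = {20000, 80000, 50000};     // DR_i: data rate of replica i in bytes/s (assumed)

        double sumDr = 0;                        // denominator of Eq. (1)
        for (double rate : dr) sumDr += rate;

        double urt = ds / sumDr;                 // Eq. (3): every fragment completes at the same time
        for (int i = 0; i < dr.length; i++) {
            double ps = ds * dr[i] / sumDr;      // Eq. (1): fragment size PS_i for replica i
            System.out.printf("replica %d: PS = %.0f bytes, URT = %.2f s%n", i + 1, ps, urt);
        }
        // Eq. (2): the USD deadline can be met only if ds / sumDr <= USD.
    }
}

With these example rates, the fastest replica (80000 bytes/s) is asked for the largest fragment, while all fragments are expected to complete after the same time DS divided by the sum of the rates.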

To conclude this Subsection, it is worth observing that Model 1 above is unrealistic as, in practice, the data rate provided by each different replica server over the Internet may experience large fluctuations, owing to, for example, network congestion at the routers and server overload. Hence, in order to construct a mechanism that can be effective in the presence of highly fluctuating data rate values, Model 1 is to be refined as follows.

2.2. Model 2

Assume that each data rate DR_i be constant only in a brief time interval equal to S seconds. Moreover, assume that the total amount of time needed to download an entire page is exactly equal to the sum of N_INT subsequent time intervals, each of which has a duration of S seconds (i.e. URT = N_INT \cdot S). Then, the size PS_{i,k} of the document fragment, requested to a given replica i at the beginning of the interval k (k \in \{1,..,N_{INT}\}), can be set equal to:

PS_{i,k} = DR_{i,k} \cdot S ,    (4)

where DR_{i,k} represents the data rate that a given replica server i is able to provide during the time interval k.

Yet again, in order for the required page to be entirely downloaded within the USD deadline, the following requirement must be met by all the replica servers:

\frac{DS}{\sum_{i=1}^{N_{REP}} \frac{\sum_{k=1}^{N_{INT}} DR_{i,k}}{N_{INT}}} \leq USD .    (5)

Eq. (5) states that the data rates provided by all the different replica servers, summed over all the N_INT time intervals, must be such that the requested page is entirely downloaded before the USD deadline expires.

Specifically, it is interesting to observe that the term \frac{\sum_{k=1}^{N_{INT}} DR_{i,k}}{N_{INT}} in Eq. (5) represents the average data rate \overline{DR}_i that a given replica server i has provided during the entire downloading period.
It is worth observing that, in the above Model 2, all the sub-requests to the different replica servers are assumed to be: i) issued exactly at the beginning of each time interval k, and ii) terminated at the end of the same interval k. Owing to these assumptions, this Model is inaccurate as it does not capture the following two relevant issues. Firstly, the data rate DR_{i,k} that a given replica i can provide during the time interval k may not be known a priori. Secondly, this data rate may vary unpredictably in different time intervals. These two issues entail that a sub-request, submitted to a certain replica server at the beginning of a given time interval k, may terminate during some later interval h (i.e., h > k). Thus, a realistic model must be able to represent both the time interval in which each sub-request is transmitted to each replica, and the exact sequence of all the sub-requests transmitted to each replica. Model 3 below extends Model 2 accordingly.


2.3. Model 3

We denote with the index r, ranging in the interval \{1,..,N_{REQ_i}\}, each single sub-request that can be issued to the replica i. Using the replica index i and the sub-request index r, we can define the following mapping k_{i,r}:

k_{i,r}(T) = (T \,\mathrm{div}\, S) + 1 ,    (6)

where T denotes the exact time instant in which a given sub-request r is issued to the replica i, and (T \,\mathrm{div}\, S) + 1 denotes the time interval containing T. In order to assess the data rate that will be provided by a given replica i, we adopt the following measurement-based strategy. Assume that the sub-request r must be issued to the replica i at the time T, in the interval k_{i,r}; then, the data rate for that sub-request can be computed as:

DR_{i,r} = \frac{PS_{i,r-1}}{URT_{i,r-1}} .    (7)

In the above formula, PS_{i,r-1} represents the size of the document fragment downloaded with the previous sub-request r-1, terminated by the time T, and URT_{i,r-1} is the elapsed time experienced for downloading the (r-1)-th document fragment of size PS_{i,r-1}. Note that the URT_{i,r-1} value may be experimentally measured at the time T, when the previous sub-request r-1 has been completed. Given that value, the document fragment to be requested to the replica i is:

PS_{i,r} = DR_{i,r} \cdot S_{i,r} ,    (8)

with

S_{i,r} = k_{i,r} \cdot S - T    (9)

and

(k_{i,r} - 1) \cdot S \leq T < k_{i,r} \cdot S .    (10)

In Eq. (8) above, the term S_{i,r} represents the URT expected from the execution of the sub-request r directed to the replica i; in (9), for the sake of simplicity, the term k_{i,r} has been used in place of k_{i,r}(T). Depending on the value of T, the following two possible events may occur:
1. the sub-request r-1 terminates exactly at the beginning of the interval k_{i,r}, i.e. T = (k_{i,r} - 1) \cdot S;
2. the sub-request r-1 terminates during the k_{i,r} interval, i.e. (k_{i,r} - 1) \cdot S < T < k_{i,r} \cdot S.

If event 1 occurs, the size of the requested page fragment is proportional to the total duration S of a complete interval. Instead, if event 2 occurs, the size of the requested fragment is proportional to the residual time k_{i,r} \cdot S - T needed to reach the end of the k_{i,r} interval. Finally, the following requirement must be met by all the replica servers, in order to ensure that a requested page be entirely downloaded within the USD deadline:

\frac{DS}{\sum_{i=1}^{N_{REP}} \frac{\sum_{r=1}^{N_{REQ_i}} DR_{i,r} \cdot URT_{i,r}}{N_{INT} \cdot S}} \leq USD .    (11)

Eq. (11) above states that the sum of the average data rates provided by all the replica servers during the downloading period N_{INT} \cdot S must be such that the requested page is completely downloaded before the USD deadline expires.

The term \frac{\sum_{r=1}^{N_{REQ_i}} DR_{i,r} \cdot URT_{i,r}}{N_{INT} \cdot S} in Eq. (11) represents the average data rate \overline{DR}_i that a given replica i has been able to provide during the downloading period.
To conclude this Subsection, we wish to point out that our Model 3 provides a measurement-based strategy for assessing the actual data rate each replica can provide, as the size of the fragment to be requested in a sub-request r depends upon the measurement of the URT_{i,r-1} value experienced at the time the (previous) sub-request r-1 terminates.
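To make the per-request bookkeeping of Eqs. (6)-(9) concrete, here is a minimal Java sketch (our own illustration with made-up numbers, not the C2LD source) that computes the size of the next fragment to request from a replica, given the measured outcome of the previous sub-request:

// Sketch of the Model 3 per-replica sizing step (Eqs. 6-9).
public class Model3Step {
    /** Size (bytes) of the next fragment to request from replica i at time t (seconds). */
    static double nextFragmentSize(double prevFragmentBytes,   // PS_{i,r-1}
                                   double prevDownloadSecs,    // URT_{i,r-1}, measured
                                   double t,                   // T: instant the new sub-request is issued
                                   double s) {                 // S: monitoring period
        double dr = prevFragmentBytes / prevDownloadSecs;      // Eq. (7): estimated data rate DR_{i,r}
        int k = (int) (t / s) + 1;                             // Eq. (6): k_{i,r}(T) = (T div S) + 1
        double residual = k * s - t;                           // Eq. (9): S_{i,r}, time left in interval k
        return dr * residual;                                  // Eq. (8): PS_{i,r} = DR_{i,r} * S_{i,r}
    }

    public static void main(String[] args) {
        // Previous fragment: 40000 bytes in 0.35 s; new sub-request issued at T = 0.8 s, with S = 0.5 s.
        System.out.printf("next fragment: %.0f bytes%n", nextFragmentSize(40000, 0.35, 0.8, 0.5));
    }
}

With these example values the estimated rate is roughly 114000 bytes/s, the residual time to the end of interval k = 2 is 0.2 s, and the next fragment is therefore about 22857 bytes.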

3. Implementation

We have implemented the analytical Model 3 (and named it the C2LD mechanism) on top of the HTTP 1.1 interface, as suggested in [5], using the Java programming language. Our implementation supports the standard HTTP 1.1 interface, so as to operate transparently to the higher software layers (e.g. the browser software). For the purposes of this implementation, we have provided the users of our mechanism with a C2LD configuration procedure that allows them to set the USD timeout, introduced earlier, before they request access to a Web service (a default USD value is used by our C2LD implementation if a user does not make use of that configuration procedure).
Typically, in order to access a Web service, a user starts a browser by providing it with the URL of that service. That browser invokes an HTTP GET method with that URL. The C2LD mechanism intercepts that HTTP GET invocation, starts the USD timeout, and, using the URL in the GET invocation, interrogates the DNS. The DNS maintains the IP addresses of the N_REP replica servers that implement a Web service. When C2LD submits a request to the DNS for resolving a URL, the DNS returns the IP addresses of all the replica servers associated with that URL.
As the replica server addresses are available to the C2LD mechanism, this mechanism interacts with each replica as illustrated in Figure 1, and summarized below.

/* C2LD */
...
within USD do                            /* set USD timeout                                  */
  ...
  HEAD(...)                              /* send HEAD request to replica i                   */
  URT(i) := ...                          /* assess URT replica i can provide                 */
  PS(i,1) := ...                         /* compute 1st fragment size for replica i          */
  within S do                            /* set timeout of length S                          */
    if not download_done then            /* check if page download completed                 */
      GET(...)                           /* get fragment from replica i                      */
      DR(i,r) := PS(i,r-1) / URT(i,r-1)  /* compute expected data rate, based on URT of previous request */
      PS(i,r) := DR(i,r) * S(i,r)        /* compute size of next fragment to be requested    */
    else
      return
  od
  ...
od

Figure 1: Implementation of the C2LD Service.
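As a side note on the DNS resolution step described above, the following minimal sketch (an assumption about how such a lookup could be coded in Java, not the paper's code; the host name is a placeholder) obtains all the addresses registered under a replicated service's name:

import java.net.InetAddress;
import java.net.UnknownHostException;

// Resolve all IP addresses (i.e., all replica servers) registered under one service name.
public class ReplicaLookup {
    public static void main(String[] args) throws UnknownHostException {
        InetAddress[] replicas = InetAddress.getAllByName("www.example-service.org"); // placeholder name
        for (InetAddress a : replicas) {
            System.out.println("replica server: " + a.getHostAddress());
        }
    }
}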

C2LD invokes an HTTP HEAD method on each replica i. The reply from replica i to the first HTTP HEAD invocation is used by C2LD to: i) get the size of the requested page, ii) estimate the data rate that replica i can provide, and iii) assess the size of the first fragment that can be fetched from that replica.
C2LD maintains a global variable (download_done, in Fig. 1) that indicates whether or not all the fragments of a requested page have been delivered. Until a page has been fully downloaded, C2LD uses equations (7) and (8) in Section 2 to compute the size of the fragment that is to be requested to the replica i. Once the requested fragment size has been calculated, C2LD issues an HTTP GET request to the replica i, in order to retrieve the required fragment of that size. (Specifically, a fragment of Z bytes is requested by invoking an HTTP GET method with the following option set: "Range: bytes=X-Y", where X and Y denote the bytes corresponding to the beginning and the end of the requested fragment of size Z, respectively.)
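As an illustration of such a ranged GET, the sketch below assumes a plain HTTP/1.1 server that honours Range requests; the URL and byte offsets are placeholders, and this is not the C2LD source code:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Fetch bytes X..Y (inclusive) of a document from one replica with an HTTP/1.1 Range request.
public class RangeGet {
    public static void main(String[] args) throws Exception {
        long x = 0, y = 49999;                                       // first and last byte of the fragment
        URL url = new URL("http://replica1.example.org/doc.html");   // placeholder replica URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Range", "bytes=" + x + "-" + y);    // e.g. "Range: bytes=0-49999"

        long start = System.currentTimeMillis();
        byte[] buf = new byte[8192];
        long received = 0;
        try (InputStream in = conn.getInputStream()) {               // 206 Partial Content body
            int n;
            while ((n = in.read(buf)) > 0) received += n;
        }
        double urt = (System.currentTimeMillis() - start) / 1000.0;  // measured URT for this fragment
        System.out.printf("got %d bytes in %.3f s (%.0f bytes/s)%n", received, urt, received / urt);
    }
}

The measured elapsed time plays the role of URT_{i,r} for the fragment, and can be fed back into Eq. (7) to size the next sub-request.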

In order to adjust adaptively to possible fluctuations of the communication delays that may occur over the Internet, the fragment size is computed each time a fragment is to be requested, based on the value of the URT experienced in fetching the previous fragment. Thus, in essence, as the GET request r-1 directed to a given replica i terminates, a new GET request r can be issued to the replica i with the fragment size value PS_{i,r} computed on the basis of the response time URT_{i,r-1}.
Note that some of the replica servers may not respond timely to the HTTP (HEAD and GET) invocations described above (e.g., they may be unavailable owing to network congestion). Thus, C2LD associates a timeout to each HTTP request it issues to each server i. If that timeout expires before C2LD receives a reply from a replica server i, it assumes that the server i is currently unavailable, and places it in a "stand_by" list. Replica servers in that list are periodically probed to assess whether they have become active again. Requests for replicas in the stand_by list are redirected to active replicas.
Finally, we wish to mention that, in order to increase the degree of parallelism in the fetching of document fragments, the C2LD mechanism has been implemented using the thread programming model provided by the Java 2 Software Development Kit. As depicted in Figure 2, C2LD is implemented as a set of threads that share a lightweight data structure. This data structure is used to maintain shared variables, such as the USD, the download_done flag, and the stand_by list, mentioned above. Semaphores, implemented using the synchronized Java statement, protect the access to this data structure.
The following threads implement our mechanism. An interface thread intercepts the browser GET request, and activates the get_document thread. This latter thread implements the mechanism introduced above for fetching a Web page, and returns to the browser the retrieved page (encapsulated into an HTTP message). A timer thread implements the timeout mechanism and maintains the stand_by list, while a monitoring thread manages the monitoring period S. Finally, HTTP_connection threads are used to establish and maintain the connections with the replica servers; these threads are responsible for both assessing the URT provided by those servers, and managing the communications with them. Each HTTP_connection thread is connected to a single replica server. In order to minimize the communication overheads caused by the connection establishment and release operations, each of these threads maintains the connection with its replica server 'open' for the duration of a browser session. Thus, an HTTP_connection thread can transmit consecutive requests to the replica to which it is connected, without incurring the additional costs entailed by opening and closing a connection.
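The sketch below illustrates the kind of shared, synchronized state these threads coordinate on (the download_done flag and the stand_by list); it is a simplified assumption about the data structure, not the actual C2LD source:

import java.util.HashSet;
import java.util.Set;

// Simplified shared state for the C2LD threads; access is serialized with
// synchronized methods (the paper uses Java's synchronized statement).
public class SharedState {
    private boolean downloadDone = false;                       // download_done flag of Fig. 1
    private final Set<String> standBy = new HashSet<String>();  // replicas that missed their timeout

    public synchronized void markDone()                   { downloadDone = true; }
    public synchronized boolean isDone()                  { return downloadDone; }

    public synchronized void suspend(String replica)      { standBy.add(replica); }     // timeout expired
    public synchronized void reactivate(String replica)   { standBy.remove(replica); }  // probe succeeded
    public synchronized boolean isStandBy(String replica) { return standBy.contains(replica); }
}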

4. Measurements

The effectiveness of the C2LD implementation, described in the previous Section, has been validated through a large number of experiments (approximately 4000). These experiments were carried out during a two-month period, namely November and December 1999. These experiments consisted essentially of a client program (i.e. a browser) downloading Web documents of different sizes from up to 4 geographically distributed replica servers. These documents were downloaded by the same client program using both the C2LD mechanism and the standard HTTP GET downloading mechanism, for comparison purposes. In this Section, we describe in detail the scenario within which these experiments have been carried out, introduce the metrics we have used to assess our implementation, and discuss the performance results we have obtained.

4.1. Scenario

The performance of our C2LD implementation has been evaluated using the Internet infrastructure to connect a client workstation (equipped with our C2LD software) with four different workstations acting as the Web replica servers.

Figure 2: Thread structure of C2LD. The interface thread receives the browser GET request and activates the get_document thread; together with the timer and monitoring threads, it drives the http_connection threads, which issue GET Range requests to Web Server Replicas 1..N and collect the corresponding responses.

The client workstation was a SPARCstation 5 running the SunOS 5.5.1 operating system and the Sun Java Virtual Machine 1.2. This workstation was located at the Computer Science Department of the University of Bologna. The four different replica servers were running the Apache Web server over the Linux platform. For the purposes of our evaluation, these four replica servers were located in four distinct geographical areas.
Specifically, a replica server was located close to the client workstation; namely, at the Computer Science Laboratory of the University of Bologna, in Cesena. This Laboratory is four network hops away from our Department in Bologna. The connection between our Department and this Laboratory in Cesena has a limited bandwidth of 2 Mbps, and is characterized by a rather high packet loss rate (between 2% and 9%). A second replica server was located in northern Italy; namely, at the International Center for Theoretical Physics in Trieste. This server was reachable through 9 network hops, via a connection whose bandwidth ranged between 8 and 155 Mbps. A third replica server was located at the Department of Computing Science of the University of Newcastle upon Tyne (UK). This server was reachable through 15 network hops from our client workstation. Finally, a fourth replica server was located at the Computer Science Department of the University of California at San Diego. This site was reachable through a transatlantic connection of 19 network hops.


Figure 3: Routes between the client and the Web replica servers.

Figure 3, obtained with the VisualRoute utility [13], depicts the locations of both the client and the replica servers used in our experiments, and the routes between this client and those servers; this Figure shows that the different routes connecting our client with the four replica servers scarcely overlap (i.e., route overlapping occurs only up to Milan, when communicating with the replica servers in Trieste, Newcastle, and San Diego). Figure 4 lists the hops along those routes.
Within this scenario, a replicated Web service can be configured so as to use one of the following 11 combinations of the four available replica servers. Namely, a service can be replicated across the two servers in Cesena and Newcastle (C+N, in the following) only, or those in Cesena and Trieste (C+T), or in Trieste and Newcastle (T+N), or in Cesena and S. Diego (C+S), or in Trieste and S. Diego (T+S), or in Newcastle and S. Diego (N+S). A more redundant service can be implemented across one of the following four combinations of three servers, instead: Cesena, Newcastle and Trieste (C+N+T); Cesena, Newcastle, and S. Diego (C+N+S); Cesena, Trieste, and S. Diego (C+T+S); Trieste, Newcastle, and S. Diego (T+N+S). Finally, a redundant Web service can be implemented across the four replica servers in Cesena, Newcastle, Trieste, and S. Diego (C+N+T+S). Our experiments have been carried out using all these 11 combinations of replica servers.
It is worth noting that the four machines running the server replicas were moderately loaded during the period of our experiments. In contrast, the routes to the European Web replica servers were heavily loaded during daytime; network traffic over these routes typically decreased during the night.

Bologna to Cesena (traceroute to poseidon.csr.unibo.it):
  1 cisco2.cs.unibo.it  2 130.136.254.254  3 almr59.unibo.it  4 poseidon.csr.unibo.it

Bologna to Trieste (traceroute to www.ictp.trieste.it):
  1 cisco2.cs.unibo.it  2 130.136.254.254  3 almr55.unibo.it  4 rc-unibo.bo.garr.net  5 rt-rc-1.bo.garr.net  6 mi-bo-2.garr.net  7 ts-mi-2.garr.net  8 ictp-rc.ts.garr.net  9 sv3.ictp.trieste.it

Bologna to Newcastle (traceroute to murton.ncl.ac.uk):
  1 cisco2.cs.unibo.it  2 130.136.254.254  3 almr55.unibo.it  4 rc-unibo.bo.garr.net  5 rt-rc-1.bo.garr.net  6 mi-bo-1.garr.net  7 garr.it.ten-155.net  8 it-uk.uk.ten-155.net  9 212.1.192.150  10 193.63.94.241  11 146.97.254.173  12 leeds.norman.site.ja.net  13 man-gw.ncl.ac.uk  14 pipewell.ncl.ac.uk  15 murton.ncl.ac.uk

Bologna to S. Diego (traceroute to habanero.ucsd.edu):
  1 cisco2.cs.unibo.it  2 130.136.254.254  3 almr55.unibo.it  4 rc-unibo.bo.garr.net  5 rt-rc-1.bo.garr.net  6 mi-bo-1.garr.net  7 ny-mi.garr.net  8 Abilene-DANTE.abilene.ucaid.edu  9 clev-nycm.abilene.ucaid.edu  10 ipls-clev.abilene.ucaid.edu  11 kscy-ipls.abilene.ucaid.edu  12 denv-kscy.abilene.ucaid.edu  13 scrm-denv.abilene.ucaid.edu  14 losa-scrm.abilene.ucaid.edu  15 USC--abilene.ATM.calren2.net  16 UCSD--USC.POS.calren2.net  17 sdsc2--UCSD.ATM.calren2.net  18 cse-rs.ucsd.edu  19 habanero.ucsd.edu

Figure 4: Hops along the routes.

The average network traffic conditions on both the European and transatlantic routes, experienced during the evaluation of our C2LD mechanism, are reported in Table 1. Specifically, in this Table, the first row indicates the 90th percentile of the Round Trip Time (RTT) obtained with the ping routine from our client workstation in Bologna to the four replica servers; the second row reports the minimum RTT experienced over the routes to those servers; the third row reports the average packet loss rate we measured over those routes, as obtained with the ping routine (ICMP).

ping from Bologna to:                            Cesena    Trieste   Newcastle   S. Diego
RTT, 90th percentile of arrived packets (msec)   107       95        160         450
RTT, minimum (msec)                              10        38        59          190
Lost packets                                     2%-9%     0%        0%          0%-3%

Table 1: The measured Round Trip Time and Lost Packet Percentage.

Finally, in order to illustrate the bandwidth saturation suffered by the router at the University of Bologna, Figures 5 and 6 depict the traffic statistics at that router, measured in Mbps, on a daily and weekly basis, respectively. In these Figures, the solid area represents the amount of the outgoing traffic generated by the University of Bologna; the black line represents the amount of the incoming traffic, instead. As expected, both the outgoing and the incoming traffic almost saturate the available bandwidth of 8 Mbps during working hours.

Figure 5: Daily Traffic Statistics.

Figure 6: Weekly Traffic Statistics.

4.2. Metrics

The following two metrics have been used to evaluate the effectiveness of our C2LD mechanism: i) the percentage of document retrieval requests successfully satisfied by our mechanism, and ii) the URT (as defined earlier) our mechanism provides. Both these metrics capture the principal user requirements; namely, that a requested document be effectively retrieved, and that it be done timely.
Based on these metrics, the evaluation of our mechanism has been carried out using different values of the following four parameters: i) the number of server replicas that implement a given Web service, ii) the data rate that each different server replica can provide, iii) the download monitoring period adopted by the C2LD mechanism, and, finally, iv) the size of the document to be retrieved.


The performance of our C2LD mechanism has been compared and contrasted with that provided by the standard HTTP document retrieval mechanism, as implemented by the HTTP GET function. To this end, each replica server maintained a set of downloadable files of different sizes, ranging from 3 Kbytes to 1 Mbyte. These servers were requested to retrieve the same Web document, at the same time of the day (i.e. under approximately the same network traffic conditions) using, alternately, both the standard HTTP GET request and our mechanism. The download monitoring period S used by our mechanism ranged from 50 milliseconds to 10 seconds.

4.3. Measures

A number of experiments were carried out for each given file size and download monitoring period S. Each experiment consisted of 15 consecutive download requests. 4 out of 15 of these requests were executed by invoking the HTTP GET function; the other 11 requests were executed by the C2LD mechanism.
The experiments used files whose size was 3, 10, 30, 50, 100, 200, 500 and 1000 Kbytes. For each file size, the experiments with the C2LD mechanism were repeated using a different download monitoring period, ranging from 50 milliseconds to 10 seconds. As mentioned earlier, our experiments were carried out on the 11 possible configurations of a replicated service, introduced in Subsection 4.1; however, the values reported in the Tables and Figures in this Subsection are the average of the results obtained in all our experiments.
As a first result, Table 2 summarizes the page loss percentage obtained with our mechanism, and that obtained with the standard HTTP GET downloading mechanism (performed with S. Diego, Newcastle, Trieste and Cesena, respectively). It can be seen from this Table that the C2LD mechanism provides a highly available service, since it always guarantees the downloading of the requested document. Instead, the standard HTTP mechanism is not always able to provide its client with that document, as shown by the page loss percentage experienced by the S. Diego and Cesena servers, in particular.

              C2LD    S. Diego (HTTP)   Newcastle (HTTP)   Trieste (HTTP)   Cesena (HTTP)
50 Kbytes     0%      0.3%              0%                 0%               1.65%
100 Kbytes    0%      0.25%             0%                 0%               5%
200 Kbytes    0%      0.2%              0%                 0%               4%
500 Kbytes    0%      0.3%              0%                 0%               1%
1 Mbyte       0%      0.45%             0%                 0%               2.5%

Table 2: Fault Percentage.

Figure 7 summarizes the results of the URT assessment we have obtained. The lowest curve in this Figure represents the URT provided by our C2LD mechanism, as it results from averaging its value over all the performed experiments. The two higher curves, instead, represent the URT values provided by the two fastest Web replica servers, when interrogated with the standard HTTP downloading mechanism. Specifically, the URT values provided by these two replica servers were obtained as follows. The four different replica servers were exercised with the standard HTTP mechanism; then, out of these four servers, the URT results obtained by the two fastest ones were selected to plot the graph in Figure 7.
As shown in that Figure, the performance of the C2LD mechanism and the HTTP mechanism are equivalent when the document size is less than 50 Kbytes. Instead, when the document size is larger than 50 Kbytes, the C2LD mechanism outperforms the standard HTTP downloading mechanism. The improvement of the URT that can be obtained by the C2LD mechanism with respect to the fastest replica server is depicted in Figure 8.


Figure 7: C2LD vs. HTTP (average URT, in seconds, as a function of file size in Kbytes, for the C2LD mechanism and for the fastest and second-fastest single replicas).

Figure 8: Average URT improvement provided by the C2LD mechanism over the fastest single replica, in seconds, as a function of file size (from roughly -0.03 s for the smallest files to about 2.24 s for 1000 Kbyte files).

As already mentioned, the C2LD URT values in Figure 7 represent an average URT value, as it results from all the experiments we have carried out. However, it may be interesting to assess the URT improvement that can be obtained by our mechanism as the number of replica servers, concurrently used, varies. To this end, we have carried out experiments in which the number of active replicas was varied from 1 to 4. Figure 9 illustrates the results of our experiments in these four cases. Specifically, in these cases, the URT is measured as a function of the file size. As shown in this Figure, the larger the number of replicas that are used, the better the URT values that can be obtained; in addition, as the file size increases, the advantage of using the C2LD mechanism becomes more notable. In addition, Table 3 shows the average percentage improvement of the URT values that has been experienced in the experiments depicted in Figure 9, as the number of replica servers grows.

Figure 9: URT improvement provided by the C2LD mechanism as the number of replicas grows (URT, in seconds, as a function of file size in Kbytes, for 1, 2, 3 and 4 replicas).

Number of Replica Servers    URT percentage improvement
2                            4%
3                            17.2%
4                            21.5%

Table 3: Average URT percentage improvement.

We have experimented with our C2LD mechanism using different values of the download monitoring period parameter S, in order to assess the influence of that parameter on the URT provided by our mechanism. Specifically, we have measured the C2LD URT using values of the monitoring period S ranging from 50 ms to 10 s. Figure 10 below illustrates the results of our experiments. This Figure reports the different URT curves that were obtained using files of different size. Note that the monitoring period that provides the lowest URT values is 0.5 seconds, regardless of the file size.

Figure 10: URT depending on the monitoring period (URT, in seconds, as a function of the monitoring period in milliseconds, with one curve per file size from 3 Kbytes to 1000 Kbytes).

Finally, for the sake of completeness, we report below two additional graphs. These graphs illustrate the performance results obtained by both requesting each replica to fetch a 500 Kbyte file, using a standard HTTP GET request, and exercising our mechanism with the fetching of the same file; the monitoring period used by our mechanism in these experiments ranged from 500 to 2000 milliseconds.
Specifically, the histograms in Figure 11 represent the URT values obtained by the C2LD mechanism (denoted as 'parallel' in this Figure), and the URT values obtained by each single replica. Each histogram illustrating the performance of our mechanism relates to a different combination of the server replicas used during the experiments, as indicated in the Figure. Thus, for example, the leftmost 'parallel' histogram reports the URT values obtained with our mechanism when only the replica servers in Cesena and Newcastle were used; the rightmost 'parallel' histogram reports the URT values obtained with the C2LD mechanism when all the four replica servers were used. The remaining histograms (other than the 'parallel' histograms) relate to the individual replica servers, as indicated in the Figure.
Figure 12 compares the percentage of URT improvement obtained by the C2LD mechanism with that obtained by each single replica (interrogated using HTTP GET). Both Figures 11 and 12 show that, in general, the C2LD mechanism outperforms each single replica in all cases; the only exception occurs when C2LD uses only two replica servers and, out of these two servers, one is very fast (Trieste or Newcastle) and the other is very slow (i.e. San Diego).

5. Concluding Remarks

In this paper, we have described an approach to the design of responsive Web services, constructed out of geographically distributed replica servers. This approach allows the replica servers to be dynamically allocated to their users (i.e. the browser programs).
We have developed an implementation of this approach, termed C2LD, and validated it using replica servers distributed over the Internet. The results we have obtained from this implementation indicate that our C2LD mechanism can be extremely effective in ensuring service responsiveness (i.e. availability and timeliness). Specifically, our mechanism can outperform the standard Web page downloading mechanism, such as that provided by the HTTP implementation, as it can improve the URT by an average percentage ranging between 4% and 21%, depending on the size of the downloaded page.


Figure 11: C2LD vs. single-replica performance (URT, in seconds, for downloading a 500 Kbyte file with each replica combination; C=Cesena, T=Trieste, N=Newcastle, S=San Diego; the 'Parallel' bars refer to C2LD).

Figure 12: Percentage Improvement of the C2LD mechanism (percentage URT improvement of C2LD over each single replica, for each replica combination; C=Cesena, T=Trieste, N=Newcastle, S=San Diego).

We are planning to extend the work described in this paper by addressing the following topics. Firstly, we intend to evaluate the overhead caused by our mechanism both at the network level and at the servers. Secondly, we intend to evaluate the performance of our mechanism when implemented within a proxy server; in this context, we wish to explore the use of caching and prefetching techniques. Thirdly, we wish to compare and contrast the performance of our mechanism with that provided by other replicated services, such as those based on a locally distributed cluster of workstations. Fourthly, we wish to examine strategies that optimize the routing of requests to replica servers. Finally, we wish to investigate policies for maintaining data consistency among geographically distributed replica servers.


Acknowledgments

We wish to thank our colleagues at the Computing Science Department of the University of Newcastle upon Tyne (UK), at the International Center for Theoretical Physics of Trieste (Italy), and at the Department of Computer Science of the University of California at San Diego (USA) for providing us with the resources required to carry out our experiments.

References

1. R.B. Bunt, D.L. Eager, G.M. Oster, C.L. Williamson, "Achieving Load Balance and Effective Caching in Clustered Web Servers", Proc. 4th International Web Caching Workshop, San Diego, CA, March 1999.
2. "Cisco Local Director", Cisco Systems Inc., White Paper, 1996.
3. M. Colajanni, P.S. Yu, D.M. Dias, "Analysis of Task Assignment Policies in Scalable Distributed Web-Server Systems", IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 6, pp. 585-600, June 1998.
4. M. Conti, E. Gregori, F. Panzieri, "Load Distribution among Replicated Web Servers: A QoS-based Approach", Proc. 2nd ACM Workshop on Internet Server Performance (WISP'99), Atlanta, Georgia, USA, May 1st, 1999.
5. M. Conti, E. Gregori, F. Panzieri, "QoS-based Architectures for Geographically Replicated Web Servers", Cluster Computing (to appear).
6. P. Damani, P.E. Chung, Y. Huang, C. Kintala, Y.-M. Wang, "ONE-IP: Techniques for Hosting a Service on a Cluster of Machines", Computer Networks and ISDN Systems, 29, 1997, pp. 1019-1027.
7. D. Ingham, S.K. Shrivastava, F. Panzieri, "Constructing Dependable Web Services", IEEE Internet Computing, Vol. 4, No. 1, January/February 2000, pp. 25-33.
8. A. Iyengar, J. Challenger, D. Dias, P. Dantzig, "High-Performance Web Site Design Techniques", IEEE Internet Computing, Vol. 4, No. 2, March/April 2000, pp. 17-26.
9. J. Li, H. Kameda, "Load Balancing Problems for Multiclass Jobs in Distributed/Parallel Computer Systems", IEEE Trans. on Computers, Vol. 47, No. 3, March 1998, pp. 322-332.
10. E.D. Katz, M. Butler, R. McGrath, "A Scalable HTTP Server: The NCSA Prototype", Comp. Net. and ISDN Sys., 27 (2), pp. 155-164, Nov. 1994.
11. J. Li, H. Kameda, "Load Balancing Problems for Multiclass Jobs in Distributed/Parallel Computer Systems", IEEE Trans. on Computers, Vol. 47, No. 3, pp. 322-332, March 1998.
12. V. Pai, G. Banga, M. Svendsen, P. Druschel, W. Zwaenepoel, E. Nahum, "Locality-aware Request Distribution in Cluster-based Network Servers", Proc. ASPLOS-VIII, San Jose, CA, October 1998.
13. "VisualRoute - Mapping the Internet", Datametrics Systems Corp., http://www.visualroute.com.
14. J. Watts, S. Taylor, "A Practical Approach to Dynamic Load Balancing", IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 3, March 1998, pp. 235-248.
