Dissemination of Mutable Sets of Web Objects - CiteSeerX

DISSEMINATION OF MUTABLE SETS OF WEB OBJECTS

Sven Buchholz*
International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704, USA

Steffen Göbel, Alexander Schill, Thomas Ziegert
Department of Computer Science, Dresden University of Technology, D-01062 Dresden, Germany

ABSTRACT

Recently there has been increasing interest in using broadcast networks as high-bandwidth downstream channels of hybrid asymmetric communication platforms. This interest stems from a growing number of mobile internet users who demand high-quality services while wireless networks still lack substantial bandwidth. In this paper we introduce a proxy architecture that enables transparent web access via hybrid asymmetric communication platforms. It uses so-called data carousels to disseminate web contents via high-bandwidth data broadcast networks. A data carousel is a set of data items that is repeatedly broadcast. This concept is also referred to as ‘Caches in the Air’ or ‘Disks in the Air’. Furthermore, we propose different techniques to incorporate updates into the carousel cycle and evaluate them by means of a simulation study.

KEY WORDS

mobile computing, data dissemination, asymmetric communication platforms

1. INTRODUCTION

The bandwidth limitations of cellular networks, in conjunction with the availability of high-bandwidth wireless data broadcast networks (such as the European Digital Audio Broadcast, DAB; [1]), motivate the application of hybrid asymmetric communication platforms for mobile users. A hybrid asymmetric communication platform uses a unidirectional data broadcast network, called the dissemination network, as a high-bandwidth downstream channel. The bidirectional cellular network, which we call the interaction network, acts as the upstream channel for submitting requests for data items (fig. 1). The downlink of the interaction network may nevertheless serve as an additional downstream channel. It can be used for signaling or for the delivery of data items that are not appropriate to be broadcast via the dissemination network.

Figure 1. Hybrid asymmetric communication platform (the service provider delivers replies as a data broadcast via the dissemination network and as individual replies via the interaction network, a cellular radio network; the client submits requests via the interaction network)

An important application of such asymmetric platforms, and the one we focus on, is access to web contents. Although general wireless web services via hybrid asymmetric platforms are feasible, our work concentrates on the application domain of traffic telematics. Vertical web services, such as traffic telematics services, feature a restricted set of required information. In the case of traffic telematics this might include traffic reports, parking information, weather forecasts, hotels and restaurants, as well as tourist information. As this information is furthermore required to be regionally focused, there is only a limited amount of highly relevant data with high access probability, whereas other information has a low access probability only.

In this paper we introduce a proxy architecture that allows web access via hybrid asymmetric communication platforms (section 2). The architecture relies on indexed data carousels to disseminate objects within the broadcast channel. Furthermore, we present several update techniques to incorporate changes into the cycle of the data carousel (section 3). These techniques are evaluated by a simulation study (section 4). Related work is discussed in section 5. Finally, we summarize our work and discuss future directions (section 6).

* The work of Sven Buchholz was partially funded by the German Academic Exchange Service (DAAD).

Figure 2. Proxy architecture (server side: WWW servers and the server-side proxy with carousel manager and carousel sender; client side: web browser and the client-side proxy with carousel receiver; HTTP requests and broadcast-unworthy responses travel via the interaction network, broadcast-worthy responses via the dissemination network)

2. PROXY ARCHITECTURE

In order to allow transparent web access via a hybrid communication platform, we have developed a proxy architecture (fig. 2). A client-side proxy monitors the contents of the broadcast, which is organized as an indexed data carousel (cf. section 2.1). It constructs replies to requests for objects within the current carousel cycle. Requests that cannot be satisfied by the contents of the current cycle are forwarded to the server-side proxy via the interaction network. The server-side proxy requests the required objects from the web servers on behalf of the client and inserts the HTTP responses into the carousel cycle if they are broadcast-worthy. Broadcast-unworthy responses are passed to the client-side proxy via the interaction network. The decision whether a response is broadcast-worthy or broadcast-unworthy is made by the carousel manager in accordance with its carousel management strategy. This is similar to the cache admittance decision in proxy caches. The current version of our prototype applies the auxiliary cache based admittance policy described in [2].

By disseminating the carousel contents to all clients, frequently required objects are instantly available at the client side, without submitting an extra request to the server-side proxy. Hence, response delays are reduced significantly. As the amount of highly relevant information is limited in our application domain, there is a fair probability that requests can be satisfied by the contents of the data carousel. Moreover, the carousel manager allows the automatic adaptation of the set of broadcast objects to changing user requirements.

Since web contents are disseminated, the Latest Value consistency model according to the taxonomy presented in [3] applies. However, we do not need any invalidation messages because HTTP includes an expiration mechanism that allows clients to check the validity of objects. That is why consistency considerations are not taken into account in the remainder of this paper.

2.1 INDEXED DATA CAROUSELS

The broadcast objects are organized in data carousels. A set of data items makes up a carousel cycle that is repeatedly broadcast. In this paper we concentrate on flat data carousels, i.e. every item is broadcast once per cycle. Since in our scenario data items are web objects, they differ in size. In the following we consider indexed data carousels. Objects in an indexed data carousel need not be self-identifying but are identified by an entry in a directory that is broadcast once at the beginning of each cycle (cf. fig. 3). Since data carousels are capacity restricted, a maximum inter-directory time can be ensured. The benefit of the carousel directory is that a client can determine whether a required object will be broadcast during the current carousel cycle immediately after receiving the directory, and not only after the complete cycle has been received. By buffering the directory at the client-side proxy, the determination can be made instantly at any time.

Figure 3. A flat, indexed data carousel (the carousel directory "dir" followed by the objects A to E)

Besides buffering the directory, the client-side proxy may even buffer the complete carousel cycle. Thus, all broadcast objects indexed by the current directory are instantly available. We call this client-side caching option full cycle buffering. However, as mobile devices may not have sufficient memory to buffer the full cycle, we also consider the no buffering option, which assumes that only requested objects are received. All other objects are discarded.
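The lookup behaviour of the client-side proxy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's code: the class name `ClientSideProxy`, its method names and the string results are hypothetical, and the directory is modelled as a plain set of URLs.

```python
from dataclasses import dataclass, field


@dataclass
class ClientSideProxy:
    """Hypothetical client-side proxy that buffers the carousel directory."""
    full_cycle_buffering: bool = False
    directory: set = field(default_factory=set)  # URLs announced for the current cycle
    cache: dict = field(default_factory=dict)    # URL -> body of buffered objects

    def on_directory(self, urls):
        # A new directory marks the start of a new cycle and replaces the old one.
        self.directory = set(urls)
        # Assumption: buffered objects that are no longer announced are dropped.
        self.cache = {u: b for u, b in self.cache.items() if u in self.directory}

    def on_object(self, url, body, requested=False):
        # Full cycle buffering keeps every object; no buffering keeps requested ones only.
        if self.full_cycle_buffering or requested:
            self.cache[url] = body

    def resolve(self, url):
        """Decide how a request is served, using only locally buffered state."""
        if url in self.cache:
            return "cache"      # instantly available
        if url in self.directory:
            return "wait"       # announced for the current cycle
        return "forward"        # ask the server-side proxy via the interaction network
```

With full cycle buffering, an object that has already been received resolves to "cache"; without buffering the same object resolves to "wait" until its next broadcast. An object missing from the directory resolves to "forward" immediately, which is the benefit of the index.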

Figure 4. Carousel update techniques: timelines of the carousel cycle around an update for (a) FCC, (b) CC/ER, where a replaced object has been removed early, (c) ICR, and (d) ICR/R, where the objects are reordered

3. CAROUSEL UPDATE TECHNIQUES

Whenever the server-side proxy decides to replace one or several objects by a new, more broadcast-worthy one, we consider this replacement an update of the object set that is to be broadcast. The carousel update technique describes how to incorporate such updates into the carousel cycle. The decision whether or not to broadcast a requested object in the carousel is outside the scope of this paper. It might be made by a cache admittance policy as in our prototype. Another feasible approach has been discussed in [4]. We assume, however, that no object included in the carousel cycle is replaced before it has been broadcast at least once.

The goal in designing an update technique is to achieve minimum response delays for both newly added objects and objects that have already been broadcast. With full cycle buffering, however, each already broadcast object is instantly available unless the client has not been up for at least one full cycle. Moreover, there should be instant certainty about the unavailability of objects that are not scheduled within the current cycle.

We call the first approach Full Cycle Completion (FCC) (fig. 4 (a)). With FCC, the current cycle is fully completed before the new cycle with the updated object set starts. New objects are added at the head of the next cycle to minimize the waiting time for them. The drawback of FCC is the potentially long latency (depending on the cycle length) between updating the object set and incorporating the update into the carousel cycle.

One approach to reduce this drawback is Cycle Completion with Early Removal (CC/ER) (fig. 4 (b)). CC/ER is similar to FCC, but all objects that have to be replaced are removed from the current cycle, except those that have not been sent before. This reduces the completion time of the current cycle and therefore the latency between the update and its incorporation. The drawback of this approach is that some objects announced by the directory are not broadcast.
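The difference between FCC and CC/ER can be illustrated with a small sketch. It is a simplified model, not the simulator or the prototype: a cycle is a list of object names, the directory is omitted, and the function names, the `sent` counter and the `never_broadcast` parameter are hypothetical. Keeping the surviving objects in their previous order in the next cycle is an assumption as well.

```python
def fcc(cycle, sent, new, replaced):
    """Full Cycle Completion.

    cycle    -- objects of the current cycle in broadcast order
    sent     -- number of objects of this cycle whose transmission has started
    new      -- newly added objects
    replaced -- objects that the update removes from the object set
    Returns (rest of the current cycle, next cycle).
    """
    remainder = list(cycle[sent:])  # the current cycle is completed unchanged
    next_cycle = list(new) + [o for o in cycle if o not in replaced]
    return remainder, next_cycle


def cc_er(cycle, sent, new, replaced, never_broadcast=()):
    """Cycle Completion with Early Removal.

    Pending objects that are to be replaced are skipped, except those that
    have never been broadcast and therefore must still be sent once.
    """
    remainder = [o for o in cycle[sent:]
                 if o not in replaced or o in never_broadcast]
    next_cycle = list(new) + [o for o in cycle if o not in replaced]
    return remainder, next_cycle
```

For a cycle A, B, C, D, E in which A and B have been sent when F arrives and replaces D, FCC still broadcasts C, D, E before the next cycle starts, whereas CC/ER broadcasts only C, E. In both cases the next cycle is F, A, B, C, E.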

A more radical approach to overcome the latency between update and incorporation is the Immediate Cycle Restart (ICR) update technique (fig. 4 (c)). With ICR, the current carousel cycle is immediately aborted and a new cycle is started. Only the object currently being sent is completed before the abort, as we regard the transmission of a single object as an atomic operation. The new cycle begins with the updated carousel directory followed by the newly added objects. They are succeeded by the objects of the previous cycle in their previous order. A serious problem inherent in ICR is that frequent updates mean frequent aborts, so that objects at the tail of the cycle may not be broadcast at all.

To overcome this problem, ICR is extended to Immediate Cycle Restart with Object Reordering (ICR/R) (fig. 4 (d)). Here, the objects in the new cycle are reordered based on their last broadcast time. The cycle starts with the objects that have never been broadcast before, in chronological order of their arrival. They are followed by the objects whose last transmission is the longest time ago. The cycle ends with the objects that were broadcast immediately before the restart.
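The two restart orderings can be sketched as follows. As before, this is an illustrative model rather than the authors' implementation: the function names and the `last_sent` and `arrival` dictionaries are hypothetical, and the directory at the head of each cycle is omitted.

```python
def icr(prev_cycle, new, replaced):
    """Immediate Cycle Restart: new objects first, then the previous cycle in its old order."""
    return list(new) + [o for o in prev_cycle if o not in replaced]


def icr_r(prev_cycle, new, replaced, last_sent, arrival):
    """Immediate Cycle Restart with Object Reordering.

    last_sent -- maps an object to the time of its last broadcast
                 (objects missing from the map have never been broadcast)
    arrival   -- maps an object to the time it was added to the carousel
    """
    objects = [o for o in list(prev_cycle) + list(new) if o not in replaced]
    # Never-broadcast objects come first, in chronological order of arrival.
    never = sorted((o for o in objects if o not in last_sent),
                   key=lambda o: arrival[o])
    # The others follow, longest-waiting first, most recently sent last.
    sent = sorted((o for o in objects if o in last_sent),
                  key=lambda o: last_sent[o])
    return never + sent
```

Suppose the previous cycle is A, B, C, D, the restart happens right after A and B were sent (times 10 and 11), C and D were last sent in the cycle before (times 4 and 5), and a new object E replaces D. ICR yields E, A, B, C, so C waits behind objects that were just broadcast. ICR/R yields E, C, A, B.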

4. SIMULATION STUDY

In order to achieve a more profound understanding of the effects of the different carousel update techniques on the response delays, we have developed a simulation model.

4.1 SIMULATION MODEL

The simulator (fig. 5), which is event-driven, models a data source that substitutes for the carousel manager of the server-side proxy. Every decision to add a new object to the carousel is modeled as an object generation by the data source. The object generation is a Markov process with exponentially distributed inter-generation times. We call their expected value the update think time (utt). The exponential distribution has been chosen as the most general assumption, since our investigation of update techniques ought to be independent of the specific carousel management strategy applied by the carousel manager.

Figure 5. Simulation model (a source generates objects for the carousel sender, which broadcasts the directory "dir" and the objects A to E to a sink)

Whenever an object is generated that cannot be incorporated into the carousel cycle (because its size exceeds the carousel capacity or because there are not enough replaceable objects¹ in the current cycle), it is discarded. Discarded objects are not broadcast with the data carousel and must be passed to the client side via the interaction network. This, however, is not of interest for the simulation study. Furthermore, we do not simulate a client population but average the waiting times for all objects over all points in time, assuming uniformly distributed request probabilities for all objects within the current carousel cycle. This simplification is made in order to abstract from the specific behavior of the client population. We assume the object size to be lognormally distributed according to the following distribution function:

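$$F(x) = \Phi\left(\frac{\ln(x - 148.185) - 7.957}{1.531}\right)$$

where x is the object size in bytes and Φ is the standard normal distribution function. The resulting moments are

$$EX = 9.37\ \text{kbytes}, \qquad \sqrt{D^2X} = 28.3\ \text{kbytes}$$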
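As a plausibility check, the stated moments can be recomputed from the three parameters of F(x). The sketch below is not part of the original simulator; the constant and function names are hypothetical, and 1 kbyte is taken as 1000 bytes. Under these assumptions the closed-form mean evaluates to roughly 9.37 kbytes, and the value of 28.3 kbytes matches the standard deviation of the distribution.

```python
import math
import random

# Parameters read from F(x); x is the object size in bytes.
SHIFT, MU, SIGMA = 148.185, 7.957, 1.531


def size_cdf(x):
    """F(x) = Phi((ln(x - SHIFT) - MU) / SIGMA) for x > SHIFT, otherwise 0."""
    if x <= SHIFT:
        return 0.0
    z = (math.log(x - SHIFT) - MU) / SIGMA
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_size(rng):
    """Draw one object size in bytes from the shifted lognormal distribution."""
    return SHIFT + rng.lognormvariate(MU, SIGMA)


# Closed-form moments of a lognormal distribution shifted by SHIFT.
mean = SHIFT + math.exp(MU + SIGMA ** 2 / 2)
std = math.exp(MU + SIGMA ** 2 / 2) * math.sqrt(math.exp(SIGMA ** 2) - 1)
```

`sample_size(random.Random(1))` draws a single size; the seed is arbitrary and only makes a run repeatable.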

We derived this distribution function from the cache log of the NLANR² cache at the PSC in Pittsburgh, PA, from Aug 1 to Aug 7, 2000 ([5]). We do not apply a heavy-tailed distribution (such as the Pareto distribution) because objects larger than the overall carousel capacity are out of scope in the considered scenario; they are never broadcast with the data carousel at all. Within the interval of interest, the lognormal distribution fits the NLANR cache sample best.

The parameters of the carousel sender are listed in table 1. Without loss of generality, let the carousel capacity be 100 time units (tu). The data rate of 10,000 byte/tu used in our simulations corresponds to a data rate of 400 kbit/s with a carousel capacity of 20 seconds. This matches the sample configuration used in our prototype of the proxy architecture, assuming that a 400 kbit/s DAB subchannel is used as the dissemination channel. The size of the directory is assumed to be directory size base + N * directory size increment, where N is the current number of objects within the carousel. The directory size base of 200 bytes matches the one of our prototype implementation; 56 bytes is an average of observed directory size increment values.
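The unit conversion and the directory size rule above can be checked with a few lines of arithmetic. The constant and function names are hypothetical; the numbers are the ones given in the text.

```python
CAPACITY_TU = 100     # carousel capacity [time units]
DATA_RATE = 10_000    # [byte / time unit]
DIR_BASE = 200        # directory size base [byte]
DIR_INCREMENT = 56    # directory size increment per entry [byte]
CYCLE_SECONDS = 20    # carousel capacity of the prototype configuration [s]

# A full cycle carries 100 tu * 10,000 byte/tu = 1,000,000 bytes.
bytes_per_cycle = CAPACITY_TU * DATA_RATE
# Spread over 20 seconds this is 8,000,000 bit / 20 s = 400 kbit/s.
kbit_per_s = bytes_per_cycle * 8 / CYCLE_SECONDS / 1000


def directory_bytes(n):
    """Directory size in bytes for n objects in the carousel."""
    return DIR_BASE + n * DIR_INCREMENT


def send_time(size_bytes):
    """Send time in time units for an item of the given size."""
    return size_bytes / DATA_RATE
```

One time unit thus corresponds to 0.2 seconds. A directory with 100 entries occupies 5,800 bytes and takes 0.58 tu to send.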

Table 1. Parameters of the carousel sender

Parameter | Description | Value
carousel capacity | maximum cycle time [time units] | 100
data rate | size-to-time ratio (to determine the send time for the single objects) [byte/time unit] | 10,000
directory size base | the basic size of a carousel directory (size of an empty directory) [byte] | 200
directory size increment | the increment of the directory size for every entry [byte] | 56

¹ An object is replaceable if and only if it has been broadcast at least once or the update technique ensures that it will be broadcast once (e.g. if the object is part of a cycle that is being completed).

² The NLANR caching project (http://www.ircache.net/) is funded by the National Science Foundation (grants NCR-9616602 and NCR-9521745).

4.2 EXPERIMENTAL RESULTS

In this section we present the results of the performance evaluation of the different update techniques. The primary performance metric is the response delay induced by waiting for an object to be broadcast. The utt parameter varies from 0.5tu to 1000tu. However, in the following figures we present only the results for the interval from 0.5tu to 100tu, as it is the most interesting one. The overall simulation time was 20,000,000tu. The carousel is filled with objects before each run, and the measurement starts after a full cycle has been completed. Thus, warm-up effects are eliminated and only the steady-state behavior is taken into account.

If no client-side buffering is applied, two kinds of response delays contribute to the performance of the update technique. On the one hand, there are the delays induced by waiting for an object (newly added or already broadcast) until it is received from the data carousel. On the other hand, there are response delays induced by waiting for the certainty that an awaited object has been removed from the carousel cycle and is therefore no longer available. This certainty is obtained by receiving a new directory which shows that the awaited object is no longer broadcast. The overall response delays, taking both kinds of delays into account, are presented in figure 6.

Figure 6. Response delays without client-side buffering (response delay [time units] versus update think time [time units] for FCC, CC/ER, ICR and ICR/R)

This plot shows that CC/ER performs best at all update frequencies, even though the simple FCC approach is fairly close at most update frequencies. Furthermore, CC/ER scales very well with increasing update frequencies. There is even a decrease in response delays at high frequencies (utt from 10tu down to 1tu). This is due to the early removal, as growing update frequencies shorten the current cycle more and more. Remaining and newly added objects benefit from this. Furthermore, the decreasing fraction of repeatedly sent objects contributes to the decrease, as new objects are inserted at the head of the cycle, yielding lower response delays. The latter effect likewise contributes to a decrease for ICR/R at utt values between 1.65tu and 1.2tu. The FCC curve shows the same effect, although less noticeably, because the delays for newly added and repeatedly sent objects are closer together (due to the cycle completion).

Only at very high update frequencies (utt < 1tu) do the response delays for CC/ER grow. However, such update frequencies imply that every single object is broadcast less than 1.2 times on average before it is replaced (cf. fig. 7). Actually, this is no longer a cyclic broadcast as proposed in this paper. If updates are that frequent, on-demand scheduling algorithms (cf. [6]) might be a better choice.

Figure 7. Average number of times a single object is broadcast (number of broadcasts versus update think time [time units] for FCC, CC/ER, ICR and ICR/R)
With full cycle client-side buffering, the response delays shown in figure 6 are relevant only for clients that have not been up for at least a full cycle of the carousel. Every other client has buffered all but the newly added objects. Moreover, all announced objects are either available from the cache or guaranteed to be broadcast because they have never been broadcast before³. Hence, only the response delays induced by waiting for newly added objects, i.e. those that have not been broadcast before, apply (fig. 8).

³ This is true only if no reception errors occur and only for clients that were started at least one full cycle ago. To be more precise, only previously received objects are instantly available. In case of reception errors, or if the client has not been up for at least a full cycle, there might be previously broadcast objects that are not instantly available from the cache.

Figure 8. Response delays with full cycle buffering (response delay [time units] versus update think time [time units] for FCC, CC/ER, ICR and ICR/R)

According to figure 8, ICR and ICR/R yield low response delays for low and moderate update frequencies. This is achieved by the immediate cycle abortion, which ensures that newly added objects are broadcast almost instantly. However, only ICR/R scales with increasing update frequencies. Without object reordering (ICR), update frequencies of more than 5 updates per cycle (utt < 20tu) cause rapidly increasing delays. This is because unsent objects may be displaced to the tail of the cycle by subsequently added objects. At very high update frequencies, however, CC/ER or even FCC perform better than ICR/R. The break-even point is at a utt value of about 1.4tu. A utt value of about 1.4tu means that every object is broadcast about 1.1 times on average before it is replaced (cf. fig. 7). Hence, a data carousel does not seem to be the appropriate scheduling algorithm here.
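The relation between the update think time and the number of updates per cycle used above follows from the carousel capacity of 100tu. As a rough worked example, assuming the cycle length equals the full carousel capacity:

```latex
\[
  \text{updates per cycle} \approx \frac{\text{carousel capacity}}{utt}
  = \frac{100\,\text{tu}}{20\,\text{tu}} = 5,
  \qquad
  \frac{100\,\text{tu}}{1.4\,\text{tu}} \approx 71
\]
```

Under the same assumption, the break-even point at a utt of about 1.4tu corresponds to roughly 71 updates within one carousel capacity.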

5. RELATED RESEARCH

The application of a broadcast network to disseminate data to a huge number of clients has been investigated in several previous research projects. The Boston Community Information System, described by Gifford ([7]), is an early effort in data broadcasting. It uses an FM channel to broadcast a flat data carousel including news and other information to personal computers equipped with radio receivers. However, unlike in our approach, the information set is preselected and updates are not driven by client demands.

In contrast, the Datacycle architecture ([8]) allows for client-driven updates. They are incorporated into the carousel cycle at cycle boundaries. Thus, the Periodic consistency model according to [3] applies, and an FCC-like update technique is used by default. Different update techniques are not taken into account. Datacycle is designed to exploit VLSI data filters to process database queries on a cyclically broadcast database. Unlike our approach, which broadcasts a subset of the potentially unlimited amount of data on the internet, Datacycle broadcasts an entire database.

Broadcast Disks ([9], [3]) are another approach that applies data carousels to disseminate database information. The key feature of Broadcast Disks is the use of so-called multilevel disks. Data items are organized in groups called disks, which are broadcast with different frequencies. Hence, unlike in our work, the priority of the data may be taken into account. Although Broadcast Disks allow for updates, updates are generally server-driven. A back-channel to request objects that are missed in the broadcast cycle is not considered.

Back-channel capacity is taken into account in [4]. This work assumes a hybrid environment similar to our scenario. Its main focus is an algorithm that decides on updates of the broadcast object set in order to adjust it to the needs of the client population. This is an alternative to the auxiliary cache based admittance policy ([2]) used as the carousel management strategy in our prototype. However, unlike our approach, [4] does not assume a fixed upper limit on the carousel size; instead, the carousel size varies. Updates are incorporated by an FCC-like update technique. Other update techniques are not considered in [4]. Nevertheless, it is worth investigating whether the results obtained for fixed-size data carousels in this paper can be transferred to variable-size data carousels as described in [4].

Whereas the previously mentioned publications apply different kinds of cyclic broadcast to push data to clients, considerable effort has also been spent on pure pull data access in broadcast environments. Various scheduling algorithms for on-demand data broadcasting, e.g. FCFS, MRF, MRFL, LWF or RxW (cf. [6]), have been investigated.

6. SUMMARY AND FUTURE WORK

In this paper, we have proposed a proxy architecture for web access via hybrid asymmetric communication platforms in the domain of traffic telematics. This proxy architecture uses indexed data carousels to disseminate HTTP responses via high-bandwidth data broadcast networks. Furthermore, we have considered techniques to incorporate updates into the carousel cycle. Four different update techniques have been introduced: FCC, CC/ER, ICR and ICR/R. The performance of these update techniques in terms of response delays has been examined by simulation experiments. These simulations have shown that CC/ER excels if no client-side buffering is applied, whereas ICR/R excels with full cycle buffering. Moreover, we have observed good scalability over a wide range of update frequencies.

More sophisticated client-side caching options have not been considered in this paper. They are subject to future work. Furthermore, we plan to investigate the interdependency of the carousel update techniques and carousel management strategies. Another important issue for future work is the consideration of reception errors in the simulations, as they might extend the response delays significantly.

REFERENCES

[1] Radio Broadcasting Systems; Digital Audio Broadcasting (DAB) to mobile, portable and fixed receivers, ETSI standard EN 300 401 V1.3.2 (2000-09).
[2] C. Argawal, J. Wolf, P. Yu, M. Epelman, On Caching Policies for Web Objects, Research Report RC 20619 (91325), IBM Research Division, Yorktown Heights, NY, 1996.
[3] S. Acharya, M. Franklin, and S. Zdonik, Disseminating Updates on Broadcast Disks, Proc. of VLDB’96, Mumbai (Bombay), India, 1996, 354-365.
[4] K. Stathatos, N. Roussopoulos, J. Baras, Adaptive Data Broadcast in Hybrid Networks, Proc. of VLDB’97, Athens, Greece, 1997, 326-335.
[5] NLANR Hierarchical Caching System Usage Statistics – Size-Distribution, http://www.ircache.net/Cache/Statistics/Size-Distribution/200008/.
[6] D. Aksoy and M. Franklin, Scheduling for Large-Scale On-Demand Data Broadcasting, Proc. of INFOCOM’98, San Francisco, CA, USA, 1998, 651-659.
[7] D. Gifford, Polychannel Systems for Mass Digital Communications, Communications of the ACM, 33(2), 1990, 141-151.
[8] T. Bowen, G. Gopal, G. Herman, T. Hickey, K.C. Lee, W. H. Mansfield, J. Raitz, and A. Weinrib, The Datacycle Architecture, Communications of the ACM, 35(12), 1992, 71-81.
[9] S. Acharya, R. Alonso, M. Franklin, and S. Zdonik, Broadcast Disks: Data Management for Asymmetric Communications Environments, Proc. of the ACM SIGMOD Conference, San Jose, CA, USA, 1995, 199-210.
