Framework and Algorithms for Web Resource Monitoring ... - technion

0 downloads 0 Views 498KB Size Report
University of Maryland ... Current data delivery solutions can be classified as either ...... [10] U. Dayal et al. ... http://ie.technion.ac.il/˜avigal/ProMoLang.pdf.
Framework and Algorithms for Web Resource Monitoring and Data Delivery Haggai Roitman, Avigdor Gal

Louiqa Raschid

Technion - Israel Institute of Technology Haifa 32000 Israel

University of Maryland College Park MD 20742 USA

[email protected] [email protected]

[email protected]

ABSTRACT Web enabled application servers and the clients for their Web services have increased in both the sophistication of server capabilities as well as in the demand for client customization. Therefore, there is a necessity of a specification language for sophisticated client needs and a framework for resource monitoring and data delivery that goes beyond existing standards. In this work we propose ProMo, a framework for Web resource monitoring. Using an expressive specification language a client can specify a profile, a set of data delivery needs, using notification rules. Such rules are also utilized to specify server capabilities. ProMo supports an abstract model where the semantics of a notification rule can be exploited to reason about a timeline of events occurring at the server, and determine an appropriate schedule to both pull data from servers and to push notifications to clients. Such reasoning is formalized to determine whether a set of server capabilities can cover client needs, deducing the amount of monitoring that is needed to ensure profile satisfiability. In addition to the conceptual framework, we also introduce a scheduling algorithm for determining when notifications are pushed by servers and when they need to be monitored. Finally, we report in this work on a proof of concept prototype for ProMo. We use example scenarios that use real trace data from RSS feeds to illustrate the reasoning capabilities of our algorithm and their use in correctly building an efficient monitoring schedule.

Keywords Push, Pull, Data Delivery, Web Resource Monitoring, Profiles

1.

INTRODUCTION

Web enabled application servers have had to increase the sophistication of their server capabilities in order to keep up with the increasing demand for client customization. Typical applications include RSS feeds, stock prices and auctions on the commercial Internet, and increasingly, the availability of services to deliver scientific datasets and to access Grid computational resources. Despite these advances, there still remains a fundamental trade-off between the scalability of both performance and ease of implementation on the server side, with respect to the multitude and diversity of Copyright is held by the author/owner(s). WWW2006, May 22–26, 2006, Edinburgh, UK. .

clients, and the required customization to deliver the right service/data to the client at the desired time. As a first example consider the growing use of RSS feeds, supported by a pull based protocol. A client can customize her profile by specifying the rate of monitoring the RSS feed. Recently, the RSS protocol was extended to allow the specification of server-side Time-To-Live (TTL), indicating when new updates are expected. We note that while this improves customization, server side TTL for static content delivery is not used often, and was shown to be inefficient [4]. Despite these features, a client who is only interested in being alerted of updates for a particular news category, whenever the rate of updates increases to be at least twice more than the usual rate, cannot specify such a profile. This scenario requires expressiveness that is currently unavailable. As another motivating example for the need of a sophisticated tool for specifying and satisfying client needs, consider the commercial Internet where agents may be monitoring multiple information sources. A client may wish to be alerted upon a change in a stock price soon after a financial report is released. Individual servers for financial news and stock prices may push updates to the client but in order to implement the client’s profile, one needs to reason about the server capabilities of two different servers. A proxy who is handling such a complex profile would need to pull data from the stock market server soon after it receives an update from the financial report server. In this case, a single server supporting client’s needs via push is insufficient and the combination of information push from one server and pulling another server is needed. Such a setting was also shown to be of interest in the digital libraries arena [5]. Finally, consider a portal that allows clients to specify their detailed information needs, e.g., a specific news category or updates to both the stock market and financial reports. For privacy reasons, a client may not wish to reveal her interests at this portal but may be willing to reveal her private profile to a trusted proxy instead. The three examples given above show the necessity of a specification language for sophisticated client needs and a framework for resource monitoring and data delivery that goes beyond existing standards. Current data delivery solutions can be classified as either push or pull solutions or hybrids, each suffering from different drawbacks. Push (e.g., [19]) is not scalable, and reaching a large numbers of potentially transient clients is typically expensive in terms of resource consumption and implementation by a server. In some cases, where there is a mismatch

with client needs, pushing information may overwhelm the client with unsolicited information. Pull (e.g., [18, 9]), on the other hand, can increase network and server workload and often cannot meet client needs. Several hybrid push-pull solutions have also been presented in the past [11] . As illustrated by the three examples, and the limitations of current data delivery solutions, there is a need for a framework where both clients and servers can specify client data needs or server data delivery capabilities. Each client (or server) specification should be independent of the other needs or specifications, and should be transparent to the current data delivery solution. Within this framework, a proxy for the client takes the responsibility to match client needs with a relevant server. The proxy must determine if the server capability covers the client needs (See Section 4.1 for a formal definition of cover). Further, the proxy must generate a data delivery schedule which meets the client’s needs and exploits the available server capabilities. The proxy may augment server push with pull actions to meet client needs. The ProMo framework (including language, model, and algorithms) meets this need and can lead to a protocol for flexible, scalable, and efficient data delivery. In the ProMo framework, a client can specify a profile for data delivery needs using an expressive specification language that includes notification rules. Notification rules are also utilized to specify server capabilities. ProMo supports an abstract model where the semantics of a notification rule can be exploited to reason about a timeline of events occurring at the server, and the data needs of the client. Such reasoning is formalized to determine whether a set of server capabilities can cover client needs. ProMo can also determine an appropriate schedule to both pull data from servers and to push events or notifications to clients. To the best of our knowledge, ProMo is unique in its ability to reason about the semantics of notification rules and to generate schedules that can deal with a variety of data delivery capabilities at the server. We report in this work on a proof of concept prototype for ProMo. We use example scenarios that use real trace data to illustrate the reasoning capabilities of our algorithm and their use in correctly building an efficient monitoring schedule. Our contributions can be summarized as follows: • At a conceptual level, we present the ProMo framework, aiming at the satisfaction of user profiles, given server capabilities. • We provide the essence of an expressive profile language, focusing on notification rules. Our main contribution here involves the common language for specifying client needs and server capabilities. This feature is a cornerstone in our ability to identify satisfiability. • At an algorithmic level, we present a scheduling algorithm for satisfying a user profile by combining server pushed events and client monitoring. • Using real trace data from RSS servers and example profiles, we demonstrate the feasibility of our algorithm under various scenarios and compare it to a proxy that can only use TTL. The rest of the paper is organized as follows. Section 2 introduces ProMo, a framework for Web resource monitoring

and data delivery. The highlights of the profile language are given in Section 3 followed by the definition of user profile satisfaction and a scheduling algorithm in Section 4. Implementation experiences are discussed in Section 5, followed by related work in Section 6. The paper is concluded in Section 7.

2.

PROMO : A FRAMEWORK FOR WEB RE-

SOURCE MONITORING AND DATA DELIVERY ProMo is a framework for data delivery over wide area applications, such as the Web. We assume an architecture with servers who are data providers and clients who are data consumers. A client is associated with a proxy that manages data delivery for the client. Servers can deliver data to clients by pushing it to clients. Alternatively, a proxy can pull data from a server. Data delivery options can be classified based on the server capability as will be discussed next. The ProMo framework emphasizes meeting client needs, independent of server capabilities. To do so, ProMo includes an expressive language to specify server capabilities and client needs (Section 3). Next, we provide a solution that first validates the satisfiability of a client profile, given server capabilities (formalized as a coverability property). For those unsatisfied profiles, ProMo will develop a data delivery schedule that exploits any available server push options, further augmented by polling the server (Section 4.2). We present a classification of server capabilities that can be handled by ProMo, and then provide terminology to be used later in the paper. Server capabilities are as follows: No push services: A server in this category is open for client pull requests, possibly under politeness constraints [12], yet does not provide push services. This is the most common scenario over the open Web, e.g., with RSS feeds. Proxies need some estimation mechanism (e.g., an update model [16]) to decide when to probe a server. Such a monitoring is possibly subject to clientside resource constraints (e.g., [27]). ProMo will need to poll a server in this scenario. Server-defined push services: A server in this category offers a (limited) set of push services for client subscription. These services may not match a client’s actual needs. eBay, for example, pushes information to a client who made a bid, whenever the client was outbid. As another example, a server may push each update event, whereas the client may be interested in periodic updates. Finally, a server may notify a client of updates without delivering the updated data (termed value discrepancy). With such server capabilities, ProMo has to match client needs with server capabilities, filter redundant events, and augment server push by monitoring tasks. Client-defined push services: A server in this category receives client specifications and delivers data accordingly. This scenario is not very common on the Web, since it requires personalized services. However, we believe that with the growing competition in the data delivery market, we will witness a growing personalization in the market of data delivery services. In this

scenario, the reasoning framework of ProMo can be exploited/modified by the server to schedule pushes to meet a client need. Next, we define some terms. Let R = {R1 , R2 , ..., Rn } be a set of resources; a schema defines each resource R ∈ R with a class, class name and properties. We utilize RDF [29] and RDF Schema [30] for resource description. For example, Channel and Item are two resource classes used to describe the RSS feeds published by RSS server. We allow two types of timelines. The first is based on absolute time and is referred to as a timeline. The second is a relative timeline and represents a set of events occurring at the server. We refer to this timeline as eventline. Mapping an eventline to a timeline can be performed using an update model, describing the arrival of events at the server. The algorithm in Section 4.2 can uniformly handle both timelines and eventlines. For simplicity, our explanation uses a timeline. Let T be an epoch and let {T1 , T2 , ..., TN } be a set of chronons (time points) in T . An alternate solution is to use an eventline, e.g., every update to Channel. For some server capabilities, ProMo may need to rely on a model of the update process of data at the server to estimate the occurrence of an update event. Such a model is given in stochastic terms. For example, in [7, 16] the use of an update model based on Poisson processes was proposed. We shall use an update model, based on nonhomogeneous compound Poisson processes, following [16], capturing timevarying update intensities. Such a model better reflects scenarios in which RSS feeds are updated more rapidly during work hours, or when eBay bid arrival rate increases towards the end of an auction. A life policy determines how the server will maintain past updates to resources ([27]) and the policy may vary. A restrictive life=overwrite always overwrites the previous update history. An alternative life=window(Y) is used for RSS data feeds, keeping updates within Y units of time before discarding them.

3.

PROFILES

We now define ProMo profiles. Profiles are declarative specifications for data delivery. A profile should be easy to specify and sufficiently rich to capture client requirements or server capabilities. A profile should have clear semantics and be simple to implement. An example profile for the RSS feed for CNN top stories is provided in Figure 1. If used as a server capability, it specifies that the server will push the title and description of the Item and it will do so every third update. Conversely, as a client profile, the client requires being notified every third update. We use this example throughout the paper. A full specification of the profile language is available in [28]. A ProMo Profile p (in XML) contains two element types, Domain and Notification. A Domain of a profile p is a set of resources Domain(p) = {R1 , R2 , ..., Rn }. For the client, Domain describes the set of resources the client is interested in, while for the server the Domain is a set of resources the server maintains. A profile domain has two sub domains, Domain(Q) and Domain(Tr) which may overlap. Domain(Q) is a set of all resources that are referenced by the query part of all profile notifications, while Domain(Tr) is a set of all resources that are referenced within the trigger part of all profile notifications (see Section 3.1). A resource

Suggest Documents