stor-serv: Adding Quality-of-Service to Network Storage

John C.-I. Chuang, University of California, Berkeley
Marvin A. Sirbu, Carnegie Mellon University

Presented at the Workshop on Internet Service Quality Economics, Cambridge, MA, December 1999.

Abstract

Network caching, while robust and adaptive, can only be considered a best-effort network storage service. Cache misses introduce significant variations in data access times, and this may be unacceptable to some applications that require consistently fast access to their data objects. The stor-serv framework, inspired by the intserv and diffserv frameworks in the data transmission domain, brings the concept of Quality-of-Service (QoS) to the network storage domain. The framework supports multiple service classes, each with varying degrees of QoS, to cater to the requirements of different applications.

1. Introduction
The network caching phenomenon can be seen as the latest step in the evolutionary extension of the memory hierarchy. Caching starts at the hardware level (register file, primary cache, secondary cache), moves up through the operating system, distributed file systems (as well as distributed databases), and networked applications (e.g., the web-browser cache), to the edge of the network (proxy cache), and finally into the network itself. In fact, network caches are organized hierarchically as well. The growth of Internet traffic, together with the strong locality of reference observed in wide-area data access patterns, makes network caching a natural and inevitable design choice.

Network caching is attractive because it brings data closer to the clients. Previously, a client's request for a data object would have to travel all the way to the server, and the requested object would then be returned all the way back to the client. Now, the client's request may be satisfied by one of many caches, at key locations in the network, that holds a copy of the requested object. Network caching results in the following benefits:

1. reduced access latency,
2. reduced server load,
3. improved data availability, and
4. reduced bandwidth consumption.
This appears to be a win-win-win situation. First, the clients (and, indirectly, the network and publishers) benefit from a reduction in access latency. Second, the publishers benefit from a reduction in server load and an improvement in data availability. Third, the network operator benefits from a reduction in bandwidth consumption.

Unfortunately, there are two important shortcomings to network caching. First, network caches are deployed and run by the network operators. Publishers have no control over the number, placement, lifetime and validity of the cached objects. Neither do the caches provide any reporting on the usage of the cached objects. These arrangements do not sit well with publishers who wish to maintain control over their intellectual property, nor with publishers who require accurate object access statistics to support their advertisement-based revenue models. These publishers have resorted to bypassing the network caches (i.e., cache-busting) by explicitly tagging their objects 'non-cacheable'.

Second, caches have finite storage capacity and therefore cannot possibly keep every data object they see forever. Old objects have to be constantly purged to make room for new ones. Replacement policies such as least recently used (LRU), least frequently used (LFU), greedy-dual, or variants thereof, have been adopted to maximize hit rates, and these rates may reach 50-60% in practice [1]. Recent proposals such as cooperative caching [2] and adaptive caching [3] are aimed at improving the overall hit rate of the network caching system. However, the inevitability of cache misses implies that caches cannot be relied upon to deliver even the most popular data objects on a consistent and predictable basis. The lifetime of a given object in a cache is a function not just of its own popularity, but of the aggregate traffic seen by the cache as well. In this sense, network caching can be considered a best-effort network storage service.
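To make the best-effort behavior concrete, the following minimal Python sketch (our illustration; the class and names are hypothetical) implements a finite-capacity LRU cache. Because capacity is finite, a sufficiently diverse request stream necessarily produces misses, and hence variable access times.

```python
# A minimal sketch of best-effort caching with LRU replacement.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity      # finite capacity forces evictions
        self.store = OrderedDict()    # insertion order tracks recency

    def get(self, key, fetch_from_origin):
        if key in self.store:
            self.store.move_to_end(key)      # hit: refresh recency, fast path
            return self.store[key], "hit"
        value = fetch_from_origin(key)       # miss: slow path to origin server
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict the least recently used
        return value, "miss"

cache = LRUCache(capacity=2)
origin = lambda k: f"object-{k}"
for k in ["a", "b", "a", "c", "b"]:          # "b" is evicted when "c" arrives
    print(k, cache.get(k, origin)[1])
```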
1.1 stor-serv: Beyond Best-Effort Caching
In the data transmission domain, the Internet Protocol (IP) is referred to as a best-effort transmission service (Table 1). IP routers implement some queuing discipline such as first-in first-out (FIFO), and since the routers have finite buffer capacity, packet drops are possible, leading to jitter, or variations in packet delivery time. Some distributed network applications (especially real-time applications) may have specific latency, jitter, loss-rate and/or bandwidth requirements that cannot be satisfied by best-effort IP. These applications need to secure transmission resources that provide some form of Quality-of-Service (QoS) guarantee or assurance compatible with their application-specific performance requirements. The Internet Engineering Task Force (IETF) has provided two standardized QoS frameworks, Integrated Services (intserv) and Differentiated Services (diffserv), to support these application needs [4, 5]. Different service classes with varying degrees of QoS (e.g., guaranteed service, premium service, assured service) are defined within these two frameworks.
Similarly, in the network storage domain, we say that network caching is a best-effort storage service. Caches implement some replacement policy such as least recently used (LRU), and since the caches have finite capacity, cache misses are possible, leading to variations in data access time. Some distributed network applications may have specific performance requirements that cannot be satisfied by best-effort caching. These applications may be mission-critical, have stringent performance and/or availability requirements, or place high value on consistent data access latency, even under changing traffic patterns and network conditions. Some applications may require data objects to be readily available even if the objects do not exhibit reference locality, or are rarely accessed at all. No amount of intelligent or adaptive caching, or over-provisioning (short of infinite cache size), can address the needs of these applications. These are the motivating applications for the stor-serv framework that we present here.

Table 1. Applying QoS concepts from network transmission to network storage.

Transmission domain (Internet Protocol, IP):
• best-effort transmission service
• router implements a queuing discipline, e.g., first-in first-out (FIFO)
• packet drops possible due to finite buffer capacity
• variations in packet delivery time (jitter)

Storage domain (Caching):
• best-effort storage service
• cache implements a replacement policy, e.g., least recently used (LRU)
• cache misses possible due to finite cache capacity
• variations in data access time

Transmission domain (intserv/diffserv):
• QoS frameworks providing preferential treatment for data packets
• routers can handle packets of different service classes: guaranteed service, premium service, assured service, best-effort IP

Storage domain (stor-serv):
• QoS framework providing preferential treatment for data objects
• stor-serv nodes can handle objects of different service classes: object level replication, push caching, differential caching, best-effort caching (other classes can be defined)
We define stor-serv as a unified QoS framework that supports network storage services ranging from best-effort caching to object level replication with performance guarantees [6]. Publishers can select the appropriate class of storage service to satisfy their application-specific requirements. The network storage provider will optimally allocate storage resources to meet these service commitments, using leftover capacity for best-effort caching. Content consumers retrieve the nearest copy of the data object, be it from a replica, a cache, or the original source, in a completely transparent manner.

Object level replication, as the name suggests, provides network replication of individual data objects. It is analogous to guaranteed service in the transmission domain, in that it provides performance guarantees such as object lifetime and latency bounds. Clearly, resource reservation and admission control are necessary for this service class. The stor-serv framework provides a standardized set of service semantics and an automated service provisioning process to support this service class. It is important to note that object level replication is very different from traditional replication strategies such as mirroring and web-hosting. In the absence of an automated service provisioning process, setting up a mirror or contracting with a web-hosting service necessarily involves a high degree of customization and human intervention. Mirroring and web-hosting solutions are therefore usually limited to static, long-term arrangements involving entire sites as opposed to individual data objects. Responding to changing traffic patterns and network conditions is extremely costly, if not impossible, in these cases.

Differential caching does not provide any service guarantee, but instead offers preferential treatment to "premium" objects while they are in cache [7]. For example, a premium object may age at a slower rate than a best-effort object while both are subject to the same local cache replacement policy (e.g., LRU, LFU, etc.). Cache residency is thus effectively lengthened for the premium object. The advantage of this service class is that no explicit resource reservation is necessary. Publishers simply acquire (for a fee) the right to mark their objects as "premium". In this sense differential caching is analogous to the premium and assured service classes in the transmission domain. Reverse proxy caching, a recent caching proposal that is gaining popularity, is a special case of differential caching [8]. In this case, objects from a specific server are given priority over other objects (if there are any). However, within this pool of "premium" objects, the local replacement policy still applies.

There are really two dimensions of QoS to each storage service: the object placement policy and the object replacement policy. In simple caching, object placement follows a demand-driven diffusion pattern, and object replacement is dictated by the local rules at each cache. In differential caching, object placement remains demand-driven, but once the objects arrive at a cache, they are subject to a custom replacement policy. Push caching [9] is the opposite of differential caching, in that the publisher specifies the cache nodes to which objects are pushed, but once the objects arrive at the caches, they are subject to the local replacement policies of each cache. Finally, object level replication allows the publisher to specify custom placement and replacement policies for its objects. This is summarized in Table 2.
Table 2. Placement and replacement dimensions of storage QoS.

Service Class               Custom Placement   Custom Replacement
Best-effort caching         No                 No
Differential caching        No                 Yes
Push caching                Yes                No
Object level replication    Yes                Yes

2. The stor-serv Framework
Figure 1 illustrates the process of turning a publisher's performance requirements into performance realization within the stor-serv framework. The key components of the framework include:

• service specification
• service provision
  − resource reservation
  − resource mapping
  − admission control
• resource management
• resource discovery
• metadata management
• security
• economics

Consider a publisher who requests object level replication for its data object. Associated with this object are (i) its QoS performance requirements and (ii) its traffic profile, such as object size, or even some a priori information about the object's access pattern (across space and time). Using some standardized semantics, the publisher can generate a service specification and forward it to the network storage service provider using some well-established resource reservation protocol. The provider, armed with knowledge of network topology, resource availability, and other current and projected resource demands, performs resource mapping. This translates the high-level QoS requirements into low-level physical resource requirements, such as a target list of individual storage nodes. At this point, the individual nodes have to be consulted to see whether they are able to admit the service request (admission control). If one or more of the storage nodes reject the request, then an alternative resource mapping must be computed, and the process repeated. If all the nodes accept the request, then the service can be established.
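The provisioning loop just described can be sketched in Python as follows. This is a hedged illustration of the mapping and admission-control iteration, not a protocol definition; compute_mapping and query_node are hypothetical stand-ins for the resource mapper and the per-node admission controllers.

```python
# A sketch of the stor-serv provisioning loop: mapping proposes nodes,
# admission control consults them, and rejections trigger a remapping.
def provision(service_spec, compute_mapping, query_node, max_attempts=5):
    rejected = set()                      # nodes that have refused this request
    for _ in range(max_attempts):
        nodes = compute_mapping(service_spec, exclude=rejected)
        if not nodes:
            return None                   # no feasible mapping remains
        answers = {n: query_node(n, service_spec) for n in nodes}
        if all(answers.values()):
            return nodes                  # all nodes accept: service established
        rejected |= {n for n, ok in answers.items() if not ok}
    return None                           # give up after repeated rejections

# Toy usage: node "B" always rejects, so the loop falls back to A and C.
demo_mapping = lambda spec, exclude: [n for n in ["A", "B", "C"]
                                      if n not in exclude][:2]
demo_query = lambda node, spec: node != "B"
print(provision({"size_mb": 1}, demo_mapping, demo_query))   # -> ['A', 'C']
```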
In the case where the publisher is interested in obtaining a push caching service, the service provider will perform resource mapping to determine the set of target storage nodes, but can bypass the admission control step since no object lifetime guarantees are requested. Finally, both the resource mapping and admission control steps may be bypassed for a differential caching service request. In this case, the publisher acquires the privilege to put "premium" tags on its objects. It is then up to the resource managers at the individual stor-serv nodes to implement the preferential treatment of these "premium" objects.

Individual stor-serv nodes perform real-time resource management, either to maximize local and global resource utilization, or to maintain service commitments when faced with changes in network conditions (e.g., congested or failed links and nodes). The resource manager is also responsible for the enforcement of best-effort and differential caching replacement policies.
[Figure 1. From performance requirements to performance realization: the stor-serv framework. The figure depicts the publisher submitting a service specification, which is turned into realized performance for clients through resource mapping (informed by network topology and resource availability), admission control, resource management (informed by network conditions and traffic patterns), and resource discovery, with results reported back to the publisher. Security, metadata management, and economics appear as cross-cutting components.]
The storage service provider is also responsible for directing the clients to the nearest copy of each requested object. If the individual stor-serv nodes are organized in the same manner as the network caching hierarchy, then this resource discovery problem is quite straightforward. However, if we do not wish to constrain the topology of the stor-serv nodes, then we will have to deal with issues such as naming, name resolution, and distance estimation.

The combined output of resource management and resource discovery is, hopefully, a performance realization that is consistent with the performance requirements originally specified by the publisher. To verify this, the metadata management component is responsible for the measurement (metering) and logging of individual object requests, the aggregation of these logs across nodes, objects and services, and reporting back to the publishers for purposes of accounting, billing and audit. Depending on the granularity and detail of the reports, the metadata traffic volume can be as large as the data traffic volume itself. Finally, and perhaps most importantly, the publishers can now obtain verifiably accurate statistics on the access frequency of the distributed copies of their objects.

There are two additional components that are important to the stor-serv framework. First, in order to establish a trust relationship between publisher and service provider, we need to put in place a robust security model. Secure communication mechanisms are necessary for the authentication and authorization of data objects at individual storage nodes, as well as for billing, payment, etc. Second, in order to ensure the most efficient use of the storage service infrastructure, well-designed economic mechanisms need to be put in place. For example, appropriate differential pricing models are necessary to support the provisioning of multi-attribute storage services. Prices will serve as market signals to publishers, encouraging the optimal use of network resources at the demanded level of performance. At the same time, it is not premature to consider the industrial organization of the stor-serv service infrastructure, such as the vertical relationship between network transmission and storage services, and how it may impact the long-term competitiveness of the storage service industry [10].

In the next three sections we discuss the technical requirements and design issues for three central components of the stor-serv framework: service specification, service provision, and resource management.
3. Service Specification
In Section 2 we identified three broad classes of storage services beyond simple caching: differential caching, push caching, and object level replication. Now we need to establish the service specification semantics so that publishers and storage service providers can communicate, using unambiguous metrics, the requirements and expectations of a service commitment.
There are two chief elements to a service specification: the traffic profile and the performance requirements. In data transmission, the traffic profile of the source is usually expressed as some combination of peak and average rates, maximum burst length, token bucket filter rate, etc. Performance requirements, on the other hand, are usually specified as delay bounds, acceptable loss rates, etc. When a service contract is established, the network is responsible for meeting the performance requirements, so long as the source transmits data within the prescribed traffic profile.

When applied to the specification of network storage services, the traffic profile declares the number and sizes of the publisher's data objects, the time and duration of the service, and, if known, the spatial and temporal distribution of demand for the data objects. The performance requirements spell out the placement and replacement policies of the service.

In differential caching, the service specification spells out the custom replacement policies using parameters such as initial_age, age_rate, priority, etc. These parameters instruct the caches in managing these "premium" objects. For example, premium objects may age at a slower rate than best-effort objects, or be endowed with a negative age when they first enter the cache. Alternatively, the cache may establish different "priority queues" for premium and best-effort objects, and refrain from purging premium objects until all the best-effort objects have been purged. Because differential caching does not involve any custom placement of objects, there is no need for explicit resource mapping and admission control by the service provider. The service specification can simply be attached (as metadata tags) to the data objects themselves, and interpreted by each cache locally.

In object level replication, the publisher specifies the custom placement and replacement policies to satisfy some application-specific performance requirements. The performance requirements can be expressed along one or more of the following (sometimes overlapping) dimensions; a sketch of one possible specification encoding follows the list:

• data access latency (minisum, minimax)
• data access jitter
• minimum object lifetime
• acceptable miss rate (including 0%)
• data availability/redundancy
• coverage area
• bandwidth savings
• cost
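As an illustration of what such specification semantics might look like, the following sketch encodes a traffic profile and performance requirements as simple data structures. The field names are our own invention, not a proposed standard.

```python
# A sketch of one possible encoding of a stor-serv service specification,
# pairing a traffic profile with performance requirements.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrafficProfile:
    num_objects: int
    total_size_mb: float
    start: str                             # e.g., an ISO timestamp
    duration_hours: float
    demand_hint: Optional[dict] = None     # a priori demand pattern, if known

@dataclass
class PerformanceRequirements:
    service_class: str                     # "differential", "push", "replication"
    max_latency_ms: Optional[float] = None          # deterministic bound
    avg_latency_ms: Optional[float] = None          # statistical bound
    min_lifetime_hours: float = 0.0                 # zero for push caching
    acceptable_miss_rate: float = 1.0               # 0.0 means guaranteed presence
    replacement_params: dict = field(default_factory=dict)  # e.g., age_rate

@dataclass
class ServiceSpec:
    profile: TrafficProfile
    requirements: PerformanceRequirements

# Example: an advance-reservation replication request (cf. Table 3, #8).
spec = ServiceSpec(
    TrafficProfile(num_objects=1, total_size_mb=1.0,
                   start="1999-12-31T23:30", duration_hours=1.0),
    PerformanceRequirements(service_class="replication",
                            max_latency_ms=200.0, min_lifetime_hours=1.0,
                            acceptable_miss_rate=0.0))
```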
Finally, push caching can be considered a special case of object level replication, with a minimum object lifetime of zero. No explicit resource reservation and admission control is required for push caching, but the data objects will be subject to the local replacement policies enforced at each cache.
The stor-serv framework has to be able to accommodate new service classes and new performance metrics as the market demands them. We provide some example services here for illustrative purposes (Table 3). These services are just a small sample of the many possible services that may be offered over the distributed network storage infrastructure. Clearly, the more types of services to be supported, the richer the specification semantics need to be. The challenge, as always, is in achieving the right balance between simplicity and flexibility. While these example services offer a glimpse into the many dimensions along which services may be classified, we choose to highlight one particular dimension in the following subsection.

Table 3. Some examples of stor-serv service classes.

Differential Caching
#1: premium = yes; age_rate = 0.5
#2: premium = yes; initial_age = -1 hour
#3: premium = yes; priority_queue = 1

Object Level Replication¹
#1 Deterministic: 1MB storage capacity for 1 hour, 200ms maximum latency
#2 Average: 1MB storage capacity for 1 hour, 100ms average latency
#3 Combination: 1MB storage capacity for 1 hour, 100ms average latency, 200ms maximum latency
#4 Stochastic: 1MB storage capacity for 1 hour, Probability[latency > 200ms] = ε
#5 Geographic: 1MB storage capacity for 1 hour, 200ms latency bound for all receivers in a specific domain or region, or for a specific set of receivers
#6 Budget-constrained: 1MB storage capacity for 1 hour, minimizing worst-case latency, subject to a budget constraint of no more than K replicas
#7 Placement-oriented: 1MB storage capacity for 1 hour, at N specific nodes
#8 Advance reservation: 1MB storage capacity for 1 hour, starting from 2330hr, December 31, 1999, 200ms maximum latency

¹ While most of these example services are specified with latency requirements in milliseconds, they can also be specified in terms of network hops. Alternatively, the performance requirements may not be latency-based at all.
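To illustrate how a cache might interpret the differential caching parameters of Table 3, the following sketch applies age_rate, initial_age and priority_queue when selecting an eviction victim. The mechanism shown is one plausible realization of our illustration, not a standardized policy.

```python
# A sketch of differential caching: premium objects accrue age more slowly
# (or start "younger"), so LRU-style eviction reaches them later; objects in
# a higher priority queue are purged only after lower queues are exhausted.
import itertools

class DifferentialCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.clock = itertools.count()        # logical time
        self.objects = {}                     # key -> (params, last_access, value)

    def _effective_age(self, key, now):
        params, last_access, _ = self.objects[key]
        age = (now - last_access) * params.get("age_rate", 1.0)  # Table 3, #1
        age += params.get("initial_age", 0.0)                    # Table 3, #2
        return age

    def put(self, key, value, params=None):
        now = next(self.clock)
        self.objects[key] = (params or {}, now, value)
        if len(self.objects) > self.capacity:
            # Purge lower-priority queues first (Table 3, #3); within a queue,
            # purge the effectively oldest object.
            victim = max(self.objects, key=lambda k: (
                -self.objects[k][0].get("priority_queue", 0),
                self._effective_age(k, now)))
            del self.objects[victim]

cache = DifferentialCache(capacity=2)
cache.put("best_effort", "x")
cache.put("premium", "y", {"age_rate": 0.5, "priority_queue": 1})
cache.put("new", "z")          # evicts "best_effort", not the premium object
print(sorted(cache.objects))   # -> ['new', 'premium']
```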
3.1 Deterministic vs. Statistical Guarantees
Services can be differentiated by the "firmness" of their guarantees. The QoS work in the data transmission arena provides ample illustrations. The Internet Engineering Task Force, for example, has specified three service classes within the intserv framework: guaranteed service, controlled load service and best-effort service. It has also specified two service classes within diffserv: premium service and assured service. Similarly, the ATM Forum has specified four classes: constant bit rate (CBR), variable bit rate (VBR), available bit rate (ABR) and unspecified bit rate (UBR) [11]. These service classes can be characterized as providing one of the following: deterministic guarantees, statistical guarantees, or no guarantee.

Applying this to our examples of object level replication services, we see that services #1, #5 and #8 provide deterministic guarantees on access latency: all data accesses are guaranteed to experience no more than the stipulated 200ms delay. Services #2 and #4, on the other hand, offer statistical guarantees. Service #2 makes latency guarantees only for data accesses in the aggregate, not for individual data accesses. For service #4, up to a fraction ε of data accesses may fall outside the latency bound without violating the commitment. Service #3 offers a combination of deterministic and statistical guarantees. Finally, the best-effort service of network caching corresponds to the base case of offering no guarantees.

It is important to recognize that services #2-4 do not necessarily represent the full range of services with statistical guarantees. The exact specification of a statistical-guarantee service may depend on the stochastic nature, or the source of burstiness, of the traffic load in question. There are two sources of burstiness in the demand for network storage capacity. First, it is conceivable that some content owners may experience fluctuations in the size of their corpus. News publishers, for example, may have a relatively stable corpus size on ordinary news days but an explosion of additional news articles on days with extraordinary world events or stock market activity. These publishers may wish to characterize their traffic load with average and peak capacity numbers. Second, data access patterns may be bursty with respect to the objects requested, the geographic locations of the consumers, etc. These patterns may or may not be amenable to characterization by some demand distribution function (across objects, space and time). To the extent that these stochastic behaviors can be accurately characterized and made known to the network, appropriate statistical multiplexing techniques can be applied to improve storage utilization. On the other hand, applications with no burstiness in storage demand cannot hope to realize any statistical multiplexing gains, and are better off with deterministic-guarantee services.
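The firmness distinction can be made concrete by sketching how a provider might audit a trace of delivered latencies against each type of guarantee; the audit functions below are our own illustration.

```python
# Auditing delivered latencies against deterministic, average-case, and
# stochastic latency commitments (cf. services #1, #2 and #4 in Table 3).
def meets_deterministic(latencies_ms, bound_ms):
    return max(latencies_ms) <= bound_ms             # every access within bound

def meets_average(latencies_ms, avg_bound_ms):
    return sum(latencies_ms) / len(latencies_ms) <= avg_bound_ms

def meets_stochastic(latencies_ms, bound_ms, epsilon):
    violations = sum(1 for l in latencies_ms if l > bound_ms)
    return violations / len(latencies_ms) <= epsilon  # at most ε may exceed

trace = [80, 120, 95, 240, 110]
print(meets_deterministic(trace, 200))    # False: one access took 240ms
print(meets_average(trace, 130))          # True: the mean is 129ms
print(meets_stochastic(trace, 200, 0.2))  # True: 1 of 5 accesses exceeded
```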
4. Service Provision
There are three main components of network storage service provision: resource reservation, resource mapping and admission control. These components have their counterparts in QoS provisioning in the transmission domain. However, not all components are involved in every storage service request. Push caching services may bypass the admission control process; differential caching services may bypass the reservation, mapping and admission control processes. Only object level replication involves all three provisioning steps.
4.1 Resource Reservation Protocol
A resource reservation protocol allows the service requester and the service provider to communicate and negotiate the reservation of transmission and storage resources according to the service specifications. In addition to the specification of traffic profiles and performance requirements (both placement and replacement), the protocol should also allow the publisher to specify other requirements, such as provisions for furnishing delivery logs and/or other indications that service guarantees are being met.

The resource reservation protocol for network transmission services, RSVP [12], serves as a useful starting point for discussion. One possibility might be to extend the current RSVP protocol so that it can support reservation requests for storage resources as well as transmission resources. However, we foresee some difficulties with this approach. First, the concepts of the routing path and end-to-end reservation do not apply to storage. Second, in the case of replication, the "receivers", or content consumers, may not be known at reservation time. Whereas both sender and receiver(s) are involved in transmission-based resource reservation, only the content owner is involved in the storage-based case. This goes against the fundamental design philosophy of receiver initiation in RSVP. The specification of the resource reservation protocol is outside the scope of this work, and should be postponed until the overall network storage service provisioning architecture has been defined.
4.2 Resource Mapping
Resource mapping is the translation of high-level service specifications into low-level resource requirements. To make optimal resource allocation decisions, the resource mapping entity has to be constantly updated with the status and availability of a heterogeneous set of resources at a global level. It may need to maintain a knowledge database with information such as network topology, storage capacity, link capacity, link delay, network conditions, and predictions of future traffic patterns (possibly based on measurements of current traffic patterns).

For a storage-based QoS service provider, such as a web-hosting service that does not control transmission resources, the resource mapper will map QoS requirements into storage resources only. It does so by assuming that only best-effort transmission service is available, characterized by some delay distribution on each link. On the other hand, for a unified transmission-storage QoS infrastructure, the resource mapper may map QoS requirements into a combination of storage and transmission resources. These transmission resources may range from dedicated transmission capacity (e.g., leased lines) and QoS services based on intserv or diffserv, to IP 'overnet' services that provide single-hop connectivity between specified end points.²

² Digital Island, an Internet service provider, offers single-hop connectivity between major network access points throughout the world by selective provisioning of network capacity. This service is used by online publishers, for example, to achieve performance targets for their information dissemination applications [13].

For storage services with deterministic guarantees, resource mapping has to be performed based upon the peak or worst-case resource requirements. The demand distribution of data accesses is irrelevant; the resource mapper simply identifies the set of network nodes at which storage capacity needs to be reserved in order to meet latency and/or other performance requirements for any object requested by any consumer. For storage services with statistical guarantees, the resource mapper can take into consideration the probability distribution of data accesses when determining the optimal set of network nodes. To the extent that demand for network storage can be characterized as Markovian, it may be possible to apply the effective bandwidth or equivalent capacity concepts from the data transmission domain [14].

In [15] we show that the resource mapping problem can be formally characterized and solved as a facilities-location problem, as in the location theory literature [16]. In particular, the mapping of a service with a deterministic guarantee can be described as a weighted k-center problem, while that of a service with a statistical guarantee can be described as a weighted k-median problem. In either problem instance, the objective is to find the optimal number and placement of replicas such that the delay or distance bound is met. By applying the formal model to the early ARPANET topology as an example network, we are able to demonstrate the operation of the resource mapping process, including the mapping into an optimal combination of storage and transmission resources. Qualitatively, the results agree with one's intuition:

• there is an inverse relationship between the delay bound and the number of replicas required to achieve it;
• optimal replica placement changes with the number required: the optimal locations for two replicas may not overlap the optimal locations for three replicas;
• statistical delay bounds can be met with fewer replicas by exploiting locality of reference to place replicas near concentrations of demand; and
• much of the statistical benefit of full replication can be achieved with smaller, partial replicas that contain the most heavily accessed objects.

The primary contribution of our formal model is to quantify these intuitions precisely. For example, we show how to calculate the magnitude of the performance degradation when replicas are constrained to be located at predetermined server sites as opposed to the optimal locations.
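A highly simplified sketch of the resource mapping computation for a deterministic latency bound follows. It uses a greedy cover heuristic rather than the exact weighted k-center formulation of [15], and the topology and latencies are invented for illustration.

```python
# Resource mapping as facilities location: choose replica nodes so that every
# client is within the latency bound, placing replicas greedily by coverage.
def map_resources(dist, bound):
    # dist[i][j]: latency between nodes i and j; clients reside at every node.
    uncovered = set(range(len(dist)))
    replicas = []
    while uncovered:
        # Greedily place the replica that covers the most uncovered clients.
        best = max(range(len(dist)),
                   key=lambda r: sum(1 for c in uncovered if dist[r][c] <= bound))
        replicas.append(best)
        uncovered -= {c for c in uncovered if dist[best][c] <= bound}
    return replicas

# Toy 4-node topology (latencies in ms): tighter bounds need more replicas.
d = [[0, 100, 250, 300],
     [100, 0, 150, 250],
     [250, 150, 0, 120],
     [300, 250, 120, 0]]
print(map_resources(d, 300))   # -> [0]: one replica meets the loose bound
print(map_resources(d, 120))   # -> [0, 2]: the tight bound needs two
```

Consistent with the intuitions listed above, lowering the bound from 300ms to 120ms increases the number of replicas required and changes where they are placed.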
4.3 Admission Control
Because network transmission and storage capacities are finite, not all service requests can be accepted without adversely degrading the performance of the network. Therefore, admission control is needed to reject those requests whose service contracts could not be fulfilled by the resources available at the time. Admission control occurs in two stages. First, individual resource nodes (network switches or storage nodes) make local decisions as to whether a service request can be accommodated given the current availability of local resources. If all local decisions are positive, then a global check on aggregate requirements (e.g., an aggregate delay bound) is performed (if necessary) before the final accept/reject decision is made.

In the case of transmission, admission control occurs along the routing path between sender and receiver (or receivers, in the case of multicast). Switching nodes make local conditional acceptances and forward the request downstream, or send a reject message back to the sender. If a conditional acceptance is made, the switching node is obliged to set aside the requested capacity until the aggregate admission control decision is made, at which point the capacity is either fully committed or returned to the available pool. Therefore, the local admission control decisions have to occur sequentially on a hop-by-hop basis, followed finally by the aggregate decision.

In the case of storage, there is no notion of a path within a service request, so all of the local admission control decisions can occur independently and in parallel. Each individual node, if returning a positive local response, will also conditionally commit its storage resource in anticipation of service establishment. The aggregate admission control simply collects local responses from all of the contacted nodes (no additional global checks are necessary since there are no end-to-end requirements to be met). If all the responses are positive, an accept decision is made and the service is established. If one or more responses are negative, a reject decision is made, and the conditionally committed resources are returned to the available pool.

There is clearly a tightly coupled relationship between admission control and resource mapping, and it is important to recognize and leverage the possible synergy between the two entities. When the resource utilization level is high, the likelihood of a service request being rejected by the individual resource nodes increases, and the resource mapping and admission control process may be iterated several times before a success is finally encountered. In this situation, it may be appropriate for the resource mapping and admission control functions to switch to a "greedy" algorithm or a quorum-based algorithm. Both approaches reduce the number of possible iterations by sending admission control queries to more than enough nodes on the first attempt. In the "greedy" algorithm, the resource mapper provides multiple sets of nodes that can satisfy a particular service request; the sets may or may not have common elements. The admission controller sends queries to the union of the sets, and declares the request admitted as soon as it receives positive responses from all the nodes of any one set. In the quorum-based algorithm, the resource mapper provides a single set of candidate nodes to which queries are sent, and the service request is declared admitted as soon as a quorum of nodes return positive responses.
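A sketch of the quorum-based variant follows; the concurrency scaffolding and function names are ours. Because local decisions are independent, all candidate nodes can be queried in parallel, and the request is admitted as soon as a quorum of positive responses arrives.

```python
# Quorum-based admission control: query all candidates in parallel and admit
# once enough nodes have conditionally accepted. A full implementation would
# also release conditional commitments on nodes not ultimately used.
from concurrent.futures import ThreadPoolExecutor, as_completed

def admit_with_quorum(candidates, quorum, query_node):
    accepted = []
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(query_node, n): n for n in candidates}
        for fut in as_completed(futures):
            if fut.result():                        # positive local decision
                accepted.append(futures[fut])
                if len(accepted) >= quorum:
                    return accepted                 # admitted: quorum reached
    return None   # rejected: conditionally committed resources are released

# Toy usage: five candidate nodes, any three must conditionally accept;
# node "n2" always rejects, so three of the other four are returned.
print(admit_with_quorum(["n1", "n2", "n3", "n4", "n5"], 3,
                        lambda n: n != "n2"))
```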
5. Real-Time Resource Management
After the establishment of network storage services, the service provider has to perform real-time resource management in order to meet and enforce all service commitments. In network transmission, resource management crudely means deciding which packets to transmit next (scheduling management) and which packets to drop (buffer management). The simplest queue discipline is FIFO (first-in first-out), which results in best-effort transmission. To accomplish QoS guarantees, a combination of packet scheduling, such as weighted fair queuing, and traffic shaping at the edge of the network (e.g., token bucket with leaky bucket rate control) is necessary [17].

In network storage, resource management means deciding which data objects to keep in memory and which to purge. The most common replacement policy is LRU (least recently used), and it results in best-effort caching. To support QoS in network storage, we need to support the coexistence of data objects from best-effort caching, differential caching, and guaranteed-service replication. Replicated objects have to be kept in memory for the entire duration of their service contract, while cached objects are aged and purged according to some object replacement policy. In addition to the variety of network cache replacement heuristics being proposed, cache replacement strategies can also include directives from the publisher (e.g., HTTP/1.1's no-cache directive) and ad hoc rules for identifying dynamic pages. The techniques for marking and keeping replicated objects in memory might be adapted from virtual memory management (e.g., page locking) or distributed file system design (e.g., hoarding [18]). Finally, cache consistency mechanisms and replication update policies have to be put in place; techniques for accomplishing these are readily available from distributed database and file system design.
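One plausible realization of this coexistence, sketched below, adapts the page-locking idea: replicated objects are pinned for their contract lifetime, and only unpinned best-effort objects are eligible for LRU replacement. The class and its interface are our own illustration.

```python
# Replicated (pinned) and cached (evictable) objects sharing one store.
from collections import OrderedDict

class StorServNode:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pinned = {}              # key -> value, held until contract expiry
        self.cache = OrderedDict()    # LRU order for best-effort objects

    def replicate(self, key, value):
        if len(self.pinned) + len(self.cache) >= self.capacity:
            if not self.cache:
                raise MemoryError("admission control should have rejected this")
            self.cache.popitem(last=False)   # reclaim best-effort space
        self.pinned[key] = value

    def cache_put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.pinned) + len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # never evicts pinned objects

node = StorServNode(capacity=3)
node.replicate("contract-obj", "guaranteed")
node.cache_put("a", 1); node.cache_put("b", 2); node.cache_put("c", 3)
print(list(node.pinned), list(node.cache))   # pinned survives; "a" is evicted
```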
5.1 Local Storage Management
Several important research questions have to be addressed with regard to local storage management. First, is there an optimal mix between replicated and cached objects in a network storage node? If so, what is the optimal mix? Alternatively, should a minimum fraction of storage be dedicated to best-effort caching? Intuitively, it makes sense not to commit all resources to replication, even though replication is expected to generate higher revenue than caching. A healthy supply of caching capacity will better absorb burstiness in traffic and minimize the likelihood of thrashing.
5.2 Traffic Policing
Another local storage management issue is traffic policing. What happens when the content owner sends content in excess of the declared traffic profile? The storage manager exercises jurisdiction over this "non-conformant" traffic, and decides whether these objects should be discarded immediately, put into cache space (if available), or allowed to replace some existing objects in replication memory. Alternatively, the content owner may be sending an updated version of an object, in which case the stale object has to be identified and replaced.

The concept of committed information rate (CIR) from frame relay may be applied here. In data transmission, performance guarantees are provided for traffic transmitted at up to the committed information rate, while traffic in excess of the CIR is delivered as best-effort traffic. This guarantees each sender a minimum share of a link resource, while allowing senders to transmit additional traffic when others are idle. An analogous concept of a committed storage rate (CSR) may be developed, such that a publisher is guaranteed a minimum fraction of a multi-publisher storage facility, and can store additional objects if free space is available.
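A minimal sketch of such a CSR policer, under our own (hypothetical) interface, might look like this:

```python
# Policing storage traffic against a committed storage rate (CSR), by analogy
# with frame relay's CIR: conformant objects get guaranteed storage, excess
# objects ride on spare best-effort capacity, and the rest are discarded.
def police(publisher_usage_mb, committed_mb, object_mb, free_cache_mb):
    if publisher_usage_mb + object_mb <= committed_mb:
        return "store_guaranteed"         # conformant traffic
    if object_mb <= free_cache_mb:
        return "store_best_effort"        # excess rides on spare capacity
    return "discard"                      # non-conformant and no spare space

print(police(publisher_usage_mb=0.8, committed_mb=1.0,
             object_mb=0.1, free_cache_mb=5.0))   # -> store_guaranteed
print(police(publisher_usage_mb=1.0, committed_mb=1.0,
             object_mb=0.1, free_cache_mb=5.0))   # -> store_best_effort
print(police(publisher_usage_mb=1.0, committed_mb=1.0,
             object_mb=6.0, free_cache_mb=5.0))   # -> discard
```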
5.3 Hierarchical Resource Sharing
Hierarchical resource sharing, or dynamic storage allocation, also finds its analogy in link sharing in the network transmission context [19]. A content owner may have different classes of objects in its corpus, and may wish to assign different QoS levels to the different classes. The owner can make separate storage reservations, each with different performance requirements, for the different object classes. Alternatively, it can make a single storage reservation that allows real-time control over the allocation of the reserved storage resources to different classes of data objects.

Consider the example of a popular news web site (Figure 2). The size of the entire corpus is 2.5GB, and the publisher classifies the objects into one of three groups. The first group comprises objects deemed critical by the publisher, such as the homepage and its navigational bars, the headline news articles, and the advertising banners. While its current size is 250MB, the publisher expects the size to fluctuate, but not to exceed 500MB. The bulk of the news content (2GB) makes up the second group. Finally, 250MB of corporate information (e.g., press releases, job openings, mugshots of the CEO and VPs) constitutes the third group.
[Figure 2. Hierarchical resource sharing example (not drawn to scale). For each group the figure shows content size, storage quota, and actual allocation: Group 1 (critical: home page, headline news, ad banners): 250MB content, 500MB quota, 250MB allocated; Group 2 (normal: non-headline news articles and accompanying photo/video clips): 2000MB content, 480MB quota, 720MB allocated; Group 3 (corporate information): 250MB content, 20MB quota, 30MB allocated.]

The publisher reserves 1GB of storage capacity and specifies the proportions in which storage will be allocated among the three groups. The publisher wants 100% of the group 1 objects to be in memory, even if the size of the group grows to 500MB. Therefore, group 1 is allotted 500MB, or 50% of the storage quota. Groups 2 and 3 are then assigned 48% and 2% of the quota, respectively. Since there are currently only 250MB of group 1 objects, all of these objects are guaranteed to be in memory. The extra 250MB of group 1's quota will be proportionately shared (at a ratio of 24:1) between groups 2 and 3. Therefore, group 2 gets 480 + 240 = 720MB of storage and group 3 gets 20 + 10 = 30MB. Should additional objects be added to group 1, storage capacity will be reclaimed from groups 2 and 3. This ensures that group 1 objects are always in memory, up to 500MB. Without this resource sharing scheme, the publisher would have to reserve and dedicate 500MB of storage capacity to group 1, even though there are fewer than 500MB of objects most of the time.
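The allocation arithmetic of this example can be reproduced with a short sketch. The redistribution rule follows the text above, while the function itself is our own illustration; a single redistribution round suffices here, though a general implementation would iterate.

```python
# Hierarchical resource sharing: each group's unused quota is redistributed
# to the still-hungry groups in proportion to their quotas.
def allocate(total_mb, quotas, demands):
    alloc = {g: min(quotas[g] * total_mb, demands[g]) for g in quotas}
    spare = total_mb - sum(alloc.values())
    hungry = {g: quotas[g] for g in quotas if alloc[g] < demands[g]}
    weight = sum(hungry.values())
    for g in hungry:                      # proportional share of spare quota
        alloc[g] += spare * hungry[g] / weight
    return alloc

quotas = {"critical": 0.50, "normal": 0.48, "corporate": 0.02}
demands = {"critical": 250, "normal": 2000, "corporate": 250}
print(allocate(1000, quotas, demands))
# -> critical: 250MB, normal: 720MB, corporate: 30MB, matching Figure 2
```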
Using this resource sharing scheme, the publisher can also control the degree of statistical multiplexing to take advantage of reference locality in data access patterns. In the same example, the publisher is able to achieve 100% coverage of group 1 objects (no statistical multiplexing), 36% coverage of group 2 objects, and 12% coverage of group 3 objects. The publisher can increase or decrease the storage quotas for groups 2 and 3 to control the respective hit rates. From this example, it is clear that hierarchical resource sharing is attractive because it gracefully absorbs the "burstiness" in object-class sizes and facilitates user-controlled statistical multiplexing.
5.4 Global Storage Management
While the previous subsections deal with management issues local to the storage nodes, there are also global storage management issues that require study. In the normal operation of the distributed network storage infrastructure, there may be situations that require movement of data objects between storage nodes even after resource mapping and reservation. For example, changes in network status (e.g., network congestion, failed nodes or links) may necessitate the movement of objects to maintain the existing service commitments. Alternatively, opportunities may arise (e.g., termination of existing commitments, addition of new capacity) where data movement can lead to improved resource utilization or load balancing. The scheduling of data migration, replication and dereplication constitutes the scope of global storage management [20].
6. Conclusion
Network caching has been widely deployed on the Internet, and there are many proposals to further improve the performance of the overall network caching system. However, the inevitability of cache misses means that caching is only a best-effort network storage service, and applications that rely on caching will have to live with variations in object access latencies.

The stor-serv framework introduces the concept of Quality-of-Service (QoS) to the network storage domain. When fully realized, stor-serv will be able to deliver fast, consistent (and therefore predictable) data access even under changing traffic patterns and network conditions. Noting the many parallels between the storage and transmission domains, stor-serv borrows heavily from key concepts and principles of the transmission-based intserv and diffserv frameworks. However, the design of stor-serv also raises interesting technical challenges not encountered in the transmission domain.
References

[1] S. Williams, M. Abrams, C. R. Standridge, G. Abdulla, and E. A. Fox, "Removal policies in network caches for World-Wide Web documents," presented at ACM SIGCOMM, 1996.
[2] D. Wessels and K. Claffy, "ICP and the Squid web cache," IEEE Journal on Selected Areas in Communications, vol. 16, pp. 345-357, 1998.
[3] S. Michel, K. Nguyen, A. Rosenstein, L. Zhang, S. Floyd, and V. Jacobson, "Adaptive web caching: towards a new global caching architecture," presented at the Third International WWW Caching Workshop, Manchester, England, 1998.
[4] S. Blake et al., "An architecture for differentiated services," RFC 2475, December 1998.
[5] R. Braden, D. Clark, and S. Shenker, "Integrated services in the Internet architecture: an overview," RFC 1633, June 1994.
[6] J. C.-I. Chuang and M. A. Sirbu, "Distributed network storage with quality-of-service guarantees," presented at Internet Society INET'99, San Jose, CA, 1999.
[7] T. Kelly, Y. M. Chan, S. Jamin, and J. MacKie-Mason, "Biased replacement policies for web caches: differential quality-of-service and aggregate user value," presented at the Fourth International Web Caching Workshop, San Diego, CA, 1999.
[8] Inktomi, "Reverse proxy caching with Traffic Server: the benefits of caching to web hosting providers," 1998.
[9] J. Gwertzman and M. Seltzer, "The case for geographical push-caching," presented at the 5th Workshop on Hot Topics in Operating Systems, May 1995.
[10] J. C.-I. Chuang, "Network transmission and storage: vertical relationships and industry structure," presented at the 27th Telecommunications Policy Research Conference, Alexandria, VA, 1999.
[11] ATM Forum, "Traffic management specification version 4.0," ATM Forum Technical Committee, April 1996.
[12] L. Zhang, S. Deering, D. Estrin, S. Shenker, and D. Zappala, "RSVP: a new resource ReSerVation Protocol," IEEE Network, vol. 7, pp. 8-18, 1993.
[13] J. Rendelman, "Reducing web latency -- Stanford University tries web hosting to boost 'net access," Communications Week, p. 9, 1997.
[14] R. Guerin, H. Ahmadi, and M. Naghshineh, "Equivalent capacity and its application to bandwidth allocation in high-speed networks," IEEE Journal on Selected Areas in Communications, vol. 9, pp. 968-981, 1991.
[15] J. C.-I. Chuang, "Resource allocation for stor-serv: network storage services with QoS guarantees," presented at Network Storage Symposium '99, Seattle, WA, 1999.
[16] M. Labbé, D. Peeters, and J.-F. Thisse, "Location on networks," in Network Routing, Handbooks in Operations Research and Management Science, vol. 8, M. O. Ball et al., Eds. Elsevier Science B.V., 1995.
[17] A. Parekh and R. G. Gallager, "A generalized processor sharing approach to flow control in integrated services networks: the multiple node case," IEEE/ACM Transactions on Networking, vol. 2, pp. 137-150, 1994.
[18] J. J. Kistler and M. Satyanarayanan, "Disconnected operation in the Coda file system," ACM Transactions on Computer Systems, vol. 10, pp. 3-25, 1992.
[19] S. Floyd and V. Jacobson, "Link-sharing and resource management models for packet networks," IEEE/ACM Transactions on Networking, vol. 3, 1995.
[20] A. Schill, "Migration, caching and replication in distributed object-oriented systems: an integrated framework," IFIP Transactions C (Communication Systems), vol. C-6, pp. 309-329, 1992.