A Framework for the Storage and Retrieval of Continuous Media Data

Banu Özden*
Rajeev Rastogi
Avi Silberschatz†

AT&T Bell Laboratories
600 Mountain Avenue
Murray Hill, NJ 07974

Abstract
Continuous media applications require a guaranteed transfer rate of data, which conventional storage servers are not designed to provide. The aim of this paper is to provide a general framework for the design of storage servers that deal with both continuous and non-continuous media data. We present several algorithms for the concurrent transfer of continuous media data for multiple requests with different rates. The algorithms provide high throughput by reducing the seek latency time and by eliminating rotational latency incurred when accessing data on disks. Each of these algorithms is accompanied by an admission control scheme to restrict the number of concurrent requests being serviced at any given time. We also augment these algorithms to support conventional data accesses without violating the rate guarantees of continuous media data requests. Finally, we extend our algorithms to deal with the newer disks, where transfer rates vary from one track to another. The algorithms presented in this paper are used in Fellini, a storage server for continuous and conventional data being implemented at AT&T Bell Laboratories.
1 Introduction
With recent advances in compression schemes and broadband networking, continuous media applications are becoming an integral part of our daily computational life. Examples are news, movies, multimedia electronic-mail, on-demand tutorials, lectures, audio, video and hypermedia documents. Continuous media applications require data to be stored or retrieved at a certain rate. For example, video data compressed using the MPEG-1 standard requires a transfer rate of about 1.5 Mbps. Since conventional storage servers are not geared to handle the demands of applications that deal with continuous media, we are faced with the challenge of redesigning storage servers so that rate guarantees can be provided to clients. Moreover, these storage servers should be able to handle both continuous media (e.g., audio and video) and conventional data (e.g., text, binary files).

* A Ph.D. candidate in the Department of Computer and Electrical Engineering at the University of Texas at Austin.
† On leave from the Department of Computer Sciences at the University of Texas at Austin.
We refer to a request for the continuous transfer of data between the server and a client at a given rate as a real-time request, and a request for conventional transfer of data as a non real-time request. A storage server, thus, must ensure that once a real-time request of a client is accepted, the data transfer between the client and server can be performed at the guaranteed rate. It should also be able to handle a large number of real-time requests with different rates. Furthermore, it should support non real-time requests without violating the timing requirements of real-time requests, and it should not degrade the performance of non real-time requests. Most continuous media data tends to be voluminous. For example, a 100 minute MPEG-1 compressed video requires more than a gigabyte of storage space. This implies that a storage server, which provides access to continuous media clips as well as conventional data, must keep the data on secondary storage devices (disks) and page data into and out of main memory on demand. Disks have relatively low transfer rates (e.g., 40-60 Mbps). Furthermore, since disks have a relatively high latency for data access (e.g., 25-30 ms), effectively utilizing the disk bandwidth, and thus supporting a large number of real-time requests, may result in high buffering requirements. However, since the available buffer space is limited, the only way to support a large number of real-time requests is to develop clever techniques for reducing disk latency. Furthermore, to ensure that a client's real-time request is handled properly, an admission control scheme must be devised to restrict the number of concurrent real-time and non real-time requests being serviced at any given time. The admission controller decides on whether or not to accept (or suspend) a new request.
Once a real-time request is admitted, the corresponding client is provided with a guarantee that it will be able to transfer the corresponding media data continuously at the requested rate. In order to provide rate guarantees, the admission control scheme must accurately model the resource requirements of requests and the available resources, which include the disk bandwidth and buffer space unutilized by the admitted requests. A number of schemes for handling the storage and retrieval of continuous media data from disk have been proposed in the literature [2, 8, 5, 9, 3].
However, none of them addresses the issues of servicing real-time requests with varying rates, reduction of seek and rotational latency, servicing non real-time requests, varying disk transfer rates and non-contiguous storage allocation, in a single framework. In this paper, we provide a general framework for the design of storage servers that deal with both continuous and non-continuous data. Since storing data is the dual problem of retrieving data, we only address data retrieval issues. We present algorithms for buffer management, disk scheduling and admission control. We define a general model for the retrieval of continuous media data that is independent of any specific algorithm, and establish the conditions that any retrieval algorithm must meet to be correct. We present novel schemes to eliminate the overhead of rotational delay completely when admitting real-time requests with varying rate requirements. Since rotational latency is typically about half of seek latency, eliminating it results in a significant increase in the number of real-time requests that can be supported. Furthermore, our algorithms take into account the varying transfer rates of disks, an issue which has not been addressed by most schemes presented in the literature. Exploiting the varying transfer rates improves performance since the transfer rate of outermost cylinders is approximately twice that of the innermost cylinders for the commonly used SCSI disks. Our retrieval algorithms enable non real-time requests to be supported without violating the rate guarantees of real-time requests. We further present methods for utilizing the resources that are allocated for real-time requests, but not used, in order to reduce the response time of requests. Finally, we conclude by outlining directions for future research in the storage and retrieval of continuous media data.
The algorithms presented in this paper are used in Fellini, a storage server for continuous and conventional data being implemented at AT&T Bell Laboratories. Proofs of theorems and lemmas presented in the paper can be found in [6].
2 Retrieving Continuous Media Data
A client that wishes to access continuous data, either over the network or locally, issues a real-time request to the server. Each request R_i has a specified rate r_i associated with it. Upon the arrival of a real-time request, an admission control algorithm is used to decide whether or not to accept the new request. The algorithm is based on the availability of resources and the resource requirements of the requests. Once a request R_i is admitted, the server guarantees that the data can be retrieved from disk into its main memory buffer space in a manner that ensures that the corresponding client can consume data (i.e., read data from the server) at the specified rate r_i.^1 We refer to the algorithm that retrieves data from disk to the buffer space as the retrieval algorithm.^2

1 Unless stated otherwise, we assume that a client never consumes data at a rate greater than the specified rate.
2 The retrieval algorithm includes disk scheduling and buffer management algorithms, depending on the implementation.
After a real-time request R_i is admitted, the data in the buffer is made available to the corresponding client at time b_i. If the client has issued a read request before b_i, the client will receive (i.e., consume) the first portion of the clip data at time b_i. The difference between b_i and the time when request R_i arrived at the server is referred to as the response time of R_i. Once a real-time request R_i has been admitted, the client can subsequently access data belonging to the requested clip by issuing read requests of arbitrary sizes to the server at arbitrary times. If the requested size is greater than the amount of data available in the buffer, the number of bits currently available in the buffer is returned. The read completes when all of the requested data is returned or the end of the clip is reached. We now present a general model for the retrieval of continuous media data from disk to the buffer space of the server, which is independent of any specific retrieval algorithm. In general, data retrieval for each request R_i can be modeled as a sequence of instances, each of which retrieves some number of consecutive bits of the corresponding clip from disk to the buffer space. In the retrieval algorithms we present in this paper, the instances of a request R_i are initiated by the server independent of the read requests of clients. If the number of instances for each request is greater than one, then data retrieval for more than one real-time request can be interleaved; thus, the response times of requests can be reduced and buffer space can be conserved. An instance starts at the time when the amount of data to be retrieved by the instance is determined; it completes when all the data to be retrieved is in the buffer space. We assume that data being retrieved by an instance is available for consumption only when the instance completes, and that instances of a request complete in the order in which they are initiated.
We further assume that the amount of data retrieved by all instances of a request R_i is equal to the size of the corresponding clip. For a request R_i, the number of unconsumed bits in the buffer at time t is denoted by in_buffer_i(t). For a given request, s_j and c_j denote the time when the j-th instance is initiated (i.e., started) and completed, respectively. Furthermore, retrieve_i(j) denotes the amount of data retrieved by the j-th instance of request R_i. We are now in a position to state some basic properties concerning retrieval algorithms. A retrieval algorithm must ensure that each client's request R_i will not starve. A client request R_i starves if, for any b consecutive bits, after the b bits are consumed by the client at time t, the next consecutive bit is not available in the buffer at time t + b/r_i. The system ensures that a client which consumes data at most at the specified rate never starves. However, if a client attempts to consume data at a rate higher than the specified rate, then it is possible for the client request to starve. Proposition 1 below establishes the necessary and sufficient conditions for any retrieval algorithm to prevent starvation.
Proposition 1: A client's request R_i does not starve if and only if the following two conditions hold:
1. Suppose that the k-th instance of R_i is the first instance that completes after time b_i. Then, the amount of data retrieved by the first (k − 1) instances of R_i must be greater than or equal to the amount of data that the client can consume between b_i and the time the k-th instance completed; that is, the following condition must hold:

    sum_{j=1}^{k-1} retrieve_i(j) >= (c_k − b_i) · r_i
2. For every j-th instance, j >= k, except the last instance, the sum of the unconsumed data in the buffer at the beginning of the j-th instance and the amount of data retrieved by the instance is greater than or equal to the amount of data that the client can consume until the end of the next instance; that is, the following condition must hold:

    retrieve_i(j) + in_buffer_i(s_j) >= (c_{j+1} − s_j) · r_i    □

The interval between the initiations of instances and the computation time of an instance depend on the specific retrieval algorithm. Proposition 1 is useful for establishing the correctness of any retrieval algorithm.
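Proposition 1 lends itself to a direct mechanical check. The sketch below (hypothetical function and argument names; times in seconds, sizes in bits) evaluates both conditions against a recorded trace of instance completion times, start times, retrieved amounts, and buffer occupancies:

```python
def no_starvation(r_i, b_i, completes, starts, retrieved, in_buffer):
    """Check Proposition 1 for one request given per-instance traces:
    completes[j], starts[j], retrieved[j] and in_buffer[j] hold c_{j+1},
    s_{j+1}, retrieve_i(j+1) and in_buffer_i(s_{j+1}) in the paper's
    1-based notation (here 0-based)."""
    n = len(retrieved)
    # k: 0-based index of the first instance completing after b_i.
    k = next(j for j in range(n) if completes[j] > b_i)
    # Condition 1: the instances before the k-th cover consumption
    # between b_i and c_k.
    if sum(retrieved[:k]) < (completes[k] - b_i) * r_i:
        return False
    # Condition 2: from the k-th instance onward (except the last),
    # buffered plus retrieved data covers consumption until the next
    # instance completes.
    for j in range(k, n - 1):
        if retrieved[j] + in_buffer[j] < (completes[j + 1] - starts[j]) * r_i:
            return False
    return True
```

For example, with rate 1 bit/s, b_i = 2, and three instances retrieving 2 bits each over [0,1], [2,3], [4,5], both conditions hold; zeroing the first retrieval violates Condition 1.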
3 The CP Class of Retrieval Algorithms
In this section, we present a class of retrieval algorithms, and prove that any algorithm in this class does not result in the starvation of requests. This class contains a number of retrieval algorithms presented in the literature [3, 9] as well as most of the retrieval algorithms we present in this paper. Providing this class simplifies the proofs that our retrieval algorithms are starvation-free, since it suffices to show that the corresponding retrieval algorithm is in this class. The class, which we refer to as the common period (CP) class, consists of those algorithms that ensure that the difference between the start times of any two consecutive instances of an admitted request is at most T, where T is a system parameter referred to as the common period. We define the worst-case completion time for the j-th instance of request R_i to be s_j + max_k {c_k − s_k}, where max_k {c_k − s_k} is the maximum duration of an instance of R_i. The buffer space allocated for each request R_i is

    B_i = (T + max_j {c_j − s_j}) · r_i + d − 1    (1)

where d denotes the smallest unit of retrieval from disk (e.g., a sector). The amount of data retrieved by each instance depends on the consumption rate of the client, but it is limited by

    d_i = ⌈(T · r_i)/d⌉ · d    (2)
The number of bits retrieved by the j-th instance of R_i is calculated as

    retrieve_i(j) = min{ ⌊empty_i(s_j)/d⌋ · d, d_i, remaining_i }    (3)

where empty_i(s_j) denotes the size of the portion of the buffer that is empty at time s_j and remaining_i is the amount of the clip data that has not yet been retrieved from disk at time s_j. Since the amount retrieved by any instance is less than or equal to the empty space in the buffer, any retrieval algorithm in the CP class will not result in overflows. The data is made available to the client after the completion of the first instance and at most T units of time before the worst-case completion time of the second instance. Thus, since d_i bits are retrieved by the first instance, it follows that Condition 1 of Proposition 1 holds. Furthermore, since the difference between the start times of any two consecutive instances is at most T, and the number of bits retrieved during an instance is as described in Equation 3, Condition 2 of Proposition 1 can also be shown to hold.

Theorem 1: If a retrieval algorithm is in CP, then an admitted client request R_i will not starve provided that the client consumes data at a rate equal to or slower than r_i. □

The common period T is selected by the system. The lower bound on the period is determined by the disk controller overhead, which is typically in the range 0.3-1.0 ms [10]. For the Fellini storage server, the common period T is selected so that the number of concurrent requests with the most common rate (e.g., MPEG-1) is maximized, subject to the following two constraints. First, the total buffer requirements of requests must be less than or equal to the available buffer space. Second, the total disk retrieval times of requests must be less than or equal to the common period T. Since the buffering requirements and retrieval times of requests depend on the specific retrieval algorithm, the calculation of the value of T that meets the above constraints is presented in the examples at the end of each section in which a different retrieval algorithm is introduced.
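Equations 1-3 translate directly into a few lines of code. The sketch below (hypothetical names; sizes in bits, times in seconds) computes the buffer allocation B_i, the per-instance cap d_i, and the amount retrieved by an instance:

```python
import math

def buffer_size(T, r_i, d, max_instance_time):
    """Buffer allocated to a request of rate r_i (Equation 1):
    B_i = (T + max_j{c_j - s_j}) * r_i + d - 1."""
    return (T + max_instance_time) * r_i + d - 1

def instance_cap(T, r_i, d):
    """Upper bound on bits retrieved per instance (Equation 2):
    T * r_i rounded up to a multiple of the retrieval unit d."""
    return math.ceil(T * r_i / d) * d

def retrieve_amount(empty, d_i, remaining, d):
    """Bits retrieved by one instance (Equation 3): limited by the
    empty buffer space rounded down to the unit d, by the cap d_i,
    and by the clip data still on disk."""
    return min((empty // d) * d, d_i, remaining)
```

For instance, with T = 2 s, r_i = 1.5 Mbps and a 512-byte sector, the cap d_i comes out to 733 sectors (3,002,368 bits).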
The CP class contains a large number of retrieval algorithms, including: a- The retrieval algorithm where each client periodically issues read requests with a common period and where the server schedules the read requests with the earliest deadline first algorithm [9]. b- The first come first served algorithm where the system generates instances for real-time requests in the order of their arrival in a cycle [3]. In the following sections, we present additional retrieval algorithms that fall in the CP class. We initially assume that continuous media clips are stored contiguously on disk; we relax this assumption in Section 9.
4 The ER Retrieval Algorithm
In this section, we present a new retrieval algorithm, the early response (ER) algorithm, which is in the CP class. The algorithm yields better performance in terms of response time than the first come first served algorithm, in which requests are serviced in the order in which they are admitted. The server maintains a list of admitted requests referred to as the service list. The instances for the different requests are initiated in the order of their appearance in the list. That is, once an instance for a request in the list is initiated and completed, the next instance to be initiated is the one associated with the next request in the list at that time. The sequence in which at most one instance per request in the service list is executed is referred to as a service cycle. A service cycle starts when an instance of the first request in the service list is initiated; it ends once the initiated instance of the last request in the list completes. A new service cycle begins immediately after the previous service cycle ends. Thus, in contrast to other retrieval algorithms presented in the literature [3, 9], which assume that service cycles are of a fixed duration, in most of the retrieval algorithms we present in this paper, the length of service cycles can vary, but is bounded above by the common period T. This property is crucial to providing lower response times and reducing the buffering requirements for each request. When a new request R_i is admitted, it is placed in the service list. The location where R_i is inserted in the service list is determined as follows. If there is a request admitted in the current service cycle before R_i that has not yet been serviced, then R_i is inserted in the service list after the last request admitted in the current service cycle. Otherwise, R_i is inserted immediately after the request being serviced. Thus, since the first come first served algorithm always appends a newly admitted request to the end of the service list, the ER algorithm yields a better response time. The motivation for ordering the service list as specified above is as follows.
Ideally, the response time of a newly admitted request R_i should depend on the instance duration of only those requests that are admitted in the same service cycle, but before R_i. This is so because it is irrelevant for a request, which is admitted in any of the previous service cycles, when this request is serviced in the current service cycle, as long as it is serviced within time T of being serviced in the previous service cycle. Thus, in order to reduce the response time, a new request should be serviced immediately after all other new requests admitted in the current cycle are serviced, without waiting for requests that were admitted in a previous service cycle. Once an instance completes, the amount of data to be retrieved by the instance for the next request to be serviced is calculated using Equation 3 and the data is retrieved from disk. In the worst case, an instance of request R_i could take time equal to the worst-case disk latency plus the transfer time of d_i bits^3, which is:

    max_j {c_j − s_j} = d_i/r_disk + t_lat    (4)

where t_lat is the maximum latency and r_disk is the bandwidth (transfer rate) of the disk. For the widely used SCSI (Small Computer Systems Interconnect) disk drives, the transfer rate of inner tracks is lower than that of the outer tracks. Since in this section we do not want to rely on the physical disk layout, we assume the smallest value for r_disk. The size of the buffer B_i for each request R_i is calculated as described in Equation 1:

    B_i = (T + d_i/r_disk + t_lat) · r_i + d − 1

3 Data in disks is stored in a series of concentric circles, or tracks, and accessed using a disk head. Disks rotate on a central spindle and the speed of rotation determines the transfer rate of disks. Disk latency is the time to access data on a particular track by positioning the head on (also referred to as seeking to) the track containing the data, and then waiting until the disk rotates enough so that the head is positioned directly above the data.
To ensure that the duration of a service cycle never exceeds T, a new request R_m is admitted if there is enough buffer space and if the following formula holds:

    sum_{i=1}^{m} d_i/r_disk + m · t_lat <= T    (5)
where R_1, ..., R_{m−1} are the requests that are contained in the service list. Once the last instance of a request R_i completes in a service cycle, in the next service cycle, after instances of requests before R_i in the list have completed, R_i is deleted from the service list. Delaying the deletion of R_i until the next service cycle, in conjunction with Equation 5 and the property that a service cycle begins immediately after the previous service cycle ends, ensures that servicing new requests early does not cause the difference between the start times of two consecutive instances of a request to exceed T. The ER algorithm makes the data available to a client after the completion of the first instance. Although ER yields a worst-case response time of T, the average response time will be less. The ER algorithm guarantees that no client request will starve. To prove this, it is sufficient to prove that the ER algorithm is in CP. To do so, we need to show that the maximum difference between the start times of two consecutive instances of a request is at most equal to T and b_i is at most T units of time before the worst-case completion time of the second instance.

Theorem 2: The ER algorithm is in CP. □

Example 1: Consider a system with the following characteristics: d = 512 bytes, r_disk = 60 Mbps, t_lat = 25.5 ms and 10 MB of buffer space. For ER, an almost optimal value of T that supports close to the maximum number of MPEG-1 requests (r_i = 1.5 Mbps) is T = 2 sec. The value for T is computed as follows. Since B_i is the buffer size per request, the maximum number of requests that can be supported is 10 MB/B_i. Substituting (T + d_i/r_disk + t_lat) · r_i + d − 1 for B_i and 10 MB/B_i for m in Equation 5 and then solving for T yields the value of T that supports the maximum number of streams. Thus, d_i ≈ 3 Mb, d_i/r_disk + t_lat = 75.5 ms and B_i = 389.16 KB. The maximum number of MPEG-1 requests that can be concurrently supported is 25, and the buffer space required to support the requests is 9.72 MB. The worst-case response time is 2 sec. However, if we assume that at most one request arrives every 75.5 ms, then the worst-case response time is approximately 151 ms. □
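The figures in Example 1 can be reproduced with a short calculation, sketched below under decimal unit conventions (Mbps = 10^6 bits/s, MB = 10^6 bytes; rounding conventions cause small differences from the quoted buffer size):

```python
import math

# Example 1 parameters (ER algorithm), decimal units assumed.
d = 512 * 8          # smallest retrieval unit: one 512-byte sector, in bits
r_disk = 60e6        # disk bandwidth, bits/s
t_lat = 25.5e-3      # worst-case disk latency, s
r_i = 1.5e6          # MPEG-1 rate, bits/s
T = 2.0              # common period, s
buf = 10e6           # total buffer space, bytes

d_i = math.ceil(T * r_i / d) * d              # Equation 2
instance = d_i / r_disk + t_lat               # Equation 4: about 75.5 ms
B_i = ((T + instance) * r_i + d - 1) / 8      # Equation 1, in bytes

# Equation 5 with m identical streams reduces to m * instance <= T;
# the buffer constraint is m * B_i <= buf.  Take the tighter of the two.
m = min(int(T // instance), int(buf // B_i))
```

With these conventions the calculation admits 25 concurrent MPEG-1 streams, as in the example, with the buffer constraint binding.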
5 Seek Time Optimization
The ER algorithm services the requests with respect to their arrival without trying to optimize the disk arm movement. As a result, it calculates the length of each instance based on worst-case assumptions. Since the maximum and minimum seek times typically differ by an order of magnitude (e.g., maximum seek time is 25 ms and minimum seek time is 1 ms), by reducing seek time, a larger number of requests can be satisfied. In this section, we present a retrieval algorithm that optimizes the disk arm movement so that the number of requests that can be admitted concurrently is increased. The new algorithm, termed C-LOOK^4, differs from the ER algorithm in the order in which the admitted requests are maintained and in the admission control formula. The C-LOOK algorithm differs from other retrieval algorithms presented in the literature [3, 9] that use scan to reduce the overhead of seek time as follows. First, the C-LOOK algorithm models seeks from one request to another independent of their locations on the disk. Second, a service cycle starts immediately after the previous one is completed, without waiting for the completion of the common period. Disk latency consists of two components: seek time and rotational delay. The maximum value of disk latency t_lat consists of the maximum value of seek time, denoted by t_seek, and the maximum value of rotational delay, denoted by t_rot. A seek is composed of a speedup, coast, slowdown and a settle time. Very short seeks (less than two to four cylinders) are dominated by the settle time (e.g., 1-3 ms), and long seeks are dominated by the coast time, which is proportional to the seek distance [10]. We model each seek as a linear function of the distance traveled plus a constant overhead denoted by t_settle, which consists of factors such as settle time. The C-LOOK algorithm orders requests in the service list in the order of the positions on disk of the tracks they retrieve data from.
Furthermore, it initiates the instances of requests in this order. Note that since continuous media clips are stored contiguously, the relative position of two requests in the service list does not change from one service cycle to another. In C-LOOK, the worst-case distance that the disk head travels during a service cycle is twice the distance between the outermost and the innermost track. Thus, we can model the worst-case aggregate seek time during one period as a constant overhead dominated by

4 The name is selected since the algorithm minimizes the seek time similar to the known C-LOOK disk scheduling algorithm [11].
the worst-case settle time t_settle for each of the m requests in the service list plus twice the maximum seek time t_seek, namely, m · t_settle + 2 · t_seek. The reason for the two maximum seeks is as follows. One t_seek in this formula covers, for all seeks, the part of the seek time due to the linear function of the distance traveled. The other t_seek covers the overhead of moving the disk arm from its position after servicing the last request in the list to the position which will be accessed to service the first request in the next service cycle. From now on, we will model the overhead of any m seeks in one direction simply by m · t_settle + t_seek. Thus, in C-LOOK, to keep the duration of a service cycle from exceeding T, a request R_m is admitted if

    sum_{i=1}^{m} d_i/r_disk + m · (t_rot + t_settle) + 2 · t_seek <= T    (6)
As a result, since requests are deleted as described earlier for the ER scheme, and the relative position of requests does not change from one service cycle to another, the difference between the start times of two consecutive instances of a request does not exceed T.

Theorem 3: C-LOOK is in CP. □

Compared to ER, C-LOOK reduces the total seek time by at least m · (t_seek − t_settle) − 2 · t_seek. In the following example, we show that the reduction in seek time and in the upper bound used by admission control can yield significant increases in throughput.

Example 2: Consider the system in Example 1. Suppose that t_seek = 17 ms, t_rot = 8.5 ms and t_settle = 1 ms. For C-LOOK and MPEG-1 requests, an almost optimal value of T is T = 1.5 sec. In this case, the additional overhead of each MPEG-1 request is 47 ms and the size of the buffer for each request is B_i = 293.06 KB. Thus, the number of MPEG-1 requests that can be supported concurrently is 31, and the total buffer space for the requests is 9.08 MB. The worst-case response time is 1.5 sec. Even if we assume that at most one request arrives every 75.5 ms, the worst-case response time stays 1.5 sec. □
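Example 2's throughput figure follows from Equation 6 in the same way; a sketch (decimal units, m identical MPEG-1 streams assumed):

```python
import math

d = 512 * 8                      # retrieval unit, bits
r_disk = 60e6                    # disk bandwidth, bits/s
r_i = 1.5e6                      # MPEG-1 rate, bits/s
t_seek, t_rot, t_settle = 17e-3, 8.5e-3, 1e-3
T = 1.5                          # common period, s

d_i = math.ceil(T * r_i / d) * d
# Per-request term of Equation 6: transfer time plus worst-case
# rotational delay plus settle time; about 47 ms per MPEG-1 stream.
per_req = d_i / r_disk + t_rot + t_settle
# Equation 6 for identical streams: m * per_req + 2 * t_seek <= T.
m = int((T - 2 * t_seek) // per_req)
```

This yields m = 31 concurrent streams, matching the example.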
6 Optimization of Rotational Delay
In previous sections, we assumed the worst-case rotational delay for each disk access. This is an overly pessimistic assumption, since most modern disks have some form of read-ahead cache which may reduce the rotational delay substantially for one or more real-time requests. However, in order to guarantee the promised rates for each admitted request, a continuous media storage server must compute a deterministic upper bound on the total time spent on disk latency. In the following subsections, we present algorithms that reduce the time spent due to rotational delay for a request and yield a smaller upper bound on the service time of each instance of a request. Since the worst-case rotational latency is about half as much as the worst-case seek latency, eliminating rotational latency results in a significant increase in the number of real-time requests that can be supported concurrently.
6.1 The FBF Algorithm
The FBF algorithm is based on two key ideas. First, during every instance except the last one, an integral number of tracks is retrieved. During the last instance, the remaining data of the clip is retrieved. Second, each retrieval starts immediately at the time when the disk head arrives at the track containing the data, without waiting for the beginning of the data (this is similar to the concept of "on-arrival read-ahead" caching of an entire track that exists in some disk technologies). FBF restricts the storage allocation as follows. First, if the size of a clip is equal to or greater than the storage capacity of a track, then the first portion of the clip occupies a full track. Second, if a clip does not fill a track, the remaining portions of the track can be used to store conventional data, which do not need to be retrieved at a guaranteed rate. FBF can be incorporated into any of the previously presented algorithms by simply selecting the value of d equal to the track size, namely, d = r_disk · t_rot. Thus, at the cost of increasing the buffer size by the track size, for each request R_i the worst-case service time excluding the seek time can be reduced. Since no rotational latency is incurred, a new request is admitted if

    sum_{i=1}^{m} d_i/r_disk + L <= T    (7)
where L is the aggregate seek time depending on the order of the service list. Thus, if the list is maintained in ER order, then L is equal to m · t_seek; if it is maintained in C-LOOK order, then L is m · t_settle + 2 · t_seek. Note that if T · r_i is a multiple of the track size, then in Formula 7, the time to service request R_i includes only the seek latency and the time to retrieve T · r_i bits. On the other hand, if T · r_i is not a multiple of the track size, then the service time for R_i includes time to retrieve d_i > T · r_i bits. As a result, in certain service cycles, fewer than d_i bits are retrieved for R_i. The disk bandwidth that is allocated to a real-time request but unutilized by that request can be used for non real-time requests as described in Section 9.4.

Example 3: Consider the system in Example 2. The track size can be calculated as 510 Kb. If FBF is used with C-LOOK, an optimal value of T is 1 sec. Thus, d_i = 1.53 Mb and B_i = 259.12 KB. Also, the maximum number of MPEG-1 requests that can be supported concurrently is 36, and the buffer space required is 9.32 MB. □
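Example 3 can be checked the same way; the sketch below (decimal units) sets the retrieval unit d to the track size r_disk · t_rot and applies Formula 7 with the C-LOOK value of L:

```python
import math

r_disk = 60e6                    # disk bandwidth, bits/s
r_i = 1.5e6                      # MPEG-1 rate, bits/s
t_seek, t_rot, t_settle = 17e-3, 8.5e-3, 1e-3
T = 1.0                          # common period, s

track = r_disk * t_rot           # FBF retrieval unit: one track (510 Kb)
d_i = math.ceil(T * r_i / track) * track   # Equation 2 with d = track
# Formula 7 with L = m * t_settle + 2 * t_seek for m identical streams:
# m * (d_i / r_disk + t_settle) + 2 * t_seek <= T.
per_req = d_i / r_disk + t_settle
m = int((T - 2 * t_seek) // per_req)
```

This gives a 510 Kb track, d_i = 1.53 Mb and m = 36 concurrent streams, matching the example.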
6.2 The RLE Algorithm
In the FBF scheme, the time allocated for each request R_i in Formula 7 is the time it takes to retrieve d_i bits, where d_i is the smallest number that is a multiple of the track size and greater than or equal to T · r_i.
Thus, if T · r_i is only a few bits more than an integral number of tracks, the time that is allocated for R_i will be close to that allocated in a scheme that does not try to reduce the rotational delay. This is because close to t_rot transfer time is allocated for the few bytes that cause T · r_i to exceed a multiple of the track size. To overcome this deficiency, we propose a new scheme, called the RLE scheme, which ensures that no rotational latency overhead is incurred when servicing requests. Furthermore, the time allocated for each request R_i only consists of the transfer time for T · r_i bits and the seek time overhead. Thus, the RLE scheme admits at least as many (or more) requests than FBF does. None of the schemes proposed in the literature eliminates the overhead of rotational latency when admitting real-time requests for which the data to be retrieved, T · r_i, is not a multiple of the track size. However, compared to FBF, the RLE scheme has higher buffer requirements and results in higher response times for requests. In this scheme, a request R_m is admitted if it satisfies the following equation:

    sum_{i=1}^{m} (T · r_i)/r_disk + L <= T    (8)
where L is the aggregate disk latency (excluding rotational latency) incurred in servicing the m requests. Requests can be inserted into and deleted from the service list as in the ER or C-LOOK schemes. If ER is used, then L is m t_seek; on the other hand, if C-LOOK is used, then L is m t_settle + 2 t_seek. We now describe the scheme for servicing admitted requests in the list. As in FBF, in RLE the data retrieved during an instance is always an integral number of tracks, and the data is retrieved as soon as the head is positioned over the track containing the data. Thus, no rotational latency is incurred. However, unlike FBF, the data retrieved during an instance is always less than or equal to, and within a track of, the data to be retrieved (later in this subsection, we describe how the data to be retrieved for an instance is computed). We refer to the difference between the data retrieved and the data to be retrieved for an instance as the slack for the instance. In case there is slack at the end of an instance, the data to be read during the next instance is increased by the amount of slack, so that additional bits can be read in order to eliminate the slack. Thus, the slack cannot accumulate indefinitely and is always less than the size of a track. Also, at the start, consumption of data for a request is delayed until an additional track of data is retrieved into its buffer. The additional track ensures that bits missing due to slack are available for consumption until the slack is made up in a subsequent instance. Note that retrieving additional bits corresponding to the slack during an instance does not cause other requests to starve. The reason for this is that since the slack is simply the number of bits fewer that were read for a request, if there is slack for a request, then the service time for the request decreases by the time to retrieve the
slack from disk. Thus, every service cycle terminates earlier by the sum of the times taken to retrieve the slack bits for every request, and consequently, every request is serviced earlier by the time it takes to retrieve from disk the sum of all the slacks. As a result, when a request Ri is serviced, its buffer contains additional bits equal to r_i times the time to retrieve the sum of all the slacks. Thus, for a request, retrieving additional bits from disk equal to the slack does not result in other requests starving. Also, if the number of admitted requests is m, in the worst case the total slack can be at most m times tck_size, where tck_size denotes the size of a track. As a result, the additional bits in the buffer for request Ri in the worst case could be m tck_size r_i / r_disk. Thus, the size of the buffer for request Ri required by the RLE scheme is T r_i + tck_size + m tck_size r_i / r_disk.

With every request Ri, the slack for the request is maintained, which we denote by δ_i. Also, we denote the value of δ_i at time t by δ_i(t). Initially, δ_i is set to 0, that is, δ_i(s_1) = 0. The amount of data retrieved for a request Ri in the j-th instance is computed as follows. In order to prevent starvation, the requirement is that at the completion of the j-th instance, the buffer for Ri contains at least

T r_i + tck_size − δ_i(c_j) + r_i (Σ_k δ_k(c_j)) / r_disk

bits (the tck_size bits make up for the slack δ_i(c_j), and (Σ_k δ_k(c_j)) / r_disk is the time to retrieve from disk the cumulative slack for all the admitted requests at time c_j). This requirement can be satisfied if, during the j-th instance, the maximum number of tracks is retrieved such that the following two conditions hold:

1. retrieve_i(j) ≤ T r_i + δ_i(s_j).

2. in_buffer_i(c_j) ≤ T r_i + tck_size + tck_size r_i / r_disk + r_i (Σ_{k≠i} δ_k(s_j)) / r_disk.

The first condition ensures that the amount of data retrieved for request Ri is at most T r_i + δ_i(s_j). The second condition is an optimization that enables fewer bits to be retrieved if the fewer bits ensure that, finally, the buffer for Ri contains the required number of bits (e.g., in case there are very few requests). In the second condition, we use tck_size r_i / r_disk instead of δ_i(c_j) r_i / r_disk since the slack at the end of the instance is not known when it starts, but is always less than tck_size. On completion of an instance, the slack δ_i(c_j) is set to 0 if all the data for Ri has been retrieved. Else, δ_i(c_j) is set to

min{ T r_i + δ_i(s_j) − retrieve_i(j),  T r_i + tck_size + tck_size r_i / r_disk + r_i (Σ_{k≠i} δ_k(s_j)) / r_disk − in_buffer_i(c_j) }        (9)

The first element of the set in Equation 9 ensures that the new slack is always less than or equal to the difference between the amount to be retrieved and the amount read. The second element is an optimization that sets the slack to an even lower value in case the requirements on in_buffer_i(c_j) are satisfied. Finally, for a new request Ri, consumption of bits from the buffer is initiated on completion of the j-th instance if in_buffer_i(c_j) is at least T r_i + tck_size − δ_i(c_j) + r_i (Σ_k δ_k(c_j)) / r_disk bits. This ensures that in case additional slack bits are retrieved for other requests, request Ri does not starve. Note that, for a new request, in the first instance, the data retrieved is within a track of T r_i, and the slack δ_i(c_j) is the difference between T r_i and the amount read. As a result, consumption of bits from the buffer would be delayed at least until the completion of the second instance.

We are now in a position to show that, on completion of an instance, the buffer contains enough bits to prevent starvation. To do so, we make the following observations. First, since the maximum number of tracks satisfying Conditions 1 and 2 is read, δ_i(c_j) is always less than tck_size. Second, due to Equation 9, it follows that δ_i(c_j) ≤ T r_i + δ_i(s_j) − retrieve_i(j).

Lemma 1: Once the consumption of bits from Ri's buffer has begun, at the completion of the j-th instance, in_buffer_i(c_j) is at least T r_i + tck_size − δ_i(c_j) + r_i (Σ_k δ_k(c_j)) / r_disk.

From Lemma 1, it follows that requests cannot starve. The reason for this is that the data retrieved during any j-th instance of Ri is at most T r_i + δ_i(s_j) − δ_i(c_j), while at c_j, the data in the buffer is at least T r_i + tck_size − δ_i(c_j). Since tck_size > δ_i(s_j), it must be the case that there is at least one unconsumed bit in the buffer when the instance completes. Thus, no requests starve.

When computing the number of bits retrieved for the j-th instance of request Ri, we presented Condition 2 in terms of in_buffer_i(c_j). However, since in_buffer_i(c_j) is not known at s_j, in the following we restate Condition 2 in terms of in_buffer_i(s_j). Let l be the time to seek from the track on which the head is positioned at s_j to the track containing data for Ri. Thus, Condition 2 can be rewritten as follows:

retrieve_i(j) + in_buffer_i(s_j) ≤ T r_i + tck_size + tck_size r_i / r_disk + r_i (Σ_{k≠i} δ_k(s_j)) / r_disk + (retrieve_i(j) / r_disk + l) r_i

where retrieve_i(j) / r_disk + l is the time to retrieve the retrieve_i(j) bits.
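Conditions 1 and 2 together with the slack update of Equation 9 can be sketched as follows. This is a simplified model, not the paper's implementation: it ignores consumption during the instance, so in_buffer_i(c_j) is approximated by in_buffer_i(s_j) plus the bits just read, and all names are illustrative.

```python
def rle_instance(T, r_i, tck_size, r_disk, delta_i, other_slacks, in_buffer):
    """Pick the number of whole tracks to read (Conditions 1 and 2)
    and compute the new slack per Equation 9."""
    # Condition 2 bound on the buffer contents at completion.
    bound = (T * r_i + tck_size + tck_size * r_i / r_disk
             + r_i * sum(other_slacks) / r_disk)
    tracks = 0
    while True:
        nxt = (tracks + 1) * tck_size
        if nxt > T * r_i + delta_i:       # Condition 1
            break
        if in_buffer + nxt > bound:       # Condition 2 (optimization)
            break
        tracks += 1
    retrieved = tracks * tck_size
    # Equation 9: the new slack is the smaller of the shortfall and the
    # headroom left under the Condition 2 bound.
    new_delta = min(T * r_i + delta_i - retrieved,
                    bound - (in_buffer + retrieved))
    return retrieved, new_delta
```

For instance, with T r_i equal to 2.5 tracks and no prior slack, two whole tracks are read and half a track of slack remains, to be made up in the next instance.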
7 Non Real-Time Requests
Until now, we have confined our discussion to real-time requests. However, not all requests require rate guarantees; these requests are referred to as non real-time requests. For example, requests to retrieve
non-continuous data (e.g., text, images) or certain requests to store/retrieve continuous media data (e.g., edits) do not require rate guarantees to be met. In this section, we present methods for incorporating non real-time requests into the previously proposed schemes. The challenge is to provide low response times to non real-time requests without jeopardizing the promised transfer rates of real-time requests. We assume that each non real-time request Ni has associated with it a retrieval size n_i, which denotes the amount of data requested by the client. The response time of Ni is the difference between its arrival time and the time when the last bit of the requested n_i bits is returned to the client. In order to ensure that non real-time requests are not blocked for a long time by real-time requests, some of the disk bandwidth as well as buffer space must be reserved for them. We achieve this by allocating for real-time requests only a fraction of the common period, namely α T, where α is a system parameter between 0 and 1. Thus, a real-time request Rm is admitted if the following formula is satisfied for the admitted real-time requests R1, ..., R_{m−1}:

Σ_{i=1}^{m} d_i / r_disk + L ≤ α T        (10)
where L is m t_lat in case of the ER scheme; L is 2 t_seek + m (t_rot + t_settle) in case of the C-LOOK scheme; and L is 2 t_seek + m t_settle in case of FBF with C-LOOK and RLE with C-LOOK. Furthermore, for the RLE scheme with C-LOOK, d_i in the above formula is replaced by T r_i. In addition, a new request is admitted if, for the resulting admitted real-time requests R1, ..., Rm and non real-time requests N1, ..., Nq, the following formula holds:

Σ_{i=1}^{m} d_i / r_disk + Σ_{i=1}^{q} n_i / r_disk + L′ ≤ T        (11)
where L′ is the aggregate latency overhead of servicing the m + q requests⁵. For example, in case of ER, L′ = (m + q) t_lat; for the C-LOOK scheme, L′ = 2 t_seek + (m + q) (t_rot + t_settle); and finally, for the FBF with C-LOOK scheme and the RLE scheme with C-LOOK, L′ = 2 t_seek + (m + q) t_settle + q t_rot. Furthermore, for the latter scheme, d_i in the above formula is replaced with T r_i. Thus, admitted real-time requests need to satisfy both Formulas 10 and 11, while admitted non real-time requests need to satisfy only Formula 11. As a result, by allocating only a portion of the service cycle for real-time requests, we can ensure that a certain amount of time in every service cycle can be used to service non real-time requests. However, in the scheme, non real-time requests are allowed to utilize

5 Non real-time requests that require more disk time than is available in a service cycle are decomposed into multiple smaller requests.
the disk time in a service cycle that is reserved for real-time requests but unutilized by real-time requests, whereas real-time requests are not allowed to utilize the disk time in a service cycle that is reserved for non real-time requests but unutilized by non real-time requests. The rationale behind this decision is that we expect most real-time requests to last more than one service cycle; if the time in a service cycle reserved but unused by non real-time requests were assigned to real-time requests, non real-time requests might be blocked for more than one service cycle, which could result in poor response times. On the other hand, time allocated to non real-time requests is available after time α T. Thus, by permitting time reserved for real-time requests to be utilized for processing non real-time requests, we permit the effective utilization of disk bandwidth without degrading response times for either real-time or non real-time requests. Two waiting lists are maintained, one for real-time requests and another for non real-time requests. The lists contain requests that are not admitted upon arrival. Admitted requests are inserted into the service list as described for the various schemes. Furthermore, deletion of non real-time requests from the list is carried out as for real-time requests, treating them as real-time requests with a single instance. Upon deletion of a request, as many real-time requests as possible from the waiting list are admitted, following which as many non real-time requests as possible are admitted⁶. Formula 11 ensures that real-time requests do not starve.
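The two-level admission test of Formulas 10 and 11 can be sketched as below. The latency terms L and L′ are passed in, since they depend on the scheduling scheme in use; the function names are illustrative, not from the paper.

```python
def admits_real_time(d_sizes, n_sizes, r_disk, T, alpha, L, L_prime):
    """A real-time request set is admissible only if Formula 10 (the
    fraction alpha*T reserved for real-time work) and Formula 11 (the
    whole cycle, including non real-time transfers) both hold."""
    f10 = sum(d_sizes) / r_disk + L <= alpha * T
    f11 = sum(d_sizes) / r_disk + sum(n_sizes) / r_disk + L_prime <= T
    return f10 and f11

def admits_non_real_time(d_sizes, n_sizes, r_disk, T, L_prime):
    """Non real-time requests need to satisfy only Formula 11."""
    return sum(d_sizes) / r_disk + sum(n_sizes) / r_disk + L_prime <= T
```

Note how shrinking alpha can reject a real-time request even when the cycle as a whole has room: that is exactly the reservation that keeps non real-time response times low.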
8 Variable Transfer Rates
The commonly used SCSI disks do not provide a uniform transfer rate. The storage capacity of a track in such disks is proportional to its length, and thus so is its transfer rate. Since an inner track is shorter than an outer track, the transfer rate of an inner track may be less than that of an outer one. Most disks utilize a technique called zoning, which groups the tracks into a number of zones such that tracks in each zone contain the same number of bits [10]. Until now, we have used the transfer rate of the innermost track, that is, the minimum transfer rate, as the value for r_disk. However, if the physical layout of clips is known, a more accurate value can be used for the transfer rate of a clip's data. Thus, the admission control can calculate the service time of requests that retrieve data from outer zones more accurately, and can therefore accept more requests. For example, in the Seagate Elite 9 SCSI disk drive, the transfer rate of the outermost track is almost twice the transfer rate of the innermost track [1]. Let rmin_i be the minimum transfer rate for the clip that request Ri retrieves; that is, rmin_i is the transfer rate of the innermost zone on which a portion of the clip resides. Thus, for the ER and C-LOOK algorithms, since the transfer rate for Ri's data is at least rmin_i, the transfer rate r_disk in Equations 4, 5 and 6 can be replaced with rmin_i.

In the case of the rotational delay optimizations, dealing with varying track sizes is more involved. The reason for this is that since, in those algorithms, the unit of retrieval d is equal to the track size, the unit of retrieval changes from one instance of the request to another. However, the CP class as defined in Section 3 assumes that the unit of retrieval d is a constant. In the following, we extend the CP class to contain retrieval algorithms for which the size of a unit of retrieval may change from instance to instance, but the transfer time for a unit is constant (e.g., if the unit of retrieval is a track, then the transfer time is t_rot). Let dmin_i and dmax_i be the minimum and maximum sizes of units retrieved for request Ri, and let t be the time to retrieve a unit. We modify Equations 1, 2 and 3 in the definition of the CP class as follows. We replace d by dmax_i in Equation 1, which gives the buffer size for request Ri. In Equation 2, d_i is set to the maximum number of units that T r_i bits can span. Thus, the maximum amount to be retrieved, d_i, is ⌈T r_i / dmin_i⌉ units. The number of bits retrieved by the j-th instance satisfies the following conditions (instead of Equation 3):

1. Every instance of a request Ri retrieves zero or more units.

2. retrieve_i(s_j) ≤ empty_i(s_j).

3. The number of units retrieved is less than or equal to ⌈T r_i / dmin_i⌉.

6 By maintaining waiting lists in FIFO order, and by not allocating disk bandwidth reserved for real-time requests to non real-time requests while the waiting list for real-time requests is nonempty, fairness in the admission of real-time and non real-time requests can be ensured.
Note that the CP class as defined in Section 3 is the special case of the above definition in which dmin_i = dmax_i = d. It can easily be shown that no algorithm in this class results in starvation. For the FBF scheme, since the retrieval unit is a track, t = t_rot, dmin_i = rmin_i t_rot and dmax_i = rmax_i t_rot, where rmax_i is the transfer rate of the outermost track on which the clip accessed by Ri resides. Thus, since the worst-case service time for request Ri is ⌈T r_i / (rmin_i t_rot)⌉ t_rot + t_seek, the buffer size B_i for request Ri is

B_i = (T + ⌈T r_i / (rmin_i t_rot)⌉ t_rot + t_seek) r_i + rmax_i t_rot − 1.
Data retrieved by each instance satisfies the above three conditions, and a new request Rm is admitted if

Σ_{i=1}^{m} ⌈T r_i / (rmin_i t_rot)⌉ t_rot + L ≤ T        (12)
holds, where L is the aggregate seek overhead, which depends on the ordering of the requests. The RLE scheme can be extended to handle variable track sizes as follows. In Equation 8, r_disk is replaced by rmin_i. Also, since the maximum value of the slack can never exceed rmax_i t_rot, the size of the buffer allocated for request Ri is T r_i + rmax_i t_rot + r_i Σ_{k=1}^{m} (rmax_k t_rot) / rmin_k. Also, when retrieving data for the j-th instance, in the second condition we require that in_buffer_i(c_j) be less than or equal to T r_i + rmax_i t_rot + rmax_i t_rot r_i / rmin_i + r_i Σ_{k≠i} δ_k(s_j) / rmin_k. Similarly, the second element in Equation 9 becomes T r_i + rmax_i t_rot + rmax_i t_rot r_i / rmin_i + r_i Σ_{k≠i} δ_k(s_j) / rmin_k − in_buffer_i(c_j), and consumption of bits is initiated if in_buffer_i(c_j) is at least T r_i + rmax_i t_rot − δ_i(c_j) + r_i Σ_k δ_k(c_j) / rmin_k.
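For the zoned-disk FBF variant, the worst-case number of track units, the buffer size B_i, and the admission test of Equation 12 can be sketched as below (a sketch with illustrative names; rates are in bits per second and times in seconds):

```python
import math

def fbf_units(T, r_i, r_min_i, t_rot):
    """Worst-case number of track units needed for T*r_i bits, assuming
    every unit may come from the slowest (innermost) zone of the clip."""
    return math.ceil(T * r_i / (r_min_i * t_rot))

def fbf_buffer_size(T, r_i, r_min_i, r_max_i, t_rot, t_seek):
    """B_i = (T + worst-case service time) * r_i + dmax_i - 1 bits."""
    service = fbf_units(T, r_i, r_min_i, t_rot) * t_rot + t_seek
    return (T + service) * r_i + r_max_i * t_rot - 1

def fbf_admit_zoned(reqs, T, t_rot, L):
    """Equation 12: worst-case transfer times plus seek overhead L <= T.
    reqs is a list of (r_i, r_min_i) pairs."""
    total = sum(fbf_units(T, r, rmin, t_rot) * t_rot for r, rmin in reqs)
    return total + L <= T
```

Using rmin_i rather than the disk-wide minimum rate is what lets the admission control accept more requests for clips laid out on outer zones.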
9 Non Contiguous Storage Allocation
Until now, we have assumed that continuous media clips are stored contiguously on disk. From the storage point of view, a more effective scheme is to view the disk as a sequence of blocks of size b. A list of free blocks is maintained, and blocks from the list are allocated to store a continuous media clip. When the clip is deleted, the blocks allocated to it are appended to the free list. By choosing a suitable value for the block size b, fragmentation can be reduced and the storage space can be utilized more effectively. However, since the blocks allocated to a clip may not be contiguous, retrieving data that spans multiple blocks for an instance results in seeks between consecutive blocks. Thus, an instance of a request during a service cycle can be modeled as a sequence of subinstances, each of which accesses only one block on the disk. An instance begins when the amount of data to be retrieved during the instance is determined, and completes when all of its subinstances complete. In the following sections, we show how the previous retrieval algorithms and the corresponding admission formulas can be modified for the non-contiguous storage allocation scheme described above.
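The free-list allocation just described amounts to a few lines of bookkeeping, sketched below (a toy model; the class and method names are ours, not from Fellini):

```python
import math

class BlockStore:
    """Toy model of free-list block allocation: a disk viewed as
    n_blocks blocks of b bits each."""
    def __init__(self, n_blocks, b):
        self.b = b
        self.free = list(range(n_blocks))   # free-block list

    def store_clip(self, clip_bits):
        """Allocate ceil(clip_bits / b) blocks from the free list;
        the returned blocks need not be contiguous."""
        need = math.ceil(clip_bits / self.b)
        if need > len(self.free):
            raise MemoryError("not enough free blocks")
        return [self.free.pop(0) for _ in range(need)]

    def delete_clip(self, blocks):
        """Append the clip's blocks back onto the free list."""
        self.free.extend(blocks)
```

Because a deleted clip's blocks simply rejoin the pool, at most one partially filled block per clip is wasted, which is the fragmentation bound that choosing b trades off against seek overhead.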
9.1 Modified ER Algorithm
The admitted requests are inserted into the service list as described in Section 3.1. The amount of data retrieved for an instance is calculated as in Equation 3. However, since a clip is not contiguous, the time to retrieve d_i bits for a request Ri could be more than t_lat + d_i / r_disk. The reason for this is that the d_i bits to be retrieved could span multiple blocks, and thus retrieving the bits may involve multiple seeks, one per block. We order the subinstances of an instance such that the disk arm moves in a single direction. Thus, since d_i bits can span at most ⌈d_i / b⌉ + 1 blocks, the cost of servicing request Ri in the worst case is

max_j {c_j − s_j} = 2 t_seek + (⌈d_i / b⌉ + 1) (t_settle + t_rot) + d_i / r_disk

The reason for this is that an initial worst-case seek may be required to reach the first block, followed by ⌈d_i / b⌉ seeks in a single direction, which take ⌈d_i / b⌉ (t_settle + t_rot) + t_seek time.
The buffer size for each request Ri is selected as in Equation 1. A new request Rm is admitted into the system only if the following formula holds:

2 m t_seek + Σ_{i=1}^{m} (⌈d_i / b⌉ + 1) (t_settle + t_rot) + Σ_{i=1}^{m} d_i / r_disk ≤ T        (13)

The above formula ensures that the difference between the start times of two consecutive instances is at most T. Thus, the modified algorithm is in CP, and no requests starve.
9.2 Modified C-LOOK Algorithm
In the modified ER scheme, servicing of every request incurred a worst-case seek latency. This overhead can be eliminated by making a single sweep over the disk. At the beginning of each service cycle, the amount of data to be retrieved during the next instance of each admitted request is calculated using Equation 3. Thus, an instance begins with the start of its service cycle. Furthermore, the subinstances of all the instances in the service cycle are reordered based on the locations of the blocks they access on disk. Thus, the time to service the requests is reduced by 2 (m − 1) t_seek, yielding:

2 t_seek + Σ_{i=1}^{m} (⌈d_i / b⌉ + 1) (t_settle + t_rot) + Σ_{i=1}^{m} d_i / r_disk
For the algorithm to be in CP, we need to ensure that the duration of a service cycle is at most T. Thus, a new request Rm is admitted only if the following formula holds:

2 t_seek + Σ_{i=1}^{m} (⌈d_i / b⌉ + 1) (t_settle + t_rot) + Σ_{i=1}^{m} d_i / r_disk ≤ T        (14)

Additionally, a request Ri is deleted only at the end of the service cycle in which its last instance completes. Furthermore, data for Ri is made available for consumption at the start of the next service cycle after the first instance of Ri completes. Note that since max_j {c_j − s_j} = T, the buffer size for each request Ri as specified by Equation 1 is 2 T r_i + d − 1.
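Formulas 13 and 14 differ only in the seek term: 2 m t_seek for the modified ER scheme versus a single 2 t_seek for the C-LOOK sweep. Both can be sketched in one function (names are illustrative):

```python
import math

def admit_noncontiguous(d_sizes, b, r_disk, t_seek, t_settle, t_rot, T,
                        single_sweep=False):
    """Admission test per Formula 13 (single_sweep=False, modified ER)
    or Formula 14 (single_sweep=True, modified C-LOOK)."""
    m = len(d_sizes)
    seek = 2 * t_seek if single_sweep else 2 * m * t_seek
    # Each request's d_i bits span at most ceil(d_i / b) + 1 blocks.
    settle = sum((math.ceil(d / b) + 1) * (t_settle + t_rot)
                 for d in d_sizes)
    transfer = sum(d_sizes) / r_disk
    return seek + settle + transfer <= T
```

For the same workload, the C-LOOK variant admits at least as many requests, because its seek term does not grow with the number of admitted requests m.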
9.3 Rotational Latency Optimizations
The schemes for rotational latency optimization that we described earlier can be used if every block is an integral number of tracks⁷. In this case, every track contains data belonging to a single block. The FBF optimizations, in which an integral number of tracks is read by each subinstance and data is read off a track as soon as the disk head is positioned above the track, can be incorporated into the modified ER and C-LOOK schemes presented in the previous subsections. Buffer sizes and the data retrieved during subinstances are computed as before, except that d = tck_size. Furthermore, since no rotational latency is incurred, admission control for ER and C-LOOK is performed using Formulas 13 and 14 without t_rot, respectively.

The RLE scheme can be used with either the modified ER or the modified C-LOOK scheme presented in the previous subsections. In case each request's subinstances are serviced separately, as in the ER scheme, the data retrieved during an instance, the sizes of the buffers for requests, and the time after which consumption of bits can begin are exactly as described in Section 6.2. The only difference is that admission control is performed using Formula 8 with L replaced by 2 m t_seek + Σ_{i=1}^{m} (⌈T r_i / b⌉ + 1) t_settle (since, in the worst case, T r_i + tck_size bits may be retrieved for request Ri during a service cycle).

We now consider the case in which the subinstances of all the requests are ordered (as in C-LOOK) at the start of a service cycle, based on their locations on disk. The size of the buffer allocated for request Ri is 2 T r_i + tck_size + m tck_size r_i / r_disk. Also, the maximum number of tracks is retrieved for the j-th instance of request Ri such that

retrieve_i(j) ≤ min{ T r_i + δ_i(s_j),  2 T r_i + tck_size + tck_size r_i / r_disk + r_i Σ_{k≠i} δ_k(s_j) / r_disk − in_buffer_i(s_j) }

and δ_i(c_j) is set to

min{ T r_i + δ_i(s_j) − retrieve_i(j),  2 T r_i + tck_size + tck_size r_i / r_disk + r_i Σ_{k≠i} δ_k(s_j) / r_disk − in_buffer_i(s_j) − retrieve_i(j) }

Consumption of data from the buffer for request Ri is begun (tck_size + Σ_k δ_k(t)) / r_disk after t, where t is the completion time of the service cycle in which the first instance completes (this is equivalent to beginning consumption at the end of the service cycle when the buffer contains T r_i + tck_size − δ_i(t) + r_i Σ_k δ_k(t) / r_disk bits). Finally, Formula 8 with L equal to 2 t_seek + Σ_{i=1}^{m} (⌈T r_i / b⌉ + 1) t_settle is used for admission control. Using an argument similar to that used in the proof of Lemma 1, it can be shown that at each time t corresponding to the start of a service cycle, Ri's buffer contains T r_i + tck_size − δ_i(t) + r_i Σ_k δ_k(t) / r_disk bits.

7 In case tracks vary in size, then blocks may vary in size, too.
9.4 Non Real-Time Requests
The scheme that we presented in Section 7 can be used to service non real-time requests even when clips are stored non-contiguously. For example, if the modified ER algorithm of Section 9.1 is used to service requests, then in Formulas 10 and 11, L is 2 m t_seek + Σ_{i=1}^{m} (⌈d_i / b⌉ + 1) (t_settle + t_rot), while L′ is 2 (m + q) t_seek + Σ_{i=1}^{m} (⌈d_i / b⌉ + 1) (t_settle + t_rot) + Σ_{i=1}^{q} (⌈n_i / b⌉ + 1) (t_settle + t_rot). For the modified C-LOOK scheme, the values of L and L′ are similar to those for the ER scheme, except that 2 (m + q) is replaced by 2. Similarly, the values of L and L′ can be determined for the schemes presented in Section 9.3.

The scheme in Section 7 makes worst-case assumptions about the time spent in servicing requests. We now present a scheme that takes into account the actual time spent in servicing requests, and that can be used in conjunction with the C-LOOK scheme. The scheme records the beginning t_b and end t_e of a service cycle, and then uses the idle time T − (t_e − t_b) to service non real-time requests in the waiting list. Note that even after utilizing the idle time for servicing non real-time requests, the duration of a service cycle does not exceed T. Thus, since for the C-LOOK scheme an instance starts at the start of its service cycle, the difference between the start times of consecutive instances is at most T, and the scheme is still in CP. The above scheme can also be used for C-LOOK with the rotational latency optimizations. In order to use the above scheme with ER, the size of the buffer allocated to each client must be increased to 2 T r_i + d − 1, and in any service cycle in which a new real-time request is serviced for the first time, the idle time in that service cycle is not utilized. Otherwise, extending the duration of a service cycle to T may cause the start times of consecutive instances to be separated by more than T.
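The idle-time bookkeeping for the C-LOOK variant is a one-liner, sketched here with illustrative names:

```python
def idle_time(t_b, t_e, T):
    """Time left in a service cycle of length T that began at t_b and
    finished its real-time work at t_e; this slack can be handed to
    waiting non real-time requests without stretching the cycle past T."""
    return max(0.0, T - (t_e - t_b))
```

Because the reclaimed time reflects the actual (not worst-case) service time of the cycle, a cycle that finishes early leaves correspondingly more room for non real-time work.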
10 Related Work
A number of schemes for handling the storage and retrieval of continuous media data from disk have been proposed in the literature [2, 8, 5, 9, 3]. None of them addresses the issues of servicing real-time requests with varying rates, reduction of seek and rotational latency, servicing non real-time requests, varying disk transfer rates, and non-contiguous storage allocation in a single framework. Our schemes differ from the other proposed schemes as follows. First, none of the existing schemes addresses the issue of varying track sizes. Our schemes, by considering varying transfer rates, increase the number of concurrent requests that the server can support. Second, unlike existing schemes, the RLE scheme completely eliminates the overhead of rotational delay. Among the schemes proposed in the literature, [9, 3] reduce only seek latency, while the schemes presented in [2, 8] do not attempt to reduce disk latency. Third, the response times of requests are considered only in [9] and [3]. Our ER scheme yields response times for real-time requests that are at least as good as those obtained by the earliest deadline first (EDF) schemes presented in [9] and the first-come-first-served scheme in [3]. We furthermore present schemes for supporting non real-time requests without violating the rate guarantees of real-time requests, while providing low response times for both real-time and non real-time requests. Among the existing work, only [2, 9] address the issue of servicing non real-time requests concurrently with real-time requests. Simulation studies in [9] show that retrieval algorithms based on the EDF algorithm for servicing real-time requests and the immediate-server approach for servicing non real-time requests result in better response times for non real-time requests than the C-LOOK algorithm. However, these algorithms decrease the response time of non real-time requests at the cost of increasing the response time of real-time requests.
11 Research Issues
In this section, we discuss some of the research issues in the area of storage and retrieval of continuous media data that remain to be addressed.
11.1 Load Balancing and Fault Tolerance Issues
So far, we have assumed that continuous media clips are stored on a single disk. However, in general, continuous media servers may have multiple disks on which continuous media clips need to be stored. One approach to the problem is to simply partition the set of continuous media clips among the various disks and then use the schemes that we described earlier to store the clips on each of the disks. One problem with this approach is that if requests for continuous media clips are not distributed uniformly across the disks, then certain disks may end up idling, while others may have too much load, and so some requests may not be accepted. For example, if clip C1 is stored on disk D1 and clip C2 is stored on disk D2, then if there are more requests for C1 and fewer for C2, the bandwidth of disk D2 will not be fully utilized. A solution to this problem is striping [4]. By storing the first half of C1 and C2 on D1 and the second half of the clips on D2, we can ensure that the workload is evenly distributed between D1 and D2. Striping continuous media clips across disks involves a number of research issues. One is the granularity of striping for the various clips. Another is that striping complicates the implementation of VCR operations. For example, consider a scenario in which every real-time request is paused just before data for the request is to be retrieved from a "certain" disk D1. If all the requests were to be resumed simultaneously, then the resumption of the last request for which data is retrieved from D1 may be delayed by an unacceptable amount of time. Replicating continuous media clips across multiple disks could help in balancing the load on the disks, as well as in reducing response times in case disks get overloaded. Replication of clips across disks is also useful for achieving fault tolerance in case disks fail. One option is to use disk mirroring to recover from disk failures; another is to use parity disks [4].
The potential problem with both of these approaches is that they are wasteful in both storage space and bandwidth. We need alternative schemes that effectively utilize disk bandwidth and, at the same time, ensure that data for a real-time request can continue to be retrieved at the required rate in case of a disk failure.
11.2 Storage Issues
In the schemes presented in Section 9, a small value for the block size b reduces fragmentation; however, it increases the disk latency. An important research issue is to determine the ideal block size for clips, one that keeps space utilization high and disk latency low.
Another important issue to consider is the storage of continuous media clips on tertiary storage (e.g., tapes, CD-ROMs). Since continuous media data tends to be voluminous, it may be necessary (in order to reduce costs) to store it on CD-ROMs and tapes, which are much cheaper than disks. Retrieving continuous media data from tertiary storage is an interesting and challenging problem. For example, tapes have high seek times, and so we may wish to use disks to cache the initial portions of clips in order to keep response times low.
11.3 Data Retrieval Issues
In the schemes presented in this paper, a separate buffer is maintained for each request. Thus, it is possible that two requests for the same clip retrieve the same data into their own buffers, resulting in disk bandwidth being wasted. A solution that utilizes the disk bandwidth more effectively is one in which requests share a global pool of buffer pages. Furthermore, data belonging to a clip is retrieved into the global pool only if the data is not already contained in it [7]. An important research issue is the buffer page replacement policy. For example, a least recently used (LRU) policy may be unsuitable if another real-time request needs access to the least recently used page in the near future. It may instead be more suitable to replace a page that has been accessed by a request and does not need to be accessed by any other real-time request. Thus, a buffer page replacement policy that takes into account the real-time requests being serviced would result in better performance. Also, to handle clips with different rate requirements, a common period was used to determine buffer sizes for real-time requests. In order to maximize the utilization of disk bandwidth as well as memory, an effective method needs to be developed to dynamically vary the common period when the actual workload is very different from the expected workload. Finally, in our work, we have not taken into account the fact that disks are not perfect. For example, disks have bad sectors that are remapped to alternate locations. Furthermore, due to thermal expansion, the tables storing information on how long and how much power to apply on a particular seek need to be recalibrated. Typically, this takes 500-800 milliseconds and occurs once every 15-30 minutes.
12 Conclusions
We presented a number of algorithms for retrieving continuous media data with varying rate requirements from disks. The algorithms provide rate guarantees for a large number of real-time requests by reducing seek latency and eliminating rotational latency, so that the buffering requirements for requests are reduced. We also presented schemes for servicing non real-time requests concurrently with real-time requests. The schemes reserve a certain portion of the disk bandwidth for non real-time requests in order to ensure that they have low response times. Finally, we showed how our algorithms can be extended to deal with the varying track sizes of disks, and with storage allocation schemes that store clips non-contiguously.
The algorithms are currently being implemented as part of the Fellini project at AT&T Bell Labs. Fellini is being deployed as a storage server for continuous media data as well as non-continuous data in applications like multimedia messaging, on-line training and information services, and video on demand.
References

[1] Seagate Product Overview. Oct. 1993.

[2] D. P. Anderson, Y. Osawa, and R. Govindan. A file system for continuous media. ACM Transactions on Computer Systems, 10(4):311-337, Nov. 1992.

[3] M. S. Chen, D. D. Kandlur, and P. S. Yu. Optimization of the grouped sweeping scheduling (GSS) with heterogeneous multimedia streams. In Proceedings of the ACM International Conference on Multimedia, pages 235-242, 1993.

[4] G. R. Ganger, B. L. Worthington, R. Y. Hou, and Y. N. Patt. Disk arrays: High-performance, high-reliability storage subsystems. Computer, 27(3):30-36, Mar. 1994.

[5] B. Özden, A. Biliris, R. Rastogi, and A. Silberschatz. A low-cost storage server for movie on demand databases. In Proceedings of the Twentieth International Conference on Very Large Databases, Santiago, Sept. 1994.

[6] B. Özden, R. Rastogi, and A. Silberschatz. Fellini: a file system for continuous media. Technical Report 113880-941028-30, AT&T Bell Laboratories, Murray Hill, 1994.

[7] B. Özden, R. Rastogi, A. Silberschatz, and C. Martin. Demand paging for movie-on-demand servers. In Proceedings of the IEEE International Conference on Multimedia Computing and Systems, Washington D.C., May 1995.

[8] P. V. Rangan and H. M. Vin. Designing file systems for digital video and audio. In Proceedings of the Thirteenth Symposium on Operating System Principles, pages 81-94, 1991.

[9] A. L. N. Reddy and J. C. Wyllie. I/O issues in a multimedia system. Computer, 27(3):69-74, Mar. 1994.

[10] C. Ruemmler and J. Wilkes. An introduction to disk drive modeling. Computer, 27(3):17-27, Mar. 1994.

[11] A. Silberschatz and P. Galvin. Operating System Concepts. Addison-Wesley, 1994.