Continuous Data Block Placement in and Elevation from Tertiary Storage in Hierarchical Storage Servers

Peter Triantafillou (a) and Thomas Papadakis (b)

(a) Department of Computer Engineering, Technical University of Crete, 73100 Chania, Crete, Greece. E-mail: [email protected]
(b) Legato Systems (Canada) Inc., 3390 South Service Road, Burlington, ON L7N 3J5, Canada. E-mail: [email protected]
Given the cost of memories and the very large storage and bandwidth requirements of large-scale multimedia databases, hierarchical storage servers (which consist of disk-based secondary storage and tape-library-based tertiary storage) are becoming increasingly popular. Such server applications rely upon tape libraries to store all media, exploiting their excellent storage capacity and cost-per-MB characteristics. They also rely upon disk arrays, exploiting their high bandwidth, to satisfy a very large number of requests. Given typical access patterns and server configurations, the tape drives are fully utilized uploading data for requests that `fall through' to the tertiary level. Such upload operations consume significant secondary storage device and bus bandwidth. In addition, with present technology (and trends) the disk array can serve fewer requests to continuous objects than it can store, mainly due to I/O and/or backplane bus bandwidth limitations. In this work we address comprehensively the performance of these hierarchical, continuous-media storage servers by looking at all three main system resources: the tape drive bandwidth, the secondary-storage bandwidth, and the host's RAM. We provide techniques which, while fully utilizing the tape drive bandwidth (an expensive resource), introduce bandwidth savings that allow the secondary storage devices to serve more requests, and do so without increasing demands on the host's RAM space. Specifically, we consider the issue of elevating continuous data from its permanent place in tertiary storage for display purposes. We develop algorithms for sharing the responsibility for the playback between the secondary and tertiary devices and for placing the blocks of continuous objects on tapes, and show how they achieve the above goals. We study these issues for different commercial tape library products with different bandwidth and tape capacity characteristics, and in environments with and without the multiplexing of tape libraries.

An earlier, much abridged version of this paper appeared in [TP97]. Research partly supported by the European Community under the ESPRIT Long Term Research Project HERMES no. 9141.
Keywords: Tertiary Storage, Bandwidth
AMS Subject classification: A22, B3f
1. Introduction

Future multimedia information systems are likely to contain large collections of delay-sensitive data objects (e.g., audio and video) with various lengths and display/playback requirements. Some of the main data characteristics which present serious challenges when building multimedia servers are: their large size, the delay-sensitivity of their data, and their high display bandwidth requirements. Multimedia objects can be very large in size: as one example, 100 minutes of MPEG-2 video may require up to 6 GB of storage. Thus, the storage requirements of future continuous media servers for many different applications will easily exceed several terabytes. The cost of memory strongly depends on the type of memory being employed. Currently, for RAM memory the cost is about $1.0 per MB; for individual magnetic disk storage, the cost is about $0.1 per MB (see, for example, the cost of high-performance disk drives, such as the Quantum Atlas (http://www.quantum.com) and the Seagate Cheetah (http://www.seagate.com)). When a large number of disk drives are grouped using disk array technology to store large amounts of data, the cost for disk storage becomes greater than $0.3 per MB (see, for example, Maximum Strategy's Gen-5 disk array product (http://www.maxstrat.com)). For magnetic tape storage the cost is less than $0.004 per MB. For higher-end tape libraries, the tape storage cost ranges from $0.025 per MB (e.g., for the high-performance Ampex DST812 library with 50GB tapes (http://www.ampex.com/html)) to significantly lower figures for 330GB tapes. These "capital" costs for different types of memories should also include an organization's costs for managing their storage systems, which can be quite significant: for disk storage it is reported to be $7 per MB; for tape library storage, due to its automated functions, this cost is minimal [Sim97]. As an indication, in absolute dollar costs, a company's capital investment for the storage required for a full-fledged video server (which for a few thousand videos is currently around 12TB) would be around $400,000 if a high-performance tape library is chosen (such as the 12.8TB Ampex DST812 library with four drives
and 256 50GB tapes) or more than $4,000,000 for eight Gen-5 disk arrays (each storing about 1.7TB). Given that small and mid-sized companies are reported to be willing to spend annually less than $50,000 and $120,000, respectively, for mass storage, the above cost difference ensures a dominant place in the market for tape libraries. In fact, the annual tape library market has been reported as growing at a pace of 34%, with some submarkets growing at a pace of around 50% [Sim97]. Thus, tape library storage is very attractive when storage cost is of primary concern. On the other hand, the access speeds of memory units point in a different direction. Tertiary storage is characterized by very slow access times, in the order of tens of seconds (or up to minutes in lower-end products) for magnetic tape libraries. Magnetic disks have access times in the order of tens of milliseconds, whereas RAM memories' access times are in the order of tens of nanoseconds. The above are strong arguments for employing a hierarchy of memory technologies, forming a Hierarchical Storage Management System (HSMS), including RAM-based Primary Storage (PS), magnetic disk-array-based Secondary Storage (SS), and tape-based Tertiary Storage (TS). A key idea in a HSMS is to store all objects in the inexpensive TS, from where they will be elevated to the SS and PS levels before being accessed. This addresses the problems arising from the large storage space requirements of continuous media servers and the costs of memory units. A complementary idea is to use the higher levels as a cache for the levels below; all objects will reside permanently on TS, while popular objects, for example, will reside for long periods of time in SS and portions of them will reside for long periods of time in PS. This idea addresses the problems arising from the memories' access times, and from the delay-sensitivity and high display bandwidth requirements of many continuous objects. Given the reported market growth, tape libraries are becoming increasingly popular for the low level of such a HSMS. In fact, with respect to continuous media applications, several companies have developed video storage solutions based on tape libraries for applications such as TV broadcasting [Hen97]. In most envisaged solutions, tape libraries occupy the lowest level of the hierarchy and are used mainly for storing the vast amounts of data, exploiting their excellent cost-per-MB characteristics, while disk arrays are used for their high bandwidth in order to serve as many requests as possible. At the same time, colder (hotter) data objects migrate from disks (tapes) to tapes (disks). With this in mind, this paper addresses how to cleverly exploit the bandwidth offered by modern tape drives employed in such HSMSs for continuous media applications, so as to help increase the overall system's performance.
Table 1
Storage and display parameters

symbol   explanation                                        typical values
bD       display bandwidth (consumption rate)               0.2 - 1 MB/sec (e.g., MPEG-2)
B        # of blocks in an object                           varies; user-defined
s        block size                                         varies; user-defined
d        "time unit" = display time for 1 block             varies; depends on s
-        SS bandwidth                                       3 - >100 MB/sec
-        disk capacity                                      3 - 50 GB
-        SS capacity                                        3 - >1 TB
bT       tape drive bandwidth                               0.5 - 20 MB/sec
-        # of tape drives                                   1 - 64
e        robot exchange time                                6 - >40 sec
-        switch time = exchange (if needed) + search        16 - >60 sec
-        # of tape cartridges                               30 - 48,000
c        tape capacity                                      5 - 330 GB
-        TS capacity                                        0.5 - 190 TB
r        bT / bD                                            0.5 - 50
j        # of jobs multiplexed                              varies
t        # of blocks in a "time slice" (in multiplexing)    varies
1.1. Hierarchical Storage Management Systems
We will now discuss HSMSs in some detail, paying attention to their specifications and functionality in order to gain the relevant insights and put the contributions of this work in context. A summary of the HSMS's relevant parameters and some typical values is given in Table 1. An appropriate candidate for SS appears to be arrays of magnetic disks [PGK88]. Given the high bandwidth requirements of many multimedia objects and the typical, relatively low, effective bandwidth of magnetic disks (nominal bandwidths are in the range of 4-15 MB/sec), striping techniques [SGM86,BMGJ94,TF98] are likely to prove beneficial, since, when striping objects across several disks, the effective SS bandwidth for these objects is several times that of a single disk.
Figure 1. Conceptual model of a Hierarchical Storage Management System (HSMS): RAM-based Primary Storage (PS) on top, disk-based Secondary Storage (SS) below it, and tape-/CD-ROM-based Tertiary Storage (TS) at the bottom, with data elevated one level at a time.
Currently, average seek and rotational delays are approximately 8-10 and 3-5 msec, respectively. With respect to TS, technological developments have led to the emergence of robot-based magnetic-tape libraries, making them the best candidate for the TS devices, given their desirable cost-per-megabyte figures. Tape libraries consist of a series of shelves (storing a number of tape cartridges), a number of tape drives onto which the tapes must be loaded before their objects can be accessed, and a number of robot arms (usually one) which are responsible for loading and unloading the tapes to and from the drives. Despite TS's high data transfer rates (see Table 1), these devices remain comparatively slower than magnetic disks, due to the high exchange costs (unload a tape, put it on the shelf, get another tape, load it) and to the cost of searching within a tape (proceeding at a maximum pace of less than 1.6 GB/sec). The conceptual model of a HSMS is illustrated in Figure 1. The conceptual model shows the three levels of the HSMS. Data is elevated one level at a time. In the rest of this paper, the term "elevate" is reserved for "elevate from TS to SS". Figure 2 illustrates the physical architectural view of the HSMS. The key feature that must be noted here is that PS serves as an intermediate staging area between the TS and the SS device, and that an object is not required to be SS-resident in order to be displayed; it can also be displayed from TS.
Figure 2. Physical model of a Hierarchical Storage Management System (HSMS): the robotic tape library (Tertiary Storage) uploads blocks through the tape controller into RAM (Primary Storage); from there they are either consumed for display or flushed through the disk controller to the disk array (Secondary Storage), from which they can later be retrieved. Note: elevate = upload + flush.
1.2. Related Work, Motivations, and the Problem

1.2.1. The Big Picture

There is great consensus that the I/O subsystem of a computer system has become the performance bottleneck [PGK88,Pat93]. This observation is the motivation for a large body of research in the area of storage servers, aiming to increase the available SS bandwidth in the system. One thread of this research led to the development of disk arrays [PGK88] and to the development of new methods for placing data objects on disk arrays, which attempt to exploit their inherent potential for high performance and reliability [SGM86,PGK88,CP90,Gea94], mainly through data placement techniques, such as striping. Multimedia data, such as video and audio, require very large storage capacities and very large I/O bandwidths, making disk arrays the natural choice for the SS medium. Researchers in multimedia and video storage servers then started paying attention to the placement of multimedia data on SS devices in such a way as to increase the system's performance by effectively utilizing the available SS bandwidth of the system [RV93,LS93,GR,Tea93,BMGJ94,KK95,ÖRS96,TF98].
Another thread concentrated on developing techniques to reduce the number of required secondary storage I/Os by exploiting main-memory buffer caches and the characteristics of several emerging applications (such as video-on-demand servers). Through buffer and data sharing techniques, such as batching, bridging, and adaptive piggybacking, and through prefetching into smart-disk caches, this research attained a reduction in the number of SS I/O streams required to support video object requests [KRT94,RZ95,GLM96,DS94,Dan95,ASS96,TH99]. Thus, so far related research has concentrated on techniques for cleverly exploiting the PS and SS resources of the system in order to increase its performance. With respect to tertiary storage, there have been efforts at modelling the performance characteristics of individual tape drives [HS96b,HS96a,JM98a,JM98b] and of robotic tape libraries [Joh96,JM98a,JM98b], at predicting future references and prefetching documents into the SS [KW98] for digital library applications, and at deriving intelligent placement strategies for data in the library [CTZ97], as well as efforts to derive intelligent scheduling algorithms for multiplexed video streams over tape drives [KMP90,LLW95]. As mentioned in the introduction, disk-based storage is viewed as presently too costly to store the thousands of video objects in a full-fledged video server, and thus TS is employed for storage augmentation purposes. A continuous object can be played either from TS or from SS [Che94,KDST95,GR98,Che98]. One may Play from TS an object by issuing upload requests to TS. The uploaded blocks of the object are placed into PS buffers, from where they are subsequently consumed: either they are transmitted over a network to a remote client, or they are displayed to a local client. (The terms "consume", "display", "playback" and "play" will be used interchangeably in this paper.) However, note that the bandwidth of TS tape drives is typically significantly greater than the display bandwidth of objects. Thus, playing objects directly from TS can in general create serious PS buffer space problems. As a result, a much more preferable choice is to first elevate the object from TS to SS (that is, physically upload it from TS to PS, and then flush it from PS to SS), and subsequently Play from SS the object, by issuing retrieval requests to SS for the blocks to be displayed [GS94,KDST95]. Furthermore, given that very few users (if any) would tolerate high response times, the playback process must be started immediately after enough data has been elevated to SS; hence, typically, the retrieval
operations executing on behalf of the playback process are executed in parallel with the elevate operations which move the future blocks of the object to SS [GDS95]. Let us refer to the above procedure (i.e., elevate from TS to SS, and simultaneously play from SS), which represents the research state of the art for displaying continuous objects residing in TS, as the Conventional Play method.

1.2.2. The Problem

This paper assumes the above framework of a continuous media server based on a hierarchical storage manager. The research reported here first addresses the important problem of continuous data elevation, from its permanent place in TS to the higher levels [CT95]. In particular, we concentrate on on-demand elevation (i.e., elevation occurs when a display request for the object arrives). On-demand elevation is a non-trivial problem since it must deal with the bandwidth mismatch problem: firstly, the display bandwidth requirements of objects are different from the bandwidths of TS and SS devices; and secondly, the sustained (available, effective) TS bandwidth varies with time, depending on the requirements of the current workload (e.g., whenever multiplexing is employed).

An Example System Configuration and Problem Setup

Let us consider an example HSMS system configuration which can be employed for a media server for movie-viewing, tele-teaching, digital libraries, and other video-based multimedia applications. At the lower level there is a high-performance tape library (based on the Ampex DST812 tape library) with 12.8 TB of storage, with four drives and 256 50GB tapes, costing about $400,000. The SS level acts as a cache, and is a high-performance disk array (based on Maximum Strategy's Gen-5 array) consisting of 95 magnetic disks, each storing 18GB, totaling about 10% of the tertiary storage, and costing about $500,000 (or 125% of the tertiary cost). Considering 90-minute MPEG-2 videos that require about 3GB of storage, about 6 videos can fit on a disk. Thus, the SS level can store 570 videos in total. Assuming for simplicity 500KB video disk blocks, each with a display time of one second, with present technology each disk drive of the array, in isolation, can support the uninterrupted display of about 16 videos (e.g., with a 12MB/s transfer rate, 8ms average seek cost, 4ms average rotational delay, and 1ms track/cylinder switch cost). However, due to the bandwidth limitations in the front-end and back-end I/O buses, the combined data transfer rate of the array
is about 250MB/sec, or about 20 times that of a single member drive. Thus, although the whole SS can store 570 videos, it can support the playback of about (20 × 16 =) 320 video requests. Assuming a real-world system, with several hundred video requests in the system at any time, and a Zipf access-request distribution to videos, a few (e.g., the 10-15) most popular videos will be receiving more than 70% of all requests. Thus it is important to create copies of the most popular videos and employ a clever placement of the copies on the SS disks, so that their load can be balanced and they can serve as many video requests as possible. Given that considerable SS bandwidth is required (in addition to sustaining the playback of the SS-resident videos) for elevating colder video blocks from TS to SS, it is easy to see that all of the system's SS I/O bandwidth can be consumed. In fact, note that products for servers supporting large numbers of disks are currently constrained by backplane bus limitations, where the backplane bandwidth is many times smaller than the combined raw disk bandwidth [www98,www97]. This fact has been recognized by many storage system researchers [RGF98,KP98] and is a key problem on which they focus. In this framework, and using the Conventional Play method, the TS is used for storage augmentation purposes, simply to store the large number of colder videos and elevate them to SS when a request for them arrives. Since typically the TS bandwidth is sufficient to support only a few requests that fall through to the TS level [Che98], the TS drives will be fully utilized elevating colder video data to SS. The SS I/O bandwidth (including the backplane bus bandwidth) is also fully utilized supporting retrieval requests for video blocks and elevation requests for TS blocks. As mentioned, even without the bandwidth consumed by the data elevation from TS, the SS stores many more videos than it can support. When SS bandwidth is consumed by the need to elevate colder videos from the TS, the available SS bandwidth decreases significantly. The key goal of this work is to develop techniques which cleverly exploit the bandwidth of the tape drives during elevation so as to save I/O bandwidth. This saved bandwidth can allow the SS to serve additional requests for some of the other videos it stores, and thus improve the overall system performance. This is not an easy task because, despite the high transfer rates of modern tape drives, the average access cost in tape libraries remains very high (currently, about three orders of magnitude higher than that of magnetic disks) due to the high costs for robotic movements and head positioning delays. Thus, we focus on requests for delay-sensitive data elevation from TS to
higher levels, with the goals of achieving I/O bandwidth savings and, in addition, hiccup-free displays of video streams and low start-up latencies. The saved I/O bandwidth can be used to accept and serve additional requests for SS-resident data. At the same time, we handle the bandwidth mismatch problem during data elevation in a manner that allows the above savings in I/O bandwidth while not requiring any extra PS buffer space. The remainder of the paper is structured as follows. In Section 2 we present "Alternate Play", a novel algorithm for on-demand data elevation through the levels of the hierarchy, and we discuss its benefits with respect to I/O bandwidth savings and its compromises with respect to PS buffer space. In Section 3, we contribute a set of novel techniques which place the blocks of video objects on the tertiary storage media in a manner which alleviates the need for extra PS buffer space, together with companion play algorithms which achieve the same SS bandwidth savings as the Alternate Play algorithm. Subsequently, we contribute the notion of strips of streams, the use of which makes our techniques applicable even when the display bandwidth requirements are larger than the available TS bandwidth (e.g., for low-end tape drives). In Section 4 we revisit these issues, contributing algorithms and analyses for the case when the tape library is multiplexed across streams (to avoid experiencing unacceptably long start-up latencies and to be able to offer the SS savings of the earlier sections to many more video requests). In Section 5 we revisit the same issues, only now the assumption of a fixed and a priori known multiprogramming degree is removed. In Section 6 we discuss some real-world pitfalls and how our algorithms can be used to avoid them. Finally, in Section 7 we present the conclusions of this work.
2. Data Elevation with Alternate Play

In this section we will present a novel elevation method, achieving I/O bandwidth savings (when compared with the Conventional Play). This new method will be an integral part of many of our algorithms in the rest of this paper.
2.1. Alternate Play
The obvious advantage of the Conventional Play method is that it requires no additional PS buffer space. (Throughout the paper, "PS buffer space requirements" will refer to the maximum number of PS buffers, each holding one block, required at any instance during the display of an object. The buffers holding blocks which are currently being displayed, are to be displayed during the next time unit, or are currently being elevated to SS are not counted, since in each of these cases using a buffer is unavoidable.) On the other hand, its obvious deficiency is that SS is additionally taxed as a result of the execution of flush and retrieval operations, thus reducing the available I/O bandwidth significantly. Since in our target environment we expect I/O bandwidth to be one of the (two) scarcest resources, the above observation must be taken seriously. In the following, we present some approaches aiming to alleviate this problem. An initial attempt to overcome the shortcomings of the Conventional Play method centers on the following idea: uploaded blocks from TS can be maintained in PS buffers and made available to the playback process from them. This can save significant I/O bandwidth. Of course, blocks belonging to popular objects may still be flushed to SS, in addition to being maintained in PS buffers from where they will be consumed. This alteration of the Play from TS method (i.e., Play from TS, and then flush to SS), when compared with the Conventional Play, saves bandwidth for the current request (since no retrievals from SS are needed). Also, future requests for the same object will not additionally tax the TS or SS (since no uploads or flushes for the SS-resident blocks are needed). Despite these unquestionable benefits, there is an obvious concern regarding the amount of PS buffer space which is needed to realize the aforementioned bandwidth savings. For example, if r = 2, the PS space requirements are 50% of the size of the entire object to be displayed (since when the last block is uploaded, only half of the blocks will have been consumed). In general, the PS space requirements are (r - 1)/r of the object's size. Given that PS buffer space is another scarce system resource, care must be taken to use it wisely. In an effort to reconcile this trade-off between the PS space requirements and the I/O bandwidth savings, we can use hybrid techniques, so that for some blocks the Conventional Play method (essentially: Play from SS) is employed, while for other blocks a Play from TS method is followed. We refer to the newly proposed method as the Alternate Play algorithm.
Table 2
Example of Alternate Play (r = 2, B = 13)

time unit   (elevated) to SS      (uploaded) to PS      block displayed
0           -                     B1                    -
1           B2                    B3                    B1
2           B4                    B3 B5                 B2
3           B4 B6                 B5 B7                 B3
4           B6 B8                 B5 B7 B9              B4
5           B6 B8 B10             B7 B9 B11             B5
6           B8 B10 B12            B7 B9 B11 B13         B6
For the r = 2 special case, an example of the Alternate Play algorithm on a 13-block object, playing odd-numbered blocks from TS and even-numbered blocks from SS, is shown in Table 2. (In each cell, the newly arriving blocks are those not present in the preceding row.) The Alternate Play algorithm terminates when all blocks have been read off the TS. The display of the requested object continues by consuming all remaining blocks in strict alternation from PS and SS. At the end, half of the blocks will have been played from TS and half from SS. Two features of the above described (for the r = 2 case) Alternate Play algorithm must be noted. First, during its execution (i.e., during the first ⌈(B-1)/2⌉ ≈ B/2 time units of the object's display) the SS bandwidth requirements of even time units are twice the SS bandwidth requirements of odd time units; for the remaining ⌊(B+1)/2⌋ ≈ B/2 time units, the SS bandwidth requirements are equal to those of the odd time units. Second, during the algorithm's execution, every two time units the number of PS buffers (needed to hold the Play from TS blocks) increases by 1; at the end of the algorithm's execution, half of the ≈ B/2 Play from TS blocks will still be in PS. It thus follows that the PS space requirements are 25% of the object's size.
2.2. Generalizing Alternate Play

2.2.1. Integer r

The r = 2 Alternate Play algorithm of the last section can be generalized to handle arbitrary r ratios as follows: in every time unit, r blocks are read from TS.
From these r blocks, k blocks are uploaded to PS (and played from there) and the remaining r - k blocks are elevated to SS, for some (arbitrarily/appropriately chosen) k = 1, 2, ..., r - 1. This gives us a family of Alternate Play algorithms, one for each different value of k. Pseudo-code for the above described Alternate Play is given as Algorithm 1. The given algorithm uses routines to upload an indicated block to a PS buffer, or to elevate (i.e., upload and then flush) an indicated block to SS.
Algorithm 1  Alternate Play
INPUT: Object's blocks in TS

next_upld_blk := next_displ_blk := B1
upload(next_upld_blk++)
for time unit i := 1 to ⌈(B-1)/r⌉ do
    parbegin
        1. if 1 ≤ ((i-1) mod r) ≤ r-k then Play-from-SS(next_displ_blk++)
           else consume(next_displ_blk++)
        2. for j := 1 to r-k do elevate(next_upld_blk++)
           for j := 1 to k do upload(next_upld_blk++)
    parend
The parbegin/parend construct is intended to indicate that the display (consume and Play-from-SS) of a block occurs in parallel with the reading (elevate and upload) of the next r blocks from TS. The reader may verify that Table 2 is a trace of Algorithm 1 for the r = 2, k = 1 case. The obvious question now is "what is the best value of k?". To answer this question, we will examine the impact of the choice of k on the SS bandwidth savings and on the required PS buffer space. For any value of k, the SS bandwidth savings are Bk/r blocks (since for k out of every r blocks neither a flush to, nor a retrieval from, SS is required). Also, for any value of k, the PS buffer space requirements are Bk(r-1)/r². This holds because the algorithm runs for B/r time units, during which Bk/r blocks are uploaded to PS (k blocks per time unit) and Bk/r² blocks (= k/r blocks for each of the B/r time units) have been played from TS. Therefore, Bk/r - Bk/r² blocks are still in PS when the algorithm terminates. We thus see that there is a trade-off between SS bandwidth savings and the required PS buffer space: the higher the ratio (# of blocks played from SS) / (# of blocks played from TS) is, the lower the PS space requirements become, but the lower the SS bandwidth savings achieved. The k parameter in the general algorithm acts as a "knob", fine-tuning this trade-off.
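To make the k trade-off concrete, the following is a minimal Python sketch (ours, not part of the paper) that simulates this family of algorithms for an integer ratio r and reports the two quantities analyzed above; the function name and the set-based bookkeeping are illustrative only.

    import math

    def alternate_play(B, r, k):
        """Simulate Alternate Play for a B-block object with integer bandwidth ratio r,
        uploading k of every r blocks to PS; return (SS savings, raw peak PS occupancy)."""
        to_ps = lambda b: (b - 1) % r < k          # the first k blocks of every group of r play from TS
        ps, ss = {1}, set()                        # time 0: block 1 is uploaded so display can start
        next_read, peak_ps = 2, 1
        for t in range(1, math.ceil((B - 1) / r) + 1):
            (ps if to_ps(t) else ss).discard(t)    # display block t from PS or SS
            for _ in range(r):                     # in parallel, read the next r blocks off the tape
                if next_read <= B:
                    (ps if to_ps(next_read) else ss).add(next_read)
                    next_read += 1
            peak_ps = max(peak_ps, len(ps))
        savings = sum(to_ps(b) for b in range(1, B + 1))   # blocks never flushed to / retrieved from SS
        return savings, peak_ps

    # alternate_play(13, 2, 1) returns (7, 4), matching Table 2: about Bk/r = 6.5 blocks are
    # never sent to SS; the raw peak of 4 includes the block about to be displayed, and
    # excluding it (as the paper's accounting does) gives 3, in line with Bk(r-1)/r^2 = 3.25.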
Table 3
Example of ⌊r⌋ Alternate Play (r = 3.5, k = 1, B = 22)

time unit   (elevated) to SS                            (uploaded) to PS            block displayed
0           -                                           B1                          -
1           B2 B3 [B5]                                  B4                          B1
2           B3 B5 B6 B8                                 B4 B7                       B2
3           B5 B6 B8 B9 B11 [B12]                       B4 B7 B10                   B3
4           B5 B6 B8 B9 B11 B12 B14 B15                 B7 B10 B13                  B4
5           B6 B8 B9 B11 B12 B14 B15 B17 B18            B7 B10 B13 B16 [B19]        B5
6           B8 B9 B11 B12 B14 B15 B17 B18 B20 B21       B7 B10 B13 B16 B19 B22      B6
2.2.2. Real r

The alert reader may have realized that, so far, we have made the implicit assumption that r is an (arbitrary) integer. If r is a real number, as will most likely be the case in real-world situations, one may apply the above ideas, uploading k out of every ⌊r⌋ blocks to PS (and playing them from there). An example is shown in Table 3. Again, the newly arriving blocks in each cell are those not present in the preceding row; a block in brackets indicates that only part of the block has been uploaded/elevated. Clearly, the SS bandwidth savings of the above outlined algorithm (when compared with the Conventional Play algorithm) are Bk/⌊r⌋ blocks. The PS space requirements of the above algorithm are Bk(r-1)/(r⌊r⌋), which is at most Bk/r. This holds because the algorithm runs for B/r time units, displaying B/r blocks. During this time, Bk/⌊r⌋ blocks are uploaded to PS, and Bk/(r⌊r⌋) blocks (= k/⌊r⌋ blocks for each of the B/r time units) are played from TS. Therefore, when the algorithm terminates, the number of blocks still in PS is Bk/⌊r⌋ - Bk/(r⌊r⌋).
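As a quick numerical sanity check (ours, not from the paper), plugging the Table 3 parameters into these expressions:

    # Parameters of Table 3: B = 22, r = 3.5, k = 1 (so floor(r) = 3).
    B, r, k = 22, 3.5, 1
    savings = B * k / 3                          # Bk/floor(r) ~ 7.3 blocks never sent to SS
    left_in_ps = B * k / 3 - B * k / (r * 3)     # ~ 5.2 blocks still in PS at termination
    # Table 3 ends with six blocks in PS (B7 B10 B13 B16 B19 B22); excluding B7, which is
    # about to be displayed, gives 5, in line with the ~5.2 estimate.
    print(savings, left_in_ps)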
One might of course choose to upload k out of every ⌈r⌉ blocks. Alternatively, if PS space is very scarce, one might choose to upload sometimes k out of every ⌊r⌋ blocks and at other times k out of every ⌈r⌉ blocks. This method will be further discussed in Section 3.2.2.
3. Data Elevation without Multiplexing

The Alternate Play methods of the preceding section (some blocks are played from TS, and some from SS) will be the starting point for the derivation of better playing algorithms.

3.1. APWAT
As discussed earlier, the high PS buffer requirements of the Alternate Play method are due to the long time that a PS buffer is dedicated to holding a particular Play from TS block. Observe also that this long time is dependent on r: if r is large, then the Play from TS blocks will occupy a PS buffer for a long time (since they will come into a PS buffer too far ahead of their consumption times). In trying to attain further reductions in the PS space requirements, our key idea is that, by altering the order with which the blocks of an object are recorded on the TS media, large values of r (i.e., bandwidth mismatches between the TS and the display) can be accounted for in a way that reduces the occupancy time of PS buffers by Play from TS blocks. Algorithm 2 is a placement algorithm which, when given as inputs B and r,
Algorithm 2  Twisted Placement (integer r)
INPUT: B, integer r
OUTPUT: r-twisted sequence of object's blocks (to be placed on TS)

for i := 0 to ⌈B/r⌉ - 1 do
    T[r·i + 1] := B_{i+1}
randomly place blocks B_{⌈B/r⌉+1}, ..., B_B in the blank tape positions
determines an r-twisted placement/sequence of an object's blocks B_1, ..., B_B in locations T[1], T[2], ..., T[B] on a tape. Examples of twisted sequences are given in Figure 3 (the "randomly place blocks ..." step of Algorithm 2 is realized as "sequentially place blocks ..." in the given examples).
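A minimal Python sketch of this placement (a direct transcription of Algorithm 2, filling the blank positions sequentially as in Figure 3; the function name is ours):

    import math

    def twisted_placement(B, r):
        """Return the r-twisted tape order of blocks 1..B (Algorithm 2, integer r):
        every r-th tape position holds the next object block, and the remaining
        blocks fill the blank positions in sequence."""
        tape = [None] * B
        lead = math.ceil(B / r)              # the boldfaced (Play-from-TS) blocks B1..B_lead
        for i in range(lead):
            tape[r * i] = i + 1              # T[r*i + 1] := B_{i+1} (positions are 1-indexed in the paper)
        rest = iter(range(lead + 1, B + 1))
        for pos in range(B):
            if tape[pos] is None:
                tape[pos] = next(rest)
        return tape

    # twisted_placement(13, 2) == [1, 8, 2, 9, 3, 10, 4, 11, 5, 12, 6, 13, 7]   (Figure 3(a))
    # twisted_placement(13, 4) == [1, 5, 6, 7, 2, 8, 9, 10, 3, 11, 12, 13, 4]   (Figure 3(b))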
Figure 3. (a) 2-twisted and (b) 4-twisted placements of a 13-block object.
(a) tape order: B1 B8 B2 B9 B3 B10 B4 B11 B5 B12 B6 B13 B7  (boldfaced blocks: B1-B7)
(b) tape order: B1 B5 B6 B7 B2 B8 B9 B10 B3 B11 B12 B13 B4  (boldfaced blocks: B1-B4)
One may now immediately see that a produced r-twisted sequence is such that, for given r, when the object's blocks are read off the tape sequentially, every r-th block (boldfaced in Figure 3) will be uploaded into PS precisely before the time it has to be consumed. These ≈ B/r blocks can thus be played from TS without any PS buffer space requirements; the remaining blocks will be elevated to SS, and will be played from SS when needed. Algorithm 3 materializes the above idea. The notational conventions of Algorithm 1 have been used here, too.
Algorithm 3  APWAT on r-twisted sequence, when system's bandwidth ratio is r
INPUT: r-twisted placement of object's blocks in TS

next_upld_blk := next_displ_blk := B1
upload(next_upld_blk++)
for time unit i := 1 to ⌈(B-1)/r⌉ do
    parbegin
        1. consume(next_displ_blk++)
        2. for j := 1 to r-1 do elevate(next_upld_blk++)
           upload(next_upld_blk++)
    parend
Note that the given Alternate Play With A Twist (APWAT) algorithm is essentially an Alternate Play algorithm (that is, some blocks are played from TS and some others from SS), but applied on a specific arrangement of the object's blocks, so that the boldfaced blocks are played from TS.
Table 4
Example of APWAT algorithm (on 2-twisted sequence; B = 13, r = 2)

time unit   (elevated) to SS            (uploaded) to PS    block displayed
0           -                           B1                  -
1           B8                          B2                  B1
2           B8 B9                       B3                  B2
3           B8 B9 B10                   B4                  B3
4           B8 B9 B10 B11               B5                  B4
5           B8 B9 B10 B11 B12           B6                  B5
6           B8 B9 B10 B11 B12 B13       B7                  B6
The APWAT algorithm terminates when all blocks have been read off the TS. The display of the requested object continues by consuming one more block from PS, and all remaining blocks exclusively from SS. The SS bandwidth requirements remain constant (r - 1 blocks per time unit) throughout the algorithm's execution; the display of the remaining blocks requires a constant bandwidth of 1 block per time unit. An example of APWAT's action on the 2-twisted sequence of Figure 3(a), when r = 2, is shown in Table 4 (again, the newly arriving blocks are those not present in the preceding row). One can immediately see that the SS bandwidth savings of the APWAT method are B/r blocks (since 1/r of the blocks are played from TS) for each object. Clearly also, the PS buffer requirements are zero.

3.2. Generalizing APWAT

3.2.1. Disassociating integer r and twisting

Given the system parameter r, APWAT can be applied not only on the r-twisted sequence, but also on any (r/k)-twisted sequence, where k is a divisor of r. In this case, APWAT will play all boldfaced blocks (of the (r/k)-twisted sequence) from TS. An example of APWAT's action on the 2-twisted sequence of a 13-block object (see Figure 3(a)) when r = 4 is given in Table 5. Once more, the newly arrived blocks are those not present in the preceding row. Clearly, when the system's bandwidth ratio is r, the SS bandwidth savings of our newest APWAT algorithm, applied on the (r/k)-twisted sequence, are Bk/r blocks, since k/r (the boldfaced fraction) of the B blocks are never flushed to or retrieved from the SS.
Table 5
Example of APWAT algorithm (on 2-twisted sequence; B = 13, r = 4)

time unit   (elevated) to SS            (uploaded) to PS    block displayed
0           -                           B1                  -
1           B8 B9                       B2 B3               B1
2           B8 B9 B10 B11               B3 B4 B5            B2
3           B8 B9 B10 B11 B12 B13       B4 B5 B6 B7         B3
The PS buffer space requirements of the new algorithm are B(k-1)/r. This is true because the algorithm runs for B/r time units, during each of which the number of blocks in PS is increased by k (uploaded from TS) and decreased by 1 (consumed). Note that when k = 1, APWAT's PS space requirements are zero. Note also that APWAT is always better than Alternate Play, in the sense that for any attainable SS bandwidth savings, APWAT's PS requirements are less than Alternate Play's PS requirements. This follows since B(k-1)/r ≤ Bk(r-1)/r² is equivalent to k ≤ r, which is always true.
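The following Python sketch (ours, not the paper's) plays a w-twisted sequence at an integer bandwidth ratio r that is a multiple of w (so k = r/w) and confirms the figures behind the traces of Tables 4 and 5; it repeats the twisted_placement logic sketched in Section 3.1 so that the snippet runs on its own.

    import math

    def twisted_placement(B, w):
        tape, lead = [None] * B, math.ceil(B / w)
        for i in range(lead):
            tape[w * i] = i + 1
        rest = iter(range(lead + 1, B + 1))
        return [b if b is not None else next(rest) for b in tape]

    def apwat(B, w, r):
        """Play the w-twisted sequence at ratio r (r a multiple of w);
        return (blocks never sent to SS, raw peak PS occupancy)."""
        tape, lead = twisted_placement(B, w), math.ceil(B / w)
        ps, ss, read, peak = {tape[0]}, set(), 1, 1        # time 0: upload the first tape block
        for t in range(1, math.ceil((B - 1) / r) + 1):
            (ps if t <= lead else ss).discard(t)           # display block t
            for _ in range(r):                             # read the next r tape positions
                if read < B:
                    blk = tape[read]; read += 1
                    (ps if blk <= lead else ss).add(blk)
            peak = max(peak, len(ps))
        return lead, peak

    # apwat(13, 2, 2) -> (7, 1): the only PS-resident block is the one about to be displayed,
    #                            i.e., zero extra PS space (Table 4, k = 1).
    # apwat(13, 2, 4) -> (7, 4): Table 5; excluding the block about to be displayed gives 3,
    #                            in line with B(k-1)/r = 3.25 for k = 2.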
3.2.2. Real r

As was the case with our Alternate Play algorithm (in Section 2.2.2), our APWAT algorithm and its companion Twisted Placement algorithm do not require r to be an integer. One may generate the ⌊r⌋-twisted sequence, and apply APWAT on it (playing all boldfaced blocks from TS). Let us call this algorithm the ⌊r⌋ APWAT algorithm. The obvious drawback of the ⌊r⌋ APWAT algorithm is that the PS buffer space requirements are not zero any more. Indeed, one may routinely verify that, in the worst case, the PS buffer requirement will increase to B/(r(r-1)). This is so because the APWAT algorithm runs for B/r time units, displaying B/r blocks. During this time, B/⌊r⌋ blocks are uploaded to PS. Since all of the displayed blocks are played from TS, when the algorithm terminates, the number of blocks still left in PS is B/⌊r⌋ - B/r = B(r - ⌊r⌋)/(r⌊r⌋) ≤ B/(r(r-1)). If PS space is scarce, an alternative to the ⌊r⌋ APWAT is possible. The main idea is to generate a twisted sequence such that, when it is APWAT-played, no boldfaced block will ever be uploaded earlier than the time it will be consumed (actually, since r may not be an integer, a block is allowed to be uploaded earlier than the time it will be consumed, but only by a fraction of a time unit).
Figure 4. 2.3-twisted sequence of a 20-block object.
tape order: B1 B10 B2 B11 B3 B12 B4 B13 B14 B5 B15 B6 B16 B7 B17 B18 B8 B19 B9 B20  (boldfaced blocks: B1-B9)
An example of such a twisted sequence is given in Figure 4. Pseudo-code is given as Algorithm 4.
Algorithm 4  Twisted Placement (for real r)
INPUT: B, real r
OUTPUT: r-twisted sequence T[1], ..., T[B] of object's blocks (to be placed on TS)

d := r - ⌊r⌋
T[1] := B1
for i := 1 to ⌈B/r⌉ - 1 do
    T[⌊r⌋·i + 1 + ⌊d·i⌋] := B_{i+1}
randomly place blocks B_{⌈B/r⌉+1}, ..., B_B in the blank tape positions
Since the distances between the boldfaced blocks of our new method are sometimes ⌊r⌋ and sometimes ⌈r⌉, we will refer to this new twist-and-play method as the ⌊r⌋&⌈r⌉ APWAT algorithm. Clearly, the SS bandwidth savings of our newest ⌊r⌋&⌈r⌉ APWAT algorithm are B/r blocks, and the PS requirement is 1 block (needed to hold parts of the partially uploaded blocks).
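A minimal Python transcription of Algorithm 4 (ours); for the parameters of Figure 4 it reproduces the 2.3-twisted sequence shown there.

    import math

    def twisted_placement_real(B, r):
        """Tape order per Algorithm 4 (real r): boldfaced block B_{i+1} is placed at tape
        position floor(r)*i + 1 + floor(d*i), with d = r - floor(r); the remaining blocks
        fill the blank positions in sequence (positions below are 0-indexed)."""
        d = r - math.floor(r)
        tape = [None] * B
        tape[0] = 1
        for i in range(1, math.ceil(B / r)):
            tape[math.floor(r) * i + math.floor(d * i)] = i + 1
        rest = iter(range(math.ceil(B / r) + 1, B + 1))
        return [b if b is not None else next(rest) for b in tape]

    # twisted_placement_real(20, 2.3) ==
    #   [1, 10, 2, 11, 3, 12, 4, 13, 14, 5, 15, 6, 16, 7, 17, 18, 8, 19, 9, 20]   (Figure 4)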
3.2.3. Disassociating real r and twisting

If the system parameter is a real number r, APWAT can be applied not only on the r-twisted sequence, but also on an r′-twisted sequence, for any r′ < r.

If t > tmin, the start-up latency will be higher (as implied by (4)), but the TS will be idling (since when the time slice of the first object arrives again, not all of its blocks uploaded during its previous time slice will have been consumed), allowing it to be used for other purposes. Nonetheless, the display of the object will be done correctly (i.e., without hiccups), the PS space requirement will still be zero, and the total SS bandwidth savings will still be jB/r for all j jobs. It is worth noting that, for the configuration in Example 1 where j = 20, the maximum start-up latency witnessed by a multiplexed stream is smaller than 7 minutes. If multiplexing were not performed, to serve 20 streams the maximum start-up latency witnessed would be over one hour and thirty-eight minutes, which is of course unacceptable for video-on-demand applications.
5. Data Elevation With Unknown Multiplexing Degree

The algorithms of the previous section (generating the (t, r)-organizations and subsequently APWAT-playing them) work if the number j of multiplexed jobs is known in advance. In this section, we will discuss methods of overcoming this limitation. Assuming that r > 1, if j > r, then no multiplexing of all j display requests is possible. We may thus assume that there is a known upper bound jM on the maximum value of the multiplexing degree j for the system. Obviously, jM ≥ 2 is also true. In order to display j objects, for arbitrary 2 ≤ j ≤ jM, it suffices to use the tmin(jM)-organization. This is so because, according to (2), tmin(j) ≤ tmin(jM). The discussion in the last paragraph of Section 4.2 now implies that using the tmin(jM)-organization will work correctly, without changing the attained SS bandwidth savings of jB/r blocks, or the required zero PS buffer space. One drawback of the last proposal is the fact that the TS may be idling, as discussed in Section 4.2. Additionally, this proposal suffers from increased start-up latency, as shown by (4). If start-up latency is of great importance, it can be traded off against additional TS media storage space, as follows: create and store a tmin(j1)-organization, a tmin(j2)-organization, etc., for various values of j = j1, j2, .... Each used tmin(ji)-organization is a full replica of an object in TS.
Frequently, the workload can be guessed using past experience, and this can be used as a guide to decide which replicas to create. At run time, given the current multiplexing degree jcurr, use the smallest such tmin(ji) value which is larger than tmin(jcurr). This will result in the smallest possible start-up latency. On the other hand, as shown by (3), selecting the smallest such tmin(ji) value (which is larger than tmin(jcurr)) results in a smaller value for the maximum allowable number of multiplexed jobs. Since the total SS bandwidth savings offered by our methods are potentially greater when the allowable multiprogramming degree is higher, it is most beneficial from this point of view to use the tmin value for the highest possible number of multiplexed jobs. To resolve this trade-off between start-up latencies and achievable SS bandwidth savings, we suggest the following algorithm: given jcurr, use the smallest such tmin(ji) value which is larger than tmin(jcurr); as more requests arrive at the TS for multiplexing (or depart from TS service), dynamically switch to the smallest such tmin(ji) value which is larger than tmin(jcurr) for the new value of jcurr. With this algorithm we exploit the replicas (with the different t-tuple organizations) in order to dynamically adapt to the new jcurr value and allow the maximum number of requests to be multiplexed by the TS, while at the same time enforcing the smallest possible penalty for their start-up latencies.
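A minimal sketch of this replica-selection rule (ours, not the paper's); it assumes the tmin values defined by equation (2) have been precomputed for the stored organizations and kept in ascending order, and the names are illustrative.

    import bisect

    def pick_organization(stored_tmins, tmin_curr):
        """Return the smallest precomputed tmin(j_i) that is larger than tmin(j_curr):
        the stored replica with the lowest start-up latency that still supports j_curr."""
        idx = bisect.bisect_right(stored_tmins, tmin_curr)
        if idx == len(stored_tmins):
            raise ValueError("no stored organization supports this multiplexing degree")
        return stored_tmins[idx]

    # On every arrival to (or departure from) TS service, recompute tmin(j_curr) and
    # switch the multiplexed streams to the organization returned by pick_organization.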
6. Pragmatic Concerns

In this section we focus on pragmatic concerns arising from tape technology and from MPEG-based video-server application peculiarities, which present difficulties for the implementation of the placement strategies developed in the earlier sections.

6.1. Streaming Mode Recording
The first concern is with regard to the fact that tapes are usually recorded in streaming mode. Because of tapes' relatively high error rates, this means that, when errors are detected during recording, the re-write of the affected block occurs without stopping and rewinding the tape, essentially creating a number of "bad spots" or "holes" amidst the useful data on the tape. This of course complicates the requirements for our twisted placements.
The proposed solution is technical. Recall from Section 3 that, given the system parameter r, our play algorithms can operate on any r′-twisted sequence, for any r′ < r. Note that essentially the aforementioned "holes" on a tape have the effect of reducing the transfer rate of the real/useful data. In other words, given r and an estimate of the write-error rate of a tape drive, we can (perhaps pessimistically, to avoid hiccups) estimate the parameter r′ and create the twisted sequence for r′. As an example, if r = 4 and our estimates of the tape write-error rate lead us to r′ = 2, instead of using the 4-twisted sequence shown in Figure 3(b), we can use the 2-twisted sequence of Figure 3(a). When recorded on tape, there will be "holes" in between the blocks of this sequence, so that on average, in each time unit the tape head will have scanned as much tape as is required to hold r = 4 blocks, but only r′ = 2 useful data blocks will have been read off the tape, one of which will be played from TS directly.
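One plausible way to derate r is sketched below in Python; the paper does not give a formula, so the linear discount by the expected re-write overhead is our assumption, and the names are illustrative.

    import math

    def derated_twist(r, rewrite_overhead):
        """Estimate r' from the nominal ratio r and the expected fraction of tape capacity
        lost to streaming-mode re-writes, then pick the integer twist to record with."""
        r_eff = r * (1.0 - rewrite_overhead)     # pessimistic effective ratio r' <= r
        return max(1, math.floor(r_eff))         # record the floor(r')-twisted sequence

    # derated_twist(4, 0.5) == 2: with half the tape capacity lost to re-writes, fall back
    # from the 4-twisted to the 2-twisted sequence, as in the example above.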
6.2. Twisting Variable Sized Blocks

Our champion application is video servers. The predominant encoding technique for videos is MPEG. Given the well-established practice of normalizing video block lengths with respect to the playback duration time, the recorded MPEG video blocks necessarily become of variable size. The variability in the block sizes, of course, complicates our twisted placements. In fact, it complicates all video-data placement problems. For this reason most related work (placement algorithms, and their analyses, for video data on SS) [ÖRS96,TF98,BMGJ94] assumes constant block sizes, as we have done up to now. In this section we solve this problem. We adopt the following approach. Let us assume that bT = T MB/s and that si denotes the size of block i, 1 ≤ i ≤ B, with s now denoting the average block size. Recalling that each time unit lasts d seconds, in each time unit T·d MB are read off the tape. Assuming that none of the B blocks has a size greater than T·d, the pseudocode for the twist algorithm is given as Algorithm 8.
Algorithm 8  Twisted Placement with Variable Block Sizes
INPUT: B; s_i, 1 ≤ i ≤ B; s (the average block size)

1. Store B1 at the beginning of the tape.
2. Split the remaining movie size into partitions, each of size T·d MB.
   There will be L = ⌈(B·s - s1) / (T·d)⌉ such partitions.
3. Store blocks B2, B3, ..., B_{L+1} at the right end of partitions 1, 2, ..., L, respectively.
4. Distribute the remaining blocks into the empty spaces of the partitions -- these will be the Play-from-SS blocks.
Observations:
1. During each time unit, by design, one partition is read.
2. Blocks B2, B3, ..., B_{L+1} will be the Play-from-TS blocks.
3. As long as each Play-from-TS block is "flushed right" in its partition, there will be no additional PS space requirements; these blocks will arrive just in time before their display starts.
4. Step 4 with the above requirement is NP-Complete. It can be reduced to the well-known bin packing problem, where each partition corresponds to a bin and we want to employ as few partitions (bins) as possible during Step 4. That is, we wish to fit the remaining Play-from-SS blocks as tightly as possible in the partitions, so as to have the Play-from-TS blocks as close as possible to the right end of their partitions, and thus minimize the additional requirements for PS space. Thus, finding the solution with the minimum PS space requirements is an NP-Complete problem.

We want an approximation algorithm for Step 4 that is computationally inexpensive and performs close to the optimal solution. A simple algorithm that meets these demands is the Decreasing First Fit (DFF) algorithm [GJ79]. DFF, when "translated" into our problem setting, works as follows: first it sorts the blocks in decreasing order of their sizes; then the blocks are placed, in this order, into the first partition they fit. It can be shown [GJ79] that DFF is never worse than the optimal solution by more than 22%, which in our problem setting means that:
1. On occasion DFF will result in requiring 22% more tape storage for a video, because of "holes" in each partition between the Play-from-SS and the Play-from-TS blocks. This also implies that, in the worst case, 22% more time will be required to read the blocks of a video from the tape.
2. However, each Play-from-TS block will require zero additional PS space.
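A compact Python sketch (ours) of Algorithm 8 together with the DFF fill just described; block sizes are in MB, blocks are 1-indexed, and any block that does not fit in the L partitions is reported as overflow (the up-to-22% extra tape case).

    import math

    def algorithm8_layout(sizes, T, d):
        """Pin block 1 at the start of the tape and blocks 2..L+1 (the Play-from-TS blocks)
        at the right end of partitions 1..L of capacity T*d MB each; pack the remaining
        Play-from-SS blocks into the leftover space with First-Fit Decreasing."""
        cap = T * d
        L = math.ceil(sum(sizes[1:]) / cap)                 # partitions for everything after block 1
        parts = [[p + 2] for p in range(L)]                 # partition p pins TS block p+2 (kept last)
        free = [cap - sizes[p + 1] for p in range(L)]       # sizes[i-1] is the size of block i
        overflow = []
        ss_blocks = sorted(range(L + 2, len(sizes) + 1), key=lambda b: sizes[b - 1], reverse=True)
        for b in ss_blocks:                                 # First-Fit Decreasing
            for p in range(L):
                if sizes[b - 1] <= free[p]:
                    parts[p].insert(-1, b)                  # keep the Play-from-TS block flushed right
                    free[p] -= sizes[b - 1]
                    break
            else:
                overflow.append(b)
        return parts, free, overflow

    # With 3,000 equal blocks of 0.5 MB and T*d = 5 MB (the parameters of Example 7 below),
    # each partition holds its pinned TS block plus up to nine SS blocks and there is no overflow.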
The obvious solution to the above problem is to, first, shift left each Play-from-TS block in its partition to fill the empty space and, second, shift left the partitions to cover the now empty spaces between the partitions. This will eliminate all "holes", but will also result, in the worst case, in 22% additional PS space. Thus, we need an algorithm that will strike an effective balance between TS read time and PS space overhead. Pseudocode for this algorithm is given as Algorithm 9. The essence of the algorithm's contribution is in Step 6, which guarantees that there will never be a requirement for additional PS space greater than T·d MB. This is achieved at the expense of having fewer Play-from-TS blocks (i.e., lower SS bandwidth savings, since some partitions "lose" their Play-from-TS blocks). A following example, using typical parameter values, will illustrate the usefulness of the above algorithm.

6.3. Average I/O Bandwidth Savings of Algorithm 9
We now provide an analysis showing the average I/O bandwidth savings of the above algorithm, under the worst-case performance of the DFF algorithm. The total empty space, in the worst case, that would result from the running of Algorithm 8 is 22% of the total space, that is:

    total_empty = 0.22 · B · s                                        (5)

The number of partitions is

    L ≈ (B · s) / (T · d)                                             (6)

The average empty space per partition is:

    avg_empty = total_empty / L                                       (7)

Thus, on average, one in every k partitions "loses" its Play-from-TS block, where

    k · avg_empty ≈ T · d   =>   k ≈ (T · d · L) / (0.22 · B · s)     (8)
Algorithm 9  Polynomial-Time Twisted Placement under Variable Block Sizes
INPUT: All Play-from-SS and Play-from-TS blocks, after B1 is placed at the beginning.

next_part = 1
remaining_blocks = set of all Play-from-SS blocks
inherit_empty = 0
next_TS_block = B2

0.  Divide the available tape into partitions of size T·d MB.
1.  Put the block next_TS_block at the right end of partition next_part.
2.  Sort remaining_blocks in decreasing order.
3.  Run the DFF algorithm until partition next_part can be filled no more,
    i.e., until its empty space is smaller than the minimum block size in remaining_blocks.
4.  inherit_empty += amount of empty space in partition next_part.
5.  Shift left the Play-from-TS block in partition next_part to cover the partition's empty space.
6.  If (inherit_empty ≥ T·d) then {
        inherit_empty = 0
        the last block in partition next_part will now be marked as a Play-from-SS block
        (i.e., partition next_part lost its Play-from-TS block)
    }
7.  Remove from remaining_blocks the ones just placed.
8.  next_part += 1
9.  next_TS_block += 1
10. Repeat until all blocks are placed.
Therefore, the number of partitions without a Play-from-TS block is L/k, and the I/O bandwidth savings are L - L/k blocks.

Example 7: Consider an object consisting of 3,000 blocks, each 0.5 MB, while d = 1 sec and T = 5 MB/sec. For this example we have: total_empty = 0.22 × 1.5 GB = 330 MB, L ≈ 1.5 GB / 5 MB = 300, avg_empty = 330 MB / 300 = 1.1 MB, and k ≈ 5. The number of partitions without a Play-from-TS block is about 60. So, in the worst case, we require at most 5 MB of PS space, while about 80% of all partitions retain their Play-from-TS block.
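A quick Python check of Example 7 against equations (5)-(8); illustrative only.

    B, s_avg, d, T = 3000, 0.5, 1, 5                # blocks, MB, sec, MB/sec
    total_empty = 0.22 * B * s_avg                  # (5): 330 MB
    L = B * s_avg / (T * d)                         # (6): 300 partitions
    avg_empty = total_empty / L                     # (7): 1.1 MB
    k = (T * d) / avg_empty                         # (8): ~4.5 (the text rounds this to ~5)
    print(L / k, L - L / k)                         # ~66 partitions lose their TS block (about 60
                                                    # with k ~ 5); ~234, i.e., about 80%, retain it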
7. Conclusions

In this paper we have addressed the problem of continuous data elevation in multimedia servers which are based on HSMSs; a problem which, in our view, has not received adequate attention despite the facts that: (i) the current cost-per-megabyte figures make tape libraries the most cost-efficient medium for such applications, (ii) several real-world products are already employing tape libraries for storing continuous media, and (iii) the significant continuous annual increase in the tape library market promises near-future technological improvements that will improve their performance. We first contributed the notion of alternating the playback of delay-sensitive data between the TS and the SS, and discussed how this idea can save significant SS bandwidth (but also pointed out that it requires non-zero buffer space). Subsequently, we contributed the Twisted Placement algorithm with a companion play algorithm, called Alternate Play With A Twist; the placement algorithm determines the proper placement/recording order for the blocks of objects on the tapes so that the play algorithm achieves the same SS bandwidth savings as before, but this time with zero PS buffer space requirements. Subsequently, we contributed the notion of strips of streams, which are special partial replicas of stream objects, residing permanently in TS, and which consist of the blocks which are to be played from TS. Strips of streams allow the previous contributions to be enjoyed even when (1) the bandwidth of TS is smaller than the display bandwidth of objects, and (2) the objects also reside in SS (which will be the case for the most popular objects). Later we considered the subject of multiplexing TS-resident video streams over tape drives. Multiplexing is desirable for two reasons. First, it can drastically reduce the average start-up latency of videos residing in the TS. Second, employing our placement and play algorithms in such environments can offer the associated SS bandwidth savings for all streams being multiplexed, increasing the total benefits of our approach. We derived algorithms which employ the previously developed techniques to continue offering SS bandwidth savings, at no additional PS buffer space. We presented an algorithm showing how to store the streams on tapes so that a high multiplexing degree j is maintained. For each of the j jobs the algorithm continues to achieve the aforementioned savings; thus, the total savings are significantly greater. For many applications we expect that the data objects will exhibit skewed
access distributions (e.g., popular movies in a movie-on-demand application). For a large majority of cases the most popular objects will have been uploaded and will be residing in SS. In such scenarios, the SS will be responsible for satisfying a great percentage of the requests, namely those referring to the most popular streams. With present technology, however, even the high-end SS server products do not have enough bandwidth to satisfy all requests for all streams they store. (This is expected to hold in the future as well, given the pace of improvement of storage capacities.) When taking into account the I/O bandwidth required for uploading colder streams from TS to SS, the available I/O bandwidth becomes even smaller. In this paper, we have shown techniques which exploit the ever-increasing bandwidth of modern tape drives for increasing the available I/O bandwidth for satisfying requests. We made an effort to view the problem comprehensively, in the sense that we focus on all main resources (host RAM memory, SS and I/O bus bandwidth, and TS bandwidth). With our techniques the overall I/O bandwidth is increased without requiring extra host RAM, while still avoiding video hiccups. In a sense, the aforementioned contributions suggest the collaboration of TS and SS in order to improve the system's throughput. The essence of our proposed techniques is that the TS of a HSMS is not simply used to augment the storage capacity of SS (as its traditional role would indicate) but to augment the SS bandwidth as well. This is the high-level contribution of this research.
References

[ASS96] A. Dan, D. Sitaram, and P. Shahabuddin. Dynamic batching policies for an on-demand video server. ACM/Springer Multimedia Systems, (4):112-121, June 1996.
[BMGJ94] Steven Berson, Richard Muntz, Shahram Ghandeharizadeh, and Xiangyu Ju. Staggered striping in multimedia information systems. In Richard T. Snodgrass and Marianne Winslett, editors, Proc. ACM SIGMOD Conference, Minneapolis, MN (ACM SIGMOD Record, 23(2), June 1994), pages 79-90, May 1994.
[Che94] A. L. Chervenak. Tertiary Storage: An Evaluation of New Applications. PhD thesis, University of California, Berkeley, 1994.
[Che98] A. L. Chervenak. Challenges for tertiary storage in multimedia servers. Parallel Computing, 24:157-176, 1998.
[CP90] P. Chen and D. A. Patterson. Maximizing performance in a striped disk array. In Proceedings of the 17th Annual IEEE Symposium on Computer Architecture, pages 322-331, 1990.
[CT95] Stavros Christodoulakis and Peter Triantafillou. Research and development issues for large-scale multimedia information systems. ACM Computing Surveys, 27(4):576-579, December 1995.
[CTZ97] S. Christodoulakis, P. Triantafillou, and F. Zioga. Principles of optimally placing data in tertiary storage libraries. In Proc. of 23rd Intl. Conf. on Very Large Data Bases, VLDB, August 1997.
[Dan95] A. Dan. Buffering and caching in large-scale video servers. In IEEE CompCon Conference, pages 217-224, 1995.
[DS94] A. Dan and D. Sitaram. Buffer management policy for an on-demand video server. Technical Report RC 19347, IBM, 1994.
[GDS95] Shahram Ghandeharizadeh, Ali Dashti, and Cyrus Shahabi. A pipelining mechanism to minimize the latency time in hierarchical multimedia storage managers. Computer Communications, 18(3):170-184, March 1995.
[Gea94] G. R. Ganger et al. Disk arrays: High-performance, high-reliability storage subsystems. IEEE Computer, (3):30-36, March 1994.
[GJ79] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-completeness. Freeman and Company, 1979.
[GLM96] L. Golubchik, J. C. S. Lui, and R. R. Muntz. Adaptive piggybacking: A novel technique for data sharing in video-on-demand storage servers. ACM/Springer Multimedia Systems, (4):140-155, June 1996.
[GR] S. Ghandeharizadeh and L. Ramos. Continuous retrieval of multimedia data using parallelism. IEEE Trans. on KDE, pages 658-669, August.
[GR98] L. Golubchik and R. K. Rajendran. A study on the use of tertiary storage in multimedia systems. In Proceedings of the Joint NASA and IEEE Mass Storage Conference, March 1998.
[GS94] Shahram Ghandeharizadeh and Cyrus Shahabi. On multimedia repositories, personal computers, and hierarchical storage systems. In Proc. ACM Multimedia Conference, 1994.
[Hen97] J. Hennessy. The role of data storage in broadcasting future. In www.tvbroadcast.com/archive/2897.4.htm, 1997.
[HS96a] B. K. Hillyer and A. Silberschatz. On the modeling and performance characteristics of a serpentine tape drive. In Proc. of the ACM SIGMETRICS Intern. Conf., pages 170-179, 1996.
[HS96b] B. K. Hillyer and A. Silberschatz. Random I/O scheduling in online tertiary storage systems. In Proc. of the ACM SIGMOD Conference on the Management of Data, pages 195-204, 1996.
[JM98a] T. Johnson and E. Miller. Benchmarking tape system performance. In Proceedings of the Joint NASA and IEEE Mass Storage Conference, March 1998.
[JM98b] T. Johnson and E. Miller. Performance measurements and models of tertiary storage devices. In Proc. of 24th Intl. Conf. on Very Large Data Bases, VLDB, 1998.
[Joh96] T. Johnson. An analytical performance model of robotic storage libraries. Performance Evaluation, (27-28):231-251, 1996.
[KDST95] Martin G. Kienzle, Asit Dan, Dinkar Sitaram, and William Tetzall. Using tertiary storage in video-on-demand servers. In COMPCON'95, San Francisco, CA, pages 225-233. IEEE Computer Society Press, March 1995.
[KK95] K. Keeton and R. Katz. Evaluating video layout strategies for a high performance storage server. ACM/Springer Multimedia Systems, (3):43-52, 1995.
[KMP90] J. Kollias, Y. Manolopoulos, and C. H. Papadimitriou. The optimum execution order of queries in linear storage. Information Processing Letters, 36(3), November 1990.
[KP98] K. Keeton and D. Patterson. A case for intelligent disks (IDISKs). SIGMOD Record, 27(3), August 1998.
[KRT94] M. Kamath, K. Ramamritham, and D. Towsley. Buffer management for continuous media sharing in multimedia database systems. Technical Report 94-11, University of Massachusetts, Amherst, MA, 1994.
[KW98] A. Kraiss and G. Weikum. Integrated document prefetching and caching in hierarchical storage based on Markov-chain predictions. VLDB Journal, pages 141-163, May 1998.
[LLW95] S. W. Lau, J. C. S. Lui, and T. C. Wong. A cost-effective near-line storage in multimedia systems. In Proceedings of the 11th International Conference on Data Engineering, March 1995.
[LS93] P. Lougher and D. Shepherd. The design of a storage server for continuous media. Computer Journal, (1), 1993.
[ÖRS96] Banu Özden, Rajeev Rastogi, and Avi Silberschatz. Disk striping in video server environments. In Proc. International Conference on Multimedia Computing and Systems, Hiroshima, Japan, pages 580-589. IEEE Computer Society Press, June 1996.
[Pat93] D. A. Patterson. Keynote speech. In 2nd IEEE International Conference on Parallel and Distributed Information Systems, January 1993.
[PGK88] David Patterson, Garth Gibson, and Randy H. Katz. A case for redundant arrays of inexpensive disks (RAID). In Haran Boral and Per-Åke Larson, editors, Proc. ACM SIGMOD Conference, Chicago, IL (ACM SIGMOD Record 17(3), Sept. 1988), pages 109-116, June 1988.
[RGF98] E. Riedel, G. Gibson, and C. Faloutsos. Active storage for large-scale data mining and multimedia applications. In 24th Int. Conf. on Very Large Data Bases (VLDB), August 1998.
[RV93] V. Rangan and H. Vin. Efficient storage techniques for digital continuous multimedia. IEEE Trans. on KDE, pages 564-573, August 1993.
[RZ95] D. Rotem and J. L. Zhao. Buffer management for video database systems. In Proceedings of IEEE International Conference on Data Engineering, pages 439-447, 1995.
[SGM86] Kenneth Salem and Hector Garcia-Molina. Disk striping. In Proc. International Conference on Data Engineering, Los Angeles, CA, pages 336-342, February 1986.
[Sim97] D. Simpson. Untangle your tape storage costs. In Datamation, June 1997.
[Tea93] F. A. Tobagi et al. Streaming RAID: a disk array management system for video files. In Proceedings of ACM Multimedia, pages 393-400, 1993.
[TF98] Peter Triantafillou and Christos Faloutsos. Overlay striping and optimal parallel I/O in modern applications. Parallel Computing Journal, 24(1):21-43, January 1998.
[TH99] P. Triantafillou and S. Harizopoulos. Prefetching into smart-disk caches for high-performance continuous media servers. In IEEE International Conference on Multimedia Computing and Systems, June 1999.
[TP97] P. Triantafillou and T. Papadakis. On-demand data elevation in a hierarchical multimedia storage server. In Proc. of 23rd Intl. Conf. on Very Large Data Bases, VLDB, pages 226-235, August 1997.
[www97] www.research.microsoft.com/terraserver. The TerraServer spatial database. November 1997.
[www98] www.tpc.org. TPC executive summaries. October 1998.