Strategic Replication of Video Files in a Distributed Environment 1 ...

3 downloads 11160 Views 316KB Size Report
Distributed Multimedia Research Center. Department of ... Entertainment, education, teleconferencing, telemarketing, telemedicine, etc. are a number of promising ap- plications of the ...... We call this situation Storage Overflow. In the Storage ...
Strategic Replication of Video Files in a Distributed Environment Youjip Won and Jaideep Srivastava Distributed Multimedia Research Center Department of Computer Science,University of Minnesota Minneapolis, MN 55455 yjwon srivasta @cs.umn.edu 



Abstract The per service cost have been serious impediment to wide spread usage of on-line digital continuous media service, especially in the entertainment arena. Although handling the continuous media may be achievable due to the technology advances in past few years, its competitiveness in the market with the existing service type such as video rental is still in question. In this paper, we propose a model for continuous media service in a distributed infrastructure which has a video warehouse and intermediate storages connected via a high speed communication network in an effort to reduce the resource requirement to support a set of service requests. The storage resource and network resource to support a set of requests should be properly quantified to a uniform metric to measure the efficiency of the service schedule. We developed a cost model which maps the given service schedule to a quantity. The proposed cost model is used to elaborately capture the amortized resource requirement of the schedule and thus to measure the efficiency of the schedule. The distributed environment consists of massive scale continuous media server called a video warehouse, and intermediate storages connected via high speed communication network. An intermediate storage is located in each neighborhood, and its main purpose is to avoid the repeated delivery of the same file to a neighborhood. We presumed the situation where a request for a video file is given sometime in advance. We develop a scheduling algorithm which strategically replicates the requested continuous media files at the various intermediate storages. Keywords: video caching, storage overflow, video scheduling, continous media delivery, distributed service, cost model

1 Introduction 1.1 Motivation Entertainment, education, teleconferencing, telemarketing, telemedicine, etc. are a number of promising applications of the Video-On-Demand technology. In some applications such as business and medicine, the cost of the technology maybe justifiable, but in other applications such as home entertainment, this technology will have to compete with $2 - $3 video rental cost. There is a significant opinion in the community that 1

a true on-demand video service, i.e. one where the video begins as soon as the customer makes a requrest for it, cannot be provided at a comparable cost to video rental rate in the near future. Given the fact that there is sufficient consumer dissatisfaction with fast rising cable rates, any expensive technology is not likely to be very successful in the mass entertainment arena. On the other hand, if we examine existing video rental patterns, it is not unreasonable to assume that customers would be satisfied with a Video-On-Reservation service, where the customer makes the service request some time in advance to the actual presentation time, i.e. an hour, a day, etc . We believe it is possible to provide Vide-On-Reservation service in a cost-effective manner. Since the a set of user requests is available to entertainment provider in VOR service, the server can perform global optimizations based on the user request information as well as the the availibility of various resources. For the Video-On-Demand service, such pre-planning or off-line computing is not possible, due to its inherent on-demand nature. Given the limits on I/O subsystems of the server, there are many difficulties in supporting multiple service requests directly from a single server, while simultaneously achieving low cost. Furthermore, as the length of the communication route between server and client increases, the cost of maintaining smooth media flow across the network increases dramatically. In this paper we present the idea of providing continuous media service in a distributed manner. The distributed environment consists of a number of intermediate storages and video warehouses, all connected by a high speed network. When there are multiple storages distributed over the area, a user can get service from various sources other than from central server, and the users can share a file at an intermediate storage. Fig. 1 depicts the environment, which consists of a video

N1

N4

IS2 IS5

VW1 IS4

N3

VW2

IS1 N2 IS3

IS6

IS7 N5 N6

Figure 1: Topological Layout warehouse, intermediate storages



’s, and the users in each neighborhood 

2



’s. A video warehouse is an

archive that may have several thousand video files in its storage system.

1.2 Related Works In this section, we briefly introduce the works done in each component of our distributed service environment. Papadimitriou et al[PRR94] proposed the service model for on-line home entertainment service and also devised an algorithm to arrange the service schedule. We categorize user requests into two different types based on whether service request is made in advance or not: Video-On-Demand and Video-On-Reservation. Little et al[LV94] classifies interactive services into five categories based on the amount of interactivity allowed. The purpose of the intermediate storage is to temporarily store the files. To exploit the bandwidth capacity and space capacity of the intermediate storage, the files needs to be efficient placed over a set of disks. Little et al[LV95] proposed an algorithm to place video files to the distributed storage system based on the popolarity of the video file. Dan et al[DS95] proposed to consider the ratio between the space requirement and bandwidth requirement of a file in allocating the disk space. Wan et al[WD96] used the idea of bandwidth to space ratio in determining striping width of a file. Due to huge volume and continuity requirements, data block placement is an important factor in determining the performance of the disk storage. By placing the blocks efficiently, it is possible to support a larger number of concurrent video sessions[RVR92, KHS94]. Video warehouse uses a high speed network to deliver the service to customers all over the metropolitan area[BC89, BB94, Bou92]. The existing network infrastructure is not originally designed to support time critical application. There have been a number of proposals to guarantee the continuity over the high speed communication network[Tow93, ZDE 93, SZKT96]. The other community of academia and industry focuses on utilizing the existing communication medium such as twisted pair telephone line or coaxial cable line which is already available to most of the residential unit. The idea of a cable modem is to provide the internet service to PC users over the TV cable systems. Such modems can delivery upto 30 Mbps to each residential unit[Hal96, BR97]. Asymmetric Digital Subscriber Line[SL92, BR97] is one of the most promising technologies to increase the capaicty of the copper loops. ADSL is now capable of pumping the data at upto 6 Mbps. Development of optimal pricing model, how much user has to pay for the service?, suddenly draws wide attention from various communities. This is because the network infra structure makes the transition from research testbed to commercial enterprise. Pricing model and QoS policy are tightly coupled to each other. Shenker et al[SCEH96] and Cocchi et al[CSEZ93] investigates the pricing model of the multiple service class network. Video Warehouse stores several thousands of continuous media files in its archive. There have been a number of suggestion about how to design the massive scale data server. Hierarchical storage structure in video warehouse is the promising solution to achieve the cost-effectiveness of the data storage. Ghandeharizadeh et al[Gha94] and Keinzle et al[MK94] worked on hierarchical storage structure to efficiently utilize the bandwidth of the several storage types. Doganata et al[DT94] provided a cost model 3

for hierarchical-storage-based continous media server. Won et al[WS96] provided a stochastic model to measure the thruput of the hierarchical storage server.

1.3 Contributions A great amount of efforts from various community is put in supporting the continuity requirement of the on-line multimedia information delivery. However, economic aspect of servicing a set of users considering storage and network resources is paid little attention. In this paper, we propose the usage of intermediate storage to relieve the burden of the central server - information provider and to reduce the network traffic. We develop a mechanism, cost model, which maps the overall resource requirement for supporting the set of requests to the domain of comparable values, cost. The proposed cost model enables the information provider to examine the cost-effectiveness of a certain service delivery schedule. We focus on the problem of minimizing the cost for servicing a set of Video-On-Reservation requests. The cost of using intermediate storage and the cost of using network determines how to deliver the service to the end-user. We develop an algorithm to strategically replicate the files to intermediate storages to obtain the efficient service schedule. An important side benefit of this optimization is that for a given load, the environment will be optimized to utilize the least required capacity, thus freeing up valuable spare capacity for the truly Video-On-Demand applications. The rest of the paper is organized as follows: Section 2 introduces the model of the environment and the services provided. In section 3, the mathematical formulation of the problem and the cost model for servicing delivery is provided. In section 4, we present our approach to solving to the video delivery problem, and in section 5, we discuss how to handle storage overflow. Performance evaluation is presented in section 6 and conclusions in section 7.

2 Video Delivery Environment and Services 2.1 Video-On-Demand and Video-On-Reservation In the proposed environment, we classify the type of services into two categories, Video-On-Demand and Video-On-Reservation. In VOD service, a server takes an immediate action to service the request, while in VOR service, user request is to make the reservation for service in advance. To handle VOR requests, time is divided into cycles. For example, supposing we consider 24 hours to be the length of a cycle, all requests for videos to be shown during cycle must arrive at the server by a predefined time before the start of the cycle . If 24 hours is too long a period for an advance reservation, one could consider smaller cycles, say 12 or 6 hours. Alternatively, one can have a number of cycles of various lengths, and requests for shorter cycles would have a higher cost. This observation is likely to lead the information provider to attract airline executives to do their pricing. For example, the user gets 75 % discount of the regular price if he makes a video request in 48 hour advance of his requested showing time, 24 hour advance with 60 % discount, 6 hour advance with 25 percent discount. Considering the computation time to obtain the schedule, the scheduler has to start the scheduling algorithm early enough so that the schedule information is available 4

by the starting of the respective cycle. The advance reservation property of the VOR request enables the scheduler to collect all the requests before it starts scheduling. In scheduling VOD requests, there is neither sufficient information nor time to perform the global resource optimization which is possible for VOR requests. Furthermore, since overall network and storage capacities are fixed, developing efficient, i.e. least resource utilizing, schedules for the VOR requests has the added benefit of making more resources available for dynamically arriving VOD requests.

2.2 Economics of Storage and Network Due to the huge volume and bandwidth constraints of the playback, continous media application imposes another dimension of complexity to the computer systems. Multimedia file compressed with MPEG-2 encoding[WG193] scheme requires 3 - 6 Mbps bandwidth for its playback. Assuming the typical length of a video file to be 110 minutes, it requires about 3.3 Giga bytes of storage space for an average sized video file. With the current market price for hard disk at approximately $ 100/(Giga Byte), it costs $165,000 to maintain five hundred video files on the disk storage, excluding other hardware overhead. Aside from the cost of disk storage for five hundred video files, the problem of interconnecting and maintaining huge number of disks with the existing SCSI interface arises as an obstacle to store five hundred video files on the disk subsystem. For example, considering the capacity of a single disk which is commercially available ranges from 2 - 10 Giga Bytes, at least 50 disk drives need to be interconnected to form disk storage subsystem. Recently, there have been proposals for new storage interfaces which adopt serial data communication technology[Dem95, SV95]. We argue that hierarchical storage structure, where secondary storage being disk subsystem and tertiary storage being tape library or optical drive is the promising storage architecture for cost effective data management. The purpose of a intermediate storage is to temporarily store the video files to avoid repeated transmissions to the same neighborhood. Each intermediate storage is primarily responsible for supporting its   will have neighborhood consisting of several houses. It is not expected that an intermediate storage  sufficient capacity to store all the files requested by neighborhood  at peak demand. Insufficient storage space available in the intermediate storage will result in a much higher service cost, since a continuous playout must then be supported across the network. We provide examples for the usage of intermediate storage in Fig. 2. For a popular video, it is likely that the video file will have to be shown several times in the same neighborhood. Consider Fig. 2, which shows three requests for Star Wars from in a 90 minute interval.  Rather than delivering the video directly from three times, it is possible to store the video file at while serving the earliest request. The second and third video requests can now be serviced directly from  . Considering the high load on the warehouse server and on the metropolitan area network during prime time hours, it may be much cheaper to service the three viewers with video caching. Fig. 2 illustrates the usage of the intermediate storage system for this purpose.





5

        &'   ()+* ,-  / 0

8

0  8  8: 9= @?A,  .  D  8  D 8  EGF $ HIKJ)L  $M?B,  .  EON7P3QR ST)U V P3QR S T U R WX Y  *>Z  [  %

set of Intermediate Storages Intermediate Storage i set of storage costs    charging rate for intermediate storage (in $/  "!#$$% set of network costs    . cost rate of the network connection between and in $/byte Number of available video files

1325476 0 

a set of requests for file * service schedule of file * obtained from individual video scheduling .  service schedule obtained with the constraints @?B, 1C25476 8  service schedule for all files i.e., 8  total cost of service schedule 8 total cost of service schedule  . set of stays involved in cache overflow during interval ?B at    . Overflow situation ?B, . EON P3QR ST U  % heat of rescheduling with respect to length of service for file * size of the file * (byte)   [  file transfer information, +J)\] ,^^_ ,^* F  [  W    e file residency information, a` cb ,a^d ,aIJ% ,a* ,^(b>f , $# *>%MgI+*# .

Table 1: Variables

6

U1

Request for "Star Wars" Starts at 1:00 pm

U2

Request for "Star Wars"

Switch Box

Starts at 2:30 pm Request for "Star Wars"

Video Warehouse

U3

Delivery to U1 : VW2 Delivery to U2: VW2 Delivery to U3 : VW2

U1 U2 U3

Starts at 4:00 pm

U1

Request for "Star Wars" Starts at 1:00 pm

U2

Request for "Star Wars" Starts at 2:30 pm

U3

Request for "Star Wars" Starts at 4:00 pm

IS

Video Warehouse Delivery to U1 : VW-> IS-> U1 Delivery to U2: IS -> U2 Delviery to U3: IS -> U3

Figure 2: Usage of the intermediate storage system

3 Problem Formulation 3.1 Service Schedule A user request arriving at the server consists of three attributes, user id, video id and starting time. The service schedule is a set of information about how to arrange the delivery of the continuous media streams to end users. There are two types of information in a service schedule - network transfer information and usage of the intermediate storage . Servicing a video file to end user requires network transfer of the stream and possible caching of the file at the intermediate      storage. The network transfer information is ( ). is a sequence of network nodes,     i.e. , where and correspond to intermediate storages. Instead of representing file transfer route as a source and destination pair, sequence of nodes are used to represent the delivery route. This enables the scheduler to dynamically select the route between each node, depending on the network   situation. Delivery information informs the server and intermediate storages that flow of file from    to is to begin at . Routing information between and end user is not specified since we assume the path between the user and its local intermediate storage is uniquely defined. The intermediate storage is said to be local to a user if it is located in the same neighborhood with the respective user.

      %&('*)  % ! &(+

%! &-+

           "!   #$ 



%,&-'*)

% ! &-+



!

%,! &-+



% &-'*)

In servicing a set of requests, a video file has to be stored temporarily in various intermediate storages by copying data blocks from the on-going continuous media stream. The purpose of storing a file in an interme diate storage is to temporarily cache the video file for subsequent delivery to other users. The information     about temporary storage of a video file is the vector of five elements: .    denotes the interval of caching. corresponds to the time when the file starts being loaded at the  intermediate storage. denotes the start time of the last service. A file resides at intermediate storage

/ &   1 2

1

 

 4     0  5  4 5 .*/ 0&  31 2  % (& ' )  6  * 7

&

7

to service a set of the requests. The data blocks which have not been consumed by the last service need to be maintained at the intermediate storage. The data blocks which are consumed by the last request in   chronological order are no longer required. Thus the caching interval is is followed by the playback    and represent the intermediate storage and the respective file. is the duration of the last service.  source of the on-going stream from which the data block is copied. can be either another intermediate storage or VW.

4 

/ 0&   1 2 % &-'*)



% &('*)

3.2 Cost Modeling



To measure the overall cost-effectiveness of the service schedule , there needs to be a generic mechanism  which maps the given schedule into a cost. Schedule is a collection of network transfer information ’s  and file residency information ’s. The mapping mechanism should capture ’s resource consumption in storage devices and the communication network. We define as a mapping function that transforms service . We use the monetary metric to represent schedule into a quantity. The cost of thus corresponds to the cost for a schedule . The cost of a schedule is sum of the cost of using storage devices and cost of   using communication medium. The network transfer information ’s and file residency information ’s has the different formation and thus different function is used to obtain its respective cost. It is worth noting that amortized resource requirement needs to be discriminated from cost. Amortized resource requirement is a metric to capture the amount of resources. Cost is amount of the resource consumed multiplied by per unit cost1 of each resource. In a heterogeneous environment, it is not possible to impose a uniform per unit cost to all resources which have different performance and characteristics. For example, per unit cost of high performance disk subsystem can be higher than that of general purpose disk subsystem. Per unit cost is inherent to an individual resource entity, e.g. each intermediate storage or each network hop. We   and   for file residency information  , and network transfer information develop the function  , respectively which maps given information into the domain of cost. The cost of schedule is hence represented as in Eq. (1).



. 7

. 7





. 7









. 7     .   7     .   7 

3.2.1



(1)

Cost Model for Storage

For continuous media files, playback length of the data file cannot be determined solely from the file size. This is due to the fact that different files may require different playback rates and thus two files of different sizes may have the same playback length. The amortized resource requirement at an intermediate storage   is the amortized resource is a function of file size, duration of residency, and playback length. requirement multiplied by the charging rate at the respective intermediate storage. To capture the amortized   has to reflect all the three attributes. In the charging storage space requirement of the service properly, mechanism for storage, we propose a cost model that considers both the duration and the size of the service. Unit of the storage cost model is $/(byte sec).

. 7

1

termed as charging rate of the resource in this paper.

8

. 7

Files are loaded on an intermediate storage by copying data blocks from streams during its transmission, and thus the disk space is filled incrementally until the entire file is cached. We assume that the storage space   of needs to be reserved from the start of the caching. We categorize the file residency information   into two types: namely, short residency and long residency, depending on the length of interval . Let       be a playback length of a file . is categorized as short residency type if , and as     long residency type otherwise. In the rest of the section, is .

5   !

 /  &  31 2     !  .*/  &  *1 2  4     % 0 . 5 1   6   &  74 5 *7 !  &(' ) Long Residency        , with service starting times          , Let us assume that the service list of is     respectively. These users are serviced via continuous media stream from   . Without loss of generality,     . Thus,  1 corresponds to  , respectively. It) is not necessary to keep the we assume       is the last user in entire file until the end of the service for  , i.e. until time     . This is because  ! chronological order, and hence the file blocks sent to  can be discarded. We assume that the space used by   will decrease linearly from  to      , and utilization will become  as the service for  proceeds file !  to the end, in Fig. 3. Given this model of the storage space, the amortized storage cost for is formulated as in Eq. (2).  .   7  5    .    7 5     .  1   & 7    ! (2) ) !  .   7 is equivalent to the area enclosed by the solid line of left-hand side graph in Fig. 3. space

space θ>ρ

θ

Suggest Documents