Data Placement and Prefetching with Accurate Bit Rate Control for ...

Data Placement and Prefetching with Accurate Bit Rate Control for Interactive Media Server SEUNG-HO LIM, YO-WON JEONG, and KYU HO PARK Korea Advanced Institute of Science and Technology

An interactive media server should give viewers unrestricted control within their service level agreements. Video data must therefore be managed effectively to allow efficient retrieval. In this paper, we propose an efficient placement algorithm as part of an effective retrieval scheme to increase the number of clients that can be provided with interactive service. The proposed management schemes are combined with a bit count control method that repeatedly tunes the quantization parameters to adjust the actual bit count to the target bit count. An encoder using this method can generate coded frames whose sizes are synchronized with the RAID stripe size, so that when various fast-forward levels are accessed we can reduce the seek and rotational latency and enhance the throughput of each disk in the RAID system. Experimental results demonstrate that the proposed schemes significantly improve the average service time and guarantee quality of service to more users, so that the interactive media server can efficiently serve a large number of clients.

Categories and Subject Descriptors: D.4.2 [Operating Systems]: Storage Management; I.4.2 [Image Processing and Computer Vision]: Compression (Coding)

General Terms: Algorithms, Management

Additional Key Words and Phrases: Interactive media server, disk array, stripe size, video rate, bit count control

ACM Reference Format: Lim, S.-H., Jeong, Y.-W., and Park, K. H. 2008. Data placement and prefetching with accurate bit rate control for interactive media server. ACM Trans. Multimedia Comput. Commun. Appl. 4, 3, Article 21 (August 2008), 25 pages. DOI = 10.1145/1386109.1386114 http://doi.acm.org/10.1145/1386109.1386114

Authors' address: Computer Engineering Research Laboratory, Department of Electrical Engineering and Computer Science, 373-1, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu, Daejeon, Korea; email: {shlim, ywjeong}@core.kaist.ac.kr; [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected]. © 2008 ACM 1551-6857/2008/08-ART21 $5.00 DOI 10.1145/1386109.1386114 http://doi.acm.org/10.1145/1386109.1386114

ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 4, No. 3, Article 21, Publication date: August 2008.

1. INTRODUCTION

Recent advances in computing and communication technologies have led to the development of an infrastructure in which servers support a wide range of on-demand interactive multimedia services for education and entertainment over the Internet or broadband networks. On-demand interactivity means that users can freely interact with a media server by means of VCR-like controls such as stop, pause, fast-forward, and rewind. In the media server, streaming data should be accessed at their own playback rates to guarantee streaming bandwidth. Because video streams are extremely large, high data retrieval bandwidth is required to support interactivity for many users. Also, the different resolutions (such as X1, X2, and X4 fast-forward or rewind) of each interactive operation should each meet their real-time IO requirements.

In general, disk array technology is employed in the multimedia server to provide the high disk bandwidth needed for real-time IO requirements [Katz et al. 1989; Ganger et al. 1994; Wang et al. 1996; Shenoy and Vin 1997, 1999; Reisslein et al. 1999; Lee 2002; Huang et al. 2004; Berson et al. 1994; Chen et al. 1995; Ramanathan and Rangan 1994]. In constructing and managing a media server with a disk array, disk striping is an important configuration method. Disk striping is accomplished by dividing the video data into blocks according to their presentation order and storing these blocks on different disks [Katz et al. 1989; Chervenak et al. 1995]. When the blocks are stored on different disks, a proper placement algorithm is needed to efficiently support the retrieval of such streams at different levels of interactivity, in accordance with the following three factors. First, each video block should be stored so as to reduce seek and rotational latency by minimizing the number of disk requests. Second, adjacent video blocks that support interactive operations and are retrieved in the same real-time playback round at different interactive levels should be stored contiguously or adjacently within the disks to enhance disk bandwidth utilization. A round is defined as the periodic playback interval whose data are retrieved and displayed together. Third, special encoding techniques are required to support and manage the placement algorithm.

In addition, multimedia servers are used to store a wide variety of video objects with different service level agreements, such as different video qualities. For example, high-quality video has a high bitrate that requires high data retrieval bandwidth to guarantee real-time playback for users.
Service providers offer higher quality service at higher cost by allocating more server resources such as disk bandwidth, memory, or CPU. Videos with low priority and quality must nevertheless be guaranteed real-time streaming playback. In order to guarantee quality of service for every service level in parallel, a differential management scheme should be applied to each video and each interactive operation. In the allocation policy, different encoding and placement methods should be applied for each video quality to reduce disk operation overhead such as seek and rotational latency. In the real-time retrieval policy, an adaptive retrieval method should be applied to guarantee each real-time playback bandwidth.

In this article, we propose an efficient placement algorithm to support interactivity in the media server, called Media Synchronized RAID (MSR), and develop a prefetching algorithm that takes interactive operations under the proposed placement into account. Our placement policy is combined with a special bit count control method, called Fine Tuning of Tail Amount (FTTA), that repeatedly tunes quantization parameters to adjust the bit counts of video frames to the given target bit count. An encoder using this method can generate coded frames whose sizes are synchronized with the RAID stripe size. Thus, when various fast-forward levels are accessed, we can reduce the seek and rotational latency and enhance the throughput of each disk in the RAID system.

The remainder of this article is organized as follows. In Section 2, we present background and related work. In Section 3, we present an efficient placement and prefetching algorithm, as well as the system architecture. The proposed encoding technique is presented in Section 4. In Section 5, we present performance results of the placement algorithm and encoding technique. Finally, conclusions are presented in Section 6.

2. RELATED WORK

Several previous works on the design of an interactive media server have been reported. The following summarizes the approaches most closely related to the media server design proposed here.

For video data placement and treatment in the media server, Huang et al. [2004] studied a rate staggering method for scalable video in a disk array based video server. The rate staggering method stores different rates of video data separately to provide efficient video resolution. This method can reduce the buffer space and achieve better load balancing due to finer scheduling granularity. However, the employed allocation method does not consider precise disk stripe management or a scalable encoding technique, and thus the rate staggering method is impractical to apply to a real disk array. Also, the finer granularity could result in more disk requests and reduce disk utilization.

Shenoy and Vin [1999] used the disk array to support interactive operations in multiresolution video. They present an encoding technique combined with a placement algorithm to efficiently support an interactive scan operation. In the placement algorithm, they use two methods: fixed-size and variable-size block allocation. Fixed-size block placement could lead to additional disk requests to retrieve the video stream in one real-time playback round. Variable-size block placement can reduce additional disk requests, but variable block management is very difficult in a disk array.

Rangaswami et al. [2003] developed an interactive media proxy that transforms non-interactive broadcast or multicast streams into interactive ones. They carefully manage the disk device by considering the disk geometry for allocation and by making several stream files according to the fast-forward levels. However, this method consumes a large amount of storage capacity, and disk array management is not considered.

The real-time behavior of a media server is important to guarantee the client's requirements for playing video streams. For real-time continuous media playback, the disk storage system, disk IO scheduling, memory buffer management, and admission control are the key design parts of the media server [Carter et al. 2001; Gopalan and Chiueh 2002; Liu et al. 1999; Tran et al. 2003]. Numerous studies on real-time IO scheduling in a media server have been reported during the past two decades [Rangan et al. 1992; Chang and Garcia-Molina 1997; Chervenak et al. 1995; Daigle and Strosnider 1994]. Rangan et al. [1992] studied various admission control policies that permit a media server to satisfy multiple subscribers simultaneously without violating any of the continuous media playback requirements. Chervenak et al. [1995] evaluated several storage systems for video-on-demand servers, and showed that striped disk configurations performed best. Chang and Garcia-Molina [1997] examined the relationship between memory usage and disk utilization and showed that, with proper memory use, maximizing disk utilization does not necessarily lead to optimal throughput. Finally, Daigle and Strosnider [1994] developed disk scheduling models for multimedia systems that use periodic tasks to retrieve data from a disk. In particular, they proposed a new disk scheduling policy for multimedia applications, T-scan, and evaluated several scheduling policies, including fcfs, scan, P-fcfs, P-fcsn, and T-scan. These issues are not addressed in detail in this paper, since they are not strongly related to the proposed concepts. However, some of these techniques can be combined with the proposed data placement and encoding techniques in the same system to build a higher-performance interactive media server.

With regard to rate control, we briefly describe the conventional rate control method used to support video streaming. The objective of conventional video rate control is to enable a video source to generate its output rate under the constraints of a given constant bitrate, encoding buffer, and other factors related to the human visual system. Using these video rate controls, loss, jitter, and excessive delay of the video data transmitted over the channel can be prevented, because the server can maintain its output rate within the bounds of the channel bitrate.
Also, these video rate controls can help users easily estimate the total file size of an encoded stream before the encoding process is finished, provided they know the total playback time of the video source. Several previous studies have focused on video rate control algorithms. Rate control schemes that search for an optimal solution with methods such as dynamic programming [Ortege et al. 1994] and the Lagrangian technique [Choi and Park 1994; Lee and Ra 1996] have also been proposed. Tiwari and Viscito [1996] proposed a video rate control scheme based on a model for the picture complexity using the coding results of randomly pre-selected macroblocks. This scheme can be used for the software implementation of a non-real-time MPEG-2 video encoder. In Kim et al. [2001], the linear relationship between the actual bitcount and the codewordcount was used for accurate bit-rate control. These schemes are based on bitrate estimation models (e.g., rate-distortion models) and can enhance the subjective video quality [Kwon and Kim 2003].

The output bitrate generated by these video rate control schemes is close to the given target bitrate on average. However, the bitcount of each individual frame cannot be made equal to its target bitcount, because of modeling errors in the bitrate estimation model and because of the buffer control used to enhance subjective video quality. When a stream encoded with such a rate control scheme is stored in disk array storage, any shortfall in bitcount must be padded with dummy data, or any excess coded data must be cut, in order to make the bitcount equal to a multiple of the stripe size. The padded dummy data wastes a large amount of disk space, and cutting coded data severely degrades the video quality.

The placement algorithm and combined encoding technique presented in this paper can enhance the performance of a disk array based interactive media server in terms of both constructing and managing the disk array. We have set up a real interactive media server using a SCSI disk array under the Linux operating system. The proposed placement algorithm, prefetching method, and encoding technique are implemented in the Linux operating system.

3. MEDIA SYNCHRONIZED RAID PLACEMENT

3.1 Placement Policy on Disk Array

In this section, we explain the proposed placement and prefetching algorithm. In general, an MPEG video stream consists of a number of Groups of Pictures (GOPs), and each GOP is represented as a sequence of I-, P-, and B-frames. If the structure of one GOP is {IBBPBBPBB}, the next-level fast-forward scan could be {IPPIPP}, which does not include any B-frames, the one after that is {II...}, without any P-frames, and so on. The subsequence for each fast-forward level accessed during a round should be retrieved together from the disks to raise the ratio of useful data read to total data read from the disk array, so that real-time playback is guaranteed for more clients.

When a server employs a disk array to store the video streams, the server interleaves the storage of each video stream among the disks in the array to utilize the disk bandwidth effectively. The amount of data interleaved on a single disk, denoted as the stripe size, is fixed when the disk array is configured. In the interactive media server, the following is an efficient placement policy to minimize the seek and rotational latency incurred by servicing requests. First, the video streams are stored such that frames of the same type accessed during a round are on the same disks. Second, frames of different types accessed in the same round are stored on other, adjacent disks. However, video streams produced by conventional encoding techniques do not have a fixed frame size, which does not suit a fixed stripe size; some frames exceed the stripe size and some frames are smaller than the stripe size. This causes additional disk requests when video frames are retrieved from disks at different fast-forward levels, because frames are spread over more disks than expected. Therefore, a special encoding technique is required to apply the proposed placement policy. The encoding technique that produces frames of fixed size is described in the next section.

In this section, we assume that the video streams are encoded with our encoding method. To describe the proposed placement policy precisely, let us assume that the size of an encoded P-frame corresponds to the stripe size. Generally, a B-frame is smaller than a P-frame and an I-frame is larger than a P-frame, according to the dependence between frames. Therefore, with our encoding technique, the size of the B-frame is half that of the P-frame and the size of the I-frame is twice that of the P-frame.

Disk No.    1     2     3     4     5     6     7
Video i     I     I     BB    P     BB    P     BB
Video j     P     BB    I     I     BB    P     BB

(Each entry is one stripe; the same pattern repeats on the next stripe level for each successive GOP.)

Fig. 1. The proposed placement algorithm on disk array. Let the GOP structure be {IBBPBBPBB}.
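The layout in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `GOP_UNITS`, `place_gop`, and `disks_for_level` are hypothetical, the array is assumed to have exactly as many disks as one GOP has stripe units, and disks are numbered from 1 as in the figure.

```python
from typing import Dict, List, Set

# Stripe units of one GOP {IBBPBBPBB} as drawn in Figure 1: the I-frame
# fills two stripes, each P-frame one stripe, and each pair of B-frames
# shares one stripe.
GOP_UNITS: List[str] = ["I", "I", "BB", "P", "BB", "P", "BB"]


def place_gop(start_disk: int, num_disks: int = 7) -> Dict[int, str]:
    """Map each stripe unit of one GOP to a 1-based disk number.

    start_disk is the per-video starting disk (an assumed parameter name)
    that staggers the I- and P-frame disks of different videos.
    """
    return {
        (start_disk - 1 + k) % num_disks + 1: unit
        for k, unit in enumerate(GOP_UNITS)
    }


def disks_for_level(layout: Dict[int, str], frame_types: str) -> Set[int]:
    """Return the disks that must be read when only frame_types are played."""
    return {disk for disk, unit in layout.items() if unit[0] in frame_types}


video_i = place_gop(start_disk=1)
video_j = place_gop(start_disk=3)

print(sorted(video_i.items()))
print(sorted(video_j.items()))
print(sorted(disks_for_level(video_i, "IPB")))  # normal playback
print(sorted(disks_for_level(video_i, "IP")))   # scan without B-frames
print(sorted(disks_for_level(video_i, "I")))    # I-frames only
```

Because each frame type occupies whole stripes on its own disks, a scan level that drops B-frames touches only the I and P disks, and the different `start_disk` values keep those busy disks from coinciding for videos i and j.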

As explained previously, in the proposed placement policy, frames of the same type are stored on the same disks and frames of different types are stored on other disks. The GOP {IBBPBBPBB} is stored as follows: the I-frame consumes two consecutive stripes, each P-frame consumes one stripe, and each pair of B-frames consumes one stripe on the disk array. The next GOP is stored in the next stripe level on the disk array, and so on. Figure 1 shows the placement when the encoded frames are as described above. Using this method, we can minimize the number of disk requests for each fast-forward level. At normal playback, the data should be retrieved from all disks in the disk array with an evenly distributed number of frames. For K-times speed-up fast-forward, the server can skip every Kth disk to play the video streams, since the frames required for fast-forward play are separated along disk boundaries.

In addition, under our placement policy many requests can be concentrated on the disks that hold an I-frame or P-frame of a video stream during interactive operations. To prevent this concentration and balance the disk requests, we interleave the starting disk across videos, allocating a different start disk position to each video stream. In Figure 1, videos i and j have different start disk positions.

3.2 Stream Classification

If streams stored by the above placement policy, as shown in Figure 1, have the same frame rate, they will have the same bitrate, because the stripe size is fixed in the disk array. However, because many types of streams, with large or small picture size and high or low quality, can be stored in a single media server, the limitation of having the same bitrate for all streams is a problem in our placement policy. To mitigate this problem, we classify the streams into several classes as follows:

Class A: A class of streams having high bitrate. One I-frame, one P-frame, and one B-frame consume four consecutive stripes, two consecutive stripes, and one stripe, respectively. We set the ratio of the bit counts of the I-, P-, and B-frames to 4:2:1, because this ratio generally yields the best video quality [MPEG 1996].

Class B: A class of streams having medium bitrate. One I-frame, one P-frame, and two B-frames consume two consecutive stripes, one stripe, and one stripe, respectively. The ratio of the bit counts of the I-, P-, and B-frames is also 4:2:1. Therefore, the bitrate of this stream is half that of a Class A stream.

Class C: A class of streams having low bitrate. One I-frame consumes one stripe, but P-frames and B-frames cannot be synchronized with the stripe size. The ratio of the bit counts of the I-, P-, and B-frames cannot be the same as that of the other classes. In this case, there is no gain for the fast-forward operation in which I- and P-frames are scanned, but there is still a gain for the fast-forward operation in which only I-frames are accessed.
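As a rough check of the class definitions, the sketch below counts the stripes one GOP occupies in Classes A and B and derives the resulting bitrate. It is only an illustration: `STRIPE_BYTES`, `FRAME_RATE`, and the function names are assumptions introduced here, and the 30 frames/s value in particular is not taken from the text.

```python
STRIPE_BYTES = 32 * 1024  # assumed stripe size (32 KB)
FRAME_RATE = 30           # assumed frames per second, for illustration only

# Stripes consumed per frame, from the Class A and Class B definitions.
# In Class B two B-frames share one stripe, hence 0.5 stripe per B-frame.
STRIPES_PER_FRAME = {
    "A": {"I": 4.0, "P": 2.0, "B": 1.0},
    "B": {"I": 2.0, "P": 1.0, "B": 0.5},
}


def stripes_per_gop(stream_class: str, gop: str) -> float:
    """Total number of stripes occupied by one GOP of the given class."""
    return sum(STRIPES_PER_FRAME[stream_class][frame] for frame in gop)


def bitrate_bps(stream_class: str, gop: str,
                stripe_bytes: int = STRIPE_BYTES,
                fps: int = FRAME_RATE) -> float:
    """Average bitrate in bits per second implied by the stripe layout."""
    gop_bits = stripes_per_gop(stream_class, gop) * stripe_bytes * 8
    return gop_bits * fps / len(gop)


GOP = "IBBPBBPBBPBB"
print(stripes_per_gop("A", GOP), stripes_per_gop("B", GOP))
print(bitrate_bps("A", GOP) / bitrate_bps("B", GOP))
```

With the same GOP structure and frame rate, the bitrate depends only on the class and the stripe size, which is why a second class is needed to store streams at a different rate on the same array.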


Disk No.    (a) Class A    (b) Class B    (c) Class C
 1          I              I              I
 2          I              I              BBP
 3          I              BB             PBBP
 4          I              P              PBBP
 5          B              BB             PBB
 6          B              P              I
 7          P              BB             BBP
 8          P              P              PBBP
 9          B              BB             PBB
10          B              I              I
11          P              I              BBP
12          P              BB             PBBP
13          B              P              PBBP
14          B              BB             PBB
15          P              P              I
16          P              BB             BBP
17          B              P              PBBP
18          B              BB             PBB

(Each entry is one stripe; the same pattern repeats on each stripe level. In (c), a P at the edge of an entry is the part of a P-frame that straddles two stripes.)

Fig. 2. The position of each frame of a stream with different class levels: (a) Class A; the GOP structure is {IBBPBBPBBPBB}. (b) Class B; the GOP structure is {IBBPBBPBBPBB}. (c) Class C; the GOP structure is {IBBPBBPBBPBB} or {IBBPBBPBB}.

The position of each frame in the disk array is shown in Figure 2. From the classification, we can see that Classes A and B have the same bit count ratio between the I-, P-, and B-frames; however, we cannot assign this ratio to Class C, since the resulting frames would not fit the stripes of the disk array. We therefore examine Class C in more detail. In Class C, every I-frame consumes one stripe, and the total size of the P- and B-frames in one GOP must be a multiple of the stripe size, as shown in Figure 2(c). The ratio of the bit counts of the P- and B-frames should be as close as possible to 2:1. Thus, to obtain the target bit counts of the P- and B-frames, we first solve Eqs. (1) and (2), and then select the integer values closest to that solution that satisfy Eq. (1):

\[ N_P C_P + N_B C_B = nS, \quad n \text{ is a positive integer} \tag{1} \]

\[ C_P / C_B = 2 \tag{2} \]

where N_P and N_B are the numbers of P- and B-frames in one GOP, C_P and C_B are the target bit counts of the P- and B-frames, and S is the stripe size. In Figure 2(c), the GOP structure alternates between {IBBPBBPBBPBB} and {IBBPBBPBB}. First, in the {IBBPBBPBBPBB} case, Eq. (1) can be expressed as

\[ 3 C_P + 8 C_B = 4S \tag{3} \]

If we assume that the stripe size S is 32 KB, the solution of Eq. (3) and Eq. (2) is

\[ C_P = 128/7 \text{ KB}, \quad C_B = 64/7 \text{ KB} \tag{4} \]
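The selection of integer targets can be illustrated with a short search. The sketch below is one possible reading of "the integer values closest to the solution", not necessarily the authors' procedure: it solves Eqs. (1) and (2) exactly, then tries integer values of C_B in order of distance from the real-valued solution until the corresponding C_P from Eq. (1) is also an integer. The function and parameter names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def class_c_targets(n_p: int, n_b: int, n_stripes: int,
                    stripe_bytes: int) -> Tuple[int, int]:
    """Return integer (C_P, C_B) in bytes with n_p*C_P + n_b*C_B = n*S.

    Assumed selection rule: take the integer C_B nearest to the real
    solution of Eqs. (1) and (2) for which C_P is also an integer.
    """
    total = n_stripes * stripe_bytes
    # Substituting C_P = 2*C_B into Eq. (1) gives C_B = nS / (2*N_P + N_B).
    ideal_cb = Fraction(total, 2 * n_p + n_b)
    base = round(ideal_cb)
    # n_p consecutive integers on either side cover every residue mod n_p.
    for delta in range(n_p + 1):
        candidates = sorted({base - delta, base + delta},
                            key=lambda cb: abs(cb - ideal_cb))
        for cb in candidates:
            remaining = total - n_b * cb
            if remaining > 0 and remaining % n_p == 0:
                return remaining // n_p, cb
    raise ValueError("no integer solution near the real-valued one")


S = 32 * 1024  # stripe size in bytes, as assumed in the text
print(class_c_targets(n_p=3, n_b=8, n_stripes=4, stripe_bytes=S))
print(class_c_targets(n_p=2, n_b=6, n_stripes=3, stripe_bytes=S))
```

For S = 32 KB the two calls correspond to the {IBBPBBPBBPBB} and {IBBPBBPBB} cases; the stripe counts 4 and 3 are the values of n read from Figure 2(c).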

Fig. 3. Per-Disk Prefetching and Buffer Management: (a) Conventional Prefetching; (b) Per-Disk Prefetching. An example of a Class B stream and the X2 fast-forward operation. The light gray and dark gray represent well-prefetched data and mis-prefetched data, respectively.

Therefore, the target bit counts of the P- and B-frames, which are the integer values closest to Eq. (4) that satisfy Eq. (3), are

\[ C_P = 18728 \text{ bytes}, \quad C_B = 9361 \text{ bytes} \tag{5} \]

Note that the ratio of the bit counts of the I-, P-, and B-frames is 4:2.29:1.14. In the same manner, in the {IBBPBBPBB} case, the target bit counts of the P- and B-frames are

\[ C_P = 19662 \text{ bytes}, \quad C_B = 9830 \text{ bytes} \tag{6} \]

The ratio of the bit counts of the I-, P-, and B-frames is 4:2.4:1.2. In both cases, the bit count ratios are not substantially different from the ratio of Class A or B, even though they are not exactly the same. This means that we can allocate the bit count for each I-, P-, and B-frame so that it fits the disk array at the GOP level for all classes, including A, B, and C.

3.3 Per-Disk Prefetching

When the server retrieves data from disks, consecutive frames are retrieved in advance along with the currently requested frames to increase disk throughput, because a prefetching request is attached to the current request and is sent to the storage system as a single request. We call these frames prefetching frames, or prefetching requests, to distinguish them from the currently requested frames. Prefetch requests increase disk throughput as well as disk utilization; however, they incur more data transfer time on the disk and require more buffer space in memory. Therefore, it is important that the proper number of frames is retrieved in advance at the proper real-time playback time.

On the other hand, the file system, which uses disk array storage with a striping method, logically stores video files sequentially, whereas the RAID driver physically interleaves the data across the disks, as described before. This creates a mismatch between the file system's logical addresses and the RAID storage's physical addresses, so the file system does not know where the data are actually stored on the disks. If prefetching frame requests are generated by the file system, the requests are spread across the disk array, as shown in Figure 3(a), because the file system is only aware of the logically contiguous allocation of video files. This follows from the fact that multimedia data are generally large and multimedia requests are generated in a sequential access pattern. For interactive operations, fast-forward play splits the disk request into several requests, because only a portion of the frames is needed. As a result, prefetching requests lead to unnecessary data retrieval for fast-forward play. For example, when the server plays the X2 fast-forward level, it only requires the GOPs without any B-frames. However, some parts of the B-frames are likely to be retrieved from the disks by the prefetching requests, even though they are not required for fast-forward play. Although the prefetching algorithm of the VFS inside the OS adjusts the prefetch size automatically, this adjustment does not work properly, since each frame of the video stream, even the smallest one, cancels the effect of the automatic adjustment.

The request fragmentation problem arising from disk stripe fragmentation has negative effects on our data placement, since the requested frames overshoot our aligned frame placement on the disk array. Moreover, the proposed placement policy causes the disk array to have a small stripe size, which might limit the potential disk throughput. This would create a problem in both disk bandwidth and buffer space.

To reduce the problem caused by the unnecessary component of prefetching requests, we generate prefetching requests per disk, as shown in Figure 3(b). When current requests are retrieved from one disk, our file system generates prefetching requests to retrieve more data from the same disk rather than from other disks. We call this method per-disk prefetching. Because our placement policy separates the frame types onto different disks, per-disk prefetching increases the disk throughput for fast-forward play by generating larger requests for each disk than the conventional method does.

3.4 File System and System Architecture
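Since it is the file system that issues the per-disk prefetching requests of Section 3.3, a small address calculation helps fix the idea before the layout is described. The sketch below is a simplified model under stated assumptions, not the kernel code: stripe units are numbered logically, unit u is assumed to live on disk u mod num_disks (plain striping, 0-based disks), and the function names are hypothetical. The per-disk step of num_disks units mirrors the "ra_page += stripe*num_disk" step shown in Figure 5.

```python
from typing import List

# Stripe units of one Class B style GOP {IBBPBBPBB}, one unit per disk.
GOP_UNITS: List[str] = ["I", "I", "BB", "P", "BB", "P", "BB"]
NUM_DISKS = len(GOP_UNITS)


def disk_of(unit: int, num_disks: int = NUM_DISKS) -> int:
    """0-based disk holding a logical stripe unit under plain striping."""
    return unit % num_disks


def content_of(unit: int) -> str:
    """Frame type(s) stored in a logical stripe unit (start disk 0 assumed)."""
    return GOP_UNITS[unit % len(GOP_UNITS)]


def conventional_readahead(unit: int, count: int) -> List[int]:
    """Prefetch the next logically consecutive units (spread over disks)."""
    return [unit + k for k in range(1, count + 1)]


def per_disk_readahead(unit: int, count: int,
                       num_disks: int = NUM_DISKS) -> List[int]:
    """Prefetch the next units that reside on the same disk."""
    return [unit + k * num_disks for k in range(1, count + 1)]


current = 3  # the first P stripe of the first GOP
conv = conventional_readahead(current, 3)
perd = per_disk_readahead(current, 3)
print([content_of(u) for u in conv], [disk_of(u) for u in conv])
print([content_of(u) for u in perd], [disk_of(u) for u in perd])
```

In this model the conventional prefetch of a P stripe drags in B-frame stripes from neighbouring disks, which a scan without B-frames never uses, while the per-disk prefetch stays on one disk and returns only P stripes of the following GOPs.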

To place each frame accurately on a specific disk, a disk array-based file system is designed and implemented [Bovet and Cesati 2005]. We call the designed file system MSRfs. In designing the disk array-based file system to support the proposed placement method, we must consider two major issues: the file system layout and the disk stripe layout.

Let us first consider the file system layout. Generally, a file system is composed of the file's data and its metadata, and these are stored in separate regions. A file's data can be written contiguously to a disk subsystem with pre-occupied free blocks; however, there is one type of metadata that disturbs the contiguous allocation of a file's data: the data's index pointer. Whenever a data index block is allocated, the block comes from the free block pool in which data are stored, and hence it is wedged between data blocks. Since contiguous allocation of video file data is important in our placement policy, the file system layout should support complete separation between data and metadata so that no type of metadata interferes with the contiguous allocation.

Next, our placement policy depends heavily on the disk striping layout, and thus the file system is designed with this aspect in mind. Since different types of frames should be placed on different disks to support efficient interactive operation, the file's data block allocation should be aware of the disk stripe configuration. Thus, the designed file system layout is based on the striping granularity. Specifically, the global layout is fixed to the disk striping distance and the metadata region is fixed to the striping distance. The designed file system layout based on a disk array is described in Figure 4. The fundamental layout follows the Ext2 index-based file system. Like Ext2, MSRfs has a number of groups, each of which constitutes a collection of metadata and data.
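The alignment constraint on the metadata region can be illustrated with a small calculation. This is only a sketch under assumptions: the function names, the example metadata size, and the group size are hypothetical values chosen for illustration, and the real on-disk format of MSRfs is not reproduced here.

```python
from typing import Tuple


def align_up(nbytes: int, unit: int) -> int:
    """Round nbytes up to the next multiple of unit."""
    return -(-nbytes // unit) * unit


def group_layout(metadata_bytes: int, group_stripes: int,
                 stripe_bytes: int) -> Tuple[int, int]:
    """Return (data_start, data_bytes) for one group.

    The group spans a whole number of stripes, and the metadata region is
    padded to a stripe multiple so that data begins on a stripe boundary.
    """
    group_bytes = group_stripes * stripe_bytes
    data_start = align_up(metadata_bytes, stripe_bytes)
    if data_start >= group_bytes:
        raise ValueError("metadata does not leave room for data")
    return data_start, group_bytes - data_start


STRIPE = 32 * 1024  # assumed stripe size in bytes
data_start, data_bytes = group_layout(metadata_bytes=100_000,
                                      group_stripes=1024,
                                      stripe_bytes=STRIPE)
print(data_start, data_bytes, data_start % STRIPE)
```

Padding the metadata region costs at most one stripe per group, and in exchange every frame written into the data region can start exactly on a stripe boundary.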
In MSRfs, each group size is fixed into multiples of disk stripe size so that the file’s data allocation is fixed into the disk stripe. In a group, metadata consists of a super block, an inode bitmap, a data bitmap, and inode blocks. In addition, index pointer blocks are added at the end of the metadata region in the MSRfs, for the complete separation between data and metadata. The metadata region is also fixed into multiples of disk stripe size so that the first data block starts from the first block of the disk stripe. The remaining part of the group can then be used for data block allocation. Whenever the index pointer block is required during a file operation, it is allocated from the free index pointer blocks so that data blocks can be used contiguously for the current file’s read/write operations. Moreover, in the disk array, the data blocks can be allocated more precisely than in a conventional file system such that data place/retrieval independency increases across the disks. In our placement method, independency is a very important feature, since each frame should be placed ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 4, No. 3, Article 21, Publication date: August 2008.

Data Placement and Prefetching for Interactive Media Server



21:9

Fig. 4. File system implementation for supporting the proposed data placement and encoding technique. One group is described. The metadata and data regions are fixed to multiples of the disk striping distance within a group. [Figure: within a group striped across the disks, the metadata region holds Super Block + Inode Bitmap + IP Bitmap + Data Bitmap + Inode Blocks + IP Blocks, where the IP Bitmap and IP Blocks are used for index pointer block allocation; the data region holds I, BB, P, BB frame data laid out across the disks.]

Fig. 5. System architecture and operations. The left part of the figure represents system components, and the right part of the figure describes data retrieval operations. [Figure: the components are the Interactive Media Server, the Virtual File System with readahead() and per-disk buffers, the MSR file system, the RAID device driver, and the disks. Retrieval: if ra_page > stripe offset, then ra_page += stripe*num_disk; submit ra_page. The ra_pages are of the same frame type and are retrieved with the currently requested pages.]

independently from other frames in order to enhance the interactive operations. With the designed MSRfs and our encoding technique, when a video file is created and written, the physical position of the file's data can start at a stripe boundary, different frames can be placed independently of each other across the disks, and frames of the same type can be stored contiguously within the same disk. When a file is deleted, the allocated data blocks, which are likely to be contiguous, are returned to the free block pool. As time goes on, the free blocks become fragmented because of many file deletion and creation operations. This fragmentation disturbs the contiguous allocation of video streams under our allocation strategy. We need to consolidate the fragments into as many contiguous free regions as possible for later allocation, as is done by cleaning in the LFS file system. However, since our system is a multimedia server, the fragments are themselves large and are not scattered throughout the storage as they are in a general-purpose file system. Therefore, it is easy to gather the fragments. The overall system architecture and retrieval operations are shown in Figure 5. The system implementation is based on Linux kernel version 2.6.11. In the left part of the figure, the system


Fig. 6. An example of the distribution of padded dummy data and cut coded data. [Figure: frame size, in multiples of the stripe size (x0.5, x1, x2), plotted by frame type for the sequence I B B P B B P B B P B B I B B P B B ..., with regions marked as dummy data and cut coded data.]

architecture consists of the Linux VFS for page caching and prefetching, MSRfs for data placement, and a RAID device driver. We modified the VFS layer for the per-disk prefetching method and implemented MSRfs for the data placement algorithm. Conventionally, the VFS is not aware of the architecture of the lower block device layer and regards it as a simple logical block device, as described in the previous section. Thus, when readahead pages (ra pages) are generated in the VFS, only continuity on the logical block device is considered. This is not adequate for our data placement algorithm and interactive operation. To apply the per-disk prefetching method, the VFS should know the layout of the disk array-based block device. For this, we inserted a routine that acquires the disk-array information, such as the number of disks, stripe size, and block size, from the lower block-level device driver when the system is set up. The VFS is then able to determine on which disk the requested pages are stored. Based on this, the per-disk prefetching operations are performed as follows. When data retrieval requests are generated in the media server, the VFS always tries to generate a prefetching request by calling a readahead function. In the readahead function, when an ra page is generated, we check whether it lies within the current disk stripe offset. If it does, the ra page is on the currently requested disk, and thus the request continuity is not broken. If it does not, the ra page is on the next disk rather than on the currently requested disk. Request continuity is then broken, and the prefetching request generates an additional request to another disk. To prevent this situation, we recalculate the position of the ra page so that it falls on the currently requested disk, by adding the disk striping distance multiplied by the number of disks. The ra page is then submitted to the lower block driver layer.
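The relocation step described above can be sketched as follows. This is only one possible reading, not the kernel code of the modified VFS: it assumes simple round-robin striping without parity, logical page numbers that start at zero, and that the stripe distance times the number of disks is added to the start of the current stripe unit whenever the next page would leave that unit. All function and parameter names are hypothetical.

```python
def disk_of(page, stripe_pages, num_disks):
    """Disk index that holds a logical page under round-robin striping."""
    return (page // stripe_pages) % num_disks


def next_ra_page(page, stripe_pages, num_disks):
    """Next readahead page that stays on the same disk as `page`."""
    candidate = page + 1
    if candidate % stripe_pages != 0:
        # Still inside the current stripe unit: continuity is not broken.
        return candidate
    # The logical successor lies on the next disk, so jump to the next
    # stripe unit of the current disk instead.
    unit_start = page - page % stripe_pages
    return unit_start + stripe_pages * num_disks


def readahead_window(page, count, stripe_pages, num_disks):
    """Collect `count` readahead pages, all on the disk that holds `page`."""
    window = []
    for _ in range(count):
        page = next_ra_page(page, stripe_pages, num_disks)
        window.append(page)
    return window
```

With a stripe unit of 4 pages on 4 disks, `readahead_window(2, 4, 4, 4)` yields pages 3, 16, 17, and 18, all of which map to disk 0, whereas plain logical readahead would have moved on to pages 4 to 6 on disk 1.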
Since MSRfs can allocate each frame to each disk according to our data placement algorithm, the submitted ra pages are always of the same frame type and are retrieved together with the currently requested pages, thus enhancing the disk throughput and disk utilization for interactive operations.

4. ACCURATE BIT COUNT CONTROL

In order to establish the placement policy described above, the bit counts of all frames should be accurately controlled. However, conventional video rate control schemes cannot satisfy this requirement. Although the output bitrate generated by conventional rate control schemes is close to the given target bitrate on average, the bit count of each individual frame does not equal the given target bit count, because of modeling errors in the bitrate estimation model and because of the buffer control performed to enhance the subjective video quality. One possible solution, when a stream encoded with a conventional video rate control scheme is stored in the disk array, is to pad any shortfall in bit count with dummy data, or to cut off excess coded data, so that the bit count equals a multiple of the stripe size. An example of the distribution of padded dummy data and cut coded data is shown in Figure 6. However, a large amount of disk space is wasted by the padded dummy data, and the video quality is severely degraded by the cut coded data.
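The cost of the pad-or-cut approach can be illustrated with a small accounting sketch. The frame sizes and per-type slot sizes used in the example are invented for illustration and are not measurements from the paper; the function names are likewise hypothetical.

```python
def pad_or_cut(frame_bytes, slot_bytes):
    """Return (padding_bytes, cut_bytes) needed to fit a frame into its slot."""
    if frame_bytes <= slot_bytes:
        return slot_bytes - frame_bytes, 0
    return 0, frame_bytes - slot_bytes


def total_waste(frames, slots):
    """Sum padding and cut bytes over (frame_type, frame_bytes) pairs."""
    padding = cut = 0
    for frame_type, frame_bytes in frames:
        pad, lost = pad_or_cut(frame_bytes, slots[frame_type])
        padding += pad
        cut += lost
    return padding, cut


STRIPE = 32 * 1024  # assumed stripe size in bytes
# Hypothetical slot sizes, each a whole number of stripes.
SLOTS = {"I": 4 * STRIPE, "P": 2 * STRIPE, "B": 1 * STRIPE}
# Hypothetical coded frame sizes from a conventional rate controller.
FRAMES = [("I", 120000), ("B", 40000), ("B", 30000), ("P", 70000)]
```

For these sample values, `total_waste(FRAMES, SLOTS)` reports 13840 bytes of padding (wasted disk space) and 11696 bytes of cut coded data (lost video information).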


Fig. 7. Conventional MPEG-2 video encoder and conceptual procedure of the proposed R-QVLC scheme: (a) conventional MPEG-2 video encoder using rate control; (b) the procedure of the R-QVLC scheme, which is a post-processing step after one frame is encoded with conventional video rate control. [Figure: in (a), the input frame passes through motion estimation/compensation, DCT, quantization with QPs supplied by rate control, and VLC to produce the encoded frame, while inverse quantization and IDCT reconstruct the frame stored for the next frame; in (b), an R-QVLC block compares the VLC output with the target, adjusts the QPs, and feeds the updated QPs back to quantization.]

To address these problems, we propose a method that precisely fixes the bit count of each coded frame to a given target bit count. In our algorithm, the encoded video stream must satisfy the following two requirements. First, the bit count of each frame is a multiple of the stripe size of the disk array. Second, the bit counts of all frames of the same coding type (I, P, or B) are equal to each other. For example, let the stripe size be 32 KB and the bit-count ratio of the I-, P-, and B-frames be 4:2:1. If the bit count of a B-frame is equal to the stripe size, all I-frames must be exactly 128 KB, all P-frames exactly 64 KB, and all B-frames exactly 32 KB. Given that we encode source streams with a video rate control scheme, we must use a constant bitrate (CBR) video encoding method to satisfy these two requirements. With variable bitrate (VBR) video encoding methods, the two requirements cannot be satisfied, since the characteristics of each source frame, such as the pattern of motion and the distribution of colors, differ from frame to frame and are unpredictable.

4.1 Fine Tuning of Tail Amount
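As a concrete check of the frame-size targets that the tuning procedure has to hit, the short sketch below recomputes the 32 KB, 4:2:1 example. The helper names are hypothetical, and the 12-frame group-of-pictures pattern is an assumption used only to show how the per-type targets add up.

```python
def frame_targets(stripe_bytes, stripes_per_type):
    """Map each frame type to a target size that is a whole number of stripes."""
    return {t: n * stripe_bytes for t, n in stripes_per_type.items()}


def gop_bytes(pattern, targets):
    """Total size of a group of pictures given as a string such as 'IBBP'."""
    return sum(targets[frame_type] for frame_type in pattern)


KB = 1024
TARGETS = frame_targets(32 * KB, {"I": 4, "P": 2, "B": 1})
```

Here `TARGETS` maps I, P, and B frames to 128 KB, 64 KB, and 32 KB, and an assumed pattern such as `"IBBPBBPBBPBB"` occupies exactly 18 stripes (576 KB).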

A conventional MPEG-2 video encoder using rate control is shown in Figure 7(a). The proposed rate control method does not modify this conventional encoding process, but works as a post-processing step after each frame is encoded. If the actual bit count of the coded frame in Figure 7(a) is not equal to the target bit count, the encoding process is paused and the quantization parameters (QPs) of macroblocks (MBs) are adjusted. The conceptual procedure of the proposed bit count control scheme is added to the conventional encoding mechanism in the gray background region of Figure 7(b). In the post-processing step, if the current bit count is higher than the target bit count, we increase the QPs of the appropriate MBs, since a larger QP quantizes more coarsely and yields fewer bits, and carry out quantization and variable length coding (QVLC) for these MBs. If the actual bit count of the reproduced frame is lower than the target bit count, we decrease the QPs,

19 18 17 16 15
14 13 12 11 10
 9  8  7  6  5
 4  3  2  1  0

Fig. 8. An example of assigning the sequence number of each MB of the frame having 5-by-4 MBs. [Figure: each cell is one macroblock; numbering starts at 0 in the bottom-right MB and ends at 19 in the top-left MB.]

and carry out the QVLC again. Thus, we call this scheme repeated quantization and variable length coding (R-QVLC). During this process, once the actual bit count is equal to the target bit count, the coded frame is stored and the next input frame is encoded. This method maintains the advantages of the conventional rate control scheme and does not modify the video compression standard of the coded stream, so it can be applied to other coding schemes such as MPEG-4 and H.264. There are four issues that must be considered in the R-QVLC scheme.
(1) We cannot accurately estimate how the bit counts of MBs vary as their QPs are adjusted. Thus, we should search for the QPs of MBs that make the bit count of a coded frame equal to the target bit count.
(2) In the worst case, if the actual bit count cannot be made equal to the target bit count, we should make it lower than the target bit count, because if it becomes higher, we must cut off the excess video data, which leads to serious degradation of visual quality.
(3) We should minimize the number of iterations of the R-QVLC loop to reduce encoding time.
(4) The degradation of subjective quality caused by the QP adjustment should be minimized.
We propose an algorithm that considers the above issues for an effective R-QVLC scheme, called the Fine Tuning of Tail Amount (FTTA) algorithm. For the algorithm, we first assign a sequence number to each MB in the frame as shown in Figure 8, which represents the order in which QPs are adjusted. The numbering runs in the reverse order of the encoding process, from bottom to top and from right to left. The overall flow chart of the FTTA algorithm is shown in Figure 9. For each frame, FTTA brings the actual bit count to the target bit count by repeatedly adjusting the bit counts of MBs in the sequence order described in Figure 8.
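The numbering rule can be written down in a few lines. The sketch below assumes that MBs are encoded in raster order (left to right, top to bottom) and that the 5-by-4 frame of Figure 8 has five MBs per row and four rows; the function names are hypothetical.

```python
def mb_sequence_number(row, col, mbs_per_row, mb_rows):
    """Sequence number of the MB at (row, col), counted from the last MB.

    Row 0 is the top row and column 0 is the leftmost column. The last MB
    in raster encoding order receives number 0.
    """
    raster_index = row * mbs_per_row + col
    return mbs_per_row * mb_rows - 1 - raster_index


def sequence_grid(mbs_per_row, mb_rows):
    """Sequence numbers for the whole frame, one list per MB row."""
    return [
        [mb_sequence_number(r, c, mbs_per_row, mb_rows) for c in range(mbs_per_row)]
        for r in range(mb_rows)
    ]
```

Under these assumptions, `sequence_grid(5, 4)` has 19 in the top-left corner and 0 in the bottom-right corner, so QP adjustments start at the tail of the frame.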
There are four stages in the FTTA algorithm: the initial stage, the rate-decreasing stage, the rate-increasing stage, and the fine-tuning stage. In the initial stage, we initialize three important variables: BP, TRP, and BN. BP is the sequence number of the MB currently selected for bit count adjustment. TRP is the turning point: when BP reaches the MB sequence number TRP, BP returns to 0. BN is the number of MBs to be adjusted in the current QVLC. If the picture is composed of M by N MBs, BP, TRP, and BN are initially set to 0, M-1, and 1, respectively. After the three variables are initialized, if the actual bit count (AB) is larger than the target bit count (TB), the algorithm proceeds to the rate-decreasing stage; otherwise, it proceeds to the rate-increasing stage. In the rate-decreasing stage, Q[BP], the quantization parameter of MB[BP], is increased by one, and BP is also increased by one. Using this quantization parameter, we execute the QVLC of this MB and obtain the current AB. If AB is still larger than TB, we double BN. The QPs from Q[BP] to Q[BP+BN-1] are each increased by one, and BP is increased by BN. Then, QVLC is executed again for MB[0] through MB[BP] using the updated quantization parameters. If AB is still larger than TB, we double BN again and repeat the above process. If AB becomes smaller than TB, we halve BN and move to the rate-increasing stage. During the update of BP, if BP reaches TRP, BP is reset to 0 and TRP is increased by M.
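The rate-decreasing stage can be sketched as below. This is a toy illustration under several assumptions, not the authors' encoder: the bit model `frame_bits` is a stand-in for real quantization and VLC, the whole frame is recounted instead of only MB[0] through MB[BP], BP is taken to wrap once it moves past TRP, QPs are capped at an assumed maximum of 31, and the rate-increasing and fine-tuning stages are left out.

```python
def frame_bits(qps, complexity):
    """Hypothetical bit model: an MB costs complexity // qp bits."""
    return sum(c // q for c, q in zip(complexity, qps))


def rate_decreasing_stage(qps, complexity, target_bits, mbs_per_row, qp_max=31):
    """Raise QPs from the tail of the frame until AB <= TB.

    `qps` and `complexity` are indexed by MB sequence number (0 = last MB).
    Returns (AB, BN, BP, TRP) as they stand when the stage ends.
    """
    bp, trp, bn = 0, mbs_per_row - 1, 1          # initial stage
    ab = frame_bits(qps, complexity)
    while ab > target_bits:
        if all(q >= qp_max for q in qps):
            break                                # target unreachable
        for _ in range(bn):                      # Q[BP..BP+BN-1]++
            if qps[bp] < qp_max:
                qps[bp] += 1
            bp += 1
            if bp > trp:                         # turning point passed
                bp = 0
                trp = min(trp + mbs_per_row, len(qps) - 1)
        ab = frame_bits(qps, complexity)         # stands in for QVLC
        if ab > target_bits:
            bn *= 2                              # still too large: widen
    return ab, max(1, bn // 2), bp, trp          # BN halved for next stage
```

For a 5-by-4 frame with every MB at QP 10 and a complexity of 1000, a target of 1900 bits is reached after four QVLC passes with BN growing as 1, 2, 4, 8; the doubling keeps the number of passes small while the changes stay concentrated in the lowest MB rows.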


[Figure 9: flow chart of the FTTA algorithm. Recoverable labels: START; Initial Stage (BP=0; TRP=M-1; BN=1); Q[BP]++; BP+=BN; QVLC; tests AB!=TB and BN==1; BN*=2; BN=BN/2; Q[BP~BP+BN-1]++ and Q[BP~BP+BN-1]-- each followed by BP+=BN; MK[BP-1]=NDQ; Rate-Increasing Stage; Fine-Tuning Stage.]
