PARALLEL PROCESS OF HYPER-SPACE-BASED MULTIVIEW VIDEO COMPRESSION

You Yang1,3, Gangyi Jiang1,2, Mei Yu1,4, Dingju Zhu2,3

1. Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
2. Institute of Computing Technology, Chinese Academy of Science, Beijing 100080, China
3. Graduate School of Chinese Academy of Science, Beijing 100080, China
4. National Key Laboratory of Machine Perception, Peking University, 100871, China

ABSTRACT

Multiview video coding (MVC) is a key technology in free-viewpoint television. MVC based on existing codec systems has been studied widely, but all such schemes demand substantial computational capacity. Parallel processing of MVC can facilitate efficient implementation of the encoder and decoder, and MPEG has listed it as a required function. In this paper, a parallelization methodology for MVC based on hyper-space theory is presented and tested on the local area multi-computer - message passing interface (LAM-MPI) parallel platform with a modified H.264 codec. Experimental results show that the proposed method speeds up multiview video compression and achieves high rate-distortion performance.

Index Terms: Video coding, Parallel processing, Image communication

1. INTRODUCTION

With the advancement of computer graphics and vision technologies, multiview video coding (MVC) has been recognized as important for a variety of applications, including free-viewpoint television and 3D television. However, multiview video is inherently different from traditional monoview video such as today's TV. MVC should offer high compression and support partial decoding and viewpoint switching[1]. Ordinary MVC based on modified H.26x and MPEG schemes has been discussed widely[2], but it requires more computational effort. Thus, MVC may need to support parallel processing of different views or segments[2], which remains an open challenge.

Parallel monoview video coding has attracted attention recently[3-5]; most of these works use a low-granularity, large-communication structure. Slices partitioned equally or unequally within one frame form the granularity, and the key technology is slice partition. Multi-thread algorithms are an alternative[6], but they also take the slice as granularity. A large part of the time is consumed in communication, especially in complex estimation and prediction between slices. In this paper, the concept of hyper-space is presented, and its localized properties are used for parallel processing in MVC. A parallelization methodology and algorithm based on hyper-space theory are proposed. Experiments are performed on the LAM-MPI parallel computing platform. Experimental results show that the proposed MVC scheme is quite effective and achieves high rate-distortion performance.

1424404819/06/$20.00 ©2006 IEEE

Fig. 1. Hyper-Space of a monoview video coding scheme

2. HYPER-SPACE FOR VIDEO CODING
Assume that the number of frames to be encoded is finite. Let n and m denote the number of views and the number of frames in one view, respectively, and let $V=\{v_i \mid 1 \le i \le num\}$ denote the finite set of frames, where num is the number of frames to be encoded. Clearly, an estimation-prediction (E-P) relationship exists between some frames (i.e. $v_i$) or views. Specifically, frames linked by an E-P relationship form a subset $E_j$ of V.

Definition 1[7] Let $V=\{v_i \mid 1 \le i \le num\}$ be the finite set of frames to be encoded in a video sequence. The hyper-space H is defined as $H=(V,E)$, where $E=(E_i)_{i \in I}$, each $E_i$ is a subset of V with an E-P relationship, and I is an index set. $v_i$ and $E_j$ represent the nodes and edges of H, respectively. If $E_i \not\subset E_j$ for every $i \neq j$, H is called a simple hyper-space.

Thus, video coding is identical to coding in the hyper-space H. For example, one monoview video coding scheme can be described as in Fig.1; clearly, that hyper-space is not a simple one.

Definition 2[7] Let $V = v_1 E_1 v_2 \cdots E_p v_{p+1}$ be a sequence of nodes and edges in hyper-space H. If all the nodes and edges are distinct, V is a path of H, and the length of the path V is p. If $v_1 = v_{p+1}$, V is a cycle of H. If every sub-hyper-space $H'$ of H is acyclic, H is a tree.

Fig.2 and Fig.3 show examples of a tree and of paths in hyper-space. The concepts of path, cycle and tree in hyper-space are important in multiview video coding analysis; details can be found in our previous work[7].
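The simplicity condition of Definition 1 can be checked mechanically once the edges are written as sets of frame indices. The sketch below is only an illustration under stated assumptions: the two example hyper-spaces, their frame indices and the helper name `is_simple` are hypothetical and are not taken from Fig.1 or Fig.2. Containment is tested with the subset-or-equal operator, so two identical edges also count as nested.

```python
from itertools import permutations


def is_simple(edges):
    """Return True if no edge is contained in another edge (Definition 1)."""
    # permutations() yields every ordered pair of distinct edge names (i, j).
    return not any(edges[i] <= edges[j] for i, j in permutations(edges, 2))


# Hypothetical hyper-space in which E2 lies inside E1, so it is not simple.
nested = {
    "E1": frozenset({1, 2, 3, 4}),
    "E2": frozenset({2, 3}),
}

# Hypothetical tree-like hyper-space: edges share single nodes, none is nested.
tree_like = {
    "E1": frozenset({1, 2, 3, 4}),
    "E2": frozenset({1, 5, 6}),
    "E3": frozenset({2, 7, 8}),
}

print(is_simple(nested))     # False
print(is_simple(tree_like))  # True
```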
ICIP 2006
3. HYPER-SPACE BASED MVC SCHEME

A tree-model-based MVC encoder scheme, shown in Fig.2, is proposed in this paper. Fig.3 shows two comparative schemes. The details of these schemes are given in [7]. Path model 1 mainly relies on disparity estimation, while in path model 2 motion estimation plays the important role. The tree model proposed here can be processed either sequentially or in parallel, while the path models can also be dealt with by the approaches in [3-6].

Fig.2. Tree_model: a tree of Hyper-Space

Fig.3. Two paths in Hyper-Space: (a) path_model_1; (b) path_model_2

Multiview video sequences can be divided into two main kinds according to the interval between cameras in the multiview imaging setup: dense-camera and sparse-camera sequences. For a multiview video sequence with dense cameras, there is more disparity information than temporal information, and the tree model shown in Fig.2 suits it. A multiview video sequence with sparse cameras contains more temporal correlation than disparity correlation, and the rotated tree model fits it. As can be seen in Section 5, the path models perform worse than the tree model since they exploit either disparity or motion estimation only, while the tree model exploits both; thus the parallel experiments in this paper are based on the tree model rather than the path models.

On the other hand, partial decoding, random access, low delay and viewpoint switching are important tasks for an MVC decoder. Let $x_i$ be the number of frames that must be decoded before the i-th frame can be decoded, and let $p_i$ be the probability that the viewer accesses the i-th frame arbitrarily. The expectation of the cost of single-frame random access is defined as

$$E(X)=\sum_{i=1}^{num} x_i p_i,$$

which is important for evaluating the random-access ability of a hyper-space. A lower expectation means a higher ability of random access. The maximum of $x_i$ and the maximum path length (MPL) are two further measures for assessing the decoder's ability in partial decoding and its efficiency. Considering a 4×4 group of GOPs, i.e. a tiny hyper-space, the parameters for decoding the GOPs are listed in Table 1. Clearly, the tree model is suitable for the decoder, especially in parallel mode.

Table 1. Parameters in analyzing decoder model

          path_model_1   path_model_2   sequential tree_model   parallel tree_model
E(X)      7.2            7.2            3                       2.875
max xi    15             15             6                       5
MPL       5              5              2                       2

4. PARALLELIZATION METHODOLOGY FOR MVC
The localized property of the tree model indicates that some edges can be processed independently[7]. As shown in Fig.2, the isolated edges, i.e. those with $E_i \cap E_j = \emptyset$, are independent of each other and thus data independent. On the other hand, the cross edges, i.e. those with $E_i \cap E_j \neq \emptyset$ and $E_i \not\subset E_j$, are data dependent. For example, in Fig.2, $E_1$ and $E_2$ are data dependent, while $E_2$ and $E_3$ are independent. Independent edges can be encoded simultaneously, while dependent ones can also be parallelized, but with priority. In Fig.2, the edge $E_1$, especially the nodes belonging to the intersection of $E_1$ and other edges, must be encoded first because all the other edges, which are independent of one another, depend on it. In Fig.1, by contrast, $E_1$ and $E_2$ can hardly be parallelized since $E_2 \subset E_1$, but $E_2$ and $E_4$ can be processed in parallel. Hence, we can conclude the following.

Parallelization methodology: Two edges $E_i$ and $E_j$ ($i \neq j$) in a simple hyper-space satisfying $E_i \not\subset E_j$ can be parallelized. Furthermore, each edge serves as a workload on one computing node, and the intersection of the two edges is processed first and forms the local communication between the two computing nodes.

Therefore, in a simple hyper-space, the number of edges in a tree model with the above properties indicates the degree of parallelization. Multiview video coding, i.e. hyper-space coding, can be decomposed into edges, i.e. data domains, and each edge is allocated as a workload to an encoder. The encoders loaded on all computing nodes are identical. Thus, the MVC parallelization structure proposed in this paper is a typical SPMD (single program, multiple data) structure; the structure of this encoder is shown in Fig.4. It is composed of one main encoder and several sub-encoders, and coded frames serving as references, as well as bitstreams, are transmitted between the main encoder and the sub-encoders (not between sub-encoders) through communication channels. Transmitting coded frames from the main encoder to the sub-encoders, where they serve as references, is a critical step in the processing.
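The edge relations described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the four-edge layout and its frame ids are hypothetical (they are not read off Fig.2), and the names `relation` and `shared_frames` are introduced only for this example.

```python
from itertools import combinations


def relation(a, b):
    """Classify two edges (sets of frame ids) following the text above."""
    if not a & b:
        return "isolated"  # empty intersection: data independent
    if a <= b or b <= a:
        return "nested"    # one edge inside the other: hard to parallelize
    return "cross"         # shared frames: parallel, but with priority


def shared_frames(edges):
    """Frames lying in more than one edge; these would be encoded first."""
    shared = set()
    for a, b in combinations(edges.values(), 2):
        shared |= a & b
    return shared


# Hypothetical simple hyper-space: E1 overlaps every other edge in one frame.
edges = {
    "E1": {0, 1, 2, 3},
    "E2": {0, 4, 5, 6},
    "E3": {1, 7, 8, 9},
    "E4": {2, 10, 11, 12},
}

for (name_a, a), (name_b, b) in combinations(edges.items(), 2):
    print(name_a, name_b, relation(a, b))
print(sorted(shared_frames(edges)))  # [0, 1, 2]
```

In this layout E1 is a cross edge with respect to E2, E3 and E4, while E2, E3 and E4 are mutually isolated, which mirrors the situation described for Fig.2: the frames shared with E1 are handled first, after which the remaining edges can proceed simultaneously.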
The transmission of reference frames from the main encoder to the sub-encoders occurs first in the procedure. The I-frame or pseudo-I-frame and some P-frames, i.e. the nodes in the intersection of two edges in the hyper-space, are encoded first in the main encoder and transmitted to the appropriate sub-encoders, where they serve as references for predicting the other B- or P-frames loaded by each sub-encoder. When the processing of an edge is completed, the bitstream from the sub-encoder goes through the communication channel directly to the main encoder.
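The order of operations described above (shared frames first, then edges in parallel, then bitstreams back to the main encoder) can be imitated with a small simulation. The sketch makes several assumptions that are not in the paper: Python threads stand in for the LAM-MPI processes, `encode_frames` is a placeholder that only returns labels instead of calling an H.264 encoder, each sub-edge frame references only the frames shared with the main edge, and the frame ids are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_frames(frames, references):
    """Placeholder 'encoder': returns one fake bitstream label per frame."""
    refs = sorted(references)
    return [f"bits(frame={f}, refs={refs})" for f in sorted(frames)]


def encode_group(main_edge, sub_edges):
    """Simulate one encoding cycle of a main encoder and its sub-encoders."""
    # Step 1: frames shared with any sub-edge are coded first by the main encoder.
    shared = set().union(*(main_edge & e for e in sub_edges))
    output = encode_frames(shared, references=set())
    with ThreadPoolExecutor(max_workers=max(1, len(sub_edges))) as pool:
        # Step 2: each sub-encoder receives its reference frames and starts work.
        futures = [
            pool.submit(encode_frames, e - main_edge, main_edge & e)
            for e in sub_edges
        ]
        # Meanwhile the main encoder codes the rest of its own edge.
        output += encode_frames(main_edge - shared, references=shared)
        # Step 3: bitstreams return to the main encoder, collected edge by edge.
        for fut in futures:
            output += fut.result()
    return output


# Hypothetical group: one main edge and three sub-edges sharing one frame each.
main_edge = {0, 1, 2, 3}
sub_edges = [{0, 4, 5, 6}, {1, 7, 8, 9}, {2, 10, 11, 12}]
bitstream = encode_group(main_edge, sub_edges)
print(len(bitstream))  # 13
```

Sub-encoders never exchange data with one another in this sketch; all traffic passes through the main encoder, which matches the communication pattern stated in the text.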
Fig.4. SPMD parallel scheme of MVC

Bitstreams generated in different encoders must be organized properly by the main encoder into the output file. A linear construction is proposed in this paper. A traditional codec system writes the bitstream of a coded frame to the output file as soon as the processing of that frame is finished. Thus, the bitstream of the edge from the main encoder may be written to the output file first, followed by the second edge from sub-encoder 1, and the other encoders follow. The main encoder proceeds to the next group of GOPs after the bitstreams from all the other edges have been written into the output file, which marks the end of one cycle of encoding. The main encoder keeps all encoders synchronized through local communication; the synchronization mechanism is shown in Fig.5.

Fig.5. Synchronous mechanism for MVC parallel encoding scheme (for 4×4 GOPs)

5. EXPERIMENTAL RESULTS

The 'golf_2' and 'Xmas' multiview video sequences from KDDI Corp. and Nagoya University[8] are used as test sequences; Figs.6-7 show four views of the 'Xmas' and 'golf_2' sequences, respectively, at one moment. The H.264 JM85 codec is modified to fit the needs of hyper-space-based MVC. Every four successive frames from four viewpoints are selected as a 4×4 group of GOPs for our comparative experiments. The experimental environment is listed in Table 2.

Table 2. Experiment environment for tests

Item                Sequential           Parallel
Machine             Ordinary PC          Shuguang® server
OS                  Fedora Core® 4 Linux (Kernel 2.6.11)
CPU                 Pentium III® 1.7G    Pentium III® 1G×2
RAM                 512MB                1024MB
CODEC               H.264 JM 85
Parallel platform   LAM-MPI 7.1.1-2

In the parallel test, two encoders are organized to fit the environment. Experimental results for the sequential and parallel tests of the 'golf_2' and 'Xmas' test sequences are shown in Fig.8. The quantization parameter (QP) is important in the H.264 codec, and it is limited to the range 28 to 50 in our experiments. Processing speed is measured by the number of frames processed per minute. The results show that the parallel tree model achieves a speedup in the parallel processing of MVC, and that the number of encoders indicates the gain in speedup. Fig.9 shows the rate-distortion performance of MVC with path model 1, path model 2 and the tree model.

6. CONCLUSIONS

The proposed parallel methodology is flexible in parallel implementation. The tree model of MVC in hyper-space can be used with both sparse and dense camera arrays while providing high processing speed and good RD results, and it lends itself to parallelization. The parallel tree model speeds up the processing. The results also show that the number of parallel computing nodes indicates the gain in speedup. Evidently, however, the number of parallel nodes should not exceed the number of edges in the simple hyper-space that can be parallelized.
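The two quantities discussed in the conclusions, speedup measured in frames per minute and the limit that the number of parallelizable edges places on useful computing nodes, can be written down as a toy calculation. The figures below are hypothetical and are not the measurements of Fig.8; the function names `speedup` and `useful_nodes` are likewise introduced only for this sketch.

```python
def speedup(sequential_fpm, parallel_fpm):
    """Ratio of parallel to sequential processing speed (frames per minute)."""
    return parallel_fpm / sequential_fpm


def useful_nodes(num_nodes, num_parallel_edges):
    """Nodes that can actually receive a workload: at most one edge per node."""
    return min(num_nodes, num_parallel_edges)


# Hypothetical figures: 12 frames/min sequentially, 21 frames/min in parallel.
print(speedup(12.0, 21.0))  # 1.75
# With 8 nodes but only 4 parallelizable edges, 4 nodes remain idle.
print(useful_nodes(8, 4))   # 4
```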
ACKNOWLEDGMENT

This work was supported by the Natural Science Foundation of China (grant 60472100), Natural Science Foundation of Zhejiang Province (grant RC01057, Y105577), Zhejiang Science and Technology Project of China (grant 2004C31105), and the Ningbo Science and Technology Project of China (grant 2003A61001, 2004A610001, 2004A630002).

REFERENCES

[1] ISO/IEC JTC1/SC29/WG11, Survey of Algorithms used for Multi-view Video Coding (MVC), MPEG2005/N6909, 2005

[2] ISO/IEC JTC1/SC29/WG11, Requirements on Multiview Video Coding v.4, MPEG 2005/N7282, 2005

[3] A. Rodriguez, A. González and M.P. Malumbres, "Performance evaluation of parallel MPEG-4 video coding algorithms on clusters of workstations", Int. Conf. on Parallel Computing in Electrical Engineering (PARELEC'04), 2004

[4] P. Li, B. Veeravalli, and A. Kassim, "Design and Implementation of Parallel Video Encoding Strategies Using Divisible Load Analysis", IEEE Trans. on CSVT, 15(9):1098-1112, Sept. 2005

[5] P. Kolinummi, J. Sarkijarvi, T. Hamalainen, J. Saarinen, "Scalable implementation of H.263 video encoder on a parallel DSP system", Proc. ISCAS, 1:551-554, May 2000

[6] J. P. Cosmas, Y. Paker, A. J. Pearmain, "Parallel H.263 video encoder in normal coding mode", Electronics Letters, 34(22):2109-2110, 1998

[7] You Yang, Gangyi Jiang, Mei Yu, Fucui Li, Yong-deak Kim, "Hyper-space based Multiview Video Coding for Free Viewpoint Television", Picture Coding Symposium 2006, Beijing, Apr. 2006

[8] ISO/IEC JTC1/SC29/WG11, KDDI multi-view video sequences for MPEG 3DAV use, M10533, Munich, 2004
Fig.6. Four views of Xmas at one moment
Fig.7. Four views of Golf_2 at one moment
(a) Speedup in processing of Xmas multiview video sequence
(b) Speedup in processing of golf_2 multiview video sequence
Fig. 8. Speedup in processing of multiview video sequence for test sequences
(a) R-D performances of Xmas multiview video compression
(b) R-D performances of golf_2 multiview video compression
Fig. 9. Rate-distortion performances of multiview video compression for test sequences