A scalable architecture for use in an over-HDTV real-time codec system for multi-resolution video
Takeshi YOSHITOME, Ken NAKAMURA, Yoshiyuki YASHIMA and Makoto ENDO
NTT Cyber Space Laboratories, 1-1 Hikarino-oka, Yokosuka, JAPAN
ABSTRACT
In this paper, we propose a multi-frame synchronization method that has sufficient scalability, and describe an SHR codec system we have developed that uses MPEG-2 codecs and a multi-HDTV frame synchronizer based on our method.
Keywords: over-HDTV, SHR, MPEG-2, CODEC, multi-frame, scalability
1. INTRODUCTION
High-definition video applications using MPEG-2 video compression technology are increasing, and super high resolution (SHR) video applications such as the distribution of digital cinema are now being developed. SHR video images are 25-50 times larger than SDTV images and 4-8 times larger than HDTV images. Transferring SHR video in real time using PCM technology requires very high speed transmission lines on the order of 6-12 Gbit/s. To transfer SHR video over economical transmission lines, the quantity of video image data must be reduced to 1/20-1/40 using video compression technology such as MPEG-2. Several MPEG-2 SDTV encoder LSIs [1, 2] and MPEG-2 HDTV encoders [3] that use these chips have been developed. However, it is difficult to compress SHR video with a single HDTV encoder. To address this problem, a codec architecture using multiple encoders and decoders is an effective approach for SHR video in terms of system scalability.
In this paper, we propose an SHR video codec system architecture that adapts to a wide range of target video resolutions. A new image synchronization technique is introduced to guarantee the scalability of the system architecture. Section 2 describes the basic concept of a multi-codec system and its benefits. Section 3 describes the problems of a multi-codec system that need to be solved. Section 4 discusses our system architecture, in particular the image marking technique that solves the problems described in Section 3. Section 5 discusses the image mark reformation and the simulation results obtained with it. Section 6 shows the experimental codec system we have developed using the proposed system architecture.
2. BASIC CONCEPT
2.1. Basic concept of SHR codec
SHR image transmission using a parallel encoding and decoding architecture involves several HDTV encoders, a transmission network and several HDTV decoders. The architecture is shown in Fig. 1. In such an architecture, an SHR image is represented by several HDTV images. An SHR camera outputs several synchronous HDTV images, which are input to the HDTV encoders. The encoders encode all HDTV images independently, and the generated bitstreams are transferred to the HDTV decoders through the network. The images decompressed by the decoders are output to an SHR display system.
2.2. Features
One of the most important features of this architecture is that the system remains adaptable to the various resolutions of SHR video. There are many types of SHR video, such as those shown in Fig. 2. The architecture can adapt to many kinds of SHR images by increasing or decreasing the number of encoders and decoders used in the system. A second benefit of this architecture is a reduction in codec system cost: conventional HDTV encoders and decoders can be used instead of designing and manufacturing a new SHR encoder and decoder that handle only SHR images of a specific resolution.
E-mail: [email protected], Telephone: +81 468 59 3173
Visual Communications and Image Processing 2003, Touradj Ebrahimi, Thomas Sikora, Editors, Proceedings of SPIE Vol. 5150 (2003) © 2003 SPIE · 0277-786X/03/$15.00
Figure 1. Parallel encoding and decoding architectures. [Block diagram: the SHR image from the SHR camera is split by the SHR image divider into HDTV images A-D; four HDTV encoders send their bitstreams over the network to four HDTV decoders; the SHR image connector recombines the decoded images A-D for the SHR projector. The encoders, network and decoders form the SHR codec.]
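The divider and connector of Fig. 1 can be illustrated with a minimal sketch. The raster-order tiling, the function names `divide_frame` and `connect_tiles`, and the toy frame size are assumptions made for illustration; the paper does not specify how the divider orders the HDTV images.

```python
from typing import List

Frame = List[List[int]]  # a frame as rows of pixel values


def divide_frame(frame: Frame, tile_w: int, tile_h: int) -> List[Frame]:
    """Split a frame into tile_w x tile_h tiles, returned in raster order."""
    height, width = len(frame), len(frame[0])
    if height % tile_h or width % tile_w:
        raise ValueError("frame size must be a multiple of the tile size")
    tiles = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            tiles.append([row[left:left + tile_w]
                          for row in frame[top:top + tile_h]])
    return tiles


def connect_tiles(tiles: List[Frame], cols: int) -> Frame:
    """Reassemble raster-ordered tiles into one frame, `cols` tiles per row."""
    frame = []
    for first in range(0, len(tiles), cols):
        row_of_tiles = tiles[first:first + cols]
        for y in range(len(row_of_tiles[0])):
            line = []
            for tile in row_of_tiles:
                line.extend(tile[y])
            frame.append(line)
    return frame


# Toy 8x4 "SHR" frame split into four 4x2 "HDTV" tiles (A, B, C, D).
shr = [[y * 8 + x for x in range(8)] for y in range(4)]
tiles = divide_frame(shr, tile_w=4, tile_h=2)
restored = connect_tiles(tiles, cols=2)
```

In a real system each tile would be a 1920x1080 HDTV image passed to its own encoder; the toy size only keeps the example readable.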
Figure 2. Example of SHR images

Num. of channels    Resolution    Feature
2 (left, right)     1920x1080     3D
1                   3840x1080     wide
1                   5760x1080     ultra wide
1                   3840x2160     ultra large
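As a rough illustration of how the number of codecs follows from the formats in Fig. 2, the sketch below counts the HDTV tiles needed to cover an image. The function name `codecs_needed` and the assumption that each codec handles exactly one 1920x1080 image per view are ours, not the paper's.

```python
import math

HDTV_WIDTH, HDTV_HEIGHT = 1920, 1080


def codecs_needed(width: int, height: int, views: int = 1) -> int:
    """Number of HDTV codecs needed to cover `views` images of width x height."""
    cols = math.ceil(width / HDTV_WIDTH)
    rows = math.ceil(height / HDTV_HEIGHT)
    return cols * rows * views


stereo = codecs_needed(1920, 1080, views=2)   # 3D: left and right images
wide = codecs_needed(3840, 1080)
ultra_wide = codecs_needed(5760, 1080)
ultra_large = codecs_needed(3840, 2160)
```

This count ignores frame rate; a format with a higher frame rate than one codec supports would need proportionally more codecs.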
3. PROBLEMS TO BE SOLVED
A parallel encoding and decoding system is a useful technique for transferring high resolution video such as SHR images; however, one problem with this approach must be solved. The HDTV decoders do not all receive their corresponding bitstreams from the HDTV encoders at the same time, because the bitstreams are handled individually in the network. The decoders therefore output decompressed HDTV images without synchronization, and as a result the SHR video images composed of several HDTV images are not correct.
Multiplexing the transport streams would appear to be a good way to avoid these frame synchronization errors, because the timestamps of all the bitstreams generated by the HDTV encoders could be re-stamped by the multiplexer. However, conventional HDTV encoders and decoders are not designed to be synchronized with other encoders and decoders, so the encoder and decoder hardware would have to be modified. This is a serious problem, because such modified systems cannot handle the wide range of SHR video resolutions, and because the newest encoders cannot be used in SHR codec systems.
4. PROPOSED ARCHITECTURE
4.1. Architecture based on image mark
The architecture we propose to address these problems consists of several codecs, an image mark inserter, and a mark detector and synchronizer. The block diagram is shown in Fig. 3. Before encoding, each HDTV image divided from an SHR image is marked at the top or bottom line by the mark inserter. The image mark represents the frame number of the original SHR image. Each encoder compresses HDTV images that include image marks, and the decoders decompress the images and output them to the mark detector without synchronization. The mark detector and synchronizer reads the image mark of each HDTV image, then synchronizes and outputs all HDTV images that have the same frame number. To synchronize the multiple HDTV frames, the mark detector and synchronizer has sufficient frame memory. Marked lines are replaced by black lines or by copies of adjacent lines in the mark detector and synchronizer before the HDTV images are output. This architecture does not require any modification of conventional encoders and decoders, which makes it easy to extend the SHR codec system to higher resolutions.
Figure 3. Proposed SHR codec architectures. [Block diagram: the image mark inserter marks HDTV images A-D; four encoders send them over the network to four decoders; the image mark detector and synchronizer outputs the synchronized HDTV images A-D.]
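The behaviour of the mark detector and synchronizer described in Section 4.1 can be sketched as a buffer keyed by frame number. The class name `FrameSynchronizer`, the `max_pending` limit standing in for the frame memory size, and the policy of dropping the oldest incomplete frame are all illustrative assumptions; the paper does not describe the internal design.

```python
from typing import Dict, List, Optional


class FrameSynchronizer:
    """Buffers decoded HDTV frames and releases them as complete sets."""

    def __init__(self, num_channels: int, max_pending: int = 8):
        self.num_channels = num_channels
        self.max_pending = max_pending  # assumed frame memory depth
        # frame number -> {channel index -> image}
        self.pending: Dict[int, Dict[int, object]] = {}

    def push(self, channel: int, frame_number: int,
             image: object) -> Optional[List[object]]:
        """Store one decoded frame.

        Returns the images of all channels (in channel order) once every
        channel has delivered this frame number, otherwise None.
        """
        slot = self.pending.setdefault(frame_number, {})
        slot[channel] = image
        if len(slot) == self.num_channels:
            del self.pending[frame_number]
            return [slot[c] for c in range(self.num_channels)]
        if len(self.pending) > self.max_pending:
            # Assumed overflow policy: discard the oldest incomplete frame.
            oldest = next(iter(self.pending))
            del self.pending[oldest]
        return None


# Channel 0 runs one frame ahead of channel 1.
sync = FrameSynchronizer(num_channels=2)
outputs = [
    sync.push(0, 5, "A5"),
    sync.push(0, 6, "A6"),
    sync.push(1, 5, "B5"),
    sync.push(1, 6, "B6"),
]
```

Here the frame number would come from the detected image mark of each decoded HDTV image, and channels are assumed to be numbered from 0.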
4.2. Image mark
Digital watermarking is one of the best-known and most effective image marking techniques for protecting copyright, and some watermarking schemes are used in compressed images. Watermarking information is inserted into the least significant bits (LSBs) of many pixels all over the image so that the existence of the watermark is difficult to recognize. The image quality degradation caused by digital watermarking is therefore small, but it is spread over the whole image.
The image mark technique we use replaces successive pixels in the top or bottom horizontal line with pixels that represent the frame number. A comparison of the pixel positions used for watermarking information and for our image marks is shown in Fig. 4. Our image mark reduces the effective image area, but there is no image degradation except in the top or bottom line. In this method it is essential to detect marks correctly, because MPEG-2 compression changes the original image marks into degraded decoded image marks.
To prevent image mark detection errors, our mark inserter and detector represent an M-bit frame number by an image mark consisting of M x N pixels. Each frame number bit is represented by N pixels, and the first N pixels represent the most significant bit (MSB) of the frame number. For example, suppose that the pixel levels range from 0 to F in hexadecimal, that each bit of the frame number is represented by 3 pixels, and that the frame number has 4 bits. The frame number 1010 in binary is then represented by 12 pixels whose levels are FFF000FFF000. Table 1 shows the same kind of relationship between frame number and mark for a 3-bit frame number with four pixels per bit.
To extract the correct frame number from a degraded decoded image, the mark detector compares each image mark pixel to a threshold. In this case the threshold level is 7.5. The mark detector considers a pixel whose level is over 7.5 to be one; otherwise it considers the pixel to be zero. Each frame number bit is decided by majority. For example, if an image mark comprising 12 pixels is A6F013B9C304, the MSB is one by majority because two pixels (levels A and F) are over the threshold and one (level 6) is under it. Applying the same rule to the remaining bits, the mark is read as 1010 in binary. Thus, the mark detector recognizes the frame number and simultaneously outputs all HDTV images that have the same frame number.
The above is the main rule for reading and writing image marks. An optional rule that uses two thresholds can decrease read errors. The mark detector compares each image mark pixel to a threshold A and a threshold B. In this example, threshold A is 5 and threshold B is 11. The mark detector considers a pixel whose level is over 11 to be one, and a pixel whose level is lower than 5 to be zero. A pixel whose level is between 5 and 11 is considered an error pixel, and error pixels are excluded from the majority decision. Using this optional rule, the reliability of image mark detection can be improved.
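The main write and read rule can be sketched as follows, using the 4-bit, 3-pixels-per-bit example from the text. The function names `encode_mark` and `decode_mark` are illustrative, and resolving a tie as zero (possible only with an even number of pixels per bit) is an assumption, since the text does not cover ties.

```python
from typing import List


def encode_mark(frame_number: int, bits: int = 4, pixels_per_bit: int = 3,
                high: int = 0xF) -> List[int]:
    """Return the mark pixels for frame_number, MSB first.

    A one bit becomes pixels_per_bit pixels at level `high`,
    a zero bit becomes pixels_per_bit pixels at level 0.
    """
    pixels = []
    for shift in range(bits - 1, -1, -1):
        bit = (frame_number >> shift) & 1
        pixels.extend([high if bit else 0] * pixels_per_bit)
    return pixels


def decode_mark(pixels: List[int], bits: int = 4, pixels_per_bit: int = 3,
                threshold: float = 7.5) -> int:
    """Recover the frame number by thresholding pixels and voting per bit."""
    frame_number = 0
    for i in range(bits):
        group = pixels[i * pixels_per_bit:(i + 1) * pixels_per_bit]
        ones = sum(1 for level in group if level > threshold)
        bit = 1 if 2 * ones > len(group) else 0  # tie counts as zero (assumed)
        frame_number = (frame_number << 1) | bit
    return frame_number


clean = encode_mark(0b1010)                            # levels FFF000FFF000
degraded = [int(digit, 16) for digit in "A6F013B9C304"]
recovered = decode_mark(degraded)                      # majority vote per bit
```

The degraded mark is the example from the text; each group of three pixels votes independently, so a single badly distorted pixel per bit does not change the result.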
Figure 4. Pixel position comparison of watermarking and our image mark. [(A) watermarking system: pixels with watermarking information; (B) frame synchronization system: pixels with frame number information, i.e. the pixels used for the frame number.]
Figure 5. Image mark example. [Pixels representing the frame number bits b2, b1 and b0.]
Table 1. Relationship between frame number and marked pixel value

Frame number                    Pixel value
decimal   binary (b2 b1 b0)     four pixels for b2   four pixels for b1   four pixels for b0
0         0 0 0                 0000                 0000                 0000
1         0 0 1                 0000                 0000                 FFFF
2         0 1 0                 0000                 FFFF                 0000
3         0 1 1                 0000                 FFFF                 FFFF
4         1 0 0                 FFFF                 0000                 0000
5         1 0 1                 FFFF                 0000                 FFFF
6         1 1 0                 FFFF                 FFFF                 0000
7         1 1 1                 FFFF                 FFFF                 FFFF
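The optional two-threshold rule of Section 4.2 can be sketched for the Table 1 layout (3 bits, four pixels per bit). The function name `decode_mark_two_thresholds` and the choice to return `None` when a bit has no majority among its remaining pixels are assumptions; the paper does not say how such a case is handled.

```python
from typing import List, Optional


def decode_mark_two_thresholds(pixels: List[int], bits: int = 3,
                               pixels_per_bit: int = 4,
                               threshold_a: int = 5,
                               threshold_b: int = 11) -> Optional[int]:
    """Decode a mark, ignoring pixels between the two thresholds.

    Levels above threshold_b vote for one, levels below threshold_a vote
    for zero, and all other levels are error pixels that do not vote.
    Returns None if any bit has no majority (assumed behaviour).
    """
    frame_number = 0
    for i in range(bits):
        group = pixels[i * pixels_per_bit:(i + 1) * pixels_per_bit]
        ones = sum(1 for level in group if level > threshold_b)
        zeros = sum(1 for level in group if level < threshold_a)
        if ones == zeros:
            return None
        frame_number = (frame_number << 1) | (1 if ones > zeros else 0)
    return frame_number


# Frame number 5 (FFFF 0000 FFFF in Table 1) after hypothetical degradation.
degraded = [0xF, 0x8, 0xC, 0xD,   # b2: three ones, one error pixel
            0x2, 0x6, 0x0, 0x1,   # b1: three zeros, one error pixel
            0xE, 0x9, 0x3, 0xD]   # b0: two ones, one zero, one error pixel
frame_number = decode_mark_two_thresholds(degraded)
unreadable = decode_mark_two_thresholds([0x8] * 12)
```

The degraded levels are invented for illustration and are not measurements from the paper.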
5. SIMULATION
In many types of conventional video equipment, pixel values are represented using 8 bits. Levels 0 and 255 are used for image marks in our method. However, MPEG-2 compression changes the image mark levels in the decoded images. If this reformation (distortion) of the image marks is large, the mark detector cannot recognize the frame number represented by an image mark and cannot synchronize the proper frames. We simulated the image mark reformation caused by MPEG-2 compression. Detailed simulation conditions are shown in Table 2. To show the relationship between bitrate and reformation clearly, a low minimum bitrate of 14 Mbps was selected; a typical bitrate for the 422P@HL profile used in this simulation is about 20-30 Mbps.
Figure 6 shows the simulated relationship between the level change in the mark pixels and the encoding bitrate. The average change in each mark pixel is about 14-20 for complicated scenes and about 2-4 for simple scenes when the bitrate is 14-20 Mbps. We also simulated the relationship between the number of pixels that represent one bit of the image mark and the detection errors in 30 frames. The result is shown in Fig. 7. If the number of pixels that represent one bit of the frame number is less than 3, detection errors occur at low bitrates. If the number is 4 or more, no error occurs in the scenes we tested.

Table 2. Simulation conditions

Test sequence                      Standard Test Sequence (BTA)
Test scene                         Green Leaves, Marching, Church
Encoding profile                   MPEG-2 422P@HL
Bitrate                            14, 16, 18, 20 Mbps
GOP size (N)                       15
P-picture interval (M)             3
Mark line                          top line
Mark range                         0-15
Number of pixels per marker bit    1, 2, 3, 4, 5, 6, 7, 8
Number of thresholds               1
Threshold level                    128
Figure 6. Average error level of image marker. [Plot: average error of decoded pixel versus bitrate (Mbps, 14-20) for the Green Leaves, Marching and Church scenes, ranging from complicated to plain images.]
Figure 7. Detection error in 30 frames (Green Leaves). [Plots: number of error marks versus bitrate (Mbps, 14-20) for 1, 2, 3 and 4-8 pixels per mark bit.]
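The structure of the detection-error experiment can be sketched as below. This is not the simulation reported above: the MPEG-2 encode and decode step is replaced by a placeholder additive Gaussian noise model, and `noise_sigma`, the 4-bit frame number and the function names are assumptions made only to show how errors are counted for each number of pixels per mark bit.

```python
import random


def encode_mark(frame_number, bits, pixels_per_bit, high=255):
    """Write each frame number bit, MSB first, as pixels at level 0 or high."""
    pixels = []
    for shift in range(bits - 1, -1, -1):
        level = high if (frame_number >> shift) & 1 else 0
        pixels.extend([level] * pixels_per_bit)
    return pixels


def decode_mark(pixels, bits, pixels_per_bit, threshold=128):
    """Read the frame number back with one threshold and a majority vote."""
    value = 0
    for i in range(bits):
        group = pixels[i * pixels_per_bit:(i + 1) * pixels_per_bit]
        ones = sum(1 for level in group if level > threshold)
        value = (value << 1) | (1 if 2 * ones > len(group) else 0)
    return value


def count_detection_errors(pixels_per_bit, noise_sigma, frames=30,
                           bits=4, seed=0):
    """Count frames whose mark is misread after the placeholder distortion."""
    rng = random.Random(seed)
    errors = 0
    for frame in range(frames):
        number = frame % (1 << bits)
        mark = encode_mark(number, bits, pixels_per_bit)
        # Placeholder for MPEG-2 coding: additive noise, clipped to 8 bits.
        noisy = [min(255, max(0, round(level + rng.gauss(0.0, noise_sigma))))
                 for level in mark]
        if decode_mark(noisy, bits, pixels_per_bit) != number:
            errors += 1
    return errors


# Error counts for 1 to 8 pixels per mark bit under the placeholder noise.
errors_by_pixels = {n: count_detection_errors(n, noise_sigma=90.0)
                    for n in range(1, 9)}
```

To reproduce results like Fig. 7, the noise step would have to be replaced by actual MPEG-2 encoding and decoding of the marked test sequences under the conditions in Table 2.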
6. EXPERIMENTAL SYSTEM
We have developed an experimental SHR codec system based on the system architecture described above. It comprises 2-8 codecs, a mark inserter, and a mark detector and synchronizer. Changing the number of codecs in the system makes it possible to encode and decode video of various resolutions. For example, two codecs can compress and decompress stereo images of 1920 pixels x 1079 lines x 30 fps, three codecs can compress and decompress images of 5760 x 1079 x 30 fps, and eight codecs can compress and decompress images of 3840 x 2158 x 60 fps. Thirty-two pixels in each frame are used for the frame number. Time lags of up to eight frames between the frames output by the decoders can be absorbed. Audio signals embedded in the video signals are also adjusted automatically according to the time lag of the video frames.
Using this system, we were able to transmit the semifinal game of the 2002 FIFA World Cup soccer tournament from the Yokohama International Media Center to Yamashita Park as a closed-circuit event without any frame synchronization errors. Figure 10 is an SHR transmission diagram of this demonstration, in which three conventional HDTV encoders and three HDTV decoders were used without any hardware modification. A second example of using our system was the SHR image transmission of an orchestral concert of the SAITO KINEN FESTIVAL 2002 [4] in Matsumoto city. Figure 11 shows a transmission diagram of this event. The SHR images used for this event were 2x2 times larger than HDTV images. Two HDTV images were also transmitted. In total, twelve conventional encoders and decoders were used for this transmission. To maintain high audio quality, 4-channel audio streams were transmitted without any compression. The results obtained demonstrate that the proposed system architecture makes it possible to create high-quality video encoding systems that are scalable in terms of target video resolution.

Table 3. Specification of experimental system

Video format                     SMPTE-292M
Number of video channels         8
Position of marker line          #1 or #1080
Marker bit length                4
Number of pixels per mark bit    8
Audio format                     AES/EBU
Number of audio channels         8
Power consumption                314 W
Figure 8. Image mark inserter: ES1000
Figure 9. Image mark detector and frame synchronizer: DS1000
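The mark parameters in Table 3 are consistent with the thirty-two mark pixels mentioned in the text, as the short calculation below shows. The wrap-around of the 4-bit frame number at 16 and the helper names `frame_lag` and `can_absorb` are assumptions; the paper states only the bit length and the eight-frame limit.

```python
MARK_BITS = 4        # "Marker bit length" in Table 3
PIXELS_PER_BIT = 8   # "Number of pixels per mark bit" in Table 3
MAX_LAG_FRAMES = 8   # time lag the text says can be absorbed

MARK_PIXELS = MARK_BITS * PIXELS_PER_BIT   # 4 x 8 = 32 pixels per frame
FRAME_NUMBER_MODULUS = 1 << MARK_BITS      # 16 values, assumed to wrap


def frame_lag(leading: int, trailing: int) -> int:
    """Frames by which `leading` is ahead of `trailing`, allowing wrap-around."""
    return (leading - trailing) % FRAME_NUMBER_MODULUS


def can_absorb(leading: int, trailing: int) -> bool:
    """True if the lag between two channels fits the stated eight frames."""
    return frame_lag(leading, trailing) <= MAX_LAG_FRAMES


# One channel has wrapped to frame 2 while another is still at frame 14.
lag = frame_lag(2, 14)
```

Under this assumption a 4-bit counter leaves enough distinct frame numbers to tell leading and trailing channels apart within the eight-frame window.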
7. SUMMARY
In this paper, we proposed a multi-frame synchronization method that has sufficient scalability and described an SHR codec system we have developed that uses MPEG-2 codecs and a multi-HDTV frame synchronizer based on our method. The system architecture uses a spatially parallel encoding approach and is scalable with respect to the target video resolution. Image mark insertion and detection techniques have been introduced into the system. By using these techniques, we have achieved both system scalability and high cost performance. The capability of the proposed architecture was confirmed by using it in an experimental system. By using video encoding systems based on this architecture, we confirmed that high quality video can be used in visual applications for commercial and personal use at reasonable system cost.
Figure 10. SHR image transmission experiment for a soccer tournament. [Diagram: at the stadium, an SHR camera feeds the mark inserter ES1000 and three HDTV encoders; the bitstreams cross the network to three HDTV decoders, the mark detector and synchronizer DS1000, and an SHR projector.]
Figure 11. SHR image transfer experiment for an orchestra concert. [Diagram: at the concert hall in Matsumoto, an SHR image (HDTV x4) passes through the mark inserter ES1000 and four HDTV encoders, two further HDTV images each use one HDTV encoder, and 4ch audio is sent alongside. After the network, the receiving sites named in the diagram are the NTT R&D center in Musashino, Tokyo, and Palette Town in Odaiba, Tokyo; the receiving-side elements are HDTV decoders x4 with the mark detector and synchronizer DS1000 and an SHR projector, single HDTV decoders with HDTV projectors, and speakers for the 4ch audio.]
REFERENCES
1. M. Ikeda, T. Kondo, K. Nitta, K. Suguri, T. Yoshitome, T. Minami, J. Naganuma, and T. Ogura, "An MPEG2 Video Encoder LSI with Scalability for HDTV based on Three-layer Cooperative Architecture," Design Automation and Test in Europe Conference (DATE), 1999.
2. M. Ikeda, T. Kondo, K. Nitta, K. Suguri, T. Yoshitome, T. Minami, J. Naganuma, and T. Ogura, "SuperENC: MPEG-2 Video Encoder LSI based on Three-layer Cooperative Architecture," IEEE Transactions on Consumer Electronics, Vol. 45, No. 4, pp. 1130-1133, November 1999.
3. T. Yoshitome, K. Nakamura, K. Nitta, M. Ikeda, and M. Endo, "Development of an HDTV MPEG-2 encoder based on multiple enhanced SDTV encoding LSIs," Proc. ICCE 2001, Los Angeles, U.S.A., pp. 160-161, July 2001.
4. http://www.saito-kinen.com/index-e.html