Implementing MPEG-4 Visual in software
Yafan Zhao, Laura Muir, Iain Richardson
The Robert Gordon University, Aberdeen, UK
E-mail: [email protected], [email protected], [email protected]
Implementing MPEG-4 Visual in software
• Variable complexity video coding
• Video coding optimization for sign language users
Variable complexity video coding
[Figure: block diagram. Video frame → Encoder → Coded frame, with a Controller block labelled Rt, Ct and Sequence statistics]
Rt: target coded bitrate
Ct: target computational complexity
Computation-intensive functions in a software video CODEC
Quantizer = 8, search area = +/-15.5
[Figure: stacked bar chart, share of computation (0% to 100%) per function]

Function            TMN5     TMN8
Other functions     34.8%    52.1%
Motion estimation   39.1%    24.1%
DCT, IDCT, Q, IQ    26.1%    23.8%

Test environment: H.263 baseline, MPEG-4 Simple Profile
Controlling DCT complexity
• Correlation between SAD of each block and End of Block (EOB)

$$\mathrm{SAD}_B = \sum_{i=0}^{7} \sum_{j=0}^{7} \left| C_{ij} \right|$$

EOB: Highest non-zero quantized DCT coefficient
[Figure: plot with labels Frame, SAD and EOB]
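A minimal Python sketch of the two quantities compared on this slide. It assumes the block is given as 8 rows of 8 integer samples and that EOB is counted as a 1-based position in an already scanned list of quantized coefficients. Both conventions, and the function names, are illustrative assumptions rather than details from the slides.

```python
def block_sad(block):
    """Sum of absolute values over an 8x8 block (8 rows of 8 integers)."""
    return sum(abs(v) for row in block for v in row)


def end_of_block(scanned):
    """1-based position of the last non-zero coefficient in a scanned list
    of quantized coefficients; 0 if every coefficient is zero (assumed
    convention)."""
    for pos in range(len(scanned), 0, -1):
        if scanned[pos - 1] != 0:
            return pos
    return 0


# Hypothetical block with two non-zero samples.
residual = [[0] * 8 for _ in range(8)]
residual[0][0] = -3
residual[2][5] = 4

print(block_sad(residual))                   # 7
print(end_of_block([5, 0, -1] + [0] * 61))   # 3
```

A block with a small SAD tends to produce few non-zero coefficients after quantization, which is the correlation the slide points to.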
Controlling DCT complexity
• Decision thresholds
[Figure: flowchart]
Decision on SADB: SADB ? To ? Q
  Yes: Set quantized coefficients of block to zero → Entropy Coding
  No: DCT → Quant → Entropy Coding
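The flowchart can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the skip threshold is passed in as a plain number because the exact combination of To and Q is not recoverable here, the comparison is assumed to be "SAD below threshold means skip", and `quantize` is a hypothetical uniform quantizer rather than the H.263 or MPEG-4 one.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Naive orthonormal 8x8 forward DCT-II; slow but easy to check."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, q):
    """Hypothetical uniform quantizer: divide by 2*q, truncate toward zero."""
    return [[int(c / (2 * q)) for c in row] for row in coeffs]


def code_block(residual, threshold, q):
    """Return (quantized coefficients, skipped flag) for one 8x8 block."""
    sad = sum(abs(v) for row in residual for v in row)
    if sad < threshold:
        # "Yes" branch: skip DCT and Quant, emit an all-zero block.
        return [[0] * N for _ in range(N)], True
    # "No" branch: full DCT followed by quantization.
    return quantize(dct_2d(residual), q), False


quiet = [[1] * N for _ in range(N)]    # SAD = 64, below the threshold
busy = [[21] * N for _ in range(N)]    # SAD = 1344, above the threshold
print(code_block(quiet, threshold=100, q=8)[1])
print(code_block(busy, threshold=100, q=8)[1])
```

Every skipped block saves one DCT and one quantization pass, so the fraction of skipped blocks directly sets the DCT complexity of the frame.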
Controlling DCT complexity
• Adaptive control algorithm
[Figure: block diagram. Output from motion estimation → DCT coding → Entropy coding, with a Control algorithm block attached to DCT coding and labelled Cn and Ct]
Ct: target complexity
Cn: measured complexity of frame n
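One way to picture the feedback loop is a per-frame update of the skip threshold driven by the gap between Cn and Ct. The proportional rule below is an assumption chosen for illustration; the slides do not give the update rule, and the names `update_threshold`, `run`, `gain` and `floor` are hypothetical. Complexity is modelled as the fraction of blocks that go through the DCT.

```python
def update_threshold(threshold, measured, target, gain=0.5, floor=1.0):
    """Raise the skip threshold when the last frame cost more than the
    target and lower it when it cost less (hypothetical proportional rule)."""
    return max(floor, threshold * (1.0 + gain * (measured - target)))


def run(frames_sads, target, threshold=100.0):
    """frames_sads holds one list of block SADs per frame.
    Returns the measured complexity Cn of every frame, where Cn is the
    fraction of blocks whose SAD reached the threshold (DCT not skipped)."""
    history = []
    for sads in frames_sads:
        coded = sum(1 for s in sads if s >= threshold)
        measured = coded / len(sads)
        history.append(measured)
        threshold = update_threshold(threshold, measured, target)
    return history


# Synthetic sequence: 200 identical frames of 100 blocks, SADs 0, 10, ..., 990.
sads = list(range(0, 1000, 10))
history = run([sads] * 200, target=0.3)
print(history[0], history[-1])
```

On this synthetic input the measured complexity starts well above the target and settles near it after a few dozen frames, which mirrors the behaviour a target-tracking controller is meant to show.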
Controlling DCT complexity
• Experimental result of adaptive control algorithm
[Figure: DCT complexity (0 to 0.6) against frame (0 to 200) for Ct = 0.5, Ct = 0.3 and Ct = 0.1. Sequence: Mother and Daughter, Q = 8, k = 6]
Controlling DCT complexity
• Experimental result of adaptive control algorithm
[Figure: four panels labelled Full complexity, Ct = 0.5, Ct = 0.3 and Ct = 0.1]
Managing motion estimation complexity
• Adaptive control algorithm diagram of motion estimation
[Figure: block diagram. Macroblock → Motion estimation → DCT coding, with a Control algorithm block labelled Sn and St]
Sn: measured number of SAD operations for frame n
St: target number of SAD operations
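A hedged sketch of how a SAD-operation budget can bound motion estimation cost. It assumes a nearest-first search that stops once a per-block budget is spent, and it assumes that the per-block budget would come from dividing the frame target St across the macroblocks. The search order, the budget split and all names are illustrative, not the authors' algorithm.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of cur at (bx, by) and the block of ref
    displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def search(cur, ref, bx, by, n, radius, budget):
    """Test candidate vectors nearest-first and stop after `budget` SAD
    operations. Returns (best vector, best SAD, SAD operations used)."""
    h, w = len(ref), len(ref[0])
    cands = sorted(((dx, dy)
                    for dy in range(-radius, radius + 1)
                    for dx in range(-radius, radius + 1)),
                   key=lambda v: (abs(v[0]) + abs(v[1]), v))
    best, best_sad, used = (0, 0), None, 0
    for dx, dy in cands:
        if used >= budget:
            break  # budget spent: keep the best vector found so far
        if not (0 <= bx + dx and bx + dx + n <= w
                and 0 <= by + dy and by + dy + n <= h):
            continue  # candidate block falls outside the reference frame
        s = sad(cur, ref, bx, by, dx, dy, n)
        used += 1
        if best_sad is None or s < best_sad:
            best, best_sad = (dx, dy), s
    return best, best_sad, used


# Synthetic frames: the current frame is the reference shifted by one pixel.
ref = [[x * 7 + y * 13 for x in range(16)] for y in range(16)]
cur = [[ref[y][min(x + 1, 15)] for x in range(16)] for y in range(16)]

print(search(cur, ref, 4, 4, 4, 2, budget=100))  # ((1, 0), 0, 25)
print(search(cur, ref, 4, 4, 4, 2, budget=2))    # ((0, 0), 112, 2)
```

The second call shows the trade-off: with only two SAD operations allowed, the search returns a worse match than the full window but costs a fraction of the computation.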
Conclusions and future work
Conclusion:
• The adaptive control algorithm for the DCT enables consistent, predictable reduction in computation without severe loss of visual quality.
• The computational complexity of the motion estimation algorithm can be reduced to a target level, with a flexible trade-off between computational complexity and video quality.
Future work:
• To integrate computational management of the DCT and motion estimation functions.
• To develop an integrated approach to the control of rate, complexity and distortion.
Visual Media Standards for Today and Tomorrow, IEE, 25 April 2002
Implementing MPEG-4 Visual in Software
Videotelephony for the Deaf: Analysis and Development of an Optimised Video Compression Product
Laura Muir, Lecturer in IT & Statistics, The Robert Gordon University
E-mail: [email protected]
Videotelephony for the Deaf
Aim:
• To develop an optimised video compression product for deaf users of videoconferencing.
Rationale:
• The current poor performance of video communication systems, for the target users, at low bit rates.
• The possibility of achieving quality sign language communication by selectively prioritising image data.
Advice & Previous Research:
• Advice on deaf issues: Lilian Lawson, Director, Scottish Council on Deafness.
• MSc: identified scope for optimising video quality and frame rate for deaf users and established contacts.
• ITU quality metrics for performance rating and benchmarking.
Target Users
• Communication (sign language, lip reading, finger spelling, facial expression)
Video Telephony
Dedicated Video Telephones
Multimedia Terminal
MPEG-4 Visual
• International standard for coding visual media data.
• Part of the MPEG-4 standard, which describes mechanisms for coding, multiplexing and presenting a range of media, including video images.
• Set of tools for coding video images for efficient storage, transmission across networks and viewing/manipulation by end-users.
• Improved flexibility and efficiency over MPEG-2.
• Consists of:
  – Core CODEC based on the ITU-T H.263 standard.
    [Figure: Video Object Plane → Motion (MV), Texture (DCT) → Bitstream]
  – Additional coding tools (also codes shape and transparency information).
    [Figure: Video Object Plane → Shape, Motion (MV), Texture (DCT) → Bitstream]
MPEG-4 Visual: Core CODEC
[Figure: encoder block diagram.
  Coding path: Video frames → Motion Est. / Motion Comp. → DCT → Quant → Zigzag → RLE → VLC → Buffer
  Reconstruction path: Rescale → IDCT → Recon.
  Other labels: Motion Vectors, Headers]
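The Zigzag and RLE stages of the diagram can be illustrated with a short Python sketch. It assumes the classic zigzag scan of a square block and a simplified (run, level) pairing, where run is the number of zeros preceding each non-zero level. The standard's actual entropy coding differs in detail, so this only shows the principle; the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zigzag scan order for an n x n block."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals in turn; alternate direction on each one.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level(block):
    """Zigzag-scan a square block of quantized coefficients and return
    (run, level) pairs; trailing zeros are dropped."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        v = block[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


# Hypothetical quantized block with three non-zero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0] = 12
block[0][1] = -3
block[2][0] = 5

print(zigzag_order()[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_level(block))    # [(0, 12), (0, -3), (1, 5)]
```

Because quantization leaves most high-frequency coefficients at zero, the zigzag scan groups the zeros into long runs that the run-length stage represents compactly before variable-length coding.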
Optimisation
• Identify end-user requirements
• Identify important regions of image for the target user
• Determine threshold for visual quality and update rate for visually important areas
• Segmentation and coding
Prioritisation Options
• Accurate Segmentation
• Approximate Segmentation
Quality Measurement
• Image Quality: PSNR
[Figure: PSNR (dB, 25 to 45) against rate (bits/second, 0 to 180000) for TM5_Car_QCIF, TM5_Claire_QCIF and TM5_Foreman_QCIF. Short Header, QCIF, 10 fps, TM5 rate control]
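For reference, PSNR for 8-bit images is 10·log10(255² / MSE), where MSE is the mean squared error between the original and decoded frames. A minimal Python sketch, assuming frames are given as equal-sized lists of rows of integer luma samples (the function name and the toy frames are illustrative):

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-sized frames."""
    diffs = [(a - b) ** 2
             for row_o, row_d in zip(original, decoded)
             for a, b in zip(row_o, row_d)]
    mse = sum(diffs) / len(diffs)
    # Identical frames have zero error, so PSNR is unbounded.
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


frame = [[100] * 4 for _ in range(4)]
off_by_one = [[101] * 4 for _ in range(4)]
print(round(psnr(frame, off_by_one), 2))  # 48.13
```

PSNR treats every pixel equally, which is why it can disagree with perceived quality when only some regions of the picture matter to the viewer.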
Quality Measurement
Perceived Video Quality:
• Subjective quality scale:
  – Excellent
  – Good
  – Fair
  – Poor
  – Bad
• Important visual criteria specified by the target end-user.
• Reliability and ease of sign language communication are the desired results rather than optimal quality across the whole video scene.
Objectives
Investigation:
• To analyse the current videophone communications and target user market (sign language users).
Analysis:
• To specify end-user requirements and quality measurement criteria based on previous research results and input from sign language users.
• To evaluate current videophone technology and video compression standards and quantify the limitations of existing technology for sign language communication.
• To propose a product research and development strategy for an optimised videotelephony product.
Objectives
Design:
• To define options for optimisation of videophone video compression systems.
Implementation:
• To develop one or more optimised video compression solutions for the target user market.
Maintenance & Review:
• To evaluate the suitability of the optimised video compression solutions for the target user market.
References
• Iain E G Richardson and Yafan Zhao, "Adaptive Algorithms for Variable-complexity Video Coding", Proc. ICIP01, September 2001.
• Y Zhao and I E G Richardson, "Computational Complexity Management of Motion Estimation in Video Encoders", IEEE Data Compression Conference (DCC02), Snowbird, Utah, April 2002.
• Hellstrom, G., Delvert, J., November 1996. Quality Measurement for Video Communication of Sign Language. [www] http://www.omnitor.se/textversion/english/qualityonvideo.html.
• ISO/IEC 14496-2, July 2001. Information Technology - Coding of Audio Visual Objects - Part 2: Visual (MPEG-4). Annex F.1, Automatic and Semi-Automatic Segmentations.
• O’Malley, C. et al, 1998. Fitness-for-Purpose of Videotelephony in Face-to-Face Situations, SINTEF/Infomatics Project Report. [www] http://www.sintef.no/units/informatics/projects/visavis/status.htm.
• More information:
  – http://www.rgu.ac.uk/eng/ict (from May 2002)