Multimedia Communications – Applications, Systems, and Methods
Ye-Kui Wang
Video/Image Transport & Systems, Nokia Research Center
A talk at University of Tampere, 14 November 2007
Public 1
© 2005 Nokia
2007-11-14 / YKW
Outline • Multimedia communications applications • Multimedia communications systems • Video coding methods and standards • Video transport methods and standards • Summary • Acknowledgement
Please feel free to interrupt for comments and questions at any time.
Multimedia communications applications
• VCD, DVD
• Digital TV
• Video conferencing and telephony
• Internet video
  • Video on demand
  • Peer-to-peer downloading: BitTorrent, eMule, eDonkey, …
  • P2P Internet TV: PPLive, PPStream, Joost, …
  • Video talk: MSN, QQ, Skype, …
  • Broadcast yourself: YouTube, Flickr, SmartVideo, Twango, …
• Mobile video
  • Camcorder, MMS, streaming, video telephony, mobile TV
• Convergence of mobility and Internet
Evolution of mobile video technology and applications
[Figure: evolution of mobile video applications. Labels: video recording, file playback, MMS video, video streaming, video telephony (circuit-switched and packet-switched), See What I See / real-time video sharing, multicast/broadcast]
3G videophone
3G networks
PSTN
IP datacasting over DVB-H
• MPEG-2 over DVB-T: 24 Mbps, 3-5 TV programs for large screen at 4-5 Mbps each
• IP Datacast over DVB-H: 11 Mbps, 15-80 video streams for small screen at 128-768 kbps each
  • In-building coverage
  • Power saving
  • Optimal capacity
Outline • Multimedia communications applications • Multimedia communications systems • Video coding methods and standards • Video transport methods and standards • Summary • Acknowledgement
Typical digital video system
[Figure: block diagram with three stages]
• Input capture: S-Video or composite in, NTSC/PAL video decoder
• Digital processor(s): real-time signal processing, system control and communications (compression, decompression, encryption, formatting, transmission)
• Output display: NTSC/PAL video encoder, S-Video or composite out; video palette, RGB out
• Video converters sit between the analogue formats and the digital formats
Introduction to 3GPP packet-switched streaming service (PSS) • What is streaming • 3GPP streaming system architecture • PSS client architecture • PSS protocol stack • PSS processes • Typical PSS session • Standards involved in PSS
What is streaming - 1
[Figure: multimedia content creation tools -> multimedia streaming servers -> transmission network (HSCSD, GPRS, WCDMA) -> player in the user's terminal]
Streaming = playback while downloading
What is streaming - 2
• A streaming system is a real-time system of the non-conversational type.
• Real-time -> The playback of continuous media (e.g., audio and video) must occur in a synchronous fashion.
• A streaming system is different from a conversational application. The former has the following properties:
  • One-way data distribution (in downlink direction)
  • Not highly delay sensitive (no high degree of interactivity; initial start-up latency allowed)
  • Typically off-line media encoding (pre-stored content)
  • Typical VCR user operations (PLAY, STOP, PAUSE, FAST-FORWARD, REWIND)
3GPP streaming system architecture
[Figure: network diagram. Labels: streaming clients, GERAN (Gb), UTRAN (Iu-ps), UMTS core network (SGSN, GGSN), Gi, IP network, content servers, content cache, user and terminal profiles, portals]
PSS client architecture
[Figure: functional blocks of a PSS client, with a box marking the scope of PSS. Labels: video decoder, image decoder, vector graphics decoder, text, timed text decoder, audio decoder, speech decoder, synthetic audio decoder, scene description, synchronisation, spatial layout, graphics display, sound output, session control, session establishment, capability exchange, terminal capabilities, user interface, packet based network interface, 3GPP L2]
PSS protocol stack
[Figure: protocol stack, all over IP]
• Video, audio, speech: payload formats over RTP over UDP
• Capability exchange, scene description, presentation description, still images, bitmap graphics, vector graphics, text, timed text, synthetic audio: HTTP over TCP
• Capability exchange, presentation description: RTSP over UDP or TCP
PSS processes
• Session establishment (the methods to obtain the initial session description, from a browser or by directly entering the URL in the client UI)
  • SDP presentation description
  • SMIL (Synchronized Multimedia Integration Language) scene description
  • RTSP URL
• Capability exchange
  • Enables PSS servers to provide a wide range of devices with content suited to each device's characteristics and capabilities
  • Provides a smooth transition between different releases of PSS
• Session set-up and control
  • HTTP (HyperText Transfer Protocol) is used for reliable transport of discrete media
  • RTSP (Real-Time Streaming Protocol), carried over reliable or unreliable transport, is used for session set-up and control of continuous media
  • SDP is used as the format of the presentation description required by RTSP
Typical PSS session
[Figure: message sequence chart between UE, UTRAN/GERAN & CN, SGSN, WAP/Web server, WAP/Web/Presentation/RTSP server, and media server]
Messages, in order:
• Get Web/WAP page with URI
• RTSP: DESCRIBE (or other optional way to get content description file)
• RTSP: SETUP
• Secondary PDP context activation request (QoS = Streaming): Note
• RTSP: PLAY
• IP/UDP/RTP content
• RTSP: TEARDOWN
• Secondary PDP context deactivation request: Note
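The RTSP part of the session above can be sketched as plain-text requests. This is a minimal illustration, assuming RTSP/1.0 request syntax; the URL, track name, client ports and session identifier are hypothetical, and the secondary PDP context signalling is not modelled.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format an RTSP/1.0 request: request line, CSeq, then extra headers."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

# Hypothetical URL, ports and session id, for illustration only.
URL = "rtsp://example.com/clip.3gp"
session = [
    # Fetch the SDP presentation description.
    rtsp_request("DESCRIBE", URL, 1, {"Accept": "application/sdp"}),
    # Set up RTP-over-UDP delivery for one media track.
    rtsp_request("SETUP", URL + "/trackID=1", 2,
                 {"Transport": "RTP/AVP;unicast;client_port=4588-4589"}),
    # Start playback from the beginning.
    rtsp_request("PLAY", URL, 3, {"Session": "12345678", "Range": "npt=0-"}),
    # End the session.
    rtsp_request("TEARDOWN", URL, 4, {"Session": "12345678"}),
]

for message in session:
    print(message)
```

In a real client the Session value comes from the server's SETUP response rather than being chosen by the client.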
Standards involved in 3GPP streaming service - 1
• Media coding standards
  • H.263, MPEG-4 Visual, H.264/AVC
  • AMR, AMR-WB, AAC, AAC+, AMR-WB+, SP-MIDI
  • JPEG, GIF, PNG, SVG
  • XHTML, UTF-8, UCS-2, 3GPP timed text format
• File format standards
  • 3GPP FF, MPEG-4 FF, AVC FF, JFIF, DCF, Mobile DLS, Mobile XMF
• Session setup and control protocols
  • RTSP, SDP, HTTP, UDP, TCP, IP, URL, URI, MIME, SMIL
• Data transport protocols
  • RTP, RTCP, UDP, TCP, IP, RTP payload formats (many)
• DRM and security standards
  • DRM, DCF, AES, SRTP
• Other standards
  • GZIP
Standards involved in 3GPP streaming service - 2
• 3GPP
  • PSS (6 specs): Stage 1, General Description, 3GPP file format, Timed text format, 3GPP SMIL language profile, Protocols and codecs
  • AMR, AMR-WB, AAC+, AMR-WB+
• ITU-T
  • H.263, H.264/AVC, JPEG
• ISO/IEC
  • MPEG-4 Visual, H.264/AVC, AAC, JPEG, UCS-2, AVC FF, MPEG-4 FF
• IETF
  • RTP, RTCP, UDP, TCP, IP, RTSP, SDP, URL, URI, MIME, RTP payload formats (many), PNG, RTP/AVPF, RTCP-XR, GZIP, SRTP, RTP-RX
• W3C
  • HTTP, XHTML, SMIL, CC/PP, RDF, SVG
• Other organizations
  • UTF-8 (Unicode), GIF (CompuServe), JFIF (C-Cube), UAProf (WAP), SP-MIDI (MMA), Mobile DLS (MMA), Mobile XMF (MMA), DRM (OMA), DCF (OMA), AES (NIST)
Outline • Multimedia communications applications • Multimedia communications systems • Video coding methods and standards • Video transport methods and standards • Summary • Acknowledgement
Video coding: motivation
[Figure: video capture device driver -> compression -> store/transmit -> decompression -> video display device driver]
Without it… (30 frames/s, 4:2:0)
Format          Storage (90 min.)   Transmission
D1 (720x480)    83.7 GBytes         ~15.5 Mbytes/s (124.4 Mbits/s)
CIF (352x288)   23.3 GBytes         ~4.5 Mbytes/s (36.5 Mbits/s)
A movie won’t fit on a CD (800 MBytes) or a DVD (4.7 GBytes) …and it can’t be streamed over ADSL (384 Kbits/s – 1.5 Mbits/s) or common Ethernet (10-100 Mbits/s)
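The uncompressed rates in the table can be reproduced with a few lines of arithmetic. The sketch below assumes 8-bit samples, so 4:2:0 sampling averages 1.5 bytes per pixel; the function name is illustrative. The 90-minute totals it prints land close to the table's storage figures, with the exact value depending on rounding and on the gigabyte convention used.

```python
def raw_video_rate(width, height, fps=30, bytes_per_pixel=1.5):
    """Return (bytes per second, bits per second) for uncompressed video.

    bytes_per_pixel=1.5 corresponds to 8-bit 4:2:0 sampling: one luma sample
    per pixel plus two chroma planes subsampled by 2 in each direction.
    """
    bytes_per_second = width * height * bytes_per_pixel * fps
    return bytes_per_second, bytes_per_second * 8

for name, (w, h) in {"D1": (720, 480), "CIF": (352, 288)}.items():
    rate_bytes, rate_bits = raw_video_rate(w, h)
    total_bytes = rate_bytes * 90 * 60  # a 90-minute movie
    print(f"{name}: {rate_bytes / 1e6:.1f} MB/s, {rate_bits / 1e6:.1f} Mbit/s, "
          f"{total_bytes / 1e9:.1f} GB for 90 min")
```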
Various standard resolutions
Format   Application(s)                      NTSC (59.94 fields/sec)   PAL (50 fields/sec)
D1       Full analog television resolution   720 x 480                 720 x 576
SIF      Resolution VHS VCR is capable of    352 x 240                 352 x 288
Digital television, ATSC: 18 different resolutions/rates (three most common are shown)
• Standard Definition (SDTV): 720 x 480 (60 frames or fields/sec)
• High Definition (HDTV): 1280 x 720 (60 frames or fields/sec), 1920 x 1080 (60 frames or fields/sec)
4CIF, CIF, QCIF: often used in video conferencing or for small screen applications (specified for various codecs, e.g. H.261)
• 4CIF: 704 x 576 (30 frames/sec)
• CIF: 352 x 288 (30 frames/sec)
• QCIF: 176 x 144 (30 frames/sec)
Abbreviations
• NTSC: National Television System Committee. TV format in North America, Japan and much of the world
• PAL: Phase Alternating Line. TV format for Europe (and more of the world)
• SECAM: Séquentiel couleur à mémoire. TV format for France (and a couple others)
• ATSC: Advanced Television Systems Committee. Digital TV standards (including HDTV)
• D1: Standard digital videotape format. Often used to denote full standard TV resolution
• CIF: Common Intermediate Format
The video coding problem
[Figure: video encoder -> compressed bitstream (010011101010101010), sent over wireless channel, via DVD etc. -> video decoder]
“Encode digitized video using as few bits as possible while acceptably maintaining the visual appearance”
Video coding: rate-distortion
[Figure: rate-distortion (R-D) curves for H.261, H.263 and H.264; x-axis: bitrate [kbits/s], 32 to 256; y-axis: PSNR [dB], 20 to 45]
Video coding aims to achieve the best rate-distortion performance, i.e. to raise the curve as much as possible. It would be ideal if the distortion were measured in subjective quality.
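The PSNR axis of the R-D curve is an objective stand-in for subjective quality. A minimal sketch of the usual definition follows, assuming 8-bit samples (peak value 255) and flat lists of sample values; the function name is illustrative.

```python
import math

def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized sequences of samples."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of samples")
    # Mean squared error between the original and the decoded samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10 * math.log10(peak ** 2 / mse)

print(psnr([10, 20, 30, 40], [11, 19, 30, 42]))
```

One point on an R-D curve is then the pair (bitrate of the encoded stream, PSNR of the decoded pictures) for one encoder setting.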
How do we achieve compression?
By removing redundant information from the video sequence
Types of redundancies in video sequences
• Spatial redundancy
• Perceptual redundancy
• Statistical redundancy
• Temporal redundancy
Coding techniques (tools)
• Transformation => Spatial redundancy
• Quantization => Perceptual redundancy
• Entropy coding => Statistical redundancy
• Temporal prediction => Temporal redundancy
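The first two tools in the list can be illustrated on a single row of eight samples. This is a sketch only, assuming an orthonormal 1-D DCT-II and a uniform quantizer with an arbitrarily chosen step of 16; real codecs use 2-D transforms and standard-specific quantizers.

```python
import math

N = 8  # one row of an 8x8 block

def dct_1d(samples):
    """Orthonormal DCT-II of N samples."""
    coeffs = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        total = sum(s * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                    for n, s in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs

def quantize(coeffs, step):
    """Uniform quantization: the lossy step that discards fine detail."""
    return [round(c / step) for c in coeffs]

flat_row = [100] * N                         # no spatial detail at all
ramp_row = [100 + 2 * n for n in range(N)]   # smooth gradient

flat_levels = quantize(dct_1d(flat_row), step=16)
ramp_levels = quantize(dct_1d(ramp_row), step=16)
print(flat_levels)  # energy concentrated in the first (DC) coefficient
print(ramp_levels)
```

The runs of zero levels that result are what entropy coding then represents compactly.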
Video coding: typical MC+DCT encoder
[Figure: block diagram of a motion-compensated DCT encoder; a dotted box shows the decoder. Labels: input frame; motion estimation; motion compensated predictor; motion compensated prediction; DCT, quantize, entropy encode; encoded residual (to channel); entropy decode, quantizer reconstruction, inverse DCT; approximated input frame (to display); frame buffer (delay); prior coded frame approximation; motion vector and block mode data (to channel)]
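The motion estimation block in the diagram can be sketched as an exhaustive block-matching search. This is an illustrative example, assuming the sum of absolute differences (SAD) as the matching cost, a 4x4 block and a search range of 2 samples; the frames are synthetic and all names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a shifted reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))

def motion_search(cur, ref, bx, by, size=4, search=2):
    """Full search: return (dx, dy, cost) of the best match within +/- search samples."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

# Synthetic 8x8 reference frame with a non-repeating pattern.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
# Current frame: reference content shifted by 2 samples in x and 1 in y.
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]

vector = motion_search(cur, ref, bx=2, by=2)
print(vector)
```

The encoder would send the vector as motion data and code only the (here zero) residual.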
Picture coding types
[Figure: types of prediction in the picture sequence I B B P B B P]
• Intra (I) picture (a picture = a frame or a field)
  • Picture is coded based on spatial redundancy only
• Predicted (P) picture
  • Picture is coded using prediction from prior I or P picture(s)
• Bi-directionally predicted (B) picture
  • Picture is coded with bi-directional (forward and backward) prediction
  • Prediction based on I and P frames (not other B pictures)
  • Not a source of prediction for any other pictures
  • Since 2 ref. pictures are needed to decode, more memory is needed
  • Pictures may be transmitted out of sequence to simplify decoding
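The last bullet, out-of-sequence transmission, can be made concrete. The sketch below assumes the traditional B pictures described on this slide (each B picture references the nearest I or P picture on either side) and labels pictures with their type and display index; the function name is illustrative.

```python
def display_to_coding_order(display):
    """Reorder pictures so each B picture follows both of its reference pictures."""
    coding, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            # Hold B pictures back until their future reference has been sent.
            pending_b.append(pic)
        else:
            coding.append(pic)
            coding.extend(pending_b)
            pending_b = []
    coding.extend(pending_b)  # trailing B pictures with no later reference
    return coding

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_coding_order(display))
```

The decoder therefore has both references in memory before any B picture arrives, at the cost of reordering delay.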
GOP, picture, slices and macroblocks
[Figure: hierarchy of a coded video sequence: video sequence, group of pictures (GOP, delimited by I pictures), picture, slice, macroblock, block; a macroblock holds blocks 1-4 (Y), 5 (Cb) and 6 (Cr)]
Macroblock and blocks
Each macroblock consists of four luminance blocks and two chrominance blocks
• Each luminance or chrominance block relates to 8 pixels by 8 lines of Y, Cb or Cr (chroma format 4:2:0)
[Figure: a 16 x 16 luminance area Y split into 8 x 8 blocks 1-4, plus one 8 x 8 Cb block (5) and one 8 x 8 Cr block (6)]
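As a quick check of the macroblock structure, the following sketch counts macroblocks per picture and samples per macroblock. It assumes picture dimensions that are multiples of 16, which holds for CIF and QCIF; the names are illustrative.

```python
MB_SIZE = 16     # luma samples per macroblock side
BLOCK_SIZE = 8   # samples per block side

def macroblock_grid(width, height):
    """Return (columns, rows, total) of macroblocks for a picture size."""
    if width % MB_SIZE or height % MB_SIZE:
        raise ValueError("picture size must be a multiple of 16 in this sketch")
    cols, rows = width // MB_SIZE, height // MB_SIZE
    return cols, rows, cols * rows

def samples_per_macroblock(luma_blocks=4, chroma_blocks=2):
    """Samples in one 4:2:0 macroblock: four Y blocks plus one Cb and one Cr block."""
    return (luma_blocks + chroma_blocks) * BLOCK_SIZE * BLOCK_SIZE

print(macroblock_grid(352, 288))   # CIF
print(macroblock_grid(176, 144))   # QCIF
print(samples_per_macroblock())
```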
Brief history of video coding standards
[Figure: compression efficiency versus year, 1990 to 2005. Standards shown: H.261, MPEG-1, MPEG-2, H.263, H.263++, MPEG-4, H.264, SVC/MVC]
SVC was finalized in October 2007 as an extension of the H.264/AVC video coding standard
H.264/AVC encoder block diagram
[Figure: encoder block diagram with numbered callouts 1-7. Labels: video source, coding control, intra prediction, intra/inter switch, transform, quantization, quantized transform coefficients, inverse quantization, inverse transform, de-blocking filter, frame store, motion estimation, motion compensation, predicted frame, motion vectors, entropy coding, bit stream out]
Scalable video coding (SVC): scalability types • Temporal scalability • Spatial scalability • SNR or quality or fidelity scalability • Bit-depth scalability • Chroma format scalability • Region of Interest scalability • Combined scalability
Temporal scalability Hierarchical B pictures typically used
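One way to see how hierarchical B pictures give temporal scalability is to assign each picture a temporal level and drop the highest levels. The sketch below assumes a dyadic hierarchy with a GOP size of 8, which is a common configuration but is not stated on the slide; the function name is illustrative.

```python
def temporal_level(poc, gop_size=8):
    """Temporal level of a picture (by display index) in a dyadic hierarchical-B GOP."""
    if gop_size <= 0 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    max_level = gop_size.bit_length() - 1
    pos = poc % gop_size
    if pos == 0:
        return 0  # key picture at a GOP boundary
    trailing_zeros = (pos & -pos).bit_length() - 1
    return max_level - trailing_zeros

levels = [temporal_level(i) for i in range(9)]
print(levels)

# Dropping every picture above level 1 leaves a quarter of the frame rate.
kept = [i for i in range(9) if temporal_level(i) <= 1]
print(kept)
```

Because pictures only reference pictures of the same or a lower level, the remaining pictures stay decodable after the drop.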
Spatial scalability
• Use up-sampled base layer for prediction of enhancement layer
History of scalable video coding standardization
• MPEG-1 Visual, 1992
  • Simple temporal scalability using traditional B pictures (bi-directional prediction, non-reference)
• MPEG-2 Video (a.k.a. H.262), 1994, and H.263+, 1998
  • Simple temporal scalability using traditional B pictures
  • Spatial scalability
  • SNR scalability
• MPEG-4 Visual, 1998
  • Simple temporal scalability using traditional B pictures
  • Spatial scalability
  • SNR scalability
  • Fine-granularity scalability (FGS)
• H.264/AVC, 2003
  • Advanced temporal scalability by encoding sub-sequence layers
The SVC standards - history, status and schedule
• SVC (H.264 Annex G, MPEG-4 SVC)
  • Jul. 2002: MPEG started the exploration and collecting requirements
  • Apr. 2004: MPEG call for proposals
    • 9 wavelet based and 5 AVC based responses
  • Oct. 2004: AVC-based proposal adopted as starting point
  • Jan. 2005: project moved to JVT, and first WD (JD-1) out
  • Jan. 2006: CD
  • Jul. 2006: FCD
  • Jul. 2007: Phase 1 frozen, to be approved by both ITU-T and MPEG within 2007
• SVC file format
  • First draft Apr. 2005
  • Planned progress: one meeting cycle after SVC, to be frozen in Jan. 2008
• SVC RTP payload format
  • First draft Oct. 2005
  • WG item Jul. 2006
  • Plan to have last call Nov. 2007, RFC expected mid-2008
Multiview video coding • Coding video sequences captured by multiple cameras from the same scene
Example: 3DTV
[Figure: VIEW-1, VIEW-2, VIEW-3, ..., VIEW-N -> multi-view video encoder -> channel -> multi-view video decoder -> TV/HDTV, stereo system, multi-view 3DTV]
A typical MVC coding structure
Outline • Multimedia communications applications • Multimedia communications systems • Video coding methods and standards • Video transport methods and standards • Summary • Acknowledgement
Video transport standards
• File format standards
  • Only useful for video transport in streaming (incl. MBMS) applications
  • Provision of info for timing, packetization, adaptation, remote control, etc.
  • ISO base media FF, 3GPP FF, MPEG-4 FF, AVC FF, DCF, AVS-M FF …
  • All the other FFs listed are derived from the ISO FF
• IETF standards
  • RTP and RTCP
    • Provision of real-time transport, timing, A-V sync, adaptation, etc.
  • RTSP, SDP, SIP
    • Provision of mechanisms for session setup and control, option negotiation, etc.
  • RTP payload format
    • Tells how to transport each media type using RTP
    • RTP payload formats for H.263, H.263+, MPEG-4 Visual, H.264/AVC
  • Standards recently developed (or still under development)
    • RTP/AVPF, RTCP-XR, SRTP, RTP-RX, FLUTE, FEC
Video communication system and transmission errors
[Figure: original video -> video source encoding -> packetizing and channel coding -> network -> depacketizing and channel decoding -> video source decoding -> reconstructed video]
• Bit error
  • Wired: fading, noise
  • Wireless: attenuation, shadowing, fading, interference, noise
• Packet loss
  • Network congestion
  • Long delay
  • Bit error
Video error propagation
• Spatial error propagation, due to
  • variable length coding
  • intra prediction
• Temporal error propagation, due to
  • inter prediction
• In scalable video coding, inter-layer error propagation, due to
  • inter-layer prediction
[Figure: intra prediction sources around the current macroblock; several layers of picture sequences (IDR P P … P IDR …) with arrows for inter prediction and inter-layer prediction]
Types of error resilience tools in real-time multimedia communications
• Forward error control
  • Insertion of data that is redundant in an error-free environment
  • Redundant data helps to conceal or correct potential transmission errors
  • Types: generic and content-aware
  • Examples of content-aware forward error control in video coding:
    • Slices
    • Loss-aware macroblock mode selection / adaptive intra macroblock refresh
    • Redundant slices / pictures
• Error concealment by post-processing
  • Predicting the content of lost or corrupted data based on temporally and spatially adjacent correctly decoded data
• Interactive error correction and concealment
  • Feedback signal from a receiving terminal
  • Error correction: retransmission of lost or corrupted signal
  • Error concealment: avoid the usage of lost or corrupted part in coding
  • Can happen in various layers in the transmission stack, e.g.
    • RLP/RLC (link layer) retransmission in the Ack mode
    • RTP layer retransmission
Standard codec-level error resilience tools in H.264/AVC
• H.264/AVC:
  • Tools supported by the old standards
    • Intra picture/slice/macroblock coding
    • Slicing
    • Reference picture selection
    • Scalable coding (temporal only, full scalability under development)
    • Reference picture identification
    • Data partitioning
  • Parameter sets
  • Flexible macroblock order
  • Gradual decoding refresh
  • Redundant pictures
  • Scene information signaling
  • SP/SI pictures
  • Constrained intra prediction
Non-standard video-codec-level error resilience tools
• Error detection
• Error concealment
• Error tracking
• Multiple-description coding
Transport-level error resilience tools
• Forward error correction (FEC)
• Retransmission
• Prioritized transport and unequal error protection
• Error detection
  • Sequence numbering (for packet loss detection)
  • FEC and/or cyclic redundancy check (CRC)
• Robust packetization
• Robust scheduling
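Two of the transport-level tools above can be sketched in a few lines: loss detection from sequence numbers, and recovery of one lost packet with a parity packet. The example assumes 16-bit sequence numbers that wrap around (as in RTP), in-order arrival, equally sized packets, and a simple XOR parity over a group, which is only one of many possible FEC schemes; the names are illustrative.

```python
def lost_sequence_numbers(received, modulo=1 << 16):
    """Sequence numbers missing between consecutive received packets.

    Assumes packets arrive in order; numbers wrap around at `modulo`.
    """
    lost = []
    for prev, cur in zip(received, received[1:]):
        gap = (cur - prev) % modulo
        lost.extend((prev + i) % modulo for i in range(1, gap))
    return lost

def xor_parity(packets):
    """Byte-wise XOR of equally sized packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)

# Sender side: three media packets protected by one parity packet.
media = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(media)

# Receiver side: the second packet is lost; XOR of the rest restores it.
recovered = xor_parity([media[0], media[2], parity])
print(lost_sequence_numbers([65534, 65535, 1, 2]), recovered)
```

A single parity packet repairs at most one loss per group; heavier loss needs stronger codes or retransmission.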
Summary • Multimedia communications applications • Multimedia communications systems • Video coding methods and standards • Video transport methods and standards
Acknowledgements • I am grateful to the following people who contributed material for the slides (listed in alphabetical order): • Imed Bouazizi • Kemal Ugur • Minhua Zhou • Miska Hannuksela • Stephan Wenger • Ying Chen
Thanks for your attention! Questions & Comments?