Multimedia communications applications

Multimedia Communications – Applications, Systems, and Methods
Ye-Kui Wang
Video/Image Transport & Systems, Nokia Research Center
A talk at University of Tampere, 14 November 2007

Public 1

© 2005 Nokia

2007-11-14 / YKW

Outline
• Multimedia communications applications
• Multimedia communications systems
• Video coding methods and standards
• Video transport methods and standards
• Summary
• Acknowledgement

Please feel free to interrupt for comments and questions at any time.


Multimedia communications applications
• VCD, DVD
• Digital TV
• Video conferencing and telephony
• Internet video
  • Video on demand
  • Peer-to-peer downloading
    • BitTorrent, eMule, eDonkey, …
  • P2P Internet TV
    • PPLive, PPstream, Joost, …
  • Video talk
    • MSN, QQ, Skype, …
  • Broadcast yourself
    • Youtube, Flickr, SmartVideo, Twango, …
• Mobile video
  • Camcorder, MMS, streaming, video telephony, mobile TV
• Convergence of mobility and Internet

Evolution of mobile video technology and applications

(Figure: mobile video technologies and applications shown: multicast/broadcast, video telephony, packet-switched video telephony, See What I See / real-time video sharing, circuit-switched video telephony, video streaming, MMS video, file playback, video recording.)

3G videophone

3G networks

PSTN


IP Datacast over DVB-H
• MPEG-2 over DVB-T: 24 Mbps in total; 4-5 Mbps per program; 3-5 TV programs for large screen
• IP Datacast over DVB-H: 11 Mbps in total; 128-768 kbps per stream; 15-80 video streams for small screen
• In-building coverage, power saving, optimal capacity

Outline (current section: Multimedia communications systems)

Typical digital video system
(Figure: block diagram with input, processing and output stages.)
• Input capture: S-Video or composite in, NTSC/PAL video decoder
• Digital processor(s): real-time signal processing, system control and communications (compression, decompression, encryption, formatting, transmission)
• Output display: NTSC/PAL video encoder, S-Video or composite out; video palette, RGB out
• Video converters link the analogue formats and the digital formats

Introduction to 3GPP packet-switched streaming service (PSS)
• What is streaming
• 3GPP streaming system architecture
• PSS client architecture
• PSS protocol stack
• PSS processes
• Typical PSS session
• Standards involved in PSS

What is streaming - 1
(Figure: multimedia content creation tools feed multimedia streaming servers, which deliver content over a transmission network (HSCSD, GPRS, WCDMA) to the player in the user's terminal.)
Streaming = playback while downloading

What is streaming - 2
• A streaming system is a real-time system of the non-conversational type.
  • Real-time -> The playback of continuous media (e.g., audio and video) must occur in a synchronous fashion.
• A streaming system is different from a conversational application. The former has the following properties:
  • One-way data distribution (in the downlink direction)
  • Not highly delay sensitive (no high degree of interactivity; initial start-up latency allowed)
  • Typically off-line media encoding (pre-stored content)
  • Typical VCR user operations (PLAY, STOP, PAUSE, FAST-FORWARD, REWIND)

3GPP streaming system architecture
(Figure: streaming clients attach through GERAN (Gb interface) or UTRAN (Iu-ps interface) to the UMTS core network (SGSN, GGSN). The GGSN connects over the Gi interface to an IP network with content servers, content cache, user and terminal profiles, and portals.)

PSS client architecture
(Figure: functional blocks of a PSS client, with a region marked "Scope of PSS". Blocks shown: video decoder, image decoder, vector graphics decoder, text, timed text decoder, audio decoder, speech decoder, synthetic audio decoder, scene description, spatial layout, synchronisation, graphics display, sound output, user interface, session control, session establishment, capability exchange, terminal capabilities, packet based network interface, 3GPP L2.)

PSS protocol stack
• Video, audio, speech: payload formats over RTP over UDP over IP
• Capability exchange, scene description, presentation description, still images, bitmap graphics, vector graphics, text, timed text, synthetic audio: HTTP over TCP over IP
• Capability exchange, presentation description: RTSP over TCP or UDP over IP
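
To make the RTP layer of the stack a little more concrete, the following Python sketch parses the 12-byte fixed RTP header as laid out in RFC 3550. It is only an illustration: the function name `parse_rtp_header` and the sample packet are assumptions made here, and header extensions and CSRC lists are ignored.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550 layout)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, marker bit set, dynamic payload type 96.
sample = struct.pack("!BBHII", 0x80, 0x80 | 96, 1234, 90000, 0xDEADBEEF) + b"payload"
header = parse_rtp_header(sample)
```

The sequence number and timestamp fields are what a receiver would use for loss detection and for audio-video synchronisation.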

PSS processes
• Session establishment (the methods to obtain the initial session description, either from a browser or by directly entering the URL in the client UI)
  • SDP presentation description
  • SMIL (Synchronized Multimedia Integration Language) scene description
  • RTSP URL
• Capability exchange
  • Enables PSS servers to provide a wide range of devices with content suited to each device's characteristics and capabilities
  • Provides a smooth transition between different releases of PSS
• Session set-up and control
  • HTTP (HyperText Transfer Protocol) is used for reliable transport of discrete media
  • RTSP (Real-Time Streaming Protocol) is used, over reliable or unreliable transport, for session set-up and control of continuous media
  • SDP is used as the format of the presentation description required by RTSP

Typical PSS session
(Figure: message sequence between the UE, UTRAN/GERAN & CN, SGSN, WAP/Web server, WAP/Web/Presentation/RTSP server, and media server.)
• Get Web/WAP page with URI
• RTSP: DESCRIBE (or other optional way to get content description file)
• RTSP: SETUP
• Secondary PDP context activation request (QoS = Streaming): Note
• RTSP: PLAY
• IP/UDP/RTP content
• RTSP: TEARDOWN
• Secondary PDP context deactivation request: Note
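
The RTSP part of such a session can be sketched as a sequence of text requests. The Python snippet below only builds the request strings for DESCRIBE, SETUP, PLAY and TEARDOWN; it does not open a connection. The URL, track name, client ports and session identifier are hypothetical placeholders (in a real session the track URL comes from the SDP description and the session identifier from the server's SETUP response).

```python
from typing import Dict, List, Optional


def rtsp_request(method: str, url: str, cseq: int,
                 headers: Optional[Dict[str, str]] = None) -> str:
    """Format one RTSP/1.0 request: request line, CSeq, extra headers."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def session_requests(url: str, session_id: str) -> List[str]:
    """Requests of a minimal streaming session, in the order they are sent."""
    track = url + "/trackID=1"  # hypothetical control URL
    return [
        rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"}),
        rtsp_request("SETUP", track, 2,
                     {"Transport": "RTP/AVP;unicast;client_port=4588-4589"}),
        rtsp_request("PLAY", url, 3, {"Session": session_id, "Range": "npt=0-"}),
        rtsp_request("TEARDOWN", url, 4, {"Session": session_id}),
    ]


requests = session_requests("rtsp://example.com/clip.3gp", "12345")
```

Between PLAY and TEARDOWN the media itself flows as RTP over UDP, outside the RTSP connection.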

Standards involved in 3GPP streaming service - 1
• Media coding standards
  • H.263, MPEG-4 Visual, H.264/AVC
  • AMR, AMR-WB, AAC, AAC+, AMR-WB+, SP-MIDI
  • JPEG, GIF, PNG, SVG
  • XHTML, UTF-8, UCS-2, 3GPP timed text format
• File format standards
  • 3GPP FF, MPEG-4 FF, AVC FF, JFIF, DCF, Mobile DLS, Mobile XMF
• Session setup and control protocols
  • RTSP, SDP, HTTP, UDP, TCP, IP, URL, URI, MIME, SMIL
• Data transport protocols
  • RTP, RTCP, UDP, TCP, IP, RTP payload formats (many)
• DRM and security standards
  • DRM, DCF, AES, SRTP
• Other standards
  • GZIP

Standards involved in 3GPP streaming service - 2
• 3GPP
  • PSS (6 specs): Stage 1, General Description, 3GPP file format, Timed text format, 3GPP SMIL language profile, Protocols and codecs
  • AMR, AMR-WB, AAC+, AMR-WB+
• ITU-T
  • H.263, H.264/AVC, JPEG
• ISO/IEC
  • MPEG-4 Visual, H.264/AVC, AAC, JPEG, UCS-2, AVC FF, MPEG-4 FF
• IETF
  • RTP, RTCP, UDP, TCP, IP, RTSP, SDP, URL, URI, MIME, RTP payload formats (many), PNG, RTP/AVPF, RTCP-XR, GZIP, SRTP, RTP-RX
• W3C
  • HTTP, XHTML, SMIL, CC/PP, RDF, SVG
• Other organizations
  • UTF-8 (Unicode), GIF (CompuServe), JFIF (C-cube), UAProf (WAP), SP-MIDI (MMA), Mobile DLS (MMA), Mobile XMF (MMA), DRM (OMA), DCF (OMA), AES (NIST)

Outline (current section: Video coding methods and standards)

Video coding: motivation
(Figure: video capture device driver -> compression -> store / transmit -> decompression -> video display device driver.)
Without it… (30 frames/s, 4:2:0)
• D1 (720x480): storage (90 min.) 83.7 GBytes; transmission ~15.5 Mbytes/s (124.4 Mbits/s)
• CIF (352x288): storage (90 min.) 23.3 GBytes; transmission ~4.5 Mbytes/s (36.5 Mbits/s)
A movie won’t fit on a CD (800 MBytes) or a DVD (4.7 GBytes)
…and it can’t be streamed over ADSL (384 Kbits/s to 1.5 Mbits/s) or common Ethernet (10-100 Mbits/s)
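
The transmission figures for uncompressed video follow from simple arithmetic, sketched below in Python. The sketch assumes 8-bit samples, so that 4:2:0 sampling averages 1.5 bytes per pixel (one luma byte plus two chroma planes subsampled by two in each direction); the function name `raw_video_rate` is chosen here for illustration.

```python
def raw_video_rate(width: int, height: int, fps: float = 30.0,
                   bytes_per_pixel: float = 1.5) -> tuple:
    """Return (bytes per second, bits per second) for uncompressed video.

    The default of 1.5 bytes per pixel assumes 8-bit 4:2:0 sampling.
    """
    bytes_per_second = width * height * bytes_per_pixel * fps
    return bytes_per_second, bytes_per_second * 8


for name, (w, h) in {"D1": (720, 480), "CIF": (352, 288)}.items():
    rate_bytes, rate_bits = raw_video_rate(w, h)
    print(f"{name}: {rate_bytes / 1e6:.2f} Mbytes/s, {rate_bits / 1e6:.1f} Mbits/s")
```

Multiplying the byte rate by 5400 seconds gives the raw storage needed for 90 minutes; the result depends on the rounding and on whether decimal or binary gigabytes are used.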

Various standard resolutions
• D1 (full analog television resolution): 720 x 480 for NTSC (59.94 fields/sec), 720 x 576 for PAL (50 fields/sec)
• SIF (resolution VHS VCR is capable of): 352 x 240 for NTSC, 352 x 288 for PAL
• ATSC digital television (18 different resolutions/rates; the three most common are shown)
  • Standard definition (SDTV): 720 x 480 (60 frames or fields/sec)
  • High definition (HDTV): 1280 x 720 (60 frames or fields/sec), 1920 x 1080 (60 frames or fields/sec)
• 4CIF, CIF, QCIF (often used in video conferencing or for small screen applications; specified for various codecs, e.g. H.261)
  • 4CIF: 704 x 576 (30 frames/sec)
  • CIF: 352 x 288 (30 frames/sec)
  • QCIF: 176 x 144 (30 frames/sec)

• NTSC: National Television Standards Committee. TV format in North America, Japan and much of the world
• PAL: Phase Alternation Line. TV format for Europe (and more of the world)
• SECAM: Sequentiel Couleur Avec Memoire. TV format for France (and a couple others)
• ATSC: Advanced Television Systems Committee. Digital TV standards (including HDTV)
• D1: Standard digital videotape format. Often used to denote full standard TV resolution
• CIF: Common Intermediate Format

The video coding problem
(Figure: video encoder -> compressed bitstream 010011101010101010 (sent over wireless channel, via DVD etc.) -> video decoder.)
“Encode digitized video using as few bits as possible while acceptably maintaining the visual appearance”

Video coding: rate-distortion
Video coding aims to achieve the best rate-distortion performance, i.e. to raise the curve as much as possible. It would be ideal if the distortion were measured in terms of subjective quality.
(Figure: rate-distortion (R-D) curves for H.261, H.263 and H.264; vertical axis PSNR [dB] from 20 to 45, horizontal axis bitrate [kbits/s] from 32 to 256.)
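
PSNR, the objective quality measure on the vertical axis of the R-D curve, can be computed as in the following Python sketch. It assumes 8-bit samples (peak value 255) given as flat sequences of integers; the function name `psnr` is an illustrative choice.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], reconstructed: Sequence[int],
         peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the original and the decoded samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10 * math.log10(peak ** 2 / mse)


# Every sample off by one gives an MSE of 1.
example_db = psnr([100, 100, 100, 100], [101, 101, 101, 101])
```

A higher PSNR at the same bitrate means a better codec in this objective sense, although PSNR does not always track subjective quality.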

How do we achieve compression?
• By removing redundant information from the video sequence
• Types of redundancies in video sequences
  • Spatial redundancy
  • Perceptual redundancy
  • Statistical redundancy
  • Temporal redundancy
• Coding techniques (tools)
  • Transformation => Spatial redundancy
  • Quantization => Perceptual redundancy
  • Entropy coding => Statistical redundancy
  • Temporal prediction => Temporal redundancy

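Video coding: typical MC+DCT encoder
(Figure: block diagram; the dotted box shows the decoder.)
• The motion compensated prediction is subtracted from the input frame; the residual passes through DCT, quantization and entropy encoding to give the encoded residual (to channel)
• Inside the dotted decoder box: entropy decode, quantizer reconstruction and inverse DCT; adding the motion compensated prediction gives the approximated input frame (to display), which is kept in the frame buffer (delay) as the prior coded frame approximation
• Motion estimation feeds the motion compensated predictor and produces the motion vector and block mode data (to channel)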
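
The motion estimation block can be illustrated with a full-search block matcher using the sum of absolute differences (SAD) as cost. This is a minimal Python sketch under simplifying assumptions: frames are lists of rows of integer samples, the block size and search range are small illustrative defaults, and real encoders use faster search strategies and sub-pixel refinement.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, cx, cy, n=4, search=2):
    """Return (dx, dy, cost) of the best match for the block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best


# Synthetic 8x8 reference frame; the current frame is the same content
# shifted one pixel to the right.
ref_frame = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur_frame = [[ref_frame[y][x - 1] if x >= 1 else 0 for x in range(8)]
             for y in range(8)]
motion = full_search(cur_frame, ref_frame, 2, 2)
```

The resulting vector points from the current block to its best match in the prior frame; the encoder then transmits the vector plus the (ideally small) residual.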

Picture coding types
Types of prediction (figure: example picture sequence I B B P B B P)
• Intra (I) picture (a picture = a frame or a field)
  • Picture is coded based on spatial redundancy only
• Predicted (P) picture
  • Picture is coded using prediction from prior I or P picture(s)
• Bi-directionally predicted (B) picture
  • Picture is coded with bi-directional (forward and backward) prediction
  • Prediction based on I and P frames (not other B pictures)
  • Not a source of prediction for any other pictures
  • Since 2 reference pictures are needed to decode, more memory is needed
  • Pictures may be transmitted out of sequence to simplify decoding
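
The out-of-sequence transmission of B pictures can be sketched as a reordering from display order to coding order: each B picture is held back until the following I or P picture, which it needs as backward reference, has been sent. The Python sketch below uses picture labels such as "B1" purely for illustration and assumes the traditional scheme described above, where B pictures reference only the surrounding I and P pictures.

```python
from typing import List


def display_to_coding_order(display: List[str]) -> List[str]:
    """Reorder pictures so every B picture follows both of its references."""
    coded: List[str] = []
    pending_b: List[str] = []
    for picture in display:
        if picture.startswith("B"):
            # Hold the B picture until its backward reference is emitted.
            pending_b.append(picture)
        else:
            coded.append(picture)
            coded.extend(pending_b)
            pending_b = []
    # Trailing B pictures with no following I/P picture are appended as-is.
    coded.extend(pending_b)
    return coded


coding_order = display_to_coding_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"])
```

The decoder reverses this reordering before display, which is one reason B pictures add delay and memory.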

GOP, picture, slices and macroblocks
(Figure: hierarchy of coding units.)
• Video sequence -> group of pictures (GOP) -> picture -> slice -> macroblock -> block
• Blocks of a macroblock: Y blocks 1-4, Cb block 5, Cr block 6

Macroblock and blocks
• Each macroblock consists of four luminance blocks and two chrominance blocks
  • Each luminance or chrominance block relates to 8 pixels by 8 lines of Y, Cb or Cr (chroma format 4:2:0)
(Figure: a 16 x 16 Y area split into four 8 x 8 blocks numbered 1-4, plus one 8 x 8 Cb block (5) and one 8 x 8 Cr block (6).)
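
As a small worked example of the macroblock structure, the Python sketch below counts macroblocks and 8 x 8 blocks for a picture in 4:2:0 format. It assumes picture dimensions that are multiples of 16; the function name `macroblock_layout` is illustrative.

```python
def macroblock_layout(width: int, height: int) -> dict:
    """Count macroblocks, blocks and samples for a 4:2:0 picture."""
    if width % 16 or height % 16:
        raise ValueError("dimensions must be multiples of 16 in this sketch")
    macroblocks = (width // 16) * (height // 16)
    luma_blocks = 4 * macroblocks    # four 8x8 Y blocks per macroblock
    chroma_blocks = 2 * macroblocks  # one 8x8 Cb and one 8x8 Cr block
    return {
        "macroblocks": macroblocks,
        "luma_blocks": luma_blocks,
        "chroma_blocks": chroma_blocks,
        "samples": (luma_blocks + chroma_blocks) * 64,
    }


cif = macroblock_layout(352, 288)
qcif = macroblock_layout(176, 144)
```

The sample count equals width x height x 1.5, which is the 4:2:0 factor used when estimating raw video bitrates.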

Brief history of video coding standards
(Figure: compression efficiency versus year, 1990 to 2005, showing H.261, MPEG-1, MPEG-2, H.263, H.263++, MPEG-4, H.264 and SVC/MVC.)
SVC was finalized in October 2007 as an extension of the H.264/AVC video coding standard.

H.264/AVC encoder block diagram
(Figure: block diagram with numbered callouts 1-7. Blocks shown: video source, coding control, intra prediction, intra/inter selection, transform, quantization, quantized transform coefficients, inverse quantization, inverse transform, de-blocking filter, frame store, motion estimation, motion compensation, predicted frame, motion vectors, entropy coding, bit stream out.)

Scalable video coding (SVC): scalability types
• Temporal scalability
• Spatial scalability
• SNR or quality or fidelity scalability
• Bit-depth scalability
• Chroma format scalability
• Region of Interest scalability
• Combined scalability

Temporal scalability Hierarchical B pictures typically used

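
One common way to arrange hierarchical B pictures is a dyadic hierarchy, in which each temporal layer doubles the frame rate. The Python sketch below assigns a temporal layer to each picture under that assumption (GOP size a power of two, pictures indexed in display order); the function names are illustrative and the slide does not prescribe this particular layout.

```python
from typing import Iterable, List


def temporal_id(index: int, gop_size: int = 8) -> int:
    """Temporal layer of a picture in a dyadic hierarchical GOP."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    if index % gop_size == 0:
        return 0  # key pictures form the base layer
    layer = gop_size.bit_length() - 1  # highest layer = log2(gop_size)
    while index % 2 == 0:
        index //= 2
        layer -= 1
    return layer


def extract_layers(indices: Iterable[int], max_layer: int,
                   gop_size: int = 8) -> List[int]:
    """Keep only pictures at or below max_layer (lower frame rate)."""
    return [i for i in indices if temporal_id(i, gop_size) <= max_layer]


layers = [temporal_id(i) for i in range(9)]
quarter_rate = extract_layers(range(9), max_layer=1)
```

Dropping the highest layer halves the frame rate without affecting the decodability of the remaining pictures, since lower layers never reference higher ones in such a hierarchy.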

Spatial scalability

• Use up-sampled base layer for prediction of enhancement layer


History of scalable video coding standardization
• MPEG-1 Visual, 1992:
  • Simple temporal scalability using traditional B pictures (bi-directional prediction, non-reference)
• MPEG-2 Video (a.k.a. H.262), 1994, and H.263+, 1998
  • Simple temporal scalability using traditional B pictures
  • Spatial scalability
  • SNR scalability
• MPEG-4 Visual, 1998
  • Simple temporal scalability using traditional B pictures
  • Spatial scalability
  • SNR scalability
  • Fine-granularity scalability (FGS)
• H.264/AVC, 2003
  • Advanced temporal scalability by encoding sub-sequence layers

The SVC standards - history, status and schedule
• SVC (H.264 Annex G, MPEG-4 SVC)
  • Jul. 2002: MPEG started the exploration and the collection of requirements
  • Apr. 2004: MPEG call for proposals
    • 9 wavelet-based and 5 AVC-based responses
  • Oct. 2004: AVC-based proposal adopted as starting point
  • Jan. 2005: project moved to JVT, and first WD (JD-1) out
  • Jan. 2006: CD
  • Jul. 2006: FCD
  • Jul. 2007: Phase 1 frozen, to be approved by both ITU-T and MPEG within 2007
• SVC file format
  • First draft Apr. 2005
  • Planned progress: one meeting cycle after SVC, to be frozen in Jan. 2008
• SVC RTP payload format
  • First draft Oct. 2005
  • WG item Jul. 2006
  • Plan to have last call Nov. 2007, RFC expected mid-2008

Multiview video coding • Coding video sequences captured by multiple cameras from the same scene


Example: 3DTV
(Figure: VIEW-1, VIEW-2, VIEW-3, … VIEW-N -> multi-view video encoder -> channel -> multi-view video decoder -> displays: TV/HDTV, stereo system, multi-view 3DTV.)

A typical MVC coding structure


Outline (current section: Video transport methods and standards)

Video transport standards
• File format standards
  • Only useful for video transport in streaming (incl. MBMS) applications
  • Provision of info for timing, packetization, adaptation, remote control, etc.
  • ISO base media FF, 3GPP FF, MPEG-4 FF, AVC FF, DCF, AVS-M FF …
    • All the other FFs listed are derived from the ISO FF
• IETF standards
  • RTP and RTCP
    • Provision of real-time transport, timing, A-V sync, adaptation, etc.
  • RTSP, SDP, SIP
    • Provision of mechanisms for session setup and control, option negotiation, etc.
  • RTP payload format
    • Tells how to transport each media type using RTP
    • RTP payload formats for H.263, H.263+, MPEG-4 Visual, H.264/AVC
  • Standards recently developed (or still under development)
    • RTP/AVPF, RTCP-XR, SRTP, RTP-RX, FLUTE, FEC

Video communication system and transmission errors
(Figure: original video -> video source encoding -> packetizing and channel coding -> network -> depacketizing and channel decoding -> video source decoding -> reconstructed video.)
• Bit error
  • Wired: fading, noise
  • Wireless: attenuation, shadowing, fading, interference, noise
• Packet loss
  • Network congestion
  • Long delay
  • Bit error

Video error propagation
• Spatial error propagation, due to
  • variable length coding
  • intra prediction
• Temporal error propagation, due to
  • inter prediction
• In scalable video coding, inter-layer error propagation, due to
  • inter-layer prediction
(Figure: the current macroblock with its intra prediction sources; several layers of IDR and P picture sequences linked by inter prediction and inter-layer prediction.)

Types of error resilience tools in real-time multimedia communications
• Forward error control
  • Insertion of data that is redundant in an error-free environment
  • Redundant data helps in concealing or correcting potential transmission errors
  • Types: generic and content-aware
  • Examples of content-aware forward error control in video coding:
    • Slices
    • Loss-aware macroblock mode selection / adaptive intra macroblock refresh
    • Redundant slices / pictures
• Error concealment by post-processing
  • Predicting the content of lost or corrupted data based on temporally and spatially adjacent correctly decoded data
• Interactive error correction and concealment
  • Feedback signal from a receiving terminal
  • Error correction: retransmission of lost or corrupted signal
  • Error concealment: avoid the usage of lost or corrupted part in coding
  • Can happen in various layers in the transmission stack, e.g.
    • RLP/RLC (link layer) retransmission in the Ack mode
    • RTP layer retransmission

Standard codec-level error resilience tools in H.264/AVC
• H.264/AVC:
  • Tools supported by the old standards
    • Intra picture/slice/macroblock coding
    • Slicing
    • Reference picture selection
    • Scalable coding (temporal only, full scalability under development)
    • Reference picture identification
    • Data partitioning
  • Parameter sets
  • Flexible macroblock order
  • Gradual decoding refresh
  • Redundant pictures
  • Scene information signaling
  • SP/SI pictures
  • Constrained intra prediction

Non-standard video-codec-level error resilience tools
• Error detection
• Error concealment
• Error tracking
• Multiple-description coding

Transport-level error resilience tools
• Forward error correction (FEC)
• Retransmission
• Prioritized transport and unequal error protection
• Error detection
  • Sequence numbering (for packet loss detection)
  • FEC and/or cyclic redundancy check (CRC)
• Robust packetization
• Robust scheduling
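
A very simple form of transport-level FEC is a single XOR parity packet computed over a group of media packets, which lets the receiver rebuild any one lost packet of the group. The Python sketch below illustrates the idea only; it assumes equal-length packets and is not the packet format of any particular RTP FEC specification.

```python
from typing import List, Optional, Tuple


def xor_parity(packets: List[bytes]) -> bytes:
    """Parity packet: byte-wise XOR of all packets in the group."""
    parity = bytearray(max(len(p) for p in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: List[Optional[bytes]],
                    parity: bytes) -> Tuple[int, bytes]:
    """Rebuild the single lost packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one loss per group")
    rebuilt = bytearray(parity)
    for packet in received:
        if packet is not None:
            for i, byte in enumerate(packet):
                rebuilt[i] ^= byte
    return missing[0], bytes(rebuilt)


group = [b"abcd", b"efgh", b"ijkl"]
parity_packet = xor_parity(group)
# The second packet is lost in transit; sequence numbering reveals the gap.
lost_index, repaired = recover_missing([b"abcd", None, b"ijkl"], parity_packet)
```

The cost is one extra packet per group; larger groups reduce overhead but can repair only one loss each, which is where unequal error protection comes in.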

Summary
• Multimedia communications applications
• Multimedia communications systems
• Video coding methods and standards
• Video transport methods and standards

Acknowledgements
• I am grateful to the following people who contributed material for the slides (listed in alphabetical order):
  • Imed Bouazizi
  • Kemal Ugur
  • Minhua Zhou
  • Miska Hannuksela
  • Stephan Wenger
  • Ying Chen

Thanks for your attention! Questions & Comments?
