
MULTICAST MULTILAYER VIDEOCONFERENCING: ENHANCEMENT OF A MULTILAYER CODEC AND IMPLEMENTATION OF THE RECEIVER DRIVEN LAYERED MULTICAST PROTOCOL

A Thesis by RALPH AKRAM GHOLMIEH

Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

December 1997

Major Subject: Electrical Engineering

Approved as to style and content by:

Pierce E. Cantrell (Chair of Committee) Jerry D. Gibson (Member)

Don R. Halverson (Member)

Udo W. Pooch (Member)

Chanan Singh (Head of Department) December 1997


ABSTRACT

Multicast Multilayer Videoconferencing: Enhancement of a Multilayer Codec and Implementation of the Receiver Driven Layered Multicast Protocol. (December 1997)

Ralph Akram Gholmieh, B.S., Saint-Joseph University at Beirut

Chair of Advisory Committee: Dr. Pierce E. Cantrell

Videoconferencing is an important topic on the current Internet. The heterogeneity of the network, its access-to-all approach, and the varying amount of available bandwidth present many challenges to researchers. Most available videoconferencing implementations send video and audio in a monolithic stream with an adaptive rate. In one-to-many sessions, the available end-to-end bandwidth between the source and a receiver may differ sharply from one end-user to another. One rate cannot satisfy all the online users, because it would force the source to use a least-common-denominator strategy. Earlier work by graduate students at the TAMU Multimedia and Networking Laboratory has focused on multi-layered encoding. The video is "multiplexed" onto several streams. Receivers with more bandwidth can request more streams for better picture quality.

In this research, CafeMocha (the videoconferencing tool developed by Sazzad and Brown at Texas A&M) is enhanced from a two-layer codec to a six-layer codec. Layer control methods are then tested using the improved codec. A simplified version of the Receiver-driven Layered Multicast protocol (RLM), proposed by McCanne at UC Berkeley, is implemented and tested. Later, RLM itself is implemented and tested in the one-to-one case. A new metric is defined whereby the subscription path of each layer control scheme, under various rate limits, is compared to a defined "ideal" layer subscription sample path. Performance is defined as the ratio of the cumulative bandwidth delivered in the actual case to the cumulative bandwidth delivered in the "ideal" case. RLM performance values of 99.6% were recorded when the overall received bit rate was nearly constant. The performance was still good, at 72.6%, when the ideal highest subscription layer was bursty. RLM was found to be a good control mechanism for moderately bursty layered streams. Possible improvements are suggested in the conclusion.


To my parents Issaaf and Badiaa

ACKNOWLEDGMENTS

I would like to take this opportunity to thank all the people who made this research possible. First and foremost, my thanks go to my advisor, Dr. Pierce Cantrell; working under his guidance has been a pleasant and rewarding experience. My thanks also go to the members of my committee, Dr. Jerry Gibson, Dr. Udo Pooch and Dr. Don Halverson.

This research has built upon and is heavily indebted to the previous research of current and former members of the TAMU Networking and Multimedia Laboratory. In particular, I would like to thank the original "CafeMocha team" formed of Tom Brown, Shari Sazzad and Charles Shroeder. I would also like to thank my colleague Sanku Jo for his help in developing and testing CafeMocha.

My stay at A&M would not have been the same without my friends: Deanna, Aashit, Raul, Scott, and Franck, thank you for the great times we had together.

To my parents Issaaf and Badiaa: your sacrifices, love and guidance will forever leave me in your debt. This degree is more your achievement than it is mine.

To my brothers, Aziz and Ghassan: the three musketeers will always follow their motto, "All for one, and one for all."

TABLE OF CONTENTS

CHAPTER I  INTRODUCTION
   A. Recent Developments
   B. Layered Coding
   C. Previous Related Research at TAMU
   D. Thesis Outline

CHAPTER II  OVERVIEW OF RELEVANT PROTOCOLS
   A. Real-Time Transport Protocol (RTP/RTCP)
      1. Real-Time Transport Protocol (RTP)
      2. Real-time Control Protocol (RTCP)
         a. Sender Report Block
         b. Receiver Report Block
         c. Goodbye RTCP Packets (BYE)
         d. Source Description RTCP Packets (SDES)
         e. Application-Defined RTCP Packets (APP)
   B. Pruning in Multicast Distribution
      1. DVMRP Routing Protocol
      2. Pruning

CHAPTER III  CAFEMOCHA
   A. The Encoding Mechanism
      1. Base Layer
      2. Enhancement Layer
      3. Encoder Performance
         a. Base Layer
         b. Pyramidal Layer
      4. Generalizing Assumption
   B. Split of the Base Layer
      1. Threshold Separation
      2. Use of a Bandwidth Limit on the Base Layer
   C. Color Enhancement
      1. YUV422-like Color Scheme
      2. YUV444-like Color Scheme
   D. Correlation Between Y-band Layers and their Corresponding Color Layers
   E. Processing of Incoming Packets at the Decoder
   F. The PARC Algorithm
   G. Stabilization of the Total Output Rate Using PARC

CHAPTER IV  CONTROL SCHEMES FOR THE LAYERS
   A. Overview
   B. RLM
      1. Description
      2. Protocol Details
   C. Basic One-to-One Control Scheme
   D. Metrics

CHAPTER V  RESULTS
   A. Testbed
   B. Layer Order
   C. Short Term Packet Loss Estimator
   D. Practical Calculation of the Metrics
   E. Basic One-to-One Scheme Results
      1. Graphical Data Analysis
      2. Performance Metrics
   F. RLM Results
      1. Graphical Data Analysis
      2. Performance Metrics
   G. Qualitative Description of the Perceived Image
      1. General
      2. Real-Time Layer Add/Drop

CHAPTER VI  CONCLUSION AND FUTURE WORK

REFERENCES

VITA

LIST OF TABLES

I     Average Uncompressed and Compressed Rates for the "Miss America" Sequence Using Various Distance Threshold Values [4]
II    Average Uncompressed and Compressed Rates for the "Salesman" Sequence Using Various Distance Threshold Values [4]
III   Average Temporal Bandwidth Use of the "When Harry Met Sally" Test Sequence for Different Quality Values at 3 frames/second
IV    Modified YUV 4:2:2 Encoding
V     Possible Layer Combinations in Scheme 1
VI    Modified YUV 4:4:4 Encoding
VII   Possible Layer Combinations in Scheme 2
VIII  Average Temporal Bandwidth Rates of the Test Sequence for the Base Layers at 3 frames/second Using Color Scheme 2
IX    Average Temporal Bandwidth Rates of the Test Sequence for Different Quality Values at 3 frames/second Using Color Scheme 2 for the Large Layer
X     Definition of Variables Showing in Fig. 31
XI    RLM Protocol Parameters
XII   RLM Protocol Variables and Update Equations
XIII  Variables and their Definitions
XIV   Values Chosen for the Parameters Controlling the Basic Control Scheme
XV    Performance of the Basic Control Scheme
XVI   Values Chosen for the Parameters Controlling RLM
XVII  Performance of RLM Protocol

LIST OF FIGURES

1   Example of a Heterogeneous Set of Connections
2   Layered Distribution of a Hierarchically Encoded Stream
3   RTP Data Header
4   RTCP Common Header
5   Sender Report RTCP Block
6   Receiver Report RTCP Block
7   Canonical Name
8   Application Specific RTCP Block
9   Sazzad's Pyramidal Encoding Scheme
10  Frame Capture, Compression, and Packetization for Pyramidal Coder [3]
11  Test Sequence, Pyramidal Layer Coded at Quality 0
12  Average Rate per Quality Value for the "When Harry Met Sally" Test Sequence at 3 frames/second
13  Flow Chart for Splitting Scheme Number 1
14  Algorithm for Adapting the Value of the Separation Threshold in Splitting Scheme Number 1
15  Test Sequence Coded Using Scheme Number 1
16  Flow Chart for Splitting Scheme Number 2
17  Example of Base Block Distribution over Two Consecutive Frames in Bandwidth Limit Scheme
18  Test Sequence Coded Using Scheme Number 2
19  Color Scheme 1
20  Upscaling Issues when Using the Same Pyramidal Coding Routine
21  Byte Order for The Pyramidal Layer Color Data
22  Color Scheme 2
23  Ratio of the Rate on the Small UV-band Layer over the Rate on the Small Y-band Layer during the "When Harry Met Sally" Test Sequence
24  Ratio of the Rate on the Medium UV-band Layer over the Rate on the Medium Y-band Layer during the "When Harry Met Sally" Test Sequence
25  Decoding of the Incoming Packets
26  PARC Algorithm Used for Congestion Avoidance
27  Total Bandwidth Used by the Y-band Layers (Small & Medium & Large) Controlled by the Enhanced Use of the PARC Algorithm under a Total Rate Limit of 200Kbps
28  Equations for the Enhanced PARC Control Process
29  Total Bandwidth Used by all Base Layers (Y-band Small & Medium, UV-band Small & Medium) and the Large Y-band Layer Controlled by the Enhanced Use of the PARC Algorithm under a Total Rate Limit of 300Kbps
30  Total Bandwidth Used by all Base Frame Layers (Y-band Small & Medium, UV-band Small & Medium)
31  RLM Receiver State Diagram [8]
32  Algorithm for the Basic One-to-One Layer Control Scheme
33  Testbed Topology
34  Ideal Subscription Levels under a Router Rate Limit of 500kbps - Testing the One-to-One Basic Control Scheme
35  Actual Receiver Subscription under the One-To-One Basic Control Scheme - Router Rate Limit of 500kbps
36  Ideal Bandwidth Distribution under a Router Rate Limit of 500kbps - Testing the One-to-One Basic Control Scheme
37  Observed Bandwidth Received under the One-To-One Basic Control Scheme - Router Rate Limit of 500kbps
38  Actual Packet Losses Recorded at the Receiver under the One-To-One Basic Control Scheme - Router Rate Limit of 500kbps
39  Ideal Subscription Levels under a Router Rate Limit of 310kbps - Testing the One-to-One Basic Control Scheme
40  Actual Receiver Subscription under the One-To-One Basic Control Scheme - Router Rate Limit of 310kbps
41
42
43
44  Ideal Subscription Levels under a Router Rate Limit of 200kbps - Testing the One-to-One Basic Control Scheme
45  Actual Receiver Subscription under the One-To-One Basic Control Scheme - Router Rate Limit of 200kbps
46
47
48
49  Ideal Subscription Levels under a Router Rate Limit of 50kbps - Testing the One-to-One Basic Control Scheme
50  Actual Receiver Subscription under the One-To-One Basic Control Scheme - Router Rate Limit of 50kbps
51
52  Observed Bandwidth Received under the One-To-One Basic Control Scheme - Router Rate Limit of 50kbps
53
54  Ideal Subscription Levels under a Router Rate Limit of 500kbps - Testing RLM
55  Actual Receiver Subscription under RLM - Router Rate Limit of 500kbps
56  Ideal Bandwidth Distribution under a Router Rate Limit of 500kbps - Testing RLM
57  Observed Bandwidth Received under RLM - Router Rate Limit of 500kbps
58  Actual Packet Losses Recorded at the Receiver under RLM - Router Rate Limit of 500kbps
59  Ideal Subscription Levels under a Router Rate Limit of 310kbps - Testing RLM
60  Actual Receiver Subscription under RLM - Router Rate Limit of 310kbps
61  Ideal Bandwidth Distribution under a Router Rate Limit of 310kbps - Testing RLM
62  Observed Bandwidth Received under RLM - Router Rate Limit of 310kbps
63  Actual Packet Losses Recorded at the Receiver under RLM - Router Rate Limit of 310kbps
64  Ideal Subscription Levels under a Router Rate Limit of 200kbps - Testing RLM
65  Actual Receiver Subscription under RLM - Router Rate Limit of 200kbps
66
67
68
69  Ideal Subscription Levels under a Router Rate Limit of 50kbps - Testing RLM
70  Actual Receiver Subscription under RLM - Router Rate Limit of 50kbps
71
72
73
74  Small Y-band Layer, 160x120 greyscale, Level 0
75  Small and Medium Y-band Layers, 160x120 greyscale
76  Small, Medium and Large Y-band Layers, 320x240 greyscale

CHAPTER I

INTRODUCTION

This last year has seen a multitude of new videoconferencing tools released for the Internet. Specifically, the users of the Multicast Backbone (MBONE) [1] can already communicate and broadcast real-time multimedia sessions, albeit at low rates of less than 200 Kbps. Regular unicast users, for their part, can already download encoded multimedia streams consisting of video and audio at very low rates. Vxtreme's Web Theater provides reasonable video and audio quality at rates as low as 56 Kbps [2]. During Spring 1997, a networking course at Stanford was put on-line and distributed via Vxtreme's Web Theater. As better applications are developed, distance learning via the Internet will no doubt become widely used, and live courses will undoubtedly be available on the Internet in the near future. Companies are actively researching video-on-demand applications as the next possible "killer application" of the World-Wide-Web.

Most of these applications try to reach the largest possible end-user population. Their high bandwidth requirement poses interesting challenges to researchers: What is the best way to distribute the multimedia streams? How can a source serve a multitude of end-users with varying connection bandwidth to each receiver? How can a source transmit to a multitude of receivers and at the same time make good use of the available bandwidth? This thesis addresses these problems in the case of real-time, one-to-many multicast videoconferencing sessions.

The journal model is IEEE Transactions on Automatic Control.

A. Recent Developments

Several important recent developments have made videoconferencing possible on the current Internet:

1. A multicast backbone has been deployed over the whole Internet. Typically, system managers have allocated around 500 Kbps to multicast streams on their incoming and outgoing links. Named the Internet Multicast Backbone or MBONE, it is an interconnected set of subnetworks and routers that support the delivery of IP multicast traffic. Since 1992, the MBONE has grown from 40 subnets in four different countries to more than 3000 subnets in over 25 countries [10].

2. Several protocols targeting real-time communication have been defined in Internet drafts. The most important are RTP [11], RTCP [11], and RSVP [12]. RTP is the Real-time Transport Protocol, and RTCP is the RTP Control Protocol, which is used for the control of RTP streams. RTP is a transport protocol usually placed above UDP in the protocol stack. Unlike TCP, RTP does not request a packet's retransmission in the case of delivery errors. This research uses RTP and RTCP as the transport layer for the videoconferencing tool under test. RSVP is the Resource Reservation Protocol; designed to manage bandwidth reservation demands, it is still rarely used.

3. Videoconferencing tools are now widely available. Most of these use a single multicast stream to send data to all the users. The most important are CU-SeeMe [13], nv [14], ivs [15], vat [16] and vic [17].

4. Workstations and desktop computers are becoming increasingly powerful. The new computing power is fast enough to handle real-time encoders/decoders in software.

B. Layered Coding

The monolithic transmission mentioned earlier adapts badly to the heterogeneous nature of the Internet. In a one-to-many conference, if the source sends more packets than a link or a router buffer can handle, random packets are dropped. The receiver on the other side of the congested link experiences packet loss and a deterioration of the perceived signal. Because redundancy is eliminated from multimedia streams, a loss rate of a few percent can result in an unacceptable loss of quality at the receiver [18]. If source-based rate control is the only control scheme used, the chosen rate has to adjust to the capacity of the smallest-bandwidth link. That is clearly unacceptable; a receiver on the same high-bandwidth subnet as the source should not be held to the bandwidth limitations of a receiver using an ISDN link to reach the Internet. An instantaneous large increase in the available bandwidth for each user would solve the problem, but that seems unlikely to happen. User demand for bandwidth continues to increase as more and more users access the Internet, and as applications continue to consume more and more bandwidth.

Consider for instance the network example in Fig. 1. If source-based congestion control is used in conjunction with a single-resolution coder, the source would have to transmit at a rate lower than 28.8 Kbps to satisfy receiver number one. Receiver R3 receives a 28.8 Kbps stream even though its link capacity to the source is 10 Mbps.

Shacham et al. [19] propose the separation of high and low bandwidth domains by "gateways" that would transform a high-bandwidth representation of a multimedia stream into a low-bandwidth representation. The conversion would have to be done "on-the-fly" because of buffer concerns and the nature of real-time streams.

[Figure 1 labels: Source on a 10 Mbps Ethernet; Router; T1 Link 1.45 Mbps; Router on a 10 Mbps Ethernet; receivers R1, R2 and R3; 28.8 Kbps modem connection; 128 Kbps ISDN line]

Fig. 1. Example of a Heterogeneous Set of Connections

Amir et al. [20] successfully implemented a video gateway that converts a 6 Mbps Motion JPEG video stream into a 128 Kbps H.261 stream. Seminars broadcast at 6 Mbps on the Bay Area Network (an ATM network) were retransmitted in the H.261 format to the rest of the MBONE users.

Cheung et al. [25] implement a simulcast scheme whereby several different versions of the same multimedia stream are simultaneously multicast. Each stream provides a different session quality level. The set of receivers subscribed to a particular stream can also "agree" to change its quality/bandwidth parameters within that stream's minimum and maximum bandwidth. The Destination Set Group (DSG) protocol that they developed is used by receivers to adapt to the available bandwidth. The intra-stream part of the protocol is used by receivers listening to the same stream to adjust the data rate of the stream within its prescribed limits. An inter-stream protocol is used by receivers to decide on changes to a higher or lower quality stream as their needs or bandwidth availability change.

Deering [21], on the other hand, proposes a realization of a multi-layered scheme where a source stripes the progressive layers of a hierarchically represented signal across multiple multicast groups. Receivers can then adapt to network heterogeneity by controlling their reception bandwidth through IP Multicast group membership.

In layered coding, a data stream is divided into several sub-streams $S_i$ ($1 \le i \le n$), where receiving the streams $S_1$ through $S_m$ ($m \le n$) gives an increasing rendering quality as $m$ increases. For example, a video stream can be enhanced by an increase in its size (width and height) or its depth (number of bits per pixel). To maintain efficiency, the streams should not be redundant. All receivers that wish to receive the broadcast must subscribe to at least the first layer (this is typically an RTP stream). In addition, the receiver subscribes to a control stream that allows it to make more intelligent decisions (this is typically an RTCP stream). Then, depending on the packet loss status, the receiver can join or drop layers. Consecutive multicast addresses are allocated to the multicast layers and to the control streams. For layer $n$, the corresponding address is $A + n$, the data port is $P + 2n$, and the control port is $P + 2n + 1$ [22], where $A$ and $P$ are the multicast group address and the port of the base data layer.

To emphasize this point, let us go back to the network example in Fig. 1. Suppose that a three-resolution video encoder is in use. Assume layer one has a bandwidth requirement of around 20 Kbps and delivers a minimum picture quality, and that the addition of layer two increases the picture's quality for a bandwidth increase of 80 Kbps. Finally, assume that layer three increases the picture resolution for an additional increase of 400 Kbps. Fig. 2 shows the ideal multicast distribution of the layers. Receiver R1 receives the maximum video quality, constructed by decoding layers one, two and three, with a total bandwidth requirement of 500 Kbps. Receiver R2 receives a moderate video quality by subscribing to both layers one and two for a total bandwidth of 100 Kbps. Receiver R3 can still follow the videoconferencing session through a stream appropriate to its low bandwidth capability at 20 Kbps.
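The addressing rule and the three-layer example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the base address 224.2.0.1 and the base port 5004 are hypothetical, the base layer is taken to be n = 0, and a receiver is assumed to subscribe to every layer whose cumulative rate fits its bottleneck capacity.

```python
import ipaddress


def layer_endpoints(base_addr, base_port, n):
    """Return (address, data port, control port) for layer n.

    Applies the A + n, P + 2n, P + 2n + 1 rule from the text,
    assuming the base data layer corresponds to n = 0.
    """
    addr = ipaddress.IPv4Address(base_addr) + n
    return str(addr), base_port + 2 * n, base_port + 2 * n + 1


def max_subscription(layer_rates_kbps, capacity_kbps):
    """Return (number of layers, cumulative rate) that fit in the capacity.

    Layers are cumulative: layer k is only useful if layers 1..k-1
    are also received, so the scan stops at the first layer that
    would exceed the bottleneck capacity.
    """
    total = 0
    level = 0
    for rate in layer_rates_kbps:
        if total + rate > capacity_kbps:
            break
        total += rate
        level += 1
    return level, total


if __name__ == "__main__":
    rates = [20, 80, 400]  # Kbps, the three layers of the example
    # Bottleneck capacities taken from the link labels of Fig. 1.
    for name, capacity in [("modem", 28.8), ("ISDN", 128), ("T1", 1450)]:
        print(name, max_subscription(rates, capacity))
    print(layer_endpoints("224.2.0.1", 5004, 2))
```

With these rates, a receiver behind the modem link fits only the 20 Kbps base layer, one behind the ISDN line fits two layers (100 Kbps), and one limited only by the T1 link fits all three (500 Kbps), which matches the totals quoted in the example.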

[Figure 2 labels: same connections as Fig. 1 (Source on a 10 Mbps Ethernet; Routers; T1 Link 1.45 Mbps; 128 Kbps ISDN line; 28.8 Kbps modem connection; receivers R1, R2 and R3), with a legend for Layer 1 (20 Kbps), Layer 2 (80 Kbps) and Layer 3 (400 Kbps)]

Fig. 2. Layered Distribution of a Hierarchically Encoded Stream

Layered coding transmission is only justified when multicast delivery trees are pruned. Pruning is the process by which multicast stream distribution trees are slowly "shrunk" to span only nodes that have subscribed receivers. This means that a multicast distribution tree for a specific multicast layer only spans the layer's subscribed users. The routing daemon "mrouted", versions 3.0 and above, implements multicast pruning. Implementation of the pruning of multicast delivery trees is detailed in the IETF Internet Draft for the Distance Vector Multicast Routing Protocol [23].

In its first version, IPv6 provided for a 4-bit priority field [24]. Use of a priority scheme can make multi-layered broadcasts more resilient to packet loss, as shown by Brown [3]. In contrast, McCanne et al. [8] argue that the use of priority schemes might encourage badly-behaved users to keep the network in a state of congestion by enabling them to obtain their optimal quality level while congesting a link. Note that the optimum subscription level is the same in both cases. At the last meeting of the IPng working group (IPv6 is also referred to as IP next generation), Steve Deering described and supported a proposal to change the meaning of the priority field. The low-order bit would mean that the packet is part of "interactive" traffic, for which delay is more important than throughput [30]. The significance of the other bits was to be defined later.

McCanne et al. [8] propose the Receiver-driven Layered Multicast (RLM) protocol, a non-reservation scheme based on active receiver experiments, whereby additional layers are periodically added in the absence of significant packet loss. Since the optimal operating point will normally be just below the congestion point of the link (maximum link utilization), join experiments might have a negative impact on the overall performance if they are repeated too often. An exponentially increasing delay is imposed between failed experiments to handle this problem. In the case of high packet losses that do not correspond to join experiments, the receiver drops layers periodically until network congestion ceases. McCanne et al. [8] stress that receivers should communicate and announce their join experiments. Otherwise, concurrent join experiments affecting the same link might mislead the receiver adding the lower layer. The protocol's efficiency would be negatively affected by receivers backing off erroneously.

RLM is not compatible with router packet priority forwarding schemes, since only the experimental layer will suffer significant packet loss if the subnet is fully loaded prior to the experiment. When congestion occurs, other receivers will not perceive packet loss at lower layers and thus will be unaware of the negative experimental outcome.
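The join-experiment idea described above (add a layer, keep it only if loss stays low, and wait exponentially longer after each failure) can be illustrated with the toy sketch below. It is not the RLM protocol of [8]: the class name, the single shared timer, the loss threshold, and all numeric defaults are assumptions chosen for illustration, and shared learning between receivers is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class JoinBackoffSketch:
    """Toy receiver state; the real protocol keeps more state than this."""

    level: int = 1                # layers currently subscribed (1 = base only)
    max_level: int = 6            # assumed number of layers in the session
    join_timer: float = 5.0       # seconds until the next join experiment (assumed)
    min_timer: float = 5.0        # assumed floor for the join timer
    max_timer: float = 600.0      # assumed ceiling for the join timer
    backoff: float = 2.0          # assumed multiplicative backoff factor
    loss_threshold: float = 0.25  # assumed loss rate that signals congestion

    def join_experiment(self, loss_rate: float) -> bool:
        """Try adding one layer; keep it only if the measured loss stays low."""
        if self.level >= self.max_level:
            return False
        if loss_rate > self.loss_threshold:
            # Failed experiment: stay at the current level and wait longer
            # before trying again (exponentially increasing delay).
            self.join_timer = min(self.join_timer * self.backoff, self.max_timer)
            return False
        # Successful experiment: keep the new layer and reset the delay.
        self.level += 1
        self.join_timer = self.min_timer
        return True

    def congestion_without_experiment(self, loss_rate: float) -> None:
        """Drop one layer when high loss is not explained by a join experiment."""
        if loss_rate > self.loss_threshold and self.level > 1:
            self.level -= 1
```

Resetting the timer to its floor after a success is a simplification made here; the point of the sketch is only that repeated failures at the same level become progressively rarer, which keeps the receiver close to, but mostly below, the congestion point.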

C. Previous Related Research at TAMU

Researchers in the TAMU Multimedia and Networking Lab have worked on layered data transmission since 1994. Their main research tool is CafeMocha [3]-[7], a videoconferencing tool. CafeMocha is a one-to-many implementation of a pyramidal encoder which divides video into two separate streams of information and distributes the streams using multicast channels. The base resolution video is coded using the CU-SeeMe compression algorithm [13] and sent on one multicast address. The large resolution video uses two multicast addresses, the medium channel plus an enhancement channel [3]. Thus, a user receiving layer one requires less bandwidth, for a lower picture quality, than a user receiving both layers one and two.

Brown [3] developed a quick recovery scheme in conjunction with source-based congestion avoidance techniques. In his research, the source adjusts a quantization parameter for the enhancement layer based upon RTCP packet loss reports. If low packet loss is reported, the source slowly increases quality. On the receiver side, the enhancement layer is dropped immediately at the onset of congestion. When receivers report that the enhancement layer was dropped due to congestion, the source decreases video quality, reducing the amount of data transmitted. Normally, the receiver waits up to several minutes before rejoining the enhancement layer in order to prevent recurrence of congestion. If the receiver notices that the source has reduced video quality, however, it may rejoin the enhancement layer in a matter of seconds. Quality information is communicated to the receivers in RTCP sender reports.

Schroeder [4] developed a rate-control algorithm for the enhancement layer. The Predictive/Adaptive Rate Control (PARC) algorithm he developed is given a target rate for the enhancement layer. This target rate is exponentially reduced when the receivers, by consensus, report high packet loss, and arithmetically increased in the opposite case.

D. Thesis Outline

To follow this research, readers should be familiar with the RTP protocol and with multicasting through DVMRP. Chapter II gives a basic overview of this indispensable networking infrastructure. Chapter III describes the changes made to the CafeMocha encoder developed by Sazzad [5]. The base layer is split in two, and color information is added in three new layers. In Chapter IV, layer control mechanisms are presented and explained. Chapter V presents the results of the simulations done using the layer-control mechanisms described in Chapter IV. A final chapter concludes this thesis with a summary of what was learned and suggestions for future work.

CHAPTER II

OVERVIEW OF RELEVANT PROTOCOLS

The object of this chapter is to introduce the reader to the control information available through the Real-Time Transport Protocol (RTP/RTCP) and to the Distance Vector Multicast Routing Protocol (DVMRP). Information about these topics is widely available; a particularly relevant web site can be found at [26].

A. Real-Time Transport Protocol (RTP/RTCP)

1. Real-Time Transport Protocol (RTP)

RTP (version 2) is a real-time transport protocol that provides end-to-end delivery services to support applications transmitting real-time data over unicast and multicast network services. RTP is defined in RFC 1889, titled "RTP: A Transport Protocol for Real-Time Applications." A profile for carrying audio and video over RTP is defined in RFC 1890, titled "RTP Profile for Audio and Video Conferences with Minimal Control" [28]. RTP provides payload type identification, sequence numbering, and time stamping. Control of real-time RTP sessions is carried out through the RTCP control protocol (see next section).

RTP provides end-to-end delivery services, but it does not provide all of the functionality that is typically provided by a transport protocol. RTP typically runs on top of UDP to use its multiplexing and checksum services. Other transport protocols besides UDP can carry RTP as well.

An RTP packet as defined in RFC 1889 consists of a common header, a list of contributing source identifiers, a potential profile-specific header extension, the actual payload, and some padding octets if required for encryption or by the underlying protocol.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |      sequence number SN       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          timestamp TS                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fig. 3. RTP Data Header

Fig. 3 shows an RTP data header. The version field (V) is a 2-bit version identifier that is currently set to 2. The padding bit (P) indicates whether the packet carries one or more padding octets at the end that are not part of the payload. The extension bit (X) indicates whether the fixed header is followed by a header extension. The 4-bit CSRC count (CC) contains the number of contributing source (CSRC) identifiers that follow the SSRC identifier. The interpretation of the marker bit (M) is profile-dependent. It can be used to mark significant events such as frame boundaries. For video, it usually identifies the last packet of a video frame, which causes the receiver to render the image. CafeMocha uses the marker bit to identify the last packet on each layer. Once the receiver has the last packet on each of its two layers, the frame is rendered [3].

The sequence number (SN) and timestamp (TS) provide the timing information necessary to synchronize and display audio and video data and to determine whether packets have been lost or have arrived out of order. In addition, the header specifies the payload type (PT), thus allowing multiple data and compression types. RTP is tailored to a specific application via auxiliary profile and payload format specifications. Each media stream is assigned a 32-bit synchronization source identifier (SSRC). This 32-bit value should be unique across a videoconferencing session.
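The header layout of Fig. 3 can be made concrete with a short decoding sketch. The function name parse_rtp_header and the dictionary keys are illustrative choices and are not taken from CafeMocha; the sketch assumes network byte order and does not interpret a header extension.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two single bytes, 16-bit SN, 32-bit TS, 32-bit SSRC.
    b0, b1, sn, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # CSRC count (4 bits)
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = list(struct.unpack("!%dI" % cc, packet[12:end]))
    return {
        "version": b0 >> 6,             # V, 2 bits
        "padding": bool(b0 & 0x20),     # P
        "extension": bool(b0 & 0x10),   # X
        "csrc_count": cc,
        "marker": bool(b1 & 0x80),      # M
        "payload_type": b1 & 0x7F,      # PT, 7 bits
        "sequence_number": sn,
        "timestamp": ts,
        "ssrc": ssrc,
        "csrcs": csrcs,
        # Offset just past the CSRC list; a header extension, if X is set,
        # would start here and is not skipped by this sketch.
        "payload_offset": end,
    }
```

A receiver in the style described above would test the marker field of each packet to decide when the last packet of a frame has arrived on a layer.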

An RTP session is defined by a pair of destination transport addresses (one multicast group address plus a pair of ports for RTP and RTCP). In a multimedia session, information can be split across different RTP sessions, each having its own RTCP reporting channel. This allows receivers to retrieve only the media data they want (for example, audio without video or vice versa). RTP does not provide any mechanism to ensure timely delivery or provide quality-of-service guarantees, nor does it assume that the underlying network is reliable. Auxiliary control mechanisms can be used if resource reservation or reliable service is required.

2. Real-time Control Protocol (RTCP)

RTCP is the control protocol that works in conjunction with RTP. RTCP control packets are periodically transmitted by each participant in an RTP session to all other participants. Feedback of information to the application can be used to control performance and for diagnostic purposes. As defined in RFC 1889 [28], RTCP performs the following four functions.

1. Provide information to applications: IP multicasting experiments have shown that receiver feedback is critical for analyzing distribution faults. RTCP's primary function is to report the quality of data distribution. Each RTCP packet contains sender and/or receiver reports carrying statistics useful to the application. These statistics include the number of packets sent, the number of packets lost, and the interarrival jitter. This reception quality feedback is useful for the sender, the receivers, and third-party monitors. For example, CafeMocha uses receiver reports to modify the target rate on its large layer [3].

2. Identify RTP sources: RTCP carries a transport-level identifier for an RTP source called the canonical name (CNAME). The CNAME is used to keep track of the participants in an RTP session. Receivers use the CNAME to associate multiple data streams from a given participant in a set of related RTP sessions, e.g., to synchronize audio and video.

3. Control RTCP transmission interval: RTCP control traffic should not exceed 5 percent of the total bandwidth used by active sources on the corresponding RTP session. This can be enforced by keeping track of the number of participants (a session subscriber receives RTCP control packets from all the participants and can thus calculate the exact number of subscribed users). The receiver can then estimate the time interval that should separate consecutive RTCP compound report packets. In CafeMocha's case, the limited number of receivers allowed us to fix the RTCP report generation interval at 5 seconds [3].

4. Convey minimal information: RTCP packets can be a convenient method of exchanging a limited amount of information. For example, they can be used to exchange personal names in a loosely controlled session where participants informally enter and leave the session.

The part common to all RTCP packets is shown in Fig. 4. Each RTCP packet carries in its header one of the following packet type (PT) codes:

1. 200 = SR Sender Report packet
2. 201 = RR Receiver Report packet


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=SR=200   |           length L            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        SSRC of sender                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fig. 4. RTCP Common Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=SR=200   |           length L            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        SSRC of sender                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           NTP timestamp, most significant word NTS            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             NTP timestamp, least significant word             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       RTP timestamp RTS                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   sender's packet count SPC                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   sender's octet count SOC                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fig. 5. Sender Report RTCP Block

3. 202 = SDES Source Description packet
4. 203 = BYE Goodbye packet
5. 204 = APP Application-defined packet

a. Sender Report Block

A Sender Report message (shown in Fig. 5) consists of the header, the sender information block, a variable number of receiver report blocks, and potentially profile-specific extensions. The 5-bit RC field specifies the number of reception report blocks contained in this packet. A receiver can be "listening" to several multimedia streams, in which case it sends one block per source. The 64-bit NTP timestamp (NTS) indicates the wall clock time at which this report was sent. In combination with timestamps returned in reception reports from the respective receivers, it can be used to estimate the round-trip propagation time to and from the receivers. The 32-bit RTP timestamp (RTS) corresponds to the same instant as the NTP timestamp, but is measured in the same units and with the same random offset as the RTP timestamps in data packets. This correspondence may be used for intra- and inter-media synchronization for sources whose NTP timestamps are synchronized, and may be used by media-independent receivers to estimate the nominal RTP clock frequency. The 32-bit sender's packet count (SPC) is the total number of RTP data packets transmitted by the sender since joining the RTP session. This field can be used to estimate the average data packet rate. The 32-bit sender's octet count (SOC) is the total number of payload octets (not including the header or any padding) transmitted in RTP data packets by the sender since starting up transmission. This field can be used to estimate the average payload data rate.

b. Receiver Report Block

Fig. 6 shows a receiver report block. The SSRC_n field identifies the source whose reception is reported in this block. The sender of the receiver report estimates the fraction (F) of the RTP data packets from source SSRC_n that was lost since the previous SR or RR packet. It also estimates, in field C, the total number of RTP data packets from source SSRC_n that have been lost since the beginning of reception. Packets that arrive late are not counted as lost, and the loss may be negative if there are duplicates. The low 16 bits of the extended highest sequence number (EHSN) contain the highest sequence number received in an RTP data packet from source SSRC_n, and the most significant 16 bits extend that sequence number with the corresponding count of sequence number cycles. J is an estimate of the statistical variance of the RTP data packet inter-arrival time, measured in timestamp units and expressed as an unsigned integer.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=RR=201   |           length L            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        SSRC of sender                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 SSRC_1 (SSRC of first source)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|fraction lost F|      cumulative number of packets lost C      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        extended highest sequence number received EHSN         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    inter-arrival jitter J                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          last SR LSR                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   delay since last SR DLSR                    |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                SSRC_2 (SSRC of second source)                 |
:                              ...                              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fig. 6. Receiver Report RTCP Block

c. Goodbye RTCP Packets (BYE)

A participant sends a BYE packet to indicate that one or more sources are no longer active, optionally giving a reason for leaving.

d. Source Description RTCP Packets (SDES)

An SDES packet consists of an SDES header and a variable number of chunks for the described sources. Each chunk in turn consists of an SSRC/CSRC identifier and a collection of SDES items. SDES items themselves consist of an SDES item type code (8 bits), a length field (8 bits) and as many text octets as the length field indicates. The different SDES items are encoded according to a type-length-value scheme. Currently, CNAME, NAME, EMAIL, PHONE, LOC, TOOL, NOTE, and PRIV items are defined in RFC 1889. The CNAME item is mandatory in every SDES packet, which in turn is a mandatory part of every compound RTCP packet. Like the SSRC identifier, a CNAME must differ from the CNAMEs of all other session participants. But instead of being chosen randomly, the CNAME should allow a person or a program to locate the source by means of its contents (usually the complete email address of the user).
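The reception report block of Fig. 6 is the feedback on which loss-driven layer control relies, so a decoding sketch may be useful. The function name parse_report_block is hypothetical. The sketch relies on two details of RFC 1889 that the text above does not spell out: the fraction lost is an 8-bit fixed-point value (loss fraction times 256), and the cumulative loss is a signed 24-bit integer.

```python
import struct

def parse_report_block(block: bytes) -> dict:
    """Decode one 24-byte reception report block (the SSRC_n part of Fig. 6)."""
    if len(block) != 24:
        raise ValueError("a reception report block is exactly 24 bytes")
    ssrc, loss_word, ehsn, jitter, lsr, dlsr = struct.unpack("!IIIIII", block)
    fraction = loss_word >> 24            # F: 8-bit fixed point, loss * 256
    cumulative = loss_word & 0xFFFFFF     # C: 24-bit two's complement
    if cumulative & 0x800000:             # negative when duplicates arrived
        cumulative -= 1 << 24
    return {
        "ssrc": ssrc,
        "fraction_lost": fraction / 256.0,
        "cumulative_lost": cumulative,
        "cycles": ehsn >> 16,             # sequence number wrap-arounds
        "highest_seq": ehsn & 0xFFFF,     # highest sequence number received
        "jitter": jitter,
        "lsr": lsr,
        "dlsr": dlsr,
    }
```

A source that adapts its rate to receiver feedback would compare fraction_lost from each block against its loss thresholds.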


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    CNAME=1    |     length    | user and domain name        ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fig. 7. Canonical Name

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    ST   |   PT=APP=204  |           length L            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             SSRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        name (ASCII) N                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 application-dependent data A               ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fig. 8. Application Specific RTCP Block

e. Application-Defined RTCP Packets (APP)

The APP packet is intended for experimental use while developing new applications and features. Once a new APP RTCP packet type has been tested and found useful, it should be registered with the Internet Assigned Numbers Authority (IANA) as an original packet type. The 5-bit subtype (ST) field allows a set of APP packets to be defined under one unique name, or can carry any application-dependent data. The 4-octet name (N) is chosen by the person defining the set of APP packets to be unique with respect to other APP packets this application might receive. Application-dependent data may or may not appear in an APP packet.
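As an illustration of the layout in Fig. 8, the following sketch assembles an APP packet. The function name build_app_packet and the example name "CAFE" are hypothetical and are not taken from CafeMocha. The length field follows the RTCP convention of RFC 1889: the packet length in 32-bit words, minus one.

```python
import struct

def build_app_packet(ssrc: int, name: bytes, data: bytes = b"", subtype: int = 0) -> bytes:
    """Assemble an RTCP APP packet: header, SSRC, 4-octet name, optional data."""
    if len(name) != 4:
        raise ValueError("name must be exactly four octets")
    if not 0 <= subtype < 32:
        raise ValueError("subtype is a 5-bit field")
    if len(data) % 4:
        raise ValueError("application data must be a multiple of 4 octets")
    total = 12 + len(data)              # header + SSRC + name + data
    length = total // 4 - 1             # RTCP length: 32-bit words minus one
    first = (2 << 6) | subtype          # V=2, P=0, ST
    return struct.pack("!BBHI", first, 204, length, ssrc) + name + data
```

For example, build_app_packet(0xDEADBEEF, b"CAFE", b"\x00\x01\x02\x03", subtype=5) yields a 16-octet packet whose length field is 3.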

B. Pruning in Multicast Distribution

1. DVMRP Routing Protocol

The Distance Vector Multicast Routing Protocol (DVMRP) is the MBONE's original IP Multicast routing protocol. It was designed to run over multicast-capable LANs (like Ethernet) as well as through non-multicast-capable routers. In the latter case, the IP Multicast packets are "tunneled" through non-multicast-capable routers as unicast packets. Tunneling replicates packets and affects performance, but it has provided a temporary solution for IP Multicast routing on the Internet while router vendors decide whether to support native IP Multicast routing. Most new routers are multicast capable; however, deployment of multicast on intranets remains limited because the Internet backbone does not yet perform multicast routing. DVMRP uses a single routing table to make forwarding decisions for multicast traffic. Thus, it does not consider properties of individual sessions, such as clustering of receivers, in computing the distribution tree.

2. Pruning

Every multicast packet has a time-to-live field that indicates the number of hops the packet can traverse before it is discarded. In the absence of pruning, a multicast packet also reaches unsubscribed receivers located fewer than n hops from the sender, n being the time-to-live of the packet. Pruning is the "trimming" of multicast distribution trees so that they span only subnets with users subscribed to a multicast address. Initially, the distribution tree spans all the users located fewer than n hops from the source. Multicast routers that have no subscribed users send a Non-Membership-Report up the distribution tree to their "parent" router. This process is repeated until the distribution tree of each multicast stream spans exactly the subnets where receivers are located.
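The end result of pruning can be illustrated with a small sketch. It models only the steady state, not the DVMRP message exchange or its timers; the tree representation (a dictionary mapping each router to its children) and the function name prune are assumptions made for illustration.

```python
def prune(tree, root, subscribed):
    """Return the routers left on the distribution tree after pruning.

    tree       -- dict mapping a router to the list of its child routers
    root       -- router attached to the source
    subscribed -- set of routers with at least one subscribed user
    """
    kept = set()

    def visit(node):
        # Visit every child (no short-circuit) so each one is classified.
        child_results = [visit(child) for child in tree.get(node, [])]
        # A router stays if it has local members or a child that stays;
        # otherwise it reports non-membership to its parent.
        needed = node in subscribed or any(child_results)
        if needed:
            kept.add(node)
        return needed

    visit(root)
    kept.add(root)  # the source's router is always part of the tree
    return kept
```

With tree = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"]} and a single subscriber behind router "D", only "S", "A" and "D" remain on the tree.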

CHAPTER III

CAFEMOCHA

As mentioned earlier, CafeMocha is the hierarchical videoconferencing tool developed by Sazzad, Brown and Schroeder at TAMU [3]-[7]. The encoder encodes frames at two different resolutions. The base layer is encoded using the codec developed by Tim Dorcey for the popular CU-SeeMe videoconferencing tool [13], which encodes 160x120 frames at 16 grayscale levels. Sazzad's pyramidal encoder adds the information necessary to form a corresponding 320x240 frame at 16 grayscale levels [5]. The 8x8 blocks forming the 160x120 picture are designated as base blocks, and the 16x16 pyramidal difference blocks are designated as pyramidal or large resolution blocks.

Two layers are not adequate to fully simulate a hierarchical stream. To further research multicast transmission of hierarchically coded data, additional layers were added to CafeMocha. Two important changes were made to the codec. The base layer was split into two layers, providing users with three grayscale (B&W) quality levels. Color information corresponding to each layer is transmitted in three new layers. Thus, the finished codec has six layers. This chapter describes the initial encoding mechanism and the changes implemented to get to the final version of CafeMocha.

A. The Encoding Mechanism

1. Base Layer

The base layer uses regular CU-SeeMe coding [13]. Frames are 160x120 and coded at 16 levels of gray, so each pixel is represented by 4 bits. The picture is divided into 8x8 blocks that are conditionally replenished; a block is not encoded and transmitted unless it exhibits enough change according to an appropriate distance measure. The source maintains a copy of the current video frame at the receiver. That copy is exactly the same as the receiver's rendered frame if the receiver does not experience any losses. A distance is computed between the current 8x8 block and its counterpart at the receiver. If the computed distance is larger than a given threshold, the block is losslessly compressed and put in a buffer for later packetization and transmission. A forced transmission mechanism makes sure that all blocks are refreshed at least once every n frames.

Conditional replenishment reduces the required bandwidth considerably. A high threshold lowers the bandwidth used at the cost of "jerky" motion and many "coding artifacts." A low threshold causes more blocks to be sent, increasing the video quality while raising the output bit rate.

The base stream is encoded differently from standard CU-SeeMe coding in one respect. In the standard encoder, blocks chosen for forced transmission are determined by the number of frames since the block was last transmitted. This requires significant overhead, as transmission statistics must be kept for each of the 300 blocks in a frame. The Sazzad implementation eliminates this overhead by using a probabilistic approach in which each block is transmitted with probability p, where 0.01 <= p <= 0.1. In this work, a value of 0.03 was adopted for p; on average, a block is then refreshed at least once every thirty-three frames (1/p). Brown used a value of 0.05 [3]; a lower value was judged appropriate for the high motion sequences used in our testing.
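Conditional replenishment with probabilistic forced refresh can be sketched as follows. This is a minimal illustration, not the CU-SeeMe or CafeMocha code: the distance measure (a sum of absolute differences) is an assumption, since the text only refers to "an appropriate distance measure", and the names block_distance and select_blocks are hypothetical. Frames are lists of rows of 4-bit pixel values whose dimensions are multiples of 8.

```python
import random

BLOCK = 8

def block_distance(cur, ref, bx, by):
    """Sum of absolute differences over one 8x8 block (an assumed measure)."""
    return sum(
        abs(cur[y][x] - ref[y][x])
        for y in range(by, by + BLOCK)
        for x in range(bx, bx + BLOCK)
    )

def select_blocks(cur, ref, threshold, p=0.03, rng=random):
    """Return the (bx, by) origins of the blocks to send for this frame.

    ref is the sender's copy of the receiver's frame; it is updated in place
    for every block that is selected.
    """
    to_send = []
    for by in range(0, len(cur), BLOCK):
        for bx in range(0, len(cur[0]), BLOCK):
            changed = block_distance(cur, ref, bx, by) > threshold
            forced = rng.random() < p            # probabilistic forced refresh
            if changed or forced:
                to_send.append((bx, by))
                for y in range(by, by + BLOCK):  # mirror the receiver's frame
                    ref[y][bx:bx + BLOCK] = cur[y][bx:bx + BLOCK]
    return to_send
```

A 160x120 frame gives 20 x 15 = 300 blocks, which matches the block count quoted above. With p = 0.03 a block that never changes is still refreshed about once every 1/p, or roughly 33, frames.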

2. Enhancement Layer

The enhancement layer has a frame size of 320x240 pixels. For each 8x8 base block, there is a corresponding 16x16 pyramidal pixel block. If a base layer 8x8 block is to be transmitted, the difference between an upscaled version of the 8x8 block and its enhancement counterpart is also transmitted after being compressed through quantization and run-length coding. Fig. 9 shows the process graphically. To decrease bandwidth, small pyramidal differences are mapped to zero depending on a quality level parameter. For a quality level of zero, the coding is lossless. For a quality level of 15, all the differences are mapped to zero. A quality level of 3 was found to be optimal by Sazzad [5]. Schroeder developed the Predictive/Adaptive Rate Control (PARC) algorithm, which sets a target rate for the pyramidal layer. PARC keeps the data rate on the pyramidal layer close to the target rate by varying the quality parameter from frame to frame [4].

Fig. 10 shows a block diagram of the whole coding process. Encoded data is placed in channel buffers until at least 1000 bytes are available for transmission. This packet size is large enough to avoid excessive block overhead, and at the same time small enough to avoid IP packet fragmentation.

[Figure: the buffered and present 160x120 base frames are compared in 8x8 blocks. If the distance exceeds the threshold, the base block is encoded; the buffered base frame is upsampled to 320x240 and subtracted from the current large frame, and the corresponding 16x16 pyramidal difference blocks are encoded.]

Fig. 9. Sazzad's Pyramidal Encoding Scheme

[Figure: block diagram with three stages. Frame capture: a 320x240 frame is captured and decimated to 160x120. Compression: conditional replenishment on 8x8 blocks of 4-bit pixels followed by lossless compression (baseline path); subtraction of the upscaled 16x16 block from the raw 16x16 block followed by "lossy" compression of the pyramidal difference under quality control (pyramidal path). Packetization: the baseline and pyramidal buffers are each transmitted with an RTP header if the buffer size exceeds 1000 bytes or if there is a residual when the frame is finished; the RTP timestamp is set, the RTP sequence number is increased, and the RTP last-packet marker is set. The baseline and pyramidal streams are passed to the network layer.]

Fig. 10. Frame Capture, Compression, and Packetization for Pyramidal Coder [3]

3. Encoder Performance

a. Base Layer

Table I, taken from Sazzad's thesis [5], shows the average bandwidth utilization for the video sequence commonly known as "Miss America." The "Miss America" sequence shows a "head and shoulder" video of a woman talking. The subject exhibits limited motion: she moves her lips a little while talking, blinks occasionally and makes a small left-to-right movement with her body. The background is uniformly black. Table II, also taken from Sazzad's thesis [5], shows the average bandwidth utilization for the video sequence commonly known as "Salesman." The sequence shows an upper body shot of a person holding a small box. His entire upper body is visible. The background is full of details. The bandwidth is shown for all quality values. Bandwidth values do not include the RTP/RTCP overhead.

Table I. Average Uncompressed and Compressed Rates for the "Miss America" Sequence Using Various Distance Threshold Values [4]

Threshold   Uncompressed Rate   Compressed Rate   Ratio
            (Kbits/frame)       (Kbits/frame)
    0             81.6               34.0          2.40
   10             34.7               18.2          1.89
   20             22.5               11.9          1.90
   30             16.7                8.5          1.96
   40             14.0                6.9          2.02

Table II. Average Uncompressed and Compressed Rates for the "Salesman" Sequence Using Various Distance Threshold Values [4]

Threshold   Uncompressed Rate   Compressed Rate   Ratio
            (Kbits/frame)       (Kbits/frame)
    0             81.6               45.5          1.79
   10             25.6               15.0          1.71
   20             17.4               10.4          1.72
   30             14.6                8.5          1.73
   40             13.3                7.7          1.73

Tables I and II show that the CU-SeeMe codec provides a compression ratio of at least 1.7 for typical sequences. The threshold value has a very clear effect on the bit rate. In our further tests, the threshold is set to 60. Our test sequences have more motion than the "Miss America" and "Salesman" sequences, and the frame rate used is low, so consecutive real-time frames differ considerably. The choice of the threshold value is empirical; it was found to provide a good visual flow of the received video while reducing the percentage of encoded blocks.

b. Pyramidal Layer

Fig. 11 shows the bandwidth over time of the test sequence actually used for the layer control testing (discussed in the next chapter), which was first used by Brown. Taken from the movie "When Harry Met Sally" [32], it starts with two minutes of low motion in which Harry and Sally are talking on the phone and watching Casablanca. Throughout the rest of the sequence, the amount of motion varies widely. The sequence ends after the restaurant scene when the elderly lady says: "I'll have what she is having." The pyramidal layer is encoded with a constant quality of zero in Fig. 11. Table III shows the average bandwidth for the test sequence run at three frames per second for different quality values. The average rate per quality level is plotted in Fig. 12.

4. Generalizing Assumption

While splitting the layers and adding new ones, we assumed the following:

1. A coding mechanism for 8x8 base blocks is available.
2. For each coded base layer block, a 16x16 pyramidal block can be hierarchically encoded using Sazzad's mechanism.

[Figure: rate (kbps, 0 to 600) versus time (seconds, 0 to 700) for the base layer and the pyramidal layer.]

Fig. 11. Test Sequence, Pyramidal Layer Coded at Quality 0

Table III. Average Temporal Bandwidth Use of the "When Harry Met Sally" Test Sequence for Different Quality Values at 3 frames/second

Quality   Average     Standard    Minimum     Maximum
          Bandwidth   Deviation   Bandwidth   Bandwidth
          (kbps)      (kbps)      (kbps)      (kbps)
   0       289.69      203.18       15.47      888.01
   1       157.94      145.66        5.51      659.93
   2        98.46      101.79        2.82      460.40
   3        63.63       67.46        1.60      315.82
   4        41.79       41.23        2.01      206.02
   5        28.84       25.52        1.01      116.02
   6        20.75       16.09        1.36       76.02
   9        13.39        7.90        0.73       33.05
  12        12.13        7.09        1.03       28.69
  14        12.05        7.04        0.92       28.40
  15         0.02        0.00        0.02        0.02
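The drop in bandwidth with increasing quality value follows from the way small pyramidal differences are mapped to zero before run-length coding. The sketch below shows one mapping consistent with the two stated end points (lossless at quality 0, everything zeroed at quality 15 for 4-bit pixels). The exact rule used in CafeMocha is not given here, so the dead-zone rule and the names quantize and run_length_encode are assumptions.

```python
def quantize(differences, quality):
    """Zero every difference whose magnitude does not exceed `quality`."""
    return [0 if abs(d) <= quality else d for d in differences]

def run_length_encode(values):
    """Collapse runs of equal values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs
```

Differences between 4-bit pixels lie between -15 and 15, so quality 15 zeroes all of them and the block collapses to a single run, while quality 0 leaves every value untouched.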

[Figure: average rate (Kbps, 0 to 300) versus quality parameter (0 to 15).]

Fig. 12. Average Rate per Quality Value for the "When Harry Met Sally" Test Sequence at 3 frames/second

If the above two assumptions are met, any hierarchical encoding scheme can be used. Sazzad suggests the use of a DCT-based encoder for the base blocks, claiming that the compression ratio can be brought up to 7 or 8 as compared to the current value of 1.7 [5].

B. Split of the Base Layer

Two approaches are investigated for replacing the first layer by two sub-layers. The first approach makes use of block distance measures; the second makes use of a maximum bandwidth constraint on the new base layer.

1. Threshold Separation

The first proposition is simple. In addition to the block replenishment threshold, T1, a separation threshold, T2, is used, where T2 > T1. If the distance measure is larger than the separation threshold, T2, the base 8x8 block is sent over the small layer; otherwise, it is sent over the medium layer. Thus, a user receiving layers 1 and 2 would actually be receiving the entirety of the old base layer. The change requires no modification to the way the enhancement layer is coded or decoded.

In the first tests, T1 was fixed and T2 was adaptively changed to ensure that the replenishment blocks were divided equally between the two new layers. Fig. 13 is a flow chart of the process described above. The threshold T2 is updated every 3 seconds using the algorithm listed in Fig. 14. If the rate on the small layer is more than 10% larger than the rate on the medium layer, T2 is increased by 10, widening the distance range of the medium layer (a block is sent on the medium layer if T1 <= distance < T2). In the opposite case, the separation threshold is decreased by two. The control values were chosen arbitrarily. Note that the algorithm reacts more aggressively if the rate on the small layer is higher than the rate on the medium layer.

[Figure: flow chart. For each base block, compute the distance. If the distance does not exceed the threshold, move on to the next block. Otherwise, send the base block on the small channel if the distance exceeds the separation threshold, or on the medium channel if it does not, and send the corresponding pyramidal block on the large channel. Repeat until the last block of the frame has been processed.]

Fig. 13. Flow Chart for Splitting Scheme Number 1

    if ByteCountOnSmall > 1.1 * ByteCountOnMedium
        T2 = T2 + 10
        if T2 > 125
            T2 = 125
    if ByteCountOnSmall < 0.9 * ByteCountOnMedium
        T2 = T2 - 2
        if T2 < T1
            T2 = T1

Fig. 14. Algorithm for Adapting the Value of the Separation Threshold in Splitting Scheme Number 1

Fig. 15 shows the bandwidth rates for the new small and medium layers. Although the control process was not finely tuned, it yielded good results. The base layer was split into two layers of approximately equal size. The bias of the control algorithm toward sending more data on the medium layer is apparent: the average data rate on the small layer was 36 Kbps compared to 44.5 Kbps on the medium layer. This scheme was abandoned because the second alternative (described next) was found to be superior in both ease of control and visual quality of the received picture.

[Figure: rate (kbps, 0 to 200) versus time (seconds, 0 to 700) for the medium layer and the small layer.]

Fig. 15. Test Sequence Coded Using Scheme Number 1

2. Use of a Bandwidth Limit on the Base Layer

The second approach is a bit more complicated. A maximum bandwidth is defined for the new small layer in terms of bytes per frame. Base layer blocks are sent over the small channel until the allowable number of bytes per frame is reached. The remaining replenishment blocks are sent on the medium layer. Again, a user receiving layers one and two would actually be receiving the entirety of the old base layer. To ensure that the receiver of the base layer always receives full-screen information, the starting point of the search for blocks that need to be replenished must be changed for each new frame (explained below). Every time a base block is encoded to be sent on either the small or the medium layer, the pyramidal differences needed to construct its corresponding 16x16 large layer block are also encoded on the large layer.

Two complementary buffers are used to compute the distance needed in conditional replenishment.

1. The "small" buffer keeps track of the image seen by users receiving only the small layer. Every time an 8x8 block is considered for the small layer, this buffered image is used to compute the distance. If the block is sent, both the small and medium buffers are updated. Note that before subscribing to the medium layer, a user always subscribes to the small layer.

2. The "medium" buffer keeps track of the image seen by users receiving both the small and medium layers. When the allocated bandwidth is exhausted on the small layer, the distance is calculated using this buffer. If the block is sent, only the medium buffer is updated.

This process creates some redundancy because updates targeted to the receivers of the small layer are also received by subscribers of the medium layer. The redundancy was found to be around 2%. If the motion is low, all 8x8 base blocks are sent within the allotted bandwidth on the first layer. If the motion is high, a user receiving only the small layer receives updates of successive portions of the screen without any loss of quality at the base layer. Fig. 16 is a flow chart of the splitting process.

One problem that might arise in this case is that the received small layer picture might have clear horizontal separations at the start and end of successive updates, because successive portions of the display are coded from successive temporal frames. This happens in high motion sequences, which a videoconferencing tool does not typically handle.

Fig. 17 shows how blocks of two consecutive frames might be sent. In frame number one, the gray blocks are sent on the small layer and the black blocks are sent on the medium layer. The byte count allocated for the small layer is exhausted at block line seven. In frame number two, scanning of blocks to be sent starts at block line eight. Blocks are sent on the small layer until the allocated byte count is reached on block line thirteen. Blocks on lines fourteen to fifteen and one to seven that exhibit enough motion are sent on the medium layer.
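The bandwidth limit scheme with its two buffers and rotating scan start can be sketched as follows. This is an illustrative model rather than the CafeMocha implementation: blocks are opaque values held in flat lists, the distance and size functions are supplied by the caller, and the rotation works per block rather than per block line. All names are hypothetical.

```python
def split_frame(frame, small_buf, medium_buf, start, budget, threshold,
                distance, size):
    """Assign one frame's blocks to the small or medium layer.

    frame, small_buf, medium_buf -- lists of blocks in scan order
    start     -- index where this frame's scan begins
    budget    -- bytes allowed on the small layer for this frame
    distance  -- function(block_a, block_b) -> number
    size      -- function(block) -> encoded size in bytes
    Returns (small_ids, medium_ids, next_start); both buffers are updated.
    """
    small_ids, medium_ids = [], []
    used, next_start, n = 0, start, len(frame)
    for step in range(n):
        i = (start + step) % n
        if used < budget:
            # Bandwidth left on the small layer: compare with the small buffer.
            if distance(frame[i], small_buf[i]) > threshold:
                small_ids.append(i)
                used += size(frame[i])
                small_buf[i] = medium_buf[i] = frame[i]
                if used >= budget:
                    next_start = (i + 1) % n   # resume here in the next frame
        elif distance(frame[i], medium_buf[i]) > threshold:
            # Budget exhausted: compare with the medium buffer instead.
            medium_ids.append(i)
            medium_buf[i] = frame[i]
    return small_ids, medium_ids, next_start
```

Passing the returned next_start as the start of the following frame rotates the portion of the picture served by the small layer. A block that went out on the medium layer in one frame may be sent again on the small layer in a later frame, which is the redundancy mentioned above.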

[Flow chart. For each base block: if bandwidth is available on the small layer, the distance on the current base block is computed using the small buffer; a block that is sent goes on the small channel and both the medium and small buffers are updated. Otherwise, the distance is computed using the medium buffer; a block that is sent goes on the medium channel and only the medium buffer is updated. Processing for the current frame ends once the last block has been processed.]

Fig. 16. Flow Chart for Splitting Scheme Number 2
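The splitting process of Fig. 16 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the CafeMocha source: the function names, the sum-of-absolute-differences distance, the `cost` callback that returns the encoded size of a block, and the way the next scan position is remembered are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Block = Sequence[int]  # flattened pixel values of one 8x8 base block


def block_distance(a: Block, b: Block) -> int:
    """Hypothetical distance measure: sum of absolute pixel differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def split_frame(
    frame: List[Block],
    small_buf: List[Block],
    medium_buf: List[Block],
    start: int,
    small_budget: int,
    threshold: int,
    cost: Callable[[Block], int],
) -> Tuple[List[int], List[int], int]:
    """Assign the blocks of one frame to the small or medium channel.

    Scanning starts at block index `start` and wraps around the frame.
    Returns (indices sent on small, indices sent on medium, next start).
    """
    n = len(frame)
    small_sent: List[int] = []
    medium_sent: List[int] = []
    used = 0                # bytes already spent on the small layer
    next_start = start      # assumption: resume where the budget ran out
    budget_hit = False
    for k in range(n):
        i = (start + k) % n
        block = frame[i]
        if used < small_budget:
            # Bandwidth available: compare against the small buffer.
            if block_distance(block, small_buf[i]) > threshold:
                small_sent.append(i)
                used += cost(block)
                small_buf[i] = block    # update both buffers
                medium_buf[i] = block
        else:
            if not budget_hit:
                budget_hit = True
                next_start = i
            # Budget exhausted: compare against the medium buffer.
            if block_distance(block, medium_buf[i]) > threshold:
                medium_sent.append(i)
                medium_buf[i] = block   # update the medium buffer only
    return small_sent, medium_sent, next_start
```

Calling `split_frame` once per frame and feeding the returned start index into the next call reproduces the rotation illustrated in Fig. 17, where the second frame resumes scanning at the block line following the one at which the small layer budget ran out.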

[Figure: block grids for Frame 1 and Frame 2, with block lines 7, 8 and 13 marked. Legend: 8x8 block sent on the small channel; 8x8 block sent on the medium channel.]

Fig. 17. Example of Base Block Distribution over Two Consecutive Frames in Bandwidth Limit Scheme

To test this scheme, the "When Harry met Sally" test sequence is used to determine the encoder's performance under a wide variety of motion levels. The bandwidth limit on the base layer was set to 20Kbps. Fig. 18 shows the results. The base layer is always close to 20Kbps. If very low motion is exhibited, all the blocks are sent on the small layer and none are sent on the medium layer.

[Plot: Bandwidth (kbps) versus Time (seconds) for the small layer and the medium layer.]

Fig. 18. Test Sequence Coded Using Scheme Number 2

This splitting scheme was preferred to the splitting scheme mentioned earlier. Besides being easy to control, its output is constant at an assigned bandwidth value. The new small layer will be the base layer to which all users need to subscribe. It has a low, nearly constant bit rate, allowing reliable access for all low-bandwidth users.

C. Color Enhancement

The other enhancement made is the transmission of color information for each luminance layer (the grayscale information). This is done by coding the U and V color bands. The Y-band, which is the luminance, is already encoded in the three grayscale layers. Note that any color picture can be encoded in several different formats, each corresponding to a different colorspace. The simplest is the red-green-blue (RGB) colorspace. Another possible colorspace is the YUV format, in which the luminance information is located in the Y band (black & white component) and the color information is located in the U and V chrominance bands. The YUV format is widely used because it allows straightforward reconstruction of the black and white version of a color picture. U and V represent the color difference signals B - Y and R - Y (B and R are the Blue and Red components in the RGB colorspace). In the digital domain, as specified in [27], the U and V color differences are referred to as Cb and Cr.

1. YUV422-like Color Scheme

A distribution similar to the NTSC YUV 4:2:2 is used. The only difference is that each datum is encoded over 4 bits instead of 8 bits. Table IV shows the encoding for each set of two pixels. Every pixel is represented by a 4-bit datum on the Y-band. Every two consecutive pixels share the same U-band and V-band color data.

Table IV. Modified YUV 4:2:2 Encoding

4 bits Y-band    4 bits U-band    4 bits Y-band    4 bits V-band
Pixel 1          Both Pixels      Pixel 2          Both Pixels
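The bit budget of Table IV (16 bits for every two pixels) can be illustrated with a small sketch. The function names and the choice of placing the first Y nibble in the most significant position are assumptions made for illustration; the thesis transmits the U and V nibbles in their own stream, so the single 16-bit word below only shows how the four 4-bit data relate to a pixel pair.

```python
def pack_pixel_pair(y1: int, u: int, y2: int, v: int) -> int:
    """Pack two pixels into 16 bits in the order Y1, U, Y2, V (4 bits each)."""
    for name, value in (("y1", y1), ("u", u), ("y2", y2), ("v", v)):
        if not 0 <= value <= 0xF:
            raise ValueError(f"{name} must fit in 4 bits, got {value}")
    return (y1 << 12) | (u << 8) | (y2 << 4) | v


def unpack_pixel_pair(word: int) -> tuple:
    """Recover (y1, u, y2, v) from a 16-bit word built by pack_pixel_pair."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF
```

Both pixels of the pair are reconstructed with the same `u` and `v` values, which is what makes the scheme 4:2:2-like.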

The U and V nibbles are grouped together to form a "UVUVUV . . ." data stream on the base layer. The compression algorithms and routines used for the grayscale layers are also used for the color layers. For each 8x8 base block that is encoded, an 8x8 UV block is also encoded. This is possible because for every 8 bits of Y-data (2 pixels), a 4-bit U-datum and a 4-bit V-datum need to be encoded. Intuitively, it can be assumed that the new streams have more vertical redundancy than the luminance streams. Results have confirmed that the compression algorithms are well suited for these streams.

Fig. 19 shows the data available through each band for two consecutive pixels in the base frame, and their corresponding 4x2 pixels in the large frame. Note that if a receiver elects to receive the base and pyramidal Y-band layers, the reception of only the base frame color information would mean that a 4x2 block of pixels in the reconstructed large frame would share the same U and V color data (U12, V12 in Fig. 19).

[Figure: base pixels Y1 and Y2 with their shared color data U12 and V12; the upscaled data; and the corresponding eight pyramidal pixels Y1a, Y1b, Y1c, Y1d, Y2a, Y2b, Y2c, Y2d with color data U1ab, U2ab, U1cd, U2cd, V1ab, V2ab, V1cd, V2cd.]

Fig. 19. Color Scheme 1

In Fig. 20, the physical order of the data found in Fig. 19 is shown. An interesting problem is noticeable. The base color layer data order is set to "UVUVUV . . . ." The same pyramidal encoding routine used for encoding the Y-band is used again for the pyramidal differences on the UV-bands. If the large color layer's data order is also set to "UVUVUV . . . ," then pyramidal differences will not be minimized, since some data of a particular band would be subtracted from upscaled data of another band. Compression through run-length coding would not be effective since difference values are likely to be large. This problem can be easily solved by adopting for the large layer the "UUVVUUVV . . ." data order shown in Fig. 21 before calling the encoding routines.

[Figure: base pixels Y1, Y2 and their shared color data U12, V12 are upscaled. The upscaled color data reads U12 U12 V12 V12 on each row, while the corresponding pyramidal color data in normal data order reads U1ab V1ab U2ab V2ab and U1cd V1cd U2cd V2cd.]

Fig. 20. Upscaling Issues when Using the Same Pyramidal Coding Routine

[Figure: data order U1ab U2ab V1ab V2ab U1cd U2cd V1cd V2cd.]

Fig. 21. Byte Order for The Pyramidal Layer Color Data
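The effect of the data order on the pyramidal differences can be checked with a short sketch. The helper names and the numeric values are hypothetical, and the upscaling is reduced to a horizontal duplication of each datum; the point is only that the "UUVVUUVV" order lines up each large-layer datum with upscaled data of the same band.

```python
def upscale_row(base_row):
    """Duplicate every base datum horizontally (U12 V12 -> U12 U12 V12 V12)."""
    return [d for d in base_row for _ in range(2)]


def to_uuvv(large_row):
    """Reorder each group U1 V1 U2 V2 into U1 U2 V1 V2.

    Assumes the row length is a multiple of four.
    """
    out = []
    for i in range(0, len(large_row), 4):
        u1, v1, u2, v2 = large_row[i:i + 4]
        out.extend([u1, u2, v1, v2])
    return out


def pyramidal_differences(large_row, base_row):
    """Subtract the upscaled base row from the large-layer row."""
    return [l - b for l, b in zip(large_row, upscale_row(base_row))]


base = [3, 12]                 # hypothetical U12 = 3, V12 = 12
large_uvuv = [4, 11, 3, 13]    # hypothetical U1ab, V1ab, U2ab, V2ab
mixed = pyramidal_differences(large_uvuv, base)
aligned = pyramidal_differences(to_uuvv(large_uvuv), base)
```

With these sample values, `mixed` contains differences taken across bands (V1ab against U12 and U2ab against V12), whereas every entry of `aligned` compares a datum with upscaled data of its own band and is therefore small.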

Table V. Possible Layer Combinations in Scheme 1

Y-band layers received              B&W             Small UV-band          Small & Medium       Small & Medium &
                                                    (bandwidth limited)    UV-band              Large UV-band
Small Y-band (bandwidth limited)    B&W 160x120     YUV 4:2:2 160x120      -                    -
Small & Medium Y-band               B&W 160x120     YUV 4:2:2 160x120      YUV 4:2:2 160x120    -
Small & Medium & Large Y-band       B&W 320x240     YUV 4:1:1 320x240      YUV 4:1:1 320x240    YUV 4:2:2 320x240

The finished encoder/decoder has the possible layer combinations shown in Table V. In the YUV 4:2:2 format, the UV-band pixel information is shared by every two consecutive pixels. The YUV 4:1:1 format shown in the table corresponds to the reception of complete Y-band information for the large frame with only base frame color data. In this case the color data is upsampled to display a large color frame. Each color datum that was shared by two base pixels is consequently shared by their eight corresponding upscaled pixels (a 4x2 block) in the large frame.

2. YUV444-like Color Scheme

A distribution similar to the NTSC YUV 4:4:4 was also tried (a compile-time parameter determines which color scheme is used by CafeMocha). Table VI shows the encoding for each pixel.

Table VI. Modified YUV 4:4:4 Encoding

4 bits Y-band    4 bits U-band    4 bits V-band
Pixel 1          Pixel 1          Pixel 1

The U and V nibbles are again grouped together to form a "UVUVUV . . ." stream for the small and medium layers. The small and medium layers are formed of small frame 8x8 blocks. The same compression algorithms used for the base and pyramidal grayscale (Y-band) blocks are again used for the color layer blocks. Every time an 8x8 base Y-band block is encoded, two 8x8 base UV-band blocks are encoded. To each pixel corresponds a Y-band 4-bit datum, a U-band 4-bit datum and a V-band 4-bit datum. Fig. 22 shows the information available for each base block pixel, its upscaled influence on the large frame, and the corresponding 4 pixels in the large frame.

[Figure: a base pixel with data Y, U and V; the upscaled data; and the corresponding pyramidal pixels with data Ya, Yb, Yc, Yd, Ua, Ub, Uc, Ud, Va, Vb, Vc, Vd.]

Fig. 22. Color Scheme 2

The upscaling problem encountered for the previous color scheme is present here too. Again, adopting a "UUVVUUVV . . ." data order on the large color layer solves the problem. Our finished encoder/decoder has the possible layer combinations shown in Table VII. UV-band pixel information is complete for each pixel in the YUV 4:4:4 format and subsampled by a factor of 4 in our YUV 4:2:0 format. Every base U and V pixel datum corresponds to a 2x2 block of pixels in the large frame due to the up-sampling of the U and V bands. Note that the actual YUV 4:2:0 format has a similar distribution, but each datum is represented by 8 bits.

Tables VIII and IX give the average data rates for every layer when coding the "When Harry met Sally" test sequence at three frames per second. Surprisingly, the rate on the large UV-band layer stays the same, at an average of 26Kbps, for all quality values above zero.

Table VII. Possible Layer Combinations in Scheme 2

Y-band layers received              B&W             Small UV-band          Small & Medium       Small & Medium &
                                                    (bandwidth limited)    UV-band              Large UV-band
Small Y-band (bandwidth limited)    B&W 160x120     YUV 4:4:4 160x120      -                    -
Small & Medium Y-band               B&W 160x120     YUV 4:4:4 160x120      YUV 4:4:4 160x120    -
Small & Medium & Large Y-band       B&W 320x240     YUV 4:2:0 320x240      YUV 4:2:0 320x240    YUV 4:4:4 320x240

Table VIII. Average Temporal Bandwidth Rates of the Test Sequence for the Base Layers at 3 frames/second Using Color Scheme 2

Layer             Average      Standard     Minimum      Maximum
                  Bandwidth    Deviation    Bandwidth    Bandwidth
                  (kbps)       (kbps)       (kbps)       (kbps)
Small Y-band      19.0583      2.9705       4.5400       20.6000
Small UV-band     17.4191      3.5874       4.7700       25.4100
Medium Y-band     72.9547      51.2786      0.0200       202.5200
Medium UV-band    63.1020      40.5733      0.0200       161.7400

Table IX. Average Temporal Bandwidth Rates of the Test Sequence for Different Quality Values at 3 frames/second Using Color Scheme 2 for the Large Layer

Layer                 Average      Standard     Minimum      Maximum
                      Bandwidth    Deviation    Bandwidth    Bandwidth
                      (kbps)       (kbps)       (kbps)       (kbps)
Quality 0 Y-band      298.8482     184.2245     15.6900      798.5700
Quality 0 UV-band     130.1607     72.0237      7.9800       319.6100
Quality 1 Y-band      164.2556     142.3516     4.7100       618.9300
Quality 1 UV-band     26.1719      13.5566      1.5800       54.4400
Quality 2 Y-band      104.3429     103.3914     4.2500       500.8600
Quality 2 UV-band     26.1369      13.8477      1.6900       57.1300
Quality 3 Y-band      67.7410      68.4569      1.9500       340.7800
Quality 3 UV-band     25.9243      14.0385      1.6200       57.4100
Quality 4 Y-band      46.7503      45.3144      2.0600       226.0400
Quality 4 UV-band     26.0530      13.8569      2.0100       57.0900
Quality 6 Y-band      23.3742      17.3064      1.2100       84.4300
Quality 6 UV-band     26.0112      14.0056      2.0300       57.4500
Quality 9 Y-band      14.5403      7.9572       0.7600       32.7400
Quality 9 UV-band     26.0619      13.9055      1.5000       57.4700
Quality 12 Y-band     13.1457      7.0469       0.7400       28.4400
Quality 12 UV-band    26.0136      13.9504      1.4500       56.6200

D. Correlation Between Y-band Layers and their Corresponding Color Layers

Since the same number of blocks is sent on any Y-band layer and its corresponding UV-band layer, the rates on both are highly correlated. Fig. 23 and Fig. 24 confirm our intuition. Over the small layers, the sample mean of the ratio of the octet count sent on the color layer to the one sent on the Y-band layer is 0.91 with a sample standard deviation of 0.11. Over the medium layers, the same ratio is 0.89 with a sample standard deviation of 0.10.

[Plot: Ratio over the duration of the test sequence.]

Fig. 23. Ratio of the Rate on the Small UV-band Layer over the Rate on the Small Y-band Layer during the "When Harry Met Sally" Test Sequence

[Plot: Ratio over the duration of the test sequence.]

Fig. 24. Ratio of the Rate on the Medium UV-band Layer over the Rate on the Medium Y-band Layer during the "When Harry Met Sally" Test Sequence

The large UV-band layer shows a relatively low bit rate when compared to the total bit rate of the layers that are added before it. No strict correlation was found between the large UV-band layer and its Y-band counterpart. However, it was observed that the bit rate on the former is less than the bit rate on the latter most of the time. Thus we will do rate control exclusively on the Y-band layers. The rate on the corresponding color layers will be controlled simultaneously.

E. Processing of Incoming Packets at the Decoder

Fig. 25 shows the decoding process at the receiver side. The process is event driven. Arrival of a packet on one of the subscribed layers triggers an action at the receiver. Incoming packets are buffered until the receiver knows that no more packets are incoming for that layer. A receiver detects that no more packets for the current frame are available by checking for the arrival of the end-of-frame bit markers on all the layers, or by detecting the arrival of a packet corresponding to a frame that has a greater timestamp. When decoding, the data of the base frame (small and medium layers) should be decoded before the data of the large frame. Decoded pyramidal differences are meaningless if the corresponding base blocks have not been decoded.

[Flow chart. When an RTP packet is received, its data is ignored if its timestamp is older than the current timestamp. If its timestamp is greater than the current undisplayed timestamp, all the available information in the buffers that needs to be decoded according to the current subscription level is uncompressed. Otherwise the data is saved into the corresponding channel's buffer and, once the last packet marker has been received on every layer, all the buffers that need to be decoded are uncompressed. Decoding order: small, medium then large Y-band; small, medium then large UV-band. The image is then rendered at the resolution required by the current subscription level.]

Fig. 25. Decoding of the Incoming Packets

F. The PARC Algorithm

The Predictive/Adaptive Rate Control (PARC) algorithm developed by Schroeder [5] can maintain the rate of the pyramidal stream at a "Target Rate" set at the source by sending consecutive frames at different quality levels. Brown used this "Target Rate" to adapt the rate of the large layer to avoid congestion. Fig. 26 shows the target rate and the actual rate of the pyramidal layer. Brown's approach was to add 2Kbps to the Target Rate in the absence of congestion, and to multiply it by 0.6 whenever congestion is reported.

Fig. 26 shows how the PARC algorithm performs in the case of the test sequence. PARC's target rate was set to 96Kbps and the target quality was set to 0 (the encoding quality parameter can vary between the "target quality" and 15). The bandwidth of the layer is maintained at nearly 96Kbps for most of the time. Even in very high motion, the PARC algorithm still limits excessive bandwidth use. Between seconds 380 and 450, the consumed bandwidth is brought down from 700Kbps at quality 0 to an average of 170Kbps. Note that the algorithm's performance relies on the ability to predict what quality values can be used for encoding the pyramidal differences. Bad performance indicates a failure to predict how many octets are needed to encode the processed frame at a given quality q. This can happen when unusually high motion is encountered, as occurs in our test sequence between seconds 380 and 450.

[Plot: Bandwidth (Kbps) of the large Y-band layer and the target rate over the duration of the test sequence.]

Fig. 26. PARC Algorithm Used for Congestion Avoidance

PARC is a good control scheme for the pyramidal layer. However, the aggregate rate of the Y-band layers would exhibit higher variability, mainly because of the Y-band medium layer.

G. Stabilization of the Total Output Rate Using PARC

We enhanced the effectiveness of Schroeder's PARC algorithm [4] by estimating the bandwidth used on the base layer and then setting PARC's target rate to the desired global target rate minus the base layer's rate. The total output rate of the encoder was greatly stabilized, as can be seen in Fig. 27.

[Plot: Bandwidth (kbps) of all Y-data and the target rate over the duration of the test sequence.]

Fig. 27. Total Bandwidth Used by the Y-band Layers (Small & Medium & Large) Controlled by the Enhanced Use of the PARC Algorithm under a Total Rate Limit of 200Kbps

The control process is simple. We estimate the bandwidth on the base layer through a first-order low-pass filter estimator with gain g. PARC's target rate is then set to the difference between a maximum rate set by the user and the calculated estimate. The update equations are listed in Fig. 28. A minimum rate of 16Kbps (2000 bytes per second) was assigned to the Y-band pyramidal layer during our tests. The assigned global rate is only relevant to the Y-band layers. We chose a value of 0.7 for g. The update equations are run every time a frame is encoded.

$$\mathit{EstimateOnBase} = (1 - g)\,\mathit{EstimateOnBase} + g\,\mathit{OctetCount}_{\mathit{Small\&Medium}}$$
$$\mathit{PARCTargetRate} = \mathit{MaxRate} - \mathit{EstimateOnBase}$$
$$\mathit{PARCTargetRate} = \max(2000\ \text{bytes},\ \mathit{PARCTargetRate})$$

Fig. 28. Equations for the Enhanced PARC Control Process

As will be seen in Chapter V, the large Y-band layer will be joined after all the Y-band and UV-band base layers have been successfully added. That level of subscription will be labeled "Level 4." In that case, the OctetCountSmall&Medium value in Fig. 28 is replaced by the octet count on both the Y-band and UV-band small and medium layers. A reasonable value for the global target rate (MaxRate in Fig. 28) is 300Kbps, as can be seen in Fig. 29. The rate at "Level 4" exceeds the target rate 28% of the time. However, if we disregard the period between seconds 400 and 480, where an unusual amount of motion is present, and if we allow the algorithm an error of 10%, the percentage of time spent above the target rate drops to 8.0%. The poor performance of the algorithm is easily explained by the fact that the aggregate bandwidth of the Y-band small, Y-band medium, UV-band small and UV-band medium layers can significantly exceed the target rate, as can be seen in Fig. 30. This bandwidth limiting scheme was used during the simulation of the various layer control schemes.
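The update step of Fig. 28 can be written as a short function. This is a minimal sketch under stated assumptions rather than the thesis code: the function and parameter names are hypothetical, all quantities are taken to be in bytes per second, and the 16Kbps minimum is applied as a floor (a `max`), which is how the stated minimum rate for the pyramidal layer is read here.

```python
def update_parc_target(estimate_on_base, octet_count_base, max_rate,
                       g=0.7, min_rate=2000.0):
    """Run one enhanced-PARC update; called once per encoded frame.

    estimate_on_base: previous low-pass estimate of the base layer rate.
    octet_count_base: latest measured rate on the small and medium layers.
    max_rate:         global target rate chosen by the user.
    g:                filter gain (0.7 in the tests described above).
    min_rate:         floor for the pyramidal layer (2000 bytes/s = 16Kbps).
    Returns (new_estimate, parc_target_rate).
    """
    new_estimate = (1.0 - g) * estimate_on_base + g * octet_count_base
    parc_target = max(min_rate, max_rate - new_estimate)
    return new_estimate, parc_target
```

For example, with a global limit of 200Kbps (25000 bytes per second), a base layer that suddenly bursts above the limit drives the pyramidal target down to the floor instead of making it negative, while a quiet base layer leaves most of the budget to the large layer.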

[Plot: Bandwidth (Kb/s) at Level 4 and the target rate over the duration of the test sequence.]

Fig. 29. Total Bandwidth Used by all Base Layers (Y-band Small & Medium, UV-band Small & Medium) and the Large Y-band Layer Controlled by the Enhanced Use of the PARC Algorithm under a Total Rate Limit of 300Kbps

[Plot: Bandwidth (Kb/s) at Level 3 and the target rate over the duration of the test sequence.]

Fig. 30. Total Bandwidth Used by all Base Frame Layers (Y-band Small & Medium, UV-band Small & Medium)

CHAPTER IV

CONTROL SCHEMES FOR THE LAYERS

A. Overview

The objective of any multicast video distribution scheme is to reach a fair distribution of the video data. Cheung et al. [25] define fairness as the reception by each receiver of video stream quality commensurate with its capabilities or the capabilities of the paths leading to it. In one-to-many multicast video, difficulties stem from the real-time nature of the digital video and the potential for a large number of receivers with heterogeneous capabilities. Approaches to deal with these problems can be divided into two categories: a) the use of a network capable of resource reservation, and b) the use of feedback control to adjust video stream requirements to meet current network capabilities. In this thesis, we focus on the latter approach, mainly because of the open-to-all, non-reservation nature of the Internet and its multicast backbone (MBONE). Through packet loss measurement, receivers try to get their optimal video quality (and thus use as much of the available bandwidth as possible).

Optimal protocol behavior can be measured by running one-to-one sessions. In the absence of interference between clients, the protocol should behave at its best. The performance can then be compared to each receiver's behavior in an actual one-to-many videoconferencing session.

Streams that are hierarchically encoded and distributed by multicast provide good granularity of control. Clients can add and drop layers until they receive the best video quality that they can handle. Multicasting each hierarchical layer on a separate IP multicast group eliminates most (but not all) of the "interference" between receivers.

"Interference" between clients can occur when a client elects to add a layer and inadvertently congests a link on its reception path. All the clients located on the same side of the link as the "interfering client" would experience packet loss and lower video quality. A good protocol should be able to recognize transient packet losses due to clients "trying to join" extra layers. We call the act of subscribing to a new video layer by a client a "join experiment." In the absence of resource reservation schemes or router bandwidth evaluation agents, a receiver cannot be sure beforehand whether its join experiment will succeed or fail.

McCanne et al. [8] have proposed and implemented the Receiver-driven Layered Multicast (RLM) protocol, a non-reservation, active receiver experiment scheme whereby additional layers are periodically added in the absence of significant packet loss. RLM was simulated for constant bit rate (CBR) sources [8]. The challenge of implementing it with the CafeMocha encoder is the variable bit rate of most of CafeMocha's layers. The variability of the bit rate is minimal for the small Y-band and UV-band layers. It is video motion dependent for users receiving any of the medium layers (Y-band or UV-band). The large layer can also be a source of rate variability unless the Enhanced-PARC algorithm is used. The Enhanced-PARC control mechanism stabilizes the overall rate for receivers accepting all small, medium, and large layers.

In this chapter, we first give a brief overview of RLM. Then, we describe a basic one-to-one control scheme implemented in preliminary tests. We conclude by listing the metrics selected to evaluate the protocol's performance.

B. RLM

1. Description

Under RLM, each receiver adapts individually to observed network performance by adjusting its level of subscription within the overall layered multicast group structure. Simply put, each receiver runs the following simplified control loop:

1. on congestion, drop a layer.
2. on spare capacity, add a layer.

Link capacity is inferred by carrying out active experiments. These experiments consist of the spontaneous addition of layers at well chosen times. If a join experiment causes congestion, the receiver drops the layer that was just added and deems the experiment a failure. In case of success (no congestion occurs), the receiver stays at the new level of subscription. Optimal operating points are normally just below the congestion point of the link (maximum link utilization).

Join experiments might have a negative impact on the overall performance if they are repeated too often. An exponentially increasing delay is therefore imposed between failed experiments. In the case of high packet losses that do not correspond to join experiments, the receiver drops layers periodically until network congestion ceases.

Concurrent join experiments can mislead receivers adding lower layers. For example, assume that a receiver R1 undertakes a join experiment that causes a link to congest, and assume that a receiver R2 is conducting a similar experiment at a lower subscription level. If the path from the source to receiver R2 passes through the link congested by R1, then the packet losses would make R2 assume that its join experiment has failed. The protocol's efficiency is negatively affected by receivers backing off erroneously. Consequently, receivers have to communicate and announce their join experiments. This allows receivers to observe concurrent join experiments. A user observing the failure of an experiment can infer that a similar join experiment would not work for it (negative learning). On the other hand, positive learning does not occur in this scheme since a successful join experiment cannot be recognized by other receivers.

RLM is not compatible with router packet priority forwarding schemes. If lower layer packets have more priority than any higher layer packets, packets of higher layers are selectively dropped at routers in case of congestion. This eliminates negative shared learning. When congestion occurs, other receivers would not perceive packet loss at lower layers and thus are unaware of the negative experimental outcome.

Scalability problems arise if a large number of receivers interfere with each other's join experiments. The protocol is described in more detail in the next section.

2. Protocol Details

Fig. 31 shows the state diagram of a receiver using RLM. A receiver can be in any one of the following states: Steady (S), Measurement (M), Drop (D), or Hysteresis (H). Each transition is labeled with the reason for the transition, either packet loss or timeout. Table X explains the variables, timers and transition causes in Fig. 31.

[Fig. 31: state diagram of an RLM receiver with the states Steady (S), Hysteresis (H), Measurement (M) and Drop (D). Transitions are labeled with the conditions L, F and R and the timers TJ and TD, and with the actions add, relax and drop.]

Parameter                                       Value
... 1)                                          2.0
Join-timer relaxation constant (< 1)            0.5
Minimum join-timer interval                     2500ms
Maximum join-timer interval                     60000ms
Detection-time estimator scaling terms          1, 2
Detection-time estimator filter constants       0.25, 0.25
Threshold used in making drop decision          5%
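The join-timer behavior implied by these parameters can be sketched as follows. The class and method names are hypothetical, and two points are assumptions: that the first value in the list (2.0) is the factor by which the join timer grows after a failed join experiment, which matches the doubling described in the text, and that relaxation simply multiplies the interval by the relaxation constant. The detection-time estimator and the drop threshold are not modeled.

```python
class JoinTimer:
    """Join-timer interval with multiplicative backoff and relaxation."""

    def __init__(self, backoff=2.0, relax_factor=0.5,
                 t_min_ms=2500.0, t_max_ms=60000.0):
        self.backoff = backoff            # assumed growth factor on failure
        self.relax_factor = relax_factor  # relaxation constant (< 1)
        self.t_min_ms = t_min_ms          # minimum join-timer interval
        self.t_max_ms = t_max_ms          # maximum join-timer interval
        self.interval_ms = t_min_ms       # start at the minimum interval

    def on_failed_join(self):
        """Grow the interval after a failed join experiment, up to the maximum."""
        self.interval_ms = min(self.interval_ms * self.backoff, self.t_max_ms)
        return self.interval_ms

    def on_relax(self):
        """Shrink the interval during relaxation, down to the minimum."""
        self.interval_ms = max(self.interval_ms * self.relax_factor,
                               self.t_min_ms)
        return self.interval_ms
```

Starting from the minimum interval, five consecutive failures are enough to reach the maximum interval, which is consistent with receivers that end up carrying out join experiments at the maximum timer interval after repeated failures.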

The protocol was tested under the same rate limits used previously for the basic control scheme. The "When Harry Met Sally" test sequence was broadcast under router rate limits of 500kbps, 310kbps, 200kbps, 100kbps and 50kbps. Note that the ideal distributions calculated in the basic scheme's case are no longer valid because of slight differences in the source output rate during different runs. We will follow the analysis methodology used for the basic control scheme, with a graphical data analysis followed by a performance analysis through the computed metrics.

1. Graphical Data Analysis

A 500kbps rate limit does not cause any packet loss, so the receiver should add layers quickly and get to subscription level 4. Figs. 54-58 show the obtained data. Figs. 54 and 55 show the ideal and observed subscription level sample paths. Figs. 56 and 57 show the ideal and measured bandwidth usage. Fig. 58 shows the packet losses measured at the receiver during the broadcast of the test sequence. The occasional packet losses in Fig. 58 are caused by CPU pegging at receiver VCSun3. Note that the large packet loss spike at second 220 in Fig. 58 is ignored because the RLM receiver moves into a hysteresis state when losses are first encountered. In our case, a receiver can stay in the hysteresis state for up to ten seconds; ten seconds is the observed average value of the detection timer.

RLM performed much better than the basic control scheme when faced with transient packet losses due to CPU pegging. The small glitch starting at time 114 seconds lasts for only 15 seconds. This is a huge improvement on the basic scheme, whose response was to drop one extra layer and stay below the ideal subscription level for approximately 80 seconds. RLM has a distinct advantage over the basic control scheme in that it is an event-driven protocol. RLM can react instantly to loss events, it can add layers more quickly (4 layers are added initially at 2.5 second intervals), and it can react instantly when the addition of a layer causes packet loss.

[Plot: subscription Level versus time (seconds).]

Fig. 54. Ideal Subscription Levels under a Router Rate Limit of 500kbps - Testing RLM

[Plot: subscription Level versus time (seconds).]

Fig. 55. Actual Receiver Subscription under RLM - Router Rate Limit of 500kbps

[Plot: Rate (Kbps) versus time (seconds).]

Fig. 56. Ideal Bandwidth Distribution under a Router Rate Limit of 500kbps - Testing RLM

[Plot: Rate (Kbps) versus time (seconds).]

Fig. 57. Observed Bandwidth Received under RLM - Router Rate Limit of 500kbps

[Plot: Percentage of Packets Lost over 1 second versus time (seconds).]

Fig. 58. Actual Packet Losses Recorded at the Receiver under RLM - Router Rate Limit of 500kbps

Figs. 59-63 show the data obtained from testing RLM at a router rate limit of 310kbps. Figs. 59 and 60 show the ideal and observed subscription level sample paths. Figs. 61 and 62 show the ideal and measured bandwidth usage. Fig. 63 shows the packet losses measured at the receiver during the broadcast of the test sequence.

Again, for the same reasons listed when describing the performance of the basic scheme under a rate limit of 310kbps, we expect the receiver to perform poorly. The target rate for the enhanced-PARC algorithm at "level 4" is 300kbps. Because the rate for level 4 exceeds 310kbps intermittently, the control protocol would have to react very quickly in order to achieve the "ideal" subscription levels at all times. Occasional packet losses at subscription level 4 due to CPU pegging complicate the issue even further.

Fig. 60 shows subscription levels that are for the most part lower than the ideal distribution, as we expected. The same applies when we compare the bandwidth received to the ideal bandwidth distribution (Figs. 61-62). Between time values 380 and 480, we can see how failed join experiments cause the join timer to be repeatedly doubled. When the join experiment succeeds at time value 480, the join timer of the new layer is already large enough that the receiver only gets to its ideal subscription level at around time value 540. This is typical of the behavior of RLM and of any receiver which does not know the exact available bandwidth.

Under the circumstances, RLM performs well. In contrast to the basic one-to-one scheme, failed join experiments cause packet losses during only three seconds. The offending layer is dropped instantly, not after ten seconds as in the case of the basic scheme. Packet losses over the experimental layer do not affect our receiver. The measured packet loss percentages after failed join experiments (as in Fig. 60 at seconds 160, 180, 240, 295, 390, 400 and 430) are measured over the subscription level that precedes and follows the failed join experiment.

[Figs. 59-63: plots for the router rate limit of 310kbps. Figs. 59 and 60 plot the subscription Level, Figs. 61 and 62 plot the Rate (Kbps), and Fig. 63 plots the packet loss percentage.]

Figs. 64-68 show the data obtained from testing RLM at a router rate limit of 200kbps. Like the basic scheme earlier, RLM performs well under this rate limit. Figs. 64 and 65 show the ideal and observed subscription level sample paths. Figs. 66 and 67 show the ideal and measured bandwidth usage. Fig. 68 shows the packet losses measured at the receiver during the broadcast of the test sequence.

Comparing the ideal subscription sample path (Fig. 64) to the receiver's actual sample path (Fig. 65), and the ideal subscription bandwidth (Fig. 66) to the actual received bandwidth (Fig. 67), indicates visually that the protocol performed very well. As in the basic scheme's case under a rate limit of 200kbps, the actual and ideal subscription sample paths follow the same general outline: subscription at level 4 during the time period 0-110 seconds, a drop to level 2 at time 110 seconds, a return to level 3 at time 200 seconds, and similar behavior during the rest of the experiment.

Packet losses are not as frequent as in the case of the basic scheme. RLM reacts much faster to failed join experiments, as can be seen in Fig. 68. Packet losses after a failed join experiment last for only three seconds after the receiver instantly drops the offending layer. The three-second packet loss period is much better than the thirteen-second packet loss period that follows a failed join experiment when the receiver is controlled by the basic scheme.

[Figs. 64-65: ideal and observed subscription level (Level) versus time, RLM at a 200kbps rate limit]

[Figs. 66-67: ideal and measured bandwidth usage (Rate, Kbps) versus time, RLM at a 200kbps rate limit]

[Fig. 68: packet losses (0-100) versus time, RLM at a 200kbps rate limit]

Figs. 69-73 show the data obtained from testing RLM at a router rate limit of 50kbps. Figs. 69 and 70 show the ideal and observed subscription level sample paths. Figs. 71 and 72 show the ideal and measured bandwidth usage. Fig. 73 shows the packet losses measured at the receiver during the broadcast of the test sequence.

As we stated when studying the same case under the basic scheme protocol, the rate on the small Y-band layer plus the small UV-band layer (subscription level 1) is nearly constant at around 38kbps (except when very low motion is present, as in the first 120 seconds of the video sequence). In this case, adding the medium Y-band layer would take the overall bandwidth used beyond 50kbps. Thus we ideally expect the receiver to subscribe at level 1 after the first two minutes of the sequence.

Fig. 70 shows a behavior almost identical to what we expected. After the initial low motion part of the sequence (time period 0 to 120 seconds), the receiver is carrying out a join experiment at level 3; the experiment fails and the receiver drops from level 2 to level 1 after a hysteresis period. After time value 140, the receiver is subscribed at level 1 and carries out periodic join experiments at the maximum timer interval, set to 60 seconds. Note that occasional packet losses, when a receiver is in the steady state, reset the join timer. This explains why the join experiments are not evenly spaced after time value 140.

Fig. 73 shows typical steady state behavior of RLM. When faced with packet losses in the steady state, an RLM receiver moves into the hysteresis state and ignores all packet losses for a period of time equal to the detection timer (in our case about ten seconds). Between seconds 120 and 140, our receiver drops from subscription level 3 to subscription level 1. Packet loss percentages as high as 98% are ignored for ten seconds before causing a layer drop. A possible improvement for RLM is to drop a layer without going into the hysteresis waiting period when packet losses exceed a threshold, for example 30% over two seconds.
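The suggested improvement can be illustrated with a short sketch. It is a simplified model under stated assumptions, not RLM itself: the function name `drop_time` and the per-second loss trace are hypothetical, and after the hysteresis period the model drops the layer whenever any loss is still present. The ten-second detection timer and the 30% over two seconds threshold are the values mentioned in the text.

```python
DETECTION_TIMER_S = 10      # hysteresis length, "about ten seconds" in the text
FAST_DROP_THRESHOLD = 0.30  # example threshold suggested in the text
FAST_DROP_WINDOW_S = 2      # example window suggested in the text


def drop_time(loss_per_second, fast_drop=False):
    """Return how many seconds after the first loss a layer is dropped.

    loss_per_second holds one loss fraction (0.0-1.0) per second, starting
    at the second in which loss was first detected. Returns None when the
    trace ends, or the loss clears, before a drop decision is made.
    """
    for t in range(len(loss_per_second)):
        elapsed = t + 1
        if fast_drop and elapsed >= FAST_DROP_WINDOW_S:
            window = loss_per_second[elapsed - FAST_DROP_WINDOW_S:elapsed]
            if sum(window) / len(window) > FAST_DROP_THRESHOLD:
                return elapsed  # heavy loss: skip the hysteresis wait
        if elapsed >= DETECTION_TIMER_S:
            # Hysteresis expired. Simplification: drop if loss persists.
            return elapsed if loss_per_second[t] > 0.0 else None
    return None


heavy_loss = [0.98] * 12  # sustained loss similar to the 50kbps experiment
print(drop_time(heavy_loss))                  # 10
print(drop_time(heavy_loss, fast_drop=True))  # 2
```

In this model, light loss (for example 5% per second) still waits out the full hysteresis period, so the fast path only changes behavior under severe congestion.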

[Figs. 69-70: ideal and observed subscription level (Level) versus time, RLM at a 50kbps rate limit]

[Figs. 71-72: ideal and measured bandwidth usage (Rate, Kbps) versus time, RLM at a 50kbps rate limit]

[Fig. 73: packet losses (0-100) versus time, RLM at a 50kbps rate limit]

2. Performance Metrics

Table XVII shows the performance of RLM for each bandwidth limit case. When compared to the ideal case in terms of cumulative bandwidth performance, the protocol fares well except when the rate limit is set to 310kbps, which is the case we discussed earlier; even then, the protocol performs at 73%. Otherwise, the protocol's performance varies between 93% and 100%.

Table XVII. Performance of RLM Protocol

T = Rate Limit (kbps)                            500     310     200     100      50
Cr = Cumulative Received Bit Count (kbits)    144960   99578   81632   31592   23685
Ci = Cumulative Ideal Bit Count (kbits)       145300  142550   87473   32651   23979
Cl = Cumulative Lost Bit Count (kbits)          1131    5274    4423    4365    6060
Performance - Cr/Ci                            0.996   0.726   0.933   0.968   0.988
Efficiency - Cr/(R Ttotal)                     0.487   0.531   0.675   0.514   0.758
Loss Ratio - Cl/(Cl + Cr)                      0.008   0.050   0.051   0.124   0.204
Mean of CPD (seconds)                           0.49    2.32    2.50    4.17    6.45
Standard Deviation of CPD (seconds)             0.94    2.91    4.20    4.72    7.39
Mean of CPI (seconds)                          28.28   41.85   27.72   36.02   44.19

The efficiency in terms of bandwidth use varies between 48.7% and 75.8%. This suggests that more evenly spaced layers or extra rate adaptation schemes are needed if good usage of the available bandwidth is sought.
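The three ratio metrics of Table XVII can be recomputed from the cumulative bit counts with a few lines of code. The sketch below is illustrative only: the function names are ours, R is taken to be the rate limit in kbps, and the total duration Ttotal is not listed in the table, so the 600-second value used for the efficiency call is an assumption.

```python
def performance(received_kbits, ideal_kbits):
    """Cr / Ci: share of the ideal cumulative bit count actually received."""
    return received_kbits / ideal_kbits


def efficiency(received_kbits, rate_limit_kbps, total_time_s):
    """Cr / (R * Ttotal): share of the rate-limited capacity actually used."""
    return received_kbits / (rate_limit_kbps * total_time_s)


def loss_ratio(lost_kbits, received_kbits):
    """Cl / (Cl + Cr): share of received-plus-lost bits that were lost."""
    return lost_kbits / (lost_kbits + received_kbits)


# 200kbps column of Table XVII
cr, ci, cl = 81632, 87473, 4423
print(round(performance(cr, ci), 3))  # 0.933
print(round(loss_ratio(cl, cr), 3))   # 0.051

# Ttotal is an assumed value (600 s), so this figure is only indicative.
print(efficiency(cr, 200, 600))
```

The first two printed values reproduce the performance and loss ratio entries of the 200kbps column.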

RLM shows a significant improvement over the basic scheme when we look at the cumulative loss ratio. At all rate limits, the loss ratio of RLM is much better than that of the basic one-to-one scheme. The maximum loss ratio is 20.4%, compared to 65.9% for the basic one-to-one scheme under the same rate limit of 50kbps.

The average CPD is about 3.5 seconds with a standard deviation of about 4.5 seconds, as is the case for rate limits of 200 and 100kbps. The average CPD values in the 500kbps and 310kbps cases are smaller because small congestion periods caused by CPU pegging are averaged in with actual router congestion periods. More scattered small packet losses in the 500kbps case meant that congestion periods were smaller and the CPI was larger than their counterparts in the basic one-to-one case. The average CPD value for the 50kbps case is unusually large because of one large congestion period occurring between time values 126 and 157 in Fig. 73. Again, the average CPD and its standard deviation are a measure of the latency of the protocol in reacting to packet loss, added to the pruning delay at the router. RLM reacts faster than the basic scheme to packet loss events.

The mean value of the CPI indicates that a receiver will spend 35.6 seconds on average between loss periods. This value is quite high considering that a join experiment is attempted, in the absence of packet loss, at least once every 60 seconds. This metric shows a 14% improvement over the basic scheme.

In conclusion, RLM significantly betters the performance of the basic control scheme. It shows excellent performance when the layers are moderately bursty (93% at 200kbps, 97% at 100kbps, and 99% at 50kbps), and acceptable performance when the source has a bursty behavior (73% at 310kbps). Its loss ratio is as low as 5% in the typical cases of rate limits of 310kbps and 200kbps. The high loss ratio at 50kbps is mainly due to the slow layer drop mechanism of RLM in steady state, even when faced with 95% packet losses; even then, it is significantly better than the 65.9% loss ratio of the basic one-to-one scheme in the same conditions.

G. Qualitative Description of the Perceived Image

We first describe the general look and feel of each subscription level, then describe the perceived picture quality when a rate control algorithm is run.

1. General

Since the CafeMocha encoder truncates the four lower bits of the data in each color band datum, we expected scenery that has many shades of the same color to be rendered poorly and scenery with high contrast to be rendered well; experimental visual appraisal confirmed our intuition.

At subscription level 0, a subscribed user receives Y-band update blocks of the base 160x120 frame under a rate limit of 20kbps. Picture updates of successive portions of the screen can substantially degrade the perceived image if very high motion is present. This problem is particularly obvious when the background of the image changes and most blocks are updated. On the other hand, if reasonably low motion is exhibited, as in the case of a talking head scene, the received picture has an acceptable quality. Fig. 74 shows how movement can cause successive parts of the receiver's frame to be updated at subscription level 0.

Fig. 74. Small Y-band Layer, 160x120 greyscale, Level 0

Subscription level 1 adds color to the blocks updated at subscription level 0. The same comments can be made. The picture is clearer through the presence of colors.

At subscription level 2, Y-band information for the whole frame is updated, whereas only the color information for the blocks updated at level 1 is refreshed. This leads to a degradation in the perceived quality when some blocks whose Y-band data alone is updated have the wrong colors. We noticed that in scenes where the background is stable, most blocks tended to have the correct colors. On the other hand, when the camera moved, some color discrepancies were obvious.

At subscription level 3, all Y-band and color information is received for the base 160x120 color picture, with good perceived quality. Fig. 75 shows the greyscale version of level 3.

Fig. 75. Small and Medium Y-band Layers, 160x120 greyscale

At subscription level 4, difference blocks are received. A 320x240 picture is reconstructed, and the color information of the base picture is upsampled to cover the larger frame. Fig. 76 shows the greyscale 320x240 frame reconstructed when the receiver decodes all the Y-band layers.

Fig. 76. Small, Medium and Large Y-band Layers, 320x240 greyscale

2. Real-Time Layer Add/Drop

As was shown in the previous sections, the basic control scheme causes higher packet loss than RLM. The basic scheme also took more time to drop an offending layer and continued displaying at the offending layer's level of visualization for 10 extra seconds. In practice, the visual quality under the basic scheme suffered greatly under packet loss. Since no layer prioritization mechanism was used, some ghosting was observed when pyramidal blocks were updated without a corresponding refresh of their base block counterparts. On the other hand, RLM reacted much more quickly to packet losses and the visual quality of the rendered picture was much better.

Going up from level 1 to 2 brings full 160x120 Y-band updates to the displayed picture. Perceived picture quality is much better, although it might be a good idea to display only Y-band information in high motion scenes. This was always a dilemma when choosing a layer add/drop order: would it be better to add progressive color updates (small UV-band layer) followed by the total update of color information (small UV-band and medium UV-band layers), or to add the two base color layers simultaneously after adding the two base Y-band layers? We chose the current add/drop layer order to maximize the number and the bandwidth distribution of the subscription levels.

Going up from level 2 to 3 has little noticeable effect in still background scenes but gives a much better rendition of the frame in higher motion scenes. Perceived image quality at size 320x240 is much higher than that of a 160x120 picture. Going up from level 3 to level 4 increases the perceived quality of the picture considerably.
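The relationship between subscription levels and layers described in this section can be summarized in a small lookup. The sketch below is only an illustration: the layer labels and helper names are ours, and level 5, which is not described above, is taken to be the color counterpart of the large Y-band layer.

```python
# Layers in join order; each subscription level adds one layer.
JOIN_ORDER = [
    "small Y",    # level 0: partial 160x120 greyscale updates
    "small UV",   # level 1: color for the blocks updated at level 0
    "medium Y",   # level 2: Y-band updates for the whole 160x120 frame
    "medium UV",  # level 3: full 160x120 color picture
    "large Y",    # level 4: difference blocks for the 320x240 picture
    "large UV",   # level 5: assumed color counterpart of the large layer
]


def layers_for_level(level):
    """Return the layers a receiver is subscribed to at a given level."""
    if not 0 <= level < len(JOIN_ORDER):
        raise ValueError(f"level must be between 0 and {len(JOIN_ORDER) - 1}")
    return JOIN_ORDER[: level + 1]


def add_layer(level):
    """Move one level up, staying at the top level if already there."""
    return min(level + 1, len(JOIN_ORDER) - 1)


def drop_layer(level):
    """Move one level down; drops happen in the reverse of the join order."""
    return max(level - 1, 0)


print(layers_for_level(2))  # ['small Y', 'small UV', 'medium Y']
print(drop_layer(3))        # 2
```

Keeping the order in a single list makes it easy to experiment with the alternative add/drop orders discussed above by reordering the entries.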

CHAPTER VI

CONCLUSION AND FUTURE WORK

We have developed a flexible 6-layer codec and tested a distributed control protocol to control multilayered multimedia sessions.

CafeMocha, the videoconferencing tool developed by Sazzad and Brown at Texas A&M, was enhanced from a two layer codec to a six layer codec. Additional layers were created by splitting the previous base layer data into two layers, giving us a total of three black and white layers: small, medium and large. A receiver can choose between a 160x120 pixel frame partially refreshed under a small layer rate limit chosen at the source, a fully refreshed 160x120 pixel frame, or a 320x240 pixel frame if it subscribes to all three layers. We then added color data corresponding to each black and white layer over three additional layers. A receiver can enhance the previously defined black and white resolutions by adding color data layers. In our tests we chose the following join layer order: start with the small black and white layer, then add its color counterpart, add the medium black and white layer, then add its color counterpart, add the large black and white layer, then add its color counterpart. Layer drop actions are done in the reverse of the join order.

A basic control scheme, which is a simplified version of the Receiver-driven Layered Multicast protocol (RLM) proposed by McCanne, was devised, developed, implemented and tested. RLM was also implemented and tested. Our tests were carried out in a one-to-one testbed configuration; however, CafeMocha can be run in a one-to-many environment with full RLM support.

New metrics were defined whereby layer control scheme subscription paths are compared to the "ideal" layer subscription sample path under the same rate limits. RLM performance values of 99.6% in comparison to the "ideal" case were recorded when the overall received bit rate was nearly constant. The performance was still good at 73% when the ideal highest subscription layer was bursty. Cumulative packet loss ratios as low as 5% were recorded in bursty conditions. RLM was found to be a good control mechanism for moderately bursty layered streams. The basic control scheme performed well in terms of received bandwidth usage but had an unacceptable loss ratio that could exceed 30% in bursty conditions.

A subject of future research is the control of layer rates within each stream. The bandwidth allocated for the small layer and the bandwidth allocated for the whole stream can be changed at will. Feedback from receivers can be used to control the target rate of the small Y-band layer and the target rate of the Enhanced-PARC algorithm described in Section G of Chapter III.

Prioritizing the base layers (small and medium grayscale) would make sessions more resilient to transient packet losses (before adaptation takes place). However, prioritization conflicts with the scalability of RLM [8].

One possible solution is to use more active communication between receivers. In addition to announcing join experiments, receivers can also announce the outcome of their join experiments. Control messages should not be allowed to flood the network. One possible solution might be to give inter-communication packets a small time-to-live (2 or 3 hops). Receivers can alternatively try to identify other users who are experiencing the exact same network conditions (similar patterns of packet losses per layer). These receivers can then cluster together and benefit from individual member join experiments.

In this research, we have selected a pre-defined join/drop layer order when adding and dropping layers. As shown in Table 22, a user can enhance the quality of its received picture by either going to the right (increasing the quality of the UV-band data), or by going down (increasing the quality of the Y-band data).
An interesting area for future research is the development of the layer control stream to take into account the multiple join/drop options available to receivers.

In conclusion, regular RLM appears to cope well with bursty layers. It performed well in most of our tests and showed loss ratios as low as 5% in typical bursty conditions.
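To make the idea of multiple join options concrete, the sketch below models a receiver's state as a pair (Y quality, UV quality) and lists the moves available from it: "down" raises the Y-band quality and "right" raises the UV-band quality. This is a hypothetical model for illustration only; in particular, the rule that a color layer can be joined only when the Y-band layer of the same size is already subscribed is our assumption, not a constraint stated in the thesis.

```python
Y_LEVELS = ["small", "medium", "large"]           # index 0..2
UV_LEVELS = ["none", "small", "medium", "large"]  # index 0..3


def join_options(y, uv):
    """List the (y, uv) states reachable by adding exactly one layer."""
    options = []
    if y + 1 < len(Y_LEVELS):
        options.append((y + 1, uv))  # go down: better Y-band quality
    # Assumed rule: UV size may not exceed the subscribed Y size.
    if uv + 1 < len(UV_LEVELS) and uv + 1 <= y + 1:
        options.append((y, uv + 1))  # go right: better UV-band quality
    return options


# The fixed join order used in our tests, written as (y, uv) states.
FIXED_PATH = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]

print(join_options(0, 0))  # [(1, 0), (0, 1)]
print(join_options(2, 3))  # []
```

Under this model the pre-defined join order is one of several valid paths through the grid, and a layer control scheme could choose among the options returned at each step.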

REFERENCES

[1] H. Eriksson, "MBONE: The multicast backbone," Communications of the ACM, vol. 37, pp. 54-60, Aug. 1994.

[2] Vxtreme Inc., website at www.vxtreme.com; Internet; accessed Aug. 28, 1997.

[3] T. Brown, "A layered multicast packet video system," M.S., Texas A&M University, May 1996.

[4] C. G. Schroeder, "Increased network efficiency for variable video streams in an integrated services packet network environment," M.S., Texas A&M, May 1996.

[5] S. Sazzad, "Design of a three resolution coder," M.S., Texas A&M University, Dec. 1995.

[6] T. B. Brown, P. E. Cantrell, and J. D. Gibson, "Multicast layered video teleconferencing: Overcoming bandwidth heterogeneity," in Proc. First Annual Telecom. Conf., pp. 145-152, Austin, TX, Oct. 1996.

[7] T. Brown, S. Sazzad, C. Schroeder, P. Cantrell, and J. Gibson, "Packet video for heterogeneous networks using CU-SeeMe," in Proc. ICIP'96, pp. 9-12, Lausanne, Switzerland, Sep. 1996.

[8] S. McCanne, V. Jacobson and M. Vetterli, "Receiver-driven layered multicast," in Proc. SIGCOMM '96, pp. 117-130, Stanford, CA, Aug. 1996.

[9] S. McCanne, "Scalable compression and transmission of Internet video," Ph.D. dissertation, University of California at Berkeley, Berkeley, CA, Dec. 1996.

[10] S. Deering, "IP multicast and the MBone: Enabling live, multiparty, multimedia communication on the Internet," talk given at CERN, Geneva, Feb. 1996.

[11] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A transport protocol for real-time applications," Internet Request for Comments, Jan. 1996; available at http://ds.internic.net/rfc/rfc1889.txt; Internet; accessed Aug. 28, 1997.

[12] B. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin, "Resource reservation protocol (RSVP) - Version 1 functional specification," ACT Working Group, Internet Draft, Dec. 1997; available at ftp://ftp.ietf.org/internet-drafts/draft-ietfrsvp-spec-16.txt; Internet; accessed Aug. 28, 1997.

[13] R. Cogger, CU-SeeMe, Cornell University. Software available at ftp://cuseeme.cornell.edu; Internet; accessed Aug. 28, 1997.

[14] Ron Frederick, Network Video, Xerox Palo Alto Research Center. Software available at ftp://ftp.parc.xerox.com/pub/net-research; Internet; accessed Aug. 28, 1997.

[15] Thierry Turletti, INRIA Video Conferencing System, Institut National de Recherche en Informatique et en Automatique. Software available at http://www.inria.fr/rodeo/Thierry.Turletti/ivs.html; Internet; accessed Aug. 28, 1997.

[16] Van Jacobson and Steven McCanne, Visual Audio Tool, Lawrence Berkeley Laboratory. Software available at ftp://ftp.ee.lbl.gov/conferencing/vat; Internet; accessed Aug. 28, 1997.

[17] Steven McCanne, Video Conferencing Tool, Lawrence Berkeley Laboratory. Software available at ftp://ftp.ee.lbl.gov/conferencing/vic; Internet; accessed Aug. 28, 1997.

[18] D. Ferrari, "Client requirements for real-time communication services," IEEE Comm. Mag., vol. 28, pp. 65-72, Nov. 1990.

[19] Madhu Sudan and Nachum Shacham, "Gateway based approach for conducting multiparty multimedia sessions over heterogeneous signaling domains," Technical Report, Computer Science Laboratory, SRI International; available from http://www.csl.sri.com/reports/postscript/csl-96-03.ps; Internet; accessed Aug. 28, 1997.

[20] Elan Amir, Steven McCanne, and Hui Zhang, "An application level video gateway," in Proc. ACM Multimedia '95, pp. 255-265, San Francisco, CA, Nov. 1995.

[21] S. Deering, "Internet multicast routing: State of the art and open research issues," MICE Seminar, Stockholm, Oct. 19, 1993.

[22] M. Speer, S. McCanne, "RTP usage with layered multimedia streams," Internet Draft, Dec. 23, 1996; available at ftp://ftp.ietf.org/internet-drafts/draft-speeravt-layered-video-02.txt; Internet; accessed Aug. 28, 1997.

[23] D. Waitzman, C. Partridge, S. Deering, "Distance vector multicast routing protocol," Internet Request for Comments, Nov. 1988; available from http://ds.internic.net/rfc/rfc1075.txt; Internet; accessed Aug. 28, 1997.

[24] S. Deering, R. Hinden, "Internet protocol, version 6 (IPv6) specification," Internet Request for Comments, Dec. 1995; available from http://ds.internic.net/rfc/rfc1883.txt; Internet; accessed Aug. 28, 1997.

[25] S. Y. Cheung, M. Ammar, and X. Li, "On the use of destination set grouping to improve fairness in multicast video distribution," in Proc. INFOCOM 96, pp. 553-560, San Francisco, Mar. 1996.

[26] IP Multicast Initiative Web-Page, located at: http://www.ipmulticast.com; Internet; accessed Aug. 28, 1997.

[27] "Encoding parameters of digital television for studios," CCIR Rec. 601-602, Recommendations of the CCIR, vol. XI, Part 1, pp. 95-104, 1990.

[28] Internet Requests for Comments (RFCs) available at http://ds.internic.net/rfc/rfcnnnn.txt, where nnnn is the RFC number; Internet; accessed Aug. 28, 1997.

[29] IETF Internet Drafts available from ftp://ftp.ietf.org/internet-drafts/; Internet; accessed Aug. 28, 1997.

[30] IPNG Working Group Meeting, 38th IETF Proceedings, April 1997, Memphis, TN; Report available from ftp://ftp.ietf.org/ietf/ipngwg/ipngwg-minutes97apr.txt; Internet; accessed Aug. 28, 1997.

[31] MBONE Mailing List, archive available at ftp://ftp.isi.edu/mbone/mbone.mail.*; Internet; accessed Aug. 28, 1997.

[32] "When Harry met Sally," Columbia Pictures, Hollywood, CA, 1989.

VITA

Ralph Akram Gholmieh was born in Fort Lee, Virginia in 1972. He obtained his Bachelor of Science in Electrical Engineering from the Saint-Joseph University at Beirut in 1995. He can be reached through his email address [email protected]. His future address is 4149 Nobel Drive #34, San Diego, California 92122.

The typist for this thesis was Ralph Akram Gholmieh.
