Rate Adaptation for Improved Audio Quality in ... - Semantic Scholar


Rate Adaptation for Improved Audio Quality in Wireless Networks

Leann Christianson and Kevin Brown
California State University, Hayward
Department of Math and Computer Science
Hayward, CA 94542-3092

Abstract- As public and corporate interest in wireless solutions increases, the demand for consistent and acceptable Quality of Service for real-time, high-bandwidth data will also increase. Low bandwidths, high error rates, and transient traffic conditions are common in the wireless environment. These characteristics cause packet delay and loss, which negatively affect received quality. To improve the quality of audio transmission over wireless networks, we propose that audio applications adapt to current network conditions by reducing encoding rates to match available bandwidth. In this paper, we present the design and implementation of a source initiated adaptive audio application.

I. INTRODUCTION

The performance of network audio applications such as audio conferencing, Internet telephony, and digital audio broadcast tools is directly linked to the current conditions of the host network. These particular applications are unique in that they require high bandwidth channels and constant delivery rates. To maintain intelligibility, coherence, and smooth playback, audio data must reach the receiver at a constant rate. Limited bandwidth, excessive traffic, and congestion cause packet loss, delay, and increased jitter, decreasing the quality of the received audio signal. Wireless networks typically provide much lower bandwidths than traditional wired networks. Hence, they are more susceptible to congestion and overloading problems, which lead to unacceptable application performance.

To improve the perceptual quality of audio network applications running in wireless environments, the problems of limited bandwidth and congestion must be addressed. In this paper, we propose that audio applications respond to network conditions and adapt their sampling and transmission rates to current conditions. We maintain that receivers prefer lower sample resolution and continuous delivery to full sample resolution marred by delay and losses. Current studies have focused on adaptive video applications for the Internet. Our study is unique in that it focuses on adaptive audio and listener preference. We limit the scope of our study to adaptive methods for unicast delivery of general audio in the wireless environment. Commercial applications that would benefit from such adaptive methods include CD-quality audio streaming, audio tracks to accompany video-on-demand, and audioconferencing.

The paper is organized as follows. Section 2 presents background and motivation for adaptation in wireless environments. Section 3 presents design considerations, and Section 4 presents implementation. Experimental results and listener preferences regarding acceptable quality of service are presented in Section 5. Section 6 closes with future directions and conclusion.

II. BACKGROUND & MOTIVATION

Congestion and interference in wireless networks result in delayed and lost packets, which degrade the quality of the received audio stream. Delay introduces unnatural silence gaps into the audio stream, which listeners find extremely annoying. In speech delivery, lost packets may cause portions of words to be dropped, reducing intelligibility. When transmitting audio data, continuous delivery with limited loss is of primary importance in providing high quality of service.

The mobility of hosts in the wireless environment adds to the complexity of providing constant delivery rates and quality of service. Complicated channel allocation strategies must be used to dynamically allocate bandwidth to a range of hosts, the number of which will fluctuate regularly. Assuming a lack of resource reservation and priorities, a crowded LAN’s only option is to steal bandwidth from currently running applications, thus interfering with and degrading those applications' services. Overlay networks offer connections to several wireless networks (e.g., cellular, wireless LAN, satellite) simultaneously, further complicating the media allocation problem by increasing the choice of networks on which data transfer may occur. As user preference for wireless devices increases, attention will be focused on the types of service that these networks provide. Applications that provide consistent quality of service and that can adapt to transient traffic conditions will be favored.
As it is impossible to know the state of any network a priori without invoking delays by probing, and as network conditions are in a constant state of flux, it is important to monitor the network while the application is running. This monitoring process should be unobtrusive so that it does not consume too much additional bandwidth and further exacerbate the problem. In the following paragraphs, we present related work concerning solutions to delay and loss of real-time data as well as application adaptation in wired environments.

A. Related Work

Effective techniques for handling small amounts of packet delay, jitter, and loss have been studied. Delay jitter can be somewhat controlled with the use of “leaky bucket” techniques, which employ buffers to allow variable bitrate streams to be played to the audio device at a constant rate. Such a technique has been described in [1,2,3]. Packet loss has been addressed by forward error correction techniques in [2,4]. Video application adaptation has been studied in several works [1,3,5]. At the time of this writing, however, no studies have focused on general audio adaptation, nor have they focused on problems specific to wireless networks. We believe that approaches to general audio adaptation are similar to those of video; therefore, it is useful to consider common video adaptation approaches. These approaches can be categorized as receiver or sender initiated, and they often rely on transcoding. We review source initiated, receiver initiated, and transcoding techniques in the following subsections.

B. Adaptation Techniques

Source adaptation requires feedback from the receiver concerning the state of the audio stream as it reaches the receiver. The source reacts to information that packets are being delayed and lost by reducing the bandwidth of the transmitted audio stream. This method has been presented in [1,5]. As multimedia data is commonly sent over the MBONE using RTP, source initiated methods often rely on RTCP receiver reports for gathering feedback.

Receiver driven adaptation requires the source to transmit data in a layered or hierarchical fashion. It also requires the receiver to maintain information regarding the state of the network. Using this method, the source sends a designated number of hierarchically encoded data streams to the receiver, which combines the streams for playout.
More streams represent higher quality (i.e., higher bandwidth) data. A hierarchical encoder for general audio has yet to be developed.

Transcoding involves decoding and re-encoding a data stream in order to represent it with a lower rate method and to produce a lower bandwidth stream. Typically, a transcoder is set up as a gateway application to reduce bandwidth for receivers who are suffering from excessive packet loss or to connect higher speed networks to lower speed subnets. Transcoding applications for video and speech have been presented in [1,2,3]. Transcoding would appear to be an attractive option for linking wired and wireless networks. There are several disadvantages, however. Lower rate encoders are algorithmically more complex than their higher rate counterparts. Complexity causes additional coding delays, which may further increase if the audio stream must travel through several transcoders on its way to the receiver. Another issue is deciding when to invoke transcoding. This is particularly a problem when the audio stream is being multicast and carried over a wireless network. In this situation, it is likely that each receiver’s quality will vary based on proximity, power, and connection device. However, if a few receivers are suffering excessive loss, transcoding should occur. Consequently, the other receivers who are not experiencing loss will have to accept a lower bandwidth stream.

III. DESIGN

There are four issues to consider when developing a general audio adaptation application for the wireless environment. These include:

1. Type of adaptation (e.g., source or receiver initiated)
2. Encoding technique
3. Feedback mechanism
4. Adaptation algorithm

A. Type of Adaptation

For our design, we chose a source initiated adaptation scheme. This decision was based on the recognition that the source has an advantage over the receiver in that it has “self knowledge” regarding its capabilities and limitations. For example, the source is aware of its available encoding methods and their bandwidth requirements as well as its buffer sizes and CPU speed. A receiver without this knowledge might request adaptation to a lower bandwidth stream which is beyond the functionality of the source application. It is wasteful to send parameters describing the source capabilities for the sole purpose of receiver initiated adaptation; hence, we chose the sender initiated approach. This choice also reduces receiver complexity, allowing the receiver to devote its resources to continuous playout. As mobile hosts tend to be battery powered devices, this power efficiency is an added advantage, which may also lead to improved quality of service.

B. Encoding Technique

Characteristics to consider when choosing a general audio encoder include encoding rate, delay, and perceptual quality. For our study, we required a simple method that would provide variable rate, real-time encoding without the aid of additional hardware. We chose PCM, which consists of sampling coupled with quantization. PCM is simple and performs well if the quantization routine is designed to match the incoming data.


With the accompaniment of a prediction process to further reduce sample to sample correlation, an easily adaptable encoder can be designed.

C. Feedback

Of foremost importance to adaptation is the feedback mechanism. This mechanism should not be so obtrusive as to require excessive bandwidth that could better be used for data transfer. For a feedback mechanism, we chose to use RTCP receiver reports. At the transport layer, the Real-time Transport Protocol (RTP) [6] is used to support multimedia traffic on the Internet. RTP can be implemented on top of UDP/IP or ATM, and it takes advantage of the MBONE. Using unreliable IP multicast, the MBONE provides bandwidth-efficient distribution of real-time multimedia data by eliminating redundant packet transmissions.

RTP is made up of two components: a real-time data transfer protocol (RTP) and a control protocol (RTCP). The RTP Control Protocol (RTCP) performs quality of distribution monitoring, intermedia synchronization, and participant identification. Quality of distribution monitoring is accomplished via sender and receiver status reports, which each participant generates periodically and multicasts to every other participant of the RTP session. Sender reports (SR) include the SSRC ID for the data source and the total number of packets and octets sent since the source started transmitting. Receiver reports (RR) are generated by each receiver and indicate its current loss ratio, jitter, and highest sequence number received for each source from which it has received data. Using these statistics, the sender can determine the state of the network and make adjustments in data delivery if necessary.

D. Adaptation

The goal of the adaptation algorithm is to maintain delivery of acceptable quality audio in the presence of varying conditions. The primary issue, therefore, is when and how to adapt. It is fairly obvious that the encoding level should decrease when the network is overloaded; however, the question of when the encoding can be safely increased is more difficult. An additional consideration is the control of oscillations between successive encoding levels. A common adaptation algorithm is the linear increase, exponential decrease method. Using this method, an acceptable loss rate is established and RTCP receiver reports are monitored. When loss rates exceed a previously established threshold, bandwidth is reduced by decreasing the encoding rate. The linear increase, exponential decrease method prevents congestion; however, it does not effectively control oscillations, which can occur between successive encoding levels.

Another adaptation approach is to attempt to match the transmission bandwidth to the estimated amount of bandwidth that would be used in a TCP connection. Using this technique, bandwidth usage is calculated based on maximum packet size, round trip travel time, and loss rate.

The last adaptation issue concerns the level of encoding that the application should use to begin audio delivery. To address this issue, we propose the use of profiles containing information regarding user preference based on connection device, personal taste, and common network conditions for particular times of day. Statistics can be gathered from previous sessions which, when coupled with case-based reasoning, can guide the choice of start-up parameters and acceptable loss rate thresholds in order to maintain performance guarantees.

IV. IMPLEMENTATION

Our study focuses on adaptive audio encoding for unicast delivery on wireless networks. We chose the Robust Audio Tool (Rat) version 3.0.24, developed by Hardman and Kouvelas at University College London [7], as our base application, to which we added enhancements. Rat was designed for audio unicast or multicast transmission over the Internet. Rat, however, does not allow dynamic adaptation by the source after start-up. Its features are sufficient for handling a very small number of packet losses, but are not robust enough for times of heavy traffic, nor are they adequate for maintaining consistent quality of service when used in a wireless environment. For our experiments, packet recovery was not invoked. Instead, we attempted to improve quality and reduce loss through bandwidth adaptation. RTP was used as the feedback mechanism for establishing network conditions and driving adaptation. Tests were executed to evaluate the adaptation process, and listening tests were conducted to establish acceptable quality levels, acceptable loss rates, and overall listener preferences.

The Rat 3.0.24 application provides 16 bit sampling of audio data at 8 kHz. We chose to use linear PCM encoding for all encoding levels because it enabled the resultant quality of each level to be compared without taking into consideration the peculiarities of different encodings. Six levels of quantization, representing 16, 14, 12, 10, 8, and 4 bits per sample, were implemented. The encoding routine receives a value from the feedback mechanism representing the quantization level needed to match current traffic conditions.

As our primary focus was on encoding and received quality, we adopted a simple adaptation algorithm that we plan to enhance in future work. Before transmission, the source chooses values for several parameters that will later be used to drive the adaptation process. These parameters include initial encoding level, expected allocated bandwidth, maximum tolerable loss, and minimum allowed perceivable loss. RTCP receiver reports are received by the source at periodic intervals, and statistics reflecting the percentage of packets expected, received, and lost are evaluated. We chose to delay adaptation until three consecutive RTCP receiver reports were received, each indicating that the current loss rate was above the source’s specified maximum loss rate. When this occurred, the encoding level was decreased linearly to the next encoding level. The encoding rate is increased using the same model. If the RTCP reports indicate that the current loss rate is below the minimum loss rate specified by the source, encoding will increase linearly to the next higher level.

For decoding, it was necessary to create a means of communicating changes in encoding level to the receiver. We chose to place this information in the payload section of the audio data packet based on recommendations presented in the RTP specification [6]. At the receiver, audio data is passed to the decoding routine, which examines the control byte, unpacks the data, and dequantizes it through a table lookup. This routine returns the audio data to a 16-bit format for playback.

V. RESULTS

A 300 MHz Pentium II PC and a 150 MHz Pentium PC, both running Red Hat Linux 4.2, were used for testing purposes. These machines were situated at a distance of approximately 0.5 miles from each other. All transmission and reception between these machines was done in unicast mode and took place in the early evening hours. In order to simulate wireless conditions, bandwidth was limited via application start-up control parameters. Restricted bandwidth was simulated with an available bandwidth parameter, and the Rat drop option was used to randomly drop packets at the source at the specified rate.
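The adaptation rule described in Section IV (three consecutive receiver reports above the maximum loss trigger a one-level decrease) can be illustrated with a minimal Python sketch. This is not the code added to Rat; the names `RateAdapter`, `simulate`, and `bitrate_kbps` are hypothetical. Two assumptions are made for illustration only: increases are also taken to require three consecutive reports below the minimum loss (the text says only that the "same model" is used), and loss is modeled crudely as the fraction of the encoded rate that exceeds the available bandwidth.

```python
from dataclasses import dataclass
from typing import List

LEVELS_BITS = [16, 14, 12, 10, 8, 4]  # quantization levels listed in Section IV
SAMPLE_RATE_HZ = 8000                 # 8 kHz sampling, as in Rat 3.0.24


def bitrate_kbps(bits: int) -> float:
    """Raw PCM bit rate for a given number of bits per sample."""
    return bits * SAMPLE_RATE_HZ / 1000.0


@dataclass
class RateAdapter:
    """Hypothetical source-side controller driven by reported loss fractions."""
    max_loss: float          # maximum tolerable loss, e.g. 0.05
    min_loss: float          # loss below which a step up is attempted
    level: int = 0           # index into LEVELS_BITS (0 = 16 bits)
    reports_needed: int = 3  # consecutive reports required before adapting
    high_run: int = 0        # consecutive reports above max_loss
    low_run: int = 0         # consecutive reports below min_loss

    def on_receiver_report(self, loss: float) -> int:
        """Consume one loss fraction (0.0 to 1.0); return bits per sample to use next."""
        if loss > self.max_loss:
            self.high_run, self.low_run = self.high_run + 1, 0
        elif loss < self.min_loss:
            self.high_run, self.low_run = 0, self.low_run + 1
        else:
            self.high_run = self.low_run = 0

        if self.high_run >= self.reports_needed:
            # Step down one level (fewer bits per sample), if one exists.
            self.level = min(self.level + 1, len(LEVELS_BITS) - 1)
            self.high_run = 0
        elif self.low_run >= self.reports_needed:
            # Step up one level (assumed symmetric rule).
            self.level = max(self.level - 1, 0)
            self.low_run = 0
        return LEVELS_BITS[self.level]


def simulate(available_kbps: float, reports: int, adapter: RateAdapter) -> List[int]:
    """Assumed loss model: everything above the available bandwidth is lost."""
    history = []
    bits = LEVELS_BITS[adapter.level]
    for _ in range(reports):
        loss = max(0.0, 1.0 - available_kbps / bitrate_kbps(bits))
        bits = adapter.on_receiver_report(loss)
        history.append(bits)
    return history


if __name__ == "__main__":
    print(simulate(35.0, 20, RateAdapter(max_loss=0.05, min_loss=0.0)))
```

Under these assumptions, with 35 kbps available and a 5% maximum loss, the sketch steps down once every three reports and settles at 4 bits per sample after the fifteenth report. Setting `min_loss` above zero would let the controller step back up as soon as loss disappears, which reproduces the oscillation problem noted in Section III.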
Tests were executed in order to observe the bandwidth adaptation process and in order to evaluate user acceptance of various encoding rates and loss levels. Fig. 1 is an example of the encoding rate decreasing to fit the chosen available bandwidth. In this case, available bandwidth was set to 35 kbps and the maximum acceptable loss rate was set to 5%. Encoding begins with 16 bits used per sample (a 128 kbps stream), and initially loss rates are high. After three RTCP receiver reports showing loss above the maximum specified level, the encoding rate is reduced, which results in a reduction of lost packets. The encoding rate continues to adapt to lower rates until at 4 bits per sample (a 32 kbps stream), the loss rate stabilizes at 0%.

[Figure 1: plot of bandwidth (kbps) and loss rate (%) versus time.]

Fig. 1. Encoding begins with 16 bits used per sample. Loss rate is initially high, therefore, the encoding rate is reduced linearly to successive encoding levels. At 4 bits per sample, the loss rate stabilizes at 0%.

By decreasing encoding rates, we are decreasing the subsequent resolution of the audio samples. This will introduce error that may be discernable by the receiver at playback. It is interesting to compare the effects of fewer bits per sample versus that of lost packets on the received audio quality. Listening tests were performed to establish a threshold value for maximum loss and minimum encoding rate.

To establish a loss rate threshold, seven-second sound samples were recorded under loss rates of 10% to 75%. These were played to listeners who were then asked to rate signal quality as excellent, satisfactory, or unacceptable. Fig. 2 presents listener thresholds for acceptable loss rates. Most listeners found loss rates of 37% or higher to be unacceptable. Intelligibility, clarity, and crispness were cited as qualities which suffered most from extreme loss. For longer sessions, such as a videoconference, we would expect that the acceptable loss percentage would be much lower.

Listening tests were also performed to compare signals suffering from various loss rates with signals encoded at lower resolutions. Signals encoded at 16, 12, 10, 8, and 4 bits per sample were transmitted with little or no loss and recorded at the receiver. These were played to listeners who compared them to signals encoded at full resolution, but received with loss. In particular, we were interested in evaluating listener preference between samples containing losses that resulted in a final actual data bandwidth that was equal to the bandwidth required for a lower level of encoding. Fig. 3 displays results for this experiment. Lower encoding levels were preferred over loss for most signals. Results show that a 12 bit encoding (96 kbps) was preferable to 25% loss (representing an actual received data rate of 96 kbps). A 10 bit encoding (80 kbps) was also preferable to 25% loss. An 8 bit encoding (64 kbps) was preferable to 37% loss (representing an actual received data rate of 80 kbps).
When the encoding reached 4 bits per sample (32kbps), however, quantization noise in the form of static was noticeable and annoyed listeners, whose preferences turned towards the signals with loss. This was a result of the linear quantization method used. A non-linear approach would allow increased resolution for the lower encoding levels.
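The trade-off discussed above can be made concrete with a short Python sketch of linear requantization and the associated rate arithmetic. It is a sketch under stated assumptions, not the routine used in the modified Rat: the helper names are hypothetical, quantization is assumed to discard low-order bits, and reconstruction is assumed to use the interval midpoint computed by bit shifts, whereas the paper's decoder uses a table lookup whose contents are not given.

```python
FULL_BITS = 16         # sample width delivered by the audio device
SAMPLE_RATE_HZ = 8000  # 8 kHz sampling, as in Rat 3.0.24


def step_size(bits: int) -> int:
    """Width of one linear quantization interval, in 16-bit sample units."""
    return 1 << (FULL_BITS - bits)


def quantize(sample: int, bits: int) -> int:
    """Reduce a signed 16-bit sample to `bits` bits by dropping low-order bits."""
    return sample >> (FULL_BITS - bits)


def dequantize(code: int, bits: int) -> int:
    """Map a code back to 16 bits, at the midpoint of its interval (assumed)."""
    return (code << (FULL_BITS - bits)) + (step_size(bits) >> 1)


def encoded_rate_kbps(bits: int) -> float:
    """Bit rate of the PCM stream at a given resolution."""
    return bits * SAMPLE_RATE_HZ / 1000.0


def received_rate_kbps(loss: float, bits: int = FULL_BITS) -> float:
    """Data rate that actually arrives when a fraction `loss` of packets is dropped."""
    return encoded_rate_kbps(bits) * (1.0 - loss)


if __name__ == "__main__":
    for bits in (16, 14, 12, 10, 8, 4):
        print(bits, "bits:", encoded_rate_kbps(bits), "kbps, step", step_size(bits))
    for loss in (0.25, 0.37, 0.50, 0.75):
        print(f"{loss:.0%} loss at 16 bits:", received_rate_kbps(loss), "kbps received")
```

With 4 bits per sample the interval width is 4096 out of a 65536-value range, so the reconstruction error can reach half of that, which is consistent with the audible static reported by listeners. The rate helpers reproduce the comparisons in the text: 25% loss on the 128 kbps stream leaves 96 kbps, the same rate as a lossless 12 bit encoding.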


[Figure 2: chart of percent preference (Excellent, Satisfactory, Unsatisfactory) versus percent loss (0%, 25%, 37%, 40%, 50%, 60%).]

Fig. 2. Listener thresholds for acceptable loss rates. Most listeners found loss equal to or greater than 37% to be unacceptable.

[Figure 3: chart of listener preference percent (Less Resolution, Loss) for the comparisons 12 bit vs 25% loss, 10 bit vs 25% loss, 10 bit vs 37% loss, 8 bit vs 37% loss, 8 bit vs 50% loss, 4 bit vs 50% loss, and 4 bit vs 75% loss.]

Fig. 3. Listener preference for signals suffering from various loss rates compared to those transmitted with lower resolution.

VI. FUTURE WORK

There are several improvements to our audio encoding and adaptation algorithms that we would like to address in future work. Rat 3.0.24 did not provide the high sample rates necessary to properly encode wide-band, general audio data. We are currently working with a newer version of Rat which provides sample rates up to 48 kHz. Additional encoding levels combined with a prediction algorithm and non-linear quantization would allow for more accurate encoding. This would provide a closer match to localized changes in audio dynamic level, and would improve received quality for lower bit rate encodings. Our initial adaptation algorithm for increasing and decreasing encoding levels is simplistic and is not sufficient to control oscillations between consecutive encoding levels. Work is currently in progress to develop a new method for controlling this problem. Additional testing over a wireless LAN is planned to gather more information regarding user preferences. Finally, we plan to incorporate case-based reasoning for predicting bandwidth needs based on time of day, applications in use, and connection bandwidth. Combined with user preferences, this information will be used to create user profiles that will guide the audio application’s selection of start-up parameters.

VII. CONCLUSION

Interest in wireless transmission of real-time, high bandwidth audio data is increasing as applications such as CD-quality audio streaming, wide-band radio broadcast, and audioconferencing become more popular. For acceptable quality of service, these applications require continuous delivery of data and minimal loss. Wireless networks differ from their wired counterparts with regard to bandwidth, error rate, and traffic consistency. Bandwidth is generally limited, errors occur more often due to interference and congestion, and network conditions are transient as mobile hosts move in and out of range. These characteristics cause delay and packet loss, which negatively affect received audio quality.

This paper presents a method for allowing dynamic source initiated adaptation for general audio streams. Adjusting the bandwidth of the transmitted audio stream reduces the number of lost packets during times of congestion and decreases delay, thus allowing a constant delivery rate and improved quality of service. Network feedback was gathered with RTCP receiver reports, which were used to evaluate network traffic and loss and to guide adaptation. Results indicate that listeners preferred audio encoded at lower rates and delivered continuously to audio that suffered from excessive loss.

REFERENCES

[1] Bolot and Turletti, “Experience with control mechanisms for packet video in the Internet,” SIGCOMM Computer Communication Review, Vol. 28, No. 1, January 1998.
[2] Hardman, Sasse, and Kouvelas, “Successful multiparty audio communication over the Internet,” Communications of the ACM, Vol. 41, No. 5, May 1998.
[3] McCanne and Jacobson, “Vic: A flexible framework for packet video,” Proceedings of ACM Multimedia ’95, San Francisco, California, November 1995.
[4] Bolot and Vega-Garcia, “The case for FEC-based error control for packet audio in the Internet,” ACM Multimedia Systems, 1997.
[5] Busse, Deffner, and Schulzrinne, “Dynamic QoS control of multimedia applications based on RTP,” International workshop on high-speed networks and open distributed platforms, St. Petersburg, Russia, June 1995.
[6] Perkins, et al., “RTP: payload for redundant audio data,” RFC 2198, September 1997.
[7] Hardman, Sasse, Handley, and Watson, “Reliable audio for use over the Internet,” Proceedings of INET’95, Honolulu, Hawaii, June 1995.