
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 5, MAY 2008


Semi-Fuzzy Rate Controller for Variable Bit Rate Video

Mehdi Rezaei, Member, IEEE, Miska M. Hannuksela, Member, IEEE, and Moncef Gabbouj, Senior Member, IEEE

Abstract—A novel semi-fuzzy (SF) rate control algorithm (RCA) for variable bit rate (VBR) video applications is proposed. The proposed RCA is optimized to provide high quality compressed video bit streams in a wide operating range from constant quality to nearly constant bit rate. Thanks to a low degree of computational complexity, it is suitable for real-time applications of VBR video. The proposed RCA operates under given buffer size, delay and quality constraints. It provides a VBR video bit stream by controlling the quantization parameter (QP) on a picture basis. The QP is mainly controlled by a fuzzy rate controller and a deterministic quality controller, which are optimized such that they minimize the variation of quality to provide encoded video with high and stable visual quality. The proposed RCA has been implemented in an H.264/AVC video codec and the experimental results show that it provides a high-level average quality for encoded video while strictly obeying the buffering delay and quality constraints. Index Terms—Bit rate, coding, control, fuzzy, rate, semi-fuzzy (SF), variable, video.

I. INTRODUCTION

Variable bit rate (VBR) video applications have constraints which are significantly different from those of constant bit rate (CBR) applications. For example, real-time video conversation requires a constant short-term average bit rate to ensure low delay, whereas in streaming applications a constant long-term average bit rate is sufficient and major short-term variation in bit rate is acceptable. In comparison with CBR video, VBR video can provide better visual quality and coding efficiency for most video contents [1].

A great deal of attention has been paid to video rate control over the past two decades. As a usual approach, the control algorithms operate in two steps. In the first step, a bit budget is allocated to a video segment such as a group of pictures (GOP), frame, or macroblock (MB) according to practical constraints and video properties. In the second step, a quantization parameter (QP) is computed according to the allocated bit budget and the coding complexity of the video. Usually a rate-distortion (R-D) model is utilized for the computation of QP. The R-D model is derived analytically or empirically. In analytical modeling, an R-D model is derived according to the statistics of the source video signal and the properties of the encoder. Empirical modeling attempts to approximate the R-D curve by interpolating between a set of sample points. The R-D model provided by one of the two approaches is then employed in the bit budget allocation process and in the calculation of the QP for rate control. The R-D model parameters are updated according to encoding results.

A large number of rate control algorithms (RCAs) have been proposed that are mainly targeted at CBR applications. The RCAs presented in the standard reference models are examples that operate based on the bit allocation approach. Version 5 of the MPEG-2 video Test Model (TM5) describes an RCA for CBR encoding that takes into account the different properties of the three coded picture types (I-, P-, and B-pictures).¹ The algorithm is built on a first-order R-D model and provides control at three levels: GOP, frame, and MB. The H.263 Test Model Version 8 (TMN8) uses an RCA that allows control at the frame level and MB level [2]. This algorithm operates based on a second-order R-D model which is parameterized based on the variance of the luminance and chrominance values in the (motion-compensated or intra) MB [3]. The MPEG-4 Verification Model (VM5) describes an RCA [4] which offers rate control at the frame level. VM5 attempts to achieve a target bit rate over a certain number of frames using a second-order R-D model that is parameterized based on the mean absolute difference (MAD) of the residual frame after motion compensation [5]. An extension to this algorithm is described in Annex L.3 of the MPEG-4 verification model, which supports modulation of the QP at the MB level and is more suitable for low delay applications.

Manuscript received November 13, 2006; revised March 19, 2007 and August 20, 2007. This work was supported in part by Nokia and the Academy of Finland, Finnish Centre of Excellence Program 2006–2011 under Project 213462. This paper was recommended by Associate Editor J. Cai. M. Rezaei and M. Gabbouj are with the Department of Signal Processing, Tampere University of Technology, Tampere FI-33720, Finland (e-mail: mehdi. [email protected]; [email protected]). M. M. Hannuksela is with the Nokia Research Center, Tampere FI-33720, Finland (e-mail: [email protected]). Digital Object Identifier 10.1109/TCSVT.2008.919108
The RCA of the Joint Model (JM) [6] reference software of H.264/AVC creates streams that satisfy the available bandwidth provided by a channel and that comply with the hypothetical reference decoder (HRD). It consists of tight control at three levels: GOP level, picture level, and an optional basic unit level. The basic unit is defined as a group of successive MBs in the same frame. At the picture level, the QP for a non-reference picture is computed by a simple interpolation method from the QPs of its reference pictures, whereas the QP for a reference picture is computed based on a second-order R-D model which is parameterized based on the MAD of the residual frame after motion compensation. Because the MAD of the current reference picture is not available at this step, it is predicted by linear prediction from the actual MAD of the previous reference picture. The MAD of the current frame or MB is only available after the rate-distortion optimization (RDO) process, while the RDO is performed based on the QP. This chicken-and-egg dilemma is solved by predicting the MAD from the previous frames.

¹ [Online]. Available: http://www.mpeg.org/MSSG/tm5/

1051-8215/$25.00 © 2008 IEEE


Yang et al. [7] compute two QPs, one for the RDO process and the other for coding, to break the dilemma resulting from QP-dependent RDO in H.264/AVC.

Besides the RCAs used in the standard reference models, a large number of RCAs have been proposed for different applications in the literature. Tsai et al. [8] propose a modification to TMN8 rate control. They investigate the relationship between quantization distortion and the coding order of MBs. Based on the results, they modify the encoding order of MBs to favor the more complex MBs. Pan et al. [9] propose some modifications to the MPEG-4 VM5 rate control through a weighted bit allocation to the P frames. They allocate a higher bit budget to the earlier encoded frames, which are used as reference by more subsequent frames. Navakitkanok et al. [10] try to improve the performance of H.264/AVC JM rate control in the low delay case by decreasing the number of skipped frames. Ma et al. [11] propose another RCA for H.264/AVC that revises the R-D model used in the JM rate controller. Jiang et al. [12], [13] present a peak signal-to-noise ratio (PSNR)-based frame complexity estimation to improve the H.264/AVC JM rate controller. They use a combination of the PSNR and MAD measures for the complexity estimation and the parameterization of the R-D model. Some RCAs operate based on R-D models which are extracted heuristically. He et al. [14], [15] propose RCAs based on a ρ-domain R-D model, in which the bit rate is estimated by a linear function of the percentage of zeros among the quantized transform coefficients in each video frame or MB. Chang et al. [16] propose a new R-D model in the ρ-domain that estimates the encoding bit rate based on the number of nonzero coefficients, the count of zeros before the last nonzero coefficient in the zigzag-scan order, and the sum of absolute quantized nonzero coefficients.
In addition to the RCAs targeted at CBR applications, a number of algorithms have been proposed for VBR applications such as video streaming and local recording. The algorithm presented in [17] is a low complexity frame-level rate controller for streaming applications. Although this algorithm utilizes a virtual buffer, two other parameters predominantly control its operation: a large time interval and a large bit budget. The virtual buffer, which is essential in streaming, does not play an active role in this algorithm. The smooth pursuit eye movement (SPEM) rate control scheme introduced in [18] is designed for real-time streaming. This RCA works near the CBR region and cannot exploit the benefits of VBR. The algorithm presented in [19] is targeted at the recording application and tries to suppress the fluctuation in quantization scale as much as possible. The buffer constraint, which is essential in streaming applications, is not considered in this algorithm. The methods proposed in [20] and [21] attempt to satisfy a target bit-budget constraint. In other words, they utilize the total storage size as a constraint for encoding a number of frames. The algorithm used in [22] performs bit allocation according to the coding complexity and does not obey any bit rate constraints. Therefore, depending on the content activity, it produces extreme bit rate variations which do not obey buffering constraints. Moreover, another group of RCAs presented for VBR applications in the literature uses a kind of look-ahead for the rate control; see [23] and [24] for examples. This group of RCAs requires more memory for storage of uncompressed video and more delay for preprocessing of video data. Therefore, they are not suitable for real-time applications. In this research we seek a low-complexity RCA optimized for real-time VBR applications with a buffer constraint.

A video RCA can operate in different regions of the R-D space between the constant rate region and the constant quality region. We classify RCAs into three classes in this paper: CBR, VBR, and constant quality algorithms. However, the terms VBR and constant quality are used with different meanings in the literature. An ideal CBR RCA operates in a region that is parallel to the distortion (or quality) axis, and an ideal constant quality RCA operates in a region that is parallel to the rate axis. Although in practice some variation in the bit rate of compressed video can exist even in CBR, the term VBR as used in this paper means considerable variation in the bit rate. A VBR rate controller operates in a region between the CBR and constant quality operating areas. It can provide more constant quality in comparison with CBR and less variation in bit rate in comparison with the constant quality case.

From the control-system point of view, it is difficult to find an accurate reference point for the VBR controller. In CBR, the target bit rate is a fixed reference point and the main objective of the controller is to drive the bit rate toward the reference point. In constant quality rate control, the quality of the encoded video is the reference point for the controller. In VBR, a long-term average value is defined for the bit rate and there is no real short-term reference point in terms of rate or quality for the controller. The objective of a VBR controller is to minimize the variations in quality while the bit rate can have some variations within a buffer or delay constraint.
More constant quality for a bit stream can be achieved through more variation in bit rate, which means more transmission and buffering delay. Therefore, when compared to CBR, a VBR rate controller provides more constant quality for the compressed video at the expense of a higher transmission and buffering delay. Some RCAs, such as MPEG-4 VM5 and H.264/AVC JM, can provide a kind of VBR bit stream by operating at the frame level. Unlike the TMN8 rate controller, which is optimized for low delay applications, these algorithms can provide bit streams with high quality I-pictures. However, we believe that these algorithms are not optimized for VBR applications because they operate based only on a reference point for the rate, without any feedback from the quality. Moreover, it is possible to have VBR RCAs with a lower degree of complexity.

We proposed an RCA for real-time VBR applications in [25]. It provides VBR video by combining a constant rate and a constant quality control method. It performs very well, but it has many tuning parameters that should be adjusted for different applications. A fuzzy-logic-based real-time RCA for MPEG video is presented in [26]. We proposed two other fuzzy RCAs for mobile and streaming applications in [27] and [28]. The RCA presented in [27] is a low-complexity fuzzy rate controller for mobile applications including local recording, streaming, and multimedia messaging services (MMS). The RCA proposed in [28] is a fuzzy controller which is optimized for streaming applications with a relatively high delay. Another RCA with a delay constraint, optimized for video streaming over DVB-H applications, was proposed in [29].

There are basic differences between the algorithm presented in [26] and our fuzzy algorithms. From the control-system point of view, they have different structures and diagrams. The algorithm presented in [26] uses two fuzzy controllers with five linguistic variables in total, while our algorithms presented in [27], [28], and in this paper use only one fuzzy controller with two linguistic variables. Furthermore, the fuzzy variables, fuzzy membership functions, and control methods in our algorithms are quite different. Moreover, we use the fuzzy controller in combination with other deterministic controller(s). A common element in both our fuzzy RCAs and the controller proposed in [26] is the well-known fuzzy logic engine.

The key concept in the proposed RCA is to prevent unnecessary fluctuations in the quality of compressed video as much as possible while the application constraints are obeyed. The various RCAs proposed in the literature differ in bit allocation method, R-D model employed, and coding complexity measure used. The RCA proposed in this paper follows a different approach from the usual one. There is no bit allocation step in the algorithm and therefore no R-D model is used directly for the calculation of QP. The proposed RCA controls the bit rate of the video bit stream by controlling the variation of QP using feedback from the quality and the bit rate of previously encoded frames. The algorithm is optimized to minimize the variation of quality. From the visual quality point of view, it is desirable to have constant visual quality over a long period such as a video scene [1]. In general, constant visual quality does not necessarily mean constant PSNR, and constant PSNR does not mean constant QP. However, there is a strong correlation between visual quality, PSNR, and QP.
For a uniform video signal, experimental and analytical (with some assumptions) results show that minimizing the variation in quality provides almost the maximum qualitative measure of quality [1], [19], [30]. According to these assumptions, the proposed RCA is optimized to prevent unnecessary fluctuations in QP and PSNR, which are correlated with visual quality.

The RCA proposed in this paper is a semi-fuzzy (SF) controller that can be used for a wide range of VBR applications. The term SF is chosen because a fuzzy controller is utilized as one part of the controller, while the other parts use deterministic control based on some R-D results. With some modifications, it retains all the advantages of our previous algorithms while being easy to tune for different applications. The input signals, membership functions, and desired central values of the fuzzy system have been modified. Furthermore, the gain of the fuzzy feedback loop is adjusted adaptively according to the application. Moreover, the QP of I-pictures is computed adaptively according to the application and content. The adaptation features used in the new RCA make it easy to tune.

The proposed algorithm has a low degree of complexity. Because it does not use any R-D model, there is no need to update R-D model parameters based on a complexity measure such as variance or MAD. The proposed RCA can be used for the H.264/AVC, MPEG-4 (part-2), MPEG-2, and H.263 video coding standards and with any RDO process. There is no longer a chicken-and-egg dilemma for H.264/AVC because the QP is calculated based only on the results of previously encoded pictures. Results of implementing our previous fuzzy RCA on MPEG-4 (part-2) and H.263 are presented in [27]. However, the new RCA performs better than the former versions. The RCA proposed in this paper was implemented in an H.264/AVC codec and a set of simulations was run. The simulation results were compared with the results of constant QP encoding and with the results of the H.264/AVC JM rate controller. The comparison shows that the proposed algorithm provides good performance.

This paper is organized as follows. Sections II and III present the overview and the detailed description of the new RCA, respectively. Simulation results are provided in Section IV. The paper is concluded in Section V.

II. OVERVIEW OF RATE CONTROL ALGORITHM

The proposed RCA controls the bit rate by adjusting the QP on a picture basis. It utilizes a fuzzy rate controller in combination with a quality controller and several other tools to calculate the QP for different video pictures. Although only intra-prediction pictures (I-pictures) and reference inter-prediction pictures (P-pictures) are explained here, the algorithm is easily extended to other types of pictures as well. The fuzzy controller utilizes a virtual buffer to impose the buffer and delay constraints on the bit stream. The quality controller uses another feedback signal from the PSNR of the encoded video to minimize the variation in quality. The proposed RCA can be divided functionally into two main parts. The first part utilizes the fuzzy controller and the quality controller to compute the QP of P-pictures. The second part uses several other feedback signals from the buffer and from the uncompressed and compressed video to calculate the QP of I-pictures based on some rules derived from a simple R-D model. The I-pictures at scene cut boundaries are treated differently from normal I-pictures at the predefined periodic random access points.
Because in VBR the bit allocation to I-pictures has a remarkable impact on the encoding results, the QP of I-pictures is computed very carefully. Many analytical and empirical results have been used in the calculation of QP for different video pictures. The key point in the proposed RCA is to prevent unnecessary variations in quality while the buffer constraint is obeyed. Details of the proposed RCA are presented in the following section.

III. RATE CONTROL ALGORITHM

A. Inter-Prediction Pictures

The QP for P-pictures is defined by the fuzzy controller and the quality controller. Fig. 1 depicts the block diagram of the proposed rate control system for P-pictures. The fuzzy controller, the quality controller, and the virtual buffer are the basic parts of the control system. The fuzzy controller attempts to control the bit rate of the encoded bit stream by controlling the variation of QP, while being optimized to prevent unnecessary fluctuation of QP. In the computation of QP, it is assumed that consecutive video pictures have a similar degree of complexity (except at scene cuts), so the complexity of the previous encoded picture is used as an estimate of the complexity of the subsequent picture, and the QP of the subsequent picture is computed from the QP of the previous encoded picture with a small variation which is defined by both the fuzzy and the quality controllers. The fuzzy controller uses two feedback signals, from the buffer fullness and from the bit rate. The quality controller utilizes a feedback signal from the quality (PSNR) of the encoded video to minimize the fluctuation in quality. Furthermore, a lowpass filter (LPF) smooths the feedback signal from the rate to the fuzzy controller in order to reduce variations in the output of the fuzzy controller. The QP for the current P-picture, $QP_P(n)$, is the sum of the QP used for encoding the previous picture, $QP(n-1)$, the output of the fuzzy controller, $\Delta QP_F(n)$, and the output of the quality controller, $\Delta QP_Q(n)$:

$$QP_P(n) = QP(n-1) + \Delta QP_F(n) + \Delta QP_Q(n) \tag{1}$$

Fig. 1. Block diagram of the RCA for P-pictures.

From the system point of view, the main part of the QP computed for a P-picture is the delayed version of the QP used for the previous picture, and the control (variation) of QP is provided by the fuzzy controller and the quality controller. From the R-D calculation point of view, the R-D operating point of the previous encoded pictures is used as the reference for the next picture, and a small deviation from the reference point is computed. The main advantage of this approach is that, in the small range around the reference point, all the nonlinear functions that exist in the system can be assumed to be linear without losing computational precision. More details about the parts of the RCA are presented below.

1) Virtual Buffer: The virtual buffer used by the controller simulates the buffering process of the decoder at the receiving side of a CBR channel. Although it utilizes a simple model, it is nearly identical to the hypothetical reference decoder models used in different video coding standards. The occupancy of the virtual buffer is updated after encoding each video picture as

$$B(n+1) = B(n) + \frac{R}{F} - b(n) \tag{2}$$

where $B(n)$ denotes the occupancy of the virtual buffer before encoding the $n$th picture, $b(n)$ is the number of bits consumed by the $n$th encoded picture (P or I), $R$ indicates the target average bit rate for the bit stream or the channel bandwidth, and $F$ stands for the frame rate.
Note that the virtual buffer models the decoder buffer at the receiver side. Therefore, the occupancy of this buffer corresponds to the free space of a buffer at the encoder or transmitter side.
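The buffer bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming the usual leaky-bucket form in which the decoder-side buffer gains the per-picture channel share (bit rate divided by frame rate) and loses the bits of each coded picture. The class and field names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class VirtualBuffer:
    """Decoder-side virtual buffer (hypothetical names, illustrative only)."""
    size_bits: float    # buffer size chosen by the user
    bit_rate: float     # target average bit rate or channel bandwidth, bits/s
    frame_rate: float   # pictures per second
    occupancy: float    # current occupancy in bits

    def update(self, picture_bits: float) -> float:
        """Advance the buffer by one coded picture and return the new occupancy.

        Assumption: the channel delivers bit_rate / frame_rate bits per picture
        interval and the decoder removes the bits of the coded picture.
        """
        self.occupancy += self.bit_rate / self.frame_rate - picture_bits
        return self.occupancy

    def fullness(self) -> float:
        """Occupancy normalized by the buffer size."""
        return self.occupancy / self.size_bits

    def violated(self) -> bool:
        """True if the occupancy left the [0, size] range (underflow or overflow)."""
        return self.occupancy < 0.0 or self.occupancy > self.size_bits


buf = VirtualBuffer(size_bits=1_000_000, bit_rate=500_000, frame_rate=25, occupancy=600_000)
buf.update(30_000)  # a picture larger than the 20 000-bit channel share drains the buffer
```

Because the model is on the decoder side, a large picture lowers the occupancy, which corresponds to an encoder-side buffer filling up.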

The size of the virtual buffer is one of the main parameters defined by the user. The virtual buffer size is related to the required buffering delay for the encoded bit stream. The maximum delay for a random access point can be computed as $D_{\max} = B_s / R$, where $B_s$ denotes the size of the virtual buffer and $R$ is the target average bit rate. A decoder can start decoding from any random access point, each of which requires a minimum buffering delay. The average initial buffering delay over all random access points can be used as a robust measure for the bit stream; it depends on the buffer size and on the RCA used. For example, the SF controller drives the occupancy of the virtual buffer toward a reference point of about 60% to 65% of the buffer size. Therefore, the average buffer fullness during operation is about 60%–65% of the buffer size $B_s$, and the average delay can be estimated as $\bar{D} \approx (0.60 \text{ to } 0.65)\, B_s / R$, where $\bar{D}$ denotes the average buffering delay.

2) Fuzzy Controller: The rate control approach used in this paper, in which the variation in QP is calculated instead of QP itself, is based on our previous RCA presented in [25], which uses a deterministic control approach. From the deterministic approach we learned that many heuristic functions coexist with the nonlinear R-D functions in the rate control process. As a new approach, the fuzzy controller is selected for this structure because the nonlinear functions and the complexities that exist in the rate control task can be simply included in the fuzzy rules and fuzzy membership functions. Generally, a fuzzy controller can be designed based on expert experience or it can be learned from examples. Therefore, a fuzzy controller is a good option for exploiting the many heuristic results available for video rate control. Moreover, according to the block diagram shown in Fig. 1, a controller is required to define a small quantized value based on rough measurements of the bit rate and buffer fullness. These properties make the task a good fit for a fuzzy controller with low resolution inputs and output.
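The delay relations given for the virtual buffer earlier in this subsection can be checked numerically with a small sketch. It assumes that the buffering delay at a random access point equals the buffer occupancy divided by the bit rate, so that the maximum delay corresponds to a full buffer and the average delay to the average fullness. The function names and the default fullness of 0.6 are illustrative assumptions.

```python
def max_buffering_delay(buffer_size_bits: float, bit_rate_bps: float) -> float:
    """Worst-case initial buffering delay in seconds: a full buffer drained at the bit rate."""
    return buffer_size_bits / bit_rate_bps


def average_buffering_delay(buffer_size_bits: float, bit_rate_bps: float,
                            mean_fullness: float = 0.6) -> float:
    """Estimated average delay in seconds for a controller that holds the buffer
    near mean_fullness (a fraction of the buffer size; 0.60 to 0.65 is quoted for SF)."""
    if not 0.0 <= mean_fullness <= 1.0:
        raise ValueError("mean_fullness must be a fraction between 0 and 1")
    return mean_fullness * buffer_size_bits / bit_rate_bps


# A 1 Mbit buffer on a 500 kbit/s channel: 2 s worst case, about 1.2 s on average.
d_max = max_buffering_delay(1_000_000, 500_000)
d_avg = average_buffering_delay(1_000_000, 500_000, mean_fullness=0.6)
```

In practice the buffer size would be derived the other way round, from the delay that the application can tolerate.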
The fuzzy controller has two input signals, which are normalized values of the buffer occupancy and the actual bit rate of P-pictures. Buffer occupancy is normalized by the buffer size, and the actual bit rate of P-pictures is normalized by the target bit rate for P-pictures. Because in VBR the bit budget consumed by P-pictures can be very different from the bit budget consumed by I-pictures, the target bit rate of P-pictures can be very different from the overall target bit rate, depending on the frequency of I-pictures in the bit stream. An attempt is made to estimate a precise value for the target bit rate of P-pictures to be used for normalization. The fuzzy inputs are defined as

$$x_1 = \frac{B(n)}{B_s} \tag{3}$$

$$x_2 = \frac{b_P \, F \, (N - 1 + X)}{N \, R} \tag{4}$$

where $B(n)$ is the occupancy of the virtual buffer, $B_s$ is the buffer size, $R$ is the target average bit rate, $F$ is the frame rate, and $b_P$ denotes the bit budget consumed by the previous encoded P-picture. $N$ stands for the interval of periodic I-pictures in the bit stream in terms of number of pictures. $X$ indicates the coding complexity of I-pictures relative to P-pictures and is computed as

$$X = \frac{\bar{b}_I}{\bar{b}_P} \tag{5}$$

where $\bar{b}_I$ and $\bar{b}_P$ denote the average bit budgets consumed by the encoded I- and P-pictures, respectively. If the previous encoded picture is an I-picture, the value of $b_P$ in (4) is reset to the value of $\bar{b}_P$. To suppress fluctuations of QP resulting from short-term variations in the complexity of video pictures, the LPF smooths the variation of $x_2$ before it is input to the fuzzy controller. The impulse response of the LPF used is

(6)

in which the filter coefficient is a constant value.

All fuzzy rules are summarized in Table I. The content of the table specifies the output of the fuzzy controller. The letters H, L, M, and V correspond to the linguistic specifications High, Low, Medium, and Very. The number before V shows the number of repetitions, or level of strength. As an example, one rule from the table can be expressed as: IF ($x_1$ is VL and $x_2$ is H) THEN (Output is 3VH), where 3VH stands for Very Very Very High.

TABLE I SUMMARIZATION OF THE IF–THEN FUZZY RULES

The input signals are specified by their fuzzy membership functions (MSFs). The MSF quantifies the grade of membership of the input in a given set. The value 0 means that the member is not included in the given set, and 1 describes a fully included member. The values between 0 and 1 characterize fuzzy members. The numbers of 9 and 7 MSFs were employed for the two inputs $x_1$ and $x_2$. The linguistic fuzzy rules and MSFs were designed based on experience from our previous works [25], [27], [28]. The asymmetric structures in the fuzzy rule table and the fuzzy MSFs are related to a number of facts which affect the operation of the RCA. The nonlinearity of the R-D function and the difference between the bit budgets of I- and P-pictures are two key points that cause the asymmetry in the structures. The other key point is that the output gain of the feedback loop is a function of the buffer conditions. More aggressive control is required when the buffer fullness is close to critical conditions, to prevent underflow and overflow, and looser control is preferred when the buffer fullness is far from the critical conditions, to prevent unnecessary variations in the quality of the encoded video.

After the preliminary design of the fuzzy system, an optimization process was performed to fine tune the fuzzy MSFs. In the optimization process several parameters were considered, including average bit rate, average PSNR, average QP, and standard deviation of PSNR. The final distributions of the MSFs are shown in Fig. 2.

Fig. 2. Membership functions of the linguistic variables.

TABLE II DESIRED CENTRAL VALUES FOR THE OUTPUT OF FUZZY SYSTEM
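The input normalization and the evaluation of a rule table can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the per-P-picture target follows from splitting the bit budget of one I-picture period between one I-picture and the remaining P-pictures using the relative complexity, the membership functions are simple triangles with saturating shoulders, and the table uses three sets per input with made-up output centers instead of the 9 and 7 sets of Table I, Table II and Fig. 2. The rule evaluation uses the common product-inference, center-average scheme.

```python
def fuzzy_inputs(occupancy, buffer_size, p_bits, bit_rate, frame_rate,
                 i_interval, rel_complexity):
    """Return (x1, x2): normalized buffer occupancy and normalized P-picture rate."""
    x1 = occupancy / buffer_size
    # Budget of one I-period, i_interval * bit_rate / frame_rate, shared as
    # rel_complexity * T (one I-picture) + (i_interval - 1) * T (P-pictures).
    target_p_bits = i_interval * bit_rate / (frame_rate * (i_interval - 1 + rel_complexity))
    return x1, p_bits / target_p_bits


def grade(x, left, center, right):
    """Triangular membership grade; left == center or center == right gives a shoulder."""
    if x <= left:
        return 1.0 if left == center else 0.0
    if x >= right:
        return 1.0 if right == center else 0.0
    if x < center:
        return (x - left) / (center - left)
    if x > center:
        return (right - x) / (right - center)
    return 1.0


# Hypothetical membership functions (left, center, right) for each input.
X1_SETS = {"L": (0.0, 0.0, 0.6), "M": (0.0, 0.6, 1.0), "H": (0.6, 1.0, 1.0)}
X2_SETS = {"L": (0.5, 0.5, 1.0), "M": (0.5, 1.0, 1.5), "H": (1.0, 1.5, 1.5)}

# Hypothetical output centers (QP change) for each rule "IF x1 is a AND x2 is b".
CENTERS = {
    ("L", "L"): 0.0, ("L", "M"): 1.0, ("L", "H"): 3.0,
    ("M", "L"): -1.0, ("M", "M"): 0.0, ("M", "H"): 1.0,
    ("H", "L"): -3.0, ("H", "M"): -1.0, ("H", "H"): 0.0,
}


def fuzzy_output(x1, x2):
    """Product inference with center-average defuzzification over all rules."""
    num = den = 0.0
    for (l1, l2), center in CENTERS.items():
        weight = grade(x1, *X1_SETS[l1]) * grade(x2, *X2_SETS[l2])
        num += center * weight
        den += weight
    return num / den


x1, x2 = fuzzy_inputs(occupancy=300_000, buffer_size=1_000_000, p_bits=26_000,
                      bit_rate=750_000, frame_rate=25, i_interval=26, rel_complexity=5.0)
delta_qp = fuzzy_output(x1, x2)  # small positive step: buffer is low, rate is on target
```

The asymmetry discussed in the text would show up here as unevenly spaced membership functions and output centers; the symmetric values above were chosen only to keep the example easy to follow.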
The detailed information on the MSFs required for implementation is presented in Appendix A. The desired central values for the output of the fuzzy system corresponding to the fuzzy rules in Table I are given in Table II. A well-known and simple fuzzy system with two inputs using the "product inference engine," singleton fuzzifier, and center-average defuzzifier, as in [31], was used:

$$f(x_1, x_2) = \frac{\sum_{l_1} \sum_{l_2} \bar{y}^{\,l_1 l_2} \, \mu_{A_1^{l_1}}(x_1) \, \mu_{A_2^{l_2}}(x_2)}{\sum_{l_1} \sum_{l_2} \mu_{A_1^{l_1}}(x_1) \, \mu_{A_2^{l_2}}(x_2)} \tag{7}$$

where $f$ denotes the approximated output, and $A_1^{l_1}$ and $A_2^{l_2}$ are fuzzy sets with membership functions $\mu_{A_1^{l_1}}$ and $\mu_{A_2^{l_2}}$ defined for inputs $x_1$ and $x_2$, respectively. The center of the output fuzzy set, denoted by $\bar{y}^{\,l_1 l_2}$, is chosen as the desired output value. More information about the derivation steps of the fuzzy system is presented in [31] and [32]. See [33] and [34] for a quick introduction to fuzzy logic.

The output of the fuzzy system is passed through a gain control block that adaptively tunes the gain of the feedback loop according to the buffer size (or delay) and the video content properties as

(8)

where the constant coefficient can be used for fine tuning the RCA according to the video content properties. According to experimental results, the range (0.2–0.6) is proposed for this coefficient. A higher loop gain is more suitable for video content with high motion activity. However, a single value in the middle of the range is good enough for all types of video content. To tune the coefficient adaptively to the content within this range, it can be roughly considered as a linear function of the average relative complexity defined in (5). Simply, the values 0.3, 0.4, 0.5, and 0.6 can be selected for the coefficient corresponding to the values 6, 5, 4, and 3 of the local average of relative complexity.

3) Quality Controller: The quality controller computes an additive term for the final QP based on the quality of the previous encoded picture and a local average quality over the encoded pictures. The idea is that while the fuzzy controller enforces the buffer constraint, the quality controller minimizes the variation in quality of the encoded video by using the available buffer space. To compute the quality QP, the average values of QP and PSNR (over the encoded pictures in the current scene) are considered as a reference point. Then, an attempt is made to drive the PSNR of the following pictures toward the reference point with a gain proportional to the current deviation from the reference point. As explained in Annex B, the quality QP used in (1) is computed by

(9)

which involves the average QP over the encoded frames in the current scene, the PSNR of the previous encoded frame, and the average PSNR over the encoded frames in the current scene. The PSNR values are computed based on the luminance component. A constant coefficient defines the gain of the quality feedback loop. A bigger gain for the quality feedback loop provides more constant quality while it increases the fluctuations of buffer fullness, which means additional use of the available space in the buffer. The value of the quality QP is limited to the range (−1, 1) to guarantee the buffer constraint and to prevent instability in the control system.
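The pieces described above can be put together in a rough Python sketch. Two parts are assumptions rather than the paper's formulas: the quality term is modeled as a plain proportional controller on the PSNR deviation (it is not the expression in (9)), and the P-picture QP is taken as the previous QP plus the fuzzy and quality terms, rounded and clipped to the H.264/AVC range of 0 to 51. The coefficient schedule, in contrast, follows directly from the quoted pairs (0.3, 0.4, 0.5, 0.6 for relative complexity 6, 5, 4, 3), which lie on the line 0.9 - 0.1 * X. All function names are hypothetical.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def loop_gain_coefficient(rel_complexity):
    """Linear schedule through the quoted points, kept inside the proposed 0.2-0.6 range."""
    return clamp(0.9 - 0.1 * rel_complexity, 0.2, 0.6)


def quality_delta_qp(prev_psnr, scene_avg_psnr, gain=0.5, limit=1.0):
    """Assumed proportional quality term, bounded to [-limit, limit].

    A picture that came out better than the scene average gets a positive
    term (coarser quantization next time), and vice versa.
    """
    return clamp(gain * (prev_psnr - scene_avg_psnr), -limit, limit)


def next_p_qp(prev_qp, fuzzy_delta, quality_delta, qp_min=0, qp_max=51):
    """Previous QP plus both controller outputs, rounded and clipped to the codec range."""
    return int(clamp(round(prev_qp + fuzzy_delta + quality_delta), qp_min, qp_max))


k = loop_gain_coefficient(5)                                  # about 0.4
dq = quality_delta_qp(prev_psnr=36.0, scene_avg_psnr=35.0)    # 0.5
qp = next_p_qp(prev_qp=30, fuzzy_delta=1.2, quality_delta=-0.4)
```

Because the quality term is bounded while the fuzzy term is not, the rate controller dominates whenever the buffer approaches a critical state, which matches the priority described in the text.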
In fact the fuzzy controller and the quality controller operate in parallel feedback loops that can have positive or negative outputs. If the outputs of two controllers have different signs, it means there a confliction between the controlling of rate and the controlling of quality. When the buffer conditions are critical, the priority is given to the fuzzy rate controller to be sure about the buffer constraint. Note that according to (9), the output of the quality controller is bounded in the range of ( 1, 1) while the output of the fuzzy controller can vary in a wider range. Therefore, when the buffer conditions are critical the output of the fuzzy controller is dominant and in the normal buffer conditions, depending on the quality measure, the quality controller can be dominant. Note also that it is possible to have a considerable increase or decrease in QP just by the quality controller because of integration of consequent outputs after a number of frames. B. I-Pictures The QP of I-picture is computed based on the picture complexity, target bit rate, buffer size (delay), buffer occupancy and scene cut information. Fig. 3 depicts a block diagram for the calculation of QP for I-pictures. There are two types of I-pictures in the bit stream, periodic I-pictures which are placed in locations with a constant frequency and the I-pictures which are inserted

Fig. 3. Block diagram of the RCA for the I-pictures.

at the beginning of scene cuts. The proposed QP for both types of I-pictures is formulated as (10), which yields the QP of the I-picture. It includes a constant value called the content adaptation factor that can be used for fine tuning of the rate control according to video content properties. A bigger factor provides a more aggressive control. To tune the factor automatically according to the content, it can be assumed to be a function of the frequency of scene changes; a bigger factor is suitable for more frequent scene changes. The formulation also includes a reference value for the QP of the I-picture, computed differently for the two types of I-pictures. The control of the I-picture QP, or its variation around the reference QP, is imposed by three controlling signals. Two of them adapt the QP of I-pictures according to the coding complexity of the video picture and the occupancy of the virtual buffer, respectively. The third adapts the QP according to the target delay, which is a function of the target bit rate and the size of the virtual buffer. While the reference QP defines a reference value, the controlling signals make small variations around it. To compute the small variations around the reference point, it is precise enough to use a simple first-order R-D model (11) relating the rate, the distortion, and the coding complexity. For small variations of QP around the reference point, an approximately linear relation between QP and distortion can be assumed, and the R-D model can be rewritten as (12). More details about the controlling signals and the reference QP are presented in the sequel.

1) Reference QP: Since the quality of I-pictures has a great impact on the quality of the following pictures in VBR video, and the reference QP is the main part of the final QP, the reference QP has an important role from the R-D point of view. The reference QPs for the two types of I-pictures are handled differently.
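For orientation, one common way to write a first-order R-D model of the kind referred to in (11) and (12) is shown below. The symbols R, D, X, and the constant c are notational assumptions, and the expressions are a sketch of the standard first-order form rather than a reproduction of the paper's equations.

```latex
% First-order model: rate inversely proportional to distortion,
% with X the coding complexity.
R = \frac{X}{D}

% Near the reference point, assume distortion is linear in QP,
% D \approx c \cdot QP, and absorb the constant c into X:
R \approx \frac{X}{QP}
```

Under this form, holding the rate fixed means that a relative change in complexity calls for the same relative change in QP, which is the sense in which the controlling signals are described as drifts from a reference point.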
For the periodic I-pictures, in which consecutive pictures have a high degree of correlation in terms of content and complexity, the idea is to have I-pictures with a quality as close as possible to the quality of neighboring pictures. Applying an LPF similar to (6) to the QP of previous encoded pictures gives a local average value which can be used as the reference QP for the current I-picture. However, using a similar QP for encoding the I-picture and the neighboring P-pictures gives the I-picture a higher quality than the P-pictures. This difference is acceptable and is useful for overall quality. The LPF prevents larger differences that could otherwise exist between the quality of the I-picture and the P-pictures.

The I-pictures at scene cuts may or may not be correlated in terms of complexity and/or content with the previous encoded pictures. Therefore, any estimation made independently of previous encoded frames, or based only on previous encoded frames, may waste the bit budget or the quality. From this point of view, estimating a suitable QP for an I-picture at a scene cut is quite challenging. A method for calculating the QP for I-pictures at scene cuts based on the target bit rate and the coding complexity of the frame was proposed in [28]. A simpler solution is proposed in this paper: the reference QP for the first I-picture at a scene cut is calculated as (13) from two terms, a local average as for periodic I-pictures and a constant QP in the middle range, e.g., (26–34) for H.264/AVC, as a global average over various video content. The local average value of QP keeps the quality of the I-picture close to that of previous encoded pictures when there is some correlation between the two consecutive scenes in terms of content. The constant QP guarantees the allocation of a bit budget in the middle range if there is no correlation between consecutive scenes in terms of complexity.
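The two reference-QP cases can be sketched as follows. Since (6) and (13) are not reproduced here, the exponential moving average used as the LPF, the smoothing factor, the equal-weight blend at scene cuts, and the default mid-range QP of 30 are all illustrative assumptions, as are the function names.

```python
def local_average_qp(previous_qps, smoothing=0.1):
    """Low-pass filter (assumed exponential moving average) over past QPs."""
    if not previous_qps:
        raise ValueError("need at least one previously encoded picture")
    avg = float(previous_qps[0])
    for qp in previous_qps[1:]:
        avg += smoothing * (qp - avg)
    return avg


def reference_qp(previous_qps, at_scene_cut, mid_range_qp=30.0, weight=0.5):
    """Reference QP for an I-picture.

    Periodic I-picture: the local average alone.
    Scene-cut I-picture: a blend (assumed form) of the local average and a
    constant mid-range QP, e.g. a value in 26-34 for H.264/AVC.
    """
    local = local_average_qp(previous_qps)
    if not at_scene_cut:
        return local
    return weight * local + (1.0 - weight) * mid_range_qp
```

With a history of QPs around 24, the periodic case returns about 24, while the scene-cut case is pulled toward the mid-range constant, which hedges against an uncorrelated new scene.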
To find a more accurate value for the constant QP, it is proposed to establish a 3-D lookup table including a limited number of cells corresponding to practical operating points and based on average encoding results over a set of various video content and encoding parameters. The target bit rate, frame rate, and picture size are the three basic parameters proposed as dimensions for the lookup table. For example, average simulation results give a QP of about 28 for QVGA picture format, 15 fps, and 300 kbps.

2) Coding Complexity Adaptation: The complexity adaptation signal controls the QP of the I-picture according to the coding complexity of the picture. Using the R-D model (12) and considering the average values of QP and complexity of I-pictures as a reference point, the complexity adaptation signal based on a drift from the reference can be derived as (14). It depends on the average value of QP of all encoded I-pictures, an experimentally defined constant value (typically about 0.3) called the complexity adaptation factor, the coding complexity of the I-picture, and the average value of coding complexity of all encoded I-pictures. Various criteria for the estimation of coding complexity can be used. A measure for the coding complexity of I-pictures was proposed

in our previous work [35]. With a small modification, the coding complexity is proposed to be computed as (15)–(18) from the variance of luminance pixels in one four-by-four block and from a vertical and a horizontal texture measure on the luminance pixels. The complexity uses the average values of the variance and of the vertical and horizontal texture measures over all the blocks in the picture. If a pixel is at the block edge, its neighbors used in the texture measures can be in the neighboring blocks.

3) Buffer Fullness Adaptation: The buffer fullness adaptation signal controls the QP of the I-picture according to the size and occupancy of the virtual buffer. Several points are considered in the definition of the fullness adaptation signal. First, a more aggressive control is needed when the buffer is close to the critical conditions, i.e., underflow or overflow, and a looser control is needed when the buffer fullness is far from the critical conditions, to prevent unnecessary variations in QP and thereby maximize the average quality of the compressed video. Second, as an I-picture can consume a bit budget several times larger than a P-picture, a virtual buffer at the receiver side of the channel that is close to overflow may return to a normal condition after a normal I-picture without any change in the QP. On the other hand, when the buffer is close to underflow, the QP of the I-picture should be increased to a relatively high value to prevent underflow. This means the fullness adaptation signal should be more aggressive in low buffer fullness conditions than in high buffer fullness conditions. Finally, the buffer fullness adaptation feedback should be coherent with the other parts of the RCA. Considering the points above, for the virtual buffer at the receiver side of the channel, a simple formulation for the buffer fullness adaptation signal is proposed as (19), where the buffer variable is defined as in (3). The buffer adaptation function is depicted graphically in Fig. 4.

4) Delay Adaptation: The idea is that when a smaller buffer size is used to achieve a smaller average delay, fewer bits should be allocated to I-pictures than when a larger buffer is used, to prevent buffer underflow.
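The "drift from a reference" step used for the complexity adaptation signal in 2), and again for the delay adaptation signal introduced here, can be illustrated under a first-order model: at a fixed rate, QP scales with complexity, so a relative excess in complexity maps to the same relative excess in QP, scaled by an adaptation factor. The exact expressions (14) and (20) are not reproduced; the function below is only a sketch of that reasoning for the complexity case, with hypothetical names, using the typical factor of 0.3 quoted in the text.

```python
def complexity_adaptation(avg_qp_i: float, complexity: float,
                          avg_complexity: float, factor: float = 0.3) -> float:
    """QP offset for an I-picture from its relative complexity drift.

    Assumes rate ~ complexity / QP, so that at a fixed rate
    delta_QP / QP_ref = delta_X / X_ref; `factor` damps the correction.
    """
    if avg_complexity <= 0:
        raise ValueError("average complexity must be positive")
    relative_drift = (complexity - avg_complexity) / avg_complexity
    return factor * avg_qp_i * relative_drift
```

For example, under these assumptions an I-picture that is 20% more complex than average, with an average I-picture QP of 30, would receive an offset of about +1.8, and a picture of average complexity would receive none.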
The delay adaptation signal biases the QP of the I-picture by a constant value according to the buffer size. Using the R-D model (12) and considering a reference point for the buffer size at which no adaptation is needed, the delay adaptation signal based on a drift from the reference can be derived as (20).

TABLE III COMPARISON OF THE RESULTS OF THE SF RATE CONTROLLER WITH THE CONSTANT QP CASE AND THE JM RCA AT A TARGET BIT RATE OF ABOUT 300 kbps, 30 fps, AND QCIF PICTURE FORMAT

Fig. 4. Buffer adaptation function graph.

The signal in (20) uses the average value of QP of all encoded P-pictures and a constant value as a reference point for the buffer size at which no adaptation is needed. For example, simulation results show that in many cases, beyond a certain buffer size, the buffer is large enough to have I-pictures with a QP similar to P-pictures and there is no need for delay adaptation. Therefore, the reference point can be selected as that buffer size.

IV. SIMULATION RESULTS

VBR video applications cover a wide area in the R-D space between the CBR and constant quality regions. The operating point in R-D space is defined by the RCA, which controls the variations in the bit rate and the quality of the encoded video bit stream. The amount of variation in bit rate is proportional to the buffering delay in the system. Therefore, different VBR applications differ mainly in terms of the required buffering delay.

To evaluate the proposed SF algorithm from different points of view, two sets of video sequences were used. In the first set, a number of well-known video sequences including Foreman, Carphone, News, Hall, Silent, Container, New York, and Football in QCIF picture format were concatenated to make long (30 s) sequences suitable for the test. The second set includes four long (60 s) video sequences with different contents, namely sport, news, music video, and movie, in QVGA picture format captured from TV. The video sequences in the second set have many short scenes, and hence many scene cuts, which are very challenging for the rate control.

To evaluate the proposed SF algorithm from the quality and delay points of view, we compared the results of the SF algorithm with the results of constant QP (CQP) encoding and with the encoding results provided by the H.264/AVC JM RCA. The video sequences of the first video set were encoded by the three algorithms for an average bit rate close to 300 kbps, a frame rate of 30 fps, and an I-picture frequency of 0.5 Hz. A buffer size of 250 kb was allocated to the SF and JM algorithms in this simulation.
The typical values (mentioned in Section III) were used for the tuning parameters and other constant coefficients. The JM RCA was used with the control at frame level (i.e., the basic unit is a frame) to operate closer to VBR. The value of 0.5 was used for the corresponding JM parameters, according to the recommendation. The Nokia H.264/AVC codec (available online at ftp://standards.polycom.com/IMTC_Media_Coding_AG/) was used for encoding. Level 3 of the Baseline profile, with R-D optimization (RDO), was used for the implementation of the three RCAs. The number of reference frames was set to 1, the number of bytes per slice was set to 1000, and other encoding parameters were left at their defaults. To achieve similar average bit rates in the three cases, each sequence was first encoded with a constant QP to get an average bit rate close to 300 kbps in the CQP case, and then the target bit rate in the SF and JM RCAs was set to the average bit rate obtained in the CQP case.

Table III shows the results of the simulation, including the averaged results. In the comparisons that follow, each pair lists the compared algorithm first and the SF algorithm second. In comparison with the CQP case, the SF algorithm provides a lower delay (0.80, 0.23 s), a higher PSNR (40.37, 40.47 dB), and a smaller average QP (22.38, 22.30) at the expense of a larger standard deviation (STD) of PSNR (0.55, 1.59 dB). In comparison with the JM algorithm, the SF algorithm provides a lower standard deviation of PSNR (1.99, 1.59 dB) and a smaller average QP (22.46, 22.30) at a very close operating point in terms of PSNR (40.52, 40.47 dB) and delay (0.20, 0.23 s). The graphs in Fig. 5 show the graphical encoding

TABLE IV COMPARISON OF THE RESULTS OF THE SF RATE CONTROLLER WITH THE CONSTANT QP CASE AND THE JM RCA AT A TARGET BIT RATE OF ABOUT 300 kbps, 15 fps, AND QVGA PICTURE FORMAT

Fig. 5. Simulation results for the Football video sequence encoded by the SF and the JM RCAs.

results for the Football video sequence. Although the SF and JM algorithms have similar average PSNR in the table, the graphical results show smoother variations of QP and PSNR for the SF algorithm than for the JM algorithm. Moreover, the strong correlation between the graphs of QP, PSNR, and buffer occupancy provided by the SF algorithm indicates that the available resources, including the bit rate and delay, are allocated according to the coding complexity of the video content.

The second simulation was run on the second video set to evaluate the SF algorithm at another frame rate (15 fps) and picture format (QVGA). The other encoding parameters were selected as in the previous simulation. Simulation results are listed in Table IV, with each pair below giving the compared algorithm first and the SF algorithm second. In comparison with the CQP case, the SF algorithm provides a lower delay (5.11, 0.39 s), a higher PSNR (35.33, 35.53 dB), and a lower average QP (29.25, 28.91) at the expense of a higher standard deviation of PSNR (1.48, 3.39 dB). In comparison with the JM algorithm, the SF algorithm provides a higher PSNR (35.41, 35.53 dB), a lower standard deviation of PSNR (3.62, 3.39 dB), and a smaller average QP (29.33, 28.91) at a very close operating point in terms of delay (0.35, 0.39 s). A comparison of the results on the two video sets in Tables III and IV shows an overall higher performance for the SF RCA on the second video set. This higher performance is related to the large number of scene changes in the second video set and the good solution of the SF RCA for the calculation of QP at scene cuts. It is notable that the results above were obtained for small buffer sizes, while the standard allows larger buffers. Therefore, additional enhancement in the quality of compressed video can be achieved in the standard range by the proposed SF RCA at the expense of a higher delay.
Based on additional investigations, it is concluded that higher delays do not affect the average PSNR much but the standard deviation of PSNR is decreased for higher delays.

To evaluate the proposed RCA from the buffer constraint point of view, a simulation was run on the second video set, which is more difficult to control. The SF algorithm was tuned as in the previous simulation, and the video sequences were encoded for a target bit rate of 300 kbps and a frame rate of 15 fps with buffer sizes of 100, 150, 200, 250, 300, and 350 kb. For each buffer size, the simulation was repeated for periodic I-picture frequencies of 0.25, 0.33, 0.5, 0.6, 0.75, and 1.0 Hz. From the buffer and delay points of view, simulation results show that the bit streams encoded by the proposed SF algorithm strictly obey the buffer and delay constraints. In addition, the simulation results plotted in Fig. 6 show the standard deviation of QP as a function of buffering delay and the period of I-pictures. The flat curve for the standard deviation of QP as a function of I-picture period indicates that the performance of the RCA is independent of picture type and that the I-pictures are encoded as well as the P-pictures. Furthermore, the linear relation between the standard deviation of QP and the average buffering delay indicates the fine adaptation of the RCA to the delay. Moreover, the average (over all sequences) PSNR of the luma component is depicted as a function of buffering delay and the period of I-pictures in Fig. 7. The regular 2-D function for the PSNR indicates the fine adaptation of the SF algorithm to different operating points.

To show how easily the proposed RCA can be tuned for a wide range of applications, the Glasgow video sequence was selected for another simulation. It has many scene cuts and different types of scenes, which is a challenge for the rate control. The sequence was encoded by the proposed RCA for three different delays, 0.4, 0.8, and 1.4 s, with a target bit rate of 125 kbps and a frame rate of 12.5 fps, while the other encoding and tuning parameters remained fixed.
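If the delays quoted for this experiment are read as maximum buffering delays, the corresponding buffer sizes follow from the usual relation between buffer size, delay, and bit rate. This relation is a general leaky-bucket assumption rather than a formula given in the text, and the helper name is hypothetical.

```python
def buffer_size_kb(delay_s: float, bitrate_kbps: float) -> float:
    """Buffer size (kb) that holds `delay_s` seconds of video at `bitrate_kbps`."""
    if delay_s < 0 or bitrate_kbps <= 0:
        raise ValueError("delay must be non-negative and bit rate positive")
    return delay_s * bitrate_kbps


# Delays used for the Glasgow sequence at the 125 kbps target rate.
for delay in (0.4, 0.8, 1.4):
    print(f"D = {delay} s -> about {buffer_size_kb(delay, 125):.0f} kb")
```

The same relation, read in the other direction, suggests why the buffer size is treated as the main tuning parameter: at a fixed target rate, choosing the buffer size amounts to choosing the allowed delay.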
The graphical encoding results are depicted in Fig. 8. The curves show how well the RCA can operate in a wide area between CBR and constant

Fig. 6. Effect of I-picture period and buffering delay on the standard deviation of QP.

Fig. 8. Simulation results for the Glasgow video sequence in different buffering delay (D), 12.5 fps, QCIF.

TABLE V COMPARISON OF THE SF AND JM RATE CONTROL ALGORITHMS IN TERMS OF PROCESSING TIME USING AN INTEL PENTIUM 4, 2.8 GHz PROCESSOR

Fig. 7. Effect of I-picture period and buffering delay on Luma PSNR.

quality. The strong correlation between the curves shows that the bit allocation consistently follows the complexity of the video content and the available resources, including the bit budget and delay.

From the computational complexity point of view, the proposed RCA has a remarkably low degree of complexity. It needs just a few operations to calculate the QP for each picture. There is no R-D model to update based on a complexity measure such as variance or MAD, as in conventional algorithms. The complexity estimation for the I-pictures is the most complex part of this algorithm. However, it is used just for the I-pictures, while a similar operation is used for all pictures (or MBs) in many RCAs proposed in the literature. To compare the complexity of the JM and SF RCAs, another simulation was run. The first 100 video frames of four video sequences, Foreman, Carphone, News, and Hall, were encoded by the two RCAs with encoding parameters as before. The processing times consumed by the RCAs were measured with high accuracy using the processor clock (Intel Pentium 4, 2.8 GHz). To minimize the measurement error resulting from the time-sharing operation of the processor, the encoding was repeated five times and the minimum measured value was selected for

each sequence. The measured values over repetitions were very similar, with a small variance. The measured results are shown in Table V. The average results over the video sequences show that the JM RCA consumes a processing time of about 384 µs on average for each frame, while the SF RCA consumes about 15 µs on average for each frame. According to these results, there is a big difference between the complexities of the two algorithms. The complexity of the JM RCA is even higher when the basic unit is smaller than a frame. The proposed RCA can be made less complex if only a subgroup of MBs in the I-pictures is used for the complexity estimation of the picture, as in [28]. The complexity of the proposed RCA is even less than that of a typical scene cut detection algorithm which is run for all pictures. To simplify the scene cut detection task, a scene cut detection algorithm was proposed in [28]. To determine the scene cuts, a simple criterion is used:

if (MAD > Threshold), Scene Cut    (21)

where MAD is the mean absolute difference between the luminance components of two selected small groups of pixels in the previous and current uncompressed frames. The pixel groups are selected according to a special pattern: one pixel is selected at a fixed interval in the row-ordered frame, where the interval is an odd number in the range of (7, 15).

It is notable that the constant coefficients used in the algorithms can be used for any application with the mentioned typical values. The buffer size, which defines the allowed delay, is the main user-defined tuning parameter. Therefore, the proposed RCA is an easily tunable algorithm.
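The scene-cut test in (21) can be sketched as below. Frames are assumed to be flat, row-ordered sequences of 8-bit luminance samples; the step of 11 is one odd value from the quoted range, and the threshold of 20 is purely illustrative, since no value is given in the text. Function and parameter names are hypothetical, and the sampling is a plain fixed stride rather than the specific pattern of [28].

```python
def subsampled_mad(prev_luma, curr_luma, step=11):
    """Mean absolute difference over every `step`-th luminance sample."""
    if len(prev_luma) != len(curr_luma) or not prev_luma:
        raise ValueError("frames must be non-empty and equally sized")
    diffs = [abs(a - b) for a, b in zip(prev_luma[::step], curr_luma[::step])]
    return sum(diffs) / len(diffs)


def is_scene_cut(prev_luma, curr_luma, threshold=20.0, step=11):
    """Criterion (21): declare a scene cut when MAD exceeds the threshold."""
    return subsampled_mad(prev_luma, curr_luma, step) > threshold
```

Because only about one sample in eleven is visited, the cost per frame is a small fraction of a full-frame difference, which matches the stated goal of keeping scene cut detection cheap.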

V. CONCLUSION

A novel SF video RCA for VBR applications was proposed. It can be easily tuned for a wide range of applications with various target delays. The proposed video RCA has been optimized to utilize the advantage of VBR video to improve compression performance and maintain constant quality. Simulation results with an H.264/AVC video codec show that the algorithm can provide bit streams with high-level average quality for a wide range of applications from constant quality to CBR.

APPENDIX

A. Fuzzy Membership Functions

The input signals to the fuzzy system are specified by their fuzzy MSFs. Seven and nine trapezoidal MSFs were employed for the two inputs, respectively. Each MSF can be defined by the trapezoid corners. All the MSFs are summarized in two matrices corresponding to the two inputs.

B. Quality Controller

The idea is to encode video pictures with minimum variation in quality by a feedback from the quality of encoded video pictures. This is possible with more variation in the bit rate when some space is available in the buffer. The derivation starts with the PSNR feedback as the quality measure. If, for a small variation of QP around an operating point, the quantization error is assumed proportional to the QP, the PSNR can be expressed in terms of the QP and two constant values. The derivative of the PSNR with respect to QP then yields a relation that can be rewritten for small changes of QP and PSNR.

Considering the correlation between consecutive video pictures, to drive the quality of the following pictures toward a reference point with a certain feedback gain, the output of the quality controller is defined in terms of a constant that defines the feedback gain and the QP and PSNR of the reference point. The PSNR used in this formula is the PSNR of the previous encoded frame, taken as an estimate for the PSNR of the next frame if they are encoded with similar QPs. If the average values of PSNR and QP over all encoded pictures in the same video scene are considered as the reference point, the output of the quality controller uses the averages of QP and PSNR over all encoded pictures in the same video scene.

REFERENCES

[1] T. V. Lakshman, A. Ortega, and A. R. Reibman, “VBR video: Tradeoffs and potentials,” IEEE Proc., vol. 86, no. 5, pp. 952–973, May 1998.
[2] T. R. Gardos, Video codec test model, near-term, Version 8 (TMN8), ITU—Telecommunications Standardization Sector Doc. Q15-A-59, Jun. 24–27, 1997.
[3] S. Ma, W. Gao, P. Gao, and Y. Lu, “Rate control for advance video coding (AVC) standard,” in IEEE Int. Symp. Circuits Syst. (ISCAS ’03), May 25–28, 2003, vol. 2, pp. 892–895.
[4] MPEG-4 Video Verification Model, Version 18.0, Appendix I: Rate Control, ISO/IEC JTC1/SC29/WG11 N3908, Jan. 2001.

[5] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using a new rate-distortion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 246–250, Feb. 1997.
[6] G. Sullivan, T. Wiegand, and K. P. Lim, “Joint model reference encoding methods and decoding concealment methods,” Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG Document JVT-I049, Sep. 2003.
[7] X. K. Yang, Y. M. Tan, and N. Ling, “Rate control for H.264 with two-step quantization parameter determination but single-pass encoding,” J. Appl. Signal Process., vol. 2006, pp. 35–37, 2006.
[8] J.-C. Tsai and C.-H. Hsieh, “Modified TMN8 rate control for low-delay video communications,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 6, pp. 864–868, Jun. 2004.
[9] F. Pan, Z. Li, K. Lim, and G. Feng, “A study of MPEG-4 rate control scheme and its improvements,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 5, pp. 440–446, May 2003.
[10] P. Navakitkanok and S. Aramvith, “Improved rate control for advanced video coding (AVC) standard under low delay constraint,” in Proc. Int. Conf. Inf. Technol.: Coding Comput. (ITCC), 2004, vol. 2, pp. 664–668.
[11] S. Ma, W. Gao, and Y. Lu, “Rate-distortion analysis for H.264/AVC video coding and its application to rate control,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 12, pp. 1533–1544, Dec. 2005.
[12] M. Jiang and N. Ling, “On enhancing H.264/AVC video rate control by PSNR-based frame complexity estimation,” IEEE Trans. Consum. Electron., vol. 51, no. 1, pp. 281–286, Feb. 2005.
[13] M. Jiang and N. Ling, “Low-delay rate control for real-time H.264/AVC video coding,” IEEE Trans. Multimedia, vol. 8, no. 3, pp. 467–4776, Jun. 2006.
[14] Z. He and S. K. Mitra, “A unified rate-distortion analysis framework for transform coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 12, pp. 1221–1236, Dec. 2001.
[15] Z. He and S. K. Mitra, “Optimum bit allocation and accurate rate control for video coding via ρ-domain source modeling,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 10, pp. 840–849, Oct. 2002.
[16] C. Y. Chang, T. Lin, D. Y. Chan, and S. H. Hung, “A low complexity rate-distortion source modeling framework,” in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP ’06), May 2006, pp. 929–932.
[17] E. C. Reed and F. Dufaux, “Constrained bit-rate control for very low bit-rate streaming video applications,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 7, pp. 882–889, Jul. 2001.
[18] A. G. Nguyen and J.-N. Hwang, “SPEM online rate control for real time streaming video,” in Proc. IEEE Int. Conf. Inf. Technol., 2002, pp. 65–70.
[19] S. Takamura and N. Kobayashi, “MPEG-2 one-pass variable bit rate control algorithm and its LSI implementation,” in Proc. IEEE Int. Conf. Image Process., 2002, pp. 942–945.
[20] A. Jagmohan and K. Ratakonda, “MPEG-4 one-pass VBR rate control for digital storage,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 5, pp. 447–452, May 2003.
[21] B. C. Song and K. W. Chun, “A one-pass variable bit-rate video coding for storage media,” IEEE Trans. Consum. Electron., vol. 49, no. 3, pp. 689–692, Aug. 2003.
[22] S. Kondo and H. Fukudaa, “A real-time variable bit rate MPEG2 video coding method for digital storage media,” IEEE Trans. Consum. Electron., vol. 43, no. 3, pp. 537–543, Aug. 1997.
[23] V. Varsa and M. Karczewicz, “Long window rate control for video streaming,” in Proc. 11th Int. Packet Video Workshop, Korea, May 2001, pp. 154–159.
[24] I.-M. Pao and M.-T. Sun, “A rate-control scheme for streaming video encoding,” in Proc. 32nd Asilomar Conf. Signals, Syst. Comput., Nov. 1998, vol. 2, pp. 1616–1620.
[25] M. Rezaei, S. Wenger, and M. Gabbouj, “Video rate control for streaming and local recording optimized for mobile devices,” in Proc. IEEE Int. Symp. Personal Indoor Mobile Radio Commun. (PIMRC’05), Berlin, Sep. 2005, vol. 4, pp. 2284–2288.
[26] D. H. K. Tsang, B. Bensaou, and S. T. C. Lam, “Fuzzy-based rate control for real-time MPEG video,” IEEE Trans. Fuzzy Syst., vol. 6, no. 4, pp. 504–516, Nov. 1998.

[27] M. Rezaei, A. Akhbardeh, M. M. Hannuksela, and M. Gabbouj, “Fuzzy rate controller for variable bitrate video in mobile applications,” in IEEE Int. Conf. Commun. (ICC), Istanbul, Turkey, Jun. 2006, vol. 7, pp. 3197–3201.
[28] M. Rezaei, M. M. Hannuksela, and M. Gabbouj, “Low-complexity fuzzy video rate controller for streaming,” presented at the IEEE Int. Conf. Acoustic, Speech and Signal Processing (ICASSP ’06), Toulouse, France, May 2006.
[29] M. Rezaei, M. Gabbouj, and I. Bouazizi, “Delay constrained fuzzy rate control for video streaming over DVB-H,” in IEEE Int. Conf. Intell. Inf. Hiding Multimedia Signal Process. (IIHMSP ’06), Pasadena, CA, Dec. 2006, pp. 223–227.
[30] X. M. Zhang, A. Vetro, Y. Q. Shi, and H. Sun, “Constant quality constrained rate allocation for FGS-coded video,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 2, pp. 121–130, Feb. 2003.
[31] L. X. Wang, Adaptive Fuzzy System and Control: Design and Stability Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1994.
[32] L. X. Wang, “Stable adaptive fuzzy control of nonlinear systems,” IEEE Trans. Fuzzy Syst., vol. 1, no. 2, pp. 146–155, May 1993.
[33] L. A. Zadeh, “Soft computing and fuzzy logic,” IEEE Softw., vol. 11, no. 6, pp. 48–56, Nov. 1994.
[34] L. A. Zadeh, “Fuzzy logic,” IEEE Computer Mag., vol. 21, no. 4, pp. 83–93, Apr. 1988.
[35] M. Rezaei, S. Wenger, and M. Gabbouj, “Analyzed rate distortion model in standard video codecs for rate control,” in Proc. IEEE Workshop Signal Process. Syst. (SIPS ’05), Athens, Greece, Nov. 2005, pp. 550–555.

Mehdi Rezaei (M’04) received the B.S. degree in electronics engineering from Amir Kabir University of Technology (Polytechnic of Tehran), Tehran, Iran, in 1992, and the M.Sc. degree in electronics engineering from Tarbiat Modares University, Tehran, Iran, in 1996. He was an academic member of the Electrical Engineering Department, University of Sistan & Balouchestan, Iran, during 1997–2003. He has been a Researcher of Institute of Signal Processing, Tampere University of Technology, Finland, since 2003. His research interests include video signal processing, variable bit rate (VBR) video, video rate control, video splicing, region of interest (ROI) video coding, scalable video coding (SVC), video enhancement, video streaming and communication, mobile TV and DVB-H. He has published several papers in these fields. Mr. Rezaei received the Nokia Foundation Award in 2005 and 2006.

Miska M. Hannuksela (M’02) received the M.S. degree in engineering from Tampere University of Technology, Tampere, Finland, in 1997. He is currently a Research Leader in Nokia Research Center, Tampere, Finland. From 1996 to 1999, he was a Research Engineer with Nokia Research Center in the area of mobile video communications. From 2000 to 2003, he was a Project Team Leader and Specialist in various mobile multimedia research and product projects in Nokia Mobile Phones. Since 2003, he has been a Research Manager, Senior Research Manager, and Research Leader heading teams in the area of visual technologies and real-time multimedia communications in Nokia Research Center. He has been an active participant in the ITU-T Video Coding Experts Group since 1999 and in the Joint Video Team of ITU-T and ISO/IEC since its foundation in 2001. He has also contributed to several other multimedia standards, such as IP datacasting over DVB-H and 3GPP multimedia services. His research interests include video error resilience, scalable video coding, and video communication systems. He has co-authored several tens of papers in these fields.

Moncef Gabbouj (SM’95) received the B.S. degree in electrical engineering from Oklahoma State University, Stillwater, in 1985, and the M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, in 1986 and 1989, respectively. He is currently a Professor and Head of the Institute of Signal Processing at the Tampere University of Technology, Tampere, Finland. He is the Co-Founder and past CEO of SuviSoft Oy, Ltd. From 1995 to 1998, he was a Professor with the Department of Information Technology, Pori School of Technology and Economics, and during 1997 and 1998, he was a Senior Research Scientist with the Academy of Finland. From 1994 to 1995, he was an Associate Professor with the Signal Processing Laboratory, Tampere University of Technology. From 1990 to 1993, he was a Senior Research Scientist with the Research Institute for Information Technology, Tampere. His research interests include multimedia content-based analysis, indexing and retrieval, nonlinear signal and image processing and analysis, and video processing and coding. He is the coauthor of over 300 publications. Dr. Gabbouj is an Honorary Guest Professor at Jilin University, China (2005–2010). He served as Distinguished Lecturer for the IEEE Circuits and Systems Society in 2004 and 2005 and Past-Chairman of the IEEE-EURASIP NSIP (Nonlinear Signal and Image Processing) Board. He was Chairman of the Algorithm Group of the EC COST 211quat. He served as Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING, and was a Guest Editor of the European journals Applied Signal Processing (Image Analysis for Interactive Multimedia Services, Part I in April 2002 and Part II in June 2002) and Signal

Processing (special issue on nonlinear digital signal processing, August 1994). He is the past Chairman of the IEEE Finland Section and past Chair of the IEEE Circuits and Systems Society, Technical Committee on Digital Signal Processing, and the IEEE SP/CAS Finland Chapter. He was also Chairman of CBMI 2005, WIAMIS 2001, and the TPC Chair of ISCCSP 2004 and 2006, CBMI 2003, EUSIPCO 2000, NORSIG 1996, and the DSP track chair of the 1996 IEEE ISCAS. He is also a member of EURASIP Advisory Board and a past member of AdCom. He also served as Publication Chair and Publicity Chair of IEEE ICIP 2005 and IEEE ICASSP 2006, respectively. He is the Director of the International University Programs in Information Technology and vice member of the Council of the Department of Information Technology at Tampere University of Technology. He is also the Vice-Director of the Academy of Finland Center of Excellence SPAG, Secretary of the International Advisory Board of Tampere International Center of Signal Processing, TICSP, and member of the Board of the Digital Media Institute. He serves as Tutoring Professor for Nokia Mobile Phones Leading Science Program (2005–2006 and 1998–2001). He is a member of Eta Kappa Nu, Phi Kappa Phi, IEEE SP, and CAS societies. He was the recipient of the 2005 Nokia Foundation Recognition Award, co-recipient of the Myril B. Reed Best Paper Award from the 32nd Midwest Symposium on Circuits and Systems, and co-recipient of the NORSIG 94 Best Paper Award from the 1994 Nordic Signal Processing Symposium. He has been involved in several past and current EU Research and education projects and programs, including ESPRIT, HCM, IST, COST, Tempus, and Erasmus. He also served as Evaluator of IST proposals and Auditor of a number of ACTS and IST projects on multimedia security, augmented and virtual reality, and image and video signal processing.