2013 European Modelling Symposium
QoE Prediction Model Based on Fuzzy Logic System for Different Video Contents
Mohammed Alreshoodi
John Woods
School of Computer Science and Electronic Engineering University of Essex, Colchester, UK
[email protected]
School of Computer Science and Electronic Engineering University of Essex, Colchester, UK
[email protected]
978-1-4799-2578-0/13 $31.00 © 2013 IEEE DOI 10.1109/EMS.2013.106
Abstract— A model that can predict end user satisfaction or QoE (Quality of Experience) directly from the network QoS (Quality of Service) is still elusive in the field of image processing. This motivates the derivation of a meaningful QoS to QoE mapping function to allow one to be predicted in the absence of the other. This paper presents an affine fuzzy logic based model that can estimate the visual perceptual quality for different video content types using a combination of network level and application level QoS parameters. Video contents are classified based on their spatio-temporal feature extraction. The video QoE is predicted in terms of the Mean Opinion Score (MOS). From the results it is clear that the QoE is video content dependent. Also, the network level parameters have more impact on video quality than the application level parameters. Results show that the fuzzy logic-based model provides high prediction accuracy. The performance of the model was evaluated using a public dataset with good prediction accuracy (~ 95%). The developed model has use in control methods for streaming standard encoded video.
Keywords—QoE; QoS; video quality; fuzzy logic.
I. INTRODUCTION
Adopting a more holistic understanding of quality as perceived by end-users is becoming a vibrant area of research. When a customer receives a low-quality service, the service provider cannot afford to wait for customer complaints before doing something about the service quality. According to an Accenture survey [1], about 90% of users do not want to complain about a low-quality service, and simply go to another provider. Therefore, it would be a powerful tool if the service provider could continually measure the QoE and make adjustments accordingly. A variety of factors can affect the perceived quality, including network reliability, the content preparation process, terminal performance, etc. The available Quality of Service (QoS) is the major determining factor and end-to-end QoS is the enabler for QoE. Thus, finding the correlation between them is a significant first step towards a more optimized feedback system that can manage video services in an efficient way.
There is, however, a lack of an accurate quantitative description of what QoE is. The interactions between various QoS parameters and their effects on QoE are still poorly understood. In light of this there is a need to directly and quantitatively map QoS to QoE. A prediction model that can quantitatively describe QoE and how it is affected by QoS is still missing, especially in the area of multi-layered video. This motivates investigation of the end user QoE versus network-oriented QoS and finding a meaningful mapping function between them. Various Artificial Intelligence (AI) based methodologies have been used to realize predictive QoE models. However, the vast majority of existing models are only partial solutions, neglecting many of the parameters available.
This paper presents a Fuzzy Logic Inference System (FIS) that investigates how QoS factors contribute to the QoE, and how the video content type affects the QoE. The proposed methodology employs a learning system which derives the QoS/QoE mapping for single-layer coded video, allowing an optimal selection of QoE given the constraints. The fuzzy model is tested using a library of representative video. Section II presents an overview of the related works for video QoE measurement. Section III presents the proposed QoE prediction model. Section IV presents the evaluation set-up. Results are presented in Section V. Section VI shows the evaluation of the proposed model. The conclusions and future work can be found in Section VII.
II. RELATED WORK
QoE estimation of video traffic can be subjective or objective. Subjective methods like MOS [2] are costly, time consuming and require an appropriate testing environment and strict rules. Objective methods overcome these limitations by providing a mathematical calculation for the quality estimation. Peak Signal-to-Noise Ratio (PSNR) is one of the most popular objective video quality metrics. However, PSNR correlates poorly with the human perception of visual quality [3]. A number of AI techniques have been used in the literature for developing objective QoE prediction models, such as FIS, Bayesian Networks, Artificial Neural Networks (ANN) and Machine Learning algorithms (Decision Trees, Support Vector Machines, k-Nearest Neighbours, etc.) [4].
The study reported in [5] proposed a prediction model based on FIS (Sugeno system) to estimate the impact of network conditions on the quality of single layer video traffic. The developed model was integrated as part of a monitoring tool in an industrial IPTV test bed. Work in [6] proposed a taxonomy for measuring the QoE of virtual reality applications. The taxonomy was modelled by a well-known FIS (Mamdani system) to quantitatively measure the QoE of a haptic virtual environment. The developed model connects QoE metrics to QoS metrics according to the corresponding level of QoE. However, real-time QoE estimation is not studied in the mentioned work.
In [7], the authors propose two learning models to predict the video quality in terms of the MOS. The first model is based on an Adaptive Neural Fuzzy Inference System (ANFIS), while the second uses nonlinear regression analysis. They investigate the impact of QoS on end-to-end video quality for H.264 encoded video. The results demonstrate that both models give good prediction accuracy. However, the authors conclude that the developed models need to be validated by more subjective testing. The authors of [8] also used the ANFIS approach to identify the causal relationship between the QoS parameters and the overall perceived QoE. However, the success of the AI techniques depends on the model's ability to fully learn the non-linear relationships between QoS and QoE.
From the literature, very little work has been done on predicting video QoE considering the impact of video content types and the impact of both the network level and application level parameters. Also, the majority of the existing video QoE prediction models focus on single layer video coding, whereas multi-layer video has rarely been investigated. Multi-layer video coding fits the requirements of video streaming in heterogeneous usage environments. We proposed in [9] a video quality prediction model based on FIS that combined both the application and network level parameters, but it only considers one type of video content (moderate movement). In this paper, we extend the work in [9] to propose an FIS-based learning model to predict video quality for different video content types using a combination of network level and application level QoS parameters. Video contents are classified based on their spatio-temporal feature extraction. The advantage of using the FIS is that it is simple and computationally less intensive.
III. THE PROPOSED PREDICTION MODEL BASED ON FIS
A. Brief Overview about FIS
Fuzzy logic became an efficient technique for user modelling that could imitate human reasoning [10]. It is considered an extension of traditional set theory, as statements can be partial truths, lying in between absolute truth and absolute falsity [11]. As shown in Fig. 1, the FIS includes four stages: fuzzifier, rule base, inference engine, and defuzzifier. The rule base can be extracted from numerical data or predefined by experts. Upon the rules' establishment, the FIS maps the inputs to the outputs; such a mapping can be described numerically as y = f(x) [11].
Fig. 1. Overview of the FIS [11] (blocks: crisp inputs, fuzzifier, rule base, inference engine, defuzzifier, crisp outputs).
B. The FIS-based Prediction Model
The aim is to develop a learning model to predict video quality considering different video content types. The prediction is from a combination of parameters associated with the encoder and access network. Fig. 2 shows the functional block of the proposed video quality prediction model.
Fig. 2. The functional block diagram of the proposed prediction model.
FIS is a method that possesses the ability to learn from sample data, as well as a structured knowledge representation [10]. The main objective of this approach is to identify the relationship between the QoS parameters that affect the QoE and the overall perceived QoE. We achieve high accuracy and make dynamic adaptation possible using this model. The system utilizes an unsupervised one-pass technique for extracting the rules from the collected data which describes the QoE of video and is used to build a model that learns the behaviour of the QoE. Fig. 3 shows an overview of the proposed FIS-based model. The code is implemented in the Java programming language.
Fig. 3. The proposed FIS-based model.
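As an illustration of the four stages in Fig. 1, the following minimal Python sketch maps two crisp inputs to a crisp MOS estimate. It is not the authors' Java implementation: the triangular membership functions, their breakpoints, the nine-rule base and the output centroids are hypothetical values chosen for readability, and only two inputs (SBR and BLER) are used.

```python
def triangular(x, left, peak, right):
    """Triangular membership degree; left == peak or peak == right gives a vertical edge."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy partitions (left, peak, right) for two inputs.
INPUT_SETS = {
    "sbr": {"low": (48, 48, 88), "moderate": (48, 88, 128), "high": (88, 128, 128)},
    "bler": {"low": (1, 1, 10), "moderate": (1, 10, 20), "high": (10, 20, 20)},
}

# Hypothetical centroids of the output (MOS) fuzzy sets.
MOS_CENTROID = {"bad": 1.5, "fair": 3.0, "good": 4.5}

# Hypothetical rule base: (SBR label, BLER label) -> MOS label.
RULE_BASE = {
    ("low", "low"): "fair", ("low", "moderate"): "bad", ("low", "high"): "bad",
    ("moderate", "low"): "good", ("moderate", "moderate"): "fair", ("moderate", "high"): "bad",
    ("high", "low"): "good", ("high", "moderate"): "fair", ("high", "high"): "bad",
}


def fuzzify(inputs):
    """Fuzzifier: membership degree of every crisp input in each of its fuzzy sets."""
    return {
        name: {label: triangular(inputs[name], *abc) for label, abc in sets.items()}
        for name, sets in INPUT_SETS.items()
    }


def predict_mos(inputs):
    """Inference (product of antecedent degrees) and weighted-centroid defuzzification."""
    degrees = fuzzify(inputs)
    numerator = denominator = 0.0
    for (sbr_label, bler_label), mos_label in RULE_BASE.items():
        strength = degrees["sbr"][sbr_label] * degrees["bler"][bler_label]
        numerator += MOS_CENTROID[mos_label] * strength
        denominator += strength
    if denominator == 0.0:
        raise ValueError("no rule fires for these inputs")
    return numerator / denominator


if __name__ == "__main__":
    print(predict_mos({"sbr": 68, "bler": 1}))
```

Each crisp input is evaluated directly against the membership functions, and the output is the average of the rule centroids weighted by firing strength, so an input that fully matches a single rule returns that rule's centroid.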
The proposed FIS-based model comprises four steps: 1) identification of the input and output variables; 2) fuzzy rule extraction from the gathered data; 3) normalising the contradictory rules; and 4) the prediction process. These four steps are discussed in detail below.
1) Identification of the Input and Output Variables
In identifying the input variables we were guided by the literature [5], [6], [7], [8] and [9] to define the most significant parameters for QoE. The input variables are classified as objective (e.g. network QoS, audio-video synchronization) and subjective (e.g. user perception, material quality). In this study, the QoS parameters are the inputs, while the QoE (which is taken simply to be the MOS) is the output. Once we have identified the inputs and outputs, we need to categorize the gathered input/output data into linguistic labels (low, moderate, high; or very low, low, moderate, high, very high) to provide quantification of the values. This is achieved by the design of the membership functions for each input and output.
2) Fuzzy Rule Extraction
After designing the membership functions, we can obtain the fuzzy rules to define the behaviour of the output (QoE). The method used for extracting and learning the rules from the data is based on an extended version of the Mendel-Wang approach [12] [13]. This method is a one-pass approach centred on obtaining the fuzzy rules from a set of data under examination. The output and input are divided into fuzzy areas by the rules' antecedent and consequent fuzzy sets. The extracted rules from multi-input and one output represent the relationship between the input pattern $x = (x_1, \ldots, x_n)$ and the output $y$. As mentioned before, we have four inputs and one output, so the form of the extracted rules will be:

$$\text{IF } x_1 \text{ is } A_1^{(i)} \text{ and } \ldots \text{ and } x_n \text{ is } A_n^{(i)} \text{ THEN } y \text{ is } B^{(i)} \tag{1}$$

where $i = 1, 2, \ldots, M$ is the rule index and $M$ is the number of rules. Each input $x_s$ has $V$ defined fuzzy sets $A_s^q$, $q = 1, 2, \ldots, V$, where $s$ is the input number. For the output $y$, there are $W$ fuzzy sets defined by $B^h$, $h = 1, 2, \ldots, W$. More details about the process of extracting the fuzzy rules can be found in [12] [13].
3) Normalising the Contradictory Rules
If we have rules sharing the same IF part (i.e. inputs) but with dissimilar consequent values (output), the contradictory rules need to be brought together into a single rule. Accordingly, as the rules are broken down into groups, the rules with the same IF part are gathered within the same group (conflict group) of $N$ rules. Then, the rules' weighted average in the conflict group is calculated by the following equation [12]:

$$av^{(i)} = \frac{\sum_{u=1}^{N} y^{(t_u)} \, w^{(t_u)}}{\sum_{u=1}^{N} w^{(t_u)}} \tag{2}$$

where $u = 1, \ldots, N$ and $t_u$ is the data point index with regard to the conflict group, $y^{(t_u)}$ is the output value of that data point and $w^{(t_u)}$ is the weight of the corresponding rule. Thus, the $N$ rules are combined into a single rule, adopting the following configuration [12] [13]:

$$\text{IF } x_1 \text{ is } A_1^{(i)} \text{ and } \ldots \text{ and } x_n \text{ is } A_n^{(i)} \text{ THEN } y \text{ is } B^{(i)} \tag{3}$$

where $B^{(i)}$ is selected by finding the $B^{h^*}$ among the $W$ output fuzzy sets $B^1, \ldots, B^W$ such that [10] [11]:

$$\mu_{B^{h^*}}\left(av^{(i)}\right) \geq \mu_{B^{h}}\left(av^{(i)}\right) \tag{4}$$

for $h = 1, 2, \ldots, W$; $B^{(i)}$ is chosen as $B^{h^*}$. More details about the process of combining the contradictory rules can be found in [12] [13].
4) Prediction Process
The designed membership functions and the fuzzy rules extracted from the inputs and the output enable the proposed model to identify and learn the video quality (QoE). Within this model, we employ the centre of sets defuzzification, product implication and singleton fuzzification [15]. We correlate a crisp input vector with a crisp output vector y = f(x) by using the following equation [13]:

$$y(x) = f(x) = \frac{\sum_{i=1}^{M} \bar{y}^{\,i} \prod_{s=1}^{n} \mu_{A_s^{(i)}}(x_s)}{\sum_{i=1}^{M} \prod_{s=1}^{n} \mu_{A_s^{(i)}}(x_s)} \tag{5}$$

where $M$ is the number of rules in the rule base, $\bar{y}^{\,i}$ is the centroid of the $i$-th output fuzzy set $B^{(i)}$, and $\prod_{s=1}^{n} \mu_{A_s^{(i)}}(x_s)$ is the product of the membership values of each rule's inputs. In the case of multiple outputs, this equation needs to be repeated for each output variable. More details about this process can be found in [12] [13] [14]. After this step, the output (QoE) is predicted and we can compare it with the original one that was measured from simulation.
IV. EVALUATION SET-UP
Our previous work in [9] was based on our own QoE dataset and we evaluated one type of video content. In this paper, as a starting point we use the QoE dataset produced in [15], which covers different video content types. The QoE dataset in [15] is of significant benefit to this effort because we can benchmark the results. To that end we compare our results with the work in [15]. The selected video sequences were divided, depending on the type of content, into three categories: Slow Movement (SM), with the video clip 'Akiyo' for training and 'Suzie' for validation; Gentle Movement (GM), with the video clip 'Foreman' for training and 'Carphone' for validation; and High Movement (HM), with the video clip 'Stefan' for training and 'Football' for validation. These selected sequences were classified in [16] based on the temporal and spatial feature extraction using a cluster analysis tool [17]. All the video sequences were in QCIF format (176 × 144) and encoded in H.264. In [15], the videos used in subjective tests were sent over an OPNET simulator [18] to create network conditions. Fig. 4 shows the network topology of the simulation network.
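Training sequences of this kind are the input to the one-pass rule extraction of Section III. A minimal Python sketch of that extraction, together with the merging of contradictory rules, is given below. It is an illustration rather than the authors' Java code: the fuzzy partitions, the three sample rows and the helper names are hypothetical, only two inputs (SBR and BLER) are used, and the rule weight is assumed to be the product of the input membership degrees, as in the Wang-Mendel method [12] [13].

```python
from collections import defaultdict


def triangular(x, left, peak, right):
    """Triangular membership degree; left == peak or peak == right gives a vertical edge."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy partitions (left, peak, right).
INPUT_SETS = {
    "sbr": {"low": (48, 48, 88), "moderate": (48, 88, 128), "high": (88, 128, 128)},
    "bler": {"low": (1, 1, 10), "moderate": (1, 10, 20), "high": (10, 20, 20)},
}
OUTPUT_SETS = {"bad": (1.0, 1.0, 3.0), "fair": (1.0, 3.0, 5.0), "good": (3.0, 5.0, 5.0)}


def best_label(sets, value):
    """Label of the fuzzy set in which `value` has the highest membership degree."""
    return max(sets, key=lambda label: triangular(value, *sets[label]))


def extract_rules(samples):
    """One pass over (inputs, mos) pairs, then one merged rule per conflict group."""
    groups = defaultdict(list)  # antecedent labels -> [(mos, weight), ...]
    for inputs, mos in samples:
        antecedent, weight = [], 1.0
        for name, sets in INPUT_SETS.items():
            label = best_label(sets, inputs[name])
            weight *= triangular(inputs[name], *sets[label])  # assumed rule weight
            antecedent.append(label)
        groups[tuple(antecedent)].append((mos, weight))

    rules = {}
    for antecedent, members in groups.items():
        # Weighted average of the outputs in the conflict group, in the spirit of Eq. (2).
        average = sum(y * w for y, w in members) / sum(w for _, w in members)
        # Consequent: the output set with the highest membership at that average, as in Eq. (4).
        rules[antecedent] = best_label(OUTPUT_SETS, average)
    return rules


# Hypothetical training rows; the first two share an antecedent but disagree on the output.
SAMPLES = [
    ({"sbr": 128, "bler": 1}, 4.2),
    ({"sbr": 120, "bler": 2}, 3.0),
    ({"sbr": 48, "bler": 20}, 1.4),
]

if __name__ == "__main__":
    for antecedent, consequent in extract_rules(SAMPLES).items():
        print(antecedent, "->", consequent)
```

The first two rows fall into the same conflict group, so they yield a single rule whose consequent is decided by the weighted average of their MOS values rather than by either row alone.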
Fig. 4. Topology of the Simulation Network [15].
The chosen QoS parameters were Frame Rate (FR), Sender Bitrate (SBR), Content Type (CT), Block Error Rate (BLER) and Mean Burst Length (MBL). The combinations of the QoS parameters along with the content types are given in Table 1.

TABLE 1: THE SIMULATION PARAMETERS OF THE QOE DATASET IN [15].
| Video sequence | FR (fps) | SBR (kbps) | BLER (%) | MBL |
|---|---|---|---|---|
| Akiyo, Foreman, Stefan | 10 | 48, 88, 128 | 1, 5, 10, 15, 20 | 1, 1.75, 2.5 |
| Suzie, Carphone, Football | 10 | 90, 130 | 1, 5, 10, 15, 20 | 1, 1.75, 2.5 |

A total of 81 test sequences were generated for model training and 54 for model validation. The subjective tests were carried out via the Internet using the URL in [19]. The QoE was predicted in terms of the MOS obtained from subjective tests. The subjective quality assessment in [15] was carried out using the single-stimulus Absolute Category Rating (ACR) method [20], which has a five point quality scale. The MOS results obtained for the test conditions defined were made publicly available to the research community at [19]. Table 2 shows an example of the QoE video dataset in [15].

TABLE 2: EXAMPLE OF THE QOE VIDEO DATASET [15].
| CT | SBR | BLER | MBL | MOS |
|---|---|---|---|---|
| SM | 48 | 0.01 | 1 | 3.91 |
| SM | 128 | 0.01 | 1.75 | 4.21 |
| GM | 88 | 0.2 | 1 | 3.71 |
| GM | 48 | 0.01 | 2.5 | 3.33 |
| HM | 128 | 0.01 | 1 | 3.53 |
| HM | 48 | 0.01 | 1.75 | 2.41 |

Looking at the dataset in [15], we can see that FIS provides a means for successful implementation of an accurate prediction model. We expect high accuracy and open the way for dynamic adaptation of the model.
V. RESULTS
The aim was to develop learning models to predict video quality considering all content types (SM, GM and HM). The models were trained with three distinct video clips (Akiyo, Foreman, and Stefan) and validated with video clips of Suzie, Carphone, and Football. The prediction is from a combination of parameters associated with the encoder and access network for different types of content. The QoS parameters were FR, SBR, CT, BLER and MBL. The accuracy of the proposed video quality prediction models is determined by the root mean squared error (RMSE). Fig. 5(a-b) shows the resulting graph of subjective data against model prediction using line and scatter graphs. The RMSE was calculated for all instances and the result was RMSE = 0.1711. For the original values and predicted values given as $t_1$ and $t_2$ respectively, we have the average percentage of accuracy as:

$$\text{Accuracy} = \frac{\Delta(t_1, t_2)}{t} \approx 95.71\%$$

Fig. 5a. Measured MOS vs. predicted MOS averaged for all video sequences.
Fig. 5b. Measured MOS vs. predicted MOS averaged for all video sequences.
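The RMSE and the average accuracy can be computed for any set of measured and predicted scores with a few lines of Python. In the sketch below the measured values are the MOS column of Table 2 and the predicted values are hypothetical placeholders, not model outputs. The accuracy is taken as 100% minus the mean absolute percentage error with respect to the measured values, which is an assumed reading of the accuracy expression and may differ from the authors' definition.

```python
import math


def rmse(measured, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_accuracy_percent(measured, predicted):
    """Assumed definition: 100% minus the mean absolute percentage error."""
    relative = [abs(m - p) / m for m, p in zip(measured, predicted)]
    return 100.0 * (1.0 - sum(relative) / len(relative))


# MOS column of Table 2; the predictions are hypothetical placeholders.
MEASURED = [3.91, 4.21, 3.71, 3.33, 3.53, 2.41]
PREDICTED = [4.00, 4.10, 3.60, 3.50, 3.40, 2.60]

if __name__ == "__main__":
    print("RMSE:", round(rmse(MEASURED, PREDICTED), 4))
    print("Accuracy (%):", round(mean_accuracy_percent(MEASURED, PREDICTED), 2))
```

Reporting both figures is useful because the RMSE is expressed in MOS units, while the percentage accuracy is relative to the magnitude of each measured score.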
The proposed work represents a proof of concept QoE prediction model. Having obtained these results we can conclude that our prediction model is suitable for the accurate prediction of video quality when the appropriate QoS parameters are chosen. The test experiment has demonstrated a close relationship between the QoS and the associated QoE. It is clear from the results that the most important QoS parameter in the application layer is content type. Therefore, an accurate video quality prediction model should consider all content types.
VI. EVALUATION OF THE PROPOSED FIS-BASED MODEL
The proposed prediction model is compared to the model in reference [15]. In [15] the QoE prediction model was based on regression analysis. Table 3 shows a comparison between the proposed FIS-based model and the regression-based model.

TABLE 3: COMPARISON OF THE MODELS
| Models | RMSE | $R^2$ |
|---|---|---|
| FIS-based (our model) | 0.17 | 91.17% |
| Regression-based | 0.37 | 87.89% |

We can conclude that we have achieved a higher accuracy than the model in [15]. We also observe that a rule based approach is suitable for these datasets. A high accuracy in determining the expected level of QoE enables us to make efficient decisions regarding the provisioning of network resources whilst keeping the customer satisfied.
VII. CONCLUSION
The work presented herein was focused on the construction of an accurate as well as practical QoE prediction model. We developed a basic learning model based on FIS for predicting video quality. We considered the impact of QoS parameters on QoE in the context of different video content types. The perceived video quality was evaluated in terms of MOS. The proposed prediction model is compared with the regression-based model in reference [15]. The accuracy of the proposed video quality prediction models is determined by the RMSE. The results have demonstrated that it is possible to predict video quality when the appropriate QoS parameters are chosen. We have achieved a higher accuracy than the regression-based model in [15]. We can conclude that FIS provides an accurate prediction model. The high accuracy in determining the expected level of QoE enables network providers to optimize existing network resources by finding the impact of QoS parameters and hence the trade-off between them. In the future, we aim to evaluate combined (spatial/temporal) scalability. Also, we aim to explore the use of type-2 fuzzy logic systems, which can handle high levels of linguistic and numerical uncertainties. Moreover, we will focus on extensive subjective testing to validate the model.
REFERENCES
[1] Quality of Experience (QoE) of mobile services: Can it be measured and improved? Nokia white paper, 2004.
[2] "Mean Opinion Score (MOS) terminology," ITU-T Recommendation P.800.1, 2006.
[3] "Objective perceptual video quality measurement techniques for digital cable," ITU-T Recommendation J.144, 2003.
[4] M. S. Mushtaq, B. Augustin, and A. Mellouk, "Empirical study based on machine learning approach to assess the QoS/QoE correlation," 17th European Conference on Networks and Optical Communications (NOC), 2012.
[5] J. Pokhrel, B. Wehbi, A. Morais, A. Cavalli, and E. Allilaire, "Estimation of QoE of video traffic using a fuzzy expert system," CCNC 2013, pp. 224-229.
[6] A. Hamam, M. Eid, A. El Saddik, and N. D. Georganas, "A fuzzy logic system for evaluating Quality of Experience of haptic-based applications," in Proc. EuroHaptics 2008, pp. 129–138, June 2008.
[7] A. Khan, L. Sun, E. Ifeachor, J. Fajardo, F. Liberal, and H. Koumaras, "Video Quality Prediction Models Based on Video Content Dynamics for H.264 Video over UMTS Networks," International Journal of Digital Multimedia Broadcasting, vol. 2010, 17 pages.
[8] T. Malinovski, T. Vasileva, and V. Trajkovik, "Quality-of-Experience Perception for Video Streaming Services: Preliminary Subjective and Objective Results," International Journal of Research and Reviews in Next Generation Networks, vol. 1, no. 2, December 2011.
[9] M. Alreshoodi and J. Woods, "An Empirical Study based on a Fuzzy Logic System to Assess the QoS/QoE Correlation for Layered Video Streaming," IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (IEEE CIVEMSA 2013), Milan, Italy, July 2013.
[10] J. Bih, "Paradigm shift - an introduction to fuzzy logic," IEEE Potentials, vol. 25, no. 1, pp. 6–21, 2006.
[11] J. Mendel, "Fuzzy logic system for engineering: A tutorial," Proceedings of the IEEE, vol. 83, no. 3, 1995.
[12] L. X. Wang and J. M. Mendel, "Generating fuzzy rules by learning from examples," IEEE Trans. Syst. Man Cybern., vol. 22, no. 6, pp. 1414–1427, Nov./Dec. 1992.
[13] L. X. Wang, "The MW method completed: A flexible system approach to data mining," IEEE Trans. Fuzzy Syst., vol. 11, no. 6, pp. 768–782, Dec. 2003.
[14] F. Doctor, H. Hagras, and V. Callaghan, "A fuzzy embedded agent-based approach for realizing ambient intelligence in intelligent inhabited environments," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, Jan. 2005.
[15] A. Khan, L. Sun, E. Ifeachor, J. Fajardo, and F. Liberal, "Video quality prediction model for H.264 video over UMTS networks and their application in mobile video streaming," IEEE ICC, Cape Town, South Africa, 23–27 May 2010.
[16] A. Khan, L. Sun, and E. Ifeachor, "Content-based video quality prediction for MPEG4 video streaming over wireless networks," Journal of Multimedia, vol. 4, no. 4, pp. 228–239, 2009.
[17] S. d. Toit, A. Steyn, and R. Stumpf, "Cluster analysis," in Handbook of graphical exploratory data analysis, ed. S. H. C. du Toit, Springer-Verlag, New York, pp. 73-104, 1986.
[18] OPNET Simulator: www.opnet.com
[19] Subjective MOS scores: www.tech.plym.ac.uk/spmc/staff/akhan/mos_scores.html
[20] ITU-T Rec. P.910, "Subjective video quality assessment methods for multimedia applications," Geneva, Sep. 1999.