Wireless Pers Commun (2015) 81:1377–1397 DOI 10.1007/s11277-015-2479-y

Network Performance Testing System Integrating Models for Automatic QoE Evaluation of Popular Services: YouTube and Facebook

Francisco Lozano · Gerardo Gómez · Mari-Carmen Aguayo-Torres · Carlos Cárdenas · Antonio Plaza · Antonio Garrido · Janie Baños-Polglase · Javier Poncela



Published online: 12 March 2015
© Springer Science+Business Media New York 2015

Abstract Smartphones have significantly changed the traffic patterns of mobile subscribers. In addition to classical file sharing and web browsing, social networking applications and access to video content now account for a significant share of the typical user's data traffic. As this change in usage pattern affects the performance of mobile networks, operators need methods to estimate how well the network behaves. However, the metrics used to evaluate the quality of experience (QoE) are well documented only for traditional applications. This paper presents a novel QoE system that automatically captures and processes the metrics needed to evaluate two widely used applications: the social networking application Facebook and the video service YouTube. The system uses a test application developed for Android phones, which reports the results to a centralized platform capable of processing and presenting them in different formats. The system has been used to test both services over 3G and LTE network technologies. For the YouTube service, performance results show very similar subjective scores for 3G and LTE because the video coding rate automatically adapts to the network conditions. Testing of the Facebook application has led to the proposal of a new MOS expression that takes into account the type of content being posted, which provides more reasonable MOS estimations than the generic web browsing models available in the literature and commonly used to estimate Facebook QoE.

Keywords Android · Quality of experience · Mean opinion score · YouTube™ · Facebook

F. Lozano (corresponding author) · G. Gómez · M.-C. Aguayo-Torres · J. Baños-Polglase · J. Poncela
Departamento de Ingeniería de Comunicaciones, Universidad de Málaga, Campus de Teatinos, 29071 Málaga, Spain
e-mail: [email protected]
M.-C. Aguayo-Torres
e-mail: [email protected]
C. Cárdenas · A. Plaza · A. Garrido
Performance Testing Solutions, AT4 wireless, Parque Tecnológico de Andalucía, C/ Severo Ochoa, 2, 29590 Málaga, Spain


1 Introduction

Wireless Internet via mobile devices is changing the way we conduct our lives. Smartphones and other portable devices are now widely used in our daily activities. As a result, global mobile data traffic has been growing at a fast rate over the last years and is expected to grow three times faster than fixed Internet traffic from 2013 to 2018 [1], with a compound annual growth rate of 61 % between 2013 and 2018, likely reaching 15.9 exabytes per month by 2018. This outstanding growth in traffic demand represents a serious challenge for network operators, who must engineer their wireless network architectures to handle this huge volume of traffic efficiently while providing the highest Quality of Experience (QoE) to end users.

Based on functional attributes and data packet features, mobile Internet applications can be categorized into streaming, social networking services, instant messaging, voice over IP, web browsing, email, file transfer, gaming, and machine-to-machine dialogue. Specific applications allow the on-the-move use of services equivalent to those available through wired networks. This paper focuses on two widely spread applications: the video service platform YouTube and the popular social network Facebook.

Mobile video traffic represents today 50 % of all mobile data traffic and, according to forecasts for 2017, video traffic on mobile devices will account for 66 % of the total traffic for these terminals [1]. Notably, YouTube accounts for more than 27 % of the global Internet traffic, with 4 billion videos viewed every day [2]. The YouTube platform has adapted to mobile devices, allowing users to access digital content from smartphones and tablets via wireless networks. In terms of communication protocols, YouTube uses the Hypertext Transfer Protocol (HTTP) to transfer video instead of the Real-time Transport Protocol (RTP) over the User Datagram Protocol (UDP), as could be expected for a real-time service.

Facebook is one of the most popular and widespread social networks, with over 1100 million registered users [3]. Its integration into mobile applications results in over 100 million photos and comments being uploaded daily. In terms of communication protocols, mobile Facebook clients and servers deliver messages over HTTP much as the Facebook web service does, but the delivery format is simpler in the mobile version.

Perceived user quality results from a mix of performance indicators that differ from one application to another. In the case of video applications, any stop during the rendering of the image while the user is watching a video is identified as a lack of quality. In the case of social networks, such as Facebook, a prompt upload of a photo is what the user demands, and the response time is directly related to the perceived QoE. Commonly, performance indicators directly gauged from wireless networks are not easily converted into the user's subjective quality ratings. Any tool aiming at evaluating wireless network performance must cope with the challenging task of not only capturing network information but also analysing the influence of each specific service.

A traditional way to evaluate the quality perceived by users is to perform a user survey under various network conditions. The ratings are averaged over a large sample of users in order to obtain a single parameter known as the "Mean Opinion Score" (MOS) [4].
This approach has several drawbacks, such as the difficulty in ensuring repeatability of the network conditions and the cost of the survey itself. Hence, it would be more appropriate to automate the measurement procedure so that it could be performed without human intervention. To this end, MOS models have to be developed in order to substitute for direct user opinion on service performance. Once models that correlate specific measurable parameters to user perceived quality are available, network operators can estimate the perceived user quality from the measurement of those parameters.


In this work, a network performance testing system that carries out automatic objective measurements, maps the objective measurements to MOS scores, and displays the results together with network information and device geolocation is presented. The system is able to measure objective Quality of Service (QoS) indicators associated with YouTube and Facebook, in addition to other traditional service measurements. The measurement system is built in the form of an App that can run on any mobile device with the Android operating system. The Performance App automatically runs tests, collects objective metrics, computes MOS scores from the objective metrics following the models presented here, and reports the results to a centralized database. A web user interface facilitates querying the database, filtering results and presenting the information in different formats.

The rest of this paper is organized as follows. Section 2 presents an overview of the network performance testing system and its architecture. In Sect. 3, both the YouTube and Facebook QoE and traffic models are described. Section 4 focuses on the functionality of the application that measures the QoE-related parameters. Real measurements on both 3G and LTE networks at the same location are presented in Sect. 5. Finally, some concluding remarks are given in the last section.

2 Network Performance Testing System Architecture

The network performance testing system, whose architecture is depicted in Fig. 1, is composed of two main components: a performance testing application, i.e., an App, that can run on any Android device; and a centralized database, where results are stored, which can be accessed via a web-based interface to retrieve, filter and display results.

Fig. 1 Architecture of the quality measurement system


The performance testing application, once installed on a mobile device, is used to actually run the tests on demand. It has a local user interface from which the different types of tests it supports can be started. The App also allows configuration of different settings for each of the tests it supports, such as the number of iterations, the testing time interval, etc. The user interface is very simple, so that it can be used even by a subscriber to figure out whether or not they have sufficient network coverage and quality for a specific use, such as watching a video. The outcome of each test is locally visible in the device once the test has finished and is automatically uploaded to the centralized database at the end of each test campaign (tests and test iterations programmed to be executed in sequence) together with the captured network information. The App can be used at static locations or on the move to run tests on any mobile cellular network, such as 3G (e.g., UMTS/HSPA) or 4G (LTE), or on WiFi wireless networks. More details on the App are given in Sect. 4.

The database, where the results of the testing are uploaded, stores all relevant information after some pre-processing. It includes the date/time when the measurements took place, the device geolocation, network information, and the specific metrics of each type of test. The web page provides the user interface to retrieve the information and has multiple filtering capabilities and presentation options. This allows a network operator or a test company to easily identify whether there are problems at certain locations and to correlate the perceived user quality indicators with parameters such as position, cell, or even time of the day. The information can be presented in various formats (tables and graphs) and can be exported for further processing. It also generates Google Earth™ compatible KMZ files.

In addition to more traditional measurements, the network performance testing system integrates the YouTube and Facebook tests and MOS models described in the following sections. Overall, the network performance testing system is a wireless data network analytics platform that combines the flexibility and scalability of the App to provide concurrent testing on multiple devices and analytics of multiple wireless networks simultaneously. The system helps to perform test campaigns, automatically capture end-to-end metrics, including user experience, store test results, and visualize network parameters together with the perceived user experience.
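To make the stored information concrete, the snippet below sketches the kind of record the App could upload after a single YouTube iteration. The field names, values and structure are purely illustrative assumptions and do not reflect the actual database schema.

```python
# Hypothetical result record for one YouTube iteration (illustrative only).
result_record = {
    "timestamp": "2014-10-03T11:42:17+02:00",        # date/time of the measurement
    "location": {"lat": 36.7201, "lon": -4.4203},    # device geolocation
    "network": {"technology": "LTE", "plmn": "214-XX", "cell_id": 12345},
    "test": "youtube",
    "metrics": {"t_init_s": 1.3, "f_rebuf": 0.0, "t_rebuf_s": 0.0, "mos": 3.9},
}
```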

3 QoE Modelling for Multimedia Services

We have employed a QoE evaluation for multimedia services based on ITU-T Recommendation P.800 [4]. This opinion scale, the most frequently used for multimedia services, allocates qualitative values from Bad to Excellent by mapping the quantitative MOS as depicted in Table 1. A minimum value of 3 has to be obtained to establish the quality as Fair, with 5 being the maximum score and 1 the lowest mark.

Table 1 ITU-T Recommendation P.800 MOS scale [4]

  Quality     MOS
  Excellent   5
  Good        4
  Fair        3
  Poor        2
  Bad         1


3.1 YouTube

The YouTube service uses a transport mechanism based on the HTTP/TCP protocol stack called progressive download [5]. This mechanism makes it possible for the video to start playing before its full content has been downloaded. At the receiver side, the video player includes a buffer that stores the video content while it is being consumed. The purpose of this buffer is to compensate for the jitter introduced by the network. As soon as there is a sufficient amount of data in the buffer, the playback starts.

The video data transfer from the media server to the client consists of two phases: an initial burst of data and a throttling algorithm [6]. In the initial phase, the media server sends at the maximum available bandwidth an initial burst of data whose size is determined by one of the setup parameters. Then, the server starts the throttling algorithm, in which the data is sent at a constant rate, normally the video clip encoding rate multiplied by the throttle factor, also included in the setup or configurable parameters. In a network congestion episode, if data cannot be delivered at the constant rate, data are buffered in the server and released as soon as the congestion is alleviated. When the latter occurs, data is sent at the maximum available bandwidth. Whenever the player's buffer runs out of data, the playback is paused, leading to a rebuffering event, i.e., a video stop, which obviously is an undesirable event for the user.

There are several available metrics to characterize the video quality. Some of them are based on comparing the received video with the original video as a reference. Examples of this type of video quality metric are: Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR) [7], Video Structural Similarity (VSSIM) [8], Perceptual Evaluation of Video Quality (PEVQ) [9] and Video Quality Metric (VQM) [10]. This type of metric is useful for obtaining objective measurements in controlled experiments, but it might not be applicable to online (real-time) procedures, as the full reference is not available. Furthermore, such metrics are suited to measuring image quality degradation, e.g., due to packet losses or compression. However, since YouTube uses a reliable protocol (TCP), this kind of image degradation does not frequently occur; instead, bad network conditions lead to video playback gaps. Therefore, in the case of YouTube, metrics computed without a video reference are more suitable. In particular, the model for evaluating the QoE used in the testing system is based on a linear model proposed by Mok et al. [11], which provides a quality metric in terms of MOS. This model uses a layered structure with a generic procedure to estimate the end-user's perceived quality described in three steps:

1. Measure network QoS metrics (e.g., round-trip time, loss rate, throughput, etc.),
2. Map the network QoS metrics onto Application Performance Metrics (APM),
3. Map the APM onto the end-user's QoE (in terms of MOS).

Even assuming that the user does not interact with the video during the playback (such as pausing or seeking forward/backward), the model proposed in [11] to estimate application QoS metrics from network QoS is only valid provided that the network bandwidth, Round Trip Time (RTT) and packet loss rate are constant during the video download. However, this assumption is not very realistic in wireless networks, and therefore we have implemented a modified procedure based on measuring the APMs directly at the receiver and afterwards mapping the APMs onto the MOS score. For the HTTP video streaming service, the following APMs are required by the model in [11]:


– Initial buffering time (T_init): period between the start of loading a video and the start of playing it.
– Rebuffering frequency (f_rebuf): frequency of interruption events during the playback.
– Mean rebuffering time (T_rebuf): average duration of a rebuffering event.

The testing application uses the YouTube Player API to extract these APMs. This API provides the capability to embed a YouTube player in the testing application as well as to get certain information about the ongoing YouTube session. In particular, it is possible to obtain the previously mentioned APMs by monitoring the events triggered by this API when the YouTube player changes state (unstarted, buffering, playing, paused, and ended). From these events, it is easy to compute the initial buffering time (T_init) as the time elapsed in the buffering state for the first time, the rebuffering frequency (f_rebuf) from the number of times that the player enters the buffering state during playback, and the mean rebuffering time (T_rebuf) from the time elapsed in the buffering state. More details are given in Sect. 4. Finally, the MOS is computed as [11]:

    MOS_QoS model = 4.23 − 0.0672 L_Ti − 0.742 L_fr − 0.106 L_Tr        (1)

where L_Ti, L_fr and L_Tr are quantized values of the respective levels T_init, f_rebuf and T_rebuf, following Table 2. In a previous work [12], this same calculation was used to compute the MOS score. The result was correlated with actual user surveys run in trials using mobile devices connected to 3G and Wi-Fi infrastructures. In that work, a modified MOS expression was presented that includes a 20 % increase of the perceived QoE, since it was identified that users' expectations of quality on smartphones and tablets are not as restrictive as those for non-portable devices. Considering the proposed change, the mapping of APMs onto MOS incorporated into the testing App is given by

    MOS = 1.2 MOS_QoS model        (2)
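As a minimal sketch, the following Python fragment illustrates how measured APMs could be quantized according to Table 2 and mapped onto a MOS score with Eqs. (1) and (2). The function and variable names are illustrative and do not correspond to the actual App code.

```python
def quantize(value, low_threshold, high_threshold):
    """Map an APM value to its quantization level Lx (1=Low, 2=Medium, 3=High), per Table 2."""
    if value <= low_threshold:
        return 1
    if value <= high_threshold:
        return 2
    return 3

def youtube_mos(t_init_s, f_rebuf, t_rebuf_s):
    """Eq. (1) followed by the 20 % increase of Eq. (2)."""
    l_ti = quantize(t_init_s, 1.0, 5.0)    # initial buffering time thresholds (s)
    l_fr = quantize(f_rebuf, 0.02, 0.15)   # rebuffering frequency thresholds
    l_tr = quantize(t_rebuf_s, 5.0, 10.0)  # mean rebuffering time thresholds (s)
    mos_qos = 4.23 - 0.0672 * l_ti - 0.742 * l_fr - 0.106 * l_tr
    return 1.2 * mos_qos

# Best and worst cases reproduce the values quoted in the text (3.98 and 1.78).
print(round(youtube_mos(0.5, 0.0, 0.0), 2), round(youtube_mos(10.0, 0.3, 20.0), 2))
```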

Due to the quantization given in Table 2, only 27 combinations of the performance metrics are possible. Their mapping onto MOS is illustrated in Fig. 2, showing a maximum MOS value of 3.98 and a minimum value of 1.78. It is clear from the figure that the rebuffering frequency is the most influential performance metric. In fact, if f_rebuf stays below 0.02 (marked in red) it is always possible to obtain an acceptable MOS (over 3). That means that a high number of short gaps is perceived as worse quality than a single longer one. Note that the three performance metrics are not independent, and commonly a good/bad rebuffering frequency is accompanied by good/bad (re)buffering times.

Table 2 APM quantization levels

  Level    T_init   f_rebuf     T_rebuf   Lx
  Low      0–1 s    0–0.02      0–5 s     1
  Medium   1–5 s    0.02–0.15   5–10 s    2
  High     >5 s     >0.15       >10 s     3

3.2 Facebook

Facebook clients on PCs use HTTP as the application protocol, as a typical web service does. However, the delivery format for mobile terminals is simpler than web pages, as they use the Graph API tool provided by the Facebook Developers site [13, 14].


Fig. 2 MOS mapping for the YouTube service

The most common action in Facebook is to publish a comment (from the client to the server). In order to do this, the session between the client and the server is set up via a triple confirmation message (TCP 3-way handshake). Then, a secure communication over TCP using Transport Layer Security (TLS) is established, allowing encrypted application data to be sent. Finally, the TCP connection is released. This procedure is identical whatever action is performed (uploading text, photo or video) and regardless of the technology used to access the Internet. Thus, the only metric that can affect the user QoE is the time elapsed to perform a particular action. A previous work with real subjects [15] proposes a methodology for measuring the QoE of Facebook users based on the ITU-T P.800 recommendation [4]. Given the type of traffic on Facebook, an estimate of the quality of user experience is proposed through HTTP traffic models based on the time required to fulfil a particular action.

Fig. 3 MOS mapping for Facebook as a function of the delay


The most important metric at the application layer to estimate the MOS is the response time to complete the action. The MOS expression for this type of traffic is modelled by [16]:

    MOS = 5 − 578 / [1 + (11.77 + 22.61/D)²]        (3)

According to Eq. (3), where D represents the service response time in seconds, uploading small files would always be mapped onto a good quality, whereas big files would be penalized. However, Facebook users are usually more delay-tolerant when uploading big files. Thus, we propose a modified MOS evaluation including a parameter α ≤ 1 which is different for text, photo and video:

    MOS = 5 − 578 / [1 + (11.77 + 22.61/D^α)²]        (4)

Figure 3 shows the MOS mapping results assuming α = 1 for text (comments), α = 0.8 for photo and α = 0.4 for video. These alpha values have been obtained empirically from subjective tests.
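The following minimal Python sketch evaluates Eqs. (3) and (4) for the mean elapsed times measured over LTE (Table 8); the helper name and the worked example are illustrative and not part of the App.

```python
def facebook_mos(delay_s: float, alpha: float = 1.0) -> float:
    """MOS model of Eq. (4); with alpha = 1 it reduces to the basic model of Eq. (3)."""
    effective_delay = delay_s ** alpha  # delay-tolerance scaling per content type
    return 5.0 - 578.0 / (1.0 + (11.77 + 22.61 / effective_delay) ** 2)

# Mean elapsed times over LTE taken from Table 8, with the alpha values of Fig. 3.
for action, delay, alpha in [("comment", 0.8, 1.0), ("picture", 3.1, 0.8), ("video", 45.61, 0.4)]:
    print(f"{action:7s}: basic MOS = {facebook_mos(delay):.2f}, "
          f"modified MOS = {facebook_mos(delay, alpha):.2f}")
```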

4 Testing App Implementation

The YouTube and Facebook QoE analysis has been carried out with the network performance testing system described in Sect. 2. The testing application has been developed in Android and is responsible for measuring, at the mobile device, those performance indicators required to estimate the MOS by interacting with the YouTube and Facebook Application Programming Interfaces (APIs).

4.1 YouTube Test Implementation

As previously described, once YouTube is started (automatically or when the user presses the start button), data is buffered, and only after enough data has been downloaded does the playback begin. A rebuffering event happens whenever the buffer runs out of data while watching the video. YouTube's state machine is shown in Fig. 4. The embedded YouTube player in mobile devices has to be initialized before the video playback. Unlike the YouTube player on a PC, this process automatically determines the video quality as a function of the network connection type, device performance and user settings. The Android platform just lets users choose between standard definition (SD) or high definition (HD), if available.

Fig. 4 YouTube's state machine [6] (states: Unstarted, Cued, Buffering, Playing, Paused)


The Android application is able to access the state machine through the YouTube API. With this information, the testing application measures the time spent in each state and calculates the frequency of state changes. These performance indicators are introduced into the QoE model given by Eq. (1) to estimate a MOS value for each video session. The application is able to run several YouTube iterations (video playbacks) and automatically evaluate the QoE without user interaction. The user may watch the video during the playback, as the testing application includes an embedded YouTube player. A test video (see Sect. 5.1) has been developed and uploaded to YouTube for use by the testing application. The video has been designed to facilitate evaluation via real user surveys, which provides a correlation of real users' MOS scores with the model used in the testing App. The user may, however, configure the test to use any other YouTube video. The input parameters of the test case are:

– Video ID: identification code assigned by YouTube when the video is created.
– Timeout: maximum waiting time between phases of the state machine.
– Iterations: number of video playbacks.
– Guard Time: waiting time between iterations.

When each video playback is finished, the iteration results are displayed on the device's screen. The following information is provided for each iteration: initial buffering time, rebuffering frequency, duration of rebufferings, and estimated MOS. Once the given number of iterations has finished, statistics averaging all tests are shown as a test session summary. Figure 5 shows the flow diagram of the YouTube testing application.
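As an illustration of how the APMs can be derived from the player state changes described above, the following sketch processes a chronological log of (timestamp, state) events; the event log format and the per-second definition of the rebuffering frequency are assumptions, not the actual App implementation.

```python
from typing import List, Tuple

def compute_apms(events: List[Tuple[float, str]], playback_duration_s: float):
    """Derive T_init, f_rebuf and T_rebuf from player state transitions."""
    t_init = 0.0
    rebuffer_durations = []   # durations of buffering periods after playback started
    buffering_since = None
    playback_started = False

    for timestamp, state in events:            # events sorted by timestamp
        if state == "BUFFERING":
            buffering_since = timestamp
        elif state == "PLAYING" and buffering_since is not None:
            gap = timestamp - buffering_since
            if not playback_started:
                t_init = gap                   # first buffering period -> initial buffering time
                playback_started = True
            else:
                rebuffer_durations.append(gap) # later buffering periods -> rebuffering events
            buffering_since = None

    f_rebuf = len(rebuffer_durations) / playback_duration_s  # rebuffering events per second
    t_rebuf = (sum(rebuffer_durations) / len(rebuffer_durations)
               if rebuffer_durations else 0.0)
    return t_init, f_rebuf, t_rebuf

# Example: an 80 s playback with a single 3 s rebuffering after 30 s of video.
events = [(0.0, "BUFFERING"), (1.2, "PLAYING"),
          (31.2, "BUFFERING"), (34.2, "PLAYING"), (84.2, "ENDED")]
print(compute_apms(events, playback_duration_s=80.0))   # -> (1.2, 0.0125, 3.0)
```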

Fig. 5 Test case flow diagram


4.2 Facebook Test Implementation

The Facebook Graph API and Software Development Kit (SDK) [13] for Android facilitate the development of applications that can manage a Facebook account. The main feature of the Graph API is that it defines each piece of content (user, comment, etc.) as an object. It is also used to search, delete, publish and/or update an object, each of which has a unique identifier (ID). The API may query an object via https://graph.facebook.com/<FACEBOOK-ID>/<Object-ID>, where <FACEBOOK-ID> represents the Facebook username and <Object-ID> is the ID of the object. Since the communication takes place over the HTTP protocol, the actions are performed using the HTTP GET command to download or the HTTP POST command to upload. The response from the server is based on JavaScript Object Notation (JSON), which returns the requested information.

The QoE metric estimation proposed for the Facebook application is based on the time to complete an action. The testing application differentiates three main content types: text comment, picture and video, and implements uploading and retrieving actions for the three content types. The testing application measures the time to post each content type separately and maps it onto a MOS score. The input parameters of the Facebook setup are:

– Time between operations: time between the completion of an action and the beginning of the next action to be performed.
– Number of Comments: number of comments to be sent to the Facebook account.
– Comment Length: number of characters of the comment to upload.
– Number of Pictures: number of pictures to send. If this value is not empty, the user has to select from the device's gallery a picture to be sent.
– Number of Videos: number of videos to send. If this value is not empty, the user has to select from the device's gallery a video to be sent.

With these input parameters, the Facebook application sets up an automated test pattern. The flow diagram is similar to that for YouTube shown in Fig. 5, except that the three content types (comment, picture, video) are sequentially uploaded. A sketch of how a single posting action could be timed is shown below.
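The following minimal Python sketch times a text post, assuming the Graph API /me/feed publishing endpoint and a valid access token; the helper name and the use of the requests library are illustrative, since the actual App interacts with the API through the Android SDK.

```python
import time
import requests

GRAPH_FEED_URL = "https://graph.facebook.com/me/feed"   # publishing endpoint (assumed here)

def time_comment_post(access_token: str, message: str) -> float:
    """Return the elapsed time in seconds to complete a text post, the APM of Sect. 3.2."""
    start = time.time()
    response = requests.post(GRAPH_FEED_URL,
                             data={"message": message},
                             params={"access_token": access_token},
                             timeout=60)
    response.raise_for_status()   # the JSON body carries the ID of the created object
    return time.time() - start

# Hypothetical usage: elapsed = time_comment_post("<ACCESS-TOKEN>", "x" * 255)
```

The measured elapsed time can then be mapped onto a MOS value with Eq. (3) or Eq. (4), depending on the content type.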

5 Analysis of Results and Discussion

A test plan was carried out in Málaga (Spain) to verify the functionality of the testing App and the complete network performance testing system. Tests were performed in an area covering urban and industrial zones with both LTE and 3G coverage. All tests were carried out over the network of a major Spanish mobile phone operator.

The tests were performed in mobility conditions along a predefined route. The tests were run at walking speed, as this would be the typical usage scenario for subscribers in that area. After introducing the configuration parameters for the tests, data acquisition was launched and the test operator followed the predefined route. Once the predefined route was completed, the testing App was manually stopped and the measurements acquired by the device were automatically uploaded to the database. The predefined route takes around 30 min to complete at normal walking speed. The devices used for the testing were a Sony Xperia Z smartphone for the YouTube tests and an LG G3 smartphone for the Facebook tests. For each set of tests, the route was walked twice with the device configured in preferred-LTE mode or 3G-only mode, to allow a comparison of the QoE based on the access network technology.


5.1 YouTube Analysis

The YouTube testing was set up with the input parameters shown in Table 3. The video employed (a snapshot is shown in Fig. 6) was designed for testing purposes. It is 1 min and 20 s long, with up to 1080p quality, and contains multiple images with objects moving at increasing speeds, different types of letter fonts, and different colour patterns.

An extract of the instantaneous downlink data rate of two iterations (one per radio access technology) of the YouTube test is illustrated in Fig. 7. As expected, the downlink data rate curve shows two different phases: at the beginning, there is an initial burst characterized by a large peak; then, the throttling algorithm starts, characterized by data rate peaks lower than the initial burst. The different behaviour in terms of maximum and mean data rate depending on the radio access technology (3G or LTE) is clear in Table 4.

The video coding rate is selected automatically by the YouTube server based on the network connection type and device capabilities. Previous work [6] indicates that the initial buffering corresponds to around 40 s of video playback and that the throttling algorithm regulates the data transmission to 1.25 times the video coding rate. Based on those values, the average coding rate estimation for each technology is presented in Table 5. The video was coded with about 3 times better quality for LTE than for 3G, as LTE is able to provide much higher data rates.

Table 3 YouTube input config parameters

  Field        Value
  Video ID     HdbdH_9Ot8M
  Timeout      30 s
  Iterations   100
  Guard Time   3 s

Fig. 6 Snapshot of the testing video employed


Fig. 7 Instantaneous YouTube Rx data rate over LTE and 3G

Table 4 Rx data rate summary (Mbps)

  Connection type   No. Videos   Mean   SD     Min    Max
  LTE               24           3.51   4.67   0.00   25.99
  3G                23           1.15   1.45   0.00   8.65

Table 5 Coding rate estimation

  Connection type   Estimated coding rate
  LTE               2 Mbps
  3G                0.67 Mbps

According to the statistics acquired during the test, the empirical cumulative distribution function (CDF) of the initial buffering time is estimated and shown in Fig. 8. It can be observed that LTE displays roughly the same initial buffering time for all the tests. In contrast, the results for 3G reveal a high variability of the initial buffering time, probably due to: (1) the higher traffic load in 3G and (2) the existence of handovers between UMTS and HSPA+ technologies during the tests. To highlight this behaviour, the relationship between the spatial-temporal variation of the radio access technology and the received data rate profile is illustrated in Fig. 9, which is an extract of three YouTube video playbacks of 80 s duration each. The background colours in the figure indicate UMTS or HSPA+ coverage. When the application demands more network resources, the radio subsystem switches to HSPA+. But once the data transmission finishes, it returns to UMTS coverage in order to save battery. Figure 10, which plots the switching of the radio services along the test route, shows a repetition pattern, where cyan and yellow points correspond to HSPA+ and UMTS, respectively. Figure 11 depicts the received data rate in green (>500 kbps), red (<500 kbps) and black (no transmission). The match in transitions between Figs. 10 and 11 confirms the switching from UMTS to HSPA+ when each transmission begins.

Fig. 8 Initial buffering CDF on YouTube test

Fig. 9 Instantaneous data rate and radio access technology on YouTube test

As already described, the testing App itself computes the QoE score from the measured performance metrics. Table 6 is a summary of the outcome of the tests carried out along the route selected for the YouTube test. Note that the measured MOS is higher on average and has the lowest deviation when the device was in 3G mode, contradicting the normal expectation that LTE provides better quality. The reason is that for 3G the quality of the downloaded video was lower. Thus, its requirements on the access network were much lower (a third of the data rate on average), and for 3G no rebuffering event happened during the test. Please note that the video quality itself is not included in the MOS model provided here.

5.2 Facebook Analysis

Following the test procedure already described in the YouTube analysis, the Facebook test session consisted of completing the same route, also on foot. Again, the device was set in LTE mode or 3G mode in each walking iteration.


Fig. 10 Coverage on the YouTube test route, UMTS (yellow) and HSPA+ (cyan) (Google Earth™)

Fig. 11 Rx data rate along the route (Google Earth™)

The Facebook test parameters are shown in Table 7. The data measured by the testing application allow a comparison of the response times of three actions completed in Facebook: posting a comment, posting a picture and posting a video.


Table 6 YouTube MOS summary

  Connection type   Mean    SD      Min     Max
  LTE               3.85    0.204   3.006   3.897
  3G                3.889   0.023   3.816   3.897

The elapsed time CDF of each action is depicted in Figs. 12, 13 and 14 for comment, picture and video, respectively. Some interesting insights can be obtained from the results. As could be expected, the elapsed times on LTE are lower than those for 3G connections. The CDF of the time elapsed to post a comment over 3G (Fig. 12) clearly shows two likely values: a lower value around 2 s and a most likely value of about 4.5 s. Basically, it depends on which radio access within 3G (HSPA+ or UMTS) is being used. While LTE has no problem posting a picture, 3G sometimes delays the transfer, so a range of delays from 5 to 35 s was seen. Delays are longer for posting a video (Fig. 14), but even in this case the LTE delays are quite short considering the large file size (Table 8). For 3G, two different sets of delay values for UMTS and HSPA+ are still noticeable.
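As an aside, empirical CDFs of the kind shown in Figs. 12–14 and the summary statistics of Table 8 can be reproduced from the logged elapsed times with a few lines of NumPy; this is an illustrative post-processing sketch, not part of the testing system.

```python
import numpy as np

def summarize(elapsed_s):
    """Empirical CDF plus mean/SD/min/max for a list of elapsed times (seconds)."""
    samples = np.sort(np.asarray(elapsed_s, dtype=float))
    cdf = np.arange(1, samples.size + 1) / samples.size   # P(X <= samples[i])
    stats = {"mean": samples.mean(), "sd": samples.std(ddof=1),
             "min": samples.min(), "max": samples.max()}
    return samples, cdf, stats
```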

Table 7 Facebook input config parameters

  Field                     Value
  Time between operations   3 s
  Number of Comments        50
  Comment Length            255 characters
  Number of Pictures        50
  Picture File Size         1.19 MB
  Number of Videos          50
  Video File Size           48.86 MB

Fig. 12 Comment elapsed time on Facebook evaluation


Fig. 13 Picture elapsed time on Facebook evaluation

Fig. 14 Video elapsed time on Facebook evaluation

Introducing the registered elapsed time values into the MOS model of Eq. (3), we obtain an estimation of the Facebook QoE for each action and technology. This basic MOS mapping is illustrated in Fig. 15 with dashed curves. In contrast to YouTube, results over LTE coverage display a better MOS than over 3G, as the actions to be performed are identical in both cases. In that figure, the average is shown as a dot and the vertical line over it represents the range of measured MOS. With this MOS model, the comment and picture posting actions yield reasonable MOS values. However, the results for the video evaluation are handicapped by the larger file size, even though the throughput is of similar magnitude to that of the other actions. The results of the modified MOS model (Eq. (4) with α = 1 for comments, 0.8 for pictures and 0.4 for video) are also shown in Fig. 15. In this case, the obtained average video quality is similar to that of the picture under the same throughput availability, in accordance with the user's expectations for a bigger file.


Table 8 Facebook elapsed time summary

  Action    Connection type   Mean (s)   SD (s)   Min (s)   Max (s)
  Comment   LTE               0.8        0.61     0.65      5.04
  Comment   3G                4.5        1.16     1.65      7.4
  Picture   LTE               3.1        0.28     2.8       4.61
  Picture   3G                14.9       9.7      6.61      36.07
  Video     LTE               45.61      11.67    33.66     77.93
  Video     3G                199        31.6     140.6     252.63

Fig. 15 Facebook MOS evaluation with α = 1 for comments, 0.8 for pictures and 0.4 for video on the modified curves

6 Conclusions

This work has presented the implementation of a network performance testing system capable of providing estimations of the QoE as perceived by users of two of the most widely used applications: YouTube and Facebook. The testing system integrates a testing application, developed for Android phones, that measures parameters by accessing Android APIs and computes the MOS score based on models. The testing App also records the date and time, the device geolocation and the radio access network information, to facilitate further root cause analysis of problems should they appear. Tests have been performed on a route, at walking speed, with two different smartphones, on the same network but with the device configured in preferred-LTE mode or 3G-only mode.

YouTube tests for LTE and 3G show significant differences in the initial buffering time depending on the radio technology. However, the final MOS is similar for 3G and LTE, as the video coding rate is automatically adapted to the radio capabilities and the MOS models do not include this parameter in the evaluation of the perceived quality.

With respect to Facebook, the application is able to automatically post comments, pictures and video.


Commonly employed MOS expressions do not take into account the file size, distorting the estimated quality for big video files. A modified MOS model that takes into account that the user is willing to wait longer when posting a video than when posting a simple comment has been presented. The influence of the radio access technology on the perceived quality is noticeable not only between LTE and 3G, but also between the variants of the 3G technology (UMTS and HSPA+).

Acknowledgments This work has been partly supported by the Spanish Government and FEDER under Grant TEC2013-44442-P, the Junta de Andalucía (Spain) under Grant 760423, and the European Union (European Regional Development Fund) and Corporación Tecnológica de Andalucía under Grant 13/680. The authors would like to thank Iban Serrano Caballero, Rubén Reina Cuenca, Sergio Quintana Domínguez, Plácido López Gámez, Noelia Díaz Bernal, Noelia Guerra Melgares and Rafael Molina Jimena for their help in programming and testing both the Android App and the web application.

References

1. Cisco Systems Inc. (2013). Cisco visual networking index: Forecast and methodology, 2013–2018. http://www.cisco.com. Accessed Oct 2014.
2. YouTube Press Room. (2014). Statistics. https://www.youtube.com/yt/press/statistics.html. Accessed Oct 2014.
3. Experian Hitwise US. (2014). Social media trends. http://www.experian.com/hitwise/online-trends-social-media.html. Accessed Oct 2014.
4. ITU-T Recommendation P.800. (1996). Methods for subjective determination of transmission quality. Available at http://www.itu.int/rec/T-REC-P.800-199608-I/en
5. Begen, A., Akgul, T., & Baugher, M. (2011). Watching video over the web: Part 1: Streaming protocols. IEEE Internet Computing, 15(2), 54.
6. Ameigeiras, P., Ramos-Munoz, J. J., Navarro-Ortiz, J., & Lopez-Soler, J. (2012). Analysis and modelling of YouTube traffic. Transactions on Emerging Telecommunications Technologies, 23(4), 360.
7. Gonzalez, R., & Wintz, P. (1987). Digital image processing (2nd ed.). London: Addison-Wesley Publishing Co.
8. Wang, Z., Lu, L., & Bovik, A. (2002). Video quality assessment using structural distortion measurement. In Proceedings of the 2002 International Conference on Image Processing, vol. 3, pp. 65–68.
9. ITU-T Recommendation J.247. (2008). Objective perceptual multimedia video quality measurement in the presence of a full reference. Available at http://www.itu.int/rec/T-REC-J.247-200808-I/en
10. ITU-T Recommendation J.144. (2004). Objective perceptual video quality measurement techniques for digital cable television in the presence of a full reference. Available at http://www.itu.int/rec/T-REC-J.144-200403-I/en
11. Mok, R., Chan, E., & Chang, R. (2011). Measuring the quality of experience of HTTP video streaming. In Integrated Network Management (IM), 2011 IFIP/IEEE International Symposium on, pp. 485–492.
12. Gómez, G., Hortigüela, L., Pérez, Q., Lorca, J., García, R., & Aguayo-Torres, M. C. (2014). YouTube QoE evaluation tool for Android wireless terminals. EURASIP Journal on Wireless Communications and Networking, 2014, 164.
13. Facebook Inc. (2014). Facebook developers. https://developers.facebook.com/. Accessed Oct 2014.
14. Jung, I., Kim, H., Hong, D. K., & Ju, H. (2013). Protocol reverse engineering to Facebook messages. In 2013 4th International Conference on Intelligent Systems Modelling and Simulation (ISMS), pp. 539–542.
15. Casas, P., Sackl, A., Egger, S., & Schatz, R. (2012). YouTube and Facebook quality of experience in mobile broadband networks. In IEEE Globecom Workshops, pp. 1269–1274.
16. Ameigeiras, P., Ramos-Munoz, J. J., Navarro-Ortiz, J., Mogensen, P., & Lopez-Soler, J. M. (2010). QoE oriented cross-layer design of a resource allocation algorithm in beyond 3G systems. Computer Communications, 33(5), 571.


Francisco Lozano received his M.Sc. degree in Telecommunication Engineering and a Postgraduate Certificate in Neuroimaging from the University of Malaga, Spain, in 2014. He is currently a research assistant in the Department of Communications Engineering at the University of Malaga. His research interests include QoE evaluation for multimedia services in mobile communications.

Gerardo Gómez received his B.Sc. and Ph.D. degrees in Telecommunications Engineering from the University of Málaga (Spain) in 1999 and 2009, respectively. From 2000 to 2005 he worked at Nokia Networks and Optimi Corporation (recently acquired by Ericsson), leading the area of QoS for 2G and 3G cellular networks. Since 2005, he has been an associate professor at the University of Málaga. His research interests include the field of mobile communications, especially QoS/QoE evaluation for multimedia services and radio resource management strategies for LTE and LTE-Advanced.

Mari-Carmen Aguayo-Torres received the Ph.D. degree in Telecommunication Engineering from the University of Malaga, Spain, in 2001 (M.S. 1994) with a thesis on Adaptive OFDM. She is currently an Associate Professor in the Department of Communications Engineering at the University of Malaga. Her main research interests include adaptive modulation and coding for fading channels, generalized MIMO, OFDM and SC-FDMA, cross-layer design, and probabilistic QoS guarantees within wireless networks. Dr. Aguayo-Torres is involved in a number of publicly and privately funded projects and actively collaborates with industry, mainly in the field of wireless communications (LTE, LTE-Advanced, WiMAX).


Carlos Cárdenas received his Telecommunications Engineering degree from the University of Málaga (Spain) in 2005. He joined AT4 wireless in 2004 and has participated in several national and international R&D projects focused on developing methodologies and tools for testing mobile broadband networks. In 2009 he led the Baobab team that worked on the development of the TTCN-3 code for WiMAX protocol conformance testing. He has also actively contributed to VoLTE Plugfests. He is now leading the development of performance testing solutions at AT4 wireless.

Antonio Plaza received his Telecommunications Engineering degree from the University of Malaga (Spain) in 2006 and completed a Master in Business Administration in 2008. He joined AT4 wireless in 2005 and has participated as a testing expert in several contracts with ETSI. He has also participated in several national and European R&D projects focused on broadband technologies, ITS and security. He is currently leading the development of testing Apps for the evaluation of the performance of mobile and wireless networks as well as devices.

Antonio Garrido received his Telecommunications Engineering degree from the University of Málaga (Spain) in 2008. He started his career at AT4 wireless in 2008 and has participated in several national and international R&D projects related to IPv6, mobile networks and conformance testing. His research interests include performance testing for fixed and mobile networks and mobile application development. He is currently responsible for the development of laboratory performance testing solutions.


Janie Baños-Polglase (IEEE member) received her M.Sc. and Ph.D. degrees in telecommunications engineering from the Polytechnic University of Madrid (Spain) in 1981 and 1987, respectively. She has participated in various industry associations and standardization organizations, contributing to the development of standards. She has participated in several European R&D projects and has led various national R&D projects. She worked for AEG-Telefunken until 1986 and Alcatel until 1992, and then joined AT4 wireless, where she is now CTO. She has also lectured at the Polytechnic University of Madrid and the University of Málaga. Her research interests include mobile communications, electromagnetism and radio propagation.

Javier Poncela received the M.Sc. degree in telecommunication engineering from the Polytechnic University of Madrid, Spain, in 1994 and the Ph.D. degree from the University of Málaga, Spain, in 2009. He worked at Alcatel Spacious before joining the Communication Engineering Department of the University of Málaga. He has actively collaborated with multinational companies (Nokia, AT4 wireless) on formal modelling and system testing in Bluetooth, UMTS and satellite systems. His current research interests include methodologies for the efficient development of complex communication systems, analysis of end-to-end QoS over heterogeneous networks, and systems and models for the evaluation of QoE.
