Remote display solution for video surveillance in multimedia cloud

Multimedia Tools and Applications, DOI 10.1007/s11042-015-2816-x

Biao Song¹ · Mohammad Mehedi Hassan¹ · Yuan Tian¹ · M. Shamim Hossain¹ · Atif Alamri¹

Received: 21 September 2014 / Revised: 5 July 2015 / Accepted: 7 July 2015
© Springer Science+Business Media New York 2015

Abstract Cloud computing offers sufficient computing and storage resources for providing multimedia services. Migrating an existing multimedia service to the cloud, however, raises a new challenge: the remote display of video contents. To reduce bandwidth consumption, especially for mobile users, video should be encoded before it is sent to the client. Existing encoding methods have distinct advantages and disadvantages, and their performance differs across situations. We therefore propose a multi-encoder method to solve the real-time remote display problem for the multimedia cloud. To select the most appropriate encoder, we consider cost, application requirements, network conditions, client device capability and codec implementation. In this paper, we formulate a non-linear programming model and provide an example illustrating how to apply the proposed model to obtain the desired optimization.

Keywords Remote display · Video encoding · Multimedia cloud · Real-time system

* Biao Song [email protected]
Mohammad Mehedi Hassan [email protected]
Yuan Tian [email protected]
M. Shamim Hossain [email protected]
Atif Alamri [email protected]

College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia


1 Introduction

Nowadays, cloud computing is emerging as a noteworthy technology that can facilitate the processing of complicated multimedia services and provide quality of service (QoS) provisioning for applications from anywhere, at any time, on any device, all at lower cost. In such a scenario, cloud computing eliminates the need to fully install media application software on users' mobile devices; it thus alleviates the burden of software maintenance and upgrades. The mobile device executes only a viewer component (e.g., a web browser), which operates as a remote display for video surveillance applications and services running on distant servers in the cloud [22]. One typical application of the multimedia cloud is cloud-based video surveillance, which uses IP cameras that can easily leverage the advantages of cloud computing and serve mobile users in a flexible way. In this paper, we focus on providing a novel and efficient remote display solution for cloud-based video surveillance systems.

During the past decades, researchers and engineers have spent enormous effort and resources developing video surveillance solutions that run on local machines. It is neither efficient nor cost effective to ask each research team to re-implement the same algorithms for the cloud. Thus, an efficient way to enable existing local video surveillance applications in the cloud is necessary. Since cloud providers offer powerful on-demand resources such as CPU, memory, storage, GPU and network bandwidth, the computing and storage requirements of local video surveillance applications can be fulfilled in the cloud. Through PaaS (Platform as a Service), the running environment can also be configured and re-used on cloud servers. The challenge arises when the output of surveillance applications has to be sent to and displayed on a remote device rather than a local monitor. For example, if a mobile user watches surveillance videos through a Windows application on the Amazon EC2 cloud, the bandwidth consumption will be huge due to the inefficiency of the Windows RDP (Remote Desktop Protocol). If the available bandwidth of the mobile device is insufficient, the user may fail to watch the videos at all. Thus, there is a strong need for an efficient remote display solution for cloud-based video surveillance systems.

A common solution for remote video streaming is the H.264/MPEG-4 codec, which achieves a high compression ratio for video data. However, its high encoding/decoding complexity and low resiliency to channel errors are unacceptable in a real-time video surveillance context. An alternative is M-JPEG (Motion JPEG), which provides fast image stream recovery in the event of packet loss; its encoding/decoding complexity and latency are also lower than those of H.264/MPEG-4. Unfortunately, although the compression ratio of M-JPEG can be controlled, its bandwidth consumption remains higher than that of H.264/MPEG-4. A third option is the tensor method, which treats high-dimensional video data as higher-order tensors and applies tensor-based redundancy reduction. This method achieves impressive performance when the dynamic texture information is the main interest of the video: the size of the encoded video is extremely small, and the peak signal-to-noise ratio (PSNR) of the encoded video can be significantly increased.
However, the time complexity of the tensor encoder is much higher than that of H.264; real-time or near real-time tensor encoding is not yet possible without GPU assistance.

A remote display system is a soft real-time system: the overall quality of service degrades as the response time increases. The exact time constraint depends on the contents of the remote display, i.e., the remote services running on the server. In addition, the mean processing time per frame must be no greater than the input sampling period. Generally speaking, the client device


should be able to receive and display at least 20–24 frames per second for a smooth display. Consequently, the remote display server needs to process at least the same number of frames, or even more if not 100 % of the data is guaranteed to be delivered to the client over the network.

Existing real-time screen sharing and remote display technologies adopt a single conventional compression technique (X11, RDP, VNC) or a codec invented for photos (JPEG) or video (H.264). These are designed to display certain material under specific network conditions. If the overall running environment is not suitable, these solutions may cause low frame rates, low picture quality, high latency, and high bandwidth or computing resource usage. In the worst case, the entire remote display service can be interrupted. From the above discussion, we conclude that a single-codec method cannot provide satisfactory performance in a cloud-based video surveillance context. The factors that affect the evaluation of a codec are as follows:

- Cost: although the cloud has plenty of resources, the cost of service is still a key issue in a cloud-based video surveillance system [5].
- Video surveillance application requirements: most video surveillance applications have strict requirements on minimum video resolution, frames per second or PSNR [25].
- Network situation: the influence of network bandwidth and packet loss differs from one codec to another [9].
- Client device situation: processing capability (related to battery energy) is an important issue when a mobile device displays video contents [19].
- Implementation of the codec: the efficiency and cost of a codec depend highly on its implementation [3].

In this paper, we propose a novel remote display method for the multimedia cloud, with video surveillance as the main application scenario. Without changing the implementation of existing video surveillance applications, we provide a thin-client-like solution that encodes the output of the applications and then sends it to client devices. Three codecs, H.264, M-JPEG and tensor, are chosen as candidates. Our main contribution is a non-linear programming model that selects the most cost-effective codec and its encoding settings while optimizing video quality. The selection is subject to rigid constraints arising from budget limitations, application requirements, network situation, client situation, and the implementation of the codecs. Moreover, GPU-assisted (Graphics Processing Unit) encoding can greatly increase the performance of image processing at the server side, so it is also considered in the design of the remote display method.

The rest of this paper is organized as follows. We review related work in Section 2. Section 3 presents the cloud-based video surveillance model and the details of each codec. The proposed non-linear programming model for codec selection is presented in Section 4. Simulation results are shown in Section 5 and, finally, concluding remarks are offered in Section 6.

2 Related work

Cloud computing has been investigated as a means to provide video surveillance services in various ways. First, the cloud can be used to store surveillance videos. In [10], the authors propose a framework for a scalable cloud video recorder system in a surveillance


environment, where the Hadoop distributed file system is applied to store the video data. Second, the cloud can provide scalable and distributed processing power for efficient video analytics. In [2], a system brings together automatic license plate recognition engines and cloud computing technology to realize massive data analysis and enable the detection and tracking of a target vehicle with a given license plate number across a city. A real-time face recognition approach is implemented using a mobile-cloudlet-cloud acceleration architecture in [26]. In another study [17], multiclass object recognition using smart phones and cloud computing is proposed for augmented reality and video surveillance applications.

In the above solutions, the distribution of video contents remains a challenging issue. High-resolution IP cameras produce huge amounts of video data, which are not easy to deliver to mobile clients. Current solutions are compatible with cameras supporting 640×480-resolution video at 5–6 fps, using the H.264 codec to stream video to their services [1, 8, 11]. However, the actual performance of the H.264 codec is greatly affected by the network situation, especially packet loss. To find a more efficient solution, we survey existing video compression methods for remote display.

The Amazon EC2 cloud service uses RDP (Remote Desktop Protocol) to transfer the screen output. Since RDP is not specifically designed for video encoding/decoding, it consumes huge bandwidth when delivering high-motion screen updates. State-of-the-art video coding standards, such as H.264/AVC, use a block-based hybrid coding scheme. The H.264/AVC video codec has been used in [4] as a real-time desktop streamer for a thin client system. In [23], a GPU-assisted M-JPEG encoding method is introduced to support a mobile thin client system. Compared with H.264/AVC, M-JPEG has better resiliency to packet loss.

Tensor decomposition is a tool recently introduced to image processing and computer vision; applications such as noise reduction, handwritten digit classification, dynamic texture synthesis, face recognition and object tracking have shown the usefulness of tensor-based methods. Since the tensor representation of multidimensional data preserves its structural information, tensor-based methods have more potential than traditional ones. In data compression, the tensor model was applied to image compression by Shashua and Levin [20], and to building a compressed texture database by Furukawa et al. [6]. In [27], Zhou et al. focused on the compact representation of multidimensional data and proposed a multiple tensor rank-R decomposition (MTRD) algorithm; experimental results show that the tensor codec can improve the peak signal-to-noise ratio (PSNR) of reconstructed testing sequences by up to 8.96 dB compared with H.264/AVC.

To the best of our knowledge, no existing work has considered the H.264/AVC, M-JPEG and tensor codecs in one system. In this paper, we propose a solution that provides all three encoding mechanisms and selects the most appropriate one based on the running environment.

3 System architecture & design

In this section, we first explain the overall architecture that enables a cloud-based video surveillance system for mobile users. Then the implementation details of each codec are presented.


3.1 Cloud-based video surveillance system

As shown in Fig. 1, the whole storing, processing and streaming system in the cloud is called the Video Surveillance Cloud (VSC). The VSC first receives surveillance videos from residential video recorders. IP network cameras are used to capture the videos, usually in H.264, MPEG-4 or Motion JPEG compression formats. These cameras also support multimedia communication control protocols such as the Real Time Streaming Protocol (RTSP). While receiving the video streams, the VSC provides several surveillance applications, including face detection, motion detection, face recognition, object tracking, etc. The applications usually need their own file systems, databases or computing methods. The storage & computing modules can be designed in a distributed way to improve the efficiency of the applications; for example, distributed file systems or large-scale video bases are required to store the massive number of video clips generated by large-scale surveillance tasks. To provision processing, storage and networking resources, an Infrastructure as a Service (IaaS) model is included as the base layer. The storage & computing modules obtain the desired resources (CPU, GPU, memory, hard disk and network) from cloud servers through IaaS. It is also possible to use virtualized resources by employing virtualization technologies such as VMware and Xen hypervisors.

Our focus in this paper is the remote display component of the VSC. We propose to treat the screen updates as raw video input when applications present their output on a local screen. Other applications generate an encoded video stream as their output.

Internet Internet

Surveillance Applications

Remote Display

H.264/AVC Encoder

Face Detection

Input Frames M-JPEG Encoder

Object Tracking Codex Selection

Tensor Encoder

Storage & Computing Modules

File System

Database

Computing

Infrastructure Services

CPU

GPU

Memory

Hard Disk

Network

Virtualized Resources

Video Surveillance Cloud

Fig. 1 Cloud-based video surveillance system with novel remote display module

Multimed Tools Appl

However, the video stream may need to be transcoded before being sent to the client because of network problems at the client side. In this case, the video stream has to be decoded before being encoded again; thus, our remote display component can also take over the encoding task after the video stream has been decoded into raw video. A codec selection method chooses the most appropriate video encoder among the three candidates; the decision is made after considering several important factors in the running context. Finally, the selected encoder performs the encoding task, and the encoded contents are sent to the client for remote display.
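To make the selection flow concrete, the following sketch shows how a dispatcher might solve one optimization per candidate encoder (formalized in Section 4) and keep the best feasible result. This is an illustration, not the paper's implementation; `solve_for`, `context` and the `result` object are hypothetical placeholders.

```python
# Hedged sketch of the codec selection loop; `solvers` maps each candidate
# encoder to a routine that solves its optimization problem (Section 4).
# All names here are illustrative, not the authors' actual code.
def select_codec(context, solvers):
    """context: measured cost/network/client factors; solvers: {"h264": fn, ...}."""
    best_name, best = None, None
    for name, solve_for in solvers.items():
        result = solve_for(context)        # returns None if no feasible setting
        if result is not None and (best is None or result.objective < best.objective):
            best_name, best = name, result
    return best_name, best                 # selected encoder and its settings
```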

3.2 H.264/AVC real-time encoder

The flow diagram of the H.264 encoder with GPU acceleration is shown in Fig. 2. The Intra Prediction, Motion Compensation, Motion Estimation, Discrete Integer Transform (DIT), Quantization (Q), Inverse Quantization (IQ), Inverse Discrete Integer Transform (IDIT), Deblocking Filter and Clipping modules can be processed by either the GPU or the CPU. The functions in these blocks exhibit pixel-level data parallelism, so their processing can be accelerated by leveraging the parallel processing capability of the many GPU cores. As Variable Length Coding requires sequential processing, offloading it to the GPU would not improve the encoding result. The frame rate has to be decided before the encoding process is started.

Fig. 2 Flow diagram of GPU-assisted H.264 encoding

Encoding more frames per second leads to higher bandwidth consumption, higher computational complexity and higher video quality. Adopting GPU-assisted encoding reduces the encoding time, but increases the cost of renting VMs with additional graphics hardware. The compression settings can be adjusted in the Quantization (Q), Intra Prediction, Motion Compensation and Motion Estimation blocks; these settings affect video quality, consumed bandwidth and encoding time.
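The per-frame timing argument above reduces to a simple feasibility test: the mean encoding time per frame must not exceed the frame period, as constraint (6) in Section 4 later formalizes. A minimal sketch with made-up benchmark numbers:

```python
def fps_is_feasible(encode_ms_per_frame: float, fps: int) -> bool:
    """True if the encoder can sustain `fps`: per-frame time <= frame period."""
    return encode_ms_per_frame <= 1000.0 / fps

# Hypothetical benchmarks: a GPU-assisted run at 12 ms/frame sustains 24 fps,
# while a CPU-only run at 55 ms/frame cannot.
assert fps_is_feasible(12.0, 24)
assert not fps_is_feasible(55.0, 24)
```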

3.3 M-JPEG real-time encoder

Figure 3 shows the flow diagram of JPEG compression using both the GPU and the CPU. The color space transformation, discrete cosine transform, adaptive quantization and "zigzag" ordering can be assigned to the GPU because these blocks consist of many independent computing tasks, matching the data-level parallelism of the GPU. By contrast, run-length encoding and Huffman coding contain many branches; these last two stages are assigned to the CPU, since the GPU cannot provide satisfactory performance for such tasks.

Three decisions have to be made before using the M-JPEG encoder. The first is how many frames should be encoded per second: fps (frames per second) affects cost, video quality and consumed bandwidth. The second is whether the GPU will be used to assist encoding: GPU-assisted encoding enables a higher fps, but increases the cost of renting VMs with additional graphics hardware. The third is the image compression ratio, which can be changed by selecting among several quantization tables; this setting affects video quality and consumed bandwidth. A sketch of such table scaling follows below.
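As an illustration of the third decision, the sketch below scales a base quantization table from a quality factor following the common IJG libjpeg convention; this is an assumption for illustration, since the paper does not state which scaling rule its M-JPEG encoder uses.

```python
import numpy as np

def scale_quant_table(base_table: np.ndarray, quality: int) -> np.ndarray:
    """Scale an 8x8 base quantization table by a quality factor in [1, 100]."""
    quality = min(max(quality, 1), 100)
    # IJG convention: lower quality -> larger scale -> coarser quantization
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    table = (base_table.astype(np.int64) * scale + 50) // 100
    return np.clip(table, 1, 255).astype(np.uint8)
```

A higher quality factor yields smaller table entries, i.e., finer quantization, a lower compression ratio and higher bandwidth consumption.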

Fig. 3 Flow diagram of GPU-assisted JPEG encoding

3.4 Tensor encoder

Intuitively speaking, a real $N$th-order, $(I_1, I_2, \cdots, I_N)$-dimensional tensor $\mathcal{A} \in T^{I_1 \times I_2 \times \cdots \times I_N}$ is a multidimensional array of size $I_1 \times I_2 \times \cdots \times I_N$ that consists of $\prod_{n=1}^{N} I_n$ entries in $T$, i.e.,

$$\mathcal{A} = (a_{i_1 i_2 \cdots i_N}), \quad a_{i_1 i_2 \cdots i_N} \in T, \tag{1}$$

where $i_n = 1, 2, \cdots, I_n$ for $n = 1, 2, \cdots, N$. For example, first-order tensors are vectors and second-order tensors are matrices. Tensors of order higher than two are called higher-order tensors, which are natural representations of multidimensional data that preserve its structure.

Traditionally, tensors are decomposed as a sum of rank-1 outer products using either the CP model or the Tucker model, or some variation thereof. Each model can be considered a higher-order generalization of the singular value decomposition (SVD). Concretely, a tensor $\mathcal{A} \in T^{I_1 \times I_2 \times \cdots \times I_N}$ is factored into a core tensor $\mathcal{C} \in T^{R_1 \times R_2 \times \cdots \times R_N}$ multiplied by a matrix $X^{(i)} \in T^{I_i \times R_i}$ along each $i$-mode, i.e.,

$$\mathcal{A} \approx \mathcal{C} \times_1 X^{(1)} \times_2 \cdots \times_N X^{(N)} = \sum_{r_1=1}^{R_1} \cdots \sum_{r_N=1}^{R_N} d_{r_1 \cdots r_N} \, x^{(1)}_{r_1} \circ \cdots \circ x^{(N)}_{r_N} \tag{2}$$

The two models focus on different aspects: the CP model requires a diagonal core tensor, whereas the Tucker model requires orthonormal factor matrices and is also well known as the higher-order SVD (HOSVD).

The flow diagram of the encoder is shown in Fig. 4. Each selected spatial-temporal video block needs four steps to produce the output bit stream. The purpose of the data rearrangement step is to expose the redundancy along each mode; it greatly affects the efficiency of the next step. Here we assume that the information of each sub-image is concentrated in a principal direction, and we estimate the principal directions using the Hough transform. In the transform step, the MTRD algorithm is used to obtain a compact representation of the current block; the value of $R$ in the CP model must be well specified. After this step, we obtain a low-rank approximation of the current block, i.e., the diagonal core tensor $\mathcal{D} = \mathrm{diag}(\eta)$ and the factor matrices $U^{(1)}, U^{(2)}, U^{(3)}, U^{(4)}$. This is the most complex step, so optimizing or improving the compact tensor decomposition algorithm deserves further study.

Fig. 4 Flow diagram of the tensor based video encoder
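As a concrete reference point for the transform step, the following numpy sketch computes a plain rank-R CP approximation via alternating least squares. It is a generic CP-ALS illustration, not the MTRD algorithm of [27]; the function names and toy block size are assumptions.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding; columns iterate over the remaining modes in order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(mats):
    """Column-wise Khatri-Rao product of a list of factor matrices."""
    out = mats[0]
    for M in mats[1:]:
        out = (out[:, None, :] * M[None, :, :]).reshape(-1, M.shape[1])
    return out

def cp_als(X, rank, n_iter=50, seed=0):
    """Rank-R CP approximation of tensor X by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for n in range(X.ndim):
            others = [factors[m] for m in range(X.ndim) if m != n]
            kr = khatri_rao(others)          # ordering matches unfold()
            factors[n] = np.linalg.lstsq(kr, unfold(X, n).T, rcond=None)[0].T
    return factors

# Toy example: approximate a random 16x16x16 block with rank 4
X = np.random.default_rng(1).standard_normal((16, 16, 16))
A, B, C = cp_als(X, rank=4)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)   # low-rank reconstruction
```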


In the third step, the decomposition coefficients are further quantized using an 8-bit (256-level) scalar linear quantizer. Since the values of $\eta$ and $U^{(1)}, U^{(2)}, U^{(3)}, U^{(4)}$ differ significantly, it is better to quantize them separately. Finally, in the entropy encoding step, we simply use the dictionary-based LZ77 algorithm. Currently, tensor encoders cannot support real-time encoding; nevertheless, the approach is still very useful when the remote user does not require a real-time service.
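A minimal sketch of the quantization step just described, assuming a per-array uniform quantizer; the core coefficients η and each factor matrix U(i) would be passed through it independently, as the text suggests.

```python
import numpy as np

def quantize_8bit(x: np.ndarray):
    """Uniform 8-bit (256-level) scalar quantizer over the array's range."""
    lo, hi = float(x.min()), float(x.max())
    step = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = np.round((x - lo) / step).astype(np.uint8)
    return codes, lo, step                    # codes + parameters for decoding

def dequantize_8bit(codes: np.ndarray, lo: float, step: float) -> np.ndarray:
    """Inverse mapping used by the decoder before tensor reconstruction."""
    return lo + codes.astype(np.float64) * step
```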

4 Encoder selection

To achieve effective encoder selection, we formulate the problem as a non-linear programming model. The following steps are required in non-linear programming modeling:

- Define the related parameters, variables and functions
- Define an appropriate objective
- Define the relevant constraints
- Specify the functions and rewrite the constraints for each encoder

To obtain the desired results, we assume that the information needed for making decisions can be retrieved from benchmarking, application profiling, user settings, real-time monitoring, etc.

4.1 Notations

Table 1 lists the notations used in this programming model. They consist of parameters, variables and functions. The parameters are constant values that can be obtained before encoder selection starts. The variables are decided during the encoder selection process so as to optimize the goal function. The functions calculate important values from the variables and parameters, and they differ among the three encoders. We therefore solve one non-linear program for each encoder using its own functions; finally, the three results are compared and the best one is selected.

4.2 Optimization goal

The objective of encoder selection is to minimize the cost, i.e., the money spent on renting VMs, while maximizing the quality of the received video, defined as a function $Q()$ of the received resolution, fps and PSNR. The optimization goal function is:

$$\min \; K_1 \cdot \sum_i C_i \cdot n_i - K_2 \cdot Q\big(f_{fr}(f_{bw}(fr, re, cm), B, L),\; f_{sn}(cm),\; re\big) \tag{3}$$

where $K_1$ and $K_2$ are weight factors adjusting the relative importance of cost and quality. The objective is the same for all three encoders.
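Read as code, objective (3) combines a cost term and a quality term. In the sketch below the f_* callables and Q stand for benchmarked models; they are placeholders, not the paper's implementations.

```python
def objective(K1, K2, C, n, fr, re, cm, B, L, f_bw, f_fr, f_sn, Q):
    """Value of (3): weighted rental cost minus weighted received-video quality."""
    cost = sum(Ci * ni for Ci, ni in zip(C, n))
    quality = Q(f_fr(f_bw(fr, re, cm), B, L), f_sn(cm), re)
    return K1 * cost - K2 * quality   # minimized over n, re, fr, cm, g
```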

Table 1 Notations

Parameter    Description
C_max        Maximum cost the user can afford
C_i          Cost of renting virtual machine i
CPU_i        Available CPU processing capability on virtual machine i
GPU_i        Available GPU processing capability on virtual machine i
R_re_min     Requirement on minimum resolution
R_fr_min     Requirement on minimum fps
R_sn_min     Requirement on minimum PSNR
B            Available bandwidth at client side
L            Packet loss at client side
P            Processing capability at client side

Variable     Description
n_i          Number of rented virtual machines of type i
re           Encoder resolution setting
fr           Encoder frame rate setting
cm           Encoder compression setting
g            GPU setting; g = 0 or 1

Function                                      Description
f_bw(fr, re, cm)                              Desired bandwidth
f_fr(f_bw(), B, L)                            Received frame rate at client side
f_sn(cm)                                      Video PSNR under a given compression setting
Q(f_fr(), f_sn(), re)                         Overall evaluation of received video quality
f_en(ΣCPU_i·n_i, ΣGPU_i·n_i, g, re, cm)       Estimated encoding time per frame
f_de(fr, re, cm)                              Minimum processing capability required to decode the received video

4.3 Constraints

4.3.1 Cost constraint

The total cost of renting virtual machines should be less than or equal to the maximum cost the user can afford, i.e.,

$$\sum_i C_i \cdot n_i \le C_{max} \tag{4}$$

4.3.2 Resolution constraint

The encoder resolution setting should fulfill the requirement on minimum resolution, i.e.,

$$re \ge R_{re\_min} \tag{5}$$

4.3.3 Frame rate constraints

The estimated encoding time per frame should meet the requirement of the encoder frame rate setting, i.e.,

$$f_{en}\Big(\sum_i CPU_i \cdot n_i,\; \sum_i GPU_i \cdot n_i,\; g, re, cm\Big) \le 1/fr \tag{6}$$

The received frame rate at the client side should be greater than or equal to the required minimum fps, i.e.,

$$f_{fr}(f_{bw}(fr, re, cm), B, L) \ge R_{fr\_min} \tag{7}$$

4.3.4 PSNR constraint

The video PSNR should be greater than or equal to the required minimum PSNR, i.e.,

$$f_{sn}(cm) \ge R_{sn\_min} \tag{8}$$

4.3.5 Client capability constraint

The processing capability required to decode the video at the client side should be less than or equal to what the client device owns, i.e.,

$$f_{de}(fr, re, cm) \le P \tag{9}$$

4.3.6 Other constraints

All variables should be greater than or equal to 0, i.e.,

$$n_i, re, fr, cm, g \ge 0 \quad \forall i \tag{10}$$

If the encoding task is handled by exactly one VM, the following constraint should be included as well:

$$\sum_i n_i = 1 \tag{11}$$
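Before solving, constraints (4)–(11) can be checked directly for any candidate decision. The sketch below assumes the parameters of Table 1 arrive in a dictionary and that the f_* callables are benchmarked models; both are illustrative assumptions.

```python
def is_feasible(n, re, fr, cm, g, p, f_en, f_fr, f_bw, f_sn, f_de):
    """Check constraints (4)-(11) for one candidate decision (n, re, fr, cm, g)."""
    cpu = sum(c * ni for c, ni in zip(p["CPU"], n))    # provisioned CPU
    gpu = sum(gc * ni for gc, ni in zip(p["GPU"], n))  # provisioned GPU
    return (
        sum(c * ni for c, ni in zip(p["C"], n)) <= p["C_max"]        # (4)
        and re >= p["R_re_min"]                                      # (5)
        and f_en(cpu, gpu, g, re, cm) <= 1.0 / fr                    # (6)
        and f_fr(f_bw(fr, re, cm), p["B"], p["L"]) >= p["R_fr_min"]  # (7)
        and f_sn(cm) >= p["R_sn_min"]                                # (8)
        and f_de(fr, re, cm) <= p["P"]                               # (9)
        and sum(n) == 1                                              # (11)
    )
```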

4.4 Function specification

The last step of problem modeling is to specify the functions. We use the M-JPEG encoder as an example to show one possible function specification method using added parameters and variables. In this example, we allow only 50 different encoder settings, obtained by varying the resolution, frame rate, compression ratio and GPU usage. Accordingly, we add one group of variables $s_j$, $j = 1, 2, \ldots, 50$, to replace the original $re$, $fr$, $cm$ and $g$. Through benchmarking, a group of new parameters capturing the performance under those settings is added, as shown in Table 2. Assuming that the quality of the video is guaranteed by constraints (5), (7) and (8), we remove the corresponding quality factor from the optimization goal. The non-linear programming model of the M-JPEG encoder can then be re-designed as follows:

$$\min \; \sum_i C_i \cdot n_i \tag{12}$$

s.t.

$$\sum_i C_i \cdot n_i \le C_{max} \tag{13}$$

Table 2 Added parameters and variables

Parameter    Description
CPU_j        CPU requirement when adopting video encoding setting j
GPU_j        GPU requirement when adopting video encoding setting j
RE_j         Resolution of video encoding setting j
FR_j         Frame rate of video encoding setting j
SN_j         PSNR of video encoding setting j
BW_j         Desired bandwidth consumption for video encoding setting j
BS_j         Bandwidth shortage, calculated as BW_j − B; BS_j = 0 if BW_j ≤ B
P_j          Minimum processing capability required to decode the received video with setting j

Variable     Description
s_j          Indicates whether video encoding setting j is applied; s_j = 0 or 1

$$\sum_j RE_j \cdot s_j \ge R_{re\_min} \tag{14}$$

$$\sum_i CPU_i \cdot n_i \ge \sum_j CPU_j \cdot s_j \tag{15}$$

$$\sum_i GPU_i \cdot n_i \ge \sum_j GPU_j \cdot s_j \tag{16}$$

$$\sum_j FR_j \cdot s_j \cdot \big(1 - BS_j / BW_j\big) \cdot (1 - L) \ge R_{fr\_min} \tag{17}$$

$$\sum_j SN_j \cdot s_j \ge R_{sn\_min} \tag{18}$$

$$\sum_j P_j \cdot s_j \le P \tag{19}$$

$$\sum_i n_i = 1 \tag{20}$$

$$\sum_j s_j = 1 \tag{21}$$

$$n_i, s_j \in \{0, 1\} \quad \forall i, j \tag{22}$$

We directly use (13), (14), (18) and (19) to replace (4), (5), (8) and (9), respectively. Constraint (6) is substituted by (15) and (16), which guarantee that the provided CPU and GPU capacity is sufficient to meet the computational resource requirement of the chosen encoding setting j. As the frames of an M-JPEG video are independent of each other, the received frame rate can be estimated by the linear function in (17), which replaces (7). As we can see from (12)–(22), the original non-linear programming problem is converted into a binary linear programming problem once the functions are specified under certain assumptions. Any linear programming solver can be used to find the solution for this


case; a sketch using an off-the-shelf solver follows below. A similar method can be used to specify the functions representing the H.264 and tensor encoders, although it may not lead to a linear programming result. As there is no general solution method for non-linear programming problems, no general method can be given for specifying the functions either; the principle, however, is to convert the problem into a simple form that can be solved easily.
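A minimal sketch of the converted binary program (12)–(22) using the PuLP modeling library. The toy instance uses 3 encoder settings instead of 50 and 2 VM types, and all benchmark numbers are invented for illustration.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

settings, vms = range(3), range(2)     # toy instance: 3 settings, 2 VM types

# Hypothetical benchmark data (Table 2) and parameters (Table 1)
C = [0.5, 1.2]                         # VM rental cost
CPU_cap, GPU_cap = [2.0, 4.0], [0.0, 1.0]
RE, FR, SN = [480, 480, 720], [24, 30, 24], [36.0, 38.0, 39.0]
BW, BS, P_dec = [900, 1400, 2100], [0, 0, 600], [10, 14, 20]
CPU_req, GPU_req = [1.5, 1.8, 3.5], [0.0, 0.0, 1.0]
C_max, B, L = 2.0, 1500.0, 0.05
R_re_min, R_fr_min, R_sn_min, P_client = 480, 20, 35.0, 25

prob = LpProblem("mjpeg_encoder_selection", LpMinimize)
n = [LpVariable(f"n{i}", cat=LpBinary) for i in vms]       # (22)
s = [LpVariable(f"s{j}", cat=LpBinary) for j in settings]  # (22)

prob += lpSum(C[i] * n[i] for i in vms)                                   # (12)
prob += lpSum(C[i] * n[i] for i in vms) <= C_max                          # (13)
prob += lpSum(RE[j] * s[j] for j in settings) >= R_re_min                 # (14)
prob += lpSum(CPU_cap[i] * n[i] for i in vms) >= lpSum(
    CPU_req[j] * s[j] for j in settings)                                  # (15)
prob += lpSum(GPU_cap[i] * n[i] for i in vms) >= lpSum(
    GPU_req[j] * s[j] for j in settings)                                  # (16)
prob += lpSum(FR[j] * s[j] * (1 - BS[j] / BW[j]) * (1 - L)
              for j in settings) >= R_fr_min                              # (17)
prob += lpSum(SN[j] * s[j] for j in settings) >= R_sn_min                 # (18)
prob += lpSum(P_dec[j] * s[j] for j in settings) <= P_client              # (19)
prob += lpSum(n) == 1                                                     # (20)
prob += lpSum(s) == 1                                                     # (21)

prob.solve(PULP_CBC_CMD(msg=False))
print("VM choice:", [v.value() for v in n], "setting:", [v.value() for v in s])
```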

5 Simulations

In the simulations, we present in detail a set of settings and performance results for the remote display of a multimedia service, together with the reasoning behind them. Our solution is capable of detecting the client's Internet connection situation and choosing the appropriate encoder based on that link, along with the server cost and the playback capabilities of the client, thus supplying each client with the best resolution and bitrate it can use.

5.1 Simulation settings

The remote multimedia user experience is evaluated objectively. Table 3 shows the hardware/software environment in which we conducted the simulations. For testing purposes, we ran the first-person shooter game Counter-Strike (CS) on the server at a resolution of 640×480. It is an application with high motion and strict real-time requirements; we therefore believe that most remote video surveillance tasks can also be accomplished if our solution can support remote gaming.

Four existing remote display solutions, along with our proposed method, were used in the simulation. In our method, NVENC, published by NVIDIA [15], is used as the encoding tool for the H.264 encoder. H.264 is also the encoding method used by many existing commercial cloud gaming platforms, such as OnLive [16] and StreamMyGame (SMG) [24]. For M-JPEG-based encoding, we use our previous encoding technique from [23]. M-JPEG has the following advantages: i) minimum latency in image processing; ii) flexibility of splicing and resizing; and iii) good resilience against packet loss. However, it consumes more network bandwidth than H.264 encoding. The tensor encoder is not included in the simulation since it cannot meet the real-time encoding requirement. The existing state-of-the-art solutions we compare against are: RDP [12], RealVNC [18], an MPEG-based encoder [21] and an M-JPEG-based encoder [23].

Table 3 Simulation setup

                   Client                             Server
Language           C                                  C
OS                 Windows XP SP3                     Windows Server 2010
Specific tool      None                               CUDA
Hardware           Computer: Intel Core 2 Duo CPU,    HP Z820 Workstation: Intel Xeon E5-2620 CPU @ 2.00 GHz,
                   1.99 GB DDR3 ECC RAM               32.00 GB memory, NVIDIA Quadro K4000 graphics card
Encoding settings  24 frames per second
Network            100 Mbps Ethernet, TCP/IP; NEWT (Network Emulator for Windows Toolkit)


At run time, we use NEWT (Network Emulator for Windows Toolkit) [13] to simulate different network conditions, including the available bandwidth and packet loss. We assume that readers are familiar with basic video-encoding terms and technology. Audio is not included in the simulation since it is out of the scope of this research.

5.2 Simulations in a static environment

In the static environment, we did not set any resource or network constraint; the results therefore represent the best performance each remote display solution can achieve for the given multimedia service.

Figure 5 shows the bandwidth consumption during the simulation. The proposed solution selects the H.264 encoder, which consumes less network bandwidth than any other encoder. Both Windows RDP and RealVNC consume huge bandwidth because they are not specifically designed to handle high-motion remote applications.

Figure 6 presents the client CPU consumption results. The decoding overheads of MPEG and H.264 (proposed) are significantly higher than those of the other solutions; for the rest, 20 % is the maximum CPU consumption we observed during the simulation. The average M-JPEG decoding CPU overhead is between 13 and 14 %. Notably, our client device has lower capability than most existing devices, including mobile devices; thus, the client-side overhead is acceptable even for MPEG encoding.

Figure 7 shows the real-time CPU usage of the cloud server during the gaming period. The MPEG and H.264 (proposed) encoders consume slightly more CPU processing power than the M-JPEG encoder: around 25–35 % of CPU resources for the H.264 encoder, and 15–30 % for the M-JPEG encoder. Windows RDP and RealVNC consume less server CPU since they do not perform sophisticated encoding.

Figure 8 presents the peak signal-to-noise ratio (PSNR) of the encoding results. In our simulation, we use the default NVENC settings for H.264 and Q = 50 for the M-JPEG encoder.

Fig. 5 Bandwidth consumption in static environment


Fig. 6 Client CPU consumption in static environment

The PSNR is calculated using the mean square error (MSE) between the image $I$ before encoding and the final decoded image $K$:

$$MSE = \frac{1}{m \cdot n} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \big[I(i,j) - K(i,j)\big]^2 \tag{23}$$

The PSNR (in dB) is then defined as:

$$PSNR = 20 \cdot \log_{10}(MAX_I) - 10 \cdot \log_{10}(MSE) \tag{24}$$

where $MAX_I$ is the maximum possible pixel value of the image.
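Equations (23) and (24) translate directly into a few lines of numpy; the sketch below assumes 8-bit frames (MAX_I = 255).

```python
import numpy as np

def psnr(I: np.ndarray, K: np.ndarray, max_i: float = 255.0) -> float:
    """PSNR in dB between reference frame I and decoded frame K, per (23)-(24)."""
    mse = np.mean((I.astype(np.float64) - K.astype(np.float64)) ** 2)  # (23)
    if mse == 0:
        return float("inf")   # identical frames: PSNR is unbounded
    return 20.0 * np.log10(max_i) - 10.0 * np.log10(mse)               # (24)
```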

Fig. 7 Server CPU consumption in static environment


Fig. 8 PSNR in static environment

Because the PSNR is mainly determined by the encoding techniques rather than the images, no significant difference is observed among the encoding solutions: all achieve a PSNR between 38 and 40 dB.

Figure 9 shows the response time. According to [7], the response delay consists of three major parts: the encoding delay (ED) is the time required for the server to receive and process a player's command and to encode and transmit the corresponding frame to that client; the playout delay (OD) is the time required for the client to receive, decode and render a frame on the display; and the network delay (ND) is the time required for a round of data exchange between the server and the client. The simulation results indicate that the H.264 and MPEG encoders have a longer response time than the other encoders, because these two encoders rely on inter-frame prediction and thus produce a longer ED.

Fig. 9 Response time in static environment


Figure 10 depicts the quality-of-video results, measured using the slow-motion technique [14]. Among these protocols, RealVNC and RDP provide the worst video quality in the gaming scenario. Our approach, along with the M-JPEG and MPEG encoding methods, shows consistent and excellent performance: over 85 % of the encoded data is successfully transmitted and received at the client side, so the entire gaming process is fluent.

5.3 Simulations in a dynamic environment

We vary the available network bandwidth, the packet loss and the client-side capability to test the performance of the remote display solutions. The main performance measure in this subsection is the quality of video, which represents the user experience better than any other evaluation metric.

In Fig. 11, the impact of bandwidth availability is presented. We vary the available bandwidth from 1500 Kbps down to 100 Kbps. Both Windows RDP and RealVNC degrade significantly because they regularly consume huge bandwidth. The M-JPEG solution performs well when the available bandwidth is above 500 Kbps, while both the MPEG and H.264 (proposed) solutions work normally over a connection of only 200 Kbps. Overall, the proposed solution deals with low bandwidth availability better than any other solution.

Figure 12 depicts the impact of packet loss. Our proposed solution uses the H.264 encoder when the packet loss is 0 %, and switches to the M-JPEG encoder once the packet loss reaches 5 % or more, because the H.264/MPEG encoder has poor resilience against packet loss even with certain recovery techniques. As shown in Fig. 12, the performance of the MPEG encoder decreases and eventually becomes the worst. These results show that our proposed solution is able to switch to M-JPEG to improve performance when needed.

Figure 13 presents the simulation results with varying client CPU capacity. The MPEG decoder running at the client side needs at least 60 % of the CPU to perform decoding; thus, it does not work when only 40 % or less of the CPU capacity is available at the client side. Since the H.264 decoder faces the same problem, our proposed solution switches to M-JPEG for remote display once the available client CPU drops below 60 %.

Fig. 10 Quality of video in static environment


Fig. 11 Impact of available bandwidth

6 Conclusions

This paper introduced a novel remote display method to support cloud-based multimedia systems, with video surveillance as the application scenario. H.264, M-JPEG and tensor were considered as the three candidate encoding methods. We formulated encoder selection as a non-linear programming problem that addresses budget limitations, application requirements, network situation, client situation and GPU-assisted codec implementation. In the future, we plan to improve the tensor encoder using GPU acceleration. Since tensor encoding is based on tensor decomposition, our specific goal is to implement GPU code for the tensor decomposition and the subsequent calculations on the decomposed data. If we can successfully reduce the encoding

Fig. 12 Impact of packet loss


Fig. 13 Impact of client CPU availability

time, the tensor encoder has the potential to be the most bandwidth-efficient choice, able to tackle the remote display problem even in very bandwidth-limited situations.

Acknowledgments This project was funded by the National Plan for Science, Technology and Innovation (MAARIFAH), King Abdulaziz City for Science and Technology, Kingdom of Saudi Arabia, Award Number (12-INF2613-02).

References

1. Axis Communications (2012) Axis Communications Web site. [Online]. http://www.axis.com/
2. Chen Y-L, Chen T-S, et al (2013) Intelligent urban video surveillance system for automatic vehicle detection and tracking in clouds. IEEE 27th International Conference on Advanced Information Networking and Applications
3. Chien MC, Wang RJ, Chiu CH, Chang PC (2012) Quality driven frame rate optimization for rate constrained video encoding. IEEE Trans Broadcast 58(2):200–208
4. De Winter D, Simoens P, Deboosere L (2006) A hybrid thin-client protocol for multimedia streaming and interactive gaming applications. In the 16th Annual International Workshop on Network and Operating Systems Support for Digital Audio and Video
5. Dinh HT, Lee C, Niyato D, Wang P (2011) A survey of mobile cloud computing: architecture, applications, and approaches. Wirel Commun Mob Comput. doi:10.1002/wcm.1203
6. Furukawa R, Kawasaki H, Ikeuchi K, Sakauchi M (2002) Appearance based object modeling using texture database: acquisition, compression and rendering. In Proc. of the 13th Eurographics Workshop on Rendering, Aire-la-Ville, p 257–266
7. Huang C-Y, Hsu C-H, Chang Y-C, Chen K-T (2013) GamingAnywhere: an open cloud gaming system. In Proceedings of the 4th ACM Multimedia Systems Conference, p 36–47
8. ipConfigure, Inc. (2011) ipConfigure Web site. [Online]. http://www.ipconfigure.com/products/SCS/
9. Kumar S, Xu L, Mandal MK, Panchanathan S (2006) Error resiliency schemes in H.264/AVC standard. J Vis Commun Image Represent 17(2):425–450
10. Lin CF, Yuan SM, Leu MC, Tsai CT (2012) A framework for scalable cloud video recorder system in surveillance environment. In 2012 9th International Conference on Ubiquitous Intelligence & Computing and 9th International Conference on Autonomic & Trusted Computing (UIC/ATC), pp 655–660
11. Mell P, Grance T (2011) The NIST definition of cloud computing: recommendations of the National Institute of Standards and Technology. NIST Spec Publ 145(6):1–7
12. Microsoft remote desktop protocol: basic connectivity and graphics remoting specification. [Online]. Available: http://msdn2.microsoft.com/en-us/library/cc240445.aspx
13. Network Emulator for Windows Toolkit in Microsoft Visual Studio. [Online]. Available: https://www.visualstudio.com/en-us
14. Nieh J, Yang SJ, Novik N (2003) Measuring thin-client performance using slow-motion benchmarking. ACM Trans Comput Syst 21(1):87–115
15. NVIDIA Video Codec SDK (2014) https://developer.nvidia.com/nvidia-video-codec-sdk
16. OnLive. [Online]. Available: http://www.onlive.com/
17. Paul AK, Park JS (2013) Multiclass object recognition using smart phone and cloud computing for augmented reality and video surveillance applications. In IEEE 2013 International Conference on Informatics, Electronics & Vision (ICIEV), pp 1–6
18. RealVNC. [Online]. Available: http://www.realvnc.com/
19. Ren S, van der Schaar M (2013) Efficient resource provisioning and rate selection for stream mining in a community cloud. IEEE Trans Multimedia 15(4):723–734
20. Shashua A, Levin A (2001) Linear image coding for regression and classification using the tensor-rank principle. In Proc. of the 2001 IEEE Conf. on Computer Vision and Pattern Recognition, p 42–49
21. Simoens P, Praet P, Vankeirsbilck B, De Wachter J, Deboosere L, De Turck F, Dhoedt B, Demeester P (2008) Design and implementation of a hybrid remote display protocol to optimize multimedia experience on thin client devices. ATNAC 2008, Australasian Telecommunication Networks and Applications Conference, p 391–396
22. Simoens P, De Turck F, Dhoedt B, Demeester P (2011) Remote display solutions for mobile cloud computing. Computer 44(8):46–53
23. Song B, Tang W, Nguyen TD, Hassan MM, Huh EN (2013) An optimized hybrid remote display protocol using GPU-assisted M-JPEG encoding and novel high-motion detection algorithm. J Supercomput 66(3):1729–1748
24. StreamMyGame. [Online]. Available: http://www.streammygame.com
25. Tian Y-L et al (2008) IBM smart surveillance system (S3): event based video surveillance system with an open and extensible framework. Mach Vis Appl 19(5–6):315–327
26. Yi S, Jing X, Zhu J, Zhu J, Cheng H (2012) The model of face recognition in video surveillance based on cloud computing. In: Advances in computer science and information engineering. Springer, Berlin, pp 105–111
27. Zhou B, Zhang F, Peng L (2013) Compact representation for dynamic texture video coding using tensor method. IEEE Trans Circuits Syst Video Technol 23(2):280–288

Biao Song received his Ph.D. degree in Computer Engineering from Kyung Hee University, South Korea, in 2012. He is currently an Assistant Professor in the College of Computer and Information Sciences, King Saud University, Kingdom of Saudi Arabia. His current research interests are cloud computing, remote display technologies and dynamic VM resource allocation.


Mohammad Mehedi Hassan received his B.Sc. degree in Computer Science and IT from the Islamic University of Technology, Dhaka, Bangladesh, in 2003, and his Ph.D. degree in Computer Engineering from Kyung Hee University, South Korea, in 2010. He was a Research Professor in the Computer Engineering department, Kyung Hee University, South Korea, from March 2011 to October 2011. He is currently an Assistant Professor and Chair of Pervasive and Mobile Computing in the College of Computer and Information Sciences, King Saud University, Kingdom of Saudi Arabia. His current research interests are cloud computing, data-intensive computing, media cloud, mobile cloud, game theory, dynamic VM resource allocation, IPTV, virtual networks, sensor networks and publish/subscribe systems.

Yuan Tian received her master's and Ph.D. degrees from Kyung Hee University and is currently working as an Assistant Professor at the College of Computer and Information Sciences, King Saud University, Kingdom of Saudi Arabia. She is a member of the technical committees of several international conferences and an active reviewer for many international journals. Her research interests broadly cover privacy and security as related to cloud computing, bioinformatics, multimedia, cryptography, smart environments, and big data.

M. Shamim Hossain received his Ph.D. degree in electrical and computer engineering, with a specialization in computer and software engineering, from the University of Ottawa, Ottawa, ON, Canada, in 2009. He is currently an Assistant Professor in the Software Engineering department, CCIS, King Saud University, Riyadh, Saudi Arabia. He is also an adjunct member of the Multimedia Communications Research Laboratory (MCRLab), SITE, University of Ottawa, Canada. He is the author or co-author of four books, four book chapters, and more than 30 publications. His research interests include quality of service, service-oriented computing, service configuration, ambient assisted living, e-health, and biologically inspired approaches for multimedia and software systems. Dr. Shamim is a Senior Member of IEEE and a member of ACM. He is the co-general chair of the IEEE ICME workshop on Multimedia Services and Technologies for E-health, and a Guest Editor for several journals.

Atif Alamri is an Associate Professor in the Information Systems Department at the College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia. His research interests include multimedia-assisted health systems, ambient intelligence, and service-oriented architecture. Mr. Alamri was a Guest Associate Editor of the IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, a co-chair of the first IEEE International Workshop on Multimedia Services and Technologies for E-health, a Technical Program Co-chair of the 10th IEEE International Symposium on Haptic Audio Visual Environments and Games, and serves as a Program Committee Member of many conferences on multimedia, virtual environments, and medical applications.