Sensors | Article

Joint Source and Channel Rate Allocation over Noisy Channels in a Vehicle Tracking Multimedia Internet of Things System

Yixin Mei 1, Fan Li 1,*, Lijun He 1 and Liejun Wang 2

1 School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; [email protected] (Y.M.); [email protected] (L.H.)
2 College of Software, Xinjiang University, Urumchi 830046, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-29-8266-8772

Received: 9 July 2018; Accepted: 28 August 2018; Published: 30 August 2018

Sensors 2018, 18, 2858; doi:10.3390/s18092858

Abstract: As an emerging type of Internet of Things (IoT), multimedia IoT (MIoT) has been widely used in the domains of healthcare, smart buildings/homes, transportation and surveillance. In a mobile surveillance system for vehicle tracking, multiple mobile camera nodes capture and upload videos to a cloud server to track the target. Due to the random distribution and mobility of the camera nodes, wireless networks are chosen for video transmission. However, the tracking precision can be decreased by the degradation of video quality caused by limited wireless transmission resources and transmission errors. In this paper, we propose a joint source and channel rate allocation scheme to optimize the performance of vehicle tracking in cloud servers. The proposed scheme considers the video content features that impact tracking precision for optimal rate allocation. To improve the reliability of data transmission and support real-time video communication, forward error correction is adopted in the application layer. Extensive experiments are conducted on videos from the Object Tracking Benchmark using the H.264/AVC standard and a kernelized correlation filter tracking scheme. The results show that the proposed scheme can allocate rates efficiently and provide a high quality tracking service under the total transmission rate constraints.

Keywords: multimedia IoT; video analysis; vehicle tracking; rate allocation; surveillance system; forward error correction

1. Introduction

The Internet of Things (IoT) is a global infrastructure for the information society, enabling advanced services by interconnecting physical and virtual things based on existing and evolving interoperable information and communication technologies [1]. The IoT extends the conventional concept of the Internet as an infrastructure network reaching out to end-users' terminals to the concept of a pervasive network of interconnected objects. As an emerging type of IoT, multimedia IoT (MIoT) refers to IoT with multimedia outputs or/and inputs [2]. MIoT has been widely used in the domains of healthcare [3,4], smart buildings/homes [5,6], transportation [7,8] and surveillance [9]. In particular, intelligent surveillance networks are among the main applications of MIoT. They normally consist of sensors, a transmission network and a server [2]. The sensors and the server carry out the tasks of capturing and processing the videos, and the transmission network is responsible for the transmission of all kinds of information. Although a great proportion of videos are captured for human consumption, more and more videos are transmitted for video analysis rather than being processed manually in intelligent surveillance networks. In a realistic surveillance MIoT with the purpose of video analysis, videos are uploaded to a server to meet the computation capability requirement of video processing


and analysis. When the video sequences are uploaded via a wireless network, multiple camera nodes compete for the limited transmission resources. Due to the huge amount of video data and the limited wireless resources, it is challenging to allocate the limited resources efficiently based on the video content features to achieve a high quality intelligent surveillance service for MIoT systems.
A lot of existing research has considered video coding and transmission schemes to improve the human visual quality [10-15]. The authors in [10] proposed a joint coding and transmission scheme to provide high quality video for human consumption. First, the video coding and transmission scheme was modified to achieve a higher structural similarity index (SSIM). Then, in a server, the multiple videos were merged into one video with a wide viewing angle and high resolution to provide high quality surveillance video for human consumption. In [11], a background-model-based wireless video transmission scheme was proposed to provide high quality of experience (QoE) for users. The foreground regions (like pedestrians) were detected at the camera node. They were then uploaded to the server instead of the full video sequences to avoid the degradation of QoE caused by limited transmission resources. Considering channel fading and shadowing, [13-15] employed the Gilbert-Elliott model to estimate the burst-loss-prone channel condition and avoid the corrupted decoding caused when bursts of noise occur. All these schemes were designed to improve the quality of service/experience to provide better viewing experiences for clients. However, none of them are suitable for a surveillance network with the purpose of video analysis.
To improve the service quality of video analytics in surveillance networks, plenty of researchers have worked on improving the performance of video analysis schemes in the computer vision field. Detection, recognition and tracking of humans/vehicles are the main analysis tasks of smart surveillance systems [16-20]. In [21], an object detection algorithm was proposed based on discriminatively trained part models. In [22], an object detection scheme was proposed based on region proposal networks. The Minimum Output Sum of Squared Error filter (MOSSE) is a correlation filter-based tracker, which first used a correlation filter for object comparison [23]. In [24], a kernel trick was used to improve correlation filters, which turned the linear classifier into a non-linear ridge regression classifier. Based on [24], Henriques et al. proposed a kernelized correlation filter (KCF) tracking scheme, which employs histogram of oriented gradients (HOG) features rather than gray scale features [25]. These works are representative video analysis schemes, which have in common the characteristic of high computational complexity. Therefore, it is better to assign these complex video analysis tasks to a server, which has greater computational capacity than the camera nodes.
The main challenges for the schemes which choose to upload videos and analyze them on a server are the limited transmission resources and network state fluctuations. In [26,27], video features rather than full video sequences were uploaded to the server for analysis to save transmission resources. However, the full videos are then not accessible on the server, which limits further investigation. In [28], full video sequences were uploaded to a server for analysis. However, the video analysis performance degradation caused by compression or transmission errors was not considered. Only a few researchers have studied the effect of video compression and transmission parameters on video analysis performance. In [29], a saliency-based rate control scheme was proposed for human detection with a single camera. A standard-compliant video encoding scheme was proposed to obtain better object detection performance on compressed video [30]. In [31,32], the effects of the quantization parameter (QP) and transmission conditions on human detection were considered to optimize the performance of human detection in the server. However, due to the lack of consideration of the video content, which is closely related to the analysis performance, the prediction of human detection precision is not accurate.
In a realistic surveillance network for vehicle tracking, all surveillance videos are uploaded to the server for video analysis purposes. When video sequences are uploaded via a wireless network, multiple camera nodes compete for the transmission resources, which are limited by the bandwidth. Therefore, considering the effect of video content and the video quality degradation caused by compression and transmission errors, a joint source and channel rate allocation scheme is proposed in this paper to optimize the performance of the KCF tracking scheme in the server.

The main contributions of this paper can be summarized as follows:
(1) Content-aware tracking precision prediction model. The features of the video content, which are not considered in existing prediction models, have a great impact on tracking precision. Based on the effect of video compression and video content on the KCF tracking scheme, we take the bits per pixel, luminance values and complexity of the video content as key factors in the process of HOG extraction and object matching in the KCF scheme. Then, based on the obtained relationship between these factors and the tracking precision, we establish a content-aware tracking precision prediction model by using a curve-fitting method.
(2) Tracking-precision-driven joint source and channel rate allocation scheme. Due to the limited wireless resources and the unreliability of wireless networks, it is important to design a joint source and channel rate allocation scheme to avoid the degradation of tracking precision caused by video compression or transmission errors. We optimize the rate allocation for the camera sensors by a tracking-precision-driven optimization problem. Based on our content-aware tracking precision prediction model, the optimization problem is formulated to maximize the product of the tracking precision of each surveillance video in the MIoT server under the constraint of the total transmission rate in the wireless network. To improve the reliability of data transmission and the real-time performance of the video communication, we adopt forward error correction (FEC) in the application layer.
The rest of this paper is organized as follows: Section 2 introduces the scenarios and structure of our MIoT system for vehicle tracking. Section 3 presents a content-aware tracking precision prediction model. The proposed joint source and channel rate allocation scheme is given in Section 4. The simulation results are shown in Section 5, followed by the conclusions in Section 6.

2. Scenarios and Structure of MIoT System for Vehicle Tracking

Figure 1 shows the scenario of the MIoT system. In the MIoT system, multiple mobile camera nodes are distributed and move around randomly. A mobile camera node can be a mobile sensor device like a car recorder, smart phone or unmanned aerial vehicle. Every mobile camera node can capture, encode, and upload videos to the MIoT server via a wireless network. The MIoT server analyzes the received videos with a tracking algorithm once the tracking tasks are assigned. Due to the mobility and random distribution of camera nodes, wireless wide area networks (WWANs) are employed for video transmission. In a WWAN, mobile nodes in a large geographic area can establish radio frequency links with the base station, and the base station establishes global connectivity to the backbone core network [33].

Figure 1. Scenario of the MIoT system.

The proposed structure of the MIoT system for vehicle tracking, which consists of three main components, namely a mobile camera node, a wireless network and an MIoT server, is shown in Figure 2.
Mobile camera node: This can be a mobile sensor device such as a car recorder, smart phone, or unmanned aerial vehicle. In every group of pictures (GOP) period, the camera node has two tasks: (1) the camera node calculates and uploads the video content features of the current uncoded GOP to a rate allocation module in the server; (2) the camera node encodes and uploads the GOP according to the rate allocation policy received from the MIoT server. The H.264 encoder and the Raptor [34] FEC encoder are applied for video coding in each camera node.
Wireless network: A wireless wide area network (WWAN) is chosen for video transmission due to the random distribution and mobility of camera nodes. When uploading videos to the server via a WWAN, multiple camera nodes compete for transmission resources, which are limited by the bandwidth. Due to the inherent transmission characteristics of wireless networks, packet loss occurs in the transmission network. In this paper, we assume that the packet loss rate of each camera node is perfectly known by the MIoT server.
MIoT server: In the MIoT server, a joint source and channel rate allocation module optimizes the source and FEC rates for each camera node according to the content features and the current channel packet loss rates of the GOPs in all camera nodes. Then the target source and redundancy rates of all camera nodes are sent back to each camera node for current GOP encoding. In addition, the MIoT server stores or performs vehicle tracking on the received videos according to the actual demand. Since the tracking tasks are usually assigned with a delay in realistic scenarios, there is no real-time feedback about the tracking results in the MIoT server.
The FEC encoder shown in Figure 2 is an application layer FEC encoder. Application layer forward error correction solutions provide a straightforward and powerful means to overcome packet loss [35]. At the camera node, for every K source packets of the video stream, (N − K) redundant data packets are produced by the FEC encoder and are sent with the source packets of the source block. As long as the receiver, which is the MIoT server in our structure, receives at least any K of the N packets, it can recover all the source packets [36].

Figure 2. Structure of the MIoT system for vehicle tracking.
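To make the application layer FEC behaviour concrete, the following sketch estimates the probability that a source block survives transmission under the ideal "any K of N" recovery rule described above. It is an illustrative Monte Carlo check only, not the Raptor decoder used in the system: packet losses are assumed independent with rate p, and the function and parameter names are ours.

```python
import random

def gop_recovery_rate(k: int, n: int, p: float, trials: int = 100_000) -> float:
    """Estimate the probability that a source block of k source packets,
    expanded to n FEC packets, is fully recoverable when each packet is
    lost independently with probability p (ideal 'any k of n' decoder)."""
    recovered = 0
    for _ in range(trials):
        received = sum(1 for _ in range(n) if random.random() > p)
        if received >= k:          # at least k of the n packets arrived
            recovered += 1
    return recovered / trials

if __name__ == "__main__":
    # Example: 20 source packets, 2 redundant packets, 1% packet loss.
    print(gop_recovery_rate(k=20, n=22, p=0.01))
```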

3. Content-Aware Tracking Precision Prediction Model

3.1. Factors that Impact the KCF Tracking Scheme

The KCF tracking scheme was proposed by Henriques et al. in 2015 [21]. Since KCF is a well-accepted, robust, and computationally efficient scheme, it is adopted as the vehicle tracking scheme in the MIoT server.
As shown in Figure 3, the KCF tracking scheme includes two important processes, HOG extraction and object matching. In the HOG extraction module, HOG features of the reference frames and the current frame are extracted to represent the shape of objects. The HOG features are calculated by the orientation and magnitude of the intensity gradient of pixels, which can be modified by the compression rate and video lighting conditions. In the object matching module, the HOG features in the reference frames are trained to get the kernelized correlation filter. Then the HOG features of the current frame are correlated with the filter to determine the object position in the current frame. The correlation responses are based on the shapes of the target object and its surroundings in the frames, which are determined by the video content complexity. Therefore, HOG extraction and object matching are the two modules which are sensitive to video compression and video content.

Figure 3. Factors impact on HOG extraction and object matching in KCF scheme.

3.2. Model Features Extraction and Analysis

3.2.1. Bits per Pixel

Since HOG features can be affected by the artifacts created by the video encoder at different compression rates [29], we employ the bits per pixel to represent the effect of video compression on HOG computation. Videos with different resolutions require different coding rates to provide high video quality. Due to the variety of resolutions of mobile camera devices, bits per pixel instead of bit rate is employed to maintain the quality of videos with different resolutions. Bits per pixel (Bpp) can be calculated as follows:

Bpp = r / (fps · Npx),   (1)

where r, fps and Npx are the coding bit rate, frame rate and the number of pixels in a frame, respectively.

3.2.2. Video Luminance Level

When a camera is capturing video, the luminance value of pixels can be greatly affected by the lighting conditions at that time. Under weak lighting conditions, cameras are prone to lose detailed information of objects because of the lack of reflected light from objects and the interference of noise, and HOG features are affected consequently. We employ the luminance level (L) to represent the effect of lighting conditions on HOG computation.


We consider two scenarios: the open field and the covered field. The open field frame is a frame which has scenes of a vehicle driving in an open field, such as Figure 4a,b. In Figure 4a, vehicles are driving in an open field with strong lighting conditions during daytime, and all frames have similar and high luminance levels. In Figure 4b, vehicles are driving at night, and all frames in this video sequence maintain a relatively stable low luminance level. In an open field frame, most parts of the frame have similar luminance levels. The covered field frame is a frame which has scenes of a vehicle driving in a field covered with a roof or canopy in daytime, such as Figure 4c. In Figure 4c, although the videos are captured during daytime, the luminance levels of the in-tunnel regions and out-of-tunnel regions are different. Therefore, in a covered field frame, the largest region has a low luminance level because of the roof or canopy and the other uncovered regions have high luminance levels.


Figure 4. Examples of different video scenarios: (a) Open field in daytime; (b) Open field at night; (c) Covered field in daytime.

The open and covered field frames are classified by the statistical results of the luminance of the pixels in a frame. The histogram of the luminance values of pixels in a frame is obtained as shown in Figure 5. The luminance values are sorted into five intervals. The quantitative values of each interval are denoted as H = [25 75 125 175 225]. The numbers of pixels in all intervals are set as S. Then, as shown in Figure 5, the histogram is re-ordered by the descending order of the numbers of pixels S̃, and the corresponding quantitative values are set as H̃. To exclude the singular pixels of a frame, only the first three elements of S̃ and H̃ are used in the frame classification method and the luminance level calculation method. The frame f can be classified based on the reordered luminance histogram as follows:

f ∈ C if H̃1 < TL1 and H̃2 > TL2;  f ∈ O otherwise,   (2)

where C and O indicate the sets of covered and open field frames, respectively. TL1 is the luminance threshold for dark regions and TL2 is the luminance threshold for bright regions. The first two intervals [0, 50] and (50, 100] are considered dark, and the last two intervals (150, 200] and (200, 255] are considered bright. Hence, TL1 and TL2 are set to 100 and 150, respectively. Although the fixed thresholds TL1 and TL2 are employed to improve the operation speed, this method can be improved by employing dynamic thresholds instead of fixed thresholds. In different scenarios, the division of bright and dark regions can be different, so dynamic thresholds can better adapt to changes of scenarios and provide more accurate classification results of bright and dark regions. Hence, the interference regions can be excluded precisely.
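The classification rule of Equation (2) can be sketched in a few lines of Python. This is our own illustrative implementation, not code from the paper: it assumes an 8-bit grayscale frame stored as a NumPy array, uses the fixed thresholds TL1 = 100 and TL2 = 150 given above, and the function names are ours.

```python
import numpy as np

T_L1, T_L2 = 100, 150                          # dark / bright thresholds from the text
H = np.array([25, 75, 125, 175, 225])          # quantized values of the five intervals

def luminance_histogram(gray_frame: np.ndarray):
    """Count pixels in the five luminance intervals and return
    (H_tilde, S_tilde) ordered by descending pixel count."""
    bins = [0, 50, 100, 150, 200, 256]
    S, _ = np.histogram(gray_frame, bins=bins)
    order = np.argsort(S)[::-1]                # descending pixel count
    return H[order], S[order]

def is_covered_field(gray_frame: np.ndarray) -> bool:
    """Equation (2): covered field if the dominant interval is dark
    while the second interval is bright."""
    H_t, S_t = luminance_histogram(gray_frame)
    return (H_t[0] < T_L1) and (H_t[1] > T_L2)
```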


Figure 5. The histogram of pixel luminance values and the reordered histogram.

Then the luminance level of the frame can be obtained as follows:

l(f) = [ Σ_{i=1}^{3} u(S̃i)·H̃i·S̃i ] / [ Σ_{i=1}^{3} u(S̃i)·S̃i ]   if f ∈ C,
l(f) = [ Σ_{i=1}^{3} H̃i·S̃i ] / [ Σ_{i=1}^{3} S̃i ]   if f ∈ O,   (3)

where:

u(x) = 1 if x ≤ TL1;  u(x) = 0 if x > TL1.   (4)

As in Equation (3), in the covered field frames C, the bright regions are excluded when calculating the luminance level, since the vehicles are actually in the dark region with a low luminance level. In the open field frames O, all regions represented by H̃1, H̃2 and H̃3 are considered.
Based on the luminance levels of the classified frames, the video scenario can be detected and the video luminance level is obtained consequently. According to the previous description of the scenarios, a video that has several consecutive covered field frames is regarded as a covered field scenario. Therefore, a video with more than five consecutive covered frames is considered a covered field video. The other videos are considered open field videos.
Due to the error propagation caused by the tracking scheme, different luminance level calculation methods are designed for the different scenarios. In the tracking scheme, once a wrong target is determined in a frame due to the loss of HOG information caused by a low luminance level, the right target is hard to choose in all following frames, even when the following frames have high luminance levels. Therefore, the open field frames are excluded when calculating the luminance level of a video that has daytime covered field scenarios, as shown in Equation (5):

L = (1/Nc) Σ_{f∈C} ( ω(f) · l(f) ).   (5)

The other videos, which have open field scenarios in daytime or at night, use a different luminance level calculation method, that is:

L = (1/N) Σ_{f∈C∪O} ( ω(f) · l(f) ),   (6)


where l(f) is the luminance level of the f-th frame, N is the total number of frames in the video sequence, and Nc is the total number of covered field frames in the video sequence. Here, ω(f) is the weight of the luminance level of the f-th frame in the video, which can be obtained by Equation (7):

ω(f) = (N − f + 1) / N,   (7)

where ω(f) is determined by the number of frames influenced by the f-th frame; in other words, by the number of frames following the current frame.
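A compact sketch of the luminance level calculation of Equations (3)-(7) is given below. It reuses luminance_histogram and is_covered_field from the classification sketch above; the frame-list handling, the 1-based frame index for the weight of Equation (7), and all names are our own assumptions rather than the paper's code.

```python
import numpy as np

def frame_luminance_level(gray_frame, covered: bool, T_L1=100):
    """Equations (3)-(4): weighted mean of the top-3 histogram bins,
    dropping bright bins (u = 0) for covered field frames."""
    H_t, S_t = luminance_histogram(gray_frame)          # from the previous sketch
    H3, S3 = H_t[:3].astype(float), S_t[:3].astype(float)
    if covered:
        u = (H3 <= T_L1).astype(float)                   # exclude bright regions
        num, den = np.sum(u * H3 * S3), np.sum(u * S3)
    else:
        num, den = np.sum(H3 * S3), np.sum(S3)
    return num / den if den > 0 else 0.0

def video_luminance_level(gray_frames):
    """Equations (5)-(7): covered-field videos average only covered frames."""
    N = len(gray_frames)
    covered_flags = [is_covered_field(f) for f in gray_frames]
    # more than five consecutive covered frames -> covered field video
    run, covered_video = 0, False
    for c in covered_flags:
        run = run + 1 if c else 0
        covered_video |= run > 5
    weights = [(N - f + 1) / N for f in range(1, N + 1)]  # Equation (7), 1-based f
    if covered_video:
        terms = [w * frame_luminance_level(fr, True)
                 for fr, c, w in zip(gray_frames, covered_flags, weights) if c]
        return sum(terms) / max(sum(covered_flags), 1)    # Equation (5)
    terms = [w * frame_luminance_level(fr, c)
             for fr, c, w in zip(gray_frames, covered_flags, weights)]
    return sum(terms) / N                                  # Equation (6)
```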

3.2.3. Video Adjacent Block Difference

In the KCF tracking scheme, the correlation results from the kernelized correlation filter depend on the shapes of objects in the frame. In a video with complex content, the target object is surrounded by other background objects and is hard to separate from its surroundings, which decreases the matching precision. We employ the video adjacent block difference (V) to represent the effect of video content complexity on tracking precision. The adjacent block difference of a video sequence is defined as the maximum frame adjacent block difference, as follows:

V = max( v(f) ),   (8)

where v(f) represents the frame adjacent block difference of the f-th frame, defined as:

v(f) = Nclst(f) / Nfmb(f),   (9)

where Nfmb(f) is the total number of foreground MBs in the f-th frame, and Nclst(f) is the total cluster number of the f-th frame, which can be obtained based on the cluster results of the foreground macroblocks (MBs) in the f-th frame. Here, the foreground MBs refer to the MBs which have more edges than other MBs in the frame. As the analysis in Section 3.1 shows, the target is tracked based on the correlation results of the HOG features of the target and its surroundings. Complex HOG features have a higher probability to affect the tracking precision. Therefore, the MBs with relatively complex HOG are detected by the gradients of pixels in the following. Since these complex HOG features are almost always caused by edges of objects, we call the corresponding MBs foreground MBs.
The foreground MBs are detected based on the magnitudes of the intensity gradients of all pixels in an image. The frame is divided into non-overlapping 16 × 16 MBs. Let G(x, y) be the average magnitude of the intensity gradients of all pixels in MB(x, y). Gmin is the minimum of G(x, y), and Gmean is the mean magnitude of the intensity gradients in a frame. Then the foreground MBs are detected based on the gradient magnitudes in the frame. Objects such as vehicles, trees and pedestrians always have more edges than background regions like road and sky. We define the foreground set Ω as:

MB(x, y) ∈ Ω if G(x, y) > Tm;  MB(x, y) ∉ Ω otherwise,   (10)

where:

Tm = (Gmean + Gmin) / 2.   (11)

Then, Nfmb is defined as the total number of MBs in Ω. A sample frame and its foreground detection result are shown in Figure 6, where the white blocks are foreground MBs and the black blocks are background MBs.
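The foreground detection of Equations (10)-(11) can be sketched as follows. This is an illustrative implementation under our own assumptions: gradients are taken with NumPy's np.gradient, frames are 8-bit grayscale arrays whose sides are multiples of 16, and the function name is ours.

```python
import numpy as np

def foreground_mask(gray_frame: np.ndarray, mb_size: int = 16) -> np.ndarray:
    """Equations (10)-(11): mark a 16x16 macroblock as foreground when its
    mean gradient magnitude exceeds Tm = (Gmean + Gmin) / 2."""
    gray = gray_frame.astype(float)
    gy, gx = np.gradient(gray)                     # intensity gradients
    mag = np.sqrt(gx ** 2 + gy ** 2)
    h, w = gray.shape
    rows, cols = h // mb_size, w // mb_size
    G = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = mag[r * mb_size:(r + 1) * mb_size, c * mb_size:(c + 1) * mb_size]
            G[r, c] = block.mean()                 # average gradient magnitude of MB(x, y)
    Tm = (G.mean() + G.min()) / 2.0                # Equation (11)
    return G > Tm                                  # True = foreground MB (set Omega)
```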


Figure 6. Sample of foreground detection result: (a) Sample frame; (b) Foreground detection result of the sample frame.

The total cluster number Nclst is then obtained based on clustering of the foreground MBs. As shown in Figure 7, each foreground MB is compared with its clustered adjacent foreground MBs based on the average block luminance values. MBs that have similar average luminance levels are clustered. Due to the clustering routine shown in Figure 7, only the top or left foreground MB may be available for clustering the current MB. Let c(x, y) and I(x, y) be the cluster number and average luminance value of MB(x, y), respectively. The cluster number of the first foreground MB in the frame is set to 1. Then the cluster number of foreground MB(x, y) is obtained as follows:

c(x, y) = c(x − 1, y)   if Δ1 < TΔ and Δ1 ≤ Δ2,
c(x, y) = c(x, y − 1)   if Δ2 < TΔ and Δ2 < Δ1,
c(x, y) = max(c) + 1   otherwise,   (12)

where max(c) is the maximum cluster number when MB(x, y) is being clustered, and max(c) + 1 indicates that MB(x, y) is considered as a new cluster. The parameter TΔ is the threshold of similar adjacent MBs, which is set to 20 in this paper. Δ1 and Δ2 are defined as:

Δ1 = | I(x, y) − I(x − 1, y) |,
Δ2 = | I(x, y) − I(x, y − 1) |.   (13)

Then, Nclst is defined as the maximum c in the frame. In one cluster, the magnitudes of gradients between adjacent MBs are prone to be small, which makes object matching difficult. Thus, a smaller Nclst indicates more complex content.

Figure 7. The cluster routine and available adjacent macroblocks.
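The clustering rule of Equation (12) and the per-frame measure of Equations (8)-(9) can be sketched as below. The sketch assumes a per-macroblock mean-luminance array and the foreground mask from the previous sketch as inputs; the tie-breaking between equally similar top/left neighbours and all names are our own choices, not the paper's implementation.

```python
import numpy as np

T_DELTA = 20   # similarity threshold for adjacent macroblock luminance (from the text)

def frame_block_difference(mean_lum: np.ndarray, fg: np.ndarray) -> float:
    """Equations (12), (9): cluster foreground MBs by similar mean luminance
    (comparing each MB with its top/left foreground neighbours) and return
    v(f) = Nclst / Nfmb for one frame."""
    rows, cols = fg.shape
    cluster = np.zeros((rows, cols), dtype=int)    # 0 = background / unclustered
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if not fg[r, c]:
                continue
            cands = []
            if c > 0 and fg[r, c - 1]:             # left neighbour
                cands.append((abs(mean_lum[r, c] - mean_lum[r, c - 1]), cluster[r, c - 1]))
            if r > 0 and fg[r - 1, c]:             # top neighbour
                cands.append((abs(mean_lum[r, c] - mean_lum[r - 1, c]), cluster[r - 1, c]))
            cands = [x for x in cands if x[0] < T_DELTA]
            if cands:
                cluster[r, c] = min(cands)[1]      # merge with the most similar neighbour
            else:
                next_id += 1                       # max(c) + 1: start a new cluster
                cluster[r, c] = next_id
    n_fmb = int(fg.sum())
    n_clst = int(cluster.max())
    return n_clst / n_fmb if n_fmb else 0.0

def video_block_difference(v_values):
    """Equation (8): V is the maximum frame adjacent block difference."""
    return max(v_values)
```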

3.3. Model Establishment

We select six videos which take vehicles as the target objects from the Online Object Tracking Benchmark (OOTB) [37]. Among them, BlurCar1 is divided into two video sequences: BlurCar1(1) is a video sequence clipped from the 1st frame to the 320th frame of BlurCar1, and BlurCar1(2) contains the 390th frame to the 709th frame of BlurCar1. Each video is coded at different bit rates using the H.264 coder, as shown in Table 1.


The target bit rate of each frame within the GOP is assigned using the basic variable bit rate mechanism in the H.264 coder. The frame rate is 30 fps, and the Group of Pictures (GOP) structure and size are IPPP and 16, respectively. The tracking precision of each coded video sequence is obtained by the KCF tracking scheme and is shown in Figure 8. Tracking results are compared with the ground truth: in each frame, if the distance between the center of the tracking result and the center of the ground truth is less than 20, the tracking result is considered correct [25]. The tracking precision is the number of correct frames divided by the number of all frames.

Table 1. The encoded bitrate of videos.

Video | Resolution | Bitrate (kb/s)
BlurCar1(1) | 640 × 480 | 32, 64, 128, 256, 512, 768
BlurCar3 | 640 × 480 | 32, 64, 128, 256, 512, 768
BlurCar4 | 640 × 480 | 32, 64, 128, 256, 512, 768
Car2 | 320 × 240 | 32, 64, 128, 256, 512
Car4 | 360 × 240 | 16, 32, 64, 128, 256, 512
CarDark | 320 × 240 | 16, 32, 64, 128, 256, 512

We use Bpp as our main tracking precision prediction parameter. According to the relationship between tracking precision and Bpp shown in Figure 8, the tracking precision can be formulated as:

P(Bpp) = 1 / ( 1 + α·e^(−β·√Bpp) ),   (14)

where P is the tracking precision, and α, β are empirical parameters. We can determine the shape of the prediction function according to α and β. As interpreted in Figure 8, the shape of the prediction function varies according to different video content. Therefore, α and β are estimated by L and V of the video as follows:

α = a1 · L^a2 · V^a3,
β = min{ a4 · (a5·L + a6·V)^a7, a8 }.   (15)

The finally obtained model coefficients are shown in Table 2.

Figure 8. Tracking precision of each video sequence: (a) BlurCar1(1); (b) BlurCar3; (c) BlurCar4; (d) Car2; (e) Car4; (f) CarDark.

Table 2. Model coefficients.

a1 = 1.049 × 10^−15 | a2 = 12.661 | a3 = 7.034 | a4 = 200.560
a5 = 0.010 | a6 = 0.900 | a7 = 6.150 | a8 = 137.000
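Putting Equations (1), (14) and (15) together with the Table 2 coefficients gives the following small prediction sketch. It is our own illustration of the model, and the example values of L and V at the bottom are placeholders rather than measurements from the paper.

```python
import math

# Table 2 coefficients
A = dict(a1=1.049e-15, a2=12.661, a3=7.034, a4=200.560,
         a5=0.010, a6=0.900, a7=6.150, a8=137.000)

def bits_per_pixel(rate_bps: float, fps: float, n_pixels: int) -> float:
    """Equation (1): Bpp = r / (fps * Npx)."""
    return rate_bps / (fps * n_pixels)

def model_params(L: float, V: float):
    """Equation (15): alpha and beta from luminance level L and block difference V."""
    alpha = A['a1'] * (L ** A['a2']) * (V ** A['a3'])
    beta = min(A['a4'] * (A['a5'] * L + A['a6'] * V) ** A['a7'], A['a8'])
    return alpha, beta

def predicted_precision(bpp: float, L: float, V: float) -> float:
    """Equation (14): P(Bpp) = 1 / (1 + alpha * exp(-beta * sqrt(Bpp)))."""
    alpha, beta = model_params(L, V)
    return 1.0 / (1.0 + alpha * math.exp(-beta * math.sqrt(bpp)))

if __name__ == "__main__":
    # Illustrative values only; not taken from the paper.
    bpp = bits_per_pixel(256_000, 30, 640 * 480)
    print(predicted_precision(bpp, L=120.0, V=0.3))
```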

4. Proposed Joint Source and Channel Rate Allocation Scheme

In the MIoT system, multiple mobile camera nodes have to compete for the limited wireless transmission data rate. Due to the packet loss in the wireless network transmission, FEC needs to be used for source data protection. Therefore, we develop a joint source and channel rate allocation scheme for both video and FEC rate allocation, with the objective of maximizing the product of the tracking precision of each uploaded GOP.
Let M be the number of mobile camera nodes in the system, and let k and n be M × 1 vectors. The element km in k represents the number of source packets of mobile camera node m. The element nm in n represents the number of FEC packets of mobile camera node m, which includes the source packets and the protection packets. The proposed scheme allocates the appropriate source and FEC packets for the GOP in each camera node, such that the product of the tracking precision of all GOPs is maximized. The joint source and channel rate allocation scheme can be formulated as follows:

max_{n,k} ∏_{m=1}^{M} Pm( km·S / (NGOP·Npx) ) · f(km, nm; pm)
s.t. Σ_{m=1}^{M} nm·S·fps / NGOP ≤ Rtotal,
     Pm( km·S / (NGOP·Npx) ) ≥ Pmin ∀m,
     nm ≥ km ∀m,   (16)

where pm is the packet loss rate of camera node m, S is the packet size, NGOP is the GOP size, Rtotal is the total available rate, and Pmin is the minimum acceptable precision. Pm(·) is the tracking precision prediction model established in Section 3, and f(·) is the FEC packet correction rate of camera node m in the application layer [32]:

f(k, n; p) = Φ( (n − k − n·p) / √(n·p·(1 − p)) ) = (1/√(2π)) ∫_{−∞}^{(n−k−np)/√(np(1−p))} e^(−t²/2) dt,   (17)

where Φ(·) is the cumulative distribution function of the standard normal distribution. The first constraint indicates that the sum of the uploaded data in a GOP time period is limited by the total data rate. The second constraint implies that the tracking precision of each camera node should meet the predefined minimum requirement. The last constraint means that the number of source packets of camera node m cannot be greater than the number of FEC packets of that camera node.
To solve the optimization problem, the logarithm of the objective function is employed, and km and nm are substituted by k̃m = √km and ñm = √nm. Then the original problem can be reformulated as:

max_{ñ,k̃} Σ_{m=1}^{M} [ log Pm( k̃m²·S / (NGOP·Npx) ) + log f( k̃m², ñm²; pm ) ]
s.t. Σ_{m=1}^{M} ñm²·S·fps / NGOP ≤ Rtotal,
     Pm( k̃m²·S / (NGOP·Npx) ) ≥ Pmin ∀m,
     ñm ≥ k̃m ∀m.   (18)


The reformulated problem is a convex optimization problem and can be solved by convex optimization tools such as the CVX toolbox [38]. The optimal solution k* and n* gives the source packets and the FEC packets for each camera node. Although the numbers of packets should be integers in reality, the elements in k* and n* are not required to be integers when solving the problem. The reason is that the target source coding rate and FEC coding rate of the GOP, which are converted from k* and n*, are allowed to differ slightly from the actual coding rate.
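The sketch below illustrates one way to evaluate Equation (17) and to solve the relaxed problem (18) numerically. The paper uses the CVX toolbox; here we instead use SciPy's SLSQP solver as a stand-in, reuse the predicted_precision model from the Section 3 sketch, and choose all parameter values, bounds and starting points ourselves, so this is an illustrative approximation rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fec_correct_rate(k, n, p):
    """Equation (17): f(k, n; p) = Phi((n - k - n*p) / sqrt(n*p*(1 - p)))."""
    return norm.cdf((n - k - n * p) / np.sqrt(n * p * (1.0 - p)))

def allocate_rates(S, N_gop, fps, N_px, p, L, V, R_total, P_min=0.5):
    """Numerical sketch of the relaxed problem (18) for M camera nodes.
    Decision variables are k_tilde = sqrt(k_m) and n_tilde = sqrt(n_m)."""
    M = len(p)
    split = lambda x: (x[:M], x[M:])

    def precision(kt_m, m):
        bpp = kt_m ** 2 * S / (N_gop * N_px[m])            # source bits per pixel
        return predicted_precision(bpp, L[m], V[m])        # model sketch from Section 3

    def neg_objective(x):
        kt, nt = split(x)
        total = 0.0
        for m in range(M):
            P = max(precision(kt[m], m), 1e-12)
            f = max(fec_correct_rate(kt[m] ** 2, nt[m] ** 2, p[m]), 1e-12)
            total += np.log(P) + np.log(f)
        return -total

    constraints = [
        # total rate: sum_m n_m * S * fps / N_GOP <= R_total
        {'type': 'ineq',
         'fun': lambda x: R_total - np.sum(split(x)[1] ** 2) * S * fps / N_gop},
        # per-camera minimum tracking precision
        {'type': 'ineq',
         'fun': lambda x: np.array([precision(split(x)[0][m], m) - P_min
                                    for m in range(M)])},
        # n_tilde >= k_tilde (FEC block at least as large as the source block)
        {'type': 'ineq', 'fun': lambda x: split(x)[1] - split(x)[0]},
    ]
    x0 = np.full(2 * M, 3.0)                               # small positive starting point
    res = minimize(neg_objective, x0, method='SLSQP',
                   constraints=constraints, bounds=[(1.0, None)] * (2 * M))
    kt, nt = split(res.x)
    return kt ** 2, nt ** 2                                # relaxed k*, n* per camera node
```

The returned relaxed k* and n* would then be converted into the target source and FEC coding rates of each GOP, as described above.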

5. Simulation and Performance Comparison

5.1. Simulation Settings

We simulate a scenario where six mobile camera nodes capture different scenes and upload videos to a server via lossy wireless networks. The six mobile camera nodes have to compete for the limited transmission resource, which in our simulation is represented by the total transmission bitrate. The channel condition of each camera node is represented by its packet loss rate. Each camera node extracts the two content features (L and V) of the current raw GOP and sends these features to the server. The server jointly optimizes the source coding rate and FEC rate for all camera nodes based on these features and the packet loss rates of all camera nodes. Then the optimal source coding rates and FEC rates are fed back to each camera node for current GOP coding. The encoded GOPs of the six camera nodes are uploaded to the server, and the next GOPs are processed in the same way. Finally, we evaluate the tracking performance on the uploaded GOPs. In our simulation, we selected six video sequences from the OOTB [37] and Visual Object Tracking (VOT) [39] data sets as the captured videos of the six camera nodes; the video information is shown in Table 3. Here, BlurCar1(2) has the 390th to 709th frames of BlurCar1, Car1(1) has the 1st to 320th frames of Car1, and Car1(2) has the 410th to 729th frames of Car1. The sample frames of the six test videos are shown in Figure 9. The FFMPEG implementation is employed as the H.264 video encoder. The frame rate and GOP size are 30 fps and 16, respectively. The rate assigned to a GOP by the different mechanisms is then allocated to the frames within the GOP using the basic variable bit rate mechanism in the H.264 coder. The GOP structure is IPPP. Each test video has 19 GOPs. The total transmission bit rates change from 400 kb/s to 1400 kb/s, and the change step is 200 kb/s. The content information of each video is shown in Table 3. The packet loss rate is set as 1% or 3% for different videos, and the packet loss rates of GOPs within a video are fixed for comparison purposes.
The content information of each is shown in Table The packet loss rate 1400 kb/s, and the change step is 200 kb/s. video The content information of3. each video is shown in is Table Thefor packet loss rate is set and as 1%the or 3% for different videos, and the packet loss ratesare of GOPs set as 1% or3.3% different videos, packet loss rates of GOPs within the video fixed for within purposes. the video are fixed for comparison purposes. comparison


Figure 9. Sample frames of test video sequences: (a) BlurCar1(2); (b) Wiper; (c) Tunnel; (d) Car1(1); (e) Car1(2); (f) Car24.


Table 3. Test video sequences.

Video | Resolution | Lighting Condition | Content Complexity
BlurCar1(2) | 640 × 480 | Medium | Medium
Wiper | 640 × 480 | Low | High
Tunnel | 640 × 360 | Low | Medium
Car1(1) | 320 × 240 | High | Medium
Car1(2) | 320 × 240 | High | Low
Car24 | 320 × 240 | High | Low

We compare our proposed scheme with three schemes: the mean square error (MSE) driven rate allocation scheme, QoE driven rate allocation scheme and constant bitrate control (CBR) scheme. In CBR scheme, the total transmission rate is equally allocated to each camera node, and the packet correct rate of each node is set to be 0.995. For the MSE-driven and QoE-driven schemes, we adopt a classical rate-distortion model in [40] and a QoE model in [41]. The corresponding rate allocation problem is expressed as:   M min ∑ Qm N km ··SNpx n,k m=1 M

GOP

s.t. ∑ nm · S · f ps /NGOP ≤ Rtotal m =1 k m ·S NGOP · Npx

,

(19)

≥ Rmin ∀m

5.2. Simulation Results and Discussion

We set a fixed total data rate of 800 kb/s for the proposed tracking-precision-driven (TP-driven) scheme, the MSE-driven scheme, the QoE-driven scheme and the CBR scheme. The packet loss rates of all camera nodes are set to 1%. Figure 10 shows the source data rate of each GOP. The proposed TP-driven scheme allocates lower data rates to GOPs with a high luminance level and a high adjacent block difference (such as the GOPs in Car1(2) and Car24) in order to save bit rate. For GOPs with a low luminance level or a low adjacent block difference, the proposed TP-driven scheme allocates higher data rates to improve video quality for the vehicle tracking task (the 5th to 12th GOPs in Wiper and the 15th to 18th GOPs in BlurCar1(2)). The MSE-driven scheme allocates more data rate to GOPs with more objects or details; for example, Car1(1) and Car1(2) are allocated higher rates than Car24. However, the MSE-driven scheme cannot distinguish GOPs by video content, and therefore the GOPs in BlurCar1(2) and Car1(2) are allocated similar data rates even though they have different degrees of tracking difficulty. The QoE model classifies GOPs into three content types, 'slight movement', 'gentle walking' and 'rapid movement'; the QoE-driven scheme then allocates more data rate to GOPs of the 'rapid movement' type and less to GOPs of the 'slight movement' type. Therefore, compared with the MSE-driven and TP-driven schemes, the QoE-driven scheme allocates more data rate to BlurCar1(2) because it is classified as 'rapid movement'.


Figure 10. Source coding rate of all GOPs in each video under an 800 kb/s total rate constraint. The packet loss rate of each video is 1%: (a) BlurCar1(2); (b) Wiper; (c) Tunnel; (d) Car1(1); (e) Car1(2); (f) Car24.

Next, we conduct tracking on these video sequences using the KCF scheme. The number of correctly tracked frames in each GOP is illustrated in Figure 11.
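A minimal sketch of this tracking step is given below, using the KCF implementation in OpenCV (it requires the opencv-contrib-python package); the `track_gop` helper, the frame list and the bounding-box format are assumptions for illustration, not the authors' exact pipeline.

```python
import cv2

def track_gop(frames, init_box):
    """Run the KCF tracker over the decoded frames of one GOP.

    `frames` is a list of BGR images and `init_box` is the (x, y, w, h)
    ground-truth box of the first frame; frames where the tracker reports
    failure are marked with None.
    """
    tracker = cv2.TrackerKCF_create()
    tracker.init(frames[0], init_box)
    boxes = [init_box]
    for frame in frames[1:]:
        ok, box = tracker.update(frame)
        boxes.append(box if ok else None)
    return boxes
```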


Figure 11. Tracking results of all GOPs in each video under an 800 kb/s total rate constraint. The packet loss rate of each video is 1%: (a) BlurCar1(2); (b) Wiper; (c) Tunnel; (d) Car1(1); (e) Car1(2); (f) Car24.

Here, we assume that the ground truth of the first frame in each video sequence is assigned by the server. From Figure 11, we can see that our proposed TP-driven scheme allocates the bit rates more efficiently. For instance, in the Wiper sequence, the KCF tracks the correct target in the GOPs encoded under the proposed scheme, while it loses the target in the GOPs encoded under the MSE-driven, QoE-driven and CBR schemes. In the BlurCar1(2) video, the proposed scheme achieves higher tracking precision than the MSE-driven and CBR schemes, because it considers the content impact on tracking and allocates more data rate to these GOPs. For Car24, all schemes achieve the same tracking precision, but the proposed TP-driven scheme saves more bit rate. Although the proposed scheme allocates less rate to Car1(1) and sacrifices tracking performance in its GOPs, the overall performance over the six test videos is better than that of the MSE-driven, QoE-driven and CBR schemes.
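The per-GOP scores reported here can be reproduced with a simple centre-location-error criterion; the 20-pixel threshold below follows the usual OTB convention and is our assumption, since the exact threshold is not restated in this section.

```python
import numpy as np

def center_error(box, gt):
    """Euclidean distance between the centres of two (x, y, w, h) boxes."""
    cx, cy = box[0] + box[2] / 2.0, box[1] + box[3] / 2.0
    gx, gy = gt[0] + gt[2] / 2.0, gt[1] + gt[3] / 2.0
    return float(np.hypot(cx - gx, cy - gy))

def gop_precision(pred_boxes, gt_boxes, thr=20.0):
    """Fraction of correctly tracked frames in a GOP (lost frames count as wrong)."""
    correct = sum(1 for p, g in zip(pred_boxes, gt_boxes)
                  if p is not None and center_error(p, g) <= thr)
    return correct / len(gt_boxes)

# The overall comparison metric is the product of per-video precisions, e.g.:
# overall = float(np.prod([0.95, 0.90, 0.88, 0.92, 0.97, 1.00]))
```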

The performance of the proposed TP-driven, MSE-driven, QoE-driven and CBR schemes is then tested under different total rate constraints. Figure 12 shows the product of the tracking precisions of all video sequences, and Figure 13 shows the detailed tracking precision of each video, under the different total rate constraints. The packet loss rates are set to 1% for all mobile cameras. The proposed TP-driven scheme performs better when the total rate is limited. For example, as shown in Figure 13, when the total rate is 800 kb/s the proposed scheme achieves higher tracking precision for BlurCar1(2) and Wiper.

For Tunnel, Car1(2) and Car24, the four schemes show little difference in tracking precision, and for Car1(1) the proposed scheme achieves lower tracking precision than the other schemes. However, the products of the tracking precisions of all video sequences under the proposed TP-driven, MSE-driven, QoE-driven and CBR schemes are 0.92, 0.28, 0.29 and 0.31, respectively. As shown in Figure 12, the overall tracking precision of the MSE-driven, QoE-driven and CBR schemes is clearly lower than that of the proposed scheme. Therefore, the proposed TP-driven scheme achieves better overall performance in vehicle tracking.

Figure 12. Product of tracking precision of each video under different total rate constraints. The packet loss rate of each video is 1%.

Figure 13. Tracking precision of each video under different total rate constraints. The packet loss rate of each video is 1%: (a) Total rate = 400 kb/s; (b) Total rate = 600 kb/s; (c) Total rate = 800 kb/s; (d) Total rate = 1000 kb/s; (e) Total rate = 1200 kb/s; (f) Total rate = 1400 kb/s.

We then test the FEC rate allocation by setting different packet loss rates for each video. Since in the CBR scheme the source data rates are allocated equally to all videos, only the TP-driven, MSE-driven and QoE-driven schemes are tested. In the first case, the packet loss rate of each video is 1%. Figure 14 shows the average source data rate and the average FEC redundancy rate under different total rate constraints in this case. We can see that the videos are allocated more redundancy rate as the total rate increases. The proposed TP-driven scheme allocates more redundancy rate to the videos with low luminance levels and low adjacent block differences to avoid packet losses, whereas the MSE-driven and QoE-driven schemes tend to protect the videos with more details and with rapid movement, respectively. In the second case, the packet loss rates of Tunnel and Car1(1) are set to 3% and the packet loss rates of the other videos remain 1%. Figure 15 shows the percentage of the redundancy rate in the total coding rate of Tunnel and Car1(1) in the two cases. As shown in Figure 15, all schemes allocate more redundancy to the videos with a 3% packet loss rate to protect the data packets. Figure 15 also shows that, although the redundancy rates increase as the total rate constraint increases (see Figure 14), their percentage of the total coding rate decreases, because the redundancy rates grow more slowly than the source data rates. The product of the tracking precisions in this case, shown in Figure 16, is almost the same as in the first case, because the changes in the source data rates and in the FEC packet correction rates caused by the change of packet loss rate are very slight.
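The redundancy behaviour discussed above can be illustrated with the standard binomial model for packet-level FEC; the sketch assumes an ideal (n, k) erasure code in which any k of the n transmitted packets recover the GOP, which approximates Raptor-style codes but may differ from the paper's exact f(k_m, n_m; p_m).

```python
from math import comb

def packet_correct_rate(k, n, p):
    """P(GOP recoverable): at least k of n packets survive i.i.d. loss with rate p."""
    return sum(comb(n, i) * (1 - p) ** i * p ** (n - i) for i in range(k, n + 1))

def min_redundancy(k, p, p_min=0.995, n_max=4096):
    """Smallest number of repair packets n - k meeting the reliability target p_min."""
    for n in range(k, n_max + 1):
        if packet_correct_rate(k, n, p) >= p_min:
            return n - k
    raise ValueError("reliability target not reachable within n_max")

# e.g. a GOP of 100 source packets needs more repair packets at 3% loss than at 1%:
# print(min_redundancy(100, 0.01), min_redundancy(100, 0.03))
```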

Figure 14. Average source data rates and FEC redundancy rates of each video under different total rate constraints. The packet loss rate of each video is 1%: (a) Average source data rates of the proposed scheme; (b) Average FEC redundancy rates of the proposed scheme; (c) Average source data rates of the MSE-driven scheme; (d) Average FEC redundancy rates of the MSE-driven scheme; (e) Average source data rates of the QoE-driven scheme; (f) Average FEC redundancy rates of the QoE-driven scheme.

Figure 15. Percentage of FEC redundancy rates in the total coding rates of Tunnel and Car1(1) under different rate constraints. In the first case (solid lines), the packet loss rate of each video is 1%. In the second case (broken lines), the packet loss rates of Tunnel and Car1(1) are 3%, and the packet loss rates of the other videos are 1%: (a) Percentage of FEC redundancy rates in the total coding rate of Tunnel; (b) Percentage of FEC redundancy rates in the total coding rate of Car1(1).


Figure 16. Product of tracking precision of each video under different total rate constraints. The packet loss rates of Tunnel and Car1(1) are 3%, and the packet loss rates of the other videos are 1%.

In our simulation, the GOP structure is IPPP. In fact, several B frames are allowed in a GOP, and a GOP structure such as IBBPBBP provides performance similar to IPPP. However, GOP structures with many consecutive B frames (an IBB…BP structure, for example) are not suitable for our scheme: due to the blurring and fast movement in the videos, long runs of consecutive B frames cause obvious video quality degradation and, consequently, lower tracking precision. Therefore, I or P frames need to be inserted every few B frames in the GOP to maintain a high quality of service.
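If B frames are enabled, the run length of consecutive B frames can be capped through the standard libx264 options exposed by FFMPEG; the snippet below is a hedged illustration that would replace the `-bf 0` setting used in the earlier encoding sketch.

```python
# Cap consecutive B-frames at 2 (roughly an IBBPBBP... pattern) instead of IPPP;
# "-bf" limits consecutive B-frames and "-b_strategy 0" disables adaptive placement.
x264_gop_opts = ["-c:v", "libx264", "-g", "16", "-bf", "2", "-b_strategy", "0"]
```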
6. Conclusions

In this paper, we have proposed a joint source and channel coding algorithm for vehicle tracking in MIoT systems. Our algorithm aims to provide better tracking performance at the MIoT server, which differs from previous QoS/QoE-driven schemes that aim to improve the human visual experience. We first study the effect of video content and compression on the KCF tracking scheme and establish a content-aware tracking precision prediction model, which can estimate the tracking precision of videos with different content features and coding rates. In this model, the video luminance level and the adjacent block difference are extracted to represent the effects of lighting conditions and video content complexity on tracking precision, respectively. Then, based on the precision prediction model, an optimization problem is formulated to maximize the product of the tracking precisions of the surveillance videos under the constraint of the total transmission rate in the wireless network. Considering the transmission errors in the wireless network, FEC is adopted to improve the reliability of data transmission. Compared with the traditional QoS/QoE-driven algorithms and the CBR algorithm, our joint source and channel rate allocation algorithm allocates the bit rates more efficiently under the total transmission rate constraint and achieves better tracking performance.

Author Contributions: Conceptualization and Methodology, F.L. and Y.M.; Software, L.H.; Validation and Formal Analysis, L.W.; Writing-Original Draft Preparation, Y.M. and L.H.

Funding: This research was funded by the National Science Foundation of China: 61671365, the National Science Foundation of China: 61701389, the Joint Foundation of Ministry of Education of China: 6141A020223, and the Natural Science Basic Research Plan in Shaanxi Province of China: 2018JQ6022.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. ITU-T Recommendation Y.2060. Overview of the Internet of Things. Available online: http://www.itu.int/rec/T-REC-Y.2060/en (accessed on 29 August 2018).
2. Floris, A.; Atzori, L. Managing the quality of experience in the multimedia internet of things: A layered-based approach. Sensors 2016, 16, 2057. [CrossRef] [PubMed]
3. Huang, Q.; Wang, W.; Zhang, Q. Your glasses know your diet: Dietary monitoring using electromyography sensors. IEEE Int. Things J. 2017, 4, 705–712. [CrossRef]
4. Catarinucci, L.; Donno, D.D.; Mainetti, L. An IoT-aware architecture for smart healthcare systems. IEEE Int. Things J. 2015, 2, 515–526. [CrossRef]
5. Lobaccaro, G.; Carlucci, S.; Löfström, E. A review of systems and technologies for smart homes and smart grids. Energies 2016, 9, 348. [CrossRef]
6. Choi, H.S.; Rhee, W.S. IoT-based user-driven service modeling environment for a smart space management system. Sensors 2014, 14, 22039. [CrossRef] [PubMed]
7. Masek, P.; Masek, J.; Frantik, P. A harmonized perspective on transportation management in smart cities: The novel IoT-driven environment for road traffic modeling. Sensors 2016, 16, 1872. [CrossRef] [PubMed]
8. Sutar, S.H.; Koul, R.; Suryavanshi, R. Integration of smart phone and IoT for development of smart public transportation system. In Proceedings of the IEEE International Conference on Internet of Things and Applications, Pune, India, 22–24 January 2016; pp. 73–78.
9. Kokkonis, G.; Psannis, K.E.; Roumeliotis, M. Real-time wireless multisensory smart surveillance with 3D-HEVC streams for internet-of-things (IoT). J. Supercomput. 2017, 73, 1–19. [CrossRef]
10. Wang, M.; Cheng, B.; Yuen, C. Joint coding-transmission optimization for a video surveillance system with multiple cameras. IEEE Trans. Multimedia 2018, 20, 620–633. [CrossRef]
11. Cajote, R.D.; Ruangsang, W.; Aramvith, S. Wireless video transmission over MIMO-OFDM using background modeling for video surveillance applications. In Proceedings of the IEEE International Symposium on Communications and Information Technologies, Nara, Japan, 7–9 October 2015; pp. 237–240.
12. Li, F.; Fu, S.; Liu, Z.; Qian, X. A cost-constrained video quality satisfaction study on mobile devices. IEEE Trans. Multimedia 2018, 20, 1154–1168. [CrossRef]
13. Liu, Z.; Cheung, G.; Ji, Y. Optimizing distributed source coding for interactive multiview video streaming over lossy networks. IEEE Trans. Circuits Syst. Video Technol. 2013, 23, 1781–1794. [CrossRef]
14. Feng, J.; Liu, Z.; Ji, Y. Wireless channel loss analysis—A case study using wifi-direct. In Proceedings of the International Wireless Communications and Mobile Computing Conference, Nicosia, Cyprus, 4–8 August 2014; pp. 244–249.
15. Liu, Z.; Ishihara, S.; Cui, Y.; Ji, Y.; Tanaka, Y. JET: Joint source and channel coding for error resilient virtual reality video wireless transmission. Signal Process. 2018, 147, 154–162. [CrossRef]
16. Bilal, M.; Khan, A.; Khan, M.U.K. A low complexity pedestrian detection framework for smart video surveillance systems. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2260–2273. [CrossRef]
17. Li, F.; Zhang, S.; Qiao, X. Scene-aware adaptive updating for visual tracking via correlation filters. Sensors 2017, 17, 2626. [CrossRef] [PubMed]
18. Tran, N.N.; Long, H.P.; Tran, H.M. Scene recognition in traffic surveillance system using neural network and probabilistic model. In Proceedings of the IEEE International Conference on System Science and Engineering, Ho Chi Minh City, Vietnam, 21–23 July 2017; pp. 226–230.
19. Jung, H.; Choi, M.K.; Jung, J. ResNet-based vehicle classification and localization in traffic surveillance systems. In Proceedings of the IEEE Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 934–940.
20. Lee, K.H.; Hwang, J.N.; Chen, S.I. Model-based vehicle localization based on 3-d constrained multiple-kernel tracking. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 38–50. [CrossRef]
21. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645. [CrossRef] [PubMed]
22. Ren, S.; He, K.; Girshick, R. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 1137–1149. [CrossRef] [PubMed]
23. Bolme, D.S.; Beveridge, J.R.; Draper, B.A. Visual object tracking using adaptive correlation filters. In Proceedings of the IEEE Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2544–2550.
24. Henriques, J.F.; Rui, C.; Martins, P. Exploiting the circulant structure of tracking-by-detection with kernels. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 702–715.
25. Henriques, J.F.; Rui, C.; Martins, P. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583–596. [CrossRef] [PubMed]
26. Girod, B.; Chandrasekhar, V.; Chen, D.M. Mobile visual search. IEEE Signal Process. Mag. 2011, 28, 61–76. [CrossRef]
27. Redondi, A.; Cesana, M.; Tagliasacchi, M. Rate-accuracy optimization in visual wireless sensor networks. In Proceedings of the IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 1105–1108.
28. Lee, K.H.; Hwang, J.N. On-road pedestrian tracking across multiple driving recorders. IEEE Trans. Multimedia 2015, 17, 1429–1438. [CrossRef]
29. Milani, S.; Bernardini, R.; Rinaldo, R. A saliency-based rate control for people detection in video. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 2016–2020.
30. Kong, L.; Dai, R. Object-detection-based video compression for wireless surveillance systems. IEEE Multimedia 2017, 24, 76–85. [CrossRef]
31. Chen, X.; Hwang, J.N.; Lee, K.H.; de Queiroz, R.L. Quality-of-content (QoC)-driven rate allocation for video analysis in mobile surveillance networks. In Proceedings of the IEEE International Workshop on Multimedia Signal Processing, Xiamen, China, 19–21 October 2015; pp. 1–6.
32. Chen, X.; Hwang, J.N.; Meng, D. A quality-of-content-based joint source and channel coding for human detections in a mobile surveillance cloud. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 19–31. [CrossRef]
33. Wei, H.; Gitli, R.D. Two-hop-relay architecture for next-generation WWAN/WLAN integration. IEEE Wirel. Commun. 2004, 11, 24–30. [CrossRef]
34. Shokrollahi, A. Raptor codes. IEEE Trans. Inf. Theory 2006, 52, 2551–2567. [CrossRef]
35. Wright, S.; Jones, S.; Lee, C. Guest Editorial—IPTV systems, standards and architectures: Part II. IEEE Commun. Mag. 2008, 46, 92–93. [CrossRef]
36. Wu, J.; Shang, Y.; Huang, J.; Zhang, X.; Cheng, B.; Chen, J. Joint source-channel coding and optimization for mobile video streaming in heterogeneous wireless networks. EURASIP J. Wirel. Commun. Netw. 2013, 2013, 283. [CrossRef]
37. Wu, Y.; Lim, J.; Yang, M.H. Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2411–2418.
38. Grant, M.; Boyd, S. CVX: MATLAB Software for Disciplined Convex Programming. Available online: http://cvxr.com/cvx (accessed on 3 August 2018).
39. Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Čehovin, L.; Vojír, T.; Häger, G.; Lukežič, A.; Fernández, G.; et al. The visual object tracking VOT2016 challenge results. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 191–217.
40. Huang, Y.-H.; Ou, T.-S.; Su, P.-Y. Perceptual rate-distortion optimization using structural similarity index as quality metric. IEEE Trans. Circuits Syst. Video Technol. 2010, 20, 1614–1624. [CrossRef]
41. Khan, A.; Sun, L.; Ifeachor, E. QoE prediction model and its application in video quality adaptation over UMTS networks. IEEE Trans. Multimedia 2012, 14, 431–442. [CrossRef]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).