A Novel Large-Scale Digital Forensics Service Platform ... - IEEE Xplore

178

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 14, NO. 1, FEBRUARY 2012

A Novel Large-Scale Digital Forensics Service Platform for Internet Videos Hao Yin, Member, IEEE, Wen Hui, Hongzhi Li, Chuang Lin, Senior Member, IEEE, and Wenwu Zhu, Fellow, IEEE

Abstract—The increasing transmission of illegal videos over the Internet imposes the needs to develop large-scale digital video forensics systems for prosecuting and deterring digital crimes in the Internet. In this paper, we propose, design, and implement a novel large-scale Digital Forensics Service Platform (DFSP) that can effectively detect illegal content from Internet videos. More specifically, we propose a distributed architecture by taking advantage of Content Delivery Network (CDN) to improve scalability, which can process enormous number of Internet videos in real time. We propose CDN-based Resource-Aware Scheduling (CRAS) algorithm, which schedules the tasks efficiently in the DFSP according to resource parameters, such as delay and computation load. We deploy the DFSP system in the Internet, which integrates the CDN-based distributed architecture and CRAS algorithm with a large-scale video detection algorithm, and evaluate the deployed system. Our evaluation results demonstrate the effectiveness of the platform. Index Terms—Content delivery network, digital forensics, load balancing, resource scheduling, video detection.

I. INTRODUCTION S the number of digital videos distributed over the Internet increases exponentially, there is a strong need to solve their security problems. A recent study has shown that 23.8% of global Internet traffic can be associated with the illegal distribution of copyrighted work.1 Another report discovered that 91% of the Internet pornographic materials are videos that could be downloaded from different media sources.2 Reacting to these threats, it is essential to develop a digital foren-

A

Manuscript received April 06, 2011; revised July 15, 2011; accepted September 13, 2011. Date of publication October 03, 2011; date of current version January 18, 2012. This work was funded by the Project 60873254, 60736012, 61170290, 60932003 supported by the National Natural Science Foundation of China (NSFC); the Project 2010CB328105, 2011CB302600, 2012CB315800 supported by the National Basic Research Program of China (973 Program); New Century Excellent Talents in University. This work was performed when H. Li was visiting Microsoft Research Asia as a research intern. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jia Li. H. Yin and C. Lin are with the Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China. W. Hui is with the School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China, and also with the Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: [email protected]). H. Li is with the Department of Electrical Engineering, Columbia University, New York, NY 10027 USA. W. Zhu is with Microsoft Research Asia, Beijing 100080, China. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMM.2011.2170556 1http://www.copyrightalliance.org/files/Aistars%20-%20HJC%20Streaming %20Hearing%20Testimony.pdf 2http://www.bitdefender.com/news/bitdefender-survey-reveals-Internetpornography-remains-a-major-e-threat-source-1965.html

sics system for large-scale Internet videos. To date, most existing forensics systems focus on how to deal with robustness and identification accuracy [1]–[4], which have not considered how to detect the legality of large-scale Internet videos in real time. There are fewer reported works on the efficiency and scalability for large-scale video content identification [5], [6], which mainly focus on solving these problems from video retrieval algorithm perspective. This paper aims to address the efficiency and scalability issues from system perspective. There are two key challenges on large-scale forensics system for Internet videos: • The large amount of computation brought by analyzing and detecting large volume of video data. This makes it difficult to serve a large number of users concurrently. • The large amount of communication brought by transmitting a great volume of video data from media sources to the system. To address the above challenges, in this paper we propose, design, and implement a novel large-scale Digital Forensics Service Platform (DFSP) to effectively and efficiently detect illegal contents from large-scale Internet videos. To solve the scalability problem, we propose to build DFSP upon CDN [7]. Existing CDN-based distributed architectures to improve the performance of media applications include Video-on-Demand (VoD) systems [8]–[11], live video streaming systems [12]–[14], collaborative media streaming systems [15]–[17], etc. DFSP, by using CDN, can push the massive forensics tasks to the most appropriate CDN nodes, thereby effectively reducing the computational cost and the communication cost. To solve the efficiency problem, we propose CDN-based Resource-Aware Scheduling (CRAS) algorithm. Existing network-aware resource scheduling algorithms include non-adaptive scheduling algorithms that use some heuristics to select nodes [18]–[20], and adaptive scheduling algorithms that take the current network or server conditions into account [21]–[23]. In DFSP, with CRAS algorithm, user requests can be directed to appropriate nodes. Different forensics tasks are assigned to different CDN nodes based on not only the network conditions but also the computation load, thereby the massive data stream can be in parallel scheduled among multiple nodes efficiently. Moreover, in the work, we propose to integrate the proposed CDN-based distributed architecture and CRAS algorithm with a Large-scale Video Detection (LVD) algorithm. To summarize, the main contributions of this paper are as follows: • We propose and design a novel large-scale digital video forensics service architecture and platform, which can process enormous number of Internet videos in real time by employing CDN.

1520-9210/$26.00 © 2011 IEEE

YIN et al.: A NOVEL LARGE-SCALE DIGITAL FORENSICS SERVICE PLATFORM FOR INTERNET VIDEOS

• We propose a CDN-based Resource-Aware Scheduling algorithm for dynamic load balancing in DFSP, which can significantly improve the efficiency of the platform. • We implement and evaluate a deployed system in large scale, which integrates the CDN-based distributed architecture and the CDN-based resource-aware scheduling algorithm with a large-scale video detection algorithm. The rest of this paper is organized as follows. Section II introduces the related works on digital video forensics. Section III presents the system architecture. Sections IV and V describe the large-scale video detection algorithm and the CRAS algorithm in detail, respectively. Section VI deploys and implements a real system, and presents the results of performance evaluations. Finally, Section VII concludes the paper and discusses the future work. II. RELATED WORKS In this section, we overview the related works on digital video forensics techniques and digital video forensics systems.

179

To improve the forensic efficiency, some researchers began to study the data structure for effectively indexing the video features. However, the approaches are all based on single server. For example, Hoad and Zobel used local alignment to find sequences of similar values in video and clip, which provided much faster searching [38]. Lejsek et al. achieved good efficiency greatly depending on a specific hardware configuration, which is outside the range of usual computer workstations [39]–[41]; thus, their work may not be practical for large-scale deployments. Zhao et al. improved the scalability of several well-known features including color signature and visual keywords for web-based retrieval by using high-dimensional indexing techniques [5]. Recently, Shang et al. introduced a compact spatiotemporal feature to represent videos and constructed an efficient data structure to address the efficiency and scalability issues for real-time large-scale near-duplicate Web video retrieval [6]. Although these works have made good use of the server capacity, when the number of videos continues to scale up, the real-time performance may be still hard to be guaranteed.

A. Digital Video Forensics Techniques In the past few years, there has been a rapid growth on research in digital video forensics. Watermarking is a traditional forensics technique for detecting illegal copies and digital tampering [24]–[26]. It has three aspects for improvement: 1) the watermark must be embedded in legal videos prior to video distribution; 2) the nature of the original data will be changed; and 3) the watermark could be attacked or destroyed during transmission. Other than watermarking, there exists a class of statistical techniques for detecting digital tampering [27]–[29]. Recently, video fingerprint technique has drawn wide attention [30]–[32]. Different from other forensics techniques, this technique can identify illegal content by extracting a unique fingerprint from video data. The fingerprint can be extracted based on some static features such as color [33], texture [34] and shape [35], or some motion features of the video [36], [37]. The major advantage of fingerprint technique is that the fingerprint can be extracted after the media has been distributed and will not change the original data; thus, it is a very useful method for detecting Internet videos. B. Digital Video Forensics Systems With the development of digital video forensics techniques, some forensics systems have been studied and implemented. Douze et al. presented a video copy detection system [1], which used a precise representation method to decide whether or not a query video segment was a copy of a video from the indexed dataset. Xu et al. explored an effective system for analyzing the high-level structures and extracting useful features from soccer videos in order to identify the content of videos [2]. Gauch and Shivadas presented a commercial identification system by extracting features from video sequences that can characterize the temporal and chromatic variations within each clip [3]. Shen et al. outlined a system for detecting near-duplicate videos based on the dominating content and content changing trends of the videos [4]. In addition to the above systems, some other similar ones also provided effective methods for video forensics and achieved good results. However, all this work analyzes and processes videos on a single server.

III. SYSTEM ARCHITECTURE Fig. 1 depicts the DFSP system architecture. It consists of three main components: Content Access (CA), Video Detection (VD), and Resource Management (RM), explained below: 1) Content Access (CA): It is located on each CDN node, which is used to obtain video data from certain media sources. During this process, we use web crawler [42], a special technology for web search, to collect data. This technology can methodically scan or “crawl” through Internet pages to create an index of video data, so that CA can quickly provide the relevant data to detect and regularly ensure the data are up to date. 2) Video Detection (VD): VD is a group of servers distributed near CDN nodes, which are responsible for analyzing video content and judging their legality. It is composed of “Blacklist” Database, Content Analysis Servers, and Searching and Matching Servers. “Blacklist” Database stores a copy of fingerprints of improper videos. Content Analysis Servers are used to analyze the video content and extract their fingerprints. Searching and Matching Servers take charge of comparing these fingerprints with those pre-stored in “Blacklist” Database, and judging their legality. We propose a large-scale video detection algorithm in VD, which will be presented in detail in Section IV. 3) Resource Management (RM): RM controls and monitors the whole platform, in charge of scheduling, coordinating, and managing all the resources and tasks. It comprises Network Monitoring module and Load Balancing module. Since each node has a copy of “blacklist”, the forensics tasks can be done at any node. Network Monitoring module is in charge of monitoring all the nodes and media sources, so that Load Balancing module can coordinate each node and schedule different tasks depending on the overall situation. To balance the load among multiple nodes, we propose CDN-based Resource-Aware Scheduling algorithm, which will be discussed in Section V in detail. DFSP working flow is as follows. When a user requests to detect a media source (e.g., http://www.youku.com/), the re-

180


Fig. 1. DFSP system architecture.

quest is first transferred to RM. RM takes into account of the location of the designated media source and the current load of each node to find an appropriate node for the media source. The user request is then redirected to the selected node. CA of the selected node obtains video data from the designated media source, caches them, and transfers them to VD. After that, VD of the selected node analyzes the video content coming from the designated media source and extracts their fingerprints. These fingerprints are compared with those pre-stored in “Blacklist” Database. Finally, the legality of these videos, which is determined by the comparison result, is returned to the user. IV. LARGE-SCALE VIDEO DETECTION To identify illegal content in DFSP, we propose a large-scale video detection algorithm, which represents video by spatialtemporal video words and computes the relevance score by language modeling approach. Next, we give the details below. Our problem can be formulated as follows. Let represent a large set of videos. For each video , let be a set of keyframes extracted from video , where is the number of keyframes in . For each keyframe , frames before and after it are called its adjacent frames, as shown in Fig. 2. We set 100 ms between two adjacent frames. Keyframe and its adjacent frames can represent a short term of video that is with the keyframe in the middle of it. We call this . short video as a Spatial-Temporal Unit (STU), denoted as Thus, video can be represented by a set of STUs, denoted as . , where refers to the keyframe, and the adjacent frames of the keyframe otherwise. In practice, we set to be 5. Our task is to extract features from STU, represent a video by those

Fig. 2. Keyframe and its adjacent frames.

Fig. 3. Nine directions of trajectories.

features, and compute the relevance score between two videos based on their features. A. Feature Extraction and Video Representation We use motion feature and global feature to represent an STU. It is known that within a short video clip, motion (including object, background, and camera motion) can be used to identify the content of video clip [43]. We extract motion


181

Fig. 4. Mechanisms of resource-aware scheduling.

feature from STU according to the following steps. First, we track the interesting points in each frame in an STU and get the motion direction of each interesting point. In practice, we use Kanade-Lucas-Tomasi (KLT) Tracker to track the trajectory. Second, we define nine directions of the trajectories, as shown in Fig. 3. By calculating the histogram of directions in each frame, . Since is we get the motion feature vector, denoted as is 90. set to 5, the dimension of In addition to motion feature, we also use global feature in STU, including average color moment vector and average color histogram vector in HSV space, denoted by and , reand are 9 and 64, respectively. The dimensions of spectively. As a result, we get the final feature vector of STU, . denoted by Based on the feature of STU, we have . The dimension of is . To reduce the matching complexity and increase robustness, we build a codebook for the 163-dimensional feature vector by is using k-means algorithm. Then, each feature vector can be represented quantized to a video word. Thus, video by a set of video words, denoted by . B. Video Indexing and Matching To speed up the process of video matching, we index videos using inverted file. Inverted file is an effective data structure in be term frequency, information retrieval system [44]. Let which is the number of times word occurs in video . Let be the inverse document frequency of a term , which can be formulated as (1) where is the number of videos in the dataset, and refers to how many videos contain word in the dataset. Then we assign a weight for each word in video as follows: (2) Thus, video

can be represented by , where is the vocabu-

be the lary size of video words codebook. Let similarity score of video and video , which is defined as (3) Although and are long, given that they are sparse, by using inverted indexing structure, the computational complexity brought video matching is not very high. According to (3), the query videos coming from the designated media sources are compared with those pre-stored in “Blacklist” Database. The comparison results indicate whether or not they are legal. V. CDN-BASED RESOURCE SCHEDULING In this section, we propose CRAS algorithm to manage and allocate appropriate resources to different tasks in a complex network environment. The principle of CRAS algorithm is shown in Fig. 4. The CRAS algorithm assumes that any node can detect video data. Thus, the key problem is how to choose appropriate nodes to process the forensics tasks. To formulate this problem, we let be the given complete network, where is the set of nodes and is the set of edges. The node set is further partitioned into two nonempty subsets and , where is the set of CDN nodes, and is the set of media sources under surveillance. Here we assume that each user is served by exactly one CDN node. A CDN node includes CA and VD as illustrated in Fig. 1. Each CDN node has a copy of “blacklist”. With each edge , let be the cost for detecting video data from the designated media source, which depends on the network proximity and the node’s traffic load. Specifically, is defined as if otherwise

(4)

is the network proximity between node and media where source , and is the traffic load between them. To calculate network proximity, we first take advantage of some “landmark” nodes to get the dynamic network topology information. These landmarks are chosen by a well-known set of machines, and then the network link latency is measured from

182


each node to each landmark, which implies their locations. For example, suppose that there are three landmarks, the latency from an arbitrary node to each landmark is 25, 5, and 60, respectively. To quantify these latencies, we divide the range of , possible latencies into three levels: “0” for latencies in , and “2” for latencies higher than “1” for latencies in 50. So the landmark order information of node is (1,0,2). The network proximity between two nodes can be calculated by the Euclidean distance of their landmark order. As for the traffic load, since it is closely related to the server is defined as load in node , (5) is the server utilizawhere is the set of servers in node , tion in in node and is the capacity of server , and is the number of connections in server . Our goal is to minimize cost to get video data from the media source. This problem can be formulated as follows:

Minimize

(6) (7)

Algorithm 1: CRAS algorithm do

1: for all

2: Measure network latency from media source landmark 3:

Compute landmark order of media source ,

4:

for all

do

5:

Measure network latency from node to each landmark

6:

Compute landmark order of node ,

7:

Compute network proximity

8:

if

9:

(9) where

10:

Measure server utilization,

11:

Measure the number of connections,

12:

end for

13:

Compute traffic load,

14:

if

(10)

16:

else

17: end if else

20: 21:

The objective function in (6) is the sum of all computation cost incurred on each node and communication cost consumed when accessing video data from media sources. The condition (7) is the constraint on computation capacity of each node, that is, the traffic load on each node cannot exceed its computation capacity. The conditions (8) and (9) imply that for each media source, the forensics tasks can be realized on one and only one node at once. The CRAS algorithm is described in Algorithm 1. It performs global load balancing among all the nodes when users request to detect media sources. First, the location of media sources and all the nodes are measured by our landmark method, and the network proximity between each of them is calculated (Lines 1 to 7). Based on their network proximity, for each media source, its adjacent nodes are no farther than (Line 8). Then the traffic load for these adjacent nodes are measured and calculated, and their total cost are calculated in Lines 9 to 18. Next, the optimal nodes are selected from these adjacent nodes such that is minimized according to (6)–(10). These optimal nodes are added to an optimal node set , which maintains the media sources and their optimal nodes (Lines 19 and 20). For each media source, as long as it has an optimal node, it is assigned to this node; if the media source has more than one optimal node, it is assigned to the first one of (Lines 21 to 28).

then Compute total cost,

19: if otherwise.

do

for all

18:

is an indicator variable, which is defined as

between and

then

15: (8)

to each

end if

22:

end for

23:

Select optimal node

24:

Add

25:

if

26: 27: 28: 29: 30: 31:

minimizing

to optimal node set then

Assign media source to the first item of else if

then

Assign media source

to

else Exit end if

32: end for VI. SYSTEM DEPLOYMENT AND PERFORMANCE EVALUATION A. System Deployment DFSP is deployed on ChinaCache, the biggest CDN provider in China. Three Central “Blacklist” Database are deployed to back up each other. Around these central databases, 55 CDN nodes (about 500 servers) are located within each district in


183

Fig. 6. Comparison of PR curves. Fig. 5. Deployment of Central “Blacklist” Databases and CDN nodes.

China. Fig. 5 shows the actual location of Central “Blacklist” Databases and CDN nodes for the platform. The system configuration of its application servers is in the following: • Hardware: Dell 2850 (Xeon 1.6G*2/4G/2T) • Operation system: Red Hat Enterprise Linux AS 4. We perform three sets of experiments to evaluate how the system platform performs in the real environment. In the first set of experiments, we evaluate the detection accuracy of the DFSP. In the second set, we evaluate the efficiency in large scale for different length of query videos and different number of media sources. In the third set, we test the scalability of the platform. In the following subsections, we provide the details of these experiments. To test the performance of DFSP in the real environment, in our experiments we construct our “Blacklist” Database of 50 thousand videos crawled from some video sharing websites such as Youku.com3 and Tudou.com4 and their duplicates. There are 250 000 videos in total and they are about 9.5T and 83 333 h. The duplicate transformations include changing resolution, changing bitrate, changing light, and changing contrast. B. Detection Accuracy Evaluation We first evaluate detection accuracy of our Large-scale Video Detection algorithm. We compare our LVD with three existing near-duplicate video detection methods, namely color histogram-based method [45], local feature-based method [45], and LSH-based method [46]. Let CH, LF, and LSH represent these three methods, respectively. Fig. 6 illustrates the precision-recall (PR) curve of 50 000 queries. From Fig. 6, we see that our LVD algorithm outperforms all the other methods. Specifically, the performance of LVD is much better than that of CH and LSH. When recall is less than 0.6, LVD has similar precision to LF. C. Efficiency Evaluation The second set of experiments evaluates the efficiency of the platform [47]. First, we extract fingerprints from 50 000 videos, 3http://www.youku.com/ 4http://www.tudou.com/

TABLE I COMPARISON ON FINGERPRINT MATCHING TIME

and obtain the average fingerprint size as about 2 KB. The average time of fingerprint extracting is about 20 s. Then, we use these fingerprints to test the matching time of LVD algorithm, and compare it with LF and LSH, as shown in Table I. From the results shown in Table I, we can see that LVD outperforms LF and LSH on fingerprint matching time. In the following, we evaluate detection time5 with respect to video length at system level. In this experiment, users request to detect ten arbitrary media sources. Fig. 7 shows the relation between the average detection time and the video length. In Fig. 7, “local detection” means that no matter where the media sources are, all the query videos are accessed from the media sources to user’s local server for detection, but not to their appropriate nodes. From Fig. 7, we can see that the architecture design of DFSP can greatly improve the detection efficiency, regardless of the content of these query videos and where they come from. Then, we compare the detection time of the DFSP with two other existing systems [5], [6] that solved the large-scale problem from video detection algorithm perspective. Fig. 8 shows the comparison result. From Fig. 8, we can see that DFSP is more efficient. Next, we evaluate detection time with respect to the number of media sources. In this experiment, the number of media sources that are under surveillance varies from 10 to 50. Fig. 9 shows the relation between the average detection time and the number of media sources. From Fig. 9, we can see that although the detection time is affected by the number of media sources in a certain degree, our approach is much better than that using user’s local server. We further compare DFSP with two other systems in [5] and [6], in terms of their average detection time in Fig. 10. The results shown in Fig. 10 demonstrate the higher efficiency of our platform. 5Detection time refers to the time from user sending a detection request to when he receives the result.

184


Fig. 7. Average detection time versus video length.

Fig. 8. Comparison of detection time (changing video length).

Fig. 9. Average detection time versus number of media sources.

Fig. 10. Comparison of detection time (changing the number of media sources).

D. Scalability Evaluation In order to evaluate the scalability of the platform, in the third set of experiments, detection time is measured based on a variety of videos from ten arbitrary media sources. Fig. 11 shows that the detection time with respect to the number of user queries. It can be observed from Fig. 11 that the DFSP has a relatively stable detection time with the increasing number of user queries, which demonstrates better scalability. The average detection time is 35.2 s. To understand the underlying reason, we find out that it is due to the architecture design of the DFSP. Despite the heavy demands on the video detection, the DFSP architecture is able to distribute the computation load among several nodes, which saves computation time significantly. VII. CONCLUSIONS AND DISCUSSIONS In this paper, we have proposed, designed, and implemented a novel large-scale digital forensics service platform for Internet videos. More specifically, first, our proposed DFSP architecture is built upon CDN infrastructure. By taking advantage of CDN, we have constructed a distributed architecture that has good scalability. Second, we have proposed CRAS algorithm to solve dynamic load balancing problems in large scale, thereby achieving high system efficiency. Third, we have deployed our proposed DFSP in a real system with integration of the CDN-based architecture, the CRAS algorithm, and the large-scale video detection algorithm. We have deployed the

Fig. 11. Comparison of detection time (increasing user queries).

DFSP on ChinaCache and performed evaluation. The evaluation results demonstrate the effectiveness of the DFSP. To reduce the storage and computation complexity for processing a great amount of video data is one of our future research directions. ACKNOWLEDGMENT The authors would like to thank the anonymous reviewers for their valuable comments that greatly improved the paper.


REFERENCES [1] M. Douze, A. Gaidon, H. Jegou, M. Marszalek, and C. Schmid, “INRIA-LEARs video copy detection system,” in Proc. TRECVID Workshop, 2008. [2] P. Xu, L. Xie, S. Chang, A. Divakaran, A. Vetro, and H. Sun, “Algorithms and system for segmentation and structure analysis in soccer video,” in Proc. IEEE ICME, 2001, pp. 928–931. [3] J. Gauch and A. Shivadas, “Identification of new commercials using repeated video sequence detection,” in Proc. IEEE Int. Conf. Image Processing, 2006, vol. 3. [4] H. Shen, X. Zhou, Z. Huang, and J. Shao, “Statistical summarization of content features for fast near-duplicate video detection,” in Proc. 15th Int. Conf. Multimedia, 2007, pp. 164–165. [5] W. Zhao, S. Tan, and C. Ngo, “Large-scale near-duplicate web video search: Challenge and opportunity,” in Proc. IEEE Int. Conf. Multimedia and Expo, 2009, pp. 1624–1627. [6] L. Shang, L. Yang, F. Wang, K. Chan, and X. Hua, “Real-time large scale near-duplicate web video retrieval,” in Proc. Int. Conf. Multimedia. [7] G. Pallis and A. Vakali, “Insight and perspectives for content delivery networks,” Commun. ACM, vol. 49, no. 1, pp. 101–106, 2006. [8] F. Thouin and M. Coates, “Video-on-demand networks: Design approaches and future challenges,” IEEE Netw., vol. 21, no. 2, pp. 42–48, 2007. [9] N. Carlsson and D. Eager, “Server selection in large-scale video-ondemand systems,” ACM Trans. Multimedia Comput., Commun., Appl. (TOMCCAP), vol. 6, no. 1, p. 1, 2010. [10] Y. Guo, K. Suh, J. Kurose, and D. Towsley, “P2cast: Peer-to-peer patching for video on demand service,” Multimedia Tools Appl., vol. 33, no. 2, pp. 109–129, 2007. [11] S. Chan, “Distributed servers architecture for networked video services,” IEEE/ACM Trans. Netw. (TON), vol. 9, no. 2, pp. 125–136, 2001. [12] J. Mol, A. Bakker, J. Pouwelse, D. Epema, and H. Sips, “The design and deployment of a bittorrent live video streaming solution,” in Proc. 2009 11th IEEE Int. Symp. Multimedia, 2009, pp. 342–349. [13] H. Yin, X. Liu, T. Zhan, V. Sekar, F. Qiu, C. Lin, H. Zhang, and B. Li, “Design and deployment of a hybrid cdn-p2p system for live video streaming: experiences with livesky,” in Proc. 17th ACM Int. Conf. Multimedia, 2009, pp. 25–34. [14] U. Kozat, O. Harmanci, S. Kanumuri, M. Demircin, and M. Civanlar, “Peer assisted video streaming with supply-demand-based cache optimization,” IEEE Trans. Multimedia, vol. 11, no. 3, pp. 494–508, Apr. 2009. [15] G. Fortino, W. Russo, C. Mastroianni, C. Palau, and M. Esteve, “CDNsupported collaborative media streaming control,” IEEE Multimedia, vol. 14, no. 2, pp. 60–71, 2007. [16] G. Fortino, C. Mastroianni, and W. Russo, “Collaborative media streaming services based on CDNs,” Content Del. Netw., pp. 297–316, 2008. [17] W. Yiu, X. Jin, and S. Chan, “Vmesh: Distributed segment storage for peer-to-peer interactive video streaming,” IEEE J. Select. Areas Commun., vol. 25, no. 9, pp. 1717–1731, Dec. 2007. [18] M. Szymaniak, G. Pierre, and M. Van Steen, “Netairt: A DNS-based redirection system for apache,” in Proc. Int. Conf. WWW/Internet, 2003, pp. 435–442. [19] K. Delgadillo, Cisco distributeddirector, Cisco White Paper, 1997. [20] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon’s highly available key-value store,” ACM SIGOPS Operat. Syst. Rev., vol. 41, no. 6, pp. 205–220, 2007. [21] G. Pierre and M. Van Steen, “Globule: A collaborative content delivery network,” IEEE Commun. Mag., vol. 44, no. 8, pp. 127–133, 2006. [22] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu, “Bcube: A high performance, server-centric network architecture for modular data centers,” ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 4, pp. 63–74, 2009. [23] M. Masa and E. Parravicini, “Impact of request routing algorithms on the delivery performance of content delivery networks,” in Proc. 2003 IEEE Int. Conf. Performance, Computing, and Communications, 2003, pp. 5–12. [24] M. El’arbi, C. B. Amar, and H. Nicolas, “A video watermarking scheme resistant to geometric transformations,” in Proc. IEEE Int. Conf. Image Processing, San Antonio, TX, Sep. 2007, pp. V-481–V-484.

185

[25] J. Wang, X. Gao, and J. Zhong, “A video watermarking based on 3-d complex wavelet,” in Proc. IEEE Int. Conf. Image Processing, San Antonio, TX, Sep. 2007, pp. 493–496. [26] K. Maeno, Q. Sun, S. Chang, and M. Suto, “New semi-fragile image authentication watermarking techniques using random bias and nonuniform quantization,” IEEE Trans. Multimedia, vol. 8, no. 1, pp. 32–45, 2006. [27] W. Wang and H. Farid, “Exposing digital forgeries in video by detecting double mpeg compression,” in Proc. ACM Workshop Multimedia and Security, Geneva, Switzerland, Sep. 2006, pp. 37–47. [28] A. Popescu and H. Farid, Exposing Digital Forgeries by Detecting Duplicated Image Regions, Dept. Comput. Sci., Dartmouth College, Hanover, NH, Tech. Rep. TR2004–515. [29] A. Popescu and H. Farid, “Statistical tools for digital forensics,” in Proc. Int. Workshop Information Hiding, Toronto, ON, Canada, May 2004, pp. 128–147. [30] W. Wang and H. Farid, “Exposing digital forgeries in video by detecting duplication,” in Proc. ACM Workshop Multimedia and Security, Dallas, TX, Sep. 2007, pp. 35–42. [31] S. Lee and C. Yoo, “Robust video fingerprinting based on affine covariant regions,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Las Vegas, NV, Mar. 2008, pp. 1237–1240. [32] Y. Yan, B. Ooi, and A. Zhou, “Continuous content-based copy detection over streaming videos,” in Proc. IEEE Int. Conf. Data Engineering, Cancún, México, Apr. 2008, pp. 853–862. [33] K. Van De Sande, T. Gevers, and C. Snoek, “Evaluating color descriptors for object and scene recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1582–1596, Sep. 2010. [34] B. Bouaziz, W. Mahdi, M. Ardabilain, and A. Hamadou, “A new approach for texture features extraction: Application for text localization in video images,” in Proc. IEEE Int. Conf. Multimedia and Expo, 2006, pp. 1737–1740. [35] L. Zhang, B. Wu, and R. Nevatia, “Pedestrian detection in infrared images based on local shape features,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–8. [36] P. Smith, N. D. V. Lobo, and M. Shah, “Temporalboost for event recognition,” in Proc. IEEE Int. Conf. Computer Vision, 2005, vol. 1, pp. 733–740. [37] H. Zhong, J. Shi, and M. Visontai, “Detecting unusual activity in video,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2004, vol. 2, pp. II–819. [38] T. Hoad and J. Zobel, “Fast video matching with signature alignment,” in Proc. 5th ACM SIGMM Int. Workshop Multimedia Information Retrieval, 2003, pp. 262–269. [39] H. Lejsek, Á Jóhannsson, F. Ásmundsson, B. Jónsson, K. Daoason, and L. Amsaleg, “Videntifier™ forensic: A new law enforcement service for automatic identification of illegal video material,” in Proc. ACM Workshop Multimedia in Forensics, Beijing, China, Oct. 2009, pp. 19–24. [40] F. Ásmundsson, H. Lejsek, K. Daðason, B. Jónsson, and L. Amsaleg, “Videntifier™ forensic: Robust and efficient detection of illegal multimedia,” in Proc. ACM Int. Conf. Multimedia, Beijing, China, Oct. 2009, pp. 999–1000. [41] H. Lejsek, H. Pormóottir, F. Ásmundsson, K. Daoason, Á. Jóhannsson, B. Jónsson, and L. Amsaleg, “Videntifier™ forensic: Large-scale video identification in practise,” in Proc. ACM Workshop Multimedia in Forensics, Security and Intelligence, 2010, pp. 1–6. [42] C. Castillo, “Effective web crawling,” in Proc. ACM SIGIR Forum, Dec. 2005, pp. 55–56. [43] W. Jiang, C. Cotton, S. Chang, D. Ellis, and A. Loui, “Short-term audio-visual atoms for generic video concept classification,” in Proc. 17th ACM Int. Conf. Multimedia, 2009, pp. 5–14. [44] J. Sivic and A. Zisserman, “Video Google: A text retrieval approach to object matching in videos,” in Proc. IEEE Int. Conf. Computer Vision, 2003, pp. 1470–1477. [45] X. Wu, A. Hauptmann, and C. Ngo, “Practical elimination of nearduplicates from web video search,” in Proc. 15th Int. Conf. Multimedia, 2007, pp. 218–227. [46] W. Dong, Z. Wang, M. Charikar, and K. Li, “Efficiently matching sets of features with random histograms,” in Proc. 16th ACM Int. Conf. Multimedia, 2008, pp. 179–188. [47] Call for Proposals on Image & Video Signature Tools, Lausanne, Switzerland, 2007, Tech. Rep. ISO/IEC MPEG W9216.

186


Hao Yin (M’11) received the B.S., M.E., and Ph.D. degrees from Huazhong University of Science and Technology, Wuhan, China, in 1996, 1999, and 2002, respectively, all in electrical engineering. Currently, he is an Associate Professor with the Research Institute of Information Technology in Tsinghua University, Beijing, China. He also works as the Director of the Tsinghua-ChinaCache Content Delivery Network Research Institute and the Vice-Director of Ministry of Education & Microsoft Joint “Multimedia and Network Technology” Key Lab. Since 2006, he was invited as Chief Scientist by the leading CDN Service Provider ChinaCache (NASDAQ: CCIH). His research interests are content delivery networks, performance evaluation for Internet and wireless networks, multimedia communications, and network security. He has published over 80 papers in refereed journals and conferences, and holds several patents in the previous mentioned research areas. Dr. Yin is on editorial boards of Advances in Multimedia and Ad Hoc Networks Journal. In addition, he has been involved in organizing over 12 conferences. He has received a series of honors and awards, including First Prize of Natural Science Award of the Chinese Ministry of Education (2005), Second Prize of Innovation Award of the China Computer Association (2007), Program Supported for New Century Excellent Talents in University of the Chinese Ministry of Education (2009), and First Prize of Electronic Information Science and Technology Award of the Chinese Institute of Electronics (2010).

Wen Hui received the B.S. degree from Beijing University of Technology, Beijing, China, in 2004, and the M.E. degree from Beijing University of Posts and Telecommunications, Beijing, China, in 2008. She is currently pursuing the Ph.D. degree as a joint student in the School of Computer and Communication Engineering, University of Science and Technology Beijing, China, and the Department of Computer Science and Technology, Tsinghua University, Beijing, China. Her research interests are multimedia computing, communications, and applications.

Hongzhi Li received the B.E. degree from Zhejiang University, Hangzhou, China, in 2010. He is currently pursuing the Ph.D. degree in the Department of Electrical Engineering, Columbia University, New York. From 2009 to 2011, he was with Microsoft Research Asia, Beijing, China, as a Research Intern. His research interests are in the areas of multimedia search, computer vision, and mobile multimedia computing.

Chuang Lin (M’03–SM’04) received the Ph.D. degree in computer science from Tsinghua University, Beijing, China, in 1994. He is a Professor of the Department of Computer Science and Technology, Tsinghua University. His current research interests include computer networks, performance evaluation, network security analysis, and Petri net theory and its applications. He has published more than 300 papers in research journals and IEEE conference proceedings in these areas and has published three books. Prof. Lin is a member of ACM Council and the Chinese Delegate in TC6 of IFIP. He serves as the Technical Program Vice Chair, the 10th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS 2004); the General Chair, ACM SIGCOMM Asia workshop 2005; an Associate Editor, IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY; an Area Editor, Journal of Computer Networks; and an Area Editor, Journal of Parallel and Distributed Computing.

Wenwu Zhu (M’97–SM’01–F’10) received the Ph.D. degree from Polytechnic Institute of New York University in 1996. His research interest is in the area of wireless/Internet multimedia communication and computing. He worked at Bell Labs during 1996–1999. He was with Microsoft Research Asia’s Internet Media Group and Wireless and Networking Group as research manager from 1999 to 2004. He was the director and chief scientist at Intel Communication Technology Lab, China. He is a senior researcher at the Internet Media Group at Microsoft Research Asia. He has published more than 200 refereed papers and filed 40 patents. He participated in the IETF ROHC WG on robust TCP/IP header compression over wireless links and IEEE 802.16m WG standardization. Dr. Zhu has served and been serving on various editorial boards of IEEE journals, such as Guest Editor for the PROCEEDINGS OF THE IEEE and the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, Associate Editor for the IEEE TRANSACTIONS ON MOBILE COMPUTING, the IEEE TRANSACTIONS ON MULTIMEDIA, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY. He received the Best Paper Award in the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY in 2001 from the IEEE Circuits and Systems Society. He received the Best paper award in 2004 from the Multimedia Communication Technical Committee in the IEEE Communications Society. He served as Chairman of the Visual Signal Processing and Communication Technical Committee from 2006–2008, and served on the Steering Committee board of the IEEE TRANSACTIONS ON MOBILE COMPUTING from 2007–2010. He currently serves as Chairman of IEEE Circuits and System Society Beijing Chapter and advisory board on the International Journal of Handheld Computing Research. He will serve as TPC co-chair for IEEE ISCAS 2013.