Efficient processing of requests with network coding in on-demand data broadcast environments

Jun Chen (a), Victor C. S. Lee (b,*), Kai Liu (b), G. G. Md. Nawaz Ali (b), Edward Chan (b)

(a) School of Information Management, Wuhan University, Wuhan, Hubei, China
(b) Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

Abstract

On-demand broadcast is an effective wireless data dissemination technique to enhance system scalability and the ability to handle dynamic user access patterns. In traditional on-demand broadcast, only one data item can be retrieved by mobile clients during the course of each broadcast, which limits bandwidth utilization and throughput. In this paper, we consider data broadcast with network coding in on-demand broadcast environments. We analyze the coding problem in on-demand broadcast and transform it into the problem of finding the maximum clique in graph theory. Based on our analysis, we first propose a new coding strategy called AC, which exploits the cached information related to clients and data items requested by them, to implement a flexible coding mechanism. Then, based on AC, we propose two novel coding assisted algorithms called ADC-1 and ADC-2 which consider data scheduling, in addition to network coding. In ADC-1 data scheduling and coding are considered separately, while these two factors are fully integrated in ADC-2. The performance gain of our proposed algorithms over traditional and other coding assisted broadcast algorithms is demonstrated through simulation results. Our algorithms not only reduce request response time but also utilize broadcast channel bandwidth more efficiently.

Keywords: Network coding, mobile computing, on-demand broadcast, data scheduling

* Corresponding author.
Email addresses: [email protected] (Jun Chen), [email protected] (Victor C. S. Lee), [email protected] (Kai Liu), [email protected] (G. G. Md. Nawaz Ali), [email protected] (Edward Chan)

Preprint submitted to Information Sciences, December 23, 2012

1. Introduction

Data broadcast has attracted much attention from academic researchers as it has been increasingly used to disseminate information to large populations of mobile clients in many new mobile applications, such as location-based services, where efficient data broadcast is critical to system performance [1, 2]. In general, there are two major data broadcast approaches [3, 4]: (a) push-based and (b) pull-based. Push-based broadcast periodically broadcasts data according to a static schedule which is computed offline from clients' historical data access statistics. Pull-based broadcast, commonly referred to as on-demand broadcast, compiles requests in the service queue and broadcasts data based on various attributes of the pending data items at the server. Push-based broadcast is efficient for applications which require a small set of data items with a stable access pattern, while on-demand broadcast is more widely used for dynamic, large-scale data dissemination [5, 6]. In this paper, we focus our discussion on on-demand broadcast, where response time is one of the most important metrics for measuring system performance [7].

In existing on-demand broadcast strategies, mobile users can retrieve only one data item from each broadcast unit. This constraint restricts full utilization of the limited broadcast bandwidth and leads to long response times for mobile clients. Network coding has been proposed in recent years to improve system performance in wireless networks. Previous works show that network coding can utilize the available broadcast bandwidth more efficiently to improve throughput and energy consumption in multicast communication [8, 9, 10, 11]. Our previous work [12] addressed the coding problem in on-demand broadcast environments by proposing an application of the CR-graph used in wireless mesh networks [13, 14] to capture the relationships among all cached and requested data items. In this work we further investigate the coding and scheduling problems in on-demand broadcast environments. Based on our analysis, we propose two novel broadcast strategies and compare their performance with existing algorithms. Our main contributions are as follows:

1. We give a probabilistic analysis of the extent to which coding can be effective in on-demand broadcast environments. Effectiveness of network


coding and the necessity of adopting an adaptive network coding strategy in this dynamic environment are established by the analysis.

2. We propose a coding strategy AC (Adaptive Coding) for on-demand broadcast applications, based on the CR-graph. AC removes the need for constructing the whole CR-graph G by defining a sub-graph G' of G. This helps prune the search space and reduce computation cost.

3. Two new broadcast algorithms called ADC-1 and ADC-2 (Adaptive Demand-oriented Coding) are proposed that combine the strength of data scheduling and network coding. ADC-1, which is similar to ADC proposed in [12], considers data scheduling and network coding in isolation, while ADC-2 integrates coding and scheduling in an attempt to achieve better performance. It is demonstrated that ADC-2 not only serves the maximum number of clients in each broadcast unit but also significantly improves overall system performance in terms of request response time and broadcast bandwidth utilization. Both ADC-1 and ADC-2 outperform existing algorithms significantly over a wide range of settings.

4. We introduce two mechanisms in ADC-2 to make its implementation efficient and practical. First, an efficient data structure and a pruning technique are used to reduce the search space. Second, the degree of a vertex, i.e., the upper bound of its maximum clique size, is used to skip computation of the priority of most vertices in the graph, thereby saving the processing time required to search for the most rewarding data item to be broadcast.

The rest of this paper is organized as follows. Section 2 gives the background of the research area. Section 3 describes the system architecture. Section 4 analyzes the coding problem in on-demand broadcast environments. Section 5 outlines our new coding assisted algorithms, ADC-1 and ADC-2. Section 6 describes the simulation model and experimentally compares the performance of ADC-1 and ADC-2 with existing algorithms. Finally, we present our conclusions and suggested directions for future research in Section 7.

2. Related works

Data broadcast through a wireless channel is a common way to disseminate information to a large population of mobile clients. In traditional data broadcast environments, data scheduling algorithms play an important role.

Various scheduling algorithms have been proposed to determine the sequence of broadcast data items in on-demand broadcast environments. Dykeman and Wong [4] proposed two widely used strategies: Most-Requested-First (MRF) and Longest-Wait-First (LWF). MRF broadcasts the data item with the maximum number of pending requests (also called broadcast productivity) first. When the system load increases and the data access pattern follows the Uniform distribution, MRF has been shown to have the shortest response time. For LWF, the sum of the time that all pending requests for a data item have been waiting is computed, and the data item with the largest total waiting time is chosen for broadcasting next. When the data access pattern follows the Zipf distribution, LWF has the best performance. Although LWF outperforms other strategies in minimizing wait time, it is expensive to implement. Tan and Ooi [15] proposed an adaptive batching scheduling scheme called Maximum Queue Length with Time restriction (MQL-time). In MQL-time, a predetermined and fixed time, say t seconds, is assigned to each request. Requests are served based on the maximum queue length under normal conditions, while requests that have been waiting for more than t seconds are given higher priority and are served immediately, in order to reduce the overall waiting time. Aksoy and Franklin [16] proposed a low-overhead and scalable scheduling algorithm called RxW, an approximation of the LWF algorithm wherein the number of pending requests for a data item is multiplied by the amount of time for which the oldest outstanding request for the data item has been waiting in the service queue. The data item with the largest product is chosen for broadcasting. RxW combines the strength of MRF and FCFS to provide good performance for both hot and cold items. For systems supporting time-critical services, an online scheduling algorithm called SIN (Slack time Inverse Number of pending requests) [17] has been proposed. It is motivated by two existing strategies: EDF (Earliest Deadline First), which considers the urgency of requests, and MRF, which focuses on broadcast productivity. According to the simulation results presented in [16, 17], SIN outperforms other existing on-demand broadcast algorithms significantly.

In all the above scheduling algorithms, only one data item can be retrieved from the broadcast channel by mobile clients in each broadcast unit. This constraint limits both the bandwidth utilization and the throughput of broadcast systems. Recently, researchers have proposed the use of network coding to further improve performance. Birk and Kol [18] first provided insights into the coding problem in on-demand data broadcast. They pointed

out that the basis of coding in on-demand broadcast is that the server needs to have full knowledge of the requested and cached data items of each client. Recently, several strategies have been proposed to apply network coding techniques to data broadcast [19, 20, 21, 22]. Dong et al. [20] proposed network coding schemes for wireless broadcast in order to reduce the number of retransmissions by the server. The server combines and retransmits the lost packets in such a way that, with one transmission, multiple clients are able to recover packets they have lost or missed. Yang and Chen [21] provided a theoretical treatment of the coding problem in data broadcast. They transformed the coding problem into an optimization problem and proved the problem to be NP-hard. However, their analysis is based on the push-based broadcast scenario, in which requested and cached data items do not change dynamically. Chu et al. [19] proposed an algorithm called OE, in which both scheduling and network coding are considered in order to improve performance. They showed that adopting traditional network coding (linear combination) [23] in data broadcast can help reduce the response time to some extent. However, response time cannot be minimized because traditional network coding encodes all data items, and unnecessary encoding of data items (i.e., redundant encoding) leads to a high access delay. To eliminate redundant encoding, OE encodes two data items in each broadcast unit to reduce response time.

In on-demand broadcast, the data items requested and cached by clients change dynamically. An efficient coding assisted algorithm for data broadcast should make full use of clients' dynamic information for coding and scheduling decisions in order to serve the maximum number of clients in each broadcast unit. OE encodes a small fixed number of data items in each broadcast unit, which means its coding mechanism only utilizes part of the available information to encode data. Moreover, OE considers scheduling and coding separately and fails to exploit the possible performance gain from combining data scheduling and network coding.

3. System architecture

Our model is based on the typical architecture of an on-demand data dissemination system [16] in mobile environments (Figure 1). The system consists of one server and a number of clients. Each client has a local cache to store the requested data items retrieved from the encoded packets broadcast by the server [18]. Due to the limited space available for cached data, a cache replacement policy is required. When a client cannot find a data item in its cache, the client sends a request for the data item and piggybacks its updated cache content to the server through an uplink channel.
Figure 1: System Architecture (the server holds a database, a service queue, a data scheduler and a data encoder; encoded packets such as A+B and C+D+E are broadcast to clients 1 to N over the downlink channel, while on-demand requests arrive over the uplink channel)
After sending the request to the server, the client listens to the broadcast channel to satisfy its request. Only when the client has received and successfully decoded its requested data item from the encoded packet broadcast by the server can its request be satisfied. On receiving a request, the server inserts it into a service queue. The server first retrieves a requested data item from the local database based on a certain scheduling algorithm and then generates an encoded packet based on the information about clients' cached and requested data items. The simple and low-overhead bitwise XOR operation is commonly used for encoding and decoding data. Finally, the server broadcasts the encoded packet through the downlink channel. Satisfied requests are removed from the service queue. The primary goals are to minimize the average request response time [16, 24, 25] and to make efficient use of the limited broadcast bandwidth.
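To make the XOR encode/decode step concrete, the following is a minimal sketch (ours, not part of the original system description) of how a server could combine equal-length data items into one packet and how a client that caches all but one of them recovers its missing item; the byte-string items and helper names are illustrative assumptions.

    # Minimal sketch (illustrative): bitwise-XOR encoding of equal-length data
    # items at the server and decoding at a client using its cached items.

    def xor_encode(items):
        """XOR a list of equal-length byte strings into one encoded packet."""
        packet = bytearray(len(items[0]))
        for item in items:
            for i, b in enumerate(item):
                packet[i] ^= b
        return bytes(packet)

    def xor_decode(packet, cached_items):
        """Recover the one missing item, given all other encoded items are cached."""
        return xor_encode([packet] + list(cached_items))

    # Example: the server encodes d1 XOR d2 XOR d3; a client caching d2 and d3 recovers d1.
    d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"
    packet = xor_encode([d1, d2, d3])
    assert xor_decode(packet, [d2, d3]) == d1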


Table 1: Summary of Notations

Notation     Description
dj           The j-th data item in the database
ci           Client i
prij         The probability that client ci requests data item dj
psij         The probability that client ci stores data item dj in its cache
pnij         The probability that client ci neither stores data item dj in its cache nor requests dj
sl           Maximum number of data items stored in each client's cache
N            Number of data items in the database
n            Number of clients
M(j1, j2)    The expected number of clients that can decode the encoded packet dj1 ⊕ dj2

4. Coding problem in on-demand data broadcast

4.1. Coding capacity analysis

In this section, we explore the potential of using network coding in on-demand broadcast environments by analyzing the coding capacity probabilistically. The coding capacity is measured by the expected number of clients that can decode a data item out of an encoded packet. The larger the number of clients that can decode the encoded packet, the higher the coding capacity of the system. The notations used in the analysis are summarized in Table 1.

For simplicity, we first analyze the request and storage probabilities of a client. In our system model, we assume that if a client cannot find a data item in its cache, the client sends a request for the data item to the server. Thus, there are three cases for client ci:

• ci requests dj; we denote this probability by prij;

• ci has dj in its cache (and therefore does not request dj); we denote this probability by psij;

• ci does not have dj and does not request dj; we denote this probability by pnij.

It is easy to see that prij + psij + pnij = 1. We assume that the profile of data stored by individual clients follows their respective data access pattern because, in on-demand broadcast environments, the data items cached by clients are those broadcast by the server, which are, in turn, based on the requests submitted by the clients. Thus, we have the following lemma.

Lemma 1: The probability that client ci has dj in its cache, psij, satisfies psij = (1 − prij)(1 − (1 − prij)^sl), where sl is the cache size and prij is the probability that client ci requests data item dj.

Proof: Let S = {ci stores dj}. Let C = {ci requests dj}; then C̄ = {ci does not request dj}. The relationship among the above probabilities is as follows:

    psij = P(S) = P(S | C)P(C) + P(S | C̄)P(C̄)                                (1)

Recall our assumption that if client ci requests data item dj, then it does not have dj in its cache, which means P(S | C) = 0. Thus, Equation (1) can be simplified as:

    psij = P(S) = P(S | C̄)P(C̄) = P(S | C̄)(1 − prij)                          (2)

Next, we need to find P(ci stores dj | ci does not request dj), namely P(S | C̄). Since the stored data profile follows the data access pattern, given that the cache size is sl, each data item stored in the cache is dj with probability prij. Thus, the probability that the cache does not contain dj, on the condition that ci does not request dj, is (1 − prij)^sl. Therefore, P(S | C̄) = 1 − (1 − prij)^sl, and hence psij = (1 − prij)(1 − (1 − prij)^sl).

Consider two arbitrary data items dj1 and dj2 that clients may request. A client ci is regarded as contributing to the coding capacity if ci requests dj1 (dj2) and at the same time has stored dj2 (dj1) in its cache. In this case, if the server broadcasts the encoded packet dj1 ⊕ dj2, client ci can decode dj1 (dj2) from the encoded packet since it already has dj2 (dj1). Thus, we can observe that if most clients contribute to the coding capacity, the server would prefer broadcasting dj1 ⊕ dj2 to broadcasting dj1 and dj2 separately. Let the expected number of clients which contribute to the coding capacity be M(j1, j2). M(j1, j2) measures the expected number of clients that can decode the encoded packet dj1 ⊕ dj2. If M(j1, j2) is high, the server would prefer broadcasting dj1 ⊕ dj2, since a large number of clients can decode the encoded packet and obtain their requested data items. In contrast, if M(j1, j2) is low, the server would broadcast dj1 and dj2 separately without coding.
Figure 2: Expected number of clients that receive the broadcast data under different data access patterns
Thus, M(j1, j2) reflects the effectiveness of using network coding for on-demand data broadcasting.

Theorem 1: Assume there are n clients and N data items in the database. For any two requested data items dj1 and dj2, the expected number of clients that can decode the encoded packet dj1 ⊕ dj2, M(j1, j2), satisfies M(j1, j2) = Σ_{i=1..n} (prij1 · psij2 + psij1 · prij2).

Proof: There are two ways client ci may contribute to the coding capacity: (1) ci requests dj1 while storing dj2; (2) ci requests dj2 while storing dj1. For case (1), the probability is prij1 · psij2. For case (2), the probability is psij1 · prij2. Thus, the probability that ci contributes to the coding capacity is prij1 · psij2 + psij1 · prij2. Considering all clients, we sum the probabilities over every client to get the expected number of clients contributing to the coding capacity. Therefore, we have M(j1, j2) = Σ_{i=1..n} (prij1 · psij2 + psij1 · prij2).

For comparison purposes, we define E(j) to be the expected number of clients that can receive the requested data item dj without network coding, E(j) = Σ_{i=1..n} prij.

To analyze the coding capacity quantitatively, we compute numerical values of M(j1, j2), E(j1) and E(j2) using the Zipf distribution. The Zipf distribution is a popular distribution used to model the data access pattern of mobile clients in the literature [16, 17, 26]. In the Zipf distribution, the access probability of the i-th data item is (1/i^θ) / Σ_{k=1..N} (1/k^θ). Thus, prij = (1/j^θ) / Σ_{k=1..N} (1/k^θ), for 1 ≤ i ≤ n and 1 ≤ j ≤ N. Let N = 1000 and n = 300, and set dj1 and dj2 to be the two most requested data items, which correspond to d1 and d2 in the database, respectively.
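As a concrete illustration of Lemma 1 and Theorem 1, the short script below (our sketch, not part of the paper) evaluates psij from prij and then M(1, 2), E(1) and E(2) for a Zipf access pattern with the parameters quoted above (N = 1000, n = 300, cache size 60); the function names are ours, and all clients are assumed to share the same access pattern, as in the numerical example.

    # Sketch: numerical evaluation of the coding-capacity measures of Section 4.1.

    def zipf_probs(N, theta):
        weights = [1.0 / (j ** theta) for j in range(1, N + 1)]
        total = sum(weights)
        return [w / total for w in weights]          # pr[j-1] = Pr{client requests d_j}

    def store_prob(pr_j, sl):
        # Lemma 1: ps_ij = (1 - pr_ij) * (1 - (1 - pr_ij)^sl)
        return (1.0 - pr_j) * (1.0 - (1.0 - pr_j) ** sl)

    def coding_capacity(pr, n, j1, j2, sl):
        # Theorem 1: M(j1, j2) = sum_i (pr_ij1*ps_ij2 + ps_ij1*pr_ij2);
        # with identical clients this is n * (pr_j1*ps_j2 + ps_j1*pr_j2).
        ps1, ps2 = store_prob(pr[j1 - 1], sl), store_prob(pr[j2 - 1], sl)
        return n * (pr[j1 - 1] * ps2 + ps1 * pr[j2 - 1])

    N, n, sl = 1000, 300, 60
    for theta in (0.0, 0.4, 0.8):
        pr = zipf_probs(N, theta)
        M = coding_capacity(pr, n, 1, 2, sl)
        E1, E2 = n * pr[0], n * pr[1]                # expected receivers without coding
        print(f"theta={theta}: M={M:.2f}, E1={E1:.2f}, E2={E2:.2f}")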

Figure 2 shows the expected number of clients that receive the broadcast data under different data access patterns when the cache size equals 60. Values of M(j1, j2) can be evaluated using Lemma 1 and Theorem 1, while values of E(j1) and E(j2) can be obtained by definition. In Figure 2, we use M, E1 and E2 to represent M(j1, j2), E(j1) and E(j2), respectively. In the Zipf distribution, the data access pattern gets more skewed with an increasing value of THETA; when THETA equals 0, the data access pattern follows the uniform distribution and every data item has the same chance of being accessed. It can be observed that M becomes higher than E1 and E2 as the value of THETA increases. This indicates that, with network coding, more clients can receive their requested data items than without network coding when the data access pattern becomes skewed. According to the above analysis, it is clear that network coding is effective in on-demand broadcast environments. A coding strategy which can fully exploit clients' requested and cached information to maximize the coding capacity can save bandwidth and enhance system performance.

4.2. CR-graph construction

In this section, we define and construct a graph called the CR-graph G, initially proposed in [13, 14], to represent the relationship between clients and their requests. This graph is used to make encoding decisions to serve as many clients as possible in each broadcast unit and also to eliminate redundant encoding. To construct the graph, we need to utilize information about clients' requested and cached data items. As described in the last section, a client that cannot find a data item in its own cache will submit a request for the data item together with its cache content to the server. So, it is practical to assume the availability of this information at the server. Before constructing the CR-graph G, some definitions are given as follows. We assume database D contains N data items, where D = {d1, d2, ..., dN}. Let C = {c1, c2, ..., cn} be the set of mobile clients' IDs in the on-demand broadcast system.

Definition 4.1. Let Si = {dαi(1), dαi(2), ..., dαi(|Si|)} be the set of cached data items of client ci, which is a subset of the database retrieved from encoded packets broadcast by the server in the past; |Si| denotes the number of data items stored in client ci's cache and 1 ≤ |Si| ≤ N. Note that 1 ≤ αi(ε) ≤ N and 1 ≤ ε ≤ |Si|.
Figure 3: An example of CR-graph G construction ((a) an on-demand data broadcast scenario with server S and clients c1–c8, where clients c1, c2, ..., c8 request d1, d3, d3, d1, d2, d1, d4 and d2, respectively, and each client holds a set of cached data items; (b) the corresponding CR-graph G with vertices v11, v23, v33, v41, v52, v61, v74 and v82)
Definition 4.2. Let Qi = {dj} be the request for data item dj issued by client ci, where 1 ≤ j ≤ N. Note that Qi ∩ Si = ∅, i.e., the requested data item cannot be found in the cache.

In the CR-graph G(V, E), each vertex represents a data item requested by a client. That is, for a client ci who requests data item dj, there is a corresponding vertex vij ∈ V(G), where 1 ≤ i ≤ n and 1 ≤ j ≤ N. For any two different vertices vi1j1 and vi2j2 ∈ V(G), an undirected edge (vi1j1, vi2j2) ∈ E(G) exists under either of the following conditions.

• If j1 = j2; that is, if client ci1 and client ci2 request the same data item, there is a link between vertices vi1j1 and vi2j2.

• If j1 ≠ j2, dj2 ∈ Si1 and dj1 ∈ Si2; that is, if client ci1's cache contains the data item being requested by client ci2 and vice versa, there is a link between vertices vi1j1 and vi2j2.

Consider an on-demand data broadcast scenario which consists of a server S and eight mobile clients c1, c2, ..., c8, as shown in Figure 3(a). Each client stores some data items in its local cache and requests a data item by issuing a request to the server. Figure 3(b) shows the CR-graph, constructed on the basis of the clients' states in Figure 3(a). For instance, there is an edge (v11, v41) because both c1 and c4 request the same data item d1. There is an edge (v11, v23) because c1 has d3, which is requested by c2, and c2 has d1, which is requested by c1. A construction of this graph is sketched after this paragraph.
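The following is a rough Python sketch (ours, not the paper's code) of the CR-graph construction for a scenario shaped like Figure 3(a). The requests follow the table in Figure 3(a); the cache contents are illustrative assumptions chosen so that the clique highlighted in Figure 3(b) appears.

    # Sketch: building the CR-graph of Section 4.2 from clients' requests and caches.
    from itertools import combinations

    # client id -> (requested item, set of cached items); a request is never in the cache.
    clients = {
        1: ("d1", {"d2", "d3"}),          # cache contents are example values (assumption)
        2: ("d3", {"d1", "d2"}),
        3: ("d3", {"d1", "d2", "d4"}),
        4: ("d1", {"d2", "d3", "d5"}),
        5: ("d2", {"d1", "d3", "d4"}),
        6: ("d1", {"d2", "d3", "d6"}),
        7: ("d4", {"d2", "d3"}),
        8: ("d2", {"d1", "d4", "d6"}),
    }

    vertices = {i: req for i, (req, _) in clients.items()}    # vertex v_{i,j}: client i requests d_j
    edges = set()
    for i1, i2 in combinations(clients, 2):
        (j1, cache1), (j2, cache2) = clients[i1], clients[i2]
        same_request = (j1 == j2)                              # edge rule 1
        mutual_cache = (j2 in cache1) and (j1 in cache2)       # edge rule 2
        if same_request or mutual_cache:
            edges.add((i1, i2))

    print("vertices:", vertices)
    print("edges:", sorted(edges))

With these example caches, the vertices of clients 1–6 form a clique, so one packet d1 ⊕ d2 ⊕ d3 would serve all six of them, matching the example discussed next.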

4.3. Coding based on graph theory

According to the construction of the CR-graph G, we can deduce the following lemma from [13].

Lemma 2: For e = (vi1j1, vi2j2) ∈ E(G), (1) if j1 = j2, client ci1 and client ci2 can both retrieve dj1 if the server broadcasts dj1 alone; (2) if j1 ≠ j2, client ci1 can derive dj1 and client ci2 can derive dj2 if the server broadcasts dj1 ⊕ dj2 (bit-by-bit XOR).

Proof: Due to the nature of the broadcast mechanism, one broadcast can serve all clients who have requested the same data item. Therefore, (1) is obvious. In the case of j1 ≠ j2, according to the construction of G, e = (vi1j1, vi2j2) ∈ E(G) means that client ci1's cache contains dj2 and client ci2's cache contains dj1. Therefore, if the server broadcasts dj1 ⊕ dj2, client ci1 can derive its requested data item dj1 by dj2 ⊕ (dj1 ⊕ dj2) and client ci2 can derive its requested data item dj2 by dj1 ⊕ (dj1 ⊕ dj2). Thus, (2) holds.

Let δ = {vi1j1, vi2j2, ..., vikjk} be an arbitrary clique in G, where δ ⊆ V(G) and |δ| = k. A clique is a subset of vertices such that any two vertices in the subset are connected. Let Cδ = {ci | vij ∈ δ} be the set of clients covered in δ, where Cδ ⊆ C. Let Dδ = {dδ(j) | vij ∈ δ} be the set of requested data items in δ; thus, Dδ = {dδ(1), dδ(2), ..., dδ(|Dδ|)}, where 1 ≤ |Dδ| ≤ k.

Lemma 3: By broadcasting the encoded packet γ = dδ(1) ⊕ dδ(2) ⊕ ... ⊕ dδ(|Dδ|), any ci ∈ Cδ can derive its requested data item dj from the broadcast encoded packet γ if its corresponding vij ∈ δ.

Proof: Since δ is a clique, each vertex in δ has an edge to every other vertex in the clique. In addition, each vertex represents a unique client. Therefore, according to the edge definition of G, each client ci ∈ Cδ must have in its cache all data items in Dδ except its requested data item dj. Therefore, client ci can derive its requested data item dj by decoding the received encoded packet γ with its stored data items.

Lemma 3 indicates that one encoded packet can serve all clients in a clique δ ⊆ V(G), and the number of served clients is equal to the number of vertices in the clique (the clique size). Thus, according to Lemmas 2 and 3, the problem of serving the maximum number of clients with each encoded packet in each broadcast unit can be transformed into the problem of finding the maximum clique in the CR-graph G. An example of a maximum clique δmax is shown in Figure 3(b), in which the vertices are connected by thick edges and δmax = {v11, v23, v33, v41, v52, v61}, Cδmax = {c1, c2, c3, c4, c5, c6} and Dδmax = {d1, d2, d3}. Therefore, instead of

broadcasting d1, d2 and d3 in separate broadcast units (one broadcast unit for each data item), broadcasting the encoded packet γ = d1 ⊕ d2 ⊕ d3 in one broadcast unit can serve most of the clients in the graph G. By serving a greater number of clients in each broadcast unit, it is expected that the average request response time can be reduced substantially.

5. Proposed algorithms

5.1. Motivation

In traditional data broadcast systems, when a data item is broadcast, only those mobile clients who request that particular data item can be served. However, some recent studies have shown that, by using network coding, clients requesting different data items can be served simultaneously by broadcasting an encoded packet. In this section, an effective coding strategy based on the CR-graph, called AC, is proposed to fully exploit broadcast bandwidth by serving as many clients as possible with an encoded packet. In addition, it has been demonstrated by various researchers that data scheduling can enhance broadcasting performance. However, to the best of our knowledge, there is no existing solution that effectively integrates data scheduling and network coding for on-demand data broadcast. We now propose two algorithms to exploit the advantages of combining network coding and data scheduling. ADC-1 considers scheduling and coding in two separate phases, while ADC-2 merges them into a single step. Moreover, we also introduce a number of mechanisms to reduce the overheads and computational complexity of the algorithms so that they can be deployed in actual systems.

5.2. Proposed coding strategy

In this section, based on the CR-graph defined in the last section, we propose a coding strategy called AC (Adaptive Coding). Given a candidate vertex, AC constructs a sub-graph G' of G, as defined below. We will explain how this candidate vertex is identified in our proposed algorithms in the next sub-section. Recall that a vertex represents a data item requested by a client.





Definition 5.1. Let G'(V', E') be a sub-graph of G; edge connections between vertices in G' follow the rules of construction of G. Only vertices that satisfy the following conditions are included in G':


• The candidate vertex vmk (denoting data item dk requested by client cm) should be included in graph G'.

• Vertices connected to vmk should be included in graph G'.

Next, we find the maximum clique in graph G' which covers the candidate vertex, as defined below.

Definition 5.2. Let δmax^vij = {vi1j1, vi2j2, ..., vikjk} be the maximum clique of the candidate vertex vij, where vij ∈ δmax^vij. |δmax^vij| denotes the size of the maximum clique, with value k. As proved in Lemma 2, |δmax^vij| denotes the number of clients that can be served by broadcasting the encoded packet generated from the maximum clique.

It is well known that finding the maximum clique in a graph is NP-hard, and many efficient approximation algorithms have been proposed in the literature [27, 28, 29]. AC adopts the approach presented in [27] to find the maximum clique in a graph, and the time complexity of this algorithm is polynomial. The pseudo code of the coding strategy AC for a candidate vertex is presented in Algorithm 1.

There are a number of advantages to the proposed coding strategy. Firstly, by transforming the coding problem into the problem of finding the maximum clique in graph theory, AC can flexibly encode any number of data items covered by the maximum clique in each broadcast unit. Secondly, AC is dynamic and adaptive to the system environment. For instance, AC is more likely to find a larger maximum clique in the CR-graph if clients are equipped with larger caches. Last but not least, given a candidate vertex, AC constructs a sub-graph covering the candidate vertex without the need to construct the whole CR-graph. According to the definition of the CR-graph, the time complexity of constructing the whole CR-graph is O(n² − n), while the time complexity of constructing the sub-graph in AC is O(n − 1), where n is the number of clients who have sent requests to the server. This helps prune the search space and reduce computation cost.

5.3. Proposed coding assisted algorithms

In this section, we propose two coding assisted algorithms. ADC-1 considers scheduling and coding in two separate phases. In contrast, ADC-2 integrates scheduling and coding to make a broadcast decision.

Algorithm 1 Coding Strategy AC

Input: V(G) and candidate vertex vmk
Output: The maximum clique of candidate vertex vmk

Step 1: Construct the undirected graph G'(V', E') according to V(G) and candidate vertex vmk
  // Initialize V'(G') with the candidate vertex vmk
  V'(G') ← V'(G') + vmk
  for each vij ∈ V(G) do
    if j = k then
      V'(G') ← V'(G') + vij
      E'(G') ← E'(G') + e(vmk, vij)
    end if
    if dk ∈ Si AND dj ∈ Sm then
      V'(G') ← V'(G') + vij
      E'(G') ← E'(G') + e(vmk, vij)
    end if
  end for

Step 2: Find and return the maximum clique δmax in graph G' where vmk ∈ δmax
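For readers who prefer running code, the following is a rough Python sketch of AC under our own simplified data model (the client -> (requested item, cached set) mapping used in the earlier CR-graph sketch). The exhaustive clique search below is only meant for tiny sub-graphs; it is not the approximation algorithm of [27], and all names are ours.

    # Sketch of the AC strategy (Algorithm 1); illustrative only, brute-force clique search.
    from itertools import combinations

    def connected(clients, i1, i2):
        (j1, c1), (j2, c2) = clients[i1], clients[i2]
        return j1 == j2 or (j2 in c1 and j1 in c2)

    def build_subgraph(clients, m):
        """Step 1: vertices of G' = candidate vertex plus the vertices adjacent to it."""
        k, cache_m = clients[m]
        vertices = [m]
        for i, (j, cache_i) in clients.items():
            if i != m and (j == k or (k in cache_i and j in cache_m)):
                vertices.append(i)
        return vertices

    def ac(clients, m):
        """Step 2: return the largest clique of G' that contains the candidate vertex."""
        others = [v for v in build_subgraph(clients, m) if v != m]
        for size in range(len(others), 0, -1):
            for combo in combinations(others, size):
                cand = [m, *combo]
                if all(connected(clients, a, b) for a, b in combinations(cand, 2)):
                    return cand
        return [m]

    example = {1: ("d1", {"d2"}), 2: ("d2", {"d1"}), 3: ("d3", {"d1"})}
    print(ac(example, 1))    # -> [1, 2]: the packet d1 XOR d2 serves clients 1 and 2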


Table 2: Summary of Notations

Notation    Description
Ndk         Request frequency of data item dk
Wdk         The amount of time that the oldest outstanding request for data item dk has spent in the service queue
Pdk         Priority of data item dk

5.3.1. Algorithm ADC-1

We propose the first coding assisted algorithm, called ADC-1 (Adaptive Demand-oriented Coding), in this section. The proposed algorithm includes two phases: a scheduling phase and a coding phase. In the scheduling phase, one of the best-performing scheduling algorithms in the literature, RxW [16], is used to find the data item with the highest priority. The priority of a data item is the product of the data item's broadcast productivity and the waiting time of the oldest pending request for the data item. In the coding phase, the coding strategy AC is invoked to find the maximum clique of the highest-priority data item identified in the scheduling phase. Lastly, ADC-1 adopts the bitwise XOR operation to generate the encoded packet for all data items included in the maximum clique. As a result, ADC-1 not only combines the strength of FCFS and MRF to earn a performance gain but also exploits the merits of AC, as described in the last section, to boost performance. Like conventional scheduling algorithms for data broadcast, ADC-1 is implemented at the server side only. The pseudo code of ADC-1 is presented in Algorithm 2, and the primary notations used in the pseudo code are summarized in Table 2.

In the scheduling phase, ADC-1 needs to search all requested data items to identify the candidate data item for encoding. The time complexity of finding this candidate data item is linear in M, the number of requested data items in the system. In the coding phase, ADC-1 invokes AC once, according to the candidate data item. The time complexity of AC is polynomial. Thus, the total time complexity of ADC-1 is polynomial.

5.3.2. Proposed algorithm ADC-2

Although ADC-1 considers both scheduling and coding, this is not real integration because the two mechanisms are carried out in separate phases. The outcome of the scheduling phase becomes the input to the coding phase. Each phase maximizes its own performance gain in isolation.

Algorithm 2 ADC-1

Arrival of a new request Qi for dk:
  // Update the request frequency of dk
  Ndk ← Ndk + 1
  // Add dk's corresponding vertex into V(G)
  V(G) ← V(G) + vik

Generation of encoded packet:
  // Scheduling phase (using RxW)
  maxPriority ← 0
  for each requested data item dk in the service queue do
    Pdk ← Ndk * Wdk
    if Pdk > maxPriority then
      maxPriority ← Pdk
      Selected data item ← dk
    end if
  end for
  // Coding phase (using AC)
  The vertex that corresponds to the selected data item with the longest waiting time is identified as the candidate vertex vmk
  // Generate the encoded packet γ for the candidate vertex vmk
  Invoke AC to find the maximum clique of the candidate vertex vmk
  Compute Dδmax of δmax
  γ = dδmax(1) ⊕ dδmax(2) ⊕ ... ⊕ dδmax(|Dδmax|), where dδmax(i) ∈ Dδmax
  Broadcast the encoded packet γ

After the encoded packet γ has been broadcast:
  // Update the request frequency of dk and V(G) according to the vertices in the maximum clique δmax
  for each vij ∈ δmax do
    Ndj ← Ndj − 1
    V(G) ← V(G) − vij
  end for
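A compact sketch of how the two phases of Algorithm 2 fit together is given below; it is our illustration, not the paper's implementation. Here `requests` (a list of (client_id, item, arrival_time)), `caches` and the injected `find_max_clique` stub (which would play the role of AC) are our own assumptions.

    # Sketch of ADC-1 (illustrative): RxW scheduling phase followed by an AC-style coding phase.

    def adc1_packet(requests, caches, now, find_max_clique):
        # Scheduling phase (RxW): priority of item d = (#pending requests for d) x (wait of oldest)
        arrivals = {}
        for _, item, t in requests:
            arrivals.setdefault(item, []).append(t)
        selected = max(arrivals, key=lambda d: len(arrivals[d]) * (now - min(arrivals[d])))

        # Coding phase: the longest-waiting request for the selected item is the candidate vertex
        candidate = min((r for r in requests if r[1] == selected), key=lambda r: r[2])
        clique = find_max_clique(requests, caches, candidate)
        return {item for (_, item, _) in clique}     # data items to XOR into one encoded packet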


For instance, the scheduling phase does not consider the gain due to encoding of multiple data items, while the coding phase ignores scheduling criteria such as longest-waiting-time-first in its decision-making process. In particular, ADC-1 finds the maximum clique for only the candidate vertex identified by RxW in the scheduling phase. Obviously, this cannot fully exploit the effectiveness of network coding. Thus, we propose another coding assisted algorithm called ADC-2. The core idea of ADC-2 is to schedule the encoded data item that can maximize the effectiveness of network coding while preserving the original scheduling criteria. On the one hand, to maximize the effectiveness of network coding, we find the vertex with the largest maximum clique size. This also accounts for broadcast productivity, which is one of the scheduling criteria in RxW. On the other hand, a weight, as defined below, is assigned to each of the vertices to represent the waiting time of the corresponding request, which is the other scheduling criterion in RxW.

Definition 5.3. Let WTvij be the weight of vertex vij. It is defined as the amount of time for which request Qi for data item dj has been waiting at the server. It reflects the urgency of the requested data item.

From the viewpoint of scheduling, request productivity and request urgency are two important factors when making scheduling decisions, i.e., it is inadequate for a scheduling algorithm to consider only one of these factors. In the proposed coding assisted algorithm, |δmax^vij| reflects broadcast productivity because it denotes the number of clients that will be satisfied by the broadcast of the encoded packet. In order to minimize the response time, we make the following intuitive observations:

• Given two vertices with the same weight (WT), the vertex with the larger maximum clique size (|δmax^vij|) should be chosen, in order to serve more clients with a single encoded packet.

• Given two vertices with the same maximum clique size (|δmax^vij|), the vertex with the higher weight (WT) should be chosen, to help reduce the response time by increasing the probability that requests which have been waiting longer are served.

Motivated by the above observations, we propose to assign the following priority Pvij to each vertex vij.


Definition 5.4. Let Pvij be the priority of a vertex vij. Pvij is defined as Pvij = |δmax^vij| * WTvij. Note that the maximum clique δmax^vij which covers the vertex vij with the highest priority Pvij is used to generate the encoded packet for broadcasting.

Definition 5.5. Let Dgvij be the degree of vertex vij, which is the number of vertices connected to vertex vij.

A straightforward implementation of ADC-2 is to compute, in each broadcast unit, the priority of all vertices in the graph and generate the encoded packet from the maximum clique which covers the vertex with the highest priority. However, such an implementation has a high computational cost because the time complexity of finding a maximum clique is polynomial [27], and it is necessary to find the maximum clique of every vertex in the graph in order to compute its priority (Def. 5.4). An efficient implementation of ADC-2 is presented next.

We introduce two mechanisms in ADC-2 that make it efficient and practical. First, inspired by [16], an efficient data structure and a pruning technique are used to reduce the search space. In other words, ADC-2 examines fewer vertices when finding the vertex with the highest priority. The second mechanism is based on the fact that the degree of a vertex (Def. 5.5) is the upper bound of its maximum clique size. Recall that every two vertices in a clique are connected. By using the degree of a vertex, it is possible to skip computation of the priority of some vertices under examination. Details of the proposed efficient implementation of ADC-2 are described below.

When a new request arrives at the server, we add the request's corresponding vertex into V(G). ADC-2 uses two sorted lists, named the D-list and the W-list, to index the vertices in V(G). In the D-list, vertices are sorted in descending order of the Dg value. In the W-list, vertices are sorted in descending order of the WT value. Initially, the search starts by examining the vertex at the head of the D-list and setting MAX to its priority, denoting the highest priority of a vertex found so far. The WT values can then be bounded using the Dg value of the next vertex in the D-list. Since the D-list is sorted in descending order of the Dg value, for any unexamined vertex in the W-list to have a priority greater than MAX, it must have a WT value greater than limit(WT) = MAX / NextDg, where NextDg is the Dg value of the next unexamined vertex in the D-list. Since the W-list is sorted in descending order of the WT value, this limit marks a point in the W-list below which no vertex can exceed

the current highest priority and, therefore, no search is required. Next, the vertex at the head of the W-list is examined and limit(Dg) = MAX / NextWT, where NextWT is the WT value of the next unexamined vertex in the W-list. Then, ADC-2 searches the D-list and the W-list in an alternating fashion and updates the two limits accordingly. The MAX value is updated only when a vertex with a priority higher than MAX is found. Recall that the Dg value of a vertex is the upper bound of its maximum clique size. In other words, the priority of a vertex cannot exceed its Dg * WT value. Thus, ADC-2 can skip computation of the priority of an examined vertex that has a Dg * WT value smaller than MAX, and hence there is no need to find the maximum clique of this examined vertex. The search process stops when one of the limits is passed. At this point, MAX is the highest priority of all vertices in V(G), and the maximum clique that covers the vertex with this highest priority is used to generate the encoded packet for broadcasting. The pseudo code of algorithm ADC-2 is presented in Algorithm 3.

Figure 4 shows an example of the efficient implementation of ADC-2. In the example, the D-list and W-list are two separate lists indexed to the set of vertices. First, we examine the first vertex v52 in the D-list. MAX is set to the priority of v52. Suppose the maximum clique size of v52 is 40, which is smaller than its degree value (100). Thus, we set MAX = |δmax^v52| * WTv52 = 40 * 5 = 200 and limit(WT) = MAX / Dgv33 = 200/60 = 3.33. That is, there is no need to search vertices with WT values smaller than 3.33 in the W-list, because they cannot have a priority higher than MAX. Alternately, we examine the first vertex v67 in the W-list. The Dg * WT value of vertex v67 is 180, which is smaller than MAX. Therefore, there is no need to calculate the priority of v67, and MAX remains unchanged. We then set limit(Dg) = MAX / WTv86 = 200/75 = 2.67; similarly, there is no need to search vertices with Dg values smaller than 2.67 in the D-list. Next, we examine the second vertex v33 in the D-list. The Dg * WT value of vertex v33 is 1200, which is greater than MAX, so we need to calculate the priority of v33. Suppose the maximum clique size of v33 is 45. Then the priority of v33 is 45 * 20 = 900, which is greater than MAX, so we update MAX to 900 and set limit(WT) = MAX / Dgv12 = 900/25 = 36. Then, the second vertex v86 in the W-list is examined. The Dg * WT value of v86 is 300, which is smaller than MAX, so there is no need to calculate the priority of v86 and MAX remains unchanged. We then set limit(Dg) = MAX / WTv93 = 900/35 = 25.7. The search is stopped here since the next unexamined vertex in the D-list has a Dg value smaller than limit(Dg).

Algorithm 3 ADC-2

Arrival of a new request Qi for dk:
  // Add dk's corresponding vertex into V(G)
  V(G) ← V(G) + vik
  Link vik and its Dg value in the D-list in descending order of Dg value
  Link vik and its WT value in the W-list in descending order of WT value

Generation of encoded packet:
  // Find the maximum clique that covers the vertex with the highest priority
  MAX ← 0
  pd ← the head of the D-list
  pw ← the head of the W-list
  limit(Dg) ← 0
  limit(WT) ← 0
  while pd ≠ NULL OR pw ≠ NULL do
    if pd→Dg ≥ limit(Dg) then
      if pd→Dg * pd→WT > MAX then
        Invoke AC to find the maximum clique of the vertex pointed to by pd
        Calculate the priority P of the vertex pointed to by pd
        if P > MAX then
          MAX ← P
          δmax ← the maximum clique of the vertex with MAX
        end if
      end if
      Advance pd to the next unexamined vertex in the D-list
      if pd ≠ NULL then
        limit(WT) ← MAX / (pd→Dg)
      end if
    else
      break    // pd→Dg < limit(Dg)
    end if
    if pw→WT ≥ limit(WT) then
      if pw→Dg * pw→WT > MAX then
        Invoke AC to find the maximum clique of the vertex pointed to by pw
        Calculate the priority P of the vertex pointed to by pw
        if P > MAX then
          MAX ← P
          δmax ← the maximum clique of the vertex with MAX
        end if
      end if
      Advance pw to the next unexamined vertex in the W-list
      if pw ≠ NULL then
        limit(Dg) ← MAX / (pw→WT)
      end if
    else
      break    // pw→WT < limit(WT)
    end if
  end while
  // Generate the encoded packet γ according to δmax
  Compute Dδmax of δmax
  γ = dδmax(1) ⊕ dδmax(2) ⊕ ... ⊕ dδmax(|Dδmax|), where dδmax(i) ∈ Dδmax
  Broadcast the encoded packet γ

After the encoded packet γ has been broadcast:
  // Update V(G) according to the vertices in the maximum clique δmax
  for each vij ∈ δmax do
    V(G) ← V(G) − vij
  end for
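The alternating D-list/W-list search is the part of Algorithm 3 that benefits most from a concrete rendering. Below is a self-contained sketch (ours, not the authors' code) of that pruning loop under a simplified data model: each vertex carries a precomputed Dg and WT value, and `clique_size` stands in for a call to AC. The example data reproduce the Figure 4 walk-through described above.

    # Sketch of ADC-2's pruned search for the highest-priority vertex (illustrative only).
    # Each vertex: (name, Dg, WT). clique_size(v) would invoke AC; here it is a stub.

    def adc2_search(vertices, clique_size):
        d_list = sorted(vertices, key=lambda v: v[1], reverse=True)   # by degree Dg
        w_list = sorted(vertices, key=lambda v: v[2], reverse=True)   # by waiting time WT
        max_p, best, pd, pw = 0, None, 0, 0
        limit_dg = limit_wt = 0.0
        turn = "D"
        while pd < len(d_list) or pw < len(w_list):
            if turn == "D" and pd < len(d_list):
                v = d_list[pd]
                if v[1] < limit_dg:
                    break                               # no unexamined vertex can beat MAX
                if v[1] * v[2] > max_p:                 # Dg*WT is an upper bound on priority
                    p = clique_size(v) * v[2]
                    if p > max_p:
                        max_p, best = p, v
                pd += 1
                if pd < len(d_list):
                    limit_wt = max_p / d_list[pd][1]
            elif pw < len(w_list):
                v = w_list[pw]
                if v[2] < limit_wt:
                    break
                if v[1] * v[2] > max_p:
                    p = clique_size(v) * v[2]
                    if p > max_p:
                        max_p, best = p, v
                pw += 1
                if pw < len(w_list):
                    limit_dg = max_p / w_list[pw][2]
            turn = "W" if turn == "D" else "D"
        return best, max_p

    # Data mirroring the Figure 4 walk-through: only v52 and v33 need a clique computation.
    sizes = {"v52": 40, "v33": 45, "v67": 2, "v86": 4, "v12": 25, "v93": 8}
    verts = [("v52", 100, 5), ("v33", 60, 20), ("v12", 25, 15), ("v67", 2, 90),
             ("v86", 4, 75), ("v93", 8, 35)]
    print(adc2_search(verts, lambda v: sizes[v[0]]))    # -> (('v33', 60, 20), 900)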
Figure 4: Numerical example of pruning the search space (vertices indexed in a D-list sorted by Dg value and in a W-list sorted by WT value)
Thus, MAX is the highest priority among all vertices, and the maximum clique which covers vertex v33 is used to generate the encoded packet. In total, we only need to examine four vertices to find the vertex with the highest priority for encoding, in spite of the large number of vertices in both lists. Moreover, among these four vertices, we need to find the maximum clique of only two of them. Altogether, these two mechanisms can effectively reduce the computational complexity of ADC-2 and make it efficient and practical.

6. Performance Evaluation

6.1. Simulation model

We conducted a detailed set of experiments for evaluating performance, using a simulation model written in CSIM [30]. The model is based on the system architecture described in Section 3. The main parameters and their settings for the experiments are shown in Table 3, and these default values are used in the simulation experiments unless stated otherwise. Each client is simulated by a process which generates a stream of requests. Upon the completion of a request, a new request is generated after an exponentially distributed think time, which is controlled by the parameter f. A lower value of f means a shorter think time and thus a heavier workload. The data access pattern is shaped with the Zipf distribution [31], with a skewness parameter θ, where 0 ≤ θ ≤ 1.

Table 3: System Parameter Settings

Parameter    Default    Range      Description
f            0.1        0.05–2     Think time control parameter
NUMCLIENT    300        100–500    Number of clients
DBSIZE       800        450–1400   Number of data items in the database
CACHESIZE    160        30–280     Maximum number of data items stored in each client's cache
θ            0.8        0–1.0      Zipf distribution parameter

Upon receiving a requested data item broadcast on the channel, a client stores the data item in its local cache, and the LRU (Least Recently Used) policy is used for cache replacement.

6.2. Performance metrics

In the simulation, we use the following metrics for performance evaluation and analysis. Note that statistics are collected only for requests submitted to the server. Requested data items that can be found in a client's local cache are ignored. This excludes the performance gain attributed to the improved data availability provided by the cache.

1) Average request response time. This is the duration from the instant a request arrives at the server to the time when the required data item is received. It is an important criterion for evaluating the performance of an algorithm because it measures the responsiveness of a system. The primary goal of a broadcast algorithm is to minimize the average request response time.

2) Average number of data items encoded in each broadcast unit. This is the average number of data items selected to form an encoded packet in each broadcast unit. This metric measures the flexibility of the coding strategy, which reflects its ability to utilize the clients' states effectively.

3) Broadcast productivity. This is the average number of requests satisfied by each broadcast. This metric measures the productivity of the server in serving requests and, for coding assisted algorithms, reflects the combined effect of data scheduling and coding on overall system performance.

6.3. Experimental results

In this section, we present the results of performance tests of the different algorithms based on a detailed simulation study of an on-demand broadcast system. The results are obtained when the system was in a steady state, and the simulations continued until a confidence interval of 0.95 with half-width less than 5% about the mean had been achieved. For comparison purposes, we also implemented the algorithms RxW [16] and OE [19]. RxW is one of the best-performing scheduling algorithms for on-demand broadcast environments, and it outperforms FCFS, MRF and LWF in a variety of circumstances. OE is a recently proposed coding assisted algorithm that considers both scheduling and coding.

1. Effect of Cache Size

Figure 5(a) shows the average request response time of each algorithm under different cache sizes. Since RxW does not consider network coding in scheduling decisions, the client's cache size has no effect on its performance. Compared with the coding assisted algorithms, RxW performs the worst. This confirms that network coding is effective in improving the performance of data broadcast systems. Among the three coding assisted algorithms, performance improves when the cache size increases. OE has the smallest gain, ADC-1 ranks second and ADC-2 has the best performance. When the cache size is small, all coding assisted algorithms perform similarly. With an increasing cache size, the improvement in reducing the average request response time in ADC-1 and ADC-2 becomes more prominent, while the improvement in OE is inconspicuous and its performance levels off when the cache size is larger than 200. The results confirm our earlier discussion of the weaknesses of OE. First, OE is inflexible, as it encodes a small fixed number of data items in each broadcast unit. Second, OE is not adaptive: with an increasing cache size, OE fails to utilize the cache information effectively in making coding decisions. Thus, OE cannot fully realize the effectiveness of network coding in data broadcast environments, whereas ADC-1 and ADC-2 overcome the weaknesses of OE and encode a variable number of data items in each broadcast unit to maximize the effectiveness of network coding in dynamic circumstances. Therefore, ADC-1 and ADC-2 perform significantly better than OE.

Figure 5(b) analyzes the coding power of OE, ADC-1 and ADC-2 by showing the average number of data items encoded in each broadcast unit under different cache sizes. As the cache size increases, there is more room for performance gain due to coding. This is because the chance of a client's cache storing a data item being requested by other clients becomes higher as the cache size increases. Figure 5(b) reflects the ability of a coding strategy to exploit such opportunities to enhance its coding power.
Figure 5: Performance under different cache sizes ((a) average request response time; (b) average number of data items encoded in each broadcast unit; (c) distribution of the number of data items encoded in a packet for the default setting; (d) broadcast productivity)
When the cache is small, the number of data items encoded in each broadcast unit in the proposed schemes is very close to that in OE. This explains why they perform similarly in Figure 5(a) when the cache is small. With an increasing cache size, ADC-1 and ADC-2 encode more data items in each broadcast unit than OE. In other words, the proposed schemes can effectively exploit more coding opportunities. Also, ADC-2 consistently encodes more data items in each packet than ADC-1, implying that, although both algorithms adopt the same coding strategy and scheduling criteria, the integrated approach of ADC-2 can better exploit the combined strength of data scheduling and network coding. To facilitate better understanding, Figure 5(c) demonstrates the coding flexibility of the proposed algorithms statistically by showing the distribution of the number of data items encoded in a packet under the default setting. Recall that OE always encodes two data items in each broadcast unit. In contrast, ADC-1 and ADC-2 encode varying numbers of data items in packets.

Figure 5(d) shows the broadcast productivity of each algorithm under different cache sizes. This metric is used to measure the productivity of the server, and thus the overall performance of an algorithm. For a coding assisted algorithm, two factors contribute to broadcast productivity. First, due to the intrinsic nature of data broadcast systems, a data item can serve more than one pending request for the same data item at a time. Second, using network coding, clients requesting different data items can also be served simultaneously by an encoded packet. As shown in Figure 5(d), RxW has the lowest broadcast productivity because it does not adopt any coding strategy to increase broadcast productivity. Among the three coding assisted algorithms, OE has the worst performance. It does not show much improvement when the cache size becomes larger. Again, this indicates that OE is less effective with a large cache because its coding mechanism is inflexible and not adaptive; therefore, it cannot help to maximize the broadcast productivity. In contrast, ADC-2 achieves the highest broadcast productivity among all the algorithms. On the one hand, it effectively exploits the information on clients' states and, therefore, the opportunities for encoding, in order to maximize the number of clients served with different data items. On the other hand, it merges network coding into data scheduling and further exploits the synergy between the two mechanisms. As a result, ADC-2 serves the largest number of clients with each encoded packet.
Figure 6: Performance under different database sizes ((a) average request response time; (b) average number of data items encoded in each broadcast unit; (c) broadcast productivity)
2. Effect of Database Size

Figure 6 shows the performance of each algorithm under different database sizes. With an increasing database size, data access is distributed over a larger number of data items and the chance of accessing each data item is therefore reduced. Consequently, the number of pending requests that can be served by a broadcast data item is reduced and more bandwidth is required to serve all requests. Similarly, the coding opportunity is also reduced. It can be observed in Figure 6(b) that fewer data items are encoded in each broadcast with an increasing database size for all coding assisted algorithms. As a result, increasing the database size inevitably reduces the broadcast productivity of the server, which can be observed in Figure 6(c). Similar to the last set of experiments, RxW, which does not consider network coding, performs the worst over the whole range. Among the three coding assisted algorithms, ADC-2 performs the best, ADC-1 ranks second and OE comes last.

3. Effect of System Workload

Figure 7 shows the performance of each algorithm under different numbers of clients. More clients submitting requests indicates a heavier workload. As shown in Figure 7(a), the average request response time increases when the workload becomes heavy. Given a certain database size, increasing the number of clients increases the number of times each data item is accessed. Therefore, in spite of a heavier workload, more clients provide more coding opportunities (Figure 7(b)), implying a higher broadcast productivity (Figure 7(c)) because a broadcast can potentially satisfy more requests. Among all algorithms, ADC-1 and ADC-2 have the best performance.

4. Effect of Data Access Pattern

Figure 8 shows the performance of each algorithm under different data access patterns. When THETA equals 0, the data access pattern follows the Uniform distribution and the probability of every data item being accessed is the same. The data access pattern becomes more and more skewed as the value of THETA increases. From Figure 8, it is evident that the performance of each algorithm improves when THETA increases. According to previous studies on data broadcast [17, 26, 32], scheduling algorithms perform better when the data access pattern becomes more skewed because there is a higher potential to satisfy more requests in each broadcast. The same trend can be observed in Figure 8 for all the algorithms.

[Figure 7: Performance under different numbers of clients. Panels: (a) average request response time; (b) average number of data items encoded in each broadcast unit; (c) broadcast productivity.]

[Figure 8: Performance under different data access patterns. Panels: (a) average request response time; (b) average number of data items encoded in each broadcast unit; (c) broadcast productivity.]

Consistent with previous results, the relative performance of the algorithms remains unchanged in this set of experiments.

7. Conclusion

In most studies on on-demand data broadcast, mobile users can retrieve only one data item in each broadcast unit. However, this constraint restricts full utilization of the limited broadcast bandwidth. In this paper, we apply network coding to data broadcast and analyze the coding problem in on-demand broadcast environments. The CR-graph is used to capture the relationships among all requests based on the information about clients' cached and requested data items. We transform the coding problem of serving the maximum number of clients with each encoded packet into the problem of finding the maximum clique in the CR-graph. Based on the CR-graph, a flexible coding strategy called AC is developed. Then, two novel on-demand broadcast algorithms that combine request scheduling and network coding, called ADC-1 and ADC-2, are proposed. ADC-1 has two distinct phases: a candidate request with the highest priority is identified in the scheduling phase, and in the coding phase AC is invoked to find the maximum clique containing this candidate request, so that all requests covered by this clique can be served by one encoded packet in a single broadcast unit. In contrast, ADC-2 merges data scheduling and network coding into a single step in order to fully exploit their combined strength. In addition, an efficient implementation of ADC-2 is proposed to reduce the computational complexity of the algorithm. Simulation results show that both ADC-1 and ADC-2 considerably outperform conventional as well as other coding assisted algorithms over a wide variety of settings, and that ADC-2 has the best overall performance.

The efficiency of network coding in data broadcast environments relies heavily on knowledge of clients' requested and cached data items. Since the caching pattern and cache replacement policy adopted at a client affect its cache content, which in turn determines which data items the client will request, client-side management issues play an important role in coding assisted data broadcast systems. In this work, we have studied the coding problem on the server side. In our future work, we plan to study the caching problem on the client side and the combined effect of broadcast strategy and cache management on the overall performance.
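
To make the two-phase structure of ADC-1 more concrete, the sketch below outlines one broadcast round in Python. It is only an illustration under simplifying assumptions: the RxW-style priority, the pairwise CR-graph test and the greedy clique expansion are stand-ins for the scheduling criterion, CR-graph construction and maximum clique search developed in this paper, and all names are hypothetical.

    # Illustrative sketch of one ADC-1 broadcast round (not the authors' implementation).
    # A request is a tuple (client_id, wanted_item, arrival_time, cached_items).

    def can_encode_together(r1, r2):
        # CR-graph edge test: two requests can share one XOR-encoded packet if they
        # ask for the same item, or if each client caches the item the other wants.
        return r1[1] == r2[1] or (r1[1] in r2[3] and r2[1] in r1[3])

    def grow_clique(requests, seed):
        # Greedy expansion around the candidate request; the paper searches for a
        # maximum clique, a greedy pass is used here only to keep the sketch short.
        clique = [seed]
        for r in requests:
            if r is not seed and all(can_encode_together(r, member) for member in clique):
                clique.append(r)
        return clique

    def adc1_round(requests, now):
        # Scheduling phase: pick the candidate request with the highest priority
        # (an RxW-like product of pending count and waiting time is assumed here).
        def priority(r):
            pending_same_item = sum(1 for q in requests if q[1] == r[1])
            return pending_same_item * (now - r[2])
        candidate = max(requests, key=priority)
        # Coding phase: AC finds the clique; the distinct items of its members are
        # XOR-encoded into one packet that serves every request in the clique.
        clique = grow_clique(requests, candidate)
        items_to_encode = {r[1] for r in clique}
        return items_to_encode, clique

Roughly speaking, ADC-2 would instead evaluate the clique while computing the priority itself, so that the scheduling decision already reflects how many requests one encoded packet can serve.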

Acknowledgement

This work is sponsored by the Social Science Foundation from the Ministry of Education, China (Grant No. 10YJC630021), the National Natural Science Foundation of China (Grant No. 71202120) and the Wuhan University Academic Development Plan for Scholars after 1970s ("Research on Internet User Behavior").

References

[1] H. Jung, Y. Chung, L. Liu, Processing generalized k-nearest neighbor queries on a wireless broadcast stream, Information Sciences 188 (2011) 64–79.
[2] C. Yu, D. Yao, X. Li, Y. Zhang, L. Yang, N. Xiong, H. Jin, Location-aware private service discovery in pervasive computing environment, Information Sciences, 2012.
[3] H. Dykeman, M. H. Ammar, J. Wong, Scheduling algorithms for videotex systems under broadcast delivery, in: Proceedings of the International Conference on Communications (ICC'96), 1996, pp. 1847–1851.
[4] H. Dykeman, J. Wong, A performance study of broadcast information delivery systems, in: 7th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'88), 1988, pp. 739–745.
[5] V. Lee, K. Liu, Scheduling time-critical requests for multiple data objects in on-demand broadcast, Concurrency and Computation: Practice and Experience 22 (15) (2010) 2124–2143.
[6] K. Liu, V. Lee, On-demand broadcast for multiple-item requests in a multiple-channel environment, Information Sciences 180 (22) (2010) 4336–4352.
[7] J. Wang, Set-based broadcast scheduling for minimizing the worst access time of multiple data items in wireless environments, Information Sciences 99 (2012) 93–108.


[8] X. Wang, J. Wang, S. Zhang, Network coded wireless cooperative multicast with minimum transmission cost, International Journal of Distributed Sensor Networks, 2012.
[9] C. Fragouli, J. Widmer, J. Boudec, A network coding approach to energy efficient broadcasting: from theory to practice, in: 25th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'06), 2006, pp. 1–11.
[10] D. Lun, N. Ratnakar, R. Koetter, M. Medard, E. Ahmed, H. Lee, Achieving minimum-cost multicast: A decentralized approach based on network coding, in: 24th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'05), 2005, pp. 1607–1617.
[11] E. Rozner, A. Iyer, Y. Mehta, L. Qiu, M. Jafry, ER: efficient retransmission scheme for wireless LANs, in: Proceedings of the 2007 ACM CoNEXT Conference, 2007.
[12] J. Chen, V. Lee, C. Zhan, Efficient processing of real-time multi-item requests with network coding in on-demand broadcast environments, in: IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA'09), 2009, pp. 119–128.
[13] M. Chaudhry, A. Sprintson, Efficient algorithms for index coding, in: INFOCOM Workshops, 2008, pp. 1–4.
[14] S. El Rouayheb, M. Chaudhry, A. Sprintson, On the minimum number of transmissions in single-hop wireless coding networks, in: Information Theory Workshop (ITW'07), 2007, pp. 120–125.
[15] K. Tan, B. Ooi, Batch scheduling for demand-driven servers in wireless environments, Information Sciences 109 (1-4) (1998) 281–298.
[16] D. Aksoy, M. Franklin, RxW: a scheduling approach for large-scale on-demand data broadcast, IEEE/ACM Transactions on Networking 7 (6) (1999) 846–860.
[17] J. Xu, X. Tang, W. Lee, Time-critical on-demand data broadcast: algorithms, analysis, and performance evaluation, IEEE Transactions on Parallel and Distributed Systems 17 (1) (2006) 3–14.

[18] Y. Birk, T. Kol, Coding on demand by an informed source (ISCOD) for efficient broadcast of different supplemental data to caching clients, IEEE/ACM Transactions on Networking 14 (2006) 2825–2830.
[19] C. Chu, D. Yang, M. Chen, Multi-data delivery based on network coding in on-demand broadcast, in: Proceedings of the 9th International Conference on Mobile Data Management (MDM'08), 2008, pp. 181–188.
[20] D. Nguyen, T. Tran, T. Nguyen, B. Bose, Wireless broadcast using network coding, IEEE Transactions on Vehicular Technology 58 (2) (2009) 914–925.
[21] D. Yang, M. Chen, On bandwidth-efficient data broadcast, IEEE Transactions on Knowledge and Data Engineering 20 (8) (2008) 1130–1144.
[22] D. Yang, M. Chen, Data broadcast with adaptive network coding in heterogeneous wireless networks, IEEE Transactions on Mobile Computing 8 (1) (2009) 109–125.
[23] S. Li, R. Yeung, N. Cai, Linear network coding, IEEE Transactions on Information Theory 49 (2) (2003) 371–381.
[24] J. Wang, K. Jea, A near-optimal database allocation for reducing the average waiting time in the grid computing environment, Information Sciences 179 (21) (2009) 3772–3790.
[25] K. Liu, V. Lee, Simulation studies on scheduling requests for multiple data items in on-demand broadcast environments, Performance Evaluation 66 (7) (2009) 368–379.
[26] C. Hu, Fair scheduling for on-demand time-critical data broadcast, in: Proceedings of IEEE International Conference on Communications (ICC'07), 2007, pp. 5831–5836.
[27] A. Dharwadker, The clique algorithm, http://www.dharwadker.org/clique (2006).
[28] F. Gavril, Algorithms for a maximum clique and a maximum independent set of a circle graph, Networks 3 (3) (2006) 261–273.
[29] P. Östergård, A fast algorithm for the maximum clique problem, Discrete Applied Mathematics 120 (1-3) (2002) 197–207.

[30] H. Schwetman, CSIM Guides (version 19), MCC Corporation, http://www.mesquite.com (2001).
[31] G. Zipf, Human behavior and the principle of least effort: an introduction to human ecology, Addison-Wesley Press, 1949.
[32] J. Chen, V. Lee, K. Liu, On the performance of real-time multi-item request scheduling in data broadcast environments, Journal of Systems and Software 83 (8) (2010) 1337–1345.

