
2010 Second International Conference on MultiMedia and Information Technology

Optimal Cooperative Multi-Source Multimedia Transmission Scheduling in Peer-to-Peer Networks

Pengbo Si, Yanhua Sun, Yanhua Zhang
College of Electronics Information and Control Engineering, Beijing University of Technology, Beijing, China
Email: {sipengbo, sunyanhua, zhangyh}@bjut.edu.cn

F. Richard Yu
Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
Email: richard [email protected]

Abstract—Multi-source multimedia transmission is one of the key technologies in peer-to-peer (P2P) networks because it supports simultaneous multimedia file sharing. However, allowing all available sources to transmit to the destination node may result in serious network congestion. Thus the selection of multiple sources is one of the key issues in the design of multi-source multimedia transmission systems in P2P networks. In this paper, a cooperative multi-source sender selection scheme that minimizes the multimedia distortion is proposed, based on recent advances in restless bandit algorithms. The proposed sender selection scheme has an indexability property that dramatically simplifies the computation and implementation of the policy. Furthermore, a centralized control point is not necessary in the proposed scheme, and senders can join and leave the P2P network freely. Extensive simulation results show that the proposed scheme reduces the distortion significantly compared to the existing scheme.

Keywords-multi-source transmission; multimedia transmission; peer-to-peer; sender scheduling; restless bandits problem.

I. INTRODUCTION

With the rapid development of peer-to-peer (P2P) technologies, P2P applications have grown at a tremendous pace and now account for over 50% of global Internet traffic. Sharing multimedia resources among users is an important application in P2P networks. P2P lookup protocols, including Chord [1] and Pastry [2], have been proposed to locate the desired resources. Since the desired resource may be available at more than one node, multiple sources can provide the multimedia resource at the same time. The BitTorrent system [3] is a typical multi-source P2P application. Network coding is introduced in [4] to improve the performance of multi-source transmission systems. In [5], the authors study the effects of source and channel coding on multi-source multimedia transmission and propose a packet loss protection scheme for mobile ad hoc networks.

The selection of multiple sources is one of the key issues in the design of multi-source transmission systems in P2P networks. Although some work has been done on the network architectures and protocols of P2P networks, the multi-source selection problem has not been well solved, to the best of our knowledge. The states of the users in P2P networks are considered as the source selection criterion in [6]. Although the architecture of multi-source streaming is studied in [7], no further development of the sender selection problem is reported. The work in [8] and [9] focuses on multi-source selection in wireless P2P networks with the objective of maximizing the transmission data rate. However, important quality-of-service (QoS) parameters of multimedia services, such as multimedia distortion, are ignored.

In this paper, a distributed sender selection scheme for multi-source multimedia transmission in P2P networks is proposed, with the optimization objective of minimizing the multimedia distortion and power consumption of the senders. Our scheme is based on recent advances in restless bandit algorithms [10], [11]. The restless bandit formulation is an extension of the classical multiarmed bandit [12], which is a special type of stochastic control problem. This paper focuses on the application of restless bandits to the multi-source multimedia transmission problem in P2P networks. The distinct features of the proposed scheme include:

• The sender selection policy has an indexability property that dramatically simplifies the computation and implementation of the policy. Among the potential senders that have the desired multimedia resource, the optimal selection policy is simply to choose the senders with the lowest indices.
• A sender can compare its index with those of the others in a distributed manner. There is no need for a centralized control point in the proposed scheme, and senders can join and leave the network freely. Therefore, the proposed scheme is fully distributed and scalable.
• Minimizing the multimedia distortion is the objective of the proposed scheme.
• Extensive simulation results are presented. They illustrate that the proposed scheme improves the multimedia distortion performance significantly compared to the existing scheme.

The rest of this paper is organized as follows. In Section II, the multi-source multimedia transmission scheduling problem is introduced. We formulate the problem as a restless bandit system in Section III. Section IV describes the selection process. Simulation results are presented in Section V. Section VI concludes this study.

978-0-7695-4008-5/10 $26.00 © 2010 IEEE. DOI 10.1109/MMIT.2010.118

II. MULTI-SOURCE MULTIMEDIA TRANSMISSION IN P2P NETWORKS

A. Multimedia Distortion

The objective of the optimization is to minimize the multimedia distortion. Recent advanced coding algorithms, such as H.264 and MPEG-4, use a rate control mechanism to control the video encoder output bit rate and an error resilience mechanism for error protection [13]. Intra-refreshing, also called intra-update, of macroblocks (MBs) is an important approach for rate control and error protection. An intra-coded MB does not need information from previous frames that may have already been corrupted by channel errors. This makes intra-coded MBs an effective way to mitigate error propagation. With inter-coded MBs, in contrast, channel errors from the previous frame may propagate to the current frame.

In P2P networks, different sources provide different data rates and different link qualities to the receiver. Given a data rate in a network, the authors in [13] provide a closed-form distortion model that takes into account the varying characteristics of the input video, the coding algorithm, and the intra-refreshing rate. We use this rate-distortion model in our study. The total distortion comprises the quantization distortion introduced by the lossy video encoder to meet a target bit rate and the distortion resulting from channel errors, which are presented in the following subsections.

1) Source Distortion: The source distortion is given by

$$D_s(H_s, \xi) = D_s(H_s, 0) + \xi (1 - \eta + \eta \xi) [D_s(H_s, 1) - D_s(H_s, 0)], \tag{1}$$

where $H_s$ is the source coding rate, $\xi$ is the intra-refreshing rate, and $\eta$ is a constant based on the multimedia sequence. $D_s(H_s, 0)$ and $D_s(H_s, 1)$ are the time averages of the source distortion under all-inter and all-intra mode selection over all frames, respectively:

$$D_s(H_s, 0) = \frac{1}{T} \sum_{k=0}^{T-1} \frac{1}{Y_k} \sum_{y=1}^{Y_k} D_s(H_s, 0, y), \qquad D_s(H_s, 1) = \frac{1}{T} \sum_{k=0}^{T-1} \frac{1}{Y_k} \sum_{y=1}^{Y_k} D_s(H_s, 1, y),$$

where $Y_k$ is the number of inter/intra frames at epoch $t_k$.

2) Channel Distortion: According to the rate-distortion model [13], the average channel distortion is given by

$$D_c(\psi, \xi) = \frac{\Omega_1}{1 - \Omega_2 + \Omega_2 \xi} \cdot \frac{\psi}{1 - \psi} \cdot E[F_d(y, y-1)], \tag{2}$$

where $\psi$ is the packet loss rate, $\Omega_1$ is the energy loss ratio of the encoder filter, $\Omega_2$ is a constant based on the motion randomness of the multimedia data, and $E[F_d(y, y-1)]$ is the average value of the frame difference $F_d(y, y-1)$ over the epochs.

3) Optimal Intra-refreshing Rate: The total distortion is $D(H_s, \psi, \xi) = D_s(H_s, \xi) + D_c(\psi, \xi)$. Thus the optimal $\xi^*$ that minimizes the total distortion is given by

$$\xi^* = \arg\min_{\xi} D(H_s, \psi, \xi). \tag{3}$$
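As a rough illustration of how (1), (2), and (3) fit together, the following Python sketch evaluates the source and channel distortion and finds the intra-refreshing rate that minimizes their sum by a simple grid search over [0, 1]. The grid search, the function names, and all numerical parameter values are assumptions made for this example; they are not taken from [13] or from the simulations in this paper.

```python
def source_distortion(ds_inter, ds_intra, xi, eta):
    """Eq. (1): source distortion for intra-refreshing rate xi.

    ds_inter and ds_intra stand for D_s(H_s, 0) and D_s(H_s, 1).
    """
    return ds_inter + xi * (1.0 - eta + eta * xi) * (ds_intra - ds_inter)


def channel_distortion(psi, xi, omega1, omega2, mean_frame_diff):
    """Eq. (2): average channel distortion for packet loss rate psi."""
    propagation = omega1 / (1.0 - omega2 + omega2 * xi)
    return propagation * psi / (1.0 - psi) * mean_frame_diff


def total_distortion(xi, psi, ds_inter, ds_intra, eta,
                     omega1, omega2, mean_frame_diff):
    """Total distortion D = D_s + D_c."""
    return (source_distortion(ds_inter, ds_intra, xi, eta)
            + channel_distortion(psi, xi, omega1, omega2, mean_frame_diff))


def optimal_intra_refresh_rate(psi, ds_inter, ds_intra, eta,
                               omega1, omega2, mean_frame_diff, steps=1000):
    """Eq. (3), approximated by a grid search over xi in [0, 1]."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda xi: total_distortion(
        xi, psi, ds_inter, ds_intra, eta, omega1, omega2, mean_frame_diff))


# Hypothetical parameter values, chosen only to make the example concrete.
HYPOTHETICAL_PARAMS = dict(ds_inter=20.0, ds_intra=60.0, eta=0.5,
                           omega1=1.0, omega2=0.9, mean_frame_diff=100.0)

if __name__ == "__main__":
    for psi in (0.0, 0.01, 0.2):
        xi_star = optimal_intra_refresh_rate(psi, **HYPOTHETICAL_PARAMS)
        print(f"psi={psi:.2f} -> xi*={xi_star:.3f}")
```

With these hypothetical values, a lossless route (ψ = 0) gives ξ* = 0, and a larger packet loss rate gives a larger ξ*, which reflects the tradeoff between source distortion and error propagation.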

To deal with the time-varying route states of the network, we use an adaptive intra-refreshing rate $\xi$ to achieve the minimum distortion. Decreasing $\xi$ reduces the source distortion $D_s$ for a target bit rate. However, inter-coding relies on information in previous frames, so a packet loss results in error propagation until the next intra-coded macroblock is received. Thus there is a tradeoff, and our aim is to find the optimal $\xi$ that minimizes the total distortion according to (3).

B. Multi-source Transmission

In this paper, we consider multimedia transmission from multiple sources. Users in P2P networks share their local multimedia resources through P2P applications. A user who wants to obtain a multimedia resource that is not stored on its local device looks up the desired resource in the network. This user is called the receiver in this paper. It is likely that $N$ ($N \ge 1$) nodes hold the desired resource. These nodes are called the potential senders. $M$ ($M \le N$) nodes among the $N$ potential senders are set to be active at the same time to transmit the multimedia resource to the receiver.

C. Network Model

This paper uses the concept of a time slot, i.e., a fixed period of time. Assume that the transmission duration is divided into $T$ equal-length time slots. The action of each sender $n \in \mathcal{N} = \{1, 2, \ldots, N\}$ in time slot $t \in \mathcal{T} = \{0, 1, \ldots, T-1\}$ is represented by $a_n(t) \in \mathcal{A} = \{0, 1\}$, where $a_n(t) = 0$ means that sender $n$ is passive in time slot $t$ and $a_n(t) = 1$ means that it is active. At the beginning of time slot $t$, the potential senders cooperate with each other and decide $a_n(t)$ for each node $n = 1, 2, \ldots, N$, satisfying $\sum_{n=1}^{N} a_n(t) = M$, where $M$ is the number of active senders in each time slot, $0 \le M \le N$. We denote by $\mathcal{M}$ the set of active senders in time slot $t$.

D. Route Model

The multi-hop route between a potential sender and the receiver can be modeled as the concatenation of a number of links, whose congestion states may dramatically affect the multimedia transmission [14]. In this paper, we model the link state as a Markov chain by dividing the continuous link congestion state into discrete levels for simplification [15]. Thus the concatenated route from sender $n$ can also be modeled as a random variable $c_n$ evolving according to a finite-state Markov chain, which is characterized by a set of states $\mathcal{C} = \{C_0, C_1, \ldots, C_{G-1}\}$, where $G$ is the number of available route state levels. The realization of $c_n$ for sender $n$ in time slot $t$ is $C_n(t)$. The route state transition probability matrix of sender $n$ taking action $a$ is

$$\Phi_n^a(t) = [\phi^a_{g_n h_n}(t)]_{G \times G}, \tag{4}$$

where $\phi^a_{g_n h_n}(t) = \Pr(C_n(t+1) = h_n \mid C_n(t) = g_n, a_n(t) = a)$, for $g_n, h_n \in \mathcal{C}$ and $a \in \mathcal{A}$.

Assume that the user devices can estimate the route state from the handshaking signals exchanged between the potential senders and the receiver before the active senders are selected. The values in the route state transition probability matrix can be obtained from historical observations of the ad hoc network.
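The following Python sketch illustrates the route model under stated assumptions: a sender's route state evolves as a finite-state Markov chain whose transition matrix depends on the action, and the matrix entries are estimated from an observed state history by counting transitions. The function names, the frequency-count estimator, and the uniform fallback for unvisited states are assumptions of this example; the paper does not specify how the history is processed.

```python
import random


def estimate_transition_matrix(history, num_states):
    """Estimate Pr(C(t+1) = h | C(t) = g) from an observed state sequence.

    history is a list of integer state levels in {0, ..., num_states - 1}.
    """
    counts = [[0] * num_states for _ in range(num_states)]
    for g, h in zip(history, history[1:]):
        counts[g][h] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: a state never observed gets a uniform row.
            matrix.append([1.0 / num_states] * num_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


def next_route_state(state, action, phi, rng):
    """Sample C_n(t+1) given C_n(t) = state and a_n(t) = action.

    phi[action] is the G x G transition matrix for that action.
    """
    row = phi[action][state]
    return rng.choices(range(len(row)), weights=row, k=1)[0]


def simulate_route(initial_state, actions, phi, seed=0):
    """Return the state trajectory produced by a sequence of actions."""
    rng = random.Random(seed)
    states = [initial_state]
    for action in actions:
        states.append(next_route_state(states[-1], action, phi, rng))
    return states
```

For example, `estimate_transition_matrix([0, 0, 1, 2, 1, 0, 0, 1], 3)` produces one row per state level, and each row sums to one. In practice a separate matrix would be estimated for the passive and the active action.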

E. Restless Bandit Approach

The classical multiarmed bandit problem is named by analogy with a traditional slot machine (one-armed bandit), but with more than one lever. It is a simple model of an agent that simultaneously attempts to acquire new knowledge and to optimize its decisions based on existing knowledge. The central question is how to balance reward maximization based on the knowledge already acquired against trying new actions to further increase knowledge.

The restless bandit formulation is an extension of the classical multiarmed bandit to problems where multiple projects can be active and all projects evolve at each time instant. In the restless bandit problem, in each time slot $t = 0, 1, 2, \ldots$, $M$ out of $N$ projects are set to be active according to their states $i_n \in \mathcal{S}$ and the transition probabilities $p^a_{i_n j_n}$. A reward $R^a_{i_n}$ is obtained in each time slot. Projects are selected under a policy $u \in \mathcal{U}$, where $\mathcal{U}$ is the set of all admissible policies. The problem is to find the optimal $u$ that achieves the optimization objective. In this paper, we use the restless bandit approach to solve the multiple sender selection problem in multimedia P2P networks.

III. RESTLESS BANDIT FORMULATION

In this section, we formulate the multi-source selection problem as a restless bandit problem.

A. Sender States

The state of potential sender $n \in \{1, 2, \ldots, N\}$ in time slot $t \in \{0, 1, \ldots, T-1\}$ is determined by the realization $C_n(t)$ of the random variable $c_n$. Consequently, the state of a potential sender is simply written as $i_n(t) = C_n(t)$, and the state set is represented as $\mathcal{S}_n$, $i_n(t) \in \mathcal{S}_n$, with the transition probability matrix $P_n^a = [\phi^a_{g_n h_n}(t)]$, where $\phi^a_{g_n h_n}(t)$ is defined in (4). The elements of $P_n^a$ are $p^a_{i_n j_n}$, denoting the probability that the state of sender $n$ changes from $i_n$ to $j_n$ under action $a$, where $i_n, j_n \in \mathcal{S}_n$.

B. System Reward

To minimize the multimedia distortion, the instantaneous system reward $R^a_{i_n(t)}$ obtained at time $t$ in state $i_n(t)$ when taking action $a$ is assumed to be $E - D(H_s, \psi, \xi)$, where $E$ is a constant. Define the reward vector as $r^a(t) = (R^a_{i_1(t)}, R^a_{i_2(t)}, \ldots, R^a_{i_N(t)})^{\top}$.

C. Indices

The index of potential sender $n$ in state $i_n$ is denoted by $\delta_{i_n}$. The optimal policy follows an index rule: the $M$ senders with the smallest indices in a given time slot $t$ act as the active senders. That is, letting $\{\delta_{k_1}, \delta_{k_2}, \ldots, \delta_{k_N}\}$ be the set of indices arranged from the smallest value to the largest value in time slot $t$, the action of sender $n$ should be

$$a_n(t) = \begin{cases} 1, & \text{if } n \in \{k_1, k_2, \ldots, k_M\}, \\ 0, & \text{otherwise.} \end{cases}$$

Thus computing the indices is the key step in solving the sender selection problem.

D. Policies

We denote by $\mathcal{U}$ the class of all admissible policies. An admissible policy $u \in \mathcal{U}$ is a $T \times N$ matrix whose element in the $t$th row and the $n$th column is $a_n(t)$, representing the action taken by node $n$ in time slot $t$. The policy $u$ satisfies $u \kappa_N = M \kappa_T$, where $\kappa_K$ denotes a column vector with all its $K$ elements equal to one, so that exactly $M$ senders are active in each time slot. This $u$ can be considered as the “active policy”, which determines the “passive policy” defined as $\bar{u} = -(u - A)$, where $A$ is a $T \times N$ matrix with all elements equal to 1. Consequently, the total expected discounted reward over the time horizon is

$$Z(u) = (\beta^0, \beta^1, \ldots, \beta^{T-1}) [u r^1(t) + \bar{u} r^0(t)] / T,$$

where $0 < \beta < 1$ is the discount factor. Define the optimization objective to be

$$Z^* = \max_{u \in \mathcal{U}} Z(u). \tag{5}$$

The optimal policy $u^*$ is the policy that achieves the optimization objective, that is, $u^* = \arg\max_{u \in \mathcal{U}} Z(u)$. To solve the restless bandit problem, a hierarchy of increasingly stronger linear programming (LP) relaxations is developed in [11] based on LP formulations of Markov decision chains (MDCs). Please refer to [11] for details.

IV. MULTIPLE SENDER SELECTION PROCESS

The sender selection process allows the potential senders to decide their active/passive states in a distributed and cooperative way. The selection process is as follows.

1) The receiver executes a P2P lookup protocol to find the desired multimedia resource, and the address list of the $N$ potential senders is then sent to the receiver.
2) According to the length of the multimedia resource, the receiver calculates the number of resource fragments and multicasts the values of $M$, $N$, $\beta$ and the request for the first fragment to all $N$ potential senders. The address list of all potential senders is included in the request.
3) The potential senders share their state transition probabilities $p^a_{i_n j_n}$ and rewards $R^a_{i_n}$ with each other.
4) Each potential sender calculates the set of indices off-line and maintains an index table: taking as input $p^a_{i_n j_n}$ of each potential sender $n \in \mathcal{N}$, $R^a_{i_n}$, $\beta$, $M$ and $\alpha$, it computes the finite set of indices $\{\delta_{i_n}\}$ and stores these indices and the corresponding parameters in a table.
5) At the beginning of time slot $t$, with all potential senders' initial state probability vectors $\alpha$, each sender looks up the index table to find the corresponding index $\delta_{i_n}$.
6) Each sender sorts the list of indices from the lowest to the highest. A sender will be active in the following time slot if its index is among the first $M$ items.
7) Each active sender $m \in \mathcal{M}$ transmits the requested multimedia resource fragments to the receiver.
8) Go to step 5 until the transmission is completed.

V. SIMULATION RESULTS AND DISCUSSIONS

In this section, we simulate the proposed optimal sender selection scheme and compare it with the existing scheme. We use Chord as the P2P lookup protocol. The base is 2 and the DHT key space is $\{0, 1, \ldots, 2^{256} - 1\}$. AODV is used as the network layer protocol for the ad hoc network. Since previous work [3]–[5] does not consider the sender selection problem, we assume random selection in the existing scheme. We assume $\beta = 0.8$, $E = 10$, and a time slot duration of 10 seconds.

A. System Reward Improvement

We run the simulation for 2,000 seconds and obtain the system reward shown in Fig. 1. There are $N = 8$ potential senders in the network, and the performance for $M = 2$, 4, and 6 is presented. From Fig. 1, we can see that the optimal scheme improves the system reward significantly. This is because the proposed scheme selects the senders optimally according to the route conditions.

Figure 1. System reward comparison along the time line (N = 8). [Plot: System Reward versus Time (Sec); curves for the Optimal Scheme and the Existing Scheme with M = 6, 4, and 2.]

B. The Effect of Different Numbers of Senders

The number of potential senders $N$ found by the P2P lookup protocol differs with the network situation. Besides, the number of selected active senders $M$ may also vary. Fig. 2 shows the system reward with different numbers of potential senders and active senders. We can draw the same conclusion: the optimal selection scheme performs better than the existing one in all the situations considered. Moreover, since the performance improvement is not significant with larger $M$, it may not be necessary to require too many senders to transmit the multimedia file simultaneously to the same receiver.

VI. CONCLUSIONS

In this paper, a distributed sender selection scheme for multi-source multimedia transmission in P2P networks that minimizes the multimedia distortion has been proposed.
The restless bandit approach was used to solve the sender selection problem. It was shown that the optimal policy is simply to select the senders with the smallest indices. We have also shown the sender selection process in practice. Simulation results were presented to illustrate the significant performance improvement compared with the existing scheme.

Figure 2. System reward comparison with different numbers of senders. [Plot: System Reward versus Total Number of Potential Senders.]

REFERENCES

[1] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup protocol for Internet,” IEEE/ACM Trans. Netw., vol. 11, pp. 17–32, Feb. 2003.
[2] A. Rowstron and P. Druschel, “Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems,” in Proc. 18th IFIP/ACM Conf. Dist. Sys. Platforms (Middleware 2001), (Heidelberg, Germany), pp. 329–350, Nov. 2001.
[3] B. Cohen, “Incentives build robustness in BitTorrent,” in Proc. P2P Economics Workshop, (Berkeley, California), pp. 978–982, June 2003.
[4] C. Gkantsidis and P. R. Rodriguez, “Network coding for large scale content distribution,” in Proc. IEEE Infocom’05, (Miami, Florida), pp. 2235–2245, Mar. 2005.
[5] T. Schierl, K. Ganger, C. Hellge, and T. Wiegand, “SVC-based multisource streaming for robust video transmission in mobile ad hoc networks,” IEEE Trans. Wireless Commun., vol. 13, pp. 96–103, Oct. 2006.
[6] M. Hefeeda, A. Habib, B. Botev, D. Xu, and B. Bhargava, “Promise: peer-to-peer media streaming using CollectCast,” in Proc. 11th ACM Conf. Multimedia, (Berkeley, California), pp. 45–54, Nov. 2003.
[7] S. Itaya, N. Hayashibara, T. Enokido, and M. Takizawa, “Scalable peer-to-peer multimedia streaming model in heterogeneous networks,” in Proc. 7th IEEE Int. Symp. Multimedia, (Irvine, CA), Dec. 2005.
[8] P. Si, F. Yu, H. Ji, and V. Leung, “Distributed multi-source transmission in wireless mobile peer-to-peer networks: A restless bandit approach,” in Proc. IEEE ICC 2009, (Dresden, Germany), June 2009.
[9] P. Si, F. Yu, H. Ji, and V. Leung, “Distributed sender scheduling for multimedia transmission in wireless mobile peer-to-peer networks,” IEEE Trans. Wireless Commun., vol. 8, pp. 4594–4603, Sept. 2009.
[10] P. Whittle, “Restless bandits: activity allocation in a changing world,” in A Celebration of Applied Probability (J. Gani, ed.), vol. 25 of J. Appl. Probab., pp. 287–298, Applied Probability Trust, 1988.
[11] D. Bertsimas and J. Niño-Mora, “Restless bandits, linear programming relaxations, and a primal dual index heuristic,” Operations Research, vol. 48, no. 1, pp. 80–90, 2000.
[12] J. Gittins, Multi-armed Bandit Allocation Indices. Wiley, 1989.
[13] Z. He, J. Cai, and C. Chen, “Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding,” IEEE Trans. Circ. Sys. Video Tech., vol. 12, pp. 511–523, June 2002.
[14] M. Pearlman, Z. Haas, P. Sholander, and S. S. Tabrizi, “On the impact of alternate path routing for load balancing in mobile ad hoc networks,” in Proc. MobiHoc, (Boston, MA), pp. 3–10, Aug. 2000.
[15] E. Gilbert, “Capacity of a bursty-noise channel,” Bell Syst. Tech. J., vol. 39, pp. 1253–1265, Sept. 1960.