IEEE COMMUNICATIONS LETTERS, VOL. 19, NO. 12, DECEMBER 2015
Neural-Network Based Optimal Dynamic Control of Delivering Packets in Selfish Wireless Networks

Jinglei Li, Qinghai Yang, and Kyung Sup Kwak
Abstract—In this letter, we investigate the dynamic packet delivery through a specific path in selfish wireless networks (SeWNs) with cascaded selfish relay nodes (RNs). A dynamic node-selfishness model is designed to formulate the variation of the RN's degree of node-selfishness (DeNS) with both its own resource and the incentive controlled by the source. When the node-selfishness dynamics are unknown, we use a neural network (NN) to identify the node-selfishness model and then approximate the optimal incentives for maximizing the infinite-horizon utility, namely the tradeoff between the path reliability and the incentive cost in the long term. Simulation results demonstrate the effectiveness of the proposed scheme.

Index Terms—Neural networks, selfish wireless networks, node-selfishness dynamics, delivering packets.
I. INTRODUCTION

In recent years, we have witnessed a drastic growth of delivering the packets of media streams [1] in wireless networks, and the packet delivery of media streams imposes low-delay and high-path-reliability requirements. In the selfish wireless network (SeWN), which consists of relay nodes (RNs) exhibiting selfish behavior [2], this selfish behavior may degrade the path reliability of the end-to-end (E2E) packet delivery. The RN's selfish behavior of forwarding packets may vary with its available resources [3], [4] and the incentives [5]–[7] received from the source, thus leading to a dynamic path reliability for the E2E packet delivery. However, the influence of both the available resources and the received incentives on the node-selfishness is private information of the RNs; thus, the source does not know the RNs' node-selfishness dynamics, which depict the RNs' node-selfishness variations in a mathematical way. Since neural networks (NNs) can approximate nonlinear functions [8], the source may estimate the unknown node-selfishness dynamics and the optimal incentives for maximizing the path reliability of the E2E packet delivery.

In this letter, we investigate the packet delivery through one path in SeWNs under unknown node-selfishness dynamics. In such SeWNs, the RN's selfish behavior of forwarding packets is affected by both its own resource and the incentive received from the source. The RN's degree of node-selfishness
Manuscript received June 9, 2015; accepted October 16, 2015. Date of publication October 26, 2015; date of current version December 8, 2015. This research was supported in part by NRF of Korea (MSIP) (NRF-2014K1A3A1A20034987) and NSF of China (61471287). The associate editor coordinating the review of this paper and approving it for publication was V. Eramo.
J. Li and Q. Yang are with the State Key Laboratory of ISN, School of Telecommunications Engineering, Xidian University, Xi'an 710071, China (e-mail: [email protected]).
K. S. Kwak is Inha Hanlim Professor with the School of Information and Communication Engineering, Inha University, Incheon, Korea (e-mail: [email protected]).
Digital Object Identifier 10.1109/LCOMM.2015.2493542
(DeNS) is defined as the effect of both its own resources and the received incentives on its packet-forwarding behavior. A dynamic node-selfishness model is designed to formulate the dynamics of the RN's DeNS with respect to (w.r.t.) its own resource as well as to the received incentive. When the RNs' node-selfishness dynamics are unknown, an NN-based approximation scheme is conceived to identify the dynamic node-selfishness model and to approximate both the optimal infinite-horizon utility function, which balances the path reliability and the incentive cost in the long term, and the optimal received incentives.

Notation: We define the symbol "◦" as the element-wise product of two equal-size matrices. For example, C = A ◦ B with A = {a_ij}, B = {b_ij} and C = {c_ij} implies that c_ij = a_ij b_ij, ∀ i, j.

II. SYSTEM MODEL

In a SeWN including selfish RNs, the source selects one path and then delivers the packets of media streams through this specified path. Owing to the resource depletion caused by forwarding packets, the RNs decrease their willingness to forward packets, thus degrading the reliability of the selected path. By contrast, the source may provide some incentives to increase their willingness to forward packets, thus enhancing the path reliability. Hence, the path reliability of delivering the packets of media streams is related to the selfish behaviors of the RNs within the selected path. To analyze the RN's selfish characteristics, we refer to the RN's own resource and the incentive received from the source as its intrinsic factor and extrinsic factor, respectively, and we define the RN's DeNS s (0 ≤ s ≤ 1) as the degree reflecting the effect of its intrinsic and extrinsic factors on its packet-forwarding behavior, where an RN with s = 1 is completely selfish and an RN with s = 0 is altruistic.
The variations of both the RN's intrinsic and extrinsic factors with time cause the dynamics of its DeNS, which are employed to build the node-selfishness dynamics. Additionally, when a certain RN within the selected path is simultaneously shared by other paths, the available resource of this RN decreases owing to the resource consumption of forwarding the packets of those other paths. Accordingly, we account for the degradation that other paths impose on the delivery performance of the selected path by considering the RN's available resource, i.e., the resource remaining after forwarding the packets of other paths. Hence, we focus on the packet delivery through the selected path between the source and destination, denoted by R = {R_1, ..., R_N} with N being the number of RNs. In the time-frame structure, the packets of the media stream are delivered through the selected path R during every frame k. As the DeNS of any RN within the selected path increases, the path
reliability of delivering packets through this path decreases. Like the path reliability proposed in [9], the path reliability of the selected path R is defined as the product of the reliable degrees of all selected RNs during frame k, expressed as

$$P(S_k) = \prod_{R_i \in R} (1 - s_{i,k}), \qquad (1)$$
where S_k = [s_{1,k}, ..., s_{N,k}]^T is the vector of all RNs' DeNSs during frame k and s_{i,k} is the DeNS of RN i during frame k. To improve the reliability of the path R, the source needs to provide some incentives to stimulate the packet forwarding of all RNs within this path. The effect of the RNs' DeNSs on the success probability of delivering packets is related to the RN-destination distance, e.g., the number of hops between the RN and the destination and the link loss probability of each hop [10]. If the RN is in the vicinity of the source, all packets may be successfully delivered to the destination only if the RNs between this RN and the destination are altruistic, for which large incentives must be provided. By contrast, if the RN is in the vicinity of the destination, successful delivery of packets to the destination may be achieved with small provided incentives. Besides the RNs' selfish behaviors, the provided incentives are also correlated with the network parameters, i.e., the locations of the RNs and their link characteristics. We formulate the incentive cost of delivering packets during frame k as

$$C(u(S_k)) = u^T(S_k)\, Q\, u(S_k), \qquad (2)$$
where Q ∈ R^{N×N} is the correlation matrix of the incentives depending on the network parameters, and u(S_k) = [u(s_{1,k}), ..., u(s_{N,k})]^T is the vector of the incentives provided by the source, with u(s_{i,k}) being the incentive provided to RN i during frame k.

III. OPTIMAL PACKET DELIVERY UNDER UNKNOWN NODE-SELFISHNESS DYNAMICS

Since media streams have low-delay and high-path-reliability requirements but the time overhead of path selection may increase the delivery delay, the source should provide both an optimal path-selection policy and an optimal path-reliability control of the selected path. Nevertheless, owing to the limited space of this letter, we only consider the path-reliability control for delivering the packets of media streams through the selected path. Furthermore, since a media stream is a continuous flow and lasts for a long time period, the RNs' selfish behaviors continuously affect the packet delivery through the selected path R. To deliver the packets effectively through the selected path during future frames, the source predicts the RNs' instantaneous and future DeNSs, and determines the instantaneous incentives at the beginning of every frame k. However, since the RNs' selfish characteristics are private, it is a challenge for the source to know this dynamic node-selfishness model and to deliver the packets of media streams effectively through the path R over a long time period. In this section, a dynamic node-selfishness model is designed in terms of the RN's intrinsic and extrinsic factors, and the optimal packet delivery is obtained under the unknown node-selfishness dynamics.
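For concreteness, the path-reliability metric (1) and the incentive cost (2) can be sketched in a few lines; the DeNS vector, the incentive vector, and the identity choice of Q below are purely illustrative, not values from the letter:

```python
from math import prod

# Hypothetical DeNS vector for a 3-relay path and an assumed identity
# correlation matrix Q; all numbers here are illustrative only.
S_k = [0.2, 0.5, 0.1]
Q = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
u = [0.3, 0.6, 0.2]

def path_reliability(S):
    # Eq. (1): P(S_k) = product over relays of (1 - s_{i,k})
    return prod(1.0 - s for s in S)

def incentive_cost(u, Q):
    # Eq. (2): C(u) = u^T Q u
    n = len(u)
    return sum(u[i] * Q[i][j] * u[j] for i in range(n) for j in range(n))

print(path_reliability(S_k))   # 0.8 * 0.5 * 0.9 -> 0.36 (up to float rounding)
print(incentive_cost(u, Q))    # 0.3^2 + 0.6^2 + 0.2^2 -> 0.49 (up to float rounding)
```

Note that a single highly selfish relay drags the whole product in (1) down, which is why the source must spread incentives over every RN on the path.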
A. Dynamic Node-Selfishness Model

The RN's DeNS increases with the resource depletion of forwarding packets, while the source provides an incentive to decrease the RN's DeNS. The variation of the RN's DeNS with its own resources is referred to as the intrinsic dynamics, while the variation of its DeNS w.r.t. the received incentive is referred to as the extrinsic dynamics. By employing the plant model of a control system [8], we design the dynamic node-selfishness model of RN i with the control input being the received incentive u(s_{i,k}) and the state being the DeNS s_{i,k} as

$$s_{i,k+1} = f(s_{i,k}) + g(s_{i,k})\, u(s_{i,k}), \qquad (3)$$
where s_{i,k+1} is the DeNS of RN i during frame k+1, f(s_{i,k}) is the intrinsic dynamics under the RN's DeNS s_{i,k}, e.g., the linear function in [3], and g(s_{i,k})u(s_{i,k}) is the extrinsic dynamics under the RN's DeNS s_{i,k}, with g(s_{i,k}) being the incentive-gain rule and u(s_{i,k}) being the incentive received from the source under an incentive scheme, e.g., the credit-based incentive scheme in [5]. If RN i receives no incentive from the source during frame k, i.e., u(s_{i,k}) = 0, the DeNS of RN i during frame k+1 is related only to its intrinsic factor, i.e., s_{i,k+1} = f(s_{i,k}). When u(s_{i,k}) ≠ 0, the DeNS of RN i is controlled by the incentive, in addition to the effect of RN i's own resource. Meanwhile, the intrinsic dynamics f(s_{i,k}) is related to the personal resource of RN i. When the RN's available resource is either very high or very low, the change rate of its DeNS is low; in the presence of a mediocre resource amount, the change rate of the RN's DeNS is high. Additionally, for the extrinsic dynamics, the incentive-gain rule g(s_{i,k}) is related to the incentive provided by the source to RN i. As the RN's DeNS declines, the value of the incentive-gain rule g(s_{i,k}) decreases. According to Eq. (3), the dynamic DeNSs of all RNs within the selected path R are expressed as

$$S_{k+1} = f(S_k) + G(S_k)\, u(S_k) = f(S_k) + g(S_k) \circ u(S_k), \qquad (4)$$
where S_{k+1} is the vector of the DeNSs of all RNs within the path R during frame k+1, f(S_k) = [f(s_{1,k}), ..., f(s_{N,k})]^T represents the vector of the intrinsic dynamics f(s_{i,k}) of each RN i (∀ i ∈ R) during frame k, and G(S_k) is the diagonal matrix whose diagonal entries are the entries of the incentive-gain vector g(S_k) = [g(s_{1,k}), ..., g(s_{N,k})]^T consisting of the incentive-gain rule g(s_{i,k}) of each RN i (∀ i ∈ R) during frame k. If the source knows the dynamic node-selfishness model of the RNs, it controls the DeNSs of all RNs within the path R by providing appropriate incentives so as to deliver its packets through this path successfully. However, since the available resource of each RN is its private information and the incentive-gain rule is also the RN's own intrinsic rule, the closed-form expressions of the intrinsic dynamics f(s_{i,k}) and the incentive-gain rule g(s_{i,k}) of each RN i (i ∈ R) are unknown to the source. Hence, before delivering packets during frame k, the source has to determine the intrinsic dynamics f(S_k) and the incentive-gain rule g(S_k), which will be solved by NNs in Section III-C.
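As a minimal sketch, the recursion (3) can be simulated for a single RN; the forms f(s) = s and g(s) = −s² are the example used in the simulation section of this letter, while the constant incentive u = 0.2 is a hypothetical choice:

```python
# One step of the node-selfishness dynamics (3) for a single RN.
# f(s) = s and g(s) = -s^2 are the example forms from Section IV;
# the constant incentive u = 0.2 is hypothetical.

def f(s):
    return s            # intrinsic dynamics: with no incentive, the DeNS persists

def g(s):
    return -s * s       # incentive-gain rule: |g(s)| shrinks as the DeNS declines

def step(s, u):
    # Eq. (3): s_{k+1} = f(s_k) + g(s_k) * u(s_k)
    return f(s) + g(s) * u

s = 1.0                 # initial DeNS s_0 = 1 (completely selfish RN)
trajectory = [s]
for k in range(3):
    s = step(s, u=0.2)
    trajectory.append(s)

print(trajectory)       # DeNS decreases monotonically under a positive incentive
```

With u = 0 the state stays at s_{k+1} = f(s_k) = s_k, matching the observation above that an unincentivized RN's DeNS depends only on its intrinsic factor.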
B. Infinite-Horizon Utility and Optimal Incentives

During each frame k, the source provides incentives to all RNs within the path R to depress their DeNSs, which raises the path reliability of delivering packets through this path. According to the dynamic node-selfishness model of Eq. (4), the source determines both the path reliability of delivering packets and the provided incentives in the long term. Here, an infinite-horizon utility function of the source during frame k is defined to maximize the path reliability and minimize the incentive cost in the long term, expressed as

$$V(S_k, u(S_k)) = \sum_{j=k}^{\infty} \tau^{j-k} \left[ P(S_j) - \pi C(u(S_j)) \right] = P(S_k) - \pi C(u(S_k)) + \tau V(S_{k+1}, u(S_{k+1})), \qquad (5)$$

where τ (0 ≤ τ < 1) is a discount factor and π (π > 0) is a factor, like a price parameter, that balances the path reliability and the incentive cost. For delivering packets during frame k, the source needs to provide the optimal incentives while maximizing its infinite-horizon utility V(S_k, u(S_k)). Using Bellman's principle of optimality, the optimal infinite-horizon utility V*(S_k, u*(S_k)) is expressed as

$$V^*(S_k, u^*(S_k)) = \max_{u(S_k)} \left\{ P(S_k) - \pi C(u(S_k)) + \tau V^*(S_{k+1}, u^*(S_{k+1})) \right\}. \qquad (6)$$

By differentiating Eq. (6) with respect to u(S_k) and setting the derivative to zero, we obtain the optimal incentives

$$u^*(S_k) = \frac{\tau}{2\pi}\, Q^{-1} G^T(S_k)\, \frac{\partial V^*(S_{k+1}, u^*(S_{k+1}))}{\partial S_{k+1}}. \qquad (7)$$
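If the dynamics were known, the fixed point of the Bellman equation (6) could be computed by standard value iteration. The sketch below does this for a single RN on a discretized DeNS grid, borrowing f(s) = s, g(s) = −s², Q = 1, τ = 0.5 and π = 1 from the simulation section; in the letter itself V* and u* are instead approximated by NNs, so this is only an illustration of the underlying control problem, with the grid resolution and iteration count chosen arbitrarily:

```python
# Value iteration for the Bellman equation (6) with a single RN, assuming the
# dynamics are known: f(s) = s, g(s) = -s^2 (the Section IV example), with
# Q = 1, tau = 0.5, pi = 1. DeNS and incentive are discretized to a 0.01 grid,
# and the next state is snapped to the nearest grid point.
TAU, PI = 0.5, 1.0
GRID = [i / 100 for i in range(101)]      # DeNS values 0.00 ... 1.00
ACTIONS = [i / 100 for i in range(101)]   # candidate incentives 0.00 ... 1.00

def reward(s, u):
    # Instantaneous utility P(s) - pi * C(u) for one RN with Q = 1.
    return (1.0 - s) - PI * u * u

def next_state(s, u):
    # Eq. (3): s' = f(s) + g(s) * u = s - s^2 * u, kept inside [0, 1].
    return min(max(s - s * s * u, 0.0), 1.0)

V = [0.0] * len(GRID)
for _ in range(60):                       # contraction mapping: converges since tau < 1
    V = [max(reward(s, u) + TAU * V[round(next_state(s, u) * 100)]
             for u in ACTIONS)
         for s in GRID]

# Greedy incentive at the fully selfish state s = 1 (grid analogue of u* in (7)).
u_star = max(ACTIONS,
             key=lambda u: reward(1.0, u) + TAU * V[round(next_state(1.0, u) * 100)])
print(u_star)
```

The computed greedy incentive at s = 1 is strictly positive: paying something now lowers the DeNS and buys discounted reliability later, which is exactly the tradeoff that (5) encodes.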
From Eqs. (6) and (7), the optimal infinite-horizon utility V*(S_k, u*(S_k)) and the optimal incentives u*(S_k) are obtained for effectively delivering packets, under the precondition that the source knows the node-selfishness dynamics in Eq. (4) as well as the future infinite-horizon utility V*(S_{k+1}, u*(S_{k+1})). Since the RNs' intrinsic dynamics f(S_k) and the incentive-gain rule g(S_k) are unknown to the source, the RNs' node-selfishness dynamics are also unknown, thus leading to unknown future DeNSs S_{k+1} and an unknown future infinite-horizon utility V*(S_{k+1}, u*(S_{k+1})). To circumvent these deficiencies, an NN-based approximation scheme is proposed to deliver packets under unknown node-selfishness dynamics.

C. NN-Based Approximation Scheme Under Unknown Node-Selfishness Dynamics

In this subsection, an NN-based approximation scheme is proposed to deliver packets under the unknown node-selfishness dynamics, comprising three parts: the NN-based identification of the dynamic node-selfishness model, the NN-based approximation of the infinite-horizon utility function V*(S_k, u*(S_k)), and the NN-based approximation of the optimal incentives u*(S_k).

1) NN-Based Approximation Scheme: An NN-based identification is proposed to generate f(S_k) and g(S_k). The dynamic
node-selfishness model of Eq. (4) can be expressed by using the following NN-based identification:

$$f(S_k) = W_f^T \theta_f(S_k) + \varepsilon_{f,k}, \qquad (8)$$

$$g(S_k) = W_g^T \theta_g(S_k) + \varepsilon_{g,k}, \qquad (9)$$
where W_f ∈ R^{L×N} and W_g ∈ R^{L×N} are the NN weights of the dynamic node-selfishness model with L being the number of hidden neurons, θ_f(S_k) ∈ R^{L×1} and θ_g(S_k) ∈ R^{L×1} are the NN activation functions of the dynamic node-selfishness model on the RNs' DeNSs S_k, and ε_{f,k} ∈ R^{N×1} and ε_{g,k} ∈ R^{N×1} are the NN approximation errors of the dynamic node-selfishness model. Hence, the node-selfishness model is rewritten as

$$S_{k+1} = \left[ W_f^T \theta_f(S_k) + \varepsilon_{f,k} \right] + \left[ W_g^T \theta_g(S_k) + \varepsilon_{g,k} \right] \circ u(S_k) \qquad (10)$$
$$= W_s^T\, \Theta_s(S_k) \circ U(S_k) + \bar{\varepsilon}_{s,k}, \qquad (11)$$

where W_s = [W_f^T\ W_g^T]^T, Θ_s(S_k) = [θ_f(S_k)^T\ θ_g(S_k)^T]^T, U(S_k) = [1_{1×N}\ u^T(S_k)]^T with 1_{1×N} being a vector whose entries are all one, and ε̄_{s,k} = [ε_{f,k}^T\ ε_{g,k}^T]^T ◦ [1_{1×N}\ u^T(S_k)]^T, with ‖Θ_s(S_k)‖ ≤ Θ_M, ‖Θ_s(S_k) ◦ U(S_k)‖ ≤ Θ̄_M and ‖ε̄_{s,k}‖ < ε̄_M, ∀ k.

Using the approximation property of NNs, the infinite-horizon utility function has the NN-based representation

$$V(S_k) = W_c^T \theta_c(S_k) + \varepsilon_{c,k}, \qquad (12)$$

where W_c ∈ R^{L_c×1} is the NN weight of the infinite-horizon utility function with L_c being the number of hidden neurons, θ_c(S_k) ∈ R^{L_c×1} is the NN activation function of the infinite-horizon utility function, and ε_{c,k} ∈ R is the NN approximation error of the infinite-horizon utility function. The optimal incentives provided by the source in Eq. (7) have the NN-based representation

$$u^*(S_k) = W_I^T \theta_I(S_k) + \varepsilon_{I,k}, \qquad (13)$$
where W_I ∈ R^{L_I×N} is the NN weight of the optimal incentives with L_I being the number of hidden neurons, θ_I(S_k) ∈ R^{L_I×1} is the NN activation function of the optimal incentives, and ε_{I,k} ∈ R^{N×1} is the NN approximation error of the optimal incentives.

2) Complexity of NN-Based Approximation Scheme: Since the numbers of input, hidden and output neurons determine the scale of an NN, the computational complexity of an NN is determined by these three numbers. Accordingly, the computational complexity of the NN in Eq. (11) is 2LN², that of the NN in Eq. (12) is NL_c, and that of the NN in Eq. (13) is NL_I. In the NN-based approximation scheme, these three NNs, i.e., Eqs. (11)–(13), should be trained many times to obtain the best approximation performance; thus the computational complexity of this scheme is MN(2NL + L_c + L_I), where M is the number of training rounds.

IV. SIMULATION RESULTS

By using the NN toolbox of MATLAB, we simulate the packet delivery between a source-destination pair in a simple SeWN including one selfish RN with f(s_k) = s_k, g(s_k) = -(s_k)² and the initial DeNS s_0 = 1. Meanwhile, we set
Q = 1, τ = 0.5 and π = 1. Additionally, the three NNs in Eqs. (11)–(13) all have three layers, where the numbers of their hidden-layer neurons are all 20 and the transfer functions of the hidden and output layers are set as "tansig" and "purelin," respectively.

Fig. 1. The relevant parameters of the NN toolbox for the dynamic node-selfishness model.

Fig. 1 shows the relevant parameters of the NN training for the dynamic node-selfishness model by using the NN toolbox in MATLAB. The training algorithm of the NN is "Levenberg-Marquardt," and the performance function is "Mean Squared Error." This NN finishes 51 iterations and takes 1 s. Meanwhile, the progress of the training performance over 51 epochs is shown, and the mean squared error decreases with the epochs. The best training performance is reached at epoch 51.

Fig. 2. The best training values of the RN's dynamic DeNS, infinite-horizon utility and optimal incentive by using NNs, and their corresponding errors.

Fig. 2 depicts the best training values of the RN's dynamic DeNS, optimal infinite-horizon utility and optimal incentive obtained by the NNs, and their corresponding errors at the best training performance. We generate the training data of these three NNs according to Eqs. (4), (6) and (7). Using these data, we train the corresponding NNs over many epochs to decrease the errors between the training data and the data generated by the trained NNs. At the last epochs, the orders of magnitude of these errors are 10^-3, 10^-2 and 10^-3, respectively, which shows that Eqs. (4), (6) and (7) are effectively approximated by the trained NNs.

Fig. 3. The RN's node-selfishness dynamics, utility and optimal incentive by using the trained NNs and a comparative method.

Using the trained NNs, we compare the proposed NN-based approximation scheme with a comparative method. In this comparative method, the RN's dynamic node-selfishness model is known, and the source only maximizes the instantaneous utility of delivering packets during frame k, i.e., V(s_k, u(s_k)) = P(s_k) - πC(u(s_k)). Fig. 3 shows the RN's node-selfishness dynamics, optimal instantaneous utility and optimal incentive under the trained NNs and the comparative method, marked "Baseline." The RN's DeNSs and the optimal incentives under both methods decrease with time, while the optimal instantaneous utilities increase with time. Meanwhile, during every frame k, the instantaneous utility obtained by maximizing the infinite-horizon utility is larger than that obtained by maximizing only the instantaneous utility.

V. CONCLUSION
The dynamic node-selfishness model was designed to formulate the dynamics of the RN's DeNS w.r.t. its own resources as well as the provided incentives. Under the unknown dynamic node-selfishness model, the NN-based approximation scheme was employed to determine the dynamic node-selfishness model of selfish RNs and to maximize the infinite-horizon utility of the source for delivering packets in the long term.

REFERENCES

[1] G. A. Shah, W. Liang, and O. B. Akan, "Cross-layer framework for QoS support in wireless multimedia sensor networks," IEEE Trans. Multimedia, vol. 14, no. 5, pp. 1442–1455, Oct. 2012.
[2] A. Mei and J. Stefa, "Give2Get: Forwarding in social mobile wireless networks of selfish individuals," IEEE Trans. Depend. Secure Comput., vol. 9, no. 4, pp. 569–582, Jul. 2012.
[3] E. Ataie and A. Movaghar, "Performance evaluation of mobile ad hoc networks in the presence of energy-based selfishness," in Proc. IEEE Int. Conf. Broadband Commun., Netw. Syst., Oct. 2006, pp. 1–6.
[4] F. Xing and W. Wang, "On the survivability of wireless ad hoc networks with node misbehaviors and failures," IEEE Trans. Depend. Secure Comput., vol. 7, no. 3, pp. 284–299, Jul. 2010.
[5] H. Zhou, J. Chen, J. Fan, Y. Du, and S. K. Das, "ConSub: Incentive-based content subscribing in selfish opportunistic mobile networks," IEEE J. Sel. Areas Commun., vol. 31, no. 9, pp. 669–679, Sep. 2013.
[6] M. E. Mahmoud and X. M. Shen, "Stimulating cooperation in multihop wireless networks using cheating detection system," in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–9.
[7] Z. Li and H. Y. Shen, "Game-theoretic analysis of cooperation incentive strategies in mobile ad hoc networks," IEEE Trans. Mobile Comput., vol. 11, no. 8, pp. 1287–1303, Aug. 2012.
[8] S. Jagannathan, Neural Network Control of Nonlinear Discrete-Time Systems. Boca Raton, FL, USA: CRC Press, 2006.
[9] A. E. Zonouz, L. Xing, V. M. Vokkarane, and Y. L. Sun, "Reliability-oriented single-path routing protocols in wireless sensor networks," IEEE Sens. J., vol. 14, no. 11, pp. 4059–4068, Nov. 2014.
[10] F. Wu, K. Gong, T. Zhang, G. Chen, and C. Qiao, "COMO: A game-theoretic approach for joint multirate opportunistic routing and forwarding in non-cooperative wireless networks," IEEE Trans. Wireless Commun., vol. 14, no. 2, pp. 948–959, Feb. 2015.