
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 59, NO. 12, DECEMBER 2011


Cross-Layer Design of Rateless Random Network Codes for Delay Optimization

Ming Xiao, Member, IEEE, Muriel Médard, Fellow, IEEE, and Tor Aulin, Fellow, IEEE

Abstract—We study joint network and channel code design to optimize delay performance. Here the delay is the transmission time of information packets from a source to sinks, without considering queuing effects. In our systems, network codes (network layer) are on top of channel codes (physical layer), which are disturbed by noise. Network codes run in a rateless random manner, and thus have erasure-correction capability. Under the constraint of finite transmission time, transmission errors are inevitable in the physical layer. A detection error in the physical layer means an erasure of a network codeword. For the analysis, we model the delay of each information generation in the network layer as independent, identically distributed random variables. The calculation approaches for delay measures are investigated for coded erasure networks. We show how to evaluate the rate and erasure probability of a set of channels belonging to one cut. We also show that the min-cut determines the decoding error probability in the sinks if the number of information packets is large. We observe that for a given amount of source information, a larger packet length leads to fewer packets to be transmitted but a higher physical-layer detection error probability. Further, a longer transmission time (delay) in the physical layer yields a smaller detection error probability at the physical layer. Thus, each of these two parameters has opposite impacts on the physical and network layers with respect to delay, and their optimal values should be found in a cross-layer approach. We then formulate the problems of optimizing delay performance, and discuss solutions for them.

Index Terms—Network coding, cross-layer design, delay, optimization.

I. INTRODUCTION

Network coding was originally proposed in [1], [2] to address the problem of network information flow for error-free networks. Reference [1] shows that the maximum possible flow is determined by the minimum cut between a source and sinks, and in general it can only be achieved by network coding at intermediate nodes. In [2], it was shown that linear network codes are sufficient to achieve min-cut

Paper approved by Ender Ayanoglu, the Editor for Communication Theory and Coding Applications of the IEEE Communications Society. Manuscript received on June 21, 2010; revised on December 8, 2010 and April 11, 2011. Part of this work was presented at the IEEE International Conference on Communications (ICC 2010), May 2010, Cape Town, South Africa. M. Xiao is with the ACCESS Linnaeus Center, School of Electrical Engineering, Royal Institute of Technology, Sweden (e-mail: [email protected]). M. Médard is with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, MA, USA (e-mail: [email protected]). T. Aulin is with the Department of Computer Science and Engineering, Chalmers University of Technology, Sweden (e-mail: [email protected]). The work of M. Xiao is supported in part by the Swedish Research Council (VR) under the grant 621-2008-4249. This material is also based upon work under subcontract # 18870740-37362-C issued by Stanford University and supported by the Defense Advanced Research Projects Agency (DARPA). Digital Object Identifier 10.1109/TCOMM.2011.112311.100366

capacity. In [3], an algebraic framework is established for network coding, in which linear network codes are described by transfer matrices. In [4], a random network coding approach is proposed. The benefits of random network codes are decentralized operation and adaptation to dynamic environments (e.g., dynamic network topology).

One classic example illustrating the benefit of network coding is the butterfly network in Fig. 1(a), in which source 𝑠 tries to multicast two information bits 𝑏1, 𝑏2 to two sinks 𝑡1 and 𝑡2. The channels are assumed to be error-free and to have a capacity of one bit (per time unit) each. It is easy to see that, due to the limitation of the channel capacity, the multicast objective cannot be achieved by routing (namely, replicate-and-forward). However, by network coding at intermediate node 𝑚3, the two information bits are XORed (binary addition) into one bit 𝑏1 ⊕ 𝑏2, which is received at both sinks. Each sink decoder rebuilds the two source bits by XORing the bits received from its two incoming channels.

Although the layered structure of network protocols often lets the network layer treat the lower layers as error-free pipelines, separating the design of different layers may not always be optimal (and is sometimes not even practical) in terms of, e.g., transmission resources or delay. Hence, network coding for networks with transmission errors (block erasures or symbol errors) has attracted more and more research interest. In [5], a rateless random network coding scheme is proposed for networks with erasure channels. In the scheme, intermediate nodes store and encode incoming codewords (packets), which have the same length. Compared to fountain codes ([6], [7]), which only encode (network layer) in the source node, rateless random network codes also encode in intermediate nodes. Hence, they can achieve the min-cut capacity of networks with erasure channels ([8]) because of the advantages of network coding ([5]), and thus higher rates are obtained in general.
We note that compared to fountain codes, rateless random network codes generally have higher encoding and decoding complexity. Also, they require more memory in intermediate nodes. For the cross-layer design, a concatenated random code scheme with network coding is proposed in [9], where the channel codes are inner codes in the physical layer, and the network codes are outer codes in the network layer. Reference [9] assumes a fixed erasure probability in the physical layer. The impact of network-layer parameters on the physical-layer performance has not been considered. In [10], the gains in delay resulting from random network coding are investigated. The networks in [10] are broadcasting erasure (ON/OFF) channels (e.g., for the downlink of a cellular network). One example of network coding for an erasure network is shown in Fig. 1(b). With network coding, all nodes simultaneously encode and transmit codewords with information stored in

0090-6778/11$26.00 © 2011 IEEE


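[Figure 1 diagrams: (a) butterfly network from source 𝑠 through intermediate nodes 𝑚1 to 𝑚4 to sinks 𝑡1 and 𝑡2, carrying 𝑏1, 𝑏2 and the coded bit 𝑏1 ⊕ 𝑏2; (b) erasure network with source 𝑠, intermediate nodes 𝑚1 and 𝑚2, sinks 𝑡1, 𝑡2, 𝑡3, and channels Ch1 to Ch8, where Ch1 (𝑤1 = 2, 𝜉1 = 0.1) and Ch2 (𝑤2 = 1, 𝜉2 = 0.04) form the min-cut.]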

Fig. 1. (a) Butterfly network. Two bits 𝑏1, 𝑏2 are to be multicast to two sinks 𝑡1, 𝑡2. All channels are error-free and have a capacity of one bit per time unit. (b) A network with two parallel erasure channels as the min-cut. The rates are in packets per second. The rate and erasure probability of Ch1 are 2 packets per second and 0.1, respectively. The rate and erasure probability of Ch2 are 1 packet per second and 0.04, respectively. The rate and erasure probability of each of the channels Ch3, Ch4 and Ch5 are 2 packets per second and 0.01, respectively. The rate and erasure probability of each of the channels Ch6, Ch7 and Ch8 are 1 packet per second and 0.01, respectively.

their memory. If one channel (say, Ch1) in the source-sink path has an erasure, 𝑚2 (through Ch3, Ch4, Ch5) can still send codewords encoded from all previously received codewords to the sinks. To further see the benefits of network coding (as opposed to the benefit of larger memory or source encoding), we assume that the source tries to transmit two messages 𝐼1, 𝐼2 to the sinks, and that 𝑚2 has received and stored two codewords 𝐶1 = 𝐼1 + 𝐼2 and 𝐶2 = 𝐼1 + 2𝐼2 (assuming that the codes have a sufficiently large field size). We also assume that Ch6 is not available (e.g., due to a very high erasure probability). Sink 𝑡1 has received one codeword 𝐶1 (or 𝐶2), but the other codeword 𝐶2 (or 𝐶1) is erased in Ch3. Due to the limitation of feedback, 𝑚2 does not know which codeword is lost. With network coding, however, 𝑚2 can transmit e.g., 𝐶3 = 𝛼1𝐶1 + 𝛼2𝐶2, where 𝛼1, 𝛼2 are nonzero coding coefficients. Then, 𝑡1 can rebuild both 𝐼1 and 𝐼2, no matter which codeword has been received before. Without network coding, 𝑚2 may (with probability 0.5) retransmit the codeword already received by 𝑡1, which does not help decoding. More generally, the maximum achievable rate of an erasure network with network coding equals the min-cut of the network, which generally cannot be achieved without network coding. A more detailed description will be given in Section II. A strict proof can be found in e.g., [8].

In this paper, we investigate the joint design of network and channel codes. The main objective is to optimize the delay of network codes (network layer), which are on top of the channel codes (physical layer). In the network layer, we use rateless random network codes ([5]), i.e., source and intermediate nodes randomly produce and transmit codewords until transmission succeeds or the maximum delay occurs (i.e., the system times out). On decoding successfully, a sink sends a positive feedback to the source.
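The two-message example above (𝐶1 = 𝐼1 + 𝐼2, 𝐶2 = 𝐼1 + 2𝐼2, 𝐶3 = 𝛼1𝐶1 + 𝛼2𝐶2) can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the prime field size `P = 257` and the helper names `solve_2x2` and `demo` are hypothetical and do not come from the paper, which only requires a sufficiently large field.

```python
# Sketch: a fresh combination C3 lets sink t1 decode whichever of C1, C2 it holds.
# Assumption: arithmetic over the prime field GF(P); P = 257 is illustrative only.

P = 257  # hypothetical field size


def solve_2x2(rows, values, p=P):
    """Solve [[a, b], [c, d]] * [I1, I2]^T = values (mod p) with Cramer's rule."""
    (a, b), (c, d) = rows
    det = (a * d - b * c) % p
    if det == 0:
        raise ValueError("coefficient rows are linearly dependent")
    inv = pow(det, -1, p)  # modular inverse, available in Python 3.8+
    i1 = ((values[0] * d - b * values[1]) * inv) % p
    i2 = ((a * values[1] - values[0] * c) * inv) % p
    return i1, i2


def demo(i1, i2, alpha1, alpha2, p=P):
    """Decode (I1, I2) from {C1, C3} and from {C2, C3}."""
    c1_row, c2_row = (1, 1), (1, 2)  # C1 = I1 + I2, C2 = I1 + 2*I2
    # Global coefficients of C3 = alpha1*C1 + alpha2*C2 in terms of I1, I2.
    c3_row = ((alpha1 + alpha2) % p, (alpha1 + 2 * alpha2) % p)
    c1 = (i1 + i2) % p
    c2 = (i1 + 2 * i2) % p
    c3 = (alpha1 * c1 + alpha2 * c2) % p
    from_c1 = solve_2x2([c1_row, c3_row], [c1, c3], p)
    from_c2 = solve_2x2([c2_row, c3_row], [c2, c3], p)
    return from_c1, from_c2


if __name__ == "__main__":
    print(demo(42, 7, 3, 5))
```

The determinant of the system formed by 𝐶1 and 𝐶3 is 𝛼2, and that of the system formed by 𝐶2 and 𝐶3 is −𝛼1, so nonzero coefficients are enough for both cases to be decodable.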
The source stops transmission if it receives positive feedback from all sinks, or the maximum transmission time-limit is reached. Thus, the codes have block-erasure-correction capability. Network coding is performed across many channel codewords. The physical-layer codewords are disturbed by channel noise. The transmission time of channel codewords is finite and much shorter than that of network codewords. Then, transmission errors are inevitable in the physical layer. The motivations of this work are as follows: Firstly, with a

long enough delay, rateless network codes can always collect enough packets to decode the source information. Yet, the delay is one of the essential performance measures for practical networks. It reflects both the quality of service (QoS) offered to upper layers and the cost of transmission resources. A longer transmission delay leads to lower QoS and a larger cost of resources. Further, if a strict maximum-delay constraint is enforced for some scenarios, transmission may time out, and an error event occurs if any sink cannot decode the source messages. Secondly, network-layer parameters (e.g., packet length) have significant impacts on the physical-layer erasure probability, which affects the delay in the network layer. Physical-layer parameters, such as the transmission time of codewords, also affect the network layer. Hence, to improve delay performance, we should design these parameters in a cross-layer way. Note that we aim at considering how the interactions of different layers affect the delay and how to optimize their parameters. We do not consider the delay caused by queuing effects.

The main contributions of the paper are as follows. We investigate the delay in packet-erasure networks using rateless random network codes (called coded erasure networks). We model the delay in coded erasure networks as independent, identically distributed random variables. For the delay performance, we show tradeoffs between the network layer and the physical layer on the length of the network codewords, and on the transmission time of the physical-layer codewords. To measure performance, we use the expected delay and the network-layer error probability for a given delay. We show how to calculate these delay measures for coded erasure networks. We formulate the problems of optimizing delay performance with respect to these measures, and propose solutions.

The organization of the paper is as follows: In Section II, we give a system description. In Section III, we analyze both network and physical layers.
In Section IV, we formulate and solve the problems of optimizing delay performance. Finally, conclusions are given in Section V.

II. SYSTEM DESCRIPTION

A. Network Layer

The network consists of one source node (denoted by 𝑠), one or more intermediate nodes and sinks, and channels with

XIAO et al.: CROSS-LAYER DESIGN OF RATELESS RANDOM NETWORK CODES FOR DELAY OPTIMIZATION

finite capacity for transmitting network-layer packets. The set of nodes in the network is denoted by $V$ ($s \in V$), and the set of channels is denoted by $E$. We assume directed and acyclic networks. We assume a set of $L$ ($L \geq 1$) sinks in the network. The sink set is denoted by $D = \{d_1, d_2, \cdots, d_L\}$, and $D \subset V$. Similar to [3], [11], a cut $S$ is defined as a partition of $V$ into two parts, $S^S$ and $S^D$, such that $s \in S^S$ and $d_i \in S^D$ for any $d_i \in D$. The value of a cut (denoted by $V(S)$) is evaluated as

$$V(S) = \sum_{e \in E(S)} z_e, \tag{1}$$

where $z_e$ is the capacity of channel $e$, which shall be shown later. $E(S)$ denotes the set of edges from $S^S$ to $S^D$, i.e., the set of channels from one side ($S^S$) of the cut to the other side ($S^D$).

In the source, a given amount (assuming $N$ bits) of information bits is transmitted to all $L$ sinks. These information bits are called a generation [4], [12]. They are divided into $K$ packets/blocks of equal length. We denote the $K$ source information packets as $S = \{s_1, s_2, \cdots, s_K\}$, where $s_i$ is a vector with $I = N/K$ information bits. Network coding is exclusively performed among these packets and their linear combinations in the source and intermediate nodes. Specifically, assuming that a node has $F$ output channels (channel 1, $\cdots$, channel $F$), a network codeword to channel $j$ is

$$Y_j = G_j S^T, \tag{2}$$

where $G_j = \{g_{j,1}, g_{j,2}, \cdots, g_{j,K}\}$ is the global encoding kernel (GEK), and $g_{j,i} \in \{0, 1, \cdots, M-1\}$ is the coding variable of the network codes. $M$ is the alphabet size of the network codes. For sufficiently large $M$, a sink decoder rebuilds $S$ if it receives $K$ packets with linearly independent (LI) GEKs [5].

In the intermediate nodes, the incoming packets are stored in memory. As in [12], [5], a packet received by a node is called innovative if its GEK is linearly independent of the GEKs of the packets already stored in the node, and only packets with LI GEKs are stored in memory for encoding. Assuming that $J$ ($\leq K$) packets (denoted by $X_i$) are stored in an encoding node,

$$Y_j = \sum_{i=1}^{J} \beta_{i,j} X_i, \tag{3}$$

where the scalars $\beta_{i,j}$ are local encoding kernels (LEKs). The $X_i$ come from different channels (and different time slots). The GEKs can be calculated from the LEKs ([4], [2]). Random codes ([4], [5]) mean that the $\beta_{i,j}$ are randomly chosen from the alphabet of the code. The LEKs (or GEKs) are independently generated for different $j$.

For network coding, we use rateless random network codes [4], [5]. Hence, in every intermediate node (and the source), network coding is performed once the node has stored some network codewords, and the network codewords are sent from the node whenever the channels are available. Owing to the advantages of network coding, rateless random network codes can generally achieve higher rates than fountain codes ([6], [7]), which only encode in the source. It was shown that rateless random network codes can achieve the min-cut capacity [4], [5]. If there is no strict maximum-delay constraint for the transmission of a generation, the transmission of the current generation stops when all sinks have received $K$ innovative packets. However, if there is a maximum-delay constraint, transmission stops (time-out) when the maximum delay occurs, and an error event results if any sink has not received $K$ innovative packets and thus cannot decode the source messages. In the following analysis, we shall consider both scenarios. We note that if there are multiple generations of information to be transmitted, packets from different generations are not mixed together.

B. Physical Layer

The network codewords are transmitted through channels disturbed by noise, and they are protected by channel codes. The channel encoder takes the network codewords as information messages and forms channel codewords, which are disturbed by additive white Gaussian noise (AWGN) with a double-sided power spectral density (PSD) $N_0/2$. At the end of each channel, channel codewords are detected¹ (decoded) by the channel detector to recover the network codewords. In our system, each channel codeword carries one network codeword. Thus, for a channel, a detection error in the physical layer means an erasure event at the network layer, and the detection error probability $\epsilon$ equals the erasure probability.

To facilitate the analysis of delay, without loss of generality, we assume continuous-time orthogonal codewords for the waveform channels (Chapter 8 of [13], or Chapter 7 of [14]). The relation between the physical-layer detection error probability, the transmission time and the information bit length is numerically well analyzed for this model (e.g., [13], [14]). If other channel codes (e.g., specific channel codes) or channel models are considered, the analysis is similar. Since the length of a network-layer codeword is $I = N/K$ bits, there are $Q = 2^I$ different codewords in the physical layer. Each codeword is mapped to a time-continuous signal, denoted by $\varphi_m(t)$, $m = 1, 2, \cdots, Q$. We assume a fixed and finite transmission power $P_s$ for each $\varphi_m(t)$. These waveforms are orthogonal to each other. Each signal uses a transmission time of $T_j$ seconds for channel $j$. Hence the rate of channel $j$ is $w_j = 1/T_j$ packets per second. We assume that the $T_j$ are finite and much smaller than the delay in the network layer. Then, transmission errors are inevitable, since error-free transmission needs infinite $T_j$ (infinite block length). Hence, the channel codewords for the network codewords $Y_j$ may be dropped by the receiving side of the channel. Here, we assume perfect error detection. Thus, if the received codewords cannot be perfectly detected in the physical layer, they are dropped/erased. This is a physical-layer error event. The error probability analysis will be discussed later. We note that with powerful channel codes and infinite $T_j$, physical-layer errors can be avoided or made negligible. However, here we mainly measure system performance by the delay. Hence, $T_j$ must be finite, and physical-layer errors are then significant.

¹ To avoid confusion with the decoder of the network layer, we call the decoder of the physical layer the detector.
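To make the dependence of the erasure probability on the packet length $I$ and the transmission time $T_j$ concrete, the following Python sketch evaluates the textbook union bound for coherent detection of $Q = 2^I$ equal-energy orthogonal waveforms in AWGN, $(Q-1)\,Q_f(\sqrt{P_s T_j / N_0})$, where $Q_f$ is the Gaussian tail function. This bound is an assumption chosen for illustration; it is not necessarily the expression used in the paper, which refers to [13], [14] for the analysis. The function names and the numerical values are hypothetical.

```python
# Sketch: union bound on the detection (erasure) probability for orthogonal signaling.
# Assumption: coherent detection, equal-energy orthogonal waveforms, AWGN with PSD N0/2.
import math


def gaussian_tail(x):
    """Gaussian tail function: P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def orthogonal_union_bound(info_bits, tx_time, power, n0):
    """Upper bound on the detection error probability for Q = 2**info_bits waveforms."""
    num_waveforms = 2 ** info_bits          # Q = 2^I
    energy = power * tx_time                # E_s = P_s * T_j
    pairwise = gaussian_tail(math.sqrt(energy / n0))
    return min(1.0, (num_waveforms - 1) * pairwise)


if __name__ == "__main__":
    # Illustrative numbers only: longer packets raise the bound, longer T_j lowers it.
    for bits, t in [(8, 1.0), (16, 1.0), (8, 2.0)]:
        print(bits, t, orthogonal_union_bound(bits, t, power=40.0, n0=1.0))
```

Under this bound, increasing $I$ (fewer, longer packets) raises the erasure probability, while increasing $T_j$ lowers it at the cost of a lower packet rate $w_j = 1/T_j$, which is the tradeoff the cross-layer design targets.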


C. System Assumptions

In addition to the above system description, we also assume that our systems have the following properties.

∙ To concentrate on the interaction of different layers, we assume intermediate nodes with sufficiently large memory, so that there is no memory overflow. More specifically, at each node, a memory of 𝑁 bits plus a certain overhead for packet headers is sufficient.
∙ We assume that the field size 𝑀 of the network codes is sufficiently large (we discuss in the next section how large 𝑀 must be). The assumption adds a constraint on the length of network codewords (they must be long enough). Yet, we note that 𝑀 increases exponentially with the number of bits in the coding symbol.
∙ We also assume that there is no interference between channels in the network. The assumption is common for wireless networks with orthogonal channels, or for wired networks, e.g., the Internet backbone. If interference appears in the network, the rates of channels will degrade in general. However, the achievable rate and detection error probability also depend on the physical-layer processing strategies. For instance, an intermediate node receiving multiple transmitted signals may treat interference either solely as noise or as useful signals [14], [15]. In the latter case, the node may decode multiple network codewords from different transmitters. Similarly, each transmitter may have multiple receivers. Then the definition of cuts will differ from that of interference-free networks. However, we note that the tradeoffs (to be discussed) and the cross-layer design principles will be similar, though interference management (which depends heavily on the specific network topology) will be important.
∙ We assume memoryless and time-invariant channels. Thus, the delay of transmitting one generation of information is not affected by channel variations. With this assumption, the delay of one generation of information is independent of that of another generation. Here we note that although the starting time of transmitting one generation might be affected by previous generations, the delay of the generation is not affected by other generations, since their packets are not mixed. Thus, the delays are independent and identically distributed (i.i.d.) random variables. We note that, especially for large-scale networks, the assumption is an idealization, since the network topology (and thus the shape or the capacity of the cuts) may change. Then, the delay is no longer identically distributed from generation to generation. Yet, the principles of the interaction between the network and physical layers will not change. For instance, the min-cut may change due to channel variations. By the same approach (that of Section IV), we can re-calculate the optimization variables.

III. SYSTEM ANALYSIS

A. Network Layer

We use the network layer delay of the sinks to measure system performance. It is defined as the time from the start of the transmission of the current generation until all sinks receive 𝐾 innovative codewords (ICs). Without considering

packet loss due to memory overflow, there are two factors affecting the delay. First, due to a transmission error in the physical layer, the transmitted codewords are erased with a certain probability. Second, the received packet might be non-innovative. We also ignore the operation delay, such as the encoding and decoding of network codewords. It is normally small compared to the transmission delay. Also, the operations can be performed simultaneously with the transmission. For instance, the transmission of the current generation can start while the previous generation is being decoded in the sink.

Since the LEKs (GEKs) are randomly chosen, the output network codeword may be non-innovative to the receiving node for a channel. In that case, the received codeword can be formed by a linear transformation of what is already stored in the receiving node, and hence it is useless for this node. This also happens with a certain probability. Yet, as we will discuss later, this factor is insignificant for network codes with nontrivial 𝑀 (alphabet size of network codes) and 𝐾. Thus, the delay is a random variable (denoted by Δ𝑇).

To measure the performance of the random variable Δ𝑇, we use two parameters: the network-layer decoding error probability for a given delay, and the expected delay. The former is a measure mainly for systems with a maximum-delay constraint. The latter is a measure for systems without the maximum-delay constraint, where the transmission continues until success. A detailed explanation of them is given as follows.

We first show how to calculate the decoding error probability given a delay 𝛿𝑇. For a given 𝛿𝑇, an error event (𝐸𝛿𝑇) occurs when one or more of the sinks cannot receive 𝐾 ICs in 𝛿𝑇 seconds (e.g., the system times out after 𝛿𝑇 seconds). One of our design objectives is to minimize the probability of 𝐸𝛿𝑇 (denoted by 𝑃𝐸(𝛿𝑇)) for most 𝛿𝑇 (especially high 𝛿𝑇). Clearly, 𝑃𝐸(𝛿𝑇) is a function of 𝛿𝑇.
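Regarding the second delay factor above (non-innovative packets), a receiving node can test whether a packet is innovative by checking whether its GEK increases the rank of the stored GEKs. The following Python sketch shows one way to do this; it assumes a prime alphabet size `p` so that integer arithmetic modulo `p` forms a field (an alphabet such as $2^6$ would need extension-field arithmetic instead), and the function names are hypothetical.

```python
# Sketch: innovation test for a received packet via rank over GF(p), p prime.
# Assumption: GEKs are lists of integers in {0, ..., p-1}; p is prime.


def rank_mod_p(rows, p):
    """Rank of a list of row vectors over GF(p), by Gauss-Jordan elimination."""
    mat = [[x % p for x in row] for row in rows]
    n_cols = len(mat[0]) if mat else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], -1, p)  # modular inverse (Python 3.8+)
        mat[rank] = [(x * inv) % p for x in mat[rank]]
        for r in range(len(mat)):
            if r != rank and mat[r][col]:
                factor = mat[r][col]
                mat[r] = [(x - factor * y) % p for x, y in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def is_innovative(stored_geks, new_gek, p):
    """True if new_gek is linearly independent of the stored GEKs over GF(p)."""
    return rank_mod_p(stored_geks + [new_gek], p) > rank_mod_p(stored_geks, p)


if __name__ == "__main__":
    stored = [[1, 0, 0], [0, 1, 0]]
    print(is_innovative(stored, [2, 3, 0], 7))  # lies in the span of the stored GEKs
    print(is_innovative(stored, [1, 1, 1], 7))  # adds a new dimension
```

A sink can decode a generation once the rank of its stored GEKs reaches 𝐾.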
With increasing delay, the sinks have a higher probability of receiving enough ICs, and $P_E(\delta_T)$ shall decrease. We now show how to evaluate $P_E(\delta_T)$ numerically. We start the analysis from the simplest network with only one single channel. Then, we generalize the result to more complicated networks.

For the single channel, we assume that it has a transmission rate of $w$ packets per second. Clearly, for a given delay of $\delta_T$ seconds, the source transmits $w\delta_T$ blocks. The sink cannot decode the $K$ source information blocks if it correctly detects fewer than $K$ ICs in $\delta_T$ seconds. Further, we assume that the channel has an erasure probability $\xi$. It is the probability that a transmitted packet cannot be detected correctly by the sink. Thus, for the channel model assumed in the previous section, the error probability (we denote it by $P_S(\delta_T)$ for a single channel, to distinguish it from that of a network, $P_E(\delta_T)$) follows the binomial distribution

$$P_S(\delta_T) = \sum_{j=0}^{K-1} \binom{w\delta_T}{j} (1-\xi)^j \xi^{w\delta_T - j} = \xi^{w\delta_T} \sum_{j=0}^{K-1} \binom{w\delta_T}{j} \left(\frac{1}{\xi} - 1\right)^j. \tag{4}$$
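As a numerical illustration of (4), the Python sketch below sums the binomial tail directly. It assumes that the number of transmitted blocks is an integer (the product of rate and delay is rounded down, which is an assumption of this sketch), and the function name is hypothetical.

```python
# Sketch: single-channel decoding error probability from (4).
# Assumption: the number of transmitted blocks n = w * delta_T is rounded down to an integer.
from math import comb, floor


def single_channel_error_prob(k, rate, delay, erasure_prob):
    """P_S(delta_T): probability that fewer than k of the n sent packets are received."""
    n = floor(rate * delay)
    # j runs over 0..k-1, capped at n because at most n packets can be received.
    return sum(
        comb(n, j) * (1 - erasure_prob) ** j * erasure_prob ** (n - j)
        for j in range(min(k, n + 1))
    )


if __name__ == "__main__":
    # Two packets sent over a channel with erasure probability 0.5, both needed (K = 2).
    print(single_channel_error_prob(2, 1, 2, 0.5))
```

When fewer than 𝐾 blocks have been transmitted, the sum covers the whole binomial distribution and the function returns 1 (up to floating-point rounding), as expected.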

Equation (4) can be further simplified by Chernoff's inequality or Hoeffding's inequality, in which case it becomes an inequality (an upper bound). We note that $P_S(\delta_T)$ can also be well approximated by error exponents ([13], [9]). The results are also bounds, rather than equalities. Thus, for a given $\delta_T$, $P_S(\delta_T)$ is determined by $\xi$, $w$ and $K$. For a given generation length $N$, $K$ decides the information length of each packet. Thus, $\xi$ is also significantly affected by $K$ (details are given in the next section). Hence, $K$ is an essential parameter for $P_S(\delta_T)$.

Now we generalize the above results to coded erasure networks. We first show how to evaluate $P_E(\delta_T)$ for a set of channels crossed by a cut. We assume that a cut $S$ crosses a set of channels $E(S) = \{e_1, e_2, \cdots, e_C\}$. For every channel $e_i \in E(S)$, the transmitting node is in the set $S^S$, and the receiving node is in the set $S^D$. The set of transmitting nodes of $E(S)$ is denoted by $V^S(S)$, and the set of receiving nodes of $E(S)$ is denoted by $V^D(S)$. Then, we have the following proposition.

Proposition 1: We assume that channels $e_1, e_2, \cdots, e_C$ have transmission rates $w_1, w_2, \cdots, w_C$, and erasure probabilities $\xi_1, \xi_2, \cdots, \xi_C$, respectively. $E(S)$ can be regarded (logically) as a single erasure channel with a rate $W$ and an erasure probability $\xi$, which are evaluated as

$$W = w_1 + w_2 + \cdots + w_C, \tag{5}$$

and

$$\xi = \frac{\sum_{i=1}^{C} w_i \xi_i}{W}. \tag{6}$$

Proof: Since the nodes of $V^S(S)$ are on one side of $S$, and the nodes of $V^D(S)$ are on the other side, we can regard $V^S(S)$ as one transmission node, and $V^D(S)$ as one receiving node, with only one channel between them with a rate (assuming $W$). Since we assume no interference, $V^S(S)$ sends $w_1$ blocks through $e_1$ to $V^D(S)$, $w_2$ blocks through $e_2$ to $V^D(S)$, and so on. Hence, we get (5). In one second, there are $W$ packets being transmitted through $S$. On average, Channel 1 erases $w_1\xi_1$ packets, Channel 2 erases $w_2\xi_2$ packets, and so on. For the cut, there are $\sum_{i=1}^{C} w_i\xi_i$ packets being erased in one second on average. Hence, the erasure probability can be calculated as (6). Q.E.D.

With the proposition, we can evaluate the rates of network codewords passing through every cut of a network. From the source to the sinks, there are in general multiple cuts. The rates of network codewords passing through them are bounded by the capacity of the network. For a coded erasure network, the capacity is ([5], [11])

$$C_N = \min_{d \in D} \min_{S \in Q(s,d)} \sum_{e_j \in E(S)} z_{e_j}, \tag{7}$$

where $Q(s, d)$ denotes the set of all cuts between source $s$ and sink $d$, and $z_{e_j} = w_j(1 - \xi_j)$ is the capacity of the erasure channel $e_j$. Here $w_j = 1/T_j$ and $\xi_j$ are the transmission rate and the erasure probability of $e_j$, respectively. The cut whose value (evaluated by (1)) equals $C_N$ is the min-cut for erasure networks ([5], [11]). Hence, the min-cut is the minimum among the min-cuts of all source-sink pairs in (7). This matches the delay definition, which requires all sinks to receive $K$ ICs.
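Proposition 1 and the capacity expression (7) can be illustrated with a short Python sketch. The channel values for the first cut are those of Ch1 and Ch2 in Fig. 1(b); the second cut passed to `network_capacity` and all function names are hypothetical, and the enumeration of the cuts of a given topology is left outside the sketch.

```python
# Sketch: merging the channels of a cut, following (5), (6), and evaluating (1), (7).
# Each channel is a (rate, erasure_prob) pair; rates are in packets per second.


def merge_cut(channels):
    """Equivalent single erasure channel (W, xi) of a cut."""
    total_rate = sum(w for w, _ in channels)                 # (5)
    xi = sum(w * e for w, e in channels) / total_rate        # (6)
    return total_rate, xi


def cut_value(channels):
    """Value of a cut, (1), with z_e = w_e * (1 - xi_e)."""
    return sum(w * (1 - e) for w, e in channels)


def network_capacity(cuts_per_sink):
    """Capacity (7): minimum cut value over all listed cuts of all sinks."""
    return min(
        min(cut_value(cut) for cut in cuts)
        for cuts in cuts_per_sink.values()
    )


if __name__ == "__main__":
    min_cut = [(2, 0.1), (1, 0.04)]      # Ch1 and Ch2 of Fig. 1(b)
    print(merge_cut(min_cut))            # W = 3, xi close to 0.08
    print(cut_value(min_cut))            # equals W * (1 - xi)
```

The value of a cut equals $W(1 - \xi)$ of its merged channel, so the two descriptions of a cut are consistent.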


During the transmission process, ICs are sent from the source to the sinks. Initially, only the source has ICs. Then, intermediate nodes receive ICs and start encoding and transmission. To facilitate the analysis, for the transmission of a generation, we divide the transmission period into two stages as follows.

Definition 1. Starting stage 𝛿𝑆: The time from the beginning of the transmission of a generation until all intermediate nodes have received network codewords (and have started encoding and transmitting).

The main reason for defining the starting stage is that when the transmission of a generation has just started, most nodes have no ICs, and network coding has not been used yet. From the point of view of IC transmission, the network is in a transient phase. Thus, the analysis is quite different from that of the later period, when all nodes have some ICs and network coding has started. By definition, 𝛿𝑆 is determined by the network topology (network diameter), namely, the number of hops on the routes from the source to the sinks, the capacity of the links among them, etc. After the starting stage, the flow of ICs is relatively stable. Then, we have the following definition.

Definition 2. Stable stage: The time after the starting stage until the end of the transmission of the generation.

For sufficiently large 𝐾, the starting stage is much shorter than the stable stage, since for a given network, the starting stage is relatively unchanged and the stable stage increases with 𝐾. However, considering 𝛿𝑆 makes our analysis more complete, especially for small 𝐾 or large networks. Now we show that in the stable stage, the min-cut determines the rate at which ICs are received in the sinks, and thus 𝑃𝐸(𝛿𝑇). By Proposition 1, we can reduce a cut of multiple channels into a single channel and evaluate the rate and the erasure probability. We denote the rate and erasure probability of the min-cut by 𝑊𝐶 and 𝜉𝐶, respectively.
Then we have

Proposition 2: For a coded erasure network with a negligible starting stage,

$$P_E(\delta_T) \approx P_{SC}(\delta_T) = \sum_{j=0}^{K-1} \binom{W_C\delta_T}{j} (1-\xi_C)^j \xi_C^{W_C\delta_T - j} \tag{8}$$

as $K$, $M$, $\delta_T$ grow to infinity.² Here $P_{SC}(\delta_T)$ denotes the error probability of the min-cut.

Proof: The proof is given in the Appendix.

Using Proposition 2, we can evaluate the $P_E(\delta_T)$ of a coded network by analyzing only one cut (the min-cut) of the network. Proposition 2 considers only the stable stage. If the starting stage (relatively constant) is also considered, we have

$$P_E(\delta_T) \approx P_{SC}(\delta_T - \delta_S). \tag{9}$$

By definition, 𝛿𝑆 is approximately the time when all nodes receive some network codewords. An accurate computation of 𝛿𝑆 is difficult for arbitrary networks. However, for the networks with simple topologies, e.g., tandem networks, 𝛿𝑆 can be computed. For a single channel 𝑗 with a transmission rate 𝑤𝑗 , the expected time of one packet passing it is 2 By

our experiments, tens are sufficient for small-to-medium networks.

3316

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 59, NO. 12, DECEMBER 2011

1/(𝑤𝑗 (1 − 𝜉𝑗 )) (See the proof of Corollary 1). Then, for an 𝑁 -hop tandem network, 𝛿𝑆 is approximated by 1/(𝑤𝑗 (1 − 𝜉𝑗 )) =

𝑗=1

𝑁 ∑

𝑇𝑗 /(1 − 𝜉𝑗 ).

(10)

−1

10

𝑗=1

In Proposition 2, we assume sufficiently large 𝐾. The assumption is for large-scale networks, especially certain special networks. Consider a network which has the min-cut cross two channels of two distinct paths: one path has very long delay (many hops) and another has short delay (much fewer hops). In such a scenario, at the starting stage, lots of ICs may be received in the sink through the low-delay path before the first IC arrives the sink from the long-delay path. Thus, in the starting stage, most of the ICs received by the sink is from the low delay path. In the period, the delay performance is determined by the low delay path, rather than the min-cut. However, during the stable stage, the sink receives ICs from both paths. Hence, the min-cut determines the rates received by the sinks, as we analyze in Appendix (Proof of Proposition 2). If 𝐾 is sufficiently large, then most of the ICs are received in the stable stage. Thus, our analysis on 𝑃𝐸 (𝛿𝑇 ) is still valid. Above, we assume sufficiently large (asymptotic 𝐾) for approximation. If 𝐾 is small, the approximation based on the min-cut is not valid. Note that whether 𝐾 is small or not is relative to the network topology. A small 𝐾 means that a significant part of ICs have been received by sinks before the system enters the stable stage. Then 𝛿𝜉 is significant in the total delay. Yet, the analysis of such scenario highly depends on the network topology. Due to lack of generality, we do not give further comments. In Fig. 3, we show the simulations and the calculation results for the network with a topology in Fig. 2. The simulations are performed over 50, 000 generations of information transmission for statistics. The calculation results are by (4)-(10). We assume 𝑤 = 1 packet per second and 𝑇 = 1 second for all channels. We can see that the calculation results follow the decoding probability 𝑃𝐸 (𝛿𝑇 ) well.

[Fig. 3 plot: decoding error probability (log scale) versus delay in seconds (100 to 150), 𝐾 = 100. Curves: simulations and calculation for network 1; simulations and calculation for network 2.]

Fig. 3. Simulation and calculated results of delay 𝛿𝑇 vs. decoding error probability 𝑃𝐸(⋅) for the networks with the topology in Fig. 2. 𝑇𝑗 = 𝑇 = 1 second for all channels. 𝐾 = 100, and 𝑀 = 2^6. Network 1 has 𝜉1 = 0.01, 𝜉2 = 0.01, 𝜉3 = 0.01, 𝜉4 = 0.2, and network 2 has 𝜉1 = 0.01, 𝜉2 = 0.01, 𝜉3 = 0.1, 𝜉4 = 0.01.

[Fig. 4 plot: decoding error probability 𝑃𝐸(𝛿𝑇) (log scale) versus delay in seconds (31 to 37). Curves: calculated result; simulations for 𝑀 = 2, 4, 8, 16, 32, 64, 128.]

[Fig. 2 diagram: four tandem channels, 𝑠 → 𝑚1 → 𝑚2 → 𝑚3 → 𝑡, where channel 𝑖 has erasure probability 𝜉𝑖.]

Fig. 4. Simulations with different alphabet sizes and calculated results (from (4)-(6)) for decoding error probability for the network in Fig. 1 (b). 𝐾 = 90.

Fig. 2. A network with 4 tandem channels. 𝜉𝑖 is the packet erasure probability.

In Fig. 1 (b), we consider a network with a more complex topology. The network has two parallel channels as the min-cut. Though the network is still not large, its topology is much more general than that of tandem networks, in the sense that it has multiple sinks and a min-cut with multiple channels. By Proposition 1, it is easy to verify that the min-cut can be logically reduced to a single channel with 𝑊 = 3 and 𝜉 = 0.08. The calculated results are obtained by Proposition 1 and equations (4)-(10). The simulation results are averaged over 10,000 generations of information with 𝐾 = 90. The simulated and calculated results are shown in Fig. 4. We can see again that the calculated results follow the simulated 𝑃𝐸(𝛿𝑇) well. In particular, in the high 𝛿𝑇 region (≥ 37), the two curves almost overlap.

In the above analysis, we assume that the alphabet size of the network codes is sufficiently large. Thus, the probability of being innovative, 𝑃𝐼,𝐵, is close to 1 if the transmitting nodes have more ICs than the receiving nodes ([5], also see the Appendix). However, the alphabet size determines the complexity of network codes [16], [17], so small alphabet sizes are preferable for complexity reasons. In Fig. 4, we show by simulations how the alphabet size affects 𝑃𝐸(𝛿𝑇). The calculated result assumes an infinitely large 𝑀. From the results, we can see that if the code has a small 𝑀 (smaller than 16), the error probability increases significantly with decreasing 𝑀. This is because the probability of transmitted codewords being non-innovative is high with small 𝑀. Furthermore, the figure shows that the code with 𝑀 ≥ 2^6 (6 bits per symbol) is already very close to the calculated result with the assumption of infinite 𝑀. A larger 𝑀 is thus unnecessary.

For a system without a maximum delay constraint, the expected delay (denoted by $\bar{\delta}_T$) is a key measure. If the sinks receive 𝐾 ICs at 𝛿𝑇 with a probability 𝑃𝛿(𝛿𝑇), $\bar{\delta}_T$ is defined as follows.

Definition 3: Expected delay, the expected waiting time until the sinks receive 𝐾 ICs. It is defined as

$$\bar{\delta}_T \triangleq \sum_{\{\delta_T\}} P_\delta(\delta_T)\,\delta_T, \tag{11}$$

where {𝛿𝑇} denotes the set of all possible delays 𝛿𝑇.

In general, it is very difficult to calculate $\bar{\delta}_T$ from the above definition, since 𝑃𝛿(𝛿𝑇) is very difficult to obtain. Thus, we have the following proposition as an approximation.

Proposition 3: For a coded erasure network with sufficiently large 𝐾, the expected delay approximately equals the expected delay for 𝐾 ICs passing the min-cut (denoted by 𝛿𝜉) plus the starting stage delay 𝛿𝑆, i.e.,

$$\bar{\delta}_T \approx \delta_S + \delta_\xi. \tag{12}$$

Proof: In the steady stage, all nodes simultaneously encode and transmit codewords. The rate of ICs received by the sinks equals the rate of ICs passing the min-cut (see the analysis in the Appendix). Then the expected delay in the steady stage is the time for 𝐾 packets to pass the min-cut (large 𝐾). Adding the delay in the starting stage, we find the overall expected delay $\bar{\delta}_T$. Q.E.D.

Corollary 1: For a coded erasure network with sufficiently large 𝐾 whose min-cut has a rate 𝑊 and an erasure probability 𝜉 (see Proposition 1 for a detailed explanation),

$$\delta_\xi = \frac{K}{W (1 - \xi)}. \tag{13}$$
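The closed form in (13) rests on the mean of a geometric waiting time. As a hedged numerical check, the sketch below sums the series 𝑗 𝜉^(𝑗−1) (1 − 𝜉)/𝑊 directly and compares it with 1/(𝑊(1 − 𝜉)). The values 𝑊 = 1, 𝜉 = 0.2 and 𝐾 = 100 are illustrative assumptions, and the function names are hypothetical.

```python
def per_ic_delay_series(W, xi, terms=200):
    """Truncated series sum_j j * xi^(j-1) * (1 - xi) / W for one IC."""
    return sum(j * xi ** (j - 1) * (1.0 - xi) / W for j in range(1, terms + 1))


def expected_min_cut_delay(K, W, xi):
    """Eq. (13): delta_xi = K / (W * (1 - xi))."""
    return K / (W * (1.0 - xi))


W, xi, K = 1.0, 0.2, 100          # illustrative values, not from the paper
series_value = per_ic_delay_series(W, xi)
closed_form = 1.0 / (W * (1.0 - xi))
delta_xi = expected_min_cut_delay(K, W, xi)
print(series_value, closed_form, delta_xi)
```

The truncated series and the closed form agree to within floating-point precision, and scaling by 𝐾 gives the expected min-cut delay used in (12).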

Proof: From the proof of Proposition 2, we know that the time for the sinks to receive 𝐾 ICs is about the time for 𝐾 packets to pass the min-cut. The receiving node of the min-cut channel gets one IC in 1/𝑊 seconds with probability 1 − 𝜉, one IC in 2/𝑊 seconds with probability 𝜉(1 − 𝜉), and, in general, one IC in 𝑗/𝑊 seconds with probability $\xi^{j-1}(1-\xi)$. Thus, the expected delay for one IC passing the min-cut is $\sum_{j=1}^{\infty} j\,\xi^{j-1}(1-\xi)/W = 1/(W(1-\xi))$. Then, the expected delay for 𝐾 packets passing the min-cut is 𝐾/(𝑊(1 − 𝜉)). Q.E.D.

B. Physical Layer

Now we analyze how the physical-layer detection error probability 𝜖 is affected by the delay. We assume a fixed transmission power 𝑃𝑠 and a finite 𝑇 (the transmission time for each physical-layer codeword). Thus, the energy 𝐸𝑠 = 𝑃𝑠𝑇 is also finite. The physical-layer capacity $C_P = P_s \log_2(e)/N_0$ (bits per second) is finite (Chapter 8, [13]). Also, because 𝑇 is finite, a detection error is inevitable even if the rate is smaller than 𝐶𝑃 ([13], [14]), since error-free transmission requires channel coding with infinite delay. An exact value of 𝜖 is quite difficult to calculate. In the medium-to-low 𝜖 region of most practical interest, 𝜖 can be well estimated by error exponent techniques

(Chapter 8 [13], Chapter 7 [14]), which are actually optimized union bounds. The bound is

$$\epsilon \le \begin{cases} \dfrac{2^{-T(0.5 C_P - R_p)}}{\sqrt{2\pi T C_P}}, & R_p \le \tfrac{1}{4} C_P, \\[2ex] 2^{-T(\sqrt{C_P} - \sqrt{R_p})^2}\, E_r, & \tfrac{1}{4} C_P < R_p < C_P, \end{cases} \tag{14}$$

where

$$E_r = \frac{1}{\sqrt{4\pi T}\,(\sqrt{C_P} - \sqrt{R_p})} + \frac{1}{\sqrt{4\pi T R_p}\,(2\sqrt{R_p} - \sqrt{C_P})},$$

and $R_p = I/T$ is the rate of the physical layer. Here $T C_P = (E_s/N_0)\log_2(e) = \log_2(e)\,\mathrm{SNR}$, and $T \cdot R_p = I = N/K$ is the number of physical-layer information bits (the length of the network codewords).

C. Tradeoff between Physical and Network Layer

From the above analysis, we can see that $P_E(\delta_T)$ is determined by the min-cut of the network. The erasure probability $\xi$ of a single channel equals the detection error probability $\epsilon$ in the physical layer. By Proposition 2, both $W_C$ and $\xi_C$ are decided by the erasure probabilities of the min-cut channels (given the transmission rates $w_j$). Thus, $P_E(\delta_T)$ is decided by the erasure probabilities of the min-cut channels. In this sense, it seems that to minimize delay we only need to minimize the erasure probabilities of the min-cut, e.g., by increasing $T$ and/or decreasing $I$ (the length of the information message of the physical layer). However, changing these physical-layer parameters involves tradeoffs in the network layer.

From (14), if $T$ is fixed, a larger $I$ causes a higher $\epsilon$ (higher $\xi$), and thus a larger delay. However, for a given $N$, a larger $I$ means fewer source packets (smaller $K$), which leads to a shorter transmission delay if $T$ and $\xi$ are kept unchanged. Hence, there is a tradeoff in the length of the network codewords $I$ (or $K$) between the physical and the network layer.

Another tradeoff concerns the transmission time $T$. Obviously, increasing $T$ leads to a longer delay in the network layer if all other conditions ($K$ and $\xi$) are kept unchanged. However, from (14), a larger $T$ leads to a smaller $\epsilon$ (smaller $\xi$), which, taken alone, shortens the delay for a given $K$. Thus, there is another tradeoff between the network and the physical layer with respect to $T$. From these observations, it is valuable to find the optimal $K$ (or $I$) and $T$ by jointly considering the physical and network layers (in a cross-layer method) for the delay measure. Hence, we formulate and solve delay-optimization problems in the following section.
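The two physical-layer trends described above can be illustrated with a small sketch that evaluates only the low-rate branch of (14), i.e., 𝜖 ≤ 2^(−𝑇(0.5𝐶𝑃 − 𝑅𝑝))/√(2𝜋𝑇𝐶𝑃) for 𝑅𝑝 ≤ 𝐶𝑃/4. The function name `epsilon_low_rate` and the chosen values of 𝐼 and 𝑇 are assumptions made here for illustration; 𝐶𝑃 = 22.87 is the value used in the paper's later single-channel example.

```python
import math


def epsilon_low_rate(T, C_P, I):
    """Low-rate branch of (14), valid for R_p = I / T <= C_P / 4."""
    R_p = I / T
    if R_p > C_P / 4.0:
        raise ValueError("low-rate branch of (14) needs R_p <= C_P / 4")
    return 2.0 ** (-T * (0.5 * C_P - R_p)) / math.sqrt(2.0 * math.pi * T * C_P)


C_P = 22.87  # bits per second
eps_base = epsilon_low_rate(T=1.0, C_P=C_P, I=4.0)
eps_longer_packet = epsilon_low_rate(T=1.0, C_P=C_P, I=5.0)   # larger I
eps_longer_time = epsilon_low_rate(T=1.2, C_P=C_P, I=5.0)     # larger T
print(eps_base, eps_longer_packet, eps_longer_time)
```

In this branch each extra information bit doubles the bound, while a longer 𝑇 lowers it; the network layer sees the opposite effect through the number of packets 𝐾 = 𝑁/𝐼 and the per-packet time 𝑇.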
We note that, as shown in the previous example, an alphabet size of a few bits (log2 𝑀 bits per symbol) is sufficient for network codes. Without considering complexity, the network code can use an alphabet size 𝑀 = 2^𝐼. That is, all bits of a physical-layer message form one network coding symbol. Then the alphabet size is generally quite large and the coding complexity is rather high. If the code is complexity-constrained, a physical-layer codeword may contain multiple network coding symbols, namely, 𝐼 = 𝑛 log2 𝑀, where 𝑛 ≥ 1 is an integer. We may append a few all-zero bits (fewer than log2 𝑀 bits) to meet the requirement of an integer 𝑛.

IV. PROBLEM FORMULATION AND SOLUTION

A. Optimal Decoding Error Probability with Fixed 𝑇

First, we consider the scenario with fixed 𝑇𝑗 for all channels. From (14), we know that 𝜖 (thus 𝜉) is determined by 𝐼 = 𝑁/𝐾 when the 𝑇𝑗s are fixed. Since 𝑁 is given, 𝜉 is a function of 𝐾. For simplicity, we denote the function by 𝜉 = 𝑓1(𝐾), where 𝜉 is the erasure probability of the min-cut (see Proposition 1). For a single channel, 𝑓1(⋅) is given by (14) (with 𝑅𝑝 = 𝑁/(𝐾𝑇)). For a min-cut crossing multiple channels, the erasure probability of each component channel is evaluated by (14), and these are then combined into the erasure probability of the cut, 𝑓1(⋅), using Proposition 1. 𝑃𝐸(𝛿𝑇) is determined by 𝐾 and 𝜉. We denote (9) by 𝑃𝐸(𝛿𝑇) = 𝑓2(𝛿𝑇, 𝜉, 𝐾). An optimization problem is formulated as

Minimize 𝑃𝐸(𝛿𝑇) = 𝑓2(𝛿𝑇, 𝜉, 𝐾)
S. T. 𝜉 = 𝑓1(𝐾).

The integer $K \in \{1, \cdots, N\}$ is the only optimization variable, since $\xi$ is also a function of $K$; $\delta_T$ is a given parameter. By inserting $f_1(\cdot)$ into $f_2(\cdot)$, we can get more insight into the properties of the problem. Let us first assume one channel in the min-cut and a negligible $\delta_S$, and assume the erasure probability

$$\xi = \epsilon = \frac{2^{-T(0.5 C_P - N/(KT))}}{\sqrt{2\pi T C_P}} = C_1 2^{N/K}, \qquad C_1 = \frac{2^{-T(0.5 C_P)}}{\sqrt{2\pi T C_P}},$$

where $C_1$ is constant with respect to $K$. Then, our problem is reduced to minimizing

$$P_S(\delta_T) = \sum_{j=0}^{K-1} \binom{w\delta_T}{j} \left(1 - C_1 2^{N/K}\right)^{j} \left(C_1 2^{N/K}\right)^{w\delta_T - j}.$$

An exact analysis of the problem is intractable, since $K$ also decides the number of terms in the sum. Another difficulty in solving the optimization problem is that $K$ is an integer. Yet, as an approximation, we can use Chernoff's inequality,

$$P_S(\delta_T) \le \exp\!\left(-\frac{2\left((1 - C_1 2^{N/K})\,n - K + 1\right)^2}{n}\right). \tag{15}$$
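To see how loose or tight the bound (15) is, the following sketch compares it with the exact binomial tail for a single channel, writing 𝑛 = 𝑤𝛿𝑇 for the number of packets sent in time 𝛿𝑇. The pair 𝐾 = 80, 𝜉 = 0.07 is taken from the Fig. 5 legend; the values of 𝑛 and the function names are assumptions chosen for illustration.

```python
import math


def tail_exact(n, K, xi):
    """P_S = sum_{j=0}^{K-1} C(n, j) (1 - xi)^j xi^(n - j)."""
    p = 1.0 - xi
    return sum(math.comb(n, j) * p ** j * xi ** (n - j) for j in range(K))


def tail_bound(n, K, xi):
    """Right-hand side of (15); trivial bound 1 when the margin is not positive."""
    margin = (1.0 - xi) * n - K + 1
    if margin <= 0:
        return 1.0
    return math.exp(-2.0 * margin * margin / n)


xi, K = 0.07, 80
results = {n: (tail_exact(n, K, xi), tail_bound(n, K, xi)) for n in (90, 100, 110)}
for n, (exact, bound) in results.items():
    print(f"n = {n}: exact = {exact:.3e}, bound (15) = {bound:.3e}")
```

The bound always lies above the exact tail, and both fall quickly as 𝑛 grows, so the bound is mainly useful for the qualitative dependence on 𝐾 rather than for accurate values.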

Here we use $n = w\delta_T$ to simplify the notation. The derivative of the RHS (right-hand side) of (15) with respect to $K$ is

$$e^{-\frac{2((1 - C_1 2^{N/K})n - K + 1)^2}{n}} \left(\frac{4}{n}\right) \left(n C_1 2^{N/K} + K - n - 1\right) \left(n C_1 \ln 2\, \frac{N}{K^2}\, 2^{N/K} - 1\right).$$

The derivative and the second-order derivative of the RHS of (15) are not uniformly positive or negative. Thus, we cannot conclude whether it is convex or concave in general. Yet, we note that $n$ is the number of packets transmitted in time $\delta_T$. We are mainly interested in the scenario $n(1-\epsilon) > K$ (corresponding to a large $\delta_T$), which means that at least $K$ packets are expected to be received by the receiver (otherwise, $P_S(\delta_T) \approx 1$). Under this condition, $n C_1 2^{N/K} + K - n - 1 < 0$ (note that $C_1 2^{N/K} = \epsilon < 1$). The factor $n C_1 \ln 2\, \frac{N}{K^2}\, 2^{N/K} - 1$ decreases monotonically with $K$. Since $K \in \{1, \cdots, N\}$, for $\frac{1}{\epsilon N \ln 2} < n < \frac{N}{\epsilon \ln 2}$ this factor decreases from a positive to a negative number as $K$ increases from 1 to $N$. This range of $n$ is probably the most interesting one in practice. Then, the derivative of the RHS of (15) changes from negative to positive as $K$ increases from 1 to $N$. The critical point is the $K$ such that $n C_1 \ln 2\, \frac{N}{K^2}\, 2^{N/K} - 1 = 0$. Thus, we can find the minimum for a given $\delta_T$ by solving this equation for $\max\{\frac{1}{\epsilon N \ln 2}, \frac{K}{1-\epsilon}\} < n < \frac{N}{\epsilon \ln 2}$. Note that the boundaries of the inequality include $\epsilon$, which is a function of $K$. Thus, the optimal $K$ needs to ensure that the inequality holds. Otherwise, the optimal $K$ may not exist or may lie on the boundary of the inequality for $n$. For other $n$, e.g., $n > \frac{N}{\epsilon \ln 2}$, the factor $n C_1 \ln 2\, \frac{N}{K^2}\, 2^{N/K} - 1$ is always positive. Since the other factor is negative, the derivative is negative and the RHS of (15) is a decreasing function of $K$. The minimum is then at the boundary points such that $n(1-\epsilon) > K$ and $n > \frac{N}{\epsilon \ln 2}$ hold.

Though (15) is an upper bound, it gives insight into how $P_S(\delta_T)$ is affected by $K$. If there are multiple channels in the min-cut, the erasure probability and transmission rate of the cut are linear combinations of those of the individual channels, as discussed in Proposition 1, (5) and (6). A similar approach can be used to find the optimized $K$ and $P_S(\delta_T)$ for a given $\delta_T$. Note that for $\frac{1}{4} C_P < R_p < C_P$, a similar analysis may be analytically intractable since the derivative becomes too complicated.

Furthermore, in some cases (if not most), we should not consider $P_E(\delta_T)$ for only one specific $\delta_T$. How $K$ affects $P_E(\delta_T)$ as a function of $\delta_T$ (namely, over a wide range of $\delta_T$) might be more interesting. Intuitively, a larger $K$ leads to a smaller $I$ and a smaller $\xi$. Yet, it also means more packets at the source. Thus, it normally leads to a sharper slope of the $\delta_T$-to-$P_E(\delta_T)$ function in the high $\delta_T$ region (larger exponent), but a higher $P_E(\delta_T)$ in the low $\delta_T$ region. More formally, the exponent in (15) is

$$-2\left((1 - C_1 2^{N/K})\sqrt{n} - \frac{K-1}{\sqrt{n}}\right)^2, \qquad n = w\delta_T.$$

For small $\delta_T$, $\frac{K-1}{\sqrt{n}}$ has a large impact, so a larger $K$ leads to a higher error probability. For large $\delta_T$, $(1 - C_1 2^{N/K})\sqrt{n}$ is the dominating term, so a larger $K$ leads to a larger exponent of $P_E(\delta_T)$, and thus sharper slopes.

In Fig. 5, we show an example for a network with two identical tandem channels. We assume $N = 1200$ bits. Since $\xi$ is relatively high, it is easily evaluated from simulations. $T_j = T$ is fixed for all channels and thus normalized to one second. From the figure, we can see that increasing $K$ leads to a larger exponent in the $P_E(\delta_T)$ curves (increased slope). Yet, this does not necessarily mean better performance. In the relatively high $P_E(\delta_T)$ region, a small $K$ gives a much smaller $P_E(\delta_T)$. The results match our analysis.
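The search for the 𝐾 that minimizes the bound (15) can be sketched as a brute-force scan over the integers, using the assumed single-channel form 𝜉 = 𝐶1 2^(𝑁/𝐾) from the derivation above. The values 𝑛 = 400 and 𝑇𝐶𝑃 = 22.87 are hypothetical choices for illustration (𝑁 = 1200 bits as in the Fig. 5 example), all function names are introduced here, and the numbers are only meant to show the sign change of the derivative factor around the minimizer.

```python
import math


def epsilon(K, N, TC):
    """Assumed model xi = C_1 * 2^(N/K), with TC = T * C_P; capped at 1."""
    log2_eps = N / K - 0.5 * TC - 0.5 * math.log2(2.0 * math.pi * TC)
    return 1.0 if log2_eps >= 0 else 2.0 ** log2_eps


def bound_exponent(K, n, N, TC):
    """Exponent of (15): -2 * ((1 - eps) * n - K + 1)^2 / n."""
    margin = (1.0 - epsilon(K, N, TC)) * n - K + 1
    return -2.0 * margin * margin / n


def slope_factor(K, n, N, TC):
    """n * C_1 * ln2 * (N / K^2) * 2^(N/K) - 1; its zero is the critical point."""
    return n * epsilon(K, N, TC) * math.log(2.0) * N / K ** 2 - 1.0


def best_K(n, N, TC):
    """Scan integer K with n * (1 - eps) > K and minimize the exponent of (15)."""
    feasible = [K for K in range(1, N + 1) if (1.0 - epsilon(K, N, TC)) * n > K]
    return min(feasible, key=lambda K: bound_exponent(K, n, N, TC))


n, N, TC = 400, 1200, 22.87   # hypothetical n = w * delta_T; N in bits
K_star = best_K(n, N, TC)
print(K_star, slope_factor(K_star - 1, n, N, TC), slope_factor(K_star + 1, n, N, TC))
```

The derivative factor is positive just below the returned 𝐾 and negative just above it, which matches the critical-point condition stated in the text. Because 𝐾 is an integer, such a scan sidesteps solving the transcendental equation exactly.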
Thus, the choice of the optimal 𝐾 largely depends on which 𝑃𝐸(𝛿𝑇) region we consider. There is no unique optimal solution when we consider the whole range of 𝛿𝑇. For instance, the code with 𝐾 = 80 has a lower 𝑃𝐸(𝛿𝑇) than that with 𝐾 = 90 until a delay of 105 seconds. After that, the code with 𝐾 = 90 has the lower 𝑃𝐸(𝛿𝑇). Another note is that, since 𝐾 is an integer in [1, 𝑁], the number of candidate values is finite. From (9) and (14), both 𝑃𝐸(𝛿𝑇) and 𝜉 can be calculated from 𝐾. Thus, one possible solution is to draw the 𝑃𝐸(𝛿𝑇) curves for all values of 𝐾, and pick the one best suited to the design objective.

B. Optimal Decoding Error Probability with Variable 𝑇s

Above, we assumed fixed 𝑇𝑗s for the physical-layer codewords. If we relax this constraint and consider the 𝑇𝑗s as variables too, more freedom is obtained for the design. Now, we consider both the 𝑇𝑗s and 𝐾 as optimization variables. Since the physical-layer transmitted codeword 𝜑𝑚(𝑡) has a fixed power 𝑃𝑠, increasing 𝑇𝑗 leads to a larger 𝐸𝑠, and thus a larger SNR. From (14), increasing 𝑇 means a smaller 𝜖 (thus smaller 𝜉s for the min-cut channels). We denote 𝜉 = 𝑓4({𝑇𝑗}, 𝐾), which has a variable SNR, in contrast to 𝑓1(⋅). Here we use {𝑇𝑗} to denote the set of 𝑇𝑗s for all channels (𝑗 = 1, 2, ⋯). For convenience, we denote 𝑃𝐸(𝛿𝑇) as a function of 𝐾 and {𝑇𝑗} by 𝑃𝐸(𝛿𝑇) = 𝑓3(𝛿𝑇, 𝜉, {𝑇𝑗}, 𝐾). Then, the problem becomes

Minimize 𝑃𝐸(𝛿𝑇) = 𝑓3(𝛿𝑇, 𝜉, {𝑇𝑗}, 𝐾)
S. T. 𝜉 = 𝑓4({𝑇𝑗}, 𝐾).

[Fig. 5 plot: decoding error probability 𝑃𝐸(𝛿𝑇) (log scale) versus delay 𝛿𝑇 in seconds. Curves: 𝐾 = 63, 𝜉 = 0.2; 𝐾 = 80, 𝜉 = 0.07; 𝐾 = 90, 𝜉 = 0.025.]

Fig. 5. Impact of 𝐾 on 𝑃𝐸(𝛿𝑇) for a network with two tandem channels.

[Fig. 6 plot legend: 𝑇 = 0.8 second, 𝜉 = 0.19, SNR = 11 dB; 𝑇 = 1 second, 𝜉 = 0.07, SNR = 12 dB.]

Here both $\{T_j\}$ and $K$ are optimization variables, and $\xi$ is a function of them. Clearly, the $T_j$s are a set of continuous-valued variables, and $K$ is an integer. Thus, the problem falls into the field of mixed integer nonlinear programming (MINLP) [18], [19]. In general, this type of optimization problem is quite difficult, and there is normally no solution with polynomial complexity [18], [19]. For the previous problem with fixed $T$s, an analytic solution for the optimal $K$ may be found under special conditions. A similar approach may be applicable to $T_j$. Then, in (15), $C_1$ is a function of $T$; namely, the exponent of $P_E(\delta_T)$ is

$$-2\left(\left(1 - \frac{2^{-T(0.5 C_P)}}{\sqrt{2\pi T C_P}}\, 2^{N/K}\right)\sqrt{n} - \frac{K-1}{\sqrt{n}}\right)^2.$$

We can see that a larger $T$ leads to a larger exponent for large $n$, when $\left(1 - \frac{2^{-T(0.5 C_P)}}{\sqrt{2\pi T C_P}}\, 2^{N/K}\right)\sqrt{n}$ is dominant. For small $n$, $\frac{K-1}{\sqrt{n}}$ is dominant. Then, a large $T$ may lead to a higher $P_E(\delta_T)$.

For illustration, we consider a single-channel system and compare two scenarios with $T = 0.8$ and $1$ second. We assume $C_P = 22.87$, $K = 80$, $N = 1000$. Then $\xi = \epsilon \approx 0.19$ and $0.07$, respectively. $P_E(\delta_T)$ can thus be evaluated by inserting these values into (4). The numerical results are shown in Fig. 6. We can see that a $T$ that outperforms another in one region may be worse in another region, and that a larger $T$ leads to a larger exponent of $P_E(\delta_T)$ in absolute value (a sharper slope in the figure). Thus, the numerical results match our analysis of the exponent of $P_E(\delta_T)$. One note is that the crossing point of the two curves is at a very low $P_E(\delta_T)$. However, by our analysis of $P_E(\delta_T)$, this is determined by network and channel parameters such as $K$, $C_P$, and so on. More importantly, similar to the impact of $K$, there is no single optimal $T$ for all $\delta_T$ regions. If the range of $K$ is not too large, a pragmatic method can be used: we can draw all curves with different $K$ or $T$ in a single figure, and choose the one most suitable for the practical design objective.

C. Minimizing Expected Delay

Now we discuss how to minimize the expected delay. In (12), $\delta_S$ is quite small compared to $\delta_\xi$ for a large $K$, and it is also relatively fixed (less affected by $K$). Thus, we minimize $\delta_\xi$ instead of $\bar{\delta}_T$. The problem becomes

[Fig. 6 plot: decoding error probability 𝑃𝐸(𝛿𝑇) (log scale) versus delay 𝛿𝑇 in seconds.]

Fig. 6. A tradeoff example for 𝑃𝐸 (𝛿𝑇 ) with different 𝑇 in a single channel. 𝐾 = 80, 𝐶𝑃 = 22.87. SNR changes with 𝑇 .

Minimize 𝛿𝜉 = 𝐾/(𝑊 (1 − 𝜉)) S. T. 𝜉 = 𝑓4 ({𝑇𝑗 }, 𝐾).
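For the Fig. 2 illustration used in this subsection, the paper fits 𝜉 ≈ 0.0018𝐼² − 0.0312𝐼 + 0.1498 with 𝐼 = 1200/𝐾. As a hedged numerical check, the sketch below minimizes 𝐼(𝜉(𝐼) − 1), which is equivalent to minimizing 𝐾/(1 − 𝜉) for fixed 𝑁 and 𝑊, by solving for the positive critical point and rounding 𝐼 to the nearest integer as the paper does. The variable and function names are assumptions introduced here.

```python
import math

A, B, C = 0.0018, -0.0312, 0.1498   # fitted xi(I) = A*I^2 + B*I + C
N = 1200                            # source bits


def xi_fit(I):
    return A * I * I + B * I + C


def objective(I):
    """I * (xi(I) - 1); minimizing it minimizes K / (1 - xi) with K = N / I."""
    return I * (xi_fit(I) - 1.0)


# Critical points of A*I^3 + B*I^2 + (C - 1)*I: 3A*I^2 + 2B*I + (C - 1) = 0.
a, b, c = 3.0 * A, 2.0 * B, C - 1.0
I_star = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)   # positive root
second_derivative = 2.0 * a * I_star + b                     # > 0 at a minimum

I_opt = round(I_star)
K_opt = N // I_opt
print(f"I* = {I_star:.2f} -> I = {I_opt}, K = {K_opt}, xi = {xi_fit(I_opt):.4f}")
```

The continuous minimizer is slightly below 20; rounding 𝐼 to an integer gives 𝐼 = 20 and 𝐾 = 60.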

Solving this problem is relatively direct. We use the network in Fig. 2 for illustration. We assume that $T_j = T$ are fixed and normalized to 1 second for all channels $j$. If the $T_j$s are variables too, approaches similar to those for the previous problems can be used as an extension. Depending on $f_4(\cdot)$, the problem can be reduced to a convex optimization problem, which has an efficient (polynomial-complexity) solution. For instance, assuming a low $\xi$, (14) gives

$$\xi = \epsilon = \frac{2^{-(0.5 \log_2(e) E_s/N_0 - N/K)}}{\sqrt{2\pi (E_s/N_0) \log_2(e)}}.$$

Since $K$ is the only variable in $f_4(\cdot)$, and $\xi$ is the weighted average of all channel erasure probabilities in the min-cut, we write $\xi = C_1 \cdot 2^{N/K}$, where $C_1$ is independent of $K$. Thus, the problem is to minimize $K/(1 - C_1 \cdot 2^{N/K})$. Here we drop $W$ since it is a constant. Since $I = N/K > 0$ and $N$ is a constant, the problem is reduced to

Minimize $f_6(I) \triangleq I(C_1 2^{I} - 1)$ for $N > I > 0$.

Here the only variable is $I$ (namely $N/K$). It is easy to verify that the second derivative of $f_6(\cdot)$ is strictly positive. Thus, $f_6(\cdot)$ is convex.

For the network in Fig. 2, we assume $E_s/N_0 = 12$ dB and $N = 1200$ bits, and find the relation between $\xi$ and $I$ (and $K$). We assume that the system works in the high $\xi$ region. To further simplify the problem, we apply polynomial curve fitting to the simulation results to obtain an expression for $f_4(I)$. We can approximate $\xi = f_4(I) \approx 0.0018 I^2 - 0.0312 I + 0.1498$ with $I = 1200/K$. Inserting this into $\delta_\xi$, the problem is reduced to minimizing $I(0.0018 I^2 - 0.0312 I + 0.1498 - 1)$, i.e., $f_6(I)$ with $\xi$ replaced by the fitted polynomial. By investigating the critical points, it is easy to find the optimal value of $I$ as $I^* = 20$ (rounded to the nearest integer) and thus $K^* = 60$. In Fig. 7, we show the simulations for the codes with the optimal $K$ ($K = 60$). For comparison, we also give the numerical results for $K = 55$ and $K = 87$. The vertical lines highlight the expected delays of the schemes with different $K$s. From the figure, we can see that the expected delay of the system

[Fig. 7 plot: distribution of the delay (log scale) versus delay (60 to 105). Curves: 𝐾 = 60, 𝜉 = 0.2332; 𝐾 = 55, 𝜉 = 0.3078; 𝐾 = 87, 𝜉 = 0.0532; vertical lines mark the expected delay for each 𝐾.]

Fig. 7. Distribution of the delay for different 𝐾. Vertical lines are the expected delays. 𝐾 = 60 is the optimal value. The expected delays are from simulation results.

with an optimal 𝐾 = 𝐾∗ = 60 is the smallest. Since the delay is a random variable, we also show the histogram (relative frequencies) of the delay for the schemes with different 𝐾. By comparison, the histograms also match our optimization results.

V. CONCLUSIONS

We have studied cross network-physical layer design for networks with rateless random network codes. The design objective is the transmission delay in the network layer, which we measure by the decoding error probability at the sinks for a given delay and by the expected delay. The main contributions are as follows. We observe that the delays of a coded network with erasure channels are i.i.d. random variables. The calculation approaches for delay measures are investigated for coded erasure networks. We show how to evaluate the rate and erasure probability of a set of channels of one cut. We also show that, for a sufficiently large number of information packets, the decoding error probability in the sinks can be evaluated through the min-cut. We find that both the length of the packets in the network layer and the transmission time of the physical-layer codewords have opposite impacts on the physical and network layers, considering delay. Thus, their optimal values should be found in a cross-layer approach. We then formulate the problems of optimizing delay performance and suggest solutions for them.

For future work, an interesting direction is delay optimization in wireless networks. A natural application of network coding is multi-node, multi-hop wireless networks. Then, we need to consider the impact of fading, interference, broadcasting, etc.

ACKNOWLEDGMENT

The authors would like to thank the editor for kind help and suggestions and the anonymous reviewers for valuable comments.

APPENDIX: PROOF OF PROPOSITION 2

For sufficiently large 𝐾, 𝛿𝑇, the network is in the stable stage, in which sinks receive most of the ICs. Thus, we can

concentrate our proof on the stable stage. We note that it is meaningful to consider a large $\delta_T$ for a large $K$. Otherwise, if the sink receives fewer than $K$ packets, $P_E(\delta_T) = 1$. A key note is that our numerical results are averaged over many generations of transmission. Hence, for a channel with a rate $w$ and an erasure probability $\xi$, the receiving side successfully receives $w(1-\xi)$ packets in one second (i.e., $w(1-\xi)$ packets pass the channel).

We start from a network with two tandem channels. In the network, an intermediate node $B$ stores $N_B$ ICs, a source node $U$ has $N_U$ ICs, and a sink node $D$ has $N_D$ ICs. Channel 1 connects $U$ and $B$ with an erasure probability $\xi_1$ and a rate $w_1 = 1/T_1$. The capacity of Channel 1 is $z_1 = w_1(1 - \xi_1)$. Channel 2 connects $B$ and $D$ with an erasure probability $\xi_2$ and a rate $w_2 = 1/T_2$. The capacity of Channel 2 is $z_2 = w_2(1 - \xi_2)$. Clearly, $N_U \ge N_B \ge N_D$. Let $\Phi_B \triangleq N_B - N_D$; then $\Phi_B$ determines the probability (denoted by $P_{I,B}$) of a transmitted codeword being innovative. It is shown in [5] that $P_{I,B} \ge 1 - M^{-\Phi_B}$. Thus, for a large $M$,

$$P_{I,B} \approx \begin{cases} 1, & \Phi_B > 0 \\ 0, & \Phi_B = 0 \end{cases}. \tag{16}$$

Similarly, $\Phi_U \triangleq N_U - N_B$, and $P_{I,U} \ge 1 - M^{-\Phi_U}$. From (16), if a transmitter has more ICs than the receiving node, the codeword is innovative with probability 1. Clearly, $P_E(\delta_T)$ is the probability that fewer than $K$ ICs are received at $D$ in time $\delta_T$. Assuming $N_\delta$ ICs are received in time $\delta_T$ at sink $D$, then

$$P_E(\delta_T) = \Pr\{N_\delta < K\} = \sum_{j=0}^{K-1} \Pr\{N_\delta = j\}. \tag{17}$$
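The two-hop tandem network defined above can be explored with a small slotted simulation. This is only a sketch under stated assumptions: 𝑤1 = 𝑤2 = 1 packet per slot, a source that never runs out of ICs (large 𝐾), and the large-𝑀 approximation (16), so that a packet received by 𝐷 is counted as innovative exactly when Φ𝐵 > 0. The function name and the erasure values are hypothetical.

```python
import random


def simulate_two_hop(xi1, xi2, slots, seed=0):
    """Return the rate of innovative packets (ICs per slot) delivered to sink D."""
    rng = random.Random(seed)
    n_b = 0  # ICs stored at intermediate node B
    n_d = 0  # ICs received by sink D
    for _ in range(slots):
        # B forwards first, using only what it held at the start of the slot;
        # by (16) the packet is innovative only if Phi_B = n_b - n_d > 0.
        if n_b > n_d and rng.random() >= xi2:
            n_d += 1
        # The source U then transmits one coded packet to B over Channel 1.
        if rng.random() >= xi1:
            n_b += 1
    return n_d / slots


slots = 200_000
rate_cut_at_ch2 = simulate_two_hop(xi1=0.1, xi2=0.3, slots=slots)  # z1 > z2
rate_cut_at_ch1 = simulate_two_hop(xi1=0.3, xi2=0.1, slots=slots)  # z1 < z2
print(rate_cut_at_ch2, rate_cut_at_ch1)
```

In both orderings the long-run rate of ICs at the sink is close to min(𝑧1, 𝑧2) = 0.7, which is the behavior the two scenarios of the proof argue for.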

In the stable stage, sink 𝐷 receives 𝛿𝑇 𝑧2 packets (innovative or not). Thus, { 0, 𝛿𝑇 𝑧 2 < 𝑗 𝑃 𝑟{𝑁𝛿 = 𝑗} = , (18) 𝑃 𝑟{𝛿𝑇 𝑧2 𝑃 𝐼,𝐵 = 𝑗}, 𝛿𝑇 𝑧2 ≥ 𝑗 where 𝑃 𝐼,𝐵 denotes the average (among 𝛿𝑇 time) probability of received packets being innovative at node 𝐷. Now we show that for large 𝐾, 𝛿𝑇 , 𝑃 𝑟{𝑁𝛿 = 𝑗} approximately equals the probability of 𝑗 packets passing the min-cut. We consider two different scenarios. Scenario (1), 𝑧1 > 𝑧2 . Then, the mincut is Channel 2. After 1/𝑧1 seconds, node 𝐵 receives ICs and starts to transmit. In average (among many generations), 𝑁𝐵 increases in a rate 𝑧1 and decreases in a rate 𝑧2 . Then, in one second, Φ𝐵 changes 𝑧1 − 𝑧2 > 0, until 𝑁𝐵 = 𝐾. When 𝑁𝐵 = 𝐾, Φ𝑈 = 0. In one second Φ𝐵 reduces 𝑧2 until to 0. Node 𝐷 receives 𝑧2 packets in one second. Since Φ𝐵 > 0, 𝑃 𝐼,𝐵 = 𝑃𝐼,𝐵 ≈ 1 until transmission finishes. Then, 𝑃 𝑟{𝑁𝛿 = 𝑗} ≈ 𝑃 𝑟{𝛿𝑇 𝑧2 = 𝑗}. Scenario (2), the min-cut is Channel 1. Since 𝑧1 < 𝑧2 , node 𝐷 receives packets faster than node 𝐵. Thus, node 𝐵 will run out of ICs, relative to node 𝐷, namely, Φ𝐵 = 0. The codewords received by node 𝐷 are not innovative and 𝑃 𝐼,𝐵 < 1. Before 𝐾 packets are received by 𝐵, 𝑃𝐼,𝑈 = 1 and Channel 2 is faster. For either case, 𝑁𝛿 = 𝑊𝐶 𝛿𝑇 (1 − 𝜉𝐶 ) (ICs passing the min-cut channel). Note that it takes 𝐵 𝑇2 /(1 − 𝜉2 ) seconds to transmit the 𝑗-th packet to 𝐷. Yet, it is negligible for a large 𝛿𝑇 . 𝑃 𝑟{𝑁𝛿 =

XIAO et al.: CROSS-LAYER DESIGN OF RATELESS RANDOM NETWORK CODES FOR DELAY OPTIMIZATION

3321

2 } can separate the source and sinks. (3), A channel in the  𝑚8 min-cut involving multiple cuts. For a network, assume Ch. 𝑖 ∈ cut 1 and cut 2, and cut 1 is the min-cut. Then, the rates ⋅⋅⋅ ⋅⋅⋅ ⋅ ⋅ ⋅ Ch.3 ⋅⋅⋅ of channels 𝑉 (𝐸(cut 1 − Ch. 1)) < 𝑉 (𝐸(cut 2 − Ch. 1)). The same as above analysis for tandem networks, we can know that cut 1 determines the time ICs received sinks. Thus, Ch.2 : 𝑧2   𝑚3  ⋅ ⋅ ⋅ 𝑚5 𝑚1 Ch.1 : 𝑧1 𝑚7 Involving multi-cuts for a channel does not affect our analysis. Summarizing (1), (2) and (3), the analysis for the network cut 𝑆𝑚 : min-cut cut 2 with multi-channel min-cut is similar to the single channel min-cut. Thus, the time of receiving 𝑗 ICs at the sinks is Fig. 8. A network with the min-cut cross multiple channels. 𝑧𝑖 is the capacity time passing the min-cut, and 𝑃 𝑟{𝑁𝛿 = 𝑗} ≈ of Channel (or channel set) 𝑖. (the ) 𝑗 packets 𝑊𝐶 𝛿𝑇 𝑗 𝑊𝐶 𝛿𝑇 −𝑗 (1 − 𝜉 ) 𝜉 . 𝐶 𝑗 Another note is that the delay is defined as the time when all sink receives 𝐾 ICs. Thus, the delay is the maximum ) ( 𝑗} ≈ 𝑊𝐶𝑗 𝛿𝑇 (1 − 𝜉𝐶 )𝑗 𝜉 𝑊𝐶 𝛿𝑇 −𝑗 . Taking the result to (17) (among all sinks) time for all sinks to receive 𝐾 ICs. It is concludes the proof for the two-hop tandem network. determined by the minimum (among all sinks) rates of all For networks with more than 2 hops, we can analyze all min-cuts between the source and all sinks, since it requires intermediate nodes in a way similar to that of 𝐵 as follows. the longest delay for the sink to receive sufficient ICs. This is Consider a tandem network with 𝑍(𝑍 > 1) intermediate just how the min-cut of the multiple-sink networks is defined nodes (and 𝑍 + 1 channels). For generality, we consider (as in (7)), i.e., the minimum of the min-cut among all sourcenetworks with multiple min-cuts. Assume Channel 𝑖 (closer sink pairs. Q.E.D. to the source) and Channel 𝑚 are min-cuts with similar capacity. Other channels have higher capacity. Then, ICs will R EFERENCES be accumulated (increase) before Channel 𝑖. For transmitting [1] R. 
Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network infornode 𝑚𝑠 of Channel 𝑚, the increasing rate of ICs is limited by mation flow,” IEEE Trans. Inf. Theory, vol. 46, pp. 1204–1216, July 2000. Channel 𝑖, and the reducing rate of ICs is limited by Channel [2] S. Li, R. W. Yeung, and N. Cai, “Linear network coding,” IEEE Trans. 𝑚. Hence, ICs shall arrive and leave 𝑚𝑠 in approximately the Inf. Theory, vol. 49, no. 2, pp. 371–381, Feb. 2003. same rates, namely, the min-cut. Then, Φ𝑚𝑠 will keep positive. [3] R. Koetter and M. M´edard, “An algebraic approach to network coding,” IEEE/ACM Trans. Networking, pp. 782–795, Oct. 2003. The packets transmitted in Channel 𝑚 are innovative with [4] T. Ho, M. M´edard, R. Koetter, D. Karger, M. Effros, J. Shi, and probability close to 1 too. Hence, the time of 𝑗 ICs received B. Leong, “A random linear network coding approach to multicast,” in the sink is about the time 𝑗 packets passing Channel 𝑖 or 𝑚, IEEE Trans. Inf. Theory, vol. 52, no. 10, pp. 4413–4430, Oct. 2006. [5] D. Lun, M. M´edard, R. Koetter, and M. Effros, “On coding for reliable plus some negligible time (of the 𝑗-th packets arriving the sink. ) communication over packet networks,” Physical Commun., Mar. 2008. Namely, 𝑃 𝑟{𝑁𝛿 = 𝑗} ≈ 𝑊𝐶𝑗 𝛿𝑇 (1 − 𝜉𝐶 )𝑗 𝜉 𝑊𝐶 𝛿𝑇 −𝑗 . For the [6] M. Luby, “LT codes,” in Proc. 2002 Annual IEEE Symposium on networks with 3 or more min-cut channels, analysis is similar. Foundations of Computer Science, pp. 271–282. [7] A. Shokrollahi, “Raptor codes,” IEEE Trans. Inf. Theory, vol. 52, no. 6, For more general networks, the min-cut (say, cut 𝑆𝑚 ) pp. 2551–2567, June 2006. has multiple channels (channel set 𝐸(𝑆𝑚 )) as in Fig. 8. [8] A. Dana, R. Gowaikar, R. Palanki, B. Hassibi, and M. Effros, “Capacity of wireless erasure networks,” IEEE Trans. Inf. Theory, vol. 52, no. 3, Since we have studied the networks with a single channel pp. 789–804, Mar. 2006. as the min-cut, we analyze following three properties on a [9] M. Vehkaper¨a and M. 
M´edard, “A throughput-delay trade-off in packset of channels 𝐸(𝑆𝑚 ) making possible difference from a etized systems with erasures,” in Proc. 2005 IEEE Int. Sym. on Info. Theory, pp. 1858–1862. single channel. (1), Dependence among codewords of different channels in 𝐸(𝑆𝑚 ). We can use a contradiction approach to [10] A. Eryilmaz, A. Ozdaglar, and M. M´edard, “On delay performance gains from network coding,” in Proc. 2006 Annual Conference on Information show codewords in 𝐸(𝑆𝑚 ) are independent. If not, there are Sciences and Systems, pp. 864–870. fewer than 𝑗 ICs for the 𝑗 packets passing 𝑆𝑚 . However, [11] A. Ramamoorthy, J. Shi, and R. Wesel, “On the capacity of network coding for random networks,” IEEE Trans. Inf. Theory, vol. 51, no. 8, as shown in [4], [5], [8], random linear network codes can pp. 2878–2885, Aug. 2005. achieve the min-cut capacity for coded erasure networks, (for [12] P. A. Chou, Y. Wu, and K. Jain, “Practical network coding,” in Proc. 2003 Annual Allerton Conference on Communication, Control, and single-source, directed acyclic networks, with large enough Computing. 𝑀 , as assumed). Hence, for our networks, 𝑧 packets passing [13] R. Gallager, Information Theory and Reliable Communication. John through 𝑆𝑚 are linearly independent and innovative. (2), Also Wiley and Sons, 1968. due to multiple transmitting or receiving nodes in 𝐸(𝑆𝑚 ), a [14] J. G. Proakis, Digital Communications, 4th edition. McGraw-Hill, 2000. channel (say, Ch. 1) ∈ 𝐸(𝑆𝑚 ) connects (directly or by some [15] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005. hops) to an input or output channel (or channels), e.g., Ch. 2 [16] A. Rasala-Lehman and E. Lehman, “Complexity classification of network information flow problems,” in Proc. 2004 Annu. ACM-SIAM in Fig. 8, such that 𝑧1 > 𝑧2 AND all codewords of Ch. 1 pass Symp. Discrete Algorithms, pp. 142–150. Ch. 2. Then, the codewords of Ch. 1 may not be innovative, [17] C. Chekuri, C. 
Fragouli, and E. Soljanin, “On average throughput and since 𝑧1 > 𝑧2 . This is also impossible. Otherwise, another alphabet size in network coding,” IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2410–2424, 2006. cut (cut 2 in Fig. 8) replacing Ch. 1 by Ch. 2 has a value 𝑉 (cut 2) < 𝑉 (𝑆𝑚 ), where 𝑉 (⋅) is the value of a set of [18] C. A. Floudas, P. Pardalos, C. Adjiman, et al., Handbook of Test Problems in Local and Global Optimization (Nonconvex Optimization channels (1). Then, cut 2 is the min-cut. This contradicts the and Its Applications). Springer Press, 1999. assumption. Note that cut 2 always exists since all codewords [19] I. Nowak, Relaxation and Decomposition Methods for Mixed Integer Nonlinear Programming. Springer Press, 2005. of Ch. 1 shall pass Ch. 2. Then cut 2 = {𝐸(𝑆𝑚 ) − Ch.1 + Ch. 𝑚2

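[Figure: network diagram, image not recovered. Surviving labels: nodes $m_2$, $m_4$, ..., $m_6$; channel Ch. $n$ with $z_n$.]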


Ming Xiao (S’2002-M’2007) was born in Sichuan Province, P. R. China, in 1975. He received Bachelor and Master degrees in Engineering from the University of Electronic Science and Technology of China, Chengdu, in 1997 and 2002, respectively. He received the Ph.D. degree from Chalmers University of Technology, Sweden, in November 2007. From 1997 to 1999, he worked as a network and software engineer at China Telecom. From 2000 to 2002, he also held a position in the Sichuan communications administration. Since November 2007, he has been with the ACCESS Linnaeus Center, School of Electrical Engineering, Royal Institute of Technology, Sweden, where he is currently an assistant professor. Dr. Xiao received the “Chinese Government Award for Outstanding Self-Financed Students Studying Abroad” in March 2007. He received a “Hans Werthén Grant” from the Royal Swedish Academy of Engineering Science (IVA) in March 2006. He received “Ericsson Research Funding” from Ericsson in 2010. He received best paper awards at “IC-WCSP” (International Conference on Wireless Communications and Signal Processing) in 2010 and “IEEE ICCCN” (International Conference on Computer Communication Networks) in 2011. He was a visiting researcher at the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, USA, in 2006, and at the Institute of Network Coding, the Chinese University of Hong Kong, in 2010.

Muriel Médard is a Professor in Electrical Engineering and Computer Science at MIT. She was previously an Assistant Professor in the Electrical and Computer Engineering Department and a member of the Coordinated Science Laboratory at the University of Illinois Urbana-Champaign. From 1995 to 1998, she was a Staff Member at MIT Lincoln Laboratory in the Optical Communications and the Advanced Networking Groups. Professor Médard received B.S. degrees in EECS and in Mathematics in 1989, a B.S. degree in Humanities in 1990, an M.S. degree in EE in 1991, and an Sc.D. degree in EE in 1995, all from the Massachusetts Institute of Technology (MIT), Cambridge. She has served as an Associate Editor for the Optical Communications and Networking Series of the IEEE Journal on Selected Areas in Communications, as an Associate Editor in Communications for the IEEE Transactions on Information Theory, and as an Associate Editor for the OSA Journal of Optical Networking. She has served as a Guest Editor for the IEEE Journal of Lightwave Technology, the Joint special issue of the IEEE Transactions on Information Theory and the IEEE/ACM Transactions on Networking and Information Theory, and the IEEE Transactions on Information Forensics and Security: Special Issue on Statistical Methods for Network Security and Forensics. She serves as an associate editor for the IEEE/OSA Journal of Lightwave Technology. She is a member of the Board of Governors of the IEEE Information Theory Society. She has served as TPC co-chair of ISIT, WiOpt and CONEXT. Professor Médard’s research interests are in the areas of network coding and reliable communications, particularly for optical and wireless networks. She was awarded the 2009 Communication Society and Information Theory Society Joint Paper Award for the paper: Tracey Ho, Muriel Médard, Ralf Koetter, David Karger, Michelle Effros, Jun Shi, Ben Leong, “A Random Linear Network Coding Approach to Multicast,” IEEE Transactions on Information Theory, vol. 52, no. 10, pp. 4413–4430, October 2006. She was awarded the 2009 William R. Bennett Prize in the Field of Communications Networking for the paper: Sachin Katti, Hariharan Rahul, Wenjun Hu, Dina Katabi, Muriel Médard, Jon Crowcroft, “XORs in the Air: Practical Wireless Network Coding,” IEEE/ACM Transactions on Networking, vol. 16, no. 3, June 2008, pp. 497–510. She was awarded the IEEE Leon K. Kirchmayer Prize Paper Award 2002 for her paper, “The Effect Upon Channel Capacity in Wireless Communications of Perfect and Imperfect Knowledge of the Channel,” IEEE Transactions on Information Theory, vol. 46, no. 3, May 2000, pp. 935–946. She was co-awarded the Best Paper Award for G. Weichenberg, V. Chan, M. Médard, “Reliable Architectures for Networks Under Stress,” Fourth International Workshop on the Design of Reliable Communication Networks (DRCN 2003), October 2003, Banff, Alberta, Canada. She received an NSF Career Award in 2001 and was co-winner of the 2004 Harold E. Edgerton Faculty Achievement Award, established in 1982 to honor junior faculty members for distinction in research, teaching and service to the MIT community. In 2007 she was named a Gilbreth Lecturer by the National Academy of Engineering.

Tor Aulin (S’77-M’80-SM’83-F’99) was born in Malmö, Sweden, on September 12, 1948. He received the M.S. degree in electrical engineering from the University of Lund, Lund, Sweden, in 1974 and the Dr. Techn. (Ph.D.) degree from the Institute of Telecommunication Theory, University of Lund, in November 1979. He became a Docent at the University of Lund in 1981 and worked at this institute as a Postdoctoral Fellow. During this period he was also a Visiting Scientist at the ECSE Department at Rensselaer Polytechnic Institute, Troy, NY. Following this he spent one year at the European Space Agency (ESA), the European Space Research and Technology Centre (ESTEC) in Noordwijk, the Netherlands, as an ESA Research Fellow. In 1983 he became a Research Professor (Docent) in Information Theory at Chalmers University of Technology, Göteborg, Sweden. In 1991 he formed the Telecommunication Theory Group there and also became a Docent in Computer Engineering in 1995. During the fall of 1995 he was a Visiting Fellow at the Telecommunications Engineering Department, Australian National University, Canberra, ACT, Australia. He was a Visiting Professor at City University of Hong Kong in 2004, and in 2005 he was a Research Scholar at the University of Southern California (USC) in Los Angeles, CA, USA. During 2005 he also spent several months working at the Communication Systems Department at Lund University, Lund, Sweden. Some of his research interests are communication theory, combined modulation/coding strategies (such as CPM and TCM), analysis of general sequence detection strategies, digital radio channel characterization, digital satellite communication systems, and information theory. During recent years the potentials of these have been considered for iterative decoding in concatenated versions. This is also the case for such schemes integrated into Multiple Access strategies (TCMA, Trellis Code Multiple Access, and its CPM counterpart). Joint source/channel coding also falls into this concept. His company, AUCOM, has performed several advanced theoretical studies as a consultant to some of the major international organizations dealing with developing and operating satellite communication systems, e.g., INTELSAT and ESA. He has also performed theoretical study contracts for Saab and Volvo. Nokia has trusted him as an Internal Lecturer, and he has performed numerous studies for Ericsson in the area of digital radio transmission, the latter resulting in a patent. He has authored and published some 200 technical papers and has also authored the book Digital Phase Modulation (Plenum, 1986) as a result of his extensive research in this area at that time. He has organized and chaired several sessions at international symposia/conferences organized by, e.g., IEEE and is an EAMEC representative within the Communications Society of the IEEE. He has been an Editor for IEEE Transactions on Communications in the area of communication theory and coding for a decade. He has also served on the Communication Theory Committee within IEEE COMSOC for 30 years. In December 1997 Dr. Aulin was awarded the Senior Individual Grant at a ceremony in Stockholm, Sweden, handed over by the Prime Minister of Sweden. The award was repeated in 2004. Dr. Aulin has two papers among the best (Best-of-the-Best) published during the first 50 years of the IEEE COMSOC, selected in connection with their 50th anniversary in 2002. Dr. Aulin also has an academic degree as a solo cellist.
