This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the WCNC 2009 proceedings.

Improved Delayed ACK for TCP over Multi-hop Wireless Networks

Beizhong Chen, Ivan Marsic
Dept. of Electrical & Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA
{bzchen, marsic}@caip.rutgers.edu

Huai-Rong Shao
Samsung R&D Center, San Jose, CA 95134, USA
[email protected]

Ray Miller
Bell Labs, Murray Hill, NJ 07974, USA
[email protected]

Abstract–TCP performance in contention-based multi-hop wireless networks is shaped by two main factors that differ from the wired network case. First, the maximum throughput for a given topology and flow pattern is reached at a specific congestion window. This window is very difficult to detect under dynamically changing network traffic. Second, excessive control traffic consumes channel bandwidth more severely than in the wired case. Our analysis and simulations show that, over high-speed connections, the small TCP ACK packets consume channel resources comparable to those consumed by the much longer TCP DATA packets. Motivated by this observation, we propose a new approach that improves TCP performance by further lowering the number of control packets compared with the known methods. Extensive simulations show that our strategy improves the TCP throughput by up to 205% compared with the regular TCP. Although our simulation is based on 802.11, the same idea works in other networks using a contention-based MAC design.

I. INTRODUCTION

The Transmission Control Protocol (TCP) is the most widely deployed protocol for reliable data transfer, but it suffers significant performance deterioration in multi-hop wireless networks, where the key reason for packet loss is different from that in a wired network. First, unlike in a wired network, where buffer overflow is the key reason for packet loss, TCP packet drops are mainly caused by link-layer contention [1]. In a contention-based wireless network, a transmitting node prevents its neighbors from transmitting or receiving. Moreover, the well-known hidden/exposed terminal problem causes collisions and degrades the channel utilization. To alleviate this problem, 802.11 adopts RTS/CTS and a virtual carrier sense mechanism. Note that even without collisions and channel errors, packets can still be dropped if the number of MAC-layer retries exceeds a given limit. In such a contention-based network, Fu et al. [1] found that there is an optimal value of the congestion window (cwnd) that gives the maximum throughput, given a specific network topology and flow pattern. Further increase in the cwnd size results in increased link-layer contention and packet losses. However, this optimal cwnd depends on the number of nodes, which is not a directly controllable variable in the network.

The second factor is the much higher overhead introduced by lower layers. In this paper, our analysis and simulations show that, for high-speed connections, the small ACK packets consume channel resources comparable to those consumed by the much longer DATA packets, which implies that a key step in improving TCP performance is to reduce the number of control packets as much as possible. Based on this, we propose a novel delayed ACK algorithm that can enhance TCP performance significantly, by up to 205% over the regular TCP. Since TCP NewReno [2] is possibly the most popular variant of TCP, we use it as the regular TCP.

In TCP, after receiving a DATA packet, the receiver replies with an ACK packet, and this introduces a minor overhead in wired networks. The regular TCP also provides a delayed ACK option (TCP-DA), which generates a cumulative ACK for every two in-order packets that arrive within a given period [3]. If the receiver does not receive the second packet within this period, denoted as the ACK-delay timeout, it sends an ACK without waiting longer. In our proposed approach, the receiver always waits until this timeout event occurs to generate an ACK, no matter how many in-order packets it receives during the timeout period, provided that the other conditions introduced in Section IV are not satisfied. As in [4], we define the delay window as the number of DATA packets that generate one ACK packet. We also define the delay window limit as the maximum delay window.

We do not address mobility-related issues in this paper, but this is part of our ongoing research. The rest of the paper is organized as follows. Related work is introduced in Section II. The TCP ACK impact on channel utilization is analyzed in Section III. In Section IV, we introduce our new method in detail. Section V evaluates our approach, based on topologies similar to those used in existing work. Section VI discusses a few issues and Section VII concludes this paper.

II. RELATED WORK

Fu et al. [1] show that the main factor affecting TCP performance in multi-hop wireless networks is link-level contention, rather than buffer overflow, and that there exists an optimal cwnd size. They introduce the LRED (Link Random Early Drop) scheme, which, by randomly dropping a packet, indirectly informs the sender to slow down its traffic injection. They also propose adaptive pacing, which aims to distribute the traffic evenly over the nodes on the connection path to reduce contention. Some researchers proposed proactive methods that limit the injected traffic load by setting a low cwnd limit, intending to match the optimal cwnd size. K. Chen et al. [7] use the number of hops to determine the optimal cwnd.

978-1-4244-2948-6/09/$25.00 ©2009 IEEE


Another way to reduce contention is to reduce the number of control packets. In TCP, two types of packets are transmitted: the actual data (DATA packets), fulfilling the transmission goal, and ACK packets, providing control services. Reducing the number of transmitted packets results in less contention and shorter contention periods for channel access. Altman et al. [5] reported that using delayed ACK strategies results in TCP performance gains. Yuki et al. [11] proposed to combine a TCP DATA packet and an ACK packet into a single packet. Oliveira and Braun [8] proposed TCP-DAA (dynamic adaptive acknowledgement), which tries to combine the idea of restricting the cwnd to its optimal or suboptimal value with reducing the number of ACK packets. They fix the cwnd size at 4 packets. However, having a fixed cwnd limit may restrict the applicability of their method. Besides, they set the delay window in the range of 2 to 4, so one cumulative ACK packet covers at most four in-order packets. Y. Chen et al. [4] removed the cwnd restriction and proposed a method called TCP-DCA, which selects a different delay window limit based on the number of hops. They set the maximum delay window limit at 3 or 5 when the total number of hops is greater than 3.

In our work, we change the strategy for generating ACK packets. Instead of limiting the delay window to a fixed small number, we attempt to minimize the number of ACK packets, inspired by the fact that an ACK packet consumes channel bandwidth comparable to a DATA packet. Our analysis is provided below.

III. TCP ACK IMPACT ON CHANNEL UTILIZATION

In the regular TCP, every TCP DATA packet triggers one ACK packet, which means that an n-packet transmission actually incurs 2n two-way transmissions in the wireless channel. Typically, a TCP ACK packet contains 40 bytes. Note that the TCP ACK is different from the MAC ACK; hereafter we use ACKmac to distinguish the MAC ACK from the TCP ACK.

Although the TCP ACK is typically much smaller than a DATA packet (e.g., 1000 bytes) and its transmission time is much shorter, it has a significant impact on the channel utilization. To alleviate the collisions caused by the hidden node problem, 802.11 introduces a four-way RTS/CTS/DATA/ACKmac exchange, where RTS/CTS are transmitted at the basic rate, which is generally much lower than the payload transmission rate. Moreover, before transmitting, a sender senses the channel and, if the channel is busy, defers its transmission using exponential backoff. Such mechanisms can greatly reduce packet collisions but, on the other hand, the channel can sometimes be idle even when all nodes have packets waiting for transmission. This also introduces overhead and degrades the channel utilization, especially for high-speed connections. Fig. 1 shows the timing diagram for transmissions that include RTS/CTS. For simplicity, we ignore other overheads, such as the PHY overhead; including them would only strengthen our conclusion.

[Fig. 1: DIFS (50 μs), RTS (20 bytes, 160 μs), SIFS (10 μs), CTS (14 bytes, 112 μs), SIFS (10 μs), DATA, SIFS (10 μs), ACKmac (14 bytes, 112 μs)]

Fig. 1. Timing diagram for a packet transmission with RTS/CTS
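The frame sizes and inter-frame spaces in Fig. 1 can be turned into a rough per-packet airtime estimate. The following Python sketch is only illustrative: it assumes DSSS with a 1 Mbps basic rate for RTS, CTS, and ACKmac and an 11 Mbps rate for the payload, and it ignores the PHY overhead and backoff time. The function and constant names are our own and are not part of any simulator.

```python
# Airtime estimate for one 802.11 frame exchange with RTS/CTS (values from Fig. 1).
# Assumptions: 1 Mbps basic rate, 11 Mbps data rate, no PHY overhead, no backoff.

BASIC_RATE_MBPS = 1.0   # rate used for RTS, CTS, and ACKmac
DATA_RATE_MBPS = 11.0   # rate used for the TCP payload frame
DIFS_US = 50.0
SIFS_US = 10.0
RTS_BYTES = 20
CTS_BYTES = 14
ACK_MAC_BYTES = 14


def tx_time_us(size_bytes, rate_mbps):
    """Microseconds needed to send size_bytes at rate_mbps (1 Mbps = 1 bit per us)."""
    return size_bytes * 8 / rate_mbps


def overhead_us():
    """Fixed per-packet overhead: DIFS + 3*SIFS + RTS + CTS + ACKmac."""
    control = sum(tx_time_us(size, BASIC_RATE_MBPS)
                  for size in (RTS_BYTES, CTS_BYTES, ACK_MAC_BYTES))
    return DIFS_US + 3 * SIFS_US + control


def airtime_us(packet_bytes):
    """Total channel time for one packet of packet_bytes, including the overhead."""
    return overhead_us() + tx_time_us(packet_bytes, DATA_RATE_MBPS)


if __name__ == "__main__":
    for size in (40, 1040):  # TCP ACK and TCP DATA packet sizes
        print(f"{size:5d} bytes: {airtime_us(size):7.1f} us")
    print(f"DATA/ACK airtime ratio: {airtime_us(1040) / airtime_us(40):.2f}")
```

Under these assumptions the fixed overhead is 464 μs, and a 1040-byte packet occupies the channel for only about 2.5 times as long as a 40-byte packet, even though it is 26 times larger.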

RTS, CTS, and ACKmac frames generally use the basic rate, which is lower than the transmission rate for data frames. Assuming the physical layer is DSSS, the basic rate is 1 Mbps, and the data transmission rate is 11 Mbps, the overhead time is DIFS + 3 SIFS + RTS + CTS + ACKmac = 50 + 30 + 160 + 112 + 112 = 464 μs. After including this overhead, the transmission times for a 40-byte TCP ACK packet and a 1040-byte TCP DATA packet are 493 μs and 1220 μs, respectively. They are on the same order of magnitude. Moreover, because of the exponential backoff mechanism, the ACK packets have a greater impact than would be proportional to their transmission time. When the transmission rate becomes higher, the transmission time for data shrinks and therefore the relative overhead of RTS/CTS becomes greater.

To illustrate this conclusion, we compare two cases with packet sizes of 1040 bytes (1000 bytes plus the TCP/IP header) and 40 bytes (the size of an ACK packet). If we only consider the transmission time, the total number of transmitted 40-byte packets should be 26 times the number of 1040-byte packets in the same time period. Table I shows the simulation results for the actual numbers of transmitted packets in a chain network (Fig. 2, see Section V for simulation parameters), using the regular TCP. Although the 40-byte packet is only 1/26 of the size of a 1040-byte one, the number of transmitted packets increases by at most 53% (see the Ratio column) for a TCP chain of 6 to 12 hops. This implies that ACK packets could consume channel capacity comparable to that consumed by the much longer DATA packets. This fact points to a promising direction for improving TCP performance over multi-hop networks: further reducing the number of transmitted ACKs. Motivated by this finding, we propose a new algorithm.

TABLE I
MAXIMUM ACHIEVED NUMBER OF TRANSMITTED PACKETS VS. PACKET SIZE, TRANSMISSION RATE 11 MBPS

Num of hops    Num of 1040-byte pkts    Num of 40-byte pkts    Ratio (40-byte / 1040-byte)
6              4212                     5184                   1.23
8              3529                     4255                   1.21
10             2359                     2698                   1.14
12             1947                     2969                   1.53

IV. NOVEL WAY TO IMPROVE TCP THROUGHPUT

Two main factors affect TCP performance in multi-hop wireless networks: the injected traffic load and the control traffic. Because setting a fixed cwnd would restrict the applicability of a method, we focus on improving TCP performance by minimizing the number of ACK packets.


Existing methods like TCP-DAA [8] and TCP-DCA [4] set a small delay window limit. When the delay window limit is small, it is generally reached sooner than the ACK-delay timeout, which means that an ACK is mostly generated before the timer expires. To further reduce the number of ACK packets, our strategy is to make the ACK-delay timeout the main generator of ACKs when the channel condition is favorable. Therefore, we set a large delay window limit (25 in this paper), so that the ACK-delay timeout becomes the dominant generator of ACKs. This is unlike the existing approaches, which use a small delay window limit, so that reaching this limit is the main ACK generator. Chen et al. [4] point out that a higher delay window limit does not necessarily increase the throughput. However, because in our method the ACK-delay timeout generates many more ACKs than reaching the delay window limit does, we can achieve better performance with large delay window limits. Based on extensive experimentation, we set the timeout timer at 200 ms.

When the congestion window is small (less than the delay window), the above mechanisms cannot respond quickly and they introduce unnecessary delay. For example, when the congestion window is 1, the sender sends one packet and then waits for the ACK. Because no more packets are coming, the receiver can only wait until the ACK-delay timeout expires before sending the ACK packet. This causes under-utilization of the channel. To avoid this problem, we adopt a solution similar to that in [4]. The sender puts the current value of cwnd into the TCP option field in the packet header to inform the receiver of the current cwnd. When the number of received-but-unacknowledged packets equals cwnd, the receiver knows that the sender is waiting for an ACK and sends one immediately. The difference in our method is that we do not need to count the path length, as required in [4].

For lost/out-of-order and gap-filling packets, we use the same mechanism as in the regular TCP, which means that the receiver generates an ACK immediately for such events. In summary, the receiver generates an ACK packet if one of the following conditions is met:
1) A lost/out-of-order or gap-filling packet is detected
2) The ACK-delay timeout event occurs
3) The number of unacknowledged in-order packets reaches the cwnd
4) The number of unacknowledged in-order packets reaches the delay window limit

One drawback of using the current cwnd as a trigger condition for ACK generation is that the TCP sender needs to include the cwnd information in the transmitted data. The sender can put this information into the TCP header option field. Another method is to use the advertised window field in the TCP header, which is possible because current TCP connections work mainly in one-direction mode. For a two-way TCP, based on our analysis above, the best way to improve performance is to piggyback the ACK on the DATA packet, which removes most of the overhead brought by separate ACK packets.
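The four ACK-generating conditions can be summarized as a small receiver-side state machine. The Python sketch below is a simplified illustration, not the simulated implementation: it counts whole packets rather than bytes, it assumes the sender's cwnd arrives with every DATA packet, and the class and method names (TdaReceiver, on_data, on_timer) are hypothetical. The delay window limit of 25 and the 200 ms timeout are the values given in the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

DELAY_WINDOW_LIMIT = 25      # large delay window limit from the text
ACK_DELAY_TIMEOUT_S = 0.200  # ACK-delay timeout from the text


@dataclass
class TdaReceiver:
    """Illustrative receiver state; sequence numbers count packets, not bytes."""
    next_expected: int = 0                 # next in-order packet number
    unacked: int = 0                       # in-order packets not yet acknowledged
    deadline: Optional[float] = None       # expiry time of the ACK-delay timer
    buffered: Set[int] = field(default_factory=set)  # out-of-order packets held

    def _ack(self) -> int:
        """Emit a cumulative ACK and reset the delayed-ACK state."""
        self.unacked = 0
        self.deadline = None
        return self.next_expected

    def on_data(self, seq: int, sender_cwnd: int, now: float) -> Optional[int]:
        """Handle one DATA packet; return the ACK number to send, or None."""
        if seq != self.next_expected:
            # Condition 1: out-of-order (or duplicate) packet, ACK immediately.
            if seq > self.next_expected:
                self.buffered.add(seq)
            return self._ack()
        self.next_expected += 1
        if self.buffered:
            # Condition 1: the packet fills a gap, ACK immediately.
            while self.next_expected in self.buffered:
                self.buffered.remove(self.next_expected)
                self.next_expected += 1
            return self._ack()
        self.unacked += 1
        if self.deadline is None:
            self.deadline = now + ACK_DELAY_TIMEOUT_S
        # Conditions 3 and 4: the cwnd or the delay window limit is reached.
        if self.unacked >= min(sender_cwnd, DELAY_WINDOW_LIMIT):
            return self._ack()
        return None

    def on_timer(self, now: float) -> Optional[int]:
        """Condition 2: the ACK-delay timeout expired with packets pending."""
        if self.deadline is not None and now >= self.deadline:
            return self._ack()
        return None


if __name__ == "__main__":
    rx = TdaReceiver()
    print(rx.on_data(0, sender_cwnd=1, now=0.0))   # cwnd reached, ACK at once
    print(rx.on_data(1, sender_cwnd=30, now=0.1))  # delayed, no ACK yet
    print(rx.on_timer(now=0.4))                    # timeout fires, ACK is sent
```

In this sketch, conditions 3 and 4 collapse into a single comparison against the smaller of the cwnd and the delay window limit, which is one possible reading of the rule list.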

Extensive simulation results show that our simple strategy improves TCP performance significantly by eliminating more of the unnecessary ACK packets.

V. PERFORMANCE EVALUATION

In our evaluation, we mainly focus on comparing against TCP-DCA, TCP-DA, and the regular TCP. We do not include TCP-DAA because it is designed for short chains; also, TCP-DCA is reported to work better than TCP-DAA [4]. Based on the results reported in [8], we do not include SACK and Vegas. For the sake of clarity, we name our new algorithm TCP-TDA because we use the ACK-delay timeout as a trigger in a different way from the existing work.

A. Simulation scenario

To allow a better comparison with existing work, we adopt simulation scenarios similar to those in [4] and [8], where two typical topologies are considered. The distance between adjacent nodes is 200 meters. The transmission range is 250 meters and the carrier sensing range is 550 meters. We used the ns-2 simulator [9] with most of the parameters set to their default values. We use FTP as the traffic generator and AODV as the routing protocol; the simulation time is 100 s and the channel bandwidth is 11 Mbps. All data points are obtained by averaging the results of 4 or 5 simulations with different random seeds.

B. Throughput in chain topology

Fig. 2 shows the topology. The TCP sender resides on the first node and the receiver on the last one. For lack of space, we only show the cases of 1, 2, and 6 flows. Experiments using other numbers of flows show similar results.

[Fig. 2: nodes placed in a line, 200 m apart, with the TCP sender at the first node and the TCP receiver at the last node]

Fig. 2. Chain topology

Fig. 3-top shows the throughput for hop counts of 1 to 3. Our algorithm has almost the same performance as TCP-DCA; the difference is around 1%. This is expected because, in such scenarios, ACK generation in both methods is governed by the same factor, namely reaching the cwnd. Both algorithms outperform TCP-DA and the regular TCP significantly. The improvement is 26%, 24%, 41% over TCP-DA and 55%, 53%, 72% over the regular TCP. Similar results hold for other numbers of flows.

Fig. 3-bottom-left shows the throughput for hop counts from 4 to 29 and a single flow. The performance gain of TCP-TDA over TCP-DCA is in the range of 12% to 80%, 35% on average. In most cases, the gain over TCP-DA is more than 60% (up to 110%, 68% on average), and the gain over TCP is more than 100% (up to 205%, 139% on average). Compared with TCP-DCA, the TCP-TDA performance gains are higher in most cases with larger hop counts. One reason for this is that the TCP sender emits a burst of packets after receiving a cumulative ACK, which incurs heavy contention in the first few hops. When the chain is longer and the packets spread out along the chain, this burst effect is alleviated.

For other numbers of flows, the performance gains are also significant. In the 2-flow case, the average gains are 36%, 77%, 87% over TCP-DCA, TCP-DA, and TCP, respectively. In the 6-flow case, the gains are 11%, 38%, 76%, respectively. However, we observed that as the number of flows increases, the gain over TCP-DA and TCP remains significant, whereas the gain over TCP-DCA decreases significantly. In the 6-flow case, TCP-TDA works only slightly better (Fig. 3-bottom-right). In a few cases, TCP-TDA works even worse than TCP-DCA. The explanation is given below. The fact that all four algorithms work worse for larger numbers of flows suggests that the number of flows should be kept low, especially with a large number of hops. This sheds light on application design for such scenarios.

[Fig. 3: six panels plotting throughput in Kbps (aggregate throughput for multiple flows) versus number of hops for TCP-TDA, TCP-DCA, TCP-DA, and TCP. Top row: 1 to 3 hops with one, two, and six flows. Bottom row: hop counts of 4 and above with one, two, and six flows.]

Fig. 3. Throughput for the chain topology

Table II shows a typical example of the number of ACKs and DATA packets sent. TCP-TDA generates far fewer ACKs than the other algorithms. In the 6-flow case, we observed that, in many scenarios, the ratio for TCP-TDA increases to more than 0.3 and, accordingly, the ratio for TCP-DCA increases to more than 0.4, which means that the number of generated ACKs increases significantly and therefore the performance gain deteriorates. This confirms that a reduction in the number of ACKs does help improve the performance. Because other factors also affect the TCP behavior, the throughput does not increase linearly with the number of ACKs removed. Moreover, in the 6-flow case, we observed in some scenarios that although TCP-TDA sends 10% fewer ACKs than TCP-DCA, the performance gain is small, or the performance is even worse. This implies that our algorithm can be improved further.

TABLE II
NUMBER OF ACKS AND DATA PACKETS FOR A 13-HOP CHAIN AND 1 FLOW

                       TCP-TDA    TCP-DCA    TCP-DA
Num of ACKs sent       693        1509       1542
Num of packets sent    5644       4321       2887
Ratio                  0.12       0.35       0.53
Throughput (Kbps)      478        368        242

TCP-TDA significantly reduces excessive ACK packets, but it may result in a TCP-sender timeout if numerous ACK packets are lost due to channel errors. To solve this problem, we modify TCP-TDA to set up another timer that resends the last ACK packet if the receiver does not receive new packets within a given time (currently 500 ms) after an ACK is sent. Moreover, the receiver sends an ACK slightly before the number of arrived packets reaches the cwnd (currently 2 packets before the cwnd is reached when the cwnd is greater than 6). Fig. 4 shows our preliminary result. We observe performance gains (up to 20%) when the number of hops is greater than 12. We are currently investigating further improvements for these strategies and their performance in complex scenarios.

[Fig. 4: throughput in Kbps versus number of hops (4 to 28) for TCP-TDA-modified and TCP-TDA]

Fig. 4. Performance of the modified TCP-TDA

We can also see an interesting phenomenon in Fig. 3. For all four algorithms, the throughput does not decrease monotonically with increasing hop count. The reason is that errors generate duplicate ACKs, which results in the cwnd shrinking sooner. This, in turn, brings the injected traffic closer to the optimal value and improves the throughput.
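As a quick check on Table II, the ACK-to-packet ratios and the relative throughput gains for the 13-hop, single-flow example can be recomputed from the table entries. The short Python sketch below copies the numbers from Table II; the dictionary layout and function names are our own, and the gains it prints apply to this one example only, not to the averages quoted in the text.

```python
# Values copied from Table II (13-hop chain, one flow).
TABLE_II = {
    "TCP-TDA": {"acks": 693, "packets": 5644, "kbps": 478},
    "TCP-DCA": {"acks": 1509, "packets": 4321, "kbps": 368},
    "TCP-DA": {"acks": 1542, "packets": 2887, "kbps": 242},
}


def ack_ratio(name):
    """Number of ACKs sent per packet sent for the given algorithm."""
    row = TABLE_II[name]
    return row["acks"] / row["packets"]


def gain_percent(name, baseline):
    """Throughput gain of `name` over `baseline`, in percent."""
    return 100.0 * (TABLE_II[name]["kbps"] / TABLE_II[baseline]["kbps"] - 1.0)


if __name__ == "__main__":
    for algo in TABLE_II:
        print(f"{algo}: ratio = {ack_ratio(algo):.2f}")
    for base in ("TCP-DCA", "TCP-DA"):
        print(f"TCP-TDA gain over {base}: {gain_percent('TCP-TDA', base):.1f}%")
```

For this example, the recomputed ratios match the Ratio row of Table II, and TCP-TDA's throughput is roughly 30% higher than that of TCP-DCA and nearly double that of TCP-DA.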

C. Cross topology

We also evaluate the performance in a more complicated 5 × 5 grid topology, shown in Fig. 5 (top). The filled black circles depict the sending nodes, the triangles indicate the receivers, and the arrows show the transmission direction. For lack of space, we only report the results for 6 cross flows, averaged over 5 runs with different seeds. In this critical scenario, interference and contention have a greater impact than in the chain topology. Therefore, the performance gain of our algorithm is reduced. However, it still shows approximately 8%, 25%, 35% improvement over TCP-DCA, TCP-DA, and the regular TCP, respectively. We believe TCP-TDA will be further improved if combined with other techniques.

[Fig. 5: top, grid topology with six cross flows (Flow 1 to Flow 6); bottom, bar chart of aggregate throughput in Kbps for TCP-TDA, TCP-DCA, TCP-DA, and TCP]

Fig. 5. Throughput for the grid topology shown on the top

VI. DISCUSSION AND FUTURE WORK

Our method is motivated by the fact that a short ACK packet consumes channel capacity comparable to the much longer DATA packet over high-speed connections. For higher transmission rates, the lower-layer overhead would be relatively greater and the ACK packets would consume a larger share of the channel capacity. This means that our method would perform even better. Because RTS/CTS introduces significant overhead, it would be interesting to investigate how to alleviate this negative effect.

Our algorithm can work with methods that use an ELFN-like technique [10], which could be used to inform the sender to lower its packet injection rate, e.g., by reducing the cwnd, rather than keeping the cwnd from shrinking. Modifying existing ELFN-like techniques to work with our algorithm is part of our future work.

The performance of all existing algorithms deteriorates for increasing numbers of flows. We are currently studying new methods to improve performance in this case.

Multi-hop wireless networks are present in scenarios such as sensor networks. Different sensor network implementations may use different contention-based MAC designs. However, our idea still holds and could be applied with different implementations.

VII. CONCLUSION

TCP was originally designed for networks where buffer overflow is the main reason for packet loss. In multi-hop wireless networks, where the main reasons for packet loss are channel contention and excessive traffic injection, TCP performs poorly. To improve TCP performance in such scenarios, the key is to reduce contention. This paper shows that the short ACK packets consume channel capacity comparable to DATA packets when the transmission rate is high. This motivated us to minimize the number of ACKs to improve performance. Extensive simulations show that our method reduces the excessive ACKs considerably and therefore improves TCP performance significantly, achieving up to 205% gain in chain networks and 35% gain in a complex grid network, compared with the regular TCP. We also provide some possible improvements to the proposed TCP-TDA algorithm. Our method could be improved further by combining it with complementary methods.

VIII. REFERENCES

[1] Z. Fu, P. Zerfos, H. Luo, S. Lu, L. Zhang, and M. Gerla, “The impact of multihop wireless channel on TCP throughput and loss,” Proc. IEEE INFOCOM '03, Mar 2003.
[2] S. Floyd and T. Henderson, RFC 2582, http://www.ietf.org/rfc/rfc2582.txt
[3] R. Braden, “Requirements for internet hosts – communication layers,” RFC-1122, IETF Network Working Group, Oct 1989.
[4] J. Chen, Y. Z. Lee, M. Gerla, and M. Y. Sanadidi, “TCP with delayed ACK for wireless networks,” Proc. IEEE BROADNETS, 2006.
[5] E. Altman and T. Jimenez, “Novel delayed ACK techniques for improving TCP performance in multihop wireless networks,” IEEE Pers Wireless Comms, Sep 2003.
[6] A. K. Singh and K. Kankipati, “TCP-ada: TCP with adaptive delayed acknowledgement for mobile ad hoc networks,” IEEE WCNC, 2004.
[7] K. Chen, Y. Xue, and K. Nahrstedt, “On setting TCP’s congestion window limit in mobile ad hoc networks,” Proc. IEEE ICC, 2003.
[8] R. de Oliveira and T. Braun, “A dynamic adaptive acknowledgment strategy for TCP over multihop wireless networks,” Proc. IEEE INFOCOM '05, 2005.
[9] NS-2 network simulator, http://www.isi.edu/nsnam/ns/
[10] G. Holland and N. H. Vaidya, “Analysis of TCP performance over mobile ad hoc networks,” ACM MobiCom '99, Seattle, WA, Aug 1999.
[11] T. Yuki, T. Yamamoto, M. Sugano, M. Murata, H. Miyahara, and T. Hatauchi, “Performance improvement of TCP over an ad hoc network by combining of data and ACK packets,” IEICE Transactions on Communications, 2004.
