Feasibility Considerations of Multipath TCP in Dealing with Big Data Applications

Zia Ush Shamszaman, Safina Showkat Ara, Ilyoung Chong
Information and Communications Engineering
Hankuk University of Foreign Studies, Yongin, Republic of Korea
[email protected], [email protected], [email protected]

Abstract—In this paper we present an approach to handling big data more effectively. We consider Multipath TCP (MPTCP), which uses all available paths simultaneously. MPTCP is an extension of TCP that aims to improve throughput by sharing available resources intelligently and fairly. The main focus of this paper is to analyze the benefit of MPTCP over single-path TCP for bandwidth- and time-sensitive applications as well as big data applications. Our simulations show that MPTCP achieves higher goodput through bandwidth aggregation, that Coupled Congestion Control (CCC) provides better throughput without being unfair to legacy TCP flows, and that a large receive buffer improves performance on links with relatively large RTTs. Consequently, MPTCP can be a significant improvement over single-path TCP in dealing with big data applications.

Keywords—Multipath TCP (MPTCP); big data; Coupled Congestion Control (CCC); buffer

I. INTRODUCTION

Dealing with big data is a burning issue in the IT industry today. A big data cloud can be described as a collection of data that can be accessed online at any time [1]. Data explodes every second from countless devices and is sensed, collected, stored, and analyzed online at an extraordinary pace. However, storage is not the only challenge in dealing with big data: analyzing the data, especially for real-time applications and multimedia, demands optimal network resources. Fig. 1 shows an approach to dealing with big data more effectively by using MPTCP. Users with MPTCP-enabled devices are connected to big data cloud interfaces through the Internet. MPTCP is completely transparent to the application layer, which means users get the benefit of bandwidth aggregation at the transport layer without any change to applications, unlike other existing transport-layer multipath protocols. The details of MPTCP are explained in Section II.

Although technological progress allows multiple network interfaces to coexist in a single device, the default behavior is to use a single link for all network communications. The other links are kept for backup, which means only failover is provided rather than load sharing. As a result, users pay for all the links but enjoy only one service at a time.

It is therefore essential to utilize all the available resources of a device, especially its networking resources, to handle big data applications more efficiently. This will increase user satisfaction as well as make optimal use of available network resources. TCP is the most commonly used transport-layer protocol today, but regrettably legacy TCP does not support true multipath: it was not designed to handle multiple interfaces simultaneously, so load balancing and failover cannot be done efficiently at the transport layer with legacy TCP alone. To overcome this, the IETF formed a working group to specify a multipath protocol at the transport layer, titled Multipath TCP (MPTCP) [2]. MPTCP is a vital extension of TCP in which multiple paths are used simultaneously while remaining transparent to applications, fair to legacy TCP flows, and compatible with the existing Internet [3][4]. The MPTCP design goals and protocol architecture are explained in [5], and a number of design choices that extend legacy TCP to MPTCP are implemented in [6]. The single-link problem can be alleviated by aggregating the bandwidth of the different links, giving applications access to more bandwidth than any single link can provide. Multiple links can also be used to provide other services or features [4]; for example, the additional link(s) can increase the reliability of a networked application through redundancy, and application mobility is another advantage of MPTCP.
The main focus of this paper is to investigate the feasibility of MPTCP for big data. We consider Coupled Congestion Control (CCC) with MPTCP in the evaluation to measure goodput. The experimental results show that MPTCP with CCC provides better throughput and is not unfair to other legacy TCP flows. They also show that a large receive buffer improves performance on links with relatively large RTTs. MPTCP can thus be a significant improvement over single-path TCP for big data, and a good fit for bandwidth-hungry and time-sensitive applications because of its bandwidth aggregation ability.
Applications communicate through a network by using a communication protocol and creating a network socket. The transport layer provides end-to-end communication, while the application layer enables processes to communicate. Application-layer protocols are defined and implemented by application developers, which allows for a great deal of flexibility: the developer has full control over the behavior of the protocol.
978-1-4673-5742-5/13/$31.00 ©2013 IEEE
ICOIN 2013
Figure 1. An idea to use MPTCP for big data cloud.
Figure 2. Simulation Model.
Our MPTCP simulation is based on the open-source NS3 network simulator in an IPv4 environment, with Ubuntu as the base operating system. Fig. 2 shows the simulation model. Node-A and Node-B each have one wired link and one WiFi link, and they are connected to each other through a network. Background traffic is also applied to the WiFi link.
Another issue that arises is maintaining packet sequencing as well as data integrity. MPTCP solves this by using two types of sequence numbers, one at the connection level and one at the subflow level. Subflow-level sequencing follows the legacy TCP sequencing style inside each subflow.
This paper is organized as follows: Section II describes MPTCP, and Section III explains CCC. A goodput comparison between MPTCP and TCP is presented in Section IV. Finally, we conclude in Section V.
In contrast, connection-level sequencing is the MPTCP-level sequencing, and the Data Sequence Signal (DSS) option carries the data sequence number [4].
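As an illustration of this two-level sequencing, the sketch below shows how a receiver can use connection-level data sequence numbers to restore byte-stream order across subflows. It is a toy Python model with hypothetical names, not the paper's NS3 code:

```python
from typing import Dict

class MptcpReceiver:
    """Toy model: reorder segments by connection-level data sequence number (DSN)."""

    def __init__(self) -> None:
        self.expected_dsn = 0                     # next in-order connection-level byte
        self.out_of_order: Dict[int, bytes] = {}  # DSN -> payload awaiting reorder
        self.delivered = bytearray()              # bytes handed to the application

    def on_segment(self, subflow_id: int, dsn: int, payload: bytes) -> None:
        """Accept a segment from any subflow; deliver all newly contiguous data."""
        self.out_of_order[dsn] = payload
        while self.expected_dsn in self.out_of_order:
            data = self.out_of_order.pop(self.expected_dsn)
            self.delivered.extend(data)
            self.expected_dsn += len(data)

recv = MptcpReceiver()
# Segments arrive interleaved and out of order across two subflows.
recv.on_segment(subflow_id=2, dsn=100, payload=b"B" * 100)  # buffered: gap at DSN 0
recv.on_segment(subflow_id=1, dsn=0, payload=b"A" * 100)    # fills the gap
recv.on_segment(subflow_id=1, dsn=200, payload=b"C" * 100)
print(len(recv.delivered))  # 300: all bytes delivered in order
```

Note that the subflow-level sequence numbers play no role in this reordering step; they only keep each subflow looking like an ordinary TCP flow on the wire.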
This MPTCP sequencing technique enables the receiver to correctly reorder data received through different subflows. Fig. 3 depicts the subflow sequence numbers and data sequence numbers in MPTCP. It is worth mentioning that subflow-level sequence numbers are normally acknowledged on each subflow. To handle packet loss at the subflow level, MPTCP can use a different subflow to retransmit the lost segment; remapping the data sequence number is how this is handled.

II. MULTIPATH TCP
MPTCP is a set of extensions to TCP; essentially, MPTCP sits above TCP in the protocol stack. A functional decomposition of the MPTCP architecture is described in [5]. It lists four entities: path management, packet scheduling, the subflow interface, and congestion control. Path management can be included in the transport layer as a built-in path manager (PM), or positioned outside the transport layer with the ability to expose available paths to MPTCP [4]; the remaining three entities are incorporated in the transport layer.
B. Impact of Buffer Size in MPTCP

Buffer size is of significant importance in MPTCP because the receiver stores out-of-order data until all missing parts arrive. In legacy TCP, the receive buffer size is calculated as twice the bandwidth-delay product (2*BDP). Things are more complex in MPTCP, because packet loss on one subflow should not affect the performance of the other subflows; the receive buffer must therefore be large enough to hold data until a lost segment is retransmitted [5]. The worst case occurs when the subflow with the largest RTT/RTO sees a timeout, because the receive buffer must then hold data from all the other subflows. MPTCP requires the receive buffer to be larger than the sum of the buffers required for the individual subflows [5], and MPTCP advertises a single receive window shared by all subflows. The recommended receive buffer is Rcv_Buffer = 2 × Σ_i B_i × RTT_max, where B_i is the bandwidth of each subflow and RTT_max is the largest RTT among all the subflows. According to the IETF MPTCP architectural guidelines [5], the send buffer should have the same size as the receive buffer.
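As a rough illustration of this sizing rule, the following Python sketch computes the recommended buffer for hypothetical link figures; the bandwidths and RTTs below are examples, not our simulation parameters:

```python
def mptcp_recv_buffer(subflows):
    """Recommended receive buffer: 2 * (sum of subflow bandwidths) * largest RTT.

    subflows: list of (bandwidth_bytes_per_s, rtt_s) tuples.
    """
    total_bw = sum(bw for bw, _ in subflows)
    rtt_max = max(rtt for _, rtt in subflows)
    return 2 * total_bw * rtt_max

# Hypothetical example: a 4 Mbps wired link with 10 ms RTT plus a 4 Mbps
# WiFi link with a large 250 ms RTT (made-up figures for illustration).
wired = (4_000_000 // 8, 0.010)  # 500000 bytes/s
wifi = (4_000_000 // 8, 0.25)
print(mptcp_recv_buffer([wired, wifi]))  # 500000.0: the slow subflow's RTT dominates
```

Note how a single slow subflow inflates the buffer needed for every other path, since RTT_max applies to the aggregate bandwidth.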
MPTCP is completely transparent to the layers above and below it. It accepts a byte stream from the application through the standard TCP socket API [16], divides the byte stream into segments, and passes them to the TCP subflow module. Each TCP subflow is individually a legacy TCP flow. The path manager is responsible for monitoring and discovering the available paths to a destination; after a path is selected, the packet scheduler passes segments to the subflow. The MPTCP-enabled receiver reorders the segments according to their sequence numbers and delivers them to the destination application. The Coupled Congestion Control (CCC) algorithm maintains the congestion windows, improves throughput by aggregating the bandwidth of all available paths, moves traffic from more congested to less congested paths, and ensures fairness when resources are pooled. CCC is explained in Section III.

A. Data Sequencing in MPTCP

MPTCP can send data over any of the available subflows; when one subflow fails or is disrupted, a different subflow can be used for retransmission.
Fig. 4 shows the impact of different receive buffer sizes on MPTCP goodput. Here, Node-A and Node-B in Fig. 2 are
Figure 3. Subflow level and Connection level data sequencing.
Figure 4. Effect of receive buffer size in MPTCP goodput.
considered as the sending and receiving hosts with two disjoint paths, wired and WiFi, where the bandwidth capacity is 4 Mbps on both links. We insert various delays into the WiFi link and no delay into the wired link. In practice, a WiFi access point is shared among many users; to reflect this, we apply background traffic to the WiFi link. This scenario can also be viewed as an unstable network condition, where the two subflows run over very different types of links. Such asymmetric delays force the MPTCP receiver to store many packets in its receive buffer in order to deliver all the received data in sequence, which reduces goodput.
Equal traffic distribution among all the paths should not be the goal of MPTCP; rather, it should distribute traffic based on path status. In multipath scenarios it is actually desirable to use only the less congested paths instead of spreading traffic equally among the available paths. The CCC algorithm was proposed with three goals [12][3]: the aggregated throughput of a multipath flow must be better than that of any single-path flow; it should be fair to other TCP flows in terms of resource pooling at a bottleneck; and traffic should be distributed according to path status and capacity. For each non-duplicate acknowledgement on subflow i, the congestion window of subflow i is increased by
In Fig. 4, when the two subflows have 0 ms to 10 ms of delay, they are able to saturate almost the full capacity of the two links with a receive buffer of 2 MB. Goodput decreases to less than 6 Mbps at 100 ms of delay on the WiFi link, but the buffer requirement stays within 2 MB. However, when the delay reaches 300 ms, around 8 MB of receive buffer is required to reach a stable state; the slope means that the receiver is waiting for in-order packets while receiving more packets from the other subflow. Similarly, 500 ms and 700 ms of delay require almost 10 MB of buffer to reach a stable state. The important observation is that when the delay on one link is 700 ms, increasing the receive buffer alone cannot provide higher goodput than single-path TCP; here the aggregation function of MPTCP fails to meet the first CCC goal. From this experiment we can see that with a larger buffer, MPTCP ensures stable and better goodput than single-path TCP up to a certain level of delay on one link. Although this is not a very common situation, it can be overcome by disabling the delayed path until it returns to an acceptable condition. Hence, according to our analysis, removing the lossy link increases the overall MPTCP throughput.
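The mitigation just described, disabling the delayed path until it recovers, can be sketched as a simple pruning policy. The RTT-ratio threshold below is our own illustrative choice, not part of the MPTCP specification:

```python
def active_subflows(subflow_rtts, max_rtt_ratio=10.0):
    """Keep only paths whose RTT is within max_rtt_ratio of the fastest path.

    subflow_rtts: dict mapping path name -> smoothed RTT in seconds.
    """
    best = min(subflow_rtts.values())
    return {name: rtt for name, rtt in subflow_rtts.items()
            if rtt <= best * max_rtt_ratio}

# 700 ms on WiFi vs 10 ms on the wired path, as in the worst case above:
paths = {"wired": 0.010, "wifi": 0.700}
print(sorted(active_subflows(paths)))  # ['wired']: the delayed path is disabled
```

A real implementation would re-admit the pruned path once its measured RTT falls back under the threshold, rather than dropping it permanently.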
min( (α × bytes_acked × mss_i) / Σ_j cwnd_j , (bytes_acked × mss_i) / cwnd_i )
where Σ_j cwnd_j is the sum of the congestion windows of all the subflows, and α denotes the aggressiveness of the multipath flow, calculated as follows:
α = (Σ_j cwnd_j) × max_i( cwnd_i / RTT_i² ) / ( Σ_i( cwnd_i / RTT_i ) )²
When a loss is detected on subflow i, the congestion window of that subflow is decreased by cwnd_i / 2 [6].
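The increase and decrease rules above can be sketched numerically (following the coupled algorithm of [3]); this is a toy Python model with made-up window and RTT values, not the NS3 implementation used in our simulations:

```python
def alpha(cwnds, rtts):
    """Aggressiveness factor of the multipath flow (windows in bytes, RTTs in s)."""
    total = sum(cwnds)
    best = max(c / (r * r) for c, r in zip(cwnds, rtts))
    denom = sum(c / r for c, r in zip(cwnds, rtts)) ** 2
    return total * best / denom

def on_ack(cwnds, rtts, i, bytes_acked, mss):
    """Additive increase for subflow i on a non-duplicate ACK (coupled rule)."""
    a = alpha(cwnds, rtts)
    inc = min(a * bytes_acked * mss / sum(cwnds),  # coupled, congestion-aware term
              bytes_acked * mss / cwnds[i])        # never more than legacy TCP
    cwnds[i] += inc
    return cwnds

def on_loss(cwnds, i):
    """Multiplicative decrease: halve the window of the losing subflow only."""
    cwnds[i] /= 2
    return cwnds

# With a single subflow, alpha = 1 and the rule reduces to standard TCP.
print(round(alpha([14600], [0.1]), 6))  # 1.0
```

The min() term is what guarantees the multipath flow is never more aggressive on any one path than a single legacy TCP flow would be, which is what keeps CCC fair at a shared bottleneck.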
The coupled congestion control algorithm applies only to the additive-increase phase of congestion avoidance; it retains the other TCP New Reno behaviors in case of a drop (e.g., slow start, fast retransmit, fast recovery). We measure and compare the performance of CCC against Reno congestion control to show why CCC is the better choice for MPTCP [6]. In Fig. 5 and Fig. 6 we evaluate the behavior of MPTCP Coupled Congestion Control (MPTCP-CCC) in a bottleneck situation; one of the goals of CCC is not to be unfair to legacy TCP.
III. COUPLED CONGESTION CONTROL
Congestion control is another area where MPTCP differs from legacy TCP. MPTCP cannot simply use the TCP congestion control scheme without being unfair to legacy TCP flows: at a common bottleneck, it would occupy double its fair share. Fig. 5 and Fig. 6 depict the aggressiveness of Reno congestion control. Furthermore, [12] explains that MPTCP should select the less congested paths.
Figure 5. MPTCP-CCC acts like single-path TCP in a shared bottleneck.
Figure 7. MPTCP-CCC is fair to other TCP subflows.
Figure 6. MPTCP-Reno consumes more bandwidth than MPTCP-CCC.
Figure 8. MPTCP-Reno is not fair to other TCP flows.
To measure this fairness, we compare the bandwidth consumption of MPTCP-CCC and MPTCP-Reno against legacy TCP. According to our simulation results in Fig. 5, MPTCP-CCC acts like a single-path legacy TCP flow at a shared bottleneck. In contrast, Fig. 6 shows that MPTCP-Reno consumes more bandwidth in the same situation than MPTCP-CCC, because its two subflows compete for bandwidth against one. In both cases, up to 3 TCP connections are considered. These results show that MPTCP with CCC is not aggressive toward other TCP flows, which ensures that MPTCP can coexist with other protocols without doing harm. So, in a big data cloud, non-MPTCP clients will not be dominated by MPTCP at a bottleneck, which is a further motivation to consider MPTCP as a transport protocol for big data clouds.
However, this experiment measures MPTCP performance in an optimistic situation where both links have the same bandwidth (3 Mbps each) but slightly different delays. Fig. 9 shows the simulation result: MPTCP provides almost twice the goodput of legacy TCP when the two links' bandwidths are identical and the delay difference is very small. Fig. 10 shows that the two paths share an almost equal load (2.7 Mbps and 2.8 Mbps); the slight difference is due to the different delays of the two links. The goodput of both links drops at a certain point because of packet reordering at the receiver end. However, it would be too optimistic to expect this in real life. In Fig. 2, Node-A and Node-B are connected to the network by WiFi and wire. Here we measure the goodput MPTCP achieves over single-path TCP when the two MPTCP paths differ in bandwidth and delay: the WiFi path capacity is 2 Mbps and the wired path capacity is 4 Mbps. For comparison, single-path TCP uses the 4 Mbps wired link. A further experiment measures how much load is carried by each MPTCP path.
Fig. 7 and Fig. 8 show the effect of increasing the number of MPTCP subflows. With a single subflow, MPTCP-CCC and MPTCP-Reno behave similarly. When the number of subflows increases to 2, 3, and 4, MPTCP-Reno shows its unfairness to regular TCP and aggressively consumes additional bandwidth. In contrast, MPTCP-CCC's bandwidth consumption remains unchanged as subflows increase to 2, 3, and 4, maintaining the same fairness to legacy TCP throughout. This experiment shows that CCC outperforms Reno congestion control for MPTCP.
The goodput comparison and the load sharing between the two MPTCP paths are depicted in Fig. 11 and Fig. 12, respectively. In Fig. 11, MPTCP goodput degrades around the 4th second because the different delay of the WiFi link causes reordering in the receive buffer. The receiver must wait for missing segments and cannot deliver data to the application, so it reduces the advertised window to throttle the sender; at the same time, the sender receives three duplicate acknowledgements and enters the fast
IV. MPTCP GOODPUT COMPARISON WITH TCP
In Fig. 2, Node-C communicates with Node-D; both nodes have two wired connections to the cloud. In real life, two wired connections from two different providers are a common scenario.
Figure 9. MPTCP goodput comparison with TCP.
Figure 11. MPTCP goodput comparison with TCP.
Figure 10. Load sharing between two wired links (MPTCP).
Figure 12. Load sharing between wired and wireless links (MPTCP).
retransmit state to resend the missing segments. Therefore, from the 4th to the 6th second MPTCP provides less goodput than single-path TCP; however, it again maintains better goodput than single-path TCP from the 6th second onward. This experiment shows that the receive buffer has a great effect on MPTCP performance, since it must store segments as long as missing segments have not been received. The impact of receive buffer size is explained in Section II.B.

V. CONCLUSION

MPTCP is a remarkable extension of TCP that aims to achieve better throughput over multiple paths while sharing resources fairly. According to our analysis, MPTCP is fair and not harmful to other TCP flows. In diverse network environments, a large receive buffer is required to reach a stable state.

In this paper we have analyzed MPTCP performance with Coupled Congestion Control and compared it with Reno congestion control. When delay is inserted on one link, MPTCP performance degrades abruptly due to head-of-line blocking in the receive buffer. To obtain the benefits of MPTCP, there are potentially two separate costs for end-users. Firstly, there is a cost involved in deploying MPTCP, potentially monetary but primarily in time and effort. Deploying MPTCP may take the form of installing a specific extension (i.e., the end-user is aware and makes an explicit choice), or it may come bundled as part of an operating system (in which case the end-user may not be aware of it, or may need to enable or configure it). Early adopters will bear the highest deployment cost due to the relative complexity; as MPTCP becomes available and enabled by default in operating systems, this cost will tend toward zero. MPTCP can be a very good addition for big data transfer, as load balancing from the mobile terminal will be genuinely helpful for users.

ACKNOWLEDGMENT

This work was partly supported by the IT R&D program of KCC/MKE [10035206, Development of IMT-Advanced Mobile IPTV Core Technology].

REFERENCES
[1] T. Schlieski and B. D. Johnson, "Entertainment in the age of big data," Proceedings of the IEEE, vol. 100, Special Centennial Issue, May 2012.
[2] IETF Multipath TCP working group, http://datatracker.ietf.org/wg/mptcp/. Last accessed June 20, 2012.
[3] C. Raiciu, M. Handley, and D. Wischik, "Coupled congestion control for multipath transport protocols," RFC 6356 (Experimental), October 2011.
[4] A. Ford, C. Raiciu, M. Handley, and O. Bonaventure, "TCP extensions for multipath operation with multiple addresses," Internet-Draft, draft-ietf-mptcp-multiaddressed-09, work in progress, June 2012.
[5] A. Ford, C. Raiciu, M. Handley, S. Barre, and J. Iyengar, "Architectural guidelines for multipath TCP development," RFC 6182 (Informational), March 2011.
[6] S. Barré, C. Paasch, and O. Bonaventure, "MultiPath TCP: From theory to practice," in Proc. IFIP Networking, Springer, Berlin/Heidelberg, vol. 6640, 2011, pp. 444–457.
[7] R. Stewart, "Stream Control Transmission Protocol," RFC 4960 (Proposed Standard), September 2007. Updated by RFCs 6096, 6335.
[8] D. Sarkar and S. Paul, "Architecture, implementation, and evaluation of cmpTCP Westwood," in Proc. IEEE GLOBECOM, San Francisco, Dec. 2006.
[9] H.-Y. Hsieh and R. Sivakumar, "A transport layer approach for achieving aggregate bandwidths on multi-homed mobile hosts," in Proc. ACM MOBICOM, Atlanta, Sep. 2002.
[10] M. Zhang, J. Lai, A. Krishnamurthy, L. Peterson, and R. Wang, "A transport layer approach for improving end-to-end performance and robustness using redundant paths," in Proc. USENIX Annual Technical Conference, Boston, Jun. 2004.
[11] J. Iyengar, P. Amer, and R. Stewart, "Concurrent multipath transfer using SCTP multihoming over independent end-to-end paths," IEEE/ACM Transactions on Networking, vol. 14, no. 5, pp. 951–964, Oct. 2006.
[12] D. Wischik, C. Raiciu, A. Greenhalgh, and M. Handley, "Design, implementation and evaluation of congestion control for multipath TCP," in Proc. USENIX NSDI, Boston, Mar. 2011.
[13] R. Moskowitz and P. Nikander, "Host Identity Protocol (HIP) architecture," RFC 4423 (Informational), May 2006.
[14] P. Nikander, T. Henderson, C. Vogt, and J. Arkko, "End-host mobility and multihoming with the Host Identity Protocol," RFC 5206 (Experimental), April 2008.
[15] S. C. Nguyen and T. M. T. Nguyen, "Evaluation of multipath TCP load sharing with coupled congestion control option in heterogeneous networks," in Proc. Global Information Infrastructure Symposium (GIIS), 2011.
[16] M. Scharf and A. Ford, "MPTCP application interface considerations," Internet-Draft, draft-ietf-mptcp-api-05, work in progress, April 2012.