Article

Multiple Congestion Points and Congestion Reaction Mechanisms for Improving DCTCP Performance in Data Center Networks

Prasanthi Sreekumari

Department of Computer Science, Grambling State University, Grambling, LA 71245, USA; [email protected]

Received: 23 February 2018; Accepted: 6 June 2018; Published: 8 June 2018
Abstract: To address problems such as long delays, latency fluctuations, and frequent timeouts of conventional Transmission Control Protocol (TCP) in a data center environment, Data Center TCP (DCTCP) has been proposed as a TCP replacement that satisfies the requirements of data center networks. It is gaining popularity in academia as well as industry due to its high throughput and low latency, and it is widely deployed in data centers. However, recent research on the performance of DCTCP has found that most of the time the sender's congestion window reduces to one segment, which results in timeouts. In addition, the authors observed that the nonlinear marking mechanism of DCTCP causes severe queue oscillation, which results in low throughput. To address these issues of DCTCP, we propose multiple congestion points using a double threshold and congestion reaction using window adjustment (DT-CWA) mechanisms that improve the performance of DCTCP by reducing the number of timeouts. The results of a series of simulations in a typical data center network topology using the Qualnet network simulator, a widely used network simulator, demonstrate that the proposed window-based solution can significantly reduce the timeouts and noticeably improve the throughput compared to DCTCP under various network conditions.

Keywords: TCP; data centers; timeouts; double threshold
1. Introduction

Modern data centers host a variety of services and computation-intensive distributed applications [1]. Nowadays, universities and large IT enterprises such as Microsoft, IBM, Google, and Amazon build their own data centers because of their lower operating costs, improved data protection, and cheap networking equipment [2-4]. Figure 1 shows the conventional Data Center Network (DCN) architecture for data centers adopted from Cisco [2,5]. A typical data center consists of core, aggregation, and access layers. Among these, the core layer provides high-speed packet switching for all incoming and outgoing flows of the data center. In addition, it provides connectivity to multiple aggregation modules and serves as the gateway. From clients in the Internet, data requests to multiple racks of servers (A) are routed through border (BR) and access (AR) routers in Layer 3 to the Layer 2 domain based on the destination virtual IP (VIP) address, that is, the IP address to which the request is sent and which is configured onto the two load balancers in Layer 2. The aggregation switches (S) at the top of Layer 2 forward data requests to the top-of-rack switches (TRS), which provide connectivity to the servers mounted on every rack. The two load balancers (LBs) attached to the aggregation switches contain the list of private and internal addresses (DIPs) of the physical servers in the racks. This list of DIPs defines the pool of servers that can handle requests to that VIP, and the LB spreads requests across the DIPs in the pool.
Information 2018, 9, 139; doi:10.3390/info9060139
Figure 1. Conventional data center (DC) architecture.
Data center traffic is classified as the traffic between two data center networks and the traffic within a single data center network. The analysis of data center traffic characteristics is important for designing efficient networking mechanisms for data centers. As the authors mentioned in [6,7], unique characteristics such as a large number of short flows, short congestion periods, and significant traffic variability make data center traffic remarkably different from any other network traffic. The flows in data centers are either large, latency critical, or short. These flows typically operate using conventional TCP due to its reliability [8]. However, recent research has shown that TCP does not work well in the unique data center environment and is unable to provide high throughput [9]. One of the main reasons for the TCP throughput collapse in a DCN is TCP Incast congestion. Incast congestion is a catastrophic loss in throughput that occurs when the number of senders communicating with a single receiver increases beyond the ability of an Ethernet switch to buffer packets. It leads to severe packet loss and, consequently, frequent TCP timeouts, thereby reducing the performance of TCP. Hence, for better communication among the nodes, there is a need for a good solution that addresses the issues of conventional TCP in a data center environment.

Recently, a few solutions [10-18] have been proposed to increase TCP performance in a DCN. Among the existing solutions, Data Center TCP (DCTCP) has gained the most popularity in academia as well as industry due to its better performance in terms of throughput and latency. DCTCP [10] uses a very small buffer space compared with other existing solutions. However, in recent research [19-25] on the performance of DCTCP, the authors found that most of the time the sender's congestion window reduces to one segment, which results in timeouts. In addition, the authors observed that the nonlinear marking mechanism of DCTCP causes severe queue oscillation, which results in low throughput. To address these issues of DCTCP, we propose multiple congestion points using a double threshold and congestion reaction using window adjustment (DT-CWA) mechanisms for improving the performance of DCTCP by reducing the number of timeouts. The results of a series of simulations in a typical data center network topology using the Qualnet network simulator, a widely used network simulator, demonstrate that the proposed window-based solution can significantly reduce the timeouts and noticeably improve the throughput compared with DCTCP under various network conditions.

The remainder of the paper is organized as follows: In Section 2, we describe the details of the proposed algorithm, DT-CWA. In Section 3, we present our experimental methodology and explain our results. Finally, Section 4 concludes our work.

2. DT-CWA

DCTCP has greatly improved the throughput of conventional TCP in data center networks. However, the congestion window estimation of DCTCP depends on the choice of the α initialization value [25]. If we set α to 0, the sender may suffer from frequent packet losses and retransmission timeouts. On the other hand, if we set α to 1, the sender can minimize the queuing delay, but the amount of packets that can be transferred is much smaller. As a result, the throughput of DCTCP will be reduced. In addition, performance studies show that most of the time the sender's congestion window reduces to one segment, which results in timeouts. Furthermore, studies have shown that the nonlinear marking mechanism of DCTCP causes severe queue oscillation, which results in low throughput [20]. To address these issues of DCTCP, DT-CWA modifies the DCTCP sender and switch sides by adding two schemes: one is a congestion reaction for controlling the size of the congestion window at the sender side, and the other is multiple congestion points using a double threshold at the switch for solving the problem of queue oscillation, as shown in Figure 2. First, we present the multiple congestion points using a double threshold mechanism at the switch side, and then we explain the congestion reaction mechanism at the sender side.
Figure 2. Double threshold and congestion reaction using window adjustment (DT-CWA) modifications over DCTCP.
2.1. Multiple Congestion Points

To avoid the queue oscillation at the switch, we improved the single threshold mechanism of DCTCP by adding multiple congestion points at the switch side using a double threshold mechanism, which we adopted from [20].

Double Threshold Mechanism: DCTCP uses an active queue management policy with explicit congestion notification (ECN), in which the switch detects congestion and sets the congestion encountered (CE) codepoint in the IP header. When the number of packets that are queued exceeds the single marking threshold 'K', the switch marks the incoming packets with the CE codepoint and informs the sender about network congestion. In data center networks, the queue length increases and decreases rapidly due to the concurrent arrival of bursts of flows, and the ECN-enabled switch marks packets until the queue length drops back to the threshold. As a result, the notified senders reduce their window size for a while based on the value of α in DCTCP. In addition, some packets may be dropped due to severe oscillation in queue length, in which case the sender needs to wait until a timeout happens. As the authors mentioned in [20], one of the main reasons for queue length oscillation is the nonlinear marking process of the single threshold.
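To make this baseline concrete, the following minimal sketch shows the standard DCTCP sender reaction summarized above and specified in [10]: the sender keeps a running estimate α of the fraction of marked packets and cuts its congestion window in proportion to α once per window of data. The function and variable names are illustrative and are not taken from any DCTCP implementation.

    # Minimal sketch of the standard DCTCP sender reaction (per [10]).
    # Function and variable names are illustrative, not from the paper's code.

    def dctcp_window_update(alpha, cwnd, acked_total, acked_marked, g=0.0625):
        """Update the marking estimate once per window and cut cwnd accordingly.

        alpha        -- running estimate of the fraction of marked packets
        cwnd         -- congestion window in segments
        acked_total  -- packets ACKed in the last window of data
        acked_marked -- of those, packets whose ACKs carried the ECN-Echo flag
        g            -- weighted averaging factor (0.0625 in our experiments)
        """
        frac = acked_marked / max(acked_total, 1)        # F: observed marking fraction
        alpha = (1 - g) * alpha + g * frac               # alpha <- (1 - g) * alpha + g * F
        if acked_marked > 0:                             # congestion was signalled
            cwnd = max(1, int(cwnd * (1 - alpha / 2)))   # cut in proportion to alpha
        return alpha, cwnd

When marking persists over several windows, α approaches 1 and the window collapses toward a single segment, which is exactly the timeout-prone behavior that DT-CWA targets.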
As we explained in Section 1, in data center networks, TCP Incast occurs when the flow of data increases from many senders to a single receiver. Using a single threshold 'K', DCTCP cannot mitigate the TCP Incast problem when severe packet loss occurs, and the frequency of timeouts thereby increases. To mitigate the TCP Incast problem due to frequent packet losses and timeouts, we adapted the double threshold mechanism from [20] and incorporated it into DT-CWA.

Instead of using a single threshold 'K', we split 'K' into 'K1' and 'K2'. The threshold 'K1' is for starting ECN packet marking, i.e., the starting point of congestion, and the threshold 'K2' is for ending the marking of packets, i.e., the ending point of congestion, as shown in Figure 3. The switch signals network congestion to the senders by marking packets, so that they decrease their congestion window size, from when the queue length increases beyond 'K1' until the queue length decreases below 'K2'. Using the double threshold values, we can control the queue length and improve the performance of DCTCP by marking packets earlier than DCTCP would, particularly from when the queue length reaches the first threshold 'K1', and by continuously marking packets until the queue length falls back to 'K2'.
Figure 3. (a) Single threshold and (b) double threshold.
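The sketch below illustrates the double threshold marking rule described above, assuming the hysteresis behavior stated in the text: marking starts once the queue length exceeds 'K1' and continues until the queue drains below 'K2'. The class and its names are ours and are not taken from [20] or from any switch implementation.

    # Illustrative sketch of the double-threshold (K1/K2) marking rule adopted
    # from [20]; names and structure are ours, not from the paper's code.

    class DoubleThresholdMarker:
        def __init__(self, k1, k2):
            self.k1 = k1          # queue length at which a marking episode starts
            self.k2 = k2          # queue length below which the episode ends
            self.marking = False  # are we currently inside a marking episode?

        def should_mark(self, queue_len):
            """Return True if the arriving packet should receive the CE codepoint."""
            if not self.marking and queue_len > self.k1:
                self.marking = True        # congestion episode starts
            elif self.marking and queue_len < self.k2:
                self.marking = False       # queue has drained; episode ends
            return self.marking

    # A single-threshold DCTCP switch would instead mark whenever queue_len > K.

With the values used in Section 3 (K1 = 35 and K2 = 30), marking begins once the queue length exceeds 35 and persists until it falls below 30, which smooths the abrupt on/off marking pattern of a single threshold.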
2.2. Congestion Reaction

At the sender side: When the sender receives an acknowledgment (ACK) with a congestion notification, the sender checks the current network condition based on the outstanding packets and adjusts the size of the congestion window according to Equation (1):

CWstart = (1/α)(CWmax − CWmin) − CWcurrent (1)

where α is the number of outstanding packets in the network, i.e., the difference between the total number of packets sent and the number of ACKs received so far; CWcurrent is the current size of the congestion window at the time of receiving the ACK packets with congestion notifications; and CWmax and CWmin are the maximum and minimum size, respectively, of the congestion window adjusted before receiving the congestion notification. When the sender receives the ACKs of all outstanding packets, the sender adjusts the sending rate according to Equation (2):

CWend = CWcurrent + β(CWmax − CWmin) (2)
where CWstart and CWend are the starting and ending points of congestion, respectively. The above modifications to DCTCP using the two components (congestion reaction at the sender side and multiple congestion points at the switch side) help to overcome the problem of oscillations in queue length as well as frequent retransmission timeouts, and thereby increase the performance in data center networks.

For setting the value of β, we performed an experiment calculating the throughput using 10 senders with different values of β. All senders send data to a single aggregator through different switches. As shown in Figure 4, we observed that when the value of β increases beyond 0.2, the throughput decreases due to frequent timeouts. As a result, we set β to 0.2 in Equation (2).
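The following is a literal transcription of Equations (1) and (2) at the sender; the helper names and the guard against a zero outstanding-packet count are our own additions and are not specified in the equations.

    # Sketch of the DT-CWA sender-side window adjustment, transcribing
    # Equations (1) and (2); helper names and the alpha == 0 guard are ours.

    BETA = 0.2  # chosen from the experiment shown in Figure 4

    def cw_on_congestion_notification(cw_current, cw_max, cw_min, outstanding):
        """Equation (1): window used when an ACK arrives with a congestion notification.

        outstanding -- packets sent but not yet acknowledged (alpha in Equation (1))
        """
        alpha = max(outstanding, 1)   # guard: assume at least one packet in flight
        return (1.0 / alpha) * (cw_max - cw_min) - cw_current

    def cw_after_outstanding_acked(cw_current, cw_max, cw_min, beta=BETA):
        """Equation (2): window used once all outstanding packets have been acknowledged."""
        return cw_current + beta * (cw_max - cw_min)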
Figure 4. Comparison of β values.

With the above modifications to DCTCP, DT-CWA can improve performance by reducing packet losses through controlling the size of the congestion window as well as through the packet marking mechanism using the double threshold.
3. Performance Evaluation

In this section, we present the performance of our improved DCTCP algorithm through comprehensive simulations using the Qualnet network simulator [26]. We compare the performance of our algorithm with that of DCTCP, as it is the most popular data center transport protocol; we implemented DCTCP in Qualnet. The main goal of this work is to increase the performance of DCTCP by controlling the size of the congestion window as well as the queue length in data center networks.

To achieve our goal, we evaluate the performance of our algorithm in a typical network topology [20] which consists of 10 end hosts and four switches, as shown in Figure 5. Among them, Switch 1 is connected to the aggregator; Switch 2 is connected to Workers 1, 2, and 3; Switch 3 is connected to Workers 4, 5, and 6; and Switch 4 is connected to Workers 7, 8, and 9. In addition, Switches 2, 3, and 4 are connected to Switch 1.
Figure 5. Network topology.
The link capacity was set to 1 Gbps, the link delay to 25 µs, the RTT to 100 µs, and the minimum retransmission timeout (RTOmin) to 10 ms. The buffer size was set to 128 KB for Switch 1 and 512 KB for Switches 2, 3, and 4. The marking threshold value 'K' for DCTCP was set according to [10,21] for a 1 Gbps link capacity. The value of the weighted averaging factor 'g' for DCTCP was set to 0.0625. We set the threshold value 'K1' to 35 and 'K2' to 30. An FTP-generic application was run on each source to send packets as quickly as possible. Figures 6-8 present the results of our evaluation of the proposed algorithm, DT-CWA, in comparison with DCTCP in terms of throughput, average query completion time, and average short message completion time, as they are the most important performance metrics.
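For readability, the topology and parameter values listed above can be summarized as plain Python data; this is only a summary for the reader and is not Qualnet configuration syntax.

    # Summary of the simulation setup used in this section (illustrative only;
    # not Qualnet configuration syntax).

    topology = {
        "Switch 1": ["Aggregator", "Switch 2", "Switch 3", "Switch 4"],
        "Switch 2": ["Worker 1", "Worker 2", "Worker 3"],
        "Switch 3": ["Worker 4", "Worker 5", "Worker 6"],
        "Switch 4": ["Worker 7", "Worker 8", "Worker 9"],
    }

    parameters = {
        "link_capacity_gbps": 1,
        "link_delay_us": 25,
        "rtt_us": 100,
        "rto_min_ms": 10,
        "buffer_kb": {"Switch 1": 128, "Switch 2": 512, "Switch 3": 512, "Switch 4": 512},
        "dctcp_g": 0.0625,
        "K1": 35,
        "K2": 30,
        "beta": 0.2,
        "traffic": "FTP-generic; each source sends as fast as possible",
    }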
Figure 6. Comparison of throughput.
Figure 7. Comparison of average query completion time.
Figure 8. Comparison of average short message completion time.
Figure 6 shows the performance of DT-CWA compared with that of DCTCP in terms of throughput with different numbers of flows. From the result, we observe that the throughput of DT-CWA is higher than the throughput of DCTCP. After 20 flows, the throughput of DT-CWA increased significantly compared to that of DCTCP, but the throughput of DT-CWA reduced to less than 200 Mbps after 35 flows. Even so, the performance of DT-CWA remains higher than that of DCTCP. For 30 flows, DT-CWA achieves a 59% throughput increment, and for 55 flows, DT-CWA achieves a 43% throughput increment compared to DCTCP. One of the main reasons for this achievement is our multiple congestion points and rate adjustment mechanisms. Figure 7 presents the comparison of average query completion time with flows from 25 to 60. The result shows that the average query completion time of DT-CWA is less than that of DCTCP. Compared to DCTCP, the query completion time of DT-CWA is less than 50 ms up to 35 flows. When the number of flows increases, the query completion time also increases. However, DT-CWA took less time than DCTCP, thereby improving the performance. Figure 8 depicts the results of the average short message completion time of DT-CWA compared to that of DCTCP. Compared to the average query completion time, both DCTCP and DT-CWA took less time for short messages; nevertheless, the performance of DT-CWA is better than that of DCTCP. DT-CWA suffers from timeouts due to port black-out [27], but not as frequently as DCTCP during the communication with the end host.

The efficient utilization of the buffer leads to a reduction in the overflow of the queue length and thereby reduces the loss of packets in the network; thus, DT-CWA achieves better performance than DCTCP.

4. Conclusions

In this paper, we have developed a modified DCTCP protocol named DT-CWA for improving the performance of DCTCP in data center networks. In DT-CWA, we propose a new congestion reaction mechanism at the sender side and multiple congestion points at the switch for avoiding queue length oscillation and frequent retransmission timeouts, thereby utilizing the buffer space efficiently.
We conducted extensive simulations using 10 end hosts and 4 switches in Qualnet to evaluate the performance and effectiveness of our algorithm, DT-CWA, compared with DCTCP in terms of throughput, average query completion time, and short message completion time. Our experimental results show that the proposed algorithm achieves better performance than DCTCP. In future work, we will design a protocol for mitigating the TCP Incast as well as TCP Outcast issues for the improvement of data center networks.

Acknowledgments: The author would like to thank the anonymous reviewers for their valuable comments and suggestions that have contributed to improving this paper.

Conflicts of Interest: The author declares no conflicts of interest.
References

1. Zhang, Y.; Ansari, N. On Architecture Design, Congestion Notification, TCP Incast and Power Consumption in Data Centers. IEEE Commun. Surv. Tutor. 2013, 15, 39-64. [CrossRef]
2. Cisco Data Center Infrastructure 2.5 Design Guide. Available online: http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/DC_Infra2_5/DCI_SRND_2_5a_book.pdf (accessed on 7 June 2018).
3. Guo, J.; Liu, F.; Lui, J.; Jin, H. Fair Network Bandwidth Allocation in IaaS Datacenters via a Cooperative Game Approach. IEEE/ACM Trans. Netw. 2016, 24, 873-886. [CrossRef]
4. Guo, J.; Liu, F.; Wang, T.; Lui, J.C.S. Pricing Intra-Datacenter Networks with Over-Committed Bandwidth Guarantee. In Proceedings of the 2017 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC '17), Santa Clara, CA, USA, 12-14 July 2017.
5. Data Center Architecture Overview. Available online: http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/DC_Infra2_5/DCInfra_1.html (accessed on 7 June 2018).
6. Liu, F.; Guo, J.; Huang, X.; Lui, J.C.S. eBA: Efficient Bandwidth Guarantee under Traffic Variability in Datacenters. IEEE/ACM Trans. Netw. 2017, 25, 506-519. [CrossRef]
7. Tao, W.; Liu, F.; Xu, H. An Efficient Online Algorithm for Dynamic SDN Controller Assignment in Data Center Networks. IEEE/ACM Trans. Netw. 2017, 25, 2788-2801.
8. Chen, Y.; Griffith, R.; Liu, J.; Katz, R.H.; Joseph, A.D. Understanding TCP incast throughput collapse in datacenter networks. In Proceedings of the 1st ACM Workshop on Research on Enterprise Networking (WREN '09), Barcelona, Spain, 16-21 August 2009; ACM: New York, NY, USA, 2009; pp. 73-82.
9. Kant, K. Data center evolution: A tutorial on state of the art, issues, and challenges. Comput. Netw. 2009, 53, 2939-2965. [CrossRef]
10. Alizadeh, M.; Greenberg, A.; Maltz, D.; Padhye, J.; Patel, P.; Prabhakar, B.; Sengupta, S.; Sridharan, M. Data center TCP (DCTCP). ACM SIGCOMM Comput. Commun. Rev. 2010, 40, 63-74. [CrossRef]
11. Hwang, J.; Yoo, J. FaST: Fine-grained and Scalable TCP for Cloud Data Center Networks. KSII Trans. Internet Inf. Syst. 2014, 8, 762-777. [CrossRef]
12. Wu, H.; Feng, Z.; Guo, C.; Zhang, Y. ICTCP: Incast Congestion Control for TCP in Data Center Networks. IEEE/ACM Trans. Netw. 2013, 21, 345-358.
13. Hwang, J.; Yoo, J.; Choi, N. IA-TCP: A Rate Based Incast-Avoidance Algorithm for TCP in Data Center Networks. In Proceedings of the 2012 IEEE International Conference on Communications (ICC), Ottawa, ON, Canada, 10-15 June 2012.
14. Zhang, J.; Wen, J.; Wang, J.; Zhao, W. TCP-FITDC: An adaptive approach to TCP incast avoidance for data center applications. In Proceedings of the 2013 International Conference on Computing, Networking and Communications (ICNC), San Diego, CA, USA, 28-31 January 2013; pp. 1048-1052.
15. Zhang, J.; Ren, F.; Yue, X.; Shu, R.; Lin, C. Sharing Bandwidth by Allocating Switch Buffer in Data Center Networks. IEEE J. Sel. Areas Commun. 2014, 32, 39-51. [CrossRef]
16. Zhang, J.; Ren, F.; Tang, L.; Lin, C. Modeling and solving TCP Incast problem in data center networks. IEEE Trans. Parallel Distrib. Syst. 2015, 26, 478-491. [CrossRef]
17. Wang, G.; Ren, Y.; Dou, K.; Li, J. IDTCP: An effective approach to mitigating the TCP Incast problem in data center networks. Inf. Syst. Front. 2014, 16, 35-44. [CrossRef]
18. Shukla, S.; Chan, S.; Tam, A.S.-W.; Gupta, A.; Xu, Y.; Chao, H.J. TCP PLATO: Packet labelling to alleviate time-out. IEEE J. Sel. Areas Commun. 2014, 32, 65-76. [CrossRef]
19. Sreekumari, P.; Jung, J.; Lee, M. An early congestion feedback and rate adjustment schemes for many-to-one communication in cloud-based data center networks. Photon. Netw. Commun. 2016, 31, 23-35. [CrossRef]
20. Chen, W.; Cheng, P.; Ren, F.; Shu, R.; Lin, C. Ease the Queue Oscillation: Analysis and Enhancement of DCTCP. In Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems, Philadelphia, PA, USA, 8-11 July 2013.
21. Jiang, C.; Li, D.; Xu, M. LTTP: An LT-Code Based Transport Protocol for Many-to-One Communication in Data Centers. IEEE J. Sel. Areas Commun. 2014, 32, 52-64. [CrossRef]
22. Fang, S.; Foh, C.H.; Aung, K.M.M. Prompt congestion reaction scheme for data center network using multiple congestion points. In Proceedings of the 2012 IEEE International Conference on Communications (ICC), Ottawa, ON, Canada, 10-15 June 2012; pp. 2679-2683.
23. Zhang, J.; Ren, F.; Tang, L.; Lin, C. Taming TCP incast throughput collapse in data center networks. In Proceedings of the 2013 21st IEEE International Conference on Network Protocols (ICNP), Goettingen, Germany, 7-10 October 2013; pp. 1-10.
24. Wu, W.; Crawford, M. Potential performance bottleneck in Linux TCP. Int. J. Commun. Syst. 2007, 20, 1263-1283. [CrossRef]
25. Improving Transmission Performance with One-Sided Datacenter TCP. Available online: https://eggert.org/students/kato-thesis.pdf (accessed on 7 June 2018).
26. QualNet Network Simulator Software. Available online: http://web.scalable-networks.com/content/qualnet (accessed on 7 June 2018).
27. Prakash, P.; Dixit, A.; Hu, Y.C.; Kompella, R. The TCP outcast problem: Exposing unfairness in data center networks. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI '12), Lombard, IL, USA, 3-5 April 2013; USENIX Association: Berkeley, CA, USA, 2012; p. 30.

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).