
Optimal Polynomial Time Algorithm for Restoring Multicast Cloud Services

Sara Ayoubi, Chadi Assi, Lata Narayanan, and Khaled Shaban

Abstract— The failure-prone nature of data center networks has evoked countless contributions to develop proactive and reactive countermeasures. Yet, most of these techniques were developed with unicast services in mind, when in fact many services hosted in data center networks today rely on multicast communication to disseminate traffic. Hence, the existing survivability schemes fail to cater to the distinctive properties and quality-of-service requirements that multicast services entail. This letter is devoted to understanding the ramifications of facility node or substrate link failure on multicast services residing in cloud networks. We formally define the multicast virtual network restoration problem and prove its NP-complete nature in arbitrary graphs. Furthermore, we prove that the problem can be solved in polynomial time in multi-rooted tree-like data center network topologies.

Index Terms— Cloud data centers, multicast, service restoration, algorithms.

I. INTRODUCTION

Today, when a tenant requests to host his/her service in the cloud, he/she typically requests particular QoS requirements, negotiated in the form of a Service Level Agreement (SLA). A particular QoS requirement that most tenants share is a demand for service availability. Such a demand comes as no surprise, given the failure-prone nature of cloud data center networks [1]. To this extent, significant efforts [2] have been devoted towards ensuring the survivability of virtual networks (VNs) hosted in the cloud. A common limitation in the existing literature is that these schemes have all been designed with unicast services in mind. In reality, depending on the type of service running, the mode of communication between the virtual machines (VMs) can be one-to-many, many-to-one, or even many-to-many. Indeed, several services [3]–[6] hosted in cloud networks rely on multicasting to disseminate their traffic, e.g., web-search services, High Performance Computing (HPC), and MapReduce-like cooperative computation systems. Multicast services differ from unicast VNs in many ways, which ultimately renders existing protection schemes unusable in handling the survivability of multicast services. Mainly, a Multicast VN (MVN) consists of a multicast distribution tree that connects a multicast source to a set of recipient nodes. Further, services that involve real-time communication usually demand strict QoS in terms of delays, such as end-to-end delay and

Manuscript received April 15, 2016; revised May 18, 2016; accepted May 18, 2016. Date of publication May 23, 2016; date of current version August 10, 2016. This work was made possible by the NPRP 5-137-2-045 grant from the Qatar National Research Fund (a member of the Qatar Foundation). The statements made herein are solely the responsibility of the authors. The associate editor coordinating the review of this letter and approving it for publication was S. Yu. S. Ayoubi, C. Assi, and L. Narayanan are with Concordia University, Montreal, QC H3G 1M8, Canada (e-mail: [email protected]; [email protected]; [email protected]). K. Shaban is with Qatar University, Doha 2713, Qatar (e-mail: [email protected]). Digital Object Identifier 10.1109/LCOMM.2016.2571691

differential-delay (delay-variation) constraints. Thus, the problem of survivable MVNs in the event of failures demands separate attention and a tailored restoration that accounts for these properties.

In this work, we study the problem of restoring multicast services in cloud data center networks. We consider MVNs which are already placed through any existing MVN embedding method (e.g., [7]). We start by understanding the impact of failures on MVNs; then we present a formal definition of the MVN restoration problem, and we prove its NP-Complete nature in general graphs (e.g., inter-data center network interconnects, or wide-area networks). Further, we prove that the MVN restoration problem can be solved in polynomial time for multi-rooted tree-like network topologies.

II. PROBLEM MOTIVATION

A. Network Model Overview

The network model consists of a substrate network depicting a cloud data center network hosting tenant services with a one-to-many communication mode. We model the network as an undirected graph with a set N of substrate nodes interconnected via a set L of physical links, denoted as G_s = (N, L). Each substrate node n ∈ N can either be a facility node (server) that can host VMs with finite resource capacity (e.g., memory, CPU), or a network node (router/switch) that routes the traffic across the network. Each hosted MVN service is denoted as G_v = (V, E, b, γ, δ), where V = {s} ∪ T is the set of VMs consisting of a multicast source s and a set of recipient nodes T, and E = {(s, t) : t ∈ T} represents the set of virtual links, each with a bandwidth demand b. In addition, MVNs are usually associated with delay constraints, particularly for real-time applications. That is, the source must reach all recipients within an end-to-end delay γ. Moreover, the jitter/delay-variation between recipient nodes must not fluctuate beyond a given threshold δ, to ensure correctness and synchronization. Let m = (m_N, m_L) denote the embedding solution for G_v on G_s, where m_N : V → N denotes the one-to-one node mapping solution, and m_L : E → P, where P represents the set of paths that forms the multicast tree. The non-conflicting (one-to-one) placement of VMs belonging to the same MVN enables better fault-tolerance. Fig. 1(a) illustrates an example of a 2-receiver MVN embedded in a substrate network. We denote the capacity of the substrate/virtual nodes(1) and links by the numbers placed next to each node or link, respectively.

(1) For the sake of clarity, we only show a single resource capacity.
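For concreteness, the notation above can be captured in a few lines of Python. The following is a hedged sketch with illustrative class and field names of our own choosing, not an artifact of the letter:

```python
from dataclasses import dataclass, field

@dataclass
class SubstrateNode:
    name: str
    is_facility: bool          # True: server (can host VMs); False: switch/router
    capacity: int = 0          # residual resource capacity (single resource shown)

@dataclass
class SubstrateNetwork:
    """G_s = (N, L): undirected graph; links carry residual bandwidth."""
    nodes: dict = field(default_factory=dict)   # name -> SubstrateNode
    links: dict = field(default_factory=dict)   # frozenset({u, v}) -> residual bandwidth

@dataclass
class MVNRequest:
    """G_v = (V, E, b, gamma, delta): star of virtual links from source s to T."""
    source: str
    receivers: list
    b: int        # per-link bandwidth demand
    gamma: float  # end-to-end delay bound
    delta: float  # delay-variation (jitter) bound

@dataclass
class Embedding:
    """m = (m_N, m_L): one-to-one VM placement plus a path per virtual link."""
    node_map: dict   # VM name -> facility node name (one-to-one)
    link_map: dict   # (s, t) -> list of substrate links forming the path
```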



Fig. 1. Illustrative Examples.

B. Impact of Failures on MVNs

One way to protect a MVN against a facility node (e.g., server) failure is by augmenting it with backup virtual nodes. As for the failure of substrate links, this can be mitigated by constructing an edge-disjoint backup tree. Such schemes are commonly known as proactive protection, since the backup nodes and links are instated prior to any failure [8], [9]. While this offers a certain degree of reliability, it is also fairly costly, since the resources provisioned for these backup nodes and links remain idle until failures occur. An alternative approach is to restore the affected resource(s) upon failure. Such a "reactive approach" is more cost-efficient as it eliminates idle resources in the network, but it demands fast restoration to avoid long service downtime. This work focuses on providing reactive countermeasures.

When a substrate link (or network node) fails, it may disconnect an entire subtree connected via this substrate link to the rest of the multicast group. For example, in Fig. 1(a), the failure of substrate link {n1−n3} disconnects the subtree rooted at n3's adjacent network node, thereby detaching receivers t1 and t2 from the multicast source. A similar outcome will occur if n3's adjacent network node fails. On the other hand, when a facility node fails, this failure could affect either a MVN's source or one of its recipient nodes. In either case, the problem consists of finding a new host for the failed VM that has enough residual capacity to accommodate it and can connect to the rest of the multicast group while satisfying the delay constraints. To illustrate this, consider the example shown in Fig. 1(a). We assume (in this example only) that the delay measured on each physical link is 1, and that the hosted MVN has γ = 2 and δ = 0. The failure of facility node n2 disconnects recipient t1. Clearly, n5 is the only facility node with sufficient residual capacity to host t1. Now, it is also important to make sure that n5 can connect to the rest of the multicast group within γ and δ. Given that t2 is 2 hops away from the source and that δ = 0, n5 must also be exactly 2 hops away from the host (n1) of the source. In this case, the only feasible restoration solution is to connect n5 to n1 via substrate path {n1−n2−n5} (the red dashed tree in Fig. 1(b)).

Restoring a MVN not only requires finding a feasible solution that satisfies the MVN's QoS (e.g., its delay constraints); it is also in the cloud provider's best interest to find the lowest-cost restoration solution. This enables the cloud provider to maximize his/her network admissibility, and in return the achievable long-term revenue. For instance, recall the restoration solution presented in Fig. 1(b) post-failure of recipient node t1. Clearly, the resultant tree is more costly than the pre-failure distribution tree. Alternatively, reconfiguring the traffic routing to t2 via substrate path {n1−n2−n4} will yield a lower-cost restoration solution, while also maintaining the MVN's QoS requirements (dotted green tree in Fig. 1(b)). This leads us to the following key observation: restoring a MVN entails restoring the failed element while reconstructing the lowest-cost delay-constrained tree.
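The delay-feasibility test in this example reduces to simple arithmetic over path delays. The following small Python sketch of such a check is our own illustration (the helper name is not from the letter):

```python
def delay_feasible(candidate_delays, existing_delays, gamma, delta):
    """Check whether a candidate restoration keeps the MVN's delay QoS.

    candidate_delays: source-to-receiver delay of each rerouted receiver path
    existing_delays:  source-to-receiver delay of each unaffected receiver path
    gamma: end-to-end delay bound; delta: delay-variation (jitter) bound
    """
    delays = list(candidate_delays) + list(existing_delays)
    if any(d > gamma for d in delays):          # end-to-end bound gamma
        return False
    return max(delays) - min(delays) <= delta   # jitter bound delta

# Example from Fig. 1: t2 is 2 hops from the source, gamma = 2, delta = 0,
# so a new host for t1 is feasible only if it is also exactly 2 hops away.
assert delay_feasible([2], [2], gamma=2, delta=0)
assert not delay_feasible([1], [2], gamma=2, delta=0)   # violates delta = 0
```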

III. THE MVN RESTORATION PROBLEM (MVNR)

When a facility node fails and affects a hosted MVN G_v, the MVN restoration (MVNR) problem requires finding another feasible facility node (with sufficient resources) to host the failed VM, one that can connect to the rest of the multicast group with the lowest-cost delay-constrained tree. Here, we consider the cost(2) as the total bandwidth consumed by the restored tree; normalized by the bandwidth demand b, the cost becomes the number of substrate links used. We represent the hosts of the unaffected VMs as K, and the set of feasible facility nodes that can accommodate the failed VM as Q; the MVNR decision problem can be formulated as follows:

(2) A penalty can be added to account for service disruption.

Definition 1: Given a post-failure substrate network G̃_s = (Ñ, L̃), where Ñ and L̃ denote the sets of post-failure active substrate nodes and links (with |Ñ| = |N| − 1, since the failed facility node is removed), an MVN G_v = (V, E, b, γ, δ), a failed VM v̂ ∈ V, a set K ⊂ Ñ indicating the hosts of the unaffected VMs (V − {v̂}), and a set Q = {Ñ − K}: is there a host x ∈ Q such that K ∪ {x} can be connected by a tree in G̃_s satisfying the delay constraints δ and γ, and using at most μ links?

Let S' denote the set of all feasible restoration solutions for G_v, where each s' ∈ S' is associated with a cost function φ(s'). The MVN restoration model (REM) can be mathematically formulated as follows:

    s* = arg min φ(s'),  subject to  s' ∈ S'.   (1)
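A candidate restoration can be checked in polynomial time against Definition 1, which is the basis of the NP-membership argument in Theorem 1 below. The following minimal Python sketch assumes a plain-dictionary representation of our own choosing:

```python
from collections import defaultdict

def verify_restoration(tree_links, host_x, K, residual, demand,
                       mu, path_delays, gamma, delta):
    """Polynomial-time verifier for a claimed MVNR solution (Definition 1).

    tree_links:  list of (u, v) substrate links claimed to form the tree
    host_x:      candidate host x in Q for the failed VM
    K:           hosts of the unaffected VMs
    residual:    residual capacity of host_x; demand: the failed VM's demand
    path_delays: source-to-receiver delay of every receiver in the tree
    """
    if residual < demand or len(tree_links) > mu:   # capacity and mu-link bound
        return False
    adj = defaultdict(set)
    for u, v in tree_links:
        adj[u].add(v); adj[v].add(u)
    nodes = set(adj)
    # A connected graph on |nodes| vertices with |nodes| - 1 edges is a tree.
    if len(tree_links) != max(len(nodes) - 1, 0):
        return False
    seen, stack = set(), list(nodes)[:1]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u])
    if seen != nodes or not (set(K) | {host_x}) <= nodes:
        return False
    # Delay QoS: end-to-end bound gamma, delay-variation bound delta.
    return max(path_delays) <= gamma and max(path_delays) - min(path_delays) <= delta
```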

Theorem 1: The MVN restoration problem is NP-Complete.

Proof: MVNR is easily seen to be in NP, since a given restoration solution can be verified in polynomial time. Next, we prove that the problem is NP-hard via a reduction from the NP-Complete graph-based Steiner Tree (ST) problem [10]. Throughout the proof, we consider that the failed facility node is omitted/removed from the network. For the sake of completeness, we recall the definition of the ST problem: given an undirected weighted graph G = (N, L) and a subset of nodes R ⊆ N, is there a tree connecting R with a cost less than or equal to w? Note that ST is NP-Complete even when the weight on all the edges is uniform. Now, given an instance (G = (N, L), R, w) of the ST problem, we transform it in polynomial time into an instance of the MVNR problem as follows: we build a substrate network G_s = (N', L') by adding to G a set R' of auxiliary nodes with |R'| = |R| (each with capacity set to a very large number M), where each node in R' is connected to a single distinct node in R via a single auxiliary link of weight 1. Furthermore, we create an MVN G_v = (V, E, b, γ, δ) with

Fig. 2. Reduction from the graph-based Steiner Tree Problem.

Fig. 3. FatTree Network.

|V| = |R| + 1, assign v̂ to be an arbitrary node in V, K = R, Q = {N' − K}, γ = δ = ∞, and μ = w + 1. Fig. 2 illustrates an example of our reduction; the highlighted vertices (in Fig. 2(a)) represent R, and the grey ones (in Fig. 2(b)) indicate the set of auxiliary vertices R'. Clearly, this transformation can be done in polynomial time. We now show that G has a Steiner tree connecting the vertices in R of cost ≤ w if and only if there exists a host x ∈ R' such that R ∪ {x} can be connected by a tree in G_s with at most w + 1 links. If ST has a solution H of cost h ≤ w, then we can map the failed node v̂ on any node x ∈ R' and augment the tree H with the auxiliary link from x to its corresponding node in R to obtain an MVN restoration solution of cost h + 1 ≤ w + 1. Conversely, if MVNR has a solution of cost h' ≤ w + 1, then, since v̂ must be mapped on a node x ∈ R', by removing the link from x to its corresponding node in R we obtain a tree linking the nodes in R of cost h' − 1 ≤ w, that is, a solution to ST in G of cost ≤ w. This completes the reduction, and we conclude that the MVNR problem is NP-Complete. □

Note that, by restriction, MVNR remains NP-Complete for any other instance with strict delay constraints.

IV. CASE OF STRUCTURED DATA CENTER NETWORKS

Multi-rooted tree-like topologies [3] are common data center network structures. They consist of multiple layers of commodity switches interconnecting a large number of servers. The goal of this design is to provide high bisection bandwidth, while offering multiple equal-cost paths between the facility nodes and eliminating network oversubscription. Fig. 3 illustrates an example of such network topologies, known as the FatTree network. The FatTree network is a multi-rooted tree-like network topology built out of k-port switches interconnecting k^3/4 servers. It consists of k pods, where every pod forms a complete bipartite graph of k/2 Aggregate Switches (ASs) connected to k/2 Top of Rack (ToR) switches. Further, each ToR is connected to k/2 Server Racks (SRs), and each AS is connected to k/2 Core Switches (CSs). It has been shown in many cases that these network topologies can be leveraged to enhance the support of multicast in data centers [5].
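To make the wiring concrete, here is a minimal, hedged Python sketch that builds the FatTree adjacency just described; the node-naming scheme is our own:

```python
from collections import defaultdict

def build_fattree(k):
    """Build the FatTree adjacency described above for an even k.

    Returns an undirected adjacency map over k^3/4 server racks (SRs),
    k/2 ToRs and k/2 ASs per pod, and (k/2)^2 core switches (CSs).
    """
    assert k % 2 == 0
    adj = defaultdict(set)

    def link(u, v):
        adj[u].add(v); adj[v].add(u)

    cores = [f"CS{i}" for i in range((k // 2) ** 2)]
    for p in range(k):                       # k pods
        tors = [f"P{p}-ToR{i}" for i in range(k // 2)]
        aggs = [f"P{p}-AS{i}" for i in range(k // 2)]
        for tor in tors:                     # complete bipartite ToR-AS graph
            for agg in aggs:
                link(tor, agg)
        for i, tor in enumerate(tors):       # each ToR serves k/2 server racks
            for j in range(k // 2):
                link(tor, f"P{p}-SR{i}-{j}")
        for i, agg in enumerate(aggs):       # each AS reaches k/2 core switches
            for j in range(k // 2):
                link(agg, cores[i * (k // 2) + j])
    return adj

topo = build_fattree(4)
assert sum(1 for n in topo if "SR" in n) == 4 ** 3 // 4   # 16 servers for k = 4
```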


Algorithm 1 REAL(U = SRs, W = {}, v̂, R = {V − v̂}, k = 1)
1:  while (U is not empty) do
2:    if (v̂ is a receiver node) then
3:      W = Hop-to-Hop-Search(R, k, b);
4:    else W = Path-Convergence(R, k, b);
5:    end if
6:    if (∃ feasible SR(s) ∈ W) then
7:      Return GetLowestCostSolution(W);
8:    end if
9:    U −= W;
10:   k++;
11: end while

Theorem 2: The optimal multicast restoration problem can be solved in polynomial time in the FatTree network.

In the following, we prove that in the event of a source node or any recipient node failure, the optimal multicast restoration problem can be solved in polynomial time over the FatTree network. We denote the proposed restoration algorithm as REAL (illustrated in Algorithm 1). To ensure that the restored multicast tree is always loop-free, our restoration lookup ensures that traffic in the resultant tree changes direction at most once; that is, an upward-traveling packet may turn downwards once, but a downward-facing packet can never be redirected upwards. This constraint ensures that the resultant tree never contains a cycle [11].

Proof (Recipient Node Failure - Hop-to-Hop Search): Our algorithm begins by pruning the substrate network to remove server racks and/or substrate links that violate the resource demands of the failed receiver. Next, we search from the remaining components in the MVN tree to find the SR that can host the failed receiver while incurring the minimal number of additional Steiner points. To do so, we perform a hop-to-hop search from each active VM in the tree. Further, the lookup is encouraged to go through existing links in the multicast tree by setting the weight on these links to 0. Subsequently, the first feasible SR reached by any VM yields the lowest-cost restoration solution, since it incurs the minimal number of additional Steiner points. Indeed, the candidate hosts reached first are the SRs that share the same ToR with active VMs. If any of these candidate hosts is feasible (in terms of resource capacity and respecting the delay constraints), then this solution yields 0 additional Steiner points. If none of the candidate hosts is feasible, the search persists by exploring the ASs, yielding at most 2 additional Steiner points, and finally the CSs. Clearly, if no solution can be found, then the MVN with the current embedding solution cannot be restored; otherwise, the first solution found will definitely yield the lowest-cost restoration solution. The topological structure of the FatTree network provides every pair of servers with k^2/4 equal-cost paths. Further, there can be no more than k^3/4 candidate SRs. Hence, if we restrict the search from the source node only, we have a worst-case complexity of O(k^5), which is indeed polynomial in the size of the substrate network.
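The hop-to-hop lookup can be realized as a multi-source shortest-path search in which existing tree links are given weight 0. The Python sketch below is our own simplification: it omits the FatTree's up-then-down direction rule and the layer-by-layer k increments of Algorithm 1, but it illustrates why the first feasible SR extracted is optimal:

```python
import heapq

def hop_to_hop_search(adj, tree_links, sources, is_feasible_sr):
    """Dijkstra from all active VMs at once; existing tree links cost 0,
    other links cost 1, so the first feasible SR popped minimizes the
    number of additional Steiner points.

    adj:        undirected adjacency map (node -> set of neighbors)
    tree_links: set of frozenset({u, v}) links of the current multicast tree
    is_feasible_sr: predicate returning True only for an SR with enough
                    residual capacity that also respects gamma and delta
    """
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        if is_feasible_sr(u):          # switches must return False here
            return u, d                # lowest-cost restoration host
        for v in adj[u]:
            w = 0 if frozenset((u, v)) in tree_links else 1
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return None, None                  # MVN cannot be restored
```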


To better illustrate this, consider the example in Fig. 3, where a 3-receiver MVN is hosted in a FatTree (k = 4). Given a recipient node failure, we launch the hop-to-hop search from the SRs in the current MVN. Here, we find that SR1 and SR2 are reachable from the ToRs in the current tree (ToR1 and ToR2). Note that choosing either one of these SRs will yield the lowest-cost restoration tree. However, it could happen that neither SR1 nor SR2 is feasible (e.g., due to resource scarcity); the hop-to-hop search will then continue by moving towards the ASs, and so on, until a feasible solution is found.

Source Node Failure - Path-Convergence Routine: Now, the problem becomes that of finding a SR that can host the source and yield the lowest-cost delay-constrained Steiner tree. Here, we prove that by adopting a path-convergence approach from the recipient nodes, the first feasible SR that these nodes converge to will definitely yield the lowest-cost restoration solution. We distinguish between two cases: Case 1, where the recipient nodes reside in the same pod, and Case 2, where the recipient nodes are distributed across multiple pods. In Case 1, if the recipient nodes share the same ToR, then they will always converge first towards any feasible SR connected to that same ToR. If no solution is found, the next convergence will occur at the intra-pod SRs. Finally, in the event that no intra-pod solution can be found, the next convergence will occur at SRs residing in alien pods. Clearly, the first solution found will yield the lowest-cost restoration tree, as it will be composed of fewer Steiner points interconnected via lower-cost (lower-layer) substrate links. In Case 2, all receivers explore all SRs at the same time, and the algorithm returns the lowest-cost solution found.
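A hedged sketch of the Case 1 convergence ordering follows; Case 2 would additionally compare the candidate trees' costs across pods, and the pod_of/tor_of helpers are assumed for illustration, not taken from the letter:

```python
def convergence_order(sr, receiver_tors, receiver_pods, pod_of, tor_of):
    """Rank candidate SRs by where the receivers' upward paths first meet:
    same ToR (0), same pod (1), alien pod (2). Lower ranks yield trees with
    fewer Steiner points interconnected via lower-layer links."""
    if tor_of(sr) in receiver_tors:
        return 0          # converge under a shared ToR
    if pod_of(sr) in receiver_pods:
        return 1          # converge at the pod's aggregation layer
    return 2              # converge at the core layer

def path_convergence(candidate_srs, receiver_tors, receiver_pods,
                     pod_of, tor_of, is_feasible):
    """Return the first feasible SR in increasing convergence order."""
    for sr in sorted(candidate_srs,
                     key=lambda x: convergence_order(
                         x, receiver_tors, receiver_pods, pod_of, tor_of)):
        if is_feasible(sr):       # capacity + gamma/delta constraints hold
            return sr
    return None                   # no feasible source host exists
```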

Given that for each candidate source node we can find at most (k^2/4)^|T| distribution trees, we have a worst-case complexity of O(k^(2|T|+3)), which is polynomial in the size of the network. □

Fig. 4. Numerical Results over FatTree.

V. NUMERICAL ANALYSIS

We evaluate the performance of REAL over the FatTree (FT) network (Fig. 4) against the optimal solution obtained by REM and against two benchmark algorithms: the Steiner- and

Greedy-based restoration schemes. Steiner consists of building the lowest-cost restoration solution (disregarding delay constraints), whereas Greedy always restores the failed VM on the server with the highest residual capacity.

First, we consider FT (k = 4), and we look at the restoration ratio (RR = #MVNs Restored / #MVNs Failed × 100%). We observe that both REM and REAL achieve 100% RR as we vary the failure frequency F from 2 to 8 (F denotes the number of failures that will be triggered at random intervals), whereas when F hits 8, the restoration ratio of Steiner drops to 60%, and that of Greedy to 46%. Clearly, the RR impacts the long-term achievable revenue (as shown in Fig. 4(b)), where revenue is the total profit earned from each admitted MVN throughout its residency. Note that whenever a failed MVN could not be restored, a penalty cost (deducted from the revenue) is incurred by remitting 50% of the revenue gained while hosting this MVN. Similar gains are observed on FT (k = 8) (Fig. 4(c) and 4(d)). It is important to note that how much REAL can outperform the two benchmark algorithms depends on the QoS requirements of the hosted MVNs; for VNs with loose delay constraints, Steiner will achieve results comparable to REAL in terms of cost, revenue, and RR.

VI. CONCLUSION

We studied the failure of MVNs hosted in cloud networks. We considered the impact of the failure of various network components, and we showed that in all cases the multicast restoration scheme must not only restore the failed element but also ensure that the restoration solution maintains the MVN's QoS requirements, namely its end-to-end delay and delay-variation constraints. We presented a formal definition of the MVN restoration problem for facility node failure, and proved its NP-Complete nature in arbitrary graphs. Further, we exploited the structured topology of typical data center networks to propose an optimal polynomial-time algorithm.

REFERENCES

[1] P. Gill, N. Jain, and N. Nagappan, "Understanding network failures in data centers: Measurement, analysis, and implications," ACM SIGCOMM Comput. Commun. Rev., vol. 41, no. 4, pp. 350–361, Aug. 2011.
[2] S. Herker, A. Khan, and X. An, "Survey on survivable virtual network embedding problem and solutions," in Proc. ICNS, 2013, pp. 99–104.
[3] D. Li, Y. Li, J. Wu, S. Su, and J. Yu, "ESM: Efficient and scalable data center multicast routing," IEEE/ACM Trans. Netw., vol. 20, no. 3, pp. 944–955, Jun. 2012.
[4] D. Li, M. Xu, M.-C. Zhao, C. Guo, Y. Zhang, and M.-Y. Wu, "RDCM: Reliable data center multicast," in Proc. IEEE INFOCOM, Apr. 2011, pp. 56–60.
[5] A. Iyer, P. Kumar, and V. Mann, "Avalanche: Data center multicast using software defined networking," in Proc. COMSNETS, 2014, pp. 1–8.
[6] S. Ayoubi, Y. Chen, C. Assi, T. Khalifa, and K. B. Shaban, "Multicast tree repair and maintenance in the cloud," in Proc. IEEE CLOUD, Jun./Jul. 2015, pp. 829–835.
[7] S. Ayoubi, C. Assi, K. Shaban, and L. Narayanan, "MINTED: Multicast virtual network embedding in cloud data centers with delay constraints," IEEE Trans. Commun., vol. 63, no. 4, pp. 1291–1305, Apr. 2015.
[8] A. Fei, J. Cui, M. Gerla, and D. Cavendish, "A 'dual-tree' scheme for fault-tolerant multicast," in Proc. IEEE ICC, vol. 3, 2001, pp. 690–694.
[9] D. Li et al., "Reliable multicast in data center networks," IEEE Trans. Comput., vol. 63, no. 8, pp. 2011–2024, Aug. 2014.
[10] F. K. Hwang, D. S. Richards, and P. Winter, Eds., The Steiner Tree Problem, vol. 53. New York, NY, USA: Elsevier, 1992.
[11] R. N. Mysore et al., "PortLand: A scalable fault-tolerant layer 2 data center network fabric," ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 4, pp. 39–50, 2009.