Dragos Andrei, Massimo Tornatore, Charles U. Martel, and Biswanath Mukherjee. University of California, Davis, CA 95616. Email: {andrei, tornator, martel, ...
Flexible Scheduling of Multicast Sessions with Different Granularities for Large Data Distribution over WDM Networks Dragos Andrei, Massimo Tornatore, Charles U. Martel, and Biswanath Mukherjee University of California, Davis, CA 95616 Email: {andrei, tornator, martel, mukherje}@cs.ucdavis.edu Abstract—Many networking applications require distribution of data from a central point to multiple destinations; this distribution can be efficiently achieved by the means of multicasting. Traditionally, multicasting has been considered for on-demand applications such as HDTV, Video-on-Demand (VoD), IPTV, which usually require to start data transmission immediately. However, in the case of emerging e-Science and high-performance applications (which frequently need to replicate large datasets to multiple locations), the data distribution does not necessarily need to take place instantaneously; instead, the multicast session can be accommodated considering a flexible start time for the large data transfer. We study the efficient provisioning of Multicast Data-Distribution Requests (MDDRs) with flexible scheduling over WDM networks. We consider the practical case of multicast sessions that may require less than the entire capacity of a wavelength; hence the multicast sessions need to be “trafficgroomed”. Our first multicast provisioning approach (named Rand) generates randomized alternate multicast trees on which we try to provision the multicast session, and then attempts to assign wavelengths and schedule the session’s start time. In our second approach (named AllSlots), for each available start time S, we dynamically generate trees depending on the network state at time S. In our next approach (named Break), for the cases when provisioning an entire multicast tree fails, we enable the possibility of “breaking” the tree into subtrees (with independent start times) serving subsets of destinations. Moreover, we study the impact of partitioning the datasets into pieces on our multicast provisioning approaches, and also compare our multicast algorithms with an unicast approach.
I. I NTRODUCTION Optical networks with Wavelength-Division Multiplexing (WDM) technology provide a cost-efficient and scalable transport platform for supporting the growing traffic generated by a variety of data-intensive emerging network applications. Examples of such applications include the e-Science and grid computing applications, which may require the distribution of large datasets to multiple destination sites. These datasets can consist of bulk data obtained from large-scale scientific instruments (e.g., from the radio telescopes used in the Very Long Baseline Interferometry (VLBI) experiments [1]), can be data from the physics experiments at CERN [1], or the results of large-scale data processing, etc. The time-aware distribution of such large datasets over WDM optical infrastructure, so that the maximum time requirements [2], [3] of the high-end grid customers are satisfied, constitutes the topic of this paper. This work has been supported in part by NSF Grant No. CNS-06-27081.
In our work, the data which needs to be distributed is not necessarily needed immediately at destinations, but the timescheduling can be flexible. By flexible scheduling, we mean that data transmission can start at a later time, as long as all data is transmitted to the destinations before a predefined maximum completion time. In previous work [3], we have shown how to efficiently aggregate multiple files at a single destination considering flexible scheduling. In this paper, the information goes from a single source to multiple sites: we can now employ multicasting, and new provisioning approaches are needed to tackle this problem. Multicasting is known as an efficient approach for distributing information; the multicast problem has been well studied in the context of WDM networks [4]–[6]. The work in [5] accommodates multicasting of different bandwidthgranularity connections (the unicast problem of aggregating low-speed requests onto high-bandwidth pipes is denoted as traffic grooming [7], [8]). The multicast problem has also been tackled in the context of grid computing [9], in which reliable high-speed dissemination of data is considered. Note that, in grid computing, data replication throughout the network is an important problem, as it can be used to provide reliable and fast access to data (to future customer requests), and favors energy-efficient transmissions of data (over shorter distances). In this work, we consider the dynamic provisioning of multigranularity Multicast Data-Distribution Requests (M DDRs) with flexible time scheduling over WDM networks. In optical networking literature, there are two main time-dependent traffic models (for unicast requests): i) a scheduled traffic model [6], in which the start and end times of the lightpath demands are fixed, and ii) a sliding scheduled model, where the start time of a request can “slide” in a predefined timewindow [10]. Works in [10]–[13] consider unicast sliding requests. The work in [12] provisions flexible dynamic unicast advance reservations in optical grids, without considering traffic grooming; reference [14] also considers a flexible timescheduling problem in lambda grids, under a static traffic environment. Our flexible scheduling model is close to the sliding scheduled traffic; however, as a difference to existing work, our problem considers flexible multicast provisioning (and traffic grooming). The work in [6] also addresses a time-aware multicast problem, however, [6] has the following differences with respect to our work: it accommodates scheduled multicast
978-1-4244-4148-8/09/$25.00 ©2009 This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2009 proceedings.
requests (i.e., without time flexibility) under a static traffic in a network with no wavelength conversion, and without traffic grooming. Also, reference [15] considers point-to-multipoint lightpath provisioning, under a scheduled traffic model, in a wavelength-continuous network, with no traffic grooming. Our main motivation for the flexible-scheduling multicast problem is represented by high-performance applications (distributing datasets to be used as input for remote applications); however, other (consumer-oriented) scenarios, requiring flexible-time multicast can be envisioned. Examples are the replication of videos by a VoD provider, or daily database updates from an institution’s central office to its branches. The rest of the paper is organized as follows. Section II presents our M DDR provisioning problem, Sec. III presents our multicast provisioning approaches, and Section IV presents illustrative numerical results. Section V concludes the study. II. G ENERAL P ROBLEM D ESCRIPTION We are given a WDM network represented as directed graph G(V, E, W ), with V = set of nodes, E = set of edges, and W = set of wavelengths; each link e ∈ E is equipped with |W | wavelengths of capacity C (e.g., C=10 Gbps (OC-192)). Nodes are connected to the network using Opaque switches1 . We consider a dynamic network environment, with arrivals of requests for distributing large files from a centralized location to multiple remote nodes. Such a Multicast Data-Distribution Request (MDDR) R is defined by the tuple: R=(M, F, B, A, D). M is the multicast group M =(s, d1 , d2 , ..., dk−1 ), consisting of k nodes (|M | = k). Between all nodes n ∈ M , the first node is the multicast source, while the next k-1 nodes are the destinations. Sourcenode s contains a large dataset of size F , which needs to be replicated at destinations. R’s transmission rate is B (in Gbps). Note that heterogeneous rates for different MDDRs require traffic grooming [7] to efficiently pack the traffic onto wavelengths. The holding time of the multicast transfer can F , while propagation delay is negligible be obtained as H = B compared to transmission delay for the large-sized datasets to be transferred. A is R’s arrival time slot, and the multicast needs to finish all its transfers before deadline D [2], [3] (deadlines are Quality-of-Service parameters widely used in the context of grid computing applications; in this work, a deadline represents the maximum completion time when data is needed at destinations, and is stated by the multicast client). We consider that network time is slotted into equal-length time slots (approach also used in [2], [11], [12]), so H, A, D and the to-be-assigned start time slot S are integer multiples of slots (note that the previously computed H can be rounded to the next higher slot length). MDDR R can be scheduled starting with any time slot S, as long as A≤S≤D-H (in our notation, R must be scheduled during time window [A, D), and time slot D-1 is the last possible slot allocated for transferring data). 1 Opaque Optical Cross-Connects (OXCs) convert incoming data to electronic domain and are a popular choice for modern networks. They support multicasting by replicating the arriving signal into multiple copies in electronics [5], which are then switched towards their destinations. Opaque OXCs provide seamless wavelength conversion and grooming capabilities.
Our goal is to setup the dynamic multicast session, i.e., assign resources for the multicast tree (route, assign wavelengths and groom - constituting our problem’s spatial dimension), as well as fix the period when allocating these resources (i.e., the temporal dimension), such that we retain a maximum amount of resources unused to accommodate future traffic. III. P ROVISIONING A PPROACHES We propose online algorithms for solving the MDDR provisioning (MDDRP) problem over WDM networks with opaque OXCs (possessing wavelength conversion at all nodes). Before we describe our provisioning algorithms, we present the data structure which maintains the current network state: the residual-capacity multidimensional array C[e][w][q], ∀ e ∈ E, ∀ w ∈ W , ∀ q ∈ Q (where Q is the set of time slots). For each wavelength-link (e, w), C maintains the remaining available bandwidth for each slot q ∈ Q, e.g., C[e][w][q] = 10 Gbps (the line rate), when no bandwidth is utilized during slot q, and = 0 when all bandwidth is used during q. R = (M, F, 5 Gbps, 1, 9); H=4 slots. Remaining Capacity
C[l][w] Slot No.
7
5
2
9
10
8
7
10
1
2
3
4
5
6
7
8
D-H
A
9
D
Possible start slot
Pl,w Slot No.
F
F
F
T
T
1
2
3
4
5
Fig. 1. Example for computing the possible start times vector Pl,w (for a given wavelength-link (l, w)).
Given an incoming M DDR R (defined as in Sec. II), and the network state information in C, we need a fast procedure to decide if a wavelength-link (l, w) is available (has remaining capacity ≥ B) for H consecutive time slots starting with any time slot S from time window [A, D-H]. Therefore, we first compute the boolean vectors Pl,w [S], which indicate if time slot S can be the start slot for R’s transmission, on wavelengthlink (l, w). Next, we centralize the Pl,w information for all wavelengths w ∈ W of link l into boolean vector Pl , in which we record (for each possible start slot S) if the transmission of R’s data is possible on link l, starting with slot S. For each valid start slot S on link l (i.e., Pl [S]=true), we also record the wavelength w to use for that slot (which is the first wavelength available starting with slot S, chosen by using First-Fit [7]), which is maintained as additional data to Pl [S]. Our problem of finding H consecutive slots with remaining capacity ≥B from window [A, D) (i.e., computation of Pl,w ) can be reduced to finding all patterns of length H in a string of length D-A (by using a slotted boolean vector that translates the vector of capacity values C[l][w] (for wavelength-link (l, w)) into boolean values indicating if C[l][w][q] ≥ B). Our exact algorithm for computing Pl,w [S] is related to the efficient “String Matching” algorithms in [16] and is not shown due to space limitations, however, we present its main features below. It is a fast linear-time algorithm, which only requires a single check of the time slots from time window [A, D) of C[l][w]. Consider the example in Fig. 1 (for MDDR R =
978-1-4244-4148-8/09/$25.00 ©2009 This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2009 proceedings.
Algorithm 1 Multicast Data-Distribution Request Provisioning (MDDRP) - Rand Input: MDDR R = (M, F, B, A, D). Network state: Residual capacity matrix: C[e][w][q], ∀ e ∈ E, ∀ w ∈ W , ∀ q ∈ Q. Output: Multicast tree T = (VT , ET ) (with M ⊂ VT , ET ⊂ E), Wavelength assignment (WA) for each tree branch l ∈ ET , Start time slot of multicast transmission S; new network state. 1) Generate a set U of N alternate randomized multicast trees which connect the nodes in multicast group M (U = {T1 , T2 , ..., TN }). 2) For each tree Ti ∈ U , i = 1, 2, ..., N do a) For each branch l of tree Ti (l ∈ ETi ) do Compute the vector Pl [q] of possible start time slots q on branch l. b) For each time slot q such that A ≤ q ≤D-H do If Pl [q] = true for all tree branches l ∈ ETi then Choose tree Ti : Accept R, Reserve bandwidth B in matrix C on Ti ’s used resources and Return tree Ti , W A for its branches (using the information recorded in Pl [q]) and let start slot S ← q. Exit algorithm; EndIf EndFor EndFor 3) Block R, because no alternate tree Ti has all its branches available (i.e., free bandwidth B for a period of H consecutive slots during window [A, D) ) starting with any start time slot.
(M, F,5 Gbps, 1, 9), with H=4 slots). The first considered start slot is S=1. Since C[l][w][3]=2 (< 5 Gbps), the value of Pl,w [1] = false. Now, our algorithm directly progresses to checking (in C[l][w]) the start slot S=4, as we know that we cannot find valid transmission times starting with slots 2 and 3. For S=4, all its following H slots (i.e., slots 4, 5, 6, 7) have enough free capacity, so we set Pl,w [4]=true, and we advance to slot S=5. Now, we only need to check the value of one time slot: slot S+H-1 (i.e., slot 8) from the total of H slots. This is because, for the previous start slot (slot S-1), Pl,w [S-1]=true; hence we know that the next H-1 slots after S have enough capacity, and we only need to check the H-th slot. A. MDDRP Base and MDDRP Rand Algorithms The MDDRP Rand algorithm is presented in Alg. 1. MDDRP Base is built on a same framework with MDDRP Rand, and is used for benchmarking purposes. In both algorithms, we need to obtain minimum-cost Steiner trees for each multicast request. As the Steiner tree problem is NP-complete, we employ a common Steiner-tree heuristic [17] (also used in [5]), which is based on a form of Prim’s algorithm [16]. We denote it as STH. At each step, STH adds a new destination node with the minimum-cost path to a node in the partiallybuilt tree, until all the destinations are incorporated into the tree. The first node that we add to the partial tree is source s. The difference between Base and Rand is the cost matrix used when generating the Steiner trees in Step 1 of Alg. 1. In Base, we only generate one tree, by using STH on the adjacency matrix [16] of graph G (i.e., which has unit cost for
all of G’s edges)2 , and we next attempt the scheduling on this tree. As a difference, in Rand, we generate multiple “short” (having a small number of branches) trees, by performing STH on a matrix G having the link cost of each edge (from the initial graph G) as 1 + , with = a random number in the range [0, M AX ] (where M AX is a small parameter, e.g., it is 1.0 for our results in Sec. IV). To obtain different alternate trees, when we generate any e.g., n-th alternate tree, we check if it differs from the previous n-1 trees. If not, we attempt at most J=J1 +J2 tree generations to obtain this n-th tree, before we stop the search for alternate trees (in Sec. IV we chose J1 =5, J2 =10). In the first J1 iterations we use the random link costs described above. If a different tree is not found yet, for the next J2 iterations, we try to generate a tree which differs from one of the previous n-1 trees (let this previous tree be Ti ) by at least one branch l ∈ Ti (tree Ti and its branch l are selected randomly). This is achieved by removing branch l from matrix G (in G , l is assigned a very high cost, while the costs of the other edges are generated using the metric above). For simplicity, in the rest of the paper, we will specify that we attempt to generate N alternate trees, even if exactly N different “short” trees may not always be found (e.g., because of the limited topological flexibility in the network used). We occasionally find fewer than the maximum number of trees, however, it is rare for this to cause a request to be blocked. Our randomized tree generation allows us to: (i) construct (at most) N alternate trees; each of these trees’ edges are next explored for simultaneous transmission intervals, (ii) avoid repeatedly choosing same edges for different multicast requests, as it may happen in the case of the deterministic tree generation in Base, and (iii) still find “short” trees, as the value chosen for M AX is small. For each alternate tree Ti , MDDRP Rand first computes (in Step 2.a) the vector of possible start times (Pl ) for each of the trees’ branches l (note that if two alternate trees have common branches, the vector P for those branches is computed only once for the first tree, and reused for next trees). Next, for each possible start time slot q, we check if all of Ti ’s branches are available for the transmission of multicast request R, starting with q. We choose the first such available slot q, as scheduling R early is expected to leave more “free time” for future requests. R is blocked if no simultaneous start time slot is found for all branches, in any explored tree. MDDRP Rand is fast: after generating the tree, the wavelengths on its branches are prepared, and then (in the worst case for all slots q), we only need to check the value of Pl [q]. B. MDDRP AllSlots Approaches We consider two flavors of MDDRP AllSlots. MDDRP AllSlots first computes the possible start slot vector Pl (similar with Step 2.a in Alg. 1); as a difference to Alg. 1, here Pl is computed for all network links l ∈ E. Next, for 2 In our multicast scheduling problem, we require trees with branches available during the same time period, so having trees with a small number of branches/tree-hops both: (i) uses little cumulated bandwidth, and (ii) eases the simultaneous-time constraint. That is why we use the adjacency matrix, and not a matrix which e.g., considers link distance as its metric.
978-1-4244-4148-8/09/$25.00 ©2009 This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2009 proceedings.
each possible time slot q (with A ≤ q ≤D-H), we generate an auxiliary graph Gq which includes all feasible edges of graph G (i.e., links which have Pl [q] =true, so transmission is possible on link l starting with slot q). We attempt to generate a multicast tree T (covering all destinations) on graph Gq . If such a tree T is found, we accept request R. If no tree is found on any slot, R is blocked. A second flavor of the algorithm is MDDRP AllSlotsSmallT ree , in which we choose (from all possible slots q) the first slot which provides a tree T with a “small enough” number of branches. More precisely, in an initialization step, we generate a good approximation tree (by using STH) TB =(VTB , ETB ) on the entire graph G (note that TB always has ≤ number of branches of a tree T obtained on a graph Gq which contains only a subset of E’s edges). Next, we attempt to generate tree T on Gq for each time slot q (as discussed in the above paragraph), and, even if T is found, we only accept R if T has a “small enough” number of branches (|ET | ≤ α · |ETB |, with α a small parameter >1.0). If, after scanning all time slots q, no “small enough” tree is found, we still choose the tree with the smallest number of branches found on all slots, and accept R. AllSlots algorithms are so named since, for all checked slots, we generate a Steiner tree (STH has complexity O(k|V |2 )). C. Splitting into SubTrees with Independent Start Times The main idea of this approach (detailed in Alg. 2) is to allow the splitting of the large multicast tree TL (generated as in Sec. III-A) covering all the destinations (di , with i=1, 2, ..., k1), into multiple subtrees serving subsets of destinations; each of these subtrees can then be scheduled independently in time. Intuitively, this approach helps alleviate the constraint of scheduling all of TL ’s branches simultaneously in time. The tradeoff is that splitting into multiple subtrees may slightly increase the bandwidth used. That is why, in Alg. 2, we always try to split into a minimum number of subtrees (and even keep the multicast tree unsplit whenever possible). For a multicast request R, the global (for all subtrees) auxiliary array X from Alg. 2 indicates which destinations are reached by all subtrees considered so far, while array DestinationsReached[q] (re-computed for each of Step 2’s while iterations) indicates how many destinations can be reached during the time interval starting with time slot q. Alg. 2 (same as in Alg. 1) also tries to schedule R on the first “available” tree out of N alternate randomized trees that initially cover all destinations (in Alg. 2, the index of each such tree is the step variable). No destination is initially reached (Step 1 of Alg. 2). A tree T which covers all destinations not yet satisfied (and source node s) is generated in Step 2.b. Next, in Step 2.e, we search for how many of these unreached destinations we can get to, starting with each start slot q. We choose the start slot q during which we can get to the largest number of destinations, i.e., forming the “largest” possible subtree T (see Step 2.g). Step 2.f specifies that if no destination can be reached starting with any slot, then we exit from the while and try the next alternate tree. If all R’s
Algorithm 2 MDDRP - BREAK into Subtrees with Independent Start Times Input: MDDR R = (M, F, B, A, D). Network state: Residual capacity matrix: C[e][w][q]. Auxiliary variables: X[di ] = true if we reached from s to multicast destination di ; = false, otherwise. (i = 1, 2, ..., k-1) T empC= temporary C, updated after each subtree is scheduled. DestinationsReached[q] = the number of destinations that can be reached (from the source) simultaneously (during time interval [q, q+H-1]) on a given tree T . The union of the paths from s to these destinations form a subtree T of T . Output: One/multiple subtrees, their W A and scheduling time slots. For step = 1, 2, ..., N do 1) Initialize X[di ] ← false, ∀ i = 1, 2, ..., k-1. 2) While (true) a) Insert source s and all destinations di unreached so far (with X[di ] = false) into multicast set M . Let |M |=f . b) Obtain a tree T which connects the nodes from set M . The first large multicast tree TL in the While is generated as in Sec. III-A (and its generated cost matrix GTL is stored). All the subsequent subtrees T are obtained by employing STH (to connect multicast set M ) on GTL . c) On tree T , obtain the f -1 (shortest) paths from source s to all destinations in set M . Let these paths be SP1 , SP2 , ..., SPf −1 . d) For each branch l of T , compute the vector of “possible” start time slots Pl,w . e) For each time slot q such that A ≤ q ≤D-H do For each of the f −1 unreached destinations in T (let them be dj , j = 1, ..., f -1), we check which of them can be reached (i.e., if we can schedule R’s transfer on all links on the end-to-end paths SPdj starting with slot q). Update DestinationsReached[q] accordingly. f) If (Maxq DestinationsReached[q] = 0) Break from While g) Choose time slot q with maximum DestinationsReached[q] as the start time for the subtree T formed by the reached destinations (and the source s). Update X[di ] ← true for all destinations di that can be reached starting with time slot q. Reserve bandwidth B for subtree T in TempC. h) If X[di ] =true for all i = 1, 2, ..., k-1, then Accept R, Update C from TempC, and Return. Else Continue in While (with the next subtree). End While End For Block R and Return.
multicast destinations (from X) can be reached, we provision request R, else we continue generating another random subtree containing the remaining destinations (see Step 2.h). D. Effect of Partitioning Transmitted File on M DDRP We study the effect of partitioning the to-be-distributed file into pieces (approach called P ART ). Intuitively, pieces with shorter holding time are easier to schedule than the whole file; in addition, different pieces can be scheduled starting with independent times. We partition file F into Z equal-sized (as much as possible) sub-files. As in Sec. III-A, we consider N alternate randomized trees Ti , and try to multicast all pieces
978-1-4244-4148-8/09/$25.00 ©2009 This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2009 proceedings.
3
4 5
7
8
10
9
11 6
13
12
14 16
15
0.2 0.15 0.1 0.05 0 40
17 18
Fraction of Different Unprovisioned MDDRs
2
Fraction of Unprovisioned Bandwidth
1
MDDRP_Base MDDRP_RandN=1 MDDRP_RandN=3 MDDRP_RandN=5
0.25
50
19
60 70 80 Arrival Rate (requests/min)
90
Fig. 2. A 19-node backbone US Fig. 3. Unprovisioned bandwidth and impact of N for network topology [8]. Base and Rand MDDR provisioning approaches.
independently on a given tree Ti . If we can schedule all of F ’s pieces on tree Ti , we accept R. E. Alternative of Provisioning R using Unicast Transfers For traditional on-demand (with no time flexibility) “pointto-multipoint” requests, multicast is more efficient than individual unicasting from source to each destination, as multicast uses less bandwidth. For flexible-time multicast, additional aspects need to be considered: with unicast, the transfer to each destination is scheduled independently, while with multicast all the tree’s branches need to be simultaneously available. Our U nicast algorithm provisions each transfer from source s to each destination di ∈ M individually, by first generating the K-Shortest Paths (P athj , j = 1, ..., K) between s and di on G, and then attempting the unicast scheduling on P athj , by checking the availability of all of P athj ’s links, starting with possible slots (on the same lines as in Step 2.b of Alg. 1). First feasible start slot on first feasible path is chosen. IV. P ERFORMANCE A NALYSIS We study the performance of our algorithms on the 19node topology in Fig. 2. Links have 16 wavelengths, each of capacity 10 Gbps. The number of nodes in MDDR R’s multicast group M (i.e., k) is uniformly distributed between 2-10. Nodes in M are chosen uniformly (and are distinct). R’s transmission rate B follows the typical distribution 1 Gbps : 2.5 Gbps : 5 Gbps : 10 Gbps = 35 : 30 : 25 : 10 where lower rates apply to more requests. File sizes F are uniformly distributed from 30-70 GB (in increments of 10 GB) for smaller transmission rates of 1 and 2.5 Gbps, and from 60-200 GB (also in increments of 10 GB) for larger transmission rates of 5 and 10 Gbps. MDDR arrivals are independent and rounded to multiples of time slots (time-slot duration is chosen as 8 sec). The ratio between R’s holding time H and the deadline period (D-A) is chosen randomly from the set {0.2, 0.3, ..., 0.9}. Simulation results are averaged over ten runs with different random seeds, each of 50,000 MDDRs (equivalent to 250,000 point-to-point transfers); the first 5,000 MDDRs are not accounted in our performance metric, as the requests needed for the network to reach a stable state. Figure 3 shows the fraction of unprovisioned (or blocked) bandwidth (“bandwidth” of request R is computed as R’s transfer rate multiplied by its holding time: HR · BR , which is
100
1 Gbps 2.5 Gbps 5 Gbps 10 Gbps
0.5 0.4 0.3 0.2 0.1 0 40
50
60 70 80 Arrival Rate (requests/min)
90
100
Fig. 4. Fraction of Requests Unprovisioned by MDDRP RandN =3 for different bandwidth granularities.
equivalent to file size FR ) for Base and Rand. We observe that, even considering one randomized tree, RandN =1 provisions more bandwidth than Base. This is because the deterministic Base, by repeatedly selecting the same network links for the minimum-cost trees of different M DDRs, may slightly congest certain links, while leaving other links less-utilized. In contrast, the randomized approach utilizes network links more uniformly, and thus achieves better performance. Considering more alternate trees can improve network performance, but for larger values of N , the additional performance gain is smaller. Figure 4 shows the fraction of unprovisioned multicast requests of different bandwidth granularities. For an average load situation (i.e., arrival rate of 70), 0.15% of the 1 Gbps M DDRs are blocked. About 3.1%, 5.9%, and 26.7% of the 2.5, 5 Gbps, and 10 Gbps multicast demands, respectively, are also blocked. So, larger granularity multicast requests are significantly harder to provision than smaller granularity ones. Figure 5 shows that, for our problem, performance improvement is possible by choosing MDDRP AllSlots approaches over MDDRP Rand. However, in AllSlots approaches, in addition to checking the value of Pl for each investigated start slot (same as in MDDRP Rand - Step 2.b), we generate a tree for each inspected start slot, so AllSlots may sometimes be more time consuming than Rand (especially for AllSlotsSmallT ree , and for requests with a large time window [A, D), with more slots to be investigated). Figure 5 also shows that the approach that chooses “small trees” (i.e., with fewer branches) performs slightly better than the typical MDDRP AllSlots at high arrival rates. In Fig. 5, we also observe that our multicast approaches significantly outperform Unicast. Especially for the MDDRs with many destination nodes, the unnecessary bandwidth replication of Unicast (compared to multicast) is more important than the time independence offered by Unicast (while multicast approaches have a time dependency of their tree branches). Figure 6 shows that MDDR Break (see Alg. 2) is able to improve over its corresponding MDDR Rand, by “breaking” the trees into multiple sub-trees with independent start times. The improvement achieved over Rand is, however, smaller than that achieved by AllSlots approaches. BreakBase also provisions more bandwidth than Base (even though not explicitly shown, BreakBase works similarly with BreakRand , with
978-1-4244-4148-8/09/$25.00 ©2009 This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2009 proceedings.
MDDRP_Rand 0.4 MDDRP_AllSlotsSmallTree MDDRP_AllSlots 0.35 UNICAST 0.3
Fraction of Unprovisioned Bandwidth
Fraction of Unprovisioned Bandwidth
0.45
0.25 0.2 0.15 0.1 0.05
MDDRP_Base 0.2
MDDRP_RandN=1 0.15
N=1
0.05
0 40
50
60 70 80 Arrival Rate (requests/min)
90
100
Fig. 5. Performance of MDDRP AllSlots and Unicast vs. Multicast algorithms.
the differences that (i) only one “deterministic” initial tree is generated, i.e., there is one “step” in Alg. 2, and (ii) the newly-generated sub-trees are also computed on the adjacency matrix, as in Base). To further improve the performance of our algorithms, we consider partitioning the dataset to-be-transferred into multiple parts, which are next time-independently scheduled. Figure 7 shows that, for low and average loads, some performance increase over Rand is possible by partitioning into p parts (e.g., p=2 or 4 in Fig. 7). This is because, for lower and intermediate loads, there is less contention for bandwidth in time, and it is easier to find contiguous bandwidth for multiple shorter holding-time parts (e.g., about H/p slots per part) than for the entire file (where H consecutive slots with enough free capacity need to be found). In contrast, for high loads, the performance of PART tends to become similar to that of MDDRP RandN =3 . This is because, for high loads, unused bandwidth is scarce; in addition, by splitting the dataset into multiple parts and provisioning them individually, bandwidth tends to become fragmented in time and subsequent multicast requests cannot find their contiguously needed time-slots. Fraction of Unprovisioned Bandwidth
MDDRP_BREAKRand
0.1
0
0.18
MDDRP_Rand
0.16
MDDRP_PART2p
0.14
MDDRP_PART4p
0.12 0.1 0.08 0.06 0.04 0.02 0 40
Fig. 7.
MDDRP_BREAKBase
50
60 70 80 Arrival Rate (requests/min)
90
Impact of file partitioning on our MDDR provisioning approaches.
V. C ONCLUSION We studied the problem of multicasting under a flexible traffic model in the context of high-performance applications over WDM networks. We considered the practical scenario with MDDRs of different bandwidth granularities. Our first multicast provisioning approach (named MDDRP Rand) considers a set of alternate multicast trees for each MDDR R
40
Fig. 6.
50
60 70 Arrival Rate (requests/min)
80
90
Performance of MDDR Break provisioning approach.
(on which R can be subsequently scheduled), and was found to improve over an approach called Base. Another method (named AllSlots), which dynamically constructs trees at each available start time, was found to outperform MDDRP Rand. Other approaches (one which “breaks” the trees into a small number of sub-trees with independent start times, and another which partitions the to-be-transferred dataset) can also slightly improve over Rand. R EFERENCES [1] Open Grid Forum, “Grid Network Services Use Cases from the eScience Community,” Editor: T. Ferrari, 2007. [2] H. Miyagi et al., “Advanced Wavelength Reservation Method Based on Deadline-Aware Scheduling for Lambda Grid Networks,” Journal of Lightwave Technology, vol. 25, no. 10, pp. 2904-2910, Oct. 2007. [3] D. Andrei et al., “On-Demand Provisioning of Data-Aggregation Requests over WDM Mesh Networks,” Proc., IEEE Globecom, Nov. 2008. [4] L. Sahasrabuddhe and B. Mukherjee, “Light-trees: Optical multicasting for improved performance in wavelength-routed networks,” IEEE Commun. Mag., vol. 37, no. 2, pp. 67-73, Feb. 1999. [5] N. Singhal, L. Sahasrabuddhe, and B. Mukherjee, “Optimal Multicasting of Multiple Light-Trees of Different Bandwidth Granularities in a WDM Mesh Network with Sparse Splitting Capabilities,” IEEE/ACM Transactions on Networking, vol. 14, no. 5, pp. 1104-1117, Oct. 2006. [6] Y. Fan, B. Wang, and X. Luo, “Multicast Service Provisioning under a Scheduled Traffic Model in WDM Optical Networks,” Proc., International Conf. on Communications and Computer Networks, Oct. 2005. [7] B. Mukherjee, Optical WDM Networks, Springer, Feb. 2006. [8] M. Tornatore et al., “Dynamic Traffic Grooming with Holding-Time Knowledge,” IEEE J. on Sel. Areas in Comm., vol. 26, no. 3, Apr. 2008. [9] M. Barcellos et al., “High-Performance Reliable Multicasting for Grid Applications,” Proc., IEEE Intl. Workshop on Grid Comp. (GRID), 2004. [10] A. Jaekel and Y. Chen, “Resource Provisioning for Survivable WDM Networks under a Sliding Scheduled Traffic Model,” Optical Switching and Networking, vol. 6, no. 1, pp. 44-54, Jan. 2009. [11] L. Shen, X. Yang, A. Todimala, and B. Ramamurthy, “A Two-phase Approach for Dynamic Lightpath Scheduling in WDM Optical Networks,” Proc., IEEE International Conf. on Comm., pp. 2412-2417, June 2007. [12] S. Tanwir et al., “Dynamic Scheduling of Network Resources with Advanced Reservations in Optical Grids,” International Journal of Network Management, vol. 18, no. 2, pp. 79-105, Jan. 2008. [13] C. V. Saradhi and M. Gurusamy, “Scheduling and Routing of Sliding Scheduled Lightpath Demands in WDM Optical Networks,” Proc., IEEE Optical Fiber Communications Conference, Mar. 2007. [14] A. Banerjee et al., “Algorithms for Integrated Routing and Scheduling for Aggregating Data from Distributed Resources on a Lambda Grid,” IEEE Trans. on Parallel and Distrib. Systems, vol. 19, no. 1, Jan. 2008. [15] F. Arshad et al., “Advance Reservation and Dynamic Scheduling of Point to Multipoint Lightpaths,” Proc., IEEE High Capacity Optical Netw. Conference, Nov. 2008. [16] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms, Second Edition, MIT Press, 2001. [17] H. Takahashi and A. Matsuyama, “An approximate solution for the Steiner problem in graphs,” Math. Japonica, pp. 573-577, 1980.
978-1-4244-4148-8/09/$25.00 ©2009 This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2009 proceedings.