Towards the optimal design of an enterprise access ...

TOWARDS THE OPTIMAL DESIGN OF AN ENTERPRISE ACCESS NETWORK: A GENETIC ALGORITHM PERSPECTIVE

Soumya Sanyal, Preetam Ghosh, Kalyan Basu & Sajal K. Das
CReWMaN (Center for Research in Wireless Mobility and Networking)
Dept. of Computer Science & Engineering
The University of Texas at Arlington
Arlington, TX 76019-0015
E-mail: {sanyal,ghosh,basu,das}@cse.uta.edu

Keywords: Access Network Design, Single Sink Buy-at-Bulk Problem, Genetic Algorithms

Abstract

We seek a solution to the Enterprise Access Network design problem, often referred to as the single sink buy-at-bulk (SSBB) problem, using a Genetic Algorithm (GA). The problem is well known in network design and is NP-hard, and past work has concentrated on formulating approximation algorithms for it. By approaching the problem with a GA, we seek lower-cost solutions than those produced by traditional algorithms.

1 Introduction

There has been a wealth of literature dealing with topology-driven telecommunication network design problems [1, 3, 4, 7]. In this paper, we focus on access network design for an Enterprise, in particular the single sink buy-at-bulk (SSBB) network design problem [7], also known as the single sink edge installation problem. With the convergence of technologies such as IP telephony, videoconferencing and high-bandwidth file transfer onto a single network, the design of an access network for an Enterprise involves optimizing the cost of the commercial bandwidth used. The problem involves routing traffic from a set of local offices into a single branch office which is connected to the core. Bandwidth usage is charged according to a tariff set by the commercial bandwidth provider(s). The central idea is to reduce the cost of the bandwidth used by the access network by routing traffic through other local offices rather than connecting every office directly to the branch, since multiplexing through other nodes, i.e. aggregating traffic, provides cost-effective solutions [1]. Each local office typically has a bandwidth requirement that depends on the traffic it generates (provided through source modelling), which in turn depends on the services offered in the network and the number of people at each location.

2 Problem Definition and Our Approach

We present in this section a brief and simplified description of the Single Sink Buy-at-Bulk (SSBB) problem. For a more formal treatment, please refer to the Appendix. Our problem domain, as mentioned earlier, is Enterprise Access Networks. We assume the existence (in such an access network) of a branch office (called the sink), which is connected to the core, and several local offices whose traffic has to be routed to (and through) this branch office into the network. The traffic might be routed through other local offices to reduce the commercial cost of carrying the bandwidth on its way to the sink. The links of this network have to be installed with one or more copies of commercial bandwidth, chosen from k classes such as fractional DS-1, DS-1, DS-3, OC-12, OC-48 etc. Because of the nature of commercial pricing strategies, it might be cheaper to use a larger slab of bandwidth along some links than multiple copies of a smaller slab. If we view the access network in graph-theoretic terms, its topology is a cost-optimal spanning tree of the network rooted at the sink. The problem is NP-hard, so obtaining an optimal solution is not feasible in a reasonable amount of time. Earlier work has therefore focused on approximate solutions, i.e. solutions computed in feasible time whose cost is guaranteed to be within a given factor of the optimal cost. What we employ in this paper is an evolutionary heuristic that attempts to solve the problem and hence obtain the optimal-cost spanning tree in reasonable time. Because of the randomized nature of such heuristics (as will become clearer later), we cannot guarantee that the optimum will always be found. However, as evidenced in subsequent sections, we did obtain the optimal values for small problem instances. These results are fairly encouraging and suggest that the approach could be employed instead of traditional algorithms.

    BeginGA()
    /* Π_γ: Population of generation γ.
       Γ: Maximum number of generations.
       γ: Current generation number.
       ε: Encoding function.
       φ: Fitness function.
       ψ: Selection method.
       χ: Crossover method.
       µ: Mutation method.
       λ: Individual number. */
    begin
        γ = 0;
        /* Initialize the population */
        Initialize(Π_γ)
        begin
            /* Encode each chromosome */
            ∀ λ encode using ε(λ);
            /* Evaluate the fitness of the initial population */
            Evaluate φ(λ), ∀ λ;
        end
        γ = γ + 1;
        while (γ < Γ and stopping criterion not met)
        begin
            /* Selection of the parents from the current generation */
            ψ(Π_γ);
            /* Perform crossover to produce the offspring based on the crossover probability */
            Π'_{γ+1} ← χ(Π_γ);
            /* Mutate the genes of the children based on the mutation probability */
            Π_{γ+1} ← µ(Π'_{γ+1});
            /* Evaluate the fitness of the next generation */
            Evaluate φ(λ), ∀ λ;
            /* Increment the generation number */
            γ = γ + 1;
        end
        Return the fittest solution found so far;
    end

Figure 1: Outline of Simple Genetic Algorithm
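As an illustration of the outline in Fig. 1, the loop can be sketched in Python. This is a minimal sketch and not the authors' implementation: the function names (`run_ga`, `onemax_demo`), the fitness-proportional selection, and the toy bit-string problem are all assumptions made here for demonstration. The explicit encoding step and the convergence-based stopping criterion are omitted, and only the probabilities 0.95 and 0.05 are taken from the text.

```python
import random


def run_ga(init_population, fitness, crossover, mutate,
           max_generations=100, p_cross=0.95, p_mut=0.05, rng=None):
    """Generic GA loop: select, cross over, mutate, evaluate, repeat."""
    rng = rng or random.Random()
    population = init_population(rng)
    best = max(population, key=fitness)
    for _ in range(max_generations):
        # Fitness-proportional selection; assumes non-negative fitness values.
        weights = [fitness(ind) + 1e-9 for ind in population]
        offspring = []
        while len(offspring) < len(population):
            mum, dad = rng.choices(population, weights=weights, k=2)
            if rng.random() < p_cross:
                kids = crossover(mum, dad, rng)
            else:
                kids = (mum, dad)
            for kid in kids:
                if rng.random() < p_mut:
                    kid = mutate(kid, rng)
                offspring.append(kid)
        population = offspring[:len(population)]
        # Keep track of the fittest solution found so far.
        best = max(population + [best], key=fitness)
    return best


def onemax_demo(seed=1, n_bits=12, pop_size=20):
    """Toy problem (not from the paper): maximise the number of 1 bits."""
    def init(rng):
        return [[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)]

    def one_point(a, b, rng):
        cut = rng.randrange(1, n_bits)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]

    def flip(ind, rng):
        i = rng.randrange(n_bits)
        return ind[:i] + [1 - ind[i]] + ind[i + 1:]

    return run_ga(init, sum, one_point, flip, rng=random.Random(seed))
```

Calling `onemax_demo()` returns the fittest bit string seen during the run. For the access network problem, the four callables would be replaced by problem-specific population generation, cost-based fitness, crossover and mutation.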

3 A Genetic Algorithm Approach

Genetic Algorithms (GAs) [2] are a special class of Evolutionary Algorithms and have long been applied to many optimization problems. They provide robust optimal or sub-optimal solutions. While their applicability has crossed many boundaries, including the design of telecommunication networks [8], they have not previously been used to solve the SSBB problem. In Fig. 1, we give a skeleton of a typical GA.

Any GA starts from a set of initial solutions called an initial population (Π0) of individuals. Each individual consists of one or more chromosomes, and each chromosome represents a possible solution to the problem. A chromosome in turn consists of genes, which can take values called alleles. In brief, a population of individuals is generated as the first generation of the GA, and the chromosomes in this initial population form an initial solution space. The real power of a GA is harnessed through its two fundamental operators, crossover (χ) and mutation (µ). With each chromosome we associate a fitness value (φ) that determines how good a solution the chromosome actually gives. The fitness usually depends on the way the chromosome is chosen to represent the solution; this representation is user defined and is referred to as the encoding technique.

The next generation of individuals is produced by selecting (ψ) a group of fit mates. These mates pair up to form parents who (usually) produce a pair of children for the next generation. Each of these children contains genes from both parents, which is achieved by the crossover operation, in the hope that the children will be fitter solutions. Often during reproduction, one or more genes belonging to the children are mutated to produce a random solution in the solution space. This is the method that the GA employs to avoid converging to a local minimum. If the crossover operation is chosen properly, subsequent generations tend to have better fitness than the current generation. This usually leads to a convergence in the average fitness of the population, which can act as a good stopping criterion for the GA. We also implemented a technique called elitism, which brings about faster convergence of the GA and hence finds a solution sooner. This is done by keeping the fittest individual of the current population for the next generation.

We outline the details of the problem at hand and the details of the GA below. Fig. 2 shows the chromosome encoding used. The index 0 represents the branch office, and the other indices represent the local offices. X represents no connection. Specific to our GA, we did the following:

3.1 Initial Population Generation

For every individual (which is also a chromosome in our case) we randomly linked the branch office (index 0) to one of the local offices to ensure connectivity to the branch office. The corresponding local office is then marked with X, indicating that it is already connected to the branch office. The other local offices are then randomly paired. We ran a modified version of Prim's algorithm [6] to construct each chromosome.

    Index:  [0]  [1]  [2]  [3]  [4]  [5]
    Value:   3    5    3    x    3    0

Figure 2: Chromosome encoding

3.2 Crossover

After a probabilistic (fitness-dependent) selection of two individuals (parents) from the current generation, the crossover operation is applied to the parents to obtain two offspring (children). The idea is to produce children that are as fit as or fitter than their parents. In mathematical terms, the operation performs a local search in the solution space. Fig. 3 shows a typical crossover operation. We implemented the crossover by exchanging the nodes to which the branch office was connected. Some minor checks were also required to keep the tree connected. The crossover probability was kept at 0.95.

                [0]  [1]  [2]  [3]  [4]  [5]
    Parent 1:    3    5    3    x    3    0
    Parent 2:    4    2    4    0    x    4
    Child 1:     4    5    3    0    x    0
    Child 2:     3    2    4    x    3    4

Figure 3: The Crossover Operation

3.3 Mutation

The mutation operation is fairly simple to implement. We randomly chose a local office node, removed its current connection and replaced it with a random new one. Any local office is eligible except the one connected to the branch office. Fig. 4 shows the mutation operation in effect. The mutation probability was kept at 0.05. Fig. 5 shows the convergence of the GA for a varying number of local office nodes. We tested our approach against a brute force method for small problem instances, and in each case the GA found the optimal solution. Since the problem is NP-hard, obtaining a brute force baseline for larger problem instances was not feasible.
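The encoding and the three operators described in Sections 3.1 to 3.3 can be sketched together in Python. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the random tree growth stands in for the modified Prim's algorithm (whose details the text does not give), and the repair rule in `with_head` (the previous head is linked straight to the branch office) is only one possible choice for the "minor checks" that keep the tree connected.

```python
import random

X = None  # the "x" entry: this office is attached to the branch via index 0


def is_spanning_tree(chrom):
    """True if the edges (i, chrom[i]) connect all nodes without a cycle."""
    n = len(chrom)
    edges = [(i, j) for i, j in enumerate(chrom) if j is not X]
    if len(edges) != n - 1:
        return False
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n


def random_chromosome(n, rng):
    """Attach index 0 to a random office, then grow the tree office by office."""
    head = rng.randrange(1, n)
    chrom = [X] * n
    chrom[0] = head
    in_tree = [0, head]
    rest = [i for i in range(1, n) if i != head]
    rng.shuffle(rest)
    for node in rest:
        chrom[node] = rng.choice(in_tree)  # next hop is already connected
        in_tree.append(node)
    return chrom


def with_head(chrom, new_head):
    """Copy chrom so that the branch office attaches to new_head instead."""
    child = list(chrom)
    old_head = child[0]
    if new_head != old_head:
        child[0] = new_head
        child[new_head] = X
        child[old_head] = 0  # assumed repair: old head links to the branch
    return child


def crossover(p1, p2):
    """Exchange the offices to which the branch office is connected."""
    c1, c2 = with_head(p1, p2[0]), with_head(p2, p1[0])
    # Fall back to a copy of the parent if the repair did not yield a tree.
    return (c1 if is_spanning_tree(c1) else list(p1),
            c2 if is_spanning_tree(c2) else list(p2))


def mutate(chrom, rng, max_tries=20):
    """Rewire one office (never the X-marked one) to a random new next hop."""
    n = len(chrom)
    movable = [i for i in range(1, n) if chrom[i] is not X]
    for _ in range(max_tries if movable else 0):
        child = list(chrom)
        node = rng.choice(movable)
        child[node] = rng.choice([j for j in range(n) if j != node])
        if is_spanning_tree(child):
            return child
    return list(chrom)
```

For example, `crossover([3, 5, 3, X, 3, 0], [4, 2, 4, 0, X, 4])` returns `[4, 5, 3, 0, X, 0]` and `[3, 2, 4, X, 0, 4]`. Rejecting mutations that break the tree is a design choice of this sketch; a repair step would be an equally valid alternative.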

Table 1: Distance in miles between the local office locations, with Dallas as the branch office

    Cities          Ft. Worth  Austin  San Antonio  Waco  Houston  Laredo  Corpus Christi  Abilene  Amarillo  Lubbock
    Dallas                 34     196          274    95      239     434             412      181       377      351
    Ft. Worth               0     188          266    87      269     426             404      151       346      322
    Austin                  -       0           79   101      162     239             217      253       537      414
    San Antonio             -       -            0   179      196     157             143      245       514      406
    Waco                    -       -            -     0      185     340             317      233       437      403
    Houston                 -       -            -     -        0     353             221      415       616      585
    Laredo                  -       -            -     -        -       0             145      400       669      560
    Corpus Christi          -       -            -     -        -       -               0      410       655      855
    Abilene                 -       -            -     -        -       -               -        0       269      163
    Amarillo                -       -            -     -        -       -               -        -         0      123
    Lubbock                 -       -            -     -        -       -               -        -         -        0

Table 2: Source traffic requirements of the various local offices in Mbps

    Local office      Traffic (Mbps)
    Ft. Worth         3.69
    Austin            1.21
    San Antonio       3.08
    Waco              1.95
    Houston           0.35
    Laredo            0.67
    Corpus Christi    2.83
    Abilene           0.57
    Amarillo          0.32
    Lubbock           0.14

                       [0]  [1]  [2]  [3]  [4]  [5]
    Before Mutation:    3    5    3    x    3    0
    After Mutation:     3    2    3    x    3    0

Figure 4: The Mutation Operation

4 Experimental Study

We modelled traffic for a set (and subsets) of 11 cities in the state of Texas, including the branch office at Dallas. Table 1 shows the distances used. In our initial experiments, the traffic generated by an individual was calculated by first modelling the various services used by a person (using an on-off model), e.g. HTTP, e-mail, FTP, IP telephony or backup traffic, and then multiplying this by the number of people at each local office location. We later decided to randomize the traffic generated at each source in order to test the robustness of our algorithm. The traffic requirements of the various local offices (randomized version) are shown in Table 2. The cost structure for the different classes of bandwidth, shown in Table 3, was also generated in an ad hoc fashion. We simply ensured that the costs were in non-decreasing order and followed an economy of scale. One can easily substitute the actual cost of DS-1, DS-3 etc. in place of these costs. Figs. 6 and 7 show the optimal and sub-optimal (for 9 and 10 cities) network configurations for 5 to 10 local offices.

Table 3: Cost per unit length and capacity in Mbps of the different bandwidth classes

    Bandwidth Class              1    2    3    4    5    6    7    8    9   10
    Bandwidth Capacity (Mbps)    5   10   15   20   25   30   35   40   45   50
    Cost ($/Mile)               20   30   40   50   60   70   80   90  100  110

It is important to appreciate the complexity of the problem at hand. The total number of spanning trees in a complete graph of order $n$ is $n^{n-2}$. Hence, for an 11-node graph (10 local offices + 1 branch office), the number of trees to search would be $11^9 = 2{,}357{,}947{,}691$. This is exactly what we tried to implement using a naive brute force algorithm. While for a small number of nodes we did get an answer within a few minutes (sometimes hours), for 9 or more nodes the method was largely impractical. The GA, however, gave an answer within a few seconds. For a population size of 30 and 1000 generations, even if we generated a unique random solution each time, the GA would examine only 30,000 different candidates. Comparing that with the $9^7 = 4{,}782{,}969$ possible trees for 8 local offices (9 nodes including the branch office), for which we obtained the optimal value, we see the power of GAs.

It is also important to note that the initial population generated (and hence the random seed) and the amount of traffic play a major role. So do the crossover and mutation probabilities. In Fig. 8 we see that just by changing the random seed, we get a better solution for the same number of generations and population size. In Fig. 9 we changed the initial traffic demands and found a new solution structure for the case of 10 local offices. Compare this structure with the third solution shown in Fig. 7. Hence, for a higher number of nodes, some experimentation with the parameters is needed.

Figure 5: Normalized Fitness ($1000k/Cost) vs. Generation Number (curves for 5 to 10 nodes)

Figure 6: The access network configuration for 5, 6 and 7 local offices

Figure 8: Dependence of the GA on the random seed (Normalized Fitness Values vs. Generation Number, for Seed = 0.9 and Seed = 0.5)

Figure 9: Dependence of the solution on the Traffic Demands (10 local offices)
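The search-space sizes quoted in this section follow from Cayley's formula, and a naive brute force search can be organised by enumerating Prüfer sequences, each of which decodes to exactly one labelled spanning tree. The sketch below is illustrative only: the text does not say how the authors' brute force method enumerated trees, and the function names are hypothetical.

```python
import heapq
from itertools import product


def count_spanning_trees(n):
    """Cayley's formula: number of labelled spanning trees on n >= 2 nodes."""
    return n ** (n - 2)


def prufer_to_edges(seq):
    """Decode a Pruefer sequence over nodes 0..n-1 into the n-1 tree edges."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)  # smallest remaining leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def all_spanning_trees(n):
    """Yield the edge list of every labelled spanning tree on n nodes."""
    for seq in product(range(n), repeat=n - 2):
        yield prufer_to_edges(seq)


if __name__ == "__main__":
    print(count_spanning_trees(11))  # trees for 10 local offices + branch
    print(count_spanning_trees(9))   # trees for 8 local offices + branch
    print(30 * 1000 / count_spanning_trees(9))  # share a 30 x 1000 GA run could visit
```

Evaluating the cost of each yielded tree and keeping the minimum gives the brute force optimum for small n, while the last printed ratio shows that a run of 30 individuals over 1000 generations can visit well under one percent of the trees on 9 nodes.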

Figure 7: The access network configuration for 8, 9 and 10 local offices
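The cost of a configuration such as those in Figs. 6 and 7 can be evaluated from the distances in Table 1, the traffic in Table 2 and the bandwidth classes in Table 3. The following is a minimal sketch under stated assumptions, not the authors' fitness function: it assumes each link may carry any mix of copies of the bandwidth classes, that the chromosome's next hops are oriented towards the branch office (node 0), and the names `cheapest_cost_per_mile` and `tree_cost` are hypothetical.

```python
import math
from functools import reduce

# Table 3: (capacity in Mbps, cost in $ per mile) for bandwidth classes 1 to 10.
CLASSES = [(5 * k, 10 * (k + 1)) for k in range(1, 11)]


def cheapest_cost_per_mile(flow, classes=CLASSES):
    """Least $/mile for a mix of link copies whose total capacity covers flow."""
    if flow <= 0:
        return 0
    unit = reduce(math.gcd, [cap for cap, _ in classes])
    units = math.ceil(flow / unit)
    best = [0] + [math.inf] * units
    for u in range(1, units + 1):
        for cap, cost in classes:
            best[u] = min(best[u], best[max(0, u - cap // unit)] + cost)
    return best[units]


def tree_cost(chrom, demand, dist):
    """Total cost of the tree encoded by chrom (next hops lead towards node 0)."""
    n = len(chrom)
    parent = {i: chrom[i] for i in range(1, n) if chrom[i] is not None}
    parent[chrom[0]] = 0  # the x-marked office hangs directly off the branch
    flow = {i: 0.0 for i in range(1, n)}
    for src in range(1, n):
        node, hops = src, 0
        while node != 0:
            flow[node] += demand[src]  # src's traffic crosses node's uplink
            node = parent[node]
            hops += 1
            if hops > n:
                raise ValueError("next hops do not lead to the branch office")
    return sum(cheapest_cost_per_mile(flow[i]) * dist[i][parent[i]]
               for i in range(1, n))


# Dallas (0), Ft. Worth (1), Waco (2): distances from Table 1, traffic from Table 2.
dist = [[0, 34, 95], [34, 0, 87], [95, 87, 0]]
demand = [0.0, 3.69, 1.95]
star = [1, None, 0]   # both offices connect straight to Dallas
chain = [1, None, 1]  # Waco routes through Ft. Worth
```

Under these assumptions `tree_cost(star, demand, dist)` evaluates to 2580 and `tree_cost(chain, demand, dist)` to 2760, so for this three-node instance aggregating the Waco traffic at Ft. Worth does not pay off. A fitness value to maximise could then be taken as a constant divided by this cost.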

5 Conclusion

In this paper we approached the well-documented SSBB problem from a new perspective. We departed from the conventional approximation algorithms and tried an evolutionary approach. For a small number of nodes, we showed that the GA is very successful in obtaining the optimal values. Conventional approximation algorithms only guarantee a solution whose cost is within a constant factor of the optimal value. For a larger number of nodes, we showed that experimenting with the GA parameters can produce noticeably better results. While we could not guarantee optimality in such cases, we are fairly confident that the GA produces near-optimal solutions there too. The relevance of this paper is to the field of Access Network Design, and we have obtained some very promising results for optimizing the cost of designing such a network, especially for an Enterprise. This GA approach could prove to be a very useful tool for network planners and designers who build custom solutions for their clients. Finally, it would be a useful exercise to build other crossover, mutation and encoding functions and compare the results with our GA.

References

[1] Matthew Andrews and Lisa Zhang. "Approximation algorithms for access network design." Algorithmica, 34(2):197–215, 2002. (Preliminary version in 39th FOCS, 1998.)
[2] D. Goldberg. "Genetic Algorithms in Search, Optimization and Machine Learning." New York: Addison-Wesley, 1989.
[3] Sudipto Guha, Adam Meyerson, and Kamesh Munagala. "A constant factor approximation for the single sink edge installation problems." In Proceedings of the 33rd Annual ACM Symposium on the Theory of Computing (STOC), pages 383–388, 2001.
[4] Anupam Gupta, Amit Kumar, and Tim Roughgarden. "Simpler and Better Approximation Algorithms for Network Design." In Proceedings of the 35th Annual ACM Symposium on the Theory of Computing (STOC), 2003.
[5] Adam Meyerson, Kamesh Munagala, and Serge Plotkin. "Cost-distance: Two metric network design." In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, pages 624–630, 2000.
[6] R. C. Prim. "Shortest connection networks and some generalizations." Bell System Techn. J., 36:1389–1401, 1957.
[7] F. Sibel Salman, Joseph Cheriyan, R. Ravi, and Sairam Subramanian. "Approximating the single-sink link-installation problem in network design." SIAM Journal on Optimization, 11(3):595–610, 2000.
[8] Mark C. Sinclair. "Evolutionary Telecommunications: A Summary." In Proceedings of the Bird-of-a-feather Workshop, GECCO '99, 1999.

Appendix: The SSBB problem

In this section we provide a detailed graph-theoretic formulation of the problem. Given a graph $G = (V, E)$ where $|V| = n$, a subset $S \subset V$ and a sink $t \in V$, $t \notin S$, we have to find a single path from each node $s_i \in S$ to $t$, i.e. route the demand $dem_i$ of each node $s_i$ in the subset to the sink $t$. For every pair of nodes $(v_i, v_j)$ in $G$, the shortest distance between the nodes is taken as the length ℓ : E →
