A Multi-objective Network Design for Real Traffic Models of the Internet by Means of a Parallel Framework for Solving NP-hard Problems

José M. Lanza-Gutiérrez, Juan A. Gómez-Pulido, Miguel A. Vega-Rodríguez, Juan M. Sánchez
Dep. of Technologies of Computers and Communications
University of Extremadura, Polytechnic School
Campus Universitario s/n, 10003 Cáceres (Spain)
{jmlanza, jangomez, mavega, sanperez}@unex.es

Abstract—The design and optimization of a communication network is a complex task because it involves many factors, the two most important being the network cost and the communication delay. We have tackled this NP-hard multi-objective optimization problem using two metaheuristics not previously applied to it, NSGA-II and SPEA-II, and we show that SPEA-II offers better results than NSGA-II. To simplify the programming tasks, we have used the PISA framework. This paper also proposes a novel parallelization of PISA based on MPI that distributes the workload among the nodes of a computer cluster, thereby increasing productivity.

Keywords: communication networks; multi-objective optimization; metaheuristics; PISA framework; parallelism.

I. INTRODUCTION

The design and optimization of a communication network is a complex task because it involves many factors that must be evaluated to reach a solution acceptable both to end users (quality of service) and to the entities that carry out the deployment (deployment costs) [1]. The factors most commonly optimized are those that affect the cost and the quality of the network (delay, reliability). These factors influence each other, since a change in one of them affects the others. This constitutes a Multi-Objective Optimization Problem (MOOP), in which the space of possible solutions must be explored to find the optimal ones. Since this MOOP is NP-hard, exhaustive search is ruled out, and other techniques are needed to solve it [2].

Many studies have addressed this optimization problem. Among heuristics, we can cite the work of Khan et al. [3], who developed a branch-and-bound technique to optimize network cost subject to specific reliability values, and that of Ersoy et al. [4], an optimization technique based on the mean delay for the design of interconnected LAN/MAN networks. However, these heuristics do not guarantee optimal solutions; in addition, most of them optimize only one objective, so the problem must be split into two. Many other works have used evolutionary algorithms (EAs), such as Genetic Algorithms (GAs), for single-objective optimization. Thus, Abuali et al. [5] minimized network cost subject to a maximum capacity; Ko et al. [6] optimized network cost while keeping delay values constant; and Kumar et al. [7] optimized network availability for the expansion of computer networks. Other authors have used multi-objective (MO) GAs, since they are better suited to this type of problem [8]. Thus, Banerjee et al. [9] studied network design under typical traffic models (self-similar and Poisson), optimizing cost and delay with PCGA (Pareto Converging Genetic Algorithm), and R. Kumar et al. [10] optimized cost and delay using the same algorithm.

The use of EAs is not free of drawbacks. The main difficulty is their complex implementation, since they are usually grounded in complex mathematical theories. This is why their use is often confined to certain highly specialized scientific fields (computing, mathematics, etc.). Nevertheless, there are software frameworks that facilitate the use of EAs and other metaheuristics for MOOPs, such as PISA¹ and jMetal². These platforms separate the implementation of the selection algorithm from the definition of the problem.

We have solved a multi-objective network design optimization problem for real traffic models of the Internet by means of GAs, using the PISA framework, which we selected because of its standing in the community. After studying the characteristics of this platform, we observed that it did not provide a version for parallel systems. We therefore decided to parallelize the framework, while solving the network design problem with two well-known EAs provided by the platform: the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [11] and the Strength Pareto Evolutionary Algorithm II (SPEA-II) [12]. These algorithms had not previously been used for this optimization problem.
Although it might seem odd (given the large number of publications on this topic), there are practically no public instances (data sets defining the problem) that can be used to validate results, except for one well-known instance: the ten most populated cities of China [6], which is the one used in our work.

The rest of this paper is organized as follows. Section II provides a brief description of the PISA platform. The network design problem is presented in Section III. The parallelization of the PISA framework and the implementation of the network design problem appear in Section IV. In Section V, we present an evaluation of the results using statistical techniques. A comparison with other approaches appears in Section VI. Finally, conclusions and future work are presented in Section VII.

¹ Programming Language Independent Interface for Search Algorithms (PISA), web: http://www.tik.ee.ethz.ch/pisa/.
² Metaheuristic Algorithms in Java (jMetal), web: http://jmetal.sourceforge.net/.

II. DESCRIPTION OF THE PISA FRAMEWORK

This section provides a brief description of the platform used in this work; for a more complete description, we refer the reader to [13]. As mentioned above, PISA is a platform for solving optimization problems by means of EAs, and it consists of three main elements: the optimizer, the monitor and the performance assessment (Fig. 1).

The optimizer solves a MOOP using EAs. It is divided into two modules. On the one hand, the variator module contains the problem-specific details (representation of information in chromosomes, fitness functions, mutation and crossover strategies). On the other hand, the selector contains a selection algorithm (such as NSGA-II or SPEA-II) that determines how individuals are selected during the evolution process. Each module is a standalone application, and the modules communicate through text files. The formats of the input/output files are fully specified, so all modules are interoperable regardless of the programming language used. Besides reusability, this modular approach simplifies implementation, since the user only needs to implement one of the modules. Thus, an expert in EAs will implement a new selector module, while an expert in a particular scientific field, for example biology, will implement a new problem as a variator module.

Each module has an associated configuration file for adjusting its parameters, and this file allows multiple configurations to be defined for the same problem. For example, for a variator module we may adjust the crossover probability, the number of iterations, and so on. To solve an optimization problem, it is enough to run both modules (variator and selector), each with an associated configuration. During execution, control passes from one module to the other through intermediate files, which gives the impression that both run as a single program. The optimizer follows the steps shown in Fig. 2.

In Step 1, the variator module generates the initial population and calculates the fitness value of each individual. Control is then passed to the selector module in Step 2, where the individuals that will evolve in the first iteration are selected. In Step 3, the variator module applies mutations and crossovers to the selected individuals. In Step 4, the selector module selects individuals and returns control to Step 3; this loop ends when a stop condition is reached, usually a number of iterations. Finally, when the optimizer reaches Step 5, it generates the output file with the corresponding Pareto front (the set of non-dominated solutions obtained for the MOOP, usually plotted as a graph).

Another important component of the PISA platform is the performance assessment (Fig. 1), which uses statistical tools to determine whether any of the selection algorithms used provides significantly better performance [14]. Finally, there is also a component called the monitor (Fig. 1), which is discussed in depth in Section IV-A.

Figure 1. Elements of the platform PISA.

Figure 2. Steps followed by the optimizer.
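The control flow of Fig. 2 can be sketched in a single process as follows. This is a minimal illustration, not PISA itself: real PISA modules are separate programs that exchange state through text files, whereas here the hypothetical functions `variator_init`, `variator_vary` and `selector_select` call each other directly, the toy problem minimizes (x^2, (x-2)^2), and the dominance-count selector merely stands in for NSGA-II or SPEA-II.

```python
import random

def evaluate(x):
    """Hypothetical toy bi-objective problem: minimize (x^2, (x - 2)^2)."""
    return (x * x, (x - 2.0) ** 2)

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

def pareto_front(pop):
    """Non-dominated (x, objectives) pairs of a population."""
    return [p for p in pop if not any(dominates(q[1], p[1]) for q in pop)]

# ---- variator side: problem-specific ----
def variator_init(size, rng):
    """Step 1: create and evaluate the initial population."""
    return [(x, evaluate(x)) for x in (rng.uniform(-5.0, 5.0) for _ in range(size))]

def variator_vary(parents, rng):
    """Step 3: blend crossover plus Gaussian mutation on consecutive parent pairs."""
    children = []
    for (a, _), (b, _) in zip(parents[::2], parents[1::2]):
        child = 0.5 * (a + b) + rng.gauss(0.0, 0.1)
        children.append((child, evaluate(child)))
    return children

# ---- selector side: problem-independent ----
def selector_select(candidates, k, rng):
    """Steps 2 and 4: keep the k candidates dominated by the fewest others,
    then pick k parents from the survivors by binary tournament."""
    ranked = sorted(candidates,
                    key=lambda p: sum(dominates(q[1], p[1]) for q in candidates))
    survivors = ranked[:k]
    parents = []
    for _ in range(k):
        a, b = rng.sample(survivors, 2)
        parents.append(b if dominates(b[1], a[1]) else a)
    return survivors, parents

def run_optimizer(pop_size=20, generations=50, seed=1):
    rng = random.Random(seed)
    population = variator_init(pop_size, rng)                          # Step 1
    population, parents = selector_select(population, pop_size, rng)   # Step 2
    for _ in range(generations):                                       # stop condition
        children = variator_vary(parents, rng)                         # Step 3
        population, parents = selector_select(population + children,
                                              pop_size, rng)           # Step 4
    return pareto_front(population)                                    # Step 5

front = run_optimizer()
print(f"{len(front)} non-dominated solutions")
```

The split mirrors the division of labor described above: only the variator functions know the problem, while the selector works purely on objective vectors.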

III. DESIGN OF A COMMUNICATION NETWORK

When designing a communication network, several factors must be evaluated to obtain a network suited to each need, including network cost, communication delay, traffic volume, and so on [1]. In this paper we propose an optimized network design based on the two most important factors: the network installation cost (not the maintenance cost) and the communication delay.

A. Problem instance definition

A particular problem instance is defined by the number of network nodes (N), the distances between nodes (D, an N×N matrix), the estimated traffic between nodes (T, an N×N matrix), the available node types (K, each with its cost and capacity), the available link types (M, each with its cost and capacity), the cost of a signal amplifier (A) and the maximum distance a signal can travel without amplification (L). These last two parameters (A and L) are needed because a fiber-optic network is assumed.
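The instance parameters listed above map naturally onto a small record type. The sketch below is one possible layout, assuming Python dataclasses; the class and field names (`NodeType`, `LinkType`, `Instance`, `amp_cost`, `max_span`) are hypothetical and only mirror the symbols N, D, T, K, M, A and L.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class NodeType:          # one entry of K
    cost: float
    capacity: float

@dataclass(frozen=True)
class LinkType:          # one entry of M
    cost: float
    capacity: float

@dataclass
class Instance:
    n: int                          # N: number of network nodes
    dist: List[List[float]]         # D: N x N distance matrix
    traffic: List[List[float]]      # T: N x N estimated traffic matrix
    node_types: List[NodeType]      # K: available node types
    link_types: List[LinkType]      # M: available link types
    amp_cost: float                 # A: cost of one signal amplifier
    max_span: float                 # L: max distance without amplification

    def __post_init__(self):
        # Basic consistency checks on matrix shapes and parameters.
        for name, matrix in (("dist", self.dist), ("traffic", self.traffic)):
            if len(matrix) != self.n or any(len(row) != self.n for row in matrix):
                raise ValueError(f"{name} must be a {self.n}x{self.n} matrix")
        if not self.node_types or not self.link_types:
            raise ValueError("at least one node type and one link type are required")
        if self.max_span <= 0:
            raise ValueError("max_span (L) must be positive")

# Usage with made-up numbers for a 3-node instance:
toy = Instance(
    n=3,
    dist=[[0, 120, 250], [120, 0, 180], [250, 180, 0]],
    traffic=[[0, 5, 3], [5, 0, 4], [3, 4, 0]],
    node_types=[NodeType(cost=100.0, capacity=50.0)],
    link_types=[LinkType(cost=10.0, capacity=20.0)],
    amp_cost=2.0,
    max_span=100.0,
)
```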

B. Routing policy

The traffic matrix (T) provides the estimated traffic between the cities of the instance, assuming a fully connected topology. In practice, a topology contains only a subset of all possible links, so the initial traffic matrix must be recomputed to reflect how the information is routed through the existing links. This new matrix is called T_acu. For this task we have used Dijkstra's shortest path algorithm [15], with link length as the metric.
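A minimal sketch of this routing step, assuming undirected links stored as `(i, j)` pairs, a distance matrix `dist` and a traffic matrix `traffic`. The function name `route_traffic` and the choice to record the load of a link symmetrically in `t_acu[i][j]` and `t_acu[j][i]` are assumptions, not details given in the text. Each demand follows its Dijkstra shortest path by link length, and its volume is added to every link on that path.

```python
import heapq
import math

def route_traffic(n, dist, links, traffic):
    """Return T_acu, the accumulated traffic on each link (symmetric N x N matrix).

    links: iterable of (i, j) pairs present in the topology.
    Each demand traffic[s][t] is routed along the shortest path (by length) from s to t.
    """
    adj = [[] for _ in range(n)]
    for i, j in links:
        adj[i].append(j)
        adj[j].append(i)
    t_acu = [[0.0] * n for _ in range(n)]
    for s in range(n):
        # Dijkstra from source s using a binary heap.
        best = [math.inf] * n
        prev = [-1] * n
        best[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > best[u]:
                continue  # stale heap entry
            for v in adj[u]:
                nd = d + dist[u][v]
                if nd < best[v]:
                    best[v], prev[v] = nd, u
                    heapq.heappush(heap, (nd, v))
        # Load every hop of the shortest path for each demand leaving s.
        for t in range(n):
            if t == s or traffic[s][t] == 0:
                continue
            if best[t] == math.inf:
                raise ValueError(f"node {t} is unreachable from node {s}")
            v = t
            while v != s:
                u = prev[v]
                t_acu[u][v] += traffic[s][t]
                t_acu[v][u] += traffic[s][t]
                v = u
    return t_acu

# Triangle with a long 0-2 link: traffic from 0 to 2 detours through node 1.
dist = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
links = {(0, 1), (1, 2), (0, 2)}
traffic = [[0, 0, 4], [0, 0, 0], [0, 0, 0]]
t_acu = route_traffic(3, dist, links, traffic)
print(t_acu)
```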

C. Objective functions

The network installation cost (y1) is defined by the cost of the nodes, the signal amplifiers and the links (1). The delay (y2) depends on the traffic model used (2); in this case we have used the Poisson model, a model for conventional networks. Thus, the delay is measured from the mean size of the transmission queues generated at the network nodes [16]. In these expressions, CoNE_i denotes the cost of node i, CoLink_ij the cost of the link between nodes i and j, and CpLink_ij the capacity of the link between nodes i and j. Both objective functions have been used in other publications [9] [10].

$$y_1 = \sum_{i} \mathrm{CoNE}_i + \sum_{i}\sum_{j} \mathrm{CoLink}_{ij} + A \sum_{i}\sum_{j} \left\lfloor \frac{D_{ij}}{L} \right\rfloor \qquad (1)$$

$$y_2 = \frac{\displaystyle\sum_{i}\sum_{j} \frac{\mathit{T\_acu}_{ij}}{\mathrm{CpLink}_{ij} - \mathit{T\_acu}_{ij}}}{\displaystyle\sum_{i}\sum_{j} \mathrm{CpLink}_{ij}} \qquad (2)$$
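Equations (1) and (2) can be evaluated directly once T_acu is known. In this sketch, `links` is a hypothetical dictionary mapping each existing link `(i, j)`, `i < j`, to its `(CoLink_ij, CpLink_ij)` pair, and we read the double sums as running once over each link present in the topology (absent links contribute no cost, amplifiers or queueing). Each term T_acu/(CpLink - T_acu) is the mean number of packets in an M/M/1 queue with utilization T_acu/CpLink, which is how the Poisson assumption enters; a saturated link makes the delay infinite.

```python
import math

def network_cost(node_costs, links, dist, amp_cost, max_span):
    """y1 (Eq. 1): node costs + link costs + amplifier costs.

    links: dict mapping (i, j) to (CoLink_ij, CpLink_ij) for existing links.
    floor(D_ij / L) amplifiers are counted on each existing link.
    """
    node_part = sum(node_costs)
    link_part = sum(cost for cost, _ in links.values())
    amplifiers = sum(math.floor(dist[i][j] / max_span) for i, j in links)
    return node_part + link_part + amp_cost * amplifiers

def network_delay(links, t_acu):
    """y2 (Eq. 2): sum of T_acu/(CpLink - T_acu) over links, divided by total capacity."""
    total_cap = sum(cap for _, cap in links.values())
    queue_sum = 0.0
    for (i, j), (_, cap) in links.items():
        load = t_acu[i][j]
        if load >= cap:
            return math.inf   # saturated link: the queue-length term diverges
        queue_sum += load / (cap - load)
    return queue_sum / total_cap

# Made-up 3-node example: links 0-1 and 1-2, each with cost 5 and capacity 10.
links = {(0, 1): (5.0, 10.0), (1, 2): (5.0, 10.0)}
dist = [[0, 250, 300], [250, 0, 120], [300, 120, 0]]
t_acu = [[0, 4, 0], [4, 0, 5], [0, 5, 0]]
y1 = network_cost([10.0, 10.0, 20.0], links, dist, amp_cost=2.0, max_span=100.0)
y2 = network_delay(links, t_acu)
print(y1, y2)
```

Here y1 = 40 (nodes) + 10 (links) + 2 × (2 + 1) (amplifiers) = 56.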

D. Limitations

A network topology is valid only if it meets two restrictions. First, the flow through a link cannot exceed the capacity of that link, which requires analyzing all the traffic that other network nodes route through it. Second, the network must be reliable: it must be at least bi-connected, that is, every node must be reachable through two alternative routes.
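Both restrictions are cheap to check for instances of this size. The sketch below assumes links given as `(i, j)` pairs and reads "two alternative routes" as 2-vertex connectivity (the graph stays connected after removing any single node), a stricter reading than 2-edge connectivity; the helper names are hypothetical. Link loads are taken from a T_acu matrix such as the one produced by the routing step of Section III-B.

```python
from collections import deque

def is_connected(n, links, skip=None):
    """BFS connectivity test over all nodes except `skip` (if given)."""
    adj = {v: [] for v in range(n) if v != skip}
    for i, j in links:
        if skip not in (i, j):
            adj[i].append(j)
            adj[j].append(i)
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)

def is_biconnected(n, links):
    """2-vertex-connected (n >= 3): connected, and no single node removal disconnects it."""
    return n >= 3 and is_connected(n, links) and all(
        is_connected(n, links, skip=v) for v in range(n))

def capacity_ok(link_caps, t_acu):
    """The flow on every existing link must not exceed its capacity."""
    return all(t_acu[i][j] <= cap for (i, j), cap in link_caps.items())

def is_valid_topology(n, link_caps, t_acu):
    return is_biconnected(n, link_caps.keys()) and capacity_ok(link_caps, t_acu)

ring = {(0, 1): 10.0, (1, 2): 10.0, (2, 3): 10.0, (0, 3): 10.0}
path = [(0, 1), (1, 2), (2, 3)]
loads = [[0, 4, 0, 3], [4, 0, 6, 0], [0, 6, 0, 2], [3, 0, 2, 0]]
print(is_biconnected(4, ring.keys()), is_biconnected(4, path))
print(is_valid_topology(4, ring, loads))
```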

IV. IMPLEMENTATION

In this section we detail the parallel implementation of the framework and the variator module that defines the network design problem.

A. Framework parallel implementation

In the parallel implementation of the framework, the monitor is replaced with one that works in parallel. The original monitor³ solves an optimization problem using different selection algorithms. For each of them, multiple runs are performed with the same settings in order to obtain statistically valid results. Because of PISA's sequential design, the required computation time can be high. Our idea is to harness the computing power of a cluster to reduce this time, thus increasing the overall productivity of the system. Our strategy relies on two technologies: a message-passing library, the Message Passing Interface (MPI) [17], for communication between processes, and a protocol for secure access to a remote file system, the SSH File Transfer Protocol (SFTP), to retrieve output files from the cluster nodes.

The new monitor is described in Algorithm 1. Each selector module to be evaluated is run against the same variator module. Once a variator-selector combination is selected, all the desired repetitions for this combination are distributed by MPI among the cluster nodes, with as many executions per node as it has cores available. After execution on all nodes, the results are collected via SFTP and the procedure is repeated for the next selector algorithm. Like the original monitor, the new monitor outputs a file with the concatenated Pareto fronts.

Algorithm 1: Pseudocode of the proposed monitor
  Spread the necessary data across the cluster: variator and selector modules, configuration files, instance data, ... (SFTP)
  for i in {selector_1, selector_2, ..., selector_N} do
    for j = 0 to NUM_NODES - 1 do
      for k = 0 to NUM_CORES - 1 do
        for z = 0 to MAX_REPETITIONS / (NUM_NODES * NUM_CORES) - 1 do   // repetitions per core
          Launch the variator module on node j (MPI)
          Launch the selector module on node j (MPI)
        end for
      end for
    end for
    for j = 0 to NUM_NODES - 1 do
      for k = 0 to NUM_CORES - 1 do
        for z = 0 to MAX_REPETITIONS / (NUM_NODES * NUM_CORES) - 1 do   // repetitions per core
          Wait for the termination signal from node j (MPI)
          Get the result file from node j (SFTP)
          Write the result into the output file
        end for
      end for
    end for
  end for

³ Detailed documentation of the original monitor (including pseudocode): http://www.tik.ee.ethz.ch/pisa/monitor/monitor_documentation.pdf
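The control structure of Algorithm 1 can be sketched on a single machine, with a thread pool standing in for the MPI processes on the cluster nodes. This is only an illustration under stated assumptions: `run_repetition` is a hypothetical placeholder for launching the variator/selector pair on a node and fetching its result file over SFTP, and it returns synthetic (cost, delay) points rather than real optimizer output. As in the pseudocode, selectors are processed one at a time, all repetitions of a selector are dispatched before any result is collected, and results are appended to one output list in a fixed order.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_repetition(selector, rep):
    """Hypothetical stand-in for one remote run: returns a synthetic Pareto front."""
    rng = random.Random(f"{selector}-{rep}")   # deterministic per (selector, rep)
    return sorted((rng.uniform(0, 1e6), rng.uniform(0, 0.9)) for _ in range(3))

def parallel_monitor(selectors, max_repetitions, num_nodes, num_cores):
    slots = num_nodes * num_cores   # one worker per cluster core
    output = []                     # concatenated fronts, like the original monitor
    for selector in selectors:      # one selector at a time
        with ThreadPoolExecutor(max_workers=slots) as pool:
            # Launch phase: dispatch every repetition of this selector.
            futures = [pool.submit(run_repetition, selector, rep)
                       for rep in range(max_repetitions)]
            # Collection phase: wait for each run and append its front.
            for rep, future in enumerate(futures):
                output.append((selector, rep, future.result()))
    return output

results = parallel_monitor(["NSGA-II", "SPEA-II"], max_repetitions=20,
                           num_nodes=5, num_cores=4)
print(len(results))
```

Threads only illustrate the dispatch and wait pattern; in the actual monitor the CPU-bound work runs in separate processes on remote nodes, so no shared-memory concurrency limits apply.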

Figure 3. Example of a load distribution with the proposed monitor.

The way executions are distributed across the cluster depends mainly on the number of repetitions required. Fig. 3 shows an example of a cluster with 5 nodes, each with 4 CPU cores. If 40 repetitions are required, all the cluster cores must be used, each carrying out two executions. For 20 repetitions, all the cores would again be used, each carrying out one execution. For 10, all 5 nodes would be used, but only 2 cores per node, and so on.
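The examples above (and the cluster configurations reported in Table II) are consistent with a simple rule: use as few rounds of executions per core as possible, then spread the repetitions evenly over the nodes. The function below is our hypothetical reconstruction of that rule, not code from the framework; it assumes a cluster of `nodes` identical nodes with `cores_per_node` cores each.

```python
import math

def cluster_configuration(repetitions, nodes=5, cores_per_node=4):
    """Return (nodes_used, cores_used_per_node, runs_per_core) for a batch of runs."""
    if repetitions < 1:
        raise ValueError("repetitions must be positive")
    nodes_used = min(nodes, repetitions)
    # Fewest rounds per core that fit all runs on the whole cluster.
    runs_per_core = math.ceil(repetitions / (nodes * cores_per_node))
    # Fewest cores per node that still cover all runs in that many rounds.
    cores_used = math.ceil(repetitions / (nodes_used * runs_per_core))
    return nodes_used, cores_used, runs_per_core

for reps in (5, 10, 20, 30, 40):
    print(reps, cluster_configuration(reps))
```

For 30 repetitions, for example, this yields 5 nodes with 3 cores each running 2 executions, matching the corresponding row of Table II.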

As expected, the productivity of the system increases, because the optimization process requires a large amount of CPU time compared with the communication time between machines. This aspect is discussed in detail in Section V. To make the proposed monitor compatible with the modules already developed, two changes are needed: adding the usual MPI initialization directives to both modules, and sending the termination message to the master node from state 4 of the variator module (see Figs. 1 and 2).

B. Variator module implementation

To implement the variator module that defines the problem, we have followed the platform specification in [13] to ensure proper communication with the selector modules already implemented. Owing to space limitations, rather than detailing the code of each state of the variator module, we only discuss the basic aspects of the EA design.

1) Encoding used

An individual represents a possible solution (a network topology) and is encoded as a chromosome. The fixed-length chromosome is divided into two parts, as shown in Fig. 4: the first defines the type of each network node; the second represents the links between nodes, where 1 indicates the existence of a link and 0 otherwise.

Figure 4. Fixed-length chromosome that represents problem individuals.
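A minimal sketch of this two-part encoding, assuming undirected links so that the link part holds one bit per unordered node pair (i, j) with i < j, in lexicographic order. The text does not state how the link bits are ordered, so that ordering and the helper names `encode` and `decode` are assumptions. For the ten-city instance this gives 10 type genes plus 45 link bits.

```python
from itertools import combinations

def encode(node_types, links, n):
    """Chromosome = [type of node 0..n-1] + [1 if link (i, j) exists else 0, for i < j]."""
    if len(node_types) != n:
        raise ValueError("one type per node is required")
    link_bits = [1 if (i, j) in links else 0 for i, j in combinations(range(n), 2)]
    return list(node_types) + link_bits

def decode(chromosome, n):
    """Inverse of encode: split into node types and the set of existing links."""
    pairs = list(combinations(range(n), 2))
    if len(chromosome) != n + len(pairs):
        raise ValueError("chromosome length does not match n")
    node_types = chromosome[:n]
    links = {pair for pair, bit in zip(pairs, chromosome[n:]) if bit == 1}
    return node_types, links

chrom = encode([0, 2, 1, 1], {(0, 1), (1, 2), (2, 3), (0, 3)}, 4)
print(chrom)
```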

2) Generating initial population

The initial population is generated by combining random and deterministic procedures. First, each node is assigned a random type. Then, a minimum spanning tree over all nodes, based on the distances between them, is obtained with Prim's algorithm [15]. Finally, new links are randomly added to the tree. Once an individual has been generated, it is checked against the restrictions described in subsection 3) below. If it is valid, it is inserted into the initial population; otherwise it is discarded.

3) Evaluation of individuals

Two tasks are performed when individuals are evaluated. First, we compute the objective function values. Second, we check whether the individual is a valid topology. A topology is considered valid only under the following conditions: it must satisfy the limitations imposed in Section III-D (it must be bi-connected, and the capacity of each link must be sufficient for the traffic generated by the rest of the network; for this last task, Dijkstra's algorithm [15] is used to obtain the requirements of each link given the total network traffic), and the types assigned to the nodes must be valid (non-existent types may appear due to mutations). If an individual is not valid, its cost and delay are set to infinity, so that it is discarded in the next iterations.

4) Crossover and mutation strategies

For recombination, chromosomes are treated as two distinct parts. The crossover point in the first part is chosen so that the encoding remains valid; otherwise many invalid individuals would be generated. Thus, an individual generated by crossover can only have node types taken from its parents. The crossover point in the second part is located at random. Mutation follows the same criteria, so mutation points are chosen as for recombination. The number of mutations applied to an individual is random.
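The two-part recombination described above can be sketched as a one-point crossover applied separately to each part of the chromosome, so that node-type genes are only exchanged with node-type genes and link bits with link bits. The mutation below applies a random number of point mutations; unlike the variator described in the text, this hypothetical version only ever assigns existing node types (indices below `num_types`), so the type check during evaluation would never reject its output. Function names and the `max_mutations` bound are assumptions.

```python
import random

def two_part_crossover(p1, p2, n, rng):
    """One cut point inside the node-type part (genes 0..n-1) and one inside
    the link part (genes n..end); children swap the segments after each cut."""
    if len(p1) != len(p2) or n < 2 or len(p1) - n < 2:
        raise ValueError("parents must match and each part needs at least two genes")
    c1 = rng.randint(1, n - 1)              # cut in the node-type part
    c2 = rng.randint(n + 1, len(p1) - 1)    # cut in the link part
    child1 = p1[:c1] + p2[c1:n] + p1[n:c2] + p2[c2:]
    child2 = p2[:c1] + p1[c1:n] + p2[n:c2] + p1[c2:]
    return child1, child2

def mutate(chromosome, n, num_types, rng, max_mutations=3):
    """Apply a random number (1..max_mutations) of point mutations."""
    child = list(chromosome)
    for _ in range(rng.randint(1, max_mutations)):
        pos = rng.randrange(len(child))
        if pos < n:
            child[pos] = rng.randrange(num_types)   # reassign a node type
        else:
            child[pos] = 1 - child[pos]             # add or remove a link
    return child

rng = random.Random(7)
a = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
b = [2, 2, 2, 2, 0, 0, 0, 0, 0, 0]
c1, c2 = two_part_crossover(a, b, 4, rng)
m = mutate(c1, 4, 3, rng)
print(c1, c2, m)
```

Because every gene of a child comes from the same position in one of the parents, node types after crossover are always types already present in the parents, as the text requires.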

V. RESULTS

In this section we detail the experimental results. First, we present the data obtained by solving the problem with the variator module we implemented and two selector modules provided by PISA (NSGA-II and SPEA-II). Next, we analyze the benefits of our parallelized framework. All experiments have been performed on a cluster of 5 nodes, each with four Intel® Xeon™ processors at 3.0 GHz and 1 GB of RAM.

A. Solving the problem with both selector algorithms

The strategy used to solve the problem with both algorithms is simple. First, we determine the settings that give the best results for each algorithm. Next, we use statistical tools to study whether either algorithm provides higher performance. The configurations are defined by the most common parameters: crossover probability, number of generations, population size and mutation probability. This methodology is similar to the one described by A. Rubio-Largo et al. [18]: starting from a default configuration, the parameters are set one at a time to their best value, until all parameters have been adjusted.
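The one-parameter-at-a-time procedure can be sketched as a coordinate-wise search. In the paper each configuration is scored by its hypervolume; here `score` is a hypothetical callable (in practice it would run the optimizer and return, for example, the mean hypervolume over several runs), and the toy score and candidate values are made up for the usage example.

```python
def tune_one_at_a_time(default, candidates, score):
    """Adjust parameters in order; each is set to the candidate value that maximizes
    score while already-tuned parameters (and defaults of the rest) stay fixed."""
    config = dict(default)
    for name, values in candidates.items():
        best_value, best_score = config[name], score(config)
        for value in values:
            trial = {**config, name: value}
            trial_score = score(trial)
            if trial_score > best_score:
                best_value, best_score = value, trial_score
        config[name] = best_value
    return config

# Hypothetical score peaking at crossover 0.6, mutation 0.5, population 250.
def toy_score(cfg):
    return -(abs(cfg["crossover"] - 0.6) + abs(cfg["mutation"] - 0.5)
             + abs(cfg["population"] - 250) / 1000)

best = tune_one_at_a_time(
    default={"crossover": 0.9, "mutation": 0.1, "population": 100},
    candidates={"crossover": [0.6, 0.7, 0.8],
                "mutation": [0.3, 0.5],
                "population": [100, 250, 500]},
    score=toy_score,
)
print(best)
```

Coordinate-wise tuning needs far fewer evaluations than a full grid search, at the cost of ignoring interactions between parameters.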

Figure 5. Procedure to select the appropriate statistical tests.

Figure 6. Obtained hypervolume box plot.

Figure 7. Comparison between the execution times of both monitors.

Some measure is required to determine the quality obtained with each configuration. We decided to use a standard measure, the hypervolume [14]: the higher the hypervolume value, the better the result. Computing the hypervolume requires defining minimum and maximum reference points. It is sufficient to take (1,000,000, 0.9) as the maximum and (0, 0) as the minimum for the (cost, delay) tuples, since these points enclose all the fronts obtained. Following this methodology, the algorithms provide their best results with the settings shown in Table I. With these two configurations, the next step is to check whether SPEA-II obtains better results than NSGA-II, as the hypervolume values suggest. This requires a statistical study to verify whether the improvement achieved by SPEA-II is significant. The procedure followed for the statistical analysis is shown in Fig. 5 [19]. The first step is to determine whether the data obtained from the 20 runs (for statistical validation) of both configurations follow a normal distribution. For this purpose we used both the Shapiro-Wilk [20] and the Kolmogorov-Smirnov-Lilliefors (K-S) [21] tests. Both tests evaluate the following hypotheses:
• H0: the model underlying the data is normal.
• H1: the model underlying the data is not normal.
For both tests we obtain a p-value lower than 0.05, so there is strong evidence against the null hypothesis, and we cannot assume that the data come from a normal model. Fig. 6 shows a box plot in which, for each algorithm, the hypervolumes obtained in the several executions

are shown, together with the median hypervolume (marked by a thick line) of each algorithm. There are differences between the medians, with SPEA-II better than NSGA-II. Is this difference significant? To check it, and since we cannot assume a normal distribution in either case (as noted above), we used a nonparametric test. In this case we applied the Wilcoxon test [22], since there are two related samples (Fig. 5). It tests the following hypotheses:
• H0: the two samples come from populations with the same distribution (same median).
• H1: the two samples come from populations whose distributions differ in central tendency (different medians).
The test yields a p-value of 0.001, lower than 0.05, so the data provide strong evidence against the null hypothesis. That is, the two samples come from populations with different medians; in other words, there are significant differences between the two samples. Since the difference between the medians is significant, we can conclude that SPEA-II performs better than NSGA-II.

B. Parallel proposal analysis

The problem has been solved using both monitors. As expected, the results were identical; only the execution times differed. A simple experiment demonstrates the advantages of the parallel proposal (see Table II): each configuration of Table I is executed a variable number of times (column Repeats in Table II) with both monitors (column Seq: time used by the original monitor; column Par: time used by the proposed monitor). The cluster configuration depends on the number of repetitions to be obtained (see Section IV-A) and is given in the Cluster configuration column of Table II. As shown in Fig. 7, the times of the proposed monitor are much lower than those of the original monitor. As the number of repetitions increases, the execution time of the original monitor rises linearly, while that of the proposed monitor increases much less.
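For reference, the hypervolume of a two-objective (cost, delay) front, as used in Section V-A, can be computed with a simple sweep. This sketch assumes that both objectives are normalized to [0, 1] using the minimum (0, 0) and maximum (1,000,000, 0.9) reference points given there, so that a value of 1 would mean a front touching the ideal corner; whether PISA's assessment tools normalize in exactly this way is an assumption. Points outside the reference box are ignored.

```python
def hypervolume_2d(front, ref_min=(0.0, 0.0), ref_max=(1_000_000.0, 0.9)):
    """Normalized hypervolume of a 2-objective minimization front of (cost, delay)."""
    points = []
    for cost, delay in front:
        x = (cost - ref_min[0]) / (ref_max[0] - ref_min[0])
        y = (delay - ref_min[1]) / (ref_max[1] - ref_min[1])
        if x < 1.0 and y < 1.0:          # outside the reference box: no contribution
            points.append((x, y))
    points.sort()                        # sweep by increasing normalized cost
    volume, best_y = 0.0, 1.0
    for x, y in points:
        if y < best_y:                   # dominated points add nothing
            volume += (1.0 - x) * (best_y - y)   # new horizontal slab
            best_y = y
    return volume

front = [(250_000, 0.675), (750_000, 0.225), (800_000, 0.5)]  # last point is dominated
print(round(hypervolume_2d(front), 4))
```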
TABLE I. APPROPRIATE CONFIGURATIONS OBTAINED

Parameter               NSGA-II   SPEA-II
Generations             800       800
Population size         250       250
Crossover probability   0.8       0.6
Mutation probability    0.5       0.5
Hypervolume             0.975     0.976

TABLE II. EXECUTION TIMES OF THE MONITORS FOR DIFFERENT NUMBERS OF REPETITIONS

            Execution times (s)
            SPEA-II         NSGA-II
Repeats     Seq     Par     Seq     Par     Cluster configuration
5           480     100     690     150     5 nodes with 1 core
10          1000    102     1320    152     5 nodes with 2 cores
20          1970    105     2437    151     5 nodes with 4 cores
30          2960    201     3658    261     5 nodes with 3 cores, 2 repeats
40          4050    205     4925    263     5 nodes with 4 cores, 2 repeats
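The benefit can be quantified directly from Table II. The snippet below computes the speedup (sequential time divided by parallel time) and a parallel efficiency (speedup divided by the number of cores used, taken from the Cluster configuration column as nodes × cores); treating the sequential monitor as a one-core baseline is our assumption.

```python
# Rows of Table II: (repeats, cores used, SPEA-II seq, SPEA-II par, NSGA-II seq, NSGA-II par)
TABLE_II = [
    (5, 5 * 1, 480, 100, 690, 150),
    (10, 5 * 2, 1000, 102, 1320, 152),
    (20, 5 * 4, 1970, 105, 2437, 151),
    (30, 5 * 3, 2960, 201, 3658, 261),
    (40, 5 * 4, 4050, 205, 4925, 263),
]

def speedup(seq_seconds, par_seconds):
    """Ratio of original-monitor time to proposed-monitor time."""
    return seq_seconds / par_seconds

for reps, cores, s_seq, s_par, n_seq, n_par in TABLE_II:
    s_spea, s_nsga = speedup(s_seq, s_par), speedup(n_seq, n_par)
    print(f"{reps:>2} reps on {cores:>2} cores: "
          f"SPEA-II x{s_spea:.1f} (eff. {s_spea / cores:.2f}), "
          f"NSGA-II x{s_nsga:.1f} (eff. {s_nsga / cores:.2f})")
```

For 40 repetitions on 20 cores this gives speedups of about 19.8 (SPEA-II) and 18.7 (NSGA-II), close to the ideal value of 20.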

As demonstrated experimentally, the parallel alternative proposed in this paper offers a substantial advantage over the original framework: for 40 repetitions, for example, SPEA-II takes 205 s with the proposed monitor versus 4050 s with the original one.

VI. COMPARISONS WITH OTHER AUTHORS

As mentioned in Section I, the number of known public instances for this problem is almost nonexistent. Only one is well described, so it has been used in this work. This shortage of real instances makes it difficult to validate our methods by comparing our results with those of other authors. In addition, some authors use undocumented or randomly generated topologies. However, other works use the same instance that we have adopted. Ko et al. [6] report a single {cost, delay} pair as their result; in this case we have obtained better results. Banerjee et al. [9] and R. Kumar et al. [10] report Pareto fronts without any quality measure, such as the hypervolume we have used.

VII. CONCLUSIONS AND FUTURE WORK

In this paper, we have solved a multi-objective network design optimization problem for real traffic models of the Internet over the ten most populated cities in China. To tackle this problem, we have used two EAs not previously applied to it, SPEA-II and NSGA-II, both provided by the PISA framework, a platform for solving NP-hard optimization problems. With these algorithms, we have carried out a statistical study to determine their efficiency, concluding that SPEA-II offers better results than NSGA-II. In addition, we have proposed a novel MPI-based parallelization of the framework to take advantage of the computer clusters currently used for large optimization problems, and we have shown that it achieves shorter execution times than the original framework. As future work, we propose using more instances of real scenarios, other traffic models such as self-similar traffic [23] and other metaheuristics, as well as using OpenMP [24] to improve the proposed parallel framework.


VIII. ACKNOWLEDGEMENTS

This work has been partially funded by the Ministry of Education and Science and the ERDF (European Regional Development Fund) under project TIN2008-06491-C0404 (MSTAR project), and by the Junta de Extremadura through grant GR10025 provided to group TIC015.

IX. REFERENCES

[1] A. S. Tanenbaum, Computer Networks, Prentice Hall, 2003.
[2] B. Dengiz, F. Altiparmak and A. E. Smith, "Local search genetic algorithm for optimal design of reliable networks," IEEE Transactions on Evolutionary Computation, vol. 1, Sep. 1997, pp. 179-188.
[3] R.-H. Jan, F.-J. Hwang and S.-T. Chen, "Topological optimization of a communication network subject to a reliability constraint," IEEE Transactions on Reliability, vol. 42, 1993, pp. 63-70.
[4] C. Ersoy and S. S. Panwar, "Topological design of interconnected LAN/MAN networks," IEEE Journal on Selected Areas in Communications, vol. 11, 1993, pp. 1172-1182.
[5] F. N. Abuali, D. A. Schoenefeld and R. L. Wainwright, "Designing telecommunications networks using genetic algorithms and probabilistic minimum spanning trees," Proceedings of the 1994 ACM Symposium on Applied Computing (SAC '94), Phoenix, Arizona, USA, 1994, pp. 242-246.
[6] K.-T. Ko, K.-S. Tang, C.-Y. Chan, K.-F. Man and S. Kwong, "Using genetic algorithms to design mesh networks," Computer, vol. 30, Aug. 1997, pp. 56-61.
[7] A. Kumar, R. M. Pathak and Y. P. Gupta, "Genetic-algorithm-based reliability optimization for computer network expansion," IEEE Transactions on Reliability, vol. 44, 1995, pp. 63-72.
[8] C. Coello Coello, Evolutionary Algorithms for Solving Multi-Objective Problems, New York: Springer, 2007.
[9] N. Banerjee and R. Kumar, "Multiobjective network design for realistic traffic models," Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation (GECCO '07), London, England, 2007, pp. 1904-1911.
[10] R. Kumar, P. P. Parida and M. Gupta, "Topological design of communication networks using multiobjective genetic optimization," Proceedings of the 2002 Congress on Evolutionary Computation (CEC '02), Honolulu, HI, USA, pp. 425-430.
[11] K. Deb, S. Agrawal, A. Pratap and T. Meyarivan, "A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II," Parallel Problem Solving from Nature (PPSN VI), 2000.
[12] E. Zitzler, M. Laumanns and L. Thiele, "SPEA2: Improving the strength Pareto evolutionary algorithm," EUROGEN 2001.
[13] S. Bleuler, M. Laumanns, L. Thiele and E. Zitzler, "PISA: A platform and programming language independent interface for search algorithms," Evolutionary Multi-Criterion Optimization, Berlin: Springer, 2003.
[14] C. M. Fonseca, J. D. Knowles, L. Thiele and E. Zitzler, "A tutorial on the performance assessment of stochastic multiobjective optimizers," EMO 2005, Guanajuato, Mexico, 2005.
[15] T. Cormen, Introduction to Algorithms, Cambridge, MA: The MIT Press, 2001.
[16] M. Guizani, A. Rayes, B. Khan and A. Al-Fuqaha, Network Modeling and Simulation: A Practical Perspective, Wiley-Interscience, 2010.
[17] W. Gropp, Using MPI: Portable Parallel Programming with the Message-Passing Interface, Cambridge, MA: MIT Press, 1999.
[18] A. Rubio-Largo, M. A. Vega-Rodriguez, J. A. Gomez-Pulido and J. M. Sanchez-Perez, "A differential evolution with Pareto tournaments for solving the routing and wavelength assignment problem in WDM networks," IEEE Congress on Evolutionary Computation, Barcelona, Spain, 2010, pp. 1-8.
[19] L. Ott and M. Longnecker, An Introduction to Statistical Methods and Data Analysis, Cole Cengage Learning, 2008.
[20] S. S. Shapiro and M. B. Wilk, "An analysis of variance test for normality (complete samples)," Biometrika, vol. 52, no. 3-4, 1965, pp. 591-611.
[21] Chakravarti, Laha and Roy, Handbook of Methods of Applied Statistics, Volume I, John Wiley and Sons, 1967, pp. 392-394.
[22] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics, vol. 1, 1945, pp. 80-83.
[23] Z. Sahinoglu and S. Tekinay, "On multimedia networks: self-similar traffic and network performance," IEEE Communications Magazine, vol. 37, no. 1, Jan. 1999, pp. 48-52.
[24] B. Chapman, Using OpenMP: Portable Shared Memory Parallel Programming, Cambridge, MA: The MIT Press, 2007.
