Improving the simulation of Storage Area Networks (SAN) using ... - UPV

Improving the simulation of Storage Area Networks (SAN) using concurrent execution

À. Perles, X. Molero, A. Martí, V. Santonja, J.J. Serrano
Departament d’Informàtica de Sistemes i Computadors
Universitat Politècnica de València
Camí de Vera, 14. 46022 València (Spain)
e-mail: [email protected]

KEYWORDS
Storage networks, parallel simulation, cluster computing, PVM environment, CSIM simulation language.

ABSTRACT
A Storage Area Network is a high-speed subnet that establishes a direct connection between heterogeneous storage resources and servers. Until now, our department has evaluated the performance of these systems using traditional simulation techniques. However, the SAN simulator designed by our researchers needed a great deal of computation time to obtain statistically valid results. In this work we show how we have reduced the execution time of our SAN simulator using a concurrent simulation approach. This approach basically consists of executing variable-sized independent replications of the simulation model in parallel. The results obtained encourage us to continue working on concurrent simulation.

INTRODUCTION
Storage Area Networks (SAN) (Clark 1999) are an emerging data communications platform that interconnects servers and storage devices (such as disks, disk arrays, and tape drives) to create a pool of storage that users can access directly. This networking approach offers benefits such as computer clustering, topological flexibility, fault tolerance, high availability, and remote management (see Fig. 1).

In order to evaluate the performance of these systems, a very flexible and easy-to-use SAN simulator has been developed (Molero et al. 2000). Among other features, this tool supports both real-world I/O traces and synthetic I/O traffic, Fibre Channel and Myrinet switches, message fragmentation, faults in links and switches, virtual channels, and different routing algorithms. The simulator has been written using the CSIM (CSIM 1998) libraries for C. It models a SAN accurately, focusing on its internal design, which makes the simulation programs computationally very expensive.

Fig. 1. A typical SAN environment (labels: Server, Storage Area Network (SAN), Local Area Network (LAN))

In order to improve the execution time of our SAN simulator, we have introduced concurrent simulation using the Clustered Simulation Experimenter (CSX) tool (Perles et al. 1999), which runs massive concurrent simulation experiments without the user having to be concerned with locating the computational resources to carry them out. In this work, we present the benefits obtained when applying this tool to a CSIM-based SAN simulator. Distributed simulation (Fujimoto 1990) has not been considered because it cannot be easily applied to our model and normally requires extra effort from the modeller.

This work has been supported by both the CICYT TAP99-0443-C05-02 and the “UPV- Design and modeling of storage systems based on magnetic disks” projects.

THE SAN SIMULATION MODEL
We have used the CSIM language to implement the SAN simulator. CSIM consists of a library of procedures, functions and macros that give C programmers a powerful tool for developing discrete-event, process-oriented simulation models. A CSIM program models a system as a collection of CSIM processes that interact with each other through structures internal to CSIM. The simulator, which is about 9500 lines long, has been written in ANSI C to make it portable. In fact, we have run it on both Unix and Windows systems.

The process-oriented design of CSIM makes it easy to structure the simulator program. Before the simulation stage, the program first reads and checks the input parameters; next, it generates the network topology and computes the routing tables according to the specified routing algorithm. Once the load has been specified by means of trace files or synthetic generation, and the disks have been initialized, the main() function calls the main simulation process, sim(), to carry out the simulation itself. This program structure is defined by the following code:

    void main()
    {
        test_input_parameters();     /* read and check the input parameters */
        topology_generation();       /* build the network topology */
        calculate_routing_tables();  /* apply the specified routing algorithm */
        load_specification();        /* trace files or synthetic load */
        disks_initialization();
        sim();                       /* main simulation process */
        collect_statistics();
    }

Processes are one of the most important parts of the simulator design. The global process hierarchy is shown in Fig. 2. The sim() process is the main simulation process. It creates one server() process for each server connected to the SAN; these processes generate the I/O operations according to the load specification.

Each I/O operation consists of transmitting two different messages: a data message and a control message (read request or write acknowledgement). Data message processes create several data packet() processes, according to their length. For example, an 8192-byte data message may generate 4 packets of 2048 bytes each, or a single packet of 8192 bytes, depending on the maximum packet length allowed by the storage network. Control message processes create only one control packet() process, a few bytes long. Finally, a packet generates the corresponding flit processes (there are three different classes: header, data, and tail flit processes).

Throughout the whole simulation, only the sim() and server() processes are alive. A message process terminates when the last flit of its last packet has left the source device (server or disk) and entered the network at the first switch in the path. In the same way, a packet process terminates when its last flit leaves the source device. As a consequence of the way message and packet processes are modeled, the only entities that the network manages are flits. Therefore, flit processes terminate when they leave the network at the destination device.

When the last flit of an entire message arrives at the destination disk, it creates the disk_access() process that models the disk request. Once the access is completed, this disk_access() process creates a new message() process and then terminates. This new message models the response (data for a read, acknowledgement for a write) from the disk to the server that initiated the I/O operation.

Fig. 2. CSIM process hierarchy of the SAN model (process labels: sim(), server(), message(), packet(), header_flit(), data_flit(), last_flit(), disk_access(); annotations: sim() and server() are alive throughout the whole simulation; disk_access() is created when the message has arrived at the disk)

The program uses the batch means simulation technique to statistically test the output variables. However, obtaining accurate results in this way is computationally expensive. Consequently, both concurrent simulation and the replication technique have been considered in order to reduce the simulation response time.

Fig. 3 shows a typical result obtained with the SAN simulation model. Each point on the curve represents the performance for a specified load value.

Fig. 3. Typical simulation result (x-axis: delivered traffic, bytes/cycle; y-axis: I/O mean response time, cycles)
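As an illustration of the batch means technique mentioned in the model description, the following C sketch splits the observations of one long run into equal-sized batches and estimates the variance of the grand mean from the batch means. It is not taken from the simulator or from CSIM: the names bm_result and batch_means, and the choice of equal-sized contiguous batches, are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Grand mean of the batch means and estimated variance of that mean. */
typedef struct {
    double mean;
    double var_of_mean;
} bm_result;

/*
 * Hypothetical helper: uses the first k * (n / k) of the n observations,
 * averages each of the k contiguous batches, and treats the k batch
 * means as approximately independent samples.
 */
bm_result batch_means(const double *obs, size_t n, size_t k)
{
    assert(obs != NULL && k >= 2 && n >= k);

    size_t m = n / k;      /* observations per batch */
    double mean = 0.0;     /* running mean of the batch means */
    double m2 = 0.0;       /* running sum of squared deviations */

    for (size_t b = 0; b < k; b++) {
        double batch_sum = 0.0;
        for (size_t i = 0; i < m; i++)
            batch_sum += obs[b * m + i];
        double batch_mean = batch_sum / (double)m;

        /* Welford update over the batch means. */
        double delta = batch_mean - mean;
        mean += delta / (double)(b + 1);
        m2 += delta * (batch_mean - mean);
    }

    bm_result r;
    r.mean = mean;
    r.var_of_mean = (m2 / (double)(k - 1)) / (double)k;
    return r;
}
```

The half-width of a confidence interval would then be the Student t critical value for k - 1 degrees of freedom multiplied by the square root of var_of_mean. Because the whole run has a single warm-up period, a serial batch means run cannot be shortened by adding computers, which is the motivation for the replication approach.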

Fig. 4. The CSX architecture

Legend:
LM   Load Monitor
LSM  Local Simulation Monitor
MLM  Machine and Load Manager
MM   Master Manager
UI   User Interface
UIS  User Interface Server
Other elements shown: workstation, PVM virtual machine, PVM messages, experiment running, instrumented simulation, machine database, experiment database

THE CSX TOOL
The main purpose of the CSX tool is to use the many idle, heterogeneous workstations in university laboratories and research centers to run concurrent simulations. Simulations executed using CSX are discrete-event simulation models that are basically run using replication techniques and are monitored in order to extract statistical information about the output variables.

The CSX design allows it to be applied to any commercial or public simulator, so that users need not learn a new simulation language. It provides a general way of monitoring and controlling simulations, and it was initially applied to the SMPL simulation language (McDougall 1987). Monitoring a model requires only minimal changes to the original program code and is mainly based on injecting a special event into the simulator event queue. This allows external program control without introducing noticeable overhead.

Fig. 4 shows the design of a CSX environment in operation. It is a distributed application working under PVM (Geist et al. 1994), which allows heterogeneous Unix and Windows computers to be incorporated into a parallel virtual machine.

In this work, this environment has been successfully applied to the CSIM simulation language. The SAN simulator program code has been modified and linked with the CSX libraries so that it can be monitored. The main modifications are shown below:

    #include "csim.h"
    #include "csx_csim.h"

    void main()
    {
        ...
        /* Statistics trap function. */
        csx_uservar("I/O mean resp. time", tr);
        /* CSX monitor connection. */
        csx_enroll();
        /* Main simulation process. */
        sim();
        ...
    }

    void sim()
    {
        /* Schedule the CSX control event. */
        csx_first_event();
        ...
    }

Using the CSX tool, a concurrent simulation can be carried out by spawning N replications of the instrumented program. Each replication uses a different random stream, and the output variables of the replications together yield a confidence interval.

SIMULATION RESULTS
The execution times of serial and parallel simulation have been analyzed using the interconnection topology shown in Fig. 5. This topology has an irregular configuration in which three switch ports are used to connect to other switches. Servers and disks may be attached to the remaining ports.

To run our experiments, we used 10 PCs, each with an AMD K6-2/350 MHz processor and 128 MB of RAM, running SuSE Linux 6.1.

When CSX receives a sample of statistical information from any replication, it computes the confidence interval for the user-selected output variables. The simulation ends when the confidence interval satisfies the user-specified end conditions. Fig. 8 shows the mean computed by CSX and the limits of its 95% confidence interval. The stopping criterion is met when the ratio of the confidence interval half-width to the mean falls below 5%. The simulation has been artificially continued beyond that point in order to show the evolution of the confidence interval.
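The stopping rule described above can be sketched in C as follows. This is a minimal illustration, not the CSX implementation: the function names are hypothetical, the interval is assumed to be a Student t interval built from one estimate per replication, and the caller is assumed to supply the critical value t_crit from a table.

```c
#include <assert.h>
#include <stddef.h>

/* Sample mean of the n per-replication estimates. */
static double sample_mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1). */
static double sample_variance(const double *x, size_t n, double mean)
{
    double ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - mean) * (x[i] - mean);
    return ss / (double)(n - 1);
}

/*
 * Hypothetical stopping test: returns 1 when
 *     half_width / |mean| < rel_err,
 * with half_width = t_crit * sqrt(variance / n).
 * Both sides are squared so that no square root is needed.
 */
int ci_converged(const double *x, size_t n, double t_crit, double rel_err)
{
    assert(x != NULL && n >= 2 && t_crit > 0.0 && rel_err > 0.0);

    double mean = sample_mean(x, n);
    double var = sample_variance(x, n, mean);
    double half_width_sq = t_crit * t_crit * var / (double)n;
    double limit = rel_err * mean;

    return half_width_sq < limit * limit;
}
```

For 10 replications and a 95% confidence level, t_crit would be 2.262 (two-sided Student t, 9 degrees of freedom), and rel_err would be 0.05 for the 5% criterion used in this work.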


The original simulation program has been executed on one of the PCs described above. This program uses the batch means analysis method, and convergence for the output variable “I/O mean response time” is reached in 64 minutes. Fig. 6 shows the evolution of the model output variable “I/O mean response time” and the behavior of the CSIM statistical analysis method; note that no confidence interval is reported until enough batches have been collected.

Fig. 5. A SAN with six switches and irregular topology (legend: switch, server, disk, free port, bidirectional link)

Fig. 7. I/O mean response time for each independent replication and CSX calculated mean (x-axis: simulated time, cycles/4000; y-axis: I/O mean response time; annotations: replications mean, reset of output variables)

Fig. 8. CSX computed mean and confidence interval (y-axis: I/O mean response time, cycles; annotations: 95% confidence interval, desired error for the confidence interval, end of simulation, response execution time for the slowest replication = 890 s)

Fig. 6. Original CSIM output analysis method behavior (x-axis: simulated time, cycles/10,000; y-axis: I/O mean response time, cycles)

Next, an independent replication has been spawned on each of the 10 available computers. The running replications send their statistical output variables asynchronously to the CSX tool, every 14 seconds on average. Fig. 7 shows the evolution of the values collected for the output variable in each replication, together with the evolution of the mean calculated by CSX.

The execution time until the end of the simulation in this experiment was 14.8 minutes. Compared with the 64 minutes of the original model, our concurrent approach was about 4.3 times faster. However, it used a total of 10 computers.

Replications (computers)   End of simulation (seconds)   Speed-up
Original (batch)           3840                          -
10                         890                           4.3
7                          1016                          3.8
4                          1090                          3.5

Table 1. Speed-up and number of replications
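The speed-up column of Table 1 is the serial execution time divided by the concurrent execution time. The short C sketch below reproduces that arithmetic. The efficiency function (speed-up per computer) is an added, derived metric that does not appear in the original table, and both function names are chosen for this example only.

```c
#include <assert.h>

/* Speed-up of a concurrent run relative to the serial batch means run. */
double speedup(double serial_seconds, double concurrent_seconds)
{
    assert(serial_seconds > 0.0 && concurrent_seconds > 0.0);
    return serial_seconds / concurrent_seconds;
}

/* Parallel efficiency: speed-up divided by the number of computers used. */
double efficiency(double serial_seconds, double concurrent_seconds,
                  int computers)
{
    assert(computers > 0);
    return speedup(serial_seconds, concurrent_seconds) / (double)computers;
}
```

With the times in Table 1, speedup(3840, 890) is about 4.31, speedup(3840, 1016) about 3.78, and speedup(3840, 1090) about 3.52. The corresponding efficiencies are roughly 0.43, 0.54 and 0.88 for 10, 7 and 4 computers, which shows how little each additional computer contributes beyond the first few.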

Table 1 shows the relation between speed-up and the number of computers. In this case, the number of replications equals the number of available computers. The relation is not linear: for example, with 4 computers and 4 independent replications, the execution time was 18.17 minutes, that is, about 3.5 times faster than the original simulation program. This effect is due to the initial model warm-up: batch means has only one transient period, whereas the replication method has a transient period for each replication. In this experiment, 6 replications give a good trade-off between speed-up and mean coverage.

Another important factor that influences speed-up is the difference in execution performance among replications. Fig. 9 shows the relation between simulated time and the response time of the replications. In this experiment each computer has run only one replication. As can be seen, some simulations are faster than others, so the end of the simulation is determined by the slowest replication. In this experiment the effect is due to the random stream used in each replication. In a truly heterogeneous environment, the effects of computer performance, input random stream, and load variation can be addressed using dynamic load balancing techniques.
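The warm-up explanation for the non-linear speed-up can be illustrated with a deliberately simplified model. The C sketch below is an assumption made for this example and was not fitted to the measurements in Table 1. It supposes that a serial run costs one warm-up period plus a steady-state period, and that each of n concurrent replications repeats the whole warm-up and then handles an equal share of the steady-state work.

```c
#include <assert.h>

/*
 * Hypothetical model of replication speed-up.
 *   warmup : time spent in the transient period of one run
 *   steady : steady-state time needed by the serial run
 *   n      : number of concurrent replications (one per computer)
 * Every replication repeats the warm-up, so only the steady-state
 * part is divided among the computers.
 */
double replication_speedup(double warmup, double steady, int n)
{
    assert(warmup >= 0.0 && steady > 0.0 && n >= 1);

    double serial = warmup + steady;
    double concurrent = warmup + steady / (double)n;
    return serial / concurrent;
}
```

Under this model the speed-up equals n only when the warm-up is negligible. For example, with a warm-up of 100 time units and 900 units of steady-state work, 10 replications give a speed-up of about 5.3 rather than 10, and no number of computers can exceed (warmup + steady) / warmup. The model ignores differences between replications, so the slowest-replication effect discussed above would reduce the speed-up further.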

Fig. 9. Execution time of each replication (y-axis: execution time, seconds; annotations: 1583 seconds, 1355 seconds, difference in response time between the fastest and the slowest replications; fitted line y = 2.5091x - 9.5729)

CONCLUSIONS AND FUTURE WORK
The recent advent of SANs as the new storage paradigm has motivated our department's interest in evaluating their performance. However, researchers using the simulator experienced long execution times, mainly due to the detailed modeling and to the simulation technique used (serial batch means). In this work we have reduced the long execution time of this complex SAN simulator by using concurrent independent replications. The tool that implements this parallel method works on a cluster of heterogeneous computers and can be applied to any discrete-event simulator.

As future work, we are enhancing the statistical output analysis method and applying dynamic load balancing techniques in order to improve the concurrent simulation tool.

REFERENCES
Farley M. Building storage area networks. McGraw-Hill. January, 2000.

User's guide: CSIM18 Simulation Engine (C version), Mesquite Software, Inc. 1998.

Perles A., Martí A., Serrano J.J. Clustered Simulation Experimenter: A tool for concurrent simulation on loosely coupled workstations. Proceedings of the 13th European Simulation Multiconference (ESM99). May, 1999.

McDougall M. H. Simulating Computer Systems. The MIT Press, Cambridge, Massachusetts. 1987.

Fujimoto R.M. Parallel discrete event simulation. Communications of the ACM. N. 10. October 1990.

Geist A., Beguelin A., Dongarra J., Jiang W., Manchek R. and Sunderam V. PVM3 user's guide and reference manual, Technical Report ORNL/TM-12187, Oak Ridge National Laboratory, May, 1994.

Clark T. Designing storage area networks: a practical reference for implementing fibre channel SANs. Addison-Wesley, 1999.

Molero X., Silla F., Santonja V., Duato J. Modeling and simulation of storage area networks. To appear in the proceedings of the 8th International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunications Systems, August, 2000.

BIOGRAPHY
Àngel Perles is an assistant professor in the Department of Computer Engineering (DISCA) at the Polytechnic University of Valencia. He is a member of the Fault Tolerant Systems group in this department. His research interests include the design and development of parallel and distributed simulation tools to efficiently solve the group's simulation models.
