Simulation Modelling of Parallel Systems

T. Delaitre, G.R. Justo, F. Spies, S. Winter

University of Westminster, Centre for Parallel Computing, 12-14 Clipstone Street, London W1M 8JS

email: {delaitt, justog, spiesf, winters}[email protected]

In this paper, a simulation model for incorporation within a performance-oriented parallel software development environment is presented. This development environment is composed of a graphical design tool, a simulation facility, and a visualisation tool. Simulation allows the performance of a parallel program to be predicted and design alternatives to be compared. The target parallel system models a virtual machine composed of a cluster of workstations interconnected by a local area network. The simulation model architecture is modular and extensible, which allows the platform to be re-configured. The model description and the validation experiments which have been conducted to assess the correctness and the accuracy of the model are also presented.

1 Introduction

The key obstacle to the widespread adoption of parallel computing is the difficulty of program development. Firstly, an application has to be decomposed into parallel objects (processes, or tasks) according to the computational model underlying the programming language. Secondly, the parallel hardware configuration has to be specified. Finally, the processes are mapped onto the hardware. The range of design choices available to the parallel program designer at each of these three stages can be immense. This has led to highly-optimised platform-specific solutions, which are not easily ported to other platforms.

This project is funded by an EPSRC PSTPA programme, Grant Number GR/K40468, and also by EC Contract Number CIPA-C193-0251, CP-93-5383.


Rapid prototyping is a useful approach to the design of (high-performance) parallel software in that complete algorithms, outline designs, or even rough schemes can be evaluated at a relatively early stage in the development life-cycle, with respect to possible platform configurations and mapping strategies. Modifying the platform configurations and mappings will permit the prototype design to be refined, and this process may continue in an evolutionary fashion throughout the life-cycle. However, appropriate approaches (techniques) to the evaluation of performance are required at each iteration. The three main approaches are: measurement, analytical modelling and simulation [13]. Measurement can only be applied to existing systems, and the results are affected by a range of run-time factors arising within the system and the measurement process itself. Analytical modelling (the development and study of models whose solutions are tractable by mathematical techniques) enables exact descriptions of a system's behaviour to be developed, but restrictions on the system functionality and workload are usually introduced to make the model solvable, resulting in loss of accuracy. Simulation of models which are often analytically intractable overcomes both these difficulties, enabling the behaviour of arbitrarily complex software and hardware systems to be treated at any level of detail. Modelling abstractions may be necessary to improve the performance of the simulation, but this is generally less restrictive than in the case of the analytical approach. Simulation thus helps the designer to identify those components which limit the capacity of the system (bottlenecks), and allows performance estimates of the application to be obtained at early stages of its development.

In the next section, we describe several modelling tools. In Section 3, we present techniques used for developing simulations. Our simulation model is presented in Section 4 and results of its validation are presented in Section 5. Finally, in Section 6, we present the main conclusions of the paper and future work.

2 Parallel System Performance Modelling Tools

The current trend in parallel software modelling tools is to support all the software performance engineering activities in an integrated environment [18]. A typical toolset should be based on at least three main tools: a graphical design tool, a simulation facility and a visualisation tool. The graphical design tool and the visualisation tool should coexist within the same environment to allow information about the program behaviour to be related to its design. Many existing toolsets comprise only a subset of these tools, and visualisation is usually a separate tool. In addition, the target parallel system is typically a transputer-based multiprocessor machine, and the modelling of the operating system is usually not addressed (except in the PEPS toolset [19]).

One of the earliest toolsets for parallel programming was the Transim/Gecko toolset [11,26], developed under the Parsifal project at the University of Westminster. The designer can rapidly evaluate different designs of an occam-like program running on a transputer-based multiprocessor by using the graphical tool (Gecko) to animate the traces generated by the simulator (Transim). Gecko also allows occam processes to be re-mapped graphically and the application to then be re-simulated. The Transim/Gecko approach has been the model for other simulation-based environments such as MIMD [24,2], developed at the University of Edinburgh. MIMD (Multiple Instruction stream, Multiple Data stream) is a modelling environment for studying the performance of parallel programs. MIMD is built on top of DEMOS [1] and Simula. DEMOS is an extension of Simula which contains classes suitable for discrete-event simulations. MIMD provides classes for modelling message-passing parallel programs running on distributed-memory architectures. The existing classes allow occam programs running on transputer machines to be modelled. Transim/Gecko and MIMD have similar features, except that MIMD defines an experimental framework for investigating the effects of varying certain parameters which characterise a parallel program's run-time behaviour.

A more recent toolset, which supports the development of real-time applications based (primarily) on transputers and PowerPCs, is the HAMLET Application Development System [20]. It combines graphical design tools, simulation techniques, and performance traces. In particular, it consists of a design entry system (DES), a specification simulator (HASTE), a debugger and monitor (INQUEST), and a trace analysis tool (TATOO). A key feature of HAMLET is the ability to produce both code suitable for the simulator and code suitable for real execution from the same graphical design. Also, performance traces obtained from the simulation and from the monitoring tools have the same format, so the visualisation tool can be used for both simulation and monitoring. The limitation of HAMLET is that its hardware and software libraries are restricted to transputers and PowerPCs.

The toolsets described above usually assume the target parallel system to be a physical machine. Other toolsets, however, such as the one presented in this paper, are based on a virtual machine platform such as PVM. The PEPS project [19] aims to investigate benchmarks, modelling, characterisation, and monitoring of PVM programs for transputer-based platforms. The aim of performance modelling in PEPS is to develop a tool for the performance evaluation of computer architectures. PEPS uses the Simulog simulation toolset, including MODARCH, which offers a range of software and hardware components. The library of objects allows PVM programs running on a network of transputers to be modelled. This model of PVM is much simpler than that of PVM within a heterogeneous distributed computing environment in which all nodes share a single communication medium and where contention occurs.

Recent projects now include heterogeneous architectures. Usually, they use a message-passing environment as an intermediate layer to achieve this goal. The objectives of the VPE project [17] are the design and the monitoring of parallel programs within the same tool. The design is described as a graph in which the nodes represent sequential computation or a reference to another VPE graph. Performance analysis and graph animation are not the targets, but the design aspect of this work is one of the most elaborate. In contrast, the PARADE project [25] is mainly oriented towards animation. PARADE is divided into a general animation approach, called POLKA, and specific animation developments such as PVaniM for PVM, GThreads for threads, and HPF. This work does not include any graphical design of parallel programs, so the predefined animations and views can limit the user's understanding. However, one of its most important points is the clear distinction between general and specific concepts. In the SEPP project [29,6] (Software Engineering for Parallel Processing), a toolset based on five types of tools has been developed: static design tools, dynamic support tools, behaviour analysis and simulation tools, and visualisation tools [14,9,15]. The programming model is also based on a subset of PVM.

3 Simulation in EDPEPPS/SEPP

The EDPEPPS [5] (Environment for Design and Performance Evaluation of Portable Parallel Software) toolset being developed at the University of Westminster includes parts of the SEPP toolset. It is based on a rapid prototyping philosophy in which the designer synthesises a model of the intended software which may be simulated, and the performance subsequently analysed using visualisation. The toolset combines a graphical design tool (PVMGraph), a simulation facility, and a visualisation tool (PVMVis). The same design is used to produce code suitable for simulation and for real execution. The results of the simulation are an event trace file and some statistical information about the virtual machine. The graphical design tool is based on the PVM programming model.

Simulation of the PVM platform is built using a state-of-the-art simulation environment, SES/Workbench [21,22]. The simulation, as in Transim, is based on discrete-event modelling, which can provide a remarkable degree of accuracy. The technology is well-established, and sophisticated modelling tools are available commercially. SES/Workbench, for example, has wide functionality meeting the requirements of computer system modelling, and includes a time-saving graphical design interface and animation-based visualisation capabilities.

Techniques for parallelising discrete-event simulation programs are also well-established, thus providing a route for optimising the run-time performance of the simulator. SES/Workbench has been used both to develop and to simulate platform models. Thus the simulation engine in SES/Workbench is an intrinsic part of the toolset.
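Discrete-event modelling advances a simulated clock from one scheduled event to the next rather than in fixed time steps. The following minimal sketch illustrates that core idea only; it is a generic illustration in C, not SES/Workbench code (whose models are built graphically), and the event labels and times are purely illustrative.

#include <stdio.h>
#include <stdlib.h>

/* Generic discrete-event core: a time-ordered pending-event list and a
 * clock that jumps from event to event. Illustrative only. */
typedef struct event {
    double        time;               /* simulated time of occurrence */
    const char   *label;              /* what happens at that time    */
    struct event *next;
} event;

static event *pending = NULL;

static void schedule(double t, const char *label)
{
    event *e = malloc(sizeof *e), **p = &pending;
    e->time = t;
    e->label = label;
    while (*p && (*p)->time <= t)     /* keep the list sorted by time */
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

int main(void)
{
    double clock = 0.0;
    schedule(0.002, "packet arrives at daemon");
    schedule(0.001, "task starts computation");
    schedule(0.005, "computation ends");

    while (pending) {                 /* main simulation loop         */
        event *e = pending;
        pending = e->next;
        clock = e->time;              /* jump the clock to the event  */
        printf("t=%.3f s: %s\n", clock, e->label);
        free(e);
    }
    return 0;
}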

4 The EDPEPPS/SEPP Simulation Model

The EDPEPPS/SEPP simulation model consists of the PVM platform model library and the PVM programs to be simulated. The PVM platform model is partitioned into three layers (Figure 2): the message-passing layer, the operating system layer and the hardware layer. Modularity and extensibility are two key criteria in simulation modelling, and the layers are therefore decomposed into modules, which permits a re-configuration of the entire PVM platform model. The initially modelled configuration consists of a PVM environment which uses the TCP/IP protocol and a cluster of heterogeneous workstations connected to a single 10 Mbit/s Ethernet network. A PVM program generated by the PVMGraph graphical design tool is translated into the SES/Workbench simulation model language and passed to the SES/Workbench simulation engine, where it is integrated with the platform model for simulation.

4.1 SimPVM - A Simulation-Oriented Language

PVMGraph allows PVM applications to be developed using a combination of graphical objects and text. From this description, executable and "simulatable" PVM programs can be generated, but annotations must be inserted into the graphical/textual source to control the simulation. All simulation models in EDPEPPS/SEPP are based on queueing networks. The "simulatable" code generated by PVMGraph is predominantly a description of the software rather than of the execution platform. It is written in a special intermediary language called SimPVM, which defines an interface between PVMGraph and SES/Workbench. To simulate the application, a model of the intended platform must also be available. Thus, the simulation model is fundamentally partitioned into two sub-models: a dynamic model described in SimPVM, which consists of the application software description and some aspects of the platform (e.g. the number and type of hardware nodes), and a static model which represents the underlying parallel platform.

By building the static descriptions into the simulation system itself, the service is transparent to the application designer, yet provides a virtual, simulated platform for the generated application.

Three levels of description are available to define a parallel program. During the early stages of design, delay functions can be used to simulate a block of code. Later, the description of this block may be improved by using probabilistic evaluation to simulate a conditional loop or branch. Finally, the remaining simulation facilities are replaced by real data and instructions. It is possible to mix these three types of description within the same parallel application, because the required accuracy may differ between blocks of code.

The SimPVM language basically contains the following elements:

- the list of processes to be initially executed (exec), and the host (identification) number where they execute;
- the description of processes (process) and functions (function); in this version, processes cannot be parameterised;
- C instructions for variable declarations, loops (for and while), conditional instructions (if, else), and assignments;
- PVM functions for process management (e.g. pvm_mytid and pvm_spawn), buffer management (e.g. pvm_getsbuf) and point-to-point communication (e.g. pvm_send);
- simulation constructs such as computation delay functions and statistical variables.

The time parameter of the delay function is treated as time of intensive processor use. The probabilistic evaluation is a boolean function which may be true according to its expression parameter. Thanks to these facilities, it is possible to analyse the performance of the general program behaviour, and not only of an execution which depends on a specific set of data. SimPVM lies above the LIBPVM level of the platform model, but some special functions are also provided to allow direct interaction with the kernel model. A SimPVM program is then translated into an SES/Workbench simulation model, where lines of the program are interpreted as simulation objects (the SES/Workbench simulation model language is defined in terms of graphical nodes).
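The mixing of the three levels of description can be illustrated with a plain C sketch. The SimPVM concrete syntax is not reproduced here; delay() and prob() below are hypothetical stand-ins for SimPVM's computation-delay and probabilistic-evaluation constructs (their real names and signatures are assumptions), while the pvm_* calls are the standard PVM library functions.

#include <stdlib.h>
#include <unistd.h>
#include "pvm3.h"

/* Stand-ins for SimPVM's computation-delay and probabilistic-evaluation
 * constructs; names and semantics are assumed for illustration only.  */
static void delay(double seconds) { usleep((useconds_t)(seconds * 1e6)); }
static int  prob(double p)        { return drand48() < p; }

/* Sketch of a worker task mixing the three levels of description. */
static void worker(void)
{
    int parent = pvm_parent();        /* real PVM call: id of spawner  */
    int block;

    pvm_recv(parent, 1);              /* real PVM call: wait for work  */
    pvm_upkint(&block, 1, 1);

    delay(12.5e-3);                   /* level 1: 12.5 ms of modelled
                                         computation instead of code   */

    if (prob(0.2))                    /* level 2: branch taken with
                                         probability 0.2               */
        delay(3.0e-3);                /* extra work on some blocks     */

    pvm_initsend(PvmDataDefault);     /* level 3: real data and calls  */
    pvm_pkint(&block, 1, 1);
    pvm_send(parent, 2);
    pvm_exit();
}

int main(void) { worker(); return 0; }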

4.2 The PVM Platform Model

The PVM message-passing layer (Figure 2) models a single (parallel) virtual machine dedicated to a user. It is composed of a daemon, which resides on each host making up the virtual machine, and the library, which provides an interface to PVM services.



The daemon acts primarily as a message router. It is modelled as an automaton or state machine, which is a common construct for handling events. The life-cycle of the state machine corresponds to the main function of the daemon. The LIBPVM library allows a task to interact with the daemon and with other tasks. It contains functions for packing/unpacking messages, managing multiple buffers, message passing and process control. The library is structured into two layers: the top-level layer includes most of the PVM programming interface functions, and the bottom level is the communication interface with the local daemon and other tasks.

The major components in the operating system layer (Figure 2) are the System Call Interface, the Process Scheduler, and the Communication Module. The Communication Module is structured into three sub-layers: the Socket Layer, the Transport Layer and the Network Layer. The Socket Layer provides a communications endpoint within a domain. The Transport Layer defines the communication protocol (either TCP or UDP). The Network Layer implements the Internet Protocol (IP), which acts as a message router.

The Hardware Layer (Figure 2) comprises hosts and the communications subnet. Each host is modelled as a single-server queue with a time-sliced round-robin scheduling policy. The communications subnet is Ethernet, whose performance depends on the number of active hosts and the packet characteristics. Resource contention is modelled using the CSMA/CD (Carrier Sense Multiple Access with Collision Detection) protocol. The basic notion behind this protocol is that a broadcast has two phases: propagation and transmission. During propagation, packet collisions can occur. During transmission, the carrier sense mechanism causes the other hosts to hold their packets.
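The host model can be illustrated with a rough sketch of time-sliced round-robin sharing. This is a minimal illustration only, not the SES/Workbench submodel itself; the CPU demands and the quantum below are arbitrary values chosen for the example.

#include <stdio.h>

/* Minimal sketch of a time-sliced round-robin host: tasks with given CPU
 * demands share one processor, each receiving QUANTUM seconds per turn. */
#define NTASKS  3
#define QUANTUM 0.01   /* 10 ms time slice (illustrative value) */

int main(void)
{
    double remaining[NTASKS] = {0.035, 0.020, 0.050}; /* CPU demand (s) */
    double finish[NTASKS] = {0.0};
    double clock = 0.0;
    int done = 0;

    while (done < NTASKS) {
        for (int i = 0; i < NTASKS; i++) {
            if (remaining[i] <= 0.0) continue;        /* task finished  */
            double slice = remaining[i] < QUANTUM ? remaining[i] : QUANTUM;
            clock += slice;                           /* i uses the CPU */
            remaining[i] -= slice;
            if (remaining[i] <= 0.0) { finish[i] = clock; done++; }
        }
    }
    for (int i = 0; i < NTASKS; i++)
        printf("task %d finishes at %.3f s\n", i, finish[i]);
    return 0;
}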

4.3 Platform Model Verification and Validation

After developing our simulation model, a verification and validation step is necessary to guarantee that it gives a relevant performance evaluation of the real system. Parts of our model validation have been conducted from the system resources layer up to the application layer. Validation takes the form of comparative measurement; in the case of Ethernet, we compare our results against published measurements [23,4].

Functional modelling aims to reproduce the logical flow of the system being modelled (the target system). Functional model verification, whose aim is to ensure that the simulation program performs the target system functions correctly, is therefore similar to ordinary program debugging; program tracing and animation are two valuable techniques for this purpose. Once the functional model is verified, the next step is to model performance.

The purpose of performance modelling is to establish timing and other parameters (by measuring the target system). This model is validated by exhaustive experimentation on the target system and on the model, with the aim of obtaining comparative measurements between the two.

The methodology adopted to validate the PVM platform model was a bottom-up approach. The components of the Hardware Layer were first validated in isolation from the other layers. Then the Operating System Layer was added, followed by the Message-Passing Layer. For each layer, appropriate performance measures were defined, and statistical components were added on top of each functional model to form a performance model. Results of the validation will be presented in Section 5.

5 Case study: CCITT H.261 Decoder

The various layers of the platform simulation model have been validated experimentally, and the results presented in [7]. To validate the platform as a whole, a PVM application, generated from the PVMGraph design tool, has been developed. The application chosen is the parallel-pipeline model of a standard image processing algorithm, the H.261 decoder [3], proposed by Downton et al. [8]. The parallel-pipeline, or pipeline processor farm (PPF), model is a generic approach to parallel algorithm design which combines two standard decomposition methods: pipelining and process farming. The H.261 algorithm decomposes into a three-stage parallel pipeline: frame initialisation (T1); frame decoder loop (T2); and frame output (T3). The first and last stages are inherently sequential, whereas the middle stage contains considerable data parallelism and can be decomposed into a parallel farm. Thus, for example, the PPF topology for a middle-stage farm of 5 tasks is shown in Figure 3.

The number of possible topologies which solve a given problem is clearly very large, even for the H.261 algorithm. The PPF model thus implicitly provides a rich set of experiments for validation of the simulator; some of these results are described in the following section. The same topological variation in the PPF model leads directly to performance variation in the algorithm, which is typically only poorly understood at the outset of design. One of the main purposes of the simulation tool in this case is to enable a designer to identify the optimal topology, quickly and easily, without resort to run-time experimentation.
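A back-of-the-envelope estimate shows why such topological variation matters and why a fuller pipeline scales better. The sketch below treats the PPF as a three-stage pipeline whose middle-stage time shrinks with the number of workers and ignores communication entirely; the stage times are assumed illustrative values, not the measured timings of [8].

#include <stdio.h>

/* Rough pipeline-processor-farm makespan estimate (communication ignored).
 * Stages 1 and 3 are sequential; stage 2 is farmed over w workers. */
static double maxd(double a, double b) { return a > b ? a : b; }

static double makespan(int frames, int workers,
                       double t1, double t2, double t3)
{
    double s2 = t2 / workers;                 /* farmed middle stage   */
    double longest = maxd(t1, maxd(s2, t3));  /* pipeline bottleneck   */
    return t1 + s2 + t3 + (frames - 1) * longest;
}

int main(void)
{
    const double t1 = 0.1, t2 = 2.0, t3 = 0.1;   /* seconds, assumed   */
    for (int frames = 1; frames <= 5; frames += 4) {
        double base = makespan(frames, 1, t1, t2, t3);  /* 1-worker baseline */
        printf("%d frame(s):\n", frames);
        for (int w = 1; w <= 5; w++)
            printf("  %d worker(s): speed-up %.2f\n",
                   w, base / makespan(frames, w, t1, t2, t3));
    }
    return 0;
}

Under these assumptions the 5-frame case approaches the number of workers much more closely than the 1-frame case, which is the qualitative behaviour reported in Section 5.2.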

5.1 Graphical Representation Using PVMGraph

There are compelling reasons to represent (program) designs graphically. One of them is that exposing the software structure as a description of the constituent software components and their patterns of interconnection provides a clear and concise level at which to specify and design systems. Message-passing parallel programs have naturally graphical structures, as parallel activities can be represented by the nodes of a graph and message flow can be denoted by the arcs.

This explains the popularity of graphical representations for parallel systems (for a survey refer to [16]). Furthermore, a clear description of a parallel program is important during mapping and load balancing, which are essential activities in the development of a parallel program.

The graphical representation we have developed for EDPEPPS/SEPP tries to balance the design structure and the behaviour description of the components in a single graphical representation. The principle is that the design of a parallel program consists of nodes (tasks) and arcs (message flow), but the graph must be enriched with special symbols which correspond to important aspects of the behaviour of the tasks. Since EDPEPPS/SEPP supports the development of PVM programs, these aspects refer to the operations of PVM. In this way, special symbols have been defined for most of the PVM operations. Figure 4 illustrates the graphical representation of the H.261 decoder using PVMGraph.

A PVM program design consists basically of a collection of processes (tasks) which interact by message passing. A task is the basic component of the design and is denoted by a box with its name. In the PPF example shown in Figure 3, there are three types of tasks: T1, which corresponds to the frame initialisation; T2, which corresponds to the frame decoder loop; and T3, which corresponds to the frame output, as shown in Figure 4. Many instances of the same task can be created; for example, T2 has 5 instances. All instances of a task share the same code. A task may contain sub-tasks, forming a hierarchical structure. This type of task, called a composed task, is represented with double boxes, for example task T1 in Figure 4. A task becomes a composed task when it calls the PVM pvm_spawn operation, which is denoted by a small circle and a directed dashed line from the parent task to the spawned task. In Figure 4, T1 points to the instances of task T2 and to task T3, which means that T1 spawns all the other tasks in the application.

The communication between tasks is carried out by calling PVM communication operations to send and receive messages, and each operation has its own symbol.

In Figure 4, four types of communication operations used by the tasks are shown. Task T1 uses a family of outputs to represent a loop of sends (a for loop containing a pvm_send); see the sketch at the end of this subsection. A family of outputs is denoted by two overlapping small triangles pointing out of the box. Similarly, task T3 uses a family of inputs (a for loop containing a pvm_recv), which is denoted by two overlapping small triangles pointing into the box. Task T2 uses two operations: an input (pvm_recv), denoted by a small triangle pointing into the box, and an output (pvm_send), denoted by a small triangle pointing out of the box.

Tasks which have compatible interfaces (communication operations) can be connected. PVMGraph performs different consistency checks when the user tries to connect two tasks. The basic check consists of evaluating the direction of the communication; for example, only inputs can be linked to outputs. Other checks depend on the type of the operation. An interaction is denoted by a solid line connecting the interfaces, as illustrated in Figure 4 between tasks T1 and T2, and between tasks T2 and T3. Unfortunately, in this paper we are not able to give all the details of the EDPEPPS/SEPP graphical representation, but the description presented in this section outlines the expressiveness of the representation. The first version of the PVMGraph editor has been completed and is under test.
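The family of outputs in T1 and the family of inputs in T3 described above correspond to simple loops of PVM calls. A hypothetical C fragment (the task ids, message tags and data packed are assumptions for illustration) might look like this:

#include "pvm3.h"

#define NWORKERS 5

/* Family of outputs in T1: one pvm_send per worker (assumed ids/tags). */
void distribute(int worker_tid[NWORKERS], int block[NWORKERS])
{
    for (int i = 0; i < NWORKERS; i++) {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&block[i], 1, 1);
        pvm_send(worker_tid[i], 1);      /* tag 1: a block of work      */
    }
}

/* Family of inputs in T3: one pvm_recv per expected result. */
void collect(int result[NWORKERS])
{
    for (int i = 0; i < NWORKERS; i++) {
        pvm_recv(-1, 2);                 /* tag 2: result, any sender   */
        pvm_upkint(&result[i], 1, 1);
    }
}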

5.2 The Validation Experiments

CCITT H.261 is an image encoding algorithm. Passing an image (or frame) through the pipeline processor farm is, in simulation terms, a form of transaction. The validation experiments described here are based on two transaction scenarios: in the former, a single image is processed; in the latter, five images are pipelined through the PPF. Validation is achieved by running the same algorithm on the simulated platform and on the corresponding real platform.

A frame is a 352 x 288 8-bit image. Each frame is partitioned into 396 pels, where a pel is a 16 x 16 block sub-image (352/16 x 288/16 = 22 x 18 = 396). Stages T1 and T3 of the algorithm process whole images sequentially. In stage T2, a frame, broken into pels, allows each member of the farm to work independently in parallel (the Single Program, Multiple Data (SPMD) model). The principal architectural variation is in the middle stage of the PPF. In the experiments, the number of processors in the middle stage is varied from 1 to 5. In every case, the load (i.e. the number of pels) is evenly balanced between the processors. A minor architectural variation is available in the first and last tasks (T1 and T3): in this case, because these tasks are felt to be relatively light, T1 and T3 have been mapped onto the same processor.

Thus, the experiments are based on an architecture which ranges between 2 and 6 processors. The target platform is a heterogeneous network of up to 6 workstations (SUN4s, SuperSparcs and PCs). Timings for the three algorithm stages were extracted from [8] and inserted as time delays directly into both the simulation model and the execution model. Thus, differences in computational speed arising from differences in CPU speed are not included. However, CPU and architectural differences are accounted for in the communication model, which is implicit in the real execution environment and built into the simulation model. This experiment is suitable for validation purposes (since the simulated and real trials are identically structured), but may not be appropriate for predicting the real performance of the H.261 algorithm, in view of the approximate nature of the assumed computation times.

Figure 5 shows the experimental results. For each transaction scenario, speed-up is a normalised measure of execution time referred to a baseline experiment consisting of a single worker in the middle (T2) stage, i.e. a total of 2 processors. It may be observed that speed-up increases monotonically with the number of workers in T2, but that the 5-frame scenario performs significantly better (in terms of scalable speed-up) than the 1-frame scenario. This is not unexpected, since the pipeline is fuller in the former case. In both scenarios, the simulator tracks the actual performance very well; the difference between the simulated and real execution speed-up never exceeds 10% of the real execution time.

6 Conclusion

This paper has described the simulation subsystem underpinning EDPEPPS/SEPP, a toolset to support a performance-oriented parallel program design method. The toolset supports graphical design, performance prediction through modelling and simulation, and visualisation of predicted program behaviour. The designer is not required to leave the graphical design environment to view the program's behaviour, since the principal visualisation facility is an animation of the graphical program description, and the transition between the design and visualisation viewpoints is virtually seamless. It is intended that this environment will encourage a philosophy of program design, based on a rapid synthesis-evaluation design cycle, in the emerging breed of parallel programmers.

Success of the environment depends critically on the accuracy of the underlying simulation system. Preliminary experiments for validating the PVM-based platform model have been very encouraging, demonstrating a model error below 10%. The speed of the simulator, which is also a critical factor, will be addressed through parallelisation of the simulator.

Acknowledgement

We would like to thank Romain Bigeard for developing part of the experiments. One of the authors (Thierry Delaitre) also wishes to acknowledge the support and contributions of his supervisor, Dr. Stefan Poslad, in the simulation aspects of the work reported in this paper.

References

[1] G.M. Birtwistle, Discrete Event Modelling on Simula (Macmillan, London, 1986).

[2] R. Candlin and N. Skilling, A modelling system for the investigation of parallel program performance, Computer Performance Evaluation 6(1) (1992) 1-32.

[3] CCITT, Draft revision of recommendation H.261: video codec for audiovisual services at p × 64 kbit/s, Signal Processing: Image Communication 2(2) (1990) 221-239.

[4] D.K. Choi and B.G. Kim, The expected (not worst-case) throughput of the Ethernet protocol, IEEE Transactions on Computers 40 (1991) 245-252.

[5] T. Delaitre, G.R. Justo, F. Spies and S. Winter, An environment for the design and performance evaluation of portable parallel software, Technical Report EDPEPPS/6, University of Westminster, UK, 1996.

[6] T. Delaitre, E. Luque, R. Suppi and S. Taylor, Simulation of parallel systems in SEPP, in: A. Pataricza, ed., The Eighth Symposium on Microcomputer and Microprocessor Applications 1 (1994) 294-303.

[7] T. Delaitre, F. Spies and S. Winter, Simulation modelling of parallel systems in the EDPEPPS project, in: C.R. Jesshope and A.V. Shafarenko, eds., UKPAR'96 Conference (Springer, 1996) 1-13.

[8] A.C. Downton, R.W.S. Tregidgo and A. Cuhadar, Top-down structured parallelisation of embedded image processing applications, IEE Proc.-Vis. Image Signal Process. 141(6) (1994) 431-437.

[9] G. Dozsa, T. Fadgyas and P. Kacsuk, A graphical programming language for parallel programs, in: Proceedings of the Symposium on Microcomputer and Microprocessor Applications (Budapest Technical University) 1 (1994) 304-314.


[10] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek and V. Sunderam, PVM: Parallel Virtual Machine (MIT Press, 1994).

[11] E.R. Hart and S.J. Flavell, Prototyping transputer applications, in: H.S.M. Zedan, ed., Real-Time Systems with Transputers (IOS Press, Amsterdam, 1990) 241-247.

[12] J.A. Hupp and J.F. Schoch, Measured performance of an Ethernet local network, Communications of the ACM 23(12) (1980) 711-720.

[13] R. Jain, The Art of Computer Systems Performance Analysis (Wiley, 1991).

[14] P. Kacsuk, G. Dozsa and T. Fadgyas, Designing parallel programs by the graphical language GRAPNEL, Microprocessing and Microprogramming 41 (1996) 625-643.

[15] P. Kacsuk, G. Haring, S. Ferenczi, G. Pigel, G. Dozsa and T. Fadgyas, Visual parallel programming in Monads-DPV, in: Proceedings of the 4th Euromicro Workshop on Parallel and Distributed Processing (1996) 344-351.

[16] P. Newton, A graphical retargetable parallel programming environment and its efficient implementation, Technical Report TR93-28, Dept. of Computer Sciences, University of Texas at Austin, 1993.

[17] P. Newton and J. Dongarra, Overview of VPE: a visual environment for message-passing, in: Heterogeneous Computing Workshop, 1995.

[18] C.M. Pancake, M.L. Simmons and J.C. Yan, Performance evaluation tools for parallel and distributed systems, Computer 28(11) (1995) 16-19.

[19] PEPS Partners, PEPS bulletin, the bulletin of the Performance Evaluation of Parallel Systems project, EEC Esprit 6942, 1993.

[20] P. Pouzet, J. Paris and V. Jorrand, Parallel application design: the simulation approach with HASTE, in: High-Performance Computing and Networking 2 (1994) 379-393.

[21] Scientific and Engineering Software Inc., SES/Workbench User's Manual, Release 2.1, 1992.

[22] Scientific and Engineering Software Inc., SES/Workbench Reference Manual, Release 2.1, 1992.

[23] N. Shacham and V.B. Hunt, Performance evaluation of the CSMA/CD (1-persistent) channel-access protocol in common-channel local networks, in: Proceedings of the IFIP TC6 International In-Depth Symposium on Local Computer Networks (North-Holland, 1982) 401-414.

[24] N. Skilling, MIMD: a multiple instruction stream multiple data stream computer simulator, Technical Report TR9107, Dept. of Chemical Engineering, University of Edinburgh, 1991.

[25] J.T. Stasko, The PARADE environment for visualizing parallel program executions, Graphics, Visualization and Usability Center, 1995.


[26] M. Stephenson and O. Boudillet, GECKO: a graphical tool for the modelling and manipulation of occam software and transputer hardware topologies, in: C. Askew, ed., Occam and the Transputer - Research and Applications (IOS Press, 1990) XXX.

[27] F.A. Tobagi and V.B. Hunt, Performance analysis of carrier sense multiple access with collision detection, in: Proceedings of the LACN Symposium (1979) 217-245.

[28] H. Wabnig and G. Haring, PAPS: the parallel program performance prediction toolset, in: 7th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation (1994) XXX.

[29] S.C. Winter and P. Kacsuk, Software engineering for parallel processing, in: A. Pataricza, ed., The Eighth Symposium on Microcomputer and Microprocessor Applications 1 (1994) 285-293.


Fig. 1. The EDPEPPS architecture. (Figure: block diagram relating PVMGraph, the SimPVM translator and the SimPVM modelling language file, the PVM application, the SES/Workbench graph file, the PVM platform model graph files (PVMD, LIBPVM, OS, system resources), the SES/Workbench simulation engine and GUI, and the resulting trace and statistics files.)

Fig. 2. Simulation model architecture. (Figure: layered diagram with an application layer of PVM applications; a message-passing layer containing PVMD and LIBPVM; an operating system layer containing the system call interface, the process scheduler, and the socket, transport and network layers; and a hardware layer of system resources.)

Fig. 3. PPF topology for a three-stage pipeline with five workers in the second stage. (Figure: task graph over processors 0-6, with T1 feeding five instances of T2, which feed T3.)

Fig. 4. PVMGraph main window.


Fig. 5. Comparison between EDPEPPS/SEPP simulator performance predictions and real experiments. (Figure: plot of PPF speed-up against the number of processors, 1 to 6, with curves for the EDPEPPS simulator and the real experiments in both the 5-frame and 1-frame scenarios.)
