Evaluating Application Mapping using Network Simulation

Tommi Salminen, Juha-Pekka Soininen
VTT Electronics, P.O. Box 1100 (Kaitoväylä 1), FIN-90571 Oulu, FINLAND
E-mail: [email protected]

Abstract

The quality of the mapping decisions made when selecting computation and storage resources for different parts of an application to be executed on a Network on Chip (NoC) has a high impact on the overall system performance. A network simulator is needed to be able to test different mapping configurations. This paper presents such a simulator and demonstrates how it can be used to compare different kinds of application mappings.

1. Introduction

When a complex system is designed on a single silicon chip, it may be favorable to design it as a network of separate processing units, a Network on Chip (NoC) [1]. The main benefits of an integration framework in the form of a network of embedded-system-type computing and storage resources are the separation of physical design issues from application development issues and the possibility of reusing system-level IP blocks [2]. In the interconnection, the traditional design-specific bus- and wire-based connections can be replaced by a network of switches that provide packet-based communication between the resources [3]. The network also inherently supports the use of multiple clock domains and asynchronous communication. The communication network therefore becomes a general-purpose and scalable part of an NoC that can potentially be reused like any other NoC IP block.

The NoC architectures that are based on embedding a more or less independent subsystem are only feasible for complex system platforms, where it is necessary to have multiple clock domains and heterogeneous computing resources. Because of this complexity, the whole NoC must be reusable in various product variants and generations, which sets requirements for programmability and configurability [4]. Therefore, in the design of NoC-based systems it is necessary to separate computer-like platform development from the software-system type of application development [5].

The optimization of communication between the resources is the dominating hardware design problem. The first problem is the selection of the algorithm used for routing the packets, and the second is the positioning of the IP blocks, e.g. computing and storage resources, in the network locations. Different approaches to solving these problems have been proposed [6], as have general-purpose platforms for an NoC communication network [2, 7]. During application mapping the problems are related to the optimization of system performance under a given application workload [8]. The system performance depends on how efficiently the computing resources can execute the application workload and how efficiently the network can handle the communication workload.

The hypothesis in this paper is that by studying application mapping patterns using network simulation and abstract workload models it is possible to make mapping decisions that lead to a more efficient use of the network and to savings in communication delays, buffer use and energy. We present a SystemC-based network simulator that enables the study of the performance of different application mappings. The simulator produces values for packet injection and transmission delays, packet hop counts and switch buffer use when the communication workload and application mappings are given. These values can be used to identify network hot spots and to estimate the communication delays the application will experience with a particular mapping.

2. Application mapping for NoC platforms

Application mapping means the mapping of functionality onto computation and storage resources. In this paper the resources are assumed to be fixed. The idea behind this assumption is that the implementation of the NoC is a fixed product platform, and mapping means mapping the product's functionality onto the platform. This does not exclude the possibility of having hardwired functions in the platform, but the focus is on the selection of computing resources for those functions that can be implemented in various places inside the NoC.

The NoC platform is assumed to be a two-dimensional mesh of embedded computers that communicate with each other via an asynchronous message-passing network. The length of the packet is fixed to one word. The computing resources can differ from each other, which sets constraints on the possible mappings, but this is not taken into account in this paper. Details about the platform architecture are given in [2].

The main phases of application mapping are:

1. the partitioning of functionality into mapping objects, e.g. functions,
2. the identification of feasible resources for those functions,
3. the selection of mappings, i.e. which resources are used for the execution of the functions,
4. the development of communication functions between objects, and
5. the implementation of the functions with the chosen resources.

The quality criteria and validation procedures are different in each phase. In this paper we focus on the third phase, the selection of the mapping. The problem is how to map application functions to resources so that communication is minimized. Minimizing the amount of communication and minimizing the distances that messages have to travel are the two most important issues. Both factors depend on how far apart we have to map the functions that communicate with each other. Local communication occurs inside a resource, and global communication involves messages sent through the network. The ratio of local to global communication is the measure of the amount of communication. The distances of the communication paths affect, for example, the amount of buffering capacity needed and the message delays. Long distances involve more latency and also increase the probability of network congestion.

The evaluation of network performance with a given mapping can be used as a decision support method at various abstraction levels, from analytical mathematical models down to the simulated execution of application programs in processor models. In this paper we have chosen to use network simulation and statistical workload models of the application. The main reasons are the expected complexity of NoC-based systems, the need to analyze platform-type architectures, and the need to perform evaluation at very early stages of design.
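The two criteria named in this section, the share of communication that leaves a resource and the distance that global messages travel, can be illustrated with a small Python sketch. It is not part of the simulator described in this paper: the function names, the traffic dictionary and the two placements are hypothetical, and the hop distance is assumed to be the Manhattan distance between mesh coordinates.

```python
def hop_distance(a, b):
    """Manhattan distance between mesh coordinates a=(row, col) and b=(row, col)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(traffic, placement):
    """Return (global_share, mean_hops) for one candidate mapping.

    traffic:   {(src_function, dst_function): packet_count}  (hypothetical input)
    placement: {function: (row, col)} mesh position of the chosen resource
    """
    total = sum(traffic.values())
    global_packets = 0
    weighted_hops = 0
    for (src, dst), count in traffic.items():
        d = hop_distance(placement[src], placement[dst])
        if d > 0:  # functions on different resources: global communication
            global_packets += count
            weighted_hops += d * count
    global_share = global_packets / total if total else 0.0
    mean_hops = weighted_hops / global_packets if global_packets else 0.0
    return global_share, mean_hops


# Assumed example workload: three functions exchanging 100 packets in total.
traffic = {("f1", "f2"): 60, ("f2", "f3"): 30, ("f1", "f3"): 10}
near = {"f1": (0, 0), "f2": (0, 0), "f3": (0, 1)}  # heavy talkers share a resource
far = {"f1": (0, 0), "f2": (4, 4), "f3": (0, 4)}   # functions spread over the mesh

near_cost = mapping_cost(traffic, near)
far_cost = mapping_cost(traffic, far)
```

With these assumed numbers the clustered placement sends 40% of the packets over the network at one hop each, while the spread placement sends all of them, at 6.4 hops on average. A static estimate of this kind ignores congestion, which is what the simulation is intended to capture.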

3. Evaluation of network performance

To evaluate the communication load in the network with different application mappings, a network simulator was designed in the SystemC language.

3.1. Principles of operation

The simulator consists of a two-dimensional mesh of simulated processing resources, each connected to the global communication network via a switch. Each switch is connected to four neighboring switches, except for those at the network boundaries. The communication load in the network is produced by the resources, each of which has a file that holds the list of packets the resource will send during the simulation. Each resource sends one packet per clock cycle to the resource defined by the destination address of the packet. A packet can also be defined as being sent inside the resource, in which case it does not contribute to the network load. Each packet hop between switches takes one clock cycle, as does the transmission of a packet between a resource and its switch. The resources consume the incoming packets as they arrive, so no computational delays are assumed in the resources.

For the simulations in this paper the network dimensions were chosen to be 5x5 with a switch output buffer length of two. For routing, a simple algorithm was chosen that first routes the packet to the destination row and then along the row to the destination column. At present the switch buffer length and network dimensions are given as compile-time parameters. However, some parts of the simulator still need manual work when the network dimensions are altered.
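The routing rule described above can be sketched in a few lines of Python. This is an illustration only, not the SystemC code of the simulator: the coordinate convention (row, col) and the function names are assumptions, and the latency helper reflects one reading of the timing rules, namely one cycle per switch-to-switch hop plus one cycle each for the resource-to-switch and switch-to-resource transfers, with no congestion.

```python
def route(src, dst):
    """Return the switch coordinates visited from src to dst (both inclusive).

    The packet first moves to the destination row, then along that row
    to the destination column.
    """
    row, col = src
    path = [(row, col)]
    while row != dst[0]:          # phase 1: reach the destination row
        row += 1 if dst[0] > row else -1
        path.append((row, col))
    while col != dst[1]:          # phase 2: move along the row to the column
        col += 1 if dst[1] > col else -1
        path.append((row, col))
    return path


def hop_count(src, dst):
    """Number of switch-to-switch hops on the route."""
    return len(route(src, dst)) - 1


def uncongested_latency(src, dst):
    """Cycles from sending resource to receiving resource without waiting (assumed model)."""
    return hop_count(src, dst) + 2


path = route((0, 0), (2, 1))
```

Because the route is fixed by the source and destination alone, two packets contend only when their paths share an output of the same switch in the same cycle, which is why the choice of mapping determines where congestion appears.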

3.2. Workload models

Each resource has a send probability table that defines the communication probability from that resource to the other network resources. In addition, a global/local ratio defines the probability of a packet being sent outside the resource. This value is used for changing the communication load of the network between the simulations. Before simulation the packet lists for the resources are automatically generated according to the probabilities defined in the send probability tables.
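A minimal Python sketch of how such a packet list could be generated is given below. The paper does not describe the file format or the generator itself, so the function names, the dictionary representation of the send probability table and the use of the source address to mark a local packet are all assumptions made for illustration.

```python
import random


def neighbor_table(src, rows, cols):
    """Hypothetical send probability table: equal probability to each one-hop neighbor."""
    r, c = src
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    inside = [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]
    return {dst: 1.0 / len(inside) for dst in inside}


def generate_packet_list(src, send_prob, global_ratio, n_packets, rng):
    """Return one destination per clock cycle for the resource at src.

    send_prob:    {destination: probability} over the other resources
    global_ratio: probability that a packet is sent outside the resource
    A destination equal to src marks a local packet (assumed convention).
    """
    dests = list(send_prob)
    weights = [send_prob[d] for d in dests]
    packets = []
    for _ in range(n_packets):
        if rng.random() < global_ratio:
            packets.append(rng.choices(dests, weights=weights, k=1)[0])
        else:
            packets.append(src)  # local packet, adds no network load
    return packets


rng = random.Random(1)  # fixed seed so a run can be repeated
table = neighbor_table((0, 0), 5, 5)
packets = generate_packet_list((0, 0), table, 0.5, 1000, rng)
```

Sweeping `global_ratio` from 0 to 1 while keeping the table fixed corresponds to the way the network load is varied between simulations.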

3.3. Performance monitors

To observe the network behavior, monitors are implemented to gather information on the data propagation in the network and the status of the switch buffers. During the simulation the monitors report the buffer status of each switch and the injection and transport delays of the packets as they are sent and received in each clock cycle. When the simulation is complete, the monitors compute and report statistical information about the communication load in the network. The statistical information includes maximum and average values of switch buffer utilization, injection delays of the packets, packet transport delays and the hop counts of the packets. The data is written to files in a format that is convenient to read into a visualization tool of choice for graphical analysis.

4. Experimental results

To test the network behavior in different types of applications, four communication patterns were created, representing the network communication that different application mappings could produce. In each case it was investigated how many packets could be delivered with a certain network load. This was done by running successive simulations, varying the global/local ratio of the resources' send probability tables. The application mapping quality in the different cases could be seen from the number of packets that could be handled with a similar network load. The transmission delays that the packets experienced were used as the measure of the network performance.

4.1. Workload patterns

Figure 1 shows the communication patterns that were chosen for the simulations and from which the send probability tables for the resources were generated. Case A represents a random communication pattern, where each resource sends packets to all of the other resources with equal probability. Case B represents a clustered communication model, where each resource communicates with all of its one-hop neighbors with equal probability. This also represents a distance-optimized case, since all the packets only have to travel one hop to reach the destination.

Case C represents a direction-optimized mapping, where one of the send directions is dominant in each resource so that the dataflow forms a pipe-like path. All the communication by the resources is to the closest neighbor only, except for one resource at the southwest corner that sends random packets representing global control to all resources. Case D is a superset of case B, since packets are also sent to the diagonally adjacent (corner) neighbors of each resource.

Figure 1. Communication patterns in the test cases (panels: case A, case B, case C, case D).

4.2. Simulation results

The waiting times the packets experience during transmission, representing the performance of the network with different global/local ratios, are presented in Figure 2. Similarly, the use of buffers is shown in Figure 3.

Figure 2. Average and maximum packet waiting times in the different cases (panels: AVG. WAITING TIME and MAX. WAITING TIME, each plotted against G/L [%] from 0 to 100, with one curve per case A, B, C and D).

Figure 2 shows the average and maximum waiting times the packets experience during propagation over the network. These waiting times are caused by network congestion, and they show the network behavior in different applications. It can be seen that the random communication pattern of case A gives substantially higher delay values than the other cases. This was expected, since this case has the potential for the largest average number of hops between the communicating resources and, therefore, the largest amount of overlap in the communication paths. The maximum waiting times for case A go outside the chart; the maximum value for case A at G/L=100% was 35. The second longest waiting times were reported in case D, and case C had slightly lower values. There are some unexpected variations in the maximum waiting time curves, such as the peak in case A at G/L=50%, that are caused by the random nature of the communication.

Case B did not have waiting times with any network load. This is because the communication was only between neighboring resources, so the packets did not interfere with each other at all. Therefore, case B would represent an ideal mapping where there is no network congestion present with any communication load.
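The difference in path length between the random pattern of case A and the neighbor-only pattern of case B can be checked with a short calculation. The Python sketch below is an illustration rather than part of the simulator: it assumes the 5x5 mesh used in the paper, counts switch-to-switch hops as the Manhattan distance between mesh positions, and uses hypothetical helper names.

```python
from itertools import product

N = 5  # mesh dimension used in the paper's simulations
nodes = list(product(range(N), range(N)))


def hops(a, b):
    """Switch-to-switch hops between two mesh positions (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mean_hops(destinations):
    """Average hop count when each resource sends with equal probability
    to every resource returned by destinations(src)."""
    per_source = []
    for src in nodes:
        dsts = destinations(src)
        per_source.append(sum(hops(src, d) for d in dsts) / len(dsts))
    return sum(per_source) / len(per_source)


def case_a(src):
    """Case A: every other resource is an equally likely destination."""
    return [n for n in nodes if n != src]


def case_b(src):
    """Case B: only the one-hop neighbors are destinations."""
    return [n for n in nodes if hops(src, n) == 1]


mean_a = mean_hops(case_a)
mean_b = mean_hops(case_b)
```

Under these assumptions a global packet in case A travels 10/3, or about 3.33, hops on average, against exactly one hop in case B, so each case A packet occupies several switch outputs along its path and has more opportunities to meet other packets.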

Figure 3. Average and maximum number of used buffers in the different cases (panels: AVG. NUMBER OF USED BUFFERS and MAX. NUMBER OF USED BUFFERS, each plotted against G/L [%] from 0 to 100, with one curve per case A, B, C and D).

Figure 3 shows the average and maximum values of the use of buffers in the network. The number of used buffers indicates the packet congestion. When a packet cannot be passed from one switch to the next one, it remains in the buffer of the sending switch until the receiving switch is ready for the transfer. The curves of the average number of buffers used are almost identical for cases B and C. Case D produces slightly higher numbers than B and C, and case A has a substantially higher number of used buffers than the others. The curves of the maximum number of buffers used have similar shapes to the curves of the average number of buffers used. Comparing Figure 2 with Figure 3, it is worth noticing that even with the highest waiting times of case A the maximum number of buffers used is only 62% of the total number of buffers, and the average number of buffers used is merely 43%.

5. Conclusions

This paper presented a network simulator written in the SystemC language. The simulator was used to evaluate the communication load of different application mappings to a fixed network of resources. The work showed that there are significant differences in the packet transmission delays and the number of occupied buffers when different application mappings are applied. The application mapping should, therefore, be evaluated with abstract models as early as possible in the NoC design. The network simulator presented in this paper can be used for evaluating different mappings.

Future work includes full parameterization of the simulator properties so that no manual work is needed when the network dimensions are changed. The creation of the workload models (in the form of send probability tables) will then remain the most time-consuming task in the simulation. The workload modeling should also be automated, and the models should be constructed so that they represent the communication patterns of real-life applications. Further automation would include automatically finding the optimal mapping for a given application to the network resources when the simulator is constrained by the packet delays, buffer sizes and network dimensions.

6. References

[1] Benini, L. & De Micheli, G., "Networks on chips: A new SoC paradigm", IEEE Computer, Vol. 35, No. 1, 2002, pp. 71-78.
[2] Kumar, S., Jantsch, A., Soininen, J.-P., Forsell, M., Millberg, M., Oberg, J., Tiensyrja, K., Hemani, A., "A network on chip architecture and design methodology", Annual Symposium on VLSI, IEEE Computer Society, 2002, pp. 105-112.
[3] Dally, W. & Towles, B., "Route packets, not wires: on-chip interconnection networks", Design Automation Conference, 2001, pp. 684-689.
[4] Soininen, J.-P., Jantsch, A., Forsell, M., Pelkonen, A., Kreku, J. and Kumar, S., "Extending platform based design to Network on Chip systems", 16th International Conference on VLSI Design, 2003, pp. 401-408.
[5] Keutzer, K., Malik, S., Newton, A. R., Rabaey, J. M., Sangiovanni-Vincentelli, A., "System Level Design: Orthogonalization of Concerns and Platform-Based Design", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 19, No. 12, 2000, pp. 1523-1543.
[6] Rijpkema, E., Goossens, K.G.W., Radulescu, A., Dielissen, J., van Meerbergen, J., Wielage, P., Waterlander, E., "Trade offs in the design of a router with both guaranteed and best-effort services for networks on chip", Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 350-355.
[7] Paulin, P.G., Pilkington, C., Bensoudane, E., "StepNP: a system-level exploration platform for network processors", IEEE Design & Test of Computers, Vol. 19, No. 6, 2002, pp. 17-26.
[8] Hu, J., Marculescu, R., "Exploiting the routing flexibility for energy/performance aware mapping of regular NoC architectures", Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 688-693.
