OSIRIS: Automated Synthesis of Flat and Hierarchical Bus Architectures for Deep Submicron Systems on Chip

Nattawut Thepayasuwan, Alex Doboli
Department of Electrical and Computer Engineering
State University of New York at Stony Brook
Stony Brook, NY, 11794-2350
{nattawut, adoboli}@ece.sunysb.edu

Abstract

This paper presents a bus architecture (BA) synthesis algorithm for designing the communication sub-system of an SoC. The novelty is that a physical-level variable, namely total bus length, is considered during the synthesis process. The algorithm generates both flat and hierarchical bus architectures using performance parameters, i.e., bus length, topology complexity, and potential for communication conflicts over time. BA synthesis results for a network processor are discussed.

[Figure 1: Flat and Hierarchical Bus Architectures. Panels: a) A Flat Bus Architecture (FBA), with cores IP 1 to IP 5; b) A Hierarchical Bus Architecture (HBA), with cores IP 1 to IP 5 and a bridge (Br).]

1. INTRODUCTION

This paper proposes a methodology for synthesizing the bus architecture of an SoC implemented in a very deep submicron (VDSM) process. The methodology extends the work in [2] and [3] to capture both flat and hierarchical architectures. The proposed BA synthesis algorithm identifies the set of possible building blocks (using the proposed primary bus structure (PBS) bitwise generation algorithm) and assembles them into flat or hierarchical topologies using simulated annealing (SA) as an exploration engine. The crux of the BA synthesis algorithm is an efficient way of pruning the solution space using a specialized data structure, called the bus architecture synthesis table (BAST), and a dedicated pruning method, named the Select-Eliminate (SE) algorithm. The SE method discards buses with complex and redundant connectivity before SA would generate them. For HBA, the number and identity of hierarchical buses are chosen probabilistically under the constraint that the complexity of the BA hierarchy must not exceed a fixed limit. The paper offers experimental results for a network processor. The produced BAs account for the bus delays caused by interconnect parasitics and avoid placing many communication conflicts on a single link. Traditional BA synthesis methods do not cope effectively with these issues. Hence, this work contributes to the wider design objective of automatically integrating IP cores by offering a method for designing the communication sub-system of an SoC. The paper is organized as follows. Section 2 discusses the BA synthesis. Experimental results are given in Section 3. Finally, Section 4 presents our conclusions.

2. BUS ARCHITECTURE SYNTHESIS

The proposed BA synthesis flow is shown in Figure 2. The methodology uses an iterative improvement algorithm, such as simulated annealing (SA), to explore the solution space for both FBA [3] and HBA. An FBA consists of a set of primary bus structures (PBS) defined in [3]. An HBA contains several FBAs at different hierarchy levels, defined by the number of bridges.

[Figure 2: HBA Synthesis Flow. Blocks: Initial Synthesis Table; Modified Select-Eliminate Algorithm; Select PBS; New Cluster (probability P1); Existing HBS (probability 1 − P1); Choose HBS (probability P HBSi); Choose Connecting-PBS; Validate PBS; BAST for FBA; BAST for HBA; Final Bus Architecture.]

Using the HBA synthesis flow, our methodology generates an efficient BA considering the desired number of HBS and hierarchy levels. Initially, a modified BAST is generated by adding core requirement rows to the original BAST [3]. Once a PBS is selected, a probabilistic choice decides whether the selected PBS builds a new HBS or is added to an existing HBS. $P_1$ and $1 - P_1$ represent the probabilities of initiating a new HBS and of adding to an existing HBS, respectively. If $P_1$ increases, the synthesized BA tends to have more non-hierarchical structures. If $P_1$ is small, the BA has fewer HBS but more hierarchy levels. If an existing HBS is selected, the new PBS is assigned to one of the available $HBS_i$ with probability $P_{HBS_i}$. To lower the maximum hierarchy level in a BA, the probabilities $P_{HBS_i}$ must be changed dynamically. The probability $P_{HBS_i}$ is inversely proportional to the hierarchy level of $HBS_i$. An HBS with higher hierarchy levels

Proceedings of the IEEE Computer Society Annual Symposium on VLSI Emerging Trends in VLSI Systems Design (ISVLSI’04) 0-7695-2097-9/04 $20.00 © 2004 IEEE

must be assigned a smaller probability, so that the hierarchy levels are balanced among the existing HBS. The BAST is updated according to whether the selected PBS is assigned to a new or an existing HBS. If a new HBS is introduced, the synthesis table is updated using the unmodified Select-Eliminate algorithm. If a PBS is added to an existing HBS, the adapted eliminate algorithm (in Subsection 4.2) is used. The updated synthesis table is fed back to the beginning of the flow to select a new PBS. Adding a PBS to an existing HBS must be validated. The selected PBS may be invalid if only one core requirement remains after the table update. This check ensures that no bridge connects a bus to a single IP core (a bus with one core only); such a structure is not practically feasible. The synthesis flow stops when all connectivity requirements are met.

Performance evaluation is needed for the HBA produced inside the exploration loop. Multi-objective cost functions for FBA characterize a BA by bus length, bus utilization, number of PBSs, communication conflict, and maximum data loss [3]. Such a cost function is of limited use for HBA, where inter-PBS communication is allowed. To cope with this problem, we introduce a new representation, called a "cluster graph", to capture HBS characteristics.

Definition: A cluster graph $G(V, E)$ is a graph where $V$ and $E$ are the set of PBS in an HBS and the set of inter-PBS communications, respectively. $C_i = \{c \mid c \in \text{cores in } PBS_i\}$, and $E_i = \{e \mid e \in \text{inter-}PBS_i \text{ communication links}\}$.

The following parameters form the multi-objective cost function:

1. Inter-PBS communication load, $C_{inter}$, is the communication load across buses. It represents the amount of information flowing through all bridges. The inter-PBS communication load of a hierarchical structure is $C_{inter} = \sum_{i \in C_i} W_{E_i}$, where $W_{E_i}$ is the communication load across PBS $i$, $W_{E_i} = \sum_{j \in e_i} w_{e_j}$.
2. Number of bridges in an HBA, $N_{br} = \dim\{V\} - 1$, reflects the additional circuitry, i.e., bridges and drivers.
3. Hierarchy level is the longest path in a cluster graph. It defines the upper bound for the critical path of each node.
4. Maximum critical path, $C_{cp}$, is the critical path on which the PBS associated with an HBS interfere. $C_{cp} = W_{E_i} \times \max\{\text{critical path}_i\}_{i=1}^{\dim\{C\}}$, where $\text{critical path}_i$ is the longest path from any cluster node $C_i$ to any cluster node $C_j$, as long as the path conveys information.

The cost function used for HBA synthesis is

$$\text{Total cost} = W \times P^T$$

$$W = [\, w_l \;\; w_n \;\; w_c \;\; w_{ml} \;\; w_{inter} \;\; w_{br} \;\; w_h \;\; w_{cp} \;\; w_u \,]$$

$$P = \sum_{i=1}^{n} [\, L_{t_i} \;\; N_{b_i} \;\; C_i \;\; M_{l_i} \;\; C_{inter_i} \;\; N_{br_i} \;\; H_i \;\; C_{cp_i} \;\; -C_{u_i} \,]$$

where $L_{t_i}$ is the total bus length of the ith HBS, $N_{b_i}$ is the number of segments in the ith HBS, $C_i$ is the total communication conflict of the ith HBS, $M_{l_i}$ is the maximum data loss of the ith HBS, $C_{inter_i}$ is the total inter-PBS communication load of the ith HBS, $N_{br_i}$ is the number of bridges for the ith HBS, $H_i$ is the hierarchical level of the ith HBS, $C_{cp_i}$ is the maximum critical path of the ith HBS, and $C_{u_i}$ is the bus utilization of the ith HBS.
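As an illustration only, the HBS selection step and the weighted cost described in this section can be sketched in Python. All names below (`hbs_selection_probabilities`, `choose_target`, `HBSMetrics`, `total_cost`) are hypothetical and are not taken from the OSIRIS implementation. The sketch also assumes that hierarchy levels are counted from 1 and that the per-HBS probabilities are normalized to sum to one; the paper states only that they are inversely proportional to the hierarchy level.

```python
import random
from dataclasses import astuple, dataclass


def hbs_selection_probabilities(levels):
    """Return one probability per existing HBS, inversely proportional to
    its hierarchy level (assumed >= 1) and normalized to sum to 1."""
    if any(level < 1 for level in levels):
        raise ValueError("hierarchy levels are assumed to start at 1")
    inverse = [1.0 / level for level in levels]
    total = sum(inverse)
    return [value / total for value in inverse]


def choose_target(p1, levels, rng):
    """Return None to start a new HBS (probability p1), otherwise the index
    of the existing HBS that receives the selected PBS."""
    if not levels or rng.random() < p1:
        return None
    weights = hbs_selection_probabilities(levels)
    return rng.choices(range(len(levels)), weights=weights, k=1)[0]


@dataclass
class HBSMetrics:
    """Hypothetical container for the nine per-HBS cost parameters."""
    total_bus_length: float    # L_t
    num_segments: float        # N_b
    conflict: float            # C
    max_data_loss: float       # M_l
    inter_pbs_load: float      # C_inter
    num_bridges: float         # N_br
    hierarchy_level: float     # H
    max_critical_path: float   # C_cp
    utilization: float         # C_u (appears as -C_u in P)


def total_cost(weights, hbs_list):
    """Total cost = W x P^T, with P summed element-wise over all HBS."""
    if len(weights) != 9:
        raise ValueError("expected nine weights, one per cost parameter")
    p = [0.0] * 9
    for hbs in hbs_list:
        values = list(astuple(hbs))
        values[-1] = -values[-1]  # C_u enters P with a negative sign
        p = [acc + v for acc, v in zip(p, values)]
    return sum(w * x for w, x in zip(weights, p))
```

For instance, with two existing HBS at hierarchy levels 1 and 2, this sketch assigns them probabilities 2/3 and 1/3, so the shallower HBS is more likely to receive the next PBS.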

3. EXPERIMENTAL RESULTS

The experiment presents the BA synthesis results for a network processor [1] with the communication loads indicated in [3]. To enhance communication resource sharing, an HBA was also generated. Since the number of IP cores is small for the network processor, we assigned $P_1 = 1.0$ for the high-bandwidth sub-CG, so that HBA were produced. The low-bandwidth CG was allowed to have more than one bus structure. The result for the high-speed sub-CG had only one HBS, thus $P_{HBS_1}$ was set to 1.0. All weights related to HBA characteristics and layout (total bus length weight) were also set to 1.0. The synthesized HBA is shown in Figure 3.

[Figure 3: HBA for Network Processor. Block labels: PPC440 CORE; McMAL; DMA Controller; four Ethernet MAC 10/100 Mbps; two HDLC; PCI-X 133/66MHz; EXT SDRAM DDR Controller; on-chip SDRAM Controller; 256 K On-Chip SDRAM; EBC; GPIO; UART; I2C; two BRIDGE blocks; ARB blocks.]

The bus architecture contains a bridge connecting two PBSs for the high-speed sub-CG. The low-speed sub-CG is supported by two separate PBSs. IP core connections tend to be distributed more systematically over the bus links (the four EMACs are connected to the same PBS) compared to an FBA. However, MCMAL, HDLC, and PPC core communications are carried using only one PBS. This is reasonable because their communication loads are relatively low. Communication resources are shared, so that performance requirements are met using the limited resources.

4. CONCLUSIONS

The proposed BA synthesis creates customized BAs depending on the application specifics and performance requirements. The BA synthesis algorithm identifies the set of possible building blocks and assembles them into a topology using simulated annealing (SA) as an exploration engine. The algorithm also employs a bus architecture synthesis table and the Select-Eliminate method to prune dominated solutions. This reduces the exponential number of potential bus architectures. The algorithm can explore, in a very short time, the large solution space for SoCs with many cores. It is sensitive to placement and routing information, which results in buses of higher speed.

5. REFERENCES

[1] J. Darringer, R. Bergamaschi, S. Battacharyya, D. Brand, A. Herkersdorf, J. Morell, I. Nair, P. Sagmeister, Y. Shin, "Early Analysis Tools for System-on-a-Chip Design", IBM J. of Research & Development, Vol. 46, No. 6, 2002, pp. 691-707.

[2] N. Thepayasuwan, A. Doboli, "Layout Conscious Bus Architecture Synthesis for Deep Submicron Systems on Chip", Proc. of Design, Automation and Test in Europe, 2004.

[3] N. Thepayasuwan, V. Damle, A. Doboli, "Bus Architecture Synthesis for Hardware-Software Co-Design of Deep Submicron Systems on Chip", Proc. of International Conference on Computer Design, 2003, pp. 126-133.
