Design and Synthesis of Array Structured Telecommunication Processing Applications
Wolfgang Meyer*, Andrew Seawright*, Fumiya Tada**
* Synopsys, Inc., 700 E. Middlefield Road, Mountain View, CA 94043
** Hitachi, Ltd., 216 Totsuka, Yokohama 244, Japan
Abstract: This paper describes an automated design and synthesis methodology for telecommunications ASICs. Array frames, an array-structured visualization of the processing problem, are used in the specification and debugging of the design, which allows the designer to work much closer to the specification level. The array frame concept is integrated into the Dali structured control logic design environment. An SDH (Synchronous Digital Hierarchy) style example demonstrates the use of array frames. The array frame approach was successfully applied in industrial applications.
1. Introduction
The processing of structured data representations is widespread in telecommunications systems. In particular, the processing of standardized array-structured data representations, such as those of the SDH (Synchronous Digital Hierarchy) protocol, is very common. Designing these systems is challenging because of the complexity of the data structures, the numerous layers of processing, and the high transmission speed. Designers typically use two- or three-dimensional array representations to visualize the data processing in the system specifications. This paper describes an automated design and synthesis methodology for telecommunications ASICs that uses an array-structured visualization of the processing problem in the specification and debugging of the design.

This paper focuses on array frames, a novel concept for the design and synthesis of protocols that process array-structured frames of fixed cycle length (a fixed number of clock cycles). Array frames are fully integrated into the Dali structured control logic design environment [Sea96]. Array frames and the Dali environment aim to reduce the time to market and the man-hour effort of designing these types of telecommunications applications. Array frames allow design very close to the specification level. When the design changes, a new controller for the array frame is generated automatically. Each protocol layer is described independently, synthesized, and verified by back-annotating the simulation results on the source: the array frame specification. Furthermore, all protocol layers can be simulated and back-annotated together to verify that the complete system has the desired behavior. Separating the protocol layers also allows for easy design changes and design reuse of the protocol layers.

The paper is organized as follows. The next section discusses related work. Section 3 describes basic array frame concepts and synthesis in the context of Dali. Background information about the popular array-structured protocol SDH is presented in Section 4 to motivate the implementation of the design example. This example, explained in detail in Section 5, describes the processing of SDH-like frames carrying ATM cells. Section 6 reviews simulation and debugging. Results are presented in Section 7. The paper ends with conclusions and a discussion of future work.

"Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee." DAC 97, Anaheim, California (c) 1997 ACM 0-89791-920-3/97/06..$3.50
2. Related Work
Dali builds on previous work by Seawright and Brewer on logic synthesis from grammatical productions [Sea94]. More efficient controller construction and optimization based on [Sea94] was presented in [Cre96]. The concept of hierarchical frames and its use within the Dali synthesis and simulation environment was presented in [Sea96]. Figure 1 shows an overview of the Dali entry, synthesis, and simulation environment.

Figure 1. Dali environment (Dali Entry: Frame Editor and Array Frame Editor; Dali Synthesis: Compile; Dali Simulation Interface: Debug; System Under Development: HDL simulator and testbench containing the controller under design and the other modules in the system).

This paper concentrates on a method to describe array frames with a fixed cycle length. Related approaches for the high-level synthesis of controllers are Esterel [Ber92] and Statecharts [Har87]. Both systems, however, are designed to model reactive systems, in which specific actions are performed according to the urgency of internal and external events. Dali, in contrast, focuses on the complexity inherent in processing structured data over time. Designs implementing complex telecommunication standard protocols are today typically created manually, with the designer writing RTL directly. Complex control logic written directly in RTL can be difficult to debug, especially when multiple designs are involved.
3. Array Frame Concepts and Synthesis
In Dali, protocols are described in terms of frame constructs. In general, a frame represents a cycle-based behavior. Each frame may have an action attached to it that executes when the frame accepts. In addition to the constructs describing sequential, alternative, optional, and repeating behaviors, called hierarchical frames in Dali, an array frame construct is provided. Both hierarchical and array frames are entered using graphical editors.

This paper concentrates on the fundamental array frame concepts, the synthesis of array frames, and their application to SDH/SONET-style applications. Background on SDH and SONET is given in Section 4. Because array frames are general, however, the tool is not restricted to these types of applications.

Array Frame Concepts
An array frame describes a behavior in terms of a two-dimensional grid whose squares represent specific clock cycles. Once an array frame is active, it executes until the end. Figure 2 shows an example 4x8 array frame. Each square in an array frame represents one clock cycle. The execution order is from left to right and from top to bottom; as a result, the array frame needs 4x8 = 32 cycles to complete its execution. The square describing the current execution cycle is called the accepting square. In Figure 2, the square s(1,6) is accepting.

Figure 2. Example 4x8 array frame (rows 0-3, columns 0-7; the regions Reg1(0,1,0,0) and Reg2(0,0,2,3) and the accepting square s(1,6) are marked).

Regions
Let AF(n,m) be an array frame with n rows and m columns. A rectangular sub-array reg(n1,n2,m1,m2) of AF(n,m) is called a region of size (n2-n1+1)*(m2-m1+1) of AF(n,m). A region cannot extend beyond the border of the array frame, that is, 0 ≤ n1 ≤ n2 ≤ n-1 and 0 ≤ m1 ≤ m2 ≤ m-1. Regions can be as small as one square or as large as the complete array frame, and they may overlap. The region reg(n1,n2,m1,m2) accepts whenever it covers the accepting square s(i,j), that is, whenever n1 ≤ i ≤ n2 and m1 ≤ j ≤ m2.

Actions
Actions are attached to regions and are executed each time a region accepts. Built-in actions can be selected from a given set that includes the "assign", "incr", "decr", "set" and "clear" operations; alternatively, customized actions can be attached to any region. Multiple actions can be attached to a region, and the execution of a particular action can be delayed by n cycles. Delayed actions allow control signals to be adjusted to the latency present in other parts of the system.

Array Frame Execution Example
Let AF(4,8) be the array frame of Figure 2, let Reg1(0,1,0,0) be a region with the action clear(x) attached to it, and let Reg2(0,0,2,3) be a region with the action incr(x) attached to it. Then x is set to 0 in the first clock cycle, to 1 in the third, and to 2 in the fourth clock cycle. In cycle 9, x is cleared to 0 again. The value of x does not change for the remaining portion of the array frame.

Composing Array Frames
Array frames are composed with the other constructs of the Dali system to describe the overall behavior of the design. For example, the processing of a continuous stream of array frames, a typical case, is described by composing the array frame construct with a repetition operator: if AF is an array frame, then [AF]+ describes the repeating sequence of AF frames. In other words, when the last square of AF executes (AF accepts), the repetition operator restarts AF in the next clock cycle. Using sequence, hierarchy, alternative, and conditional constructs, more complex array-structured and non-array-structured processing tasks can be specified and mixed together. For example, in typical telecommunication applications such as SDH, searching for the alignment of array-structured data frames is a common task that must be performed before the synchronized processing of those frames. The search and the processing of synchronized data frames can easily be described using a combination of array-structured constructs (array frames) and non-array-structured constructs (hierarchical frames) in Dali.

Array Frame Synthesis
In the Dali environment, the complete specification of the sequential behavior of a design is described as a hierarchical protocol tree [Sea96]. Each node of the tree corresponds to one construct in the frame hierarchy. Dali creates an initial controller circuit bottom-up on the protocol tree: for each node of the elaborated tree, the already created sub-machines are composed according to the function of the node. The machine constructed for the root of the tree implements the complete protocol. This initial circuit is then optimized in several ways. Figure 3 shows the initial circuit architecture created for an array frame node in the protocol tree. The activate signal starts the execution of the array frame, and the accept signal indicates when the array frame has completed its execution.

Figure 3. Initial circuit for an array frame (activate drives the counter logic, which performs reset(i), reset(j), decr(i) and decr(j) on the counters i and j and produces accept, and the region sub-machines M1 ... Mn).
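To make the execution semantics concrete, the following Python sketch models one pass through an array frame AF(n, m): the accepting square advances left to right and top to bottom, one square per clock cycle, and the actions of every region covering the accepting square are executed. Region, run_array_frame, accepting_square, clear and incr are hypothetical helper names introduced for illustration, not part of Dali, and actions of simultaneously accepting regions are applied in list order, which simplifies the concurrent hardware behavior. The usage reproduces the execution example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Env = Dict[str, int]
Action = Callable[[Env], None]


@dataclass(frozen=True)
class Region:
    """Hypothetical model of reg(n1, n2, m1, m2): rows n1..n2, columns m1..m2."""
    n1: int
    n2: int
    m1: int
    m2: int

    def fits(self, n: int, m: int) -> bool:
        # 0 <= n1 <= n2 <= n-1 and 0 <= m1 <= m2 <= m-1
        return 0 <= self.n1 <= self.n2 <= n - 1 and 0 <= self.m1 <= self.m2 <= m - 1

    def accepts(self, i: int, j: int) -> bool:
        # The region accepts whenever it covers the accepting square s(i, j).
        return self.n1 <= i <= self.n2 and self.m1 <= j <= self.m2


def accepting_square(cycle: int, m: int) -> Tuple[int, int]:
    """Square s(i, j) that accepts in clock cycle `cycle` (1-based) of a frame with m columns."""
    return divmod(cycle - 1, m)


def run_array_frame(n: int, m: int, regions: List[Tuple[Region, Action]],
                    env: Env) -> List[Env]:
    """Execute AF(n, m) once; return a snapshot of env after each of the n*m cycles."""
    for region, _ in regions:
        if not region.fits(n, m):
            raise ValueError(f"region {region} lies outside AF({n},{m})")
    trace = []
    for cycle in range(1, n * m + 1):
        i, j = accepting_square(cycle, m)
        for region, action in regions:
            if region.accepts(i, j):
                action(env)
        trace.append(dict(env))
    return trace


def clear(var: str) -> Action:
    def act(env: Env) -> None:
        env[var] = 0
    return act


def incr(var: str) -> Action:
    def act(env: Env) -> None:
        env[var] += 1
    return act


# Execution example: Reg1(0,1,0,0) with clear(x), Reg2(0,0,2,3) with incr(x).
trace = run_array_frame(4, 8, [(Region(0, 1, 0, 0), clear("x")),
                               (Region(0, 0, 2, 3), incr("x"))], {"x": 7})
x_by_cycle = [env["x"] for env in trace]
print(x_by_cycle[:10])          # cycles 1..10 -> [0, 0, 1, 2, 2, 2, 2, 2, 0, 0]
print(accepting_square(15, 8))  # -> (1, 6)
```

The initial value 7 is arbitrary; it only shows that Reg1 clears x in the first cycle.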
An array frame is synthesized using two counters, one counting the rows and the other counting the columns. The counters can be shared among different array frames when those frames are used in a mutually exclusive manner. The counters are initialized when the array frame is activated. In each cycle, the counter logic increments the appropriate counter(s) and determines when the machine accepts. The region sub-machines M1 ... Mn are activated whenever the array frame is activated and check whether the accepting square lies "within" the corresponding region. If it does, the region sub-machine executes its associated actions, possibly with a user-specified latency. For each region sub-machine, a region Boolean expression is synthesized. The Boolean expression evaluates to true if and only if the array frame counter values are within the boundaries of the region (the region accepts). The expression is created by ANDing together two range expressions, one for each counter dimension. Each range expression represents a Boolean cube cover that defines the counter values lying within the region for that dimension. The cube cover is created by summing (ORing) cubes that cover the one-dimensional region. Later, logic synthesis steps restructure and optimize the region expressions and region sub-machines together. If the region expression evaluates to true and the array frame is active, the region "accepts" and its actions are triggered. Additional delaying registers are inserted for "delayed" actions.
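One way to obtain the per-dimension cube covers is sketched below in Python. It splits a counter range lo..hi into maximal aligned power-of-two blocks, each of which is a single cube (fixed high-order bits, don't-care low-order bits), and the region expression ANDs the row cover with the column cover. This decomposition is an assumption made for illustration: the paper does not state how Dali builds its covers, and the result is not guaranteed to be minimal. Cubes are written most significant bit first over {0, 1, -}, and range_cubes, cube_matches and region_expression are hypothetical names. The usage applies the sketch to the soh and payload regions of the 4x8 array frame stm1 from Section 5, with a 2-bit row counter and a 3-bit column counter.

```python
from typing import Callable, List


def range_cubes(lo: int, hi: int, width: int) -> List[str]:
    """Cube cover (MSB first, '-' = don't care) of counter values lo..hi for a width-bit counter."""
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("range does not fit the counter")
    cubes = []
    v = lo
    while v <= hi:
        # Grow the block 2**k while v stays aligned to it and the block stays inside [v, hi].
        k = 0
        while k < width and v % (1 << (k + 1)) == 0 and v + (1 << (k + 1)) - 1 <= hi:
            k += 1
        fixed = format(v >> k, f"0{width - k}b") if k < width else ""
        cubes.append(fixed + "-" * k)
        v += 1 << k
    return cubes


def cube_matches(cube: str, value: int) -> bool:
    """True if every bit of the counter value equals the cube literal or the literal is '-'."""
    bits = format(value, f"0{len(cube)}b")
    return len(bits) == len(cube) and all(c in ("-", b) for c, b in zip(cube, bits))


def region_expression(n1: int, n2: int, m1: int, m2: int,
                      row_bits: int, col_bits: int) -> Callable[[int, int], bool]:
    """Region expression: (row range cover) AND (column range cover)."""
    rows = range_cubes(n1, n2, row_bits)
    cols = range_cubes(m1, m2, col_bits)
    return lambda i, j: (any(cube_matches(c, i) for c in rows)
                         and any(cube_matches(c, j) for c in cols))


print(range_cubes(0, 2, 3))  # soh columns 0..2     -> ['00-', '010']
print(range_cubes(3, 7, 3))  # payload columns 3..7 -> ['011', '1--']
print(range_cubes(0, 3, 2))  # all four rows        -> ['--']
```

A full-range dimension collapses to a single all-don't-care cube, so for regions spanning all rows the region expression effectively depends on the column counter alone, which is the kind of simplification later logic optimization would find anyway.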
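Figure 4. Initial machine for a region sub-machine (the counters i and j feed the Boolean region expression, which is combined with activate; delaying flip-flops between the region expression and the action, e.g. incr(x), implement delayed actions).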
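The delaying flip-flops of Figure 4 can be pictured as a shift register between the region expression and the action. The Python sketch below is a cycle-level model under the assumption that the flip-flops reset to 0: with n stages, a region that accepts in cycle t triggers its action in cycle t + n. The function name delay_line is hypothetical.

```python
from collections import deque
from typing import Iterable, List


def delay_line(accept: Iterable[bool], n: int) -> List[bool]:
    """Model n delaying flip-flops (reset to False): output[t] == accept[t - n]."""
    stages = deque([False] * n)   # stages[0] is the flip-flop nearest the action
    out = []
    for a in accept:
        if n == 0:                # no delay: the action fires in the accepting cycle
            out.append(a)
            continue
        out.append(stages.popleft())  # value clocked in n cycles earlier
        stages.append(a)              # capture this cycle's region accept
    return out


# A region that accepts in cycle 0 and an action delayed by two cycles.
print(delay_line([True, False, False, False], 2))  # -> [False, False, True, False]
```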
4. SDH Background Information
Array frames are useful to describe a wide variety of array-structured processing applications. In telecommunications, the most popular array-structured protocol in use today is SDH, which became a worldwide standard in 1988. Background information on the SDH protocol is given here to motivate how array frames apply to these kinds of applications; the use of array frames in an SDH-like example follows in Section 5. The first level of the SDH hierarchy, known as the Synchronous Transport Module (STM1), consists of a frame 2430 bytes in length transmitted at 155.52 Mbit/s. The frame period is therefore 125 µs (2430 x 8 bits / 155.52 Mbit/s). The STM1 frame is visualized as a two-dimensional array comprising 9 rows of 270 bytes each. The first nine bytes of each row are used for the section overhead (SOH), the remaining bytes for the payload (Figure 5).

The 9x9-byte non-payload portion of the STM1 frame is called the section overhead (SOH). Each individual byte has a special meaning. Bytes A1 and A2, for instance, are used for frame synchronization. B1 carries a checksum for the preceding frame. D1 through D3 are used for administrative network purposes, and so forth. The H1, H2, and H3 bytes encode the AU pointer information. More details about SDH/SONET can be found in [SeR92], [ITU93a] and [ITU93b]. The design and synthesis of controllers processing this data stream poses several major challenges. First, thinking about the controller in terms of a finite state machine (FSM), the traditional approach at the RTL level, is time-consuming and error-prone. Changes to the specification often require a complete redesign of the controller. Second, the layered structure of this type of protocol adds enormous complexity to the design of an FSM that processes this data. This is especially true when modeling the AU pointer process, which allows the payload to "float" across frame boundaries. Thus, debugging and verification at the RTL level require a large effort.
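The frame arithmetic and the array view of the STM1 frame can be checked with a few lines of Python. The sketch assumes that bytes are numbered 0..2429 in transmission order, row by row, exactly like the squares of an array frame; position and is_soh are hypothetical helper names.

```python
from typing import Tuple

ROWS, COLS = 9, 270           # STM1 frame visualized as 9 rows of 270 bytes
SOH_COLS = 9                  # first nine bytes of each row carry the SOH
LINE_RATE_BPS = 155_520_000   # 155.52 Mbit/s

FRAME_BYTES = ROWS * COLS                                      # 2430 bytes
PAYLOAD_BYTES = ROWS * (COLS - SOH_COLS)                       # 261 x 9 = 2349 bytes
FRAME_PERIOD_US = FRAME_BYTES * 8 * 1_000_000 / LINE_RATE_BPS  # microseconds


def position(k: int) -> Tuple[int, int]:
    """(row, column) of byte k, counting from 0 in transmission order."""
    return divmod(k, COLS)


def is_soh(k: int) -> bool:
    """True if byte k belongs to the 9x9-byte section overhead."""
    return position(k)[1] < SOH_COLS


print(FRAME_BYTES, PAYLOAD_BYTES, FRAME_PERIOD_US)  # -> 2430 2349 125.0
print(position(270), is_soh(270), is_soh(279))      # -> (1, 0) True False
```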
Figure 5. SDH hierarchy (STM1 frame of 9 rows x 270 bytes: 9 bytes of SOH per row, with the regenerator SOH in rows 1-3, the AU pointer in row 4 and the multiplex SOH in rows 5-9, followed by 261 bytes of payload carrying a VC4 with a 1-byte path overhead (POH) column and 53-byte ATM cells; SOH bytes include A1, A2, C1, B1, E1, F1, D1-D12, H1-H3, B2, K1, K2, Z1, Z2 and E2).

The transport of ATM cells in STM1, for example, uses VC4 virtual containers in a byte-synchronous cell stream. The VC4 containers float across STM1 frame boundaries. This means that the payload data of each STM1 frame (261x9 = 2349 bytes) may contain portions of a prior VC4 and portions of the next VC4. The beginning of the next VC4 is "pointed to" by the AU pointer. The pointer offset is adjusted by a complex protocol. For a particular STM1 frame, the VC4 offset may be adjusted forward or backward by three bytes, as specified via the AU pointer. This process is called positive/negative pointer justification. Under certain conditions, the AU pointer offset can be initialized to a new offset.

5. Design Example
To demonstrate the capabilities of array frames, a simplified receiver of SDH-like frames carrying ATM cells was designed and implemented using array frames. The example shows common features of typical telecommunication systems, such as multiple layers, frame alignment, and floating payload processing. The example circuit SDH_mini implements a simplified reception of ATM cells contained in a virtual container vc4 transported in the payload section of stm1 frames. To model this behavior in Dali, it is best to split the frame constructs according to the protocol layers. This splitting keeps the constructs simpler and therefore helps in debugging and verifying the system.
Figure 6 shows the top-level frame definition for the example circuit SDH_mini. This frame describes the concurrent processing of three layers. The top layer describes the processing of stm1 frames, including frame alignment and floating payload detection. The frame stm1 is implemented as a 4x8 array frame and is described in detail below. The second layer describes the processing of vc4 frames contained in the payload section of stm1 frames. Before the vc4 frame starts, run_vc4 is cleared for technical reasons. The frame vc4 is implemented as a 4x5 array frame comprising path overhead and ATM cell data and is also described in detail below. The last layer describes the reception of ATM cell data; again, run_atm is cleared first for technical reasons. The continuous processing in each layer is modeled using repeat ([ ]+) frames. Furthermore, two run/idle frames are used for the VC4 and ATM layers. Run/idle frames allow the execution of a behavior to be suspended and resumed based on a condition. The endless repetition of vc4 frames, for instance, is performed only as long as the condition run_vc4 is true. If run_vc4 evaluates to false, the processing of vc4 frames is suspended, that is, the array frame vc4 idles in its current state until run_vc4 is set to true again.
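The interplay of the repeat and run/idle constructs can be illustrated with a small cycle-level Python model. run_idle_repeat is a hypothetical helper, and the model deliberately ignores the one-cycle terminal frames, the delays between layers, and justification: it simply assumes that run_vc4 is low for the three SOH columns and high for the five payload columns of each stm1 row. Under these assumptions each row of vc4 lines up with one payload row of stm1.

```python
from typing import Iterable, List, Optional, Tuple


def run_idle_repeat(n: int, m: int, run: Iterable[bool]) -> List[Optional[Tuple[int, int]]]:
    """Cycle model of a run/idle construct around [AF]+ for an n x m array frame AF.

    In a cycle where run is True, AF executes its current square and advances;
    otherwise it idles in its current state (reported as None). After the last
    square, the repetition [AF]+ restarts AF at s(0, 0) in the next active cycle.
    """
    pos = 0
    trace: List[Optional[Tuple[int, int]]] = []
    for active in run:
        if active:
            trace.append(divmod(pos, m))
            pos = (pos + 1) % (n * m)
        else:
            trace.append(None)
    return trace


# run_vc4 as driven by the stm1 regions soh (columns 0-2) and payload (columns 3-7):
# 3 idle cycles followed by 5 active ones in every stm1 row, for two stm1 frames.
run_vc4 = ([False] * 3 + [True] * 5) * 4 * 2
vc4 = run_idle_repeat(4, 5, run_vc4)
print(vc4[:9])  # -> [None, None, None, (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), None]
```

Inserting or removing a single active cycle in run_vc4 shifts every later vc4 square by one, which is the effect the justification regions H1 and H2 exploit.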
Figure 6. SDH_mini example top level (SDH_mini executes three layers concurrently: STM1 layer, [stm1]+; VC4 layer, a one-cycle terminal frame with the action clear(run_vc4) followed by a run/idle construct on run_vc4 around [vc4]+; ATM layer, a one-cycle terminal frame with the action clear(run_atm) followed by a run/idle construct on run_atm around [atm]+. Legend: concurrent execution of parallel processes; sequential execution; terminal frame, one clock cycle delay; action; [F]+, endless repetition of frame F; run/idle construct, in which the sub-frame executes while its Boolean expression is true and otherwise idles).

Regions and actions of array frame stm1 (4x8):
Region              Actions
soh(0,3,0,2)        clear(run_vc4)
payload(0,3,3,7)    set(run_vc4)
A1(0,0,0,0)         !A1 && search: clear(width_stm1)
A2(0,0,1,1)         !A2 && search: clear(width_stm1)  clear(search)
H1(2,2,0,0)         data_in=="1": set(run_vc4)
H2(2,2,2,2)         data_in=="1": clear(run_vc4)

Regions and actions of array frame vc4 (4x5):
Region              Actions
poh(0,3,0,0)        clear(run_atm)
payload(0,3,1,4)    set(run_atm)
The array frame stm1 (Figure 7) implements a simplified alignment and the processing of the SOH layer in an SDH-like protocol. Its main purposes are to process the overhead, to monitor the alignment, and to adjust the floating payload.

The array frame stm1 has size 4x8 and uses two up-counters, width_stm1 and height_stm1. Regions A1 and A2 implement the search for the alignment pattern A1A2 in the input stream. When search is true and the input does not equal A1 in the first cycle, the counter width_stm1 is cleared. Similarly, when search is true and the input does not equal A2 in the next cycle, the counter width_stm1 is cleared. As a result, the processing of frame stm1 continues only once the pattern A1A2 has been received on the input stream. At that point, search is cleared and processing continues.

The second task of array frame stm1 is to model the floating virtual container in the payload portion of stm1. The size of the virtual container vc4 is chosen to be exactly the size of the payload portion of frame stm1, that is, 4x5. The main idea is to use a run/idle frame to control the execution of frame vc4. Regions soh and payload implement this basic requirement by clearing and setting run_vc4, the control variable for frame vc4. As a result, as long as stm1 is processing the SOH, run_vc4 is low, which suspends vc4. When stm1 is processing the payload section, run_vc4 is set so that vc4 continues its execution. In this way, stm1 controls the execution of vc4. Ignoring justification, vc4 is suspended as long as SOH data is processed.

Justification is implemented by additional control of run_vc4. If region H1 accepts and data_in equals 1, positive justification is performed by setting run_vc4, delayed by one cycle. Similarly, if region H2 accepts and the data read is 1, negative justification is performed by clearing run_vc4, delayed by two cycles. Loading a completely new AU pointer value can be implemented by loading new values into width_stm1 and height_stm1. Note that this justification mechanism is an intentional oversimplification of the actual AU pointer processing, made to keep the example simple.
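The A1A2 search performed by regions A1 and A2 can be modeled in a few lines of Python. In this sketch col stands for the low end of width_stm1 while search is true: region A1 covers column 0 and region A2 covers column 1, and a mismatch clears the counter so that the next byte is compared against A1 again. The framing values 0xF6 and 0x28 are the A1 and A2 byte values of real SDH; the paper itself does not specify them, and find_alignment is a hypothetical name. Like the simplified rules it mirrors, the model does not re-examine a byte that failed the A2 comparison as a possible A1, so the stream A1 A1 A2 is not recognized.

```python
from typing import Iterable, Optional

A1_BYTE, A2_BYTE = 0xF6, 0x28   # real SDH framing byte values (assumption; not given in the paper)


def find_alignment(stream: Iterable[int], a1: int = A1_BYTE, a2: int = A2_BYTE) -> Optional[int]:
    """Index of the first byte after an A1A2 pattern, following the simplified stm1 rules."""
    col = 0
    for k, byte in enumerate(stream):
        if col == 0:
            # Region A1: !A1 && search -> clear(width_stm1); otherwise the counter advances.
            col = 1 if byte == a1 else 0
        elif byte == a2:
            # Region A2 matched: search is cleared and stm1 processing continues.
            return k + 1
        else:
            # Region A2: !A2 && search -> clear(width_stm1).
            col = 0
    return None


print(find_alignment([0x00, 0xF6, 0x28, 0x11]))  # -> 3
```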
The frame vc4 implements processing of virtual container information comprising the path overhead (POH) and the ATM cell data. This virtual container floats inside the stm1 payload. The main task is to separate the path overhead information from the actual payload data, in this case ATM cells. Like the stm1 frame, the top level behavior is described as an endless repetition of the array frame vc4: [vc4]+. The size of the array frame vc4 corresponds to the size of the payload of stm1: 4x5 bytes. However, due to the floating nature of the virtual container, the accepting square in the 4x5 payload of the stm1 frame may correspond to a different accepting square in the 4x5 vc4 frame.
Figure 7. Array frames stm1 and vc4
The frame vc4 controls atm via run_atm in the same way that stm1 controls vc4. As long as the input belongs to the path overhead (POH), atm is suspended by clearing run_atm. The final layer in the example, atm, is intended to model a receiver for ATM cells described via hierarchical frames, like the design presented in [Sea96]. This receiver performs additional cell delineation functions and finally outputs the valid ATM cells received in the payload part of vc4. The signal atm_valid is high when the output data represents a valid byte of a received ATM cell.
Figure 8 shows a timing diagram of some important signals in the system. On the left side, vc4 is completely aligned within the payload of stm1. Therefore, the first ATM cell is received 3+1+2 = 6 cycles after an aligned stm1 frame starts: three cycles are used for reading the SOH, one cycle for reading the POH, and two cycles account for the delay between stm1 and vc4 and the delay between vc4 and atm. The right-hand side shows the virtual container shifted out of sync by one cycle. In this case, one byte of an ATM cell is read between the SOH and the POH.

Figure 8. Timing diagram (signals data_in with the A1A2 pattern, soh done, run_vc4, poh done, justification, run_atm and atm_valid over cycles 1, 2, 3, 4, ...).

The strength of this design is that the different layers are cleanly specified and clearly separated, and the complexity of each layer is low. For example, the region poh of vc4 is simply formed by the first cycle of each row. The main purpose of the example is to illustrate some components of a typical telecommunication system and how they can be handled using array frames. The example can be expanded to model the more complex pointer processing of SDH. Descrambling and error checking functions can be implemented as additional regions and actions; delayed actions are particularly useful for handling the latencies involved in descrambling and error checking.

6. Simulation and Debugging
Synthesis for simulation includes additional debug information to allow back-annotation of the simulation results on the source, that is, on the array frame(s). While the simulation cycles through an array frame, the accepting square is highlighted. Figure 9 shows a screen shot of the two array frames used in the SDH_mini example and their relationship to the simulated waveforms. The waveforms show the clock clk, the counters for stm1 and vc4, and the run_vc4 and run_atm signals.

Figure 9. Back-annotation of SDH_mini example

Breakpoints can be set on regions so that the simulation stops every time a region accepts. This allows the designer to stop at a particular region in one layer of the protocol and to examine the behavior of the other layers at that point in time. For example, the designer can place breakpoints on the AU pointer regions (H1, H2, H3) to observe whether the pointer processing protocol is performed properly.

In a typical design flow, the design is first synthesized with debugging information included. When no more problems can be found in the debugging process, an optimized design is synthesized and the generated HDL code is passed to a downstream synthesis tool.
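Because an array frame has a fixed cycle length, the cycles at which a region breakpoint fires can be predicted directly from the region bounds. The Python sketch below lists them under the assumption that [AF]+ runs from cycle 1 without idling (for a frame inside a run/idle construct, such as vc4, idle cycles would shift these numbers); breakpoint_hits is a hypothetical name. The usage places breakpoints on the AU pointer regions H1 and H2 of stm1.

```python
from typing import Dict, List, Tuple

Bounds = Tuple[int, int, int, int]  # (n1, n2, m1, m2) as in reg(n1, n2, m1, m2)


def breakpoint_hits(n: int, m: int, regions: Dict[str, Bounds],
                    frames: int) -> List[Tuple[int, str]]:
    """(cycle, region) pairs at which a region breakpoint would stop the simulation.

    Cycle c (1-based) accepts square s(i, j) with (i, j) = divmod((c - 1) % (n * m), m).
    """
    hits = []
    for c in range(1, n * m * frames + 1):
        i, j = divmod((c - 1) % (n * m), m)
        for name, (n1, n2, m1, m2) in regions.items():
            if n1 <= i <= n2 and m1 <= j <= m2:
                hits.append((c, name))
    return hits


# Breakpoints on H1(2,2,0,0) and H2(2,2,2,2) of stm1 over two frames.
print(breakpoint_hits(4, 8, {"H1": (2, 2, 0, 0), "H2": (2, 2, 2, 2)}, frames=2))
# -> [(17, 'H1'), (19, 'H2'), (49, 'H1'), (51, 'H2')]
```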
7. Results
Table 1 shows the results of applying the array frame approach to several design examples. The first row shows the SDH_mini example. This design was automatically partitioned by Dali into three FSMs of 10, 6, and 5 states. The design was compiled for minimum area and mapped to the LSI_10k library. The next five rows (design1 to design5) are components of an industrial telecommunication system specified using a combination of hierarchical frames and array frames. These designs are part of a larger ASIC (70,000 gates), which was fabricated using a 0.5 µm ASIC library. The last two rows describe another industrial example. The input frame of this design is 1-bit serial, unlike design1 to design5, whose input is 8-bit parallel. The frame is 4096 bits (512 bytes) long. Mapping this frame onto a suitable two-dimensional array frame simplifies the definition of the data timing, so an array frame of size 16x256 = 4096 was used. The row design6 shows results using an ASIC library; the row design6 (FPGA) shows results using an FPGA target library. These results were created by the Dali system from the frame specifications. The state of the controller is represented by the array frame counters and an overall controlling FSM. After the functionality of the designs was verified using back-annotation, the controller architectures were optimized and HDL code was generated. The gate count and critical path delay are shown after synthesizing the generated HDL code with a commercial logic synthesis tool.
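The Counters column of Table 1 is consistent with each array frame using one binary counter per dimension, just wide enough to count that dimension. The Python sketch below reproduces the column under that assumption; counter_bits and array_frame_counters are hypothetical names, and the sizes come from the Size column.

```python
from typing import Tuple


def counter_bits(size: int) -> int:
    """Bits of a binary counter that must take the values 0..size-1."""
    if size < 1:
        raise ValueError("dimension must be positive")
    return max(1, (size - 1).bit_length())


def array_frame_counters(rows: int, cols: int) -> Tuple[int, int]:
    """(row counter bits, column counter bits) for an array frame AF(rows, cols)."""
    return counter_bits(rows), counter_bits(cols)


# Sizes from Table 1; expected: 4x8 -> 2+3, 4x5 -> 2+3, 9x90 -> 4+7, 16x8 -> 4+3, 16x256 -> 4+8.
for rows, cols in [(4, 8), (4, 5), (9, 90), (16, 8), (16, 256)]:
    r, c = array_frame_counters(rows, cols)
    print(f"{rows}x{cols}: {r}+{c} bits")
```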
Table 1: Design Results

Design          Size                    Regions  Actions  Counters              States      Area            Path Delay
SDH_mini        4x8 (stm1), 4x5 (vc4)   7, 2     8, 4     2+3 bits, 2+3 bits    10+6+5      774 gates       13.97 ns
design1         9x90                    6        8        4+7 bits              5           942 gates       6.63 ns
design2         9x90                    6        8        4+7 bits              59          958 gates       6.83 ns
design3         9x90                    9        33       4+7 bits              411         1804 gates      8.08 ns
design4         9x90                    7        17       4+7 bits              11          2324 gates      8.08 ns
design5         16x8, 16x8              7, 9     15, 16   4+3 bits, 4+3 bits    7+6+2+4+2   1898 gates      9.90 ns
design6         16x256                  7        7        4+8 bits              34          531 gates       5.55 ns
design6 (FPGA)  16x256                  7        7        4+8 bits              34          90 core cells   28.00 ns

8. Conclusions
This paper presented array frames, a new concept for designing and synthesizing controllers for frames of fixed cycle length in structured data processing. A simplified example application, SDH-like frames carrying ATM cells, was provided to emphasize the main advantages of the new approach. The example included frame alignment, AU pointer handling (justification), and ATM cell reception. The array frame approach was successfully applied to industrial designs. The designers felt comfortable describing the system using array frames and reported that the tool enhanced their productivity in the following areas:
• The designer works close to the familiar specification level.
• Protocol layers can be modeled and verified individually as well as in conjunction with the complete system.
• The graphical entry mechanism allows for easy design changes.
• Back-annotation of the simulation results on the graphical entry reduces the overall design cycle time.
• The designer does not have to think in terms of an FSM; it is generated automatically.
9. Future Work
The concept of array frames can be extended to cover the one-dimensional and three-dimensional cases. A one-dimensional array frame is simply a degenerate two-dimensional array frame; a three-dimensional array frame uses an additional counter and modified region expressions and counter logic. Three-dimensional array frames will help in describing the multiplexing of different channels.
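A minimal sketch of how the two-dimensional rules might generalize, assuming one counter per dimension with the last dimension varying fastest and one range expression per counter ANDed together. This is a hypothetical extrapolation of the proposed future work, not something implemented in Dali, and square_nd and region_accepts are illustrative names.

```python
import math
from typing import Sequence, Tuple


def square_nd(cycle: int, dims: Sequence[int]) -> Tuple[int, ...]:
    """Accepting square in clock cycle `cycle` (1-based) of an array frame with the given
    dimensions; the last dimension varies fastest, which for two dimensions is the
    left-to-right, top-to-bottom order of Section 3."""
    rest = (cycle - 1) % math.prod(dims)   # [AF]+ restarts after the last square
    coords = []
    for size in reversed(dims):
        rest, c = divmod(rest, size)
        coords.append(c)
    return tuple(reversed(coords))


def region_accepts(bounds: Sequence[Tuple[int, int]], square: Sequence[int]) -> bool:
    """One range expression (low <= x <= high) per counter, ANDed together."""
    return all(lo <= x <= hi for (lo, hi), x in zip(bounds, square))


print(square_nd(15, (4, 8)))     # two dimensions, as in Figure 2 -> (1, 6)
print(square_nd(33, (2, 4, 8)))  # first square of the second 4x8 plane -> (1, 0, 0)
```

In this view a one-dimensional frame is the case dims = (k,), and each additional dimension contributes one more counter and one more range term.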
10. Acknowledgments
The authors wish to thank Minoru Inayoshi for the opportunity to engage in this collaborative work; Kazuyuki Miyashita for tool evaluation and industrial application; Akihiko Takase, Masahiro Takatori, Yoshihiro Ashi and Toyokazu Tatsuta for providing in-depth knowledge of telecommunication design; and Raul Camposano, Barry Pangrle, Ulrich Holtmann, Rob Verbrugghe, Ping Yeung and Pradip Shah for developing Dali.
11. References
[Ber92] G. Berry and G. Gonthier, "The ESTEREL synchronous programming language: design, semantics, implementation", Science of Computer Programming, vol. 19, no. 2, pp. 87-152, Nov. 1992.
[Cre96] A. Crews and F. Brewer, "Controller Optimization for Protocol Intensive Applications", in Proceedings of the European Design Automation Conference 1996, Geneva, Switzerland, September 1996, pp. 140-145.
[Har87] D. Harel, "Statecharts: A Visual Formalism for Complex Systems", Science of Computer Programming, vol. 8, no. 3, pp. 231-275, Aug. 1987.
[ITU93a] "Network Node Interface for the Synchronous Digital Hierarchy", ITU-T Recommendation G.708, March 1993.
[ITU93b] "Synchronous Multiplexing Structure", ITU-T Recommendation G.709, March 1993.
[Sea94] A. Seawright and F. Brewer, "Clairvoyant: A Synthesis System for Production-Based Specification", IEEE Transactions on VLSI Systems, June 1994, pp. 172-185.
[Sea96] A. Seawright, U. Holtmann, W. Meyer, B. Pangrle, R. Verbrugghe, and J. Buck, "A System for Compiling and Debugging Structured Data Processing Controllers", in Proceedings of the European Design Automation Conference 1996, Geneva, Switzerland, September 1996, pp. 86-91.
[SeR92] M. Sexton and A. Reid, "Transmission Networking: SONET and the Synchronous Digital Hierarchy", Artech House Inc., March 1992.
[Tou93] H. Touati and G. Berry, "Optimized Controller Synthesis Using Esterel", in Proc. International Workshop on Logic Synthesis IWLS'93, Lake Tahoe, 1993.