Design and Prototyping of an E1 Drop_Insert Soft Core
F. Moraes1, N. Calazans1, C. Marcon1, D. Mesquita1, J. Palma1, V. Blauth2
{moraes, calazans, marcon, dmesquita, jpalma}@inf.pucrs.br, [email protected]
1 FACIN, Pontifícia Universidade Católica do Rio Grande do Sul
2 Parks S/A Comunicações Digitais
Abstract

This work describes the design and prototyping of a telecommunication Intellectual Property soft core developed in the scope of an industry-academia cooperation. The soft core allows the manipulation of E1 2048KHz 32-channel carrier information. It inserts/removes data into/from E1 frame and multiframe structures. The soft core was fully described in VHDL, functionally validated by simulation, prototyped in FPGA platforms and certified using E1 testers. The main use of the developed soft core in the implementation of data communication equipment is to allow multiple users to share a common E1 carrier with a flexible scheme for bandwidth allocation. The main contribution of the paper is a reusable hardware module that increases performance and reduces latency of the data treatment when compared to existing commercial solutions.
1. Introduction

This work describes the design and prototyping of an E1 carrier Drop_Insert intellectual property (IP) soft core, developed in the scope of an industry-academia research and development cooperation. An IP core is a complex, pre-designed and pre-verified hardware module that can be used in the composition of large circuits, typically custom VLSI integrated circuits or large programmable devices, such as multimillion-gate FPGAs. According to the way IP cores are made available to designers, they can be divided into soft, firm or hard cores [1]. Soft cores are furnished in the form of source code, such as a hardware description language file. They are thus very flexible, but may not guarantee some characteristics, such as exact timing in the final implementation, since the user of the core has to synthesize it together with the other modules in the design.

IP cores are an increasingly important resource for complex digital systems design in general and for telecom systems in particular. This is confirmed by the SIA roadmap, which predicts that in 2005 90% of digital system designs will be carried out by interconnecting IP cores [2]. Also, approximately 50% of the programmable devices sold today are used in telecom applications [3].

The goal of this work was to develop a telecom IP soft core module [4] that uses the E1 carrier transmission protocol. The developed IP core drops and inserts information from/to an E1 carrier frame, or simply E1 frame. This IP core respects the ITU-T Recommendations G.703, G.704 and G.706 [5][6][7]. It was prototyped in hardware to guarantee timing constraints, using Virtex and Spartan Xilinx FPGAs [8].

The developed IP soft core posed challenges in the synchronization of data and control information. Commercial chip-set solutions for the problem, such as [9][10], involve a dedicated IC to assemble an E1 frame and perform multiframe alignment detection, Cyclic Redundancy Check (CRC) computation and time slot detection. To execute the drop-insert function, the usual solution is to add a microcontroller. With current technology, this approach imposes a large latency on the drop-insert operation. The solution proposed here operates with a much smaller latency, which allows, for example, cascading several drop-insert modules in a single piece of equipment without impairing transmission through excessive latency.

This paper is organized as follows. The next Section presents the basic E1 frame structure. Section 3 describes the design, validation and prototyping of the E1 carrier Drop_Insert IP soft core. Finally, Section 4 presents a set of conclusions.
2. E1 Carrier Frame Structure

E1 is the lowest level of the Plesiochronous Digital Hierarchy (PDH). It is among the most common ways of transmitting voice and data over telephone and data networks. The signal transported in an E1 carrier allows the transmission of up to 31 voice or data channels plus 1 channel dedicated to carrying low level control information. Each channel in a frame has 8 bits and is called a time slot. Thus, a frame contains a total of 256 bits. Time slots in a frame are numbered from 0 to 31. Each time slot corresponds to a 64Kbps channel carrying 8 bits of either data or a voice sample digitized at 8KHz. Bits in a time slot are numbered from 1 to 8. Time slots are combined using Time Division Multiplexing (TDM) at 2.048MHz. Thus, a frame is transmitted every 125µs.

Multiple frames are grouped to transport alignment, error detection and service information. Eight consecutive frames constitute an E1 submultiframe (SMF) structure. Two consecutive E1 submultiframes form an E1 multiframe (MF). E1 carrier equipment transmits and/or receives an MF every 2ms. Frames in an MF are numbered from 0 to 15. The basic E1 frame and multiframe structure is depicted in Figure 1.
[Figure 1: the 16 frames of a multiframe, grouped in submultiframes I and II, with the bit contents of time slot 0 in each frame followed by time slots 1 to 31.]

C1, C2, C3, C4: SMF CRC code, computed during transmission and inserted in the SMF that follows it.
E: CRC-4 error indication, tells whether or not the computed CRC is equal to the received CRC in each SMF.
A: remote alarm indication; bit 3 of every odd frame signals a remote alarm.
S4, S5, S6, S7, S8: service channel or synchronization status messages, repeated in every odd frame; a channel dedicated to exchanging information between operators.

Figure 1 - Basic E1 MF structure, highlighting the control information structure in time slot 0.

The first time slot of each frame, named time slot 0, contains the basic MF control information. Two kinds of alignment are employed, frame and MF alignment. Frame alignment provides frame synchronism information: every even frame contains the “0011011” pattern from bit 2 to bit 8, and every odd frame has a ‘1’ in bit 2. MF alignment provides MF synchronism: bit 1 of the odd frames from 1 to 11 forms the “001011” pattern along an MF. The other time slots (1 to 31) carry voice and/or data. As the frame repetition rate is 8KHz and each frame has 256 bits, the basic frequency of the frame is 2048KHz. Time slot 16 can be used by higher level protocols to carry control information, such as signalling the start and end of a phone call.
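
The alignment rules above can be illustrated with a small software model. The following Python sketch is not part of the VHDL core; the helper names (make_test_multiframe, frame_aligned, multiframe_aligned) are hypothetical, and the values chosen for the C, E, A and S bits are arbitrary placeholders. A frame is modelled as a list of 256 bits, with index 0 holding bit 1 of time slot 0.

```python
FAS = [0, 0, 1, 1, 0, 1, 1]   # bits 2..8 of time slot 0 in even frames
MFAS = [0, 0, 1, 0, 1, 1]     # bit 1 of frames 1, 3, 5, 7, 9 and 11

BITS_PER_SLOT = 8
SLOTS_PER_FRAME = 32
FRAME_BITS = BITS_PER_SLOT * SLOTS_PER_FRAME   # 256 bits per frame
FRAME_RATE_HZ = 8000
LINE_RATE_HZ = FRAME_BITS * FRAME_RATE_HZ      # 2048000 bit/s


def make_test_multiframe(payload_bit=1):
    """Build 16 frames with valid alignment bits (hypothetical helper)."""
    bit1_odd = MFAS + [1, 1]  # placeholder E bits for frames 13 and 15
    multiframe = []
    for n in range(16):
        if n % 2 == 0:
            slot0 = [0] + FAS  # C bit left at 0: CRC is not computed here
        else:
            # bit 1 from the MF pattern, bit 2 = 1, A = 0, S4..S8 = 1
            slot0 = [bit1_odd[n // 2], 1, 0, 1, 1, 1, 1, 1]
        multiframe.append(slot0 + [payload_bit] * (FRAME_BITS - BITS_PER_SLOT))
    return multiframe


def frame_aligned(multiframe):
    """Check the basic frame alignment rules in time slot 0 of every frame."""
    for n, frame in enumerate(multiframe):
        slot0 = frame[:BITS_PER_SLOT]
        if n % 2 == 0 and slot0[1:] != FAS:
            return False
        if n % 2 == 1 and slot0[1] != 1:
            return False
    return True


def multiframe_aligned(multiframe):
    """Check that bit 1 of frames 1, 3, ..., 11 forms the MF pattern."""
    return [multiframe[n][0] for n in range(1, 12, 2)] == MFAS
```

A hardware synchronizer has to search for these patterns in a continuous bit stream; the sketch only checks a multiframe whose boundaries are already known.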
3. Drop_Insert Design, Validation and Prototyping

3.1 Design

The main function of the Drop_Insert circuit is threefold. First, it receives E1 frames coded in either HDB3 or AMI format [5] and a binary data stream to insert in the incoming frame, both from external sources. Then, it replaces a contiguous set of time slots in the incoming E1 frame with the data in the binary stream. Finally, the new E1 frame and the dropped set of time slots are sent to the external world. These functions are executed continuously, in real-time. Figure 2 shows the block diagram of the Drop_Insert IP soft core. The communication with the external world can be roughly divided into four interfaces: (i) E1 input-output; (ii) binary data stream; (iii) service channel; (iv) control.
[Figure 2: block diagram with the Decoder, Encoder, CRC4, Frequency Generator, Synchronization Circuit and FIFOS modules, the multiplexers mux1 and mux2, and the external signals described in the text.]

Figure 2 - Block diagram of the Drop_Insert.

The E1 input/output interface comprises two pairs of differential signals, (DI+, DI-) and (DO+, DO-), and the signal code, which selects between the HDB3 and AMI formats. The signal f2m is the synchronizing clock for reception and transmission of E1 frames.

The binary data stream interface includes the binary input and output signals dataInsert and dataDrop and associated control signals. The number of time slots to replace is specified in the 5-bit control input n64, and the first time slot of the set to be replaced is determined by the 5-bit control input SS. Since the set of time slots to change is contiguous, these two inputs determine the position of the information to substitute in the frame. If error conditions occur, such as SS+n64>31, the circuit ignores the insert information. Both the dataInsert and dataDrop signals are synchronized by the n64KHz control signal, made available at the external interface as an output. This signal is internally generated from the 50MHz input reference clock signal. The jitter tolerance specified in the ITU standard is 50 ppm. To meet the specification, an internal DLL had to be used to double the input clock frequency. The doubled frequency was used to generate the n64KHz frequencies by division.

The service channel interface is implemented by two 5-bit input and output signals, servIn and servOut, an output signal that controls the enabling of the service channel replacement, servCtrl, and an internally generated clock, 4KHz. The incoming service channel is replaced by the servIn data if the servCtrl signal is asserted.

MFAError and BFAError are output signals used to notify the possible absence of multiframe and basic frame alignment, respectively. The synchronized signal informs when the system is synchronized. CRC operation is governed by the CRCControl input signal, which enables the CRC computation, and reported by the CRCError output signal, which notifies the presence of CRC errors.

The Drop_Insert internal organization comprises 6 modules. The Decoder converts an electrical signal in one of two formats, HDB3 or AMI, into a logical signal, the bits of the E1 frame (dataDecod internal signal). The Encoder has functionality opposite to that of the Decoder. The CRC4 module computes the CRC and inserts it into the SMF following the current SMF. The Frequency Generator derives the n64KHz frequency from the 50MHz external clock; n64KHz is a multiple of 64KHz, with a multiplier in the range from 1 to 31 defined by the n64 input. The Synchronization Circuit is responsible for frame and MF synchronization, and for operation with or without CRC. This module controls multiplexers mux1 and mux2, using the signals muxInsert and CRCInsert. When MF synchronism is detected, this module generates the Multiframe Alignment (MFA signal in Figure 2) and a valid address signal for the FIFOS module.
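
As an illustration of what the CRC4 module computes, the Python sketch below models the CRC-4 of one SMF in software. It rests on assumptions that the paper does not spell out: the generator polynomial x^4 + x + 1 and the convention of clearing the four C bit positions before the division, as understood from Recommendation G.704 [6]. The function names are hypothetical and this is not the VHDL implementation.

```python
SMF_FRAMES = 8
FRAME_BITS = 256


def crc4(bits):
    """Remainder of x^4 * M(x) divided by x^4 + x + 1 (assumed generator).

    `bits` is the message M(x), first transmitted bit first.
    Returns the four remainder bits, most significant first (C1..C4).
    """
    reg = 0
    for bit in bits:
        feedback = ((reg >> 3) & 1) ^ bit
        reg = (reg << 1) & 0xF
        if feedback:
            reg ^= 0b0011  # low-order terms of the generator: x + 1
    return [(reg >> shift) & 1 for shift in (3, 2, 1, 0)]


def smf_crc(smf_bits):
    """CRC-4 of one 2048-bit SMF, with the C bit positions cleared first.

    The C bits sit in bit 1 of time slot 0 of the even frames of the SMF
    (assumption taken from the multiframe layout in Section 2).
    """
    if len(smf_bits) != SMF_FRAMES * FRAME_BITS:
        raise ValueError("an SMF must contain 8 frames of 256 bits")
    work = list(smf_bits)
    for frame in range(0, SMF_FRAMES, 2):
        work[frame * FRAME_BITS] = 0
    return crc4(work)
```

The four bits returned by smf_crc would be carried as C1 to C4 in the following SMF, where the receiver repeats the computation and compares the results.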
The Synchronization Circuit is also responsible for generating error detection and synchronization information for the external world.

The FIFOS module is a structure that allows inserting and dropping information to and from an E1 frame. This module in fact contains two first-in-first-out buffer structures implemented using dual port RAMs. The FIFOS module is needed to adapt the distinct bit rates found in the E1 carrier line and the dataInsert/dataDrop lines. It receives bits from the dataInsert input at rates smaller than 2048Kbps, controlled by the n64KHz input (ranging from 64KHz to 1984KHz), and sends bits through the dataDrop output at the rate specified in the n64KHz input. It also inserts/removes the bits in the E1 frame at 2048KHz.

Suppose there is a need to insert 2 bytes of information at each passing E1 frame, starting in time slot 3. This is done by setting n64=2 (thus making n64KHz=128KHz) and SS=3. Figure 3 shows the timing to read/write data from/to the buffers. Two distinct addresses must be available for writing and reading in the FIFOS module. During data insertion, one buffer is written using the n64KHz frequency and read using the f2m frequency. The other buffer must be operated at the same time for dropping. Thus, the second buffer is written using the f2m frequency and read using the n64KHz frequency. The address generated with the n64KHz frequency is called endn64 and the other address is called end2048.
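
The effect of the drop/insert operation on a single frame can be sketched in Python as follows. This is a behavioural model under simplifying assumptions, not the FIFOS implementation: the insert data is taken to be aligned with the time slot limits, the buffers and the two clock domains are not modelled, and the rejection of SS=0 is an added assumption (the paper only gives SS+n64>31 as an example of an error condition). The function names are hypothetical.

```python
def n64_rate_khz(n64):
    """Rate of the dataInsert/dataDrop lines for a given n64, in KHz."""
    return 64 * n64


def drop_insert(frame, ss, n64, insert_bits):
    """Replace n64 contiguous time slots of a 256-bit frame, starting at SS.

    Returns (new_frame, dropped_bits). When the parameters are invalid the
    frame is returned unchanged and nothing is dropped, mirroring the
    statement that the circuit ignores the insert information.
    """
    valid = (
        1 <= ss                           # assumption: slot 0 is never replaced
        and 1 <= n64 <= 31
        and ss + n64 <= 31                # error condition quoted in the text
        and len(insert_bits) == 8 * n64
    )
    if not valid:
        return list(frame), []
    start = 8 * ss
    end = start + 8 * n64
    dropped = list(frame[start:end])
    new_frame = list(frame[:start]) + list(insert_bits) + list(frame[end:])
    return new_frame, dropped
```

With SS=3 and n64=2, as in the example above, bits 24 to 39 of the frame (time slots 3 and 4) are replaced and returned as the dropped data, while the remaining 240 bits pass through untouched.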
[Figure 3: timing of three consecutive frames (time slots S0 to S31), the muxInsert line, and the reset and increment activity of the end2048 and endn64 addresses.]

Figure 3 - Insert and drop control operation. In the example, SS=3, n64=2, n64KHz=128KHz.

The muxInsert line determines, when at logic 1, the moment at which the data in the input buffer (received from the dataInsert line) should be inserted in the E1 frame. It also specifies the moment at which the replaced information should be taken out of the E1 frame and put in the output buffer. The time interval during which the muxInsert control line is at logic 1 is called the drop/insert period. The end2048 address is reset at the beginning of the first slot and incremented during the drop/insert period. As the incoming insert data arrives continuously, the endn64 address is continually incremented, being synchronized (i.e. reset) at the end of each drop/insert period. This is why dual port RAMs are used: during the drop/insert period the input and output buffers may be accessed simultaneously for writing and reading. It can be shown that simultaneous accesses never target the same address for any combination of SS and n64 values, which guarantees the correct behavior of the FIFOS module at all times.

3.2 Validation and Prototyping

The E1 carrier Drop_Insert has been described in 1800 lines of structural VHDL. Except for the FIFOS module, implemented using Xilinx FPGA Select-RAM primitives, the description is fully portable.

The functional validation is based on generating test E1 multiframes and insert information, and submitting these to the core description through a VHDL simulator. Another soft core was implemented in VHDL to produce test multiframes. This core is parameterizable to produce several correct test multiframe patterns, i.e. correct slot 0 information (with or without CRC) and a fixed pattern in slots 1 to 31 (any 8-bit pattern). The testbench instantiates this core with the chosen parameterization, produces an insert pattern (1 to 31 bytes) together with n64 and SS information, and applies this set of signals, together with clock and reset, to the E1 soft core.
In addition to the waveforms, the testbench produces a set of output files containing the generated E1 frames. Figure 4 gives a general view of the functional validation of the E1 soft core using a VHDL simulator. The synchronized signal indicates when a multiframe alignment was found. The dataDecod signal is obtained after the Decoder module. Shaded regions on dataDecod represent slot 0 information, while the test multiframe was generated with an all-1s pattern in slots 1 to 31. The outmuxCRC signal represents the E1 frame after data insertion. While muxInsert=1, new data is inserted into the E1 frame. The inserted data is the “11110000” bit pattern, repeated twice.
[Figure 4: simulation waveform spanning one frame (256 f2m periods), marking the inserted data and the pattern “11110000” to insert in the next frame.]

Figure 4 - General view of an example Drop_Insert simulation for n64=2 and SS=3.

A more detailed view of the same simulation is depicted in Figure 5. Again, the test E1 frame (dataDecod signal) has a constant '1' value in time slots 1 to 31. As indicated by SS and n64, time slots 3 and 4 receive the new data, “11110000”. It should be noted, however, that the inserted data stream is not aligned with the time slot limits, which is why the value shown in time slots 3 and 4 of Figure 5 is “11000011” instead of “11110000”. This behavior is in accordance with the E1 carrier standard, since it is the responsibility of the data terminal equipment (DTE) at each side of the network to provide end-to-end data synchronism.
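
The misalignment noted above can be reproduced with a few lines of Python. In this sketch the insert stream is a continuous repetition of the 8-bit pattern, and the time slot limits fall at an arbitrary phase of that stream. The function name slot_view and the offset of 2 bits are assumptions chosen only because they reproduce the “11000011” value reported for the simulation; the actual phase depends on the buffer timing.

```python
def slot_view(pattern, offset, n_slots):
    """Contents of n_slots consecutive time slots filled from a repeating
    bit pattern whose phase is shifted by `offset` bits with respect to
    the time slot limits."""
    stream = [pattern[(offset + i) % len(pattern)] for i in range(8 * n_slots)]
    return ["".join(stream[8 * k:8 * k + 8]) for k in range(n_slots)]


aligned = slot_view("11110000", 0, 2)   # slot limits coincide with the pattern
shifted = slot_view("11110000", 2, 2)   # slot limits fall 2 bits into the pattern
```

The receiving DTE sees the same continuous stream in both cases; only the position of the time slot limits relative to the pattern differs.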
[Figure 5: waveform detail covering slots 0 to 4 and the mux control line, with the bit values “0 0 0 1 1 0 1 1” and “1 1 0 0 0 0 1 1” annotated.]

Figure 5 - Detailed view of a drop/insert simulation for n64=2 and SS=3.

One major difficulty with the functional validation step was the number of simulation cycles needed to verify every aspect of the design, and the huge amount of output data produced. Figure 6 depicts the functional validation process, in which two C programs were written to help the IP validation. First, a parameterizable Generator creates test multiframes according to the parameters. Then, the Analyzer compares the simulation results with the input file and the input parameters. The testbench code also has a set of VHDL assert conditions to detect exceptions and critical operations.
[Figure 6: validation flow with the Parameters, the Generator, the Input file, the VHDL Simulator, the Output file, the Assert commands outputs, the Analyzer and the Result analysis.]

Figure 6 - Functional validation process.

Once the design was validated at the functional level, the Drop_Insert was prototyped and validated using E1 certification test equipment [11]. At the university laboratory, the VW-300 platform [12] was employed, with a 300,000-gate Virtex FPGA. At the enterprise laboratory, the Spartan II Evaluation Board platform [13] was used, with a 100,000-gate Spartan-II FPGA, replacing the 50MHz input frequency with a 32.768MHz clock in order to generate all internal clocks precisely. The complete E1 carrier Drop_Insert design occupied around 28,000 gates. The system is fully operational, fulfilling all design constraints of the original specification.

Since the Drop_Insert is relatively small and has no significant operation latency, several instances of it can be cascaded in the same FPGA. Figure 7 shows an example of how this core can be used to implement a multi-channel drop-insert module. This module allows multiple enterprises to share a single E1 line, each one allocating some fixed bandwidth as a subset of the 31 data time slots.
[Figure 7: an FPGA containing three cascaded Drop_Insert cores, one per enterprise (Enterprise 1, 2 and 3), connected to the E1 input and output lines through E1 transceivers.]

Figure 7 - Typical example of using Drop_Insert in building a product.

A typical commercial value of frame propagation latency is around one frame time (256 clock cycles) [9]. The approach used in this work reduces the latency to 11 clock cycles. Figure 8 shows the latency in the Drop_Insert soft core, from the Encoder input to the Decoder output. The Encoder and the Decoder each introduce 5 clock cycles of latency. The latency introduced by the remaining modules is only one clock cycle.

In cascade operation, Drop_Insert modules can be partially modified by removing the intermediate Encoder and Decoder modules, since internal communication between Drop_Insert cores may be done using a single wire. This reduces the latency of each Drop_Insert from 11 clock cycles to 1 clock cycle. Up to 31 Drop_Insert modules can be cascaded. The first Drop_Insert needs a Decoder module, the next 29 need neither Decoder nor Encoder modules, and the last Drop_Insert requires an Encoder module. In this case, the latency is 41 clock cycles. In addition, the removal of the Decoder and Encoder modules reduces the Drop_Insert area by 15%. This implies that the circuit can be implemented in an 800,000-gate FPGA.
[Figure 8: waveform annotated in clock cycles, showing the Decoder operation (5.5 cycles), the Encoder operation (5 cycles) and the total of 11 cycles.]

Figure 8 - Drop_Insert frame propagation latency.
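
The latency and area figures quoted above can be checked with simple arithmetic. The Python sketch below uses only numbers stated in the text (5 cycles for the Decoder, 5 for the Encoder, 1 for the remaining modules, 28,000 gates per complete core, 15% of that for the Decoder and Encoder pair). The additive area model, in which gate counts simply add up and one Decoder and one Encoder remain in the cascade, is an assumption made for illustration; the function names are hypothetical.

```python
F2M_HZ = 2_048_000      # E1 line clock
DECODER_CYCLES = 5
ENCODER_CYCLES = 5
CORE_CYCLES = 1         # all remaining modules of one Drop_Insert


def cascade_latency_cycles(n_cores):
    """Latency of n cascaded cores sharing one Decoder and one Encoder."""
    return DECODER_CYCLES + n_cores * CORE_CYCLES + ENCODER_CYCLES


def cycles_to_us(cycles):
    """Convert f2m clock cycles to microseconds."""
    return cycles * 1e6 / F2M_HZ


def cascade_area_gates(n_cores, full_core_gates=28_000, codec_percent=15):
    """Gate estimate for the cascade (assumed additive model)."""
    codec_gates = full_core_gates * codec_percent // 100
    stripped_gates = full_core_gates - codec_gates
    return n_cores * stripped_gates + codec_gates


single = cascade_latency_cycles(1)     # 11 cycles
cascade = cascade_latency_cycles(31)   # 41 cycles
area = cascade_area_gates(31)          # 742000 gates
```

Under this model the 31-core cascade needs about 742,000 gates, which is consistent with the 800,000-gate device mentioned above, and its 41-cycle latency corresponds to roughly 20µs, against 125µs for a full frame time.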
4. Conclusions

This work addressed drop-insert functions for E1 carriers. A better solution to the drop-insert problem than the available commercial implementations was developed and is currently being included in a product under development. The developed IP core offers a more than tenfold reduction in the latency of data processing when compared to an implementation based on a dedicated IC plus a microcontroller, and it also provides a faster and lower cost solution.

Validation is well known as a critical step in VLSI design. In this work, validation consumed 30% of the total design time, which amounts to 2 months. Prototyping was even shorter, consuming only 5% of the total design time.

Respecting the timing constraints in the core implementation was not critical in the design. The only timing restriction that was difficult to fulfil was the generation of the n64KHz frequencies within the jitter tolerance specified in the ITU standard, namely 50 ppm.

T1 carriers are very similar in structure to E1 carriers. The developed E1 carrier IP soft core can also serve as a basis for a T1 carrier IP core with minimum design effort.
5. References

[1] BERGAMASCHI, R. A.; LEE, W. R.: 'Designing systems-on-chip using cores'. 37th Design Automation Conference, 2000, pp. 420-425
[2] SEMICONDUCTOR INDUSTRY ASSOCIATION - SIA: 'International Technology Roadmap for Semiconductors 2002 Update'. http://public.itrs.net/Files/2002Update/Home.pdf
[3] XILINX, INC: 'Investor Factsheet - Third Quarter Fiscal Year 2003'. http://www.xilinx.com
[4] VAHID, F.; GIVARGIS, T.: 'Platform tuning for embedded systems design'. IEEE Computer, 2001, 34 (3) pp. 112-114
[5] ITU-T: 'Physical/electrical characteristics of hierarchical digital interfaces'. Recommendation G.703, 1998
[6] ITU-T: 'Synchronous frame structures used at 1544, 6312, 2048, 8448 and 44736 kbit/s hierarchical levels'. Recommendation G.704, 1998
[7] ITU-T: 'Frame alignment and cyclic redundancy check (CRC) procedures relating to basic frame structures defined in Recommendation G.704'. Recommendation G.706, 1991
[8] XILINX, INC: 'Product Data Sheets'. 2003. http://www.xilinx.com/partinfo/databook.htm
[9] DALLAS SEMICONDUCTOR: 'DS21354 and DS21554 E1 Single Chip Transceivers'. Data Sheet. 2003. http://www.dalsemi.com
[10] ZARLINK SEMICONDUCTOR: 'MT9075B E1 Single Chip transceiver with LIU'. Data Sheet. 2003. http://www.zarlink.com
[11] ACTERNA CORPORATION: 'Acterna PFA-35 Comprehensive E1 and Data Tester'. 2003. http://www.acterna.com/products/descriptions/PFA-35
[12] VIRTUAL COMPUTER CORPORATION: 'The Virtual Workbench Guide, Version 1.02.2'. Reference Manual, 1999
[13] AVNET, INC: 'Xilinx SpartanII Evaluation Kit'. 2003. http://www.avnet.com