
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 7, JULY 1999


SACHEM, a Versatile DMT-Based Modem Transceiver for ADSL

L. Kiss, K. Adriaensen, C. Gendarme, E. Hanssens, M. Huysmans, F. Van Beylen, and H. Van De Weghe

Abstract— The complete digital processing (physical medium dependent) of the discrete multitone modulation scheme and the transmission convergence layer has been integrated into a single device, fabricated in a standard 0.35-µm CMOS process. Power and area are kept well within limits, making this device a cost-effective solution for asymmetric digital subscriber line and network terminating systems. New design methods were used to meet the severe time-to-market constraint of this high-complexity device.

Index Terms— Asymmetrical digital subscriber line (ADSL), digital signal processing (DSP), discrete multitone (DMT), fast Fourier transform (FFT), very high-level description language (VHDL), very large-scale integration (VLSI), Viterbi.

I. INTRODUCTION

The most important feature of an asymmetrical digital subscriber line (ADSL) is that it can provide high-speed digital services on existing copper wire pairs, in overlay, and without interfering with the traditional analog telephone service (plain old telephone system); see Fig. 1. Thanks to its highly efficient line coding technique, ADSL can offer new services such as high-speed Internet and online access, home working, and video on demand to every residential telephone subscriber. The technology is largely independent of twisted pair characteristics, so it can be applied universally, virtually regardless of the actual parameters of the local loop.

The modulation technique for ADSL, which has been standardized in T1.413 [1], is discrete multitone (DMT), a special form of multicarrier modulation [2], [3]. Fundamentally, DMT modulation superimposes several carrier-modulated waveforms to represent the input bitstream. The DMT transmit signal (see Fig. 2) is the sum of independent subsignals, each of equal bandwidth and with equispaced center frequencies. Each subchannel can be considered as a quadrature amplitude modulated (QAM) signal. In a DMT modulation scheme, the number of input data bits allocated to distinct subchannels is variable. Subchannels that encounter less attenuation and less noise carry more bits of information.

The chip we propose reaches the highest integration level. The complete digital signal processing (DSP) for ADSL DMT-based modem functionality and transport convergence functions such as (de)interleaving, Reed–Solomon (RS)

Manuscript received October 26, 1998; revised January 19, 1999. The authors are with the Microelectronics Department, Alcatel Antwerp, Antwerp 2018 Belgium (e-mail: [email protected]). Publisher Item Identifier S 0018-9200(99)04729-0.

Fig. 1. ADSL network.

Fig. 2. DMT frequency division.

(de)coding, (de)scrambling, and (de)framing are integrated in the single device called SACHEM.

II. ARCHITECTURE

SACHEM is used in both central office (line termination) and remote (network termination) applications and is designed for sampling rates up to 8.8 Ms/s with DMT symbols at 4 kHz. On the other side, it can interface with asynchronous transfer mode (ATM) devices through a Utopia interface (level 1 and level 2) or with synchronous devices through a SLAP interface

0018–9200/99$10.00 © 1999 IEEE



Fig. 3. SACHEM architecture.

Fig. 4. DSP front-end architecture.

(Alcatel proprietary). The following main functions can be distinguished in SACHEM (see Fig. 3):
• up- and downsampling;
• time-domain equalization (TEQ);
• time–frequency conversion and vice versa;
• frequency-domain equalization (FEQ);
• symbol alignment;
• frequency deviation tracking;
• constellation (de)coding and tone ordering;
• channel (de)coding;
• ATM.

A. DSP Front End

The DSP front end contains a transmit part that performs filtering and upsampling, a receive part that performs downsampling and time-domain equalization, and some test functionality such as bypass and transmit–receive looping (see Fig. 4). The receive path performs decimation and time-domain equalization. The decimator (see Fig. 5) receives 16-bit words at 8.8 MHz from the analog front end and reduces the rate to 552 kHz in a central-office (CO) application and to 2.2 MHz in a remote (R) application. Downsampling by a factor of 16 is performed by a cascade of half-band finite impulse response (FIR) filters: two three-tap triangular filters reducing the rate by four, followed by a 15-tap Hamming compensated FIR filter reducing the rate by two, and finalized by a 59-tap Hamming compensated FIR filter bringing the rate to 552 kHz. The factor-of-four downsampling is obtained by dropping the up-front triangular filters, which gives an output rate of 2.2 MHz. The time equalizer is a FIR filter with programmable coefficients, mainly intended to reduce the effect of intersymbol interference (ISI) by shortening the



Fig. 5. Decimator architecture.

Fig. 6. Receive FFT.

channel impulse response. Its length is determined by the type of application: 64 taps in central-office and 32 taps in remote configuration. The transmit direction includes sidelobe filtering, clipping, delay equalization, and interpolation. The sidelobe filtering and delay equalization are implemented in a three-stage and a two-stage biquad [second-order infinite impulse response (IIR)] filter, thus reducing the effect of echo. Clipping limits the amplitude of the output signal by a FIR type of structure and as such optimizes the dynamic range of the analog front end. The interpolator performs an upsampling of two, from 4.4 to 8.8 MHz, in a CO application by a seven-tap triangular FIR filter. An upsampling of four, from 2.2 to 8.8 MHz, is performed in a remote application by a simple hold function. The noise shaper reduces the word size from 16 to 13 bits by a first-order IIR and thus minimizes the noise introduced by word-size reduction.

B. FFT, Rotor, FEQ, and FTG

The FFT is instantiated twice in SACHEM. It is used as a DMT carrier demodulator in the receive direction and as a modulator in the transmit direction. It is a programmable machine with an instruction set, which can do all the processing for one DMT symbol in less than 250 µs and is based on a dedicated pipelined multiplier-accumulator arithmetic logic unit (ALU). The ALU contains two 20 × 18 fixed-point multipliers and two buses: one for data, two times 20 bits; and one for coefficients, two times 18 bits. The ALU performs complex radix-2 and radix-4 decimation-in-time (I)FFT butterflies, special “resolve” butterflies to combine results of real FFTs, scaling, and complex-times-complex as well as complex-times-real multiplications.

In the receive direction, the FFT (see Fig. 6) used as a DMT carrier demodulator performs the following functions.
• A real-time to positive-frequency conversion, from 512 (CO) or 128 (R) time samples to 256 (CO) or 64 (R) complex positive frequencies, with a maximum computing delay of 92 µs.
• Frequency equalization, a rotation (360° maximum) to align the received carriers on the constellation axes, is performed to reduce the signal phase rotation caused by carrier-specific channel distortion. Signal amplitude attenuation by the same distortion can be compensated by applying a fine gain, between 0 and 6 dB, in the FEQ computation, thereby adjusting the received vector to the demapping grid. FEQ calculation is performed within



15 µs for remote and 5 µs for central-office application.
• ROTOR, performed on positive frequencies, applies a linear phase correction to compensate for a misalignment of the sampling clock. In effect, it interpolates the sampling clock of the received data to any intermediate point, but in the frequency domain. Each frequency component obtained after FEQ is rotated by a phase equal to the content of an accumulator that is incremented with ROTOR for each next frequency. This process is performed in two steps, a coarse adjustment followed by a fine adjustment, the last step using the six least significant bits of the ROTOR value. The computational delay is 30 µs in remote and 9 µs in central-office application. Refer to the next section for a detailed explanation of this feedback loop.

In the transmit direction, the IFFT performs complementary operations.
• Fine tune gain (FTG), meant to correct the gain of individual carriers, an operation taking 15 µs for 256 frequencies (R) and 30 µs for 512 frequencies (CO).
• ROTOR calculation to adjust the frequency error between the local crystal and the desired transmit frequency. The computing delay of both operations, coarse and fine, is 30 µs for remote mode and 9 µs for central-office application.
• The IFFT performs a positive-frequency to real-time-sample conversion: in remote mode, 256 positive frequencies are processed, yielding 512 time samples within 76 µs, while in central-office mode, 512 positive frequencies (256 computed frequencies plus 256 interpolated frequencies) are converted to 1024 time samples within 178 µs.

C. ROTOR Loop

As explained above, the ROTOR function allows the receiver clock to track the transmitter clock. For this purpose, a feedback loop is used as shown in Fig. 7. A dedicated tone is taken as pilot, and the phase of this tone is closely monitored. A measure of the phase error (15 bits) with respect to the correct demapping point is fed into an integrator filter (DPLL), whose output is given to the ROTOR function as a measure of the correction to apply to each tone. This feedback loop is designed so that a constant frequency error between the central-office clock and the remote clock can be corrected without any residual phase error after the ROTOR function. As the correction cannot grow infinitely, a sample skip or duplicate is performed each time the phase correction to apply becomes equivalent to one sample. Actually, the skip or duplicate action is performed when 3/4 of such an equivalent sample phase is reached, in order to reduce the dynamic range of the ROTOR while keeping sufficient hysteresis. Fig. 8 shows the typical evolution of the correction phase (input to ROTOR) for a startup phase mismatch. Notice the discontinuity corresponding to a sample skip.
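The linear phase correction and the 3/4-sample wrap described above can be illustrated with a small Python sketch. It is not the SACHEM implementation: the function names, the use of radians per bin as the ROTOR unit, and the choice of which sign triggers a skip rather than a duplicate are all assumptions made for this example. The sketch relies only on the standard property that delaying a block by one sample multiplies DFT bin k by a phase proportional to k.

```python
import cmath
import math


def dft(samples):
    """Plain O(n^2) DFT, sufficient for a small demonstration."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def apply_rotor(bins, rotor):
    """Rotate successive bins by a phase that grows by `rotor` radians per bin."""
    corrected = []
    phase = 0.0  # accumulator, incremented with `rotor` for each next frequency
    for value in bins:
        corrected.append(value * cmath.exp(1j * phase))
        phase += rotor
    return corrected


def wrap_correction(offset_in_samples, threshold=0.75):
    """Trade one whole sample of correction for a sample skip or duplicate.

    The 3/4-sample threshold follows the text. Mapping a positive offset to
    "skip" and a negative one to "duplicate" is an assumption of this sketch.
    """
    if offset_in_samples >= threshold:
        return offset_in_samples - 1.0, "skip"
    if offset_in_samples <= -threshold:
        return offset_in_samples + 1.0, "duplicate"
    return offset_in_samples, None


# A one-sample circular delay is undone by a rotor step of 2*pi/n per bin.
n = 16
signal = [
    math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.cos(2 * math.pi * 5 * t / n)
    for t in range(n)
]
delayed = [signal[(t - 1) % n] for t in range(n)]
restored = apply_rotor(dft(delayed), 2 * math.pi / n)
max_error = max(abs(a - b) for a, b in zip(restored, dft(signal)))
```

With a fractional rotor step the same rotation corresponds to a sub-sample shift of the sampling instant, which is the interpolation effect the text refers to. After a wrap at +3/4 the remaining offset is -1/4, so the opposite action is not triggered until the offset has moved a further half sample, which gives the hysteresis mentioned above.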

Fig. 7. ROTOR loop.

Last, instead of performing the same process of DMT clock tracking on the receive path at the central office as is done at the remote, the adjustment is performed up front for the upstream channel, i.e., in the transmit path of the remote. This precompensation is better in terms of ISI and makes the remote work as the slave while the central office is the master.

D. Constellation (De)coding

The receive part mainly contains the following blocks.
• The demapper, which converts the constellation points computed by the FFT to a block of bits by use of a programming table (tone ordering). This essentially consists of identifying a point in the two-dimensional (2-D) QAM constellation plane. It is also capable of demodulating four-dimensional (4-D) Trellis encoded carriers [4], [5] using the 2-D subset information provided by the Trellis decoder.
• The Viterbi decoder, further detailed in Section II-E.
• The monitor, which computes error parameters to be used for updates of adaptive filter coefficients (FEQ, TEQ), clock phase adjustment (DPLL), and error detection (loss of signal, loss of frame). Signal detection, also part of the monitoring activities, is built around eight configurable leaky integrators whose outputs are fed to a highly programmable level detector. Error parameters obtained by linear monitoring can be used for automatic updates of FEQ coefficients. This adaptive process can be inhibited for the pilot tone and also in case of loss of signal in order to avoid incorrect coefficient updates.

The transmit block is less complex and only contains the following functions: 4-D Trellis encoding and mapping of data by use of a programming table (tone ordering). The Trellis encoder fetches the information of a pair of tones and adds one redundant bit, thus creating an overhead of 1/2 bit per tone.

E. Viterbi Decoder

The purpose of the Viterbi decoder is to estimate the most likely 4-D subset based on a long data sequence. According to the ANSI specification [1], the 16-state 4-D trellis code of Wei [5] is used. Therefore a 16-state Viterbi



Fig. 8. ROTOR-loop phase correction.

Fig. 9. Viterbi decoder.

decoder was selected. Furthermore, the Viterbi decoding is “DMT aligned,” meaning that the decoder is in a known state at the beginning and at the end of each DMT symbol. In this configuration, and for this application, system simulations have shown that a backtrace length of 20 Viterbi symbols (each Viterbi symbol corresponding to two tones) gives almost all the expected gain. Fig. 9 shows the block architecture of the decoder. For each 4-D tone (pair of DMT tones), the 64 weights to be assigned to the different branches of the trellis are computed by the branch metric unit. Then, the add-and-compare unit selects the survivor paths for the Viterbi states on the basis of the accumulated weights on the different paths. To reduce the dynamic range of those accumulated weights, hereafter called state metrics, one usually subtracts at each step the minimum state metric from all others. However, this is a computationally intensive task requiring some extra complexity. A known technique has been applied in SACHEM to reduce the computational load [6]. By slightly increasing the dynamic range of the state metrics and performing a “rescale” operation only when the minimum state metric has reached the half-dynamic boundary, this extra complexity can be reduced. A “rescale” operation is performed simply by resetting the most significant bit of all state metrics, thereby avoiding the costly subtraction of the minimum state metric. Fig. 10 shows the principle of this rescale operation. With this efficient


Fig. 10. Viterbi rescale operation.
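The rescale operation of Fig. 10 can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the function name `rescale`, the list representation of the state metrics, and the 6-bit width in the usage note are illustrative, and testing "every metric has its most significant bit set" is one possible way to detect that the minimum has reached the half-range boundary without searching for the minimum.

```python
def rescale(state_metrics, bits):
    """Clear the most significant bit of every metric once all of them have it set.

    Each metric is assumed to be a non-negative integer below 2**bits.
    """
    msb = 1 << (bits - 1)
    # If every metric has its MSB set, the minimum is at or above half range.
    if all(metric & msb for metric in state_metrics):
        # Clearing the MSB subtracts the same constant (2**(bits-1)) from all
        # metrics, so the differences that drive path selection are unchanged.
        return [metric & (msb - 1) for metric in state_metrics]
    return list(state_metrics)
```

For example, with 6-bit metrics, `rescale([40, 35, 63, 32], 6)` yields `[8, 3, 31, 0]`, while `rescale([40, 10, 63, 32], 6)` leaves the metrics untouched because one of them is still below 32.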

technique, it is still possible to reduce the dynamic range of the state metric to 6 bits without degrading the decoder performance. After selection of the survivor paths, the one with the lowest state metric is chosen to start the backtrace operation and to find, 20 Viterbi symbols earlier (backtrace length), the most likely 4-D subset, which is afterwards translated to 2-D information and sent back to the demapper unit.

F. Transmission Convergence Layer

The data received from the demapper are split into two paths, one dedicated to the interleaved or slow data flow and the other to the noninterleaved or fast data flow. The interleaving/deinterleaving is used to increase the error-correction capabilities of block codes for error bursts. Interleaving a block code with a given depth multiplies its burst-error-correction capability, in bytes, by that depth. SACHEM uses rectangular interleaving with depths 1, 2, 4, 8, 16, 32, and 64. The RS decoder [7], [8] is able to correct errored bytes by using the redundant bytes and erasure information. The error-correcting capability of an RS code is limited as follows: the number of erased bytes plus twice the number of undetected error bytes cannot exceed the number of RS code-word bytes minus the number of data bytes. SACHEM, configurable up to 16 overhead bytes (the number of overhead bytes being even), is capable of decoding three RS code words within one DMT symbol. Two physical-medium-dependent (PMD) descramblers are used, one for slow and one for fast data, performing the descrambling specified in ADSL Standard T1.413 [1]. The deframer is a highly programmable synchronization machine with a variable synchronization delay that depends on the number of DMT symbols per code word and on the interleaving depth. The deframer also extracts some specific bytes used for frame synchronization, ADSL messaging, and redundancy checks.
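The RS correction bound and the effect of interleaving depth can be checked numerically. The following Python sketch is an illustration only: the function names and the 255/239-byte code word used in the usage note are assumptions (the text states only that up to 16 overhead bytes are supported), and the burst figure is the idealized value for a perfectly interleaved code without erasure information.

```python
def rs_can_correct(erased_bytes, error_bytes, codeword_bytes, data_bytes):
    """Return True if an RS code word with this damage is still correctable.

    Erasures cost one redundant byte each, undetected errors cost two.
    """
    redundancy = codeword_bytes - data_bytes
    return erased_bytes + 2 * error_bytes <= redundancy


def interleaved_burst_capability(depth, redundancy_bytes):
    """Idealized correctable burst length in bytes when no erasures are flagged."""
    errors_per_codeword = redundancy_bytes // 2
    return depth * errors_per_codeword
```

Under these assumptions, a code word with 16 overhead bytes tolerates 8 undetected error bytes, or 16 erased bytes, or a mix such as 4 erasures and 6 errors. At the maximum interleaving depth of 64, the same code would ideally cover a burst of 512 bytes.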

Last, the two byte streams (slow and fast) are presented to an ATM byte-based processing unit, which provides basic cell functions like cell synchronization, payload descrambling, idle/unassigned cell filtering, and cell header detection and correction, all according to the ITU-T I.163 standard. Provision is also made for bit error rate measurement. The transmit path is the dual of the receive path and contains ATM transmission-convergence-layer functions, framing, PMD scrambling, RS encoding, and interleaving.

III. DESIGN METHODOLOGY

Some basic design rules were followed to significantly reduce complexity and hardware development time and to increase design efficiency and testability. Our first constraint was that all building blocks of SACHEM must be configurable for both central-office and remote applications. Second, we wanted to introduce a high level of programmability, thus enabling optimal scheduling (reduced latency) and providing high access and visibility for device behavior validation. Enhanced testability and verification were also ensured by providing high accessibility to internal memories and test loops. Last, we adopted a modular design approach and used very high-level description language (VHDL) generics, mainly in the DSP front end. A very fine example of this design approach is the half-band symmetrical decimator. The architecture of these half-band FIR filters was optimized in such a way that the number of calculations was kept to a minimum. This was realized by choosing the FIR coefficients very carefully: the even coefficients were all zero except for the central tap, while the odd coefficients were made symmetrical with respect to the central tap. This enabled us to introduce one multiplication for the even samples and to divide the number of multiplications by two for the odd samples. Notice the folding back of the odd sample delay line, followed by an odd sample addition and then the multiplication.
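The folding idea can be sketched in Python as follows. This is not the authors' VHDL: the function names, the default central coefficient of 0.5, and the coefficient values used in the usage note are assumptions made for illustration. The sketch shows only the arithmetic structure, namely one multiplication for the central tap and one multiplication per symmetric pair of odd taps, with the addition performed before the multiplication.

```python
def halfband_decimate(samples, odd_coeffs, center_coeff=0.5):
    """Decimate by two with a symmetric half-band FIR.

    odd_coeffs[i] is the tap at offsets +/-(2*i + 1) from the central tap;
    taps at non-zero even offsets are zero and are never computed.
    """
    pairs = len(odd_coeffs)
    reach = 2 * pairs - 1  # offset of the outermost non-zero tap
    outputs = []
    for n in range(reach, len(samples) - reach, 2):  # keep every second output
        acc = center_coeff * samples[n]  # single multiplication, central tap
        for i, coeff in enumerate(odd_coeffs):
            offset = 2 * i + 1
            # Fold the delay line: add the mirrored samples, then multiply once.
            acc += coeff * (samples[n - offset] + samples[n + offset])
        outputs.append(acc)
    return outputs


def expand_taps(odd_coeffs, center_coeff=0.5):
    """Build the full tap vector (length 4*pairs - 1) of the same filter."""
    pairs = len(odd_coeffs)
    reach = 2 * pairs - 1
    taps = [0.0] * (2 * reach + 1)
    taps[reach] = center_coeff
    for i, coeff in enumerate(odd_coeffs):
        offset = 2 * i + 1
        taps[reach - offset] = coeff
        taps[reach + offset] = coeff
    return taps
```

With a single pair, `expand_taps([0.25])` gives the three-tap triangular filter `[0.25, 0.5, 0.25]`. With 15 pairs the full filter has 59 taps, yet each output needs only 16 multiplications in the folded form.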
Further design speed improvement was accomplished by generalizing this structure and defining some basic parameters such as decimation rate (c_Division), number of taps (c_HBwidth), input and output word size (c_VectorWidth), filter coefficients’ word size (c_CoefWidth), and number of guard bits (c_GuardWidth). This “generic”


Fig. 11. Symmetrical half-band decimator.

TABLE I SIMULATION SPEED OVERVIEW

approach made it possible to create any type of half-band symmetrical FIR decimator just by defining its input parameters. This meant a significant reduction in design time. Fig. 11 shows a 59-tap implementation of such a half-band FIR.

Three new basic approaches were introduced in the design flow in order to meet stringent time-to-market requirements: modeling (C++ and behavioral VHDL), simulation flexibility, and design mapping on an emulator to speed up test and software development even before any silicon was available. Bit-true C-models of the PMD layer were used to verify system performance and to match against VHDL simulation results. The simulation environment of SACHEM was built in such a manner that designers were able to switch among behavioral C++, register-transfer-level, Verilog gate-level, or mapped data-base (emulator) simulations without worrying about stimuli or input files. Mapping of SACHEM on an emulator enabled us to significantly speed up simulation time by a factor of more than 100 when the emulator was driven

Fig. 12. SACHEM photograph.

directly by a workstation or more than 100 000 when the emulator was driven by a test board (with simulation software being compiled on the target on-board controller). Moreover, software development could start as soon as a netlist was mapped, i.e., several weeks before the first available samples. Table I gives an indication of achieved run times when simulating 10 s of an ADSL initialization procedure. It is obvious that major improvements in simulation speed, and thereby in coverage and level of confidence, can be obtained by both modeling and hardware emulation.

IV. RESULTS

A photograph of SACHEM is shown in Fig. 12. The device is designed in 0.35-µm digital CMOS technology (five metal layers) and packaged in a 144-pin plastic quad flat pack.



TABLE II ASIC CHARACTERISTICS

L. Kiss was born in Antwerpen, Belgium, in 1958. He graduated in electronics engineering from Katholieke Industriele Hogeschool, Antwerpen, in 1981. In 1982, he joined the Alcatel VLSI Design Center, Antwerp, Belgium, where he was involved in CMOS full- and semicustom digital designs of telephone circuits. Since 1991, he has been Project Leader for several CMOS ASIC’s for telecommunication applications in the field of switching, access, mobile and xDSL. He is currently leading an improvement program on VLSI design processes and methodologies within the Alcatel Microelectronics Design Department, Antwerp.

K. Adriaensen was born in Hoogstraten, Belgium, in 1966. He received the electrical engineering degree from the Katholieke Universiteit Leuven, Belgium, in 1989. In 1989, he joined the Microelectronics Design Department, Alcatel Bell, Antwerp, Belgium, where he has been responsible for the development of telecommunication VLSI circuits for switching, BISDN, and xDSL. Since 1994, he has been leading the xDSL Microelectronics Design Group. He currently is Project Leader for the development of next-generation central office ADSL products.

Thanks to special attention during development, area and power consumption are below usual figures for designs of such complexity. Table II gives an overview of SACHEM’s main characteristics.

V. CONCLUSION

This paper shows a high-complexity design, approaching the system-on-chip concept, which was developed within a very short lead time and introduced new design methodology concepts such as high-level modeling (C++) and use of emulation for parallel engineering. The presented approach formed the basis for hardware and software codesign and for device verification in “real” conditions well before the availability of silicon.

REFERENCES

[1] Network and Customer Installation Interfaces—Asymmetric Digital Subscriber Line (ADSL) Metallic Interface, ANSI Standard T1.413, 1995.
[2] A. Peled and A. Ruiz, “Frequency domain data transmission using reduced computational complexity algorithms,” in Proc. Int. Conf. Acoustics, Speech and Signal Processing, Denver, CO, Apr. 1980, pp. 964–967.
[3] A. Ruiz and J. M. Cioffi, “A frequency domain approach to combined spectral shaping and coding,” in Proc. ICC’87, Seattle, WA, June 1987, pp. 1711–1715.
[4] G. Ungerboeck, “Trellis-coded modulation with redundant signal sets part 1: Introduction,” IEEE Commun. Mag., vol. 25, pp. 5–21, Feb. 1987.
[5] L.-F. Wei, “Trellis-coded modulation with multidimensional constellations,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 483–501, July 1987.
[6] A. P. Hekstra, “An alternative to metric rescaling in Viterbi decoders,” IEEE Trans. Commun., vol. 37, pp. 1220–1222, Feb. 1989.
[7] S. Choomchuay and B. Arambepola, “Time domain algorithms and architectures for Reed-Solomon decoding,” Proc. Inst. Elect. Eng., vol. 140, pt. I, no. 3, pp. 189–196, June 1993.
[8] C.-H. Wei, C.-C. Chen, and G.-S. Liu, “High-speed Reed-Solomon decoder for correcting errors and erasures,” Proc. Inst. Elect. Eng., vol. 140, pt. I, no. 4, pp. 246–254, Aug. 1993.

C. Gendarme received the master’s degree in electronics for telecommunications systems engineering from the Ecole Nationale Superieure des Telecommunications, Paris, France, in 1996. In 1997, he joined the Microelectronics Design Department, Alcatel Bell, Antwerp, Belgium, where he is working on data communication products for ADSL and broadband switches. His research interests are in integrated system definitions.

E. Hanssens was born in Brussels, Belgium, in 1969. He received the M.S. and Ph.D. degrees in electrical engineering from the Université Catholique de Louvain, Belgium, in 1991 and 1996, respectively. Since 1996, he has been with Alcatel Bell, Antwerp, Belgium, where he works on design and implementation of ADSL modems. His research interests include signal processing as well as digital VLSI circuit designs.

M. Huysmans was born in Herentals, Belgium, in 1959. He graduated in electronics engineering from Hoger Instituut der Kempen, Geel, Belgium, in 1981. In 1990, he joined the Microelectronics Design Department, Alcatel Bell, Antwerp, Belgium, where he was working on ASIC designs for passive optical networks. In 1996, he became a Feasibility Engineer for ADSL integrated circuits. He currently is a System Engineer for ADSL network terminations within the ADSL Design Group. His special interests are within telecommunication system complexity control domains.


F. Van Beylen graduated from Groep-T, Leuven, Belgium, in 1992. In 1993, he joined the Microelectronics Design Department, Alcatel Bell, Antwerp, Belgium, where he has worked on CMOS digital designs for telecommunication products in the field of xDSL and SONET. He currently is Project Leader of an ASIC for broadband applications.


H. Van De Weghe graduated from Hoger Technisch Instituut Gent, Belgium, in 1973. Currently, he is with the Microelectronics Design Department, Alcatel Bell, Antwerp, Belgium, where he is involved in the design of CMOS digital circuits for telecommunication applications in the field of xDSL and broad-band switching. His main interests are in the field of high-level description languages and high-level synthesis.