REDUCED-COMPLEXITY VITERBI DETECTOR ARCHITECTURES FOR PARTIAL RESPONSE SIGNALLING

Gerhard Fettweis

Razmik Karabed

Paul H. Siegel

Hemant K. Thapar

Mobile Communication Systems, Dresden University of Technology, D-01062 Dresden, Germany

GEC Plessey Semiconductors, Scotts Valley, CA 95067

University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0407

DataPath Systems, Inc., Santa Clara, CA 95051

Abstract - The objective of this paper is to provide a general framework for optimizing the design of Viterbi detectors to achieve substantial reductions in hardware complexity as well as power consumption. Specific application of the framework to partial response systems is discussed. It is shown that the complexity of the add-compare-select unit for an N-state Viterbi detector is approximately N adders. Compared to conventional architectures, this represents a 50% savings for partial response signals of interest in magnetic storage.

This work was done while the authors were with IBM Research Division, Almaden Research Center, San Jose, CA.

0-7803-2509-5/95 US$4.00 © 1995 IEEE

I. INTRODUCTION

Uncoded [1, 2] and coded [3] partial response signalling methods have been unequivocally demonstrated [4, 5] to provide substantial performance benefits in digital magnetic recording systems. However, their application in products, especially those with a small form-factor and low power dissipation requirement, crucially depends on the complexity of the receiver implementation. One of the biggest concerns is the complexity of the required Viterbi detector (VD) for such systems. This paper is intended to address that concern and render these advanced signal processing and coding techniques viable for data storage applications requiring low power dissipation.

In this paper, we develop a general framework for designing low-complexity architectures for the Viterbi detector. The reader is assumed to be familiar with the Viterbi algorithm and its implementation, which typically comprises a pipeline of three units: a branch metric unit (BMU) to compute the branch metrics that depend on the data input; an add-compare-select unit (ACSU) that processes the ACS-recursion; and the survivor memory unit (SMU) that stores the decisions made in the ACSU and provides the decoded output. (Those unfamiliar with the Viterbi algorithm are referred to [6].) Our focus here is specifically on the ACSU. We show that the complexity is substantially reduced by transforming the “add-compare-select” operation into a “compare-select-add” operation. The application of the general framework is illustrated with examples of Class 4 (PR4) and extended Class 4 (EPR4) partial response systems. It is shown that the complexity of the ACSU is reduced by about 50% relative to the conventional architectures. Section 2 describes the “trellis transformations” required to achieve hardware savings in the ACSU. Application of the transformations to PR4 and EPR4 is illustrated in Section 3. Section 4 summarizes the results.

II. OPTIMIZATION BY SHIFTING BRANCH METRICS

This section is intended to give an introduction to a novel tool, namely the shifting of branch metrics, which allows the manipulation and transformation of the ACS-loop operations. The tool is fundamentally related to the M-step parallelization introduced in [7] to break the inherent bottleneck in the conventional 1-step approach [6]. The key for an M-step parallelization of the VD is the validity of the distributive law of addition and maximum/minimum selection. From here on in this section, without loss of generality, we treat only the case of maximization, for which the distributive law takes the form:

$$\max(x + a,\; y + a) = \max(x, y) + a. \tag{1}$$

This relationship may be exploited not just for parallelization of the algorithm to achieve higher speed [8], but also to realize a substantial reduction in the hardware complexity of a VD, as discussed below. In terms of a signal flow graph, the fact that (1) holds allows us to shift additions on branches over maximization nodes. This is shown in Fig. 1a, where the values a and b are added to the two variables x and y, respectively, before taking the maximum. Fig. 1a can be transformed to the equivalent operation shown in Fig. 1b, where the value a has been shifted over the maximization node, in compliance with the distributive law (1).

When decoding with the Viterbi detector, the trellis diagram, with its branch labels defined by the branch metrics, describes the algorithm that needs to be carried out. Unlabeled branches leaving a node correspond to the distribution of the node variable; branches labeled with branch metrics correspond to the addition of the branch metric to the node variable being distributed along the branch; and branches merging in a node correspond to selecting the maximum of all the variables into the node. Using these observations, the trellis given in Fig. 2a can be redrawn as a “2-iteration trellis” shown in Fig. 2b, separat-
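The shifting of an addition over a maximization node can be checked numerically. The following is a minimal sketch, assuming integer metrics; the function names are illustrative, and the residual offset b - a left on the second branch follows from the distributive law (1) rather than from a formula stated in the text.

```python
import random

def add_then_max(x, y, a, b):
    """Fig. 1a style: add a and b on the two branches, then take the maximum."""
    return max(x + a, y + b)

def max_then_add(x, y, a, b):
    """Fig. 1b style: a is shifted over the maximization node.

    Only the difference b - a stays on the y branch (an algebraic
    consequence of the distributive law, assumed here for illustration).
    """
    return max(x, y + (b - a)) + a

random.seed(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    x, y, a, b = (random.randint(-50, 50) for _ in range(4))
    assert add_then_max(x, y, a, b) == max_then_add(x, y, a, b)
```

Both forms give the same result for every input, but the second one places the addition of a after the selection, which is what turns an add-compare-select into a compare-select-add.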


Fig. 3: (a) Generic 2-state butterfly diagram. (b) Result of shifting addition of branch metrics a and c. (c) Result of shifting addition of branch metric -a + b. (d) Resulting branch-metric-shifted “2-iteration” trellis diagram.

threefold product of two diagonal matrices with a matrix

Fig. 1: Illustration of shifting br

If $C > 0$, then $y_0 < y_1$ and $y_0 > y_1 + C$ cannot occur simultaneously. If $C < 0$, then $y_0 > y_1$ and $y_0 < y_1 + C$ can never occur. Hence, only three of the four possible choices exist for different values of $y_0$ and $y_1$.

Second, the two comparators for the two maximum selections can share common hardware, since the difference of two numbers is compared to two levels, 0 and C. This reduces the complexity to that of approximately one adder in case of conventional binary arithmetic (see Section III.C). In case of carry-save arithmetic, the bit-level subtraction unit [9] can be shared.
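The sharing of one difference between the two comparisons can be illustrated in a few lines. This is a minimal sketch under assumptions: the names `two_level_compare_select` and `reachable_decisions` are hypothetical, metrics are integers, and ties are resolved in favour of the second operand.

```python
def two_level_compare_select(y0, y1, c):
    """Select max(y0, y1) and max(y0, y1 + c) from one shared difference.

    Returns (m0, m1, decisions) with decisions = (y0 > y1, y0 > y1 + c).
    """
    d = y0 - y1                 # the only subtraction, shared by both compares
    pick0_a = d > 0             # compare the difference with level 0
    pick0_b = d > c             # compare the same difference with level C
    m0 = y0 if pick0_a else y1
    m1 = y0 if pick0_b else y1 + c
    return m0, m1, (pick0_a, pick0_b)

def reachable_decisions(c, span=6):
    """Enumerate the decision pairs over a small integer grid (illustrative range)."""
    values = range(-span, span + 1)
    return {two_level_compare_select(y0, y1, c)[2] for y0 in values for y1 in values}

# Only three of the four decision pairs can occur for a given sign of C.
assert (False, True) not in reachable_decisions(c=2)    # C > 0
assert (True, False) not in reachable_decisions(c=-2)   # C < 0
```

Because one of the four decision pairs is excluded for each sign of C, the two selections are not independent, which is what allows the comparator hardware to be shared.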

III. APPLICATIONS

We now discuss two examples of practical interest to show the significance of the transformations summarized in Fig. 3c and (2).

A. Class-4 Partial Response (PR4)

Maximum-likelihood detection of PR4 over an additive white Gaussian noise channel involves the selection of an allowed sequence of symbols that minimizes the sum of the squared-error between itself and the sequence of noisy observations. The total error that needs to be examined may be written in terms of the samples $\{y_k\}$ and the data sequence $\{a_k\}$:

[Equation (3)]

This will be referred to as the “whitened matched filter” (WMF) case. The Viterbi algorithm, which is a dynamic programming method, is applied to determine the sequence $\{a_k\}$ that minimizes $E$. Note that since the terms $y_i^2$ are common to $E$ for all of the allowed sequences, they can be subtracted in the minimization operation. For the magnetic recording channel, the input comprises only binary symbols. These may be represented in terms of {0, 1} or {+1, -1} levels, resulting in two cases of interest, which we refer to as “case±1”: $a_k \in \{-1, 1\}$, or “case01”: $a_k \in \{0, 1\}$. Because of the dc-free nature of the PR4 channel, both cases yield the same trellis of Fig. 4a. This trellis can be transformed by shifting branch metrics and dividing by 2, as shown in Fig. 4a. Here, the addition of branch metric $-2y_k + 1$ has been left-shifted, and the addition of branch metric $2y_k - 1$ has been right-shifted. Since dynamic programming only depends on the existence of the distributive law of maximum/minimum selection and addition, it can also be applied if Eq. (3) is rewritten as

[Equation (4)]

By defining the new “matched filter” (MF) variable $z_k = -y_k + y_{k+2}$ and re-ordering the addends, Eq. (4) can be rewritten as [10]

[Equation (5)]

Drawing the trellis for applying dynamic programming to (5) leads to Fig. 4b for case±1 and to Fig. 4c for case01, which can be transformed by shifting branch metrics as shown. Note that for case±1 the constant term $a_i^2$ is left out of the minimization, i.e. $E - \sum_i (y_i^2 + 2a_i^2)$ is minimized, which is

The result of shifting the branch metrics in case of Fig. 4a and Fig. 4c leads to only one variable addition and one fixed addition in the butterfly, exactly as was discussed at the end of the previous section. Hence, the total hardware complexity of the ACS-unit of the VD is approximately only two adders. The case shown in Fig. 4b is not attractive, since it has time-varying branch metrics in the butterfly.

We remark that the difference metric approach for PR4 [10] can also be applied to the branch-shifted trellises of Fig. 4. The transformed trellises in Fig. 4a (WMF, case±1 and case01) and Fig. 4c (MF, case01) yield identical update recursions for the difference metric $\delta_k$, which is defined as the upper path metric minus the lower one, as given by:

$$\delta_{k+1} = \begin{cases} -z_k & \text{for } \delta_k > 0 \\ \delta_k - z_k & \text{for } -1 < \delta_k \le 0 \\ -1 - z_k & \text{for } \delta_k \le -1 \end{cases} \tag{7}$$

This recursion, which derives from both the branch-shifted MF trellis and the branch-shifted WMF trellis, is equivalent to the conventional difference metric formulation of the MF case, but differs from the conventional difference metric formulation of the WMF case, as given in [10]. See also [11], [12].

Fig. 4: (a) Transformation of trellis for PR4 (one interleave), WMF case. (b) Transformation of trellis for PR4 (one interleave), MF case, {+1, -1} inputs. (c) Transformation of trellis for PR4 (one interleave), MF case, {0, 1} inputs.
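The link between a compare-select-add butterfly and a three-region difference-metric update can be checked with a short simulation. The sketch below rests on assumptions that are not spelled out in the text: a two-state trellis per interleave with {0, 1} inputs, minimization, state 0 taken as the upper state, and per-step branch metrics of 0 into state 0, z_k + 1 from state 0 into state 1, and z_k from state 1 into state 1. These metrics are what a squared error of the form sum_k (y_k - a_k + a_{k-2})^2 would give after removing the common terms and halving. The function names are illustrative.

```python
import random
from fractions import Fraction

def csa_butterfly(g0, g1, z):
    """One compare-select-add step for the assumed two-state trellis."""
    new_g0 = min(g0, g1)            # compare-select only, no addition
    new_g1 = min(g0 + 1, g1) + z    # fixed "+1", compare-select, one variable add
    return new_g0, new_g1

def difference_update(delta, z):
    """Three-region update of delta = g0 - g1 (upper minus lower metric)."""
    if delta > 0:
        return -z
    if delta > -1:
        return delta - z
    return -1 - z

random.seed(1)  # fixed seed so the check is repeatable
g0, g1 = Fraction(0), Fraction(0)
delta = g0 - g1
for _ in range(1000):
    z = Fraction(random.randint(-32, 32), 4)   # exact quarter-step samples
    g0, g1 = csa_butterfly(g0, g1, z)
    delta = difference_update(delta, z)
    assert delta == g0 - g1   # the recursion tracks the path-metric difference
```

Under these assumptions the butterfly needs one variable addition (z) and one fixed addition (+1), and the difference of its two outputs follows the clipped update exactly, with thresholds at 0 and -1.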

B. Extended Class 4 Partial Response (EPR4)

In case of EPR4 [2], the normalized sum of squared errors in case of WMF branch metrics equals

Fig. 5: (a) Trellis diagram for EPR4, WMF case, {+1, -1} inputs. (b) Transformed trellis.

Fig. 6: Viterbi detector implementation corresponding to Fig. 5b.

Fig. 5a shows the original trellis for case±1, whereas Fig. 5b shows the result after branch metric shifting. The implementation of the Viterbi detector based on the transformed trellis of Fig. 5b is shown in Fig. 6. Each block marked C/S represents a compare-select block that compares the two state-metric inputs to produce the survivor metrics as well as the survivor-sequence pointers. The pointers are used to control the contents of the path memory, which may be organized using the traceback or the register-exchange configuration. A block diagram of the C/S unit is shown in Fig. 7. Even though shown separately, the add and the decision functions may be lumped together as one logical function. Note that the configuration in Fig. 6 is organized in the form of compare-select-add operations. By simply moving the adders (involving A and B values) through the survivor metric registers, the configuration may be re-cast into the conventional add-compare-select operations.

Fig. 7: Detail of compare-select unit in Fig. 6.

The implementation in Fig. 6 provides a significant reduction in hardware requirement relative to the conventional approach. Instead of 10 adders in the ACS-units, the new approach requires only 4 adders. (The other four adders involving the addition or subtraction of 2 do require extra hardware, but it is significantly smaller than a full adder involving two variable quantities.) The implementation of the compare-select hardware is also simplified, since only 4 comparators, each slightly modified to produce two survivor decisions, are now required, instead of the 8 comparators, one per state, demanded by the conventional approach. Thus, we can achieve substantial hardware savings in the EPR4 trellis structures by means of branch metric shifting and, as in the PR4 case, the total hardware complexity is about one adder per state.

It can in fact be shown that these techniques are applicable to the Viterbi detector corresponding to any intersymbol interference (ISI) channel model, resulting in corresponding hardware simplifications. The overall conclusion is that the complexity of the add-compare-select for any binary partial response VD with N states is approximately N adders, where N/2 adders are required for the addition of the data-dependent branch metrics, and the remaining N/2 adders make up the complexity of the N/2 two-level compares. (The correspondence between the two-level compare circuitry and the adder is described in more detail in the next subsection.) This formulation makes the implementation of higher-order partial response systems beyond EPR4 [2] more viable as a means of increasing the storage density. These techniques may also be applied to advantage in implementing detectors for trellis-coded ISI channels, such as matched spectral-null coded partial response channels [3].
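A two-level compare can also be modelled at the arithmetic level by reading both decisions, A versus B and A + 1 versus B, from the sign bit of a two's-complement difference. The following is a minimal sketch, not a gate-level design: the word length `WIDTH`, the function names, and the restriction to operands whose differences do not overflow are all assumptions, and the sharing of the lower carry-ripple bits is not modelled.

```python
WIDTH = 8  # assumed word length in bits (illustrative)

def sign_bit_of_difference(b, a, width=WIDTH):
    """MSB of (b - a) in width-bit two's complement; 1 means b < a."""
    diff = (b - a) & ((1 << width) - 1)   # wrap the difference to width bits
    return diff >> (width - 1)

def two_level_compare(a, b, width=WIDTH):
    """Return (a > b, a + 1 > b), each decision read from a sign bit."""
    return (sign_bit_of_difference(b, a, width) == 1,
            sign_bit_of_difference(b, a + 1, width) == 1)

# Exhaustive check over a range small enough that no difference overflows 8 bits.
for a in range(-32, 32):
    for b in range(-32, 32):
        assert two_level_compare(a, b) == (a > b, a + 1 > b)
```

The two decisions differ only when A equals B, which is consistent with the observation that A and A + 1 share most of their bit-levels and that most of the compare circuitry can therefore be common to both.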

C. Two-Level Compare-Select

The statement that the complexity of a binary N-state add-compare-select is approximately that of N adders holds only if the complexity of a two-level compare-select is that of one adder. In this subsection, we show a simple architecture for the two-level compare-select for which, indeed, the complexity is essentially that of a single adder.

A comparison of two numbers A and B can be carried out in many ways. One very efficient way, targeted for CMOS implementation, is to subtract A from B using 2’s complement arithmetic. The most significant bit (MSB) of the result indicates the maximum/minimum and can drive the select operation. As is well known, the MSB can be obtained with the carry-ripple path only and not a complete adder. This can be implemented very efficiently in CMOS by using only one 3x2 AND/OR or OR/NAND complex gate per bit-level. Each of these complex gates requires only 12-14 transistors (in CMOS).

The two-level compare of two numbers A and B differs from the simple compare described above by the fact that now the decision has to be made upon A and B, and upon A + 1 and B, simultaneously. Since a fixed-point arithmetic can be assumed, the numbers A and A + 1 only differ in the upper n bit-levels. Hence, the carry-ripple circuit for the lower bit-levels can be shared for both compares. A half-adder chain needs to be implemented only at the upper n bit-levels to compute A + 1, and an additional carry-ripple is needed for the compare. Therefore it can be seen that the total complexity of the compare and the two select-multiplexers is at most that of one adder.

Note that, to speed up the computation cycle of one add-compare-select, the add can be carried out before the select operation has been completed, as described in [13]. This can cut the latency of the critical path by up to approximately 40%. However, this requires doubling the adder and select hardware, as well as the wiring communication of the trellis between the different ACS path metric cells. As a result, this can lead to a favorable area-time tradeoff when the trellis wiring is not too complicated [13].

IV. CONCLUSIONS

The objective of this paper was to introduce methods for the optimization of Viterbi detectors. Branch metric shifting was used to achieve a substantial reduction in the ACSU complexity. The main result, demonstrated for partial response signalling using PR4 or EPR4, and generalizable to Viterbi detectors for arbitrary ISI channels and trellis-coded systems, is that, by use of branch metric shifting, the complexity of the ACS hardware for N states is approximately N adders. The resulting hardware savings for PR4 is approximately 50%, while that for EPR4 is greater than 50%.

REFERENCES

[1] H. Kobayashi, “Application of probabilistic decoding to digital magnetic recording systems,” IBM J. Res. Dev., vol. 15, pp. 64-74, January 1971.

[2] H. Thapar and A. Patel, “A class of partial response systems for increasing storage density in magnetic recording,” IEEE Trans. Magn., vol. MAG-23, pp. 3666-3668, September 1987.

[3] R. Karabed and P. Siegel, “Matched spectral null codes for partial response channels,” IEEE Trans. Info. Th., Special Issue on Coding for Storage Devices, vol. 37, pp. 818-855, May 1991.

[4] H. Thapar and T. Howell, “On the performance of partial-response maximum likelihood and peak detection methods in digital magnetic recording,” in Digests of the 1991 Magnetic Recording Conference, (Pittsburgh, PA), Paper D1, June 1991.

[5] H. Thapar, J. Rae, C. Shung, R. Karabed, and P. Siegel, “On the performance of a rate 8/10 matched spectral null code for class-4 partial response,” IEEE Trans. Magn., vol. MAG-28, pp. 2884-2889, September 1992.

[6] G. Forney, “The Viterbi algorithm,” Proceedings of the IEEE, vol. 61, pp. 268-278, March 1973.

[7] G. Fettweis and H. Meyr, “Parallel Viterbi algorithm implementation by breaking the compare-select feedback bottleneck,” in Proc. of IEEE Int. Conf. Commun. (ICC), vol. 2, (Philadelphia, PA), pp. 719-722, June 1987.

[8] G. Fettweis and H. Meyr, “High-speed parallel Viterbi decoding: Algorithm and VLSI-architecture,” IEEE Communications Magazine, pp. 46-55, May 1991.

[9] J. P. Berns, “Realisierung einer 12-stufigen ACS-Einheit eines Viterbi Decoders,” tech. rep., ERT, Aachen University of Technology, Aachen, Germany, July 1990.

[10] F. Dolivo and G. Ungerboeck, “Viterbi detectors for partial response class IV signaling: Theory and implementation,” tech. rep., IBM Res. Rep. RZ 1177, Zurich, Switzerland, September 1982.

[11] P. Siegel and J. Wolf, “Modulation and coding for information storage,” IEEE Communications Magazine, vol. 29, pp. 68-86, December 1991.

[12] M. Ferguson, “Optimal reception for binary partial response channels,” Bell Syst. Tech. J., vol. 51, pp. 493-505, February 1972.

[13] G. Fettweis and H. Meyr, “A 100 Mbit/s Viterbi-decoder chip: Novel architecture and its realization,” in Proc. IEEE Int. Conf. Commun. (ICC), vol. 2, (Atlanta, GA), pp. 463-467 (307.4), April 1990.
