Optimal Choice of Intermediate Latching to Maximize Throughput in ...


IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-32, NO. 1, FEBRUARY 1984

Optimal Choice of Intermediate Latching to Maximize Throughput in VLSI Circuits

PETER R. CAPPELLO, MEMBER, IEEE, ANDREA LAPAUGH, MEMBER, IEEE, AND KENNETH STEIGLITZ, FELLOW, IEEE

Abstract-In many computational tasks, especially in signal processing, it is the throughput that is important, rather than the latency, or delay. If a special-purpose VLSI chip is designed for a particular signal processing task, such as FIR filtering, for example, the maximum clock rate, and hence throughput, is determined by the depth of the combinational logic between registers and the time required for the distribution and operation of the clock. If the combinational logic is sufficiently deep (in bit-parallel circuits, for example), the throughput can be increased by inserting intermediate stages of clocked latches. This is at the expense of increased area and delay to operate and clock the intermediate registers. Roughly speaking, the strategy amounts to using more of the chip area to store information useful for pipelining. This paper investigates the optimal tradeoff between the degree of intermediate latching and cost, using the measure AP, where A is the chip area and P is the period (the reciprocal of throughput). We derive expressions for the time and area before and after intermediate latching, using the Mead-Conway model, both for the cases of on-chip and off-chip clock drivers. The results show that significant reductions in AP product (reciprocal of throughput per unit area) can be achieved by intermediate latching in many typical signal processing applications, and for a wide range of circuit parameters. The array multiplier is used as an example.

I. INTRODUCTION

WHEN certain tasks are implemented with special-purpose VLSI chips, it is often the period P (time between successive outputs) that is crucial, rather than the latency or delay T. This is especially true in signal processing, where typical tasks such as filtering and discrete Fourier transformation often have high volume requirements and relatively lax delay requirements. Recent work has described bit-serial and bit-parallel VLSI architectures that do in fact allow the period to be equal to the clock period (see, for example, [2], [4]-[9], [12]). In [5], [7] a class of these circuits is called completely pipelined. In this paper, we take up a different question, that of inserting intermediate stages of latching so as to maximize the rate at which the clock can run without a disproportionate blowup in area requirements. We will use the criterion of minimizing the AP product, where A is the area of the VLSI circuit and P is the period. The AP product can be thought of as the reciprocal of throughput per unit area, and a completely pipelined circuit optimal with respect to this criterion can be claimed to make best use of chip area. Leiserson and Saxe [14] treat the related problem of redistributing latches so as to decrease period, but they do not consider area or clocking penalties.

We assume that the circuits we discuss are designed along the lines described by Mead and Conway [1]: typically that a two-phase clock is used to transfer information between registers (or latches), and that these registers are separated by combinational logic. The following sections are devoted to modeling the time and area requirements of the latches, the combinational logic, and the clock driver. We then consider the overall circuit and investigate the optimal choice of the amount of latching for the two cases of on-chip and off-chip clock drivers. While the assumptions made about first-order circuit behavior pertain to nMOS technology, the analysis technique uses dimensionless parameterization and is applicable to any situation with deep combinational logic, typically bit-parallel circuits. A representative tradeoff curve is shown for an example.

Fig. 1. Two-phase clocked latches between stages of combinational logic.

Manuscript received August 10, 1982; revised April 12, 1983. This work was supported in part by the National Science Foundation under Grant ECS-8120037, U.S. Army Research-Durham under Grant DAAG29-82-K-0095, and DARPA Contract N00014-82-K-0549. A preliminary version of this paper was presented at the 1983 IEEE International Conference on Acoustics, Speech, and Signal Processing, Boston, MA, April 14-16, 1983.
P. R. Cappello was with the Department of Electrical Engineering and Computer Science, Princeton University, Princeton, NJ 08544. He is now with the Department of Computer Science, University of California, Santa Barbara, CA 93106.
A. LaPaugh and K. Steiglitz are with the Department of Electrical Engineering and Computer Science, Princeton University, Princeton, NJ 08544.

0096-3518/84/0200-0028$01.00 © 1984 IEEE

II. CLOCK TIMING

We will adopt a version of the two-phase clocking system described by Seitz in [1, ch. 7], a typical stage of which is shown in Fig. 1. Fig. 2 shows the corresponding timing diagram. First, we must drive the phase 1 clock signal $\phi_1$ high, taking time $t_{clock}$ (the clock driver time). We then need a minimum time $t_{delay}$ (the delay time) to charge the input stage of the combinational logic. Phase 1 must then go low (taking time $t_{clock}$), and phase 2 must then go high (also taking time $t_{clock}$). We must ensure that there is a minimum time $t_{12}$ during which both clocks are low; otherwise we run the risk that skew between the clock phases will cause both clocks to be on at the same time.

Fig. 2. Clock-timing diagram.

This brings us up to the point where the combinational logic has already started to work. The input values propagate through the combinational logic, taking some time $t_{logic}$. This time includes the time during which $\phi_1$ is brought down and $\phi_2$ is brought up. The time $t_{logic}$ will ordinarily dominate the clock-interchange time, but, in general, we need to set the time for this operation to

$$\max(t_{logic},\ 2t_{clock} + t_{12})$$

where, for safe operation of the circuit, $t_{logic}$ must of course be taken as the maximum delay time of the combinational logic.

We next need to transfer the output values of the preceding logic stage to the input of the latch whose output is controlled by $\phi_1$; that is, $\phi_2$ must remain on for a minimum charging time $t_{set}$ (the preset time). The $\phi_2$ clock signal must then be brought down (taking another clock driver time $t_{clock}$), and another dead time ($t_{21}$) provided to ensure nonoverlap of clocks in case of clock skew.

The minimum period P of the circuit is therefore

$$P = 2t_{clock} + t_{delay} + t_{set} + t_{21} + \max(t_{logic},\ 2t_{clock} + t_{12}).$$

To be more accurate, we might want to take into account the fact that the upgoing and downgoing clock waveforms are not completely symmetric; but the term $t_{clock}$ can be taken to represent the average of the upgoing and downgoing clock times in a single driver. In a multistage driver the stages alternate up and down, and we can take $t_{clock}$ to be the sum of the averages of the upgoing and downgoing times along the driving chain.

III. LATCH TIME AND SPACE

We next want to express the time delay of the latches in terms of basic units that are determined by the technology. For this purpose, we consider the nMOS inverter with a minimum size pulldown and a pullup/pulldown ratio of 4 to be the basic cell, with area A, pulldown gate capacitance C, effective pulldown resistance R, and pulldown time (transit time) $\tau$ when driving the input of an equal size inverter. We refer to such a cell in what follows as a minimal inverter.

Fig. 3. Details of the clocked latches, showing pullup and pulldown effective resistances and capacitances.

Now inverters in the latches are driven through pass transistors, so the discussion in [1] shows that we should choose a pullup/pulldown ratio of 8. The time required for the second inverter to charge its load is therefore approximated by the following RC constant:

$$t_{delay} = (R_1 + R_{pass})(C_{load} + C_{pass})$$

where the R's and C's are shown in Fig. 3. Assuming that the pass transistors are minimum size, $R_{pass} = R$ and $C_{pass} = C$. Also assuming that the capacitive load (input to the combinational logic) is minimal, we get

$$t_{delay} = 2(R_1/R + 1)\,\tau$$

where, from now on, we express resistance in terms of the length-to-width ratio of the transistor

$$R_1 = (L_1/W_1)\,R.$$

If the pullup/pulldown ratio of the latches is taken to be 8 (as mentioned above), we can write the normalized delay time as

$$t_{delay}/\tau = 2(8r + 1)$$

where $r = L_2/W_2$ is the size of the latch pulldown. When $r = 1/2$ the pulldown transistor of the latch inverter will be twice as wide as the corresponding transistor of the minimal inverter, but the pullup/pulldown ratio is 8, not 4, so the pullup transistor will then be the same length as in the minimal inverter. The area of such a latch inverter with $r = 1/2$ will be only a little larger than that of a minimal inverter, perhaps about 25 percent larger. Thus, the choice of $r = 1/2$ speeds up the latch without much area penalty, and we will use this value in this paper, although it could be kept as a parameter.

Using a similar argument based on RC charging times, the preset time is

$$t_{set}/\tau = (8r + 1)(1/r + 1).$$

The $1/r$ term comes from the input capacitance of the second inverter, which loads the first inverter. To see this, write

$$C_{load} = (L_2 W_2 / LW)\,C = (W_2/L_2)\,C = (1/r)\,C$$

where $L_2 = L = W$ are minimum size.

The latching area is easy to write down. Assuming that the pass transistors are the same size as minimal inverters, and that the latches have area 1.25A, each two-phase latch requires normalized area

$$A_{latch}/A = 2(1.25 + 1) = 4.5.$$

IV. COMBINATIONAL LOGIC TIME AND SPACE

We want a fairly general model for the combinational logic that is sandwiched between the latches; such logic may be built from NAND and NOR gates, pass transistors, or some combination of the two. We will assume that the typical logic stage is a uniform array of $n \times k$ logical elements, each of which has an area $A_{elem}$ and a delay $\tau_{elem}$, where

$$A_{elem} = \alpha A$$

and

$$\tau_{elem} = \beta\tau.$$

This array will be thought of as n rows by k columns, with a maximum delay path from left to right of k elements. Since logic stages are not usually so uniform, the $\alpha$ and $\beta$ parameters must represent average values for the combinational logic. If gates are built out of inverters and coupled directly, for example, $\beta$ will generally be determined by the fan-out factor of the logic and the size of the inverters. An average fan-out factor of 3, using gates (with a pullup/pulldown ratio of 4), will result in $\beta = 12$, because we must allow for the worst case in the propagation of logic, where all signals are upgoing. To reduce this to a value closer to that of a minimal inverter, we expect to increase the area to, say, twice that of a minimal inverter. Thus, we can take values of $\alpha = 2$ and $\beta = 4$ to $12$ as typical of combinational logic implemented with arrays of gates. We should also note that the value of $\alpha$ should be selected to reflect the space per logical element required for power and ground lines.

We will assume that the nominal circuit has one typical logic stage between a pair of two-phase latches, and we then consider the insertion of $(m - 1)$ latches equally spaced in the combinational logic, $m \ge 1$. The case $m = 1$ then represents the original situation. We assume the latches can be made to "fit" well; that is, that the combinational logic is arranged regularly enough so that stages can be pushed apart and columns of latches inserted. The total time required for the logic is therefore

$$t_{logic}/\tau = \beta(k/m)$$

and the area

$$A_{logic}/A = \alpha n k = \alpha d k^2$$

where $d = n/k$ is the height-to-width ratio of the original logic block, another dimensionless parameter, usually assumed to be 1.

V. ON-CHIP CLOCK DRIVER TIME AND SPACE

If we use an on-chip clock driver, we want to use a multistage version as described in [1], since the driver will have a large capacitive load, especially if there is an appreciable amount of intermediate latching introduced. We assume that clock distribution is on metal, so that propagation delay along the wires is small. Each stage is assumed to have a pulldown f times the size of the preceding, so if there are S stages driving Y pass transistors, each with minimal capacitance C,

$$f = Y^{1/S}.$$

If we start the clock driving with a minimal inverter, the normalized delay of such a driver is approximately

$$t_{drive}/\tau = 2.5\,fS.$$

The factor of 2.5 results from averaging the pullup time of $4\tau$ and pulldown time $\tau$ along the inverter chain. (If we do not insist that S is integer, and we minimize this delay with respect to f, we get the value $f = e$ [1]. But S is an integer.)

This estimate for delay assumes that we insist on a globally-synchronized clock; that is, that the clock signals at the input of the driver can be used anywhere else without concern for synchronization. Caraiscos and Liu [11] have pointed out that the rise and fall times of the clock waveforms may be much smaller than the absolute delay, and that using a local clock may allow higher throughput, at the expense of using local clock signals that must be made synchronous with the signal itself at different points on and off the chip. Sending the clock along with the signal will incur other costs, of course. (For a discussion of the virtues of a globally-synchronized clock in signal processing, see [10].) The analysis in this paper is conservative in the sense that the resulting degree of latching and increase in throughput is on the low side. (We can avoid the area and delay penalty incurred by using an on-chip driver by moving the clock driver off-chip. That case will be discussed in more detail in Section VII.)
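The choice of the number of driver stages can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it evaluates the normalized driver delay 2.5 f S with f = Y^(1/S) for integer S and picks the smallest. The function names and the load value Y = 200 are assumptions made for illustration only.

```python
import math


def driver_delay(Y: int, S: int) -> float:
    """Normalized delay t_drive/tau = 2.5 * f * S of an S-stage tapered driver."""
    f = Y ** (1.0 / S)  # stage ratio chosen so that f**S == Y
    return 2.5 * f * S


def best_stage_count(Y: int, max_stages: int = 16) -> int:
    """Integer S in 1..max_stages that minimizes the normalized driver delay."""
    return min(range(1, max_stages + 1), key=lambda S: driver_delay(Y, S))


if __name__ == "__main__":
    Y = 200  # hypothetical load: number of pass transistors driven
    S_best = best_stage_count(Y)
    f_best = Y ** (1.0 / S_best)
    # The continuous optimum is S = ln(Y), i.e. f = e; the integer choice lands nearby.
    print(S_best, f_best, math.log(Y), driver_delay(Y, S_best))
```

For a load of 200 pass transistors this enumeration settles on five stages, with a stage ratio close to e, which is what the continuous argument in the text suggests.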

We must also consider the area contribution of the clock driver in relation to the rest of the circuit. The normalized area of the driver is

$$A_{drive}/A = \sum_{i=0}^{S-1} f^i = (Y - 1)/(f - 1).$$
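As a quick check on the component models of Sections III and V, the sketch below evaluates the latch timing and area expressions at r = 1/2 and compares the driver-area sum with its closed form. The function names are hypothetical and the load Y = 200 with S = 5 is an assumed example, not a value taken from the derivation.

```python
def latch_delay(r: float) -> float:
    """Normalized latch delay t_delay/tau = 2(8r + 1)."""
    return 2.0 * (8.0 * r + 1.0)


def latch_set_time(r: float) -> float:
    """Normalized preset time t_set/tau = (8r + 1)(1/r + 1)."""
    return (8.0 * r + 1.0) * (1.0 / r + 1.0)


def latch_area(inverter_area: float = 1.25, pass_area: float = 1.0) -> float:
    """Normalized area of a two-phase latch: two inverter-plus-pass-transistor halves."""
    return 2.0 * (inverter_area + pass_area)


def driver_area_sum(Y: int, S: int) -> float:
    """Driver area as the explicit sum of stage sizes f**0 + ... + f**(S-1)."""
    f = Y ** (1.0 / S)
    return sum(f ** i for i in range(S))


def driver_area_closed_form(Y: int, S: int) -> float:
    """Driver area from the geometric-series closed form (Y - 1)/(f - 1)."""
    f = Y ** (1.0 / S)
    return (Y - 1.0) / (f - 1.0)


if __name__ == "__main__":
    r = 0.5
    print(latch_delay(r), latch_set_time(r), latch_area())
    print(driver_area_sum(200, 5), driver_area_closed_form(200, 5))
```

At r = 1/2 the two latch times come to 10 and 15, so the latches contribute a fixed 25 units of normalized time per period, and each two-phase latch costs 4.5 units of normalized area.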

Next we look at the overall time and space requirements of the circuit.

VI. OPTIMIZATION OF AP PRODUCT WITH AN ON-CHIP CLOCK DRIVER

We can now write the total minimum normalized period $P/\tau = p$ in terms of our parameters as follows:

$$p = 5fS + 25 + \tau_{21} + \max(\beta k/m,\ 5fS + \tau_{12})$$

where, as above,

$$f^S = Y = (m + 1)\,n = \text{number of lines driven}$$

and $\tau_{12} = t_{12}/\tau$, $\tau_{21} = t_{21}/\tau$. Here $5fS$ is twice the normalized driver delay $2.5fS$ of Section V, and the constant 25 is the sum $t_{delay}/\tau + t_{set}/\tau = 10 + 15$ obtained in Section III for $r = 1/2$. Similarly, the total normalized area $\text{area}/A = a$ is

$$a = 2(Y - 1)/(f - 1) + 4.5Y + \alpha k n$$

where the factor of 2 accounts for the fact that we must have two drivers, one for each phase. (These can be combined to some extent, but the total area is still nearly twice that of a single driver.)

We now have the function $ap(m, S)$, where m and S are discrete parameters. The number of stages is never much larger than $\ln Y$, since the optimal choice of f is usually around e. In most cases of interest, therefore, it suffices to take the minimum of $ap$ for $S = 1, \ldots, 16$, producing what we call $ap(m, *)$:

$$ap(m, *) = \min_S\, ap(m, S).$$

The range of m is certainly between 1 and k, so the optimal choice of m can be determined simply by

$$ap(*, *) = \min_m\, ap(m, *).$$

The gain G in AP product achieved by latching is, therefore,

$$G = ap(1, *)/ap(*, *).$$
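The enumeration described above is small enough to write out directly. The following sketch is one possible rendering under stated assumptions, not the authors' program: the function names are hypothetical, r = 1/2 is fixed (giving the constants 25 and 4.5), the dead times default to 4, and the example call uses n = k = 100, alpha = 2, beta = 6 as illustrative inputs.

```python
LATCH_TIME = 25.0  # t_delay/tau + t_set/tau = 10 + 15 for r = 1/2
LATCH_AREA = 4.5   # normalized area of one two-phase latch


def period_and_area(m, S, n, k, alpha, beta, tau12=4.0, tau21=4.0):
    """Return (p, a) for m logic segments and an S-stage on-chip driver."""
    Y = (m + 1) * n          # lines driven by each clock phase
    f = Y ** (1.0 / S)       # driver stage ratio, so that f**S == Y
    two_clock = 5.0 * f * S  # 2 * t_clock/tau, with t_clock/tau = 2.5 f S
    p = two_clock + LATCH_TIME + tau21 + max(beta * k / m, two_clock + tau12)
    a = 2.0 * (Y - 1.0) / (f - 1.0) + LATCH_AREA * Y + alpha * k * n
    return p, a


def best_over_stages(m, n, k, alpha, beta, max_stages=16):
    """ap(m, *): the (p, a) pair with the smallest product over S = 1..max_stages."""
    pairs = (period_and_area(m, S, n, k, alpha, beta)
             for S in range(1, max_stages + 1))
    return min(pairs, key=lambda pa: pa[0] * pa[1])


def optimize(n, k, alpha, beta):
    """ap(*, *): search m = 1..k; return (m_best, gain, unlatched, latched)."""
    best = {m: best_over_stages(m, n, k, alpha, beta) for m in range(1, k + 1)}
    m_best = min(best, key=lambda m: best[m][0] * best[m][1])
    ap_unlatched = best[1][0] * best[1][1]
    ap_latched = best[m_best][0] * best[m_best][1]
    return m_best, ap_unlatched / ap_latched, best[1], best[m_best]


if __name__ == "__main__":
    m_best, gain, unlatched, latched = optimize(n=100, k=100, alpha=2.0, beta=6.0)
    print(m_best, gain, unlatched, latched)
```

With these illustrative inputs the search selects m = 6, with a gain in AP product of roughly 2.8. Because both loops are short (at most 16 values of S and k values of m), exhaustive enumeration is adequate and no continuous optimization is needed.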

VII. THE CASE OF AN OFF-CHIP CLOCK DRIVER

As mentioned in Section V, if we allow the clock driver to be off-chip, we can drive the larger capacitive loads incurred by extra latching with essentially no penalty in clock delay or driver area. The normalized period and area can then be written

$$p = 2\tau_{clock} + 25 + \tau_{21} + \max(\beta k/m,\ 2\tau_{clock} + \tau_{12})$$

$$a = 4.5Y + \alpha k n$$

where we have assumed some delay of $\tau_{clock} = t_{clock}/\tau$ for the clock rise and fall times. The $ap$ product is therefore a function of only one unknown parameter, m. With these changes in a and p, the same methodology applies; a numerical example will be given in the next section. Note, however, that now the optimal value of m will occur roughly near the breakpoint where $\beta k/m = 2\tau_{clock} + \tau_{12}$, and that these times are both highly uncertain and small in size. The analysis in this case is therefore much less reliable, and much more sensitive to unmodeled effects such as propagation delay, than in the on-chip clock driver case.

VIII. NUMERICAL EXAMPLES

We now give some typical numerical results. For this purpose, we consider a 16-bit array multiplier, implemented by an array of full adders, as described, for example, in [2]. We also assume that the full adders are implemented with gates; each full adder will then be about 3 gates deep. The carry propagation will require an array that has a maximum depth of $2 \times 16$, so altogether the combinational logic will have $k \approx 100$. (This is consistent with the value of "113 gate delays" given in [3].) Say that each gate takes about double the area of a minimal inverter ($\alpha = 2$, optimistic for area, and hence pessimistic for our purposes), and that, as discussed in Section IV, $\beta \approx 6$. The array is roughly square, so that $d \approx 1$. Finally, we will assume that clock skew is not an important problem, and take $\tau_{12} = \tau_{21} = 4$.

Fig. 4 shows a plot of normalized period $p(m, *)/p(1, *)$; normalized area $a(m, *)/a(1, *)$; and normalized AP product $ap(m, *)/ap(1, *)$ versus m. The period as a function of m decreases sharply (roughly as $1/m$) until the combinational logic time is dominated by the clock-swapping and dead time (that is, until $t_{logic} \approx 2t_{clock} + t_{12}$). After this point the clock-driving time will determine the minimum clock period and it no longer pays to increase m, because the area will increase with no payoff in speed. The minimum value of period occurs close to the minimum value of AP product. Thus, in theory, the period can be decreased somewhat from its value when the AP product is minimized, at a slight cost in area. In practice the optimal values are almost always nearly equal, and sometimes identical, because of the discreteness of the parameters m and S.

Fig. 5 shows a plot of gain G in AP product versus the depth of combinational logic k, for the values $\alpha = 2$ and $\beta = 4, 6, 8, 12$. The graph shows significant gains in AP product (more than 2) over the unlatched case when $k \ge 50$ and $\beta \ge 6$. Even when the gates are as fast as a minimal inverter (worst-case delay factor $\beta = 4$), there is an AP product gain of 2.2 when $k = 100$. Note that a larger value of $\alpha$ would only improve the gain.

We conclude by looking at the actual numerical values of the minimum clock periods and areas involved in this analysis. Taking the $k = 100$, $\alpha = 2$, $\beta = 6$ case above for a hypothetical 16-bit array multiplier, and assuming $\tau = 0.3$ ns for current technology, we get a period of $P = 210$ ns with no intermediate latching, and an optimal period of $P = 66$ ns with $m = 6$ (5 intermediate latching stages). The area before latching is $2.11 \times 10^4 A$, which at $\lambda = 1.5$ µm (3 µm line width) and a $225\lambda^2$ inverter is about 10.7 mm². After the intermediate latching, the area becomes 12.1 mm²; certainly a modest increase in area for about a threefold increase in speed.
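The conversion from normalized quantities to physical units, and the off-chip model of Section VII, can be sketched in the same style. This is an illustrative sketch only: the function and constant names are hypothetical, and the inputs (tau = 0.3 ns, lambda = 1.5 µm, a 225 lambda² inverter, n = k = 100, alpha = 2, beta = 6, and normalized clock and dead times of 4) are assumptions taken to match the example parameters of this section.

```python
TAU_NS = 0.3                                    # minimal-inverter delay in ns
LAMBDA_UM = 1.5                                 # lambda in micrometers
INVERTER_AREA_MM2 = 225 * LAMBDA_UM ** 2 * 1e-6  # 225 lambda^2, converted from um^2 to mm^2


def to_ns(p: float) -> float:
    """Convert a normalized period p = P/tau to nanoseconds."""
    return p * TAU_NS


def to_mm2(a: float) -> float:
    """Convert a normalized area a = area/A to square millimeters."""
    return a * INVERTER_AREA_MM2


def offchip_period_and_area(m, n, k, alpha, beta,
                            tau_clock=4.0, tau12=4.0, tau21=4.0):
    """Normalized (p, a) for the off-chip driver model with m logic segments."""
    Y = (m + 1) * n
    p = 2.0 * tau_clock + 25.0 + tau21 + max(beta * k / m, 2.0 * tau_clock + tau12)
    a = 4.5 * Y + alpha * k * n
    return p, a


def optimize_offchip(n, k, alpha, beta):
    """Search m = 1..k for the smallest ap product; return (m_best, p, a)."""
    def ap(m):
        p, a = offchip_period_and_area(m, n, k, alpha, beta)
        return p * a
    m_best = min(range(1, k + 1), key=ap)
    return (m_best,) + offchip_period_and_area(m_best, n, k, alpha, beta)


if __name__ == "__main__":
    print(to_mm2(2.11e4))  # on-chip area before latching, in mm^2
    p1, a1 = offchip_period_and_area(1, 100, 100, 2.0, 6.0)
    m_best, p, a = optimize_offchip(100, 100, 2.0, 6.0)
    print(to_ns(p1), to_mm2(a1), m_best, to_ns(p), to_mm2(a))
```

Under these assumptions the off-chip enumeration picks m = 27, with a period just under 18 ns and an area of about 16.5 mm², against roughly 191 ns and 10.6 mm² without intermediate latching.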

The preceding example assumed an on-chip clock driver. When we use an off-chip clock driver at presumed small cost, as discussed in Section VII, we naturally get much faster solutions. In this example, the optimal value of period with the parameters of Section VII and $\tau_{clock} = 4$ (assuming a very sharp clock rise time and fall time), minimizing AP product, is 18 ns, compared with the unlatched value of 191 ns. The area goes from 10.6 mm² with no latching to 16.5 mm² with latching. This large increase in area reflects a corresponding increase in the density of latching; 26 ($m = 27$) latching stages are introduced. We emphasize that in the case of an off-chip clock driver, the numerical values of the parameters $t_{12}$ and $t_{clock}$ are very uncertain and the optimal values of period, area, and latching density are sensitive to these parameters. The large predicted speedups in possible clock rate may not be realizable in practice.

Fig. 4. Normalized period, area, and AP product versus m for α = 2, β = 6, k = 100. The parameter (m - 1) is the number of intermediate latching stages.

Fig. 5. Gain in AP product versus combinational logic depth k for β = 4, 6, 8, 12. The parameter β is the delay of a combinational logic element, normalized in terms of that of a minimal inverter.

IX. CONCLUSIONS

We have modeled the timing of a generic pipelinable VLSI circuit in which there are combinational logic stages separated by latching stages driven by two-phase clocks. An array multiplier is typical of such a configuration. We then investigated the effect of introducing intermediate latching stages, especially the tradeoff between increased throughput and increased area. Expressions were derived for area and minimum clock period, normalized in terms of minimal inverter area and delay, and we showed that optimal choices of the number of clock driver stages (S), and the number of intermediate latching stages (m - 1), can be made by simple enumeration.

The numerical results illustrate the choice of latching density in a typical signal processing application. According to our model, a 16-bit array multiplier with gate logic and an on-chip multistage clock driver can be clocked about three times faster with about a 13 percent increase in area using five intermediate latching stages. This decrease in period is also accompanied by an increase in the latency, or delay, of the multiplier. Higher throughput can be achieved with an off-chip clock driver, but the parameters in that case are less well known, and at such speeds the model becomes less reliable.

Much more work needs to be done on detailed modeling of the timing of such VLSI circuits if we are to achieve maximum throughput rates in applications like signal processing. Future work will attempt to refine our model, along the lines of [13] as an example. We also need to study propagation delay, which was assumed to be relatively small in the examples (4 times the minimal inverter gate delay τ for clock distribution, a reasonable assumption if the clock lines are metal, for example). Another important set of interesting problems concerns the study of the way algorithms, topologies, and layouts interact with the timing problems considered here. Recent work on completely pipelined or bit-level systolic arrays is a start in that direction (see, for example, [2], [4]-[9], [12]).

ACKNOWLEDGMENT

We are indebted to C. Caraiscos and B. Liu for valuable comments on the manuscript.

REFERENCES

[1] C. Mead and L. Conway, Introduction to VLSI Systems. Menlo Park, CA: Addison-Wesley, 1980.
[2] J. V. McCanny, J. C. McWhirter, J. B. G. Roberts, D. J. Day, and T. L. Thorp, "Bit level systolic arrays," in Proc. 15th Asilomar Conf. Circuits, Syst., Comput., Nov. 1981.
[3] K. Botcher, A. Lacroix, M. Talmi, and D. Wesseling, "Integrated floating point signal processor," in Proc. 1982 IEEE Int. Conf. Acoust., Speech, Signal Processing, Paris, France, May 1982, pp. 1088-1091.
[4] P. R. Cappello and K. Steiglitz, "Digital signal processing applications of systolic algorithms," CMU Conf. VLSI Syst. Computations, H. T. Kung, B. Sproull, and G. Steele, Eds. Rockville, MD: Computer Science Press, 1981.
[5] -, "Bit-level fixed-flow architectures for signal processing," in Proc. 1982 IEEE Int. Conf. Circuits, Comput., Sep. 29-Oct. 1, 1982.
[6] -, "A VLSI layout for a pipelined Dadda multiplier," ACM Trans. Comput. Syst., vol. 1, May 1983.
[7] -, "Completely pipelined architectures for digital signal processing," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1016-1022, Aug. 1983.
[8] H. T. Kung, L. M. Ruane, and D. W. L. Yen, "A two-level pipelined systolic array for convolutions," CMU Conf. Syst. Computations, H. T. Kung, B. Sproull, and G. Steele, Eds. Rockville, MD: Computer Science Press, 1981.
[9] P. B. Denyer and D. J. Myers, "Carry-save arrays for VLSI signal processing," in VLSI 81: Very Large Scale Integration, J. P. Gray, Ed. London: Academic, 1981.
[10] R. F. Lyon, "A bit-serial VLSI architecture methodology for signal processing," in VLSI 81: Very Large Scale Integration, J. P. Gray, Ed. London: Academic, 1981.
[11] C. Caraiscos and B. Liu, private communication.
[12] -, "Bit serial VLSI implementations of FIR and IIR digital filters," in Proc. 1983 Int. Symp. Circuits Syst., May 1983.
[13] P. Penfield, Jr. and J. Rubinstein, "Signal delay in RC tree networks," in Proc. Second California Inst. Technol. Conf. VLSI, 1981.
[14] C. E. Leiserson and J. B. Saxe, "Optimizing synchronous systems," in Proc. 22nd Ann. Symp. Foundations of Comput. Sci., October 28-30, 1981.

Peter R. Cappello (M'83) was born in Queens, NY, on October 18, 1948. He received the B.S. degrees in mathematics and in computer science from Pennsylvania State University, University Park, in 1970, the M.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 1973 (while a member of the Technical Staff of Bell Laboratories), and the Ph.D. degree in electrical engineering and computer science from Princeton University, Princeton, NJ, in 1982.
He is now with the Department of Computer Science, University of California, Santa Barbara, CA.

Andrea LaPaugh (M'81) was born in Middletown, CT, on June 26, 1952. She received the A.B. degree in physics from Cornell University, Ithaca, NY, in 1974, and the M.S. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, in 1977 and 1980, respectively.
She subsequently spent a year as a Visiting Assistant Professor in the Department of Computer Science, Brown University, Providence, RI. Since September 1981 she has been an Assistant Professor in the Department of Electrical Engineering and Computer Science, Princeton University, Princeton, NJ.
Dr. LaPaugh is a member of the IEEE Computer Society and the Association for Computing Machinery.

Kenneth Steiglitz (S'57-M'64-SM'79-F'81) was born in Weehawken, NJ, on January 30, 1939. He received the B.E.E., M.E.E., and Eng.Sc.D. degrees from New York University, New York, NY, in 1959, 1960, and 1963, respectively.
Since September 1963 he has been with the Department of Electrical Engineering and Computer Science, Princeton University, Princeton, NJ, where he is now Professor, teaching and conducting research in the computer and systems areas. He is the author of Introduction to Discrete Systems (New York: Wiley, 1974), and coauthor, with C. H. Papadimitriou, of Combinatorial Optimization: Algorithms and Complexity (Englewood Cliffs, NJ: Prentice-Hall, 1982).
Dr. Steiglitz has served as a member of the Digital Signal Processing Committee of the IEEE Acoustics, Speech, and Signal Processing Society, and as an Administrative Committee member and Awards Chairman of the Society. He is Associate Editor of the journal Networks. He is a member of Eta Kappa Nu, Tau Beta Pi, and Sigma Xi, and in 1981 received the Technical Achievement Award of the ASSP Society.
