A LOW-POWER MINIMUM DISTANCE 1D-SEARCH ENGINE USING HYBRID DIGITAL/ANALOG CIRCUIT TECHNIQUES

Chang-Ki Kwon, Student Member of IEEE, and Kwyro Lee, Senior Member of IEEE
Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Taejon, 305-701, Korea
Phone: +82-42-869-5433. Fax: +82-42-869-8590. E-mail: ckkwon@dimple.kaist.ac.kr

Abstract

This Minimum Distance 1D-Search Engine (MDSE) realizes a pattern-matching hardware accelerator for portable multimedia and intelligent processing systems. The chip executes highly parallel computation of the L1-norms between an input key and multiple stored reference records, and searches for the minimum distance among them in 1-dimensional (1D) memories. According to architectural-level power estimation, the proposed MDSE reduces power by orders of magnitude compared with conventional systems as the number of records increases. Two novel circuits, the Merged Memory Logic (MML) circuit and the Digital/Analog Mixed Winner-Take-All (DAM-WTA) circuit, have been implemented in 0.6 µm CMOS technology. The simulation results of the 4bit-8word MDSE show that the power dissipation (=2.8mW at 3V) of the MML agrees with the estimated power within 43% error, and that the worst-case delay of the DAM-WTA is less than 80ns.

1. Architectural-Level Power Estimation

The architectures of MDSEs are classified into three different styles, shown in Fig. 1(a)-(c). Assume that keys (=Xi) sampled at 10MHz (=fs) are compared in a brute-force manner with the given N records (=Yj, j=1,2,...,N) in 4bit unsigned integer representation.
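The brute-force comparison described above can be sketched in software as a reference model. This is only a behavioral illustration, not the chip's implementation: the function name, the example record contents, and the tie-breaking rule (lowest address wins) are assumptions that the paper does not specify.

```python
from typing import Sequence, Tuple


def min_distance_search(key: int, records: Sequence[int], bits: int = 4) -> Tuple[int, int]:
    """Return (address, distance) of the record closest to `key` in L1 norm.

    Every record is compared with the key, mirroring the brute-force
    comparison against N stored records.
    """
    limit = 1 << bits
    if not records:
        raise ValueError("at least one record is required")
    if not 0 <= key < limit or any(not 0 <= r < limit for r in records):
        raise ValueError(f"values must be unsigned {bits}-bit integers")

    best_address, best_distance = 0, abs(key - records[0])
    for address, record in enumerate(records[1:], start=1):
        distance = abs(key - record)   # 1D L1-norm between key and record
        if distance < best_distance:   # ties keep the lowest address (assumption)
            best_address, best_distance = address, distance
    return best_address, best_distance


if __name__ == "__main__":
    stored = [3, 14, 7, 0, 9, 12, 5, 15]   # hypothetical 4bit-8word contents
    print(min_distance_search(10, stored))
```

A software loop like this visits the records one at a time, which is the memory-access cost that the architectures of Fig. 1 trade off in different ways.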

For architectural-level power estimation, it is realistic to assume that the supply voltage (=Vdd) is fixed within the process constraints and that the clock frequency (=fclk) is scaled linearly below the maximum frequency at which the external SRAM can be accessed. Moreover, assuming that fclk can be gated according to the power-down strategies, the maximum frequency is needed only to compute the minimum number of parallel data-paths. All the power/area libraries have been uniformly adapted to a 0.6 µm CMOS technology operating at 5V. For the SRAM, we have used the power model of Fujitsu's 1Mbit SRAM [1], whose maximum frequency is 100MHz at 5V. The other power models have been adopted from the developed models of embedded RAM, registers [2], and data-paths [3], and they have been slightly modified according to Constant E-Field (CE) scaling theory [4]. For area estimation, the 0.8 µm COMPASS library [5] has been scaled similarly to the given 0.6 µm CMOS technology.

Fig. 2 shows the predicted total power and area, neglecting the dashed boxes because their contribution is negligible. Notice that the power that goes into memory access (proportional to the memory access frequency) dominates the other contributions. In addition, the effective capacitance of an external SRAM is about one order of magnitude larger than that of an embedded RAM. However, we cannot overlook that the gap in power dissipation between Fig. 1(b) and Fig. 1(c) increases as the number of records increases. The reason is that the curve of Fig. 1(b) increases in proportion to N^2, while that of our MDSE increases in proportion to N, owing to the elimination of f2,3.
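As a rough illustration of how the maximum frequency enters the estimate, the sketch below computes a minimum number of parallel data-paths. It assumes that each data-path handles one record comparison per clock and that the required comparison rate is N x fs; this is one reading of the text, not a formula given in the paper. The function name and default values (fs = 10MHz, maximum frequency = 100MHz) are illustrative, with the numbers taken from the text.

```python
def min_parallel_paths(n_records: int,
                       fs_hz: int = 10_000_000,
                       fmax_hz: int = 100_000_000) -> int:
    """Smallest number of data-paths that sustains N comparisons per sample.

    Assumption: one comparison per data-path per clock, with the clock
    limited to `fmax_hz`.
    """
    if n_records < 1:
        raise ValueError("n_records must be positive")
    required_rate_hz = n_records * fs_hz
    # Integer ceiling division avoids floating-point rounding.
    return -(-required_rate_hz // fmax_hz)


if __name__ == "__main__":
    for log2_n in range(5, 11):          # range of the Log2(Number of Records) axis
        n = 1 << log2_n
        print(n, min_parallel_paths(n))
```

Under these assumptions, 8 records need a single data-path at 80MHz, while 1024 records need 103 data-paths running at or below 100MHz.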

Figure 1. The block diagram of MDSEs.
(a) The external SRAM and data-paths. (f1,1 = fs, f1,2 = N x fs, f1,3 = N x fs, f1,4 = N x fs)
(b) The Embedded Memory Logic. (f2,1 = fs, f2,2 = N x fs, f2,3 = N x fs, f2,4 = N x fs)
(c) The proposed MDSE. (f3,1 = fs, f3,2 = fs)

Figure 2. The estimated results of total power and area. (Plot title: Comparison of Architectures (Fig. 1) at fs = 10MHz; 0.6 µm technology, Vdd = 5V, fclk = 100MHz. Horizontal axis: Log2(Number of Records). Legend: x = SRAM + Data-Path, o = EML, * = MDSE.)

0-7803-5474-5/99/$10.00 ©1999 IEEE

2. Merged Memory Logic Circuit

The MML circuit is designed to calculate, in parallel, the absolute difference between an n-bit input key (=Yn) and the n-bit records (=Xn) stored in SRAM. Fig. 3 presents a 1-bit MML circuit with a 1-bit SRAM cell. The n-bit MML circuit can be constructed by cascading n 1-bit MML circuits, like an n-bit ripple carry adder. If CFG1*CSi is high, the input data (Yi and its complement) can be inserted. As RESET falls to zero, the circuit starts to perform two subtractions simultaneously, Xn - Yn and Yn - Xn. Using the most significant bit (=Cn) of these two operations, the multiplexer selects the positive one. Therefore, the output (=Di) gives the absolute difference |Xn - Yn|. To reduce the area, the two subtractors are combined as follows.

$$P_i = \begin{cases} X_i \oplus \overline{Y_i} & \text{for } X_n - Y_n \\ \overline{X_i} \oplus Y_i & \text{for } Y_n - X_n \end{cases} \tag{1}$$

$$\begin{aligned} C_i &= X_i \overline{Y_i} + C_{i-1} P_i & &\text{for } X_n - Y_n \\ C'_i &= \overline{X_i} Y_i + C'_{i-1} P_i & &\text{for } Y_n - X_n \end{aligned} \tag{2}$$

$$D_i = P_i \oplus \left( C_n C_{i-1} + \overline{C_n}\, C'_{i-1} \right) \quad (\text{where } C_0 = C'_0 = 1) \tag{3}$$

Our proposed MML has the following features. Firstly, it is very compact and modular, and can therefore be expanded further, because the first half adder combines an XOR with positive Manchester carry chains built from transmission gate structures. Secondly, for low power consumption, it is designed with a signal feedback technique (at MPF) [6] and a charge-recycling scheme (at MNE) [7], which become more useful for reducing the propagation delay as the number of records increases. Finally, it adapts flexibly to various functions such as absolute difference calculation, subtraction, and addition, via simple control of C0, C'0, and Cn.

Fig. 4 shows the average power dissipation of our 4bit-8word MML over a 200ns duration. All the pseudo-random input keys were generated by a 32bit LFSR (Linear Feedback Shift Register). Notice that the power dissipation simulated by HSPICE, using parameters extracted from the chip layout, agrees with the estimated power within 36~59% error.

Figure 4. Average power dissipation from 0 through 200ns with pseudo-random input data. (Plot: average power dissipation of the 4bit x 8word MML versus frequency [Hz]; simulated and estimated curves, with their error, at Vdd=5V and Vdd=3V.)
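The bit-level behavior of the MML can be checked with a small software model. The sketch below follows one reading of Eqs. (1)-(3): a shared propagate term P_i = X_i XOR NOT(Y_i), one carry chain C for Xn - Yn, a second chain C' for Yn - Xn, both with carry-in 1, and a final selection by Cn. The placement of the complement bars is reconstructed from two's-complement subtraction and is an assumption, as are the function names.

```python
from typing import List


def to_bits(value: int, n: int) -> List[int]:
    """LSB-first bit list; list index 0 holds bit i = 1 of the text."""
    return [(value >> k) & 1 for k in range(n)]


def mml_abs_diff(x: int, y: int, n: int = 4) -> int:
    """Absolute difference |x - y| from the shared-propagate carry chains."""
    xb, yb = to_bits(x, n), to_bits(y, n)
    # Eq. (1): X XOR NOT(Y) equals NOT(X) XOR Y, so both subtractors share P.
    p = [xb[k] ^ (1 - yb[k]) for k in range(n)]

    c = [1]    # C_0 = 1, carry-in of X + NOT(Y) + 1
    cp = [1]   # C'_0 = 1, carry-in of Y + NOT(X) + 1
    for k in range(n):
        # Eq. (2): generate term OR (previous carry AND propagate).
        c.append((xb[k] & (1 - yb[k])) | (c[k] & p[k]))
        cp.append(((1 - xb[k]) & yb[k]) | (cp[k] & p[k]))

    cn = c[n]  # final carry of X - Y: 1 when x >= y
    # Eq. (3): the multiplexer picks the carry chain of the non-negative result.
    d = [p[k] ^ ((cn & c[k]) | ((1 - cn) & cp[k])) for k in range(n)]
    return sum(bit << k for k, bit in enumerate(d))


if __name__ == "__main__":
    mismatches = [(x, y) for x in range(16) for y in range(16)
                  if mml_abs_diff(x, y) != abs(x - y)]
    print("mismatches:", mismatches)
```

Running the exhaustive 4-bit check is a quick way to confirm that a given reading of the equations is self-consistent before comparing it with the circuit of Fig. 3.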

3. Digital/Analog Mixed WTA Circuit

A variety of circuits that perform the WTA function have been reported recently for analog and digital signal processing systems. Analog WTA circuits (AWTACs), which rely almost invariably on a voltage follower (VF), are ubiquitous by virtue of their small size and high speed. However, several modifications have been proposed to enhance the system performance because of the VF's low resolution gain [8]. Moreover, most of their gain stages suffer from considerable variation with process, temperature, and supply voltage.

Figure 3. The 1bit MML circuit diagram.

Fig. 5(a) illustrates our AWTAC, whose resolution gain is determined only by the (W/L) ratio of M1 to MR1, both of which operate in the triode region. If all the other transistors, except for M1 and MR1, are in the saturation region, and one device transconductance g_m is much greater than the sum of two other conductance terms, the gain (=A) from the input (v_i) to the output (v_o) is given by

$$A \approx \frac{g_{m,M1}}{g_{m,MR1}} = \frac{\beta_{M1}}{\beta_{MR1}} \tag{4}$$

Fig. 5(b) shows the simulation results for the proposed 8 AWTA cells and the underlying VF cells. It shows that the system resolution (≈0.12V) of the VF cells is reduced to 12mV for our AWTAC, a 10 times improvement obtained by the feedback voltage gain (A). We also extend our AWTAC to the DAM-WTAC, shown in Fig. 6(a), by adding a replica-bias circuit and a simple digital input stage that converts the digital voltages to an analog current. The input current is linearly converted to an analog voltage at the gate of MR1 by the negative feedback (MC1, MSF1, and MR1). After the WTA action, MPF1 forces MSF1 to turn off and stabilizes the output stage by the positive feedback (MPF1, MP1, and MP2) [9].
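The effect of the resolution figures quoted above can be illustrated with a purely behavioral sketch. It treats resolution as the smallest gap between the two strongest inputs that still yields a clear winner, and it assumes the winner is the largest input; neither convention is stated in the paper, and the cell voltages are hypothetical. Only the 0.12V resolution and the 10 times improvement come from the text.

```python
from typing import Optional, Sequence


def wta_winner(inputs_v: Sequence[float], resolution_v: float) -> Optional[int]:
    """Return the index of the largest input, or None when the two largest
    inputs are closer than `resolution_v` and cannot be told apart."""
    if len(inputs_v) < 2:
        raise ValueError("need at least two competing cells")
    order = sorted(range(len(inputs_v)), key=lambda k: inputs_v[k], reverse=True)
    best, runner_up = order[0], order[1]
    gap_v = inputs_v[best] - inputs_v[runner_up]
    return best if gap_v >= resolution_v else None


VF_RESOLUTION_V = 0.12                          # quoted for the VF cells
GAIN_A = 10                                     # improvement factor quoted in the text
AWTAC_RESOLUTION_V = VF_RESOLUTION_V / GAIN_A   # about 12 mV

if __name__ == "__main__":
    cells_v = [1.00, 1.05, 0.80]   # hypothetical cell voltages, 50 mV apart at the top
    print("VF cells:", wta_winner(cells_v, VF_RESOLUTION_V))
    print("AWTAC   :", wta_winner(cells_v, AWTAC_RESOLUTION_V))
```

With a 50mV gap between the two strongest cells, the model reports no clear winner at 0.12V resolution but resolves cell 1 at 12mV, which is the practical meaning of the improvement attributed to the gain A.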

Figure 5(a). The proposed AWTAC.

Figure 5(b). Comparison between VF and our AWTAC. (Horizontal axis: |v1 - v2|.)

Figure 6(a). The proposed 4bit DAM-WTAC. (Blocks: replica-biasing circuit, 4bit digital input stage with inputs V3, V2, V1, and V0, and output stage.)

In Fig. 6(b), the high pulse on VRST, which resets all the outputs to zero, marks the beginning of the WTA cycle. The transient response, simulated with parameters extracted from the layout, shows a worst-case cycle time below 80ns at Ibw=12uA. Notice that there are trade-offs between the static power dissipation and the system speed, and between the total area and the system resolution, when adopting a logarithmic tree architecture.

Figure 6(b). The transient response of the DAM-WTAC at the worst case. (Horizontal axis: time.)

4. Conclusion

A low-power MDSE is proposed by exploring the architectural-level power domain for dramatic power reduction. In addition, two novel circuits, the MML and the DAM-WTAC, are also proposed. A summary of the chip features is presented in Table 1, and the layout is shown in Fig. 7. It is expected that not only the chip but also the hybrid circuit techniques will be useful for low power multimedia systems.

Acknowledgements

We would like to thank LG Semiconductor and IDEC for the technical collaboration. We also acknowledge the financial support of Samsung Electronics.



Figure 7. The layout of the 4bit-8word MDSE chip.

Table 1. Features of the 4bit-8word MDSE chip.

                    | MML Circuit               | DAM-WTA Circuit
Technology          | 0.6 µm 1P-3M CMOS         | 2.0 µm (min. length)
Supply Voltage      | 3-5V                      | 3-5V
Cycle Time          | < 12.6ns @ Vdd=3V         | < 80ns @ Ibw=12uA
Power Dissipation   | 28mW @ Vdd=3V, f=10MHz    | (static) 1.1mW @ Vdd=5V
Total Area          | 0.3 x 0.7 (mm2)           | 0.13 x 0.7 (mm2)

References

[1] T. Seki, et al., "A 6-ns 1-Mb CMOS SRAM with Latched Sense Amplifier", IEEE Journal of Solid-State Circuits, vol. 28, no. 4, pp. 478-483, April 1993.
[2] P. Landman, Low Power Architectural Design Methodologies, Ph.D. Thesis, E.R.Lab., U.C. Berkeley, August 1994.
[3] S. Wuytack, et al., "Power Exploration for Data Dominated Video Applications", Int. Symp. on Low Power Electronics and Design, pp. 359-364, August 1996.
[4] R.H. Dennard, et al., "Design of Ion Implanted MOSFETs with Very Small Physical Dimensions", IEEE Journal of Solid-State Circuits, vol. SC-9, pp. 256-266, Oct. 1974.
[5] VLSI Tech., 0.8-um CMOS VSC450 Portable Library - Rev 1.0, 1992.
[6] Angus Wu, "High Performance Adder Cell for Low Power Pipelined Multiplier", IEEE Int. Symp. on Circuits and Systems, pp. 57-60, May 1996.
[7] B.S. Kong, Kwyro Lee, "Charge Recycling Differential Logic for Low-Power Application", Int. Solid-State Circuits Conference, pp. 302-303, Feb. 1996.
[8] V. Pedroni, "Inhibitory mechanism analysis of complexity O(N) WTA networks", IEEE Trans. CAS-I, vol. 42, pp. 172-175, 1995.
[9] G. Cauwenberghs, V. Pedroni, "A Low-Power CMOS Analog Vector Quantizer", IEEE Journal of Solid-State Circuits, pp. 1278-1283, August 1997.
