Via-Programmable Read-Only Memory Design for Full Code Coverage Using A Dynamic Bit-Line Shielding Technique
1
Meng-Fan Chang1, 2, Ding-Ming Kwai2, and Kuei-Ann Wen1 Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan 2 Intellectual Property Library Company, Hsinchu, Taiwan
[email protected]
Abstract Crosstalk between bit lines leads to read-1 failure in a high-speed via-programmable read only memory (ROM) and limits the coverage of applicable code patterns. Due to the fluctuations in bit-line intrinsic and coupling capacitances, the amount of noise coupled to a selected bit line may vary, resulting in the reduction of sensing margin. In this paper, we propose a dynamic bit-line shielding (DBS) technique, suitable to be implemented in compliable ROM, to eliminate the crosstalk-induced read failure and to achieve full code coverage. Experiments of the 256Kb instances with and without the DBS circuit were undertaken using 0.25µm and 0.18µm standard CMOS processes. The test results demonstrate the read-1 failures and confirm that the DBS technique can remove them successfully, allowing the ROM to operate under a wide range of supply voltage.
1. Introduction Mask-programmable read-only memory (ROM) macros are commonly embedded into system-on-chip (SoC) designs today. To meet the instance requirements for a variety of configurations, ROM compilers targeting at CMOS baseline processes have been developed and the available code layers include diffusion, metal, and via. Compared to those coded by diffusion and metal layers, via-programmable ROMs shorten the turnaround time in manufacturing [1]. Consequently, it is preferred that the coding via can be promoted to an upper level. We have seen synchronous ROM compilers designed to be coded by via-1 at the 0.25µm generation and by via-2 at the 0.18µm generation. However, the coding via at an upper level implies that the bit lines must be promoted as well. Fig. 1 shows a cross-
0-cell
1-cell
0-cell
Metal-3 (BL) Via-2 (Code) Metal-2 Via-1 Metal-1 Contact STI
Diffusion
STI
Diffusion
STI
Diffusion
STI
Fig. 1. Cross-sectional view of ROM array with via-2 codes and metal-3 bit lines.
sectional view of the memory array with via-2 codes and metal-3 bit lines. This leads to more severe bit-line crosstalk, since the contact and via at the lower levels must be stacked and the metal islands for them to land upon contribute considerable side-wall capacitances to neighboring bit lines. In a configurable design, where the memory array is partitioned into at most two banks, the length of a bit line is determined by the choice of column multiplexers, such as 16, 32, and 64. Code patterns on a viaprogrammable ROM can produce wide variations in bit-line loading and coupling capacitances. To achieve a high speed, the word-line pulse width is short, and the sensing margin becomes small and vulnerable to noise. The crosstalk-induced read failure limits the coverage of applicable code patterns. To reduce power consumption and to improve speed performance of ROMs, previous works employed schemes, such as precharge-discharge dynamic CMOS
Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE
logic [2], charge recycling [3], restricted bit-line swing [4]-[6], and divided wordlines [7]-[9]. These studies have not addressed their solutions to the crosstalkinduced read failure across different code patterns. In this paper, we investigate this problem and present a dynamic bit-line shielding (DBS) technique to achieve 100% code coverage for high-speed via-programmable ROMs. To our knowledge, this is the first to deal with the code-dependent read failure induced by crosstalk between bit lines. The remainder of our presentation is organized as follows. Section 2 analyzes the behavior of the codedependent crosstalk-induced read failure. Section 3 presents the proposed dynamic bit-line shielding technique. Section 4 gives the experimental results on 0.25µm via-1 and 0.18µm via-2 programmable ROMs. Section 5 contains our conclusion.
2. Crosstalk Induced Read Failure 2.1. Sensing Margin In typical NOR-type via-programmable ROMs, the data stored in a bit cell is determined by whether the electrical connection between the transistor of the bit cell and the bit line is present or not. As illustrated in Fig. 2, a bit cell with its transistor connected to the bit line (BL) stores a 0 (0-cell), and a bit cell without such a connection stores a 1 (1-cell). When a row is activated by applying a high voltage to the relevant word line (WL), the 0-cell sinks current from the bit line, while the 1-cell does not sink any current. The 0cell also adds the parasitic capacitance on the drain side of its transistor to the bit line. Correspondingly, the capacitive loading on a bit line may vary by different code patterns. The bit line of a column whose cells are all 0-cells experiences the largest loading effect. This fluctuation in parasitic capacitance causes various voltage swings on a bit line when a 0-cell is accessed. In a conventional synchronous ROM design, all bit lines are precharged to the power supply voltage VDD prior to the sensing phase of a cycle [5]. During the sensing phase, a bit line is either discharged or floating. Here, we assume that the bit line develops a voltage swing of 'V0 when reading a 0-cell, and that of 'V1 when reading a 1-cell. To differentiate a 0-cell from a 1-cell, the sense amplifier then needs a reference voltage VREF whose value must be set between VDD – 'V0 and VDD – 'V1. The sensing scheme with a predefined VREF [10] is employed in this work. Clearly, the sensing margin SM0 for reading a 0-cell is VREF – VDD + 'V0 and the sensing margin SM1 for
0-cell Code (Via) WL (Poly) VSS (Diffusion)
BL (Metal)
1-cell
Fig.2. The circuit and layout of a cell array in viaprogrammable ROMs. The word line (WL), ground line (VSS), and bit line (BL) are implemented by poly, diffusion, via, and metal, respectively.
reading a 1-cell is VDD – 'V1 – VREF. To detect the data correctly, the voltage drops 'V0 and 'V1 must satisfy the inequalities 'V0 > VDD – VREF and 'V1 < VDD – VREF simultaneously by noting that both SM0 and SM1 must be larger than zero under the parametric variations. Thus, the minimum of 'V0, denoted as 'V0min, and the maximum of 'V1, denoted as 'V1max, deserve our attention, since 'V0min – 'V1max = SM0 + SM1. In other words, once 'V0min and 'V1max are determined, no matter how we adjust VREF, the total sensing margin cannot be improved.
2.2. Crosstal-Induced Read Failure in ROMs Failure to sense a 1-cell correctly is one of the major consequences when crosstalk between bit lines occurs. Fig. 3 shows the behaviors without and with crosstalk between bit lines in a 256Kb ROM macro. The memory array is configured as 512 u 512 in a single bank. Without crosstalk in Fig. 3(a), the bit-line waveforms are BL-A1 and BL-A2 when reading a 0cell with light (with 511 1-cells) and heavy (with 512 0-cells) bit-line loads, and BL-D when reading a 1-cell. Recall that a configurable design activates all bit cells on the same row by a word line, while reading out only those on the bit lines selected by the column multiplexers. A 0-cell of an unselected bit line, which shares the same word line, becomes an aggressor. Its voltage drop 'V0 generates various amounts of coupling noise on the selected bit line, as we notice that
Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE
BL-D
BL[j-2]
BL-V2
Sense a 1-cell
BL[j-1]
BL[j]
BL[j+1]
BL[j+2]
WL [j]
VREF CC[j-2,j-1]
BL-A2
WL
Sense a 0-cell
WL Read-1 Failure
CC[j, j+1] CC[j+1, j+2]
ICELL
CBL[j]
BL-V1 Precharge
BL-A1
(a)
CC[j-1,j]
BL-A2
1-cell
0-cell
(b)
Fig. 3. Simulated waveforms of a 512-cell bit line in a conventional 0.25µm via-1 ROM design (a) without and (b) with crosstalk.
Y[j-1]
Y[j]
Y[j+1]
Sense Amplifier
in Fig. 3(a), 'V0 can vary widely from a full VDD swing to a few hundred millivolts. In Fig. 3(b), two such aggressors (i.e., 0-cells) are the nearest neighbors. With light bit-line load, a significant voltage drop occurs which causes the read failure of 1-cell. The voltage drop 'V1 on the victim bit line BL-V1 for reading a 1-cell is larger than VDD – VREF. As a result, the sense amplifier mistakenly detects a 0, rather than the expected 1. With heavy bitline load, 'V1 is smaller than VDD – VREF, as on BLV2, and thus, the sense amplifier still detects the correct data. We find that it does not work to prevent such a large voltage drop by adding a static bit-line pull-up (bleeder), if 'V1max is much larger than 'V0min. Because the cell transistor is already drawn with the minimum channel width (in our case, 0.58µm for the 0.25µm process and 0.42µm for the 0.18µm process), coupling capacitance between bit lines and intrinsic bit-line load play a key role in the crosstalk effect. Unfortunately, as the feature size scales down, the coupling capacitance increases, even through the intrinsic bit-line capacitance decreases [11]. For the 0.25µm and 0.18µm processes that we used in this work, their metal thickness and inter-metal dielectric (IMD) thickness are virtually kept the same. The trend limits the number of bit cells connected to a bit line in a conventional synchronous ROM using advanced technologies. Fig. 4 depicts a simplified bit-line structure with the parasitic capacitances. The unselected bit lines BL[j – 1] and BL[j + 1], which are the nearest neighbors to the selected bit line BL[j], have 0-cells activated when the
Fig. 4. A simplified structure of bit-lines and multiplexing circuit in a conventional ROM.
word line WL[i] is turned on. Voltage swings on BL[j – 1] and BL[j +1] generate coupling noise through CC[j – 1, j] and CC[j, j + 1] onto BL[j]. Given word-line pulse width TWL and cell current ICELL, the voltage drop 'V0[j] of an aggressor bit line BL[j] can be derived as
' V0 > j @
I CELL TWL C BL > j @ C C > j 1, j @ C C > j , j 1@
(1)
With two aggressors present at the neighboring bit lines, the coupled voltage 'V1[j] on a victim bit line BL[j] can be derived as
'V1> j@ 'V0> j 1@
CC> j1, j@ CBL> j@ CC> j1, j@
'V0> j 1@
CC> j, j1@
(2)
CBL> j@ CC> j, j1@
Intuitively, a victim bit line suffers the largest coupling noise when it has no 0-cell and its neighbors on the same row are all 0-cells which turn into aggressors. Moreover, the aggressor bit lines have light loading to gain the maximum strength. In practice, however, we are not able to obtain the value simply from (1) and (2), because they are intertwined by a more complex RC network with the metal (bit-line) sheet resistance ranges from 55m:/
to 100m:/
Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE
Maximum Voltage Drop (V)
1.2
BL[j-2]
BL[j-1]
BL[j]
BL[j+1]
BL[j+2]
WL [j]
1 0.8
CC[j-2,j-1] CC[j-1,j]
Worst Case
CC[j, j+1] CC[j+1, j+2]
ICELL
0.6 CBL[j]
0.4
Typical Case
0.2
Pre_Odd
1-cell
Pre_Even
0-cell
0 0
64
128
192
256
320
384
448
512
Number of 0-Cells on the Victim Bit Line
Y[j-1]
Y[j]
Fig. 5. Maximum voltage drop versus number of 0cells on a victim bit-line across code-patterns.
and the via resistance from 4: to 8:. Parasitic extraction and circuit simulation are used to obtain the worst case. Fig. 5 shows the simulation results for the worst and typical cases in the 0.25µm via-1 ROM design. It isnoteworthy to point out that if the number of 0-cells on the victim bit line is larger than 256 or if the number of 0-cells on the aggressor bit line is smaller than 256, a correct sensing may still be achieved.
3. Dynamic Bit-line Shielding Technique Since the fluctuations in coupling capacitance and intrinsic loading on the bit lines across various code patterns are inevitable, restricting the voltage swing of the aggressors helps to reduce the coupled noise on the victims. The dynamic bit-line shielding (DBS) technique is proposed to eliminate the crosstalk between bit lines and to prevent the read failure. The behavior of the DBS ROM in the precharge phase is the same as that of a conventional one: All bit lines are precharged to VDD before a word line is turned on. After the input address is decoded, the selected bit line BL[j] is connected to the sense amplifier through the multiplexer controlled by the column select signal Y[j]. However, the behavior of the sensing phase is different. In the DBS ROM, a half of the bit lines, either odd- or even-numbered, are clamped at a fixed voltage during the sensing phase. The unselected bit lines, whose voltage level is maintained by the clamping transistors, now act as the shielding for the
Y[j+1]
Sense Amplifier
Fig. 6. A simplified structure of bit-lines and multiplexing circuit of the DBS ROM.
selected bit lines and are dynamically specified according to the input address of each cycle. In Fig. 6, we let the clamping transistors be shared with the precharge ones. Thus, the control signals for both precharging and clamping are encoded with the least significant bit of the column address. Accordingly, the area overhead of the clamping circuit and the control block for odd/even precharge signals is relatively minor. If the bit line BL[j] is selected, then the column select signal Y[j] turns on the path to pass the signal on BL[j] to the sense amplifier. The Pre_Odd signal enables the clamping function on odd-numbered bit lines (i.e., BL[jr1], BL[jr3], and so on). This clamping scheme results in no voltage swing on the oddnumbered bit lines, during the sensing phase. However, the unselected even-numbered bit lines (i.e., BL[jr2] and so on) still have voltage swings as in a conventional ROM design, if 0-cells are activated during the sensing phase. This clamping eliminates the major coupling noise source for the selected bit lines and mitigates the crosstalk induced read failure. A variant to clamp only the nearest neighbors of the selected bit lines can certainly be implemented, but requires a more complex control scheme. In summary, a bit line during the sensing phase by the DBS technique is in one of the following three
Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE
Two test chips have been fabricated using 0.25µm and 0.18µm CMOS processes, respectively. Each test chip contains two 256Kb ROM macros implementing a conventional synchronous design and its DBS counterpart, respectively. The memory array is configured as 512 u 512 in a single bank; the 0.25µm ROMs are coded by the via-1 layer and the 0.18µm ones by the via-2. For crosstalk analysis, the macros are equipped with two external pins to raise or to lower the reference voltage VREF by 100mV. Table I lists the features of the ROM macros.mail addresses if possible. A realistic code pattern was applied to the memory array to explore the crosstalk-induced read failures. The code pattern implements three cases of bit-line loads, respectively, with 1, 256, and 511 0-cells on the bit lines, while making the victim bit lines read 1-cells and the aggressor bit lines read 0-cells. The conventional ROM design fails reading the 1cells with the three VREF settings in the nominal VDD range, even though lowering VREF helps to reduce the number of failing bits. Fig. 7 shows an example using the 0.25µm test chip. Functional tests were performed by varying VDD from 2.9V to 1.5V in a step of 0.05V, and in each step, the number of failing bits is recorded. The read failures appear to depend on VDD. As VDD is lowered, the aggressor strength becomes weaker and the number of failing bits is reduced. With the original VREF setting, the test chip can eventually pass at VDD = 1.55V and by lowering VREF, the operable supply voltage is improved to 2.1V. On the other hard, the DBS ROM passes the functional tests with all VREF settings. Fig. 8 shows the Shmoo plot of the 0.25µm DBS counterpart with VDD sweeping from 3.0V to 1.5V in a step of 0.05V. The 0.25µm DBS ROM is shown to be able to operate at the supply voltage down to 1.6V. The test results demonstrate that the DBS technique indeed eliminates the read-1 failures and also provides headroom for the ROM to have a higher VREF value.
5. Conclusion
100000
Number of Bit Failures.
4. Experimental Results
Table I. 0.25µm and 0.18µm Via-Programmable ROM Macros Technology 0.25 0.18 (µm) Capacity 256K 256K (bits) Macro Area 0.53 0.31 (mm2) Bit Cell Size 1.44 0.75 (µm2) Via-1 Via-2 Code Layer 2.5 1.8 VDD (V) VREF (V) 2.35 2.25 2.15 1.65 1.55 1.45
Raised Reference Voltage
10000 1000 100
Original Reference Voltage Lowered Reference Voltage
10 1 1.5
1.75
2
2.25
2.5
2.75
3
Supply Voltage (V)
Fig. 7. Number of read-1 failures in the conventional 0.25µm via-1 ROM. 1.5V
2.0V
2.5V
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 2.0V
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 2.5V
5ns
10ns
tACC
states: selected read, unselected read, or unselected clamp. In the unselected clamp state, the nearest neighbors of the selected bit line are put into the crosstalk-elimination mode. The clamped bit lines (even or odd) are dynamically selected based on the least significant bit of the column address.
15ns
20ns
25ns 1.5V
* * * * * * * * * *
* * * * * *
* * * * * * *
* * * * * * * *
* * * * * * * * *
* * * * * * * * * * * *
We have investigated the crosstalk between bit lines and read-1 failures across various code patterns in
Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE
* * * * * * * * * * * * *
* * * * * * * * * * * * * *
* * * * * * * * * * * * * *
VDD
* * * * * * * * * * * * * * *
* * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * *
3.0V 5 ns * * * * * * * * * 1 0ns * * * * * * * * * * 1 5ns * * * * * * * * * * 2 0ns * * * * * * * * * * 2 5ns 3.0V
Fig. 8. Shmoo plot of the 0.25µm DBS ROM by VDD sweep synchronous via-programmable ROMs. The conventional design was shown to be dependent of VDD due to the crosstalk-induced read failures. Then, the DBS technique was proposed to eliminate the crosstalk and to preserve the sensing margin. The test chips demonstrated its effectiveness and achieved 100% code coverage under high-speed operation. This DBS circuit, albeit simple, has been implemented in our 0.25µm and 0.18µm compilers, which aim primarily at area minimization, and proven successfully in mass production.
Acknowledgments The authors would like to thank S.-M. Yang, S.-J. Shen, H.-Y. Pan, C.-H. Lee, Y.-C. Liao, Agnes Hsu , and G.-S. Chaung of Intellectual Property Library Company (IPLib) for the implementation and testing of the testchips.
References [1] H. Takahashi, S. Muramatsu, and M. Itoigawa , “A new contact programming ROM architecture for digital signal processor,” Proc. IEEE Int’l Symp. VLSI Circuits, pp. 158-161, June 1998. [2] C. -R. Chang, J. -S. Wang, and C. -H. Yang, “Lowpower and high-speed ROM modules for ASIC applications,” IEEE J. Solid-State Circuits, vol. 36, no. 10, pp. 1516-1523, Oct. 2001.
[3] B. D. Yang and L. S. Kim, “A low-power ROM using charge recycling and charge sharing techniques,” IEEE J. Solid-State Circuits, pp. 641-653, Apr. 2003. [4] M.-F. Chang, K.-A. Wen and D.-M. Kwai, “Supply and substrate noise tolerance using dynamic tracking clusters in configurable memory designs,” Proc. IEEE Int’l Symp. Quality Electronic Design, pp. 225- 230, Mar. 2004. [5] R. L. Barry et al., “A high-performance ROM compiler for 0.50µm and 0.36µm CMOS technologies,” Proc. IEEE Int’l ASIC Conf., Austin, TX, pp. 370-373, Sep. 1995. [6] T. Tsang, “A compilable read-only-memory library for ASIC deep sub-micron applications” Proc. IEEE 11th Int’l Conf. VLSI Design, Chennai, India, pp. 490–494, Jan. 1998. [7] M. Yoshimoto et al., “A Divided Wordline Structure in the Static RAM and its Application to a 64K Full CMOS RAM,” IEEE J. Solid-State Circuits., vol. 18, no. 5, Oct. pp. 479–485. 1983. [8] K. Itoh et al. “Trends in low-power RAM circuit technologies,” in Proc. IEEE, vol. 83 no. 4, Apr. pp. 524–543, 1995. [9] M. -F. Chang, M. J. Irwin, and R. M. Owens, “Powerarea tradeoff in dual word line memory cell arrays,” J. Circuits, Systems and Computers, vol. 7, no. 1, pp. 49– 67, Feb. 1997. [10] A. Tuminaro, “A 400Mhz, 144Kb CMOS ROM macro for an IBM S/390-class microprocessor,” Proc. IEEE Int’l Conf. Computer Design, pp. 253-255 Oct. 1997. [11] J. A. Davis et al., “Interconnect limits on gigascale Integration (GSI) in the 21st century,” Proc. IEEE, vol. 89, no. 3, pp. 305-324, Mar. 2001.
Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE