Constrained Flash Memory Programming

Amit Berman and Yitzhak Birk
Technion – Israel Institute of Technology
{bermanam@tx, birk@ee}.technion.ac.il

Abstract: In NAND Flash memory featuring multi-level cells (MLC), the width of the threshold voltage distributions about their nominal values affects the permissible number of levels and thus the storage capacity. Unfortunately, inter-cell coupling causes a cell's charge to affect its neighbors' sensed threshold voltage, resulting in an apparent broadening of these distributions. We present a novel approach whereby the data written to Flash is constrained, e.g., by forbidding certain adjacent-cell level combinations, so as to limit the maximum voltage shift and thus narrow the distributions. To this end, we present a new family of constrained codes. Our technique can be used for capacity enhancement (more levels) or for improving endurance, retention and bit error rate (wider guard bands between adjacent levels). It may also be combined with various programming-order techniques that mitigate the inter-cell coupling effects and with decoding techniques that compensate for them.
I. INTRODUCTION
Flash memory is a form of non-volatile memory (NVM) that can be electrically programmed (written) and erased. Relative to magnetic hard disk drives (HDDs), NAND Flash features lower-latency read and write operations, lower power consumption, higher throughput, and solid-state reliability [1]. However, NAND Flash still has a worse price-to-capacity ratio, as well as limited endurance (the number of times that a memory cell can be programmed and erased, typically 10^3 to 10^5 cycles) and retention (the ability of a memory cell to store a valid value over time, typically 10 years) [1]. Therefore, increasing Flash memory capacity, endurance and retention is highly desirable. NAND Flash density has approximately doubled every year since 2003, and reached 543 Mb/mm² in 2011 [2]. In order to enhance bit density, device technology development has progressed in two main directions: 1) aggressive technology scaling (NAND Flash memory is the densest semiconductor circuit [1,2]); and 2) multi-level cell (MLC) architectures (multiple bits per cell) [1,3]. A Flash cell stores electrons in a floating-gate layer. Generally, the more electrons that are stored (or trapped), the higher the cell's threshold voltage, Vt. (Vt is the voltage that must be applied to the gate to cause a measurable level of conduction in the channel.) Programming (writing) entails a sequence of voltage pulses, each tunneling a small amount of charge into the floating gate. Following each pulse, the cell's threshold voltage is sensed (verified), and the process repeats until the required Vt is reached. A cell must not be overcharged, as charge cannot be removed from individual cells. Cells can be programmed concurrently, and programming schemes have been studied extensively [1]. In single-level cells (SLC), only one program level is
available, so a cell is either erased or programmed. Multi-level cells (MLC) currently permit as many as 15 program levels [3], and more elaborate Vt sensing circuitry is used to determine the level. In line with the SLC convention, an L-level cell has L program levels plus the erased level, and can therefore store log2(L+1) bits. Given a permissible range of Vt, the number of permissible levels is determined mostly by the width of the Vt distributions about their nominal values. As NAND Flash memory process technology scales below 32nm and the number of charge levels per cell exceeds four, cell threshold voltage distributions must become narrower in order to prevent errors resulting from distribution overlap [1,2,3]. An impediment to achieving narrow Vt distributions is the floating-gate (FG) to floating-gate coupling effect [1,3,4]: a cell's Vt is shifted by an amount that depends on the degree of adjacent-cell coupling and on the amount of charge in its surrounding cells. When merely sensing a given cell's Vt, this manifests itself as an apparent broadening of the Vt distributions.

In this paper, we present a novel approach whereby constrained coding is employed to limit the effect of floating-gate inter-cell coupling. Specifically, we forbid the use of those adjacent-cell charge combinations that result in an excessively high Vt shift of any cell. Our analysis shows that the required sacrifice in information capacity may be small. In one example, for 7-level cells with at most a 10-level total charge difference between a cell and its neighbors, the code rate is above 0.95. This 5% loss can be more than offset by the capacity increase brought about by the larger number of levels.

II. INTER-CELL COUPLING AND RELATED INSIGHTS
Ideally, the floating-gate voltage of a cell is determined solely by its control-gate, drain, and source voltages, the charge stored in the floating gate, and the capacitances among them. In practice, however, the sensed Vt of a given memory cell is also affected by the amount of charge in its neighboring cells. The main cause is parasitic inter-cell coupling capacitance [1,4]. Unfortunately, this effect becomes more pronounced as devices are scaled down. For example, cell-to-cell coupling increased threefold with the move from 160nm to 43nm process technology. Inter-cell parasitic coupling also causes yield loss and may limit process scaling. At the architecture level, the change to a cell's Vt due to charge in neighboring cells manifests itself as an apparent broadening and shift of the Vt distributions, as depicted by the red line in Fig. 1. This results in fewer permissible charge levels and thus lower capacity, and/or in a smaller inter-distribution gap and thus lower endurance and retention, i.e., less reliable storage.
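The broadening mechanism can be illustrated with a toy simulation of program-and-verify in a 1-D row. This is a minimal sketch, not a model from the paper: the function names `program_row` and `vt_errors`, the linear coupling model, and all parameter values (coupling ratio `gamma`, pulse size `step`, level spacing `v_gap`, erased Vt `v_erased`) are hypothetical. Each pulse raises the pulsed cell's own Vt by `step` and the sensed Vt of each immediate neighbor by `gamma * step`; verification compares the sensed Vt (own plus coupled) with the nominal target.

```python
import random


def program_row(targets, order, gamma=0.1, step=0.05, v_erased=-2.0, v_gap=1.0):
    """Program a 1-D row with program-and-verify; return the sensed Vt of every cell.

    Hypothetical toy model: level k has nominal Vt v_erased + k * v_gap; every pulse
    raises the pulsed cell's own Vt by `step` and each neighbor's sensed Vt by gamma * step.
    """
    n = len(targets)
    own = [v_erased] * n   # Vt due to the cell's own floating-gate charge
    coupled = [0.0] * n    # Vt shift induced by charge added to neighboring cells
    for c in order:
        target = v_erased + targets[c] * v_gap
        # Verify senses own + coupled, so coupling from neighbors programmed earlier
        # is absorbed: the cell simply needs fewer pulses to reach its target.
        while own[c] + coupled[c] < target:
            own[c] += step
            for nb in (c - 1, c + 1):
                if 0 <= nb < n:
                    coupled[nb] += gamma * step
    return [o + s for o, s in zip(own, coupled)]


def vt_errors(targets, order, **kwargs):
    """Sensed Vt minus nominal Vt per cell (positive = apparent upward shift)."""
    sensed = program_row(targets, order, **kwargs)
    v_erased = kwargs.get("v_erased", -2.0)
    v_gap = kwargs.get("v_gap", 1.0)
    return [s - (v_erased + t * v_gap) for s, t in zip(sensed, targets)]


if __name__ == "__main__":
    random.seed(1)
    L = 7
    row = [random.randint(0, L) for _ in range(1000)]
    # Even-odd order: cells at even Python indices first, then cells at odd Python indices.
    order = list(range(0, len(row), 2)) + list(range(1, len(row), 2))
    errs = vt_errors(row, order)
    # Cells programmed last end within one pulse of nominal; cells programmed
    # first are shifted by charge added to their neighbors afterwards.
    print("max shift, cells programmed first:", max(errs[0::2]))
    print("max shift, cells programmed last: ", max(errs[1::2]))
```

The two printed maxima differ because only charge added after a cell's own verification shifts its sensed Vt; the spread over all cells and neighbor patterns is the apparent broadening of Fig. 1.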
[Fig. 1 plot: normalized number of cells versus threshold voltage [V].]
Fig. 1: Threshold voltage distributions for a 7-level cell. Blue line: no charge in neighboring cells; Red line: the possible effect of inter-cell coupling on threshold voltage distributions.

In reality, the effect of inter-cell coupling is deterministic to within the variability of the various capacitances. Also, because of the aforementioned programming technique (multiple small pulses between consecutive levels, with Vt verification at each step), a cell's Vt at the end of its own programming is essentially at the nominal value despite the effect of neighboring cells. Consequently, and because charge is only added to cells as they are being programmed, a cell's Vt is only affected by charge added to its neighboring cells after the completion of its own programming. Nonetheless, the unconditional threshold voltage distribution, i.e., one not conditioned upon a particular state of the neighboring cells and a particular programming order, appears to have become wider. Note that the sample space for this distribution is the set of all charge-level combinations in the cell and its neighbors.

These important observations have spawned techniques for mitigating the effect of inter-cell coupling on cell capacity. One is “proportional programming”, whereby all cells are programmed concurrently in steps that are proportional to their target charge levels, so very little charge is added to any cell's neighbors after its own programming has been completed. Another is a decoding technique, successive interference cancellation (SIC), which starts with the cells that were programmed last [5]. Our approach, which can be viewed as competitive with or complementary to those, is to restrict the neighboring-cell charge-level combinations so as to limit the worst-case effect of inter-cell coupling, thereby narrowing the Vt distributions by “chopping off” their tails.

III. CONSTRAINED FLASH MEMORY PROGRAMMING

A. Scheme outline

We begin by constructing a programming-order-dependent coupling severity function, and then constrain the use of adjacent-cell level combinations so as to bound its value from above. Algorithm 1 provides additional details.

Algorithm 1: Flash constrained coding framework
1. Given the programming order, express the change to a cell's Vt as a function of the charge levels of its neighboring cells. We refer to this function as the FG inter-cell coupling effect severity function.
2. Choose T, the maximum permissible value of this function.
3. From T (step 2), the other components of the distribution width and the coupling parameters, derive the width of the Vt distributions.
4. Determine the permissible number of levels based on the range of Vt, the distribution widths from step 3 and the required gap between adjacent distributions.
5. Through constrained coding techniques, define the constrained code and determine the maximum code rate R, i.e., the normalized logarithm of the number of cell-level combinations that satisfy the constraint on the function (the normalized capacity of Section IV).
6. Compute the Flash cell storage capacity (FSC) as FSC = R·log2(L+1), where L+1 is the number of permissible charge levels per cell (including the “erased” level).
7. Construct an encoder and a decoder either algorithmically or as a lookup table. Here, one may elect to trade some capacity for simplicity.
8. Repeat steps 2-7 in search of the constraint value that maximizes the Flash cell storage capacity.
Remarks:
- The constraint value in step 2 may differ among levels.
- In optimizing the constraint value, one can trade capacity for reliability (endurance, retention and transient bit error rate).
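Steps 5 and 6 can be sketched numerically. The code below is a minimal sketch with hypothetical function names; it computes R as the normalized capacity of Section IV, log2 λ / log2(L+1), for the breadth-first and even-odd constraints of Section III-B. It uses simple deterministic presentations rather than the graphs G1 and G2: for breadth-first, one state per pair of most recent levels; for even-odd, one state per most recent odd-cell level. The Perron eigenvalue λ is found by power iteration, which assumes the graph is irreducible and aperiodic (the case here for L ≤ T, since the all-zero state has a self-loop). Steps 3-4, which map T to a number of levels, need a device model and are not sketched.

```python
import math


def perron_root(succ, max_iter=100_000, tol=1e-13):
    """Perron eigenvalue of a 0/1 adjacency matrix given as successor lists.

    Power iteration on a probability row vector; assumes the graph is
    irreducible and aperiodic so that the iteration converges.
    """
    n = len(succ)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(max_iter):
        y = [0.0] * n
        for i, outs in enumerate(succ):
            for j in outs:
                y[j] += x[i]          # y = x A
        new_lam = sum(y)              # sum(x) == 1, so sum(x A) converges to lambda
        x = [v / new_lam for v in y]
        if abs(new_lam - lam) <= tol * new_lam:
            return new_lam
        lam = new_lam
    return lam


def capacity_breadth_first(L, T):
    """Normalized capacity for max(u[i-1]-u[i], 0) + max(u[i+1]-u[i], 0) <= T."""
    q = L + 1
    states = [(a, b) for a in range(q) for b in range(q)]  # last two levels written
    index = {s: k for k, s in enumerate(states)}
    # Writing level c moves (a, b) -> (b, c); allowed iff the middle cell b is legal.
    succ = [[index[(b, c)] for c in range(q) if max(a - b, 0) + max(c - b, 0) <= T]
            for (a, b) in states]
    return math.log2(perron_root(succ)) / math.log2(q)


def capacity_even_odd(L, T):
    """Normalized capacity for u[i] + u[i+1] <= T on the odd-cell sequence."""
    q = L + 1
    succ = [[b for b in range(q) if a + b <= T] for a in range(q)]
    return math.log2(perron_root(succ)) / math.log2(q)


def flash_storage_capacity(R, L):
    """Step 6 of Algorithm 1: FSC = R * log2(L+1) bits per cell."""
    return R * math.log2(L + 1)


if __name__ == "__main__":
    L = 7
    for T in range(L, 2 * L + 1):
        r_bf = capacity_breadth_first(L, T)
        r_eo = capacity_even_odd(L, T)
        # Even-odd codes only the odd cells, so its comparison rate is (R + 1) / 2.
        print(f"T={T:2d}  R_bf={r_bf:.4f}  FSC_bf={flash_storage_capacity(r_bf, L):.3f}"
              f"  R_eo={r_eo:.4f}  (R_eo+1)/2={(r_eo + 1) / 2:.4f}")
```

At a fixed L, R can only grow with T, so the search of step 8 becomes meaningful only once L itself is derived from T through steps 3-4.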
B. Coding Schemes for Breadth-First and Even-Odd Programming Orders

In this section, we demonstrate the use of constrained Flash programming in the context of two prominent programming orders: Breadth-First and Even-Odd. For each of them, the unconstrained case can serve as a baseline. It results in R=1 (no restrictions), but also in the smallest number of charge levels. In both cases, we consider the relatively simple 1-D case (a single, very long row of cells). The 2-D case, albeit more complex, is basically the same, provided that 2-D blocks of cells are being programmed. The severity function (step 1 of Algorithm 1) of cell c will be denoted D(c).

1. Breadth-first programming order (1-D)

Consider charge levels {0,1,…,L} and the following programming order: all cells whose target levels are 1 or higher (Level(c)≥1) are programmed to level 1, including verification of Vt; next, all those s.t. Level(c)≥2 are programmed to level 2, and so on. Therefore, cell c is only affected by the sum of the charge differences between its higher-charged neighbors and itself. Accordingly,
$$\Delta V_t(c) \propto D(c), \qquad D(c) = \sum_{\text{neighbors}} \max\bigl(\mathrm{Level}(\text{neighbor}) - \mathrm{Level}(c),\ 0\bigr) \tag{1}$$

where Level(neighbor) is the charge level of a neighbor of cell c.
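Eq. (1) and the breadth-first definition that follows translate directly into code. This is a minimal sketch assuming 0-based Python lists of integer levels; the function names are hypothetical. Edge cells have only one neighbor, which is why the definition constrains only the interior letters u_2, …, u_{n-1}.

```python
def severity_breadth_first(levels, c):
    """D(c) from Eq. (1): total excess level of the higher-charged neighbors of cell c.

    Under breadth-first programming, only this excess is added after cell c
    reaches its own level, so only it shifts the sensed Vt of cell c.
    """
    d = 0
    for nb in (c - 1, c + 1):
        if 0 <= nb < len(levels):
            d += max(levels[nb] - levels[c], 0)
    return d


def is_breadth_first_constrained(word, T):
    """True if every interior letter satisfies the breadth-first T-constraint."""
    return all(severity_breadth_first(word, i) <= T for i in range(1, len(word) - 1))


print(severity_breadth_first([7, 0, 7], 1))         # 14: both neighbors are 7 levels higher
print(is_breadth_first_constrained([7, 0, 7], 10))  # False
print(is_breadth_first_constrained([7, 4, 7], 10))  # True
```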
[Fig. 2: graph G1, panels (a), (b) and (c).]
Fig. 2: (a) Graph representation of a T-constrained code for Flash breadth-first programming order, G1, which includes 1+4L-2T vertices. A cell is capable of storing discrete levels {0,…,L}. (b) Top: additional edges labeled N from odd-index vertices i to even vertices numbered i+1+4N, where N=1,2,3,…,(4L-2T-i)/4; Bottom: additional edges from even vertices j (j≥4) to vertices (4L-2T-1),(4L-2T-3),…,(4L-2T-(j-3)), labeled (T-L+1),(T-L+2),…,(T-L+(j/2-1)), respectively. (c) Additional edges, added only if labels are unique, from any odd vertex to all odd vertices (including itself): an edge to vertex 1 is labeled L, an edge to vertex 3 is labeled L-1, and so on, with the edge to vertex 4L-2T-1 labeled T-L+1. For simplicity of presentation, numbers separated by commas on an edge represent different labels on separate edges with the same source and destination vertices.
Definition: a 1-D T-constrained code for Flash breadth-first program order is a set of finite words over the alphabet {0,1,…,L} such that, in a word w=u_1u_2…u_n, every letter u_i with 2≤i≤n-1 (i.e., excluding the first and last letters) satisfies

$$\max(u_{i-1} - u_i,\ 0) + \max(u_{i+1} - u_i,\ 0) \le T, \qquad L \le T \le 2L-1.$$

The extension of this definition to a 2-D T-constrained code is straightforward, but will not be discussed in this paper.

Code construction: A general graph representation of a T-constrained code for Flash breadth-first programming order is shown in Fig. 2. Traversing the graph generates a legal codeword. The graph consists of 1+4L-2T vertices. For ease of exposition, we partition the graph edges into three components. Fig. 2(a) shows all graph vertices. The first vertex is not numbered. The edges are as follows: the first vertex is connected to all odd vertices (1, 3,…, 4L-2T-1) with edges labeled L down to T-L+1, respectively. All odd vertices are connected to the first vertex, with edge-label sets {(4L-2T-2)/4+1,…,T-L}, {(4L-2T-4)/4+1,…,T-L}, …, {1,…,T-L}, respectively. Odd vertices are connected to the consecutive even vertices with an edge labeled 0. Even vertices 2,4,…,4L-2T are connected to the first vertex with edges labeled {0,…,T-L}, respectively. Fig. 2(b) top shows an edge labeled N from any given odd vertex i to the even vertex i+1+4N, for N=1,2,3,…,(4L-2T-i)/4. Fig. 2(b) bottom shows edges from even vertices j (j≥4) to odd vertices (4L-2T-1),(4L-2T-3),…,(4L-2T-(j-3)), labeled (T-L+1),(T-L+2),…,(T-L+(j/2-1)), respectively.
Finally, Fig. 2(c) shows edges from any odd vertex to all odd vertices (including itself): an edge to vertex 1 is labeled L, an edge to vertex 3 is labeled L-1, and so on, with the edge to vertex 4L-2T-1 labeled T-L+1.

Lemma 1: The graph G1 in Fig. 2 represents a language for T-constrained Flash coding with breadth-first program order.

Proof: Consider a vertex, say V1, in graph G1. V1 has one or more outbound edges. Select one edge and follow any path through two more vertices, such that V1→V2→V3 are connected (G1 is strongly connected). The sum of the initial edge's label and the label of any outbound edge of V3 is at most T. The combinations of any outbound edge from V1 and any outbound edge from V3 (s.t. V1→V2→V3 are connected) yield all possible values in {0,…,T}. By allowing one to start from any edge and ensuring that all 3-symbol sequences are legal, we ensure correctness for a “sliding window” over the cell locations.

2. Even-odd programming order (1-D)

Here, the even-numbered cells are programmed first, followed by the odd-numbered cells. An odd-numbered cell is thus unaffected, whereas an even-numbered cell is affected by the total charge of its two neighbors. Accordingly,
$$D(c) = \begin{cases} 0, & c \text{ odd} \\ \mathrm{Level}(c-1) + \mathrm{Level}(c+1), & c \text{ even} \end{cases} \tag{2}$$
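Eq. (2) can be evaluated the same way. The sketch below (hypothetical names) keeps the paper's 1-based cell numbering, so `levels[c - 1]` holds Level(c), and treats a missing neighbor past the end of a finite row as level 0, which is an assumption. It also shows why only the odd cells need to be coded: for a row that starts and ends with an odd cell, the largest D(c) over the even cells equals the largest sum of two consecutive letters in the odd-cell subsequence.

```python
def severity_even_odd(levels, c):
    """D(c) from Eq. (2); c is a 1-based cell position and levels[c - 1] is Level(c)."""
    if c % 2 == 1:
        return 0                      # odd cells are programmed last: nothing is added after them
    left = levels[c - 2]              # Level(c - 1); always exists for even c >= 2
    right = levels[c] if c < len(levels) else 0   # Level(c + 1), or 0 past the end of the row
    return left + right


row = [3, 0, 5, 2, 1]                 # Level(1) .. Level(5)
odd_cells = row[0::2]                 # cells 1, 3, 5: the letters u_1 u_2 u_3 of the code word
worst_even = max(severity_even_odd(row, c) for c in range(2, len(row) + 1, 2))
worst_pair = max(a + b for a, b in zip(odd_cells, odd_cells[1:]))
print(worst_even, worst_pair)         # 8 8
```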
Clearly, no distribution narrowing takes place for odd-numbered cells (as no broadening occurred). Even-numbered cells are affected by the total charge placed in their neighbors, independently of their own level, so all Vt distributions of even-numbered cells are affected equally. Therefore, the encoding and decoding processes consider only the odd cells. In the coding scheme, a sequence u1u2u3 represents cells in odd positions such as 1,3,5. Note that when calculating the Flash Storage Capacity (Algorithm 1, step 6), the code rate R refers only to half of the cells.

Definition: a 1-D T-constrained code for Flash even-odd program order (considering only the odd cells) is a set of finite words over the alphabet {0,1,…,L} such that, for any letter u_i in a word w=u_1u_2…u_n, 1≤i≤n-1, and constraint L≤T≤2L-1,

$$u_i + u_{i+1} \le T.$$

(Only odd-index cells are included in a code word!) We restrict T to L≤T≤2L-1 because if T>2L-1 there is no constraint and all combinations are allowed.

Code construction: Fig. 3(a) depicts a graph representation of a T-constrained code for even-odd programming. The graph comprises 2L-T+1 vertices. Vertex 1 is connected to itself with edges labeled 0,1,…,T-L, and to vertices 2,3,…,2L-T+1 with edges labeled L,L-1,…,T-L+1, respectively. Vertices 2,3,…,2L-T+1 connect to vertex 1 with edges labeled {0,1,…,T-L},{0,1,…,T-L+1},…,{0,1,…,L-1}, respectively. Fig. 3(b) depicts the corresponding adjacency matrix.

Lemma 2: Graph G2, depicted in Fig. 3, represents a T-constrained language for Flash even-odd program order.

Proof: For each vertex i, 2≤i≤2L-T+1, in graph G2, there is one input edge, labeled L+2-i. Each of those vertices has output edges labeled 0 through T-L-2+i. Therefore, the maximum sum of adjacent labels is T, and all smaller sums can be obtained. For vertex 1, its input edge labels range from 0 to L-1 (since L≤T≤2L-1, T-L≤L-1) and its output edges range from T-L+1 to L. Therefore, the maximum sum of adjacent labels is T, and all lower sums can be obtained.

[Fig. 3: (a) graph G2; (b) adjacency matrix A_G.]
Fig. 3: (a) Graph G2 representing a T-constrained code for L-level Flash with even-odd programming order. (b) The corresponding (2L-T+1)×(2L-T+1) adjacency matrix A_G. Numbers separated by commas on an edge represent different labels on separate edges with the same source and destination vertices.
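The vertex count 2L-T+1 can be checked mechanically. In the sketch below (hypothetical function name), levels are grouped by their follower set {b : a + b ≤ T}. Levels 0,…,T-L can be followed by any level and share one class, while each of the 2L-T levels T-L+1,…,L has its own, distinct follower set. A deterministic presentation whose states are these classes, with an edge labeled b from a class to the class of b whenever b is in that class's follower set, therefore has 2L-T+1 states, matching the vertex count of G2. This is an independent check of the count, not a reconstruction of the edge labels of Fig. 3.

```python
def follower_set_classes(L, T):
    """Group levels 0..L by their follower set under the even-odd constraint a + b <= T."""
    classes = {}
    for a in range(L + 1):
        followers = frozenset(b for b in range(L + 1) if a + b <= T)
        classes.setdefault(followers, []).append(a)
    return list(classes.values())


for L in range(1, 8):
    for T in range(L, 2 * L):
        classes = follower_set_classes(L, T)
        assert len(classes) == 2 * L - T + 1
        assert classes[0] == list(range(T - L + 1))   # the "anything may follow" class

print(follower_set_classes(3, 4))   # [[0, 1], [2], [3]]
```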
Lemma 3: Graph G2, depicted in Fig. 3, is the Shannon cover of the T-constrained code for Flash even-odd program order; i.e., G2 generates the constrained language of the T-constrained code for Flash with the minimum number of states.

Proof: G2 is deterministic, as none of its vertices has multiple outbound edges with the same label. Consider the follower sets of all vertices in G2. For each vertex i, 2≤i≤2L-T+1, all output edges are directed to vertex 1. Since G2 is deterministic, any length-2 word that begins and ends in vertex i is unique to its follower set. As for vertex 1, the word 00 is unique to its follower set, since all [other] output edges are labeled greater than 0, as L≤T implies T-L+1>0. Therefore, all follower sets of the graph's vertices are different, so G2 is reduced. G2 is irreducible as it is strongly connected: for any two vertices u and v, there is a path from u to v and a path from v to u. Vertex 1 is connected to all other vertices and vice versa, so a path from any vertex u to vertex v is u→1→v. Thus G2 is deterministic, reduced and irreducible; therefore, it is the Shannon cover of the T-constrained code for Flash even-odd program order.

Additional properties of the Breadth-first and Even-odd constrained languages, e.g., that the graph in Fig. 2 is the Shannon cover of the corresponding language, will be presented in a future paper.

C. Encoder and decoder construction

Using the constrained language graph, an encoder and a decoder can be generated. First, according to the code rate R, find a p:q ratio such that R=p/q (R≤Cap(S)), where p is the size of the input data and q is the size of the encoded data. Next, the constrained language graph has to be raised to the q-th power, so that every node will have N^q output edges, where N is the number of levels per cell. The state-splitting algorithm may be needed to achieve q output edges [6]. The final step is to assign input data to edges, such that every edge will have two labels: encoding data and decoding data.
Encoding is done by traversing the graph, transforming the input data into the encoded data, and decoding is done the same way in reverse. Examples of encoders and decoders will be presented in a future paper.

D. Error propagation and correction

A combination of error correction codes with constrained codes is described in [6], and channel tuning and constraint-adding approaches are described in [7]. Error correction techniques that handle the Flash memory constrained coding scheme are an interesting subject for future research.

IV. CAPACITY RESULTS
Accurate capacity results for constrained codes of the Breadth-First and Even-Odd programming orders, for various T and L values, are shown in Fig. 4. Since the alphabet is not binary, the language capacity can be greater than 1. We normalize this capacity by dividing it by the ideal number of bits per cell, so that the maximum capacity is 1. The normalized capacity is calculated according to:
$$\mathrm{Cap}_{\mathrm{Norm}}(S) = \lim_{l \to \infty} \frac{\log_2 N(l;S)}{l \, \log_2(L+1)} = \frac{\log_2 \lambda(A_G)}{\log_2(L+1)}$$
where L+1 is the number of levels per cell, λ(A_G) is the Perron eigenvalue of the adjacency matrix of graph G, and N(l;S) is the number of legal words of length l in the constrained language S. We observe an exponential increase in capacity with a linear increase in the T/L ratio. Breadth-First constrained codes achieve higher capacities than Even-Odd ones. However, Even-Odd encodes only half of the cells; therefore, when comparing it with the breadth-first code rate, we should include the uncoded cells. If R is the code rate of the odd cells, the comparison code rate of Even-Odd is (R+1)/2; see Fig. 4(b).

V. CONCLUSIONS AND FUTURE WORK
We have shown that constrained coding can be used to mitigate the effects of inter-cell coupling in non-volatile memory, and particularly in MLC NAND Flash. The narrowing of the Vt distributions increases the gap between adjacent Vt distributions, thereby increasing endurance and retention in exchange for a moderate capacity reduction due to a code rate below 1. Alternatively, whenever the increase in the number of levels more than offsets the loss in code rate, storage capacity can be increased. Constrained encoding can serve as an alternative to adjusted cell programming and smart decoding. However, combining it with one or both of those may strike a good balance between capacity, complexity and programming flexibility. Our method is thus an alternative to previously proposed ones, but may also complement them. This work spans multiple areas, ranging from Flash technology via architecture to coding schemes. We believe that the approaches presented in this paper, as well as the provided insights, will serve as starting points for further studies in the respective disciplines.
[Fig. 4 plots: panels (a) and (b).]
Fig. 4: Capacity bounds: (a) Breadth-First programming order; (b) Even-Odd programming order with code rate (R+1)/2, where R is the code rate of the odd-cell code.
ACKNOWLEDGMENT
This work was supported in part by the HPI institute for scalable computing.

REFERENCES
[1] J. Brewer et al., "Nonvolatile Memory Technologies with Emphasis on Flash", IEEE Press Series on Microelectronic Systems, Chapter 6, 2008.
[2] IntelPR, "Chip Shot: Intel Micron Sample 20nm NAND Flash", http://newsroom.intel.com/community/intel_newsroom/blog/2011/04/14/chip-shot-intel-micron-sample-20nm-nand-flash.
[3] C. Trinh et al., "A 5.6MB/s 64Gb 4b/Cell NAND Flash Memory in 43nm CMOS", ISSCC, pp. 246-247, 2009.
[4] J. D. Lee et al., "Effects of Floating-Gate Interference on NAND Flash Memory Cell Operation", IEEE Electron Device Letters, Vol. 23, No. 5, pp. 264-266, 2002.
[5] Li et al., "Read operation for non-volatile storage that includes compensation for coupling", US Patent 7,301,839, Nov. 27, 2007.
[6] B. H. Marcus, R. M. Roth, and P. H. Siegel, "Constrained systems and coding for recording channels", in Handbook of Coding Theory, V, 1998.
[7] A. Berman and Y. Birk, "Error Correction Scheme for Constrained Inter-Cell Interference in Flash Memory", Non-Volatile Memory Workshop (NVMW'11), 2011.