UVM-based Verification of ECC Module for Flash Memories
Giuseppe Visalli, Non-volatile Engineering Group, Micron Semiconductor Italy, Catania, 95121
February 17, 2018
Abstract
In this contribution, we present a coverage-driven functional verification environment, based on the UVM framework and the SystemVerilog language, that certifies the operational correctness of the ECC error management logic used in volatile and nonvolatile memories. We apply this methodology to floating-gate nonvolatile memories for the embedded market, which requires a read error rate of 10−14. The proposed environment achieves the complete validation of the ECC features with 80 main coverage points. We also cover the error distribution in the on-board cache buffer of the memory, which requires 240 additional coverage points. The code reusability of this kind of verification environment, together with a desired coverage plan reached in a few minutes rather than with an exhaustive approach, makes the proposed solution easy to apply to any kind of volatile and nonvolatile memory.
Keywords
Block codes, Flash Memories, System Verification, UVM
1 Introduction and Background
The error correction capability of semiconductor memories has become a critical challenge, because read disturbs are increasing due to technology shrinking in nonvolatile memories and to radiation phenomena in volatile memories. For this reason, error correcting codes (ECC) have become more complex than in the past in order to recover wrong data bits during read access.
In the flash memory context, the bit error rate (BER) during read is about 10−3 [1], depending on the usage model, which is too high for the devices' reliability.
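The gap between a raw BER of about 10−3 and the 10−14 target can be made concrete with a binomial tail: the probability that a chunk collects more errors than a t-error-correcting code can repair. The Python sketch below assumes independent bit errors; the chunk size (`n_bits`) and the values of `t` are hypothetical illustrations, not parameters of the memory described in this paper.

```python
"""Tail probability that a chunk exceeds the ECC correction capability.

Sketch under a simple binomial model: each of the n stored bits flips
independently with raw bit error rate p. The chunk size and the values
of t below are hypothetical, not the paper's parameters.
"""
import math


def p_uncorrectable(n: int, t: int, p: float) -> float:
    """Return P(more than t errors among n bits), computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(t + 1, n + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), safe against overflow
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    n_bits = 8192 + 512  # hypothetical chunk: message + metadata + parity bits
    for t in (0, 8, 16, 24, 32):
        print(f"t={t:2d}  P(uncorrectable) = {p_uncorrectable(n_bits, t, 1e-3):.3e}")
```

Summing the tail directly, instead of computing one minus the head, keeps precision when the result approaches values like 10−14.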
The ECC block is on board the memory for the embedded market [2]; however, it can always be disabled so that a more robust recovery mechanism, based on the same principle, is provided externally. Some memory vendors implement the ECC algorithm in software [3]. The codes used belong to the class of systematic Bose-Chaudhuri-Hocquenghem (BCH) block codes or Hamming codes [4]. A systematic code is any error-correcting code in which the input data is embedded unchanged in the encoded output, with the parity information appended to form the final encoded output. Another error-correcting approach in nonvolatile memories, called read-retry [5], dynamically adjusts the read reference voltage to compensate for the threshold voltage shift. In this work, we propose a complete block-level validation environment for our ECC module, which uses a BCH code to improve the read BER to a value satisfactory for the embedded market.
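To make the notion of a systematic code concrete, the following Python sketch uses a Hamming(7,4) code: the four message bits are copied unchanged into the codeword and three parity bits are appended after them. This is only an illustration; the module verified here uses a BCH code that corrects several errors per chunk, and the parity equations below are one standard choice among several.

```python
"""Systematic Hamming(7,4): message bits first, 3 parity bits appended.

Illustrative only; the ECC module in the paper uses a BCH code.
"""
from typing import List, Optional, Tuple

# Each parity bit is the XOR of the message bits listed in its set.
PARITY_SETS = ((0, 1, 3), (0, 2, 3), (1, 2, 3))


def _column(pos: int) -> Tuple[int, ...]:
    """Syndrome produced by a single flip at codeword position pos."""
    if pos < 4:  # message bit: affects every parity set that contains it
        return tuple(int(pos in s) for s in PARITY_SETS)
    return tuple(int(i == pos - 4) for i in range(3))  # the parity bit itself


# All seven syndromes are distinct and nonzero, so one flip is always locatable.
SYNDROME_TO_POS = {_column(pos): pos for pos in range(7)}


def encode(msg: List[int]) -> List[int]:
    """Systematic encoding: the message followed by its 3 parity bits."""
    if len(msg) != 4:
        raise ValueError("Hamming(7,4) encodes exactly 4 bits")
    return list(msg) + [msg[a] ^ msg[b] ^ msg[c] for a, b, c in PARITY_SETS]


def decode(word: List[int]) -> Tuple[List[int], Optional[int]]:
    """Correct at most one flipped bit; return (message, corrected position)."""
    word = list(word)  # do not modify the caller's list
    syndrome = tuple(
        word[4 + i] ^ word[a] ^ word[b] ^ word[c]
        for i, (a, b, c) in enumerate(PARITY_SETS)
    )
    if not any(syndrome):
        return word[:4], None
    pos = SYNDROME_TO_POS[syndrome]
    word[pos] ^= 1
    return word[:4], pos
```

Because the code is systematic, the decoder simply returns the first four bits once any error has been fixed.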
We start from the classic Universal Verification Methodology (UVM) architecture [6], specializing the software model with additional automatic checkers (assertions) written in SystemVerilog. Assertion-based functional verification is one of the most powerful tools for discovering design errors.
We propose a UVM test that stimulates the hardware to encode the message and compute the related parity information. The same test then injects binary errors into the code's protected area (message, metadata, and parity). Finally, the test forces a decoding operation, as happens when data are read from the array into the internal buffer. In this way, we can validate the entire correction process, counting both corrected and uncorrected binary errors; the latter occur when their total number exceeds the maximum correctable by the code. The proposed coverage-driven approach overcomes the limits of exhaustive or formal verification [7] and of directed test cases, which are unfeasible in very large scale integration (VLSI) systems [8] [9]. Moreover, verification with fully random patterns is unfeasible for this kind of design: it is a valid choice when the design has a small or moderate number of binary states, whereas the ECC requires the transfer of several kilobytes of information while the control signals remain deterministic for the entire simulation. For this reason, the UVM environment combines code reuse, assertion-based verification, and randomness of the applied patterns within a complete coverage plan [10]. Additionally, the functional verification also covers critical design issues through specific patterns: directed, constrained-random, or fully random test cases.
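The inject-and-classify part of the flow above can be sketched as one randomized iteration that also predicts the expected outcome. The chunk layout constants and the correction capability `T` are hypothetical; the sketch models only the expected verdict, not the DUT's decoder, and assumes that more than `T` errors must end up reported as uncorrectable.

```python
"""One randomized error-injection iteration with its expected verdict.

Hypothetical layout and capability; the real values depend on the
memory configuration under test.
"""
import random
from typing import List, Tuple

MSG_BITS, META_BITS, PARITY_BITS = 4096, 32, 104  # hypothetical chunk layout
T = 8  # hypothetical maximum number of correctable errors per chunk


def inject_errors(rng: random.Random, n_errors: int) -> List[int]:
    """Choose n_errors distinct bit positions across message, metadata, parity."""
    area = MSG_BITS + META_BITS + PARITY_BITS
    return sorted(rng.sample(range(area), n_errors))


def expected_outcome(n_errors: int, t: int = T) -> str:
    """Expected decoder verdict for a chunk carrying n_errors flipped bits."""
    if n_errors == 0:
        return "no_error"
    # Beyond t errors, a t-error-correcting code cannot restore the data.
    return "corrected" if n_errors <= t else "uncorrectable"


def run_iteration(rng: random.Random, max_errors: int = T + 2) -> Tuple[List[int], str]:
    """Pick an error count, inject the errors, and predict the verdict."""
    n_errors = rng.randint(0, max_errors)
    positions = inject_errors(rng, n_errors)
    return positions, expected_outcome(n_errors)
```

Letting `max_errors` exceed `T` on purpose makes sure the uncorrectable path is exercised as well.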
The strength of our approach lies in its speed and in the coverage obtained through thousands of random tests that verify functional compliance in this challenging architecture. The paper is organized as follows: section II introduces the verification approach, the UVM architecture, and the role of each software object. Section III introduces the assertions used to detect potential hardware failures. Section IV illustrates our coverage plan and how we achieve complete functional verification. Section V is dedicated to the test architecture and introduces the principal ECC signals involved in parity generation and checking. Lastly, section VI concludes our work.
2 The Verification Environment
The automatic verification of complex analog, digital, and mixed-level systems requires a software architecture whose key requirements are reusability and ease of use. For this reason, the test bench is organized into software objects that apply electrical stimuli, compute results, and compare the design under test (DUT) with an ideal model. SystemVerilog extends the standard Verilog HDL with object-oriented programming, a natural extension that allows test benches to be built from interchangeable modules communicating through queues. Finally, the UVM library and SystemVerilog facilitate functional coverage collection with random stimuli. The coverage class specializes a UVM base class, in which the verification engineer defines the design functionality that the test bench must cover.
Figure 1: The UVM-based ECC verification environment
2.1 The case study
Our case study is a 4-Gbit nonvolatile flash memory. The array is organized in main blocks (a block is the minimum erasable portion), each of which is a collection of smaller portions called pages. The ECC circuit treats a page as a collection of N identical chunks and can correct up to a maximum number of binary errors in each chunk. Additionally, the circuit provides a global status signal that reports to the user the correction effort and whether an unrecoverable error event occurred. This status reports the correction effort of the worst case among the N chunks. Each chunk is composed of three different memory entities: the message, which stores user data; some bits reserved for future use (metadata); and a region that hosts the parity generated by the ECC in program mode. During the memory program phase, the encoding process analyzes the N chunks serially and generates the parity information. When the memory reads data from the array, binary mismatches in the programmed data may occur, so the ECC block identifies the differences and tries to correct them. Finally, the memory supports continuous read (CR) of data from the array, page by page, until the memory is deselected. This important feature requires fast data analysis and correction by the ECC, and therefore a different operating mode and verification strategy. The data-correction algorithm is quite complex: the circuit calculates the error locator polynomial (ELP), whose roots are related to the positions of the erroneous bits.
The Berlekamp-Massey algorithm iteratively calculates the ELP, and the Chien search then finds the ELP roots exhaustively. An internal finite state machine (FSM) controls the ECC coding/decoding mode. The update phase is a sequence of a buffer data read, a 1's complement of the read data at the positions of the detected errors, and a final data write; a second FSM controls this last functionality.
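The Chien search step can be illustrated on a small field. The Python sketch below assumes GF(2^4) with primitive polynomial x^4 + x + 1 (code length 15); instead of running Berlekamp-Massey, it builds the ELP directly from known error positions as σ(x) = ∏(1 + α^j x), so that the exhaustive root search can be checked against them. The field size and helper names are illustrative only.

```python
"""Chien search over GF(2^4): evaluate the ELP at every nonzero element.

Assumptions: GF(16) with primitive polynomial x^4 + x + 1, length n = 15.
The ELP is built from known positions; a decoder would get it from
Berlekamp-Massey.
"""
from typing import List

PRIM_POLY, M = 0b10011, 4
N = (1 << M) - 1  # 15 nonzero field elements


def _build_tables():
    exp, log = [0] * (2 * N), [0] * (N + 1)
    x = 1
    for i in range(N):
        exp[i] = exp[i + N] = x  # duplicated so LOG[a] + LOG[b] needs no modulo
        log[x] = i
        x <<= 1
        if x & (1 << M):
            x ^= PRIM_POLY
    return exp, log


EXP, LOG = _build_tables()


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def elp_from_positions(positions: List[int]) -> List[int]:
    """sigma(x) = prod(1 + alpha^j x); coefficients from degree 0 upward."""
    sigma = [1]
    for j in positions:
        nxt = sigma + [0]
        for d, coeff in enumerate(sigma):
            nxt[d + 1] ^= gf_mul(coeff, EXP[j % N])
        sigma = nxt
    return sigma


def chien_search(sigma: List[int]) -> List[int]:
    """Return every position j with sigma(alpha^-j) == 0."""
    found = []
    for j in range(N):
        x_inv = EXP[(N - j) % N]  # alpha^(-j)
        acc, power = 0, 1
        for coeff in sigma:
            acc ^= gf_mul(coeff, power)
            power = gf_mul(power, x_inv)
        if acc == 0:
            found.append(j)
    return found
```

A root at α^−j means the bit at position j is wrong, which is exactly the information the data update phase consumes.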
2.2 The test bench architecture
Our UVM-based verification environment starts from the canonical UVM configuration and adds two FSM checkers as our specialization for this context. Fig. 1 shows the software architecture of our test bench. Its main components are:
- UVM Driver: this software unit drives the ECC block in the coding and decoding stages. It generates the coding frame for the ECC in program mode. It then inserts an arbitrary (random) number of errors, producing the corrupted frame. Finally, it drives the ECC in read mode, writing the corrupted data sequentially, chunk by chunk.
- UVM Sequencer: this software module receives the stimulus data and sends it to the UVM driver.
- UVM Monitor: this passive entity reads the physical signals through a SystemVerilog interface, detecting the coding/decoding process.
- UVM Coverage: this passive entity receives the current behavior from the monitor and collects coverage information.
- UVM Reference Model: this software class receives the ECC activity from the monitor's analysis port, calculates the predicted status, and forwards this binary string to the scoreboard.
- UVM Scoreboard: this software class compares the predicted (from the reference model) and observed (from the monitor) ECC status.
- FSM Checker 1: this software class checks the state transitions and output alignment of the machine dedicated to the ECC's main functionality.
- FSM Checker 2: this software class checks the state transitions and output alignment of the machine devoted to data update.
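The reference model and scoreboard roles can be sketched in Python as a status predictor plus an in-order comparator, mirroring what a UVM analysis-port connection delivers. The status encoding here is hypothetical (largest corrected count in any chunk plus an uncorrectable flag, following the worst-case reporting described in section 2.1), as is the capability `T`; the real ECC status string is not detailed in this paper.

```python
"""Reference model and scoreboard, sketched in Python.

Hypothetical status model: worst chunk's corrected-bit count plus a flag
raised when any chunk holds more than T errors.
"""
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

T = 8  # hypothetical correction capability per chunk


@dataclass(frozen=True)
class EccStatus:
    max_corrected: int
    uncorrectable: bool


def predict_status(errors_per_chunk: List[int], t: int = T) -> EccStatus:
    """Reference model: expected global status from the injected errors."""
    correctable = [e for e in errors_per_chunk if e <= t]
    return EccStatus(
        max_corrected=max(correctable, default=0),
        uncorrectable=any(e > t for e in errors_per_chunk),
    )


class Scoreboard:
    """Compares predicted and observed status transactions in arrival order."""

    def __init__(self) -> None:
        self.expected: Deque[EccStatus] = deque()
        self.mismatches: List[Tuple[EccStatus, EccStatus]] = []

    def write_expected(self, status: EccStatus) -> None:
        self.expected.append(status)

    def write_observed(self, status: EccStatus) -> None:
        predicted = self.expected.popleft()
        if predicted != status:  # the SystemVerilog bench would fire an assertion here
            self.mismatches.append((predicted, status))
```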
3 Assertions
The UVM environment collects a large number of assertions whose violation indicates a design error. SystemVerilog and the UVM library make it possible to include assertions in any block of the test bench illustrated in Fig. 1. Assertions are primarily used to validate the behavior of a design; they may also provide functional coverage information for a design under test. The scoreboard includes a set of assertions that identify design errors related to the binary value of the ECC status. More assertions are placed in the FSM dedicated to data update; they check the integrity of both the buffer address and the corrected data. In this way, at the end of decoding we estimate the final buffer profile in terms of corrected and uncorrected binary mismatches, and we compare this profile with the buffer content in the Verilog behavioral model. Assertions in both this FSM and the buffer memories identify wrong data updates due to an incorrect address and/or data value. Finally, some assertions check the internal state value and output integrity of the two finite state machines.
Figure 2: Textual printing of data update FSM and scoreboard assertions
Figure 3: Textual printing of buffer profile assertions.
The last line of the text in Fig. 2 shows a typical hardware fault in the ECC status, detected by an assertion violation in the scoreboard. The same text shows the rewrite assertions that check the buffer (column) address and the 32-bit binary data, the latter observed at RTL level and predicted by a software model. Moreover, the text in Fig. 3 shows the final check on the memory buffer at the end of the decoding process. This check compares the physical buffer profile with the expected behavior: zero mismatches mean that the ECC corrected all the errors inserted in the current chunk.
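The final buffer-profile check can be sketched as a per-word XOR between the observed buffer and the expected image, counting the differing bits. The sketch assumes 32-bit buffer words indexed by column address; the function names are illustrative and not taken from the actual test bench.

```python
"""End-of-decoding buffer profile check, sketched in Python.

Assumption: the buffer is a list of 32-bit words indexed by column
address, and the expected image is the originally programmed data.
"""
from typing import Dict, List


def buffer_profile(observed: List[int], expected: List[int]) -> Dict[int, int]:
    """Map column address -> number of mismatching bits in that word."""
    if len(observed) != len(expected):
        raise ValueError("buffer images must have the same length")
    profile = {}
    for addr, (obs, exp) in enumerate(zip(observed, expected)):
        diff = bin((obs ^ exp) & 0xFFFFFFFF).count("1")  # popcount of the XOR
        if diff:
            profile[addr] = diff
    return profile


def check_chunk(observed: List[int], expected: List[int]) -> bool:
    """Zero mismatches: every error injected in the chunk was corrected."""
    return not buffer_profile(observed, expected)
```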
Figure 4: The Data Update FSM functional coverage report.
Figure 5: The Main FSM functional coverage report.
4 The functional coverage plan
In this section, we summarize the structure of our coverage plan, specialized for the hardware under consideration. In particular, we generate appropriate stimuli with directed and random test cases that cover every existing and supported feature, including those that can be optionally enabled or disabled according to the manufacturer's choice.
The memory considered in this work can use serial or parallel data communication with the external world, and the maximum number of correctable errors in each chunk may differ between the two modes. Additionally, the internal error recovery algorithm can be enabled or disabled; in the latter case, an optional external ECC corrects the data. Together with many other parameters, these identify 80 different hardware configurations.
This number results from the cross of the configuration parameters defined in the coverage plan; the UVM library and the SystemVerilog language support this extended level of functional coverage.
Furthermore, we cover the principal state transitions in the two FSMs introduced in section II, and we achieve complete functional coverage for the data update and the main ECC operations. The maximum total number of correctable errors in each chunk differs depending on whether the ECC operates in serial or parallel mode. In the latter case, the status string takes three different values, which imply three different paths, and therefore three different cover points, in the data update FSM, as depicted in Fig. 4. The ECC FSM has five different state paths, as depicted in Fig. 5: encoding, and decoding with the ECC decoding enable signal on or off, each of the latter with the continuous read feature enabled or not.
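A cross of configuration parameters, in the spirit of a SystemVerilog covergroup cross, can be sketched in Python as a table of bins keyed by every parameter combination. Only the four parameters named in section 5 are crossed here, giving 16 bins rather than the 80 configurations of the full plan, whose remaining parameters are not listed. The helper `main_fsm_paths` encodes one reading of the five ECC FSM paths: encoding, plus decoding with the decode enable on or off, each with continuous read on or off.

```python
"""Covergroup-like cross coverage, sketched in Python.

Reduced, hypothetical model: only four configuration axes (2^4 = 16 bins).
"""
from itertools import product
from typing import Dict, List, Tuple

AXES: Dict[str, Tuple] = {
    "continuous_read": (True, False),
    "mode": ("serial", "parallel"),
    "metadata_in_parity": (True, False),
    "ecc_decode_en": (True, False),
}


class CrossCoverage:
    """Counts hits in each bin of the full cross of the given axes."""

    def __init__(self, axes: Dict[str, Tuple]) -> None:
        self.names = tuple(axes)
        self.bins = {combo: 0 for combo in product(*axes.values())}

    def sample(self, **cfg) -> None:
        key = tuple(cfg[name] for name in self.names)
        self.bins[key] += 1  # a KeyError here exposes an illegal configuration

    def percent(self) -> float:
        hit = sum(1 for count in self.bins.values() if count)
        return 100.0 * hit / len(self.bins)


def main_fsm_paths() -> List[Tuple]:
    """Encode, or decode with decode enable on/off and CR on/off (1 + 2 x 2)."""
    return [("encode",)] + [
        ("decode", de, cr) for de, cr in product((True, False), repeat=2)
    ]
```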
Figure 6: The ECC coding/decoding phase: parity generation and error correction
Test name     Configuration   Accumulated coverage   Sim. time
directed 01   CRN-S-MY-DE     47.80%                 15 s
directed 02   CRN-S-MY-DE     54.34%                 7 s
directed 03   CRY-S-MY        67.81%                 9 s
directed 04   CRN-P-MY-DE     89.15%                 15 s
directed 05   CRN-S-MN-DE     90.40%                 14 s
directed 06   CRN-S-MY-DE     92.55%                 14 s
directed 07   CRN-S-MY-DE     93.07%                 14 s
directed 08   CRN-S-MY-DE     94.29%                 15 s
directed 09   CRY-S-MY-DE     94.49%                 21 s
directed 10   CRY-S-MY-DE     94.49%                 20 s
directed 11   CRY-S-MY-DE     94.49%                 21 s
directed 12   CRY-P-MY-DE     94.54%                 19 s
random 01     CRN             99.71%                 137 s
random 02     CRY             100.00%                170 s
Table 1: Accumulated total coverage collection by directed and random test cases
5 The test structure
The UVM test software model is specialized into twelve directed tests and two random tests. The goal of the directed tests is to check that the ECC circuit behaves as expected; this set of programs also contributes to functional coverage collection. The goal of the random stimuli, instead, is to complete the coverage plan by sampling configurations not included in the directed test cases. We collect the accumulated total coverage by simulating the twelve directed/constrained-random tests and the two random tests. In Table 1, the configuration lists the deterministic (that is, non-randomized) main parameters: continuous read enabled (CRY) or disabled (CRN), serial (S) or parallel (P) mode, metadata included in the parity bits' generation (MY) or not (MN), and finally ECC decoding enabled (DE) or not (DN). This classification makes clear how each directed or random test increases the total coverage. We achieve complete validation of the circuit under analysis in ten minutes, the time required to simulate a single coding/decoding phase at full-chip level.
The random tests cover the last 5% of the plan, that is, potentially critical behaviors not reached by the directed tests. Table 1 shows the accumulated total coverage collected by simulating the directed and random test cases.
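The configuration labels in Table 1 can be decoded mechanically. The Python sketch below maps each token to a parameter; the dictionary keys are hypothetical names, and the fields a label omits (as in CRN for random 01) are simply absent from the result.

```python
"""Decode Table 1 configuration labels such as "CRN-S-MY-DE".

Token meanings follow the text; the parameter names are hypothetical.
"""

FIELDS = {
    "CRY": ("continuous_read", True), "CRN": ("continuous_read", False),
    "S": ("mode", "serial"), "P": ("mode", "parallel"),
    "MY": ("metadata_in_parity", True), "MN": ("metadata_in_parity", False),
    "DE": ("ecc_decode_en", True), "DN": ("ecc_decode_en", False),
}


def parse_config(label: str) -> dict:
    """Return the deterministic parameters encoded in a configuration label."""
    cfg = {}
    for token in label.split("-"):
        key, value = FIELDS[token]  # KeyError on an unknown token
        if key in cfg:
            raise ValueError(f"duplicate field {key!r} in {label!r}")
        cfg[key] = value
    return cfg
```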
Fig. 6 shows the main signals involved in these complex operations when N=8:
- clk: the ECC clock signal. Both ECC FSMs receive this reference pulse.
- algo_decode: when high, indicates that the ECC is checking the corrupted frame for a possible data update.
- algo_encode: when high, indicates that the ECC is calculating the parity for each chunk.
- ecc_decode_en: the ECC decoding enable signal.
- databus[31:0]: the ECC bidirectional main bus.
- sbus_in_en: data transfer from the ECC to the buffer.
- sbus_out_en: data transfer from the buffer to the ECC.
- quad: chunk under analysis (coding/decoding).
- quad_corr: chunk under correction (decoding only).
- coladdr: buffer address bus.
- ecc_status: ECC status string reporting the maximum correction effort among the N chunks.
- rmw_state: internal state of the data update FSM.
Fig. 6 also shows that rmw_state leaves its idle value during the correction phase, when the data update FSM performs a buffer read, a bitwise exclusive OR, and a final buffer write.
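The read, XOR, write sequence performed while rmw_state is active can be sketched as follows. The sketch assumes 32-bit buffer words addressed by column and error positions reported as absolute bit indices within the chunk; grouping all errors of the same word into a single read-modify-write is a design choice of this sketch, not necessarily of the hardware.

```python
"""Read-modify-write correction of buffer words, sketched in Python.

Assumptions: 32-bit words addressed by column; error positions are
absolute bit indices inside the chunk.
"""
from collections import defaultdict
from typing import List

WORD_BITS = 32


def apply_corrections(buffer: List[int], error_bits: List[int]) -> List[int]:
    """Complement every reported bit, one RMW per affected word (in place)."""
    masks = defaultdict(int)
    for bit in error_bits:
        masks[bit // WORD_BITS] |= 1 << (bit % WORD_BITS)
    for coladdr, mask in sorted(masks.items()):
        word = buffer[coladdr]   # buffer read
        word ^= mask             # bitwise XOR = 1's complement at error positions
        buffer[coladdr] = word   # final buffer write
    return buffer
```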
6 Conclusions
In this work, we introduced the block-level verification of the ECC error management logic through a software infrastructure whose main advantages are code reuse, a benchmark test suite for the complete hardware check, and the measurement of functional coverage in a few minutes of simulation. The use of a SystemVerilog library specialized for functional verification makes these primary goals achievable, whereas other approaches, based on fully random patterns and algorithm checks at block level or on complete directed-test regressions at full-chip level, generally require days of simulation.
The success of our approach requires an analysis of the DUT hardware configurations, that is, of the main operational parameters, in order to design an effective coverage plan. We finally achieved the correct design of this important unit, which also helped the verification of the memory netlist by checking the electrical integrity of the communication protocol between the ECC block and the rest of the hardware.
References
[1] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. S. Unsal, and K. Mai, "Error analysis and retention-aware error management for NAND flash memory," Intel Technology Journal, vol. 17, no. 1, 2013.
[2] D. L. Miller, "Error correction for NOR memory devices with exponentially distributed read noise," CoRR, vol. abs/1306.5350, 2013. [Online]. Available: http://arxiv.org/abs/1306.5350
[3] L. Zhang, Y. an Tan, and Q. kun Zhang, "Identification of NAND flash ECC algorithms in mobile devices," Digital Investigation, vol. 9, no. 1, pp. 34-48, 2012.
[4] J. Kim and Y. Jee, "Hamming product code with iterative process for NAND flash memory controller," in 2010 2nd International Conference on Computer Technology and Development, Nov. 2010, pp. 611-615.
[5] A. Fukami, S. Ghose, Y. Luo, Y. Cai, and O. Mutlu, "Improving the reliability of chip-off forensic analysis of NAND flash memory devices," Digit. Investig., vol. 20, no. S, pp. S1-S11, Mar. 2017. [Online]. Available: https://doi.org/10.1016/j.diin.2017.01.011
[6] Universal Verification Methodology UVM 1.1 User Guide. Accellera, 2011.
[7] A. Lvov, L. A. Lastras-Montano, V. Paruthi, R. Shadowen, and A. ElZein, "Formal verification of error correcting circuits using computational algebraic geometry," in 2012 Formal Methods in Computer-Aided Design (FMCAD), Oct. 2012, pp. 141-148.
[8] K. Khalifa and K. Salah, "Implementation and verification of a generic universal memory controller based on UVM," in 2015 10th International Conference on Design Technology of Integrated Systems in Nanoscale Era (DTIS), Apr. 2015, pp. 1-2.
[9] M. Moskala, P. Kloczko, M. Cieplucha, and W. Pleskacz, "UVM-based verification of Bluetooth low energy controller," in 2015 IEEE 18th International Symposium on Design and Diagnostics of Electronic Circuits & Systems, Apr. 2015, pp. 123-124.
[10] G. Zhong, J. Zhou, and B. Xia, "Parameter and UVM, making a layered testbench powerful," in 2013 IEEE 10th International Conference on ASIC, Oct. 2013, pp. 1-4.