Computer Engineering Department ... volatility is required, Flash memory is considered the best available choice when compared to other types of ..... The second type of disturb failures is known as Read Recovery Disturb (RRD) [21]. This type ...
1
Fault Model and Test Procedure for Phase Change Memory Mohammad Gh. Mohammad Computer Engineering Department Kuwait University P.O. Box 5969, Safat 13060, Kuwait
Abstract Chalcogenide based Phase change memory is one type of non-volatile memory that is most likely will replace the currently wide spread flash memory. Current research on Phase Change Memory targets the integration feasibility, as well as the reliability of such memory technology into the currently used CMOS process. Such studies identified special failure modes, known as disturbs, as well as other Phase Change memory specific faults. In this paper, we identify these failures, analyze their behaviors, and develop fault primitives/models that describes these faults accurately and effectively. In addition, we propose an efficient test algorithm, called March-PCM, to test for these faults and compare its performance to some previously developed test algorithms. Index Terms Phase Change Memory, fault model, test algorithms, March Test, disturbs.
I. I NTRODUCTION Different memory technologies have different features that are normally emphasized in a particular technology. For example, for volatile memory technology, Static Random Access Memory (SRAM) is considered the high performance memory of choice whereas DRAM represents the cost-effective one. On the other hand, when nonvolatility is required, Flash memory is considered the best available choice when compared to other types of non-volatile memories in terms of cost and performance [1]. In recent years, the interest in chalcogenide based phase change memory (sometimes referred to as PCM, PCRAM, or Ovonic Unified Memory (OUM)) has increased significantly. This can be seen by the ever increasing number of patents and research projects that targeted such technology. Over the past five years, the number of patents related to phase change memories have increased exponentially [2]. It is believed that PCM is the next generation of non-volatile memory due to its many desirable characteristics. These features can be summarized as 1) high performance comparable to SRAMs 2) small size analogous to This work was supported by Kuwait University Research Grant Number EO 04/04.
2
DRAMs 3) Non-volatile as Flash memory but without its limited endurance downside. With such characteristics, chalcogenide based phase change memory will possibly be the memory of choice that will replace all other type of volatile and non-volatile memories in the future [1], [2], [3], [4], [5]. Phase change memory is a novel non-volatile semiconductor memory concept that utilizes phase transitions in thin-film chalcogenide material, such as Ge2 Sb2 Te5 (GST), to store data information [1], [4], [5], [6], [7], [8]. Chalcogenide material, such as GST, can exist in two stable states with either high or low resistance value. The high resistance state is known as amorphous phase (or RESET state) whereas the low resistance state (or SET) is known as the crystalline phase. The resistance of the amorphous state is approximately two orders of magnitude larger than that of the crystalline state and is used to represent the stored information in such technology [3], [4], [8], [9], [10]. The transition from one state to the other can be accomplish by appropriately heating the chalcogenide material. Currently, there is a tremendous amount of research directed toward the study of PCM concept, its manufacturing compatibility, and integration with the traditional CMOS process [5], [11], [12], [13]. Other research is directed toward the modeling of phase transformation and cell operation in order to study design alternatives and their performance characteristics [6], [7], [8], [14]. Further, many researchers are currently characterizing technology performance and reliability characteristics [9], [15], [16], [17]. Reliability studies have shown that PCMs exhibit failure modes that are particular to this type of memory [18], [19], [20], [21]. However, research targeting fault modeling and test of such memories is extremely lacking especially since only limited number of studies target PCM faults [22]. Manufacturing test is an important part of any digital system production cycle. Test requirements constitute approximately 30% of final product cost [23], thus, efficient tests are required to provide cost-effective products and to minimize manufacturing cost. In order to efficiently develop tests for PCM, appropriate research must be performed to analyze the failure modes and develop appropriate tests. The objective of this paper is to discuss some of the issues associated with phase change memory test. The first goal for this paper is to discuss PCM cell structure and their principle of operation. The development of fault models that are appropriate to model the various failures and their physical cause constitute our second objective. The third and final objective is to utilize the fault models developed to devise appropriate test algorithms for the different faults. The paper is organized as follows. Section II discuss PCM technology, cell structure, and theory of operation. Different type of failures and the developed fault models associated with such memories are discussed in section III. A Test algorithm for PCM faults and it detection capabilities as compared to previously developed algorithms is described in section IV. Section V concludes the paper and highlights future work. II. P HASE -C HANGE -M EMORY The basic concept in phase change memories starts with the use of a material which can exists in two separate structural states in a stable fashion (amorphous or crystalline). The PCM cell consists of a chalcogenide layer (i.e. GST), a heater, and a select transistor. The chalcogenide material refers to alloy containing at least one group VI
3 Drain
11111111 00000000 00000000 11111111 00000000 11111111 00000000 11111111 00000000 11111111 00000000 11111111
Top Electrode
GST Heater
Isolator
00000000 11111111 11111111 00000000 00000000 11111111 00000000 11111111
Bottom Electrode Gate
MOS
Source
Fig. 1.
Cross Section of PCM Cell
element such as germanium, antimony or Tellurium [1], the cell electrodes are constructed from Ti/TiN material [3], and the cell is insulated from neighboring cells using appropriate insulator such as ZnS-SiO [24]. The select transistor can either be a MOSFET or a bipolar junction field effect transistor [8], [12]. Figure 1 shows a typical cross section of a PCM cell based on MOSFET [3], [7], [9], [15], [17], [18]. Temperature amorphous phase (RESET) o
Tm 600 C polycrystalline phase (SET)
o
Tg 300 C
Time
Fig. 2.
Temperature vs. Phase Transition
The phase of the GST material can be suitably changed either through current or plasma heating of the chalcogenide material. Plasma heating is the method used in optical memories such as CD-R and DVDs whereas PCMs utilize current passage through the chalcogenide material as their heat generation method. The crystalline state, also known as SET state, is used to identify a cell with a logic value “1”, where as an amorphous state is know as RESET state where the logic value it represents is a logic “0”. The phase transition operations as a function of temperature and program pulse duration are depicted graphically in Figure 2 [12]. The amorphous phase (high resistance) is accomplished by heating the material above the melting point (Tm ) then cooling it rapidly below glass transition temperature (Tg ). While heating the material between the melting point and the glass transition temperature causes nucleation and crystal growth to rapidly occur over a period of several nano seconds hence resulting in the
4
crystallization of the material. Typical melting temperature of commonly used chalcogenide glasses is about 600o C whereas crystallization temperature is about 300oC [3]. Programming time for a cell to the amorphous phase is done in approximately 10ns whereas it crystallization is reached after approximately 50ns [9].
(a) I-V Characteristics Fig. 3.
(b) I-R Characteristics
Typical PCM Characteristics [18]
Figure 3(a) shows the I-V characteristic of a typical PCM cell. In the crystalline phase, the cell behaves as a quasi-linear resistor whereas in the amorphous phase, the I-V characteristics of the cell exhibit a snap-back behavior at approximately 1.2 volts which is defined as the threshold voltage of the cell (Vth ). The threshold voltage divides the operating range of the PCM cell into two regions, a region used for reading the cell and another used for programming (as indicated in the figure) [7], [12], [18]. In the region below Vth , a cell in the amorphous phase conduct negligible current whereas a crystalline one conducts a current in the order of 200µA for a read voltage of 0.5 volts. On the other hand, in the programming region (i.e. above Vth ), the current flowing through the cell is used to program appropriate value into the cell. For example, current of approximately 600 µA is used to crystalize the cell (SET operation) while 1.2mA current will result into RESET cell (amorphous phase). Regardless of the phase of the cell (whether SET or RESET), the resistance of the cell exhibit a dynamic behavior under the influence of the current forced through the cell. As the current forced through the cell is increased, the resistance of the cell decreases while temperature is increased until the melting temperature is reached where the resistance begins to increase once more. Figure 3(b) shows the current-resistance characteristics of the cell showing the resistance evolution for both crystalline and amorphous cell. Figure 4 shows the connectivity of a typical PCM memory cell array. The PCM cell can be modeled as a MOSFET with a variable resistance connected to its drain terminal. The drain of the PCM cell is connected to the bitline (BL), gate terminal is connected to the wordline (WL), and source terminal is connected to ground. In such memory array, the program and read operations are accomplished using the following steps [8]:
5
RESET The RESET operation is the operation where a logic “0” is written to the addressed cell. In order to accomplish this, high current is passed through the GST layer of the selected cell to heat the material above the melting point. This is done by applying high voltage on the BL of the addressed cell and turning ON the pass transistor by applying appropriate gate voltage. For unaddressed cells, their BL and WL are grounded. SET
The SET operation refers to the operation of writing a logic “1” value. This operation is performed by applying a high voltage on the BL of the addressed cell while turning ON the pass transistor (biasing the WL). Then a high value of current is forced through the cell to heat the material to a temperature above the glass transition temperature, but below the melting point of the GST material. The bias condition and the amount of current flowing in the cell will result in the transformation of the material to the crystalline phase. As for the unaddressed cells, their BL and WL are grounded to inhibit their programming.
READ During read operation, appropriate read voltage is applied to the addressed cell BL while turning ON the pass transistor by biasing its gate terminal with appropriate voltage. The low voltage on the BL is required to provide the necessary read current and avoid un-intentional modification of the stored data in the addressed cell. The stored data is sensed by measuring the amount of current passing through the addressed cell BL. For example, to read a cell, the BL is biased with read voltage (i.e. 0.4V) and the addressed cell is selected by applying the required voltage on its WL (i.e., 1.2V). For a cell in the amorphous phase, the resistance of the GST material is so high that no current (or extremely small current < 1µA) will pass through the cell and the majority of the current will flow in the BL. The amount of current that flows in the BL is sensed by the sense-amplifier and is compared with the reference current. In the case of an amorphous GST material, the magnitude of the current is greater than that of the reference cell, thus identified as a logic “0” value. However, in the case when the GST material is in the crystalline state, considerable current will flow through the memory element hence minimizing the amount of current flowing through the BL. The sense amplifier will sense a current with a magnitude that is less than the reference cell, hence concluding the content of the cell to be a logic “1” value. Table I gives an example of the biasing condition to program and read typical PCM cell [8]. In the table, labels VW L and VBL refer to voltages on the word line and bit line for the specified operation whereas IBL refers to the current sensed or forced through the bit line. During the read operation, the current is sensed from the cell is typically negligible (i.e. 0) when in RESET state and approximately 80µA when SET. On the other hand, while programming the cell to SET/RESET state, the current is forced through the cell as indicated in the figure to allow writing appropriate logic value into the cell. III. PCM FAULTS AND
DISTURBS
During initial design phase of a new product, the technology used undergoes a tremendous amount of characterization testing to evaluate design alternative, process quality, as well as performance characteristics. However, due to the manufacturing environment, spot defects as well as other anomalies such as contaminants between cell
6
TABLE I PCM CELL O PERATIONS ,V DD =3V
Operation VW L (V)
Values VBL (V)
IBL (µA)
READ
1.8
0.4
0–80
SET
Vdd
1.5
300
RESET
Vdd
2.7
600
heater and chalcogenide material would result in defective or unreliable parts [25]. In this section, we will briefly discuss known issues related to PCM cell failures and will develop appropriate fault models that can be used to set proper tests to screen out defective units. A. PCM Disturb Faults In non-volatile memories, data storage operations require either elevated voltages, such as the case in flash memory, or elevated currents (hence temperature), as the case in phase change memories [26], [27]. These elevated voltages/currents result in behaviors known as disturbs. Disturbs are simply defined as the unintentional change in the content of an un-addressed cell while operating on another cell. It is also possible to have a disturb behavior to the addressed cell similar to read disturb faults [27]. Manufactures normally spend considerable time to optimize the design of the memory cell and array organization to reduce these unwanted effects. Next, we describe four types of disturb faults associated with PCMs, namely, Proximity (PD), Read Recovery (RRD), Read (RD), and False Write (FWR). BL 0
BL 1
BL 2
WL 0
PCM cell
WL 1
WL 2
Fig. 4.
PCM Memory Array
PCM operations are based on the generation of heat to store information in memory cells. Therefore, the design and implementation of this type of memory cells must guarantee the proper isolation of the individual cells. Further,
7
current/voltage rating under different operating conditions must be within design limits to guarantee proper operation and data retention characteristics. If any of these requirements is violated, failures such as disturbs or data retention failures might occur.
Fig. 5.
Heat Dissipation to neighboring Cell
The first type of disturbs which is considered as one of the most pronounce failure mode is called the proximity disturb (PD) or simply program disturb [28], [22]. This failure can be explained as the unintentional loss of the stored information (RESET state) when a cell in its immediate proximity (first neighbors, as shown in Figure 4) is being programmed to a RESET state (logic 0). This failure mode is caused by thermal cross-talk generated while programming a cell. For example, if a cell is undergoing a program operation to RESET state (logic “0”), the GST material is heated to its melting temperature to program appropriate information into the addressed cell. On other hand, the generated heat could be dissipated to neighboring cells, due to defects or improper isolation, which could lead to the loss of stored data (through crystallization of the GST material) in the neighboring cell. In [28], authors have shown during program operation (RESET), first neighbors experience elevated temperatures, thus resulting in increased possibility of proximity disturb occurring. Figure 5 shows the temperature in a disturbed cell as a function of feature size (F) using different scaling approaches. In one scaling approach, called isotropic scaling, all cells dimensions are scaled linearly (depending on cell structure) with the feature size. On the hand, in non-isotropic approach, cell dimensions are scaled down except for the heights of the heater and chalcogenide material which are kept constant. The third approach is a mix of these two approaches with specific scaling rules [29]. Even though non-isotropic scaling is advantageous in terms of reducing the required programming current, It is non viable with regard to program disturb immunity. It is apparent from the figure that a non-isotropic scaling approach dissipates a large amount of heat to its immediate neighbors, hence amplifying proximity disturb behavior. Hence, it is important to consider the scaling approach, technology, cell structure and isolation method when manufacturing PCMs to avoid such unwanted behavior. Therefore, it is important to test for this type of fault to guarantee all aspects of design and integration characteristics are fully compliant with specifications. This type of disturb can be modeled as an idempotent coupling fault of the type < xw0; 0/1m /− >. This fault
8
model is given using traditional coupling fault notation , representing a fault behavior involving two memory cells, an aggressor cell (a) and a victim cell (v) [30]. In this notation, Sa ∈ {w0,w1,r0,r1}, is the operation sequence of the aggressor (selected cell), Sv the operation sequence/initial state of the victim (affected cell), F ∈ {0,1} the state stored in the faulty cell, and R ∈ {0,1,-} the output of the read operation. The value “-” in field R is used when the operation performed is a write operation. Note that in this fault primitive, the subscript “m” represents the fact the cell’s resistance is not a typical one, it is rather less resistive thus referred to as marginally RESET and to signifies its requirement for special detection condition (i.e. margin read [26]). The characteristics of the margin read operation will be discussed in section IV. Such failure can be avoided by properly controlling heat generation and confinement within the active GST material of the core cell by various isolation methods such as cell heater design [28].
Fig. 6.
Resistance Recovery [21]
The second type of disturb failures is known as Read Recovery Disturb (RRD) [21]. This type of failure occurs when reading a cell immediately after it has been programmed to high resistance state (RESET, or logic 0). The read operation returns value “1” whereas the true value of the cell is actually a logic “0”. This failure occurs due to the fact that the newly programmed cell immediately is in a non-equilibrium state and additional time is required for free carriers to recombine/diffuse to restore equilibrium after programming voltages removal. In [21], the authors had shown that the drift dynamics of the resistance can be explained by time evolution of resistance as: t
R(t) = Rd (t)kR(0)e τ
(1)
where Rd (t) is the drift component of the resistance, R(0) is the resistance at time zero in the recovery transient, and τ is the effective recombination time for access carriers. Figure 6 shows the comparison of the experimental results of the recovery dynamics as compared to those calculated by (1) which shows a good agreement between them. It has also been shown that in addition to resistance recovery, there is also threshold dynamics (VT ) that occur [21]. These findings necessitate the proper design of the operating voltage of the read operation (bitline voltage < minimum (VT )) as well as timing of the read operation (time between program operation and any subsequent read operation).
9
Since this behavior is an intrinsic to all PCM cells, designer’s of the PCM cell array must properly design the read/write circuitry to allow sufficient time between write and read operation to ensure all transient behavior in the cell have reached steady state. Therefore, when such a behavior is present in a cell, this faulty behavior can be attributed to a fault in the read/write circuitry. Nonetheless, such faulty behavior can be modeled as a memory cell array fault as it is normally done for some faults in peripheral circuitry(i.e. using reduced functional fault model approach for memories) [31]. This failure can be modeled as a class of incorrect read fault (IRF) of type < 1w0R0/0/1m > (in particular, dynamic IRF10 [30]). Unlike the previous fault which requires two cells (aggressor and a victim), this fault is a single cell fault. Another type of disturb which occurs under low current biasing condition is read disturb (RD) [13]. This disturb behavior, which is similar to the one occurring in Flash memory, may prevail in cells in the amorphous state (high VT state in flash). During the read operation, the amount of current flowing through the cell is in the order of 1µA. In the presence of defects (or due to cycling), the amount of current flowing in the cell is increased leading to the increase in the amount of heat generated in the GST material. If the amount of current flowing in the GST layer results in localized heating of the chalcogenide material, thus switching the phase of the GST material from amorphous to crystalline state [15]. Therefore, the stored information in the cell is destroyed. This fault type can be represented as a fault of the type < 0r0/1m /0 >. This class of faults can be categorized as a Deceptive Read Disturb Fault (DRDF) [30]. This fault model assumes that the read operation is short enough to read the cell in the correct logic state before switching occurs. Note that in this fault primitive, the subscript “m” represents the fact the cell is not in a typical RESET state (in terms of resistance), it is rather marginally RESET. The reason for marginal resistance value is that the localized heating crystalize only a small portion of the programmable GST layer. Thus, the total resistance of the cell is the parallel combination of the amorphous and crystalline GST material which is less than typical amorphous phase. Due to this decreased resistance, this fault type also requires special read operation similar to that used for RRD fault. Another read disturb phenomenon, known as ‘false write” (FWR), was observed in PCM (such behavior was not observed in flash memory). In such disturb failure, the read operation changes the addressed cell from a SET to RESET state. This fault can be view as the complement of the RD fault described above. In [18], it was shown that this failure occurred due to the false activation of the write current circuitry during read operation especially when operating in environments where radiations are present. Even though that such failure occurs due to fault in the read/write circuitry, it can still be modeled as memory cell array fault as we have previously done for the case for RRD fault. Thus, this failure mode can be modeled as a deceptive read disturb fault (DRDF) of type < 1r1/0/1 > in the core memory cell. Again, the assumption here is that the read operation is short enough for the sense amplifier to recognize the switching, thus the cell’s original value is read. B. Other PCM Specific Faults In the previous section, we discussed faults that unintentionally alter the content of a certain cell (whether addressed on non-addressed) which are normally referred to as disturb faults. In this section, we discuss more
10
general failure modes that can occur in PCMs. Unlike the previous kind of failures, most of this kind of failures normally result in permanent faulty behavior. In the former type of faults, the correct behavior of the cell is restored if proper data is re-written into the cell (i.e. PD fault) or sufficient time is allowed before reading (i.e. RRD fault). However, most faults discussed in this section normally result into a faulty behavior that cannot be corrected by any means (except for Incomplete Program Fault which can be avoided if detected during characterization phase, and program circuitry/algorithm were properly adjusted). The first type of failures in this class can be attributed to the overheating of the GST material during programming (due to high current magnitude). When the generated heat is extremely high during programming, the chalcogenide material might intermix with the adjacent material of the core cell thus destroying its physical characteristics [15]. In such circumstances, the cell will continuously exhibit low resistance value that can no longer be altered. This failure mode is known as Stuck Set (SS) due to the fact that the faulty cell will permanently have a stuck-at-1 value. However, since this type of failure will not occur unless the cell is programmed to the RESET state at least once, thus the failure mode is modeled as a transition fault of type . Another failure mode in this category is called Incomplete Program Fault (IPF). This fault occurs when the cell is undergoing a RESET operation and it result in a less resistive final state than a typical cell [32]. In severe cases, the cell exhibit a resistance value close to a SET state. Even if resistance value is few K-ohms from SET value, such cells pose a reliability concern and might result in an in-field failure. The cause of this failure is the presence of contaminants and other impurities in the active region of the GST material. These impurities results in parallel conductive paths with low resistive characteristics hence reducing the overall resistance of the cell. Another cause for this behavior can be attributed to variation in the crystallization rate of the GST material which is normally circumvented by fast quench time (i.e. program time characteristics) [33]. This failure mode can be modeled as a < ∀/1m > fault which requires a margin read operation to be detected as a stuck-at fault. The final failure mode of PCM cells that is addressed in this discussion is called “Stuck Reset” (SR). Even though this failure more has only been observed during cycling, hence not a manufacturing related defect, we are including it here for the sake of completeness. Such failure mode occurs after extensive cycling, where the GST material of the PCM cell might become separated from its top electrode leading to an open circuit at the contact region [17]. Since such failure occurs only if the cell is programmed to RESET state at least once, then it can be modeled as a transition fault of type . Table II summarizes the names as well as the fault models associated with the various failure modes associated with PCM memories. In the next section, we discuss test algorithms that detect the various PCM faults. IV. E XCITATION CONDITIONS
AND
T EST A LGORITHM
In the previous section, different fault models for various PCM failure modes were discussed. Even though the excitation/detection conditions for every fault can be deduced from its fault primitive, such condition are explicitly given in Table III for each fault. The table specifies name of fault, victim initial state (i.e. faulty cell), sensitizing and, detecting conditions. Note that in the last column, there are three possible options to detect the fault. The first case
11
TABLE II S PECIFIC PCM FAULT PRIMITIVES
Failure Name
Cause
Fault model
Prox. Disturb (PD)
Thermal Coupling
< xw0; 0/1m /− >
Read Rec. Disturb (RRD)
Read Access Timing
< 1w0r0/0/1m >
Read disturb (RD)
Localized heating
< 0r0/1m /0 >
Fault Write (FWR)
Read/Write Circuitry
< 1r1/0/1 >
Stuck Set (SS)
Over heating,
Stuck Reset (SR)
Over heating,
Incomplete Prog. (IPF)
Contaminants
< ∀/1m >
labeled as “Rx” means that fault detection requires performing a read operation expecting a value “x”. Moreover, when symbol “Rxm ” is used, it signifies that the fault can only be detected by a margin “x” read operation. The use of rounded brackets, i.e. “(Rxm )”, signifies an “optional” read operation. In such case, the read operation is not required if margin read is used in fault excitation. However, if a normal read operation is used in fault excitation part, then the read operation (margin read) becomes compulsory. TABLE III E XCITATION AND D ETECTION C ONDITIONS
Fault
Initial State
Excitation
Detection
PD
0
W0neighbor
R0m
RRD
1
W0,R0
(R0m )
RD
0
R0
R0m
FWR
1
R1
R1
SR
0
↑
R1
SS
1
↓
R0
IPF
X
W0
R0m
A. Excitation and Detection Conditions Starting with PD fault, this fault involves two cells, an aggressor cell and a victim cell. The victim cell must be initially in the RESET state (logic 0). In order to excite this fault, a neighboring cell must be programmed to RESET state (labeled as W0neighbor in the table). The final state of the victim cell is changed from RESET to SET and thus detected by R0 operation. On the other hand, fault RRD is excited by writing 0 followed immediately by reading 0 to the same cell. The R0 operation in the excitation sequence is used to detect the fault hence no additional read operation is required. As for RD and FWR faults, both of these faults are excited by a read operation. The difference between these two faults is the initial state of the faulty cell and the fact that RD fault requires a margin read operation to detect this type of fault. Margin read operation uses different reference current value than that used during normal read
12
operation. Only one margin read operation is needed to detect PCM specific faults. This margin read is called RESET margin read (i.e. R0m in Table III). The difference between the normal and margin read is the reference current used to identify the stored information in a cell. Since the faulty cell’s resistance value is marginally different that a typical cell, the reference current should be designed in such a way that it distinguishes between faulty and fault-free cell. For example, a typical RESET cell will conduct a current less than 1µA during a read operation whereas a SET cell conducts current in the range 50-100µA. Thus, a typical read reference current is chosen to be around 20µA. Therefore, for a marginally RESET cell, margin read reference current, called IRST m , can be chosen to be approximately 5µA. Using margin read operation, cells that conduct current in excess of 5µA will be considered as SET where as those that conduct less current are considered as RESET. The characteristics of the margin read reference current with respect to normal read current is graphically depicted in Figure 7. 1111111111111 0000000000000 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111
1111111111111 0000000000000 0000000000000 1111111111111 0000000000000 1111111111111 0000000000000 1111111111111
Fig. 7.
Typical I(SET)
I
ref
I
RST m
Typical I(RESET)
Margin Read Reference Currents
As for SR fault, it can be excited by writing 1 to the cell and is detected by read 1 operation. Moreover, SS fault is excited by writing 0 and detected by read 0 operation. Similarly, IPF is excited by write 0 operation and detected by margin read 0 operation. B. Test Algorithm After considering the various conditions for fault excitation and detection conditions, we developed a March algorithm that can detect PCM specific faults in addition to common faults such as stuck-at (i.e. SA0 and SA1)and address decoder faults (AF). Algorithm March-PCM is given below: March-PCM= {M 0 :m (W 0); M 1 :⇑ (R0m , W 1, R1); M 2 :⇓ (R1, W 0, R0m); M 3 :⇓ (R0m )} The algorithm is given using conventional March test notation. A march test consists of a sequence of march elements where each march element by itself consists of a sequence of operations that are all applied to to a given memory cell before proceeding to the next cell. The way to proceed to the next cell is determined by either increasing ‘⇑’, or decreasing ‘⇓’ address order. An increasing address sequence starts with address 0 and ends with address n − 1 for a memory containing N addressable cells (bits, bytes, or words), whereas a decreasing address sequence starts from n − 1 and terminates at 0. If the addressing sequence can be arbitrary, the symbol ‘m’ is used.
13
Operations on memory cells are ‘w0’ (write value ‘0’), ‘w1’, ‘r1’ (read the cell expecting a ‘1’ value), and ‘r0’. The complete march test is delimited by curly brackets ‘{...}, while a march element is delimited by regular brackets ‘(...)’. Different march elements within a test are separated by semicolons, whereas different operations within a single element are separated by commas [31]. For example, a march element such as ⇑(r0,w1,r1) performs a read ‘0’ followed by write ‘1’ followed by a read ‘1’ operations. If we assume that the cell that underwent the pervious operation was i, then the next cell to be addressed will be the cell i + 1 since the address sequence is designated with ‘⇑’symbol. In order to ease the understanding of the working of the algorithm, we have labeled each march element with a label, i.e. Mx, where x ∈ 1,2,...etc. For example, M1 in algorithm March-PCM refers to the march element ⇑ (R0m , W 1, R1). Moreover, March-PCM algorithm utilizes margin read operations such as “R0m ” which signifies that the read operation uses IRST (described in previous section) as a reference current value. The following is a m proof how algorithm march-PCM detects the various PCM faults described in Table II: PD:
All faults except the fault in the last cell (n-1) are excited by W0 of M0 and are detected by R0m in M1. Fault in cell (n-1) is excited and detected by W0 in M2 and R0m in M3, respectively.
RRD: These faults are excited by element W0R0m of M2 and are detected by R0m of M2 as well. RD:
R0m in M2 excites these faults, and R0m in M3 detects them.
FWR: R1 in M1 and R1 in M2 excites and detects these fault, respectively. IPF:
Such faults are excited by W0 in M2 and detected by R0m in M3.
SS:
Operations W0 and R0 of M2 excites and detects these faults, resepectively.
SR:
These faults are excited by W1 of M1 and detected by R1 in the same march element.
In addition to PCM specific faults mentioned above, March-PCM algorithm detects other common faults such as Stuck-at faults (SAF) and address decoding fault (AF). The detection capabilities and complexity of algorithm March-PCM as well previously developed algorithms are summarized in Table IV. The detection capabilities of all algorithms for common faults such as SAF and AF were verified by the use of RAMSES RAM fault simulator [34]. TABLE IV C OMPARISON BETWEEN M ARCH -PCM AND OTHER M ARCH TESTS
March
SAF
TF
AF
PD
RRD
RD
FWR
IPF
Comp.
Y
100
100
100
0
0
0
100
0
8n
PC
100
100
100
0
0
0
100
0
11n
PCM
100
100
100
100
100
100
100
100
8n
Y*
100
100
100
100
100
100
100
100
8n
PC*
100
100
100
100
100
100
100
100
11n
Note that in Table IV, there are two versions of algorithms March-PC and March Y [22], [31]. The first version (i.e. PC and Y) signifies the originally proposed algorithms. These algorithms do not use margin reads hence unable to detect faults that require margin read operations. We have modified these algorithm to use margin reads
14
(by changing some read operation to margin read) and were included in the table as PC* and Y*. With this modification, the detection capabilities of these algorithms are greatly enhanced. Note that in the table, faults “SR” and “SS” are grouped together into a single column labeled “TF” due to the fact that they can be modeled as “transitional” faults (refer to table III). By comparing March-PCM with March-PC,we can notice both algorithms have the same fault coverage, however, March-PCM has better complexity. Further, March-PCM and March-Y have both, same complexity as well as fault coverage. Note that March-Y algorithm fault coverage is boosted only after modifying it to use margin read operation (hence version March-Y). From this discussion, it is clear that March-PCM algorithm is an efficient algorithm that is capable of detecting all PCM specific as well as common memory faults such as SAFs and AF faults. V. C ONCLUSION In this paper, we have discussed failure modes that are specific to Phase Change Memories or PCMs. We describe the origin of the faulty behaviors and developed appropriate faults models. The developed faults models were used to develop a March algorithm (called March-PCM) that is capable of detecting all these faults in an efficient manner. Moreover, comparison between the proposed algorithm and previous developed algorithm was also discussed. Future work associated with this topic includes the development of an electrical model for defective cells as well as the validation of faulty cell behavior by means of electrical simulation. R EFERENCES [1] J. Maimon, E. Spall, R. Quinn, and S. Schnur, “Chalcogenide-based non-volatile memory technology,” IEEE Proceedings Aerospace Conference, vol. 5, pp. 2289 – 2294, March 2001. [2] C. Lam, “Phase-change memory,” 65th Annual Device Research Conference, pp. 223–226, June 2007. [3] G.Muller, N.Nagel, C.-U. Pinnow, and T.Rohr, “Emerging non-volatile memory technologies,” 29th European Solid-State Circuits Conference, pp. 37 – 44, September 2003. [4] F. Ottogalli, A. Pirovano, F. Pellizzer, M. Tosi, P. Zuliani, P. Bonetalli, and R. Bez, “Phase-Change Memory Technology for Embedded Applications,” 34th European Solid-State Circuits Conference, pp. 293 – 296, September 2004. [5] F. Bedeschi, C. Resta, O. Khouri, E. Buda, L. Costa, M. Ferraro, F.Pellizzer, F. O. A. Pirovano, M. Tosi, R. Bez, R. Gastaldi, and G. Casagrande, “An 8Mb Demonstrator for High-Density 1.8V Phase-Change Memories,” Symposium on VLSI Circuit Digest of Technical Paper, pp. 442–445, June 2004. [6] A.Pirovano, A. Lacaita, A. Benvenuti, F. Pellizzer, S.Hudgens, and R. Bez, “Scaling analysis of phase-change memory technology,” IEEE International Electron Devices Meeting Technical Digest, pp. 29.6.1 – 29.6.4, December 2003. [7] M.Gill, T.Lowrey, and J.Park, “Ovonic unified memory - a high-performance nonvolatile memory technology for stand-alone memory and embedded applications,” IEEE International Solid-State Circuits Conference Digest of Technical Papers, vol. 1, pp. 202 – 459, February 2002. [8] F. Bedeschi, R. Bez, C. Boffino, C. Bonizzoni, E. Buda, G. Casagrande, L. Costa, M. Ferraro, R. Gastaldi, O. Khouri, F. Ottogalli, F.Pellizzer, A. Pirovano, C. Resta, G. Torelli, and M. Tosi, “4-Mb MOSFET-selected phase-change memory experimental chip,” 30th European Solid-State Circuits Conference, pp. 207 – 210, September 2004. [9] D. Salamon and B. F. Cockburn., “An electrical simulation model for the chalcogenide phase-change memory cell,” International Workshop on Memory Technology, Design and Testing, pp. 86 – 91, July 2003. [10] X. Wei, L. Shi, W. Rajan, R. Zhao, B. Quek, X. Miao, and T. Chong, “Universal HSPICE Model for Chalcogenide Based Phase Change Memory Elements,” Non-Volatile Memory Technology Symposium, pp. 88–91, November 2004.
15
[11] Y. Hwang, J. Hong, S. Lee, S.J.Ahn, G. Jeong, G. Koh, H. Kim, W. Jeong, S. Lee, J. Park, K. Ryoo, H. H. Y. Ha, J. Yi, W. Cho, Y. Kim, K.H.Lee, S. Joo, S. Park, U. Jeong, H. Jeong, and K. Kim, “Phase-change chalcogenide nonvolatile RAM completely based on CMOS technology,” International Symposium on VLSI Technology, System and Applications, pp. 29 – 31, October 2003. [12] S. Ramaswamy, K. Hunt, D. Maimon, B. Li, A. Bumgarner, J. Rodgers, and L. Burcin, “Progress on Design and Demonstration of the 4Mb Chalcogenide-based Random Access Memory,” Non-Volatile Memory Technology Symposium, pp. 137 –142, November 2004. [13] K. Osada, T. Kawahara, R. Takemura, N. Kitai, N. Takaura, N. Matsuzaki, K. Kurotsuchi, H. Moriya, and M. Moniwa, “Phase Change RAM Operated with 1.5-V CMOS as Low Cost Embedded Memory,” IEEE 2005 Custom Integrated Circuits Conference, pp. 431 – 434, September 2005. [14] Y. Ha, J. Yi, H. H. J. Park, S. Joo, S. Park, U.-I. Chung, and J. Moon, “An edge contact type cell for Phase Change RAM featuring very low power consumption,” Symposium on VLSI Technology Digest of Technical Papers, pp. 175 – 176, June 2003. [15] A. Pirovano, A. Redaelli, F. Pellizzer, F. Ottogalli, M. Tosi, D. Ielmini, A. L. Lacaita, and R. Bez, “Reliability Study of Phase-Change Non-Volatile Memories,” IEEE Transactions on Device and Materials Reliability, vol. 4, no. 3, pp. 422 – 427, September 2004. [16] A. L. Lacaita, A. Redaelli, D. Ielmini, M. Tosi, F. Pellizzer, A. Provano, and R. Bez, “Programming and Disturb Characteristics in Non Volatile Phase Change Memories,” IEEE Non-Volatile Semiconductor Memory Workshop, pp. 26–27, Aug 2004. [17] S.Lai, “Current status of the phase change memory and its future,” IEEE International Electron Devices Meeting Technical Digest., pp. 10.1.1–10.1.4, December 2003. [18] J. D. Maimon, K. K. Hunt, L. Burcin, and J. Rodgers, “Chalcogenide Memory Array: Characterization and Radiation Effects,” IEEE Transactions on Nuclear Science, vol. 50, no. 6, pp. 1878–1884, December 2003. [19] A. Pirovano, A. Lacaita, F. Pellizzer, S. Kostylev, A. Benvenuti, and R. Bez, “Low-Field Amprphous State Resistance and Threshold Voltage Drift in Chalcogenide Materials,” IEEE Transactions on Electron Devices, vol. 51, no. 5, pp. 714–719, May 2004. [20] U. Russo, D. Ielmini, A. Redaelli, and A. Lacaita, “Intrinsic Data Retention in Nanoscaled Phase Change Memories-Part I: Monte Carlo Model For Crystalization and Percolation,” iEEE Transactions on Electron Devices, vol. 53, no. 12, pp. 3032 – 3039, December 2006. [21] D. Ielmini, A. Lacaita, and D. Mantegazza, “Recovery and drift dynamics of resistance and threshold voltages in phase-change memories,” IEEE Transactions on Electron Devices, vol. 54, no. 2, pp. 308 – 315, February 2007. [22] M. G. Mohammad, L. Terkawi, and M. Albasman, “Phase Change Memory Faults,” 19th International Conference on VLSI Design, pp. 6–11, January 2006. [23] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuit, 2nd ed. Kluwer Academic Publishers, 2004. [24] L. Shi, T. Chong, J. Li, D. Koh, R. Zhao, H. Yang, P. Tan, X. Wei, , and W. Song, “Thermal Modeling and Simulation of Nonvolatile and Non-rotating Phase Change Memory Cell,” Non-Volatile Memory Technology Symposium, pp. 83 – 87, October 2004. [25] D. mantegazza, D. Ielmini, A. Pirovano, and A. Lacaita, “Anomalous Cells With Low Reset Resistance in Phase-Change Memory Arrays,” IEEE Electron Device Letters, pp. 865–867, October 2007. [26] M. G. Mohammad and K. K. Saluja, “Optimizing Program Disturb Faults Test Using Defect-Based Testing,” Transactions on ComputerAided Design of Integrated Circuits and Systems, vol. 24, no. 6, pp. 905 – 915, June 2005. [27] M. G. Mohammad and L. Al-Terkawi, “Techniques for Disturb Fault Collapsing,” Journal of Electronic Testing: Theory and Applications, vol. 23, no. 4, pp. 263 – 268, August 2007. [28] A. Lacaita and D. Ielmini, “Status and challenges of PCM modeling,” 37th European Solid State Device Research Conference, pp. 214–221, September 2007. [29] U. Russo, D. Ielmini, A. Redaelli, and A. Lacaita, “Modeling of Programming and Read Performance in Phase-Change Memories-Part II: Program Disturb and Mixed-Scaling Approach,” iEEE Transactions on Electron Devices, vol. 55, no. 2, pp. 515 – 522, February 2008. [30] A. van de Goor and Z. Al-Ars, “Functional memory faults: a formal notation and a Taxonomy,” Proceedings of the 18th IEEE VLSI Test Symposium, pp. 281 – 289, April 2000. [31] A. van de Goor, Testing Semiconductor Memories: Theory and Practice.
ComTex Publishing, 1998.
[32] D. mantegazza, D. Ielmini, A. Pirovano, B. Gleixner, A. Lacaita, E. Veresi, F. Pellizzer, and R. Bez, “Electrical Characterization of Anomalous Cells in Phase Change Memory Arrays,” International Electron Devices Meeting, pp. 1–4, December 2006. [33] A. Itri, D. Ielmini, A. Lacaita, F. Pellizzer, A. Pirovano, and R. Bez, “Analysis and phase-transformation dynamics and estimation of
16
amorphous-chalcogenide fraction in phase-change memories,” 42nd International Reliability Physics Symposium, pp. 209–215, April 2004. [34] K.-L. Cheng, J.-C. Yeh, C.-W. Wang, C.-T. Huang, and C.-W. Wu, “RAMSES-FT: A Fault simulator for Flash Memory Testing and Diagnostics,” VLSI Test Symposium, pp. 281–286, 2002.