This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS
Modeling and Optimization of Memristor and STT-RAM-Based Memory for Low-Power Applications
Yasmin Halawani, Member, IEEE, Baker Mohammad, Senior Member, IEEE, Dirar Homouz, Mahmoud Al-Qutayri, Senior Member, IEEE, and Hani Saleh
Abstract— Conventional charge-based memory usage in low-power applications is facing major challenges. Some of these challenges are leakage current for static random access memory (SRAM) and dynamic random access memory (DRAM), additional refresh operation for DRAM, and high programming voltage for Flash. In this paper, two emerging resistive random access memory (ReRAM) technologies are investigated, memristor and spin-transfer torque (STT)-RAM, as potential universal memory candidates to replace traditional ones. Both of these nonvolatile memories support zero leakage and low-voltage operation during read access, which makes them ideal for devices with long sleep time. To date, high write energy for both memristor and STT-RAM is one of the major inhibitors for adopting the technologies. The primary contribution of this paper is centered on addressing the high write energy issue by trading off retention time with noise margin. In doing so, the memristor and STT-RAM power has been compared with the traditional six-transistor SRAM-based memory power and potential application in wireless sensor nodes is explored. This paper uses 45-nm foundry process technology data for SRAM and physics-based mathematical models derived from real devices for memristor and STT-RAM. The simulations are conducted using MATLAB and the results show a potential power savings of 87% and 77% when using memristor and STT-RAM, respectively, at 1% duty cycle.
Index Terms— Duty cycle, embedded memory, low energy, low power, memristor, spin-transfer torque (STT)-RAM, wireless sensor node (WSN).
I. INTRODUCTION
Battery life is a major concern in portable electronics with low duty cycle such as wireless sensor nodes (WSNs). These nodes are used in a wide range of applications including structural, volcanic, and health monitoring fields. In such applications, the node is expected to interact with the environment and perform its intended functions throughout its lifetime without recharging or replacing the batteries. Battery technology improvement at a rate of about 7%
Manuscript received August 31, 2014; revised January 20, 2015 and April 7, 2015; accepted May 14, 2015. Y. Halawani, B. Mohammad, H. Saleh, and M. Al-Qutayri are with the Department of Electrical and Electronics Engineering, Khalifa University, Abu Dhabi 127788, UAE (e-mail:
[email protected];
[email protected];
[email protected], hani.saleh@ kustar.ac.ae). D. Homouz is with the Department of Applied Mathematics and Science, Khalifa University, Abu Dhabi 127788, UAE (e-mail:
[email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVLSI.2015.2440392
Fig. 1. Utilization of logic and memory for a typical SoC shows memory usage up to 70% of the die area by 2017 [6].
a year [1] is not sufficient to keep up with the energy density required by WSNs. The growing interest in maximizing the operational lifetime of such battery-powered devices has led to various tradeoffs between communication and computation with little improvement on the memory side [2], [3]. Embedded memory is known to have a positive impact on both performance and power by reducing bus transactions and bridging the gap between processor and main memory frequency [4], [5]. On-chip memory in embedded systems occupies more than 50% of the total chip area and this trend continues to grow, as observed in Fig. 1 [6]. SRAM that uses a six-transistor (6T) cell has been the preferred choice for most embedded memories due to its fast access time. However, SRAM sensitivity to process variation, due to ratio logic for both read and write operations, limits voltage and power scaling [7]. Conventional on-chip memory, which uses SRAM, consumes about 40%–50% of the system’s power [8], [9]. A significant portion of this power (about 50%) is wasted as leakage [10], [11]. On the other hand, existing Flash and DRAM technologies have limitations that prevent them from being considered to replace SRAM. Flash requires a high programming voltage and adds cost to the chip manufacturing as it needs six additional masks. Regular DRAM memories require an additional periodic refresh operation. Embedded DRAM has been proposed but only for high-performance chips like the ones targeted for servers, where the capacity and
1063-8210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
TABLE I COMPARISON OF DIFFERENT CONVENTIONAL TECHNOLOGIES [11]
price can tolerate the additional cost and power associated with DRAM (refresh cycle and trench capacitor). Embedded Flash technologies lag advanced logic by two to three generations, so they are rarely used for advanced system-on-chip (SoC) designs [12]. Thus, the research trend is focusing more on noncharge-based memory technologies that combine the speed of SRAM, the density of DRAM, and the nonvolatility of Flash. Different adaptive oxide materials have been proposed to achieve high device density with low power consumption [14]. The internal state of such materials (e.g., resistivity, polarization, and magnetization) can be electrically controlled to realize logic 1 and logic 0. Emerging nonvolatile memory technologies include ferroelectric RAM (FeRAM), phase-change memory (PCM), memristor, and spin-transfer torque (STT)-RAM [15]. PCM is temperature sensitive and suffers from high write energy [16]. FeRAM has low write endurance and low compatibility with CMOS technology [17]. Memristor and STT-RAM are both nonvolatile memory technologies with zero leakage current, high endurance, fast write time, and small cell size. They both have desirable characteristics that make them potential candidates for a universal memory. Using such nonvolatile technologies to implement an embedded storage subsystem will help reduce the power dissipation in the communication bus and provide years of storage with zero standby current and, hence, minimize the overall energy consumption of the system. In addition, memristors and STT-RAMs have much smaller areas compared with SRAM [18], which results in higher memory density per area. Hence, they were considered in this paper. Table I shows an endurance and write performance comparison between the different memory technologies discussed above. One of the main challenges facing memristors and STT-RAMs is the relatively high energy required during write operation.
Many researchers are working at the device and material level to address this limitation [19]–[21]. This paper focuses on design-parameter optimization to mitigate high write energy. Detailed analysis of the device model is also discussed. We explored the effects of write energy, retention time, and noise margin on device behavior. We also show that these parameters can be tuned to provide an energy-efficient and robust memory subsystem. The rest of this paper is organized as follows. Section II presents an analysis of memristor device modeling, timing, and power. STT-RAM modeling and design parameters are discussed in Section III. Section IV presents simulation methodologies together with the assumptions used in power calculations. Section V concludes this paper and presents future work.
Fig. 2. Four fundamental passive elements [22].
II. MEMRISTOR DEVICE MODELING AND ANALYSIS
This section highlights the physical mechanisms and models of memristor devices.
A. Memristor Theory and Device
A memristor is a resistor with memory. It was postulated by Chua in 1971 but was not realized until 2008, by Hewlett Packard (HP) researchers [21]. The memristor is considered by many as the fourth fundamental passive element, as shown in Fig. 2 [22]. It relates flux to charge as in
$$M(q) = \frac{d\varphi}{dq} = \frac{d\varphi/dt}{dq/dt}. \tag{1}$$
The dynamical electrical characteristics of the memristor depend on the history of the current passing through it and on the current–voltage bias, so it has to be modeled mathematically using two equations. The first relates the voltage applied across the device to the current passing through it (2), while the second describes an intrinsic property called the state variable and how it evolves with time (3):
$$v = M(x)\,i \tag{2}$$
$$\frac{dx}{dt} = f(i). \tag{3}$$
Different theories try to explain the underlying physical phenomenon of a memristor. One such theory relies on doping and ionic mobility. This is the mechanism used by HP. The HP memristor consists of a thin titanium oxide film structured
Fig. 3. Cross section of a Pt\TiO2\Pt memristor [23].
Fig. 4. Representation of the noise margin levels of the memristor.
as Pt\TiO2\Pt with the TiO2 divided into two regions as shown in Fig. 3: a region highly doped with oxygen vacancies, TiO2−x, which acts as a conductor with low resistance, and an undoped TiO2 region, which acts as an insulator. The HP memristor exhibits bipolar switching characteristics, and a constant concentration of the vacancies is assumed across the device. When a positive voltage is applied, oxygen vacancies are pushed to the undoped region, resulting in a metal–insulator transition and turning the device ON. Reversing the polarity of the applied voltage turns the device OFF. Moving the boundary between the doped and undoped regions changes the state variable, giving rise to different resistance states. When no voltage is applied, the oxide thin film is capable of retaining the last resistance state it had. A second hypothesis explains memristive behavior as a phase transition phenomenon rather than a doping-driven one. Niobium dioxide was shown to be a Mott insulator at room temperature with a narrow conducting channel [24]. The material heats up once a voltage is applied and current starts passing through it. With this Joule heating, the material goes through a phase transition and becomes metallic, with a jump of about four orders of magnitude in conductivity. The phenomenon gives rise to negative differential resistance. Multilevel switching can be achieved by continuously modifying the position of the state variable. In this paper, the TiO2 memristor is considered.
B. Memristor Models
Different models exist in [23] to explain the dynamical behavior of the memristor.
1) Linear [25], [26]: The linear model represents the memristor in its simplest form. It assumes a simplified two-resistance region that comprises the memristance, as shown in (4) and (5). This model assumes a linear relation between the applied electric field and the drift velocity (which is not accurate).
In addition, the model does not take into account the extreme physical boundaries of the device (x = 0 and x = 1)
$$M(x) = R_{ON}\,x + R_{OFF}\,(1 - x) \tag{4}$$
$$\dot{x}(i) = \frac{\mu R_{ON}}{D^{2}}\,i(t) = k\,i(t). \tag{5}$$
2) Nonlinear [27]: The nonlinear model introduces a window function f(x) in (6) that overcomes the boundary problem in the previous model and limits the state variable between zero and unity. Yet, the nonlinear dependence of the drift velocity on the electric field is not addressed by this model
$$\dot{x}(t) = k\,i(t)\,f(x). \tag{6}$$
3) Exponential: The exponential model captures the nonlinearity of the electric field seen within the memristor, and it is the one that most closely matches the published data for write time [28]; hence, it is used in this paper to simulate the dynamical behavior of the memristor in MATLAB. Its drift velocity, or the change in state variable, is given by
$$\frac{dx}{dt} = a\,\sinh(bv)\,f(x, i). \tag{7}$$
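The exponential model in (7) can be explored numerically with a short sketch. The following Python fragment is illustrative only and is not the MATLAB code used in this paper: the window function `window`, the constants `a`, `b`, `r_on`, and `r_off`, and the function name `write_time_and_energy` are all assumptions introduced here. At a constant write voltage the model is autonomous in x, so the switching time is evaluated as the integral of dx divided by the drift rate, and the write energy as the time integral of v·i with the memristance taken from (4).

```python
import math

def window(x, p=1):
    """Hypothetical window f(x): zero at x = 0 and x = 1, one at x = 0.5."""
    return 1.0 - (2.0 * x - 1.0) ** (2 * p)

def write_time_and_energy(x_from, x_to, v, a=1.0e6, b=5.0,
                          r_on=1.0e3, r_off=1.0e5, n=2000):
    """Time (s) and energy (J) to move the state variable between two values.

    Assumes dx/dt = a*sinh(b*v)*f(x) at constant |v| (all constants are
    placeholder values, not fitted device data). Both integrals are
    evaluated over x with the midpoint rule using n slices.
    """
    lo, hi = sorted((x_from, x_to))
    dx = (hi - lo) / n
    t = 0.0
    energy = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        rate = a * math.sinh(b * abs(v)) * window(x)  # |dx/dt| in 1/s
        m = r_on * x + r_off * (1.0 - x)              # memristance as in (4)
        dt = dx / rate                                # time spent in this slice
        t += dt
        energy += (v * v / m) * dt                    # v*i integrated over dt
    return t, energy

# Narrower state-variable swings switch faster and cost less energy.
for x_hi, x_lo in [(0.95, 0.05), (0.8, 0.2), (0.6, 0.4)]:
    t, e = write_time_and_energy(x_hi, x_lo, v=1.0)
    print(f"x: {x_hi} -> {x_lo}  time = {t:.3e} s  energy = {e:.3e} J")
```

With these placeholder constants the absolute numbers carry no physical meaning; only the trends (shorter time at higher voltage, shorter time and lower energy for a smaller swing) follow from the form of (7).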
C. Memory Design With Memristor Analysis and Optimization
The tradeoffs between switching time, retention time, and resistance ratio will be explored for low writing energy. Write/switching time is the time required by the electric field to push the oxygen vacancies between two different values of the state variable x. The retention time of a memristor is the time taken by the oxygen vacancies to diffuse back to their stable state in the absence of an applied field and is given by [29]
$$\text{Retention Time} = (\text{Write Time})\,\frac{L}{2a}\exp\left(\frac{AE}{E_o}\right) \tag{8}$$
where L is the length of the device in nanometers and a is the periodicity of the potential wells given in nanometers. The internal field within the memristor is higher than the external by a certain parameter A, which is found to be around 28 for a 10-year retention time at 1 V. E_o is the characteristic field calculated at room temperature (300 K) and is given by
$$E_o = \frac{k_B T}{2qa} \tag{9}$$
TABLE II EXPONENTIAL MODEL SIMULATION RESULTS. (a) COLUMN 1 SHOWS THAT FASTER SWITCHING TIME IS ACHIEVED AT HIGHER VOLTAGES. (b) COLUMN 2 REPRESENTS THE WRITING AND RETENTION TIME AS A FUNCTION OF VOLTAGE. (c) COLUMN 3 SHOWS THE ENERGY CONSUMED PER BIT AT DIFFERENT VOLTAGE LEVELS
with k_B being the Boltzmann constant (in joules per kelvin), T the temperature of the system, and q the electric charge. According to the Einstein–Nernst formula for mobility, the drift velocity is about three orders of magnitude higher than the diffusion velocity. This means that if switching occurs very fast, then the retention time is very short and vice versa. However, in nanoscale devices and at high electric fields, the drift velocity of the vacancies is exponentially dependent on the field and is given by
$$\text{velocity} = \mu E_o \exp\left(\frac{E}{E_o}\right). \tag{10}$$
Fig. 4 shows the corresponding levels for writing logic 1 and logic 0. Faster switching can be obtained by reducing the noise margin (high to low resistance ratio). The model used does not capture the asymmetry between writing the two logic values as seen in the experimental results. Three different scenarios were generated and analyzed using MATLAB (see the Appendix). The state variable x in the range [0.05–0.4] corresponds to logic 1, while the range [0.6–0.95] represents logic 0. A safety margin is kept from 0.4 to 0.6. This gives the following two extreme cases and an intermediate one.
1) Case 1: State variable changes between 0.95 and 0.05.
2) Case 2: State variable changes between 0.8 and 0.2.
3) Case 3: State variable changes between 0.6 and 0.4.
The original device parameters before optimization give a 10-year retention time at 1 V with 55-ns switching time and 0.46-pJ writing energy. It can be deduced from the first column of Table II that higher writing speeds can be achieved by reducing the noise margin. Although power increases as voltage increases, the energy consumed decreases since faster switching is obtained, and hence, the area under the current curve is smaller. Case 3 resulted in a write energy of about 20 fJ with an acceptable noise margin ratio of ∼10x, a retention time of around two years, and a 10-ns switching time, all obtained at 1 V. Case 2 resulted in a writing energy of 0.1 pJ. This value was used to carry out the power analysis in Section IV. The associated energies for switching were computed by integrating the V–I product curve during the write time. In the literature, asymmetric switching has been observed, with the ON-switching occurring faster than the OFF-switching [25], [28], [30]. This is because drift and diffusion are either in the same direction, which makes it easier to switch, or in opposite directions, where more time is required. The exponential model does not capture this asymmetry. Retention time in the ON case was not studied here since the ON-state is assumed to be the stable state to which the oxygen vacancies eventually diffuse. The resistance ratio or noise margin is calculated as Rhigh divided by Rlow at the instant where x reaches a value slightly greater than 0.6 and then slightly lower than 0.4 during switching. The noise margin values for the different cases are given in Table III.
TABLE III RESISTANCE RATIO/NOISE MARGIN FOR OFF-SWITCHING AT 1 V
III. STT-RAM MODELING AND CHARACTERISTICS
STT-RAM is a more mature technology compared with memristor. Everspin is expected to commercialize its first STT-RAM in the near future [31]. The interaction between electric currents and the local magnetization in ferromagnetic (FM) materials gives rise to spin-transfer torque. In STT-RAM, the magnetic state is electrically controlled by transferring the spin momentum. The key building block for STT-RAM is the trilayer magnetic tunnel junction (MTJ). It consists of two FM layers with a thin oxide layer sandwiched between them. Logic is realized by the relative direction of the magnetization in the FM layers. Fig. 5 shows the conventional MTJ structure.
Fig. 5. MTJ structure (a) parallel and (b) antiparallel configurations.
Different MTJ writing (magnetization switching) schemes have been implemented for different generations of magnetic RAM [32].
1) Field-Induced Magnetic Switching (FIMS): High power is dissipated and a large nMOS width is required to account for the large currents, in the milliampere region, moving in-plane to the MTJ.
2) Thermally Assisted Magnetic Switching: It requires current in the milliampere region.
3) Spin Transfer Torque (STT): It is observed in MTJs with width less than 100 nm. Unlike FIMS, the current is perpendicular to plane (CPP), as in the z-direction. It requires a small nMOS transistor width as the current is in the range of hundreds of μA.
Hence, in this paper, the focus is on the STT mechanism due to its low power consumption. This is in line with the need for low power in portable devices.
A. MTJ Models
STT-RAM still suffers from high programming current compared with SRAM. Switching behavior of STT-RAM is highly dependent on the dynamic behavior and historical status of the write current applied to the MTJ. Therefore, it is very important to model the switching phenomenon accurately to determine the appropriate write voltage and pulsewidth. To ensure accurate modeling of reading and writing cycles, several parameters need to be taken into consideration [33]:
1) bias voltage dependency;
2) asymmetric switching from parallel to antiparallel and vice versa;
3) temperature dependency;
4) current pulsewidth dependency;
5) probability of switching;
6) tunnel magnetoresistance (TMR) effect.
1) Static Modeling: In this type of modeling, the static response of the MTJ is expressed by the low- and high-resistance states varying with the voltage and by critical current calculations. In [34], MTJ behavioral modeling is based on three sets of equations.
1) The MTJ simplified resistance model is taken from the Brinkman MTJ conductance model. It takes into consideration the voltage dependency and the height of the oxide barrier used and is given by
$$R(0) = \frac{t_{ox}}{F\,\varphi^{1/2}\,\text{surface}}\exp\left(1.025\,t_{ox}\,\varphi^{1/2}\right) \tag{11}$$
$$R(V) = \frac{R(0)}{1 + \left(\dfrac{t_{ox}\,e\,V}{2\hbar}\right)^{2}\dfrac{m}{\varphi}} \tag{12}$$
where t_ox is the thickness of the oxide layer, F is the factor calculated from the resistance–area product, ϕ is the height barrier of the oxide, ħ is the reduced Planck constant, m is the mass of the electron, e is the electron charge, and V is the voltage. In [33], the conductance is calculated through the Julliere conductance model with temperature dependence, as a function of the angle θ of the free-layer magnetization with respect to the reference layer, as
$$G(\theta) = G_T\left[1 + P^{2}\cos\theta\right] + G_{SI} \tag{13}$$
where G_T = (CT/sin(CT)) G_0, with G_0 the conductance at zero temperature, P is the spin polarization of the material, and G_SI represents the inelastic tunneling conductance. Materials with high polarization give a resistance ratio that can reach 2x.
2) The STT critical current switching model was proposed by Slonczewski. It explains how to obtain the intrinsic critical current for the switching behavior of the MTJ such that
$$J_{co} = \frac{2e\alpha\mu_o M_s d}{\hbar\,g_{sv}}\left(H_{ext} \pm H_{ani} \pm \frac{H_d}{2}\right) \tag{14}$$
$$g_{sv} = \left[-4 + \left(P^{-1/2} + P^{1/2}\right)^{3}\frac{3 + \cos\theta}{4}\right]^{-1} \tag{15}$$
$$I_{co} = J_{co} \times \text{Area} \tag{16}$$
where J_co is the critical current density, α is the damping factor, μ_o is the permeability of free space, d is the thickness of the free layer, g_sv is the spin polarization efficiency, and I_co is the critical current. In [35], a total efficiency that closely matches experimental results was considered; it is calculated by g_tunnel = (P/2)/(1 + P² cos θ) and accounts for the efficiency of the tunnel junction itself. For the switching to take place, the current flowing through the MTJ must be greater than the intrinsic critical current. The direction of the current determines whether the resistance state is being switched from parallel to antiparallel or vice versa. An MTJ can typically operate in three switching modes. Depending on the switching time required, the current density can be determined. The current density equations as a function of device parameters and pulsewidth are as follows [36]:
$$J_c^{PRECESSIONAL} = J_{co} + \frac{C\ln(\pi/2\theta)}{\tau_{pw}} \tag{17}$$
$$J_c^{DYNAMIC} = \frac{J_c^{THERMAL} + J_c^{PRECESSIONAL}\exp(-A(\tau_{pw} - T))}{1 + \exp(-A(\tau_{pw} - T))} \tag{18}$$
$$J_c^{THERMAL} = J_{co}\left[1 - \frac{1}{\Delta}\ln\left(\frac{\tau_{pw}}{\tau_o}\right)\right] \tag{19}$$
where τpw > 20 ns in the case of thermal activation switching, 10 ns > τpw > 30 ns for dynamical reversal switching, and τpw < 3 ns for precessional switching. A, C, and T are the fitting parameters. Δ is the thermal stability of the magnetization.
3) The TMR effect bias-voltage dependence model explains the relation between TMR and the bias voltage between the electrodes. The TMR ratio represents the distinction between high- and low-resistance states, and it is considered a key factor for the sensing mechanism
$$TMR_{real} = \frac{TMR(0)}{1 + \dfrac{V_{bias}^{2}}{V_h^{2}}} \tag{20}$$
where TMR(0) is the TMR at zero bias voltage and V_h is the bias voltage at which TMR_real equals half of TMR(0).
2) Dynamic Modeling: Switching dynamics of the free layer are explained by the Landau–Lifshitz–Gilbert (LLG) equation with the additional STT term. A key point in the following LLG equation is that it assumes a constant magnetization vector [37]:
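$$\frac{dm}{dt} = -\gamma_o\left[m \times H_{eff}\right] + \alpha\left[m \times \frac{dm}{dt}\right] - \gamma_o\,\frac{I_s}{g V M_s^{2}}\,\frac{\hbar}{2e}\left[m \times (m \times \hat{e}_p)\right]. \tag{21}$$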
The first term in (21) corresponds to the precession of the magnetization vector around the effective field. Damping, which is represented by the second term, aligns the magnetization vector with the effective field, where the minimum energy exists. On the other hand, the additional STT term either strengthens or weakens the damping effect depending on the polarity of the applied switching current. The effective field is a combination of uniaxial crystalline anisotropy, shape anisotropy, and an external applied field that can be due to a magnetic field intentionally applied to the MTJ or due to stray fields from nearby MTJs through dipolar coupling [38]
$$H_{eff} = H_{anis} + H_{ext} + H_{demag} + H_{stt}. \tag{22}$$
Micromagnetic software based on the LLG equation exists, such as the Object Oriented MicroMagnetic Framework (OOMMF), where switching current and time can be accurately estimated by varying several parameters, among them the pulsewidth [39]. Others reduced the 3-D equation to 1-D and modeled it using SPICE [40]. In [41], however, the 3-D LLG equation was modeled in SPICE. In [37] and [42], behavioral Verilog-A code has been implemented. The free layer is treated as a single macrospin. This is because the atoms of the material are strongly coupled together via exchange coupling. Hence, the magnetic moments tend to move together at such small scales, and the variations between different moments are ignored. Initially, a nonzero angle is required to have torque acting on the magnetization vector. Thermal fluctuation assists in achieving the initial nonzero angle. Waiting for that to occur results in an incubation delay that can take up to a few nanoseconds [43]. The MTJ is similar to the memristor in that it is asymmetric in nature.
3) STT-RAM Design Space: The experiment was performed to calculate and compare the power consumed by the STT-RAM with that consumed by both SRAM and memristor. The LLG equation (21) was modeled and simulated in MATLAB [37]. First, the LLG equation is expanded in the three Cartesian coordinates (x, y, and z) and then converted to spherical coordinates as functions of theta and phi. The evolution of the free-layer magnetization vector with time is a function of theta and phi. Theta is the angle between the magnetization vector M and the z-axis, while phi is the angle between the projection of the magnetization vector M in the xy-plane and the easy axis x. The simple Euler method was used to numerically integrate the field components in the theta and phi directions. The model used does not capture the asymmetric switching between writing logic 1 and logic 0. Switching from P → AP corresponds to writing logic 1. Table IV summarizes some of the main parameters used in the simulation, where α is the LLG damping factor, P is the unitless polarization factor, Ms is the magnetization saturation of the FM material, and the remaining entries are the length, width, and thickness of the free layer.
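The macrospin integration described above can be illustrated with a minimal sketch. The Python fragment below is not the MATLAB model of this paper, and it makes several simplifying assumptions that should be read as such: it integrates in Cartesian form with renormalization after each Euler step (instead of the spherical theta/phi form), it uses dimensionless units (fields in units of the anisotropy field, time in units of 1/(γ_o H_k)), the effective field contains only a uniaxial anisotropy term along the easy axis x, and the STT prefactor is lumped into a single hypothetical parameter `a_j`. The function name `simulate_macrospin` and all default values are introduced here for illustration.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def simulate_macrospin(a_j, alpha=0.05, tilt_deg=1.0, dt=0.01, steps=20000):
    """Forward-Euler integration of a normalized macrospin LLG equation.

    a_j is a lumped, normalized STT strength (assumed proportional to the
    write current). The polarizer direction p = (-1, 0, 0) favours the -x
    state, and the magnetization starts slightly tilted away from +x.
    Returns the final unit magnetization vector.
    """
    tilt = math.radians(tilt_deg)
    m = (math.cos(tilt), math.sin(tilt), 0.0)   # small initial angle from +x
    p = (-1.0, 0.0, 0.0)
    for _ in range(steps):
        h = (m[0], 0.0, 0.0)                    # uniaxial anisotropy field only
        mxh = cross(m, h)
        mxmxh = cross(m, mxh)
        mxp = cross(m, p)
        mxmxp = cross(m, mxp)
        # Explicit form obtained by solving the implicit LLG equation for dm/dt:
        # precession, damping, STT, and the small field-like cross term.
        dm = tuple((-mxh[i] - alpha * mxmxh[i]
                    - a_j * mxmxp[i] + alpha * a_j * mxp[i]) / (1.0 + alpha * alpha)
                   for i in range(3))
        m = tuple(m[i] + dt * dm[i] for i in range(3))
        norm = math.sqrt(sum(c * c for c in m))
        m = tuple(c / norm for c in m)          # keep |m| = 1 after each step
    return m

print("no current:   m =", simulate_macrospin(a_j=0.0))
print("with current: m =", simulate_macrospin(a_j=0.2))
```

In this normalized picture, switching requires the STT strength to exceed roughly the damping factor times the anisotropy field; below that the damping term pulls the magnetization back to its initial easy-axis direction.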
TABLE IV MTJ SIMULATION PARAMETERS
TABLE V ELECTRICAL SIMULATION RESULTS
Fig. 6. Precession, switching, and damping for a magnetization vector.
Fig. 7. Magnetization switching from P → AP.
TABLE VI THERMAL STABILITY FOR DIFFERENT RETENTION TIMES [45]
Initially, theta cannot be 90° because it will cause the torque to be zero, and thus, magnetization switching will not take place. Therefore, it is assumed that it equals 89° initially, which is reasonable since thermal fluctuations cause the magnetization vector not to be fully aligned with the easy axis. The easy axis in the model was assumed to be the x-axis (1, 0, 0). The electrical specifications of the simulation results are shown in Table V. If a spin-polarized current of −600 μA is moving through the MTJ, it will cause a voltage drop of −0.64 V across the junction and a resistance ratio of about 1.7x. Switching of the magnetization vector from the +x-axis to the −x-axis is shown in Fig. 6. Fig. 7 shows how the resistance is affected when switching from the parallel to the antiparallel configuration. According to [44] and [45], retention time is traded for better performance. Following from (14), the retention time of STT-RAM is exponentially dependent on the thermal stability. Table VI shows some values for cell retention time versus its thermal stability
$$\text{Retention Time} = \tau_o \exp(\Delta) \tag{23}$$
where τ_o = 1/f_o is the thermal attempt time, which for storage purposes is about 1 ns, and Δ represents the thermal stability and is given by
$$\Delta = \frac{H_k M_s A_r t}{2 k_B T} \tag{24}$$
where k_B is the Boltzmann constant, H_k is the effective field including magnetocrystalline anisotropy and shape anisotropy, M_s is the magnetization saturation of the material used, A_r is the area of the MTJ, t is the thickness of the free layer, and T is the temperature in kelvin. The area of the MTJ or the thickness of the free layer can be decreased to reduce the thermal stability. In addition, H_k and M_s can be tuned. For caches, the retention time can be decreased to the order of seconds without the addition of any overhead in the refresh policy. But this will affect the thermal stability of the cell and make it more vulnerable to thermal fluctuations. In addition, for systems with long sleep time and low frequency, an additional refresh operation to hold the data might be needed. Hence, it was not considered in our study. General differences between both technologies are summarized in Table VII in terms of their models, the resistance ratio that is a key enabler in the reading operation, the maturity of the device, which gives insight into when it can be expected in the market, the analog or digital process that controls the device, and whether it has a complex or simple structure.
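The exponential relation between retention time and thermal stability can be made concrete with a short calculation. The sketch below is illustrative only: it assumes the relation Retention Time = τ_o exp(Δ) with the attempt time of about 1 ns stated in the text, and the function names `retention_time` and `required_stability` are introduced here, not taken from the paper.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600

def retention_time(delta, tau_o=1e-9):
    """Retention time in seconds for a thermal stability factor delta."""
    return tau_o * math.exp(delta)

def required_stability(retention_s, tau_o=1e-9):
    """Thermal stability factor needed for a target retention time (s)."""
    return math.log(retention_s / tau_o)

# Targets chosen for illustration: cache-like, day-scale, and storage-class.
for label, seconds in [("1 s", 1.0),
                       ("1 day", 86400.0),
                       ("10 years", 10 * SECONDS_PER_YEAR)]:
    print(f"{label:>8}: required thermal stability = {required_stability(seconds):.1f}")
```

Because the dependence is logarithmic, relaxing retention from years to seconds roughly halves the required thermal stability, which is the lever that trades retention for write performance.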
IV. EMERGING VERSUS TRADITIONAL MEMORY TECHNOLOGIES POWER ANALYSIS FOR WSNs
A typical memory size for WSN applications [46] was targeted to carry out a detailed analysis of the power consumed using SRAM versus memristor and STT-RAM. As explained
TABLE VII GENERAL DIFFERENCES BETWEEN MEMRISTOR AND STT-RAM TECHNOLOGY MODELS
TABLE VIII MEMORY ARCHITECTURE FOR ALL THREE TECHNOLOGIES
Fig. 8. Total power for 128-kB size memory with normalized frequency. Power per bit from Table IX in addition to (25)–(28) is used to compute the total power. Traditional memristor and STT-RAM result in similar power consumption, but the proposed tuned memristor design improves the power efficiency.
previously, both STT-RAM and memristor have high write energy, zero leakage, and small reading energy. Power consumed in writing is technology dependent, while the reading process in all three technologies is capacitance based. The general memory organization and memristor model are based on [23]. Reliability challenges related to state drift caused by read or write disturb may exist. Read disturb was mitigated in [23] through a read-modified-write approach. Other possible solutions are discussed in [47] and [48]. It is recognized that other reliability challenges may exist due to variability of resistance, as discussed in [49]; however, these issues are outside the scope of this paper. The memory architecture used for each technology is shown in Table VIII, where all transistors are 45 nm. For example, the SRAM writing procedure is similar to the reading one [50]. Sections II and III explained in detail the writing mechanism and power consumed in both memristor and STT-RAM. The writing energies resulting from the simulations were converted to power. The read and write accesses for our target memory were separated. It is assumed that 60% of the typical memory accesses are read operations and 40% are writes [51]. This assumption is based on typical memory access since
Fig. 9. Typical memory organization with global IO and local bus interface.
TABLE IX POWER CONSUMPTION PER BIT FOR THE THREE MEMORY TECHNOLOGIES
most computing involves fetching two operands and storing the result. Hence, read operations occur ∼66% of the time and store/write operations ∼33%. For example, R3 = R1 + R2 requires reading the values of R1 and R2, summing them, and storing the result in R3. Equations (25)–(28) give insight into how the calculations and assumptions were carried out in this paper. In addition, (25) is used to calculate single-bit read power based on node capacitance, and these data are then added to the write power using (26). After that,
Algorithm 1 Memristor-Retention Time
the power per bit has been scaled and used to calculate power for all bits depending on memory array size. In addition, local versus global active power was considered. The local power contribution to the array access is assumed to be 40%, while
the global power, which is due to routing and multiplexing the data from the different blocks as observed in Fig. 9, is assumed to be 60%. Since routing the data in and out of the array differs between the three technologies depending on the size
of the array, the power is scaled based on array size. It was assumed that the memory consists of four banks and eight blocks, where each bank is 4 kB in size. WSNs typically have low duty cycles that range between 0.1% and 1% [52]:

$$P_{\text{read}} = f C V^{2} \tag{25}$$

$$P_{\text{active}} = \left[(40\% \times P_{\text{read}}) + (60\% \times P_{\text{write}})\right] \times \text{Duty Cycle} \tag{26}$$

$$P_{\text{leakage}} = P_{\text{leakage per bit}} \times \text{Memory Array Size} \tag{27}$$

$$P_{\text{total}} = P_{\text{active}} + P_{\text{leakage}} \tag{28}$$
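The power model in (25)–(28) can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB code behind the reported results: the class and function names, the linear scaling of per-bit active power with array size, and every numeric value (chosen for readability, not taken from Table IX) are assumptions. The read and write weights default to the values printed in (26).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryTech:
    """Per-bit parameters of one memory technology (placeholder values only)."""
    name: str
    freq_hz: float        # access frequency f in (25)
    c_node_f: float       # switched node capacitance C in (25)
    v_read: float         # read voltage V in (25)
    p_write_bit_w: float  # per-bit write power from device simulation
    p_leak_bit_w: float   # per-bit leakage power (zero for nonvolatile cells)


def read_power(tech):
    """Eq. (25): single-bit read power, P_read = f * C * V^2."""
    return tech.freq_hz * tech.c_node_f * tech.v_read ** 2


def active_power(tech, duty_cycle, w_read=0.4, w_write=0.6):
    """Eq. (26) as printed: weighted read/write power times duty cycle."""
    return (w_read * read_power(tech) + w_write * tech.p_write_bit_w) * duty_cycle


def leakage_power(tech, n_bits):
    """Eq. (27): per-bit leakage times memory array size."""
    return tech.p_leak_bit_w * n_bits


def total_power(tech, duty_cycle, n_bits):
    """Eq. (28), assuming per-bit active power scales linearly with n_bits."""
    return n_bits * active_power(tech, duty_cycle) + leakage_power(tech, n_bits)


def break_even_duty_cycle(nvm, sram):
    """Duty cycle at which both technologies draw the same total power.

    Returns None when the two linear power curves do not cross at a
    positive duty cycle. A value above 1 means the nonvolatile option
    is cheaper over the whole 0-100% range.
    """
    d_active = active_power(nvm, 1.0) - active_power(sram, 1.0)
    d_leak = sram.p_leak_bit_w - nvm.p_leak_bit_w
    if d_active <= 0 or d_leak <= 0:
        return None
    return d_leak / d_active


# Four banks of 4 kB each, expressed in bits.
N_BITS = 4 * 4 * 1024 * 8

# Hypothetical parameters: NOT the 45-nm or device data used in the paper.
sram = MemoryTech("6T-SRAM", 100e6, 1e-15, 1.0, 1e-7, 1e-8)
nvm = MemoryTech("nonvolatile", 100e6, 1e-15, 1.0, 1e-6, 0.0)

for d in (0.001, 0.01, 0.1):
    p_s = total_power(sram, d, N_BITS)
    p_n = total_power(nvm, d, N_BITS)
    print(f"duty={d:.3f}  SRAM={p_s:.3e} W  NVM={p_n:.3e} W  saving={1 - p_n / p_s:.1%}")
print("break-even duty cycle:", break_even_duty_cycle(nvm, sram))
```

Because total power in this sketch is linear in duty cycle, the crossover with SRAM lies where the leakage advantage of the nonvolatile cell equals its extra active power, which is what `break_even_duty_cycle` solves for.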
Table IX shows that SRAM consumes the least power per bit. However, when the total energy is considered as a function of duty cycle, memristor is more power efficient than SRAM for duty cycles up to 90%, while SRAM is a better option than STT-RAM above 45% duty cycle. Further analysis of the data in Fig. 8 shows that using memristor saves about 87% of the power compared with 6T-SRAM, while using STT-RAM saves about 77%, at 1% duty cycle. Fig. 8 also shows how the slope changes for the traditional and proposed memristors.

V. CONCLUSION

The high integration density and scalability of both memristor and STT-RAM, coupled with their compatibility with CMOS technology, make them cost effective, while their nonvolatility provides zero standby leakage power for energy-efficient solutions. One of the challenges in adopting such technologies for mobile applications is the relatively high write energy. In this paper, we proposed a solution that addresses this issue by trading off switching time and retention time with noise margin. After optimizing the design parameters using the presented solution, both technologies became viable options for battery-powered devices such as WSNs. Our results show that, despite the high write power, the overall power consumed is much less than that of the traditional 6T-SRAM. This is because most of the power consumed in SRAM-based memory in low duty-cycle devices is leakage. In future work, the asymmetric switching of both the memristor and STT-RAM models needs to be taken into account to gain deeper insight into writing time and energy. Temperature effects shall also be considered. The MATLAB models should also be mapped to SPICE for more accurate power simulation results. A comprehensive system-level energy evaluation with emerging nonvolatile memory is under investigation.

APPENDIX

Three scenarios generated and analyzed using MATLAB are shown in Algorithm 1.

REFERENCES

[1] D. Sylvester, “How to design nanowatt microsystem,” presented at the 20th ICECS Key Note Speech, 2013.
[2] G. Mathur, P. Desnoyers, P. Chukiu, D. Ganesan, and P. Shenoy, “Ultralow power data storage for sensor networks,” ACM Trans. Sensor Netw., vol. 5, no. 4, 2009, Art. ID 33.
[3] Y. Yao, L. Wan, and Q. Cao, “System architecture and operating systems,” in The Art of Wireless Sensor Networks. Berlin, Germany: Springer-Verlag, 2014, pp. 697–738.
[4] S. A. McKee and R. W. Wisniewski, “Memory wall,” in Encyclopedia of Parallel Computing. New York, NY, USA: Springer-Verlag, 2011, pp. 1110–1116.
[5] H. J. Yoo and D. Kim, “Embedded memory architecture for low-power application processor,” in Embedded Memories for Nano-Scale VLSIs. New York, NY, USA: Springer-Verlag, 2009, pp. 7–38.
[6] S. Kaushik and Y. Zorian, “Embedded memory test and repair optimizes SoC yields,” Synopsys, Mountain View, CA, USA, Tech. Rep., Jul. 2012.
[7] M. Qazi, M. E. Sinangil, and A. P. Chandrakasan, “Challenges and directions for low-voltage SRAM,” IEEE Des. Test Comput., vol. 28, no. 1, pp. 32–43, Jan./Feb. 2011.
[8] N. Ickes, D. Finchelstein, and A. P. Chandrakasan, “A 10-pJ/instruction, 4-MIPS micropower DSP for sensor applications,” in Proc. IEEE Asian Solid-State Circuits Conf., Nov. 2008, pp. 289–292.
[9] N. Ickes, Y. Sinangil, F. Pappalardo, E. Guidetti, and A. P. Chandrakasan, “A 10 pJ/cycle ultra-low-voltage 32-bit microprocessor system-on-chip,” in Proc. IEEE ESSCIRC, Sep. 2011, pp. 159–162.
[10] S. Narendra, V. De, S. Borkar, D. A. Antoniadis, and A. P. Chandrakasan, “Full-chip subthreshold leakage power prediction and reduction techniques for sub-0.18-μm CMOS,” IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 501–510, Mar. 2004.
[11] International Technology Roadmap for Semiconductors, Emerg. Res. Devices, Tech. Rep., 2011.
[12] K. Lee and S. H. Kang, “Development of embedded STT-MRAM for mobile system-on-chips,” IEEE Trans. Magn., vol. 47, no. 1, pp. 131–136, Jan. 2011.
[13] K. Eshraghian, K.-R. Cho, O. Kavehei, S.-K. Kang, D. Abbott, and S.-M. S. Kang, “Memristor MOS content addressable memory (MCAM): Hybrid architecture for future high performance search engines,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 8, pp. 1407–1417, Aug. 2011.
[14] S. D. Ha and S. Ramanathan, “Adaptive oxide electronics: A review,” J. Appl. Phys., vol. 110, no. 7, p. 071101, 2011.
[15] L. Baldi and G. Sandhu, “Emerging memories,” in Proc. IEEE Eur. Solid-State Device Res. Conf. (ESSDERC), Sep. 2013, pp. 30–36.
[16] E. Bozorg-Grayeli, J. P. Reifenberg, M. A. Panzer, J. A. Rowlette, and K. E. Goodson, “Temperature-dependent thermal properties of phase-change memory electrode materials,” IEEE Electron Device Lett., vol. 32, no. 9, pp. 1281–1283, Sep. 2011.
[17] W. Zhou, “Application of NIL in memory devices,” in Nanoimprint Lithography: An Enabling Process for Nanofabrication. Berlin, Germany: Springer-Verlag, 2013, pp. 203–216.
[18] J. J. Yang, D. B. Strukov, and D. R. Stewart, “Memristive devices for computing,” Nature Nanotechnol., vol. 8, no. 1, pp. 13–24, 2013.
[19] Y. Halawani, B. Mohammad, M. Al-Qutayri, and H. Saleh, “Modeling of STT-MTJ for low power embedded memory applications: A comparative review,” in Proc. IEEE 20th Int. Conf. Electron., Circuits, Syst. (ICECS), Dec. 2013, pp. 719–722.
[20] J. P. Strachan, A. C. Torrezan, G. Medeiros-Ribeiro, and R. S. Williams, “Measuring the switching dynamics and energy efficiency of tantalum oxide memristors,” Nanotechnology, vol. 22, no. 50, p. 505402, 2011.
[21] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The missing memristor found,” Nature, vol. 453, no. 7191, pp. 80–83, 2008.
[22] J. M. Tour and T. He, “Electronics: The fourth element,” Nature, vol. 453, no. 7191, pp. 42–43, 2008.
[23] B. Mohammad, D. Homouz, and H. Elgabra, “Robust hybrid memristor-CMOS memory: Modeling and design,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 11, pp. 2069–2079, Nov. 2013.
[24] R. S. Williams, M. D. Pickett, and J. P. Strachan, “Physics-based memristor models,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2013, pp. 217–220.
[25] D. B. Strukov, J. L. Borghetti, and R. S. Williams, “Coupled ionic and electronic transport model of thin-film semiconductor memristive behavior,” Small, vol. 5, no. 9, pp. 1058–1063, 2009.
[26] Y. N. Joglekar and S. J. Wolf, “The elusive memristor: Properties of basic electrical circuits,” Eur. J. Phys., vol. 30, no. 4, p. 661, 2009.
[27] T. Prodromakis, B. P. Peh, C. Papavassiliou, and C. Toumazou, “A versatile memristor model with nonlinear dopant kinetics,” IEEE Trans. Electron Devices, vol. 58, no. 9, pp. 3099–3105, Sep. 2011.
[28] J. J. Yang, M. D. Pickett, X. Li, D. A. A. Ohlberg, D. R. Stewart, and R. S. Williams, “Memristive switching mechanism for metal/oxide/metal nanodevices,” Nature Nanotechnol., vol. 3, no. 7, pp. 429–433, 2008.
[29] D. B. Strukov and R. S. Williams, “Exponential ionic drift: Fast switching and low volatility of thin-film memristors,” Appl. Phys. A, vol. 94, no. 3, pp. 515–519, 2009.
[30] N. Hashem and S. Das, “Switching-time analysis of binary-oxide memristors via a nonlinear model,” Appl. Phys. Lett., vol. 100, no. 26, p. 262106, 2012.
[31] Everspin Technologies Inc. [Online]. Available: http://www.everspin.com/company.php?qtype=overview, accessed 2014.
[32] L.-B. Faber, W. Zhao, J.-O. Klein, T. Devolder, and C. Chappert, “Dynamic compact model of spin-transfer torque based magnetic tunnel junction (MTJ),” in Proc. 4th IEEE Int. Conf. Design Technol. Integr. Syst. Nanoscale Era (DTIS), Apr. 2009, pp. 130–135.
[33] F. Ren, “Energy-performance characterization of CMOS/magnetic tunnel junction (MTJ) hybrid logic circuits,” M.S. thesis, Univ. California, Los Angeles, CA, USA, 2011.
[34] W. Zhao et al., “Macro-model of spin-transfer torque based magnetic tunnel junction device for hybrid magnetic-CMOS design,” in Proc. IEEE Int. Behavioral Modeling Simulation Workshop, Sep. 2006, pp. 40–43.
[35] Y. Zhang et al., “Compact modeling of perpendicular-anisotropy CoFeB/MgO magnetic tunnel junctions,” IEEE Trans. Electron Devices, vol. 59, no. 3, pp. 819–826, Mar. 2012.
[36] Z. Sun, X. Bi, H. Li, W.-F. Wong, and X. Zhu, “STT-RAM cache hierarchy with multiretention MTJ designs,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 6, pp. 1281–1293, 2014.
[37] L. M. Engelbrecht, “Modeling spintronics devices in Verilog-A for use with industry-standard simulation tools,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Oregon State Univ., Corvallis, OR, USA, 2011.
[38] C. Augustine, A. Raychowdhury, D. Somasekhar, J. Tschanz, V. De, and K. Roy, “Design space exploration of typical STT MTJ stacks in memory arrays in the presence of variability and disturbances,” IEEE Trans. Electron Devices, vol. 58, no. 12, pp. 4333–4343, Dec. 2011.
[39] D. E. Nikonov, G. I. Bourianoff, G. Rowlands, and I. N. Krivorotov, “Strategies and tolerances of spin transfer torque switching,” J. Appl. Phys., vol. 107, no. 11, p. 113910, 2010.
[40] Z. Xu, K. Sutaria, C. Yang, C. Chakrabarti, and Y. Cao, “SPICE modeling of STT-RAM for resilient design,” in Proc. 5th Int. MOS-AK/GSA Workshop, San Francisco, CA, USA, 2012.
[41] G. D. Panagopoulos, C. Augustine, and K. Roy, “Physics-based SPICE-compatible compact model for simulating hybrid MTJ/CMOS circuits,” IEEE Trans. Electron Devices, vol. 60, no. 9, pp. 2808–2814, Sep. 2013.
[42] R. Garg, D. Kumar, N. Jindal, N. Negi, and C. Ahuja, “Behavioural model of spin torque transfer magnetic tunnel junction, using Verilog-A,” Int. J. Adv. Res. Technol., vol. 1, no. 6, pp. 36–42, 2012.
[43] M. M. de Castro et al., “Sub-nanosecond precessional switching in a MRAM cell with a perpendicular polarizer,” in Proc. 4th IEEE Int. Memory Workshop (IMW), May 2012, pp. 1–4.
[44] A. Jog et al., “Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs,” in Proc. 49th ACM/EDAC/IEEE Design Autom. Conf., Jun. 2012, pp. 243–252.
[45] C. W. Smullen, V. Mohan, A. Nigam, S. Gurumurthi, and M. R. Stan, “Relaxing non-volatility for fast and energy-efficient STT-RAM caches,” in Proc. IEEE 17th Int. Symp. High Perform. Comput. Archit. (HPCA), Feb. 2011, pp. 50–61.
[46] M. Johnson et al., “A comparative review of wireless sensor network mote technologies,” in Proc. IEEE Sensors, Oct. 2009, pp. 1439–1442.
[47] A. Ghofrani, M. A. Lastras-Montano, and K.-T. Cheng, “Towards data reliable crossbar-based memristive memories,” in Proc. IEEE Int. Test Conf. (ITC), Sep. 2013, pp. 1–10.
[48] I. Vourkas, D. Stathis, G. C. Sirakoulis, and S. Hamdioui, “Alternative architectures toward reliable memristive crossbar memories,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., to be published.
[49] P. Pouyan, E. Amat, and A. Rubio, “Reliability challenges in design of memristive memories,” in Proc. 5th Eur. Workshop CMOS Variability (VARI), Sep./Oct. 2014, pp. 1–6.
[50] B. Mohammad, Embedded Memory Design for Multi-Core and Systems on Chip. New York, NY, USA: Springer-Verlag, 2014.
[51] S. Kvatinsky, “Memristors—Not just memory,” presented at the Annu. Conf. Israeli Semiconductor Ind. (ChipEx), 2013.
[52] Y. K. Tan and S. K. Panda, “Review of energy harvesting technologies for sustainable wireless sensor network,” in Sustainable Wireless Sensor Networks. Rijeka, Croatia: InTech, 2010, pp. 15–43.
Yasmin Halawani (M’13) received the B.S. degree in electrical and electronics engineering from the University of Sharjah, Sharjah, United Arab Emirates, in 2012, and the M.S. degree in electrical and electronics engineering from Khalifa University, Abu Dhabi, United Arab Emirates, in 2014, where she is currently pursuing the Ph.D. degree. Her research project focused on investigating the suitability of emerging memory technologies, such as memristor and STT-RAM for low power applications.
Baker Mohammad (M’04–SM’13) received the B.S. degree from the University of New Mexico, Albuquerque, NM, USA, the M.S. degree from Arizona State University, Tempe, AZ, USA, and the Ph.D. degree from The University of Texas at Austin, Austin, TX, USA, in 2008, all in electronics and communication engineering. He was involved in a wide range of microprocessor designs, from high-performance server chips >100 W (IA-64) to low-power sub-1-W mobile embedded processors (XScale), with Intel Corporation, Santa Clara, CA, USA. He was a Senior Staff Engineer/Manager with Qualcomm, Austin, where he was involved in designing high-performance and low-power digital signal processors (DSPs) used for communication and multimedia applications. He has over 16 years of industrial experience in microprocessor design with an emphasis on memory, low-power circuits, and physical design. He is currently an Assistant Professor of Electronics Engineering with Khalifa University, Abu Dhabi, United Arab Emirates, and a Consultant with Qualcomm Inc., San Diego, CA, USA. He is involved in microwatt-range computing platforms for wireless sensor nodes focusing on energy harvesting and power management, including efficient dc/dc and ac/dc converters. He has authored one book entitled Embedded Memory Design for Multi-Core and SoC, and has published several papers on digital system design, memory design and testing, power management, and power conversion, in addition to emerging memory technology modeling and design. He holds eight issued U.S. patents and several pending patent applications. His current research interests include power efficient computing, high yield embedded memory, and emerging technology, such as memristor, STT-RAM, and computer architecture.
Dirar Homouz received the Ph.D. degree from the University of Houston, Houston, TX, USA, in 2007. He is currently an Assistant Professor of Physics with the Department of Applied Mathematics and Sciences, Khalifa University, Abu Dhabi, United Arab Emirates. He is involved in interdisciplinary research, such as modeling memristive devices for use in hybrid CMOS memory applications. His current research interests include computational biophysics, in which he uses molecular dynamics simulations to model protein folding in cell-like environment.
Mahmoud Al-Qutayri (S’86–M’92–SM’06) received the B.Eng. degree from Concordia University, Montréal, QC, Canada, in 1984, the M.Sc. degree from the University of Manchester, Manchester, U.K., in 1987, and the Ph.D. degree from the University of Bath, Bath, U.K., in 1992, all in electrical and electronic engineering. He is currently a Full Professor of Electrical and Computer Engineering and the Associate Dean of Graduate Studies with the College of Engineering, Khalifa University, Abu Dhabi, United Arab Emirates. He has authored numerous technical papers in peer-reviewed international journals and conferences, and co-authored a book. His current research interests include embedded systems design, applications and security, design and test of mixed-signal integrated circuits, wireless sensor networks, and smart environments.
Hani Saleh received the B.Sc. degree in electrical engineering from the University of Jordan, Amman, Jordan, the M.Sc. degree in electrical engineering from the University of Texas at San Antonio, San Antonio, TX, USA, and the Ph.D. degree in computer engineering from the University of Texas at Austin, Austin, TX, USA. He was with several leading semiconductor companies, including Intel, Santa Clara, CA, USA, where he was involved in ATOM mobile microprocessor design, AMD, Sunnyvale, CA, USA, where he was involved in Bobcat mobile microprocessor design, Qualcomm, San Diego, CA, USA, where he was involved in QDSP DSP core design for mobile systems-on-chip (SoCs), Synopsys, Mountain View, CA, USA, where he was a Key Member with the Synopsys Turnkey Design Group and taped out many application-specific integrated circuits (ASICs) and designed the I2C DW IP included in the Synopsys DesignWare library, Fujitsu, Minato, Japan, where he was involved in SPARC compatible high performance microprocessor design, and Motorola Australia, where he was involved
in M210 low power microprocessor synthesizable core design. He was a Senior Chip Designer (Technical Lead) with Apple Inc., Cupertino, CA, USA, where he was involved in the design and implementation of Apple's next-generation graphics cores for its mobile products (iPad and iPhone). He has a total of 19 years of industrial experience in ASIC chip design, microprocessor design, DSP core design, graphics core design, and embedded system design. He is currently an Assistant Professor of Electronic Engineering with Khalifa University, Abu Dhabi, United Arab Emirates. He is an Active Member with the Khalifa University Research Center, Abu Dhabi, where he leads a project for the development of a wearable blood glucose monitor SoC and a mobile surveillance SoC. His experience spans DSP core design, microprocessor peripherals design, microprocessors, and graphics core design. He has authored over 60 articles in peer-reviewed conferences and journals in digital system design, computer architecture, DSP, and computer arithmetic. He holds three issued U.S. patents and 13 pending patent applications. His current research interests include DSP algorithms design, DSP hardware design, computer architecture, computer arithmetic, SoC design, ASIC chip design, field-programmable gate array design, and automatic computer recognition.