DESIGN OPTIMIZATIONS OF SPIKING HARDWARE NEURONS

By MANU RASTOGI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2012

© 2012 Manu Rastogi

To my parents, my family and my teachers


ACKNOWLEDGMENTS

I am greatly indebted to my adviser, Dr. John G. Harris, for giving me the opportunity to work in his lab. He has been a very understanding and patient mentor. It is from him that I learnt to be analytical, to think creatively, and to explore new ideas and simple solutions to complex problems. I am very grateful to him for giving me the opportunity to attend conferences and workshops, and for trusting me with the responsibility of defining the scope of my projects, courses and research work. Like a good coach, he was always there to critically point out the flaws in my decision-making process and to encourage me whenever I did manage to put things in place in time.

I would like to thank my committee members, Dr. José C. Principe, Dr. Robert M. Fox and Dr. Sachin Talathi, for taking the time to provide me with valuable feedback on my research. I would especially like to thank Dr. Fox for teaching me analog circuit design. Dr. Principe has provided me with feedback and ideas and helped me broaden my horizons at various junctures of my research at CNEL. My discussions with Dr. Talathi, during his class on neuro-dynamics and about my research work, gave me a unique perspective. I am also thankful to Dr. Justin Sanchez for his comments and suggestions, which significantly shaped my research work.

Vaibhav Garg and Ravi Shekhar have been very patient friends, critics and confidants throughout. I am also very thankful to Vaibhav Garg for showing me the way to CNEL. I have gained immensely from my intense discussions with Jie Xu and Alexander Singh-Alvarado, which gave rise to new ideas; without their technical expertise, help and support I might not have been able to complete this work. My CNEL labmates, especially Savyasachi Singh, Erion Hasanbelliu, Jeremy Anderson, Sohan Seth and Steve Yen, have been a constant source of support, motivation and fun during my time at CNEL.
I am indebted to my parents and my family for providing me with the best possible education and opportunities. I am thankful that they ingrained in me the value of education and self-reliance. They have been a constant source of encouragement, support and enlightenment. Finally, I would like to acknowledge my funding source, the National Institute of Neurological Disorders and Stroke (NINDS), which partially supported my research through grant number NS053561.


TABLE OF CONTENTS

                                                                        page

ACKNOWLEDGMENTS ..................................................... 4
LIST OF TABLES ...................................................... 8
LIST OF FIGURES ..................................................... 9
ABSTRACT ........................................................... 11

CHAPTER

1  INTRODUCTION .................................................... 14
   1.1 Motivation .................................................. 14
   1.2 Outline ..................................................... 20

2  INTEGRATE-AND-FIRE AS AN ADC REPLACEMENT ....................... 21
   2.1 Introduction ................................................ 21
   2.2 Background and Prior Work .................................. 22
       2.2.1 Integrate-And-Fire (I&F) Sampler ..................... 27
       2.2.2 Reconstruction Algorithm ............................. 29
   2.3 Signal-To-Noise Ratio ...................................... 31
       2.3.1 Performance Metrics .................................. 33
   2.4 Summary ..................................................... 35

3  LOW POWER INTEGRATE-AND-FIRE CIRCUIT ........................... 36
   3.1 Circuit Design .............................................. 36
       3.1.1 Comparator ............................................ 36
             3.1.1.1 Comparator topology .......................... 37
             3.1.1.2 Hysteresis ................................... 42
       3.1.2 Refractory Component ................................. 44
   3.2 Comparator Delay ............................................ 46
   3.3 Power Consumption ........................................... 50
       3.3.1 Sources of Power Consumption ......................... 50
       3.3.2 Optimum Comparator Bias Current ...................... 52
       3.3.3 Reducing Static Power Consumption .................... 55
   3.4 Results and Discussion ..................................... 64
   3.5 Summary ..................................................... 72

4  LIMIT ON INTEGRATE-AND-FIRE ENERGY DISSIPATION ................. 74
   4.1 Introduction ................................................ 74
   4.2 Limit on Inverter Power Consumption ........................ 74
   4.3 Limit on Single Stage Comparator's Power Consumption ....... 77
   4.4 Limit on I&F Circuit's Power Consumption ................... 83
   4.5 Summary ..................................................... 89

5  ENERGY HARVESTING ............................................... 91
   5.1 Introduction ................................................ 91
   5.2 Energy Harvesting Architecture ............................. 91
       5.2.1 Energy Harvesting Circuit ............................ 92
       5.2.2 Energy Utilization ................................... 94
       5.2.3 Finite State Machine (FSM) Based Control Logic ....... 96
   5.3 Results and Discussion .................................... 105
   5.4 Conclusion and Future Work ................................ 110

6  CONCLUSION ..................................................... 112

REFERENCES ........................................................ 115
BIOGRAPHICAL SKETCH ............................................... 124

LIST OF TABLES

Table                                                                   page

3-1 Comparison of this I&F implementation with past implementations ..... 68
3-2 Comparison of FOM with other ADCs ................................... 70
3-3 Technology scaling for the I&F circuit .............................. 72
5-1 Results for the energy harvesting circuit .......................... 110

LIST OF FIGURES

Figure                                                                  page

1-1  Possible "tweaks" to the current human brain and their respective trade-offs ..... 16
2-1  A comparison of the output data rates of the conventional ADC and the I&F ....... 22
2-2  Conventional Integrate and Fire (I&F) sampler ................................... 27
2-3  Biphasic Integrate and Fire data converter ...................................... 28
2-4  Positive and negative channel pulse outputs for a sine wave ..................... 28
2-5  SNR vs. the clock period used for quantizing the sample times ................... 34
3-1  Single voltage amplifier and the corresponding small-signal model ............... 39
3-2  Latch-based positive feedback comparator and its equivalent small-signal model .. 39
3-3  A cascade of N identical single-stage amplifiers ................................ 40
3-4  Latch-based positive feedback comparator with output stage ...................... 41
3-5  Normalized comparator delay vs. step input size ................................. 41
3-6  Measured V+trip vs. Ibias ....................................................... 43
3-7  Two possible implementations of the refractory component ........................ 44
3-8  The complete schematic of the positive channel of the I&F circuit ............... 45
3-9  Measured comparator delay vs. bias current ...................................... 48
3-10 Simulated block-wise power consumption of the I&F for the pulse interval ........ 51
3-11 Measured energy/pulse of the comparator as a function of pulse rate ............. 53
3-12 Time-varying delay of the comparator ............................................ 54
3-13 SER vs. comparator bias ......................................................... 55
3-14 Dynamically biased comparator architecture ...................................... 57
3-15 Simulated Ibias as a function of Vin ............................................ 58
3-16 Simulated Ibias as a function of Vin for varying values of Vcntrl ............... 59
3-17 Measured energy/pulse of the comparator as a function of pulse rate ............. 60
3-18 Simulated block-wise power consumption of the dynamically biased I&F ............ 61
3-19 Measured node voltages for the current limit circuit ............................ 62
3-20 Two implementations of the current limiting and source follower circuit ......... 64
3-21 Dynamically biased comparator with source follower .............................. 65
3-22 Simulated power consumption of the dynamically biased I&F ....................... 66
3-23 Measured SNR vs. power consumption of the I&F sampler ........................... 67
3-24 Measured energy/pulse vs. pulse rate for the I&F circuit ........................ 68
3-25 Measured ENOB and power consumption vs. sine wave amplitude ..................... 69
4-1  Simulated inverter voltage transfer curves for varying Vdd ...................... 77
4-2  Simulated inverter delay and power consumption for varying Vdd .................. 78
4-3  Simulated inverter power-delay product for varying Vdd .......................... 79
4-4  Single voltage amplifier and the corresponding small-signal model ............... 79
4-5  Analytical evaluation of the minimum supply voltage as a function of bias current 81
4-6  Power-delay product of the comparator vs. Vdd for different Ibias ............... 82
4-7  Single-stage comparator delay vs. Vdd for various values of Ibias ............... 83
4-8  Measured and analytical minimum supply voltage for the I&F circuit .............. 84
4-9  Schematic of the positive channel of the I&F circuit ............................ 85
4-10 Equation 4-29 vs. Vdd for different bias currents ............................... 89
5-1  Block-level schematic of the biphasic I&F circuit ............................... 92
5-2  Second-generation current conveyor circuit and I&F circuit ...................... 93
5-3  Amplifier bias control using the digital-like pulse output from the I&F circuit . 94
5-4  Current conveyor circuit with the switches and the I&F circuit .................. 95
5-5  Algorithm for controlling the energy harvesting loop ............................ 98
5-6  FSM representation of the algorithm flow ........................................ 99
5-7  The control loop FSM with digital flags and control signals .................... 101
5-8  Timing diagram of the FSM ...................................................... 102
5-9  The complete harvesting architecture schematic ................................. 106
5-10 Measurement results for the energy harvesting architecture ..................... 107

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DESIGN OPTIMIZATIONS OF SPIKING HARDWARE NEURONS

By Manu Rastogi

May 2012

Chair: John G. Harris
Major: Electrical and Computer Engineering

Over millions of years of evolution, biology has optimized the neuron structure and its interconnects for energy consumption and communication. Inspired by biology, many research groups would like to build large arrays of neurons in hardware as efficient solutions to engineering problems. The purpose of this research is to provide a framework and design methodology for improving the energy efficiency of existing silicon neurons. Although extremely complex silicon implementations of biologically realistic models exist, it is debatable whether a biologically realistic model has any significant advantage over simpler models for solving realistic engineering challenges. Thus, as part of this work, design optimizations are performed on the integrate-and-fire (I&F) model.

It has been shown that the I&F model can encode continuous-time signals into asynchronous pulse trains, and that under certain ideal conditions the original signal can be reconstructed perfectly from these pulse trains. We use the error introduced by the silicon neuron in the timing of the pulse trains, and subsequently the error in the reconstructed signal, as the benchmark for optimizing the neuron design. We build a framework for estimating the reconstruction error introduced by timing jitter in the pulses; the model is verified using a simulated I&F circuit. This estimate serves as the maximum attainable signal reconstruction accuracy, and based on it, circuit topologies with the most favorable speed-resolution trade-off are selected. Several sine waves of varying frequency and amplitude were encoded through the fabricated I&F circuit, the pulses were recorded, and the signals were reconstructed. Based on the range of recovered SER values, we estimate the optimum bias current at 750 nA with a 3.3 V power supply. These circuit parameters result in an energy/pulse of 100 pJ at a pulse rate of roughly 1 kpulse/sec. To further reduce the static power dissipation, two novel circuit topologies are proposed. The proposed circuits were fabricated in AMI 0.6 µm technology, and measured results indicate an improvement of roughly 80 times over this I&F circuit's predecessors developed at CNEL. The average energy/pulse is roughly 10 pJ at a pulse rate of 200 pulses/sec; the best published neuron circuit consumes roughly 267 pJ [1].

Since an analog-to-digital converter (ADC) also encodes continuous-time signals into discrete samples, the efficiency of the I&F sampler is compared to that of conventional ADCs. Using traditional estimates such as effective number of bits (ENOB) and figure-of-merit (FOM), the I&F sampler is compared against conventional ADCs and a new class of converters known as asynchronous ADCs (AADCs). We find that the I&F sampler, with a FOM of 0.6 pJ/conversion, outperforms the other ADCs in its technology node. However, as semiconductor technology advances, conventional ADC designs would naturally outperform the existing implementation of the I&F circuit in 0.6 µm technology. Thus, the performance of the sampler is simulated using the well-accepted predictive technology models (PTMs) from Arizona State University. The I&F circuit achieves a FOM of 9.8 fJ/conversion in the 45 nm technology node, indicating that the I&F circuit scales favorably with technology.

We present an analysis estimating that the minimum energy required to generate a pulse is on the order of a few tenths of a fJ. This estimate ignores the static power consumed between spike times. At low pulse rates, the static power dissipation (power consumed between the pulses) dominates, resulting in a higher energy/pulse. As the pulse rate increases, the inter-pulse interval becomes shorter, reducing the static energy attributed to each pulse; thus, at extremely high pulse rates the energy/pulse approaches this limit.

Whether the I&F circuit is used as an ADC or as part of a large array of hardware neurons running a computational algorithm, power consumption is a key criterion. To further improve the energy efficiency of the I&F circuit, we present an energy harvesting circuit that recovers some of the energy used by the I&F circuit in generating an output pulse and uses that energy to power itself. The I&F circuit generates an output pulse by comparing the capacitor voltage to a threshold; once the pulse is generated, the capacitor is discharged, and the energy stored in the capacitor is normally lost. Instead, a smarter scheme has been developed in which the capacitor discharges by transferring its stored energy to another circuit rather than simply dumping it to ground. Energy is harvested each time an output pulse is generated, so the higher the pulse rate, the more energy is harvested. The overall efficiency of the energy harvesting setup is around 25%.


CHAPTER 1
INTRODUCTION

1.1 Motivation

The human brain is an engineering marvel of nature. Time and again, engineers and researchers have turned to nature for inspiration at various levels of abstraction to solve complex engineering problems. There is growing interest in building spike-based systems inspired by biology in fields such as speech recognition, face recognition and autonomous robotic systems. The inspiration to study such systems stems from the fact that the best engineered algorithms running on the most powerful computing hardware do not come close to matching the performance achieved by the human brain. The human brain consumes about 12 watts of power [2], roughly a tenth of what the latest computer processors consume, and is able to perform numerous tasks such as robust speech recognition, image processing and even driving a car, tasks that machines have trouble accomplishing.

The brain achieves these computational feats through a densely interconnected system of specialized cells called neurons. Neurons are electrically excitable cells that process and transmit information through electrochemical signals exchanged via connections called synapses. While the precise information representation and processing mechanisms of the brain at higher levels of abstraction are not well understood, it is well known that neurons at the lowest level communicate through some kind of temporal code by generating spikes. The information is generally contained in the times between the spikes and not in their size or shape.

Over time, nature has optimized the brains of different species for energy efficiency and computation. Herculano-Houzel et al. [3] examine cellular scaling rules for primate and rodent brains. They show that primate brains scale linearly with the number of neurons, in contrast to rodent brains, which grow faster in size than in number of neurons. As a consequence of these linear cellular scaling rules, primate brains have more neurons than rodent brains of similar size, presumably endowing them with greater computational power and cognitive abilities. Roth and Dicke [4] point out that the brains of the honeybee, the octopus, the crow and other intelligent animals look nothing alike at first glance, yet their neural circuits for tasks such as vision, smell and navigation have the same basic arrangement. The authors claim that such evolutionary convergence suggests that an anatomical or physiological optimum has been reached and there may be very little room for improvement.

A recent article in Scientific American by Douglas Fox speculates that the maximum achievable intelligence for a species has been reached. Fox explores four possible "tweaks" that could be applied to the human brain to enhance intelligence. The first obvious change would be to increase brain size while keeping the neuron size the same, which would require longer axons; a larger number of neurons means more energy consumption, and longer axons mean longer communication times between neurons, resulting in overall slower processing. The second tweak would be to increase the number of synapses between neurons, enabling faster and more reliable communication at the cost of increased energy and space requirements. The third possible change is to increase the width of the synapses to make communication more reliable and faster; however, as in the previous case, thicker synapses would need more energy and more space. The final tweak is to increase neuron density while keeping the overall brain size the same; smaller neurons have smaller cellular membranes, which would result in unreliable firing and thus higher energy consumption and unreliable communication. Fox's argument is best summarized by Figure 1-1 from his article [5]. A similar optimization analysis should be simpler for electronic spiking hardware.
Inspired by the way the brain represents information in time, there has been significant interest in encoding continuous-time signals into asynchronous pulse trains. These pulses can then be processed to obtain a uniformly sampled signal equivalent to what would have been obtained through a conventional analog-to-digital converter (ADC).

Figure 1-1. Possible "tweaks" to the current human brain and their respective trade-offs. The figure is from an article by Douglas Fox in Scientific American [5].

These biologically inspired samplers generally have lower power consumption, smaller silicon area and reduced circuit complexity compared to conventional ADCs. Examples of such encoders include the time-based ADC by Yang and Sarpeshkar [6] and a Hodgkin-Huxley based video encoder by Lazar and Zhou [7] and Aghagolzadeh et al. [8]. One such asynchronous time-based sampler has also been developed at CNEL using the integrate-and-fire (I&F) neuron model. The structure of the I&F sampler


has been modified to encode the positive and the negative parts of the input signal into two separate output data channels. The I&F circuit automatically adjusts its output data rate according to the signal amplitude, unlike conventional ADCs, which ignore the input signal characteristics. The I&F sampler achieves lower power and lower bandwidth at the cost of a computationally expensive reconstruction method. The simplicity of the sampler, along with the power and bandwidth savings, makes it well suited to biomedical sensors. The I&F sampler has been demonstrated for encoding neural data for brain-machine interfaces [9], and theoretical bounds on the reconstruction accuracy for bandlimited signals have been presented [10].

Apart from encoding continuous-time signals, there is also interest in building circuits that mimic biology and perform tasks such as vision, audition and cognitive functions. Building brain-inspired hardware is advantageous compared to software simulation chiefly because it operates in real time. Many research groups and engineers are trying to build hardware neuron arrays with configurable synaptic weights and connections that can be used to explore the basic computational techniques of the brain. The circuits are designed to closely model the responses of biological neurons and synapses, and such biologically plausible circuit designs require time and substantial effort. Years ago, Carver Mead and other researchers recognized the similarity between biological neurons and transistor device physics, which led to a number of new analog circuits inspired by biology; these circuits can still be found in some of the neuron arrays being built by research groups. The motivation to build analog circuits came from two facts: first, analog computation in the late 80s and early 90s was more power and area efficient than digital computation; second, the subthreshold operation of the transistor closely resembles the characteristics of real neurons.

However, as semiconductor technology has advanced from micrometer to nanometer gate lengths, it has become more and more challenging for analog computation to keep up with its digital counterpart. Digital circuits offer power and area efficiency, robustness to mismatch and noise, shorter design times and ease of reconfigurability. Research groups are increasingly shifting to digital designs for building neuron arrays; this shift from analog to digital neuron arrays is similar to the evolution of hardware artificial neural networks in the late 90s.

Biology has optimized the neuron size, the interconnectivity between neurons and the brain size for computation and energy consumption. Even with these optimizations, the human brain, at 2 percent of our body weight, consumes close to 20 percent of our energy [5]. Since neurons form the building blocks of the brain, considerable work has gone into the design and fabrication of neuron models. Various research groups build large arrays of hardware neurons for a better understanding of the brain and to find biologically inspired solutions to complex engineering problems. At the heart of these arrays is usually some kind of biologically inspired neuron model; a large set of neuron models is available in the literature, ranging from the very simple integrate-and-fire model to the more biologically plausible Hodgkin-Huxley model. Faisal et al. [11] show that shrinking the size of neurons makes their spiking noisy and unreliable and consequently leads to overall higher energy consumption. It is therefore of interest to explore the noise, reliability and overall energy consumption as a function of transistor and technology scaling and to compare them to biological neurons.

As part of this work we present the design optimization of the I&F circuit. The aim is to lower the energy consumption of the I&F circuit while maintaining its performance as a sampler. We begin by constructing a theoretical framework and experimental results for evaluating the I&F circuit as an ADC; the performance of the I&F circuit as an ADC is used as the key metric for evaluating the energy optimizations performed.
Using figure-of-merit (FOM) as a criterion, we show that the I&F circuit outperforms several existing ADC architectures. The figure-of-merit for the I&F sampler is 0.6 pJ/conversion, despite the fact that the existing I&F has been implemented in a much older technology. However, as semiconductor technology advances, conventional ADC designs would naturally outperform the existing implementation of the I&F circuit in 0.6 µm technology. Thus, the performance of the sampler is simulated using the well-accepted predictive technology models (PTMs) from Arizona State University. The I&F circuit achieves a FOM of 9.8 fJ/conversion in the 45 nm technology node, indicating that the I&F circuit scales favorably with technology.

Sources of energy consumption in the I&F circuit are carefully evaluated. Based on this evaluation, novel circuit-based schemes for reducing energy consumption are presented. The proposed circuits have been fabricated, and measured results indicate an improvement of roughly 80 times over this I&F circuit's predecessors developed at CNEL.

To further increase the energy efficiency of the I&F circuit, a novel energy harvesting scheme is presented. We show that the energy harvesting scheme, when used with an array of I&F circuits, can provide significant energy savings. The energy harvesting setup, with an efficiency of roughly 26%, has been implemented on chip in AMI 0.6 µm technology. The basic idea behind harvesting energy comes from the realization that the charge held in the membrane capacitor is wasted to ground after every pulse. Using a very simple mathematical analysis, we estimate that the minimum energy required to generate a pulse is on the order of a few tenths of a fJ. This estimate ignores the static power consumed between spike times. At low pulse rates, the static power dissipation (power consumed between the pulses) dominates, resulting in a higher energy/pulse. As the pulse rate increases, the inter-pulse interval becomes shorter, reducing the static energy attributed to each pulse; thus, at extremely high pulse rates the energy/pulse approaches this limit.
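The static-versus-dynamic trade-off just described can be sketched with a one-line model. The parameter values below are illustrative placeholders, not measurements from this work: each pulse is assumed to cost a fixed dynamic energy, and the static power burned during the inter-pulse interval is charged to that pulse as well.

```python
# Minimal energy/pulse model (hypothetical numbers, for illustration only).
E_DYN = 1e-15     # assumed dynamic (switching) energy per pulse, joules
P_STATIC = 2e-9   # assumed static power drawn between pulses, watts

def energy_per_pulse(rate_hz):
    """Total energy attributed to one pulse at a given pulse rate:
    fixed dynamic energy plus static power over the inter-pulse interval."""
    return E_DYN + P_STATIC / rate_hz

for rate in (1e2, 1e4, 1e6, 1e9):
    print(f"{rate:9.0e} pulses/s: {energy_per_pulse(rate):.2e} J/pulse")
```

At low rates the static term dominates, so the energy/pulse is high; as the rate grows, the energy/pulse falls toward the dynamic floor, which is the limit discussed above.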


1.2 Outline

Chapter 2 reviews prior work in the area of asynchronous ADC design, covering some recently published asynchronous ADCs and their associated trade-offs. It also surveys prior work on the I&F sampler, reviews the reconstruction algorithm, and ends with a discussion of the performance metrics for the I&F sampler. Chapter 3 describes the circuit design philosophy for the I&F sampler: it begins by reviewing the speed-resolution trade-off associated with various analog comparators, then discusses practical implementation issues for the comparators and their impact on the reconstructed signal, followed by the sources of power consumption in the analog I&F circuit and circuit methodologies for reducing it. The chapter ends with results and discussion, followed by a summary of the work. Chapter 4 presents the mathematical framework for evaluating the fundamental limits of the I&F circuit's energy consumption; the effects of circuit noise, power supply and bias current are evaluated against the performance of the circuit as an ADC. Chapter 5 presents a novel energy harvesting architecture for further increasing the overall energy efficiency of the I&F circuit. Chapter 6 summarizes the work and its contributions.


CHAPTER 2
INTEGRATE-AND-FIRE AS AN ADC REPLACEMENT

2.1 Introduction

Conventional analog-to-digital converters (ADCs) encode continuous-time signals into a uniformly sampled, discrete-level representation. Once converted, these signals can be stored and/or processed using conventional signal processing tools. The accuracy of the encoded signal, speed, cost and power are the main criteria for choosing an ADC architecture. Certain applications can tolerate some degradation in the recovered signal as long as the power consumption, size and bandwidth constraints are met.

The output data rate of a conventional ADC is independent of the bandlimited signal's characteristics, as long as the sampling rate exceeds the Nyquist rate. For example, if the input signal is zero, the ADC continues to generate samples at a fixed output rate. Conventional ADCs fail to exploit the signal's characteristics, resulting in unnecessary power and bandwidth consumption. Figure 2-1 illustrates this point: it compares the output data rates of a conventional ADC and the I&F circuit with respect to the input signal amplitude. The conventional ADC generates a fixed output data rate irrespective of the signal amplitude, whereas the I&F adjusts its output data rate according to the input signal amplitude.

Owing to these shortcomings, there has been significant interest in exploring novel schemes for analog-to-digital conversion. A popular approach is to use irregular samplers such as asynchronous ADCs (AADCs) [12]. AADCs have no clock and no fixed output data rate; an output sample is generated when the signal properties change, potentially leading to significant power and bandwidth savings. For example, a level-crossing AADC [13] generates an output sample whenever the signal crosses a quantization level. Some other approaches, such as reference crossing [14], asynchronous delta converters [15], and level crossing with exponential level spacing [16], have also been implemented for various target applications.
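The adaptive data-rate behavior described above can be illustrated with a behavioral sketch of a biphasic I&F sampler. This is a simplified model, not the fabricated circuit, and the threshold and time step are arbitrary: the input is integrated until the running integral crosses a positive or negative threshold, the crossing time is emitted on the corresponding channel, and the integrator resets.

```python
import math

def biphasic_if_encode(signal, dt, theta):
    """Toy biphasic integrate-and-fire sampler.

    Integrates `signal` (samples spaced `dt` apart); whenever the running
    integral reaches +theta or -theta, the crossing time is recorded on the
    positive or negative channel and the integrator is reset.
    """
    pos, neg, acc = [], [], 0.0
    for i, x in enumerate(signal):
        acc += x * dt
        if acc >= theta:
            pos.append(i * dt)
            acc = 0.0
        elif acc <= -theta:
            neg.append(i * dt)
            acc = 0.0
    return pos, neg

dt, theta = 1e-4, 1e-3
quiet = [0.0] * 2000                      # zero input: no information
loud = [math.sin(2 * math.pi * 50 * i * dt) for i in range(2000)]

p0, n0 = biphasic_if_encode(quiet, dt, theta)
p1, n1 = biphasic_if_encode(loud, dt, theta)
print(len(p0) + len(n0))                  # → 0: no pulses for a zero input
print(len(p1), len(n1))                   # pulse count tracks signal amplitude
```

Unlike a clocked converter, the sampler above emits nothing while the input is zero; its output rate rises with signal amplitude, which is the behavior contrasted in Figure 2-1.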


[Figure 2-1 plot: pulse rate (KSamples/sec) vs. sinewave amplitude (nA), comparing the I&F and the conventional ADC]
Figure 2-1. A comparison of the output data rates of the conventional ADC output and the I&F. Even though the input signal is around zero and the signal contains no information, the conventional ADC wastes bandwidth and power, whereas the I&F circuit adjusts the output data rate.

AADCs trade off bandwidth and power savings against more complex recovery mechanisms and the need for precision in sample times. A review of recovery mechanisms and their effect on AADCs can be found in [16]. AADCs are a more favorable approach than conventional ADCs when exploiting the specific features of the input. For example, AADCs lead to power and bandwidth savings for applications such as remote sensing and biomedical implants [14, 17], where the signal is slowly changing and sparse in time. In this chapter we present the background work related to the I&F sampler.

2.2 Background and Prior Work

Shannon's sampling theorem provides both the uniform sampling scheme used in conventional ADCs and a reconstruction algorithm based on the 'sinc' function basis. The theorem is typically explained in terms of the frequency domain, and it is also intuitive how distortions manifest when the sampling rate is below the Nyquist bound. However, recovery is no longer straightforward for non-uniform sampling. Nevertheless,


there are many advantages to sampling on a non-uniform set. Techniques based on non-uniform sampling have given rise to research areas such as alias-free sampling [18]. In general terms, a non-uniform sample set spreads the aliased frequencies throughout the baseband instead of concentrating them at a single frequency. As will be seen for the I&F model, the higher the frequency of the aliased signal, the lower the amount of aliasing.

The current literature on non-uniform sampling and reconstruction is based on frame theory [19]. The recovery algorithms can be explained in terms of linear algebra in the case of finite-dimensional spaces; the reconstruction algorithm is nothing more than finding the solution to a least squares problem. The fundamental difference between the conventional literature and recovery methods for adaptive samplers is that, in the latter, the samples provide us with more information than simply their location or pointwise value. In the case of level crossing samplers, for instance, we can bound the variation of the input between consecutive samples, since if it had crossed a level the sample set would have been different. Therefore, depending on signal properties, sampling schemes can be adapted such that each sample has maximum information content.

Different sampling schemes require different architectures, and thus, based on the implementation, irregular samplers can broadly be classified as: level crossing samplers (LCS), time-variant threshold crossing samplers (TCS), delta-sigma samplers (DSS) and the integrate-and-fire sampler (I&F). In this section we describe the fundamental sampling concept and review the prior work done in each of the categories.

An asynchronous level crossing ADC (LCS) [20] is the simplest kind of asynchronous ADC and is structurally similar to the synchronous flash ADC [21]. Both the LCS and the flash ADC compare the input signal to N equally spaced voltage quantization levels and report when there is a change.
A few LCS-like architectures with adaptive quantization levels have also been reported by Agarwal et al. [22] and by Guan et al. [23]. Whereas the flash ADC generates output samples at a fixed data rate, the LCS generates a sample if and only if the signal crosses a threshold level. The asynchronous


output sample times are discretized using a synchronous interpolator, and the samples are filtered to recover the signal. The synchronous successive approximation ADC has been customized for asynchronous operation by Kinniment et al. [24] and Conti et al. [25]. The asynchronous implementation eliminates bit errors resulting from metastability in the synchronous architecture. Allier et al. [13] have presented the complete design philosophy along with the implementation of a novel asynchronous successive approximation ADC using an asynchronous signaling scheme. The authors claim that the figure of merit (FOM), defined later in the chapter, of their ADC is an order of magnitude better than that of the conventional ADC. An energy-efficient ADC architecture based on the principles of successive approximation was also proposed by Yang and Sarpeshkar [6].

An improvement over Allier's ADC [13] was proposed by Akopyan et al. [26] by taking principles of delta modulation from [27] and applying them to an asynchronous level crossing ADC. In the proposed implementation, the input signal is encoded as a 1 or a 0 depending on whether the signal is increasing or decreasing, resulting in a further decrease in circuit activity. Prior knowledge of the system delay eliminates the explicit need for a time keeper. Another variation on the conventional LCS was proposed by Jungwirth and Poularikas [28]. In Jungwirth's scheme, a cosine dither is added to the input signal, resulting in reduced odd-harmonic distortion as compared to Sayiner's ADC [20]. The idea of adding dither to the signal was extended by Hurst et al. [29]: using a 30 kHz triangular dither along with a sixth-order polynomial interpolator (fclk = 500 MHz), it was shown that 10-bit resolution can be achieved with just seven comparators.
The advantage that the I&F has over the LCS comes from the fact that the I&F circuit has no quantization levels and can be implemented with just two comparators, whereas the existing LCS designs in the literature need a comparator for every quantization level.
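To make the comparator-count contrast concrete, here is a minimal software model of the general level crossing principle: a sample (time, level) is emitted whenever the input crosses one of N uniformly spaced levels, which is the behavioral equivalent of one comparator per level. This is a sketch of the general LCS idea, not of any specific published circuit, and the function name, level count and signal parameters are illustrative:

```python
import numpy as np

def level_crossing_sample(x, t, n_levels, lo, hi):
    """Emit (time, level) pairs whenever x crosses one of n_levels
    uniformly spaced quantization levels between lo and hi."""
    levels = np.linspace(lo, hi, n_levels)
    samples = []
    for k in range(1, len(x)):
        for lv in levels:
            # a crossing occurred if the level lies between consecutive points
            if (x[k - 1] - lv) * (x[k] - lv) < 0:
                samples.append((t[k], lv))
    return samples

t = np.linspace(0, 1e-3, 10_000)
x = 0.5 * np.sin(2 * np.pi * 1e3 * t)           # one period of a 1 kHz sine
out = level_crossing_sample(x, t, n_levels=8, lo=-0.45, hi=0.45)
print(len(out), "samples")                       # no samples while the signal is idle
```

Each of the 8 levels is crossed twice per period, and a quiescent input produces no output at all, mirroring the data-rate behavior of Figure 2-1.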


Another popular class of irregular samplers is the asynchronous time-varying threshold crossing sampler (TCS). The input signal is compared to a single threshold (i.e., just one quantization level) and an output sample is generated when the input signal crosses this threshold. A simple example of this is the zero crossing circuit. A time-varying threshold sampler compares the input signal to a threshold function, θ(t); examples of these threshold functions include a sinusoidal function or a triangular wave [18]. A low power neural recording system using a triangular threshold function has been demonstrated by Yin and Ghovanloo [30].

The output of the TCS is a pulse width modulated (PWM) signal, with the information encoded in the pulse widths. Pulse width modulation circuits are also known as voltage-to-frequency circuits [31] and are very well studied in communication systems. The output of the I&F circuit may seem similar to PWM; however, they are fundamentally different, since the output of the I&F is asynchronous whereas the output of the PWM is synchronous. Needless to say, this leads to different implementations as well. While the I&F circuit needs just two comparators with fixed thresholds, PWM circuits typically need an onboard oscillator for generating the time-varying threshold, and these oscillators tend to be power hogs. Another issue with varying thresholds is that the output data rate tends to be much higher as compared to the I&F. The advantage of using a PWM circuit over the I&F is that the output data can be synchronized with the oscillator clock, which leads to much simpler transmission and storage of the output samples.

Another approach to achieving high-resolution analog-to-digital conversion using pulse density principles is known as delta-sigma modulation [32]. In a delta-sigma modulator the analog input signal is integrated and compared to a predefined threshold. Once the integrated signal exceeds the threshold, an output is generated.
This digital output is converted back to an analog signal using a DAC and is subtracted from the integrated signal. More advanced versions of the feedback loop, known as higher-order sigma delta modulators, are often used to achieve a higher effective resolution. The


main strength of sigma-delta converters is that by oversampling the input signal and thereby noise shaping the quantized signal (the quantization noise is moved to a frequency band above the desired signal bandwidth), a higher effective resolution can be achieved than the actual number of quantization levels in the circuit architecture would suggest. Delta-sigma converters achieve this feat at the cost of a data rate much higher than the Nyquist rate, larger chip area, higher design complexity and much higher power consumption.

Fully asynchronous and partially asynchronous versions of the conventional sigma-delta modulator have recently gathered a lot of attention in the research community [33–36]. Fully asynchronous versions have no clocks at all, whereas the partial DSS, also popularly called continuous-time delta-sigma converters, have some form of clock, either in the feedback loop or in the feedforward quantization process [37]. While the concepts of noise shaping and quantization are fairly well understood and established for the synchronous DSS, they remain an active area of research for the asynchronous samplers. The asynchronous DSS has been studied and developed at CNEL by Dazhi Wei in his Ph.D. work [38]. As part of his dissertation work he demonstrated that the asynchronous DSS is superior to its synchronous counterpart in terms of power consumption, output data rate and simplicity of design for the same effective number of bits.

A variation of the DSS known as the delta sampler has been proposed by Culurciello et al. [15] for neural recordings. As the name suggests, there is no integration of the input signal; an output pulse is generated when the signal exceeds a predefined threshold, θ, and the input signal is then reset. Since an output sample is generated every time the input signal crosses θ and is reset to zero thereafter, this implementation can also be visualized as a very smart level crossing sampler.
The reason for this is that the output samples are generated at integer multiples of θ.
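A behavioral sketch of such a delta sampler follows from the description above: track the input relative to the last reset point and emit a signed pulse whenever the accumulated change exceeds θ. This follows the reset-and-threshold behavior described in the text, not the transistor-level design of [15]; the ramp input and θ value below are illustrative:

```python
def delta_sample(x, t, theta):
    """Delta sampler: emit a pulse (time, +/-1) whenever the input has moved
    by at least theta relative to the last reset point; the reference is
    reset after each pulse, so the signal is implicitly represented as an
    integer multiple of theta."""
    pulses, ref = [], x[0]
    for k in range(1, len(x)):
        delta = x[k] - ref
        if abs(delta) >= theta:
            pulses.append((t[k], 1 if delta > 0 else -1))
            ref = x[k]          # reset: new reference for the next pulse
    return pulses

ramp = [i / 99 for i in range(100)]          # 0 -> 1 linear ramp
pulses = delta_sample(ramp, ramp, theta=0.1)
print(len(pulses), "pulses, all positive:", all(s == 1 for _, s in pulses))
```

A monotone ramp of height 1 with θ = 0.1 produces roughly one positive pulse per θ of travel, and a constant input produces none.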


[Figure 2-2 schematic: input current Iin integrated on capacitor Cm; the capacitor voltage Vm is compared against Vth and, after a delay, reset to Vmid]
Figure 2-2. Conventional Integrate and Fire (I&F) sampler. This structure for the I&F and its variations have been used in various neuromorphic circuit designs [40–49].

2.2.1 Integrate-And-Fire (I&F) Sampler

The block level schematic of the I&F sampler proposed in [39] is shown in Figure 2-3. The input signal, Iin, is integrated on the capacitor, Cm, resulting in a voltage, Vm, which is compared against positive and negative thresholds, Vth±, using two analog comparators. When one of the two thresholds is reached, the output of the respective comparator, Vout±, goes high, generating an output pulse and resetting Vm to Vmid. The capacitor voltage is then held at Vmid for a predefined time, τr, after which the process repeats.

The I&F architecture proposed in [39] differs from some of the other implementations of the I&F [40, 41]: it uses two comparators to separately encode the positive and negative parts of the signal, whereas the popular approach is to use only one comparator. Using one comparator may be more biologically plausible, but it implies that the signal needs to be DC shifted to represent signed signals, which leads to a non-zero minimum output pulse rate. The limitation of this structure is that the input signal would have to be either above or below Vmid, implying that a DC shift would have to be added to the input signal. A DC shift would result in pulses being generated even when the input signal is zero, wasting bandwidth and power. A solution to this was proposed by Du Chen in [39]: two comparators generate separate positive and negative output data streams, hence the name Biphasic I&F. The block level diagram is shown in Figure 2-3. The positive and negative output


Figure 2-3. Biphasic Integrate and Fire data converter proposed by [39].

[Figure 2-4 plot: sine wave input and the measured output pulses from the Biphasic neuron chip; amplitude in volts (AC coupled) vs. time in seconds, showing the positive and negative channels]
Figure 2-4. Positive and negative channel pulse outputs for a sine wave input as measured from the chip output.

pulses are generated by the top and bottom comparators respectively, as shown in Figure 2-3. The top comparator has a positive threshold Vth+ and the bottom comparator has a negative threshold Vth−, and each has its respective refractory period. Figure 2-4 shows the positive and negative channel pulse outputs for a sine wave input. As is evident from Figure 2-1, the biphasic I&F circuit adjusts the output data rate as a function of the input signal amplitude.

A variation of the biphasic I&F circuit called the Time Derivative Neuron (TDN) [50] was proposed by Jie Xu at CNEL as part of his Ph.D. research. The TDN exploits the slope of the input signal to generate the output samples. An adaptive circuit was added to the biphasic I&F by Steve Yen as part of his research


work at CNEL [51]. The adaptive I&F adapts the thresholds of the comparators to further reduce the output data rate.

2.2.2 Reconstruction Algorithm

It has been shown that the original signal can be reconstructed perfectly from these asynchronous pulse trains [52–54]. The signal reconstruction is built on the framework presented by Lazar and Toth in [55], linking mathematical frame theory and reconstruction for the case of asynchronous sigma-delta converters. For completeness, the reconstruction algorithm proposed by Dazhi Wei as part of his dissertation [38] is presented below.

The signal reconstruction from the I&F pulses can be treated as a non-uniform sampling problem. It can be shown that any bandlimited signal can be expressed as a low-pass filtered version of an appropriately weighted sum of delayed impulse functions [56, 57]. Thus a signal x(t) bandlimited to [−Ωs, Ωs] can be represented as:

    x(t) = h(t) ∗ Σ_j wj δ(t − sj)
         = Σ_j wj h(t − sj)    (2–1)

where:
1. wj are scalar weights,
2. h(t) = sin(Ωs t)/(Ωs t) is the impulse response of the low pass filter,
3. ∗ denotes convolution,
4. sj are the timings of the impulse train, and
5. it is assumed that the maximum adjacent sample spacing is less than the Nyquist period T = π/Ωs.

The firing times of the pulses from the I&F circuit must satisfy the following:

    ∫_{tib}^{tie} x(t) dt = θ , ∀i    (2–2)

where θ is defined as (Vth+ · Cm) or (Vth− · Cm) depending on whether the spike was fired by the positive or the negative comparator, and tib and tie denote the beginning and the end of the integration period. Assuming that the maximum adjacent interval between the pulses is less than the Nyquist period T = π/Ωs, i.e. (t(i+1)b − tib < T, t(i+1)e − tie < T), x(t) can be expressed as in Equation 2–1. Substituting Equation 2–1 into Equation 2–2:

    θi = ∫_{tib}^{tie} x(t) dt
       = ∫_{tib}^{tie} Σ_j wj h(t − sj) dt
       = Σ_j wj ∫_{tib}^{tie} h(t − sj) dt
       = Σ_j wj cij    (2–3)

where cij are constants that can be numerically computed with:

    cij = ∫_{tib}^{tie} h(t − sj) dt    (2–4)

The resulting set of linear equations, CW = Θ, is in matrix form, where W is a column vector with wj as the jth row element, C is a square matrix with cij as the ith row and jth column element, and Θ is a column vector with θi as the ith row element. Usually C is ill-conditioned and some well-known techniques must be employed to stabilize the computation. Thus the weight vector can be estimated using:

    W = C⁻¹ Θ    (2–5)

Thus Equation 2–5 can be substituted into Equation 2–1 to numerically reconstruct the signal from the pulses. Thus x(t) can be represented as:

    x(t) = [h(t − sj)][c⁻¹ji][θi]
         = [Σ_j h(t − sj) c⁻¹ji][θi]
         = [hi(t)][θi]
         = Σ_i hi(t) θi    (2–6)

where c⁻¹ji is the jth row and ith column element of the inverse matrix C⁻¹ and

    hi(t) = Σ_j h(t − sj) c⁻¹ji    (2–7)
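The procedure in Equations 2–1 through 2–7 can be sketched end-to-end in software. The snippet below encodes a test tone with an ideal I&F (Equation 2–2), places one impulse sj at the midpoint of each integration interval, builds the cij matrix of Equation 2–4 by numerical integration, and solves for the weights with a least-squares solve, which stabilizes the ill-conditioned C. The midpoint choice for sj, the integration grid, and the least-squares solver are illustrative choices, not the specific techniques used in [38]:

```python
import numpy as np

def if_encode(x, t, theta):
    """Ideal biphasic I&F (Equation 2-2): emit a spike each time the running
    integral of x reaches +theta or -theta; returns (t_begin, t_end, signed theta)."""
    spikes, acc, t_begin = [], 0.0, t[0]
    dt = t[1] - t[0]
    for k in range(1, len(t)):
        acc += x[k] * dt
        if abs(acc) >= theta:
            spikes.append((t_begin, t[k], np.sign(acc) * theta))
            acc, t_begin = 0.0, t[k]
    return spikes

def if_decode(spikes, t_rec, omega_s):
    """Reconstruction via Equations 2-3 to 2-7: solve C w = Theta, then
    x(t) = sum_j w_j h(t - s_j) with h(u) = sin(omega_s*u)/(omega_s*u)."""
    h = lambda u: np.sinc(omega_s * u / np.pi)   # np.sinc(v) = sin(pi v)/(pi v)
    s = np.array([(tb + te) / 2 for tb, te, _ in spikes])   # impulse timings s_j
    theta = np.array([th for _, _, th in spikes])
    C = np.empty((len(spikes), len(s)))
    for i, (tb, te, _) in enumerate(spikes):
        grid = np.linspace(tb, te, 64)
        dg = grid[1] - grid[0]
        # c_ij ~ integral over [t_ib, t_ie] of h(t - s_j) dt  (Equation 2-4)
        C[i] = h(grid[:, None] - s[None, :]).sum(axis=0) * dg
    w, *_ = np.linalg.lstsq(C, theta, rcond=None)   # least squares stabilizes C
    return np.sum(w[None, :] * h(t_rec[:, None] - s[None, :]), axis=1)

t = np.linspace(0, 0.1, 10_000)
x = np.sin(2 * np.pi * 50 * t)                   # 50 Hz test tone
spikes = if_encode(x, t, theta=0.001)
x_hat = if_decode(spikes, t, omega_s=2 * np.pi * 75)
mid = slice(len(t) // 5, -(len(t) // 5))         # ignore edge effects
rms_err = np.sqrt(np.mean((x[mid] - x_hat[mid]) ** 2))
print(f"{len(spikes)} spikes, mid-signal RMS error {rms_err:.3f}")
```

With θ chosen so that the inter-spike intervals stay below the Nyquist period T = π/Ωs, the mid-signal reconstruction error is small; near the record edges, where the sinc expansion is truncated, the error grows, which is why the edges are excluded above.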

2.3 Signal-To-Noise Ratio

Although analytically we have shown that the signal can be recovered perfectly under certain constraints, there are some implementation issues that degrade the recovered signal. In order to quantify the quality of the recovered signal we use the signal-to-noise ratio (SNR) as the performance metric. SNR is defined as:

    SNR = 10 log (Signal Power / Noise Power)    (2–8)

One of the first implementation issues is that storing or processing these asynchronous samples in digital format requires that the sample times be quantized. Time quantization implies that the output sample times will no longer exactly satisfy Equation 2–2. The resulting jitter in the sample times, or equivalently in θi, depends on the input signal and the quantization clock, leading to a degradation in the SNR. Taking an approach similar to the one presented in [12, 13], we estimate the SNR as a function of the time quantization and the signal properties.

In a conventional ADC, the noise power is easily determined since we have direct access to the estimated sample as well as to the original value. In the case of level crossing samplers or the I&F, we do not have direct access to the error at


each sample, and therefore it is not as straightforward. In this case, the error at each sample point is given as the difference between the original function and the recovered signal. The timing error due to the quantization is assumed to be a uniformly distributed random variable in the interval [−Tclk/2, Tclk/2], where Tclk is the clock period of the time quantizer. Further assume that two consecutive samples at times ti, ti+1 are quantized to t̂i = (ti + δi) and t̂i+1 = (ti+1 + δi+1). The error due to quantization can be expressed as an error in estimating θi in the reconstruction equation:

    e(tk) = x(tk) − x̂(tk)    (2–9)
          = Σ_i gi(tk)(θi − θ̂i)    (2–10)

where x(tk) and x̂(tk) are the original and the reconstructed signals respectively, and tk corresponds to a specific sample point. We assume the signal is constant during the time intervals {ti, t̂i} and {ti+1, t̂i+1}. Therefore the difference θi − θ̂i can be written as [δi x(ti) + δi+1 x(ti+1)]. The variance of the error, or quantization noise, can then be defined by:

    Pquant = E{ [Σ_i gi(t)(θi − θ̂i)]² }
           = E{ [Σ_i gi(t)(δi x(ti) + δi+1 x(ti+1))]² }    (2–11)

Since the error is also determined by the evaluation of gi(t) and x(t), the expectation is over gi(t), x(t) and δi, all of which we assume to be independent of one another. Furthermore, {δi, δi+1} are zero-mean independent random variables uniformly distributed in the interval [−Tclk/2, Tclk/2] and E{gi gi+1} = 0. Equation 2–11 can


be rearranged and simplified:

    Pquant = Σ_i E{gi²(t)} (E{δi²} E{x²(ti)} + E{δi+1²} E{x²(ti+1)})
           = (1/6) Σ_i E{gi²(t)} (Tclk² Ps)    (2–12)

Note that the power of the signal can be written as:

    Ps = E{x²(t)} = Σ_i E{gi²(t) θi²}    (2–13)

The signal-to-noise ratio (SNR) can then be written as follows, where we have assumed |θi| = θ for all i:

    SNRquant = 10 log( Ps / ((1/6) Σ_i E{gi²(t)} Tclk² Ps) )
             = 10 log( 6θ² / (Tclk² Ps) )    (2–14)

Thus for x(t) = A sin(ωin t), Equation 2–14 can be rewritten as:

    SNRquant = 10 log( 12θ² / (A² Tclk²) )    (2–15)
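Equation 2–15 is cheap to evaluate numerically. Using the parameters of the Figure 2-5 experiment (θ = Vth · Cm = 0.4 V × 20 pF, amplitude A = 750 nA), the quantization-limited SNR can be tabulated for several clock periods:

```python
import math

def snr_quant_db(theta, amplitude, t_clk):
    """Equation 2-15: quantization-limited SNR of the I&F sampler for a
    sinusoidal input x(t) = A*sin(w*t)."""
    return 10 * math.log10(12 * theta**2 / (amplitude**2 * t_clk**2))

theta = 0.4 * 20e-12          # Vth * Cm, in coulombs (Figure 2-5 parameters)
A = 750e-9                    # input amplitude, in amperes
for t_clk in (1e-9, 1e-8, 1e-7, 1e-6):
    print(f"Tclk = {t_clk:.0e} s -> SNR = {snr_quant_db(theta, A, t_clk):.1f} dB")
```

Since the SNR scales as 1/Tclk², each decade of clock period costs exactly 20 dB, which is the slope visible in Figure 2-5 before the simulation plateaus.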

In summary, Equation 2–14 gives the SNR of the I&F sampler as a function of the quantization clock and the input signal, under the assumption that θ and τr are precisely known and do not vary with time. The variation of the SNR with Tclk for a simulated I&F sampler, together with Equation 2–15, is shown in Figure 2-5. The SNR for the simulated I&F follows Equation 2–15 and plateaus near 100 dB due to the limited numerical precision of the machine. The slight difference between the simulated I&F and Equation 2–15 at larger clock periods stems from the assumption that the errors {δi, δi+1} are independent.

2.3.1 Performance Metrics

SNR is one of several metrics for evaluating the performance of the sampler. However, synchronous ADCs have traditionally been evaluated on the basis of the


[Figure 2-5 plot: SNR (dB) vs. Tclk (sec), comparing a Matlab simulation of the I&F sampler with time quantization against the numerical evaluation of Equation 2–15]

Figure 2-5. SNR vs. the clock period used for quantizing the sample times. The SNR above is calculated for a 750 nA · sin(2000πt) input with the threshold set at 0.4 V and Cm = 20 pF.

effective number of bits (ENOB). The ENOB represents the effective resolution of a conventional ADC [58]:

    ENOB = (SINAD − 1.76) / 6.02    (2–16)

where SINAD is the measured SNR plus distortion and includes all the non-ideal effects of a practical converter. Since the I&F sampler does not have fixed quantization levels, and we are quantifying its performance in terms of a conventional ADC, we use Equation 2–16 for converting between the SNR of the reconstructed signal and the ENOB.

The ENOB is a function of both amplitude and frequency [59]. With a fixed number of quantization levels, the ENOB of a conventional ADC is limited by the minimum quantization noise for low-amplitude signals and by distortion at higher amplitudes. For example, if an input signal smaller than the least significant bit (LSB) were given to an ADC, the output would be zero. Similarly, if the input signal exceeded the maximum quantization level, the ADC output would saturate and lead to distortion. Since there


are no quantization levels in the I&F, a smaller-amplitude signal simply produces fewer pulses and the signal can still be recovered. Similarly, a higher-amplitude signal means a higher pulse rate, which means a much better recovery of the signal. Also, Vth± and τr can be varied to manipulate the pulse rate according to the input signal characteristics; for example, for a low-amplitude signal, Vth± and τr can be lowered to attain a higher pulse rate.

A more comprehensive measure of synchronous ADC performance is the figure of merit (FOM), defined as [58]:

    FOM = Power / (2^ENOB × min(fsample, 2 · BW))    (2–17)

where fsample is the sampling rate and BW is the signal bandwidth; the minimum of the two is taken to accommodate oversampling data converters. Unlike conventional ADCs, asynchronous ADCs do not generate samples at a fixed rate, hence we use BW instead of fsample in estimating the FOM for the I&F sampler.

2.4 Summary

In this chapter an overview of the work done in the field of asynchronous data converters was presented, along with the prior work done on the I&F and a review of the reconstruction algorithm. An analytical model was developed for analyzing the effect of time quantization of the output samples, and the model was verified against measured results. Existing performance metrics for ADCs were also discussed.


CHAPTER 3
LOW POWER INTEGRATE-AND-FIRE CIRCUIT

In this chapter the circuit implementation of the I&F circuit is presented. Various comparator architectures are explored and analyzed in terms of the speed-resolution trade-off. Sources of power dissipation in the I&F circuit are also analyzed, and a methodology for reducing the power consumption is presented, along with some novel low power circuit designs. Towards the end of the chapter, simulated and measured results are presented, followed by a brief summary and proposed future work.

3.1 Circuit Design

As shown in Figure 2-3, the I&F circuit consists of just comparators, delay elements and reset transistors. As discussed in Chapter 2, the key motivation for using asynchronous samplers is lower power consumption as compared to conventional samplers. As we will see later in this chapter, the major sources of power consumption in the I&F sampler are the comparators; thus, in this section the comparator design is discussed in detail, along with the speed-resolution trade-off of existing comparator topologies.

3.1.1 Comparator

The comparator is used to detect whether the voltage on the capacitor, Vm, is larger or smaller than the threshold Vth, and to represent the outcome as a logic 0 (if Vm < Vth) or a logic 1 (if Vm > Vth). The input signal is integrated on the capacitor; when the integrated signal becomes greater than the reference value, the comparator output goes to logic 1. The capacitor is then reset after some delay, called the refractory period, the comparator output returns to logic 0, and the process repeats.

In the ideal case, the comparison takes place at the precise threshold voltage and the output rises instantaneously. For a real comparator, however, the comparison takes place at a voltage slightly higher or lower than the set threshold value due to fabrication-process mismatches, resulting in a fixed offset. Also, a real comparator has finite gain and bandwidth, so the output rises only after some delay, known


as the propagation delay. The choice of the comparator design rests on how offset, speed and power consumption affect the end application [60]. A fixed or systematic offset can be accounted for in the reconstruction algorithm by using the actual threshold value, i.e. Vth ± offset, in Equation 2–5. The propagation delay, on the other hand, depends on the input signal, manifesting itself as a signal-dependent jitter or threshold variation [61]. This signal-dependent reference variation is hard to estimate and correct for in the reconstruction algorithm since it is not known beforehand¹. The I&F encodes the signal information in time; therefore, signal-dependent timing jitter shows up as an error in the reconstructed signal. Hence it is essential that the comparator be designed for minimum signal-dependent jitter.

3.1.1.1 Comparator topology

The speed of the comparator is influenced by design parameters such as transistor sizes and the bias current, and it also depends on the circuit topology [62]; hence it is essential to examine comparator topologies in detail. Comparators are essentially voltage-gain circuits which amplify the differential input voltage (Vin − Vth) to give an output of logic one or logic zero. Thus any op-amp architecture can be used as a comparator; however, the resolution of a single op-amp is limited by the static gain of the amplifier [63]. The gain can be increased by cascading a number of these op-amps, resulting in a multiplicative increase in the overall gain. Another class of comparators employs positive feedback to build a very fast and very high gain around the trip point (Vin − Vth ≈ 0). Figures 3-1, 3-3 and 3-2 show the schematics and the corresponding small-signal models of a single voltage amplifier, a cascade of amplifiers, and a latch-based positive feedback comparator. The general speed-resolution tradeoff

¹ We are exploring an iterative scheme of repeatedly reconstructing and estimating the delay.

for each of the comparator architectures is summarized below; more details can be found in [62–64]:

Single voltage amplifier:

    Tp/τc ≈ Voh/∆in    (3–1)

N identical voltage amplifiers:

    Tp/τc ≈ (N! · Voh/∆in)^(1/N)    (3–2)

Latch-based positive feedback comparator:

    Tp/τc ≈ log( (Voh · gmlatch) / (∆in · gmin) )    (3–3)

where:
1. Tp is the propagation delay of the comparator.
2. τc = 1/ωc, where ωc is the −3 dB frequency of the comparator and is given by gmin/Co.
3. Tp/τc is the normalized delay.
4. ∆in is the step input to the comparator, given by Vin(t) − Vth.
5. Voh is the output voltage level of logic '1'.
6. In order to simplify the result in Equation 3–1 it is assumed that (gmin · rout) · ∆in ≫ Voh (gmin · rout is the static gain of the amplifier).
7. In order to simplify the result in Equation 3–2 it is assumed that (gm · rout)^N · ∆in ≫ Voh (gm · rout is the static gain of a single amplifier).
8. In Equation 3–3, gmlatch is the transconductance of the cross-coupled transistor M5 shown in Figure 3-2.
9. In Equation 3–3 it is assumed that gmlatch ≫ gmdiode, where gmdiode is the transconductance of the diode-connected transistor M3 shown in Figure 3-2.
10. In Figure 3-2 the transistor pairs M1 − M2, M3 − M4 and M5 − M6 are assumed to be matched.
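Equations 3–1 through 3–3 can be evaluated directly to reproduce the qualitative ordering of Figure 3-5. In the sketch below, the natural logarithm and the Voh = 1.65 V output level (an inverter trip voltage, per the footnote in the text) are assumptions, and the input-step value is illustrative:

```python
import math

V_OH = 1.65   # output logic-1 level; assumes an inverter trip voltage of 1.65 V

def delay_single(d_in):
    """Equation 3-1: normalized delay Tp/tau_c of a single voltage amplifier."""
    return V_OH / d_in

def delay_cascade(d_in, n):
    """Equation 3-2: normalized delay of N identical cascaded amplifiers."""
    return (math.factorial(n) * V_OH / d_in) ** (1.0 / n)

def delay_latch(d_in, gm_ratio):
    """Equation 3-3: normalized delay of the latch-based comparator,
    with gm_ratio = gmlatch/gmin (natural log assumed)."""
    return math.log(V_OH * gm_ratio / d_in)

d_in = 10e-3    # 10 mV input step (illustrative)
print(f"single amplifier : {delay_single(d_in):7.1f}")
print(f"cascade, N = 6   : {delay_cascade(d_in, 6):7.1f}")
print(f"latch, ratio 0.5 : {delay_latch(d_in, 0.5):7.1f}")
```

As the equations predict, for the same bias current the single amplifier is the slowest and the latch the fastest, and lowering gmlatch/gmin speeds the latch up further.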

Figure 3-1. Single voltage amplifier and the corresponding small-signal model.

Figure 3-2. Latch-based positive feedback comparator and its equivalent small signal model. Transistor pairs M1 − M2 , M3 − M4 , M5 − M6 are identical hence gm1 = gm2 ; gm3 = gm4 ; gm5 = gm6 around the transition point.



Figure 3-3. A cascade of N identical single-stage amplifiers is used for a more favorable speed-resolution tradeoff.

The speed-resolution tradeoff is summarized in Figure 3-5 for a Voh value of 1.65 V¹. Figure 3-5 shows the propagation delay normalized with respect to the time constant of the amplifier, Tp/τc, vs. the step input voltage, Vin(t) − Vth, for the above-mentioned comparator topologies. As is evident from the equations, the single amplifier has the worst normalized delay; multiple amplifiers provide a more favorable speed-resolution tradeoff, albeit at the cost of increased power and area. Further analysis of the speed-resolution tradeoff presented in [60, 62] shows that the optimum number of amplifiers in the cascade architecture, N, varies with the value of the step input. The latch-based comparators, on the other hand, present the best speed-resolution trade-off among the three. Figure 3-5 also shows the normalized delay of the latch-based comparator vs. the step input voltage for different values of the ratio gmlatch/gmin: the lower the value of the ratio, the faster the comparator. A more detailed discussion, analysis and derivation of the above results on various comparator architectures can be found in [62, 64, 65].

The outputs from the latch need to be converted to correct logic levels, so an output stage is required. Figure 3-4 shows the complete schematic of the latch-based comparator with the output stage.

¹ Assuming that the comparator drives an inverter with a trip voltage of 1.65 V.

Figure 3-4. Latch-based positive feedback comparator with output stage. Transistors M7 − M10 are needed to convert the output of the latch to equivalent logic level voltages.

[Figure 3-5 plot: normalized delay (log scale) vs. Vinput (mV) for the single-step comparator, multi-step comparators with N = 2, 6, 10, and latch-based comparators with gmlatch/gmin = 0.1, 0.3, 0.5, 0.7]

Figure 3-5. Normalized comparator delay vs. step input size in mV for different comparator topologies with the same bias current.


3.1.1.2 Hysteresis

In Equation 3–3 it was assumed that gmlatch ≫ gmdiode (in Figures 3-2 and 3-4, gm5 ≫ gm3); this results in stronger positive feedback around the trip point, thereby increasing the transition slope of the comparator. However, it also introduces hysteresis in the comparator response. The response of a normal comparator without hysteresis can be summarized as:

    Vout = Logic 0    for Vin(t) < Vth
         = Logic 1    for Vin(t) ≥ Vth    (3–4)

The response of a comparator with hysteresis is:

    Vout = Logic 0                        for Vin(t) < (Vref − Vtrip−)
         = preserves the previous output  for (Vref − Vtrip−) < Vin(t) < (Vref + Vtrip+)
         = Logic 1                        for Vin(t) > (Vref + Vtrip+)    (3–5)

Using the square-law model, for the circuit schematic in Figure 3-2, the expressions for Vtrip− and Vtrip+ can be written as:

    Vtrip− = Vtrip+ = (2i1/β1)^(1/2) + VT1 − (2i2/β2)^(1/2) − VT2    (3–6)

Using the EKV model [66] and the design equations presented in [67], Vtrip− and Vtrip+ can be expressed as:

    Vtrip− = Vtrip+ = 2nuT log(e^√IC1 − 1) + VT1 − 2nuT log(e^√IC2 − 1) − VT2    (3–7)

where : 1.

(i1 , VT 1 ) and (i2 , VT 2 ) are the small signal current and threshold voltage of transistors M1 and M2 respectively. i1 and i2 can be written as: Ibias 1 + [(W /L)5 /(W /L)3 ] Ibias = Ibias − 1 + [(W /L)5 /(W /L)3 ]

i1 = i2

42

(3–8)

Figure 3-6. Measured Vtrip+ and Equation 3–7 vs. Ibias. (W/L)5 = 6.75 µm/3 µm, (W/L)3 = 3.0 µm/3.0 µm, (W/L)2 = 9 µm/3 µm.

2. n = (COX + CDEP)/COX is the capacitive division between gate, surface and body, where COX is the gate oxide capacitance per unit area and CDEP is the depletion capacitance per unit area. 1/n is also known as the subthreshold slope factor.

3. uT is the thermal voltage, given by kT/q, and approximately equals 25 mV at room temperature.

4. IC1 and IC2 are the inversion coefficients of transistors M1 and M2 respectively, defined as:

IC1 = i1/(2.8 µCOX (W/L)1 uT²) = i1/(Ion (W/L)1)
IC2 = i2/(2.8 µCOX (W/L)2 uT²) = i2/(Ion (W/L)2)   (3–9)

Figure 3-6 compares the measured Vtrip+ with Equation 3–7 as a function of Ibias. There is an approximate scaling factor of 1.25 between the measurement and the hand calculation; if 1.4 is used instead of 2.8 in the denominator of Equation 3–9, the two curves align. Since the comparison happens at a voltage different from Vth, in the


Figure 3-7. Two possible implementations of the refractory component. The refractory period can be either on the rising edge or on the falling edge of the output voltage.

reconstruction algorithm Vth + Vtrip+ is used instead of Vth. Since Vtrip+ depends only on the bias current and the transistor geometries, it can be estimated beforehand and accounted for in the reconstruction algorithm. As long as the membrane voltage is reset at a very fast rate and Vmid < Vth − Vtrip−, a pulse will not have any effect on the preceding pulse. Note: from this point onwards, Vth± is assumed to include the hysteresis voltage.

3.1.2 Refractory Component

The delay element serves the purpose of reducing the bandwidth; it does so by keeping the capacitor voltage at Vmid for a pre-determined time after the comparator has fired a pulse. The circuit implementation of the delay element is shown in Figure 3-7. The circuit was proposed in [68] and has been used successfully for limiting the pulse rate of the I&F circuit [52, 69] and in other neuron implementations [40]. It is essentially an asymmetric current-starved inverter. The circuit operates like a normal inverter for the high-to-low input transition, with M1 fully turned on and charging the output node; for a low-to-high input transition, however, the current through transistor M2 is limited by the current through transistor M3, which is set by the voltage Vdelay. This limited current delays the discharge of the output node, resulting in an extended pulse width. The delay can be controlled by appropriately changing Vdelay and can be approximated as:

τfall = Cload · Vdd / ((1/2) µCox (W3/L3) Vdelay²)   (3–10)

Figure 3-8. The complete schematic of the positive channel of the I&F circuit. The negative channel is symmetric to the positive channel and is not shown, in order to simplify the schematic. Select parasitic capacitances are shown in dotted lines.
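The refractory fall delay of Equation 3–10 can be evaluated directly. A minimal sketch, with all component values as illustrative assumptions rather than values from the fabricated circuit:

```python
def refractory_fall_delay(c_load, v_dd, mu_cox, w_over_l, v_delay):
    """Fall delay of the current-starved inverter, Equation 3-10:
    tau_fall = C_load * V_dd / (0.5 * mu_Cox * (W3/L3) * V_delay**2).
    The square-law current through M3 starves the discharge of the
    output node, stretching the pulse width."""
    i_starve = 0.5 * mu_cox * w_over_l * v_delay ** 2
    return c_load * v_dd / i_starve

# Illustrative numbers: 100 fF load, 3.3 V supply, mu*Cox = 100 uA/V^2,
# W3/L3 = 1, Vdelay = 0.7 V. Halving Vdelay quadruples the delay.
tau = refractory_fall_delay(100e-15, 3.3, 100e-6, 1.0, 0.7)
```

The quadratic dependence on Vdelay is what makes the refractory period widely tunable with a single bias voltage.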

The complete schematic of the positive channel of the I&F circuit is shown in Figure 3-8. The circuit differs from the previously proposed biphasic I&F circuit [52, 70, 71]. In addition to the change in the comparator schematic, the OR gate in the reset loop has been removed and the position of the refractory component has been changed. The valid output states for the positive and negative channels of the biphasic I&F circuit are {00, 10, 01}, since the two channels are independent by design and by definition; an OR operation on the outputs of the two channels is therefore not required and has been removed from the design proposed in [52, 70, 71]. The second difference is that the refractory component was previously in the 'ON' period of


the output pulse, which implied that the refractory component would drive an inverter, increasing the short-circuit power of that inverter. The refractory component has therefore been placed such that the refractory period falls in the 'OFF' period.

3.2 Comparator Delay

In this section the delay in generating a spike due to the finite gain of the comparator is analyzed, along with its impact on the recovered signal. First the comparator delay is modeled; then its impact on reconstruction is analyzed by building an analytical model of the effect of delay on the signal-to-noise ratio (SNR). The model is verified against circuit simulations from Cadence. Figure 3-4 shows the schematic of the comparator. The propagation delay, Tpd, is defined as the time taken for the low-to-high transition at the output of the I&F, i.e. the time it takes the I&F to generate a pulse once Vmem ≥ Vth. In the equations below, subscripts HL and LH indicate high-to-low and low-to-high transitions respectively. Tpd can be expressed as the sum of the individual delays (a more accurate but more complicated analysis would calculate the combined delay using transfer functions of the individual circuit blocks):

Tpd = Tcomp + Tinv1,LH + Tinv2,HL + Tref,LH   (3–11)

where:
1. Tcomp is the time it takes the comparator (transistors M1-M11 in Figure 3-8) to resolve the differential input. It can be estimated by first writing an expression for Voc2 (Figure 3-8) in terms of Vin and Vth, and then solving for the time it takes Voc2 to reach the inverter trip voltage:

Voc2 ≈ (gm1/gm5)(Vin − Vth) e^(gm5·t/Co1) · [−gm7·Rout·(1 − e^(−t/(Rout·Coc2)))]   (3–12)

where:
(a) Rout is the equivalent output resistance at node Voc2, given by r8 ∥ r10, where r8 and r10 are the small-signal output resistances of transistors M8 and M10.
(b) Co1 is the parasitic capacitance at node Vo1 and can be estimated as:

Co1 = 2Cgd2 + Cdb2 + Cgs4 + Cdb4 + Cgs8 + 2Cgd8 + 2Cgd6 + Cgs6 + 2Cgd5 + Cdb5
    ≈ Cgol(2W2 + W4 + 3W8 + 3W6 + 2W5) + Cjb(W2L2 + W4L4 + W5L5)
      + 2Cjbsw((L2 + W2) + (L4 + W4) + (L5 + W5)) + (2COX/3)(W4L4 + W8L8 + W6L6)   (3–13)

(c) Coc2 is the parasitic capacitance at node Voc2 and is given by:

Coc2 = Cdb8 + 2Cgd8 + 2Cgd10 + Cdb10 + Cgs12 + 2Cgd12 + Cgb12 + Cgs13 + 2Cgd13
     ≈ Cgol(W10L10 + W12L12) + Cjb(W8L8 + W10L10 + W12L12)
       + Cjbsw(2(W8 + L8) + 2(W10 + L10) + 2(W12 + L12)) + (2COX/3)(W12L12 + W13L13)   (3–14)

(d) gm1, gm5 and gm7 are the transconductances of transistors M1, M5 and M7 respectively. Using the EKV equations from [72, 73], the transconductance can be expressed as:

gmi = (Ii/(n·UT)) · (1 − e^(−√ICi))/√ICi   ∀ i ∈ {1, 5, 7}   (3–15)

Ii and ICi are the drain current and the inversion coefficient of the i-th transistor; the inversion coefficient is defined by Equation 3–9. The drain current of transistors M1 and M5 equals Ibias/2. In the absence of a fixed bias current through transistors M7-M10, I7 is calculated using Vo1 around the trip point: the effective Vo1 is taken mid-way between the voltage resulting when the entire bias current flows through transistor M3 and the voltage resulting when the entire bias current flows through M4 (i.e. no current flows through M3).

Using a Taylor series expansion for the exponential terms in Equation 3–12, keeping only the first three significant terms of each expansion and solving the resulting quadratic equation gives the following expression for Tcomp:

Tcomp ≈ (τ1τ2/(2(τ2 − τ1))) · [−1 + (1 − 4·((τ2 − τ1)/τ1)·(Voc2/(Ks(Vin − Vth))))^(1/2)]   (3–16)

(a) τ1 = Co1/gm5

Figure 3-9. Measured Tcomp and Equations 3–16, 3–17 and 3–18 vs. Ibias. The comparator overdrive voltage was set at 1 mV, i.e. Vin − Vth = 1 mV.

(b) τ2 = Rout·Coc2
(c) Ks = (gm7·Rout·gm1)/gm5

Equation 3–16 is too complicated to use in further analysis. It can be simplified by realizing that τ2 ≫ τ1, since for a CMOS device 1/gds ≫ 1/gm. Approximating τ2 − τ1 ≈ τ2, Equation 3–16 simplifies to:

Tcomp ≈ (1/2)·(−τ1 + (τ1² − 4τ1τ2·Voc2/(Ks(Vin − Vth)))^(1/2))   (3–17)
      ≈ (1/2)·(τ1² − 4τ1τ2·Voc2/(Ks(Vin − Vth)))^(1/2)   (3–18)
      ≈ (τ1τ2·Voc2/(Ks(Vin − Vth)))^(1/2)   (3–19)

Figure 3-9 compares Equations 3–16 through 3–19 and the measured delay versus the bias current for a 1 mV overdrive voltage (Vin − Vth = 1 mV). It is evident from the figure that Equation 3–18 is a reasonable approximation for the comparator delay.

2.

Tinv1,LH, Tinv2,HL and Tref,LH are the corresponding propagation delays of the 1st inverter (transistors M12-M13), the 2nd inverter (transistors M14-M15) and the refractory element (transistors M16-M18) respectively. Using the principles of logical effort [74, 75], the combined delay of the inverters and the output stage can be approximated as:

Tinv1,LH + Tinv2,HL + Tref,LH ≈ (0.2Lmin/2) · (W19/(W12 + W13) + 3)   (3–20)

(a) 0.2Lmin/2 is the technology delay in picoseconds, where Lmin is the minimum channel length in nanometers [74].

Note: It is evident from Equations 3–18 and 3–20 that the dominant delay is the comparator delay.

Similar to the above analysis, the I&F takes time to reset, resulting in a finite pulse width, Tpw:

Tpw = Tdischarge + Tcomp + Tinv1,HL + Tinv2,LH + Tref,HL   (3–21)

where:
1. Tdischarge is the time taken to discharge the membrane capacitor, Cm, and can be approximated as:

Tdischarge = W19·Ron,nmos·(Cm + Cp)·log(Vth/Vmid)   (3–22)

(a) Ron,nmos is the "ON" resistance per unit length of an NMOS transistor for a given technology.
(b) Cp is the parasitic capacitance in parallel with the membrane capacitance Cm.

2. The expression for Tcomp is similar to the delay expressed in Equation 3–16. The delay associated with the inverters, Tinv1,HL + Tinv2,LH, can also be estimated using Equation 3–20.

3. Tref,HL is given by Equation 3–10.
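The pulse-width terms above can be checked numerically: the sketch below evaluates the discharge time of Equation 3–22 and verifies that the simplified comparator delay of Equation 3–18 tracks the full quadratic of Equation 3–16 when τ2 ≫ τ1. All numeric values are illustrative assumptions, not values from the fabricated chip:

```python
import math

def t_discharge(w19_ron, c_m, c_p, v_th, v_mid):
    """Membrane discharge time, Equation 3-22:
    Tdischarge = W19*Ron_nmos * (Cm + Cp) * log(Vth / Vmid)."""
    return w19_ron * (c_m + c_p) * math.log(v_th / v_mid)

def t_comp_full(tau1, tau2, x):
    """Comparator delay, Equation 3-16, with x = Voc2 / (Ks*(Vin - Vth))."""
    return (tau1 * tau2 / (2.0 * (tau2 - tau1))) * (
        -1.0 + math.sqrt(1.0 - 4.0 * (tau2 - tau1) / tau1 * x))

def t_comp_simple(tau1, tau2, x):
    """Simplified comparator delay, Equation 3-18."""
    return 0.5 * math.sqrt(tau1 ** 2 - 4.0 * tau1 * tau2 * x)

# Illustrative values: tau1 = 100 ns, tau2 = 100 us (tau2 >> tau1),
# x = -1.65 (Voc2 and Ks have opposite signs around the trip point).
tau1, tau2, x = 1e-7, 1e-4, -1.65
full = t_comp_full(tau1, tau2, x)
simple = t_comp_simple(tau1, tau2, x)
```

With these values the two expressions agree to within a few percent, consistent with Equation 3–18 being a reasonable approximation in Figure 3-9.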

Note: In the absence of a refractory period, the minimum pulse width is theoretically given by the sum of the membrane capacitor discharge time, the comparator delay and the inverter delay. However, for a practical circuit implementation without a refractory period, as soon as the capacitor is discharged just enough that Vin is slightly lower than Vth, Vreset goes to zero before the capacitor has the chance of discharging


completely to Vmid . Hence the refractory period has to be greater than the discharge time of the capacitor given by Equation 3–22. 3.3 Power Consumption One of the chief motivations of using asynchronous ADCs is lower power consumption as compared to the conventional ADCs. In this section various approaches towards reducing the power are presented. First, the sources of power consumption are analyzed followed by circuit approaches towards reducing power consumption. 3.3.1 Sources of Power Consumption The total power consumption of the I&F circuit can be classified into two parts: static and dynamic power consumption. The dynamic power is defined as the power consumed during the generation of the pulses. The static power is the base power consumed whether or not the I&F circuit is generating the pulses. Figure 3-10 shows the power consumption of the I&F circuit for two consecutive pulses and the time between them. The output stage and the inverters (sub-graph 3 and 4) consume negligible power in between pulses but have higher power consumption during the pulses, whereas the latch consumes power irrespective of the presence or absence of the pulse. The power consumed by the output stage and the inverters is high during the pulse edges, however this power is consumed for a very small duration of time (hundreds of nano-seconds) making the overall energy consumed by these blocks low as compared to the comparator. The main advantage of asynchronous ADCs over the conventional ADCs is their lower output data rates. As shown in Figure 3-10, if the time between the pulses were to be extended, thereby lowering the average data rate, the power consumption per pulse for the comparator output stage and the inverters would remain unaffected since they consume power only during the pulse edges, however the overall power/pulse of the comparator latch would dominate the overall power consumption. 
This implies that even if the overall data rate of the I&F were reduced, the overall energy



Figure 3-10. Simulated block-wise power consumption of the I&F for two consecutive pulses and the inter-pulse interval. The top graph shows the pulse output, the second graph shows the power consumption of the latch for the positive and negative channels (transistors M1-M6 in Figure 3-8), the third graph shows the power consumption of the output stage (transistors M7-M10 in Figure 3-8) and the last graph shows the power consumption of the inverters (transistors M12-M18 in Figure 3-8). For this graph the comparator bias current was arbitrarily chosen as 1 µA.


consumption per pulse would increase; in other words, at lower data rates more energy is wasted in the background than in generating the pulses. Figure 3-11 shows the energy consumption per pulse of the different circuit blocks as a function of the average pulse rate. As mentioned previously, the output stage and inverter energies remain relatively constant. The latch, however, becomes the dominant source of power consumption at lower pulse rates; at higher pulse rates the comparator is idle less of the time, its power utilization is high, and the energy consumed per pulse by the latch is low. As asynchronous ADCs get better at lowering output data rates, the static power will cap the power savings. Since one of the motivations for a time-based encoding scheme is low-power applications, and the comparator bias current is the dominant factor in determining the power consumption, it is essential that the value of the bias current be chosen carefully. In summary, the dominant source of power consumption is the comparator, since its bias current flows whether or not a pulse is being generated. Also, from Section 3.2, Equations 3–11 through 3–19, it can be inferred that the comparator delay (and hence the SER) is a strong function of the bias current. Owing to this dependence of both the SER and the power consumption on the bias current, their relationship needs to be explored in detail.

3.3.2 Optimum Comparator Bias Current

As discussed in the previous section, the value of the bias current must be chosen carefully. From Equation 3–18 it is evident that the delay depends on the input signal rate. This input-signal dependence is detrimental not just for the I&F system but for ADCs in general (chapter 11 in [58]). This is best explained by considering a sine wave applied as input to the I&F: since the slope varies as a function of time, the delay also varies as a function of time. This slope-dependent delay results in samples shifted in time compared to an ideal comparator. This shift of pulses can act like a domino effect, lowering the SER, and would be very



Figure 3-11. Measured energy consumption of the comparator latch, output stage and inverters as a function of pulse rate. To illustrate the energy consumption as a function of pulse rate, the comparator bias current was arbitrarily chosen as 1 µA.

difficult to estimate and compensate. This is illustrated by Figure 3-12. The top graph shows the capacitor voltage for one period of a 1 kHz sine wave and the bottom graph shows the capacitor voltage zoomed in around the reference voltage. If the comparator is slower than the input signal around the reference voltage, then by the time a pulse is generated and the capacitor voltage is reset, the input signal has already become larger than the reference voltage. As shown in the bottom graph, the delay can be equivalently visualized as an input-signal-dependent shift in the reference voltage. Lower bias currents lead to a larger deviation of the effective reference voltage from the predefined reference voltage. The reconstruction algorithm uses the predefined



Figure 3-12. Time-dependent threshold variation for a sine wave input. The top graph shows the simulated capacitor voltage of the I&F circuit for a 1 kHz sine wave input. The bottom graph shows the capacitor voltage from the top graph, zoomed in around the comparator reference. Note the time-dependent delay.

comparator reference voltage for recovering the signal. However, as inferred from Figure 3-12, the actual threshold varies with the input signal, resulting in SER degradation. The measured SER vs. the comparator bias current is shown in Figure 3-13. The three curves saturate around 500 nA; to be on the safe side, a bias current of 750 nA was chosen. In the test setup, the hysteresis voltage for different bias currents was estimated using Equation 3–7, whereas the refractory period was measured. To summarize, the signal-dependent variation in the comparator reference is the chief reason for SER degradation. At lower bias currents the effective reference voltage deviates significantly from the set reference voltage, and hence the SER decreases. However, if the actual reference voltage, as seen on the oscilloscope, is used, then the SER at lower currents improves significantly. This implies that if a calibration


[Curves: 750 nA sin(5 kHz·t), 600 nA sin(2 kHz·t), 750 nA sin(4 kHz·t); SER (dB) vs. comparator bias current.]

Figure 3-13. Measured SER vs. comparator bias for different sine waves.

setup were used with the I&F circuit, the SER could possibly be improved even at lower bias currents.

3.3.3 Reducing Static Power Consumption

It can be inferred from Figure 3-11 that at lower pulse rates the comparator bias becomes the dominant factor, since the comparators are biased whether or not a pulse is being generated. The power consumption is still lower than that of some of the other asynchronous ADCs proposed in [12, 16, 26, 28, 76], because the I&F uses two comparators whereas the others use an individual comparator for every quantization level. Synchronous ADCs save static power using a popular technique called dynamic biasing, proposed by Copeland and Rabaey [77], Monticelli [78] and Hosticka [79]: the comparator bias current is disabled during the off period of


the clock and enabled during the on period of the clock [80-82]. Dynamic biasing works well for synchronous ADCs for two reasons: first, the presence of the clock, which acts as the enable/disable signal; and second, the fact that output samples from synchronous ADCs are generated only during the on phase of the clock. Unfortunately, neither of these factors applies to asynchronous ADCs. Motivated by the numbers in Figure 3-11 and by the dynamic biasing scheme, the best-case scenario would be to consume no power during the inter-pulse interval and to consume power only for pulse generation. Neglecting leakage, the digital circuits are not a significant source of power when pulses are not being generated. Since the static power consumption is dominated by the comparator, achieving near-zero power consumption between spikes requires the comparator bias current to turn on when Vin ≈ Vth and to be turned off at all other times. Thus a bias voltage or current control block is desired that turns on only when the input voltage is near the threshold, which essentially implies that a comparator is required. This requirement defeats the purpose, since that comparator would need to be biased as well. The most straightforward solution is a self-biased comparator, which controls its bias current using the input. There are several variations of self-biased comparators in the literature. The majority of asynchronous self-biased opamps or comparators carry a base DC current, with a large transient current flowing near the comparison voltage [63, 65, 83]. The problem with these architectures is precisely that static bias current, which would again defeat the purpose. Keeping this in mind, a very simple self-biased comparator structure is proposed, as shown in Figure 3-14.
IBias as a function of Vin can be estimated from the roots of the following equation:

x^(k+1) − x^k − x + (1 − e^Δ) = 0   (3–23)


Figure 3-14. Dynamically biased comparator architecture.

where:
1. k² = (W/L)b2/(W/L)b1
2. x = e^√ICb2, where ICb2 is the inversion coefficient of transistor Mb2, given by Equation 3–9.
3. Δ = (Vin − Vcntrl − 2Vt)/(2nUT), where n is the subthreshold slope factor and UT is the thermal voltage.

For k = 1 the above equation reduces to:

x² − 2x + (1 − e^Δ) = 0
⇒ Ibias = (W/L)b2 · Ion · (log(1 + e^(Δ/2)))²   (3–24)
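For k = 1 the quadratic has the physically meaningful root x = 1 + e^(Δ/2), which is where the log(1 + e^(Δ/2)) term in Equation 3–24 comes from. A small numerical consistency check (Ion and (W/L)b2 are illustrative values):

```python
import math

def root_residual(delta):
    """Substitute x = 1 + exp(delta/2) into x**2 - 2*x + (1 - exp(delta));
    the residual is identically zero, confirming the closed-form root."""
    x = 1.0 + math.exp(delta / 2.0)
    return x * x - 2.0 * x + (1.0 - math.exp(delta))

def i_bias(delta, wl_b2=1.0, i_on=1e-6):
    """Equation 3-24: Ibias = (W/L)_b2 * Ion * log(1 + exp(delta/2))**2."""
    return wl_b2 * i_on * math.log(1.0 + math.exp(delta / 2.0)) ** 2

# Ibias rises roughly exponentially with delta, i.e. with Vin,
# matching the simulated curve in Figure 3-15.
```

Evaluating `i_bias` over a range of Δ reproduces the exponential rise of the bias current with the input voltage.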

Figure 3-15 compares Equation 3–24 with the Cadence simulation for varying Vin. As expected, the current rises exponentially as a function of Vin; it continues to rise until the input becomes equal to the reference and the capacitor voltage is reset. This is very different from the desired behavior: the bias current was meant to saturate around the reference voltage, whereas this curve is independent of the reference voltage. The



Figure 3-15. Simulated IBias as a function of Vin. The graph compares the Cadence simulation vs. Equation 3–24.

fact that the bias current is independent of the reference voltage raises issues: the small-signal parameters, the gm's, the hysteresis voltage, etc. vary with the reference value. This reference-dependent variability in the circuit parameters makes the circuit difficult to characterize and results in SER degradation. The bias current is a function of the transistor geometries, which cannot be changed after fabrication; however, from Equation 3–24 it is evident that the current also depends on Δ, which in turn depends on Vcntrl, which can be changed. Since Vcntrl appears in an exponential, a slight modification of it shifts the Ibias curve to the right or to the left. As explained before, every reference voltage has its fixed bias current: Vin rises to the reference voltage, after which it is reset, so for a given reference voltage the current also rises to the value given by Equation 3–24 and then drops to zero. By suitably changing Vcntrl the curve shown in Figure 3-15 can be shifted to the left or to the right. This implies that the

[Curves: Vcntrl = 1.50, 1.56, 1.62, 1.65, 1.68, 1.74 and 1.80 V; IBias (µA) vs. Vin (V).]

Figure 3-16. Simulated IBias as a function of Vin for varying values of Vcntrl. By varying Vcntrl in Figure 3-14 the bias current can be varied for a given comparator reference voltage.

current corresponding to a given comparator reference voltage can now be increased or decreased accordingly. Figure 3-16 shows the IBias curves for different Vcntrl voltages as a function of Vin; as is evident from the figure, this flexibility comes at the cost of increased static power dissipation. Figure 3-22 shows the block-wise simulated current consumption for two consecutive spikes and the duration between them, and Figure 3-17 shows the energy consumption as a function of pulse rate for the I&F circuit using the dynamically biased comparator of Figure 3-14. Since the varying current for each reference value is the problem, the solution is a current-limiting circuit that consumes near-zero DC current; the circuit shown in Figure 3-20 can be used. For the circuit shown on the left, when Vin is near Vss, M2 is in cutoff, since there is no current through the circuit, and Vout is



Figure 3-17. Measured energy consumption of the comparator latch, output stage and inverters as a function of pulse rate for the I&F circuit with the dynamically biased comparator shown in Figure 3-14.

≈ Vss while Vint is near Vdd. As Vin increases, current starts to flow through the circuit, subsequently increasing Vout. As Vout increases to support the current through the circuit, Vint starts to decrease. The output voltage continues to increase with increasing Vin, but it cannot increase to Vdd, since that would turn off transistor M3 and drive the current to zero; instead the circuit reaches a stable point, which occurs when Vds across M1 is ≈ 100 mV, i.e. transistor M1 is near the edge of the triode region. Beyond this stable point, increasing Vin has no effect and the output voltage and current saturate. The circuit on the right can be explained in a similar fashion. The output voltage, Vout, can be used to control the bias voltage of the comparator. Using this current-limiting circuit as a bias control alleviates the problem of a variable bias current for varying comparator references, since the output voltage would get



Figure 3-18. Simulated block-wise power consumption of the dynamically biased I&F for two consecutive pulses and the inter-pulse interval. The top graph shows the pulse output, the second graph shows the power consumption of the latch for the positive and negative channels (transistors M1-M6 in Figure 3-8), the third graph shows the power consumption of the output stage (transistors M7-M10 in Figure 3-8) and the last graph shows the power consumption of the inverters (transistors M12-M18 in Figure 3-8).



Figure 3-19. Measured node voltages Vint and Vout for the current limit circuit.

saturated after the trip point of the circuit, which is set by the transistor geometries. Using this circuit would also lead to significant power savings for some of the other asynchronous ADCs, for example the asynchronous level-crossing ADC [84], which uses a comparator for every quantization level. Since the reference voltage of a quantization level does not change, the transistor geometries of the current-limiting circuit can be tailored for every quantization level. Figure 3-19 shows the measured node voltages for the circuit shown in Figure 3-20(a). As far as the I&F is concerned, the circuit still needs to be modified. As seen in Figure 3-19, two key parameters define the response of the current-limiting circuit: the slope of Vout and the saturation point. Using the EKV design equations, Vout can be estimated (for the circuit shown in Figure 3-20


(a)) by the following two design equations for transistors M1 and M2 respectively:

Vout = Vss + Vt2 + 2n·uT·log(e^√IC2 − 1)
Vout = Vin − Vt1 − 2n·uT·log(e^√IC1 − 1)   (3–25)

Subtracting the two equations to eliminate Vout and solving for Iout gives:

(Vin − Vt1 − Vt2 − Vss)/(2n·uT) = log((e^√IC2 − 1)(e^√IC1 − 1))
e^((Vin − Vt1 − Vt2 − Vss)/(2n·uT)) = (e^√IC2 − 1)(e^√IC1 − 1)   (3–26)

Substituting the exponent on the left-hand side as Δ and taking the Taylor series expansion of the exponential terms on the right-hand side with the first three significant terms, the equation reduces to:

e^Δ = IC1 · IC2   (3–27)

Thus, using Equation 3–9, Iout can be expressed as:

Iout = (e^Δ · (W/L)1 · (W/L)2 · Ion²)^(1/2)   (3–28)

Substituting this expression for the output current back into Equation 3–25 and solving for Vout:

Vout = Vss + Vt2 + (n·uT/2)·log(e^Δ · (W/L)1/(W/L)2)   (3–29)

The expression above does not reflect the saturation of the voltage. As mentioned previously, the circuit saturates because transistor M1 moves into the triode region; this point can be estimated by realizing that around the saturation point Vds1 ≈ 100 mV. Thus, using the EKV equations and assuming that transistor M3 is in saturation (Vds3 > 100 mV), the saturation voltage of the circuit can be estimated as:

Vsat = 0.1 + Vdd − 1.5 − 2·uT·(Iout/(Iop·(W/L)3) + 0.25)^(1/2)   (3–30)
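Equations 3–28 and 3–29 can be exercised numerically. In the sketch below the geometry ratios, Ion, n and Vt2 are illustrative assumptions, not values from the fabricated circuit:

```python
import math

def i_out(delta, wl1=1.0, wl2=1.0, i_on=1e-6):
    """Equation 3-28: Iout = sqrt(exp(delta) * (W/L)1 * (W/L)2 * Ion**2)."""
    return math.sqrt(math.exp(delta) * wl1 * wl2 * i_on ** 2)

def v_out(delta, v_ss=0.0, v_t2=0.7, n=1.3, u_t=0.025, wl1=1.0, wl2=1.0):
    """Equation 3-29:
    Vout = Vss + Vt2 + (n*uT/2) * log(exp(delta) * (W/L)1 / (W/L)2)."""
    return v_ss + v_t2 + (n * u_t / 2.0) * math.log(
        math.exp(delta) * wl1 / wl2)

# Iout grows as exp(delta/2): increasing delta by 2 multiplies the
# output current by e, while Vout rises only logarithmically with
# the same geometry-ratio dependence.
```

The exponential current and logarithmic voltage behavior is consistent with the sharp-then-saturating Vout response measured in Figure 3-19 before M1 enters the triode region.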


Figure 3-20. (a) Two possible implementations of the current limiting circuit. (b) Two possible implementations of the source follower.

As the expression above shows, one of the key problems of this current-limiting circuit is the dependence of the peak current on Vdd. The bias current generated by the current-limiting circuit is also independent of the I&F reference voltage. A dynamically biased single-stage comparator (essentially a differential amplifier) can be used to generate the difference between Vin and Vref, and this difference can be fed to the current-limiting circuit. Another possibility is to use the output of a source-follower circuit as the input to the current-limiting block. The source-follower circuit (shown in Figure 3-20) has been used in several neuromorphic circuit implementations [40, 42, 47-49]. Figure 3-21 shows the complete schematic of the comparator with the current-limit circuit (transistors Mc1-Mc3) and the source follower (Msf1-Msf2).

3.4 Results and Discussion

Three versions of the I&F sampler, a fixed-bias I&F, a dynamically biased I&F and a dynamically biased I&F with current limiting, were fabricated in AMI 0.6 µm technology and tested. Some of the results from the chip are presented in this section. Since power efficiency is of interest, the effect of varying the comparator bias current on the SNR was analyzed. The SNR vs. power plot is shown in Figure 3-23 for a sine



Figure 3-21. Positive channel dynamically biased comparator with the source follower and current limiting circuit.

wave input. The input signal, θ, and the refractory period were kept constant, to ensure a constant pulse rate, while the bias current of the comparators was varied, resulting in a variable power consumption. This measurement setup works to the disadvantage of the I&F since the input signal is not sparse and the sampler generates pulses at a fixed rate. The overall energy/pulse for the three versions of the I&F circuit is shown in Figure 3-24. The I&F with the dynamically biased comparator consumes roughly a tenth of what its counterpart with a fixed bias current consumes, while the energy consumed by the current limited I&F is only one fifth of that consumed by the dynamically biased I&F. The energy savings with the current limited biasing scheme are less than expected: even though the current consumed by the comparators in between pulses has decreased, the source follower still consumes energy, reducing the overall savings. Table 3-1 compares this I&F implementation with



Figure 3-22. Simulated block-wise current consumption of the dynamically biased I&F for two consecutive pulses and the inter-pulse interval. The top graph shows the capacitor voltage, the second graph shows the current consumption of the current limiter circuit (transistors Mc1–Mc3 in Figure 3-21), the third graph shows the current consumption of the source follower (transistors Msf1–Msf2 in Figure 3-21) and the last graph shows the current consumption of the comparator bias (transistors Mb1–Mb2 in Figure 3-21).



Figure 3-23. Measured SNR vs. power consumption of the I&F sampler. The input to the sampler was 750 nA · sin(2000πt), the threshold was 0.4 V and Cm = 20 pF. The pulse rate was kept constant while varying the bias currents of the comparators, resulting in variable power dissipation.

some of the previous implementations of the I&F done at CNEL. To keep the comparison fair, the power consumption and energy/pulse were estimated at the same value of SER and at similar pulse rates. The implementation of the I&F circuit presented in this work is 80 times as power efficient as its predecessors at CNEL, and its energy/pulse has improved by roughly three orders of magnitude. The average energy/pulse is roughly 10 pJ at a pulse rate of 200 pulses/sec, while the best published neuron circuit consumes roughly 267 pJ [1]. The next result is the ENOB (described in Section 2.3.1) and the corresponding power consumption. The average ENOB and the corresponding average power consumption measured from the chip are shown in Figure 3-25. The ENOB (left y-axis) and the



Figure 3-24. Measured energy/pulse as a function of pulse rate for the I&F circuits shown in Figures 3-8, 3-14 and 3-21.

Table 3-1. Comparison of this I&F implementation with past implementations.
Reference          Pulse Rate   SER     Power Consumption
Chen et al. [39]   47 K         50 dB   92 µW
Wei [38]           33 K         50 dB   90 µW
Rogers [71]        36 K         50 dB   100 µW
this work          40 K         50 dB   1.2 µW

power consumption (right y-axis) are plotted for increasing peak-to-peak amplitudes of a 1 kHz sine wave. The I&F sampler generates a higher output data rate as the input signal amplitude increases. More output pulses imply a better reconstruction accuracy and therefore a higher ENOB. The ENOB was estimated from the SNR using Equation 2–16. The amplitude was varied from 25 nA to 2.25 µA, with Vth+ and Vth− fixed at 300 mV and −300 mV respectively, and Cm was 20 pF. As mentioned previously, the I&F circuit adjusts its output data rate in proportion to the signal amplitude. From Figure



Figure 3-25. Measured average ENOB and average power consumption vs. increasing sine wave amplitude. Ten measurements were made at every point and the average value is plotted.

3-25, we note that as the signal amplitude increases the ENOB improves due to the higher number of samples available for data recovery. The ENOB graph saturates around 10 bits of resolution, and increasing the output sampling rate has little effect; this saturation results from the time quantization of the output data samples during recording. Nevertheless, as mentioned previously, a more comprehensive measure of ADC performance is the figure of merit (FOM). The FOM ties together the power, ENOB and input signal bandwidth and is defined as [58]:

FOM = Power / ( 2^ENOB × min(fsample, 2·BW) )

(3–31)

where fsample is the sampling rate and BW is the signal bandwidth; the minimum of the two is taken to accommodate oversampling data converters. Unlike conventional ADCs, asynchronous ADCs do not generate samples at a fixed rate, hence we use 2·BW instead of fsample in estimating the FOM for the I&F sampler. Using the numbers for


Table 3-2. Comparison of FOM with other ADCs.
Reference                         FOM (pJ/conv. step)   Technology (nm)
Kozmin et al. [16]                1.9                   350
Akopyan et al. [26]               10.7                  180
Allier et al. [13]                1.09                  250
van Elzakker et al. [85]          0.004                 65
Naraghi et al. [86]               0.1                   90
Scott et al. [87]                 5.2                   250
Yang and Sarpeshkar [6]           1.17                  350
Craninckx and van der Plas [88]   0.065                 90
Hong and Lee [89]                 0.065                 180
this work                         0.6                   600
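Equation 3–31 can be sanity-checked against Table 3-2 with a short numerical sketch. The numbers below are representative round values drawn from this chapter (roughly 1.2 µW at an ENOB near 10 bits for a 1 kHz sine input), not exact chip measurements:

```python
# Figure of merit (Equation 3-31) with representative numbers for this design.
# power, enob and bw are assumed round values, not exact measurements.
power = 1.2e-6   # W, roughly the measured consumption at full input amplitude
enob = 10        # bits, the ENOB saturation level reported above
bw = 1e3         # Hz; the sampler is asynchronous, so min(fs, 2*BW) = 2*BW

fom = power / (2**enob * 2 * bw)
print(f"FOM = {fom * 1e12:.2f} pJ / conversion step")
```

The result lands close to the 0.6 pJ/conv. step entry for this work in Table 3-2.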

ENOB and power consumption from Figure 3-25, along with Equation 2–16, the FOM for the I&F sampler is calculated and listed in Table 3-2 together with the FOM of other asynchronous ADCs. Based on the results reported in Table 3-2, our I&F system provides the best published ADC FOM for its technology node. However, as semiconductor technology advances, conventional ADC designs will naturally outperform the existing implementation of the I&F in 0.6 µm technology. Thus, it is of interest to estimate the performance of the I&F circuit with technology scaling. Predictive technology models (PTM), made available by the Nanoscale Integration and Modeling group at Arizona State University, have been used by several researchers for verifying circuit techniques in scaled technologies. These models have been used for predicting the performance of SRAM cells in nanometer technologies [90], for accurate estimation of circuit leakage power [91], for multi-core processor architectures [92] and for many other applications [93–95]. Using the PTM, we have simulated the I&F circuit in Cadence. The I&F circuit was re-designed for every technology node by first selecting the transistor sizes for the most favorable speed-resolution tradeoff and then estimating the optimal bias current for that node. Based on the bias current value, the dynamic biasing scheme was designed and simulations were run to estimate the ENOB and the


corresponding power consumption, leading to the estimated FOM. This design process is described earlier in this chapter for AMI 0.6 µm technology. The results for scaling are summarized in Table 3-3. The simulated ENOBs for the various technology nodes are slightly higher than the measurement results for the I&F fabricated in 0.6 µm technology; the reason is that these are simulation results with a time quantization of picoseconds and without any measurement inaccuracies. The power dissipation of logic gates under technology and supply voltage scaling is relatively easy to estimate compared to that of analog circuits [96], because for analog circuits several contradictory effects come into play. These effects may either lower the power consumption or increase it, as is the case with circuit noise: it has been shown that as technology and supply voltages scale down, the overall circuit noise increases, and it can be decreased by increasing the bias current [97]. Increasing the bias current negates the effect of lowering the supply voltage and results in less than expected power savings. A benefit of technology scaling is that parasitic loading at the intermediate nodes decreases, resulting in faster comparators in newer technologies for a given bias current. On the other hand, various second order effects such as gate leakage and non-zero off-transistor current lead to a higher power dissipation than expected [73]. Finally, newer technologies deliver a higher gm/ID than older technologies [97]; since the comparator delay is inversely proportional to this ratio (Equation 3–19), the comparator response improves with technology scaling. We observe these effects in play in the simulation results when estimating the optimum bias current for the I&F.
Initially the bias current scales almost linearly with technology; however, below 180 nm it does not scale down much further. The overall power consumption still decreases (although the percentage savings tend to saturate), since the power of the inverters in the circuit drops significantly in newer technologies. Thus, with such opposing


Table 3-3. Technology scaling for the I&F circuit.
Tech. (nm)   Ibias (nA)   Vdd (V)   Power (µW)   ENOB    FOM (pJ)
350          600          3.3       0.9632       11.82   0.133
250          400          2.5       0.5160       11.50   0.089
180          325          1.8       0.3378       11.81   0.047
130          275          1.1       0.1531       11.93   0.019
65           225          1.1       0.1064       11.65   0.0165
45           200          1.1       0.0726       11.84   0.0098

and complex effects it becomes difficult to derive a unified analytical expression for technology and supply voltage scaling of the I&F sampler; however, the simulation results show that the I&F circuit performs favorably with technology scaling.

3.5 Summary

In this chapter the design methodology for the I&F circuit was presented and verified through simulation results. Following a bottom-up approach, the comparator architecture with the most favorable speed-resolution tradeoff was chosen. In order to build a fast comparator, hysteresis had to be introduced in the comparator response. Since hysteresis results in a shift in the threshold voltage, it was modeled and compared to the measurement results. Even with the best comparator architecture, signal-dependent delay tends to degrade the SNR of the recovered signal. In order to better understand the circuit parameters that affect the I&F response, the propagation delay of the comparator was estimated. Based on the analytical expressions, it was found that the comparator is the dominant source of delay and is strongly affected by the bias current. The analytical model was validated by comparing the measured delay and the estimated delay as a function of the bias current. The bias current also affects the speed of the comparator and the overall power consumption. The optimum bias current was estimated by encoding sine waves of different amplitudes and frequencies using the fabricated circuit at different bias currents; the sine waves were reconstructed from the pulses and the SNR was estimated to find the optimum bias current. In order to better understand the


power consumption of the I&F circuit, the energy consumption/pulse of the comparator, inverters and delay element was measured for different pulse rates. Since the comparator is the dominant source of power consumption, a dynamically biased comparator architecture was proposed and fabricated in 0.6 µm technology to lower its energy consumption. A comparison of the measured energy/pulse of the dynamically biased comparator and the conventional comparator for different output pulse rates indicates that the dynamically biased comparator reduces the energy/pulse by a factor of 10 compared to the comparator with a fixed bias. The ENOB and FOM were measured from the I&F circuit fabricated in 0.6 µm technology. The FOM shows that the performance of the I&F circuit as an ADC is comparable to, and in some cases better than, that of other ADCs and AADCs in the same CMOS technology bracket. To show that the I&F circuit scales favorably with technology, it was redesigned using the design methodology explained above for each technology node. Using the PTM, the I&F circuit was simulated for gate lengths from 350 nm down to 45 nm. Based on the simulation results, the optimum bias current, the power consumption of the I&F with the dynamically biased comparators, the ENOB and the resulting FOM were estimated for each technology node. The estimated FOM numbers indicate that the I&F circuit scales favorably with technology. In summary, the I&F sampler provides a simple and efficient alternative for ultra-low power analog-to-digital conversion in applications requiring limited precision and low bandwidth.


CHAPTER 4
LIMIT ON INTEGRATE-AND-FIRE ENERGY DISSIPATION

4.1 Introduction

A number of research groups are working on building large arrays of hardware neurons implementing various kinds of neuron models. These implementations range from biologically realistic models such as Hodgkin-Huxley [98] to simplified models such as the I&F. These silicon neurons form the core of large arrays on which various computational algorithms are tested; the number of neurons on these arrays varies from tens [40], to thousands [99] and even a million [100]. Over the years, various circuit and model level optimizations have been made to lower the energy consumption of hardware neurons, and researchers have built complex communication protocols for the neurons to talk to each other within a chip as well as in multi-chip architectures [101]. With such a large scale effort in optimizing circuit implementations for neuron design, it is of interest to ask: "What is the fundamental limit to the energy consumption of a silicon neuron?" This question is difficult to answer with great precision, but in this chapter we build a methodology to get a feel for where we stand with today's technology and existing implementations. Since a digital inverter is a fairly simple circuit to analyze, we begin by examining its power consumption limits. Analyzing the inverter prepares the groundwork for estimating the power limits of the integrate-and-fire circuit using a single stage comparator. This analysis is then extended to the I&F circuit with the regenerative comparator presented in Chapter 3.

4.2 Limit on Inverter Power Consumption

There are three major sources of power consumption in a digital CMOS inverter, as presented by Chandrakasan and Brodersen [102]:

Pavg = Pswitching + Pshort-circuit + Pleakage
     = α0→1·CL·Vdd²·fclk + Isc·Vdd + Ileakage·Vdd


(4–1)

The first term represents the switching component of power, where CL is the load capacitance, fclk is the switching frequency and α0→1 is the transition activity factor. The second term is due to the direct-path short circuit current Isc, which arises when both the NMOS and PMOS transistors are simultaneously active, conducting current directly from Vdd to ground. The last term is the leakage current, which is determined by the fabrication technology. For an inverter circuit, the energy drawn from the power supply during a 0→1 transition is CL·Vdd². Half of this energy is stored in the output capacitor and the other half is dissipated in the PMOS transistor. During the 1→0 transition, the energy stored in the capacitor is dissipated through the NMOS device [96]. Usually the switching component is the dominant source of power consumption for the inverter [103]. Based on Equation 4–1, reducing the power supply Vdd has a quadratic effect on the overall power consumption. Reducing the power supply, however, increases the delay of the inverter. If the system can operate at a considerably lower frequency, what is the minimum supply voltage at which the inverter can be operated? In other words, what is the limit to the power consumption of the inverter? Assume that both transistors operate in the weak inversion region and that the trip point occurs when the current through the PMOS equals the current through the NMOS [104]. The weak-inversion current equation for a transistor is [97]:

ID = 2·n·µ0·Cox·UT² · (W/L) · e^((Vgs − Vt)/(n·UT)) · (1 − e^(−Vds/UT))

(4–2)

where UT is the thermal voltage kT/q, Vt is the threshold voltage, µ0 is the effective mobility, Cox is the oxide capacitance and n is the subthreshold slope factor. Equating the PMOS and NMOS currents at the trip point:

ID = 2·nn·µ0n·Cox·UT² · (W/L)n · e^((Vin − Vtn)/(nn·UT)) · (1 − e^(−Vout/UT))
   = 2·np·µ0p·Cox·UT² · (W/L)p · e^((Vdd − Vin − Vtp)/(np·UT)) · (1 − e^(−(Vdd − Vout)/UT))


(4–3)

Solving the above equations for Vin and assuming nn = np, we get:

Vin = nn·UT · log( (np·µ0p / (nn·µ0n)) · (1 − e^(−(Vdd − Vout)/UT)) / (1 − e^(−Vout/UT)) ) + Vtn − Vtp + Vdd

(4–4)

Differentiating the above equation with respect to Vout gives an expression for the inverse of the gain:

dVin/dVout = nn · ( e^(−(Vdd − Vout)/UT) / (1 − e^(−(Vdd − Vout)/UT)) + e^(−Vout/UT) / (1 − e^(−Vout/UT)) )



(4–5)

The maximum gain of the inverter occurs at Vout = Vdd/2 and is given by:

Gain ≈ (1/(2·nn)) · ( e^(Vdd/(2·UT)) − 1 ) = (1/(2·nn)) · ( e^(Vdd·q/(2·kT)) − 1 )

(4–6)

Based on the equation above, the inverter gain rolls off quickly as the supply voltage decreases. Using a similar analysis, Swanson [105] estimates that the minimum Vdd should be greater than 3-4 kT/q for the inverter to be usable. This result is also confirmed by the simulated transfer curves of the inverter in Cadence, shown in Figure 4-1. At such low voltages the output and input noise margins of the inverter are separated by only a few millivolts, so circuit noise or ambient noise can trip the inverter or produce a false output. In strong inversion the inverter delay depends only weakly on the supply voltage, whereas in subthreshold the exponentially decreasing current leads to exponentially higher delays as the voltage is reduced. The inverter delay in the subthreshold region can be expressed as:

td = CL·Vdd / ( 2·n·µ0·Cox·UT²·(W/L)·e^((Vdd − Vt)/(n·UT)) )

(4–7)
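Equation 4–7 can be evaluated numerically to see how quickly the subthreshold delay grows as Vdd is reduced. The device values below are illustrative assumptions, not the fabricated process:

```python
import math

# Evaluate Equation 4-7 for a subthreshold inverter.
# All device values below are illustrative assumptions:
UT = 0.0259        # thermal voltage kT/q at room temperature (V)
N = 1.4            # subthreshold slope factor (assumed)
MU_COX = 100e-6    # mu0 * Cox (A/V^2, assumed)
W_OVER_L = 2.0     # transistor aspect ratio (assumed)
VT = 0.45          # threshold voltage (V, assumed)
CL = 20e-15        # load capacitance (F, assumed)

def inverter_delay(vdd):
    """td = CL*Vdd / (2*n*mu0*Cox*UT^2*(W/L)*exp((Vdd - Vt)/(n*UT)))."""
    isub = 2 * N * MU_COX * UT**2 * W_OVER_L * math.exp((vdd - VT) / (N * UT))
    return CL * vdd / isub

for vdd in (0.40, 0.30, 0.20):
    print(f"Vdd = {vdd:.2f} V -> td = {inverter_delay(vdd):.3e} s")
```

Each 100 mV reduction of Vdd below threshold multiplies the delay by roughly e^(0.1/(n·UT)), i.e. more than an order of magnitude with these assumed values.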

The simulated delay and power consumption of the inverter as a function of Vdd are shown in Figure 4-2. Reducing the supply voltage decreases the power consumption of the inverter but also increases its delay. Thus, it is essential to explore the energy consumed in one transition as a function of Vdd. This energy per



Figure 4-1. Simulated inverter voltage transfer curves for varying Vdd. The slope of the transfer curve is the gain of the inverter. The inverter gain reduces considerably for Vdd near 100 mV.

transition is also known as the power-delay product. Figure 4-3 shows the simulated power-delay product as a function of Vdd.

4.3 Limit on Single Stage Comparator's Power Consumption

In this section we explore the limits to the power consumption of the single stage comparator described in Section 3.1.1.1, shown again in Figure 4-4. Unlike the inverter, deriving a closed form expression for the minimum supply voltage of the differential amplifier is fairly complicated. However, bounds on the supply for the amplifier can be obtained by realizing that all transistors have to be saturated for sufficient gain, as shown by Liu et al. [48] and Ismail and Fiez [47]. Thus



Figure 4-2. Simulated inverter delay and power consumption for varying Vdd.

the expression for the supply voltage can be written as:

Vdd ≥ Vds,sat5 + Vds,sat2 + |Vgs3|

(4–8)

The equation above uses Vds,sat5, the minimum Vds required to keep transistor M5 in saturation. The value of Vds5 is determined by the input voltages V1 and V2, and Vds,sat5 also determines the input voltage range of the amplifier. Using the EKV design equations, Vds,sat can be expressed as:

Vds,sat = UT · ( √(IC + 0.25) + 1.5 )
        ≈ 4·UT                        in weak inversion
        ≈ 2·UT·√IC ≈ (VGS − Vt)       in strong inversion


(4–9)


Figure 4-3. Simulated inverter power-delay product for varying Vdd .


Figure 4-4. Single stage voltage amplifier and the corresponding small-signal model.


where IC is the inversion coefficient of a transistor as defined by Equation 3–9. Using the EKV design equations presented by Binkley [73], the gate overdrive voltage can be expressed as:

Vgs − Vt = 2·n·UT · log( e^(√IC) − 1 )
         ≈ n·UT·log(IC)      in weak inversion
         ≈ 2·n·UT·√IC        in strong inversion

(4–10)
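The EKV expressions in Equations 4–9 and 4–10 can be sketched numerically; UT, n and the IC values below are illustrative assumptions:

```python
import math

# Evaluate the EKV saturation voltage (Eq. 4-9) and gate overdrive (Eq. 4-10)
# across the inversion coefficient. UT and n are assumed round values.
UT = 0.0259   # thermal voltage (V)
N = 1.4       # subthreshold slope factor (assumed)

def vds_sat(ic):
    # Equation 4-9: Vds_sat = UT*(sqrt(IC + 0.25) + 1.5)
    return UT * (math.sqrt(ic + 0.25) + 1.5)

def gate_overdrive(ic):
    # Equation 4-10: Vgs - Vt = 2*n*UT*ln(exp(sqrt(IC)) - 1)
    return 2 * N * UT * math.log(math.exp(math.sqrt(ic)) - 1)

for ic in (0.01, 1.0, 100.0):
    print(f"IC = {ic:7.2f}: Vds_sat = {vds_sat(ic)*1e3:6.1f} mV, "
          f"overdrive = {gate_overdrive(ic)*1e3:8.1f} mV")
```

In weak inversion (IC « 1) the overdrive goes negative, while deep in strong inversion it approaches the 2·n·UT·√IC asymptote given above.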

Substituting Equations 4–9 and 4–10 in Equation 4–8 gives an expression for the minimum supply voltage:

Vdd ≥ UT · [ √( Ibias/(Ion·(W/L)5) + 0.25 ) + √( Ibias/(2·Ion·(W/L)2) + 0.25 ) + 3 + 2·n·log( e^(√IC3) − 1 ) ]

(4–11)

Since the minimum supply voltage depends on the bias current, Figure 4-5 shows the minimum supply as a function of bias current for a given transistor sizing. The I&F circuit encodes information in time, and the SER degrades as the comparator delay increases; hence it is of interest to investigate the effect of reducing the supply voltage on the comparator delay. As described in Section 3.1.1.1, the comparator delay depends on the bias current of the comparator. Using the results of the analysis presented in Section 3.2, the delay of the single stage comparator can be expressed as:

Tp = Voh·Co / (gm1·∆in)

(4–12)

where Voh is the output voltage level of logic '1' (Vdd/2), ∆in is the step input to the comparator given by Vin(t) − Vth, Co is the load capacitance and gm1 is the transconductance of the input stage given by Equation 3–15. As seen from Equation 4–12, the delay depends on the gm of the input stage and the required output voltage swing. As long as all the transistors remain saturated, reducing the supply voltage should reduce the delay, since Voh also scales with Vdd. The



Figure 4-5. Analytical evaluation of the minimum supply voltage as a function of bias current.

dynamic power required for a single comparison is simply the product of Vdd and Ibias. Thus the power-delay product for a given bias current can be expressed as:

Power·Delay = Vdd,min · Ibias · ( Vdd,min·Co / (gm1·2·∆in) )

(4–13)
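Equation 4–13 can be sketched numerically. The transconductance model below assumes the input pair operates in weak inversion (gm1 = (Ibias/2)/(n·UT)); the capacitance and input step are hypothetical values, not the fabricated design:

```python
import math

# Sketch of Equation 4-13: dynamic power-delay product of the single stage
# comparator. gm1 uses an assumed weak-inversion model; CO and DELTA_IN are
# hypothetical component values for illustration only.
UT, N = 0.0259, 1.4
CO = 50e-15       # comparator load capacitance (F, assumed)
DELTA_IN = 0.05   # input step beyond threshold (V, assumed)

def power_delay(vdd, ibias):
    gm1 = (ibias / 2) / (N * UT)              # each input device carries Ibias/2
    delay = vdd * CO / (gm1 * 2 * DELTA_IN)   # swing Voh = Vdd/2 at slope gm1*din
    return vdd * ibias * delay                # dynamic power x delay (Eq. 4-13)

for ibias in (0.3e-6, 1.3e-6, 2.1e-6):
    print(f"Ibias = {ibias*1e6:.1f} uA -> PDP(1.2 V) = "
          f"{power_delay(1.2, ibias)*1e12:.4f} pJ")
```

With this weak-inversion model, Ibias cancels out of the product (delay scales as 1/Ibias while dynamic power scales as Ibias), which is consistent with the convergence of the curves for different bias currents discussed in the text; in moderate and strong inversion the curves separate.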

Figure 4-7 shows the delay vs. Vdd for various comparator bias currents, and Figure 4-6 shows the corresponding power-delay product. There are two interesting observations to be made from Figure 4-6. First, the minimum dynamic power-delay product occurs around 1 pJ. Second, the circuit offers two degrees of freedom, Vdd and Ibias, for achieving a near-minimum power-delay product. Depending on the system level constraints, either a high



Figure 4-6. Analytical evaluation of the power-delay product of the single stage comparator vs. Vdd for various values of Ibias.

Vdd with a high bias current or a low Vdd with a low bias current can be chosen for a near-minimum power-delay product. In the analysis and graphs above, only the dynamic power consumption of the comparator was considered; the static power consumption was ignored. In principle, only dynamic power is required to make a comparison, whereas static power is simply wasted between comparisons. Thus, the absolute minimum power consumption of a comparator is the power required only during the time the input voltage is above the reference voltage. However, as previously shown in Figure 3-17, for a practical comparator the static power cannot be ignored, since it is virtually impossible to completely turn off a comparator between pulses.



Figure 4-7. Analytical evaluation of the single stage comparator delay vs. Vdd for various values of Ibias.

4.4 Limit on I&F Circuit's Power Consumption

The schematic of the I&F circuit presented in Section 3.1.1.2 is shown again in Figure 4-9. Using an approach similar to that used for the single stage amplifier, the limit on the power supply for the circuit can be expressed as [106]:

Vdd ≥ Vds,sat11 + Vds,sat1 + |Vgs3|

(4–14)

Figure 4-8 compares the measured minimum voltage with Equation 4–14; the measured minimum voltage follows the predicted trend. The difference between the measured and the predicted minimum voltage can be attributed to the fact that, in solving the equation, approximate values of the threshold voltages



Figure 4-8. Measured and analytical minimum supply voltage for the I&F circuit as a function of bias current.

were used, and that the same subthreshold slope value was used for all the transistors irrespective of their region of operation. Based on the energy analysis presented in the previous section, estimating the minimum energy required to generate a pulse might seem straightforward: express the combined delay of the comparator and the three inverters and multiply it by the power consumption of the respective circuit blocks. Estimating the delay, power consumption and energy of an inverter chain operating in strong inversion is indeed fairly simple using the principles of logical effort. However, due to the exponential dependence of the current on the gate voltage, it is much harder to estimate the power and delay of an inverter chain operating in the subthreshold region [103]. Another issue with this simplistic approach



Figure 4-9. The complete schematic of the positive channel of the I&F circuit. The negative channel is symmetric to the positive channel and is not shown, in order to simplify the schematic. Select parasitic capacitances are shown in dotted lines.

is that, from Figure 4-3, we realize that there is an optimum supply voltage at which the inverter's energy consumption is minimum. Thus the minimum voltage at which the comparator can operate may not be the optimum voltage for the inverter chain. The energy consumed by an inverter chain of length len can be expressed as [106]:

ETotal = α·len·Eswitching + Pleakage·td
       = α·len·(1/2)·CL·Vdd² + len·Vdd·Ileak·(len·tp)

(4–15)

α is the activity factor, td is the delay of the overall inverter chain and tp is the delay of a single inverter. The short-circuit energy has been ignored in the above equation, since it has been shown in [107] and [106] that for inverter chains operating in the subthreshold regime the short-circuit power accounts for less than 5% of the overall power consumption. In Equation 4–15, tp needs to be estimated accurately. Let tp,step denote the ideal inverter delay for a step input and tp,actual denote the actual inverter delay for an input


with a rise time of τr. tp,step can be estimated as shown by Zhai et al. [106]:

tp,step = (1/2) · CL·Vdd / Isub

(4–16)

where Isub is the subthreshold current described by Equation 4–2. The step delay can be extended to the actual delay as [104]:

tp,actual = √( tp,step² + (τr/2)² )

(4–17)

Reference [104] shows that for a slowly rising input, i.e. when the input rise is slower than the inverter delay (τr > tp,actual), the actual delay can be estimated as:

tp,actual = 0.84·τr  ⇒  tp,actual = 1.2·tp,step

(4–18)

Substituting this result in Equation 4–15 we get:

ETotal = α·len·(1/2)·CL·Vdd² + len·Vdd·Ileak·( len · 1.2 · (1/2)·CL·Vdd/Isub )
       = ( α + 1.2·len·Ileak/Isub ) · (1/2)·len·CL·Vdd²

(4–19)

Isub is given by Equation 4–2 and Ileak is the off-current of a transistor, i.e. the leakage current that flows when the gate-source voltage is zero but a finite drain-source voltage is present. Ileak can be estimated from Equation 4–2 by substituting Vgs = 0 and Vds = Vdd. Thus the ratio of leakage current to subthreshold current can be expressed as:

Ileak/Isub = ( e^(−Vtn/(nn·UT)) · (1 − e^(−Vdd/UT)) ) / ( e^((Vdd − Vtn)/(nn·UT)) · (1 − e^(−Vdd/UT)) )
           = e^(−Vdd/(nn·UT))



(4–20)

Thus the total energy per transition of the inverter chain can be expressed as:

Etotal = ( α + 1.2·len·e^(−Vdd/(nn·UT)) ) · (1/2)·len·CL·Vdd²

(4–21)

In order to estimate the minimum energy of the inverter chain, the above equation is differentiated with respect to Vdd and set to zero to find the minimum [108]:

dEtotal/dVdd = α·len·CL·Vdd + 0.6·len²·CL·( 2·Vdd·e^(−Vdd/(n·UT)) + Vdd²·e^(−Vdd/(n·UT))·(−1/(n·UT)) )
0 = α + 0.6·len·e^(−Vdd/(n·UT))·( 2 − Vdd/(n·UT) )
(α·e^(Vdd/(n·UT))) / (0.6·len) = Vdd/(n·UT) − 2

(4–22)

A closed form solution to the above equation has been presented by Swanson [105]:

Vopt = n·UT · ( 1.58 · log(0.6·len/α) − 2.35 )

(4–23)
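The closed form of Equation 4–23 can be checked against a brute-force minimization of Equation 4–21. The activity factor and chain length below are illustrative assumptions chosen so that an interior minimum exists (a long, rarely switching chain); the natural log is assumed in Equation 4–23:

```python
import math

# Compare the numeric minimum-energy supply of Eq. 4-21 with the closed form
# of Eq. 4-23. alpha and length are illustrative assumptions, not measured.
UT, N = 0.0259, 1.4
ALPHA, LENGTH = 0.01, 10   # activity factor and inverter chain length (assumed)

def energy(vdd):
    # Eq. 4-21 up to the (1/2)*len*CL constant, which does not move the minimum
    return (ALPHA + 1.2 * LENGTH * math.exp(-vdd / (N * UT))) * vdd**2

# brute-force scan for the numeric minimum over 0.05 V to 0.75 V
vs = [0.05 + 1e-4 * k for k in range(7000)]
v_num = min(vs, key=energy)

# Eq. 4-23 closed form (natural log assumed)
v_opt = N * UT * (1.58 * math.log(0.6 * LENGTH / ALPHA) - 2.35)

print(f"numeric minimum ~ {v_num:.3f} V, closed form ~ {v_opt:.3f} V")
```

For these parameters the closed form lands close to the numeric minimum, a few hundred millivolts, which is the familiar minimum-energy operating region for subthreshold logic.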

Using the above analysis we can now estimate the energy of the I&F circuit. As mentioned in Section 3.2, the comparator delay can be expressed as:

Tcomp ≈ ( τ1·τ2·Vco2 / (Ks·(Vin − Vth)) )^(1/2)

(4–24)

where τ1 = Coc1/gm5, τ2 = (r8 || r10)·Coc2 and Ks = gm7·(r8 || r10)·gm1/gm5. The energy required to generate a single pulse is then:

EnergyIF = Energycomparator + Energycomp output stage + Energyinverter chain

(4–25)

Since we are considering only the energy required for the generation of a single pulse (in other words, the static power dissipation is ignored), the energy of the comparator can be expressed as:

Energycomparator = Vdd · Ibias · tdelay

(4–26)

Similarly, the energy consumed by the output stage can be expressed as:

Energycomp output stage = Vdd · 2·np·µ0p·Cox·UT² · (W/L)8 · e^((Vdd − Vo2/2 − Vtp)/(np·UT)) · tdelay

(4–27)

Here tdelay is the propagation delay of the I&F circuit: the sum of the comparator delay and the delay of the inverter chain. In deriving the energy expression for the inverter chain it was assumed in Equation 4–18 that the input signal to the chain rises slowly; for the I&F circuit this may not hold, depending on the bias current value. The value of τr is the rise time of the input signal to the inverter, which is essentially the rise time of the comparator. Hence, using Equations 4–24 and 4–17, the delay can be estimated as:

tdelay = Tcomp + tdelay,inverter
       = ( τ1·τ2·Vco2 / (Ks·(Vin − Vth)) )^(1/2) + √( tp,step² + ( (1/2)·( τ1·τ2·Vco2 / (Ks·(Vin − Vth)) )^(1/2) )² )
       = ( τ1·τ2·Vco2 / (Ks·(Vin − Vth)) )^(1/2) + √( ( (1/2)·CL·Vdd/Isub )² + ( (1/2)·( τ1·τ2·Vco2 / (Ks·(Vin − Vth)) )^(1/2) )² )
       = ( τ1·τ2·Vdd / (2·Ks·(Vin − Vth)) )^(1/2) + √( ( (1/2)·CL·Vdd/Isub )² + ( (1/2)·( τ1·τ2·Vdd / (2·Ks·(Vin − Vth)) )^(1/2) )² )

(4–28)

Thus the energy required to generate a single pulse can be expressed as:

Energy_{IF} = V_{dd} I_{bias} t_{delay} + V_{dd} I_{comp\,stage} t_{delay} + 3 \cdot \frac{1}{2} C_L V_{dd}^2 + 9 V_{dd} I_{leak} t_{delay\,inverter}   (4–29)

Since we are estimating the energy of one pulse, in the equation above the activity factor has been substituted as 1 and the length of the inverter chain has been set to 3. Figure 4-10 evaluates Equation 4–29 for different bias currents as a function of Vdd. As expected, the energy continues to drop with a decrease in the supply voltage, since Equation 4–29 is a strong function of Vdd. However, below a certain supply voltage the energy stops dropping; at this point the inverter energy starts to dominate the overall energy consumption.
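The qualitative shape of Figure 4-10 can be reproduced with a short numerical sweep of Equation 4–29. All device values below are illustrative assumptions (and the comparator output-stage term is omitted for brevity), so only the existence of an interior minimum-energy supply voltage, not its exact location, should be read from this sketch:

```python
import math

# Illustrative (assumed) parameters, not extracted from the fabricated design.
N, UT = 1.5, 0.026
CL = 10e-15            # inverter load capacitance (F)
I_BIAS = 0.5e-6        # comparator bias current (A)
I_LEAK = 1e-12         # inverter leakage current (A)
I0 = 1e-16             # subthreshold pre-factor for the inverter drive current (A)
TAU1, TAU2, KS, OVD = 20e-9, 10e-9, 50.0, 0.1

def energy_if(vdd):
    """Equation 4-29 with tdelay from Equation 4-28 (output-stage term omitted)."""
    t_cmp = math.sqrt(TAU1 * TAU2 * vdd / (2 * KS * OVD))   # comparator delay
    i_sub = I0 * math.exp(vdd / (N * UT))                   # inverter drive current
    t_step = 0.5 * CL * vdd / i_sub                         # step-input inverter delay
    t_inv = math.sqrt(t_step ** 2 + (0.5 * t_cmp) ** 2)     # slow-rising-input delay
    t_delay = t_cmp + t_inv
    return (vdd * I_BIAS * t_delay                          # comparator energy
            + 3 * 0.5 * CL * vdd ** 2                       # dynamic energy, 3 inverters
            + 9 * vdd * I_LEAK * t_inv)                     # leakage energy

vdds = [0.5 + i * 0.01 for i in range(151)]                 # sweep 0.5 V .. 2.0 V
v_best = min(vdds, key=energy_if)
print(f"minimum-energy supply voltage ~= {v_best:.2f} V")
```

At low Vdd the exponentially growing inverter delay inflates the delay-dependent terms, while at high Vdd the CV² term dominates, so the sweep finds a minimum strictly inside the range, mirroring the curves in Figure 4-10.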


[Figure 4-10 plot: Energy Consumed (J) vs. Vdd (V), with curves for Ibias = 0.05, 0.25, 0.50, 1.00, 2.50 and 5.00 uA.]
Figure 4-10. Equation 4–29 vs. Vdd for different bias currents.

4.5 Summary

In this chapter the fundamental limit to the power consumption of the I&F circuit was presented. We started by exploring the limits to the power consumption of a simple static CMOS inverter. Since an analog circuit presents two degrees of freedom, the supply voltage and the bias current, a basic single-stage amplifier was then analyzed for power consumption. The simulated energy vs. supply voltage graphs showed that the energy consumed converges to the same value for different bias currents. Since the energy consumption converges, it appears as if the comparator can simply be biased with a lower bias current at low supply voltages. However, closely examining the dependence of delay on the supply and bias current, we realize that the delay is a stronger function of the bias current than of the supply voltage. Using a low voltage and a low bias current ensures that


the overall static power consumption is low, particularly if the number of comparisons per unit time is low. However, a penalty is paid in terms of delay. On the other hand, biasing the comparator with a higher bias current and supply voltage ensures that the overall delay is at its minimum, at the cost of higher static power consumption. In some ways the energy vs. supply graph can be divided into four quadrants: high-bias high-supply, high-bias low-supply, low-bias high-supply and low-bias low-supply. High-bias high-supply gives the best delay at the cost of high static and dynamic power consumption. Low-bias high-supply gives low static power dissipation at a higher delay cost. Low-bias low-supply gives the lowest power consumption, however at the cost of high delay. High-bias at low-supply seems the most favorable: it gives reasonably low delays at moderate power consumption. Using some of the insights gained from analyzing the limits to the power consumption of the inverter and the differential amplifier, the I&F circuit was analyzed for its power consumption limits. Both the inverter and the single-stage comparator were analyzed as independent circuits. The I&F circuit, on the other hand, is made up of a comparator, a comparator output stage and an inverter chain. Since the energy consumption depends on the overall delay of the system, a basic framework had to be built for analyzing the energy consumption at the system level. Using this framework, we presented the analysis for estimating the optimum supply voltage for minimum energy consumption of a chain of inverters. Building on the inverter-chain analysis, we used the comparator delay model from Chapter 3 to estimate the energy consumption of the I&F circuit as a function of bias current and supply voltage. Simulating the resulting equation shows that, depending on the bias current, there is an optimum supply voltage for the I&F circuit for minimum energy consumption.
Unlike the case of the inverter chain, the resulting equation is too unwieldy to yield an elegant closed-form expression for the optimum supply voltage in terms of the other parameters.


CHAPTER 5
ENERGY HARVESTING

5.1 Introduction

Whether the I&F circuit is used as an encoder for micro-sensor applications or for building large arrays of spiking neuron networks [45, 109], energy efficiency is a key criterion. In the preceding chapters we developed several key energy optimizations for the existing I&F circuit. To further improve the energy efficiency of the I&F circuit, we propose a novel energy harvesting circuit. The I&F circuit generates an output pulse by comparing the capacitor voltage to the threshold; once the pulse is generated, the capacitor is discharged. The energy stored in the capacitor is thus lost each time an output pulse is generated. Instead, a smarter scheme can be developed wherein the capacitor discharges by transferring its stored energy to another capacitor rather than simply discharging to ground. The energy is then harvested each time an output pulse is generated.

5.2 Energy Harvesting Architecture

The block level schematic for the I&F circuit is shown in Figure 5-1. The input signal, Iin, is integrated on the capacitor, Cm, resulting in a voltage, Vm, which is compared to the threshold, Vth±, using two analog comparators. When the threshold is reached, the output of the comparator, Vpulse±, goes high, generating an output pulse and resetting Vm to Vmid. The capacitor voltage is then held at Vmid for a predefined time, τr, after which the process repeats. The charge integrated on the capacitor is wasted every cycle. The essential idea is to recycle this charge and utilize it in some way. The charge stored on capacitor Cm is discharged as current through the reset transistor. This current can be used to charge another capacitor while keeping the voltage Vmid nearly constant. Once sufficient charge has been transferred to and stored on the harvesting capacitor, it can be used either to power a circuit on the chip [110] or to transfer the charge back to the battery [111].


Like most energy harvesting systems, the system described as part of this work can be broadly divided into three components [111]: the energy harvesting loop, the control logic and the energy utilization. The harvesting part encompasses the basic analog circuit for transferring the charge from the membrane capacitor to the harvesting capacitor. The control logic monitors and manages the energy harvesting process, and the energy utilization circuit uses the harvested charge to power back the I&F circuit. This section describes the design of the different components of the energy harvesting architecture.

[Figure 5-1 appears here.]

Figure 5-1. Block level schematic of the biphasic I&F circuit.

5.2.1 Energy Harvesting Circuit

As mentioned previously, the essential idea behind energy harvesting in the I&F circuit is to transfer the charge from the membrane capacitor to another reservoir capacitor. The charge transfer must occur during the pulse time, since that is when the membrane capacitor discharges. The membrane capacitor discharges through the drain current of the reset transistor. This discharge current can be used to charge another capacitor while keeping the source of the reset transistor steady at Vmid. This implies that the drain current of the reset transistor needs to be conveyed to another capacitor while a fixed voltage is maintained. A second generation current conveyor circuit (CCII) performs essentially this operation [112, 113]. The basic architecture of a CCII is shown


in Figure 5-2.

[Figure 5-2 appears here.]

Figure 5-2. Second generation Current Conveyor circuit and I&F circuit.

In the past this CCII architecture has been implemented at CNEL for different applications. It was first used by Harpreet Narula to implement a time-based potentiostat for ion current measurements [114]. The architecture was also used by Jie Xu as a possible analog front-end for recording currents instead of voltages in neural implants. The essential operation of the harvesting loop is very simple. Initially, capacitor Cint is reset to Vss. When the I&F circuit generates an output pulse, Cm discharges and the current flows through the reset transistor onto Cint. Since the amplifier is connected in a negative feedback configuration, the source of the reset transistor is held at approximately Vmid. As more pulses are generated, the capacitor Cint charges and its voltage rises. One of the first concerns with this approach is the power consumption of the amplifier. A possible solution comes from the realization that the current conveyor is required only during the reset phase; thus the amplifier can be disabled in between pulses. Since most implementations of the I&F circuit generate both Vpulse± and their inverted values as digital-like pulses, these can be used to control transmission gates that enable and disable the amplifier bias current. Figure 5-3 shows the amplifier bias-control schematic. The output of the I&F circuit is used as a digital control for enabling and disabling the amplifier.
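A back-of-the-envelope estimate shows how quickly Cint fills up as pulses are generated. The component values below are assumptions chosen for illustration, not values from the fabricated chip:

```python
# Assumed component values for illustration, not from the fabricated chip.
C_M = 1e-12                        # membrane capacitor Cm (F)
C_INT = 100e-12                    # harvesting capacitor Cint (F)
V_TH, V_MID, V_SS = 2.0, 1.5, 0.0  # threshold, reset level and Vss (V)

# Charge dumped through the reset transistor each time Vm resets from Vth to Vmid.
q_pulse = C_M * (V_TH - V_MID)

# With the CCII holding its input at ~Vmid, every pulse raises Vint by q/Cint.
dv_per_pulse = q_pulse / C_INT
pulses_to_full = (V_MID - V_SS) / dv_per_pulse  # Cint can only charge up to ~Vmid

print(f"{q_pulse * 1e15:.0f} fC per pulse, {dv_per_pulse * 1e3:.1f} mV per pulse, "
      f"~{pulses_to_full:.0f} pulses to fill Cint")
```

With these assumed values each pulse transfers 500 fC, raising Cint by 5 mV, so on the order of a few hundred pulses are needed before the harvested charge can be utilized.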


[Figure 5-3 appears here.]

Figure 5-3. Amplifier bias control using the digital-like pulse output from the I&F circuit.

Turning off the amplifier bias current will no longer ensure a voltage close to Vmid at the source of the reset transistor Mr in Figure 5-1. This problem can be circumvented by realizing that the source of the reset transistor can be rewired to Vmid in between pulses and reconnected to the amplifier loop during the pulse times. Turning the amplifier off in between pulses can also result in the PMOS transistor turning on and discharging the capacitor; thus, the switches are used again to disconnect the capacitor from the loop while the amplifier is disabled. Figure 5-4 shows the resulting CCII circuit schematic. As described later, the area and power cost of using just one CCII loop per I&F circuit is fairly high. The real savings occur when an array of I&F circuits utilizes the same CCII loop. Once an array of I&F circuits is connected to the harvester, the output pulses of the individual I&F circuits are logically OR-ed to generate the control signals for the switches.

5.2.2 Energy Utilization

As mentioned previously, during initialization the capacitor voltage is reset to Vss, and with every pulse the capacitor voltage rises. However, the capacitor voltage can rise only close to Vmid. The reason is that any voltage greater than this will result in the PMOS transistor P1 in Figure 5-2 turning off, thereby removing the negative feedback. In the absence of sufficient negative feedback, the voltage at the source of the reset


transistor would no longer be close to Vmid.

[Figure 5-4 appears here.]

Figure 5-4. Second generation Current Conveyor circuit with the switches and the I&F circuit.

Once the capacitor voltage has risen close to Vmid, the harvesting loop needs to be disabled and the harvested energy needs to be utilized. The harvested energy can either be used to charge the battery or to power back an on-chip circuit. Energy harvesting systems usually use some form of power converter to convert the voltage on the capacitor [115]. The power converter boosts the output voltage of the energy transducer to a level that enables utilization of the harvested energy. There are typically two schemes used for power conversion in harvesting systems: boost conversion and charge pumps. A boost converter requires an inductor, which increases the system cost and size. Charge pumps, on the other hand, use only switches and capacitors and are more area and cost efficient. Charge pump circuits switch the two plates of a capacitor to effectively generate a higher voltage. For the scheme described above, the obvious choice is a charge-pump circuit instead of a boost converter. The reason for this choice is that the harvesting capacitor charges up to roughly Vmid, and the Vmid voltage lies midway between Vdd and Vss for reasons explained in Chapter 2. Thus one very simple scheme


for boosting the voltage on the harvesting capacitor is to connect the negative plate of the capacitor (the plate connected to Vss in Figure 5-4) to Vmid after it has been disconnected from the harvesting loop. This boosts the voltage on the capacitor to near Vdd. Energy harvesting architectures use a DC-DC converter or a regulator after the up-conversion of the harvested energy to power load circuits [116]. This is typically done to ensure that a constant voltage is supplied to the load circuits. Since the aim of this work is to illustrate energy harvesting as a feasible option for I&F circuits, we use the capacitors to directly power back the circuits without implementing a regulator.

5.2.3 Finite State Machine (FSM) Based Control Logic

As mentioned above, the voltage on the harvesting capacitor cannot exceed Vmid; thus the harvesting capacitor needs to be disconnected from the harvesting loop and harvesting needs to be disabled. To maximize the energy efficiency of the harvesting loop, two capacitors can be used as analog ping-pong buffers instead of just one harvesting capacitor. Analog ping-pong buffers imply that once the first capacitor is fully charged, a second capacitor is swapped in its place in the harvesting loop. While the first capacitor powers the circuit, the second continues to harvest energy. To manage the harvesting capacitors as ping-pong buffers, an efficient digital control is essential. The basic algorithm for managing the ping-pong buffers is fairly straightforward and is shown in Figure 5-5. During the initialization step the circuit is powered through the supply while the first capacitor charges through the harvesting loop. Once the first capacitor, C0, charges all the way to Vmid, it is disconnected from the harvesting loop and the second capacitor, C1, is connected in its place. Meanwhile the supply is disabled and C0 is used to power the circuit.
C0 continues to power the circuit and starts to discharge. The process continues until either C0 discharges significantly below Vdd or C1 charges close to Vmid. If C1 manages to charge before C0


has discharged fully, then the harvesting loop is disabled and the system waits until C0 is discharged, so that harvesting can resume on C0 while C1 powers the system. On the other hand, if C0 discharges before C1 is full, then the power supply is enabled and the system waits for C1 to charge fully so that it can be used to power the circuit while harvesting continues on C0. In summary, one capacitor harvests energy while the other powers the circuit. If the powering capacitor discharges before the other capacitor has fully charged, the power supply is enabled and the system waits for the harvesting capacitor to charge fully. In the scenario that the harvesting capacitor charges fully while the circuit is still powered by the other capacitor, the harvesting loop is disabled, since both capacitors are now fully charged. The harvesting loop is re-enabled only when one of them discharges. Since the control algorithm needs to be implemented as a digital circuit, a finite state machine (FSM) for the control flow has to be designed. The equivalent FSM for the control algorithm is shown in Figure 5-6. The FSM requires a combination of four basic conditional statements to change states. These conditions are: (a). Vcap0 ≈ Vmid (b). Vcap0