
Delay-line-based Signal Processing ASIC for Velocity Selective Nerve Recording

Robert Rieger
Electrical Engineering Department, National Sun Yat-Sen University, 804 Kaohsiung, Taiwan, [email protected]

John Taylor
Department of Electronic and Electrical Engineering, University of Bath, Bath BA2 7AY, UK, [email protected]

Abstract— This paper describes an integrated circuit (ASIC) implementing the core functionality for the technique of velocity selective recording (VSR) of ENG, in which multiple neural signals are matched and summed to identify excited axon populations in terms of velocity. Delay matching is achieved using multiple sample-and-hold blocks arranged to realize a matching range of 10 µs to 100 µs for eight input channels (80 µs to 800 µs total delay) as well as signal summation. The system, laid out in 0.35 µm CMOS technology, occupies 0.78 mm² core area and consumes 30 µW of power from a 3 V supply. A buffer driver stage is added which consumes 150 µW. Simulated results are provided to confirm that the velocity spectrum is successfully extracted using the proposed system.

I. INTRODUCTION

A current challenge in neuroprosthetics research is the use of naturally occurring neural signals recorded from peripheral nerves (ENG) to provide feedback and control to artificial devices [1, 2]. Fully implanted systems in particular are limited by the available power budget and the small form factor, and therefore require highly integrated electronics for signal capture and processing. Neural signals recorded using cuff electrodes typically span a bandwidth of about 300 Hz to 10 kHz, with the main signal power located below 5 kHz. Sampling the signal at the Nyquist rate with 8 to 10 bit resolution thus results in a considerable data volume, requiring a substantial share of the overall system power for transmission or storage. The use of multiple parallel recording channels further exacerbates this problem. Signal-to-information conversion realized by the implanted system has the potential to reduce the effective data rate by extracting useful information from the raw data and relaying only this reduced set of information.

Velocity selective recording (VSR) has been proposed as a technique that enables information on the velocity and direction of action potential (AP) propagation to be extracted from a whole nerve recording. It relies on the use of a multi-electrode cuff (MEC) that records propagating APs at several positions along the axis of the nerve. This principle was described in detail in [3], [4] and only a very brief overview is given in Section II. In addition, recent papers [5], [6] describe VSR methods and their limitations using data obtained in vitro from explanted nerves of frogs.

It is envisaged that a practical VSR system will have the general form of Fig. 1, in which measurements can be made from several nerves (three in the system shown). Each nerve is fitted with an MEC on which is mounted an ASIC containing low-noise, low-power, multi-channel pre-amplifiers, the signal processing system described in this paper, and analogue-to-digital conversion (ADC).

In a previous implementation reported in [7], the electrode ASIC only performed amplification and ADC. The full-bandwidth digital output was then sent over implanted cables to a communication unit where the VSR processing algorithm was executed by a digital processor. A large portion of power and area was consumed by the digital sample-and-delay blocks and by the summing units required for VSR. The integrated processing system described in this paper uses switched-capacitor sample-and-hold (S&H) stages to realize the delay and also to implement signal summation. Using analogue-amplitude signal processing techniques requires that the system be placed on the electrode ASIC of the system in Fig. 1, before the ADC. The design of the S&H stage was discussed in [8]. Here, we expand on this idea and describe the chip implementation of a complete 8-channel signal processor with considerably smaller area and power than that of the comparable digital unit. The processor requires only two digital-amplitude timing signals to control the matched velocity (i.e. the velocity at which activity is detected) and the effective output rate of the delay-matched signal.

Fig. 1. Plan of an implanted system for VSR interfacing with three nerves. Each nerve is fitted with an MEC and an electrode ASIC. The proposed processor would reside on the electrode unit and reduce the data traffic on the implanted cables.

This work was supported in part by grants NSC 102-2221-E-110-007-MY2 and MOST 103-2221-E-110-026-MY2.

978-1-4799-5230-4/14/$31.00 ©2014 IEEE

II. SYSTEM PRINCIPLE

As APs propagate along the nerve, voltages appear at the MEC electrodes, from which double-differential signals (tripoles) are formed by appropriate connection of the preamplifiers [8]. There is a delay, Td, between the appearance of the signals at successive tripole outputs, which is a function of both the AP propagation velocity v and the inter-tripole spacing d, expressed by

$$T_d = \frac{d}{v} \tag{1}$$

If equal and opposite delays are introduced subsequently by the signal processing (delay matching), and the tripole signals are added, the resulting output power is a maximum for that velocity. This allows the system in principle to classify excited populations by their propagation velocities. If the velocity bands are well chosen, they may correspond to distinct physiological functions. Here, we target a velocity range of about 16 m/s to 120 m/s and assume a pitch d of 1.5 mm, which requires Td to be variable over the range 93.75 µs to 12.5 µs to achieve matching. If it is desired not only to detect activity at one given velocity but also to determine a wider velocity spectrum, the delays must be made variable so that the system can be tuned to a chosen velocity using (1).

Fig. 2. Circuit schematic diagram of the S&H column and associated Timing Unit. [Schematic labels: Timing Unit, clk_Ts, clk_Td, clk, sum, sum_out, TP, Vref, VSS, 0.6 pF.]

Implementing Td in a digital circuit requires sampling the input signal at an interval Td, which may be very different from, and often much shorter than, the Nyquist sample interval TS determined by the input signal bandwidth. For example, discriminating a mid-range velocity of 60 m/s requires Td = 25 µs. The system described here eliminates the need for signal conversion before signal processing, hence eliminating the need for a full-speed ADC. However, only after all channels have been captured can the summation take place, which occupies all S&H stages for the maximum delay time; for 8 tripole channels this is 7·Td. To meet the Nyquist criterion, the sample interval TS should be no longer than 100 µs for the 5 kHz bandwidth signal assumed here. Consequently, several time-interleaved parallel S&H stages are needed. The required number of parallel stages (S) is given by

$$S > \frac{7 \cdot T_{d(\max)}}{T_S} \tag{2}$$
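As a quick numerical check of (1) and (2), the following Python sketch computes the matched delay for the targeted velocity range and the resulting lower bound on S. The function names are illustrative only and are not part of the design; the values d = 1.5 mm, 16 m/s to 120 m/s, Td(max) = 94 µs and TS = 100 µs are those quoted in the text.

```python
import math


def matched_delay_us(pitch_mm, velocity_m_per_s):
    """Inter-tripole delay Td = d / v from (1), returned in microseconds."""
    # mm divided by m/s gives milliseconds; scale by 1000 to get microseconds.
    return pitch_mm / velocity_m_per_s * 1e3


def parallel_stage_bound(td_max_us, ts_us, channels=8):
    """Right-hand side of (2) and the smallest integer S that exceeds it."""
    bound = (channels - 1) * td_max_us / ts_us
    return bound, math.floor(bound) + 1


if __name__ == "__main__":
    for v in (16.0, 60.0, 120.0):
        print(f"v = {v:5.1f} m/s  ->  Td = {matched_delay_us(1.5, v):.2f} us")
    bound, stages = parallel_stage_bound(94.0, 100.0)
    print(f"S > {bound:.2f}, so at least {stages} parallel S&H stages")
```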

This implementation requires S > 6.58 (for maximum Td = 94 µs). A total of eight S&H blocks are used here, which leaves room for reducing TS slightly as discussed in the following section.

III. CIRCUIT IMPLEMENTATION

The system is composed of eight identical S&H blocks that realize the parallel stages. For the 8-channel system, each S&H block consists of eight storage elements, each built from a MOSFET transmission gate acting as a sampling switch and a storage capacitor. This arrangement is shown in Fig. 2. The sampling switches are controlled by digital timing signals clk generated in the associated Timing Unit. Each switch is opened with a delay Td relative to the preceding switch in the block, capturing the tripole input signal with the required delay. An externally supplied timing signal clk_Ts determines the delay between the eight parallel S&H blocks and hence sets the effective sample rate of the system. The positive edge of clk_Ts aims the S&H blocks in consecutive order. Only after a block has been aimed can it be triggered to acquire samples. The first sample is then taken on the first positive edge of a second externally provided clock signal, clk_Td. Subsequent samples are acquired on every fourth positive edge of this clock. More generally, sampling can be performed on every Mth edge, where it is required that M • S as will be discussed further on. Letting the period of clk_Td be Td_clk = Td/M yields the required inter-sample delay Td. The sample aperture window of the S&H changes proportionally with Td, and the switch resistance is therefore designed to be small enough to make any effect on the sampled voltage negligible.

Introducing the frequency scaling factor M is essential to realize the system sample interval, as follows. A delay occurs between aiming a block and taking the first sample, which ultimately depends on the ratio of TS and Td. Therefore, the actual sample interval TS' changes from sample to sample within the range TS to TS + Td/M and is given for the nth sample by

$$T_S'(n) = (1 - n) \cdot T_S + \frac{T_d}{M} \cdot \mathrm{ceil}\!\left(\frac{M \cdot n \cdot T_S}{T_d}\right) \tag{3}$$

where the function ceil returns its argument rounded up to the next integer. To ensure a sufficient minimum sample frequency, a suitably lower target TS may be chosen within the limits imposed by (2). For example, letting TS = 82 µs enables using the whole Td tuning range with a sample frequency of at least 9.5 kHz (for M = 4, TS' never exceeds 82 µs + 94 µs/4 = 105.5 µs). The deviation from the target sample frequency, and also the minimum sample frequency, can be improved by increasing the division factor M and providing a proportionally faster time base Td_clk.

The summation of the channel voltages within an S&H block is initiated after the last sampling switch in that block has been opened. Summation is obtained by connecting all storage capacitors in parallel using another set of switches, allowing charge redistribution to establish the average of all the sampled voltages. Since all sampling capacitors are of nominally identical size, summation of all eight channels is obtained with a gain of 1/8.

The detailed block diagram of the Timing Unit is shown in Fig. 3. It consists of a chain of D-type flip-flops (FFs) which are all clocked by clk_Ts. Eight FFs are used in this design as S = 8. The FFs generate pulses on lines aim1-aim8, where each line is routed to a copy of the Spike Generation circuit shown in the shaded area of Fig. 3. The delay between the pulses appearing on adjacent lines aim1-aim8 is set by the edge separation of signal clk_Ts. Each pulse triggers the generation of a volley of spikes clk by the associated Spike Generation unit, which control the sampling switches in the S&H columns of Fig. 2. As already mentioned, the delay between successive spikes is set by signal clk_Td. An example of the resulting timing waveforms is given in Fig. 6, described in detail in the following section. Finally, an analogue multiplexer (MUX) routes the summed voltage from the current S&H block to an output buffer, which was added to drive an oscilloscope probe for chip testing and to provide an additional gain of 4 V/V.

Fig. 3. Block diagram of the Timing Unit, which generates the switch signals for the S&H columns shown in Fig. 2.

Fig. 4. Charge amplifier output stage providing a buffered test output Vo. A MUX selects the processed sample from the current S&H unit. [Schematic labels: clear, sum_out1 ... sum_outS, MUX, 1.2 pF, Vo.]
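The sample-interval jitter described by (3) can be explored with a few lines of Python. This is only a sketch of the formula as written above, not of the Timing Unit hardware; the function names are hypothetical, and M = 4 is taken from the statement that samples are acquired on every fourth edge of clk_Td.

```python
import math


def actual_sample_interval_us(n, ts_us, td_us, m):
    """TS'(n) from (3): (1 - n)*TS + (Td/M) * ceil(M*n*TS / Td), in microseconds."""
    tick_us = td_us / m  # period of clk_Td, i.e. Td_clk = Td / M
    return (1 - n) * ts_us + tick_us * math.ceil(m * n * ts_us / td_us)


def min_sample_rate_hz(ts_us, td_us, m):
    """Lowest sample rate implied by the upper limit TS + Td/M of the interval."""
    return 1e6 / (ts_us + td_us / m)


if __name__ == "__main__":
    # Timing example quoted for Fig. 6: TS = 100 us, Td = 30 us, n = 1.
    print(actual_sample_interval_us(1, 100.0, 30.0, 4), "us")
    # Worst case over the tuning range for the reduced target TS = 82 us.
    print(round(min_sample_rate_hz(82.0, 94.0, 4)), "Hz")
    # Interval for the first few samples at TS = 82 us, Td = 94 us.
    print([actual_sample_interval_us(n, 82.0, 94.0, 4) for n in range(1, 6)])
```

Increasing `m` shrinks the clk_Td period and therefore narrows the spread of the computed intervals, which matches the remark that a larger division factor M improves the deviation from the target sample frequency.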
The block diagram of the output stage is shown in Fig. 4. The amplifier is implemented as a folded-cascode opamp configured as a charge-to-voltage converter using a 1.2 pF feedback capacitor as shown. A shunting switch controlled by signal clear is used to dump the capacitor charge and zero the output between samples. The negative input of the opamp connects to the summing node of one of the S&H units described earlier. A multiplexer (MUX) selects the S&H stage which holds the current summed sample to be connected to the amplifier (i.e. it de-multiplexes the parallel processing stages at intervals TS'), so that a continuous output signal with the required update rate is obtained.

Fig. 5. Layout of 4 branches of the VSR processor core in 0.35 µm CMOS technology. An identical copy is used to implement the remaining 4 branches. The total active area of two blocks measures 850 × 900 µm². [Layout labels: S&H capacitor, S&H Timing column, Output selector, Buffer, MUX.]

IV. RESULTS AND DISCUSSION

The signal processor is realized in 0.35 µm CMOS double-polysilicon technology, using three of the four available metal layers for interconnect. The layout of the core area including four of the parallel S&H processing branches is shown in Fig. 5. A copy of this layout is used to add the remaining four branches. The active area for all eight branches is then 850 × 900 µm².

Fig. 6 shows the simulated timing waveforms clk generated in the Timing Unit for the first two S&H blocks. Corresponding signals are obtained for the remaining switches but are not shown for brevity. Packets of eight clk pulses are discernible, where a negative pulse causes the corresponding S&H capacitor to sample the input signal and hold it during the remaining time when the signal is high. The sampling delay is 30 µs for the simulation example shown and the time delay TS' between generated clk signals is 105 µs (as expected from (3) with TS = 100 µs, M = 4 and n = 1). The positive edge of the summing signal follows the last pulse in the packet of clk pulses, triggering the summation of the sampled voltages.

The system functionality was simulated next. An AP template function [9] was used to form tripole signals, which were applied to the system input as test signals. Multiple transient simulations were then performed, sweeping the S&H delay from 30 µs to 100 µs in steps of 10 µs. This process was repeated for two natural AP delays Td of 50 µs and 80 µs. Velocity spectra were obtained from the transient system output voltages by applying bandpass filtering, calculating the signal power using Matlab software and plotting the power versus the reciprocal of the S&H delay time, as shown in Fig. 7. As expected, the regions of maximum power correspond to the natural delay of the tripole signals. Using (1), the corresponding AP velocity can thus be estimated.

V. CONCLUSIONS

A circuit implementation of the delay-and-add matrix used for VSR is presented in this paper. The layout including the clock generation circuits and eight parallel S&H branches occupies an active area of about 0.78 mm² in 0.35 µm CMOS technology. Test chips are currently in fabrication and will soon be available for testing. The trade-off between silicon area (parameter S), sample rate and velocity tuning range was discussed. The area consumption compares favourably with the approximately 10 kbits of RAM required in a fully digital processing back-end realizing delay-and-add, which has an estimated size exceeding 14 mm². Also, a significant saving in power consumption is anticipated, as the operation of the S&H delay-and-add is essentially based on passive charge redistribution. Both these parameters are important in the design of an implanted device. Further comparison is provided in Table I.

However, issues remain that need to be addressed to expand the proposed mixed-signal processing core into a full replacement of a digital VSR processor. In particular, bandpass filtering of the output signal should be added as this increases velocity selectivity. Also, the generation of the tuning curves could be supported by on-chip functions, which requires implementing rectification and signal envelope detection. However, since these operations work at the reduced system output data rate, conventional implementation as a digital circuit following the ADC appears feasible without compromising the advantage of low power and small area compared to the fully digital design.

Fig. 6. S&H switch signals generated in the Timing Unit for the first two columns, for Td = 30 µs and TS = 100 µs as set by the control signals clk_Td and clk_Ts, respectively.

REFERENCES

[1] M. Haugland and J. Hoffer, "Slip information obtained from the cutaneous electroneurogram: Application in closed loop control of functional electrical stimulation," IEEE Trans. Rehab. Eng., vol. 2, pp. 29-36, 1994.
[2] B. Popovic, R. B. Stein, et al., "Sensory nerve recording for closed-loop control to restore motor functions," IEEE Trans. Biomed. Eng., vol. 40, no. 10, pp. 1024-1031, 1993.
[3] J. Taylor, N. Donaldson, and J. Winter, "Multiple-electrode nerve cuffs for low velocity and velocity-selective neural recording," Medical and Biological Engineering and Computing, vol. 42, pp. 634-643, 2004.
[4] J. Taylor, M. Schuettler, C. Clarke, and N. Donaldson, "The theory of velocity selective neural recording: a study based on simulation," Med. Biol. Eng. Comp., vol. 50, no. 3, pp. 309-318.
[5] M. Schuettler, N. Donaldson, V. Seetohul, and J. Taylor, "Fibre-selective recording from the peripheral nerves of frogs using a multi-electrode cuff," J. Neural Eng., vol. 10, no. 3, 03016.
[6] R. Rieger, J. Taylor, E. Comi, et al., "Experimental Determination of Compound A-P Direction and Propagation Velocity from Multi-Electrode Nerve Cuffs," Med. Eng. Phys., vol. 26, pp. 531-534, July 2004.
[7] C. Clarke, X. Xu, R. Rieger, J. Taylor, N. Donaldson, "An Implanted System For Multi-Site Nerve Cuff-Based ENG Recording Using Velocity Selectivity," Analog Integrated Circuits and Signal Processing, vol. 58, no. 2, pp. 91-104, February 2009.
[8] R. Rieger, J. Taylor, "A Switched-Capacitor Front-End for Velocity-Selective ENG Recording," IEEE Trans. Biomed. Circ. Syst., vol. 7, no. 4, pp. 480-488, August 2013.
[9] R. Rieger, M. Schuettler, S.C. Chuang, "A Device for Emulating Cuff Recordings of Action Potentials Propagating Along Peripheral Nerves," IEEE Trans. Neural Syst. Reha. Eng., in print, DOI 10.1109/TNSRE.2014.2300933, January 2014.

TABLE I. COMPARISON OF THIS WORK WITH A DIGITAL VSR PROCESSOR REALIZED IN THE SAME CMOS TECHNOLOGY.

                          Area        Power                                   Sample clock       Tuning range
Digital processor*) [7]   14 mm²      70 mW                                   50 kHz             20 µs - 300 µs
This work                 0.78 mm²    30 µW system + 130 µW test probe amp    10 kHz ±33% max    10 µs - 100 µs

*) Includes BP filters.

Fig. 7. Velocity spectra calculated from the system output when the AP delay was 50 µs and 80 µs respectively and the delay introduced by the system was varied from 30 µs to 100 µs in steps of 10 µs.
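The delay sweep behind the velocity spectra of Fig. 7 can be illustrated with an idealized Python model of delay-and-add. This is a sketch under stated assumptions rather than a reproduction of the circuit simulation: the Mexican-hat pulse `ap_pulse` is a hypothetical stand-in for the AP template of [9], the S&H timing of (3) and the bandpass filter are omitted, and all function names are illustrative.

```python
import math

N_CHANNELS = 8  # tripole channels, as in the described system
TRIAL_DELAYS_US = [30.0 + 10.0 * i for i in range(8)]  # 30 us to 100 us in 10 us steps


def ap_pulse(t_us, sigma_us=100.0):
    """Assumed tripole waveform (Mexican-hat pulse); not the template used in the paper."""
    x = t_us / sigma_us
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def delay_and_add_power(natural_td_us, trial_td_us, dt_us=5.0):
    """Output power of an ideal delay-and-add stage for one trial delay."""
    power = 0.0
    steps = int(3000.0 / dt_us) + 1
    for i in range(steps):
        t = -1000.0 + i * dt_us
        total = 0.0
        for k in range(N_CHANNELS):
            arrival = k * natural_td_us                    # AP reaches tripole k
            inserted = (N_CHANNELS - 1 - k) * trial_td_us  # delay added by the processor
            total += ap_pulse(t - arrival - inserted)
        y = total / N_CHANNELS  # charge sharing averages the samples (gain 1/8)
        power += y * y * dt_us
    return power


def velocity_spectrum(natural_td_us, trial_delays_us):
    """List of (trial delay, output power) pairs for one natural AP delay."""
    return [(td, delay_and_add_power(natural_td_us, td)) for td in trial_delays_us]


def best_delay(natural_td_us, trial_delays_us):
    """Trial delay that gives the maximum output power."""
    return max(velocity_spectrum(natural_td_us, trial_delays_us), key=lambda p: p[1])[0]


if __name__ == "__main__":
    for natural in (50.0, 80.0):
        td = best_delay(natural, TRIAL_DELAYS_US)
        # Velocity from (1) with d = 1.5 mm: mm/us equals 1000 m/s.
        print(f"natural Td = {natural} us: peak at {td} us, v = {1.5 / td * 1e3:.2f} m/s")
```

In this model the output power peaks where the trial delay equals the natural delay, because only then do all eight shifted copies of the pulse coincide before averaging.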
