Copyright 2005 Society of Photo-Optical Instrumentation Engineers. This paper will be published in the Proceedings of Microelectronics for the New Millennium Symposium and is made available as an electronic preprint with permission of SPIE. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.
A Bioinspired Vision Chip Architecture for Collision Detection in Automotive Applications
R. Laviana*, L. Carranza, S. Vargas, G. Liñán, E. Roca.
Instituto de Microelectrónica de Sevilla, Centro Nacional de Microelectrónica, Avda. Reina Mercedes s/n, Campus Universidad de Sevilla, E-41012 Sevilla (Spain).

ABSTRACT
This paper describes the architecture and retino-topic unit of a bio-inspired vision chip intended for automotive applications. The chip contains an array of 100 × 150 sensors which are able to capture high dynamic range (HDR) images, with a programmable compressive characteristic. The chip also incorporates a mechanism for adapting the global exposure time to the average illumination conditions. Average values are evaluated over image areas which are programmable by the user. In addition to the HDR pixel, every retino-topic unit in the array incorporates digital memory for three 6-bit pixel values (18 bits), as required for the implementation of a bioinspired computing model for collision detection which has been developed in the framework of a multidisciplinary European research project. All processing steps are executed off-chip, though we are currently working on the design of tiny digital processors (one per column) which will allow the whole model to run on-chip in a future version of this prototype. The chip has been designed in a 0.35µm 2P-4M technology and maintains its correct operation in extreme temperature conditions (from -40°C to 110°C).

Keywords: CMOS image sensor, Bio-Inspired Vision Chips, Vision Chips for Automotive Applications.
1. INTRODUCTION
During the last few years, advances in image sensors and microelectronic technologies have enabled increasingly sophisticated imaging applications [1]. This evolution has also reached the automotive industry, where new products offering vision applications for safety enhancement appear almost every day, for applications such as cruise control [2], lane departure [3], parking control [4], etc. One of the most exciting and gratifying fields of research in this area is collision avoidance, which requires a prior assessment of collision threats. This preliminary identification of collision threats is the main objective of the Locust project [5], which is the framework in which this chip has been designed. In this multidisciplinary European research project, collision threats are identified by mimicking a particular neuronal structure called the Lobula Giant Movement Detector (LGMD), which is found in the eyes of locusts [6,7]. Though other bioinspired collision detection systems have been reported by Bermudez [8] or Harrison [9], to cite some of the most recently published works, they have not been designed to satisfy the demanding specifications of automotive applications. The chip presented in this paper is a first step towards the implementation of a complete Vision System on a Chip (VSoC) which must be able to detect objects on a collision course using a computing algorithm relying on the locust LGMD neuronal structure. The main design challenges are, on the one hand, the mapping of a bio-inspired retino-topic architecture into a compact silicon chip and, on the other hand, reliably meeting very hard system constraints under the tough operating environments of automotive applications.
* Mail: [email protected], Phone: +34-955-056666, Fax: +34-955-056686
2. BIOINSPIRED COLLISION DETECTION MODEL AND SYSTEM REQUIREMENTS
2.1 Bioinspired algorithm description
The original model from Dr. Rind's group [6] is composed of four retino-topically organized layers of interacting neurons. Signals from the last layer converge into a single neuron, the already mentioned LGMD, which fires alarm spikes whose frequency tells the locust how dangerous the present situation is. During the course of the LOCUST project, this original model has served as the starting point for the development of a set of new computational models. The aim of this evolution in modelling was to end up with a model which better fits the specifications and needs of the Locust project, including VLSI system integration. After trading off performance, simplicity, and processing results, a final choice was made among the different generated models. The selected model is described in detail by Cuadri [7] in the proceedings of this conference; here we only provide a brief mathematical description in order to justify the operators and model needs which will be presented in the next section. This simplified computational model, conceptually shown in Fig. 1, employs five types of neurons.

Fig. 1: Block Diagram of the simplified model

It takes its input from P cells, which play the role of photoreceptors, transducing the incident light into an electrical signal* $L_{ij}(t)$, which represents the input image. ON-OFF neurons compute a movement map by evaluating the difference between the current image $L_{ij}(t)$ and that in a previous frame $L_{ij}(t - \Delta t)$:

\[ C_{ij}(t) = \frac{L_{ij}(t) - L_{ij}(t - \Delta t)}{2} \qquad (1) \]
The output of each ON-OFF cell feeds its corresponding Inhibition neuron. These Inhibition neurons produce an output $I_{ij}(t)$ which is simply given by:

\[ I_{ij}(t) = \mathrm{InhCoeff}(n) \cdot C_{ij}(t - \Delta t) \qquad (2) \]
where $\mathrm{InhCoeff}(n)$ is an inhibition coefficient which depends on the number of spikes generated by the LGMD within a predefined time window (5 frames in our case). In S-units, excitation from ON-OFF cells and inhibition from I-units compete to generate a net activity potential as:

\[ S_{ij}(t) = \max\left( C_{ij}(t) - I_{ij}(t),\; 0 \right) \qquad (3) \]
In the last layer, the LGMD neuron collects all $S_{ij}(t)$ signals to create a global excitation input $e(t)$:

\[ e(t) = \frac{\sum_{i,j} S_{ij}(t)}{150} \qquad (4) \]

* Which, a priori, could correspond to either a voltage or a current.
The emulation of the LGMD membrane potential is calculated according to the following temporal difference equation (z domain)*:

\[ \mathrm{LGMD}(z) = \alpha_2 \cdot z^{-2}\,\mathrm{LGMD}(z) + \alpha_1 \cdot z^{-1}\,\mathrm{LGMD}(z) + \alpha_0 \cdot e(z) \qquad (5) \]
Finally, one spike is fired whenever this membrane potential surpasses an adaptive threshold $V$, given by:

\[ V(z) = \beta_3 \cdot z^{-14}\,v(z) + \beta_2 \cdot z^{-10}\,v(z) + \beta_1 \cdot z^{-5}\,v(z) + \beta_0 \qquad (6) \]
where $\beta_3$, $\beta_2$, $\beta_1$ are tuning coefficients, $\beta_0$ is the minimum threshold and $v(z)$ is calculated from:

\[ v(z) = \alpha_2 \cdot z^{-2}\,v(z) + \alpha_1 \cdot z^{-1}\,v(z) + \alpha_0 \cdot e(z) \qquad (7) \]
After generating one spike, the LGMD is hyperpolarized by setting $\mathrm{LGMD}(t)$, $\mathrm{LGMD}(t - \Delta t)$ and $\mathrm{LGMD}(t - 2\Delta t)$ to 0.
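The per-frame update defined by Eqs. (1) to (7) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the class name `SimplifiedLGMD`, the `inh_coeff` callable and every coefficient value used below are hypothetical placeholders (the tuned parameters are not given in this section), and the divisor 150 in Eq. (4) is taken as written.

```python
from collections import deque


class SimplifiedLGMD:
    """Frame-by-frame sketch of Eqs. (1)-(7); parameter values are placeholders."""

    def __init__(self, rows, cols, alpha, beta, inh_coeff, window=5):
        self.alpha = alpha            # (alpha0, alpha1, alpha2)
        self.beta = beta              # (beta0, beta1, beta2, beta3)
        self.inh_coeff = inh_coeff    # hypothetical callable: spike count -> InhCoeff(n)
        self.prev_L = [[0.0] * cols for _ in range(rows)]   # L(t - dt)
        self.prev_C = [[0.0] * cols for _ in range(rows)]   # C(t - dt)
        self.lgmd_hist = [0.0, 0.0]                  # LGMD(t - dt), LGMD(t - 2dt)
        self.v_hist = deque([0.0] * 14, maxlen=14)   # v(t - dt) ... v(t - 14dt)
        self.spikes = deque([0] * window, maxlen=window)    # spikes in the last frames

    def step(self, L):
        """Process one frame L (list of rows); returns (spike, LGMD(t), V(t))."""
        a0, a1, a2 = self.alpha
        b0, b1, b2, b3 = self.beta
        k = self.inh_coeff(sum(self.spikes))
        # Eq. (1): halved frame difference (the very first frame is compared with zeros).
        C = [[(l - p) / 2.0 for l, p in zip(l_row, p_row)]
             for l_row, p_row in zip(L, self.prev_L)]
        # Eqs. (2) and (3): delayed inhibition and half-wave rectification, summed.
        s_sum = sum(max(c - k * pc, 0.0)
                    for c_row, pc_row in zip(C, self.prev_C)
                    for c, pc in zip(c_row, pc_row))
        e = s_sum / 150.0                                                   # Eq. (4)
        lgmd = a2 * self.lgmd_hist[1] + a1 * self.lgmd_hist[0] + a0 * e     # Eq. (5)
        v = a2 * self.v_hist[1] + a1 * self.v_hist[0] + a0 * e              # Eq. (7)
        V = (b3 * self.v_hist[13] + b2 * self.v_hist[9]
             + b1 * self.v_hist[4] + b0)                                    # Eq. (6)
        spike = lgmd > V
        # After a spike the membrane history is reset to zero (hyperpolarization).
        self.lgmd_hist = [0.0, 0.0] if spike else [lgmd, self.lgmd_hist[0]]
        self.v_hist.appendleft(v)
        self.spikes.appendleft(int(spike))
        self.prev_L = [[float(x) for x in row] for row in L]
        self.prev_C = C
        return spike, lgmd, V
```

As a purely numerical check with placeholder parameters (alpha = (1, 0, 0), beta = (1, 0, 0, 0), no inhibition), a uniform jump from 0 to 60 on a 10 × 10 image gives e(t) = 100 · 30 / 150 = 20, which exceeds the threshold of 1 and fires one spike; repeating the same frame produces no further excitation.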
2.2 System requirements for automotive applications
Circuit requirements are provided in two groups: those related to environmental conditions in automotive applications and those related to the processing needs of the bioinspired model.
2.2.1 Operating conditions
In this first set, the most important specifications are:
Operating Temperature: Ambient temperatures ranging from -40°C up to +110°C. This wide temperature range basically affects the following aspects:
• High temperatures severely degrade the performance of analog memories on the chip due to the exponential increase of diode leakage and subthreshold currents with temperature.
• Very high temperatures raise the photodiode's dark current and electrical noise in general.
Illumination Conditions: The chip must cover an illumination range of at least 72dB within a frame. Also, global illumination conditions may vary between 0.3 cd/m² and 30,000 cd/m². This has the following consequences:
• The sensor must be able to adapt to 100dB of interscene variations in illumination conditions. Therefore, an autonomous mechanism for adaptation from dark to bright situations and vice versa needs to be included.
• The need for 72dB of intrascene dynamic range also forces us to include a compression mechanism to adapt the sensor response to this High Dynamic Range (HDR) of operation within a frame [12].
Frame Rate: The sensor must provide at least 25 frames per second (25fps). The system is required to work in real time. Hence, processing time must not exceed 40ms.
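The two derived figures above (100dB of interscene range and a 40ms processing budget) follow directly from the stated limits. The short check below is a sketch that assumes the 20·log10 convention for expressing the luminance ratio in dB; the function names are illustrative only.

```python
import math


def range_db(minimum, maximum):
    """Ratio between two levels in dB, using the 20*log10 convention (an assumption)."""
    return 20.0 * math.log10(maximum / minimum)


def frame_period_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


interscene_db = range_db(0.3, 30000.0)   # luminance limits in cd/m^2 -> about 100 dB
budget_ms = frame_period_ms(25)          # 25 fps -> 40 ms per frame
```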
2.2.2 Processing requirements
These requirements are determined by the computational and processing needs of the model in terms of required spatial resolution and uniformity, robustness under typical non-idealities in integrated circuits (noise, mismatch, etc.), number of elementary operations per second (OPS) to be executed, etc. By inspecting the equations describing the model, one can easily identify the following operators as the building blocks of the whole processing infrastructure:
1. Memorization:
• Data storage for at least one frame (40ms) in the retino-topic variables.
2. Linear Operations:
• Additions and Subtractions.
• Scaling by a constant.
3. Non-Linear Operations:
• Full-wave and half-wave rectification.
* Model parameters have been specially tuned for this application by employing a novel genetic algorithm [11].
Table 1: Number of Operations to Execute per Frame in the simplified model

| Neuron     | Add/Subs.     | Scale by Constant | Compare | Assign (a)    | # Neurons     |
|------------|---------------|-------------------|---------|---------------|---------------|
| ON-OFF     | 1             | 0                 | 1       | 1             | N×M           |
| Inhibition | 0             | 1                 | 0       | 1             | N×M           |
| S-Units    | 1             | 0                 | 0       | 1             | N×M           |
| LGMD       | (N×M) + 2     | 3                 | 0       | 1             | 1             |
| Total (b)  | 3·(N×M) + 2   | (N×M) + 3         | (N×M)   | 3·(N×M) + 1   | 8·(N×M) + 6   |

a. These are the output function operators, such as $I_{ij}(t) \leftarrow \mathrm{InhCoeff} \cdot C_{ij}(t - \Delta t)$.
b. The total number of operations in the rightmost cell of the last row assumes that additions, subtractions, comparisons, scalings, and assignments are executed as single elementary instructions. Memory accesses to recall/save data are not accounted for, although they may amount to an important portion of the global processing time.
Regarding data storage, we can see that the model needs to keep at least three $N \times M$ images to compute all operations involved in a frame, namely $L_{ij}(t)$, $L_{ij}(t - \Delta t)$, and $C_{ij}(t - \Delta t)$. Also, according to the robustness analyses carried out during model tuning, these variables need to be kept at a minimum equivalent resolution of 6 bits. This amounts to approximately 270Kbits of memory to be included within the chip for image storage. Moreover, two values, $L_{ij}(t - \Delta t)$ and $C_{ij}(t - \Delta t)$, need to be stored at that resolution for at least 40ms. This is quite a long time for analog memories [13] if we take into account that the operating temperature may rise up to 110°C. On the other hand, if processing is not implemented on chip, then instead of storing $C_{ij}(t - \Delta t)$ we need to store $L_{ij}(t - 2\Delta t)$, thus raising the memorization time to 80ms (2 frames). Finally, the number of operations per layer and frame to be executed is summarized in Table 1. As can be seen, the processing requirements result in:

\[ N_{OPS} = F \cdot \left[\, 8 \cdot (N \times M) + 6 \,\right] \qquad (8) \]
where $F$ is the number of frames per second (fps) and $N \times M$ the size of the input image. Analyses have shown that 100 × 150 6-bit input images are very suitable for this application if the implementation of the operators also keeps this 6-bit accuracy limit. Indeed, the computational model has been widely tested and tuned employing this image size. Therefore, the processing requirements of the selected model reduce to about $3 \times 10^{6}$ OPS (3 MOPS) at 25fps, thus making feasible an external implementation of the required equations in conventional digital processing systems built around an FPGA.
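The memory and throughput figures quoted above can be reproduced with a few lines of arithmetic. The sketch below simply evaluates the image storage (three 6-bit images) and Eq. (8) for the 100 × 150 array at 25fps; the function names are illustrative.

```python
def image_memory_bits(rows, cols, images=3, bits_per_pixel=6):
    """On-chip storage needed to hold `images` frames at the given resolution."""
    return rows * cols * images * bits_per_pixel


def ops_per_second(fps, rows, cols):
    """Eq. (8): N_OPS = F * [8 * (N x M) + 6]."""
    return fps * (8 * rows * cols + 6)


memory_bits = image_memory_bits(100, 150)   # 270000 bits, i.e. about 270 Kbits
n_ops = ops_per_second(25, 100, 150)        # 3000150 operations per second, about 3 MOPS
```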
3. CHIP ARCHITECTURE
The architecture of the proposed system is presented in Fig. 2. The main block is an array of 100 × 150 retino-topic units, each of them corresponding to a single pixel in the input image. Retino-topic units contain a photodetector with reconfigurable on-pixel Analog-to-Digital (AD) conversion, three 6-bit DRAM memories, one 6-bit I/O port and some internal logic. This array is surrounded by a ring of shielded cells, which serve both as dummies and as a simple way to measure the dark current. Besides, the chip incorporates a module for autonomous exposure adaptation to the average illumination, which controls the frequency of the master control that defines exposure times; a controller which addresses and executes commands from a 16Kbit SRAM program memory block; a bank of I/O registers connected to a 24-bit bidirectional data bus; and some other peripherals needed for addressing, timing control, digital-to-analog conversion, and temperature monitoring.
Two factors made us opt for an off-chip implementation of the algorithm, on an FPGA platform, in this first version of the chip. First, the processing needs of the selected model are not very demanding, and executing all computing tasks on a flexible hardware/software FPGA platform is perfectly viable. Second, the LGMD computing algorithm is still being improved and is open to the incorporation of new features. Therefore, the chip is basically devoted to image acquisition and storage of the current and two previous frames, as required by the model described in the previous section.
Fig. 2: Global architecture of the proposed system (blocks shown: Timing Module, SRAM Internal Program Memory, Program Counter & Tick Counter, Controller, Timer & WatchDog, 100×150 cell array with row/column Drivers + Addressing, Ramp Drivers, 6-bit DAC, I/O Regs and Buffers on imagebus[23:0], Signals generator Module, and Auto-Exposure Module with StartRow/StartCol/EndRow/EndCol window and Target)
In future versions, an on-chip implementation of the algorithm will be included with minor effort, because the required operations (add/sub, absolute value, scaling by a constant, half-wave rectification, etc.) are easily implemented using digital circuits. Indeed, we have already started the design of column digital processors which will perform all the required retino-topic operations. Moreover, the design of the digital module computing the LGMD and threshold state equations has been completed and occupies less than 400 × 400µm².
Regarding image acquisition, the autonomous exposure adaptation module works by accumulating pixel values within a user-defined window. The decision about whether to increase the frequency of the clock controlling exposure times is taken according to the difference between the computed average and a target previously defined by the user. The chip also includes a microcontroller unit which executes instructions from an internal program memory of 16Kbits. It allows the chip to operate as an autonomous peripheral in the car vision system. Image downloading and uploading are carried out through a 24-bit bidirectional data bus, in a row-wise format, through a bidirectional register which is able to store a complete row.
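The exposure adaptation described above (average over a user-defined window, compared against a target) can be illustrated with the following sketch. It is not the chip's control law: the inclusive window bounds, the multiplicative `step`, the `deadband`, and the assumption that a faster clock shortens the exposure are all hypothetical choices made for illustration.

```python
def window_mean(image, start_row, start_col, end_row, end_col):
    """Average pixel value over the window, bounds taken as inclusive (an assumption)."""
    total = 0
    count = 0
    for r in range(start_row, end_row + 1):
        for c in range(start_col, end_col + 1):
            total += image[r][c]
            count += 1
    return total / count


def next_clock_frequency(freq_hz, mean, target, step=0.05, deadband=1.0):
    """Return the updated exposure-clock frequency.

    Assumes a faster clock means a shorter exposure: the frequency is raised
    when the window is brighter than the target and lowered when it is darker.
    """
    error = mean - target
    if error > deadband:
        return freq_hz * (1.0 + step)
    if error < -deadband:
        return freq_hz * (1.0 - step)
    return freq_hz
```

For instance, with 6-bit pixel codes and a target of 32, a window averaging 40 would raise the clock frequency by one step, while a window averaging 20 would lower it.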
4. RETINO-TOPIC UNIT
The retino-topic element is basically intended for two functions. First, it is responsible for image acquisition and intraframe dynamic range compression. Second, it is responsible for on-chip storage of three consecutive frames. This section describes how these two functions have been implemented in this chip.
4.1 Image Acquisition
The capability of acquiring very wide dynamic range images is a must in vision applications for the automotive industry [14]. CMOS sensors are nowadays one of the most promising alternatives in this context [14], and several approaches have been published in recent years by Schanz [15], Lulé [16], Yang [17], Kleinfelder [18], Kagami [19] or McIlrath [20]. In this chip, we opted for a sensor similar to those in Kleinfelder [18] and Kagami [19]. Fig. 3 shows a block diagram of the sensor. As can be seen, in addition to the photodiode (D1) and the reset transistor (S1), the pixel contains a continuous-time comparator which controls the operation of a memory. The pixel employs two external signal references: REF1, which is intrinsically analog, and REF2, which can be either analog or digital depending on the selected memorization scheme.

Fig. 3: Block diagram of the sensor

The operation of the sensor can be basically described as follows. First of all, S1 resets the photodiode's integration node to Vreset. Afterwards, when the RST signal is released, the diode's current, which is mostly due to photogeneration, discharges the capacitor at a rate which is approximately proportional to the power of the incident light. The photodiode's voltage is continuously compared with a time-dependent analog reference REF1, and the result of this comparison determines whether the memory has to sample signal REF2 or not.
A particularly interesting aspect of this sensing scheme is its flexibility to produce different mappings between the power of the incident light and the stored output signal. Such flexibility relies, obviously, on the capability of the sensor to employ different signals at REF1 and REF2. Moreover, if we employ a digital memory within the pixel, we can directly obtain a programmable analog-to-digital conversion of the photogenerated current [19]. Hence, different reference signals generate different input-output characteristics and, therefore, different HDR compression profiles can be obtained.
Probably the most straightforward alternative is to employ a staircase ramp for both REF1 and REF2. Consider the example in Fig. 4(a). There, $O_{max}(t)$ and $O_{min}(t)$ correspond, respectively, to the maximum and minimum light intensities that can be distinguished by the system*:

\[ O_{max}(t) = V_{reset} - \frac{I_{max}}{C_{pix}}\,\Delta t \qquad\qquad O_{min}(t) = V_{reset} - \frac{I_{min}}{C_{pix}}\,\Delta t \qquad (9) \]
If we employ a staircase ramp of $2^N$ steps, then one can easily find that:

\[ O_{max}(t_{LSB}) = \frac{V_{reset}}{2^N} \;\Rightarrow\; I_{max} = \frac{C_{pix}}{t_{LSB}} \left( V_{reset} - \frac{V_{reset}}{2^N} \right) \]
\[ O_{min}\left( (2^N - 1)\,t_{LSB} \right) = \frac{2^N - 1}{2^N}\,V_{reset} \;\Rightarrow\; I_{min} = \frac{V_{reset}\,C_{pix}}{2^N\,(2^N - 1)\,t_{LSB}} \qquad (10) \]

Fig. 4: (a) Sensor signals during acquisition when using staircase references. (b) Compression characteristic of the sensor when using staircase references (input-output sensor characteristic using a 6-bit staircase ramp; digital output code, in hex, versus Iph in A).
* Observe that photogenerated currents larger than that defined by $O_{max}$ will produce a larger slope and hence will intersect the same value of REF1 (therefore making them indistinguishable from $O_{max}$). Basically the same applies for photogenerated currents smaller than that defined by $O_{min}$.

Fig. 5: Schematic of the Full Input Range Comparator in the Pixel
Hence, the obtained dynamic range* results in

\[ DR = 20 \log \frac{I_{max}}{I_{min}} = 40 \log \left( 2^N - 1 \right) \qquad (11) \]
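Eqs. (9) to (11) can be checked numerically with a small behavioural sketch of the staircase conversion. The assumptions are loud ones: REF1 is modelled as a rising staircase equal to k·Vreset/2^N at time k·t_LSB (which is what Eq. (10) implies), the stored code is taken to be the step index k at which the pixel voltage first falls to REF1, and the values used for Vreset, Cpix and t_LSB in any call are hypothetical.

```python
import math


def current_limits(n_bits, v_reset, c_pix, t_lsb):
    """I_max and I_min from Eq. (10)."""
    steps = 2 ** n_bits
    i_max = (c_pix / t_lsb) * (v_reset - v_reset / steps)
    i_min = v_reset * c_pix / (steps * (steps - 1) * t_lsb)
    return i_max, i_min


def dynamic_range_db(n_bits):
    """Eq. (11): DR = 40 * log10(2^N - 1)."""
    return 40.0 * math.log10(2 ** n_bits - 1)


def ramp_code(i_ph, n_bits, v_reset, c_pix, t_lsb):
    """Step index at which the discharging pixel voltage first reaches REF1.

    Currents too small to reach REF1 saturate at the last code, in line with
    the footnote on indistinguishable currents.
    """
    steps = 2 ** n_bits
    for k in range(1, steps):
        v_pix = v_reset - i_ph * (k * t_lsb) / c_pix   # Eq. (9) at time k * t_LSB
        ref1 = k * v_reset / steps                     # assumed rising staircase
        if v_pix <= ref1:
            return k
    return steps - 1
```

For N = 6, `dynamic_range_db(6)` evaluates to roughly 72dB, which matches the intraframe requirement stated in section 2.2.1.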
Fig. 4(b) shows the compression curve which results when REF1 and REF2 patterns are 6-bit staircase ramps.
Regarding the comparator in the pixel, its design has been addressed by paying special attention to three main goals, namely:
1) To maximize its input range, so that the comparator does not set any limit to the voltage excursions created by photogeneration in the diode's parasitic capacitor.
2) To minimize its input-referred offset voltage, while keeping a reduced area, for the sake of reducing pixel-level** Fixed-Pattern-Noise [21]. The maximum tolerable offset has been estimated as 1/2 LSB of a 6-bit coded signal in a full scale of 3.15V (worst-case power supply requirement). Since we are designing under 4σ specifications, the maximum input-referred offset voltage results in about 6.125mV.
3) To trade off speed and power consumption so that the required comparison times (in the range of 200ns for transitions from high to low, see Fig. 4) are reached with minimum power consumption levels.
Fig. 5 shows the complete schematic of the comparator. Signal UNCND is employed to unconditionally set the output of the comparator to zero. Maximum current consumption is 2µA, whereas the maximum delays in response to steps whose amplitude is 1 LSB around the middle point are 246ns and 179ns (for positive and negative transitions, respectively). Monte Carlo simulations (including both mismatch effects and global process variations) report a maximum input-referred offset voltage whose standard deviation is 5.8mV (its systematic offset is always below 200µV).
* Assuming that $O_{min}\left((2^N - 1)\,t_{LSB}\right)$ is well above the minimum resolution levels imposed by noise and mismatch.
** Observe that, since Analog-to-Digital conversion is implemented at the pixel level, there is no column-dependent FPN.
4.2 On pixel image storage: problems and proposed solutions
The need for medium-term memorization times (80ms), operation under very high temperatures (110°C), and the use of tiny devices (45000 pixels to be stored) made the selection of the memorization scheme one of the most critical points in our design. First, we compared the advantages and drawbacks of digital and analog memorization techniques, paying special attention to their suitability for the requirements of our application. Clearly, the main disadvantage of digital memories is their relatively large area as compared to analog alternatives when data have to be kept at relatively low resolution. Under such low-resolution constraints, analog memories, either current-mode [22] or voltage-mode [13], offer more compact solutions for short-term memorization than their digital counterparts. Indeed, very few conventional APS architectures [17,18,19,20] use digital memorization schemes to store the visual information. This is due to the obvious penalty in pixel size and power consumption introduced by the incorporation of an ADC at the pixel level. This scenario changes dramatically when the system is required to store pixel values for nearly 100ms at 110°C. In this case, the advantages of analog memorization techniques in conventional CMOS technologies are not obvious at all. The main reason behind this is that analog memories are, in general, much more sensitive to leakage than digital memories. Leakage currents in modern CMOS technologies are generated by physical phenomena of very different nature: subthreshold conduction of MOS transistors, reverse current of parasitic diodes, gate tunnelling, Drain-Induced Barrier Lowering (DIBL), hot-carrier effects, Gate-Induced Drain Leakage (GIDL), etc. [23,24].
In the case of analog memories in a 0.35µm technology, the most important leakage effects are those due to:
1) The subthreshold current through the channel of the access transistor, which injects or drains (depending on the sign of Vds) carriers into or from the capacitor where the data is stored.
2) The reverse current of the parasitic diode in the diffusion terminal of the access transistor, which always drains carriers (in the NMOS case) from the capacitor to the ground node.
A detailed characterization of leakage currents in the selected CMOS technology (0.35µm 4M-2P) has shown that the maximum contribution due to the parasitic diode in the diffusion terminal of a minimum-size PMOS switch reaches 2pA@110°C, whereas it amounts to 0.2pA@110°C in the NMOS case. The differences are mainly due to the different doping levels in N-wells and P-substrates. Subthreshold currents are much more important in NMOS switches (mainly due to the larger current factor of NMOS transistors), where they may rise up to ~12pA@110°C when the voltage of the diffusion terminal which is not connected to the memory capacitor is close to zero (observe that this is the case of maximum VGS when the NMOS switch is off, and recall the exponential dependence of subthreshold current on this voltage [25]). Subthreshold current can be made negligible by controlling the voltage at the diffusion terminal of the access transistor which is not connected to the memory capacitor. Indeed, if we prevent this node from going below 0.4V during retention times, subthreshold current diminishes to a few nanoamperes. Even by doing so, we still need to be able to keep the data for 80ms@110°C with a leakage current of 0.2pA (that of the parasitic diode). Hence, keeping the 6-bit resolution limit within a full scale of 2V requires a memorization capacitor not smaller than 17 × 17µm². Many leakage-compensation techniques for analog memories compatible with standard CMOS technologies have been proposed in the literature [26,27,28,29].
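The order of magnitude of the storage capacitor quoted above can be estimated with a simple charge budget. The sketch below assumes a constant leakage current over the whole retention time and a droop limit of 1/2 LSB; both are simplifying assumptions, and the function name is illustrative.

```python
def min_storage_capacitance(i_leak, t_hold, full_scale, n_bits):
    """Smallest capacitor (in farads) keeping leakage-induced droop below 1/2 LSB.

    Charge lost is i_leak * t_hold; the allowed voltage droop is
    full_scale / 2^(n_bits + 1), so C >= i_leak * t_hold / droop.
    """
    half_lsb = full_scale / 2 ** (n_bits + 1)
    return i_leak * t_hold / half_lsb


# 0.2 pA of diode leakage, 80 ms retention, 2 V full scale, 6 bits
c_min = min_storage_capacitance(0.2e-12, 80e-3, 2.0, 6)   # about 1.02e-12 F
```

Under these assumptions the capacitor must be on the order of 1pF. The capacitance per unit area of the process is not stated in this section, so the mapping from this value to the quoted 17 × 17µm² is not reproduced here.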
No matter which alternative is selected, the refreshing circuitry requires additional area. Adding a refreshing element per retino-topic unit is not viable due to the strong penalty in density. Moreover, sharing the refreshing element among a certain number of cells (e.g. one refreshing circuit per column) leads to the need to download each memory once in a while; quite often, actually, because of the strong leakage current and the need to keep degradation below 1/2 LSB of the full scale. In summary, for medium/long memorization periods, and in applications where large capacitors are not viable (as in ours), the continuous degradation of the stored analog information due to leakage currents and their high sensitivity to temperature variations (every increase of 8°C in temperature approximately doubles the leakage) imposes such hard restrictions on analog memories (area, compensation architectures including ADC-DAC loops, etc.) that their supposed advantages compared to digital memories disappear in practice. On the other hand, digital memories are not that unfavourable in our case. First of all, because they can either be built intrinsically leakage-insensitive, as conventional SRAM, or provided with very simple refreshing
mechanisms which make them virtually free of data losses due to leakage (DRAMs). Second, because the area penalty due to the use of digital memories does not have to be larger than that of using analog memories for 80ms@110°C (∼300µm² per memory). Third, because our pixel already includes a comparator and a programmable external reference, which is all we need to run Analog-to-Digital conversions using ramp techniques [30]. As a result, we finally opted for a digital solution. In the proposed architecture, every retino-topic unit in the array includes the photodiode, a comparator performing on-pixel Analog-to-Digital conversion and three 6-bit digital memories.
Fig. 6 shows some of the most widely employed digital memory cells. Fig. 6(a) and (b) correspond to Dynamic RAM cells whereas Fig. 6(c) corresponds to the well-known differential static RAM cell [31,32].

Fig. 6: Digital Memory Cells. (a) 1T-DRAM. (b) 3T-DRAM. (c) Differential SRAM

In our chip, digital memorization has been implemented by means of 3T-DRAM cells for the following reasons:
1) Despite their need to be refreshed every once in a while, they are much more compact than any SRAM solution. Moreover, the refreshing circuitry can be shared among different DRAM cells, thus also sharing the cost in area.
2) Though they are less compact than 1T-DRAM cells, they do not suffer from charge redistribution problems between the memorization capacitors and the parasitic capacitors of the data lines during readouts. Consequently, the readout circuitry can be simplified to a single digital buffer.
The memorization capacitor has been implemented using the gate capacitor of an NMOS transistor of 2.6µm/2.6µm. Hence, the memory only needs to be refreshed every 4ms@110°C (keeping a safety margin of 750mV before losing the stored information).
Fig. 7 shows the whole circuitry for a given bit in the 3 images stored in a retino-topic unit. It contains three 3T-DRAM cells, one readout/refreshing circuit (shared by the 3 memory blocks), and one bidirectional tri-state I/O buffer (also shared). The retino-topic unit contains 6 such modules.

Fig. 7: Schematic of 3 DRAM cells corresponding to the same bit in the 3 pixel values stored on the retino-topic unit and their I/O and refreshing circuitry.

The operation of the whole block can be described as follows:
1) Image Acquisition and Analog-to-Digital Conversion of the photogenerated current. This operation is implemented in memory block #1. During image acquisition, signals INPUT_LOCAL and SI1_LOCAL are enabled whereas ENM_LOCAL is disabled. Hence, memory #1 is continuously acquiring signal REF2 from the I/O line. This process ends when the comparator stops the acquisition (see section 4.1). At this point, the comparator disables signal SI1_LOCAL and enables signal ENM_LOCAL*.
2) Memory Refresh. This operation is executed every 4ms and is activated by an internal programmable timer. Memories #1, #2, and #3 are refreshed sequentially. The refreshing operation involves the following steps. First, node a is precharged to the supply level by pulling down signal PCH. Then, PCH is released and output signal SO_j is enabled. In a third step, the memory access signal SI_j and the RFSH signal are activated, closing the loop around the memorization capacitor and regenerating the previously stored information. Since refreshing operations can occur arbitrarily during an acquisition period, block #1 needs to be isolated from blocks #2 and #3 by transistor S_1 to avoid any signal conflict. Transistor S_1 provides the required isolation between the refresh buffer B_3 and the input buffer B_2, which, during acquisition, is providing signal REF2 to memory block #1.
3) Memory Readout. This process is very similar to the refresh operation. The only difference lies in the activation of signal OUTPUT in the last step to connect the memory to the external data line. During readouts, memories are also refreshed by enabling the corresponding access switch. Basically the same signal sequence can be employed to perform data movements among memories.

Fig. 8: Complete Diagram of the Retino-Topic Unit
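The three-step refresh sequence described in item 2 can be written down as an ordered list of control-signal states. The sketch below is only a bookkeeping aid: the dictionary representation is hypothetical, PCH is treated as active-low because the text says it is pulled down to precharge, and the active-high polarity of SO_j, SI_j and RFSH is an assumption.

```python
import math


def refresh_steps(j):
    """Control-signal states for refreshing memory block j (1, 2 or 3)."""
    if j not in (1, 2, 3):
        raise ValueError("memory block index must be 1, 2 or 3")
    return [
        {"PCH": 0},                                        # precharge: PCH pulled down
        {"PCH": 1, f"SO{j}": 1},                           # PCH released, output SO_j enabled
        {"PCH": 1, f"SO{j}": 1, f"SI{j}": 1, "RFSH": 1},   # loop closed, data regenerated
    ]


def refresh_all():
    """Sequential refresh of memories #1, #2 and #3, as (block, signals) pairs."""
    return [(j, step) for j in (1, 2, 3) for step in refresh_steps(j)]


def refreshes_needed(retention_ms, period_ms=4):
    """Number of refresh passes required to hold data for retention_ms."""
    return math.ceil(retention_ms / period_ms)
```

With the 4ms refresh period and the 80ms retention time discussed earlier, `refreshes_needed(80)` gives 20 passes per stored frame.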
4.3 Complete Diagram and Layout of the Retino-Topic Unit
Fig. 8 shows a complete block diagram of the retino-topic unit. As can be seen, in addition to the HDR pixel and the memory block**, it also contains a few logic gates which modify the local values of some global controls (ENM, INPUT, and SI1) depending on the output of the comparator in the pixel, Zcomp. During optical acquisitions, UNCND must be set to zero to allow the memory to respond to the output of the comparator. Conversely, during image uploading or downloading, UNCND must be set to 1 in order to control signals ENM, INPUT, and SI1 externally (regardless of whatever is happening at the photosensor). Fig. 9 shows the layout of the retino-topic unit. Cell size is 46 × 46µm² whereas the area reserved for the photodiode is 15 × 10µm².
* These signals are generated by internal logic which operates on the global signals ENM, INPUT, and SI1 and on the output of the comparator in the pixel, to avoid conflicts between buses (see next section for more details).
** Consisting of 6 blocks, one per bit, identical to that in Fig. 7.

5. CONCLUSIONS AND FUTURE WORK

We have presented the architecture and the retino-topic unit of a VLSI chip intended for automotive applications. The chip, currently in the final phase of formal verification before submission to fabrication, contains an array of 150 × 100 photoreceptors incorporating an HDR compression mechanism and has distributed memory for 3 consecutive images. The tough specifications of automotive applications have led us to opt for digital storage of the acquired images using 3T DRAM memory blocks. The chip has been designed to provide optical sensing with a maximum intraframe dynamic range of 72 dB and to cover 100 dB of interframe variation (from 0.3 to 30,000 cd/m²) by adapting the clock frequency that controls the integration time. Future work will involve the development of a digital column processor responsible for computing (row-wise) the equations that define the locust LGMD collision detection model presented in section 2.
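The interframe figure can be verified with a few lines of arithmetic. The sketch below assumes the usual 20·log10 convention for expressing a ratio of sensor signals in decibels; the text does not state the convention, but it reproduces the quoted 100 dB for the 0.3 to 30,000 cd/m² range. The helper names are hypothetical.

```python
import math


def range_db(low: float, high: float) -> float:
    """Dynamic range in dB between two positive values, using 20*log10."""
    return 20.0 * math.log10(high / low)


def db_to_ratio(db: float) -> float:
    """Inverse of range_db: the max/min ratio corresponding to a value in dB."""
    return 10.0 ** (db / 20.0)


# Interframe range: luminance limits quoted in the text, in cd/m².
interframe_db = range_db(0.3, 30_000.0)

# Intraframe range: ratio between brightest and darkest values within one frame.
intraframe_ratio = db_to_ratio(72.0)

if __name__ == "__main__":
    print(f"interframe range: {interframe_db:.1f} dB")
    print(f"72 dB intraframe corresponds to a ratio of about {intraframe_ratio:.0f}:1")
```

Under this convention the 72 dB intraframe range corresponds to a ratio of roughly 4000:1 within a single frame, while the adaptation of the integration time covers the five decades of average luminance.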
ACKNOWLEDGEMENTS

This work has been partially funded by projects IST2001-38097 (LOCUST) and TIC2003-09817C02-01 (VISTA). Mr. Laviana’s work is fully funded by a grant from the Andalusian Regional Government (Spain).
Fig. 9: Layout of the Retino-Topic Unit.