
GHEORGHE ASACHI TECHNICAL UNIVERSITY OF IAȘI Doctoral School of the Faculty of Electronics, Telecommunications and Information Technology

CMOS IMPLEMENTATION OF A CLASS OF CELLULAR NEURAL NETWORKS -Summary-

Ph.D. supervisor: Prof. dr. ing. Liviu GORAȘ Ph.D. candidate: ing. Ion VORNICU

IAȘI – 2011

EUROPEAN UNION

GOVERNMENT OF ROMANIA, MINISTRY OF LABOUR, FAMILY AND SOCIAL PROTECTION, AMPOSDRU

European Social Fund, POSDRU 2007-2013

Structural Instruments 2007-2013

OIPOSDRU

“GHEORGHE ASACHI” TECHNICAL UNIVERSITY OF IAȘI

Acknowledgements

First, I would like to thank my supervisor, Professor Liviu Goraș, for his guidance and support during my doctoral studies, and Professor Angel Rodriguez-Vazquez for his valuable advice, which kept me focused on this research during my three-month stay at the Institute of Microelectronics of Seville. I am also grateful for the financial support provided by the BRAIN project “Doctoral Scholarships – An investment in intelligence”, a strategic project financed by the European Social Fund and the Romanian Government. I would like to thank my colleagues and professors from the Signals, Circuits and Systems lab of the Faculty of Electronics, Telecommunications and Information Technology, and also the research group of the Institute of Microelectronics of Seville led by Professor Angel Rodriguez-Vazquez, for their support and for making my workplace a pleasant environment for research and development. Along with Professor Liviu Goraș, I would like to express my warm thanks to the members of the scientific committee for agreeing to review my doctoral dissertation, for their suggestions and observations, which improved the quality of the thesis, and for their participation in the public presentation. Last but not least, I want to thank all the professors of the Faculty of Electronics, Telecommunications and Information Technology of Iași who have contributed to my training as an engineer.


Contents

I INTRODUCTION
I.1 Motivation
I.2 Main limitations in the design of the smart sensors
I.3 Objectives and thesis organization

II ANALYSIS, DESIGN AND SYNTHESIS OF A CLASS OF ANALOG PARALLEL NETWORKS FOR IMAGE PROCESSING
II.1 Two resistive grids cellular neural network (CNN)
II.2 The analysis of a CNN without resistive grids
II.2.1 Nodal equations for a generic architecture
II.2.2 Transfer function associated to each spatial mode of 1D analog parallel network
II.2.3 Dispersion curve KA(m)
II.3 The design of spatio-temporal linear network with 1x64 cells
II.3.1 Cellular structure and the system of nodal equations of the network with 1x64 cells
II.3.2 The design of the cellular transconductors. Simulation results
II.3.3 The influence of the nonidealities and nonhomogeneities on the dynamics of the filters
II.4 On the programmability of the core architecture
II.4.1 The design of the digital control block
II.4.2 Design of the column buffer
II.4.3 Design of the row buffer
II.4.4 Simulation results of the test network
II.4.5 On the layout of the reconfigurable spatio-temporal filter
II.5 References

III ANALOG PARALLEL ARCHITECTURES WITH OPTICAL INPUTS
III.1 Photo-sensitive elements
III.1.1 Working principle of a CMOS sensor
III.1.2 P-N junctions used as photo-detectors
III.2 APS parameters
III.2.1 Fill factor
III.2.2 Dynamic range
III.2.3 Linearity
III.2.4 Power consumption
III.3 Self freezing mechanism of the network dynamics
III.3.1 Spatial peak detector
III.3.2 Simulation results
III.4 References

IV TRANSLINEAR CIRCUITS. LOG-DOMAIN MAPPING OF THE LINEAR EQUATIONS
IV.1 Log-domain mapping of a 1D linear analog parallel architecture
IV.1.1 MOS transistor biased in weak inversion
IV.1.2 Circuit synthesis
IV.1.3 The design of a reconfigurable log-domain network
IV.1.4 The dynamics of log-domain network affected by real technological conditions
IV.2 The design of a 32x32 log-domain analog parallel architecture
IV.2.1 Log-domain mapping of the state equations of a 2D autonomous system
IV.2.2 Circuit implementation
IV.2.3 Simulations results
IV.3 References

V IMAGE PROCESSING APPLICATIONS OF A CLASS OF ANALOG PARALLEL ARCHITECTURES
V.1 Applications of CNN’s
V.1.1 Vision chips
V.2 Edge detection with the architecture analyzed in this work
V.2.1 Edge detection performed with 1D analog parallel architecture
V.2.2 Edge detection performed with 2D analog parallel architecture
V.3 Smoothing
V.3.1 Smoothing performed with 1D analog parallel architecture
V.3.2 Smoothing performed with 2D analog parallel architecture
V.4 Image segmentation
V.5 ECG signals classification
V.5.1 ECG signals
V.5.2 1D architecture with high order neighborhoods
V.5.3 Analysis and synthesis of high-order spatio-temporal filters
V.5.4 The design of the reconfigurable high-order spatio-temporal filter
V.5.5 Classification rate
V.6 Textures classification
V.6.1 Images classification
V.6.2 Analog parallel architecture with high-order interconnections
V.6.3 The design of high-order 2D spatio-temporal filters
V.6.4 Aspects related to the design of the 2D programmable spatio-temporal filters
V.6.5 Classification rate
V.7 References

VI CONTRIBUTIONS
VI.1 Contributions presented in Chapter 2
VI.2 Contributions presented in Chapter 3
VI.3 Contributions presented in Chapter 4
VI.4 Contributions presented in Chapter 5

Introduction

I INTRODUCTION

I.1 Motivation

The biological visual system is one of the most complex sensory systems and provides a large amount of information about the surrounding environment. Improvements in CMOS optical sensors have made possible the development of a new class of circuits meant to emulate functions of biological retinas. These electronic retinas combine an optical sensor with architectures that process the incoming signals. The functions of the visual system have been modeled with the aid of CNNs. Given the cellular structure of such networks, an appropriate optical sensor is the CMOS sensor, which allows both the sensor and the processing element (PE) to be embedded in the same pixel. The main challenges in CMOS sensor design involve improving three key parameters: number of pixels, SNR and power consumption. Currently, CMOS photo-detectors and CCD devices offer the same performance in terms of image quality, with CMOS retaining the advantages of random access, integration on the same chip as the processing circuitry, and low power consumption. Current effort is focused on the design of so-called smart sensors, which underlie analog visual processors able to perform real-time operations on the input images. Because the main bottleneck of the image processing chain is the speed of the ADCs, the only way to simplify the data set to be converted to the digital domain is to extract only the relevant features from the raw acquired image. Many CNNs meant to perform different image preprocessing operations have been proposed in the literature. Unfortunately, only a few of them are feasible in a standard CMOS technology in terms of area, power consumption and accuracy.

These reasons motivate this research on the CMOS implementation of a new class of analog parallel architectures designed to perform the following image processing operations: edge detection, smoothing, image segmentation, and texture and ECG signal classification. The design of the new massively interconnected analog network, which is able to perform spatio-temporal filtering operations, does not depend on the design of the optical sensor. Without the image sensor the network works only as an analog image processor; otherwise, a vision chip with APS can be obtained by integrating the photo-detector matrix together with the parallel architecture. In that case the area, fill factor and power consumption constraints become critical. Therefore the main goal is to find the simplest circuit implementation of the basic cell that preserves the robustness of the filtering operations even under real technological conditions.

I.2 Main limitations in the design of the smart sensors

The main challenge in the field of visual processors is to integrate increasingly complex analog circuits on the smallest possible area. Because of this limitation, the maximum resolution usually found in the literature is about 128x128 pixels. To minimize the active pixel pitch, the area of each processing element has to be minimized. Despite this limitation, vision chips of this kind are successfully used in real-time applications that need speed rather than high-quality images.

In the future this bottleneck may be overcome by 3D technologies, in which the processing array is integrated on a different active layer than the CMOS sensor. In addition, various shared-pixel circuitry concepts can also be used. Often, for high-speed applications (at least 1 KFps), analog or mixed-signal cellular neural networks need analog-to-digital and/or digital-to-analog conversion to interface with digital processors. Thus, depending on the resolution and the readout strategy of the back-end circuits, the retention time of each photo-cell sets the main constraints on the design of the ADCs or DACs. The second limitation regarding sensor interfacing is the huge amount of data that must be transferred to the digital peripherals: for example, a 1024x1024 input image at 30 FPS with 8 bits per pixel requires a transfer bandwidth of 30 Mbytes/second. Hence in 33 seconds, the amount of data that has to be delivered to the DSP is around 1 Gbyte. To relax the constraints imposed on the converter design and to minimize the amount of information provided by the sensor, special pre-processing techniques are needed to simplify the data set acquired by the photodiode matrix. Thus, the first step towards increasing the speed is to extract in the front-end circuits only the relevant features, which are then passed further along the processing chain.
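The transfer-bandwidth figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function name is hypothetical, and it assumes binary prefixes (1 Mbyte = 2^20 bytes, 1 Gbyte = 2^30 bytes), which is the convention under which the 30 Mbytes/second figure comes out exactly.

```python
def sensor_data_rate(width, height, fps, bits_per_pixel):
    """Raw readout rate of an image sensor, in bytes per second."""
    return width * height * fps * bits_per_pixel / 8


rate = sensor_data_rate(1024, 1024, 30, 8)  # bytes per second
print(rate / 2**20)        # 30.0 Mbytes/s (assuming 1 Mbyte = 2**20 bytes)
print(33 * rate / 2**30)   # about 0.97, i.e. roughly 1 Gbyte after 33 s
```

Under these assumptions the raw stream is already close to 1 Gbyte every half minute, which is why feature extraction in the front-end is attractive.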

I.3 Objectives and thesis organization

This work aims to achieve the following objectives:
• CMOS analysis and design of a new linear analog parallel architecture able to perform different spatio-temporal filtering operations;
• Log-domain mapping of the linear differential equations of the proposed cellular neural network, with the design made using MOS transistors biased in weak inversion;
• Implementation of several applications based on the proposed analog parallel network:
  o Edge detection and smoothing;
  o Image segmentation;
  o ECG signal classification and texture classification.

Besides the above research directions, two further possibilities will be studied: loading the capacitive nodes of the network with the information provided by a CMOS sensor, and implementing the voltage controlled current sources (VCCS) of each cell using degenerated inverters in order to minimize the area. The influence of non-idealities and non-homogeneities on the network dynamics will also be analyzed, and new specific calibration mechanisms will be proposed. Because this network is unstable, a new self-freezing circuit will be designed that stops the dynamics of the spatio-temporal filter before any node reaches saturation. As a side problem of texture and ECG signal classification, the design of high-order spatio-temporal filters will be discussed.

This work is organized into six chapters, each covering specific aspects of this doctoral dissertation. The second chapter is dedicated to the analysis, design and synthesis of a new analog parallel architecture used for image processing. It is divided into four sections. The first section briefly presents the well-known two-resistive-grid CNN. In the second section, the newly proposed architecture without resistive grids is analyzed. The third section describes the design methodology of a reconfigurable spatio-temporal linear filter with 1x64 cells.
This CNN has an unstable behavior which leads to the formation of Turing patterns. Also discussed are various aspects regarding the cellular structure and the system of nodal equations associated with the network, the most relevant parameters in the design of the VCCS, the spatial frequency characteristics of the basic spatio-temporal filters, the transistor-level calibration mechanisms of the linear network, and the influence of non-idealities and non-homogeneities on the dynamics of each filter. The fourth section focuses on the design of the core architecture so that it can be programmed in different operation modes. For this purpose the functionality of the parallel architecture and the design of the digital control block and of the line/column buffers are described in detail.

The third chapter focuses on analog parallel architectures with optical inputs and is organized in six sections. The first section gives a short survey of CMOS sensors. The second section describes the APS structure. The third section presents the main CMOS sensor parameters, followed by the APS parameters in the fourth section. The fifth section focuses on the design of the proposed CNN using degenerated inverters. In the last section a new self-freezing circuit for the unstable network is proposed.

The fourth chapter focuses on translinear circuits and is divided into six sections. The first one describes the log-domain mapping of the input-state-output linear equations. In addition, the second section highlights the main translinear principles by synthesizing the log-domain state equations and circuitry of a first-order linear filter. The second and third sections customize the general theory by analyzing the log-domain mapping of the state-output linear equations and drawing some important conclusions. In the fifth section, the CMOS design of a 1D log-domain cellular neural network is carried out.
For this purpose, the operation of the MOS transistor biased in weak inversion is discussed, and the possibility of controlling the drain current under process variations and mismatches by changing the bulk potential is explored. The log-domain filter is obtained by mapping the linear differential equations into the logarithmic domain and synthesizing the circuitry of the reconfigurable log-domain network in order to implement the basic high-pass filter (HPF), low-pass filter (LPF), band-pass filter (BPF) and stop-band filter (SBF). Last but not least, the magnitudes of the resulting spatial-frequency characteristics are presented and the dynamics of the logarithmic network under real technological conditions is analyzed. The last section focuses on the design of a 32x32 pixel analog parallel architecture in the logarithmic domain and consists of the log-domain mapping of the differential equations of the 2D autonomous system, the circuit synthesis and the presentation of the simulation results.

In the fifth chapter, several applications for analog image processing based on the designed circuit are proposed. This chapter is organized in six sections. The first one briefly presents some of the most important applications based on parallel processors. The basic applications implemented with the proposed analog parallel architecture are the following: edge detection, smoothing, image segmentation, and texture and ECG signal classification.

This work concludes with the last chapter, which highlights the main personal contributions, as follows: the analysis and design of a new analog parallel architecture able to perform different spatio-temporal filtering operations, the log-domain mapping of the linear CNN using MOS transistors biased in weak inversion, and the implementation of several applications based on the proposed analog processor.


Analysis, design and synthesis of a class of analog parallel networks for image processing

II ANALYSIS, DESIGN AND SYNTHESIS OF A CLASS OF ANALOG PARALLEL NETWORKS FOR IMAGE PROCESSING

II.1 Two resistive grids cellular neural network (CNN)

This chapter discusses in detail the theory of a class of analog parallel architectures similar to a homogeneous two-grid resistively coupled cellular network with piecewise-linear cells, which has been shown to produce Turing patterns based on spatial mode competition dynamics. CNNs are parallel computing systems whose architecture consists of cells connected only with their neighbors [1]. This feature distinguishes CNNs from general neural networks, whose cells are totally and non-homogeneously interconnected. The spatio-temporal dynamics of such analog architectures, possibly associated with image sensors (CCD or CMOS), can be used for high-speed (1-10 KFps) 1D and 2D signal processing, including linear [2], [3] or nonlinear [4]-[9] filtering and feature extraction [10]. CNNs, as defined in the seminal papers by Chua and Yang [1], [2], consist of an array of identical and identically coupled cells, as suggested by Fig. II-1a. Each cell ij in an M×N array is connected with the cells kl within a neighborhood Nr(i,j) of order r:

$$N_r(i,j) = \{\, C(k,l) \mid \max\{|k-i|,\,|l-j|\} \le r,\ 1 \le k \le M,\ 1 \le l \le N \,\} \tag{1}$$

as shown in Fig. II-1b.

Figure II-1. a. Sketch of a 4×4 CNN; b. Neighborhoods Nr of order r=1 and 2 [1]
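The neighborhood definition (1) translates directly into a short enumeration. The following is a minimal sketch, not part of the original work: the function name `neighborhood` is hypothetical, indices are 1-based as in (1), and cells outside the M×N array are simply clipped.

```python
def neighborhood(i, j, r, M, N):
    """Cells C(k, l) with max(|k - i|, |l - j|) <= r inside an M x N array (1-based)."""
    return [(k, l)
            for k in range(max(1, i - r), min(M, i + r) + 1)
            for l in range(max(1, j - r), min(N, j + r) + 1)]


print(len(neighborhood(2, 2, 1, 4, 4)))  # 9: interior cell, r = 1
print(len(neighborhood(1, 1, 1, 4, 4)))  # 4: corner cell, r = 1
print(len(neighborhood(2, 2, 2, 4, 4)))  # 16: r = 2, clipped by the array border
```

Note that, taken literally, definition (1) includes the cell C(i, j) itself in its own neighborhood.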

A particular case of CNN is composed of two resistive grids interconnected as shown in Fig. II-2d. The second-order cells from Fig. II-2a connect the nodes of the resistive grids. The nonlinear resistance is implemented by the piecewise-linear function depicted in Fig. II-2c. This kind of network is not suitable in terms of area and accuracy. The stability of these linear or nonlinear systems has been studied in depth in the literature [11]-[15]. Nevertheless, the real challenge is to analyze the stability of non-homogeneous networks. This issue is closer to the physical implementation, because the network cells differ slightly from one another due to technological non-idealities. Furthermore, the system of linear differential equations associated with this kind of non-homogeneous network cannot be solved using the so-called decoupling techniques [16], which are nothing more than an appropriate change of variable. For this reason it is worth tracing the limits within which the dynamics of the non-homogeneous parallel architecture keeps the same features as the homogeneous one. In this way a quantitative measure of the robustness of the physical implementation of the proposed cellular neural network can be obtained. The stability of two-resistive-grid networks has been discussed in the literature by means of Gershgorin’s theorem [17], [18]. Using the discrete version of the state-output equations [19], a discrete equivalent of the two-resistive-grid network [20] can be obtained.


Figure II-2. Two coupled resistive grids of 2D architecture

The spatio-temporal dynamics of homogeneous and non-homogeneous networks has been studied in depth in different domains, including autonomous CNNs [21]-[24]. There are two important motives for analyzing these networks: understanding and modeling biological systems, and the possibility of using the spatio-temporal dynamics of a class of parallel architectures for high-speed signal and image processing. Even though the decoupling technique is valid only for the central linear part of the cell characteristics, this method gives useful information about the shape of the final pattern that emerges after the network reaches the nonlinear region. The system of differential equations associated with the nodes of the network can be solved by making the appropriate change of variable according to the ring boundary conditions [25], [26]. A similar approach proved to be useful for a particular case of a simpler network, which will be proposed in the next sections. The CNN with two resistive grids is a special analog parallel architecture derived from Turing’s reaction-diffusion model [27]. The main idea of having a matrix of spatial modes, of which at least one is unstable, can be implemented using a flexible architecture (Fig. II-2). If the emerging pattern is “frozen” before any nonlinearity has been reached, this architecture behaves like a spatio-temporal filter with a time-variable quality factor [28].

II.2 The analysis of a CNN without resistive grids

II.2.1 Nodal equations for a generic architecture

For the sake of simplicity, let us consider a 1D network of M cells having the basic structure shown in Fig. II-3. The network is composed of linear or piecewise-linear cells. Each cell has an admittance Y(s) and voltage controlled current sources (VCCSs) that connect the cell with a neighborhood Nr of radius r. The network parameters are the interconnection weights, Ak, and the weights of the connections between the external inputs and the network nodes, Bk. The main goal is to identify the architectural and circuit-level challenges related to the design of the proposed massively connected array processor. Therefore, in this section a particular case with neighborhoods of at most second order will be analyzed. Furthermore, in the last chapter, the design of a high-order spatio-temporal filter will be discussed.

Figure II-3. 1D architecture

As shown in Fig. II-3, only the first-order neighborhoods are highlighted. Considering the usual notation, s ↔ d/dt, Y(s) is an integro-differential operator of the form:

$$Y(s) = \frac{Q(s)}{P(s)} = \frac{\sum_{l=0}^{q} q_l s^l}{\sum_{n=0}^{p} p_n s^n} \tag{1}$$

In this case the differential equation associated to each node of the network has the form:

$$Y(s)\,x_i(t) = \sum_{k \in N_r} A_k\,x_{i+k}(t) + \sum_{k \in N_r} B_k\,u_{i+k}(t), \quad i = 1,..,M \tag{2}$$

The above relations represent a set of coupled integro-differential equations, which can be solved by the decoupling technique [16]. This basically consists of the following change of variables:

$$x_i(t) = \frac{1}{M}\sum_{m=0}^{M-1} \Phi_M(i,m)\,\hat{x}_m(t), \qquad u_i(t) = \frac{1}{M}\sum_{m=0}^{M-1} \Phi_M(i,m)\,\hat{u}_m(t) \tag{3}$$

where the M functions $\Phi_M(i,m)$ depend on the boundary conditions and are orthogonal with respect to the scalar product in $C^M$. For ring boundary conditions, the eigenfunctions $\Phi_M(i,m)$ have the form $\Phi_M(i,m) = e^{j2\pi mi/M}$, and hence it can be written that $\Phi_M(i+k,m) = e^{j2\pi mk/M}\,\Phi_M(i,m)$. Applying the change of variable (3), equation (2) becomes (4):

$$Y(s)\sum_{m=0}^{M-1} \Phi_M(i,m)\,\hat{x}_m(t) = \sum_{m=0}^{M-1} \Phi_M(i,m)\,\hat{x}_m(t) \sum_{k \in N_r} A_k\,\Phi_M(k,m) + \sum_{m=0}^{M-1} \Phi_M(i,m)\,\hat{u}_m(t) \sum_{k \in N_r} B_k\,\Phi_M(k,m), \quad i = 1,..,M \tag{4}$$

If the notation below is used, taking into account that $0 \notin N_r$ (because a cell cannot be connected to itself),

$$K_A(m) = \sum_{k \in N_r} A_k\,\Phi_M(k,m), \qquad K_B(m) = \sum_{k \in N_r} B_k\,\Phi_M(k,m) \tag{5}$$

and considering $\Phi_M(i,m) \neq 0$, equation (4) becomes:

$$Y(s)\,\hat{x}_m(t) = K_A(m)\,\hat{x}_m(t) + K_B(m)\,\hat{u}_m(t), \quad m = 1,..,M \tag{6}$$
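The decoupling step relies on each ring eigenfunction being an eigenvector of the coupling operator, with eigenvalue K_A(m) as defined in (5). The sketch below checks this numerically in plain Python; the helper names and the weight values are hypothetical and chosen only for illustration.

```python
import cmath


def phi(i, m, M):
    """Ring-boundary eigenfunction Phi_M(i, m) = exp(j*2*pi*m*i/M)."""
    return cmath.exp(2j * cmath.pi * m * i / M)


def K(weights, m, M):
    """Spatial eigenvalue of mode m for coupling weights {k: A_k}, as in (5)."""
    return sum(a * phi(k, m, M) for k, a in weights.items())


def coupled_sum(weights, i, m, M):
    """Coupling term of (2) applied to one mode: sum over k of A_k * Phi_M(i + k, m)."""
    return sum(a * phi(i + k, m, M) for k, a in weights.items())


M = 64
A = {-2: 0.3, -1: 1.0, 1: 1.0, 2: 0.3}  # hypothetical symmetric weights, k != 0

# Largest deviation from the eigenvector relation over all cells and all modes.
max_err = max(abs(coupled_sum(A, i, m, M) - K(A, m, M) * phi(i, m, M))
              for i in range(1, M + 1) for m in range(M))
print(max_err)  # expected to be at the level of floating-point round-off
```

Because the exponential is periodic in i with period M, the ring boundary condition is satisfied automatically and no explicit index wrapping is needed.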

II.2.2 Transfer function associated to each spatial mode of 1D analog parallel network

If equation (6) is divided by $\hat{u}_m(t) \neq 0$, the transfer function associated with each spatial mode of the 1D analog parallel network can be written:

$$H_m(s) = \frac{\hat{x}_m(t)}{\hat{u}_m(t)} = \frac{K_B(m)\,P(s)}{Q(s) - K_A(m)\,P(s)}, \quad m = 1,..,M \tag{7}$$

Figure II-4. The feedback network associated to each decoupled spatial mode

From (7) the characteristic polynomial is R(s) = Q(s) − KA(m)P(s), meaning that the stability of the system depicted in Fig. II-4 is determined by the dispersion curve KA(m), and ultimately by the transconductances of the VCCSs, i.e. the Ak coefficients.

II.2.3 Dispersion curve KA(m)

According to (5), the dispersion curve (or modes curve) of a homogeneous and symmetrical network has a sinusoidal form:

$$K_A(m) = \sum_{\substack{k=-r \\ k \neq 0}}^{r} A_k\,e^{j\frac{2\pi km}{M}} = 2\sum_{k=1}^{r} A_k \cos\frac{2\pi km}{M}, \qquad K_B(m) = \sum_{\substack{k=-r \\ k \neq 0}}^{r} B_k\,e^{j\frac{2\pi km}{M}} = 2\sum_{k=1}^{r} B_k \cos\frac{2\pi km}{M} \tag{8}$$

The $\Phi_M(i,m)$ are eigenfunctions of the spatial operators represented by the A and B vectors of network parameters, and KA(m), KB(m) are the corresponding eigenvalues, which are generally complex and depend on the interconnection matrix and the number of cells.

Particular case 1: Bk = 0; Y(s) = sC ⇒ Q(s) = sC; P(s) = 1, meaning that the admittance Y(s) is purely capacitive, with s ↔ d/dt. Thus equation (6) becomes:

$$C\,\frac{d\hat{x}_m(t)}{dt} = K_A(m)\,\hat{x}_m(t), \quad m = 1,..,M \tag{9}$$

For M=64, the dispersion curve KA(m) in (9) crosses zero at m1=16 and m2=48, according to Fig. II-5.

Figure II-5. Dispersion curves for first order spatio-temporal filters with 1x64 cells

The dispersion curve taken from [0; M/2 − 1] (for the above example, [0; 31]) gives the spatial frequency characteristic of the filter based on the proposed analog parallel architecture, as follows: the modes whose eigenvalue has a positive real part are amplified due to their instability, whereas those with a negative real part decay to zero. Naturally, because of this unstable behavior, the network dynamics has to be frozen before any node reaches saturation. The resulting spatio-temporal filter has a time-variable selectivity: the longer the free transient runs, the higher the quality factor.

In the following, the case of neighborhoods of at most second order (r ≤ 2) is considered, for which

$$K_A(m) = 2A_1\cos\frac{2\pi m}{M} + 2A_2\cos\frac{4\pi m}{M}$$

It is worth mentioning that the second-order spatial filter can be designed using either both first- and second-order interconnections or second-order interconnections only; the latter has the advantage of implementing second-order filters with the same number of interconnections as the first-order ones. Therefore, for the second-order filter, A1 equals 0. Under these conditions, the roots of the characteristic polynomial for M=64 cells are:

$$m_1 = \frac{M}{8} = 8, \quad m_2 = \frac{3M}{8} = 24, \quad m_3 = \frac{5M}{8} = 40, \quad m_4 = \frac{7M}{8} = 56 \tag{10}$$

According to (10), a very important observation can be made: the roots of the characteristic polynomial for this kind of filter do not depend on the network parameter A2. The results obtained are shown below:

Figure II-6. Dispersion curves for second order spatio-temporal filters with 1x64 cells
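The zero crossings and the amplified bands discussed above can be checked by evaluating the dispersion curve over the modes [0; M/2 − 1]. This is a minimal sketch under stated assumptions: the function names are hypothetical, the weight values are arbitrary, and a small tolerance is used so that modes lying exactly on a zero crossing are not counted as unstable.

```python
import math


def dispersion(m, M, A1=0.0, A2=0.0):
    """K_A(m) = 2*A1*cos(2*pi*m/M) + 2*A2*cos(4*pi*m/M) for a symmetric ring."""
    return (2 * A1 * math.cos(2 * math.pi * m / M)
            + 2 * A2 * math.cos(4 * math.pi * m / M))


def unstable_modes(M, A1=0.0, A2=0.0, tol=1e-9):
    """Modes in [0, M/2 - 1] with a positive eigenvalue, i.e. the amplified modes."""
    return [m for m in range(M // 2) if dispersion(m, M, A1, A2) > tol]


M = 64
print(unstable_modes(M, A1=1.0))   # first order, A1 > 0: modes 0..15
print(unstable_modes(M, A1=-1.0))  # first order, A1 < 0: modes 17..31
print(unstable_modes(M, A2=-1.0))  # second order, A2 < 0: modes 9..23
print(unstable_modes(M, A2=-5.0))  # same band: the crossings do not depend on |A2|
```

Changing the magnitude of A2 rescales the curve but leaves its zero crossings at 8 and 24 untouched, which is the observation drawn from (10).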

To conclude, the analog parallel network can be configured as a first-order (HPF or LPF) or second-order (BPF or SBF) time-variable spatial filter by using first-order (r=1) or second-order (r=2) neighborhoods.

Particular case 2: Bk = 0; Y(s) = G + sC ⇒ Q(s) = G + sC; P(s) = 1, meaning that the node impedance Z(s) = 1/Y(s) is simply R||C, where R = 1/A0 = 1/G. Thus equation (6) becomes:

$$C\,\frac{d\hat{x}_m(t)}{dt} = \left(K_A(m) - A_0\right)\hat{x}_m(t), \quad m = 1,..,M \tag{11}$$

and the roots of the characteristic polynomial are given by the solutions of the equation KA(m) − A0 = 0. Clearly, the conductance A0 connected to each capacitive node shifts the dispersion curve downward, so some unstable modes become stable. For instance, the BPF from Fig. II-6 with A0=0 has its unstable modes in (8; 24), whereas for the one from Fig. II-7 with A0>0 only the modes in (11; 21) remain unstable. Therefore, this can be a mechanism for controlling the cutoff frequency of the first-order filters or the bandwidth of the second-order filters.

Figure II-7. Dispersion curves for second order spatio-temporal filters with 1x64 cells; A0>0

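The characteristic polynomial roots for first- and second-order neighborhoods are given by (12):

$$\begin{aligned}
r = 1;\ A_1 \neq 0;\ A_2 = 0: &\quad -A_0 + 2A_1\cos\frac{2\pi m}{M} = 0 \Rightarrow m_1 = \frac{M}{2\pi}\arccos\frac{A_0}{2A_1},\quad A_0, A_1 > 0 \\
r = 2;\ A_1 = 0;\ A_2 \neq 0: &\quad -A_0 + 2A_2\cos\frac{4\pi m}{M} = 0 \Rightarrow m_{1,2} = \frac{M}{2\pi}\arccos\left(\pm\sqrt{\frac{A_0 + 2A_2}{4A_2}}\right),\quad A_0, A_2 > 0
\end{aligned} \qquad (12)$$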
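The closed-form roots in (12) can be checked numerically. The sketch below is a minimal illustration, assuming example conductances (A0 = 2 uS, A2 = 5 uS) that are not taken from the text; the function names are hypothetical. It evaluates both expressions and verifies that a returned root satisfies the corresponding equation.

```python
import math

def root_first_order(M, A0, A1):
    """r = 1 (A2 = 0): m1 = M/(2*pi) * arccos(A0 / (2*A1))."""
    return M / (2 * math.pi) * math.acos(A0 / (2 * A1))

def roots_second_order(M, A0, A2):
    """r = 2 (A1 = 0): m1,2 = M/(2*pi) * arccos(+/- sqrt((A0 + 2*A2) / (4*A2)))."""
    c = math.sqrt((A0 + 2 * A2) / (4 * A2))
    scale = M / (2 * math.pi)
    return scale * math.acos(c), scale * math.acos(-c)

M = 64
A0 = 2e-6  # assumed example value [S]
A2 = 5e-6  # assumed example value [S]

m1, m2 = roots_second_order(M, A0, A2)
# Substitute m1 back into -A0 + 2*A2*cos(4*pi*m/M); the result should vanish.
residual = -A0 + 2 * A2 * math.cos(4 * math.pi * m1 / M)
print(m1, m2, residual)
```

With A0 = 0 the second-order expression falls back to the roots m = 8 and m = 24 of (10), which is a convenient sanity check.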

It is important to mention that the A0 parameter is indispensable for the design of high-order spatio-temporal filters.

II.3 The design of spatio-temporal linear network with 1x64 cells

II.3.1 Cellular structure and the system of nodal equations of the network with 1x64 cells

Let us consider a 1D network of 1x64 cells. Each cell is connected to its first- or second-order neighbors. The structure of each cell is depicted in Fig. II-8. In order to minimize the area, the resistance R = 1/A0 is integrated in each cell only if necessary. Because the network is homogeneous, Ast2 = Adr2 = A2 and Ast1 = Adr1 = A1 (the subscripts st and dr denote the left and right neighbors, respectively).

Figure II-8. 1D architecture

The above circuit is characterized by the following set of state equations:

$$C\frac{dx_i}{dt} = -A_0 x_i + A_1\left(x_{i-1} + x_{i+1}\right) + A_2\left(x_{i-2} + x_{i+2}\right),\quad i = 1..M-1 \qquad (13)$$

According to the analysis made in section II.2, when the ring boundary conditions are fulfilled and the auxiliary inputs are removed (Bk = 0), the decoupled differential equations for each spatial mode have the form (9). In addition, it is worth mentioning that A1 ≠ 0 and A2 = 0 for first-order neighborhoods, whereas A1 = 0 and A2 ≠ 0 for second-order interconnections.

II.3.2 The design of the cellular transconductors. Simulation results

The implementation of the VCCSs implies the design of a transconductor replicated in each cell. The transconductances Gm of these amplifiers are the template coefficients of the network (13). Due to the area and power consumption constraints, these transconductors should be as simple as possible while coping with a very tight power budget. Thus, the main tradeoffs to be made here are between bandwidth, linearity and power consumption. In this implementation a basic operational transconductance amplifier (OTA) is used.

Figure II-9. OTA structure replicated in each cell

Thus, taking into account that $g_m = \sqrt{\beta_n I_{10}}$ and adopting SM1=SM2=2um/20um from the linearity and layout points of view, a biasing current of about I10=6.4uA results. Each transistor of the differential pair is divided into 10 square transistors connected in series. Assuming that

$$G_m = \frac{i_{out}}{v_2 - v_1} = K g_m,$$

the value of this transconductance can be modified by scaling the output stage, keeping the same linearity and power consumption of the input stage. If the typical Gm is about -5uS, the corner simulations place its value somewhere between -5.7uS and -4.3uS.

II.3.2.1 Finding the template parameters. Main constraints

This section is focused on the design of the core architecture with 1x64 cells, which can be configured to implement low-pass/high-pass or band-pass/stop-band filtering operations. The first step is to find the expression of the nodal voltages and to identify the circuit constraints related to the design of the OTAs. The solution of the system of first-order decoupled differential equations (9) is given by (14):

$$\hat{x}_m(t) = \hat{x}_{0m}\, e^{\frac{K_A(m)}{C}t} \qquad (14)$$

where $\hat{x}_{0m}$ represents the initial condition for the spatial mode m (16). Replacing (14) in (3), the time dependence of the voltage map can be computed as follows:

$$x_i(t) = \frac{1}{M}\sum_{m=0}^{M-1}\Phi(i,m)\,\hat{x}_m(t) = \frac{1}{M}\sum_{m=0}^{M-1}\hat{x}_{0m}\, e^{j\frac{2\pi i m}{M}}\, e^{\frac{K_A(m)}{C}t},\quad i = 1..M \qquad (15)$$

$$\hat{x}_0 = [\hat{x}_{01}..\hat{x}_{0M}] = \mathrm{fft}(x_0) \qquad (16)$$

where x0 is the input signal, i.e. the initial conditions vector loaded into the capacitive nodes of the network. One of the most important issues in the design is to compute the maximum time, tf, at which the free run of the nodal voltages can be stopped without reaching saturation:

$$\max\left(x_i(t),\ i = 1..M\right)\Big|_{t=t_f} < V_{DD} - V_{DSsat} \qquad (47)$$

Basically, the template parameters have to be found according to the incoming signal. If the network is loaded with a 200mV spatial impulse placed on the 32nd cell, then max(xi(tf), i=1..M)=x32(tf). Thus, assuming that tf is about 350ns and using equation (15), the A1 parameter has the value of 5uS. In this case tf is the time at which max(xi(tf), i=1..M)=3.12V, assuming a supply voltage of 3.3V and a virtual ground of 1.65V.

Figure II-10. Matlab HPF synthesis based on eq. (15) (plot: magnitude of the frequency characteristic of the designed filter; horizontal axis: Modes; vertical axis: Amplitude)
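The synthesis based on (14)-(16) can also be sketched in a few lines of Python. This is a minimal illustration and not the Matlab code behind Fig. II-10. It assumes a nodal capacitance C = 1 pF (not stated in this section), A1 = -5 uS (the negative sign is an assumption made here to obtain a high-pass behavior), voltages measured relative to the 1.65 V virtual ground, and a 0-based index for the cell carrying the impulse. All function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct DFT with the usual fft sign convention, as in eq. (16)."""
    M = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * i * m / M) for i in range(M))
            for m in range(M)]

def nodal_voltages(x0, t, A1, A2, C, A0=0.0):
    """Eq. (15): inverse transform of the modes, each scaled by exp(K(m)*t/C)."""
    M = len(x0)
    x0_hat = dft(x0)
    growth = [math.exp((2 * A1 * math.cos(2 * math.pi * m / M)
                        + 2 * A2 * math.cos(4 * math.pi * m / M) - A0) * t / C)
              for m in range(M)]
    return [sum(x0_hat[m] * growth[m] * cmath.exp(2j * math.pi * i * m / M)
                for m in range(M)).real / M
            for i in range(M)]

def freeze_time(x0, limit, A1, A2, C, t_max=1e-6, steps=40):
    """Bisection for the time at which the peak nodal voltage reaches `limit`."""
    lo, hi = 0.0, t_max
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if max(nodal_voltages(x0, mid, A1, A2, C)) < limit:
            lo = mid
        else:
            hi = mid
    return lo

M = 64
C = 1e-12          # assumed nodal capacitance [F]
A1 = -5e-6         # assumed sign; magnitude 5 uS as in the text [S]
x0 = [0.0] * M
x0[32] = 0.2       # 200 mV spatial impulse (0-based index, an assumption)
limit = 3.12 - 1.65  # threshold relative to the virtual ground [V]

t_f = freeze_time(x0, limit, A1, 0.0, C)
print("t_f [ns]:", t_f * 1e9)
```

Under these assumptions the computed freeze time comes out close to the 350 ns quoted above; a different nodal capacitance would scale it proportionally.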

II.3.3 The influence of the non-idealities and nonhomogeneities on the dynamics of the filters

The nonhomogeneities are caused by the random variation of the network parameters; they represent the worst case and are modeled by Monte Carlo simulations. The nonidealities, in turn, are given by the uniform variation of the template parameters and can be modeled at system level by the OTAs' limited linearity and offset currents. The latter case can be highlighted by corner simulations as follows:
1. the uniform variation of the transconductances of the network can be calibrated by controlling the hold or "freezing" time of the network (Tfree-run);
2. the uniform variation of the offset current, which affects especially the LPF and SBF, can be calibrated by adjusting the analog ground, GNDA, and controlling Tfree-run, as shown in Fig. II-11.

Curves C1 and C2 represent the magnitude of the frequency characteristic of the LPF in the "wp" and "tm" corners, respectively. Curves C3, C4 and C5 are the ideal curve, the calibrated C1 and the calibrated C2, respectively. After the calibration procedure, the C1 and C2 curves are identical with the magnitude of the frequency characteristic of the ideal LPF.

Figure II-11. The calibration of LPF affected by the OTA's nonlinearities and offset currents given by the "typical" and "wp" corners (plot title: Calibration of a low-pass filter; horizontal axis: Space; vertical axis: Amplitude; curves C1, C2 and the overlapping C3, C4, C5)

As mentioned above, the effect of the offset current is cancelled by adjusting the analog ground. The maximum values of the processing times for each filter are listed in Tab. II-1.

Table II-1. Corner variations of the sample times of the programmable network: maximum sample time of the network [ns] (before reaching saturation). C = calibrated, NC = non-calibrated.

Corner type     LPF    HPF    BPF    SBF
Ideal           350    350    350    350
Typical, C      647    x      x      656
Typical, NC     686    682    683    688
wp, C           460    x      x      457
wp, NC          492    428    477    492
ws, C           876    x      x      880
ws, NC          920    920    920    920
wo, C           580    x      x      587
wo, NC          628    615    617    627
wz, C           702    x      x      696
wz, NC          730    734    735    728

The calibration is made according to the spatial frequency characteristic of the network "frozen" when at least one nodal voltage reaches a given threshold voltage, Vthreshold=3.12V. The corners affect both the values of the Gm's and the offset currents. The calibration procedure is carried out by tuning the transistor-level filters to match the ideal ones. Because the offset current does not affect the behavior of the HPF and BPF, an additional adjustment of GNDA is not required for them. Tab. II-2 summarizes the calibration effects. In order to better assess the nonidealities and nonhomogeneities of the network, the MSE (Mean Squared Error) and PSNR (Peak Signal-to-Noise Ratio) are computed:

$$MSE = \frac{1}{M}\sum_{i=1}^{M}\left(I(i) - K(i)\right)^2,\qquad PSNR = 10\log_{10}\left(\frac{\mathrm{Max}^2_{I,K}}{MSE}\right)$$

PSNR is computed for each filter, for both the corner and the Monte Carlo simulations. I(i) and K(i), i=1..64, are the magnitudes of the spatial frequency characteristic of the ideal filter and of the transistor-level filter affected by the corners or by process variations and mismatches, respectively. $\mathrm{Max}^2_{I,K}$ is the squared peak value of the magnitude of the spatial frequency characteristic of the ideal filter.

Table II-2. PSNR values [dB] of the configured filters under process variations and mismatch

Filter type            Typical  wp     ws     wo     wz
HPF                    36.8     35.5   35.5   36.5   37.4
LPF, calibrated        41       40     44.3   42     43
LPF, non-calibrated    23.5     15.5   23.7   15.4   24
BPF                    37       35.3   35.5   36.3   37.4
SBF, calibrated        38.6     40     43.2   36.4   42.4
SBF, non-calibrated    23.5     15.7   23.7   15.5   24
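The two figures of merit can be computed as in the following minimal sketch. The input vectors are hypothetical four-point examples chosen only to make the arithmetic easy to follow; in the text they would be the 64-point magnitude characteristics of the ideal and transistor-level filters.

```python
import math

def mse(ideal, actual):
    """Mean squared error between two magnitude characteristics."""
    return sum((i - k) ** 2 for i, k in zip(ideal, actual)) / len(ideal)

def psnr(ideal, actual):
    """PSNR in dB, using the squared peak of the ideal characteristic."""
    peak_sq = max(abs(v) for v in ideal) ** 2
    return 10 * math.log10(peak_sq / mse(ideal, actual))

ideal = [1.0, 2.0, 3.0, 4.0]    # hypothetical ideal characteristic
actual = [1.0, 2.0, 3.0, 5.0]   # hypothetical transistor-level characteristic

print("MSE :", mse(ideal, actual))
print("PSNR:", psnr(ideal, actual), "dB")
```

Note that the PSNR is undefined when the two characteristics coincide exactly (MSE = 0), so a practical script would need to guard against that case.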

The peak values of the magnitudes of the frequency characteristics were calibrated beforehand. Obviously, the PSNR values for the non-calibrated filters are quite low due to the offset currents. In addition, it can be concluded that the offset current does not affect the HPF and BPF.

II.4 On the programmability of the core architecture

II.4.1 The design of the digital control block

The proposed architecture also needs a digital control circuit, which can be embedded on chip or on an FPGA platform. The control block can be easily synthesized from the following diagram:

Figure II-12. State diagram for the digital control block

II.4.1.1 Linear network states

Analyzing the diagram from Fig. II-12, only 5 states are required in order to test a minimal functionality of the network:
• IDLE: the circuit holds the filtered version of the input signal in the capacitive nodes. Because this architecture does not have a digital memory, the length of this state is limited by the retention time, which is physically determined by the leakage currents. The filter type is also configured in this state.
• POWER U/D: all OTAs are powered on, preparing the network for the "FILTER" state.
• LOAD: the input signal is loaded into the capacitive nodes of the circuit. These initial conditions can be downloaded from a CMOS sensor or from a digital memory through digital-to-analog converters.
• FILTER: the initial conditions vector is filtered, enhancing the dynamic range of the nodal voltages. This is the shortest state, but the one in which the circuit spends the largest part of its power budget.
• FREEZE: the network is frozen, and the final values of the nodal voltages are downloaded outside the chip by the back-end circuitry. The dynamic power consumption of this state is given by the output and column buffers. The number of cells and the "pixel time" define how long this state takes, Tfreeze=M·Tpixel.

II.4.1.2 Structure of the reconfigurable linear cell

The circuitry of each cell, depicted in Fig. II-13, can be easily derived by synthesizing the digital control block. All the switches are implemented by transmission gates (TG).

Figure II-13. The structure of a reconfigurable cell

II.4.2 Design of the column buffer

II.4.2.1 Back-end circuits of an analog processing chain

The weakest link of an analog processing chain is the A/D conversion; therefore, the main advantage of parallel processing is lost without an appropriate readout procedure. Many readout circuits for vision chips are reported in the literature. The column circuits are used to read out the parallel architecture without affecting the nodal capacitance through charge injection. The simplest way to do that is to use a source follower transistor, as shown in Fig. II-14. As mentioned before, all the nodal capacitors are decoupled from the network during the "FREEZE" state, keeping the final values of V1,..,VM. During the reading operation, each cellular analog memory is sequentially connected to the output node by means of the Φ1,..,ΦM switches.

Figure II-14. Column circuits

II.4.2.2 Dynamic range and slew rate

The dynamic range of the column buffer is sketched in Fig. II-15a. The continuous line is the input voltage and the dashed one is the output voltage.

Figure II-15. a) DC characteristic of the column buffer. b) Corner variations of the slew rate.

The dynamic behavior of this circuit is also important, because it determines the "pixel time" and, furthermore, the processing speed of the programmable spatio-temporal filter. Assuming that the biasing current, Iref, is about 2uA and considering a parasitic capacitance of 200fF, the SR is about 19V/us. Fig. II-15b plots with a dashed line the transient output in the typical corner. In this example the pixel time, Tpixel, is assumed to be about 400ns. The column buffer transistors found in vision chips are usually made with special masks in order to decrease their threshold voltage.

II.4.3 Design of the row buffer

The retention time, Tretention, is the time interval during which the stored nodal voltages can be maintained within a specific range, e.g. 1mV. Thus, during this period all the nodal voltages can be read out without compromising the dataset from the first read cell to the last one. According to the measurements, Tretention equals 190us. Besides these deviations, another error occurs, caused by the charge injection when the output buffer is connected. From the measurement, this error is at most 3mV. Finally, for the A/D converter, a VLSB of 4mV should cover these errors. Assuming that Tpixel is about 400ns, only 12.8us are required to read 32 cells, which means that Tfreeze=12.8us. Under these conditions, a 2.5MHz SAR or single-slope A/D converter should fulfill the requirements.

Further, the following assumptions are made: the maximum capacitive load of the row buffer and the slewing time are about 10pF and 200ns, respectively. The first assumption is somewhat exaggerated, considering that the ESD capacitance is about 3-500fF. Considering that the expression of the SR is

$$SR = \frac{dV}{dt} = \frac{V_{DD} - V_{satp} - V_{satn}}{200\,\mathrm{ns}} = \frac{K I_{SS}/2}{C_{load}} \Rightarrow K\frac{I_{SS}}{2} \approx 155\,\mu\mathrm{A}$$

and K=25, then ISS equals 12.4uA. SR is one of the most important parameters in the design of this buffer because it gives the minimum value of the pixel time. The others are linearity and dynamic range. Obviously, besides these parameters, the stability issues are also very important. Assuming a 10pF load capacitance, the typical phase margin (PM) is 77°. The corner simulations place this nominal value somewhere between 75° and 78°. Furthermore, if the load capacitance varies between 100fF and 10pF, PM stays in the range [77°; 90°]. In addition, SRrow_buffer has to be larger than SRcolumn_buffer. The schematic of this buffer is depicted in Fig. II-16.

Figure II-16. Row buffer schematic
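The sizing of the tail current of the row buffer follows directly from the slew-rate expression. The sketch below repeats that arithmetic; the output swing of 3.1 V is an assumption, inferred here from the 155 uA value quoted in the text for a 10 pF load and a 200 ns slewing time, and the function name is hypothetical.

```python
def tail_current(c_load, delta_v, t_slew, k):
    """I_SS from SR = delta_v / t_slew = (k * I_SS / 2) / c_load."""
    output_current = c_load * delta_v / t_slew  # this is K * I_SS / 2
    return 2 * output_current / k

C_LOAD = 10e-12   # maximum capacitive load [F]
T_SLEW = 200e-9   # slewing time [s]
K = 25            # scaling factor K used in the text
DELTA_V = 3.1     # assumed output swing VDD - Vsatp - Vsatn [V]

i_ss = tail_current(C_LOAD, DELTA_V, T_SLEW, K)
print("K*I_SS/2 [uA]:", C_LOAD * DELTA_V / T_SLEW * 1e6)
print("I_SS [uA]:", i_ss * 1e6)
```

A smaller load capacitance or a longer allowed slewing time reduces the required tail current in direct proportion.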

II.4.4 Simulation results of the test network

In the following, the simulation results of the designed circuit, which can be configured in four different ways, are given. In this implementation, the network is assumed to be loaded in parallel from an external memory. The control signals are depicted in Fig. II-17. The network is loaded with a 200mV spatial impulse. Another way to control the filter quality factor is by controlling the maximum value of the initial conditions vector. Because the output dynamic range of the analog processor is limited by the column circuitry and by the input dynamic range of the row buffer, the only mechanism to adjust the filter selectivity is given by the amplitude of the initial conditions. Thus, in order to enhance the quality factor, the amplitude of the input signal has to be reduced; otherwise the filtering operation will be weaker due to the decrease of the quality factor. Another possibility, which keeps a given amplitude of the incoming signal, is to control the hold time.

Figure II-17. Temporal diagram of the reconfigurable spatio-temporal filter (intervals marked on the diagram: Tset up, Tload, Tprocess, Tfreeze)

The time diagram of the designed network, which implements 4 different kinds of filters, is depicted in Fig. II-17. Tset up is the time interval in which the transient behavior caused by powering on all the OTAs of the network dies out. During Tload the network is loaded with the initial conditions vector. Tprocess is the free-run time of the unstable spatio-temporal filter, during which the desired filter emerges. During Tfreeze the information from the analog processor is downloaded. The simulation results obtained by calibrating the network are listed in Tab. II-3. For this purpose the frequency of the master clock, fCLK, used for calibration should be 200MHz.

Table II-3. Corner variation of Tprocess and Tfreeze

Time interval of a complete operating cycle   Filter type         Typical  wp     ws     wo     wz
TPower_U/D (Tset_up)                          HPF/LPF, BPF/SBF    1034 (all corners)
Tload                                         HPF/LPF, BPF/SBF    12 (all corners)
Tprocess                                      HPF                 66       50     96     58     74
                                              LPF                 66       50     106    60     74
                                              Offset GNDA         -        +5mV   +17mV  +5mV   +2mV
                                              BPF                 66       50     106    60     72
                                              SBF                 66       86     106    60     70
                                              Offset GNDA         -        +5mV   +17mV  +5mV   +2mV
Tfreeze                                       HPF/LPF, BPF/SBF    2560 (all corners)

Due to the dynamic range of the column and line circuits, the dynamics of the analog processor is stopped when the peak value of the nodal voltages reaches 2.5V. The PSNR values computed between the ideal and designed filters are up to 30dB, which means that the results sketched in Fig. II-18 are very close to the ideal ones. The achieved dynamic range of the LPF/SBF and HPF/BPF is about 700mV and 1.3V, respectively. Fig. II-18 A’, B’, C’, D’ show the output of the analog processor configured as HPF, LPF, BPF and SBF, respectively.

Figure II-18. A, B, C, D: magnitudes of the spatial frequency characteristics of HPF, LPF, BPF and SBF; A’, B’, C’, D’: output of the analog processor configured as HPF, LPF, BPF and SBF (vertical axis: Amplitude)

According to Tab. II-3, a filtering operation takes about 18.56us, which means a processing speed of 53.8 kOps.
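The cycle time can be reproduced with a short calculation. The sketch below assumes that the entries of Tab. II-3 are counts of master-clock periods at fCLK = 200 MHz (the table itself does not state the unit) and uses the longest Tprocess entry, 106; the function name is hypothetical.

```python
F_CLK = 200e6  # master clock frequency [Hz]

def cycle_time(n_setup, n_load, n_process, n_freeze, f_clk=F_CLK):
    """Duration of one operating cycle, with every interval in clock periods."""
    return (n_setup + n_load + n_process + n_freeze) / f_clk

# Assumed reading of Tab. II-3: set-up, load, longest process, freeze.
t_cycle = cycle_time(1034, 12, 106, 2560)
ops_per_second = 1.0 / t_cycle

print("cycle time [us]:", t_cycle * 1e6)
print("filtering operations per second:", ops_per_second)
```

Under this reading the freeze interval alone accounts for 2560 clock periods, i.e. 12.8 us, so the readout dominates the cycle.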

II.4.5 On the layout of the reconfigurable spatio-temporal filter

The required area for the designed cell is about 190um x 217um, meaning that the area of the 1D analog processor is about 1mm x 1.5mm. This area is mostly given by the layout of the OTAs, as shown in Fig. II-9. Thus, it could be significantly decreased by optimizing the layout, which is not the main purpose of this work.

Figure II-19. Layout of one cell of the 1D network

In order to have more flexibility in the testing phase, the digital control block was implemented on an FPGA. In addition, the D/A converters used to load the analog processor can be implemented off-chip. Thus, the testing board requires an FPGA and an acquisition board with D/A converters. The output of the designed reconfigurable spatio-temporal filter is analog.

II.5 References

[1] L. O. Chua, L. Yang, "Cellular Neural Networks: Theory", IEEE Transactions on Circuits and Systems, vol. 35, no. 10, pp. 1257-1272, October 1988.
[2] L. O. Chua, L. Yang, "Cellular Neural Networks: Applications", IEEE Transactions on Circuits and Systems, vol. 35, no. 10, pp. 1273-1290, October 1988.
[3] T. Roska, J. Vanderwalle, "Cellular Neural Networks", John Wiley & Sons, 1993.
[4] J.L. Huertas, Wai-Kai Chen, R.N. Madan, "Visions of the Nonlinear Science in the 21-st Century", World Scientific Publishing, 1999.
[5] Z. Kincsest, Z. Nagyl, and P. Szolgay, "Implementation of Nonlinear Template Runner Emulated Digital CNN-UM on FPGA", International Workshop on Cellular Neural Networks and Their Applications, pp. 1-5, August 2006.
[6] T. Szabot and P. Szolgay, "CNN-UM-Based Methods Using Deformable Contours on Smooth Boundaries", International Workshop on Cellular Neural Networks and Their Applications, pp. 1-5, August 2006.
[7] D. Balya, G. Tímar, G. Cserey, and T. Roska, "A New Computational Model for CNN-Ums and its Computational Complexity", International Workshop on Cellular Neural Networks and Their Applications, pp. 100-105, 2004.
[8] http://en.wikipedia.org/wiki/Cellular_neural_network
[9] T. Roska, "Cellular Wave Computers and CNN Technology – a SoC architecture with xK Processors and Sensor Arrays", International Conference on Computer Aided Design, Jose, CA, USA, pp. 557-564, 2005.
[10] L. Goras, P. Ungureanu, "On the Possibilities of Using Two-Grid Coupled CNN's for Face Features Extraction", Proceedings of the 8-th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA 2004, Budapest, Hungary, pp. 381-386, July 2004.
[11] C. W. Shih, "Complete stability for a class of cellular neural networks", International Journal Bifurcation and Chaos, vol. 11, pp. 169-177, 2001.
[12] M. Gilli, "Stability of cellular neural networks and delayed cellular neural networks with nonpositive templates and nonmonotonic output functions", IEEE TCAS-I, vol. 41, no. 8, pp. 518-528, 1994.
[13] P.P. Civalleri and M. Gilli, "Practical stability criteria for cellular neural networks", Electronics Letters, vol. 33, no. 11, pp. 970-971, 1997.
[14] M.P. Joy and V. Tavsanoglu, "A new parameter range for the stability of opposite-sign cellular neural networks", IEEE TCAS-I, vol. 40, no. 3, pp. 204-207, 1993.
[15] M. Balsi, "Stability of cellular neural networks with one-dimensional templates", International Journal of Circuit Theory and Applications, vol. 21, pp. 293-297, 1993.
[16] L. Goras, I. Alecsandrescu, I. Vornicu, "Spatial filtering using linear analog parallel architectures", International Symposium on Signals, Circuits and Systems ISSCS 2009, Iasi, Romania, vol. 2, pp. 409-412.
[17] http://en.wikipedia.org/wiki/Gershgorin_circle_theorem
[18] I. Alecsandrescu, L. Goras, "Gershgorin circles associated to double grid second order cellular neural networks", Acta Tehnica Napocensis Electronics and Telecommunications, vol. 49, no. 1, pp. 1-7, 2008.
[19] E. David, P. Ungureanu, M. Ansorge, L. Goras, "On the CNN template design for Gabor-type filters based on Pade approximation", International Symposium on Signals, Circuits and Systems, vol. 1, pp. 197-200, 2003.
[20] R. Carmona, F. Jimenez-Garrido, R. Dominguez-Castro, S. Espejo and A. Rodriguez-Vazquez, "CMOS Realization of a 2-layer CNN Universal Machine Chip", International Journal of Neural Systems, pp. 432-442, 2003.
[21] L. Goras, L. O. Chua, and D. Leenearts, "Turing Patterns in CNNs – Part I: Once Over Lightly", IEEE Transactions on Circuits and Systems – I, vol. 42, issue 10, pp. 602-611, 1995.
[22] L. Goras, L. O. Chua, and D. Leenearts, "Turing Patterns in CNNs – Part II: Equations and Behaviors", IEEE Trans. on Circuits and Systems – I, vol. 42, issue 10, pp. 612-626, 1995.
[23] L. Goras, L. O. Chua, and D. Leenearts, "Turing Patterns in CNNs – Part III: Computer Simulation Results", IEEE Trans. on Circuits and Systems – I, vol. 42, issue 10, pp. 627-637, 1995.
[24] L. Goras, T. Teodorescu, R. Ghinea, "On the Spatio-Temporal Dynamics of a Class of Cellular Neural Networks", Journal of Circuits, Systems and Computers Section I (Theory) (Special Issue on "CNN Technology and Visual Microprocessors") JCSC, vol. 12, no. 4, August 2003.
[25] L. Goras, L. O. Chua, "On the Influence of CNN Boundary Conditions in Turing Pattern Formation", Proc. ECCTD’97, Budapest, pp. 383-388, 1997.
[26] L. Goras, T.D. Teodorescu, "On CNN Boundary Conditions in Turing Pattern Formation", Proc. of the Fifth International Workshop on Cellular Neural Networks and Their Applications, pp. 112-117, 1998.
[27] A. M. Turing, "The Chemical Basis of Morphogenesis", Phil. Trans. Roy. Soc. Lond. B 237, pp. 37-72, October 1952.
[28] L. Goraş, "On Pattern Formation in Cellular Neural Networks", NATO Advanced Research Workshop, Siena, Italy, 22-24 October, 2002; published in V. Piuri, M. Gori, S. Ablameyko and L. Goras (editors), "Limitations and Future Trends in Neural Computation", Volume 186, NATO Science Series: Computer & Systems Sciences.

Analog parallel architectures with optical inputs

III

ANALOG PARALLEL ARCHITECTURES WITH OPTICAL INPUTS

This chapter is focused on the possibilities of loading the designed analog processor with the information provided by a CMOS sensor. In order to minimize the active pixel area, another implementation of the basic cell is proposed. Basically, vision chips use two types of CMOS sensors: passive pixel sensors (PPS) and active pixel sensors (APS). PPS are smaller and have a larger fill factor than APS. Nevertheless, only APS [1] are used, due to their larger SNR, which is significantly increased by the source follower transistor. There are two types of APS: with 3 (3T) or 4 (4T) transistors. The 3T structure has been used since 1980 and, besides the photo-diode, it has a reset transistor, a source follower transistor and a selection transistor. The reset and selection transistors work as switches. Correlated double sampling (CDS) is the most common technique for noise cancelling. Because CDS requires 2 capacitors, in 1993 JPL introduced the photo-gate that underlies the 4T APS. Thus, kTC and FPN noise can be removed by CDS using the transfer gates of photo-gate devices. As a result, CMOS and CCD sensors have the same performance even if they have different functionalities. CCD sensors are based on charge transfer and offer only serial access, while CMOS sensors have the advantage of random access.

III.1 Photo-sensitive elements

CMOS sensors are based on the photo-electric effect, which characterizes the interaction of electromagnetic radiation with different materials, in particular silicon. For this kind of sensor, the electromagnetic radiation is represented by the visible spectrum. When a semiconductor is exposed to light, an energy transfer from the incident photons to the silicon atoms occurs, generating electron-hole pairs only if the photon energy is larger than the semiconductor band gap, as shown in Fig. III-1 [2].

Figure III-1. Structure of photo-sensitive element

Silicon is transparent to photons which have an energy lower than the band gap. Therefore, the higher the energy of the incident light (i.e. the shorter the wavelength), the higher the absorption coefficient of the electromagnetic waves. Fortunately, the band gap of silicon is small enough (1.12eV) for visible light to generate electron-hole pairs. The next step in detecting these electron-hole pairs is to separate the electrons from the holes; otherwise they recombine in a short time. The simplest way to avoid this is to use the electric field of the depletion region of a p-n junction. Hence, electrons and holes drift to the n-region and p-region, respectively, generating a reverse current through the p-n junction. To conclude, a photo-diode is a p-n junction exposed to light, which is converted into a reverse current called photo-current [3].

III.1.1 Working principle of a CMOS sensor

In order to integrate the photo-current of the photo-detector, its own capacitance can be used. When a diode is reverse biased, the p and n regions work as the plates of a capacitor. The photo-diode operation mode is depicted in Fig. III-2.

Figure III-2. Working principle of a CMOS sensor

Initially, all the photo-diodes of a CMOS sensor are reset to a reference voltage, Vref. Then, with t0 denoting the time when the photo-diodes are disconnected from the voltage reference and exposed to light, the voltage across each photo-detector decreases depending on the light intensity. The measured voltage drop on the photo-diode is directly proportional to the light intensity and the integration time.

III.1.2 P-N junctions used as photo-detectors

A standard CMOS process provides at least 3 vertical diode junctions and a horizontal one, two bipolar transistors (one vertical and one lateral), and a photo-gate, all of which can be used as photo-detector elements. Because the operating mode of the photo-gate is not based on a reverse-biased p-n junction, its dark current is much lower, being caused only by the recombination of the carriers in the substrate.

Figure III-3 – Photo-diode structure a) p-substrate/ n-well, b) p-diffusion/n-well

Depending on the penetration depth of each wavelength in the silicon wafer, the quantum efficiency, η, can be plotted as a measure of the light sensitivity of the photo-diode. The depletion region is represented by a bold line. The p-substrate/n-well diode has a larger quantum efficiency around green light. The carrier pairs generated by infra-red light (L~10um) are mainly collected by the diffusion region. This phenomenon causes a contrast decrease in a matrix of CMOS sensors due to the lateral diffusion to the adjacent pixels. This is the reason why an infra-red filter is often used to enhance the image quality [4].

The depletion region is characterized by the capacitance C of the reverse-biased p-n junction,

$$C = \varepsilon_0 \varepsilon \frac{A}{w},$$

where A is the junction area and w is the width of the depletion region. The lateral capacitance should be considered as well for a more accurate computation. Usually, the value of this capacitance is around 0.5fF/um2 for CMOS sensors. Because the 0.35um CMOS technology used here does not provide a photo-diode model, its functionality has been simulated with the equivalent circuit depicted in Fig. III-4.

Figure III-4. Simulated photo-diode using a p-dif/n-well diode

„D” is a p-diffusion/n-well diode, as shown in Fig. III-3b. IPHD is a DC current source which is proportional to the light intensity. The circuit from Fig. III-4 has the same functionality as the usual photo-sensor, which is connected to the supply voltage in the reset phase. The area and the equivalent capacitance of this diode are about 144 um2 and 185fF, respectively.
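The behavior of this equivalent circuit during integration can be sketched as a constant current discharging the diode capacitance. In the example below only the 185 fF capacitance comes from the text; the reset voltage (taken equal to a 3.3 V supply), the photo-current and the integration time are hypothetical values, and the linear model holds only while the diode stays reverse biased.

```python
def photodiode_voltage(v_reset, i_photo, c_diode, t_int):
    """Diode voltage after integrating a constant photo-current for t_int."""
    return v_reset - i_photo * t_int / c_diode

C_DIODE = 185e-15  # equivalent capacitance of the diode [F]
V_RESET = 3.3      # assumed reset voltage (supply) [V]
I_PHD = 100e-12    # hypothetical photo-current [A]
T_INT = 1e-3       # hypothetical integration time [s]

v_end = photodiode_voltage(V_RESET, I_PHD, C_DIODE, T_INT)
print("voltage drop [V]:", V_RESET - v_end)
```

Doubling either the photo-current or the integration time doubles the voltage drop, which is the proportionality the sensor relies on.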

III.2 APS parameters

III.2.1 Fill factor

This parameter represents the ratio between the active area exposed to light and the total area. A smaller fill factor means a lower sensitivity to light. For this reason, the area required by the processing elements should be as small as possible, keeping the value of the fill factor somewhere between 30% and 40%.

III.2.2 Dynamic range

The dynamic range of an APS is measured at the output of the processing elements. The circuitry of the APS based on the spatio-temporal filter is given by Fig. III-5.

Figure III-5. APS based on the proposed spatio-temporal filter (signal label in the schematic: Vload)

If the illumination level is high, then the final value of Vd will be low. The conventional sensor has a so-called "hard reset", which means that the on-resistance of the M1 transistor should be low, so that the photo-diode can be charged close to Vpixel. The maximum dynamic range is:

$$|V_i(t) - V_{sat11}|_{max} = V_{pixel} - V_{gs2} - V_{gs4} - V_{sat9} \qquad (1)$$

In order to find the maximum dynamic range on the photo-diode, the substrate effect of the source followers should be taken into account:

$$A_V = \frac{g_{m2}}{g_{m2} + g_{mb2}} = \frac{g_{m4}}{g_{m4} + g_{mb4}}.$$

The usual dynamic range of the photo-diode of an APS is about 1V.

III.2.3 Linearity

The substrate effect and the nonlinearity of the threshold voltage of the column buffer compromise the global linearity of the analog processor:

$$V_{TH} = V_{TH0} + \gamma\left(\sqrt{2\Phi_F + V_{SB}} - \sqrt{2\Phi_F}\right), \qquad (2)$$

where γ is the substrate effect coefficient, $\Phi_F = (kT/q)\ln(N_{sub}/n_i)$, Nsub is the doping concentration of the substrate, ni is the intrinsic concentration, and VSB is the source-substrate voltage. For an NMOS source follower, VSB is the output voltage, and VGS is the difference between the input and output voltages. Due to the low area and Ron resistance constraints, the source follower used as column buffer has minimum length and low width, to reduce the parasitic capacitances as much as possible. Therefore, short-channel effects occur as well, affecting the linearity. Yet, a usual transistor size of 1um/lmin has a measured nonlinearity of about 0.5% for both 0.18um and 0.35um CMOS technologies. Measurements reported in the literature show that the nonlinearity caused by the photo-sensor is at most 1%. Therefore, since the nonlinearities introduced along the processing chain are lower than 1%, they will not affect the circuit.

III.2.4 Power consumption

For this kind of linear network, the dynamic power consumption depends on the maximum settling time of the nodal capacitors, which in turn is given by the time constant of the corresponding node (τ=C/gm, where C and 1/gm are the equivalent capacitance and resistance, respectively). The network depicted in Fig. III-5 features three main operations: network loading, initial conditions processing and data readout. Supposing that the loading stage requires a settling time of 100ns, for a nodal capacitor of 1pF a gm of 10uS is needed. Because this transconductance is quite small, source follower M2 works in weak inversion, and the assumption that gm2 is around 20Ibias can be made. Thus, the biasing current for each pixel equals 500nA, which means that the parallel loading of a 64x64 sensor requires a biasing current of about 2mA. Sequential loading techniques can be used instead of the parallel approach to decrease the power consumption of this stage. The dynamic power consumption of the processing stage is strongly dependent on the implementation of the analog parallel architecture. In order to decrease the scanning time, let us assume that the APS works at 1000 frames/second, which means a frame time of 1ms. The line time for a 64x64 sensor is then about 15us. Supposing that 10% of the line time is used to load and read out the network, the reading time is about 1.4us. Considering a load capacitance of 2pF (nodal capacitor + parasitic capacitors of the column bus), the transconductance of source follower M7 is about 1.3uS. Following the previous reasoning, if gm7 is around 20Ibias, the biasing current of M7 is about 65nA. Therefore the biasing current needed to read out one line is about 4.2uA. Obviously, such low currents entail mismatch problems in current mirrors.
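The current budget above can be reproduced with a few lines of arithmetic. The following is only a minimal sketch under the assumptions stated in the text (τ = C/gm and gm ≈ 20·Ibias in weak inversion); the function name `bias_current` is hypothetical, and the readout time is taken here as exactly 10% of the 15us line time, so the results agree with the quoted figures only up to rounding.

```python
def bias_current(settling_time_s, cap_f, gm_over_ibias=20.0):
    """Return (gm, I_bias) for a node that must settle within settling_time_s.

    Uses tau = C / gm and the weak-inversion rule of thumb gm ~ 20 * I_bias
    quoted in the text (both are approximations).
    """
    gm = cap_f / settling_time_s
    return gm, gm / gm_over_ibias

# Loading stage: 100 ns settling time on a 1 pF nodal capacitor.
gm_load, i_load = bias_current(100e-9, 1e-12)
i_load_array = 64 * 64 * i_load          # parallel loading of a 64x64 sensor

# Readout stage: 10% of the 15 us line time, 2 pF column load.
read_time = 0.10 * 15e-6
gm_read, i_read = bias_current(read_time, 2e-12)
i_read_line = 64 * i_read                # one line of 64 column buffers

print(f"load:    gm = {gm_load * 1e6:.1f} uS, "
      f"I = {i_load * 1e9:.0f} nA/pixel, array = {i_load_array * 1e3:.2f} mA")
print(f"readout: gm = {gm_read * 1e6:.2f} uS, "
      f"I = {i_read * 1e9:.0f} nA/column, line = {i_read_line * 1e6:.2f} uA")
```

Changing `gm_over_ibias` shows how strongly the budget depends on the assumed inversion level of the source followers.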

Analog parallel architectures with optical inputs

III.3 Self freezing mechanism of the network dynamics

Fig. III-6 presents the structure of the proposed APS. It is composed of the photo-diode and the processing element based on the discussed analog parallel architecture. Ci is the nodal capacitance, which has a value of 500fF and occupies an area of 30um x 15um. This additional capacitance, besides the photo-detector capacitance, gives the possibility to store two different frames, one with the original input image and the other representing a filtered version of the original image. MR is a PMOS transistor in order to minimize the sub-threshold effect by connecting the substrate to its own source. Usually, the nonidealities modeled by Monte Carlo simulations cause slight differences between these source followers, which appear as FPN. This can be reduced below the observability limits by applying the well-known CDS technique. Switches SW1-3 completely isolate the Ci capacitor in the storing phase, while SW4 isolates the photo-diode capacitor from Ci.

Figure III-6. The proposed APS with self freezing mechanism

An incoming signal is processed by the proposed network as follows:
1. the order of the neighbor interconnections is set by the Sel bit, thus configuring a certain type of spatial filter (HPF or BPF);
2. SW1-4 are closed, the Inv_deg blocks are powered on and the Di (i=1..M) photo-diodes are reset and then exposed to the light. At the end of the exposure time, Ci store the information provided by the photo-diodes;
3. SW4 is opened, and the processing time starts. The end of this stage can be programmed manually or automatically with the aid of the proposed spatial peak detector. The dynamics of the network is frozen by opening SW1-3. Hence Ci (i=1..M) hold the filtered version of the input signal, while Di keep the original. Notice that in the “freezing” phase the Inv_deg blocks are powered off to minimize the power consumption.

III.3.1 Spatial peak detector

Basically, any unstable circuit has to be “stopped” before it reaches saturation. Therefore a peak detector circuit is needed. Due to the non-idealities of the network, the quality factor of the implemented filters varies. Using the peak detector, the maximum selectivity that can be achieved by a certain designed filter can be obtained. The maximum processing time can be easily measured by a counter which is enabled by the 0L logic level of Vfreeze_PHD and is disabled by the 0L logic level of Vfreeze_A. The schematic of the spatial peak detector is depicted in Fig. III-7.

Figure III-7. Schematic of the spatial peak detector

The positive input of each comparator is connected to the voltage reference, Vref, and the other one to the corresponding capacitive node. Vref is chosen such that no nodal voltage reaches saturation. Vfreeze_A is activated on the 0L logic level when at least one nodal voltage exceeds the reference voltage. The schematic of the comparator used is depicted in Fig. III-8. Notice that the “Bias” block is implemented only once, whereas the core of the comparator, which is composed of an input differential pair with common mode feedback, is placed into each cell. The biasing current Iref is 500nA. Both inverters increase the amplification of the cascode stage.
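The logical behavior of the spatial peak detector can be illustrated with a purely behavioral sketch. It is not a circuit model: the function names are hypothetical, the growth of the nodal voltages is replaced by a uniform per-step scaling of the deviation from the analog ground (an illustrative assumption), and the 1.65V analog ground is an assumed value. Only the active-low OR of the comparator decisions follows the description above.

```python
def freeze_signal(node_voltages, v_ref):
    """Active-low freeze flag: 0 when at least one node exceeds v_ref, else 1."""
    comparator_trips = [v > v_ref for v in node_voltages]
    return 0 if any(comparator_trips) else 1


def run_until_frozen(node_voltages, v_ref, v_gnd, growth, max_steps=1000):
    """Let the deviations from v_gnd grow until the freeze flag goes low.

    'growth' is a toy per-step gain standing in for the unstable dynamics.
    Returns the number of growth steps applied and the frozen voltage map.
    """
    for step in range(max_steps):
        if freeze_signal(node_voltages, v_ref) == 0:
            return step, node_voltages
        node_voltages = [v_gnd + (v - v_gnd) * growth for v in node_voltages]
    return max_steps, node_voltages


# A 200 mV spatial impulse on an assumed 1.65 V analog ground, Vref = 2.6 V.
steps, frozen = run_until_frozen([1.65, 1.85, 1.65], v_ref=2.6,
                                 v_gnd=1.65, growth=1.5)
print(steps, max(frozen))
```

In this toy run the flag is asserted as soon as the largest nodal voltage crosses Vref, so the frozen map stays only one growth step beyond the reference.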

Figure III-8. Schematic of the comparator implemented at pixel level

Because the biasing current is limited by a small power budget, the only way to enhance the input transconductance is to increase the width of transistors M1 and M2. On the other hand, the length of M5 and M6 has to be much larger than their width to ensure that they work in the triode region.

III.3.2 Simulation results

The proposed APS is loaded with a 200mV spatial impulse. Due to the unstable behavior, the quality factor of the desired spatial filter grows, and finally the network is frozen before any nonlinearity is reached. Fig. III-9 outlines the time evolution of the voltage map of the HPF.

Figure III-9. Nodal voltages dynamics of HPF during the states: Reset, Load, Process, Store

The exposure time is given by the photo-sensor. The processing time is about 3.5us and is given by the transconductances of the VCCSs and the nodal capacitances of the network. Fig. III-10A shows simulation results obtained with the parallel network configured as HPF whose pixel structure is depicted in Fig. III-8. In all the technological corners the network dynamics is automatically stopped when the peak value of the final voltage map reaches the reference voltage of 2.6V.

Figure III-10. Calibration of A) HPF/ B) BPF (panels “HPF characteristic” and “BPF characteristic”: amplitude versus space for the tm, ws, wp, wo and wz corners, including the calibrated tm, ws, wp and wo curves)

According to the simulation results, the lowest quality factor is given by the “wz” corner. Therefore, the other corners should be calibrated according to the worst case, “wz”. The calibration mechanism thus consists of tuning the reference voltage. In addition, the analog ground should be adjusted as well, since it varies over the corners between 1.49V and 1.78V. Fig. III-10B shows the simulation results for the BPF. The required area of the proposed APS is 44x44um2, and the fill factor is about 30%.

III.4 References
[1] Abbas El Gamal, Helmy Eltoukhy, “CMOS image sensors – An introduction to the technology, design, and performance limits, presenting recent developments and future directions”, IEEE Circuits and Devices Magazine, pp. 6-20, May/June 2005.
[2] T. N. Swe, K. S. Yeo, “An accurate photodiode model for DC and high frequency spice circuit simulation”, International Conference on Modeling and Simulation of Microsystems, pp. 362-365, 2001.
[3] Tobi Delbruck, C. A. Mead, “Analog VLSI phototransduction by continuous-time, adaptive, logarithmic photoreceptor circuits”, in C. Koch, H. Li (eds.), Vision Chips: Implementing Vision Algorithms with Analog VLSI Circuits, pp. 139-161, 1995.
[4] M. Bigas, E. Cabruja, J. Forest, J. Salvi, “Review of CMOS image sensors”, Microelectronics Journal, vol. 37, issue 5, pp. 433-451, 2006.

Translinear circuits

IV TRANSLINEAR CIRCUITS. LOG-DOMAIN MAPPING OF THE LINEAR EQUATIONS

According to the principle of dynamic translinear (DTL) circuits [1], the derivative of a current can be written as a product of currents. The dynamic range of this kind of circuit is enlarged by companding the input, which means that a certain voltage depends logarithmically on a corresponding current. They can also be implemented using class AB circuits, taking advantage of a quite large bandwidth. Fig. IV-1 presents a translinear loop.

Figure IV-1. Translinear loop

The equation which leads to translinear circuits is:

$$i_C\, i_{CAP} = C V_T\, \dot{i}_C \tag{1}$$

Depending on the companding law, which can be logarithmic, tanh or sinh [2], [3], distinct translinear circuits emerge. Log-domain filters are based on translinear loops, which were first introduced by Frey [4]. Translinear loops based on active devices which have exponential or squared [5] current-voltage characteristics have also been reported in the literature. Log-domain mapping of a linear filter is quite natural considering that bipolar transistors [6], [7] and MOS transistors biased in weak inversion [8] have exponential current-voltage characteristics. Finally, the log-domain filters keep the overall input-state linearity.

IV.1 Log-domain mapping of a 1D linear analog parallel architecture

The log-domain mapping of the linear network depicted in Fig. IV-2 and the synthesis of the circuit using MOS transistors biased in weak inversion have been achieved based on the general theory of log-domain mapping of autonomous linear systems. This principle has been used to implement the reaction-diffusion equations, which are nonlinear differential equations [9]. For the physical implementation of a large network, the required area and power consumption of each cell should be minimized. The log-domain mapping of the linear system fulfills both the area and the power consumption constraints.

Figure IV-2. 1D ideal linear architecture

The linear differential equation for first and second order neighborhoods and a homogeneous network with Arg2=Alf2=A2 and Arg1=Alf1=A1 has the form:

$$C\,\frac{dx_i(t)}{dt} = -A_0 x_i + A_1 (x_{i+1} + x_{i-1}) + A_2 (x_{i+2} + x_{i-2}), \quad \forall i = 0 \ldots 63 \tag{2}$$

Further, these equations will be translated into the logarithmic domain by making an appropriate change of variable. Thus a new set of differential equations is obtained. Although the associated circuit is nonlinear, the linear overall behavior is preserved. Even if bipolar transistors are more suitable to implement log-domain spatio-temporal filters, MOS devices can be used as well due to the remarkable robustness proved by the simulation results of the linear network. In a standard implementation, the natural physical significance of the state variables is usually that of voltage. However, in order to convert the system of differential equations (2) into the log-domain, we will further assume that the physical significance of the state variables xi(t) is that of current. Assuming that xi(t)=x(i) and vx,i(t)=vx(i) and applying an appropriate change of variable such as $x_i = I_S (e^{v_i/V_T} - 1)$, i.e. with $\alpha = 1/V_T$, equations (2) can be written as (3).

$$C I_S \alpha\, \dot{v}_x(i)\, e^{\alpha v_x(i)} = A_2 I_S e^{\alpha v_x(i-2)} + A_1 I_S e^{\alpha v_x(i-1)} - A_0 I_S e^{\alpha v_x(i)} + A_1 I_S e^{\alpha v_x(i+1)} + A_2 I_S e^{\alpha v_x(i+2)} - x_0 (2A_1 + 2A_2 - A_0) \tag{3}$$

Dividing equation (3) by $e^{\alpha v_x(i)}$, applying the notations $C_x = C I_S \alpha$; $A_1 I_S = I_{A1}$; $A_2 I_S = I_{A2}$; $A_0 I_S = I_{A0}$; $x_{offset} = x_0 (2A_1 + 2A_2 - A_0) = I_{X0}$, and considering A0=0, the new “i” state equation can be rewritten as (4).

$$C_x\, \dot{v}_x(i) = coupling - I_{X0}\, e^{-\alpha v_x(i)}$$
$$coupling = I_{A2} e^{\alpha (v_x(i-2) - v_x(i))} + I_{A1} e^{\alpha (v_x(i-1) - v_x(i))} - I_{A0} + I_{A1} e^{\alpha (v_x(i+1) - v_x(i))} + I_{A2} e^{\alpha (v_x(i+2) - v_x(i))}, \quad i = 1..63 \tag{4}$$

In the linear system both input and output are taken directly from the states, while in the log-domain filter the outputs are obtained from the states through the same exponential nonlinearity. Equation (5) gives the output of cell “i”.

$$y_i(t) = I_S (e^{\alpha v_x(i)} - 1), \quad i = 1..63 \tag{5}$$

Equation (4) represents the Kirchhoff current law for cell “i”: the current through the capacitor “Cx” equals the sum of the currents injected by the current sources nonlinearly controlled by the voltages of the neighboring cells. The term “coupling” represents the contribution of the neighboring cells to the i-th cell. Each term within “coupling” represents the nonlinear equivalent, in the log domain, of the voltage controlled current sources, while the term $I_{X0}\, e^{-\alpha v_x(i)}$ defines a nonlinear conductance.

IV.1.1 MOS transistor biased in weak inversion

As Tsividis states, sub-threshold conduction occurs for low drain current, vGS lower than VT, and Φs should be the double of ΦF. Neglecting the substrate and short channel effects, the drain current is exponentially dependent on the gate-source voltage [10], [11]:

$$I_D = \frac{W}{L} I_{D0}\, e^{\frac{v_{GS}}{nV_T}}, \quad I_{D0} = K_{n,p} \frac{W}{L} V_T^2 e^{1.8}, \quad V_T = \frac{KT}{q} \tag{6}$$

The constant n is usually higher than one and is called the slope factor of the sub-threshold conduction. This factor can be easily evaluated by measuring the slope of the logarithmic representation of the drain current versus vGS. Considering the substrate effect, the drain current has the form [11]:

$$I_D = \frac{W}{L} I_{D0}\, e^{\frac{v_{GB}}{nV_T}} \left( e^{-\frac{v_{SB}}{nV_T}} - e^{-\frac{v_{DB}}{nV_T}} \right) \tag{7}$$

Equation (7) holds under the following hypotheses: VGS < VT, VDS > VDsat≈4VT, VDB>-V0, VSB>V0. Neglecting the substrate effect, vSB=0, (7) becomes:

$$I_D = \frac{W}{L} I_{D0}\, e^{\frac{v_{GS}}{nV_T}} \left( 1 - e^{-\frac{v_{DS}}{nV_T}} \right) \tag{8}$$
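The evaluation of the slope factor n from the logarithmic plot of the drain current can be sketched as follows. This is only an illustration of the exponential law in (6): the function names, the lumped prefactor and the numerical values are hypothetical, and the thermal voltage is taken at about 300 K.

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, in volts


def drain_current(v_gs, n, i_prefactor, v_t=V_T):
    """Weak-inversion drain current following the exponential law of eq. (6).

    i_prefactor lumps (W/L) * I_D0 into one hypothetical constant, in amperes.
    """
    return i_prefactor * math.exp(v_gs / (n * v_t))


def slope_factor(v_gs1, i_d1, v_gs2, i_d2, v_t=V_T):
    """Estimate n from two (vGS, ID) points taken on the ln(ID) versus vGS line."""
    return (v_gs2 - v_gs1) / (v_t * (math.log(i_d2) - math.log(i_d1)))


# Two synthetic "measurements" generated with an assumed n = 1.4.
n_true = 1.4
i1 = drain_current(0.20, n_true, 1e-12)
i2 = drain_current(0.30, n_true, 1e-12)
n_est = slope_factor(0.20, i1, 0.30, i2)
print(f"estimated slope factor n = {n_est:.3f}")
```

With measured data, more than two points and a least-squares line fit would normally be preferred, since the law only holds well below the threshold voltage.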

IV.1.2 Circuit synthesis

As has been shown previously, (4) represents the state equation associated with cell “i” of a first order high-pass spatial filter translated into the log-domain. The structure of a pixel obtained by the nonlinear mapping of the state equations of a linear autonomous system, (4), is composed of a nonlinear conductance, four current sources exponentially controlled by the voltages of the neighboring cells, and a dc current source. The basic building blocks used for the implementation of the log-domain equations were first proposed by Frey with bipolar transistors and are presented in Fig. IV-3.

Figure IV-3. a) current source exponentially controlled by the voltages of neighboring cells (injects current in cell capacitor); a’) current source exponentially controlled by the voltages of the current cell (sinks current from cell capacitor); b) transistor implementation of a); b’) transistor implementation of a’).

The nonlinear controlled sources presented in Fig. IV-3 a, b and Fig. IV-3 a’, b’, respectively, have three inputs: a current input (IA1, IA2, IX0) and two voltage inputs (v+, v-). The scaling factor of the sources can be controlled by an external biasing current. Following the above considerations, the schematic of a logarithmic cell is presented in Fig. IV-4 and corresponds to equation (4). Basically, this circuit follows the translinear principle.

Figure IV-4. Cell schematic for high-pass/band-pass filter structure

The above structure can be used to implement a high-pass filter for vx(i-2)= vx(i+2)=0 and vx(i-1)= vx(i+1)≠0 and a band-pass filter for vx(i-2)= vx(i+2)≠0 and vx(i-1)= vx(i+1)=0. Multiplying equation (4) by minus one, the cell structure for a low-pass/stop-band spatial filter can be easily obtained. Because the A0 conductance is zero, the DC current source IA0 vanishes from both topologies. According to equation (5), the circuit which expands the state of the i-th cell to the output is shown in Fig. IV-5. This circuit is the same for both structures. The resistance R works as a current-voltage converter and can be implemented by a chain of MOS transistors biased in the triode region. The output Yi(t) matches the output of the linear cell, vic(t).

Figure IV-5. Expanding circuit of the i-th state

The biasing current of the voltage-controlled exponential current sources is 50nA; the unit NMOS transistor is 10um/1um and is divided into 10 fingers for layout reasons.

IV.1.3 The design of a reconfigurable log-domain network

In order to minimize the occupied area, a reconfigurable cell which implements HPF, LPF, BPF or SBF using a minimum number of nonlinear cells should be developed. The proposed circuit is depicted in Fig. IV-6. Each exponential current source of the 5-bit reconfigurable cell is biased with the same current, Iref of 50nA. This means that a matrix of 512x512 cells has a total current consumption of around 92mA. Furthermore, the area of each cell is about 560um2.

Figure IV-6. Structure of the reconfigurable log-domain cell

Tab. IV-1 gives the configuration bits for the four types of filters.

Table IV-1. Configuration bits for 1D log-domain filter

Filter type   Configuration bits
              Sel1   Sel2   Sel3   Sel4   Sel5
HPF           1L     0L     0L     1L     0L
BPF           1L     0L     0L     0L     0L
LPF           0L     1L     1L     0L     1L
SBF           0L     0L     1L     0L     1L

IV.1.4 The dynamics of log-domain network affected by real technological conditions

IV.1.4.1 Corners nonidealities

Even if the deviations of the output current of each of the exponential current sources E+ and E- can be quite easily adjusted by controlling the bulk potential and the bias current, this calibration mechanism cannot be implemented for large arrays. In addition, it is quite hard to extrapolate to system level the errors introduced by each nonlinear element from each cell of the log-domain network. Because the corner simulations change the performances of the NMOS and PMOS transistors uniformly, the behavior of the logarithmic implementation of the linear network is not significantly jeopardized. Therefore, the network homogeneity is preserved and the variations caused by the corner simulations affect only the processing time, which can be calibrated by controlling the sampling time of the nodal voltage map, as shown in Tab. IV-3. In order to assess the differences between the ideal characteristics and the ones obtained with transistor level log-domain filters affected by the “tm”, “wp”, “ws”, “wo” and “wz” corners, PSNR values are computed for each set of characteristics and given in Tab. IV-3. The Tfreeze values, i.e. the times at which the dynamics of the non-ideal filters has to be frozen such that the maximum magnitudes of the spatial frequency characteristics equal their ideal counterparts, are given as well.

Table IV-3. PSNR computed between linear ideal and log-domain transistor level filters

PSNR [dB] / Tfreeze [us]
Filter type   Typical      Corner (wp)   Corner (ws)   Corner (wo)   Corner (wz)
HPF           52.6/ 2.76   51.5/ 2.35    53.8/ 3.25    51.7/ 2.85    53.4/ 2.71
BPF           52.6/ 2.76   51.5/ 2.36    53.8/ 3.25    51.8/ 2.85    53.4/ 2.71
LPF           30.6/ 7.69   30.7/ 6.78    54.6/ 8.62    50.4/ 8       30.6/ 7.41
SBF           30.5/ 7.65   30.7/ 6.78    55/ 8.62      55.8/ 7.95    50.2/ 7.41
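For reference, a PSNR figure of the kind reported in the table above can be computed between an ideal and a non-ideal characteristic as sketched below. The exact definition used for the reported values is not stated here, so the sketch assumes the common form 10·log10(peak²/MSE) with the peak taken as the maximum magnitude of the ideal characteristic; the function name and the sample data are hypothetical.

```python
import math


def psnr_db(ideal, measured):
    """Peak signal-to-noise ratio, in dB, between two equally long sequences.

    Assumes PSNR = 10*log10(peak^2 / MSE), where peak is the largest
    magnitude of the ideal sequence (an assumed convention).
    """
    if len(ideal) != len(measured):
        raise ValueError("sequences must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(ideal, measured)) / len(ideal)
    if mse == 0.0:
        return float("inf")
    peak = max(abs(a) for a in ideal)
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical characteristic with a uniform 1% error relative to its peak.
ideal_curve = [0.0, 1.0, 0.0, 1.0]
nonideal_curve = [v + 0.01 for v in ideal_curve]
print(f"PSNR = {psnr_db(ideal_curve, nonideal_curve):.1f} dB")
```

Under this convention a uniform error of 1% of the peak corresponds to 40 dB, which gives a feeling for the 30 to 55 dB range of the table.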

In order to calibrate the sampling time, a clock frequency of fCLK=100MHz fulfills the requirements. According to the simulation results, the PSNR values for each filter are at least 30 dB. Therefore additional calibration circuitry is not needed in this case.

IV.1.4.2 Monte Carlo nonidealities (process variations and mismatches)

Obviously, this is the worst case for this kind of network, whether it is implemented in the linear or in the logarithmic domain. The nonhomogeneous network can no longer be analyzed by piecewise linearized methods. The only way is to count on the network robustness. It is hard to draw the boundaries of this robustness by analytical means. Some calculations can be done with the aid of Gershgorin’s theorem. Fig. IV-7 shows the magnitudes of the spatial-frequency characteristics of the four spatial filters given by Monte Carlo simulations performed with 100 steps. All filters were sampled when the maximum values of the spatial-frequency characteristics reached the value of 0.6.

Figure IV-7. A, B, C, D – Monte Carlo simulations with 100 steps. Magnitudes of the spatial frequency characteristics for HPF, LPF, BPF and SBF, respectively (amplitude versus space)

Each frequency characteristic is represented on the abscissa for the first 32 spatial modes. According to the simulation results, the dynamics of the non-ideal filters affected by process variations and mismatches is very close to the ideal one. It is worth mentioning that, using bipolar transistors instead of MOS transistors, better results are expected for the following reasons:
- the exponential characteristic is preserved for a wider range of input voltages;
- deviations of the output current caused by nonidealities are much smaller.
All spatial frequency characteristics are drawn by applying the FFT to the spatial impulse response. The amplitude of the spatial impulse used depends on the linear or log-domain implementation, and its position does not matter for ring boundary conditions. Thus the log-domain implementation has the same overall behavior as the linear network. The main advantages of this approach are lower power consumption and smaller area compared with the linear design.
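The remark that the position of the spatial impulse does not matter under ring boundary conditions can be checked numerically: a circular shift of the spatial response changes only the phase of its discrete Fourier transform, not its magnitude. The sketch below uses a plain DFT from the standard library in place of an FFT; the 64-sample response and all names are hypothetical and serve only as an illustration.

```python
import cmath


def dft_magnitude(samples):
    """Magnitude of the discrete Fourier transform of a real sequence."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, s in enumerate(samples)))
        for k in range(n)
    ]


def circular_shift(samples, shift):
    """Shift a sequence along the ring by 'shift' positions."""
    n = len(samples)
    return [samples[(i - shift) % n] for i in range(n)]


# Hypothetical spatial impulse response of a 64-cell ring network.
N = 64
response = [0.0] * N
response[0] = 1.0
response[1] = response[-1] = -0.4
response[2] = response[-2] = 0.1

mag_ref = dft_magnitude(response)
mag_shifted = dft_magnitude(circular_shift(response, 17))
max_diff = max(abs(a - b) for a, b in zip(mag_ref, mag_shifted))

# For a real response the spectrum is symmetric, so 32 modes are enough.
first_32_modes = mag_ref[:N // 2]
print(f"largest magnitude difference after shifting: {max_diff:.2e}")
```

The same function applied to a sampled nodal voltage map gives the kind of characteristic plotted in the figures, up to the scaling convention chosen for the transform.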

IV.2 The design of a 32x32 log-domain analog parallel architecture

Further, the log-domain mapping of the linear 2D analog parallel network, keeping the overall linearity, has been achieved. Fig. IV-8 shows the structure of a 32x32 network with first and second order neighborhoods. The linear differential equation written for node (i, j) is shown in (9).

Figure IV-8. 2D architecture

$$C\,\frac{dx_{i,j}(t)}{dt} = A_{lf2}\, x_{i-2,j} + A_{lf1}\, x_{i-1,j} - A_0\, x_{i,j} + A_{rg1}\, x_{i+1,j} + A_{rg2}\, x_{i+2,j} + A_{up2}\, x_{i,j+2} + A_{up1}\, x_{i,j+1} + A_{dw1}\, x_{i,j-1} + A_{dw2}\, x_{i,j-2}, \quad \forall i,j = 0..M-1 \tag{9}$$

The dispersion curve obtained by applying the decoupling technique to the linear state equations (9) is given by (10). The magnitude of the transfer function is also described by the shape of KA(m,n).

$$K_A(m,n) = -A_0 + 2A_1 \left( \cos\frac{2\pi m}{M} + \cos\frac{2\pi n}{M} \right) + 2A_2 \left( \cos\frac{4\pi m}{M} + \cos\frac{4\pi n}{M} \right) \tag{10}$$
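The dispersion curve (10) is straightforward to evaluate numerically, which helps when choosing template values. The following minimal sketch evaluates KA(m,n) over all spatial modes of a 32x32 homogeneous network; the template values A1 = 1 and A2 = -0.5 are hypothetical and are not the ones used in this work.

```python
import math


def dispersion(m, n, size, a0, a1, a2):
    """Dispersion curve K_A(m, n) of eq. (10) for a size x size ring network."""
    first = math.cos(2 * math.pi * m / size) + math.cos(2 * math.pi * n / size)
    second = math.cos(4 * math.pi * m / size) + math.cos(4 * math.pi * n / size)
    return -a0 + 2 * a1 * first + 2 * a2 * second


SIZE = 32
A0, A1, A2 = 0.0, 1.0, -0.5   # hypothetical template values

k_map = [[dispersion(m, n, SIZE, A0, A1, A2) for n in range(SIZE)]
         for m in range(SIZE)]

# Locate the spatial mode with the largest value of the dispersion curve.
k_max, m_max, n_max = max(
    (k_map[m][n], m, n) for m in range(SIZE) for n in range(SIZE)
)
print(f"K_A(0,0) = {k_map[0][0]:.2f}, K_A(16,16) = {k_map[16][16]:.2f}")
print(f"largest K_A = {k_max:.3f} at mode ({m_max}, {n_max})")
```

For a homogeneous ring network described by (9), the amplitude of mode (m,n) evolves in time at a rate set by KA(m,n)/C, so the mode with the largest KA dominates the transient.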

IV.2.1 Log-domain mapping of the state equations of a 2D autonomous system

This is a particular case of the log-domain mapping of an input-state-output system. The state equations of an autonomous system are:

$$\begin{cases} \dot{X} = AX \\ Y = CX \end{cases} \tag{11}$$

where “X” is the column vector of length $M^2$, “A” is the $M^2 \times M^2$ transition matrix and “C” is the output matrix which connects the states to the “Y” output. Let us assume that the network is homogeneous, meaning that Alf1= Arg1= Aup1= Adw1= A1; Alf2= Arg2= Aup2= Adw2= A2, and also that the ring boundary conditions are fulfilled. Thus equation (9) becomes:

$$C\,\frac{dx_{i,j}(t)}{dt} = A_2 (x_{i-2,j} + x_{i+2,j} + x_{i,j-2} + x_{i,j+2}) + A_1 (x_{i-1,j} + x_{i+1,j} + x_{i,j-1} + x_{i,j+1}) - A_0 x_{i,j}, \quad \forall i,j = 0..M-1 \tag{12}$$

In the above linear system of the discussed architecture, “xi,j(t)” has the significance of voltage. The following change of variable is considered:

$$x(i,j) = I_S e^{\alpha v_x(i,j)} - I_S \tag{13}$$

Hence, from now on “xi,j(t)” will be considered as a current. Moreover, the notations xi,j(t)=x(i,j) and vx,ij(t)=vx(i,j) will be used. Applying the above change of variable (13), the (i,j) linear differential equation becomes:

$$C I_S \alpha\, \dot{v}_x(i,j)\, e^{\alpha v_x(i,j)} = A_2 I_S \left( e^{\alpha v_x(i-2,j)} + e^{\alpha v_x(i+2,j)} + e^{\alpha v_x(i,j-2)} + e^{\alpha v_x(i,j+2)} \right) + A_1 I_S \left( e^{\alpha v_x(i-1,j)} + e^{\alpha v_x(i+1,j)} + e^{\alpha v_x(i,j-1)} + e^{\alpha v_x(i,j+1)} \right) - A_0 I_S e^{\alpha v_x(i,j)} - I_S (4A_1 + 4A_2 - A_0) \tag{14}$$

Dividing (14) by $e^{\alpha v_x(i,j)}$ and accepting the following notations:

$$C_x = C I_S \alpha; \quad x_{offset} = I_S (4A_1 + 4A_2 - A_0) = I_{X0}; \quad A_1 I_S = I_{A1}; \quad A_2 I_S = I_{A2}; \quad A_0 I_S = I_{A0} \tag{15}$$

equation (14) can be rewritten as:

$$C_x\, \dot{v}_x(i,j) = coupling - I_{X0}\, e^{-\alpha v_x(i,j)} \tag{16}$$

where

$$coupling = I_{A2} \left( e^{\alpha (v_x(i-2,j) - v_x(i,j))} + e^{\alpha (v_x(i+2,j) - v_x(i,j))} + e^{\alpha (v_x(i,j-2) - v_x(i,j))} + e^{\alpha (v_x(i,j+2) - v_x(i,j))} \right) + I_{A1} \left( e^{\alpha (v_x(i-1,j) - v_x(i,j))} + e^{\alpha (v_x(i+1,j) - v_x(i,j))} + e^{\alpha (v_x(i,j-1) - v_x(i,j))} + e^{\alpha (v_x(i,j+1) - v_x(i,j))} \right) - I_{A0} \tag{17}$$

Equation (16) represents Kirchhoff's current law for cell (i,j): the current through the capacitance Cx(i,j) is equal to the sum of the currents sourced by the exponential voltage controlled current sources. The so-called “coupling” term represents the influence of the neighboring cells on cell (i,j). Each term of “coupling” is the nonlinear equivalent, in the logarithmic domain, of a voltage controlled current source, while the term $I_{X0}\, e^{-\alpha v_x(i,j)}$ defines a nonlinear conductance.
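That the log-domain equations reproduce the linear dynamics after expansion can be verified numerically. The sketch below integrates the linear system (12) and the log-domain system (16), (17) with the same fourth-order Runge-Kutta scheme on a small 8x8 ring network and compares x with IS(e^(αvx) - 1). All numerical values (normalized C = IS = α = 1, the template values, the grid size and the step) are hypothetical choices made only so that the check runs quickly; they are not the design values of this work.

```python
import math

M = 8                          # small ring grid (the text uses 32x32)
A0, A1, A2 = 0.0, 0.5, 0.1     # hypothetical template values
C, I_S, ALPHA = 1.0, 1.0, 1.0  # normalized, hypothetical units


def neighbors(grid, i, j, d):
    """Sum of the four neighbors at distance d, with ring boundaries."""
    return (grid[(i - d) % M][j] + grid[(i + d) % M][j]
            + grid[i][(j - d) % M] + grid[i][(j + d) % M])


def linear_rhs(x):
    """dx/dt of the linear network, eq. (12)."""
    return [[(A2 * neighbors(x, i, j, 2) + A1 * neighbors(x, i, j, 1)
              - A0 * x[i][j]) / C for j in range(M)] for i in range(M)]


def log_rhs(v):
    """dv/dt of the log-domain network, eqs. (16) and (17)."""
    e = [[math.exp(ALPHA * v[i][j]) for j in range(M)] for i in range(M)]
    c_x = C * I_S * ALPHA
    i_x0 = I_S * (4 * A1 + 4 * A2 - A0)
    out = []
    for i in range(M):
        row = []
        for j in range(M):
            coupling = (A2 * I_S * neighbors(e, i, j, 2)
                        + A1 * I_S * neighbors(e, i, j, 1)) / e[i][j] - A0 * I_S
            row.append((coupling - i_x0 / e[i][j]) / c_x)
        out.append(row)
    return out


def step_along(y, k, h):
    """Return y + h*k element-wise."""
    return [[y[i][j] + h * k[i][j] for j in range(M)] for i in range(M)]


def rk4(f, y, dt, steps):
    """Classical fourth-order Runge-Kutta integration."""
    for _ in range(steps):
        k1 = f(y)
        k2 = f(step_along(y, k1, dt / 2))
        k3 = f(step_along(y, k2, dt / 2))
        k4 = f(step_along(y, k3, dt))
        y = [[y[i][j] + dt * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j]
                              + k4[i][j]) / 6 for j in range(M)]
             for i in range(M)]
    return y


# Spatial impulse as initial condition, mapped to the log domain via (13).
x0 = [[0.0] * M for _ in range(M)]
x0[4][4] = 0.2
v0 = [[math.log(1.0 + x0[i][j] / I_S) / ALPHA for j in range(M)]
      for i in range(M)]

x_lin = rk4(linear_rhs, x0, dt=0.005, steps=200)
v_log = rk4(log_rhs, v0, dt=0.005, steps=200)
x_log = [[I_S * (math.exp(ALPHA * v_log[i][j]) - 1.0) for j in range(M)]
         for i in range(M)]

max_err = max(abs(x_lin[i][j] - x_log[i][j])
              for i in range(M) for j in range(M))
total = sum(sum(row) for row in x_lin)
print(f"largest difference between the two networks: {max_err:.2e}")
```

The positive template values are chosen so that the states stay above -IS, which the change of variable (13) requires; with other templates the impulse amplitude has to be limited accordingly.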

IV.2.2 Circuit implementation

The proposed reconfigurable log-domain cell is depicted in Fig. IV-9.

Figure IV-9. 2D reconfigurable pixel structure

Each exponential current source is biased with a reference current of 50nA.
- Power consumption: the cell of the 2D reconfigurable log-domain network is biased with 700nA and 450nA for HPF/BPF and LPF/SBF, respectively. Thus the required biasing current for a 32x32 matrix in the HPF/BPF configuration is about 716.8uA. The supply voltage is 1.8V.
- Area: the cell area without layout optimization is around 1760um2, which means that the required area for a 32x32 analog processor core is about 1.8mm2.
- Processing speed: assuming that the biasing currents are settled, this parameter depends on the time constant of the intermediate node, which can be decreased as much as the parasitic capacitors allow. Obviously, if both the linear and the logarithmic network have the same nodal capacitance and template parameters, then the first one will be faster than the second one.

IV.2.3 Simulations results

In the following, we will compare the frequency characteristics for the ideal linear filters and the log-domain transistor level ones, both configured to perform all kinds of filtering operations. Further, simulations are performed with 32x32 log-domain analog parallel architecture whose cell circuitry is depicted in Fig. IV-9 and the configuration bits are given by Tab.IV-5. PSNR values are calculated as well.

Figure IV-10. Magnitudes of the spatial frequency characteristics for the log-domain filters: a) HPF; PSNR= 55dB; b) LPF; PSNR= 41.7dB; c) BPF; PSNR= 52.3dB; d) SBF; PSNR= 49dB

For the sake of simplicity, we will consider the A0 term equal to zero. The spatial frequency characteristics can be read from Figure IV-10 by keeping only the [0..15; 0..15] spatial modes. The spatial frequency characteristics presented in Figure IV-10 were obtained by seeding both the ideal and the transistor level networks with a 20mV spatial impulse initial condition. In order to assess the influence of the non-idealities introduced by the transistor level implementation, the PSNR is computed between the ideal and the circuit level filters. According to the simulation results, the obtained values are above 40dB, which means that the former characteristics match the latter ones. For the log-domain filter a 400mV dynamic range has been achieved, which means that an 8-bit A/D converter with a VLSB of about 1.5mV covers the requirements. The Tfreeze values are chosen such that the maximum magnitudes of the spatial frequency characteristics of both the ideal and the transistor level filters are equal to 2. These values are provided in Tab. IV-4. HPF/BPF have the same time constant for both implementations, considering that all the template parameters of the linear filter are equal to 50nS and the equivalent nodal capacitor equals 145fF. The effect of the offset current has been cancelled for LPF and SBF in the same way as in the 1D case, by adjusting the analog ground, GNDA, by +4mV. This is the reason why the transistor level implementation of these filters has a time constant much larger than that of the ideal filters.

Table IV-4. Calibration of the log-domain filters according to the ideal ones

Tfreeze [us]
Filter type   Implementation type
              tm      ideal
LPF           7.95    3.35
HPF           3.28    3.34
BPF           3.37    3.34
SBF           7.87    3.35

As can be concluded from Tab. IV-5, only 5 configuration bits are enough to set the proposed network in order to implement the first and second order filters.

Table IV-5. Configuration bits for 2D log-domain filter

Log-domain transistor level spatio-temporal filters
Filter type   Spatial frequency characteristics   Configuration bits
                                                  Vc1   Vc2   Vc3   Vc4   Vc5
HPF           Fig. IV-30a/a’                      0     1     1     1     1
BPF           Fig. IV-30c/c’                      0     0     1     1     1
LPF           Fig. IV-30b/b’                      1     1     0     1     0
SBF           Fig. IV-30d/d’                      1     1     0     0     0

The simulation results show that the log-domain and the linear implementations have the same behavior. This first synthesis used MOS transistors biased in weak inversion. Promising results regarding the logarithmic implementation of the massively interconnected analog processor array have been achieved without additional calibration circuitry.

IV.3 References
[1] Johan Huijsing, Rudy van de Plassche, Willy Sansen, “Analog circuit design – Volt electronics; Mixed-mode systems; Low-noise and RF power amplifiers for telecommunication”, Kluwer Academic Publishers, Boston, pp. 33-35, 1999.
[2] A. G. Katsiamis, H. M. D. Ip and E. M. Drakakis, “A Practical CMOS Companding Sinh Lossy Integrator”, ISCAS, pp. 3303-3306, 2007.
[3] K. N. Glaros, A. G. Katsiamis and E. M. Drakakis, “Harmonic vs. Geometric Mean Sinh Integrators in Weak Inversion CMOS”, ISCAS, pp. 2905-2908, 2008.
[4] Douglas Frey, “Exponential state-space filters: A generic current mode design strategy”, IEEE Transactions on Circuits and Systems – I: Fundamental Theory and Applications, vol. 43, no. 1, pp. 34-42, January 1996.
[5] Montree Kumngern, Kobchai Dejhan, “A new translinear-based dual-output square-rooting circuit”, Hindawi Publishing Corporation, Active and Passive Electronic Components, Volume 2008, http://downloads.hindawi.com/journals/apec/2008/623970.pdf
[6] Jan Mulder, Albert C. Van der Woerd, Wouter A. Serdijn, and Arthur H. M. Van Roermund, “General current-mode analysis method for translinear filters”, IEEE Transactions on Circuits and Systems – I: Fundamental Theory and Applications, vol. 44, no. 3, pp. 193-197, March 1997.
[7] Yannis Tsividis, “Externally linear, time-invariant systems and their applications to companding signal processors”, IEEE Transactions on Circuits and Systems – II: Analog and Digital Signal Processing, vol. 44, no. 2, pp. 65-85, February 1997.
[8] Andreas G. Andreou, Kwabena A. Boahen, Phillippe O. Pouliquen, Aleksandra Pavasovic, Robert E. Jenkins, Kim Strohbehn, “Current-mode subthreshold MOS circuits for analog VLSI neural systems”, IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 205-213, March 1991.
[9] Phuoc T. Tran, Bogdan M. Wilamowski, “VLSI implementation of cross-coupled MOS resistor circuits”, IECON’01, vol. 3, pp. 1886-1891, 2001.
[10] J. Baker, H. Li, D. Boyce, “CMOS circuit design layout and simulation”, Chapter 21.4.2, pp. 483.
[11] Kenneth R. Laker, Willy M.C. Sansen, “Design of analog integrated circuits and system”, 1-6, pp. 30.

Applications

V IMAGE PROCESSING APPLICATIONS OF A CLASS OF ANALOG PARALLEL ARCHITECTURES

V.1 Applications of CNN’s

The dynamics of the proposed parallel network proved to be useful for image processing applications. The difference between most linear and nonlinear CNN’s and the analog parallel architecture presented in Chapter 2 is that, while the final states are important in the first case, in the latter the transient dynamics is used, before the final states have been reached. Among the applications developed with the proposed analog processor are: edge detection and smoothing, high frequency noise cancelling, image segmentation, ECG signal classification and texture classification.

V.1.1 Vision chips

Image processing based on vision chips can be spatial or spatio-temporal.

A. Most spatial vision chips, generically called silicon retinas, implement only one spatial processing operation, such as smoothing or global object orientation. They are based on models of the vertebrate retina for motion detection [1] and contrast enhancement. The models for the latter are divided into biological [2] and computational [3] models. Local and global intensity adaptation and contrast enhancement are among the general features that receive attention. Among the computational models proposed in the literature are the Laplacian of Gaussian (LoG), the difference of Gaussians (DoG), the direct derivative of biharmonic equations, and linear and multiplicative lateral inhibition. It is hard to say which of these models is closest to the biological retina. Fovea sensors are another type of vision chip, in which the physical dimensions and the layout of the sensors form a log-polar or linear-polar mapping of the input image. This kind of mapping is a scale- and rotation-invariant transformation. Furthermore, it has a higher resolution in the central part, which decreases logarithmically toward the peripheral layers.

B. Spatial vision chips can perform only stationary operations, whereas spatio-temporal ones sense the time dependency of image features. Fortunately, the proposed linear analog processor proved to be quite robust under real technological conditions. The algorithms developed for edge detection [4] are based on computational models, such as models based on the light intensity gradient, on features, and on the correlation of two frames after a previous edge detection operation has been performed.

V.2 Edge detection with the architecture analyzed in this work

V.2.1 Edge detection performed with 1D analog parallel architecture

In order to illustrate the use of the proposed analog processor for edge detection, the network depicted in Fig. II-8 is considered. The circuit is configured as a high-pass filter (HPF). A rectangular pulse of about 200mV is loaded into the spatial filter, as shown in Fig. V-1. The analog ground is 1.65V and the dynamics of the HPF is frozen when the center of the input pulse reaches 1.7V. The effect of the resistance R0=1/A0, which is connected together with the nodal capacitor, is highlighted as well. When R0 decreases, the dispersion curve drops, thus increasing the quality factor.

Figure V-1. A) Input rectangular pulse; A’) Simulation result for high-pass filtering of the input signal. (Both panels plot amplitude versus space; curves labeled a0=9uS and a0=1fS.)

Since only a slight amplification of the most unstable spatial mode is needed, without affecting the shape of the input pulse, the quality factor of the HPF does not need to be high. Thus, the 1/A0 resistance decreases the processing time of the edge detection operation.

V.2.2 Edge detection performed with 2D analog parallel architecture

In this case, the 32x32 network from Fig. IV-8 has been used. The network is configured as a 2D HPF, and the peak value of the input image is about 20mV for the log-domain implementation and 200mV for the linear one. For the edge detection operations, a chessboard-type input image has been used, as shown in Fig. V-3B. The nodal voltages of the network are frozen after 860ns, and the resulting voltage maps for both the log-domain and the linear filters are depicted in Fig. V-3A and A’.

Figure V-3. A, A’) Nodal voltage maps after 860ns for both the linear ideal and the log-domain transistor-level networks (3D representation); B) Chessboard-type input image; B’, B’’) Edge detection after a given threshold has been applied to A and A’ (2D representation).

In order to test the efficiency of this circuit on gray-level images, the following representative pictures were picked from the literature; the simulation results are given in Fig. V-4A, B.

Figure V-4. A) Input images; A’, A’’) Simulation results obtained with the log-domain high-pass filter (HPF) after 560 ns and after a certain threshold was applied, respectively. B) Input image; B’) Simulation result obtained with the HPF after 1.42 us.

Assuming that the linear network is loaded from an external digital memory, if the peak value of the input images is 200mV then an 8-bit D/A converter with VLSB=781uV is required. In this case, VLSB corresponds to one gray level.
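The converter resolution quoted above follows from simple arithmetic. The lines below are a minimal Python sketch, assuming an ideal converter whose full-scale range equals the 200mV peak input and which has 2^8 levels; the function name `lsb_voltage` is illustrative and does not come from the thesis.

```python
def lsb_voltage(full_scale_v: float, bits: int) -> float:
    """Voltage step of an ideal D/A converter with 2**bits levels (assumed model)."""
    return full_scale_v / (2 ** bits)


# 200 mV peak input and an 8-bit converter: one step is roughly 781 uV,
# so each of the 256 gray levels maps to one LSB.
v_lsb = lsb_voltage(0.2, 8)
print(f"VLSB = {v_lsb * 1e6:.2f} uV")
```

With a different peak value (for instance the 20mV used for the log-domain implementation) the same function gives the correspondingly smaller step.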

V.3 Smoothing

V.3.1 Smoothing performed with 1D analog parallel architecture

This kind of operation is often encountered in image processing and is performed by low-pass filtering. For this application, the network sketched in Fig. II-8 is used. The DC component is also amplified; therefore, the nodal voltages have to be sampled before any nonlinearity is reached. Fig. V-5A’ highlights the influence of the A0 parameter on the “strength” of the filtering operation.

Figure V-5. A) Input rectangular pulse; A’) Simulation results of low-pass filtering of the A pulse. (Both panels plot amplitude versus space; curves labeled A0=9uS and A0=1fS.)

The filter dynamics is frozen when the maximum value of the nodal voltage vector reaches 2.5V (this value is set by the input dynamic range constraints of the row buffer).

V.3.2 Smoothing performed with 2D analog parallel architecture

The 32x32 network from Fig. IV-8 has been used. For the black-and-white input image, the same chessboard-type image has been chosen. Fig. V-6A, A’ show a 3D representation of the nodal voltage map of the low-pass filter after 860ns.

Figure V-6. A, A’) Nodal voltage maps after 860ns for the linear ideal and the log-domain implementation respectively (3D representation); B) Chessboard-type input image; B’, B’’) Simulation results for the smoothing operation on the input image after an appropriate threshold is applied (2D representation).

Fig. V-7A’, B’ and A’’, B’’ show the simulation results for gray-scale images obtained with the linear and the log-domain filter respectively.

Figure V-7. A, B) Input images; A’, B’) Linear architecture: low-pass filtering after 1.56us; A’’, B’’) Log-domain filter: low-pass filtering after 1.56us.

V.4 Image segmentation

Image segmentation aims at object recognition, estimation of the border positions of a moving object, and image compression [4]. It is known that linear resistive grids can be used only for smoothing operations [5]. However, if so-called “resistive fuses” are used instead of linear resistors, the basic network is fragmented into zones whose spatial contrast does not exceed a given threshold. Our analog architecture is able to perform this kind of operation without fragmentation techniques. This behavior is obtained by programming the network in a low-pass configuration. Because the initial differences between similar contrast levels are also amplified by the low-pass filter, this partly compensates for the uneven filtering of the edges, so that fragmentation is not strictly necessary in order to obtain a segmentation effect.

The cell structure of the proposed fragmented network is depicted in Fig. V-8. The „Dif_comp” block decides whether a given (i,j) cell stays connected to a neighboring cell by computing the difference between the voltage of the first cell, V(i,j), and that of the neighbor. This difference is compared with a given threshold, VREF. If the result of the comparison is positive, the corresponding interconnection is cut off; otherwise it is kept. Thus, the uniform regions of the image can be decoupled. This mechanism is borrowed from the operating principle of the nonlinear resistive grid [6], [7]. D flip-flops store a given network interconnectivity map.


Figure V-8. Structure of a cell from the segmented network
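The compare-and-cut decision described for the „Dif_comp” block can be illustrated in software. The following is a minimal 1D Python sketch under stated assumptions: an open chain of cells rather than the 2D cell of Fig. V-8, the absolute voltage difference as the compared quantity, and hypothetical voltages and threshold. The names `link_map` and `segment_labels` are illustrative only.

```python
import numpy as np


def link_map(v, v_ref):
    """Boolean array; entry i is True if cells i and i+1 stay connected.

    A link is cut when |v[i] - v[i+1]| exceeds v_ref, which mirrors the
    compare-and-cut decision described in the text (assumed 1D form).
    """
    v = np.asarray(v, dtype=float)
    return np.abs(np.diff(v)) <= v_ref


def segment_labels(keep):
    """Label each cell with the index of the sub-network it ends up in."""
    cuts = ~np.asarray(keep, dtype=bool)
    return np.concatenate(([0], np.cumsum(cuts)))


# Hypothetical nodal voltages (V) with one sharp step, and a hypothetical threshold.
v = [1.65, 1.66, 1.65, 1.90, 1.91, 1.90]
keep = link_map(v, v_ref=0.10)
labels = segment_labels(keep)
print(keep)    # only the link across the step is cut
print(labels)  # cells on either side of the step fall into different sub-networks
```

The boolean array plays the role of the interconnectivity map that the D flip-flops store; recomputing it during filtering would correspond to updating the map instead of fixing it at the start.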

It is hard to assess the behavior of the segmented network relative to its compact counterpart as far as the temporal evolution of the spectral components is concerned. Yet, as can be observed from Figure 7.3a, b, c, the fragmented network has different dynamics from the compact one (Figure 7.1a, 2a, 2c). In order to capture the differences between the two implementations before the filter reaches the nonlinear regime, it has to be set up with a large selectivity (close to the instability limit) so as to slow down the unstable behavior. Another remark regarding these two types of implementation concerns the processing speed per frame: since a larger network has a higher inertia of instability than a smaller one, the fragmented filter yields the same results as the compact one, but in a shorter time. In any case, each sub-network can be analyzed by the known means of the compact architecture. All sub-network features are preserved only if all the coefficients of each active pixel remain unchanged. The interconnectivity map between network cells can be set at the beginning and kept during the filtering process, updated continuously, or updated only at given moments.

The advantage of this implementation becomes significant when different regions of an image have to be filtered in different ways. This is possible only because each sub-network exhibits dynamics independent of the others. The selective decoupling technique is also useful for nonlinear processing: by disconnecting the saturated region from the rest of the network, the nonlinear part of the filter does not affect the linear one.

The pixel structure is slightly changed compared with the basic scheme from Fig. II-8. In addition, it is worth mentioning the presence of a circuit that calculates the absolute value of the difference between the voltages of two consecutive pixels. This value is compared with an appropriate threshold and stored in order to control the gain of the voltage-controlled current sources. In Fig. V-9, the results obtained using the fragmented low-pass filter are compared with those of the compact one.

Figure V-9. Comparison between compact LPF and fragmented LPF. (Panels plot amplitude versus space: input image nr. 1 with 100mV ripple; low-pass filtering after 60us; low-pass filtering after 336us with Vthreshold=1.65V.)

In the above figure, the 1D filter was loaded with initial conditions having a maximum amplitude of around 1.65V, considering a 3.3V supply voltage. At least for these simple 1D examples, both networks exhibit similar behavior. Next, several image segmentation results obtained using a 2D network, implemented at transistor level in the linear or log domain, are presented. For this purpose, a comparison is made between the fragmented filter, the log-domain filter frozen after it reaches saturation, and the nonlinear resistive grid implemented by the pulse-modulation technique [8].

Figure V-10. Image segmentation using different techniques compared with the results obtained with the nonlinear network implemented with pulse-modulation techniques [8]. (Panels: input image nr. 1; 1A, 1B, 1C, 1D; 2A, 2B, 2C, 2D; 3A, 3B, 3C; 4A, 4B, 4C, 4D.)
• 1A, B, C, D: low-pass filtering after 71, 96, 104 and 113us respectively
• 2A, B: low-pass filtering with VREF=1.65V after 160us, respectively after 306us (2C, 2D)
• 3A: low-pass filtering using the segmented filter after 300us, respectively after 240us (3B); 3C: low-pass filtering using the segmented filter, with the reloading of the network interconnectivity configuration after 140us
• 4A, B, C, D: segmented versions obtained with a nonlinear resistive network implemented using the pulse-modulation technique

Fig. V-11 and V-12 show several relevant snapshots taken from the log-domain filter (Fig. IV-8) after reaching saturation, at different times.

Figure V-11. Image segmentation using the 2D logarithmic filter (Fig. IV-27) after the inherent nodes reach saturation. (Panels: input image nr. 2; 1A, 1B, 1C, 1D; input image nr. 3; 2A, 2B, 2C, 2D.)
• 1A, B, C, D: snapshots of the log-domain filter frozen after 78, 95, 103 and 110us respectively
• 2A, B, C, D: snapshots of the log-domain filter frozen after 71, 98, 107 and 126us respectively

It appears that the nonlinear log-domain filter can be used for image segmentation. Fig. V-12 confirms the usefulness of linear/nonlinear low-pass filtering for image segmentation, as seen from the comparison of the simulations performed with the compact linear filter, the nonlinear log-domain filter and the nonlinear resistive grid implemented using another circuit solution [7].

Figure V-12. Image segmentation using the linear and the log-domain filter after reaching saturation, and comparison of the simulation results with those based on the nonlinear resistive grid reported in the literature. (Panels: input image nr. 4; 1A, 1B, 1C; noisy input image; 2A, 2B, 2C.)
• 1A: segmentation obtained with the nonlinear resistive grid [7]
• 1B, 1C: low-pass filtering with the linear filter after 192 and 352us respectively
• 2A, 2B, 2C: segmentation using the log-domain filter frozen after 98, 108 and 116us respectively

V.5 ECG signals classification

V.5.1 ECG signals

An ECG recording is a measurement of the heart activity taken by means of electrodes placed at specific locations on the thorax. Even though the verdict regarding ECG classification should be given by the cardiologist, automated classification could be of great interest in patient monitoring. Many algorithms for the detection and classification of ECG heartbeat patterns are presented in the literature and have been implemented. They include approaches based on Hidden Markov Models and artificial neural networks [1-3], Support Vector Machines and mixture-of-experts methods, and signal processing techniques such as the wavelet transform [4], frequency analysis, Principal Component Analysis, filter banks [5], [6], etc.

A widely used method in signal classification is the extraction of relevant features using filter banks. Each feature is the energy of a filtered version of the signal. The quality of the classifier depends on the filter bank realization, and the main speed bottleneck of the processing chain is given by the filtering operations. Therefore, designing a filter bank for signal classification on an architecture able to perform successive linear filtering operations is an interesting approach. When the classification is based on segmentation of the temporal ECG pattern, this approach should consider the computing time and power consumption as well as the heart rate variability.

The main idea here is to extract the above features by transforming the ECG time signal into a spatial signal, which is processed using a bank of 1D spatial filters. The mean energies of the filter outputs form the feature vector used to classify the signals. The mean energy calculation can be performed either in the digital domain, which requires nxM A/D conversions, or in the analog domain, where only 1xn A/D conversions are needed, which is more convenient; here M is the number of cells and n is the number of filters in a filter bank.

The 1D analog parallel architecture that will be used is shown in Fig. V-13, where V1...VN are the nodal voltages of the cells. The input of each voltage-controlled current source is connected to an analog multiplexer in order to provide programmable interconnections between cells. Thus, the network can be configured as higher-order-neighborhood spatio-temporal band-pass filters with different central frequencies and selectivities, or as various comb filters.

Figure V-13. 1D programmable pixel schematic. (MUX = analog multiplexer, with select inputs Sel1 and Sel2; homogeneous network: Alf1=Arg1=Ak1, Alf2=Arg2=Ak2; cell i, with nodal voltage Vi, resistance R and capacitance Ci, is driven by the sources Alf2Vi-2, Alf1Vi-1, Arg1Vi+1 and Arg2Vi+2.)

V.5.2 1D architecture with high order neighborhoods

The network used for this kind of application is a generalization of the one depicted in Fig. II-8. For ECG signal classification, a network of M=200 cells has been used. The linear differential equations for lateral interconnections of any order can be written as in (1), where 1≤ i, k1, k2 ≤M and k1, k2≠i:

$$C \frac{dx_i}{dt} = -A_0 x_i + A_{k_1}\left(x_{i-k_1} + x_{i+k_1}\right) + A_{k_2}\left(x_{i-k_2} + x_{i+k_2}\right) \qquad (1)$$

where xi=xi(t) and Ci=C ∀ i=1,..,M. Equation (1) is written for a homogeneous network with ring boundary conditions. According to the theory from Chapter 2, the dispersion curve for two layers of high-order interconnections has the following form:

$$K_A(m) = -A_0 + 2A_{k_1}\cos\frac{2\pi k_1 m}{M} + 2A_{k_2}\cos\frac{2\pi k_2 m}{M} \qquad (2)$$

V.5.3 Analysis and synthesis of high-order spatio-temporal filters

The main issue in the design of the filter bank is finding appropriate template coefficients for higher-order neighborhoods such that the filter bank covers the whole spectrum of interest by means of frequency bands and sub-bands. For three layers of interconnections, each cell has six voltage-controlled current sources, and the dispersion curve for this kind of homogeneous network has the following form:

$$K_A(m) = -A_0 + 2A_{k_1}\cos\frac{2\pi k_1 m}{M} + 2A_{k_2}\cos\frac{2\pi k_2 m}{M} + 2A_{k_3}\cos\frac{2\pi k_3 m}{M} \qquad (3)$$

For a homogeneous network: Alf1=Arg1=A1; Alf2=Arg2=A2; Alf3=Arg3=A3; A0=1/R0. The coefficients k1, k2 and k3, with k1≠k2≠k3, are positive integers that depend on the filter characteristic to be designed. In order to classify a database of ECG signals, three filter banks have been used: FB1, FB2 and FB3.

FB1 is made of eight linear filters: a fourth-order-neighborhood high-pass filter (HPF) (for improved filter selectivity), band-pass filters with the central spatial frequency on the ninetieth (BPF90), eightieth (BPF80), sixtieth (BPF60), fortieth (BPF40), twentieth (BPF20) and tenth (BPF10) spatial mode, and a fifth-order low-pass filter (LPF). The dispersion curve of a high-order band-pass filter is obtained quite intuitively by adding two or three cosine waves, one of low frequency and the others of higher frequency, so that the global maximum falls on the mode of the central spatial frequency of the desired filter. The synthesis of two band-pass filters with the central frequency on the sixtieth (HPF60) and fortieth (HPF40) spatial mode is shown in Fig. V-14.

Figure V-14. KA(m) synthesis for HPF60 respectively HPF40

It can be observed that HPF60 and HPF40 can be obtained from each other by merely changing the signs of A1 and A2. Using graphical methods, the parameters of the FB1 filters have been determined; they are presented in Table V-1.

Table V-1. FB1 coefficients

Filter type   A0 (μS)   A1 (μS)   A2 (μS)   Connection order (k1)   Connection order (k2)
HPF           20        -10       5         1                       4
BPF90         (*)       -20       -5        1                       10
BPF80         (*)       -12       -5        1                       4
BPF60         (*)       -12       5         2                       3
BPF40         (*)       -12       -5        2                       3
BPF20         (*)       12        -5        1                       4
BPF10         (*)       20        -5        1                       10
LPF           (*)       10        5         1                       5

(*) For the rows BPF90 to LPF, the A0 column lists only the values 40, 20, 40 and 20 μS; their row-by-row assignment is not recoverable here.
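Equation (2) can be evaluated numerically to see which spatial modes a given set of coefficients makes unstable. The following is a minimal Python sketch, assuming the HPF row as read from Table V-1 (A0=20, A1=-10 with k1=1, A2=5 with k2=4, all in μS) and M=200; the helper name `dispersion_1d` is illustrative and not part of the thesis.

```python
import numpy as np


def dispersion_1d(M, A0, layers):
    """K_A(m) for m = 0..M-1; layers is a list of (A_k, k) pairs, as in eq. (2)/(3)."""
    m = np.arange(M)
    K = np.full(M, -float(A0))
    for A_k, k in layers:
        K += 2.0 * A_k * np.cos(2.0 * np.pi * k * m / M)
    return K


# HPF row as read from Table V-1 (values in uS): A0=20, A1=-10 (k1=1), A2=5 (k2=4)
M = 200
K = dispersion_1d(M, A0=20.0, layers=[(-10.0, 1), (5.0, 4)])

half = K[: M // 2 + 1]                 # modes 0..M/2; the rest mirror them
unstable = np.flatnonzero(half > 0)    # modes with K_A > 0 grow in time
print("peak mode:", int(np.argmax(half)))
print("unstable modes:", unstable.min(), "to", unstable.max())
```

For these coefficients the maximum of the curve sits at the highest spatial mode, m=M/2, which is the expected signature of a high-pass characteristic; adding a third `(A_k, k)` pair evaluates equation (3) in the same way.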

As an example, in order to synthesize a band-pass filter having the central frequency on M0, k1 and k2 have to be found from the following constraints:

$$\left.\frac{dK_A(m)}{dm}\right|_{m=M_0} = -\frac{4\pi}{M}\left(A_1 k_1 \sin\frac{2\pi k_1 M_0}{M} + A_2 k_2 \sin\frac{2\pi k_2 M_0}{M}\right) = 0, \quad K_A(M_0) > 0 \qquad (4)$$

$$\forall M_0', \quad \left.\frac{dK_A(m)}{dm}\right|_{m=M_0'} = 0 \Rightarrow K_A(M_0') < 0 \qquad (5)$$

The first condition means that M0 must be the only global maximum, with a positive real part, while the second one guarantees that all the other local maxima have a negative real part, i.e. are stable. Fig. V-15 shows the spatial frequency characteristics of FB1 from 0 to M/2-1.

Figure V-15. FB1 – modules of spatial frequency characteristics
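Conditions (4) and (5) can be checked numerically on a candidate dispersion curve. The sketch below is a discrete approximation in Python: it looks for strict local maxima among the integer modes 0..M/2 rather than solving the continuous derivative equations, and it uses hypothetical coefficients chosen only to exercise the check. The function names are illustrative.

```python
import numpy as np


def dispersion_1d(M, A0, layers):
    """K_A(m) for m = 0..M-1; layers is a list of (A_k, k) pairs."""
    m = np.arange(M)
    K = np.full(M, -float(A0))
    for A_k, k in layers:
        K += 2.0 * A_k * np.cos(2.0 * np.pi * k * m / M)
    return K


def local_maxima(K):
    """Indices of strict local maxima of K on a ring, restricted to 0..M/2."""
    K = np.asarray(K, dtype=float)
    is_peak = (K > np.roll(K, 1)) & (K > np.roll(K, -1))
    return [int(m) for m in np.flatnonzero(is_peak) if m <= len(K) // 2]


def is_single_band(K, m0):
    """Discrete check: m0 is an unstable peak and every other peak is stable."""
    peaks = local_maxima(K)
    if m0 not in peaks or K[m0] <= 0:
        return False
    return all(K[m] < 0 for m in peaks if m != m0)


M = 200
# Hypothetical coefficients (uS), chosen only to exercise the check.
one_peak = dispersion_1d(M, A0=10.0, layers=[(-10.0, 2)])    # single peak at m = 50
two_peaks = dispersion_1d(M, A0=10.0, layers=[(-10.0, 4)])   # peaks at m = 25 and 75

print(is_single_band(one_peak, 50))    # the only peak is unstable
print(is_single_band(two_peaks, 25))   # a second unstable peak violates (5)
```

A single cosine layer of order k places k equally high peaks over the full ring, which is why the second candidate fails; combining layers of different orders is what lets one peak dominate.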

In order to fill the gaps in the spatial frequency domain, a second filter bank, FB2, has been used. It has three layers of interconnections and is built of band-pass filters having the central frequencies on the third, sixth, twelfth, twenty-fifth and fiftieth spatial mode respectively. Tab. V-2 shows the configuration parameters.

Table V-2. FB2 coefficients

Filter type   A0 (μS)   A1 (μS)   A2 (μS)   A3 (μS)   k1   k2   k3
BPF3          (*)       10        10        -5        1    5    40
BPF6          (*)       20        5         -5        1    5    13
BPF12         (*)       10        5         -5        1    2    7
BPF25         (*)       10        5         -10       1    2    3
BPF50         (*)       0         -10       0         *    2    *

(*) The A0 column lists only the values 40, 20 and 0 μS; their row-by-row assignment is not recoverable here. k1, k2 and k3 are the connection orders.

The coefficients k1, k2 and k3 denote the neighborhood orders (e.g. for an HPF with k1=1 and k2=4, cell „i” is connected on the left to cells i-1 and i-4 and on the right to cells i+1 and i+4). As already mentioned, the principle of ECG classification consists in converting the temporal signal into a spatial signal, with the remarkable advantage that, for ring boundary conditions, the position of the QRS complex does not matter. Since a low-pass filter (LPF) is used as well, the mean value should be subtracted from each waveform. The spatial frequency characteristics of the second filter bank are shown in Fig. V-16.


Figure V-16. FB2 – modules of the spatial-frequency characteristics
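The statement that the position of the QRS complex does not matter under ring boundary conditions can be illustrated numerically: a circular shift changes only the phase of each spatial mode, not its energy. The following is a minimal Python sketch using a synthetic bump as a stand-in waveform (not real ECG data); `mode_energies` is an illustrative name.

```python
import numpy as np


def mode_energies(x):
    """Energy carried by each spatial mode of a ring-connected 1D signal."""
    return np.abs(np.fft.fft(x)) ** 2


M = 200
cells = np.arange(M)
# Synthetic stand-in for a heartbeat (not real ECG data): a narrow bump at cell 60.
beat = np.exp(-0.5 * ((cells - 60) / 4.0) ** 2)
beat -= beat.mean()                      # remove the mean, as the text prescribes

shifted = np.roll(beat, 75)              # same waveform, different position on the ring

same = np.allclose(mode_energies(beat), mode_energies(shifted))
print("mode energies unchanged by the shift:", same)
```

Because a homogeneous ring network scales each mode independently, any feature built from the energy of its output inherits this shift invariance.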

Besides, in order to increase the classification rate of the ECG signals, a third filter bank, FB3, is used. FB3 has six comb filters, each using one layer of interconnections, on the fifteenth (comb15), twentieth (comb20), twenty-fifth (comb25), thirtieth (comb30), thirty-fifth (comb35) and fortieth (comb40) spatial modes respectively. According to the simulation results, FB3 significantly increases the classification rate, even though the spatial frequency characteristics of its filters are not orthogonal.

V.5.4 The design of the reconfigurable high-order spatio-temporal filter

V.5.4.1 1D reconfigurable pixel architecture

According to the synthesis procedure sketched above, a programmable architecture results which can be configured in 19 different modes. Fig. V-17 shows the schematic of one pixel.

Figure V-17. 1D pixel architecture

Moreover, the negative values of the transconductances are easily obtained by reversing the inputs, using “Input inverter” blocks.

V.5.4.2 The effect of network nonhomogeneities on the filter banks

The influence of the non-idealities of the network on the programmable filter has been studied using Monte Carlo and corner deviations of A0 to A3. The results are presented in Tab. V-3.

Table V-3. The effect of non-idealities on the network parameters

Parameter, typical value (uS)   Corner 1   Corner 2   Corner 3   Corner 4   Monte Carlo range (uS), standard deviation (%)
A0=20                           27         15         *          *          [16.7; 24.3], 45%
A0=40                           54         30         *          *          [33.4; 49], 50%
Gm=5                            6          4.4        5.6        4.7        [4.1; 5.5], 40%
Gm=10                           12         8.6        11.4       9.3        [7.5; 9.9], 50%
Gm=12                           14         10.5       13.3       11.2       [9.8; 13], 42%
Gm=20                           24         17.7       22.6       19         [16.7; 22.1], 35%

Extrapolating the results from Tab. V-3, the behavior of the entire network under real conditions can be evaluated. In order to assess these deviations properly, the MSE (Mean Squared Error) and PSNR (Peak Signal-to-Noise Ratio) are computed for each spatial frequency characteristic. The MSE is calculated between the ideal and the real frequency characteristic of each filter in the banks. According to the corner and Monte Carlo simulation results for FB1-3, the PSNR values are up to 32 dB; thus, the dynamics of the linear network is quite robust under real technological conditions, and no additional calibration mechanism is needed.

V.5.5 Classification rate

Using the dynamics of the programmable analog parallel architecture described above, the entire database was sequentially loaded into the capacitive nodes of the network. After a prescribed time, the dynamics of the (unstable) filters is frozen and the filtered version of each input signal from the database is converted into a number by mean energy calculation. Finally, for each 200-sample ECG signal a 1x19 feature vector is obtained. Filtering each signal from the database by means of FB1-3 takes at most 19us. The classification has been tested on the MIT-BIH database, divided into 8 classes: the first class holds signals from healthy people, while the others contain signals affected by different pathologies. For each class, 700 ECGs were randomly picked. Using these filter banks, classification rates of 87.8% and 92% have been obtained with and without normalizing the input signals, respectively. The classification rates for the latter case are given in Tab. V-4.

Table V-4. Classification rates (%)

Filter banks   Typical   Corner 1   Corner 2   Corner 3   Corner 4   MC
FB1            77.4      76.6       83.2       77.6       77.7       77.6
FB2            80.1      80.3       80.5       79.9       80.1       80.8
FB3            74.7      83.2       82.3       82.6       82.6       82.7
FB1-3          92        91.9       91.5       91.8       91.8       91.5
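The feature-extraction chain described in this section (load the signal, let the linear network evolve, freeze it, take the mean energy per filter) can be sketched in software. The Python lines below are a minimal model under stated assumptions: the network stays in its linear regime, so each spatial mode evolves as exp(K_A(m) t / C); the capacitance, freezing time, coefficients and test waveform are hypothetical; and the two-filter bank only stands in for the nineteen configurations of FB1-3.

```python
import numpy as np


def dispersion_1d(M, A0, layers):
    """K_A(m) for m = 0..M-1; layers is a list of (A_k, k) pairs."""
    m = np.arange(M)
    K = np.full(M, -float(A0))
    for A_k, k in layers:
        K += 2.0 * A_k * np.cos(2.0 * np.pi * k * m / M)
    return K


def frozen_output(x, K, C, t_freeze):
    """State of the linear ring network at t_freeze, computed mode by mode."""
    X0 = np.fft.fft(x)
    return np.real(np.fft.ifft(X0 * np.exp(K * t_freeze / C)))


def feature_vector(x, bank, C, t_freeze):
    """One mean-energy feature per filter in the bank (mean removed first)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.array([np.mean(frozen_output(x, K, C, t_freeze) ** 2) for K in bank])


M = 200
C = 1e-12          # hypothetical nodal capacitance (F)
T_FREEZE = 1e-6    # hypothetical freezing time (s)
# Two illustrative filters with hypothetical coefficients (S), not the thesis banks.
bank = [
    dispersion_1d(M, 10e-6, [(-10e-6, 2)]),
    dispersion_1d(M, 20e-6, [(-10e-6, 1), (5e-6, 4)]),
]

cells = np.arange(M)
signal = np.exp(-0.5 * ((cells - 60) / 4.0) ** 2)   # synthetic waveform, not ECG data
features = feature_vector(signal, bank, C, T_FREEZE)
print(features.shape)
```

The sketch ignores saturation, mismatch and the analog energy computation, so it says nothing about the reported classification rates; it only shows how one number per filter is obtained from each loaded signal.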

Each class contains signals from specific heart pathologies, but also some abnormal waveforms. If these irregular waveforms are removed from the database, the classification rate can be increased up to 96%.

V.5.5.1 Network dimension programmability

The main problem when filtering ECG signals is the variable number of samples of different input signals. One of the advantages of the programmable spatial filter is that the frequency characteristics depend on the interconnection orders rather than on the number of cells. Thus, the dimension of the network can be set up according to the number of samples of each input signal, without affecting the performance of the filter. Consequently, the network can be designed with a sufficiently high number of cells, e.g. M=400; if a signal with M=230 samples is loaded, 170 cells have to be inhibited by keeping them in a reset state (i.e. all their inputs are grounded), while the boundary cells are ring-connected.

The ECG is converted into a spatial signal which is further processed using nineteen different programming modes. According to the simulation results, the network behaves robustly under corners, process variations and mismatches; therefore, an additional calibration circuit is not required. A 92% classification rate was achieved at a maximum current consumption of about 120uA per cell.

V.6 Textures classification

V.6.1 Images classification

The classification algorithm uses the outputs of a filter bank that covers the whole spatial frequency spectrum [11]. Each input image is successively passed through the filter bank, and the mean output energies are used as features for classification, as shown in Fig. V-18. Implementing this kind of algorithm on DSPs requires many multiplication operations, which reduce speed and increase power consumption. As an alternative, the filtered images can be extracted using a reconfigurable analog parallel network that can run in real-time applications. The mean energy of each filtered version from the “n” filters can be calculated either in the digital or in the analog domain. The latter solution is better, since it requires only “n” analog-to-digital conversions for an MxN network instead of nxMxN, where n is the number of filters. In order to obtain optimum features, the required band-pass filters should have frequency characteristics that cover the frequency spectrum evenly. The filters can be obtained by using higher-order neighborhood connections between cells, as discussed in the next sections.

Figure V-18. Classification method using filter banks

It is important to note that the area available for the interconnection lines is almost the same for both second-order and higher-order connections, since their number is kept the same. For these reasons, each cell in the filter bank is connected to only two neighboring layers, whose order is set in the design phase.

V.6.2 Analog parallel architecture with high-order interconnections

We will assume an MxM cell architecture as shown in Fig. 1. For texture classification we used M=64 in order to limit the silicon area. However, with an optimized layout, the number of cells can easily be increased. The following linear differential equations can be written for each (i, j) node, 1≤ i, j ≤M, where k1, k2≠ i, j are the neighborhood orders:

$$C \frac{dx_{ij}}{dt} = -A_0 x_{ij} + A_{k_1}\left(x_{i-k_1,j} + x_{i+k_1,j} + x_{i,j-k_1} + x_{i,j+k_1}\right) + A_{k_2}\left(x_{i-k_2,j} + x_{i+k_2,j} + x_{i,j-k_2} + x_{i,j+k_2}\right) \qquad (6)$$

where xij=xij(t), Cij=C ∀ i, j=1,..,M, and 1≤ k1, k2≤ M. Equation (6) is written for a homogeneous network with ring boundary conditions. The linear system (6) consists of coupled equations. Using the following change of variable:

$$x_{ij}(t) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \Phi_{MN}(i,j,m,n)\,\hat{x}_{mn}(t) \qquad (7)$$

where, for ring boundary conditions, ΦMN has the form

$$\Phi_{MN}(i,j,m,n) = e^{\mathrm{j}\frac{2\pi i m}{M}}\, e^{\mathrm{j}\frac{2\pi j n}{N}}$$

(the upright j in the exponents denotes the imaginary unit), and substituting (7) in (6), a decoupled system of linear equations is obtained:

$$C \frac{d\hat{x}_{mn}}{dt} = K_A(m,n)\,\hat{x}_{mn} \qquad (8)$$

where KA(m,n) is a function of the spatial modes, i.e. the dispersion curve, and represents the real part of the eigenvalues of the spatial modes. The spatial filtering is based on the possibility that part of the spatial modes can be unstable and thus can grow until the dynamics is frozen. For a network with second-order neighborhood:

$$K_A(m,n) = -A_0 + 2A_{k_1}\left(\cos\frac{2\pi k_1 m}{M} + \cos\frac{2\pi k_1 n}{N}\right) + 2A_{k_2}\left(\cos\frac{2\pi k_2 m}{M} + \cos\frac{2\pi k_2 n}{N}\right) \qquad (9)$$
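The decoupling step from (6) to (8) can be verified numerically: applying the coupled right-hand side of (6) to an arbitrary state must give the same result as multiplying its 2D spectrum by the dispersion surface (9). The following is a minimal Python sketch with hypothetical, dimensionless coefficients; `dispersion_2d` and `coupled_rhs` are illustrative names, and NumPy's FFT is used as the discrete counterpart of the change of variable (7).

```python
import numpy as np


def dispersion_2d(M, N, A0, layers):
    """K_A(m, n) of eq. (9); layers is a list of (A_k, k) pairs."""
    m = np.arange(M)[:, None]
    n = np.arange(N)[None, :]
    K = np.full((M, N), -float(A0))
    for A_k, k in layers:
        K += 2.0 * A_k * (np.cos(2.0 * np.pi * k * m / M)
                          + np.cos(2.0 * np.pi * k * n / N))
    return K


def coupled_rhs(x, A0, layers):
    """Right-hand side of eq. (6) with ring boundaries (np.roll wraps around)."""
    out = -A0 * x
    for A_k, k in layers:
        out = out + A_k * (np.roll(x, k, 0) + np.roll(x, -k, 0)
                           + np.roll(x, k, 1) + np.roll(x, -k, 1))
    return out


M = N = 64
A0, layers = 1.0, [(0.4, 1), (-0.2, 4)]      # hypothetical, dimensionless coefficients
K = dispersion_2d(M, N, A0, layers)

x = np.random.default_rng(0).standard_normal((M, N))
via_modes = np.real(np.fft.ifft2(K * np.fft.fft2(x)))
print(np.allclose(coupled_rhs(x, A0, layers), via_modes))
```

Once this equivalence holds, the state at any freezing time in the linear regime follows from scaling each mode by exp(K_A(m,n) t / C), which is far cheaper than integrating the MxN coupled equations directly.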

V.6.3 The design of high-order 2D spatio-temporal filters

As already shown, the frequency characteristic of a spatial filter based on unstable modes is given by the dispersion curve KA(m,n) in (9). Since in general it is impossible to solve analytically a high-order trigonometric equation in order to design a given band of unstable modes, the graphical method has been used. Starting from (9), in order to synthesize a band-pass filter (BPF) having the central frequency on the (M0, N0) spatial mode, the k1 and k2 parameters have to be determined by imposing the following conditions, with m, n ∈ ℝ:

$$\left(\frac{\partial K_A(m,n)}{\partial m}\right)_{m=M_0} = -\frac{4\pi}{M}\left(A_{k_1} k_1 \sin\frac{2\pi k_1 M_0}{M} + A_{k_2} k_2 \sin\frac{2\pi k_2 M_0}{M}\right) = 0 \qquad (10)$$

$$\left(\frac{\partial K_A(m,n)}{\partial n}\right)_{n=N_0} = -\frac{4\pi}{N}\left(A_{k_1} k_1 \sin\frac{2\pi k_1 N_0}{N} + A_{k_2} k_2 \sin\frac{2\pi k_2 N_0}{N}\right) = 0 \qquad (11)$$

The modes curve for this kind of filter must fulfill two conditions:

• It must exhibit a global maximum at which the value of the function is positive, meaning that

$$K_A(M_0, N_0) > 0 \qquad (12)$$

This condition ensures that at least one spatial mode has a positive real part.

• The value of the function at all the other local maximum and minimum points has to be negative, i.e. for all (M0’, N0’):

$$\left(\frac{\partial K_A(m,n)}{\partial m}\right)_{m=M_0'} = \left(\frac{\partial K_A(m,n)}{\partial n}\right)_{n=N_0'} = 0 \Rightarrow K_A(M_0', N_0') < 0 \qquad (13)$$

Basically, starting with the central frequency of the filter on the (M0, N0) spatial mode, and considering the above conditions for KA(m,n), the parameters k1, k2, Ak1, Ak2 and A0 can be determined. When tuning Ak1, Ak2 and A0, relations (12) and (13) must be satisfied. From the design point of view, it is quite important to fulfill the following two constraints:

• No local or global maximum points should exist at the limit of instability (e.g. an (M0,N0) spatial mode with KA(M0,N0)=0 and (13)), because, due to the variation of the process parameters in a physical implementation, a stable mode can become unstable and vice versa.

• k1 and k2 must have a minimum value.

Next, the design of a bank of eight high-order spatial filters is discussed: a fourth-order high-pass filter (HPF), six band-pass filters (BPF1 to BPF6) having the central frequency on the fourth, eighth, fourteenth, twentieth, twenty-sixth and thirtieth spatial mode respectively, and a fifth-order low-pass filter (LPF). Fig. V-19 presents the dispersion curve KA(m,n) for BPF2 to BPF5.

Figure V-19. Dispersion curves for four band-pass filters

It is worth mentioning that the filters BPF1 - BPF6, BPF2 - BPF5 and BPF3 - BPF4 are complementary, meaning that each filter of a pair can be obtained from the other one by merely changing the sign of a coupling parameter (Ak1 or Ak2). The coefficients k1 and k2 are the neighborhood orders (e.g. HPF with k1=1 and k2=4 means that the (i, j) cell is connected with (i-1,j) and (i-4,j), (i+1,j) and (i+4,j), (i,j-1) and (i,j-4), and with (i,j+1) and (i,j+4), respectively). Table V-5 contains the configuration parameters of the network needed to implement the filter bank.

Table V-5. Filter bank coefficients (k1: first neighborhood order, k2: second neighborhood order)

Filter type   A0 (S)   A1 (S)    A2 (S)   k1   k2
HPF           56e-6    12e-6     -5e-6    1    4
BPF6          56e-6    12e-6     5e-6     1    10
BPF5          56e-6    12e-6     9e-6     1    4
BPF4          56e-6    12e-6     -9e-6    2    3
BPF3          56e-6    12e-6     9e-6     2    3
BPF2          56e-6    -12e-6    9e-6     1    4
BPF1          56e-6    -12e-6    5e-6     1    10
LPF           56e-6    -12e-6    -5e-6    1    5
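The meaning of the neighborhood orders k1 and k2 can be made concrete with a few lines of code. This is a minimal sketch, assuming that boundary handling is ignored (the text does not describe it here); the function name `coupled_cells` is hypothetical.

```python
def coupled_cells(i, j, k1, k2):
    """Return the cells coupled to cell (i, j) for neighbourhood orders k1, k2.

    Boundary handling is not modelled: returned indices may fall outside
    the array for cells close to the border (assumption).
    """
    cells = []
    for k in (k1, k2):
        cells += [(i - k, j), (i + k, j), (i, j - k), (i, j + k)]
    return cells


# HPF configuration from Table V-5: k1 = 1, k2 = 4.
print(coupled_cells(10, 10, 1, 4))
```

Each cell therefore has eight coupling connections, four at distance k1 and four at distance k2, all along its own row and column.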

To obtain the spatial frequency characteristics, a 100 mV spatial impulse was applied at the input of the network. Every filter was normalized to unity amplitude.

V.6.4 Aspects related to the design of the 2D programmable spatio-temporal filters

V.6.4.1 2D network programmability

The network was designed to have eight configuration modes. Fig. V-20 presents the schematic of a nine-bit programmable cell. The “OTA” block has a discretely variable transconductance, gm. It is worth mentioning that a continuous gm control solution is not convenient, because merely doubling the transconductance requires the biasing current to be increased four times. Alternatively, by modifying the value of gm through scaling the channel width of the output stage transistors, a lot of power can be saved.
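The factor of four quoted above is consistent with the square-law model of a MOS transistor in strong inversion. That model is an assumption made here for illustration, since the text does not state the device model; μ, C_ox, W/L and I_D denote the usual carrier mobility, gate oxide capacitance per unit area, aspect ratio and drain bias current.

```latex
g_m = \sqrt{2\,\mu C_{ox}\,\frac{W}{L}\,I_D}
\quad\Longrightarrow\quad
\frac{g_m'}{g_m} = \sqrt{\frac{I_D'}{I_D}}
\quad\Longrightarrow\quad
g_m' = 2\,g_m \;\text{ requires }\; I_D' = 4\,I_D \quad (\text{fixed } W/L)
```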

Figure V-20. Programmable pixel schematic

V.6.4.2 Network nonhomogeneities

Process variations and mismatches cause deviations of A0, Ak1 and Ak2 from the typical values given in the first column of Tab. V-6. Running Monte Carlo simulations on the flattened network is not feasible; therefore, system-level parametric simulations were set up using the technological information shown in Tab. II. The simulations were performed with Spectre, using a 0.35 µm CMOS technology.

Table V-6. Technological deviations of Ak1, Ak2, A0

Typical       Corner1   Corner2   Corner3   Corner4   Monte Carlo
A1,2=5µS      5.8µS     4.26µS    5.47µS    4.6µS     [4µS : 5.6µS]
A1,2=9µS      10µS      7.8µS     10µS      8.4µS     [7.8µS : 9.6µS]
A1,2=12µS     14.3µS    10µS      13.5µS    11.3µS    [10µS : 13.5µS]
A0=56µS       80µS      45µS      X         X         [46µS : 68µS]
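To read Table V-6 in relative terms, the corner values can be expressed as percentage deviations from the typical values. The short sketch below only re-expresses the numbers of the table; the names `TABLE_V6` and `relative_deviations` are hypothetical.

```python
# Typical values and corner results from Table V-6, in microsiemens.
# A0 has no Corner3/Corner4 entries (marked X in the table).
TABLE_V6 = {
    "A1,2=5": (5.0, [5.8, 4.26, 5.47, 4.6]),
    "A1,2=9": (9.0, [10.0, 7.8, 10.0, 8.4]),
    "A1,2=12": (12.0, [14.3, 10.0, 13.5, 11.3]),
    "A0=56": (56.0, [80.0, 45.0]),
}


def relative_deviations(typical, corners):
    """Percentage deviation of each corner value from the typical value."""
    return [100.0 * (c - typical) / typical for c in corners]


for name, (typical, corners) in TABLE_V6.items():
    devs = relative_deviations(typical, corners)
    print(name, [f"{d:+.1f}%" for d in devs])
```

For example, the Corner1 value of A0 (80 µS against a typical 56 µS) corresponds to a deviation of about +43%.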

In order to have a quantitative estimation of the deviation of the spatial frequency characteristics caused by nonidealities, the same 100 mV spatial impulse was applied to both the ideal and the transistor-level filters. The processing time for each filtering operation is about 200 ns on average. The PSNR (Peak Signal-to-Noise Ratio) given by (14) is calculated to estimate the differences between the ideal and real filters:

$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{Max}_{I,K}^{2}}{\mathrm{MSE}}\right), \qquad \mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(I(i,j)-K(i,j)\big)^{2} \qquad (14)$$

where M×N is the size of the network, I(i,j) and K(i,j), with i≤M, j≤N, are the ideal and real frequency characteristics respectively, and $\mathrm{Max}_{I,K}$ represents the maximum value of the ideal or real frequency characteristics, since the dynamics of both systems were “frozen” when the maximum value of the modulus of the transfer functions reached unity. PSNR is calculated for every filter, for both the Monte Carlo and the corner simulation results.
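Relation (14) translates directly into a few lines of code. The following is a minimal sketch in plain Python, assuming that the two characteristics are given as equally sized lists of rows; the function name `psnr` and the sample values are hypothetical.

```python
import math


def psnr(ideal, real):
    """PSNR in dB between two equally sized 2D characteristics, per (14)."""
    M, N = len(ideal), len(ideal[0])
    mse = sum(
        (ideal[i][j] - real[i][j]) ** 2 for i in range(M) for j in range(N)
    ) / (M * N)
    if mse == 0:
        return math.inf  # identical characteristics
    # Max_{I,K}: largest value found in either characteristic.
    peak = max(max(max(row) for row in ideal), max(max(row) for row in real))
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical 2x2 characteristics, both normalized to a unity peak.
ideal = [[1.0, 0.0], [0.0, 0.0]]
real = [[0.9, 0.0], [0.0, 0.1]]
print(round(psnr(ideal, real), 2))
```

Since both characteristics are normalized to unity, the peak term equals 1 and the PSNR reduces to -10·log10(MSE).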

Analyzing the simulation results, which give PSNR values up to 34 dB, it can be concluded that technological deviations do not significantly affect the behavior of the network; therefore, this architecture does not need a special calibration circuit.

V.6.5 Classification rate

The network performance has been evaluated using the Brodatz album [12]. The classification rates obtained with five different filter banks based on those eight filters have been calculated. Tab. V-7 presents the results obtained with Bank1-5, which are composed of BPF2-5, BPF1-6, LPF – BPF2-5 – HPF, LPF – BPF1-6 – HPF, and BPF1-6 – HPF respectively. “DB1” and “DB2” are two data bases composed of 16 and 67 classes, each class containing 28 and 16 textures, respectively.

Table V-7. Simulation results obtained with different filter banks and data bases (classification rate)

Data base   Filter bank type   Ideal filter bank   Transistor level filter bank
DB1         Bank1              94.86%              95.9%
DB1         Bank2              97.1%               97.5%
DB1         Bank3              94%                 95.5%
DB1         Bank4              96.6%               96.8%
DB1         Bank5              98.4%               98.4%
DB2         Bank1              54%                 55.4%
DB2         Bank2              71.5%               69.8%
DB2         Bank3              61.8%               63.2%
DB2         Bank4              71%                 69.8%
DB2         Bank5              73.5%               71%

For both data bases, DB1 and DB2, the best classification rate is given by the last filter bank, Bank5, composed of seven filters, since it covers the spatial frequency spectrum of the input images in the most uniform way. Moreover, it is important to mention that for a larger data base, seven filters are not enough to achieve a reasonable classification rate. Thus, results can be improved from 73.5% to 91%. Fig. V-21 shows the outputs of Bank5 for a given input texture from DB1.

Figure V-21. Example of processing a texture with the seven-filter bank (recoverable panel titles: input image; BPF1 version after 245 ns; BPF6 version after 245 ns; HPF version after 195 ns)

Basically, every texture is filtered, and the mean energy value is calculated for each filtered version. Thus, each 64x64 texture has an associated 1x7 feature vector. If the mean energy is calculated in the analog domain, only 7 analog-to-digital conversions are required for a given input image instead of 4096, which significantly increases the processing speed, considering that the main speed bottleneck for this kind of massively interconnected networks is the A/D conversion.

In this section, the analysis and design of a 64x64 analog parallel architecture used for texture classification have been briefly presented. The network can be configured to implement eight high-order spatio-temporal filters. Each filtering operation takes between 150 ns and 260 ns, depending on the filter type. For a data base composed of 16 classes, each one having 28 textures, a 98.4% classification rate was achieved. Monte Carlo and corner simulations were performed to confirm the robustness of the network, which does not need a special calibration circuit. The same network can be configured in eight modes by means of a nine-bit programmable basic cell.
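The feature extraction step described in this section (one mean energy value per filtered version of a texture) can be sketched in software as follows. This is only an illustration: the mean energy is assumed to be the mean of the squared pixel values, which the text does not define explicitly, and the function names and the constant test images are hypothetical.

```python
def mean_energy(image):
    """Mean of the squared pixel values of a 2D image (assumed definition)."""
    pixels = [p for row in image for p in row]
    return sum(p * p for p in pixels) / len(pixels)


def feature_vector(filtered_images):
    """One mean-energy value per filtered version of the texture."""
    return [mean_energy(img) for img in filtered_images]


# Hypothetical data: seven constant 64x64 "filtered" images.
filtered = [[[0.1 * k] * 64 for _ in range(64)] for k in range(1, 8)]
features = feature_vector(filtered)
print(len(features))  # number of values to convert, instead of 64 * 64 pixels
```

In the architecture discussed above, the averaging would be done in the analog domain, so only the seven entries of the feature vector would need to be digitized.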

V.7 References

[1] Alireza Moini, “Vision chips”, Kluwer Academic Publishers, pp. 11-13, 2000.
[2] Alireza Moini, “Vision chips”, Kluwer Academic Publishers, pp. 15-18, 2000.
[3] Alireza Moini, “Vision chips”, Kluwer Academic Publishers, pp. 18-19, 2000.
[4] Mika Laiho, Jonne Poikonen, Kati Virtanen, Ari Paasio, “Self-adapting compressive image sensing scheme”, 11th International Workshop on Cellular Neural Networks and their Applications, Santiago de Compostela, Spain, pp. 125-128, 14-16 July 2008.
[5] S. Naso, M. Storace, G. Pruzzo and M. Parodi, “CMOS implementation of a cellular nonlinear network for image segmentation”, in Proceedings of the 8th IEEE International Biannual Workshop on Cellular Neural Networks and their Applications, Budapest, Hungary, July 22-24, 2004.
[6] J. Schemmel, K. Meier and M. Loose, “A scalable switched capacitor realization of the resistive fuse network”, Analog Integrated Circuits and Signal Processing, vol. 32, pp. 135-148, 2002.
[7] P. C. Yu, S. J. Decker, Hae-Seung Lee, C. G. Sodini, John L. Wyatt, “CMOS resistive fuse for image smoothing and segmentation”, IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 545-553, April 1992.
[8] H. Ando, T. Morie, M. Miyake, M. Nagata, A. Iwata, “Image segmentation/extraction using nonlinear cellular networks and their VLSI implementation using pulse-modulation techniques”, IEICE Trans. Fundamentals, vol. E85-A, no. 2, pp. 381-388, February 2002.
[9] Valtino X. Afonso, Willis J. Tompkins, Truong Q. Nguyen, Shen Luo, “ECG beat detection using filter banks”, IEEE Transactions on Biomedical Engineering, vol. 46, no. 2, pp. 192-202, February 1999.
[10] M. Fira, L. Goraş, “An ECG signals compression method and its validation using CNN’s”, IEEE Trans. on Biomedical Engineering (TBME), vol. 55, no. 4, pp. 1319-1326, April 2008.
[11] Trygve Randen, John Håkon Husøy, “Filtering for texture classification: A comparative study”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 4, pp. 291-310, April 1999.
[12] P. Brodatz, “Textures: A photographic album for artists and designers”, Dover: New York, 1966.

VI CONTRIBUTIONS

The main personal contributions related to each chapter are presented as follows:

VI.1 Contributions presented in Chapter 2

• Transistor-level analysis, design and synthesis of a new analog parallel architecture which is able to perform different kinds of spatio-temporal filtering operations. The most important parameters of the voltage-controlled current sources and the system-level design parameters have been identified as well [IV1];
• Identification of the main limitations and proper tradeoffs for the physical design of the linear architecture. The effect of non-idealities and nonhomogeneities on the network dynamics has been discussed, and the deviations between transistor-level and ideal filters have been computed;
• Electrical design of the core architecture so that it can be configured in different operating modes. Design of the row and column buffers of the back-end circuitry.

VI.2 Contributions presented in Chapter 3

• The possibilities of loading the proposed network with the information provided by a CMOS sensor have been studied. The electrical design of an APS for spatio-temporal filtering operations has been carried out;
• Synthesis of the cellular network using degenerated inverters in order to minimize the required area;
• Design of the self-“freezing” circuit for the network dynamics.

VI.3 Contributions presented in Chapter 4

• The log-domain mapping of the linear differential equations of an autonomous system has been studied [IV6]. A reconfigurable architecture in the logarithmic domain has been designed as well, and some calibration mechanisms for the log-domain spatio-temporal filters have been studied;
• The linear network has been synthesized in the logarithmic domain using MOS transistors biased in weak inversion, and a study of the effects of nonidealities and nonhomogeneities on log-domain filters has also been performed [IV2], [IV5];
• Analysis, design and synthesis of a 2D reconfigurable log-domain network [IV4].

VI.4 Contributions presented in Chapter 5

• Edge detection and smoothing operations using the proposed 1D and 2D analog processor have been highlighted [IV3];
• Image segmentation using a low-pass spatio-temporal filter and a new cellular architecture based on the operating principle of the nonlinear resistive grid have been proposed [IV6];
• ECG signal classification. The analysis, design and synthesis of a 1D high-order spatio-temporal filter have been discussed, and the electrical design of the parallel architecture, which can be programmed to implement 19 different kinds of filters, has been achieved [IV7];
• Texture classification using a bank of filters. The analysis, design and synthesis of a 2D high-order spatio-temporal filter based on the dispersion curve of the 2D network have been carried out. The electrical design of a 9-bit reconfigurable architecture that implements a bank of 8 spatio-temporal filters, including the effects of the nonhomogeneities on the classification rate, has also been reported [IV8].

Publications

[IV1] Liviu Goraş, Iolanda Alecsandrescu, Ion Vornicu, “Spatial Filtering Using Linear Analog Parallel Architectures”, International Symposium on Signals, Circuits and Systems, ISSCS 2009, Volume 2, pp. 409-412.

[IV2] Liviu Goraş, Ion Vornicu, “Log-Domain CMOS Implementation of a Class of Analog Parallel Architectures”, International Semiconductor Conference, CAS 2009, Volume 2, October 12-14, Sinaia, pp. 499-502.

[IV3] Ion Vornicu, Liviu Goraş, “Image processing using a CMOS analog parallel architecture”, International Semiconductor Conference, CAS 2010, 11-13 October 2010, Vol. 2, pp. 461-464.

[IV4] Ion Vornicu, Liviu Goraş, “32x32 parallel analog architecture for image processing using log-domain active pixel”, Acta Technica Napocensis, Vol. 51, Number 4, 2010, pp. 45-50.

[IV5] Liviu Goraș, Ion Vornicu, “Spatial Filtering Using Analog Parallel Architectures and Their Log-Domain Implementation”, Romanian Journal of Information Science and Technology, ROMJIST 2009, pp. 73–83.

[IV6] Liviu Goraş, Ion Vornicu, Paul Ungureanu, “Topics on Cellular Neural Networks”, “Handbook on neural information processing”, editors: M. Bianchini, M. Maggini, L. Jain Eds. Elsevier, publisher Springer Verlag, in press (2012).

[IV7] Ion Vornicu, Liviu Goraș, “On the design of a class of CNN’s for ECG classification”, 20th European conference on circuit theory and design (ECCTD 2011), pp. 153-156.

[IV8] Ion Vornicu, Liviu Goraș, “On the Possibilities of Using a Class of CNN’s for Texture Classification”, 20th European conference on circuit theory and design (ECCTD 2011), pp. 237-240.
