Computer Aided Design of Fuzzy Systems based on generic VHDL Specifications

Thomas Hollstein, Saman K. Halgamuge and Manfred Glesner

Abstract: Fuzzy systems implemented in hardware can operate with much higher performance than software implementations on standard microcontrollers. In this contribution three types of fuzzy systems and related hardware architectures are discussed: standard fuzzy controllers, FuNe I fuzzy systems, and fuzzy classifiers based on a neural network structure. Two Computer Aided Design (CAD) packages for automatic hardware synthesis of standard fuzzy controllers are presented: a hard-wired implementation of a complete fuzzy system on a single or multiple Field Programmable Gate Arrays (FPGA), and a modular toolbox, called fuzzyCAD, for the synthesis of reprogrammable fuzzy controllers whose architectures follow constraints specified by the designer. In the fuzzyCAD system, an efficient design methodology has been implemented which covers a large design space in terms of signal representations and component architectures as well as system architectures. VHDL descriptions and the use of powerful synthesis tools allow different technologies to be targeted easily and efficiently. In the last part of this contribution, properties and hardware realizations of fuzzy classifiers based on a neural network are introduced. Finally, future perspectives and possible enhancements of the existing toolkits are outlined.

Keywords: Fuzzy systems, fuzzy hardware, fuzzy controller, fuzzy classifier, generic VHDL fuzzy modules, fuzzyCAD, neuro-fuzzy systems, FPGA designs.

I. Introduction

The functionality of a fuzzy system can be acquired either from expert knowledge or from training data. This source of knowledge has a basic impact on the fuzzy system structure to be applied. Generally, a multiple input multiple output (MIMO) fuzzy system can be abstracted as a system with $n$ inputs $\{X_1, \ldots, X_n\}$ and $m$ outputs $\{Z_1, \ldots, Z_m\}$ (Fig. 1).

Fig. 2 shows an example of the operation of a standard fuzzy controller.

Fig. 2. Standard Fuzzy Controller (Mamdani)

Without a-priori knowledge of $F$, the number and form of the rules, the membership functions and the defuzzification parameters have to be generated by neuro-fuzzy software based on training data $\{X, Z\}$. Examples of two such models known in the neuro-fuzzy field are shown in Figs. 3 and 4.

Fig. 1. General MIMO Fuzzy System

Assume that the functionality of this system is described by the functional relation

$$F : X \to Z \tag{1}$$

with $X = \{X_1, \ldots, X_n\}$ and $Z = \{Z_1, \ldots, Z_m\}$. If $F$ is acquired from expert knowledge, the fuzzy system can be realized by a classical standard fuzzy controller implementation, as introduced by Mamdani [MA75].

The authors are with the Institute of Microelectronic Systems, Darmstadt University of Technology, Darmstadt, Germany, E-Mail: [email protected]

Fig. 3. FuNe I Fuzzy system

Comparing the standard fuzzy controller operation with these alternative fuzzy systems, it is obvious that the major difference lies in the conclusion/defuzzification parts. The FuNe I fuzzy system performs the defuzzification by weighted addition of singletons, passing the result through a sigmoid function. In the classifier fuzzy system, which is also a neural network that can be interpreted as a fuzzy system [HPG95], [HG95], defuzzification is not required, since a decision about the membership to an output class [Bez93] is sufficient. The three introduced fuzzy models are considered by the authors for designing fuzzy hardware. Every type of fuzzy system can be implemented on standard microcontrollers.

Fig. 4. Classifier Fuzzy System
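The difference between the two output stages can be illustrated with a minimal Python sketch. It assumes a logistic sigmoid for the "Sig" block of Fig. 3 and uses made-up rule activations and weights; the function names are hypothetical and are not taken from the FuNe I software.

```python
import math

def fune_output(k, w):
    """FuNe I style output stage: sigmoid of the weighted sum of rule activations.

    Assumption: 'Sig' is the logistic function 1 / (1 + exp(-s)).
    """
    s = sum(ki * wi for ki, wi in zip(k, w))
    return 1.0 / (1.0 + math.exp(-s))

def classifier_output(k):
    """Classifier style output stage: index of the strongest rule activation."""
    return max(range(len(k)), key=lambda i: k[i])

k = [0.2, 0.7]    # hypothetical rule activations K1, K2
w = [1.0, -0.5]   # hypothetical singleton weights W1, W2
print(fune_output(k, w))      # a value between 0 and 1
print(classifier_output(k))   # index of the winning class
```

The classifier variant needs only comparisons, which is one reason why no defuzzification hardware is required for it.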

As soon as dedicated fuzzy hardware is taken into consideration, restrictions can apply due to specialized hardware modules. Assume that a software-programmable fuzzy hardware covers a functionality space $S = \{S_1, \ldots, S_k\}$, where all $S_i$ ($i \in \{1, \ldots, k\}$) are possible fuzzy systems which can be realized. Any projected fuzzy system functionality $F$ can be mapped onto this hardware if

$$\exists\, S_i \in S \ \text{with}\ S_i = F, \quad i \in \{1, \ldots, k\} \tag{2}$$

If $|S| = 1$, the hardware is a fully hard-wired implementation with a fixed rule base. This special case is especially interesting for rapid prototyping on RAM-based FPGAs, where a software-programmable rule base is not required, since the whole system can be re-synthesized if rule modifications are required. In the following sections, hardware implementations and synthesis toolkits for the previously introduced fuzzy system types are described. The general structure for standard fuzzy controllers and a similar hardware architecture, which can be configured by the software neuro-fuzzy system FuNe I [HG94], are shown in Fig. 5.

Fig. 5. Design and Configuration Flow: Standard Fuzzy Controllers (Mamdani) and FuNe I configurable Systems

Based on a generic netlist module library, hard-wired implementations of standard fuzzy controllers for rapid prototyping purposes can be generated for an FPGA technology (Xilinx). Using the advanced system fuzzyCAD, which is based on generic VHDL descriptions, reprogrammable fuzzy controllers can be synthesized for different ASIC target technologies. The system architecture and module selection are influenced by interaction with the designer in order to meet the required timing and area constraints. By selecting a dedicated defuzzification module, a special architecture can be synthesized which can be configured by FuNe I (off-line software training). The hardware requirements for the third system (neuro-fuzzy classifier) are totally different, since the structure of this system is based on a three-layer neural network. This fuzzy-interpretable neural network can be trained on the chip. The number of neurons in the hidden layer is not fixed and can be varied during the training process. Therefore a special generic systolic array architecture is required for the hardware implementation. An initial structure (number of hidden neurons, initial edge weights) can be programmed based on optionally available expert knowledge. The structure of this system is shown in Fig. 6.

Fig. 6. Design and Configuration Flow: Neuro-Fuzzy Classifier

This special fuzzy system structure and the related hardware are described in the last part of this contribution.

II. Rapid Prototyping of Fuzzy Systems on FPGA Target Architectures

A toolkit, FUZ2LCA, for the automatic generation of application-specific fuzzy controllers from a high-level fuzzy language input has been successfully implemented and tested [HHKG94]. A similar approach has recently been described in [Hun95]. The advantage of a direct hard-wired implementation is the minimum hardware overhead in both the data-path and the controller of the design, which leads to a minimum number of logic cells required on the target devices. SRAM-based Field Programmable Gate Arrays (FPGA) are well-suited for prototyping purposes due to their reprogrammability. An additional advantage is that the configurable logic cells can also be used efficiently for an on-chip realization of small SRAM memory blocks. The compiler FUZ2LCA automatically creates a design of a complete standard fuzzy controller, based on a netlist module library with generic parameters. A large design space in terms of timing and area can be covered, since the designer can choose the number of computation units working in parallel. Fuzzy systems written either in the C programming language or in a fuzzy programming language (e.g. Togai's FPL language) can be synthesized and converted to Xilinx Netlist Format (XNF). This enables the user to define the fuzzy system in a problem-specific manner. Problems arising in mapping specifications of large fuzzy systems can be solved by partitioning the design across several FPGAs.

Each fuzzy system design consists of three modules or functional units: fuzzification, rule inference, and composition/defuzzification. All modules have their own local controllers, allowing them to operate independently. The user can set parameters depending on the availability of hardware resources and the required speed, so that a highly parallel design, a completely sequential design or a compromise can be created. Due to the high time consumption of many commonly used methods, the defuzzification unit should normally operate in parallel with the fuzzification and inference units. Depending on user-selectable parameters, the system controller supports both sequential and pipeline modes. Due to its modularity, FUZ2LCA can easily be extended by adding alternative modules. In addition to the FPGAs, external memories are needed for storing antecedent and consequent membership functions (MSF in Fig. 7).

Fig. 7. Generated fuzzy hardware

Fig. 8. Fuzzification block

A. Fuzzification Unit

Membership functions $\mu_{X_k,i}$ can easily be stored as lookup tables, using two different external RAM blocks [SU95]. All odd-numbered membership functions ($i \in \{1, 3, 5, \ldots\}$) are stored in an 'odd' RAM block, while the even-numbered membership functions ($i \in \{0, 2, 4, \ldots\}$) are stored in an 'even' RAM block (Fig. 8). The restriction of this method is that at most two membership functions can overlap, but the RAM blocks can be accessed efficiently in parallel.

B. Rule Inference

Inference is the process in which the premise and the consequent membership function of a single rule are evaluated (Fig. 9 and leftmost part of Fig. 10), whereas in the composition the inference results of many rules are combined (center part of Fig. 10, here: 'max' operation). Three different types of rule evaluators can be generated:
• simple evaluators that can either read or negate the membership value of an input
• Min/Max rule evaluators for rules of low complexity
• complex rule evaluators with a maximum of 16 Min/Max operations and parenthesis hierarchies
The premises are evaluated by a single rule evaluator or by several in parallel, depending on the rule base complexity and the timing constraints. The initially implemented, but easily extendable, inference/composition method is Min/Max. The outcome of the composition is piped directly into the defuzzification unit running in parallel, so no additional intermediate memory is required.

Fig. 9. Rule inference

C. Defuzzification

The defuzzification unit is normally the most time-consuming module, especially if a very resource-consuming method such as COG is implemented. Two steps are taken to overcome this problem:
• the defuzzification module is always generated as a module running in parallel
• less time-consuming methods are generated with efficient hardware structures
Midpoint of Area (MOA), also known as Center of Area (COA), and Mean of Maxima (MOM) are standard defuzzification methods [DHR93] which can be implemented efficiently in hardware. Considering the composition output curve as $\omega$ (normalized to the maximum value unity: $0 \le \omega \le 1$), and denoting by $Z_{val}$ the finite set of possible normalized output values of a fuzzy controller, with $Z_{val} = \{z_0, \ldots, z_i, \ldots, z_n\}$, the different defuzzification methods can be formalized as follows:

• Center Of Gravity (COG), the center of gravity of the area:

$$z_{out}^{COG} = \frac{\sum_{i=0}^{n} \omega_i z_i}{\sum_{i=0}^{n} \omega_i} \tag{3}$$

• Midpoint Of Area (MOA), the middle of the area:

$$z_{out}^{MOA} = z_h \quad \text{where} \quad \sum_{i=0}^{h} \omega_i = \sum_{i=h}^{n} \omega_i \tag{4}$$

• Mean Of Maxima (MOM), the center of gravity of the area under the maxima of the fuzzy output:

$$z_{out}^{MOM} = \frac{\sum_{i \in M} z_i}{|M|}, \qquad M = \{\, i \mid \omega_i = \max(\omega_1, \ldots, \omega_n) \,\} \tag{5}$$

Here $\omega_1, \ldots, \omega_n$ are the curve segments which originate from the corresponding consequent membership function segments $\mu_{Z,1}, \ldots, \mu_{Z,n}$ (after composition).

• Center of Mean (COM), the middle of the area under the maxima of the fuzzy output, introduced in [HHKG94]:

$$\sum_{i=1}^{h} s_i = \sum_{i=h}^{n} s_i \quad \text{where} \quad s_i = \begin{cases} \omega_i & \text{if } \omega_i = \max(\omega_k) \\ 0 & \text{if } \omega_i < \max(\omega_k) \end{cases} \quad \forall k \in \{1, \ldots, n\} \tag{6}$$

The defuzzified crisp output is $z_{out}^{COM} = z_h$.

Fig. 10. Composition and MOA defuzzification

The standard MOA method and two variations of it can be generated as defuzzification units [HHKG94]. The hardware implementation of the MOA method is depicted in Fig. 10. The pointers $z_l$ (lower) and $z_u$ (upper) cover the output range, starting from the lower and the upper limit respectively and moving stepwise towards their meeting point. The area underneath the fuzzy output shape is added to a register as the pointer on the left moves and subtracted from this register as the pointer on the right moves. The meeting point $z_m$ equals the crisp output, since the integration is performed such that the condition

$$\left|\, \sum_{i=0}^{l} \omega_i - \sum_{i=u}^{n} \omega_i \,\right| = \min \tag{7}$$

is optimized in every step (see Fig. 11: $A_1 = A_2$). Since complex operations such as multiplications or divisions are not involved in the MOA strategies, these methods are much faster than COG.

Fig. 11. Efficient implementation of MOA defuzzification (axes: output value $z$ from 0 to $n$ against $\omega$; pointers $l$ and $u$; areas $A_1$ and $A_2$)
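The defuzzification methods of this section can be compared in a short Python sketch. It is a software model under simplifying assumptions (integer-valued samples of the composition curve, and a tie-breaking rule that moves the left pointer when both partial areas are equal), not a description of the generated hardware; all function names are hypothetical.

```python
def cog(omega, z):
    """Center of gravity, eq. (3): needs multiplications and one division."""
    return sum(w * zi for w, zi in zip(omega, z)) / sum(omega)

def mom(omega, z):
    """Mean of maxima, eq. (5): mean of the z values where omega is maximal."""
    peak = max(omega)
    idx = [i for i, w in enumerate(omega) if w == peak]
    return sum(z[i] for i in idx) / len(idx)

def moa(omega, z):
    """Midpoint of area with two pointers moving towards each other.

    'balance' is the area collected from the left minus the area collected
    from the right; its sign decides which pointer moves next, so only
    additions, subtractions and a sign test are required.
    """
    lo, hi = 0, len(omega) - 1
    balance = 0
    while lo < hi:
        if balance <= 0:          # left area not larger: advance the lower pointer
            balance += omega[lo]
            lo += 1
        else:                     # left area larger: advance the upper pointer
            balance -= omega[hi]
            hi -= 1
    return z[lo]                  # meeting point of the two pointers

omega = [0, 0, 1, 3, 1]           # hypothetical composition output samples
z = [0, 1, 2, 3, 4]               # corresponding output values
print(cog(omega, z), mom(omega, z), moa(omega, z))
```

For this symmetric peak all three methods agree; for skewed output shapes COG and MOA generally differ slightly, which is the accuracy traded for the simpler arithmetic.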

D. Application Example

After several tests, the compiler has been successfully applied to generate a fuzzy controller for the fuzzy truck with trailer described in [HRG94]. This fuzzy controller consists of 11 fuzzy rules, 2 inputs and 1 output (each with 5 membership functions), and employs Max-Min inference/composition and MOA defuzzification (Fig. 12). A 4-bit version of the generated fuzzy controller (this accuracy is sufficient for the application) could be implemented in an XC4006 FPGA, and only 42 µs are needed for calculating a new output. This result can be compared with standard solutions such as the DSP TMS320 (150 µs) and special fuzzy solutions such as the Togai ASIC FC110 (32 µs).

Fig. 12. Fuzzy truck control

III. fuzzyCAD: New Module oriented VHDL-based Design Approach

Based on the experience with the previously described automated fuzzy controller implementations on FPGAs, a completely modular fuzzy controller design toolkit is being developed. Compared to the FPGA solution, which was a pure rapid prototyping approach, the VHDL-based approach provides more flexibility and is intended to become a CAD system for flexible customer-specific solutions. VHDL was chosen as the description language because a lot of design experience with VHDL specifications and the SYNOPSYS simulation and synthesis toolkits was already available. The new system is not restricted to one target technology, and the realized controller is fully reprogrammable by software. The user is able to design a MIMO standard fuzzy controller according to the requirements of one or more application domains. An advantage of the library-oriented VHDL concept is the flexibility concerning the integration of new modules and the possibility of making a rough estimate of the resulting timing and area costs. Another benefit of this concept is the reduced simulation effort, since the modules have already been tested many times (reuse of design components). The main simulation effort is therefore the validation of whether the selected bit widths are sufficient for the required computation accuracy. The fuzzy controller parameters which are determined and fixed by the design process are:
• number $n_{inp}$ of input and $n_{outp}$ of output signals
• number $n_{MFin}$ of membership functions (MF) for input signals
• number $n_{MFout}$ of MFs for output signals
• the maximum overlap $ov_{max}$ of MFs (the maximum number of MFs which can produce a non-zero value for one crisp input value)
• MF storage technique
• external and internal bit widths
• number and capabilities of rule evaluation modules running in parallel
• defuzzification method
In combination with the previously mentioned parameters, the user has an influence on the timing and area of the resulting implementation.

A. Overview: General Structure

The toolkit consists of a modular generic VHDL description library and a configuration software tool. The selection of VHDL modules and the setting of their generic instantiation parameters will be performed automatically or interactively by this CAD program according to specific user requirements. The structure for instantiation can be seen in Fig. 13. For application support of the controller chip, a bit stream generator program for binary rule coding will be provided. The bit stream can be stored in an (EE)PROM which is located on the board adjacent to the fuzzy controller and is read once after power-up (Fig. 14).

Fig. 13. Structure: Processor Instantiation (the hypertext-based fuzzyCAD design manager, guided by designer interaction, instantiates selected structures from the VHDL module library into the fuzzy controller frame: fuzzification unit, rule evaluation unit, inference/defuzzification unit)

Fig. 14. External Rule Configuration Memory (fuzzy system knowledge database, rule coding bitstream generation software, rule memory (EE)PROM and fuzzy controller on the printed circuit board)

B. The Design Flow

The complete design flow can be seen in Fig. 15:

Fig. 15. Design Flow: Generic Fuzzy Processor (stages: fuzzyCAD design manager with library-based instantiation and composition of the VHDL description from the VHDL module library; VHDL description of the fuzzy controller; high-level and logic synthesis; target technology netlist; standard cell / FPGA design software; physical layout description file (ASIC) or device configuration bitstream (FPGA))

Fig. 16. Overlap-free Membership Function Storage (axes: crisp input against degree of membership; RAM 0 holds MF 0 and MF 3, RAM 1 holds MF 1 and MF 4, RAM 2 holds MF 2)

Fig. 17. Membership function approximation (axes: crisp input against degree of membership; base points x0 to x7 and y0 to y7 with slopes m0 to m7, maximum y4 = 255)

Using the fuzzy controller design software fuzzyCAD, a VHDL description of the complete controller is generated. This VHDL source code can be mapped onto a standard cell or FPGA target library using a high-level design tool (SYNOPSYS in our case). With vendor-specific design software, the resulting netlist can be compiled to a physical implementation (layout). Simulation can be performed on every level of abstraction in order to validate the processor functionality.

C. Internal Membership Function Representation

C.1 Overlap-free MF Storage

For efficient defuzzification an overlap-free membership function storage is very useful. The maximum overlap $ov_{max}$ determines the number of RAM blocks required for storing the MFs: $R_0, \ldots, R_{ov_{max}-1}$. Generally a membership function $MF_i$, $i \in \{0, \ldots, n_{MF} - 1\}$, is stored in the RAM module $R_x$, where $x = i \bmod ov_{max}$. Fig. 16 shows an illustration for $ov_{max} = 3$. Overlap-free MF storage is very effective for defuzzification, since the $ov_{max}$ RAM banks storing the MFs of an output variable can be processed in parallel for inference computation. This is important because the defuzzification operation is the critical performance bottleneck.

C.2 MF Shape: Piece-wise linear Representation

In many neuro-fuzzy approaches, where fuzzy systems are generated automatically ([HG94], [HPG95], [HG95]), the resulting membership functions can be bell-shaped. The typical approach of using look-up tables for fuzzification is inefficient in many cases because it requires a huge fuzzification memory. One solution to this problem is the approximation of membership functions by straight lines, as shown in Fig. 17. In this example each membership function is represented by a maximum of 8 straight lines, which reduces the required memory capacity. For a sigmoidal membership function, 256 bytes are needed for a simple look-up table, compared to 24 bytes for the approach with membership function approximation. Each line $Y = m_i (X - x_i) + y_i$ is characterized by three parameters: the tangential coefficient (slope) $m_i$ and the coordinates $x_i$ and $y_i$ of the leftmost point of the line. Three memory words, distributed over three short memories or concatenated to one bit string, may contain those parameters.

C.3 MF Shape: Look-Up Table Representation

For high-speed MF access, the look-up table representation is well-suited: for every crisp input value, the MF-related fuzzified values can be read directly out of the memory without additional computation effort. This method is sometimes also more efficient for MFs with bent shapes, which would require many base points for a piece-wise linear representation.

D. Fuzzification Unit

Since the defuzzification is the most time-consuming unit, fuzzification can be performed serially without influence on the global timing behavior of the circuit. Fig. 18 shows the data-path structure of this unit: crisp inputs are captured in input registers and applied to the fuzzifier serially. The fuzzifier can be realized as a look-up table. Since fuzzification is not too time-critical, in the presented approach the membership functions are stored in piece-wise linear form and the fuzzified values are computed by interpolation. This also implies one search through the MF memory per input variable.

Fig. 18. Structure of Fuzzification Unit

Generic VHDL fuzzification unit entity:

    entity fuzzification is
      generic (
        var_width      : integer;   -- bit width for representation of # input variables
        num_width      : integer;   -- bit width for representation of # MF per variable
        x_width        : integer;   -- bit width x-input
        y_width        : integer;   -- bit width y-input
        m_width        : integer;   -- bit width m-input
        adr_width_ram2 : integer;   -- depth of output-memory
        adr_width_ram1 : integer;   -- depth of first memory
        ram1_width     : integer ); -- width of first memory
      port (
        start          : in  std_logic;
        reset          : in  std_logic;
        clk            : in  std_logic;
        en_write_ram1  : in  std_logic;   -- enable write to ram1
        write_ram1     : in  std_logic;   -- write to ram1 (from main controller)
        read_ram1      : in  std_logic;
        ram_copy       : in  std_logic;   -- read ram2 (from main controller)
        var            : in  std_logic_vector(var_width - 1 downto 0);
        num            : in  std_logic_vector(num_width - 1 downto 0);
        x              : in  std_logic_vector(x_width - 1 downto 0);   -- MF point: x coordinate
        y              : in  std_logic_vector(y_width - 1 downto 0);
        m              : in  std_logic_vector(m_width - 1 downto 0);
        ein            : in  std_logic_vector(2**var_width * x_width - 1 downto 0);
        -- calcul_end is also listed as a port; its mode and type are not legible in the source
        end_read_ram1  : out std_logic;
        end_read_ram2  : out std_logic;
        ram1_full      : out std_logic;
        out_ram2       : out std_logic_vector(var_width + num_width + y_width - 1 downto 0));
    end fuzzification;

Fig. 19. MOA Defuzzification: Inference and Integration Unit
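The storage rule of C.1 and the piece-wise linear representation of C.2 can be illustrated with a small Python model. It is only a sketch: the segment table is a made-up triangular membership function, each segment is stored as an assumed triple (x_i, y_i, m_i), and the function names are hypothetical rather than part of fuzzyCAD.

```python
def ram_bank(i, ov_max):
    """RAM module index for membership function MF_i: x = i mod ov_max."""
    return i % ov_max

def fuzzify(x, segments):
    """Evaluate a piece-wise linear MF stored as (x_i, y_i, m_i) triples.

    The triples must be sorted by x_i. The segment with the largest x_i
    not exceeding x is selected by one linear search, then the value is
    interpolated as y_i + m_i * (x - x_i).
    """
    xi, yi, mi = segments[0]
    for seg in segments:
        if seg[0] <= x:
            xi, yi, mi = seg
        else:
            break
    return yi + mi * (x - xi)

# Hypothetical triangular MF: zero up to 64, peak 128 at x = 128, zero from 192 on.
triangle = [(0, 0, 0), (64, 0, 2), (128, 128, -2), (192, 0, 0)]

print([ram_bank(i, 3) for i in range(5)])
print([fuzzify(x, triangle) for x in (10, 96, 128, 160, 200)])
```

With four segments of three one-byte parameters each, this table occupies 12 bytes, whereas a full look-up table over an 8-bit input would need 256 entries.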

E. Inference and Defuzzification Modules

The composition and defuzzification unit works similarly to the defuzzification unit in the previously described FPGA prototyping system, using the midpoint-of-area method (MOA). In contrast to the FUZ2LCA solution, the MF overlap $ov_{max}$ can now take any value. The operational flow is as follows: the integration begins at the zero-point of the output variable's value range. This value is applied to the MF memory, and a certain number ($< ov_{max}$) of non-zero MF values and the corresponding MF identifiers are read out. This can be done fully in parallel, since the MFs are stored overlap-free. Then a minimum operation is performed on these MF values and the corresponding rule weights. The results are fed into a max-tree, and the final value is used for the MOA defuzzification (integration), i.e. it is added to or subtracted from the accumulator, depending on the current integration direction. The sign of the integration result (stored in the accumulator register) determines the integration direction for the next step. Fig. 19 shows the operational unit for defuzzification.
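The minimum and max-tree step of this flow can be modelled in a few lines of Python. This is a sketch under assumptions: the data layout (one dictionary per RAM bank, mapping an MF identifier to its sampled values) and the function name are hypothetical, and the loop over banks stands in for what the hardware does in parallel.

```python
def compose(z_index, banks, weights):
    """Composition value at one position of the output range.

    banks:   list of dicts, one per RAM bank, mapping MF id -> list of MF samples
    weights: dict mapping MF id -> rule weight written by the rule evaluators
    """
    clipped = []
    for bank in banks:                       # read in parallel in hardware
        for mf_id, samples in bank.items():
            value = samples[z_index]
            if value:                        # only non-zero MF values take part
                clipped.append(min(value, weights.get(mf_id, 0)))
    return max(clipped, default=0)           # max-tree over the clipped values

banks = [{0: [3, 2, 1, 0]},                  # hypothetical MF 0 in RAM 0
         {1: [0, 1, 2, 3]}]                  # hypothetical MF 1 in RAM 1
weights = {0: 2, 1: 1}
print([compose(i, banks, weights) for i in range(4)])
```

The resulting curve is the input to the sign-driven MOA integration, so no intermediate memory for the complete composition result is needed.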

F. Programmable Rule Evaluation Kernel

The rule evaluation kernel is programmable by software. During the design phase, the number of rule evaluators running in parallel, their type and the rule memory size are fixed. Three classes of rules can be processed:
1. Trivial Rules: if a is high then x is medium
2. Normal Rules: conditions chained in a sequence with AND, OR and NOT operators
3. Hierarchical Rules: multilevel nesting with parentheses; operators: AND, OR, NOT.
For every class of rules, instances of rule evaluators can be created. For normal applications the classes 1 and 2 are of primary interest. Rule evaluators of a class n may also evaluate rules of class m with m < n. The user has direct influence on the number of rule evaluators created for each class. For low-performance applications it may be sufficient to create only one rule evaluator of the most complex class to be processed later on. All rules are then processed sequentially and the chip area (cost) is minimal. For high-performance applications, multiple rule evaluators of one or different classes may operate in parallel. Each evaluator can process multiple rules sequentially.

IV. Implementation of FuNe I Fuzzy Systems

Two possibilities can be considered for the real-time implementation of fuzzy systems generated automatically by neuro-fuzzy systems. The first and simple method is the direct implementation of all neurons and interconnections. The second method is to implement the result as a fuzzy system. In the case of a FuNe I multilayer perceptron based system, the computation time and the hardware complexity needed for the first method are much higher than for the second method. For fuzzy interpretable neural networks based on nearest prototype classification, however, the first method is more appropriate, as described in Section V. Although FuNe I type fuzzy systems designed with off-line software training can be implemented in commercially available fuzzy processors, an application-specific design would increase the speed. The design must be easily configurable for different generated fuzzy systems. The first hardware implementation has been a simple FPGA design. The FuNe I fuzzy system with 4 inputs and 3 outputs, extracted from the popular Iris data set [And35], is implemented in a single Xilinx FPGA 4005 chip. This design is used in a prototype board that can be connected to a personal computer via the ISA bus for the visualization of classification results. The typical approach of using look-up tables for fuzzification can be inefficient in cases with high fan-in because it requires a huge fuzzification memory. One solution to this problem is the approximation of membership functions by straight lines, as indicated in a previous section (see also Fig. 17). A comparison of the performance of the two designs described is summarized in Table I.

V. Real-time Fuzzy Interpretable Classifiers

Although FuNe I fuzzy systems can be efficiently configured in FPGA-based prototype boards as discussed above, the off-line neural network training used for their design can hardly be implemented in FPGAs due to area limitations. However, several new methods for the generation of fuzzy classification systems have been presented that can be implemented in FPGAs:
• Dynamic Vector Quantisation (DVQ) variations (DVQ2 and DVQ3) as improved versions of Learning Vector Quantisation networks [HG95]
• Cubic Basis Function Networks (CBFN), derived from the well-known Radial Basis Function Networks, with modified Restricted Coulomb Energy (MRCE) learning, presented in [HPG95].
Since these methods can be considered as nearest prototype neural networks, the distance between an input vector and all reference vectors is calculated to decide upon the class membership of the input vector. The prerequisite is the selection of a distance measure with low computational cost. The distance measure used in competitive learning can be defined more generally by the Minkowski metric [Koh89]:

$$\mathrm{dist}_\lambda(\vec{I}, \vec{W}) = \left( \sum_{d=1}^{n} |I_d - W_d|^{\lambda} \right)^{1/\lambda} \tag{8}$$

The most commonly used measures, the Euclidean distance ($\lambda = 2$), the city block distance ($\lambda = 1$) and the maximum ($\lambda \to \infty$), can be derived from this general form.

A. Computationally Feasible Distance Measures

The Euclidean distance, although good simulation results have been reported for it, requires one multiplication per dimension, which is a disadvantage in a hardware implementation. Therefore, the city block distance and the maximum measure are compared:

$$\text{city block dist}(\vec{I}, \vec{W}) = \sum_{d=1}^{n} |I_d - W_d| \tag{9}$$

$$\text{max dist}(\vec{I}, \vec{W}) = \max_{1 \le d \le n} |I_d - W_d| \tag{10}$$

Fig. 20. Points of equal "distances" for different distance measures (panels: maximum distance in all dimensions; city block distance; axes $X_1$, $X_2$)
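Both multiplication-free measures and the resulting nearest prototype decision can be written down directly. The following Python sketch uses made-up reference vectors and class labels; the function names are chosen for illustration only.

```python
def city_block_dist(i_vec, w_vec):
    """Eq. (9): sum of absolute differences over all dimensions."""
    return sum(abs(a - b) for a, b in zip(i_vec, w_vec))

def max_dist(i_vec, w_vec):
    """Eq. (10): largest absolute difference over all dimensions."""
    return max(abs(a - b) for a, b in zip(i_vec, w_vec))

def nearest_prototype(i_vec, prototypes, dist=city_block_dist):
    """Return the class label of the closest reference vector.

    prototypes: list of (reference_vector, class_label) pairs
    """
    return min(prototypes, key=lambda p: dist(i_vec, p[0]))[1]

prototypes = [([0, 0], "a"), ([10, 10], "b")]   # hypothetical reference vectors
print(city_block_dist([1, 5], [4, 1]), max_dist([1, 5], [4, 1]))
print(nearest_prototype([3, 4], prototypes))
print(nearest_prototype([3, 4], prototypes, dist=max_dist))
```

Both measures need only subtractions, absolute values and either an adder or a comparator per dimension, which is what makes them attractive for FPGA processing elements.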

TABLE I
Comparison of two FuNe I fuzzy system implementations

Features                                     Design 1                        Design 2
Inputs                                       4                               128
Outputs                                      3                               4
Mem. func. per input                         4                               128
No. of rules                                 16                              256
Type of FPGA                                 XC4005                          XC4006
No. of sigmoids (EPROMs)                     3 (256 bytes)                   4 (256 bytes)
No. of memory units for storing mem. func.   1 (256 bytes per mem. func.)    3 (8 bytes per mem. func.)
Speed in million rules per second            1.25                            1.25

TABLE II
Comparison of distance measures

Data set    Number of false classifications    Number of false classifications
            (city block dist)                  (max dist)
Iris        6                                  6
Solder      3                                  5
Digit       0                                  26

Both measures describe the points with equal "distances" as squares for a two-dimensional input space (inputs $\vec{I} = \{X_1, X_2\}$), as shown in Fig. 20. The Iris data set [And35] is a real-world data set with 4 inputs and 3 classes that can also be considered a benchmark. The Solder data set comes from a real-world application [HPmG93]; it consists of 23 inputs and classifies solder joints into 2 classes, "good" or "bad". In this paper the authors use another real-world data set, Digit, from optical digit recognition, containing 36 preprocessed inputs and 10 outputs [HG94]. The complexity of the data sets Iris, Solder and Digit increases in terms of the number of inputs. The comparison in Table II indicates that the city block distance is better, especially for complicated large data sets such as Solder and Digit.

B. Fixed Point Calculation

A reduced fixed point format should be found that does not significantly degrade performance. The error versus the number of bits is analyzed for different data sets, limiting the number of reference vectors generated per class to one. The simulation results clearly indicate that the number of bits needed is less than 6 even for complex classification examples (Fig. 21). The number of neurons per class remains constant throughout the simulation, since dynamically added neurons would otherwise compensate for the reduced computational accuracy.
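How a reduced fixed point format affects the distance computation can be tried out with a small Python sketch. It assumes inputs normalized to the range [0, 1] and an unsigned code with uniform steps; this quantization scheme and the function name are assumptions for illustration and are not taken from the simulations reported here.

```python
def quantize(value, bits, lo=0.0, hi=1.0):
    """Map a value in [lo, hi] to an unsigned integer code of the given bit width.

    Values outside the range are clipped to the range limits.
    """
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)

def quantized_city_block(i_vec, w_vec, bits):
    """City block distance computed on the integer codes."""
    return sum(abs(quantize(a, bits) - quantize(b, bits))
               for a, b in zip(i_vec, w_vec))

print(quantize(1.0, 6), quantize(0.25, 4))
print(quantized_city_block([0.0, 1.0], [1.0, 0.0], 6))
```

Repeating a classification run with decreasing values of `bits` gives an error curve of the kind plotted in Fig. 21.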

C. FPGA Architecture for Parallel Processing One can consider most of the tasks solved by neural networks as operations with arrays (e. g. input vectors and weights). These tasks are often solved with highly parallel architectures. Very often systolic arrays, as best suited structures to this class of problems are used. For performing nearest neighbor decisions with m n-dimensional reference vectors, a 2D systolic array with m rows and n + 1 columns can be used (Fig. 22). Reference vectors are stored in processing elements (PEs), one in each row of the systolic array. The rightmost additional column is used to nd the smallest distance. In case of CBFN, additional elements ~rj = fr ; r ;   g, which determine the extensions of the hyper box from ~ j = fW ; W   g to the dimensions d = the center W f1; 2;   g, when the input vector ~I = fX ; X ;   g is presented, are stored into each PE. Whatever the signal a PE receives from the left neighbour in the row, it has to pass the original input element X to the next PE in the column. If a PE receives `1' from its neighbour:  if jX , W j  r , it passes the `1' to the neighbour in the row;  otherwise it passes `0' to the neighbour in the row. The output of the PEs in the row will be thereafter `0', since the input element is not in the attraction region of this hyper box. One-dimensional systolic arrays can also be considered for implementations (Fig. 23). Then no parameter data is stored in PEs and the parameters of all hyper boxes are applied to the array one after another. If the number of reference vectors (hyper boxes) is m and number of dimension is n, m(n + 2) + n cycles are needed to make a nearest neighbor decision. The last PE in the row stores the actual smallest distance and compares this with its input. In this way the smallest stored distance is updated. Considering hardware restrictions of FPGAs, systolic arrays with many elements could be hardly implemented. 
It is also possible to make an array with a feedback loop. If Single Instruction Multiple Data (SIMD) arrays are used for nearest neighbor classification, reference vectors are stored in local memories, either connected (external) or integrated into each PE. It takes $n$ cycles for a PE to compute the distance between an $n$-dimensional reference vector and an input vector. If there are $k$ PEs, then $k$ distances are computed in parallel. It takes $n + k$ cycles to get the result if the outputs of all PEs are sent to a common data bus sequentially. To classify $m$ input vectors it takes $\frac{m}{k}(n + k)$ cycles. A SIMD array solution which is more appropriate to DVQ variants is illustrated in Section V-D.

Fig. 21. Classification error for different fixed point formats (errors versus accuracy in bits for the data sets "iris_data", "solder_data" and "tyre_data")

Fig. 22. Systolic array (2D) for nearest neighbor classification

Fig. 23. Systolic array (1D) for nearest neighbor classification

Fig. 24. SIMD array (block labels: CPU, PE, RAM, ADDRESS, COMMAND, INPUT VECTOR, DATA IN, DATA OUT)

C.1 Comparison of Different Architectures

For the large application Digit there should be up to 10 output classes. The feature space should have up to 36 dimensions, and the number of dynamically generated neurons can be limited to 150. For every dimension of the feature space 6 to 8 bits can be used.

The first architecture (Fig. 22) can hardly be implemented on 4 FPGAs because of the high number of PEs. Although these PEs are simple, not more than 10 of them can be implemented on one chip (with external memory). It is also important to consider that there are many connections between elements, which in turn take a lot of the routing resources of the FPGA. If each generated neuron is assigned to PEs in this way, the number of PEs needed for this solution is very large: $150 \times 36 + 150 = 5550$.

The second and the third solution seem to be more suitable for FPGAs, but the first of these two architectures is still difficult to implement (Fig. 23). If there are 36 dimensions, 37 PEs are needed. This means that 3 of the FPGA chips should contain 9 PEs and one chip 10 PEs. If three parameters $(X_d, W_{jd}, r_{jd})$ are input to each PE simultaneously, every PE needs $3 \times 8 = 24$ I/O pins and overall $10 \times 24 = 240$ I/O pins are needed, but an XC4013 FPGA has only 192 I/O blocks. If data is presented to the PEs sequentially in 4-bit portions, the number of available I/O blocks is sufficient, but this is, of course, twice as time consuming, and every PE also needs three 4-bit registers for input data.
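The resource estimates in this comparison are simple products and can be checked with a few lines of arithmetic. The helper names below are hypothetical; the figures (150 neurons, 36 dimensions, 10 PEs per chip, three parameters per PE, 192 I/O blocks on an XC4013) are the ones quoted in the text.

```python
def pes_2d(m, n):
    """PEs in the 2D systolic array: m rows with n + 1 columns each."""
    return m * (n + 1)


def pes_1d(n):
    """PEs in the 1D systolic array: one per dimension plus the minimum stage."""
    return n + 1


def io_pins(pes_on_chip, params_per_pe, bits):
    """I/O pins needed when every PE receives all parameters in parallel."""
    return pes_on_chip * params_per_pe * bits


NEURONS, DIMENSIONS = 150, 36
XC4013_IO_BLOCKS = 192

print("2D array PEs:", pes_2d(NEURONS, DIMENSIONS))
print("1D array PEs:", pes_1d(DIMENSIONS))
for bits in (8, 4):
    pins = io_pins(pes_on_chip=10, params_per_pe=3, bits=bits)
    print(bits, "bit transfers:", pins, "pins, fits:", pins <= XC4013_IO_BLOCKS)
```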
The third solution (1D systolic array with a feedback loop) is somewhat slower than the second one, but seems to be easier to implement: fewer PEs are needed, and fewer registers are needed for input data. Since the neurons, which represent reference vectors, are created dynamically, finding the optimal number of PEs is complicated and the highest classification speed is seldom achieved.

None of the 3 architectures presented above allows training. In case of CBFN with MRCE learning, the dimension which causes the misclassification must be known. The systolic array solution, however, shows only whether the classification result is correct or not. If a SIMD array is used for implementing a nearest neighbor classifier and CBFN, every PE needs at least 4 to 5 registers and a separate control unit that can be implemented in a DSP. If an application works in real time and the calculation speed for a single input vector has to be as high as possible, a SIMD array is the best solution.

Fig. 25. Registers of a processing unit (ac: calculated distance; db: data bus buffer; ir: instruction register; enb: enable buffer; cnt: RAM addressing register; w: RAM for weight vectors)

D. SIMD Array with FPGAs

Each dynamically generated neuron is assigned to a processing unit in an FPGA, which calculates the distance between an input vector and the neuron. Since the distances to all the reference vectors (processing elements) have to be calculated for each input vector, a SIMD array can be used for this purpose. Since no multiplication is used in training or in recall operation, the proposed algorithms seem to be very effective. Although a real-time implementation with a DSP board is sufficient for classification benchmarks such as Iris data, it is absolutely necessary to implement the parallel parts of the algorithms with the SIMD array for complicated applications (e.g. the Digit data set, where the number of reference vectors generated is at least 46 in the case of DVQ3). Even though the clock rate of FPGAs is several times lower than that of a powerful DSP solution, the overall speed gain is much higher for such applications due to the massively parallel implementation of processing units in FPGAs.

Depending on the application, either CBFN/MRCE, DVQ2 or DVQ3 can be selected. In case of highly representative training data sets CBFN/MRCE is appropriate. For other applications either DVQ2 or DVQ3 is more suitable, since the CBFN/MRCE method does not have the ability to generalize properly. For highly overlapping data sets DVQ3 gives the best results due to its excellent generalization capability.
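The lock-step operation of such an array can be sketched behaviourally as follows. This is a simplified model, not the VHDL design: the class and function names are hypothetical, the register names ac, cnt and w follow Fig. 25, and the example vectors are invented. One input element is broadcast per cycle and every PE accumulates an absolute difference, so no multiplication is needed.

```python
class ProcessingElement:
    """Behavioural model of one PE; register names follow Fig. 25."""

    def __init__(self, weights):
        self.w = list(weights)  # RAM holding the reference (weight) vector
        self.ac = 0             # accumulator for the distance
        self.cnt = 0            # RAM address register

    def reset(self):
        """Clear the accumulator and the address register."""
        self.ac = 0
        self.cnt = 0

    def step(self, db):
        """Accumulate |w(cnt) - db| for the broadcast input element db."""
        self.ac += abs(self.w[self.cnt] - db)
        self.cnt += 1


def simd_distances(pes, input_vector):
    """Broadcast one input element per cycle to all PEs and collect ac."""
    for pe in pes:
        pe.reset()
    for x in input_vector:   # one broadcast per cycle
        for pe in pes:       # sequential here, parallel in hardware
            pe.step(x)
    return [pe.ac for pe in pes]


# Hypothetical reference vectors, one per PE.
pes = [ProcessingElement([3, 9, 4]),
       ProcessingElement([10, 2, 7]),
       ProcessingElement([5, 5, 5])]
distances = simd_distances(pes, [4, 8, 6])
print(distances)                        # city block distance held in each PE
print(distances.index(min(distances)))  # index of the nearest reference vector
```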

Fig. 26. Structure of a PE

Fig. 27. Calculation of minimum distance (values of the PEs still active, MSB first: step 1: 1101, 1011, 0100, 0011; step 2: 1101, 1011; step 3: 1101; step 4: 1101)
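The bit-serial search illustrated in Fig. 27 can be sketched in software. The sketch below is an assumption-laden model rather than the hardware itself: the function name is hypothetical, the wired-OR bus is modelled as a logical OR over the bits written by the active PEs, and the four input distances are chosen so that their inverted values match the bit patterns shown in the figure.

```python
def wired_or_minimum(distances, width):
    """Find the minimum of unsigned `width`-bit distances, MSB first."""
    mask = (1 << width) - 1
    inverted = [~d & mask for d in distances]  # invert all the bits
    active = [True] * len(distances)
    result = 0
    for bit in range(width - 1, -1, -1):       # from MSB down to LSB
        # Every active PE drives its current bit onto the wired-OR bus.
        bus = 0
        for value, on in zip(inverted, active):
            if on and (value >> bit) & 1:
                bus = 1
        if bus == 1:
            # PEs that wrote a 0 are deactivated.
            active = [on and ((value >> bit) & 1) == 1
                      for value, on in zip(inverted, active)]
        result = (result << 1) | bus
    # Invert the final bus result to obtain the minimum distance.
    return ~result & mask, active


# Distances whose inverted values are 1101, 1011, 0100, 0011.
minimum, active = wired_or_minimum([0b0010, 0b0100, 0b1011, 0b1100], width=4)
print(format(minimum, "04b"))  # minimum distance as a 4-bit string
print(active)                  # which PEs are still active at the end
```

The search needs one bus cycle per bit, independent of the number of PEs, which is the reason a wired-OR bus suits a large SIMD array.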

D.1 Distance Calculation

The registers implemented in a processing unit are shown in Fig. 25. The 64-word RAM w with 6-bit word length stores the weight vectors for 64 dimensions. The accumulator ac stores the calculated "distance", and cnt is used for indirect addressing of the RAM. Table III shows the number of Configurable Logic Blocks (CLBs) occupied by each register of a PE.

TABLE III
Number of CLBs needed for registers

Register     Width [bit]   Number of CLBs
w (RAM)      6 x 64        12
ac           12            6
ir           3             1.5
cnt          6             3
db           6             3
enb          1             0.5

The following register transfers are implemented for the calculation of the city block distance.
• adac calculates the city block distance according to equation (9): $ac \leftarrow ac + |w(cnt) - db|$, $cnt \leftarrow cnt + 1$
• ldcnt loads the index register cnt: $cnt \leftarrow db$
• rdac reads the accumulator ac of a PE: $dbus \leftarrow ac$
• ldac initializes the accumulator ac: $ac \leftarrow db$
• ldwc initializes the weight vectors of a neuron: $w(cnt) \leftarrow db$, $cnt \leftarrow cnt + 1$

The processing elements described in VHDL are synthesized with a commercial tool (Synopsys) to obtain the FPGA net list (see also Fig. 26).

D.2 Nearest Neighbor Calculation

After the calculation of the city block distances, the minimum distance has to be determined. The method presented in Fig. 27 uses a Wired-OR bus for this purpose [HPaMG94].
• Invert all the bits.
• Starting from the most significant bit (MSB), for all the bits:
  - all PEs activated by the controller write the current bit of their distance to the Wired-OR bus;
  - if the resulting bit on the bus is "1": deactivate all the PEs that have written a "0" to the bus, then move to the next bit;
  - if the resulting bit on the bus is "0": move to the next bit.
• The final result is inverted to get the minimum distance.

In the example shown in Fig. 27 the minimum distance is "0010".

VI. Conclusions and Future Work

The presented compiler FUZ2LCA for automatic generation of fuzzy controller implementations on FPGAs has been tested with several application examples. Since the rules are hard-wired, this concept had to be improved for the automated design of fuzzy ASICs, where re-programmability of the rules is required. Since high-level design entry makes mapping onto different target technologies possible (standard cell libraries, FPGA libraries, ...), a VHDL-based approach is well suited for a fuzzy CAD toolkit. An additional advantage is that the system can already be simulated on the behavioral level in order to validate whether the selected bit widths for internal and external signals are sufficient for achieving a required computation precision. Since the whole system is instantiated from a generic VHDL library, basic design faults can be excluded.

The basic modules of the VHDL library are already available. Currently the modules are being integrated into the complete controller structure. The fuzzyCAD design manager is currently being implemented with a hypertext-based user interface. Furthermore, implementation cost models serving as estimators for timing and area will be developed. Additionally, a defuzzification module for a FuNe I configurable fuzzy system is currently being specified in VHDL. The neuro-fuzzy approaches can either deliver fuzzy modules that can be implemented by the generic fuzzy processor, or they are hardware-friendly and fuzzy-interpretable neural structures that are directly considered as fuzzy hardware solutions.

References

[And35] E. Anderson. The Irises of the Gaspe Peninsula. Bull. Amer. Iris Soc., 59:2-5, 1935.
[Bez93] J. C. Bezdek. A Review of Probabilistic, Fuzzy, and Neural Models for Pattern Recognition. Journal of Intelligent Fuzzy Systems, 1, 1993.
[DHR93] D. Driankov, H. Hellendoorn, and M. Reinfrank. An Introduction to Fuzzy Control. Springer-Verlag, USA, 1993.
[HG94] S. K. Halgamuge and M. Glesner. Neural Networks in Designing Fuzzy Systems for Real World Applications. International Journal for Fuzzy Sets and Systems, 65(1):1-12, 1994. North Holland.
[HG95] S. K. Halgamuge and M. Glesner. Fuzzy Neural Networks: Between Functional Equivalence and Applicability. IEE International Journal on Neural Systems (in press), 1995. World Scientific Publishing.
[HHKG94] S. K. Halgamuge, T. Hollstein, A. Kirschbaum, and M. Glesner. Automatic Generation of Application Specific Fuzzy Controllers for Rapid Prototyping. In IEEE International Conference on Fuzzy Systems '94, Orlando, USA, June 1994.
[HPaMG94] S. K. Halgamuge, W. Pochmuller, C. Grimm, and M. Glesner. Fuzzy Interpretable Dynamically Developing Neural Networks with FPGA Based Implementation. In Fourth International Conference on Microelectronics for Neural Networks and Fuzzy Systems, Torino, Italy, September 1994.
[HPG95] S. K. Halgamuge, W. Pochmuller, and M. Glesner. An Alternative Approach for Generation of Membership Functions and Fuzzy Rules Based on Radial and Cubic Basis Function Networks. International Journal of Approximate Reasoning (in press), 1995. Elsevier.
[HPmG93] S. K. Halgamuge, W. Pochmuller, and M. Glesner. A Rule based Prototype System for Automatic Classification in Industrial Quality Control. In IEEE International Conference on Neural Networks '93, pages 238-243, San Francisco, USA, March 1993. IEEE Service Center, Piscataway. ISBN 0-7803-0999-5.
[HRG94] S. K. Halgamuge, T. A. Runkler, and M. Glesner. A Hierarchical Hybrid Fuzzy Controller for Realtime Reverse Driving Support of Vehicles with Long Trailers. In IEEE International Conference on Fuzzy Systems '94, Orlando, USA, June 1994.
[Hun95] D. L. Hung. Dedicated Digital Fuzzy Hardware. IEEE MICRO, 15(4), August 1995.
[Koh89] T. Kohonen. Self-Organization and Associative Memory. Springer Verlag, 1989.
[MA75] E. H. Mamdani and S. Assilian. An experiment in linguistic synthesis with a fuzzy logic controller. IJMMS 7, 1975.
[SU95] H. Surmann and A. P. Ungering. Fuzzy Rule-Based Systems on General-Purpose Processors. IEEE MICRO, 15(4), August 1995.