IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 7, NO. 5, OCTOBER 1999
VLSI Hardware Architecture for Complex Fuzzy Systems
Giuseppe Ascia, Vincenzo Catania, and Marco Russo, Member, IEEE
Abstract—This paper presents the design of a VLSI fuzzy processor capable of dealing with complex fuzzy inference systems, i.e., fuzzy inferences that include rule chaining. The architecture of the processor is based on a computational model whose main features are: the capability to cope effectively with complex fuzzy inference systems; a detection phase that identifies the rules with a positive degree of activation, reducing the number of rules to be processed per inference; parallel computation of the degree of activation of active rules; and representation of membership functions based on α-level sets. As the fuzzy inference can be divided into different processing phases, the processor is made up of a number of pipelined stages. In each stage several inference processing phases are performed in parallel. Its performance is on the order of 2 MFLIPS with 256 rules, eight inputs, two chained variables, and four outputs, and 5.2 MFLIPS with 32 rules, three inputs, and one output, at a clock frequency of 66 MHz.

Index Terms—Fuzzy logic, parallel architecture, pipelining, rule chaining, VLSI.
Manuscript received April 15, 1997; revised May 18, 1999. G. Ascia and V. Catania are with the Istituto di Informatica e Telecomunicazioni, Università di Catania, Catania, 95125 Italy. M. Russo is with the Dipartimento di Fisica, Università di Messina, Salita Sperone 31-98166, Sant'Agata, Messina, Italy and with the INFN-Section of Catania, C.so Italia 57, 95129 CT Italy. Publisher Item Identifier S 1063-6706(99)08730-5.

I. INTRODUCTION

In recent years, the use of fuzzy logic [1], [2] has become increasingly widespread thanks to its capacity to tolerate information expressed in a way that is uncertain and imprecise. This is demonstrated by the significant number of fields in which fuzzy logic is applied, such as process control [3], [4], decision-making support systems [5], expert systems [6], databases [7], and image processing [8]. A substantial role is played in this context by the computation structures that have to process applications developed using fuzzy logic. These structures have to keep up with the evolution outlined above, in the sense that they are destined to be used for intense computation activity as the level and quantity of "fuzzy computing" in complex systems gradually increases. This is obviously a challenge to the architects of hardware structures dedicated to fuzzy logic, as they have to design architectures and technologies that can cope with a demand extending from simple fuzzy microcontrollers to parallel computing for fuzzy applications, which require the processing of concurrent fuzzy inferences. Of course, the architectural complexity of the hardware varies according to the type of application involved. A feature common to several fuzzy system applications in the field of
control [9], [10], for example, is the regular homogeneous structure of their knowledge bases; they are typically made up of fuzzy conditional rules in which the antecedents operate on the same set of variables and the consequents have no links with the antecedents. Other classes of applications [11], [12], on the other hand, require greater structural complexity in the knowledge base and may include rule chaining. This leads to an increase in the complexity of the fuzzy inference execution models, as intermediate variables present in the antecedents of some rules and the consequents of others have to be dealt with [13]. This feature is to be found, for instance, when an application is split up into a set of cooperating fuzzy inferences, the outputs of some of which appear as inputs to other inferences (Fig. 1). In such cases the hardware architecture is structurally more complex and cost-effective implementation can only be obtained by using appropriate inference computation strategies. It is, in fact, necessary to provide for synchronization mechanisms between chained rules and to use methods which will reduce the amount of information they need to exchange. Finding appropriate computational methods can drastically reduce the time required to compute fuzzy inferences. In turn, the architectural model directly affects the processing speed obtainable, according to whether it exploits a certain degree of parallelism, or incorporates solutions which optimize the computation of specific functions. In this paper, we address these issues, proposing a design for a VLSI fuzzy processor that is capable of dealing with complex fuzzy knowledge bases that also include rule chaining. 
The choice of the architecture was based on the definition of a computational model, the main features of which are:
• the capability to cope effectively with rule chaining;
• a detection process, applied to each input pattern, which only extracts rules with a nonnull degree of activation from the knowledge base, thus achieving a significant reduction in the number of rules to be processed per inference;
• representation of membership functions based on α-level sets;
• division of the various inference processing phases into steps that can be executed in parallel;
• use of a defuzzification technique that can be cost-effectively implemented in hardware.

The architecture of the processor, the definition of which is based on the computational model used, is made up of a number of pipeline stages in which several inference processing phases are performed in parallel.
1063–6706/99$10.00 © 1999 IEEE
Fig. 1. A set of cooperating fuzzy inferences.
The processor was modeled in very high speed integrated circuit hardware description language (VHDL), successfully simulated, and then synthesized using HCMOS5 as the target technology, requiring a silicon area of only 30 mm². The processing speed, expressed in fuzzy logic inferences per second (FLIPS), reaches 2 MFLIPS.

The paper is organized as follows. Section II is a brief overview of fuzzy logic, analyzing the inference processing techniques used. Section III defines the most critical aspects that need to be dealt with to make such an execution model efficient and then illustrates the model on which the processor architecture is based. Section IV shows the method used to compute the degree of truth of an antecedent. Section V presents the design of the fuzzy processor. In this section, the latency for each pipeline stage is evaluated in order to choose the appropriate number of parallel operating units. The methodology used to design the processor is also illustrated. Section VI gives a detailed explanation of the blocks it comprises. Section VII is a cost and performance assessment of the processor. Section VIII gives the authors' conclusions. The symbols used are defined in Table I.
II. FUZZY LOGIC

A. Fuzzy Sets and Fuzzy Rules

Fuzzy set theory can be seen as an extension of traditional set theory: a fuzzy set $A$, in fact, has a membership function

$$\mu_A : U \rightarrow [0, 1] \tag{1}$$

which associates each element in the universe of discourse $U$ with a real value between zero and one. Fuzzy sets [14] can also be defined by means of their families of $\alpha$-level sets, according to the resolution identity theorem [15]. Given a fuzzy set $A$ (Fig. 2), its $\alpha$-level sets, $A_\alpha$, are given by the following identity:

$$A_\alpha = \{x \in U \mid \mu_A(x) \geq \alpha\}, \quad \alpha \in [0, 1]. \tag{2}$$

If the fuzzy set $A$ is convex, $A_\alpha$ can be represented with a closed interval $[a_\alpha^{l}, a_\alpha^{r}]$.

A fuzzy inference is characterized by a set of conditional fuzzy rules

IF $x_1$ is $P_1$ AND $x_2$ is $P_2$ AND … AND $x_n$ is $P_n$ THEN $y$ is $Q$

which are used to describe a fuzzy system, and by a set of fuzzy inputs

$x_1$ is $X_1$ AND $x_2$ is $X_2$ AND … AND $x_n$ is $X_n$

where $P_i$, $Q$, and $X_i$ are fuzzy sets. As has been seen, a typical conditional rule assumes a form like the following:

If Premise Then Conclusion

where the premise is made up of a set of terms ($x_i$ is $P_i$), called antecedents, linked by fuzzy operators (AND, OR). The AND operator can be defined as conjunction, while the OR operator can be defined as disjunction. In this paper, the fuzzy rules contain only the AND operator. The premise of a rule defines the conditions in which the conclusion has to be applied. The conclusion defines the actions to be taken when the conditions of the premise are satisfied. Using the MAX–MIN inference method, the degree of activation $\theta$ of the premise can be obtained by

$$\theta = \min(\alpha_1, \alpha_2, \ldots, \alpha_n) \tag{3}$$

where $\min$ is the minimum operator and $\alpha_i$ is the degree of truth of the $i$th antecedent ($x_i$ is $P_i$), which is obtained by means of the intersection between the fuzzy set $P_i$ of the knowledge base and the fuzzy input $X_i$. If we call $X_{i\alpha}$ and $P_{i\alpha}$ the $\alpha$-level sets of the fuzzy sets $X_i$ and $P_i$, respectively, and refer to the generic antecedent ($x_i$ is $P_i$) of a fuzzy rule, the degree of activation of the $i$th antecedent is given by the following relation:

$$\alpha_i = \max\{\alpha \in [0, 1] \mid X_{i\alpha} \cap P_{i\alpha} \neq \emptyset\}. \tag{4}$$

B. Rule Chaining

In [13], the authors present a method which considerably simplifies calculation of the degree of truth of antecedents with intermediate variables. In this section, we describe how to obtain the degree of truth when the fuzzy sets are defined by means of their families of $\alpha$-level sets. On the basis of this method we deal with two kinds of chaining, single and multiple; in the former the intermediate variable is present in the consequent of one rule and in the antecedent of another.
TABLE I TABLE OF SYMBOLS
In multiple chaining the intermediate variable is present in the consequents of several rules and in the antecedent of another rule.

Let us first consider single chaining. We will assume that we have the following rules:

$R_1$: IF $x_1$ is $P_1$ AND $x_2$ is $P_2$ THEN $z$ is $Q_1$
$R_2$: IF $z$ is $P_z$ AND $x_3$ is $P_3$ THEN $y$ is $Q_2$

Let $\theta_1$ be the degree of activation of the rule $R_1$, and $\alpha_i$ and $\alpha_z$ the degrees of truth of the antecedents "$x_i$ is $P_i$" ($i = 1, 2, 3$) and "$z$ is $P_z$," respectively. As $z$ is an output for the rule $R_1$ and an input for the rule $R_2$, we have

$$\alpha_z = \min(\theta_1, \lambda) \tag{5}$$

where $\lambda$ represents the maximum degree of intersection between the fuzzy sets $Q_1$ and $P_z$. This value can be determined off-line if the term set of the intermediate variable is known. This allows $\alpha_z$ to be calculated by an operation between scalar magnitudes, which simplifies the rule-chaining solution considerably.

Let us now consider multiple chaining:

$R_1$: IF $x_1$ is $P_1$ AND $x_2$ is $P_2$ THEN $z$ is $Q_1$
$R_2$: IF $x_3$ is $P_3$ AND $x_4$ is $P_4$ THEN $z$ is $Q_2$
$R_3$: IF $z$ is $P_z$ AND $x_5$ is $P_5$ THEN $y$ is $Q_3$
$R_4$: IF $z$ is $P_z$ AND $x_6$ is $P_6$ THEN $y$ is $Q_4$
As can be seen in the example given above, the variable $z$, which chains rules $R_1$ and $R_2$ with $R_3$ and $R_4$, has two different term sets: a consequent term set, which is only present in the conclusion of the rules, and an antecedent term set, which is present in the premise of the chained rules. If $\theta_1$ and $\theta_2$ are the degrees of activation of rules $R_1$ and $R_2$, and $\alpha_z$ is the degree of truth of the antecedent ($z$ is $P_z$), it follows that

$$\alpha_z = \max\bigl(\min(\theta_1, \lambda_1), \min(\theta_2, \lambda_2)\bigr). \tag{6}$$

Considering that the knowledge base is known, the terms $\lambda_j$ (the maximum degree of intersection between the consequent fuzzy set of rule $R_j$ and $P_z$) can be precalculated and stored in the memory. The operations needed to calculate the degree of truth of a chained antecedent, therefore, as can be deduced from (6), reduce to the maxima and minima between the scalar magnitudes $\theta_j$ and $\lambda_j$. These examples of the solution of single and multiple chaining can easily be generalized to cover cases in which the knowledge base contains several intermediate variables and/or several chained rules.

Fig. 2. Fuzzy set definition.

C. Defuzzification

One of the choices a fuzzy system designer has to make is the method of defuzzification to use. Various methods have been proposed [16], [17]. The defuzzification method used here to obtain the output value is the one proposed by Yager [18]. Let us suppose we have a set of fuzzy rules and consider the function given in (7), which involves the universe of discourse of the output. This function is defined from the consequent of each rule. Given the degree of activation of each rule, the output is given by (8). The defuzzified output is then determined as in (9).

The method proposed by Yager gives results that are comparable with classical methods (center of gravity, mean of maxima). The most interesting property of the method is the greater simplicity in hardware implementation. It is, in fact, no longer necessary to calculate the output fuzzy set. This means a considerable reduction in silicon area and an increase in time performance as well.

III. EXECUTION MODEL FOR THE PROCESSOR

The choice of a computational model for the processing of inferences has a considerable impact on the performance that a hardware implementation can offer. In this section, we will first indicate the most critical aspects that need to be dealt with to make such a model efficient and then illustrate the solutions proposed.

A. Main Issues
The choice greatly depends on the typical characteristics of inferences. The description of fuzzy inference presented in Section II-A shows that computation methods for the calculation of inferences are inherently parallel: several antecedents and/or rules can be processed concurrently. A hardware implementation can fully exploit this parallelism. There can, in fact, be several processing units operating in parallel, some dedicated to calculating the degree of truth $\alpha$ of the antecedents and others to calculating the degree of activation $\theta$ of the rules. The literature provides various examples based on this approach [20], [22]. Although a considerable increase in processing speed can be obtained using this kind of parallelism, the number of processing units operating in parallel has to be limited. As the number of processing units increases, in fact, so does the silicon area needed. It is, therefore, important to look for computation methods that not only exploit parallelism, but also optimize the various phases of inference processing. We refer in particular to methods which, for the same application, allow both the number of antecedents and the number of rules to be calculated per inference to be reduced. These methods can be derived by analyzing some typical features of a fuzzy application.

As far as calculation of $\alpha$ is concerned, two observations can be made.
1) The knowledge base very often contains antecedents that appear in more than one rule. It is therefore inefficient to repeat calculation of the degree of truth of the same antecedent several times. A significant improvement can be achieved if the degree of truth of an antecedent is only calculated once and then stored in a memory, to be "used" again later in the rules in which it appears.
2) Typically, an input variable only has a nonnull intersection with a fraction of the fuzzy sets of its term set. It is therefore useless to calculate antecedents that present a null degree of truth. This can be avoided by finding inexpensive methods, in terms of latency, to identify all and only the antecedents with a positive degree of truth for each input pattern. It will be on these antecedents that the value of $\alpha$ is calculated.

As far as calculation of $\theta$ is concerned, it can be remarked that it is useless to process rules with a null degree of activation as they make no contribution to calculation of the
Fig. 3. Input detection. Fig. 4. Fuzzy rule organization.
inferences. Assuming that the antecedents are only linked by AND operators, it makes sense to process only those rules, henceforward referred to as active rules, in which all the antecedents have a positive degree of truth. As the latter, according to what was said above in point 2), are often quite a low percentage of the total number of antecedents, the number of rules to be processed can be considerably reduced, with a consequent improvement in performance.

An improvement in performance based on optimization of the calculation of $\alpha$ and $\theta$, as illustrated, can of course only be economical if it is possible to find techniques requiring a low latency and a small silicon area for the detection of active rules and antecedents. Bearing in mind that by the detection of active rules the only rules processed are those which have a nonnull degree of activation, a further improvement can be obtained in the calculation of $\alpha$: the antecedents processed are only those which have a positive degree of truth. As we saw previously, they are detected in a preliminary phase, before detection of the active rules.

B. The Inference Execution Model

On the basis of the considerations made in Section III-A, we propose an inference execution model that performs the following tasks:
a) acquisition of fuzzy input sets;
b) detection of antecedents with a positive degree of truth;
c) detection of active rules;
d) computation of the antecedents with a positive degree of truth;
e) computation of the active fuzzy rules;
f) defuzzification.

As it is assumed that each membership function is made up of segments and the truth space is divided into sectors (see the Appendix), Task a) is performed by first acquiring the segments belonging to the sector that corresponds to the support of the fuzzy sets, and then the others.

Task b) is performed by detecting a vector $V_i$ for the term set of each input variable $x_i$, where the generic element $v_{ij}$ refers to the antecedent ($x_i$ is $P_{ij}$) with degree of truth $\alpha_{ij}$:

$$v_{ij} = \begin{cases} 1 & \text{IF } \alpha_{ij} > 0 \\ 0 & \text{IF } \alpha_{ij} = 0. \end{cases} \tag{10}$$
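Tasks b) and c) can be illustrated with a minimal sketch. It assumes that every convex fuzzy set is summarized by its support, given as an open interval whose end points have zero membership, so that a positive degree of truth is equivalent to a strict overlap of the two supports. The function names and the dictionary-based rule format are hypothetical, chosen only for illustration; they are not taken from the processor design.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (left, right) end points of a support


def supports_overlap(a: Interval, b: Interval) -> bool:
    """True when the two open intervals share more than a boundary point."""
    return max(a[0], b[0]) < min(a[1], b[1])


def detection_vector(input_support: Interval,
                     term_supports: List[Interval]) -> List[int]:
    """One bit per fuzzy set of the term set: 1 if the degree of truth
    of the corresponding antecedent can be positive, 0 otherwise."""
    return [1 if supports_overlap(input_support, s) else 0
            for s in term_supports]


def active_rules(rules: List[Dict[str, int]],
                 vectors: Dict[str, List[int]]) -> List[int]:
    """Indices of rules whose antecedents all have a detection bit of 1.

    Each rule maps a variable name to the index of the fuzzy set used
    in its antecedent (assumed format); the test is a logical AND.
    """
    return [idx for idx, rule in enumerate(rules)
            if all(vectors[var][term] for var, term in rule.items())]


# Hypothetical knowledge base with two input variables.
term_sets = {"x1": [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)],
             "x2": [(0.0, 4.0), (3.0, 7.0)]}
inputs = {"x1": (1.0, 3.0), "x2": (5.0, 6.0)}
vectors = {var: detection_vector(inputs[var], term_sets[var])
           for var in term_sets}
rules = [{"x1": 0, "x2": 1}, {"x1": 2, "x2": 1},
         {"x1": 1, "x2": 0}, {"x1": 1}]
active = active_rules(rules, vectors)
```

In this example only the first and last rules survive detection, so the later computation stages would process two rules instead of four.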
The value of the generic element $v_{ij}$ can be calculated, in the hypothesis that the fuzzy sets $X_i$ and $P_{ij}$ are convex, by taking the corresponding supports supp($X_i$) and supp($P_{ij}$) (see Fig. 3) and then calculating the intersection between the intervals defining them. As the end points of these intervals coincide with those of the segments belonging to the sector that corresponds to the support, the vectors $V_i$ can be calculated as soon as the segments in that sector are acquired. The vectors $V_i$ are used to detect the rules that have a positive degree of activation and to select the antecedents with a positive degree of truth. The two activities are independent and can thus be performed in parallel. How to calculate the degree of truth will be dealt with in Section IV. Now let us see how active rules are detected. If a fuzzy rule of the following kind is being evaluated:

IF $x_1$ is $P_{1j_1}$ AND $x_2$ is $P_{2j_2}$ AND … AND $x_n$ is $P_{nj_n}$ THEN $y$ is $Q$

the rule is active only if all of the antecedents have a positive degree of truth, i.e., if and only if all the values $v_{ij_i}$, with $i = 1, \ldots, n$, are equal to one. So it is sufficient to perform a logical AND between these values. If the result is one the rule is active; otherwise it is not active. This operation could quite simply be performed for each rule in the knowledge base. It is, however, possible to reduce the number of rules on which this test is carried out, by appropriate organization of the set of rules.

More specifically, the set of fuzzy rules can be divided into groups using the following procedure. Having chosen an input variable $x_k$, let $m$ be the number of distinct antecedents in which this variable appears. $m$ distinct groups are formed, in which the $j$th group, $G_j$, contains all the rules in which the antecedent ($x_k$ is $P_{kj}$) appears. The group $G_0$ is also formed, containing all the rules not inserted into any of the groups $G_1, \ldots, G_m$. Fig. 4 shows how the knowledge base is organized in groups. On the basis of this organization, it can be observed that if the degree of truth of the antecedent ($x_k$ is $P_{kj}$) is null, the corresponding group $G_j$ does not contain any active rules. The search for active rules can therefore be limited to the groups $G_j$ in which the antecedent ($x_k$ is $P_{kj}$) has a positive
Fig. 5. Positions of fuzzy input X with respect to fuzzy set P.
degree of truth and to the group $G_0$, which doesn't contain rules with the antecedent ($x_k$ is $P_{kj}$).

From an operational point of view, in our model, the detection of active rules requires the following steps. The vector $V_k$ is scanned and for each $v_{kj} = 1$ the corresponding group $G_j$ is accessed to select the active rules it contains. The procedure is then completed by accessing the group $G_0$ and selecting all the active rules it contains.

In the course of Task e) the degree of activation of the premise of each active rule is calculated and as a result the two sums used for defuzzification are updated. The degree of activation of the rules is obtained by parallel calculation of the minimum between the degrees of truth of the antecedents calculated during Task d). In particular, $\theta$ is first calculated for active rules that do not contain intermediate variables; then, once the antecedents for the intermediate variables can be calculated, it is computed for the remaining rules. During the processing of the rules without intermediate variables the degree of truth of the antecedents associated with the intermediate variables is calculated concurrently. The degree of truth of these antecedents is obtained by the following maximum operation:

$$\alpha_z = \max_{j} \min(\theta_j, \lambda_j). \tag{11}$$

As the maximum is associative and commutative, this value can be obtained sequentially:

$$\alpha_z^{(j)} = \max\bigl(\alpha_z^{(j-1)}, \min(\theta_j, \lambda_j)\bigr), \quad \alpha_z^{(0)} = 0. \tag{12}$$

The value of $\alpha_z$ is obtained by updating the partial value whenever a new active rule is processed. In this way, as soon as all rules without intermediate variables have been processed, the values of the degree of truth of the chained antecedents are available. Defuzzification is performed when the final value of the two sums is made available by Task e).

Computation of the fuzzy inference is divided into four pipelined stages:
Stage 1) Acquisition of the inputs and calculation of the vectors $V_i$ are performed in parallel; as the acquisition of each input is independent of the others, several inputs can be acquired at the same time. It is also possible to calculate several vectors $V_i$ in parallel.
Stage 2) The degree of truth of the antecedents is calculated and the active rules are detected. These two actions are performed in parallel. In addition, several antecedents with a positive degree of truth are processed in parallel. Again, rule detection is performed on several rules at once.
Stage 3) In this stage only computation of the active rules is performed. Here again, however, a certain degree of parallelism is possible: several active rules can be processed at once, and the degree of activation of each is calculated by parallel calculation of the minimum of the degrees of truth of the antecedents.
Stage 4) This stage involves defuzzification of the outputs. As in the other cases, it can be performed by parallel calculation of the various outputs.

IV. DEGREE OF TRUTH OF AN ANTECEDENT
Computation of the degree of truth of an antecedent (an example of which is shown in Fig. 5) involves the following subsequent phases:
1) the truth space sector containing the truth value of the antecedent is identified;
2) the position of the $\alpha$-level set of the fuzzy set $P$, $P_\alpha$, with respect to the $\alpha$-level set of the fuzzy input $X$, $X_\alpha$, is evaluated;
3) the truth value is calculated.

Identifying the sector containing the degree of truth is equivalent to identifying the sector in which the segments defining the membership functions of the fuzzy sets intersect. This happens when conditions (13a) and (13b) hold, where the maximum degree of truth in a given sector is given by (14). The sector is located by means of a binary search, so the number of steps is logarithmic in the number of sectors.

Fig. 6 shows the algorithm used to perform the three phases. The variables $S$ and POS, respectively, indicate the current sector and the position of the $\alpha$-level set of the fuzzy set $P$, $P_\alpha$, with respect to the $\alpha$-level set of the fuzzy input $X$, $X_\alpha$. In the third phase, if $X_\alpha$ is on the left with respect to $P_\alpha$, $\alpha$ will be computed using the left side of $P$ and the right side of $X$. If $P_\alpha$ is on the left with respect to $X_\alpha$, $\alpha$ will be computed using the right side of $P$ and the left side of $X$. Otherwise the degree of truth is equal to one.

Fig. 6. Algorithm used to calculate the degree of truth.

Fig. 7. Computation of the degree of truth in the sector S3.

Given the two segments which intersect in the sector, their points of maximum degree, and their slopes, the degree of truth of the antecedent with respect to the maximum degree of the sector is obtained from (15). Under the assumption made on the segment parameters, (15) can be rewritten as (16). The values of the relation appearing in (16) are stored in a special memory, which contains only a limited number of words. The truth value can then be calculated as in (17) from the mantissa and exponent of the relation.

Fig. 8. The hardware architecture.
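The sector search can be sketched in software as follows. This is a simplified illustration, not the circuit of Fig. 6: it assumes triangular membership functions and a small set of discrete truth levels, and it returns the highest level at which the two α-level intervals still intersect, i.e., a quantized version of relation (4). Because the α-level sets of a convex fuzzy set are nested, the intersection test is monotone in the level and a binary search applies. The refinement inside the sector through the segment slopes is not reproduced, and all names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def triangle_cut(a: float, b: float, c: float, level: float) -> Interval:
    """Alpha-level interval of a triangular fuzzy set with support (a, c)
    and peak at b (assumed shape, used only for this illustration)."""
    return (a + level * (b - a), c - level * (c - b))


def intervals_intersect(p: Interval, q: Interval) -> bool:
    """True when two closed intervals share at least one point."""
    return p[0] <= q[1] and q[0] <= p[1]


def degree_of_truth(x_cuts: List[Interval], p_cuts: List[Interval],
                    levels: List[float]) -> float:
    """Highest level whose cuts intersect, or 0.0 if none does.

    `levels` must be sorted in increasing order; cut k corresponds to
    levels[k]. The loop takes about log2(len(levels)) steps.
    """
    if not intervals_intersect(x_cuts[0], p_cuts[0]):
        return 0.0
    lo, hi = 0, len(levels) - 1  # invariant: cuts intersect at index lo
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if intervals_intersect(x_cuts[mid], p_cuts[mid]):
            lo = mid
        else:
            hi = mid - 1
    return levels[lo]


levels = [0.25, 0.5, 0.75, 1.0]
x_cuts = [triangle_cut(0.0, 2.0, 4.0, lv) for lv in levels]
p_cuts = [triangle_cut(2.0, 4.0, 6.0, lv) for lv in levels]
truth = degree_of_truth(x_cuts, p_cuts, levels)
```

For these two triangles the right side of the input and the left side of the knowledge-base set cross at a membership value of 0.5, which is the level the search returns.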
V. FUZZY PROCESSOR DESIGN

Fig. 8 depicts the architecture of the proposed fuzzy processor. According to the computational model described in Section III, it is organized as a cascade of four pipeline stages.

In the first stage the input/output (I/O) INTERFACE acquires the fuzzy inputs, which are stored in IN MEM and IN BASE MEM, while the intersection detectors (ID's), operating in parallel, detect the detection vectors. In the second stage the rule detection unit (RDU) detects the active rules, while the α-UNIT's, operating in parallel, compute the positive degrees of truth of the antecedents. The third stage, which is performed by the execution units (EU's), computes the fuzzy rules. The MAX units are used to calculate the degree of truth of the intermediate variables. If there is only one execution unit, the MAX units are not required. In the last pipeline stage, the defuzzifier calculates the outputs using the Yager method.

A. Balancing of Latency

As performance in pipeline architectures is limited by the slowest stage, a balancing of latency between stages is necessary to exploit the parallelism fully. The latency in each stage depends on the number of units operating in parallel. Let us evaluate the latency for each stage, in order to choose this number appropriately.

The acquisition of input values consists of two steps. In the first, the segments of the input variables belonging to the sector that corresponds to the support are acquired. The extrema of these segments define each input support. The latency of this step, in clock cycles, is given by (18) in terms of the size of the input bus and the number of bits per segment. In the second step the other segments are acquired. Considering the number of separate membership functions that have to be acquired and the number of segments each contains, the latency required, in clock cycles, totals (19). As soon as the support segment of the first variable is acquired, the ID's begin to produce the detection vectors. This activity requires the latency given by (20), in clock cycles. As the generation of the detection vectors and the acquisition of the membership functions of the input variables are performed in parallel, the latency of the first stage, in clock cycles, is given by (21).

As the α units operate in parallel with the RDU in the second stage, the latency of this stage is the maximum between the latency of the α units and that of the RDU. Each α unit computes the degrees of truth of the antecedents of the variables assigned to it, up to the maximum number of antecedents with a positive degree of truth. As computation of α is split into two pipelined steps and the first step is the slowest, the α unit latency, in clock cycles, is given by (22) in terms of the latency of the first step of computation. Given the fraction of the rules in the knowledge base belonging to the groups in which the antecedent of the chosen variable has a positive degree of truth, and the number of rules processed in parallel, the latency of the RDU, in clock cycles, is given by (23).

The latency of the third stage depends on the time required to process all the active rules. Its value, expressed in clock cycles, is given by (24), as each of these rules is computed in one clock cycle. This value is not known a priori as it depends on the fraction of active rules.

In the fourth pipeline stage the defuzzified values of the outputs are calculated. This is performed by dividers, each of which gives the defuzzified value in a fixed number of clock cycles. The latency of this pipeline stage, in clock cycles, is given by (25).

To design a good architecture, it is necessary to try to balance the throughput for the various pipelined units as far as possible. In mathematical terms, the latencies of the four stages should all be equal to the maximum delay of all of the pipeline stages. Bearing in mind that the first pipeline stage also comprises two subunits, the same condition applies to their latencies. From this "equality" we obtain (26a) and (26b). Taking these results into account and manipulating the previous equality, we get (26c)–(26f).

B. Fuzzy Processor Design Methodology

In this section we will illustrate a procedure for the design of a fuzzy processor which reflects our architectural proposal, starting from the specifications requested by a user or a foundry. As the reader will have deduced, the architecture we propose is quite general in nature. This means that the implementation limits are not architectural but are dictated by the final cost of the product. This is quite innovative, as many of the architectures presented in the literature have architectural limits.

The algorithmic design proposal we present is not a significant departure from the methodology used in designing classical microprocessors; deliberately starting from this methodology, it aims to provide general guidelines for the design of a "good" fuzzy processor, taking into account past experience in the design of fuzzy processors at a worldwide level and the theoretical results we have obtained, as illustrated in the previous section.

The Methodology Proposed: In the design of the fuzzy processor, the first two magnitudes that have to be specified concern the I/O of the processor. It is, in fact, necessary to define the number of input, output, and chained variables. This choice consequently determines the format of the rules, i.e., the maximum number of antecedents and consequents present in a rule. In parallel, it is necessary to choose the technological process for implementation of the processor. On the basis of this choice we can obtain the maximum clock frequency that it is possible to use. In our architecture, in fact, the
memories have to have a cycle equal to that of the clock. So, having specified the technology, the minimum memory access time and, thus, the maximum clock frequency immediately follow.

The next step consists of synthesizing the elementary blocks with the technology chosen. This allows us to evaluate the latency expressed in clock cycles for each block. By synthesizing the various units making up the processor we have seen that almost all the blocks have a latency of a single clock cycle, except for the α units and dividers. This synthesis also gives us detailed information about the area required to implement each block.

At this point, it is possible to consider the constraints concerning the maximum silicon area available for the final implementation of the processor. From our experience, we have seen that the total area can usually be roughly divided into two parts: one devoted to the internal memories and one to the architecture itself. The area required for the control part is usually neglected at this stage of the design. Typically between 40 and 60% is reserved for the memories and the remaining part is left for the rest. So, having set a bound for the internal memory and chosen the number of inputs, outputs, and chained variables, it is possible to make an initial estimate of the maximum number of rules, the number of sectors per fuzzy set, and the number of bits per segment. It is, however, necessary to remember to reserve part of the memory for rule chaining.

The last step, which is also the most complex one, consists of choosing the desired performance level of the processor in terms of inferences calculated per second, still respecting the constraints relating to memory area. Choice of performance is equivalent to choosing the latency of each pipeline stage. The value of this latency determines, by (26), the number of units that have to be present in each pipeline stage and, consequently, the silicon area occupied.
A processor designer cannot define the number of active rules a priori as it depends on the type of fuzzy application. An optimal choice of the number of units requires knowledge of the features of the fuzzy system that will use the processor. A possible solution, as no fuzzy benchmarks are available, of is to formulate hypotheses about the mean percentage of rules active antecedents per rule, the mean percentage and, finally, the fraction of belonging to the groups active rules. Considering that these parameters greatly depend on the nature of the fuzzy inference, pessimistic hypotheses need to be formulated, including most of the applications to which the processor is destined. It should be pointed out that fixing these values does not preclude the possibility of using the processor for all applications that do not come under the previously formulated hypotheses; it only means that for such applications the architecture may not be perfectly balanced. , In general, we think that good values are , and . , and and chosen an So, having set the values , initial indicative value for , it is possible, on the basis of the equations obtained, to calculate the total number of cells needed for a balanced architecture. As we know the areas of each elementary cell from the previous synthesis phase it is
Fig. 9. The design methodology.
possible to estimate the total area needed for the total number of blocks. If this area complies with the specifications, then and only then can the latency be fixed at the indicative value previously chosen. If this is not the case, the value has to be increased (thus establishing a lower performance level) and the whole procedure has to be iterated until the specifications concerning the area are met. Ultimately, having fixed the latency, the width of the input bus immediately follows from (26). A final consideration is that the design procedure may be appreciably different if, instead of starting from the value of the latency and thus from the speed specifications, it is necessary to take into account strict constraints concerning the package. In this case, hypotheses have to be formulated immediately about the input bus, and the latency immediately follows from (26). The design algorithm is depicted in Fig. 9.
C. The Processor Designed
In this section, we will give the design details of the processor. As we shall see, using the design scheme described in the previous section, it is relatively simple to go from the initial specifications to correct sizing of the various components of the processor. The design we decided to develop was that of a low-cost, high-performance fuzzy processor with eight inputs, four outputs, and two chained variables with a precision of 8 bits for all the universes of discourse. The technology we chose was that available with our synthesizer simulator, i.e., a channel length of 0.5 µm. In
synthesizing some memories we saw that the typical cycle was 15 ns. We therefore set the clock period to the same number of nanoseconds. We then synthesized the elementary blocks and observed that all the latency values were less than 15 ns except for those of the α units, which were three clock cycles, and that of the divider, which was equal to the precision with which the fuzzy outputs are computed. The latency of the divider was therefore set to eight. Considering that we wanted to design a low-cost processor, we hypothesized a maximum area of about 30 mm² and envisaged using a relatively economical package. The width of the input bus was therefore set to 32 bits. We then evaluated the memory area available and estimated a value ranging between 12 and 18 mm². On the basis of this range we estimated a good tradeoff for the values of the parameters involved. The final total memory area estimated was about 17 mm². Table II shows the numbers of units for varying latency values, expressed in terms of clock cycles, while Table III shows the number of units n for varying latency values and numbers of active rules. It is possible to rearrange the above equations so as to obtain the various magnitudes for varying numbers of active rules, this time setting the number of units to one, for example. Table IV shows these values. On the basis of the equations obtained in Section V-A and under the assumptions made, we calculated that two intersection detectors, four execution units, one divider, four α units, and one unit for rule evaluation (Section VI-E) were needed.
TABLE II CHOICE OF THE NUMBER OF UNITS FOR ID, α UNIT, EXECUTION UNIT (EU), AND DEFUZZIFIER
TABLE III CHOICE OF THE NUMBER OF UNITS, n
TABLE IV CHOICE OF THE UNITS FOR VARYING NUMBERS OF ACTIVE RULES
The final result we obtained is significant. We estimated a latency of 32 clock cycles. The performance obtainable from this processor is thus in the range of about two mega inferences per second.
VI. HARDWARE ARCHITECTURE
Fig. 10 shows the architecture of the fuzzy processor with the number of units determined above. We will now give a detailed description of the various units which make up the fuzzy processor.
A. Memories
1) TS Memories: Each term set memory (TS MEM) stores the term sets for a pair of input variables. As the term set can contain seven elements, this memory contains 56 24-bit words. The number of memories is equal to the number of α units, which in this architecture is four.
2) IN Memories: Each input memory (IN MEM) stores the fuzzy sets of three input variables. Therefore, its size is 8 24-bit words.
3) TS Support Memories: The TS Support MEM contains information concerning the support of the fuzzy sets of the term sets of the variables. As the fuzzy sets are convex, it is sufficient to store the pair of end points which delimit them. The TS Support MEM is organized into words in which both points are stored. Under the assumptions made, the memory contains 28 16-bit words.
4) IN Support Memories: The IN support memory contains the supports for the fuzzy inputs alone. Under the assumptions made, these modules contain four 16-bit words.
5) Cons Memories: The Cons Memory stores information about the elements in the term sets of the four output variables. It is made up of two modules, each of which is associated with a pair of output variables. As there are seven elements in the term sets, each memory module is made up of 17-bit words.
6) Premise Memory: This memory is organized as shown in Fig. 11. It is divided into groups. Each of the first groups contains the rules with a given antecedent of one input variable; the eighth group contains the other rules, in which that variable does not appear. This memory contains appropriately coded information about the rule premises alone: the codes of the antecedents of the non-intermediate variables other than that variable. The code of the antecedent of that variable is not needed, as it is identified by the group to which the rule belongs.
Each word in the premise memory stores four different premises because, as will be seen below, the RDU detects the active rules by working in parallel on four different rules. As 21 bits are sufficient to code a premise (3 bits for each antecedent), a word in the memory is 4 × 21 = 84 bits. The total number of words is 1/4 of the maximum number of rules, i.e., 64.
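The word layout just described can be sketched in a few lines. This is only an illustration under stated assumptions: the bit ordering inside a premise and inside a word, and all function and constant names, are hypothetical choices and are not taken from the design. Only the sizes (3 bits per antecedent, 21 bits per premise, four premises per word, 256 rules) come from the text.

```python
ANTECEDENT_BITS = 3                      # 3 bits per antecedent (from the text)
ANTECEDENTS_PER_PREMISE = 7              # 21 bits / 3 bits
PREMISE_BITS = ANTECEDENT_BITS * ANTECEDENTS_PER_PREMISE   # 21
PREMISES_PER_WORD = 4                    # four rules examined in parallel
WORD_BITS = PREMISE_BITS * PREMISES_PER_WORD                # 84
MAX_RULES = 256
WORDS = MAX_RULES // PREMISES_PER_WORD                      # 64


def encode_premise(labels):
    """Pack seven 3-bit antecedent codes into one 21-bit premise code.
    Assumption: antecedent 0 occupies the least significant bits."""
    if len(labels) != ANTECEDENTS_PER_PREMISE:
        raise ValueError("a premise needs exactly seven antecedent codes")
    code = 0
    for position, label in enumerate(labels):
        if not 0 <= label < (1 << ANTECEDENT_BITS):
            raise ValueError("antecedent code does not fit in 3 bits")
        code |= label << (position * ANTECEDENT_BITS)
    return code


def decode_premise(code):
    """Inverse of encode_premise."""
    mask = (1 << ANTECEDENT_BITS) - 1
    return [(code >> (p * ANTECEDENT_BITS)) & mask
            for p in range(ANTECEDENTS_PER_PREMISE)]


def pack_word(premise_codes):
    """Place four 21-bit premise codes side by side in one 84-bit word."""
    if len(premise_codes) != PREMISES_PER_WORD:
        raise ValueError("a word holds exactly four premises")
    word = 0
    for slot, code in enumerate(premise_codes):
        word |= code << (slot * PREMISE_BITS)
    return word


def unpack_word(word):
    """Inverse of pack_word."""
    mask = (1 << PREMISE_BITS) - 1
    return [(word >> (slot * PREMISE_BITS)) & mask
            for slot in range(PREMISES_PER_WORD)]
```

Reading one such word gives the four execution units of the RDU one premise each, which is why the memory depth is a quarter of the rule count.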
Fig. 10. The fuzzy processor.
Fig. 11. Organization of premise memory.
Fig. 12. A word in rule chain memory.
7) Rule Memory: This memory contains coded information about the antecedents and consequents of the fuzzy rules. The antecedents, including those containing intermediate variables, can be at most ten and this requires a total of 30 bits. The coding of a conclusion requires 3 bits to indicate the variable present in it and another 3 bits to indicate the element of its term set, thus giving a total of 6 bits. As a rule contains up to two conclusions, this memory contains 256 words of 42 bits each.
8) Rule Chain Memories: As we saw previously, an intermediate variable has a term set TC of elements, which appear in the conclusion of a rule, and a term set TA of elements, which are only present in the premise of the rule. Each of the two memories stores the maximum of the intersection between each element in TA and each element in TC. The intersections between an element of TA and the elements of TC are stored in a single word, as we can see in Fig. 12. As each degree of intersection requires 6 bits, the size of a word is 42 bits. The number of words in each of the two memories is the same as the number of elements in the term set of the premise of an intermediate variable, i.e., 7. Table V summarizes the size of the various memories.
B. Intersection Detector (ID)
As has been said, the task of this unit is to detect the vectors associated with the fuzzy inputs. In order to determine the bits, it is sufficient to calculate the intersection between the interval made up of the support (the set of points in the universe of discourse for which the degree of membership is positive,
identified for convex fuzzy sets by the two extremes) of the input and that of the fuzzy set of the variable. Fig. 13 shows a block diagram of the intersection detector (ID), which comprises: the TS support memory and IN support memory, which store the supports of each fuzzy set in the knowledge base and the support of each fuzzy input; an INTERSECTION block, which calculates the intersection between the supports of two fuzzy sets; and a control unit (ID CU). The control unit addresses the two memories whose outputs are read by the INTERSECTION block. It supplies the resulting value, which is stored in a shift register.
C. Rule Detection Unit (RDU)
This unit detects all the active rules to be processed. Fig. 14 is a block diagram of the unit. The pointer memory stores a table whose number of entries is equal to the number of groups stored in the premise memory. The ith entry contains the address of the first rule in the ith group and the number of rules it contains. The size of the pointer memory is seven words (equal to the maximum number of groups in the premise memory) of 16 bits each, to represent the address of the first rule (8 bits) and the size of the group (8 bits). The active rule selector (ARS) block, comprising four identical execution units working in parallel, can process four separate rules to establish whether their premises have a positive degree of activation. Each execution unit (EU) receives the eight vectors Ai, each of 7 bits, stored in the INT REGISTER and the code of the premise of the rule being processed (24 bits). On the basis of this code the selector produces (Fig. 15) an output vector of 8 bits, each of which indicates whether the corresponding antecedent has a positive or null degree of activation. By performing an AND between these bits, the logical value one is only obtained if the rule is active. The outputs of the ARS block are stored in the register RULE REG, which finally contains one bit for each rule (a total of 256) to indicate whether it is active or not. The vectors are stored in the register INT REG whenever a new detection starts. They are supplied by a shift register which is in turn serially fed by the ID block. The rule detection control unit (RD CU) has the task of coordinating the various RDU blocks. On the basis of the information contained in the pointer memory it feeds the ARS with the codes of the premises stored in the premise memory. To increase the speed of the RDU, the activities of the ARS block are pipelined with the search in the pointer memory for the next group of rules to be processed.
TABLE V MEMORIES IN THE PROCESSOR
Fig. 13. ID block diagram.
D. α Unit
The task of this unit is to compute the degree of truth of the antecedents. Computation involves two subsequent steps which are pipelined: first, the truth-space sector containing the truth value of the antecedent is identified and the information about the segments involved is read, and then the value is calculated. Fig. 16 is a block diagram of the α unit. A register file stores four vectors. The α-CU block addresses the two memories, TS memory and IN memory, to gain access to information about the current sector. The INT-CALC block receives this information and checks the intersection. If the intersection is nonnull, Int Res is equal to one. The SECTOR CALCULATOR block uses this value to update the current sector on the basis of the algorithm seen in Section IV. At the end of the binary search, the α-CU accesses the memory again to read the information about the sector, and the POS-CALC block checks the position of the fuzzy input with respect to the fuzzy sets of the term set. On the basis of this information, the α-CU stores the right information about the sector in the registers TS-REG and IN-REG. The contents of the two registers are sent to the α-CALC block, which calculates the differences. The α-CU then addresses a dedicated memory to obtain the required relation. Once the slope of one of the two segments and the sector are known, in the next clock cycle the α-CALC block, on the basis of (17), gives the truth value of the antecedent, which is stored in an α-register file together with its identifier. Depending on the value of POS, a null value is stored in the registers TS-REG and IN-REG and the α-CALC block then gives the corresponding output value. Each variable has a number of associated registers equal to the maximum number of antecedents with a positive degree of truth. As an overlap factor of four has been assumed, there are eight registers in the α-register file.
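For reference, the quantity this unit produces can be checked against a brute-force computation. The sketch below assumes the common sup-min definition of the degree of truth between a fuzzy input and a fuzzy set (the height of their intersection) and scans an 8-bit universe of discourse point by point. It is not the sector-based binary search of Section IV, and the triangular shapes and all names are hypothetical.

```python
from typing import Callable, Iterable

Membership = Callable[[float], float]


def triangle(left: float, peak: float, right: float) -> Membership:
    """Triangular membership function (an illustrative shape only)."""
    if not left < peak < right:
        raise ValueError("expected left < peak < right")

    def mu(x: float) -> float:
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)

    return mu


def degree_of_truth(fuzzy_input: Membership,
                    fuzzy_set: Membership,
                    universe: Iterable[int] = range(256)) -> float:
    """Sup-min degree of truth, evaluated by exhaustive scanning of the
    universe of discourse (256 points for an 8-bit precision)."""
    return max(min(fuzzy_input(x), fuzzy_set(x)) for x in universe)
```

Exhaustive scanning of this kind costs one step per point of the universe, which is the latency problem that a sector-based search is meant to avoid.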
Fig. 14. RDU block diagram.
Fig. 15. EU block diagram.
Fig. 16. α unit block diagram.
Fig. 17. Unit block diagram.
Fig. 18. -CALC.
E. Unit
As described in the computational model, this unit evaluates the premises of the rules, its output being the final values of the sums for the defuzzifier. Evaluation of the premises is first performed on the rules with no intermediate variables and then, when the RCU has made the necessary information available, on the rest of the rules. Fig. 17 illustrates the hardware architecture of the unit. During the phase in which the values calculated refer to rules with no intermediate variables, the -CALC supplies the RCU with these values and the control unit supplies the RCU with the codes of the consequents. The RCU needs this information to calculate the values of the antecedents containing intermediate variables. On the basis of the contents of RULE REG, the control unit (CU) selects one active rule at a time from RULE MEM and supplies -CALC (Fig. 18) with its premise code (CODE).
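The data flow of this unit can be summarized in software as follows. This is a minimal sketch, not the hardware: it assumes that the two sums accumulated for each output variable are the sum of the degrees of activation and the sum of the activations weighted by the mid points of the corresponding α-level sets, and it models the RCU by the MIN and MAX blocks described for it later in this section. All function names are hypothetical.

```python
def rule_activation(antecedent_truths):
    """Degree of activation of a rule: the minimum over its antecedents
    (the role of the MIN blocks)."""
    return min(antecedent_truths)


def accumulate_sums(active_rules):
    """Accumulate the two sums passed to the defuzzifier for one output
    variable. `active_rules` yields (activation, midpoint) pairs, where
    `midpoint` is the mid point of the consequent's alpha-level set."""
    sum_alpha = 0.0
    sum_weighted = 0.0
    for activation, midpoint in active_rules:
        sum_alpha += activation
        sum_weighted += activation * midpoint
    return sum_alpha, sum_weighted


def rcu_update(partial_truths, activation, consequent_label, intersections):
    """Max-min update of the truth degrees of the chained antecedents.
    intersections[k][j] is the stored degree of intersection between
    premise element k and conclusion element j of the intermediate variable."""
    for k in range(len(partial_truths)):
        candidate = min(activation, intersections[k][consequent_label])
        partial_truths[k] = max(partial_truths[k], candidate)
    return partial_truths
```

Under the same assumption, the defuzzified output for a variable would be `sum_weighted / sum_alpha`, which is the single division left to the defuzzifier.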
1) -CALC: This unit uses the premise code to select the values among its inputs which refer to the rule being processed. The MIN8 block calculates the minimum between these values, thus determining the degree of activation of the rule. When rules with intermediate variables are processed, the calculation procedure is similar, the only difference being that the input alpha values supplied by the RCU are also used. Of course, the minimum is then calculated on ten values, involving the MIN3 block as well. Each selector in -CALC is linked to an input variable. If the latter is not chained, the selector input is the four possible values, each coded with 9 bits (4 × 9 = 36 bits); if it is chained, the RCU provides the selector input with the seven possible values, each coded with 6 bits (7 × 6 = 42 bits).
2) Units: These units have the task of providing, for each output variable, the sums for the set of rules processed by the unit. These sums are calculated by the units shown in Fig. 19. Each unit receives the value of the degree of activation of a rule from -CALC, and information on the line on which the mid points of the α-level sets in the sector containing this value are located. The unit receives the minimum degree point of this line and the slope from the CONS MEMORY. The X-CALC block provides the middle point of the α-level set. The degrees of activation and the products obtained by the MULTIPLIER are summed by the ADDER blocks with the values of the partial sums already stored in the registers REGT and REG. The activities in this unit are pipelined. When all the rules have been processed, the total sums provided by the ADDER blocks are sent to the defuzzifier.
3) Rule Chain Unit (RCU): Each intermediate variable is associated with an RCU, which calculates the values of all the antecedents in which it is present. Fig. 20 illustrates the hardware architecture of an RCU. This unit receives the degree of activation of a rule relative to a given consequent and the label of that consequent from the unit.
On the basis of this information, the rule chaining control unit (RC CU) addresses the rule chaining memory. Of the 42 bits of a word in this memory, six are sent to each of the seven -unit blocks, one for each fuzzy set of the term set of the variable. Each block has the task of calculating the degree of truth of the corresponding antecedent. Fig. 21 shows the -unit block diagram. The MIN block calculates the minimum between the current value and the degree of truth, which is provided by the rule chain memory. The MAX block calculates the maximum between the output of MIN and the contents of the register -REG in which the partial value of the degree of truth of the antecedent is stored. These registers are reset when a new inference starts. The final value of the degree of truth of the antecedent is stored in the register -REG. This happens at the end of the processing of rules with no intermediate variables, signaled by the unit by activating the signal Erl.
Fig. 19. Unit block diagram.
Fig. 20. RCU block diagram.
Fig. 21. -unit block diagram.
Fig. 22. Defuzzifier block diagram.
F. Defuzzifier
As seen in Section V, a single divider is sufficient to compute the defuzzified value. It has the task of calculating the ratio between the two sums for each variable. The defuzzifier is first provided with the sums referring to the output variables of unchained rules and then, when the inference has been calculated, those of chained rules. As fuzzy rules contain up to two consequents, the unit gives the sums for both variables in parallel. These values, as can be seen in Fig. 22, are stored in the registers XREG and REG. As soon as these sums are stored in the registers, the divider starts to operate. Its inputs are the outputs of the two multiplexers. The beginning of the division is signaled by a start signal, which is activated by the CU of the unit as soon as the input data are ready.
VII. PROCESSOR PERFORMANCE
In this section, we assess the performance of the processor and the costs of silicon implementation. The latency the various pipeline stages introduce in terms of clock cycles has been analyzed in detail in Section V. The clock frequency at which the processor operates is 66 MHz. This clock frequency was determined by logical synthesis of the processor, modeled in VHDL, by means of
the Synopsys synthesizer using HCMOS 0.5 µm as the target technology. With a clock frequency of 66 MHz, the number of inferences processed per second is
$$\text{FLIPS} = \frac{66 \times 10^{6}}{L}$$
where $L$ is the number of clock cycles of the slowest pipeline stage. Table VI shows the performance of the processor for various values of the parameters. The fuzzy processor exhibits a level of performance of 2 MFLIPS with 256 rules, eight inputs, and four outputs, and 5.2 MFLIPS with 32 rules, three inputs, and one output. The silicon area estimated per processor is approximately 30 mm², 17 mm² of which is occupied by memories. Table VII compares the processor presented in this paper with others. While in [19] fuzzy inference is performed by sequential processing of the antecedents of the fuzzy rules, [20]–[23] are only some examples of architectures in which the inferences are split into several pipeline stages, each of which has a certain degree of parallelism. In [20] and [21], the degree of activation is calculated for each rule in the knowledge base. Examples of architectures with a phase in which the active rules are detected, each with a different detection phase, are given in [22] and [24]. In [20], calculation of the degree of truth, obtained by sequential scanning of the universe of discourse, requires a high latency, which limits overall performance. In [21], calculation of the degree of truth is made by means of a binary search. In addition, it is possible to take into account the partial value of the degree of activation of a rule. This solution certainly leads to a considerable reduction in the number of clock cycles required to calculate the degree of activation of a rule. The processor in [20] can perform an inference (20 rules, two inputs, one output) at 200 KFLIPS, while in [21] the processor performs up to 310 KFLIPS (32 rules, four inputs, one output). A processor whose performance is 580 KFLIPS (102 rules, two inputs, one output) is described in [19]. A parallel fuzzy processor, which uses a core based on an analog-numerical approach that combines the inherent advantages of both analog and digital implementations, is presented in [22]. This processor performs up to 1.5 MFLIPS (256 rules, eight inputs, one output). The design of the hardware architecture of a fuzzy processor, which allows the implementation of a fuzzy system to police ATM traffic on very large-scale integration (VLSI) technology, is presented in [23]. That fuzzy processor exhibits a level of performance of over 3 MFLIPS (18 rules, three inputs, one output, crisp input).
TABLE VI PROCESSOR PERFORMANCE
TABLE VII COMPARISON WITH OTHER PROCESSORS
VIII. CONCLUSIONS
The paper proposes the design of a VLSI fuzzy processor capable of dealing with complex fuzzy systems, including those with rule chaining. An analysis of inference techniques and their complexity has led to the definition of an inference execution model which optimizes processing time. Its main features are:
• the capability to cope effectively with rule chaining;
• a detection process, applied to each input pattern, which only extracts rules with a nonnull degree of activation from the knowledge base, thus achieving a significant reduction in the number of rules to be processed per inference;
• representation of membership functions, based on α-level sets, which allows applications requiring high resolution of the universe of discourse to be dealt with;
• division of the inference processing into phases which can be executed concurrently;
• adoption of a defuzzification technique which can be cost-effectively implemented in hardware.
The architecture of the processor, the definition of which is based on the computational model used, is made up of a number of pipeline stages in which several inference processing phases are performed in parallel. The processor was modeled in VHDL, successfully simulated, and then synthesized using HCMOS5 as the target technology. The processing speed, expressed in FLIPS, reaches 2 MFLIPS with 256 rules, eight inputs, and four outputs, and 5.2 MFLIPS with 32 rules, three inputs, and one output. The silicon area estimated is 30 mm².
APPENDIX
A. Fuzzy Set Representation
1) Input Term Sets: It is assumed that the membership function is made up of segments (Fig. 23). Each segment can be characterized by the following information: the position of the point of the segment with the maximum degree of truth and the slope.
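To make the role of these two quantities concrete, the following sketch derives the end points of an α-level set from a peak position and a slope for each side. It is the textbook construction for a fuzzy set with a single linear segment per side reaching degree one. It is an assumption for illustration, not a reconstruction of (27), and it ignores the division of the truth space into groups described next; all names are hypothetical.

```python
def alpha_level_interval(left_peak: float, left_slope: float,
                         right_peak: float, right_slope: float,
                         alpha: float):
    """End points of the alpha-level set of a fuzzy set whose sides are
    single linear segments reaching degree 1 at `left_peak` and
    `right_peak`. Slopes are given as positive magnitudes (degree per
    unit of the universe of discourse)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if left_slope <= 0.0 or right_slope <= 0.0:
        raise ValueError("slopes must be positive magnitudes")
    drop = 1.0 - alpha                 # how far below the peak the cut lies
    return left_peak - drop / left_slope, right_peak + drop / right_slope


def midpoint(interval):
    """Mid point of an alpha-level set, as needed for the consequents."""
    low, high = interval
    return (low + high) / 2.0
```

With slopes restricted to powers of two, such as the 0.25 and 0.5 used when exercising this sketch, the division by the slope reduces to a shift, which is the kind of simplification the representation below relies on.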
The truth space [0, 1] is assumed to be divided into values and into groups of the same size. Inside a group there are two segments, one for each side of the fuzzy set. Each of these segments has a limited set of possible slopes. Given the right and left minimum degree positions for a group and the slopes of the segments at a given level, the two extreme points of the interval expressing the α-level set are given by (27), where the segment number is obtained by a mod operation. The slope is defined in (28); consequently, the number of possible slopes is equal to the range of possible values of the parameter that defines it. By this kind of choice a great variety of shapes can be maintained, but with the dual advantage of considerably reducing the amount of memory required and only slightly varying the time needed to obtain the end points. In this case, in fact, it is sufficient, after accessing the memory, to execute a shift in order to calculate the product or division of the slope and a sum or a difference. The information that has to be stored is as follows:
• the abscissae of the maximum degree point of the left and right segments associated with each of the groups;
• the slope of the two segments.
2) Representation of Consequents: The elements of the output variable term set can be represented in the same way as the fuzzy sets associated with the input variables. As we have seen in Section II-C, for each degree of activation of a rule, the mid point of the corresponding α-level set is required. To avoid this calculation, the location of the mid points of each of the elements in the term set is stored. As the membership functions are defined by segments, the location of the mid points is also made up of segments, one for each sector (Fig. 24). For each segment the following are stored: the abscissa of the minimum truth degree point, its slope, and the sign of the slope.
Fig. 23. Representation of a fuzzy set.
Fig. 24. Representation of a consequent.
REFERENCES
[1] L. A. Zadeh, “Outline of a new approach to the analysis of complex systems and decision processes,” IEEE Trans. Syst., Man, Cybern., vol. SMC-3, pp. 28–43, Jan. 1973.
[2] L. A. Zadeh, “Fuzzy logic, neural networks, and soft computing,” Commun. ACM, vol. 37, no. 3, pp. 77–84, Mar. 1994.
[3] C. C. Lee, “Fuzzy logic in control systems: Fuzzy logic controller (Parts I and II),” IEEE Trans. Syst., Man, Cybern., vol. 20, pp. 404–432, 1990.
[4] D. H. K. Tsang, B. Bensaou, and S. T. C. Lam, “Fuzzy-based rate control for real-time MPEG video,” IEEE Trans. Fuzzy Syst., vol. 6, pp. 504–517, Nov. 1998.
[5] H. Maeda and S. Murakami, “The use of a fuzzy decision-making method in a large-scale computer system choice problem,” Fuzzy Sets Syst., vol. 54, pp. 235–249, 1993.
[6] J. Pan, G. N. DeSouza, and A. C. Kak, “FuzzyShell: A large-scale expert system shell using fuzzy logic for uncertainty reasoning,” IEEE Trans. Fuzzy Syst., vol. 6, pp. 563–581, Nov. 1998.
[7] P. Bosc and O. Pivert, “Extending SQL retrieval features for handling of flexible queries,” in Fuzzy Information Engineering. New York: Wiley, 1996, pp. 233–253.
[8] D. Sinha, P. Sinha, E. R. Dougherty, and S. Batman, “Design and analysis of fuzzy morphological algorithms for image processing,” IEEE Trans. Fuzzy Syst., vol. 5, pp. 570–584, Nov. 1997.
[9] M. García-Rivera, R. Sanz, and J. A. Pérez, “An antislipping fuzzy logic controller for a railway traction system,” in Proc. FUZZ-IEEE’97, Barcelona, Spain, July 1997, pp. 119–125.
[10] X. W. Li, “Stable and optimal fuzzy control of linear systems,” IEEE Trans. Fuzzy Syst., vol. 6, pp. 137–143, Feb. 1998.
[11] H. P. Adlassnig, “Fuzzy set theory in medical diagnosis,” IEEE Trans. Syst., Man, Cybern., vol. SMC-16, pp. 260–265, Feb. 1986.
[12] C. Von Altrock, B. Krause, and H. J. Zimmermann, “Advanced fuzzy logic control of a model car in extreme situations,” Fuzzy Sets Syst., vol. 58, no. 1, pp. 41–52, 1992.
[13] A. Bugarín and S. Barro, “Fuzzy reasoning supported by Petri nets,” IEEE Trans. Fuzzy Syst., vol. 2, pp. 135–150, May 1994.
[14] K. Uehara and M. Fujise, “Fuzzy inference based on families of α-level sets,” IEEE Trans. Fuzzy Syst., vol. 1, pp. 111–124, May 1993.
[15] L. A. Zadeh, “The concept of a linguistic truth variable and its application to approximate reasoning, I, II, III,” Inform. Sci., vol. 8, pp. 199–249, 301–357, 1974; vol. 9, pp. 43–80, 1975.
[16] R. R. Yager and D. P. Filev, “On the issue of defuzzification and selection based on a fuzzy set,” Fuzzy Sets Syst., pp. 255–271, 1993.
[17] R. R. Yager and D. P. Filev, “SLIDE: A simple adaptive defuzzification method,” IEEE Trans. Fuzzy Syst., vol. 1, pp. 69–78, Feb. 1993.
[18] M. Figueiredo, F. Gomide, A. Rocha, and R. Yager, “Comparison of Yager’s level set method for fuzzy logic control with Mamdani’s and Larsen’s methods,” IEEE Trans. Fuzzy Syst., vol. 1, pp. 156–159, May 1993.
[19] K. Nakamura, N. Sakashita, N. Nitta, K. Shimomura, and T. Tokuda, “Fuzzy inference and fuzzy inference processor,” IEEE Micro, vol. 13, pp. 37–48, Oct. 1993.
[20] H. Watanabe, J. R. Symon, W. D. Detloff, and K. E. Yount, “VLSI fuzzy chip and inference accelerator board system,” in Fuzzy Logic for Management of Uncertainty. New York: Wiley, 1992, pp. 211–243.
[21] G. Ascia and V. Catania, “A VLSI parallel architecture for fuzzy expert systems,” Int. J. Pattern Recognition Artificial Intell., vol. 5, no. 2, pp. 421–447, 1995.
[22] G. Ascia, V. Catania, M. Russo, and L. Vita, “Rule driven VLSI fuzzy processor,” IEEE Micro, vol. 16, pp. 62–74, June 1996.
[23] G. Ascia, V. Catania, D. Panno, G. Ficili, and S. Palazzo, “A VLSI fuzzy expert system for real-time traffic control in ATM networks,” IEEE Trans. Fuzzy Syst., vol. 5, pp. 20–31, Feb. 1997.
[24] H. Surmann and A. P. Ungering, “Fuzzy rule-based systems on general-purpose processors,” IEEE Micro, vol. 15, pp. 40–48, Aug. 1995.
Giuseppe Ascia received the Laurea degree in electronic engineering and the Ph.D. degree in computer science from the University of Catania, Italy, in 1994 and 1998, respectively. In 1994, he joined the Institute of Computer Science and Telecommunications at the University of Catania. His research interests are soft computing, VLSI design, hardware architectures, and low-power design.
Vincenzo Catania received the Laurea degree in electrical engineering from the University of Catania, Italy, in 1982. Until 1984, he was responsible for testing microprocessor system at STMicroelectronics, Catania, Italy. Since 1985 he has cooperated in research on computer network with the Institute of Computer Science and Telecommunications at the University of Catania, where he is an Associate Professor of Computer Science. His research interests include performance and reliability assessment in parallel and distributed system, VLSI design, low-power design, and fuzzy logic.
Marco Russo (M’98) received the Masters and Ph.D. degrees in electronical engineering from the University of Catania, Italy, in 1992 and 1996, respectively. In 1996, he joined the Institute of Computer Science and Telecommunications as an Assistant Professor of Computer Science. Since 1998 he has been an Associate Professor of Computer Science at the Department of Physics at the University of Messina, Italy. His primary interests are fuzzy logic, genetic algorithms, neural networks, VLSI design, optimization techniques, and distributed computing. Dr. Russo is a member of the IEEE Computer Society and IEEE Systems, Man, and Cybernetics Society. He is also a member of the National Institute for Nuclear Physics (INFN).