A Fast Digital Fuzzy Logic Controller: FPGA Design ...

12 downloads 513 Views 422KB Size Report
parameterized digital fuzzy logic controller (DFLC) processing only the .... sel_r0(i) = sign( ip0 – rise_start0(i+2) ) sel_r1(i) = sign( ip1– rise_start1(i+2) ) j = j+1.
A Fast Digital Fuzzy Logic Controller: FPGA Design and Implementation K. M. Deliparaschos, F. I. Nenedakis, S. G. Tzafestas Intelligent Robotics and Automation Laboratory Dept. of Electrical Eng. and Comp. Science National Technical University of Athens Zographou Campus, Athens, GR 157 73 [email protected] Abstract 2. DFLC hardware implementation This paper describes an improved approach to design a Takagi-Sugeno zero-order type fast parameterized digital fuzzy logic controller (DFLC) processing only the active rules (rules that give a nonnull contribution for a given input data set), at high frequency of operation, without significant increase in hardware complexity. To achieve this goal, an improved method of designing the fuzzy controller model is proposed that significantly reduces the time required to process the active rules and effectively increases the input data processing rate. The DFLC discussed in this paper achieves an internal core processing speed of at least 200 MHz, featuring two 8bit inputs and one 12-bit output, with up to seven trapezoidal shape membership functions per input and a rule base of up to 49 rules. The proposed architecture was implemented in a Field Programmable Gate Array (FPGA) chip with the use of a very high-speed integrated-circuits hardwaredescription-language (VHDL) and advanced synthesis and place and route tools.

1. Introduction Typical fuzzy logic controller architectures found in the literature [2][3] (assuming overlap of two between n adjacent fuzzy sets) consume 2 clock cycles, since they process one active rule per clock cycle, ,where n is the number of inputs. Using the proposed Odd-Even method, we manage to process for the same model case n-1 scenario, two active rules per clock cycle or 2 , thus increasing significantly the input data processing rate of the system. The architecture of the design allows us to achieve a core frequency speed of 200 MHz, while the input data can be sampled at a clock rate equal to the half of the core frequency speed, while processing only the active rules. The presented DFLC is based on a simple algorithm similar to the Takagi-Sugeno zeroorder inference and defuzzification method [1].

0-7803-9402-X/05/$20.00 © 2005 IEEE

The parameters characterizing the presented DLFC are summarized in Table 1. Table 1. DLFC characteristics. Fuzzy Inference System

Takagi-Sugeno zero order

Inputs

2

Input resolution

8-bit

Outputs

1

Output resolution

12-bit

Antecedent MF’s

7 Trapezoidal shaped per fuzzy set

Antecedent MF degree of

4-bit

truth (α value) resolution Consequent MF’s

49 Singleton type

Consequent MF resolution

8-bit

Aggregation method

MIN

Implication method

PROD (product operator)

MF overlapping degree

2

Defuzzification method

Weighted average

The architecture (Fig. 1) is mainly broken in two major blocks: “Fuzzification & Aggregation” and “Inference & Defuzzification”. The dotted lines on the Fig. 1 indicate the number of pipeline stages for each block. Signal bus sizes are noted as well. Instead of processing all the rule combinations for each new data input set, we have chosen to process only the active rules. Dealing with the above problem and given the fact that the overlapping between adjacent fuzzy sets is 2, we have implemented an active rule selection block (ARS) to calculate the fuzzy sets region in which the input data corresponds to. As a result, for the 2-input DFLC with 7 membership functions per input instead of processing all 49 rule combinations in the rule base, only 4 active rules remain that need to be taken into account. Fig. 2, illustrates the fuzzy set (FS) coding used for both inputs, as well as the FS area split up that occurs by repeatedly comparing the two input data values (ip0,1) with the rising start points (rise_start0,1) of the fuzzy sets. It is also shown for the selected pair (evenodd), whether ip0,1 lies in the rising (value 0) or falling

part (value 1) of each FS. The algorithm used in the ARS block is shown in Fig. 3. ip0

ip1 8

8

Active Rule Selector ars bus 12

region bus 4

Address Generator ars gen bus 9

parameter Memory Banks

singletons ROM

Difference signed

alpha 0 odd 4

even

sel odd

change in ars

dy 0

EVEN 0

6

odd

ip bus

3

Trapezoidal Generator (x3)

srom ip even

6

change in reg

region gen bus

consequent mapper srom ip odd

corresponding Parameter Memory Bank ROM (addressed by ars0®0, ars1®1, ars2®2 st nd rd concatenated signals), for the 1 , 2 and 3 TMFG respectively. The output named alpha represents the degree of truth of the current fuzzy set. We treat rise and fall sections of the trapezoidal shape in the same way but distinguished with a logical NOT for the fall section.

Membership Function Multiplier

alpha 1 4

alpha 0 even 4

(1,0) (1,0)

ODD 2

EVEN 3

(1,1)

(2,1)

(2,2)

(3,2)

(1,0)

0

active bus even 8

even

theta odd

theta even

4

(0,1)

(1,0)

1

(0,1)

(ars _even 1, ars _odd 1)

(0,1)

(1,1)

2

3

4

5

6

(0,0)

(2,1)

(1,0)

(0,1)

(2,3)

(1,0)

(4,3)

(0,1)

(4,5)

(1,0 )

(6,5)

ip1

(0,1)

(reg _even 1, reg_odd 1) (0,0)

Multiplier odd

Multiplier even

implication odd

implication even

Adder unsigned

(1,1)

(1,1)

(1,1)

theta

ip0,1 rise _start0,1 fall_start0,1

Integrator unsigned

implication

divisor

14

(1,1)

Figure 2. FS coding and Odd-Even regions.

5

13

Adder signed

(1,1)

Fuzzification & Aggregation

4

13

(1,1)

Selecting ars and reg signals for ip 1

MIN odd

(1,1)

dy 1

even

8

8

(0,1)

(0,0)

active bus odd

8

(0,0) (0,0)

Rule Selector

1

EVEN 2

ip 0 (ars_even0, ars_ odd 0)

2

cns even

ODD 1

(reg_even 0, reg_odd 0)

odd

cns odd

EVEN 1

Fix alpha

sel even

2

clear int

ODD 0

Selecting ars and reg signals for ip0

i=0

j=0

i = i+1 sel_r0 (i) = sign( ip 0 – rise _start0 (i+2) ) sel_r1(i) = sign( ip1– rise_start1(i+2) )

j = j+1 sel_f0(i) = sign ( ip0 – fall_start0(i) ) sel_f1(i) = sign ( ip1 – fall_start1(i) )

6

Integrator signed

reciprocal LUT

divident

reciprocal

15

is one

17

YES

Multiplier mul

fix output Divider Block

op

YES

Inference & Defuzzification Pipeline stages

Figure 1. DLFC architecture. The address generator (ADG) block (Fig. 4) outputs the indicated by the ARS block addresses, needed for the active fuzzy rules (ars_even0,1, ars_odd0,1, reg_even0,1, reg_odd0,1). Normally, for 4 active rule addresses, the ADG would require 4 clock periods to generate the active rule addresses [2], consequently increasing the data input sampling rate period to 4 times the internal clock period. Here, by using the Odd-Even scheme, we manage to reduce the clock period number required by the ADG to only 2. The flow chart for the trapezoidal membership function generator (TMFG) is shown in Fig. 5. Three TMFG block instances required in the architecture and accept as inputs the controller input (ip0, ip0, ip1), region (reg0, reg1, reg2), start point and slope from the

i < FS_no - 1

NO

1

33

12

i < FS_no - 2

one flag

NO ars_odd0 = 0 ars_even0 = 0

sel_r0

ars_odd1 = 1 ars_even1 = 0

sel_r1

ars_odd0 = 0 ars_even0 = 1

sel_r0

ars_odd1 = 1 ars_even1 = 2

sel_r1

ars_odd0 = 1 ars_even0 = 1

sel_r0

ars_odd1 = 3 ars_even1 = 2

sel_r1

ars_odd0 = 1 ars_even0 = 2

sel_r0

ars_odd1 = 3 ars_even1 = 4

sel_r1

ars_odd0 = 2 ars_even0 = 2

sel_r0

ars_odd1 = 5 ars_even1 = 4

sel_r1

ars_odd0 = 2 ars_even0 = 3

sel_r0

ars_odd1 = 5 ars_even1 = 6

sel_r1

YES

sel_r0 = “11111” sel_r1 = “11111”

sel_r0,1

reg_odd0,1 = sel_f0,1(1) reg_even0,1 = sel_f0,1 (0)

sel_r0,1

reg_odd0,1 = sel_f0,1(1) reg_even0,1 = sel_f0,1 (2)

YES

NO

YES

sel_r0 = “11110” sel_r1 = “11110”

YES

NO

YES

sel_r0 = “11100” sel_r1 = “11100”

YES sel_r0,1

reg_odd0,1 = sel_f0,1(3) reg_even0,1 = sel_f0,1 (2)

NO

YES

sel_r0 = “11000” sel_r1 = “11000”

YES sel_r0,1

reg_odd0,1 = sel_f0,1(3) reg_even0,1 = sel_f0,1 (4)

NO

YES

sel_r0 = “10000” sel_r1 = “10000”

YES sel_r0,1

reg_odd0,1 = sel_f0,1(5) reg_even0,1 = sel_f0,1 (4)

NO

reg_odd0,1 = sel_f0,1(5) reg_even 0,1 = 0

Figure 3. ARS block algorithm. The TMFG parameter memory banks are organized as follows in Table 2.

ip0

ip1

ars_odd0 ars_even0 reg_odd0 reg_even0

ars_odd1 ars_even1 reg_odd1 reg_even1

NO

cnt = 1

ars2 = ars_odd1 reg2 = reg_odd1

ars0 = ars_odd0 ars1 = ars_even0 reg0 = reg_odd0 reg1 = reg_even0

Similarly the Even Rule selector accepts as inputs: • alpha0=alpha0even • alpha1=alpha1 • active_sel=sel_even YES

Table 3. Singletons Rom.

ars2 = ars_even1 reg2 = reg_even1

cnt = not(cnt)

Figure 4. Address generator algorithm. ip reg start slope distance = ip - start

mul = slope * distance

zer = sign(mul) ovf = sign(mul – 2^7) alpha_tmp = mul(6 downto 3)

ODD cnsODD0&0 cnsODD0&1 cnsODD0&2 cnsODD0&3 cnsODD0&4 cnsODD0&5 cnsODD0&6 cnsODD1&0 cnsODD1&1 cnsODD1&2 cnsODD1&3 cnsODD1&4 cnsODD1&5 cnsODD1&6 cnsODD2&0 cnsODD2&1 cnsODD2&2 cnsODD2&3 cnsODD2&4 cnsODD2&5 cnsODD2&6

active_selODD0&0 active_selODD0&1 active_selODD0&2 active_selODD0&3 active_selODD0&4 active_selODD0&5 active_selODD0&6 active_selODD1&0 active_selODD1&1 active_selODD1&2 active_selODD1&3 active_selODD1&4 active_selODD1&5 active_selODD1&6 active_selODD2&0 active_selODD2&1 active_selODD2&2 active_selODD2&3 active_selODD2&4 active_selODD2&5 active_selODD2&6

cnsEVEN0&0 cnsEVEN0&1 cnsEVEN0&2 cnsEVEN0&3 cnsEVEN0&4 cnsEVEN0&5 cnsEVEN0&6 cnsEVEN1&0 cnsEVEN1&1 cnsEVEN1&2 cnsEVEN1&3 cnsEVEN1&4 cnsEVEN1&5 cnsEVEN1&6 cnsEVEN2&0 cnsEVEN2&1 cnsEVEN2&2 cnsEVEN2&3 cnsEVEN2&4 cnsEVEN2&5 cnsEVEN2&6 cnsEVEN3&0 cnsEVEN3&1 cnsEVEN3&2 cnsEVEN3&3 cnsEVEN3&4 cnsEVEN3&5 cnsEVEN3&6

sel = zer & ovf & reg

sel = “000”

YES

alpha = alpha_tmp

NO sel = “001”

YES

alpha = not(alpha_tmp)

YES

alpha = 15

alpha0 alpha1 active _sel

NO active _sel = “11”

sel = “010”

EVEN active_selEVEN0&0 active_selEVEN0&1 active_selEVEN0&2 active_selEVEN0&3 active_selEVEN0&4 active_selEVEN0&5 active_selEVEN0&6 active_selEVEN1&0 active_selEVEN1&1 active_selEVEN1&2 active_selEVEN1&3 active_selEVEN1&4 active_selEVEN1&5 active_selEVEN1&6 active_selEVEN2&0 active_selEVEN2&1 active_selEVEN2&2 active_selEVEN2&3 active_selEVEN2&4 active_selEVEN2&5 active_selEVEN2&6 active_selEVEN3&0 active_selEVEN3&1 active_selEVEN3&2 active_selEVEN3&3 active_selEVEN3&4 active_selEVEN3&5 active_selEVEN3&6

YES

active 0 =alpha0 active 1 =alpha1

YES

active 0 =alpha0 active 1 =15

YES

active 0 =15 active 1 =alpha1

NO

NO

active _sel = “10”

alpha = 0

NO

Figure 5. TMFG block flow chart.

active _sel = “01”

Table 2. Parameter memory banks. ip0 ODD

ip0 EVEN

ip1

rise_startODD0 rise_slopeODD0 rise_startEVEN0 rise_slopeEVEN0 rise_start0 rise_slope0 fall_startODD0 fall_slopeODD0 fall_startEVEN0 fall_slopeEVEN0 fall_start0 fall_slope0 rise_startODD1 rise_slopeODD1 rise_startEVEN1 rise_slopeEVEN1 rise_start1 rise_slope1 fall_startODD1 fall_slopeODD1 fall_startEVEN1 fall_slopeEVEN1 fall_start1 fall_slope1 rise_startODD2 rise_slopeODD2 rise_startEVEN2 rise_slopeEVEN2 rise_start2 rise_slope2 fall_startODD2 fall_slopeODD2 fall_startEVEN2 fall_slopeEVEN2 fall_start2 fall_slope2 rise_startEVEN3 rise_slopeEVEN3 rise_start3 rise_slope3 fall_start3 fall_slope3 rise_start4 rise_slope4 fall_start4 fall_slope4 rise_start5 rise_slope5 fall_start5 fall_slope5 rise_start6 rise_slope6

The consequent address mapper block (CAM) is simply used to address the singletons ROM by generating two addressing signals, srom_ip odd and srom_ip even. The purpose of the Rule selection block (Fig. 6) is to allow for the selection of the desired rules stored in the rule base, rules that will contribute to the final result for an input data set. The rule combinations are stored in the Singletons ROM. It is obvious that two similar rule selector blocks are needed, since there are two rules to be examined at every clock cycle. The Odd Rule selector block accepts as inputs: • alpha0=alpha0odd • alpha1=alpha1 • active_sel=sel_odd

NO active 0=0 active 1 =’X’

Figure 6. Rule selector flow chart. The Spartan 3 family of FPGA [5] devices used to implement the proposed architecture provides embedded multipliers, an asynchronous version called MULT18X18 and a version with a register at the outputs called MULT18X18S. The input buses to the multiplier accept data in two’s-complement form (either 18-bit signed or 17-bit unsigned). One such multiplier is matched to each block RAM on the die. Division remains one of most hardware consuming operations. Several methods have been proposed [4]. These methods which target on better timing results, usually start with a rough estimation of the Reciprocal = 1/ Divisor and follow a repetitive arithmetic method, where the number of repetitions depends on the precision difference of the reciprocal with the desirable division precision. A similar method is followed here, but the reciprocal precision is calculated in such a way that it is the minimum possible for achieving the desired precision at the divider output without the need of any repetitions. This way, the division is simplified to a multiplication operation between the reciprocal estimation and the dividend. The precision of the reciprocal was estimated at 17-bit with the use of a

Simulink model. The “divide by zero” case has no practical value since it happens when all the theta values are zero, or when the antecedents of the rules have been disabled (by the rule selector) leading to no contributing consequences (from the sROM). Thus we set the output to zero (1/0≡0). We have chosen to add two extra circuit (is_one and fix output blocks in Fig. 1) to detect the (1/1≡1) case.

3. Results A new input data set is clocked on the falling edge of the external clock, with half the frequency of the internal clock, providing the necessary time for the Address Generator to generate the signals corresponding to all active rules. The Trapezoidal Generator block requires four pipe stages to compute the values of the membership functions. The computation occurs while the system addresses and reads the Singletons ROM. A clock later, after the Rule selection and the Minimum operator have taken place for the pair of the active rules, their θ values (MIN operation on the α values) along with their consequents and the signal for zeroing the integrators become available for the Inference and Defuzzification part. The rules implication result, occurs two clocks later along with the addition of the θ values. In the next clock, the sum of the implications is computed. During that time the unsigned integrator outputs the divisor value, while the dividend is computed by the signed integrator one clock later and at the same time the reciprocal estimation becomes available. Their multiplication and fixing of the output occurs in the th 12 pipe stage. Finally, the output is clocked at the output of the chip in the rising edge of the external th clock (13 ). The total data processing time starting from the time a new data set is clocked at the inputs until it produces valid results at the output requires a total latency of 13 pipe stages or 65 ns, with an internal clock frequency rate of 200 MHz (Spartan-3 1500-4fg676 features a 326 MHz system clock rate [5]) or 5 ns period and external clock frequency of 100 MHz or 10 ns period which effectively characterizes the input data processing rate (every 10 ns new input data can be sampled in). Without the use of the odd-even architecture we would have a latency of 12 pipe stages (no need for the adders in Fig. 1) but the external clock frequency would be 50 MHz, or the input data processing rate would be 20 ns. Along with the presented DFLC architecture, a modified model with ROM based MF Generator blocks instead of arithmetic based, has been implemented as well with respect to Table 1. The latter DFLC model requires 11 pipe stages or 55 ns total data processing time and 10 ns input data processing rate but with increase in FPGA area utilization compared to the first

model, moreover it is obvious that since the MF’s are ROM based any type and shape could be implemented. The design summary of the FPGA device used (Spartan-3 1500-4fg676) for both DFLC models (Arithmetic and ROM MF) are summarized in Table 4. Table 4. FPGA Design Summary for DFLC Models. Logic Utilization Slice Flip Flops 4 input LUTs Logic Distribution occupied Slices 4 input LUTs used as logic used as route-thru Used as 16x1 ROMs bonded IOBs MULT18X18s GCLKs DCMs Internal clock period

DFLC Arithmetic MF Generator 344 406

DFLC ROM MF Generator 238 419

294 419 406 13 30 6 2 1

368 560 419 13 128 30 3 2 1

5ns

5ns

4. Conclusions We have managed to design a digital fuzzy logic controller (DFLC) that reduces the clock cycles required to generate the active rule addresses to be processed, thus increasing the overall input data processing rate of the system. The latter reduction was possible by applying the proposed Odd-Even method presented in this paper. Two DFLC architectures were implemented; one with arithmetic and one with ROM based MF Generators. The first model achieves an internal clock frequency rate of 200 MHz with a total latency of 13 pipeline stages or 65 ns and the second for the same frequency rate requires a latency of 11 or 55 ns. The input data processing rate for both models is 10 ns or 100 MHz. Finally the VHDL codes for DFLC models presented here are fully parameterized allowing us to generate and test DFLC models with different specification scenarios. References [1]

[2] [3]

[4]

[5]

T. Takagi & M. Sugeno, Derivation of Fuzzy Control Rules from Human Operator's Control Actions, Proc. of the IFAC Symp. on Fuzzy Information, Knowledge Representation and Decision analysis, pp.55-60, July 1983 A. Gabrielli, E. Gandolfi, “A Fast Digital Fuzzy Processor”, IEEE Micro, Vol.17, pp. 68-79, 1999. C. J. Jiménez, A. Barriga. S. Sánchez-Solano, “Digital Implementation of SISC Fuzzy Controllers”, Proc. Int. Conf. on Fuzzy Logic, Neural Nets and Soft Computing, pp. 651-652, Iizuka, 1994. Derek Wong, Michael Flynn, “Fast Division Using Accurate Quotient Approximations to Reduce the Number of Iterations”, IEEE Trans. Comp., vol. 41, no. 8, Aug. 1992. Xilinx, Spartan-3 FPGA Family: Complete Data Sheet, DS099 Dec. 24, 2003

Suggest Documents