
International Journal of Hybrid Intelligent Systems 11 (2014) 125–135 DOI 10.3233/HIS-130188 IOS Press


Orthogonal-least-squares and backpropagation hybrid learning algorithm for interval A2-C1 singleton type-2 Takagi-Sugeno-Kang fuzzy logic systems

Gerardo M. Méndez a,∗, J. Cruz Martinez b, David S. González a and F. Javier Rendón-Espinoza b

a Centro de Manufactura Avanzada, Corporación Mexicana de Investigación en Materiales SA de CV – COMIMSA, Saltillo, Coah, México
b Departamento de Economía y Administración, Instituto Tecnológico de Nuevo León, Cd. Guadalupe, N.L., México


Abstract. A novel learning methodology based on a hybrid mechanism for training interval singleton type-2 Takagi-Sugeno-Kang fuzzy logic systems uses recursive orthogonal least-squares to tune the type-1 consequent parameters and the steepest-descent method to tune the interval type-2 antecedent parameters. The proposed hybrid-learning algorithm changes the interval type-2 model parameters adaptively to minimize a criterion function as new information becomes available and to match desired input-output data pairs. Its antecedent sets are type-2 fuzzy sets, its consequent sets are type-1 fuzzy sets, and its inputs are singleton fuzzy numbers without uncertain standard deviations. As reported in the literature, the performance indices of hybrid models have proved to be better than those of the individual training mechanisms used alone. Experiments were carried out involving the application of hybrid interval type-2 Takagi-Sugeno-Kang fuzzy logic systems to modeling and prediction of the scale-breaker entry temperature in a hot strip mill for three different types of coils. The results demonstrate how the interval type-2 fuzzy system learns from selected input-output data pairs and improves its performance as hybrid training progresses.

Keywords: Type-2 Takagi-Sugeno-Kang fuzzy logic systems, hybrid-learning mechanism, OLS-BP training methods, ANFIS, temperature prediction

1. Introduction

Interval type-2 Takagi-Sugeno-Kang fuzzy logic systems are capable of approximating any real continuous function on a compact set to arbitrary accuracy. To use interval type-2 fuzzy systems as identifiers for nonlinear dynamic systems, it is necessary to update and tune the fuzzy parameters so that they perform the desired nonlinear mappings. This paper reports the development of a hybrid training algorithm to train fuzzy systems to match desired input-output data pairs.

∗ Corresponding author: Gerardo M. Méndez, Centro de Manufactura Avanzada, Corporación Mexicana de Investigación en Materiales SA de CV – COMIMSA, Saltillo, Coah, México. E-mail: [email protected].

Interval type-2 (IT2) fuzzy logic systems (FLS) are a mature technology [1–4]. The processes of financial systems [5–7], hot strip mills (HSM) [8,9], autonomous mobile robots [10], intelligent controllers [11–15], plant monitoring and diagnostics [16–18], edge detection [19] and forecasting of population growth [20] are characterized by high uncertainty, nonlinearity, and time-varying behavior [21]. Interval type-2 fuzzy sets (FS) make it possible to model the effects of uncertainties [22–24] and to minimize them by optimizing the parameters of an interval type-2 fuzzy set during the learning process.

Fig. 1. Schematic view of the IT2 SFLS firing set calculation. (Colours are visible in the online version of the article; http://dx.doi.org/10.3233/HIS-130188)

In [1], both single-pass and steepest-descent methods are presented as IT2 Mamdani FLS learning methods, but only steepest descent is presented as a learning mechanism for IT2 Takagi-Sugeno-Kang (TSK) FLS systems. When the steepest-descent method is used in both Mamdani and TSK FLS, none of the antecedent and consequent parameters of the IT2 FLS is fixed at the start of the training process; they are all tuned using the steepest-descent method exclusively. In [1], hybrid learning algorithms based on recursive parameter-estimation methods such as recursive least-squares (RLS), recursive square-root filters (REFIL) [25] (a Kalman-type filter), and recursive orthogonal least-squares (OLS) are not presented as IT2 FLS learning mechanisms.

A feedforward neural network (FFNN) is a layered architecture, and its parameters (weights) can be optimized using the method of steepest descent, called in this case the backpropagation (BP) algorithm. In this algorithm, the output error is propagated in a backward direction from the output layer down into the inner layers, hence the name "backpropagation". In an FLS, the output error is also propagated from the output layer down into the inner layers, and therefore this algorithm is also referred to as a backpropagation algorithm.

The aim of this work is to present a novel OLS-BP-based hybrid learning mechanism for antecedent and consequent parameter tuning for interval singleton type-2 TSK FLS systems. Here, the names of such hybrid-adapted systems will be abbreviated based on the input type: the name IT2 TSK SFLS will be used for interval singleton type-2 TSK FLS systems with inputs modeled as crisp numbers, as shown in Fig. 1; IT2 TSK NSFLS1 will be used for interval type-1 non-singleton type-2 TSK FLS systems with inputs modeled as type-1 fuzzy numbers, as shown in Fig. 2; and IT2 TSK NSFLS2 will be used for interval type-2 non-singleton type-2 TSK FLS systems with inputs modeled as type-2 fuzzy numbers, as shown in Fig. 3.

Fig. 2. Schematic view of the IT2 NSFLS1 firing set calculation. (Colours are visible in the online version of the article; http://dx.doi.org/10.3233/HIS-130188)

In addition, the names of such hybrid-adapted systems will be abbreviated based on the antecedent and consequent fuzzy types, where "A" represents the antecedent and "C" represents the consequent. A2-C1 represents the most general case of an interval type-2 TSK FLS, in which its antecedents are type-2 fuzzy sets but its consequents are type-1 fuzzy sets; A1-C1 represents the case of an interval type-2 TSK FLS in which its antecedents and its consequents are both type-1 fuzzy sets; and A2-C0 represents the case of an interval type-2 TSK FLS in which its antecedents are type-2 fuzzy sets but its consequents are crisp numbers. It is important to mention that this paper focuses on the case of an interval A2-C1 TSK SFLS system, so the A2-C1 specification can be omitted from the abbreviated name; i.e., the hybrid interval A2-C1 singleton type-2 TSK FLS is called the hybrid IT2 TSK SFLS. The hybrid algorithm for IT2 Mamdani-type FLS systems has been presented elsewhere [8,9,26–28]

with three combinations of hybrid learning methods: RLS-BP, REFIL-BP, and OLS-BP.

For the case of TSK-type fuzzy logic systems, the hybrid algorithm for interval singleton IT2 TSK SFLS [29,30] and for interval type-1 non-singleton IT2 TSK NSFLS1 [31,32] has been presented using both RLS-BP and REFIL-BP. In addition, [33] introduces a type-2 TSK FLS design based on the type-1 or type-2 natures of the antecedent memberships and of the consequent parameters. In [34], three interval type-2 fuzzy neural networks (IT2FNN) with hybrid learning techniques (gradient descent, and gradient descent with an adaptive learning rate) are presented. In [35], a hybrid approach for image recognition that combines type-2 fuzzy logic, modular neural networks, and the Sugeno integral is presented. Studies of IT2 TSK SFLS using a hybrid OLS-BP learning mechanism as a training method have not been found in the literature.

In this research, an IT2 TSK SFLS system with an OLS-BP hybrid learning mechanism has been developed and implemented for modeling and prediction of the transfer-bar temperature in a hot strip mill. To enable direct comparison of the performance and functionality of the proposed hybrid mechanism, the same input-output data set as used in [26,32] was used. Performance has been experimentally examined under the same conditions as in previous work.

Fig. 3. Schematic view of the IT2 NSFLS2 firing set calculation. (Colours are visible in the online version of the article; http://dx.doi.org/10.3233/HIS-130188)

This paper is organized as follows. Section 2 gives the fundamentals of IT2 TSK fuzzy logic systems and of the recursive orthogonal least-squares and backpropagation estimation algorithms. Section 3 presents the process of constructing the hybrid IT2 TSK SFLS system for temperature prediction. Section 4 presents the experimental results, and Section 5 summarizes the conclusions.

2. Basis of the learning methodology

2.1. IT2 TSK fuzzy logic systems

A type-2 fuzzy set [1], denoted by $\tilde{A}$, is characterized by a type-2 membership function $\mu_{\tilde{A}}(x, u)$, where $x \in X$ and $u \in J_x \subseteq [0, 1]$:

$$\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\} \qquad (1)$$

in which $0 \leqslant \mu_{\tilde{A}}(x, u) \leqslant 1$. This means that at a specific value of $x$, say $x'$, there is no longer a single membership value, as for a type-1 membership function ($u'$) [1]; instead, the type-2 membership function takes on a set of values called the primary membership of $x'$, $u \in J_{x'} \subseteq [0, 1]$. It is possible to assign an amplitude distribution to this set of points. This amplitude is called the secondary grade of a general type-2 fuzzy set. When the values of the secondary grade are all equal to one, the function is an interval type-2 membership function [22–24].

An IT2 TSK fuzzy logic system having $p$ inputs $x_1 \in X_1, \ldots, x_p \in X_p$ and one output $y \in Y$ can be described by fuzzy IF-THEN rules that represent the input-output relations of the system and that can be expressed as

$$R^i: \text{IF } x_1 \text{ is } \tilde{F}_1^i \text{ and } \cdots \text{ and } x_p \text{ is } \tilde{F}_p^i, \text{ THEN } Y^i = C_0^i + C_1^i x_1 + C_2^i x_2 + \cdots + C_p^i x_p \qquad (2)$$

where $i = 1, \ldots, M$; $C_j^i$ ($j = 0, 1, \ldots, p$) are consequent type-1 fuzzy sets (C1); $Y^i$, the output of the $i$th rule, is also a type-1 fuzzy set; and $\tilde{F}_k^i$ ($k = 1, \ldots, p$) are interval type-2 antecedent fuzzy sets (A2) described by Gaussians with uncertain means [36–38].
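For concreteness, the sketch below (not from the original paper; function names are illustrative) shows how the lower and upper membership grades of such an interval type-2 Gaussian set with uncertain mean $m \in [m_1, m_2]$ and fixed standard deviation are commonly evaluated, following the standard construction in [1]:

```python
import numpy as np

def gaussian(x, m, sigma):
    """Type-1 Gaussian membership grade."""
    return np.exp(-0.5 * ((x - m) / sigma) ** 2)

def it2_gaussian_uncertain_mean(x, m1, m2, sigma):
    """Lower and upper membership grades of an interval type-2 Gaussian
    set whose mean is uncertain, m in [m1, m2], with fixed sigma."""
    # Upper bound of the footprint of uncertainty: flat top between the means.
    if x < m1:
        upper = gaussian(x, m1, sigma)
    elif x > m2:
        upper = gaussian(x, m2, sigma)
    else:
        upper = 1.0
    # Lower bound: the smaller of the two boundary Gaussians.
    if x <= 0.5 * (m1 + m2):
        lower = gaussian(x, m2, sigma)
    else:
        lower = gaussian(x, m1, sigma)
    return lower, upper
```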


In an IT2 TSK fuzzy logic system, the firing set of the $i$th rule is $F^i(x)$, where

$$F^i(x) = \prod_{k=1}^{p} \mu_{F_k^i}(x_k) = \left[\underline{f}^i(x), \overline{f}^i(x)\right] \qquad (3)$$

and

$$\mu_{F_k^i}(x_k) = \left[\underline{\mu}_{\tilde{F}_k^i}(x_k), \overline{\mu}_{\tilde{F}_k^i}(x_k)\right], \quad k = 1, \ldots, p \qquad (4)$$

is the $k$th active branch of the $i$th rule, where

$$\underline{f}^i(x) = \underline{\mu}_{F_1^i}(x_1) * \cdots * \underline{\mu}_{F_p^i}(x_p) \qquad (5)$$

and

$$\overline{f}^i(x) = \overline{\mu}_{F_1^i}(x_1) * \cdots * \overline{\mu}_{F_p^i}(x_p) \qquad (6)$$

is the firing set of the IT2 TSK fuzzy logic system. The $i$th consequent $C_j^i$ is an interval set of the form

$$C_j^i = \left[c_j^i - s_j^i,\; c_j^i + s_j^i\right] \qquad (7)$$

where $c_j^i$ denotes the center of $C_j^i$ and $s_j^i$ denotes the spread of $C_j^i$ ($i = 1, \ldots, M$ and $j = 1, \ldots, p$). Then the consequent of $R^i$, $Y^i = [y_l^i, y_r^i]$, is an interval set and can be expressed as

$$y_l^i = \sum_{j=1}^{p} c_j^i x_j + c_0^i - \sum_{j=1}^{p} |x_j| s_j^i - s_0^i \qquad (8)$$

and

$$y_r^i = \sum_{j=1}^{p} c_j^i x_j + c_0^i + \sum_{j=1}^{p} |x_j| s_j^i + s_0^i. \qquad (9)$$

The output of an IT2 TSK fuzzy logic system is an interval type-1 set $Y_{TSK,2} = [y_l, y_r]$. It is then possible to calculate this set using the average of $y_l$ and $y_r$; the output is

$$Y_{TSK,2} = \frac{y_l + y_r}{2} \qquad (10)$$

where $y_l$ is

$$y_l = \frac{\sum_{i=1}^{M} f_l^i y_l^i}{\sum_{i=1}^{M} f_l^i} = \frac{\sum_{i=1}^{L} \overline{f}^i y_l^i + \sum_{i=L+1}^{M} \underline{f}^i y_l^i}{\sum_{i=1}^{L} \overline{f}^i + \sum_{i=L+1}^{M} \underline{f}^i} = \sum_{i=1}^{M} y_l^i p_l^i(x) = \mathbf{y}_l^T \mathbf{p}_l(x) \qquad (11)$$

and $y_r$ is

$$y_r = \frac{\sum_{i=1}^{M} f_r^i y_r^i}{\sum_{i=1}^{M} f_r^i} = \frac{\sum_{i=1}^{R} \underline{f}^i y_r^i + \sum_{i=R+1}^{M} \overline{f}^i y_r^i}{\sum_{i=1}^{R} \underline{f}^i + \sum_{i=R+1}^{M} \overline{f}^i} = \sum_{i=1}^{M} y_r^i p_r^i(x) = \mathbf{y}_r^T \mathbf{p}_r(x). \qquad (12)$$

$L$ is the index of the rule-ordered fuzzy basis function (FBF) expansions at which $y_l$ is a minimum, and $R$ is the index at which $y_r$ is a maximum. $L$ and $R$ are calculated using the algorithm presented in [1]. In addition, if $y_l^i = y_r^i$, this represents the case of an A2-C0 IT2 TSK FLS system. In Eq. (2), when $C_j^i$ ($j = 0, 1, \ldots, p$) are consequent type-1 fuzzy sets (C1), $Y^i$, the output of the $i$th rule, is also a type-1 fuzzy set, and $F_k^i$ ($k = 1, \ldots, p$) are type-1 antecedent fuzzy sets (A1), which is the case of an A1-C1 IT2 TSK FLS. This is equal to a type-1 TSK FLS, because they both provide identical results.
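As an illustration of Eqs. (3)–(12), the sketch below computes the defuzzified output of an interval A2-C1 TSK system from per-rule firing bounds and consequent bounds, locating the switch points $L$ and $R$ with the iterative Karnik-Mendel procedure of [1]. This is a minimal reading of the equations, not the authors' code; the midpoint initialization and the convergence test are conventional choices, and strictly positive firing intervals are assumed.

```python
import numpy as np

def it2_tsk_output(f_low, f_up, yl_rules, yr_rules):
    """Defuzzified output of an interval A2-C1 TSK FLS, Eqs. (10)-(12).

    f_low, f_up       : per-rule firing bounds from Eqs. (5)-(6) (positive)
    yl_rules, yr_rules: per-rule consequent bounds from Eqs. (8)-(9)
    """
    f_low, f_up = np.asarray(f_low, float), np.asarray(f_up, float)

    def km_endpoint(y, left):
        # Iterative Karnik-Mendel procedure [1] for one endpoint.
        order = np.argsort(y)
        y, lo, up = y[order], f_low[order], f_up[order]
        f = 0.5 * (lo + up)                      # start from midpoint grades
        yk = f @ y / f.sum()
        while True:
            k = int(np.searchsorted(y, yk))      # switch point (L or R)
            # y_l: upper grades for i <= L; y_r: lower grades for i <= R.
            f = np.concatenate([up[:k], lo[k:]]) if left else \
                np.concatenate([lo[:k], up[k:]])
            y_new = f @ y / f.sum()
            if np.isclose(y_new, yk):
                return y_new
            yk = y_new

    yl = km_endpoint(np.asarray(yl_rules, float), left=True)
    yr = km_endpoint(np.asarray(yr_rules, float), left=False)
    return 0.5 * (yl + yr)                       # Eq. (10)
```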

2.2. The recursive orthogonal least-squares method

As mentioned above, a brief presentation of the basic principles of the specific OLS method is given in this section. Suppose that, as in [25], a particular system has one input $u(k)$ and one output $y(k)$ with an additive noise $e(k)$, measured during a certain number $t$ of time periods of length $T$; it is then possible to describe its dynamic behavior using the following difference model:

$$y(k) = \sum_{j=1}^{n} a_j y(k-j) + \sum_{j=0}^{n} b_j u(k-j) + e(k) \qquad (13)$$

where $k = 1, 2, 3, \ldots, t$; $a_j, b_j \in \mathbb{R}$; and $n$ is the system order. This can be written in more compact form:

$$y(k) = \mathbf{p}^T \mathbf{z}(k) + e(k) \qquad (14)$$

where

$$\mathbf{p}^T = [b_0, a_1, b_1, \ldots, a_n, b_n] \qquad (15)$$

is the parameter estimation vector of size $2n + 1$, and

$$\mathbf{z}^T(k) = [u(k), y(k-1), u(k-1), \ldots, y(k-n), u(k-n)] \qquad (16)$$

is the measurements vector of size $2n + 1$.
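A minimal sketch of Eqs. (14)–(16): assembling the measurement vector $\mathbf{z}(k)$ and forming the noise-free one-step prediction $\mathbf{p}^T\mathbf{z}(k)$. The indexing assumes the sequences are stored so that `u[k]` and `y[k]` hold the samples at time $k$, with $k \geqslant n$; names are illustrative.

```python
import numpy as np

def regressor(u, y, k, n):
    """Measurement vector z(k) of Eq. (16) for a model of order n.
    Assumes u[k] and y[k] hold the samples at time k and that k >= n."""
    z = [u[k]]
    for j in range(1, n + 1):
        z += [y[k - j], u[k - j]]     # ..., y(k-j), u(k-j), ...
    return np.array(z)                # size 2n + 1, matching p of Eq. (15)

def predict(p, u, y, k, n):
    """Noise-free one-step model output y(k) = p^T z(k) of Eq. (14)."""
    return float(p @ regressor(u, y, k, n))
```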

The model Eq. (14) can be expressed for $t$ input-output data pairs as

$$\mathbf{Y}^T(t) = \mathbf{P}^T \mathbf{Z}^T(t) + \mathbf{E}^T(t) \qquad (17)$$

where the output vector of size $t$ is

$$\mathbf{Y}^T(t) = [y(1), y(2), \ldots, y(t)] \qquad (18)$$

the measurements matrix of size $(2n+1) \times t$ is

$$\mathbf{Z}^T(t) = \begin{bmatrix} u(1) & u(2) & \cdots & u(t) \\ y(0) & y(1) & \cdots & y(t-1) \\ u(0) & u(1) & \cdots & u(t-1) \\ \vdots & \vdots & & \vdots \\ y(1-n) & y(2-n) & \cdots & y(t-n) \\ u(1-n) & u(2-n) & \cdots & u(t-n) \end{bmatrix} \qquad (19)$$

and the noise vector of size $t$ is

$$\mathbf{E}^T(t) = [e(1), e(2), \ldots, e(t)]. \qquad (20)$$

For the estimation of $\mathbf{P}$, the following criterion must be minimized:

$$J = (\mathbf{Y}(t) - \mathbf{Z}(t)\mathbf{P}(t))^T \mathbf{I} (\mathbf{Y}(t) - \mathbf{Z}(t)\mathbf{P}(t)) \qquad (21)$$

The symmetric and positive matrix $\mathbf{C}(t+1)$ of size $(2n+1) \times (2n+1)$, defined as

$$\mathbf{C}(t+1) = \left[\mathbf{Z}^T(t+1)\mathbf{Z}(t+1)\right]^{-1} \qquad (22)$$

works as a covariance attenuation matrix of the identification process. On the other hand, the linear equation system

$$\mathbf{A}\mathbf{x} = \mathbf{b} \qquad (23)$$

where $\mathbf{A}$ is a matrix of size $m \times n$, $\mathbf{x}$ is a vector of size $n$, $\mathbf{b}$ is a vector of size $m$, and $m > n$, does not have an exact solution, and can be written as

$$\mathbf{A}\mathbf{x} - \mathbf{b} = \mathbf{e} \qquad (24)$$

where $\mathbf{e}$, a vector of size $m$, is the error of any solution of Eq. (23). If

$$\mathbf{A}^T\mathbf{A} = \mathbf{F}^T\mathbf{F} \qquad (25)$$

where $\mathbf{F}$ is any upper or lower triangular matrix of size $n$, then Eq. (23) can be written as

$$\mathbf{F}\mathbf{x} = \left(\mathbf{F}^T\right)^{-1}\mathbf{A}^T\mathbf{b} \qquad (26)$$

A least-squares solution can be found using Eq. (26). Now, consider the orthogonal transformation or rotation matrix $\mathbf{T}$ defined by

$$\mathbf{T}^T = \mathbf{T}^{-1} \qquad (27)$$

Rewriting Eq. (24) as

$$[\mathbf{A} : \mathbf{b}]\begin{bmatrix}\mathbf{x} \\ -1\end{bmatrix} = \mathbf{e} \qquad (28)$$

where $\mathbf{D} = [\mathbf{A} : \mathbf{b}]$ is a matrix of size $m \times (n+1)$ and $\bar{\mathbf{x}} = \begin{bmatrix}\mathbf{x} \\ -1\end{bmatrix}$ is a vector of size $n+1$, and applying the orthogonal transformation matrix $\mathbf{T}$ to Eq. (28), we obtain

$$\mathbf{T}\mathbf{D}\bar{\mathbf{x}} = \mathbf{T}\mathbf{e} \qquad (29)$$

If $\bar{\mathbf{D}} = \mathbf{T}\mathbf{D}$ is a triangular matrix, with $\mathbf{T}\mathbf{A} = \mathbf{F}$ and $\mathbf{T}\mathbf{b} = (\mathbf{F}^T)^{-1}\mathbf{A}^T\mathbf{b}$, then Eqs. (26) and (29) are equivalent; the resulting upper or lower triangular matrix $\mathbf{F}$ of size $n$ is the square root of Eq. (25). It is thus possible to apply the orthogonal-transformation solution of equation systems to the parameter identification of discrete models. The least-squares solution of Eq. (17) can be expressed as

$$\left[\mathbf{Z}^T(t)\mathbf{Z}(t)\right]\mathbf{P} = \mathbf{Z}^T(t)\mathbf{Y}(t) \qquad (30)$$

which can be obtained through the orthogonal-transformations algorithm. This equation can be reduced to an equivalent triangular system:

$$\mathbf{F}(t+1)\mathbf{P}(t+1) = \mathbf{q}(t+1) \qquad (31)$$

where the upper triangular matrix $\mathbf{F}(t+1)$, of size $(2n+1) \times (2n+1)$, is the square root of $\mathbf{Z}^T(t+1)\mathbf{Z}(t+1)$, and $\mathbf{q}(t+1)$ is a vector of size $2n+1$. For each time period, the above algorithm reduces to zero one row of the compound vector $[\mathbf{z}^T(t+1)\; y(t+1)]$, of size $2n+2$. The parameters of $\mathbf{P}(t+1)$ can then easily be calculated by means of the REDCO routine given in [25].
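The REDCO routine itself is listed in [25]. The sketch below shows one conventional way to realize the same recursion with Givens rotations: each new compound row $[\mathbf{z}^T(t+1), y(t+1)]$ is rotated into the triangular system of Eq. (31), after which $\mathbf{P}(t+1)$ is recovered by solving that system. It is an illustrative reconstruction, not the original routine.

```python
import numpy as np

def rols_update(F, q, z, y):
    """Annihilate the new compound row [z^T, y] into the triangular
    system F P = q of Eq. (31) using Givens rotations.

    F : (d, d) upper triangular, square root of Z^T Z, with d = 2n + 1
    q : (d,) right-hand side; z : (d,) new measurement vector; y : new output
    """
    z, r = z.astype(float).copy(), float(y)
    for i in range(len(z)):
        if z[i] == 0.0:
            continue
        rho = np.hypot(F[i, i], z[i])       # rotation that zeroes z[i]
        c, s = F[i, i] / rho, z[i] / rho
        Fi = F[i, :].copy()
        F[i, :], z = c * Fi + s * z, -s * Fi + c * z
        q[i], r = c * q[i] + s * r, -s * q[i] + c * r
    return F, q

def solve_parameters(F, q):
    """Recover P(t+1) from Eq. (31); plain solve (F is upper triangular,
    so back-substitution would also do)."""
    return np.linalg.solve(F, q)
```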


2.3. The backpropagation method

As explained in [40], a squared error measure for the $p$th input-output pair can be defined as

$$E_p = \sum_k (d_k - x_k)^2 \qquad (32)$$

where $d_k$ is the desired output for node $k$ and $x_k$ is the actual output for node $k$ when the input part of the $p$th data pair is presented. To find the gradient vector, an error term $\epsilon_i$ for node $i$ is defined as

$$\epsilon_i = \frac{\partial^+ E_p}{\partial x_i} \qquad (33)$$

By the chain rule of calculus, a recursive formula for $\epsilon_i$ can then be written as

$$\epsilon_i = \begin{cases} -2(d_i - x_i)\dfrac{\partial x_i}{\partial \bar{x}_i} = -2(d_i - x_i)\,x_i(1 - x_i) & \text{if } i \text{ is an output node} \\[6pt] \displaystyle\sum_{j,\,i<j} \frac{\partial^+ E_p}{\partial x_j}\,\frac{\partial x_j}{\partial x_i} & \text{if } i \text{ is an inner node} \end{cases} \qquad (34)$$

where $\bar{x}_i$ is the net input of node $i$ and $x_i(1-x_i)$ is the derivative of the sigmoidal node function; $\omega_{ij}$ is the weight of the connection from node $i$ to node $j$, and $\omega_{ij}$ is zero if there is no direct connection. Then the weight update $\Delta\omega_{ki}$ for on-line learning is

$$\Delta\omega_{ki} = -\eta\,\frac{\partial^+ E_p}{\partial \omega_{ki}} \qquad (35)$$

where $\eta$ is the learning rate.
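Eqs. (32)–(35) correspond to ordinary backpropagation for a layered network of sigmoidal nodes. The sketch below makes the recursion concrete for a two-layer network; the weight matrices `W1` and `W2` are illustrative names, biases are omitted for brevity, and the error terms fold in the sigmoid derivative $x(1-x)$ of Eq. (34).

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, d, W1, W2, eta=0.1):
    """One on-line update following Eqs. (32)-(35) for a two-layer
    sigmoidal network (biases omitted)."""
    h = sigmoid(W1 @ x)                     # hidden-node outputs
    o = sigmoid(W2 @ h)                     # output-node outputs
    eps_o = -2.0 * (d - o) * o * (1.0 - o)  # output nodes, Eq. (34) first case
    eps_h = (W2.T @ eps_o) * h * (1.0 - h)  # inner nodes, Eq. (34) second case
    W2 -= eta * np.outer(eps_o, h)          # weight updates, Eq. (35)
    W1 -= eta * np.outer(eps_h, x)
    return o
```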

For the system under identification, the prediction error at time $t$ is

$$e(t) = \frac{1}{2}\left(f_{s2}\left(x^{(t)}\right) - y^{(t)}\right)^2 \qquad (36)$$

where $f_{s2}(x^{(t)})$ is the IT2 system prediction, $y^{(t)}$ is the real output value, and $x^{(t)}$ is the input vector at time $t$ of the system under identification.

The universal approximation theorem [1] does not indicate how many inputs, which inputs, how many rules, and how many fuzzy sets for each input variable must be used to construct an optimal and stable IT2 TSK FLS. The theorem implies that by using enough inputs, enough fuzzy sets, and enough rules, an IT2 TSK FLS can uniformly approximate any real continuous nonlinear function to an arbitrary degree of accuracy. There is consequently an enormous number of possible designs for an IT2 TSK FLS. The design degrees of freedom that control the accuracy of an IT2 TSK FLS are the number of inputs, the number of rules, and the number of fuzzy sets for each input variable. Consider the $i$th input variable $x_i$, where $x_i \in X_i = [X_i^-, X_i^+]$. It is intuitively obvious that dividing the interval $[X_i^-, X_i^+]$ into 200 overlapping regions will lead to greater resolution, and consequently greater accuracy, than dividing the interval into 20 overlapping regions. If there are $p$ inputs, each of which is divided into $r$ overlapping regions, then a complete IT2 TSK FLS must contain $r^p$ rules; for example, $p = 4$ inputs with $r = 5$ regions each already require $5^4 = 625$ rules. As the resolution parameter increases, the size of the FLS becomes enormous. All of these degrees of freedom introduce uncertainties into the final IT2 TSK FLS models. However, this is the real and hidden power of the IT2 TSK FLS; these degrees of freedom are the basis of the heuristic construction of the system through the expert's knowledge.
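Putting the two mechanisms together, the hybrid scheme of this section can be read as the following per-epoch loop: the consequent parameters are re-estimated by the recursive OLS update of Section 2.2, and the antecedent parameters are moved by the steepest-descent step of Section 2.3. This is a schematic sketch only, not the authors' implementation; the model interface (`forward`, `basis`, `antecedent_gradients`) is hypothetical, and the helpers `rols_update` and `solve_parameters` are those sketched above.

```python
def hybrid_train_epoch(model, data, eta=0.01):
    """One epoch of the OLS-BP hybrid: consequents by recursive OLS,
    antecedents by steepest descent. `model` is assumed to hold numpy
    arrays that can be updated in place."""
    for x, y in data:
        y_hat = model.forward(x)            # firing sets and output, Eqs. (3)-(12)
        # Consequent step: fold the rule-basis regressor and target into
        # the triangular system F P = q (Eq. (31)), then re-solve.
        model.F, model.q = rols_update(model.F, model.q, model.basis(x), y)
        model.consequents = solve_parameters(model.F, model.q)
        # Antecedent step: steepest descent on e(t) of Eq. (36).
        for param, grad in model.antecedent_gradients(x, y_hat - y):
            param -= eta * grad             # in-place numpy update
    return model
```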
