Hebbian and Error-Correction Learning for Complex-Valued Neurons

Published in Soft Computing, vol. 17, No 2, February 2013, pp. 265-273

Hebbian and Error-Correction Learning for Complex-Valued Neurons

Igor Aizenberg
Texas A&M University-Texarkana, 7101 University Ave, Texarkana, TX 75503, USA
E-mail: [email protected]

Abstract. In this paper, we consider some important aspects of the Hebbian and error-correction learning rules for complex-valued neurons. These learning rules, which were previously considered for the multi-valued neuron, whose inputs and output are located on the unit circle, are generalized for a complex-valued neuron whose inputs and output are arbitrary complex numbers. The Hebbian learning rule is also considered for the multi-valued neuron with a periodic activation function. It is experimentally shown that Hebbian weights, even when they cannot yet implement an input/output mapping to be learned, are better starting weights for the error-correction learning, which converges faster when started from the Hebbian weights than from random ones.

Keywords: complex-valued neural networks, derivative-free learning, multi-valued neuron, Hebbian learning, error-correction learning, XOR problem

1. Introduction

In this paper, we consider Hebbian and error-correction learning rules for complex-valued neurons. Complex-valued neural networks, i.e., networks based on neurons with complex-valued weights, inputs, outputs, and activation functions, have become very popular. As was shown in a number of works, complex-valued neurons have higher functionality, learn faster, and generalize better than their real-valued counterparts (Hirose 2006; Mandic and Su Lee Goh 2009; Aizenberg 2011). There are different specific types of complex-valued neurons and complex-valued activation functions. A good overview can be found, for example, in (Hirose 2006) and (Aizenberg 2011). There are also different learning techniques developed for complex-valued neurons. Among them, two are considered classical. The first is based on the Hebbian learning rule, which builds associations of the neuron inputs with the desired outputs (Hebb 1949). The second technique is based on the error-correction learning rule, first introduced for a Boolean threshold neuron by F. Rosenblatt in (Rosenblatt 1960). The Hebbian


learning rule has been considered for complex-valued neurons with arbitrary inputs and outputs in (Fiori 2003; Fiori 2005), and for the multi-valued neuron (MVN), whose inputs and output are located on the unit circle, in (Aizenberg 2011). However, the Hebbian learning rule has not yet been considered for the multi-valued neuron with a periodic activation function, which can learn non-linearly separable input/output mappings (Aizenberg 2010). The error-correction learning rule was considered in depth for the multi-valued neuron in Aizenberg et al. (2000) and (Aizenberg 2010), but it has not yet been considered for a complex-valued neuron with arbitrary complex-valued inputs and output. Since both Hebbian and error-correction learning techniques are classical and fundamental, we would like to cover the mentioned gaps in this paper. So we would like to generalize the error-correction learning rule, developed earlier for the multi-valued neuron, for a complex-valued neuron with arbitrary complex inputs and output. We will also consider the Hebbian learning rule for the multi-valued neuron with a periodic activation function, and discover how the error-correction rule is related to the Hebbian one for complex-valued neurons. It is important to mention that we will derive the Hebbian learning rule for a complex-valued neuron with arbitrary complex inputs and output in a different way than the one used in (Fiori 2003; Fiori 2005). Our considerations will be based on the approach used in (Aizenberg 2011) for the MVN Hebbian and error-correction learning rules. Since we will employ the same approach that was already used to develop both Hebbian and error-correction learning techniques for MVN in (Aizenberg 2011), it is reasonable to review MVN and its properties in more detail. The discrete MVN was introduced in Aizenberg and Aizenberg (1992). The continuous MVN was introduced in Aizenberg et al. (2005).
MVN is a complex-valued neuron whose inputs and output are located on the unit circle. However, its weights are arbitrary complex numbers. It should be mentioned that the discrete MVN activation function, which was proposed in Aizenberg et al. (1971), is the first historically known complex-valued activation function. It is a function of the argument (phase) of the weighted sum. It maps the entire complex plane onto the unit circle. MVN's main advantages are higher functionality in comparison with other neurons and simplicity of learning. MVN has successfully been used in a number of applications (associative memories Aizenberg and Aizenberg (1992),


Jankowski et al. (1996), Aoki et al. (2001), Muezzinoglu et al. (2003), (Aizenberg 2011), an MVN-based feedforward neural network Aizenberg and Moraga (2007), etc.; a detailed overview is given, e.g., in (Aizenberg 2011)). The most remarkable of these applications is an MVN-based multilayer feedforward neural network (MLMVN), introduced in Aizenberg and Moraga (2007) with its derivative-free learning algorithm. This network outperforms many other machine learning techniques in terms of generalization capability, number of parameters employed, and network complexity. In (Aizenberg 2010), MVN with a periodic activation function (MVN-P), which can learn non-linearly separable input/output mappings, was introduced. MVN-P's main advantage over other artificial neurons is its higher functionality. A single MVN-P can easily learn non-linearly separable input/output mappings; for example, the classical XOR and Parity n non-linearly separable problems are among the simplest that can be learned by a single MVN-P (Aizenberg 2008), (Aizenberg 2011). MVN and its learning were first described in detail in Aizenberg et al. (2000). The most comprehensive overview of MVN, MVN-based neural networks, and their applications is presented in (Aizenberg 2011). As we have mentioned above, Hebbian and error-correction learning rules for MVN have been considered in (Aizenberg 2011). Here we will generalize the Hebbian learning rule for MVN-P. We will also generalize both these learning rules for a complex-valued neuron whose inputs and output are arbitrary complex numbers. We will also consider how the error-correction learning rule and the Hebbian learning rule are connected with each other. The structure of the paper is as follows. Section 2 is devoted to the basic fundamentals of complex-valued neurons. After MVN and MVN-P are recalled in subsection 2.1, a complex-valued neuron with arbitrary complex inputs and output is introduced in subsection 2.2.
Section 3 is devoted to the Hebbian learning technique for complex-valued neurons. In subsection 3.1, the Hebbian learning for MVN is recalled. In subsection 3.2, the Hebbian learning for MVN-P is considered, and in subsection 3.3, the Hebbian learning technique for a complex-valued neuron with arbitrary inputs and output is developed. Section 4 is devoted to the error-correction learning for complex-valued neurons. Simulation results are presented in Section 5.



2. Complex-valued neurons

2.1. Multi-valued neuron (MVN)

To introduce a complex-valued neuron with arbitrary complex inputs and output as a generalization of MVN, let us first recall some MVN basic fundamentals for the reader's convenience. The discrete MVN was proposed in Aizenberg and Aizenberg (1992). It is based on the principles of multiple-valued threshold logic over the field of complex numbers presented in Aizenberg et al. (2000), (Aizenberg 2011). The discrete MVN implements an input/output mapping between n inputs and a single output. This mapping is described by a multiple-valued (k-valued) threshold function of n variables $f(x_1, \ldots, x_n)$. We have to mention that we consider here input/output mappings described by functions of multiple-valued logic over the field of complex numbers (Aizenberg 2011). While in traditional multiple-valued logic its values are encoded by the integers from the set $K = \{0, 1, \ldots, k-1\}$, in the one over the field of complex numbers they are encoded by the kth roots of unity $E_k = \{\varepsilon^0, \varepsilon, \varepsilon^2, \ldots, \varepsilon^{k-1}\}$, where $\varepsilon^j = e^{i 2\pi j/k}$, $j = 0, \ldots, k-1$ ($i$ is the imaginary unit).

Fig. 1. Geometrical interpretation of the discrete MVN activation function



Fig. 2. Geometrical interpretation of the continuous MVN activation function

A k-valued threshold function $f(x_1, \ldots, x_n): O^n \to E_k$ (here $O$ is the set of complex numbers located on the unit circle), which presents an input/output mapping implemented by the discrete MVN, is represented using $n+1$ complex-valued weights as follows:

$$f(x_1, \ldots, x_n) = P(w_0 + w_1 x_1 + \ldots + w_n x_n), \qquad (1)$$

where $x_1, \ldots, x_n$ are the neuron inputs and $w_0, w_1, \ldots, w_n$ are the weights. The values of this function are the kth roots of unity $\varepsilon^j = e^{i 2\pi j/k}$, $j \in \{0, 1, \ldots, k-1\}$ ($i$ is the imaginary unit). $P$ is the activation function

$$P(z) = e^{i 2\pi j/k}, \ \text{if} \ 2\pi j/k \le \arg z < 2\pi (j+1)/k, \qquad (2)$$

where $j = 0, 1, \ldots, k-1$ are the values of k-valued logic, $z = w_0 + w_1 x_1 + \ldots + w_n x_n$ is the weighted sum, and $\arg z$ is the argument of the complex number $z$. It is important to mention that function (2), which was introduced in Aizenberg et al. (1971), is historically the first known complex-valued activation function. Function (2) divides the complex plane into $k$ equal sectors and maps the whole complex plane onto the set of kth roots of unity (see Fig. 1).

The continuous MVN was presented in Aizenberg et al. (2005). The continuous case corresponds to $k \to \infty$ in (2). If the number of sectors $k \to \infty$ (see Fig. 2), then the angular size of a sector tends to 0. Hence, the activation function in this case becomes simply a projection of the weighted sum $z = w_0 + w_1 x_1 + \ldots + w_n x_n$ onto the unit circle:

$$P(z) = \exp(i \operatorname{Arg} z) = e^{i \operatorname{Arg} z} = z/|z|, \qquad (3)$$

where $z$ is the weighted sum, $\operatorname{Arg} z$ is the principal value of its argument (phase), and $|z|$ is the absolute value of the complex number $z$. Activation function (3) maps the whole complex plane onto the unit circle (see Fig. 2).

Since we will consider below how the Hebbian learning technique works for MVN with a discrete periodic activation function, let us recall the basic essentials related to this neuron. MVN-P was presented in (Aizenberg 2010). Its main advantage is its ability to learn non-linearly separable input/output mappings. MVN-P is determined by its k-valued l-periodic activation function

$$P_l(z) = \varepsilon^{\,j \bmod k}, \ \text{if} \ 2\pi j/m \le \arg z < 2\pi (j+1)/m, \quad j = 0, 1, \ldots, m-1; \ m = kl, \ l \ge 2, \qquad (4)$$

which divides the complex plane into $m = kl$ equal sectors (Fig. 3). Function (4) is periodic because the neuron's output determined by this function is equal to

$$\underbrace{0, 1, \ldots, k-1}_{0}, \ \underbrace{0, 1, \ldots, k-1}_{1}, \ \ldots, \ \underbrace{0, 1, \ldots, k-1}_{l-1}, \quad lk = m,$$

depending on which one of the m sectors the weighted sum is located in.

Fig. 3. Geometrical interpretation of the k-valued l-periodic discrete MVN activation function (4)

Particularly, for k=2, activation function (4) determines a special kind of MVN-P,


the universal binary neuron (UBN), which easily learns such classical non-linearly separable problems as XOR and Parity n without any network (Aizenberg 2008).

2.2. A complex-valued neuron with arbitrary complex inputs and output

As we have seen, the discrete MVN implements an input/output mapping $f(x_1, \ldots, x_n): E_k^n \to E_k$, the continuous MVN implements an input/output mapping $f(x_1, \ldots, x_n): O^n \to E_k$, and MVN-P implements an input/output mapping $f(x_1, \ldots, x_n): E_k^n \to E_m$; $m = kl$, where $E_k = \{\varepsilon^0, \varepsilon, \varepsilon^2, \ldots, \varepsilon^{k-1}\}$ and $E_m = \{\varepsilon^0, \varepsilon, \varepsilon^2, \ldots, \varepsilon^{m-1}\}$ are the sets of kth and mth roots of unity, respectively. So MVN's and MVN-P's inputs and output are located on the unit circle. However, there are interesting practical problems which are described by functions of complex variables whose domain and range are not limited to the unit circle. For example, the wind profile forecasting described by Goh et al. (2006) should be mentioned.

Let us now consider a complex-valued neuron whose inputs and output are arbitrary complex numbers. Let $x_i \in \mathbb{C}$; $i = 1, \ldots, n$ be the neuron's inputs. Let $\Phi$ be the neuron's activation function, and $y \in \mathbb{C}$ be the neuron's output. Thus, in general, a complex-valued neuron with arbitrary complex-valued inputs and output implements an input/output mapping $f(x_1, \ldots, x_n): \mathbb{C}^n \to \mathbb{C}$ employing a standard neural mechanism

$$f(x_1, \ldots, x_n) = \Phi(w_0 + w_1 x_1 + \ldots + w_n x_n), \qquad (5)$$

where $w_0, w_1, \ldots, w_n$ are the weights. The activation function $\Phi$ can be a nonlinear function; for example, the complex tanh function $\Phi(z) = \dfrac{e^{\beta z} - e^{-\beta z}}{e^{\beta z} + e^{-\beta z}}$, where $\beta$ is a slope parameter (for simplicity, it can be taken $\beta = 1$), can be used, as recommended, for example, in Goh et al. (2006). The activation function $\Phi$ can also be a linear function. We will consider below Hebbian and error-correction learning rules for a complex-valued neuron with arbitrary inputs and output.
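A neuron of the form (5) is straightforward to sketch in code. The following is our illustrative implementation (not from the paper); the function names are our own, and `cmath` is used for the complex exponentials:

```python
import cmath

def complex_tanh(z, beta=1.0):
    # the complex tanh activation mentioned above, with slope parameter beta
    return ((cmath.exp(beta * z) - cmath.exp(-beta * z)) /
            (cmath.exp(beta * z) + cmath.exp(-beta * z)))

def neuron_output(weights, inputs, activation=lambda z: z):
    # eq. (5): y = Phi(w0 + w1*x1 + ... + wn*xn); identity activation by default
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return activation(z)
```

With the identity activation this reduces to the weighted sum itself; passing `complex_tanh` as `activation` gives the nonlinear variant.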



3. Generalized Hebbian learning for a complex-valued neuron

In this section, we will derive the Hebbian learning rules for MVN-P and for a complex-valued neuron with arbitrary complex inputs and output as generalizations of the MVN Hebbian learning rule. Let us first recall how the latter works.

3.1 Hebbian learning for MVN

Hebbian learning for MVN with the regular activation functions (2) and (3) was considered in (Aizenberg 2011). The mechanism of the Hebbian learning for a complex-valued neuron is the same as the one for the classical threshold neuron, as described by D. Hebb in his seminal book (Hebb 1949): "The general idea is … that any two cells or systems of cells that are repeatedly active at the same time will tend to become 'associated', so that activity in one facilitates activity in the other". So, the mechanism of the Hebbian learning is association. If a neuron "fires", its weights should help it to "fire", amplifying excitatory inputs and inverting the inhibitory ones. In turn, if a neuron "sleeps", its weights should help it to "sleep", amplifying inhibitory inputs and inverting the excitatory ones. As is known, this association is utilized through multiplication of the desired output by the corresponding input and, finally, through a dot product of a vector of desired outputs and a vector of the corresponding inputs (Haykin 1998; Fiori 2003; Fiori 2005). The Hebbian learning rule for MVN, as was shown in (Aizenberg 2011), works through the same mechanism. Let us recall it for the reader's convenience.

Let us have $N$ $n$-dimensional learning samples $(x_1^j, \ldots, x_n^j)$, $j = 1, \ldots, N$. Let $\mathbf{f} = (f_1, \ldots, f_N)^T$ be an N-dimensional vector-column of the desired outputs. Let $\mathbf{x}_1, \ldots, \mathbf{x}_n$ be N-dimensional vectors of all the inputs ($\mathbf{x}_1 = (x_1^1, x_1^2, \ldots, x_1^N)^T$, $\mathbf{x}_2 = (x_2^1, x_2^2, \ldots, x_2^N)^T$, …, $\mathbf{x}_n = (x_n^1, x_n^2, \ldots, x_n^N)^T$). Then, according to the Hebbian learning rule, the weights $w_1, \ldots, w_n$ are calculated as normalized dot products of the vector $\mathbf{f}$ and the vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$,



respectively. The weight $w_0$ is calculated as a dot product of the vector $\mathbf{f}$ and the N-dimensional vector-constant $\mathbf{x}_0 = (1, 1, \ldots, 1)^T$:

$$w_i = \frac{1}{n+1} (\mathbf{f}, \mathbf{x}_i), \quad i = 0, \ldots, n,$$

where $(\mathbf{a}, \mathbf{b}) = a_1 \bar{b}_1 + \ldots + a_n \bar{b}_n$ is the dot product of the vector-columns $\mathbf{a} = (a_1, \ldots, a_n)^T$ and $\mathbf{b} = (b_1, \ldots, b_n)^T$ in the unitary space ("bar" is a symbol of complex conjugation); thus

$$w_i = \frac{1}{n+1} (\mathbf{f}, \mathbf{x}_i) = \frac{1}{n+1} \left( f_1 \bar{x}_i^1 + f_2 \bar{x}_i^2 + \ldots + f_N \bar{x}_i^N \right), \quad i = 0, \ldots, n. \qquad (6)$$
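As an illustration (ours, not from the paper), the discrete activation (2) and the Hebbian rule (6) can be sketched as follows; `numpy` and the epsilon guarding sector boundaries are implementation choices:

```python
import cmath
import math

import numpy as np

def mvn_discrete(z, k):
    # activation (2): find the sector index j of arg z and return e^{i*2*pi*j/k};
    # the epsilon guards sector boundaries against floating-point noise
    j = int((cmath.phase(z) % (2 * math.pi)) / (2 * math.pi / k) + 1e-9) % k
    return cmath.exp(1j * 2 * math.pi * j / k)

def hebbian_weights_mvn(X, f):
    # rule (6): w_i = (f, x_i)/(n+1) with the unitary-space dot product,
    # i.e. the inputs are complex-conjugated; x_0 is the constant input 1
    N, n = X.shape
    X0 = np.hstack([np.ones((N, 1), dtype=complex), X])
    return (X0.conj().T @ f) / (n + 1)

# single-sample check (the example of Fig. 4a below): desired output i, inputs i and -1
w = hebbian_weights_mvn(np.array([[1j, -1]]), np.array([1j]))
z = w[0] + w[1] * 1j + w[2] * (-1)
assert np.allclose(w, [1j / 3, 1 / 3, -1j / 3])
assert abs(mvn_discrete(z, k=4) - 1j) < 1e-9
```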

Let us now consider the following example, illustrated in Fig. 4. Let k=4 in the discrete MVN activation function (2). Thus, our MVN works in 4-valued logic whose values are encoded by the elements of the set $E_4 = \{1, i, -1, -i\}$ ($i = \varepsilon_4 = e^{i 2\pi/4}$ is the imaginary unit, which is also a primitive 4th root of unity). Let us now consider an MVN with two inputs (Fig. 4).

In Fig. 4a, the desired MVN output is $i$, while its two inputs are $i$ and $-1$, respectively. According to (6),

$$w_0 = \frac{1}{3} i; \quad w_1 = \frac{1}{3} f_1 \bar{x}_1 = \frac{1}{3} i \cdot (-i) = \frac{1}{3}; \quad w_2 = \frac{1}{3} f_1 \bar{x}_2 = \frac{1}{3} i \cdot (-1) = -\frac{1}{3} i.$$

The weighted sum is $z = \frac{1}{3} i + \frac{1}{3} i - \frac{1}{3} i \cdot (-1) = i$, and according to (2) the neuron's output is $P(i) = i$. Thus, the weight $w_1$ passes the input $x_1$ to the output, while the weight $w_2$ "rotates" the input $x_2$, passing it to the output.

In Fig. 4b, the desired MVN output is 1, while its two inputs are $i$ and $-i$, respectively. According to (6),

$$w_0 = \frac{1}{3}; \quad w_1 = \frac{1}{3} f_1 \bar{x}_1 = \frac{1}{3} \cdot 1 \cdot (-i) = -\frac{1}{3} i; \quad w_2 = \frac{1}{3} f_1 \bar{x}_2 = \frac{1}{3} \cdot 1 \cdot i = \frac{1}{3} i.$$

The weighted sum is $z = \frac{1}{3} - \frac{1}{3} i \cdot i + \frac{1}{3} i \cdot (-i) = 1$, and according to (2) the neuron's output is $P(1) = 1$. Thus, both weights $w_1$ and $w_2$ "rotate" the inputs $x_1$ and $x_2$, passing them to the output.



Fig. 4 Calculation of the MVN weights using the Hebbian rule (6) for the two neuron inputs and for a single learning sample ((a) and (b)): the weight is equal to the product of the desired output and the complex-conjugated input

3.2 Hebbian learning for MVN-P

Comparing activation functions (2) and (4), we can easily conclude that MVN-P differs from the discrete MVN in the following. The discrete MVN implements a function $f(x_1, \ldots, x_n): E_k^n \to E_k$ of k-valued logic (fully or partially defined), while MVN-P implements a partially defined function $f(x_1, \ldots, x_n): E_k^n \to E_m$; $m = kl$ of m-valued logic. The latter function is defined only on k-valued variables (since $m = kl$, $l \ge 2$, then $k < m$). But in such a case, MVN-P can be considered as an m-valued MVN, and the Hebbian rule (6) can be used for its learning. To show how it works, let us now consider how the Hebbian learning rule (6) can be used for learning the XOR problem using a single MVN-P. Let k=2, l=2 in (4). Then (4) is transformed to the following equation

$$P_l(z) = \begin{cases} \ \ 1, & \text{if } 0 \le \arg z < \pi/2 \ \text{ or } \ \pi \le \arg z < 3\pi/2, \\ -1, & \text{if } \pi/2 \le \arg z < \pi \ \text{ or } \ 3\pi/2 \le \arg z < 2\pi, \end{cases} \qquad (7)$$

which is also illustrated in Fig. 5. On the one hand, it is shown in (Aizenberg 2008; Aizenberg 2011) that the XOR problem can be solved using a single MVN-P with activation function (7). On the other hand, it is known from (Aizenberg 2010; Aizenberg 2011) that if there is an input/output mapping presented by a k-valued function $f(x_1, \ldots, x_n): E_k^n \to E_k$, which can be learned using a single MVN-P with activation function (4), then there exists a partially defined (in a k-valued domain $E_k^n \subseteq E_m^n$) function of $m = kl$-valued logic $f(x_1, \ldots, x_n): E_k^n \to E_m$, which can be implemented using a single MVN with the m-valued discrete activation function (2) (where k=m).

Fig. 5. Periodic activation function (4) for k=2, l=2

Let us consider an input/output mapping presented by the XOR function $f(x_1, x_2): E_2^2 \to E_2$, which is a Boolean function (a function of 2-valued logic). The XOR problem can be implemented using a single MVN-P, for example, with the weighting vector $(0, i, 1)$ (Aizenberg 2011). This is illustrated in Table 1.

Table 1 MVN-P with activation function (4) (k=2, l=2) or (7) (which is the same) implements the $f(x_1, x_2) = x_1 \text{ xor } x_2$ function with the weighting vector $(0, i, 1)$

| $x_1$ | $x_2$ | $z = w_0 + w_1 x_1 + w_2 x_2$ | $\arg(z)$ | $P_l(z)$ | $f(x_1, x_2) = x_1 \text{ xor } x_2$ |
|---|---|---|---|---|---|
| 1 | 1 | $i+1$ | $\pi/4$ | 1 | 1 |
| 1 | -1 | $i-1$ | $3\pi/4$ | -1 | -1 |
| -1 | 1 | $-i+1$ | $7\pi/4$ | -1 | -1 |
| -1 | -1 | $-i-1$ | $5\pi/4$ | 1 | 1 |

Evidently, the same weighting vector $(0, i, 1)$ implements a 4-valued function $f(x_1, x_2): E_2^2 \to E_4$ using a single MVN with activation function (2) (where k=4). This function $f(x_1, x_2)$ is composed as $f(x_1, x_2) = P(z)$ in Table 2.



Table 2 Composition of the partially defined (on the Boolean variables) function $f(x_1, x_2) = P(z)$ of 4-valued logic using MVN with activation function (2) (k=4) and the weighting vector $(0, i, 1)$

| $x_1$ | $x_2$ | $z = w_0 + w_1 x_1 + w_2 x_2$ | $\arg(z)$ | $f(x_1, x_2) = P(z)$ |
|---|---|---|---|---|
| 1 | 1 | $i+1$ | $\pi/4$ | 1 |
| 1 | -1 | $i-1$ | $3\pi/4$ | $i$ |
| -1 | 1 | $-i+1$ | $7\pi/4$ | $-i$ |
| -1 | -1 | $-i-1$ | $5\pi/4$ | -1 |

Let us now learn the function $f(x_1, x_2)$ (and therefore we will learn the XOR function simultaneously) using the Hebbian learning rule (6). Evidently, $\mathbf{x}_0 = (1, 1, 1, 1)^T$. It is seen from Table 2 that $\mathbf{x}_1 = (1, 1, -1, -1)^T$, $\mathbf{x}_2 = (1, -1, 1, -1)^T$, and $\mathbf{f} = (1, i, -i, -1)^T$. According to (6) we obtain the following weights:

$$w_0 = \frac{1}{3} (\mathbf{f}, \mathbf{x}_0) = \frac{1}{3} (1 + i - i - 1) = 0,$$
$$w_1 = \frac{1}{3} (\mathbf{f}, \mathbf{x}_1) = \frac{1}{3} (1 + i + i + 1) = \frac{2}{3} (1 + i),$$
$$w_2 = \frac{1}{3} (\mathbf{f}, \mathbf{x}_2) = \frac{1}{3} (1 - i - i + 1) = \frac{2}{3} (1 - i),$$

which form the weighting vector $W = \left( 0, \frac{2}{3}(1+i), \frac{2}{3}(1-i) \right)$.

Table 3 Implementation of the function $f(x_1, x_2)$ using a single MVN and the XOR function using a single MVN-P with the same weighting vector $W = \left( 0, \frac{2}{3}(1+i), \frac{2}{3}(1-i) \right)$ resulting from the Hebbian learning

| $x_1$ | $x_2$ | $z = w_0 + w_1 x_1 + w_2 x_2$ | $\arg(z)$ | $P(z) = f(x_1, x_2)$ | $P_l(z) = x_1 \text{ xor } x_2$ |
|---|---|---|---|---|---|
| 1 | 1 | $4/3$ | 0 | 1 | 1 |
| 1 | -1 | $(4/3)i$ | $\pi/2$ | $i$ | -1 |
| -1 | 1 | $-(4/3)i$ | $3\pi/2$ | $-i$ | -1 |
| -1 | -1 | $-4/3$ | $\pi$ | -1 | 1 |


This weighting vector implements the function $f(x_1, x_2)$ using a single MVN with activation function (2) (where k=4) and the XOR function using a single MVN-P with activation function (4) (k=2, l=2) or (7) (which is the same). This is shown in Table 3. These considerations show that the MVN Hebbian learning rule (6) can also be used for MVN-P. Thus the Hebbian learning rule is an efficient tool not only for MVN, but also for MVN-P. It is only necessary to know in advance the partially defined m-valued function $f(x_1, \ldots, x_n): E_k^n \to E_m$ to which the k-valued function $f(x_1, \ldots, x_n): E_k^n \to E_k$ that we want to learn is mapped.
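The MVN-P solution of the XOR problem can be verified numerically. The following sketch (ours, not from the paper) implements the periodic activation (4) and checks the Hebbian weighting vector from Table 3 against the XOR truth table of Table 1; the epsilon added to the sector index is an implementation detail guarding against floating-point noise on sector boundaries:

```python
import cmath
import math

def mvn_p(z, k, l):
    # activation (4): m = k*l sectors; the sector index j of arg z is reduced
    # modulo k and mapped to the k-th root of unity e^{i*2*pi*(j mod k)/k}
    m = k * l
    j = int((cmath.phase(z) % (2 * math.pi)) / (2 * math.pi / m) + 1e-9) % m
    return cmath.exp(1j * 2 * math.pi * (j % k) / k)

# Hebbian weighting vector from Table 3 and the XOR truth table of Table 1
w = (0, 2 / 3 * (1 + 1j), 2 / 3 * (1 - 1j))
xor_table = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
for x1, x2, desired in xor_table:
    z = w[0] + w[1] * x1 + w[2] * x2
    assert abs(mvn_p(z, k=2, l=2) - desired) < 1e-9
```

For k=2 the root of unity is $\varepsilon = e^{i\pi} = -1$, so the output alternates between 1 and -1 from sector to sector, exactly as in (7).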

3.3 Hebbian learning for a complex-valued neuron with arbitrary inputs and output

Let us now generalize the Hebbian learning rule for a complex-valued neuron whose inputs and output are arbitrary complex numbers except 0. Such a neuron works according to (5). Let us have a learning set containing $N$ $n$-dimensional learning samples. Let $\mathbf{f} = (f_1, \ldots, f_N)^T$ be an N-dimensional vector-column of the desired outputs. Let $\mathbf{x}_1, \ldots, \mathbf{x}_n$ be N-dimensional vector-columns of all the inputs, $\mathbf{x}_r = (x_r^1, x_r^2, \ldots, x_r^N)^T$; $r = 1, \ldots, n$. Let $\mathbf{x}_r^{-1} = \left( (x_r^1)^{-1}, (x_r^2)^{-1}, \ldots, (x_r^N)^{-1} \right)^T$; $r = 1, \ldots, n$. Let also $\mathbf{x}_0 = (1, 1, \ldots, 1)^T$. Then the generalized Hebbian learning rule for finding the weights $w_0, w_1, \ldots, w_n$ for a complex-valued neuron is as follows:

$$w_i = \frac{1}{n+1} \left( f_1 (x_i^1)^{-1} + f_2 (x_i^2)^{-1} + \ldots + f_N (x_i^N)^{-1} \right); \quad i = 0, \ldots, n. \qquad (8)$$

A wonderful property of (8) is that if $x_i \in E_k$ or $x_i \in O$; $i = 1, \ldots, n$ (so if the neuron inputs are kth roots of unity or arbitrary points on the unit circle), then (8) transforms to (6), that is, to the Hebbian learning rule for MVN. Indeed, if $x_i \in E_k$ or $x_i \in O$; $i = 1, \ldots, n$, then $|x_i| = 1$; $i = 1, \ldots, n$ and $\bar{x}_i = (x_i)^{-1}$; $i = 1, \ldots, n$. This means that if all $x_i$, $i = 1, \ldots, n$ are located on the unit circle, then (8) really transforms to (6).
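The generalized rule (8) and its reduction to (6) on the unit circle can be checked numerically. This sketch is our illustration (the function name and the `numpy` formulation are not from the paper):

```python
import numpy as np

def hebbian_weights_general(X, f):
    # generalized rule (8): w_i = (1/(n+1)) * sum_j f_j * (x_i^j)^(-1);
    # all inputs must be nonzero; the constant input x_0 = 1 is prepended
    N, n = X.shape
    X0 = np.hstack([np.ones((N, 1), dtype=complex), X])
    return ((1.0 / X0).T @ f) / (n + 1)

# single-sample check (cf. Fig. 6a): desired output 4i, inputs 2i and 2
w = hebbian_weights_general(np.array([[2j, 2]]), np.array([4j]))
assert np.allclose(w, [4j / 3, 2 / 3, 2j / 3])

# on the unit circle 1/x = conj(x), so (8) coincides with (6)
rng = np.random.default_rng(0)
X = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(5, 3)))
f = rng.normal(size=5) + 1j * rng.normal(size=5)
X0 = np.hstack([np.ones((5, 1), dtype=complex), X])
assert np.allclose(((1.0 / X0).T @ f) / 4, (X0.conj().T @ f) / 4)
```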

Fig. 6 Calculation of the weights for a complex-valued neuron with arbitrary complex inputs and output using the generalized Hebb rule (8) ((a) and (b))

Let us consider the following example, which illustrates how the generalized Hebbian rule (8) can be used for a complex-valued neuron whose inputs and output are arbitrary complex numbers (except 0) and whose activation function is linear. This example is illustrated in Fig. 6. Suppose, for simplicity but without loss of generality, that our neuron has the identity activation function $\Phi(z) = z$ ($\Phi(z)$ equals the weighted sum $z = w_0 + w_1 x_1 + \ldots + w_n x_n$).

In Fig. 6a, the desired neuron output is $4i$, while its two inputs are $2i$ and 2, respectively. According to (8),

$$w_0 = \frac{1}{3} \cdot 4i \cdot 1 = \frac{4}{3} i; \quad w_1 = \frac{1}{3} f_1 (x_1)^{-1} = \frac{1}{3} \cdot 4i \cdot \left( -\frac{1}{2} i \right) = \frac{2}{3}; \quad w_2 = \frac{1}{3} f_1 (x_2)^{-1} = \frac{1}{3} \cdot 4i \cdot \frac{1}{2} = \frac{2}{3} i.$$

The weighted sum and the output of the neuron is $z = \frac{4}{3} i + \frac{2}{3} \cdot 2i + \frac{2}{3} i \cdot 2 = 4i$.

In Fig. 6b, the desired neuron output is $4i$, while its two inputs are $i$ and $-2i$, respectively. According to (8),

$$w_0 = \frac{1}{3} \cdot 4i \cdot 1 = \frac{4}{3} i; \quad w_1 = \frac{1}{3} f_1 (x_1)^{-1} = \frac{1}{3} \cdot 4i \cdot (-i) = \frac{4}{3}; \quad w_2 = \frac{1}{3} f_1 (x_2)^{-1} = \frac{1}{3} \cdot 4i \cdot \frac{i}{2} = -\frac{2}{3}.$$

The weighted sum and the output of the neuron is $z = \frac{4}{3} i + \frac{4}{3} \cdot i - \frac{2}{3} \cdot (-2i) = 4i$.

Thus, in all considered examples, the weights obtained according to the generalized Hebbian rule guarantee that the actual neuron output coincides with the desired output.


However, when there are more learning samples in the learning set, the Hebbian learning rule usually does not lead to a weighting vector that implements the corresponding input/output mapping. Nevertheless, the learning algorithm based on the error-correction learning rule converges faster when the learning process starts from this (Hebbian) vector than when it starts from a random vector. Let us consider this property.

4. Generalized error-correction learning for a complex-valued neuron

Let us now generalize the MVN error-correction learning rule for a complex-valued neuron with arbitrary inputs and output described by (5). The error-correction learning rule is the most efficient rule among several MVN learning techniques. The error-correction learning rule is identical for both discrete and continuous MVNs. The MVN learning algorithm based on the error-correction rule is presented and analyzed in detail in (Aizenberg 2011), where its convergence is also proven. Let us recall how the MVN error-correction learning rule works. The most important property of MVN learning is that it is derivative-free. Let $D$ be the desired neuron output and $Y$ be the actual one. Then $\delta = D - Y$ is the error, which determines the adjustment of the weights performed as follows:

$$w_0^{r+1} = w_0^r + \frac{C_r}{n+1} \delta; \quad w_i^{r+1} = w_i^r + \frac{C_r}{n+1} \delta \bar{x}_i; \quad i = 1, \ldots, n, \qquad (9)$$

where $\bar{x}_i$ is the $i$th input complex-conjugated, $n$ is the number of neuron inputs, $\delta$ is the neuron's error, $r$ is the number of the learning step, $w_i^r$ is the current $i$th weight (to be corrected), $w_i^{r+1}$ is the following $i$th weight (after correction), and $C_r$ is the learning rate (it may always be equal to 1). Let us now again consider a complex-valued neuron whose inputs and output are arbitrary complex numbers (except 0). This neuron is described by (5). We would like to generalize the MVN error-correction learning rule (9) for such a neuron. Again, for simplicity but without loss of generality, we may consider that the activation function of this neuron is the identity function $\Phi(z) = z$.


It is important that in the error-correction learning rule (9) the adjusting term, which is added to the $i$th weight to correct it, contains a factor $\bar{x}_i$. Let us generalize rule (9) for a complex-valued neuron whose inputs and output are arbitrary complex numbers (except 0) in the following way:

$$w_0^{r+1} = w_0^r + \frac{C_r}{n+1} \delta; \quad w_i^{r+1} = w_i^r + \frac{C_r}{n+1} \delta x_i^{-1}; \quad i = 1, \ldots, n. \qquad (10)$$

Let us show that if the weights are corrected according to the generalized error-correction rule (10), then the updated neuron output is equal exactly to the desired value. Let $D$ be the desired neuron output and $Y$ be the actual one. Then $\delta = D - Y$ is the error. Let us use (10) (with $C_r = 1$) to adjust the weights:

$$\tilde{z} = \tilde{w}_0 + \tilde{w}_1 x_1 + \ldots + \tilde{w}_n x_n = \left( w_0 + \frac{1}{n+1} \delta \right) + \left( w_1 + \frac{1}{n+1} \delta x_1^{-1} \right) x_1 + \ldots + \left( w_n + \frac{1}{n+1} \delta x_n^{-1} \right) x_n$$
$$= \underbrace{w_0 + w_1 x_1 + \ldots + w_n x_n}_{z} + \underbrace{\frac{1}{n+1} \delta + \ldots + \frac{1}{n+1} \delta}_{n+1 \text{ times}} = z + \delta = Y + \delta = D. \qquad (11)$$
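The one-step property (11) can be checked numerically. This is our sketch, assuming $C_r = 1$, the identity activation $\Phi(z) = z$, and arbitrarily chosen weights and inputs:

```python
import numpy as np

def error_correction_step(w, x, desired, actual, lr=1.0):
    # rule (10): w_0 += C_r/(n+1) * delta, w_i += C_r/(n+1) * delta * x_i^(-1)
    n = len(x)
    delta = desired - actual
    w = w.copy()
    w[0] += lr * delta / (n + 1)
    w[1:] += lr * delta / (n + 1) / x   # x_i^(-1) = 1/x_i, inputs nonzero
    return w

w = np.array([0.1 + 0.2j, -0.3j, 0.5 + 0.1j])
x = np.array([2j, 1 - 1j])
y = w[0] + w[1:] @ x                       # actual output, Phi(z) = z
w_new = error_correction_step(w, x, desired=4j, actual=y)
y_new = w_new[0] + w_new[1:] @ x
assert abs(y_new - 4j) < 1e-12             # the desired output is reached in one step
```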

Thus, a single learning step with the modified error-correction rule makes it possible to reach the desired output immediately. Equation (11) shows that after the weights are corrected, the weighted sum is changed exactly by $\delta$, that is, by the error. Since the MVN inputs $x_1, \ldots, x_n$ are located on the unit circle ($x_i \in E_k$ or $x_i \in O$; $i = 1, \ldots, n$), then $|x_i| = 1$; $i = 1, \ldots, n$ and $\bar{x}_i = (x_i)^{-1}$; $i = 1, \ldots, n$. Taking this into account, we can easily conclude that the MVN learning rule (9) directly follows from the generalized error-correction learning rule (10). As is well known, and as we have already mentioned above, the Hebbian learning rule usually cannot create a weighting vector that implements a corresponding input/output mapping. For example, if a neuron solves a classification problem, this means that the Hebbian weights usually cannot separate classes. But the Hebbian weights, as was discovered for MVN in (Aizenberg 2011), form a better starting point for the learning algorithm based on the error-correction learning rule than random weights. A learning process based on the error-correction learning converges faster starting from the Hebbian weights than starting from random weights. This interesting property of the


Hebbian weights can be explained as follows. If some input/output mapping can be learned using a single neuron, then some ideal weighting vector exists for this input/output mapping. All other weighting vectors that implement the same input/output mapping using a single neuron should be either collinear or nearly collinear to an ideal weighting vector. A wonderful property of a Hebbian weighting vector is that even if it does not implement a given input/output mapping that can be learned using a single neuron, it is always close to an ideal weighting vector in terms of collinearity and can easily be adjusted using the error-correction learning rule, if necessary. Let us confirm this conclusion by some simulations.

5. Simulation results

Let us illustrate by simulations how efficient the error-correction learning is when it follows the Hebbian one and starts from the Hebbian weights. We will consider one benchmark problem (the 3-valued max function) for MVN and a real-world problem of wind profile forecasting for a complex-valued neuron with arbitrary complex inputs and output.

1) Learning of the 3-valued function $\max(x_1, x_2)$. Let us consider the input/output mapping described by the 3-valued function of 2 variables $f_{\max}(x_1, x_2) = \max(x_1, x_2)$. Thus k=3, and $x_i \in E_3 = \{\varepsilon^0, \varepsilon^1, \varepsilon^2\}$. Evidently, $\mathbf{f} = (\varepsilon^0, \varepsilon, \varepsilon^2, \varepsilon, \varepsilon, \varepsilon^2, \varepsilon^2, \varepsilon^2, \varepsilon^2)^T$, where $\varepsilon = e^{i 2\pi/3}$ is the primitive 3rd root of unity. Let us find the Hebbian weights for $f_{\max}(x_1, x_2)$. According to (8) we obtain the following Hebbian weighting vector $W_H = (-0.11 + 0.06i, 0.167 - 0.032i, 0.167 - 0.032i)$. This weighting vector does not implement the function $f_{\max}(x_1, x_2)$. The distribution of the weighted sums over the complex plane with the weighting vector $W_H$ is shown in Fig. 7a. The outputs for five learning samples out of nine (samples 2, 4, 6, 8, 9) are incorrect. However, they can easily be corrected using the learning algorithm based on the error-correction rule (9). After just a single learning iteration, the actual outputs for all the learning samples coincide with the desired outputs (see Fig. 7b). If the same learning process starts from random weights,


more than a single iteration is needed to complete it. This is confirmed by 10 independent runs.

Fig. 7 Movement of the weighted sum z after the correction of the weights according to (9), starting from the Hebbian weighting vector: (a) distribution of the weighted sums with the Hebbian weighting vector; (b) iteration 1
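The max-function experiment can be reproduced with a short simulation. The sketch below builds the learning set for f_max(x1, x2), computes Hebbian weights, and runs error-correction updates. The exact normalization and conjugation conventions of rules (8) and (9) are assumptions here (one common form from the MVN literature), so the computed Hebbian vector may differ from W_H above by conjugation or scaling:

```python
import numpy as np

# Sketch of MVN Hebbian + error-correction learning for max(x1, x2), k = 3.
# The normalization 1/(N(n+1)) in the Hebbian rule and the exact form of the
# error-correction update are assumptions, not the paper's formulas (8), (9).

k = 3
eps = np.exp(2j * np.pi / k)                      # primitive 3rd root of unity

samples = [(a, b) for a in range(k) for b in range(k)]
X = np.array([[1.0, eps**a, eps**b] for a, b in samples])  # x0 = 1 (bias input)
D = np.array([eps**max(a, b) for a, b in samples])         # desired outputs

def P(z):
    """k-valued MVN activation: the root of unity of the sector containing z."""
    j = int(np.floor(k * (np.angle(z) % (2 * np.pi)) / (2 * np.pi)))
    return eps**j

N, m = X.shape                                    # m = n + 1 weights
W = (D @ X.conj()) / (N * m)                      # Hebbian weights (assumed form)
W_hebb = W.copy()

for _ in range(100):                              # error-correction learning
    Y = np.array([P(z) for z in X @ W])
    wrong = np.flatnonzero(Y != D)
    if wrong.size == 0:
        break
    for j in wrong:
        W = W + (D[j] - Y[j]) * X[j].conj() / m   # move z_j toward desired sector

# After learning, the neuron implements the mapping exactly:
assert all(P(z) == d for z, d in zip(X @ W, D))
```

Since all inputs lie on the unit circle, each update shifts the misclassified weighted sum by exactly D_j − Y_j, i.e., directly toward the desired sector, which is why convergence from the Hebbian starting point is fast.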

2) Learning of the wind profile forecasting problem. This problem is important for estimating the power output of wind turbines (Goh et al. 2006). This estimation should be based on processing the wind vector (speed and direction), which defines a wind signal; this signal in turn can be treated as a complex-valued signal because each of its components is characterized by a pair of real numbers (Goh et al. 2006). The data used in our simulations were obtained from the Iowa (USA) Department of Transport¹. We used the data sampled at 3-hour intervals. Since a wind signal has spatial and temporal dimensions, it can be treated as a time series x_1, x_2, ..., x_{n−1}, x_n, x_{n+1}, .... We created a learning set for prediction of the value x_t from the 5 preceding values x_{t−5}, x_{t−4}, x_{t−3}, x_{t−2}, x_{t−1}. After 500 learning samples had been created, the Hebbian learning rule (8) was applied. The weighting vector resulting from (8), however, does not implement the input/output mapping we have constructed. As in the previous examples,

we used the linear identity activation function of a neuron, Φ(z) = z. It should be mentioned that this activation function is better for the considered problem than

¹ The data are available at http://www.commsp.ee.ic.ac.uk/~mandic/wind-dataset.zip; the same data were used in Goh et al. (2006).


the complex tanh function (this was tested experimentally). As a criterion of accuracy, we used the root mean square error (RMSE) between the actual and desired outputs, calculated separately for the real and imaginary parts:

RMSE_re = sqrt( Σ (D_re − Y_re)² / N );  RMSE_im = sqrt( Σ (D_im − Y_im)² / N ),

where D is a desired output, Y is an actual output, and N is the number of learning samples. We have considered the learning results satisfactory if RMSE for both real and imaginary parts dropped below 0.04. The Hebbian weights resulted in RMSEre = 0.2398 and RMSEim = 0.2046 . However, a single iteration (!) of the learning algorithm based on the error-correction rule (10) with Cr ≡ 1 and started from the Hebbian weights is enough to drop the error below the desired threshold: RMSEre = 0.0393 and RMSEim = 0.0382 .
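The wind-forecasting setup can be sketched end-to-end: building the 5-input sliding-window learning set and evaluating the RMSE criterion separately for real and imaginary parts. The synthetic series (speed times e^{i·direction}) and all names below are illustrative stand-ins for the real dataset, and a complex linear predictor fitted by least squares stands in for the paper's learning rules:

```python
import numpy as np

# Sketch of the wind-forecasting setup: each target x_t is predicted from its
# 5 preceding values, and accuracy is measured by RMSE computed separately
# over real and imaginary parts. Data and names here are illustrative.

def make_learning_set(signal, order=5):
    """Rows of `order` past samples as inputs, the next sample as the target."""
    X = np.array([signal[t - order:t] for t in range(order, len(signal))])
    return X, signal[order:]

def complex_rmse(D, Y):
    """RMSE over the real and imaginary parts of desired (D) vs actual (Y)."""
    err = D - Y
    return np.sqrt(np.mean(err.real ** 2)), np.sqrt(np.mean(err.imag ** 2))

rng = np.random.default_rng(0)
speed = rng.uniform(0.0, 10.0, 505)               # wind speed (arbitrary units)
direction = rng.uniform(0.0, 2 * np.pi, 505)      # wind direction (radians)
signal = speed * np.exp(1j * direction)           # complex-valued wind signal

X, d = make_learning_set(signal)                  # 505 samples -> 500 pairs
# A complex linear predictor with a bias input, fitted by least squares
# (a stand-in for the paper's rules, consistent with the neuron Φ(z) = z):
A = np.hstack([np.ones((len(X), 1)), X])
w, *_ = np.linalg.lstsq(A, d, rcond=None)
rmse_re, rmse_im = complex_rmse(d, A @ w)
```

On this unstructured synthetic series the residual RMSE stays large; on real wind data, which is temporally correlated, the same construction is what makes single-iteration convergence from Hebbian weights observable.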

Table 4 The results of 10 trials of the learning algorithm based on the error-correction rule (10) applied to the wind profile forecasting problem and started from the random weights

# of trial   1       2       3       4       5       6       7       8       9       10
# of iter.   500     500     2       2       2       2       2       2       500     1
RMSE_re      0.0403  0.0383  0.0375  0.0378  0.0390  0.0353  0.0391  0.0382  0.0399  0.0380
RMSE_im      0.0404  0.0402  0.0393  0.0396  0.0391  0.0363  0.0393  0.0383  0.0399  0.0382

Contrary to this, the results of the learning algorithm based on the error-correction learning rule, but started from random weights, may vary dramatically depending on how successful these starting random weights are. The results of 10 independent runs are summarized in Table 4. If the learning process had not converged with the desired accuracy after 500 iterations, we stopped it. As we see, one of the 10 "random" trials (trial 10) happened to converge after a single iteration, just like the learning process started from the Hebbian weights. However, 3 trials out of 10 did not converge with the desired accuracy within 500 iterations, while the 6 other trials converged after 2 iterations. This experiment shows that random starting weights for the error-correction learning rule can occasionally be very suitable, but can also occasionally be very


unsuitable, while the Hebbian weights always form a suitable starting point for the learning algorithm based on the error-correction rule.

6. Conclusions

In this paper, we have considered Hebbian and error-correction learning rules for complex-valued neurons. Both learning rules, which were earlier considered for the multi-valued neuron whose inputs and output are located on the unit circle, were generalized in this paper for a complex-valued neuron whose inputs and output are arbitrary complex numbers. It was also shown how the Hebbian learning rule can be used for the multi-valued neuron with a periodic activation function. We have also discovered that even if a Hebbian weighting vector does not implement some input/output mapping that can be implemented using a single neuron, it can easily be adjusted using the error-correction learning rule.

Acknowledgement. This work is supported by the National Science Foundation under Grant 0925080.

References

[1] Aizenberg I (2008) Solving the XOR and parity n problems using a single universal binary neuron. Soft Comput 12(3):215–222
[2] Aizenberg I (2010) A periodic activation function and a modified learning algorithm for a multi-valued neuron. IEEE Trans Neural Netw 21(12):1939–1949
[3] Aizenberg I (2011) Complex-valued neural networks with multi-valued neurons. Springer, Heidelberg
[4] Aizenberg NN, Aizenberg IN (1992) CNN based on multi-valued neuron as a model of associative memory for gray-scale images. Proceedings of the Second IEEE International Workshop on Cellular Neural Networks and their Applications, Technical University Munich, Germany, October 14–16, pp 36–41
[5] Aizenberg NN, Ivaskiv YL, Pospelov DA (1971) About one generalization of the threshold function. Doklady Akademii Nauk SSSR (The Reports of the Academy of Sciences of the USSR) (in Russian) 196(6):1287–1290
[6] Aizenberg I, Aizenberg N, Vandewalle J (2000) Multi-valued and universal binary neurons: theory, learning and applications. Kluwer Academic Publishers, Boston/Dordrecht/London
[7] Aizenberg I, Moraga C (2007) Multilayer feedforward neural network based on multi-valued neurons (MLMVN) and a backpropagation learning algorithm. Soft Comput 11(2):169–183


[8] Aizenberg I, Moraga C, Paliy D (2005) Feedforward neural network based on multi-valued neurons. In: Reusch B (ed) Computational intelligence, theory and applications. Advances in soft computing, XIV. Springer, Berlin, pp 599–612
[9] Aoki H, Watanabe E, Nagata A, Kosugi Y (2001) Rotation-invariant image association for endoscopic positional identification using complex-valued associative memories. In: Mira J, Prieto A (eds) Bio-inspired applications of connectionism. Lecture notes in computer science, vol 2085. Springer, Berlin, pp 369–374
[10] Fiori S (2003) Extended Hebbian learning for blind separation of complex-valued sources. IEEE Trans Circuits Syst II 50(4):195–202
[11] Fiori S (2005) Non-linear complex-valued extensions of Hebbian learning: an essay. Neural Comput 17(4):779–838
[12] Goh SL, Chen M, Popovic DH, Aihara K, Obradovic D, Mandic DP (2006) Complex-valued forecasting of wind profile. Renew Energy 31:1733–1750
[13] Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall
[14] Hebb DO (1949) The organization of behavior. John Wiley & Sons, New York
[15] Hirose A (2006) Complex-valued neural networks. Springer, Berlin
[16] Jankowski S, Lozowski A, Zurada JM (1996) Complex-valued multistate neural associative memory. IEEE Trans Neural Netw 7:1491–1496
[17] Mandic D, Su Lee Goh V (2009) Complex valued nonlinear adaptive filters: noncircularity, widely linear and neural models. Wiley, New York
[18] Muezzinoglu MK, Guzelis C, Zurada JM (2003) A new design method for the complex-valued multistate Hopfield associative memory. IEEE Trans Neural Netw 14(4):891–899
[19] Rosenblatt F (1960) On the convergence of reinforcement procedures in simple perceptron. Report VG-1196-G-4, Cornell Aeronautical Laboratory, Buffalo, NY
