Enhanced Artificial Neural Networks Using Complex Numbers
Howard E. Michel, Computer Science Department, University of Dayton, Dayton, OH 45469-2160
and
A. A. S. Awwal, Computer Science & Engineering, Wright State University, Dayton, OH 45435
Abstract
A model of a simple perceptron using phase-encoded inputs and complex-valued weights is proposed. The aggregation function, activation function, and learning rule for the proposed neuron are derived and applied to two- and three-input Boolean logic functions. For three-input functions, the neuron solves 245 of the 256 possible functions. This is an improvement of 135% over the 104 linearly separable functions, which is the theoretical maximum for a conventional perceptron. It is achieved without additional logic, neuron stages, or higher-order terms such as those required in polynomial logic gates. Such a network is very attractive for optical implementation, since optical computations are naturally complex-valued.
Introduction
The processing power of an artificial neuron depends on the information representation used in the neuron. Traditionally, artificial neural networks (ANNs) used to process real-valued physical data have relied on real-valued weights. The interconnection weights represent the learned behavior of the ANN. They derive from the recognition that, at a simplified level, a biological neuron's firing rate represents the information in the network. However, some of the limitations of existing ANNs may be traced to limitations in the representation of information. The objective of this work is to develop a new neuron model and a new learning paradigm that encode information so that large-scale problems can be solved more easily on digital computers. It is hypothesized that representing real-world digitized scalar data as phase, and operating on these data in the complex domain, might improve the performance of ANNs.
The idea of using complex numbers in ANNs, however, is not new. Various researchers have developed complex-valued ANNs and applied them to complex-valued data, such as complex signals and Fourier transforms [1-7]. Others, searching for a suitable medium for implementing neural networks, have explored optics, which naturally performs calculations in the complex domain [8-10]. Complex numbers have also been exploited in Hopfield-type associative memories, both for associative retrieval with partial input [11, 12] and for rotation-invariant retrieval using the Fourier transform of edge data [13]. Still others have developed complex-valued artificial neural networks that solve Boolean functions of n variables by selecting an output state from a complex plane divided into m regions, with m > n [14]. The work proposed here extends the use of complex numbers to general ANN architectures and proposes a new learning paradigm. The new neuron representation is shown to be at least as computationally powerful as existing ANNs, and in many cases more powerful.
Mathematical representation of the proposed neuron
The proposed complex-valued artificial neuron is similar in composition to a traditional artificial neuron, except that all weights, w_i, are represented by complex numbers. Externally, input data and output data are real. Therefore, input and output mappings are required, along with complex-valued internal neuron functions. These internal functions are called aggregation (f1) and activation (f2). Each of these operations is developed in the following sections. The input mapping defines how real-world data are represented in the ANN calculations. In the complex-valued artificial neuron, this mapping is from a real-world value, typically a real number or a logic value, into a complex number. Even in traditional ANNs, real-valued data from the real world must be mapped into a specified range. The input mapping is therefore not an additional stage unique to the complex-valued artificial neuron.
To express the input mapping for the complex-valued artificial neuron, assume that the set of input variables P is composed of n-tuples whose components p_i are expressed as in equation 1. One possible input mapping for Boolean data is shown in equation 2. Equation 3 is the fully developed version of equation 1 for λp = 1. Discrete logic levels are thus coded as periodic pulse trains with unity magnitude and different phases. This is a mapping ℝⁿ → ℂⁿ.
$$p_i = \lambda_{p_i} e^{i\psi_i} \tag{1}$$
$$m_\beta:\ \psi_i = \begin{cases} 0, & \text{if data} = \text{FALSE} \\ \pi/2, & \text{if data} = \text{TRUE} \end{cases} \tag{2}$$
$$\text{mcv\_perceptron\_input}:\ p_i = \begin{cases} e^{i0}, & \text{if data} = \text{FALSE} \\ e^{i\pi/2}, & \text{if data} = \text{TRUE} \end{cases} \tag{3}$$
The complex-valued neuron uses a perceptron-like activation function, that is, a hard-limiting function. The magnitude of a complex number was chosen as the domain variable for the activation function for three reasons. It is easy to compute, it is easy to measure optically and electronically, and it captures the effects of both angle differences and individual component magnitudes. The activation function therefore operates on the magnitude of the values q in the intermediate space. It is shown in equation 5, where a and T are real numbers and q is complex.
$$a = \begin{cases} 0, & \text{if } |q| < T \\ 1, & \text{if } |q| \ge T \end{cases} \tag{5}$$
A conventional neuron uses a linear threshold. This neuron instead uses a circular threshold: the resultant q lies either inside or outside a decision circle of radius T in the complex plane. The activation function is thus a mapping of the form ℂ → ℝ.
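The input mapping of equation 3 and the circular threshold of equation 5 can be sketched in a few lines of Python. This is a minimal illustration assuming unit magnitudes (λp = 1). The function names `encode_boolean` and `activation` are hypothetical and do not come from the original work.

```python
import cmath
import math


def encode_boolean(bit: bool) -> complex:
    """Input mapping of equation 3 with unit magnitude (lambda_p = 1).

    FALSE -> e^{i*0} = 1, TRUE -> e^{i*pi/2} = i (up to rounding).
    """
    psi = math.pi / 2 if bit else 0.0  # phase chosen by equation 2
    return cmath.exp(1j * psi)


def activation(q: complex, T: float) -> int:
    """Circular hard-limiting threshold of equation 5: 1 if |q| >= T, else 0."""
    return 1 if abs(q) >= T else 0


T = 1.2
# Equal magnitudes give equal outputs regardless of phase: only |q| matters.
print(activation(1.3 + 0j, T), activation(1.3j, T), activation(0.5 - 0.5j, T))  # 1 1 0
```

The decision boundary is the circle |q| = T. The output therefore depends only on the distance of q from the origin, not on its phase.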
In a traditional neuron, an output mapping from the internal representation to the physical representation is required. This is a mapping of the form ℝ → ℝ that is typically concerned with scaling and/or numerical accuracy. The complex-valued artificial neuron's activation function is of the form ℂ → ℝ. Its output mapping is therefore also of the form ℝ → ℝ and is identical to the traditional neuron's output mapping.
The complex-valued aggregation function is modeled on the form of a traditional neuron's aggregation function, as shown in equation 4. Here, p ∈ ℂⁿ is a column vector of the input components p_i, and w ∈ ℂⁿ is a row vector of weight terms w_i. The aggregation function is thus a mapping ℂⁿ → ℂ.
$$q = \mathbf{w}\,\mathbf{p} = \sum_{i=1}^{n} w_i p_i \tag{4}$$
The aggregation function feeds directly into the activation function. The range of the aggregation function is therefore the domain of the activation function, which we call the intermediate space. The range of the activation function is the output space. Note that the output space is not the real-world value but the representation of the “solution” within the artificial neuron; the neuron must eventually respond with a real-valued answer.
The aggregation function of a two-input complex-valued neuron is shown in equation 6. The variables ψ_i and λp were defined in equation 1. The variables λw and θ_i are the magnitude and angle of the weight term, respectively. For simplicity in the present discussion, we assume λw = 1, so the learned weights are represented by θ_i.
Unlike in a traditional neuron, this aggregation is not linear in the underlying input data. The result depends on the relationships among the various weights and inputs as well as on their individual values. These relationships are described in detail below.
$$q = \lambda_{w1}\lambda_{p1}\left[\cos(\theta_1+\psi_1) + i\sin(\theta_1+\psi_1)\right] + \lambda_{w2}\lambda_{p2}\left[\cos(\theta_2+\psi_2) + i\sin(\theta_2+\psi_2)\right] \tag{6}$$
Only the magnitude of the resultant vector, not its phase, affects the outcome of the activation function, and all λ's equal 1. Equation 6 can therefore be replaced by a simpler formula. Equation 7 expresses the magnitude squared of the intermediate result in terms of the phases of the inputs and weights for the simple 2-tuple neuron.
$$r = |q|^2 = 2 + 2\cos(\theta_1+\psi_1-\theta_2-\psi_2) \tag{7}$$
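A quick numerical check can make equations 4 and 7 concrete. The sketch below assumes unit magnitudes and uses arbitrary, hypothetical weight angles. It computes q = wp directly from the phase-encoded inputs and compares |q|² with the closed form of equation 7 for all four Boolean input combinations. The helper names are illustrative only.

```python
import cmath
import math


def aggregate(weights, inputs):
    """Aggregation of equation 4: q = w p = sum_i w_i * p_i."""
    return sum(w * p for w, p in zip(weights, inputs))


def r_closed_form(theta1, theta2, psi1, psi2):
    """Equation 7: r = |q|^2 for two inputs with all magnitudes equal to 1."""
    return 2 + 2 * math.cos(theta1 + psi1 - theta2 - psi2)


theta1, theta2 = 0.7, -1.1  # hypothetical weight angles, chosen arbitrarily
for psi1 in (0.0, math.pi / 2):
    for psi2 in (0.0, math.pi / 2):
        w = [cmath.exp(1j * theta1), cmath.exp(1j * theta2)]
        p = [cmath.exp(1j * psi1), cmath.exp(1j * psi2)]
        q = aggregate(w, p)
        print(f"psi=({psi1:.4f}, {psi2:.4f})  |q|^2={abs(q) ** 2:.6f}  "
              f"eq7={r_closed_form(theta1, theta2, psi1, psi2):.6f}")
```

Only the difference θ1 − θ2 enters equation 7, so adding the same angle to both weights leaves r unchanged. This illustrates why one weight cannot be adjusted without regard to the others.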
The effective change in output in response to a weight change depends on the relationship of that weight to the other weights and all inputs. A weight term is not simply associated with only its corresponding input. This issue will be considered further as a new learning rule is developed for complex-valued artificial neurons.
$$\Delta w = \Delta\theta = \frac{1}{\partial r/\partial\theta}\,\Delta r \tag{8}$$
Let the weight change be Δw = w_new − w_old. As discussed above, the relevant part of the weight term is its angle θ, so Δw = Δθ = θ_new − θ_old. The required change in the resultant is Δr = r_new − r_old. Equation 8 relates the change in the weight, Δw, to the change in the resultant, Δr. Equation 9 is accordingly selected as the training rule for the complex-valued neuron. However, equation 9 assumes that the desired change in the resultant r, that is, Δr, is known. In practice, what is known is the desired change in the output: the error d − a (desired minus actual). To arrive at the resultant r, this error must be carried back across the activation function defined by equation 5. If the activation function were continuous, the partial derivatives in equations 8 and 9 could be extended back to the output. However, it is discontinuous at the threshold, so this derivative cannot be taken. As an approximation, we assume that a correction Δr directed toward the threshold of the activation function, on either side of the threshold, satisfies the training goal in a “local” manner. Specifically, if d − a is positive, Δr should be positive, and if d − a is negative, Δr should be negative. Replacing Δr in equation 9 with a proportion of the output error d − a gives the final training rule for the 2-input complex-valued artificial neuron, equation 10. The proportionality constant η is known as the “learning” rate.
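$$\begin{bmatrix}\theta_{1,\text{new}} & \theta_{2,\text{new}}\end{bmatrix} = \begin{bmatrix}\theta_{1,\text{old}} & \theta_{2,\text{old}}\end{bmatrix} + \Delta r \begin{bmatrix}\dfrac{1}{\partial r/\partial\theta_1} & \dfrac{1}{\partial r/\partial\theta_2}\end{bmatrix} \tag{9}$$
$$\begin{bmatrix}\theta_{1,\text{new}} & \theta_{2,\text{new}}\end{bmatrix} = \begin{bmatrix}\theta_{1,\text{old}} & \theta_{2,\text{old}}\end{bmatrix} + \eta(d-a) \begin{bmatrix}\dfrac{1}{\partial r/\partial\theta_1} & \dfrac{1}{\partial r/\partial\theta_2}\end{bmatrix} \tag{10}$$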
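One application of equation 10 to the 2-input neuron can be sketched as follows. The partial derivatives come from differentiating equation 7: ∂r/∂θ1 = −2sin(θ1 + ψ1 − θ2 − ψ2) and ∂r/∂θ2 = +2sin(θ1 + ψ1 − θ2 − ψ2). The function names, the example values and the `eps` guard against dividing by a near-zero derivative are hypothetical choices of this sketch, not part of the original rule.

```python
import math


def resultant_sq(theta, psi):
    """Equation 7: r = |q|^2 for the 2-input neuron (all magnitudes 1)."""
    return 2 + 2 * math.cos(theta[0] + psi[0] - theta[1] - psi[1])


def grad_r(theta, psi):
    """Partial derivatives of equation 7 with respect to theta1 and theta2."""
    s = math.sin(theta[0] + psi[0] - theta[1] - psi[1])
    return (-2 * s, 2 * s)


def train_step(theta, psi, d, T, eta, eps=1e-6):
    """One application of equation 10.

    theta_k <- theta_k + eta * (d - a) / (dr/dtheta_k)
    The eps guard is an assumption of this sketch: equation 10 divides by the
    derivative, so a component is left unchanged when |dr/dtheta_k| <= eps.
    """
    a = 1 if math.sqrt(resultant_sq(theta, psi)) >= T else 0  # equation 5
    err = d - a
    new_theta = list(theta)
    for k, g in enumerate(grad_r(theta, psi)):
        if err != 0 and abs(g) > eps:
            new_theta[k] += eta * err / g
    return new_theta, a


# Hypothetical example: both inputs FALSE (psi = 0), desired output 0.
theta = [0.0, math.pi / 2]
psi = (0.0, 0.0)
new_theta, a = train_step(theta, psi, d=0, T=1.2, eta=0.1)
print(a, resultant_sq(theta, psi), resultant_sq(new_theta, psi))
```

To first order, Δr ≈ (∂r/∂θ1)Δθ1 + (∂r/∂θ2)Δθ2 = 2η(d − a), so each step moves r in the direction the error requests. In the example, r drops from 2 to about 1.80, moving |q| toward the threshold. Because the rule divides by the derivatives, steps become very large near points where sin(θ1 + ψ1 − θ2 − ψ2) ≈ 0.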
The complex-valued 2-input-plus-bias neuron
Bias in a traditional neuron can be viewed in two ways. The first view is that the bias shifts the threshold point of the activation function. In this context, the equivalent bias term in the complex-valued neuron is a shift in the decision threshold of the activation function, that is, a shrinking or expanding of the decision circle. The second view is that the bias adds an input-independent value to the summation performed by the aggregation function. This view can be accommodated in the complex-valued neuron by adding an input-independent complex number to the complex summation performed by the aggregation function. Adding this complex number is equivalent to a vector shift of the resultant in the intermediate space before thresholding. Note that in a traditional neuron the threshold is the additive inverse of the bias, so together they provide one additional “degree of freedom.” In the complex-valued neuron, the bias and threshold provide three additional degrees of freedom: one for the threshold, and one each for the magnitude and angle of the bias.
The addition of a bias term to an artificial neuron can be expressed by incorporating a bias element into the input and weight vectors. This creates extended vectors whose dimensionality is increased by one. The 2-input-plus-bias complex-valued neuron therefore uses a 3-tuple input set instead of a 2-tuple input set. The added term is a constant, independent of the input. It should not be confused with the transformation from a 2-input threshold logic gate (TLG) to a 3-input polynomial logic gate (PLG), in which the additional term is a function of the other two inputs. The 2-input-plus-bias artificial neuron is still a single-level operation. Extending the weight vector involves adding a weight term to be applied to the bias term. The extended input vector p is defined by equation 11, with its components p_i defined by equation 1. The component b can be either real or complex; for simplicity, and without loss of generality, it is assumed that b = 1. The extended w is defined by equation 12.
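$$\mathbf{p} = \begin{pmatrix} p_1 & p_2 & b \end{pmatrix}^T \tag{11}$$
$$\mathbf{w} = \begin{pmatrix} \lambda_{w1}e^{i\theta_1} & \lambda_{w2}e^{i\theta_2} & \lambda_b e^{i\theta_b} \end{pmatrix} \tag{12}$$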
Applying equations 11 and 12 to the aggregation function of equation 4 gives an expression for the resultant q similar to equation 6; it is shown in equation 13. Making the same simplifying assumptions about the magnitudes of the input and weight terms, λp and λw respectively, gives an expression for the magnitude squared of the resultant, r, similar to equation 7; it is shown in equation 14. Note that no assumption is made about the magnitude of the bias term, λb, so it appears explicitly in equation 14. This issue is covered further below.
q = λw1λ p1 [cos(θ 1 + ψ 1 ) + i sin (θ 1 + ψ 1 )]
+ λw 2 λ p 2 [cos(θ 2 + ψ 2 ) + i sin (θ 2 + ψ 2 )] (13) + λb [cos(θ b ) + i sin (θ b )]
r= q
2
= 2 + λ2b + 2 cos (θ 1 + ψ 1 − θ 2 − ψ 2 ) + 2λb cos (θ 1 + ψ 1 − θ b )
(14)
+ 2λb cos (θ 2 + ψ 2 − θ b )
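The extended vectors of equations 11 and 12 can be checked against equation 14 in the same way. This sketch assumes b = 1 and unit input and weight magnitudes, and leaves λb free as in equation 14. The parameter values and the function names `extended_q` and `r_bias` are hypothetical.

```python
import cmath
import math


def extended_q(theta1, theta2, theta_b, lam_b, psi1, psi2, b=1.0):
    """Equations 11-13: q = w p with extended vectors p = (p1, p2, b)^T and
    w = (e^{i theta1}, e^{i theta2}, lam_b e^{i theta_b}); lambda_p = lambda_w = 1."""
    p = [cmath.exp(1j * psi1), cmath.exp(1j * psi2), b]
    w = [cmath.exp(1j * theta1), cmath.exp(1j * theta2), lam_b * cmath.exp(1j * theta_b)]
    return sum(w_i * p_i for w_i, p_i in zip(w, p))


def r_bias(theta1, theta2, theta_b, lam_b, psi1, psi2):
    """Equation 14: closed form of r = |q|^2 for the 2-input-plus-bias neuron (b = 1)."""
    return (2 + lam_b ** 2
            + 2 * math.cos(theta1 + psi1 - theta2 - psi2)
            + 2 * lam_b * math.cos(theta1 + psi1 - theta_b)
            + 2 * lam_b * math.cos(theta2 + psi2 - theta_b))


# Hypothetical parameters; compare the direct |q|^2 with equation 14.
params = (0.4, -1.3, 2.2, 0.8)  # theta1, theta2, theta_b, lam_b
for psi1 in (0.0, math.pi / 2):
    for psi2 in (0.0, math.pi / 2):
        q = extended_q(*params, psi1, psi2)
        print(round(abs(q) ** 2, 9), round(r_bias(*params, psi1, psi2), 9))
```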
Following the same development as for the 2-input neuron, a learning rule similar to equation 10 is obtained. It is shown formally in equations 15 through 18.
$$\frac{\partial r}{\partial \theta_b} = 2\lambda_b\sin(\theta_1+\psi_1-\theta_b) + 2\lambda_b\sin(\theta_2+\psi_2-\theta_b) \tag{15}$$
$$\frac{\partial r}{\partial \theta_1} = -2\sin(\theta_1+\psi_1-\theta_2-\psi_2) - 2\lambda_b\sin(\theta_1+\psi_1-\theta_b) \tag{16}$$
$$\frac{\partial r}{\partial \theta_2} = 2\sin(\theta_1+\psi_1-\theta_2-\psi_2) - 2\lambda_b\sin(\theta_2+\psi_2-\theta_b) \tag{17}$$
$$\begin{bmatrix}\theta_{1,\text{new}} & \theta_{2,\text{new}} & \theta_{b,\text{new}}\end{bmatrix} = \begin{bmatrix}\theta_{1,\text{old}} & \theta_{2,\text{old}} & \theta_{b,\text{old}}\end{bmatrix} + \eta(d-a)\begin{bmatrix}\dfrac{1}{\partial r/\partial\theta_1} & \dfrac{1}{\partial r/\partial\theta_2} & \dfrac{1}{\partial r/\partial\theta_b}\end{bmatrix} \tag{18}$$
Up to this point, the development of the 2-input-plus-bias complex-valued artificial neuron has followed that of the 2-input version: every input and aggregation equation for the more complex neuron has a counterpart in the simpler one. The parameters λb and T, not yet addressed, are discussed next. The actual output a_i is related to the threshold T through the hard-limiting function of equation 5. If the magnitude of the intermediate-space resultant q_i is less than T, the actual output is set to 0; otherwise, it is set to 1. Two error conditions can exist: either the desired output d is 1 and the actual output is 0, or the desired output is 0 and the actual output is 1. If the desired output is 1 and the actual output is 0, the threshold should be reduced. Conversely, if the desired output is 0 and the actual output is 1, the threshold should be increased.
Define the error as the difference d − a between the desired and actual outputs; it can be −1, 0, or 1. Subtracting this error, scaled by a learning constant, from the threshold moves the threshold in the correct direction. By paralleling the perceptron learning rule, a new learning rule for the threshold T of the complex-valued neuron was developed; it is shown in equation 19. In equation 19, η is a learning constant similar to the one in equation 18, but the two values need not be equal.
$$T_{\text{new}} = T_{\text{old}} - \eta(d-a)\,T_{\text{old}} \tag{19}$$
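The learning step for the 2-input-plus-bias neuron can also be sketched in Python. This is a minimal illustration that assumes unit input and weight magnitudes and b = 1, as in equation 14. The helper names (`r_bias`, `grads`, `update_angles`, `update_threshold`), the evaluation point and the `eps` guard are hypothetical choices of this sketch. A central finite difference of equation 14 checks the analytic derivatives of equations 15 through 17.

```python
import math


def r_bias(t1, t2, tb, lam_b, psi1, psi2):
    """Equation 14 (all input and weight magnitudes 1, bias input b = 1)."""
    return (2 + lam_b ** 2 + 2 * math.cos(t1 + psi1 - t2 - psi2)
            + 2 * lam_b * math.cos(t1 + psi1 - tb)
            + 2 * lam_b * math.cos(t2 + psi2 - tb))


def grads(t1, t2, tb, lam_b, psi1, psi2):
    """Equations 16, 17 and 15: dr/dtheta1, dr/dtheta2, dr/dtheta_b."""
    A = t1 + psi1 - t2 - psi2
    B1 = t1 + psi1 - tb
    B2 = t2 + psi2 - tb
    d1 = -2 * math.sin(A) - 2 * lam_b * math.sin(B1)
    d2 = 2 * math.sin(A) - 2 * lam_b * math.sin(B2)
    db = 2 * lam_b * math.sin(B1) + 2 * lam_b * math.sin(B2)
    return d1, d2, db


def update_angles(thetas, derivs, err, eta, eps=1e-6):
    """Equation 18: theta_k += eta*(d - a) / (dr/dtheta_k).
    The eps guard against near-zero derivatives is an assumption of this sketch."""
    return [t + eta * err / g if abs(g) > eps else t for t, g in zip(thetas, derivs)]


def update_threshold(T, err, eta):
    """Equation 19: T_new = T_old - eta*(d - a)*T_old."""
    return T - eta * err * T


# Central finite-difference check of equations 15-17 at a hypothetical point.
point = [0.4, -1.3, 2.2, 0.8, 0.0, math.pi / 2]  # theta1, theta2, theta_b, lam_b, psi1, psi2
h = 1e-6
numeric = []
for k in range(3):
    up, down = list(point), list(point)
    up[k] += h
    down[k] -= h
    numeric.append((r_bias(*up) - r_bias(*down)) / (2 * h))
print(grads(*point))
print(numeric)
```

Because the correction in equation 19 is scaled by T_old, each threshold step changes T by the fraction η of its current value. With d − a = 1 the threshold shrinks, and with d − a = −1 it grows, as the text requires.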
Changing λb, the bias term magnitude, changes which intermediate-space values q_i exceed the threshold magnitude T, and therefore changes their corresponding output values a_i. The relationship among these terms is, however, complex. The effect of a change in the bias magnitude on the output depends on the amount of change and on the angle of the bias term relative to the other components of the intermediate term q. Equation 14 expresses the relationship between the squared intermediate resultant, r, and the bias magnitude λb. The partial derivative of r with respect to λb captures the effect that changing λb has on r. The goal of this learning rule is to change λb so as to effect a desired change in r; that is, given a desired change in r, what should the change in λb be? Equation 20 expresses the relationship. Again, η is a learning constant, not necessarily equal to the learning constants used in equations 18 and 19.
$$\lambda_{b,\text{new}} = \lambda_{b,\text{old}} + \frac{\eta(d-a)}{\partial r/\partial\lambda_b} \tag{20}$$
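Differentiating equation 14 with respect to λb gives ∂r/∂λb = 2λb + 2cos(θ1 + ψ1 − θb) + 2cos(θ2 + ψ2 − θb), the quantity that equation 20 divides by. The resulting update can be sketched as follows. The function names are hypothetical, and skipping the update when the derivative vanishes is an assumption of this sketch.

```python
import math


def dr_dlam_b(t1, t2, tb, lam_b, psi1, psi2):
    """Derivative of equation 14 with respect to the bias magnitude lambda_b."""
    return (2 * lam_b + 2 * math.cos(t1 + psi1 - tb)
            + 2 * math.cos(t2 + psi2 - tb))


def update_lam_b(lam_b, deriv, err, eta, eps=1e-6):
    """Equation 20: lambda_b += eta*(d - a) / (dr/dlambda_b).
    Skipping the update when |deriv| <= eps is an assumption of this sketch."""
    if err == 0 or abs(deriv) <= eps:
        return lam_b
    return lam_b + eta * err / deriv


# Hypothetical case: all angles 0, both inputs FALSE, lambda_b = 1.
g = dr_dlam_b(0.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # 2 + 2 + 2 = 6
print(g, update_lam_b(1.0, g, err=1, eta=0.3))
```

To first order, the update changes r by about (∂r/∂λb)Δλb = η(d − a), mirroring the behavior of the angle updates.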
Computer Simulation Results
Applying the learning rules developed above, the 2-input-plus-bias complex-valued artificial neuron is capable of learning all 16 possible functions of two Boolean variables, x1 and x2. Traditional perceptrons can learn only 14 of those functions. The learned weight values for all 16 functions are shown in Table 1, where the Y column lists the 16 possible functions of two Boolean variables. Each Y entry specifies a function by indicating which of the four minterms are included in the output: Y = y1 y2 y3 y4, with y1 = ¬x1 ¬x2, y2 = ¬x1 x2, y3 = x1 ¬x2, and y4 = x1 x2. A 1 in a minterm position means that minterm is included in the output function; a 0 means it is not. For example, Y = 0001 includes only minterm y4 and is thus the AND function. Y = 0110 includes minterms y2 and y3 and is thus the XOR function. The learned weights in Table 1 are solutions for all λi = 1 and T = 1.2, with the inputs encoded as specified in equation 3. The correctness of the learned values can be verified by applying equation 13 to the vectors. For example, for minterm y1 of the AND function,
$$|q_1| = \left| e^{i0}e^{i1.5425} + e^{i0}e^{i0.1151} + e^{i0}e^{-i1.9035} \right| = 0.7154$$
and for minterm y4,
$$|q_4| = \left| e^{i\pi/2}e^{i1.5425} + e^{i\pi/2}e^{i0.1151} + e^{i0}e^{-i1.9035} \right| = 1.4431$$
Applying equation 5, |q1| < 1.2 and |q4| > 1.2. The actual output is therefore 0 for minterm y1 (as it would be for minterms y2 and y3 if they were shown) and 1 for minterm y4.
The 2-input-plus-bias complex-valued artificial neuron has been extended to a 3-input-plus-bias neuron, which was trained on all three-variable Boolean functions. In these simulations, the neuron computed solutions to 245 of the 256 possible functions. Only 104 of these functions are linearly separable, so the complex-valued neuron solves at least 135% more three-variable functions than a conventional perceptron.
| Y = y1y2y3y4 | θ1 | θ2 | θb |
|---|---|---|---|
| 0000 | -0.3246 | 1.4096 | -1.7286 |
| 0001 | 1.5425 | 0.1151 | -1.9035 |
| 0010 | -1.0964 | 1.9306 | 1.0902 |
| 0011 | 0.0973 | 2.8959 | -3.0582 |
| 0100 | -2.4710 | -0.2429 | 2.3096 |
| 0101 | 0.6664 | -2.0151 | 0.7127 |
| 0110 | -2.3024 | 1.0335 | 1.7290 |
| 0111 | -0.6707 | -0.4818 | 2.2412 |
| 1000 | 1.9473 | 1.9035 | 0.1437 |
| 1001 | -2.4268 | -3.0646 | 1.1171 |
| 1010 | 1.3255 | -3.1392 | 0.7240 |
| 1011 | 2.1486 | -2.3042 | -1.7459 |
| 1100 | -0.5393 | -1.6445 | -2.0235 |
| 1101 | 0.6765 | -0.4642 | -0.6235 |
| 1110 | 1.1095 | 0.8900 | -0.0889 |
| 1111 | 0.2856 | 0.4488 | 0.6283 |
Table 1. Learned weights, in radians, for the 2-input-plus-bias complex-valued perceptron (all λ = 1, T = 1.2)
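The spot check of minterms y1 and y4 can be extended to whole truth tables. The sketch below evaluates the AND (0001) and XOR (0110) rows of Table 1. It assumes the equation 3 encoding, b = 1, all λ = 1 and T = 1.2, as stated for Table 1. The names `TABLE_1`, `encode` and `neuron_output` are hypothetical, and only two rows are transcribed for brevity.

```python
import cmath
import math

# Learned angles (theta1, theta2, theta_b) in radians, copied from Table 1.
TABLE_1 = {
    "0001": (1.5425, 0.1151, -1.9035),  # AND
    "0110": (-2.3024, 1.0335, 1.7290),  # XOR
}
T = 1.2  # threshold used for Table 1; all magnitudes lambda = 1

# Minterm order y1..y4 corresponds to (x1, x2) = (0,0), (0,1), (1,0), (1,1).
MINTERMS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def encode(bit):
    """Equation 3: FALSE -> e^{i0} = 1, TRUE -> e^{i pi/2}."""
    return cmath.exp(1j * math.pi / 2) if bit else 1 + 0j


def neuron_output(thetas, x1, x2):
    """Equation 13 with bias input b = 1, then the threshold of equation 5."""
    t1, t2, tb = thetas
    q = (cmath.exp(1j * t1) * encode(x1)
         + cmath.exp(1j * t2) * encode(x2)
         + cmath.exp(1j * tb) * 1)
    return abs(q), (1 if abs(q) >= T else 0)


for y, thetas in TABLE_1.items():
    results = [neuron_output(thetas, x1, x2) for x1, x2 in MINTERMS]
    bits = "".join(str(a) for _, a in results)
    # expected bits: 0001 for AND, 0110 for XOR
    print(y, bits, [round(m, 4) for m, _ in results])
```

XOR is not linearly separable, so a single real-valued perceptron cannot produce the 0110 pattern. Here a single complex-valued neuron produces it using the Table 1 angles.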
Cost Issues
Implementations that are inherently more powerful benefit more than implementations on standard serial computers. Examples are optical computing, software on parallel computers, and software on computers with co-processors. Expected benefits include reduced network size, reduced delay in the recall phase, and quicker learning. These benefits arise because the complex-valued representation is computationally more powerful than existing representations. For example, a single complex-valued neuron built on the new representation can solve problems that are not linearly separable, such as XOR. Conventional neurons require at least two layers to solve such problems, so ANNs can be constructed with fewer artificial neurons. Each individual neuron is more complex, but the overall ANN will require less hardware or fewer mathematical operations to solve existing problems. As a result, speed of operation will increase and cost will decrease. These expected benefits are implementation dependent. When implemented optically, a complex-valued neuron costs less than a traditional neuron in all cases, so all the benefits of the complex-valued artificial neuron can be obtained without additional cost. The complex-valued neuron should be equally superior in implementations that provide hardware support for complex arithmetic, for example, computers with neural-network co-processors based on digital signal processing chips. In implementations on standard serial computers, the complex-valued neuron will be more cost effective only in applications where its increased power can offset the requirement for additional neurons.
Conclusion
The complex-valued neuron was shown to have higher computational capability for a large class of problems involving Boolean functions. It solves all 16 two-input Boolean functions and 245 of the 256 three-input Boolean functions.
References:
1. Nitta, T., "An extension of the back-propagation algorithm to complex numbers," Neural Networks, 10 (8), 1391-1415, 1997.
2. Benvenuto, N., and Piazza, F., "On the complex backpropagation algorithm," IEEE Transactions on Signal Processing, 40 (4), 967-969, 1992.
3. Leung, H., and Haykin, S., "The complex backpropagation algorithm," IEEE Transactions on Signal Processing, 39 (9), 2101-2104, 1991.
4. Georgiou, G. M., and Koutsougeras, C., "Complex domain backpropagation," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 39 (5), 330-334, 1992.
5. Smith, M. R., and Hui, Y., "A data extrapolation algorithm using a complex domain neural network," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 44 (2), 143-147, 1997.
6. Arena, P., Fortuna, G., Muscato, G., and Xibilia, M. G., "Multilayer Perceptrons to approximate quaternion valued functions," Neural Networks, 10 (2), 335-342, 1997.
7. Hirose, A., "Dynamics of fully complex-valued neural networks," Electronics Letters, 28 (16), 1492-1494, 1992.
8. Casasent, D., and Natarajan, S., "A classifier neural network with complex-valued weights and square-law nonlinearities," Neural Networks, 8 (6), 989-998, 1995.
9. Weber, D. M., and Casasent, D. P., "The extended piecewise quadratic neural network," Neural Networks, 11, 837-850, 1998.
10. Hirose, A., "Applications of complex-valued neural networks to coherent optical computing using phase-sensitive detection scheme," Information Sciences, 2, 103-117, 1994.
11. Khan, J. I., "Characteristics of multidimensional holographic associative memory in retrieval with dynamic localizable attention," IEEE Transactions on Neural Networks, 9 (3), 389-406, 1998.
12. Khan, J. I., and Yun, D. Y., "A parallel, distributed and associative approach for pattern matching with holographic dynamics," Journal of Visual Languages and Computing, 8 (2), 1997.
13. Awwal, A. A. S., and Power, G., "Object Tracking by an Opto-electronic Inner Product Complex Neural Network," Optical Engineering, 32, 2782-2787, 1993.
14. Aizenberg, N. N., and Aizenberg, I. N., "Universal binary and multi-valued paradigm: Conception, learning, applications," Lecture Notes in Computer Science, 1240, 463-472, 1997.