Substance Use & Misuse, 33(2), 495–501, 1998
Self-Recurrent Neural Network

Massimo Buscema, Dr.
Semeion Research Center of Sciences of Communication, Viale di Val Fiorita 88, Rome, Italy
Feed Forward Artificial Neural Networks of the Back Propagation family have both a weakness and a strength in their makeup: their layer of hidden units tends to encode input vectors in a distributed manner. This type of encoding is a strong point of these ANNs, since it is a very efficient encoding system from a computational viewpoint; even from a neuro-biological viewpoint, this memorisation system is plausible. But precisely because of its power, this type of input vector codification is practically uncontrollable. There are many ways in which the hidden units can encode the input vectors. Which of them is the most efficient, given the relationships that each input variable has with every other input variable? The ideal answer to this question consists in allowing the hidden layer to also encode its own codification of the input vector:
(1)   $y = f(x)$

where $f$ is the generic ANN function with input vector $x$ and output vector $y$.

(2)   $y = f(g(x))$

where $g(x)$ is the output vector of the ANN's hidden layer.

(3)   $x^{*} = x \cup g(x)$

where $x$ is the ANN's original input vector and $x^{*}$ is the new, extended input vector.

(4)   $y = f(g(x \cup g(x)))$

Final Recurrent Equation.
Equation (4) makes provision for the ANN's layer of hidden units to encode the original input vector as well as the codification that the hidden layer carries out on that vector. Equation (4), therefore, defines a recurrent ANN or, in this specific case, a Self-Recurrent ANN. The Self-Recurrent ANN that we have created¹ tries to dynamically approximate the main components of the hidden layer units. To optimise this aim, we have tried to approximate the implicit function that defines the layer of hidden units:
(5)   $G(x, g(x)) = 0$

In fact, if

(5a)  $h(g(x)) \cong g(x)$,

equation (5) can be approximated by composing it linearly with the input:

(6)   $y = f(g(x \cup h(g(x))))$
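To make the extended-input construction of equations (3) and (4) concrete, the following is a minimal sketch in Python/numpy. The layer sizes, the weight matrices, the sigmoid units and the choice of starting the added-input slots at zero are illustrative assumptions, not the author's reference implementation; the point is only to show how $x^{*} = x \cup g(x)$ is formed and fed forward.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 3, 2          # illustrative sizes

# W_g maps the *extended* input (original input + hidden code) to the hidden layer;
# W_f maps the hidden layer to the output.
W_g = rng.normal(0.0, 0.1, (n_hidden, n_in + n_hidden))
W_f = rng.normal(0.0, 0.1, (n_out, n_hidden))

def g(x_ext):
    """Hidden-layer codification g(.) of equation (2)."""
    return sigmoid(W_g @ x_ext)

def forward(x):
    """Equations (3)-(4): compute g(x), then feed the extended input x* = x U g(x)."""
    code = g(np.concatenate([x, np.zeros(n_hidden)]))   # g(x): added-input slots start at zero
    x_star = np.concatenate([x, code])                   # (3) extended input
    return sigmoid(W_f @ g(x_star))                      # (4) y = f(g(x U g(x)))

y = forward(np.array([0.2, 0.9, 0.1, 0.7]))
print(y)
```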
Since equation (5) is always linear (its solution approximates that of an Auto-Associative ANN), the convergence of equation (6) is not called into question. On the contrary, the convergence speed of the Self-Recurrent ANN ought theoretically to be greater than that of Feed Forward ANNs. In fact, the hidden units of the Self-Recurrent Network also receive a preprocessed input that contains data on the relationships among the input variables (data that the hidden units themselves are in the process of encoding). The topological structure of a Self-Recurrent Network can therefore be schematized as in Figure 1. A life-cycle of a Self-Recurrent Network can be summarized as follows (a code sketch of the full cycle is given at the end of this section):

1. transfer of the signal from the original input (I) to the first layer of hidden units (H1);
2. transfer of the signal from the first hidden layer (H1) to the second hidden layer (H2), the first self-associated layer;
¹ Massimo Buscema, September 1996, Semeion, Rome.
Figure 1. Topological structure of a Self-Recurrent Network. (Layers shown: Original Input Vector I, Added Input Vector, Hidden Units H1, H2, Output Units O.)
3. correction of the connections between the first and second hidden layers (H1, H2);
4. transfer of the signal from the new extended input (I + H2) to the first hidden layer (H1);
5. transfer of the signal from the first hidden layer (H1) to the second hidden layer (H2), the first self-associated layer;
6. correction of the connections between the first and second hidden layers (H1, H2);
7. transfer of the signal from the first hidden layer (H1) to the output (O);
8. correction of the connections between the output (O) and the first hidden layer (H1);
9. correction of the connections between the first hidden layer (H1) and the extended input (I + H2).

The equations that regulate the transfer of the signal from input to output are those already known in the family of Back Propagation ANNs:
(7)   $u_i = f\left(\sum_{j}^{N} u_j \, w_{ij}\right)$

The only limitation necessary for Self-Recurrent Networks involves the transfer function, which should preferably be a classical sigmoid:

(7a)  $u_i = \dfrac{1}{1 + e^{-\sum_{j}^{N} u_j w_{ij}}}$
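As a small illustration, the transfer function of equation (7a) and its derivative (which appears as $f'(u)$ in the learning equations below) can be written as follows; the vectorised form and the names are a sketch only, assuming units stored as numpy arrays.

```python
import numpy as np

def transfer(u_prev, W):
    """Equation (7a): sigmoid of the weighted net input."""
    net = W @ u_prev                      # sum_j u_j * w_ij for every unit i
    return 1.0 / (1.0 + np.exp(-net))

def transfer_derivative(u):
    """For the sigmoid, f'(u) = u * (1 - u), used in equations (8) and (9)."""
    return u * (1.0 - u)
```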
Except for a few details, the learning equations for the connection layer linked to the output vector are quite well known:

(8)   $\Delta w_{ij} = (t_i - u_i) \cdot f'(u_i) \cdot u_j + Sm_{ij}$

(8a)  $Sm_{ij} = \dfrac{\Delta w_{ij}^{(n-1)} \cdot |t_i - u_i|}{1 + 0.5 \cdot |w_{ij}|}$

Equation (8a) is the SelfMomentum equation already presented in other publications (Buscema, 1994, 1997). Equation (9) gives the corresponding correction for the connections between the first and second hidden layers (H1, H2):

(9)   $\Delta w_{kp} = (u_p - u_k) \cdot f'(u_k) \cdot u_p + \dfrac{\Delta w_{kp}^{(n-1)} \cdot |u_p - u_k|}{1 + 0.5 \cdot |w_{kp}|}$

where $k \in [1 \ldots N]$, $p \in [1 \ldots M]$, $N = M$, and $N$ and $M$ are the cardinalities of H1 and H2.
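A compact sketch of the SelfMomentum update of equations (8)-(8a), assuming the sigmoid of (7a) so that $f'(u) = u(1-u)$; the array names are illustrative assumptions rather than the published implementation.

```python
import numpy as np

def self_momentum(dW_prev, error, W):
    """Equation (8a): Sm_ij = dW_ij(n-1) * |t_i - u_i| / (1 + 0.5 * |w_ij|)."""
    return dW_prev * np.abs(error)[:, None] / (1.0 + 0.5 * np.abs(W))

def output_update(t, u_out, u_hidden, W, dW_prev):
    """Equation (8): dW_ij = (t_i - u_i) * f'(u_i) * u_j + Sm_ij."""
    error = t - u_out
    delta = error * u_out * (1.0 - u_out)            # (t_i - u_i) * f'(u_i) for the sigmoid
    return np.outer(delta, u_hidden) + self_momentum(dW_prev, error, W)
```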
From a functional viewpoint, the Self-Recurrent Network presents itself as an adaptive system capable of dynamically checking the efficiency of the codification that the hidden units carry out on the input vectors. This ANN model should provide the following advantages:

a. Greater precision in generalisation, due to a learning process during which the input codification itself is subjected to learning. At the same time, this should also lead to a reduction in overtraining and in noise codification problems.
b. Greater tolerance of and resistance to noisy inputs, since the latter are recontextualised together with the clusters that they themselves activate.
c. Greater biological plausibility. External inputs are not in fact perceived as such, but are reread through the very categories that the system has built during earlier experience. In practice, each new input is re-categorised. This ought to have a positive effect on the stability-plasticity problem that besets many artificial adaptive systems.
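Pulling the pieces together, the following is a minimal end-to-end sketch of one life-cycle (steps 1-9) of a Self-Recurrent Network, using the sigmoid transfer of (7a) and the SelfMomentum corrections of (8)-(9). The layer sizes, the initial weights, the learning-rate factor, and the choice of starting the added-input slots at zero are all illustrative assumptions; this is one reading of the procedure described above, not the author's reference code.

```python
import numpy as np

def f(net):                               # (7a) sigmoid transfer
    return 1.0 / (1.0 + np.exp(-net))

def f_prime(u):                           # derivative of the sigmoid: f'(u) = u(1-u)
    return u * (1.0 - u)

def self_momentum(dW_prev, error, W):     # (8a) SelfMomentum
    return dW_prev * np.abs(error)[:, None] / (1.0 + 0.5 * np.abs(W))

rng = np.random.default_rng(0)
n_in, n_h, n_out = 4, 3, 2                # illustrative sizes; |H1| = |H2| = n_h (N = M)
lr = 0.5                                  # learning-rate factor (an added assumption)

W_h1 = rng.normal(0.0, 0.1, (n_h, n_in + n_h))   # extended input (I + H2) -> H1
W_h2 = rng.normal(0.0, 0.1, (n_h, n_h))          # H1 -> H2 (self-associated layer)
W_out = rng.normal(0.0, 0.1, (n_out, n_h))       # H1 -> O
dW_h1 = np.zeros_like(W_h1)
dW_h2 = np.zeros_like(W_h2)
dW_out = np.zeros_like(W_out)

def correct_h2(h1, h2):
    """Steps 3 and 6: correct H1 -> H2 so that H2 reproduces H1 (equation (9))."""
    global W_h2, dW_h2
    err = h1 - h2
    dW_h2 = lr * np.outer(err * f_prime(h2), h1) + self_momentum(dW_h2, err, W_h2)
    W_h2 += dW_h2

def life_cycle(x, t):
    global W_h1, W_out, dW_h1, dW_out
    # 1-2. signal from the original input I to H1, then from H1 to H2
    x_ext = np.concatenate([x, np.zeros(n_h)])   # added-input slots start at zero
    h1 = f(W_h1 @ x_ext)
    h2 = f(W_h2 @ h1)
    correct_h2(h1, h2)                            # 3.
    # 4-5. signal from the extended input (I + H2) to H1, then from H1 to H2
    x_ext = np.concatenate([x, h2])
    h1 = f(W_h1 @ x_ext)
    h2 = f(W_h2 @ h1)
    correct_h2(h1, h2)                            # 6.
    # 7. signal from H1 to the output O
    y = f(W_out @ h1)
    # 8. correct O <- H1 (equation (8))
    err_o = t - y
    dW_out = lr * np.outer(err_o * f_prime(y), h1) + self_momentum(dW_out, err_o, W_out)
    W_out += dW_out
    # 9. correct H1 <- extended input, back-propagating the output error
    err_h1 = W_out.T @ (err_o * f_prime(y))
    dW_h1 = lr * np.outer(err_h1 * f_prime(h1), x_ext) + self_momentum(dW_h1, err_h1, W_h1)
    W_h1 += dW_h1
    return y

x, t = np.array([0.2, 0.9, 0.1, 0.7]), np.array([1.0, 0.0])
for epoch in range(200):
    y = life_cycle(x, t)
print(y)   # should move towards the target t
```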
REFERENCES

ALMEIDA, 1987: L. B. Almeida, A learning rule for asynchronous perceptrons with feedback in a combinatorial environment, in M. Caudill & C. Butler (eds.), Proceedings of the IEEE First Annual International Conference on Neural Networks (609–618), San Diego, CA, 1987.

BUSCEMA, 1994: M. Buscema, Squashing Theory. Modello a Reti Neurali per la Previsione dei Sistemi Complessi, Collana Semeion, Armando, Rome, 1994 [Squashing Theory: A Neural Network Model for Prediction of Complex Systems, Semeion Collection by Armando Publisher].

BUSCEMA, 1995: M. Buscema, Self-Reflexive Networks. Theory, Topology, Applications, Quality & Quantity, Kluwer Academic Publishers, Dordrecht, The Netherlands, Vol. 29(4), 339–403, November 1995.

BUSCEMA, 1997: M. Buscema, F. Matera, T. Nocentini, and P. L. Sacco, Reti Neurali e Finanza: Esercizi, Idee, Metodi, Applicazioni, Quaderni di Ricerca, Armando Editore, n. 2, Rome, 1997 [Neural Networks and Finance: Exercises, Ideas, Methods, Applications, Semeion Research-book by Armando Publisher, n. 2].

CHAUVIN, 1995: Y. Chauvin and D. E. Rumelhart (eds.), Backpropagation: Theory, Architectures, and Applications, Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1995.

DEPRIT, 1989: E. Deprit, Implementing recurrent back-propagation on the connection machine, Neural Networks, 2, 295–314, 1989.

ELMAN, 1990: J. L. Elman, Finding Structure in Time, Cognitive Science, Vol. 14, pp. 179–211, 1990.

JORDAN, 1989: M. I. Jordan, Serial order: A parallel distributed processing approach, in J. L. Elman and D. E. Rumelhart (eds.), Advances in Connectionist Theory: Speech, Erlbaum, Hillsdale, N.J., 1989.

JORDAN, 1992: M. I. Jordan, Forward Models: Supervised Learning with a Distal Teacher, Cognitive Science, Vol. 16, pp. 307–354, 1992.

PEARLMUTTER, 1989: B. Pearlmutter, Learning State Space Trajectories in Recurrent Neural Networks, Neural Computation, 1, 263–269, 1989.

PINEDA, 1987: F. J. Pineda, Generalization of Back-Propagation to Recurrent Neural Networks, Physical Review Letters, 18, 2229–2232, 1987.
PINEDA, 1988: F. J. Pineda, Generalization of Back-Propagation to Recurrent and Higher Order Networks, in D. Z. Anderson (ed.), Neural Information Processing Systems, pp. 602–611, AIP, New York, 1988.

PINEDA, 1995: F. J. Pineda, Recurrent Backpropagation Networks, in Y. Chauvin and D. E. Rumelhart (eds.), Backpropagation: Theory, Architectures, and Applications, pp. 99–135, Lawrence Erlbaum, Hillsdale, N.J., 1995.

QIAN, 1989: N. Qian and T. J. Sejnowski, Learning to solve random-dot stereograms of dense and transparent surfaces with recurrent back-propagation, in D. Touretzky, G. Hinton and T. Sejnowski (eds.), Proceedings of the 1988 Connectionist Models Summer School, pp. 435–443, Morgan Kaufmann, San Mateo, CA, 1989.

RUMELHART, 1985: D. E. Rumelhart, G. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation, Institute for Cognitive Science Report 8506, University of California, San Diego, CA, 1985.

WERBOS, 1974: P. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD thesis, Harvard University, Cambridge, MA, 1974.

WILLIAMS, 1991: R. J. Williams and D. Zipser, Gradient-based learning algorithms for recurrent networks and their computational complexity, in Y. Chauvin and D. E. Rumelhart (eds.), Backpropagation: Theory, Architectures, and Applications, pp. 433–486, Lawrence Erlbaum, Hillsdale, N.J., 1995.
THE AUTHOR

Massimo Buscema, Dr., is a computer scientist and an expert in artificial neural networks and adaptive systems. He is the founder and Director of the Semeion Research Center of Sciences of Communication, Rome, Italy. He was formerly Professor of Science of Communication at the University of Charleston, Charleston, West Virginia, USA, and Professor of Computer Science and Linguistics at the State University of Perugia, Perugia, Italy. He is a member of the Editorial Board of Substance Use & Misuse, a faculty member of the Middle Eastern Summer Institute on Drug Use, co-editor of the Monograph Series Uncertainty, and co-creator and co-director of The Mediterranean Institute. He is a consultant to the Scuola Tributaria Vanoni (Ministry of Finance), Ufficio Italiano Cambi (Bank of Italy), ENEA (Public Oil Company), the Sopin Group (Computer Science Corporation) and many Italian Regions. He has published books and articles, among them: Prevention and Dissuasion, EGA, Turin, 1986; Expert Systems and Complex Systems, Semeion, Rome, 1987; The Brain within the Brain, Semeion, Rome, 1989; The Sonda Project: Prevention of Self- and Hetero-Destructive Behaviors, Semeion, Rome, 1992; Gesturing Test: A Model of Qualitative Ergonomics, ATA, Bologna, 1992; The MQ Model: Neural Networks and Interpersonal Perception, Armando, Rome, 1993; Squashing Theory: A Neural Network Model for Prediction of Complex Systems, Armando, Rome, 1994; Self-Reflexive Networks: Theory, Topology, Applications, Quality & Quantity, 29, Kluwer Academic Publishers, Dordrecht, Holland, 1995; Idee da Buttare, Edizioni Sonda, Turin, 1994; Artificial Neural Networks and Finance, Armando, Rome, 1997; A General Presentation of Artificial Neural Networks, Substance Use & Misuse, 32(1), Marcel Dekker, New York, 1997; The Sonda Project: Prevention, Prediction and Psychological Disorder, Substance Use & Misuse, 32(9), Marcel Dekker, New York, 1997.