Implementation of a “Communication Language” in Supervised Learning of Artificial Neural Networks
I.V. Grossu a,*, C.I. Ciuluvica (Neagu) b
a University of Bucharest, Faculty of Physics, Bucharest-Magurele, P.O. Box MG 11, 077125, Romania
b University of Bucharest, Faculty of Psychology and Education Sciences, 90 Panduri Street, District 5, Bucharest, P.O. 050663, Romania
* Corresponding author. E-mail address: [email protected] (I.V. Grossu)
ABSTRACT
Inspired by the importance of both communication and error feedback in the natural learning process, our main goal was to use a similar mechanism in the supervised learning of artificial neural networks. Thus, instead of using the label information only as a desired ANN output, we also included it in the system as a particular dimension of the input space (in the same way words are associated with real-world objects in natural learning). This results in the need both to modify the activation function of the neurons in the first hidden layer and to add error examples (incorrectly labeled data) to the training set. Following the belief that basic concepts should be tested on simple examples, we tried to apply the previously mentioned mechanism to an XOR equivalent problem. In this context, we noticed the interesting fact that the ANN was capable of recognizing all input forms in the absence of the corresponding label information (“words”), even though such examples were not explicitly included in the training set. Further analysis, along with applying this approach to more complex scenarios, is currently in progress, as we consider that the proposed “language-supervised” learning might contribute to a better understanding of the mechanisms involved in natural learning, and also opens the possibility of creating a specific category of “communicative” artificial neural networks, with abstraction capabilities.
1. Introduction
Considering the importance of communication in the natural learning process [1-4], in this work we present an attempt to implement a “communication language” in the supervised learning of artificial neural networks. In this context, we also noticed the significant role played by error feedback. Thus, as Blair [1] suggests, nearly all categorization models take errors to be the essential ingredient in learning. According to Van Dyck et al. [5], the evaluation of past behaviors and acting upon the awareness that errors hold useful information are indeed considered important practices for learning from errors. In the literature, several authors (Edmondson [6], Van Dyck et al. [5], Rochlin [7]) argue that communication is one of the most important conditions for learning from errors to occur. More specifically, Van Dyck and colleagues argue that communication about errors is pivotal to the promotion of learning and innovation, while Rochlin stressed the importance of communication when dealing with human errors.
2. Implementation of a “Language-Supervised” ANN Training Algorithm
Starting from the idea that both communication and error feedback should play an important role in supervised learning, our main purpose was to create an ANN with the following capabilities:
• Communicability. The label information is introduced into the system not only as a desired ANN output, but also as an additional dimension of the input space (in the same way words are associated with real-world objects in natural learning).
• Abstraction. Capability of recognizing labels (words) in the absence of any corresponding input form.
• Pattern recognition. Capability of recognizing patterns in the absence of corresponding input label information.
• Sensitivity to errors. Capability of recognizing incorrectly labeled data.
For a better understanding of the concepts involved, we applied the previously discussed ideas to the case of a Multilayer Perceptron with Backpropagation [8-11]. Thus, the training set includes both regular examples, (data, label, desired output), and label-only examples, i.e. (null, label, desired output) vectors. For implementing the sensitivity to errors, in addition to the output neurons (O1, O2, …, On) corresponding to the classes of interest, we considered a new set of neurons (On+1, …, Op), representing the output subspace of errors. The training set also includes incorrectly labeled data examples. Although other, more flexible, approaches might be discussed, for simplicity we considered that a fixed number of neurons is allocated in the first hidden layer for each class/label. Establishing these values might be considered part of the initial step of choosing the ANN topology, in agreement with the specific problem of interest. In order to use the language axis as an input space divisor, the label dendrite should implement a specific behavior. Thus, as opposed to the neuron threshold, the label weight is constant (it is not changed during the training process). It also contributes to the neuron activation:
y = f\left(\sum_{i=1}^{n} w_i x_i + w_{label}\, x_{label} + t\right)

where y is the neuron output, f the activation function, n the dimension of the input data, w the weights, x the input data, t the neuron threshold, and wlabel the label weight. Also, as opposed to the neuron threshold, which might be implemented as a constant input weight, the label dendrites should react to the input values. Thus, following our goal of creating an ANN capable of recognizing patterns in the absence of corresponding input labels, we considered the following modified sigmoid activation function for each neuron in the first hidden layer:
f(s) = \begin{cases} \dfrac{1}{1+e^{-s}}, & x_{label} = 0 \ \text{or} \ x_{label} = w_{label} \\ 0, & \text{otherwise} \end{cases}

where xlabel is the label input and s denotes the weighted sum from the previous relation.

3. Application to a Simple Not Linearly Separable Problem
Following the belief that basic concepts should be tested on simple examples, we considered an XOR [8] equivalent problem: class1 = {(1,1),(2,2)}, class2 = {(1,2),(2,1)}. In this example, zero is used to indicate the absence of information (another approach might be based on nullable data types [12]). We chose a two-layer Perceptron with two hidden neurons for each label. The output layer contains one neuron for each class (O1 and O2) and one neuron for errors (O3). The MLP schema is presented in Fig.1.
Fig.1. The MLP topology for the XOR equivalent problem discussed.
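Before building the training set, it may help to see the first-hidden-layer behavior of Section 2 in code. The sketch below is a minimal illustration, not the authors' implementation: the label weight is kept constant (as stated above), and we assume a simple gating rule in which a non-null input label that differs from the neuron's assigned label silences the neuron, while an absent label (xlabel = 0) leaves it free to respond. All names and the example values are ours.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def hidden_neuron(x, x_label, w, w_label, t):
    """First-hidden-layer neuron with a fixed (non-trainable) label weight.

    Assumed gating rule: a non-null input label that differs from the neuron's
    assigned label (encoded by the constant w_label) silences the neuron;
    x_label == 0 means that no label information is provided.
    """
    if x_label != 0 and x_label != w_label:
        return 0.0
    # Threshold t added as a bias term (sign convention assumed)
    s = np.dot(w, x) + w_label * x_label + t
    return float(sigmoid(s))

# Illustrative call: data (1, 1) presented without a label to a neuron assigned to label 1
print(hidden_neuron(np.array([1.0, 1.0]), 0.0, np.array([0.5, -0.5]), 1.0, 0.0))
```

With this convention, data presented without a label is processed by every hidden neuron, which is consistent with the pattern-recognition capability required in Section 2.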
The training set contains: label information, input data with their corresponding labels, and negative examples (Tab.1).
x1   x2   xlabel   d1   d2   d3
label information
0    0    1        1    0    0
0    0    2        0    1    0
(data, label) examples
1    1    1        1    0    0
1    2    2        0    1    0
2    1    2        0    1    0
2    2    1        1    0    0
negative examples
1    1    2        0    0    1
1    2    1        0    0    1
2    1    1        0    0    1
2    2    2        0    0    1

Tab.1. The training set for the XOR equivalent problem discussed.
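For experimentation, the training set of Tab.1 can be written down directly as arrays; the variable names below are ours.

```python
import numpy as np

# Inputs: columns x1, x2, xlabel (0 encodes absent information)
X = np.array([
    # label information (data absent)
    [0, 0, 1], [0, 0, 2],
    # (data, label) examples
    [1, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1],
    # negative (incorrectly labeled) examples
    [1, 1, 2], [1, 2, 1], [2, 1, 1], [2, 2, 2],
], dtype=float)

# Desired outputs: columns d1 (class 1), d2 (class 2), d3 (error)
D = np.array([
    [1, 0, 0], [0, 1, 0],
    [1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0],
    [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1],
], dtype=float)
```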
The neuron weights for one approximate solution are presented in Tab.2.
The Hidden Layer
Neurons   w1         w2         wlabel   t
H1        -6.2906    -6.2907    1         15.1038
H2         4.9581     4.9581    1        -17.3967
H3        10.6337    10.6337    2        -28.2216
H4        -1.3723    -1.3691    2          1.6392

The Output Layer
Neurons   w1          w2          w3          w4          t
O1         12.6621     12.3410     -2.0275     -1.9140    -7.4585
O2         -0.6923     -3.2406     15.3415     26.2034   -22.0046
O3        -24.5173    -24.4831    -13.8343    -23.6743    19.8240

Tab.2. Approximate solution for the XOR equivalent problem discussed.
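As a cross-check on the table above, the sketch below loads the weights into arrays and runs a forward pass for one input, reusing the gated-sigmoid assumption from the earlier sketch (the sign convention, with the threshold added as a bias, is also an assumption). Under these assumptions the result comes close to the corresponding row of Tab.3. All names are ours.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Hidden layer (rows H1..H4): data weights, fixed label weights, thresholds
W_h = np.array([[-6.2906, -6.2907], [4.9581, 4.9581],
                [10.6337, 10.6337], [-1.3723, -1.3691]])
W_label = np.array([1.0, 1.0, 2.0, 2.0])
T_h = np.array([15.1038, -17.3967, -28.2216, 1.6392])

# Output layer (rows O1..O3): weights towards H1..H4 and thresholds
W_o = np.array([[12.6621, 12.3410, -2.0275, -1.9140],
                [-0.6923, -3.2406, 15.3415, 26.2034],
                [-24.5173, -24.4831, -13.8343, -23.6743]])
T_o = np.array([-7.4585, -22.0046, 19.8240])

def forward(x, x_label):
    # Assumed label gating: a non-null, mismatching label silences the neuron
    gate = np.logical_or(x_label == 0, x_label == W_label)
    h = gate * sigmoid(W_h @ x + W_label * x_label + T_h)  # first hidden layer
    return sigmoid(W_o @ h + T_o)                           # output layer

# Input (1, 1) with label 1: Tab.3 reports approximately (0.99, 0, 0.02)
print(np.round(forward(np.array([1.0, 1.0]), 1.0), 2))
```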
For estimating the classification errors, we calculated the Euclidean distance between the ANN output (oi) and the desired output (di):

D_\varepsilon = \sqrt{\sum_{i=1}^{p} (o_i - d_i)^2}

where p is the number of output neurons.
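A direct implementation of this error measure (the standard Euclidean distance between the two vectors) is sketched below; the function name is ours.

```python
import numpy as np

def classification_error(o, d):
    """Euclidean distance between the ANN output vector o and the desired output d."""
    o = np.asarray(o, dtype=float)
    d = np.asarray(d, dtype=float)
    return float(np.sqrt(np.sum((o - d) ** 2)))

# Example: first row of Tab.3 gives approximately 0.014
print(classification_error([0.99, 0.00, 0.01], [1, 0, 0]))
```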
The outputs for the previous approximate solution are presented in Tab.3.

x1   x2   xlabel   o1     d1   o2     d2   o3     d3   Dε
0    0    1        0.99   1    0      0    0.01   0    0.014
0    0    2        0      0    0.97   1    0.04   0    0.050
1    1    0        0.97   1    0.02   0    0      0    0.036
1    1    1        0.99   1    0      0    0.02   0    0.022
1    1    2        0      0    0.04   0    0.95   1    0.064
1    2    0        0      0    0.94   1    0      0    0.060
1    2    1        0.01   0    0      0    1      1    0.010
1    2    2        0      0    0.97   1    0.04   0    0.050
2    1    0        0      0    0.94   1    0      0    0.060
2    1    1        0.01   0    0      0    1      1    0.010
2    1    2        0      0    0.97   1    0.05   0    0.058
2    2    0        0.9    1    0      0    0      0    0.100
2    2    1        0.99   1    0      0    0.02   0    0.022
2    2    2        0      0    0.04   0    0.94   1    0.072

Tab.3. Approximate solution output for the XOR equivalent problem discussed.
One can notice that the ANN was able to learn all examples from the training set, and that it is also capable of recognizing patterns in the absence of any corresponding word, even though this kind of information was not explicitly included in the training set.
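One simple way to summarize Tab.3 is to assign each input to the output neuron with the largest activation. Under this decision rule (our own convention), the reported outputs agree with the desired outputs for all fourteen rows, including the label-free inputs (xlabel = 0). A minimal check for two representative rows:

```python
import numpy as np

# Representative rows of Tab.3: (ANN output vector, desired output vector)
rows = [
    ([0.97, 0.02, 0.00], [1, 0, 0]),   # input (1, 1) without label -> class 1
    ([0.00, 0.04, 0.95], [0, 0, 1]),   # input (1, 1) mislabeled as 2 -> error
]

for o, d in rows:
    # The winning output neuron matches the desired class
    assert np.argmax(o) == np.argmax(d)
print("argmax decisions match the desired outputs")
```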
4. Conclusions
Starting from the importance of both communication and error feedback in the natural learning process, we tried to implement similar concepts in the supervised learning of artificial neural networks. For this purpose, the input space was extended with the label dimension (the “language” axis), error classes were added to the output space, a modified sigmoid activation function was considered for the neurons in the first hidden layer, and both language information (labels) and negative examples were included in the training set. For a better understanding of the basic concepts involved, we tested the previously discussed mechanism on the simplified case of an XOR equivalent problem. Using a two-layer Perceptron with Backpropagation, we obtained some interesting results. The ANN was able to learn all examples from the training set (including labels and errors). The presence of input label information results in stronger output signals. The ANN is also capable of recognizing input forms in the absence of corresponding labels, even though this kind of information was not explicitly included in the training set. This result could be connected with the interference between the communication and error-feedback mechanisms, whose importance is also discussed in psychology [5-7]. Further analysis, along with applying this approach to more complex scenarios, is currently in progress, as we consider that the proposed “language-supervised” learning might contribute to a better understanding of the mechanisms involved in natural learning, and also opens the possibility of creating a specific category of “communicative” artificial neural networks, with abstraction capabilities.
References
[1] R.M. Blair, R.M. Watson, M.M. Kimberly, Errors, efficiency, and the interplay between attention and category learning, Cognition, 2009, 112, 330-336
[2] G.L. Murphy, The Big Book of Concepts, MIT Press, 2002
[3] B.H. Ross, E.G. Taylor, E.L. Middleton, Concept and Category Learning in Humans, Learning and Memory: A Comprehensive Reference, 2008, 2, 535-556
[4] J.R. Anderson, The adaptive nature of human categorization, Psychological Review, 1991, 98, 409-429
[5] C. Van Dyck, M. Frese, M. Baer, S. Sonnentag, Organizational error management culture and its impact on performance: A two-study replication, Journal of Applied Psychology, 2005, 90, 1228-1240
[6] A.C. Edmondson, Learning from mistakes is easier said than done: Group and organizational influences on the detection and correction of human error, Journal of Applied Behavioral Science, 1996, 32, 5-28
[7] G.I. Rochlin, Safe operation as a social construct, Ergonomics, 1999, 42(11), 1549-1560
[8] D. Dumitrescu, H. Costin, Retele Neuronale: Teorie si Aplicatii, Teora, Romania, 1996
[9] P. Antognetti, V. Milutinovic, Neural Networks: Concepts, Applications and Implementations, vol. 1-4, Prentice Hall, Englewood Cliffs, New Jersey, 1991
[10] J.A. Freeman, D.M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley Publishing Co., 1991
[11] R.H. Silverman, A.S. Noetzel, Image processing and pattern recognition in ultrasonograms by backpropagation, Neural Networks, 1990, 3, 593-603
[12] J. Albahari, B. Albahari, C# 4.0 in a Nutshell, O'Reilly, 2010, 237-240, 681-733, 789-941