Random Neural Networks of Biologically Plausible Connectivity

James M. Hogan and Joachim Diederich
Neurocomputing Research Centre, School of Computing Science
Queensland University of Technology
G.P.O. Box 2434, Brisbane, Q 4001, AUSTRALIA

Abstract.

Recent physiological research suggests that learning of novel associations by adult humans may be mediated through synaptic plasticity and the recruitment of unused neurons into active cortical circuits. The feasibility of incremental learning within a feedforward network is examined under the constraint of biologically plausible connectivity. A randomly connected network (of physiologically plausible global connection probability) is considered under the assumption of a local connection probability which decays with distance between nodes. The representation of the function XOR is chosen as a test problem, and the likelihood of its recruitment is discussed with reference to the probability of occurrence of a subnetwork suitable for implementation of this function.

1. Introduction

The mechanism by which adult humans are able rapidly to assimilate novel associations is a central issue of cognitive science. While the process must involve modification of the underlying neural representation, such modifications are tightly constrained. In particular, the new representation must not involve the formation of topologically new circuits (Rakic [1]), and it should not disturb existing knowledge. Thus, the effectiveness of learning within neural systems may depend significantly upon the pre-existence of suitable connectivity within the given network. Learning which depends (perhaps hierarchically) upon previously accumulated concepts is likely to be achieved more readily than that which is truly novel, i.e. tasks for which one may not assume that suitable connectivity has been previously established (Feldman [2]). Recent physiological research, for example (Gilbert and Wiesel [3], Ramachandran [4]), provides some support for this view, suggesting that significant changes in neural circuitry may take place rapidly, and that apparently unused neurons may over time be recruited into active neural circuits. It is therefore important to investigate whether novel learning tasks may be mediated through these processes.

Traditional artificial neural network learning algorithms (such as back-propagation) suffer from so-called `memory washout' under re-training and are thus unsuitable for a computational investigation of this problem. Incremental learning, introduced by Valiant [5], involves the representation of successively more complex relations through the recruitment of existing (often functionally committed) circuitry without interruption of the previous functionality. In order to assess its importance as a potential mechanism for the processes described above, it is necessary to investigate its performance under the constraint of biologically plausible connectivity, a question which has not been addressed until now.

1.1. Cortical Connectivity

It is not practicable to determine exactly the network topology of the human neocortex, but statistical estimates of cortical connectivity have been made. For the mammalian cortex, it is known that a particular neuron may receive connections from fewer than 3% of the neurons underlying the surrounding square millimetre of cortex (Stevens [6]). Such a figure is modest in comparison with the full connectivity usually assumed in artificial networks, but the large number of neurons does imply that many cells are connected to a substantial number of their neighbours. It is our purpose to investigate whether the resulting connectivity harbours subnetworks that may be usefully exploited in novel learning situations.

As the local cortical topology is unknown, a reasonable computational approach to this problem is to assume the existence of a randomly connected self-contained network (of physiologically plausible global interconnection probability), and to examine the probability of existence within this network of candidate structures potentially useful in the representation of simple functions. It is then essential to postulate some plausible regimes for the variation of local connection probabilities, and to analyse their consequences at a global level.

1.2. Our Approach

This work examines the feasibility of learning a pre-determined simple function in a layered random network of biologically plausible connectivity, through recruitment of a subnetwork known to be suitable for representation of this function. Central to this issue is the probability that a given set of input units will be connected to at least one candidate architecture. We present a detailed analysis of these issues for the example function XOR and a minimal 4-unit architecture for this problem (see figure 1).
The network parameters are chosen consistent with a localised cortical functional unit of between 6,000 and 120,000 neurons, and the network is assumed sufficiently large that the global connectivity estimate may be meaningfully applied. We propose and investigate a plausible local connectivity regime in which the probability of existence of a connection between two nodes depends upon their separation `distance', and we examine the effect of the choice of parameters upon the probability of connection of input units to a candidate architecture. The paper concludes with a discussion of biologically plausible connectivity, and of the feasibility of recruiting a candidate structure.

2. The Network and the Local Probability Regime

We examine a network of K layers, 0, 1, ..., K−1, each containing N nodes, so that there are a total of KN nodes in the network. Usually N is large, while K is restricted to a maximum of around 6. We consider single connections between arbitrary nodes i in layer k (which we shall write as (k, i)) and j in layer k+λ (written (k+λ, j)). So that connections are counted only once, we allow links only for λ ≥ 0. We impose the further restrictions that no link may pass beyond the last layer (layer K−1), and that input enters the network only through the input layer (layer 0). For computational simplicity we allow recurrent connections only within the given layer.

Figure 1: The 4-unit XOR architecture.

2.1. Distance-Based Probabilistic Analysis

Here we define a node separation (or `city block distance') δ_ij by the following relation (for λ ≥ 0):

    δ_ij = δ((k+λ, j), (k, i)) = λ + |j − i|.   (1)

We assume that there is a relatively high probability of connections within a local radius R of each unit, and that probabilities decay exponentially outside this region. Thus, for moderate values of R, long-range lateral connections are very unlikely to exist, and the well-known focusing or refinement of input through lateral inhibition is restricted in scope. Reflecting these assumptions, we postulate discrete connection probabilities

    Pr((k, i) → (k+λ, j)) = α,              δ_ij = 0, 1, ..., R,
                          = α^(1+δ_ij−R),   δ_ij = R+1, ..., N+K−2,   (2)

for convenience writing α_δ to represent both α and α^(1+δ−R). Note that self loops are assumed to have the same probability as local connections, and that lateral connections obey the same laws as forward links.

In this analysis, we model connections from a node (k, i) to other nodes as a series of independent Bernoulli trials with varying success probabilities α_δ based upon the node separation. The outcome of each of these trials is a Bernoulli distributed random variable with associated generating function¹

    ω_δ(s) = α_δ s + ᾱ_δ,   (3)

where ᾱ_δ = 1 − α_δ is the failure probability. We develop the generating function for n_ki, the number of connections emanating from a particular node, as the first step towards developing the distribution of the total number of links in the network.

¹ See for example (Feller [7]). Here s is the domain variable of the generating function and is of operational importance only.
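As a concrete illustration of equations (1) and (2), the separation and the connection probability may be sketched as follows (a minimal sketch under the reconstruction above, in which probabilities beyond the radius R decay in powers of α; the function names are ours):

```python
def separation(lam, i, j):
    """City-block separation delta_ij of equation (1) between nodes
    (k, i) and (k + lam, j), for layer offset lam >= 0."""
    return lam + abs(j - i)

def connection_probability(delta, alpha, R):
    """Connection probability alpha_delta of equation (2): alpha within
    the local radius R, geometric decay alpha**(1 + delta - R) beyond it."""
    return alpha if delta <= R else alpha ** (1 + delta - R)
```

For instance, with R = 2 a trial at separation δ = 3 succeeds with probability α², at δ = 4 with probability α³, and so on, as in the N = 9 worked example of this section.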

We consider connections from a (non-central) node (k, i) to nodes in the same or subsequent layers. It is readily apparent that for small R (relative to N), the Bernoulli trials form equivalent pairs (until the end of a layer is reached) and may be treated as repeated trials. Consider possible connections from a source node (say (k, 2)) within an example network layer for which N = 9 and R = 2. There are seen to be 5 trials at α (including 2 pairs of trials) and 4 unpaired trials at probabilities α², ..., α⁵. The situation is similar when the analysis is extended to the following layer, except that the separation has increased by one, so that the probabilities are correspondingly lower and another factor of α is introduced.

For large N and moderate R (relatively small with respect to N but significantly greater than 1), the troublesome end effects may be eliminated through the artifice of lateral wrap-around connections up to l = ⌊N/2⌋ units distant in either direction. In effect, each node is treated as if it were the central node of the layer. We further assume that connections for which δ_ij > l have negligible probability (an assumption entirely justified for large N). Under these assumptions, the generating function for the number of connections n_ki emanating from the node (k, i) is readily obtained.

Following the classical approach for approximation of a series of Bernoulli trials of varying success probability (Feller [7]), we obtain the Poisson distribution for n_ki. Using the standard result that the generating function of the sum of independent random variables is the product of the individual generating functions, we obtain successively: n_k, the number of connections from layer k; and C, the number of connections within the entire network. C has the Poisson distribution

    Pr(C = n) = e^(−NαR̄) (NαR̄)^n / n!,   (4)

where

    R̄ = R̃ + αK(K+1),   (5)

    R̃ = RK(K+1) + K − K(K−1)(2K−1)/6,

and E[C] = NαR̄. We estimate the value of α by equating E[C] with the physiological estimates produced by Stevens [6], and some other possible values². Then, if there are KN nodes within the network,

    E[C] = C_x = xK²N²/100,   (6)

where x is the global connection percentage. After substitution for K = 6 and appropriate manipulation, we obtain a quadratic equation in α. Writing ρ = R/N and neglecting against ρ terms of O(1/N), we obtain the condition

    ρ > 9x/1050   (7)

guaranteeing that the local probability α < 1. Figure 2 shows values of α as a function of ρ for possible values of x. It is clear from the graph that, provided R remains a significant fraction of N, we are justified (at the global connectivity predicted by Stevens [6]) in disregarding possible connections with probability of O(α²). These figures may be significantly more restrictive if the work of Stevens proves to be an underestimate of the true connectivity.

² Stevens' result is believed by some researchers to be an underestimate of the true connectivity.
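The quadratic determined by equations (4)-(6) can be solved directly for the local probability α. The sketch below follows the reconstruction of R̃ and R̄ above (the function name is ours):

```python
import math

def local_probability(x, rho, N=1000, K=6):
    """Solve E[C] = N*alpha*(Rtilde + alpha*K*(K+1)) = x*K^2*N^2/100
    for the local connection probability alpha (equations (4)-(6)).
    x is the global connection percentage, rho = R/N."""
    R = rho * N
    Rtilde = R * K * (K + 1) + K - K * (K - 1) * (2 * K - 1) / 6.0
    a = N * K * (K + 1)                 # coefficient of the alpha^2 term
    b = N * Rtilde                      # coefficient of the alpha term
    c = -x * K**2 * N**2 / 100.0
    # positive root of the quadratic a*alpha^2 + b*alpha + c = 0
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
```

To leading order in N this gives α ≈ 9x/(1050ρ) for K = 6, so the requirement α < 1 reproduces condition (7).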



Figure 2: Distance-based analysis: variation of α = α(ρ; x) with respect to ρ = R/N, for global connectivity percentages x = 1, 2, 3, 5 and 10.

3. Connection to a Candidate Architecture

The example architecture we consider is characterised by a feedforward structure in which two input units are succeeded by a hidden unit, and it in turn by an output unit. For the purposes of recruitment, we focus attention upon two adjacent input units (layer 0) and examine the probability that this pair will fail to be connected to at least one candidate structure. For clarity, we restrict the analysis to a three-layer network; evidently this provides a loose upper bound for the six-layer result.

3.1. Development of the Probability Distribution

We consider a three-layer network, consisting of two nodes in layer 0 and N nodes in each of layers 1 and 2. We define a doublet as a node in a forward layer which receives an input from each of the two input units. The existence of such doublets may be modelled as a series of Bernoulli trials, each with success probability γ = α² and failure probability γ̄ = 1 − γ. Then r1 layer-1 doublets exist with probability

    C(N, r1) γ^r1 γ̄^(N−r1),   (8)

(where C(n, r) denotes the binomial coefficient) and the expression for the probability of r2 layer-2 doublets is similar. Potential 4-unit architectures then require a connection between one of the r1 layer-1 doublets and one of the r2 layer-2 doublets. There are N² possible connections between layers 1 and 2, each of which may be regarded as a Bernoulli trial of success probability α. From this set of potential links, there are r1·r2 links which are marked as connections between doublets. Let us assume that l of the N² possible links are realised. Then the two input units are connected to r (≤ l) candidate architectures with probability

    C(N, r1) γ^r1 γ̄^(N−r1) · C(N, r2) γ^r2 γ̄^(N−r2) · C(r1·r2, r) C(N²−r1·r2, l−r) α^l ᾱ^(N²−l).   (9)

This expression is readily shown to be a probability distribution using an identity on binomial coefficients. The input units fail to be connected to at least one candidate architecture with probability

    Σ_{r1=0}^{N} Σ_{r2=0}^{N} Σ_{l=0}^{N²−r1·r2} C(N, r1) γ^r1 γ̄^(N−r1) C(N, r2) γ^r2 γ̄^(N−r2) C(N²−r1·r2, l) α^l ᾱ^(N²−l).   (10)
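The inner sum over l in (10) collapses, via the binomial theorem, to ᾱ^(r1·r2), which can be checked numerically for small N. A minimal sketch (our function name; γ = α² is the doublet probability as above):

```python
from math import comb

def failure_probability_exact(N, alpha):
    """Equation (10): probability that the input pair is joined to no
    candidate architecture in a three-layer network of width N."""
    g = alpha ** 2                       # doublet probability gamma = alpha^2
    total = 0.0
    for r1 in range(N + 1):
        p1 = comb(N, r1) * g**r1 * (1 - g)**(N - r1)
        for r2 in range(N + 1):
            p2 = comb(N, r2) * g**r2 * (1 - g)**(N - r2)
            M = N * N - r1 * r2          # links not joining doublet pairs
            inner = sum(comb(M, l) * alpha**l * (1 - alpha)**(N * N - l)
                        for l in range(M + 1))
            total += p1 * p2 * inner
    return total
```

Since Σ_l C(M, l) α^l ᾱ^(N²−l) = ᾱ^(r1·r2), the triple sum reduces to a double sum of binomial doublet probabilities weighted by ᾱ^(r1·r2), which is the form approximated by the Poisson expression that follows.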

For large N and moderate values of α, this expression may be approximated by

    Σ_{r1=0}^{N} Σ_{r2=0}^{N} [e^(−λ_γ) λ_γ^r1 / r1!] [e^(−λ_γ) λ_γ^r2 / r2!] ᾱ^(r1·r2),   (11)

where λ_γ = Nγ.

3.2. XOR Architectures and the Distance-Based Regime

In order to construct an XOR network under the distance-based probability regime, it is necessary to ensure that forward nodes are reachable by all of their input nodes from previous layers. Neglecting probabilities of O(α²), this requirement means that each possible forward node must be within radius R of its possible input nodes. This somewhat cumbersome requirement is greatly simplified by the choice of adjacent input nodes, and the difference in `reach' may be safely neglected. However, it is important to note that the results using equation (11) presented in section 4.2 refer to a subnetwork of width R embedded within a larger network of width N. For these calculations, the parameter N from section 3.1 should be replaced by R.
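Equation (11) is cheap to evaluate directly. The sketch below (our function name; the Poisson terms are built iteratively to avoid factorial overflow) takes the width as N, or as R = ρN under the distance-based regime just described:

```python
import math

def failure_probability_poisson(width, alpha, rmax=60):
    """Poisson approximation (11): probability that the input pair is
    connected to no candidate architecture.  width plays the role of N,
    or of R under the distance-based regime of section 3.2."""
    lam = width * alpha ** 2             # lambda = width * gamma, gamma = alpha^2
    pmf, p = [], math.exp(-lam)
    for r in range(rmax):                # Poisson pmf without explicit factorials
        pmf.append(p)
        p *= lam / (r + 1)
    return sum(pmf[r1] * pmf[r2] * (1 - alpha) ** (r1 * r2)
               for r1 in range(rmax) for r2 in range(rmax))
```

For example, ρ = 0.1 and N = 1000 give width R = 100 with α = 0.25, for which the computed value is close to the 1.45E-02 entry of table 1.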

4. Discussion

4.1. Biologically Plausible Connectivity

The overall goal of this work is the successful demonstration of incremental learning of functional structures within networks of biologically plausible connectivity. Inherent in this goal is the replication of biological functionality through a psychologically interesting mechanism, and the possible inference of more precise limitations on biologically realisable computations. The assumption of biologically plausible connectivity is limited to a feedforward architecture in order to facilitate recruitment of the functions described. Our analyses give useful insight into the problem of the existence of functionally useful regions within a sparsely connected network, but their direct applicability to biological networks will be limited by the presence of feedback connections not considered in our model.

In examining our assumption of stochastic independence of connections, the question arises as to whether the `difficult' cortical computations occur within small regions of extreme local connectivity which are balanced (in the global view) by other regions of relatively low connectivity. Within our assumptions, we have demonstrated that local arborisation of significant complexity is possible provided that long-range connections are correspondingly penalised.

4.2. Recruitment of Structures

Results for representative values of the parameters using equation (11) are shown in table 1, with values of R corresponding to networks of width N = 1000, 5000 and 10000.

    ρ      x%    α      N = 1000     N = 5000     N = 10000
    0.10   3     0.25   1.45E-02     5.43E-14     1.44E-27
    0.10   5     0.43   1.88E-08     1.41E-40     1.00E-80
    0.25   3     0.11   4.63E-01     4.71E-04     7.55E-13
    0.25   5     0.18   4.80E-03     5.30E-18     1.33E-35
Table 1: Probability that the input units fail to be connected to at least one candidate architecture, for selected values of N. Values of the parameters ρ = R/N, x and α are shown in columns 1-3.

The figures presented suggest that recruitment of XOR structures may be viable in biologically plausible networks of quite moderate size. The results presented in table 1 were chosen to represent the effect of variations in the size of the local region of high connection probability (i.e. δ ≤ R), and possible variations in global connection probability. It is apparent that for N ≥ 5000, the input pair is virtually certain to be connected to at least one candidate architecture, regardless of the values of R and x. Even smaller networks are likely to prove viable when the full six layers are taken into account.

Successful recruitment thus depends upon the resolution of conflicts between competing structures, the degree to which interference from other nodes may be countered, and the existence of a suitable initial weight distribution within the network. The concentration of the input signal upon two adjacent nodes provides a partial answer to these questions. Sources of interference are then limited to those nodes which receive direct or strong indirect signals from the input pair, as the signal from other input units is likely to be weak by comparison. But this does not address the problem of competing intermediate units, and a mechanism such as lateral inhibition will be required to differentiate between competing structures. The convergence of the learning algorithm will depend upon the initial weights of the candidate network lying close to the desired values, although this constraint may have the desirable effect of reducing the number of competing units.

4.3. Further Work

In future work we shall attempt recruitment of XOR architectures within a network of fixed topology, randomly generated according to the probability regimes developed in this paper, using incremental learning algorithms based upon those due to Valiant [5] for Boolean conjunctions and disjunctions. Here, functional changes will be mediated through the modification of synaptic weights by Hebbian learning, and the existing functionality of constituent circuits will be preserved.

4.4. Conclusions

In this paper we have demonstrated that a local connection probability regime based upon decay of the probability with the `distance' separating the nodes may be made consistent with biological estimates, provided that there is a local region of relatively high probability surrounding each neuron. Under this assumption, we have shown for reasonable choices of the parameters that recruitment of structures to represent a simple Boolean function is likely to be viable. The success of attempts to recruit such structures within a simulated network will depend upon the initial weight distribution assumed for the realised links, and upon the robustness of the recruiting algorithms that can be developed.

Acknowledgement

The authors would like to acknowledge helpful discussions with Gerard Finn, and with Jerry Feldman (who suggested the two-input approach).

References

[1] Rakic, P., "Limits of Neurogenesis in Primates." Science, 227, (1985), pp. 1054-1056.
[2] Feldman, J.A., "Dynamic Connections in Neural Networks." Biol. Cybernetics, 46, (1982), pp. 27-39. Reprinted in: Diederich, J. (Ed.), Artificial Neural Networks: Concept Learning. Los Alamitos, IEEE Computer Society Neural Network Technology Series, (1990), pp. 48-62.
[3] Gilbert, C.D. and Wiesel, T.N., "Receptive Field Dynamics in Adult Primary Visual Cortex." Nature, 356, (1992), pp. 150-152.
[4] Ramachandran, V.S., "Behavioral and Magnetoencephalographic Correlates of Plasticity in the Adult Human Brain." Proc. Natl. Acad. Sci. USA, 90, (1993), pp. 10413-10420.
[5] Valiant, L.G., "Functionality in Neural Nets." Proc. of the National Conference on Artificial Intelligence, 1988, pp. 629-634.
[6] Stevens, C.F., "How Cortical Interconnectedness Varies with Network Size." Neural Computation, 1, (1989), pp. 473-479.
[7] Feller, W., An Introduction to Probability Theory and its Applications, 3rd Edition. New York, John Wiley and Sons, (1968).