Master Thesis

A Stagewise Treatment of Connectionism

Roy Meijer
August 1994

Cognitive Artificial Intelligence
Department of Philosophy
Utrecht University
Heidelberglaan 8
3584 CS Utrecht
The Netherlands
Contents

Chapter 1 Introduction ... 1
1.1 Finding a strategy ... 1
1.2 Recurrent theme ... 2
1.3 A personal note: on the backgrounds of this thesis ... 3
Chapter 2 Setting the stages ... 4
2.1 Introduction ... 4
2.2 Neural networks ... 4
2.3 On the Stagewise Treatment of Connectionism (STC) ... 5
2.4 The Proper STC ... 10
2.5 STC applied ... 13
2.6 Remarks ... 14
Chapter 3 To boldly go ... 15
3.1 Introduction ... 15
3.1.1 Boxes and links, potentials and problems ... 15
3.1.2 Developmental psychology and connectionism ... 15
3.2 The importance of link number one ... 16
3.3 On the value of box three ... 18
3.3.1 Functional architecture ... 18
3.3.2 Catastrophe theory and stagewise development ... 18
3.3.3 Dynamic systems and cognitive growth ... 19
3.4 Problems, problems ... 20
3.5 Wrong use of models ... 20
3.5.1 Overstatements ... 20
3.5.2 Interference ... 21
3.6 Use of wrong models ... 22
3.7 Summary ... 23
Chapter 4 On other metaphors and the difference between symbolism and connectionism ... 24
4.1 Introduction ... 24
4.2 Metaphors ... 25
4.3 Setting some symbolist stages ... 26
4.3.1 The conceptual level ... 26
4.3.2 Stagewise Treatment of Symbolism (STS) ... 26
4.4 Debate ... 27
4.4.1 Projections and brains ... 27
4.4.2 Progress and lapse ... 29
4.4.3 Creativity and constraints ... 30
4.5 In terms of STC ... 31
4.6 Objectivity revisited: the cycle of empiric progress ... 32
4.7 Concluding remarks ... 33
Chapter 5 Cognitive developmental modelling ... 35
5.1 Introduction to the problem ... 35
5.2 Incremental learning introduced ... 36
5.3 Incremental learning explored ... 37
5.4 Incremental learning revised ... 39
5.4.1 One step further ... 39
5.4.2 Link 1 conclusions ... 40
5.4.3 Box 3 ... 41
5.4.4 Conclusions ... 42
5.5 A theoretical framework for developmental connectionism ... 42
5.6 Summary ... 44
Chapter 6 Biological developmental modelling ... 45
6.1 Introduction ... 45
6.2 On the prescriptive use of STC ... 45
6.3 The developing brain ... 46
6.4 From assumptions to models ... 47
6.4.1 Assumptions ... 47
6.4.2 Models ... 48
6.5 A biological framework ... 50
6.6 Concluding Remarks ... 51
Chapter 7 Conclusions ... 52
7.1 Introduction ... 52
7.2 Connectionism and STC: an evaluation ... 52
7.3 Connectionism and developmental psychology ... 54
7.4 The (loose) end(s) ... 55
References ... 57
Committee of referees

Drs. M. Lievers, Department of Philosophy
Dr. T. Olthof, Department of Developmental Psychology
Dr. R. van Hezewijk, Department of Theoretical Psychology
Acknowledgements

I first want to express thanks for all the supportive work done by Adriaan Tijsseling. He gave comments on earlier drafts of the thesis, found several technical reports that were helpful and supplied advice on using this blasted, user-friendly word processor. And I am sure that the faculty (and especially the administrative staff) is going to miss the horsing around we did in the otherwise so silent and solemn corridor.

Menno Lievers provided the essential philosophical frame (of mind) for this thesis and helped to structure the arguments. Tjeert Olthof was there throughout most of the development of this thesis and provided helpful advice and articles. Just back from vacation and at the very last minute, Rene van Hezewijk was willing to participate in the committee. Before that, however, he had patiently listened to the first ideas from which this thesis has grown and sort of pointed me in the right direction.

I forced my sister Baafke to read an early draft of the first part of the thesis, to prove that it is comprehensible even for an archaeology student. My other sister Lucia was not so easily put to work, but was supportive nonetheless, and my parents have supported me (and not just financially) throughout the six years spent studying in Utrecht (five of which were dedicated to CKI) and of course all the years before those. Thanks, my family!

My girlfriend Miriam was very strict in correcting grammatical aberrations, but especially so in her attempts at correcting my sometimes leisurely approach to science. I still consider (cognitive) science to be an advanced game of puzzles, discovery and creativity in verbal fencing, but I guess that this is what makes it so interesting. I am confident that this final version is up to her standards.
Chapter 1 Introduction

1.1 Finding a strategy

Neural networks are very appealing. How is it possible that a small number of nodes and connections is able to do relatively complex things? In practical applications the techniques of neural nets are beginning to attract more and more interest, for example in the field of 'database mining'. Neural networks are sometimes able to find novel relations and correlations in the data of large databases, providing new (and sometimes valuable) insights into the 'environment' of a company. But the major impact of neural networks has up to this moment been in the field of cognitive science. Connectionism, the research program1 that is based on these network techniques, has continued to grow explosively since its reappearance on the cognitive scene in the middle of the eighties (Rumelhart & McClelland (1986a and 1986b)). Usually some aspect of human cognition is singled out (e.g. the role of attention in visual tasks), modelled and experimented with. Experimenters often conclude that neural networks are able to do the task while showing some interesting side effects, which sometimes resemble human behavior in those tasks. But what happens next? It looks as if much fragmented work is being done without any overall coherence or strategy. From the connectionist bibles (Rumelhart & McClelland (1986a and 1986b)) and other literature, however, a connectionist strategy can be discerned. Up to now this strategy has been largely implicit, to be found only in some quotes concerning the progress and goals of the connectionist research program, which usually end up somewhere in the introduction or the concluding remarks of an article. Those who are more theoretically inclined do write on connectionism and its role within cognitive science, but their articles mainly concern the differences between connectionism and symbolism and, again, only mention 'the strategy' implicitly.
In this thesis I will make explicit how connectionist research is striving towards the goal of better understanding cognition. By taking the above-mentioned quotes literally, it is possible to construct a model of progress in the connectionist field. If this stagewise model turns out to be a good approximation of what is going on, it is valuable in several ways, both within the field and in its relation to the 'outside world'. So the main aim of this thesis is to describe this (methodological) stagewise treatment (of the progress) of connectionism, which is to be called STC (Stagewise Treatment of Connectionism). To further indicate the value of such a model, it will be used to examine several important aspects of connectionism. The first of these aspects is a closer look at connectionism itself. It is important to place the research that is currently done into a larger connectionist perspective. STC can be used to give an indication of the stronger and weaker points of this field of research, and by making these explicit, connectionism can proceed in a more 'self-aware' and precise way. The second use of STC lies in a comparison with other research programs, specifically of course the classical, symbolist program, which is considered to be the main competitor within the area of cognitive science. In order to do that I will describe the progress of the symbolist program in a way similar to STC. This will be done in chapter 4, where a more 'objective' way of comparing these programs is also introduced. In the third part of the thesis the practical use of the model is demonstrated by using STC to describe in greater detail one specific sub-area of cognitive research. The main goal is to show the value of STC as a descriptive tool, but after establishing the legitimacy of the model some indication of its prescriptive uses will follow.
1 I will use the terms program, paradigm and discipline in a rather loose manner throughout this thesis to mean a set of assumptions, constructs, techniques and goals that guide research. I will give appropriate definitions of paradigms and their underlying constructs where needed.
1.2 Recurrent theme

The other theme, recurrent throughout these three parts, will be the field of developmental psychology, which has only recently started to enter the connectionist realm and which is therefore relatively 'unspoilt'. "The field of developmental psychology" is a label which covers a very large field, however. A quick scan through the 1993 volume of Developmental Psychology reveals, for example, the following subjects: family models, unravelling girls' delinquency, different effects of parent and grandparent drug use, school performance and disciplinary problems among abused and neglected children, a psychometric study of the adult attachment interview. This is just a tiny sample of the types of work done in this field, and in most of this work it is not clear how neural networks could directly contribute. When referring to 'developmental psychology' throughout this thesis, I will therefore use the term as shorthand for something that should perhaps be called 'cognitive developmental neuropsychology' (or: 'developmental cognitive neuroscience', Karmiloff-Smith (1992), p.168), a subarea of developmental psychology in which connectionist models could possibly have an impact. The work on development described in this thesis might therefore be called pioneering work, because it could lead to further implications in a wider area of developmental psychology. It is this pioneering work that has only just begun. Especially in chapter 5 the emphasis will lie on the way connectionism can influence the subject of developmental psychology. The way these two themes are addressed in this thesis also differs. It is possible to give a relatively closed description of STC: once the description is given, the subject can be closed and STC applied, so to speak.
The treatment of the role of connectionism in developmental psychology is much more open, as it is only possible to present an interpretation of the way this particular topic of research has progressed so far (keeping in mind the descriptive use of STC) and to indicate (again using STC, but now in a prescriptive sense) where and how future research could take place. The conclusion will be used to summarize the findings on the above-mentioned subjects, in order to give an indication of the importance of such a methodological model for the progress of connectionism and to indicate in what way it has made an impact upon the way the subject of developmental psychology is looked at. Chapter-wise this leads to the following set-up. In chapter 2 some views on the backgrounds and goals of this field of research will be presented. A brief introduction to the workings of neural networks will be followed by my views on the workings of the connectionist paradigm. Specifically, I will describe a model (STC) with which the way this paradigm is progressing towards the goal of understanding human cognition can be described. Chapter 3 will concentrate on some of the values and problems of connectionism in light of STC. The most important boxes and links will be evaluated. Chapter 4 looks at the way the model can be used as a new approach to the debate on the differences between symbolism and connectionism. In chapter 5, I will examine in more detail how progression is described by STC, by describing several network experiments that concentrate on development. In chapter 6 the model will be used to look at the way neural networks could be used in simulations guided by biological concerns. By closely following the first stages, research topics in this field present themselves, and I shall give some examples of such a process.
Chapter 7 presents conclusions regarding the interpretation, workings, application and evaluation of the four-stage model, as well as on the interaction between research into neural networks and a sub-field of cognitive science, the area of 'cognitive developmental neuropsychology'.
1.3 A personal note: on the backgrounds of this thesis

The study called Cognitive Artificial Intelligence (CKI, Cognitieve Kunstmatige Intelligentie, at Utrecht University) professes to be a multi-disciplinary course. What I had in mind, therefore, in the course of my own 'cognitive development', was that we were being raised to be cognitive scientists who would bring together several separate fields of (cognitive) science and show where potential integration might lead to new insights. An introduction to these scientific fields (e.g. philosophy of mind and knowledge, experimental and perceptual psychology, computational syntax and semantics, an introduction to knowledge-based or expert systems, from propositional logic up to modal logic, programming courses, foundations of symbolism and connectionism, neural network techniques) was supposed to lead, in my view, to researchers with a global knowledge of a number of subjects, who could through this knowledge indicate overlap between fields and potential collaboration. Insight into the local scientific 'lingo' of these separate knowledge domains (or islands?) and the problems they are trying to cope with, gained through these introductory courses, forms a large part of the basis of this idea; added to that should be a willingness to delve into new subjects not directly related to a speciality. I believe that this idea is basically sound, but that it is lost in the last years of the study, when, almost inevitably, specialization starts to crop up. An absence of stimulation of work in the direction of cognitive generality, due to 'organizational difficulties', is a partial cause of this divergence. Still, I hope that this (admittedly idealistic) idea is reflected in what I have put down in this thesis.
A number of different subjects are looked at:
- a methodological view of the progress of connectionism;
- a contribution to the symbolism/connectionism debate through this methodological model;
- the subfield of developmental psychology in which connectionism might be useful;
- some experiments with artificial neural networks;
- hints at the possible value of catastrophe theory and dynamical systems theory;
- biological considerations;
- and then some.

Underlying this thesis is an implicit general idea about how human beings become what they are. Through evolution, DNA is shaped in such a way that through the expression of molecules, a basic architecture is laid down. Interaction with the environment and the culture in which the individual is growing up, together with this expression and development, eventually leads to an adult individual. The adult reproduces, if he or she is skilful in surviving and reproducing (and lucky perhaps), in another step of the evolutionary process, restarting the cycle in a slightly different way with a new individual. The gist of this idea can be given in one quote from Mayr which is given in Edelman (1987): "Evolution, in which mutation, recombination and gene flow provide major sources of diversity, phenotypic function provides sampling of the environment, and heredity assures that some of the results of natural selection will yield differential reproduction of adapted phenotypes [...]". The making explicit of such a 'philosophy' should take a couple of books, judging from the range of theories that have something to say on it, and that is where, ideally, people with a broad view come in handy. This brings us back again to my view on the goals of CKI or, more specifically, the idea behind this thesis. I have brought together a range of theories, hoping to come up with something coherent, even in its diversity. A central cohesive role is played by the STC model, which will be described in the next chapter.
Chapter 2 Setting the stages

2.1 Introduction

In this chapter I will describe the workings and progression of the connectionist paradigm. I shall present a four-stage model which characterizes these connectionist processes, and after that a simple example to serve as further clarification of the model. With it, the stage is set for a further look at the options open to connectionism at this moment, and from this vantage point I will try to describe some of those possible future steps in greater detail. It is this 'Stagewise Treatment of Connectionism' (STC) that will be used throughout this thesis as a background for the various subjects that I will examine. First, I will describe the basic elements involved, which will be expanded to the full model. Before I proceed to describe the model, however, I believe a very brief introduction to the theory of artificial neural networks is in order.

2.2 Neural networks

What follows is a simplified explanation of the basics of a certain type of network called a feedforward, back-propagation network. Many more architectures and learning rules exist, and for a more complete insight into the workings of these structures the interested reader can consult several articles and books (for example Rumelhart and McClelland (1986a and 1986b), Phaf and Murre (1989), Bechtel and Abrahamsen (1991)). The aim of this introduction is not to explain the exact workings of these artificial architectures, but to present some intuitive feeling for the mechanisms involved, as this thesis is focused on the progress of these theories within cognitive science. I hope that for those with no background in the (mathematical) theory of neural networks, this introduction will provide enough basis for further reading. Artificial neural networks are based upon the structure of the brain, in which elements (neurons) are connected to other elements by means of dendrites, axons and synapses.
Neurons can activate or inhibit each other (but not both) through these connections. Changes in these connections (e.g. in the number of synapses or the amount of neurotransmitter released) are supposed to occur when an organism is learning something. Just to give an indication of the scale of things: there are something like 180 billion brain cells, of which 50 billion are actively involved in information processing. Each of these receives up to 15,000 physical connections from other cells (Kolb and Whishaw (1990)). In an artificial neural network, elements (the nodes) are also connected to each other by means of weighted connections. The weights on these connections can be changed, giving the network the opportunity to learn something. This learning consists of presenting an input and the corresponding output to the network. The input is processed by the network and the result is compared to the desired output. The difference (the error) between the calculated output and the desired output is used to modify the weights on the connections in such a way that when the input-output pair is presented another time, the error will be smaller. After many of these training cycles, usually measured in hundreds of steps, this error should be reduced to a minimum and the network is considered trained. In the ideal case, it should then be able to produce the correct output when given the input data it was trained on and generalize to other, related inputs. In Figure 2.1, an example of a network architecture is presented. The workings of the network might be better understood through a mathematical interpretation. The weights (i.e. the 'strengths' of the arrows in Figure 2.1) between the nodes can be seen as a matrix, the input as a vector, and the whole process as a kind of matrix multiplication. In this light, what is happening when the network is learning is that it is trying to find a function (the matrix) which describes the mapping between the input and the output.
This kind of learning is called supervised learning, because the network is presented with both the input and the output data.
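To make the matrix interpretation concrete, here is a minimal sketch in Python. It is only an illustration of the matrix-vector view described above; the weight and input values are made-up numbers and do not come from the thesis.

```python
# Forward pass of one weight layer, viewed as a matrix-vector product.
# Each row of the weight matrix belongs to one output node; the net input
# arriving at that node is the dot product of its row with the input vector.

def forward(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Two input nodes feeding three output nodes: a 3x2 weight matrix
# (all numbers are arbitrary, illustrative values).
W = [[0.5, -0.2],
     [0.1,  0.4],
     [-0.3, 0.8]]
x = [1.0, 0.5]

print(forward(W, x))  # one net-input value per output node
```

Learning, in these terms, amounts to adjusting the entries of W until this product maps each training input onto (an approximation of) its desired output.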
4
[Figure 2.1 here: input patterns at the bottom, internal representation units in the middle, output patterns at the top.]

Fig. 2.1 Artificial neural network architecture. The circles represent the nodes of the network and the arrows the connections between them. The internal representation units are also called 'hidden' units, because they have no connection to the 'outside world'.
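The training procedure described above (present an input, compare the result with the desired output, adjust the weights so that the error shrinks) can be sketched for a tiny network of the Figure 2.1 type. This is only an illustrative toy, not code from the thesis: it assumes a sigmoid activation function, a 2-2-1 architecture without bias units, and the logical AND mapping as training data.

```python
import math
import random

# A minimal feedforward network: two input nodes, two hidden ('internal
# representation') nodes, one output node, trained by error back-propagation.

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weight matrices, initialised with small random values.
w_hid = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(2)]

def forward(x):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, lr=0.5):
    hidden, output = forward(x)
    # Error at the output node, propagated back through the connections.
    delta_out = (target - output) * output * (1 - output)
    for j, h in enumerate(hidden):
        delta_hid = delta_out * w_out[j] * h * (1 - h)
        w_out[j] += lr * delta_out * h
        for i, xi in enumerate(x):
            w_hid[j][i] += lr * delta_hid * xi
    return (target - output) ** 2

# Train on the logical AND mapping for a few hundred cycles.
patterns = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
first = sum(train_step(x, t) for x, t in patterns)
for _ in range(500):
    last = sum(train_step(x, t) for x, t in patterns)
print(first, '->', last)  # the summed error shrinks over training
```

After training, presenting an input amounts to the matrix-vector computation of the previous section, now with weights that have been shaped by the repeated error corrections.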
2.3 On the Stagewise Treatment of Connectionism (STC)

After this extremely simplified view of the workings of one type of neural network, I will describe the way these architectures are being used in the connectionist paradigm to investigate human cognition and behavior, or to model other animal behavior. In order to do that, and for the purpose of this thesis, a broad, general definition of the connectionist paradigm is appropriate. Stated simply (but quite effectively nonetheless), connectionism is the specific area of research in cognitive science in which cognition is studied using architectures called artificial neural networks. In these computational structures, elements are connected to each other by means of modifiable weights, hence the term connectionism. The various reasons for using these structures and the advantages they offer over other approaches (namely the classic symbolic approach) have been more than amply discussed in other work2, so I will not delve into this. In short, properties of these networks make them interesting to be used in several ways in the 'quest' for the understanding of cognition. Rumelhart (1989), who is one of the gurus of the connectionist (or Parallel Distributed Processing, PDP) field, makes the following remark about this connectionist quest: "Our strategy has thus become one of offering a general and abstract model of the computational architecture of brains, to develop algorithms and procedures well suited to this architecture, to simulate these processes and architectures on a computer, and to explore them as hypotheses about the nature of the human information-processing system", p.134. It is from quotes like these that a model of progress can be constructed (more examples will be given when I describe the stages in detail): the connectionist research program is a program which aims to go through four stages: modelling networks on the basis of assumptions about the brain; the testing of those models; analyzing the models; and finally the application of insights gained from these models to the brain. Before describing these stages and their connections in more detail, first something about the 'stagewise treatment' itself. Those familiar with the connectionist literature will have guessed the background of the name of the model. In Smolensky (1988) the Proper Treatment of Connectionism (PTC) is introduced. Smolensky splits up the cognitive field into two main areas: conscious rule application and intuitive processes. The first is initially taken to be the playground of symbolism, in which semantics and syntax operate on concepts (and hence 'conceptual'). Rules and symbols are assumed to be important for the exploration and description of this field, and the formalization of cultural knowledge is a way to understanding this cognitive section. For intuitive processes these rules and symbols cannot be used. Skills, individual knowledge and intuition are not so easily captured by a symbolist approach, as experience has taught. Smolensky therefore argues that especially in this area connectionism can play a major role. This role should in time be expanded to also capture conscious rule application, in an attempt to unify the cognitive scientific community (see for example Smolensky, Legendre and Miyata (1992)).

2 Distributed processing, graceful degradation and content-addressability of neural networks are some of these reasons. For discussions on these and related subjects see for example Rumelhart & McClelland (1986a); Fodor & Pylyshyn (1988) for a critical analysis; and Bechtel and Abrahamsen (1991).
The goal of the Proper Treatment of Connectionism is to make clear these and other goals of connectionism (although Smolensky does not claim to represent all of the connectionist community), to state the fundamental hypotheses of this scientific field and to state its relation to the other participant(s) within this field. PTC introduces the notions of subsymbols and a subconceptual level, and makes explicit, in these terms, the differences in content between the symbolist and the connectionist approach. It was this article I had in mind when trying to find a name for the model that I will describe in this chapter. Although at a different level, I too want to make explicit the goals of connectionism, to state some fundamentals and to describe its relation to other participants within the cognitive field. The 'level of entry' is different, however, since the model describes in a methodological way the manner in which connectionist progress proceeds, and can therefore be taken to be a 'meta'-view. Smolensky fills in important parts (i.e. the introduction of subsymbols) of the content of the connectionist approach. I want to state some fundamental hypotheses about the nature of progress within this field, and I will use the model to look at the differences between connectionism and symbolism. Together with the shape the model has taken, 'Stagewise Treatment of Connectionism' (STC) seemed to be a logical choice. What follows is a first introduction of these stages and the way they are connected. To give an indication of why these four stages are appropriate, quotes from several 'leading figures' will be given to indicate on what basis the concept of these stages has been developed. In the next section, this first approximation of the model will be expanded to give a complete picture of the several ways in which artificial neural networks are currently applied to gain insights into the workings of brains. This will be followed by a simple example of the application of STC.
So let's take a look at the way it works.

Stage 1 Modelling on the basis of assumptions about the brain.

The first of the four stages is the generation of network models based on some very basic assumptions about the brain, i.e. that there are elements which are linked to each other through connections which can be modified. According to Anderson et al. (1990):
"The brain is telling us that we should look at information representation that intrinsically requires large numbers of computing elements rather than clever and subtle architectures", p.705. The statement of such assumptions goes back to, for example, the Hebb rule (which states that two neurons which fire together most of the time will become more strongly connected), which has been the basis for the Hebb learning rule used in Perceptrons (e.g. Rumelhart and McClelland (1986a), chapter 1, p. 36)3. The terms used to describe the artificial networks also give an obvious indication of this first stage. Some assumptions about the brain can be used to put together some sort of artificial model, and different (or more) assumptions will lead to different architectures (see for example Happel and Murre (1994)).

Stage 2 Testing of the models.

Next, of course, this artificial structure will have to be investigated. So in stage 2 experiments are done in order to find out about the properties of these models. If the assumptions from stage 1 are correct, then at least some phenomena encountered when investigating the behavior of these structures should resemble those found in living organisms. So in stage 2, these models will have to be tested to see if their behavior is indeed similar to the way things are done in 'the real thing'. Hinton, McClelland and Rumelhart (in Rumelhart and McClelland (1986a), chapter 3): "As we shall see, the best psychological evidence for distributed representations is the degree to which their strengths and weaknesses match those of the human mind", p.78. Such evidence is provided, for example, by testing the network at various intervals during training on data that it has not seen before. Researchers examine the generalizations and the types of error that are made on the new data and relate these to the behavior shown by children learning similar tasks (see for example McClelland and Jenkins (1991)).
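The Hebb rule mentioned under stage 1 can be written down in a couple of lines. This is my own schematic sketch with made-up activity values and learning rate, not the Perceptron implementation referred to in the text:

```python
# Hebbian learning: a connection is strengthened in proportion to how often
# its two units are active at the same time.

def hebb_update(weight, pre, post, learning_rate=0.1):
    """Strengthen the connection when the pre- and postsynaptic units co-fire."""
    return weight + learning_rate * pre * post

w = 0.0
# Five presentations; on three of them the two units fire together.
for pre, post in [(1, 1), (1, 1), (0, 1), (1, 1), (1, 0)]:
    w = hebb_update(w, pre, post)
print(w)  # the connection ends up stronger than it began
```

The assumption about the brain (co-active neurons become more strongly connected) thus translates directly into a weight-update rule for an artificial model, which is exactly the stage 1 move described above.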
Considering the large number of experiments on neural networks described in articles, books, at conferences and in their proceedings, and the (sometimes striking) similarities between phenomena encountered when investigating the workings of artificial networks and of the brain, we seem to be fairly well into this stage (see for example the Proceedings of the International Conference on Artificial Neural Networks 1993 (ICANN ‘93)).

Stage 3 Analyzing the models. What we have now is a simplified structure exhibiting behavior similar to that of the real subject. The real thing is too complex to understand at the level of units and connections4 at the moment, but it might just be possible to take apart these simpler models and see what makes them tick. McClelland and Jenkins (1991) give a nice example of the practical switch from stage 2 to 3: “Given the generally close correspondence between model and data (stage 2, R.M.), it is important to understand just how the model performs, and how its performance changes. To do this, it is helpful to examine the connections in the network at several different points in the learning process”, p.58, italics added. So once it is established that some interesting similarities arise between the real and the artificial, stage 3 can be entered. Eventually these models will have to be used to help us understand and describe the processes that go on in the brain, so in stage 3 the models will have to be analyzed and instruments will have to be developed that describe the workings of these (simple) artificial networks. Theories like the harmony theory of dynamic systems, but also cluster analysis of neural networks, principal component analysis (which uses eigenvalues and eigenvectors of the weight matrix) and phase state portraits (Elman (1989) mentions the latter three methods), are examples of work in this direction. Smolensky (in Rumelhart and McClelland (1986a), chapter 6), on harmony theory: “My claim is not that this strategy leads to descriptions that are necessarily applicable to all cognitive systems, but rather that the strategy leads to new insights, mathematical results, computer architectures and computer models that fill in the relatively unexplored conceptual world of parallel, massively distributed systems that perform cognitive tasks. Filling in this conceptual world is a necessary subtask, I believe, for understanding how brains and minds are capable of intelligence and for assessing whether computers with novel architectures might share this capability”, p.196.

Stage 4 Application of insights. After that is done we should have gained insight into the way artificial networks process information. These insights may now be transported into the realm of ‘the real thing’ in stage 4 where, hopefully, the analytical instruments that have been developed can be applied in a similar manner.

3 As I will refer to more articles from these PDP-bibles, a shorthand notation seems in order. This will be used throughout this thesis.
4 Be it a neuronal level or a subsymbolic level. Interesting philosophical discussions have taken and are still taking place about the correct level of description of cognitive phenomena. There are those arguing for an extreme reductionist position; there are symbolist cognitivists who claim that the way the cognitive ‘program’ is implemented is not relevant to theory formation (see chapter 4), and a whole range of positions in between these two extremes. Most connectionist researchers can be found in this last range (e.g. Rumelhart and McClelland (1986a), chapter 4, or Rumelhart (1989) and Smolensky (1988)). Although much can be said about this discussion, I will not do so here, as it mainly concerns the content of the connectionist approach. As I am concentrating on the progress, this discussion is not directly relevant to the model I am describing here.
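One of the instruments listed for stage 3, principal component analysis, can be illustrated with a small sketch. The data below are random stand-ins for recorded hidden-unit activations (no real network is involved); the point is merely the mechanics of extracting the eigenvector with the greatest eigenvalue.

```python
# A sketch of one stage-3 instrument: principal component analysis of a
# network's hidden-unit activations. The activation matrix is random
# stand-in data, not output from a trained network.
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 5))          # 100 recorded hidden states, 5 units

centered = acts - acts.mean(axis=0)
cov = centered.T @ centered / (len(acts) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order

# The eigenvector with the greatest eigenvalue is the direction along
# which the hidden states vary most: the first principal component.
first_pc = eigvecs[:, -1]
projection = centered @ first_pc          # one coordinate per hidden state
print(round(eigvals[-1], 3), projection.shape)
```

The variance of the projected coordinates equals the largest eigenvalue, which is why this decomposition summarizes what the hidden units are collectively doing.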
Zipser (in Rumelhart and McClelland (1986b), chapter 23) expresses some thoughts along this line: “This approach implies that even if the model proves to be a valid predictor, it will not directly provide new physiological knowledge. What it will do is give a precise description of the computations being carried out [...] Once these computations are understood, the task becomes to explain how neural networks in the brain actually implement them”, p.436, italics added. Of course, no guarantees are given that this ‘transportation’ of insights will prove to be successful, but given the results of stage two, we can probably be optimistic. Elman (1989) has high hopes along these lines: “If successful, these analyses of connectionist networks may provide us with a new vocabulary for understanding information processing. We may learn new ways in which information can be explicit or implicit, and we may learn new notations for expressing the rules that underlie cognition”, p.20. The validity of this mathematical exercise of transfer is certainly not uncontroversial, however: psychological neural network modellers have been called ‘closet mathematicians’ who finally get a chance to do something scientifically respectable for a change with ‘hard’ maths (see Cliff (1990)). We are still far removed from even judging whether it is at all possible to develop such a mathematical analytical theory for the brain, but as we will see, these analyses do in the meantime lead to new ways of looking at the behavior of artificial networks and are therefore still valuable5.

5 A potential answer to scepticism about this transfer lies in the work of Edelman (1987). On very solid biological grounds he posits two mathematical models of neuronal processes. When tested by means of neural network simulations, these networks show neuronal group behavior similar to that seen in the brains of monkeys. Portrayals of the results by means of pictures of the activations in the network act as further clarification and are very persuasive. Edelman has modelled biological processes in a very detailed way and these models show remarkable similarities in neuronal behavior to that seen in real brains. Further analysis of these models (and what else are neural networks but implementations of mathematical models?) must say something about brain processes, considering the detail involved. I propose that Edelman’s work provides at least an indication of the potential of the transfer of box 3 analysis.

Figure 2.2 graphically depicts the four stages that are described above.

1. Generation of artificial neural network architectures based upon assumptions about the real thing.
2. Testing of the models to see if the behavior is similar to that encountered in the brain.
3. Analysis of the artificial networks and development of theories on their workings.
4. Application of these analytical insights on the real thing.

Figure 2.2. Basic model of advance in the connectionist field.

Splitting up the connectionist program into these four stages of generation, testing, analysis and transfer (or application) may seem to be little more than an arbitrary choice, but it does help to explain the next point. Given these four stages, it is clear that there is a danger that new insights gained in the area of neurobiology will be overlooked because people are too busy working with the models to concern themselves with developments in other scientific fields. If this simplified model is taken literally, then once stage 1 is passed, the neurophysiological aspects would be out of sight. Crick and Asanuma (in Rumelhart and McClelland (1986b), chapter 20) mention this danger of losing track of the interaction between insights from artificial network research and neurophysiological research. There is some indication of this process, but on the other hand there are also examples of research
that keeps being inspired by the research done in areas like neurobiology or neurophysiology. Furthermore, this basic model has much in it for it to be called a reductionist approach to cognitive science. All the insights to be gained go through box 3, and are therefore mathematical or analytical accounts of human performance. No cognitive or psychological theories are introduced on this path to insights. But such aspects do feature in the current use of connectionist models. Bechtel and Abrahamsen (1991) describe three different ways connectionism can be used to study cognition. An approximationist point of view states that connectionist models will lead to a different and more accurate account than that proposed by symbolists; compatibilist theorists emphasize the way connectionism can implement symbolist theories; and a third view emphasizes symbols as external entities. A hybrid account, in which the tasks are divided so that each approach does what it does best (i.e. pattern recognition vs. logical reasoning), also ranks foremost amongst current proposals. These are all in some way cognitive in content and not reductionist. So the proposal of a reductionist account is certainly not the only thing going on in the connectionist camp. Most of the work goes into formulating new hypotheses about cognitive processes, based upon these connectionist techniques but without the mathematical analysis involved in box 3. The basic model, however, does not accommodate this type of work. It seems therefore that a more recurrent and interactionist model of the advance in the connectionist field, in which interaction between the stages occurs, would be more appropriate. In other words, in order to get a more realistic view of the connectionist operation, the model will have to undergo some sort of expansion. The next section will be dedicated to the description of such an expansion.
2.4 The Proper STC

The upgrading of the model consists simply of putting in those extra connections mentioned above (plus two others) between the four boxes. These connections do seem to represent intuitively the main categories of research currently being done. That this is so should speak in favour of the way the ‘staging’ of the field is done with this model. If it proves to be an acceptable description of what is going on, it could then help to clear up misconceptions about the field of connectionism, giving an indication of who is doing what and why. In the following chapters some of this clearing up will be done. Figure 2.3 presents the upgraded version, which will be used throughout the rest of this thesis. What follows is a short intuitive interpretation of the new connections and in some cases examples of the workings of these connections. Because of the new connections, more generalized interpretations of the contents (additions in brackets) have been added in the last three boxes.

1. From the study of the behavior of artificial neural networks, new psychological theories and hypotheses might be proposed directly, as is being done all over the field. This accounts for the addition of ‘interpretation of cognitive processes’ to box 4. One example is the way connectionism has changed the way development is looked at. New (connectionist-based) hypotheses tend to be more ‘interactive’ (i.e. concerning the interaction between nature and nurture) instead of polarized along one of these dimensions. Furthermore, link number 1 is the area in which the battle between the more symbolist-oriented approaches and the connectionist approaches could be said to take place at this moment, as the connectionist cognitive theories compete with those proposed by the symbolist paradigm. In chapter 4 more will be said about this debate. Chapter 3 is for a large part dedicated to a more detailed description and evaluation of this link.
1. Generation of artificial neural network architectures based upon assumptions about the real thing.
2. Testing of the models to see if the behavior is similar to that encountered in the brain. (Testing of assumptions and hypotheses).
3. Analysis of the artificial networks and development of theories on their workings. (The interpretation of artificial networks).
4. Application of insights on the real thing. (Interpretation of cognitive processes).
Figure 2.3. The complete version of the four-stage STC model. Links which are not numbered are those already present in the basic model. The other connections will be elaborated upon in the main text.
2. A slightly more detailed look at the human brain, through connectionist glasses, is enough to inspire new ways of putting together the artificial architectures. More or different assumptions about the actual operation of the brain will lead to the construction of different models. There are many examples of this process, including structures like Grossberg’s ART6 (Adaptive Resonance Theory, in which there is an overabundance of output units, which are not used until needed, e.g. Grossberg (1988; 1987)) and CALM (Categorizing And Learning Module, which can make a distinction between old and new representations, Murre, Phaf and Wolters (1992)) as alternatives to the ‘simpler’ back-propagation or Hebb-learning techniques. Rumelhart and McClelland (1986a, chapter 4): “[...] we see the process of model building as one of successive approximations. We try to be responsive to information from both the behavioral and the neural sciences”, p.130.

6 For the record: Grossberg developed his Adaptive Resonance Theory ‘long’ before the emergence of the PDP ‘cult’ (Rumelhart and McClelland (1986a and b)) and quite separately from this part of the field. In Grossberg (1988; 1987) the two are compared. So although ART is an example of how different assumptions lead to different models, it is not an example of STC-like progression in the field.
Link number 2 provides the route to these successive approximations.

3. A combination of insights gained from advances in the theories of connectionism and from different fields of psychology could lead to the formation of new hypotheses, from which experiments can be devised using existing architectures in order to test out these new ideas (hence the ‘testing of assumptions and hypotheses’ in box 2). Clearly the back-propagation algorithm in feedforward networks is the prime example of this process, since it is used extensively, not to say almost exclusively, in different experiments to test hypotheses, from attentional processes in visual tasks to the balancing of scales (McClelland and Jenkins (1991)).

4. New insights gained from the analysis of neural networks present new ways of looking at the behavior of those networks (i.e. the ‘interpretation of artificial networks’). Ideas about the way neural networks operate could influence the interpretation of the phenomena encountered when testing different architectures. A description of the behavior of nets using principal component analysis or the minimization of the error function might lead to different interpretations of the behavior of the network, and through these to different interpretations of the behavior of brains. For example, instead of merely adjusting weights, the network is trying to find the eigenvector with the greatest eigenvalue, or it is settling or relaxing into a local or global minimum in the error landscape. It is metaphors like these that help us understand what is going on in these structures.

Based upon these links and boxes, an important distinction can be made between two types of modelling. There is a difference between psychological or cognitive modelling and biological modelling (also Bechtel and Abrahamsen (1991)). The latter is concerned mostly with the biological plausibility of the models, the former with the results of experiments and link number 1.
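The ‘settling into a minimum’ metaphor of link number 4 can be made concrete with a deliberately trivial sketch: a single weight descending a one-dimensional error landscape. The error function below is invented for the illustration.

```python
# An illustrative one-dimensional "error landscape": a network with a
# single weight w and error (w - 2)^2 + 1. Gradient descent lets the
# weight settle into the minimum at w = 2.
def error(w):
    return (w - 2.0) ** 2 + 1.0

def gradient(w):
    return 2.0 * (w - 2.0)

w = -5.0                      # arbitrary starting weight
for _ in range(200):
    w -= 0.1 * gradient(w)    # step downhill in the error landscape

print(round(w, 4))  # → 2.0: the weight has "settled" into the minimum
```

Seen this way, weight adjustment and relaxation into a minimum are two descriptions of the same process, which is exactly the kind of reinterpretation link 4 stands for.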
Psychological modelling is less concerned with the shape of the architectures and more with the cognitive insights and conclusions that can be gained by using artificial networks. Biological modelling needs explicit mention of the assumptions incorporated into the networks and an indication of the conclusions relevant to brain processes that can be drawn on the basis of experiments. Box 3 will prove to be especially valuable for the more biologically inspired models, as some doubt exists that these types of insights will come from something like back-propagation, which has little biological foundation7. The value of STC therefore lies in the fact that it sheds some light on the different ways neural networks can have an impact upon (cognitive) research. Furthermore, it must be mentioned that the system as it is presented here is not a closed system. Each of the boxes gives the opportunity of entering new insights into the system, so that it can incorporate new theories and different approaches. A striking example of this process is the way insights from metallurgical domains provided ways of developing new networks which use the theory of simulated annealing8. The way in which the rising popularity of the ‘evolutionary nature of nature’ is changing the way (human) behavior is viewed, thereby changing the way in which neural networks are built, is another case in point. The use of genetic algorithms to evolve new artificial network architectures for a specific task is something which is gaining increasing attention (e.g. Happel and Murre (1994)). More of these ‘exotic’ theories, like chaos theory (e.g. Van der Maas, Verschure and Molenaar (1990)), Lindenmayer systems (for the underlying principles of L-systems see for example Prusinkiewicz and Hanan (1989), and for a connectionist application Boers, Kuiper, Happel and Sprinkhuizen-Kuyper (1993)) and dynamical systems theory (i.e. Van Geert (1991)), also find their way into the connectionist system and influence its workings9.

7 Grossberg heavily criticises the back-propagation learning rule in Grossberg (1988; 1987): “[as] the BP model does not model a brain process [...], it undermines the model’s usefulness in explaining behavioral or neural data”, p.232.
8 This refers to the way a regular molecular structure is achieved in metals when liquid metal is allowed to cool down slowly. If we take the activations of neural networks to represent an energy function, we can call the solution to a problem the global minimum in the energy landscape depicted by that function. By allowing the network to ‘cool down’ gradually, i.e. by slowly changing the activations so that the energy value is lower at each next time step, the network can settle into the global minimum and find the solution.

2.5 STC applied

One example of neural network experiments should further indicate the legitimacy of STC. Bechtel and Abrahamsen (1991) devote chapter 5 of their book to the subject ‘connectionist and nonpropositional representation of knowledge’. As Bechtel has some insight into the way students learn to handle, for example, syllogisms, being experienced in teaching both informal and formal logic, they decided to investigate how neural networks might perform on (the learning of) such a task. In STC terms, this is an example of the application of link 3. They are testing a new theory using an existing architecture, i.e. a multilayer feedforward network using back-propagation as the learning rule. They trained the network to evaluate simple arguments like:

If p, then q
p
_________
∴q

What needs to be done according to box 2 is to see if the network behavior is in any way similar to human behavior. Slow learning and the necessity of a large amount of practice and error correction were, perhaps partly in jest but also more seriously, used as indications that the network showed relevantly similar behavior. By the end of the training the performance was at least similar to that of average students, so the network could handle the task. And although this is where most of the similarity ends, it was not the intention to model human performance precisely: “Rather, it is to show that logic problems, when viewed as pattern recognition tasks, can be solved by networks which, like humans, seem to be capable of learning from errors and tuning their performance”, p.173.
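As an aside, the simulated annealing procedure described in footnote 8 can be written down in a few lines. The energy function and cooling schedule below are invented for the illustration; the point is only the mechanism of accepting occasional uphill moves while the ‘temperature’ drops.

```python
# A toy simulated annealing run on a bumpy one-dimensional "energy"
# landscape. The landscape and schedule are illustrative assumptions.
import math
import random

def energy(x):
    return x * x + 3.0 * math.sin(5.0 * x)   # several local minima

random.seed(42)
x = best = 4.0
temperature = 5.0
while temperature > 1e-3:
    candidate = x + random.uniform(-0.5, 0.5)
    delta = energy(candidate) - energy(x)
    # Downhill moves are always accepted; uphill moves become rarer as
    # the system cools, which lets it escape local minima early on.
    if delta < 0.0 or random.random() < math.exp(-delta / temperature):
        x = candidate
    if energy(x) < energy(best):
        best = x                              # remember the best state
    temperature *= 0.99                       # slow cooling

print(best, energy(best))
```

The slow cooling is essential: quenching the temperature immediately would reduce the procedure to pure downhill search, which gets trapped in the nearest local minimum.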
So, through link number 1, the insight to be gained from these experiments is that logical expertise need not be modelled through sets of mental rules or procedures; pattern recognition, as an alternative, could go some way towards explaining what goes on in gaining logical competence. Feasibility studies such as these are very common in the network literature. The aim is often to show that alternative theories are possible, and not in the first instance the explicit and complete proposal of such theories, which is future work (although the suggestion can be quite strong, as it is in this case). As for box 3 analysis: “Without a detailed analysis of the hidden units (which we have not performed), we cannot determine exactly how the network solved this problem”, p.171, italics added. So although not performed, such analysis is recognized as being a separate step in the study of networks; one that should lead to different insights (i.e. vs. the ‘link 1 suggestion’ that different mechanisms could underlie logical competence).

9 Dynamical systems theory is briefly described in 3.3.3 and Lindenmayer systems in 6.4.2. Those more interested in these theories should consult the references given above.
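A feasibility study of this kind can be sketched roughly as follows. The encoding of the argument forms below is my own toy stand-in, not Bechtel and Abrahamsen’s actual one, and the network is far smaller than theirs; the sketch only shows the mechanics of training a feedforward network with back-propagation on a validity-judgement task.

```python
# A toy feedforward network, trained with back-propagation, learning to
# label argument patterns as valid or invalid. Encodings are invented.
import numpy as np

rng = np.random.default_rng(1)

# Four schematic argument forms as 4-bit patterns, labelled 1 = valid.
X = np.array([[1.0, 0.0, 1.0, 0.0],   # modus ponens             -> valid
              [1.0, 0.0, 0.0, 1.0],   # modus tollens            -> valid
              [1.0, 0.0, 0.0, 0.0],   # affirming the consequent -> invalid
              [1.0, 0.0, 1.0, 1.0]])  # denying the antecedent   -> invalid
y = np.array([[1.0], [1.0], [0.0], [0.0]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0.0, 1.0, (4, 3))       # input -> hidden weights
W2 = rng.normal(0.0, 1.0, (3, 1))       # hidden -> output weights

def forward(X):
    return sigmoid(sigmoid(X @ W1) @ W2)

initial_error = np.mean((forward(X) - y) ** 2)
for _ in range(5000):                   # slow, error-driven learning
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    d2 = (out - y) * out * (1.0 - out)  # backpropagated output error...
    d1 = (d2 @ W2.T) * h * (1.0 - h)    # ...pushed back to the hidden layer
    W2 -= 0.5 * h.T @ d2
    W1 -= 0.5 * X.T @ d1
final_error = np.mean((forward(X) - y) ** 2)

print(initial_error, final_error)       # error shrinks with training
```

Note that under this encoding the two valid and two invalid forms are not linearly separable, so the hidden layer is doing real work; and, as in the study described above, what the hidden units represent is not obvious without a separate box 3 analysis.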
Even in such a simple example, elements of STC show up; a pattern to be recognized in the majority of the experiments performed with artificial networks.

2.6 Remarks

Before proceeding to the next chapter, one thing has to be made explicit concerning the use of STC. The STC model describes (and can prescribe) the progress of the field of connectionism, a field which is defined in very general terms in this setting. The connectionist contents of that progress are quite another matter, however, and STC cannot be used to comment upon the background ideas or the validity of the purposes and assumptions with which the boxes and links are traversed. This means that STC can be used by researchers with quite different goals in mind. Those aspiring to cognitive modelling will mainly tread the path of link 1, keeping in mind that through box 3, underlying mathematical analysis is potentially close at hand, and that the neurally inspired modelling presented by networks ensures that a link with the brain, as metaphor for the artificial model, features somewhere in the theories to be developed. On the other hand, those guided by biological modelling and perhaps reductionist concerns (the basic model, Figure 2.2), who will venture mainly along the path leading through box 3, are also catered for in STC. Both the cognitive (or psychological) and the biologically interested connectionists can find their way through STC, which therefore presents a complete picture of connectionist possibilities. The most important links in STC are the ones from which insights can be gained into the human condition. The links entering box 4 from boxes 2 and 3 stand for the main processes of enlightenment in the connectionist paradigm.
Some intuitive back-up for these processes comes from Smolensky (1991): “The new cognitive architecture is fundamentally two-level: formal, algorithmic specification of processing mechanisms, on the one hand, and semantic interpretation, on the other, must be done at two different levels of description”, p.203. This distinction corresponds closely to the two different ways of gaining insights in the four-stage connectionist program, i.e. through link number 1 (semantic) and through box number 3 (formal, algorithmic specification). At first glance, then, STC is plausible. The next chapters will be dedicated to increasing this plausibility by using the model to examine three issues: connectionism itself, a comparison between this paradigm and symbolism, and further examples of the practical applicability of STC.
Chapter 3 To boldly go...

3.1 Introduction
3.1.1 Boxes and links, potentials and problems

In this chapter I will take a closer look at the workings of STC within the connectionist field. Specifically, link number 1 and boxes 1 and 3 of STC, and their influence on the progress of connectionism, are the main items to be considered. Link 1 is very important in the process of generating new cognitive theories and hypotheses directly from neural network experiments. Some examples of this process will be given from the field of developmental psychology. But apart from being valuable, this link could also harbour problems, and these will be further elaborated upon in this chapter. Box 3 represents the other route to cognitive insights. It is valuable in providing analyses of networks which may be useful for the analysis of brain processes, and in chapter 4 I will argue that its presence is one of the main differences between the symbolist and the connectionist approach. It is not uncontroversial, however, so some comments upon its value are in order; these will be followed by two examples of theories which may be used to develop this part of STC further.

3.1.2 Developmental psychology and connectionism

The field of developmental psychology will crop up in this chapter. There are several reasons for choosing this particular field, the first being that it presents a very clear example of one type of problem which I will call ‘the use of wrong models’ (section 3.6). The problem focuses on the difference between learning and development and, as we will see, there is more to development than just changing weights on connections. Another reason lies in the fact that connectionism and developmental psychology have only started to ‘find each other’ relatively late in the short history of artificial neural networks.
In vision research, neural networks have played a major role ever since Rosenblatt used his Perceptron to ‘look’ at and recognise letters on a screen, so this is a well-established part of the connectionist field10. Connectionism, however, has not yet had such an impact on developmental psychology, nor vice versa, which is a shame because, in my opinion, both fields could learn a great deal from each other. So the other reason consists of trying to draw attention to the possibilities of such cooperation. Although those potentials are also noted by Bechtel and Abrahamsen (1991), they are at the time of writing less optimistic about the abandon with which connectionism will be embraced by the mainstream of developmental psychology. The idea that connectionism and developmental theory have a lot to offer each other is not entirely unrecognized, however. Plunkett and Sinha (1992) “argue for an ‘epigenetic developmental interpretation’ of connectionist modelling of human cognitive processes, and (...) propose that Parallel Distributed Processing (PDP) models provide a superior account of developmental phenomena than that offered by cognitivist (symbolic) computational theories”, p.209. In their article they describe four applications in which different neural network architectures are used to model or describe different aspects of developmental phenomena. These are the over-regularization of irregular verbs (further elaborated from Rumelhart and McClelland (1986b, chapter 18)11), concept formation, compositionality and structure dependency (after Elman (1989)), and the occurrence of stages in development. In each example a network is trained on a problem and it is shown that during this learning, phenomena occur that are similar to those observed in studies and experiments with children. The main aim of the article is to show the cognitive world that there is a workable alternative to the cognitivist explanations that are being offered for developmental phenomena, and that this alternative comprises ‘simply’ one kind of theoretical architecture (the artificial neural network) plus an epigenetic approach: “What we call the cognitive system is best thought of as a dynamical system coupled within a dynamical system”. That aim is admirably reached, but some further remarks have to be made on this subject, and this will be done in this chapter. By using the examples of Plunkett and Sinha (1992) in combination with STC, I will try to indicate where these problems, and those mentioned in the first part of this introduction, could occur.

3.2 The importance of link number one

For clarity’s sake, the relevant part of Figure 2.3 is shown here as Figure 3.1, with link number 1 represented by the bold arrow. As was stated in the introduction, link number 1 is very important in developing new theories as alternatives to symbolist theories. Olthof (1994) mentions this process when presenting a case for a major role of connectionism within the area of developmental psychology. Even if simulations and experiments like those mentioned in Plunkett and Sinha (1992), and several simulations done by Olthof12, do not in themselves say much about the way the development of children progresses, “it is shown however in an unequivocal way that systems with one specific structure, given a certain type of input, are able to generate the relevant phenomena. Simulations like these therefore produce hypotheses about the development of children and the nature of the information they pick up in their environment. After that, both kinds of hypothesis have to be tested in the usual way”, translated from Dutch.

10 It is extremely likely that this also has to do with the more advanced computational descriptions of several aspects of vision which had been developed before the reappearance of neural networks. Developmental psychology has a shorter history in this line of work.
11 For a commentary on these experiments see Pinker and Prince (1988).
An important part of the contribution of connectionism therefore lies in the way new hypotheses can be stated and tested. One example of how neural networks could add to or improve on the symbolically oriented theories currently maintained in the area of developmental psychology is the following. There is a stage in the life of a young child in which it learns single words: the one-word stage. Vocabulary growth accelerates during this stage but slows down again at the end (when the two-word stage kicks in...). Happily so, for otherwise the child would have to learn something like 17 items per minute, every minute of the day and night, to keep up the increase in growth rate. “The deceleration could be the effect of an underlying growth program’s putting on the brakes on Keren’s word learning and accelerating her syntax learning”, Van Geert (1991), page 10, italics added. The ‘growth program’ mentioned by Van Geert is taken to be some kind of underlying symbolist program which regulates this learning behaviour. In the traditional accounts of development it is not clear how these programs are initiated or how they work. A connectionist approach could offer a simpler explanation in terms of limitations of the (growing) architecture of a (or the) neural network. This leaves intact the descriptions of these phenomena but grounds them in something more convincing than ‘an underlying growth program’. More examples of ‘programs’ of this type can be found in the developmental literature, in which neural network insights could give a more intuitive or ‘grounded’ explanation; some will be described in chapter 5.

12 On the subject of context independency.
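The alternative explanation suggested here can be given a minimal quantitative form. In the spirit of Van Geert’s growth models (the rate and capacity values below are invented for the illustration), a lexicon growing towards a limited capacity decelerates by itself, without a separate program putting on the brakes:

```python
# Logistic growth sketch: a vocabulary approaching a limited capacity.
# The (1 - words/capacity) term shrinks as the lexicon fills up, so the
# growth rate first accelerates and then decelerates automatically.
rate, capacity = 0.5, 400.0       # per-step growth rate, maximum lexicon size
words = 5.0                       # initial vocabulary
sizes = [words]
for _ in range(30):
    words += rate * words * (1.0 - words / capacity)
    sizes.append(words)

increments = [b - a for a, b in zip(sizes, sizes[1:])]
# The per-step gains rise early on and fall again near capacity, purely
# as a consequence of the limited capacity itself.
print(round(max(increments), 1), round(increments[-1], 3))
```

The deceleration at the end of the run is then not a command issued by a growth program but a side effect of architectural limitation; that is exactly the kind of grounding a connectionist account could offer.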
1. Generation of artificial neural network architectures based upon assumptions about the real thing.
2. Testing of the models to see if the behavior is similar to that encountered in the brain. (Testing of assumptions and hypotheses).
3. Analysis of the artificial networks and development of theories on their workings. (The interpretation of artificial networks)
4. Application of insights on the real thing. (Interpretation of cognitive processes)
Figure 3.1. Part of the upgraded model. The bold arrow indicates the link that is discussed in the main body of the text.
There are yet other ways in which insights can be gained through experiments and the use of link number 1. Bechtel and Abrahamsen (1991) describe research by Hinton and Shallice, who have lesioned artificial networks to see if the resultant behavior is comparable to that of humans suffering from certain types of dyslexia. Results of the experiments suggest new and nonobvious ways of retraining the reading system that might prove useful in the human case, opening a possibility for clinical application. In summary, then, connectionism is valuable in general in providing an opportunity, through link number 1, to generate and test alternative hypotheses to those currently available in the cognitive world. In particular, for the field of developmental psychology, this has meant two things. One, the architecture of the brain has come back into the picture, together with the biological constraints this can offer. The turn to a new approach is an important progression in this field, as the (development of) architecture can start to play a role in theories now, and
perhaps more importantly, there is at last a possibility to test hypotheses using models. And two, the environment is again given a part to play. One practical contribution within the field of developmental psychology lies in the emphasis connectionism can put on the role of the environment in learning, paving the way for a more epigenetically oriented approach to the study of development; an important ‘re-evolution’ in this field of research. In chapters 5 and 6, I will look at this process in more detail, again using STC as a basis for deciding which steps can be taken.

3.3 On the value of box three

3.3.1 Functional architecture

Coltheart (1994) makes a distinction between the network architecture, i.e. the number of nodes and the way they are connected, and the functional architecture, i.e. what the network is doing: the function represented by the weights. We want to know more about this functional architecture, and that is why we use the network models. But if the functional architecture is created by the network itself, it will be difficult to discover its functional features. Coltheart further argues that even if we do find this functional architecture, it will be difficult to determine whether it is the one used by humans. A related discussion is raised by Plunkett and Sinha (1992) when they demonstrate that it is possible to implement developmental phenomena in a mechanism that makes use of a single learning rule, versus the dual mechanism assumed in the traditional account. But what if this single learning rule does implement something like a dual mechanism? In order to answer such questions, it is important to analyse what is going on inside the network, and with this the importance of box 3 becomes clear. In chapter 4, differences between symbolism and connectionism will be looked at.
As box 3 plays an important part in that discussion, more will be said about its merits there, also in light of the discussion mentioned above. It is important to find out what is going on at a mathematical level ‘inside’ the neural network, and it is the work done in box 3 which should lead to these insights. It is therefore a critical component of the connectionist striving towards ‘true’ cognitive knowledge. But the ‘detour’ through box 3 (link number 1 being more direct) will most likely prove a difficult one. Experience has taught that the analysis of relatively simple architectures (like feedforward networks using back-propagation) is already difficult. The construction of more complex architectures with more complicated learning rules will not lead to networks that are easier to analyse. Still, the mathematical analysis of these networks is an essential task, and there are some theories that might prove useful in supporting this process. One way of solving potential problems is by looking for help outside the boundaries of connectionism. Mathematical theory on non-linear systems, for example, might point in the direction in which such analysis could be found. In chapter 2, I already mentioned the openness of STC, by which I meant the possibility of introducing theories from other areas of research into the connectionist paradigm for further use. I now want to look at theories that might qualify for this introduction, especially as applicable in box 3. I have in mind two articles: an application of catastrophe theory to stagewise development (Van der Maas and Molenaar (1992)), and a dynamical systems approach to cognitive growth (Van Geert (1991)). The examples concentrate on developmental concerns, but should prove to be of value for analytical purposes in a more general sense; in light of the other theme of this thesis, this is not a divergence from the objectives I have in mind.
3.3.2 Catastrophe theory and stagewise development

Catastrophe theory is used to describe transitions in processes like boiling (e.g. the transition from water to steam), in which sudden jumps could be said to take place. Van der Maas and Molenaar (1992) present a closer look at catastrophe theory in connection with the jumps or transitions, better known as the stages, in cognitive
development. They give a comparison with Piaget’s theory of stages, and with the concept of stage transitions in an information processing account. Although hinted at, catastrophe theory and neural network analysis have not yet been related13. The following quote gives an indication of the possibilities of such an integration, if we keep in mind that artificial neural networks are prime examples of non-linear systems, a subclass of which is capable of self-organization: “At present we are witnessing a revolution in the pure and applied mathematical analysis of non-linear systems. Under the headings of bifurcation theory, catastrophe theory, nonequilibrium thermodynamics, synergetics, soliton and chaos theory, considerable progress has been made in the analysis of various aspects of nonlinear systems. One distinguishing feature of these systems is that sudden qualitative changes may occur in the dynamic structure of their behavior. These so-called catastrophic changes mark transitions to newly emerging equilibria, that arise through endogenous reorganization of the system dynamics. Accordingly, catastrophes are associated with a kind of self-organization that can only take place in non-linear systems [...]”, p.415, italics added. Furthermore, as neural networks show similar stages in their ‘development’ as training proceeds (e.g. Rumelhart and McClelland (1986b, chapter 18) and Plunkett and Marchman (1990)), it seems that catastrophe theory could be valuable in explaining the behavior of artificial networks.

3.3.3 Dynamic systems and cognitive growth

Van Geert (1991) describes a dynamic systems model that takes cognitive growth to be a system of supportive and competitive interactions between so-called ‘cognitive growers’. The developing cognitive system is described as an evolving ecological system of these cognitive growers, which could be, for example, the growth of vocabulary, the growth of grammatical rules (like inversion in Wh-questions) or other ‘skills’.
Growth of one of these ‘species’ can support or compete with others in an environment with limited resources (ranging from the size of working memory to skills already present (facilitation), motivation, ‘educators’ and environmental reinforcement). At first this may sound rather exotic, but the eventual equations of dynamic growth show some promise when compared with empirical results. As a subclass of connectionist architectures also learns through a mechanism of competition, and as networks in general are taken to be dynamical systems, this type of analysis holds promise for connectionist application. Furthermore, some small changes to the equations described in Van Geert (1991) open up an opportunity to apply catastrophe theory to Van Geert’s dynamic systems model (Van der Maas and Molenaar (1992), p.402). This could lead to an important integration of tools of analysis, with potential opportunities for neural network analysis and a link with empirical research. A closer look at these theories goes beyond the goals of this thesis, but by mentioning them I do want to stress the importance of looking for inspiration in other fields such as these. The applicability of these types of theory receives some support from Van der Maas, Verschure and Molenaar (1990), who remark upon the chaotic behavior of very simple neural networks. It is the application of these types of theory that may shed new light on the analysis of network behavior. The explicit formulation of mathematical definitions like those in Van Geert or Van der Maas and Molenaar might go some way towards understanding what goes on in the stages that, for example, Elman (1993) and Plunkett and Marchman (1990) have noted in their experiments with networks on the subject of incremental learning. In chapter 5 these experiments will be described in greater detail.
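The flavour of such growth equations can be illustrated in a few lines of code. The sketch below is a toy illustration of my own (not code from any of the cited articles): it iterates a single discrete logistic ‘grower’ of the form L(t+1) = L(t) * (1 + r - r * L(t) / K), with arbitrary illustrative values for the growth rate r and the carrying capacity K. The increments first accelerate and then decelerate, the qualitative pattern noted earlier for one-word-stage vocabulary growth:

```python
# Toy illustration of a single logistic 'grower' (vocabulary size, say).
# The update rule L <- L * (1 + r - r * L / K) is a discrete logistic
# growth equation; r and K are illustrative values, not Van Geert's.

def grow(L, r=0.4, K=1.0):
    return L * (1 + r - r * L / K)

L = 0.01                      # small initial vocabulary
traj = [L]
for _ in range(60):
    L = grow(L)
    traj.append(L)

increments = [b - a for a, b in zip(traj, traj[1:])]
peak = increments.index(max(increments))

print(round(traj[-1], 3))     # → 1.0: growth levels off at K
print(0 < peak < 59)          # → True: growth first speeds up, then slows
```

On this picture the deceleration needs no separate ‘growth program’: it falls out of the limited resource K.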
13 It seems, however, that Molenaar is getting research like this underway (Olthof, personal communication).
So, in summary, although it may become more difficult to analyse networks as they become more complex, there are mathematical theories, on for example non-linear systems, that may prove helpful in further developing analytical tools. Import of such insights has proven successful in the past (e.g. simulated annealing), so there is reason to assume that this will be fruitful. Considering the importance of box 3 (also in light of the discussion in chapter 4), it is certainly something that deserves attention.

3.4 Problems, problems

Apart from being of great value for the cognitive community, connectionism also brings with it some potential problems. They can arise at each of the boxes and connections, but I will concentrate here on two of the possible difficulties. In short, it boils down to the following simplification: it is possible to put network models to wrong use (‘transfer’ problems), but it is also possible to use the wrong networks (problems regarding assumptions; a combination of the two is certainly also possible, i.e. putting wrong models to wrong use, but I will not go into that). A first delineation of the area of these two problems will be followed by examples in which they have arisen. Putting networks to wrong use concerns especially the bold link of Figure 3.1, i.e. the problematic transfer of insights directly into the field of cognitive processes. The bold link has some potential pitfalls, although in chapter 4 we will see that connectionism is not the only scientific field to suffer from these predicaments. For the moment, though, I shall concentrate on connectionist specifics. The first item, then, is that bold link. A wrong use of neural network models would consist of overstating the implications of experiments with networks: drawing bold conclusions or making bold statements about the value of connectionism on the basis of experiments using simple architectures.
I have in mind two examples while making this statement, and these examples should further clarify the way the interpretation of results of experiments with artificial networks can lead to such overstatements. Now a look at the second type of potential problem, the use of wrong models. If the assumptions used in the construction of the networks are the wrong ones, or constitute an insufficient subset of potentially usable assumptions, this could eventually lead to wrong conclusions in box 4. In a similar way, if the architectures or learning rules used in experiments are not appropriate for those types of tasks (maybe they handle the task effectively, but do so in a ‘non-cognitive’ way), this could lead to wrong conclusions about the models, and thereby possibly to wrong conclusions in later stages of STC. For example, back-propagation is a much used learning technique, but the biological or psychological plausibility of this specific learning rule can be questioned. The emphasis of this problem lies in box 1, the construction of network models, although of course consequences are ‘fed forward’ to further stages in STC. An example from the field of developmental psychology will further clarify this point in section 3.6.

3.5 Wrong use of models
3.5.1 Overstatements

The problem that will be focused on in this section is the link between box 2 and box 4, in which insights gained from experiments using artificial neural networks are used directly in cognitive psychological theories. Link number 1 could in a way be seen as a short cut to box 4, as the ‘lengthy’ and difficult detour through box 3 is avoided. Box 3 provides some mathematical background for the experiments, which would otherwise mainly have an interpretative character: the network shows a certain kind of behavior, which we interpret in our own way to propose new cognitive theories. The danger in using only link number 1 therefore lies in the possibility of making bold or rash pronouncements on the basis of experiments that do not warrant these conclusions. In other words: experiments may show relevant emergent effects and insights, but however nice, they are still insights which mainly concern the artificial neural networks. The
question remains how much of these insights can safely be transferred to the real thing, box 4, through link 1 of STC. The network and the brain may do the same thing, but if they have arrived at that behavior by completely different routes, can you really compare the results? Please note that the emphasis lies on the making of bold or rash statements. I have indicated in section 3.2 that link 1 is important in providing the possibility to state alternatives to the theories already present in the cognitive field. These theories, however, must be proposed in such a way that the scope of the experiments and their limitations are taken into account. The wrong use lies in overstating the possibilities and implications of artificial models, and as the main process taking place in link 1 is one of interpretation of results, this human aspect is most likely to surface here. Two examples will further illustrate this point. The first is the classic example of how too much can be ascribed to the field of connectionism, Rosenblatt being the first, and historically speaking (in 1959) almost the last, to overstate potentials in his enthusiasm: “It is clear that the class C’ perceptron introduces a new kind of information processing automaton: For the first time, we have a machine which is capable of having original ideas. As an analogue of the biological brain, the perceptron, more precisely, the theory of statistical separability, seems to come closer to meeting the requirements of a functional explanation of the nervous system than any system previously proposed”, quoted in Rumelhart and McClelland (1986a), p.156, italics added. Perceptrons are two-layer networks that are limited in scope: they cannot, for example, be trained to correctly handle the XOR-problem, a simple standard benchmark for network models. So to jump to the conclusion that “we have machines capable of having original ideas” is to place too much weight on the potential of perceptrons, as Minsky and Papert made perfectly clear.
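The XOR limitation is easy to demonstrate. The sketch below is a toy illustration of my own (not taken from any of the texts discussed): it trains a single threshold unit with the classic perceptron learning rule. The unit converges on the linearly separable AND problem, but since no single line through the unit square separates the XOR cases, it can never get all four XOR patterns right, however long we train:

```python
# Perceptron learning rule on AND (separable) versus XOR (not separable).

def step(z):
    return 1 if z > 0 else 0

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def train(data, epochs=100, lr=1.0):
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            err = target - step(w1 * x1 + w2 * x2 + b)
            w1, w2, b = w1 + lr * err * x1, w2 + lr * err * x2, b + lr * err
    return w1, w2, b

def correct(weights, data):
    w1, w2, b = weights
    return sum(step(w1 * x1 + w2 * x2 + b) == t for (x1, x2), t in data)

print(correct(train(and_data), and_data))  # → 4: AND is learned
print(correct(train(xor_data), xor_data))  # stays below 4 forever
```

Adding a hidden layer removes the limitation, which is exactly the point of the multi-layer networks that later revived the field.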
It was partly in response to such claims that Minsky and Papert wrote the infamous ‘Perceptrons’, in which the limitations of two-layer networks were proven, thereby almost laying connectionism to rest (these history lessons are taken from Rumelhart and McClelland (1986a)). Although the lessons learned from this episode have led to greater care in stating the implications of experiments, it remains important to dampen enthusiasm when making claims about network behavior and to state explicitly how the results warrant the claims that are made.

3.5.2 Interference

The second example concerns criticism of the connectionist paradigm, based upon some simple experiments and a wrong interpretation of the goals of this paradigm. McCloskey and Cohen (1989) are quoted in many articles when the subject of catastrophic interference crops up. In their article they present two examples in which the learning of a new piece of information in a network that is already trained can seriously (‘catastrophically’) interfere with the information that was already in the network. Performance on the old set can drop by as much as 70% after just one learning trial on a new piece of data. This sequential learning problem has been widely recognised and accepted within the backprop community, and some attention has gone to developing new architectures and learning schemes to avoid the problem (ART, CALM, semi-distributed representations (French (1991)), etc.). Based upon these two examples, however, McCloskey and Cohen draw the conclusion that connectionism is a hopeless cause, as they cannot foresee any possible way of dealing with the problem of catastrophic interference (a foresight proven wrong as shown above). Looking again at the definition given in chapter 2, we see that connectionism is the specific area of research in which cognition is studied using architectures called artificial neural networks.
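The interference effect itself is easy to reproduce in caricature. The sketch below is my own toy example (McCloskey and Cohen used back-propagation networks trained on arithmetic facts, not this linear associator): shared weights are trained with the delta rule on task A, then on a conflicting task B, after which performance on A has collapsed:

```python
import numpy as np

# Linear associator with shared weights, trained sequentially.
X = np.eye(4)                          # four input patterns
targets_a = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
targets_b = 1.0 - targets_a            # task B conflicts with task A

def train(W, X, T, epochs=200, lr=0.5):
    for _ in range(epochs):
        for x, t in zip(X, T):
            W = W + lr * np.outer(t - W @ x, x)   # delta rule
    return W

def mse(W, X, T):
    return float(np.mean((T - (W @ X.T).T) ** 2))

W = train(np.zeros((2, 4)), X, targets_a)
print(round(mse(W, X, targets_a), 3))  # → 0.0: task A mastered
W = train(W, X, targets_b)             # now train on task B alone
print(round(mse(W, X, targets_a), 3))  # → 1.0: task A is wiped out
```

Schemes such as ART, CALM or semi-distributed representations reduce the overlap between the weights the two tasks use, which is what tames the effect.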
Back-propagation networks may be the architecture used in a majority of experiments, but they are certainly not the only game in town. Based upon my definition, the entire discipline called connectionism cannot be falsified by experiments using simple back-propagation networks. Something more fundamental lies at the heart of this
paradigm. So, dismissing connectionism on the grounds mentioned by the authors is a bit overenthusiastic, considering the overwhelming number of experiments that do seem to go right; furthermore, it is based upon a misconception of the goals of the field. It should by now be clear what is meant by the wrong use of models. Overstating the implications of network experiments is something which can happen through link 1, if no consideration is given to the more formal analysis of the architectures which takes place on the detour through box 3. This subject will be mentioned again in the next chapter (section 4.4), as it is relevant to the debate between connectionism and symbolism. The human factor, i.e. the interpretation of results, is something which plays a part in both paradigms, although with differences in the consequences.

3.6 Use of wrong models

When using network models, it is possible that the phenomena encountered when studying the neural networks are of the right kind, but that for a variety of reasons the underlying assumptions are inappropriate or the learning rule too strong, so that the eventual psychological or cognitive conclusions (or conclusions about connectionism as a whole), or the eventual theory, could be misplaced. This sketchy outline of the problem becomes especially relevant when looking at examples from developmental psychology. For this purpose I turn again to the article of Plunkett and Sinha (1992), mentioned in the introduction of this chapter. The neural networks they use in their examples are all feedforward networks using back-propagation as the learning rule (one of which recurrent, see chapter 5). This means that the only things changing in the network are the weights on the connections between the nodes. When we now look at some elementary insights from developmental neurology, we see that as the child is developing, the architecture of the brain is also developing.
Several processes, like migration, sprouting, cell death and myelinization, are taking place whilst the child is learning. This means that, contrary to the artificial network, which has a static architecture, the real network has a dynamic architecture which undergoes several changes as the child is growing up. The influences such architectural changes can have on development strongly suggest that they should be taken into account in a theory of development. These considerations, however, do not feature in the article of Plunkett and Sinha or in their experiments (but see for example Rumelhart and McClelland (1986b, chapter 18) or McClelland and Jenkins (1991)). In the examples mentioned by Plunkett and Sinha, the architecture of the networks is fixed, so the conclusions that are drawn are based upon just a part of the possible foundations of development and learning. Some of the phenomena which are described could be the result, not only of the ‘static’ dynamics of the networks, but also of their ‘dynamic’ dynamics, i.e. some of the phenomena could also be the result of changes of the architecture, not just of changes of the strengths of the connections within the architecture. Just changing the weights may do the trick (possibly because of the strength of the back-propagation algorithm), but that is certainly not the only thing going on in the development of the real architecture. Support for this argument comes from two sources. First, as mentioned above, looking at the way the brain develops suggests that implementing similar processes in artificial networks could lead to better network models. In fact, following STC would require working on this suggestion through the recurrent link number 2. As indicated in chapter 2, the process of model building is one of successive approximations (through link number 2).
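What a ‘dynamic architecture’ could minimally look like in a model is sketched below. This is a hypothetical structural illustration of my own, not an existing learning scheme: a hidden layer that, besides the usual weight changes, can gain units (cf. sprouting) and lose them (cf. cell death):

```python
import numpy as np

class GrowingLayer:
    """A hidden layer whose architecture itself can change."""

    def __init__(self, n_in, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 0.1, (n_hidden, n_in))

    def forward(self, x):
        return np.tanh(self.W @ x)

    def sprout(self, k=1):
        # Architectural change: add k fresh hidden units.
        new = self.rng.normal(0.0, 0.1, (k, self.W.shape[1]))
        self.W = np.vstack([self.W, new])

    def prune(self, k=1):
        # A crude analogue of cell death: drop the k weakest units.
        keep = np.argsort(np.linalg.norm(self.W, axis=1))[k:]
        self.W = self.W[np.sort(keep)]

rng = np.random.default_rng(0)
layer = GrowingLayer(n_in=4, n_hidden=2, rng=rng)
layer.sprout(3)                 # the architecture grows during learning
print(layer.W.shape[0])         # → 5
layer.prune(1)
print(layer.W.shape[0])         # → 4
```

In such a model, developmental phenomena could be traced to sprouting and pruning as well as to weight changes, which is precisely the degree of freedom the fixed-architecture experiments leave out.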
As there are no real deadlines in the global timetable of cognitive science (apart from the ones we set ourselves), we have, in theory at least, time to add assumptions as more knowledge about the brain becomes relevant for modelling. Second, experiments with network structures and learning rules that already incorporate more assumptions than the backprop feedforward networks show that performance and speed of learning do increase, indicating that following link 2 of STC is by no means void of possibilities. Deciding which assumptions to use may certainly be problematic, as it could remain uncertain what would constitute a correct or
complete set of assumptions. At what level of detail will they become sufficient? Much knowledge of brain processes and architecture has not (yet) been incorporated in artificial architectures, and the question remains how much of this knowledge should be used. This is the core of the potential problem of using ‘wrong’ models, making it a question of explicit motivation of the backgrounds and results of the experiments concerned. So what is proposed at the end of this section is that, in order to keep stepping ahead of the problem, we keep on building those successive approximations, to find out about the influences different assumptions can have and to ensure progression in the field.

3.7 Summary

When we look at real life, it becomes obvious that a major aspect of development is, beyond doubt, the development of the brain architecture. Any theory which claims to state something about development therefore has to take into account the interaction between the developing architecture and the environment. Only through the study of such interaction can relevant claims be made about development as a whole. This is why the use of connectionist models in the study of development provides such a clear example of what it means to use wrong models, and this is why I have made the distinction between the wrong use of models and the use of wrong models. When studying adult cognitive behavior (i.e., in this context, learning), changes in architecture become less relevant, as the adult brain architecture can be taken to be fixed. Cognitive theory formation on adult behavior can therefore benefit from the exploration of connectionist models with static dynamics, the experiment performed by Bechtel and Abrahamsen (1991) in the previous chapter providing a nice example. Overstating conclusions on the basis of an overenthusiastic interpretation is something to be cautious of here, but we can still learn much from these models.
It is only when development becomes the key issue that more assumptions have to be taken into account, and making this distinction clear has been one of the main purposes of this chapter. And this is where STC proves its worth: it describes clearly how assumptions are used to build models, which in turn lead to the (re-)statement of cognitive theories. It becomes obvious, from an STC point of view, that this process could go wrong at at least two points: one, at the choice of assumptions, and two, at the interpretation of results. These two stages should then lead to different kinds of problems, and in the above examples we have seen that these do indeed crop up. In this chapter I have looked more closely at the implications of making explicit the steps taken in the progress of connectionism. On the one hand valuable lessons can be learned with a connectionist approach, but there are also some potential problems which have to be taken into account. What does this mean for the position of connectionism in the field of cognitive science, especially for its proposed role as alternative to a symbolist approach? It is this question that will be approached in the next chapter.
Chapter 4 On other metaphors and the difference between symbolism and connectionism

Tucker (1992): “[...] if connectionism produces a paradigm shift in psychology and neuroscience, it may come not from specific modelling algorithms, but from the discovery of a more appropriate metaphor for understanding cognitive processes in the cortex”, p. 76.
4.1 Introduction

The theme of this chapter is best described by the following quote from Rumelhart (1989): “Our goal in short is to replace the computer metaphor with the brain metaphor”, p.134. I will, however, place more emphasis on the differences between the two, and less on the proposal to replace the one by the other. In the concluding remarks I will, nevertheless, propose that from a methodological point of view connectionism does offer some advantages over the symbolist paradigm. The two metaphors mentioned by Rumelhart each lead to different ways of approaching the subject of cognition, so in order to appreciate the differences between connectionism and symbolism14, a small side-step into the theory of metaphors in scientific language is necessary15. It is this theory of scientific metaphors that will help to develop a description of the progress of symbolism in terms of the model that was described in chapter 1. Such a description will enable us to compare the two research programs within the same context, thereby making explicit some of their differences in the light of this approach. So in this chapter a short introduction to the model of scientific progress based upon metaphors will be given. I will use elements from STC to describe the progress brought about by these metaphors. In a similar manner the traditional16, symbolist paradigm will be described in more detail according to a Stagewise Treatment of Symbolism (STS). This is followed by an account of the more problematic aspects of this paradigm, in terms similar to those used for the description of connectionism in the previous chapter. Once this model is outlined, we will be able to remark on the differences between these two approaches to cognitive science. I will do this using yet another model of progress in science: the empirical cycle, which is taken to be a standard and general way of proceeding with scientific work.
Using this methodological tool as a yardstick, it is possible not only to make the comparison more objective, but also to assess the value of STC itself.
14 Connectionism and symbolism are no longer the only ‘presidential candidates’ we’ve got. A newcomer with as yet outside chances, which may well grow, is a field called Artificial Life (A-Life) (Brooks (1991)) or ‘computational neuroethology’ (Cliff (1990)). Instead of trying to model aspects of the cognitive apparatus of complex beings such as humans, its proponents claim we should start by investigating the complete networks of simpler beings, i.e. insects, and work our way up the phylogenetic ladder using these insights. See also this chapter and see Brooks (1991) for a very persuasive description of the assumptions underlying this up-and-coming new game in town. Brooks can be considered the ‘founder’ of this specific field. The main cognitive discussion at the moment is taking place between symbolism and connectionism, so I will concentrate on this, keeping in mind that this ‘outsider’ is quickly becoming another serious contender in the arena.
15 Overviews of the role of metaphor and analogy in several fields of psychology are given in, for example, Leary (1990) or Van Besien (1989).
16 Oliphant (1994): “[...] “traditional” is one of the classifiers which psychologists these days apply to any activity which has been going on for at least four months; another such classifier is “classical””, p.31.
4.2 Metaphors

Metaphors have come and gone in the psychological field. The watch, the steam engine, the telephone switchboard and most recently the (Von Neumann) computer are all machines that have been used to describe certain aspects of human behavior. Metaphors have been important in trying to understand the workings of the human mind. Sentences like “I tried to unwind by letting off some steam” are an indication of the fact that they have great implications for the way the subject of ‘cognition’ is approached. Using the elements of STC, a description of the progression in theory formation guided by a metaphor can be constructed (Figure 4.1).
Compare the workings of the machine in study to similar phenomena in the human situation
Apply descriptions to the 'real thing' (Interpretation of cognitive processes)
Figure 4.1. Description of progression in studies using metaphors not related to the brain.
The two boxes presented in this adapted model are comparable to boxes 2 and 4 of the original four-stage STC model, connected by link number 1. In those boxes the neural network models are tested to see if their behavior is like that encountered in humans, and new insights are applied to the description of cognitive processes. In a similar way, metaphors show some kind of behavior that is like that encountered in humans, or are taken to show such similar behavior, and these phenomena lead to new descriptions of cognitive processes. The connection between the boxes of Figure 4.1 is therefore, by analogy (and quite literally too, as it is the way the analogies between the metaphor and the object under study are drawn), the bold link from Figure 3.1. If that is so, then all of the inherent dangers (but of course also the positive aspects) described in chapter 3 are also applicable to these metaphors. Unlike connectionism, though, the ‘programs’ or theories based on these metaphors have no box number 3, in which analysis of the techniques takes place (with the human situation kept in mind) to compensate for these possible pitfalls. For example, there is no way to develop theories on steam engines which could be directly transferred to the brain. Furthermore, no recurrent link similar to the one found in the model of the connectionist program can be constructed for these metaphors (although for symbolism a case can be made for a recurrent link on a conceptual level; more will be said about that later). No insights into human cognition will be ‘ploughed’ back into more advanced models of the watch, the steam engine or the computer17 (at the hardware level). When described in this way, it seems clear why non-recursive metaphors are bound to be replaced by others: there comes a point where the limit of a metaphor’s value has been reached (when it has been milked for all it is worth), because it cannot be updated by new insights.
17 Parallel computers do not provide a counterexample to this statement, as they are motivated mainly by computational concerns and not biological ones.
The main differences between STC and the stagewise description of these metaphors therefore lie in the absence of a box 3 and the absence of a recurrent link between the first and the last box. As symbolism is based upon one of these metaphors, these distinctions alone are enough reason to suspect that connectionism is a different paradigm from symbolism, bringing the discussion to another level.

4.3 Setting some symbolist stages
4.3.1 The conceptual level

Symbolism is the research paradigm built on the assumption that cognition can be described as the result of syntactic manipulation of symbols and symbol structures. The inspiration for this idea was taken from the computer metaphor, which separates the software from the hardware. In a similar way the cognitive computer program can, according to the symbolist, be separated from its implementation medium, the brain, giving researchers the opportunity to develop models of cognitive functioning without having to worry about the neurological and biological details. It is the software that counts and not the medium in which the software is implemented; the cognitive program could run on any machine, be it Unix or brain, just as any Prolog program can be run on different computers. Gardner (1984) argues that the idea of symbols (or, compatibly, propositions) existed long before the advent of the Von Neumann computer, but that its introduction presented the opportunity to support and make explicit these ideas in computational models. He also presents a comprehensive overview of the scientific disciplines influenced by the symbolic way of thinking and the use of computers. Bechtel and Abrahamsen (1991) present such an overview for the connectionist influences in chapter 8 of their book. So basically, what it comes down to is the following: symbolism states that when we have a formally correct computer program, the right input and output, the correct internal representations (the symbols) and the relevant algorithms, we will not only have simulated human cognition but, more importantly, will have a system that is capable of thought just like us. Newell and Simon (1981) have developed the idea of the physical symbol system: “A physical symbol system consists of a set of entities, called symbols, which are the physical patterns that can occur as components of another type of entity called an expression (or symbol structure) [...].
The Physical Symbol System Hypothesis. A physical symbol system has the necessary and sufficient means for general intelligent action [...]. The two most significant classes of symbol systems with which we are acquainted are human beings and computers”, pp. 40, 41 and 64.

Because of this distinction between software and hardware, it is not possible to transfer the model of Figure 4.1 directly to the symbolist case. This research paradigm works on a different level and, as indicated above, in this case there is also a recurrent aspect to the progress of the field. I will call this level the ‘conceptual’ level, since the objects under study are conceptual objects like symbols, symbol structures and the syntactic rules operating on these entities. Connectionism could in a similar way be called more material, as the ‘implementation’ medium is also a studied entity.

4.3.2 Stagewise Treatment of Symbolism (STS)

To understand the implications for the debate between connectionism and symbolism, a more thorough description of the symbolist progression in terms of STC is needed. Figure 4.2 presents two levels of description, in which the ‘lower’ one refers to the
computer metaphor and the ‘top’ one to the actual symbolist process, which, as indicated in the introduction, should appropriately be called STS. Everything mentioned in the first part of this chapter is directly applicable to the lower (hardware) level, but the symbolist loop is different and can be described as follows.
[Figure 4.2 box labels: “Create conceptual models of cognitive behavior”; “Test the models”; “Apply descriptions to the ‘real thing’ (interpret cognitive processes using models)”; “Compare the workings of the machine under study to similar phenomena in the human situation”.]
Figure 4.2. The two levels involved in the symbolist program. The 'material' (underlying) level describes the hardware metaphor of the Von Neumann computer. The overlying level describes the progress of the symbolist research program.
Insights in cognitive psychological phenomena are used to create complex conceptual models of cognitive behavior, which, after testing of the underlying hypotheses (i.e. by implementing them as computer programs), can in turn be used to gain more insights into the ‘real’ phenomena, which can be used to create even more complex conceptual models, etc. These models are tested on a conceptual level, either by computer program or by empiric experiments, and will therefore never be related to the medium of implementation, the brain. But for the symbolist this is not a problem, since the metaphor of the computer has given him the distinction between hardware and software. It can only be a problem to those who feel that the brain has to enter the process at some stage of research.

4.4 Debate
4.4.1 Projections and brains

One problem with this conceptual loop lies in the way it progresses. New cognitive models depend in large part on the creativity of researchers in coming up with more detailed descriptions, which bears only an indirect relation to the way the brain works. It may be a trifle bold to state that symbolist theory does not correspond to external reality, but symbolism is easily associated with anti-realism in the philosophy of mind, as symbolist theories often project their model of the mind onto the human brain. Proposed theories are always interpretations of the observed behavior of the individual, but these observations are also guided by one’s theory about the world. The cognitive researcher observes and hypothesizes from a certain cultural, scientific and philosophical background, and the testing of the hypotheses is also done against this background (the appropriate term being ‘theory-laden observation’). It is the link with ‘external reality’ (empirical but not material, in that the brain is not taken into account) that remains
problematic if the explanations remain on this autonomous conceptual level (the subject of symbolist autonomy will be discussed in the next section). Note that this problem is in a way similar to the one mentioned in section 3.5 in connection with link number 1. Connectionism too could be seen to stand in danger of this type of problem. The difference lies in the subject to be interpreted. The computer metaphor leads to cognitive models in which the software/hardware distinction is the fundamental hypothesis. In the symbolist case, interpretations focus on the behavior of the individual, on conceptual models and on the outcomes of implementing aspects of theories in computer programs. The brain metaphor leads to the construction of artificial networks, and interpretations focus on network behavior and the link it provides with the brain. Although the interpretative aspect might crop up, it is still the interpretation of something considered to be related to the brain (and hence ‘external reality’), instead of the interpretation of conceptual models without such a link. Bechtel and Abrahamsen (1991): “The [...] idea that the models work well because they are neurally inspired concepts (i.e. that there is a causal relation) is speculative, but points the way to a potentially attractive bonus that is not shared by most other metaphors, including the currently dominant Von Neumann computer metaphor”, p. 57, italics in original. This “attractive bonus” becomes most apparent when comparing symbolist and connectionist theories on development. How the process of rule-formation underlying development takes place in a symbolist account is not very clear. What is clear is that it must be a feature of the software. In connectionist accounts it is ‘simply’ a feature of the way networks learn and generalize, and this does provide insights into those rule-formation processes.
This example shows that there are essential differences between the ‘interpretative’ aspects of symbolism and connectionism. Cliff (1990) also comments on the subject of interpretation of network behavior and argues for what he calls computational neuroethology. Instead of providing the network with input-output pairs (which still means manipulation of the results through underlying assumptions of the ‘programmer’), we should put the network in an environment18. The behavior of the model as it learns to negotiate the environment can then be observed and does not have to be interpreted. This may be helpful, but even so, we have just seen that observation can be just as tricky. So, although ‘interpretation’ might be problematic for some work that goes on through link 1, I believe that through more biologically inspired modelling the problem could largely be avoided. Biological constraints, implemented in a network, limit the range of possible link 1 interpretations of the network, and in combination with ‘box 3’ analysis this interpretative aspect should not be expected to play a crippling part in connectionist accounts. One way of looking at symbolism that is directly applicable to this subject is described by Winograd (1980), in a discussion on schemas: “I still feel that the kind of phenomena that were pointed out and categorized were interesting and important, but dressing up the observations in the language of schemas did little or nothing to sharpen or develop them. A direct implementation of the purported schemas as representation structures in a computer program would have been uninteresting. It would have the flavor of much of the work we see, in which a program “plays back” the schemas or scripts put into it, but in doing so does not provide any insight into what happens in those cases that do not closely match exactly one of the hypothesized schemas. The schemas correspond to classes of external behavior, which may not correlate in any straightforward way to the components of the internal mechanism (either physical or functional)”, p. 228, italics in original.

18 But most ‘environments’ in current experiments are limited in set-up and do not as yet provide a real alternative for the ‘traditional’ input-output pairs. Only when these networks are let loose in the real world can this argument take hold (Olthof, personal communication).

The same article mentions Maturana, who has an explanation for the error that is made. When cognition is described, the following process is followed. First the scientist discovers a reappearing pattern of interactions of an organism. The scientist then comes up with a formal representation which characterizes the regularities. The next step is the tricky one, when it is assumed that the organism must really have these representations to be able to show the regularities, i.e. the formal representations are projected onto the brain of the organism. Lastly, experiments are designed to show the presence of these representations, or computer programs are implemented to see if the behavior can be generated. But the problem with this process lies in the way the explanations stay on a conceptual level, since constructing new models on the basis of other models provides no direct link with the medium which is doing all the work, the brain. The assumption that such a link is not necessary is the main difference between the symbolist and the connectionist programs. Rumelhart and McClelland (1986a, chapter 4) indicate why this difference is important: “[...] we have discovered that if we take some of the most obvious characteristics of brain-style processing seriously we are led to postulate models which differ in a number of important ways from those postulated without regard for the hardware on which those algorithms are implemented”, p. 130.
Connectionism does provide a way of rooting the models in something more substantial, objective (“actually existing independently of the perceiver’s mind”, The Penguin English Dictionary, Third Edition, 1982) and real, by means of the link between box 3 and box 4, the one that is missing in the model of the symbolist program. Here theories and (computational) models are directly tested for their relevance to the workings of the mind19.

19 De Groot (1975) gives another relevant definition of the term objective: “being focused on letting the object of study speak for itself”. If we see the brain as the object of cognitive study, then connectionism is objective in that it takes this medium into account.

4.4.2 Progress and lapse

The application of STC makes the realism-projectivism distinction clear and might even indicate or predict the lapse of the ‘classical’ symbolist program. Dreyfus and Dreyfus (1990) comment upon this (potential) lapse: “In the light of this impasse (the common-sense knowledge problem, or frame problem, see for example Dennett (1984), R.M.), classical, symbol-based AI appears more and more to be a perfect example of what Imre Lakatos [...] has called a degenerating research-programme”, p. 326, italics added. In terms of Lakatos, a degenerating research program is one that, after a promising start, has fallen behind in developing its explanatory powers compared to a competing program (Chalmers 1984). There are limits to the complexity of the symbolist theories that can be constructed and limits to the human ability to understand and construct ever more complex models. These limits could be reached before the cognitive powers of the mind have been described at an appropriate level of complexity. But there might be a catch, as Dreyfus and Dreyfus (1990) argue:
“Still, there is an important difference to bear in mind as each research-programme struggles on. The physical-symbol system approach seems to be failing because it is simply false to assume that there must be a theory of every domain. Neural network modelling, however, is not committed to this or any other philosophical assumption. Nevertheless, building an interactive net sufficiently similar to the one our brain has evolved may be just too hard”, p. 330, italics added.

There is a potential answer to this problem. Cliff (1990): “[...] if the details of the brain are so horrendously complicated, then perhaps the answer is not to study simple models of large brains, but to study large models of simple brains”, p. 21, italics in original. Instead of starting at the top of the phylogenetic ladder with, for example, language research, we should start at the bottom, i.e. modelling insect behavior, and work our way up from these insights. Inspiration for this approach comes from the up-and-coming field of ‘Artificial Life’ (e.g. Brooks (1991)). Although it is rather extreme to suddenly abandon all work on higher cognitive phenomena and to turn ‘en masse’ to the study of insect models, as Cliff would have it, something could definitely be said for such an approach.

Another potential answer to this problem, quite the opposite of the one mentioned above, comes from Karmiloff-Smith (1992). She argues that it is not the whole brain that is modelled by artificial networks; rather, they should be seen to implement functional parts or modules of the brain, where one such module covers a cognitive microdomain or one aspect of that domain, instead of being a ‘general problem solver’.

4.4.3 Creativity and constraints

So there are potential limits to the complexity that can be reached in the symbolist paradigm. But there is no limit to the creativity of humans and therefore no limit to the number of different theories that can be constructed for the same phenomenon.
This is an interesting but potentially frustrating aspect encountered when delving into the cognitivist or symbolist literature: most cognitivist or symbolically inspired theories have a plausible ring to them, but you can always find another theory (or metaphor) for the same cognitive phenomenon which sounds just as good or even better. Perfect examples of these ‘battles’ are the debates on mental imagery (e.g. Kosslyn (1990)), on emotions (Mandler (1984) and Oatley (1992)) or, more generally, on the nature of consciousness (e.g. Jackendoff (1989)). And experiments can always be found for or against these theories. Olthof (1994) mentions this creativity in a discussion on the use of connectionist techniques in the area of developmental psychology. When trying to ‘prove’ that a child is really an ‘adult in pocket-size’20, experiments have to be designed to show that adult skills are already present at a very early age. The younger the child, the more creative the researcher has to be in selecting the experimental set-up, and the terms used in this context are ‘cunning’ or ‘crafty’.

20 This is one of the approaches within the field of developmental psychology which states that all cognitive capabilities, in for example the area of problem solving, are already present in the child and only have to be ‘extended’ or ‘awakened’.

Another part of the problem facing symbolism lies in the fact that non-recursive metaphors ‘dry up’ eventually. The computer metaphor underlying symbolism has by now been used up to some extent (short-term, long-term and working memory and buffers being examples of analogies made) and what remains for the symbolist approach is the subjective conceptual loop which overlies the metaphor. Connectionism on the other hand depends on something more specific as a basis for its recursiveness. The more details are known about the brain, the more details can be incorporated into new models and architectures, which in turn could lead to greater insight, etc. (one example will be given in chapter 5, when discussing the work of Elman (1993)). Connectionism is constrained in this sense because it cannot switch to another model (which can be done otherwise, symbols being ‘multifunctional’, i.e. from semantic nets to scripts and frames to tacit knowledge). But the brain is what the whole area of research is about, so it is the appropriate metaphor and the constraints are therefore something positive. Creative processes will therefore mainly concern the translation of brain architecture and processes into new models. Furthermore, because new information (like new theories on development) or more advanced technology (like massively parallel computers) can enter from other areas of research, the system being an open system, connectionism should be less prone to stagnation in progress.

Bechtel and Abrahamsen (1991) argue that the symbolist program has given itself the special status of an autonomous special science which cannot be reduced, due to the independent software level. But: “The strong autonomy position seems to entail that there can be no fruitful direct interactions between the special sciences and more basic sciences. [...] The history of science, however, suggests that studies at different levels do contribute to each other”, p. 283. The openness of the connectionist system does present the possibility of such contribution, and progress even depends in part on the interaction. This surely is an advantage if cognitive science is taken to be a multi-disciplinary effort (e.g. Gardner (1984)).

4.5 In terms of STC

The discussion as described above provides a background for a more direct comparison of STC and STS. The debate should become clearer when expressed more directly in terms of the distinctions between the two models. In Figure 4.3 they are placed side by side. A comparison shows that the STS model is similar to the part of STC which uses the link between box two and box four, the bold link from chapter 3.
If this is so, then all the advantages and disadvantages mentioned in chapter 3 regarding link number one should also be applicable to what is happening in STS. This is the gist of the discussion given above, in terms of symbolist ‘troubles’. We have seen in chapter 3 that the use of link number 1 could be problematic, and now we clearly see that this link is the only link available to the symbolist side21. The connectionist camp still has another ‘escape route’ from these potential problems, and here lies the main difference between the two. This difference is enhanced by the nature of the escape route: the link it offers with the implementation medium, the brain, provides additional advantages, as indicated above. The step from box 3 to box 4 in STC provides the way of staying in touch with the brain, something which is not considered too important in the symbolist approach. The question “Is it possible to use these insights in describing what is going on in the real thing?” is the main part of this mathematical and analytical process. Whether or not this question can be answered positively, the recurrent link makes sure that the effort is not wasted: new models can be constructed on the basis of this new information and the cycle run through again. In addition, it must be noted that links 3 and 4 have not been included in this figure (nor in Figure 4.4, which follows in the next section). They present the possibility of testing new theories with existing network models, and the possibility of using the analyses of networks to arrive at new theories on the inner workings of these models. Although not directly relevant to the discussion presented here, they do of course add to the versatility of the paradigm.
It should by now be clear how connectionism differs from symbolism in terms of the recursiveness of the programs and the way the brain enters the picture (or doesn’t enter the picture). Now that the brain has become its own metaphor, a different recursive process has been entered.

21 The talk of ‘sides’, ‘camps’ and ‘competition’ may not prove very constructive. I believe that instead of this polarization, co-operation and integration will be much more valuable. See for example Smolensky et al. (1992) or Tijsseling (1994) for comments upon such a process.

[Figure 4.3 shows the two loops side by side, with box labels: assumptions -> models; design models; check models; test models; analyse models; transfer techniques; application insights; apply insights.]
Figure 4.3. Direct comparison of the main parts of STC and STS.
4.6 Objectivity revisited: the cycle of empiric progress

Describing the competition in terms of a model specifically designed for one’s own use is not the most ‘objective’ way of proceeding with a debate (although it is a technique used very often: twist the words of the opponent in such a way that they become easy to reject, and never mind the original meaning). It is therefore necessary to introduce an objective component into the discussion to increase the credibility of this exercise. The field of methodology in science is well established and the theory on the empiric cycle of scientific progress is well accepted, so an application of these theories to the models described in this thesis could give further insights into the debate.

De Groot (1975) describes the five stages of the cycle of empiric progress. Stage 1 is concerned with the observation of the material. In stage 2, an inductive process takes place in which hypotheses are put forward on the basis of the observations. In the next, deductive stage, stage 3, consequences are deduced from these hypotheses, i.e. the predictions which follow from the assumptions are made explicit, to be tested in stage 4. In stage 5, the results of these experiments lead to an evaluation of the hypotheses and therefore also of the theory built up from them. Stage 1 can then be entered again with these new findings, starting the process anew. This is more or less the standard, accepted (basic) way in which scientific research takes place, so it is important to relate these aspects to the models which have been developed up to now.

Symbolism is a research program which follows these five steps very closely: behavior is observed, hypotheses are induced (one of the foundations being the assumption that hard- and software should be separated), models are deduced and programmed into the computer to be evaluated, leading to new hypotheses and some modification of the theory. It is more difficult to apply this model directly to connectionism.
Behavior, but also the brain, is observed, hypotheses are induced (one of the foundations being that the neural substrate has to play a large role in the formation of theory), and network architectures and models are deduced, after which these models are ‘programmed’ and evaluated. So far this sounds remarkably like going through the four-stage model when the path taken goes through link number 1. In the connectionist case, however, there is another stage, box number 3, which enters the picture, and it is not clear how to describe this step in terms of the model of De Groot. It looks as if between the testing and the evaluation another stage should be described, namely one of analyzing and transferring the (mathematical) insights gained from these analyses. De Groot’s stage 5 would then be concerned with the
evaluation of these analytical instruments, and not with the direct results of the experiments. It seems, then, that both research programs can be described using the general description of how ‘good science’ should progress. However, connectionism goes beyond this description by adding another stage to the process. Taking Figure 4.3 and filling in the terms mentioned above in the appropriate places, we get the following picture (Figure 4.4). This is something we had already noticed in the previous section, which again comes to light in a different setting. Connectionism provides another route to insight into brain processes than just a conceptual, ‘interpretative’ bold link, and it is by using a model like STC that this difference becomes explicit.

[Figure 4.4 repeats the two loops of Figure 4.3, annotated with De Groot’s terms: observe, induce, deduce, test and evaluate, with the extra connectionist steps (analyse models, transfer techniques) between testing and evaluation.]
Figure 4.4. Direct comparison of the two models using terminology of De Groot (1975).
The emphasis in the last sections has been on box number 3, so at this point it is important to stress that the differences between the two paradigms go beyond the mere addition of such a box. We have already seen that connectionism offers more, for example by presenting the opportunity to include biological and environmental considerations and, through that, the possibility of entertaining alternative theories to the ones proposed from the hardware/software metaphor.

4.7 Concluding remarks

The discussion presented in this chapter hinged on two pairs of opposing terms: ‘computer metaphor’ versus ‘brain metaphor’, and ‘realism’ versus ‘projectivism’. By coupling these terms, in a setting provided by elements of STC, I hope to have made clear the distinction between (the progress of) a symbolist and a connectionist account of cognition. What follows is a summary of the most important aspects of the discussion.

I have labelled the computer metaphor, and thereby the symbolist paradigm, as projective, because it leads to autonomous, conceptual models that have no link to the brain. These models are projected onto the brain, since the computer metaphor does not provide for the possibility of finding a real link. The conceptual models are constrained only by the idea that a distinction between hardware and software is to be made, and that rules and symbols are the building blocks of an account of cognition. These models are therefore constrained mainly by the inventiveness of the researcher, within this framework. This leaves much room for the statement of a whole range of theories, but according to STS there is no way to ever relate these to the brain, as no such relation is sought. Verification can therefore only take place through empirical (naturalist) studies, which need not be conclusive, as they can be taken to be theory-laden.

The brain metaphor I have called realist. It leads to models that have some link to the actual implementation medium, the brain. These models therefore contain natural constraints on the theories that follow from the study and analysis of their behavior. According to STC, there exists the possibility of a link to actual brain processes through box 3, as well as of empiric verification (link 1) through naturalist studies inspired by the results of network experiments. Furthermore, due to the openness of the connectionist account, it can be embedded into already existing scientific theories, or it can at least incorporate aspects of these to gain further insights. Examples of such a process will be given in chapters 5 and 6, when examining potential theoretical frameworks for connectionist modelling. We have seen that, due to the special autonomous status of symbolism, such “fruitful direct interaction” with other scientific theories is precluded.

On methodological grounds, then, we see that connectionism has more to offer: by the background it offers potential connectionist theories (i.e. by having the brain as metaphor), by having two routes to cognitive insights (either through link 1 or box 3) and by offering the potential for interaction with established scientific theory. As for the role of STC, we have seen in this chapter that it holds its own in the comparison with a more accepted way of looking at empiric progress, as described by De Groot (1975).
The elements of STC can be used to describe more than just connectionism and can therefore make explicit some of the differences between connectionism and symbolism. So far I have described some of the general consequences of describing connectionism using STC, in relation to the paradigm itself (chapter 3) and in relation to its main ‘competitor’ (this chapter). What remains to be done is to give an example of the application of STC in one specific area of research, to further determine its descriptive and potential prescriptive powers. As indicated before, developmental psychology seems a good domain for just that. The following two chapters will concentrate on cognitive and biological modelling. As we have seen in chapter 2, cognitive modelling mainly concerns link 1 type theoretical conclusions. Biological modelling focuses on the actual structure doing the work and on what might be learned from taking into account different assumptions about (the development of) that structure.
Chapter 5
Cognitive developmental modelling
5.1 Introduction to the problem

As children learn their language, some general phenomena take place. One of these is a stage of overgeneralization of the rules for forming the past tense of verbs. Up to a certain point in their development children produce correct past tenses for regular and irregular verbs. Then errors start to crop up: instead of producing the past tense of come, came, the child will overgeneralize and produce something like camed (or eat / ated, pick / pack). As the child learns more about its language, partly by being corrected by adults in its environment, correct performance is after a while attained again.

According to the traditional view, the acquisition of English verb morphology supposes a dual mechanism of rote learning and systematic, rule-wise treatment of verbs. The first mechanism simply stores the learned verbs so that the past tense forms are produced correctly. In a second stage, the rule mechanism ‘discovers’ the underlying rules of past tense formation. It is at this stage that overregularization and irregularization take place, as the rules discovered are not applied correctly all the time. In a third stage, rote learning is again used to learn the exceptions to these rules, so that eventually all past tenses are produced correctly. In the transition from the first to the second stage a process of system building has to be triggered, but the traditional account cannot provide a clear picture of how this is done. Connectionism on the other hand can present an alternative to this dual mechanism architecture by means of a single mechanism system, i.e. an artificial neural network architecture. Memorization and generalization are important features of neural nets, which together can account for the processes described above.
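The dual-mechanism account can be caricatured in a few lines of code. The following is a hypothetical illustration, not a model proposed by any of the authors discussed: a rote table of stored irregular forms plus the regular ‘+ed’ rule, with overregularization emerging whenever a verb is missing from the table.

```python
# Hypothetical sketch of the traditional dual-mechanism account:
# rote memory for stored forms plus a systematic "+ed" rule.
rote_memory = {"come": "came", "eat": "ate", "see": "saw"}

def past_tense(verb, memory):
    """Produce a past tense: rote lookup first, regular rule as fallback."""
    if verb in memory:       # mechanism 1: rote retrieval
        return memory[verb]
    return verb + "ed"       # mechanism 2: the regular rule

print(past_tense("walk", rote_memory))  # correct regular form: 'walked'
print(past_tense("come", rote_memory))  # stored irregular: 'came'
print(past_tense("go", rote_memory))    # not yet stored: overregularized 'goed'
```

The single-mechanism connectionist claim is precisely that no such explicit table/rule split is needed: a network's own memorization and generalization are supposed to produce the same U-shaped developmental pattern.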
These inherent features take away the necessity of assuming a dual-mechanism system, as the transition is ‘simply’ something these architectures do when presented with a sufficiently large (and representative) set of input patterns. The problem then becomes one of showing that neural networks do actually present such an account in terms of a single-mechanism medium. I have already noted in chapter 2 that such feasibility studies of neural network accounts are very common in the literature. In this chapter I will present a number of examples of such studies: two concerned with the subject of past tense formation as described above and two other examples concerning the prediction of the syntactic category of words in complex, embedded sentences. As we will see, these studies and experiments are related in several ways and give a nice example of progress in connectionist accounts, thereby presenting an opportunity to put STC to its descriptive use and to corroborate the model further. The central issue will be the way the networks are made to learn their task, and one thing that follows from these examples is that more interesting insights can be obtained by taking into account concerns that focus on the development of architecture.

But although these accounts do propose alternative ways of looking at development, something is still missing. The examples present isolated cases of how network studies propose a different account of development, but they provide, as yet, no alternative, coherent theory of development. As the effort up to now has been put mainly into the study of the feasibility of such a potential connectionist theory, this should not come as a surprise. I will, however, argue in section 5.5 that what is becoming essential for connectionist accounts in these areas is a theoretical framework from which theories can be further developed and tested. Such a framework should provide a cognitive theory of development that is compatible with connectionist concerns, from which experiments can be inferred that can enhance both the connectionist and the cognitive theory formation. A good example of such a framework is given by Karmiloff-Smith (1992) and I will describe her work in that section. Before doing so, however, it is important to see what connectionist accounts have to offer such a ‘partnership’. The following examples indicate how cognitive theories follow from network experiments, from an STC point of view.
5.2 Incremental learning introduced
The first example will serve to introduce the value of incremental learning, as well as the network and experimental set-up that are also used in the last example. Elman (1989) sets out to answer one of the criticisms put forward by Fodor and Pylyshyn (1988), who claim that it is not possible for neural networks to encode representations that have internal structure and that provide a basis for productive and systematic behavior. If true, this is a potentially fatal connectionist flaw (but see Clark (1991) for another reply). Elman (1989) used a recurrent network to see if it is possible to encode complex and embedded information. A recurrent network is essentially a feedforward, backpropagation network in which there are links going from either the hidden layer or the output layer back into the network. The activations fed back through these connections are added to the next input, enabling the network to learn a time series of input-output pairs. This timing is assumed to be essential in the understanding of sentences and, furthermore, the architecture is inspired by the brain, in which recurrent (or reentrant, Edelman (1987)) connections are also present. These assumptions make up the contents of STC box 1; their translation into a network architecture is shown in Figure 5.1.
[Figure 5.1 diagram: INPUT (26 units) feeds a 10-unit layer, which feeds HIDDEN (70 units); HIDDEN feeds a 10-unit layer, which feeds OUTPUT (26 units); a CONTEXT layer (70 units) receives a copy of HIDDEN and feeds back into it.]
Figure 5.1. Recurrent network used in Elman's simulations. Rectangles indicate blocks of units; the number of units in each block is indicated beside it. Forward connections are trainable. The connections going from the hidden layer to the context layer are fixed at 1.0 and link units on a one-to-one basis, effectively creating a copy of the hidden layer in the context layer. After Elman (1993).
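To make the copy-back mechanism of Figure 5.1 concrete, here is a minimal sketch of one forward step of such a simple recurrent network. It is illustrative only, not Elman's implementation: the two 10-unit compression layers are omitted, the weights are random rather than trained, and all names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 26, 70, 26        # layer sizes from Figure 5.1

W_in  = rng.normal(0, 0.1, (n_hid, n_in))    # input  -> hidden (trainable)
W_ctx = rng.normal(0, 0.1, (n_hid, n_hid))   # context -> hidden (trainable)
W_out = rng.normal(0, 0.1, (n_out, n_hid))   # hidden -> output (trainable)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step(x, context):
    """One forward pass; the context is a verbatim copy of the previous
    hidden state (the fixed 1.0 one-to-one links of Figure 5.1)."""
    hidden = sigmoid(W_in @ x + W_ctx @ context)
    output = sigmoid(W_out @ hidden)
    return output, hidden              # the hidden state becomes the next context

context = np.zeros(n_hid)              # empty memory at sentence start
for word in np.eye(n_in)[:3]:          # three one-hot 'words' presented in sequence
    y, context = step(word, context)
```

The point of the copy-back loop is that the hidden layer at time t sees both the current word and (a transformed trace of) everything that came before it, which is what lets the network learn a time series.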
The question he asked in one of the two experiments described in the article was whether it is possible for these architectures to predict the syntactic category of the next word in a sentence, where the words are presented one at a time. One example is the sentence: ‘dogs see boys who cats who Mary feeds chase’. The only information available to the network (through the recurrent connections) is the preceding words, making it also a question of whether context-sensitive models, such as networks, can come to abstract generalizations on the basis of this type of input. So, in terms of box 2, the recurrent network was tested to see if it could perform the prediction task. What is most important for the purpose of this chapter, however, is the
way in which the training of the network is performed. In pilot work done prior to these experiments, it was found that the network could not learn the task completely if all sentences were presented to it as a full set. The method that proved to be successful was that of incremental learning: start by presenting simple sentences first (like ‘cats chase dogs’) and, in several stages, build up to the full set of simple and complex sentences combined. Box 3 analysis was essential in providing an account of the performance of the network, giving an example of the application of link 4. Link 4 leads back from box 3 to box 2 (see figure 2.3), and represents the way mathematical analysis of neural networks can lead to further insights into the way they perform their tasks. Elman uses this type of analysis (e.g. principal component analysis) to describe the behavior of the network. As more will be said about this type of analysis in the other example provided by Elman (1993), I will not go into it further here. As the network performed as hypothesized, the link 1 conclusion was the following: connectionist models are capable of forming complex representations which possess internal structure and which support abstract and general grammatical behavior. Extrapolating to the cognitive apparatus leads to the statement that, by supporting such productive and systematic behavior, connectionist models can provide an alternative account to the explicit rules proposed by symbolists.
5.3 Incremental learning explored
In Elman (1989) the subject of incremental learning was mentioned only in passing as being essential in getting the network to learn the task. In Plunkett and Marchman (1990) incremental learning is the central focus of attention. The area of research is the learning of past tenses of English verbs, as described in the introduction. Rumelhart and McClelland (1986b, chapter 18) were the first to delve into this subject.
They showed that neural networks could learn to produce correct past tenses for verbs and, more importantly, that the networks produced errors in generalization similar to those produced by children. Even though this work was -again- presented mainly as a feasibility study of connectionist possibilities, it was heavily criticized by Pinker and Prince (1988), as it posed a serious threat to the reigning linguistic theories (Bechtel and Abrahamsen (1991)). Plunkett and Marchman pick up where Rumelhart and McClelland left off, to study the effects that gradual incremental learning can have on the behavior of the network during learning. The central concern was a further evaluation of the alternative account connectionism can provide of the acquisition of English verb morphology by means of a single mechanism (the network). The work done by Rumelhart and McClelland gives a first indication of its potential, but: “A sufficient evaluation of the single mechanism approach is hindered by the unrealistically abrupt vocabulary discontinuity that was introduced into Rumelhart and McClelland’s training set. While they clearly demonstrate that a single mechanism can perform both rote learning and generalization, as well as make a transition between the two, further evaluation of learning in a single mechanism architecture under realistic learning conditions has so far been obscured”, Plunkett and Marchman (1990), p.5. The “unrealistically abrupt vocabulary discontinuity” mentioned by Plunkett and Marchman refers to the way the network is first trained on 10 verb pairs and, after correct performance has been reached on those, training proceeds immediately on the full set of 410 verb pairs.
Plunkett and Marchman propose to gradually increase their training set from an initial 20 to 500 verb pairs, by adding one verb every time a set number of learning trials (epochs) on the input/output set has been worked through; this is therefore called an epoch expansion schedule. The network used was a multilayer perceptron with backpropagation as learning rule, making the experiment essentially a link 3 process, in which alternative theories are tested using existing methods22. The alternative hypothesis to be tested in box 2 is, in this case, that such incremental learning should lead to more realistic and appropriate learning behavior. Analysis of the network behavior indicates that performance on the first 20 verbs remains correct, both after they have been trained and after new verbs are added. This means that these first verbs have been learned individually, indicating a process similar to the rote learning taking place in children. Overall performance on new verbs continues to deteriorate as more verbs are added to the set. This process, however, is reversed at a set size of about 50 verbs, when performance on new verbs improves rapidly. At this point, however, errors start to occur on some of the irregular verbs of the first set of 20. Again, these errors are similar to the overgeneralization errors made by children, strongly suggesting that at this point a process of system building has started to take place. As the timing of this change is constant across a number of experiments using different epoch expansion schedules23, Plunkett and Marchman conclude that the trigger for change is the quantity of verbs presented. In this, the network behavior mimics that of children when the ‘vocabulary spurt’ takes place. This is the time in development when, after initial slow learning of new words, there is a sudden increase in the rate at which the child learns new words. It is assumed that a reorganization of the vocabulary structure into coherent systems underlies this spurt. So, the network behavior shows some similarity to that observed in children as they grow up. Similar overgeneralization errors are made and in both cases there appears to be a change in representation after a certain critical mass of data has been reached.
In more detail: analysis of network performance on trained verbs (i.e. the first 20) is analogous to naturalistic observations of children learning real English verbs, while network performance on the new verbs corresponds to results obtained in an experimental setting with children learning nonsense verbs. What link 1 insights are to be gained? First, the main aim was to further evaluate the value of an alternative single mechanism over the ‘traditional’ dual mechanism. Great care was taken in the preparation of the data set to ensure that it was empirically well founded, as this was one of the criticisms aimed at Rumelhart and McClelland’s experiments. Considering the results, then, Plunkett and Marchman conclude that connectionism does indeed offer a feasible alternative account, based on a single mechanism. Second, the results suggesting the effects of a critical mass are important, as they might be linked to other accounts of cognitive development, the representational redescription theory of Karmiloff-Smith (1992) being a good example, as we will see in section 5.5. Also mentioned are theories like Piaget’s ‘moderate novelty’ and Vygotsky’s ‘zone of proximal development’, which indicate similarities with the way the greatest advancement in learning in children takes place when new problems exert only moderate demands on their current knowledge state. Third, the results suggest ways of further testing the single mechanism theory. Variance in the expansion of the vocabulary (i.e. the number of epochs of training before the next verb is added) leads to variance in the types of errors made by the network. A suggestion for further empirical research is to explore the variance in errors made by children who start to talk early and those who start later.
22 There is a potential danger to this link as well. By using backprop to test new theories, researchers take over, mostly implicitly, the assumptions underlying it; according to Grossberg (1988; 1987): “[...], the use of an unphysical process such as weight transport in a model casts an unanswerable doubt over all empirical applications of the model”, p.239. I would not go quite as far as this severe criticism. Cognitive modelling might not have to meet such strict criteria, but it does show that indiscriminate use of any training scheme should be avoided.
23 Verbs were added to the set after 1, 2, 5, 10 or 25 epochs on the previous set for the first 80 new verbs, and after 1 epoch for every new verb thereafter.
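As a sketch, the epoch expansion schedule amounts to a training loop of the following form. This is a simplified illustration with a fixed number of epochs per added verb; `train_one_epoch` and the verb set are hypothetical stand-ins, not Plunkett and Marchman's code.

```python
def epoch_expansion_training(all_verbs, train_one_epoch,
                             initial_size=20, full_size=500,
                             epochs_per_verb=5):
    """Train on a growing vocabulary: after every `epochs_per_verb`
    full passes over the current set, add one verb from the pool."""
    active = list(all_verbs[:initial_size])
    sizes = []
    for verb in all_verbs[initial_size:full_size]:
        for _ in range(epochs_per_verb):
            train_one_epoch(active)     # one pass over the current set
        active.append(verb)             # expand the vocabulary by one verb
        sizes.append(len(active))
    return sizes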
Analysis is done mainly on the results and errors present during training, to arrive at the above-mentioned conclusions, even though one major goal of the work “was to determine the nature of the mechanisms that trigger the transition from rote learning to system building in children acquiring the English past tense”, Plunkett and Marchman (1990), p.16, italics added. Box 3 type analysis is restricted, however, to mentioning other researchers who have had success with incremental learning (e.g. Elman (1989)), and to the suggestion that the improved overall learning is likely due to an increase in the network’s efficiency in uncovering the principal components that define the problem space. If the subset is sufficiently representative, the network can approximate the solution space, which can then be further delineated by the rest of the full set. So, apart from stating that networks can be observed to pass from a period of rote learning to a period of system building, and that such a process is triggered by quantitative factors, no further analysis of the “nature of the mechanisms” is given. The emphasis then lies mainly on the lessons to be learned through link 1.
5.4 Incremental learning revised
5.4.1 One step further
We have seen in the previous examples how the study of incremental learning has progressed. In Elman (1989) it is mentioned in passing; in Plunkett and Marchman it has a central role. Elman (1993) in turn mentions Plunkett and Marchman’s (1990) article explicitly when showing that better overall learning is achieved when using an incremental approach, i.e. by allowing the training corpus to slowly grow in size as training takes place. It seems then that an incremental approach is valuable, but Elman argues that although some simplification of language is used by parents when talking to their children (so-called ‘motherese’), this cannot be the whole story. Children do hear exemplars of all aspects of the adult language from the beginning and they (eventually) acquire grammatical rules correctly. If an incremental approach is valuable and if the environment is not really subject to change, then something else must be changing, i.e. the child itself. It is probable that innate predispositions narrow the range of what can be learned (“the child is not an unconstrained learning mechanism”, p. 73)24 and that therefore the ‘learning device’ has to be taken into account. Elman does just that in experiments with the recurrent network introduced earlier. The Elman (1989) experiment on the prediction of syntactic categories, which was the first example in this chapter, is used again, but now in a different setting: “It is a striking fact that in humans the greatest learning occurs precisely at that point in time -childhood- when the most dramatic maturational changes also occur”, p.71. “Interestingly, much of the work in learnability theory neglects the fact that learning and development co-occur [...] The typical assumption is that both the learning device and training input are static”, p.74. In Elman (1993) the approach is inspired by maturational aspects.
The work follows from the question whether developmental restrictions in the learning device can be essential for
24 In another article, Bates and Elman (1993) discuss the subject of constraints within the traditional developmental paradigm.
mastering complex domains. The assumption used is that there is a gradual, incremental increase in memory and attention span as the child develops. The box 2 translation of this assumption into neural network terms is the following: change the access the network has “to its own prior states” through the recurrent connections. This can be done by eliminating the feedback at intervals which are dependent on the ‘maturity’ of the network25. By eliminating the feedback in such a manner, the memory of the previously presented items is affected, so that the network has only a “limited temporal window within which patterns could be processed” (p.78). The data set is the full corpus of sentences which had proved to be unlearnable when the network did not make use of this technique (section 5.2). This novel use of existing networks is an example of the application of link 3. In this case the assumption about memory and attention span is ‘implemented’ by changing parameter settings in the internal mechanism of an existing network architecture. As no new architectures are proposed, this is not an application of the main recurrent link 2 of STC. The initial stage of training, with every third or fourth word randomly eliminated, proved to take more time than the corresponding stage in the simulation with incremental learning using staged input. After that, however, learning proceeded quickly throughout the remaining stages and the final performance was similar to the performance after training with staged input. No explicit similarities with the behavior of children are given, but the set-up of the experiment together with the results warrants several conclusions, which will be given next.
5.4.2 Link 1 conclusions
The starting point of these experiments was the question whether developmental restrictions can be important for developing correct performance in a complex domain.
The first conclusion that follows from the results is that developmental restrictions can indeed influence performance in a positive way. According to Elman, limiting the memory capacity by eliminating feedback at certain time intervals ensures that the network can only learn a simple subset of the ‘adult’ data. Using the technique of staged input is an explicit way of limiting the complexity; with staged memory, this limitation becomes a feature of the network itself. “With this perspective, the limited capacity of infants assumes a positive value. Limited capacity acts like a protective veil, shielding the infant from stimuli which may either be irrelevant or require prior learning to be interpreted. Limiting capacity reduces the search space, so that a young learner may be able to entertain a small number of hypotheses about the world”, p. 95. So one important link 1 insight to be gained from these experiments is that restrictions in the ‘learning device’ can be essential for learning about a complex world. Other link 1 hypotheses that can be ‘deduced’ from experiments like these have to do with the difference in learning capabilities between children and adults. We are all familiar with the difficulty adults have in learning a new language and how easy this seems to be for children. It is suggested that limited memory capability and attention span can act as a filter on the complex environment. Adults are no longer ‘in possession’ of this filter and therefore show different performance and make different errors when learning this type of task. Further research into the differences observed between children and adults while learning a new language can support these conclusions. Furthermore, insights into the value of developmental restrictions can be carried over to other areas of development. For example, to the field of perception:
25 This can be done by setting the values of the context units to 0.5. This feedback is eliminated randomly after every third or fourth word in the first stage, then (with a new set) every 4-5 words, every 5-6 words, every 6-7 words, and finally not at all.
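The scheme of footnote 25 can be sketched as follows. This is an illustrative simplification (a fixed rather than randomized reset interval), and `step` stands for whatever function computes the network's output and new hidden state from a word and the current context; all names are invented.

```python
import numpy as np

def run_with_limited_memory(words, step, n_hid, window):
    """Process a word sequence, but wipe the memory (set all context
    units to 0.5) after every `window` words, so the network only has
    a limited temporal window within which patterns can be processed."""
    context = np.full(n_hid, 0.5)
    outputs = []
    for t, word in enumerate(words, start=1):
        out, context = step(word, context)
        outputs.append(out)
        if t % window == 0:
            context = np.full(n_hid, 0.5)   # eliminate the feedback
    return outputs
```

Training would then proceed in stages with an increasing `window`, so that the restriction is relaxed as the network ‘matures’, while the training corpus itself never changes.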
“Turkewitz and Kenny (1993; first appeared 1982, R.M.) suggest that the problem of learning size constancy may be made much easier by the fact that initially the infant’s depth of field is restricted to objects which are very close. This means that the problem of size constancy effectively does not arise at this stage. During this period, therefore, the infant is able to learn the relative size of objects in the absence of size constancy. This knowledge might then make it possible to learn about size constancy itself”, Elman (1993), p.96. We see that the link 1 conclusions to be drawn from experiments can have great implications for further research. But something can also be gained through box 3.
5.4.3 Box 3
Elman (1989 and 1993) gives an introduction to the internal workings of these recurrent networks using phase state portraits (which, according to Elman, can be thought of as the network equivalent of human EEG graphs). Put simply, these portraits are graphs based upon principal component analysis. In mathematical terms26: the principal components are the eigenvectors of the covariance matrix of the hidden unit activation patterns. If these eigenvectors are used as a new basis for the representation of the hidden unit patterns, more insight into the internal workings of the network can be gained, as these patterns stand for the internal representations of the network (see also Rumelhart and McClelland (1986a, chapter 9)). Phase state portraits represent the path, or trajectory, of the representations of the elements of the sentence in the space spanned by these components, in which each component stands for a certain aspect of processing (e.g. singular/plural distinctions or verb-argument structures). We have already seen why such analysis is important. I have mentioned Coltheart (1994) in section 3.3, who commented upon the distinction between the network architecture and the functional architecture.
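As an illustrative sketch of what such an analysis involves (not Elman's own code), one can collect the hidden unit activation vectors produced while a sentence is processed and project them onto their leading principal components; the projected points, connected in temporal order, form the trajectory of a phase state portrait.

```python
import numpy as np

def pc_trajectory(hidden_states, k=2):
    """Project a (T, n_hid) matrix of hidden activations over time onto
    its first k principal components, yielding a (T, k) trajectory."""
    H = np.asarray(hidden_states, dtype=float)
    H = H - H.mean(axis=0)                      # center the activations
    cov = H.T @ H / (len(H) - 1)                # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return H @ top                              # coordinates in the new basis
```

Plotting the two columns of the result against each other, one point per word, gives the kind of portrait Elman uses to show, for example, how singular and plural nouns trace out distinct regions of the hidden state space.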
Analyzing the representations of the hidden layer of the network using these tools, then, can tell us more about its behavior, as: “the network uses its hidden units to construct a functionally based representational scheme. Thus, the similarity structure of the internal representations can be a more reliable indicator of “meaning” than the similarity structure of the bare inputs”, Elman (1993) p.79, italics in original. For the moment these types of analyses are most valuable for application through link 4 (i.e. understanding the behavior of the artificial networks), as it is not clear how, or even if, these analyses might be applicable to networks of neurons. But by providing means to understand the internal workings of artificial networks, they represent a starting point for further exploration. Elman mentions a few properties of neural networks that play a part in the explanation of the above-mentioned phenomena. The relevant properties of neural network learning are the following: first, networks rely on the representativeness of their data sets; second, networks are most sensitive during the early period of learning (this is a consequence of using a sigmoidal activation function); and third, the gradient descent technique used makes it difficult for networks to change their initial hypotheses. Incremental learning using staged data then ensures that there is an external regulation of the possible initial hypotheses the network can make, by limiting the complexity of the environment. Staged structure, on the other hand, provides a filter which limits the hypotheses that can be made, and some results suggest that noise plays an essential role in this process. The noise is a feature arising from the complex sentences, of which initially only parts can be ‘understood’. These parts are the simple sentences, which do form a coherent whole in the limited temporal window of the network. This could also explain why training in the first stage takes longer in the latter incremental training scheme: the basic hypotheses have to be filtered out of a larger set of data. Once the network is trained, it becomes more difficult to change the underlying function of the network; it has settled into one role or ‘set of hypotheses’ and new tricks become harder to learn.
26 It is too much to hope that this subject will become clear after this very brief introduction. For those interested in the further background of these analyses, the reader should consult the literature indicated. Those not interested can gloss over this part, as it serves only to convey a suggestion of the type of work involved.
5.4.4 Conclusions
We have seen in these examples how assumptions about cognition can influence the way neural networks are used to study development. In the first example, the roots of the idea of incremental learning are present, but they are not the main concern there. In the example presented by Plunkett and Marchman, whose work is a further evaluation of the possibilities of network techniques in the study of English verb morphology, one form of incremental learning is the main item under scrutiny. Presenting the data in an incremental fashion was shown to lead to better performance than can be achieved by presenting the full set from the beginning. Once it is established that incremental learning is useful (and sometimes essential), room can be made for further assumptions. Elman (1993) gives an example of such a process. If, by looking at the way children develop, the idea of presenting the input in an incremental fashion seems doubtful, an alternative can be found in letting the ‘learning device’ improve in an incremental manner. The fact that it is possible for a network to learn a task using this assumption leads the way to valuable insights and to further research. Re-examining the discussion on the use of wrong models, mentioned in chapter 3, we can now see how more can be learned about development when taking into account even one additional assumption. The goal of Plunkett and Marchman is mainly to show that neural networks provide an alternative single-mechanism theory of development and they do so convincingly.
By incorporating an extra assumption about the development of memory and attention span, however, Elman can draw conclusions about the way limitations in the architecture of brains can be essential in providing filters on a complex environment. Surely this is a valuable addition to cognitive developmental theory formation, obtained only by letting insights related to the development of (cognitive) architecture guide connectionist modelling. As interactionist and epigenetic concerns gain popularity in theories on development, such insights cannot be overlooked.
5.5 A theoretical framework for developmental connectionism
We have seen in the previous examples that connectionism can offer an alternative to a dual mechanism theory of development. Neural networks, by inherently combining memorization with generalization, can, in a single mechanism account, provide potential answers to questions about development. But so far, most of this work has been done in isolation, without attempts to incorporate the results into a more general cognitive theory of development. We now have some clues about the way networks perform the learning task, but a theoretical link with cognitive insights can place the work in a broader perspective. Bechtel and Abrahamsen (1991): “If connectionism is to take hold, either as a revolution or as part of a hybrid or pluralistic framework, at least the following events must co-occur. First, connectionism will have to provide the framework for the development of a number of specific models that are judged successful”, p.263, emphasis added. Such a framework should present a theory from which network experiments can be deduced, so that it may be evaluated and further advanced. One theory on cognitive development, by Karmiloff-Smith (1992), comes very close to stating ideas about development that could be valuable as a framework for connectionist modelling.
She introduces a theory on the development of representations in which ‘representational redescription’ (RR) plays a central role. Karmiloff-Smith sets out to
integrate nativist (Fodor) and constructivist (Piaget) insights by describing development as taking place in ‘microdomains’ (e.g., the theory of gravity within the area of physics), in which innate restrictions and the environment cooperate. As children learn about a microdomain, they first develop behavioral mastery through implicit representations (the I-phase) of a limited number -or critical mass- of examples of events in that microdomain. Once these representations have stabilized, the redescriptive process takes place, in which the underlying rules become explicit, but are not yet available to consciousness (E1). In later stages this knowledge does become conscious and available to other domains, and eventually children can report on these rules verbally (E2 and E3). An example should clarify this process. In balancing experiments, young children (4 years old) are initially able to balance blocks correctly, as they interpret each event as a single experiment (I). As they develop rules about balancing (around age 6), i.e. that the centre of gravity should lie at the centre of symmetry (E1), errors are made with blocks that have been tampered with (i.e. they are weighted) and thus do not follow these rules. The errors are quite persistent and the counterexamples are dismissed as irrelevant (Karmiloff-Smith makes a comparison with scientific theory formation at this point). The development of different theories (age 8) eventually leads to ‘untangling’ the rules on centre of gravity and symmetry, and the child can balance the blocks correctly again and eventually give verbal reports on the rules used (E2/E3). Other, similar, examples are given from different domains, i.e. the child as a linguist, a physicist, a mathematician, a psychologist and a notator, to give further illustrations of such a process.
The idea of representational redescription was developed quite independently of any connectionist involvement and as such could not initially provide a description of potential mechanisms that might lie at the roots of redescription. At the end of the book, however, she outlines potential links between RR and connectionism (after initial scepticism about the value of PDP-modelling: “preposterous developmental postulates”). The inspiration is derived partly from the work of Elman. The connectionist mechanisms seem to combine nicely with the theory of redescription and could provide ways of implementing and testing it. Karmiloff-Smith conceives of the processes taking place in artificial networks as similar to the first phase of development, i.e. the development of implicit representations27. What would be needed in subsequent network research is to find ways of ‘implementing’ the explicit phases of development, and a hybrid approach (in which symbolism might provide indications about the implementation of these further phases) is not out of the question according to her (see Tijsseling (1994) for an example of a hybrid approach to categorization). In her view, it is soft-core theories like these that can provide impetus for hard-core modelling theories, and RR could therefore be used as a starting point from which future research, incorporating connectionist insights, is directed. In her own words: “To the extent that they can, connectionism would offer the RR model a powerful set of “hard-core” tools from the mathematical theory of complex dynamical systems (Van Geert 1991). And to the extent that connectionist models fail to adequately model development, the RR model suggests some crucial modifications”, p.176.
If we turn again to the quote from Bechtel and Abrahamsen mentioned at the beginning of this section, we see that the representational redescription theory developed by Karmiloff-Smith can provide just such a framework from which further network experiments can be devised: to explore, for example, the way networks could implement the E2 and E3 stages. And as the two different approaches (i.e. connectionism and the cognitive approach to development) do converge on a similar picture of development and similar
27 I do believe that a process approximating E1 is also taking place as networks learn their tasks. We have seen in the examples mentioned above that at a certain point in learning a process of system building starts to take place in which underlying rules are somehow developed. This sounds very much like the E1 stage, in which rules become explicit, but are not yet available to consciousness.
conclusions on types of representation, even though advanced from independent perspectives, this holds promise for future collaboration.
5.6 Summary
It has become clear from the examples in this chapter that STC can cover experiments well: by providing accounts of the building and use of networks according to assumptions, and by describing the types of analysis taking place (through box 3 in combination with link 4). We have also seen that fitting connectionist insights into a framework describing development from a cognitive point of view, for example the one provided by Karmiloff-Smith, can provide a stimulus for further research. There are, however, other ways to stimulate further research and, in order to present a complete picture, I will illustrate this next. The work described in this chapter focuses mainly on the subject of cognitive modelling, which was introduced in chapter 2. STC usage in these experiments is therefore restricted mainly to link number 3, in which new theories are tested using existing network architectures, although the training schemes used in the examples can differ somewhat from previous experiments. The other main recurrent link in STC, number 2, is not used in these experiments, as no new architectures are proposed on the basis of different assumptions about the (development of the) architecture of the brain. This type of work is more properly classified under the heading of biological modelling and in the next chapter I will indicate how STC deals with this area of research.
Chapter 6
Biological developmental modelling
“[...] one cannot construct an adequate theory of brain function without first understanding the developmental processes and constraints that give rise to neuroanatomy and synaptic diversity”, Edelman (1986), p.18.
6.1 Introduction

We have seen in chapter 2 that by making some assumptions about the architecture of the brain, we can generate new network models via link number 2 of STC (Figure 2.3). In this chapter I will take a closer look at this type of modelling. The goal of this exercise has already been touched upon in chapter 3, where it was stated that the experiments done by Plunkett and Sinha (1992) might be overlooking something important in their set-up, by not taking into account the changes in architecture that take place in developing brains (section 3.6). In the previous chapter we have already seen that such assumptions (e.g. Elman (1993)) can add much to theories. The assumption implemented by Elman was guided mainly by the psychological construct of short term memory. A further claim would be that actual changes in the architecture (i.e. changes in nodes and connections) could provide further insights. If we want to investigate this claim, STC prescribes that we should start with relevant theoretical developmental assumptions about the brain and translate these assumptions into artificial network architectures (box 1). These have to be tested to see if their behavior shows similarity to that shown by real architectures as they develop (box 2). From there, conclusions can be drawn on the value of the models and the insights to be gained (either through link 1, box 3, or both). In this chapter, then, I will briefly describe the way brain development is currently assumed to take place. Some potentially usable assumptions for generating network architectures can be distilled from this picture, and examples of such modelling and results of experiments will be given to indicate potentials. We have also seen in the previous chapter that it is important to try to find a framework as a basis for further experiments.
The same is true for modelling which focuses on biological concerns, and I will give an example of a framework that could be valuable in this respect in section 6.5. First, however, something on methodology. The step from the descriptive use of a model to prescriptive use is a large one and needs more support. This will be given in the next section.

6.2 On the prescriptive use of STC

We have seen in the previous chapters that STC can be used to accurately describe connectionist research. The model has been proposed on the basis of a study of a number of connectionist experiments, from which the 'common denominators' have been extracted. Expansion of the basic model (Figure 2.2) to the proper STC (Figure 2.3) was necessary to account for further steps encountered in connectionist practice. STC was then used to describe a number of experiments. All elements of STC featured in those descriptions, indicating that the model covers connectionist experiments well in practice. Apart from these practical considerations, however, a theoretical link to an established methodological model was also provided in section 4.6, where I introduced the cycle of empiric progress as described by De Groot (1975). In section 2.6 I quoted Smolensky (1991), who commented on the two different levels of gaining insights through connectionist techniques: the formal algorithmic specification of the mechanisms of the networks, plus the semantic interpretation of the networks. Expanding the accepted model of De Groot to take into account the two connectionist levels of explanation mentioned by Smolensky has to lead to a model in which two pathways to insights feature and, as we have seen, STC is precisely such a model. It can, therefore, be taken to be a more specific instance of the basic cycle of empiric progress, one which
handles connectionist theory formation. As such, it is backed up by an accepted prescriptive theory of progress. There are therefore three reasons for proposing that STC has prescriptive value. The first one lies in the 'roots' of the model, as it has been developed from the study of a wide range of connectionist experiments. The second reason is provided by the successful description of a number of experiments in the previous chapters of this thesis. The third, and most convincing, reason comes from the statement that STC can be taken to provide a specific connectionist instance of a generally accepted theory on progress. These three reasons then provide the background in this chapter for the proposal of how STC could guide future connectionist research.

6.3 The developing brain

The following is a brief summary of the processes that take place as the brain develops, but it already presents enough material for inspiration for the generation of network architectures. The 'unfolding' of the basic architecture of the brain depends on the multiplication of brain cells and the migration of these cells to their specific neural sites. The regulation of this process depends on the genetic expression of cells.[28] The environment of cells determines the type of cell that will be expressed next, and it is the DNA that encodes the rules for this expression. This genetic expression is important because it leads to a basic architecture that is relatively constant across individuals, thereby explaining how the coarse biological and cognitive make-up of individuals are similar. Next, cells send out axons and dendrites to meet the axons of the cells they are supposed to meet: "It is clear from many studies that a given brain region sends axons to only a limited number of other regions, and that synapses are made only in a specific part of certain cells in that region.
The location of synapses is determined in part by genetic instruction, in part by the orientation of the cell when the axons arrive, in part by the timing of axon arrival (axons apparently compete for available space) and in part by the use they are given once a connection is made", Kolb and Whishaw (1990). During this stage of development there is an overproduction of synapses. Cell death occurs when migrating dendrites have not been able to reach their innervation sites, and this can take on quite large proportions: as much as 70% of the cells. Loss of synapses can take on similar proportions and could be as much as 50% (Kolb (1993)). After this pruning has taken place, the basic architecture is more or less fixed, through the process of myelinization, after which learning is 'just' a matter of groups of cells competing for the inclusion of other cells in their groups (or modules), or of the strengthening or weakening of synaptic processes (e.g. Edelman (1987)).

The picture that emerges is the following: genetic expression is responsible for the basic architecture of the brain, which is relatively constant across individuals. The filling in of the rest of the 'final' version during development is for a large part dependent upon the experience of the individual as it moves around in the world. Up to a certain point the architecture is relatively plastic, but after that myelinization takes place, as the individual starts to reach adulthood: a layer of 'fixative' is poured over the brain structure so that further large scale changes in architecture, such as those seen during development, cannot take place (but see Greenough et al (1993)). From this point onwards, another form of learning takes the upper hand, in which competition and the strengthening and weakening of synaptic connections play a major role.

[28] The regulation of the architecture of the brain by genetic expression could also be the process that evolution uses to change that architecture. When the timing of the expression of these molecules is altered, this could lead to the formation of new nuclei and new tracts (Edelman 1987). This idea is consistent with data indicating the advance of neural systems and components of these systems at different rates in different species. Edelman uses the term heterochrony to describe this process: "the alteration through mutations in regulatory genes of the rates of development of traits possessed by ancestors".

It should be noted that the picture given above is just a very coarse summary of some of the main processes taking place during development. Tucker (1992) presents a more detailed and comprehensive overview of the current knowledge about the several morphological changes taking place in the developing brain, in terms of migration, neural pathways, hemisphere control, (developing) functional regions in the brain, and phylogenetic and physiological mechanisms. This review is much more complete than the one given here. It should, however, suffice for the purposes of this chapter.

Learning after the phenotype has been 'stabilised' is, as indicated above, a process of competition and of changing the strengths of synaptic connections. In conventional artificial neural networks the changing of weights is also the main process of learning, and competition (e.g. Kohonen networks) is an established method of learning. So the 'conventional' networks are certainly relevant for this type of learning. As I have argued in chapter 3, however, another picture arises from the description of the process of development. Not just the connections undergo changes: the whole architecture is changing and developing. From the viewpoint of STC, these are assumptions that should play a large role when designing new artificial network architectures for the investigation of biological (but also cognitive) aspects of development. Which elements from the description given above could or should be incorporated into a new architecture is a question that will be discussed in the next section.

6.4 From assumptions to models
6.4.1 Assumptions

From an STC point of view, there are several things that can be taken into account when designing novel artificial architectures in box 1. Only a limited number of assumptions are mentioned here, to show how such a translation can be done; many other processes could be taken into account (e.g. Crick and Asanuma in Rumelhart and McClelland (1986b), chapter 20).

First, the architecture that is the result of the developmental process gets its shape from genetic (prestructural) and environmental factors. So an artificial network should have some initial structure which is adaptable by influences from the input data. Karmiloff-Smith (1992) presents some examples of the ways in which this has been done. According to her, attention is also slowly concentrating on how more specific constraints (e.g. 'innate' prestructure in the networks or in the initial weight configuration) can influence learning. Constraining the structure of a network can influence the speed of learning and can improve performance considerably, according to for example Happel and Murre (1994) and Murre, Phaf and Wolters (1992).

Second, the initial overabundance of cells and connections, and the pruning which takes place later, would suggest that the artificial architecture should initially also have a large number of connections. Depending on whether or not these connections are activated, they will either be allowed to stay, or they will be 'pruned' away.

Third, once the system has reached some sort of 'adult' stage, it should become more difficult to do this pruning, and other forms of learning should start to take place, based more upon competition and the changing of weights than on architectural modifications.

I have already stated in chapter 3 that it may be problematic to decide which subset of usable assumptions would be appropriate. Providing an answer to this issue will require much experimental work using different architectures.
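The second assumption can be made concrete with a small sketch. This is a toy illustration only, not taken from any of the cited models: the 'traffic' measure, the 50th-percentile threshold and the layer dimensions are assumptions of mine. The idea it shows is simply that connections start out overabundant, their average activity is tracked, and the least active ones are pruned away.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption 2: start with an overabundance of connections.
n_in, n_hidden = 8, 16
W = rng.normal(0.0, 1.0, size=(n_hidden, n_in))   # dense initial weight matrix
alive = np.ones_like(W, dtype=bool)               # mask: which connections survive

# Accumulate how much 'traffic' each connection carries over a batch of inputs.
X = rng.normal(size=(100, n_in))
traffic = np.zeros_like(W)
for x in X:
    traffic += np.abs(W * x)                      # |w_ij * x_j| per connection
traffic /= len(X)

# Prune connections whose average contribution falls below a threshold
# (the median is an arbitrary illustrative choice of cut-off).
threshold = np.percentile(traffic, 50)
alive &= traffic >= threshold
W *= alive

print(f"connections kept: {alive.sum()} of {alive.size}")
```

In a fuller model the surviving mask would then be frozen once the 'adult' stage is reached, leaving only weight changes and competition as learning mechanisms, in line with the third assumption.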
Next, we shall take a look at how such assumptions can be used to devise new network architectures.

6.4.2 Models
In a recent article, Nolfi, Miglino and Parisi (1994) combine some aspects of the description of development given above in a research project which follows recurrent link number 2 of STC almost explicitly. And again, we see that the work consists mainly of providing a feasibility study for biological modelling: "The main objective of the simulations reported in the paper was to show that simulations with genetic algorithms and neural networks can be used to explore many more aspects of biological phenomena than has been the case so far", p.11. They use a box 1 implementation of the first assumption (i.e. that genetic and environmental influences interact) to evolve neural network 'templates' that are further developed through exposure to simulated surroundings. Assumptions about genotype and phenotype interaction[29] are used to evolve (by means of a genetic algorithm) an optimal neural network architecture which drives a small robot (Khepera) through an environment. Encoded in the 'genotype' of the networks are, among other things, the positions of the nodes in a two-dimensional network space (see Figure 6.1), the lengths and angles of branching segments, plus a threshold activation value above which a node is allowed to branch connections. Figure 6.1 presents an example of how this process works. The branching out of connections is done under the influence of the activations of nodes as the network is learning, thereby presenting a possible interpretation of epigenetic processes. The genetic algorithm uses a single reproduction rule in which copies of the 'fittest' networks are randomly mutated to produce the next generation.
Figure 6.1. Example of the way networks are constructed through the method of Nolfi, Miglino and Parisi (1994). Connections branch out from the nodes, depending on the amount of activation received. When connections meet other nodes they are fixed, after which the nonfunctional parts are eliminated (rightmost picture). Taken from Nolfi, Miglino and Parisi (1994).
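The reproduction rule just described, i.e. copying the fittest genotypes and randomly mutating the copies, can be sketched in a few lines. The fitness function, population size and mutation rate below are illustrative assumptions of mine, not the actual values used by Nolfi, Miglino and Parisi (1994); in their work the fitness would be the grown network's performance in driving the robot.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(genotype):
    # Toy stand-in for 'how well the grown network performs': here simply
    # how close the genotype is to an arbitrary target vector.
    target = np.linspace(-1.0, 1.0, genotype.size)
    return -np.sum((genotype - target) ** 2)

pop = rng.normal(size=(20, 10))                   # 20 genotypes of 10 'genes'
for generation in range(100):
    scores = np.array([fitness(g) for g in pop])
    elite = pop[np.argsort(scores)[-5:]]          # keep the 5 fittest
    # Single reproduction rule: copies of the fittest, randomly mutated.
    copies = np.repeat(elite, 4, axis=0)
    pop = copies + rng.normal(0.0, 0.05, size=copies.shape)

best = max(pop, key=fitness)
print(f"best fitness after 100 generations: {fitness(best):.3f}")
```

Even this minimal scheme, with no crossover at all, is enough to drive the population steadily toward better genotypes, which is the point of the single-rule design.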
Results of experiments with evolution taking place in dark and in light circumstances show that these different environments can influence architecture and performance. This leads the authors to the more general link 1 conclusion that the environment, through evolution and development, can have an influence on the architecture of a network, which was the aim of the article. The main idea behind the experiments being the exploration of the feasibility of this kind of biological modelling, no attempt is made as yet to provide a box 3 type analysis. We can see, however, by looking at the final architecture in the example given in Figure 6.1, a first indication of how such an analysis could indeed become difficult. The network does not provide a regular weight matrix like, for example, Elman's recurrent network (Figure 5.1), so that the study of the representations on the hidden units could stand in need of different analytical tools. The final network in Figure 6.1 is still relatively simple, but the next example will show more clearly how this becomes problematic.

[29] Very crudely, the following equation can be used to describe the relation between the two: phenotype = genotype + biological_constraints(environment), in which the biological constraints act as a function on the environment.

Boers, Kuiper, Happel and Sprinkhuizen-Kuyper (1993) give another demonstration of how biological concerns can shape network architectures. They use Lindenmayer systems to encode production rules for network architectures. In these L-systems, fractal-like rules underlie the building up of structures such as plants: just a few rules give rise to very life-like pictures of different varieties of plants (Prusinkiewicz and Hanan (1989)). In neural network experiments these rules can be used to encode the architecture and connections of networks. The link to living organisms can be found in the way DNA encodes rules to 'express' the molecules and cells which build up the individual. These L-system templates are made to evolve using a genetic algorithm, to produce networks that are capable of learning a task optimally. Some similarities with the above mentioned experiments done by Nolfi et al (1994) are apparent, especially in the complexity of the networks that are involved (Figure 6.2) and the implications this could have for box 3 analysis. From Figure 6.2 it becomes clear that this type of analysis will not become easier as the network architectures become more intricate.
Figure 6.2. Example of one of the networks constructed with the method of Boers et al (1993).
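The flavour of L-system rewriting is easy to demonstrate. The sketch below uses the classic bracketed 'plant' rules of the kind found in Prusinkiewicz and Hanan (1989), not the actual network-encoding rules of Boers et al (1993); it only shows how a couple of productions generate rapidly growing, self-similar structure.

```python
# A minimal L-system: repeatedly rewrite every symbol by its production rule.
# 'F' draws a segment, '+'/'-' turn, '[' and ']' push and pop a branch point.
rules = {"X": "F[+X][-X]FX", "F": "FF"}

def rewrite(s, steps):
    # Symbols without a rule (the brackets and turn symbols) rewrite to themselves.
    for _ in range(steps):
        s = "".join(rules.get(c, c) for c in s)
    return s

for n in range(4):
    out = rewrite("X", n)
    print(f"step {n}: length {len(out)}")
```

The string length grows roughly geometrically with each rewriting step, which is why a compact genotype of a few rules can encode a very large and intricate phenotype, be it a plant or a network architecture.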
Murre, Phaf and Wolters (1992) and Happel and Murre (1994) also provide examples of modelling on the basis of biological considerations. In general, the outcomes of these experiments are positive and show promise for advances in this direction.

We have seen in the previous chapter that by providing a theoretical framework for cognitive modelling, further explorations can be deduced. The question arises whether such a framework could be found for the type of modelling that focuses on biological concerns. What would be required is an idea of the major assumptions underlying the development of the architecture of brains and developmental behavior, and the translation of those assumptions into cognitive theories. Lenneberg (1993; original 1967) proposes a theory on the development of language couched in terms of an epigenetic approach, in which biology, species-specific features (such as categorization in humans, and the language it makes possible), maturation of cognitive processes and social settings all have a place (but see also Piaget
(1993; original 1971) or Edelman (1987)[30]). The next section is dedicated to a closer look at this theory.

6.5 A biological framework

Lenneberg (1993; original 1967) starts by stating five general premises on which his theory is based. These are:

(i) Cognitive function is species specific. It is clear that different species 'support' different cognitive functions, human beings being assumed to be at the top of the ladder by being able to use language.

(ii) Specific properties of cognitive function are replicated in every member of the species. We have seen that the brain develops through the expression of cells by DNA-encoded rules, and that this genetic expression is important because it leads to a basic architecture that is relatively constant across individuals (section 6.3). The basic architecture thus supports those properties that are replicated.

(iii) Cognitive processes and capacities are differentiated spontaneously with maturation. Lenneberg emphasizes that this premise should not be taken to mean that the environment does not contribute to the process of development. Rather, what is meant is that maturation of the brain is the process that guides development. The environment can act as a trigger for these processes and provides the essential information to further shape the architecture, but how it will be structured is already (latently) present in the individual. In a similar way we have seen in the experiments of Elman (1993) that maturation of memory and attention could be essential in providing a filter on the complex environment, thereby facilitating the learning of language. Nolfi et al (1994) provide an example of how this could be modelled using connectionist techniques.

(iv) At birth, man is relatively immature; certain aspects of his behavior and cognitive function emerge only during infancy. This premise speaks for itself, when we look at the tremendous differences between the competent adult and the virtually helpless infant.
(v) Certain social phenomena among animals come about by spontaneous adaptation of the behavior of the growing individual to the behavior of other individuals around him. We can again use Elman (1993) as an example of this process. If we take (adult) language to be such a social phenomenon, we can interpret the behavior of the network as an adaptation to the language behavior of the adult individuals around it.

In just a few pages (the excerpt of the 1967 chapter consists of only nine pages), Lenneberg puts down thirteen essential tenets of the theory. I will only describe the first two, as they are directly relevant to our purposes here, in providing a suggestion of connectionist compatibility. The other eleven tenets provide further elaboration of the theory in the light of the premises stated above. In the first statement the dependency of language on human cognition is proposed: "There is evidence that cognitive function is a more basic and primary process than language, and that the dependence-relationship of language upon cognition is incomparably stronger than vice versa", p.41.
[30] A quick scan through the list of references of Neural Darwinism shows that none of the names that feature in the PDP books (Rumelhart and McClelland (1986)), or other leading connectionist figures, are quoted in the preparation of this book (nor vice versa). Reading it in 1994 elicits some surprise that many of the ideas mentioned by Edelman are worked out in current models in the connectionist research world, although one does not encounter 'Neural Darwinism' much in reference lists. Some of the 'labels' have changed (for example, reentrant has become recurrent), but the ideas seem to be sound, given the results obtained so far within the PDP community. This book could not have been written now without at least a mention of some of these names, if only to back up some of the suppositions, theories and other basics involved. Of course the books were published a year apart, but this could be an indication of the lack of communication between these fields of research.
At the time of writing this might have been a controversial statement, as in symbolist theory almost the exact opposite is assumed. According to the symbolist account, cognitive activity is based on a language of thought ('mentalese') and not the other way around, as is proposed by Lenneberg. In the second tenet of the theory, Lenneberg explains how cognitive activity could support language: "The cognitive function underlying language consists of an adaptation of a ubiquitous process (among vertebrates) of categorization and extraction of similarities", p.41, italics added. Categorization and extraction of similarities is precisely what connectionist models do, so that compatibility at least is established. As we have seen in the previous chapter, network modelling can provide hard-core tools for making parts of the theory explicit and, furthermore, it presents a framework from which further connectionist experiments can be generated.

6.6 Concluding Remarks

We have seen in this chapter that biological modelling, through the application of the recurrent link 2 of STC, can generate models and experiments that show promise for further research. Even from a brief sketch of developmental processes we can construct a variety of different network experiments. What this predicts is a blooming of different experiments in which different subsets of assumptions are used. As this type of modelling is still in a very early stage, no real conclusions, except possibly on feasibility, can be drawn. STC could play a part in the progression of models and experiments by indicating the steps to be taken and, more importantly, by indicating which assumptions are to be made explicit in the exploration of biological theory. A theoretical framework, such as Lenneberg's, could prove to be valuable in providing a background and inspiration for such research.
In the previous chapters I have presented a ‘Stagewise Treatment of Connectionism’ and I have shown that such a methodological tool can present a new outlook on the debate between symbolism and connectionism in that it can effectively describe connectionist research that has been done in the field of developmental psychology. In this chapter I have indicated how it suggests future research as further assumptions about the brain architecture and brain development become available for modelling. It is now time to recapitulate all that has gone before (at least all that has been presented in this thesis) and to put things into some perspective. This is the aim of the next and final chapter.
Chapter 7 Conclusions "A developmental perspective is essential to the analysis of human cognition, because understanding the built in architecture of the human mind, the constraints on learning, and how knowledge changes progressively over time can provide subtle clues to its final representation format in the adult mind", Karmiloff-Smith (1992), p.26.
7.1 Introduction

I had two main themes in mind while working on the preparation of this thesis: the STC model and developmental psychology. The basis for STC stems from a paper written for a course on the foundations of connectionism at Utrecht University. The expansion of those ideas led, almost 'by itself', to the proper STC as worked out here. Before that, and before the idea of STC was really developed, I was looking for a way to say something more about how connectionism can be of value to developmental psychology. Inspiration for this idea initially came from Spelke (1990), who looks at the development of vision in children, and from the Elman (1989) article in which the idea of incremental learning was mentioned. A combination of the two in the field of perception should in my opinion lead to fruitful research, but this will have to be the topic of a follow-up of this thesis. Choosing between these two topics was something which I found hard to do, so eventually applying STC to developmental psychology became the obvious choice. In rounding off this thesis, then, I will look at these subjects separately.

7.2 Connectionism and STC: an evaluation

Figure 7.1 shows again the complete STC model. Chapter 2 was dedicated to a detailed description of the boxes and the links between them. In chapter 3 the theory of De Groot (1975) was introduced in order to make some methodological remarks on the model. This theory was later used to provide argumentation for the step from the descriptive use to the prescriptive use of STC. The distinction between link number 1 and a route to cognitive knowledge through box number 3 proves to be valuable in a number of ways. First, it becomes possible to evaluate these two different paths separately. We have looked at the possible merits and problems underlying link number 1.
Although it is important in the formation of new hypotheses in the light of new models, it has to be kept in mind that this process could be just as 'subjective', although in a different way, as the connections described in STC, and that some care has to be taken when stating connectionist claims. The value of box number 3 became clear in chapter 4 when looking at the differences between the two main paradigms within the field of cognitive science. It provides a route, through mathematical analyses of the processes in artificial networks, to possible explanations in the field of the real thing. It is therefore very important that this kind of analytical work keeps on receiving attention, as it could be called an 'objective' foundation of connectionism (in the sense that statements about the workings of the object in question, i.e. the brain, can be made). STC also presents a practical way of looking at further research. Even though all the steps are already taken by (most) researchers in the field, it repays the effort to make the process explicit, so that we know why we are doing what we are doing.

An evaluation of connectionism on the basis of STC then leads to the following comments. Building networks on the basis of several assumptions might prove to be a tricky business in areas in which the architecture of the brain (and its development) plays a large role. Deciding on the relevant assumptions to use will be a process of experimentation with different models. Link number 1 still leaves connectionism with the possibility of theory-laden observation. We should be conscious of the fact that it is a 'short cut' and that most of the 'real' work lies in box 3. This means that a lot of emphasis has to be placed upon the analysis of the artificial networks, keeping in mind the purposes of box number 4. On the other hand, although we have seen that this link 1 could be problematic, we have also seen that the
formation of new theories and hypotheses by means of this link is very important, in the way opportunities arise to state alternatives to already existing theories, based upon models inspired by the brain. We have seen that this inspiration is especially significant in the field of developmental psychology, as mechanisms of learning can say something about cognitive phenomena, e.g. the process of rule formation.
Figure 7.1. The Stagewise Treatment of Connectionism (STC). [Diagram: four boxes connected by numbered links 1-4. Box 1: generation of artificial neural network architectures based upon assumptions about the real thing. Box 2: testing of the models to see if the behavior is similar to that encountered in the brain (testing of assumptions and hypotheses). Box 3: analysis of the artificial networks and development of theories on their workings (the interpretation of artificial networks). Box 4: application of insights on the real thing (interpretation of cognitive processes).]
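Box 3 of the figure covers the mathematical analysis of trained networks; a standard example of such an analysis, mentioned in this section, is the principal component analysis of hidden-unit activations. The sketch below shows the bare mechanics on synthetic data: the 'activations' are random stand-ins for values recorded from a real trained network, and the choice of two components is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for hidden-unit activations recorded over 200 inputs
# from a network with 10 hidden units. A real box 3 analysis would record
# these while the trained network processes its test set.
latent = rng.normal(size=(200, 2))                 # two underlying 'directions'
mixing = rng.normal(size=(2, 10))
activations = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# Principal component analysis via eigendecomposition of the covariance.
centered = activations - activations.mean(axis=0)
cov = centered.T @ centered / (len(centered) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)             # returned in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
print(f"variance explained by first two components: {explained[:2].sum():.3f}")

# Projecting the activations onto the leading components gives the
# low-dimensional picture that is usually plotted in such analyses.
projection = centered @ eigvecs[:, :2]
```

When the hidden representations really are organised along a few directions, as here, a handful of components captures nearly all the variance, and the projection provides the kind of interpretable 'metaphor' of network behavior that link 4 feeds back into box 2.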
Box number 3 is important in providing the essential (mathematical) analyses of the artificial networks (either newly developed or imported), with which we should come closer to understanding what goes on in the real thing. We are still far removed from such a transfer, but in the meantime box 3 gives us new ways of looking at the artificial networks (through link number 4 of Figure 7.1), by providing metaphors for the behavior of these networks (e.g. finding principal components, or local or global minima in an error landscape). New theories on cognition can be tested either by the use of existing networks through link number 3, or by the creation of new ones along link number 2, the 'main' recurrent link of STC.

A comparison with symbolism showed a number of differences, which in my opinion are also advantages. First, connectionism gives the brain, the medium which does all the cognitive work, a place (again) within cognitive science. Soft biological constraints,
especially in the field of developmental psychology, provide new ways of looking at development and at already existing theories of development. Second, through link number 1, new cognitive theories can be proposed that are in some way related to the brain, as it is the metaphor for the models that are the basis of these new theories. Third, through box number 3, connectionism provides a link from 'conceptual' theories to the 'material' level of the brain, something lacking (and not deemed necessary) in the symbolist account, which stays on the conceptual level and projects the models onto the brain. Fourth, connectionist progress is based upon the development of models on the basis of assumptions about the brain. Incorporating more assumptions leads to different and more complex models and to greater or different insight. Symbolist progress, on the other hand, depends more upon the creativity of the researchers to come up with different theories or improvements on existing ones. Fifth, especially in the field of developmental psychology, connectionism opens the way to a more interactionist account of development, in which the combination of brain growth and the role of the environment, and their combined effect on cognitive development, is given a chance to prove its worth. Sixth, by being an open theoretical system (section 2.4), connectionism presents opportunities for cooperation with other, more exotic theories on 'life' in general, such as genetic algorithms, Lindenmayer systems (Boers et al (1993)) or the newly developing area of Artificial Life (A-life, Brooks (1991)). Symbolism, with its special status as an autonomous science, does not offer the opportunity of interaction with other (basic) sciences. A number of these differences have become clear or more explicit through the use of STC, thereby lending support to the value of the model.
7.3 Connectionism and developmental psychology

In chapter 4, I described the way in which the computer metaphor has shaped the foundations of cognitive investigations for the last forty-odd years. The hardware/software distinction has had a large impact on the field of developmental research, as it placed the emphasis entirely on the software part. Development itself became a cognitive ‘program’, with unclear mechanisms setting off different developmental processes. The nature-nurture debate was fuelled once again by this metaphor. Learning viewed as programming comes down to choosing between different hypotheses already provided by the programmer, i.e. all hypotheses or potential rules have to be present in the system and only need to be ‘activated’: nativism returns. Molenaar (1986) comments upon this idea of nativism by comparing the proposed necessity of Fodor’s innate origins of concepts or logical structures to a (theoretical) self-organising network (i.e. in line with Grossberg (1988)) in which the higher order rules emerge as the result of self-organizational processes. Molenaar suggests that self-organization offers a more plausible alternative to the proposal of innate concepts. The hardware/software distinction had done away with possible interactions between the brain and the environment, and with the biological constraints such an interactionist view has to offer. The constraints were placed into the software as part of the cognitive program, not into the hardware. Bates and Elman (1993) sum it all up: “Ironically, there has been relatively little interest in mechanisms responsible for change in the past 15-20 years of developmental research. The reasons for this de-emphasis on change have a great deal to do with a metaphor for mind and brain that has influenced most of experimental psychology, cognitive science and neuropsychology for the past few decades, i.e. the metaphor of the serial digital computer” (p. 623, italics added).
In the meantime, epigenetic or interactionist concerns have been kept alive by several researchers (e.g. Lenneberg (1993; original 1967) and Edelman (1987)), but not within the main arena of cognitive science. The great value of connectionist research lies partly in the fact that it has stressed the change in neural structure that underlies (developmental) behavior. It is now possible and respectable (again) to look at the way the environment can interact with genetic processes and, more importantly, it is now possible to simulate aspects of theories using neural networks. Connectionism has made room for the recurrence of these ideas by challenging the computer metaphor, and slowly but surely developmental psychology is realizing this potential. Even more potential lies in finding a framework from which further experiments can be deduced. Connectionist models can provide the ‘hard-core’ tools that can back up these theories, and a theoretical framework can suggest new ways in which (developmental) modelling can be improved. I have given two examples: one potential framework for cognitive developmental modelling (Karmiloff-Smith (1992)) and one for biological modelling (Lenneberg (1993; 1967)). It is important to place connectionist work in the larger perspective provided by such theories. STC starts to play a part when we turn to some of the experiments that have been done. Link number 1 makes explicit the possible problems that arise when experiments are based on a small subset of the possible assumptions that can be incorporated into neural network models. An important assumption overlooked in some experiments (e.g. Rumelhart and McClelland (1986b), chapter 18, or Plunkett and Marchman (1990)) is the dynamics of the neural structure itself (and not just the dynamics of the connections).
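The distinction between the dynamics of the connections and the dynamics of the structure itself can be made concrete in a small sketch. The code below is purely illustrative and my own (it is not taken from any of the models cited, and the function name `grow_hidden` is invented); it uses the modern NumPy library. Ordinary learning changes weight values but leaves the architecture fixed, whereas structural development changes the architecture itself, here by adding a hidden unit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight dynamics: a learning rule changes these values,
# but the architecture (the matrix shapes) stays fixed.
W1 = rng.normal(0.0, 0.1, (2, 3))   # 2 inputs -> 3 hidden units
W2 = rng.normal(0.0, 0.1, (3, 1))   # 3 hidden -> 1 output

def grow_hidden(W1, W2, rng):
    """Structural dynamics: add one hidden unit by enlarging both
    weight matrices; the new connections start near zero."""
    new_in = rng.normal(0.0, 0.1, (W1.shape[0], 1))   # inputs -> new unit
    new_out = rng.normal(0.0, 0.1, (1, W2.shape[1]))  # new unit -> output
    return np.hstack([W1, new_in]), np.vstack([W2, new_out])

W1, W2 = grow_hidden(W1, W2, rng)
# The network now has 4 hidden units: the structure itself, not just
# the connection strengths, has changed during 'development'.
```

A model that only trains the connections keeps the shapes of W1 and W2 fixed throughout; a structural assumption along these lines would let the architecture develop alongside the weights.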
Although the first results of these tests have been valuable in directing attention to the possibilities of network architectures, and in stating new hypotheses concerning environment and single learning mechanisms, it seems that more assumptions should (and can successfully) be used in the development of architectures. Assumptions about increased memory size and attention span (Elman (1993)), and assumptions about architectural development (Nolfi, Miglino and Parisi (1994)), have proven their worth when used as a basis for new uses of existing networks or for ‘evolving’ new ones. One area which blends well with connectionist techniques is that of genetic algorithms, which are used not only to evolve optimal parameter settings, but also for the evolution of new network architectures. One problem this new approach does present, however, is that the networks evolved by a genetic algorithm are more difficult to analyse. Their complex structures often make it difficult to hypothesize about functionality, something which is already difficult in regular feedforward networks. A solution could possibly lie in applying mathematical analyses of dynamic systems that already exist in other domains.

7.4 The (loose) end(s)

In this thesis, I have mentioned some important subjects only in passing, one example being self-organization through competition in neural networks. In the training method called unsupervised learning, the only thing presented to the network is the input data. The network is supposed to categorize the data in the data set, and after (successful) training a categorization of the input is achieved in which each output node stands for a category. These networks are self-organizing in that they do not need output data in order to converge on a description of the input space.
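Such unsupervised categorization can be sketched in a few lines of (modern) code. The sketch below is my own illustration, written with NumPy; the data set, learning rate and function names are invented for the example and do not come from any of the networks cited in this thesis. Only input data is presented; the output nodes compete, and the winner's weight vector is pulled toward the input it won:

```python
import numpy as np

def competitive_learning(data, n_categories, lr=0.1, epochs=50, seed=0):
    """Winner-take-all learning: only input data is presented; each
    output node's weight vector drifts toward the inputs it wins,
    so the nodes come to stand for categories."""
    rng = np.random.default_rng(seed)
    w = data[:n_categories].astype(float).copy()  # prototypes start on data points
    for _ in range(epochs):
        for x in rng.permutation(data):
            winner = np.argmin(np.linalg.norm(w - x, axis=1))  # competition
            w[winner] += lr * (x - w[winner])                  # move winner toward input
    return w

def categorize(data, w):
    """Assign each input to the output node whose weight vector is nearest."""
    return np.argmin(np.linalg.norm(data[:, None] - w[None], axis=2), axis=1)

# Two well-separated clusters, interleaved; no target outputs are ever given.
rng = np.random.default_rng(1)
data = np.empty((40, 2))
data[0::2] = rng.normal(5.0, 1.0, (20, 2))
data[1::2] = rng.normal(-5.0, 1.0, (20, 2))

w = competitive_learning(data, n_categories=2)
labels = categorize(data, w)
```

After training, each prototype vector has drifted to the centre of one cluster, so each output node stands for one of the two categories, without any output data having been presented.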
Competition between the output nodes is the main process that shapes the way the network learns a categorization task, so this form of learning is often also called competitive learning. The development of motor skills (such as reaching) is just one example of self-organization in the human system. It is, therefore, relevant for developmental psychology in its own right, and deserves to be further investigated in a developmental setting. However, in this thesis, the main theme was developing the STC model and showing how it could be applied in a subfield of cognitive science. I cannot claim to have been exhaustive in covering the potential of an interaction between this field and connectionism, although reading Karmiloff-Smith (1992) after a great deal of the work on
this thesis had been done proved to be encouraging. From a developmental point of view she indicates what the possible value of connectionism could be. From my starting point, connectionism, I come to similar conclusions about aspects of such a cooperation (having read some of the literature quoted in the last part of her book). As my background knowledge in the field of developmental psychology is limited, to say the least, it was reassuring to find many similar ideas in the literature. This brings me to the end of this thesis. In the introduction I expressed my intention to present a diversity of subjects while at the same time making an attempt at coherence. I have tried to bring together several areas of cognitive research, to present a multidisciplinary picture of connectionism and development, in the spirit of my interpretation of the aims of CKI. I hope that this last chapter has helped to put most of the pieces outlined in this thesis into place. I believe that STC is a good model for describing and prescribing progress in the connectionist paradigm, and I believe that developmental psychology can benefit greatly from a multidisciplinary approach, using insights and backgrounds from the several fields of research I have described.
References

Anderson, J.A., Rossen, M.L., Viscuso, S.R. and Sereno, M.E. (1990). Experiments with representation in neural networks: object motion, speech and arithmetic. In: Synergetics of cognition, Haken, H. and Stadler, M. (Eds.). Springer, Berlin.
Bates, E.A. and Elman, J.L. (1993). Connectionism and the study of change. In: Brain development and cognition. A reader, Johnson, M.H. (Ed.). Blackwell Publishers, Cambridge, Massachusetts.
Bechtel, W. and Abrahamsen, A.A. (1991). Connectionism and the mind. An introduction to parallel processing in networks. Blackwell Publishers, Cambridge, Massachusetts.
Boers, E.J.W., Kuiper, H., Happel, B.L.M. and Sprinkhuizen-Kuyper, I.G. (1993). Designing modular artificial neural networks. Report nr. 93-24, Rijksuniversiteit Leiden.
Brooks, R.A. (1991). How to build complete rather than isolated cognitive simulators. In: Architectures for Intelligence: the 22nd Carnegie Mellon Symposium on Cognition.
Chalmers, A.F. (1984). Wat heet wetenschap. Boom Meppel, Amsterdam.
Clark, A. (1991). Systematicity, structured representations and cognitive architecture: A reply to Fodor and Pylyshyn. In: Connectionism and the philosophy of mind, Horgan, T. and Tienson, J. (Eds.). Kluwer Academic Press, Dordrecht.
Cliff, D.T. (1990). Computational neuroethology: a provisional manifesto. Cognitive Science Research Paper, serial number CSRP 162, University of Sussex, UK.
Coltheart, M. (1994). Connectionist modelling and cognitive psychology. In: Collected papers from a symposium on connectionist models and psychology, Wiles, J., Latimer, C. and Stevens, C. (Eds.). Technical report no. 289, University of Queensland, Australia.
De Groot, A.D. (1975). Methodologie. Grondslagen van onderzoek en denken in de gedragswetenschappen. Mouton & Co, 's Gravenhage.
Dennett, D. (1984). Cognitive wheels: the frame problem of AI. In: Minds, machines and evolution, Hookway, C. (Ed.). Cambridge University Press.
Dreyfus, H.L. and Dreyfus, S.E. (1990). Making a mind vs.
modelling the brain: AI back at a branchpoint. In: The philosophy of artificial intelligence, Boden, M. (Ed.). Oxford University Press.
Edelman, G.M. (1987). Neural Darwinism: The theory of neuronal group selection. Basic Books, New York.
Elman, J.L. (1989). Representation and structure in connectionist models. Technical Report no. 8903, Center for Research in Language, University of California, San Diego.
Elman, J.L. (1993). Learning and development in neural networks: the importance of starting small. Cognition, Vol. 48, pp. 71-99.
Fodor, J.A. and Pylyshyn, Z.W. (1988). Connectionism and cognitive architecture: a critical analysis. Cognition, Vol. 28, pp. 3-71.
French, R.M. (1991). Using semi-distributed representations to overcome catastrophic forgetting in connectionist networks. CRCC Technical Report 51-1991, Indiana University, Bloomington.
Gleitman, H. (1986). Psychology, 2nd edition. W.W. Norton and Company, New York.
Greenough, W.T., Black, J.E. and Wallace, C.S. (1993). Experience and brain development. In: Brain development and cognition. A reader, Johnson, M.H. (Ed.). Blackwell Publishers, Cambridge, Massachusetts. First appeared in Child Development, Vol. 58 (1987), pp. 539-559.
Grossberg, S. (1988). Competitive learning: from interactive activation to adaptive resonance. In: Neural networks and natural intelligence, Grossberg, S. (Ed.), pp. 213-245. First appeared in: Cognitive Science, Vol. 11 (1987), pp. 23-63.
Happel, B.L.M. and Murre, J.M.J. (1994). The design and evolution of modular neural network architectures. To appear in: Neural Networks.
Jackendoff, R. (1989). Consciousness and the computational mind. The MIT Press, Cambridge, Massachusetts.
Karmiloff-Smith, A. (1992). Beyond modularity. A developmental perspective on cognitive science. The MIT Press, Cambridge, Massachusetts.
Karmiloff-Smith, A. (1993). Self-organization and cognitive change. In: Brain development and cognition. A reader, Johnson, M.H. (Ed.), pp. 592-618. Blackwell Publishers, Cambridge, Massachusetts.
Kolb, B. (1993). Brain development, plasticity and behavior. In: Brain development and cognition. A reader, Johnson, M.H. (Ed.), pp. 338-356. Blackwell Publishers, Cambridge, Massachusetts.
Kolb, B. and Whishaw, I.Q. (1990). Fundamentals of human neuropsychology, third edition. W.H. Freeman and Company, New York.
Kosslyn, S.M. (1990). Mental imagery. In: Visual cognition and action. An invitation to cognitive science, Vol. 2, Osherson, D.N., Kosslyn, S.M. and Hollerbach, J.M. (Eds.). The MIT Press, Cambridge, Massachusetts.
Leary, D.E. (Ed.) (1990). Metaphors in the history of psychology. Cambridge University Press.
Lenneberg, E.H. (1993; 1967). Toward a biological theory of language development. In: Brain development and cognition. A reader, Johnson, M.H. (Ed.), pp. 39-46. Blackwell Publishers, Cambridge, Massachusetts. Excerpt from: Lenneberg, E.H. (1967). Biological foundations of language. John Wiley & Sons, chapter 9, pp. 371-380.
Mandler, G. (1984). Mind and body. Psychology of emotion and stress. W.W. Norton & Company, New York.
Mayr, E. (1982). The growth of biological thought: Diversity, evolution, and inheritance. Harvard University Press, Cambridge.
McClelland, J.L. and Jenkins, E. (1991). Nature, nurture and connections: implications of connectionist models for cognitive development. In: Architectures for Intelligence: the 22nd Carnegie Mellon Symposium on Cognition, pp. 41-73.
McCloskey, M. and Cohen, N.J. (1989). Catastrophic interference in connectionist networks: the sequential learning problem. In: The Psychology of Learning and Motivation, Vol. 24, pp. 109-165.
Molenaar, P.C.M. (1986). On the impossibility of acquiring more powerful structures: a neglected alternative. In: Human Development, Vol. 29, pp. 245-251.
Murre, J.M.J., Phaf, R.H. and Wolters, G. (1992). CALM: Categorizing and Learning Module. In: Neural Networks, Vol. 5, pp. 55-82.
Newell, A. and Simon, H.A. (1981). Computer science as an empirical inquiry: symbols and search. In: Mind design, Haugeland, J. (Ed.). The MIT Press.
Nolfi, S., Miglino, O. and Parisi, D. (1994). Phenotypic plasticity in evolving neural networks. Technical Report PCIA-94-05, Institute of Psychology, C.N.R.-Rome.
Oatley, K. (1992). Best laid schemes. The psychology of emotions. Cambridge University Press, Cambridge.
Oliphant, G. (1994). Connectionism, psychology and science. In: Collected papers from a symposium on connectionist models and psychology, Wiles, J., Latimer, C. and Stevens, C. (Eds.). Technical report no. 289, University of Queensland, Australia.
Olthof, T. (1994). Naar een alternatief voor het idee van de natuurlijke ontwikkeling. In: Nederlands Tijdschrift voor Psychologie, nr. 49, pp. 178-190.
Phaf, R.H. and Murre, J.M.J. (1989). Cognitie onder de microscoop. In: Vensters op de geest. Grafiet, Utrecht.
Piaget, J. (1993). The epigenetic system and the development of cognitive functions. In: Brain development and cognition. A reader, Johnson, M.H. (Ed.). Blackwell Publishers, Cambridge, Massachusetts. Excerpt from: Piaget, J. (1971). Biology and knowledge, section 2, pp. 14-23. Edinburgh University Press and University of Chicago Press.
Pinker, S. and Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. In: Connections and symbols, Pinker, S. and Mehler, J. (Eds.). Elsevier, Amsterdam.
Plunkett, K. and Marchman, V. (1990).
From rote learning to system building. Acquiring verb morphology in children and connectionist nets. CRL Technical Report 9020, University of California, San Diego, La Jolla.
Plunkett, K. and Sinha, C. (1992). Connectionism and developmental theory. In: The British Journal of Developmental Psychology, Vol. 10, pp. 209-254.
Prusinkiewicz, P. and Hanan, J. (1989). Lindenmayer systems, fractals and plants. Springer-Verlag, New York.
Rumelhart, D. (1989). The architecture of mind. In: Foundations of cognitive science, Posner, M. (Ed.), pp. 133-159. The MIT Press.
Rumelhart, D.E. and McClelland, J.L. (Eds.) (1986a). Parallel Distributed Processing. Explorations in the microstructure of cognition. Volume 1: Foundations. The MIT Press, Cambridge, Massachusetts.
Rumelhart, D.E. and McClelland, J.L. (Eds.) (1986b). Parallel Distributed Processing. Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. The MIT Press, Cambridge, Massachusetts.
Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, Vol. 11, pp. 1-74.
Smolensky, P. (1991). Connectionism, constituency, and the language of thought. In: Meaning in Mind, Loewer, B. and Rey, G. (Eds.), pp. 201-227. Basil Blackwell.
Smolensky, P., Legendre, G. and Miyata, Y. (1992). Principles for an integrated connectionist/symbolic theory of higher cognition. Report CU-CS-600-92, Computer Science Department, University of Colorado, Boulder; Report 92-08, Institute of Cognitive Science, University of Colorado, Boulder; Report 92-1-02, School of Computer and Cognitive Sciences, Chukyo University.
Spelke, E.S. (1990). Origins of visual knowledge. In: Visual cognition and action. An invitation to cognitive science, Vol. 2, Osherson, D.N., Kosslyn, S.M. and Hollerbach, J.M. (Eds.). The MIT Press, Cambridge, Massachusetts.
Tijsseling, A.G. (1994). A hybrid framework for categorization. Unpublished Master thesis, Utrecht University.
Tucker, D.M. (1992). Developing emotions and cortical networks. In: Developmental behavioral neuroscience. The Minnesota Symposia on child psychology, Vol. 24, Gunnar, M.R. and Nelson, C.A. (Eds.), pp. 75-128. Lawrence Erlbaum Associates, Hillsdale, New Jersey.
Turkewitz, G. and Kenny, P.A. (1993; 1982). Limitations on input as a basis for neural organization and perceptual development: A preliminary theoretical statement. In: Brain development and cognition. A reader, Johnson, M.H. (Ed.), pp. 510-522. Blackwell Publishers, Cambridge, Massachusetts. First appeared in Developmental Psychobiology, Vol. 15 (1982), pp. 357-368.
Van Besien, F. (1989). Metaphors in scientific language. In: Communication and Cognition, Vol. 22.
Van der Maas, H.L.J. and Molenaar, P.C.M. (1992).
Stagewise cognitive development: an application of catastrophe theory. In: Psychological Review, Vol. 99, No. 3, pp. 395-417.
Van der Maas, H.L.J., Verschure, P.F.M.J. and Molenaar, P.C.M. (1990). A note on chaotic behavior in simple neural networks. In: Neural Networks, Vol. 3, pp. 119-122.
Van Geert, P. (1991). A dynamic systems model of cognitive and language growth. Psychological Review, Vol. 98, pp. 3-53.
Winograd, T. (1980). What does it mean to understand language? Cognitive Science, Vol. 4, pp. 209-241.