In: Proceedings of the GALA ’97 Conference on Language Acquisition (1997), pp 393–398. Edinburgh, UK: HCRC
A Constructivist Neural Network Learns the Past Tense of English Verbs

Gert Westermann
Centre for Cognitive Science, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW
[email protected]
Abstract

A constructivist neural network is presented that models the acquisition of the past tense of English verbs. The network constructs its architecture in response to the learning task, in accordance with neurobiological and psychological evidence. The model outperforms other connectionist and symbolic models both in learning and in displaying psychologically realistic learning and generalization behavior. It is argued that the success of the network is due to its constructivist nature, and that the distinction between fixed-architecture and constructivist models is fundamental. Given this distinction, constructivist systems constitute better models of cognitive development.
1. Introduction

The acquisition of the English past tense has in recent years become a touchstone for different theories of language acquisition and of cognition in general. Different theories and models have not only been used in the debate between proponents of symbolic and connectionist accounts of language learning, but have also raised the question of whether regular and irregular past tense forms necessarily rely on different mechanisms for their formation, or whether a single mechanism can account for both. While most connectionist accounts (Rumelhart & McClelland 1986, MacWhinney & Leinbach 1991, Plunkett & Marchman 1993) argue that a homogeneous architecture is sufficient for both kinds of forms, hybrid theories (Pinker 1991) explain regular cases with a rule and irregulars with an associative memory. There exist, however, also modular connectionist (Westermann & Goebel 1995) and homogeneous symbolic (Ling & Marinov 1993) models of inflection acquisition. What is common to most of these models is that they rely on a fixed, pre-defined architecture, an assumption which, as argued below, is unrealistic and poses severe problems for their usefulness as models of cognitive development.

In this paper, a constructivist neural network for learning the past tense is described that builds its architecture in response to the learning task. The network is compared with three other implemented models of past tense acquisition: the original pattern associator (Rumelhart & McClelland 1986, R&M), the improved backpropagation network by MacWhinney & Leinbach (1991) (M&L),
which took the extensive criticism of the R&M model into account, and the Symbolic Pattern Associator (SPA, Ling & Marinov 1993), which took up the challenge posed by M&L to present an implemented symbolic system (rather than just a theory) for past tense acquisition. It is shown that the constructivist neural network presented here outperforms the existing connectionist and symbolic models both in learning the task and in displaying psychologically realistic learning and generalization behavior. It will be argued that such a model can help bridge the gap between symbolic and connectionist, and between modular and homogeneous, theories of inflection acquisition.

The rest of this paper is organized as follows: in section 2 the argument is made that constructivist development is a necessary condition for realistic models of cognitive development. In section 3 a specific constructivist neural network algorithm, Supervised Growing Neural Gas, is described, which was used for the simulations in this paper. Section 4 is concerned with the experimental setup, and in sections 5, 6, and 7 the results of the simulations are analyzed with respect to learning, the U-shaped learning curve, and generalization performance, respectively. These results are then discussed in section 8.
2. Why Constructivist Learning?

Cognitive development has recently been argued to correlate closely with the structural development of the cortex, with an increase in structural complexity leading to an increase in cognitive capacities (Quartz & Sejnowski 1998, Johnson 1997). In order to understand the principles of cognitive development it is therefore important to take the mechanisms of brain development into account. Recent work in this area has provided evidence that the development of cortex is activity-dependent on several levels (see e.g. Van Ooyen 1994): activity can determine the rate and direction of dendritic and axonal growth and the formation of synapses (e.g., Quartz & Sejnowski 1998). Stabilization and loss of these synapses are also activity-dependent (Fields & Nelson 1992). It has further been shown in transplantation and rewiring studies that cortical areas are not innately prespecified to assume a certain functionality, but readily adapt to process afferent signals from different domains (O'Leary 1989). Further, cortex remains flexible to a certain degree throughout life, with dendritic density increasing until a late age (Uylings et al. 1978).

These results indicate that neural development proceeds in a constructivist way: the neural organization of the brain is modified through constructive and regressive events arising from complex interactions between genetic predispositions and environmental inputs. Cognitive development that is based on cortical development will thus proceed in the same constructivist way, with activity-dependent architectural modifications leading to increasingly complex cognitive representations.

Most significantly, research in learning theory (Baum 1989, Quartz 1993) has shown that incorporating activity-dependent structural modification into a learning system is not just a way to tune performance, but leads to entirely different learning properties of that system, evading many of the problems associated with fixed-architecture systems. Such constructivist systems can overcome Fodor's paradox (Fodor 1980), which claims that no new representations can be learned and thus argues for innate representations (Quartz 1993), and they can overcome the prohibitive time complexity of even simple learning tasks in fixed-architecture systems (Baum 1989).

Any model which aims to capture the essential properties of human cognitive development must take these results into account: cognitive models should therefore employ neural networks which, like the brain, adapt their architecture in a way specific to the learning task. Such models can be called constructivist networks, reflecting their proximity to the constructivist developmental theories of Piaget, in which structural modification of the learning system occurs in response to environmental input. In this paper a constructivist neural network model is employed for the simulation of past tense acquisition and compared with previous past tense models. This allows an empirical assessment of the suitability of constructivist networks for modeling cognitive development.

3. The Supervised Growing Neural Gas Algorithm

A great number of constructivist neural network algorithms now exist, most of which have been designed to overcome the shortcomings of fixed-architecture networks (the need to choose a predefined architecture, slow learning, and the uniform allocation of resources to tasks of varying complexity). For the cognitive simulations described here, a modified version of the Supervised Growing Neural Gas (SGNG) algorithm (Fritzke 1994) was used, because it incorporates constructive and regressive events that depend on the learning task, and because it provides mechanisms to produce outputs based on both the structure and the identity of input signals, conforming to both neurobiological and psychological evidence.

[Figure 1 appears here: a Gaussian receptive field plotted over the input space, viewed from the side and from the top; axes: input x, unit position w_c, activation.]
Figure 1: A Gaussian activation function which acts as a receptive field for nearby inputs (viewed from the side and from the top).

The SGNG algorithm constructively builds the hidden layer of a radial basis function (RBF) network. Such an RBF network differs from the more common backpropagation networks in that its hidden units do not have a sigmoid but a Gaussian, 'bell-shaped' activation function (see figure 1). This allows each hidden unit to be active only for inputs within a certain range (as opposed to being active for all inputs above a certain threshold, as with sigmoidal units), and it can thus be viewed as a receptive field for a region of the input space. All input vectors obtain a position in this space (determined by their values), and the hidden units are placed at different positions to cover the whole space. Hidden units are activated by an input if it falls within their receptive fields, and the closer the input is to the center of the field, the more strongly the unit is activated. The problem in building RBF networks is to decide on the number and positions of the hidden units, because inputs falling into a common receptive field will lead to similar outputs. The SGNG algorithm solves this problem by building the hidden layer and adding units when and where they are needed.

The algorithm starts with just two hidden units. When an input is presented to the network, the hidden unit which is closest to this input (i.e., the winning unit) is moved towards the input signal, together with its direct topological neighbors; this prevents hidden units from remaining in regions of the input space where no inputs occur, and hidden units which never win eventually die off. The activation from the hidden units is propagated to the output and the output error is calculated. This error is added to a local counter variable of the winning unit, and the weights between the hidden and output units are adjusted (e.g., with the delta rule). A new hidden unit is inserted when the performance of the network no longer improves in the current architecture (i.e., when the error decreases by less than a certain value within a certain number of epochs). The new unit is inserted next to the hidden unit which has accumulated the highest error (note that only winning units can accumulate error). The idea here is that a winning unit which produces a high output error is inadequate (because it covers inputs with conflicting outputs), and
more structural resources are needed in that area. On insertion of a unit, the sizes of the receptive fields are shrunk so that they slightly overlap with each other; in effect, this leads to a more "fine-grained" resolution in that area of the input space.

[Figure 2 appears here: two panels, "Initial" and "Final", showing receptive fields covering a two-dimensional input space.]

Figure 2: Covering of the two-dimensional input space by receptive fields at the beginning (left) and the end (right) of learning.

Figure 2 shows a hypothetical start and end state in a two-dimensional input space. While initially only two receptive fields cover the whole of the space, at the end hidden units have been inserted with different densities across the space to account for the specific learning task.
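To make these mechanics concrete, the following is a minimal Python sketch of an SGNG-style network. It follows the description above (Gaussian receptive fields, a moving winner, per-unit error counters, insertion next to the unit with the highest accumulated error), but it is not the paper's implementation: the learning rates, the fixed field width, and the omission of the topological neighbor updates and of unit deletion are simplifying assumptions.

import numpy as np

class SGNGSketch:
    """Sketch of a Supervised Growing Neural Gas network (after Fritzke 1994).

    Simplifications (assumptions, not the original algorithm): fixed receptive
    field widths, no edge/neighborhood bookkeeping, no removal of dead units.
    """

    def __init__(self, dim_in, dim_out, width=0.5, lr_win=0.05, lr_out=0.1):
        self.centers = np.random.rand(2, dim_in)  # start with two hidden units
        self.W = np.zeros((dim_out, 2))           # hidden-to-output weights
        self.err = np.zeros(2)                    # per-unit error counters
        self.width, self.lr_win, self.lr_out = width, lr_win, lr_out

    def step(self, x, target):
        d = np.linalg.norm(self.centers - x, axis=1)
        win = int(np.argmin(d))                   # winning (closest) unit
        # Move the winner towards the input (neighbor updates omitted here).
        self.centers[win] += self.lr_win * (x - self.centers[win])
        act = np.exp(-(d / self.width) ** 2)      # Gaussian receptive fields
        error = target - self.W @ act
        self.W += self.lr_out * np.outer(error, act)  # delta rule
        sq_err = float(error @ error)
        self.err[win] += sq_err                   # only the winner accumulates error
        return sq_err

    def grow(self):
        """Insert a new unit next to the unit with the highest accumulated error."""
        worst = int(np.argmax(self.err))
        new = self.centers[worst] + 0.01 * np.random.randn(self.centers.shape[1])
        self.centers = np.vstack([self.centers, new])
        self.W = np.hstack([self.W, self.W[:, worst:worst + 1]])  # copy weights
        self.err = np.append(self.err, 0.0)
        self.err[worst] = 0.0                     # reset the split unit's counter

In a full implementation, inserting a unit would also shrink the neighboring receptive fields so that they overlap only slightly, as described above.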
[Figure 3 appears here: diagram of the network, with an input layer (template) receiving the stem "bring" (b-r-I-N), a hidden layer with Gaussian units, and an output layer (template) producing "brought" (b-r-O-t); direct connections link the input layer to the output layer.]

Figure 3: The initial SGNG network modified with direct input-output connections. All layers are fully connected.

Figure 3 shows the whole SGNG network. For the cognitive simulations described here, the original SGNG network was extended with direct connections between the input and the output layer. These connections allow the past tense to be produced through a direct structural transformation of the input stem. By contrast, the (growing) hidden layer acts as a memory: it produces an output based on the identity, and not the structure, of the input verb. Initially, though, similar input verbs fall into the same receptive fields even when they require different outputs (e.g., hear and fear requiring heard and feared, respectively). This problem is overcome in the training of the network through the insertion of new receptive fields in the area of such verbs, and eventually similar verbs with dissimilar past tense forms will be discriminated. When similar verbs lead to similar outputs, however (e.g., look and cook with looked and cooked), no new receptive field will be inserted there, and one such field will cover several verbs without producing output error. Thus the internal structure of the network will adapt to reflect the learning task, and observing this adaptation can lead to insights into the past tense learning process. The next section describes the specific simulations that were undertaken with the SGNG network model.

4. Experiments

For our simulations, we borrowed the corpus from MacWhinney & Leinbach (1991), which consists of 1,404 stem/past tense pairs of English verbs. This corpus was also used by Ling & Marinov (1993) in their SPA, allowing direct comparisons between models. The verbs were transcribed using UNIBET and, following MacWhinney & Leinbach (1991), represented in a templated format containing slots for consonants and vowels. Table 1 shows examples of the templated phonological encoding of some verbs. Each phoneme was represented by ten features, such as voiced, labial, and dental for consonants, and front, center, and high for vowels. A template consisted of 18 slots, resulting in a 180-bit feature vector for the representation of each verb.

Input        Encoding             Output       Encoding
bring        br-I-N-----------    brought      br-O-t-----------
explain      ---I-ksp--l--e-n-    explained    ---I-ksp--l--e-nd
point        p--2-nt----------    pointed      p--2-nt-I-d------
recognize    r--E-k--I-gn-3-z-    recognized   r--E-k--I-gn-3-zd
shake        S--e-k-----------    shook        S--U-k-----------

Table 1: Some examples of the template encoding of verb pairs. Each encoding is aligned over the 18-slot consonant/vowel template CCCVVCCCVVCCCVVCCC.
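To illustrate the encoding, the sketch below turns a slot string from Table 1 into a 180-bit vector. The ten-feature inventory and the phoneme-to-feature table shown here are hypothetical stand-ins (the paper names only example features such as voiced, labial, dental, front, center, and high), so the specific feature assignments are assumptions.

# Hypothetical ten-feature inventory; the paper lists only example features,
# so this particular set and the assignments below are assumptions.
FEATURES = ["voiced", "labial", "dental", "velar", "nasal",
            "front", "center", "high", "low", "round"]

# Tiny illustrative phoneme table (UNIBET symbols as used in Table 1).
PHONEMES = {
    "b": {"voiced", "labial"},
    "r": {"voiced"},
    "I": {"voiced", "front", "high"},
    "N": {"voiced", "velar", "nasal"},
    "-": set(),                                  # empty template slot
}

def encode(slots):
    """Map an 18-slot template string onto a 180-bit feature vector
    (10 binary features per slot)."""
    vec = []
    for ch in slots:
        feats = PHONEMES.get(ch, set())
        vec.extend(1 if f in feats else 0 for f in FEATURES)
    return vec

bits = encode("br-I-N" + "-" * 12)               # 'bring' over the CCCVV... template
assert len(bits) == 180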
From the original corpus of 24,802 tokens, 8,000 tokens were randomly extracted according to the frequency of their past tense forms. The resulting training corpus thus consisted of 8,000 tokens (57.2% regular, 42.8% irregular), corresponding to 1,066 types (88.4% regular, 11.6% irregular). Training of the SGNG network proceeded in a non-incremental fashion: the whole training set of 8,000 stem/past tense pairs was presented to the network in random order at each epoch. Hidden units were inserted depending on the learning progress (see section 3), and the network was tested for its performance on the training set prior to each insertion.
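Read together with the SGNG sketch above, this training regime might look as follows; the tolerance and patience values that operationalize "the error decreases by less than a certain value within a certain number of epochs" are illustrative assumptions.

import random

def train(net, pairs, epochs=1000, patience=20, tol=1e-3):
    """Non-incremental training: present all stem/past-tense pairs in random
    order each epoch; grow the hidden layer when the error plateaus.
    `net` is assumed to be an SGNGSketch-like object whose step() returns
    the squared output error for one pair."""
    prev_best, wait = float("inf"), 0
    for epoch in range(epochs):
        random.shuffle(pairs)
        total = sum(net.step(x, t) for x, t in pairs)
        if prev_best - total < tol:        # no sufficient improvement
            wait += 1
            if wait >= patience:           # performance has stagnated:
                net.grow()                 # insert a new hidden unit
                wait = 0
        else:
            wait = 0
        prev_best = min(prev_best, total)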
5. Learning Results

After 912 epochs, the network produced 100% of the irregular and 99.8% (all but two) of the regular past tense forms correctly. At that point the network had inserted a total of 400 hidden units; on average, therefore, each hidden unit's receptive field covered 2.67 verbs.

Model            Verb types   Total   Regulars   Irregulars
R&M              420          97.0    98.0       95.0
M&L              1,650        99.3    100.0      90.7
SPA              1,038        99.2    99.6       96.6
Constructivist   1,066        99.8    99.8       100.0

Table 2: Training performance (percentage correct) of the four compared models (extended from Ling & Marinov 1993).

Table 2 shows a comparison of the training results of the different models. While all models performed almost equally well on regular verbs, the constructivist network clearly outperformed the other models on irregular verbs. This result indicates that the ability to add structure where needed was advantageous in that it allowed the network to allocate resources specifically for learning the irregular past tense forms. This was confirmed by an analysis of the hidden layer: while on average 2.7 regular verbs clustered to the same hidden unit (with a maximum of 16 regular verbs in one receptive field), this number was only 1.1 for irregular verbs. This result clearly shows the advantage of constructivist learning over learning in fixed-architecture systems: resources are not evenly distributed a priori to handle all cases, but can be specifically allocated for the more difficult, exceptional cases, while fewer resources are needed for the easy, regular cases.
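One way such a hidden-layer analysis might be computed is sketched below: assign every verb to its winning unit and compare average cluster sizes for regular and irregular verbs. The function and variable names are hypothetical.

import numpy as np
from collections import Counter

def mean_cluster_sizes(inputs, centers, regular_mask):
    """For each encoded verb, find its winning hidden unit (nearest center),
    then return the mean receptive-field occupancy for regulars and irregulars.
    `regular_mask` is assumed to be a boolean numpy array over the verbs."""
    winners = np.array([int(np.argmin(np.linalg.norm(centers - x, axis=1)))
                        for x in inputs])
    occupancy = Counter(winners)                 # verbs per receptive field
    per_verb = np.array([occupancy[w] for w in winners], dtype=float)
    return per_verb[regular_mask].mean(), per_verb[~regular_mask].mean()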
6. U-shaped Learning Curve

A plausible model of past tense acquisition should follow the documented course of acquisition in children, that is, the extensively studied U-shaped learning curve (see e.g. Marcus et al. 1992): while children initially produce a number of irregular verbs correctly, they subsequently overregularize the same verbs and only in a final step produce them correctly again. This phenomenon has traditionally been explained by the inappropriate application of a linguistic rule, but connectionist theories have argued that it can arise from regularities in the subtly changing speech environment of the child. The constructivist model described in this paper displayed a U-shaped learning curve for many of the irregular verbs: a period of overregularization was preceded by a phase of correct production of the past tense form; this was the case, e.g., for knew, sat, made, took, and said. Often, all forms were produced (irregular past, stem + regular ending, irregular past + regular ending; e.g., knew – knowed – knewed – knew). For other, less frequent verbs (e.g., wet, sell, cost), the first sampled past tense form was overregularized. This corresponds to data on overregularization in children (Marcus et al. 1992):
While in the corpora of Adam, Eve, and Sarah several past tense forms were produced correctly before their first overregularization, this was not always the case. It is unclear whether this is due to a lack of speech samples, but presumably a child would overregularize an infrequent verb on its first use. The network displayed psychologically plausible behavior in more specific respects as well: on average, the more frequent verbs were overregularized less often than the less frequent ones. Further, clusters of irregular verbs acted as protection against overregularization: the verbs ring, sing, and spring were overregularized in only 2.9% of all cases. By contrast, the verbs hang, slide, and bear, which had a comparable token frequency, had an average overregularization rate of 15.4%. The constructivist network model was thus successful in modeling the left side of the U-shaped learning curve, i.e., the correct production of past tense forms before their subsequent overregularization, and its performance corresponded to the details of children's past tense learning.

How does the U-shaped learning in the constructivist network occur? Since the verb set was held constant throughout training, the change in network performance can only be a consequence of the internal reorganization of the network architecture. Initially, the network had only two hidden units, which were of little use since each covered about half of all verbs with their varied past tense forms, and the network therefore had to rely on the direct input-output connections for producing the past tense forms. Given these restrictions, the network initially learned to produce the past tense forms of the frequent irregulars (because they were so frequent) and of many regular verbs (because there were so many of them). When learning stagnated and no more past tense forms could be produced correctly, the network gradually grew its hidden layer, adding more receptive fields which were then used to account for more past tense forms. The output forms were produced through a 'collaboration' of the direct connections with the newly established hidden unit connections. The growing process led to a phase in which representations were already being relocated to the hidden layer, but the few receptive fields were large and included verbs that required different past tense forms, thereby leading to errors even for verbs that had initially been produced correctly through the direct input-output connections alone. This phase corresponds to the overregularization stage in children. It is evident that with this mechanism, different verbs would be overregularized at different times, depending on whether they had been allocated an individual receptive field. This process of internal reorganization of the network's representations becomes evident in figure 4, which shows the learning curves for the regular and irregular past tense forms, but also
how many of these forms were still produced correctly when the hidden layer was lesioned and only the direct input-output connections were used.
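The lesion test can be pictured as dropping the hidden-layer term from a two-pathway output; below is a minimal sketch under the same assumptions as the earlier code (weight matrices and names are hypothetical).

import numpy as np

def forward(x, centers, width, W_hidden, W_direct, lesion_hidden=False):
    """Two-pathway output: direct input-output connections plus the Gaussian
    hidden layer. With lesion_hidden=True the memory pathway is removed,
    leaving only the direct structural transformation of the stem."""
    out = W_direct @ x                           # direct pathway
    if not lesion_hidden:
        d = np.linalg.norm(centers - x, axis=1)
        act = np.exp(-(d / width) ** 2)          # receptive field activations
        out = out + W_hidden @ act               # memory pathway
    return out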
[Figure 4 appears here: learning curves plotting percentage correct (0–100) against training epoch (0–1000) for four conditions: regulars, regulars with lesioned hidden layer, irregulars, and irregulars with lesioned hidden layer.]

Figure 4: The learning curves for the regular and irregular past tense forms in the intact network and with a lesioned hidden layer.

Initially, with few hidden units, lesioning the hidden layer did not lead to a strong decrease in network performance: with or without the hidden layer, about 20% of the irregular and 60% of the regular past tense forms were initially produced correctly. As the hidden layer grew, however, lesioning led to a marked decrease in performance, and at around epoch 200, when the network had constructed 91 hidden units, deleting these units resulted in none of the irregular and only 7.2% of the regular verbs being produced correctly. This confirmed the collaboration of the two pathways (direct input-output and via the hidden units) in producing the past tense forms of most verbs, and it showed that even the representations of initially correct verbs were transferred from the direct connections into the growing hidden layer, leading in many cases to the temporarily incorrect production of initially correct past tense forms. The internal reorganization of the network due to the constructivist adaptation of its structure can thus account for the unlearning of initially correct outputs and for the U-shaped learning curve in the acquisition of the English past tense.
7. Generalization

The network was also tested on its generalization to pseudo-verbs. As Ling & Marinov (1993) pointed out, testing the generalization ability of a model on existing verbs is misleading because irregular verbs are by their nature unpredictable; in line with Ling & Marinov (1993), we therefore used the set of 60 pseudo-verbs which had been devised by Prasada & Pinker (1993) and tested by them on human subjects. These verbs consisted of blocks of ten that were prototypical, intermediate, and distant with respect to existing regular and irregular verbs. The results of the generalization experiments are shown in figure 5. The generalization performance of the constructivist network was similar to that of human subjects for both regular and irregular cases. It performed similarly to the SPA, and better than the R&M network model.

[Figure 5 appears here: bar charts (scale 0–10) for prototypical (P), intermediate (I), and distant (D) pseudo-verbs, separately for irregulars and regulars, comparing humans, the constructivist network, the SPA, and R&M's network.]

Figure 5: Generalization of the constructivist network to different classes of pseudo-verbs, in comparison with humans, the SPA, and R&M's network (extended from Ling & Marinov 1993). P = Prototypical, I = Intermediate, D = Distant.
8. Discussion

The experiments reported here show empirically that a constructivist neural network can model the acquisition of the English past tense more closely than other, fixed-architecture networks. This is due to the fact that a constructivist network is capable of adding structure when and where it is needed, thereby adapting to the specific learning task, and to the resulting internal reorganization of representations, which led to the U-shaped development that is also found in children's learning.

These results, together with those from learning theory (see section 2), indicate that constructivist learning is superior to learning in fixed-architecture systems. In fact this is also true for the symbolic SPA: this model builds a decision tree in response to the learning task and therefore also constitutes a constructivist system. It is likely that the SPA outperformed both R&M's and M&L's neural network models not because it is symbolic, but because of its constructivist nature. It seems, therefore, that the constructivist/fixed-architecture dichotomy is more fundamental than the symbolic/subsymbolic distinction which previous past tense models have aimed to emphasize. Direct comparisons between symbolic and subsymbolic models can thus only be made fully within or outside the constructivist framework, and, as seen in this paper, models within the constructivist framework conform better to evidence from neural and cognitive development.

Comparing the constructivist network with the constructivist symbolic SPA indicates, however, that the network constitutes the more realistic psychological model: it both learns better than the SPA and explains the U-shaped learning curve more realistically. In the SPA, U-shaped learning was achieved by the explicit manipulation of a learning parameter that controlled how many times a verb had to be seen to be memorized as an exception; if it occurred less often, it was overregularized. Besides "hard-wiring" the theory that children possess such a variable parameter and then using the resulting U-shaped learning curve as evidence for just that theory (a circular argument), this procedure also established an unrealistically direct relationship between the frequency of a verb and its overregularization. In the constructivist network, by contrast, U-shaped learning arose as a direct outcome of the learning algorithm, due to the internal reorganization of the network architecture.

The constructivist network also contradicts the view, often held of connectionist past tense models, that connectionist learning implies a homogeneous architecture. Although learning was based, as in conventional fixed-architecture networks, on the complex interactions of many simple units and on the gradual adjustment of connection weights, the constructivist network developed a "pseudo-modular" architecture in which more space was given to the harder, irregular cases, and in which a memory in the form of hidden unit receptive fields developed in addition to the direct input-output connections. Goebel & Indefrey (in press) and Westermann & Goebel (1995) showed that learning in (fixed-architecture) modular connectionist networks models cognitive development more closely than in homogeneous architectures, and the present paper shows how a similar modular architecture can develop in a constructivist framework. The results obtained with the present and with previous past tense models thus suggest extending the common symbolic/connectionist distinction with the dimensions modular/homogeneous and fixed-architecture/constructivist. Given this three-dimensional classification matrix, the present paper indicates that connectionist, modular, constructivist systems constitute the most realistic models of cognitive development in the child.

Future work will address an extension of the SGNG algorithm: in its present form, it only learns to discriminate between similar inputs requiring different outputs (such as hear and fear), but its hidden layer has no mechanism for integrating different inputs requiring similar outputs, such as note and decide. Further research will also involve assessing the neurobiological plausibility of the constructivist growth process and modifying it to that end. Such research might then contribute to the understanding of the connection between neural and cognitive development, an area which is only beginning to be addressed.
9. Acknowledgements

This research was supported by the ESRC (award no. R00429624342) and by the Gottlieb Daimler- und Karl Benz-Stiftung (grant no. 02.95.29).
References

Baum, E. B. (1989), 'A proposal for more powerful learning algorithms', Neural Computation 1, 201–207.
Fields, R. D. & Nelson, P. G. (1992), 'Activity-dependent development of the vertebrate nervous system', International Review of Neurobiology 34, 133–214.
Fodor, J. (1980), Fixation of belief and concept acquisition, in M. Piattelli-Palmarini, ed., 'On Language and Learning: The Debate between Jean Piaget and Noam Chomsky', Routledge & Kegan Paul, London and Henley, pp. 143–149.
Fritzke, B. (1994), 'Fast learning with incremental RBF networks', Neural Processing Letters 1, 2–5.
Goebel, R. & Indefrey, P. (in press), The performance of a recurrent network with short term memory capacity learning the German s-plural, in P. Broeder & J. Murre, eds, 'Cognitive Models of Language Acquisition', MIT Press, Cambridge, MA.
Johnson, M. H. (1997), Developmental Cognitive Neuroscience, Blackwell, Oxford, UK; Cambridge, MA.
Ling, C. X. & Marinov, M. (1993), 'Answering the connectionist challenge: A symbolic model of learning the past tenses of English verbs', Cognition 49, 235–290.
MacWhinney, B. & Leinbach, J. (1991), 'Implementations are not conceptualizations: Revising the verb learning model', Cognition 40, 121–157.
Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, T. J. & Xu, F. (1992), 'Overregularization in language acquisition', Monographs of the Society for Research in Child Development, Serial No. 228, Vol. 57, No. 4.
O'Leary, D. D. M. (1989), 'Do cortical areas emerge from a protocortex?', Trends in Neuroscience 12, 400–406.
Pinker, S. (1991), 'Rules of language', Science 253, 530–535.
Plunkett, K. & Marchman, V. (1993), 'From rote learning to system building: Acquiring verb morphology in children and connectionist nets', Cognition 48, 21–69.
Prasada, S. & Pinker, S. (1993), 'Generalization of regular and irregular morphological patterns', Language and Cognitive Processes 8(1), 1–56.
Quartz, S. R. (1993), 'Neural networks, nativism, and the plausibility of constructivism', Cognition 48, 223–242.
Quartz, S. R. & Sejnowski, T. J. (1998), 'The neural basis of cognitive development: A constructivist manifesto', Behavioral and Brain Sciences 21.
Rumelhart, D. E. & McClelland, J. L. (1986), On learning past tenses of English verbs, in D. E. Rumelhart & J. L. McClelland, eds, 'Parallel Distributed Processing, Vol. 2', MIT Press, Cambridge, MA, pp. 216–271.
Uylings, H. B. M., Kuypers, K., Diamond, M. C. & Veltman, W. A. M. (1978), 'Effects of differential environments on plasticity of dendrites of cortical pyramidal neurons in adult rats', Experimental Neurology 62, 658–677.
Van Ooyen, A. (1994), 'Activity-dependent neural network development', Network 5, 401–423.
Westermann, G. & Goebel, R. (1995), Connectionist rules of language, in 'Proceedings of the 17th Annual Conference of the Cognitive Science Society', Erlbaum, pp. 236–241.