Commentary to Clahsen, H. Lexical Entries and Rules of Language: A Multidisciplinary Study of German Inflection. Behavioral and Brain Sciences, 22:6 (1999), 1042–1043.

Single Mechanism but not Single Route: Learning Verb Inflections in Constructivist Neural Networks
Gert Westermann
Institute for Adaptive and Neural Computation, Division of Informatics, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, Scotland, UK
[email protected], http://www.cogsci.ed.ac.uk/~gert/

Abstract
Clahsen’s theory raises problems that make it seem untenable. As an alternative, a constructivist neural network model is reported which develops a modular architecture and in which a single associative mechanism produces all inflections, displaying an emergent dissociation between regular and irregular verbs. Thus, Clahsen’s rejection of associative models of inflection only concerns a sub-group of these models.

The qualitative distinction between the mechanisms for regular and irregular inflections lies at the heart of the dual-mechanism theory adopted by Clahsen: each inflected form is produced either by the default rule or in the associative lexicon. However, the important question of how these two mechanisms interact remains unanswered. The only specific explanation that has been put forward is the Blocking Principle (Marcus et al.; 1995), which states that when an inflection is produced, the lexicon is searched for an entry which, if found, blocks the application of the rule. Although it can intuitively account for a range of psycholinguistic data, an implementation of this principle (Nakisa et al.; 1997) has shown that it raises many problems and yields no advantage over single-route classifiers. The dual-mechanism theory is under-specified in this important respect, and Clahsen’s rejection of fully implemented single-mechanism associative models on the basis of the vague dual-mechanism theory seems premature.

A second problem arising from the assumed qualitative distinction concerns the German mixed verbs: these verbs, which represent 32% of all participle tokens, combine an irregular stem with the regular ending -t (e.g., denken → gedacht). In a dual-mechanism account these verbs have to be considered irregular (because they are not formed by the rule), with the consequence that -t can be both a regular and an irregular ending.

A third problem concerns the acquisition of the English past tense: here, children occasionally make mistakes such as broked and tooked, where the regular ending is attached to an irregular past tense form (e.g., Marcus et al.; 1992). If the two mechanisms of the inflection system are distinct, such blends between the two mechanisms are hard to explain.

A final problem concerns impaired processing: Penke et al.
(1999) found that for German agrammatic aphasics who showed specific deficits for irregular inflections, the only errors occurring with regular verbs were for those regulars that were similar to irregulars and could therefore be viewed as “less regular” than others. An analysis of the errors made with irregular verbs showed that those that were similar to regular verbs were overregularized more often than those that were dissimilar to regulars. Thus, there seemed to be a grading within regular and irregular verbs
which was determined by the similarity to the respective other group. Such phenomena are best explained by associative effects for regulars, which, according to the dual-mechanism theory, should not exist.

Taken together, these points support a view in which there are no qualitatively distinct mechanisms for the production of regular and irregular inflections. Instead, regulars and irregulars can be seen as two ends of a continuum, with mixed verbs and the blends produced by children representing intermediate cases.

An implemented model that is based on this view is a constructivist, single-mechanism neural network that accounts for the phenomena found in past tense acquisition (Westermann; 1997, 1998) and in impaired adult language processing (Westermann et al.; 1999). This model starts with direct connections between the input and the output units, and during the learning process it constructs a hidden layer of receptive fields in response to the input data. The model takes into account recent theoretical arguments (Quartz; 1993) and neurobiological and cognitive developmental evidence for constructivist development (Elman et al.; 1996; Johnson; 1997; Quartz and Sejnowski; 1997). The network model displays a double dissociation between regular and irregular verbs without having to rely on qualitatively distinct mechanisms. Instead, it exploits two representations for each verb: the direct phonological input representation is, through the constructivist learning process, enhanced with similarity-based, localist representations in the hidden layer. The dissociation between regular and irregular verbs emerges because they rely to different degrees on these two representations.
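The architecture just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited simulations: the Gaussian shape of the receptive fields, the delta-rule weight updates, the rule of inserting a new field at the not-yet-covered input pattern with the largest error every `grow_every` epochs, and all names (`ConstructivistNet`, `fit`, `width`, `lr`, `tol`) are hypothetical choices made for the example.

```python
import math


class ConstructivistNet:
    """Direct input-output weights plus a growing layer of Gaussian receptive fields.

    Assumption: field shape, learning rule and growth rule are illustrative only.
    """

    def __init__(self, n_in, n_out, width=0.5, lr=0.1):
        self.n_in, self.n_out = n_in, n_out
        self.width, self.lr = width, lr
        self.direct = [[0.0] * n_in for _ in range(n_out)]  # direct pathway
        self.centres = []                                   # receptive-field centres
        self.hidden = [[] for _ in range(n_out)]            # hidden-to-output weights

    def fields(self, x):
        """Activation of every receptive field for input x."""
        return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c))
                         / (2 * self.width ** 2))
                for c in self.centres]

    def forward(self, x):
        """Output = direct pathway + hidden (receptive-field) pathway."""
        h = self.fields(x)
        return [sum(w * a for w, a in zip(self.direct[k], x))
                + sum(w * a for w, a in zip(self.hidden[k], h))
                for k in range(self.n_out)]

    def train_epoch(self, data):
        """One delta-rule pass; returns (squared error, input) for each pattern."""
        errors = []
        for x, t in data:
            h = self.fields(x)
            err = [tk - yk for tk, yk in zip(t, self.forward(x))]
            for k in range(self.n_out):
                for i in range(self.n_in):
                    self.direct[k][i] += self.lr * err[k] * x[i]
                for j in range(len(h)):
                    self.hidden[k][j] += self.lr * err[k] * h[j]
            errors.append((sum(e * e for e in err), x))
        return errors

    def grow(self, x):
        """Insert a new receptive field centred on input x, with zero output weights."""
        self.centres.append(list(x))
        for k in range(self.n_out):
            self.hidden[k].append(0.0)


def fit(net, data, epochs=500, grow_every=20, tol=1e-3):
    """Train; periodically add a field where the error is largest (assumed rule)."""
    for epoch in range(1, epochs + 1):
        errors = net.train_epoch(data)
        if max(e for e, _ in errors) < tol:
            break
        if epoch % grow_every == 0:
            uncovered = [(e, x) for e, x in errors if list(x) not in net.centres]
            if uncovered:
                net.grow(max(uncovered, key=lambda p: p[0])[1])
    return net
```

On a toy XOR-style mapping with a constant bias input, which the direct connections alone cannot represent, a call such as `fit(ConstructivistNet(3, 1), data)` should add a handful of receptive fields until the residual error is small. The point of the sketch is only that a single associative learning rule operates over both pathways while the structure itself grows in response to the data.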
This account of the dissociation does not imply, however, that the claim of two qualitatively distinct production mechanisms is merely shifted onto two qualitatively distinct representations with all else being equal: both representations are activated for all verbs, but they are exploited to different degrees by regular and irregular verbs. Whereas production of most of the regular participles is based on the direct input representation alone, most irregulars rely mainly on the localist hidden layer. In this way the model accounts easily for the problematic data outlined above: the degree of activation of each pathway determines the degree of (ir)regularity of a verb, and intermediate cases are produced when both pathways are active. The distinction between the mechanisms producing regular and irregular inflections is thus quantitative and not qualitative.

Instead of a dual-mechanism theory I therefore propose a dual-representation model emerging from a constructivist learning process. In this way, a single associative mechanism can account for the dissociation between regular and irregular inflections and avoid the problems of the dual-mechanism theory outlined above.

According to this interpretation, theories and models can be distinguished along three essential dimensions: fixed structure vs. structure emerging from constructivist development; homogeneous architecture vs. modular architecture; and single mechanism vs. multiple mechanisms. The dual-mechanism theory advocated by Clahsen is a fixed, modular-architecture, multiple-mechanism account, and his rejection of associative models is aimed at fixed, homogeneous-architecture, single-mechanism models. The model reported here is a constructivist, modular-architecture yet single-mechanism account that avoids problems of both the dual-mechanism theory and of homogeneous neural networks.
The strength of this model lies in its constructivist nature, which leads to modularization in response to its environmental input and which allows it to account for the observed human data on the basis of a single associative mechanism. The constructivist model thus makes the postulation of two qualitatively distinct mechanisms in the language system unnecessary.
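The quantitative reading of the dissociation can be illustrated with a small Python sketch. It assumes, purely for illustration, that each verb's output is the sum of a direct-pathway contribution and a hidden-pathway contribution; the functions `hidden_share` and `output_loss` and the `severity` parameter are hypothetical and are not taken from the cited simulations.

```python
def hidden_share(direct, hidden):
    """Fraction of the total output drive carried by the hidden pathway.

    0.0 means the form is produced by the direct pathway alone ("fully regular"),
    1.0 means it depends entirely on the hidden layer; values in between
    correspond to intermediate cases. (Illustrative measure, an assumption.)
    """
    d = sum(abs(v) for v in direct)
    h = sum(abs(v) for v in hidden)
    return h / (d + h) if d + h > 0 else 0.0


def output_loss(direct, hidden, severity):
    """Change in output when a fraction `severity` of the hidden drive is removed."""
    intact = [d + h for d, h in zip(direct, hidden)]
    damaged = [d + (1.0 - severity) * h for d, h in zip(direct, hidden)]
    return sum(abs(a - b) for a, b in zip(intact, damaged))
```

Under these assumptions, removing the hidden pathway changes the output of a form with a high `hidden_share` far more than that of a form with a low one, so graded damage yields graded, similarity-dependent errors without any second mechanism.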

Acknowledgements
Thanks to Martina Penke, Richard Shillcock, and David Willshaw for helpful comments on a draft of this paper.

References
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D. and Plunkett, K. (1996). Rethinking Innateness: A Connectionist Perspective on Development, MIT Press, Cambridge, MA.
Johnson, M. H. (1997). Developmental Cognitive Neuroscience, Blackwell, Oxford, UK.
Marcus, G., Brinkmann, U., Clahsen, H., Wiese, R. and Pinker, S. (1995). German inflection: The exception that proves the rule, Cognitive Psychology 29: 189–256.
Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, T. J. and Xu, F. (1992). Overregularization in language acquisition, Monographs of the Society for Research in Child Development, Serial No. 228, Vol. 57, No. 4.
Nakisa, R. C., Plunkett, K. and Hahn, U. (1997). A cross-linguistic comparison of single and dual-route models of inflectional morphology, in P. Broeder and J. Murre (eds), Cognitive Models of Language Acquisition, MIT Press, Cambridge, MA.
Penke, M., Janssen, U. and Krause, M. (1999). The representation of inflectional morphology: Evidence from Broca’s aphasia, Brain and Language 68: 225–232.
Quartz, S. R. (1993). Neural networks, nativism, and the plausibility of constructivism, Cognition 48: 223–242.
Quartz, S. R. and Sejnowski, T. J. (1997). The neural basis of cognitive development: A constructivist manifesto, Behavioral and Brain Sciences 20: 537–596.
Westermann, G. (1997). A constructivist neural network learns the past tense of English verbs, Proceedings of the GALA ’97 Conference on Language Acquisition, HCRC, Edinburgh, UK, pp. 393–398.
Westermann, G. (1998). Emergent modularity and U-shaped learning in a constructivist neural network learning the English past tense, Proceedings of the 20th Annual Conference of the Cognitive Science Society, Erlbaum, Hillsdale, NJ, pp. 1130–1135.
Westermann, G., Willshaw, D. and Penke, M. (1999). A constructivist neural network model of German verb inflections in agrammatic aphasia, Proceedings of ICANN99, pp. 916–921.
