SOME ISSUES IN COMPUTATIONAL MODELLING; OCCAM’S RAZOR AND HEGEL’S HAIR GEL*

RICHARD SHILLCOCK†
School of Informatics and School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK

MATTHEW ROBERTS
School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK

HAMUTAL KREINER
Department of Behavioural Science, Ruppin Academic Center, Israel

MATEO OBREGON
School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK

We look at some of the assumptions made in the computational modelling of cognition. In particular, we look at some of the problems raised by the conventional modelling goal of simplicity. We review why cognitive modellers make certain choices and we emphasise the central role of abstraction. We conclude that Occam’s Razor is only half the story in using implemented computational models to explain cognitive processing, and we raise a number of questions that point the way to a materialist position in computational cognitive modelling.

1. Introduction

In this paper we present some critical questions for computational cognitive modellers, which we attempt to address comprehensively from a materialist standpoint elsewhere [1]. We will begin by looking at some normal practice in the computational modelling of cognition, and then we will look at some of the philosophical assumptions that underlie this current practice. We will then present some problems that seem to us to represent a crisis requiring a principled response, and we will conclude with some questions that indicate the general direction along which we believe that response lies. Overall, this discussion will lead us to look closely at the role of simplicity in cognitive modelling.

* This work was partially supported by project grants R39195 and R39942 from the ESRC (UK).


2. Standard practice in cognitive modelling

Computational cognitive modelling is recognized as a central paradigm, perhaps the central paradigm, in cognitive science. At its best, it is seen as proceeding hand in hand with experimental work – a virtuous spiral in which each informs the other, and in which the modelling makes predictions that can be tested in the laboratory. Since cognitive science became an identifiable enterprise several decades ago, implemented computational models have been based on a range of computational substrates and on a number of different formal approaches, including production rules [e.g. 2], interactive-activation models [e.g. 3], distributed connectionist architectures [e.g. 4], and Bayesian probabilities [e.g. 5]. Biological and ecological criteria have variously been employed to argue for the plausibility, or lack of plausibility, of particular approaches.

In the history of cognitive science, the TRACE model of spoken word recognition [3] is a good example of how the main features of an ordinary-language theory – the Cohort Model [6] – can be computationally implemented so as to (a) reproduce the basic data on which the ordinary-language theory was founded, (b) force cognitive scientists to make concrete decisions about aspects of representation and process that might not be salient to informal theorising, (c) generate a useful and productive conceptual vocabulary, and (d) make behavioural predictions that are experimentally testable.

Thus, TRACE was shown to reproduce behaviours in which candidate words in the lexicon compete to represent specific stretches of the continuous speech input. Its authors took the categories of formal phonology (words, phonemes, phonetic features) and instantiated them in the model. They made specific proposals about the direction of interaction between such levels of processing (proposals which fuelled a decades-long psycholinguistic controversy [see, e.g., 7, 8]). They raised the issue of speech segmentation “emerging” from word recognition without a specific segmentation stage, and they characterised different types of temporary conspiracy involving partially activated lexical candidates in the course of perceiving any one stretch of speech input. The model has motivated a substantial volume of experimental work on speech processing, and has constituted a particular position within theorising about speech perception.

TRACE is an example of a successful cognitive model that has advanced our understanding of speech processing. But this is not to say that everything in the wider cognitive modelling enterprise is rosy, or even to absolve TRACE of its own specific limitations.
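As a concrete illustration of the kind of mechanism involved, the following minimal sketch (in Python) shows lexical candidates competing, through bottom-up excitation and lateral inhibition, for a stretch of phonemic input. It is an expository toy rather than a reproduction of TRACE: the lexicon, the parameter values and the update rule are assumptions made purely for illustration.

# Minimal interactive-activation sketch: lexical candidates compete over a
# stretch of phonemic input. Illustrative assumptions only (toy lexicon,
# arbitrary parameters); this is not the TRACE implementation.

LEXICON = ["cat", "cap", "captain", "apt"]  # one character stands in for one phoneme

DECAY = 0.1      # proportion of activation lost each cycle
EXCITE = 0.3     # maximum bottom-up support per cycle
INHIBIT = 0.05   # lateral inhibition per unit of competing activation


def step(activations, heard):
    """One update cycle: bottom-up excitation, lateral inhibition, decay."""
    total = sum(activations.values())
    updated = {}
    for word, act in activations.items():
        # Bottom-up support: proportion of the input heard so far that the word matches.
        overlap = sum(1 for i, ph in enumerate(heard) if i < len(word) and word[i] == ph)
        bottom_up = EXCITE * overlap / len(heard)
        # Lateral inhibition from all other candidates' activation.
        inhibition = INHIBIT * (total - act)
        updated[word] = max(0.0, act * (1 - DECAY) + bottom_up - inhibition)
    return updated


def recognise(phonemes):
    activations = {word: 0.0 for word in LEXICON}
    for t in range(1, len(phonemes) + 1):
        activations = step(activations, phonemes[:t])
        ranked = sorted(activations.items(), key=lambda kv: -kv[1])
        print(f"after '{phonemes[:t]}':",
              ", ".join(f"{w}={a:.2f}" for w, a in ranked))


recognise("captain")  # several candidates are partially active early; 'captain' wins as input accrues

Even this toy exhibits the kind of temporary conspiracy noted above: several candidates are simultaneously partially active early in the input, and the competition is only resolved as later phonemes arrive.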


3. Two challenges

A number of issues demand the attention of the present cognitive modelling community. The first – not specific to the domain of cognitive science – is the sheer volume of data. Google Scholar provides a striking demonstration: taking examples from the study of eye-movements in reading, a search for “saccade” yields “about 47,500 results”, “eye movements” “about 567,000 results”, and “word recognition” “about 209,000 results”†. Anyone sceptical of the necessity of researchers addressing the philosophical aspects of their cognitive modelling, and who says “never mind about philosophy, just give me the facts!”, faces no shortage of facts. The response to this scepticism is “state the reasons for choosing to include or exclude this fact or that fact in your model”. Abstraction is central to the whole enterprise of cognitive modelling; how does the researcher work with something other than the full, real-world complexity of the brain, the stimulus environment, and human behaviour?

† These results are a snapshot from the time of writing. In fact, hits may accrue to some searches at rates that make it unlikely that a single researcher can read each new article. On the one hand, electronic search now means that observations with a low electronic profile risk being lost to researchers, but on the other hand, “library-type research” can reveal new real connections between data. Electronic search is itself conditioning the abstractions we make.

The second issue becomes apparent when we use another current electronic search tool, Citeseer, as a crude way of gauging the influence of a particular cognitive model by plotting its citations by year. In comparison with most cognitive models, TRACE fares well. All too often, a model attracts a little, temporary interest in the literature, like the Venerable Bede’s description of life as the “short, swift flight of the sparrow through the brightly-lit banqueting hall”. How is it that a typical implemented cognitive model does not become more securely incorporated into the scientific enterprise of understanding cognition? How should the model relate to the rest of cognition, to very closely associated behaviours and models? These questions have long been recognised as central to modelling. The responses to these questions all revolve around the issue of simplicity, as we will see below when we look at some of the philosophical assumptions that have been carried into the computational modelling of cognition.

4. Some underlying philosophical assumptions

In reality, most actual modelling research follows an existing modelling paradigm, introducing variations in form and content, and researchers do not typically feel the need to articulate the philosophical assumptions underlying the research. This may even be so when a new modelling paradigm is introduced. Such philosophical assumptions are nonetheless present. We can identify a number of beliefs and assumptions in current cognitive modelling, all concerned one way or another with simplicity and with varying practical, scientific and epistemic significance. Cognitive modellers dramatically simplify real cognitive processing, and, indeed, simplicity is typically seen as a goal in itself – the simpler the model, the better the explanation. Occam’s Razor is readily cited: do not multiply entities in explanations. Behind this is the longstanding positivist desire to emulate physics by supplying what Godfrey-Smith [9], in the context of modelling evolution, calls a “master equation”, such as F = ma; see also Chater and Brown’s [10] discussion of simplicity in the sense of a “law”.

At the level of particular algorithms and representations, there are formal approaches that seek to show that simpler processing accounts are more likely to be true, and thus should be preferred [11]. Finally, there are ill-defined aesthetic preferences for simplicity in modelling.

Drawing on a recent review of idealization in science by Weisberg [12], we can catalogue a number of ways in which researchers simplify the world in order to model it, or make other modelling choices that have implications for the simplicity of the model.

First, a simple model can be used as a means of investigating the dynamics of a particular cognitive process, with the aim of eventually trying to test whether the behaviour survives when the model is made more complex, or the range of the data is extended to make it more realistic. This is the classic “Galilean Idealization”, in which a temporary, unreal simplification is introduced to make modelling or theorizing tractable. Indeed, the scientific program of Artificial Intelligence as a discipline was initially based on this principle: “block worlds” typically replaced real-world complexity, and the goal was to discover the principles that would solve these toy problems. Latterly, the complexity of the real world has been restored to its rightful place to some extent in some branches of cognitive science.

Second, Weisberg defines “Minimalist Idealization”, in which the modeller strips away irrelevant aspects of the situation in order to reveal that which is causally important. The creation of such minimal models is closely tied up with the goal of explanation. The researcher may specify a level of fidelity that the behaviour of the model is expected to attain, so that the enterprise avoids the risk of trivializing any explanation by extreme minimalization. Jones [13] suggests it may be useful to distinguish between omitting details (abstraction) and adding simplification (idealization).

Third, Weisberg identifies “Completeness” as one possible representational ideal “associated with classic accounts of scientific method”. The aspiration here is the inclusion of as much as possible of the real-world detail pertinent to the phenomenon under study, while still retaining the clarity of the causal explanation produced by the prior simplification.

Fourth, Weisberg discusses an approach that concentrates on the primary causal factors in a process, the factors that make a difference to whether or not the phenomenon actually presents itself, as opposed to factors that simply fine-tune the phenomenon.

Fifth, Weisberg discusses cases in which the theorist aims to maximize the precision and accuracy of the model’s output in some way that does not prioritize the transparency of the causal mechanism that produces the output.

5. Some problems to be addressed

The picture of simplification that emerges from this one recent review of approaches to abstraction and idealization is a complex one across all the sciences. The range of approaches in the computational modelling of cognition is narrower, with TRACE being a good example. The concentration is on minimalization. Often the categories used in the modelling are adopted from existing theory or ordinary language; for instance, the authors of TRACE incorporated the categories of “phoneme” and “feature” from phonology. It is relevant to ask how often the goal of “scaling up” is ever carried through in cognitive modelling. How often do researchers increase the contents of one dimension of the model so that it resembles real-world complexity? How often do researchers incorporate in their model a component that can be said to be “real”?

In cognitive science research, the quickest rewards frequently come from “picking the low-hanging fruit”. There are often diminishing returns from augmenting the precision and extent of the modelling: such work necessarily takes longer for reasons of scale; sometimes the algorithms themselves are unforgiving of such scaling up; confirming the simple model may be unremarkable, and disconfirming the simple model by scaling up the demands on the model may be unconvincing if, for instance, the simple model had no claim on biological realism. Overall, though, when scaling up is not attempted it is typically because the researcher judges, implicitly or explicitly, that the minimal model is a satisfactory explanation of the cognitive process under study. In these cases there is no detailed consideration of the question “would this model work with a realistic range of x?”, where x might be the number of words in a model of reading, the number of eyes (i.e. two) in a model of eye-movements, the degree of phonological reduction in a speech processing model, the lighting conditions in a model of visual processing, and so on. Overall, the very word “model” is taken to convey simplicity as a goal.

The outcome of this general approach to simplicity in cognitive modelling is that the models are typically limited in their scope for development. Adding new components runs the risk of obscuring the researcher’s initial insight by complicating the architecture. The operation of the model may make it unattractive or inappropriate to attach it to other models of related processes, in anything other than an ad hoc way. This is the general situation in which cognitive modellers find themselves, producing models that are inherently limited in the extent to which they can be developed, and in the implications they have for the rest of the field. The attitude towards simplicity in modelling is at the heart of this impasse.§

§ This issue of simplicity does not necessarily result from the fact that we are dealing with “higher-order” entities in cognitive modelling. If we were modelling neural behaviour we would still need to make motivated abstractions from the rest of the totality in which the neurons are embedded. Modellers at the neural level may still need to address questions such as “would the model still work if it were embedded in a population of astrocytes, if the full range of synapse types was used, if the population of neurons was a realistic size, and so on?”

Concern about the philosophical assumptions in contemporary cognitive modelling is not new. Churchland, Ramachandran and Sejnowski [14] present a critique of what they call the Theory of Pure Vision. They characterise this “commonsense” underpinning of vision science as consisting of (a) the assumption that the goal of vision is to reproduce the 3D visual scene in all its richness in the brain, (b) the assumption that visual processing is hierarchical, and (c) the assumption that there are dependency relations between higher and lower order processing. (See also Marr [15].) While acknowledging the success that such an approach has had in the past, they question the adequacy of such assumptions for future progress. Their scepticism is based on the case that they make for what they call Interactive Vision, in which they question the artificial consideration of vision isolated from issues of motor control, and consider examples of processing that subvert conventional notions of hierarchy and dependency relations. They acknowledge that their Interactive Vision is sketchy, and they call for a new set of concepts adequate to describing interactive systems. It will be clear from our discussion here that what is at the core of Churchland et al.’s disquiet over the direction of vision science is how vision scientists have applied abstraction to their subject matter; indeed, this interpretation is clear from Churchland et al.’s own very brief consideration of the issue of idealization, in the second paragraph of their article. Issues of causality and explanation follow in the wake of any discussion of abstraction and idealization.

Overall, we have directly and indirectly identified a number of questions in this brief excursion into the role of simplicity in cognitive modelling.

1. What is the best way to simplify the complexity of real-world cognition? Are there criteria for taking away something that appears not to be contributing to the behaviour of interest, in order to leave that which we judge to be crucial?

2. Are there criteria for replacing some complex part of the domain under study with something different and simpler? Can we do such a thing for everything in the domain? Do we leave it like that for the whole of the modelling, or do we re-introduce real-world details at a later stage?

3. In (1) (abstraction) and (2) (idealization) above, is one or the other preferable? And is there a criterion for judging how far to abstract or to idealize? How do we know if we have taken away or replaced enough material?

4. What is the goal of the modelling? Do we expect a particular model to be only temporary and to be quickly updated as laboratory experimentation reveals more of the true picture? What are the limits on how much a model can be updated and retain its original integrity and insights?


5. What are the criteria by which a modeller may fill in real-world details in a simplified model? Are there different ways of doing this? Is there one privileged order for filling in such details?

6. What is the relationship between the modeller and the existing empirical literature; what criteria does the modeller have for taking up or ignoring the categories and distinctions of the existing literature? Does the chronological development of the existing empirical literature have to be respected in any sense?

7. Investigating cognitive impairment has had a high profile in cognitive science. What is the relation between impaired and normal processing in the modelling endeavour?

8. How are we to understand causality in all of this? Should we start speaking about causation at the most abstract level that we have characterised, perhaps identifying the primary causal factors cited by Weisberg? If so, how do we identify these factors; how are they characterized, and can there be more than one of them in a domain? Should we hold fire in talking about causality until we have assembled all of the elements in the domain that “make a difference”; if so, how indirect can this making a difference be?

9. How are we to understand explanation in all of this? At what point do we have an adequate explanation, and at what point a full explanation?

10. Is simplicity a modelling goal in itself? Do we have a full explanation when we have obeyed Occam’s Razor and produced the sparsest model of the data? Or do we have a fuller explanation of the process once we have introduced more details? Or is there some further position involving both of these processes?

11. Is the goal of the modelling of higher cognition to understand some part of the cognition of an idealised, general adult? This is perhaps one of the most central questions we can ask. The laboratory study of any higher cognitive process tends to reveal subtypes, initially, and then a range of individual differences in which any one individual studied seems to be a unique, complex constellation of particular parameterizations, and particular strengths, weaknesses and compensations, all interacting in a momentary way. Should the researcher aim to produce an implemented model that is able to reflect the idiosyncrasy of a real individual? If so, how encapsulated can this model be from the rest of the individual and their cognition, and from the precise conditions of testing?

In conclusion, we return to the issue of completeness, cited by Weisberg: the aspiration that the model should reflect real-world complexity as fully as possible, while still retaining the contours of an explanatory, causal structure. This modelling goal echoes Hegel’s classic dictum “the truth is in the whole”. It brings us to our provisional conclusion concerning simplicity, explored and illustrated more fully elsewhere [1], that the seeming impasse in much cognitive science modelling may be resolved by adopting this modelling goal – the fullest possible representation of the real-world complexity of the modelling domain (and with additional criteria for which part of the model should be real).


We adopt this position in opposition to the exclusive concern with simplicity in contemporary cognitive science modelling. This is not to say that demonstrations of the role of simplicity in relation to probability and cognitive processing are wrong – rather, it is the epistemic sense of simplicity that concerns us. Nor is it to say that Occam’s Razor is wrong – just that it tells us only half the story. To have the fullest explanation and understanding of a cognitive process we need to be able to see it both in its simplest manifestation and in its fullest manifestation. Our goal in this paper has been to make this single point, against relying solely on simplicity, and for taking the fullest complexity into consideration in our theorising about cognition and in our computational modelling of cognition. This point about completeness, about the fullest complexity and the whole, has been made before in other domains of study, and we repeat it here in the case of computational cognitive modelling. Our whimsical title comes from a desire to produce a name as memorable as “Occam’s Razor” and, staying within the tonsorial metaphor, one that stands for the method of revealing and accentuating the most elaborate structure possible … hence “Hegel’s hair gel”**.

** Thanks to Jon Oberlander for this joke, and for innumerable insightful discussions.

References

1. Shillcock, R., Roberts, M.A.J., Kreiner, H., Obregón, M., and Monaghan, P. Principles in the modelling of eye-movements in reading (submitted).
2. Anderson, J. R. ACT: A simple theory of complex cognition. Amer. Psych., 51, 355-365 (1996).
3. McClelland, J.L., and Elman, J.L. The TRACE model of speech perception. Cog. Psych., 18, 1-86 (1986).
4. Seidenberg, M. S., and McClelland, J. L. A distributed, developmental model of word recognition and naming. Psych. Rev., 96, 523-568 (1989).
5. Norris, D. The Bayesian reader: Explaining word recognition as an optimal Bayesian decision process. Psych. Rev., 113, 327-357 (2006).
6. Marslen-Wilson, W.D., and Welsh, A. Processing interactions and lexical access during word recognition in continuous speech. Cog. Psych., 10, 29-63 (1978).
7. Norris, D., McQueen, J.M., and Cutler, A. Merging information in speech recognition: Feedback is never necessary. Behav. & Brain Sci., 23, 299-370 (2000).
8. McClelland, J.L., Mirman, D., and Holt, L.L. Are there interactive processes in speech perception? TICS, 10, 363-369 (2006).
9. Godfrey-Smith, P. Darwinian populations and natural selection (2009).
10. Chater, N., and Brown, G.D.A. From universal laws of cognition to specific cognitive models. Cog. Sci., 32, 36-67 (2008).
11. Chater, N., and Vitanyi, P.M.B. Simplicity: A unifying principle in cognitive science? TICS, 7, 19-22 (2003).
12. Weisberg, M. Three kinds of idealization. The Journal of Philosophy, 104, 639-59 (2007).
13. Jones, M. Idealization and abstraction: A framework. In M. Jones and N. Cartwright (eds.), Idealization XII: Correcting the Model: Idealization and Abstraction in the Sciences (New York: Rodopi), 173-217 (2005).
14. Churchland, P., Ramachandran, V., and Sejnowski, T. A critique of pure vision. In C. Koch and J. Davis (eds.), Large-scale neuronal theories of the brain (pp. 22-60). Cambridge, MA: MIT Press (1994).
15. Marr, D. Vision. New York: W. H. Freeman (1982).