Beyond validation: Alternative uses and associated assessments of goodness for computational social models

Jessica Glicken Turnley, Peter A. Chew, Aaron S. Perls
Galisteo Consulting Group, Inc.

Defense Threat Reduction Agency
Office of Strategic Research and Dialogues
Contract Number HDTRA111P0007

June 2012

Approved for public release; distribution is unlimited.
This product is the result of collaboration between the Defense Threat Reduction Agency’s Office of Strategic Research and Dialogues and Galisteo Consulting Group, Inc. The views expressed herein are those of the authors and do not necessarily reflect the official policy or position of the Defense Threat Reduction Agency, the US Department of Defense, or the United States Government.
The mission of the Defense Threat Reduction Agency (DTRA) is to safeguard America and its allies from weapons of mass destruction (chemical, biological, radiological, nuclear, and high explosives) by providing capabilities to reduce, eliminate, and counter the threat, and mitigate its effects.
The Office of Strategic Research and Dialogues (OSRD) supports this mission by providing long-term rolling horizon perspectives to help DTRA leadership identify, plan, and persuasively communicate what is needed in the near term to achieve the longer-term goals inherent in the agency’s mission. OSRD also emphasizes the identification, integration, and further development of leading strategic thinking and analysis on the most intractable problems related to combating weapons of mass destruction.
Acknowledgements: We would like to thank Jennifer Perry of the Defense Threat Reduction Agency (DTRA) for her contributions to and support of this piece. She provided creative ideas and thoughtful comments and critiques that added to and improved the substance of this work, including strong contributions to the problem formulation and the research design. She also provided outstanding project management, keeping us on our toes while smoothing out road bumps.
EXECUTIVE SUMMARY

Socio-cultural approaches have gained new prominence in national security analyses since the terrorist attacks of September 11, 2001. These events raised to prominence questions about motivations for such extreme behavior and the characteristics of its supporting social infrastructure. Actions that we in the United States (US) might take to reduce the likelihood of this kind of behavior in the future also came under scrutiny. Clearly, the array of new security measures (such as airport passenger screening) has been a part of the response. But there is now a significantly larger contingent in the national security community arguing the importance of understanding motivation and intent, including the role of social networks, culture and religion. It is argued that gaining this understanding will allow us better to anticipate attacks, interdict them, and predict the results of US intervention in target populations such as those in Afghanistan.

Computational social models, or computational representations of social phenomena, have promise to help in these types of analyses. However, their increased visibility and sophistication have raised a variety of methodological questions, among them questions of validation or, more broadly, of how to assess the models' goodness. This discussion of these methodological issues is structured around the following three questions:

1. Are there characteristics of the sociocultural domain that make certain aspects of it very difficult, if not impossible, to model computationally in ways that would allow those models to meet the standards of proof required by classic definitions of validation?

2. What usefulness (if any) do computational social models have in addition to prediction?

3. If computational social models are used for purposes other than prediction, what assessments of goodness or effectiveness could be used for these models in these other applications?

We address the first question about the possible peculiarity of the sociocultural domain with regard to validation by comparing it to three other domains that use computational models and apply validation methodologies to them. The three comparative domains are systems ecology, physics, and computational linguistics.
We find that, like the target systems in systems ecology, the open, complex, dynamic nature of sociocultural systems raises serious questions about the applicability of the classic validation models used in physics and computational linguistics. Physical systems are amenable to reductionism, and computational linguistics uses closed, static data sets; in both cases, this allows validation methods to be applied differently than they would be to open, complex, dynamic systems. Thus the computational social models in which the national security community is interested generally are what Refsgaard and Henriksen (2004) called engineering models. They begin with some general fundamental principles but are readily modified according to the requirements of some local data set. While these models are useful as explanatory models for that particular time and space, there are significant limitations on their generalizability and hence on the applicability of classic validation processes as an assessment tool.

In all domains, we find that validation is a process of comparing a model to new sets of data until the user is confident enough that the model will produce reliable results in the use domain that he is willing to apply it there. Validity thus is a function of the relationship between a user and a model, not a characteristic or state of the model itself. This, then, raises questions about the nature of the decision for which the model is developed (the use domain), and about how 'confident enough' translates into a determination of the riskiness of that decision. The use domain and the definition of risk are socio-cultural questions, questions about a user immersed in a particular use environment.

Introduction of the user returns us to the purpose of constructing and analyzing computational models: to gain more familiarity with the target domain (to learn about it) so one could apply that familiarity in a problem-solving way. This allows us to address our second question: what usefulness (if any) do computational social models have in addition to prediction? We argue that the usefulness of a computational model, restated as its role, will depend upon the user's preferred style of learning new information (where a user may be an individual or an organization), and on the use context for the model. We explore two general learning styles: cognitivism and constructivism.

The cognitivist learner is a passive agent who receives context-free information from his environment. The professor professes and the student receives. The student's cognitive processes incorporate new information into existing schemata, or use it to create new schemata, which in turn drive his behavior.
A model designed for prediction will be best suited to a user with a cognitivist learning style. The model is used in an advisory fashion, providing a piece of information (a 'prediction') which the learner then incorporates into his knowledge schema. This provides strong agency to the model (the 'teacher') and puts the learner into a passive role. The model-as-advisor role can lead to incremental changes in the knowledge structure of the model user, but rarely to big paradigm shifts. It also reifies the model as an information-providing tool. It works well in situations where information is required quickly and in environments where it is assumed the future will look much like the past (trend extrapolations, not paradigm shifts), the social environment is resistant to change, and the cost of failure is high. Finally, this use of models-as-advisors fits well with a technocratic approach to governance. The models are created by scientists or experts; the models then provide science- or expert-based information to the decision-making process.

The constructivist learner, on the other hand, generates or creates knowledge and meaning from the interaction of current knowledge with new knowledge or experience. Context is critical, and the history of the individual learner matters. The active agent is the learner, not the information provider. We argued elsewhere (Turnley and Perls 2008) that all models are metaphors, selecting some (but not all) elements and relationships from the real world to re-present. We take this further here, arguing that to learn through metaphors (by using models) is to employ a constructivist method: the learner is a very active, creative agent in the learning process. He chooses the elements that he believes best explain the target environment. To use a model in a constructivist fashion is to use it in an explanatory rather than a predictive way.

The model-as-metaphor approach can lead to creative moments but can be personally and institutionally highly disruptive, so it is most useful in an environment open to change, and where conditions are recognized to be highly fluid. It also is very resource-intensive to implement. It requires each individual to engage with the model in order to learn, so it is very expensive in terms of human resources.

In short, to use a model as an advisor is to use it to provide input to a particular decision. To use a model as a metaphor is to use it either to further the user's understanding of the target system or as a means to stimulate creative thinking. And, of course, any engagement with a model usually involves some combination of both.
So this takes us to our third question: if computational social models are used for purposes other than prediction, what assessments of goodness or effectiveness could be used for these models in these other applications? If they are to be used to help people predict, the process of validation (with all necessary caveats) appears to be the best approach to assessing the model. But to discount a model because it is not accurate is to discount its creative power. As Box so famously said, all models are wrong but some are useful (Box 1979). Metaphors, such as 'the US Department of Defense is like Mars and the US Department of State like Venus,' may not be accurate although they may be true.

So if a model is assessed not as a context-free artifact or data point but as an integral part of a wicked problem, as a means of or method for learning about the dynamics, structure and function of a target system, the assessment process will be different. In these cases, the measure of goodness would be the learning that took place in the model users, or an assessment of the new ideas generated. It is an assessment of a method of learning rather than of the accuracy of a data point. So an assessment of the goodness of a particular model must begin with a clear understanding of the relationship among the model user, the target problem, and the model, based on questions such as the following:
- Is the model designed to expand the boundaries of general domain knowledge (a research model) or to answer a question about a particular time and place?
- How accurate must a description of a future state be in order to address the target problem?
- How significant are the consequences of failure to predict/forecast a future within the required accuracy bounds?
- How has the model dealt with domain issues generated by the open, dynamic, complex nature of social systems?
- How does what the user learned from engagement with the model help the user reframe the problem (the 'wicked problem' problem)?
If the user needs a highly accurate prediction from the model, he will require a model whose underlying structure has been calibrated against enough sets of data that the user is confident enough, for his particular decision, that the model will provide the 'right' answer when applied in his use case. In this instance, the model will generally be used in a way that may extend and stretch existing cognitive schemata, but will not radically change or replace them. Use of the model in this way allows the information it generates to be easily transferred from those who 'run' or exercise the model to many in the decision-making process.

If, on the other hand, the user is interested in challenging existing cognitive schemata, he can engage with the development of the model and challenge its structure. Asking questions around each of the areas of potential flaws enumerated above (theoretical, empirical, parametrical, and temporal) will stimulate the user to challenge his own assumptions – and quite likely modify both the model and the problem in the process. Clearly, this requires that each model user engage with the model builders at a point in the decision process quite different from that in our previous example. In this case, however, the model need not be validated. Its goodness will instead be evaluated through the stimulation of imagination and the generation of new possibilities – through changes in the model user.

The ability to predict is critical as governments exercise their national security regimes. Yet the 9/11 Commission Report states: "We believe the 9/11 attacks revealed four kinds of failures: in imagination, policy, capabilities, and management." The report asserts that we had most of the data necessary to anticipate such an attack. In fact, earlier administrations had discussed that eventuality. The failure was not a failure to predict. It was a failure to imagine.

The challenge for any organization utilizing computational models is to balance its need for information (prediction) with its need for increased insight or understanding. To put it another way, it needs to balance its requirements for stability and continuity (both important organizational features) with the possible behavioral or institutional change and adaptation provided by an active, creative engagement with the models. Any organization needs to balance the use of analytical reasoning and problem solving with the exercise of imagination.
Beyond validation: Alternative uses and associated assessments of goodness for computational social models [1]

Jessica Glicken Turnley [2], Peter A. Chew, Aaron S. Perls
Galisteo Consulting Group, Inc.
2403 San Mateo Blvd NE, Suite W-12
Albuquerque, NM 87110, USA
505.889.3927

[1] This paper was prepared partially with funds from the Defense Threat Reduction Agency, US Department of Defense. The views expressed herein are those of the authors and do not necessarily reflect the official policy or position of the Defense Threat Reduction Agency, the US Department of Defense, or the United States Government.

[2] Corresponding author: Jessica Glicken Turnley, Galisteo Consulting Group, Inc., 2403 San Mateo Blvd NE, Suite W-12, Albuquerque, NM 87110. 505.889.3927. [email protected]
CONTENTS

1 Introduction
2 Setting the stage
   2.1 The nature of models
   2.2 Definitions of validation: a brief introduction
   2.3 Models, validation and prediction
3 Is the ability to validate domain-dependent?
   3.1 Systems ecology
   3.2 Physics
   3.3 Computational linguistics
   3.4 Summary
4 What role does the user play in validation?
   4.1 The user
      4.1.1 Learning styles
      4.1.2 Creativity
      4.1.3 Risk
   4.2 Connecting models to users
      4.2.1 Models as advisors
      4.2.2 Models as knowledge creators
   4.3 Summary
5 How can the goodness of models be assessed?
6 Summary and conclusion
7 References
LIST OF FIGURES

Figure 1. Relationship between Type A models, Type B models, and the target system
Figure 2. Relationship of verification and validation to models, computer simulations, and physical reality
Figure 3. From theory to application
Figure 4. Afghanistan/COIN dynamics
Figure 5. Using a model
LIST OF TABLES

Table 1: Verification, validation and calibration and different model types
Table 2: Domain characteristics of systems ecology
Table 3: Domain characteristics of physics
Table 4: Domain characteristics of computational linguistics
Table 5: Validation in other domains
1 INTRODUCTION

Socio-cultural approaches have gained new prominence in national security analyses since the terrorist attacks of September 11, 2001. These events raised to prominence questions about motivations for such extreme behavior and the characteristics of its supporting social infrastructure. Actions that we in the United States (US) might take to reduce the likelihood of this kind of behavior in the future also came under scrutiny. Clearly, the array of new security measures (such as airport passenger screening) has been a part of the response. But there is now a significantly larger contingent in the national security community arguing the importance of understanding motivation and intent, including the role of social networks, culture and religion. It is argued that gaining this understanding will allow us better to anticipate attacks, interdict them, and predict the results of US intervention in target populations such as those in Afghanistan.

Computational social models, or computational representations of social phenomena, have promise to help in these types of analyses. These models have become increasingly visible in a variety of planning and analysis environments, including corporate, environmental and military/national security [3] arenas. Their increased visibility and sophistication have raised a variety of methodological questions. Questions of validation or, more broadly put, questions around how to assess the models' goodness, are among these. This is particularly important in the national security arena, where analysts, planners and others use models as they develop solutions that potentially have high-consequence outcomes, such as actions which might put lives in jeopardy.

[3] When we refer to 'national security' in the context of using social models, we use this phrase in a broad sense encompassing the national security community, problems of national security, analysis of those problems, and related decision making.

Computational models are well established elsewhere in high-consequence national security applications. For example, computational models have been relied upon since well before 9/11 for the research, design, testing and evaluation of weapons. To guide users in assessing the reliability of these models, agencies including the US Department of Defense and US Department of Energy have formulated definitions and methodologies of verification – establishing how accurately a model represents the developer's requirements – and validation – establishing a model's predictive capability. Now that computational social modeling has come of age, attempts are being made to apply these definitions and methodologies to models of social phenomena.

What has interested us is the possibility that classic validation methods and associated expectations might not transfer well to the social domain, unlike verification, which does. The social domain usually deals with phenomena which are not directly observable, such as belief systems, and with highly complex, dynamic systems. This creates conditions which militate against the development of predictive models. However, we will argue that there are uses of computational models other than prediction which accommodate these otherwise problematic domain characteristics, allowing computational social models still to serve as useful tools.

We are most emphatically not arguing that predictive models are impossible to create in the social domain. We are arguing, however, that certain questions in the social domain cannot be addressed by predictive models. So if in some cases it does not make sense to develop predictive social models, we must ask whether and how non-predictive models can in fact add value. If they can, we must still assess their effectiveness, making explicit our criteria for model evaluation. This is necessary to establish the models' credibility; without credibility, there is no reason for a decision-maker to use a model at all.

To summarize, we will address three questions:

1. Are there characteristics of the sociocultural domain that make certain aspects of it very difficult, if not impossible, to model computationally in ways that would allow those models to meet the standards of proof required by classic definitions of validation?

2. What usefulness (if any) do social models have besides prediction? Note that the answer to this question will be useful even if computational social models can be used in a predictive manner.

3. If computational social models are used for purposes other than prediction, what assessments of goodness or effectiveness could be used for these models in these other applications?
This report is structured as follows. We start in section 2 with some definitions: first, our own definition of the term 'models'; and second, standard definitions of validation by major US federal agencies. This establishes a baseline for our discussion and allows us to conclude section 2 by considering the interplay of models, validation, and prediction. In section 3 we consider other domains which we believe have relevance to sociocultural modeling: systems ecology, physics, and computational linguistics. All these domains have a robust methodological literature focused on model validation which can potentially provide some lessons for us. In section 4, we provide some background for our question about alternative uses of computational social models with an exploration of different types of learning and decision-making, and the correspondingly different ways in which computational models can contribute. We weave the discussion together in section 5, which answers questions 2 and 3 above by addressing multiple ways in which the goodness of a given computational social model could be assessed. Finally, our discussion is summarized and concluded in section 6.
2 SETTING THE STAGE

This section covers two topics. We first give a short synopsis of the nature of models as we presented it in our earlier work (Turnley and Perls 2008). This is important as it provides the first chapter in our story here. If we are to describe how we assess the goodness of models, we must first have agreement on the nature of those models – at least agreement for the sake of this discussion. The second topic in this section provides an overview of accepted definitions of validation. If we are to challenge those definitions, we must be clear as to what they are.
2.1 THE NATURE OF MODELS

Before we delve into the nature of models themselves, we need to recognize that computational social models used in the national security community are of several types. Social network models and associated social network analysis (SNA) tools, including Bayesian analysis, are used by those who identify individuals of importance in communities (for high value targeting, key personnel engagement, and other applications) and for tracing the flows of information, goods and cash among agents in order to disrupt or perturb them. Agent-based models (ABM) are favored by those trying to understand how organizations work, whether those organizations are formal, such as a government agency, or informal, such as a terrorist cell or a local population in some area of interest. And system dynamics (SD) models, the third type, are used to gain an understanding of the relationships among the elements of a given system, where a target system could be a counterinsurgency operation, a nation-state, or an irrigation association in a local valley.

The principles of construction of the three model types vary, as do the ways in which they engage with data. They also target different aspects and dimensions of the sociocultural realm – structure and flow for social network models, behavior of individual agents for ABMs, and a system's elements and the interrelationships between those elements in the case of SD models. As we will see, social network models have somewhat different methods of comparison to their target data set, i.e. different methods of validation, than do the other two, primarily because of the different nature of the data set. All that said, most of what we say here is applicable to models of all three types.

Note that we said that each of these model types targets different aspects and dimensions of the sociocultural realm. That introduces a critical aspect of a model: no model is simply isomorphic to its target system; instead, all models select features of interest from that system to re-present. By this we mean that a model builder can never represent all objects and relations in the target system, but instead must decide which of these objects and relations are the most pertinent for the model's particular intended uses (see Turnley and Perls 2008). If such selection were not exercised, the model would be the target system and would be of limited analytic utility.

We argued in our earlier report (Turnley and Perls 2008) that this logic of selection is expressed as theory. Theory is often not recognized and managed explicitly in computational model construction, particularly in the social domain. We believe that the logic of selecting certain elements for inclusion in a model should always be made explicit in the model documentation, and the requirement to do so should be a de facto standard in model construction.

We further argued (Turnley and Perls 2008) that this logic of choice of elements for inclusion in a model is an analogical one (see Figure 1). An analogy is a relationship that posits that the parts and relationships of one system are 'like' the parts and relationships of a second. We construct an economic or religious or kin-based picture of the social domain and say it is 'like' the target area, although it clearly abstracts what we (the model builder) believe are relevant aspects. We call these analogical or theoretic structures 'type A models.' A type A model can be instantiated in time and space by populating it with real world data, which yields a type B model. The tribal structure of the Pashtuns would be a type B model.
Every type B model (every model of a particular instance) is based upon at least one type A model (i.e., is structured by a model or models of theory). To continue our example, the tribal structure of the Pashtuns is built on some abstract model of kinship. We have posited kinship as an important feature of the social landscape in northeast Afghanistan. Our data pushes us toward a particular type of kinship structure (patrilineal tribal structures), so we construct a type B model. So broadly, a type A model is one which describes some principle (such as kinship) at an abstract level, while a type B model is an instantiation of a type A model for a particular use case or set of circumstances (tribal structures in northeast Afghanistan). (See Turnley and Perls 2008 for an in-depth discussion of this topic.)

Figure 1. Relationship between Type A models, Type B models, and the target system

So if every computational social model includes both a type A model (a model of theory, a logic of selection from all possible social data) and a type B model (a model of data that instantiates the model in a particular time and place), the type B model should be checked for how accurately it reflects the theoretical model, and the theoretical model should be justified by a history of its testing against such data sets in many times and places through the construction of many type B models. To continue our kinship example: the type A model could be a theory about ways in which kinship can and does structure social relationships, derived from research on its applicability to many times and places, which should be documented in the model formulation. One possible type B model would be the particular instance of that structure represented by data on the Pashtuns in (e.g.) 2011. The type B model would be checked for the accuracy of its representation of the type A model (verified). Deviations from the type A model could be due to poor or incomplete data, or due to some flaw in the type A theoretical model. If it is the latter case, we would need to see many more instances of the same inconsistency before we could argue that a modification to the type A model was necessary. (Note that a specific type B model can be verified against an incorrect type A model. The 'incorrectness' of the type A model would be revealed if the next n type B models could not be verified.)
As a relevant aside, some would argue that data-driven computational models are models of a different sort. These models begin with a corpus of unannotated data. Many computational linguistics and social network models work this way. By seeking patterns of various types in a data set, the model can elicit or construct theories from the data. However, one could argue that the a priori definition of which patterns are of interest (note: not which patterns exist) is itself a construction of a theoretical model, albeit a highly abstract one. Social network models provide a useful example. The network is the result of the examination of connections between nodes – the structure emerges from the data. But the definition of the type of connection and node of interest has to be driven by some logic of selection. Hence theory is present, albeit at some very preliminary point.
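A small sketch may make this concrete. The example below is our own illustration, not drawn from any fielded tool: all actors and ties are invented, and networkx is used simply for convenience. Two networks are built over the same five actors, one from hypothetical communication ties and one from hypothetical kinship ties; the 'emergent' structure – here, who is most central – differs depending on which theory-driven definition of a tie the analyst chose before looking for patterns.

```python
import networkx as nx

# Hypothetical actors and ties, invented for illustration only.
actors = ["A", "B", "C", "D", "E"]
comm_ties = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]  # who talks to whom
kin_ties = [("B", "C"), ("C", "E"), ("D", "E"), ("B", "E")]   # who is related to whom

# Two different 'logics of selection' over the same actors yield
# two different networks -- and two different pictures of centrality.
G_comm = nx.Graph(comm_ties)
G_comm.add_nodes_from(actors)
G_kin = nx.Graph(kin_ties)
G_kin.add_nodes_from(actors)

comm_cent = nx.degree_centrality(G_comm)
kin_cent = nx.degree_centrality(G_kin)
print("Most central by communication ties:", max(comm_cent, key=comm_cent.get))  # -> A
print("Most central by kinship ties:", max(kin_cent, key=kin_cent.get))          # -> E
```

The structure 'emerges from the data' only after a theory-laden decision about which ties count as data; change the theory and the emergent structure changes with it.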
2.2 DEFINITIONS OF VALIDATION: A BRIEF INTRODUCTION

We now turn to more formal definitions of validation that are at play in today's computational social science realm. This section will provide just a brief introduction. The following sections, which discuss how validation is treated in different target domains, will add depth and breadth. Since our focus is on the national security application of computational social models, we will consider definitions of validation from both the US Department of Defense (DOD) and the US Department of Energy (DOE). (DOE is responsible for research and development on the nation's nuclear weapons stockpile, and its national laboratories also do much work – both research and development [R&D] and non-R&D – for DOD and other national security agencies.) We then discuss how prediction relates to computational social models, and introduce an alternative analytic structure to prediction: imagination.

We emphasize here and throughout our discussion that our goal is not to discredit the predictive power of models. Rather, we are looking for alternative ways in which computational social models can be of use in critical national security as well as other applications should they fall short of the predictive accuracy often asked of them (for an example of where social models are held to such a standard, see Tumasjan et al. 2010; using social models to predict the effects of an information operations campaign, for example, would also fall into this category).
We start with a brief disclaimer: we are not addressing verification, although verification and validation are often conflated in the literature (see, for example, US Department of Defense 2009). Verification compares the model to the developer's requirements (US Department of Defense 2009). More formally, verification is a process which "provides evidence, or substantiation, that the mathematical model, which is described from the conceptual model [our type A or theoretical model], is solved correctly by the computer code that is being assessed" (Oberkampf and Trucano 2007: 12). In terms of our earlier discussion of the nature of a model, verification checks that the type B model constructed from data matches the structure required by theory (the type A model).

As a process, validation differs significantly from verification, as validation compares the model to the 'real world' or observable reality (see Figure 2). The DOE and DOD definitions of validation are very similar. The DOD defines validation as "the process of determining the degree to which a model or simulation and its associated data are an accurate representation of the real world from the perspective of the intended uses of the model" (US Department of Defense 2009). The DOE's definition of validation (borrowed from the American Institute of Aeronautics and Astronautics [AIAA]) is the same, except that it eliminates 'or simulation and its associated data' (AIAA 1998). Note particularly that both DOD and DOE describe validation as a process, not a determination of a quality or characteristic of a model at a single point in time. Note also that the definition of validation is conditional on the model's intended use. These points will be important later in the discussion.

Figure 2. Relationship of verification and validation to models, computer simulations, and physical reality
Validation also differs from calibration. The National Institute of Standards and Technology at the US Department of Commerce (2006) tells us that calibration is an operation that establishes a relationship between a measuring system applied under certain conditions and some set of standards. In our case, the measuring system would be a computational social model developed for a problem specific to a particular place and time (a type B model), and the set of standards would be 'the real world,' or rather that segment of the real world defined by the problem. Note that only those models which are applied under specified conditions (space and time, in our case) can be calibrated. These are what we referred to earlier as type B models. Models that present universal processes or theories (type A models) cannot be calibrated. As we suggested earlier, these are instead validated.

Engagement with type A and type B models (and with validation and calibration) is often operationalized by the distinction between science and engineering. Science seeks universally applicable 'truths.' Engineering is interested in the application of those truths in the world (see, for example, Jensen et al. 2007 for the way in which science and engineering are differently managed in the research and development enterprise).

We illustrate the differences between the application of science and engineering models (validated and calibrated models, respectively) with an example: the modeling used for programming a vehicle's electronic control module (ECM), which controls fuel injection. In a fuel injection system, the programmer essentially makes a multi-dimensional map of air flow rates through the engine at given speed and throttle levels. This is based on an "ideal" volumetric efficiency (VE), which represents how much air should be moving through the engine (a cylinder) at a given rpm/throttle according to physical laws. If all premises were correct, all VE maps would read 100%. In practice, a map for a given engine will be far above or below this mark. To solve the problem, the mechanic disregards the premises from the physical laws and manually 'tunes' (calibrates) the engine to find the correct VE percentage values, i.e. finds the VE where the engine performs most efficiently. Other variables also come into play to tune the engine correctly, further impacting the fuel injection or VE model.

In our engine example, the initial computational model must be valid (it must incorporate laws of physics – a type A model – which have been proven to be true, i.e. have been validated). The final computational model of the particular engine, a type B model, is not valid in the strict sense of the term. Although not valid, however, it is useful, as it provides a reference point for the manual tuning of the engine. Here a successful process is heavily dependent upon the implicit knowledge of the mechanic, knowledge often called 'craft' or technical or engineering rather than scientific knowledge, which builds upon the base provided by the validated (type A) model. Once again, the type B computational model of a particular engine is not predictive of the performance of this or other engines but still functions as a useful decision support tool. [4]

[4] The authors are grateful to various technical personnel at Sandia National Laboratories who stimulated this type of thinking about this problem in conversations about their own work.

Table 1 summarizes the above discussion. It illustrates the relationship between type A and type B models on the one hand, and verification, validation and calibration on the other.

Table 1: Verification, validation and calibration and different model types

                        Type A models    Type B models
    Can be verified     Yes              Yes
    Can be validated    Yes              No
    Can be calibrated   No               Yes
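A minimal numerical sketch of the engine-tuning example may help fix the distinction. The type A relationship below computes the airflow physics says a cylinder should ingest; the type B model is the per-engine VE table calibrated against measured airflow. All numbers (displacement, air density, 'sensor' readings) are invented for illustration and are not taken from any real ECM.

```python
# Illustrative only: a simplified speed-density airflow model.
DISPLACEMENT_L = 2.0   # assumed engine displacement in liters
AIR_DENSITY_GPL = 1.2  # assumed intake air density in grams per liter

def theoretical_airflow_gps(rpm: float, ve: float = 1.0) -> float:
    """Type A relationship: airflow implied by physical first principles.
    A 4-stroke engine draws one cylinder-full per two crank revolutions."""
    return (rpm / 2.0) * DISPLACEMENT_L * AIR_DENSITY_GPL * ve / 60.0

# Hypothetical measured airflow (e.g., from a mass-airflow sensor), in g/s.
measured = {1500: 26.0, 3000: 55.0, 4500: 96.0}

# Calibration: choose per-engine VE values (the type B model) so that the
# model reproduces what this engine actually does, 100% ideal or not.
ve_table = {rpm: flow / theoretical_airflow_gps(rpm) for rpm, flow in measured.items()}

for rpm, ve in sorted(ve_table.items()):
    print(f"{rpm:>4} rpm: calibrated VE = {ve:.0%}")
# -> roughly 87%, 92% and 107%: far from a uniform 100% ideal.
```

The calibrated table can be verified against the type A structure and is useful for tuning this particular engine, but nothing about it generalizes: it is calibrated, not validated.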
2.3 MODELS, VALIDATION AND PREDICTION

Discussions of validation always raise the question of predictive capability. To set the stage for how we will address prediction in this discussion, we present a story. One of the central tenets of US foreign policy doctrine, from the 2003 invasion of Iraq through the end of the George W. Bush presidency (and arguably beyond), has been to spread democracy, especially in the Middle East but elsewhere too. In practice, this policy contributed to the overthrow of Saddam Hussein in Iraq and the Taliban in Afghanistan, and to subsequent nation-building efforts in both nations. During the same time period, it also led the US to put pressure on the Palestinian Authority to hold elections in the Palestinian territories, elections which were eventually held in January 2006. Arguably, the consequences in Iraq and Afghanistan are still being played out, but the starkness of the end result of the Palestinian elections (the victory in Gaza of Hamas, an organization which was and still is designated as a terrorist organization by the US Department of State) is clear.

Prediction thus is important in national security and other arenas. The result of the Palestinian elections was that an organization that supported terrorism was empowered in a territory previously controlled by a non-terrorist organization (Fatah). Had we been able to predict the outcome, our pressures on the Palestinian Authority to hold the elections might well have been modified. Then US Secretary of State Condoleezza Rice, speaking of her own staff, said: 'I've asked why nobody saw it coming… It does say something about us not having a good enough pulse' (Weisman 2006).

An important element in the context of the Gaza intelligence failure was, moreover, the fact that intelligence failures surrounding 9/11 were still fresh in the collective memory of the US Intelligence Community; the 9/11 Commission Report had been issued only some eighteen months earlier (National Commission on Terrorist Attacks Upon the United States 2004). Yet this same 9/11 Commission Report stated: "We believe the 9/11 attacks revealed four kinds of failures: in imagination, policy, capabilities, and management" (ibid.: 339). We had most of the data necessary to anticipate the 9/11 attacks. In fact, earlier administrations had discussed that eventuality. The failure was not a failure to predict. It was a failure to imagine.

We can discuss why that failure to imagine occurred. There is a voluminous literature on the subject, much of it based on Heuer's (1999) well-known discussions of analyst bias. But that is not the road this discussion needs to follow. We raise the failure of imagination to counterbalance the importance of prediction in an environment as high-consequence as national intelligence. We will follow this theme through the course of this discussion. As stated previously, our aim here is not to discredit or discount the predictive capability of computational social models. Rather, our purpose is to identify the limits of such a capability, and suggest alternative ways in which these models can be highly useful.
3 IS THE ABILITY TO VALIDATE DOMAIN-DEPENDENT?

We have posited that classic validation (which we define as validation tied to a model's predictive accuracy) is problematic in the sociocultural domain. In this section, we look at other domains where computational models have been used for many years – decades in the case of physics. These domains have developed a robust methodological and critical literature on validation. Our goal is to determine whether classic validation in these domains is, in fact, problematic, and if so, why. By selecting domains analogous in some way to social and/or cultural data and/or analytic processes, and identifying if and why validation in those domains is challenging, we believe that we can gain clearer insight into the issues encountered in validating models targeting the sociocultural domain. This will help us with our first question:
Are there characteristics of the sociocultural domain that make certain aspects of it very difficult if not impossible to model computationally in ways that would allow those models to meet the standards of proof required by classic definitions of validation?
The fields we have chosen are systems ecology, physics, and computational linguistics. There are differing reasons for our choice of each of these domains. We briefly address those reasons, then move to a more in-depth examination of validation in each of these domains.

The target domain of systems ecology is highly complex, involves multiple timescales operating simultaneously, and draws on data and theory from a multitude of subfields. All of these domain features are also true of the sociocultural domain. Not surprisingly, literature relevant to our exploration of validation of computational models in systems ecology came from a diversity of disciplines, such as geology and hydrology as well as ecology. In the same way, methods work in the sociocultural domain draws on anthropology, sociology, economics, psychology, and other disciplines.

In physics, our initial interest in modeling, and hence in validation, stemmed from our experience in the world of nuclear weapons physics – a primary mission area of DOE. For nearly half a century, final judgments about the safety, performance and reliability of the US nuclear stockpile were based primarily on the results of nuclear tests. The Comprehensive Nuclear Test Ban Treaty (CTBT) adopted by the United Nations General Assembly in 1996, which has been signed (but not yet ratified) by the US, led to a change in approach by DOE. For our purposes, the most notable change was the implementation of a multi-million dollar program known as Advanced Simulation and Computing (ASC). Through sophisticated computer simulations (for example, of nuclear explosions), ASC aims to allow weapons scientists to understand the complex aging process of weapons components and weapons systems in ways which allow the stewards of the national stockpile to assess its reliability. The underground test ban, and the resulting shift in how computational models were constructed and tested (and how reliance was placed on those tests), stimulated an interest in the physics community in the validation process itself, leading to publications that became seminal in other fields as they addressed validation.

Finally, we review validation in the field of computational linguistics. Language is a social artifact. It can be observed in production, and saved in repositories of written texts, some of which may be transcripts of utterances. The ability to computationally access and manipulate large bodies of similar texts known as corpora (sing.: corpus) has contributed to the development of the field of computational linguistics. These corpora are closed data sets and thus fixed and static. They provide a useful contrast both with the open domains addressed by systems ecology and physics and with the dynamic qualities of the systems ecology domain.
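The methodological consequence of a closed corpus can be shown in a few lines. In the toy sketch below (our illustration, using scikit-learn and an invented eight-document corpus), classic validation reduces to holding out part of the fixed data set and scoring the model's predictions against it – a procedure available precisely because the data set does not change while it is being studied.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Invented toy corpus: a closed, static data set with known labels.
docs = [
    "goal scored in the final minute", "the striker missed the penalty",
    "midfield passing won the match", "keeper saved a late shot",
    "parliament passed the budget bill", "the senator proposed new taxes",
    "voters rejected the referendum", "the minister resigned over policy",
]
labels = ["sport"] * 4 + ["politics"] * 4

# Classic validation: train on one slice of the fixed corpus,
# then score predictions against the held-out slice.
train_docs, test_docs, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, random_state=0)

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(train_docs), y_train)
print(f"held-out accuracy: {clf.score(vec.transform(test_docs), y_test):.2f}")
```

No analogous move exists for an open social system: there is no fixed remainder of the data to hold out, because the system keeps generating new, and newly conditioned, data.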
3.1 SYSTEMS ECOLOGY

Systems ecology as a domain applies systems theory to environmental science. Specifically, it applies notions of emergence and dynamics to environmental and related phenomena. It requires the integration of data, structure and dynamics from a wide range of fields, including the geological sciences, land use planning, biology, climate science, and the like.

Systems ecology makes extensive use of computational models. Some of these models are closely focused on answering a particular problem for a particular place. Others, like the coupled general circulation models being used to predict future climate change, integrate data, structure and dynamics from a wide range of domains, must operate simultaneously over multiple time scales, and are used to address both short- and long-term problems.

Validation in the classic sense of establishing that a model is predictive is highly problematic in systems ecology (Beven 2002; Oreskes 1998; Oreskes, Shrader-Frechette and Belitz 1994; Rykiel, Jr. 1996). Many of the reasons the validity of these models is problematic are similar to those which challenge the social scientist who works with computational models.
In a frequently cited article in the literature on validation in the earth sciences, Oreskes, Shrader-Frechette and Belitz assert that "the establishment that a model 'accurately represents the processes occurring in a real system' is not even a theoretical possibility" (1994: 642). This is due partially to the simplifying assumptions that are required in order to build a model, partially to the difficulty of validating what we call the type A model, and partially to other factors. Oreskes et al. argue that the best use of models is a heuristic one: "useful for guiding further study but not for proof" (ibid.: 646). We will come back to this assertion later in our discussion.

In a later article, Oreskes (1998) identifies four types of flaws of a model of any domain: theoretical, empirical, parametrical, and temporal. She illustrates each type with examples from the geosciences. We will summarize her typology here with the geosciences examples and add our own from the sociocultural domain.

We encounter theoretical flaws when we try to model phenomena we do not fully understand. This flaw can be encountered in our type A models, which provide the underlying structure for all type B models. Clearly, many systems ecology phenomena are phenomena we do not understand. If we better knew the natural laws governing the stressors that cause an earthquake like that in Japan in early 2011, for example, we could predict future earthquakes with far greater precision. Similarly, most sociocultural phenomena are poorly understood, even after over five thousand years of study and examination. Social theory today is highly contested (there are no 'fundamental laws of human dynamics') and continuously evolving. Any given social phenomenon can be explained by multiple theoretical constructs. Do people join social movements because they believe in a cause? Because their friends (their social network) belong? Because they derive a good measure of their social identity from belonging? The answer is probably yes to all, to some degree or another. As all computational models are structured by a theoretical base (a type A model), use of contested theory will lead to a contested model. And a contested model will raise questions about the confidence we may place in predictions we can make using the model. When the model application is to such high-consequence decisions as high value targeting, key leader engagement, or the design of routes to avoid emplaced improvised explosive devices, anything that challenges or reduces confidence in predictions becomes problematic.

Empirical flaws in computational models derive from instances where we cannot fully or precisely measure something, and so a level of uncertainty is introduced into our data.
Beven (2002) states the case quite strongly for ecosystem models: "Clearly, for many environmental systems, the perceived complexities are such that all the boundary conditions, auxiliary conditions and systems characteristics cannot be knowable given current measurement technologies" (Beven 2002: 2467). Many of the subsystems in the systems ecology domain that are measured are subject to measurement uncertainty. How precise were the instruments that were used to measure? Were they calibrated correctly? Were they used correctly? Were measurements accurately recorded?

We will discuss the problematic nature of sociocultural data in a moment. However, we briefly note here the parallel problems between an ecosystem and a social computational model. Beven's statement quoted above holds in the sociocultural domain as well. Where does a 'neighborhood' begin and end? What are all the factors exogenous to the neighborhood influencing interaction within it? And although what you measure in a sociocultural system is defined by your research question, how you measure it is much more problematic if you are measuring anything other than behavior as movement through physical space. How do you collect data on motivations? How do you measure belief or conviction?

We note here that rarely are the 'collection mechanisms', the measuring instruments, challenged in sociocultural modeling. Data are usually accepted 'as is', and any collection bias or methodological flaw in data collection is disregarded. Furthermore, behavioral proxies are often substituted for data that is intangible – belief in God, for example, is modeled as church-going behavior. And as we noted in our earlier work (Turnley and Perls 2008), rarely is the extent of the isomorphism of these proxies described and the cost of the substitution made explicit. Church-going behavior is stimulated by factors much more complex than simply a belief in God. In some communities, the church is the social center of the community. In others, peer pressure to attend is extremely strong, with the threat of social ostracism for those who do not appear on Sundays. And those are only a few examples. The movement of individuals through physical space is the same; the motivators can be very different. How you measure those motivators is far from a settled question.

And finally, the nature of relationships between actors is often a key piece of sociocultural data. The fact of the relationship makes up much of a social network. The nature of the relationship – which can be multifaceted between two given actors – is much more problematic, both to identify and describe, and to capture computationally. Empirical flaws are very much in evidence in sociocultural models.
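The cost of such proxy substitution can be made concrete with a simulation. In the sketch below, all of it invented for illustration, a latent 'belief' score drives only part of an observed 'attendance' score; the rest is social pressure and noise. The coefficients are arbitrary assumptions, chosen only to show the mechanism.

```python
import random
import statistics

random.seed(1)
N = 2000

# Invented generative story: attendance reflects belief only in part.
belief = [random.gauss(0.0, 1.0) for _ in range(N)]    # latent; not directly measurable
pressure = [random.gauss(0.0, 1.0) for _ in range(N)]  # peer pressure, sociality, habit...
attendance = [0.4 * b + 0.9 * p + random.gauss(0.0, 0.5)
              for b, p in zip(belief, pressure)]       # the observable proxy

# statistics.correlation requires Python 3.10+
r = statistics.correlation(belief, attendance)
print(f"proxy vs. latent belief: r = {r:.2f}")  # about 0.36 in expectation
```

A model validated against the proxy can score well while saying little about the belief it purports to represent; the extent of the isomorphism, as we put it above, has been left unexamined.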
Parametrical flaws arise when, as Oreskes says, "we reduce complex empirical phenomena to single or simply varying input parameters" (Oreskes 1998: 1457) in order to be able to accommodate them in a computational model. Beven (2002) suggests that modelers handle this by creating a 'model space' which is distinct from the 'landscape space.' The model space becomes a stylized landscape space, as it were, as each landscape unit is abstracted into a deterministic space. He uses global circulation models as an example. These models must accommodate every unit of analysis in the system as defined. Clearly computational capabilities cannot (for example) handle the chemical composition of every piece of dirt on the ground in New Mexico. Yet the differentiation among those pieces of dirt conceivably can affect global circulation. So modelers aggregate and abstract. For example, landscape units may be grouped into sets or types such as 'forested upland' or 'southwestern riparian', where it is assumed for modeling purposes that all units in the set are the same. However, a riparian area that recently experienced a wildfire is not the same, ecologically speaking, as one that has not.

Agent-based models do something similar when they create simple agents (agents with a limited number of attributes when compared to a human) who move through a social space defined by a small number of rules. This, of course, significantly underplays the variation in the landscape space. And while that variation could be represented in many cases by uncertainties in the model, computing power puts limits on this. Social network models clearly abstract a limited number of structures to analyze from a dense web of interconnections among agents. And SD models must significantly bound the definitions of the connections among the system elements they represent. Parametrical flaws are very much in evidence in computational social models.

And finally, temporal flaws for Oreskes are introduced when the modeler does not incorporate the complete dynamics of the system into the model. Rastetter (1996) adds what he calls sufficiency to this category of flaw, expanding on the rather simple definition Oreskes assigned to her temporal flaw category. Many of the systems in biological, geological, climatological or other relevant domains incorporate processes that may take decades, centuries, or even longer to unfold. Testing a model over a shorter period does not allow the dynamics and influences of these processes to become evident; yet computing hardware and firmware generally cannot simultaneously handle processes that span five minutes and centuries. Yet the output of the five-minute process may be a significant input to that which is centuries long. This is, in a sense, a type of empirical flaw as well, yet it is one that is often overlooked in computational social modeling.
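A toy calculation shows how this timescale mismatch defeats testing over short windows. In the sketch below (the dynamics are invented for illustration), a fast annual cycle rides on a slow 60-year drift. A trend fitted to one year of output tracks that year well, yet is wildly wrong fifty years out, because the slow process never had a chance to reveal itself in the test window.

```python
import math
import statistics

def system(day: float) -> float:
    """Invented system: fast annual cycle plus a slow 60-year drift."""
    fast = math.cos(2 * math.pi * day / 365.0)               # seasonal process
    slow = 2.0 * math.sin(2 * math.pi * day / (365.0 * 60))  # generational process
    return fast + slow

# 'Validate' a linear trend model against a single year of observations.
days = list(range(365))
observations = [system(d) for d in days]
slope, intercept = statistics.linear_regression(days, observations)  # Python 3.10+

day_50y = 365 * 50
print(f"trend model at year 50:   {slope * day_50y + intercept:+.1f}")  # about +10
print(f"actual system at year 50: {system(day_50y):+.1f}")              # about -0.7
```

Over the one-year test window the fit looks excellent; the flaw is temporal, not statistical.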
Human systems are certainly subject to temporal flaws. Changes in values, for example, often take generations. The impact of a war fought on one's home territory will certainly be felt over generations, although it will be felt differently by each individual. The impact of a new technology on a society cannot be seen in the space of a single year. The difficulty of incorporating different temporal cycles in models of an irregular warfare environment, for example, forces the modeler to recognize that consequences of a direct approach (kinetic activities) will reverberate through a sociocultural system very quickly (as well as leaving residuals that manifest over generations), while consequences of an indirect approach (non-kinetic activities) take much longer to appear, often longer than the model time horizon. And the converse is also true, to some degree, in both human and ecological systems. The way in which an organism responds to a diurnal cycle may or may not have long-term effects. What you had for breakfast this morning may have very little to do with how you respond to a physical challenge next year (although the set of all your breakfasts between now and then might).

Almost all who address aspects of the difficulty of modeling the environment emphasize that at least part of that difficulty stems from the nature of the system itself. Environmental systems are open systems – as are social systems. Open systems are those which constantly exchange something of value, such as energy, information or resources, with their environment. The target system thus is constantly changing; it is highly dynamic. The dynamic nature of the system means that unless these dynamics are an integral feature of the modeling approach, the model may present a snapshot of a phenomenon rather than a description or an explanation of its functionality and functioning. Social network analysis illustrates the limitations of the exclusion of dynamics. Each network is a picture – a snapshot – of a part of a whole. There is nothing inherent in that picture of a structure that would tell us why an individual would join or leave a network. Any such rules of change that are applied are learned from sources exogenous to the network itself. The example we gave earlier of some of the difficulties of modeling a neighborhood also illustrates this problem. Where are the boundaries of the neighborhood? Why there and not here? What about things that happen outside the neighborhood? They have influence on dynamics internal to the neighborhood…and so on.

Open systems (as well as closed systems) also exhibit characteristics of equifinality, i.e. "many different structures or parameter sets may give simulations that are acceptable representations of the observations available" (Beven 2002: 2479). That is, different initial states can lead to the same end state.
To illustrate this point through its inverse: for any given rule system in an agent-based model, the output of a model run will be just one point in a space of possibilities for that model. And while running the model thousands of times may give you a distribution of those points (as the sketch following this passage illustrates), it is still very important to remember that points at the tail of that distribution are still possible outcomes of the model structure, and one ignores them at one's peril. Therefore to assert that a model run produces a given set of data is to assert just that. It does not imply that the model construct is the only one which will produce such data. Do individuals turn to terrorism because of social pressures, prior involvement in ideological groups that subsequently adopt such tactics, or because of some relatively low level of risk aversion? Again, the answer is probably all of them, to some greater or lesser degree. Creating a model of how individuals join terrorist groups based on social pressure may yield a result that 'looks like' the real world, but nothing in that model structure or its output tells you that other structures may also yield similar results. The system's environment will determine to which end state a given path will lead at a particular point in time. We will return to this notion later when we discuss model calibration.

Emergence is another problem raised for modeling by the highly complex systems in ecology as well as in the social domain. "Putting things colloquially, emergence is the concept of some new phenomenon arising in a system that wasn't in the system's specification to start with" (Standish 2001: 3). Prediction as a function of model inputs and rules simply is not possible. Future states can only be imagined.

Finally, ecological models are often developed in reference to some particular place or locality. The data set is local, the system definition is local, and the resulting model is checked against a local data set. Although the model may initially be based on some fundamental processes, the constraints placed on those processes as they are parameterized by local conditions reduce the model's universality. As Refsgaard and Henriksen (2004) point out, for the representation of fundamental (generalizable) processes to be validated in the classic sense of the term, the model would need to be checked against multi-site and multi-variable data. For a model that is explicitly site-specific (a type B model), carrying out this process is not possible. This 'uniqueness of place', as Beven (2002: 2476) phrased it, is found in social systems also. While patron-client networks in northern New Mexico and northeastern Afghanistan will share certain characteristics, the social and historical exigencies of the local context may make them look and function quite differently at times.
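The following sketch (ours, with invented numbers) illustrates the point above about distributions of model runs: the same rule system, run many times, yields a spread of outcomes, and the outcomes in the tails are just as much products of the model structure as the median is.

    import random

    def one_run(n_agents=100, p_join=0.02, steps=50, seed=None):
        """Toy stochastic process: at each step, each unaffiliated agent
        joins a group with small probability; return final group size."""
        rng = random.Random(seed)
        joined = 0
        for _ in range(steps):
            for _ in range(n_agents - joined):
                if rng.random() < p_join:
                    joined += 1
        return joined

    results = sorted(one_run(seed=i) for i in range(5000))
    print("median:", results[len(results) // 2])
    print("5th-95th percentile:", results[250], "-", results[4750])
    print("tails produced by the same model:", results[0], results[-1])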
As with the fuel injection model we described earlier, the computational social model begins with an 'ideal' structure which incorporates certain principles of relationship, and which is then modified to fit the time and place. In a patron-client model, for example, the principles might be that the exchange of value is asynchronous, is between individuals of unequal power, is usually not symmetric in terms of what is exchanged, and usually creates a very strong social bond. One modification placed upon its instantiation in northern New Mexico is that the relationship is usually between kin.

Computational models of environmental phenomena developed with a systems ecology perspective are used for purposes as diverse as establishing regulatory guidelines and operating local fish hatcheries. Beven (2002) opens his critique of ecological modeling with the statement that "The predominant philosophy underlying most environmental modeling is a form of pragmatic realism" (Beven 2002: 2465). If these models are as flawed as the philosophical discussions of validation would have them, how, then, are they useful?

The nature of the target system and associated theories constrain the type of model that can be developed. In some areas, such as watershed science, the required theories of large-scale systems are not fully developed. Theories of small sub-systems do not scale in a complex systems environment; complex systems are not amenable to a reductionist approach. And in some cases the instrumentation required for observations and measurements at the large-scale level may simply not be developed. In these cases, what is called 'validity' relies heavily on an assessment of goodness of fit of a model to an existing data set (McDonnell et al. 2007; Vogel and Sankarasubramanian 2003).

Refsgaard and Henriksen (2004) suggest classifying models as either engineering or research models. Research models (our type A models) do have to be validated through comparison to multiple sites. An engineering model, however, is of a particular site and developed for a particular use. Here, then, one could talk of the validity of the model "within certain domains of applicability and associated with specified accuracy limits" (Refsgaard and Henriksen 2004: 77). Domain-of-applicability and accuracy limits are developed in consultation with the model user. (We will return to the importance of the user later.) We would generally agree with this, although we would say that this is a loose use of the term 'validity.' We would argue that the validity of an engineering model as just described is an exercise in calibration rather than one of validation, and would clearly be applicable to our type B model.
Type B models are designed for a particular time and place. They thus can provide explanatory depth, but they do not have broad explanatory power. In sharp distinction to a research model, they are not meant to be generalizable. As such, they are not required to be subjected to the multi-site, multi-variable tests that would constitute a full validation cycle. Goodness of fit to a particular set of data is sufficient. The model is calibrated, not validated. However, this means that any major perturbation to the target system renders the model that much less useful.

So we find that the dialogue in systems ecology about the validation and use of computational models has identified some areas of interest for us. The open (unbounded) nature of the system, which permits the system's dynamic character, leads to significant concerns related to its computational representation. Equifinality also needs to be recognized and model results qualified accordingly. Systems ecology has addressed these issues, when applying a model to a specific problem, by treating such models as a different type. Once they are labeled as such (and clearly not identified as research models), they can be fit to a particular data set, i.e. calibrated, not validated. No claims are made for their universality. Summarizing the above discussion, we find that systems ecology has the characteristics listed in Table 2:

Table 2: Domain characteristics of systems ecology
Reductionist?                      No
Open or closed systems             Open systems
Variance by locality               Yes
Theoretical flaws / equifinality   Frequently encountered
Empirical flaws                    Frequently encountered
Temporal flaws                     Yes
Parametrical flaws                 Yes
Together, these significantly complicate the process of classic validation.
3.2 PHYSICS

The physics domain has had decades of experience with computational models of universal principles and their application to domain-specific problems. The ban on underground testing of nuclear weapons instituted in 1996 through the Comprehensive Nuclear-Test-Ban Treaty (CTBT), and the resulting increased reliance on computational simulations of various aspects of nuclear explosions, stimulated an interest in the physics community in the validation process itself. This led to publications that became seminal in other fields (such as systems ecology) as they addressed validation.
Oberkampf and Trucano (2002, 2007), Oberkampf, Trucano, and Hirsch (2003), and Oberkampf and Barone (2005) have prepared extensive reviews and critiques of the physics literature on validation, critiques which are referenced in the systems ecology literature as applicable to models in that domain also. Supporting Oreskes et al.'s (1994) belief that the establishment of isomorphism between a model and a system in the world is "not even a theoretical possibility," Oberkampf et al. state that "all-encompassing proofs of correctness, such as those developed in mathematical analysis and logic, do not exist in complex modeling and simulation…[computational] models of physics cannot be proven correct; they can only be disproved" (Oberkampf, Trucano and Hirsch 2003: 13). They argue that a model can never be fully validated because it can never be compared to experimental data from all possible situations, even including the intended use case. So there will always be some measure of uncertainty as a model is moved from an experimental environment to a use environment.

So let us begin with the importance of the experimental environment. If validation "addresses the question of the fidelity of the model to specific conditions of the real world" (Oberkampf et al. 2003: 9), those conditions must first be observed and documented. This is the role of the experimental physicist. Oberkampf et al. go on to say that "this strategy does not assume that the experimental measurements are more accurate than the computational results. The strategy only asserts that experimental measurements are the most faithful reflections of reality for the purposes of validation." (Oberkampf et al. 2003: 10).

Note that an experimental approach such as that described in the previous paragraph is dependent upon a domain that is amenable to a reductionist approach. One can start by validating models of the smallest elements and then combine these elements into more complex structures. This building-block approach (described by Oberkampf et al. 2003 with many references to the existing literature base) works well for linear closed systems. In physics, for example, the 'world' can be disaggregated into its smallest chunks without doing violence to the whole. Experiments can be performed on these small chunks of 'world' to generate data that look like the real world. Models can be built of, and validation tests run on, these small chunks. The chunks then can be reaggregated according to known principles. The validity of the whole is a function of the validity of the parts. However, as Oberkampf et al. (2003) do point out, the process does break down as systems become more complex and dynamic.
The ties between the tiers become much weaker and more problematic. We have already mentioned that the reductionist approach is generally inappropriate for a complex dynamic system such as an ecosystem or a sociocultural system.

Dependence on experimentation as a key element in a validation methodology raises problems for those who work with computational social models. First, of course, are the ethical issues around experiments on human populations. Ethical considerations preclude any experiments which might put individuals at risk; for example, we cannot provide a benefit or remove a cost for one population and leave a second (our control population) untouched to be able to watch the effects of our intervention, except under extremely constrained circumstances. The Tuskegee syphilis experiments showed the social cost of unmonitored behaviors in this area (Jones 1992). The second problem stems from the systems nature of the human social domain. As we said earlier, when working with complex systems, it may not always be possible to disaggregate them into their component parts, understand the components, and then reaggregate them into a full system. Once an element is removed from a system, its nature and that of the system may change fundamentally.

But back to physics. We begin with experiments as "the most faithful reflections of reality for the purposes of validation." The data generated by computational models are then compared against those developed by experiments. Some quantifiable measure of (or lack of) isomorphism is developed which represents the 'accuracy' of the model under a specified set of conditions. The accuracy measurement must account for uncertainty in both the experimental and model data. Ideally, data from a particular model would be compared against an infinite number of experimental data sets representing all possible conditions to determine the universality of the principles the model embodies (see, for example, Oberkampf, Trucano, and Hirsch 2003). As that is obviously not possible, the model will be compared against some subset of experimental data sets. It will meet some user-determined accuracy threshold in each subsequent test – until it does not, or until someone decides that it has met the threshold in enough tests to be considered 'accurate enough.' At this point, having met the accuracy threshold in n data sets, some determination (measure) is made of the likelihood that the model will meet the accuracy threshold in the n+1th set of conditions, where the last set could be the use environment. The result of the validation process is a measure of confidence that the model-generated results will agree with the next set of 'real-world' data it is given.
As they say about swimming pools – you never really know if there's water in there until you jump – so one never really knows if a model will work in a use environment until it is used there. Using computational models is a risky business. The validation process is designed to reduce that risk, but it will never eliminate it.

To paraphrase Oberkampf, Trucano and others who have written in this field, we suggest there are three key issues for validation in computational physics. The first is the quantification of the accuracy of the computational model through its comparison with experimental data. Accuracy is a measure of the likelihood that the model will fit the n+1th set of conditions. This allows us to derive a confidence measure. This confidence measure is not prediction. It does not allow us to say if a then b. However, it does allow us to say that if a, we are X% confident that b will follow.

The second issue to which Oberkampf and others refer is the extrapolation of the model to the conditions of the use environment. As noted above, a model can never be fully validated because it can never be compared to experimental data from all possible situations, including the intended use case. A model thus can only be invalidated, not validated, just as a scientific theory can only be disproved, not proven (Oberkampf and Trucano 2002) – a fact which is perhaps at odds with the existence of definitions by the DOD, DOE and others for 'validation' rather than 'invalidation'. So there will always be some measure of uncertainty when a model is moved from an experimental environment to a use environment. The uncertainty can be expressed in a number of ways, but its expression always takes us to the third issue.

The third key issue is to determine whether the model is accurate enough for the specified use case. As an example, we may have a situation where a model has met the accuracy constraints we have put upon it so that we can say that we are 95% confident that the next iteration of the model will also yield an accurate (i.e. 'true') result. Whether that level of confidence is sufficiently high for the model to be applied in the environment of its intended use depends upon factors exogenous to the model, such as the cost/benefit of an inaccurate or 'wrong' answer. In the nuclear weapons world the cost is very high. Therefore, confidence rates must also be very high. In counterinsurgency environments, actions around key leader engagement – identification of true power brokers in a particular village, for example – will require a lower level of confidence in a model than high-value targeting, where the goal is to kill or otherwise eliminate a person.
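As a worked illustration of the first and third issues (ours, not a procedure prescribed by the authors we cite), Laplace's rule of succession is one simple way to turn 'the model met its accuracy threshold in n of N comparisons' into a hedged confidence about the N+1th comparison; the numbers below are invented.

    def confidence_next_pass(passes: int, trials: int) -> float:
        """Laplace's rule of succession: estimated probability of passing
        the next test after `passes` successes in `trials` tests."""
        return (passes + 1) / (trials + 2)

    # a model that met its accuracy threshold in 18 of 20 comparisons
    print(round(confidence_next_pass(18, 20), 3))   # 0.864

Whether roughly 86% is 'confident enough' is, as the text argues, a judgment exogenous to the model, made by a user weighing the cost of a wrong answer in the intended use environment.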
This all introduces the model user into the validation process. Someone makes a determination that the model is 'accurate enough.' That someone may be the body politic, as in the case of nuclear weapons; it may be a planning community, as in the case of other models used for national security purposes; or it may be an individual. This introduces the user and, consequently, a great deal of variability into the validation equation. We will return to this point later.

Validation as we have described it is a process of continually comparing a model with new sets of data until the user believes the process has gone on 'long enough' and the model is sufficiently accurate for its intended use. This qualitative and user-centric nature of the validation process and its output has been addressed by many in a variety of domains (for examples from physics and the geosciences respectively, see Oberkampf and Barone 2005 and Oreskes and Belitz 2001). Konikow and Bredehoeft (1992: 78) point out that "one competent and reasonable scientist may declare a model as validated while another may use the same data to demonstrate that the model is invalid." Ultimately, the validity of a model is a function of the likelihood (probability) that the model will produce conforming results on the use case run, given the nature of that use case and the user's risk attitude. As we stated in our introduction, validity is a function of the relationship between a user (which may be an individual or a community of practice) and a model, not a characteristic or state of the model itself.

In summary, we find that physics as a domain has the characteristics outlined in Table 3:

Table 3: Domain characteristics of physics
Reductionist?                      Yes
Open or closed systems             Closed systems
Variance by locality               No
Theoretical flaws / equifinality   Not an issue
Empirical flaws                    Sometimes encountered
Temporal flaws                     Not an issue
Parametrical flaws                 Not an issue (reductionist approach obviates the need)
Together, these characteristics mean that classic validation is generally unproblematic in physics.
3.3 COMPUTATIONAL LINGUISTICS

Human language is clearly, if not a social phenomenon, then at least a phenomenon which has a great deal to do with the social sphere. If we move away from a Saussurean interest in semantics (the relationship of arbitrary sets of symbols to the real world) and a sociocultural linguistic interest in language in use, we can focus on produced language sets known as corpora. Arguably this would be analogous to an already produced set of behaviors which could be modeled computationally. This and other factors make computational linguistics (or equivalently, computational language modeling) a good candidate as an analog for computational social modeling within the context of our present study. A key pertinent difference between most computational social models and computational linguistics, as both are generally practiced, is that computational linguistics has clearly established methods of assessing the goodness of models, while the very crux of the present inquiry is whether it is even possible to establish metrics allowing validation of computational social models. This difference leads naturally to the question of whether it is the domain or the method which gives rise to the possibility of validation on the one hand but not on the other. In this section we shall explore what computational linguistics is, how validation is practiced in the field, and what inferences (if any) we can draw for validation in the field of computational social modeling.

Computational linguistics is an interdisciplinary field dealing with statistical and/or rule-based modeling of natural language (i.e. human language, as opposed, for example, to computer programming languages) from a computational perspective. As a field of study, computational linguistics traces its roots back several decades to the period when machine translation was first attempted in the 1950s (with limited success, we should note). In pragmatic terms, the impetus for increased research in the area was partially military (the US government realized early on the potential for automatically decoding foreign-language material during the Cold War), and the enabling factor was that mainframe computers were coming into their own at the time. However, the approach taken in the early days to computational modeling of language was very different from the more successful variants which have emerged in the last decade or two (variants which are, incidentally, much more heavily reliant on classic validation techniques), and the change in thinking mirrors changes which have taken place in computing in general, where memory and storage have become significantly cheaper – a trend which was predicted by Moore (1965) and which has continued until today. Though machine translation was definitely one early goal, it should be emphasized that modeling in computational linguistics is not limited to machine translation or any particular subfield of linguistics.
Focusing again on machine translation, it has only been in approximately the last two decades that significant progress has been made. This has been enabled by a 'data-driven' approach to language processing in general, in which a set of rules, or more properly a conceptual model of language, emerges from the data itself. This approach contrasts sharply with the Saussurean or Wittgensteinian tradition (see, for example, Koster 1996 or Holland 1992), in which language is a system of symbols, arbitrary combinations of concepts and sound images that create systems with internal logic and arbitrary connections to real-world referents (semantics) (De Saussure 1959 (1916): 65ff). A key feature of data-driven models is that they actually tend to improve the more data they are given as training material. Examples of current and highly successful data-driven approaches are Statistical Machine Translation (SMT) (Brown et al. 1994), Latent Semantic Indexing (LSI) (Deerwester et al. 1990) and Latent Dirichlet Allocation (LDA) (Blei et al. 2003).

It is here that perhaps the first clues become apparent as to where the analogy may break down between computational linguistics and computational social modeling. As we have shown, the social domain (like that described by systems ecology) is an open system, with many and heterogeneous factors influencing a given system. In some cases, these factors may even be unknowable, or if they are known, it may not be feasible to collect relevant data. In other cases, where target features of the system are unobservable (such as motivation, beliefs, etc.), it may be necessary to collect highly imperfect surrogate data (such as church attendance).

In general, then, computational linguistics follows the same validation path described earlier in the section on physics. The validation exercise involves testing the model against as many data sets as possible, and stopping when the tester (user) is reasonably certain that test n+1 would yield the same results as test n.
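To make the 'data-driven' idea above concrete, the following sketch (ours, not taken from the cited papers) shows the core move of Latent Semantic Indexing: a model of word association is induced purely from a term-document count matrix via the singular value decomposition, with no a priori linguistic rules. The toy vocabulary and counts are invented.

    import numpy as np

    # rows = terms, columns = documents (invented counts)
    terms = ["nuclear", "weapon", "treaty", "movie", "actor"]
    X = np.array([[2, 3, 0, 0],
                  [1, 2, 0, 0],
                  [3, 1, 0, 1],
                  [0, 0, 4, 2],
                  [0, 0, 2, 3]], dtype=float)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2                               # keep two latent dimensions
    term_vecs = U[:, :k] * s[:k]        # terms mapped into the latent space

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # 'nuclear' lands nearer 'treaty' than 'movie' -- learned from
    # co-occurrence alone, not from any hand-written rule
    print(cosine(term_vecs[0], term_vecs[2]))
    print(cosine(term_vecs[0], term_vecs[3]))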
Sommerstein's (1977) statement with regard to phonology (a branch of linguistics) supports this notion of validation as process, and the statement made by Oberkampf et al. that a model can never be validated but only disproved:

In science we frame and test hypotheses. It does not matter in the least how these hypotheses are arrived at in the first place; it is the exception rather than the rule for an interesting hypothesis to be reached by a mechanical procedure, such as phonemic analysis essentially is. Rather, what makes a hypothesis scientific or unscientific is whether it can be stated what kind of empirical evidence will tend to disconfirm it, and what kind will definitely refute it. And there is no reason why this general scientific principle should not be valid for phonological analysis. (Sommerstein 1977: 9)

Interestingly, there may be cases where we want to ignore the counter-examples. Chomsky and Halle (1968: ix) explain as follows why ignoring counter-examples may be justified:

We see no reason to give up rules of great generality because they are not of even greater generality, to sacrifice generality where it can be attained. It seems hardly necessary to stress that if we are faced with the choice between a grammar G1 that contains a general rule along with certain special rules governing exceptions and a grammar G2 that gives up the general rule and lists everything as an exception, then we will prefer G1. For this reason, citation of exceptions is in itself of very little interest. Counterexamples to a grammatical rule are of interest only if they lead to the construction of a new grammar of even greater generality or if they show some underlying principle is fallacious or misformulated. Otherwise, citation of counterexamples is beside the point.

Put another way, a model which accounts accurately for 99.5% of cases may still be useful, even if the 0.5% of counter-examples have to be listed as exceptions.

Computational linguistic models can have multiple functions. One function might be to assist a language learner: for example, a model describing the formation of plurals in English could be helpful to a learner of English who is familiar with general concepts in linguistics, but not the specifics of English, in knowing when to use 's' or 'es'. Another function might be to serve as a basis for programming a computer to recognize which (singular) dictionary form a particular plural is associated with, an application which could have a variety of practical uses. Classical validation (such as measuring, in percentage terms, the accuracy of the theory) is eminently suited to assess how well models perform functions such as these. A third, but by no means least important, function of computational linguistics is to inform through the very process of formulating the model and examining the intuitiveness and correctness (or otherwise) of its predictions. Linguists formulate rules from their knowledge of how languages in general work. In other words, by formulating a theory (a type A model) and evaluating its predictions, the linguist will be able to formulate better models in the future.
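A minimal sketch (ours) of the plural-formation model just mentioned, built in the spirit of Chomsky and Halle's grammar G1: a few general rules plus a short, explicit exception list. The coverage is illustrative, not exhaustive.

    # general rules plus listed exceptions (grammar 'G1' style)
    EXCEPTIONS = {"child": "children", "mouse": "mice", "sheep": "sheep"}

    def pluralize(noun: str) -> str:
        if noun in EXCEPTIONS:                            # listed exceptions
            return EXCEPTIONS[noun]
        if noun.endswith(("s", "x", "z", "ch", "sh")):
            return noun + "es"                            # the 'es' rule
        if noun.endswith("y") and noun[-2:-1] not in "aeiou":
            return noun[:-1] + "ies"                      # consonant + y
        return noun + "s"                                 # default 's' rule

    for w in ["cat", "church", "box", "city", "day", "child", "sheep"]:
        print(w, "->", pluralize(w))

Classical validation of such a model is straightforward: run it over a word list and report the percentage of plurals it gets right.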
Related to this is the criterion of linguistic generality alluded to by Chomsky and Halle (ibid.): a general model may be preferable to one which simply lists everything, even if the latter has higher accuracy. It is hard to conceive of a classical validation strategy which would assess a model according to the criterion of linguistic generality, but it is precisely here that the danger of relying solely on classical validation becomes clear. Classical validation would lead us to prefer the heuristic-based, exception-listing model which matches our specific case best (arguably, one could say it has been calibrated through the listing of exceptions) over the linguistically smarter model which might be more useful in cases other than the particular use case at hand. It is not enough to say that assessing the goodness of a model in terms of linguistic generality is hard, and that we should rely only on the forms of classical validation which are easier to pin down, as this clearly creates a 'blind spot' in how we view competing models.

Furthermore, logically we should not exclude that there could be cases, even in computational linguistics, where classical validation is impossible. One area to look to for examples of this is sentiment analysis, an area within computational linguistics which also brings us closer to the social domain. Pang et al. (2002), a very widely cited paper on sentiment analysis, presents an apparently impressive comparative empirical evaluation (based on accuracy) of various approaches to sentiment analysis. Yet despite this, the authors never really define what sentiment is. Instead, in their paper, positive and negative sentiment are in effect characterized by exemplars of each. This is an instance of using proxies (the exemplars) for the target datum, as we mentioned earlier. We now illustrate more clearly the pitfalls of such an approach. This discussion matters to the national security community because the ability to abstract ideology and recognize deception, two areas of particular interest to that community, depends on accurate assessments of sentiment. Hence a significant amount of work is currently being funded in this area under the rubric of 'sentiment analysis'. Yet if these efforts rest on the shaky foundation of unclear definitions, all the empirical evaluation in the world may be of little use.

The documents which Pang et al. classify are on-line movie reviews, each associated with a reviewer rating such as a number of stars. At a high level, the goal of their approach to sentiment analysis is to predict positive or negative sentiment (high and low star ratings respectively) based solely on the words (and sometimes punctuation) used in the reviews. The way this works is that an algorithm is trained to recognize sentiment based on examples from the data (in a way similar to the recent approaches to machine translation already discussed). The output is validated by reserving some of the examples for testing, and determining in what percentage of cases the algorithm makes the correct predictions; this kind of approach is standard in the field of data mining.
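A minimal sketch of this train-and-test pipeline (ours, using scikit-learn as an assumed library; the inline 'reviews' are invented stand-ins for a real labeled corpus):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    reviews = ["a wonderful, moving film", "brilliant acting throughout",
               "an utter waste of time", "dull plot and worse dialogue",
               "great direction and score", "terrible, boring mess"]
    labels = [1, 1, 0, 0, 1, 0]        # 1 = high star rating, 0 = low

    X_train, X_test, y_train, y_test = train_test_split(
        reviews, labels, test_size=0.33, random_state=0)

    vec = CountVectorizer()            # bag-of-words features
    clf = MultinomialNB().fit(vec.fit_transform(X_train), y_train)

    # held-out accuracy against the star-rating 'ground truth'
    print(accuracy_score(y_test, clf.predict(vec.transform(X_test))))

Note that the reported accuracy inherits every assumption baked into the star-rating labels – the point pursued in the next paragraphs.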
already discussed). The output is validated by reserving some of the examples for testing, and determining in what percentage of cases the algorithm makes the correct predictions; this kind of approach is standard in the field of data mining. So much for the general approach; but what of the underlying assumptions? Note that Pang et al. operate on the assumption that a high number of stars can automatically and uniformly be translated into positive sentiment, or a low number of stars into negative sentiment. One reviewer can be unhappy with the production quality but really enjoy the acting, while a second reviewer feels exactly the reverse. Further, there is the assumption that my ‘four out of five star’ rating reflects the same intensity of sentiment as yours. All the analysts can really say about any number of stars is that the reviewer had some degree of like/dislike about something related to the movie. This is a critical point, because the star ratings are, after all, treated in this case as the ‘ground truth’ for the purposes of validation. The significance of this becomes apparent when we use a different example. A reviewer can give President Bush a negative review after a speech because he does not like Mr. Bush’s delivery style – but this does not mean that he disapproves of the way Mr. Bush was managing the presidency. The referents of the ‘sentiment’ often are not clear, creating significant ambiguity in the analysis. In summary, their ‘computational sentiment model’ uses some methods from computational linguistics and does purport to be measurable in terms of fit to the data, yet it also rests on questionable assumptions such as that number-of-star ratings are a reasonable analog to ‘sentiment’. Yet a crisp definition of ‘sentiment’ is surely more elusive than that. How does sentiment differ from topic? From ideology? Is deception a type of sentiment? Is it possible that Pang et al.’s success in classifying documents by ‘sentiment’ is indirectly due to the fact that their positive and negative movie reviews also separate relatively cleanly along the dimension of topic, as suggested by Kegelmeyer et al. (2011)? With techniques of the type they use, the answers to such questions are opaque. We selected computational linguistics as a domain of comparison because language is a social phenomenon. However, we find that the data sets upon which most computational linguistic models are tested are closed static corpora in which words are treated as objects independent of the Saussurean meaning that arises from some connection of them to the world. For modern computational linguistics in general, theories are data-driven, arising from patterns inherent within the corpus. Once a theory (rule, type A model) can be formulated, the validation process is very similar to that in physics. The model is tested on n corpora until the tester (user) believes that to compare it to the n+1th corpus would yield Page 36
Results are expressed as the average accuracy, which can also be thought of as the degree of isomorphism. We note that computational linguistics does allow for models to be useful if they yield a high but not perfect isomorphism and the exceptions are listed.

Problems begin to arise for computational linguistics models (as for other mathematically based models) when the target of those models is harder to define crisply (see the sentiment analysis example above). We suggest that the problems arise because the models are now engaging with a very different domain. They no longer are targeting corpora (although those may be the data sets used) but are aiming more directly at the complex, dynamic, and open world of sociocultural systems. Computational linguistics as a domain is qualitatively different for modeling purposes from systems ecology and physics. We can see this difference illustrated in Table 4. Again, the import of this is that classic validation is generally unproblematic in computational linguistics.

Table 4: Domain characteristics of computational linguistics
Reductionist?                      Yes
Open or closed systems             Closed systems
Variance by locality               Yes
Theoretical flaws / equifinality   Not relevant
Empirical flaws                    Not relevant
Temporal flaws                     Not an issue (no temporality)
Parametrical flaws                 Yes
We end this section with the description of one more use of models in computational linguistics that is outside the scope of classic validation methods, and highly relevant to sociocultural systems. As Shieber (1985: 192) points out, computational modeling of language has more than one advantage over its non-computational counterpart: the computer can act as a straitjacket (forcing rigorous consistency and explicitness), touchstone (indicating a theory's correctness and completeness), and mirror (objectively reflecting everything in its purview):

...we have found that among those who have actually attempted to write a computer-interpretable grammar, the experience has been invaluable in revealing real errors that had not been anticipated by the Gedanken-processing typically used by linguists to evaluate their grammars - errors usually due to unforeseen interactions of various rules or principles.
Clearly, there are analogs here to computational social modeling. If there is a lack (in either linguistic or social models) of either (1) clarity about the nature and scope of a theory's predictions, or (2) data on which the theory can be tested, then perhaps the greatest value of computational implementation is not in yielding empirical evaluation (acting as touchstone or mirror), but rather in serving as a straitjacket. While the general scientific principle that a model should be falsifiable dictates that classical validation and empirical testing should be carried out upon a model where possible, we must concede that other criteria for assessing a model's goodness are also important, namely that the model should be in line with general (linguistic or social) theory, and that it should be internally consistent. And if classical validation is not possible, we should not necessarily throw a model out as useless: using alternative methods of assessing model goodness may be sufficient, provided we are clear on what the model is good for.
3.4 SUMMARY

In this section (section 3), we have compared social science to three other domains, particularly with respect to the issue of validation. Table 5 summarizes this comparison and shows how what we learned from the other domains transfers into the sociocultural realm.
Table 5: Validation in other domains

Theoretical flaws
Physics: Not an issue – in most cases, fundamentals of system behavior are understood ('fundamental laws of physics').
Computational linguistics: Not an issue where computational linguistics adopts a data-driven approach that lets theory emerge from data, rather than be assumed a priori and incorporated in model structure.
Systems ecology and sociocultural domain: Often encountered – phenomena are not fully understood, so there are many possible model structures.

Empirical flaws (we cannot precisely measure something, contributing to uncertainty)
Physics: Empirical flaws exist, but only to a marginal degree; phenomena are much better understood and bounded than in the comparison fields.
Computational linguistics: Not an issue in a data-driven approach.
Systems ecology and sociocultural domain: Often encountered – How were data defined? How were they measured? Can we identify all exogenous factors that influence our target data?

Parametrical flaws (complex phenomena are reduced to simple parameters)
Physics: Not an issue – the reductionist approach obviates the need.
Computational linguistics: Not an issue in a data-driven approach.
Systems ecology: Parametrical flaws are encountered.
Sociocultural domain: Encountered when complex social systems are modeled by creating agents that are extremely simple in terms of the number of defining attributes and rules used to generate behavior.

Temporal flaws (the complete dynamics of the system are not modeled)
Physics: Not an issue – the reductionist approach removes the problem.
Computational linguistics: Not an issue – the system is static.
Systems ecology: Temporal flaws are encountered.
Sociocultural domain: Encountered when social systems have phenomena operating according to a multitude of timescales, ranging from seconds to centuries; computational models make a choice of which to capture.

Open/closed system
Physics: Effectively a closed system (the reductionist approach sidesteps the problem).
Computational linguistics: Closed system – the target data set is a closed system (although inferences are made to the full language system, they are not made until the model is validated on a closed system).
Systems ecology and sociocultural domain: Systems are open systems – the system of interest will exchange energy with its environment and so be conditioned by it to some degree.

Equifinality (many different inputs may lead to the same output, making it difficult to assert causality)
Physics: Not an issue.
Computational linguistics: Not an issue in a data-driven approach.
Systems ecology and sociocultural domain: Systems exhibit equifinality.

Reductionist?
Physics: Yes – systems are amenable to a reductionist approach: one can disaggregate them, perform experiments on small parts, and reaggregate with no loss of fidelity to the parent system.
Computational linguistics: Not relevant in a data-driven approach; the modeled domain has no dynamics.
Systems ecology and sociocultural domain: No – systems are not amenable to a reductionist approach (because they exhibit emergence).

Variance by locality
Physics: No – basic units do not vary by time and place.
Computational linguistics: Not relevant in a data-driven approach.
Systems ecology and sociocultural domain: Yes – systems or models are developed in reference to a particular place or locality.
The key point to be drawn from Table 5 is that sociocultural systems share most features with ecological systems on the dimensions of concern to the application of validation methods:
- They both deal with highly complex, dynamic, and open-ended systems
- Much of their subject matter is not directly observable
- Their theoretical (and methodological, in some cases) base is still developing
This places social science (and systems ecology) in sharp contrast to physics and most of computational linguistics, domains with limited complexity and minimal dynamics associated with that complexity. The reductionist nature of the physical domain allows the generation of experimental data which facilitate the comparison of computational models against multiple data sets. In much of computational linguistics, the target data set (a corpus) is closed, static, and complete, and the problems can be formulated in ways which lend themselves easily to mathematical and logical expression (for example, there is almost always a single right answer as to whether a plural in English is formed with 's' or 'es'). In a general sense, there is no theory constructed a priori. But when computational linguistics attempts to target and measure more elusive phenomena, as it does in sentiment analysis, validation of the models can become highly problematic. This is because it is here that they engage the social domain: that world of highly complex, dynamic, open-ended systems, much of which lies in the non-observable realm.

All three domains compared above to social science have a significant literature on validation. The differences between how the domains address validation mirror the differences between the domains themselves. In the physics and computational linguistics literature, the discussion of validation methodologies centers overwhelmingly on 'classic validation' – the process of continually comparing a theoretical model with new sets of data until the user believes the process has gone on 'long enough,' that the model has been proven 'accurate enough' and he is 'confident enough' that the model (the theory) will hold when applied to the use case. In systems ecology, there is also a robust (but critical and self-reflexive) 'methods' literature on validation. While the classic validation process works well on clearly defined and bounded data sets, what we find from the systems ecology literature is that the nature of the target domain can call the whole validation process into question. Problems such as equifinality and the temporal issues Oreskes raised make models of certain complex systems (including natural and social systems) difficult to validate. Hence, when we look for computational models in these domains, we often find type B models – models of particular instances – which are calibrated rather than validated.
Type A models also are often difficult to validate in the social domain, partially because of the difficulty of experimenting on or managing human populations in the ways necessary to control for relevant variables. The data we encounter in the social domain also lead to difficulty in modeling. The sentiment analysis example from our discussion of computational linguistics illustrated the confusion that can arise when the referent of the system element under study (in this case, sentiment) is ill-defined. For example, it is unclear precisely what a movie rater intends with four vs. five stars. This frequently happens when the target of study is unobservable and ill-defined and is addressed through behavioral proxies, as it is in many sociocultural computational models.

So we have answered our first question: Are there characteristics of the sociocultural domain that make certain aspects of it very difficult, if not impossible, to model computationally in ways that would allow those models to meet the standards of proof required by classic definitions of validation? We believe the answer is yes, for all the reasons Oreskes listed and then some. There are theoretical, empirical, parametrical and temporal flaws in the models that must be addressed. Currently many of these flaws are simplified away or ignored in the sociocultural domain, as in the case of data quality. There also is the inability to conduct reasonable experiments on the target domain (the sociocultural domain). Finally, there are all the challenges that come along with a complex adaptive system, such as emergence, equifinality, and the openness of the data set with its consequent indeterminate boundaries.

For all these reasons, the computational social models in which the national security community is interested generally are what Refsgaard and Henriksen (2004) called engineering models. They begin with some general principles but are readily modified according to the requirements of the local data set. While useful as explanatory models for that particular time and space, there are significant limitations on the generalizability of these models. For example, a computational model could be constructed of the 'left of boom' space – the social interactions that lead up to the execution of violence – in Afghanistan in 2011. It could be fitted to local data, i.e. calibrated.
However, if that model were to be validated, it would need to be exercised in communities as diverse as (e.g.) Afghanistan, Colombia, Sri Lanka and Mexico, and over time in each geopolitical space. If it were so exercised, the accuracy of the model (the principles used to construct the model) could be determined and a measure of confidence assigned to it. The analyst would then have a guide as to its potential explanatory power for a new domain such as, say, India. Such a validation exercise would assume, however, that we had comparable data from these different locations for testing. Yet most of the areas in which DOD works (and for which we would like to be sure the computational social models are valid) are areas experiencing combat, difficult to access physically, and/or populated by closed or difficult-to-penetrate social groups. The likelihood of constructing such data sets is very low. Furthermore, a model that embodies principles of human interaction that can be applied in places as diverse as Afghanistan, Colombia, Sri Lanka and Mexico necessarily would be at such a level of abstraction that it would not be helpful in tactical, and perhaps even operational, campaigns. While it might have utility for DOD in a strategic environment, the latter is a far different and much more abstract environment than those in which computational social models are used and exercised today.

That said, there are indeed computational models of the social domain which serve as research models. Sugarscape (Epstein and Axtell 1996) and the agent-based segregation models (Schelling 1969) are examples. As we can see, however, these two models are extremely abstract – yet they can elucidate some principles of human interaction. These principles now need to be tested against data from multiple times and places in order to validate those computationally represented theories.

There also are computational approaches in the social domain that have implemented ways of working with social phenomena whereby some of the pitfalls described above can be avoided. When computational linguistics operates on a closed static set, it is similar to these approaches. This data-based approach has yielded some useful and interesting contributions to the study of language. Social network analysis takes a similar data-based approach, although the nature of the data is different from that of computational linguistics. For example, there are not yet social network analysis tools which can extrapolate the size of a total population from fragments of data. Computations upon a particular data set using social network analysis tools will yield useful information about that data set. However, the structure that set yields will be perturbed every time a node is added – and the initial structure gives the analyst no clues as to the nature of the perturbation.
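A minimal sketch (ours, with invented actors, using the networkx library) of that last point: adding a single node to a small network reorders the centrality scores of the existing nodes, and nothing in the original structure predicted the change.

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"),
                      ("B", "D"), ("D", "E")])
    before = nx.betweenness_centrality(G)

    # one newly observed actor with ties to the periphery
    G.add_edges_from([("F", "E"), ("F", "A")])
    after = nx.betweenness_centrality(G)

    for n in sorted(before):
        print(n, round(before[n], 3), "->", round(after[n], 3))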
Before we move on to the next chapter in our story, it may be useful to remind ourselves what we mean by validation. We have argued that validation is a process of comparing a model to new sets of data until the user is confident enough that it will produce reliable results in the use domain to be willing to apply it there. Validity thus is a function of the relationship between a user and a model, not a characteristic or state of the model itself. This, then, raises questions about the nature of the decision for which the model is developed (the use domain), and 'confident enough' translates into a determination of the riskiness of that decision. The use domain and the definition of risk are sociocultural questions – questions about a user immersed in a particular use environment. So far, we have mostly talked about models. We would like now to turn attention to the users.
4 WHAT ROLE DOES THE USER PLAY IN VALIDATION?

4.1 THE USER

If validity is a function of the relationship between a user and a model, it would be useful to understand both parts of this equation. A great deal of attention has been paid to the model in the validation process literature, but not much to the user. McNamara (2010: 10) argues that "the main benefit of V&V [verification and validation] is not (perhaps counterintuitively) increased focus on the model but the contextual issue of how the model will be used and, therefore, how the organization and its members identify what decisions they are responsible for making and how they negotiate acceptable levels of risk." The interesting question is then how the user engages with the model.

In this section, we center a discussion of the engagement of a user with a model around users' learning styles, with particular emphasis on cognitivism and constructivism. We shall argue that the role of a computational model will depend upon the user's preferred learning style and the context of use for the model, where a user may be an individual or an organization. In particular, a model used for prediction will be best suited to a user with a cognitivist learning style, generally embedded in an environment where it is assumed the future will look much like the past, the social environment is resistant to change, and the cost of failure is high. A model used for creative analogy will be best suited to a user with a constructivist learning style operating in an environment open to change, and where conditions are recognized to be highly fluid. Once this is recognized, it should be evident that this has implications for validation, because, as we have already pointed out, validation takes into account the intended uses of a model.

All the domains we addressed in the previous section recognized the process nature of validation. It needs to be emphasized that in no case did we find validity to be a property or characteristic of a model. In all cases it was defined as a description of an outcome of a process: the likelihood that the model would have the same relationship to the n+1th data set (which may be the use case) as it did to the nth data set. This is a measure of the model's accuracy, and is related to a user's confidence in the model. Whether or not that level of confidence was sufficient for action was a function not only of the model, and not only of the process, but of the user.
If we shift our focus from the model to the user, we can identify several ways in which models can be of use as well as clarify how we can identify their goodness in different use scenarios. This will move us toward an answer to our second question:
What usefulness (if any) do social models have besides prediction?
So let us focus on the user. If validity is a function of the relationship between a model and its user, understanding how the user engages with models, and what he learns from them that he might apply, is important to our discussion. Since we are interested in how people learn and apply something from models to help in analysis and decision-making, we will present a very brief overview of how people learn. We will discuss three general approaches to learning: cognitivism, behaviorism, and constructivism. We will talk about hedgehogs and foxes, and then move into a short discussion of creativity, as this arguably has a place in decision-making. We will close this section with a brief overview of risk and risk perception – a nice counterbalance to creativity, and a reminder of the McNamara quote in which she argues that validation increases the focus on "how the organization and its members…negotiate acceptable levels of risk" (ibid.).
4.1.1 LEARNING STYLES

The fox knows many things, but the hedgehog knows one big thing. – Archilochus 5
To speak very broadly, there are three relevant approaches to learning, each of which will be addressed in a little more depth in this section. In one, generally labeled cognitivism (see, for example, Atkinson and Shiffrin 1968; Dewey [1910] 2010; Ormrod 2003), the learner is a passive agent who receives context-free information from his environment. His brain processes the information – processing which can be explained through measurement and experimentation – and incorporates it into existing schemata (or uses it to create new schemata). These schemata, in turn, drive his behavior. A second learning approach, constructivism (see Bruner, Goodnow, and Austin 1956; Ormrod 2003; Piaget 1950; Richardson 2003), argues that we generate or create knowledge and meaning from the interaction of existing knowledge with new knowledge or experience. Context is critical, and the history of the individual learner matters. The active agent is the learner, not the information provider. The third learning approach is behaviorism, advocated by those like B.F. Skinner (1976, 1979, 1983) and Edward Thorndike (1903, 1904, 1905), in which learning occurs through conditioning. Clearly, most learning situations involve elements of all three approaches.
5 This is a fragment of verse by the Greek poet Archilochus, made famous in modernity by Sir Isaiah Berlin in his essay on Tolstoy in his book Russian Thinkers.
4.1.1.1 Behaviorism

Behaviorism is not highly relevant to our discussion of models. However, we include a short description of it here for the sake of completeness of our discussion of theories of learning. Behaviorism posits (not surprisingly) that everything is essentially behavior. The school of thought maintains that behaviors can be described scientifically without turning to internal physiological events or abstract ideas like mind. It takes the position that all theories should have observational correlates, and that there are no philosophical differences between publicly observable processes (such as actions) and privately observable processes (such as thinking and feeling). Pavlov (1927) and Skinner (1976, 1979, 1983) are the iconic examples of this school of thought with their work in behavioral conditioning.

Behaviorism emphasizes learning through questions/stimuli leading to answers/responses, exposing the student to the subject in gradual steps (Pavlov 1927). The smaller the learning steps the better, and there should be as close to instant feedback as possible for every exchange (Watson 1913). It also argues for positive reinforcement: the best learning environment presents the learner with increasingly difficult problems and positively reinforces success at each step (Watson 1913). Positively reinforcing desired behavior is more important than negatively reinforcing unwanted behavior. Reinforcement/responses will generalize and be applied to similar stimuli (Pavlov 1927).

4.1.1.2 Cognitivism

Cognitivism (see, for example, Atkinson and Shiffrin 1968; Dewey [1910] 2010; Ormrod 2003) is a positivist theory which argues that human psychology can be explained through measurement and experimentation. The learner is a passive agent who receives context-free information from aspects of his environment which serve as teachers or information providers. His brain processes, which can be explained through measurement and experimentation, incorporate this information into existing schemata or use it to create new ones. These schemata drive his behavior. The success of the learning process is measured by changes in behavior.
In a cognitivist construct, teachers profess rather than facilitate. We will suggest that treating a model as a tool to generate a prediction is to treat it in a professorial (or, as we shall say, advisory) role. It provides information (a prediction) to a learner who then internalizes it and acts upon that internalization. We would suggest that this type of learning is similar to that described first by Berlin ([1953] 1993) and then by Tetlock and others (Tetlock 1998, 2005; Suedfeld and Tetlock 2001) as the 'hedgehog' cognitive style. A hedgehog will attempt to organize the world in self-consistent models or schemata (Archilochus's 'one big thing'),5 and then apply all experience and all future expected experiences to these models (Chiu et al 2000; Kruglanski and Webster 1996). The schemata are used to predict the future, describe present experiences, and retroactively reinterpret the past (Hawkins and Hastie 1990; Tetlock and Belkin 1996; Tykocinski et al 2002; Campbell and Tesser 1983), and even to engage in what is called "anticipatory hindsight" (Fischhoff 1975). Arguably, this worldview tends to depend heavily on verification. Once the models or schemata are constructed, experience is expected to match them. Those holding this worldview are inclined to see the world as a series of near-misses when their predictions are faulty, and to assign greater value to counterfactual realities that might have, or ought to have, happened according to their operating models. Those holding this mindset generally attribute their faulty predictions to insufficient data rather than to a faulty model. The road to improvement of these predictions is more data, more analysis. There are several attractive features of this cognitive style. It allows for the construction of a coherent and consistent narrative which has strong explanatory power. It thus contributes to the development of stable social environments. As a result, this style of cognition allows very long-term predictions to be made as one plays the narrative (cognitive schema) forward in time, although it often requires retroactively modifying the predictions to fit facts.

5 This is a fragment of verse by the Greek poet Archilochus, made famous in modern times by Sir Isaiah Berlin in his essay on Tolstoy in his book Russian Thinkers.

4.1.1.3 Constructivism
Constructivism (see Bruner, Goodnow, and Austin 1956; Ormrod 2003; Piaget 1950; Richardson 2003) is a fundamentally different approach to learning. Here the active agent is the learner in context, not the information provider. There are no professors, only facilitators.

What unifies …constructivist perspectives is rejection of the view that the locus of knowledge is in the individual; learning and understanding are
regarded as inherently social; and cultural activities and tools (ranging from symbol systems to artifacts to language) are regarded as integral to conceptual development. (Palincsar 1998: 348)

Constructivism argues that we generate knowledge and meaning from interaction between our (historic) experiences and their relationship to new experience. Context is critical, and the history of the individual learner matters. Situated cognition (Brown, Collins and Duguid 1989) is a form of constructivism in which the knowing is seen as inseparable from the doing. As a consequence, learning is measured through its effective application in the real world rather than through the abstract accumulation of knowledge. Situated cognition posits that what is known depends on the situation, the agents present, and the context. Knowledge is completely situational, and learners are measured on their abilities to act. As we will see, thinking of models as analogies stems from a constructivist approach. This is also the basis of the argument that some of the most valuable learning from models comes from participation in their construction. To continue the very brief explanation of cognitive styles that we began with the description of the hedgehog, we now describe the fox (again beginning with Tetlock). The 'fox' cognitive style is defined primarily by an individual's ability and propensity to hold multiple conflicting models or narratives in his head at the same time when considering a problem or thinking about the future. As F. Scott Fitzgerald said, "The test of a first-rate intelligence is the ability to hold two opposing ideas in the mind at the same time, and still retain the ability to function." This cognitive style values a multiplicity of ideas, even if they conflict with each other, at the expense of a coherent narrative. Its attractiveness lies in its potential to generate change, whether it be on a small scale or through the production of paradigm-shifting ideas.
4.1.1.4 Learning styles summarized
We have argued that validation is the process of a user engaging with a particular characteristic of a model. We described that characteristic extensively in section 3. In section 4, we have introduced some key aspects of users: how they learn (since we are assuming that we construct a model in order to learn from it so we can apply that learning). For our purposes, two key cognitive styles which affect learning approaches are as follows:
Cognitivism (a ‘hedgehog’ style) assumes a relatively passive learner and an active teacher or advisor. Information, while it may be judged by its source or environment, is independent of them – almost artifactual in nature as it moves from advisor to decision maker/learner/judge. It is internalized by the recipient and matched to internal schemata. The schemata may be stretched or modified, but they generally remain as relatively coherent narratives. A failure to make an appropriate decision is usually attributed to a lack of data.
Constructivism (a ‘fox’ style) assumes a relatively active learner. The teacher acts as a facilitator or guide. Knowledge is created as the learner engages with his environment. This construct allows the learner to hold multiple and perhaps competing ideas at the same time, for he is making no effort to either construct or maintain a coherent narrative or schema. This allows for creativity – the ‘novel combination’ of ideas (Martindale 1999:134). Failure to make an appropriate decision here is often characterized as the failure to come into contact with enough disparate ideas which would provide the creative stimulus.
Although we have carefully separated cognitivism from constructivism, as with any archetypes the dichotomy is rarely so clean in practice. Most environments involve some learning in both styles, and some logical and some creative engagement with new material. The challenge, as always, is to decide what is most appropriate for the problem at hand and to leverage that learning approach and cognitive style.
4.1.2 CREATIVITY
The ability to hold conflicting, or at least not completely consistent, ideas in one's head allows for the exercise of creativity. Although we do not understand creativity well (Spulak 2010), we do know that creativity always involves "novel combinations of preexisting mental elements" (ibid:137). Combination in this way would undermine efforts to construct a coherent narrative, since such a narrative requires the logical connection of one element to another. Any effort to logically deduce a possible future from the present is antithetical to the creative method. Recall our introduction of the comment from the 9/11 report on page 18 above: "We believe the 9/11 attacks revealed four kinds of failures: in imagination, policy, capabilities, and management." M. J. Kirton's work on the role of creativity as expressed in institutions follows nicely from that statement. If we look at the role of creative individuals in
institutions, we find it to be quite different from that of those who work within established narratives. M. J. Kirton ([1984] 1986) calls our hedgehogs 'adaptors,' and says that they

…characteristically produce a sufficiency of ideas based closely on, but stretching, existing agreed definitions of the problem and likely solutions. They look at these in detail and proceed within the established paradigm (theories, policies, mores, practices) that are established in their organisations. Much of their effort in effecting change is improving and 'doing better' (Kirton [1984] 1986:17).

Note the effort to 'stretch' the narrative but to avoid abandoning it. And recall that this style will look for additional data rather than new narratives if there are inconsistencies between a new element and the established environment. Foxes, or creative people, on the other hand, are quite willing to abandon an institutional or personal narrative. Kirton says that this type of person in an institutional environment is an 'innovator.'

[An innovator is] more likely in the pursuit of change to reconstruct the problem, separating it from its enveloping accepted thought, paradigms, and customary viewpoints, and emerge with much less expected, and probably less acceptable solutions… They are much less concerned with 'doing things better' and more with 'doing things differently.' (ibid.)

Interestingly, Tetlock (2005) claims that thinkers of this type tend to place a much lower level of certainty on their predictions despite their clear track record of being able to predict human-driven events more accurately than individuals using a deductive method (Green and Armstrong 2004).
4.1.3 RISK
This raises the question of the level of risk the user is willing to assume by using the model. Recall that validation is a process in which both the model and the user take part, not a characteristic of the model itself. As we noted earlier, one scientist or a particular user community may be willing to accept a certain level of confidence in a model while another may not. While much work can be done to evaluate the costs and benefits of failure of a particular model to produce a 'correct' or true answer when applied to the use domain, ultimately the decision to use the model comes down to a willingness by someone or some community to accept those costs in order to gain the benefit.
In the introduction to section 4, we touched briefly on the research on risk aversion. Recall that our definition of validation determined that it was not a characteristic or state of a model. Instead, it was a function of the relationship of a user to a particular characteristic of the model (the probability we spoke of earlier). That is, it was a function of how willing the user was to use the model, given that probability of correctness. And this willingness depends upon the user's risk aversion. (Recall that 'user' can mean an individual or a community of practice.) 'Risk aversion' is a dimension of a construct addressed in the literature as 'risk attitude' (O'Neill 1998). Risk attitude describes decision making over quantifiable alternatives under conditions of uncertainty. We will not attempt to cover the field in detail here, as there is a tremendous amount of research and discussion on this topic, but will give enough background and context to understand how this factor can constrain the use of a model. The classic approach to risk attitude evaluates the behavior of an individual when faced with choices between quantified outcomes expressed in terms of probabilities.

[I]ndividuals who prefer a guaranteed outcome to a gamble when the expected value of the gamble is equal to the guaranteed outcome are said to be risk averse… Individuals who prefer a gamble to the guaranteed outcome when the expected value of the gamble is equal to the value of the guaranteed outcome are said to be risk seeking. Finally, if an individual is indifferent between a guaranteed outcome and a gamble with the same expected value, he or she is expected to be risk neutral (Rosen et al. 2003).

(These attitudes are given a standard formalization in the sketch at the end of this subsection.) Until recently, almost all studies of decision making were based on expected utility theory (see von Neumann and Morgenstern 1947). "Implicit in this theory is the assumption that individuals have stable and coherent preferences; they know what they want and their preference for a particular option does not depend on the context" (Hedesström 2006). In other words, the assumption is that one's risk attitude is unchanging no matter what the circumstances (ibid.). In practice, however, individuals are generally risk seeking over small stakes and risk averse over large stakes (Fehr-Duda et al. 2008; O'Neill 1998:4). Under expected utility theory, determination of an individual's risk attitude as risk seeking, risk averse, or risk neutral assumes some abstract, objective point of risk neutrality against which such a position can be measured. In 1979, Daniel Kahneman and Amos Tversky introduced a variant on this approach, which they called prospect theory, in which a risk
avoidant or risk seeking stance must be determined relative to a reference point defined by the decision maker (Kahneman and Tversky 1979). Until recently, the risk calculation was assumed to be a purely cognitive, rational process. However, recent work using fMRI techniques has shown that there is a significant affective component to what had historically been considered a purely cognitive process (Camerer et al 2005). When individuals were asked to make relatively simple risk decisions (such as bets on a throw of dice), fMRI showed more activity than expected in the areas of the brain associated with affect. Finally, research is beginning to show that decision context may have a much bigger effect on risk attitude than had been thought. The result is a view in which decisions are constructed based on the decision maker's assessment of the decision-in-context. Risk attitude thus becomes highly time-, space-, and person-specific. The following quotation sums up the findings of our review of literature in the field:

First, decisions often involve conflicting values, where we must decide how much we value one attribute relative to another. In trying to deal with such conflicts, individuals often adopt different strategies in different situations, potentially leading to variance in preferences. Second, decisions are often complex, containing many attributes or alternatives. Since these problems are simplified by decision-makers in different ways, failures of invariance [across decision makers or across decisions by the same decision maker] might be related to task complexity. Finally, although we may know what we get when we choose an option, we may not know how we feel about it. A prestigious Ivy League school may offer a competitive and high-pressure graduate program, but we might be uncertain about how we would like that environment. Hence, invariance may fail because of uncertainty in values, even when we know what we will receive. (Payne et al. 1992:91)

There has been a great deal of research on different aspects of decision-in-context. Paul Slovic's extensive work on risk perception has made a significant contribution.6
6 Slovic's work is extensive and has been extremely influential. A bibliography can be found on his web site at http://www.decisionresearch.org/people/slovic/
Other work has experimented with the impact of specific variables on decisions in particular domains, such as the impact of ethnicity, gender, and education on decision making in medicine (Rosen et al. 2003), or with the influence of different content domains on the decisions of the same individual. In the latter case, for example, differences in decisions "stem[med] from differences in the definition of what constitutes or contributes to risk in different types of situations, rather than from differences in true attitude toward risk" (Weber et al. 2002). In summary, expected utility theory (which includes prospect theory) assumes that all actors will use the same formula to calculate the riskiness of a given decision, and that a given actor's risk attitude is not decision-dependent. Therefore, it will remain constant from one decision to the next. Individuals, however, do vary in their risk attitude from one another, with the distribution favoring the risk-averse end of an objectively determined continuum. Risk attitude curves for an individual in practice tend to be concave, with individuals risk seeking on low-consequence decisions and risk averse on high-consequence decisions. Prospect theory differs from classic expected utility theory in requiring the decision maker to determine the mid- or risk-neutral point of the risk attitude scale. This also allows for variance from individual to individual, but assumes that such a point is determinable and that, once determined, the decision maker will use the same calculus regarding the actual decision as he would under expected utility theory to determine the expected value of a given decision. Behavioral decision research is taking this user-centrism even further, showing that decisions are constructed based on the decision maker's assessment of the decision-in-context. Risk attitude thus becomes highly time-, space-, and person-specific. This underscores the importance of knowledge of the user when considering computational models, and the possible variability over users and over decision types in the outcome of the application of the validation function.
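To make the preceding definitions concrete, the standard formalization can be sketched as follows. This is textbook decision theory rather than anything specific to the modeling literature we survey; the symbols are ours, and the functional form shown for the prospect-theory value function is the one commonly attributed to Kahneman and Tversky.

EU(X) = \sum_i p_i \, u(x_i)

u(\mathrm{E}[X]) > \mathrm{E}[u(X)] \quad (\text{risk averse: concave } u)
u(\mathrm{E}[X]) = \mathrm{E}[u(X)] \quad (\text{risk neutral: linear } u)
u(\mathrm{E}[X]) < \mathrm{E}[u(X)] \quad (\text{risk seeking: convex } u)

v(x) = \begin{cases} (x - r)^{\alpha} & \text{if } x \ge r \text{ (gains)} \\ -\lambda\,(r - x)^{\beta} & \text{if } x < r \text{ (losses)} \end{cases}

Here EU is the expected utility of a gamble X paying x_i with probability p_i, and u is the decision maker's utility function. Under expected utility theory u is fixed, so risk attitude carries over unchanged from one decision to the next. In the prospect-theory value function v, by contrast, the reference point r is set by the decision maker himself, and \lambda > 1 expresses the finding that losses loom larger than equivalent gains; the same objective outcome can therefore be coded as a gain under one framing and a loss under another.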
4.2 CONNECTING MODELS TO USERS
We argued in section 4.1 that the user is an integral part of the validation process. Validation involves developing a level of confidence in the model such that the user is willing to use it. Both partners must be equally dressed and equally turned out… the model and the user. We assumed that the purpose of constructing and analyzing with computational models was to gain more familiarity with the target domain (to learn about it) so one could apply that
familiarity in a problem-solving way. To speak very broadly, there are two general approaches to learning that are of interest to us here, cognitivism and constructivism. The cognitivist learner is a passive agent who receives context-free information from his environment. His brain processes, which can be explained through measurement and experimentation, incorporate this information into existing schemata or use it to create new ones, and these schemata in turn drive his behavior. The constructivist learner generates or creates knowledge and meaning from the interaction of current knowledge with new knowledge or experience. Context is critical, and the history of the individual learner matters. The active agent is the learner, not the information provider. In section 4.1, we also discussed the related questions of creativity and risk. All of this raises the question of the user's engagement with the model. How, then, does a user engage with a model? In this section, we shall see that – at least in the case of constructivist learning – a key role for computational models is stimulating creativity and helping the user to learn. This will at least in part answer our second question:
What usefulness (if any) do social models have besides prediction?
4.2.1 MODELS AS ADVISORS
Computational models of any sort are often called decision support systems, suggesting that the decision is made by a human who acquires knowledge from various sources, not by the model. As McNamara (2010: 4) pointed out, "models don't forecast because people forecast." Models provide information to people who then make the forecast. Models thus can play a role in a decision-making process similar to that of an advisor. As Beemer and Gregg (2008: 361) noted, "Advisory systems do not make decisions but rather help guide the decision maker in the decision-making process, while leaving the final decision-making authority up to the human user" (see also Sniezek 1999). There is a significant literature on advisors in the decision-making process, most of which is irrelevant to our discussion here. However, it is useful to identify the actors or roles and to explore the process a little, as this may give us some insight into the role models will play in this process and how validation or the assessment of their goodness can impact that role.
Budescu and Rantilla (2004) present a list of 'actors' or roles in the decision-making process. There is a decision maker, which can be a person, group, or model. The decision maker identifies a decision task. The decision task includes the context for the decision, its importance, and the type and amount of information available and how it is presented. Expert advisors serve as subject matter experts. Additional information is available to the decision maker (including his assessment of the advisor), which the decision maker uses to evaluate the advice. Note that this last information differs from the advice itself or other decision-focused information the decision maker may have available to him. Bonaccio and Dalal (2006), in their review of the literature in this field, use a slightly different vocabulary, calling decision makers judges and describing the judge as "the person who receives the advice and must decide what to do with it." The advisor is the source (or sources) of advice or suggestions. Bonaccio and Dalal coined the term 'judge-advisor system' (JAS), which is frequently used in the literature. Note how this description of the JAS follows a cognitivist approach to learning. A passive learner (the 'hedgehog') receives pieces of information that are complete and independent, integrates them into his cognitive schemata, and then acts. In fact, Bonaccio and Dalal "propose an input-process-output model for the JAS" (2006: 129). While their input-output system is far more complex than simply the presentation of advice to the judge who processes it and produces a decision, their model still assumes a relatively passive relationship of the judge to the information. The judge may discount the advice because of biases against the advisor or for other reasons; but while the weight or importance he places on the information he receives may change, the information contained in the advice does not change. The information becomes reified and acts almost as a token, passed from the advisor to the judge. A decision-making process involving advisors also serves some important social functions beyond that of producing a good decision.

The process of giving and receiving advice could be viewed as an adaptive social decision-support system that helps individuals overcome their inherent limitations by proposing new alternatives, different frames, and disconfirming information (Yaniv and Milyavsky 2005: 109)
Interacting with others prior to making a decision forces decision makers to think of the decision problem in new ways (Schotter 2003) and provides them with new information or alternatives (Heath and Gonzalez 1995). It also reduces the effects of framing (Druckman 2001), where the decision maker brings a personal, or preprogrammed, perspective to the decision. The interaction thus socially locates the decision-making process, introducing a constructivist dimension to the process. Some argue that recommendations from decision aids such as computational models are not the same as advice from humans because of these social dimensions (Bonaccio and Dalal 2006: 135). However, while recognizing the social contribution human advisors can provide, we note that most decision makers seek out advice to improve the quality of their decisions and to share the accountability for the outcome (Harvey and Fischer 1997; Yaniv 2004). Therefore, we argue that for our purposes computational models can be seen in much the same light as human advisors. (Note that in a technocratic governance environment, a computer model can be as accountable as a human advisor in this JAS.)
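The token-passing character of the judge-advisor view can be made concrete in a few lines of code. The sketch below is ours and purely illustrative; it is not an implementation from the JAS literature, and the particular weighting and distance-based discounting functions are assumptions, loosely motivated by the finding (noted in the next section) that judges give less weight to advice farther from their own opinion.

def revise_opinion(own_opinion, advice, trust):
    """Judge-advisor caricature: the judge averages his own opinion
    with each advisor's advice, discounting each piece of advice both
    by his prior trust in that advisor and by how far the advice sits
    from his own opinion (egocentric discounting).

    own_opinion: the judge's prior estimate (float)
    advice:      one estimate per advisor (list of floats)
    trust:       judge's weight on each advisor, in [0, 1]
    """
    weights, estimates = [1.0], [own_opinion]  # own opinion at full weight
    for a, t in zip(advice, trust):
        # Illustrative discount: weight decays as advice diverges
        # from the judge's prior opinion.
        weights.append(t / (1.0 + abs(a - own_opinion)))
        estimates.append(a)
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

# A judge who estimates some quantity at 10 and trusts two advisors
# equally is moved more by nearby advice (12) than by distant advice
# (40): the token is received, weighted, and folded into the schema.
print(revise_opinion(10.0, [12.0, 40.0], [0.8, 0.8]))  # about 11.0

Note what the sketch leaves out, which is precisely the constructivist dimension: the advice itself never changes, and nothing the judge does here creates new alternatives or reframes the problem.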
4.2.2 MODELS AS KNOWLEDGE CREATORS
"History repeats itself, but not exactly" (Mair et al 2009). This section will stretch the creative engagement of the user with the model much further than occurs in a model-as-advisor paradigm. In that paradigm, as just described, the decision maker receives information from a number of sources, which may include a computational model. He incorporates that information into his existing knowledge base, discounting each piece of received information by what he knows of the information provider and other contextual information. In this cognitivist paradigm, new information is incorporated into a schema. That schema may be stretched and extended, but rarely is it completely discarded and a new one created. Research has shown that individuals generally give less weight to advice that is farther from their own opinion (Yaniv and Milyavsky 2005): advice that is not close to the decision maker's schema is highly discounted. Suppose we use models differently. Suppose that instead of using them as advisors to provide us with artifactual pieces of information, we use the model to stimulate creative thinking – to provoke the imagination. This is a use of models which fits more within the constructivist paradigm discussed in section 4.1.1.3, and one which we would argue could
have stimulated the imagination which was lacking, for example, in the failure to anticipate 9/11 or the outcome of the Palestinian elections.
4.2.2.1 Models as analogies or metaphors
In our earlier work (Turnley and Perls 2008), we argued strongly that all models (including computational ones) are analogies. Since we cannot include the whole world in our model, we must pick parts of the world to include. We argued that that logic of choice, expressed as theory, was actually the logic of analogy. An analogy is a relationship between two systems by virtue of their common properties (Hesse 1966). Analogies enable users to draw inferences about an unknown system by seeking perceived likenesses between that system and a known system. Analogical problem solving or reasoning is a process of comparison using prior knowledge and applying it to the current situation while recognizing where it does not fit (Gick and Holyoak 1980). George Lakoff and Mark Johnson, in their provocative work, Metaphors We Live By, introduce the creative capability of analogy.7

The primary function of metaphor is to provide a partial understanding of one kind of experience in terms of another kind of experience. This may involve preexisting isolated similarities, the creation of new similarities, and more. (Lakoff and Johnson [1980] 1993:154)

Max Black goes further, and says that a metaphor must be something that takes us by surprise, that is not obvious.

Taken as literal, a metaphorical statement appears to be perversely asserting something to be what it is plainly not. . . . But such "absurdity" and "falsity" are of the essence: in their absence, we should have no metaphor but merely a literal utterance (Black, 1979b: 21, in Duit 1991:2).
7 Note: Lakoff and Johnson speak of metaphors, which are simply strong analogies or similes. Analogy/simile: a man is like a wolf. Metaphor: a man is a wolf. We will use the two terms (analogy and metaphor) interchangeably.
If one calls the US Department of State "Venus" and the US Department of Defense "Mars," and takes these statements literally, they are absurd. A metaphor
compares without doing so explicitly. Metaphors always have some aspect of surprise; they provoke anomaly. In this sense metaphors are comparisons where the basis of comparison must be revealed or even created by the addressee of the metaphor (Duit 1991:2). Interestingly, we still do not know very much about how analogies and metaphors work from a cognitive point of view. As Black, still regarded as one of the leading thinkers about metaphors, said,

To think of God as love and to take the further step of identifying the two is emphatically to do something more than to compare them as merely being alike in certain respects. But what that "something more" is remains tantalizingly elusive: we lack an adequate account of metaphorical thought (Black 1979a:143).

But to say we do not know how they work is an unsatisfactory position. Jon Ogborn and Isabel Martins did some empirical work with students and teachers and discovered that what they called 'ontological distance' has a relationship to how metaphors work. Ontological distance is "the number of salient basic features in respect of which term and analogue differ" (Ogborn and Martins 1996:650). If the distance is too small, the terms are too similar and will not stimulate the imagination. And, of course, both parties must be reasonably familiar with the terms. Gentner proposed a mapping of the objects and relationships of one system onto the other, defining various aspects of that mapping such as clarity (the precision of the node correspondence), systematicity, and so on (Gentner 1983). We also need to keep in mind that metaphors are what J. M. Carroll and R. L. Mack called 'open-ended'; that is, by their very nature they do not completely explain the target system. They use selected attributes of the source system to explain the target system. "A man is a wolf" does not completely explain a man, nor would it stand up to a validity challenge. It provides a particular perspective on a man, however, and leaves us to discover the rest on our own. It stimulates the imagination. What is important to note, and what makes the use of metaphors interesting and the previous statements less obvious, is that they all are dependent on context: prior knowledge of the participants, and personal and linguistic history. Metaphors that speak to me may be quite different from those that speak to you. I may see the US as a melting pot, taking heterogeneous inputs and creating something quite new with them. Others may see
the US as a salad bowl, a tasty dish in which each element retains its own integrity and flavor, bound together by a 'dressing' of shared beliefs. These different metaphors drive feelings about national language, the importance of participation in ethnic festivals and rituals, the wearing of particular clothing, and the like, in all of which we are highly – but differently – invested. It is the discussions around these different metaphors that force us to question our assumptions and the constructions of the social realities on which they are based. The models/metaphors need not be accurate. The US is neither a melting pot nor a salad bowl. It is enough that the metaphors be true – that they are both explanations of how this country is struggling with ways to accommodate difference. In the same way, two computational models may both be true (i.e. explanatory), although neither may 'accurately simulate the real world' (be validated), and so neither may be useful for prediction in a formal sense.

In most cases, what is at issue is not the truth or falsity of a metaphor but the perceptions and inferences that follow from it and the actions that are sanctioned by it (Lakoff and Johnson [1980] 1993:158)

As we said earlier, analogies work when we hold what appear to be dissimilar ideas (systems) in our head and perceive similarities between them. This echoes the definition of creativity we presented earlier: the 'novel combination of preexisting mental elements.' Analogical learning (or, perhaps, a creative process) begins when the learner notices that a similarity exists between a source and target phenomenon. He then maps corresponding parts of the source to the target, generating new understanding (for him) in terms of the source phenomenon (Craik and Lockhart 1972; Tulving 1974). The learning is thus contextual (Tulving and Wiseman 1976), although generalizable to other similar targets. Note that this approach suggests that since a thing is known in terms of something else, we can never engage with something that is completely other. The anthropologist Claude Lévi-Strauss presents a nice description of the problems this may create. He traveled from France to Brazil:

I had wanted to reach the extreme limits of the savage (sic)….I found myself among these charming Indians whom no other white man had ever seen before and who might never be seen again…Alas! they were only too savage….They were as close to me as a reflection in a mirror; I could touch them but I could not understand them. I had been given, at one and the same time, my reward and my punishment. I had only to succeed in guessing what they were like for them to be deprived of their strangeness. Or if, as was the case here, they retained their
strangeness, I could make no use of it, since I was incapable of even grasping what it consisted of (Lévi-Strauss [1955] 1977: 375-376)

So let us bring this back to computational models. As we said in our introduction, a model is not all things in the world but a selection from them. We argued that the selection from the real world creates an analogy – a theory, a type A model. That type A model can be instantiated using a data set from the real world, becoming a type B model. At some number n, the number of data sets used to successfully instantiate the theory will be sufficient such that the decision maker is confident enough, within his own risk paradigm, that the n+1th data set (the use case) also will successfully instantiate the theory. At this point, the model is used. Alternatively, the model builder can 'assume' a type A model (assume a theory), access a data set, and go directly to a type B model. He will determine the goodness of his type B model by calibrating it to his data set. We then engage with the process as depicted above at the third step, where the risk attitude of the decision maker comes into play. So the elements in this process, which takes us from the selection of elements from the 'real world' to a model suitable for prediction, are, very roughly:
The original analogy to the world – the theory, the type A model
The instantiation using data from the real world n times – many type B models OR
The calibration of one type B model to its source data set
The risk attitude of the user (recall that this is unchanging over decisions)
The type of decision to be made (a socially determined construct which includes determination of the nature of the consequence of the decision)
(Other input available to the decision maker)
This process and its inputs are shown in Figure 3; a minimal code sketch of the two routes follows the figure. Note that the focus is on the decision.
Figure 3: From theory to application
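The two routes just enumerated (validating a type A model by instantiating it with many data sets, versus calibrating a single type B model to its source data) can be caricatured in a few lines of code. This sketch is ours and purely illustrative: the toy theory, the accuracy tolerance, and the use of a simple success rate as the 'confidence' measure are all assumptions made for the example, not features of any actual modeling framework.

import random

class ToyTheory:
    """A stand-in type A model: it claims an outcome rises linearly
    with a single driver.  Hypothetical, chosen only to make the two
    routes below executable."""
    def __init__(self, slope=1.0):
        self.slope = slope

    def predict(self, driver):
        return self.slope * driver

def accuracy(theory, dataset):
    """Fraction of (driver, outcome) cases the theory gets 'right'
    within an arbitrary tolerance."""
    hits = sum(1 for driver, outcome in dataset
               if abs(theory.predict(driver) - outcome) <= 1.0)
    return hits / len(dataset)

def validate_type_a(theory, datasets, required=0.8):
    """Route 1: instantiate the theory with n data sets from different
    sites and times.  The success rate is a crude confidence measure;
    whether it is high enough for the (n+1)th, use-case data set is the
    user's risk decision, not a property of the model."""
    successes = sum(1 for d in datasets if accuracy(theory, d) >= required)
    return successes / len(datasets)

def calibrate_type_b(dataset):
    """Route 2: assume the theory's form, then tweak its parameter to
    fit one source data set.  The fit is local; nothing here licenses
    transfer to another region or domain."""
    candidates = [ToyTheory(s) for s in (0.5, 1.0, 1.5, 2.0)]
    return max(candidates, key=lambda t: accuracy(t, dataset))

# Five 'sites' whose data happen to follow the theory, with noise.
sites = [[(x, 1.5 * x + random.uniform(-0.5, 0.5)) for x in range(10)]
         for _ in range(5)]
print("confidence across sites:", validate_type_a(ToyTheory(1.5), sites))
print("slope calibrated to site 0:", calibrate_type_b(sites[0]).slope)

Everything that matters in the surrounding text happens outside this code: deciding whether the confidence number is high enough, for this decision, is the user's risk calculation, not the model's.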
But suppose that we recharacterize this process and engage the model user in the creation of the analogy. Recall that to create an analogy one must hold in one's head, simultaneously, the known system and the unknown system. The known system may be arbitrary. Our analogy may be that the US Department of State is like Venus and the US Department of Defense is like Mars, but there is no logical connection between the Roman gods and various agencies of the US government. That connection depends upon the recognition of certain aspects of the 'personalities' and functions of the two agencies and the nature of the respective gods. Two data points are chosen from a myriad of possibilities. Recall now our definition of a creative individual: one who is able to make "novel combinations of preexisting mental elements" (Martindale 1999:137). Runco and Pritzker (1999) put it a little differently, and emphasized the role of analogy as one of the mechanisms of creativity. "Creative individuals (divergent thinkers) have the ability to categorize in both conventional and unconventional ways which facilitates efficient retrieval by means of analogies and unpredictable associations" (Runco and Pritzker 1999). Matching Roman gods to US government agencies takes us into the unconventional and allows us to view and understand the two agencies in a different light than we otherwise would.
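In data-structure terms, the Venus/Mars analogy can be caricatured as a selective mapping between two feature sets. Again, this sketch is purely illustrative and ours alone: the feature lists are invented for the example, and Ogborn and Martins's 'ontological distance' is reduced here to a naive count of differing features.

# Source systems (the known: Roman gods) and target systems (the
# unknown: US agencies), each reduced to a hypothetical feature set.
venus = {"persuasion", "attraction", "diplomacy", "born_of_sea_foam"}
state_dept = {"persuasion", "diplomacy", "negotiation", "embassies"}
mars = {"war", "force", "aggression", "father_of_romulus"}
defense_dept = {"war", "force", "deterrence", "military_bases"}

def bridge(term, analogue):
    """The shared features: the few points of likeness the analogy
    selects from a myriad of possibilities.  Everything else in each
    system is deliberately left unmapped ('open-ended')."""
    return term & analogue

def ontological_distance(term, analogue):
    """Naive rendering of Ogborn and Martins's measure: the number of
    salient features in which term and analogue differ.  Too small and
    the analogy cannot surprise; too large and it cannot be followed."""
    return len(term ^ analogue)  # symmetric difference of the two sets

print(bridge(venus, state_dept), ontological_distance(venus, state_dept))
print(bridge(mars, defense_dept), ontological_distance(mars, defense_dept))

The point of the sketch is how little of either system the mapping uses: the analogy's creative work lies in choosing the bridge, not in exhausting the systems.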
Identifying and describing entities in a system, the relationships among them, and the processes that can affect system functioning – all necessary to build a successful model – also can be a learning process. As Thomas Karas (2004) pointed out in his discussion of a workshop focusing on the relationships between computational modelers and policymakers, "The very act of constructing a model requires learning, in a structured way, how the modeled system works. Thus the mental model that the participant leaves with may be more sophisticated and more reflective of the best knowledge on the subject—the analyst or policymaker becomes a more proficient expert himself" (Karas 2004: 12). The modeler himself learns about the target system through the act of modeling. And when the modeler also is the decision maker, the decisions he will make can change profoundly. (We should keep in mind that one participant whispered to one of the authors of this paper at the conclusion of the workshop, "This is great! But we just don't have time to understand it – we just need answers." And if answers are required quickly, this often tends to lead, for good or bad, to a cognitivist rather than constructivist use of models: we use models as advisors or knowledge providers and accept the 'answer token' from them to forward on to the judge or decision maker.)
4.2.2.2 Participatory modeling
Participatory modeling (also known as companion modeling) is a formalized approach which facilitates constructivist learning (Naivinit et al 2010; Castella et al 2005; Barnaud et al 2006). In a participatory modeling process the stakeholders in the decision (who themselves may also be decision makers) take an active part in the construction of a conceptual model of the decision space. Interestingly, the formal application of participatory modeling has been restricted primarily to complex resource allocation decisions (primarily water) in places such as Morocco (Dionnet et al. 2008), Senegal (D'Aquino et al. 2003), Asia (Gurung et al. 2006), and elsewhere. The process uses role-playing activities to help elicit revealed preferences in key decision-making contexts, to make explicit strategic goals, and to illuminate political positioning. The knowledge gained from these activities is captured and represented in a computational simulation of the local situation. The stakeholders then provide input to the developers of the computational simulation, who modify it to better fit the local situation. That conceptual model is instantiated computationally, vetted by the decision makers, and revised by the model builders (Janssen and Ostrom 2006). The object of a participatory modeling exercise is to capture the stakeholders' "shared representation" of their community and/or problem domain by eliciting this
representation from the stakeholders themselves (Gurung et al. 2006). The stakeholders thus create the analogy (i.e. identify relevant points that provide the bridge between the 'real system' and the model) that provides the underlying structure for the computational model. They pick from their shared social landscape those features they believe are important. Facilitators of this type of participatory process claim that it is highly educational for the participants, providing them with knowledge and insight into other players' agendas that they otherwise would not have (Dionnet et al 2008). The exercise also produces knowledge about how players move through a particular complex system (ibid.). The characteristics that make scarce resource allocation systems well suited for participatory modeling (complex domain dynamics, emergent behavior, and multiple stakeholders with conflicting agendas) are also found in the national security environment. These same characteristics create a shifting problem space with particular challenges for decision making. Most computational social models built by the government are built because some target system is in need of intervention. However, systems of the type we are discussing here will continue to evolve, exercising their own internal dynamics, even as the modeler works. Furthermore, the modeling team's engagement with the system in order to get data to populate the model will perturb the system, so that when the modeling team leaves, the system no longer is the same as the one they entered. Suppose, for example, we are considering a sociocultural intervention (such as building a schoolhouse) in an occupied area. We begin to do an assessment of the target population so we know what kind of school we need to build – how big, and so on. However, just the fact that we are considering this action (which the target population knows because we must come into contact with it to gather data for the assessment) can change relationships, power structures, local economics, and the like. The problem changes as we work it. This dynamic dimension of problems is a characteristic of complex adaptive systems which, as Holland (1992: 20) pointed out, "never get there. They continue to evolve, and they steadily exhibit new forms of emergent behavior." These problems have recently been discussed in some circles as 'wicked problems' (Rittel and Webber 1973). Wicked problems are common in the world of socio-cultural analysis and are characteristic of any complex adaptive system. These systems never can achieve an optimal state. There is no 'answer' to wicked problems, just ways to manage the system.
So if we are engaged with the construction of a model of this system (a systems dynamics model or, less likely, an agent-based model), as the system evolves and changes, what is important in the system – our key analogic points – also will continue to evolve. Margaret Morrison and Mary Morgan (1999) continue this thread of the active role a model might play in the learning process. They suggest that for any model, modelers must bring into the modeling process information other than that from the theory or the data. There are simplifying assumptions made throughout the modeling process, for example, that neither derive from theory nor are based on data. Eric Winsberg called these "model building techniques, a particular way of altering a theoretical model to make it more computationally tractable" (Winsberg 2010: 123). Morrison and Morgan claim that

the crucial feature of partial independence is that models are not situated in the middle of a hierarchical structure between theory and the world. Because models typically include other elements, and model building proceeds in part independently of theory and data, we construe models as being outside the theory-world axis. (Morrison and Morgan 1999:17-18)

Thus, computational models do not simply translate a conceptual model into a different language. Because of those simplifying assumptions and other elements, the model builders actually create something new as they code.
4.2.2.3 Models as straitjackets
The final mode of creative engagement with a model is, ironically enough, one we introduced earlier in the section on computational linguistics: the straitjacket. (Aha! An analogy!) Engagement with the model as it is being constructed can provide clarity about the structure or logic of the target complex system. The model thus makes visible the unseen, showing us connections and relationships of which we might otherwise not have been aware. As mentioned already, the straitjacket forces rigorous consistency and explicitness on the narrative world of (social) theory (Shieber 1985). (We note here that the multivocality provided by the lack of clarity of narrative structures also has its place, for it allows for the stimulation of dialogue, the exchange of ideas, and the spark of creativity.) To recall, Shieber said that developing a model usually allowed the developers to identify "errors usually due to unforeseen interactions of various rules or principles" (Shieber 1985:12). In other words, the model's formalisms allowed the user to see through complexity in a way that would have been impossible without the computational power underlying the model. One of the authors of this document has experienced the value of
this type of engagement on several different modeling projects on which she worked. In each project, she worked with the modelers to formalize theory that is generally expressed narratively. The rigor required to develop these formalisms was very helpful in developing her own thinking. The formalisms as captured in code also allowed her and her team to explore interactions among elements of a system that was complex enough to be impossible to manipulate without the code. For those target communities or individuals involved in a participatory modeling process, the learning is the same. In all the cases we have given of engagement with model-as-metaphor, a decision was no longer the focus. The focus was on the education and development of the user so that he might engage differently with the system. Conversations may get no further than arguments over the appropriate elements to use for the analogy, but those conversations themselves reveal key perspectives on the system and allow those perspectives to be manipulated, developed, and changed.
4.3 SUMMARY
We have suggested that learning through metaphors is a constructivist explanation of learning in which the learner is a very active, creative agent in the learning process. To use a model in a constructivist fashion is to use it in an explanatory way. We contrast this with an argument that if models are to be used to predict, they will be used in an advisory fashion, providing a piece of information which the learner then incorporates into his knowledge schema. This gives strong agency to the model (the 'teacher') and puts the learner into a passive role. This is a cognitivist model of learning, in which the professor professes and the student receives. Models used this way are usually used to predict. As we saw, the model-as-advisor role can lead to incremental changes in the knowledge structure of the model user, but rarely to big paradigm shifts. It also reifies the model as an information-providing tool. It works well in certain situations where information is required quickly. Finally, it fits well with a technocratic approach to governance. The models are created by scientists or experts, and the models then provide science- or expert-based information to the decision-making process. The model-as-metaphor approach can lead to creative moments but can be personally and institutionally highly disruptive, and it is resource-intensive to implement. It requires each individual to engage with the model in order to learn, so it is very expensive in terms of human resources. These models generally are explanatory, not predictive, models. And though engagement with these models would not only allow but support the exercise of
imagination the 9/11 report called for, it does not map well into a technocratic governance paradigm. It does not provide an accurate representation of the world but may provide a true one, as the US Department of Defense 'acting as' Mars may be true but not accurate. It thus has the potential to make many in government today highly uncomfortable. In short, to use a model as an advisor is to use it as support for a particular decision. To use a model as a metaphor is to use it either to further the user's understanding of the target system or to stimulate creative thinking. And, of course, any engagement with a model can involve some combination of both. The challenge for the organization utilizing computational models is to balance its need for information (prediction) with its need for increased insight or understanding. To put it another way, it needs to balance its requirements for stability and continuity (both important organizational features) with the possible behavioral or institutional change and adaptation provided by an active, creative engagement with the models. Any organization needs to balance the use of analytical reasoning and problem solving with the exercise of imagination.
5 HOW CAN THE GOODNESS OF MODELS BE ASSESSED?
We have reviewed literature on validation from several fields. This literature is fairly rich and reasonably critical of the construct and the process of applying it. We believe we have developed a reasonably robust concept that is useful for our purposes. We have argued that just as a theory can never be proven but only falsified, so a model can never truly be validated, only invalidated. The model can be assigned a measure of accuracy, which can be expressed as some level of confidence that, when it is applied to the target data set, it will be true. It is then up to the user to decide whether the risk of failure (of a wrong answer, a wrong prediction) is worth taking. We note here that many computational models of social phenomena are calibrated, not validated. They are constructed based on theory (they are type B models) and then are 'tweaked' (calibrated) to fit the target data set. They thus can be used in a 'soft' predictive sense for that domain or region only. They are not transferable or generalizable to other domains or regions. This means that a counterinsurgency model developed to show relationships and encounter outcomes in one operating theater cannot be transferred to another.

[GEN] Petraeus was quick to note the "enormous difference between Iraq and Afghanistan." Afghanistan, for example, lacks Iraq's huge oil revenues and its "muscle memory" of strong central government institutions. These, as well as culture differences between the two countries, require a different approach to conducting counterinsurgency operations, he said. "You have to apply [them] in a way that is culturally appropriate for Afghanistan." (Miles 2009)

But what if a model is not designed to be used for prediction? Even if it is, what if the user chooses to use it for some other purpose?8 How, then, do we assess its goodness? This is our third and last question:
If computational social models are used for some purpose other than prediction (whether or not they can be validated), what assessments of goodness or effectiveness could be used for these models in this other application?

8 Every model is constructed to answer a question. Ideally the question is explicitly stated. Models may be used for purposes other than that for which they are designed, but at the user's peril (see Turnley and Perls 2008).
As an introduction to our answer, we provide a story. This is a story of a model created for one purpose (to stimulate conversation and creative thought) and used (in a sense) for another (as a provider of knowledge such as a prediction). Suppose we created a systems dynamics model of a counterinsurgency environment. The target environment is extremely complex, with a multitude of actors and a corresponding spaghetti diagram representing their influences. Obviously we cannot capture all the actors in our model (where 'actors' could be an agency, a country, an organization, or an influential individual). Nor can we represent all the relationships among them. So… we choose. The resulting model graphic, when presented as a whole, is highly inaccessible (see Figure 4).
Figure 4: Afghanistan/COIN dynamics9

9 Source: New York Times, 26 April 2010, http://www.nytimes.com/2010/04/27/world/27powerpoint.html
However, when this graphic culminates a presentation which builds it segment by segment, with a narrative describing the logic of choice for each element, the metaphor (that is, this graphic) is clearer. The value lies in the process of explanation and discussion, engagement with the model as it is being built, as it were. To select this graphic, let it stand alone, and deride it for its inability to represent the target system in a comprehensible manner is to speak truly about this graphic. However, it is to speak falsely about the model that generated it. The purpose of that model was not to provide a 'piece of information.' The purpose of that model was to stimulate creative thinking about a hard problem. So how do we assess the 'goodness' of a model such as this? By the creative thinking it stimulates. And how do we measure or assess creativity? Though we do not know, a model or a modeling process that is surrounded by dialogue and constructive argument is likely serving as an effective metaphor. Ideally, of course, we would need to measure the change
in knowledge or perspective of the user, pre- and post-model engagement to truly ‘measure’ its effectiveness. There is a growing literature on the role of analogies and metaphors in learning. As of today, the measurement instruments are very cumbersome and the results not very satisfactory. We will close this section on this uncomfortable note. Just as a technocracy does not like imagination (Where in the world did you get that from? Can you support it?), so it will be uncomfortable with the use of models to provide perspectives that might be true but not accurate. However, if we are looking for ways to stimulate thinking, develop new paradigms, move beyond our comfort zone – to help us imagine, just as the 9/11 report called for – engagement with models as analogies can help us get there.
6 SUMMARY AND CONCLUSION
We set out to address three questions:
Are there characteristics of the sociocultural domain that make certain aspects of it very difficult, if not impossible, to model computationally in ways that would allow those models to meet the standards of proof required by classic definitions of validation?
What usefulness (if any) do social models have besides prediction?
If computational social models are used for some purpose other than prediction (whether or not they can be validated), what assessments of goodness or effectiveness could be used for these models in this other application?
Our exploration of validation in domains other than the sociocultural led us to the conclusion that the sociocultural domain, indeed, by its very nature is difficult to model in ways that would allow those models to meet the standards of proof required by classic definitions of validation (our first question). Among the reasons are such hard problems as the complex, dynamic, and open nature of sociocultural systems. The problematic nature of social theory also makes it difficult to argue with certainty for any model structure. The multiple time scales at play in any social environment, many of which extend long past the time horizon of the model, make it difficult to represent their interacting effects. We reframed the validation discussion by first distinguishing between two types of model: research models and 'engineering' or site-specific models. Research models can be validated; engineering models can be calibrated. Most computational models in the social arena are engineering models, constraining general principles of human behavior (theory) by the details of time and space. However, it is very important to remember that every model of some particular time and space (the tribes of eastern Afghanistan) should be underpinned by research models which provide its underlying structure. These research models are fundamentally analogies. Our example of the tribes of eastern Afghanistan begins with the recognition that kinship is an important feature of this area of the world. We recognize and develop general principles of kinship structures. We then construct a model of a particular kinship structure. In the ideal world, we would use it to validate the general principles of kinship by testing the actual structure against the theoretical structure and assessing the lack of isomorphism. We use it for decision making about activities in eastern Afghanistan by calibrating it against our actual data.
With this distinction in view, we returned to the notion of validation. Keep in mind that only research models can be validated. They are validated by instantiating them with data sets from multiple sites and times. This develops a measure of accuracy which is expressed as the confidence that the model will give the 'right' answer when it is instantiated with a new data set. It is the user who must determine whether or not that confidence level is high enough to apply in a given decision environment. Validation thus is a function of the relationship between the user and the model, and acquires a dimension of subjectivity. It thus becomes important to understand how users engage with information and manage risk in decision making. We very briefly focused on two types of learning or engagement with the world. Very broadly, in one, the teacher/professor/advisor provides a piece of information (a token) to the user. The user compares the token to his existing mental schemata. If it is too divergent he either heavily discounts or discards it. If not, he uses the new information (the token) to revise or reform those schemata. This type of learning focuses on the maintenance over time of coherent narratives. It dovetails well with organizational stability and continuity. A computational model would take on the role of an advisor in this type of process. It would generate information, including a prediction, which would be 'used' by the user. In the second learning process, the user actively engages with his learning environment. He creates information through his engagement with that environment. This is a highly learner-centric approach. It is very expensive in terms of information-generating resources, for each learner must engage independently with the environment for the learning to take place. Passing around a spaghetti diagram of a social situation will not work. Walking the learner through the construction of the diagram by explaining its elements and being open to challenge about the (non)inclusion of those elements will work and will facilitate learning. So this answers our second question – if not for predictive purposes, for what else can computational social models be used? We believe that through their metaphorical properties they can be used to stimulate creativity, to excite the imagination. They can help users look at problems in new ways. So that takes us to our last question: how to assess the goodness of computational models. If they are to be used to help people predict, the process of validation (with all necessary caveats) appears to be the best approach. But to discount a model because it is not accurate is to discount its creative power. As Box so famously said, all models are wrong but some are useful (Box 1979). Suppose a model is not assessed as a context-free artifact, but as an integral part of a wicked problem, as a means of learning about the dynamics,
It thus becomes important to understand how users engage with information and manage risk in decision-making. We briefly considered two types of learning, or engagement with the world. Very broadly, in the first, a teacher, professor, or advisor provides a piece of information (a token) to the user. The user compares the token to his existing mental schemata: if it is too divergent, he heavily discounts or discards it; if not, he uses the new information to revise or refine those schemata. This type of learning emphasizes the maintenance of coherent narratives over time, and it dovetails well with organizational stability and continuity. A computational model would take on the role of an advisor in this type of process: it would generate information, including a prediction, to be 'used' by the user. In the second learning process, the user actively engages with his learning environment and creates information through that engagement. This is a highly learner-centric approach, and it is very expensive in terms of information-generating resources, for each learner must engage independently with the environment for learning to take place. Passing around a spaghetti diagram of a social situation will not work; walking the learner through the construction of the diagram, explaining its elements and remaining open to challenge about their (non)inclusion, will work and will facilitate learning.

This answers our second question: if not for predictive purposes, for what else can computational social models be used? We believe that, through their metaphorical properties, they can be used to stimulate creativity and excite the imagination. They can help users look at problems in new ways.

That takes us to our last question: how to assess the goodness of computational models. If they are to be used to help people predict, the process of validation (with all necessary caveats) appears to be the best approach. But to discount a model because it is not accurate is to discount its creative power. As Box so famously said, all models are wrong but some are useful (Box 1979). Suppose a model is assessed not as a context-free artifact, but as an integral part of a wicked problem, as a means of learning about the dynamics, structure, and function of a target system. In these cases, the measure of the model's goodness, of its usefulness, would be the learning that took place in the model users or the new ideas generated. An assessment of the goodness of a particular model must therefore begin with a clear understanding of the relationship among the model user, the target problem, and the model. We suggest a set of questions that describe that relationship, presenting them first discursively, illustrating them in Figure 5, and sketching, after the list, how they might be captured in structured form.
• How accurate must a description of a future state be in order to address the target problem?
  o Is it a problem that requires a forecast (prediction)?
  o Could it be reframed as one which looks to the generation of alternative possibilities (a much 'softer' framing of alternative futures)?
• How significant are the consequences of a failure to predict or forecast a future within the required accuracy bounds?
• Is the model designed to expand the boundaries of general domain knowledge (a research model)?
  o If so, are there resources (including the necessary data) to apply the model to multiple data sets?
  o If not, and the proposed model is an engineering model to be calibrated to a data set, is there adequate data for the target problem area? Is the model user comfortable 'enough' with the underlying theoretical model structure? (Is that model structure elucidated by the model builders?)
• How has the model dealt with domain issues generated by the open, dynamic, complex nature of social systems, such as:
  o Theoretical flaws (is the choice of underlying theory appropriately justified?);
  o Empirical flaws (is the user comfortable with the ways in which data are collected and measured?);
  o Parametrical flaws (are the simplifying assumptions used by the model builders explicit and acceptable to the user?); and
  o Temporal flaws (how has the model engaged with the multiple time scales that are operative in any social domain?)
• How does what the user learned from engagement with the model help the user reframe the problem (the 'wicked problem' problem)?
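As an illustration only, the questions above could be captured as a structured checklist that a model user works through before relying on a model. The sketch below is ours, not part of the framework itself; every field name is a hypothetical label for one of the questions.

```python
# Hypothetical sketch: the assessment questions rendered as a structured
# checklist. All field names are our own labels, not established terms.
from dataclasses import dataclass, field

@dataclass
class ModelGoodnessChecklist:
    needs_point_forecast: bool        # or can the problem be reframed as
                                      # generating alternative possibilities?
    failure_consequences: str         # significance of a missed forecast
    model_type: str                   # "research" or "engineering"
    multi_site_data_available: bool   # research model: can it be validated?
    site_data_adequate: bool          # engineering model: enough data to calibrate?
    theory_justified: bool            # theoretical flaws addressed?
    measurement_acceptable: bool      # empirical flaws addressed?
    assumptions_explicit: bool        # parametrical flaws addressed?
    time_scales_engaged: bool         # temporal flaws addressed?
    reframings_learned: list = field(default_factory=list)  # 'wicked problem' insights
```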
These interactions among the model, the user, and the problem are illustrated in Figure 5. It is a complex figure, because these interactions are complex, and they determine when and how the model can contribute to a decision process. If the user needs a highly accurate prediction, he will require a model whose underlying structure has been tested against enough sets of data that he is confident, for his particular decision, that the model will provide the 'right' answer in his use case. In this instance, the model will generally be used in a way that may extend and stretch existing cognitive schemata, but will not radically change or replace them. Used this way, the information the model generates can be easily transferred from those who 'run' or exercise the model to the many others in the decision-making process. If, on the other hand, the user is interested in challenging existing cognitive schemata (in exercising the imagination called for by the 9/11 report we referenced in the introduction to this discussion), he can engage with the development of the model and challenge its structure. Asking questions around each of the areas of potential flaws enumerated above (theoretical, empirical, parametrical, and temporal) will stimulate the user to challenge his own assumptions, and quite likely to modify both the model and the problem in the process. Clearly, this requires that each model user engage with the model builders at a quite different point in the decision process than in our previous example. In this case, however, the model need not be validated. Its goodness will be evaluated through the stimulation of imagination and the generation of new possibilities, through changes in the model user (an evaluation of what he has learned or imagined, for example).
Figure 5: Using a model
We close by emphasizing that we are not discounting or dismissing the importance of validating computational social models. To the contrary: for certain types of decisions, validated models are critical. We are making two points with our arguments. First, we need to be clear about exactly what we mean by validation in the computational social domain, and to recognize possible constraints on such a process. Second, we have described a different use for computational social models, one which requires a different assessment of goodness than validation. If the models are used to introduce alternative futures, to cause us to exercise our imagination, we will need to evaluate them the way we evaluate other similar tools. How far do they push us from our current positions? What new possibilities do they introduce? What have we learned by engaging with the model? We recognize that these are cumbersome, post facto measures and so may not be satisfactory in today's high-accountability environments. But if the analytic community is looking for the exercise of imagination whose absence was lamented by the 9/11 Commission, it may find it in different ways of engaging with tools it already possesses. As we said in our discussion of the melting pot and the salad bowl as two possible analogic descriptions of the US, neither model is accurate but both are true. If computational social models, or computational models of any domain for that matter, are to be used to stimulate the imagination and promote creativity, perhaps we can only find the best ones by looking for the office corners where the most passionate arguments and excited conversations are taking place.
7 REFERENCES
American Institute of Aeronautics and Astronautics (AIAA). 1998. AIAA Guide for the Verification and Validation of Computational Fluid Dynamics Simulations. G-077-1998e.
Atkinson, R. and R.M. Shiffrin. 1968. Human Memory: a Proposed System and its Control Processes. In K.W. Spence (ed.), The Psychology of Learning and Motivation: Advances in Research and Theory, Vol. 2. Academic Press: New York, NY, pp. 89-195.
Barley, Stephen R. and Julian E. Orr (eds). 1997. Between Craft and Science: Technical Work in US Settings. Cornell University Press: Ithaca, NY.
Barnaud, Cécile, Panomsak Promburom, Tayan Raj Gurung, Christophe Le Page, and Guy Trébuil. 2006. Communication presented at the International Symposium Towards Sustainable Livelihoods and Ecosystems in Mountainous Regions, Chiang Mai, Thailand, 7-9 March 2006. Accessed at http://cormas.cirad.fr/pdf/barnaud2006.pdf on January 26, 2012.
Beemer, Brandon and Dawn G. Gregg. 2008. Advisory Systems to Support Decision Making. In Handbook on Decision Support Systems, Chapter 24. Springer: Berlin.
Berlin, Isaiah. 1993 (1953). The Hedgehog and the Fox: An Essay on Tolstoy's View of History. Ivan R. Dee.
Beven, Keith. 2002. Towards a Coherent Philosophy for Modeling the Environment. Proceedings of the Royal Society of London Vol. 458, pp. 2465-2484.
Black, M. 1979a. How Metaphors Work: A Reply to Donald Davidson. Critical Inquiry Vol. 6 (1), pp. 131-143.
Black, M. 1979b. More about Metaphor. In A. Ortony (ed.), Metaphor and Thought. Cambridge University Press: Cambridge, pp. 19-43.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research Vol. 3, pp. 993-1022.
Bonaccio, Silvia and Reeshad S. Dalal. 2006. Advice Taking and Decision-Making: an Integrative Literature Review, and Implications for the Organizational Sciences. Organizational Behavior and Human Decision Processes Vol. 101, pp. 127-151.
Box, George. 1979. Robustness in the Strategy of Scientific Model Building. In R.L. Launer and G.N. Wilkinson (eds.), Robustness in Statistics: Proceedings of a Workshop.
Brown, John Seely, Allan Collins and Paul Duguid. 1989. Situated Cognition and the Culture of Learning. Educational Researcher Vol. 18 (1), pp. 32-42.
Brown, P. F., Della Pietra, V. J., Della Pietra, S. A. and Mercer, R. L. 1994. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics Vol. 19 (2), pp. 263-311.
Bruner, J., J. Goodnow, and A. Austin. 1956. A Study of Thinking. Wiley: New York, NY.
Budescu, David V. and Adrian K. Rantilla. 2004. Confidence in Aggregation of Expert Opinions. Acta Psychologica Vol. 104 (3), pp. 371-398.
Camerer, Colin F., George F. Loewenstein, and Drazen Prelec. 2005. Neuroeconomics: How Neuroscience Can Inform Economics. Journal of Economic Literature Vol. 43 (March 2005), pp. 9-64.
Campbell, J. and A. Tesser. 1983. Motivational Interpretation of Hindsight Bias: An Individual Difference Analysis. Journal of Personality Vol. 51, pp. 605-620.
Castella, Jean-Christophe, Tran Ngoc Trung, and Stanislas Boissau. 2005. Participatory Simulation of Land-Use Changes in the Northern Mountains of Vietnam: the Combined Use of an Agent-Based Model, a Role-Playing Game, and a Geographic Information System. Ecology and Society Vol. 10 (1).
Chiu, C.-Y., M. W. Morris, Y.-Y. Hong, and T. Menon. 2000. Motivated Cultural Cognition: The Impact of Implicit Cultural Theories on Dispositional Attribution Varies as a Function of Need for Closure. Journal of Personality and Social Psychology Vol. 78, pp. 247-259.
Chomsky, N. and M. Halle. 1968. The Sound Pattern of English. Harper & Row: New York, NY.
Craik, F.I.M. and R.S. Lockhart. 1972. Levels of Processing: A Framework for Memory Research. Journal of Verbal Learning and Verbal Behavior Vol. 11 (6), pp. 671-684.
D'Aquino, Patrick, Christophe Le Page, Francois Bousquet and Alassane Bah. 2003. Using Self-Designed Role-Playing Games and a Multi-Agent System to Empower a Local Decision-Making Process for Land Use Management: the SelfCormas Experiment in Senegal. Journal of Artificial Societies and Social Simulation Vol. 6 (3).
Deerwester, S., S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman. 1990. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science Vol. 41 (6), pp. 391-407.
De Saussure, Ferdinand. 1959 (1916). Course in General Linguistics. Charles Bally and Albert Sechehaye (eds), Wade Baskin (trans.). McGraw-Hill: New York, NY.
Dewey, J. 2010 (1910). How We Think. FQ Books: New York, NY.
Dionnet, Mathieu, Marcel Kuper, Ali Hammani, and Patrice Garin. 2008. Combining Role-Playing Games and Policy Simulation Exercises: an Experience with Moroccan Smallholder Farmers. Simulation & Gaming Vol. 39 (4), pp. 498-514.
Druckman, J. N. 2001. Using Credible Advice to Overcome Framing Effects. Journal of Law, Economics, and Organization Vol. 17, pp. 62-82.
Duit, Reinders. 1991. On the Role of Analogies and Metaphors in Learning Science. Science Education Vol. 75 (6), pp. 649-672.
Epstein, Joshua M. and Robert L. Axtell. 1996. Growing Artificial Societies: Social Science from the Bottom Up (Complex Adaptive Systems). The MIT Press: Cambridge, MA.
Fehr-Duda, Helga, Adrian Bruhin, Thomas F. Epper, and Renate Schubert. 2008. Rationality on the Rise: Why Relative Risk Aversion Increases with Stake Size. Working Paper No. 0708 (revised edition, February 2008), Socioeconomic Institute, University of Zurich.
Fischhoff, B. 1975. Hindsight not Equal to Foresight: the Effect of Outcome Knowledge on Judgment under Uncertainty. Quality and Safety in Health Care Vol. 12, pp. 304-312.
Fitzgerald, F. Scott. 1936. The Crack-Up. Originally published as a three-part series in the February, March, and April 1936 issues of Esquire. Accessed at http://www.esquire.com/features/the-crack-up#ixzz1b02VsXzH on January 26, 2012.
Gentner, Dedre. 1983. Structure-Mapping: A Theoretical Framework for Analogy. Cognitive Science Vol. 7, pp. 155-170.
Gick, M.L. and K.J. Holyoak. 1980. Analogical Problem Solving. Cognitive Psychology Vol. 12 (3), pp. 306-355.
Green, Kesten C. and J. Scott Armstrong. 2004. Value of Expertise for Forecasting Decisions in Conflicts. University of Pennsylvania Scholarly Commons Marketing Papers, 12-7-2004. Accessed at http://repository.upenn.edu/marketing_papers/10 on February 15, 2012.
Gurung, Tayan Raj, François Bousquet and Guy Trébuil. 2006. Companion Modeling, Conflict Resolution, and Institution Building: Sharing Irrigation Water in the Lingmuteychu Watershed, Bhutan. Ecology and Society Vol. 11 (2), p. 36.
Harvey, N. and I. Fischer. 1997. Taking Advice: Accepting Help, Improving Judgment and Sharing Responsibility. Organizational Behavior and Human Decision Processes Vol. 70, pp. 117-133.
Hawkins, S. and R.R. Hastie. 1990. Hindsight: Biased Judgment of Past Events after the Outcomes Are Known. Psychological Bulletin Vol. 107, pp. 311-327.
Heath, Chip and Richard Gonzalez. 1995. Interaction with Others Increases Decision Confidence but Not Decision Quality: Evidence Against Information Collection Views of Interactive Decision-making. Organizational Behavior and Human Decision Processes Vol. 61 (3), pp. 305-326.
Hedesström, Ted Martin. 2006. The Psychology of Diversification: Novice Investors' Ability to Spread Risks. Department of Psychology, Göteborg University, Gothenburg, Sweden.
Hesse, Mary B. 1966. Models and Analogies in Science. University of Notre Dame Press: Notre Dame, IN.
Heuer, R.J. 1999. The Psychology of Intelligence Analysis. Center for the Study of Intelligence, Central Intelligence Agency: Washington, DC.
Holland, John. 1992. Complex Adaptive Systems. Daedalus Vol. 121 (1), pp. 17-30.
Janssen, Marco A. and Elinor Ostrom. 2006. Empirically Based Agent-Based Models. Ecology and Society Vol. 11 (2). Accessed at http://www.ecologyandsociety.org/vol11/iss2/art37/ on January 26, 2012.
Jensen, Morten Berg, Björn Johnson, Edward Lorenz, and Bengt-Åke Lundvall. 2007. Forms of Knowledge and Modes of Innovation. Research Policy Vol. 36, pp. 680-693.
Jones, James H. 1992. Bad Blood: The Tuskegee Syphilis Experiment. Free Press: New York, NY.
Kahneman, Daniel and Amos Tversky. 1979. Prospect Theory: an Analysis of Decision under Risk. Econometrica Vol. 47 (2), pp. 263-291.
Karas, Thomas. 2004. Modelers and Policymakers: Improving the Relationships. Sandia Report SAND 2004-2888, Sandia National Laboratories: Albuquerque, NM.
Kegelmeyer, Philip, Brett Bader and Peter Chew. 2011. Multilingual Sentiment Analysis Using Latent Semantic Indexing and Machine Learning. Paper presented at the SENTIRE workshop, International Conference on Data Mining, December 2011, Vancouver, BC.
Kirton, M. J. (1984) 1986. Adaptors and Innovators: Why New Initiatives Get Blocked. Long Range Planning, pp. 137-143. Reprinted in Hellriegel & Slocum (eds.), Companion to Organisational Behaviour, West Publishing, and M.D. Richards (ed.), Readings in Management, South-Western Publishing Co., 1986.
Konikow, Leonard F. and John D. Bredehoeft. 1992. Ground-Water Models Cannot Be Validated. Advances in Water Resources (Validation of Geo-hydrological Models Part 1) Vol. 15 (1), pp. 75-83.
Kruglanski, Arie W. and Donna M. Webster. 1996. Motivated Closing of the Mind: "Seizing" and "Freezing". Psychological Review Vol. 103 (2), pp. 263-283.
Lakoff, George and Mark Johnson. (1980) 1993. Metaphors We Live By. University of Chicago Press: Chicago, IL.
Lévi-Strauss, Claude. (1955) 1977. Tristes Tropiques. John and Doreen Weightman (trans.). Pocket Books: New York, NY.
Mair, Carolyn, Miriam Martincova, and Martin Shepperd. 2009. A Literature Review of Expert Problem Solving using Analogy. 13th International Conference on Evaluation and Assessment in Software Engineering, 20-21 April 2009, Durham, UK.
Martindale, Colin. 1999. Biological Bases of Creativity. In Robert J. Sternberg (ed.), Handbook of Creativity, Chapter 7. Cambridge University Press: New York, NY.
Mayer, Igor and Martin de Jong. 2004. Combining GDSS and Gaming for Decision Support. Group Decision and Negotiation Vol. 13, pp. 223-241.
McDonnell, J. J., M. Sivapalan, K. Vaché, S. Dunn, G. Grant, R. Haggerty, C. Hinz, R. Hooper, J. Kirchner, M. L. Roderick, J. Selker, and M. Weiler. 2007. Moving Beyond Heterogeneity and Process Complexity: a New Vision for Watershed Hydrology. Water Resources Research Vol. 43. Accessed at http://seismo.berkeley.edu/~kirchner/reprints/2007_79_McDonnell_beyond_complexity.pdf on January 26, 2012.
McNamara, Laura. 2010. Why Models Don't Forecast. Draft paper prepared for the Workshop on Unifying Social Frameworks, National Research Council, Washington, DC, August 16-17, 2010. Accessed at http://www7.nationalacademies.org/bohsi/Why%20Models%20Dont%20ForecastMcNamara.pdf on January 26, 2012.
Miles, Donna. 2009. Petraeus Notes Differences Between Iraq, Afghanistan Strategies. American Forces Press Service, 22 April 2009. Accessed at http://www.defense.gov/news/newsarticle.aspx?id=54036.
Moore, Gordon E. 1965. Cramming More Components onto Integrated Circuits. Electronics Vol. 38 (8).
Morrison, Margaret and Mary S. Morgan. 1999. Models as Mediating Instruments. In Mary S. Morgan and Margaret Morrison (eds.), Models as Mediators: Perspectives on Natural and Social Science. Cambridge University Press: Cambridge, UK, pp. 10-37.
Naivinit, W., C. Le Page, G. Trébuil and N. Gajaseni. 2010. Participatory Agent-Based Modeling and Simulation of Rice Production and Labor Migrations in Northeast Thailand. Environmental Modelling and Software Vol. 25 (11), pp. 1345-1358.
National Commission on Terrorist Attacks upon the United States (Philip Zelikow, Executive Director; Bonnie D. Jenkins, Counsel; Ernest R. May, Senior Advisor). 2004. The 9/11 Commission Report. W.W. Norton & Company: New York, NY.
Oberkampf, William L. and Matthew F. Barone. 2005. Measures of Agreement Between Computation and Experiment: Validation Measures. Sandia Report SAND 2005-4302, Sandia National Laboratories: Albuquerque, NM.
Oberkampf, William L. and Timothy G. Trucano. 2002. Verification and Validation in Computational Fluid Dynamics. Progress in Aerospace Sciences Vol. 38 (3), pp. 209-272.
Oberkampf, William L. and Timothy G. Trucano. 2007. Verification and Validation Benchmarks. Sandia Report SAND 2007-0853, Sandia National Laboratories: Albuquerque, NM.
Oberkampf, William L., Timothy G. Trucano, and Charles Hirsch. 2003. Verification, Validation, and Predictive Capability in Computational Engineering and Physics. Sandia Report SAND 2003-3769, Sandia National Laboratories: Albuquerque, NM.
Ogborn, Jon and Isabel Martins. Metaphorical Understandings and Scientific Ideas. International Journal of Science Education Vol. 18 (6), pp. 631-652.
O'Neill, Barry. 1998. Risk Aversion in International Relations Theory. Projektbereich B: Discussion Paper No. B-445, December 1998.
Oreskes, Naomi. 1998. Evaluation (Not Validation) of Quantitative Models. Environmental Health Perspectives Vol. 106 (6), pp. 1453-1460.
Oreskes, Naomi and Kenneth Belitz. 2001. Philosophical Issues in Model Assessment. In M.G. Anderson and P.D. Bates (eds.), Model Validation: Perspectives in Hydrological Science. John Wiley & Sons, Ltd: New York, NY, pp. 23-41.
Oreskes, Naomi, Kristin Shrader-Frechette and Kenneth Belitz. 1994. Verification, Validation and Confirmation of Numerical Models in the Earth Sciences. Science Vol. 263 (5147), pp. 641-646.
Ormrod, J.E. 2003. Educational Psychology: Developing Learners (4th ed.). Merrill Prentice Hall: Upper Saddle River, NJ.
Ormrod, Jeanne. 2003. Human Learning. Prentice Hall: Upper Saddle River, NJ.
Palincsar, Annemarie Sullivan. 1998. Social Constructivist Perspectives on Teaching and Learning. Annual Review of Psychology Vol. 49, pp. 345-375.
Pang, Bo, L. Lee, and S. Vaithyanathan. 2002. Thumbs Up? Sentiment Classification Using Machine Learning Techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 79-86.
Pavlov, I. P. 1927. Conditioned Reflexes: an Investigation of the Physiological Activity of the Cerebral Cortex. G. V. Anrep (trans. and ed.). Oxford University Press: London.
Payne, John W., James R. Bettman and Eric J. Johnson. 1992. Behavioral Decision Research: A Constructive Processing Perspective. Annual Review of Psychology Vol. 43, pp. 87-131.
Piaget, Jean. 1950. The Psychology of Intelligence. Routledge: New York, NY.
Rastetter, Edward B. 1996. Validating Models of Ecosystem Response to Global Change. BioScience Vol. 46 (3), pp. 190-198.
Refsgaard, Jens Christian and Hans Jørgen Henriksen. 2004. Modeling Guidelines – Terminology and Guiding Principles. Advances in Water Resources Vol. 27, pp. 71-82.
Richardson, Virginia. 2003. Constructivist Pedagogy. Teachers College Record Vol. 105 (9).
Rittel, Horst W. J. and Melvin M. Webber. 1973. Dilemmas in a General Theory of Planning. Policy Sciences Vol. 4, pp. 155-169.
Rosen, Allison B., Jerry S. Tsai, and Stephen M. Downs. 2003. Variations in Risk Attitude Across Race, Gender, and Education. Medical Decision Making (Nov-Dec 2003), pp. 511-517.
Runco, M.A. and S. Pritzker (eds.). 1999. Encyclopedia of Creativity. Academic Press: San Diego, CA.
Rykiel, Edward Jr. 1996. Testing Ecological Models: the Meaning of Validation. Ecological Modelling Vol. 90, pp. 229-244.
Schelling, T. 1969. Models of Segregation. The American Economic Review Vol. 59 (2), pp. 488-493.
Schotter, A. 2003. Decision Making with Naive Advice. American Economic Review Vol. 93, pp. 196-201.
Shieber, Stuart. 1985. Criteria for Designing Computer Facilities for Linguistic Analysis. Linguistics Vol. 23, pp. 189-211.
Skinner, B. F. 1976. Particulars of My Life. Knopf: New York, NY.
Skinner, B. F. 1979. The Shaping of a Behaviorist. Knopf: New York, NY.
Skinner, B. F. 1983. A Matter of Consequences. Knopf: New York, NY.
Sniezek, Janet A. 1999. Judge Advisor Systems Theory and Research and Applications to Collaborative Systems and Technology. In Proceedings of the 32nd Hawaii International Conference on System Sciences, Maui, Hawaii, January 5-8, 1999. Institute of Electrical and Electronics Engineers.
Sommerstein, Alan. 1977. Modern Phonology. University Park Press: Baltimore, MD.
Spulak, Robert G. Jr. 2010. Innovate or Die: Innovation and Technology for Special Operations. JSOU Report 10-7. The JSOU Press: MacDill Air Force Base, FL.
Standish, Russell K. 2001. On Complexity and Emergence. Complexity International Vol. 9. Accessed at arXiv:nlin/0101006v1 [nlin.AO] in November 2011.
Suedfeld, Peter and Phillip Tetlock. 2001. Cognitive Styles. In Tesser and Schwartz (eds.), Blackwell International Handbook of Social Psychology: Intraindividual Processes. Wiley-Blackwell: New Jersey.
Tetlock, Phillip. 1998. Close-call Counterfactuals and Belief System Defenses: I Was Not Almost Wrong but I Was Almost Right. Journal of Personality and Social Psychology Vol. 75, pp. 639-652.
Tetlock, Phillip. 2005. Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press: Princeton, NJ.
Tetlock, Phillip E. and Aaron Belkin (eds). 1996. Counterfactual Thought Experiments in World Politics: Logical, Methodological, and Psychological Perspectives. Princeton University Press: Princeton, NJ.
Thorndike, Edward L. 1903. Educational Psychology. The Science Press: New York, NY.
Thorndike, Edward L. 1904. An Introduction to the Theory of Mental and Social Measurements. Library of Psychology and Scientific Methods. The Science Press: New York, NY.
Thorndike, Edward L. 1905. The Elements of Psychology. A.G. Seiler: New York, NY.
Tulving, E. 1974. Cue-Dependent Forgetting. American Scientist Vol. 62, pp. 74-82.
Tulving, E. and S. Wiseman. 1976. Relation between Recognition and Recognition Failure of Recallable Words. Bulletin of the Psychonomic Society Vol. 6, pp. 79-82.
Tumasjan, A., Timm O. Sprenger, Philipp G. Sandner, and Isabell M. Welpe. 2010. Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, pp. 178-185.
Turnley, Jessica G. and Aaron Perls. 2008. What is a Computational Social Model Anyway? Discussion of Definitions, a Consideration of Challenges, and an Explication of Process. Report No. ASCO 2008-009, Defense Threat Reduction Agency, Advanced Systems and Concepts Office, US Department of Defense: Washington, DC.
Tykocinski, O.E., D. Pick, and D. Kedmi. 2002. Retroactive Pessimism: A Different Kind of Hindsight Bias. European Journal of Social Psychology Vol. 32 (4), pp. 577-588.
US Department of Commerce, National Institute of Standards and Technology. 2006. State Weights and Measures Laboratories Program Handbook. NIST Handbook 143. Accessed at http://ts.nist.gov/WeightsAndMeasures/Publications/glossary.cfm on January 26, 2012.
US Department of Defense. 2009. Instruction No. 5000.61, December 9, 2009.
Vogel, R. M. and A. Sankarasubramanian. 2003. Validation of a Watershed Model without Calibration. Water Resources Research Vol. 39 (10), pp. 1292-1300.
Von Neumann, J. and O. Morgenstern. 1947. Theory of Games and Economic Behavior, 2nd ed. Princeton University Press: Princeton, NJ.
Watson, John B. 1913. Psychology as the Behaviorist Views It. Psychological Review Vol. 20, pp. 158-177.
Weber, Elke U., Ann-Renée Blais, and Nancy E. Betz. 2002. A Domain-Specific Risk-Attitude Scale: Measuring Risk Perceptions and Risk Behaviors. Journal of Behavioral Decision Making Vol. 15, pp. 263-290.
Weisman, Steven R. 2006. Rice Admits US Underestimated Hamas Strength. The New York Times, January 30, 2006. Accessed at http://www.nytimes.com/2006/01/30/international/middleeast/30diplo.html?pagewanted=print on May 2, 2011.
Winsberg, Eric. 2010. Science in the Age of Computer Simulation. University of Chicago Press: Chicago, IL.
Yaniv, Ilan. 2004. The Benefit of Additional Opinions. Current Directions in Psychological Science Vol. 13, pp. 75-78.
Yaniv, Ilan and Maxim Milyavsky. 2007. Using Advice from Multiple Sources to Revise and Improve Judgments. Organizational Behavior and Human Decision Processes Vol. 103, pp. 104-120.