Project number: 265138
Project name: New methodologies for multi-hazard and multi-risk assessment methods for Europe
Project acronym: MATRIX
Theme: ENV.2010.6.1.3.4 Multi-risk evaluation and mitigation strategies
Start date: 01.10.2010
End date: 30.09.2013 (36 months)
Deliverable D6.2 “Individual perceptual and cognitive barriers to multi-hazard management”
Version: Final
Responsible partner: IIASA
Month due: M24
Month delivered: M35

Primary authors: Nadejda Komendantova, Noel van Erp, Pieter van Gelder, Anthony Patt (08.2013)
Reviewers: Sandra Laskowski (KIT), Gordon Woo (ASPINALL) (08.2013)
Authorised by: Kevin Fleming (08.2013)
Dissemination Level
PU  Public  [X]
PP  Restricted to other programme participants (including the Commission Services)
RE  Restricted to a group specified by the consortium (including the Commission Services)
CO  Confidential, only for members of the consortium (including the Commission Services)
Abstract

The goal of Task 6.2 was to identify individual perceptual and cognitive barriers in decision-making under conditions of multi-hazards and multi-risks. We approach this task both psychologically and mathematically. The psychological approach consists of identifying the relevant heuristics and biases, as proposed by behavioural economics and environmental psychology, and applying these heuristics and biases to the analysis of historical cases of multi-hazard events. The historical cases chosen for this study are the Kobe earthquake of 1995, the Sumatra-Andaman earthquake of 2004 and the Tohoku earthquake of 2011. The mathematical approach consists of offering both a Bayesian decision theoretic framework and an extended information theoretic framework, also called inquiry calculus. The former models the way we choose among competing decisions. The latter allows us to quantify the relevance that a source of information carries for some issue of interest. Both frameworks are ‘new’. The Bayesian decision theoretic framework offered here will be used to analyse a simple decision problem under multi-risk conditions; this is done for demonstrative purposes. The extended information theoretic framework can be used to provide some insights into the mechanisms which underlie risk communication.
Keywords: multi-hazard risk mitigation, heuristics, cognitive and behavioural biases, Bayesian decision theory, variance preferences, information theory, inquiry calculus
Acknowledgments The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 265138.
List of Tables, Figures and Annexes

Tables
Table 1: Management strategies for correlated and uncorrelated multiple hazards (page 28)
Table 2: Biases and factors influencing decision-making process in multi-hazards cases (page 29)
Table 3: Contributing factors identified in decision-making in frames of three historical case studies (page 44)
Table 4: Factors influencing decision-making of risk management in three historical cases (page 46)
Figures
Figure 1: Schematic of a general framework to model human reasoning (page 18)
Figure 2: The Sumatra-Andaman earthquake and the resulting Indian Ocean tsunami of 2004 (page 32)
Figure 3: Great Hanshin earthquake of 1995 (page 36)
Figure 4: ShakeMap (intensity) of the Haiti earthquake of 2010 (page 40)
Figure 5: Great East Japan earthquake of 2011 (page 42)
Annexes

Appendix 1: Bayesian Decision Theory
All tables in this appendix are for internal use only, that is, for the demonstration and the clarification of the concepts behind Bayesian decision theory. As such, they have no immediate bearing outside this appendix and, thus, will not be listed here.

Appendix 2: Information Theory and Risk Communication
All tables in this appendix are for internal use only, that is, for the demonstration and the clarification of the concepts behind information theory. As such, they have no bearing outside this appendix and, thus, will not be listed here.
Table of Contents
1 Introduction (page 5)
1.1 Goals of this deliverable (page 5)
1.2 Modification of Task 6.2 (page 6)
2 Theoretical framework (page 8)
2.1 A typology of multi-hazard decision situations (page 9)
2.2 Theories (page 10)
3. Historical analysis (page 17)
3.1. Behavioural and Cognitive Biases in Historical Multi-Hazards Cases (page 17)
3.2. Test cases (page 24)
3.2a. The Indian Ocean Tsunami of 2004 (page 29)
3.2b The Kobe Earthquake of 1995 (page 33)
3.2c The Haiti Earthquake of 2010 (page 36)
3.2d The Tohoku Earthquake of 2011 (page 38)
4 Discussion (page 42)
Appendix 1: Bayesian Decision Theory (page 47)
Appendix 2: Information Theory and Risk Communication (page 95)
References (page 121)
1 Introduction

1.1 Goals of this deliverable

A community or region can be vulnerable to multiple hazards, such as tsunamis, earthquakes, landslides, floods, volcanic eruptions, etc. However, it is still unclear how policy-makers face different independent and inter-dependent hazards, how they perceive the probabilities of these events, and how they make choices with regards to risk mitigation and management. Depending upon the available means, they might treat multiple hazards according to their likelihood of occurring and their possible impacts upon society and the economy, while considering them either independently from each other or in their inter-dependency. Therefore, further understanding of the decision-making process is required for successful multi-risk mitigation and management. This decision-making process involves both individual and institutional decision-making, marked by existing institutional structures, the division of responsibilities and conflicts of interest between the involved stakeholders. In Task 6.2 we focus on individual patterns of decision-making, while the institutional patterns are studied in greater detail in Task 6.3.

One question is whether decision-makers and individual stakeholders base their estimations of the probabilities and outcomes of interdependent multiple hazards on the assessment of separate hazards or upon their inter-dependency. Another question is whether this process is marked by evidence of behavioural and cognitive biases, which could influence the estimation of the probabilities and therefore the risk mitigation process. If risks are considered independently through inhomogeneous procedures, their comparison is most likely difficult. In cases where decision-makers consider multiple hazards, they might treat them as simply the sum of single-risk assessments. This may lead to a severe underestimation of the resulting multi-hazard risks for two reasons. First, single-risk assessments are not always adapted for inter-comparison because of different spatial and temporal resolutions and different approaches to vulnerability. Second, not all single risks are really independent. This makes it difficult to compare risks of different origins, while the implicit assumption of independence of the risk sources leads to the neglect of possible interactions among the risks. This inherently leads to the neglect of the so-called “conjoint” and “cascade” effects, where the risk of a hazard and the vulnerability of exposed elements may change significantly after an initial event has occurred. Conjoint effects are when a series of parallel adverse events, generated by different sources such as earthquakes and landslides,
increase the vulnerability of a region. Cascade effects happen when an adverse event, located inside or outside the site, triggers one or more sequential events (some of which may be worse than the triggering event, e.g., the Tohoku earthquake and tsunami in 2011). One of the goals of the MATRIX project is to establish a ranking of different types of risks within the framework of a multi-hazard event, taking into account possible conjoint and cascade effects. Multi-risk analysis, as developed in previous scientific works, can provide a framework for quantitative evaluations, assessing the potential damages caused by all events to an object such as an industry, a city or the environment. But the framework of analysis of the role and capacities of risk mitigation actors, especially in cases of cascading and conjoint hazards, is still under development. It is important to understand how these stakeholders perceive the various risks and how they set priorities for their risk mitigation actions, based on their perceptions of the probabilities of different risks and their possible outcomes. These perceptions lead to different choices for multi-hazard risk mitigation and management.
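To make the cascade-effect point above concrete, the short sketch below compares a “sum of single-risk assessments” estimate with one that lets an earthquake raise the subsequent flood probability by damaging flood defences. All numbers (probabilities, conditional probability, losses) are hypothetical placeholders chosen only to illustrate the direction of the bias; they are not MATRIX results, and the size of the underestimation depends entirely on the assumed values.

# Illustrative only: hypothetical numbers showing how ignoring a cascade effect
# (an earthquake damaging flood defences) biases a combined annual risk estimate.

p_eq = 0.02                # assumed annual probability of a damaging earthquake
p_flood = 0.02             # assumed annual probability of a major flood (defences intact)
p_flood_given_eq = 0.50    # assumed flood probability in a year in which an earthquake
                           # has damaged the flood defences

loss_eq = 50.0             # assumed expected loss from the earthquake alone (million EUR)
loss_flood = 100.0         # assumed expected loss from a major flood (million EUR)

# Single-risk view: hazards assessed separately and simply added.
risk_independent = p_eq * loss_eq + p_flood * loss_flood

# Multi-risk view: in earthquake years the flood probability is elevated.
p_flood_total = (1 - p_eq) * p_flood + p_eq * p_flood_given_eq
risk_cascade = p_eq * loss_eq + p_flood_total * loss_flood

print(f"annual expected loss, independent view: {risk_independent:.2f} million EUR")
print(f"annual expected loss, with cascade:     {risk_cascade:.2f} million EUR")
print(f"relative underestimation: {100 * (1 - risk_independent / risk_cascade):.1f}%")

With these particular assumptions the independent view understates the annual expected loss by roughly a quarter; the sketch is meant only to show that ignoring the interaction always biases the estimate downward, not to suggest any particular magnitude.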
1.2 Modification of Task 6.2

In the MATRIX proposal, Task 6.2 is described as “Individual perceptual and cognitive barriers to multi-hazard management”. Originally, we planned that this task would deal with patterns of decision-making at the individual level. Based on this research task, a deliverable D6.2 “Individual barriers to multi-hazard analysis” was to be produced. This deliverable was originally planned to have two parts. The first was to analyse findings from the literature on behavioural economics, which identifies how individuals evaluate and prioritize their actions under conditions of multiple types of risk. We then planned to map our findings onto the decision-making process for the case of multi-hazards. The second part would involve conducting an economic experiment in the decision-making laboratory, to quantify and to validate our findings. However, the original content of Task 6.2 was not tied to any case study. Preliminary work has shown that there is a significant amount of scientific literature dealing with the issues of individual barriers to multi-hazard analysis. At the same time, case studies and best practices of the management of multi-hazards appear to have been studied only weakly. Therefore, it appeared to us that it was necessary to refocus the means of achieving the aims of Task 6.2 by undertaking the planned research activities within the framework of specific case studies. We did so by considering the examples of the Kobe
earthquake of 1995, the Sumatra-Andaman earthquake of 2004 and the Tohoku earthquake of 2011, as well as other, less well known, multi-hazard events.

Our motivation for refocusing the work plan under Task 6.2 is the following. One of the assumptions of the MATRIX project is that analysing and addressing multiple hazards together will lead to significant reductions in the costs of, and improvements in the efficiency of, risk mitigation and adaptation measures, compared to cases where hazards are treated separately from one another. There is also the fact that the decision-making process is currently driven more by the single-risk approach. This motivated us to conduct research on how people perceive the probabilities of different hazards in a multi-risk environment, followed by an assessment of how they make their choices with regards to the mitigation and management of hazards under such conditions. Our hypothesis was that the decision-making process is influenced by different kinds of cognitive and behavioural biases, such as availability heuristics, limited worry, bounded rationality, dread risk and others. These biases result in a situation where decision-makers perceive the probability of certain events as being too low, or do not address them at all in their risk mitigation strategies. We pursue these questions through the application of a variety of tools and methods, including both quantitative and qualitative methods of analysis. The types of biases and their influence on the assessment of probabilities and the making of choices were identified through the analysis of three historical cases of multi-hazard events, using Bayesian Decision Theory and Information Theory.
2 Theoretical framework

There have been numerous suggestions within the MATRIX project that people make poor decisions in response to the threat of multiple hazards. The project proposal, for example, begins with the paragraph:

A variety of natural extreme events, including earthquakes, landslides, volcano eruptions, tsunamis, river floods, winter storms, wildfire, and coastal phenomena, threaten different regions of European countries. Planners and policy-makers, and the scientists, who inform their judgment, usually treat the hazards and risks related to such events separately from each other, neglecting interdependencies between the different types of phenomena, as well as the importance of risk comparability. Fixing this deficit will improve their ability to take risk reduction measures in a cost-effective way.

One might suspect that numerous references in the literature would be available to support the claims made in the second and third sentences of the paragraph above. One would, however, be mistaken. Indeed, the peer-reviewed literature does not contain a single documented, well-examined example of how the failure to consider two or more separate hazards together led to an outcome worse than if they had been considered separately. Even anecdotal evidence to support the claim appears weak. An informal poll of the participants in the MATRIX project, all experts in one element or another of risk analysis and management, generated few examples that were not problematic in some way. We will describe these examples in the next section.

In this document, we examine the substance of the claim that a failure to consider multiple hazards together is a problem, in terms of causing unnecessary losses. We do so in three steps. First, we develop a simple typology of the multi-hazard situation, distinguishing those cases where a failure in analysis derives from the failure to consider two or more hazards at the same time, as distinct from the failure to consider one or more of those hazards at all. Second, we examine the reasons why one might expect individuals faced with a multi-hazard problem to do a poor job at considering the two hazards together. Based on the behavioural economic and behavioural decision-theory literature, we ask if there are reasons to expect such a problem to exist. Third, we examine case studies where there is some evidence that an error of judgment occurred as a consequence of a failure to consider multiple hazards together. We appraise whether the claims for such an error of judgment are justified, and whether they are attributable to the behavioural factors that we have previously identified.
2.1 A typology of multi-hazard decision situations

In disaster management, multiple extreme events fall into three categories. The first is when two or more extreme events occur simultaneously. The second involves the combination of extreme events with underlying conditions that amplify the impacts of the events. The third involves the combination of events which are themselves not extreme, but will lead to extreme events when they are combined (Orlowsky and Seneviratne, 2012). One can also add conflicting or synergistic management strategies for separate hazards, regardless of the hazards themselves, in the cases where they are correlated.

A variety of natural extreme events can threaten a region at the same time. For example, a hazard may not only cause additional events via cascade or domino effects, such as earthquakes triggering tsunamis, but may also increase the vulnerability of the natural and human environment to events in the future, such as an earthquake damaging flood defences. The consequences of such interactions are likely to worsen in the future, as society’s interdependency increases at all social, economic and political levels.

The multi-risk approach is defined in the literature as a joint analysis and quantification of all anthropogenic and natural risks that can affect a territory (Marzocchi et al., 2012). It can be a precondition for sound environmental management and land use planning, as well as for competent emergency management before and during catastrophic events (Durham, 2003). Multi-risk analysis is still regarded as a new field, with incomplete theory and fragmentation across different scientific and socio-economic disciplines. However, in multi-risk research, the decision-making process of stakeholders is often described in terms of “how the decision should be made”. To understand the cognitive biases in the decision-making process, we need to describe it from the point of view of “how the decision is made in reality”. Since decision-makers make numerous decisions per day, a rational decision based on a systematic prescriptive model is not always possible, leading to a situation in which the majority of decisions are made on the basis of value judgments (Kahneman, 2011).

The analysis of how decision-makers perceive risks is an important part of understanding the processes behind multi-hazard risk mitigation and management. Most of the scientific research assessed indicates that risk perceptions have a significant influence on risk mitigation and management, for example, evacuation or long-term hazard adjustments
(Sorensen, 2000). It seems that people are prone to using behavioural heuristics, which improve judgmental efficiency in situations of risk (Finucane et al., 2000). The decision-making process, in terms of the assessment of the probability of an event and the making of informed choices, influences the mitigation of multiple hazards. Decision-makers may err when defining the probability or estimating the strength of an event. Based on the analysis of historical cases, we examine what the outcome is when stakeholders treat such hazards separately from each other, neglecting the interdependencies and frequent causal, spatial and temporal relationships that exist between different kinds of risk. The second question is how their behaviour is influenced by different kinds of behavioural biases, leading to the possibility of false estimations of the probabilities of multi-hazard events, and what influence this has on further choices in the decision-making process with regards to multi-risk mitigation and management. Additionally, time and cost constraints limit the quantity and quality of any available information.
2.2 Theories

Two theories related to human behaviour focus on the decision-making process with regards to the probability and consequences of hazardous events. These are classical utility theory (Ramsey, 1931), and support theory or behavioural economics (Kahneman and Tversky, 1984). In his recent work, Kahneman (2011) reflects upon the division of human perception processes into two systems. One system operates quickly on the basis of emotions, personal experience and behavioural heuristics. The other system makes decisions on the basis of more complex analytical activities, such as an understanding of numerical probabilities and probability theory, or learning from statistical descriptions. This corresponds to the different theoretical angles of support theory or behavioural economics, which studies the first system, and of classical utility theory, which is based on the second system.

The aim of classical utility theory is to understand how people make their decisions under conditions of uncertainty, and is connected to the subjective probability of an event. The major argument is that people base their beliefs about the likelihood of an uncertain event on the expected consequences of this event (Ramsey, 1931). This approach is based on expected utility theory with risk aversion. The methodology of this theoretical framework requires the application of such methods as cost-benefit analysis and risk-benefit analysis. However, this theoretical approach does not investigate how decisions are actually made, or what factors influence them. In fact, in practice, decisions are often driven by issues
other than logical factors, and are subject to cognitive biases in the estimation of the probability of events and the making of choices. The gap in our knowledge of how decisions are made is filled by behavioural economics, which analyses how individuals perceive the likelihoods of risks and make their decisions when facing uncertainties. For example, when a coastal community decides whether to invest in risk mitigation measures against floods, the scenarios for this decision will be developed within the framework of classical theory, while the behavioural factors influencing this decision will be the subject of support theory. Another example is a farmer who may decide not to purchase insurance because he perceives the risk not to be high enough. Or a decision-maker may decide not to face high up-front investments in risk mitigation measures because he perceives the probability of a given event as low, or judges that the expected short-term benefits would not justify the high investment, even though the long-term benefits will exceed the up-front investment.

The methodological framework of behavioural economics is mainly focused on decision-making processes and behavioural patterns of how stakeholders perceive the probability and likelihood of natural hazard risks. It helps to understand the perceptions of stakeholders of existing risks and uncertainties when a decision on a given action needs to be taken, for example, when a community decides whether or not to invest in a new irrigation system when facing the risk of drought, or when a farmer wants to understand the risks and uncertainties about droughts in the future and their consequences (Patt et al., 2005). Behavioural economics does not analyse the probabilities of risk or provide risk assessments, but it helps to understand how stakeholders make their decisions with regards to their judgment of the probabilities and consequences of an event. It is closely connected with the definition of epistemic uncertainty in its three forms according to Stirling (2007), namely uncertainty in general, which results from insufficient knowledge to assess probabilities; ambiguity, which is connected with insufficient knowledge about possible outcomes; and ignorance, which results from both insufficient knowledge of likely outcomes and of their probabilities.

Behavioural economics originated as a reaction to mathematical expected utility theory (Philips and Winterfeldt, 2006), when it was found that expected utility theory, the dominant decision theory of the 1950s, failed to adequately model human decision-making in certain instances, leading to the Ellsberg and Allais paradoxes. In 1992, cumulative prospect theory was introduced (Tversky and Kahneman, 1992). Cumulative prospect theory is, in structure, though not in implementation, much akin to expected utility
theory. In the latter, weighted sums, obtained by adding the utility values of outcomes multiplied by their respective probabilities, are compared. In the former, weighted sums are also compared, but these weighted sums are obtained by multiplying the utility values, obtained by applying a two-part power function to both monetary gains and losses (Tversky and Kahneman, 1992), by their respective decision weights. These decision weights are transformed probabilities. If the transformation function is the identity function, then the decision weights equal the probabilities themselves (Fennema and Wakker, 1997), and cumulative prospect theory collapses to expected utility theory.

The behavioural economics paradigm may be complemented by a Bayesian decision theory, which holds expected utility theory as a special case (Appendix 1, Chapter 6). This complementary theory has as its building blocks, for the inference part, the product and sum rules (Appendix 1, Chapter 4), that is, Bayesian probability theory, as well as, for the utility assignment of monetary outcomes, the Weber-Fechner law of psychophysics (Appendix 1, Chapter 7). The Bayesian decision theoretical framework is both a rediscovery and a re-appreciation of Allais’ two suggestions of variance preferences and the use of the Weber-Fechner law to assign utilities to monetary stimuli (Appendix 1, Chapter 9).

The mathematical equations, that is, the product and sum rules of Bayesian probability theory, or, equivalently, the plausible syllogisms (Appendix 1, Chapter 2), only provide us with the structural machinery of the decision-making process, whereas the particular inputs that are fed into this mathematical structure, that is, the probability assignments and outcome assessments, are subjective and, therefore, open to psychological insights. Such insights may be provided by way of the assignment heuristics, that is, behavioural economics, and the Weber-Fechner law, that is, psychophysics. So, from a Bayesian perspective, all the psychological literature on how humans assign their probabilities or their utilities, that is, the assignment heuristics discovered by behavioural economics, may, in principle, be a welcome help in the construction of any general model of human risk perception, with the caveat that human behaviour is perceived to be basically rational, that is, we all have a ‘sense’ which is ‘common’ to all. Once we have assigned our probabilities, common sense is expressed by way of a consistent application of the product and sum rules (Appendix 1, Chapter 3).

In Bayesian probability theory, the probability distributions are the information carriers which represent our state of knowledge (Jaynes, 2003). In real-life decision problems, we are often faced with several possible outcomes, each outcome having its own plausibility of occurring, relative to the other outcomes taken under consideration.
Furthermore, each outcome in the outcome probability distributions may be mapped onto either a single utility value or a range of utility values. This remapping then leaves us with the utility probability distributions, upon which we may base our decisions. If we have a utility axis which goes from minus infinity to plus infinity, then the utility distribution which is ‘more-to-the-right’ will tend to be more profitable than the utility distribution which is ‘more-to-the-left’. Seeing that we are comparing utility distributions in the Bayesian decision theoretical framework, this leaves us with the question of how to go about comparing such distributions. The answer is that, by way of the mean and standard deviation of the utility probability distribution, we may get a numerical handle on these probability distribution objects, by way of the k-sigma confidence intervals (Appendix 1, Chapter 8). It may be demonstrated that expected utility theory is a special instance of the more general Bayesian decision theoretical framework, in which 0-sigma ‘confidence intervals’, that is, expectation values, are used to differentiate between decisions (Appendix 1, Chapter 8).

Furthermore, it may be shown that all five phenomena which set a minimal challenge that must be met by any adequate descriptive theory of choice (Tversky and Kahneman, 1992) are met by this new mathematical and, thus, parsimonious framework. The five phenomena are source dependence (Appendix 1, Chapter 5), non-linear preferences, risk-seeking, loss aversion (Appendix 1, Chapter 7), and framing1. The Bayesian decision theoretical framework not only adheres to the five phenomena which set a minimal challenge that must be met by any adequate descriptive theory of choice, but is also an intuitive extension of expected utility theory (Appendix 1, Chapter 6). This extension removes both the Allais and Ellsberg paradoxes, respectively (Appendix 1, Chapter 9) and (Appendix 1, Chapter 5). Now, seeing that many researchers tend to ignore these paradoxes and go ahead with their cost-benefit analyses by way of expected utility theory anyway, one could argue that this is not such a big deal. But we think it is happy news indeed that from now on we may perform these analyses without those pesky Allais and Ellsberg paradoxes snapping at our heels.
Footnote 1: The reason that the chapters on non-linear preferences, risk-seeking, and the framing effect are not included in the appendix is due to the space constraints placed on us by the scientific reviewer. Non-linear preferences, risk-seeking, and the framing effect are all treated in a thesis which is to be published in the second half of 2013 (van Erp, 2013).
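For orientation, the decision criteria referred to above can be written out compactly. The display below is an illustrative paraphrase rather than a reproduction of the definitions in Appendix 1: the cumulative prospect theory sum is shown in its simplest decision-weight form, the k-sigma criterion is written as a lower confidence bound, which is one plausible reading of the ‘k-sigma confidence interval’ comparison, and the Weber-Fechner utility assignment is given in its usual logarithmic form with a hypothetical scaling constant $a$ and reference level $x_0$.

\[
\mathrm{EU}(d) \;=\; \sum_i p_i\, u(x_i) \qquad \text{(expected utility: probability-weighted sum of utilities)}
\]
\[
V(d) \;=\; \sum_i w_i\, v(x_i), \qquad w_i = \pi(p_i) \qquad \text{(decision weights as transformed probabilities; } \pi = \mathrm{id} \text{ recovers } \mathrm{EU})
\]
\[
U_k(d) \;=\; \mu_d - k\,\sigma_d \qquad \text{(}\mu_d,\ \sigma_d\text{: mean and standard deviation of the utility distribution of decision } d\text{; } k=0 \text{ reduces to the expectation value, i.e., expected utility theory)}
\]
\[
u(x) \;=\; a \ln\!\left(x/x_0\right) \qquad \text{(Weber-Fechner-type logarithmic utility for a monetary stimulus } x\text{)}
\]

On this reading, increasing k amounts to demanding a larger safety margin on the utility distribution before committing to a decision, which is one way to understand why the willingness to invest in the Dutch flood-defence example discussed below grows so strongly between the 0-sigma and 6-sigma levels of cautiousness.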
Furthermore, for the simple Dutch multi-hazard flooding example (Appendix 1, Chapter 8), we see that the amount of money we are willing to invest in our flood defences at the 0-sigma level of cautiousness, which corresponds to the expected utility theory case, differs by a factor of 100 from the amount at the 1-sigma level of cautiousness, and by a factor of about 1000 from the amount at the 6-sigma level of cautiousness (Table 8.1, Appendix 1). This would seem to be enough of a material difference from the current expected utility paradigm, as typically used in cost-benefit analyses, to warrant a second look at the Bayesian decision theoretical framework as presented in Appendix 1.

In Bayesian probability theory, we assign probabilities to propositions and then operate on these probabilities by way of the probability theoretic product and sum rules. Bayesian probability theory has now been supplemented with an extended information theory or, equivalently, inquiry calculus (Knuth, 2002, 2003, 2004, 2008, 2010). In the extended information theory, we assign relevancies to sources of information and then operate on these relevancies by way of the information theoretic product and sum rules. This new information theory, which is Bayesian in its outlook, constitutes an expansion of the canvas of rationality (Skilling, 2008) and, consequently, of the range of psychological phenomena which are amenable to mathematical analysis. We give here a rough outline of the results of such an analysis.

Slovic (1999) states that the limited effectiveness of risk communication can be attributed to a lack of trust. If you trust the risk manager, communication is relatively easy. If trust is lacking, however, no form or process of communication will be satisfactory. In information theoretic terms, this translates to the statement that if the probability of some source of information being unbiased and competent is low, then its relevance will also be low, where a low relevance implies an a priori inability to modulate our prior beliefs regarding some issue of interest.

However, information theory shows us that there is a second factor at play in risk communication, other than trustworthiness. This second factor is the likelihood of an actual danger as perceived by stakeholders. This is because, even for a high level of trust in the unbiasedness of a given source of information, a high perceived likelihood of the danger in question may render that trusted source of information irrelevant, as one’s own sense of danger overrides all the assurances of safety.

The perceived likelihood of the unbiasedness of the source of information will typically be influenced by the actions of the source of information. If the source of information is the government itself, then these actions may entail giving full disclosure,
taking full responsibility, distancing oneself from any suggestion of a conflict of interest, and so on. The perceived likelihood of danger will typically be a function of the state of knowledge one has regarding the specifics of the danger in play. If a full and thorough understanding of the dangers involved requires some form of scientific training, then the plausibility that a lay person assigns to the proposition of there being a danger will typically be diffuse. Such a diffuse plausibility may be swayed either way if some authoritative and unbiased source of information has pertinent information to offer regarding these dangers. Examples of such ‘opaque’ dangers are the dangers associated with exposure to radiation or those associated with climate change. In such cases, scientists from the respective fields typically fulfil the role of the authoritative and unbiased sources of information. However, today we witness a rising general scepticism as to the unbiasedness of these experts. So, even if their perceived competence is high, by way of their scientific credentials, scientists now also need to manage the perceived likelihood of their being unbiased in order to be heard, that is, to be relevant, just as governments have to do. Scientists may do this by giving full disclosure, distancing themselves from any suggestion of a conflict of interest, and by refraining from committing scientific fraud.

For a comprehensive as well as gentle introduction to the new extended information theoretic framework, not to be found anywhere else, we refer the interested reader to Appendix 2, Chapter 2. In Appendix 2 one may also find the information theoretic analysis of the simple risk communication problem given above (Appendix 2, Chapter 3). The extended information theory constitutes an expansion of the canvas of rationality (Skilling, 2008) and, consequently, of the range of psychological phenomena which are amenable to mathematical analysis. Bayesian information theory is applicable both to data-analysis problems and to the modelling of psychological phenomena. In this regard, information theory is no different from Bayesian probability theory, which may also be used both for data analysis and for the construction of Bayesian belief networks. In Appendix 2 we only apply information theory to the mathematical modelling of the psychology underlying risk communication, as this was seen to be the more pertinent application of the extended information theory for this particular deliverable.

To summarize, the Bayesian decision theoretical framework is a direct off-shoot of Bayesian probability theory, which has its axiomatic roots in lattice theory, as the product and sum rules of Bayesian probability theory may be derived by way of consistency
requirements on the lattice of statements (Knuth and Skilling, 2010). One may likewise derive, by way of consistency requirements on the lattice of questions, the product and sum rules of Bayesian information theory (personal communication, Knuth, 2012). So, if we choose rationality, that is, the consistency requirements, as our guiding principle to model the inference process itself, then we get on the one hand Bayesian probability theory, with Bayesian decision theory as its specific application, and on the other hand Bayesian information theory. The assignment heuristics of behavioural economics and the Weber-Fechner law of psychophysics may be invoked to assign probabilities and assess the utilities of outcomes. In doing so, we obtain a comprehensive, coherent, and powerful framework of both human reasoning and decision-making, in which both rationality and irrationality may take their proper places (Figure 1).

Figure 1: Schematic of a general framework to model human reasoning. [Figure not reproduced here; it shows lattice theory plus consistency requirements giving rise to Bayesian probability theory (lattice of statements) and Bayesian information theory (lattice of questions); Bayesian decision theory compares utility distributions through k-sigma confidence intervals, with inputs from psychophysics (the Weber-Fechner law for utility assignments) and behavioural economics (assignment heuristics for both probabilities and utilities).]
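The qualitative mechanism described in this section, namely that the relevance of a reassuring source depends both on the trust placed in it and on the recipient's own perceived likelihood of danger, can be imitated with a deliberately simple calculation. The sketch below is not the inquiry calculus of Appendix 2; it is an ordinary Bayesian update with an assumed mixture likelihood (the source is either unbiased or says "safe" regardless), and all numerical values are assumptions chosen only to show the direction of the effects.

# Toy Bayesian sketch of the two risk-communication factors discussed in this
# section: (i) trust in the source and (ii) the recipient's own perceived
# likelihood of danger. This is NOT the inquiry calculus of Appendix 2; all
# numbers and likelihoods are assumptions chosen only to show the mechanism.

def posterior_danger(prior: float, trust: float) -> float:
    """P(danger | source says 'it is safe') for a recipient with a given prior
    belief in danger and a given trust that the source is unbiased."""
    # Assumed likelihoods: an unbiased source rarely says "safe" when there is
    # danger; a biased source says "safe" no matter what.
    p_safe_given_danger_unbiased = 0.10
    p_safe_given_no_danger_unbiased = 0.95
    p_safe_given_anything_biased = 1.00

    p_safe_given_danger = (trust * p_safe_given_danger_unbiased
                           + (1 - trust) * p_safe_given_anything_biased)
    p_safe_given_no_danger = (trust * p_safe_given_no_danger_unbiased
                              + (1 - trust) * p_safe_given_anything_biased)

    evidence = prior * p_safe_given_danger + (1 - prior) * p_safe_given_no_danger
    return prior * p_safe_given_danger / evidence

for trust in (0.1, 0.9):
    for prior in (0.5, 0.95):
        print(f"trust={trust:.1f}, prior danger={prior:.2f} "
              f"-> posterior danger={posterior_danger(prior, trust):.2f}")
# Low trust leaves the belief essentially unchanged (the source is irrelevant);
# even with high trust, a strongly held sense of danger is only partly revised.

Running the sketch shows that with low trust the posterior barely moves (the source is effectively irrelevant), and that even with high trust a strongly held prior sense of danger is only partly revised, in line with the two factors discussed above.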
3. Historical analysis

3.1. Behavioural and Cognitive Biases in Historical Multi-Hazards Cases

The history of efforts to mitigate the effects of natural hazards and disaster losses is one of increasing attention to anticipatory action, rather than simply post-disaster clean-up (Linnerooth-Bayer et al., 2005). It is also a history marked by cognitive biases, or claimed cognitive biases, and exasperated individuals claiming that others are acting irrationally (Kunreuther 1996; Kunreuther 2008). One of the areas where recent attention has turned to the irrationality of disaster mitigation efforts is the case when multiple hazards threaten the same region or community; there are claims that under such situations, decision-makers have failed to appraise the entire situation, and thus to mitigate multiple hazards (T6, 2007).

Behavioural economists have identified several cognitive and behavioural biases which influence stakeholders’ perceptions of the probability and consequences of hazardous events. These biases are bounded rationality, experimental versus statistical evidence, probability weighting and the 50/50 bias, dread risk, availability heuristics, limited worry, prospect theory and loss aversion. We describe these biases in greater detail below, and place them within the context of multi-risk concerns.

Bounded rationality in decision-making describes the situation where the rationality of individuals is limited by the information they already have and the amount of time they have to make a decision. It is an alternative to the mathematical modelling of decision-making as a rational process of finding an optimum solution. In situations where decision-makers lack the ability and resources to arrive at the optimal solution, they apply their rationality only after they have significantly simplified the available choices. Misguided by this bias, people will not optimize, but rather satisfice; indeed, they will not address a particular problem at all until they recognize the current situation as being unsatisfactory. Examples in the scientific literature on bounded rationality show that decision-makers are limited in their ability and desire to collect information for their further actions. Often they attempt to satisfice rather than optimize, which leads to a situation where decision-makers focus only on a limited set of options from the available alternatives (Lindblom, 1959). Under time and cost constraints, decision-makers simply search for a solution until they reach a certain acceptable level of performance. In their decisions, stakeholders rely on a number of simplifying strategies or rules of thumb, which serve as a mechanism for coping with complex environmental surroundings. Sometimes these are
helpful, but sometimes they can lead to severe errors. In this case, the time saved is offset by the poor quality of the decision. With regards to a multi-hazard situation, decision-makers could limit the focus of their attention to well-known risks and to ways to mitigate them. This would limit their capacity to address additional probable or new risks. They will also be constrained in their choice of the best possible risk mitigation option.

The experimental versus statistical evidence bias is also known as the base rate fallacy. It appears when people make judgments about the probability of an event under uncertainty. It influences the decision-making process, as people prefer individual information in their judgments instead of general or statistical information, even in cases where the latter is available. This bias is also known as the representativeness heuristic, where representativeness is defined as "the degree to which an event is similar in essential characteristics to its parent population, and reflects the salient features of the process by which it is generated" (Kahneman and Tversky, 1979). This bias potentially leads to wrong judgments, as the fact that an event is more representative does not make it more likely than another event. In addition, stakeholders usually overestimate the ability of this heuristic to accurately predict the likelihood of an event. This bias is connected with two determinants, similarity and randomness. Considering similarity, when judging the representativeness of a new event, people usually pay attention to the degree of similarity between this event and a standard event. It is also strongly influenced by memory, when concrete examples are remembered and influence judgments about the probability of an event. Randomness comes into play when people judge sequences that do not appear to follow any logical pattern as being representative of randomness and thus more likely to happen. With regards to estimations of the probabilities of natural hazards, and especially when prioritizing actions in the case of a multi-hazard situation, stakeholders are more motivated to solve problems born out of their personal experience than out of an unfamiliar threat that is communicated to them by scientists or analysts.

The probability weighting or 50/50 bias is also known as the conservatism or regressive bias, or the regression fallacy. It ascribes causes where none exist, especially with regards to predictions. The logical flaw is to expect extreme results to continue, rather than to regress towards the average. Influenced by this bias, people tend to take action when the variance of an event is at its peak. After the circumstances become more normal, people believe that their actions were the cause of the change, when in fact there was no causal relationship. An example of this fallacy could be the installation of speed cameras
after a number of accidents have happened on the road. The proponents of the cameras would then argue that a decrease in the number of accidents was caused by the installed cameras. However, there may or may not be any causal dependency between these two events. The regressive bias is connected with regression towards the mean: when a variable is extreme on its first measurement, it will tend to be closer to the average on a second measurement, and if it is extreme on a second measurement, it will tend to have been closer to the average on the first measurement. For example, a class of students takes two editions of the same test on two successive days. It has frequently been observed that the worst performers on the first day will tend to improve their scores on the second day, and the best performers on the first day will tend to do worse on the second day. This bias influences estimations in a multi-hazard environment, as stakeholders tend to overestimate small risks and to underestimate large risks. This leads to poor risk comparability in cases where the probabilities of different events are quite different.

The dread risk bias is connected with the judgments of people about unknown risks and their “perceived lack of control, dread, catastrophic potential, fatal consequences, and the inequitable distribution of risks and benefits” (Slovic, 1987). The bias appears when the expert judgments of risk run counter to the judgments of lay people. The nuclear power industry is a canonical example of dread risk: the majority of experts find nuclear power to be a relatively low-risk technology, while the public finds it to be a high-risk technology. Another example is the fear of flying: many more people are afraid to fly than to drive, despite the comparative mortality rates of plane and car accidents, because they perceive the dread risk of plane accidents to be higher. Slovic argues that dread risk can hardly be corrected, as all “attempts to educate” or reassure the public and bring their risk perceptions in line with those of industry experts appear unlikely to succeed because, taking again the example of the nuclear industry, the low probability of serious reactor accidents makes empirical demonstrations of safety difficult to achieve. Because nuclear accidents are perceived as unknown and potentially catastrophic, even small accidents will be highly publicized and may produce large ripple effects (Slovic, 1987). Stakeholders estimate some risks, mainly unknown ones, as being more frightening than others. Lay people also estimate their probability as being higher than that of other types of risks. Dread risk relates to emotions, which make people perceive the probability of an event as high simply because it lies outside of their control in terms of risk exposure and
consequences. It is also connected with unknown risk, when the risk is perceived as new, with unforeseeable consequences and undetectable exposure. Often this type of bias is associated with a new technology, marked by unknown risks.

Availability heuristics describes a bias in behaviour that arises when people are confronted with the difficult task of judging the probability or frequency of an event. To simplify this judgment, they use a strategy based on how easy it is to recall similar events; it is thus connected with perceptions of how often an event has occurred. These perceptions can also be influenced by media coverage: people tend to estimate events which are shown regularly in the mass media as being more likely and probable than other, less discussed or broadcast, events. Availability heuristics makes the decision-making process unreliable, as stakeholders tend to perceive recent risks, or risks about which they can easily recall information, as being more probable. Under availability heuristics, the recent experience of a disaster strongly influences people’s perceptions of the probability of a similar disaster in the future. This is also called the familiarity heuristic, and suggests that the ability of stakeholders to see events as likely to happen depends on how easily they can recall specific past information associated with the event. When an individual judges the frequency of an event by the availability of its instances, an event whose instances are more easily recalled will appear to be more numerous than an event of equal frequency whose instances are less easily recalled (Tversky and Kahneman, 1974). People estimate the probability of events to be higher if they can draw upon emotion-laden memories. This leads to a situation where people underestimate the likelihood of low-probability events and then overestimate them after a disaster has occurred. Availability heuristics also leads to situations where people take seriously those uncertainties which relate to their personal and local experience. However, they have difficulties in perceiving uncertainties which relate to more global and abstract processes (Marx et al., 2007). Under cognitive myopia and selective attention, people focus primarily on short-term consequences, and they perceive a hazard as being less probable if it has not happened for several years (Kunreuther et al., 1978).

The probability and consequences of a hazard are essential attributes of how people perceive risks. The perceptions of environmental hazards are closely connected with people’s expectations of personal impacts from a disaster. These expected personal impacts include death and injury, damage to property and damage to professional activity. They are influenced by two groups of factors. The first includes personal experience with hazards, considering such factors as the recency, frequency and intensity of the stakeholders’ personal experience. It is
also correlated with the proximity to a source of hazard. The second group includes indirect factors, such as information received from the mass media as well as from other sources. Information about a hazard, together with personal experience, produces a situational perception of personal risk and possible impacts. These perceptions influence the actions of stakeholders on risk mitigation and during the disaster situation. Moreover, scientific research on the perceptions of adjustment actions has found that stakeholders differ significantly in their perceptions according to such characteristics as expertise, trustworthiness, and protection responsibility, that is, whether they believe that adjustment actions are their responsibility or whether they shift this responsibility to the government.

This leads to a situation where decision-makers assess the frequency, probability and likelihood of a natural hazard based on the occurrence of an event in the recent past, i.e., one that is “available” in the memory. Such natural hazards will be perceived to have a higher probability than a hazard that is difficult to imagine or vague, as it has not occurred in recent memory. Also, a natural hazard of greater frequency is generally more easily kept in mind than an event of lesser frequency. In multi-hazard situations, such decision-making patterns lead to the ignoring of events which may not have happened for several years, but are still probable for a given region. This susceptibility to vividness and recency also leads to a situation where decision-makers overestimate events with a recent history, which might in fact be unlikely in the future. A stakeholder may make a decision with regards to the hazard in which his or her confidence is higher due to familiarity heuristics, which will lead him or her to a faster decision and to results that, while more comfortable, are not really accurate.

The ambiguity aversion bias is associated with source preference, that is, the willingness of people to estimate the probability of an uncertain event depends not only on the degree of uncertainty, but also on its source. People often will solve what they perceive as the most important problem, which may be caused by dread risk. After they have done so, they may stop worrying about the other problems. This preference to bet on clear or known probabilities rather than on vague or unknown probabilities has been called ambiguity aversion. This means that stakeholders prefer to bet on their vague beliefs in situations where they feel particularly competent or knowledgeable, and prefer to bet on chance when they do not feel competent (Heath and Tversky, 1991). This also involves quasi-hyperbolic time discounting, where decision-makers discount future events, which are constructed abstractly, and give more attention to events that are more concrete and are in the recent past (Trope and Liberman, 2003).
Another example of ambiguity aversion in decision-making regarding multiple hazards is that stakeholders frequently judge an event as being less likely than its sub-events, which is also known as the “conjunction fallacy”. The judged probability of the union of disjoint events is generally smaller than the sum of the judged probabilities of these events (Teigen, 1974). Probability is not attached to events, but rather to the description of events, hence two descriptions of the same event can be assigned different probabilities according to the number of components. The unpacking of the description of an event into an explicit disjunction of constituent events generally increases its judged probability. Thus, the judged probability of an event and its complement will sum to unity; however, the sum of the judged probabilities of disjoint events will be greater than or equal to the judged probability of their union. This is the evidence of so-called “bounded sub-additivity” in simple choices between uncertain prospects (Tversky and Kahneman, 1992); a compact statement of this pattern is given in the short note below.

Limited worry is connected with a group of biases, namely excessive optimism, overconfidence, confirmation bias and the illusion of control. Under excessive optimism, stakeholders tend to overestimate the number of favourable outcomes in comparison to unfavourable ones (Shefrin, 2007). Overconfidence is when stakeholders overestimate their abilities and underestimate the limits of their knowledge. In general, they tend to overestimate their ability to perform well, which results in impulsive decisions, as stakeholders think that they know more than they really do. The confirmation bias leads to interpretations of information in such a way that it confirms preconceptions, while avoiding interpretations that contradict previously held beliefs. This type of bias is regarded as the most “pervasive and potentially catastrophic” of the cognitive biases to which humans fall victim (Plous, 1993). Other biases include positive illusions, where people hold overly positive beliefs about themselves (Taylor and Brown, 1988), and the illusion of control, where people tend to behave as if they had control over a situation which they do not have in reality, or overestimate how much control they have (Gino et al., 2011). The illusion of control results in the belief of decision-makers that they can control or influence outcomes that are in reality beyond their influence.

In multi-hazard situations, ambiguity aversion can lead to a preference for known risks over unknown risks. In risk mitigation and management strategies, decision-makers would rather choose options with fewer unknown elements than with many unknown elements, even though there is the need to find out what the required factors are. This will also influence the perceptions of stakeholders with regards to risk, and to ambiguous events. Ambiguous events, such as low-probability but high-impact hazards, will be perceived with a greater degree of uncertainty in terms of their outcomes and probability.
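The sub-additivity pattern described at the start of this passage can be stated compactly, writing $P_J(\cdot)$ for judged, as opposed to mathematical, probability; this is simply a formal restatement of the prose above, following Tversky and Kahneman (1992).

\[
P_J(A) + P_J(\bar{A}) \;=\; 1 \qquad \text{(binary complementarity: an event and its complement)}
\]
\[
P_J(A) + P_J(B) \;\ge\; P_J(A \cup B) \quad \text{for disjoint } A, B \qquad \text{(bounded sub-additivity: unpacking the union into explicit components raises its judged probability)}
\]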
These limited worry biases cause decision-makers both to underestimate the probability of a disaster and to underestimate its effects. They are often connected with risk mitigation strategies in which people fail to prepare for a disaster. Especially tragic is when a government, under an assumption of limited worry, fails to prepare the population of its country. This bias makes people unable to react properly during a disastrous situation which they have not confronted before, and to properly prepare themselves for the disaster. Also, by being overly optimistic, stakeholders disregard warning signals and overestimate their control over the situation. All of these limited worry biases lead to higher numbers of casualties resulting from a disaster situation, as they result in inadequate shelter, supplies and evacuation plans. They also influence the decision of people whether or not to evacuate.

The unpacking principle applied to different kinds of natural hazards in a multi-risk situation implies that the particular description of events, and especially of their components, may affect the willingness of stakeholders to act. This also applies to the willingness to take protective actions, such as the construction of flood or cyclone defence infrastructure on the side of decision-makers, or the willingness to purchase insurance on the side of stakeholders. The unpacking principle can be applied to multi-hazard situations where the risk attributable to some hazard combinations can be greater than the sum of the risk attributable to each constituent hazard.

Excessive optimism reduces the willingness of decision-makers to search for further information on multi-hazard events. The overconfidence bias in multi-hazard situations potentially leads to a lack of recognition of the true probabilities of hazards and therefore to incorrect decisions and gaps in risk mitigation measures. The confirmation bias results in a situation where, while stakeholders may believe that a specific natural hazard or hazards will strike a region, they may cease searching for information about, or neglect available information on, other probable natural hazards. The illusion of control may therefore give decision-makers the impression that the effects of a natural hazard can be influenced by their personal involvement when the reality is quite different.

Loss aversion is expressed in terms of the behaviour of stakeholders with regards to expected losses and gains, whereby losses are perceived as being greater than gains. In contrast to risk aversion, loss aversion relates not to the perceptions of the probability of risks but rather to perceptions of the necessary investment. Influenced by this bias, individuals perceive the investment necessary to mitigate a risk as a greater loss than the possible gain from mitigating the risk (Rhoads, 1997). This leads to stakeholders experiencing a stronger desire to avoid losses in the short term without considering possible gains from the mitigation of these losses in
the long run (Tversky and Kahneman, 1979). In fact, losses are felt roughly twice as strongly by stakeholders as equivalent gains. Loss aversion is often combined with the sunk cost fallacy, which concerns expenses that have already been incurred and cannot be recovered to any significant degree. Decisions influenced by the sunk cost fallacy are focused on minimizing already wasted resources rather than on maximizing future utility. Loss aversion is also connected with the status quo bias, whereby potential losses are perceived as being greater than potential gains. This leads the decision-maker to prefer the current situation, and to not take any action to change the status quo. With regard to low-probability hazards, stakeholders are risk seeking for gains and risk averse for losses, while at the same time they are risk averse for gains and risk seeking for losses at high probabilities (Tversky and Kahneman, 1992). The argument here is that people care more about losses than about gains. This means that the additional benefits of adapting a management strategy for one risk so that it can also provide benefits for another risk may be largely ignored. The status quo bias leads to a situation where decision-makers do not implement any new risk mitigation measures, even when they receive information about the possibility of additional hazards or about changes in their intensity.
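The asymmetry between losses and gains described above can be illustrated with a minimal sketch of the prospect-theory value function, using the median parameter estimates reported by Tversky and Kahneman (1992) (diminishing-sensitivity exponent α ≈ 0.88 and loss-aversion coefficient λ ≈ 2.25); the mitigation-cost and avoided-loss figures below are hypothetical.

```python
# Minimal sketch of the prospect-theory value function (Tversky and Kahneman, 1992).
# alpha captures diminishing sensitivity, lam captures loss aversion; both are the
# published median estimates. The monetary figures are purely illustrative.

def value(x, alpha=0.88, lam=2.25):
    """Subjective value of a gain (x >= 0) or a loss (x < 0)."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** alpha)

mitigation_cost = -1_000_000   # certain up-front investment, coded as a loss
avoided_loss    =  1_000_000   # equally large future damage avoided, coded as a gain

print(value(mitigation_cost))  # about -4.3e5 value units
print(value(avoided_loss))     # about +1.9e5 value units
# The certain cost looms roughly 2.25 times larger than an equal-sized gain,
# which is one way to rationalise under-investment in multi-hazard mitigation.
```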
3.2. Test cases
As a part of our research strategy, we apply the case study method, which allows us to investigate the phenomenon of decision-making in multi-hazard environments, and the connected cognitive biases, within the real-life context of historical events. This method allows us to develop generalizations about the cognitive biases that lead decision-makers to fail to address cascading and conjoint effects in multi-risk mitigation and management strategies. The case study method is an intensive analysis of an individual unit, person, community or event, stressing developmental factors in relation to the environment (Merriam-Webster dictionary, 2009). The method is widely applied in a range of disciplines such as psychology, sociology, economics, political science, geography and medical science. Around half of all articles in political science journals apply the case study method. In recent decades, the method has also gained popularity for testing hypotheses. One example of the application of the method to decision-making processes is the analysis of rationality and power in urban policy and planning. Another example is the analysis of large-scale infrastructure projects like the Channel Tunnel, which links Great Britain and France. It allowed research questions, such as the causes of cost overruns, to be answered that could not be addressed with existing statistical methods. In science, the case study method has helped to identify biases in the decision-making process regarding large-scale infrastructure projects (Flyvbjerg et al., 2003). Among the other strengths of the method are the depth of analysis, high conceptual validity, understanding of the context and process, as well as of what causes a phenomenon, linking causes and outcomes, and fostering new hypotheses and new research questions. In addition, the method has the value of providing phenomenological insights, which are gleaned by closely examining contextual expert knowledge (Flyvbjerg, 2011).
Within the MATRIX project, we polled participants for examples of multi-hazard decision-making problems. Our intention was to identify a number of case studies, which we could examine in more detail, to find out whether a multi-hazard decision-making error had indeed occurred, from what causes, and with what consequences. The respondents replied with five examples. These were: the 2011 Tohoku earthquake, the 1908 Messina earthquake and tsunami, the 1995 Kobe earthquake, the 2010 Haiti earthquake and the 2004 Sumatra-Andaman earthquake.
The Tohoku earthquake, tsunami, and nuclear accident of 2011. There were several claims here, each of which is problematic in its own way. First, there is the claim that risk analysts failed to take into account the fact that an earthquake could cause a tsunami, leading to both a shutdown of the plant (from the earthquake) and a devastating flood. In fact, it appears that this contingency was examined, but the problem was that the estimates of the size of a possible tsunami were poor, given the relatively poor state of the art of oceanographic modelling at the time the nuclear plant was constructed compared to today. Second, there was the claim that plant designers chose to locate the plant on the coast, rather than inland, in order to be able to build on the more stable bedrock that the coast offered, thus taking precautionary steps against an earthquake, but not against tsunamis. In fact, there is reason to believe that the primary motivation for building on the coast was to take advantage of unlimited seawater for the secondary cooling system. Given the coastal siting, designers did take tsunami risk into account, but again underestimated the size of a potential tsunami. Hence, the root of the nuclear accident, namely the decision to locate the backup diesel generators needed for the emergency reactor cooling system in the basement, which was flooded, was overconfidence in the degree to which the constructed seawall would guard against tsunami risk. This was unrelated to the multi-hazard nature of the accident.
The Messina earthquake and tsunami of 1908. The claim here was that after a major earthquake, people sought refuge near the city's harbour. Soon thereafter, the harbour was inundated by a tsunami linked to the earthquake, killing many people. This example suggests that people were not aware that tsunamis often result from earthquakes. The error in decision-making was not so much a failure to put knowledge of earthquakes and tsunamis together, but rather the simple absence of knowledge about tsunamis.
The Kobe earthquake of 1995, and the Haiti earthquake of 2010. In both cases, the claim was that building construction practices made use of heavy roofing material in order to make buildings able to withstand heavy winds, a concern in cities periodically subject to tropical cyclones. The heavy roofs, in turn, made buildings more vulnerable to earthquake damage. This example may have some truth to it, as there is reason to believe that earthquake planners failed to consider roofing design, even as they addressed, at least in Japan, other aspects of building construction, such as the traditional use of wood frames rather than unreinforced masonry support structures. The decision to use heavy roofing materials may have been a response, either explicitly as a result of analysis or implicitly as a result of evolved building practices, to the risk of cyclones, and yet there was a failure to consider the additional damage that this could cause in the event of an earthquake.
The 2004 Indian Ocean tsunami. There were several claims associated with this event. One was that the efforts to reconstruct the district of Aceh following the destruction of the tsunami failed to take into account the threat of political risk and social disruption, and hence proceeded more slowly. In fact, it does not appear that there was any interaction between the tsunami risk and the political risk; the political risk made reconstruction efforts more difficult, but this had no practical impact on whether mitigation measures were included in the reconstruction efforts. A second claim was that in India, people could have fled to emergency cyclone shelters to avoid the approaching tsunami, but did not. This example may have some truth to it. While an early warning system was set up to warn of approaching cyclones, giving people the opportunity to seek refuge in the shelters, the system was not designed to warn of tsunamis, and so people did not take advantage of the shelters that were there. This could have resulted simply from the failure to take tsunami risk into account. Nevertheless, it may also have been a result of poor judgment at the time of constructing the shelters, as it would have been appropriate to consider all risks for which they could be used, while also making the minor additional investment needed to enable the warning system to respond to tsunamis.
Based on these, we chose to focus greater analysis on the first, third and fourth of these examples. We did not consider the Messina earthquake, as it was the least recent event and there were insufficient data for it. Thus, we have two case studies, one on the Kobe and Haiti earthquakes, and one on the Indian Ocean tsunami shelters. It is important to add that both of these examples fall into the class of problems where there are links between the management strategies: in the case of the two earthquakes, it may have been a failure to recognize how confronting one hazard (the cyclone) could make another hazard (the earthquake) worse. In the case of the tsunami, it was a failure to see a possible synergy: how confronting one problem (the cyclone) at great expense, by constructing a network of shelters, could also make it possible to guard against a second hazard (the tsunami) at very little additional cost. In this section we sketch out the cases where a multi-hazard decision-making problem exists, and distinguish these from cases where there is simply a single-hazard problem. We start with the matrix below (Table 1).
Table 1: Management strategies for correlated and uncorrelated multiple hazards

|                                     | Uncorrelated hazards                            | Correlated hazards                                            |
| Uncorrelated management strategies  | No problem.                                     | Potential problem in the risk analysis stage.                 |
| Correlated management strategies    | Potential problem in the risk management stage. | Potential problems in both risk analysis and risk management. |
We can also add cases where there was simply a failure to recognize a hazard at all. This is a single-hazard problem, not a multi-hazard one, unless the failure was somehow caused by the recognition of another hazard. In multi-hazard situations, cognitive biases can lead to situations where judgments about the hazards and risks relevant to a region are made separately from each other and neglect the interdependencies between different types of risks. Decision-makers would therefore also fail to recognize the cascading or conjoint effects which these hazards might have, for example, when an earthquake triggers a tsunami or a volcanic eruption triggers an earthquake. Moreover, these biases can lead to a decision-making process in which the major mitigation efforts are expended on some risks while other risks are overlooked. We summarize the types of biases and their possible influence in multi-hazard situations in Table 2 below.
Table 2: Biases and factors influencing the decision-making process in multi-hazard cases

- Bounded rationality: Limitation in the ability and desire to collect information on multiple hazards due to time and other constraints. For example, the need to communicate signals from early warning systems in the Pacific Ocean on the probability of a tsunami was not perceived or acted on by decision-makers. The belief in good seismic mitigation measures, due to progress in science and in earthquake engineering and technology during the booming economy, made Japanese decision-makers believe that structures had become strong enough to survive the most violent ground motions.
- Experimental versus statistical evidence: Interpretation of facts depends on the current picture of the world and not on analytical or statistical information. For example, beliefs about the geology of the “Ring of Fire” in the 2004 Sumatra earthquake case misguided policy-makers' perceptions of the probability of a large underwater earthquake in the region. The introduction of new building codes after recent earthquakes in Japan also led policy-makers to overlook the seismic dangers for old wooden houses (e.g., Kobe).
- Probability weighting and 50/50 bias: Overestimation of small risks and underestimation of large risks. An example is the decision of Japanese planners to build nuclear plants close to coastlines in order to build on more stable bedrock, thus mitigating the earthquake risk; however, they underestimated the possible sizes of tsunamis.
- Dread risk: Overestimation of low-probability but high-impact events; strong emotions with regard to risks that are out of our control and unknown risks. An example could be the construction of protection shelters against atomic war in the former Soviet Union.
- Availability heuristics: Underestimation of events that did not take place in the recent past, and overestimation of these events after they have happened. For example, early warning systems did not exist in the Indian Ocean before the earthquake and tsunami of 2004, as the last undersea event had happened many years previously, i.e., the 1883 Krakatau eruption. In the case of the 1995 Kobe earthquake, no large earthquake had occurred in the large cities of Japan for the previous fifty years, which made decision-makers believe that Japanese cities were seismically resistant.
- Limited worry: Mitigation of only one type of hazard, which is perceived as being the most serious in terms of outcome or the most probable to happen; ignorance of other types of hazards, which are perceived to have less significant consequences or a lower probability of happening.
- Prospect theory and loss aversion: Ignorance of potential additional gains when considering mitigation measures for another hazard. For example, since the construction of multi-hazard versions of existing cyclone shelters is associated with additional investment, the gains made by mitigating additional risks, such as tsunamis, are ignored.
We examine the substance of the claim that a failure to consider multiple hazards together is a problem, in terms of causing unnecessary losses. We do so in three steps. First, we develop a simple typology of the multi-hazard situation, distinguishing those cases where a failure in analysis derives from the failure to consider two or more hazards at the same time, as distinct from the failure to consider one or more of those hazards at all. Second, based on the behavioural economics and behavioural decision theory literature, we examine the cognitive and behavioural biases which influence perceptions of the probabilities of multiple risks. Third, we examine case studies where there is some evidence that an error of judgment has occurred as a consequence of a failure to consider multiple hazards together. We appraise whether the claims for such an error of judgment are correct, and, if so, whether the errors are attributable to the behavioural factors that we previously identified. We analyse evidence of cognitive biases in connection with five historical cases: the 2011 Tohoku earthquake, the 1908 Messina earthquake and tsunami, the 1995 Kobe earthquake, the 2010 Haiti earthquake and the 2004 Sumatra-Andaman earthquake.
3.2a. The Indian Ocean Tsunami of 2004
The Sumatra-Andaman earthquake of 26 December 2004 that hit the west coast of Indonesia was the largest recorded earthquake since 1964 and the biggest in the Indian Ocean for the last 700 years. The magnitude of the earthquake ranged between 9.1 and 9.3 (moment magnitude) according to different estimates (Stein and Okal, 2005). It occurred 250 km southwest of the northern tip of Sumatra, Indonesia, with waves from the resulting Indian Ocean tsunami reaching between 15 and 30 meters in height and causing damage in several countries of the Indian Ocean, including Indonesia, Thailand, Sri Lanka, India, the Maldives and even Somalia (Titov et al., 2005). The extent of flooding inland varied from 500 meters to two kilometers.
Figure 2: The Sumatra-Andaman earthquake and the resulting Indian Ocean tsunami 2004
Source: UNEP, 2005
The lag between the earthquake's occurrence and the tsunami's arrival was up to several hours, yet almost all of the victims were unprepared for it. The tsunami reached the coasts of Thailand and Sri Lanka within 2 hours, the Maldives within 4 hours and East Africa within 10 hours (Japan Coast Guard, 2004). In the case of Indonesia, which was especially hard hit, less than 15 minutes was available to alert the population. Had an effective warning system been in place, buildings and infrastructure would still have been destroyed or damaged, but the number of deaths and injuries would have been much lower. This was the deadliest tsunami recorded in the region and in history, and one of the most devastating events caused by natural hazards in recent years, killing more than 227,898 people in fourteen countries of the South Asian region as of March 2005, with many still regarded as missing (U.S. Geological Survey, 2005). In comparison, the tsunami in the Pacific Ocean in 1782 killed 40,000 people and the tsunami created by the eruption of Krakatau in 1883 killed 36,000 people. The deadliest tsunami in the Atlantic Ocean, which followed the Lisbon earthquake of 1755, killed over 100,000 people, and the tsunami in the Mediterranean Sea, in Messina, Italy, in 1908 killed 70,000 people. The largest number of deaths from the Indian Ocean tsunami was registered in Indonesia, with around 170,000 people, followed by Sri Lanka, India and Thailand. Additionally, the tsunami made more than five million persons homeless and resulted in massive displacement of the population. In Indonesia, more than 500,000 were displaced, while in Sri Lanka and India these figures were 516,150 and 647,599, respectively. Women were hit much harder than men, and in some regions, like Banda Aceh, 80% of all those who died as a result of the tsunami were women. There were different reasons for this. On the Indian coast, many women were waiting on the beach for the fishermen to return. On the east coast of Sri Lanka, women were bathing in the sea at exactly the time of the tsunami. In other cases, women were at home looking after their children or carrying out errands, while the men were out at sea in their boats, where the waves were less ferocious. Other reasons involve sociocultural norms, which limit the mobility of women, placing them in a subordinate position with fewer educational opportunities, less voice in decision-making and poorer employment. All of these factors make women around 14 times more vulnerable to disasters than men (Oxfam, 2008). The early warning systems did not work because they were oriented towards different kinds of hazards and not towards tsunamis. The main reason for this is that at that time there was no tsunami warning system in the Indian Ocean, neither to detect tsunamis nor to warn the population. This can be explained first by the fact that earthquakes are frequent in this region,
but tsunamis are rare. The last major tsunami before 2004 took place in 1883, after the Krakatau eruption. For comparison, an effective warning system was established in the Pacific Ocean, which is famous for its “Ring of Fire”. However, the western edge of the Ring of Fire in fact extends into the Indian Ocean, exactly to the point where the earthquake of 2004 took place, and it was this rare possibility of an earthquake that was not taken into consideration. There are, however, a few anecdotal examples of early warning working successfully and leading to the evacuation of people. The first comes from local folklore. On the Indonesian island of Simeulue, the local population fled to the inland hills after the initial shaking, even before the tsunami struck. The reason is to be found in folklore stories recounting an earthquake and tsunami that occurred in 1907. The second is school education. On Maikhao beach in Thailand, people were successfully evacuated after a 10-year-old British tourist recognized the first signs of a tsunami, which she had studied in geography at school. The third is work experience in fields connected with natural hazards: tourists were evacuated from Kamala Bay, Thailand, after a Scottish biology teacher recognized the first signs. The fourth is indigenous practice: the aboriginal population of the Andaman Islands successfully escaped the tsunami thanks to existing indigenous practices of early warning. The tsunami of 2004 showed that one of the reasons for so high a death toll, in addition to the absence of early warning systems, was the absence of safe shelter buildings in the coastal villages to which the population could be evacuated. Analysis of the existing official documentation showed that shelters had been constructed for the case of cyclones, as India already had a 40-year history of constructing cyclone shelters. With additional technical equipment and investment, these shelters could also have been used in the case of tsunamis. However, the additional investment needed to adapt the shelters from a single-hazard to a multi-hazard role was regarded as too high by local decision-makers. Also, tsunamis were perceived as low-probability events, which did not provide a justification for investment in this risk mitigation measure. We found evidence that availability heuristics strongly influenced the decision-making process with regard to multi-hazard risk mitigation measures. The Indian decision-makers perceived the probability of a tsunami as low because they lacked experience of one, since, as stated above, the last major tsunami prior to the 2004 event was in 1883, after the Krakatau eruption. The decision-makers perceived earthquakes as highly probable because this type of hazard is frequent in the countries around the Indian Ocean. However, they
perceived an underwater earthquake and tsunami as less probable because such an event had not happened for more than one hundred years. We did not find evidence of the dread risk bias in the 2004 tsunami historical case. However, the experimental versus statistical evidence bias may have been present in the decision-making with regard to the estimation of the probabilities of a tsunami and an earthquake, as discussed above with regard to the extent of the Ring of Fire. This bias therefore influenced the decision-making process in India with regard to the communication of the results from risk assessments. Although several numerical models of earthquakes in the region alerted seismologists to the possibility of a great underwater earthquake, and therefore a tsunami, there was still a failure to establish a warning system before the 2004 event (IUGG, 2005). Thus the experimental versus statistical evidence bias made decision-makers deal with the types of hazards they felt themselves to be competent with and knowledgeable about, such as land-based earthquakes. The bounded rationality bias is often caused by time constraints, when decision-makers choose simplified tools rather than complex structures and sets of figures. This bias may have influenced the risk communication process. It is known that the early warning system operating in the Pacific Ocean was sending messages about the occurrence of an underwater earthquake in the Indian Ocean; however, these messages were not communicated to the people along the coasts of the Indian Ocean and no action was taken to prevent this type of risk and to prepare the population for the disaster. The limited worry bias leads to a situation in which risk mitigation measures are taken for more probable and more recent events rather than for events that may happen in the future. This bias may have influenced the implementation of risk mitigation measures such as the construction of multi-purpose cyclone shelters (MPCS) along the Indian coastline. Cyclones are frequent in the region and are well known to the Indian decision-makers. Therefore, the majority of the shelters are constructed to withstand cyclones. However, as tsunamis were not so common, several shelters were not built high enough to provide protection in the case of a tsunami. The loss aversion bias motivated decision-makers to invest in the mitigation of well-known, highly probable risks in order to prevent losses from them, while they were far less willing to invest against losses from low-probability risks such as tsunamis. Loss aversion is connected with the status quo bias, whereby potential losses are perceived as being greater than potential gains. This leads the decision-maker to prefer the current situation and not to take any action to change the status quo. Despite the 40 years of experience with MPCS, their
construction has been very slow and there are still not enough shelters to accommodate the population in need.
3.2b The Kobe Earthquake of 1995
The Great Hanshin earthquake, which is also known as the Kobe earthquake, occurred on 17 January 1995 in the southern part of Hyogo Prefecture in Japan. The focus of the earthquake was located 20 km from the city of Kobe, Japan's sixth largest city, with a population of over 1.6 million. The magnitude of the earthquake has been estimated at between 6.8 and 7.3 on the moment magnitude scale. The disaster happened at 5:46 in the early morning, when the majority of people were still sleeping after the three-day holiday preceding 17 January. “Everywhere buildings were toppled, houses were in rubble, infernos swallowed entire towns, elevated highways and railways collapsed, and crumbled cliffs buried houses. Everywhere people died”, wrote the Asahi Evening News on the same day. This was the first earthquake in the modern history of Japan to seriously hit a heavily populated urban area, resulting in damage beyond all expectations.
Figure 3: Great Hanshin earthquake of 1995
Source: Kobe City Restoration Plan, 1995
The limited worry bias may have led decision-makers to mitigate one type of hazard and ignore another. The earthquake in Kobe caused much more significant damage than a similar event that had occurred one year earlier in Los Angeles, known as the Northridge earthquake. The hazard was of the same magnitude of 6.8, and the density of
population was also roughly the same, with about 2 million people in the Kobe area and in the San Fernando Valley of Los Angeles. However, only 72 people died in the Northridge earthquake, whereas more than 6,000 people died in Kobe. The majority of the losses (99%) were in the Hyogo prefecture, with more than 71% of the death toll in Kobe. One of the major differences was the construction of the buildings in Kobe and Los Angeles. The buildings in Kobe were constructed to withstand typhoons and were not designed to withstand earthquakes; this was the case for the older buildings but not so much for the newer ones. Most traditional houses had heavy tiled roofs, which weighed around 2 tons. These buildings were constructed to resist typhoons, the most frequent hazard in Kobe, but they had only a light wooden support frame, which was not designed to withstand an earthquake. When the wooden supports were destroyed, the roof crushed the unreinforced walls and floors in what resembled a “pancake” collapse. More than 80% of all human losses were caused in the early hours of the earthquake by the collapse of such houses and buildings, with more than 100,000 houses destroyed. More than 60% of all destroyed houses were in Kobe; altogether, 10% of all houses in Kobe were destroyed. These were the old houses built in the old wooden style, mostly before 1970, according to the old building code. We also found evidence of the limited worry bias in that mitigation efforts were focused on one type of hazard, or on the areas where the losses were perceived as being the most probable or most significant, while other hazards and areas were neglected. For example, some structures were reinforced after the 1971 San Fernando and 1964 Niigata earthquakes; however, the large number of old wooden houses was not refurbished (Katayama, 2004). The experimental versus statistical evidence bias may have marked the behaviour of stakeholders, whereby their interpretation of facts depended on their current picture of the world rather than on analytical or statistical information. Before the Kobe earthquake happened, nobody could imagine that such a devastating earthquake could strike the Kansai region, which includes the cities of Osaka, Kyoto and Kobe. The attention of the risk management authorities was focused on plate boundary earthquakes taking place in subduction zones. This led to a situation where all anti-earthquake policies were focused on the Kanto area, with a special emphasis on Tokyo, owing to the large Kanto earthquake which took place in 1923. The Kansai region was a blind spot where officials and the population were unprepared. The loss aversion bias can also be found in the behaviour of the population with regard to the purchasing of insurance, where people perceived the losses due to the necessary investment in insurance as being greater than the gains from the mitigation of the potential risk. Only 4.9% of
the population in Osaka was insured at the time of the disaster, whereas in Shizuoka, where a big earthquake was expected, over 13% of the population had purchased insurance, which is higher than in Osaka but still rather low by international standards. Some authors argue that the purchase of insurance might itself lead to a behavioural bias, as it can undermine incentives to refurbish a house against earthquakes. The Kobe earthquake showed that there were millions of old wooden houses in the large urbanized areas that are prone to earthquakes. Even now, there are around 24 million old wooden houses which were built before 1971 and which still need to be refurbished. Starting in 1995, the government offered a free assessment of these houses and a special loan system with a maximum of 6 million Yen (ca. 46,000 Euros). However, the number of citizens who profited from this offer was very small. This led the government to introduce a new system in 1999, with a maximum subsidy of 2 million Yen for households with an average income. However, in the period from 1999 to 2003, only about 6,000 people used the offer of a free assessment and only about 600 refurbished their houses with the help of government loans. Taking as a reference the example of Yokohama, with 300,000 vulnerable houses, only 0.2% of all citizens profited from the government's offer to refurbish their houses. Studies of this issue determined that the barrier to the further deployment of this initiative to mitigate the earthquake hazard resides not in technology or engineering, but in policy and the system of incentives. Under the present legal system, there is no compensation for house owners in the case where an already strengthened house is damaged by an earthquake. Unless this regulation changes, the gap will remain between the number of old wooden houses and the number of houses that have been strengthened. We found evidence of the availability heuristics bias in that decision-makers underestimated events which had not taken place in the recent past. For example, for more than 50 years no large earthquake had occurred in the large cities of Japan; hence the policy-making process was not prepared to deal with one, and the planners and engineers believed that Japanese cities had become earthquake resistant (Shimazaki, 2001). The availability bias can also be found in the judgment of Japanese officials regarding other infrastructure. For example, since the Kanto earthquake in 1923, all infrastructure had been designed to withstand large static horizontal forces. The seismic factors of 0.2 to 0.3 were four to five times greater than those used for the US structures constructed before the 1971 San Fernando earthquake. This was the major reason why Japanese decision-makers perceived Japanese infrastructure to be more earthquake resistant than similar US infrastructure.
We found evidence of the bounded rationality bias in that decision-makers were limited in their ability and desire to collect information due to time or other types of constraints. The development of good seismic mitigation measures, due to progress in science and in earthquake engineering and technology, supported by the strong economy, also led people to believe that structures had become strong enough to survive the most violent ground motions (Yamazaki, 2001). The majority of stakeholders in Japan therefore tended to think that structures in Japan were strong enough to resist the most powerful earthquakes. This misleading belief meant that nobody expected that an earthquake could cause as many victims as occurred in Kobe.
3.2c The Haiti Earthquake of 2010
The Haiti earthquake happened on 12 January 2010, approximately 25 kilometers west of the capital, Port-au-Prince. The earthquake had a magnitude of 7.0 (moment magnitude) and was followed by several aftershocks measuring 4.5 or greater. The earthquake triggered fissures in the ground, widespread landslides and a tsunami.
Figure 4: ShakeMap (intensity) of the Haiti earthquake of 2010.
Source: USGS, 2010
The estimates of human losses vary between 220,000 (ISDR, 2011) and 316,000 (Haitian government, 2011), with hundreds of thousands of people additionally injured or made homeless. Haiti has a long history of devastating earthquakes, starting in 1751, when only one building remained standing, followed by 1770, when the whole city collapsed, and 1946, when an 8.0 magnitude earthquake caused a tsunami, killing several people. The country is also vulnerable to cyclones; for example, in the summer of 2008 some 800 people died. The majority of deaths from the 2010 earthquake were caused by the destruction of buildings, which were constructed without regard to any building code or anti-seismic norms; such norms simply did not exist in Haiti, and people did not apply them when constructing private housing. In addition, a significant proportion of the population lives in informal settlements, which are also constructed without any safety measures or building codes. Following the earthquake, USAID organized a team of seismologists and geologists to create hazard maps, which are intended to form the basis of the seismic guidelines for the rehabilitation of infrastructure and the construction of new housing. Haiti was known to be at risk from a major earthquake due to the build-up of stress along the fault line upon which Port-au-Prince sits. Haitian officials were also aware of the city's risk, but did little to address its vulnerabilities. The officials preferred the status quo situation; according to the chief of Haiti's Bureau of Mines and Energy, it was better not even to talk about this, so as not to create panic. One of the reasons for this behaviour was the limited funds and capacities available in a poor country like Haiti for risk mitigation. The daily needs were so pressing that officials preferred to allocate funds to them, leaving long-term goals unaddressed. Disaster management had a very low priority in the state budget, with most of the available funds directed towards education, basic health care and immunizations. Natural catastrophes were therefore treated as chance events. The tragedy of Haiti highlighted the importance of disaster risk reduction within the contexts of rapid urbanization, seismic risk, population growth and poverty (UNISDR, 2011). In contrast to the Kobe earthquake case, the failure to implement risk mitigation measures in Haiti resulted mainly from weak institutions, which were not able to enforce standards for building products and infrastructure development, and from poor quality control. Another problem is the level of poverty, as Haiti occupies one of the lowest places (149 of 182) in the UNDP poverty index and 80% of its population lives below the poverty line. Poverty and urbanization have forced large parts of the population to concentrate in high-risk urban areas, making an already dire situation worse.
3.2d The Tohoku Earthquake of 2011
The Great East Japan Earthquake, which is also known as the Tohoku earthquake, occurred on 11 March 2011. Its epicentre was 70 kilometers east of the Oshika Peninsula of Tohoku and its hypocentre was at a depth of 32 km. It was an undersea megathrust earthquake with a magnitude of 9.0, the most powerful earthquake ever recorded in Japan. The earthquake triggered powerful tsunami waves that reached heights of up to 40.5 meters and travelled up to 10 km inland. Two major tsunami waves followed the earthquake: the height of the first was recorded as 4 m, while the second exceeded 7.5 m. These tsunamis brought destruction along the whole Pacific coastline of Japan and were felt in many countries around the Pacific, including North and South America.
Figure 5: Great East Japan earthquake of 2011
Source: World Property Channel, 2011
The tsunami caused a number of nuclear accidents, which led to the ongoing Level 7 meltdowns at three reactors of the Fukushima Daiichi Nuclear Power Plant, when the tsunami overtopped the protective seawalls and destroyed the diesel backup power systems. The wave overtopped the 10 m high embankments, flooded the plant site and swept over buildings and components, eventually causing the loss of AC power at Reactors 1-5 and the loss of DC power at Reactors 1, 2 and 4, with core cooling functions almost totally lost at Reactors 1-3. This caused severe problems at Fukushima Daiichi, including three large explosions and radioactive leakage. Fallen building walls and ceilings further damaged major equipment and piping systems. At the time of the explosions, a large quantity of radioactive materials was
released. Increased radiation levels were measured over a wide area, including Tokyo, and contamination by caesium was recorded even further away in the Kanto Region, west of Tokyo. An early warning was issued and people were evacuated. The Japan Meteorological Agency issued a “major tsunami” warning, which indicates a wave of at least 3 m; in fact, in some places the tsunami reached heights of 38 m. The tsunami reached the Japanese coast between 10 and 30 minutes after the triggering earthquake. As with the 2004 Indian Ocean earthquake and tsunami, the damage caused by the surging water was much greater and deadlier than that of the actual quake. While the earthquake itself resulted in a dozen deaths, the tsunami led to the loss of several thousand people. Of the 13,135 fatalities recorded as of 11 April 2011, more than 90% died by drowning, while more than 65% of the dead were over 60 years of age (Japan Times, 2011a). Later estimates place the total number of dead and missing at over 19,000 (CATDAT, 2011). The availability heuristics influenced estimations of tsunami heights. In some parts of the coast, the tsunami was much higher than those usually observed in these places. The large size of the water surge was among the several factors which caused the high death toll. The tsunami walls at several of the affected cities were based on estimations of much smaller tsunami heights. This resulted not only in massive destruction, but also in significant human losses. Many people caught in the tsunami thought that they were safe because they were located on ground that is usually high enough to be safe (Japan Times, 2011). As a result, only 58% of all people in the coastal areas of the Iwate, Miyagi and Fukushima prefectures heeded the tsunami warnings and headed for higher ground. Only 5% of those who attempted to evacuate were caught in the tsunami, while of those who did not heed the warning, 49% were hit by the water (Japan Times, 2011a; Japan Times, 2011b). The experimental versus statistical bias led to a false estimation of the probability of an earthquake and therefore to improper action by the government. Risk assessments and seismologists estimated that the “big one” would strike in the same place as the 1923 Great Kanto earthquake, that is, at the Sagami Trough, southwest of Tokyo. Following this estimation, the government tracked the plate movements in preparation for the so-called Tokai earthquake. However, the Tohoku earthquake occurred 373 km northeast of Tokyo. It was a great surprise to seismologists, since the Japan Trench was known to generate large quakes, but an earthquake of magnitude greater than 8.0 was not expected there (Lovett, 2011; Achenbach, 2011).
The case study showed that there was an overconfidence bias in the decision-making regarding the reliability of the nuclear power plant. A deterministic approach was used for the design of the plant. According to this approach, the decision-maker assumes critical situations for important safety functions and proves that sufficient safety margins are maintained for these situations. However, this approach is applied only to design basis events. Another approach is more probabilistic, as it adopts the general framework of probabilistic risk analysis or probabilistic safety assessment. In the case of Fukushima Daiichi, all regulations were developed on the basis of the deterministic approach, and the probabilistic approach was treated as providing only supplementary information (a simple numerical illustration of the contrast between the two approaches is given at the end of this subsection). There are three reasons for the application of this approach. The first is that experts perceived the failure probability to be low enough. The second is that they perceived the probabilistic approach as not being technically mature enough. The third is that they worried that if they applied the probabilistic approach, this would create an argument that the deterministic approach is incomplete. We found evidence of dread risk in the statements of Japanese policy-makers after the Fukushima disaster. The former prime minister, Naoto Kan, pointed out that he experienced a very bad feeling when he thought about the complete evacuation of Tokyo after the catastrophe took place: if Tokyo were to become uninhabitable, it would be impossible to evacuate the 30 million people living in the city and its surroundings (Japan Times, 2011). The conclusion of this politician was that there is no choice but to become independent of nuclear power plants. In his opinion, if an accident could make half of the country uninhabitable, then the risk cannot be taken, even if such an accident occurs only once in a century. However, such predictions varied from the assessments of experts, whose discussions mainly focused on trying to accurately characterize the risks associated with the aftermath of the tsunami. The overconfidence bias also manifested itself in the application of probabilistic risk analysis. Usually it is applied to facilitate the systematic and rational design of a plant. However, to validate the safety of the plant, different approaches have to be applied, especially with regard to severe situations, taking a wider perspective. Yet these approaches were never applied, as experts were overconfident about their validation methods. The bounded rationality bias showed itself in the activities of officials in the estimation of the likelihood of the event as being extremely small. It appeared when one influential member of the Nuclear Safety Commission proclaimed in a TV program that the failure rate was small enough and that there was no justifiable need to assume the total loss
of power from the plant. It seemed that people did not worry much about situations whose probabilities were estimated to be extremely low. Loss aversion expressed itself in the communication of findings from science to politicians and in the application of such measures as simulation codes and the construction of off-site centres. An off-site centre near the plant became unusable and non-functional because of its inappropriate design; in fact, experts were aware that the off-site centres were so poorly designed that they would not be usable. Loss aversion was also manifested in the case of the simulation code called SPEEDI, which was designed to estimate the diffusion of radioactive materials but was not used at all. The major reason for this was that SPEEDI was under the control of a different authority, and people at the plant did not even know that it existed. In any case, it would have required additional efforts and resources from experts to apply this tool properly.
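The contrast between the deterministic and the probabilistic approaches discussed above can be made concrete with a minimal Monte Carlo sketch. All numbers here (the seawall height, the assumed 'maximum credible' tsunami and the Gumbel parameters of the annual maximum run-up) are hypothetical placeholders chosen for illustration, not values taken from the Fukushima analyses.

```python
import math
import random

# Hypothetical annual-maximum tsunami run-up modelled with a Gumbel distribution;
# the location and scale values below are illustrative only.
LOC, SCALE = 4.0, 1.5          # metres
SEAWALL = 10.0                 # metres, the design height of the protective wall

def sample_runup():
    """Draw one annual maximum run-up height by inverse-CDF sampling."""
    u = random.random()
    return LOC - SCALE * math.log(-math.log(u))

# Deterministic check: a single assumed design event is compared with the wall.
design_event = 8.0             # metres, an assumed 'maximum credible' tsunami
print("deterministic margin ok:", design_event < SEAWALL)

# Probabilistic check: estimate the annual probability of overtopping.
random.seed(1)
N = 200_000
exceedances = sum(sample_runup() > SEAWALL for _ in range(N))
print("estimated annual overtopping probability:", exceedances / N)
# Even when the deterministic margin looks comfortable, the probabilistic view
# exposes a small but non-zero chance of overtopping that accumulates over the
# decades-long lifetime of a plant.
```

Under these assumed parameters the annual overtopping probability comes out at roughly 2%, which is exactly the kind of residual risk that a purely deterministic margin check never reports.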
4 Discussion
Behavioural economics and decision theory tell us that cognitive biases manifest themselves automatically and unconsciously over a range of human reasoning. Hence, a possible way to mitigate them is to raise awareness of their existence and of how they influence the decision-making process in situations of multiple hazards. Our reviews of the historical cases identified the possible presence of cognitive biases in the decision-making processes concerning the estimation of probabilities and the making of choices with regard to multi-hazard risk mitigation and management in three historical cases: the 1995 Kobe earthquake, the 2004 Sumatra earthquake and the 2011 Tohoku earthquake. We did not identify any evidence of cognitive and behavioural biases in the cases of the 1908 Messina earthquake and the 2010 Haiti earthquake. In the case of the Messina earthquake, the data were absent; in the case of the Haiti earthquake, other factors, such as the absence of funds for risk mitigation, had a stronger impact on the decision-making process. The results of the reviews of the three historical cases are summarized in Table 3 below. The historical cases showed evidence of several biases which have been identified by behavioural economics research. However, the theoretical framework of behavioural economics itself, with its heuristics and prospect theory, may be complemented by the Bayesian paradigm, which, for the inference part, uses the product and sum rules of Bayesian probability theory. This probability theory may be extended to a Bayesian decision theory, which includes expected utility theory as a special case. Furthermore, the Bayesian paradigm also provides us with a new Bayesian information theory, in which we assign relevances to sources of information and then operate on these relevances by way of the information theoretic product and sum rules. Besides the identified evidence of cognitive and behavioural biases in the decision-making process, there are also several other factors which influenced the estimation of the probabilities of hazards within multi-risk environments and the process of making choices by the decision-makers, ultimately leading to losses during the disaster events. These factors include the lack of funds (especially for Haiti), the lack of early warning systems, the lack of good hazard models, the lack of data from science and gaps in organizations' capacities, as well as in the communication among the involved stakeholders. In Table 4 below, we give some examples of factors influencing risk management, such as success stories, mainly in the recovery phase, and factors where significant improvements are needed for both risk mitigation and management.
Table 3: Contributing factors identified in the decision-making frameworks of the three historical case studies

Availability heuristics
- 1995 Kobe earthquake: Absence of large earthquakes during the preceding decades; absence of large earthquakes in the bigger cities.
- 2004 Sumatra earthquake: Perceptions of tsunamis were influenced by the fact that no large undersea earthquake had occurred for more than one hundred years.
- 2011 Tohoku earthquake: Estimations of tsunami heights were based on previous tsunamis experienced in the region, which resulted in the unwillingness of people to evacuate.

Dread risk
- 1995 Kobe earthquake: -
- 2004 Sumatra earthquake: -
- 2011 Tohoku earthquake: Estimations of the risk of further nuclear catastrophes after Fukushima diverged from experts' estimations.

Experimental versus statistics
- 1995 Kobe earthquake: Because of the 1923 Kanto earthquake, all risk mitigation measures were focused on that region; nobody could imagine that a destructive earthquake could strike the Kansai region.
- 2004 Sumatra earthquake: Several numerical models alerted seismologists to the possibility of a large undersea earthquake in the region; however, the actions of decision-makers were based on beliefs about the Ring of Fire.
- 2011 Tohoku earthquake: Estimations of the place and probability of a large earthquake were based on the experience of the Kanto earthquake.

Bounded rationality
- 1995 Kobe earthquake: Belief in technological and engineering progress made officials less willing to collect other information.
- 2004 Sumatra earthquake: Misperception of signals from the early warning centre in the Pacific Ocean.
- 2011 Tohoku earthquake: Estimation of the probability of a nuclear accident as being extremely low.

Limited worry
- 1995 Kobe earthquake: Risk mitigation for typhoons and limited worry about earthquakes.
- 2004 Sumatra earthquake: Mitigation of cyclone risks and limited concern about tsunamis.
- 2011 Tohoku earthquake: Risk mitigation carried out in some areas, with limited concern about other areas.

Overconfidence
- 1995 Kobe earthquake: -
- 2004 Sumatra earthquake: -
- 2011 Tohoku earthquake: Application of a deterministic approach instead of a probabilistic risk assessment for the safety of nuclear plants; overconfidence in the validation methods.

Loss aversion
- 1995 Kobe earthquake: Behaviour of the population with regards to the purchasing of insurance.
- 2004 Sumatra earthquake: Connected to the necessary investment into additional risk mitigation measures.
- 2011 Tohoku earthquake: Additional resources and efforts needed for the application of tools designed to estimate the diffusion of radioactive materials, which were under the control of a different authority.
Table 4: Factors influencing the decision-making of risk management in three historical cases

Success story: In sectors with already established programs and capacities, such as health (no major outbreaks of diseases) and education (one third of schools was destroyed, yet children returned to school after a few weeks).

Improvement needed: Transportation possibilities, especially to islands and mountain areas; delivery of goods; treatment of displaced persons (protection and long-term planning).

Significant improvement needed: Cooperation between the international community and local governments; absence of experience in several areas; sporadic coordination of recovery and reconstruction works; confusion about responsibilities; lack of leadership and coordination; lack of attention to the establishment of local disaster prevention capacities; absence of strategic planning; ad-hoc emergency response; lack of monitoring and follow-up efforts; lack of vulnerability mapping and comprehensive risk assessment; minimal field assessments to date, mainly restricted to areas of high population density; lack of environmental baseline data; lack of environmental quality assessments and of data on toxic and hazardous wastes that may be mixed with other debris; lack of environmental guidelines in national disaster plans, where such plans exist at all.
The communication of the need for risk mitigation measures to decision-makers could involve strategies that address two systems of perception: the analytical and the behavioural (Kahneman, 2011). Science can provide arguments in the form of probabilities and risk assessments, scenarios and cost-benefit analysis, thus addressing the first system, which is based on logical thinking. To address this system, the communication process should involve numbers representing the likelihood of an event, such as percentages, or present them in graphical form. Cost-benefit analysis might be one of the best tools for presenting optimal decisions to stakeholders. One of its strengths is the possibility of comparing the costs and benefits of different alternatives in the area of government decision-making. However, the most important correction to cognitive biases is simply to make stakeholders aware of them. The fundamental finding of such studies is that “there is wisdom as well as error in public attitudes and perceptions. Lay people sometimes lack certain information about hazards. However, their basic conceptualization of risk is much richer than that of the experts and reflects legitimate concerns that are typically omitted from expert risk assessments. As a result, risk communication and risk management efforts are destined to fail unless they are structured as a two-way process. Each side, expert and public, has something valid to contribute. Each side must respect the insights and intelligence of the other” (Slovic, 1987). However, as the focus of cost-benefit analysis is the comparison of the impacts of different alternatives on policy outcomes, it does not evaluate patterns of the decision-making process at the regional, national and local levels. It can only deal with well-defined problems, which involve a limited number of actors in the process of choosing alternative policy measures. It cannot be applied to areas which are difficult to measure in monetary terms. It also fails to deal with low-probability catastrophic events, which might lead to unbounded measures of either costs or
benefits. Analytical tools can support the decision-making process with regard to the probability judgments of different types of hazards, which require individuals to make risk assessments with respect to natural hazards. But the perception of statistical information might differ from the intuitive and affect-based judgments which stakeholders make in everyday life. The structured approach of the classical utility theory model can reduce the impact of probabilistic biases and simplify the decision rules, but it is necessary to keep in mind the limitations that the application of this approach might have for successful communication. In the absence of relevant expertise, decision-makers might be biased towards subjective probability measures. In the scientific literature on the role of risk and uncertainty, it was often assumed that decision-makers are rational in their assessment of probabilities and in making choices, and that they are supported by analytical tools developed within the framework of classical utility theory. However, too little attention was given to judgments and simplified decision rules, which are influenced by cognitive and behavioural biases. There was no assumption that the decision-maker might be risk averse, myopic or ambiguous in their decisions. However, if the communication process is based only on communicating the results of analytical estimations, people will tend to perceive multi-hazard situations as additive rather than synergistic risks. The literature on decision-making shows that individuals tend to use the rule of addition and numbers to solve numerical problems, which means simply adding a small number to arrive at a larger number (Torbeyns et al., 2011). However, the whole process of risk perception is not only numerical, but also involves behavioural heuristics. The communication process could therefore also involve the second system of risk perception, largely addressing behavioural and cognitive patterns. Stakeholders will perceive specific combinations of risks as synergistic if they perceive these risks as being highly dangerous and are adequately informed about them. This underlines the further importance of efficient strategies for communicating the dangers of combined risks, as without them people will underestimate the extent of their exposure to potential harm caused by synergistic risks. The literature has identified the need for further research on the format and content of communication messages that can effectively inform people about synergistic risks (Dawson et al., 2012). Taking into account the findings of work within the framework of support theory about the unpacking of information about potential synergistic hazards and their components (Tversky and Kahneman, 1992), we argue that a successful communication
strategy dealing with exposure to multiple hazards should explain every separate risk as well as the interdependencies between them.
Appendix 1: Bayesian Decision Theory
1 Introduction
The making of decisions is one of the most basic of human activities. As humans, we are daily confronted by the necessity of having to choose between the alternative courses of action that present themselves to us. And since each action has its own set of corresponding consequences, the process of making decisions can be a deeply emotional affair. This is especially so if the consequences of a given action carry with them high potential losses and gains. So, at first glance, it would be logical to assign Decision Theory, in its totality, to the domain of psychology. However, we also recognize that there are many instances where the decision-making process has a highly procedural quality to it. Case in point: if we are 'in doubt', we typically enumerate the alternative courses of action that are open to us, their possible consequences, the likelihood of these consequences, and the losses and gains which these consequences could entail, should they materialize. Now, since procedures imply structure, and structures may be captured in laws, and laws are most succinctly written down in mathematics, we could alternatively, at second glance, be tempted to postulate that Decision Theory should be a purely mathematical affair. We shall do neither here. It is our belief that the mathematical equations only provide us with the structural machinery of the decision-making process, whereas psychological insights provide the particular inputs that go into this mathematical framework. So, we let our decision theory be the ensemble of the mathematical structure and its psychological inputs. The objective of this research is to present a general framework wherein, by integrating the mathematical and the social scientific approaches, both rationality and subjective experience may take their proper places. In the Bayesian decision theoretic framework, outcome probability distributions are constructed for each and every decision we may wish to take. These outcome probability distributions are then mapped onto utility probability distributions. All this is done by way of the product and the sum rule. Furthermore, in order to assign utilities to monetary outcomes, use is made of the Weber-Fechner law (Fechner, 1860). The use of these psychophysical laws is tantamount to using Bernoulli's utility suggestion of 1738 (Masin et al., 2009). Stated differently, Bernoulli's suggestion for the subjective value of objective monies, derived from intuitive first principles, was a century later confirmed, by way of
rigorous psychophysical laboratory experiments, to hold for the subjective value of all objective stimuli. The Bayesian decision theoretic framework is presented as specifically Bayesian in order to contrast it with expected utility theory, which suffers from many paradoxes and which is just a special case of the more general Bayesian framework. In expected utility theory, the means of the utility distributions under different decisions are compared to each other. In contrast, in the Bayesian framework, utility ranges under different decisions are compared to each other; that is, not only the expected utility but also the standard deviation of the utility is used to differentiate between the different decisions.
2 Deduction and Induction
The here proposed Bayesian decision theoretic framework has as one of its basic assumptions that Bayesian probability theory is, by construction, common sense quantified (Jaynes, 2003). As this is such an essential assumption, we will use both this and the following chapter to demonstrate the appropriateness of Bayesian probability theory as a model for human rationality². If Bayesian probability theory is indeed common sense quantified, as we claim, then it should, at a very minimum, be commensurate with the formal rules of deductive and inductive logic (Jaynes, 1957, 2003). So, we will now proceed to demonstrate how these rules, that is, the Aristotelian syllogisms, may be derived by way of the rules of Bayesian probability theory. The rules of Bayesian probability theory are the product and sum rule, respectively,

    P(AB) = P(A|B) P(B) = P(B|A) P(A)    (2.1)

and

    P(\bar{A}|B) = 1 - P(A|B)    (2.2)

where \bar{A} is the proposition 'not A' (Jaynes, 2003).
The strong syllogisms in Aristotelian logic correspond with the process of deduction. The first strong syllogism is

    Premise:      if A then also B
    Observation:  A
    Conclusion:   therefore B    (2.3)

² Those readers who are already intimately familiar with Bayesian inference and, consequently, will need no convincing may, if pressed for time, skip both this as well as the following chapter, and proceed directly to the decision theoretic part of this appendix.
It follows from the premise that the proposition A is logically equivalent to the proposition AB, that is, they have the same 'truth value',

    A = AB    (2.4)

The most primitive assumption of probability theory is that consistency demands that propositions which are logically equivalent, that is, have the same truth values, should be assigned equal plausibilities (Jaynes, 2003). So, the premise of (2.3) translates to

    P(A) = P(AB)    (2.5)

Because of the product rule, (2.1), we have

    P(AB) = P(A) P(B|A)    (2.6)

Substituting (2.5) into (2.6), it follows that after having observed A the proposition B has a probability 1 of being true, that is,

    P(B|A) = 1    (2.7)
This concludes our derivation of the first strong syllogism of deductive logic. The second strong syllogism is

    Premise:      if A then also B
    Observation:  \bar{B}
    Conclusion:   therefore \bar{A}    (2.8)

The premise in the second strong syllogism is the same as the premise of the first strong syllogism. Therefore, we may use the results of the first strong syllogism in the derivation of the second strong syllogism. From the sum rule and the first strong syllogism, (2.2) and (2.7), it follows that

    P(\bar{B}|A) = 1 - P(B|A) = 0    (2.9)
From the product rule, (2.1), we have

    P(A|\bar{B}) P(\bar{B}) = P(A\bar{B}) = P(\bar{B}|A) P(A)    (2.10)

From (2.9) and (2.10), it follows that for P(\bar{B}) ≠ 0

    P(A|\bar{B}) = P(\bar{B}|A) P(A) / P(\bar{B}) = 0    (2.11)

Substituting (2.11) into the sum rule (2.2), we find that after having observed \bar{B} the proposition \bar{A} has a probability 1 of being true, that is,

    P(\bar{A}|\bar{B}) = 1 - P(A|\bar{B}) = 1    (2.12)
This concludes our derivation of the second strong syllogism of deductive logic. The weak syllogisms in Aristotelian logic correspond with the process of induction. The first weak syllogism is

    Premise:      if A then also B
    Observation:  B
    Conclusion:   therefore \bar{A} less plausible    (2.13a)

or, equivalently, as \bar{A} becoming less plausible implies that A has become more plausible,

    Premise:      if A then also B
    Observation:  B
    Conclusion:   therefore A more plausible    (2.13b)

We will first derive (2.13b). From the product rule, (2.1), we have

    P(A|B) = P(A) P(B|A) / P(B)    (2.14)
Substituting (2.7) into (2.14), we find

    P(A|B) = P(A) · 1/P(B)    (2.15)

Seeing that a probability lies in the interval [0, 1], we have that

    0 < P(B) ≤ 1    (2.16)

From (2.16), it then follows that (2.15) translates to the inequality

    P(A|B) ≥ P(A)    (2.17)

In other words, after having observed B the proposition A has become more probable. By applying the sum rule, (2.2), to (2.17), we find the equivalent weak syllogism (2.13a),

    P(\bar{A}|B) = 1 - P(A|B) ≤ 1 - P(A) = P(\bar{A})    (2.18)
This concludes our derivation of the first weak syllogism of inductive logic. The second weak syllogism is

    Premise:      if A then also B
    Observation:  \bar{A}
    Conclusion:   therefore B less plausible    (2.19a)

or, equivalently,

    Premise:      if A then also B
    Observation:  \bar{A}
    Conclusion:   therefore \bar{B} more plausible    (2.19b)
In what follows we will first derive the second weak syllogism in the form of (2.19a). From the product rule it follows that

    P(\bar{A}|B) P(B) = P(\bar{A}B) = P(B|\bar{A}) P(\bar{A})    (2.20)

Rewriting (2.20), we get

    P(B|\bar{A}) / P(B) = P(\bar{A}|B) / P(\bar{A})    (2.21)

From (2.18), we have that (2.21) implies the inequality

    P(B|\bar{A}) / P(B) = P(\bar{A}|B) / P(\bar{A}) ≤ 1    (2.22)

It then follows that

    P(B|\bar{A}) ≤ P(B)    (2.23)

In words, after having observed \bar{A} the proposition B has become less probable. By applying the sum rule, (2.2), to (2.23), we find the equivalent weak syllogism (2.19b),

    P(\bar{B}|\bar{A}) = 1 - P(B|\bar{A}) ≥ 1 - P(B) = P(\bar{B})    (2.24)

This concludes our derivation of the second weak syllogism of inductive logic. We have seen how the Aristotelian syllogisms may be derived effortlessly by way of the product and sum rules. We will now show how a plausibility premise tends, in a limit of certainty, to the second Aristotelian strong syllogism, thus demonstrating that deduction is just a limit case of induction.
    Premise:      B tends to certainty if we observe A
    Observation:  \bar{B}
    Conclusion:   therefore \bar{A} tends to certainty    (2.25)

The premise of syllogism (2.25) translates to

    P(B|A) ≈ 1    (2.26)

Making use of the sum rule, (2.2), we have

    P(\bar{B}|A) = 1 - P(B|A) ≈ 0    (2.27)

From the product rule, (2.1), we have

    P(A|\bar{B}) = P(A) P(\bar{B}|A) / P(\bar{B})    (2.28)

Equality (2.28), for both P(A) ≠ 0 and P(\bar{B}) ≠ 0, tends to

    P(A|\bar{B}) ≈ 0    (2.29)

Substituting (2.29) into the sum rule (2.2), we find

    P(\bar{A}|\bar{B}) = 1 - P(A|\bar{B}) ≈ 1    (2.30)
It follows that from a plausibility premise we may approach the second strong syllogism in a limit of certainty, (2.26) and (2.30). This then begs the philosophical question whether there really is such a thing as deduction, and whether all is not, in fact, induction. Even the proverbial death and taxes are not absolute truths. In regards to the latter, we may easily envisage a utopian paradise where taxes are no longer needed. In regards to the former, medical science has found that the process of aging is due to the wear and tear of the telomeres. So, if the telomeres could be restored, then, in principle, death from aging could be a thing of the past.
The fact that plausible premises by way of the product and sum rule, that is, Bayesian probability theory, may lead, in a limit of certainty, to the strong syllogisms of deduction, led Jaynes, (2003), to the statement that Bayesian probability theory is an extension of formal logic. We now turn to an even weaker kind of syllogism (Jaynes, 2003).
    Premise:      B more plausible if we observe A
    Observation:  B
    Conclusion:   therefore A more plausible    (2.31)

The premise of this 'plausibility' syllogism translates to

    P(B|A) ≥ P(B)    (2.32a)

or, equivalently,

    P(B|A) / P(B) ≥ 1    (2.32b)

From the product rule, (2.1), we have that

    P(A|B) / P(A) = P(B|A) / P(B)    (2.33)

Substituting (2.32b) into (2.33), we get

    P(A|B) / P(A) ≥ 1    (2.34a)

or, equivalently,

    P(A|B) ≥ P(A)    (2.34b)

This concludes our derivation of the plausibility syllogism (2.31).
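As a quick numerical check of the weak syllogisms (a minimal sketch of our own, not part of the original derivation): for any joint distribution in which the premise 'if A then B' holds, i.e. P(B|A) = 1, the product and sum rules yield P(A|B) ≥ P(A) and P(B|\bar{A}) ≤ P(B), as in (2.17) and (2.23).

```python
import random

random.seed(0)
for _ in range(5):
    # Random joint distribution over (A, B) with the premise P(B | A) = 1 enforced,
    # i.e. the cell (A, not-B) gets zero probability mass.
    weights = [random.random() for _ in range(3)]
    total = sum(weights)
    p_ab, p_nab, p_nanb = (w / total for w in weights)   # P(A,B), P(~A,B), P(~A,~B)

    p_a, p_b = p_ab, p_ab + p_nab
    p_a_given_b = p_ab / p_b                  # first weak syllogism, cf. (2.17)
    p_b_given_na = p_nab / (p_nab + p_nanb)   # second weak syllogism, cf. (2.23)

    assert p_a_given_b >= p_a - 1e-12
    assert p_b_given_na <= p_b + 1e-12
    print(f"P(A|B) = {p_a_given_b:.3f} >= P(A) = {p_a:.3f};  "
          f"P(B|~A) = {p_b_given_na:.3f} <= P(B) = {p_b:.3f}")
```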
3 Are we Bayesians?
The mathematician Polya (1945, 1954) showed that even a pure mathematician actually uses the weak syllogisms of induction of the type (2.13), (2.19), and (2.31). Of course, on publishing a new theorem, the mathematician will try very hard to invent an argument which uses only the strong syllogisms of deduction, (2.3) and (2.8). But the reasoning process which led to the theorem in the first place almost always involves one of the weaker syllogisms. This human tendency to gravitate towards plausible reasoning, that is, induction, is also found by Wason and Johnson-Laird (1972). They report psychological experiments in which subjects erred systematically in simple tests which amounted to applying a single syllogism. When asked to test the hypothesis 'A implies B', subjects had a very strong tendency to consider it equivalent to 'B implies A', the plausible syllogism (2.31), instead of 'not-B implies not-A', the strong syllogism, (2.8). This preference for plausible reasoning over deductive reasoning suggests a tendency toward Bayesianity (Jaynes, 2003). This is the opposite of the Kahneman-Tversky conclusion (1972), who cite the research of Johnson-Laird and Wason (1972) in a footnote of their paper as further corroborating evidence for the claim that people are inherently non-Bayesian in the way they reason, that is, that people seem to be 'incapable of discovering a very simple logical rule'. Where Polya (1945, 1954), a mathematician himself, demonstrates in great detail how mathematicians may derive their theorems by way of rational inference, Slovic et al. (2004), who are social scientists, point to the intuitive goodness that those same mathematicians may feel once they have derived their elegant simple theorems, and suggest that this positive affect is what guides the mathematician to his theorems. Now, seeing that human beings are neither wholly infrarational, groping their way through life by emotions alone, nor wholly rational, we propose a middle position. The mathematician deriving his theorems is a limit case of rationality with low emotional content, where the product and sum rule of probability theory give us an accurate model of how humans navigate in a rational manner through the hypothesis space. In contrast, a limit case of irrationality is given by the case where a possible state of nature has such a high negative emotional content that the psychological model of denial as a defence mechanism accurately models how we assign a probability of zero to that state of nature, denying it to be true, even in the face of evidence pointing to the contrary.
A poignant example of irrational denial was seen in the Chernobyl disaster (Medvedev, 1990). A dose rate of 0.028 Roentgens per second (R/s) is deemed to be fatal after 5 hours. After the initial explosion, all remaining dosimeters had limits of 0.001 R/s and therefore read 'off-scale'. Thus, the reactor crew could only state with certainty a lower bound on the radiation levels of 0.001 R/s. Based on this lower bound, the reactor crew chief Alexander Akimov chose to assume that this lower bound was the actual radiation level, and that the reactor was intact. By assigning a probability of zero to radiation levels higher than 0.001 R/s, the reactor crew chief managed to keep the probability of the state of nature in which the reactor had exploded, a scenario too horrible to conceive, at zero. Subsequently, Alexander Akimov ignored the evidence of pieces of graphite and reactor fuel lying around the building, and the readings of another dosimeter, having a limit of 1.0 R/s, which was brought in later, were dismissed, the rationalization being that the new dosimeter must have been defective. It was later estimated that the radiation levels in the worst-hit areas of the reactor building were at 5.6 R/s at the time.
3.1 Heuristics
The psychological paradigm of heuristics and cognitive biases originated as a reaction to the mathematical expected utility theory (Philips and Winterfeldt, 2006). When it was found that expected utility theory, the dominant decision theory of the 1950s, failed to adequately model human decision making in certain instances, leading to such paradoxes as the Ellsberg and Allais paradoxes, the psychologists stepped in. And the rest is, as they say, history. Currently, the psychological paradigm of heuristics and cognitive biases is the dominant decision theoretic paradigm. In this paradigm, heuristics are mental shortcuts or 'rules of thumb' we as humans use to do inference. It is said that, as we do not always have the time or resources to compare all the information at hand, we use heuristics to do inference quickly and efficiently. Most of the time these mental shortcuts will be helpful, but in other cases they lead to systematic errors or cognitive biases. The representativeness heuristic is proposed to be one type of heuristic that we use when making judgments (Tversky and Kahneman, 1973). Using this heuristic, we estimate the likelihood of an event by comparing it to a prototype that already exists in our minds. Our prototype is what we think is the most relevant or typical example of a particular event or object.
Like other heuristics, making judgments based upon representativeness is intended to work as a type of mental shortcut, allowing us to make judgments quickly. However, if not used wisely, it can lead to errors. That is, when making judgments based on representativeness, we are likely to overestimate the likelihood that something will occur: just because an event or object is representative does not mean that it is more likely to occur. In their classic experiment, Tversky and Kahneman (1973) presented the following profile to a group of participants:
Tom W. is of high intelligence, although lacking in true creativity. He has a need for order and clarity, and for neat and tidy systems in which every detail finds its appropriate place. His writing is rather dull and mechanical, occasionally enlivened by somewhat corny puns and by flashes of imagination of the sci-fi type. He has a strong drive for competence. He seems to feel little sympathy for other people and does not enjoy interacting with others. Self-centred, he nonetheless has a deep moral sense.
The participants were then divided into three separate groups and each group was given a different task. The first group was asked how similar Tom was to each of nine different college majors. The majority of participants in this group believed Tom was most similar to an engineering major and least similar to a social science major. Participants in the second group were asked to rate the probability that Tom was enrolled in each of the nine majors. The probabilities given by the participants in the second group were very similar to the responses given by those in the first group. In the third group, participants were asked a question unrelated to Tom's description: they were asked to estimate what percentage of first-year graduate students was enrolled in each of the nine majors. Tversky and Kahneman found that 95% of the participants judged that Tom was more likely to study computer science than humanities or education, despite the fact that there was a relatively small number of computer science students at the school where the study was conducted. Seeing that people were likely to judge that Tom was a computer science major based on representativeness, Tversky and Kahneman argue that pertinent information, such as the small number of computer science students, was ignored and, consequently, cognitive errors were made.
However, we propose that the participants might have made use of the plausible syllogism, (2.31):

    Premise:      Profile more plausible if Computer Science
    Observation:  Profile
    Conclusion:   therefore Computer Science more plausible

and its corollary

    Premise:      Profile less plausible if Humanities
    Observation:  Profile
    Conclusion:   therefore Humanities less plausible
Based on Tom's profile, the probability of him being a computer science student is increased, whereas the probability of him being a humanities student is decreased. Consequently, it is judged by 95% of the participants that Tom is more likely to study computer science than humanities. Note that a formal Bayesian analysis would have given us the same result. We will demonstrate. The propositions we will work with are

    A1 = Computer Science Student
    A2 = Humanities Student
    B  = Profile

If Tom's psychological profile is both archetypical of computer scientists and anti-archetypical of humanities students, then

    P(B|A1) >> P(B|A2)    (3.1)

or, equivalently,

    P(B|A1) / P(B|A2) >> 1    (3.2)
where the symbol '>>' stands for 'much greater than'. Tversky and Kahneman (1973) report that the prior odds for humanities against computer science were estimated by the participants to be

    P(A2) / P(A1) ≈ 3    (3.3)

If the odds (3.2) are deemed to exceed the odds (3.3), that is, if those odds are estimated to be greater than 3, then, by way of (3.2), (3.3), and the product rule (2.1), we find that

    P(B|A1) P(A1) > P(B|A2) P(A2)  ⟺  P(A1 B) > P(A2 B)  ⟺  P(A1 B)/P(B) > P(A2 B)/P(B)  ⟺  P(A1|B) / P(A2|B) > 1    (3.4)

or, equivalently,

    P(A1|B) > P(A2|B)    (3.5)

So, if 95% of the participants judged that Tom was more likely to study computer science than humanities, then we may infer that 95% of the participants deemed the odds (3.2) to be greater than 3, which in our opinion is not that far-fetched. However, Tversky and Kahneman (1973) beg to differ, as they are of the opinion that the plausibility judgments of the participants 'drastically violate the normative [that is, Bayesian] rules of prediction'. As an aside, one of the main arguments against the Bayesian model of human rationality was that it was too complex. In contrast, the cognitive heuristics paradigm was deemed to be much more parsimonious and, thus, a more realistic model of human inference.
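For concreteness, a two-line check (our own illustration) of the odds argument (3.2) through (3.5): with prior odds of 3 against computer science and a likelihood ratio that we assume, purely for the sake of the example, to be 10, the posterior already favours computer science.

```python
prior_odds = 1 / 3          # P(A1) / P(A2): humanities three times more common, cf. (3.3)
likelihood_ratio = 10       # P(B|A1) / P(B|A2): assumed value; it only needs to exceed 3, cf. (3.2)

posterior_odds = likelihood_ratio * prior_odds   # P(A1|B) / P(A2|B), by the product rule (2.1)
print(f"posterior odds computer science : humanities = {posterior_odds:.2f}")
print("Tom more likely a computer science student" if posterior_odds > 1
      else "Tom more likely a humanities student")
```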
But what started as a parsimonious theory of human inference, consisting of a handful of heuristics and cognitive biases, has proliferated into some 20 heuristics (source: Wikipedia, search 'heuristics') and an impressive 170+ cognitive biases (source: Wikipedia, search 'list of cognitive biases'). Further belying the initial ideal of parsimony is the decision theoretic Prospect Theory (Kahneman and Tversky, 1979). This theory is a hybrid of the mathematical expected utility theory and the psychological heuristics and cognitive biases. As Prospect Theory was plagued by its own particular set of paradoxes, it was later revised into Cumulative Prospect Theory (Kahneman and Tversky, 1992). It may be checked that the mathematics of both Prospect Theory and Cumulative Prospect Theory is not elementary (Fennema and Wakker, 1996). And it is at this point that we submit that it is the Bayesian paradigm which is extremely parsimonious, rather than the cognitive heuristics paradigm. The Bayesian paradigm states that once we have assigned our probabilities, rational inference or, equivalently, common sense, can only be modelled by applying the product and sum rule, that is, (2.1) and (2.2), to these probabilities. Stated differently, the Bayesian paradigm consists of just two 'heuristics', the product and sum rule, and these two 'heuristics' are the sufficient and necessary operators for our given probabilities (Jaynes, 1957, 2003).
3.2 Bayesian Inference
We state that Bayesian inference is common sense amplified³, having a much higher 'probability resolution' than our human brains can ever hope to achieve (Jaynes, 2003). The fact that Bayesian statistics has a much higher 'probability resolution' than our human brains can ever hope to achieve is in accordance with existing psychological studies. Kahneman and Tversky (1972) found that, if presented with some chance of a success, subjects fail to draw the appropriate binomial probability distribution of the number of successes r in n draws. Subjects manage to find the expected number of successes, but fail to accurately determine the probability spread of the r successes.

³ If Bayesian inference were not common sense amplified, then it could not ever hope to enjoy the successes it currently enjoys in the various fields of science: astronomy, astrophysics, chemistry, image recognition, etc. Laplace, the first Bayesian in the modern sense of the word, estimated in the 18th century the most probable mass of Saturn to be a 1/3512th part of the mass of the sun, on the basis of the mutual perturbations of Jupiter and Saturn and the motion of their moons. Furthermore, he estimated the probability of this estimate being in error by more than 1% to be exceedingly small, P = 0.0001. In 1976, another 150 years of accumulated data had increased this estimate by only 0.63%, well within the bounds predicted by Laplace (Jaynes, 1976).
But where in (Kahneman and Tversky, 1972) this is seen as evidence that humans are fundamentally non-Bayesian in the way they do their inference, we instead propose that human common sense is simply not hardwired for data-analysis problems. Were this the case, then there would be no need for data-analysis as we know it, since we would only have to take a quick look at our data, after which we would be able to draw the probability distributions of interest. However, as human beings, we do seem to be hardwired for more 'ecologically relevant' Bayesian problems of inference. For example, given that the burglary alarm has gone off, the knowledge that a small earthquake has occurred in the vicinity of the house modifies our probability assignment of there being a burglar in the house. For this case, Bayesian probability theory is able to model our common sense intuitions regarding the plausibility of a burglary, as we will demonstrate here. The narrative we will formally analyse is taken from (MacKay, 2003):
Fred lives in Los Angeles and commutes 60 miles to work. Whilst at work, he receives a phone-call from his neighbour saying that Fred's burglar alarm is ringing. While driving home to investigate, Fred hears on the radio that there was a small earthquake that day near his home.
Now, how does the knowledge of a small earthquake having occurred translate to our state of knowledge regarding a possible burglary? We assume that the neighbour would never phone if the alarm were not ringing and that the radio is a trustworthy reporter too. Stated differently, we know for a fact that the alarm is ringing and that a small earthquake has occurred near the home. Furthermore, we assume that the occurrence of an earthquake and a burglary are independent and that the alarm ringing is somehow dependent on an earthquake or a burglary occurring. The propositions we will then be working with are

    B = Burglary
    \bar{B} = No burglary
    A = Alarm
    \bar{A} = No alarm
    E = Small earthquake
    \bar{E} = No earthquake
We assume that the burglary alarm is almost certainly triggered by either a burglary or a small earthquake or both, that is,

    P(A|BE) ≈ P(A|B\bar{E}) ≈ P(A|\bar{B}E) ≈ 1    (3.6)

whereas false alarms, in the absence of either a burglary or a small earthquake, are extremely rare, that is,

    P(A|\bar{B}\bar{E}) ≈ 0    (3.7)

Let

    P(E) = e,    P(B) = b    (3.8)

Then, by way of the sum rule, (2.2),

    P(\bar{E}) = 1 - e,    P(\bar{B}) = 1 - b    (3.9)

It follows, by way of the product rule (2.1) and (3.6) through (3.9), that
    P(A\bar{B}E) = P(\bar{B}E) P(A|\bar{B}E) ≈ (1-b) e
    P(AB\bar{E}) = P(B\bar{E}) P(A|B\bar{E}) ≈ b (1-e)
    P(ABE) = P(BE) P(A|BE) ≈ b e                                (3.10)
    P(A\bar{B}\bar{E}) = P(\bar{B}\bar{E}) P(A|\bar{B}\bar{E}) ≈ 0

By way of 'marginalization', that is, by application of the sum rule, (2.2), we obtain the probabilities

    P(AE) = P(ABE) + P(A\bar{B}E) ≈ b e + (1-b) e = e
    P(A\bar{E}) = P(AB\bar{E}) + P(A\bar{B}\bar{E}) ≈ b (1-e)    (3.11)

The moment Fred finds out that an earthquake actually did occur, then, by way of the product rule (2.1) and (3.10) and (3.11), he assesses the likelihood of a burglary to be

    P(B|AE) = P(ABE) / P(AE) ≈ b e / e = b    (3.13)

And in (3.13) we see that, in the presence of an alternative trigger of the alarm, that is, a small earthquake occurring, the alarm has lost its predictive power over the prior probability of a burglary, that is, (3.8) and (3.13),

    P(B|AE) ≈ P(B)    (3.14)
Consequently, Fred's fear of a burglary, as he rides home after having heard that a small earthquake did occur, will only be dependent upon his assessment of the general likelihood of a burglary occurring. And if we assume that Fred lives in a nice neighbourhood, rather than some crime-ridden ghetto, then we can imagine that Fred will be greatly relieved. A traditional argument against Bayesian probability theory as a normative model for human rationality is that people are in general numerically illiterate. Hence, the Bayesian model is deemed to be too complex a model for human inference (Slovic et al., 2004). However,
note that the Bayesian analysis given here was purely qualitative, in that no actual numerical values were given to our probabilities, apart from (3.6) and (3.7), which are limit cases of certainty and, hence, in a sense, may also be considered to be qualitative. And the result of this qualitative analysis, that is, (3.14), seems to be intuitive enough for the Fred scenario.
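To make the qualitative analysis concrete, the minimal sketch below (our own illustration, not part of the original analysis) assigns hypothetical numerical values and evaluates the joint and marginal probabilities of (3.10) and (3.11) exactly, without the limit approximations (3.6) and (3.7). All numbers are illustrative assumptions.

```python
# Hypothetical numbers for the burglary/earthquake narrative: b and e are the priors
# of (3.8); the numerical values are illustrative assumptions, not taken from the text.
b = 1e-3    # P(B), prior probability of a burglary
e = 1e-3    # P(E), prior probability of a small earthquake

# Conditional alarm probabilities, close to the limit cases (3.6) and (3.7).
p_alarm = {                       # P(A | B, E), keyed by (burglary, earthquake)
    (True, True): 0.99, (True, False): 0.99,
    (False, True): 0.99, (False, False): 1e-4,
}

def prior(burglary, quake):
    """P(B, E) under the assumed independence of burglaries and earthquakes."""
    return (b if burglary else 1 - b) * (e if quake else 1 - e)

# Joint probabilities P(A, B, E), cf. (3.10).
joint = {(B_, E_): prior(B_, E_) * p_alarm[(B_, E_)]
         for B_ in (True, False) for E_ in (True, False)}

# Marginals and posteriors, cf. (3.11) and (3.13).
p_alarm_quake = joint[(True, True)] + joint[(False, True)]                     # P(A, E)
p_alarm_any = sum(joint.values())                                              # P(A)
post_with_quake = joint[(True, True)] / p_alarm_quake                          # P(B | A, E)
post_alarm_only = (joint[(True, True)] + joint[(True, False)]) / p_alarm_any   # P(B | A)

print(f"P(B | A, E) = {post_with_quake:.4f}  (essentially back to the prior b = {b})")
print(f"P(B | A)    = {post_alarm_only:.4f}  (without the earthquake report)")
```

With these assumed numbers the earthquake report 'explains away' the alarm, and the posterior burglary probability falls back to the prior, in line with (3.14).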
4 The General Framework
We now proceed to work out the general framework, using the Bayesian rules of probability theory. Say that there is more than one possible event which may occur. Then each of these events has associated with it a probability and a consequence. In our problem of choice, the possible decisions, D_i, under consideration are whether or not to wear seatbelts:

    D_1 = Seatbelts,    D_2 = No Seatbelts

The relevant events, E_j, when driving a car, as perceived by the decision maker, are

    E_1 = No Accident,    E_2 = Small Accident,    E_3 = Severe Accident

The perceived outcomes, O_k, are

    O_1 = All is Well,    O_2 = Some Pain,    O_3 = General Misery
As, in this particular case, the decisions taken do not modulate the probabilities of an event, we have that the probability for an event conditional on the decision taken is the same for both decisions, say:

    P(E_1|D_i) = 0.950,    P(E_2|D_i) = 0.049,    P(E_3|D_i) = 0.001    (4.1)

for i = 1, 2. However, the conditional probability distributions of the outcomes given an event are modulated by the decision taken, P(O_k|E_j D_i). We first consider the case where the decision maker is considering to wear seatbelts, D_1:

    P(O_1|E_1 D_1) = 1.00,    P(O_2|E_1 D_1) = 0.00,    P(O_3|E_1 D_1) = 0.00
    P(O_1|E_2 D_1) = 0.75,    P(O_2|E_2 D_1) = 0.25,    P(O_3|E_2 D_1) = 0.00    (4.2)
    P(O_1|E_3 D_1) = 0.20,    P(O_2|E_3 D_1) = 0.70,    P(O_3|E_3 D_1) = 0.10
Then, by way of the product rule, the first necessary operator of probability theory (Jaynes, 2003),

    P(AB) = P(A|B) P(B) = P(B|A) P(A)    (4.3)

we may combine the probability of an event, (4.1), with the corresponding conditional probability distributions of some outcome given that event, (4.2), and obtain the probabilities of an event E_j and an outcome O_k given decision D_i, P(E_j O_k|D_1). We may present all these probabilities in a table and, so, get the corresponding bivariate probability distribution, Table 4.1.

    P(E_j O_k|D_1)           O_1 = All is Well    O_2 = Some Pain    O_3 = General Misery
    E_1 = No Accident        0.9500               0.0                0.0
    E_2 = Small Accident     0.0370               0.0120             0.0
    E_3 = Severe Accident    0.0002               0.0007             0.0001

    Table 4.1: bivariate event-outcome probability distribution for D_1

Note that each row in Table 4.1 corresponds with a risk distribution for some event E_j, (4.2).
We then also may see why a 'risk distribution' is not a 'risk probability distribution', since the elements in a risk distribution do not sum to 1, a necessary prerequisite for a distribution to be a probability distribution.
Let {A_1, ..., A_n} be a set of n mutually exclusive and exhaustive propositions, that is, one and only one of the A_i is necessarily true. Let {B_1, ..., B_m} be another set of m mutually exclusive and exhaustive propositions. Then, by way of the generalized sum rule (Jaynes, 2003), the second and last necessary operator of probability theory,

    \sum_{i=1}^{n} P(A_i B_j) = P(A_1 B_j) + ... + P(A_n B_j) = P(B_j)    (4.4a)

where

    \sum_{j=1}^{m} P(B_j) = 1    (4.4b)
we may 'marginalize' the event-outcome probabilities P(E_j O_k|D_1) over the events E_j and so get the marginalized outcome probability distribution, Table 4.2.

    P(O_k|D_1)    O_1 = All is Well    O_2 = Some Pain    O_3 = General Misery
                  0.9872               0.0127             0.0001

    Table 4.2: marginalized outcome probability distribution for D_1
We now consider the case where the decision maker is considering not to wear seatbelts, D_2. Say we have the following conditional probabilities:

    P(O_1|E_1 D_2) = 1.00,    P(O_2|E_1 D_2) = 0.00,    P(O_3|E_1 D_2) = 0.00
    P(O_1|E_2 D_2) = 0.25,    P(O_2|E_2 D_2) = 0.75,    P(O_3|E_2 D_2) = 0.00    (4.5)
    P(O_1|E_3 D_2) = 0.10,    P(O_2|E_3 D_2) = 0.30,    P(O_3|E_3 D_2) = 0.60

Then, using the product rule (4.3), we may combine the probability of an event, (4.1), with the corresponding conditional probability distributions of some outcome given that event, (4.5):
    P(E_j O_k|D_2)           O_1 = All is Well    O_2 = Some Pain    O_3 = General Misery
    E_1 = No Accident        0.9500               0.0                0.0
    E_2 = Small Accident     0.0120               0.0370             0.0
    E_3 = Severe Accident    0.0001               0.0003             0.0006

    Table 4.3: bivariate event-outcome probability distribution for D_2

Marginalizing the event-outcome probabilities P(E_j O_k|D_2) over the events E_j, by way of the generalized sum rule (4.4), we get the marginalized outcome probability distribution, Table 4.4.
    P(O_k|D_2)    O_1 = All is Well    O_2 = Some Pain    O_3 = General Misery
                  0.9621               0.0373             0.0006

    Table 4.4: marginalized outcome probability distribution for D_2

To summarize: each potential decision D_i which we may take will lead to a set of possible events occurring, E_{1_i}, ..., E_{m_i}, with probabilities of occurring P(E_{j_i}|D_i), having as potential outcomes O_{1_i}, ..., O_{l_i}, with conditional probabilities P(O_{k_i}|E_{j_i} D_i).
By way of the product rule, (4.3), we compute the bivariate probability of an event and an outcome conditional on the decision taken:

    P(E_{j_i} O_{k_i}|D_i) = P(E_{j_i}|D_i) P(O_{k_i}|E_{j_i} D_i)    (4.6)

The outcome distribution is then obtained by marginalizing, by way of the generalized sum rule (4.4), over all the possible events:

    P(O_{k_i}|D_i) = \sum_{j_i=1}^{m_i} P(E_{j_i} O_{k_i}|D_i)    (4.7)
Note that the marginalized outcome probability distributions (4.7) are the information carriers of the inference part of the decision making process.
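As an illustration of equations (4.6) and (4.7), the short sketch below (our own, with our own variable names) recomputes Tables 4.1 through 4.4 from the assigned probabilities (4.1), (4.2), and (4.5). Note that the tables in the text appear to round the joint probabilities before marginalizing, so the exact values computed here can differ in the last decimal.

```python
import numpy as np

# Event probabilities (4.1), identical under both decisions.
p_event = np.array([0.950, 0.049, 0.001])          # P(E_j | D_i)

# Conditional outcome probabilities P(O_k | E_j, D_i): rows = events, cols = outcomes.
p_out_given_event = {
    "D1 (seatbelts)":    np.array([[1.00, 0.00, 0.00],
                                    [0.75, 0.25, 0.00],
                                    [0.20, 0.70, 0.10]]),   # (4.2)
    "D2 (no seatbelts)": np.array([[1.00, 0.00, 0.00],
                                    [0.25, 0.75, 0.00],
                                    [0.10, 0.30, 0.60]]),   # (4.5)
}

for decision, cond in p_out_given_event.items():
    # Product rule (4.6): bivariate event-outcome distribution, cf. Tables 4.1 and 4.3.
    joint = p_event[:, None] * cond
    # Generalized sum rule (4.7): marginalize over the events, cf. Tables 4.2 and 4.4.
    marginal = joint.sum(axis=0)
    print(decision)
    print("  joint P(E_j, O_k | D_i):\n", np.round(joint, 4))
    print("  marginal P(O_k | D_i):  ", np.round(marginal, 4))
```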
5 Source Dependence
Before we progress with the general framework, we will turn to the psychological phenomenon of source dependence. Ellsberg (1961) found that people's willingness to bet on an uncertain event depends not only on the degree of uncertainty but also on its source. He observed that people prefer to bet on an urn containing equal numbers of red and green balls, rather than on an urn that contains red and green balls in unknown proportions. Tversky and Kahneman (1992) state that source dependence constitutes one of the minimal challenges that must be met by any adequate descriptive theory of choice. As our descriptive theory of choice is essentially Bayesian, we will give a Bayesian treatment of this phenomenon. Note that we do not claim that decision makers will derive the Bayesian equations which we will shortly derive verbatim. Rather, we state that Bayesian inference is common sense amplified, having a much higher 'probability resolution' than our human brains can ever hope to achieve. But even so, the Bayesian analysis should, to some extent, be commensurate with our intuitions.
Say we have a large urn consisting of 1000 balls, of which 500 are red and 500 green. We tell our subject that of the 1000 balls 500 are red and 500 green, that he is to draw a ball n = 100 times, and that after each draw he will get 1 dollar if the ball is red and nothing if the ball is green, after which the ball is put back in the urn. As we are not in the custom of giving money away, the subject is also told that for the privilege to partake in this bet he must pay an entrance fee of 50 dollars. So, the probability of drawing r red balls in the first bet translates to

    P(r|n, R, N) = \frac{n!}{r! (n-r)!} \left(\frac{R}{N}\right)^{r} \left(1 - \frac{R}{N}\right)^{n-r}    (5.1)

Now, as the net return, say u, is in dollars, having as its value the number of red balls drawn minus the entrance fee, we have

    u = r - 50    (5.2)

where, as there are n = 100 draws, -50 ≤ u ≤ 50. Assuming a linear utility for money, that is,
each dollar gained corresponds with a corresponding utility gain (more on utilities later on), we make a simple change of variable, using (5.2),

    r = u + 50    (5.3)

and substitute (5.3) into (5.1), so as to get the probability function of the net return:

    P(u|n, R, N) = \frac{n!}{(u+50)! (n-u-50)!} \left(\frac{R}{N}\right)^{u+50} \left(1 - \frac{R}{N}\right)^{n-u-50}    (5.4)

The probability distribution of the net return for bet 1, P(u|n=100, R=500, N=1000), can then be plotted as in Figure 5.1.

Figure 5.1: probability distribution of the net return for bet 1
This probability distribution has a mean and standard deviation of, respectively,

    E(u) = 0,    std(u) = 5    (5.5)
For the second bet, we tell our subject that the urn holds 1000 balls, each of which is either red or green. Again, for every red ball drawn there will be a dollar payout, after each draw the ball is to be replaced in the urn, and the entrance fee of the bet is 50 dollars. As the subject does not know the number of red balls, the Bayesian thing to do is to weigh the probability of drawing r red balls over all plausible values of R, the total number of red balls in the urn. The prior probability of R, based on the background information, is

    P(R) = \frac{1}{N+1}    (5.6)
where R = 0, 1, ..., N. So, by way of the product and the sum rule, (4.3) and (4.4), the probability of drawing r red balls in the second bet translates to

    P(r|n, N) = \sum_{R=0}^{N} P(r, R|n, N) = \sum_{R=0}^{N} P(R) P(r|n, R, N) = \sum_{R=0}^{N} \frac{1}{N+1} \frac{n!}{r! (n-r)!} \left(\frac{R}{N}\right)^{r} \left(1 - \frac{R}{N}\right)^{n-r}    (5.7)
Again making a change of variable from the number of red balls drawn, r, to the net return, u, we substitute (5.3) into (5.7), and find the probability distribution of the net return u:

    P(u|n, N) = \sum_{R=0}^{N} \frac{1}{N+1} \frac{n!}{(u+50)! (n-u-50)!} \left(\frac{R}{N}\right)^{u+50} \left(1 - \frac{R}{N}\right)^{n-u-50}    (5.8)

The probability distribution of the net return for bet 2, P(u|n=100, N=1000), can then be plotted as in Figure 5.2.

Figure 5.2: probability distribution of the net return for bet 2
This probability distribution has a mean and standard deviation of, respectively,

    E(u) = 0,    std(u) = 29.18    (5.9)
In Figure 5.3, we give both probability distributions, Figures 5.1 and 5.2, together.

Figure 5.3: probability distributions of the net return for bets 1 and 2
Comparing the characteristics of the probability distributions of the net return, (5.5) and (5.9), we see that both bets have the same expected value of the net gain, E(u) = 0, but that the first bet has a smaller uncertainty in regards to the final payout, with std(u) = 5, than the second bet, with std(u) = 29.18. So, if Ellsberg (1961) observes that people prefer to bet on an urn
containing equal numbers of red and green balls, rather than on an urn that contains red and green balls in unknown proportions, then this implies that for bets of this kind people prefer those bets with the least uncertainty, that is, they show uncertainty aversion. Heath and Tversky (1991) have found evidence that indicates that people often prefer a bet on an event in their area of competence over a bet on a matched chance event, although the former probability is vague and the latter is clear. This would imply that people feel that they can make clear probability estimates in their fields of specialization, at least enough to beat the fifty-fifty odds. Now, some may wonder why source dependence should constitute one of the minimal challenges that must be met by any adequate descriptive theory of choice (Tversky and Kahneman, 1992). The reason is that they find fault with Expected Utility Theory, which essentially only computes the expected utility, E(u), for the 'utility probability distributions' given in Figure 5.3. And, as we found in (5.5) and (5.9), both bets have an expected utility of zero and, hence, EUT cannot differentiate between the two bets. This, then, is in contrast with Ellsberg's finding that people prefer the bet having the smallest spread in net payout, std(u). In the Bayesian approach it is the utility distributions themselves which are the essential entities of interest, not the expected utilities E(u). The latter are only one of the characteristics of the former and, consequently, tell us only part of the story, as was pointed out in (Ellsberg, 1961). So, from a Bayesian point of view, the Ellsberg source dependence is not that much of an issue.
Rather its importance lies in the fact that, together with (Heath and Tversky, 1991), it may point to uncertainty aversion.
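The means and standard deviations in (5.5) and (5.9) can be reproduced with a few lines of code. The sketch below (our own illustration) builds the net-return distributions (5.4) and (5.8) for n = 100 and N = 1000 and reports E(u) and std(u) for both bets.

```python
import numpy as np
from scipy.stats import binom

n, N = 100, 1000
u = np.arange(-50, 51)          # net return, u = r - 50 with r = 0, ..., n
r = u + 50

# Bet 1, eq. (5.4): known urn composition, R = 500.
p_bet1 = binom.pmf(r, n, 500 / N)

# Bet 2, eq. (5.8): unknown composition, uniform prior P(R) = 1/(N+1) over R = 0, ..., N.
R_values = np.arange(N + 1)
p_bet2 = np.mean([binom.pmf(r, n, R / N) for R in R_values], axis=0)

for name, p in [("bet 1", p_bet1), ("bet 2", p_bet2)]:
    mean = np.sum(u * p)
    std = np.sqrt(np.sum(p * (u - mean) ** 2))
    print(f"{name}: E(u) = {mean:.2f}, std(u) = {std:.2f}")
```

Both expected values come out at zero, while the spread of bet 2 is roughly six times that of bet 1, which is the numerical content of the uncertainty aversion discussed above.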
6 Expected Utility Theory
We stated in the previous section that Expected Utility Theory is just a special case of the general Bayesian decision theoretic framework given here. We illustrate this by way of the Ellsberg bets of the previous chapter.
Say we have the decision-outcome probability distribution P(O_k|D_i), (4.7), and we want to associate the labels O_k with values on the utility dimension u; then we may do this by way of the probability distribution P(u|O_k). Combining both probability distributions, by way of the product rule (4.3), and marginalizing over the possible outcomes O_k by way of the sum rule (4.4), we get the decision-utility probability distribution:

    P(u|D_i) = \sum_{k} P(u, O_k|D_i) = \sum_{k} P(u|O_k) P(O_k|D_i)    (6.1)

In the special case where P(u|O_k) is of the Dirac-delta form, we have that every O_k label admits only one utility value u. This then enables us to obtain the decision-utility probability distribution by way of a simple change of variable on the labels O_k in the decision-outcome probability distributions. We demonstrate below.
Let D_1 stand for the decision to choose the first Ellsberg bet, and D_2 for the decision to choose the second Ellsberg bet. Each bet admitted n + 1 possible outcomes

    {O_r} = {O_0, O_1, ..., O_n}    (6.2)

where the outcome labels stand for

    O_r = r red balls    (6.3)

for r = 0, 1, ..., n. We, trivially, replace the label O_r by the label r. Then the corresponding outcome distributions, compare with Tables 4.2 and 4.4, can be obtained by substituting, where appropriate, the values n = 100, R = 500, N = 1000 into (5.1) and (5.7):
    P(r|D_1) = \frac{100!}{r! (100-r)!} \left(\frac{1}{2}\right)^{100}    (6.4a)

    P(r|D_2) = \sum_{R=0}^{1000} \frac{1}{1001} \frac{100!}{r! (100-r)!} \left(\frac{R}{1000}\right)^{r} \left(1 - \frac{R}{1000}\right)^{100-r}    (6.4b)

The implicit conditional utility probability distribution employed in the Ellsberg example was

    P(u|r) = 1 if u = r - 50,    P(u|r) = 0 if u ≠ r - 50    (6.5)

for r = 0, 1, ..., n. This probability function takes us from the r dimension, which is just the dimension of the outcome labels in terms of the number of red balls drawn, to the dimension of utilities. From (6.5), we see that every r label admits only one utility value u, that is, (6.5) is of the Dirac-delta form:

    P(u|O_r) = \delta(u - (r - 50))    (6.6)

where \delta(u - c) is the Dirac-delta function for which

    \delta(u - c) = 1 for u = c,    \delta(u - c) = 0 for u ≠ c,    \int \delta(u - c) \, du = 1    (6.7)

Because of (6.7), we have that

    \int f(u) \, \delta(u - c) \, du = f(c)    (6.8)

This then enables us to make a direct one-to-one mapping, by way of the simple change of variable, instead of using the more elaborate but also more general (6.1). Note that the Dirac-delta \delta(u - c) is equivalent to a normal distribution with mean c and a standard deviation that goes to zero. Also note that for a different choice of (6.6) we
may have that each outcome O_r admits a range of utility values u, instead of just the one value. Substituting into (5.4) and (5.8), where appropriate, the values n = 100, R = 500, N = 1000, we obtain the utility probability distributions of Figures 5.1 and 5.2,

    P(u|D_1) = \frac{100!}{(u+50)! (50-u)!} \left(\frac{1}{2}\right)^{100}    (6.9a)

    P(u|D_2) = \sum_{R=0}^{1000} \frac{1}{1001} \frac{100!}{(u+50)! (50-u)!} \left(\frac{R}{1000}\right)^{u+50} \left(1 - \frac{R}{1000}\right)^{50-u}    (6.9b)
Now, where in the Bayesian approach the entities of interest are the utility distributions (6.9) themselves, in EUT the entities of interest are the expectation values of (6.9):

    E(u|D_1) = \sum_{u=-50}^{50} u P(u|D_1)    (6.10a)

    E(u|D_2) = \sum_{u=-50}^{50} u P(u|D_2)    (6.10b)
According to EUT, the choice with the highest expected utility will be chosen by a rational decision maker. EUT is a special case of the more general Bayesian framework in that, of the utility distributions conditional on the decision taken, EUT only takes into account the expectation values, (6.10), thus potentially neglecting useful information upon which a decision maker may, partly or wholly, base his decisions (Ellsberg, 1961). Seeing that the expected-utility framework is a special case of the more general Bayesian framework, the former is, trivially, incorporated within the latter. This has the major benefit that the apparent contradiction between the mathematical and the psychological modelling approaches to the human decision making process automatically falls away. Rather, these approaches complement each other. This is because Bayesian probability theory only tells us that the sum and product rule are the sufficient and necessary operators for our given
probabilities. However, Bayesian probability theory stays mute on the subject of how we should assign these probabilities (Jaynes, 2003). For example, if we do a data-analysis, which is a hyper-rational endeavour, a Bayesian will typically assign his probabilities on the basis of consistency requirements. This, however, does not mean that the Bayesian is normative in the sense that he thinks humans ought to assign their probabilities by way of consistency requirements as well; though sometimes they do, for example, by assigning fifty-fifty probabilities to the head-or-tails propositions. But he is normative in that he is of the conviction that, once we have assigned our probabilities, rational inference or, equivalently, common sense, can only be modelled by applying the product and sum rule to these probabilities. So, from a Bayesian perspective, all the psychological literature on how humans assign their probabilities or their utilities may, in principle, be a welcome help in the construction of any general model of human risk perception. With the caveat that human behaviour is perceived to be basically rational, that is, we all have a 'sense' which is 'common' to all. Nonetheless, people may vary as to how much they have developed this common sense.
7 Utility Functions
In the Ellsberg example we introduced the concept of utility. Utility is a technical term. Firstly, utilities may be used to translate outcomes, which are labels, to numerical values, thus allowing us to compute expectation values and standard deviations with which we may compare outcome distributions that otherwise would be qualitative; see for example Tables 4.2 and 4.4. Secondly, utilities may also be used to translate a numerical outcome, for example the number of red balls, to the stimulus of interest, for example money. Thirdly, utilities may be used to translate the stimulus of interest to an appropriate perception scale. If we take money to be an incentive for our decisions, then we may perceive money to be a stimulus. Now, 10 dollars is an insignificant amount for a rich man, whereas for a poor man it is two days' worth of groceries and, thus, a highly significant amount of money. This difference in perception may be modelled by way of a utility function. That is why, after (5.2), we explicitly assumed a linear utility for money. This enabled us to let the payout stand for the utility of that payout, u.
The translation of stimuli to utilities is analogous to the case where we are asked to translate 'loudness' to a numerical value. According to the Weber-Fechner law (Fechner 1860, 1882), postulated in the 19th century by the experimental psychologist Fechner, intuitive human sensations tend to be logarithmic functions of the difference in stimulus. We do not perceive stimuli in isolation; rather, we perceive the relative change in stimuli. So, it is often appropriate to assign a scale to the change in stimuli, a case in point being the decibel scale of sound. The finding of (Kahneman and Tversky, 1979) that the empirical carriers of value, that is, utilities, are gains and losses, not final assets, points to the appropriateness of assigning utilities to the differences in stimuli, rather than to the stimuli themselves. The Weber-Fechner law tells us that the Relative Change (RC) is the difference of the logarithms of the stimuli. Let S_1 and S_2 be two stimuli which are to be compared; then their RC is

    RC = c \log_{d} S_2 - c \log_{d} S_1 = c \log_{d} \frac{S_2}{S_1}    (7.1)
where c is some scaling factor and d some base of the logarithm. From (7.1), we have that if the stimuli S_1 and S_2 are of the same strength, then their RC is 0. If S_2 increases relative to S_1, then RC > 0. If S_2 decreases relative to S_1, then RC < 0.
The Weber-Fechner law allows for one degree of freedom. This can be seen as follows. Since

    \log_{d} x = \frac{\log x}{\log d}    (7.2)

we can rewrite (7.1) as

    RC = q \log \frac{S_2}{S_1}    (7.3)

where

    q = \frac{c}{\log d}    (7.4)
Let \Delta S be an increment, either positive or negative, in the stimulus S. Then we may define the utility of an increment \Delta S in stimulus to be the perceived Relative Change, (7.3), due to that increment:

    u(\Delta S|S) = q \log \frac{S + \Delta S}{S},    S + \Delta S ≥ 0    (7.5)
From (7.5) it can be seen that, in terms of stimuli, we cannot lose more than we have, for if this were not the case the ratio of stimuli could become negative, leading to a breakdown of the Weber-Fechner law. Having said that, the first instance that comes to mind of losing more than we have is the case where we incur a debt. We propose that in this specific case there are two different stimulus dimensions in play, the first being a debt dimension and the second the actual income dimension. For example, a student loan initially represents a gain on the debt stimulus dimension. This debt only makes itself felt, in terms of loss, after graduation, the moment the
monthly payments have to be paid and take a considerable chunk out of one's actual income. This proposed delay of loss for negative money on the loss scale, that is, debt, may offer an explanation of why, during the housing bubble, people did not hesitate to mortgage themselves to the eventual point of breaking, in 2008.
If we want to give a graphical representation of (7.5), then q, also known as the Weber constant (Fechner 1860, 1882), must be set to some numerical value⁴. For the moment, we let q = 1. Now, in order to model with (7.5) the loss of, say, ten dollars for someone who only has ten dollars, we must introduce the minimum significant amount of stimulus ε (Jaynes, 2003), where

    ε > 0    (7.6)

We explain: even for, say, a homeless person, there is some minimum amount of money that is still significant. This may be one dollar for a bag of potato chips, or three dollars for a packet of cigarettes. If the loss of money breaks through the limit of the minimum significant amount ε, it is, for all intents and purposes, as if all was lost. Using the concept of the minimum significant amount of stimulus, we then rewrite (7.5) as

    u(\Delta S|S) = q \log \frac{S + \Delta S + ε}{S},    S + \Delta S ≥ 0    (7.7)

Suppose we have a student who has 200 dollars per month to eat, whose minimum significant amount of money is 0.35 dollars, the price of an energy drink, and who, through no fault of his own, stands to lose all of his money or, through some good luck, stands to gain 200 dollars, Figure 7.1.
⁴ Note that for decibels the Weber constant has been determined to be q = 10/\log 10 ≈ 4.343.
Figure 7.1: utility function of 200 dollars for the poor student
Alternatively, consider the case of the moderately rich man who has one million dollars and who, through some misguided speculation, stands to lose 100.000 dollars or, through some keen insight, stands to gain 100.000 dollars, Figure 7.2.
Figure 7.2: utility function of 100.000 dollars for the rich man
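A short sketch (our own illustration) of the utility function (7.7), reproducing the qualitative behaviour of Figures 7.1 and 7.2 with q = 1; the wealth levels and the minimum significant amount ε = 0.35 dollars are those assumed in the text above.

```python
import numpy as np

def weber_fechner_utility(delta_s, s, eps, q=1.0):
    """Utility of an increment delta_s on top of a stimulus s, cf. (7.7)."""
    return q * np.log((s + delta_s + eps) / s)

# Poor student of Figure 7.1: 200 dollars, minimum significant amount 0.35 dollars.
student_gain = weber_fechner_utility(+200.0, s=200.0, eps=0.35)
student_loss = weber_fechner_utility(-200.0, s=200.0, eps=0.35)

# Moderately rich man of Figure 7.2: one million dollars, stakes of 100,000 dollars.
rich_gain = weber_fechner_utility(+100_000.0, s=1_000_000.0, eps=0.35)
rich_loss = weber_fechner_utility(-100_000.0, s=1_000_000.0, eps=0.35)

print(f"student:  u(+200)  = {student_gain:+.2f}, u(-200)  = {student_loss:+.2f}")
print(f"rich man: u(+100k) = {rich_gain:+.3f}, u(-100k) = {rich_loss:+.3f}")
# The student's loss (about -6.35) looms far larger than his gain (about +0.69),
# whereas for the rich man gains and losses of 100,000 are nearly symmetric (about +/-0.1).
```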
Comparing Figures 7.1 and 7.2, we see that the Weber-Fechner law of experimental psychology captures both the loss aversion of the poor student, that is, the asymmetry in gains and losses, as well as the linearity of the utility of relatively small gains and losses for the moderately rich man. According to (Kahneman and Tversky, 1992), apart from source dependence, another minimal challenge that must be met by any adequate descriptive theory of choice is precisely this phenomenon of loss aversion, that losses loom larger than gains. It is stated to be one of the basic phenomena of choice under both risk and uncertainty. For a discussion of the equivalence of the Weber-Fechner law and Stevens' Power Law (Stevens, 1961), we refer the interested reader to the original document on the Bayesian decision theoretic framework, which will be submitted as an adjunct to this deliverable. We forego this discussion here because of the space constraints given to us by the editors of this deliverable.
8. A Multi-Risk Analysis
In Bayesian probability theory, the probability distributions are the information carriers which represent our state of knowledge (Jaynes, 2003). In real life decision problems, we are often faced with several possible outcomes, each outcome having its own plausibility of occurring relative to the other outcomes taken under consideration. As one outcome becomes ever more plausible, we go to a limit of certainty, in which only that one outcome is possible. Furthermore, each outcome in the outcome probability distributions may be mapped on either a single utility value or a range of utility values. This remapping then leaves us with the utility probability distributions, upon which we may base our decisions. If we have a utility axis which goes from minus infinity to plus infinity, then the utility distribution which is 'more-to-the-right' will tend to be more profitable than the utility distribution which is 'more-to-the-left'. Seeing that we are comparing utility distributions, this then leaves us with the question of how to go about comparing such distributions. The answer is that, by way of the mean and standard deviation of the utility probability distribution, we may get a numerical handle on these probability distribution objects. Let k be a positive constant. Then the k-sigma confidence interval of the utility distribution under decision D_1, that is, P(u|D_1), is given as

    [E(u|D_1) - k \, std(u|D_1),  E(u|D_1) + k \, std(u|D_1)]    (8.1)

By way of Chebyshev's inequality (Lindgren, 1993), we have the following inequality for the coverage of any k-sigma confidence interval, (8.1),

    coverage ≥ \frac{k^2 - 1}{k^2}    (8.2)

Note that the confidence interval (8.1) is used as a proxy for the actual utility distribution P(u|D_1).
For example, let D_1 be the decision to maintain some status quo and let D_2 be the decision to implement some improvement. Then we will implement the improvement, that is, choose decision D_2 over D_1, if the utility distribution under decision D_2 will tend to be 'more-to-the-right' than the utility distribution under decision D_1. This then translates to the inequality

    E(u|D_2) - k \, std(u|D_2) ≥ E(u|D_1) - k \, std(u|D_1)    (8.3)

In words, the decision D_2 will be chosen if the lower bound of the corresponding utility distribution exceeds the lower bound of the utility distribution under decision D_1. For comparison, under classical utility theory the inequality of interest would be

    E(u|D_2) ≥ E(u|D_1)    (8.4)

which only focuses on the expectation values, thus leaving out the pertinent information the standard deviation has to bring to the decision theoretic table. Inequality (8.4), compared to inequality (8.3), represents a simplification. This simplification is problematic in that it leads to paradoxes such as the Ellsberg paradox (see Chapters 5 and 6) and the Allais paradox (see Chapter 9).
8.1 A Multi-Risk Scenario
MATRIX is a multi-risk project. The very practical problem central to MATRIX is, for example, the scenario in which there is a major European windstorm followed by a major European flood. With the Bayesian decision theoretic framework firmly in place, we now apply our framework to such a multi-risk scenario for flood defences in the Netherlands. The two decisions under consideration in our case study are

    D_1 = keep the status quo
    D_2 = improve the flood defences

The investment costs associated with the improvement of the flood defences are designated as

    I = investment costs associated with improved flood defences
The possible outcomes in our multi-risk scenario remain the same under either decision, and as such are not dependent upon the particular decision taken. These outcomes are

    O_1 = regular river flooding
    O_2 = catastrophic river flooding
    O_3 = no flooding

where O_2 is the multiple hazard instance in which the synergy of a regular river flooding in conjunction with a heavy storm conspires to cause a catastrophic flooding. The reason that the heavy storm is not included as a possible outcome is that the flood defences will not protect against damages from this storm, that is, storm damages other than the regular river flooding which is already covered by outcome O_1.
The decision whether or not to improve the flood defences influences the probabilities of the respective outcomes. Under the decision to make no additional investments in flood defences and keep the status quo, D_1, the probabilities of the outcomes are, respectively,

    P(O_1|D_1) = 10^{-2},    P(O_2|D_1) = 10^{-4},    P(O_3|D_1) = 1 - P(O_1|D_1) - P(O_2|D_1)    (8.5)

Under the decision to improve the flood defences, D_2, the probabilities of the flood outcomes will be decreased, leaving us with the outcome probabilities

    P(O_1|D_2) = 10^{-3},    P(O_2|D_2) = 10^{-6},    P(O_3|D_2) = 1 - P(O_1|D_2) - P(O_2|D_2)    (8.6)

The hypothetical flood defences will decrease the chances of a regular river flooding by a factor of 10. But, as the proposed flood defences explicitly take into account the failure
mechanisms resulting from the simultaneous occurrence of windstorms and a flooding, the chances of a catastrophic river flooding are reduced by a factor of 100.
We now proceed to assign utilities to the outcomes. The damage costs associated with the outcomes are, respectively (Kok et al., 2002),

    C_1 = 100 million euro
    C_2 = 50 billion euro      (8.7)
    C_3 = 0 euro

Note that if we were to do an actual analysis, rather than a demonstration of the here proposed decision theoretical framework, then the cost of money itself, in the form of interest, would have to be added to the damage costs, as is also done in (Kok et al., 2002).
The initial wealth of the Netherlands is negative. That is, the Netherlands has a national debt of 394 billion euros. So, if any additional investments are to be made, there will not be a decrease in wealth; rather, there will be an increase in debt. Let

    M = 394 billion    (8.8)

Then the (negative) loss utilities for the decision not to invest in additional flood defences, D_1, are given by the Weber-Fechner law, (7.7), as

    u_i|D_1 = q \log \frac{M}{M + C_i},    i = 1, 2, 3    (8.9)

If additional investments are made to improve the flood defences, D_2, the utilities become

    u_i|D_2 = q \log \frac{M}{M + C_i + I},    i = 1, 2, 3    (8.10)

Note that, for simplicity's sake, we have mapped the outcomes O_i directly to their corresponding utilities u_i, rather than using the proper procedural way to obtain such a mapping, by way of the Dirac-delta function, (6.6), together with the product and sum rule, (6.1).
The constant q in both (8.9) and (8.10) is the unknown Weber constant of a monetary stimulus. As it will turn out, all reference to this constant falls away the moment we solve inequality (8.3) for the investment I. This can be seen as follows. Let X and Y both be stochastic and q some positive constant. Then the inequality

    E(qX) - k \, std(qX) ≥ E(qY) - k \, std(qY)    (8.11)

is equivalent to the inequality (Lindgren, 1993)

    q [E(X) - k \, std(X)] ≥ q [E(Y) - k \, std(Y)]    (8.12)

Dividing both sides of (8.12) by the constant q, we are left with the further equivalence

    E(X) - k \, std(X) ≥ E(Y) - k \, std(Y)    (8.13)

in which all mention of the unknown Weber constant q has fallen away. So, without any loss of generality, we may set q to 1 in (8.9) and (8.10), and get

    u_i|D_1 = \log \frac{M}{M + C_i},    i = 1, 2, 3    (8.14)

and

    u_i|D_2 = \log \frac{M}{M + C_i + I},    i = 1, 2, 3    (8.15)
The utility distributions of interest can then be written out, using (8.5), (8.7), and (8.14), as

    P(u_i|D_1) = { P(O_1|D_1), P(O_2|D_1), P(O_3|D_1) },
    with  u_1 = \log \frac{M}{M + C_1},  u_2 = \log \frac{M}{M + C_2},  u_3 = 0    (8.16)

and, using (8.6), (8.7), and (8.15), as

    P(u_i|I, D_2) = { P(O_1|D_2), P(O_2|D_2), P(O_3|D_2) },
    with  u_1 = \log \frac{M}{M + C_1 + I},  u_2 = \log \frac{M}{M + C_2 + I},  u_3 = \log \frac{M}{M + I}    (8.17)

where we explicitly conditionalize on the investment I, as this is the variable for which we have to solve inequality (8.3). The expectation values and standard deviations of the utility distributions (8.16) and (8.17) may be computed by way of the standard identities (Lindgren, 1993),
    E(X) = \sum_{i} P(X_i) X_i,    std(X) = \sqrt{ \sum_{i} P(X_i) [X_i - E(X)]^2 }

Using these expectation values and standard deviations, we then construct the inequality, (8.3),

    E(u|I, D_2) - k \, std(u|I, D_2) ≥ E(u|D_1) - k \, std(u|D_1)    (8.18)

and find the investment I at which decision D_2 starts to become more profitable than D_1. This I is then the maximal investment we are willing to make in order to improve our flood defences. If we numerically solve for the maximal investment I for different sigma levels k, we find the following maximal investments I for which the implementation of the additional flood defences is still profitable.
sigma level k    coverage (CI greater than)    maximal investment I
0                n.a.                          1.366 × 10^6
1                0                             1.354 × 10^8
2                3/4                           2.694 × 10^8
3                8/9                           4.036 × 10^8
4                15/16                         5.378 × 10^8
5                24/25                         6.72 × 10^8
6                35/36                         8.062 × 10^8

Table 8.1: Maximal investments for different k-sigma levels.

So, if we have a 6-sigma level of cautiousness, we will be willing to spend up to 8.062 × 10^8 Euros on the additional flood defences which decrease the chances of flooding from (8.5) to (8.6). And if we only have a 1-sigma level of cautiousness, then we are only willing to spend up to 1.354 × 10^8 Euros on those same flood defences. Note that the maximal investment of 1.366 × 10^6 Euros for k = 0 is, roughly, the classical utility theory solution of this multi-risk investment optimization problem.

For two further examples of the application of the here proposed decision theoretical framework, we refer the interested reader to the original document on the Bayesian decision theoretic framework, which will be submitted as an adjunct to this deliverable. In these examples, insurance scenarios are given in which both the insurer and the insured must decide on an insurance premium which is acceptable to both of them.
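As an illustrative aside, the numerical search behind a table such as Table 8.1 can be sketched in a few lines of Python. The outcome probabilities below are hypothetical placeholders standing in for (8.5) and (8.6), and the bisection bracket is arbitrary; the sketch is only meant to show how inequality (8.18), with the utilities (8.14) and (8.15), may be solved for the maximal investment I at a given sigma level k.

```python
import math

M = 394e9                      # reference wealth level of (8.8), in Euros
C = [100e6, 50e9, 0.0]         # damage costs C_1, C_2, C_3 of (8.7), in Euros

# Hypothetical outcome probabilities (placeholders for (8.5) and (8.6)):
# regular flooding, catastrophic flooding, no flooding.
p_D1 = [1e-2, 1e-4, 1.0 - 1e-2 - 1e-4]    # D_1: no additional flood defences
p_D2 = [1e-3, 1e-6, 1.0 - 1e-3 - 1e-6]    # D_2: with additional flood defences

def utilities(costs, investment):
    """Weber-Fechner utilities (8.14)/(8.15), with the Weber constant set to 1."""
    return [math.log((M - c - investment) / M) for c in costs]

def lower_bound(p, u, k):
    """k-sigma lower bound E(u) - k std(u) of a discrete utility distribution."""
    mean = sum(pi * ui for pi, ui in zip(p, u))
    var = sum(pi * (ui - mean) ** 2 for pi, ui in zip(p, u))
    return mean - k * math.sqrt(var)

def max_investment(k, hi=5e10, tol=1.0):
    """Largest I for which D_2 still satisfies inequality (8.18) against D_1 (bisection)."""
    lb_D1 = lower_bound(p_D1, utilities(C, 0.0), k)
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lower_bound(p_D2, utilities(C, mid), k) >= lb_D1:
            lo = mid    # investing mid is still at least as good as not investing
        else:
            hi = mid
    return lo

for k in range(7):
    print(k, f"{max_investment(k):.3e}")
```

With the actual flood probabilities of (8.5) and (8.6) in place of the placeholders, the same search yields the maximal investments for the different k-sigma levels.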
9 Variance Preferences

In the Bayesian decision theoretic framework, outcome probability distributions are constructed for each and every decision we may wish to take. This is done by way of the product and the sum rule. These outcome probability distributions are then mapped to utility probability distributions. The Weber-Fechner law assigns utilities to monetary outcomes, and the actual mapping is then performed by way of the Dirac-delta function and, again, the product and the sum rule. The utility distribution having the highest k-sigma lower bound then represents the decision that minimizes the probable loss or, depending on the sign of this lower bound, maximizes the probable minimal gain.

The suggestion that, apart from the expected utility, it is also necessary to take into account at least the variance, and possibly the higher moments, of the utility distribution was already made by Allais (1952, 1953) and Georgescu-Roegen (1953). Allais constructed his famous paradox for the sole purpose of demonstrating the need for 'variance preferences'. The Allais paradox goes as follows. Assuming linear utilities for the value of money, we have a bet with the following utility distributions:

P(u | D_1):  p_1 = 1,  u_1 = 1,000,000   (9.1)

and

P(u | D_2):  p_1 = 0.5, u_1 = 0;   p_2 = 0.5, u_2 = 4,000,000   (9.2)
Based on the utility distributions (9.1) and (9.2), people tend to prefer decision D_1 over D_2, even though the expected utility value under decision D_1 is much smaller than under decision D_2,

E(u | D_1) = 1,000,000 < 2,000,000 = E(u | D_2)   (9.3)

which is in contradiction to the basic premise of expected utility theory that people will choose the decision which maximizes the expected utility. Allais attributed this to the fact that the variance under decision D_1 is zero, while under decision D_2 it is much greater than zero, reflective of the fact that there is a very real chance of not winning anything,

std(u | D_1) = 0 < 2,000,000 = std(u | D_2)   (9.4)
So, people not only try to maximize the expected utility, they also take into account the variances of the respective utility distributions. Hence the name variance preferences, that is, preferences between decisions based upon the variances or, equivalently, the standard deviations of the utility distributions. We quote Edwards' appreciation of the concept behind the 'variance preferences' suggestion, (Edwards, 1954):

There are instances in which this argument seems convincing. You would probably prefer the certainty of a million dollars to a 50-50 chance of getting either four million or nothing. I do not think that this preference is due to the fact that the expected utility of the 50-50 bet is less than the utility of one million dollars to you, although this is possible. A more likely explanation is simply that the variances of the two propositions are different. Evidence in favour of this is the fact that if you knew you would be offered this choice 20 times in succession, you would probably take the 50-50 bet each time.
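As a small illustration (with linear utilities, as in the paradox itself), the k-sigma lower bound criterion of the preceding chapter, applied to (9.1) and (9.2), prefers the sure million over the 50-50 bet for any cautiousness level k greater than 1/2:

```python
import math

def mean_std(dist):
    """Mean and standard deviation of a discrete utility distribution [(p, u), ...]."""
    mean = sum(p * u for p, u in dist)
    var = sum(p * (u - mean) ** 2 for p, u in dist)
    return mean, math.sqrt(var)

D1 = [(1.0, 1_000_000)]              # (9.1): the sure million
D2 = [(0.5, 0), (0.5, 4_000_000)]    # (9.2): the 50-50 bet

for k in (0.0, 0.25, 0.5, 1.0, 2.0):
    (m1, s1), (m2, s2) = mean_std(D1), mean_std(D2)
    lb1, lb2 = m1 - k * s1, m2 - k * s2    # k-sigma lower bounds
    print(f"k = {k}: lower bound D1 = {lb1:.0f}, D2 = {lb2:.0f} ->",
          "prefer D1" if lb1 > lb2 else "prefer D2")
```

The expected values alone, (9.3), would always favour D_2; it is the standard deviation term, (9.4), that produces the commonly observed preference for the sure outcome.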
Furthermore, Allais also proposed to use the Weber-Fechner law to assign utilities to monetary stimuli, the stimulus of choice in betting problems. We can only guess why these two simple ideas, variance preferences and the Weber-Fechner law, are not the current standard in utility theory. A first possible explanation might be that Allais' articles are written in French (Allais, 1952, 1953), making them relatively inaccessible to the Anglo-Saxon scientific community, whereas (Georgescu-Roegen, 1953) is a paper which was read at the Econometric Society meeting in Kingston and cannot be found online. The second explanation is that Edwards deemed the problem of finding the Weber constant q for monetary stimuli to be insurmountable. We quote, (Edwards, 1954):
The dollar scale of the value of money is so thoroughly taught to us that it seems almost impossible to devise a psychophysical situation in which subjects would judge the utility, rather than the dollar value, of dollars.
But as we have shown, (8.9) through (8.15), all references to this unknown Weber constant q fall away. Consequently, the actual value of the Weber constant q is not an issue. Stated differently, we think that Edwards was too quick in his dismissal, on practical grounds, of Allais' suggestion of variance preferences, which he endorsed in principle (as the first quotation shows). Had Edwards realized that the Weber constant q, being only a scaling factor which falls away in the decision inequalities, is of no real importance, then the decision theoretical landscape might have been radically different from the way it is today, since it was Edwards' PhD students who brought the psychological paradigm of heuristics and cognitive biases into this world, Kahneman, Tversky, Slovic, and Lichtenstein being amongst the most prominent of them[5].

Furthermore, we are of the opinion that Edwards may have been too timid in regards to the Weber constant. Introspection would suggest that if we have a monthly expendable income of a thousand Euros, for groceries and the like, then a loss or gain of an amount of less than ten Euros would not move us that much. Ten Euros then constitutes a Just Noticeable Difference for an initial wealth of a thousand Euros. The implied Weber constant for money would then be q = 100. As a consequence, 10 Euros to someone who has a monthly expendable income of 1000 Euros holds the same worth as 50 Euros to someone who has a monthly expendable income of 5000 Euros.

As an aside, Allais' paradox stands prominent among the paradoxes often used to dismiss the expected utility framework. This is highly ironic, as it is Allais himself who showed us the way out, by pointing out that, together with the expected value, the variances and the higher order moments of the utility probability distributions should also be taken into account, and by proposing to use the Weber-Fechner law to assign utilities to monetary stimuli.
[5] Though Edwards himself was never too happy about the implicit irrationality assumption of the psychological paradigm, (Phillips and von Winterfeldt, 2006), which is nihilistic in its heavy emphasis on the imperfections in human decision making processes.
10 Discussion

The objective of this research was to construct a general decision theoretic framework wherein, by integrating the mathematical and the social scientific approaches, both rationality and irrationality may take their proper places. This has resulted in the Bayesian decision theoretical framework, which is a re-appreciation of Allais' two suggestions of variance preferences and of the use of the Weber-Fechner law to assign utilities to monetary stimuli[6]. This re-appreciation is done in a purely Bayesian context.

In the introduction we stated that the mathematical equations, that is, the product and sum rule of Bayesian probability theory or, equivalently, the plausible syllogisms, Chapter 2, only provide us with the structural machinery of the decision making process, whereas the particular inputs that are fed into this mathematical structure are subjective and, therefore, open to psychological insights. For example, in a data analysis, which is a hyper-rational endeavour, a Bayesian will typically assign his probabilities on the basis of consistency requirements. This, however, does not mean that the Bayesian is normative in thinking that humans ought to assign their probabilities by way of these rational consistency requirements, though sometimes they do, for example by assigning fifty-fifty probabilities to the head-or-tails propositions. But he is normative in that he is of the conviction that, once we have assigned our probabilities, rational inference or, equivalently, common sense can only be modelled by applying the product and sum rule to these probabilities. As an example of an irrational probability assignment we can point to the Chernobyl case, Chapter 2. In this particular instance the possibility of a reactor explosion had such a highly negative emotional content for the reactor crew chief in charge that he went into denial and consequently assigned a probability of zero to this eventuality, which literally was too horrible to conceive, even in the face of overwhelming evidence pointing to the contrary.

On the utilities side, we have the psychophysical Weber-Fechner law of stimuli perception as our guiding principle to assign utilities to monetary outcomes, Chapter 7. An example of an 'irrational' utility assignment would then be the case where, for example, a banker assigns a Weber constant of q = 0 to the utilities of his clients' money, assigning it a value of zero, while at the same time assigning a Weber constant of q > 0 to the utility of the bank's profits, assigning it a value much greater than zero. Although this seems to be more a case of simple prioritization than one of irrationality per se.

[6] Note that Allais only stated that decisions should be based on some functional of the utility distributions. But he never specified which functional and, consequently, never proceeded to work out his suggestion of 'variance preferences'. We, on the other hand, in this deliverable do commit to a functional, that is, the upper and lower bounds of the utility distributions, and then proceed to work out the consequences of the phenomenon of variance preferences.
Appendix 2: Information Theory and Risk Communication

1 Introduction

In Bayesian probability theory we assign probabilities to propositions and then operate on these probabilities by way of the probability theoretic product and sum rules. Bayesian probability theory has now been supplemented with an extended information theory or, equivalently, an inquiry calculus, (Knuth, 2002, 2003, 2004, 2008, 2010). In the extended information theory we assign relevances to sources of information and then operate on these relevances by way of the information theoretic product and sum rules. This new information theory, which is Bayesian in its outlook, constitutes an expansion of the canvas of rationality (Skilling, 2008) and, consequently, of the range of psychological phenomena which are amenable to mathematical analysis.

For example, say there is the possibility of some danger; then information theory allows us to assign, in a rational manner, relevances to statements made by officials in regards to that danger. It is then found that the unbiasedness and competence of that official source are directly related to its relevance. A high probability of unbiasedness and competence implies a correspondingly high relevance, and a low probability of unbiasedness and competence implies a low relevance. This mathematically derived result is in close correspondence with social scientific findings (Slovic, 1999). Since unbiasedness and competence turn out to be the necessary boundary conditions for relevance, we find that risk communicators, in order for their message to be effective (that is, relevant), should not only focus on the message itself, but should also take great care to manage their perceived unbiasedness and competence with the public at large. For it is rationality itself which, given a low perceived probability of unbiasedness and competence of the risk communicators, dictates that the public disregard that which is communicated to them.
2 Information Theory New Style

Information theory is still a very young scientific discipline. The first rudimentary building blocks of information theory were laid in 1948, with Shannon's work on information entropy, and only very recently, in 2009, with the derivation of the information theoretic equivalent of Bayes' Theorem, do we have an information theoretic framework of any generality[7]. We will give here a brief overview of information theory by introducing the most important information theoretic concepts in their chronological order of discovery.
Information Theory: The First Phase, 1948-1963

Information theory started in 1948 with Shannon's formal derivation of the information entropy, (Shannon, 1948):

H(p_1, ..., p_m) = p_1 log(1/p_1) + ... + p_m log(1/p_m)   (2.1)

as a measure of the amount of uncertainty in the probability distribution p_1, ..., p_m. In order to get some feeling for the abstract statement 'the amount of uncertainty in a probability distribution p_1, ..., p_m', we start by discussing the building blocks that make up the information entropy H.

The probability distribution p_1, ..., p_m represents our state of knowledge in regards to the plausibility of each of the m possible outcomes we are considering, and the terms log(1/p_j), j = 1, ..., m, in (2.1) are called the 'surprises', (Tribus, 1961). If the k-th outcome is almost certain, that is, if p_k ≈ 1, then the corresponding surprise goes to zero, as we observe this outcome:

surprise(k-th outcome) = log(1/p_k) ≈ log(1/1) = log 1 = 0

[7] In contrast, the main foundations of probability theory were already worked out in great detail by Laplace in 1800, who basically was the first 'Bayesian', (Jaynes, 1976).
To illustrate, if the sun rises in the morning, then we will not be that surprised, seeing that the sun always rises. Alternatively, if the k-th outcome is almost impossible, that is, if p_k ≈ 0, then the corresponding surprise will go to infinity:

surprise(k-th outcome) = log(1/p_k) ≈ log(1/0) = ∞
Imagine our surprise should the sun not rise.

If the probability distribution p_1, ..., p_m is our state of knowledge in regards to the m possible outcomes of some event, then Shannon's entropy H, (2.1), is the expected, that is, mean, surprise we expect to experience after the outcome of that event is presented to us:

H = ⟨surprise⟩ = ⟨log(1/p)⟩   (2.2)

where the notation '⟨x⟩' stands for the mean or, equivalently, the expectation value of x. This
is what is meant by measuring the amount of uncertainty in a probability distribution, (2.1). For example, if the chances of recovery of a terminal patient are negligible, with a probability of effectively 0, and the chance of death is 1, then the amount of uncertainty in regards to the outcome is zero:

H = 1 log(1/1) + 0 log(1/0) = 0   (2.3)
Now, if the chances of recovery are fifty-fifty, then the amount of uncertainty in regards to the outcome of this event is

H = (1/2) log 2 + (1/2) log 2 = log 2 ≈ 0.693   (2.4)
For two-outcome events, (2.3) and (2.4) give, respectively, the minimum and maximum possible entropies H. The latter is in accordance with our intuition of uncertainty: the statement 'fifty-fifty' is our way of saying that we do not have any clue whatsoever as to the eventual outcome of some two-valued event.

In general, if we have m equally probable outcomes, then the amount of uncertainty becomes

H(1/m, ..., 1/m) = (1/m) log m + ... + (1/m) log m = log m   (2.5)

So, as the number m of equally probable outcomes of some event goes to infinity, our uncertainty in regards to the specific outcome will also go to infinity:

H = log m → ∞,   as m → ∞   (2.6)

Property (2.6) is in accordance with our intuition on uncertainty: as the number of equally probable possible outcomes increases, we become more uncertain in regards to the eventual outcome.
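As a quick numerical illustration of (2.1) and of the examples (2.3) through (2.5), a few lines of Python suffice (natural logarithms, with the usual convention that terms with p_i = 0 contribute nothing):

```python
import math

def shannon_entropy(p):
    """Shannon entropy (2.1): the mean surprise, sum_i p_i log(1/p_i), in nats."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0.0)

print(shannon_entropy([1.0, 0.0]))       # (2.3): certain outcome -> 0.0
print(shannon_entropy([0.5, 0.5]))       # (2.4): fifty-fifty -> log 2 = 0.693...
m = 1000
print(shannon_entropy([1.0 / m] * m))    # (2.5): m equally probable outcomes -> log m = 6.907...
```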
Information Theory: The Second Phase, 1951-2002

Information theory may be used to quantify the information gain of performing some experiment or, equivalently, asking some question or performing some test. This is done by generalizing the uncertainty measure H, (2.1), to the distance measure H, (Kullback, 1951):

H(p_1, ..., p_m; π_1, ..., π_m) = p_1 log(p_1/π_1) + ... + p_m log(p_m/π_m)   (2.7)

The measure H is known by various names, one of which is the cross-entropy, (Sivia, 1996). The cross-entropy is a measure of the distance between the prior distribution π_1, ..., π_m, describing our initial ignorance, and the posterior distribution p_1, ..., p_m, describing the information which the data of the experiment has given us. The greater the distance H, the greater the information gain, as captured in the post-experiment posterior, relative to the pre-experiment situation, as captured in the prior.
We may quantify the information gain of performing some experiment as follows. The experiment that promises to be the most informative is the one which is expected to generate the greatest mean cross-entropy ⟨H⟩. In what follows we will derive the mean cross-entropy for a medical example.

Let A_1, ..., A_m be a set of, say, competing medical diagnoses of a disease. Let E_1, ..., E_K be the set of medical tests we are contemplating. Let D_1^k, ..., D_{n_k}^k be the set of possible outcomes for a given test E_k. Let p(A_j) be the prior probabilities we assign to the diseases A_j, and let p(A_j | D_i^k E_k) be the posterior probabilities of the diseases A_j after having observed the data-point D_i^k in a given test E_k. Then, taking some notational liberties, the cross-entropy associated with data-point D_i^k in a given test E_k is, (2.7),

H(D_i^k | E_k) = Σ_j p(A_j | D_i^k E_k) log[ p(A_j | D_i^k E_k) / p(A_j) ]   (2.8)

Let p(D_i^k | A_j E_k) be the likelihoods of the data-points D_i^k in a given test E_k and given that one has the disease A_j. The (marginal) probabilities of the data-points D_i^k then are

p(D_i^k | E_k) = Σ_j p(D_i^k | A_j E_k) p(A_j)   (2.9)

So, the mean cross-entropy for a given test E_k is the sum of the cross-entropies for the data-points D_i^k, (2.8), weighted by the probabilities of these data-points, (2.9):

⟨H(E_k)⟩ = Σ_{i_k} p(D_i^k | E_k) H(D_i^k | E_k)   (2.10)

If we compute (2.10) for all the K medical tests, then we get K mean cross-entropies

⟨H(E_1)⟩, ⟨H(E_2)⟩, ..., ⟨H(E_K)⟩

and the medical test which has the largest mean cross-entropy is the test that promises to give us the most information as to which of the competing diseases A_j is the most probable (Sivia, 1996).

The mean cross-entropy (2.10) may be rewritten as a sum of Shannon information entropies, (2.1). By way of (2.8), (2.9), and (2.10), we have
⟨H(E_k)⟩ = Σ_{i_k} p(D_i^k | E_k) Σ_j p(A_j | D_i^k E_k) log[ p(A_j | D_i^k E_k) / p(A_j) ]

= Σ_{i_k} Σ_j p(A_j D_i^k | E_k) log[ p(A_j D_i^k | E_k) / ( p(A_j) p(D_i^k | E_k) ) ]

= Σ_j p(A_j) log[ 1/p(A_j) ] + Σ_{i_k} p(D_i^k | E_k) log[ 1/p(D_i^k | E_k) ] - Σ_{i_k} Σ_j p(A_j D_i^k | E_k) log[ 1/p(A_j D_i^k | E_k) ]   (2.11)

Then, by way of (2.1) and (2.11), we have

⟨H(E_k)⟩ = H[ p(A_j) ] + H[ p(D_i^k | E_k) ] - H[ p(A_j D_i^k | E_k) ]   (2.12)
The mean cross-entropy in the form of (2.12) is also known as the Mutual Information, (Knuth, 2008). We summarize: the more general definition of cross-entropy, (2.7), through equations (2.8) to (2.10), has given us the identity (2.12), which is a function of the initial information entropies (2.1). In the next section we will give a further generalization of information theory. This generalization will allow us to find identities like (2.12) in a relatively effortless manner. This is done by combining the information entropies (2.1) themselves, as opposed to having to go through elaborate chains of inference as is done, for example, in equations (2.8) through (2.10).
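To make the recipe of (2.8) through (2.10) concrete, the sketch below scores two diagnostic tests by their mean cross-entropy, using the mutual information form (2.12); the prior and the likelihoods are invented toy numbers, chosen purely for illustration.

```python
import math

def entropy(p):
    """Shannon entropy (2.1), in nats."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0.0)

prior = [0.3, 0.7]    # hypothetical prior p(A_j) over two competing diagnoses

# Hypothetical likelihoods p(D_i^k | A_j E_k); rows are data-points, columns are diagnoses.
tests = {
    "E_1": [[0.9, 0.2],
            [0.1, 0.8]],
    "E_2": [[0.6, 0.5],
            [0.4, 0.5]],
}

def mean_cross_entropy(likelihood, prior):
    """Mean cross-entropy of (2.10), computed via (2.12): H(A) + H(D) - H(A, D)."""
    joint = [[lij * prior[j] for j, lij in enumerate(row)] for row in likelihood]
    p_data = [sum(row) for row in joint]    # marginal probabilities (2.9)
    return (entropy(prior) + entropy(p_data)
            - entropy([x for row in joint for x in row]))

for name, likelihood in tests.items():
    print(name, "expected information gain:", round(mean_cross_entropy(likelihood, prior), 4))
```

The test with the larger value is the one that promises the most information about the competing diagnoses.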
Information Theory: The Third Phase, 2001-now

Information theory has entered a new phase with the work of Knuth, (2002, 2003, 2004, 2008, 2010). In Knuth's inquiry calculus, relevances are assigned to questions or, equivalently, to tests or experiments. These relevances are always defined relative to some central issue. This central issue is not necessarily the issue of interest; the central issue is the baseline, in that it is the question which, when answered, will fill in all the unknowns. For example, in the previous section there were m possible diseases A_j and, for the k-th test, n_k possible observations D_i^k. Then the central issue I is the joint question, Q_A in conjunction with Q_D^k:

which of the n_k × m conceivable states A_1 D_1^k, A_2 D_1^k, ..., A_m D_{n_k}^k will we observe?

Let Q_A be the question:

which of the disease diagnoses A_1, ..., A_m should we make?

and let Q_D^k be the question:

which of the data-points D_1^k, ..., D_{n_k}^k will we observe if we perform the k-th test?

Then the relevances are defined as the Shannon entropies of Q_A, Q_D^k, and Q_A Q_D^k, scaled by the Shannon entropy of the central issue I, (Knuth, 2004):

d(Q_A | I) = H[ p(A_j) ] / H[ p(A_j D_i^k | E_k) ]

d(Q_D^k | I) = H[ p(D_i^k | E_k) ] / H[ p(A_j D_i^k | E_k) ]   (2.13)

d(Q_A Q_D^k | I) = H[ p(A_j D_i^k | E_k) ] / H[ p(A_j D_i^k | E_k) ] = 1

where it is understood that the entropies H are measures on probability distributions. For example, (2.1),

H[ p(A_j) ] = Σ_j p(A_j) log[ 1/p(A_j) ]

We summarize: questions Q are the collection of all the statements that will answer that question (Knuth, 2008). Relevances d may be assigned to questions Q by computing the entropies H of the probability distributions p which assign probabilities to all the statements that will answer these questions. These entropies H are then scaled by the entropy of the central issue I, which is the join of all the questions Q taken together.

Now, how do we operate on the relevances given in (2.13)? Thanks to the efforts of Knuth, a sum and a product rule for the relevances of questions have been derived, as the sufficient and necessary operators on relevances. The sum rule for relevances was already given in 2001, (Knuth, 2002). The product rule, though already conjectured in 2001, (Knuth, 2002), was only formally derived in 2009, (Center, 2010). So, only as recently as 2009 does information theory have its own 'Bayes Theorem'. This is no trivial thing. Ever since Laplace's work in 1800, Bayesians have been able to effortlessly operate on probabilities by way of the product and sum rule. This ease of use and generality has now been extended to information theory. We proceed to demonstrate this ease of use.
The sum rule of relevances states that

d(Q_A + Q_D^k | I) = d(Q_A | I) + d(Q_D^k | I) - d(Q_A Q_D^k | I)   (2.14)
where '+' is the logical 'OR'. The product rule of relevances states that

d(Q_A | I) d(Q_D^k | Q_A) = d(Q_A + Q_D^k | I) = d(Q_D^k | I) d(Q_A | Q_D^k)   (2.15)
which gives the 'Bayes Theorem' of information theory:

d(Q_D^k | Q_A) = d(Q_A + Q_D^k | I) / d(Q_A | I) = d(Q_D^k | I) d(Q_A | Q_D^k) / d(Q_A | I)   (2.16)

Note that d(Q_D^k | Q_A) is the relevance we are looking for: namely, the relevance of the test results of the k-th test, that is, the D_i^k, for the diagnosis of the diseases A_j.
By substituting the relevances (2.13) into the sum rule (2.14), we find

d(Q_A + Q_D^k | I) = { H[ p(A_j) ] + H[ p(D_i^k | E_k) ] - H[ p(A_j D_i^k | E_k) ] } / H[ p(A_j D_i^k | E_k) ]   (2.17)
Substituting (2.13) and (2.17) into (2.16), that is, into the reshuffled product rule (2.15), we obtain

d(Q_D^k | Q_A) = { H[ p(A_j) ] + H[ p(D_i^k | E_k) ] - H[ p(A_j D_i^k | E_k) ] } / H[ p(A_j) ]   (2.18)

Substituting (2.12) into (2.18), we see that the relevance of interest is just the scaled mean cross-entropy, which was earlier obtained by going through a rather tortuous line of reasoning, equations (2.8) through (2.10):

d(Q_D^k | Q_A) = ⟨H(E_k)⟩ / H[ p(A_j) ]   (2.19)
Note that the scale H[ p(A_j) ] is the same for all tests E_1, ..., E_K. So, the ordering between the mean cross-entropies

⟨H(E_1)⟩, ⟨H(E_2)⟩, ..., ⟨H(E_K)⟩

is equivalent to the ordering between the conditional relevances

d(Q_D^1 | Q_A), d(Q_D^2 | Q_A), ..., d(Q_D^K | Q_A)

So, we may reinterpret (2.12) as choosing that test E_k which has the highest relevance relative to the issue of interest Q_A, which is the question:

which of the disease diagnoses A_1, ..., A_m should we make?

This concludes our introduction to the new and extended information theory. We have demonstrated how each phase in the development of information theory has been a steady generalization of the principles that were found earlier, so that the earlier principles are found to be special cases of the later generalizations. Which is as it should be.
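To see the bookkeeping of (2.13) through (2.19) in action, the sketch below evaluates the relevances for one hypothetical test with two diagnoses and two possible outcomes (the joint probabilities are invented toy numbers) and checks that the conditional relevance delivered by the sum and product rules coincides with the scaled mean cross-entropy of (2.19).

```python
import math

def entropy(p):
    """Shannon entropy (2.1), in nats."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0.0)

# Hypothetical joint probabilities p(A_j D_i | E_k) for two diagnoses and two data-points.
joint = [[0.27, 0.03],    # A_1 with D_1, D_2
         [0.14, 0.56]]    # A_2 with D_1, D_2

p_A = [sum(row) for row in joint]          # marginal over the diagnoses
p_D = [sum(col) for col in zip(*joint)]    # marginal over the data-points
H_A = entropy(p_A)
H_D = entropy(p_D)
H_AD = entropy([x for row in joint for x in row])

d_A = H_A / H_AD          # d(Q_A | I), (2.13)
d_D = H_D / H_AD          # d(Q_D | I), (2.13)
d_AD = H_AD / H_AD        # d(Q_A Q_D | I) = 1, (2.13)

d_or = d_A + d_D - d_AD   # sum rule (2.14): d(Q_A + Q_D | I)
d_D_given_A = d_or / d_A  # product rule, rearranged as in (2.16): d(Q_D | Q_A)

print(d_D_given_A, (H_A + H_D - H_AD) / H_A)    # the two numbers agree, cf. (2.19)
```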
3 Information Theory and Risk Communication

If there is the possibility of some danger, then information theory allows us to assign, in a rational manner, relevances to statements made by officials in regards to that danger. It is then found that the competence and trustworthiness of that official source are directly related to its relevance. High competence and trustworthiness imply a correspondingly high relevance, and low competence and trustworthiness imply a low relevance. This mathematically derived result is in close correspondence with previous social scientific findings (Slovic, 1999). Since competence and trustworthiness turn out to be the necessary boundary conditions for relevance, we find that risk communicators, in order for their message to be effective (that is, relevant), should not only focus on the message itself, but should also take great care to manage their perceived competence and trustworthiness with the public at large. For it is rationality itself which, given a low perceived competence and trustworthiness of the risk communicators, dictates that the public disregard that which is communicated to them.
3.1 The Importance of Unbiasedness

Using information theory or, equivalently, inquiry calculus, we will demonstrate the importance of a source of information being unbiased in order for it to be relevant. Say some disaster has occurred, for example the nuclear accident in Fukushima. Then the propositions we will use are

D = Danger,   ¬D = No danger
W = Warning,   ¬W = All-clear
U = Unbiased,   ¬U = Biased

If our source of information is unbiased, a warning will be given in case of danger and an all-clear if there is no danger. Then we have the following probabilities:
p(W | D U) = 1,   p(¬W | D U) = 0
p(W | ¬D U) = 0,   p(¬W | ¬D U) = 1   (3.1)

If our source of information is biased, in that an all-clear will be given even if there is a clear and present danger, then we have the following probabilities:

p(W | D ¬U) = 0,   p(¬W | D ¬U) = 1
p(W | ¬D ¬U) = 0,   p(¬W | ¬D ¬U) = 1   (3.2)

Personal prior probabilities are assigned to the possibility of there being a dangerous situation:

p(D) = d,   p(¬D) = 1 - d   (3.3)

Personal prior probabilities are also assigned to the possibility of the source of information being unbiased:

p(U) = u,   p(¬U) = 1 - u   (3.4)
Combining (3.1) through (3.4), by way of the product rule of probability theory[8], we find:

p(W D U) = d u,   p(W D ¬U) = 0,   p(¬W D U) = 0,   p(¬W D ¬U) = d (1 - u)
p(W ¬D U) = 0,   p(W ¬D ¬U) = 0,   p(¬W ¬D U) = (1 - d) u,   p(¬W ¬D ¬U) = (1 - d)(1 - u)   (3.5)

Using the sum rule of probability theory[9], we find from (3.5)

p(W D) = d u,   p(¬W D) = d (1 - u),   p(W ¬D) = 0,   p(¬W ¬D) = 1 - d   (3.6)

as well as

p(W) = d u,   p(¬W) = 1 - d u   (3.7)

The central issue is Q_W Q_D. We want to find the relevance of the warning signal, W or ¬W, in relation to the dangerousness of the current situation, D or ¬D. So, the issue of interest is Q_D, and we are looking for the conditional relevance of the signal given by our source of information, Q_W, in relation to this issue of interest. We have, (2.16),

[8] For example, p(W D U) = p(W | D U) p(D) p(U) = d u.
[9] For example, p(W D) = p(W D U) + p(W D ¬U) = d u.
d(Q_W | Q_D) = d(Q_W + Q_D | I) / d(Q_D | I)   (3.6)

where, (2.14),

d(Q_W + Q_D | I) = d(Q_W | I) + d(Q_D | I) - d(Q_W Q_D | I)   (3.7)

Substituting (3.7) into (3.6), we obtain

d(Q_W | Q_D) = { d(Q_W | I) + d(Q_D | I) - d(Q_W Q_D | I) } / d(Q_D | I)   (3.8)
The right-hand relevances in (3.8) can be found to be, (2.13),

d(Q_W Q_D | I) = H[ p(W D), p(W ¬D), p(¬W D), p(¬W ¬D) ] / H[ p(W D), p(W ¬D), p(¬W D), p(¬W ¬D) ] = 1

d(Q_W | I) = H[ p(W), p(¬W) ] / H[ p(W D), p(W ¬D), p(¬W D), p(¬W ¬D) ]   (3.9)

d(Q_D | I) = H[ p(D), p(¬D) ] / H[ p(W D), p(W ¬D), p(¬W D), p(¬W ¬D) ]
H
p WD ,
, p W D
1 du log
1 d
1
log
1 d
du
H
p W , p W
1 du log
1 du
log
du
H
p D , p D
1 d log d
1 d
log
d 1 u log
1 1 du
1 d 1 u
(3.10)
1 1 d
Substituting (3.9) and (3.10) into (3.8), and making use of the logarithmic property log(1/c) = -log c, we obtain, after some algebra,

d(Q_W | Q_D) = { d log(1/d) + (1 - d u) log(1/(1 - d u)) - d (1 - u) log(1/(d (1 - u))) } / { d log(1/d) + (1 - d) log(1/(1 - d)) }   (3.11)
Inspecting (3.11), we see that as the probability of unbiasedness goes to one, that is, u → 1, the relevance of the source of information goes to one as well, (3.11):

d(Q_W | Q_D) → { d log(1/d) + (1 - d) log(1/(1 - d)) } / { d log(1/d) + (1 - d) log(1/(1 - d)) } = 1   (3.12)
While, as the probability of unbiasedness goes to zero, that is, u → 0, the relevance of the source of information goes to zero as well, (3.11):

d(Q_W | Q_D) → { d log(1/d) + 0 - d log(1/d) } / { d log(1/d) + (1 - d) log(1/(1 - d)) } = 0   (3.13)
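A few lines of Python (natural logarithms assumed throughout) make the limiting behaviour (3.12) and (3.13) tangible, and the same function can be used to check the worked values of Section 3.3:

```python
import math

def relevance(d, u):
    """Relevance d(Q_W | Q_D) of a possibly biased source, equation (3.11)."""
    eps = 1e-12    # numerical guard against log(0) at the boundaries
    num = (d * math.log(1.0 / max(d, eps))
           + (1.0 - d * u) * math.log(1.0 / max(1.0 - d * u, eps))
           - d * (1.0 - u) * math.log(1.0 / max(d * (1.0 - u), eps)))
    den = (d * math.log(1.0 / max(d, eps))
           + (1.0 - d) * math.log(1.0 / max(1.0 - d, eps)))
    return num / den

print(relevance(0.5, 0.999))   # u close to 1: relevance close to 1, cf. (3.12)
print(relevance(0.5, 0.001))   # u close to 0: relevance close to 0, cf. (3.13)
print(relevance(0.5, 0.20))    # about 0.108, the short term value used in Section 3.3
print(relevance(0.9, 0.20))    # about 0.065, the long term value used in Section 3.3
```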
3.2 What Does it Mean?

Both the warning and the all-clear, that is, W and ¬W, are signals sent by the risk communicators to the public at large. Let

p(D) = d   (3.14)

be the initial danger assessment, prior to receiving the signal, of a receiver. If the risk communication is optimal, then this signal will modify the belief of the receiver to the extent that a warning will imply with certainty the presence of danger, that is,

p(D | W) = 1   (3.15)

whereas an all-clear will imply with certainty the absence of danger, that is,

p(D | ¬W) = 0   (3.16)
irrespective of the initial danger assessment (3.14).

The scenario where there is a possible bias in the direction of not giving a warning, even if there is a clear and present danger, is expressed in the probabilities (3.1) through (3.7). By way of the product rule of probability theory and (3.6) and (3.7), we find that a warning is always communicated successfully[10]:

p(D | W) = p(W D) / p(W) = d u / (d u) = 1   (3.17)

as (3.17) satisfies the communication ideal of (3.15). However, in the case of an all-clear, we have that the risk communication is dependent upon the perceived unbiasedness u of the source of information, (3.4):

p(D | ¬W) = p(¬W D) / p(¬W) = d (1 - u) / (1 - d u)   (3.18)

As the probability of unbiasedness goes to one, that is, u → 1, the relevance of the source of information goes to one, (3.12), while

p(D | ¬W) = d (1 - u) / (1 - d u) → 0   (3.19)

[10] To give an analogy, if the tobacco industry tells us that smoking causes lung cancer, then we will be more than inclined to believe them. We expect them to be biased, but only to the extent that they will try to deny the causal effect between smoking and lung cancer. So, if the tobacco industry says that smoking is bad for you, then it is safe to assume that this is indeed the case.
thus satisfying the communication ideal (3.16). But as the probability of unbiasedness goes to zero, that is, u → 0, the relevance of the source of information goes to zero, (3.13), while

p(D | ¬W) = d (1 - u) / (1 - d u) → d   (3.20)

or, equivalently, (3.20) and (3.14),

p(D | ¬W) = p(D)   (3.21)

And we see that, for this particular scenario, a relevance of zero implies the inability of the all-clear signal to move the danger perception of the receiver away from its prior pre-signal state and in the direction of the communication ideal (3.16).
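The posterior danger assessments (3.17) through (3.20) are a one-line computation each. The sketch below evaluates (3.18) for a few (d, u) pairs, including the ones used in Section 3.3, to show how strongly the effect of an all-clear depends on the perceived unbiasedness:

```python
def danger_after_all_clear(d, u):
    """p(D | all-clear) = d(1 - u) / (1 - d u), equation (3.18)."""
    return d * (1.0 - u) / (1.0 - d * u)

# A warning, by contrast, is always communicated successfully: p(D | W) = 1, (3.17).
for d, u in [(0.5, 0.20), (0.9, 0.20), (0.5, 0.99), (0.5, 0.01)]:
    print(f"d = {d}, u = {u}: p(D | all-clear) = {danger_after_all_clear(d, u):.3f}")
```

For u close to one the all-clear drives the danger assessment towards zero, (3.19); for u close to zero it leaves the prior d essentially untouched, (3.20) and (3.21).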
3.3 Truth or Dare

In 'Truth or Dare in Japan', Correspondents Report, November 5, 2011, by Mark Willacy, we can read the following account:
A Japanese government official was dared by a journalist to drink a glass of water taken from a puddle inside the Fukushima nuclear plant. But this wasn't some glowing green liquid concoction that would turn the hapless official into the Incredible Hulk. It was water from the basement of reactors 5 and 6 at Fukushima, both of which were shut down successfully after the tsunami hit the plant in March unlike three of the other reactors which each suffered meltdowns. The water had been purified but because of fears it was still slightly contaminated it was deemed too unsafe to release outside the grounds of the Fukushima plant [by the journalists gathered at the press conference]. The TV cameras showed Yasuhiro Sonoda's hands shaking as he poured the water into the glass. The government MP and parliamentary secretary to the cabinet looked like a man about to drink poison. He took a gulp, held the glass out once more for the assembled press, and then skulled the rest.
How relevant was Mr. Sonoda's gesture for our risk assessment, in terms of alleviating our fears regarding the dangers of the decontaminated water? Common sense would suggest: not very much. And indeed, we can read in the same article:
The ABC's North Asia correspondent Mark Willacy watched this feat of daring and wonders whether it was a publicity stunt, or just a dare between testosterone-fuelled men.
We will now look at this incident from an information theoretic perspective. The drinking of decontaminated water is just another way of giving the all-clear signal ¬W. If we look at the BBC report, then it is stated that Mr. Sonoda had been challenged repeatedly, in the course of a five hour press conference, to prove that what he was saying was true, namely that the decontaminated water was safe for use around the Fukushima plant. Seeing that Mr. Sonoda had committed himself to the decontamination being safe, and had subsequently been called out to prove so, we suspect that had Mr. Sonoda not drunk the water this would have led to a serious loss of face. Furthermore, this would then also have amounted to giving a warning signal W as to the safety of the water, forcing the public to update their danger assessment, irrespective of their initial danger assessment, to, (3.17),

P(D) = d   updated to   p(D | W) = 1   (3.22)

which probably would not have been healthy for his political career. So, we assign a low probability of him being unbiased, that is, P(U) = 0.20 or, equivalently, u = 0.20, (3.4). Not being experts on the decontamination process, we are on the fence in regards to the short term safety of the decontaminated water. Stated differently, it may or may not be safe; we simply do not know. So, we assign as the probability of danger P(D) = 0.50 or, equivalently, d = 0.50, (3.3). Then the relevance of the act of either drinking or not drinking the decontaminated water, before the actual drinking, is, (3.11),

d(Q_W | Q_D) = 0.108   (3.23)

And after we have observed Mr. Sonoda drinking the water, that is, giving us the all-clear ¬W, our danger assessment prior to the drinking of the water is updated somewhat, (3.18),

P(D) = 0.50   updated to   P(D | ¬W) = 0.444   (3.24)
Note that had we assigned no credibility to the unbiasedness of Mr. Sonoda, that is, P(U) = 0 or, equivalently, u = 0, (3.4), then the relevance d(Q_W | Q_D) would have been zero, (3.13), and the reassurance of no danger, ¬W, would have failed to modulate our prior danger assessment in any way whatsoever, (3.21).

Based on the little knowledge of radiation we do have, we are less doubtful in regards to the long term dangers of the decontaminated water, as even small amounts of radiation will accumulate over time. So, we assign a much larger probability to the long term dangers, that is, we assign P(D) = 0.90 or, equivalently, d = 0.90, (3.3). The corresponding relevance then drops further to, (3.11),

d(Q_W | Q_D) = 0.065   (3.25)

And our assessment prior to the drinking of the water, that is, P(D) = 0.90, is updated somewhat, (3.18),

P(D) = 0.90   updated to   P(D | ¬W) = 0.878   (3.26)
A low relevance of a source of information implies the inability of that source to modulate the prior beliefs of those to whom the information is communicated. Conversely, if a source of information is unable to modulate the prior beliefs, then this implies a low relevance for that source of information. The fact that the relevances do not approach 0 much faster, given the small effects on the respective posteriors, (3.24) and (3.26), can be explained by the fact that the relevance values (3.23) and (3.25) also reflect the possibility that Mr. Sonoda could have chosen not to drink the decontaminated water. This would then have amounted to the sending of a warning signal W, which would have been highly informative, (3.22).

Based on this information theoretical analysis, we see that Mr. Sonoda was forced into a very unfortunate position of damage control. Had he not drunk the decontaminated water, then a clear red flag would have been raised as to the dangers involved. Whereas drinking the decontaminated water could hardly convince us of the safety of that water[11], given the very real possibility of biasedness.
3.4 How to Be Relevant

The relevance d(Q_W | Q_D) for the biasedness scenario, (3.11), may be plotted as a function of the prior perceived danger p(D) = d. For a high probability of unbiasedness, p(U) = 0.9 or, equivalently, u = 0.9, (3.4), we find the relevance function shown in Figure 3.1.

Figure 3.1: relevance function d(Q_W | Q_D) versus p(D), for p(U) = 0.9

In Figure 3.1 we see that, even in the case of great trust, that is, a high perceived likelihood of unbiasedness, p(U) = 0.9, the relevance of the source of information drops off as the perceived likelihood of danger p(D) increases. The source of information remains relevant for small perceived likelihoods of danger. But as the perceived likelihood of danger grows, the information theoretic equivalent of a panic sets in, and all reassurances of safety on the part of the trusted source of information are bound to fall on deaf ears, as its relevance eventually diminishes to zero in the face of imminent danger. Stated differently, as the perceived danger increases, there comes a tipping point at which one's own prior perception of danger becomes more relevant, thus making the source of information less relevant.
[11] As an aside, on 6 November 2011 it was reported that Otsuka Norikazu (63), one of the main newscasters on Fuji TV's 'Mezamashi TV', had been diagnosed with acute lymphoblastic leukemia. In his morning program on Fuji TV Mr. Norikazu had been promoting Fukushima produce by eating it in the show. The incidence of adult cases of lymphoblastic leukemia in Japan is one in 100,000 annually.
The modified danger assessment p(D | ¬W), (3.19), may also be plotted as a function of the prior perceived danger p(D) = d. For a high probability of unbiasedness, p(U) = 0.9 or, equivalently, u = 0.9, (3.4), we find the danger assessment modification function shown in Figure 3.2.

Figure 3.2: danger assessment modification p(D | ¬W) versus p(D), for p(U) = 0.9

In Figure 3.2 we see that, even in the case of great trust, that is, a high perceived likelihood of unbiasedness, p(U) = 0.9, the modified danger assessment moves away from the communication ideal p(D | ¬W) = 0, (3.16), as the perceived likelihood of danger p(D) increases.

For a low probability of unbiasedness, p(U) = 0.1 or, equivalently, u = 0.1, (3.4), we find the relevance function shown in Figure 3.3 and the danger assessment modification function shown in Figure 3.4.

Figure 3.3: relevance function d(Q_W | Q_D) versus p(D), for p(U) = 0.1

Figure 3.4: danger assessment modification p(D | ¬W) versus p(D), for p(U) = 0.1

In Figure 3.4 we see a demonstration of (3.21): low relevances imply the inability of the all-clear signal to move the danger perception of the receiver away from its prior pre-signal state p(D) and in the direction of the communication ideal p(D | ¬W) = 0, (3.16).
. The second factor is the perceived
likelihood of danger, that is, p D . Because even for a high trust in the unbiasedness of the source of information, large perceived likelihoods of danger may render that source of information irrelevant as panic sets in, that is, as
p D
1
.
The perceived likelihood of unbiasedness of the source of information, that is,
p U
,
will typically be influenced by the actions of the source of information. If the source of information is the government itself, then these actions may entail giving full disclosure, taking full responsibility, and distancing oneself from any suggestion of a conflict of interest, and so on. And the perceived likelihood of danger
p D
will typically be a function of the
state of knowledge one has regarding the specifics of the danger which is in play. For example, in the Fukushima nuclear accident case there are some doubts on the unbiasedness of the government. As they have been slow to give full disclosure and as there are perceived to be strong ties between the ‘nuclear village’, that is, the nuclear industry and the government. Furthermore, as there is a relatively large dread for the dangers of radiation (Slovic, 2002), most people will typically assign by default a high plausibility to the proposition that a nuclear accident entails a severe danger. Both these factors then make it difficult for the Japanese government to effectively communicate to the public at large that the dangers are not that catastrophic. However, the plausibility that a lay person assigns to the proposition that a nuclear accident entails severe danger will typically be diffuse, as he himself is no nuclear physicist. 116
Consequently, his plausibility
p D
may be swayed either way, if some authoritative source
of information he trusts as being unbiased like, say, a nuclear physicist or health specialist with no discernible ties to the nuclear industry, has some additional information to offer regarding these dangers.
3.5 The Importance of Competence

Using information theory or, equivalently, inquiry calculus, we will demonstrate the importance of a source of information being competent in order for it to be relevant. We operationalize competence by way of the concept of false positives and false negatives.

Let α be the probability of a false positive and β be the probability of a false negative. Then the conditional probabilities of either a warning or an all-clear, given either danger or no danger, are

p(W | D) = 1 - β,   p(W | ¬D) = α
p(¬W | D) = β,   p(¬W | ¬D) = 1 - α   (3.27)

The personal prior probabilities assigned to the possibility of there being a dangerous situation are again, (3.3),

p(D) = d,   p(¬D) = 1 - d   (3.28)

Combining (3.27) and (3.28) by way of the product rule of probability theory, we have

p(W D) = (1 - β) d,   p(W ¬D) = α (1 - d)
p(¬W D) = β d,   p(¬W ¬D) = (1 - α)(1 - d)   (3.29)

and

p(W) = (1 - β) d + α (1 - d),   p(¬W) = β d + (1 - α)(1 - d)   (3.30)

Note that the false positive and false negative scenario may pertain to medical tests, which are validated by way of their false positives and false negatives; to weather and terror alarms, which tend to favour false positives over false negatives, on the grounds that it is better to be safe than sorry; and so on.

Furthermore, if we set the probability of a false positive to zero, that is,

α = 0   (3.31)

and if we interpret the probability of a false negative β as the probability of biasedness, that is,

β = 1 - u   (3.32)

then the competence scenario collapses to the previous scenario, in which biasedness played an important role. This can be seen by substituting both (3.31) and (3.32) into (3.29) and (3.30) and comparing the result with (3.6) and (3.7). So, biasedness and incompetence admit the same inference and relevance structure[12].

[12] Hanlon's razor states that we ought never to attribute to malice that which can easily be explained by incompetence. The unstated corollary being that incompetence is a great way to hide malice.
It was found for the biasedness scenario that as the probability of unbiasedness goes to zero, that is, u → 0, the corresponding relevance goes to, (3.13),

d(Q_W | Q_D) → 0   (3.33)

This result can be restated in terms of false positives and negatives as follows. If the probability of a false positive is zero and the probability of a false negative goes to one, that is, α = 0 and β → 1, (3.32), the source of information will flat-line, in that only the no-warning signal ¬W can be given out, and hence the relevance of zero, (3.33).

By going through the steps (3.6) through (3.11), we may compute the relevance for the false positive and false negative scenario. It may be checked that if the source of information is as informative as a coin toss, that is, α = β = 0.5, then its relevance goes to zero, just as in the case α = 0 and β = 1, where there is a bias not to give a warning, or, for that matter, α = 1 and β = 0, which is the scenario where there is a bias to exaggerate the possibility of danger by always giving a warning. Both the case α = 0, β = 0 and the case α = 1, β = 1 will result in a relevance of one, the latter being the case where it is known that the source of information always lies, which, in the case of dichotomous propositions, implies perfect information. All other configurations of false positives α and false negatives β, apart from those where α + β = 1, for which the warning signal is statistically independent of the danger and the relevance is again zero, will result in relevances greater than zero and smaller than one, that is, 0 < d(Q_W | Q_D) < 1.
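The special cases listed above can be checked with the short sketch below, which computes the relevance directly as the mutual information between the signal and the danger, scaled by H[ p(D), p(¬D) ], which is what the steps (3.6) through (3.11) amount to for this two-by-two setting; alpha is the false positive and beta the false negative probability of (3.27), and natural logarithms are assumed.

```python
import math

def entropy(p):
    """Shannon entropy (2.1), in nats."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0.0)

def relevance(d, alpha, beta):
    """Relevance d(Q_W | Q_D) for the competence scenario (3.27) through (3.30)."""
    joint = [(1.0 - beta) * d, beta * d,                       # p(W D), p(not-W D)
             alpha * (1.0 - d), (1.0 - alpha) * (1.0 - d)]     # p(W not-D), p(not-W not-D)
    p_W = [joint[0] + joint[2], joint[1] + joint[3]]
    p_D = [d, 1.0 - d]
    return (entropy(p_W) + entropy(p_D) - entropy(joint)) / entropy(p_D)

d = 0.5
print(relevance(d, 0.5, 0.5))   # as informative as a coin toss: 0
print(relevance(d, 0.0, 1.0))   # never warns: 0
print(relevance(d, 1.0, 0.0))   # always warns: 0
print(relevance(d, 0.0, 0.0))   # perfectly competent: 1
print(relevance(d, 1.0, 1.0))   # always lies: 1
print(relevance(d, 0.1, 0.3))   # a typical intermediate configuration: between 0 and 1
```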
4 Discussion

Slovic states that the limited effectiveness of risk communication can be attributed to a lack of trust: if you trust the risk manager, communication is relatively easy; if trust is lacking, no form or process of communication will be satisfactory, (Slovic, 1999). In information theoretic terms this translates to the statement that if the probability of some source of information being unbiased and competent is low, then its relevance will also be low, where a low relevance implies an a priori inability to modulate our prior beliefs regarding some issue of interest.

However, information theory shows us that there is a second factor at play in risk communication, other than trustworthiness. This second factor is the perceived likelihood of danger. For even with a high trust in the unbiasedness of a source of information, large perceived likelihoods of danger may still render that trusted source of information irrelevant, as one's own sense of danger overrides all the assurances of safety, see Figure 3.1.

The perceived likelihood of unbiasedness of the source of information will typically be influenced by the actions of the source of information. If the source of information is the government itself, then these actions may entail giving full disclosure, taking full responsibility, distancing oneself from any suggestion of a conflict of interest, and so on. And the perceived likelihood of danger will typically be a function of the state of knowledge one has regarding the specifics of the danger which is in play. If a full and thorough understanding of the dangers involved requires some form of scientific training, then the plausibility that a lay person assigns to the proposition of there being a danger will typically be diffuse. Such diffuse plausibilities may be swayed either way if some authoritative and unbiased source of information has some pertinent information to offer regarding these dangers. Examples of such 'opaque' dangers are the dangers that flow from exposure to radiation or the dangers associated with climate change, while scientists from the respective fields will typically fulfil the role of authoritative and unbiased sources of information.
References

Achenbach, J., (11 March 2011). "Japan: The 'Big One' hit, but not where they thought it would". The Washington Post. Archived from the original on 17 March 2011. Retrieved 17 March 2011.
Allais M.: Fondements d'une Theorie Positive des Choix Comportant un Risque et Critique des Postulates et Axiomes de l'Ecole Americaine, Colloque Internationalle du Centre National de la Recherche Scientifique, No. 36, (1952).
Allais M.: Le Comportement de l'Homme Rationel devant le Risque, Critique des Postulates et Axiomes de l'Ecole Americaine, Econometrica 21, 503-546, (1953).
Allais M.: L'Extension des theories de l'equilibre economique general et du rendement social au cas du risque, Econometrica, 21, 269-290, (1953).
CATDAT Earthquake-Report.com Situation Report #41: Losses in the Japanese Earthquake/Tsunami of March 11th as of 30 September 2011. http://earthquake-report.com/2011/10/02/japan-tohoku-earthquake-and-tsunamicatdat-41-report-october-2-2011/
Center JL. Inquiry calculus and information theory. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Oxford MS, USA 2009 (eds. Goggans P and Chun CY), AIP Conference Proceedings, American Institute of Physics, New York, 69-78, 2010.
Dawson, I., Johnson, J., Luke, M., (2012). Do People Believe Combined Hazards Can Present Synergistic Risks? Risk Analysis, Vol. 32, No. 5, 2012.
Durham, K., "Treating the risks in Cairns", Natural Hazards 30(2), 251-261, 2003.
Edwards W.: The Theory of Decision Making, Psychological Bulletin, Vol. 51, No. 4, (1954).
Ellsberg D.: Risk, Ambiguity, and the Savage Axioms, Quarterly Journal of Economics 75, 643-699, (1961).
Fechner G.J.: Elemente der Psychophysik, 2 vols.; Vol. 1 translated as Elements of Psychophysics, Boring E.G. and Howes D.H., eds., Holt, Rinehart and Winston, New York, (1966).
Fennema H. and Wakker P.: Original Cumulative Prospect Theory: a Discussion of Empirical Differences, Journal of Behavioral Decision Making, Vol. 10, 53-64, (1997).
Flyvbjerg, B., (2011), "Case Study," in Norman K. Denzin and Yvonna S. Lincoln, eds., The Sage Handbook of Qualitative Research, 4th Edition (Thousand Oaks, CA: Sage), pp. 301-316.
Flyvbjerg, B., Bruzelius, N., & Rothengatter, W. (2003). Megaprojects and risk: An anatomy of ambition. Cambridge, UK: Cambridge University Press.
French DP, Marteau TM, Sutton S, Kinmonth L. Different measures of risk perception yield different patterns of interaction for combinations of hazards: Smoking, family history and cardiac events. Journal of Behavioral Decision Making, 2004; 17:381. Quality & Quantity, 2000b; 34:407-418.
Finucane, M., Alhakami, A., Slovic, P., Johnson S., 2000. The affect heuristic in judgments of risks and benefits. Journal of Behavioral Decision Making, 13: 1-17 (2000).
Finucane, M., Alhakami, A., Slovic, P., Johnson S., (2012). The affect heuristic in judgments of risks and benefits. Risk Analysis, 2012.
Georgescu-Roegen N.: Utility, Expectations, Measurability, and Prediction, Paper read at the Econometric Society, September 1953.
Gino, F., Sharek, Z., & Moore, D. A. (2011). Keeping the illusion of control under control: Ceilings, floors, and imperfect calibration. Organizational Behavior & Human Decision Processes, 114, 104-114.
Heath C. and Tversky A.: Preference and Belief: Ambiguity and Competence in Choice Under Uncertainty, Journal of Risk and Uncertainty 4, 5-28, (1991).
Japan Times, (2011). Kyodo News, 90% of disaster casualties drowned. 21 April 2011, p. 2.
Japan Times, (2011). Jiji Press, 42% didn't immediately flee tsunami. 18 August 2011, p. 2.
Jaynes E.T.: Confidence Intervals vs Bayesian Intervals, W.L. Harper & C.A. Hooker, eds., Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, Reidel Publishing Co., Dordrecht, Holland, (1976).
Jaynes E.T.: How Does the Brain Do Plausible Reasoning?, Stanford University Microwave Laboratory Report 421, (1957).
Jaynes E.T.: Probability Theory; the Logic of Science. Cambridge University Press, (2003).
Johnson-Laird P.N. and Wason P.C.: A Theoretical Analysis of Insight into a Reasoning Task, Cognitive Psychology, 1, 134-148, (1970).
Kahneman, D., (2011): Thinking, fast and slow, Allen Lane Paperback, ISBN 978-1-846-14606-0.
Kahneman D. and Tversky A.: Subjective Probability: a Judgment of Representativeness, Cognitive Psychology 3, 430-454, (1972).
Kahneman D. and Tversky A.: On the Psychology of Prediction, Psychological Review, Vol. 80, No. 4, (1973).
Kahneman D. and Tversky A.: Prospect Theory: an Analysis of Decision Under Risk, Econometrica, 47(2), 263-291, (1979).
Kahneman D. and Tversky A.: Choices, Values and Frames, American Psychologist 39, 341-350, (1984).
Katayama, T., Meguro, K., Dutta, D., 2004. "Seismic Risk Management for Countries of the Asia Pacific Region", Proceedings of the 3rd Bangkok Workshop, December 2003. ICUS 2004-01.
Kok, M., Vrijling J.K., van Gelder P.H.A.J.M., Vogelsang M.P.: Risk of Flooding and Insurance in the Netherlands, Flood Defence 2002, Wu et al. eds., Science Press, New York Ltd, (2002).
Knuth KH. Inductive logic: from data analysis to experimental design. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Baltimore ML, USA 2001 (ed. Fry RL), AIP Conference Proceedings 617, American Institute of Physics, New York, 392-404, 2002.
Knuth KH. What is a question? In Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Moscow ID, USA 2002 (ed. Williams C), AIP Conference Proceedings 659, American Institute of Physics, New York, 227-242, 2003.
Knuth KH. Deriving laws from ordering relations. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Jackson Hole WY, USA 2003 (eds. Erickson GJ and Zhai Y), AIP Conference Proceedings 707, American Institute of Physics, New York, 204-235, 2004.
Knuth KH. Lattice Duality: the origin of probability and entropy. Neurocomputing 67: 245-274, 2004.
Knuth KH. The origin of probability and entropy. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Sao Paulo, Brazil 2008 (eds. Lauretto MS, Pereira CAB), AIP Conference Proceedings, American Institute of Physics, New York, 35-48, 2008.
Knuth KH and Skilling J. Foundations of inference, arXiv: 1008.4831v1 [math.PR], 2010.
Kullback S and Leibler RA. On information and sufficiency. Ann. Math. Statist. 22:79-86, 1951.
Masin S.C., Zudini V., and Antonelli M.: Early Alternative Derivations of Fechner's Law, Journal of Behavioral Sciences, 45, 56-65, (2009).
Kunreuther, H. 2008. "Reducing Losses from Catastrophic Risks Through Long-term Insurance and Mitigation." Social Research: An International Quarterly 75 (3): 905-930.
Kunreuther, Howard. 1996. "Mitigating Disaster Losses Through Insurance." Journal of Risk and Uncertainty 12 (3): 171-187.
Kunreuther, H., R. Ginsberg, L. Miller, P. Sagi, P. Slovic, B. Borkan, and N. Katz (1978), Disaster Insurance Protection: Public Policy Lessons. Wiley Interscience, New York, NY.
Lindblom, C. 1959. The Science of 'Muddling Through'. Public Administration Review, Vol. 19 (1959), pp. 79-88, ISSN 0033-3352.
Linnerooth-Bayer, J., R. Mechler, and G. Pflug. 2005. "Refocusing Disaster Aid." Science 309: 1044-1046.
Lovett, R., (2011). Japan Earthquake Not the "Big One"? National Geographic News. 17 March 2011.
Marx S. M., Weber, E. U., Orlove, B. S., Leiserowitz, A., Krantz, D. H., Roncoli, C., & Phillips, J. (2007). Communication and mental processes: Experiential and analytic processing of uncertain climate information. Global Environmental Change, 17, 47-58.
Medvedev Z.A.: The Legacy of Chernobyl (paperback ed.). W. W. Norton & Company, (1990).
Mongin P.: Expected Utility Theory, J. Davis, W. Hands and U. Maki, eds., London, Edward Elgar, 342-350, (1997).
Murray D.J.: A Perspective for Viewing the History of Psychophysics, Behavioral and Brain Sciences, 16, 115-137, (1993).
Nakamura, K., (2009). Disability, destitution, and disaster: Surviving the 1995 Great Hanshin Earthquake in Japan. Human Organization, Volume 68, Issue 1, 1 April 2009, Pages 82-88.
Orlowsky, B., and S.I. Seneviratne, 2012: Global changes in extreme events: Regional and seasonal dimension. Climatic Change, 110, 669-696, doi: 10.1007/s10584-011-0122-9.
Oxfam (2008). Collaboration in Crises: Lessons in community participation from the Oxfam International tsunami research program, February 2009.
Patt, A., P. Suarez and C. Gwata. 2005. Effects of seasonal climate forecasts and participatory workshops among subsistence farmers in Zimbabwe. Proceedings of the National Academy of Sciences of the United States of America. 102: 12623-12628.
Phillips L.D. and von Winterfeldt D.: Reflections on the Contributions of Ward Edwards to Decision Analysis and Behavioral Research, Working Paper LSEOR 06.86, Operational Research Group, Department of Management, London School of Economics and Political Science, (2006).
Plous, S. (1993). The psychology of judgment and decision making. New York: McGraw-Hill.
Polya G.: How to Solve It, Princeton University Press, (1945). Second paperbound edition by Doubleday Anchor Books, (1957).
Polya G.: Mathematics and Plausible Reasoning, 2 vols, Princeton Press, (1954).
Ramsey F.P. (1931), Foundations - Essays in Philosophy, Logic, Mathematics and Economics, Humanities Press, 287pp (1977), LCCN 77-26864.
Rhodes, R. A. W. Understanding Governance. (Buckingham and Philadelphia: Open University Press, 1997. Reprinted 1999).
Shannon CE. A mathematical theory of communication. Bell Syst. Tech. J. 27:379-623, 1948.
Shefrin, Hersh. Behavioral Corporate Finance. Decisions that Create Value. McGraw-Hill/Irwin. New York, 2007.
Shimbun, Y., 2001. Forecast of earthquake probability is within 30 years ahead, however Tsunami attack probability is much lower than earthquake so that the plan is set to be within 100 years ahead.
Shimazaki, K., 2011. Press conference in Tokyo. India Times, May 12, 2011.
Sivia DS. Data analysis - a Bayesian tutorial. Clarendon Press, Oxford, 1996.
Skilling J. The canvas of rationality. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Sao Paulo, Brazil 2008 (eds. Lauretto MS, Pereira CAB), AIP Conference Proceedings, American Institute of Physics, New York, 67-79, 2008.
Slovic P. Trust, Emotion, Sex, Politics, and Science: Surveying the Risk-Assessment Battlefield, Risk Analysis 19(4), (1999).
Slovic P. and Weber EU. Perception of risk posed by extreme events, paper prepared for discussion at the conference 'Risk Management Strategies in an Uncertain World', Palisades, New York, April 12-13, 2002.
Slovic P., Finucane M.L., Peters E., and MacGregor D.G.: Risk as Analysis and Risk as Feelings: Some Thoughts About Affect, Reason, Risk and Rationality, Risk Analysis 24(2), (2004).
Sorensen JH. Hazard warning systems: Review of 20 years of progress. Natural Hazards Review, 2000; 1:119-125.
Stein, S. & Okal, E.A. (2005): Speed and size of the Sumatra earthquake. Nature, Vol. 434, pp. 581-582.
Stevens S.S.: To Honor Fechner and Repeal His Law, Science, New Series, Vol. 133, No. 3446, 80-86, (1961).
Stirling, A., M. Leach, L. Mehta, I. Scoones, A. Smith, S. Stagl, and J. Thompson. 2007. Empowering designs: towards more progressive appraisal of sustainability. STEPS Centre working paper 3. STEPS Centre, Brighton, UK. [online] URL: http://www.steps-centre.org/PDFs/final_steps_design.pdf.
Taylor, S. E., & Brown, J. D. (1988). Illusion and well-being: a social psychological perspective on mental health. Psychological Bulletin, 103(2), 193-210.
Teigen, K. H. (1974). Subjective sampling distributions and the additivity of estimates. Scandinavian Journal of Psychology, 15, 50-55.
Terpstra T, Lindell MK, Gutteling JM. Does communicating (flood) risk affect (flood) risk perceptions? Results of a quasi-experimental study. Risk Analysis, 2009; 29:1141-1155.
Titov, V. V., Rabinovich, A. B., Mofjeld, H., Thomson, R. E., and Gonzalez, F. I.: The global reach of the 26 December 2004 Sumatra tsunami, Science, 309, 2045-2048, 2005.
Torbeyns, J., De Smedt, B., Ghesquière, P., & Verschaffel, L. (2009). Solving subtractions adaptively by means of indirect addition: Influence of task, subject, and instructional factors. Mediterranean Journal for Research in Mathematics Education, 8(2), 1-30.
Tribus M. Thermostatics and thermodynamics. New York: van Nostrand, 1961.
Trope, Y., & Liberman, A. (1993). The use of trait conceptions to identify other people's behaviour and to draw inferences about their personalities. Personality and Social Psychology Bulletin, 19, 553-562.
Tversky A. and Kahneman D. (1992): Advances in prospect theory: cumulative representation of uncertainty. In: D. Kahneman and A. Tversky (eds.), (2000): Choices, values and frames, Cambridge University Press, Cambridge, pp. 44-66.
Tversky, A.; Kahneman, D. (1974). "Judgment under uncertainty: Heuristics and biases". Science 185 (4157): 1124-1131. doi:10.1126/science.185.4157.1124.
T6. 2007. Assessing and Mapping Multiple Risks for Spatial Planning. Armonia Project deliverable, EU FP6. Rome: T6. http://ec.europa.eu/research/environment/pdf/publications/fp6/natural_hazards//armonia.pdf.
Watts, Jonathan, "Quake survivors search for hope and shelter", Japan Times, 26 March 2011, p. 13.