
Risk Analysis, Vol. 33, No. 6, 2013

DOI: 10.1111/risa.12052

How Solid Is the Dutch (and the British) National Risk Assessment? Overview and Decision-Theoretic Evaluation

Charles Vlek∗

Internationally, national risk assessment (NRA) is rapidly gaining government sympathy as a science-based approach toward prioritizing the management of national hazards and threats, with the Netherlands and the United Kingdom in leading positions since 2007. NRAs are proliferating in Europe; they are also conducted in Australia, Canada, New Zealand, and the United States, while regional RAs now exist for over 100 Dutch or British provinces or counties. Focused on the Dutch NRA (DNRA) and supported by specific examples, summaries and evaluations are given of its (1) scenario development, (2) impact assessment, (3) likelihood estimation, (4) risk diagram, and (5) capability analysis. Despite the DNRA’s thorough elaboration, apparent weaknesses are lack of stakeholder involvement, possibility of false-positive risk scenarios, rigid multicriteria impact evaluation, hybrid methods for likelihood estimation, half-hearted use of a “probability × effect” definition of risk, forced comparison of divergent risk scenarios, and unclear decision rules for risk acceptance and safety enhancement. Such weaknesses are not unique to the DNRA. In line with a somewhat reserved encouragement by the OECD (Studies in Risk Management. Innovation in Country Risk Management. Paris: OECD, 2009), the scientific solidity of NRA results so far is questioned, and several improvements are suggested. One critical point is that expert-driven NRAs may preempt political judgments and decisions by national security authorities. External review and validation of major NRA components is recommended for strengthening overall results as a reliable basis for national and/or regional safety policies. Meanwhile, a broader, more transactional concept of risk may lead to better national and regional risk assessments.

KEY WORDS: All-hazards approach; expert opinion; multicriteria evaluation; national risk assessment; risk diagram

1. INTRODUCTION

How could governments best evaluate major hazards and threats to their country? On what grounds could effective, fair, and efficient safety investments be prioritized? Recently, in the Netherlands and the United Kingdom, this is done via a charting of national risk scenarios following their estimated likelihood of occurrence and seriousness of expected impact.(2–7)

∗ University of Groningen, Faculty of Behavioral and Social Sciences, Grote Kruisstraat 2/I, 9712 TS Groningen, the Netherlands; [email protected].

© 2013 Society for Risk Analysis 0272-4332/13/0100-0948$22.00/1

Inspired by the classical definition of risk as “probability × effect,” and apparently following the Dutch and British examples, such national risk assessment (NRA) is rapidly gaining popularity, witness similar endeavors in Australia,(8) Canada,(9) Germany,(10) New Zealand,(11) Norway,(12) Sweden,(13) Switzerland,(14) and the United States,(15) while the OECD(1) and the European Commission(16) are stimulating its adoption and elaboration. In addition, the World Economic Forum(17) has assembled a two-dimensional “landscape” of 50 global risks. This shows water supply crises, chronic fiscal imbalances, and severe income disparity as the most serious world hazards. The North-Atlantic Treaty Organization seems to follow a similar track under the label of risk-based planning (footnote 1).

Meanwhile, two-dimensional risk matrices and rankings are also being produced at the regional level. In the Netherlands, “regional risk profiles” ensue from the new Law on the Safety Regions,(25) effective from October 2010. In the wake of the U.K. Civil Contingencies Act of 2004, “community risk registers” are developed in each of the 80-odd counties; for example, for the United Kingdom, see London(18) (65 risks) and Derbyshire(19) (64 risks), and for the Netherlands, see Rotterdam(20) (30 risks) and Groningen(21) (14 risks). In most of these, pandemic influenza figures as one of the largest risk scenarios. Methodologically, the British and Dutch regional risk assessments faithfully follow the U.K. Cabinet Office(7) and the DNRA(3) approach, respectively.

Conducting an NRA proves to be an intensive (and expensive) information-processing and judgmental exercise requiring the cooperation of various experts. The central envisaged result is a two-dimensional “risk diagram” in which major hazards and threats may be roughly categorized in four quadrants following: (A) low likelihood and high impact, for example, major coastal flooding; (B) high likelihood and high impact, for example, severe pandemic influenza; (C) low likelihood and low impact, for example, failure of Internet exchange; and (D) high likelihood and low impact, for example, corruption of stock trade. See Fig. 1 for an example,(6) where A–D mark the four quadrants of the risk diagram. On the basis of this representation, for each scenario, various capabilities for risk control are considered for their suitability before, during, or after any serious event actually might occur.
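The four-quadrant reading of the risk diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0–1 scales and the 0.5 cut-off values are chosen here for convenience and are not DNRA conventions (the DNRA places scenarios in ordered likelihood and impact categories), and the function name `quadrant` is hypothetical.

```python
def quadrant(likelihood, impact, likelihood_cut=0.5, impact_cut=0.5):
    """Return the quadrant label (A-D) used in the text for one scenario.

    Both inputs are assumed to lie on a 0-1 scale; the cut-offs are
    illustrative assumptions, not values taken from the DNRA.
    """
    high_likelihood = likelihood >= likelihood_cut
    high_impact = impact >= impact_cut
    if high_impact:
        # A: low likelihood, high impact; B: high likelihood, high impact
        return "B" if high_likelihood else "A"
    # C: low likelihood, low impact; D: high likelihood, low impact
    return "D" if high_likelihood else "C"
```

Called as `quadrant(0.1, 0.9)`, the sketch returns "A", the quadrant the text associates with low-likelihood, high-impact scenarios such as major coastal flooding.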
Thus, the Dutch government’s national risk management strategy is founded on (1) a set of plausible hazard or threat scenarios, (2) a seriousness evaluation of possible impact following the scenario, (3) an assessment of the likelihood of scenario occurrence, (4) a two-dimensional risk diagram, and (5) an analysis of capabilities for effective likelihood and/or impact limitation. In the following, “NRA” is used to indicate national risk assessment generally, while “DNRA” more specifically stands for the Dutch national risk assessment on which this article is focused.

Footnote 1: “This NATO SAS Task Group [093] wants to explore and utilize this broader (nondefense) source of knowledge to identify and adapt developments in risk-based planning for use in defense departments, and risk analysis in general.” See http://www.cso.nato.int/Detail.asp?ID=5491.

One problem for NRA is that typically for large-scale, complex, uncertain, and socially distributive risk scenarios, classical concepts and methods for risk assessment and management may yield insufficiently convincing results. This makes the transparency and social understanding of any NRA methodology of critical importance, including the assessors involved and the procedures followed. A related question is how national risk assessment and management differ from people’s dealing with everyday small-scale, well-known, and individually controllable risks. National safety and security strategies may provoke public skepticism in so far as the methodology for NRA does not somehow reflect the way many people deal with the “normal” risks of life.

The purpose of this article is to scrutinize and evaluate the five components of the DNRA from a decision-theoretic and risk-psychological viewpoint, with the very similar British approach considered in passing. From a fully scientific point of view, much more could be brought forward than would be useful for pragmatic policymaking. The prevailing question for evaluation, however, is how well increasingly popular national (and regional) risk assessments reflect established ideas and findings about risk analysis, safety management, human judgment, and decision making, and thus how valid their overall results are.

In the following sections, each DNRA component is first summarized and then followed by a set of evaluative comments. To illustrate what NRA may precisely involve, four example scenarios are quoted and their impact and likelihood assessments are demonstrated. The concluding section provides a summary discussion and several research suggestions, along with some international reservations about NRA. A broader, more transactional concept of risk, perhaps more suitable for NRA, is discussed in the epilogue.

2. HAZARD AND THREAT SCENARIOS

NRAs typically are about low-likelihood, high-consequence events characterized by low predictability and considerable uncertainty. Relevant data may be sparse, causal chains may be obscure, and assessors may suffer from limited expertise and/or stagnant policy frames. Human cognition, judgment, and evaluation are basic for “scenario thinking,” that is, the elaboration of past experiences, accident reports, suspicions, and/or plain imagination into plausible scenarios, and the subsequent identification of the more or less likely events in those scenarios, as well as their more or less serious consequences. Thus, our first question is: “How does one get to plausible national risk scenarios?” In this section, a summary of the DNRA(3) approach is given, followed by a number of evaluative comments.

Fig. 1. Risk diagram after the DNRA-2011 (42 scenarios).(6) The abscissa represents likelihood (for hazards), which also reflects conceivability (for threats). The ordinate indicates overall seriousness of multicriteria impact. [A], [B], [C], and [D] indicate the four quadrants (impact first, then likelihood): high-low, high-high, low-low, and low-high, as mentioned in the text. Left- and right-positioned bullets indicate that the relevant scenario falls in the lower or upper half of its likelihood category, respectively.

2.1. Summary of Scenario Development in the DNRA

In the DNRA, hazard and threat scenarios are identified by groups of experts from the government ministries or agencies most involved. Since 2007, some 40 scenarios have been elaborated.(6) The U.K. NRA(7) starts from a set of “reasonable worst-case scenarios,” and so far covers 80 hazards and threats, while an additional 40 are kept on a reserve list.

In the DNRA methodologists’ view, a useful risk scenario results from a thematic in-depth study, and it covers the following elements: (1) a relevant socioeconomic and/or physical-infrastructural context, (2) a lead-up and triggering action or event, (3) an actual serious incident, and (4) a national impact of the incident including societal responses and control measures, as well as the ultimate longer-term effects on vital national infrastructure, whereby:

it must be ensured that the scenario devised offers sufficient leads to be able to carry out the risk assessment in the next stage. (DNRA(3), pp. 17–18)

Following DNRA,(3) a useful national risk scenario is a plausible, logically structured, and sufficiently detailed story holding a consistent incident description; it must indicate a clear time horizon as well as the policy field (e.g., transport, energy, telecommunication) to which it relates; and it should be specific enough to identify current and future desirable capabilities for safety management. Also, a scenario:

must be psychologically expedient, so that it can be sold to and accepted by others. (p. 18)

Table I presents concrete descriptions of four out of 39 different scenarios from the DNRA,(5,6) taken roughly from the upper right quadrant of the risk diagram of Fig. 1, that is, between “possible” and “very likely” to occur, and between “considerable” and “catastrophic” impact, respectively. Likelihood and impact assessments for these scenarios are discussed further on in this article. Given the many possible variants and gradations of any scenario, the 39 different hazard and threat scenarios in the DNRA(5) should be seen as “compression points” of what might actually occur and how this would come about.

Incident scenarios for both (nonmalicious, unintentional) hazards and (malicious, intentional) threats are subdivided into immediate risk scenarios (e.g., major floods, pandemic disease) and scenarios subject to certain gradual developments (like health burdens from population aging, or flooding risks from climate change). A hazard or threat scenario is considered relevant for current policymaking to the extent that its likelihood of occurrence and/or its societal impact necessitate a significant allocation of additional capabilities for adequate safety management within the next five years. In addition, but not in the U.K. NRA,(7) future five-year periods may be considered for gradually evolving risks such as climate change or population aging.

2.2. Evaluative Comments on Scenario Development

Forecasters, planners, and psychologists have conducted extensive research on scenario construction and evaluation.(22–24) A useful scenario starts from a realistic description of an initial state of affairs, then provides a reasonable and consistent explication of a possible future development (i.e., a set of critical actions and/or events), followed by a plausible description of a possible end state in terms of various consequences up to the adopted time horizon.
Note that in the above, the words “reasonable,” “possible,” and “plausible” already reflect the notion of likelihood or probability (see Section 4).

For the substantive DNRA,(4–6) it would appear that the basic scenario concept used may be generally adequate, but that there is insufficient explication of the kinds and sources of relevant imagination and information. The exclusive role of departmental specialists (cf. DNRA,(3) p. 13) and selected experts may narrow the scope of inquiry. This goes a fortiori for the “classified” British and American NRAs.(7,15)

One way of drawing up a broad set of relevant risk scenarios is to go by each of the DNRA’s five “vital interests” (see Table II) and ask which unexpected events might seriously jeopardize each of them. Another scenario method is “back-casting”: first suppose that a disastrous event has actually occurred, then try to describe the way(s) in which this could have happened. In so doing, one would in fact write an ex ante version of the official investigators’ report in which the disaster, should it really have occurred, is to be meticulously explained.

2.3. False-Positive Versus False-Negative Scenarios

In developing risk scenarios to improve safety management, there is always the danger of exaggeration. Those living scrupulously may expect a snake in any corner. After a serious traffic accident, one may be fearful for the rest of one’s life. International terrorism is a source of political anxiety, which in some places has led to “precautionary” infringement on basic human rights.(25,26) For the DNRA,(5) one may wonder if, for various reasons, serious pandemic flu, criminal infiltration of government, cyber conflict, and urban enclave formation have not been taken too seriously (likely as well as impactful) as national hazards or threats. The false-positive problem may get worse when responsible risk assessors know beforehand that additional safety investments may follow their overall judgments.
For example, if the risk identifiers and scenario developers come from specific government departments having an interest in strengthening their own safety management, there is a temptation to present hazards and threats as more worrying than one would otherwise have done. But this medal, too, has another side. Serious accidents or disasters may well occur because we do not believe a priori that they can occur; see Wagenaar and Groeneweg’s(27) analysis of the “impossible” capsizing of the ferry “Herald of Free Enterprise” in 1987, just outside the port of Zeebrugge (B.). “Unbelievable” technical failures (e.g., in space shuttle “Challenger,” 1986), human error (e.g.,

Table I. Summary (Author’s Translation) of Four Concrete Risk Scenarios from the Upper Right Quadrant of the Risk Diagram (Fig. 1), as Extensively Described in the DNRA(4–6)

Food/meat scarcity (“possible”): Collapse of soy bean imports after EU restrictions on genetic modification, failed South-American harvests, and subsidized imports to China. Forced reduction of Dutch meat production, rising fodder and meat prices, loss of employment, decreasing meat consumption, and heated debates about its future.

National electricity blackout (“likely”): Complete fail-out (voltage collapse) of national electricity supply due to European network disturbances. Sudden standstill of daily life, big traffic jams, collapse of telecom, heating and cooling, industrial, and business processes. Vital services continue on emergency facilities. Transmitting system operators succeed in restoring full network supply after 24 hours.

Heavy snow storm (“very likely”): In large parts of the country, much snow gets moved around, drifts up in high dunes, penetrates into buildings, and severely limits outdoor visibility. All traffic is paralyzed, vehicles get snowed in, villages are isolated, and daily life is seriously disrupted. Low temperature, strong wind, and drifting snow make being outdoors unpleasant and risky.

Serious pandemic flu (“likely”): Five million Dutch ill for about 14 days, 14,000–32,000 hospitalized, some 80,000 deaths. Several influenza waves of 9–12 weeks, covering altogether 6–8 months before pandemic vaccine is widely available. Enormous economic losses, strong public anxiety and distrust.

Note: See Table VII for elaborate judgments of these scenarios’ likelihood “within the next five years.”

Table II. Vital Interests, Impact Criteria, and Relative Importance Weights in Percent (Total 100) Under a General Perspective (Gen.) and “Worldview” Perspectives A1, B1, A2, and B2 (See Note), Following DNRA(3)

Vital Interest / Impact Criterion                                            Gen.   A1   B1   A2   B2
1. Territorial safety
   1.1. Encroachment on the territory of the Netherlands                      10     6    5   11    9
   1.2. Infringement of the international position of the Netherlands         10     3   14    6    5
2. Physical security
   2.1. Fatalities                                                            10    14   12   13   13
   2.2. Seriously injured and chronically ill                                 10    12   11   11   12
   2.3. Physical suffering (lack of provisions)                               10    12    9   12   12
3. Economic security
   3.1. Costs                                                                 10    19    4   10    4
4. Ecological security
   4.1. Long-term impact on environment and nature (flora and fauna)          10     2   13    4   13
5. Social and political stability
   5.1. Disruption to everyday life                                           10    16    8   11   12
   5.2. Violation of the democratic system                                    10     8   13   11   10
   5.3. Social psychological impact                                           10     8   11   11   10

Note: Worldview perspectives or “preference profiles” are, respectively, A1, individualistic perspective: “global market”; B1, egalitarian perspective: “global solidarity”; A2, fatalistic perspective: “safe region”; and B2, hierarchical perspective: “caring region”; see discussion further below.

before the airplane collision on Tenerife Airport, 1977), terrorists’ audacity (as in New York City on 9/11, 2001), earthquake tsunamis (like in December 2004 off the Sumatran coast), or weather conditions (like cyclone “Nargis” flooding southern Myanmar in May 2008) can be so catastrophic that we may wonder why one did not better prepare for them. Serious hazards or threats may be neglected because assessors do not recognize early signals, misperceive causal chains, are caught in outdated frames of reference (“has never happened before”), are less sensitive to the longer-term future, or suppress their professional anxieties because they “don’t want to make a fuss” and cause extra costs for the organization.

Considering the current, updated DNRA,(6) one might think that so far neglected are scenarios about transport system failure, a “bank run” and major bank collapse, contamination and/or disruption of water supplies, dispersed food poisoning, frequent sexual abuse of pupils in youth care institutions, and epidemic animal diseases. Also, an increasing annual risk for the Netherlands (not in the DNRA) is the massive lighting of ever more powerful consumer fireworks to celebrate the New Year (footnote 2).

Thus, on the one hand, scenario developers may be overzealous and come up with incident warnings that may be exaggerated. On the other hand, as a constructive process, the timely identification and elaboration of hazard and threat scenarios may be too latent or sloppy, such that real dangers are denied or neglected, and may become manifest as unfortunate surprises. Decision-theoretically we are facing the problem of false-positive scenarios (calling for needless safety measures) versus false-negatives (ignoring real danger); for statisticians: type I versus type II error in hypothesis testing. But how could one tell if the disastrous event is so rare that there is no objective criterion for checking the “falseness” of a priori judgment and the implied over- or underprotection? Here, there seems to be no other solution than a well-organized, carefully conducted process in which the possible false-positives and false-negatives are creatively considered and their more or less likely costs and benefits are explicitly weighed (footnote 3). Again, it would be useful to think about the official investigative report one would write after a falsely negated disaster has actually occurred, or when it did not happen after you had false-positively decided in favor of extensive precautions.

Footnote 2: Last time (2012/2013), about €70 million was spent on consumer fireworks nationwide, as in the previous year. All over the country, police, fire brigade, and ambulance personnel operated well above normal strengths; about 600 people got injuries requiring hospital treatment, while over 100 mostly young people suffered from permanent eye damage, whereby 23 eyes became fully dysfunctional. Municipal governments and private households together incurred millions in property damage.

Footnote 3: Since around 1960, well-established signal detection theory(78) tells us that the evidence strength required for the one or the other decision should be inversely proportional to the ratio of one’s prior probabilities times the cost ratio for the two types of errors.

3. SERIOUSNESS OF IMPACT FOLLOWING SCENARIO (footnote 4)

Hazard or threat scenarios first appear on the policy agenda for the possible serious impact they may have. Only then does it become important to investigate their likelihood. Given that serious national impact is already described qualitatively as part of the relevant hazard or threat scenario, how is the seriousness of impact evaluated in a quantitative manner following DNRA?(3) This section provides a summary of the DNRA’s impact assessment, followed by a number of evaluative comments.

Footnote 4: As over 40% of the DNRA(3) methodology report is devoted to multiattribute impact assessment as a central component, this section is somewhat longer than the others.

3.1. Summary of Impact Assessment

3.1.1. Selection of Criteria and Scoring of Scenarios

In the DNRA, seriousness of possible (scenario) impact (damage, loss, and/or harm) is elaborately assessed by scoring envisaged scenario outcomes on 10 distinct impact criteria categorized under five “vital interests,” as given in Table II; for each criterion, specific indicators are proposed to qualify and support scenario scoring. The “General” column of Table II shows a uniform set of importance weights. The last four columns are supposed to reflect different worldview perspectives, which will be discussed further below.

The 10 impact criteria in Table II correspond reasonably well to those of other NRA countries (see Section 1). In the U.K. NRA(7) (p. 3), only five impact criteria are used: number of fatalities, illness or injury following the emergency, social disruption (10 different types) of daily life, overall economic harm, and psychological impact (anxiety, loss of confidence, outrage), whereby “territorial safety” criteria are left out.

In the DNRA, per criterion variable, impact seriousness is scored on a five-point scale ranging from “limited” (A) to “catastrophic” (E) consequences, a hyphen representing a null or “irrelevant” score. Seriousness judgments are qualitatively underpinned via checklists of various types of possible harm, damage, or loss. For example, assessors for any scenario first have to mark expectedly affected items on a checklist of vital infrastructure such as electricity networks, postal services, food supply, railway stations, and public order. In view of this, they are to provide a score while taking account of, for example, the number of victims, duration of injury, total area affected, and amount of costs involved.
Thus, the possible set of impacts of any hazard or threat scenario is first unfolded and checked in qualitative detail before it is translated into a five-point scale value on each of the 10 impact criterion variables.

3.1.2. Criterion Importance Weighting, Uniform and Differential

As a next step in overall impact assessment, a weighted sum of the 10 (valued) impact criterion scores is taken to determine total impact seriousness for a given scenario. Generally, a uniform set of 10 weights is used, each 10%, together totaling 100%, as shown in Table II. This amounts to just taking the overall average impact score, as in the U.K. NRA(7) (p. 3):

Each of the [five] dimensions . . . is scored on a scale of 0 to 5. The overall impact, which indicates the relative scale and extent of all the impacts, is the mean of these five scores.

Additionally, four differential weight sets are provisionally considered, as supposedly associated with four different “worldviews” or value orientations among the Dutch population. The four worldviews are labeled, respectively: A1, individualistic perspective: “global market”; B1, egalitarian perspective: “global solidarity”; A2, fatalistic perspective: “safe region”; and B2, hierarchical perspective: “caring region.” Their formulation in DNRA(3) is a combined adaptation from both Douglas and Wildavsky(28) and the IPCC,(29) updated via an empirical validation for the Dutch RIVM.(30) The corresponding weight sets or “preference profiles” are given in the last four columns of Table II. Following DNRA:(3)

Ideally, these different preference profiles should largely reflect the main value orientations of Dutch policymakers and [those] of the citizens they represent. (p. 81)

Table II reveals that perspective differences in weighting are most apparent for impact criteria 1.1 (territorial encroachment), 1.2 (international position), 3.1 (economic costs), and 4.1 (environmental impact), while individualistic worldview A1, “global market,” is given rather outstanding weights vis-à-vis the other three perspectives. The four worldview perspectives numerically expressed in Table II raise the double question of which different worldviews or value orientations may be meaningfully distinguished and how different worldviews could be validly reflected in different sets of importance weights.

The overall, uniformly weighted aggregate impact value (cf. column “General” in Table II) is used to position the relevant scenario on the vertical axis of the “risk diagram” (Fig. 1), whose horizontal axis represents the assessed likelihood of the scenario. Differential weight sets are used to test for the robustness of risk ranking, and to advise policymakers to account for differences in overall risk ordering; see Table IV and text further below.
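The weighted summation described in this section can be illustrated with a minimal Python sketch. The weights are transcribed from Table II and the A–E category scores from Table III; the linear score values ("-" = 0, A = 0.2, up to E = 1.0) follow the notes to Table III. The names `aggregate_impact` and `ranking` are hypothetical helpers introduced here, not part of any DNRA tooling, and the results agree with the tabulated aggregates only up to rounding in the second decimal.

```python
# Importance weights in percent, in criterion order
# 1.1, 1.2, 2.1, 2.2, 2.3, 3.1, 4.1, 5.1, 5.2, 5.3 (transcribed from Table II).
WEIGHTS = {
    "Gen": [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
    "A1": [6, 3, 14, 12, 12, 19, 2, 16, 8, 8],
    "B1": [5, 14, 12, 11, 9, 4, 13, 8, 13, 11],
    "A2": [11, 6, 13, 11, 12, 10, 4, 11, 11, 11],
    "B2": [9, 5, 13, 12, 12, 4, 13, 12, 10, 10],
}

# A-E category scores in the same criterion order (transcribed from Table III).
SCORES = {
    "Food/meat scarcity": "-C--CBC--B",
    "Electricity blackout": "--BADC-DBB",
    "Heavy snow storm": "--AADC-E--",
    "Serious pandemic flu": "--EDED-ECE",
}

# Linear quantification of the category scores (notes to Table III).
LINEAR = {"-": 0.0, "A": 0.2, "B": 0.4, "C": 0.6, "D": 0.8, "E": 1.0}


def aggregate_impact(scores, weights):
    """Weighted sum of linearly valued criterion scores, on a 0-1 scale."""
    return sum(w / 100 * LINEAR[s] for s, w in zip(scores, weights))


def ranking(perspective):
    """Scenario names ordered from most to least serious under one weight set."""
    weights = WEIGHTS[perspective]
    return sorted(
        SCORES,
        key=lambda name: aggregate_impact(SCORES[name], weights),
        reverse=True,
    )
```

For instance, `ranking("Gen")` and `ranking("B1")` can be compared to see whether a differential weight set changes the seriousness order of these four scenarios; looping over `WEIGHTS` gives the same check for all perspectives.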

3.1.3. Linear Versus Exponential Criterion Valuation

Simple weighted summation of the 10 impact criterion scores for each risk scenario would be based on a straightforward linear aggregation model. However, to capture the idea of a nonlinear value function for each criterion variable (where, e.g., the score difference E–D may well be felt as larger than the difference C–B), the DNRA methodologists have chosen to apply an exponential transformation on base 3 to the five-point A–E impact scale. This means that impact seriousness for each criterion variable (cf. Table II) is judged to be an exponentially increasing (i.e., accelerating) function of the initial A–E score (footnote 5). One question here is whether and why one and the same value function should be indiscriminately applied to all criterion variables; for example, the undesirability of fatalities (criterion 2.1) might well increase more rapidly from A to E than the undesirability of economic costs (criterion 3.1).

For the four example scenarios of Table I, Table III presents five-point impact scores (A–E) and the resulting aggregate impact values under a linear and an exponential transformation of scores following the DNRA.(5) Here, it immediately appears that criterion 1.1 (encroachment on national territory) is irrelevant for comparing these four scenarios. Table III shows that the aggregate, uniformly weighted impact value yields a different seriousness rank order of the electricity blackout and snow-storm scenarios for the linear versus the exponential value function as applied to the A–E category scale.

Footnote 5: To test for the sensitivity of overall impact values, an exponential value function on base 10 was also applied: “−” = 0, A = 10^−4, B = 10^−3, C = 10^−2, D = 10^−1, and E = 1.0. This significantly enlarges the “value steps” as impact criterion scores rise from A through E, compared to a base-3 function.

3.1.4. Worldview Differences in Criterion Importance

Does it matter much whether different sets of importance weights are used when impact criterion scores (linear or exponential) are aggregated? Table IV gives a comparative view of the perspective-weighted average impact scores following the four weight sets (perspectives A1, B1, A2, and B2) given in Table II; for simplicity, rank numbers have been left out. For the present discussion, the four worldviews themselves and their translation into different

Table III. Categorical Impact Criterion Scores (−, A–E) with Uniform-Weighted Impact Seriousness Aggregation Under a Linear and an Exponential Value Transformation (on a 0–1 Scale) for Four Example Scenarios from the DNRA(5)

                                            Impact Criteria Following Table II            Aggregate Impact Seriousness (Rank)
Scenarios (Likelihood)                      1.1  1.2  2.1  2.2  2.3  3.1  4.1  5.1  5.2  5.3    Linear      Expon. [3]
Food/meat scarcity (possible)                −    C    −    −    C    B    C    −    −    B     0.26 (4)    0.041 (4)
Electricity blackout (likely)                −    −    B    A    D    C    −    D    B    B     0.36 (2)    0.090 (3)
Heavy snow storm (very likely)               −    −    A    A    D    C    −    E    −    −     0.28 (3)    0.147 (2)
Serious pandemic influenza (likely)          −    −    E    D    E    D    −    E    C    E     0.62 (1)    0.477 (1)

Notes: See scenario descriptions in Table I and criterion descriptions in Table II. Numerical translation of A–E scores: Cell entries represent the A–E impact category score (a “−” representing a null or “irrelevant” score). A linear quantification of these goes from “−” = 0 and A = 0.20 to E = 1.00. A standardized exponential conversion (on base 3) goes from “−” = 0 and A = 0.012 to E = 1.0; or: A = 1/81, B = 3/81, C = 9/81, D = 27/81, and E = 81/81. Aggregate linear or exponential impact values are sums of scores across all 10 criteria, each uniformly weighted by 10%.

Table IV. Comparison of Differential, Perspective-Weighted (cf. Table II) and Uniform-Weighted Aggregate Impact Values for the Four Risk Scenarios Described in Table I, with Linear Versus Exponential (Italics) Transformations of the Original A–E Category Scores (cf. Table III)

Perspectives →            A1 Individualistic    B1 Egalitarian       A2 Fatalistic        B2 Hierarchical
Scenarios ↓               Linear    Expon.      Linear    Expon.     Linear    Expon.     Linear    Expon.
Food/meat scarcity        0.21      0.029       0.28      0.046      0.22      0.032      0.24      0.038
Electricity blackout      0.48      0.115       0.33      0.076      0.41      0.102      0.37      0.098
Heavy snow storm          0.42      0.224       0.22      0.117      0.31      0.164      0.29      0.168
Pandemic influenza        0.80      0.613       0.60      0.465      0.70      0.553      0.65      0.535

Notes: All cell values have been standardized to range from 0 to 1, that is, from least to most serious. See last two columns of Table III for comparison with uniform-weighted aggregates.
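The effect of the value transformation on the uniform-weighted rank order in Table III can be checked with a short Python sketch. It assumes the conversions stated in the notes to Table III (linear: A = 0.2 to E = 1.0; base 3: A = 1/81 to E = 81/81) and, for the sensitivity variant, the base-10 function of footnote 5. The function names `value`, `aggregate`, and `ranking` are hypothetical, and the computed aggregates match the tabulated ones only up to rounding in the last digit.

```python
# A-E category scores in criterion order 1.1 ... 5.3 (transcribed from Table III).
SCORES = {
    "Food/meat scarcity": "-C--CBC--B",
    "Electricity blackout": "--BADC-DBB",
    "Heavy snow storm": "--AADC-E--",
    "Serious pandemic flu": "--EDED-ECE",
}
LEVELS = "ABCDE"


def value(score, base=None):
    """Value of one category score on a 0-1 scale.

    base=None gives the linear scale (A = 0.2, ..., E = 1.0);
    base=3 or base=10 gives the standardized exponential scale
    (A = base**-4, ..., E = 1).
    """
    if score == "-":
        return 0.0
    step = LEVELS.index(score)  # 0 for A ... 4 for E
    if base is None:
        return (step + 1) / 5
    return base ** (step - 4)


def aggregate(scores, base=None):
    """Uniform-weighted aggregate: the mean value over all 10 criteria."""
    return sum(value(s, base) for s in scores) / len(scores)


def ranking(base=None):
    """Scenario names ordered from most to least serious."""
    return sorted(SCORES, key=lambda n: aggregate(SCORES[n], base), reverse=True)
```

Comparing `ranking()` with `ranking(3)` shows the swap between the electricity blackout and the heavy snow storm that the text reports for the linear versus the exponential value function.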

weight sets are less important than the idea of differential importance weighting per se. Table IV reveals that for the linear-transformed A–E scores, the impact seriousness rank order for these four scenarios is identical for differential-weighting perspectives A1, A2, B2, and overall uniform weighting (next to last column of Table III); only differential weighting under perspective B1 leads to a reversed ranking of “food/meat scarcity” and “heavy snow storm,” the latter being judged by the “egalitarians” (B1) to be the least serious. For the exponential-transformed A–E scores, however, the seriousness rank order is exactly the same under all four differential-weighting perspectives and, by implication, under uniform weighting (last column of Table III) as well. In other words, for these four scenarios from the upper right quadrant of the risk diagram (Fig. 1), differential weighting of impact seriousness (A–E) scores does not matter much, and this is largely because the four scenarios were given rather different seriousness scores on only one (viz. 3.1: economic costs) of the four perspective-sensitive criterion variables (1.1, 1.2, 3.1, and 4.1; see Table III). Moreover, the exponential A–E score transformation inflates the significant differences on the less perspective-sensitive impact criteria (e.g., 2.3, 5.1, and 5.3), which could thereby come to dominate the overall seriousness rank order of the four scenarios. Note that, irrespective of the type of A–E score valuation and kind of weighting, “serious pandemic flu” is evaluated as having the most serious impact of all four scenarios, while “food/meat scarcity” is valued as of limited impact compared to “heavy snow storm” or “national electricity blackout.”

3.2. Evaluative Comments on Multicriteria Impact Assessment

Scenario impact assessment for the DNRA(3) follows multicriteria analysis (MCA). This well-established methodology (also called multiattribute utility analysis or MAUA) was rapidly developed in the 1970s.(31–33) MCA/MAUA is a formal structuring and evaluation approach whose practical application strongly hinges on decisionmaker inputs, that is, choice alternatives, evaluation criteria or attributes, criterion weights, “objective” scores, value or utility judgments about these scores, and the adopted overall aggregation model, often straightforward weighted summation (footnote 6). If the 39 scenarios from the DNRA(5) are adopted for impact assessment, doubts may be felt about the way criterion selection, weighting, and scoring are handled in the DNRA, as follows.

3.2.1. Derivation and Selection of Impact Criteria

Drawing up a goals-values hierarchy is a good way to identify specific evaluation criteria. Zooming in on the five “vital interests” listed in Table II, we may wonder why the key term “security” is not consistently used for all five. Moreover, in unfolding vital interest no. 5: “social and political stability” (or “security”), one might add: impact criterion 5.4, “diminishing social cohesion,” with specific indicators: increased individualism, social stereotyping and discrimination, reduced caring for vulnerable people, and decreased support for public goods protectors. More importantly, the DNRA(5) clearly reveals that, across all its 39 risk scenarios, impact criteria 2.1 (fatalities) and 2.2 (seriously injured/ill persons) are given highly correlating A–E scores. Also, criteria 1.1 (territorial encroachment) and 4.1 (environmental impact) receive rather similar scores. This suggests that there is room for a more efficient combination of impact criteria. Impact criterion 5.3 (social-psychological impact) takes a special position. In the DNRA,(5) high-scoring scenarios in this respect are pandemic influenza, heavy coastal or river flooding, willful extensive electricity blackout, and several kinds of extremist action.
In many cases, however, expected fear and anger (criterion 5.3) appear to correlate with disruption of everyday life (criterion 5.1). Thus, it would seem that impact criterion 5.3 is a candidate for improvement, perhaps through enrichment with other public moods or feelings like distrust, fatalism, and withdrawal, or by considering it as part (via additional specific indicators) of 5.1: disruption of everyday life. In contrast, one would think that impact criterion 3.1 (costs) requires a clarifying subdivision and specification, perhaps via the promotion of some specific indicators to the position of separate impact criterion, for example, health damage/care, property damage, commercial losses, and costs of disaster management. In conclusion, it would appear that the DNRA’s(3) five vital interests may need some textual adaptation to sharpen their meaning, but it seems especially desirable that the 10 impact criteria proposed (cf. Table II) be revised so that the entire set may become less redundant in parts as well as more comprehensive overall, and thus more useful in practice.

Footnote 6: Relying on a linear-weighted combination of criterion (utility) scores presupposes that a scenario’s (dis)utility is monotonically increasing along the relevant criterion variable. For single-peaked utility functions, this may be realized by folding the criterion scale around the point of highest (dis)utility.

3.2.2. Ranking and Weighting of Impact Criteria

Thus far, for the national risk diagram of Fig. 1, aggregate impact assessments have been used that are based on uniform-weighted summation (or just averaging) of the 10 impact criterion values, after linear or exponential score transformations; see the results for the four example scenarios in Table III (last two columns). As Table IV shows, differential impact criterion weighting changes little (under a linear value function) or nothing (under an exponential function) in the aggregate seriousness rank order of these four scenarios. Such an observation is well known among MCA/MAUA specialists, and it may well be generalized here to apply to all 39 risk scenarios of the DNRA-2010:(5) differential importance weighting does not matter much, at least as long as some weights do not get so extreme that the relevant criteria start dominating or being dominated by other criteria.
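The rank-order observation drawn from Table IV can be checked mechanically. The following is a minimal sketch in Python that uses the aggregate values as transcribed from Table IV; the names `TABLE_IV`, `rank_order`, and `rank_orders` are illustrative choices and not part of the DNRA method.

```python
# Aggregate impact values transcribed from Table IV.
# Scenario order: food/meat scarcity, electricity blackout,
# heavy snow storm, pandemic influenza.
SCENARIOS = ["food", "blackout", "snow", "flu"]

TABLE_IV = {
    "A1": {"linear": [0.21, 0.48, 0.42, 0.80], "expon": [0.029, 0.115, 0.224, 0.613]},
    "B1": {"linear": [0.28, 0.33, 0.22, 0.60], "expon": [0.046, 0.076, 0.117, 0.465]},
    "A2": {"linear": [0.22, 0.41, 0.31, 0.70], "expon": [0.032, 0.102, 0.164, 0.553]},
    "B2": {"linear": [0.24, 0.37, 0.29, 0.65], "expon": [0.038, 0.098, 0.168, 0.535]},
}


def rank_order(values):
    """Return scenario names sorted from least to most serious."""
    return [name for _, name in sorted(zip(values, SCENARIOS))]


def rank_orders(transform):
    """Seriousness rank order per perspective for one transformation."""
    return {p: rank_order(v[transform]) for p, v in TABLE_IV.items()}


linear_orders = rank_orders("linear")
expon_orders = rank_orders("expon")

for perspective in TABLE_IV:
    print(perspective, linear_orders[perspective], expon_orders[perspective])
```

Under the linear transformation only perspective B1 yields a different order from the other three, while the exponential transformation yields one and the same order for all four perspectives.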
This conclusion in fact means that simple addition or deletion of impact criteria may suffice for obtaining a valid account of the essential drivers of one’s overall impact seriousness judgment. If this idea is applied to the four different weighting perspectives (worldviews) specified in Table II, the conclusion would be that for perspective A1 (individualistic), low-weight criteria 1.2 and 4.1 might just as well be deleted; and similarly, criteria 1.1 and 3.1 for perspective B1 (egalitarian), criteria 1.2 and 4.1 for perspective A2 (fatalistic), and criteria 1.2 and 3.1 for B2 (hierarchical). These deletions would not change the overall seriousness rank orders of the four scenarios, as given in Table IV, under either a linear or an exponential criterion value function.

A recent in-depth assessment of multicriteria impact evaluation in the DNRA(34) has revealed that the worldviews or preference profiles used have limited validity and that the weight sets used may not be representative for the cultural views of the Dutch population. This need not be surprising as the Dutch methodologists themselves note in DNRA(3) (p. 81) that the preference profiles:

should be regarded as just an initial attempt to do this: the profiles . . . are derived in an intuitive – and not scientifically rigorous – way.

One suggestion by Willis et al.(34) is to ask Dutch representative groups and organizations directly for their concerns and evaluations about national hazards and risks, possibly with the help of a multiattribute risk-ranking procedure.(35) In view of the apparently limited role of differential importance weighting, one may nevertheless ask the question—which also applies to the U.K. NRA:(7) “But surely, not all impact criteria are of equal importance for the seriousness ranking of national risk scenarios?” Probably not, but this should be a political judgment, which may begin with an importance rank ordering (by pair comparisons, if need be) of all impact criteria and a subsequent distribution of 100% points among them, reflecting their relative importance. What complicates matters is that the nature as well as the importance of relevant impact criteria significantly depends on the set of risk scenarios to be evaluated. Simply put, a certain criterion should (also) be important to the extent that the scenarios get divergent or discriminating scores on it.(36) For the four scenarios in Table III, for example, this is obviously not the case with respect to criterion 1.1 (territorial encroachment) on which they all receive null scores. For the entire set of 39 risk scenarios in the DNRA,(5) the assessments on all 10 impact criteria appear to be unequally discriminating. This may provide reasons, for example, to assign greater weight to (strongly discriminating) “disturbance of everyday life” than to (weakly discriminating) “number of fatalities.”
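The idea that a criterion matters to the extent that scenarios receive divergent scores on it can be illustrated with a small sketch. The text does not prescribe a discrimination measure; the population standard deviation of linear-valued scores used below is an assumption of this example, as are all names. Only the all-null row for criterion 1.1 follows the text; the other score rows are hypothetical.

```python
from statistics import pstdev

# Linear valuation of A-E scores, with "-" as the null score.
LINEAR = {"-": 0.0, "A": 0.2, "B": 0.4, "C": 0.6, "D": 0.8, "E": 1.0}

# Scores of four scenarios per criterion. Criterion 1.1 is all-null as
# stated in the text; rows 2.1 and 5.1 are HYPOTHETICAL illustrations.
scores = {
    "1.1": ["-", "-", "-", "-"],
    "2.1": ["B", "B", "C", "C"],  # hypothetical: weakly discriminating
    "5.1": ["-", "C", "E", "E"],  # hypothetical: strongly discriminating
}


def discrimination(categories):
    """Spread of valued scores across scenarios (assumed measure)."""
    return pstdev([LINEAR[c] for c in categories])


def discrimination_weights(score_table):
    """Distribute 100 percentage points in proportion to discrimination."""
    spread = {k: discrimination(v) for k, v in score_table.items()}
    total = sum(spread.values())
    if total == 0:
        raise ValueError("no criterion discriminates between the scenarios")
    return {k: 100.0 * s / total for k, s in spread.items()}


weights = discrimination_weights(scores)
print(weights)
```

A criterion on which all scenarios score alike, such as 1.1 here, receives zero weight under this scheme, which is one way to read the remark that such a criterion cannot contribute to the seriousness ranking of the set at hand.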

3.2.3. Scoring and Evaluation of Risk Scenarios

Evidently, scenario scores on impact criteria are crucial elements of overall impact seriousness judgments. The careful idea of an MCA/MAUA is to first measure or assess the “objective” score of an alternative and then value each score following an adopted criterion value function, involving, for example, a linear, exponential, or other transformation of the “objective” scores, as indicated in Table III. For the four scenarios given there, one might question some of the impact criterion A–E scores assigned to them. For example, in the DNRA-2010(5) “food/meat scarcity” obtains a surprising C: “serious” on criterion 4.1 (environmental impact) and a null score on 5.1 (societal disruption). The electricity blackout scenario is given a B: “considerable” on criterion 2.1 (fatalities) as well as on 5.3 (public fear/anger), which one would rather expect to be assigned a C or a D for “(very) serious.” The heavy snow storm scenario is given a C: “serious” on criterion 3.1 (costs), a “catastrophic” E on criterion 5.1 (disruption), and a null score on 5.3 (fear/anger). Finally, serious pandemic influenza is scored indiscriminately as E: “catastrophic” on all four criteria 2.1 (fatalities), 2.3 (suffering), 5.1 (disruption), and 5.3 (fear/anger). If indeed the four example scenarios in Table I are considered to be “compression points” of what might actually occur (cf. Section 2.1), one may question their representativeness and/or the realism of the A–E score patterns assigned to them. Particularly the pandemic flu scenario seems to be overestimated, perhaps in view of the political and public concern raised by the strongly feared but actually mild flu epidemic befalling the Netherlands in the autumn of 2009 (footnote 7).

Footnote 7: In the DNRA-2011,(6) the criterion impact scores for serious pandemic flu have been moderated (“due to enhanced safety measures”) so as to yield an aggregate linear impact value of 0.46 instead of the original 0.62 (DNRA-2010(5)), as in Table III.

Following the MCA/MAUA, assignment of “objective” A–E scores to risk scenarios with respect to impact criteria is [to be] followed by a valuation of each score before any weighted summation is performed. To this end, a linear value function: A = 0.2, B = 0.4, . . . , E = 1.0, may be simplest. But possibly, an exponential function: A = 0.012, B = 0.037, C = 0.111, . . . , E = 1.0, may be more appropriate; see explicatory note under Table III. However, to what extent did or do the DNRA assessors already use the initial A–E scales such that the difference E – D (or D – C) is taken to be significantly larger than the difference C – B (or B – A)? This would imply a double increase or acceleration in impact seriousness for the higher (“very serious,” “catastrophic”) end of the scale. And, isn’t exponential evaluation already applied to some of the qualitative variables underlying separate impact criterion scores, such as the number of people injured (0–10, 10–100, 100–1,000, etc., for A, B, C, etc.) or the size of the area affected (0–100, 100–1,000, 1,000–10,000 km², etc., for A, B, C, etc.) (footnote 8)? This would amount to some kind of double weighting of impact criterion scores, whereby the final representation of hazard and threat scenarios (Fig. 1) may be significantly distorted.
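The two value functions and the double-weighting concern can be made concrete with a short sketch. The linear and base-3 exponential conversions follow the note under Table III; the continuation of the injury classes to D and E, the inclusive upper bounds, and all function names are assumptions of this example rather than DNRA definitions.

```python
CATEGORIES = ["A", "B", "C", "D", "E"]


def linear_value(cat):
    """A = 0.2, B = 0.4, ..., E = 1.0; null score '-' = 0."""
    return 0.0 if cat == "-" else (CATEGORIES.index(cat) + 1) / 5


def exponential_value(cat):
    """Base-3 conversion: A = 1/81, B = 3/81, ..., E = 81/81; '-' = 0."""
    return 0.0 if cat == "-" else 3 ** CATEGORIES.index(cat) / 81


def aggregate(scores, value_fn, weights=None):
    """Weighted sum of valued scores; uniform weights by default."""
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(w * value_fn(s) for w, s in zip(weights, scores))


# The text gives 0-10, 10-100, 100-1,000 for A, B, C. Extending this to
# D (up to 10,000) and E (above) is an ASSUMPTION of this sketch, as is
# treating each upper bound as inclusive.
INJURED_UPPER = [10, 100, 1_000, 10_000]


def injured_category(n):
    """Map a number of injured persons to an A-E category."""
    if n <= 0:
        return "-"
    for cat, upper in zip(CATEGORIES, INJURED_UPPER):
        if n <= upper:
            return cat
    return "E"


for n in (5, 50, 500, 5_000, 50_000):
    cat = injured_category(n)
    print(n, cat, linear_value(cat), round(exponential_value(cat), 3))
```

Because each category already spans a tenfold range of injured persons, applying the exponential value function on top of it multiplies the valued score by three for every tenfold increase, whereas the linear function adds a constant 0.2 per step.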

Footnote 8: See DNRA,(3) pp. 31–32, 35–36, 43.

4. LIKELIHOOD OF HAZARD OR THREAT SCENARIO

For justifiable safety decisions about a given set of hazard and threat scenarios, sheer possibility or plausibility must be somehow translated into scalable probability. Or conversely, and whether you like it or not, any safety decision actually made is reflective of certain degrees of belief (“likelihoods”) concerning the critical hazard or threat involved. For a straightforward discussion of likelihood assessment, this section first recapitulates the nature and meaning of probability. Thereafter, a summary of likelihood assessment in the DNRA(3) is presented, followed by a number of evaluative comments.

4.1. Frequentistic, Logical, and Personalistic Probabilities

Mathematically, probability is a numerical quantity between 0 and 1 that obeys a simple set of (Kolmogorov) axioms from which various other properties and rules are derived. Less widely accepted are the three customary interpretations of probability. Frequentistic probability is defined as the limit of a relative frequency, provided there is a sufficient number of observations under substantially similar conditions, which will also pertain to the relevant future (e.g., traffic accident statistics, elementary failure frequencies, sea-level and flooding statistics). Logical probability is defined as the result of logical inference or calculation, given the well-known and enduring characteristics of some “chance device” like a fair coin, a roulette wheel, or a valid fault-tree model of a technical installation (e.g., electric power plant, oil refinery, river dam). Personalistic probability is defined as a degree of belief of any reasonable person willing to engage in a proportional bet on the outcome of an uncertain event, as in “I bet you X against Y that it will rain tomorrow,” or “ . . . that the present bomb alarm is false” (footnote 9). Modern Bayesian decision theorists(37,38) maintain that any probability, even one associated with the outcome of a fair coin or a roulette wheel (or with 100 years of sea-level statistics, for that matter), is a future-oriented aid for making rational decisions. This means, for example, that relative frequency and/or logical measurements of probability should always be scrutinized not only for their satisfaction of essential probability axioms, but also for their equivalent applicability to the uncertain future event(s) under consideration, whose outcomes are still to occur.

One way to come to grips with the varying basis of probability assessment is a simple 2 × 2 taxonomy combining degree of frequentistic information with extent of human control over risk. In Table V, four “ideal” cases are presented, ranging from repetitive no-control situations in the upper left, to unique human-controlled situations in the lower right quadrant. The four cells of Table V indicate the typical (different) nature of risk taking involved, for each of which an example inspired by the DNRA is given. Obviously, the most convenient case for probabilistic risk analysis is a repetitive “act of God” situation (upper left quadrant). In the context of NRA generally, however, precisely the opposite case is the most pressing: rather unique, human-controlled risk problems (particularly threats) whose assessment is fraught with uncertainties. This would make a frequentistic probability concept the least suitable of all three.

Footnote 9: Thus, for all hazard or threat scenarios where expert opinion is the major basis for (epistemic) likelihood assessment, the expert(s) concerned might be challenged to indicate which personal stake they would put on the actual occurrence of the scenario under consideration.

Table V. Four Different Information Bases for Probability Estimation, Implying Different Approaches to Risk [Italics], with Examples from the DNRA(2,4–6); Adapted from Author(39)

Repetitive, frequentistic information
  Externally determined (“act of God”), upper left quadrant: Observed (or computable) relative frequencies of fortuitous events [Frequency-based gambling] → heavy snow storm, drought, wildfire
  Internally determined (human control), upper right quadrant: Observed (calibrated) relative frequencies of success/failure in human performance [Frequency-based achievement] → large railway/shipping/chemical accident
Unique, nonfrequentistic information [logical/personal]
  Externally determined (“act of God”), lower left quadrant: Properties (strengths, weaknesses) of systems, objects, or materials [Scenario-/model-based gambling] → sea-dike collapse, food/meat scarcity
  Internally determined (human control), lower right quadrant: Belief in the plausibility of a new hypothesis or in the feasibility of a new activity [Scenario-/model-based achievement] → electric network blackout, satellite fail-out

4.2. Summary of Likelihood Assessment

In the DNRA,(3) the likelihood of a particular risk scenario occurring within five years is assessed on various grounds, viz. historic (similar) events and case histories, probability model design and calculations, elementary failure frequency data combined with network or decision tree analysis, analyses of actors and actor strategies, and scenario and trend analyses involving expert opinions. Thus, it would seem that the frequentistic, logical, and personalistic interpretations of probability are all involved, in conjunction with varying methods of probability estimation. The DNRA assessors are to focus on two stages of a risk scenario:

The likelihood of the incident scenario is determined primarily by the trigger. . . . [It] is determined to a secondary extent by the consequence (impact) of the incident scenario. [footnote 10] . . . [Low likelihood of occurrence] . . . requires that a clear and uniform line of reasoning is followed where, besides trigger and impact, the context regarding the potential hazard/threat is also clearly described. (DNRA,(3) p. 53)

Obviously, DNRA assessors should indicate the uncertainty in their selection of any likelihood category. Scenarios are to be scored on a five-point interval scale ranging from A: “very unlikely” to E: “very likely,” whereby the ratio between successive categories (B/A, C/B, etc.) “should be kept equal as far as possible” (p. 53). Thus, in practice, when judgment C “possible” is taken to be twice as probable as B “unlikely,” then B “unlikely” should be twice as probable as A “very unlikely.” The decision-making implication of this is that one should be willing to double one’s stake on the occurrence of the relevant scenario when it is judged to be “possible” rather than “unlikely,” or “unlikely” rather than “very unlikely.”
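The scale logic can be sketched in a few lines of Python. The sketch assumes the category boundaries discussed for Table VI (0.05%, 0.5%, 5%, and 50% per five years) and the proportional conversion from a return period to a five-year percentage that the quoted expert judgments appear to use (for example, once per 25 years giving 20%). The function names and the treatment of boundary values are assumptions of this example.

```python
# Upper boundaries of categories A-D in percent per five years, as
# discussed for Table VI; anything at or above 50% falls into E.
# Assigning a boundary value to the higher category is an ASSUMPTION.
UPPER_BOUNDS = [("A", 0.05), ("B", 0.5), ("C", 5.0), ("D", 50.0)]


def likelihood_category(percent_per_5y):
    """Map a five-year likelihood (in percent) to a category A-E."""
    if not 0.0 <= percent_per_5y <= 100.0:
        raise ValueError("likelihood must lie between 0 and 100 percent")
    for category, upper in UPPER_BOUNDS:
        if percent_per_5y < upper:
            return category
    return "E"


def five_year_percent(return_period_years, horizon_years=5):
    """Proportional approximation: horizon divided by return period."""
    return 100.0 * horizon_years / return_period_years


def successive_ratios():
    """Ratios between successive upper boundaries (B/A, C/B, D/C)."""
    bounds = [upper for _, upper in UPPER_BOUNDS]
    return [later / earlier for earlier, later in zip(bounds, bounds[1:])]


for years in (25, 50, 1_000):
    pct = five_year_percent(years)
    print(years, pct, likelihood_category(pct))
print(successive_ratios())
```

With these boundaries the ratio between successive categories is 10 rather than the factor of 2 used in the verbal illustration above, so moving up one category would correspond to a tenfold, not a twofold, increase in the stake one should be willing to put on the scenario.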

Table VI shows the labeling of likelihood categories A–E for hazards (numerical and verbal) and for threats (only verbal). Hazard and threat scenarios are treated distinctly because for the latter “people’s intentions must be taken into account” (DNRA,(3) p. 54). Because threats involve rather unique events for which specific historic or statistical information is lacking, “the determination [of likelihood] will mainly be guided by expert opinions about scenarios, social trends and threat analyses” (DNRA,(3) p. 55). Here, expert judgment is focused on the likelihood (or rather the plausibility) of a terrorist or criminal attack and the likelihood of its success, whereby targets’ vulnerability and potential victims’ eventual self-protection are taken into account. In contrast, likelihood of hazard scenarios is significantly based on probabilistic risk analyses, with expert opinion brought in to adjust the computed probability values for altered conditions or recent changes in risk management. Following the DNRA-2010,(5) very likely risk scenarios are criminal distortion of the stock market, political salafism, and heavy snow storm. Very unlikely scenarios are a large railway or shipping accident, considerable coastal or river flooding, a nuclear incident, and willful disturbance of national gas distribution; see also Fig. 1. Table VII shows the main arguments underpinning experts’ likelihood estimates (from DNRA(4–6)) for the four example scenarios summarized in Table I.

Footnote 10: The idea here, presumably, is that the larger the impact the lower the likelihood.

4.3. Evaluative Comments on Likelihood Assessment

Table VI. Likelihood Categories with Their Numerical and/or Verbal Labels for Hazards and Threats(3)

Likelihood   Hazard       Hazard          Threat
category     (numeric)    (verbal)        (verbal only)
A            <0.05%       Very unlikely   No concrete indication and the event is not deemed conceivable
B            0.05–0.5%    Unlikely        No concrete indication, event deemed far-fetched but conceivable
C            0.5–5%       Possible        No concrete indication, but the event is conceivable
D            5–50%        Likely          The event is deemed very conceivable
E            >50%         Very likely     Concrete indication that the event will take place

Note: Numerical values indicate likelihood in percent per five years. To obtain probability values, all numbers should be divided by 100.

Table VII. Experts’ Probability Judgments (Quoted, in Author’s Translation) about the Four Risk Scenarios Summarized in Table I

Scenario: Food/meat scarcity (DNRA,(5) pp. 28, 54)
Summary of likelihood judgment: This is judged “possible” (likelihood category C) but not very probable because there are no concrete indications (except for China, GMO debate). Moreover, it would involve at least two disturbances at the same time. Droughts and plant diseases are conceivable, but China’s role is unpredictable and can have a big negative effect.

Scenario: National electricity blackout (note a) (DNRA,(4) p. 160)
Summary of likelihood judgment: The Netherlands has never been hit by (but a few times narrowly escaped) a national blackout. Although this is (considerably) less probable than a partial big failure, it is possible that this occurs. For lack of statistics, probability calculations are based on expert opinions, assuming that 0.2% of 10 risky European incidents per year could yield this scenario (once in 50 years). Following DNRA-2010,(5) this is “likely” (category D).

Scenario: Heavy snow storm (DNRA,(4) p. 57)
Summary of likelihood judgment: Heavy, several-day life-disrupting snow storms are rare. In the Netherlands, notorious storms occurred in 1937, 1942, 1945, 1947, 1958, 1963, and 1979. Thus, the probability may be set at 25–50% per year. The scenario is assigned to category E: very likely.

Scenario: Pandemic influenza (DNRA,(6) p. 24)
Summary of likelihood judgment: Case history reveals that pandemic flu may occur once every 10–40 years, on average once per 25 years. Thus, the probability of one pandemic within the next five years is 20%. Assumed that the serious and the mild variants are equally likely, this 20% value can be divided such that scenarios “serious” and “mild” pandemic flu are assigned a probability of 10% each (category D: likely).

Note a: Another energy risk scenario in the DNRA(4) (p. 145) is a “willful prolonged electricity blackout” due to a terrorist attack. Experts judge this event to be in category C: “possible.” A third energy scenario is an “improbable” (category B) double attack on a major electric switching station and a network-supporting coal-power plant; DNRA(4) (pp. 163–171) gives an elaborate description of this rare terrorist action.

Despite its apparently thorough consideration of likelihood assessment about national risk scenarios, the DNRA(3) does not offer a brief analysis of the (essential) probability concept and its measurement. Since the 1960s, a variety of probability assessment methods have been well researched and adapted for practical use.(40–42) Moreover, repeated empirical research has revealed significant probability heuristics and biases, which especially operate under limited-information conditions. Examples are cognitive availability, representativeness, anchoring and adjustment, overestimation of very low probabilities, and illusion of control or invulnerability.(43,44) These are not discussed and assessors are not advised about them in the DNRA.(3) Instead, the five-point likelihood and conceivability scales provided (Table VI), the straightforward “scoring” instructions given, and the adaptive correction factors suggested (for new or specific circumstances) in DNRA(3) leave the impression that likelihood assessment is a rather tentative and knot-cutting affair, which may not easily be replicable across different experts, may suffer from inconsistencies, and lacks transparency toward external groups and organizations. All this may weaken the comparability of hazard and threat scenarios following their likelihood or conceivability. As Table VI shows, the recommended ratio of 10 for successive likelihood categories only applies to the upper boundaries (not the virtual midpoints) of categories B, C, and D, while categories A and E fill up the total 0–100 range from 0.05% downwards and from 50% upwards, respectively. This means that the likelihood scale (in percent per five years) is rather coarsely truncated at both sides. For the lower end (