Report on the elicitation of expert volcanological

DELIVERABLE D6.1

EXPERT ELICITATIONS

Project number: EVR1-CT-2002-40026 Project Coordinator: Dr. Augusto Neri

Deliverable D6.1

Report on the elicitation of expert volcanological judgement for each designated volcano

W.P. Aspinall (Aspinall & Associates) with contributions from Paul Cole, Simon Young, Augusto Neri, Gordon Woo, Thea Hincks and EXPLORIS colleagues

Report period 24 months

H:\PUBLIC_ HTML\EXPLORIS\CONFIDENTIAL\PRODUCTS\REPORT\DELIVERABLES \WOR D2004\EXPLORIS _D6.1 .DOC

EXPLORIS Deliverable D6.1

Report on the elicitation of expert volcanological judgement for each designated volcano Prepared by Aspinall & Associates 16 March 2005

Executive Summary

When the EXPLORIS project was first conceived, it was envisaged that a number of elicitations of expert opinion would be conducted throughout the 3-year programme of work. The purpose of these elicitations would be to provide a suitable risk-informed perspective for all the objectives of the study, and to set a framework for configuring deliverables. There was agreement that a special aspect of this part of the project should be to monitor the way people’s opinions evolved over the course of the project and, in particular, to quantify if possible the impact on their views of the new scientific insights that the project generates.

This Deliverable D6.1 describes in detail the expert elicitation methodology that is being used within EXPLORIS, and summarises the outcomes of the initial elicitations that were conducted during the early phases of the project. These results will provide a reference basis for later elicitations, which will follow in Year 3, and for exercises to quantify the scientific uncertainty surrounding different eruption scenarios at the designated project volcanoes.

During the first two years of the EXPLORIS Project, forty-nine project participants have been inducted into a performance-based decision-support scheme, via the chosen structured elicitation procedure. Their judgements can be pooled, combined and used, whenever necessary or appropriate, to obtain evaluations of volcanological factors and variables of interest in the different study areas.

Working closely with WP2, a series of initial elicitations of opinions on the long-term recurrence rates for three of the four project volcanoes has been undertaken, involving as many relevant EXPLORIS partners as possible. The EC-sponsored EXCALIBR elicitation software has been adapted and used to provide a rational consensus of their expert opinions. The volcanoes that have been considered thus far are Vesuvius, Soufrière of Guadeloupe, and El Teide.
In the case of Guadeloupe, the implementation has involved a restricted sample of experts, and it is planned to extend the catchment to a wider group of specialists in early 2005. Application of the expert elicitation procedure to the Sete Cidades volcano case study will proceed as soon as is feasible. Valuable hazard- and risk-related guidance is being provided by the outcomes of these structured expert elicitations. The subjective data that are obtained from this procedure can be used now, together with new scientific results from the rest of the EXPLORIS Project, for assigning probability weights to the different volcanic hazards for each of the three volcanoes. These will be incorporated into event tree representations of hazard and risk, the construction and parameterisation of which are some of the main activities of WP6, scheduled for Years 2 and 3 of the project.

Contents

1 Introduction
2 Procedure
2.1 Background
2.2 Structured expert judgement
2.2.1 Point values
2.2.2 Discrete event probabilities
2.2.3 Distributions of continuous uncertain quantities
2.2.4 Conditionalization and dependence
2.3 Performance-based measures of expertise
2.4 Examples of expert judgement exercises
2.5 Constrained decision-maker optimisation
2.6 Setting the scene for EXPLORIS elicitations
3 Initial expert elicitations in EXPLORIS
3.1 Calibration – summary of experts’ performance-based weights
3.2 Initial elicitations for the EXPLORIS volcanoes
2.3.1 Event ‘size’
2.3.2 VSD probabilities of occurrence
2.3.3 Elicitation results
2.3.4 Designated volcano VSD cases: comparative results
4 Summing up and work-in-progress
5 Bibliography and references
6 Appendix 1: Expert elicitation - calibration questionnaire
7 Appendix 2: EXPLORIS VOLCANIC SCENARIOS: ELICITATION QUESTIONNAIRE

1 Introduction

The main objective of Work Package 6 (WP6) in the EXPLORIS project is to develop probabilistic hazard and risk assessment techniques and tools for the designated set of European volcanoes, identified as suitable application templates for shaping envisaged methodological advances. The outcomes of these advances will maximize the benefit to other work packages in deciding strategies for civil protection, population evacuation and casualty mitigation. The main activities associated with the objectives identified for WP6 are predominantly focused in the second and third years of the EXPLORIS project.

Within EXPLORIS, a series of structured expert judgement elicitations was planned to take place as the project progresses. The purpose of these elicitations is to derive rational, quantitative statements about the most appropriate values to use for variables of interest and, more importantly, to give expression to the scientific uncertainty that attaches to each. The elicitations are conducted under the direction of Aspinall & Associates, and the first exercise took place at the Kick-off Meeting in Pisa in January 2003 (see also Appendix 1). In that initial exercise, EXPLORIS participants who were present were each ‘scored’ against a performance-based measure, using seed questions relevant to volcanology. While precise answers to seed questions were known to the facilitator and organisers, these were unlikely to be known exactly by any individual expert. By this means, each individual’s ‘calibration’ and ‘informativeness’ could be quantified (see Sect. 2, below), and these measures combined into a personal weighting function to be applied to each expert’s opinion on any other questions that the project needs to evaluate by expert judgement. Subsequently, several other members of the project, who were not present in Pisa, were also calibrated so that, by the end of the first year, forty-nine EXPLORIS participants had been ‘calibrated’.

The structure of this report is as follows. Details of the elicitation procedure used in EXPLORIS, its theoretical basis and some application precedents are provided in the next section, Section 2. Summary results of the EXPLORIS group calibration indicating, in a general sense, the outcome of that exercise are provided in Section 3. Then, the results of applications of the elicitation procedure to ascertain preliminary probabilities of occurrence for various volcanic eruption scenarios at the designated volcanoes are presented. Section 4 summarises the progress made, thus far, and indicates future work and activities that remain to be done under this topic within the EXPLORIS Project. Finally, references relevant to this report are recorded in an extended bibliography in Section 5, and some additional information and guidance notes, circulated and used during the elicitation exercises, are recorded in two appendices.

2 Procedure

2.1 Background

Several approaches are available for the elicitation and aggregation of individual experts’ assessments, some of which can be termed “behavioural”, others “mathematical” (Clemen and Winkler, 1999). Mathematical methods construct a single ‘combined’ assessment for each variable, one by one, by applying procedures or analytical models that treat the individual separate variables autonomously. Behavioural aggregation methods, on the other hand, involve experts interacting together with a view to achieving homogeneity of information of relevance to the experts’ assessments across all the variables of interest. Through this interaction, some behavioural approaches, e.g., the expert information approach (Kaplan, 1992), aim at obtaining a clear agreement among the experts on the final probability density function obtained for each and every variable. In other approaches, such as those described by Budnitz et al. (1998) or by Keeney and Von Winterfeldt (1989), the interaction process is then followed by some form of elementary mathematical combining of the individual experts’ assessments in order to obtain a single (aggregated) probability density function per variable. Typically, these approaches rely on very simple combination schemes, such as equal weighting for all the participating experts. The mathematical approaches (with some component of modelling) and the behavioural approaches both seem to provide results that are inferior to simple mathematical combination rules (Clemen and Winkler, 1999). Furthermore, a group of experts tends to perform better than the average solitary expert, but the best individual in the group often outperforms the group as a whole (Clemen and Winkler, 1999). This motivates the adoption of procedures that elicit the assessments of individual experts without interaction during the elicitation process itself, which is then followed by simple mathematical aggregation in order to obtain a single assessment per variable. 
In this way, the individual experts’ assessments are given different weights on the basis of their performance and merit.

Following Cooke (1991), over the last 10 years the Delft University of Technology has developed methods and tools to support the formal application of expert judgement (see Cooke and Goossens, 2004), including the development of the computer software EXCALIBR (Cooke and Solomatine, 1992) for conducting analysis of elicitations. Applications have included consequence assessments for both chemical substances and nuclear accidents, and case histories in several other fields of engineering and scientific interest. The techniques developed by Delft can be applied to give quantitative assessments, or qualitative and comparative assessments. The former give rise to assessments of uncertainty in the form of probability distributions, from which nominal values of parameters can be derived for practical applications. The latter lead to rankings of alternatives.

The application of these techniques is underpinned by a number of principles, including scrutability (all data and all processing tools are open to peer review and results must be reproducible by competent reviewers), fairness (experts are not pre-judged), neutrality (methods of elicitation and processing should not bias results), and performance control (quantitative assessments are subjected to empirical quality controls).

The overall goal of these formal methods is to achieve rational consensus in the resulting assessments. This requires that diverse stakeholders ‘buy into’ or ‘take ownership’ of the process by which the results are reached, and that the process itself optimizes performance, as measured by valid functional criteria. Performance criteria are based on control assessments, that is, assessments of uncertain quantities closely resembling the variables of interest, for which true values (e.g., from experiments or observation) are known, or become known post hoc. Criteria for analysing control assessments are closely related to standard statistical methods, and are applied both to expert assessments, and to the combinations of expert assessments. The use of empirical control assessments is a distinctive feature of the Delft methods. The underlying methodology is described in “A Procedure Guide for Structured Expert Judgement”, published by the European Commission as EUR 18820 (Cooke and Goossens, 2000). The resources required for an expert judgement study vary greatly depending on the size and complexity of the study. A trained uncertainty analyst (or ‘facilitator’) is required for defining the issues and processing the results. Studies undertaken thus far have used as few as four and as many as fifty experts. The amount of expert time required for making the assessments depends on the subject and may vary from a few hours to as much as a week, for each participating expert. In the past, the total man-power time required for such studies has varied between one man-month and one man-year, although in certain special applications (e.g. volcano monitoring) the commitment may be condensed into shorter intervals. Other variables determining resource commitments are travel, training given to experts in subjective probability assessments, and the level of supporting documentation produced. 
However, post-elicitation processing and presentation of results are greatly facilitated by software support.

2.2 Structured expert judgement

Expert judgement has always played a large role in science and engineering. Increasingly, expert judgement is being recognized as another form of scientific data, and formal methods are emerging for treating it as such. This section gives a brief overview of methods for utilizing expert judgement in a structured manner - for more complete summaries see Hogarth (1987), Granger Morgan and Henrion (1990), Cooke (1991), or Meyer and Booker (2001).

In the world of engineering, technical expertise is generally separated from value judgements. Engineering judgement is often used to bridge the gap between hard technical evidence and mathematical rules on the one hand and the unknown, or unknowable, characteristics of a technical system on the other. Numerical statements or evaluations that are tantamount to data have to be derived which are suitable for the practical problem at hand, and engineers are usually able to provide these essentially subjective data through insights from engineering models and from experience. The same is true for expert judgements: models and experience largely inform the subjective experts’ assessments, which is why certain specialists acquire recognised expertise in certain fields of interest.

Skipp and Woo (1993) take the conversation further, however. They argue that expert judgement should be distinguished from engineering judgement on the grounds that the former is, and must be, clearly anchored in a formal probabilistic framework, whereas that attribute is often absent from the latter.

2.2.1 Point values

In many earlier elicitation methods, most notably the Delphi method (Helmer, 1966), experts are asked to guess the values of unknown quantities. Their answers are single point estimates. When these unknown values become known through observation, the observed values can be compared with the estimates. There are several reasons why this type of assessment is no longer in widespread use, which Cooke and Goossens (2004) summarise as follows.

First, any comparison of observed values and estimates must make use of some scale on which the values are measured, and the method of comparison must incorporate the same properties of that scale. For example, percentages are measured on an absolute scale between 0 and 100; mass is measured on a ratio scale (values are invariant up to multiplication by a positive constant); wealth is often referred to an interval scale (values are invariant up to a positive constant and a choice of zero). In other cases, values are fixed only as regards rank order (an ordinal scale); a series of values may contain the same information as the series of logarithms of values, etc. To be meaningful, the measurement of discrepancy between observed and estimated values must have the same invariance properties as the relevant scales on which the values are measured. In other words, the meanings of ‘close’ and ‘far away’ are scale dependent, which makes it very difficult to combine scores for variables measured on different scales.

A second and, in the present context, critical disadvantage with point estimates is that they give no indication of uncertainty. Expert judgement is typically applied when there is substantial uncertainty regarding true values and, in such cases, it is almost always essential to have some picture of the uncertainty in the assessments.

A third disadvantage is that methods for processing and combining judgements are typically derived from methods for processing and combining actual physical measurements. 
This has the effect of treating expert assessments as if they were physical measurements in the normal sense, which they are not. On the positive side, however, point estimates are easy to obtain and can be gathered quickly – thus, these types of assessment will always have some place in the world of the expert, if only in the realm of the “quick and dirty”. For psychometric evaluations of Delphi methods, see Brockhoff (1975), and Gustafson et al. (1973), and see Cooke (1991) for a review.
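
The scale-dependence argument made above for point estimates can be illustrated with a small sketch. The numbers, function names and the two discrepancy measures chosen here are illustrative assumptions, not part of the Delphi method or of any EXPLORIS procedure; the point is only that two reasonable measures of 'closeness' can rank the same pair of estimates differently.

```python
import math


def abs_error(estimate, observed):
    """Discrepancy on a linear scale: changes if the unit of measurement changes."""
    return abs(estimate - observed)


def log_ratio_error(estimate, observed):
    """Discrepancy on a ratio scale: unchanged when both values are
    multiplied by the same positive constant (e.g. a change of units)."""
    return abs(math.log(estimate / observed))


# Hypothetical observed value and two hypothetical point estimates.
observed = 100.0
estimate_a = 150.0
estimate_b = 60.0

for name, est in (("A", estimate_a), ("B", estimate_b)):
    print(name, abs_error(est, observed), round(log_ratio_error(est, observed), 3))
```

On the linear scale estimate B is the closer of the two, while on the ratio scale estimate A is closer, so any scoring of point estimates has to commit to a scale before 'close' has a meaning.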

2.2.2 Discrete event probabilities

An uncertain event is one that either occurs or does not occur, though we do not know which a priori. The archetypal example is “will it rain tomorrow?” Experts are often asked to assess the probability of occurrence of such events, with the assessment usually taking the form of a single point value in the [0,1] interval, for each uncertain event. The assessment of discrete event probabilities must be distinguished from the assessment of ‘limit relative frequencies of occurrence’ in a potentially infinite class of experiments (the so-called reference class). The variable ‘limit relative frequency of rain in days for which the average temperature is 20 degrees Celsius’ is not a discrete event. This is not something that either occurs or does not occur; rather this variable can take any value in [0,1], and under suitable assumptions the value of this variable can be measured approximately by observing large finite populations. If ‘limit relative frequency of occurrence’ is replaced by ‘probability’, then careless formulations can easily introduce confusion and misleading outcomes. Confusion is avoided by carefully specifying the reference class whenever discrete event probabilities are not intended.

Methods for processing expert assessments of discrete event probabilities are similar in concept to methods for processing assessments of distributions of random variables. For an early review of methods and experiments see Kahneman et al. (1982); for a discussion of performance evaluation see Cooke (1991).

2.2.3 Distributions of continuous uncertain quantities

For applications in uncertainty analysis, concern is mostly with random variables taking values in some continuous range. Strictly speaking the notion of a random variable is defined with respect to a probability space in which a probability measure is specified, hence the term ‘random variable’ entails a distribution. Therefore the term ‘uncertain quantity’ is preferred - an uncertain quantity assumes a unique real value, but it is not certain as to what this value is. The uncertainty is described by a subjective probability distribution. The concern is with cases in which the uncertain quantity can assume values in a continuous range.

An expert is confronted with an uncertain quantity, say X, and is asked to specify information about his subjective distribution over the possible values of X. The assessment may take a number of different forms. The expert may specify his cumulative distribution function, or his density or mass function (whichever is appropriate). Alternatively, the analyst may require only partial information about the distribution. This partial information might be the mean and standard deviation, say, or it might be several quantiles of the distribution. For r in [0,1], the r-th quantile is the smallest number $x_r$ such that the expert’s probability for the event {X ≤ $x_r$} is equal to r. The 50% quantile is the median of the distribution. Typically, only the 5%, 50% and 95% quantiles are requested, and distributions are fitted to the elicited quantiles.
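
As a minimal sketch of what 'fitting a distribution to the elicited quantiles' can look like, the following Python fragment interpolates linearly between the 5%, 50% and 95% quantiles, which amounts to spreading probability uniformly within each inter-quantile interval. This particular fitting rule, the function name `fit_piecewise_uniform`, and the `lower` and `upper` support bounds are assumptions made for illustration; the report does not specify the fitting method at this point.

```python
def fit_piecewise_uniform(q05, q50, q95, lower, upper):
    """Build CDF and quantile functions from three elicited quantiles.

    `lower` and `upper` are hypothetical bounds on the support (an
    assumption of this sketch); they must satisfy
    lower < q05 < q50 < q95 < upper.
    """
    xs = (lower, q05, q50, q95, upper)
    ps = (0.0, 0.05, 0.50, 0.95, 1.0)
    if any(x0 >= x1 for x0, x1 in zip(xs, xs[1:])):
        raise ValueError("need lower < q05 < q50 < q95 < upper")
    # Each segment carries (x_start, x_end, p_start, p_end).
    segments = list(zip(xs, xs[1:], ps, ps[1:]))

    def cdf(x):
        """Probability that the uncertain quantity is <= x."""
        if x <= lower:
            return 0.0
        if x >= upper:
            return 1.0
        for x0, x1, p0, p1 in segments:
            if x0 <= x <= x1:
                return p0 + (p1 - p0) * (x - x0) / (x1 - x0)

    def quantile(r):
        """Smallest x whose cumulative probability equals r."""
        if not 0.0 <= r <= 1.0:
            raise ValueError("r must lie in [0, 1]")
        for x0, x1, p0, p1 in segments:
            if p0 <= r <= p1:
                return x0 + (x1 - x0) * (r - p0) / (p1 - p0)

    return cdf, quantile


# Hypothetical elicited values, purely for illustration.
cdf, quantile = fit_piecewise_uniform(10.0, 20.0, 40.0, lower=5.0, upper=50.0)
print(cdf(30.0), quantile(0.5))
```

The choice of support bounds matters for the tails: with only three quantiles elicited, the 5% of probability below the lowest quantile and above the highest has to be placed somewhere by the analyst.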

2.2.4 Conditionalization and dependence

When expert judgement is cast in the form of distributions of uncertain quantities, the issues of conditionalization and dependence are important. When uncertainty is quantified in an uncertainty analysis, it is always uncertainty conditional on something. Thus it is essential to make clear the background information conditional on which the uncertainty is to be assessed. For this reason, the facilitator should ensure that a clear ‘case structure’ is always provided. Failure to specify background information can lead experts to conditionalize their uncertainties in different ways or on different assumptions, and this can introduce unnecessary ‘noise’ or scatter into the assessment process.

The background information will not specify values of all relevant variables. Obviously relevant but unspecified variables should be identified, though an exhaustive list of relevant variables is seldom possible. Uncertainty caused by unknown values of unspecified variables must be ‘folded into’ the uncertainty of the target variables. This is an essential task of the experts in developing their assessments. Variables whose values are not specified in the background information can cause dependencies in the uncertainties of target variables.

Dependence in uncertainty analysis is an active issue and methods for dealing with dependence are still very much under development. Suffice it to say here that the analyst must pre-identify groups of variables between which significant dependence may be expected, and must query experts about dependencies in their subjective distributions for these variables. Methods for doing this are discussed in Cooke and Goossens (2000), and Kraan and Cooke (2000).

2.3 Performance-based measures of expertise

For deriving uncertainty distributions over model parameters from expert judgements the so-called Classical Model has been developed in Delft (Bedford and Cooke, 2001). Other methods to elicit expert judgements are available, for instance for seismic applications (Budnitz et al., 1998) and nuclear applications (USNRC, 1990). The European Union recently finalized a benchmark study among various expert judgement methods (Cojazzi and Fogli, 2000; Cojazzi et al., 2000). In a joint study by the European Communities and the Nuclear Regulatory Commission the benefits of the latter method - the so-called NUREG-1150 method (Hora and Iman, 1989) - have been used incorporating many elements of the Classical Model (Goossens and Harper, 1998).

The Classical Model is a performance-based linear pooling or weighted averaging model. The weights are derived from experts’ calibration and information performance, as measured on calibration or seed variables. These are variables from the experts’ field whose values become known to the experts post hoc. Seed variables serve a threefold purpose: (i) to quantify experts’ performance as subjective probability assessors, (ii) to enable performance-optimized combinations of expert distributions, and (iii) to evaluate and hopefully validate the combination of expert judgements. The name ‘classical model’ derives from an analogy between calibration measurement and classical statistical hypothesis testing. It contrasts with various Bayesian models.

The Classical Model contains three different weighting schemes for aggregating the distributions elicited from the experts. These weighting schemes are ‘equal weighting’, ‘global weighting’, and ‘item weighting’. The different weighting schemes are distinguished by the means by which the weights are assigned to the uncertainty assessments of each expert. The equal weighting aggregation scheme assigns equal weight to each expert. 
If N experts have assessed a given set of variables, the weights for each density are 1/N; hence for variable i in this set the (equal weights) decision maker’s CDF is given by:

$$F_{\mathrm{ewdm},i} = \frac{1}{N} \sum_{j=1}^{N} f_{j,i}$$

where $f_{j,i}$ is the cumulative probability associated with expert j’s assessment for variable i.

Global and item-based weighting techniques are termed performance-based weighting techniques because weights are developed based on an expert’s performance on seed variables. Global weights are determined, per expert, by the expert’s calibration score and overall information score. The calibration score is determined for each expert by his assessments of seed variables. The information score is related to the width of the uncertainty band and the placement of the median provided by the expert. As with global weights, item weights are determined by the expert’s calibration score. Whereas global weights are determined by expert, item weights are determined jointly by expert and by variable in a way that is sensitive to the expert’s informativeness for each variable.

The performance-based weights use two quantitative measures of performance: calibration and information (or, ‘informativeness’). Calibration measures the statistical likelihood that a set of experimental results corresponds, in a statistical sense, with the experts’ assessments. At the heart of Cooke’s “classical” model is the concept: given a set of known (or knowable) seed items, for each expert test the hypothesis H0: “This expert is well calibrated”, leading to likelihood of acceptance at some defined significance level. Then use this likelihood to define the expert’s calibration score:

$$C_j = 1 - \chi^2_R\left(2 \cdot M \cdot I(s_j, p) \cdot \mathrm{Power}\right)$$

where j denotes the expert, R is the number of quantiles (= degrees of freedom), M is the number of seed variables used in calibration, and $I(s_j, p)$ is a measure of information (see below). $C_j$ corresponds to the asymptotic probability of seeing a deviation between s and p at least as great as $I(s_j, p)$, under the hypothesis. Thus, loosely, the calibration score is the probability that the divergence between the expert’s probabilities and the observed values of the seed variables might have arisen by chance. A low score (near zero) means that it is likely, in a statistical sense, that the expert’s probabilities are ‘wrong’. Similarly, a high score (near one or, at least, greater than, say, 0.05) means that the expert’s probabilities are statistically supported by the set of seed variables.

Informativeness represents the degree to which an expert’s distribution is concentrated, relative to some user-selected background measure. For instance, one could estimate an individual’s information score relative to a uniform or log-uniform density function from:

$$I_j(s_j, p) = \frac{1}{n} \sum_{i=1}^{n} s_i \ln\left(\frac{s_i}{p_i}\right)$$

where si is a sample distribution obtained from the expert on the seed variables, and pi is a suitable reference density function, depending on the appropriate scaling for the item. Thus, the overall information score is the mean of the information scores for each variable. This is proportional to the information in the expert’s joint distribution relative to the joint background measure, under the assumption of independence. Independence in the experts’ distributions means that the experts would not revise their distributions for some variables after seeing realizations for other variables. Scoring calibration and information under the assumption of independence reflects the fact that expert learning is not a primary goal of the study. The individual expert weights in the Classical model are proportional to the product of the calibration (statistical likelihood) and informativeness scores of the expert, where the latter is now estimated from all variables jointly, that is, both seeds and unknowns:

W_j = C_j × I_j(s_j, p)

The W_j can be normalised across all experts to get relative weights. ‘Good expertise’ corresponds to good calibration (high statistical likelihood) and high information content (good informativeness). Once the individual experts have been scored and weighted, they can be pooled together to form a combined expert (or Decision Maker, DM), and the net calibration and informativeness of this synthetic expert can also be measured. For more detail see Cooke et al. (1988), Cooke (1991) and Bedford and Cooke (2001).

Thus, in the Delft Classical model, calibration and information are combined to yield an overall or combined score with the following properties:

1. Calibration predominates over informativeness, and informativeness serves to modulate between more or less equally well-calibrated experts.

2. The score is a long-run proper scoring rule, that is, an expert achieves his/her maximal expected score, in the long run, by and only by stating his/her true beliefs. Hence, the weighting scheme, regarded as a reward structure, does not bias the experts to give assessments at variance with their real beliefs, in compliance with the principle of neutrality.

3. Calibration is scored as ‘statistical likelihood with a cut-off’. An expert is associated with a statistical hypothesis, and the seed variables enable measurement of the degree to which that hypothesis is supported by observed data. If this likelihood score is below a certain cut-off point, the expert is unweighted. The use of a cut-off is driven by property (2) above. Whereas the theory of proper scoring rules says that there must be such a cut-off, it does not say what value the cut-off should be.

4. The cut-off value for (un)weighting experts is determined by either optimizing or constraining the calibration and information performance of the combination (see Sect. 2.5 below).

A fundamental assumption of the Classical model (as well as Bayesian models) is that the future performance of experts can be judged on the basis of past performance, as reflected in the seed variables. Seed variables enable empirical control of any combination schemes, not just those that optimize performance on seed variables. Therefore, choosing good seed variables is of general interest - see Goossens et al. (1996, 1998) for background and discussion.

The procedures necessary for using the Classical model in practice have been implemented in the software package EXCALIBR (Cooke and Solomatine, 1992), with support from the European Community. Examples of expert judgement studies using EXCALIBR and the Delft procedures are provided in the references, and short descriptions of some selected cases are provided in the next section.
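The weighting scheme described above can be illustrated with a minimal sketch. It is not the EXCALIBR implementation. It assumes that each expert gives 5%, 50% and 95% quantiles for every seed item, that calibration is taken as the chi-square tail probability (three degrees of freedom) of 2N times the relative information of the observed bin frequencies with respect to (0.05, 0.45, 0.45, 0.05), and that the information scores are supplied from elsewhere. All function names are hypothetical.

```python
import math

# Probability mass expected between the elicited 5%, 50% and 95% quantiles.
P_REF = (0.05, 0.45, 0.45, 0.05)

def bin_frequencies(assessments, realizations):
    """Fraction of seed realizations falling in each inter-quantile bin."""
    counts = [0, 0, 0, 0]
    for (q05, q50, q95), x in zip(assessments, realizations):
        if x <= q05:
            counts[0] += 1
        elif x <= q50:
            counts[1] += 1
        elif x <= q95:
            counts[2] += 1
        else:
            counts[3] += 1
    return [c / len(realizations) for c in counts]

def relative_information(s, p):
    """Relative information I(s; p) of sample distribution s w.r.t. p."""
    return sum(si * math.log(si / pi) for si, pi in zip(s, p) if si > 0)

def chi2_sf_3dof(x):
    """Upper tail probability of a chi-square variate with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    cdf = math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    return max(0.0, 1.0 - cdf)

def calibration_score(assessments, realizations):
    """Calibration C_j as the likelihood of the expert's statistical hypothesis."""
    n = len(realizations)
    s = bin_frequencies(assessments, realizations)
    return chi2_sf_3dof(2 * n * relative_information(s, P_REF))

def normalised_weights(calibration, information, cutoff=0.0):
    """W_j = C_j * I_j, set to zero below the calibration cut-off, then normalised."""
    raw = [c * i if c >= cutoff else 0.0 for c, i in zip(calibration, information)]
    total = sum(raw)
    return [w / total for w in raw]
```

For example, `normalised_weights([0.5, 0.1, 0.001], [1.0, 2.0, 3.0], cutoff=0.01)` leaves the third (poorly calibrated) expert unweighted and shares the weight between the first two in the ratio 5:2.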

2.4 Examples of expert judgement exercises

This section highlights just a few of the many expert judgement exercises executed with the Delft methods and the EXCALIBR program. In the field of volcanology, the most intensive use and greatest experience has been obtained in connection with the eruption of the Soufrière Hills volcano, Montserrat, from 1995 onwards (Aspinall, 1997, 1998; Aspinall and Cooke, 1998; Aspinall et al., 2002). The suggestion to use a structured approach to the elicitation of expert opinion in a volcano crisis was initially put forward by Aspinall and Woo (1993; 1994). Thus, within the volcanological context of the present EXPLORIS project, the application of formalised expert judgement procedures takes advantage of the insights gained from experience in Montserrat.

As an example from another discipline, the EU Seveso II Directive requires risk assessments for, among others, water pollution by chemical establishments. Implementation of the Directive required a ranking of the contributions of the management factors and relative failure frequencies of chemical activities, such as processes, storage and transport (Goossens and Cooke, 1997). Rankings of management factors were also assessed with the paired comparisons method for safety management systems (Hale et al., 2000) as well as for the reliability of linings of solid waste landfills (Rodic and Goossens, 2001; Rodic, 2000).

Most expert judgement exercises are aimed at providing quantitative assessments. Airborne releases of large amounts of toxic chemicals could result in major consequences. Dose-response relations of these chemicals are largely uncertain and difficult to achieve through animal
experiments, but some relations have been established for a few chemicals to show the potential of formal expert judgement (Goossens et al., 1992; Goossens et al., 1998). Comparable exercises have been undertaken for assessing hazardous airborne particles under various meteorological conditions (Cooke, 1994; Cooke et al., 1994). In succession to the last example, quantitative assessments of probability distributions for important parameters in accident consequence models for nuclear power stations (Goossens and Harper, 1998; Goossens and Kelly, 2000) have been derived and used in the uncertainty analysis of the accident consequence software package COSYMA. In this particular project experts also assessed correlations between parameters as conditional probabilities. In the aerospace sector, expert judgement exercises (Cooke et al., 1990) were done to determine predictions for space debris impacts and space shuttle composite material loading failures. The hazards identified in using a certain composite material in a spacecraft structure all related to the reduction of the load-carrying capabilities of the material eventually leading to probabilities of catastrophic failure during major loading episodes. The space debris exercise aimed at providing insights into future impact loadings by space debris in order to identify design criteria for spacecraft systems to provide protection against damage. In the veterinary sector, expert judgements were used to come up with model parameters for bovine respiratory diseases (Van der Fels-Klerx et al., 2002). The quantitative experts’ assessments were fed into economic models for farming practices. Some expert judgement exercises have also been performed for supporting the reliability of critical infrastructures. For example, the Netherlands is a unique country with respect to transport over water and flooding prevention needs. 
One exercise provided quantitative assessments of dike-ring failure probabilities in order to prevent the country from flooding (Slijkhuis et al., 1998). Another provided assessments for the reliability of movable water barriers (Van Elst, 1997), in particular related to human handling failure probabilities. A third exercise elicited contributing factors to accident frequencies for inland waterway transportation by ships. Also quantitative assessments were elicited for the gas company in the Netherlands in order to support the understanding of failure rates for underground transportation of gas through pipelines (Cooke et al., 1996; Cooke and Jager, 1998). Water safety considerations are not restricted to the Netherlands, however, and the same expert elicitation procedures and EXCALIBR program have been used to parameterise models for predicting progressive internal dam erosion in the UK (Brown and Aspinall, 2004). In the latter case, the issue of how to select the cut-off level for un-weighting individual experts (i.e. when they should receive zero weight) was addressed in a practical application, and the approach followed is described briefly in the next section.

2.5 Constrained decision-maker optimisation


As noted in Sect. 2.3 above, the cut-off value for un-weighting experts is determined either by optimizing the calibration and information performance of the combined synthetic ‘expert’ against proper scoring rules, or by providing constraining criteria, based on other considerations. One of the strengths of the Delft approach, and the EXCALIBR implementation, is that it allows a wide variety of pooling and scoring schemes to be explored quantitatively for any individual elicitation exercise.

In the British dam safety study (Brown and Aspinall, 2004), for instance, the main results from the elicitation were derived by fixing, pragmatically, the calibration power and significance level parameters, so as to ensure that all experts obtain some positive, non-zero weight, and that the ratio between the highest and lowest weights was not too extreme. The span between the best and poorest performances was fixed, after discussion with the owners of the survey, to be no more than two orders of magnitude (i.e. the highest weighting being a factor of 100 times the lowest, or less). This approach, in which the weights of individuals are factored before pooling the whole group, quite strongly moderates the optimization of the synthetic decision-maker, and hence curtails the weight given to that entity as a virtual expert. In that instance, additional analyses were conducted for the purpose of adjusting (but not maximizing), in some realistic sense, the synthetic decision-maker’s performance so that the harshness of rejection of low-weighted real experts was limited. This was achieved by tuning the power of the chi-square test and the related significance level setting, which together determine the calibration threshold value. There is a wide range of possible combinations of settings for these tests and, in the case of the dam study, it was decided that, whatever selections were made, a majority of the group (i.e. 
for no less than six of the eleven experts) must retain non-zero weights. Supplementary analysis runs were undertaken, therefore, to examine how the elicitation results might change if this position was adopted. The calibration power and significance level were each increased incrementally to allow the analysis to give more weight to the synthetic expert, until the minimum size of majority, mentioned above, was reached.

The results produced by this alternative pooling configuration were not dramatically different from those obtained with conventional optimisation, although there are notable changes for a few items, and hints of systematic shifts in the central value outcomes in several others. The fact that the differences were generally modest is not surprising, however, given that the discounted experts had quite low performance scores, and were not exerting much influence on the joint pooling, anyway. What is significant, however, is that, as a result, much greater authority was given to the synthetic decision-maker. This increased weighting represents a shift towards a more homogeneous collective combination of the views of the most influential experts, and a situation where the synthetic decision-maker then significantly out-performed the best individual expert. On this basis, it could be argued that results obtained under this ‘constrained optimisation’ scheme represent a better and more rational consensus and, as a consequence, should be preferred over those from the whole group.

Further analysis of the experts’ contributions to the synthetic DM is possible by activating EXCALIBR’s ‘expert robustness’ option. This is a facility for iteratively re-running the analysis, dropping one expert at a time, to show what impact his omission has on the DM's calibration score and informativeness. In the dam safety case, a breakdown of the contributions of the positively weighted experts indicates that three of them have detectable influences: two influence (in a positive sense) the DM's calibration score, and another exerts particular pressure on the DM's informativeness score. That said, the other three experts also contribute to characterizing the DM, but to an extent that is much less marked, and very similar, one to another.

The particular expert who influences the DM’s informativeness presents an interesting example of expert judgement: his calibration score was fairly good (but not the best), and for ALL items in the subject questionnaire his informativeness measure is also quite good, but not exceptional. However, he had a particularly effective informativeness score for the seed questions, and this significantly enhances his weight and ranking overall. So, in the robustness trial, dropping this particular expert appears to improve the DM's relative calibration score much more than dropping any of the other experts (including the lowest weighted!). But, in doing so, the DM's informativeness is reduced significantly, too.

Importantly, what this robustness analysis showed was that the virtual DM in this example was not dominated by any single real expert (as has been found occasionally in other applications). Therefore, it was recommended that the synthetic decision-maker outputs, obtained with the so-called ‘constrained optimization’, should be used for informing the parameterization of the proposed internal erosion model.
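The drop-one-expert idea can be sketched in a few lines. This is an illustration only, not EXCALIBR's ‘expert robustness’ option: it assumes the DM is a weighted linear pool of piecewise-linear expert distributions on a log10 scale, with a 10% overshoot on the pooled range, and it tracks the pooled median as a simple stand-in for the DM's calibration and informativeness scores. Function names are hypothetical.

```python
import math

def expert_cdf(x, quantiles, lo, hi):
    """Piecewise-linear CDF through (lo, 0), (q05, .05), (q50, .5), (q95, .95), (hi, 1)."""
    xs = [lo, *quantiles, hi]
    ps = [0.0, 0.05, 0.5, 0.95, 1.0]
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    for k in range(4):
        if x <= xs[k + 1]:
            span = xs[k + 1] - xs[k]
            if span == 0:
                return ps[k + 1]
            return ps[k] + (ps[k + 1] - ps[k]) * (x - xs[k]) / span
    return 1.0

def pooled_quantile(experts, weights, prob):
    """Quantile of the weighted mixture of expert distributions (log10 scale)."""
    logs = [[math.log10(q) for q in e] for e in experts]
    lo = min(q[0] for q in logs)
    hi = max(q[2] for q in logs)
    margin = 0.1 * (hi - lo)  # assumed 10% overshoot on the pooled range
    lo, hi = lo - margin, hi + margin
    total = sum(weights)

    def mixture(x):
        return sum(w * expert_cdf(x, q, lo, hi) for w, q in zip(weights, logs)) / total

    a, b = lo, hi
    for _ in range(60):  # bisection on the monotone mixture CDF
        mid = 0.5 * (a + b)
        if mixture(mid) < prob:
            a = mid
        else:
            b = mid
    return 10 ** (0.5 * (a + b))

def leave_one_out_medians(experts, weights):
    """Pooled median for the full group, and with each expert dropped in turn."""
    full = pooled_quantile(experts, weights, 0.5)
    dropped = {}
    for j in range(len(experts)):
        rest_e = experts[:j] + experts[j + 1:]
        rest_w = weights[:j] + weights[j + 1:]
        dropped[j] = pooled_quantile(rest_e, rest_w, 0.5)
    return full, dropped
```

A large shift in the pooled value when one particular expert is removed would flag that expert as dominating the combination, which is the situation the robustness analysis is intended to detect.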

2.6 Setting the scene for EXPLORIS elicitations

In the light of the insights provided by previous experience and applications of the Delft approach to expert elicitations with the EXCALIBR procedure, it is considered advantageous to follow the same pragmatic approach within the present EXPLORIS project. In particular, it is proposed to adopt a subjective limit on the decision-maker optimization by preserving a limit of no more than two orders of magnitude variation between the highest and lowest weightings, as this has been found satisfactory in other disciplines where quantitative information is weakly constrained.

In the context of the EXPLORIS Project, an important point to recognise is that, in contrast to some procedures, all the participants in the expert elicitations are asked to provide a spread of values within which they feel the ‘true’ answer should fall for each variable or factor, not just their ‘best estimate’ value for that question. This distinguishes the approach from most traditional methods for obtaining guidance on issues for which there is only very sparse (or no) data. In this way, a proper measure of each individual’s range of uncertainty can be fully incorporated into the elicitation, via the EXCALIBR procedure, and a realistic indication of true collective scientific uncertainty can be gained.
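The two-orders-of-magnitude limit on the spread of weights can be illustrated with a short sketch. It assumes, for illustration only, that the calibration power acts by scaling the effective number of seed items in the chi-square statistic, and that each expert is summarised by a number of seed items, a relative-information value and an information score. These field names and the scanning step are hypothetical, and the procedure is not claimed to reproduce EXCALIBR's internal treatment.

```python
import math

def chi2_sf_3dof(x):
    """Upper tail probability of a chi-square variate with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    cdf = math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    return max(0.0, 1.0 - cdf)

def raw_weights(experts, power):
    """Unnormalised weights C_j * I_j at a given calibration power."""
    return [
        chi2_sf_3dof(2 * e["n_seeds"] * power * e["rel_info"]) * e["info_score"]
        for e in experts
    ]

def tune_power(experts, max_ratio=100.0):
    """Highest power (in steps of 0.01) keeping max/min weight within max_ratio."""
    for k in range(100, 0, -1):
        power = k / 100
        w = raw_weights(experts, power)
        if min(w) > 0 and max(w) / min(w) <= max_ratio:
            return power, w
    return None  # no power in (0, 1] satisfies the constraint

# Hypothetical two-expert example: one well calibrated, one poorly calibrated.
experts = [
    {"n_seeds": 10, "rel_info": 0.05, "info_score": 1.0},
    {"n_seeds": 10, "rel_info": 0.90, "info_score": 1.5},
]
result = tune_power(experts)
```

Scanning downwards from full power means the returned setting is the least drastic reduction that still satisfies the ratio constraint, which mirrors the pragmatic intent described in the text.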

3 Initial expert elicitations in EXPLORIS

The initial performance-based calibration exercise for EXPLORIS members was conducted at the Kick-off Meeting in Pisa in January 2003, using a set of ‘seed questions’ devised specifically for the purpose (see Appendix 1). The outcome of that exercise was a set of numerical scores for the ‘informativeness’ and ‘calibration’ of the experts: these ranged from high to low, depending upon the calibration power limit imposed, and upon the individual performance of each expert against the set of seed questions, as summarised in the next sub-section.

3.1 Calibration – summary of experts’ performance-based weights

The outcome of an EXCALIBR performance-based calibration exercise is a set of numerical scores for the experts, each individual’s score representing their empirical performance in making judgments about uncertain parameters. For the EXPLORIS group in Pisa, the statistical power of the Delft Classical method was adjusted to keep the range of weights within two orders of magnitude, as decided on the advice of the facilitator.

Fig. 3.1 Summary plot of EXPLORIS calibration results, showing informativeness and calibration scores for all participants. Symbol size indicates the relative overall weighting given to each expert.


Taken as a whole, the pattern of the calibration results (see Fig. 3.1) is closely similar to that encountered with any large group of specialist experts, in any field of science. When their informativeness and calibration measures are combined together to produce individual weights (as described in Section 2.3, above), a few gain ‘high’ scores, and many are given much lower scores. Fig. 3.2 is an alternative, ternary plot of the same results, showing jointly on one diagram the relationships between individual informativeness, calibration and relative weight for EXPLORIS experts’ performance-based measures. Those experts with the highest relative weights find themselves plotting towards the top of the diagram (‘heavier’). ‘Opinionated’ individuals are attracted towards the right-hand axis, and ‘poorly-calibrated’ subside towards the lower right apex of the plot. In the case of the EXPLORIS team, as is common for all groups of experts in any scientific or engineering discipline, there is a tendency for a majority of people to be over-confident in their judgements (i.e. too opinionated), to lose relative weighting accordingly and thus fall into the lower right quadrant of the ternary plot.

Fig. 3.2 Ternary plot showing the relationships between informativeness, calibration and relative weight for EXPLORIS experts’ performance-based measures – see text for discussion


By way of contrast, there was also in the EXPLORIS group at Pisa one individual who apparently demonstrates a marked lack of informativeness and, as a consequence, also ends up with a very low relative weight. The likely explanation for this is, however, that because the seed questions were dominated by volcanological topics and the person concerned was not a volcanologist or geologist, he or she had quite properly expressed their lack of certitude in volcanological matters by providing very wide uncertainty bands in their responses. The EXCALIBR procedure has assessed that particular participant’s expertise appropriately, demonstrating the probity of the approach.

3.2 Initial elicitations for the EXPLORIS volcanoes

Significant work has been undertaken in Years 1 and 2, in cooperation with all other WP partners, in developing a procedure for assigning probability weights to the various logic-tree branches of volcanic events and impacts for each of the project’s four designated volcanoes. The main activity has been a joint work task with WP2 (Volcanological Scenario Definitions – VSD) in which, in close collaboration with WP2 colleagues, volcanic scenarios for future explosive eruptions at Vesuvius, La Soufrière, Sete Cidades and Teide volcanoes have been defined.

As noted in Sect. 3.1, above, the initial ‘calibration’ stage of the formalised procedure for eliciting VSD scenario probabilities using the EU-sponsored program EXCALIBR was conducted with participants at the Pisa meeting (16-17 January 2003), and a follow-up exercise was undertaken at the Naples meeting (19-20 September 2003), when additional calibrations were completed for several colleagues who were not at Pisa. By the time of the Tenerife Meeting (28–30 November 2003), at the end of the first year of EXPLORIS, 49 individual experts had been processed and incorporated into the calibration database, leaving just a few remaining to complete the coverage of the whole group.

However, before proceeding with the volcano-specific elicitations, one of the main issues that had to be addressed at an early stage was how to categorise or define the scale or ‘size’ of an eruption.

3.2.1 Event ‘size’

During the initial stages of the EXPLORIS project, the decision was made in WP2 to rely on the volume of ejecta or material (sometimes referred to as ‘volcanic magnitude’) as a measure of the ‘size’ of an eruption for any particular VSD scenario (see also Deliverable D2.11). It is, of course, recognised that other characterisations of size of eruption could have been used. Thus the magnitude of an eruption can be expressed in terms of the volume (in cubic metres) of material erupted, and simple numerical orders of magnitude can be used to differentiate one scenario from another. For consistency, eruption volumes derived from deposit measurements, etc., need to be stated in terms of dense rock equivalent (DRE), wherever appropriate.


However, different workers can have a preference for indicating eruption ‘size’ in different terms, so Table 3.1 suggests a set of rudimentary conversions between Volcanic Explosivity Index (VEI), equivalent volume of ejecta, and column height – simply for the purposes of establishing a set of EXPLORIS scenario definitions that could work across different partners.


Table 3.1 Proposed relationships between VEI, column height and volume of ejecta for use in EXPLORIS to characterise VSD scenarios (see text and Appendix 2)

                      VEI 2     VEI 3          VEI 4          VEI 5           VEI 6            VEI 7
Column height (km)    1 – 5     6 – 10         11 – 15        16 – 25         26 – 35          > 35
Volume (cu m)         < 10^7    10^7 – 10^8    10^8 – 10^9    10^9 – 10^10    10^10 – 10^11    > 10^11

Again, while it is recognised that other relationships have been proposed in the literature, or for other purposes, the scheme presented here has the merits of simplicity and the avoidance of overlapping or inconsistent range values. Further discussion of this issue was included in the handout that was provided at the EXPLORIS meeting held in September 2003 in Naples (see Appendix 2).
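As an illustration of how the volume classes of Table 3.1 might be applied in practice, the following minimal sketch maps an erupted volume to its VEI class. The function name is hypothetical, and the treatment of a volume lying exactly on a class boundary (assigned here to the higher class) is an assumption, since the table does not specify it.

```python
import bisect

# Volume boundaries (cubic metres) separating the Table 3.1 classes.
VOLUME_EDGES = [1e7, 1e8, 1e9, 1e10, 1e11]
VEI_CLASSES = [2, 3, 4, 5, 6, 7]

def vei_from_volume(volume_m3):
    """Return the Table 3.1 VEI class for an erupted volume in cubic metres."""
    if volume_m3 <= 0:
        raise ValueError("volume must be positive")
    # Assumption: a volume exactly on a boundary falls in the higher class.
    return VEI_CLASSES[bisect.bisect_right(VOLUME_EDGES, volume_m3)]
```

For instance, `vei_from_volume(3e9)` falls in the 10^9 – 10^10 cu m band and so returns 5.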

3.2.2 VSD probabilities of occurrence

At, and following, the EXPLORIS meeting held in Naples, elicitations were undertaken with the relevant specialists for each volcano to evaluate the comparative probabilities of occurrence of the five levels of VSD, defined by the project WP2 (see Deliverable D2.11). Tabulated summaries showing the profiles of the elicitation questionnaire responses, by volcano and by VSD, are shown in Figure 3.3. Even at this elementary level of detail, it is possible to discern which volcanic explosion scenarios are regarded as most significant at each volcano, and which are thought to be so remote a possibility that they can be effectively disregarded for risk assessment purposes.


[Figure 3.3: four panels (Vesuvio, Soufrière, Sete Cidades, el Teide); each panel is a grid of experts (rows, ‘Exp.’) against scenarios VSD 1 to VSD 5 and event sizes 1 to 4 (columns), with cells marked ‘Y’.]

Fig. 3.3 Summary charts showing responses to the initial EXPLORIS elicitation questionnaire, by volcano, and by scenario and event size (see text).

The further processing of detailed questionnaire responses is work-in-progress through Year 2 and, by the end of the project, should provide a coherent set of equivalent annual probabilities of occurrence for each viable eruption scenario across all the project volcanoes.

3.2.3 Elicitation results

The results of the volcano-specific elicitations of VSD probabilities of occurrence at Vesuvius, La Soufrière and Teide volcanoes are tabulated in Tables 3.2 – 3.4, and presented graphically in Figures 3.4 to 3.6. In the case of Sete Cidades, the elicitation responses obtained thus far have been too few in number and too limited in scope to derive meaningful results; it is hoped that this gap can be addressed with the relevant participants.

On these tables, the separate items elicited are identified under column ‘Id’ by a three-letter code for the volcano (VES = Vesuvius; SOU = Soufrière; TEI = Teide), followed by a two-digit combination for the VSD concerned (i.e. 1-5) and the ‘size’ of the event (1-4). Note, however, that while each VSD has four sizes of eruption associated with it, expressed in terms of volume of material erupted, the actual ranges of volumes are scenario-specific and vary from one VSD to another. The corresponding details or values are given on the accompanying figures.

The experts’ evaluations are expressed in terms of expected long-run average return intervals (column ‘50%’, in years), together with their associated 5%ile – 95%ile spreads of uncertainty. On the corresponding plots, the expected return interval and uncertainty bounds have been converted to equivalent annual probability of occurrence.
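The conversion from return interval to equivalent annual probability is not spelled out in the text. A minimal sketch is given here, assuming the simple reciprocal relation (annual probability = 1 / return interval), with an optional Poisson form for comparison. The function names are hypothetical. Note that the quantile order reverses, because the longest interval corresponds to the lowest probability.

```python
import math

def annual_probability(return_interval_years, poisson=False):
    """Equivalent annual probability for a mean return interval in years."""
    if return_interval_years <= 0:
        raise ValueError("return interval must be positive")
    rate = 1.0 / return_interval_years
    # Poisson option: probability of at least one event in a year.
    return 1.0 - math.exp(-rate) if poisson else rate

def convert_quantiles(q05, q50, q95, poisson=False):
    """Map (5%, 50%, 95%) interval quantiles to (5%, 50%, 95%) probability quantiles."""
    return (
        annual_probability(q95, poisson),  # longest interval -> lowest probability
        annual_probability(q50, poisson),
        annual_probability(q05, poisson),
    )

# Item VES21 from Table 3.2: 51.2, 239.1 and 972.5 years.
p05, p50, p95 = convert_quantiles(51.2, 239.1, 972.5)
```

For return intervals of a century or more, the reciprocal and Poisson forms differ by less than one per cent, so the choice matters only for the most frequent scenarios.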

Case name : Vesuvio          26/11/2003          CLASS version W4.0
________________________________________________________________________________
Resulting solution (combined DM distribution of values assessed by experts)
Bayesian Updates: no    Weights: global    DM Optimisation: no
Significance Level: 0.0000    Calibration Power: 0.3300
________________________________________________________________________________
 Nr.| Id           |Scale|        5%|       50%|       95%|Realizatii| Full Name
____|______________|_____|__________|__________|__________|__________|___________
  11|VES11         |LOG  |     6.676|       226|   2.377E4|          |
  12|VES12         |LOG  |     21.88|     430.8|      4828|          |
  13|VES13         |LOG  |     245.4|      6115|   4.769E4|          |
  14|VES14         |LOG  |     250.1|   1.359E5|   8.806E6|          |
  15|VES21         |LOG  |      51.2|     239.1|     972.5|          |
  16|VES22         |LOG  |     113.9|      1492|   1.315E4|          |
  17|VES23         |LOG  |       108|     556.4|   2.073E4|          |
  18|VES24         |LOG  |     269.9|      1434|   1.634E5|          |
  19|VES31         |LOG  |     101.6|     357.7|      5803|          |
  20|VES32         |LOG  |     224.9|      3545|   2.644E4|          |
  21|VES33         |LOG  |     540.1|   1.443E4|   3.926E5|          |
  22|VES34         |LOG  |       1E7|       5E7|       1E8|          |
  23|VES41         |LOG  |     5.482|     76.61|       632|          |
  24|VES42         |LOG  |     10.88|       239|      1843|          |
  25|VES43         |LOG  |     12.76|      1450|    2.21E6|          |
  26|VES44         |LOG  |       161|   5.725E5|   4.241E8|          |
  27|VES51         |LOG  |     108.4|     461.4|      5649|          |
  28|VES52         |LOG  |     183.3|      1978|   1.153E4|          |
  29|VES53         |LOG  |     505.1|      8660|   3.329E5|          |
  30|VES54         |LOG  |      2214|   2.066E6|   1.397E8|          |
________________________________________________________________________________
(c) 1999 TU Delft

Table 3.2 Vesuvius: EXCALIBR-processed results from elicitations of inter-event intervals (in years) for defined eruption scenarios and eruption sizes (see text)

Case name : Guadeloupe          26/11/2003          CLASS version W4.0
________________________________________________________________________________
Resulting solution (combined DM distribution of values assessed by experts)
Bayesian Updates: no    Weights: global    DM Optimisation: no
Significance Level: 0.0000    Calibration Power: 0.3300
________________________________________________________________________________
 Nr.| Id           |Scale|        5%|       50%|       95%|Realizatii| Full Name
____|______________|_____|__________|__________|__________|__________|___________
  11|SOU11         |LOG  |     2.889|     57.61|     880.9|          |
  12|SOU12         |LOG  |     28.97|     153.9|      1340|          |
  13|SOU13         |LOG  |     58.49|     532.2|      3577|          |
  14|SOU14         |LOG  |       300|      1000|      5000|          |
  15|SOU21         |LOG  |     118.6|     926.8|      8874|          |
  16|SOU22         |LOG  |     129.1|      1930|   1.774E4|          |
  17|SOU23         |LOG  |     362.1|      3889|   2.692E4|          |
  18|SOU24         |LOG  |      2746|   1.227E4|   4.779E4|          |
  19|SOU31         |LOG  |     62.08|     387.1|      1824|          |
  20|SOU32         |LOG  |     71.52|     750.7|      6096|          |
  21|SOU33         |LOG  |     540.7|      2202|    1.37E4|          |
  22|SOU34         |LOG  |      1000|      7200|       3E4|          |
  23|SOU41         |LOG  |     233.8|      1305|      8890|          |
  24|SOU51         |LOG  |      1020|      5001|   2.334E4|          |
  25|SOU52         |LOG  |      3187|   1.325E4|    4.37E4|          |
  26|SOU53         |LOG  |      6000|     1.5E4|       6E4|          |
________________________________________________________________________________
(c) 1999 TU Delft

Table 3.3 Soufrière of Guadeloupe: EXCALIBR-processed results from elicitations of inter-event intervals (in years) for defined eruption scenarios and eruption sizes

Case name : TEIDE          26/11/2003          CLASS version W4.0
________________________________________________________________________________
Resulting solution (combined DM distribution of values assessed by experts)
Bayesian Updates: no    Weights: global    DM Optimisation: no
Significance Level: 0.0000    Calibration Power: 0.3300
________________________________________________________________________________
 Nr.| Id           |Scale|        5%|       50%|       95%|Realizatii| Full Name
____|______________|_____|__________|__________|__________|__________|___________
  11|TEI11         |LOG  |     58.07|   1.137E4|   9.996E4|          |
  12|TEI12         |LOG  |       117|    1.01E4|   9.402E4|          |
  13|TEI21         |LOG  |      57.8|      6005|   9.033E4|          |
  14|TEI22         |LOG  |     109.2|   1.505E4|   3.616E5|          |
  15|TEI23         |LOG  |     268.2|   1.058E4|   9.656E4|          |
  16|TEI24         |LOG  |       5E4|       1E5|   6.737E5|          |
  17|TEI31         |LOG  |     225.7|      1125|   1.793E4|          |
  18|TEI32         |LOG  |     356.9|      5434|   9.003E4|          |
  19|TEI33         |LOG  |     566.8|   1.412E4|   1.816E5|          |
  20|TEI41         |LOG  |     149.1|      2543|      9319|          |
  21|TEI42         |LOG  |      1008|      4252|   5.844E4|          |
  22|TEI43         |LOG  |      5421|    4.78E4|   1.902E5|          |
  23|TEI51         |LOG  |     151.2|      5610|   8.161E4|          |
  24|TEI52         |LOG  |      1562|     5.1E4|   1.477E5|          |
  25|TEI53         |LOG  |     1.2E4|   1.357E5|   3.855E5|          |
  26|TEI54         |LOG  |       5E4|       1E5|       1E6|          |
________________________________________________________________________________
(c) 1999 TU Delft

Table 3.4 Teide volcano: EXCALIBR-processed results from elicitations of inter-event intervals (in years) for defined eruption scenarios and eruption sizes
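Each data row in the EXCALIBR listings of Tables 3.2 to 3.4 uses ‘|’ as a field separator, and the ‘Id’ code combines the volcano, the VSD number and the event size. A minimal sketch of how one such row might be read into a record is given below. The function name and dictionary keys are hypothetical, and the parser is an illustration rather than part of EXCALIBR.

```python
# Three-letter volcano codes as defined in the text for column 'Id'.
VOLCANO_CODES = {"VES": "Vesuvius", "SOU": "Soufriere", "TEI": "Teide"}

def parse_row(line):
    """Split one pipe-delimited result row and decode its Id code."""
    fields = [f.strip() for f in line.split("|")]
    item_id = fields[1]
    return {
        "nr": int(fields[0]),
        "volcano": VOLCANO_CODES[item_id[:3]],
        "vsd": int(item_id[3]),    # scenario number, 1-5
        "size": int(item_id[4]),   # event size class, 1-4
        "scale": fields[2],
        "q05": float(fields[3]),   # return interval quantiles, in years
        "q50": float(fields[4]),
        "q95": float(fields[5]),
    }

row = parse_row("  11|VES11         |LOG  |     6.676|       226|   2.377E4|          |")
```

Here `row` describes the smallest event size of VSD 1 at Vesuvius, with a median return interval of 226 years.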


Fig. 3.4 Vesuvius: Elicited annual probabilities of occurrence, by eruption scenario VSD and by volume of eruption


Fig. 3.5 Teide: annual probabilities of occurrence, by eruption scenario VSD and by volume of eruption


Fig. 3.6 Guadeloupe: annual probabilities of occurrence, by eruption scenario VSD and by volume of eruption


For Vesuvius (Fig. 3.4), there is a complete suite of estimates of recurrence probability for all the selected scenarios and for each of the sizes of eruption considered in the first elicitation. These plots suggest that dome collapse episodes (VSD 3) are considered to be rare at Vesuvius, compared to other eruptive scenarios, and that this is especially so for the largest sizes of dome collapse event (i.e. > 10^9 cu m of material). Sector collapse events (VSD 2) and vulcanian/sub-plinian explosions (VSD 4) are indicated to be the most likely styles of eruption expected for this volcano, although there is apparently greater uncertainty amongst the experts about the probable recurrence intervals of the latter than for any other scenario, especially in respect of the largest sizes of such events. Overall, there is a consistency in the patterns shown in the panels of Figure 3.4 in that greater uncertainty spreads are associated with the likelihoods of occurrence of the bigger eruptions (excepting the case of dome collapses, as just noted).

By way of contrast with Vesuvius, for Teide volcano (Fig. 3.5) the uncertainties associated with the recurrence of VSD 4 (vulcanian/sub-plinian) and VSD 5 (major magmatic eruptions) are somewhat less than those attributed to the VSD 2 (sector collapse) situation. For this volcano, the initial elicitations suggest that small-scale (< 10^7 cu m of material) dome collapse activity (VSD 3) is considered likely to be encountered more frequently than other eruptive styles, although the responses suggest the largest volume considered (> 10^9 cu m of material) is judged implausible. Similarly, the elicitation indicates that large-scale phreatic eruptions (VSD 1) and full-size vulcanian explosive events can be omitted from the set of viable scenarios for Teide.

Turning to the Soufrière of Guadeloupe (Fig. 3.6), the picture is somewhat different: phreatic events, dome collapses and sector failures are considered to be the prevalent styles of eruptive activity, while vulcanian/sub-plinian eruptions (VSD 4) hardly feature and major plinian eruptions (VSD 5) are relatively rare. If the uncertainty estimates in the frequency of occurrence of events at the Soufrière volcano are taken into account, the elicitation results suggest that small-scale eruptions of the phreatic type may be approaching two orders of magnitude more likely than other modes of eruption.

By and large, however, it is worth noting that for all three volcanoes the spreads of uncertainty indicated in the tabulated results and plots are such that any firm assertions as to the relative rankings of the likelihoods of occurrence of the various combinations of scenario and event size must be regarded as far from conclusive, at this juncture. Further work, underway in EXPLORIS, may elucidate these opinions.

2.3.4 Designated volcano VSD cases: comparative results

The elicitation results presented in the last section were those obtained for each designated volcano, tabulated and plotted individually by volcano. The same results are combined in the following plots (Figures 3.7 – 3.11) to allow relative comparisons to be made between the results for the volcanoes for which estimates are provided by elicitation. In Figures 3.7 – 3.11 inclusive, the equivalent annual probabilities of occurrence are therefore grouped by VSD. It should be noted, however, that for some of the VSD cases not all scenarios or sizes of eruption are included. Absences reflect the fact that either an insufficient number of opinions was offered, or the EXPLORIS experts felt those specific combinations were inappropriate or implausible (as mentioned in the previous discussion, and in Sect. 3.2.2, above).

Fig 3.7 Comparison plot of elicited annual probabilities of occurrence of phreatic eruptions (VSD1), at three of the EXPLORIS volcanoes

For phreatic eruption scenarios (VSD 1), the implications of the elicitations are quite clear from Fig. 3.7: such events are expected to be significantly more frequent at Soufrière volcano than at Vesuvius, and about 100 times more likely than at Teide. This applies to phreatic events of all ‘sizes’. When it comes to sector collapses (VSD 2), it can be anticipated that Vesuvius will experience such events more frequently than either of the other two volcanoes (Fig. 3.8).

Fig 3.8 Comparison plot of elicited annual probabilities of occurrence of sector collapse eruptions (VSD2), at three of the EXPLORIS volcanoes

Fig 3.9 Comparison plot of elicited annual probabilities of occurrence of dome collapse episodes (VSD3), at three of the EXPLORIS volcanoes

As far as the EXPLORIS experts are concerned, Guadeloupe leads the way in terms of a propensity for dome collapse behaviour, although in terms of the smallest scale hazard considered (< 10^7 cu m of material) there is little to choose between it and Vesuvius (see Fig. 3.9).

Fig. 3.10 shows that, in terms of annual probability of occurrence, vulcanian/sub-plinian eruptions feature more strongly in the Vesuvius arsenal than at Teide, by a factor of about 30. In the case of Guadeloupe, small-scale events are evaluated to be slightly more likely than at Teide, but the initial elicitation does not provide any guidance on the likelihood of bigger eruptions of this type.

For the case of the most violent eruptive scenarios (VSD 5 – sustained major magmatic eruptions of Plinian type), the experts’ perceptions show a more varied picture (Fig. 3.11). At the lower end of event magnitudes (< 10^9 cu m of material, roughly equivalent to VEI 4), the probability of occurrence of such events at Vesuvius is indicated to be between 0.001 and 0.002 per year (i.e. between 1-in-1000 and 1-in-500 per year), while the same event at Guadeloupe or Teide is ascribed a probability of occurrence of about 2 x 10^-4 per year.

Fig 3.10 Comparison plot of elicited annual probabilities of occurrence of vulcanian/sub-plinian eruptions (VSD4), at three of the EXPLORIS volcanoes

When it comes to the largest explosions (i.e. of VEI 7), the likelihood of occurrence drops quite strongly below the overall trend for Vesuvius, whereas the Teide experts appear to suggest that this may be a more favoured scale of eruption for that volcano. One of the strengths of the structured elicitation process used in EXPLORIS is that it can highlight apparently inconsistent, quixotic or irrational outcomes, and prompt further discussion of all the factors and assumptions involved. This seems to be a case in point for further deliberations within EXPLORIS. For Guadeloupe, there is only a marginal reduction in the likelihood of VEI 5 and VEI 6 events, and the uncertainty spreads are such that it is not possible to distinguish between the estimated probabilities of occurrence in any meaningful sense.

Fig 3.11 Comparison plot of elicited annual probabilities of occurrence of sustained major magmatic eruptions (VSD5), at three of the EXPLORIS volcanoes

Finally, in this section, a plot is presented in Figure 3.12 showing a comparison for Vesuvius between the elicitation results for two major eruption scenarios: VSD 4 (vulcanian/sub-plinian) and VSD 5 (plinian). These indicate that a cross-over in terms of probability of occurrence is anticipated between sub-plinian and plinian eruptions when the scale of total magma production exceeds about 10^9 to 10^10 cu m (i.e. approximately equivalent to a VEI 5 event). In other words, if the eruption is going to produce a volume of 10^10 cu m of magma, or greater, it is much more likely to do so as a plinian eruption, with all that that entails in terms of hazards and risks.

Fig 3.12 Elicitations of Vesuvius activity rates: comparative results for VSD 4 & VSD 5, expressed as annualised probabilities of exceedance

4 Summing up and work-in-progress

At the time when the EXPLORIS project was first conceived, it was envisaged that a number of elicitations of expert opinion would be conducted throughout the 3-year programme of work. There was agreement that a special effort should be devoted to devising a way to monitor how people’s opinions evolved over the course of the project and, in particular, to quantify if possible the impact on their views of the new scientific insights generated by the project. This Deliverable D6.1 has described in detail the background, principles and methodology that underpin the procedure adopted for the elicitation and pooling of expert opinion, and has summarised the outcomes of the initial elicitations that were conducted during the early phases of EXPLORIS. It therefore provides a formal reference basis for subsequent elicitation exercises, planned to follow in the final year of the project, and for the use of these results in informing the work underway in other packages and for other deliverables.

Working closely with WP2, a series of initial elicitations of opinion on the long-term recurrence rates for three of the four project volcanoes has been undertaken involving relevant EXPLORIS partners, and making use of the EC-sponsored EXCALIBR elicitation software for pooling expert judgements. The volcanoes that have been considered thus far are: Vesuvius; Soufrière of Guadeloupe, and El Teide. In the case of Guadeloupe, the implementation has involved a restricted sample of experts, and it is planned to extend the catchment to a wider group of specialists in early 2005. Application of the expert elicitation procedure to the Sete Cidades volcano case study will proceed as soon as is feasible. Once expert calibrations and elicitations for the latter two volcanoes have been achieved, to make up the full suite, it is intended to pursue formal analysis of the item- and expert-specific robustness of all the different case studies. This would enable the project to produce a valuable comparative profile across all the case studies, and to explore the epistemological characteristics and limitations of the problems that are being addressed at each volcano.

In the meantime, some preliminary work has been done for the case of Vesuvius to cross-compare initial elicited views on the expected recurrence intervals of various eruption scenarios with inferences drawn from the various historical catalogues that are available. In particular, special attention and credence will be paid to the important catalogue of explosive eruptions of Vesuvius that is in preparation under EXPLORIS WP2. Once the latter catalogue is finalised, a detailed joint analysis of those observational data with counterpart subjective data from expert judgements will be undertaken and reported.

Within the EXPLORIS project, significant work has been on-going in Year 2 in cooperation with all other WP partners to further develop the procedure for assigning probability weights to the various logic-tree branches of volcanic events and to the associated risk impacts for each of the project’s four designated volcanoes. The efforts to design and construct a suitable universal logic-tree structure for all the EXPLORIS volcanoes have inevitably led to some re-thinking of the way the problem is formulated, and these adjustments will need to be reflected in the way subsequent elicitations are conducted. The effort involved, therefore, is evolving and organic, and there is a need to maintain and foster strong collaboration and interaction with other EXPLORIS partners. It is confidently expected that this cooperation will continue to thrive as effectively and in the same good spirit as has marked the first two years of the EXPLORIS work.

6 Appendix 1: Expert elicitation - calibration questionnaire

FUNDED BY THE EUROPEAN COMMUNITY

KICK-OFF MEETING 16-17 JANUARY 2003 Aula Magna Storica dell’Università, Palazzo La Sapienza Via Curtatone e Montanara 15, Pisa

Elicitation of scientific judgements Questionnaire for inputs to probabilistic risk analyses Willy Aspinall, Paul Cole & Gordon Woo

Discussion session Friday 17th January 2003

Your name/initials……………………

Your EXPLORIS Volcano…………………

Part 1

First, a few ‘seed’ questions for calibrating individual experts’ inputs and ‘informativeness quotients’ for the EXPLORIS elicitation exercise. Please provide both your range of uncertainty, and your best estimate or judgement. The range you give should indicate the lowest and highest values you believe must encompass the ‘correct’ answer.

1] Mount St Helens volcano, USA: in the 5 years prior to the first major earthquake swarm on 20 March 1980, how many earthquakes were detected and located anywhere within a radius of 35 km round the volcano? Your lowest value.............best estimate.............highest value.............

2] A recent summary of the stratigraphy of Mont Pelee, Martinique, for the last 6,000 years identifies two styles of eruption: Peleean and Plinian; there are extensive deposits from 25 events with inferred Volcano Explosivity Index VEI>=3 (i.e. significant eruptions); what percentage of these do you think are classified as Plinian? Your lowest value.............best estimate.............highest value.............

3] Globally, how many eruptions of magnitude VEI = 5 or greater do you think there have been in the last 2000 years? Your lowest value.............best estimate.............highest value.............

4] What would you estimate is the highest (non-explosive) extrusion rate (in cu m/sec) reported for the 1995-1998 period of the dome-building eruption of Soufriere Hills volcano, Montserrat? Your lowest value.............best estimate.............highest value.............

5] Given typical magma production rates for arc volcanoes, what would you expect for the estimated time-averaged production rate of Shiveluch volcano, Kamchatka, in the Holocene (in '000s tons/year)? Your lowest value.............best estimate.............highest value.............

6] There was a major lake outburst from Kelud volcano, Java, in 1919: what do you think is the reported estimate of the number of fatal victims from the hot lahars? Your lowest value.............best estimate.............highest value.............

7] Hudson volcano, Chile had a major explosive eruption 3,600 yr BP: what do you think is likely to be the maximum pumice clast dimension, in mm, for the isopleth that encloses an area of 50 sq km of deposits downwind of the crater? Your lowest value.............best estimate.............highest value.............

8] Myojinsho shallow submarine volcano, Japan, had an explosive eruption sequence in 1952-53, and a further 12 minor episodes of unrest up to 1993: when it was surveyed in 1993, what was the angle of slope of the upper parts of its very symmetrical cone? Your lowest value.............best estimate.............highest value.............

9] Back to Shiveluch volcano, Kamchatka: the biggest recent debris avalanche was in AD1430, and involved more than 3 km^3 - what area did it cover, in sq km? Your lowest value.............best estimate.............highest value.............

10] Lastly, an all-purpose question for testing judgment ‘precision’: in 1995, how many publications per year (including abstracts) related to the "Hawaiian-Emperor volcanic chain"? Your lowest value.............best estimate.............highest value.............
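As an illustration of how answers to seed questions like these might be turned into a calibration measure, the following is a minimal sketch in the spirit of the classical performance-based model. It is not the EXCALIBR implementation. It assumes (the questionnaire does not say so) that the lowest, best and highest values are read as 5%, 50% and 95% quantiles, and all function names are hypothetical.

```python
import math

# ASSUMPTION: lowest / best / highest are treated as 5% / 50% / 95% quantiles,
# so the four inter-quantile bins should capture these shares of true answers.
BIN_PROBS = (0.05, 0.45, 0.45, 0.05)


def bin_index(low, best, high, truth):
    """Return which of the four inter-quantile bins the true value falls in."""
    if truth < low:
        return 0
    if truth < best:
        return 1
    if truth < high:
        return 2
    return 3


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def calibration_score(responses, truths):
    """Score one expert from (low, best, high) triples and the known answers."""
    n = len(truths)
    counts = [0, 0, 0, 0]
    for (low, best, high), truth in zip(responses, truths):
        counts[bin_index(low, best, high, truth)] += 1
    # Relative information of the observed bin frequencies against BIN_PROBS.
    info = sum((c / n) * math.log((c / n) / p)
               for c, p in zip(counts, BIN_PROBS) if c > 0)
    return chi2_sf_3df(2 * n * info)
```

A score near 1 means the true answers fall into the expert's ranges about as often as the assumed quantiles imply; a score near 0 flags ranges that are systematically too narrow or shifted.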

7 Appendix 2: EXPLORIS VOLCANIC SCENARIOS: ELICITATION QUESTIONNAIRE

EXPLANATION

The purpose of this questionnaire is to assist the risk modellers of EXPLORIS with guidance on the relative quantitative likelihoods of different hazardous events at volcanoes of interest. The intention is to pool together the responses we get from EXPLORIS partners, so that we can map the likelihood distributions from volcano to volcano onto a rational and coherent comparative basis for risk assessment (N.B. the purpose here is NOT to undertake risk assessments for the particular volcanoes - that is the responsibility of the relevant authorities in each country, and outside our remit). The objective is to provide a statistically-based FRAMEWORK for consistent input parameters for the EXPLORIS modellers at each of the four volcanoes, using agreed volcanological scenario definitions that can be quantified in terms of their probability of occurrence.

The way we would like to do this is by asking for your judgment of the median recurrence interval for the various Volcanic Scenario Definition (VSD) ‘class’ events of each ‘size’ (i.e. the 50th percentile value of a representative statistical distribution of such intervals). For a symmetrical distribution, such as a Gaussian, this might be taken as equivalent to the average recurrence interval, which you may be able to assess from available evidence; for skewed distributions, sparse data or subjective values, it might be appropriate to use an estimate or judgement of the mode of typical values as an ‘anchor’, approximating the median value that is needed. For instance, if you judge that a certain size of explosive eruption happens about once every 500 years, on average, at your volcano under existing conditions, and your uncertainty can be assumed normally distributed, then this is the value we could take as corresponding to the 50th percentile recurrence interval.

Obviously, there is likely to be major uncertainty about any of the recurrence intervals we are attempting to estimate, but it is a measure of this scientific uncertainty that we are trying to capture in an objective manner. This is an important, and novel, aspect of the EXPLORIS approach. To accomplish this, we ask you to provide a range for your own individual scientific opinion on each recurrence interval (not an ‘institutional’ or ‘political’ opinion, please: we are concerned here with scientific uncertainty and individual scientists’ degrees of belief on different issues). This range should be such that you are confident it would encompass the ‘true’ average recurrence interval; for this study, the ‘working range’ is defined by the values you would give to mark: 1) the point below which the ‘10% shortest’ intervals would fall, and 2) the point above which the ‘10% longest’ intervals would be found. Put another way, these two upper and lower values should be chosen such that you think there is less than a 1 in 5 chance that the proper value to use for the recurrence interval falls outside this range. The values you choose do not have to be symmetrical about the median (50th percentile) estimate, but can be chosen to reflect any skewness you think is likely to be present in such a distribution.
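The remark about skewed distributions can be illustrated with a short sketch. It assumes, purely for illustration, that recurrence intervals follow a lognormal distribution whose 10th and 90th percentiles are set to a ‘10% shortest’ value of 100 years and a ‘10% longest’ value of 1000 years; the questionnaire does not prescribe any distribution, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_from_range(shortest, longest):
    """Fit lognormal parameters so the 10th/90th percentiles match the range."""
    z90 = NormalDist().inv_cdf(0.90)  # standard normal 90th percentile
    mu = (math.log(shortest) + math.log(longest)) / 2
    sigma = (math.log(longest) - math.log(shortest)) / (2 * z90)
    return mu, sigma


def mode_median_mean(mu, sigma):
    """Return (mode, median, mean) of a lognormal with parameters mu and sigma."""
    return (math.exp(mu - sigma ** 2),
            math.exp(mu),
            math.exp(mu + sigma ** 2 / 2))


# ASSUMPTION: illustrative range of 100 to 1000 years.
mu, sigma = lognormal_from_range(100, 1000)
mode, median, mean = mode_median_mean(mu, sigma)
print(f"mode={mode:.0f}  median={median:.0f}  mean={mean:.0f} years")
```

Under this assumed right-skewed shape the mode falls below the median and the mean above it, which is why an average recurrence interval and a median are interchangeable only for symmetrical distributions.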

Thus, three values are needed to define the distributional spread over each recurrence interval value (shown on the attached forms as red-outlined boxes), so please give all three values for each class event case you are responding to. We recognise that you may wish to decline to give an opinion on some cases; if so, please leave the boxes empty and enter ‘no opinion’ in the Notes column. The key thing to remember is that this formulation allows you to express your scientific certainty or uncertainty on any particular question, and the ‘width’ of your spread should reflect your degree of confidence in the answers you provide.

We have agreed that the various different kinds of hazardous eruption scenarios that can occur at our volcanoes can be exemplified by the simplified ‘Volcano Scenario Definition’ events; these VSD events are chosen to be representative of varieties of manifestations that might be grouped together generically for risk assessment purposes. While the risk assessment part of our work will be based ultimately on annual probabilities of occurrence, we feel it is probably easier for people to think about recurrence rates or return intervals for these VSD events, especially when we are dealing with rare and infrequent eruption events; the recurrence interval estimates we obtain from this elicitation will subsequently be converted into equivalent per annum probabilities.

ADDITIONAL INFORMATION

We are also seeking to obtain some supplementary information on the different VSD event types so that we can refine our assessments of the nature of the hazards involved: for instance, for phreatic eruptions, it would be helpful to know what proportion of such episodes you would expect to generate cold base surges, and what proportion might involve ballistics. There are a number of different supplementary questions like this in the questionnaire - please express the relevant proportions which apply as percentages on the form (in the boxes outlined in blue).
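The conversion from recurrence intervals to per annum probabilities is not spelled out in this questionnaire. One common way to do it, shown here only as a sketch, is to assume events follow a Poisson process, so that the probability of at least one event in a year is 1 - exp(-1/T) for a recurrence interval of T years (close to 1/T when T is large). The function names and the example triple are illustrative.

```python
import math


def annual_probability(recurrence_years):
    """Probability of at least one event in a year, ASSUMING a Poisson process."""
    return 1.0 - math.exp(-1.0 / recurrence_years)


def to_annual_probabilities(shortest, median, longest):
    """Convert an interval triple to (low, central, high) annual probabilities.

    The ordering flips: the longest interval gives the lowest probability.
    """
    return (annual_probability(longest),
            annual_probability(median),
            annual_probability(shortest))


# Illustrative triple of 100, 250 and 1000 years.
low, central, high = to_annual_probabilities(100, 250, 1000)
print(f"low={low:.2e}  central={central:.2e}  high={high:.2e}")
```

Note that the ‘10% shortest’ interval maps to the upper end of the probability range, so the bounds must be swapped when moving from intervals to probabilities.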

EVENT ‘SIZE’

In this exercise, we have made the somewhat arbitrary decision to use as a measure of ‘size’ the volume of ejecta or material involved in the particular VSD scenario (sometimes referred to as ‘volcanic magnitude’), expressed as numerical orders of magnitude in cubic metres. For consistency, volumes derived from deposit measurements, etc., should be given as dense rock equivalent (DRE).


It is recognised that other characterisations of size of eruption can be used. We suggest the following table for rudimentary conversion to equivalent volume of ejecta, if your preference is to work initially with one of these alternative indicators:

VEI | Column height (km) | Volume (cu m)
VEI2 | 1 - 5 | < 10^7
VEI3 | 6 - 10 | 10^7 - 10^8
VEI4 | 11 - 15 | 10^8 - 10^9
VEI5 | 16 - 25 | 10^9 - 10^10
VEI6 | 26 - 35 | 10^10 - 10^11
VEI7 | > 35 | > 10^11
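For anyone who prefers to work from VEI, the conversion table can be written as a simple lookup. The sketch below only transcribes the table; the names `VEI_TO_VOLUME_M3`, `vei_to_volume_range` and `volume_to_vei` are illustrative, and treating each lower bound as inclusive is an assumption, since the table does not say how boundary values should be assigned.

```python
# Volume bounds in cubic metres, transcribed from the conversion table.
# None marks an open-ended bound.
VEI_TO_VOLUME_M3 = {
    2: (None, 1e7),
    3: (1e7, 1e8),
    4: (1e8, 1e9),
    5: (1e9, 1e10),
    6: (1e10, 1e11),
    7: (1e11, None),
}


def vei_to_volume_range(vei: int):
    """Return the (lower, upper) ejecta volume bounds in cu m for a VEI class."""
    if vei not in VEI_TO_VOLUME_M3:
        raise ValueError(f"VEI {vei} is not covered by the table")
    return VEI_TO_VOLUME_M3[vei]


def volume_to_vei(volume_m3: float) -> int:
    """Return the VEI class whose volume range contains volume_m3.

    Lower bounds are treated as inclusive (an assumption).
    """
    if volume_m3 <= 0:
        raise ValueError("volume must be positive")
    for vei, (lower, upper) in VEI_TO_VOLUME_M3.items():
        above_lower = lower is None or volume_m3 >= lower
        below_upper = upper is None or volume_m3 < upper
        if above_lower and below_upper:
            return vei
    raise ValueError("volume is not covered by the table")


print(vei_to_volume_range(4))
print(volume_to_vei(5e8))
```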

On the first page following is a sample set of answers, to illustrate the way the elicitation form is laid out.

PLEASE COULD YOU SAVE YOUR COMPLETED QUESTIONNAIRE AS A WORD DOC FILE WITH A UNIQUE FILENAME THAT INCLUDES YOUR INITIALS IF POSSIBLE, RETURN YOUR FORM BY EMAIL TO WITHIN ONE MONTH (or at the meeting in Napoli!!). Also, please don’t forget to supply your name and the name of your volcano on the first page of your response!

Paul, Guillermo, Simon & Willy
09 Sept 2003


DUMMY

Dome collapse episodes @ …MONTSERRAT……. volcano [type examples: Montserrat 1995; Merapi 1994]

Max. volume involved | “10% shortest” recurrence interval (years) | 50 percentile or ave. recurrence interval (years) | “10% longest” recurrence interval (years) | percentage producing pyroclastic flows? | Notes
< 10^7 cu m | 100 | 250 | 1000 | 90 | Intervals between separate episodes
10^7 - 10^8 cu m | 200 | 500 | 1500 | 95 |
10^8 - 10^9 cu m | 300 | 1000 | 3000 | 99 |
> 10^9 cu m | | | | | No opinion
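The rules that a completed row is expected to follow (all three recurrence values given in increasing order, percentages between 0 and 100, empty rows marked ‘no opinion’) can be checked mechanically. The sketch below applies such a check to the sample answers; the `Response` class and `check_response` function are hypothetical helpers for illustration and are not part of the elicitation software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    volume_class: str
    shortest: Optional[float]    # "10% shortest" recurrence interval (years)
    median: Optional[float]      # 50 percentile recurrence interval (years)
    longest: Optional[float]     # "10% longest" recurrence interval (years)
    percentage: Optional[float]  # supplementary percentage, if given
    notes: str = ""


def check_response(r: Response) -> List[str]:
    """Return a list of problems found in one row (empty if none)."""
    values = (r.shortest, r.median, r.longest)
    if all(v is None for v in values):
        # Empty rows are acceptable only when flagged in the Notes column.
        if "no opinion" in r.notes.lower():
            return []
        return ["empty row should be marked 'no opinion'"]
    problems = []
    if any(v is None for v in values):
        problems.append("all three recurrence values are required")
    elif not (0 < r.shortest <= r.median <= r.longest):
        problems.append("expected 0 < shortest <= median <= longest")
    if r.percentage is not None and not (0 <= r.percentage <= 100):
        problems.append("percentage must lie between 0 and 100")
    return problems


# Rows transcribed from the sample (dummy) form.
sample = [
    Response("< 10^7 cu m", 100, 250, 1000, 90, "Intervals between separate episodes"),
    Response("10^7 - 10^8 cu m", 200, 500, 1500, 95),
    Response("10^8 - 10^9 cu m", 300, 1000, 3000, 99),
    Response("> 10^9 cu m", None, None, None, None, "No opinion"),
]
for row in sample:
    print(row.volume_class, check_response(row) or "ok")
```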


Your name …………………………………………………………. your volcano ……………………………………………..

EXPLORIS VSD EVENT TYPE 1: Phreatic eruptions [type examples: Guadeloupe 1976 (illustrative)]

Max. volume involved | “10% shortest” recurrence interval (years) | 50 percentile or ave. recurrence interval (years) | “10% longest” recurrence interval (years) | percentage with cold base surges? | percentage with ballistics? | Notes
< 10^6 cu m | | | | | |
10^6 - 10^7 cu m | | | | | |
10^7 - 10^8 cu m | | | | | |
> 10^8 cu m | | | | | |


EXPLORIS VSD EVENT TYPE 2: Sector collapses [type examples: Mount St Helens 1980 (magmatic); Bandai-san 1888 (non-magmatic)]

Max. volume involved | “10% shortest” recurrence interval (years) | 50 percentile or ave. recurrence interval (years) | “10% longest” recurrence interval (years) | percentage non-magmatic? | Notes
< 10^7 cu m | | | | |
10^7 - 10^8 cu m | | | | |
10^8 - 10^9 cu m | | | | |
> 10^9 cu m | | | | |


EXPLORIS VSD EVENT TYPE 3: Dome collapse episodes (n.b. in this case, for any particular episode when there may be multiple collapses, ‘max. volume’ refers to the largest single event in that episode) [type examples: Montserrat 1995 – pres.; Merapi 1994]

Max. volume involved | “10% shortest” recurrence interval (years) | 50 percentile or ave. recurrence interval (years) | “10% longest” recurrence interval (years) | percentage producing pyroclastic flows? | Notes
< 10^7 cu m | | | | |
10^7 - 10^8 cu m | | | | |
10^8 - 10^9 cu m | | | | |
> 10^9 cu m | | | | |


EXPLORIS VSD EVENT TYPE 4: Short-lived magmatic explosive eruptions [type examples: vulcanian/sub-Plinian, e.g. Vesuvius 1944, 1906, 1631]

Max. volume involved | “10% shortest” recurrence interval (years) | 50 percentile or ave. recurrence interval (years) | “10% longest” recurrence interval (years) | percentage producing pyroclastic flows? | percentage involving hydromagmatic activity? | Notes
< 10^8 cu m | | | | | |
10^8 - 10^9 cu m | | | | | |
10^9 - 10^10 cu m | | | | | |
> 10^10 cu m | | | | | |


EXPLORIS VSD EVENT TYPE 5: Sustained major magmatic explosive eruptions [type examples: Plinian, or greater, including Vesuvian caldera collapse-type events, e.g. Vesuvius AD 79; Avellino, 3.7ka; Guadeloupe 26ka]

Max. volume involved | “10% shortest” recurrence interval (years) | 50 percentile or ave. recurrence interval (years) | “10% longest” recurrence interval (years) | percentage producing pyroclastic flows? | percentage involving hydromagmatic activity? | Notes
< 10^9 cu m | | | | | |
10^9 - 10^10 cu m | | | | | |
10^10 - 10^11 cu m | | | | | |
> 10^11 cu m | | | | | |
