Quantitative and Qualitative Approaches to Reasoning under Uncertainty in Medical Decision Making

John Fox, David Glasspool, and Jonathan Bury
Imperial Cancer Research Fund Labs, Lincoln's Inn Fields, London WC2A 3PX, United Kingdom
[email protected]; [email protected]; [email protected]
Abstract. Medical decision making frequently requires the effective management and communication of uncertainty and risk. However, a tension exists between classical probability theory, which is precise and rigorous but which people find non-intuitive and difficult to use, and qualitative approaches, which are ad hoc but can be more versatile and more easily comprehensible. In this paper we review a range of approaches to uncertainty management, then describe a logical approach, argumentation, which subsumes qualitative as well as quantitative representations and has a clear formal semantics. The approach is illustrated and evaluated in five decision support applications.
1 Introduction
Representing and managing uncertainty is central to understanding and supporting much clinical decision-making, and is extensively studied in AI, computer science and psychology. Implementing effective decision models for practical clinical applications presents a dilemma. On the one hand, informal and qualitative representations of uncertainty may be natural for people to understand, but they often lack formal rigour. On the other hand, formal approaches based on probability theory are precise but can be awkward and non-intuitive to use. While probability theory offers an ideal for optimal decision-making in many cases, it is often impractical. In this paper we review a range of approaches to uncertainty representation, then describe a logical approach, argumentation, which can subsume both qualitative and quantitative approaches within the same formal framework. We believe that this may improve both the scope and the comprehensibility of decision support systems, and we illustrate the approach with five decision support applications developed by our group.
2 Decision Theory and Decision Support
It is generally accepted that human reasoning and decision-making can exhibit various shortcomings when compared with accepted prescriptive theories derived from mathematical logic and statistical decision-making, and there are systematic patterns of distortion and error in people's use of uncertain information [1,2,3,4]. In the 1970s and 1980s cognitive scientists generally came to the view that these characteristic failures arise because people revise their beliefs by processes that bear little resemblance to formal mathematical calculation. Kahneman and Tversky developed a celebrated account of human decision-making and its weaknesses in terms of what they called heuristics and biases [5]. They argued that people judge things to be highly likely when, for example, they come easily to mind or are typical of a class, rather than by means of a proper calculation of the relative probabilities. Such heuristic methods are often reasonable approximations for practical decision making, but they can also lead to systematic errors.

If people demonstrate imperfect reasoning or decision-making, then it would presumably be desirable to support them with techniques that avoid errors and comply with rational rules. Mathematical logic (notably "classical" propositional logic and the predicate calculus) is traditionally taken as the gold standard for logical reasoning and deduction, while expected utility theory (EUT) plays the equivalent role for decision-making. A standard view on the "correct" way to take decisions is summarised by Lindley as follows:

"... there is essentially only one way to reach a decision sensibly. First, the uncertainties present in the situation must be quantified in terms of values called probabilities. Second, the consequences of the courses of actions must be similarly described in terms of utilities. Third, that decision must be taken which is expected on the basis of the calculated probabilities to give the greatest utility. The force of 'must' used in three places there is simply that any deviation from the precepts is liable to lead the decision maker into procedures which are demonstrably absurd" [6], p. vii.

However, many people think that this overstates the value of mathematical methods and understates the capabilities of human decision makers. There are a number of problems with EUT as a practical method for decision making, and there are indications that, far from being "irrational", human decision processes depart from EUT because they are optimised to make various tradeoffs to address these problems in practical situations. An expected-utility decision procedure requires that we know, or can estimate reasonably accurately, all the required probability and utility parameters. This is frequently difficult in real-world situations, since a decision may still be urgently required even if precise quantitative data are not available. Even when it is possible to establish the necessary parameters, the cost of obtaining good estimates may outweigh the expected benefits. Furthermore, in many situations a decision is needed before the decision options, or the relevant information sources, are fully known: the complete set of options may only emerge as the decision making process evolves. The potential value of mathematical decision theory is thus frequently limited by the lack of objective quantitative data on which to base the calculations, the limited range of functions that it can be used to support, and the fact that the underlying numerical representation of the decision is very different from the intuitive understanding of human decision makers.

Human decision-making may also not be as "absurd" as the normative theories appear to suggest. One school of thought argues that many apparent biases and shortcomings are actually artefacts of the highly artificial situations that researchers create in order to study reasoning and judgement in controlled laboratory conditions. When we look at real-world decision-making, human reasoning and judgement are more impressive than the laboratory research implies.
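For readers unfamiliar with the mechanics of Lindley's prescription, the following is a minimal sketch of an expected-utility choice between two options. The clinical options, probabilities and utilities are invented purely for illustration.

```python
# Minimal sketch of an expected-utility decision; all numbers are invented.

def expected_utility(action, outcomes):
    """Sum probability-weighted utilities over an action's possible outcomes."""
    return sum(p * u for p, u in outcomes[action])

# Hypothetical decision: treat immediately vs. order a further test first.
# Each action maps to (probability, utility) pairs over its outcomes.
outcomes = {
    "treat_now":  [(0.7, 0.9), (0.3, 0.2)],   # treatment works / adverse reaction
    "test_first": [(0.9, 0.8), (0.1, 0.4)],   # test informative / delay causes harm
}

best = max(outcomes, key=lambda a: expected_utility(a, outcomes))
print(best, expected_utility(best, outcomes))  # test_first 0.76
```

The sketch makes the practical objections above concrete: every probability and utility must be supplied before the maximisation can run at all.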
Shanteau [7] studied expert decision-making, "investigating factors that lead to competence in experts, as opposed to the usual emphasis on incompetence". He identified a number of important positive characteristics of expert decision-makers. First, they know what is relevant to specific decisions, what to attend to in a busy environment, and when to make exceptions to general rules¹. Secondly, experts know a lot about what they know, and can make decisions about their own decision processes: they know which decisions to make and when, and which to skip, for example. They often have good communication skills and the ability to articulate their decisions and how they arrived at them. They can adapt to changing task conditions, and are frequently able to find novel solutions to problems. Classical deduction and probabilistic reasoning do not deal with these meta-cognitive skills.

It has also been strongly argued that people are well adapted for making decisions under adverse conditions: time pressure, lack of detailed information and knowledge, and so on. Gigerenzer, for instance, has suggested that people make decisions in a "fast and frugal" way, which is to say that human cognition is rational in the sense that it is optimised for speed at the cost of occasional, and usually inconsequential, errors [4].

¹ "Good surgeons, the saying goes, know how to operate, better surgeons know when to operate and the best surgeons know when not to operate. That's true for all of medicine." Richard Smith, Editor of the British Medical Journal.
2.1 Tradeoffs in Effective Decision-Making
The strong view expressed by Lindley and others is that the only rational or "coherent" way to reason with uncertainty is to comply with certain mathematical axioms: the axioms of EUT. In practice, compliance with these axioms is often difficult, because there is insufficient data to permit a valid calculation of the expected utility of the decision options, or because a valid calculation would require too much time. An alternative perspective is that human decision-making is rational in that it incorporates practical tradeoffs that, for example, trade a lower cost (e.g. errors in decision-making) against a higher one (e.g. the amount of input data required or the time taken to calculate expected utility). Tradeoffs of this kind not only simplify decision-making but in practice may entail only modest costs in the accuracy or effectiveness of decision-making. Consequently, the claim that we should not model decision-support systems on human cognitive processes is less compelling than it may at first appear.

This possibility has been studied extensively in the field of medical decision-making. In the prediction of sudden infant death, for example, Carpenter et al. [8] attempted to predict death from a simple linear combination of eight variables. They found that the weights could be varied across a broad range without decreasing predictive accuracy. In diagnosing patients suffering from dyspepsia, Fox et al. [9] found that giving all pieces of evidence equal weight produced the same accuracy as a more precise statistical method (and also much the same pattern of errors). In another study [10] we developed a system for the interpretation of blood data in leukaemia diagnosis, using the EMYCIN expert system shell. EMYCIN provided facilities to attach numerical "certainty factors" to inference rules. Initially a system was developed using the full range of available values (-1 to +1). These values were then replaced with just two: if a rule made a purely categorical inference its certainty factor was set to 1.0, while if there was any uncertainty associated with the rule it was set to 0.5. The effect was to increase diagnostic accuracy by 5%!

In a study by O'Neil and Glowinski [11] of whether or not to admit patients with suspected heart attacks to hospital, no advantage was found for a precise decision procedure over simply "adding up the pros and cons". A similar comparison by Pradhan et al. [12] in a diagnosis task showed a slight increase in diagnostic accuracy with precise statistical reasoning, but the effect was so small that it would have no practical clinical value. While the available evidence is not conclusive, a provisional hypothesis is that for some decisions the strict use of quantitatively precise methods may add little practical value to the design of decision support systems over simpler, more "ad hoc" methods.
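The contrast between a carefully tuned weighted score and simply "adding up the pros and cons" can be made concrete with a small sketch. The findings and weights below are invented for illustration and are not taken from any of the cited studies.

```python
# Sketch contrasting a tuned weighted score with simple unit weighting
# ("adding up the pros and cons"). Findings and weights are invented.

def score(findings, weights):
    """Sum the weights of the findings present for this patient."""
    return sum(weights[f] for f in findings)

findings = ["chest_pain", "ecg_change", "normal_enzymes"]

# Hypothetical tuned weights vs. unit weights (+1 for a pro, -1 for a con).
tuned = {"chest_pain": 0.8, "ecg_change": 1.7, "normal_enzymes": -2.1}
unit  = {"chest_pain": 1,   "ecg_change": 1,   "normal_enzymes": -1}

admit_tuned = score(findings, tuned) > 0   # 0.4 -> True
admit_unit  = score(findings, unit) > 0    # 1   -> True
print(admit_tuned, admit_unit)             # both procedures reach the same decision
```

The empirical findings above suggest that, over many cases, the two procedures often agree far more frequently than the precision of the tuned weights would suggest.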
3 Qualitative Methods for Decision Making
Work in artificial intelligence raises an even more radical option for the designers of decision support systems. The desire to develop versatile automata has stimulated a great deal of research into new methods of decision making under uncertainty, ranging from sophisticated refinements of probabilistic methods, such as Bayesian networks and Dempster-Shafer belief functions, to non-probabilistic methods such as fuzzy logic and possibility theory. Good overviews of the different approaches and their applications are [13, 14]. These "non-standard" approaches are similar to probabilistic methods in that they treat uncertainty as a matter of degree. However, even this apparently innocuous assumption has been widely questioned, on the practical grounds that such methods demand a great deal of quantitative input data, and that decision-makers often find them difficult to understand because they do not capture intuitions about the nature of "belief" and "doubt" or the form of natural justifications for decision-making.

Consequently, interest has grown in AI in the use of non-numerical methods that have some "common-sense" validity for reasoning under uncertainty but are not ad hoc from a formal point of view. These include non-classical logics such as non-monotonic logics, default logic and defeasible reasoning. Cognitive approaches, sometimes called reason-based decision making, are also gaining ground, including the idea of using informal endorsements for alternative decision options [15] and formalisations of everyday strategies for reasoning about competing beliefs and actions based on logical arguments [16, 17, 18].

Argumentation is a formalisation of the idea that decisions are made on the basis of arguments for or against a claim. Fox and Das [19] propose that argumentation may be the basis of a generalised decision theory, embracing standard probability theory as a special case, as well as other qualitative and semi-quantitative approaches. To take an example, suppose we wished to make the following informal argument:
If three or more first degree relatives of a patient have contracted breast cancer, then this is one reason to believe that the patient carries a gene predisposing to breast cancer.

In the scheme of [19], arguments are defined as logical structures having three terms:

(Claim, Grounds, Qualifier)

In the example, the Claim term would be the proposition "this patient carries a gene predisposing to breast cancer", and the Grounds, "three or more first degree relatives of this patient have contracted breast cancer", is the justification for the argument. The final term, the Qualifier, specifies the nature and strength of the argument which can be drawn from the grounds to the claim. In the example the qualifier is informal ("this is one reason to believe"), but it could be more conventional, as in "given the Grounds are true, the Claim is true with a conditional probability of 0.43".

Qualifiers are specified with reference to a particular dictionary of terms, along with an aggregation function that specifies how the qualifiers of multiple arguments are to be combined. One possible dictionary is the set of real numbers from 0.0 to 1.0, which would allow qualifiers to be specified as precise numerical probability values and, with an appropriate aggregation function based on the theorems of classical probability, would allow the scheme to reduce to standard probability theory.

In many situations little can be known of the strength of arguments, other than that they indicate an increase or decrease in the overall confidence in the claim. In this case the dictionary and aggregation function might be simpler. In [20], for example, we describe a system which assesses the carcinogenicity of novel compounds using a dictionary comprising the symbols + (argument in favour) and - (argument against); the aggregation function simply subtracts the number of arguments against a proposition from the number in favour. Other dictionaries can include additional qualifiers, such as ++ (the argument confirms the claim) and -- (the argument refutes the claim). Another possible dictionary adopts linguistic confidence terms (e.g. probable, improbable, possible, plausible, etc.) and requires a logical aggregation function that provides a formalised semantics for combining such terms according to common usage; such terms have been formally categorised by their logical structure, for example in [21, 22].

A feature of argumentation theory is that it subsumes these apparently diverse approaches within a single formalism which has a clear and consistent formal semantics [19]. Additionally, argumentation gives a different perspective on the decision making process, one which we believe to be more in line with the way people naturally think about probability and possibility than standard probability theory. Given the evidence reviewed above that people do not naturally use EUT in their everyday decision making, our tentative position is that expressing a preference or confidence in terms of arguments for and against each option will be more accessible and comprehensible than providing a single number representing its aggregate probability. We have developed a number of decision support systems to explore this idea. In the next sections we describe a technology which supports an argument-based decision procedure, then outline several applications which have been built using it, and finally consider quantitative evaluations of the applications.
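To make the scheme concrete, the following minimal Python sketch implements the (Claim, Grounds, Qualifier) structure with the simple +/- dictionary and pros-minus-cons aggregation described above. The rule predicates, thresholds and case data are invented for illustration and are not taken from any of the systems described below.

```python
# Minimal sketch of the (Claim, Grounds, Qualifier) scheme with the simple
# +/- dictionary. Rules and case data are invented for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Argument:
    claim: str                        # the proposition argued for or against
    grounds: str                      # English justification, used in explanations
    qualifier: int                    # +1 supports the claim, -1 opposes it
    applies: Callable[[dict], bool]   # test against the case data

RULES = [
    Argument("genetic predisposition to breast cancer",
             "three or more first-degree relatives have had breast cancer",
             +1, lambda case: case["affected_first_degree"] >= 3),
    Argument("genetic predisposition to breast cancer",
             "all affected relatives were diagnosed after age 60",
             -1, lambda case: case["youngest_diagnosis_age"] > 60),
]

def aggregate(claim, case):
    """Aggregation for the +/- dictionary: arguments in favour minus against."""
    args = [a for a in RULES if a.claim == claim and a.applies(case)]
    support = sum(a.qualifier for a in args)
    reasons = [("for" if a.qualifier > 0 else "against", a.grounds) for a in args]
    return support, reasons

case = {"affected_first_degree": 3, "youngest_diagnosis_age": 64}
print(aggregate("genetic predisposition to breast cancer", case))
# (0, [('for', 'three or more ...'), ('against', 'all affected ...')])
```

Note that the reasons list, not just the aggregate number, is returned: this is what allows argument-based systems to explain their conclusions in English, a property exploited by the applications below.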
4 Practical Applications of the Argumentation Method
We have used the argumentation framework in the design of five decision support systems to date.
4.1 CAPSULE
The CAPSULE (Computer Aided Prescribing Using Logic Engineering) system was developed to assist GPs with prescribing decisions [19, 23]. CAPSULE analyses patient notes and constructs a list of relevant candidate medications, together with arguments for and against each option, based on nine different criteria including efficacy, contra-indications, drug interactions, side effects and costs.
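As a rough illustration of this style of decision support (a sketch only, not CAPSULE's actual knowledge base), candidate medications can be ranked by their net argument support:

```python
# Sketch of argument-based candidate ranking. Drugs, criteria and arguments
# are invented for illustration.

candidates = {
    # drug: list of (+1 pro / -1 con, reason)
    "drug_a": [(+1, "first-line for this indication"),
               (-1, "interacts with current medication")],
    "drug_b": [(+1, "first-line for this indication"),
               (+1, "cheapest equally effective option")],
}

def net_support(args):
    return sum(q for q, _ in args)

ranked = sorted(candidates, key=lambda d: net_support(candidates[d]), reverse=True)
print(ranked)   # ['drug_b', 'drug_a']
```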
4.2 CADMIUM
The CADMIUM radiology workstation is an experimental package combining image processing with logic-based decision support. CADMIUM has mainly been applied to screening for breast cancer, where an argument-based decision procedure has the dual function of controlling the image processing functions which extract and describe micro-calcifications in breast x-rays, and of interpreting the resulting descriptions in terms of whether they are likely to indicate benign or malignant abnormalities. The decision procedure assesses the arguments and presents them to the user in a structured report.
4.3 RAGs
The RAGs (Risk Assessment in Genetics) system allows the user to describe a patient's family tree, incorporating information on the known incidence of cancers within the family. This information is evaluated to assess the likelihood that a genetic predisposition to a particular cancer is present, and the software makes recommendations for patient management in language which is comprehensible to both clinician and patient. A family tree graphic is used for incremental data entry (Figure 1): data about relatives are added by clicking on each relative's icon and completing a simple form. RAGs analyses the data and provides detailed assessments of genetic risk by weighing simple arguments like the example above, rather than by using numerical probabilities. Based on the aggregate genetic risk level the patient is classified as at high, moderate or low risk of carrying a BRCA1 genetic mutation, which implies an 80% lifetime risk of developing breast cancer. Appropriate referral advice can then be given based on this classification, and the software can provide a comprehensible explanation of its decisions based on the arguments it has applied (Figure 1).

RAGs uses a set of 23 risk arguments (e.g. if the client has more than two first-degree relatives with breast cancer under the age of 50, then this is a risk factor) and a simple dictionary of qualifiers which allows small positive and negative integer values as well as plus and minus infinity (equivalent to confirming or refuting the claim). This scheme thus preserves the relative rather than absolute weighting of different factors in the analysis.
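A minimal sketch of this style of aggregation follows. The weights, thresholds and the mapping to risk categories are invented for illustration and are not RAGs' actual rule set.

```python
# Sketch of a RAGs-style qualifier dictionary: small signed integers plus
# plus/minus infinity for confirming or refuting arguments. All weights and
# classification thresholds are invented.
import math

def classify(argument_weights, moderate=2, high=4):
    total = sum(argument_weights)
    if math.isinf(total):
        # A confirming (+inf) or refuting (-inf) argument dominates everything.
        return "high risk" if total > 0 else "low risk"
    if total >= high:
        return "high risk"
    if total >= moderate:
        return "moderate risk"
    return "low risk"

print(classify([2, 1]))              # moderate risk
print(classify([2, 1, math.inf]))    # high risk: the confirming argument wins
```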
4.4 ERA
Patients with symptoms or signs which may indicate cancer should be investigated quickly so that treatment can commence as soon as possible. The ERA (Early Referrals Application) system has been designed in the context of the UK Department of Health's (DoH) "2 week" guideline, which states that patients with suspected cancer should be seen and assessed by an appropriate specialist within two weeks of their presentation. Referral criteria are given for each of 12 main cancer groups (e.g. breast, lung, colorectal). The practical decision as to whether or not to refer a particular patient is treated in ERA as based on a set of patient-specific arguments (see Figure 2). The published referral criteria have been specifically designed to be unambiguous and easy to apply: should any of the arguments apply to a particular patient, an early referral is warranted. Qualifiers as to the weight of each argument are thus unnecessary, with arguments generally acting categorically, in an all-or-nothing fashion. An additional feature of this domain is that there are essentially no counter-arguments. The value of argumentation in this example lies largely in the ability to give meaningful explanations, in the form of reasons for referral expressed in English.
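Because the arguments here are categorical and unopposed, the decision procedure reduces to checking whether any criterion fires and reporting the matching criteria as the explanation. The sketch below uses invented paraphrases, not the actual DoH referral criteria.

```python
# Sketch of ERA-style categorical referral logic: any matching criterion
# warrants early referral, and the matched criteria double as the English
# explanation. The criteria shown are invented paraphrases.

CRITERIA = [
    ("rectal bleeding with a change in bowel habit",
     lambda p: p.get("rectal_bleeding") and p.get("bowel_habit_change")),
    ("palpable abdominal mass",
     lambda p: p.get("abdominal_mass")),
]

def refer(patient):
    reasons = [text for text, test in CRITERIA if test(patient)]
    return bool(reasons), reasons    # decision plus human-readable reasons

print(refer({"rectal_bleeding": True, "bowel_habit_change": True}))
# (True, ['rectal bleeding with a change in bowel habit'])
```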
4.5 REACT
The applications described so far involve making a single decision: what drug to prescribe, or whether to refer a patient. Most decisions are made in the context of plans of action, however, where they may interact or conflict with other planned actions or anticipated events. REACT (Risk, Events, Actions and their Consequences over Time) is being developed to provide decision support for such extended plans. In effect, REACT is a logical spreadsheet that allows a user to manipulate graphical widgets representing possible clinical events and interventions on a timeline interface, and propagates their implications (both qualitative and quantitative) to numerical displays of risk (or other parameters) and to displays of arguments and counter-arguments.

While the REACT user creates a plan, a knowledge-based DSS analyses it according to a set of definable rules and provides feedback on interactions between events. Rules may specify, for example, that certain events are mutually exclusive, that certain combinations of events are impossible, or that events have different consequences depending on prior or simultaneous events. Global measures (for example the predicted degree of risk, or the predicted cost or benefit of combinations of events) can be displayed graphically alongside the planning timeline. Qualitative arguments for and against each individual action proposed in the plan can be reviewed, and can be used to generate recommended actions when specified combinations of plan elements occur.
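A sketch of this style of rule-based plan checking follows; the events, rule forms and plan data are invented for illustration and do not reflect REACT's actual rule language.

```python
# Sketch of REACT-style plan checking: rules inspect the whole timeline and
# return counter-arguments against offending combinations of events.
# Events and rules are invented.

def mutually_exclusive(a, b):
    def rule(plan):
        events = {e for e, _ in plan}
        if a in events and b in events:
            return [f"'{a}' and '{b}' cannot both appear in the plan"]
        return []
    return rule

def must_precede(first, second):
    def rule(plan):
        t = {e: day for e, day in plan}
        if first in t and second in t and t[first] >= t[second]:
            return [f"'{first}' must occur before '{second}'"]
        return []
    return rule

RULES = [
    mutually_exclusive("radiotherapy", "surgery_same_site"),
    must_precede("biopsy", "chemotherapy"),
]

plan = [("biopsy", 10), ("chemotherapy", 3)]   # events with planned day numbers
for rule in RULES:
    for objection in rule(plan):
        print("counter-argument:", objection)
# counter-argument: 'biopsy' must occur before 'chemotherapy'
```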
Fig. 1. The RAGs software application. A family tree has been input for the patient Karen, who has had three relatives affected by relevant cancers. The risk assessment system has determined a pattern of inheritance that may indicate a genetic factor, and provides an explanation in the left-hand panel. Referral advice for the patient is also available.
Fig. 2. The ERA early referral application. After completing patient details on a form, the program provides referral recommendations with advice, and can contact the hospital to automatically make an appointment.
5 Evaluations
A number of these applications have been used in controlled evaluations (with the exception of ERA and REACT, which are still in development).

In the case of the CAPSULE prescribing system, a controlled study with Oxfordshire general practitioners showed the potential for substantial improvements in the quality of their prescribing decisions [23]. With decision support there was a 70% increase in the number of times the GPs' decisions agreed with those of experts considering the same cases, and a 50% reduction in the number of times that they missed a cheaper but equally effective medication.

The risk classifications generated by the RAGs software were compared, for 50 families, with those provided by the leading probabilistic risk assessment software, which uses the Claus model [27], a mathematical model of the genetic risk of breast cancer based on a large dataset of cancer cases. Despite the use of a very simple weighting scheme, the RAGs system produced exactly the same risk classification (into high, moderate or low risk, according to established guidelines) for all cases as the probabilistic system [24, 25]. RAGs also resulted in more accurate pedigree taking and more appropriate management decisions than either pencil and paper or the standard probabilistic software [26].

In the CADMIUM image-processing system, users are provided with assistance in interpreting mammograms by an argument-based decision making component. Radiographers who were trained to interpret mammograms were asked to decide whether observed abnormalities were benign or malignant, with and without decision support. The decision support condition produced clear improvements in the radiographers' performance, in terms of increased hits and correct rejections and reduced misses and false positives [28].
6 Summary and Conclusions
Human reasoning and decision-making can exhibit various shortcomings when compared with accepted prescriptive theories. Good decision-making is central to many important human activities and much effort is directed at developing decision support technologies to assist us. One school of thought argues that these systems must be based on prescriptive axioms of decision-making since to do otherwise leads inevitably to irrational conclusions and choices. Others argue that decision-making in the real world demands practical tradeoffs and a failure to make those tradeoffs would be irrational. Although the debate remains inconclusive, we have described a number of examples of medical systems in which the use of simple logical argumentation appears to provide effective decision support together with a versatile representation of uncertainty which fits comfortably with people's intuitions.
Acknowledgements. The RAGs project was supported by the Economic and Social Research Council, and much of the development and evaluation work was carried out by Andrew Coulson and Jon Emery. Michael Humber carried out much of the implementation of the ERA system.
References

1. Kahneman D., Slovic P. and Tversky A. (eds): Heuristics and Biases. Cambridge University Press, Cambridge (1982)
2. Evans J. St.B.T. and Over D.E.: Rationality and Reasoning. Psychology Press, London (1996)
3. Wright G. and Ayton P. (eds): Subjective Probability. Wiley, Chichester, UK (1994)
4. Gigerenzer G. and Todd P.M.: Simple Heuristics that Make Us Smart. Oxford University Press, Oxford (1999)
5. Kahneman D. and Tversky A.: Judgement under uncertainty: heuristics and biases. Science 185 (1974) pp. 1124-1131
6. Lindley D.V.: Making Decisions (2nd edition). Wiley, Chichester, UK (1985)
7. Shanteau J.: Psychological characteristics of expert decision makers. In: Mumpower J. (ed): Expert Judgement and Expert Systems. NATO ASI Series, Vol. F35 (1987)
8. Carpenter R.G., Gardner A., McWeeny P.M. and Emery J.L.: Multistage scoring system for identifying infants at risk of unexpected death. Archives of Disease in Childhood 53(8) (1977) pp. 600-612
9. Fox J., Barber D.C. and Bardhan K.D.: Alternatives to Bayes? A quantitative comparison with rule-based diagnosis. Methods of Information in Medicine 19(4) (1980) pp. 210-215
10. Fox J., Myers C.D., Greaves M.F. and Pegram S.: Knowledge acquisition for expert systems: experience in leukaemia diagnosis. Methods of Information in Medicine 24(1) (1985) pp. 65-72
11. O'Neil M.J. and Glowinski A.J.: Evaluating and validating very large knowledge-based systems. Medical Informatics 15(3) (1990) pp. 237-251
12. Pradhan M. et al.: The sensitivity of belief networks to imprecise probabilities: an experimental investigation. Artificial Intelligence Journal 84(1-2) (1996) pp. 365-397
13. Krause P. and Clark C.: Uncertainty and subjective probability in AI systems. In: Wright G. and Ayton P. (eds): Subjective Probability. Wiley, Chichester (1994) pp. 501-527
14. Hunter A. and Parsons S. (eds): Applications of Uncertainty Formalisms. Lecture Notes in Artificial Intelligence 1455, Springer-Verlag (1998)
15. Cohen P.R.: Heuristic Reasoning: An Artificial Intelligence Approach. Pitman Advanced Publishing Program, Boston (1985)
16. Fox J.: On the necessity of probability: reasons to believe and grounds for doubt. In: Wright G. and Ayton P. (eds): Subjective Probability. Wiley, Chichester (1994)
17. Fox J., Krause P. and Ambler S.: Arguments, contradictions and practical reasoning. In: Neumann B. (ed): Proceedings of the 10th European Conference on AI (ECAI'92), Vienna, Austria (1992) pp. 623-627
18. Curley S.P. and Benson P.G.: Applying a cognitive perspective to probability construction. In: Wright G. and Ayton P. (eds): Subjective Probability. Wiley, Chichester (1994) pp. 185-209
19. Fox J. and Das S.K.: Safe and Sound: Artificial Intelligence in Hazardous Applications. AAAI Press / MIT Press (2000)
20. Tonnelier C.A.G., Fox J., Judson P., Krause P., Pappas N. and Patel M.: Representation of chemical structures in knowledge-based systems. J. Chem. Inf. Comput. Sci. 37 (1997) pp. 117-123
21. Elvang-Goransson M., Krause P.J. and Fox J.: Acceptability of arguments as logical uncertainty. In: Clarke M., Kruse R. and Moral S. (eds): Symbolic and Quantitative Approaches to Reasoning and Uncertainty (Proceedings of ECSQARU'93). Lecture Notes in Computer Science 747, Springer-Verlag (1993) pp. 85-90
22. Glasspool D.W. and Fox J.: Understanding probability words by constructing concrete mental models. In: Hahn M. and Stoness S.C. (eds): Proceedings of the 21st Annual Conference of the Cognitive Science Society (1999) pp. 185-190
23. Walton R.T., Gierl C., Yudkin P., Mistry H., Vessey M.P. and Fox J.: Evaluation of computer support for prescribing (CAPSULE) using simulated cases. British Medical Journal 315 (1997) pp. 791-795
24. Emery J., Watson E., Rose P. and Andermann A.: A systematic review of the literature exploring the role of primary care in genetic services. Family Practice 16 (1999) pp. 426-445
25. Coulson A.S., Glasspool D.W., Fox J. and Emery J.: Computerized genetic risk assessment from family pedigrees. MD Computing (in press)
26. Emery J., Walton R., Murphy M., Austoker J., Yudkin P., Chapman C., Coulson A., Glasspool D. and Fox J.: Computer support for interpreting family histories of breast and ovarian cancer in primary care: comparative study with simulated cases. British Medical Journal 321 (2000) pp. 28-32
27. Claus E., Schildkraut J., Thompson W.D. and Risch N.: The genetic attributable risk of breast and ovarian cancer. Cancer 77 (1996) pp. 2318-2324
28. Taylor P., Fox J. and Todd-Pokropek A.: The development and evaluation of CADMIUM: a prototype system to assist in the interpretation of mammograms. Medical Image Analysis 3(4) (1999) pp. 321-337