
Blind chance? An investigation into the perceived probabilities of phrases used in oral radiology for expressing chance

S.E. Stheeman, P.A. Mileman, M.A. van 't Hof* and P.F. van der Stelt

Department of Oral Radiology, Academic Centre for Dentistry Amsterdam, and *Medical Statistics Department, Catholic University, Nijmegen, The Netherlands

Received 26 October 1992, and in final form 12 January 1993

The need for numerical probabilities in oral radiographic diagnosis is growing, owing to recent developments in computer-aided diagnosis, decision analysis, informed consent and medical litigation. These numerical probabilities are only partly available from current texts on oral radiology, where they are often expressed by ill-defined, semiquantitative phrases. In this study, therefore, 30 phrases expressing the probability of a relationship between a diagnosis and its symptoms were taken from a selected textbook on oral radiology. Seven oral radiologists from the USA and The Netherlands scored each of these probabilistic phrases on a 20-cm visual analogue scale. Intraradiologist variation was low and interradiologist variation was high. The high variation among authors of texts on oral radiology in interpreting probability information could impair their ability to convey unambiguous information to their readers. We therefore recommend that the use of semiquantitative phrases in oral radiology be restricted to five probability groups.

Keywords: Radiology, diagnosis; decision making

Dentomaxillofac. Radiol., 1993, Vol. 22, 135-139, August

Probabilities play an important role in oral radiographic diagnosis. A diagnosis can be defined as an estimate of the probability that a patient with a certain set of signs and symptoms suffers from a certain disease. In diagnostic texts, the probabilities, or relative frequencies, needed to quantify uncertainty are often described by words such as 'often' or 'possibly'. A sentence such as 'A dentigerous cyst appears as a well-defined radiolucency, usually with a hyperostotic border'¹ contains knowledge about the relationship between the signs and the diagnosis. The need for numerical data has grown with the increasing application of decision-analysis techniques to diagnostic procedures and the development of computer-aided clinical decision making. Similar demands have come from those concerned with the medicolegal aspects of diagnosis and treatment planning². Recently, researchers in psychology³⁻⁵ and political forecasting⁶ have quantified verbal descriptors of uncertainty. In medicine, Toogood⁷, Robertson⁸ and Kenney⁹ have attempted to express descriptive words and phrases as numerical probability estimates, and Eekhof et al.¹⁰ carried out a similar study in Dutch. These studies show that the interpretation of phrases used to express probability depends both on the person who judges the meaning of the terms and on the context in which they are used. The purpose of this study was to describe the variation in the interpretation by oral radiologists of probabilistic phrases taken from a textbook of oral radiology, and to suggest ways to avoid ambiguity in the presentation of such probabilities in future.

Materials and methods

The editor of an international journal of oral radiology was asked to select five international experts in the field of oral radiology with a good command of English. Each expert came from a different European country, none of them from The Netherlands, where the study was to be carried out. Four of those selected agreed to participate. The members of this panel were not told the names of the other participants. The first round of a Delphi consensus procedure¹¹ was then initiated: the experts were asked to rank the three English-language textbooks of oral radiology that they considered the best in the field. In this first round they all selected the same textbook, so the Delphi procedure converged immediately. Three chapters comprising 100 pages of text and illustrations were selected from the part of the textbook covering the radiological signs of pathology, to serve as the source of data for the next part of the study. From these chapters, all the words and phrases indicating either a probability or a frequency relationship between a diagnosis and its signs and symptoms were listed, and the frequency of their use was noted.

A list of 30 different words and phrases was assembled. The list contained eight expressions of probability:

could reasonably be expected to, tend to, are likely to, may well occur, are unlikely to, must be considered, may, possibly

and 22 expressions describing relative frequency:

usually, occasionally, almost never, very often, predominantly, in some cases, common, seldom, always, in most cases, often, never, not usually, frequently, rarely, very rarely, infrequently, not uncommon, not infrequently, not always, in many cases, almost always

The words and phrases in this list were then incorporated into specially constructed short sentences to place them in a uniform context for evaluation, for example: 'It is common among patients having disease D to show symptom S'. The sentences had no specific pathological or radiological meaning. Two contributors to the selected textbook, working in the same university oral radiology department in the USA, and five oral radiologists from the same university department in The Netherlands undertook the evaluations. They were asked to mark with a cross on a 20-cm visual analogue scale¹² the probability they felt best described each of the 30 phrases. They repeated the procedure 1 week later, by which time it was assumed they would not be able to recall their original responses.

The data were analysed in two ways. First, we were interested in the meaning of individual phrases as well as in the variation in their interpretation by the seven radiologists. We therefore calculated the median and range for each of the 30 phrases. To avoid distortion by extreme and unrepresentative responses, only the median and the values immediately above and immediately below it were used to calculate the range. Second, we wanted to analyse whether there were systematic differences between individuals in the ranking of low compared with high probability phrases.
For this purpose we calculated Spearman rank correlations between the results of each individual radiologist and the group median. As a reliability check, Spearman rank correlations were also calculated between the two trials.
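The rank-correlation analysis described above can be sketched in Python. This is an illustration, not the authors' own software. The data layout (a dictionary mapping each radiologist to a list of 30 probabilities in a fixed phrase order) and the function names are hypothetical. The sketch also assumes that the group median for each phrase is taken over all seven radiologists, including the one being compared, and that tied values receive averaged ranks, which is the usual convention for Spearman's rho.

```python
from statistics import median


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; average them.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def agreement_with_group_median(scores):
    """Spearman rho of each radiologist against the per-phrase group median.

    scores: {radiologist: [probability for phrase 1, phrase 2, ...]}, with the
    phrases in the same order for everyone (hypothetical data layout).
    """
    per_phrase = zip(*scores.values())  # one tuple of scores per phrase
    group_median = [median(p) for p in per_phrase]
    return {name: spearman(s, group_median) for name, s in scores.items()}
```

For the reliability check, call `spearman(first_trial, second_trial)` on one radiologist's two sets of scores. The groups 2-4 column of Table II corresponds to running the same functions on only the phrases in those groups.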

Results

The Spearman rank correlation coefficients between the two trials ranged from 0.76 to 0.95, with an average of 0.9.

Table I Median reported probability and range for each of the 30 phrases, ranked from low to high. The four extreme values for each phrase were left out in calculating the ranges

Phrase                 Median   Range
Never                  0        0
Almost never           0.03     0.03
Rarely                 0.05     0.03
Unlikely               0.06     0.08
Very rarely            0.07     0.08
Seldom                 0.11     0.04
Infrequently           0.14     0.01
Not usual              0.14     0.05
Occasionally           0.19     0.16
May                    0.22     0.14
Possibly               0.23     0.06
In some cases          0.24     0.03
Not uncommon           0.37     0.33
Not infrequently       0.39     0.23
Tend to                0.58     0.09
May well occur         0.63     0.14
Must be considered     0.63     0.28
Common                 0.67     0.17
Not always             0.68     0.25
Frequently             0.71     0.16
In many cases          0.73     0.08
Are likely to          0.74     0.14
Often                  0.74     0.02
Could reasonably be    0.74     0.05
Usually                0.80     0.12
Predominantly          0.80     0.05
Very often             0.88     0
In most cases          0.93     0.05
Almost always          0.95     0.01
Always                 1        0
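The summary statistics in Table I can be computed with a short Python sketch. It assumes seven scores per phrase, one per radiologist; the text does not state how the two trials were combined. It also assumes that a mark on the 20-cm visual analogue scale becomes a probability by dividing its distance from the left-hand end by 20 cm. Both assumptions and the function names are hypothetical.

```python
def vas_to_probability(mark_cm, scale_length_cm=20.0):
    """Convert a mark on a visual analogue scale to a probability in [0, 1].

    Assumes the left end of the line means 0 and the right end means 1.
    """
    if not 0.0 <= mark_cm <= scale_length_cm:
        raise ValueError("mark must lie on the scale")
    return mark_cm / scale_length_cm


def median_and_central_range(scores):
    """Median and 'range' of one phrase's scores, as described for Table I.

    Only the median and the values immediately above and below it are used,
    so with seven scores the four extreme values are discarded and the range
    is (value just above the median) - (value just below the median).
    """
    values = sorted(scores)
    if len(values) % 2 == 0 or len(values) < 3:
        raise ValueError("this sketch expects an odd number (>= 3) of scores")
    mid = len(values) // 2
    return values[mid], values[mid + 1] - values[mid - 1]


if __name__ == "__main__":
    marks = [18.0, 2.0, 10.0, 6.0, 4.0, 12.0, 8.0]  # hypothetical marks in cm
    probs = [vas_to_probability(m) for m in marks]
    print(median_and_central_range(probs))
```

Discarding all but the three central values makes the range insensitive to a single outlying radiologist. With only seven raters, however, it also means each range rests on three judgements.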

The median interpretation, expressed as subjective probability, is shown for each phrase in Table I. A large range of probabilities was found for the double-negative phrases not uncommon and not infrequently (0.33 and 0.23). These phrases were not interpreted in the same way as their positively formulated equivalents, common and frequently. 'Mirror phrases' such as likely and unlikely did not complement each other. Furthermore, the qualifier very had unpredictable effects on the meaning of the phrases it preceded. In very often, the adverb very increased the probability expressed by often, as might be expected. In very rarely, on the other hand, it increased the probability expressed by rarely instead of decreasing it, although the difference was not significant.

Figure 1 illustrates the relationship between the probability expressed by a phrase and the variation in interpreting this probability among the seven radiologists. Ranges tended to be larger for phrases expressing a probability in the region of 0.50, that is, a 50% chance. In Figure 2 the phrases are divided into five groups, based on the median probabilities they express as well as their ranges. Phrases with a probability between 0.33 and 0.67 formed one group (group 3), which contained most of the phrases with the largest ranges. The phrases expressing low or high probability were each divided into two groups of equal width (groups 1 and 2, and groups 4 and 5). Phrases in these groups generally had smaller ranges.

Figure 1 The relationship between the median subjective probabilities, ranked from low to high, and the corresponding probability ranges of the 30 phrases. The four extreme values for each phrase were left out in calculating the ranges. Each square represents one phrase. (x axis: subjective probability, 0 to 1.000)

Figure 2 The median probability, ranked from low to high (•), and the ranges (o---o) for the 30 phrases. The four extreme values for each phrase were left out when the ranges were calculated. The vertical lines divide the phrases into five probability groups. (x axis: subjective probability, 0 to 1.000)

Spearman rank correlation coefficients were calculated between the interpretation given by each individual radiologist and the group median (Table II) to judge differences in ranking the set of phrases from low to high probability. The results showed that the ranking was not entirely consistent across the group of radiologists. On the other hand, given the high variation in the interpretation of the individual phrases, agreement on ranking seems fairly high. This holds even when the terms from groups 1 and 5 were omitted from the analysis. Because agreement on the interpretation of these extreme terms was high overall, including them could have obscured systematic differences between radiologists in ranking terms from groups 2-4.

Table II Interobserver agreement in ranking the semiquantitative expressions of probability, expressed as Spearman rank correlations with the group median, calculated for the phrases in groups 1-5 (middle column) and groups 2-4 (Figure 2) (right-hand column)

Radiologist   Groups 1-5   Groups 2-4
Author 1      0.7324       0.4650
Author 2      0.8675       0.7025
Reader 1      0.8540       0.6843
Reader 2      0.8291       0.5980
Reader 3      0.7727       0.5998
Reader 4      0.8341       0.6622
Reader 5      0.8613       0.6931
Average       0.8216       0.6293

The number of times each phrase appeared in the three chapters was summed for each of the five probability groups (see Figure 2) and is shown in Figure 3. Each phrase is used at least once, so the number of different phrases in a group directly influences how often phrases from that group appear in the text. The results were therefore corrected for the number of different phrases in each group, as if all groups contained the same number of different phrases (see also Figure 3). Words expressing a probability between 0.33 and 0.83 (groups 3 and 4) are used more often than those from any of the other groups.

Figure 3 The frequency of use of phrases from the five probability groups in a textbook of oral radiology, and the frequencies corrected for the fact that each group contains a different number of phrases. (x axis: groups of verbal expressions, group 1 to group 5)

Discussion

To transfer written information meaningfully, the information must be unambiguous. Suppose the author of a textbook states 'It is not uncommon for lesions that are surrounded by an osteolytic zone to show microscopic signs of malignancy'. It makes a great difference to the patient whether not uncommon means a 32% or a 65% probability of malignancy. Individual radiologists in our study gave both of these values as their interpretation of the phrase not uncommon. There should therefore be a clear consensus about the numerical interpretation of widely used semiquantitative terms. These terms express the probability of the relationship between signs, symptoms and a disease, and between the results of a diagnostic test and a disease. Budescu and Wallsten³ gave two possible general explanations for the use of imprecise descriptive phrases instead of numerical expressions. First, precise knowledge is often lacking, and it would therefore be misleading to present the knowledge as precise. Second, people are said to understand words better than numbers.

We found that intraradiologist variation in interpreting the probabilistic phrases was relatively small. This agrees with the results of both Lichtenstein and Newman⁵ and Beyt-Marom⁶. There is, however, considerable interradiologist variation with certain phrases. Other comparable studies have shown similar results³⁻⁹. In addition, the results of this study may demonstrate 'anchoring'¹³. In anchoring, observers judge each phrase by the degree to which it raises or lowers their initially perceived probability. Such anchoring would increase the variability in the absolute judgement of the meaning of the phrases but would have only a small effect on their ranking from low to high probability. The results of this study show no significant difference between the judgements of the authors of the textbook and those of its expert professional readers.

To convey the greatest possible amount of information, a consensus is needed on editorial style in diagnostic texts and in the primary literature. The phrases with the widest ranges of interpretation should be avoided. Most of these phrases express a probability between approximately 0.33 and 0.83 (groups 3 and 4). These were also the phrases preferred in the textbook used in this study. Some phrases were interpreted much more variably than others; that is, some words are more ambiguous than others. Our study suggests that double negatives, mirror phrases and qualifiers like 'very' only add vagueness and should therefore be avoided. Lichtenstein and Newman⁵ reached the same conclusion. Kenney⁹ suggested that if the use of semiquantitative phrases is unavoidable, authors should specify either their best estimate of the probability or a probability range. Beyt-Marom⁶ also proposed the second of these solutions, a range. Probability ranges have the advantage that they can be used when the available information is not very precise. Based on the results of this study, we propose that the terms used to describe probabilities be limited to the five groups shown in Figure 2.
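The five-group scheme proposed above can be expressed as a lookup from a phrase's median probability to its group and the interval that group spans. The same grouping also supports the per-group correction of usage counts described for Figure 3. In this Python sketch, the boundaries follow the Results: group 3 spans 1/3 to 2/3, and the low and high ends are each split into two groups of equal width. Two further choices are our own assumptions: a median that falls exactly on a boundary goes to the lower group, and corrected counts are rescaled to the mean number of phrases per group. The function names and the usage counts in the example are hypothetical.

```python
from collections import Counter

# Upper bounds of groups 1-5 on the probability scale.
GROUP_UPPER_BOUNDS = (1 / 6, 1 / 3, 2 / 3, 5 / 6, 1.0)


def probability_group(p):
    """Return the group (1-5) for a median probability p in [0, 1].

    Assumption: a value exactly on a boundary is placed in the lower group.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return next(g for g, upper in enumerate(GROUP_UPPER_BOUNDS, start=1) if p <= upper)


def group_interval(group):
    """Probability interval (lower, upper) spanned by a group."""
    lower = 0.0 if group == 1 else GROUP_UPPER_BOUNDS[group - 2]
    return lower, GROUP_UPPER_BOUNDS[group - 1]


def corrected_usage(occurrences, medians):
    """Usage per group, rescaled as if every group held the same number of phrases.

    occurrences: {phrase: times used in the text}
    medians: {phrase: median probability}, defining which phrases exist.
    Assumption: counts are rescaled to the mean number of phrases per group.
    """
    phrases_per_group = Counter(probability_group(m) for m in medians.values())
    uses_per_group = Counter()
    for phrase, n in occurrences.items():
        uses_per_group[probability_group(medians[phrase])] += n
    reference = sum(phrases_per_group.values()) / len(phrases_per_group)
    return {g: uses_per_group[g] / k * reference for g, k in phrases_per_group.items()}


if __name__ == "__main__":
    # Medians from Table I; the usage counts are invented for illustration.
    medians = {"rarely": 0.05, "seldom": 0.11, "occasionally": 0.19,
               "not uncommon": 0.37, "frequently": 0.71, "very often": 0.88}
    uses = {"rarely": 3, "seldom": 1, "occasionally": 2,
            "not uncommon": 4, "frequently": 5, "very often": 1}
    for phrase, m in medians.items():
        g = probability_group(m)
        print(phrase, g, group_interval(g))
    print(corrected_usage(uses, medians))
```

The `group_interval` output is the kind of per-group range that could be printed at the start of a textbook, and that a decision-analysis model could use in place of a single point estimate.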
Additionally, we recommend that the words never and always be used to express 0 and 100% probability, respectively. Humans have cognitive difficulty in handling information in more than approximately seven distinct categories¹⁴, so using a limited number of probability groups might increase clarity. Dividing the phrases into more than five groups would not be justified, given the large ranges found in their interpretation. The phrases in each group, together with the group's range, should be stated at the beginning of a textbook or journal. Readers can then take the range of each group as an estimate of the numerical probability expressed by a particular phrase. At present, some authors may use different probability expressions with apparently similar meanings interchangeably to avoid monotony. However, as this study shows, readers may interpret such expressions quite differently. If authors vary their wording only among phrases from the same probability group, they can write not only clearly but also interestingly. The five probability groups might also be used to translate the phrases into numerical data for use in decision analysis and in building expert diagnostic systems. In developing their expert system MYCIN, Buchanan and Shortliffe¹⁵ showed that a difference as large as 20% in the probabilities used did not significantly alter its outcome. At this stage, therefore, the use of five probability groups could be an acceptable compromise in the absence of exact probabilities.

We found that the textbook authors preferred phrases from groups 3 and 4. These terms emphasize the existence of positive relationships. For the diagnostician, however, information about negative associations can be extremely valuable for ruling a disease out of the differential diagnosis, for example 'Signs of bone loss are never found in an osteosclerotic area of the jaw'. This applies especially to diseases with low prevalence, which might otherwise be overdiagnosed. We suggest that authors consider adding more of this important information to their texts.

In conclusion, authors and editors should be encouraged to provide readers with the exact probabilities that are so vital to the diagnostic process. To some extent this has already happened in the medical literature, where prevalences and likelihood ratios have been published for a number of diseases and their diagnostic tests¹⁶. In dentistry, by contrast, much of these essential data are still unavailable, and more research is required to increase the technical efficacy of oral radiology. Patient outcome in diagnostic imaging is, however, highly dependent on how the information changes the physician's subsequent thinking¹⁷. Only by careful control of all the intermediate steps that finally lead to a treatment plan can the efficacy of diagnostic imaging be increased¹⁸. Providing better textual information improves the conditions for developing the cognitive skills involved in decision making, and may thus enhance diagnostic accuracy and improve patient care.

References

1. Goaz PW, White SC. Cysts of the jaws. In: Goaz PW, White SC, eds. Oral Radiology: Principles and Interpretation, 2nd edn. St Louis: CV Mosby, 1987: 486.
2. Merz JF, Druzdzel MJ, Mazur DJ. Verbal expressions of probability in informed consent litigation. Med Decis Making 1991; 11: 273-81.
3. Budescu DV, Wallsten TS. Consistency in interpretation of probabilistic phrases. Organiz Behav Hum Decision Proc 1985; 36: 391-405.
4. Wallsten TS, Budescu DV, Rapoport A, Zwick R, Forsyth B. Measuring the vague meanings of probability terms. J Exp Psychol [Gen] 1986; 115: 348-65.
5. Lichtenstein S, Newman JR. Empirical scaling of common verbal phrases associated with numerical probabilities. Psychon Sci 1967; 9: 563-4.
6. Beyt-Marom R. How probable is probable? Numerical translation of verbal probability expressions. J Forecasting 1982; 1: 257-69.
7. Toogood JH. What do we mean by 'usually'? Lancet 1980; 1: 1094.
8. Robertson WO. Quantifying the meaning of words. JAMA 1983; 249: 2631-2.
9. Kenney RM. Between never and always. N Engl J Med 1981; 305: 1097-8.
10. Eekhof JAH, Mol SSL, Pielage JC. Is doorgaans vaker dan dikwijls; of hoe vaak is soms? Ned Tijdschr Geneeskd 1992; 136: 41-2.
11. Fink A, Kosecoff J, Chassin M, Brook RH. Consensus methods: characteristics and guidelines for use. Am J Public Health 1984; 74: 979-83.

12. Aitken RCB. Measurement of feelings using visual analogue scales. Proc R Soc Med 1969; 62: 989-93.
13. Tversky A, Kahneman D. Judgement under uncertainty: heuristics and biases. Science 1974; 185: 1124-31.
14. Miller GA. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 1956; 63: 81-97.
15. Buchanan BG, Shortliffe EH. Uncertainty and evidential support. In: Buchanan BG, Shortliffe EH, eds. Rule-based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Reading: Addison-Wesley, 1984: 217-9.

16. Sox HC, Blatt MA, Higgins MC, Marton KI. Appendix I. In: Sox HC, Blatt MA, Higgins MC, Marton KI, eds. Medical Decision Making. Boston: Butterworths, 1988: 337-71.
17. Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making 1991; 11: 88-94.
18. Mileman PA, Kievit J. Editorial - Achieving efficacy in oral radiology - out of the woods, and into decision trees? Dentomaxillofac Radiol 1992; 21: 115-7.

Address: Dr S.E. Stheeman, ACTA - Department of Oral Radiology, Louwesweg 1, 1066 EA Amsterdam, The Netherlands.
