"What does this mean?" How Web-based consumer health information fails to support information seeking in the pursuit of informed consent for screening test decisions

By Jacquelyn Burkell, [email protected], Assistant Professor
and D. Grant Campbell, [email protected], Assistant Professor

Faculty of Information and Media Studies
Middlesex College
University of Western Ontario
London, Ontario N6A 5B7
Canada

Purpose: The authors analyzed online consumer health information related to screening tests to see how well this information meets known standards for supporting the understanding of test uncertainty.

Setting/Subjects: MedlinePlus documents regarding maternal serum screening (6), prostate-specific antigen testing (6), and screening mammography (6) were analyzed.

Methodology: The content of the documents was analyzed.

Results: This study showed that most sites conscientiously report that tests are less than 100% accurate, but few provide important details about the level of uncertainty associated with test results. In particular, few resources give information about the predictive value of screening tests, and little mention is made of the fact that predictive value is influenced by the a priori likelihood of having the condition.

Discussion/Conclusion: These results suggest that online consumer health information does not adequately support decisions about medical screening. We suggest a potential solution to the problem: metadata harvesting coupled with optimized presentation techniques to format personalized information about screening tests. Using these techniques, the empowerment of personal choice in matters of health decisions could become the de facto standard.

INTRODUCTION

On April 22, 1995, a subscriber to the sci.med newsgroup sent a message explaining that his friend had tested positive in a Down syndrome screening test and had subsequently gone for amniocentesis. His friend was worried, and the subscriber was looking for feedback on the benefits of the screening test. Among the responses he received were:

- a discussion on the ethics of terminating a pregnancy if the presence of Down syndrome was verified;
- a suggestion that the friend contact support groups on Down syndrome to reduce her fear of the condition through direct contact with those living with it; and

- a flame message arguing that the world does not need another Down syndrome child to "drag the world down to the lowest common level."

Lost in the discussion was the fact that the cutoff rate for testing positive in this case was 1 in 270: his friend's fetus had a 1 in 270 chance of having Down syndrome. In broader terms, it meant that there was an overwhelming likelihood that the friend's screening test had produced a false positive result. False positive screening test results are perfectly possible, and even expected, at least in low-risk populations, and screening tests can (and do) return false negative results.


Unfortunately, individuals taking screening tests are rarely aware of this fact, and the consequences of their lack of knowledge can be tragic [1]. As the unfortunate sci.med subscriber learned, the anxiety surrounding screening tests can be intense and uninformed, and the demands on information provision regarding these tests are great indeed. This raises two important questions that are addressed in this paper. Are available resources sufficient to meet the information needs of consumers? If not, can methods be identified that will support the development and delivery of appropriate information?

BACKGROUND: INFORMATION, DECISIONS, AND SCREENING TESTS

Consumer health information should be designed primarily to empower consumers and to enable them to make informed decisions about their health and health care. Truly informed decision making requires that consumers understand complex aspects of health care options, including information about risk and benefit that is inherently difficult to grasp [2, 3]. Informed consent has been identified as a significant issue with respect to participating in screening programs designed to test large sectors of the population for relatively infrequent but serious health conditions, including genetic birth defects, breast cancer, prostate cancer, and HIV/AIDS [4, 5]. In general, it is challenging to develop information to support informed health care decision making because of the complexity of the information and the relatively low literacy levels of the general public to whom it must be presented [6]. This situation is exacerbated in the case of deciding about participating in a screening test, because understanding the relevant information requires a high level of quantitative literacy in addition to the prose literacy required to understand most other information relevant to medical decisions. According to the International Adult Literacy Survey, almost half of North Americans lack what are considered to be minimum quantitative literacy skills [7], suggesting that many will need some form of explanatory support to understand the information relevant to deciding to participate in a screening test.

Participation in screening programs is widely endorsed, particularly in the wake of recent tragedies such as the Canadian "tainted blood scandal," in which thousands of Canadians were infected with HIV and hepatitis as a result of receiving blood that had not been screened for these conditions [8]. In some jurisdictions, screening is even required for certain communicable diseases [9, 10]. Nonetheless, the decision to take a screening test should not be treated as a fait accompli, because participation in screening can have potentially devastating physical and emotional consequences [11, 12]. Instead, screening test participation should be the result of an informed decision, which requires, according to the General Medical Council <http://www.gmc-uk.org/standards/CONSENT.htm>, an understanding of the following:*

1. the purpose of the screening;
2. the likelihood of positive and negative results;
3. the possibility of false positive and false negative results;
4. the uncertainties and risks of screening;
5. medical, social, or financial implications; and
6. follow-up procedures after positive results.

* Providing decision makers with this information will also benefit health care practitioners by limiting the legal liability that may arise from false negative or false positive screening test results [13, 23].

Does the available consumer health information regarding screening tests satisfy these conditions? Consumers, health care providers, and others are beginning to suggest that the answer is no [13–20]. Sometimes the problem lies in the rhetoric of persuasion: some resources seem designed to promote the uptake of screening tests rather than to offer consumers truly informed choice [15]. Sometimes, the problem lies with the complexities of risk and probability information, which is subject to incorrect interpretation as a result of cognitive heuristics and biases [3]. Information about the efficacy of screening tests is difficult to understand, and different presentation formats for the same screening test information have been shown to influence decisions about screening test participation [11, 17, 20–22].

This study set out to explore this issue on two fronts: first, to determine the validity of the concerns about the quality of information about screening tests and, second, to study how future functionalities of an increasingly dynamic Web could be exploited to mitigate these concerns. In particular, the authors aimed to see how consumer health information currently available on the Web satisfies requirements 2, 3, and 4 of the General Medical Council's recommendation. How well does this information enable the consumer to retrieve and comprehend information regarding positive and negative results, false positives and negatives, and uncertainties of the screening procedure? If the information is insufficient, what would be required to make this information better?

Screening tests: sensitivity and specificity

Although screening tests are designed to be as accurate as possible, no test is perfect and there is always the chance that a specific test result is inaccurate. Consumers are naturally concerned about false negatives (the danger that a condition has somehow evaded the screening procedure), and false negative results raise liability concerns for health care practitioners [23]. In the case of widespread screening for conditions such as HIV or genetic birth defects, however, the potential for false positive results is also a matter of grave concern, particularly because screening tests, designed as they are to catch all possible positive cases, will inevitably produce some false positive results.



The costs of a false negative result are obvious, given that screening tests attempt to identify conditions that are serious, possibly treatable, and, in some cases, transmissible. The costs of false positive results, although less obvious, are no less significant. The evidence suggests that false positive screening results can have serious emotional, psychological, and even physical consequences that may persist after the test result is resolved [24–29].

The efficacy of screening tests is generally characterized by two measures:

- sensitivity: the proportion of actual cases that are correctly identified by the test
- specificity: the proportion of negative cases that are correctly identified by the test

Although these values vary greatly from test to test, for many well-established tests (e.g., the ELISA test for AIDS/HIV), sensitivity and specificity are both over 90%, indicating that the test correctly detects at least 90% of cases and correctly identifies the absence of the condition in at least 90% of cases. These figures, however, are somewhat misleading, because the meaning of a specific test result is actually far less certain than the sensitivity and specificity values might suggest.

Predictive values, a priori likelihood, and meaning of a screening test result

Sensitivity and specificity are important indicators of test performance: they reflect the ability of the test to correctly categorize someone whose status (condition present or absent) is already known by other means. But the friend of the sci.med subscriber did not know her correct status; rather, she was using the test to help determine that status and, therefore, needed to know the likelihood that her individual result was correct. This likelihood is termed "predictive value."

- "Negative predictive value" is the likelihood that a negative test result is correct.
- "Positive predictive value" is the likelihood that a positive test result is correct.

Informing decision makers about the predictive value of tests is a critical part of the consent process for screening test participation, especially because consumers often have inaccurate perceptions about the ability of tests to detect disease and have difficulty interpreting test results [30–32]. The predictive value of a test is related to sensitivity and specificity, but it is also closely tied to the a priori risk of having the condition, and this relationship is neither intuitive (even for experts) nor widely understood by the general public [33–35]. The a priori likelihood of the condition is reflected in the base rate of the condition in the tested population, and it has a large and often counterintuitive effect on predictive value.

Take as an illustrative case a hypothetical condition that occurs in 1% of the screened population, and a screening test that detects 90% of those cases where the condition is present (sensitivity of 90%) and correctly identifies 90% of those cases where the condition is absent (specificity of 90%). Given an incidence rate of 1%, 10 people among the 1,000 screened would be expected to have the condition, and 990 would be expected not to have it.

Figure 1. Contingency table for screening test results, assuming sensitivity 90%, specificity 90%, and base rate 1%

Test result    Condition present     Condition absent       Total
Positive       True positive: 9      False positive: 99       108
Negative       False negative: 1     True negative: 891       892
Total                         10                    990     1,000 cases

Positive predictive value = true positive/(true positive + false positive) = 9/(9 + 99) = 0.0833
Negative predictive value = true negative/(true negative + false negative) = 891/(1 + 891) = 0.9988
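A reader who wants to check the arithmetic behind Figure 1 can do so with a few lines of code. The Python sketch below is ours and purely illustrative (the function and variable names are not taken from any of the resources examined); it derives the same counts and predictive values from sensitivity, specificity, and base rate, and repeats the calculation for the 10% base rate discussed below.

```python
def screening_outcomes(sensitivity, specificity, base_rate, population=1_000):
    """Expected outcomes when `population` people are screened.

    Positive predictive value (PPV) = TP / (TP + FP)
    Negative predictive value (NPV) = TN / (TN + FN)
    """
    with_condition = population * base_rate
    without_condition = population - with_condition

    tp = with_condition * sensitivity            # cases detected
    fn = with_condition - tp                     # cases missed
    tn = without_condition * specificity         # non-cases correctly cleared
    fp = without_condition - tn                  # false alarms

    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return tp, fp, fn, tn, ppv, npv

# The 1% and 10% base-rate scenarios from the text, both with a 90%/90% test.
for base_rate in (0.01, 0.10):
    tp, fp, fn, tn, ppv, npv = screening_outcomes(0.90, 0.90, base_rate)
    print(f"base rate {base_rate:.0%}: TP={tp:.0f}, FP={fp:.0f}, FN={fn:.0f}, "
          f"TN={tn:.0f}, PPV={ppv:.4f}, NPV={npv:.4f}")
```

Run as written, the sketch reproduces the values in Figure 1 (a positive predictive value of 0.0833 at a 1% base rate) and shows the positive predictive value climbing to 0.50 when the base rate rises to 10%, while the negative predictive value stays above 0.98 in both cases.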

Given the 90% sensitivity of the test, 9 of the 10 people with the condition would test positive and 1 would incorrectly test negative. Given the 90% specificity of the test, 891 of the 990 people who do not have the condition would test negative and 99 would incorrectly test positive. Thus, a total of 108 positive test results among the 1,000 individuals tested (9 true positives and 99 false positives) would be anticipated, of which fewer than 10% are true positives (Figure 1). The implication is somewhat surprising: in this case, someone who receives a positive screening test result is still far more likely not to have the condition in question.

If the same screening test were applied in a population where the incidence rate was 10%, the meaning of a positive test result would change dramatically. One hundred people would be expected to have the condition (given the 10% incidence rate), and 900 would be expected to be free of it. Among the 100 with the condition, 90 would test positive and 10 would test negative (based on 90% test sensitivity). Among the 900 without the condition, 810 would test negative and 90 would test positive (based on 90% test specificity). Thus, out of 1,000 tests, a total of 180 positive results would be expected, of which 90 (or 50%) would be true positives. As the example illustrates, base rate has an important and often counterintuitive effect on predictive value.

The predictive value of test results is, therefore, important information both for someone considering whether to take a screening test and for someone interpreting a screening test result. In both cases, it is critical that the consumer understand that false positive and false negative test results are possible, and the consumer should understand the potential costs of inaccurate results. False negative results mean that the condition goes undetected and potentially untreated [23]. Positive screening test results require additional tests to determine whether the result is a false positive, and these additional tests can be costly, time consuming, painful, and harmful.


For example, the cost of examinations required to resolve positive screening mammography tests has been identified as a substantial problem in at least one jurisdiction [24], and the procedures generally suggested to resolve a positive maternal serum screen result (amniocentesis and chorionic villus sampling) carry a risk of spontaneous abortion on the order of 1%. Furthermore, a positive result causes significant emotional and psychological distress, and that distress (or its consequences) often endures, even if the initial result is shown to be a false positive [25–29]. In some cases, a false positive can have a permanent impact on social or organizational status. For example, in Canada all blood donations are screened for HIV exposure; if the result is positive, the donor is barred from future blood donation, even if subsequent testing shows the initial positive result to be a false positive.

In the context of decisions about participating in a screening test, fully informed consent requires an understanding of some relatively complex statistical concepts. The concept of predictive value is particularly difficult to understand, because experts and novices alike show a strong tendency to overestimate the predictive value of tests (particularly positive predictive value) as a result of a persistent cognitive bias termed "base rate neglect" [30, 34]. Because screening tests are often presented by medical professionals as inevitabilities rather than as matters of choice, individuals who take the onus of the choice on themselves may be driven to support these choices by finding information on health information Websites, without the benefit of face-to-face interaction with informed professionals.

Metadata, dynamic Web objects, and future Web functionalities

Obviously, it is difficult to present information about screening tests in a way that both provides the necessary details and avoids the negative impact of cognitive heuristics. Two formidable challenges arise. First, information must be specific to the person receiving it: if the wrong base rate is used, estimates of predictive value (and thus the meaning of test results) will be inaccurate and misleading. Second, the appropriate information about outcomes of screening programs must be presented in a format that makes predictive value easier to understand (see Burkell [2], Gigerenzer and Edwards [3], and Bell et al. [26] for descriptions of this optimal format).

Upcoming innovations in Web and information tools present some promising ways of meeting these challenges. These innovations can be classed into three broad categories:

- Data identification: the development of metadata encoding systems, particularly the World Wide Web Consortium's Resource Description Framework (RDF), which enables highly granular pieces of data to be tagged according to their semantic content and their anticipated use. In this sense, metadata ceases to be a synonym for cataloguing or any other document-representation scheme. Instead, relevant pieces of data are tagged as they occur in a wide variety of data sources, ranging from traditional documents to data retrieved from databases, often referred to as "the deep Web." Examples include not just articles and information pamphlets, but non-document material, such as demographic data gathered from a statistical database.

- Data gathering: the development of namespaces and ontologies that enable different metadata systems to work together, allowing search engines and harvesting programs to gather data from multiple and diverse sources.
- Data presentation: the design of stylesheets that enable the same dataset to be viewed in multiple formats depending on the information need and the user's preferences.

These new abilities are consistent with the dynamic nature of the Web, where it is becoming increasingly possible to mount Web objects that are regularly and even automatically updated and that pull data from multiple sources into on-the-fly views according to specific needs. Metadata systems that encode relevant pieces of medical information could be employed to provide dynamic consolidations of diverse data relevant to decision making, and styling instructions could present this information in a way that maximizes the chance of comprehension. If such a system is to be realized, however, more needs to be known about the ways in which existing information to support decision making is presented, to understand where it succeeds and where it fails, and to estimate what would theoretically be required of a metadata system that would enhance the utility and value of these sites. As a result, we felt it necessary to examine information seeking regarding important and common screening tests. Our study, therefore, was designed around the following research questions:

1. How well does the information currently available from prominent health information Websites support an informed decision to participate in a screening test? In particular, how well does this information support objectives 2, 3, and 4 of the General Medical Council's recommendation and clearly inform the user about the likelihood of positive and negative results, the possibility of false positive and false negative results, and the uncertainties and risks of screening?
2. How can metadata improve this information? In particular, what would be involved in creating a metadata set that supports the accurate and meaningful collection and display of information relating to sensitivity, specificity, predictive value, a priori likelihood, and test result meaning? And what can health libraries and librarians do to bring these beneficial changes about?

The research reported in this paper extends previous studies in three ways. First, this study examines information on three different screening tests, while other studies focus on only one test (most often screening mammography, although individual studies have focused on testing for other conditions). Second, this study examines the information on these screening tests available through a particularly important, widely referenced health information portal: MedlinePlus <www.medlineplus.com>.


This portal, a project of the National Library of Medicine, is endorsed by the American College of Physicians. In fact, in a number of US states, physicians are provided with "information prescription pads" that are preprinted with MedlinePlus uniform resource locators (URLs), so that physicians can direct consumers to this resource. Finally, the study seeks to establish, not the core elements of a metadata set for presenting this information, but the core conditions in which a metadata set would have to function to serve a meaningful and informative purpose.

METHOD

We chose to examine the information provided to health care consumers about three common screening tests:

- maternal serum screening for birth defects (including Down syndrome)
- prostate-specific antigen (PSA) testing for prostate cancer
- screening mammography for breast cancer

Although each of the tests shows high levels of sensitivity and specificity, none is perfect in these respects, and at least some false positive and false negative results are expected with each [36–38]. The results of these tests, therefore, have some associated uncertainty, and the principles of informed consent require that these uncertainties be understood by consumers who are deciding whether to take the tests.

We used MedlinePlus as our information source, because this Web-based consumer health information resource is widely recommended [39–41]. The MedlinePlus Website was examined for information about the three screening tests. MedlinePlus resources are organized in categories, and we selected one category to examine for each test. In each case, we first looked for a category for the test or general testing procedures for the condition (e.g., screening mammography, prenatal testing). If we did not find this category, we used the category for the condition itself (e.g., prostate cancer). Information resources in the identified categories were selected for examination on the basis of their titles as they appeared on the MedlinePlus page. Resources were selected if the titles indicated that they were:

1. general resources about testing for the condition (e.g., "Prenatal Tests" or "Medical Tests for Prostate Problems")
2. information resources for the specific screening test (e.g., "Mammograms")
3. answers to frequently asked questions about specific screening tests (e.g., "Questions and Answers about the Prostate-Specific Antigen (PSA) Test")
4. decision aids to help consumers make decisions about screening tests (e.g., "Prostate Cancer Screening: A Decision Guide" or "Prenatal Testing: What's Involved and Who Should Consider It")
5. materials designed to promote screening uptake (e.g., "Get a Mammogram: Do It For Yourself, Do It For Your Family")

Additional information for each screening test might have been available in other areas of the MedlinePlus site (e.g., information about screening mammography would no doubt also be available among the resources for breast cancer), and, in the selected category, resources other than those identified could include information about the test in question. We, however, took the perspective of a typical reasonably informed information seeker, who would almost certainly use the most specific information available (e.g., looking under "mammography" rather than "breast cancer" for mammography information), and selected resources on the basis of title, which presumably would indicate the information focus. Thus, we believe that the selected information resources represented those that consumers would most likely access from MedlinePlus, if they went to that information portal to learn about any of the screening tests we considered.

Each selected resource was examined for the following information:

1. Does it indicate that the test is less than 100% accurate?
2. Does it indicate that false positive results are possible?
3. Does it indicate that false negative results are possible?
4. Does it provide information about test sensitivity (the proportion of actual cases identified)?
5. Does it provide information about test specificity (the proportion of negative cases correctly rejected)?
6. Does it provide any information about positive predictive value (the proportion of positive results that are true positives, rather than false positives)?
7. Does it provide any information about negative predictive value (the proportion of negative results that are true negatives, rather than false negatives)?
8. Does it indicate that predictive value depends on a priori probability of disease?

RESULTS

Prenatal testing for Down syndrome: maternal serum screening

MedlinePlus has a specific subject heading for prenatal testing, with a total of twenty links to resources. Of these, we identified six resources that addressed the maternal serum screening test, under the headings "General/Overviews" and "Specific Conditions/Aspects." Two of the resources (M4 and M6) came from the same information provider but were intended for different audiences: one (M6) was written for consumers, while the other (M4), though cited on this consumer health resource list, was intended for health professionals. The remaining four resources were from different sources, and each appeared to have consumers as the intended audience.

Table 1 presents the analysis of the identified resources. Overall, the resources provide little of the information required for informed decisions about maternal serum screening. Each resource notes that the test is not perfectly accurate.


Table 1. Maternal serum screening resources

Does the resource . . .                                                          M1  M2  M3  M4  M5  M6
1. Indicate that the test is less than 100% accurate?                            Y   Y   Y   Y   Y   Y
2. Indicate that false positive results are possible?                            Y   Y   Y   Y   Y   Y
3. Indicate that false negative results are possible?                            N   N   N   Y   Y   N
4. Provide information about test sensitivity?                                   N   N   N   Y   Y   N
5. Provide information about test specificity?                                   N   N   N   N   N   N
6. Provide information about positive predictive value?                          N   Y   N   Y   N   Y
7. Provide information about negative predictive value?                          N   N   N   N   Y   N
8. Indicate that predictive value depends on a priori probability of disease?    N   N   N   N   N   N

Information resources:
M1. Prenatal Testing: What's Involved and Who Should Consider It <http://www.mayoclinic.com/invoke.cfm?id=PR00014>
M2. Prenatal Tests <http://kidshealth.org/parent/system/medical/prenatal_tests.html>
M3. Routine Tests during Pregnancy <http://www.medem.com/MedLB/article_detaillb.cfm?article_ID=ZZZ84JKXODC&sub_cat=2005>
M4. Maternal Blood Screening (for professionals and researchers) <http://www.marchofdimes.com/professionals/681_1166.asp>
M5. Prenatal Genetic Testing <http://www.dnapolicy.org/genetics/prenatal.jhtml>
M6. Triple Screen <http://www.marchofdimes.com/pnhec/159_522.asp>

All of the resources indicate that false positive results can occur, but only two (M4 and M5) identify that false negative results are also possible. However, some of this information is implied rather than explicitly stated. M2, for instance, does not actually use the term "false positive" and instead only implies that some positive results might be false: "screening tests only reveal the possibility of a problem existing" (italics added). Only two resources (M4 and M5) indicate the sensitivity of the test, and none discusses test specificity. Half of the resources (M2, M4, and M6) provide information about the positive predictive value of the test, and one (M5) provides information about negative predictive value. Even more important, none of the resources indicates that the predictive value of the test depends on a priori risk of the condition.

Prostate-specific antigen (PSA) test for prostate cancer

MedlinePlus does not have a specific page for prostate cancer screening; instead, resources on this test were found on the prostate cancer page. A total of seventy-one resources were listed on the page, of which six appeared likely to address PSA testing or medical testing for prostate cancer. All were written for lay audiences. One relevant resource was identified under the heading "Overviews," three under "Diagnosis/Symptoms," and two under "Prevention/Screening."

Table 2 presents an analysis of the PSA testing information resources. All of these resources indicate that the test is less than 100% accurate, half discuss sensitivity (P2, P4, and P5), and none discusses specificity. Most resources discuss the possibility of false positive (all) and false negative (P2, P3, P4, and P6) results, and, while half (P2, P4, and P5) provide a numeric estimate of positive predictive value, none provides an estimate of negative predictive value. Some resources offer information about the base rate for prostate cancer among various groups. P3, for instance, mentions "risk factors" such as age: "The risk of developing prostate cancer increases as a man gets older." These data, however, are used primarily to assess whether the user is a member of a higher-risk group, and only two of the information resources (P2 and P4) explicitly identify the impact of base rate on the predictive value of the test.

Table 2. Prostate-specific antigen (PSA) testing resources

Does the resource . . .                                                          P1  P2  P3  P4  P5  P6
1. Indicate that the test is less than 100% accurate?                            Y   Y   Y   Y   Y   Y
2. Indicate that false positive results are possible?                            Y   Y   Y   Y   Y   Y
3. Indicate that false negative results are possible?                            N   Y   Y   Y   N   Y
4. Provide information about test sensitivity?                                   N   Y   N   Y   Y   N
5. Provide information about test specificity?                                   N   N   N   N   N   N
6. Provide information about positive predictive value?                          N   Y   N   Y   Y   N
7. Provide information about negative predictive value?                          N   N   N   N   N   N
8. Indicate that predictive value depends on a priori probability of disease?    N   Y   N   Y   N   N

Information resources:
P1. Medical Tests for Prostate Problems <http://kidney.niddk.nih.gov/kudiseases/pubs/prostatetests/>
P2. Questions and Answers Regarding the Prostate-Specific Antigen (PSA) Test <http://cis.nci.nih.gov/fact/5_29.htm>
P3. Prostate Cancer (PDQ®): Screening <http://www.cancer.gov/cancerinfo/pdq/screening/prostate/patient/>
P4. Prostate Cancer Screening: A Decision Guide <http://www.cdc.gov/cancer/prostate/decisionguide/>
P5. Can Prostate Cancer Be Found Early? <http://www.cancer.org/docroot/CRI/content/CRI_2_4_3X_Can_prostate_cancer_be_found_early_36.asp?sitearea=>
P6. Prostate Cancer: The Public Health Perspective <http://www.cdc.gov/cancer/prostate/prostate.htm>




Table 3. Screening mammography resources

Does the resource . . .                                                          B1  B2  B3  B4  B5  B6
1. Indicate that the test is less than 100% accurate?                            N   Y   Y   Y   Y   Y
2. Indicate that false positive results are possible?                            N   Y   Y   Y   Y   Y
3. Indicate that false negative results are possible?                            N   Y   Y   N   N   Y
4. Provide information about test sensitivity?                                   N   Y   N   N   N   N
5. Provide information about test specificity?                                   N   N   N   N   N   Y
6. Provide information about positive predictive value?                          N   Y   N   N   Y   Y
7. Provide information about negative predictive value?                          N   N   Y   N   N   N
8. Indicate that predictive value depends on a priori probability of disease?    N   N   N   N   N   N

Information resources:
B1. Get a Mammogram: Do It for Yourself, Do It for Your Family <http://cancer.gov/images/Documents/68432989-7c99-4e56-8352-c813d5ef3422/english.pdf>
B2. Screening Mammograms <http://cis.nci.nih.gov/fact/5_28.htm>
B3. Mammogram <http://www.nlm.nih.gov/medlineplus/tutorials/mammogram/rd139101.pdf>
B4. Mammograms and Breast Cancer <http://www.fda.gov/opacom/lowlit/mammo.html>
B5. Mammography <http://www.radiologyinfo.org/content/mammogram.htm>
B6. Mammography and Other Breast Imaging Procedures <http://www.cancer.org/docroot/cri/content/cri_2_6x_mammography_and_other_breast_imaging_procedures_5.asp?sitearea=cri>

Screening mammography


MedlinePlus has a specific category for mammography, and all resources on screening mammography were identified from this page. Of the thirty-two resources on the page, six were identified that provided a general overview of screening mammography (all others addressed more specific aspects such as the influence of breast implants on mammography), and all fell under the categories "from the National Institutes of Health" and "General/Overviews." Two of the resources (B1 and B2) were provided by the same organization, and all appeared to be written for lay audiences. One resource (B1) was obviously designed to promote participation in screening mammography; of all the resources examined for the three screening tests, this was the only one that explicitly promoted the test.

Table 3 presents the full results for all resources. Most of the six identified resources indicate some degree of uncertainty associated with the results of screening mammography. The sole exception to this rule (B1) is also the single resource explicitly promoting mammography, and this resource is the only one that provides none of the desired information about test efficacy. The remaining discussion addresses only those five resources (B2–B6) that include at least some information about the indeterminacy of test results. Among these five remaining resources, all indicate that the test is less than 100% accurate. Only one (B2) addresses the question of sensitivity (providing a numeric estimate for the proportion of cancers detected by mammography), and none discusses test specificity. Each indicates that at least some results are false positives, and three (B2, B5, and B6) provide some indication of the proportion of positive results that are actually false positives. Three of the five resources (B2, B3, and B6) indicate that false negative results are possible, although only one (B3) provides numeric estimates for the proportion of negative results that are incorrect. None of the resources discusses the effect of a priori risk on the meaning of test results.

DISCUSSION

The results of this study are consistent with previous research examining consumer information about screening tests delivered via printed text [42–44], online [13], and in person [45], extending those previous results by examining online information about a variety of screening tests. The conclusion is simple: the information available for consumers regarding screening tests is inadequate for informed decision making and inadequate as a basis for interpreting screening test results. This result is somewhat surprising, given the widespread recognition of the problem from the perspective of both basic research in cognitive psychology [3] and applied work in health sciences [46]. Although what information consumers need to make these decisions and how that information should be presented is well known, these prescriptions are not being followed in real information provision.

When evaluating this inadequacy, however, it is crucial to make an important distinction between information that is incorrect and information that is incomplete. The information studied on MedlinePlus appeared to be substantially reliable, correct, and created with the best of intentions. The portal includes information on the three screening tests, with six resources identified for each test. Most of the identified information resources are developed by different information providers, and almost all are directed toward lay audiences. In all but one case (a single resource for screening mammography), the goal of the resources appears to be to offer information to support an informed decision, rather than simply to promote uptake of the screening test. It is important to note that the single information resource that explicitly promotes a test is also the only resource that does not indicate that the test is less than 100% accurate. In every other case, the information resource explicitly states that screening test results are not always correct.




Figure 2. Natural frequency presentation format

Of 1,000 pregnant women who are 40 years of age, 10 will be carrying children with Down syndrome. If all 1,000 women were tested, 9 of the women with Down syndrome babies would test positive for the condition, and one would test negative. Of the 990 women whose babies do not have Down syndrome, 394 would test positive and 596 would test negative.

An information service that tags the granular elements of these resources with metadata for harvesting could be reasonably sure that the information, such as it is, is accurate and useful. For fully informed consent, however, users clearly need more. It is not enough to warn consumers that tests are less than perfectly accurate. Given the significant physical and emotional consequences of the test results, consumers must also understand the kinds of errors they can expect, if they are to minimize these traumatic consequences. And here, the resources are not satisfactory. The most basic question is whether the information resources indicate that false positive and/or false negative results are possible. In most cases, the resources do provide this information, although it is not always explicitly presented. Thus, although some resources explicitly state that false positives (or false alarms) and false negatives (or misses) can occur, some rely on the consumer to infer this from a relatively oblique presentation.

Armed with the knowledge that screening tests are less than 100% accurate and that both false negative and false positive results are possible, consumers are in a better position to make informed decisions about participating in a screening test or to understand the meaning of screening test results that have been reported to them. Nonetheless, fully informed consent requires a greater level of knowledge and, thus, a greater level of informing. In particular, fully informed consumers will also understand the predictive value (both negative and positive) of screening test results, values that depend critically on test sensitivity, specificity, and the base rate of the condition in the population to which they belong.

What do these Web resources tell consumers with respect to these aspects of test performance? In general, less than half of the resources identify test sensitivity, and only one identifies test specificity. Arguably, however, sensitivity and specificity are of less interest to those taking the test than are positive and negative predictive values, that is, the degree to which the decision maker can trust the results of the test. These predictive values depend critically on the a priori probability of disease: screening tests that produce extremely accurate results when applied to high-risk populations have a much lower likelihood of being correct when applied to low-risk groups. Unfortunately, the information resources do not provide consumers with explicit details about this relationship, and they do not even signal the issue.

Not one of the resources provides explicit numeric information about the impact of a priori risk on the predictive value of test results, and only a small minority of resources even indicate that this could be an issue in the interpretation of the results of screening tests. This should be a significant concern to consumers, information professionals, and health care providers. Information seekers who are trying to make important decisions, or who are dealing with the anxiety and despondency engendered by positive screening test results presented in a cryptic manner, will be unable to clarify their conditions and take intelligent action based on the information they receive from Web-based resources.

THE ROLE OF METADATA

Given that the information currently on these health information sites is inadequate, the questions remain: to what extent can metadata be used to improve the sites, and how can the medical library community bring about this improvement?

Data identification

Because the relevant data on sensitivity, specificity, and population base rates tend to appear in different documents from different sources, librarians in the areas of science, health, and government information are often in the best position to find sources of this information: in reliable journals, in government census data, and in the enormous wealth of consumer health information that is now available.

Data harvesting

Future metadata harvesting systems, building on such initiatives as the Open Archives Initiative's Protocol for Metadata Harvesting, will gather that data from multiple sources. This data gathering is critical to developing and presenting information about screening tests. Such data could even be linked to a patient's medical profile and filtered to provide information about the predictive value of screening tests according to such factors as the patient's age, gender, medical status, and current medications. In such a situation, extreme care will be needed both to protect the patient's rights to privacy and full access and to ensure that data are properly disambiguated, so that the appropriate filters can be added. With their long traditions of supporting access and practicing authority control, libraries could play a part in developing such systems.

Data presentation

Finally, semantically meaningful document and data structures could be filled with this harvested metadata and then styled in ways that support understanding and decision making. Research in both cognitive psychology [2, 47] and medical decision making [46] has identified presentation formats that promote optimal understanding of the difficult concept of predictive value. Preferred information formats include contingency tables of the sort in Figure 1 or verbal presentations of what are termed "natural frequencies" [2, 47] (Figure 2).
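To make the presentation step concrete, the sketch below is our own illustration; the field names, the example values, and the wording template are assumptions rather than an existing metadata vocabulary or system. It shows how a handful of harvested values (sensitivity, specificity, and a population-specific base rate) could be styled into a natural-frequency statement of the kind shown in Figure 2.

```python
# Hypothetical harvested values for one test and one population. In the system
# proposed here, these would arrive as tagged metadata gathered from several
# sources rather than being hard-coded.
harvested = {
    "test": "screening test",
    "population": "people in a low-risk group",
    "sensitivity": 0.90,   # proportion of affected people the test detects
    "specificity": 0.90,   # proportion of unaffected people the test clears
    "base_rate": 0.01,     # prevalence of the condition in this population
}

def natural_frequency_view(meta, cohort=1_000):
    """Style harvested test metadata as a natural-frequency statement."""
    affected = round(cohort * meta["base_rate"])
    unaffected = cohort - affected
    true_pos = round(affected * meta["sensitivity"])
    false_neg = affected - true_pos
    false_pos = round(unaffected * (1 - meta["specificity"]))
    true_neg = unaffected - false_pos
    return (
        f"Of {cohort:,} {meta['population']}, {affected} will have the condition. "
        f"If all {cohort:,} took the {meta['test']}, {true_pos} of the {affected} "
        f"with the condition would test positive and {false_neg} would test "
        f"negative. Of the {unaffected} without the condition, {false_pos} would "
        f"test positive and {true_neg} would test negative."
    )

print(natural_frequency_view(harvested))
```

A different stylesheet applied to the same four values could just as easily produce the contingency-table view of Figure 1, which is the sense in which the presentation layer is separable from the harvested data.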


The widespread adoption of these standard views could make them common across many different tests and many different resources, creating familiarity and enhancing the likelihood that users will interpret them correctly. Above all, a standardized, dynamic metadata structure that integrates data and performs the necessary calculations would achieve the important task of relating the base rate data to the test's sensitivity and specificity, thereby showing more clearly how the a priori likelihood of having the condition alters the predictive value of the tests. These standardized sets of structures, tagging sets, and harvesting mechanisms, therefore, could significantly alleviate the difficulty of providing the necessary information for making informed decisions. Two important challenges, however, make the creation and adoption of standard metadata systems far more complicated than it might appear at first.

1. The "human" touch

All of these sites are designed and created by organizations committed to providing information in a direct and personal way: anticipating the end user's questions and fears and using a rhetoric of respect, caring, and individual address to reach a user who may well be confused, overwhelmed, and frightened. The National Cancer Institute frames its information on screening mammography and PSA testing as a series of questions, each followed by a clearly written answer designed to inform without alarming. Clearly, any system of dynamically created and updated medical data will need to retain that commitment to the rhetoric of caring, respectful, calm, and unhurried communication.

2. "Diagnosis" versus "screening"

The distinction between screening (looking for evidence of a condition when no symptoms are present) and diagnosis (attempting to find a reason for specific symptoms) is absolutely critical to the predictive value of a test. Some tests, such as mammograms, are used both for diagnosis and screening; others, such as maternal serum screening tests, are used only for screening purposes. When a test is used for screening, there is no prior evidence that the patient has the condition, and the a priori likelihood that the condition is present is the base rate in the population of which the patient is a member. For example, for a woman with no symptoms of breast cancer, the a priori likelihood of her having it is determined by the base rate of breast cancer for women who are comparable in terms of relevant risk factors such as age and family history. When, however, a test is used for diagnosis, the patient is showing symptoms of the condition, and the appropriate a priori likelihood is the base rate in the population of those who exhibit the symptom.

For a woman with a lump on her breast, therefore, the a priori likelihood is determined by the rate of breast cancer for women in her age range who have that symptom of breast cancer. This rate is understandably much higher. Because the positive predictive value of a test is closely tied to base rate and increases as base rate increases, the positive predictive value of a test for diagnosis is considerably higher than the positive predictive value of the same test for screening purposes.†

† This distinction is frequently mentioned in the health information resources but is not always represented in the layout of general resource pages. MedlinePlus, for instance, presents most of its information about HIV tests in the section labeled "Diagnosis/Symptoms." The section labeled "Prevention/Screening" lists only resources that, from their titles, appear to be concerned solely with prevention.

This distinction is not always clear in the resources we examined. While many resources mention incidence of conditions in different populations (such as age groups or ethnic groups), very few of the information resources explicitly mentioned how strongly these base rates affect predictive value. Any metadata system that generates contingency tables for screening tests must be designed to establish, from the outset, whether the test is for screening or for diagnosis. Otherwise, it runs the risk of using incorrect estimates for the a priori likelihood of the condition, thus potentially giving entirely incorrect measures of predictive value: dramatically underestimating the likelihood of a false positive in screening instances or dangerously overestimating the likelihood in diagnostic instances. Telling a 25-year-old woman who shows concrete symptoms of breast cancer that her chances of having it are 1 in 2,000 (the general base rate for women in their 20s) would be a disastrous act of misinformation. Calculating the predictive value of her screening mammography results based on this inaccurate a priori risk would lead her to drastically underestimate the likelihood of a true positive test result.

CONCLUSIONS

Our study has confirmed to a great extent previous research suggesting that online health information is currently inadequate to support informed decision making. Looking at a set of specific criteria appropriate for information about screening tests for three separate conditions, we conclude that the emerging possibilities of metadata tagging at the granular data level show significant potential for raising the standards of information presentation for risk assessment and the enlightened act of personal choice. Such metadata harvesting and presentation techniques, already being pioneered in hospital portals, could, in the future, lead to the standardized presentation of these data in such a way that the empowerment of personal choice in health decisions becomes the de facto standard.

Nonetheless, we are equally convinced that this attractive scenario will not come about simply through the naive imposition of metadata schemes that fail to take into account the real complexities of health information communication.



Emerging metadata systems will need to negotiate the commitment of the health community to communicating directly, personally, and sympathetically and to maintaining public health and safety at times when the individual's interests or behavior may threaten that safety. Finally, the crucial difference between screening and diagnosis shows that estimations of likelihood are heavily influenced by fine and subtle distinctions that are not always clearly demarcated in the information as it stands. To avoid disastrous instances of misinformation, the imposition of metadata systems must be accompanied by the careful and conscientious improvement of the data.

ACKNOWLEDGMENTS

Funding for this project was provided through a University of Western Ontario Internal Social Sciences and Humanities Research Council grant awarded jointly to the first two authors. Considerable research assistance was provided by Cindy Peltier and Robin Hepher, both master's of library and information science students at the University of Western Ontario.

REFERENCES

1. STINE GJ. Testing for Human Immunodeficiency Virus. AIDS Update 1994–1995. New Jersey, NJ: Prentice Hall, 1995: 231.
2. BURKELL J. What are the chances? evaluating risk and benefit information in consumer health materials. J Med Libr Assoc 2004 Apr;92(2):200–8.
3. GIGERENZER G, EDWARDS A. Simple tools for understanding risk: from innumeracy to insight. BMJ 2003 Sep 27;327(7417):741–4.
4. AUSTOKER J. Gaining informed consent for screening. BMJ 1999 Sep 18;319(7212):722–3.
5. GRIBBLE JN. Informed consent documents for BRCA1 and BRCA2 screening: how large is the readability gap? Patient Educ Couns 1999 Nov;38(3):175–83.
6. RUDD RE, MOEYKENS BA, COLTON TC. Health and literacy: a review of medical and public health literature. In: Comings J, Garner B, Smith C, eds. Annual review of adult learning and literacy. v.1. New York, NY: Jossey-Bass, 1999: 158–99.
7. Literacy in the information age: final report of the international adult literacy survey. Education & Skills 2000 (9):1–204.
8. Charges laid in tainted blood scandal. [Web document]. CBC News. [cited 22 Mar 2005]. <http://www.cbc.ca/stories/2002/11/20/blood_charges021120>.
9. GARMAISE D. Mandatory HIV testing used to bar potential immigrants. Can HIV AIDS Policy Law Rev 2003 Apr;8(1):20–1.
10. SNELL JG. Mandatory HIV testing and prostitution: the world's oldest profession and the world's newest deadly disease. Hastings Law J 1994 Aug;45(6):1565–92.
11. RAFFLE AE. Information about screening: is it to achieve high uptake or to ensure informed choice? Health Expect 2001 Jun;4(2):92–8.
12. WELCH HG. Informed choice in cancer screening. JAMA 2001 Jun 6;285(21):2776–8.
13. THORNTON H, EDWARDS A, BAUM M. Women need better information about routine mammography. BMJ 2003 Jul 12;327(7406):101–3.
14. CARROLL JC, BROWN JB, REID AJ, PUGH P. Women's experiences of maternal serum screening. Can Fam Physician 2000 Mar;46:614–20.


15. JøRGENSEN KJ, GøTZSCHE PC. Presentation on Websites of possible benefits and harms from screening for breast cancer: cross sectional study. BMJ 2004 Jan 17;328(7432):148–51.
16. LIAMPUTTONG P, HALLIDAY JL, WARREN R, WATSON F, BELL RJ. Why do women decline prenatal screening and diagnosis? Australian women's perspective. Women and Health 2003;37(2):89–108.
17. SANTER M, WYKE S, WARNER P. Women's experiences of chlamydia screening: qualitative interviews with women in primary care. Eur J Gen Pract 2003 Jun;9(2):56–61.
18. SEARLE J. Routine antenatal screening: not a case of informed choice. Aust N Z J Public Health 1997 Jun;21(3):268–74.
19. SEROR V, COSTET N, AYME S. Participation in maternal marker screening for Down syndrome: contribution of the information delivered to the decision-making process. Community Genet 2001;4(3):158–72.
20. WARD J. Population-based mammographic screening: does "informed choice" require any less than full disclosure to individuals of benefits, harms, limitations and consequences? Aust N Z J Public Health 1999 Jun;23(3):301–4.
21. EDWARDS A, ELWYN G, COVEY J, MATTHEWS E, PILL R. Presenting risk information: a review of the effects of "framing" and other manipulations on patient outcomes. J Health Commun 2001 Jan–Mar;6(1):61–82.
22. SARFATI D, HOWDEN-CHAPMAN P, WOODWARD A, SALMOND C. Does the frame affect the picture? a study into how attitudes to screening for cancer are affected by the way benefits are expressed. J Med Screen 1998;5(3):137–40.
23. STOLLMAN N. We're good . . . but we're not perfect: the limitations of diagnostic tests. Managing Risk 2001 Winter;8(2):1–4.
24. LIDBRINK E, ELFVING J, FRISELL J, JONSSON E. Neglected aspects of false positive findings of mammography in breast cancer screening: analysis of false positive cases from the Stockholm trial. BMJ 1996 Feb 3;312(7026):273–6.
25. ANDRYKOWSKI MA, CARPENTER JS, STUDTS JL, CORDOVA MJ, CUNNINGHAM LL, BEACHAM A, SLOAN D, KENADY D, MCGRATH P. Psychological impact of benign breast biopsy: a longitudinal, comparative study. Health Psych 2002 Sep;21(5):485–94.
26. BELL S, PORTER M, KITCHENER H, FRASER C, FISHER P, MANN E. Psychological response to cervical screening. Prev Med 1995 Nov;24(6):610–6.
27. LERMAN C, TROCK B, RIMER BK, JEPSON C, BRODY D, BOYCE A. Psychological side effects of breast cancer screening. Health Psych 1991;10(4):259–67.
28. TOWLER B, IRWIG L, GLASZIOU P, KEWENTER J, WELLER D, SILAGY C. A systematic review of the effects of screening for colorectal cancer using faecal occult blood test, hemoccult. BMJ 1998 Aug 29;317(7158):559–65.
29. BRETT J, AUSTOKER J, ONG G. Do women who undergo further investigation for breast screening suffer adverse psychological consequences? a multicentre follow-up study comparing different breast screening result groups five months after their last screening appointment. J Pub Health Med 1998 Dec;20(4):396–403.
30. WARDLE J, POPE R. The psychological costs of screening for cancer. J Psychosom Res 1992 Oct;36(7):609–24.
31. BARRATT A, COCKBURN J, FURNIVAL C, MCBRIDE A, MALLON L. Perceived sensitivity of mammographic screening: women's views on test accuracy and financial compensation for missed cancers. J Epidemiol Community Health 1999 Nov;53(11):716–20.
32. DAVEY HM, LIM J, BUTOW PN, BARRATT AL, HOUSSAMI N, HIGGINSON R. Consumer information materials for diagnostic breast tests: women's view on information and their understanding of test results. Health Expect 2003 Dec;6(4):298–311.



33. MICHIE S, DORMANDY E, MARTEAU TM. Informed choice: understanding knowledge in the context of screening uptake. Patient Educ Counseling 2003 Jul;50(3):247–53.
34. CASE DA, FANTINO E, GOODIE AS. Base-rate training without case cues reduces base-rate neglect. Psychon Bull Rev 1999 Jun;6(2):319–27.
35. LABARGE AS, MCCAFFREY RJ, BROWN TA. Neuropsychologists' abilities to determine the predictive value of diagnostic tests. Arch Clin Neuropsychol 2003 Mar;18(2):165–75.
36. CAPLAN A, KRATZ A. Prostate-specific antigen and the early diagnosis of prostate cancer. Am J Clin Pathol 2002 Jun;117(Suppl):S104–S108.
37. GILL KS, YANKASKAS BC. Screening mammography performance and cancer detection among black women and white women in community practice. Cancer 2004 Jan 1;100(1):139–48.
38. WAPNER R, THOM E, SIMPSON JL, PERGAMENT E, SILVER R, FILKINS K, PLATT L, MAHONEY M, JOHNSON A, HOGGE WA, WILSON RD, MOHIDE P, HERSHEY D, KRANTZ D, ZACHARY J, SNIJDERS R, GREENE N, SABBAGHA R, MACGREGOR S, HILL L, GAGNON A, HALLAHAN T, JACKSON L, First Trimester Maternal Serum Biochemistry and Fetal Nuchal Translucency Screening (BUN Study Group). First-trimester screening for trisomies 21 and 18. N Engl J Med 2003 Oct 9;349(15):1405–13.


39. FITZPATRICK RB, HENDLER G. What every medical librarian should know about MedlinePlus. Med Ref Serv Q 1999 Winter;18(4):11–7.
40. LACROIX EM, MEHNERT R. The US National Library of Medicine in the 21st century: expanding collections, nontraditional formats, new audiences. Health Info Libr J 2002 Sep;19(3):126–32.
41. MILLER N, LACROIX EM, BACKUS JEB. MedlinePlus: building and maintaining the National Library of Medicine's consumer health Web service. Bull Med Libr Assoc 2000 Jan;88(1):11–7.
42. CROFT E, BARRATT A, BUTOW P. Information about tests for breast cancer: what are we telling people? J Fam Pract 2002 Oct;51(10):858–60.
43. HOWARD K, SALKELD G. Home bowel cancer tests and informed choice: is current information sufficient? Aust N Z J Public Health 2003 Oct;27(5):513–6.
44. SLAYTOR EK, WARD JE. How risks of breast cancer and benefits of screening are communicated to women: an analysis of 58 pamphlets. BMJ 1998 Jul 25;317(7153):263–4.
45. GIGERENZER G, HOFFRAGE U, EBERT A. AIDS counseling for low-risk clients. AIDS Care 1998 Apr;10(2):197–211.
46. GOYDER E, BARRATT A, IRWIG LM. Telling people about screening programmes and screening test results: how can we do it better? J Med Screen 2000;7(3):123–6.
47. GIGERENZER G, HOFFRAGE U. How to improve Bayesian reasoning without instruction: frequency formats. Psych Rev 1995 Oct;102(4):684–704.

Received July 2004; accepted January 2005
