A Clustering Approach to Segmenting Users of ...


© Schattauer 2011

Original Articles

A Clustering Approach to Segmenting Users of Internet-based Risk Calculators

C. A. Harle1; J. S. Downs2; R. Padman3

1Department of Health Services Research, Management and Policy, University of Florida, Gainesville, Florida, USA; 2Department of Social and Decision Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; 3H. John Heinz III College, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

Keywords Consumer health information, cluster analysis, risk assessment, risk communication, Internet

Summary

Background: Risk calculators are widely available Internet applications that deliver quantitative health risk estimates to consumers. Although these tools are known to have varying effects on risk perceptions, little is known about who will be more likely to accept objective risk estimates.

Objective: To identify clusters of online health consumers that help explain variation in individual improvement in risk perceptions from web-based quantitative disease risk information.

Methods: A secondary analysis was performed on data collected in a field experiment that measured people’s pre-diabetes risk perceptions before and after visiting a realistic health promotion website that provided quantitative risk information. K-means clustering was performed on numerous candidate variable sets, and the different segmentations were evaluated based on between-cluster variation in risk perception improvement.

Results: Variation in responses to risk information was best explained by clustering on pre-intervention absolute pre-diabetes risk perceptions and an objective estimate of personal risk. Members of a high-risk overestimater cluster showed large improvements in their risk perceptions, but clusters of both moderate-risk and high-risk underestimaters were much more muted in improving their optimistically biased perceptions.

Conclusions: Cluster analysis provided a unique approach for segmenting health consumers and predicting their acceptance of quantitative disease risk information. These clusters suggest that health consumers were very responsive to good news, but tended not to incorporate bad news into their self-perceptions much. These findings help to quantify variation among online health consumers and may inform the targeted marketing of and improvements to risk communication tools on the Internet.

Correspondence to: Christopher A. Harle, Department of Health Services Research, Management and Policy, College of Public Health and Health Professions, University of Florida, P.O. Box 100195, Gainesville, FL 32610-0195, USA. E-mail: [email protected]

Methods Inf Med 2011; 50: 244–252 doi: 10.3414/ME09-01-0080 received: September 4, 2009 accepted: February 26, 2010 prepublished: March 19, 2010

1. Introduction

Websites that provide statistics on disease and mortality rates in order to promote risk awareness and preventive health behaviors are increasingly available and often accessed by health consumers. The Internet provides a natural medium for delivering risk information via dynamic text, numbers and graphics to laypeople interested in becoming more informed about their health. Quantitative information provided

in the form of probabilities, proportions and graphs is appealing because numbers are precise and objective. In addition, graphical displays allow for more vivid and automatic visual processing of information and pattern recognition [1].

Web-based applications called risk calculators are published by various organizations including academic institutions, charitable groups, and for-profit companies. These websites use statistical models to compute individualized disease risk estimates, and are popular tools for information consumers. For example, the website Your Disease Risk (www.yourdiseaserisk.wustl.edu) receives approximately 2000 visitors daily [2] and contains personal risk assessments for the likelihood of diabetes, stroke, and heart disease. The American Diabetes Association (ADA) (www.diabetes.org/phd) and National Cancer Institute (NCI) (www.cancer.gov/bcrisktool) provide similar applications. Although these tools are easy to find and often used, there is no consensus about the optimal way to format personal risk assessments [2, 3], nor is there sufficient evidence about their efficacy as health education and promotion tools. These shortcomings reflect the lack of definitive best practices for communicating quantitative risk information in general [4].

The intent of this paper is to report an assessment of meaningful variability among people who seek out health risk information online by segmenting these users in ways that can predict whose health risk perceptions will become more accurate after visiting the websites. Previous work has compared different approaches to formatting and displaying health risk magnitudes, identifying some factors that may improve medical risk communication generally [5–12]. Less is known about variation

Methods Inf Med 3/2011

Downloaded from www.methods-online.com on 2016-01-03 | IP: 54.152.109.166 For personal or educational use only. No other uses without permission. All rights reserved.

C. A. Harle et al.: A Clustering Approach to Segmenting Users of Internet-Based Risk Calculators

among the people who use online health interventions in the course of their daily lives and which individual characteristics predispose them toward improvements in risk perceptions. In some studies, a majority of participants failed to adjust their risk perceptions at all in the face of objective health risk estimates [13], or improved their risk perceptions somewhat but still fell short of the objective estimate [11], possibly due to anchoring and adjustment [14]. For example, most women overestimate their risk of breast cancer [15–18], a tendency that is allayed but not eliminated by counseling [19]. Those who start out underestimating their breast cancer risk also tend to be more accurate after counseling, but still optimistically biased [20].

The variability observed across and within prior studies of risk communication raises the question of whether meaningful clusters or “types” of health-risk-information seekers can be identified. Understanding these groups better may provide insight about the relative efficacy of technology-mediated, consumer-oriented risk communication services. Identifying predictors of who will be more likely to accept quantitative risk information, and assessing how these predictors are distributed among health-risk-information seekers, will allow interventions to be better targeted to user segments that will reap the most benefits from them. This line of inquiry can also help to shape expectations as to when quantitative risk information delivered via the Internet will actually lead people to have more accurate risk perceptions. Finally, if reliable population segments can be identified, such identifiers may be used to tailor risk information services to particular groups, as has been done in the tailoring of other health behavior messages [21].

Differences in demographics, numeracy, and prior risk beliefs are all factors that could be used to segment the population in their response to risk information.
Highly numerate individuals are more accurate than others in providing numeric risk estimates [22]. They also demonstrate more affective responses to quantitative risk information and are less influenced by irrelevant information [23]. Women who have “frequent and intrusive thoughts” about breast cancer appear to benefit less from counseling [18]. People differ in their motivation for seeking disease risk information online, and in their willingness to trust risk estimates. For example, some risk calculator users want to learn more about their own risk, while others are simply curious but do not accept the risk estimates as personally relevant [24]. People’s trust in risk assessments reflects motivations to feel less vulnerable as well as beliefs about the source’s credibility and the validity of scientific evidence generally [25].

This study extends prior work by applying unique analytic methods to perform a secondary analysis of data collected on employee risk perceptions before and after participation in an employer-sponsored online health-information campaign. In conjunction with other employer-sponsored health services, a website providing risk estimates about undiagnosed pre-diabetes was promoted to employees for voluntary use. Pre-diabetes is a prevalent and often undiagnosed precursor to diabetes, which can be effectively managed once diagnosed [26, 27]. Pre-diabetes and diabetes are perceived as less dreaded and fatal than heart disease, stroke, or cancer [28]. Risk factors for diabetes, including age and obesity, are not always associated with increased risk perceptions, and there is evidence of optimistic bias [29, 30]. Given the low concern about diabetes relative to its prevalence and costliness, improved risk perceptions and protective health behavior may help prevention, making the development and evaluation of effective pre-diabetes and diabetes risk communication tools particularly important.

Participants in the study were randomly assigned to one of three websites that differed in the extent to which pre-diabetes risk information was individualized to their personal risk factors. The primary investigation [31] revealed no overall differences in improvements in people’s risk perceptions between the different websites. However, interactions emerged suggesting that different kinds of people may respond differentially to risk information. Following these results, we developed a clustering model to identify distinct “types” of people who use these tools, and to characterize how the factors defining these segments predict improvements in risk perceptions.

This study has strong ecological validity, due to its administration as a real health-promotion program rather than a laboratory-based experiment, making the investigation of population properties broadly generalizable to other employer-based health-information programs.

2. Objectives

The objectives in this study were 1) to identify and describe different segments of health consumers participating in an online risk communication campaign; and 2) to identify how these segments vary in their tendency to improve in their post-intervention risk perceptions, as measured by correspondence with objective quantitative risk information.

3. Methods

3.1 Participants

A secondary analysis was conducted on a dataset collected in November 2008 for a randomized field experiment [31]. The experiment was administered as a realistic health promotion campaign in collaboration with the human resources office of a private university in the Mid-Atlantic United States. All university employees over the age of 18 were eligible to participate. Those who identified themselves as having a prior diagnosis of pre-diabetes or diabetes were allowed to participate but were excluded from the final data analysis. The experimental protocol was judged to adhere to ethical principles and approved by the university’s institutional review board (IRB).

3.2 Design and Materials

This study used a pre-post design whereby employees completed a survey measuring their pre-diabetes and diabetes risk perceptions before and after visiting one of three similar risk communication websites. The educational objective of the intervention was to teach participants about pre-diabetes using quantitative estimates of


Table 1 Example risk estimates: average risk (left) and personalized estimate (right)

Average risk (left): “The average adult who doesn’t have diabetes, has between a 26% and 30% chance of having pre-diabetes. This means that around 26–30 out of every 100 people are living with pre-diabetes and are at higher risk for getting diabetes.”

Personalized estimate (right): “Your risk of already having pre-diabetes is between 26% and 30%. Around 26–30 out of every 100 people like you are living with pre-diabetes and are at higher risk for getting diabetes. Compared to other men [women] around your age, your risk is about average.”

Fig. 1 Graphical personalized pre-diabetes risk estimate

undiagnosed pre-diabetes likelihood and textual information about both modifiable and non-modifiable factors, such as family history and physical activity, which make up pre-diabetes risk magnitudes. The website design was informed by a review of existing online risk calculators and two pilot studies of adult Internet users who worked with earlier versions of the site used in this study [6, 24]. The website interfaces were created by a professional web designer and focused on developing a simple, usable tool with risk information that was understandable and credible. All three website versions contained quantitative estimates of undiagnosed pre-diabetes risk presented as a percentage and as a number of people affected out of 100.

The estimate was an unconditional population average for approximately one-third of the participants (Table 1 left), and an estimate conditioned on personal risk factor information for the remaining participants (Table 1 right). Those receiving individualized assessments also saw their risk plotted on a bar graph relative to an average (Fig. 1). All risk estimates were based on rates of undiagnosed pre-diabetes among adults in the nationally representative 2003–2006 National Health and Nutrition Examination Survey (NHANES). Individualized estimates were generated by a logistic regression model that predicted the probability of having undiagnosed pre-diabetes conditional on age, gender, BMI, family history, HDL cholesterol, blood pressure, and exercise frequency. In-depth user feedback from pilot studies indicated that these estimates were perceived as being similarly credible to other publicly available quantitative health risk information [24]. In order to further enhance the credibility of the risk information, the website was sponsored by the university’s human resources office.

In addition to the summary numeric risk estimates, all websites contained textual descriptions of pre-diabetes risk factors and their relationship to disease incidence. This information was provided to give participants a more detailed understanding of pre-diabetes risk factors and how strongly each relates to the disease. In the case of the participants who received only population risk estimates, the non-personalized risk factor information provided an additional source that they could draw on to potentially improve their personal risk perceptions.

Participants’ baseline and post-intervention perceptions of their personal risk of pre-diabetes and diabetes were assessed as a commonly used metric that evaluates both response to disease information [5, 32], and propensity to engage in protective health behavior [33]. The key perceptual measure was participants’ perception of their absolute risk of having undiagnosed pre-diabetes in response to the question: “Pre-diabetes (also called impaired fasting glucose) occurs when you have high blood sugar, but it is not yet high enough to be considered diabetes. How likely do you think it is that you currently have pre-diabetes?” A slider scale was provided on which participants selected their perceived chance of currently having undiagnosed pre-diabetes from 0% (labeled “no chance”) to 100% (labeled “certainty”). To obtain a related measure on a more distal outcome, absolute perceptions of getting diabetes over the next 20 years were elicited using the same slider scale.
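The individualized estimates described above come from a logistic regression on seven risk factors. As a minimal sketch of how such a model turns risk-factor values into a probability, the Python below applies the logistic function to a linear predictor. The intercept, the coefficient values, the variable codings, and the function name `predicted_prediabetes_risk` are hypothetical placeholders chosen only for illustration; they are not the coefficients estimated from NHANES in the study.

```python
import math

# Hypothetical values for illustration only; NOT the study's fitted model.
INTERCEPT = -4.0
COEFFICIENTS = {
    "age_years": 0.03,
    "male": 0.40,                 # 1 = male, 0 = female (assumed coding)
    "bmi": 0.05,
    "family_history": 0.50,       # 1 = family history of diabetes
    "low_hdl": 0.30,              # 1 = low HDL cholesterol
    "high_blood_pressure": 0.35,  # 1 = high blood pressure
    "exercises_regularly": -0.25, # 1 = exercises regularly
}


def predicted_prediabetes_risk(person):
    """Return the modeled probability of undiagnosed pre-diabetes."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * person[name] for name in COEFFICIENTS
    )
    # Logistic (inverse-logit) link maps the linear predictor to (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


example = {
    "age_years": 45,
    "male": 0,
    "bmi": 27,
    "family_history": 1,
    "low_hdl": 0,
    "high_blood_pressure": 0,
    "exercises_regularly": 1,
}
risk = predicted_prediabetes_risk(example)
print(f"Estimated risk: {risk:.1%}")
```

A fitted model of this form can be reported to users either as a percentage or as a count out of 100 people, which matches the two presentation formats used on the study websites.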

3.3 Procedure

During a two-and-a-half week period leading up to the university’s annual health and fitness fair, university employees responded to mail, e-mail, and fliers sent by Human Resources urging them to visit a


Fig. 2 Analytic process used in arriving at final cluster specification

new website about important health risks. The advertising avoided describing this opportunity as a research study in order to obtain a realistic sample of participants who might typically seek online health risk information and to minimize experimenter demand effects. Employees were able to voluntarily and anonymously visit the website once, and they were asked to complete the study in one sitting. After completing the baseline risk perception questionnaire, participants took as much time as they wished to explore the pre-diabetes risk website. They were then directed to the follow-up survey where their post-intervention risk perceptions were again elicited. The surveys included a limited number of items in order to minimize participant attrition and to prevent the completion of the survey from having undue influence on participants’ perceptions or interactions with the intervention.

3.4 Data Analysis

Four hundred and forty-two out of 4815 active university employees visited the website and completed the experiment. The final data set included 407 observations after removing 17 employees who reported a previous diagnosis of pre-diabetes or diabetes, and an additional 18 who did not answer one or more of the risk perception questions in either the pre or post survey.

A three-step analytic process was used in arriving at a final set of data clusters (Fig. 2). Candidate models were estimated on different combinations of three categories of variables: pre-intervention risk perceptions, calculated personal risk, and demographics. For each candidate variable set, k-means solutions [34] were estimated for one through ten clusters, and the optimal number of clusters (k*) was automatically selected using the gap statistic [35]. Observations were finally assigned to k* clusters. When candidate variable sets included binary or ordinal variables, Gower’s general measure of dissimilarity was used [36]. All analyses were conducted using the R statistical package [37].

Candidate models were compared based on between-cluster differences in post-intervention improvement in absolute pre-diabetes risk perceptions. Absolute pre-diabetes perceptions were chosen as the outcome of interest because they were most similar in format to the objective probabilistic estimates provided by the intervention. Parsimony and subjective interpretability of the cluster means were secondary criteria used to choose between models with similar between-cluster differences in risk perception improvements.

The preferred model included two variables, pre-intervention pre-diabetes risk perceptions and objective calculated risk. The gap statistic indicated a four-cluster solution was optimal. Subsequent addition of demographic variables to this preferred set of two variables provided no additional power in finding clusters that better explained variation in post-intervention risk perception improvements.
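The selection of k* described above can be sketched as follows. The study's analyses were run in R; this illustration swaps in Python with NumPy and is not the authors' code. It runs k-means for each candidate k, compares the log within-cluster dispersion against uniform reference data, and picks the smallest k whose gap is within one standard error of the next gap. The synthetic data (three well-separated groups in a perceived-risk by calculated-risk plane), the function names, and the settings `k_max=6` and `n_ref=20` are all assumptions made for the example, and the sketch covers only continuous variables, not Gower's dissimilarity for mixed types.

```python
import numpy as np


def _init_centers(X, k, rng):
    """k-means++ style seeding: spread initial centers across the data."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's algorithm with restarts; returns (labels, within-cluster SS)."""
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        centers = _init_centers(X, k, rng)
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        wss = dist[np.arange(len(X)), labels].sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss


def gap_statistic(X, k_max=10, n_ref=20, seed=0):
    """Return (k_star, gaps) using uniform reference data over the bounding box."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans(X, k, rng)[1])
        ref = np.array([
            np.log(kmeans(rng.uniform(lo, hi, size=X.shape), k, rng)[1])
            for _ in range(n_ref)
        ])
        gaps.append(ref.mean() - log_w)
        s.append(ref.std() * np.sqrt(1 + 1 / n_ref))
    # Smallest k with Gap(k) >= Gap(k+1) - s(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps


# Synthetic stand-in data: columns are (perceived risk, calculated risk).
data_rng = np.random.default_rng(42)
group_means = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
X = np.vstack([m + data_rng.normal(0.0, 1.0, size=(50, 2)) for m in group_means])

k_star, gaps = gap_statistic(X, k_max=6, n_ref=20, seed=0)
labels, _ = kmeans(X, k_star, np.random.default_rng(1))
print("selected number of clusters:", k_star)
```

Once observations carry cluster labels, candidate variable sets can be compared as the text describes, by examining how much the pre-post improvement differs between clusters.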

4. Results

Participant demographics were similar to the employee population except that women were much more likely than men to participate (Table 2). This was consistent with the fact that women are more likely to seek online health information [38]. The participant sample that sought out the pre-diabetes risk website was relatively educated compared to the U.S. population, which was consistent with online health information seeking [38] and to be expected given the educated nature of the organization being studied. Data were not available to determine if participation in the program was more common among more highly educated employees.

4.1 Pre-post Changes in Risk Perceptions

The mean baseline absolute pre-diabetes risk perception was 12.31%, and the mean calculated risk was 18.33%. Mean post-intervention risk perceptions were 10.05%, significantly lower than baseline, t(406) =


Table 2 Comparison of employees who visited the pre-diabetes risk website to all employees

Characteristic | Participating employees | All employees
Age, mean (SD) | 44.14 (11.89) | 43.62 (12.50)
Sex (% female – male) | 74 – 26 | 45 – 55
Race (% White – Black – Hispanic – other) | 87 – 6 – 1 – 6 | 86 – 6 – 1 – 7
Education (% less than 4-year college degree – % bachelor’s degree – % advanced degree) | 26 – 31 – 43 | Not available

Non-respondent data is based on Human Resources data for all employees on December 1, 2008. For all employees, Am. Indian/Alaskan Native and Asian/Pacific Islander race categories were collapsed and reported as ‘other’ to approximate the categories for participating employees.

–3.13, p = 0.002, suggesting that risk perception accuracy actually worsened on average, since mean baseline perceptions were already below the mean calculated risk. As mentioned previously, this effect was not significantly different across the three website versions [31]. The tendency to lower perceptions was also observed for 20-year absolute diabetes risk, which decreased from 26.69% to 21.87%, t(406) = –7.38, p < 0.001.

A cluster of 64 relatively high-risk participants (mean calculated risk = 24.88%) tended to overestimate their risk at baseline and then, on average, significantly reduced their perceptions after receiving information (Fig. 3). There was also a cluster of 184 participants who were at moderate risk and another cluster of 56 participants who were at high risk, both of which predominantly underestimated their risk at baseline and then reported significant, but small, increases in risk perceptions after receiving information. Finally, a cluster of 103 moderate-risk participants was relatively accurate about its baseline risk and did not significantly revise its perceptions after receiving the risk information.

The first three clusters were of particular interest because they represented participants with largely inaccurate baseline perceptions and potential for improvement. In general, participants in all three of these clusters did shift their perceptions closer to their objective calculated risk. However, the magnitude of these shifts varied across clusters. The high-risk overestimaters reported decreases in their risk perceptions that were large enough to offset their original errors almost completely. In contrast, the moderate-risk underestimaters and the high-risk underestimaters only adjusted their unrealistically optimistic perceptions slightly, barely changing the large average difference between their risk perceptions and their calculated risk. The indication is that people who had overestimated their risk were apt to adopt lower, more accurate perceptions, but those who had underestimated their risk seemed to respond very little to the information.

It is important to note that these asymmetric differences cannot be explained by regression to the mean. If perceptual changes were due solely to regression effects, we would expect more uniform pre-post changes across the clusters, especially those that had relatively large baseline disparities in perceived and calculated risk. Instead, high-risk overestimaters reduced their baseline disparity from 21% to 2% (a 90% relative reduction), while high-risk underestimaters reduced their disparity from 31% to 25% (a 20% relative reduction) and moderate-risk underestimaters reduced their disparity from 10% to 9% (a 10% relative reduction).

Simplifying the analysis to a binary comparison of whether each individual’s risk perceptions moved in the direction of the objective risk by any amount, there were similar differences between clusters in the proportion of participants who improved their pre-diabetes risk perceptions, χ²(3) = 18.64, p < 0.001 (Table 3). High-risk overestimaters were more likely to improve their pre-diabetes risk perceptions than any of the other three groups.

A similar and even more distinct pattern emerged in directionality of perceptions of 20-year diabetes risk, although some caveats are necessary in interpretation of this variable because participants did not receive 20-year diabetes risk estimates. Because pre-diabetes and 20-year diabetes risk are highly positively correlated, epidemiologically as well as in reported pre-intervention risk perceptions (ρ = 0.56), adjustments in 20-year diabetes risk perceptions should normatively follow pre-diabetes risk. Thus, overestimaters of pre-diabetes should normatively decrease their 20-year diabetes perceptions between baseline and follow-up, and underestimaters should increase theirs. Using this formulation, we again see general differences between clusters in the proportion of participants who improved their risk perceptions, χ²(3) = 36.92, p < 0.001, and high-risk overestimaters were more likely to move in the normative direction compared to participants in the other clusters (Table 3).
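The relative reductions quoted in the regression-to-the-mean argument follow from simple arithmetic on the before and after disparities. The short Python sketch below reproduces that calculation from the rounded disparities given in the text; the helper name `relative_reduction` is introduced here for illustration. Because the inputs are rounded, the second value comes out near 19% rather than the reported 20%, which presumably was computed from unrounded disparities.

```python
def relative_reduction(before, after):
    """Share of the baseline disparity that was eliminated."""
    return (before - after) / before


# (baseline disparity, post-intervention disparity), in percentage points,
# taken from the rounded values reported in the text.
disparities = {
    "high-risk overestimaters": (21, 2),
    "high-risk underestimaters": (31, 25),
    "moderate-risk underestimaters": (10, 9),
}

reductions = {
    name: relative_reduction(before, after)
    for name, (before, after) in disparities.items()
}
for name, value in reductions.items():
    print(f"{name}: {value:.0%} relative reduction")
```

The contrast between roughly nine-tenths of the disparity removed in one cluster and one-tenth to one-fifth in the others is what the text means by asymmetric change.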

5. Discussion

This paper extends the consumer health informatics literature on web-based risk communication by utilizing a unique cluster-analytic approach to identifying important segments of health consumers and factors related to their improvements in risk perception accuracy. Prior work has not applied cluster-analytic methods to stratify people into groups that help explain responsiveness to quantitative risk information. In addition to the methodological novelty in this domain, the study had high ecological validity based on the fact that participants were unpaid volunteer consumers of health risk information who were recruited through an employer-sponsored health promotion program.

A majority of online risk-information seekers were unrealistically optimistic about their absolute pre-diabetes risk and less likely than overestimaters to improve their risk perceptions when provided with quantitative risk information. The two highest-risk groups (high-risk underestimaters and high-risk overestimaters) showed statistically significant improvements in their risk beliefs, but only high-risk overestimaters showed qualitatively large improvements. Given that high-risk overestimaters accounted for only 16% of the people seeking out the risk information being advertised, these findings suggest


only a small portion of online risk-information seekers are actually improving their absolute risk perceptions. This may be especially worrisome for diabetes, which is already known to evoke low concern from the public compared to other diseases.

Improvements in risk perception accuracy were not generally sensitive to differences in the level of personalization in the risk information presented. This lack of effect may have been due to the resistance to change for the majority of the sample, beyond the relatively small group of high-risk overestimaters who did respond strongly to the risk information. Indeed, high-risk overestimaters who received personalized estimates had qualitatively, though not significantly, larger decreases in their risk perceptions after receiving information. This study used a real-world online health promotion setting and found that optimistic pre-diabetes risk perceptions were difficult to modify. These results are consistent with prior work that finds optimistically biased risk perceptions may be resistant to debiasing interventions [39].

Fig. 3 Mean pre-post change in risk perceptions for each cluster. Clusters were estimated based on pre-intervention absolute risk perceptions and calculated risk. The dotted lines represent the mean calculated risk for the participants in each cluster. Tests of pre-post difference for each cluster: 1. t(63) = –6.82, p < 0.001; 2. t(183) = 2.43, p = 0.02; 3. t(55) = 3.07, p = 0.003; 4. t(102) = –1.49, p = 0.14
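The per-cluster tests in the caption are pre-post comparisons on the same participants, which is the setting for a paired t-test; the degrees of freedom equal the cluster size minus one (for example, 64 participants give t(63)). The following Python sketch, using only the standard library, shows how such a statistic is computed. The five pre and post values are made-up numbers for illustration, not study data, and the function name `paired_t` is an assumption; obtaining a p-value would additionally require the t distribution, which is omitted here.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for paired pre-post data."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    # Standard error of the mean difference uses the sample SD (n - 1).
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical perceived-risk values (percent) for five participants.
pre = [30, 40, 35, 50, 45]
post = [20, 25, 30, 35, 30]

t_stat, df = paired_t(pre, post)
print(f"t({df}) = {t_stat:.2f}")
```

A negative statistic, as for clusters 1 and 4 in the caption, indicates that perceptions decreased from baseline to follow-up.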


Table 3 Individual changes in perceptions

Measure | 1. High-risk overestimaters | 2. Moderate-risk underestimaters | 3. High-risk underestimaters | 4. Moderate-risk, mixed accuracy
% with improved pre-diabetes risk perception accuracy1 | 58 | 28 | 41 | 37
% with normative change in diabetes risk perception2 | 72 | 30 | 32 | 43

Four participants whose baseline risk perception matched their calculated risk were excluded.
1Improvement defined as a pre-post decrease in the absolute difference between risk perception and calculated risk. Odds ratios on likelihood of improvement: Cluster 1 vs. Cluster 2: OR = 3.52, p < 0.001; Cluster 1 vs. Cluster 3: OR = 1.97, p = 0.07 (marginally significant); Cluster 1 vs. Cluster 4: OR = 2.37, p = 0.008.
2Normative defined as change in the direction indicated by the inaccuracy of pre-diabetes perceptions. Odds ratios on likelihood of normative change: Cluster 1 vs. Cluster 2: OR = 6.06, p < 0.001; Cluster 1 vs. Cluster 3: OR = 5.40, p < 0.001; Cluster 1 vs. Cluster 4: OR = 3.45, p < 0.001.
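The odds ratios in the footnotes can be approximated directly from the percentages in Table 3. The Python sketch below converts each proportion to odds and takes the ratio for Cluster 1 against each other cluster on the first measure. The function name `odds_ratio` is introduced here for illustration, and because the table percentages are rounded, the results only approximate the reported values of 3.52, 1.97, and 2.37.

```python
def odds_ratio(p_reference, p_comparison):
    """Odds of the outcome in the reference group relative to the comparison."""
    odds_reference = p_reference / (1.0 - p_reference)
    odds_comparison = p_comparison / (1.0 - p_comparison)
    return odds_reference / odds_comparison


# Proportion with improved pre-diabetes risk perception accuracy (Table 3).
improved = {1: 0.58, 2: 0.28, 3: 0.41, 4: 0.37}

approx_or = {
    other: odds_ratio(improved[1], improved[other]) for other in (2, 3, 4)
}
for other, value in approx_or.items():
    print(f"Cluster 1 vs. Cluster {other}: OR ~ {value:.2f}")
```

The same calculation applied to the second row of the table (72% against 30%, 32%, and 43%) gives values close to the reported 6.06, 5.40, and 3.45.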

The analysis identified some characteristics that may be used to target or tailor online risk assessments. Websites like risk calculators may be more effective when targeted towards people with unrealistically high risk perceptions, but may not be the optimal form of risk communication for unrealistic optimists. Online applications could easily be designed to elicit a person’s risk beliefs prior to delivering a risk assessment and then tailor the information presentation depending on which segment of the population that individual falls into. High-risk underestimaters may be a particularly important population to reach from a health education standpoint. Online risk applications could attempt to direct them to a more involved, technology-mediated intervention such as a chat-based counseling session with a health professional prior to, or instead of, providing specific individualized feedback that may be discounted by the individual.
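As a rough sketch of the tailoring idea described above, the Python below elicits a perceived risk, compares it with the calculated risk, and chooses a presentation. It is a simplified rule-based stand-in rather than an assignment to the study's fitted clusters: the cutoff of 20% for "high-risk", the 5-point tolerance for "roughly accurate", the function names, and the presentation labels are all hypothetical choices made for illustration.

```python
def segment_user(perceived, calculated, high_risk_cutoff=20.0, tolerance=5.0):
    """Classify a user by risk level and direction of misperception.

    All values are percentages; cutoff and tolerance are assumed, not
    derived from the study's cluster solution.
    """
    level = "high-risk" if calculated >= high_risk_cutoff else "moderate-risk"
    if perceived > calculated + tolerance:
        bias = "overestimater"
    elif perceived < calculated - tolerance:
        bias = "underestimater"
    else:
        bias = "roughly accurate"
    return level, bias


def choose_presentation(level, bias):
    """Pick a (hypothetical) presentation strategy for the segment."""
    if level == "high-risk" and bias == "underestimater":
        return "offer chat-based counseling before individualized feedback"
    if bias == "underestimater":
        return "supplement numeric estimate with additional explanation"
    return "show standard numeric risk estimate"


segment = segment_user(perceived=5.0, calculated=30.0)
print(segment, "->", choose_presentation(*segment))
```

In a deployed tool, the rule-based step could be replaced by assigning the user to the nearest cluster centroid from a fitted model.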

6.1 Limitations and Future Work

A primary focus of our experiment was to engage a typical sample of online-health-information seekers with a realistic web-based risk communication intervention. A related objective was to limit the number of preintervention survey questions in order to minimize attrition, maintain the realism of the experience and prevent the questions themselves from extensively biasing the effect of the intervention. For example, eliciting risk perceptions at baseline may help calm the feelings of overestimaters who are presented with lower-than-expected risk information [22], and this could have contributed to high-risk overestimaters’ improved risk perceptions. A more extensive set of preintervention survey items could have further exacerbated such biases or fatigued participants. We therefore limited baseline questions to those necessary for distinguishing underestimaters from overestimaters, which reduced the number of variables available for the cluster analysis. Future studies should include additional variables but will need an experimental design that administers baseline surveys well in advance in order to maintain the realism of the intervention and prevent the questions themselves from biasing its effects.

The measures that we used are also not without their shortcomings. Absolute risk estimates are precise, but do not necessarily provide a complete picture of people’s risk perceptions. For example, eliciting comparative perceptions likely invokes important social comparative cognitive processes that absolute judgments using probabilities or frequencies do not, potentially resulting in less accuracy [40], although comparative risk perceptions can also be resistant to debiasing interventions [39]. Indeed, we show elsewhere that comparative pre-diabetes risk perceptions follow a similar pattern of more improvement for overestimaters than for underestimaters [31]. Another important measure to consider is non-analytic risk perceptions such as anxiety or feeling at risk, which are closely related to health behavior [18, 32]. Future work should address whether analytic risk representations, such as numbers and graphs, influence non-analytic risk beliefs in similar ways. Because people with higher numeracy have stronger affective responses to numeric risk information [23], measuring non-analytic risk perceptions and numeracy may allow for improved segmentation based on these variables.

The sample was predominantly well-educated, white and female, all of which are characteristics of people who are generally more likely to seek online health information [38]. However, this also meant that the sample was at lower average risk of pre-diabetes compared to the national average of 28%, and the results may not generalize to less educated, more racially diverse and higher-risk populations. As the digital divide closes, more research is needed to determine how the population of online-health consumers compares to the population at large. Future studies that explicitly include participants with more variability in education level, race and sex are needed in order to obtain an understanding of how online risk information affects a higher-risk and generally more diverse sample.

On a related note, our analysis showed that adding demographic variables to the cluster models did not improve the segmentation of participants. However, future work that applies cluster analysis to a more demographically diverse sample might introduce new variability, such that variables like education, race or sex become dimensions around which the data naturally cluster, helping to explain differences in acceptance of risk information.

An important limitation is that we have no external validation of participants’ risk factors.
Participants may not have been perfectly truthful in disclosing their health information, especially if any had concerns about their employer having access to their responses. This concern should be mitigated by the voluntary and anonymous nature of the study. Still, information privacy is a significant concern to health consumers on the Internet, and an interesting future extension would be to measure beliefs about privacy to understand if and how they relate to the clusters found in this study. Future work using clinical populations with access to health records could help to assess the overall accuracy of answers and any predictors of inaccuracies.

Finally, this study was limited by its reliance on a minimally validated logistic regression model for calculating each participant’s objective pre-diabetes risk. This is a concern both because the model may have produced some inaccurate estimates and because participants may have perceived the estimates as non-credible. These concerns affect all risk calculators, even those from better-validated risk assessment models. Although good statistical models may be relatively accurate in the aggregate, they will inevitably provide inaccurate point estimates to some individuals. Because this problem by its very nature cannot be fully solved, we built an experimental website around a less-validated model, expecting that participants would trust information from a professionally designed website and credible source (i.e., data collected from NHANES and an employer-sponsored campaign). Although the objective risk estimates inevitably contained some error, and this limits our confidence in precisely interpreting participants’ true disease risks, we believe the experimental design parallels consumers’ experience with disease risk information on the Internet, which suffers from the same shortcomings.

Cluster analysis provided an unsupervised statistical technique for examining the heterogeneity in health consumers’ beliefs and behaviors as they use popular Internet-based health risk information resources. A limitation of cluster analysis is its inherent subjectivity, including that involved in selecting the variables to cluster on, deciding on the number of clusters, and interpreting the clusters. Although this study was strengthened by the use of an objective approach to determining the number of clusters, our findings should be treated as a basis for new hypotheses, which can then be tested more directly in different online-health-promotion settings and for different health risks.
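To make the idea of an objective criterion for the number of clusters concrete, the sketch below computes a gap statistic in the spirit of reference [35] on top of a plain k-means routine. It is an illustration under stated assumptions, not the study's actual procedure: the function names, the uniform bounding-box reference distribution, the number of reference draws and the deterministic farthest-first initialisation are all choices made here for brevity and reproducibility.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_dispersion(points, k, iters=50):
    """Within-cluster sum of squares after Lloyd's algorithm with k centers."""
    # Farthest-first initialisation: deterministic, an assumption made here
    # for reproducibility rather than a standard random start.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to its group's mean; keep it if the group is empty.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def log_dispersion(points, k):
    # Guard against log(0) when every point coincides with a center.
    return math.log(max(kmeans_dispersion(points, k), 1e-12))


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (gaps, errs) for k = 1..k_max using uniform reference data."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = []
        for _ in range(n_refs):
            # Reference set: same size, uniform over the data's bounding box.
            ref = [
                tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                for _ in points
            ]
            ref_logs.append(log_dispersion(ref, k))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean_ref - log_dispersion(points, k))
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    return gaps, errs


def choose_k(gaps, errs):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to k_max."""
    for k in range(1, len(gaps)):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k
    return len(gaps)
```

Calling `gap_statistic(points, k_max)` on a list of numeric tuples and passing the result to `choose_k` yields a candidate number of clusters. The choice of clustering variables and the interpretation of the resulting clusters remain judgment calls, as noted above.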

7. Conclusion

An employer-sponsored online intervention for pre-diabetes risk communication did not lead to wide adoption of accurate risk perceptions among participating employees. A small cluster of relatively high-risk participants who tended to overestimate their risks at baseline showed the most willingness to revise their beliefs downward, towards an objective risk calculation. To a far lesser degree, a small cluster of high-risk participants who underestimated their risk also adopted somewhat higher and more accurate beliefs. These results highlight, in a realistic health-promotion setting, the difficulty of motivating changes in risk perceptions, the need for better insights into the variability among online-health consumers and the need for advances in the design of technology-mediated risk communication applications for educating health consumers on the Internet.

Acknowledgments

The authors acknowledge Professor Daniel Nagin for his support in designing this study and for providing feedback on earlier versions of this paper, the Human Resources Office staff of the university that participated in this study and three anonymous reviewers for their helpful feedback. The authors also gratefully acknowledge the McDowell Research Center (MRC) at the University of North Carolina-Greensboro and grant 41100037706 from the Commonwealth of Pennsylvania for supporting this research.

References

1. Lipkus IM, Hollands J. The visual communication of risk. Journal of the National Cancer Institute Monographs 1999; 25: 149–163.
2. Waters EA, Sullivan HW, Nelson W, Hesse BW. What is my cancer risk? How internet-based cancer risk assessment tools convey individualized risk estimates to the public: Content analysis. Journal of Medical Internet Research 2009; 11 (3): e33.
3. Levy A, Sonnad S, Kurichi J, Sherman M, Armstrong K. Making sense of cancer risk calculators on the web. J Gen Intern Med 2008; 23: 229–235.
4. Lipkus IM. Numeric, verbal, and visual formats of conveying health risks: Suggested best practices and future recommendations. Medical Decision Making 2007; 27: 696–713.
5. Emmons K, Wong M, Puleo E, Weinstein N, Fletcher R, Colditz G. Tailored computer-based cancer risk communication: Correcting colorectal cancer risk perception. J Health Comm 2004; 9: 127–141.
6. Harle C, Padman R, Downs J. The impact of web-based diabetes risk calculators on information processing and risk perceptions. In: Proceedings of the 2008 American Medical Informatics Association (AMIA) Symposium. Washington, D.C.; 2008.
7. Hawley ST, Zikmund-Fisher BJ, Ubel PA, Jancovic A, Lucas T, Fagerlin A. The impact of the format of graphical presentation on health-related knowledge and treatment choices. Patient Education and Counseling 2008; 73: 448–455.
8. Kreuter MW, Strecher VJ. Do tailored behavior change messages enhance the effectiveness of health risk appraisal? Results from a randomized trial. Health Education Research 1996; 11: 97–105.
9. Malenka DJ, Baron JA, Johansen S, Wahrenberger JW, Ross JM. The framing effect of relative and absolute risk. Journal of General Internal Medicine 1993; 8: 543–548.
10. McNeil B, Pauker S, Sox H, Tversky A. On the elicitation of preferences for alternative therapies. N Engl J Med 1982; 306: 1259–1262.
11. Weinstein N, Atwood K, Puleo E, Fletcher R, Colditz G, Emmons K. Colon cancer: Risk perceptions and risk communication. J Health Comm 2004; 9: 53–65.
12. Zikmund-Fisher BJ, Ubel PA, Smith DM, Derry HA, McClure JB, Stark A, Pitsch R, Fagerlin A. Communicating side effect risks in a tamoxifen prophylaxis decision aid: The debiasing influence of pictographs. Patient Education and Counseling 2008; 73: 209–214.
13. Avis NE, Smith KW, McKinlay JB. Accuracy of perceptions of heart attack risk: What influences perceptions and can they be changed? Am J Public Health 1989; 79: 1608–1612.
14. Senay I, Kaphingst KA. Anchoring-and-adjustment bias in communication of disease risk. Medical Decision Making 2009; 29: 193–201.
15. Cull A, Anderson E, Campbell S, Steel M. The impact of genetic counselling about breast cancer risk on women’s risk perceptions and levels of distress. British Journal of Cancer 1999; 79: 501–508.
16. Gurmankin AD, Domchek S, Stopfer J, Fels C, Armstrong K. Patients’ resistance to risk information in genetic counseling for BRCA1/2. Arch Intern Med 2005; 165: 523–529.
17. Helmes AW, Culver JO, Bowen DJ. Results of a randomized study of telephone versus in-person breast cancer risk counseling. Patient Education and Counseling 2006; 64: 96–103.
18. Lerman C, Lustbader E, Rimer B, Daly M, Miller S, Sands C, Balshem A. Effects of individualized breast cancer risk counseling: A randomized trial. Journal of the National Cancer Institute 1995; 87: 286–292.
19. Meiser B, Halliday JL. What is the impact of genetic counselling in women at increased risk of developing hereditary breast cancer? A meta-analytic review. Social Science & Medicine 2002; 54: 1463–1470.
20. Lobb EA, Butow PN, Meiser B, Barratt A, Gaff C, Young MA, Kirk J, Gattas M, Gleeson M, Tucker K. Women’s preferences and consultants’ communication of risk in consultations about familial breast cancer: Impact on patient outcomes. Journal of Medical Genetics 2003; 40: e56–e56.

21. Kreuter MW, Farrell D, Skinner C, Jacobsen H. Tailoring health messages: Customizing communication using computer technology. Mahwah, New Jersey: Lawrence Erlbaum; 2000.
22. Fagerlin A, Zikmund-Fisher BJ, Ubel PA. How making a risk estimate can change the feel of that risk: Shifting attitudes toward breast cancer risk in a general public survey. Patient Education and Counseling 2005; 57: 294–299.
23. Peters E, Vastfjall D, Slovic P, Mertz CK, Mazzocco K, Dickert S. Numeracy and decision making. Psychological Science 2006; 17: 407–413.
24. Harle CA, Downs JS, Padman R. Designing a personalized health risk communication website to motivate user attention and systematic processing. In: Proceedings of the Seventh Annual Workshop on HCI Research in MIS. Paris, France; 2008.
25. Han PKJ, Klein WMP, Lehman TC, Masset H, Lee SC, Freedman AN. Laypersons’ responses to the communication of uncertainty regarding cancer risk estimates. Medical Decision Making 2009; 29: 391–403.
26. American Diabetes Association. Standards of medical care in diabetes. Diabetes Care 2007; 30: S4–41.
27. Cowie CC, Rust KF, Byrd-Holt DD, Eberhardt MS, Flegal KM, Engelgau MM, Saydah SH, Williams DE, Geiss LS, Gregg EW. Prevalence of diabetes and impaired fasting glucose in adults in the U.S. Population: National health and nutrition examination survey 1999–2002. Diabetes Care 2006; 29: 1263–1268.
28. Walker EA, Mertz CK, Kalten MR, Flynn J. Risk perceptions for developing diabetes. Diabetes Care 2007; 26: 2543–2548.
29. Adriaanse MC, Snoek FJ, Dekker JM, Spijkerman AMW, Nijpels G, van der Ploeg HM, Heine RJ. Perceived risk for type 2 diabetes in participants in a stepwise population-screening programme. Diabetic Medicine 2003; 20: 210–215.
30. Fisher EB, Walker EA, Bostrom A, Fischhoff B, Haire-Joshu D, Johnson SB. Behavioral science research in the prevention of diabetes. Diabetes Care 2002; 25: 599–606.
31. Harle CA. Doctoral dissertation: Essays on the design and evaluation of information technology-enabled interventions for chronic disease risk assessment and communication. In: The H. John Heinz III College. Pittsburgh, PA: Carnegie Mellon University; 2009.
32. Weinstein N, Kwitel A, McCaul KD, Magnan RE, Gerrard M, Gibbons FX. Risk perceptions: Assessment and relationship to influenza vaccine. Health Psychology 2007; 26: 146–151.
33. Brewer NT, Weinstein ND, Cuite CL, Herrington J. Risk perceptions and their relation to risk behavior. Annals of Behavioral Medicine 2004; 27: 125–130.
34. Kaufman L, Rousseeuw P. Finding groups in data: An introduction to cluster analysis. New York: Wiley; 1990.
35. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society 2001; 63: 411–423.
36. Gower J. A general coefficient of similarity and some of its properties. Biometrics 1971: 623–637.
37. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2009.
38. Fox S. Online health search 2006. Pew Internet & American Life Project; 2006. Available at: www.pewinternet.org. Accessed: December 9, 2009.
39. Weinstein ND, Klein WM. Resistance of personal risk perceptions to debiasing interventions. Health Psychology 1995; 14: 132–140.
40. Woloshin S, Schwartz LM, Black WC, Welch HG. Women’s perceptions of breast cancer risk: How you ask matters. Medical Decision Making 1999; 19: 221–229.

Methods Inf Med 3/2011
