AIDS Behav (2008) 12:S131–S141 DOI 10.1007/s10461-008-9413-1
ORIGINAL PAPER
Implementation Challenges to Using Respondent-Driven Sampling Methodology for HIV Biological and Behavioral Surveillance: Field Experiences in International Settings

Lisa Grazina Johnston · Mohsen Malekinejad · Carl Kendall · Irene M. Iuppa · George W. Rutherford
Published online: 6 June 2008
© Springer Science+Business Media, LLC 2008
Abstract We gathered data from 128 HIV surveillance studies that used respondent-driven sampling (RDS) and were conducted outside the United States through October 1, 2007. We examined predictors of poor study outcomes, reviewed the operational, design and analytical challenges associated with conducting RDS in international settings, and we offer recommendations to improve HIV surveillance. We explored factors for poor study outcomes using differences in mean sample size ratios (recruited/calculated sample size) as the outcome variable. Ninety-two percent of studies reported both calculated and recruited sample sizes. Studies of injecting drug users had a higher sample size ratio compared with other risk groups. Study challenges included appropriately defining eligibility criteria, structuring social network size questions, selecting design effects and conducting statistical analysis. As RDS is increasingly used for HIV surveillance, it is important to learn from past practical, theoretical and analytical challenges to maximize the utility of this method.
L. G. Johnston · C. Kendall
School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA

L. G. Johnston (corresponding author)
13 Via Plaza Nueva, Santa Fe, NM 87507, USA
e-mail: [email protected]

M. Malekinejad · I. M. Iuppa · G. W. Rutherford
Global Health Sciences, University of California, San Francisco, San Francisco, CA, USA

M. Malekinejad
School of Public Health, University of California, Berkeley, Berkeley, CA, USA
Keywords HIV/AIDS · Most-at-risk populations · Respondent-driven sampling · Biological and behavioral surveillance
Introduction

As of October 1, 2007, respondent-driven sampling (RDS) has been used effectively in at least 123 studies conducted in more than 28 countries outside of the United States to measure HIV and other sexually transmitted infection (STI) prevalence and their associated risk factors. More than 32,000 injecting drug users (IDUs), men who have sex with men (MSM), sex workers (SWs) and high-risk heterosexual (HRH) men were enrolled in these studies (Malekinejad et al. 2008). More studies using RDS are currently underway or are being planned.

RDS is a quasi-random sampling method based upon numerous statistical assumptions and includes several key requirements that must be observed to create as representative a sample as possible (Abdul-Quader et al. 2006; Heckathorn 1997, 2002a; Magnani et al. 2005; Salganik 2006; Salganik and Heckathorn 2004; Semaan et al. 2002). First, the population being recruited must be socially networked; for instance, SWs may be connected as friends, co-workers, roommates, acquaintances, family or other recognizable members of the same group. Sampling begins with a purposive selection of individuals from the targeted population, usually referred to as "seeds," who initiate the recruitment process. Second, starting with the seeds, each participant is allowed to recruit no more than a pre-specified number of recruits. This limit, known as a "recruitment quota," prevents those with larger networks from over-recruiting among their peers and thereby biasing the sample
(Erickson 1979; Heckathorn 1997, 2002a). Third, recruitment chains must be long enough for the sample estimator of key variables to stabilize, thereby reaching "equilibrium." Equilibrium, which is derived from concepts developed for stochastic Markov chain models (Kemeny and Snell 1960), indicates that the final sample is not biased by the purposive selection of seeds. Equilibrium is assessed in the analysis phase of RDS (Heckathorn 1997, 2002a). Fourth, during data collection, information about who recruits whom and the size of each participant's social network must be gathered. Information regarding recruits is usually collected through a coupon tracking method linking unique identification numbers. Participants' social network sizes are used to provide the statistical weights necessary to mitigate biases due to differential network sizes, whereby those with larger network sizes have a smaller probability of selection and vice versa (Heckathorn 2002a; Salganik and Heckathorn 2004). Self-reported social network sizes should be as accurate as possible in order to improve the accuracy of estimates. Finally, RDS data must be analyzed to account for homophily, differential social network sizes and recruitment patterns. Without these adjustments, RDS analysis is incomplete, and data gathered from the population are of unknown representativeness. Although some investigators have developed their own software programs to analyze RDS data accounting for these adjustments (Frost et al. 2006), the RDS Analysis Tool (RDSAT) (www.respondentdrivensampling.org), a statistical package developed by Heckathorn and colleagues, offers a free and relatively straightforward way to analyze RDS data. RDSAT was used by all but two of the studies we identified in our parallel review (Malekinejad et al. 2008).

Given the importance of each component discussed above, there are potential theoretical and practical challenges that can lead to RDS being implemented incorrectly. In our parallel review we identified 13 studies (three of IDUs, six of MSM and four of SWs) that failed to come within 90% of their pre-calculated sample size and six that failed to reach equilibrium on key variables of interest. In total, 17 studies did not reach 90% of their calculated sample size or equilibrium, or both (Malekinejad et al. 2008). In addition, studies suffered from operational, design or analytical challenges, such as inadequately measuring social network sizes, sampling from an insufficiently networked population, combining samples using multiple methods, combining samples from two separate geographical areas, applying an inadequate design effect (or no design effect) to account for sampling variability when calculating a sample size, and improperly analyzing RDS data. Because RDS is increasingly used as an HIV surveillance-sampling tool, it is important for investigators to
learn from past practical, theoretical and analytical difficulties to maximize the utility of this method. In this paper we examine predictors of poor outcomes of RDS studies; review field experiences and the operational, design and analytical limitations associated with conducting RDS in international settings; and offer recommendations to improve future HIV surveillance assessments that use RDS.
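To make the weighting principle noted above concrete (participants with larger social networks have a higher probability of selection and therefore receive smaller weights), the short Python sketch below uses made-up network sizes and outcomes; it illustrates only the inverse-degree weighting idea, not the estimator actually implemented in RDSAT.

# Simplified illustration of degree weighting in RDS (hypothetical numbers,
# not RDSAT's algorithm): selection probability is treated as proportional
# to self-reported network size, so each participant is weighted by 1/degree.
degrees = [10, 4, 25, 8, 50, 5]   # self-reported social network sizes
outcome = [0, 1, 0, 1, 0, 1]      # hypothetical binary indicator (e.g., HIV positive)

weights = [1.0 / d for d in degrees]          # larger network -> smaller weight
naive = sum(outcome) / len(outcome)           # unweighted sample proportion
weighted = sum(w * y for w, y in zip(weights, outcome)) / sum(weights)

print(f"unweighted proportion: {naive:.2f}")        # 0.50
print(f"degree-weighted proportion: {weighted:.2f}")

In this toy example the degree-weighted proportion differs noticeably from the naive sample proportion, which is exactly the bias that the RDS adjustments described above are designed to correct.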
Methods

We collected data on HIV biological and/or behavioral surveillance studies using RDS from published and unpublished manuscripts, abstracts, reports and protocols and through co-authors' firsthand knowledge. Our data-gathering methods are fully described elsewhere (Malekinejad et al. 2008). Briefly, we conducted extensive searches of published and grey literature using key words, including methodology ("chain-referral sampling" or "respondent-driven sampling"), population of interest ("men who have sex with men", "bisexual", "sex workers", "injecting and non-injecting drug users"), medical domains ("HIV", "HCV", "sexually transmitted infections", "drug abuse", "overdose" and "needle sharing"), language (documents in English, Spanish, French, Portuguese, Farsi, or Arabic) and location (studies conducted outside of the United States). We also conducted a "cited reference search" in Web of Science on the relevant papers and used the "related articles" feature in PubMed. Most of the information gathered for this paper comes from co-authors' personal involvement in specific studies and from investigators known to them.

Eligibility Criteria

We assessed articles identified through our original search and differentiated studies that used both the RDS recruitment process and analytical elements from those that did not. We abstracted specific data from each study to identify those that met the following five criteria: (1) initiated recruitment chains with members of the target population, known as "seeds"; (2) used a fixed number of recruits, known as a "recruitment quota"; (3) collected data on social network sizes for all participants, using a consistent set of parameters; (4) systematically recorded who recruited whom; and (5) generated weighted estimates of variable frequency and confidence intervals using network size data. For studies that only recently completed data collection, we excluded any that did not intend to use weighting in their analysis; combined an RDS sample with other sampling methods; or combined samples from multiple RDS studies that either used different eligibility criteria or that were conducted in distinct
geographical areas. We considered studies that fulfilled all inclusion criteria as complete RDS studies and included them in our review. In addition, this paper includes five studies that were excluded from our parallel review (Malekinejad et al. 2008). These five studies, four (two of SWs and two of MSM) from the Caribbean (Ogunnaike-Cooke and Bombereau 2007) and one (in SWs) from Montenegro (Simic et al. 2006), were excluded from the parallel paper because they did not meet an additional criterion that the sampled population be sufficiently socially networked, as evidenced by at least one of the seeds generating a minimum of three referral waves or attaining a minimum of 10% of the desired sample size. These five studies are included here to capture the full range of studies that were designed to use RDS methods, including those that failed to propagate referral chains, and because they are useful in describing some of the challenges related to RDS implementation.

Analysis

To assess whether certain RDS study characteristics might affect participant recruitment, we defined the sample size ratio as the number of subjects actually recruited through RDS (recruited sample size) divided by the number of subjects pre-specified during the study design (calculated sample size). We then examined factors in the implementation, design and analysis phases of the studies that might influence the sample size ratio. These factors included the type of study population (IDUs, MSM, SWs or HRH men), whether formative work was conducted before initial recruitment, the use of mobile as opposed to fixed venues, and the use of monetary primary and secondary incentives for recruitment rather than combinations of other incentives. Primary incentives refer to gifts or money provided to participants upon completion of the study process (e.g., interview and biological specimen); secondary incentives refer to gifts or money provided to participants who successfully recruit their peers to the study. Other variables were the number of sites used for recruitment of subjects (1 vs. >1), use of a coupon expiration date to limit referral and enrollment time (limited vs. no limitation), and use of a design effect ≥1.5 to calculate the sample size. We applied a Student t-test with no assumption of equality of variance to assess whether the mean of the sample size ratio differed based on various study characteristics. We applied a multivariate Poisson regression model to generate adjusted incidence rate ratios (IRR) using a robust procedure. The robust estimator of variance takes into account the heterogeneity of observations. In the Poisson model, we included only predictors in which the
mean of the sample size ratio was statistically significantly different between the two levels of the predictor, using a P-value threshold of ≤0.2. We conducted a sensitivity analysis by repeating the multivariate analysis after excluding the five studies in which none of the seeds generated a minimum of three referral waves or, where the number of waves was not reported, that did not attain a minimum of 10% of the desired sample size.

Specific RDS study examples used to describe operational, design and analytical limitations were based on co-authors' firsthand experiences, field notes and discussions with study investigators. Complete references for these studies are listed in the appendix of our parallel review paper (Malekinejad et al. 2008).
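As an illustration of the analysis just described, the Python sketch below shows how a Welch t-test on mean sample size ratios and a Poisson model with robust (sandwich) variance could be fitted; the data frame, variable names and values are hypothetical and this is a sketch of the general approach, not the authors' actual code or dataset.

# Sketch of the analysis approach described above (hypothetical data):
# Welch's t-test on mean sample size ratios, then a Poisson regression with
# robust variance whose exponentiated coefficients are read as adjusted IRRs.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

# One row per RDS study (all values made up for illustration)
studies = pd.DataFrame({
    "ratio":         [1.02, 0.95, 0.40, 1.00, 0.88, 0.75, 1.01, 0.60],  # recruited/calculated
    "idu":           [1, 1, 0, 1, 0, 0, 1, 0],                          # 1 = IDU study
    "coupon_expiry": [0, 1, 1, 0, 1, 1, 0, 1],                          # coupon expiration used
})

# Welch's t-test: no assumption of equal variances between the two groups
idu = studies.loc[studies["idu"] == 1, "ratio"]
other = studies.loc[studies["idu"] == 0, "ratio"]
t_stat, p_value = stats.ttest_ind(idu, other, equal_var=False)
print(f"Welch t = {t_stat:.4f}, P = {p_value:.4f}")

# Poisson model with robust (sandwich) variance; exp(coefficients) give IRRs
X = sm.add_constant(studies[["idu", "coupon_expiry"]])
fit = sm.GLM(studies["ratio"], X, family=sm.families.Poisson()).fit(cov_type="HC0")
print(np.exp(fit.params))       # adjusted incidence rate ratios
print(np.exp(fit.conf_int()))   # 95% confidence intervals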
Results

We identified 128 studies that met our inclusion and exclusion criteria, including five (three in SWs and two in MSM) that were excluded from our parallel review of 123 studies. With the exception of a study conducted among HRH men in Cape Town, South Africa in 2007, the countries, cities and years of the studies by study population are presented in Table 1.

Of the 128 studies, 123 (96%) reported their calculated sample size, which ranged from 100 to 800 and averaged 275. One hundred and eighteen (92%) studies also reported their final sample size, which ranged from 2 to 963 and averaged 263 (median 225, interquartile range [IQR] 152–360). In total, 118 studies reported both calculated and recruited sample sizes, with an average recruited/calculated sample size ratio of 0.94 (median 1.00, IQR 1.00–1.02). Of these 118 studies, 100 (85%; 58 in IDUs, 29 in MSM, 12 in SWs and one in HRH men) came within 90% of their calculated sample size.

Table 2 summarizes the bivariate analyses of sample size ratios by different characteristics of RDS studies. The mean of the sample size ratios was higher among IDU studies compared with studies of the other most-at-risk populations (Student t-test [t] = -2.9033, degrees of freedom [df] = 59.00, P < 0.01) and lower among studies that used an expiration date on the coupon (t = 2.8199, df = 45.97, P < 0.01) compared with studies that did not use specific time limits for recruitment. Adjusting for potential confounding variables in the multivariate Poisson model, using a coupon expiration date was associated with a 12% decrease in sample size ratio (IRR = 0.88, 95% confidence interval [CI] 0.78–0.99) (Table 3). IDU studies were associated with an 11% increase in sample size ratio compared with studies of other most-at-risk populations (IRR = 1.11, CI 0.99–1.24), but this association was only marginally significant (P < 0.1). None of the factors remained significantly associated with the sample size ratio after removing the five studies that had not generated a minimum of three waves.
Table 1 Countries and cities where RDS studies were conducted among injecting drug users, men who have sex with men and sex workers, by year of study^a

2007
Bosnia-Herzegovina: Sarajevo, Banja Luka, Zenica
Estonia: Tallinn, Kohtla-Jarve
Estonia: Tallinn
Indonesia: Bandung, Batam, Malang
Indonesia: Bandung, Surabaya
Tanzania: Zanzibar (three studies)

2006
Antigua and Barbuda: St. John's (two studies)
Bangladesh: Dhaka
China: Beijing, Guangzhou, Jinan
Croatia: Zagreb
Egypt: Cairo
Egypt: Alexandria
Honduras: Comayagua, La Ceiba, San Pedro Sula, Tegucigalpa
Honduras: La Ceiba, San Pedro Sula
India: Bishenpur and Churachandpur Districts, Manipur; Phek and Wokha Districts, Nagaland; Mumbai and Thane Districts, Maharashtra
India: Dimapur District, Nagaland; Mumbai and Parbhani Districts, Maharashtra
Iran: Tehran
Kosovo: Pristina
Paraguay: Ciudad del Este (two studies)
St. Vincent and the Grenadines: Kingstown (two studies)
Ukraine: Cherkasu, Dneprodzerjunsk, Dnepropetrovsk, Donetsk, Kharkov, Kherson, Kahovka, Kiev, Kirovograd, Krivoy Rog, Lugansk, Lutsk, Marupol, Makeevka, Nikolaev, Norovokunsk, Odessa, Poltava, Sevastopol, Simferopol, Smela, Sumy, Voznesensk, Yalta, Znamenka
Ukraine: Cherkasu, Donetsk, Dneprodzerjunsk, Ivano-Frankovsk, Kherson, Kiev, Krivoy Rog, Lugansk, Nikolaev, Odessa, Simferopol, Yalta
United Kingdom: Bristol

2005
Albania: Tirana (two studies)
Brazil: Porto Alegre, Santos
Brazil: Campinas, Fortaleza
Cambodia: Battambang, Phnom Penh, Siem Reap
China: Yangiang
China: Beijing
Estonia: Kohtla-Jarve, Tallinn
Indonesia: Surabaya
Mexico: Ciudad Juarez, Tijuana
Montenegro: Podgorica (two studies)
Nepal: Various locations
Papua New Guinea: Goroka, Port Moresby
Papua New Guinea: Goroka
Russia: St. Petersburg
Serbia: Belgrade (two studies)
Ukraine: Kiev, Odessa, Pavlohrad, Poltava
Vietnam: Hanoi, Ho Chi Minh City
Vietnam: Can Tho, Danang, Hanoi, Ho Chi Minh City

2004
Indonesia: Bandung
Nepal: Katmandu
Russia: Togliatti
Uganda: Kampala
Vietnam: Hai Phong, Hanoi, Ho Chi Minh City
Vietnam: Hai Phong, Ho Chi Minh City

2003
Thailand: Bangkok

^a Reported through October 1, 2007
Table 2 Comparison of the mean of the sample size ratio by different study characteristics in 128 studies^a

Variable                                             n     Mean (90% CI)^b      df^c     t-statistic^d   P-value^e
IDU group (vs. other risk groups)
  Yes                                                61    1.00 (0.99, 1.01)    58.99    -2.9033         0.0052
  No                                                 57    0.87 (0.79, 0.94)
MSM group (vs. other risk groups)
  Yes                                                37    0.89 (0.81, 0.97)    56.49     1.2718         0.2086
  No                                                 81    0.96 (0.92, 1.00)
Using mobile venue
  Yes                                                 7    0.85 (0.66, 1.04)     6.74     0.9149         0.3918
  No                                                111    0.94 (0.90, 0.98)
Conducting formative work
  Yes                                                97    0.93 (0.88, 0.97)    18.80     0.3437         0.7349
  No                                                 13    0.95 (0.85, 1.05)
Both primary and secondary incentives were monetary
  Yes                                                51    0.92 (0.87, 0.98)    98.30     0.2003         0.8417
  No                                                 51    0.93 (0.87, 1.00)
Using coupon expiration date
  Yes                                                45    0.85 (0.75, 0.94)    45.97     2.8199         0.0071
  No                                                 55    1.00 (0.99, 1.01)
Considering design effect (≥1.5)
  Yes                                                31    0.99 (0.93, 1.06)    80.67    -1.5455         0.1261
  No                                                 59    0.91 (0.85, 0.98)
Using more than one recruitment site
  Yes                                                28    0.99 (0.93, 1.05)    56.02    -1.4968         0.1393
  No                                                 90    0.92 (0.87, 0.97)

^a Sample size ratio is defined as the ratio of recruited sample size divided by calculated sample size; the number of studies analyzed differs for each characteristic due to missing data
^b Mean and 90% confidence interval of sample size ratios
^c Satterthwaite's degrees of freedom
^d Approximate t-statistics of t-tests comparing means of sample size ratios with unequal variances
^e Characteristics with a P-value ≤0.2 were included in the multivariate model

Table 3 Predictors of the sample size ratio^a in multivariate analysis (number of studies = 84)

Variable                                             IRR (95% CI)^b
IDU group (vs. other groups)                         1.11 (0.99–1.24)
Considering design effect (≥1.5)                     1.10 (0.93–1.31)
Using more than one recruitment site                 0.90 (1.02–1.22)
Using coupon expiration date                         0.88 (0.78–0.99)^c

^a Sample size ratio is defined as the ratio of recruited sample size divided by calculated sample size
^b Adjusted incidence rate ratio (IRR) based on multivariate Poisson regression model with 95% confidence interval using robust procedure
^c P-value of coefficient < 0.05
Apart from these quantitative predictors, we identified a number of areas in which studies experienced difficulties, which are described below.

Operational Limitations

Defining Eligibility and Measuring Social Network Sizes

Of the 128 studies identified, 66 (51%) (45 of IDUs, 19 of MSM and two of SWs) had insufficiently defined eligibility criteria, which included not clearly defining parameters such as eligible age range or sex, geographic area or
the study population's specific surveillance behavior. These omissions are significant in RDS studies because the eligibility criteria are used to build the question on social network size, which is necessary for analyzing RDS data. Several studies, for example, neglected to include geographic or age parameters in their network size question, which likely resulted in recruitment outside of the city/district or age group under investigation.

To improve the accuracy of responses to the question about social network sizes, some studies divided the question into several parts. In an MSM survey in Estonia, for instance, the question was divided into three parts: (1) How many men who have sex with other men do you know by name, who also know you by name, and whom you have seen in the past month?; (2) How many of the men you mentioned in the last question live in Tallinn?; and (3) How many of those men who live in Tallinn are at least 18 years old? (Trummal et al. 2007).

Some studies also neglected to include the social network question during interviewing, thereby ignoring a standard requirement of RDS methodology and analysis (Nepal, Kenya [2002]). In the first round of a study of MSM conducted in China (2004), the question regarding participants' social network size was not asked for one-third of the sample and later had to be imputed from the participants' number of sex partners (Ma et al. 2008). This problem most likely resulted in deflated social network sizes, as not all members of someone's social network will be sex partners. China was able to conduct three rounds of RDS among MSM in 2004, 2005 and 2006, and the second and third rounds were successful in gathering data on social network size. The social network size question requires careful construction if RDS is to be successfully conducted.

Designating Appropriate Incentives

Some studies used non-monetary forms of incentives, which have included phone cards, vouchers for food, a
"gift bag" of items such as condoms, lubricants, educational materials, and gift certificates (IDUs in Estonia [2005, 2007]; MSM in Estonia [2007], Eastern Caribbean [2006], Honduras [2006], and Campinas, Brazil [2005]; SWs in Brazil [2006], Eastern Caribbean [2006], and Honduras [2006]; HRH men in South Africa [2006]). In our quantitative analysis we found that using either monetary or non-monetary incentives was not associated with the success or failure of a study.

Incentives that are too high can result in coupon bartering and selling. In Serbia (2005), where the incentive to complete the interview process was ten Euros and the incentive to recruit up to three peers was five Euros for each successful recruit, some IDUs sold coupons to their peers. A peer could buy a coupon for half of the ten-Euro incentive, enroll in the survey, and still net five Euros by participating. Furthermore, IDUs could buy recruitment coupons for one to two Euros each and make three to four Euros for each person recruited. A similar situation occurred in Phnom Penh, Cambodia (2005), where some MSM waited in front of the interview site to collect coupons for a small price and then sold them at a higher price to other MSM. MSM who bought a coupon were able to redeem it to enroll and participate in the study and receive a larger amount of money than they originally paid for the coupon. Similar situations occurred in studies of IDUs in Vietnam (2004) and Tanzania (2007). In Tanzania (2007), Montenegro, Russia, and Serbia (2005), and Vietnam (2004), some IDUs waited in front of the survey site to acquire a coupon from an exiting IDU, thereby increasing the chances that coupons were being distributed to strangers rather than to someone known to the recruiter (and in his or her social network). In Tanzania (2007) and Goroka, Papua New Guinea (2005), some individuals misrepresented themselves as MSM (i.e., masking), and in Tanzania (2007) and Serbia (2005) some individuals misrepresented themselves as IDUs to participate in the study and receive an incentive.

High incentives also can result in logistical challenges. In studies of IDUs in Tanzania (2007) and Vietnam (2004) and MSM in Cambodia (2005), incentives were perceived as so attractive to the target population that interview staff became overwhelmed with the number of referrals wanting to redeem their coupons. In Vietnam, the waiting room of the RDS interview site sometimes contained as many as 20 IDUs waiting to be interviewed. When some IDUs were told to return another day, they became disruptive and threatened staff. Furthermore, in Serbia (2005) and Vietnam (2004), the high incentive increased the possibility that IDUs would try to participate in the study more than once.

Low incentives, on the other hand, can result in slow recruitment. During the first three months of the RDS study in Iran (2006), for instance, IDUs did not participate due to the perceived low incentives. Some SWs in Vietnam (2004)
reported not participating because the incentive was "not worth the time," and SWs in Montenegro (2005) believed that the low incentive was not "worth the risk" of participating. No monetary primary incentive was offered to MSM and SWs in the Eastern Caribbean (2006), which may be part of the reason why recruitment failed in this location (Ogunnaike-Cooke and Bombereau 2007). Studies in Antigua and Barbuda ultimately recruited only six of 111 MSM and 11 of 218 SWs, and studies in St. Vincent and the Grenadines recruited seven of 175 MSM and four of 218 SWs. However, among IDUs in Russia (2005) and MSM in Uganda (2005) and China (2004, 2005), no primary incentives were used, and recruitment was steady and ongoing. It should be noted that in Uganda, recruitment stopped before attaining the sample size due to media exposure about the location and nature of the study, resulting in MSM being afraid to participate. In response to incentives' impact on recruitment, investigators conducting studies among SWs and MSM in Tanzania (2007) decreased incentives during data collection, and investigators studying MSM in Estonia (2006) and IDUs in Iran (2007) increased them.

Although not explicitly called an incentive, clinical examinations, testing and test results are seen by some target populations as an inducement to participate in the survey. MSM in Bangladesh (2006) and SWs in Northeastern India (2005), for example, reported participating in the study because of the clinical examination and/or test results they were offered (Johnston et al. 2007b). SWs in São Paulo, Brazil (2005) reported being more interested in the rapid HIV test and results than in another type of incentive. On the other hand, recruitment in some surveys suffered due to the provision of testing and results. In Northeastern Brazil (2005) and some cities in Ukraine (2006), many MSM participants did not want to be tested for HIV. Follow-up research among MSM in Estonia (2007) indicated that many did not enroll in the RDS study because they did not want to provide a blood specimen and then feel pressured to learn their HIV test results (Trummal et al. 2007). In a study in Kosovo (2006), some IDU respondents did not like using the penile swab. However, in Croatia (2006), most MSM were willing to provide a rectal swab, and in Honduras (2006), most SWs were willing to provide a vaginal swab as part of the survey.

Design Limitations

Social Networks

RDS will not function if the population of interest is not socially networked. Two surveys conducted among SWs in Serbia and Montenegro (2005) did not attain their calculated sample size, most likely due to insufficient social networks among this population. The study in Serbia recruited 209 of 400 SWs, and the study in Montenegro
recruited two of 150 SWs. These SWs were moved frequently and closely controlled by criminal elements; they did not have the freedom to roam, and therefore they formed closed social networks, mainly with other restricted SWs in their immediate location (Simic et al. 2006). In the Eastern Caribbean (2006) and in Paraguay (2006), where 160 of 425 participants were recruited, SWs also were found to form small and closed social networks, making it impossible to maintain recruitment (Ogunnaike-Cooke and Bombereau 2007). RDS studies among IDUs in Iran (2006) and Egypt (2006) failed to recruit a significant number of female IDUs, potentially due to weak social networking among female IDUs and fear of being identified as an injector. Some studies have had to convert from RDS to a convenience sample due to difficulties in recruiting specific types of participants (SWs in Egypt [2006] and Estonia [2006] and female IDUs/SWs in Russia [2004]) (Simic et al. 2006; Trummal et al. 2006).

In studies that looked at diversity in social network connections, SWs in Ho Chi Minh City were found to form diverse social network associations by type of sex work, even revealing SWs who self-reported working in brothels that were thought not to exist (Johnston et al. 2006). In Bangladesh, MSM were found to have diverse network patterns by self-identified type of MSM (e.g., bisexual, khoti, panthi, etc.) (Johnston et al. 2007b). Other studies have found that some populations do not form diverse social network connections across certain characteristics. In Porto Alegre, Brazil (2005), the social networks of female, male and transgender SWs did not cross over but rather acted as three independent networks, and in Ho Chi Minh City, Vietnam (2004), higher-paid and lower-paid SWs from different areas of the city formed independent social network ties. Insufficiently linked social networks occurred in some MSM studies as well. In Northeastern Brazil (2005), MSM from higher social classes did not participate in RDS for a range of reasons, including the location of the interview site and the incentives, but also because they were not well networked with MSM from lower social classes, and in Tanzania (2007) drug-injecting MSM formed groups distinct from non-injecting MSM. Difficulties in achieving crossover among distinct groups within a social network have been overcome by identifying seeds who will recruit across those groups; for example, SWs and IDUs in India (2006) and IDUs in Vietnam (2004).

Analytic Limitations

Calculating a Sample Size Using an Appropriate Design Effect

Few studies (36%) reported calculating their sample size using a design effect to account for potential intra-class
correlation among members of social networks. These studies included 14 in IDUs (22% of all IDU studies), 14 in MSM (36% of all MSM studies), 16 in SWs (76% of all SW studies) and one in HRH men.

Achieving Equilibrium

Once data are gathered, they must be properly analyzed. The first step in analysis is to determine that key variables, including sample characteristics, have reached equilibrium, an indication that the final sample is not biased by the inclusion of the purposively selected seeds (Heckathorn 1997, 2002a). If equilibrium is not achieved, a larger sample size may be needed. However, the achievement of the calculated sample size, when calculated correctly and with all other methodological and analytical issues being equal, is an important indicator of survey success. Among the 111 studies that reported on equilibrium, 52 IDU studies (91% of all IDU studies), 31 MSM studies (91% of MSM studies), 15 SW studies (79% of SW studies) and the one HRH men study reported reaching equilibrium. Notably, equilibrium was reached on some variables in three surveys that fell short of their calculated sample size (MSM in Estonia [2007], MSM in Uganda [2005], and SWs in Serbia [2005]).

Adjusting Data

All but two studies reported using RDSAT or some other statistical software package that adjusts for differential network sizes and recruitment patterns. Currently, RDSAT can generate proportion estimates and confidence intervals as well as output on homophily, average network sizes and equilibrium. Although the primary purpose of HIV surveillance is to generate descriptive data to understand the status of the epidemic and measure trends over time, biological and behavioral data collected from most-at-risk groups using RDS would, in some situations, be more useful if we could accurately measure predictors of HIV and other STI risks and behaviors. As a result of this limitation, only three of the surveys we reviewed used regression analyses to show associations in their data (HRH men in South Africa [2007]; MSM in Croatia [2006]; IDUs in Estonia [2005]). Regression analyses using the Estonian IDU data accounted for "intracluster correlation coefficients," based on the degree to which observations from individuals recruited by the same recruiter were correlated, and a random effects model was used to adjust for any correlations (Platt et al. 2006). The Croatian RDS data on MSM were not adjusted at all (Stulhofer et al. 2007). Logistic regressions using the South African data on HRH men were calculated using individualized weights
generated on the dependent variable using RDSAT 5.6 (Johnston et al. 2008). These weights are calculated based on each participant’s social network size (degree weight) and recruitment patterns (recruitment weight) (Heckathorn 2007). There still is no consensus among statisticians as to whether RDS data can be appropriately weighted for multivariate analysis, and discussions are underway among the RDS community on how to advance the use of RDS data in regression analyses.
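For illustration only, the Python sketch below shows one way such individualized weights might be applied in a weighted logistic regression; the file name, column names and the choice to treat the weights as frequency weights are all assumptions, this is not RDSAT output handling or the authors' analysis, and the resulting confidence intervals do not incorporate RDS bootstrapping, so estimates should be interpreted with caution.

# Hedged sketch: applying individualized RDS weights in a logistic regression.
# File name, column names and the weighting choice are assumptions for
# illustration; confidence intervals here ignore the RDS design (no bootstrapping).
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("rds_study_with_weights.csv")   # hypothetical export with an 'rds_weight' column

X = sm.add_constant(df[["age", "multiple_partners"]])   # made-up covariate names
model = sm.GLM(
    df["hiv_positive"], X,
    family=sm.families.Binomial(),
    freq_weights=df["rds_weight"],   # individualized weights treated as frequency weights
)
result = model.fit()
print(result.summary())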
Discussion

We reviewed RDS studies that gathered HIV behavioral and/or biological data from hard-to-reach groups in international settings. Based on information gathered from RDS studies identified through October 1, 2007, improved sample size ratios were found in studies sampling IDUs. Relatively more successful outcomes among IDUs are plausible given that RDS was originally developed as a method to sample and provide peer education among IDUs (Broadhead et al. 1995, 1998; Heckathorn 2007; Heckathorn et al. 1999, 2002b). Among IDUs, social networks are driven by consumption of an illicit product and the need to maintain information about source, price and quality, as well as by sharing equipment and drugs. We note, however, that when, in a sensitivity analysis, we removed the five studies that failed to propagate beyond the third wave, the association between IDU studies and larger sample sizes was no longer significant. This result suggests that RDS among IDU populations is only marginally easier than among other populations.

Although most RDS studies we reviewed attained their predetermined sample size, a handful of studies did not. In most cases, excluding operational factors, this failure was most likely due to a lack of social networking among the target population, which was most common in studies of SWs. Clearly a range of issues are involved here: members of the population may not have the freedom to recruit or participate in the study, whether due to physical constraint or stigma, or the social network ties may not be sufficient to overcome other barriers to participation. This explanation may also be offered when respondents are difficult to recruit, whatever the actual reason. Chopra et al. (2008) demonstrated that a group of single, sexually very active men were able to recruit each other; on the surface this did not appear to be a "standard" network. Undoubtedly, this issue of "networkability" will continue to be discussed.

Our quantitative analysis did not show any association between conducting formative research or the level of incentive and the outcome of interest (nine studies did not report on prior formative research). However, our data are limited by the lack of detailed information about the quality
and types of formative research conducted in many of these studies. Although most studies used both primary and secondary incentives during data collection, our results show no significant difference in sample size ratio between studies that used both primary and secondary incentives and those that used only one incentive or none. Our data are limited, however, because the amount of the secondary incentive was not reported by 44 studies. More accurate data on all major components of RDS studies, including detailed information about the comprehensiveness of pre-survey assessment, would help us understand the important factors that can influence RDS recruitment. More detailed information about the number and type of variables on which studies reported reaching equilibrium could also shed light on the success or failure of studies.

Results from specific study experiences demonstrate numerous operational, design and analytical challenges. Most importantly, many studies failed to use clearly defined eligibility criteria, which are essential in structuring the question about social network size. The collection of social network data must be as accurate as possible and include each component of the eligibility criteria used in a survey. If the eligibility criteria, for instance, are men who have engaged in anal sex with another man in the past 6 months, are 18 years of age or older and live in City A, the network question should be similar to: "How many men do you know (and who know you) who have engaged in anal sex with another man in the past six months, who are 18 years or older, who live in City A, and whom you have seen in the last month (or other consistent time frame)?" If one of the criteria (e.g., age) is not included in measuring social network sizes, then those under 18 years of age will have a probability of inclusion equal to that of those 18 years and older.

Some studies had difficulty determining an appropriate incentive level, thereby introducing logistical and sampling challenges during data collection. In response, some studies increased or reduced their incentive amount during data collection. Selecting an appropriate incentive is not an exact science and is often one of the more difficult RDS components to plan, especially in light of differing country-specific socioeconomic factors. It is unknown to what extent those studies used formative research to determine which incentives were more likely to be of an appropriate amount or type. It is logical that input from target populations and key informants, as well as a thorough understanding of country-specific economic and political contexts, would improve the chances of selecting an appropriate incentive.

It is not known to what extent the provision of HIV and/or STI testing and test results deterred or encouraged participation in some studies. Publications from surveys conducted in Bangladesh of syphilis among MSM and in Cape Town of HIV among HRH men have
discussed the populations' positive responses to receiving on-site testing and results as a motivator to participate in the RDS study (Johnston et al. 2007b, 2008). It would be interesting to investigate whether most-at-risk populations are more receptive to HIV and/or STI testing, test results and clinical examinations in an RDS setting in countries where these services are limited, unavailable or offered in a stigmatizing manner than in countries where these services are well developed, easily available and offered supportively to most-at-risk populations.

As in most surveys, the sample size for RDS is typically calculated either based on the precision with which estimates are to be made, in the case of estimates at a single time point, or based on statistical power for testing differences, in the case of estimating changes in HIV prevalence or a certain risk behavior over time. Most HIV biological and/or behavioral surveillance surveys are meant to provide point estimates and to be repeated over time with the same sampling method. In this regard, many RDS survey sample sizes are calculated based on both the desired precision in the estimate of a characteristic such as HIV prevalence and the power for detecting a change in that characteristic over time (FHI 2000; Salganik 2006). When calculating a sample size, it is recommended that an appropriate design effect be applied to account for variability in the RDS sampling design. Many of the studies we reviewed did not do so. Recent evidence suggests that a design effect of at least 2.0 is most suitable for this type of sampling design (FHI 2001; Salganik 2006).

With respect to the final analysis of data collected from RDS surveys, there are recommended methods for making simple point estimates of proportions and for using bootstrapping (Salganik and Heckathorn 2004; Salganik 2006) to calculate confidence bounds on those proportions. These methods are implemented in the RDSAT software, which was used by all but two studies reviewed here. Unfortunately, there is no consensus on the appropriate statistical methods for using RDS data in more complex analyses, such as multivariate linear or logistic regression; thus, there is no standardized software for producing or testing the significance of adjusted odds ratios. One approach has been to use the weights exported from RDSAT for complex analyses, which is by itself insufficient, as it does not incorporate the bootstrapping methods needed to calculate confidence intervals (Salganik and Heckathorn 2004; Salganik 2006).
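To make the design-effect recommendation above concrete, the short Python sketch below applies the standard single-proportion sample size formula and then inflates it by a design effect of 2.0; the prevalence, precision and confidence level are assumed values chosen only for illustration and are not drawn from any of the reviewed studies.

# Sketch: design-effect-adjusted sample size for a single-point prevalence
# estimate (standard survey formula; all input values are assumptions).
from math import ceil

z = 1.96       # z-score for 95% confidence
p = 0.10       # expected prevalence (assumed)
d = 0.05       # desired absolute precision (assumed)
deff = 2.0     # design effect of at least 2.0, as recommended for RDS

n_srs = (z ** 2) * p * (1 - p) / (d ** 2)   # simple random sample size (~138)
n_rds = ceil(deff * n_srs)                  # inflated for the RDS design (~277)
print(f"SRS n = {n_srs:.0f}, RDS-adjusted n = {n_rds}")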
Recommendations

Based on our review of RDS studies among IDUs, MSM, SWs and HRH men, we identified several operational, design and analytical challenges, which we address in our
recommendations. It is critical that both the recruitment strategy and the appropriate analysis, taking into account social network sizes and who recruited whom, are used before a study is classified as using RDS. Without the necessary analytical adjustments, RDS is just a very good chain-referral sample. We recommend that minimum reporting standards include the criteria listed in the Methods section of this paper, as well as how the sample size was calculated, the eligibility criteria for data collection, the network size question used for data analysis, and whether equilibrium was reached on key variables of interest.

We also recommend improvements in measuring participants' social network sizes. We suggest that extra training be provided to interviewers to ensure accurate responses to this type of open-ended question. A quick response from a participant to the social network size question is likely to be inaccurate, and we encourage a standardized method of probing for responses. Some studies have divided the social network size question into several parts to ensure accuracy. We also suggest that RDS studies conduct formative research, especially in areas where outreach and other services have weak links to the target population, to explore the population's social networking properties and its ability and willingness to participate in the study, to select an appropriate site for conducting the survey, and to identify appropriate incentive types and amounts.

With regard to analytical considerations, we recommend that RDS studies now consider using a minimum design effect of 2.0 to calculate sample sizes, or consider adjusting P-values for variables for which the design effect used in the sample size calculation was too small. Although many surveillance activities will not require multivariate analysis, it is important to address the limitations of RDS data for conducting regression analyses. There have been attempts to account for the biases inherent in this method (e.g., through clustering adjustments or individualized weights) when examining associations of HIV prevalence and risk. However, these methods have limited utility. Until better analytical methods for estimating associations are found, regression analyses conducted using individualized weights exported from RDSAT will need a caveat to remind policy planners that findings from these data must be interpreted with caution.

RDS is an effective and useful tool for collecting HIV and STI behavioral surveillance data from hard-to-reach populations. Given that until recently there has been limited instruction on how to accurately conduct RDS (Johnston 2007a; www.respondentdrivensampling.org), it is noteworthy that a large number of countries have successfully implemented RDS studies to better understand their HIV and STI prevalence and behavioral trends among
populations that had not been sampled before. We hope that the examples from the RDS studies reviewed in this paper will provide future users of RDS with the opportunity to learn from past successes and challenges to improve the operational, design and analytical aspects of this important and useful methodology.

Acknowledgments Special thanks to all of the RDS investigators who shared their protocols and related documents with us. We also wish to thank Meade Morgan, Senior Statistician, Global AIDS Program, Centers for Disease Control and Prevention, Atlanta, Georgia, for his contribution to the discussion on conducting regression analysis with RDS data, and Daniel Tancredi, of the Center for Healthcare Policy and Research, University of California, Davis, for his assistance with the Poisson regression analysis. The authors acknowledge the support of the University Technical Assistance Projects at the Center for Global Health Equity, Tulane University, and the Institute for Global Health, University of California, San Francisco.

Disclaimer: The findings and conclusions in this paper are those of the authors and do not necessarily represent those of donor agencies.
References

Abdul-Quader, A. S., Heckathorn, D. D., Sabin, K., & Saidel, T. (2006). Implementation and analysis of respondent driven sampling: Lessons learned from the field. Journal of Urban Health, 83(Suppl. 7), 231–35. doi:10.1007/s11524-006-9029-6.

Broadhead, R. S., Heckathorn, D. D., Grund, J. C., Stern, S., & Anthony, D. L. (1995). Drug users versus outreach workers in combating AIDS: Preliminary results of a peer-driven intervention. Journal of Drug Issues, 25(3), 531–564.

Broadhead, R. S., Heckathorn, D. D., Weakliem, D., Anthony, D. L., Madray, H., Mills, R., et al. (1998). Harnessing peer networks as an instrument for AIDS prevention: Results from a peer-driven intervention. Public Health Reports, 113(Suppl. 1), 42–57.

Chopra, M., Townsend, L., Johnston, L. G., Mathews, C., Shaikh, N., Tomlinson, M., et al. (2008). Sexual risk behaviour among men with multiple, concurrent female sexual partners in an informal settlement on the outskirts of Cape Town. MRC Policy Report, South Africa, April 2008.

Erickson, B. H. (1979). Some problems of inference from chain data. Sociological Methodology, 10, 276–302.

Family Health International (2000). Behavioral surveillance surveys: Guidelines for repeated behavioral surveys in populations at risk of HIV. Arlington, VA: Family Health International. Available at: http://www.fhi.org/en/HIVAIDS/pub/guide/bssguidelines.htm.

Family Health International (2001). Evaluating programs for HIV/AIDS prevention and care in developing countries. Arlington, VA: Family Health International. Available at: http://www.fhi.org/en/HIVAIDS/pub/Archive/evalchap/index.htm.

Frost, S. D., Brouwer, K. C., Firestone Cruz, M. A., Ramos, R., Ramos, M. E., Lozada, R. M., et al. (2006). Respondent-driven sampling of injection drug users in two U.S.-Mexico border cities: Recruitment dynamics and impact on estimates of HIV and syphilis prevalence. Journal of Urban Health, 83(Suppl. 7), 83–97. doi:10.1007/s11524-006-9104-z.

Heckathorn, D. D. (1997). Respondent driven sampling: A new approach to the study of hidden populations. Social Problems, 44, 174–199. doi:10.1525/sp.1997.44.2.03x0221m.

Heckathorn, D. D. (2002). Respondent driven sampling II: Deriving valid population estimates from chain-referral samples of hidden populations. Social Problems, 49, 11–34. doi:10.1525/sp.2002.49.1.11.

Heckathorn, D. D. (2007). Extensions of respondent driven sampling: Analyzing continuous variables and controlling for differential recruitment. Sociological Methodology, 37(1), 151–207. doi:10.1111/j.1467-9531.2007.00188.x.

Heckathorn, D. D., Broadhead, R. S., Anthony, D. L., & Weakliem, D. (1999). AIDS and social networks: HIV prevention through network mobilization. Sociological Focus, 32, 159–179.

Heckathorn, D. D., Semaan, S., Broadhead, R. S., & Hughes, J. J. (2002). Extensions of respondent-driven sampling: A new approach to the study of injection drug users aged 18–25. AIDS and Behavior, 6(1), 55–67. doi:10.1023/A:1014528612685.

Johnston, L. G. (2007). Conducting respondent driven sampling (RDS) in diverse settings: A manual for planning RDS studies. Atlanta, GA/Arlington, VA: Centers for Disease Control and Prevention/Family Health International.

Johnston, L. G., Khanam, R., Reza, M., Khan, S. I., Banu, S., Alam, M. S., et al. (2007b). The effectiveness of respondent driven sampling for recruiting males who have sex with males in Dhaka, Bangladesh. AIDS and Behavior, 12(2), 294–304.

Johnston, L. G., O'Bra, H., Chopra, M., Mathews, C., Townsend, L., Sabin, K., et al. (2008). The associations of HIV risk perception and voluntary counseling and testing acceptance to HIV status and risk behaviors among men with multiple sex partners in a South African township. AIDS and Behavior [Epub ahead of print].

Johnston, L. G., Sabin, K., Hien, M. T., & Huong, P. T. (2006). Assessment of respondent driven sampling for recruiting female sex workers in two Vietnamese cities: Reaching the unseen sex worker. Journal of Urban Health, 83(Suppl. 7), 16–28. doi:10.1007/s11524-006-9099-5.

Kemeny, J., & Snell, J. (1960). Finite Markov chains. Princeton, NJ: Van Nostrand.

Ma, X., Zhang, Q., He, X., Zhao, J., Sun, W., Yue, H., et al. (2008). Trends in prevalence of HIV, syphilis, hepatitis C, hepatitis B, and sexual risk behavior among men who have sex with men: Results of three consecutive respondent-driven sampling surveys in Beijing, 2004–2006. Journal of the Acquired Immune Deficiency Syndromes, 45(5), 581–587.

Magnani, R., Sabin, K., Saidel, T., & Heckathorn, D. D. (2005). Sampling hard to reach and hidden populations for HIV surveillance. AIDS, 19(Suppl. 2), S67–S72. doi:10.1097/01.aids.0000172879.20628.e1.

Malekinejad, M., Johnston, L. G., Kendall, C., Kerr, L. G. F. S., Rifkin, M., & Rutherford, G. W. (2008). Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: A systematic review. AIDS and Behavior, in press.

Ogunnaike-Cooke, S., & Bombereau, G. (2007). Report of the pilot behavioral and HIV seroprevalence surveillance surveys of men who have sex with men (MSM) and female sex workers (FSW) in Antigua and Barbuda and St. Vincent and the Grenadines. Port-of-Spain, Trinidad and Tobago: Caribbean Epidemiology Centre-Special Programme on Sexually Transmitted Infections.

Platt, L., Bobrova, N., Rhodes, T., Uusküla, A., Parry, J., Rüütel, K., Talu, A., Abel, K., Rajaleid, K., & Judd, A. (2006). High HIV prevalence among injecting drug users in Estonia: Implications for understanding the risk environment. AIDS, 20(16), 2120–2123.

Salganik, M. J. (2006). Variance estimation, design effects and sample size calculations for respondent driven sampling. Journal of Urban Health, 83(Suppl. 7), 98–112. doi:10.1007/s11524-006-9106-x.

Salganik, M. J., & Heckathorn, D. D. (2004). Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology, 34, 193–239. doi:10.1111/j.0081-1750.2004.00152.x.

Semaan, S., Lauby, J., & Liebman, J. (2002). Street and network sampling in evaluation studies of HIV risk-reduction interventions. AIDS in Review, 4, 213–223.

Simic, M., Johnston, L. G., Platt, L., Baros, S., Andjelkovic, V., Novotny, T., et al. (2006). Exploring barriers to 'respondent driven sampling' in sex worker and drug-injecting sex worker populations in eastern Europe. Journal of Urban Health, 83(Suppl. 7), 6–15.

Stulhofer, A., Bacak, V., Bozicevic, I., & Begovac, J. (2007). HIV-related sexual risk taking among HIV-negative men who have sex with men in Zagreb, Croatia. AIDS and Behavior [Epub ahead of print].

Trummal, A., Fischer, K., & Raudne, R. (2006). HIV-Nakkuse Levimus ning Riskikäitumine Prostitutsiooni Kaasatud Naiste Hulgas Tallinnas. Uurimuse Raport [Prevalence of HIV infection and risk behaviour among women involved in prostitution in Tallinn: Study report]. Tallinn, Estonia: Tervise Arengu Instituut.

Trummal, A., Johnston, L. G., & Lõhmus, L. (2007). Men having sex with men in Tallinn: Pilot study using respondent driven sampling: Final study report. Tallinn, Estonia: National Institute for Health Development.