International Journal of Public Health Research 2016; 4(6): 47-59 http://www.openscienceonline.com/journal/ijphr ISSN: 2381-4829 (Print); ISSN: 2381-4837 (Online)
The Delphi Technique: Methodological Considerations and the Need for Reporting Guidelines in Medical Journals Catalin Toma*, Iuliana Picioreanu Centrul Pentru Medicina de Familie Holmed, Holboca, IASI, Romania
Email address: [email protected] (C. Toma)
* Corresponding author
To cite this article Catalin Toma, Iuliana Picioreanu. The Delphi Technique: Methodological Considerations and the Need for Reporting Guidelines in Medical Journals. International Journal of Public Health Research. Vol. 4, No. 6, 2016, pp. 47-59. Received: October 14, 2016; Accepted: October 28, 2016; Published: December 6, 2016
Abstract
Background. The Delphi technique has been increasingly used in healthcare research over the past decade. Study aim. To present the development workflow and methodological issues encountered while designing a Delphi study targeting family care practitioners in northeast Romania. Method. Extraction from the existing literature and analysis of several reported factors which may decrease the internal validity of a Delphi study and, therefore, the overall study reliability: expert selection criteria and process, expert retention, dropout management, types of questionnaires, questionnaire delivery, data analysis in the qualitative and quantitative rounds, and construction of Likert questionnaires. Results. Based on the existing literature and taking into account the study objectives, we propose a design allowing a decrease in systematic bias and an improvement in the reliability of the study findings: purposive stratified sampling, financial incentives, dropouts’ inclusion in the analysis, dual administration of questionnaires, an open-ended questionnaire for round 1, and Likert forms with 7 ascending, fully labeled, horizontal response options including a midpoint. Conclusion. Careful consideration should be given to all variables that may influence the internal validity or induce a consensus bias in healthcare studies using the Delphi technique. More detailed and standardized reporting is needed for Delphi studies published in biomedical journals, and we propose a checklist to be used by readers and editors.
Keywords Delphi Technique, Health Care, Delphi Reporting, Expert Panel, Likert Questionnaire, Qualitative Research
1. Introduction

Romania has been going through a strenuous period, full of challenges for its healthcare system, which has to cope with financial constraints due to an economic crisis that reduced both public expenditure and consumer spending on health services, with personnel problems resulting from the migration of physicians and of healthcare personnel with secondary education, and with significant changes in the demographic structure [1-7]. Family medicine practice includes the integrated, continuous management of a group of patients requiring long-term care: patients with cardiovascular risk factors (hypertension, dyslipidemia, diabetes), respiratory diseases (COPD, asthma) and chronic kidney disease. We can
presume that the management of such patients is influenced by economic and social issues, the overcrowding of medical facilities, the lack of logistic support in primary care, problems caused by the migration or aging of the population, legislative instability and the health system infrastructure. The subject of primary care in Romania is not often encountered in the medical literature: a PubMed search (15.04.2015) using a few keywords and combinations (“primary care” - 93,764 results; “primary care” AND “Romania” - 49 results; “primary care” AND “Iasi” - 12 results; a search using the MeSH term “Romania” generated 2,462 results over the last 10 years) returned a remarkably low number of references related to our topic of interest. Given the importance of this patient group in daily practice, and with the purpose of assessing the complexity of the
background surrounding the relationship between family physicians and patients with chronic diseases, we aimed to conduct a study that would describe the overall framework in which family physicians provide care for such patients and evaluate the level of consensus on the main issues identified in a professionally, economically and geographically well-defined community: Iasi County, Northeast Romania. When selecting the method to be used in such a study, we had to choose between two established methods for gathering opinions on complex issues: the Nominal Group Technique and the Delphi technique [8-9]. As a method which does not require physical proximity between group members and guarantees response anonymity, the Delphi method was our preferred choice. The Delphi technique was developed and used by the RAND Corporation in a project sponsored by the US Air Force in the early 1950s. The purpose of that study was to elicit an
opinion consensus on the US industrial targets most likely to be aimed at from the viewpoint of Soviet strategic planners. A panel of 7 experts was asked to evaluate the targets and estimate the number of 20 kT atomic bombs required to reduce the industrial munitions output (expressed in USD) by 75% should a war break out between the US and the Soviet Union [9]. The Delphi method can be briefly defined as a structured communication technique that allows a group of individuals, acting as a whole, to deal with complex problems. This implies that, at each stage of the study, all participants have access to the aggregate data gathered during the previous one, and that each participant’s decision at each stage represents a sum of their personal opinions, individual reasoning, overview of the group’s approach and the scientific information available at the time the research is conducted [10]. Four key features characterize a Delphi process [8], [11], [12], namely:
Table 1. Key features of a Delphi study [8], [11], [12].
- Anonymity of response
- Multiple iterations of the questionnaire
- Controlled feedback
- Statistical derivation of the group response and its dissemination to the experts
The experts’ answers are anonymous (in relation to the other respondents), thus eliminating the possible tendency of group members to impose their opinions on others. The questionnaires are delivered repeatedly with a view to assessing the degree of consensus generated and the possible ranking of various items. The answers to the questionnaires are grouped, synthesized and provided in a standard format to all participants by the study coordinators. The statistical evaluation of the group members’ responses is performed using descriptive measures of central tendency (the median) and measures of score dispersion (frequency tables). Such values are communicated in numerical/graphical format to the participants to provide them with an overview of the position of the whole group, as well as to allow them to modify or maintain their previous judgments.
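As an illustration of this feedback step, the sketch below aggregates one item’s ratings into the median and a frequency table ready to be sent back to the panel. The data, the function name and the 7-point scale are our own assumptions (the 7-point format is the one the authors adopt later in the paper):

```python
from collections import Counter
from statistics import median

# Hypothetical 7-point ratings of one item by a 50-expert panel.
ratings = [6, 7, 5, 6, 4, 7, 6, 5, 6, 7] * 5

def group_feedback(ratings, n_options=7):
    """Standard feedback returned to the panel after a round: the median
    as measure of central tendency and a frequency table for dispersion."""
    freq = Counter(ratings)
    return {"median": median(ratings),
            "frequencies": {k: freq.get(k, 0) for k in range(1, n_options + 1)}}

print(group_feedback(ratings))
# -> {'median': 6.0, 'frequencies': {1: 0, 2: 0, 3: 0, 4: 5, 5: 10, 6: 20, 7: 15}}
```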
The Delphi technique is increasingly used in various healthcare domains in order to deal with complex questions which the classical evidence-based medicine approach could not answer or to which it was not applicable (oncology, dermatology, rheumatology, surgery, emergency medicine, primary care, neurology and many others): a query of the PubMed database returned more than 5,000 papers published since 1973, more than 3,700 of them in the last 10 years alone. The purpose of this paper is to describe the methodological steps of a Delphi study in primary care medicine and to
identify, based on the available literature, potential systematic bias risk factors which may affect the internal validity of such a study. A secondary objective is to put forward for consideration a reporting and reading template for Delphi studies published in biomedical journals.
2. Material and Method

A chart of the Delphi process and the questions that emerged can be seen in the figure below.
Figure 1. Chart of a Delphi process and main questions that emerged.
2.1. What Is an Expert and Which Are the Eligibility Criteria

The primary objective of our study is to identify the main issues considered important by the family doctors from Iasi County when dealing with patients with chronic diseases. This means that we need a panel homogeneous in terms of structure and professional experience, yet able at the same time to provide as many different opinions as possible based on that experience. The homogeneity of the expert panel is ensured by the first set of criteria, to which we added the expert’s availability:
- minimum 5 years of practice in family medicine;
- workplace located in the Iasi County area;
- at least 200 patients with chronic conditions;
- availability to participate in the study throughout its entire duration.
We believe these criteria are in accordance with the primary objective of our study and comply with the principles formulated in several publications that have explored this chapter of the Delphi method [10], [12-14]. Expert panel heterogeneity is considered important for a Delphi study [15], but in our case we opted for a homogeneous composition and tried to enhance the diversity of opinions through a selection algorithm stratifying experts by workplace location (urban/rural, weighted according to the proportion of population in Iasi County) and by distance to tertiary/secondary health care facilities (up to vs. more than 40 km).

2.2. How and How Many Experts Do We Recruit

The process of expert panel recruitment will be initiated by the coordinating committee of the study and will involve drafting a list of family physicians from Iasi County who meet the eligibility criteria. Since there are data suggesting that the stability of the responses given by a group homogeneous in terms of education level remains constant even if the number of experts involved in the Delphi process increases [16], we propose the following steps for reaching our target of a panel of 50 physicians (estimating 23 physicians from the urban area and 27 from the rural one, out of which 14 at a distance > 40 km from secondary/tertiary health care facilities); a sketch of this stratified allocation is given after the list:
- the initial list will include 100 names of physicians, who will be contacted over the phone and in writing/by email and will receive an official invitation to participate in the study;
- considering a 50% acceptance rate, the list of participants will include 50 individuals;
- should the number of those who accept be lower than 50, the experts who have confirmed will be asked to recommend a fellow practitioner to take part in the study until the number of 50 is reached;
- in addition, to cover contingencies that may prevent physicians who have accepted from starting the study, we take into consideration the designation of 5 replacements.
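The allocation above can be expressed as a small purposive stratified draw. This is a minimal sketch under our own assumptions: the roster structure and stratum names are hypothetical, and the targets reproduce the 23/27/14 estimate, reading the 14 distant practices as part of the rural stratum:

```python
import random

# Hypothetical roster entries: {"name": ..., "stratum": ...}, where the
# stratum combines practice location with distance to secondary/tertiary
# care, as described in Section 2.1.
TARGETS = {"urban": 23, "rural_near": 13, "rural_far": 14}  # 23 + 27 (13 + 14)

def draw_panel(roster, targets=TARGETS, seed=2016):
    """Purposive stratified draw: sample each stratum to its target size."""
    rng = random.Random(seed)
    panel = []
    for stratum, size in targets.items():
        eligible = [p for p in roster if p["stratum"] == stratum]
        if len(eligible) < size:
            raise ValueError(f"Not enough eligible experts in {stratum!r}")
        panel.extend(rng.sample(eligible, size))
    return panel
```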
Upon acceptance to take part in the study, each expert will receive a detailed description of it and a sheet asking for demographic information, and will sign a consent form for participation. After signing the consent form, an identification code will be assigned to each expert in order to better track the responses and organize the data.

2.3. What Method Do We Use to Retain the Experts Along the Delphi Rounds

Achieving and maintaining a high response rate is a critical objective of a Delphi study [17], as a high dropout rate will affect the consensus level and lead to biased study findings. Following VanGeest’s analysis [18] and other similar studies [19-21], in order to maximize participation and response rates we considered the use of financial incentives, allowing each expert to choose from a series of options (listed in order of their estimated influence on the response rate):
- an allowance/voucher handed out at the beginning of the study;
- an allowance handed out at the end of the study;
- the alternative of donating the equivalent of the allowance to a non-profit organization on their own behalf; or
- participation in a prize lottery.
In line with a literature review which examined the factors that affect oncologists’ participation in surveys [22], we have planned permanent communication with the panel experts in order to increase the response rate and prevent dropouts: all experts will receive motivating phone calls and a study newsletter every 2 weeks. Taking into consideration the circumstances in which the study is conducted, we trust that providing the financial incentive options described above, along with very good communication between the coordinating committee and the panel, is the most effective way to secure the objective of having at least 90% of the experts complete the entire study.

2.4. How Do We Manage the Dropouts and Their Impact on Consensus

Study dropouts will be managed as follows (a sketch of these rules, expressed as imputation logic, follows the list):
- Dropouts before Round 1, before receiving the questionnaire: we consider they do not affect the study; the expert is replaced.
- Dropouts before Round 1, after receiving the questionnaire: we consider they do not affect the study; the expert is replaced.
- Dropouts before Round 2, before receiving the questionnaire: we analyze the received data and consider that they affect the process of obtaining the consensus. The absence of a rating does not imply that the issues to which the respondent referred in Round 1 are unimportant. We can assume that an issue mentioned by the expert during Round 1 and not assessed in the following rounds would have attained a higher rating at the end of the study. Consequently, in such cases, we suggest a favorable intermediate rating for the issues/topics set forth by the expert. The analysis will be carried out both with and without these opinions, so that we have a clear image of their impact on the consensus.
- Dropouts before Round 2, after receiving the second questionnaire: we consider that leaving the study at this point may represent a disagreement, or may be an effect of the cognitive load induced by the design of the questionnaire. In such cases, we will assign a favorable rating to the topics raised by the expert in Round 1 and a neutral response or slight disagreement to the topics not covered by the expert in Round 1.
- Dropouts before Round 3, before receiving the questionnaire: the opinions expressed during Round 2 are taken into consideration.
- Dropouts before Round 3, after receiving the questionnaire: the opinions expressed in the previous round are taken into consideration.
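A minimal sketch of these dropout-handling rules as imputation logic follows. The stage labels, data structure and helper names are our own assumptions; on a 7-point scale we read “favorable intermediate” as 5, “favorable” as 6 and “neutral or slight disagreement” as 4:

```python
# Assumed rating values on a 7-point scale (not specified in this form
# by the protocol itself).
FAVORABLE_INTERMEDIATE, FAVORABLE, NEUTRAL = 5, 6, 4

def impute_ratings(expert, topics):
    """Return imputed ratings for a dropout, or None when the expert is
    simply replaced. `expert` is a dict with 'dropout_stage',
    'round1_topics' and 'round2_ratings' keys (hypothetical schema)."""
    stage = expert["dropout_stage"]
    if stage in ("before_r1", "r1_form_received"):
        return None  # does not affect the study: the expert is replaced
    if stage == "before_r2_form":
        # favorable intermediate rating for the topics the expert raised
        # in Round 1; the analysis runs with and without these values
        return {t: FAVORABLE_INTERMEDIATE for t in topics
                if t in expert["round1_topics"]}
    if stage == "r2_form_received":
        # favorable for the expert's own Round 1 topics, neutral otherwise
        return {t: FAVORABLE if t in expert["round1_topics"] else NEUTRAL
                for t in topics}
    # dropouts around Round 3: reuse the opinions from the previous round
    return dict(expert["round2_ratings"])
```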
2.5. What Kind of Questionnaire Do We Begin with

The first questionnaire sent to the participants in a Delphi study usually has one of two designs: an open-ended questionnaire or a structured questionnaire. In deciding which type would fit best, we had to take into consideration, as always, the primary objective of our study, namely to generate an overview of the problems the family physicians from Iasi County confront in providing care to their patients with chronic conditions. Moreover, we were aware that each option has its advantages and drawbacks, and we tried to weigh them before taking any decision.

Open-ended questionnaires encourage freedom of expression and a higher diversity of opinion, the collected data being rich in information. At the same time, however, there is the risk of altering the final results of the study by neglecting major issues already identified by other researchers and confirmed in the literature [23], [24]. Advantages of the open-ended questionnaire: wealth of information acquired, different perspectives. Drawbacks: failure to include major subjects already confirmed in the literature; possible limitation of the number of issues an expert brings up if no minimal number of issues is requested.

The structured questionnaire is usually based on a systematic literature review conducted to select the key issues relevant to the objectives of the study. Franklin underlined the difficulty of creating the first questionnaire and pointed out the risk of neglecting major issues, due either to the fact that they are recent and therefore not yet included in the literature or to a lack of recognition in the literature [25]. Advantages: use of already confirmed issues from the very start, and the fact that experts may add their own opinions. Drawbacks: absence/omission of major issues not yet published, confirmed or discovered; time and resource consuming.

The cultural, social, economic and occupational background represents a matrix which can individualize fairly well the perspectives of a family physician from Iasi County/Romania. Therefore, while not dismissing certain established facts in the field of family medicine pertaining to patients with chronic diseases, we are not convinced these can be extrapolated to or automatically associated with a family physician from Romania or Iasi County. As already mentioned in the introduction, the available literature covering our subject is rather sparse, which made us give up the option of a literature review and choose for the first round of our study an open-ended questionnaire soliciting the enumeration of at least 6 issues/topics.

2.6. How Will the Questionnaires Be Delivered in the Three Rounds

In a narrative review published in 2005, Bowling synthesized the main risks of error which may occur depending on the mode of survey questionnaire administration. Of the modes of administration indicated by the author, we retained computer-assisted self-administered interviewing (questionnaire sent by email and completed on the PC, or questionnaire completed via web) and the “paper and pencil” mode (in which the questionnaire is sent by post or hand-delivered to the respondent). The other modes enumerated, i.e. face-to-face interview and telephone survey, are not applicable to a Delphi study. The risks of error identified in relation to self-administered questionnaires were generated by the cognitive burden, the order of the questions and response choices, recall bias, and the tendency to avoid communicating information considered sensitive [26]. Taking into account how a Delphi process is performed and the objectives of our study, we can exclude recall bias (the experts are told how they responded) and, partially, the avoidance of communicating sensitive information. Practically, in our case, we had to choose among the following:
- self-administration of an electronic questionnaire which can be completed on a personal computer, independent of an Internet connection;
- self-administration of a questionnaire completed via web and administered by means of a software application;
- classic “paper and pencil” administration.
Of the three options, we selected the electronic questionnaire, doubled in Rounds 2 and 3 by the “paper and pencil” questionnaire, for the purpose of increasing
the rate of response and motivating the experts to continue their participation in the study [19-22]. We consider it the best option for our case because, compared to the web option:
- it does not require an Internet connection;
- it does not require the use/running of a web application;
- it allows the expert to spend any amount of time on its completion;
- it allows browsing through the written responses and correcting them;
- it allows saving the document at any time and continuing later;
- it suits experts accustomed to working with printed materials;
- it provides information which is easy to understand and to transfer to another digital medium.
Such a mode of administration is still susceptible to the above-mentioned errors (although we think that those pertaining to cognitive load and question order are smaller due to the absence of time pressure), but the main factor taken into consideration in our selection was the participation rate. The double administration (PDF format plus pencil-and-paper questionnaire) provides better chances for expert retention.
2.7. Analysis of Data Collected in Round 1

The first round of our study is qualitative and consists in the administration of an open-ended questionnaire soliciting the panel experts’ opinions or examples pertaining to the problems they encounter in their relationship with patients with chronic diseases. Taking into account that our objective at this stage is to inductively develop a structured set of topics and subtopics for the Round 2 questionnaire, we decided to use thematic analysis for the qualitative analysis of the data. In organizing the process, we took our inspiration from the methodology proposed by Braun and Clarke [27], both in terms of data organization and the analysis itself, adding two more elements to those they enumerate:
- together with the panelists, a review of the way in which the iterated themes and sub-themes are consistent with their ideas;
- calculation of indices of primary data use and of the weight of themes and sub-themes in relation to the primary data.
Table 2. Data definitions for the qualitative analysis [27], with their counterparts in our study.

Data corpus. Description: all data collected from the respondents. Our study: the 50 questionnaires completed by the physicians.
Data set. Description: a set of data that is being used for a particular analysis. Our study: the data used to produce a particular theme/sub-theme.
Data item. Description: each individual piece of data. Our study: the questionnaire completed by a single physician.
Data code. Description: a code selected from a data item. Our study: a single idea selected from a questionnaire completed by a single physician; ideally, the code should reflect the connection between only two variables, one associated with the patient and the other with a future pattern.

Table 3. Stages of the qualitative analysis [27].
Familiarizing with the data: reading of the collected responses by the 2 people involved in the process, in order to gain a comprehensive overview of the entire data corpus.
Generating codes: generation of as many data codes relevant to the objectives of the study as possible. A code will include, when necessary, the surrounding words to avoid loss of context, and will be permanently checked against the pre-existing codes to enable cross-indexing [28] as well as estimation of the saturation level. Calculation of the extent of use of the primary data: total number of characters in the data codes / total number of characters in the data corpus.
Searching for themes and sub-themes (data set): identification of the themes characterized by a group of homogeneous codes, irrespective of their frequency.
Reviewing themes: review of the codes associated with the themes, with re-creation or re-assignment of codes if necessary. Verification of the internal homogeneity and external heterogeneity of all themes and sub-themes. Decision to merge or expand themes or sub-themes depending on their relevance in terms of the information (codes) they contain. Calculation of the theme/primary data ratio (total number of words in the data set / total number of words in the data corpus) and of the frequency per data item for each theme (how many data items the codes of a theme or sub-theme come from).
Defining and naming themes: naming the themes and sub-themes in relation to the objectives of the study.
Reviewing along with the panelists: reviewing with each expert the manner in which the wording of the themes and sub-themes corresponds to the ideas and perspectives they expressed when completing the questionnaires, handling the received comments and confirming the changes together with them.
Producing the report: preparation of the questionnaire to be used in Round 2 of the Delphi.
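The two indices added to Braun and Clarke’s workflow can be made concrete in a short sketch. This is a minimal illustration under our own assumptions: each code is stored with its text, its theme and the id of the questionnaire (data item) it came from, and all field names are hypothetical; the character-based and word-based definitions follow Table 3:

```python
def primary_data_use(codes, corpus_text):
    """Extent of use of the primary data (Table 3): characters captured
    in the data codes divided by characters in the whole data corpus."""
    return sum(len(c["text"]) for c in codes) / len(corpus_text)

def theme_weights(codes, corpus_text):
    """Word-based weight of each theme relative to the data corpus, plus
    the number of distinct data items its codes come from."""
    corpus_words = len(corpus_text.split())
    themes = {}
    for c in codes:
        t = themes.setdefault(c["theme"], {"words": 0, "items": set()})
        t["words"] += len(c["text"].split())
        t["items"].add(c["item_id"])
    return {name: {"weight": t["words"] / corpus_words,
                   "n_items": len(t["items"])}
            for name, t in themes.items()}
```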
2.8. What Methods Do We Use to Present and Analyze the Results from Rounds 2 and 3

The controversies related to the statistical methods most suitable for Likert-type questionnaires seem to be generated both by the terminology used [29], [30] and by the types of ordinal variables, grouped by Kampen into 5 categories [31]. Recently, reasoned opinions in favour of parametric tests for ordinal data [32] have emerged, as well as recommendations on how to justify the selection of statistical analysis methods in research protocols [33]. The questionnaires of our study will contain individual questions to be treated independently from one another; we therefore decided to use the median as the measure of central tendency and frequency tables for the distribution of answers. Their use is justified by the ordinal nature [34], [35] of the variable (it indicates an order of the more-than/less-than type, but does not show the magnitude of the difference between the answer choices): an unstandardized discrete variable with ordered categories [31].
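On top of this per-item summary, a consensus check can be run between rounds; the sketch below takes the frequency table produced by the earlier feedback sketch and applies the 80% “agree”/“strongly agree” threshold the authors define later in the Discussion (the function name and the 6/7 coding are our assumptions):

```python
def consensus_reached(freq, n_experts, threshold=0.80):
    """True when at least `threshold` of the experts rated the item
    'agree' (6) or 'strongly agree' (7) on the 7-point scale."""
    return (freq.get(6, 0) + freq.get(7, 0)) / n_experts >= threshold
```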
2.9. Development of the Likert Questionnaires

The development of a Likert questionnaire integrated in a Delphi process raises many questions, among which, in our opinion, those related to the avoidance of systematic errors in the experts’ answers are the most important. For our study, we are interested in the items which may generate a consensus bias and in how to minimize it while balancing the study objectives, panel composition and questionnaire-related variables. In a paper published by van Vaerenbergh and Thomas [36], response styles in surveys were analyzed in terms of the systematic errors which may be induced either by the stimulus (i.e. the questionnaire) or by the respondent’s personal characteristics. There are different types of response styles (Table 4), and some of them are affected by the way we construct the questionnaire.
Table 4. Types of response style bias on a Likert scale with 7 response options (adapted from van Vaerenbergh and Thomas, 2013 [36]).

ARS (acquiescence response style): the tendency to select the positive response options.
NARS (net acquiescence response style): the tendency to select more positive options than negative ones.
ERS (extreme response style): the tendency to select extreme options (positive or negative).
MRS (midpoint response style): the tendency to select the midpoint response option.
MLRS (middle-range, or mild, response style): the tendency to select moderate (middle-range) options.
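These styles are usually quantified per respondent as simple proportions of their answers across items. The sketch below follows that common simplification; the index definitions here are our own reading of Table 4 on a 7-point scale, not formulas taken from the paper:

```python
def response_style_indices(ratings):
    """Share of a respondent's answers in each region of a 7-point scale:
    positive (ARS), net positive (NARS), extreme (ERS), midpoint (MRS)
    and middle-range (MLRS)."""
    n = len(ratings)
    pos = sum(r > 4 for r in ratings) / n
    neg = sum(r < 4 for r in ratings) / n
    return {"ARS": pos,
            "NARS": pos - neg,
            "ERS": sum(r in (1, 7) for r in ratings) / n,
            "MRS": sum(r == 4 for r in ratings) / n,
            "MLRS": sum(r in (3, 5) for r in ratings) / n}
```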
Among the sources of errors that can be generated at the stimulus level, and taking into consideration their influence on the reliability and validity of the questionnaire, we decided to examine the response scale format. The other factors enumerated by van Vaerenbergh (mode of data collection, topic involvement and cognitive load) are only marginally discussed in this paper.

2.9.1. How Many Answer Choices Will We Use for the Form Themes

The Likert-type questionnaires used in Delphi studies usually contain questions with 5 or 7 answer choices. Why not fewer? Why not more? Which option would be the most appropriate for our study? Based on a study involving a simulation on three types of response scales (dichotomous, ordinal and numerical), Cicchetti recommended the use of an ordinal scale with 7 answer choices as the best option from the viewpoint of scale reliability [37]. Preston [38] tested 2 to 11 answer choices and concluded that 7 answer choices represented the most appropriate option in terms of reliability and validity. His conclusion was consistent with Miller’s observations [39] on the ability of the human brain to perceive and organize one-dimensional stimuli, but contrary to Matell’s findings [40]
that did not point out any relation between reliability or validity and the number of answer choices. Weng suggested that 6 or 7 response choices represented the best option when there were no problems related to the participants’ cognitive ability and ensured good reliability, a fact also confirmed in their study by retesting the previously administered questionnaires [41]. A simulation performed by Lozano showed that the optimal number of response alternatives was between 4 and 7 (with maximum reliability and validity at 7), while the gain in reliability and validity was insignificant beyond 7 response options [42]. A more recent study indicated that 7 answer choices proved more reliable than 5 in terms of scale sensitivity [43]. Weijters also recommended the use of 7 response options for groups with good cognitive abilities [44].

2.9.2. Neutral Response (Midpoint) Option

Matell [45] tested 18 different Likert scale response formats (ranging from 2 to 19 choices) and noted that the number of neutral responses decreased as the number of response alternatives increased. Thus, if with 5 response alternatives the neutral option was chosen by 21% of the respondents, with 7 alternatives this percentage decreased to 12%, and with 9 it was 11%. This suggests that the existence of
a neutral alternative does not affect the expression of a clear opinion when the questionnaire has sufficient response alternatives, a fact also confirmed by Presser [46]. Based on a study in which the questionnaire was not administered anonymously, Garland suggested eliminating the neutral (midpoint) response in order to alleviate a potential social desirability bias [47]. In our case, the anonymous administration of the survey (one of the strengths of the Delphi technique) and the dynamic nature of the Delphi process (the opinions are expressed and adjusted over several rounds) are two arguments in favour of keeping the midpoint. Clark and Watson underlined the importance of selecting an appropriate label for the midpoint option and argued against its elimination, since removal would force respondents to "fall on one side of the fence or the other" [48]. Krosnick pointed out in a study that the inclusion of a neutral response option (in this case, surveys in which four different options were examined, namely “I haven’t thought much”, “I would not vote”, “I have no opinion”, and “I don’t have enough information”) was preferred by respondents with low cognitive skills or inclined to devote little effort to the process, and concluded that the presence of the neutral option might preclude potentially more meaningful opinions [49]. One of the features of a Delphi study is that it does not require an immediate answer and allows sufficient time for the formation of an opinion/standpoint. The experts’ level of education and cognitive ability are not actually subjects of discussion in our study, and offering a neutral response option would not, in our opinion, significantly influence the experts’ answers. Nowlis [50] investigated the effect of removing the neutral option in a series of studies focused on the attitude towards a certain type of stimulus, manipulated in such a way as to induce in the respondents either indifference or ambivalence. He concluded that when the respondents have an ambivalent attitude towards the stimulus, the elimination of the neutral option changes the distribution of responses; this finding did not apply to the indifferent respondents. The connection between the intensity of the attitude towards a topic and the preference for a neutral response had previously been confirmed by Presser [46], who also showed that increasing the number of response options leads to a decrease in midpoint selection by respondents. Taking into account the direct involvement of the panel experts in the development of the questionnaire items, it seems highly unlikely that they would have an indifferent attitude towards the questions on which they are asked to provide an opinion. Kulas pointed out, in a test-retest experiment, a consistent association between a neutral response and the necessity of placing the question in a particular context (a midpoint in the test associated with “it depends” in the retest) [51]. Moreover, based on the calculated response times, he concluded that the cognitive demand of agreeing/disagreeing with a question is lower than that required by the neutral answer. The author suggests, based on Tourangeau and Rasinski’s question-answer model (interpretation, information retrieval,
judgment making, response choosing) [52], that the selection of the neutral alternative is connected to the interpretation of the question. A higher rate of neutral responses can signal the need either to reformulate items so that they are clear, unambiguous and easy to understand, or to contextualize them. These findings are important for a Delphi study both in relation to the use of the midpoint as an indicator for item rephrasing and as a way to collect more nuanced information on the reasons for an expert’s neutrality. The hypothesis of a connection between the characteristics of the question/topic or those of the respondent and the selection of a neutral answer was investigated in a study conducted by the same authors and published in 2013. They confirmed a strong association between the predominant presence of midpoint responses (due either to confusion or to the need for response contextualization) and item characteristics, but only a negligible association with respondent characteristics. In such circumstances, the authors’ recommendation was simple: item or topic revision [53]. The influence of the neutral response was also examined in a study which tested several hypotheses related to the directional biases induced in Likert-type questionnaires by the inclusion of a neutral response and by the type of labeling of the available response categories (end-points only or full labeling) [44]. Two midpoint-related hypotheses were tested and confirmed as statistically significant: when adding a midpoint, the level of NARS increases; when adding a midpoint, the level of ERS decreases.

2.9.3. Order of Response Options and Labeling

Friedman examined the influence of order and labeling on the responses and concluded that only the verbal cues associated with an option can affect the response, irrespective of the position of that option [54]. The use of a descending scale as a reference, without comparison to an ascending one, has left the influence of response ordering insufficiently investigated. The use of labels for response options seems to increase ARS and, most likely, NARS levels. We may presume that an ascending scale would reduce ARS and NARS. Chan investigated the effect of the order in which the response options are presented (the primacy effect: a tendency towards the choices placed on the left of the scale) using questionnaires with 5 response alternatives. The questionnaire in which the choices were arranged in descending order (the positive form) yielded a significantly higher number of positive responses than the questionnaire with ascending options (the reversed form) [55]. According to the findings of this study, choosing a questionnaire with response alternatives in ascending order would result in a decrease of ARS and NARS, but also in a possible increase of DARS (disacquiescence response style, the tendency to select negative options). Friedman compared the responses on two types of Likert scales containing 10 items with 5 response options and found
that descending ordering had a significant influence on ARS, concluding that there was a response bias induced by the orientation towards the options on the left side of the scale (ARS and NARS increase with the descending scale, DARS with the ascending scale) [56]. In 1997, Albanese tested the effect of response ordering using 3 models of a Likert scale (5, 6 and 7 response alternatives) arranged in ascending and descending order. He noticed a strong tendency to choose more positive answers when these were placed on the left side of the scale in all tested models (ARS) [57]. This primacy effect was confirmed by the same author in another study, in which he also underlined the need for correct labeling of all response options [58]. In 1999, Barnette conducted a study testing 4 forms of a 20-item questionnaire with response options arranged in ascending and descending order, and with negative wording for 10 selected items. The study did not detect any influence of the order of response options [59]. In another study he admitted that the negative wording was not the best choice after all, and suggested reversing the response scale orientation without changing the items [60]. Using the same set of items as Chan but combined into two factor groups, Weng concluded that the order of response options did not affect the response distribution [61], even though he did notice some influence (attributed to item wording) for certain items. In his opinion, the participants’ motivation and the clarity of the items were more important than response order. In another study, the same author recommended labeling all the response alternatives, so as to obtain stable responses as a result of the accurate interpretation of the provided options [41]. In an experiment which included 362 students, half of the participants received questionnaires with a descending scale, while the other half received an ascending scale; besides the 5 response options, a “not applicable” option was added with a view to excluding the students giving this response from the analysis. The test showed a statistically significant difference between the favourable responses for the descending scale and those for the ascending scale, in favour of the former. By contrast, unfavourable responses were overrepresented for the ascending scale [62]. In another experiment, which included the administration of the same items with 5 response options placed both on an incremental scale and on a decremental one, Hofmans discovered a significantly higher number of positive responses (“agree” and “rather agree”) with the decremental scale. No remarkable difference was noticed for the negative response options (“disagree” and “rather disagree”). Despite the subtle distinctions made by Hofmans in his findings, one can say that ARS and NARS were influenced by the decremental scale, while the use of the incremental scale had a certain decreasing effect on them [63]. Weijters, comparing end-point labeling with full labeling on an ascending scale, pointed out the influence of full labeling on response style and found an increase in NARS and a decrease in ERS. His recommendation was for a
fully labeled scale when an opinion-measurement study is conducted within a population with good cognitive abilities [44]. Yan T. examined the effect of response order in a survey administered by telephone and confirmed an ARS increase for the descending scale [64], independently of the respondents’ cognitive skills or their motivation level. In an experiment involving 6 types of a Likert scale, Maeda found that response tendencies shifted to the left of the scale both for the ascending and the descending scale, and did not find any influence of the response configuration on internal consistency or factorial validity [65].

2.9.4. Layout of Response Options

The manner in which the response options are presented is important as well, even if so far the results have been contradictory and require further investigation. Christian and Dillman showed how different modes of data collection and response configuration in the questionnaire can trigger different types of responses when the questions are graphically or verbally manipulated [66], [67]. Toepoel examined the influence that verbal language (arrangement of labels added to responses, decremental vs. incremental), graphical language (font, colour, emphasis of response options) and numerical language (numbers associated with response options) have on the distribution of responses to 5-point Likert questions arranged in different ways (horizontal format, vertical format, in columns). His findings suggested that a horizontal layout is less likely to modify the response distribution than the other tested formats, and he recommended the use of a horizontal, fully labeled layout without added numbers [68]. Maeda tested six response-option configurations (3 horizontal and 3 vertical: ascending, descending and mixed) and recommended, for online-administered questionnaires, the use of vertically oriented response scales as an alternative to eliminate directional bias [65].
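Putting the evidence from sections 2.9.1-2.9.4 together, the item format we converge on (made explicit in the Discussion: 7 ascending, fully labeled, horizontal options with a midpoint) can be captured in a small sketch; the exact label wording and the example statement are our own assumptions:

```python
# 7 fully labeled, ascending response options with a midpoint, displayed
# horizontally; the wording below is an assumption, not the study's form.
RESPONSE_OPTIONS = (
    "strongly disagree", "disagree", "somewhat disagree",
    "neither agree nor disagree",  # midpoint, paired with an explanation field
    "somewhat agree", "agree", "strongly agree",
)

def render_item(statement):
    """Render one Likert item in a horizontal, fully labeled layout."""
    return statement + "\n" + " | ".join(RESPONSE_OPTIONS)

print(render_item("Overcrowding limits the time available per patient."))
```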
3. Discussion

There are numerous debates related to the evaluation criteria applicable to qualitative research. Morse suggests using the concepts of reliability and validity, so that qualitative research is not left without funding, and underlines that creating a parallel terminology will only diminish the rigour of qualitative studies: “We argue that strategies for ensuring rigor must be built into the qualitative research process per se. These strategies include investigator responsiveness, methodological coherence, theoretical sampling and sampling adequacy, an active analytic stance, and saturation.” [69] Our attention was focused particularly on investigating potential sources of error and correcting them appropriately before starting the study. We identified 12 potential error sources and proposed actions to diminish their effect.
Seven of the twelve possible sources are related to what we would call “consensus bias” (attaining an overrated consensus due to an error in the study design and management). Even if the purpose of a Delphi study is to assess the level of consensus on various topics, we believe that an artificially enhanced consensus, combined with other design-related factors, can distort the findings of the study, especially by including topics for which the predefined level of consensus has not actually been reached. The decision to use a neutral response option may be considered debatable but, since its benefits are more substantial than its drawbacks, we believe it is in consonance with our study as a whole.

(1) Criteria for expert panel selection. The criteria we propose for selecting the experts take into consideration their professional experience, the number of patients with chronic conditions they have in care, the geographic location of their medical facility (urban vs. rural) as well as the distance to secondary and tertiary healthcare units. We do not believe these criteria are restrictive, and we estimate that their influence on the sampling process is negligible.

(2) Purposive stratified sampling. The sampling process is going to take place inside a professional community well defined in terms of location and number, stratified by practice location and distance to secondary and tertiary healthcare structures. A panel of 50 experts represents over 10% of the total number of family physicians in the county and, although we are not aiming to develop a representative sample, we consider that their opinions will rather accurately reflect the judgment of the entire group.

(3) Management of the expert panel. This is a critical aspect of any Delphi study [70], [71], and its objective is to maximize the participation of the experts in all the rounds, because discontinuance affects the final results. The combination of financial incentives and intensive communication should be an efficient way to secure the participation of at least 90% of the panel experts.

(4) Management of dropouts and their inclusion in the analysis. We think that replacing the experts who leave the study before Round 1, and taking into consideration, when assessing the consensus, the judgments of those who leave before Rounds 2 and 3, decreases the risk of a final consensus bias.

(5) Open-ended questionnaire in Round 1. The decision to begin the study with an open-ended survey is in line with the background and objectives of the study, and was also influenced by the low probability of being able to develop a structured questionnaire based on literature data. On the other hand, we expect the findings of our study not only to reflect the present situation, but also to give rise to suggestions for the future based on the experts’ practical experience; hence the orientation towards an open-ended questionnaire.

(6) Thematic analysis in Round 1. The decision to use thematic analysis arose from the need to develop a structured questionnaire for Round 2. The methodology includes constant comparison between the codes emerging from a questionnaire (data item) and the existing codes and, together
with the panelists, a review of the way in which the iterated themes and sub-themes are consistent with the expressed opinions. An audit trail and an external audit of the data analysis are two further methods aimed at decreasing the risk of bias (a wrong or partial reflection of the expert opinions in the Round 2 form).

(7) Administration of questionnaires. Questionnaire administration affects the response rate and can bias the final findings through dropouts from the study. The decision to employ a questionnaire in PDF format sent by email was taken for practical reasons, while backing up the PDF questionnaire with a “paper and pencil” one was a measure aimed at better response rates [22].

(8) Questionnaires for Rounds 2 and 3. The decision on the structure of the Likert-type questionnaire used for Rounds 2 and 3 was prompted by the need to limit the errors generated by response style, improve the sensitivity with which the participants’ opinions are captured, identify items that need rephrasing, decrease the ERS level and offer the expert the option to explain a neutral response. We therefore decided to use Likert items with:
- 7 response options;
- a neutral (midpoint) response option, with a field allowing the expert to explain the neutral point of view;
- ascending response options;
- fully labeled response options;
- response options in a horizontal format.
Having decided on these points, we were able to move forward with some other variables related to our study.
1. Consensus level. We have chosen to consider that consensus has been reached on a theme when at least 80% of the experts’ ratings consist of “strongly agree” or “agree” options.
2. Timeframe. The study is expected to take between 4 and 6 months to complete, and this duration will be communicated from the beginning to all invited experts.
3. Management of consensual themes. If consensus has been reached on a certain theme, it will be moved to a separate section of the Round 3 form.
4. Minority opinions/theme dropping. We do not plan to drop themes with high percentages of disagreement.
5. Stopping the study. As mentioned in the study diagram, the study will stop after the 3rd round.
6. Reporting. Full reports with the study findings will be sent to all experts, accompanied by a “thank you” letter. Subgroup analyses will be performed for urban and rural practice locations and for the groups with easy/difficult access to secondary or tertiary healthcare structures.
7. Expert experience with the study. All experts will be asked to answer a short survey at the end of the study, in order to rate their experience and suggest improvements.
While trying to offer an overview of Delphi design choices, we are aware of the weaknesses of our paper. There are very few articles in the literature on which solid methodological choices can be based, and the field of Delphi study
design needs plenty of further research in order to clarify important issues such as: size of the expert panel, cognitive load generated by the number and order of topics in the form, validation of study findings, validation of Likert constructs, expert panel management, and the influence of dropouts on study findings. The decisions to set the consensus level at 80% and the expert retention target at 90%, to stop the study after 3 rounds, or to keep minority opinions are not based on research results, and we cannot claim that all of these will yield a “bias free” study. This is not a recipe for a “bulletproof” protocol, but rather a suggestion to analyze every facet and every step of the technique in order to decrease the risk of systematic bias. Detailed reporting of potential bias sources can be a way to maximize readers’ confidence in the findings of Delphi studies.

In a review of the reporting quality of Delphi studies, Diamond et al. [72] focused on the definition of consensus and identified four criteria which define the quality of a Delphi report: stopping criteria specified, planned number of rounds, reproducible criteria for the selection of participants, and criteria for dropping items at each round. In their opinion, consensus should not be considered an automatic outcome of a Delphi study, and the consensus threshold must be specified prior to starting the study. We consider that, when the consensus level (or a predefined number of themes reaching consensus) is used as a stopping criterion, the variables pushing the consensus forward or backward may become in fact a source of bias, and these should be presented in any Delphi report. If a planned number of rounds is specified, the same variables potentially modifying the consensus need to be reported, especially those influencing the response style of the participants. Dropping themes over the rounds is a frequently encountered practice, and we fully agree that, when themes are dropped, the criteria should be clearly specified in any report.

In a review on the use and reporting of the Delphi method for the selection of healthcare quality indicators, Boulkedid et al. [73] noticed a lack of consistency in reporting details which are important for the interpretation of study results: how the feedback was provided to participants between rounds, the response rates for each round, and the definition of consensus. Based on their analysis, a number of recommendations on the use and reporting of the Delphi method were made:
- report the study objective, expert selection criteria, first-round questionnaire, definition of consensus and stopping criteria;
- report on expert panel management and retention: willingness to participate, agreement, information letter, percentage participating in each round;
- report on questionnaire delivery: electronic and mail;
- report on the feedback provided to participants: quantitative and qualitative (based on expert feedback in each round), overall group response.
Sinha et al. proposed three main topics to be addressed when designing and reporting a Delphi study: size and composition of the panel, methodology of the Delphi process, and reporting of results [74]. Methodological rigor is an important chapter in the Delphi
research debate [75], and therefore more detailed reporting on the design, process and results is required in order to allow the reader to assess both the quality of the method and the study findings. Based on our literature analysis and on some existing papers on this topic [73], [74], we propose a reading template which can summarize the design and management characteristics of Delphi studies and offer the reader a better methodological perspective on the study findings.

Table 5. Proposal for a Delphi study reading “template”. Each reporting item is checked as YES, NO or Not clear.
- Are the expert definition criteria reported?
- Is the expert selection process described?
- Is the expert panel management procedure described?
- Is the management of study dropouts explained?
- Were the final results of the study sent to the panel experts?
- Was the experts’ experience with the study collected?
- Is the number of respondents for each round reported?
- Is the form administration procedure reported and/or explained?
- Type of form used for the first step of the study: is the authors’ option explained?
- Is the qualitative data analysis process described/referenced?
- Likert items: how many response options?
- Likert items: is a midpoint used?
- Likert items: how are the response options ordered?
- Likert items: is the response option labeling reported (end points or continuous)?
- Likert items: is the display of response options reported?
- Is the data analysis and reporting between quantitative rounds explained?
- Is the consensus level defined?
- Is the planned duration of the study mentioned?
- Is the management of consensual themes reported (themes kept or removed after reaching consensus)?
- Is the management of minority opinions (criteria for dropping themes) reported?
- Are the rules for stopping the study (n rounds, consensus reached for a number of themes) presented?
4. Conclusions

When using the Delphi technique, one should look carefully at the potential sources of bias which may influence the quality of the findings. As a method used to gather expert opinions and evaluate the consensus on these opinions, the Delphi method can be biased through several variables: expert selection, expert panel management, dropout management, questionnaire construction and administration, data analysis and reporting. We suggest in our paper that careful consideration should be given to all these aspects in order to improve the rigor of each Delphi study and, consequently, its internal validity.
This is especially important for studies where the consensus-level threshold is used in decisions to stop the study or to rank or drop themes. We also suggest that more structured reporting on the method in medical journals would help readers better evaluate the strengths and weaknesses of each Delphi study through the potential biases introduced by the choices made in the design phase.
References

[1] Pana M. România după 25 de ani (I) / Evoluția reală a PIB: Creșterea economică și deceniul pierdut, 2014. (http://cursdeguvernare.ro/romania-dupa-25-de-ani-i-evolutia-reala-a-pib-cresterea-economica-si-deceniul-pierdut.html).
[2] Pana M. Sănătatea 2014 și banii ei: Bugetul scade, cheltuielile cresc. Subvențiile urcă la 0,7% din PIB, 2014. (http://cursdeguvernare.ro/sanatatea-2014-si-banii-ei-bugetul-scade-cheltuielile-cresc-subventiile-urca-la-07-din-pib.html).
[3] Health at a Glance: Europe 2012. OECD Publishing, 2012. ISBN: 9789264183605.
[4] Eurostat. GDP per capita in PPS. Eurostat, 2014. (http://ec.europa.eu/eurostat/tgm/table.do?tab=table&init=1&language=en&pcode=tec00114&plugin=1).
[5] Osan A, Lungu D. Astărăstoae: Deficitul de medici din România a ajuns în jur de 40-42 la sută. Bucuresti: Mediafax, 2014. Available from: http://www.mediafax.ro/social/astarastoae-deficitul-de-medici-din-romania-a-ajuns-in-jur-de-40-42-la-suta-13215107.
[6] Mullan F. The metrics of the physician brain drain. N Engl J Med 2005; 353: 1810–8.
[7] Institutul National de Statistica. Populaţia după domiciliu la 1 ianuarie 2015 a scăzut cu 0,3% faţă de 1 ianuarie 2014. Bucuresti, 2015. Available from: http://www.insse.ro/cms/files/statistici/comunicate/com_anuale/populatie/PopDom2015r.pdf.
[8] Delbecq AL, Van de Ven AH, Gustafson DH. Group techniques for program planning: A guide to nominal group and Delphi processes. Glenview, IL: Scott, Foresman, 1975. ISBN: 0673075915.
[9] Dalkey N. An experimental application of the Delphi method to the use of experts. Management Science 1963; 9: 458–67.
[10] Linstone HA, ed. The Delphi Method: Techniques and Applications. 2nd ed., 2002.
[11] Fink A, Kosecoff J, Chassin M, Brook RH. Consensus methods: characteristics and guidelines for use. Am J Public Health 1984; 74: 979–83.
[12] Keeney S, Hasson F, McKenna H. The Delphi Technique in Nursing and Health Research. 1st ed. Wiley-Blackwell, 2011. ISBN: 978-1-4051-8754-1.
[13] Okoli C, Pawlowski S. The Delphi method as a research tool: an example, design considerations and applications. Information and Management 2004; 42: 15–29.
[14] Baker J, Lovell K, Harris N. How expert are the experts? An exploration of the concept of 'expert' within Delphi panel techniques. Nurse Res 2006; 14: 59–70.
[15] Bolger F, Wright G. Improving the Delphi process: lessons from social psychological research. Technological Forecasting and Social Change 2011; 78: 1500–13.
[16] Akins RB, Tolson H, Cole BR. Stability of response characteristics of a Delphi panel: application of bootstrap data expansion. BMC Med Res Methodol 2005; 5: 37.
[17] Hsu CC, Sandford BA. Minimizing non-response in the Delphi process: how to respond to non-response. Practical Assessment, Research & Evaluation 2007; 12(17).
[18] VanGeest JB, Johnson TP, Welch VL. Methodologies for improving response rates in surveys of physicians: a systematic review. Eval Health Prof 2007; 30: 303–21.
[19] Edwards PJ, Roberts I, Clarke MJ, et al. Methods to increase response to postal and electronic questionnaires. Cochrane Database Syst Rev 2009; (3): MR000008.
[20] Flanigan TS, McFarlane E, Cook S. Conducting survey research among physicians and other medical professionals: a review of current literature. In: Proceedings of the Survey Research Methods Section, American Statistical Association, 2008. p. 4136–47.
[21] Thorpe C, Ryan B, McLean SL, et al. How to obtain excellent response rates when surveying physicians. Fam Pract 2009; 26: 65–8.
[22] Martins Y, Lederman RI, Lowenstein CL, et al. Increasing response rates from physicians in oncology research: a structured literature review and data from a recent physician survey. British Journal of Cancer 2012; 106: 1021–6.
[23] Frewer LJ, Fischer A, Wentholt M, et al. The use of Delphi methodology in agrifood policy development: some lessons learned. Technological Forecasting and Social Change 2011; 78: 1514–25.
[24] Schmidt RC. Managing Delphi surveys using nonparametric statistical techniques. Decision Sciences 1997; 28: 763–74.
[25] Franklin KK, Hart JK. Idea generation and exploration: benefits and limitations of the policy Delphi research method. Innov High Educ 2006; 31: 237–46.
[26] Bowling A. Mode of questionnaire administration can have serious effects on data quality. J Public Health (Oxf) 2005; 27: 281–91.
[27] Braun V, Clarke V. Using thematic analysis in psychology. Qualitative Research in Psychology 2006; 3: 77–101.
[28] Pope C, Ziebland S, Mays N. Qualitative research in health care: analysing qualitative data. BMJ 2000; 320: 114.
[29] Jamieson S. Likert scales: how to (ab)use them. Med Educ 2004; 38: 1217–8.
[30] Carifio J, Perla RJ. Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. J of Social Sciences 2007; 3: 106–16.
[31] Kampen J, Swyngedouw M. The ordinal controversy revisited. Quality and Quantity 2000; 34: 87–102.
[32] Norman G. Likert scales, levels of measurement and the "laws" of statistics. Adv Health Sci Educ Theory Pract 2010; 15: 625–32.
[33] Sullivan GM, Artino AR. Analyzing and interpreting data from Likert-type scales. J Grad Med Educ 2013; 5: 541–2.
[34] Stevens SS. On the theory of scales of measurement. Science 1946; 103: 677–80.
[35] Kuzon WM, Urbanchek MG, McCabe S. The seven deadly sins of statistical analysis. Ann Plast Surg 1996; 37: 265–72.
[36] van Vaerenbergh Y, Thomas TD. Response styles in survey research: a literature review of antecedents, consequences, and remedies. International Journal of Public Opinion Research 2013; 25: 195–217.
[49] Krosnick JA, Holbrook AL, Berent MK, et al. The impact of "no opinion" response options on data quality: non-attitude reduction or an invitation to satisfice? Public Opinion Quarterly 2002; 66: 371–403.
[50] Nowlis SM, Kahn BE, Dhar R. Coping with ambivalence: the effect of removing a neutral option on consumer attitude and preference judgments. Journal of Consumer Research 2002; 29: 319–34.
[51] Kulas JT, Stachowski AA. Middle category endorsement in odd-numbered Likert response scales: associated item characteristics, cognitive demands, and preferred meanings. Journal of Research in Personality 2009; 43: 489–93.
[52] Tourangeau R, Rasinski KA. Cognitive processes underlying context effects in attitude measurement. Psychological bulletin 1988; 103: 299.
[37] Cicchetti DV, Shoinralter D, Tyrer PJ. The effect of number of rating scale categories on levels of interrater reliability: A Monte Carlo investigation. Applied Psychological Measurement 1985; 9: 31–6.
[53] Kulas JT, Stachowski AA. Respondent rationale for neither agreeing nor disagreeing, Person and item contributors to middle category endorsement intent on Likert personality indicators. Journal of Research in Personality 2013; 47: 254– 62.
[38] Carolyn C. Preston & Andrew M. Colman. Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychologica 2000; 104: 1-15.
[54] Friedman HH, Leefer JR. Label versus position in rating scales. Journal of the Academy of Marketing Science 1981; 9: 88–92.
[39] Miller GA. The magical number seven plus or minus two: some limits on our capacity for processing information. Psychol Rev 1956; 63: 81–97. [40] Matell MS, Jacoby J. Is There an Optimal Number of Alternatives for Likert Scale Items?, Study I: Reliability and Validity. Educational and Psychological Measurement 1971; 31: 657–74. [41] Weng L-J. Impact of the Number of Response Categories and Anchor Labels on Coefficient Alpha and Test-Retest Reliability. Educational and Psychological Measurement 2004; 64: 956–72. [42] Lozano LM, García-Cueto E, Muñiz J. Effect of the Number of Response Categories on the Reliability and Validity of Rating Scales. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences 2008; 4: 73– 9. [43] Finstad K. Response Interpolation and Scale Sensitivity: Evidence Against 5-Point Scales. Journal of Usability Studies 2010; 5: 104–10. [44] Weijters B, Cabooter E, Schillewaert N. The effect of rating scale format on response styles, The number of response categories and response category labels. International Journal of Research in Marketing 2010; 27: 236–47. [45] Matell MS, Jacoby J. Is there an optimal number of alternatives for Likert-scale items?, Effects of testing time and scale properties. Journal of Applied Psychology 1972; 56: 506–9. [46] Presser S, Schuman H. The measurement of a middle position in attitude surveys. Public Opinion Quarterly 1980; 44: 70–85. [47] Garland R. The Mid-Point on a Rating Scale: Is it Desirable? Marketing Bulletin 1991; (2): 66–70. [48] Clark LA, Watson D. Constructing validity: Basic issues in objective scale development. Psychological assessment 1995; 7: 309.
[55] Chan JC. Response-order effect in Likert-type scales. ERIC Clearinghouse, 1990. [56] Friedman HH, Herskovitz PJ, Pollack S, eds. The biasing effects of scale-checking styles on response to a Likert scale. In: Proceedings of the American statistical association annual conference: survey research methods, 1994(vol. 792). [57] Albanese M, Prucha C, Barnet JH, Gjerde CL. The effect of right or left placement of the positive response on Likert-type scales used by medical students for rating instruction. Acad Med 1997; 72: 627–30. [58] Albanese M, Prucha C, Barnet JH. Labeling each response option and the direction of the positive options impacts student course ratings. Academic Medicine 1997; 72: S4-S6. [59] Barnette JJ. Likert Response Alternative Direction: SA to SD or SD to SA: Does It Make a Difference? Annual Meeting of the American Educational Research Association (Montreal, Quebec, Canada, April 19-23) 1999. [60] Barnette JJ. Effects of stem and Likert response option reversals on survey internal consistency: If you feel the need, there is a better alternative to using those negatively worded stems. Educational and Psychological Measurement 2000; 60: 361–70. [61] Weng L-J, Cheng C-P. Effects of Response Order on LikertType Scales. Educational and Psychological Measurement 2000; 60: 908–24. [62] Nicholls MER, Orr CA, Okubo M, Loftus A. Satisfaction Guaranteed The Effect of Spatial Biases on Responses to Likert Scales. Psychological Science 2006; 17: 1027–8. [63] Hofmans J, Theuns P, Baekelandt S, Mairesse O SN, and Cools W. Bias and Changes in Perceived Intensity of Verbal Qualifiers Effected by Scale Orientation. Survey Research Methods 2007; 1: 97–108. [64] Yan T, Keusch F. The Effects of the Direction of Rating Scales on Survey Responses in a Telephone Survey. Public Opinion
International Journal of Public Health Research 2016; 4(6): 47-59
Quarterly 2015; 79: 145–65. [65] Maeda H. Response option configuration of online administered Likert scales. International Journal of Social Research Methodology 2013; 18: 15–26. [66] Christian LM, Dillman DA. The influence of graphical and symbolic language manipulations on responses to selfadministered questions. Public Opinion Quarterly 2004; 68: 57–80. [67] Dillman DA. Survey Mode as a Source of Instability in Responses across Surveys. Field Methods 2005; 17: 30–52. [68] Topoel V, Das J. W. M., van Soest A. Design of Web Questionnaires: The Effect of Layout in Rating Scales. Journal of Official Statistics 2009; 25: 509–28. [69] Morse JM, Barrett M, Mayan M, Olson K, Spiers J. Verification strategies for establishing reliability and validity in qualitative research. International journal of qualitative methods 2002; 1: 13–22. [70] Day J, Bobeva M. A generic toolkit for the successful management of Delphi studies. EJBRM 2005; 3: 103–16.
59
[71] Rowe G, Wright G. The Delphi technique: Past, present, and future prospects — Introduction to the special issue. Technological Forecasting and Social Change 2011; 78: 1487– 90. [72] Diamond IR, Grant RC, Feldman BM, et al. Defining consensus: a systematic review recommends methodologic criteria for reporting of Delphi studies. J Clin Epidemiol 2014; 67: 401–9. [73] Boulkedid R, Abdoul H, Loustau M, Sibony O, Alberti C. Using and reporting the Delphi method for selecting healthcare quality indicators: a systematic review. PLoS ONE 2011; 6: e20476. [74] Sinha IP, Smyth RL, Williamson PR. Using the Delphi technique to determine which outcomes to measure in clinical trials: recommendations for the future based on a systematic review of existing studies. PLoS Med. 2011; 8: e1000393. [75] Hasson F, Keeney S. Enhancing rigour in the Delphi technique research. Technological Forecasting and Social Change 2011; 78: 1695–704.