HERD VOLUME 4, NUMBER 1, PP. 131-135. COPYRIGHT ©2010 VENDOME GROUP, LLC

Research Methods Column

Understanding Evidence-Based Research Methods: Reliability and Validity Considerations in Survey Research

Jason M. Etchegaray, PhD, and Wayne G. Fischer, MS, PhD

This contribution to the methodology series focuses on two issues relevant to all research projects: reliability and validity. There are many types of reliability and validity, and to make this contribution more grounded in previous columns, the reliability and validity of survey instruments will be the focus of this paper. A future paper will incorporate other aspects of validity (e.g., internal and external validity) when discussing the strengths and weaknesses of different methodological designs. The authors welcome comments from all readers who have suggestions about the way information is presented or questions about the content of this column. It is not comprehensive and does not replace information found in textbooks or peer-reviewed articles. Beyond their experience, the authors have used textbooks and articles as sources and recommend that you also refer to them for more detailed explanations of what is discussed in this column.

Author Affiliations: Dr. Etchegaray is affiliated with the University of Texas Medical School at Houston and the UT Houston-Memorial Hermann Center for Healthcare Quality and Safety. Dr. Fischer is affiliated with the Perioperative Enterprise at The University of Texas M. D. Anderson Cancer Center in Houston, TX.

Acknowledgment: Funding for the first author was provided by a K02 award from the Agency for Healthcare Research and Quality (Grant # 1 K02 HS017145-02) and The University of Texas at Houston-Memorial Hermann Center for Quality and Safety.

Scenario: When we last met Allan and Gwen, they discussed different issues that needed to be considered before conducting a survey. Since that discussion, Gwen has identified some survey questions that might fit Allan's needs. She calls to inform him about what she has discovered. Their conversation follows.

Gwen: Hi again, Allan. I've made progress on identifying some survey questions that might be helpful for your study. As we discuss the questions, I'd like

HERD Vol. 4, No. 1 FALL 2010 • HEALTH ENVIRONMENTS RESEARCH & DESIGN JOURNAL


to take this opportunity to also discuss the importance of reliable and valid measurement. I think we should start by discussing some terminology. How does that sound? If that sounds OK, we can start by discussing a construct. We discussed this previously, but it is so important that I think we need to revisit it to make sure we are on the same page.

Allan: That sounds good, Gwen.

Gwen: OK. Construct is the term for something we want to measure and can do so only indirectly through indicator variables (for example, a survey question). Because of such indirect measurement, constructs represent latent variables. An example of direct measurement is weight in pounds: you can stand on a scale and we can get a measure of how much you weigh. In this case, we can directly measure weight. An indirect measurement in the same situation would mean that I would ask you how much you weigh. In this case, this indirect measure helps us understand a construct called weight. We can't always collect direct measurements, so sometimes we rely on indirect measurements. In the study you want to do, people's perceptions about nature are indirect measurements. We are unable to measure exactly what they are thinking/feeling with respect to trees, green spaces, water, etc., so we rely on the next best approach: asking them about their perceptions. In your study, then, we have a number of constructs (trees, green spaces, water features, etc.) that we want to measure.

Allan: Makes sense so far. Can we just ask study participants one question about each of these constructs?

Gwen: Actually, we will want to ask more than one question about each construct. Usually, we try to measure constructs with at least three good survey items. Given that we do not know which items will be good when we start our research project, we actually try to use more than three survey items per construct. Items that are good are retained and items that are not good are discarded and not used again in the future. A reasonable rule of thumb is to write four or five items and then narrow these down to three items based on pilot testing (asking a small sample of people to take and evaluate your survey for comprehension and content relevance).

One reason for using at least three survey items is to be able to estimate whether our measurement of each construct is internally consistent. Internal consistency provides information about the reliability of a survey (Etchegaray & Fischer, 2006). A specific measure of internal consistency is Cronbach's alpha, which measures the average intercorrelation of all survey items for a construct. An example might help clarify internal consistency.

Let's suppose that we measure perceptions of water features by asking participants to indicate the extent to which they agree with the following survey items:

• Water features (e.g., fountains) in the hospital are soothing to watch.
• Water features (e.g., fountains) contribute to a peaceful atmosphere in the hospital.
• I am relaxed when watching the water features (e.g., fountains) in the hospital.
• I look for water features (e.g., fountains) in the hospital.
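Gwen's internal-consistency check can be made concrete with a short computation. The sketch below uses the variance form of Cronbach's alpha and invented 1-5 agreement ratings for the four water-feature items; the data and scale are assumptions for illustration, not values from any study.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 1-5 agreement ratings: one row per respondent, one column per
# water-feature item. These respondents answer all four items consistently.
consistent = [
    [5, 5, 4, 5],
    [4, 4, 4, 4],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
    [4, 5, 4, 4],
]
print(round(cronbach_alpha(consistent), 2))  # -> 0.97

# Same respondents, but the fourth item now draws answers that run against
# the first three (scores reversed here to simulate that pattern).
divergent = [row[:3] + [6 - row[3]] for row in consistent]
print(round(cronbach_alpha(divergent), 2))  # -> -0.34
```

When responses move together, alpha approaches 1; when one item stops tracking the others, alpha drops sharply, which is the signal for reconsidering whether that item belongs with the rest.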


It is plausible that some participants will agree with all of the preceding items. If they do, we would say that the participants responded to all of the items in a consistent manner. In this case, Cronbach's alpha would be high, and we could state that our internal consistency is high. Interestingly, we would also have high internal consistency if participants disagreed with these items, because the participants would be consistent in their responses.

If, on the other hand, most of the participants agreed with the first three items and disagreed with the last one, Cronbach's alpha would be lower as a result of the fourth item. In this case, we might consider eliminating the fourth item from future administrations of this survey, given that it is negatively impacting internal consistency AND it seems to measure something slightly different from the first three items (i.e., looking for the features vs. finding the features relaxing).

Although there are other ways to estimate the consistency of responses to survey items (i.e., test-retest, Raykov's ρ coefficient, etc.), Cronbach's alpha currently is the most commonly used estimate (Nunnally & Bernstein, 1994).

Allan: I think I understand what you are saying. It seems that the survey items for a construct need to be similar to each other. If the items are similar and the participants respond to the items in a similar manner, we have internal consistency. Is that correct?

Gwen: That's correct. We expect all survey items for a construct to yield similar answers, because the items are supposed to be measuring the same construct.

Allan: Good. I am glad I understand this. I think you also mentioned validity when we started this discussion. How does validity differ from what we just discussed about reliability?

Gwen: Reliability is focused on the extent to which responses to survey items are consistent; validity is focused on whether the survey items measure what we want to measure. Remember, survey items act as surrogates for the real construct we want to measure. The better the survey items measure the construct, the more valid the survey items are.

There are several types of validity that we need to consider with respect to surveys. The first type is face validity. As the name implies, the purpose of face validity is to determine whether the survey items appear to measure what we want to measure just by looking at the content of the items. A good question for us to ask ourselves about face validity is: When we look at the wording of the items, do they seem to measure what we intend?

If you look at the four survey items about water features again (see below), do these items seem like they are measuring what you want to measure?

• Water features (e.g., fountains) in the hospital are soothing to watch.
• Water features (e.g., fountains) contribute to a peaceful atmosphere in the hospital.
• I am relaxed when watching the water features (e.g., fountains) in the hospital.
• I look for water features (e.g., fountains) in the hospital.

Allan: Well, the first three items focus on whether the water features are soothing, peaceful, and relaxing. That is exactly what we had in mind when we decided to incorporate the water features into the hospital, so I would answer in the affirmative for these items. As you noted, the fourth item is framed a bit differently. However, I think it is a valuable item and also of interest to us because we want people to look for items that they think have positive attributes (i.e., are soothing, peaceful, relaxing).

Gwen: Good. In this case, we have confidence that these items have face validity. There are other types of validity that are of interest to us, namely content, construct, and criterion validity.

Allan: That sounds like a lot of types of validity! Do we really need to have all of these types of validity?

Gwen: Ideally, yes. Obtaining evidence of validity takes time, though, and it's unlikely that we will establish all types in one study. As we just discussed, face validity is rather straightforward and easy to establish. Typically, we will also focus on content and construct validity when initially creating a set of survey items.

Content validity focuses on whether the survey contains items that are relevant to the domain of the measurement of interest. In terms of the four items we've examined in this discussion, the domain of measurement interest is water features. One aspect of water features that's been incorporated in the buildings is fountains, so we have included fountains in the survey items as examples. We also incorporated ideas that relate to reasons you included water features in the first place, being soothing, peaceful, and relaxing.

Are there other aspects of water features that are part of the reason you included them in these buildings?

Allan: No, those were the main reasons.

Gwen: OK, in that case, these items appear to have some level of content validity. There are also statistical ways to assess content validity that we can consider. For instance, one method is to send our draft items to 10 experts in the field and ask them to indicate their level of agreement (on a scale of 1-10, say) for each item as to whether it's a good measure of the construct of interest. We pre-set our accepted level of agreement and compare the mean score for each item obtained from the panel of experts. If the mean is below the cut-off, we discard it; if above, we accept it. The mean across all items is called the content validity index, and we'd like that to be at least 70% (see discussion about the Content Validity Index; Polit & Beck, 2006). Now let's turn our attention to construct validity.

Construct validity is focused on whether we are measuring what we think we are measuring. For example, do our items about water features actually measure perceptions of water features? One way we can determine whether this validity exists is to compare what we are trying to measure (water features in this case) to other measures that are expected to be either similar or different from our measure. We might expect patient perceptions of the quality of food in the hospital cafeteria to be unrelated to perceptions of water features. Conversely, we might


expect patient perceptions of artwork to be related to perceptions of water features because both are a visual feature of the hospital. To the extent that quality of food does not relate to perceptions of water features and perceptions of artwork do relate to perceptions of water features, we have established divergent and convergent validity, respectively, which are two components of construct validity.

Allan: Will we be able to assess construct validity?

Gwen: Yes, we will. By measuring multiple constructs in our survey, some of which should be related to each other and others which should not, we will be able to gather initial evidence for construct validity. Additional statistical tests can be used to gather more evidence for construct validity, such as confirmatory factor analyses (Nunnally & Bernstein, 1994). We can explore confirmatory factor analyses when we start to analyze the data we have collected.

The other type of validity is predictive validity (Nunnally & Bernstein, 1994). Predictive validity, sometimes called criterion validity, is focused on the extent to which our construct predicts some outcome of interest. The outcome might be an attitude or behavior. For example, we might use perceptions of water features to predict patient satisfaction with the hospital overall. By looking at the extent of predictive validity, researchers form an idea about which predictors are most valuable to them and which are less able to predict satisfaction.

Allan: Makes perfect sense to me, Gwen. What's next?

Gwen: I will send you recommended survey questions and we can start pilot testing them. While that is going on, I will develop a data analysis plan. So, our next two discussion points will be pilot testing and the data analysis plan. Your thoughts?

Allan: Sounds great, Gwen. Thanks!

References

Etchegaray, J. M., & Fischer, W. G. (2006). Survey research: Be careful where you step. Quality and Safety in Health Care, 15(3), 154-155.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.

Polit, D. F., & Beck, C. T. (2006). The Content Validity Index: Are you sure you know what's being reported? Critique and recommendations. Research in Nursing & Health, 29, 489-497.
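As a closing illustration, the two statistical checks Gwen describes, the content validity index from expert ratings and the convergent/divergent correlation comparison, can be sketched in a few lines. The panel size, cut-off, scale, and all scores below are invented for illustration, and the 70% rule is given one plausible reading (mean rating as a share of the scale maximum); none of these values come from the column.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Content validity: each row is one expert's 1-10 ratings of the four draft
# items (five hypothetical experts here; Gwen suggests ten).
expert_ratings = [
    [9, 8, 9, 5],
    [8, 9, 8, 6],
    [9, 9, 9, 5],
    [8, 8, 9, 6],
    [9, 8, 8, 4],
]
cutoff = 7.0  # pre-set accepted level of agreement (an assumption)
n_items = len(expert_ratings[0])
item_means = [sum(r[i] for r in expert_ratings) / len(expert_ratings)
              for i in range(n_items)]
retained = [i for i, m in enumerate(item_means) if m >= cutoff]
cvi = sum(item_means) / n_items / 10  # mean across all items, as share of scale
print(retained, round(cvi, 2))  # fourth item falls below the cut-off; CVI 0.77

# Construct validity: total scores per respondent on three measures.
water   = [19, 16, 7, 18, 6, 17]   # perceptions of water features
artwork = [18, 15, 9, 17, 8, 16]   # expected to be related (convergent)
food    = [14, 9, 13, 10, 12, 15]  # expected to be unrelated (divergent)
print(round(pearson(water, artwork), 2))  # strongly positive
print(round(pearson(water, food), 2))     # near zero
```

A strong water-artwork correlation alongside a near-zero water-food correlation is the convergent/divergent pattern that supports construct validity.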
