Rationally Thinking about Nonprobability

John A. Courtright

Journal of Broadcasting & Electronic Media, 40, 1996, pp. 414-421. © 1996 Broadcast Education Association

“Once more unto the breach, dear friends, once more. . . .”
King Henry V, Act III, Sc. 1

Certain intellectual issues in the social scientific study of human communication seem to ignite controversy on a regular and recurring basis; for example, is behavior law-governed or rule-governed, are the effects of television direct or limited, or what is the “real” impact of televised violence on children? No issue, however, has a longer half-life or recurs with more regularity than the argument over the validity and generalizability of findings obtained from social scientific experiments. This controversy flared again recently in an exchange in Communication Theory between Sparks (1995a, 1995b) and Potter, Cooper, and Dupagne (1995).

Briefly, this particular round of argument originated with Potter, Cooper, and Dupagne’s (1993) finding that 61% of the mass media social science studies they surveyed did not obtain data from a probabilistic sample, accompanied by their interpretation that such studies are not “scientific,” but rather exploratory and “prescientific.” In a response to these findings, Sparks (1995a) observes that such a large percentage would require what he views as the inappropriate inclusion of experiments under the rubric of nonprobabilistic and, thus, “not scientific.” Potter et al. (1995) reply that they did indeed include experiments in their calculations, that experimental findings using nonprobability samples are not capable of being generalized, and that Sparks is incorrect in his contentions. At this point, the battle is joined!

Several of the themes from the Sparks versus Potter et al. exchange are apparent in the various essays published in this special section. These authors, however, are eminently capable of reviewing and explicating their own positions, and I will leave them to it. My purpose, in contrast, is to make the point somewhat intensely that this has all been done before, and that we should outline our positions, raise whatever arguments are germane, and get on about the business of studying human communication!

The surface question under discussion is whether experiments that employ nonprobability samples can produce valid and generalizable results. This question, however, can be approached in two ways. One interpretation addresses the basic purpose of experimentation and argues that generalization to a population is not the goal of such research. Rather, the purpose of experiments, this position would assert, is to examine hypothesized causal relationships between independent and dependent variables. If one adopts this position, then concerns over the makeup of the sample become clearly secondary, if not superfluous.
A second interpretation, and the one that almost certainly prompted Potter et al. (1993) to adopt their initial position, is the concern over the composition of these nonprobability samples. Under this approach, the motivating goal of experimentation is implicitly assumed to be the generalization of one’s experimental findings to a larger, unobserved population. As a consequence, the validity of such generalization hinges directly on the representativeness of the sample. If the sample is not reasonably similar to the population, then one cannot confidently infer that results obtained from that sample would also appear in the population. We can, of course, obfuscate this second issue under a mound of statistical jargon, but the real issue that motivates this concern, I believe, is the validity of student samples in experimental research. Who else, after all, almost always comprises these “nonprobability samples”?
Moreover, if students are simply a readily available microcosm of the larger adult world (at least on the dimensions that make a difference in our experimentation), then few of us would worry about valid generalization and these debates would have never taken place. But alas, take place they have.

The concern about student samples has been raised several times in several different disciplines, including the discipline of communication. As a result, there is little insight in the current essays that has not been previously presented, replied to, subjected to rejoinder, and evidently forgotten. Accordingly, the remainder of this essay will seek to review those previous discussions, show their remarkable similarity across disciplines, and make at least one small recommendation for putting this issue behind us.
From Whence Have We Come?

Exactly half a century ago, Quinn McNemar (1946) published in Psychological Bulletin a comprehensive essay on opinion-attitude methodology, in which he reviewed a variety of procedures associated with the study of attitudes and opinions and presented guidelines for correct methodological approaches. Much of McNemar’s sage advice has become outdated, has been repeated in more contemporary monographs, or has been (unfortunately) forgotten in the passage of time. One particular observation, however, remains the notable and defining statement from his essay. In addressing various statistical issues associated with research on attitudes and opinion, McNemar (1946, p. 332) discusses what he terms “sampling problems.” He raises many of the same issues that vex us today, issues that prompted him to make his oft-cited observation:

The existing science of human behavior is largely the science of the behavior of sophomores. Too much research effort is expended on college students with subsequent waste of journal space devoted to speculation concerning whether the findings hold for mankind in general. (p. 333)
Ouch! And remember that this was written before the intervening 50 years witnessed many more journal pages containing much more speculation.

Approximately a quarter of a century later, this entire issue was revisited in the discipline of psychology, prompted primarily by the work of Robert Rosenthal and his colleagues (Rosenthal, 1966; Rosenthal & Rosnow, 1969), as well as others (e.g., London & Rosenhan, 1964; Ora, 1966; Rosnow & Aiken, 1973; Schultz, 1969). The general rubric for much of this discussion is the important and valid concern over “artifact” in the conduct of experimental research (e.g., pretest sensitization, suspiciousness of experimenter intent, etc.). This work is thoroughly summarized in Rosenthal and Rosnow’s (1969) classic volume, Artifact in Behavioral Research.
Arguably, the most widely disseminated issue to emerge from these discussions is the topic of “experimenter demand characteristics” (see Orne, 1962; Rosenthal, 1966). In terms of the number of independent articles and books devoted to the topic, however, the concern over “volunteer subjects” rivals or exceeds that associated with experimenter demand (this work is summarized in Rosenthal & Rosnow, 1975). The “volunteer” subject is not simply a college student, but rather a very specific type of college student who has populated (overpopulated?) social psychological experiments throughout generations of such research. The importance of this issue is emphasized by Higbee and Wells (1972), who present data demonstrating that, by 1969, over three-quarters of all experiments published in the Journal of Personality and Social Psychology employed college students as their subjects.

Who are these students? Are they representative of the so-called “larger population” about which most social scientists would like to make inferences? The empirical answer seems to be almost certainly not. As described by Rosenthal and Rosnow (1969, pp. 59-60):

There are now indications that these “psychology sophomores” are not entirely representative of even sophomores in general, a possibility that makes McNemar’s formulation sound unduly optimistic. The existing science of human behavior may be largely the science of those sophomores who both (a) enroll in psychology courses and (b) volunteer to participate in behavioral research.
After reviewing a relatively large volume of research on volunteering as a purposive behavior, as well as the personal attributes of those who volunteer, Rosenthal and Rosnow (p. 110) conclude their chapter even more pessimistically than they began:

We began this chapter with McNemar’s lament that ours is a science of sophomores. We conclude this chapter with the question of whether McNemar was too generous. Often ours seems to be a science of just those sophomores who volunteer to participate in our research and who also keep their appointment with the investigator.
These are seemingly serious indictments of decades of experimental research in the social sciences, including experimental investigations of human communication. As we shall see shortly, however, not all researchers agree that such an “artifact” is a
serious problem, or that experimental research using nonprobabilistic samples is not sufficiently scientific. Before turning to those arguments, however, a brief review of how student subjects have been viewed in other disciplines seems in order.
Closer to Home

The entire issue of the validity of experimental findings in the discipline of communication has been addressed previously by Rossiter (1976). Rossiter began with a relatively straightforward review of artifact-producing procedures (including the use of students as subjects) and examined whether those criticisms are applicable to research in communication. He surveyed 68 experimental studies in three major journals (JOC, CM, and HCR) and concluded that “reports of communication experimentation rarely provide sufficient information to allow critical evaluation of crucial aspects of validity” (p. 203). Despite his own finding of insufficient information, Rossiter nonetheless adopted the role of “thoughtful critic” and proceeded (however uninformed) directly to critical evaluation. He concluded that, with few exceptions, the validity of communication experiments “may be severely limited” by a lengthy list of inappropriate and/or unreported procedures. There is little new or noteworthy in Rossiter’s account, except that it serves as additional evidence for my thesis that “we’ve done this all before.”
Let’s Get Down to Business

Lest we feel the pox of student subjects falls only on the houses of psychology and communication, our colleagues in the fields of business, marketing, and management have also devoted their fair share of hand-wringing (and journal space) to these same issues. To their credit, however, several researchers in these areas have gone beyond conceptual or theoretic speculation about the attributes of students and have empirically compared their experimental responses to those of members of potential populations, such as adult consumers, business executives, CPAs, union negotiators, and the like. I will review a small and, I hope, representative sample.

Khera and Benson (1970), for example, compared the responses of business school juniors and engineering juniors (what happened to sophomores?) to those of purchasing agents and professional engineers. These four groups of individuals were shown different versions of a filmed persuasive appeal and asked to evaluate the presentation, as well as the communicator. Although their findings indicated a few similarities in responses, they generally supported the contention that “businessmen’s behavior as subjects is likely to be a more accurate predictor of actual businessmen’s behavior than that of students’” (p. 531).

In an ambitious study, Burnett and Dunne (1986) compared a sample of students to their parents, as well as other nonstudents, on a variety of consumer issues and preferences. Their general and not surprising conclusion is that “the use of students to
represent other populations carries with it the risk of contaminating most research endeavors” (p. 341). A noteworthy, yet secondary, finding of this research is that students generally make better subjects, exhibiting better “alpha scores, factor structures, and correlational results” than the other samples in the study (Burnett & Dunne, 1986, p. 341). Rather than viewing this finding positively, however, the authors raise a cautionary note about questionnaires or survey instruments developed with students but designed for use with the general population. The authors’ reasonable interpretation is that students are more experienced and familiar with test-like experimental materials, and also are capable of higher levels of concentration for longer periods of time. Assuming that members of the general population will not possess these experiences and concentration levels, the results they provide on materials developed with students may well be quite different and, perhaps, invalid.

A third and quite comprehensive empirical study was conducted by Gordon, Slade, and Schmitt (1986). These authors reviewed 32 previous studies that had compared student samples with nonstudents on multiple behaviors and response types. While being careful to assert that “it would be wrong to assume that students are totally unlike nonstudents,” Gordon et al. come to much the same conclusion as previous researchers: “it is clear that problems exist in replicating with nonstudent subjects behavioral phenomena observed in student samples” (p. 200).
Taking Stock

If we are to take at face value the results of almost 50 years of concern about the use of student samples in experimental research, the conclusion would be inescapable: Most of the experiments in all branches of the social sciences cannot sustain valid generalization and, thus, we are quite knowledgeable about 20-year-old college students, but hardly anybody else. On the other hand, recall that I suggested at the outset of this essay that there were two approaches to the validity of experiments and their generalization. Accordingly, simply stopping at “face value” would be inappropriate.

Clearly, if the accepted purpose of experiments and their associated statistical tests were to generalize statistical results to an unobserved, heterogeneous population, then the use of nonprobability, student samples would be highly problematic. Equally clear is that this is the purpose assumed, sometimes implicitly, sometimes openly, by many of the critics of this practice, including Potter et al. (1993, 1995). Such a purpose, however, is not universally accepted. One of the most cogent statements of an alternative position is found in Berkowitz and Donnerstein’s (1982, p. 245) attempt to provide “some answers to criticisms of laboratory experiments.” Foreshadowing the argument subsequently presented by Sparks (1995a) and Annie Lang in this issue, Berkowitz and Donnerstein argue:

Laboratory experiments are mainly oriented toward testing some causal hypothesis (and according to Postman, 1955, ought to deal chiefly with mediational processes)
and are not carried out to determine the probability that a certain event will occur in a particular population. (p. 247)

Citing the work of Kruglanski (1975), Berkowitz and Donnerstein further distinguish between two fundamentally different goals of experimentation: (a) particularistic inquiries, in which tightly defined, causal relationships are investigated, with any attempt at generalization necessarily restricted in scope, and (b) universalistic inquiries, in which the goal is to make broad generalizations to an unobserved population that contains “all sorts and varieties of theoretically irrelevant conditions” (Kruglanski, 1975, p. 105). Berkowitz and Donnerstein’s implicit but unmistakable assertion is that most social scientific experimentation is pursuing particularistic inquiry. Accordingly, “the experimentalist pursuing this type of research need not be especially worried about the representativeness of the sample and laboratory conditions” (p. 248).

This distinction seemingly has great import for the present debate, because we see the protagonists squarely planted on opposite sides of this divide, but (perhaps this is too harsh) without recognizing the existence and legitimacy of alternative viewpoints. There is no locus where debate can center, and the arguments are like skew lines shooting off into space, but never intersecting.

So, where do we stand? Potter, Cooper, and Dupagne, the editor of this journal (as is apparent in his editorial guideline #8, “Manuscript Submission Guidelines,” 1995), as well as hundreds of social scientists before them, are advocating a more universalistic approach to the generalization of scientific knowledge. As Lang (this issue) so aptly suggests, given the educational pedigree and current research orientation of these fine scholars, their acceptance and advocacy of this approach is hardly surprising. I hope that one positive outcome of this brief debate will be to alert these colleagues to the legitimacy of the “particularistic” position. The theoretic domains that constitute the study of mass communication are replete with issues amenable to experimentation. To assign such research to the disciplinary junk pile of “nonscientific” or even “prescientific” is, I believe, to do a great injustice to those researchers and an even greater disservice to the study of human communication.

A second outcome of this exchange of essays will be, again I hope, a realization that a type of synthesis is not totally impossible. Ultimately, all social scientists would desire to achieve “universal” generalization, even if they do not possess that lofty goal for any one specific experiment. This is why replication, procedural/operational extensions, and the proverbial “stream of research” have traditionally been so important. There can be no doubt that several studies aimed at investigating interrelated facets of a basic question, using different samples and perhaps even different operational definitions (Kruglanski, 1975), can generate more confidence in the “universality” of one’s findings than a single experiment. Equally important is that, as a result of recent advances in techniques of synthesis such as meta-analysis (Glass, McGaw, & Smith, 1981; Hedges & Olkin, 1985; Hunter, Schmidt, & Jackson, 1982), empirically sound methods for combining the results from different experiments (and even different researchers!) now exist and are widely accepted. Considering the logistical and pragmatic difficulty (nay, impossibility) of obtaining a national probability sample for an experiment or set of experiments, using meta-analytic techniques to combine the findings from a reasonably large set of nonprobability samples may be the best compromise we can reach.
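Because readers may be unfamiliar with what “combining the findings” of separate studies actually involves, a minimal sketch follows. It assumes the simplest fixed-effect, inverse-variance combination of standardized effect sizes, in the spirit of the meta-analytic sources cited above (e.g., Hedges & Olkin, 1985); the numerical values and the helper function fixed_effect_pool are hypothetical illustrations, not results or procedures drawn from any study discussed in this essay.

```python
# Minimal fixed-effect meta-analytic combination (inverse-variance weighting).
# All effect sizes and variances below are hypothetical, purely for illustration.
import math

def fixed_effect_pool(effects, variances):
    """Pool standardized effect sizes from several studies.

    effects   -- per-study effect sizes (e.g., standardized mean differences)
    variances -- their sampling variances
    Returns the pooled effect, its standard error, and a 95% confidence interval.
    """
    weights = [1.0 / v for v in variances]          # more precise studies count more
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))              # standard error of the pooled effect
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)   # normal-theory 95% interval
    return pooled, se, ci

# Three hypothetical experiments, each run on a different convenience sample.
effects = [0.42, 0.31, 0.55]
variances = [0.020, 0.035, 0.050]
pooled, se, ci = fixed_effect_pool(effects, variances)
print(f"Pooled d = {pooled:.2f}, SE = {se:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Under these assumptions, three modest experiments, each conducted on a different convenience sample, yield a single and more precise estimate; tests of homogeneity and random-effects models, also treated in the sources cited above, would be the natural next step before claiming anything like “universality.”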
Finally, a third hoped-for outcome of this exchange is that students and scholars of human communication will finally recognize that we’ve done this all before and will put this set of issues to rest once and for all. Researchers will almost certainly continue to investigate certain important relationships in mass communication using experimental methods and convenience samples composed of college-aged students. Whether these experiments are labeled “scientific” or “prescientific” will not change this basic fact, nor will such labels affect the quality and theoretic utility of those studies. Additional debate over these issues, however, will only distract us from what ought to be our primary task: the study of human communication phenomena. Hence, to conclude as I began, with the words of that most famous student of the human condition:

Shall we clap into’t roundly, without hawking or spitting or saying we are hoarse . . . ?
As You Like It, Act V, Sc. 3
References

Berkowitz, L., & Donnerstein, E. (1982). External validity is more than skin deep: Some answers to criticisms of laboratory experiments. American Psychologist, 37, 245-257.
Burnett, J. J., & Dunne, P. M. (1986). An appraisal of the use of student subjects in marketing research. Journal of Business Research, 14, 329-343.
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills: Sage Publications.
Gordon, M. E., Slade, L. A., & Schmitt, N. (1986). The “science of the sophomore” revisited: From conjecture to empiricism. Academy of Management Journal, 11, 191-207.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.
Higbee, K. L., & Wells, M. G. (1972). Some research trends in social psychology during the 1960s. American Psychologist, 27, 963-966.
Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills: Sage Publications.
Khera, I. P., & Benson, J. D. (1970). Are students really poor substitutes for businessmen in behavioral research? Journal of Marketing Research, 7, 529-532.
Kruglanski, A. W. (1975). The human subject in the psychology experiment: Fact and artifact. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 8, pp. 101-147). New York: Academic Press.
London, P., & Rosenhan, D. (1964). Personality dynamics. Annual Review of Psychology, 15, 447-492.
Manuscript Submission Guidelines. (1995). Journal of Broadcasting & Electronic Media, 39, 145-146.
McNemar, Q. (1946). Opinion-attitude methodology. Psychological Bulletin, 43, 289-374.
Ora, J. P., Jr. (1966). Personality characteristics of freshman volunteers for psychology experiments. Unpublished master’s thesis, Vanderbilt University, Nashville, TN.
Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17, 776-783.
Potter, W. J., Cooper, R., & Dupagne, M. (1993). The three paradigms of mass media research in mainstream communication journals. Communication Theory, 3, 317-335.
Potter, W. J., Cooper, R., & Dupagne, M. (1995). Reply to Sparks’s critique. Communication Theory, 5, 280-286.
Rosenthal, R. (1966). Experimenter effects in behavioral research. New York: Appleton-Century-Crofts.
Rosenthal, R., & Rosnow, R. L. (Eds.). (1969). Artifact in behavioral research. New York: Academic Press.
Rosenthal, R., & Rosnow, R. L. (1975). The volunteer subject. New York: John Wiley & Sons.
Rosnow, R. L., & Aiken, L. S. (1973). Mediation of artifacts in behavioral research. Journal of Experimental Social Psychology, 9, 181-201.
Rossiter, C. M. (1976). The validity of communication experiments using human subjects: A review. Human Communication Research, 2, 197-206.
Schultz, D. P. (1969). The human subject in psychological research. Psychological Bulletin, 72, 214-228.
Sparks, G. G. (1995a). Comments concerning the claim that mass media research is “prescientific”: A response to Potter, Cooper, and Dupagne. Communication Theory, 5, 273-280.
Sparks, G. G. (1995b). A final reply to Potter, Cooper, and Dupagne. Communication Theory, 5, 286-289.

John A. Courtright (Ph.D., University of Iowa, 1976) is Professor and Chair of the Department of Communication, University of Delaware.