Experimental Method

Y520 — Spring 2000 Page 1

Experimental Method

“The best method — indeed the only fully compelling method — of establishing causation is to conduct a carefully designed experiment in which the effects of possible lurking variables are controlled. To experiment means to actively change x and to observe the response in y” (p. 202). — Moore, D., & McCabe, D. (1993). Introduction to the practice of statistics. New York: Freeman.

“The experimental method is the only method of research that can truly test hypotheses concerning cause-and-effect relationships. It represents the most valid approach to the solution of educational problems, both practical and theoretical, and to the advancement of education as a science” (p. 298). — Gay, L. R. (1992). Educational research (4th ed.). New York: Merrill.

Importance of Good Design (http://www.tufts.edu/~gdallal/study.htm):

“100% of all disasters are failures of design, not analysis.” — Ron Marks, Toronto, August 16, 1994

“To propose that poor design can be corrected by subtle [statistical] analysis techniques is contrary to good scientific thinking.” — Stuart Pocock (Controlled Clinical Trials, p. 58), regarding the use of retrospective adjustment for trials with historical controls.

“Issues of design always trump issues of analysis.” — G. E. Dallal, 1999, explaining why it would be wasted effort to focus on the analysis of data from a study under challenge whose design was fatally flawed.

Unique Features of Experiments:
1. The investigator manipulates a variable directly (the independent variable).
2. Empirical observations based on experiments provide the strongest argument for cause-effect relationships.

Additional features:
1. Problem statement ⇒ theory ⇒ constructs ⇒ operational definitions ⇒ variables ⇒ hypotheses.
2. The research question (hypothesis) is often stated as the alternative to the null hypothesis, which is used to interpret differences in the empirical data.
3. Random sampling of subjects from the population (ensures the sample is representative of the population).
4. Random assignment of subjects to treatment and control (comparison) groups (ensures equivalency of groups; i.e., unknown variables that may influence the outcome are equally distributed across groups).
5. Extraneous variables are controlled by 3 & 4, and by other procedures if needed.
6. After treatment, the performance of subjects (dependent variable) in both groups is compared.

Ways to control extraneous variables:
1. Random assignment of subjects to groups. This is the best way to control extraneous variables in experimental research. It provides control for subject characteristics, maturation, and statistical regression.
2. Threats that may still exist:
   a. Subject mortality (i.e., dropouts due to treatment)
   b. Hawthorne effect
   c. Fidelity of treatment (manipulation check)
   d. Data collector bias (double-blind studies)
   e. Location, history
3. Additional procedures for controlling extraneous variables (use as needed):
   a. Exclude certain variables.
   b. Blocking.
   c. Matching subjects on certain characteristics.
   d. Use subject as own control.
   e. Analysis of covariance.
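Random assignment (features 3 and 4 above) is mechanically simple. A minimal sketch in Python, with a hypothetical pool of 20 subject IDs (the function name and group sizes are invented for illustration):

```python
import random

def randomly_assign(subjects, seed=0):
    """Shuffle the subject pool and split it into equal-sized
    treatment and control groups (simple random assignment)."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

# Twenty hypothetical subject IDs, split 10 / 10 at random.
treatment, control = randomly_assign(range(20))
```

Because every subject has the same chance of landing in either group, unknown influences on the outcome are (in expectation) equally distributed across the groups.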

Michael – Y520


True Experimental Designs

A. Randomized Post-test Only Control Group Design

   Treatment    R   X1   O
   Comparison   R   X2   O

   R = random assignment
   X = treatment (occurs for X1 only)
   O = observation (dependent variable)

This is the best of all designs for experimental research. Random assignment controls for subject characteristics, maturation, and statistical regression. Potential threats not controlled: subject mortality, Hawthorne effect, fidelity of treatment, data collector bias, unique features of location, and history of subjects.
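The analysis for this design is a direct comparison of the two post-test observations. A small simulation (all numbers hypothetical; the treatment is assumed to raise the mean score by about 5 points):

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical post-test scores for two randomly assigned groups;
# the treatment is assumed to add about 5 points on average.
control_scores = [rng.gauss(70, 10) for _ in range(100)]
treatment_scores = [rng.gauss(75, 10) for _ in range(100)]

# With random assignment, the difference between the two group
# means (the two O's in the diagram) estimates the treatment effect.
effect = statistics.mean(treatment_scores) - statistics.mean(control_scores)
```

No pretest is needed: randomization alone makes the groups comparable, so the post-test difference carries the causal interpretation.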

B. Randomized Pretest — Post-test Control Group Design

   Treatment    R   O1   X1   O2
   Comparison   R   O1   X2   O2

   R = random assignment
   X = treatment (occurs for X1 only)
   O1 = observation (pre-test)
   O2 = observation (post-test, dependent variable)

Potential threat: effect of pre-testing.

C. Randomized Solomon Four-Group Design

   Treatment    R   O1   X1   O2
   Comparison   R   O1   X2   O2
   ————————————————————
   Treatment    R        X1   O2
   Comparison   R        X2   O2

   R = random assignment
   X = treatment (occurs for X1 only)
   O1 = observation (pre-test)
   O2 = observation (post-test, dependent variable)

Random sampling and random assignment. This design provides the best control of threats to internal validity, particularly the threat introduced by pretesting, but it requires a relatively large number of subjects.

D. Randomized Assignment with Matching

1. Randomized (Sampling & Assignment), Matched Ss, Post-test Only, Control Group

   Treatment    M,R   X1   O
   ————————————————————
   Comparison   M,R   X2   O

   M = matched subjects
   R = random assignment of matched pairs
   X = treatment (X1 only)
   O = observation (dependent variable)

Example: An experimenter wants to test the impact of a novel instructional program in formal logic. The investigator infers from reports in the literature that high-ability students and those with programming, mathematical, or music backgrounds are likely to excel in formal logic regardless of the type of instruction. The experimenter randomly samples subjects, looks at the subjects’ SAT scores, matches subjects on the basis of SAT scores, and randomly assigns matched pairs (one of each pair to each group). The other concomitant variables (previous programming, mathematical, and music experience) could also be “matched.”


2. Randomized Pretest — Post-test Control Group, Matched Ss

   Treatment    O1   M,R   X1   O2
   ————————————————————
   Comparison   O1   M,R   X2   O2

   O1 = pretest
   M = matched subjects
   R = random assignment of matched pairs
   X = treatment (X1 only)
   O2 = observation (dependent variable)

Subjects are matched on the basis of their pretest score and pairs of subjects are randomly assigned to groups.
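That procedure — rank subjects on the pretest, pair adjacent subjects, then randomly split each pair between groups — can be sketched as follows. The subject IDs and pretest scores are invented for illustration:

```python
import random

def assign_matched_pairs(pretest_scores, seed=0):
    """Rank subjects by pretest score, pair adjacent subjects,
    then randomly assign one member of each pair to each group."""
    rng = random.Random(seed)
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    treatment, comparison = [], []
    # Walk the ranked list two at a time; adjacent subjects form a pair.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)          # coin flip within the pair
        treatment.append(pair[0])
        comparison.append(pair[1])
    return treatment, comparison

scores = {"s1": 88, "s2": 91, "s3": 73, "s4": 75, "s5": 60, "s6": 62}
t, c = assign_matched_pairs(scores)
```

Each matched pair contributes one subject to each group, so the groups start out balanced on the matching variable while assignment within pairs remains random.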

3. Matching Methods

a. Mechanical matching
   1) Rank-order the subjects on the variable, take the top two, and randomly assign the members of the pair to groups. Repeat for all pairs.
   2) Problems:
      • It is impossible to match on more than one or two variables simultaneously.
      • Some Ss may need to be eliminated because no appropriate match exists for one of the groups.

b. Statistical matching
   1) The purpose is to control for factors that cannot be randomized but nonetheless can be measured on (at least) an interval scale (though in practice ordinal scales are often treated as if they were interval). Statistical control is achieved by measuring one or more concomitant variables (referred to as “covariates”) in addition to the variable (variate) of primary interest (i.e., the dependent or response variable). Statistical control can be used in experimental designs, and because no direct manipulation of subjects or conditions is required, it can also be used in quasi-experimental and non-experimental designs.
   2) “Analysis of covariance is used to test the main and interaction effects of categorical variables on a continuous dependent variable, controlling for the effects of selected other continuous variables which covary with the dependent. The control variable is called the ‘covariate’.” (http://www2.chass.ncsu.edu/garson/pa765/ancova.htm)
   3) “To control a covariate statistically means the same as to adjust for the covariate or to correct for covariate, or to hold constant or to partial out the covariate.” (http://www.psych.uiuc.edu/mho/psy307a.html)
   4) But see:
      Loftin, L., & Madison, S. (1991). The extreme dangers of covariance corrections. In B. Thompson (Ed.), Advances in educational research: Substantive findings, methodological developments (Vol. 1, pp. 133-148). Greenwich, CT: JAI Press. (ISBN: 1-55938-316-X)
      Thompson, B. (1992). Misuse of ANCOVA and related “statistical control” procedures. Reading Psychology, 13, iii-xviii.
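A minimal numeric sketch of statistical control: ANCOVA can be expressed as an ordinary linear model in which the coefficient on the group indicator is the covariate-adjusted treatment effect. The data below are simulated (all numbers hypothetical), with the true adjusted effect set to 3 points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a pretest-like covariate influences the outcome,
# and the treatment is assumed to add 3 points on top of it.
n = 200
covariate = rng.normal(50, 10, n)
group = np.repeat([0, 1], n // 2)          # 0 = comparison, 1 = treatment
outcome = 0.8 * covariate + 3.0 * group + rng.normal(0, 2, n)

# ANCOVA as a linear model: outcome ~ intercept + group + covariate.
# The coefficient on `group` is the covariate-adjusted treatment effect.
X = np.column_stack([np.ones(n), group, covariate])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted_effect = coef[1]
```

Note that this "adjusts for" the covariate rather than removing it by design — which is exactly why the Loftin & Madison and Thompson cautions above apply when the groups differ on the covariate to begin with.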



“Pre-Experimental” Designs

A. One-Shot Case Study

   X   O

   X = “treatment”
   O = observation (dependent variable)

Problems: No control group; cannot tell if the “treatment” had any effect. Comments from Campbell and Stanley (1963):
• “As has been pointed out (e.g., Boring, 1954; Stouffer, 1949) such studies have such a total absence of control as to be of almost no scientific value” (p. 6).
• “Basic to scientific evidence (and to all knowledge-diagnostic processes including the retina of the eye) is the process of comparison, of recording differences, or of contrast. Any appearance of absolute knowledge, or intrinsic knowledge about singular isolated objects, is found to be illusory upon analysis. Securing scientific evidence involves making at least one comparison” (p. 6).
• “It seems well-nigh unethical... to allow, as theses or dissertations in education, case studies of this nature (i.e., involving a single group observed at one time only)” (p. 7).

B. One-Group Pretest — Post-test Design

   O1   X   O2

   O1 = pretest
   X = “treatment”
   O2 = observation (dependent variable)

Problems: No control group. Changes between the pre- and post-test may be due — not to the treatment — but to: history, maturation, instrument decay, data collector characteristics, data collector bias, testing, statistical regression, attitude of subjects, problems with implementation, etc.

C. Static-Group Comparison Design

   X   O1
   ————————————————————
       O2

   X = “treatment”
   O1, O2 = observations (dependent variable)

Intact, existing groups are used. There is no random selection of subjects and no random assignment to groups, so there is no way to ensure the equivalence of the groups. Comments from Campbell and Stanley (1963):
• “Instances of this kind of research include, for example, the comparison of school systems which require the bachelor’s degree of teachers (the X) versus those which do not; the comparison of students in classes given speed-reading training versus those not given it; the comparison of those who heard a certain TV program with those who did not, etc.” (p. 12).
• There is “... no formal means of certifying that the groups would have been equivalent had it not been for the X.... If O1 and O2 differ, this difference could well have come through the differential recruitment of persons making up the groups: the groups might have differed anyway, without the occurrence of X” (p. 12).



Quasi-Experimental Designs
• No random sampling of subjects; intact groups are often used.
• No random assignment of Ss to groups, so confidence in the equivalency of the groups is lower.

A. Matching-Only Group Design

   Treatment    M   X1   O
   ————————————————————
   Control      M   X2   O

   M = matched subjects
   X1 = “treatment”
   O = observation (dependent variable)

B. Matching-Only Pretest — Post-test Group Design

   Treatment    O1   M   X1   O2
   ————————————————————
   Control      O1   M   X2   O2

   O1 = pretest
   M = matched subjects
   X1 = “treatment”
   O2 = post-test

• Existing, intact groups.
• Subjects are matched on one or more variables; one cannot be certain the groups are equivalent on the remaining unmatched variables.
• Matching is never a substitute for random sampling and random assignment to groups.

C. Single-Group Time-Series Design
• “The essence of the time-series design is the presence of a periodic measurement process on some group or individual and the introduction of an experimental change into this time series of measurements, the results of which are indicated by a discontinuity in the measurements recorded in the time series” (Campbell & Stanley, 1963, p. 37).

   O1   O2   O3   O4   O5   X1   O6   O7   O8   O9   O10

   X1 = “treatment”
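The discontinuity Campbell and Stanley describe can be illustrated numerically: a jump between the pre- and post-treatment means that is large relative to the ordinary variability of the series. The measurements below are invented for illustration, with the "treatment" assumed to shift the series up by about 4 points:

```python
import statistics

# Hypothetical measurements O1..O5 (before X1) and O6..O10 (after).
before = [50.1, 49.8, 50.5, 50.2, 49.9]   # O1 .. O5
after = [54.2, 53.8, 54.5, 54.1, 53.9]    # O6 .. O10

# A discontinuity shows up as a jump between the pre- and
# post-treatment means that dwarfs the series' ordinary wobble.
shift = statistics.mean(after) - statistics.mean(before)
wobble = statistics.stdev(before)
```

The repeated pre-treatment observations are what distinguish this design from the one-group pretest-post-test design: they establish the series' trend and variability, against which the post-treatment jump can be judged.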

Factorial Designs
• Requires, at a minimum, two levels of variable A crossed with two levels of variable B. That is, all levels of A occur with all levels of B.
• Factorial designs enable the investigator to observe an interaction, if one exists. An interaction means that the effect of one independent variable on the dependent variable differs across the levels of the other independent variable.
• “Let us suppose that three types of teachers are all, in general, effective (e.g., the spontaneous extemporizers, the conscientious preparers, and the close supervisors of student work). Similarly, three teaching methods in general turn out to be equally effective (e.g., group discussion, formal lecture, and tutorial). In such a case..., teaching methods could plausibly interact strongly with types, the spontaneous extemporizer doing best with group discussion and poorest with tutorial, and the close supervisor doing best with tutorial and poorest with group discussion” (Campbell & Stanley, 1963, p. 29).
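Campbell and Stanley's example can be made concrete with a 2 x 2 slice of it. The cell means below are invented: the simple effect of teaching method reverses across teacher types, which is the signature of an interaction (a nonzero "difference of differences"):

```python
# Hypothetical cell means for a 2 x 2 factorial: teacher type
# crossed with teaching method (invented numbers, for illustration).
cell_means = {
    ("extemporizer", "discussion"): 85,
    ("extemporizer", "tutorial"):   70,
    ("supervisor",   "discussion"): 70,
    ("supervisor",   "tutorial"):   85,
}

# Simple effect of method within each teacher type.
d_ext = cell_means[("extemporizer", "discussion")] - cell_means[("extemporizer", "tutorial")]
d_sup = cell_means[("supervisor", "discussion")] - cell_means[("supervisor", "tutorial")]

# A nonzero difference of differences indicates an interaction:
# the method effect reverses depending on teacher type.
interaction = d_ext - d_sup
```

Note that in this example both main effects cancel (each method and each teacher type averages 77.5), so a design that studied only one factor at a time would conclude, misleadingly, that nothing matters.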

Threats to Internal Validity
Is the investigator’s conclusion correct? Are the changes in the independent variable indeed responsible for the observed variation in the dependent variable? Or might the variation in the dependent variable be attributable to other causes? This is the question of internal validity. The following list is from Campbell and Stanley (1963) as interpreted by Kirk (1995):

1. “History. Events other than the administration of a treatment level that occur between the time the treatment level is assigned to subjects and the time the dependent variable is measured may affect the dependent variable.

2. “Maturation. Processes not related to the administration of a treatment level that occur within subjects simply as a function of the passage of time (growing older, stronger, larger, more experienced, and so on) may affect the dependent variable.

3. “Testing. Repeated testing of subjects may result in familiarity with the testing situation or acquisition of information that can affect the dependent variable.

Michael – Y520

4. “Instrumentation. Changes in the calibration of a measuring instrument, shifts in the criteria used by observers and scorers, or unequal intervals in different ranges of a measuring instrument can affect the measurement of the dependent variable.

5. “Statistical regression. When the measurement of the dependent variable is not perfectly reliable, there is a tendency for extreme scores to regress or move toward the mean. Statistical regression operates to (a) increase the scores of subjects originally found to score low on a test, (b) decrease the scores of subjects originally found to score high on a test, and (c) not affect the scores of subjects at the mean of the test. The amount of statistical regression is inversely related to the reliability of the test.

6. “Selection. Differences among the dependent-variable means may reflect prior differences among the subjects assigned to the various levels of the independent variable.

7. “Mortality. The loss of subjects in the various treatment conditions may alter the distribution of subject characteristics across the treatment groups.

8. “Interactions with selection. Some of the foregoing threats to internal validity may interact with selection to produce effects that are confounded with or indistinguishable from treatment effects. Among these are selection-history effects and selection-maturation effects. For example, selection-maturation effects occur when subjects with different maturation schedules are assigned to different treatment levels.

9. “Ambiguity about the direction of causal influence. In some types of research — for example, correlational studies — it may be difficult to determine whether X is responsible for the change in Y or vice versa. This ambiguity is not present when X is known to occur before Y.

10. “Diffusion or imitation of treatments. Sometimes the independent variable involves information that is selectively presented to subjects in the various treatment levels. If the subjects in different levels can communicate with one another, differences among the treatment levels may be compromised.

11. “Compensatory rivalry by respondents receiving less desirable treatments. When subjects in some treatment levels receive goods or services generally believed to be desirable and this becomes known to subjects in treatment levels that do not receive those goods and services, social competition may motivate the subjects in the latter group, the control subjects, to attempt to reverse or reduce the anticipated effects of the desirable treatment levels. Saretsky (1972) named this the ‘John Henry’ effect in honor of the steel driver who, upon learning that his output was being compared with that of a steam drill, worked so hard that he outperformed the drill and died of overexertion.

12. “Resentful demoralization of respondents receiving less desirable treatments. If subjects learn that the treatment level to which they have been assigned received less desirable goods or services, they may experience feelings of resentment and demoralization. Their response may be to perform at an abnormally low level, thereby increasing the magnitude of the difference between their performance and that of units assigned to the desirable treatment level.”

References:
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago, IL: Rand McNally.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences. Pacific Grove, CA: Brooks/Cole.

