Downloaded By: [TIB-Lizenzen - TIB Licence Affairs] At: 13:47 28 July 2008

EUROPEAN JOURNAL FOR HIGH ABILITY, 1991, 2, 236-243.

EFFECT STRENGTH VS. STATISTICAL SIGNIFICANCE: A WARNING AGAINST THE DANGER OF SMALL SAMPLES
A comment on Gefferth and Herskovits's article "Leisure activities as predictors of giftedness"¹

Detlef H. Rost

Abstract: In research on giftedness, small samples are the rule rather than the exception. Such samples require an especially meticulous research design. Without power-analytical considerations before data gathering (i.e., determining the minimal sample sizes from the α- and β-error and the expected effects), false conclusions based on faulty data interpretation can easily result. Especially with small samples, it is strongly recommended that practical relevance (assessed with a suitable index of effect strength) be taken into account when interpreting observed group differences, rather than relying on "statistical significance" alone. Group studies should not be conducted with extremely small samples. According to a proven "rule of thumb", each subgroup tested should comprise at least 20 to 25 subjects.

Because small-sample studies are prevalent in research on giftedness (gifted individuals are by definition rare), it is important that statistical analyses be conducted in ways that are capable of detecting "significant" effects. They must not be predestined, for instance because of small sample size, to conclude that no effects are present even when they are (i.e., to accept the null hypothesis even when it is false). The nature of the problem can be made more concrete with a practical example taken from an article by Gefferth and Herskovits (1991). In a recent study published in this journal, Gefferth and Herskovits (1991) investigated an unselected sample of 192 Hungarian academic secondary school students (80 boys and 112 girls aged 15; sample [T]). They administered two leisure activities questionnaires, the MAI (Bittner & Hany, 1987) and the TAICP (Milgram, 1987).
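The a priori power analysis recommended above can be illustrated in a few lines of Python. This is a minimal sketch, not the author's procedure. It assumes two equal-sized groups, a two-sided test, and the normal approximation to the t-test, which slightly underestimates the exact requirement. The function name `min_n_per_group` is hypothetical. For α = .05, β = .20 and an assumed large expected effect of d = 0.8, it yields 25 subjects per group, close to the 20 to 25 rule of thumb. An expected effect of d = 0.5 raises the requirement to more than 60 per group.

```python
from math import ceil
from statistics import NormalDist


def min_n_per_group(d: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    """Smallest n per group (two equal groups) for which a two-sided test at
    level alpha detects a true standardized difference d with power 1 - beta.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d) ** 2
    """
    z = NormalDist().inv_cdf      # standard normal quantile function
    z_alpha = z(1 - alpha / 2)    # controls the type I (alpha) error
    z_beta = z(1 - beta)          # controls the type II (beta) error
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


if __name__ == "__main__":
    # Hypothetical expected effect sizes, chosen only for illustration
    for d in (0.5, 0.8, 1.0):
        print(f"expected ES d = {d}: at least {min_n_per_group(d)} subjects per group")
```

An exact noncentral-t calculation would add roughly one subject per group to each of these figures.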

Author's address: Detlef H. Rost, Philipps-University, Department of Psychology, Gutenbergstr. 18, D-W-3500 Marburg/Lahn, Germany

From this sample of 192 students they selected the upper ten percent of "intellectually gifted" students (subsample [A]: 9 boys and 9 girls, defined as those with the best scores on Raven's intelligence test). They also selected the upper ten percent of "schoolhouse gifted" students (subsample [B]: 5 boys and 11 girls, defined as those with the best grades). Investigating gender differences across the various scales of both leisure-time questionnaires, Gefferth and Herskovits found significant mean differences in 13 of 18 comparisons in sample [T]. When testing for sex differences in the two small subsamples of 18 "intellectually gifted" [A] and 16 "schoolhouse gifted" [B] students, however, they found only 5 of 36 comparisons statistically significant. Hence the authors (1991, p. 49) concluded that "the characteristic differences in leisure time activities decrease or vanish if boys and girls who have high intellectual potential, are clever or display excellent academic achievement are taken". This interpretation is erroneous and misleading, as the following analysis will show.
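How little the gifted subsamples can reveal is easy to quantify. The sketch below approximates the power of a two-sided test at α = .05 for the three group sizes reported above. It assumes, purely hypothetically, a true large gender difference of d = 0.8 and uses the normal approximation to the t-test. `approx_power` is an illustrative name, not from the study. Under these assumptions the power is close to 1 for sample [T]. It is only about 0.40 for [A] (9 boys, 9 girls) and about 0.32 for [B] (5 boys, 11 girls). Most genuine large differences would therefore fail to reach significance in the gifted subsamples.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(d: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test for a true
    standardized mean difference d (normal approximation to the t-test)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the true effect is d
    shift = abs(d) * sqrt(n1 * n2 / (n1 + n2))
    # Probability of landing in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    groups = {
        "[T] 80 boys, 112 girls": (80, 112),
        "[A] 9 boys, 9 girls": (9, 9),
        "[B] 5 boys, 11 girls": (5, 11),
    }
    for label, (n1, n2) in groups.items():
        print(f"{label}: power = {approx_power(0.8, n1, n2):.2f}")
```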

The importance of effect sizes for interpreting mean differences

Introductory textbooks of psychological statistics frequently point out what the significance of a mean difference between two groups depends on:

(1) the actual size of the observed mean difference (i.e., $M_1 - M_2$);
(2) the within-group standard deviations of the scores (i.e., $S_1$, $S_2$);
(3) the chosen significance level (i.e., the probability of a type I error, usually α = .05);
(4) the power of the statistical test used for significance testing (for example, the t-test); and, most importantly,
(5) the number of subjects under investigation ($N_1$, $N_2$).

It is a basic fact of empirical research that whether a statistical test yields significance depends heavily on sample size. For this reason, statisticians have developed various indices for estimating the practical relevance of observed effects. In the present case of two groups, namely boys and girls, the standardized effect size (ES) may be calculated by dividing the observed mean difference by the pooled within-group standard deviation. The effect sizes for the gender differences in the total sample ($ES_{[T]}$) and in the two gifted subsamples ($ES_{[A]}$, $ES_{[B]}$) are shown in Tables 1 and 2.

In the unselected sample [T], the smallest significant effect is $ES_{[T]\min} = 0.29$, and the largest effect reaches $ES_{[T]\max} = 0.78$. In the two gifted subsamples [A] and [B], the ESs range from $ES_{[A]\min} = 0.01$ to $ES_{[B]\max} = 2.44$. Altogether, 64 percent (23 of 36) of all comparisons between gifted boys and girls reveal effects substantially larger than those found in the total sample [T]. In only 25 percent (9 of 36) of the comparisons are the effects smaller than those in the unselected sample [T]. In 4 comparisons (11 percent) the effect strength is similar. Moreover, the direction of the gender differences observed in the two gifted subsamples [A] and [B] almost always corresponds to that found in the total sample [T]: this holds in 87 percent of the comparisons (28 of 32).

Table 1. Gender differences in the Munich Activity Inventory in terms of effect sizes (ES), based on data from Gefferth and Herskovits (1991)

| Leisure time activity | ES[T]: total sample [T]ᵃ | ES[A]: intellectually gifted [A]ᵇ | ES[B]: schoolhouse gifted [B]ᶜ | Group with greater ES, [T] vs [A] | Group with greater ES, [T] vs [B] | Direction of gender differences, [T] vs [B] | Direction of gender differences, [T] vs [A] |
|---|---|---|---|---|---|---|---|
| Natural science | 0.78** | 0.95 | 0.34 | [A] | [T] | same | different |
| Social activities | 0.44** | 0.25 | 1.39* | [T] | [B] | same | same |
| Technics | 0.77** | 1.03 | 0.14 | [A] | [T] | same | different |
| Music | 0.04 | 0.37 | 0.90 | [A] | [B] | same | same |
| Arts | 0.60** | 0.92 | 0.97 | [A] | [B] | same | same |
| Literature | 0.25 | 0.97 | 1.64* | [A] | [B] | same | same |
| Sports | 0.29* | 0.26 | 0.91 | [T] = [A] | [B] | same | same |
| Creativity | 0.23 | 0.60 | 0.75 | [A] | [B] | same | same |
| Total | 0.03 | 0.01 | 1.06 | [T] = [A] | [B] | | |

* = p …
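The effect-size computation described above, and the reason why a given effect strength can be significant in one sample but not in another, can be sketched as follows. The ES is the mean difference divided by the pooled within-group standard deviation. For two independent groups, Student's t equals $ES\sqrt{n_1 n_2/(n_1+n_2)}$.

This Python sketch rests on several assumptions. It assumes pooled-variance t-tests on the full groups (80/112, 9/9, 5/11) and uses the rounded ES values from Table 1. It takes approximate two-sided 5 percent critical values of t from standard tables. The helper names and the toy means in the usage line are hypothetical, not taken from the study. Under these assumptions it reproduces the pattern in Table 1:

- ES = 0.78 is significant in [T], while the larger ES = 0.95 is not significant in [A].
- ES = 1.64 is significant in [B], while ES = 0.25 is not significant in [T].

```python
from math import sqrt


def pooled_sd(s1: float, s2: float, n1: int, n2: int) -> float:
    """Pooled within-group standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def effect_size(m1: float, m2: float, s1: float, s2: float, n1: int, n2: int) -> float:
    """Standardized mean difference: (M1 - M2) / pooled SD."""
    return (m1 - m2) / pooled_sd(s1, s2, n1, n2)


def t_from_es(es: float, n1: int, n2: int) -> float:
    """Pooled-variance Student's t implied by an effect size and group sizes."""
    return es * sqrt(n1 * n2 / (n1 + n2))


# Approximate two-sided 5% critical values of t, keyed by df = n1 + n2 - 2
T_CRIT = {190: 1.973, 16: 2.120, 14: 2.145}

# (label, ES from Table 1, n boys, n girls)
CASES = [
    ("Natural science, [T]", 0.78, 80, 112),
    ("Natural science, [A]", 0.95, 9, 9),
    ("Literature, [T]", 0.25, 80, 112),
    ("Literature, [B]", 1.64, 5, 11),
]

if __name__ == "__main__":
    for label, es, n1, n2 in CASES:
        t = t_from_es(es, n1, n2)
        significant = abs(t) > T_CRIT[n1 + n2 - 2]
        print(f"{label}: ES = {es:.2f}, t = {t:.2f}, significant at .05: {significant}")
    # Hypothetical group statistics, only to show the ES formula: prints 1.0
    print(effect_size(10.0, 8.0, 2.0, 2.0, 10, 10))
```

The t values grow with the square root of the group sizes, so the same ES produces a much smaller t in the gifted subsamples. This is the point of the argument: significance tests on very small groups say little about the size of the underlying differences.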