The Journal of Experimental Education, 2004, 73(1), 41-52
Multiple Intelligences and Reading Achievement: An Examination of the Teele Inventory of Multiple Intelligences

SUSAN D. McMAHON, DePaul University
DALE S. ROSE, MICHAELA PARKS, 3D Group
ABSTRACT. With increased interest in the theory of multiple intelligences (MI), there is a need to identify and evaluate instruments designed to assess them. This study was designed to evaluate the reliability of the Teele Inventory of Multiple Intelligences (TIMI) and the relationship between intellectual preferences and reading achievement. The TIMI was administered to 288 urban 4th-grade students. Results suggest that the TIMI subscales, which examine preferences for linguistic, logical-mathematical, interpersonal, intrapersonal, musical, spatial, and bodily-kinesthetic intelligences, have poor to moderate reliability. Students with higher scores on logical-mathematical intelligence were more likely to demonstrate at or above grade-level reading comprehension scores compared with students who scored lower on logical-mathematical intelligence, but none of the other MI scales was predictive of student achievement. Implications for test development and assessment of MI are discussed. Key words: assessment, elementary students, multiple intelligences, reading achievement, TIMI
The authors acknowledge the DePaul University College of Liberal Arts and Sciences Summer Research Grant, Reading in Motion, Chicago Public Schools, and the Evanston School District for their assistance in this study. Address correspondence to: Susan D. McMahon, 2219 N. Kenmore, Chicago, IL 60614. E-mail: [email protected]

AS EDUCATORS CONTINUE TO SEARCH for effective methods of teaching, there has been increasing interest in the role and assessment of multiple intelligences (MI) in relation to learning and achievement. Yet, the development
of standardized, reliable assessment tools for MI has lagged behind the development of theory (Klein, 1997). If educators seek to use MI concepts to tailor teaching techniques, they need reliable, valid ways of assessing student preferences and student abilities. There are some tools available to assess MI among schoolchildren, but there are few data about the psychometric properties of those tools. Thus, a primary purpose of this study was to assess the reliability and validity of one tool that assesses MI preferences, the Teele Inventory of Multiple Intelligences (TIMI). In addition, it is useful to know the extent to which MI relates to academic achievement, as achievement is the primary yardstick that schools use to assess learning. Thus, the second goal of this study was to validate the TIMI against a standardized achievement test that is used in the educational system.

Gardner (1993) posited a theory of multiple intelligences, which proposes distinct areas of skill that each individual possesses to a different degree. His original theory comprised seven areas of intelligence: linguistic (learn through auditory and verbal methods); logical-mathematical (focus on logic, order, and problem solving); spatial (learn through visualization); musical (use rhythm and sound to process information); bodily-kinesthetic (learn through body sensations and physical activities); intrapersonal (focus on inner self); and interpersonal (relate well to others, learn through collaboration). More recently, Gardner (1999) added three intelligences to the original seven: naturalist (sensitivity to the ecological environment), moralist (sensitivity to ethical concerns), and existentialist (insight into the meaning of life). However, the majority of existing empirical research and available measurement tools, including the TIMI, are based on his original theory of seven multiple intelligences.
Gardner and Hatch (1990) demonstrated that children perform differently on activities that require the use of different intelligences, suggesting that they have strengths and weaknesses in different areas and distinct intellectual profiles. Although Gardner's theory has been criticized as being too broad for planning a curriculum, inadequately supported by evidence, and representing abilities in a static manner, his MI theory has created much interest in more diverse teaching strategies, balanced programming, and matching instruction to learning styles (Klein, 1997). Furthermore, Klein has suggested that learning more about the knowledge and strategies that students use in particular activities would increase the relevance of MI theory to classroom interventions. If valid and reliable tools are developed to assess multiple intelligences, then these preferences could be examined in relation to skill development in particular areas. Some researchers have argued for alternative assessment procedures that address the biases of standardized test-based approaches to placement and to measuring student learning (Frasier et al., 1995; Gardner, 1993; Krechevsky & Gardner, 1990; Reid, Romanoff, Algozzine, & Udall, 2000), yet few studies have
reported the relationship between the various intelligences and standardized test scores. Although standardized test scores have been criticized on many fronts, they remain the essential markers of student and school success. Additional basic research on, and development of, standardized alternative approaches to assessment have the potential to move the field of education forward. For example, if certain intelligences are linked with currently used measures of achievement, and if interventions can demonstrate increases in those intelligences or increases in achievement, then schools are more likely to adopt the MI approach to assessment and learning. Reid and colleagues (2000) demonstrated that spatial, linguistic, and logical-mathematical intelligences were statistically significantly associated with the Matrix Analogies Test (MAT; Naglieri, 1985), a nonverbal assessment used for making screening decisions for gifted programs. Assessment of intelligences was conducted through a Problem-Solving Assessment (PSA; Reid et al.), which involves standardized observation of a series of linguistic, logical-mathematical, and spatial problem-solving activities over a period of 4 hr. These researchers found that MAT scores were correlated with PSA scores, indicating moderate concurrent validity, but that the PSA yielded a higher referral rate to gifted programs (about 40%) compared with the MAT (about 17%). Furthermore, placement decisions were more highly correlated with performance on MI tasks assessed by the PSA (.59-.74) than with MAT stanine scores (.43). This study suggests that MI theory can be applied to placement decisions, that alternative forms of measurement can identify more diverse groups of students as gifted, and that multiple intelligences can be reliably assessed through observation of problem-solving activities. Yet, the assessment process took several months, and this type of approach is unlikely to be adopted by schools on a large scale. 
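To make the two statistics contrasted above concrete, the following sketch computes a concurrent-validity correlation and referral rates for two hypothetical assessments. All scores, cutoffs, and variable names here are simulated assumptions for illustration, not data from Reid et al. (2000):

```python
import numpy as np

# Two hypothetical, positively related assessment scores for 200 students.
rng = np.random.default_rng(1)
mat = rng.normal(50, 10, 200)                # stand-in for MAT scores
psa = 0.6 * mat + rng.normal(0, 8, 200)      # stand-in for PSA scores

# Concurrent validity: the Pearson correlation between the two measures.
r = np.corrcoef(mat, psa)[0, 1]

# Referral rates under illustrative cutoffs (top ~17% vs. top ~40%),
# mirroring the contrast in referral rates described in the text.
mat_rate = (mat > np.percentile(mat, 83)).mean()
psa_rate = (psa > np.percentile(psa, 60)).mean()
```

A moderate positive correlation between two instruments can coexist with very different referral rates, because referral depends on where each instrument's cutoff falls, not on the correlation alone.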
Thus, there is a need to identify assessment tools that are easy to use and accurate. Some schools have applied multiple intelligences theory to their curricula and have reported success in improving performance on achievement tests (Geimer, Getz, Pochert, & Pullam, 2000; Gens, Provance, VanDuyne, & Zimmerman, 1998; Greenhawk, 1997; Kuzniewski, Sanders, Smith, Swanson, & Urich, 1998; Mettetal, Jordan, & Harper, 1997). Greenhawk reported a 20% increase in students' scores on the Maryland School Performance Assessment after just 1 year of implementation of MI techniques across the curriculum. In that study, students were taught how to assess and use their own MI strengths and weaknesses and were encouraged to use a variety of intelligences to display their knowledge. Mettetal et al. demonstrated improvements in standardized test scores incrementally over a 2-year period in a suburban Indiana school, with a marked increase during the second testing period. In a study of students in 9th through 11th grades in suburban Chicago, implementation of multiple intelligence strategies yielded improved scores in reading comprehension and math skills
(Kuzniewski et al.). Although each of these studies reported positive results from MI-based curricula, none used a standardized approach to assess individual multiple intelligences. When multiple intelligences are not assessed in relation to MI interventions, it is unclear whether the interventions are effective because they are tapping multiple intelligences. For example, it is possible that MI interventions led to improvements as a result of other changes, such as the use of a curriculum that incorporates more interactive teaching strategies than were previously used. As schools search for ways to improve test scores, reliable, "culture-fair" approaches to assess the strengths and weaknesses of their students are needed. However, there continues to be a dearth of research on tools to assess children's multiple intelligences, with even less research on how multiple intelligences relate to academic achievement. When the current study was conducted, we could find only two paper-and-pencil measures of multiple intelligences appropriate for children: the Teele Inventory of Multiple Intelligences (TIMI; Teele, 1992) and the Multiple Intelligence Developmental Assessment Scales for Children (MIDAS-for-KIDS; Shearer, 1997). Although Gardner (1993) has advocated that assessment of intelligences should be conducted with the materials of that intelligence (i.e., musical intelligence should be assessed with musical instruments), it is not always practical or feasible for educators to use these lengthy and complex tools to assess student strengths and weaknesses. Although it has been suggested that alternative assessments (i.e., performance-based) can provide a more accurate measure of student achievement and ability, problems with cost, bias, training, scoring, and lack of sound psychometric characteristics have also been raised (e.g., Plucker, Callahan, & Tomchin, 1996). Educators need instruments that are reliable and easy to use.
This need is highlighted by the fact that the TIMI is being used in more than 1,000 schools in the United States and seven other countries (Teele, 1996). Thus, there is a need to learn more about tools that are easily accessible to determine whether they have adequate reliability and validity. Teele (1995, 1996) has indicated that the TIMI has proven reliable through test-retest studies, but internal consistency data have not been reported. Furthermore, given the popularity of multiple intelligences and the widespread use of this assessment tool, it is necessary to assess further the reliability of the inventory. It should be noted that the TIMI assesses preferences for learning, using the concept of multiple intelligences. This approach is in contrast to assessing actual intelligences, which would require a different, more comprehensive type of approach to assessment. In the current study, we examined the following aspects of the TIMI: (a) the reliability of each of the MI subscales, (b) relationships among the different intelligences, and (c) the relationships between each of the intelligences and reading comprehension, as assessed by the Gates-MacGinitie Test of Reading (MacGinitie & MacGinitie, 1989).
Method

Procedure

We solicited two Illinois school districts to participate in this study (Chicago and Evanston). In Evanston, a Chicago suburb, school district administrators recommended particular schools to participate, and we held meetings at these schools to recruit teachers for participation. Fourth-grade students from three Chicago schools (nine classes) and two Evanston schools (six classes) participated in this study. Of the 288 students who completed the TIMI, 218 had valid scores on the Gates-MacGinitie Test of Reading. Gender and racial/ethnic background information was not available, but basic demographic information on the participating schools is shown in Table 1. The majority of the students in the participating schools were low-income, ethnic minority students.

Measures

Student reading. We measured reading comprehension using the Gates-MacGinitie Test of Reading (MacGinitie & MacGinitie, 1989), a nationally normed test used in many school districts across the United States. The test measures reading comprehension and is administered in a group format. Kuder-Richardson 20 (K-R 20) reliability coefficients reported in the technical manual ranged from .72 to .87 (MacGinitie & MacGinitie). Test validity was demonstrated through statistically significant correlations with established tests of reading, such as the Iowa Test of Basic Skills. For the current study, we used the grade equivalent derived from the reading score to measure student performance in reading comprehension. The internal consistency (Cronbach's alpha) for the current study

TABLE 1. School Characteristics
School   District   Classes (n)   ITBS mean score^a   Low income (%)   Minority (%)
1        Chicago    4             4.8                 63               75
2        Chicago    2             4.6                 78               85
3        Chicago    3             5.4                 70               79
4        Evanston   3             --                  71               47
5        Evanston   3             --                  26               50

^a Scores on the Iowa Test of Basic Skills (ITBS) are represented as grade-equivalent scores. The ITBS was not administered to Evanston students.
was very high (.93). Note that Kuder-Richardson 20 and Cronbach's alpha are identical when items are dichotomously scored.

Multiple intelligence preferences. We used the TIMI (Teele, 1992) to measure students' multiple intelligence preferences. This instrument includes 28 pairs of black-and-white drawings of pandas engaged in a variety of activities (e.g., reading and roller skating). Each activity is related to one of the seven intelligences, and students select the picture in each pair that best describes them. This tool is purported to measure linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal learning preferences. Students have the opportunity to select each of the intelligences eight times; therefore, each subscale (intelligence) comprises eight dichotomous items, and each student's score can range from 0 to 8 on each subscale. Field-testing was conducted with the TIMI to determine content validity, and item-by-item analysis led to corrections to the pictures to enhance validity (Teele, 1995). Furthermore, three studies were conducted, with sample sizes ranging from 52 to 812 students, to examine test-retest reliability at 2-, 3-, and 4-week intervals. Test-retest reliability estimates ranged from .46 for intrapersonal to .88 for musical at the 2-week interval, .50 for intrapersonal to .68 for logical-mathematical at the 3-week interval, and .49 for spatial to .66 for logical-mathematical at the 4-week interval (Teele, 1995). In the current study, we administered the TIMI immediately following the Gates-MacGinitie Test of Reading.

Results

We calculated Cronbach's alphas for each intelligence subscale to examine internal consistency (Table 2). Overall, Cronbach's alphas were very low and unacceptable. Logical-mathematical intelligence demonstrated the highest coefficient alpha (.61), and intrapersonal intelligence demonstrated the lowest (.22).
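As a concrete illustration of the reliability statistics used here, the sketch below computes Cronbach's alpha, K-R 20, and the inter-item correlation summaries on simulated dichotomous data shaped like one TIMI subscale (288 students, 8 items). The data, seed, and function names are our assumptions, not TIMI responses:

```python
import numpy as np

# Simulated responses: 288 students x 8 dichotomous (0/1) items, as on one
# TIMI subscale. Random data for illustration only.
rng = np.random.default_rng(0)
items = rng.integers(0, 2, size=(288, 8)).astype(float)

def cronbach_alpha(x):
    """Cronbach's alpha, using population (ddof=0) variances throughout."""
    k = x.shape[1]
    item_var = x.var(axis=0)          # per-item variances
    total_var = x.sum(axis=1).var()   # variance of subscale total scores
    return k / (k - 1) * (1 - item_var.sum() / total_var)

def kr20(x):
    """Kuder-Richardson 20: alpha's special case for dichotomous items."""
    k = x.shape[1]
    p = x.mean(axis=0)                # proportion endorsing each item
    total_var = x.sum(axis=1).var()
    return k / (k - 1) * (1 - (p * (1 - p)).sum() / total_var)

# For 0/1 items, p(1 - p) equals the item variance, so the two coincide.
alpha, kr = cronbach_alpha(items), kr20(items)

# Inter-item correlation summary in the style of Briggs and Cheek (1986):
r = np.corrcoef(items, rowvar=False)
upper = r[np.triu_indices_from(r, k=1)]   # the 28 unique correlations
share_above_20 = (upper > 0.20).mean()    # criterion: r > .20
n_negative = int((upper < 0).sum())

# Alpha when each item is deleted in turn:
alpha_if_deleted = [cronbach_alpha(np.delete(items, j, axis=1))
                    for j in range(items.shape[1])]
```

With 8 items per subscale there are 8 x 7 / 2 = 28 unique inter-item correlations, which is why the Note to Table 2 reports 28 correlations per subscale.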
Next, we examined inter-item correlations using Briggs and Cheek's (1986) rule that items measuring the same construct should yield intercorrelations greater than .20. The item correlations of the TIMI subscales met this criterion 4% (interpersonal) to 28% (logical-mathematical) of the time. The percentage of negative item correlations for the subscales ranged from 0% (musical) to 43% (intrapersonal). When we examined Cronbach's alphas after deleting each item one at a time from the subscale, the subscales demonstrated negligible improvement, regardless of which item was deleted. The estimated coefficient alpha improvement using single-item deletion ranged from 0 (musical) to .12 (linguistic); improved subscale coefficient alphas ranged from .31 (bodily-kinesthetic and intrapersonal) to .64 (logical-mathematical). In addition, item analysis suggested that deleting more than one item would not notably improve estimated subscale internal consistency. Given negligible improvement
using item deletion, we conducted all further analyses on the original subscales. These reliability analyses suggest that caution should be taken in interpreting the results of the TIMI, in terms of scale correlations and the relationships between multiple intelligences and reading comprehension.

We conducted correlational analyses to examine the relationships among the intelligence scores (Table 3) as well as between the MI scores and reading comprehension skills. Correlational analyses revealed three statistically significant positive associations: linguistic and logical-mathematical, spatial and intrapersonal, and interpersonal and bodily-kinesthetic intelligences were positively correlated. Aside from these positive relations, most of the intelligences demonstrated statistically significant negative correlations with one another. None of the intelligences was statistically significantly related to grade equivalent for reading comprehension.

TABLE 2. Cronbach's Alphas and Summary of Inter-Item Correlations for Multiple Intelligence Subscales (n = 288)

                                     Inter-item correlation
Subscale               alpha   Negative (n)   Range          >.20 (n)
Linguistic             .45     6              -.20 to .40    4
Logical-mathematical   .61     1              -.07 to .32    9
Spatial                .41     8              -.08 to .23    5
Interpersonal          .39     5              -.04 to .20    1
Intrapersonal          .22     12             -.15 to .31    2
Musical                .52     0              .03 to .28     3
Bodily-kinesthetic     .28     9              -.16 to .29    4

Note. There were 8 items for each subscale, yielding 28 inter-item correlations for each subscale.

TABLE 3. Interscale Correlations for the Teele Inventory of Multiple Intelligences (n = 288)

Subscale                    1       2       3       4       5       6       7
1. Linguistic               —
2. Logical-mathematical    .24**    —
3. Spatial                -.20**  -.32**    —
4. Interpersonal          -.24**  -.17**  -.16**    —
5. Intrapersonal          -.31**  -.31**   .16**  -.13*     —
6. Musical                -.15**  -.21**  -.14*   -.06    -.12*     —
7. Bodily-kinesthetic     -.23**  -.25**  -.06     .20**   .09    -.13*     —

*p < .05. **p < .01.

Descriptive statistics of the intelligences suggest that, overall, students tended to score highest on spatial, logical-mathematical, linguistic, and bodily-kinesthetic intelligences and lowest on intrapersonal intelligence (Table 4). The intrapersonal, musical, and bodily-kinesthetic scales demonstrated some skewness, while the logical-mathematical and spatial scales showed negative kurtosis estimates, indicating a platykurtic (flatter than normal) distribution.

To examine further the association between multiple intelligences and grade equivalent, based on reading achievement, we categorized students as below grade level versus at or above grade level (Table 4). We conducted an exploratory stepwise logistic regression, with grade-level category as the dichotomous dependent variable, to explore whether any of the intelligence preferences predicted whether students were achieving at or above grade level versus below grade level. Logical-mathematical intelligence was the only predictor that was entered into the regression equation, χ²(1, N = 218) = 4.96, p = .026. These results suggest that students with a greater preference for logical-mathematical intelligence were more likely to be at or above grade level, compared with students who scored lower on logical-mathematical intelligence.

Discussion

In this study, we examined the TIMI as a tool to assess children's MI preferences. Reliability analyses for each of the subscales of the TIMI suggested that the instrument does not provide consistent measurement and needs further development and refinement.
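The kind of model reported in the Results can be sketched on simulated data as follows. This fits a single-predictor logistic model by gradient ascent rather than reproducing the stepwise selection procedure, and the sample values, effect size, and names are our assumptions, not the study's data:

```python
import numpy as np

# Simulated data: 218 students with logical-mathematical preference scores
# (0-8) and a dichotomous at-or-above-grade-level outcome. The positive
# effect size (0.25) is assumed for illustration.
rng = np.random.default_rng(2)
score = rng.integers(0, 9, 218).astype(float)
p_true = 1 / (1 + np.exp(-(-1.0 + 0.25 * score)))
at_or_above = (rng.random(218) < p_true).astype(float)

def fit_logistic(x, y, lr=0.05, steps=20000):
    """Fit intercept and slope by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(b0 + b1 * x)))
        b0 += lr * (y - p).mean()
        b1 += lr * ((y - p) * x).mean()
    return b0, b1

b0, b1 = fit_logistic(score, at_or_above)
# A positive slope (b1 > 0) corresponds to the reported direction of the
# finding: a higher logical-mathematical preference score is associated
# with higher odds of being at or above grade level.
```

In practice one would use an established routine (e.g., a statistics package's logistic regression) and report the likelihood-ratio chi-square; the hand-rolled fit here is only to make the model's mechanics visible.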
Item analyses suggest that it is not a matter of a few items on each scale fitting poorly with the subscale; rather, the scales themselves are not unidimensional. In addition, there seemed to be an adequate distribution of scores on each subscale; therefore, inadequate variation within scales also seems an unlikely explanation. We believe the most likely reason for the poor reliability of the TIMI is that students are not being systematic in their choice of pictures. For example, people may see different things in each picture that trigger them to decide that these pictures are "like them." Thus, the items may not have enough in common to create consistency among the preferences. Furthermore, it is possible that student opinions about characteristics that are "like me" do not represent actual preferences for learning. The reliability results obtained in this study suggest that the TIMI, as currently designed, is not a useful tool for educators. Given the poor reliability of this test, validity is not discernible at this point.

Although we designed this study to examine the TIMI and the relationship between MI and reading achievement, all relationships between MI and reading
TABLE 4. Descriptive Statistics and Grade-Level Frequencies for the Multiple Intelligence Subscales
[Table 4 could not be recovered from the source scan; only fragments of its column labels (skew, kurtosis, below grade level) survive.]