
ARTICLE

Comparing Indicators of Knowledge within and between Cultural Domains

CHRISTOPHER A. FURLOW
University of Florida

This article examines the use of the cultural consensus model to estimate individual informants’ knowledge of cultural domains. The author compares informants’ knowledge scores generated by the cultural consensus model derived from triad data and rating data with each other and with free list lengths in two closely related domains—brands of bicycles and the greatest cyclists of all time. Results indicate that individual informant competency scores correlated at highly variable levels (.11–.75). These results raise questions concerning the model’s validity related to individual informant competency scores. More research is needed concerning the range of domain consensus and interinformant variability in genuine knowledge that produces valid informant competency scores and the impact that different research instruments have on competency scores.

Keywords: intracultural variation; cultural consensus model; validity; informant competency; cultural domain analysis

Field Methods, Vol. 15, No. 1, February 2003, 51–62. DOI: 10.1177/1525822X02239570. © 2003 Sage Publications.

AUTHOR’S NOTE: I thank Mark Papa for assisting in data gathering and entry. I also thank Bryan Byrne, H. Russell Bernard, and two anonymous reviewers for their comments, which greatly improved this article.

Intracultural variation in informants’ knowledge is a major area of concern for anthropologists and other social researchers interested in accurately representing cultural domains. Of all the informants available to the researcher, which individuals know the most about the specific topic of interest, and how can researchers determine this? One method developed to address this problem is the cultural consensus model. Romney, Weller, and Batchelder (1986) proposed the cultural consensus model (1) to estimate the degree of consensus among informants about particular cultural domains, (2) to estimate the correct answer to each question asked of informants, and (3) to estimate informant knowledge or “cultural competency” within cultural domains. In this article, I am interested in the use of the cultural consensus model to estimate individuals’ knowledge.

Anthropologists have applied the cultural consensus model to a wide variety of cultural domains, including folk medicine (Weller 1984; Garro 1986, 1988) and ethnobiology (Boster 1986; Boster and Johnson 1989), among others (see also Boster, Johnson, and Weller 1987; Romney, Batchelder, and Weller 1987; Brewer, Romney, and Batchelder 1991; Furlow and Papa n.d.). They have used a variety of data collection techniques, including true-false (Batchelder and Romney 1986, 1988; Romney, Weller, and Batchelder 1986), multiple choice (Romney, Weller, and Batchelder 1986), fill in the blank (Boster 1986), rank-order tasks (Romney, Batchelder, and Weller 1987; Brewer 1993, 1995), triads (Brewer 1995; Furlow and Papa n.d.), and pile sorts (Boster, Johnson, and Weller 1987; Boster and Johnson 1989). In each case, the cultural consensus model was successful.

The cultural consensus model has also been tested in other ways. Weller (1984) and Brewer, Romney, and Batchelder (1991) have demonstrated that more internally consistent informants are more likely to agree with other internally consistent informants. Likewise, Boster (1985) demonstrated the reliability of the cultural consensus model using a test-retest research design. In other studies that included multiple cultural domains, researchers have shown that informant knowledge is domain specific. Thus, an informant’s knowledge about animals has no relationship to that same individual’s knowledge about things students do to get good grades (Brewer, Romney, and Batchelder 1991).

In contrast to the relationship between consistency and consensus, test-retest reliability, and the relationship between informant knowledge of different cultural domains, the validity of the cultural consensus model for measuring individual informants’ knowledge within a single cultural domain is still relatively taken for granted. Boster (1985) reported a correlation between informants’ agreement with each other and test-retest reliability. Brewer (1995) attempted to evaluate cultural competency scores generated using the cultural consensus model through comparison with other indicators of knowledge. Specifically, Brewer measured correlations between competency scores in five cultural domains generated from triad tests and a ranking exercise with free list lengths, self-ratings of knowledge, and, in some cases, self-reported recognition ability. Brewer concluded that these comparisons further validated the cultural consensus model and that free list length may be a good proxy for measuring informant knowledge. Brewer’s strongest evidence for the validity of the consensus model derives from the fact that the highest correlations occurred between informant competency scores from matching tasks and the various other indicators of knowledge, because it is more difficult for informants to guess correctly on matching tasks. It is important to note, however, that cultural competency scores from the triad and ranking exercises correlated at relatively low levels with each other and with other indicators of knowledge. Brewer (1995) hypothesized that the low level may be due to minimal interinformant variation in knowledge and that, therefore, variations in competency scores were due to guessing and sampling variability rather than being representative of real differences in knowledge.

Brewer (1995) tested whether the competency scores represented genuine intracultural variation by comparing the standard deviations in competency scores with standard deviations from simulated informants with the same underlying mean competency as observed among his real informants. If the actual standard deviation was larger than the range of standard deviations from ten simulations, Brewer considered this to represent genuine intracultural variation in knowledge. Brewer concluded that four of five triad tests and one of five ranking tests in both high-salience and low-salience lists represented genuine variation. However, even among the competency scores considered genuine, correlations were only modest. For example, mean correlations for exercises with genuine intracultural variation include r = .44 (seven triads × free list), r = .21 (seven triads × self-rating), and r = .28 (three triads × recognition). Significantly, while the mean correlation between triads competency and ranking competency was r = .29, the only correlation between competency scores of triad and ranking exercises in the same domain that Brewer (1995) considered genuine was r = .02.

Boster, Johnson, and Weller (1987) also compared informant competency scores derived from free pile sort and triad data as part of a social networks research project. They found that informants’ competency scores on the two tasks correlated at a moderate level (r = .47). Thus, there is a significant need for additional research.

In this article, I compare indicators of knowledge within and between two closely related cultural domains—brands of bicycles (bike domain) and the greatest cyclists of all time (cyclist domain). Specifically, I compare informants’ knowledge scores generated by the cultural consensus model derived from triad data and rating data from seven-point Likert-type scales with each other and with free list lengths.
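
Brewer’s simulation check translates naturally into code. The sketch below is a hypothetical reconstruction of the logic rather than Brewer’s exact procedure: it treats a triad test as a three-option multiple-choice task, simulates informants who all share the observed mean competency (each knows the culturally correct answer with that probability and guesses uniformly otherwise), estimates competencies from the guessing-corrected agreement matrix, and asks whether the observed spread in scores exceeds every simulated spread. The estimator is a simplified stand-in for ANTHROPAC’s consensus routine, and the function names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_answers(competencies, n_items, n_options=3):
    """Answers for simulated informants: each knows the correct option
    (coded 0) with probability equal to her competency, else guesses."""
    knows = rng.random((len(competencies), n_items)) < competencies[:, None]
    guesses = rng.integers(0, n_options, (len(competencies), n_items))
    return np.where(knows, 0, guesses)

def estimate_competencies(answers, n_options=3):
    """Simplified consensus estimate: guessing-corrected match matrix,
    then the leading eigenvector scaled by sqrt(lambda_1)."""
    n = answers.shape[0]
    match = (answers[:, None, :] == answers[None, :, :]).mean(axis=2)
    corrected = (match - 1 / n_options) / (1 - 1 / n_options)
    np.fill_diagonal(corrected, 0.0)
    # Approximate the unknown diagonal by each informant's mean agreement.
    np.fill_diagonal(corrected, corrected.sum(axis=1) / (n - 1))
    vals, vecs = np.linalg.eigh(corrected)  # eigenvalues in ascending order
    return np.clip(np.abs(vecs[:, -1]) * np.sqrt(max(vals[-1], 0.0)), 0, 1)

# Observed bike-triad competencies (Table 2); a lambda 2 design for
# thirteen items yields 52 triads.
observed = np.array([.65, .72, .44, .06, .39, .77, .19, .26, .10, .75,
                     .26, .69, .67, .55, .57, .61, .68, .59, .57, .59,
                     .75, .63, .65, .53, .35, .51, .68, .38, .69, .39,
                     .62, .74, .73])
flat = np.full(observed.shape, observed.mean())
sim_sds = [estimate_competencies(simulate_answers(flat, 52)).std()
           for _ in range(10)]  # Brewer compared against ten simulations

# Genuine variation if the observed spread exceeds all simulated spreads.
print(observed.std() > max(sim_sds))
```

If the comparison prints True, the observed variation in competency scores cannot easily be attributed to guessing and sampling variability alone.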

METHOD

The research findings presented here are part of a larger cultural domain analysis project among road cyclists in Gainesville, Florida (see Furlow and Papa n.d.). As part of this larger project, I collected a variety of data, including demographic information, free list exercises, unconstrained pile sort tasks, triad tests, and rating data on the bike and cyclist domains. Most data were gathered at weekly club meetings. During the first week, individuals provided demographic information and completed free lists for both the bike and cyclist domains. In subsequent weeks, individuals completed bike pile sorts, cyclist pile sorts, bike triad tests, cyclist triad tests, and four rating tasks using seven-point Likert-type scales.

In the bike domain, twenty-five informants sorted thirty 3″ × 5″ index cards containing the brand names of the thirty most frequently listed bicycles from the free list exercise (all brands listed by at least 20% of free list respondents) into piles containing similar bikes using whatever criteria they wished; the only restrictions were that they must make at least two piles and could not place each bike in its own pile. Two weeks later, thirty-three informants completed a triad test. The triad test included thirteen bicycle brands and used a lambda 2 design generated using ANTHROPAC 4.0 (Burton and Nerlove 1976; Borgatti 1992). The thirteen brands were selected from the twenty-two bicycle brands listed by at least 30% of respondents to maximize variation on qualities such as cost, frame material, usage, and country of manufacture. In subsequent weeks, informants completed three seven-point Likert-type scale questionnaires to elicit rating information regarding prestige (twenty-five respondents), expense (twenty-two respondents), and exoticness of frame material (twenty-two respondents) for each of the thirty most frequently listed bike brands.

In the cyclist domain, thirty-one informants were asked to sort fourteen 3″ × 5″ index cards containing the names of cyclists named by at least two of the respondents in the free list exercise and were given the same instructions as for the bike domain. Two weeks later, twenty-nine informants completed a triad test that included the names of thirteen cyclists and used a lambda 2 design generated using ANTHROPAC 4.0 (Burton and Nerlove 1976; Borgatti 1992). The triad test included ten cyclists named by at least three respondents and three of the five cyclists named by two respondents. One week later, twenty-seven informants completed a seven-point Likert-type scale questionnaire to elicit rating information regarding the era in which cyclists competed (i.e., how recently each cyclist raced).

The data were analyzed using ANTHROPAC 4.0 (Borgatti 1992), which generated eigenvalues and eigenvalue ratios, an answer key, and informant competency scores for triad and rating data in each domain.1 Table 1 presents a summary of descriptive statistics for the two domains. Triad data were analyzed using the multiple-choice method, while rating data were analyzed using the interval method. The informant competency scores on each task in each domain were then compared using a Pearson product–moment correlation. Competency scores were then further compared with free list length (in the bike domain only).2 Overall, thirty-eight informants participated in at least one of the exercises.

TABLE 1
Descriptive Statistics for Indicators of Knowledge

Variable                               Bikes     Cyclists
Triad eigenvalue ratio                 4.942      3.869
Prestige rating eigenvalue ratio      12.776        —
Exoticness rating eigenvalue ratio     9.945        —
Expense rating eigenvalue ratio       14.954        —
Era rating eigenvalue ratio              —        20.829
Free list length mean                 21.68        3.57
Free list length SD                   11.54        2.94
Free list length range                 5–60        0–12

NOTE: Dashes indicate that no data were gathered.

Because the bike and cyclist domains are in essence subsets of a general cycling domain, the competency scores were also compared across these two closely related domains. For example, my experience among cyclists indicates that bikes, races, and racers are the most common topics of conversation. While I would never advocate comparing informant competency scores about flowers and animals, comparing informant knowledge about bikes and cyclists, especially among cultural specialists (i.e., racing cyclists who are members of a cycling club), seems justified.
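
Because not every informant completed every task (the dashes in Table 2), each of these comparisons can use only the informants who completed both tasks in question. The sketch below shows a minimal pairwise-deletion Pearson correlation of this kind, with missing scores coded as NaN; the helper name is mine, and the sample values are the first five informants’ bike triad and prestige rating scores from Table 2.

```python
import numpy as np

def pairwise_pearson(x, y):
    """Pearson r computed over informants with scores on both tasks;
    missing scores are coded as NaN."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    both = ~np.isnan(x) & ~np.isnan(y)
    if both.sum() < 3:
        raise ValueError("too few overlapping informants")
    return np.corrcoef(x[both], y[both])[0, 1]

# Bike triad and prestige rating scores for informants 01-05 (Table 2);
# informant 02 did not complete the prestige rating task.
btr = [.65, .72, .44, .06, .39]
pre = [.85, np.nan, .89, .63, .68]
print(round(pairwise_pearson(btr, pre), 2))
```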

RESULTS

The informant competency scores on each of the six exercises are presented along with summary statistics in Table 2. Within the bike domain, the results of the Pearson product–moment correlations (see Table 3) indicate high variability in the correlations between informant competency scores on the five exercises. These ranged from r = .11 between the competency scores on the prestige rating and free list tasks to r = .75 between the expense and exoticness rating tasks, with a mean r of .38. Within the cyclist domain, the Pearson product–moment correlation (see Table 3) between competency scores from the triad and era rating tasks was r = .45. The mean r for the eleven within-domain correlations is .39. Competency correlations between the bike and cyclist domains were smaller than the within-domain correlations, with a mean r of .07, and were negative in four of ten cases.


TABLE 2
Informant Competency Scores and Aggregate Statistics Calculated by ANTHROPAC 4.0 from Bike Triads (BTR), Bike Prestige Vector (PRE), Bike Exoticness Vector (EXO), Bike Expense Vector (EXP), Cyclist Triads (CTR), and the Cyclist Era Vector (ERA)

                          Bikes                 Cyclists
Informant     BTR    PRE    EXO    EXP        CTR    ERA
01            .65    .85    .84    .82        .41    .91
02            .72     —     .79    .94        .60     —
03            .44    .89    .94    .93        .60    .94
04            .06    .63    .71    .87        .55    .97
05            .39    .68    .89    .85        .35    .91
06            .77    .77    .65    .82        .37    .90
07            .19    .85    .87    .89        .30    .96
08            .26    .79    .78    .87        .68    .98
09            .10    .80     —      —         .35    .98
10            .75    .94    .85    .96        .60    .90
11            .26    .89    .77    .90        .70    .94
12            .69    .93    .91    .92        .55    .95
13            .67    .82    .85    .88        .60    .91
14            .55    .85     —      —         .19    .87
15            .57    .87    .77    .79        .13    .77
16            .61    .72     —      —         .35    .76
17            .68     —     .85    .89        .49    .93
18            .59    .90     —      —         .32    .98
19            .57    .91    .81    .90        .17    .92
20            .59    .72    .69    .78        .62    .95
21            .75    .92     —      —         .48    .93
22            .63     —      —      —          —      —
23            .65    .58     —      —          —      —
24            .53     —     .58    .61        .45    .91
25            .35     —      —      —         .41     —
26            .51    .63    .47    .76         —     .88
27            .68     —     .76    .90         —      —
28            .38    .66     —      —         .64    .96
29            .69     —      —      —          —      —
30            .39     —     .50    .37        .04     —
31            .62     —      —      —         .57     —
32             —      —      —      —         .55     —
33            .74    .86    .87    .91        .33    .96
34             —     .76     —      —          —     .85
35            .73     —      —      —         .44     —
36             —     .75    .91    .86         —     .73

M             .54    .80    .78    .84        .44    .91
SD            .19    .10    .13    .13        .17    .07
Minimum       .06    .58    .47    .37        .04    .73
Maximum       .77    .94    .94    .96        .70    .98
n             33     25     22     22         29     26

NOTE: Dashes indicate that no data were gathered.

TABLE 3
Pearson Product–Moment Correlations Calculated by ANTHROPAC 4.0 Using the Informant Competency Scores from Bike Triads (BTR), Bike Prestige Rating (PRE), Bike Exoticness Rating (EXO), Bike Expense Rating (EXP), Cyclist Triads (CTR), and the Cyclist Era Vector (ERA) in Table 2 and Bike Free List Length (BFL)

        BTR    PRE    EXO    EXP    BFL    CTR    ERA
BTR      —     .28    .13    .16    .14   –.01   –.36
PRE             —     .58    .64    .11   –.16    .14
EXO                    —     .75    .46    .30    .00
EXP                           —     .54    .54    .24
BFL                                  —    –.04    .04
CTR                                         —     .45
ERA                                                —

Gatewood (1984) originally suggested free list length as a good proxy for domain knowledge. Brewer (1995) confirmed these findings despite relatively high variability in correlations between competency scores and free list length. I also found correlations between individual competency scores and free list lengths to be highly variable. The data in Table 3 indicate that free list length correlations within the bike domain range between .11 and .54, with a mean r of .31. This is comparable to Brewer’s mean r of .29 between free list length and triads/ranking competency scores. The correlations between bike free list length and cyclist triads (r = –.04) and era ratings (r = .04) were insignificant and may indicate that informant knowledge is domain specific even in the case of two closely related domains.
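
The summary figures quoted in the results can be checked by direct arithmetic on the coefficients in Table 3; a short verification:

```python
# Unique within-bike-domain pairs among BTR, PRE, EXO, EXP, and BFL (Table 3)
within_bike = [.28, .13, .16, .14, .58, .64, .11, .75, .46, .54]
# Free-list-length correlations within the bike domain
bfl = [.14, .11, .46, .54]
# Between-domain pairs (each bike task crossed with CTR and ERA)
between = [-.01, -.36, -.16, .14, .30, .00, .54, .24, -.04, .04]

print(round(sum(within_bike) / 10, 2))          # 0.38, mean within bike domain
print(round((sum(within_bike) + .45) / 11, 2))  # 0.39, adding the CTR x ERA pair
print(round(sum(bfl) / 4, 2))                   # 0.31, free list length vs. other tasks
print(round(sum(between) / 10, 2))              # 0.07, mean between domains
print(sum(r < 0 for r in between))              # 4 negative between-domain rs
```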


DISCUSSION

In their article outlining the cultural consensus theory, Romney, Weller, and Batchelder (1986) defined validity to mean “that our measures relate in known and precise ways to other variables that we accept as measuring substantially the ‘same’ thing as we think we are measuring” (p. 329). In other words, two measures of the same thing, such as informant knowledge or cultural competence, correspond with each other or produce comparable findings. For example, the validity of the cultural consensus model’s informant competency scores depends on the model’s ability to produce consistent within-domain results—regardless of the instrument used to gather the data, assuming the model is applicable to the data type.

The research presented here compares several indicators of informant knowledge, including competency scores from six different exercises plus one free list exercise. The validity of the model is based on the comparison of indicators of informant knowledge within a single domain. Between-domain consistency is irrelevant to the validity of the model, as one would not expect informants to be equally competent in all domains or perhaps even in closely related domains such as bikes and cyclists. The data presented here, consisting of highly variable correlation levels (rs ranging from .11 to .75), seemingly raise questions concerning the validity of the cultural consensus model’s competency scores.

However, the issue of the potential impact of low genuine intracultural variability in informant competency must be addressed. Brewer (1995) has indicated that informant competency scores correlate weakly with other indicators of informant knowledge in cases where there appears to be low intracultural variability in informant knowledge levels (i.e., in which the effects of guessing and sampling could not be distinguished from genuine differences in informant knowledge). I did not attempt to test for low intracultural variation among my informants using the simulation described in Brewer and first used by Weller (1987). However, I can say that informant experience ranged from novice (less than six months of riding) to long-term cyclists (more than a decade). In discussions and interviews, individuals displayed a wide range of knowledge abilities as judged by my own decade of riding and racing experience at the time the research was conducted and based on formal and informal peer evaluations, including a focus group.

Second, the lack of correlation between competency scores may be the result of the inability of the tasks to capture the variability rather than a flaw in the cultural consensus model itself. Brazill, Romney, and Batchelder (1995), for example, found that interinformant reliability varied among four methods used to collect perceived similarity data. Therefore, different tasks may do a relatively better or worse job of capturing and distinguishing between informants’ underlying knowledge. I believe that there was extensive intracultural variation in knowledge in this case, although I am not fully convinced that the tasks were adequate in distinguishing cultural competency between individuals.

Clearly, more research is needed because the problem of a lack of genuine intracultural variation in knowledge has significant implications for the cultural consensus model. For example, if (1) moderately high interinformant consensus is required, by definition, for a cultural domain to exist but, at the same time, precludes valid informant competency scores, and (2) low interinformant consensus precludes the existence of a cultural domain but, at the same time, enables valid informant competency scores, that leaves only a narrow range of variability between instance 1 and instance 2 (how narrow remains to be determined) in which the model can both determine that a cultural domain does indeed exist and provide valid and therefore useful informant competency scores.

Perhaps the most useful product of consensus analysis is to assist researchers in selecting the most knowledgeable informants, who can then be interviewed in depth when research time is limited. If individual informants’ competency scores are highly variable across data-gathering instruments within a single domain, informant selection becomes problematic. On the purely pragmatic level, however, I must point out that there may be a way around this problem. In cases of moderately high to high interinformant consensus, in which the validity of informant competency scores may be negatively affected by a lack of genuine intracultural variation in knowledge, researchers may use the competency scores as a guide to whom to interview in depth, with only minor concerns, precisely because there are just minimal differences in knowledge from one informant to the next.

It is important to restate that the cultural consensus model produces three products. The first is a score, based on the ratio between the first and second eigenvalues of the factor analysis, which measures overall domain consensus. The second is an answer key for the test or research instrument used. The third is a list of competency scores for each informant. I have addressed only the third product—individual informant competency scores.

In summary, the results presented here suggest that individual informant competency scores derived from a variety of instruments within two closely related domains correlated at highly variable levels (.11–.75). These results raise questions concerning the model’s validity related to individual informant competency scores, specifically concerning the range of domain consensus and interinformant variability in genuine knowledge that produces valid informant competency scores and the impact that different research instruments have on competency scores. I conclude, then, by suggesting that more research is needed to test the validity of the model using more tasks for comparison, with more items on each task, and more difficult tasks such as fill-in-the-blank questions, matching, rank ordering, true-false, and multiple-choice questions, in addition to triads and ratings.
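
These three products can be made concrete in code. The sketch below is a simplified illustration of consensus analysis for multiple-choice data, not a reimplementation of ANTHROPAC’s routine: the diagonal approximation and the competency-weighted answer key are conveniences, and the function name is mine.

```python
import numpy as np

def consensus_analysis(answers, n_options=3):
    """Toy consensus analysis for (informants x items) multiple-choice data.
    Returns the eigenvalue ratio, an answer key, and competency scores."""
    n, n_items = answers.shape
    # Guessing-corrected agreement between every pair of informants.
    match = (answers[:, None, :] == answers[None, :, :]).mean(axis=2)
    corrected = (match - 1 / n_options) / (1 - 1 / n_options)
    np.fill_diagonal(corrected, 0.0)
    np.fill_diagonal(corrected, corrected.sum(axis=1) / (n - 1))
    vals, vecs = np.linalg.eigh(corrected)  # eigenvalues in ascending order
    # Product 1: first-to-second eigenvalue ratio; roughly 3:1 or better is
    # conventionally read as evidence of a single cultural consensus.
    ratio = vals[-1] / max(vals[-2], 1e-9)
    # Product 3: informant competencies from the leading eigenvector.
    comp = np.clip(np.abs(vecs[:, -1]) * np.sqrt(max(vals[-1], 0.0)), 0, 1)
    # Product 2: answer key, each informant's vote weighted by competency.
    key = np.array([np.bincount(answers[:, j], weights=comp,
                                minlength=n_options).argmax()
                    for j in range(n_items)])
    return ratio, key, comp
```

For the data reported here, ANTHROPAC’s corresponding ratios were 4.942 for the bike triads and 3.869 for the cyclist triads (Table 1), both above the conventional 3:1 threshold.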

NOTES

1. Competency scores were also generated from the pile sort data. However, it is important to note that individual informant competency scores should not be used from single, free pile sort data because these data may be bound up with lumper-splitter problems. To test for lumper-splitter influences, I compared competency scores from the free pile sorting exercises with the number of piles informants made. These comparisons yielded some of the highest correlations among any of the variables—.71 in the bike domain and .79 in the cyclist domain. Indeed, even the between-domain correlations (i.e., bike pile sort competency and number of piles of cyclists [.40] and cyclist pile sort competency and number of piles of bikes [.50]) were among the highest correlations, as was the correlation between the bike pile sort and cyclist pile sort competency scores (.49). In addition, I compared competency scores derived from both the multiple-choice matching and the covariance methods of consensus analysis, as recommended in Romney (1999). The correlations were r = .23 and r = –.25 for the bike and cyclist domains, respectively, and the cyclist domain’s eigenvalue ratio dropped to 1.306, indicating that the data violate the consensus model when analyzed using the covariance method. This indicates to me that the lumper-splitter problem significantly influenced individual competency scores for these free pile-sorting tasks and also was responsible for the relatively high correlation between the bike pile sort and cyclist pile sort competency scores. The importance of not using competency scores derived from free pile sort data cannot be emphasized enough because the only published materials that explicitly discuss cultural consensus analysis and pile sort data (Boster, Johnson, and Weller 1987; Boster and Johnson 1989) use consensus analysis to measure and even to compare individual informant competency scores. Boster and Johnson (1989) reported in a footnote that Romney recommended they use consensus analysis on their pile sort data and on the pile sort data in their collaborative project with Weller (Boster, Johnson, and Weller 1987). Given that two of the originators of the cultural consensus model have either used or reportedly recommended the use of consensus analysis on free pile sort data in the only published materials dealing directly with this topic, it is important to state unequivocally that free pile sort data should not be used to compare individual competency scores because individual differences in competency are overwhelmed by lumper-splitter effects. By extension, therefore, these data may not be indicative of the model’s validity as a whole (see Weller and Romney 1988; Boster and Johnson 1989).

2. It is important to note that the free-listing exercise for the cyclist domain is not a traditional free-listing exercise in that it asked participants to list the greatest cyclists of all time rather than all the cyclists the individual could think of, thus requiring informants to make a judgment based on knowledge of cultural norms concerning what counts as greatness and, in effect, moderating the importance of length as an indicator of knowledge. Because length is therefore not a reliable indicator of knowledge, I did not compare competency scores on other tasks with the cyclist free list lengths.

REFERENCES

Batchelder, W. H., and A. K. Romney. 1986. The statistical analysis of a general Condorcet model for dichotomous choice situations. In Information pooling and decision making, edited by B. Grofman and G. Owen, 103–12. Greenwich, CT: JAI.
Batchelder, W. H., and A. K. Romney. 1988. Test theory without an answer key. Psychometrika 53:71–92.
Borgatti, S. P. 1992. ANTHROPAC version 4.0. Columbia, SC: Analytic Technologies.
Boster, J. S. 1985. Requiem for the omniscient informant: There’s life in the old girl yet. In Directions in cognitive anthropology, edited by J. Dougherty, 177–97. Urbana: University of Illinois Press.
Boster, J. S. 1986. Exchange of varieties and information between Aguaruna manioc cultivators. American Anthropologist 88:428–36.
Boster, J. S., and J. C. Johnson. 1989. Form or function: A comparison of expert and novice judgments of similarity among fish. American Anthropologist 91:866–89.
Boster, J. S., J. C. Johnson, and S. C. Weller. 1987. Social position and shared knowledge: Actors’ perceptions of status, role, and social structure. Social Networks 9:375–87.
Brazill, T. J., A. K. Romney, and W. Batchelder. 1995. A comparison of methods for collecting judged similarities among items in a semantic domain. Journal of Quantitative Anthropology 5:359–74.
Brewer, D. D. 1993. Multiple ranking tasks in a semantic domain: An investigation of task order effects. Quality & Quantity 27:421–38.
Brewer, D. D. 1995. Cognitive indicators of knowledge in semantic domains. Journal of Quantitative Anthropology 5:107–28.
Brewer, D. D., A. K. Romney, and W. Batchelder. 1991. Consistency and culture: A replication. Journal of Quantitative Anthropology 3:195–205.
Burton, M. L., and S. B. Nerlove. 1976. Balanced designs for triads tests: Two examples from English. Social Science Research 5:247–67.
Furlow, C. A., and M. Papa. n.d. Explaining intracultural variation among cyclists: A comparison of two domains. Unpublished manuscript.
Garro, L. 1986. Intracultural variation in folk medical knowledge: A comparison between curers and noncurers. American Anthropologist 88 (2): 351–70.
Garro, L. 1988. Explaining high blood pressure: Variation in knowledge about illness. American Ethnologist 15:98–119.
Gatewood, J. B. 1984. Familiarity, vocabulary size, and recognition ability in four semantic domains. American Ethnologist 11:507–27.
Romney, A. K. 1999. Culture consensus as a statistical model. Current Anthropology 40 (Supplement): S103–15.
Romney, A. K., W. Batchelder, and S. C. Weller. 1987. Recent applications of consensus theory. American Behavioral Scientist 31:163–77.
Romney, A. K., S. C. Weller, and W. Batchelder. 1986. Culture as consensus: A theory of culture and informant accuracy. American Anthropologist 88:313–38.
Weller, S. C. 1984. Consistency and consensus among informants: Disease concepts in a rural Mexican town. American Anthropologist 86:966–75.
Weller, S. C. 1987. Shared knowledge, intracultural variation, and knowledge aggregation. American Behavioral Scientist 31:178–93.
Weller, S. C., and A. K. Romney. 1988. Systematic data collection. Newbury Park, CA: Sage.

CHRISTOPHER A. FURLOW is a doctoral student in the Department of Anthropology at the University of Florida. His research interests include the anthropology of science, technology, and knowledge; globalization and identity; Islamic science; and intracultural variation. He is currently completing his dissertation, Islam, Science, and Modernity: From Northern Virginia to Kuala Lumpur, and he is the author of “The Islamization of Knowledge: Philosophy, Legitimation, and Politics” (1996, Social Epistemology).
