
Science & Education (2005) 14: 117–135

© Springer 2005

Examining the Exam: A Critical Look at The California Critical Thinking Skills Test

DON FAWKES¹, BILL O’MEARA², DAVE WEBER² and DAN FLAGE²

¹ Eutaw Village Center Court, Box 35786, Fayetteville, NC 28303-0786, U.S.A.; ² James Madison University

Abstract. This paper examines the content of The California Critical Thinking Skills Test (1990). This report is not a statistical review. Instead it brings under scrutiny the content of the exam. This content will be of interest to the general reader, because the issues range from logic to ethics to pedagogy, and to questions of evidential and epistemological support. Anyone interested in clear thought and expression will find these issues of significance. Although the exam has a number of strengths and has the clearest instructions of all the presently available Critical Thinking exams, the content of 9 of the exam’s 34 questions is defective, namely the content of questions 6, 7, 8, 19, 21, 23, 24, 29, and 33. These questions make errors in critical thinking. Hence, no statistical results pertaining to the administration of these questions to students can be acceptable. The remaining questions are acceptable as to content. But until the problems are corrected, those who may use the exam should remove the defective questions from test administration or from data collection and reporting. The scope of the exam also is quite limited, but this may be unavoidable for any instrument designed to be completed in about an hour. Further, the scores resulting from any such testing can be understood only as a measure of minimal competency (below which remediation likely is needed) for the skills tested, but not as an adequate measure of critical thinking.

Before turning to the analysis it may be useful to give a brief sketch of the context in which the exam is used, and its relevance to those interested in science and education. In recent years critical thinking (CT) has become a matter of interest to collegiate administrators and assessment offices. For most colleges CT has become some sort of (usually lower division) requirement, and for many colleges the assessment office is tasked with measuring student success in CT. One way that colleges try to teach CT is in science courses, but there are many approaches. One interesting question is how CT is to be defined. There are several widely known “models” of it, and Fawkes (2001) describes them and goes on to describe the CT skills tested by the three widely used CT skills tests. The three tests are known as the “California”, the “Cornell”, and the “Watson–Glaser”. None of these three tests is based on any of the models; each test is independently developed and marketed. In so far as the tests are used to check student acquisition of CT skills, it is the tests that define CT. As teachers of science are sometimes expected to teach CT, it is useful to know what is being tested and the quality of the tests. Fawkes et al. (2001) does the analysis for the Watson–Glaser, the present paper does it for the California, and a paper in progress will do it for the Cornell. Moreover, the analysis is a logical analysis, and inasmuch as logic is a science and an essential part of all other sciences, the analysis will be of interest to scientists and educators alike. Every scientist and every educator benefits from exercising logical muscles.

Disclosure: Three of the authors are engaged in producing and marketing a critical thinking test. Though this paper was written before any of us considered developing such a test, the reader should be informed. Each of the writers has exercised considerable care to avoid any bias, and we thank our independent reviewers for helping us in this regard as well.

Analysis of Questions and Instructions

The California Critical Thinking Skills Test (1990) consists of 34 questions. There are two forms of the exam; this paper addresses Form A. This is not a statistical review, but rather a review of the content of the exam. This content raises issues that range from logic to ethics to epistemology to pedagogy, and so they will be of interest to the general reader; these issues are interesting in themselves. Since the California Skills Test (hereafter, CS) is a widely used measure of critical thinking it warrants careful review. The analysis is offered in a collegial spirit, intended to foster discussion and improvement. Different readers will find different parts of the analysis controversial, and some possible challenges will be noted as the analysis proceeds. Controversy strengthens the invitation to discussion and improvement. The reader should consider everything controversial and examine the analysis carefully. As the analysis uncovers a number of flaws in the exam it is useful to note at the outset that the instructions in CS are especially clear and nontechnical, clearly the best among the currently available standardized CT exams. There are difficulties however with a number of questions, to which we now turn. We begin with Question #6:

6. Suppose “Only those seeking challenge and adventure should join the Army” were true. Which of the following would express the same idea?
(A) If you seek challenge and adventure, you should join the Army.
(B) If you join the Army, you should seek challenge and adventure.
(C) You shouldn’t seek challenge and adventure except by joining the Army.
(D) You shouldn’t join the Army unless you seek challenge and adventure. CORRECT

The problem with response D is that it is ambiguous. “You shouldn’t join the Army” has two possible meanings. One of these meanings would express the same idea as the statement in the stem, but the other would not express the same idea as the statement in the stem. The two possible meanings turn on just exactly what the negation in “shouldn’t” applies to. (Generally, logicians would express this by saying that the two possible meanings depend on what the negation “operates over”.) “You shouldn’t join the Army” could mean, “It is the case that you should not join the Army”.

(1)

Or, “You shouldn’t join the Army” could mean, “It is not the case that you should join the Army”.

(2)

(1) says that you have an obligation to not join the Army, but (2) says that you do not have an obligation to join the Army. “You shouldn’t join the Army” can mean either (1) or (2). But only (2) allows the statement in D to mean the same as the statement in the stem. To see this, consider that the stem statement, “Only those seeking challenge and adventure should join the Army”

(S1)

expresses the same idea as “If you should join the Army, then you seek challenge and adventure”,

(S2)

and both (S1) and (S2) express the same idea as “If you do not seek challenge and adventure, then it is not the case that you should join the Army”.

(S3)

(S1), (S2), and (S3) each express the same idea. D however, can mean either “If you do not seek challenge and adventure, then it is the case that you should not join the Army”, or,

(D1)

“If you do not seek challenge and adventure, then it is not the case that you should join the Army.”

(D2)
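The readings (S2), (S3), (D1), and (D2) above can be compared mechanically. The following is a minimal sketch, not the authors' own demonstration (that is in the Appendix). It assumes a simple three-way model of "should" in which joining the Army is obligatory, forbidden, or merely optional; that model, and every name in the code, is an assumption introduced here for illustration.

```python
from itertools import product

# Hypothetical three-way model of "should" (an assumption of this sketch,
# not taken from the paper): joining is obligatory, optional, or forbidden.
STATUSES = ("obligatory", "optional", "forbidden")


def should_join(status):
    """'It is the case that you should join the Army.'"""
    return status == "obligatory"


def should_not_join(status):
    """'It is the case that you should NOT join the Army.'"""
    return status == "forbidden"


def implies(p, q):
    """Material conditional: 'if p then q'."""
    return (not p) or q


# Each reading maps (seeks challenge and adventure?, status) to a truth value.
READINGS = {
    "S2": lambda seeks, st: implies(should_join(st), seeks),
    "S3": lambda seeks, st: implies(not seeks, not should_join(st)),
    "D1": lambda seeks, st: implies(not seeks, should_not_join(st)),
    "D2": lambda seeks, st: implies(not seeks, not should_join(st)),
}

# Every combination of "seeks or not" with each status.
CASES = list(product((True, False), STATUSES))


def equivalent(a, b):
    """True when readings a and b agree in every case."""
    return all(READINGS[a](s, st) == READINGS[b](s, st) for s, st in CASES)


def disagreements(a, b):
    """Cases in which readings a and b have different truth values."""
    return [(s, st) for s, st in CASES if READINGS[a](s, st) != READINGS[b](s, st)]


print(equivalent("S2", "S3"), equivalent("S2", "D2"), equivalent("S2", "D1"))
print(disagreements("S2", "D1"))
```

Under these assumptions the only case separating (D1) from the stem is the person who does not seek challenge and adventure and for whom joining is merely optional: the stem is then true, while (D1) is false.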

Note that only (D2) [and not (D1)] expresses the same idea as (S3), (S2), and (S1). Basically, this is a matter of taking care in expressing a negation. As expressing negations is one of the most basic and common uses of language, this item and the analysis of it may have broader interest beyond tests and testing, broader interest to those who simply are curious about expressing ideas clearly. Giving a thorough demonstration of the relevant points (please see the Appendix) is somewhat daunting, and that is likely the best point to make about this test question: If professors have to delve into logic textbooks to figure out a test question, it is not the sort of question that can be a useful measure of students’ CT skills. But for the more broadly interested reader, the best point may be to be on alert when negations are used, and to attend to just what is being negated. We turn next to Question #7. It reads as follows:


7. Suppose a botanist lecturing about garden plants said, “The rose offers many colors”. Which would be the best interpretation of this claim?
(A) There is a rose which is more than one color.
(B) There is a thing that is more than one color and it is a rose.
(C) All roses are more than one color.
(D) Not every rose is the same color. CORRECT
(E) All of the above are equally acceptable interpretations.

It is not clear what the criteria are for “acceptable interpretations” here; accordingly, it is not clear why E is not an answer as good as or better than the “correct” answer, D. If an “acceptable” interpretation need only be consistent with the wording of the botanist’s claim that “The rose offers many colors”, then there is no obvious reason why any one of A, B, C, or D should be considered a superior reading. For this claim is ambiguous: does “the rose” denote a particular rose, or roses in general; and if the latter, is “offers many colors” being predicated of roses collectively or distributively? Given this dual ambiguity of subject and predicate, the claim taken by itself seems equally compatible with A, B, C, and D, in which case the correct answer is E. However, the authors of the test designate D as the correct answer. Given the ambiguity of the statement, perhaps everyday beliefs about roses are given some say in determining the answer. But then the item seems to be testing more for everyday knowledge than for reasoning ability. This question does not serve to measure the presence or the absence of critical thinking skills.

Question #8 is as follows:

8. “Ezerinians tell lies”, means the same thing as:
(A) If anyone is Ezerinian, then that person is a liar. CORRECT
(B) If anyone is a liar, then that person is Ezerinian.
(C) There is at least one person who is an Ezerinian who lies.
(D) People don’t lie unless they are Ezerinian.
(E) All of the above mean the same thing.

The statement “Ezerinians tell lies” does not indicate whether all Ezerinians tell lies, or whether it is generally the case that Ezerinians tell lies. If the statement is interpreted to mean the former, then response A is correct. But the question is ambiguous as it stands. “Ezerinians tell lies” can mean either that all Ezerinians tell lies (A: If anyone is an Ezerinian, then that person is a liar) or that some Ezerinians tell lies (C: There is at least one person who is an Ezerinian and a liar). The interpretation of this statement is unclear. Certainly, A is a plausible answer – it might even be the most natural way to read the statement – but there are two problems with it:

(1) The word liar is obscure. How many lies does one need to tell to be a liar? Consider:
a. Presumably, every person has told at least one lie at some time. Does that mean that every person is a liar?
b. Or is it someone who makes a habit of it – who lies much or most of the time?
c. Or must it be someone who always lies? If it is the last, then one might reasonably claim that there are no liars, and A would be false on the assumption that there are Ezerinians.

(2) Within the context of an argument, it is reasonable to tell students to interpret the statement in such a way that the statement is true and (when possible) the argument is valid or strong. For example, consider the premise, “Students aren’t college graduates”. This premise might mean “No students are college graduates” or “Some students are not college graduates”. Perhaps the more understanding interpretation is, “Some students are not college graduates”, since that statement is true: i.e. (virtually) all graduate students are college graduates. Within the context of an argument, one would have some guidance as to the interpretation of “Ezerinians tell lies”: If the statement “All Ezerinians are liars” would yield a valid argument, then ceteris paribus that may be the more sympathetic interpretation – it would also indicate that the argument relies on liar in sense (a) or (b). On the other hand (again ceteris paribus), if “Some Ezerinians are liars” would yield a valid argument, then that may be the better understanding. And that statement might be true even given the highly improbable sense (c) of liar.

So, either response A or response C could be correct, and apart from some context, it would not be possible to determine which is correct. One way to correct the problem with this question would be to change option A to “If anyone is Ezerinian, then that person tells lies”.

Question #19 raises an interesting logical puzzle. But alas, no genuinely correct answer is provided. Here is the item:

19. Consider the “krendalog” relationship. It is defined as follows: “Only humans are krendalogs. But not every member of the human species has krendalogs. Nobody can be a krendalog to themself, but today every human is someone’s krendalog. If someone is your krendalog, then all that person’s krendalogs are your krendalogs too. If someone is your krendalog, then you cannot be that person’s krendalog. Assume the first two humans, the long ago deceased ancestors of our species, were named Jake and Kathy”. Given this meaning of “krendalog” we can say for sure that
(A) Jake and Kathy are krendalogs to one another.
(B) Jake or Kathy is each their own krendalog.
(C) Someone is neither Jake’s nor Kathy’s krendalog.
(D) All of us are krendalogs to Jake and Kathy. CORRECT
(E) None of the above because this concept does not make sense.

The idea suggested here seems to be this: the relation (a) x is a krendalog of y (the relation K) is supposed to be identical to or isomorphic to or analogous to (b) x is a (human) descendant of y. If one analogizes (a) to (b), answer D does seem to follow from the information in the passage. However, the information given about the relation K does not necessarily make it isomorphic to the descendant relation. The information given is as follows:
(1) Only humans are krendalogs.
(2) Not every member of the human species has krendalogs.


(3) Nobody can be a krendalog to himself or herself, but today every human is some human’s krendalog.
(4) If someone is your krendalog, then you cannot be that person’s krendalog.

In addition, we are told to assume that:
(A) The first two humans, the long ago deceased ancestors of our species, were named Jake and Kathy.

There are a number of ways to analyze this information; here’s one: a relation can be defined extensionally over individual humans. Consider the set of all humans living in the past, p1 . . . pn. Consider the set of all humans living now, s1 . . . sn. The following examples describe relations that satisfy the conditions, where the intended interpretation of K(ni, nj) is “ni is a krendalog of nj”:
(i) K(s1, p1), K(s2, p2), K(s3, p3), . . . K(sn, pn).¹ (Note that here only currently alive individuals are krendalogs, and only members of the previous generation have krendalogs.)
(ii) K(s1, p1), K(s2, p1), K(s3, p1), . . . K(sn, p1).² (Note that here only currently alive individuals are krendalogs, and only a single individual, p1, has everyone as his krendalog.)

Both (i) and (ii) satisfy the description of the K relation but neither makes all currently alive individuals krendalogs to Jake and Kathy. Thus response D is not the right answer here and neither are any of the other responses. Correlatively, (i) and (ii) show that the K-relation is not the ancestor relation. Now perhaps it might be thought that this question can be fixed by changing option D to “Jake and/or Kathy”. But changing option D to “Jake and/or Kathy” is not sufficient to fix the problem, because with this change in place both (i) and (ii) still satisfy the description of the K relation, and though (ii) makes all currently alive individuals krendalogs to Jake and/or Kathy, (i) does not.
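The two counterexample relations can be checked on a small finite case. The sketch below is illustrative only. It assumes three past and three living humans, assumes that Jake and Kathy are p1 and p2, and reads "all of us" as "all currently alive individuals". It also checks the clause from the quoted item that a krendalog's krendalogs are one's own, which is not in the numbered list above. All names in the code are hypothetical.

```python
# Hypothetical small universe (names and sizes are assumptions of this sketch).
PAST = ["p1", "p2", "p3"]     # humans living in the past
ALIVE = ["s1", "s2", "s3"]    # humans living now
HUMANS = set(PAST + ALIVE)
JAKE, KATHY = "p1", "p2"      # assumption: Jake and Kathy are p1 and p2

# A pair (x, y) is read as "x is a krendalog of y".
RELATION_I = {(s, p) for s, p in zip(ALIVE, PAST)}   # K(s1,p1), K(s2,p2), ...
RELATION_II = {(s, "p1") for s in ALIVE}             # K(s1,p1), K(s2,p1), ...


def satisfies_description(K):
    """Check the stated conditions on the krendalog relation K."""
    only_humans = all(x in HUMANS and y in HUMANS for x, y in K)
    # Not every human has krendalogs: some human appears in no second slot.
    someone_has_none = any(all(y != h for _, y in K) for h in HUMANS)
    irreflexive = all(x != y for x, y in K)
    # Today every (living) human is someone's krendalog.
    everyone_today_is_one = all(any(x == s for x, _ in K) for s in ALIVE)
    asymmetric = all((y, x) not in K for x, y in K)
    # Quoted clause: if x is y's krendalog, x's krendalogs are y's too.
    transitive = all((z, y) in K for x, y in K for z, w in K if w == x)
    return all([only_humans, someone_has_none, irreflexive,
                everyone_today_is_one, asymmetric, transitive])


def option_d(K):
    """'All of us are krendalogs to Jake and Kathy.'"""
    return all((s, JAKE) in K and (s, KATHY) in K for s in ALIVE)


def option_d_and_or(K):
    """'All of us are krendalogs to Jake and/or Kathy.'"""
    return all((s, JAKE) in K or (s, KATHY) in K for s in ALIVE)


for name, K in (("(i)", RELATION_I), ("(ii)", RELATION_II)):
    print(name, satisfies_description(K), option_d(K), option_d_and_or(K))
```

On this small case both relations satisfy the description, option D fails for both, and the "and/or" variant holds for (ii) but not for (i), which is the pattern described in the text.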
And if D were to read, “All of us are krendalogs to Jake and/or Kathy” then, as noted above, there are a number of ways to analyze this information, only one of which would be to take “All of us” to mean “all currently alive individuals”, as both (i) and (ii) interpret it to mean. But obviously, “All of us” could mean something else, like “All humans”. This item needs some careful revision.

Question #21 and its instructions are as follows:

For Questions 20 and 21 use this fictitious case: “In a study of high school students at Mumford High, it was found that 75% [of those who] drank two or more beers each day for a period of 60 days experienced measurable liver function deterioration. That these results could have occurred by chance was ruled out experimentally with high levels of confidence”.

21. If the information in the Mumford High case were true, which of the following hypotheses would not have to be ruled out in order to confirm the claim that for about 75 adolescents out of 100, after two months of drinking as little as two beers a day, measurable liver deterioration can be found?


(A) Liver deterioration occurs only in inexperienced beer drinkers, but it levels off after people have been drinking beer for longer periods of time.
(B) Since teens brag about their drinking, the positive relationship between drinking and adolescent liver function deterioration is much higher than reported.
(C) Since the students at Mumford High are predominantly Black or Hispanic, the findings do not apply to adolescents in general.
(D) Liver function deterioration in adolescents is the result of other factors, such as normal growth and development, poor diet, and sports injuries.
(E) Since school officials failed to keep this research project confidential, the purpose of this study was known by the students being tested and by unauthorized persons. CORRECT

It isn’t clear that A would have to be ruled out to confirm the claim. The claim that the deterioration levels off is irrelevant to the hypothesis that for 75 of 100 adolescents drinking two beers a day for two months causes measurable liver damage. Both (A) and (E) appear to be correct.

Here is Question #23:

23. Consider this argument: “Person L is shorter than person X. Person Y is shorter than person L, but person M is shorter than person Y. Therefore, person Y is shorter than person J”. What information must be added to require that the conclusion be true, assuming all the premises are true?
(A) Person L is taller than J.
(B) Person X is taller than J.
(C) Person J is taller than L. CORRECT
(D) Person J is taller than M.

The test’s authors designate the “correct” option C (“Person J is taller than L”) as the conveyor of “information [that] must be added to require that the conclusion be true, assuming all the premises are true” (original emphasis). In fact, no one of the four options provided on the exam “must” be added “to require” the conclusion’s truth: in order to certify the conclusion, given the premises, not only option C but also the distinct statement “Person J is taller than X” will suffice.
Option C would clearly be the right answer if the question were to ask something like: “Of the following four statements, which one would, if added to the other premises and with all premises assumed to be true, make it certain that the conclusion is true?” However, what is actually asked is, logically, something quite different. Students who have really learned to think clearly and critically can be expected to pick up on that difference – yet this may cost them at least a certain amount of unnecessary confusion and lost time, because as the question is currently worded, strictly speaking, the options provided contain no correct answer. Perhaps it might be thought that the criticism of this item is a mere quibble. But logic is often a matter of slight changes, and slight changes are often crucial to understanding what is being said, and to thinking critically. Quibbles are addressed a little further below.
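The difference between a statement that suffices for the conclusion and one that must be added can be checked by brute force. The sketch below is illustrative only. It assumes the five people have distinct heights and enumerates every possible ordering; the function and variable names are introduced here and do not come from the test or the paper.

```python
from itertools import permutations

PEOPLE = ["L", "X", "Y", "M", "J"]


def premises(h):
    """L shorter than X, Y shorter than L, M shorter than Y."""
    return h["L"] < h["X"] and h["Y"] < h["L"] and h["M"] < h["Y"]


def conclusion(h):
    """Y shorter than J."""
    return h["Y"] < h["J"]


# The four options from the item, plus the extra statement noted in the text.
CANDIDATES = {
    "A": lambda h: h["L"] > h["J"],      # L is taller than J
    "B": lambda h: h["X"] > h["J"],      # X is taller than J
    "C": lambda h: h["J"] > h["L"],      # J is taller than L
    "D": lambda h: h["J"] > h["M"],      # J is taller than M
    "extra": lambda h: h["J"] > h["X"],  # J is taller than X
}

# Assumption: heights are distinct, so each ordering is a permutation of ranks.
ORDERINGS = [dict(zip(PEOPLE, ranks)) for ranks in permutations(range(5))]


def sufficient(extra):
    """Premises plus the extra statement guarantee the conclusion."""
    return all(conclusion(h) for h in ORDERINGS if premises(h) and extra(h))


def necessary(extra):
    """The extra statement holds whenever premises and conclusion both hold."""
    return all(extra(h) for h in ORDERINGS if premises(h) and conclusion(h))


suffices = {name: sufficient(test) for name, test in CANDIDATES.items()}
needed = {name: necessary(test) for name, test in CANDIDATES.items()}
print(suffices)
print(needed)
```

Under the distinct-heights assumption, C and the extra statement each suffice, while A, B, and D do not. C is not necessary, since the ordering M < Y < J < L < X satisfies the premises and the conclusion while making C false. That is the gap between "would make the conclusion certain" and "must be added".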


We turn next to Question #24. Here are the instructions and the question:

For Questions 24 and 25 use this fictional passage: “Research at the Happy-Days Pre-School on the campus of State University showed that four-year-old children who attended the Happy-Days Pre-School all day for 9 months averaged 58 points on a standardized test of kindergarten readiness. The research showed also that those four-year-olds who attended only in the morning for 9 months averaged 52, and those four-year-olds who attended afternoons only for 9 months averaged 51. A second study of four-year-olds who attended Holy Church Pre-School all day for 9 months showed these children averaged 54 on the same kindergarten readiness test. A third study of four-year-olds who attended no pre-school and were all from low income households showed an average score of 32 on the same test. The difference between 32 and the other scores was found to be statistically significant at the .05 level of confidence”.

24. Initially, the most plausible scientific hypothesis regarding the data is
(A) a child who scores 50 or higher is ready for kindergarten.
(B) more testing is needed before a plausible hypothesis can be formed.
(C) pre-school attendance is not related to kindergarten readiness.
(D) there should be funding for four-year-olds to attend pre-school.
(E) attending a pre-school is correlated with kindergarten readiness. CORRECT

The construction of this item is somewhat puzzling. Note that the “third study” cited in the fictional passage correlates low scores on the kindergarten readiness test with a pair of what seem to be separable factors: “no pre-school” and “low-income background”. This makes B a plausible option. A, C, and D are clear non sequiturs, and B looks to have an edge over E, as E seems to neglect the possibility (raised by the findings of the “third study”) that differences in household income correlate with differences in test scores.
Despite looking good on these grounds, however, B seems to be ruled out, simply by the way in which the question stem and option together are worded. Note that the test taker who selects B is effectively committed by the wording of the item to the strange claim that “Initially, the most plausible scientific hypothesis regarding the data is” [option B] “more testing is needed before a plausible hypothesis can be formed”. This is a rather odd “hypothesis”. So while option B on its own seems at least as good as, if not better than, any of the other options, taken in combination with the question stem it makes for a singularly odd answer. The result is unclarity (at minimum). Option B seems best on strictly evidential grounds, but at the same time it appears questionable owing to the wording of the item. As the question stands it cannot serve to measure either the presence or absence of critical thinking skills.

We turn next to Question #29.

29. “Confidentiality is an important part of the relationship between doctor and patient. But protecting innocent people from serious harm is also important. Nobody can say with certainty which value is the more important of the two. This can create some agonizing dilemmas. For example, a doctor may know that a patient is going to harm someone or be harmed by someone, as in the case of suspected child abuse. This puts the doctor in a difficult situation regarding whether to maintain confidentiality or to inform the proper authorities about the suspected danger”. The best evaluation of the speaker’s reasoning is
(A) good thinking, because confidentiality cannot be compromised.
(B) good thinking, because in the abstract these values conflict. CORRECT
(C) poor thinking, because in practice doctors do choose one value over another.
(D) poor thinking, because the law clearly says protecting the child is more important.

This question raises a number of concerns – from ethics to logic to epistemology – and appears to offer no clearly correct answer. The key selects option B as the correct answer, but is it? First of all it is difficult to see how “values” might “conflict” in the abstract, but even if they may, the description given in this item poses a concrete case of alleged “conflict”. So on this ground alone, C is at least as good an answer as B. And the “Nobody can say with certainty . . . ” line is oddly out of place in the context of such a matter: for the law does govern such cases in every U.S. State and in most places on earth. So the law does “say with certainty” in so far as the law “says” anything “with certainty”. But on this ground, D is at least as good as C or B. Admittedly, this point depends on background knowledge, but it is background knowledge that most test takers are likely to have, and that more advanced critical thinkers are even more likely to have. And the “doctor may know . . . ” line is also confused here because even in the unlikely event of a clear and direct statement by a patient, the most a doctor has is a suspicion – a suspicion which may be strong or weak and well supported or poorly supported – that the patient will act in some specific way.
But this consideration further supports the view that there is “poor thinking” in the item, even though none of the options matches with it. But then an attentive test taker would just note this “poor thinking” and wonder further about this item. For since the answers provided don’t take note of this aspect of “poor thinking”, the especially adept critical thinker will begin to wonder which of the instances of “poor thinking” in C and D were also not noticed by those who prepared the test, and since the test taker doesn’t have the key, the choice between C and D begins to look like an even bet from the point of view of a test taker who sees the flaws in the question and who has no option other than to try to guess what the test writers meant to say. In any case, the critical thinking skills of a test taker who reasoned in such ways are not faulty, and so this item cannot provide a measure of either the presence or the absence of CT skills. Perhaps the item could be improved by deleting “in the abstract” from option B and then rephrasing to correct the remaining problems.

Here is Question #33 and its instructions:

For Questions 31, 32, 33 and 34 focus on the faulty inference in the following fictional case:


A speech writer working for a white supremacist group claimed that white Americans were “genetically superior to Blacks, Hispanics, Asians, Iranians and all the other mongrel races in terms of native human intelligence”. To support this claim, the speech writer quoted a study which compared two groups of tenth graders. Each group was given the same exam covering European geography. The exam focused on European rivers, mountain ranges, countries, capital cities, agriculture, industry, religion, music and languages. Group A was 35 tenth graders, 34 of whom were whites with Anglo-European family names. Group A students attended a private college prep school in wealthy Orange County, California. That school requires ninth graders to take a year of European history. Group B was 40 tenth graders, all but 4 of whom were Hispanic, Black, Asian or Middle Eastern. Group B students attended a public high school in a violent, gang infested ghetto community of south central Los Angeles County. Ninth graders at the public high school take a year of world history. The writer pointed out that Group A did significantly better on the geography test than Group B.

33. Suppose a female social worker objected, “You can’t expect Group B children to be as intelligent. After all, they come from a background of poverty, crime and broken families”. If true, would this social worker’s reason be a good or bad reason, and why?
(A) Good reason. Poor neighborhoods mean poor schools, poor schools mean poor teachers, poor teachers mean poor students, poor students mean poor test scores.
(B) Good reason. Regardless of race, children from these kinds of backgrounds are less intelligent than children from wealthy backgrounds.
(C) Bad reason. Regardless of socioeconomic conditions, intelligence depends on the quality of the school you attend.
(D) Bad reason. Poverty, wealth and family circumstances do not make a person more or less intelligent. CORRECT

Selecting the “correct” answer to this question, response D, seems to depend upon taking the term “intelligent” to denote something like inborn intellectual aptitude. If the term is taken in this sense, then the social worker’s objection is a clear non sequitur – but it is not clear that this is the only sense in which the term “intelligent” is regularly employed. Suppose, for example, that one takes “intelligent” to be roughly equivalent to “particularly apt at assimilating and applying new concepts and new information”. There is nothing particularly idiosyncratic about ascribing this sense to that term, but if one does so, then it is not at all clear that option D is correct, because it is not clear that this kind of intellectual aptitude cannot be negatively affected by “socio-economic conditions”. Presumably this aptitude is helped or hindered by some array of conditions importantly including brain development, during early childhood especially, and by good education or the lack thereof. And in that case, the social worker’s claim begins to look like a plausible (if not unproblematic) objection to the supremacist: one could take the objection to say that the difference in intelligence (supposedly) exhibited by the members of these groups is a function of socio-economic background, not of race. Moreover, because the social worker’s objection is clearly a non sequitur provided that “intelligent” is given a strict innatist construal, many of the better critical thinkers taking the test may well reject this construal in favor of an alternate reading like the one just offered. Such a reading is both permissible (owing to the ambiguity of “intelligent”) and more plausible as an interpretation of the social worker’s view; it is also incompatible with the selection of response D, which the test’s authors deem the correct response. Since options A and C are not plausible, this likely will lead the best critical thinkers to reconsider options B and D, costing them time on the exam. Likely some of them in the end would conclude that the innatist notion of intelligence is being assumed, and so choose D. But there is a case for option B. For example, consider a student who reasons something like this: “Eating lead-based paint off the walls (a matter of poverty and family circumstances, among other things) causes lower intelligence. Nutrition (a matter of poverty and family circumstances) has effects on intelligence. So, I’ll choose B – which has a grain of truth to it”. In short, this item is ambiguous, and implausible in its construal of the social worker’s claim. At best, the talented critical thinker will spend inordinate time on it; at worst she will spend that time only to choose a “wrong” answer, B.

Before turning to the scope of the exam, there are a few matters of expression worth noting. There is an improper expression of the subjunctive mood contained in the instructions for Questions #11 and #12, viz. “For example, (4) suppose there was a woman prisoner whom you knew for certain to be totally innocent”. This should read, “. . . suppose there were . . . ”.
Further, the expression in Question #19 reads in part, “Nobody can be a krendalog to themself, but today . . . ”. This should read, “Nobody can be a krendalog to himself or to herself, but today . . . ”. Now perhaps these might be thought to be mere quibbles. But that would be a mistake: the subjunctive mood is of considerable importance in ordinary English. It provides English speakers with the ability to distinguish between possibility and actuality, a logical distinction of considerable consequence: what is (or was) the case and what can be the case are quite different things, and any critical thinker (including any writer of a critical thinking test) must be competent in expressing this distinction. A competent critical thinker might respond to “suppose there was a woman prisoner whom you knew for certain to be totally innocent” by saying, “I don’t know what I’m supposed to do about that now!” And in general careful attention to matters of proper expression is a mark of a critical thinker. Hence, it is important that academic materials reflect this sort of careful attention. Materials and teachers are students’ role models.

Scope

Another consideration relevant to the assessment of any test is its breadth, the range of competencies it attempts to measure. Posted at the website
is a fairly comprehensive inventory of more than 250 basic CT skills. By comparing the exam with the inventory we can identify the skills that the exam attempts to measure.3 The reader is encouraged to make a comparison as well. In the presentation below three CT models are crosschecked with the analysis, that is, if a skill is found in one of the three CT models this is indicated by a letter designation: D, for the Delphi model; E, for the United States National Educational Goals model; and S, for the Sonoma model. (Please see references.) The CT skills listed in the analysis are stated as objectives; each completes the phrase, “A critical thinker is able to . . . ”.

QUESTIONS 1–4
• interpret, and apply complex texts, instructions D E
• distinguish:
  • conclusions D E S
  • premises (reasons) D E S
• distinguish supporting, conflicting, and compatible claims, arguments, explanations, descriptions, representations etc. D
• assess the relevance of claims to other claims D E
• evaluate whether a deductive argument is valid or invalid (logical form) D E
• evaluate whether an inductive argument is strong or weak D E
• evaluate claims and arguments in terms of criteria such as:
  • consistency D E S
  • relevance E S
  • support

QUESTIONS 5–9
• recognize ambiguity and unclarity in claims, arguments, and explanations D E
• interpret and apply complex texts, instructions, illustrations etc. D E
• distinguish supporting, conflicting, compatible, and equivalent claims, arguments, explanations, descriptions, representations, etc. D

QUESTION 10
• recognize and clarify issues, claims, arguments, and explanations D E
• interpret and apply complex texts, instructions, illustrations etc. D E

QUESTIONS 11–13
• distinguish:
  • conclusions D E S
  • premises (reasons) D E S
  • explanations D E S
  • assumptions (stated and unstated) D E S

QUESTIONS 14–19
• interpret and apply complex texts, instructions, illustrations etc. D E
• evaluate whether a deductive argument is valid or invalid (logical form) D

QUESTIONS 20–21
• interpret and apply complex texts, instructions, illustrations etc. D E
• evaluate whether an inductive argument is strong or weak D E

QUESTIONS 22–23
• interpret and apply complex texts, instructions, illustrations etc. D E
• evaluate whether a deductive argument is valid or invalid (logical form) D

QUESTIONS 24–27
• interpret and apply complex texts, instructions, illustrations etc. D E
• evaluate whether an inductive argument is strong or weak D E

QUESTION 28
• interpret and apply complex texts, instructions, illustrations, etc. D E
• evaluate whether an inductive argument is strong or weak D E
• identify and avoid errors in reasoning: D
  • informal fallacy:
    • post hoc, ergo propter hoc (after that, therefore because of that)

QUESTION 29
• interpret and apply complex texts, instructions, illustrations etc. D E
• assess the relevance of claims to other claims D E
• distinguish supporting, conflicting, compatible, and equivalent claims, arguments, explanations, descriptions, representations etc. D

QUESTION 30
• interpret and apply complex texts, instructions, illustrations etc. D E
• identify and avoid errors in reasoning: D
  • informal fallacy:
    • begging the question

QUESTIONS 31–34
• interpret and apply complex texts, instructions, illustrations etc. D E
• evaluate whether an inductive argument is strong or weak D E
• identify and avoid errors in reasoning: D
  • informal fallacy:
    • smokescreen/red herring/rationalizing

Summary for The California Critical Thinking Skills Test
• interpret, and apply complex texts, instructions D E
• distinguish:
  • conclusions D E S
  • premises (reasons) D E S
  • explanations D E S
  • assumptions (stated and unstated) D E S
• assess the relevance of claims to other claims D E
• evaluate whether a deductive argument is valid or invalid (logical form) D E
• evaluate whether an inductive argument is strong or weak D E
• evaluate claims and arguments in terms of criteria such as:
  • consistency D E S
  • relevance E S
  • support
• recognize ambiguity and unclarity in claims, arguments, and explanations D E
• distinguish supporting, conflicting, compatible, and equivalent claims, arguments, explanations, descriptions, representations, etc. D
• recognize and clarify issues, claims, arguments, and explanations D E
• identify and avoid errors in reasoning: D
  • informal fallacy:
    • post hoc, ergo propter hoc (after that, therefore because of that)
    • begging the question
    • smokescreen/red herring/rationalizing

Of more than 250 basic critical thinking skills listed in the inventory (and there are surely more than these), 17 are addressed by CS.

Recommendations

CS is seriously flawed as it stands. The limited scope may be unavoidable for any multiple choice test on Critical Thinking designed for completion in about an hour. But the exam makes mistakes in critical thinking in questions 6, 7, 8, 19, 21, 23, 24, 29, and 33. In most of the places where it goes wrong, the exam seems likely to produce “false negative” evaluations of the performance of students who have better developed CT skills. But whether or not CS generally results in such false negatives in practice has no bearing on this assessment of CS’s content. The details of the assessment given above provide the grounds on which the conclusion rests. And those grounds show that any results from the portions of the exam shown to be defective cannot be meaningful. No statistical conclusion can follow from content-defective testing material.

Nevertheless, the remainder of the exam is acceptable with respect to content, and the defective questions can be replaced or modified fairly readily. In the interim, those who may use the exam can eliminate the defective parts from test delivery or from data collection; elimination of these parts from test delivery would be best, in the interest of saving time and avoiding unnecessary distractions for test takers.
As to the scope of the exam and the significance of such multiple choice testing, it is unlikely that any multiple choice exam can hope to capture the range of CT basic skills, but that is not an argument against such testing. It is instead an argument in favor of understanding what the results of any such testing do and do not show. Such results can give an indication of competence for the skills measured (as a kind of minimal competency, below which remediation is in order for those skills); but such results cannot serve as an adequate measure of critical thinking skills generally, and any such testing should involve several different tests to give better indications. But (1) since most CT skills involve a “supply” response rather than a “select” response (i.e., most CT skills involve initiating responses rather than making a selection from given
alternatives); and, (2) since most CT skills involve reflection on these “supply” responses themselves (thinking about thinking); and, (3) since many CT skills involve originating thought and then carefully examining it, rather than making any response at all, such testing even when done well can only provide indicators at a rudimentary level. For these reasons any attempt to use such testing to grant any form of credit by exam, or to waive any CT requirement, or to make any positive claim about scores on such exams as indicators of competence is sheer folly. The better place for both the acquisition and the assessment of CT skills is the traditional classroom (with small class size,4 without multiple choice testing, and with the requirement that students explain every answer to a teacher competent in CT skills, who cares enough and has time enough to read and listen and respond to every response and every explanation). There are no short cuts.

Quibbles, Controversy, and the General Reader

Quibbles and controversy are useful in an open and objective analysis intended to invite discussion and improvement; and perhaps these provide some of the most interesting issues to the general reader. Critical thinking is not easy. It ranges across all subjects and disciplines; it ranges across far more than the logical, ethical, pedagogical and epistemological issues raised here.

As to whether or not some of the analysis herein amounts to quibbling, the reader may wish to consider various points of view. One point of view that seems particularly relevant is that of the student. The analysis shows that those students who are most advanced in their critical thinking skills are the ones most likely to be adversely affected. This is a serious ethical and pedagogical matter. As for the matters of expression criticized, collegiate level material needs to meet collegiate standards. The general reader will no doubt find these points of interest as well.
This evaluation has been an endeavor to raise and to answer some questions about the scope and content of the exam, and to examine issues of clarity, accuracy and precision that may be of interest to a wider readership, with the expectation that the reader will hold this evaluation to high standards as well.

Appendix

Demonstrations That “Only those seeking challenge and adventure should join the Army” and “You shouldn’t join the Army unless you seek challenge and adventure” Do Not Necessarily Express the Same Idea

1. We can explore these matters and provide a little context by considering the notions of “obligation” and “permissibility”.

Where O is an obligation operator and P is a permissibility operator, and φ is a proposition, the relationship between obligation and permissibility is:

  Oφ ↔ ∼P∼φ
  ∼O∼φ ↔ Pφ
  ∼Pφ ↔ O∼φ
  ∼Oφ ↔ P∼φ

Call this the “Obligation-Permissibility Relation”. The stem of Question 6 gives “Only those seeking challenge and adventure should join the Army”. Where

• ‘Hx’ is “x is a human”,
• ‘Jx’ is “x joins the Army”,
• ‘xSy’ is “x seeks y”,
• ‘a’ is “adventure”,
• ‘c’ is “challenge”,
• ‘O’ is the obligation operator, and
• ‘P’ is the permissibility operator,

the claim “Only those seeking challenge and adventure should join the Army” is (x){OJx → [Hx & (xSc & xSa)]}. So, beginning with ‘(x){OJx → [Hx & (xSc & xSa)]}’ we can prove:

1. (x){OJx → [Hx & (xSc & xSa)]}
2. OJp → [Hp & (pSc & pSa)]                  From 1 by Universal Instantiation
3. ∼OJp ∨ [Hp & (pSc & pSa)]                 From 2 by Implication
4. [∼OJp ∨ Hp] & [∼OJp ∨ (pSc & pSa)]        From 3 by Distribution
5. [∼OJp ∨ (pSc & pSa)] & [∼OJp ∨ Hp]        From 4 by Commutation
6. ∼OJp ∨ (pSc & pSa)                        From 5 by Simplification
7. OJp → (pSc & pSa)                         From 6 by Implication
8. ∼(pSc & pSa) → ∼OJp                       From 7 by Transposition
9. ∼(pSc & pSa) → P∼Jp                       From 8 by Obligation-Permissibility
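The truth-functional steps of this derivation can be checked mechanically. What follows is only a minimal sketch in Python, not part of the original analysis. It assumes that ‘OJp’, ‘Hp’, and ‘(pSc & pSa)’ may be treated as three independent atoms (the names oj, h, and s are hypothetical), and it does not model the Obligation-Permissibility step, which is not truth-functional.

```python
from itertools import product

def implies(a, b):
    """Material conditional: a -> b."""
    return (not a) or b

# Assumption: oj stands for 'OJp', h for 'Hp', s for '(pSc & pSa)',
# each treated as an independent atomic proposition.
def step2(oj, h, s):   # OJp -> [Hp & (pSc & pSa)]
    return implies(oj, h and s)

def step4(oj, h, s):   # [~OJp v Hp] & [~OJp v (pSc & pSa)]
    return ((not oj) or h) and ((not oj) or s)

def step7(oj, h, s):   # OJp -> (pSc & pSa)
    return implies(oj, s)

def step8(oj, h, s):   # ~(pSc & pSa) -> ~OJp
    return implies(not s, not oj)

# All eight assignments of truth values to the three atoms.
ROWS = list(product([False, True], repeat=3))

def equivalent(f, g):
    """True when f and g agree on every row of the truth table."""
    return all(f(*row) == g(*row) for row in ROWS)

def entails(f, g):
    """True when g holds on every row where f holds."""
    return all(implies(f(*row), g(*row)) for row in ROWS)

print("2 equivalent to 4:", equivalent(step2, step4))  # Implication, Distribution
print("4 entails 7:", entails(step4, step7))           # Simplification, Implication
print("7 equivalent to 8:", equivalent(step7, step8))  # Transposition
print("7 entails 2:", entails(step7, step2))           # fails: Simplification drops Hp
```

Under these assumptions the first three checks hold and the last does not, which reflects that Simplification discards the ‘Hp’ conjunct, so the derivation runs from step 1 to step 9 but not back again.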

The allegedly correct answer is “You shouldn’t join the Army unless you seek challenge and adventure”. So we have (x){[Hx & O∼Jx] ∨ (xSc & xSa)}. So, beginning with ‘(x){[Hx & O∼Jx] ∨ (xSc & xSa)}’ we can prove:

D1. (x){[Hx & O∼Jx] ∨ (xSc & xSa)}
D2. [Hp & O∼Jp] ∨ (pSc & pSa)                       From D1 by Universal Instantiation
D3. (pSc & pSa) ∨ (Hp & O∼Jp)                       From D2 by Commutation
D4. [(pSc & pSa) ∨ Hp] & [(pSc & pSa) ∨ O∼Jp]       From D3 by Distribution
D5. [(pSc & pSa) ∨ O∼Jp] & [(pSc & pSa) ∨ Hp]       From D4 by Commutation
D6. (pSc & pSa) ∨ O∼Jp                              From D5 by Simplification
D7. ∼(pSc & pSa) → O∼Jp                             From D6 by Implication
D8. ∼(pSc & pSa) → ∼PJp                             From D7 by Obligation-Permissibility
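The final lines of the two proofs, step 9 and step D8, can be compared in a small model. The sketch below is illustrative only and rests on an assumption that is not stated in the text: that joining the Army has exactly one of three deontic statuses for a given person (obligatory, forbidden, or optional), so that ‘OJp’ and ‘O∼Jp’ are never both true. The names STATUSES, step9, and step_d8 are hypothetical.

```python
from itertools import product

# Assumption: each status fixes the truth values of 'OJp' and 'O~Jp',
# and no status makes both true.
STATUSES = {
    "obligatory": {"O_J": True,  "O_notJ": False},
    "forbidden":  {"O_J": False, "O_notJ": True},
    "optional":   {"O_J": False, "O_notJ": False},
}

def step9(status, seeks):
    # ~(pSc & pSa) -> P~Jp, where P~Jp is rewritten as ~OJp
    # by the Obligation-Permissibility Relation.
    permitted_not_to_join = not STATUSES[status]["O_J"]
    return seeks or permitted_not_to_join

def step_d8(status, seeks):
    # ~(pSc & pSa) -> ~PJp, where ~PJp is rewritten as O~Jp
    # by the Obligation-Permissibility Relation.
    not_permitted_to_join = STATUSES[status]["O_notJ"]
    return seeks or not_permitted_to_join

# Collect every case in which the two statements take different truth values.
differences = [
    (status, seeks)
    for status, seeks in product(STATUSES, [False, True])
    if step9(status, seeks) != step_d8(status, seeks)
]
print(differences)  # [('optional', False)]
```

On this model the two statements come apart in exactly one case: a person who does not seek challenge and adventure and for whom joining is neither obligatory nor forbidden. Step 9 is true of such a person and step D8 is false.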

Since step 9 in the first proof is not equivalent to step D8 in the second proof, the proposed answer to Question #6 is incorrect. The authors of the California Test might reply by saying that the claim “You shouldn’t join the Army” is to be understood as [Hx & ∼OJx]. But consider the following:

• Though “You shouldn’t join the Army” can be understood as [Hx & ∼OJx], this is not clearly the only (or best) understanding of “You shouldn’t join the Army”. While ‘should’ is ambiguous vis-à-vis the kind of obligation – presumably it is prudential obligation – it would seem to make better sense to claim it means, “It is in your best interest not to join the Army”, rather than, “It is not in your best interest to join the Army”.
• For the “key correct” answer actually to be correct it is not sufficient for it to be the case that “You shouldn’t join the Army” can be understood as [Hx & ∼OJx]; rather, this must be the case.
• If there is any ambiguity regarding what the statement means, the question is defective, since it is a question of which statement expresses “the same idea”.

2. Perhaps a simpler way to see these matters is as follows: The basic point is: “should not p” may be interpreted either as (1) “should (not p)” or (2) “not (should p)”. The relevant structure of the statements can be expressed simply as:

The Stem Statement: “(x)[OJx → Ax]” where Ax = [Hx & (xSc & xSa)]. This translates “Only A are OJ” as the categorical statement “All OJ are A”.

The relevant structure of the second (purportedly equivalent) statement in response D is EITHER:
“not OJ unless A” = “∼OJ ∨ A” = “∼A → ∼OJ”
in which case response D “would mean the same as the statement in the stem.”
OR:
“∼A → O∼J”
in which case response D “would not mean the same as the statement in the stem.”

Acknowledgements

Thanks to the following individuals for providing suggestions, review, and/or commentary on earlier drafts: Bonnie Abney, President, Abney International; Henry C. Byerly, Regents Professor Emeritus, The University of Arizona; Manuel Davenport, Professor, Texas A & M University; James B. Dixon, Associate Dean of Liberal and Interdisciplinary Studies, St. Augustine College; Charles Hudlin, Professor, United States Air Force Academy; Nathan H. Miller, Attorney, Harrisonburg, Virginia; Richard Morehouse, Editor, Analytic Teaching and Professor, Viterbo College of La Crosse, Wisconsin; Ann-Janine Morey, Professor, Southern Illinois University at Carbondale; Paul Newberry, Professor, California State University at Bakersfield; Richard Parker, Professor, California State University at Chico, and co-author of Critical Thinking (1998); James Slinger, Professor Emeritus, California State University, Fresno; James M. Smith, Professor Emeritus, California State University, Fresno; Barry Watts, Director of Programs Analysis and Evaluation, Office of the United States Secretary of Defense. Thanks also to Tom Adajian and Bill Knorpp of James Madison University, and Katherine Dimittiou of the University of Virginia for suggestions for the first section of this paper.

Notes

1 “K(s1, p1), K(s2, p2)”, can be read as “s1 is a krendalog of p1, s2 is a krendalog of p2” and so on.
2 “K(s1, p1), K(s2, p1)”, can be read as “s1 is a krendalog of p1, s2 is a krendalog of p1” and so on.
3 Fawkes (2001) provides an analysis of the scope of all currently available commercial CT tests and Fawkes et al. (2001) provides an assessment of the Watson–Glaser Critical Thinking Appraisal exam as to both content and scope.
4 It may be appropriate to quantify the meaning of “small class size”. Perhaps it would be best simply to note that at this writing the fairly widely publicized national goal of the U.S. Department of Education for K through 12 education is a class size of 18. Is collegiate education less demanding? The relevant question here is not whether or not college courses are more or less demanding than K through 12 courses, but rather how demanding the relevant courses are to the students taking them, and obviously there is a range of difficulty for any group of students. In that sense, K through 12 subjects likely are generally as demanding to those students as college subjects generally are to college students.

References

Crosschecked Models of Critical Thinking:

D the Delphi model: Facione, Peter A.: 1990, Critical Thinking: A Statement of Expert Consensus for Purposes of Educational Assessment and Instruction, “The Delphi Report”, The California Academic Press, Millbrae, California.

E the U.S. National Educational Goals model: Click, Benjamin A.L., Hoffman, Steven, Jones, Elizabeth, Moore, Lynne M., Ratcliff, Gary & Tibbitts, Stacy: 1990, National Educational Goals, National Assessment of College Student Learning: Identifying College Graduates’ Essential Skills in Writing, Speech and Listening and Critical Thinking, Final Round Consensus of Faculty, Employers, and Policymakers, United States Department of Education.

S the Sonoma model: Paul, Richard et al.: 1998, Center for Critical Thinking, Sonoma State University, Critical Thinking: Basic Theory & Instructional Structures, Rohnert Park, California: The Foundation for Critical Thinking.

General References:

Facione, Peter A. & Facione, Noreen C.: 1992, The California Critical Thinking Skills Test, Millbrae, California: The California Academic Press, Second Edition (updated) 1994.

Fawkes, Don: 2001, ‘Analyzing the Scope of Critical Thinking Exams’, American Philosophical Association Newsletter on Teaching Philosophy, Spring.

Fawkes, Don, Adajian, Tom, Flage, Dan, Hoeltzel, Steven, Knorpp, Bill, O’Meara, Bill, and Weber, Dave: 2001, ‘Examining The Exam: A Critical Look At The Watson–Glaser Critical Thinking Appraisal Exam’, Inquiry, Fall.