Harder, Knowledge-Based QA Questions for Intelligence Analysts and the Researchers Who Want to Help Them∗

Selmer Bringsjord, Micah Clark, Andrew Shilliday, Joshua Taylor
Rensselaer AI & Reasoning (RAIR) Lab
Department of Cognitive Science
Department of Computer Science
Rensselaer Polytechnic Institute (RPI)
Troy NY 12180 USA
{selmer,clarkm5,shilla,tayloj}
[email protected]

rough draft of 12.13.06
Contents

1 Introduction and Immediate Disclaimer
2 One Part of Bringsjord's Theory of Intelligence Analysis
3 Additional Theoretical Foundations
4 Why Invest in Knowledge-Based R&D?
5 Specific Motivation: Be Responsive to the Nature of Analysis
6 The Three Classes of Harder Questions
  6.1 Class One: LIj
    6.1.1 Relationship Between LIj and F1–F4
  6.2 AR Questions
    6.2.1 Relationship Between ARj and F1–F4
  6.3 LRj Questions
    6.3.1 Relationship Between LRj and F1–F4
7 Who Can Provide Questions in LI, LR, and AR?
8 For Further Reading
∗ We are indebted to a number of researchers who, through the years, have raised our understanding of questions used in the “high stakes” standardized testing arena. This group includes Malcolm Bauer at ETS, Princeton’s Phil Johnson-Laird, and RPI’s own Yingrui Yang. Special thanks are due to Elizabeth Bringsjord, for analysis and information given from the perspective of psychometrics.
1 Introduction and Immediate Disclaimer
This short white paper introduces, and briefly explains the value and relevance of, three classes — LIj, ARj, LRj — of questions to intelligence analysis, and to those engineers wanting to build QA systems that help these analysts. The classes AR and LR (no subscript j) will be familiar to most psychometricians. If you are such a reader, note that the subscript indicates that all the questions we are proposing require, as part of the answer given back, a justification for that answer. Whenever possible, the justification should be an outright proof, though it will presumably be an informal one. (Informal proofs make significant use of natural language; formal proofs have no natural language in them.)

Please note that if it were discovered that some human had devised a way to correctly answer questions on a "high-stakes" standardized test without being able to give a rigorous justification at the semantic level, the test would thereby be invalidated. Specifically, if the New York Times were shown an algorithm for correctly answering questions on the LSAT on the basis of the superficial structure of the text, rather than on the basis of understanding its underlying propositional content, that would be the end of the LSAT. For this reason, it is quite peculiar that QA technology, and testing of that technology, has often steered clear of semantic-level justification.

These three classes are distinguished by the fact that they are responsive to four distinguishing features of intelligence analysis (IA); this quartet — F1–F4 — is given in section 5. Systems, whether human or machine, able to answer questions of a type responsive to this quartet are necessarily knowledge-based (or, more precisely, logic-based1) systems.

Now, our disclaimer is this: We in no way wish to impugn the value of the kinds of questions typically found today in current QA systems, and in systems tackling "textual entailment" challenges like those posed by Pascal RTE. We don't fully understand why these questions have received so much attention, while questions of the sort that we present in this paper have been almost completely ignored, but we in no way wish to criticize the work revolving around standard, simple questions: this is fantastic work that we ourselves find valuable engineering-wise. But governmental sponsorship of R&D that covers the full spectrum of questions (see the relevant axis in Figure 1) is needed. At present, DTO's portfolio of QA R&D is apparently weighted severely in the direction of simple questions, and therefore decidedly away from F1–F4.

No one can dispute the fact that the standard questions are painfully simple, and that this simplicity obviates the need to do the sophisticated logical reasoning that is part and parcel of intelligence analysis. The simplicity of the questions is confirmed by the fact that subjects who are plagued by bias (as shown by the established empirical techniques in the psychology of reasoning for exposing bias; more about that later) can nonetheless for the most part answer such questions correctly, and can provide correct justifications for these answers. (As you know, justifications are not required in the case of the simpler questions used in extant QA systems and RTE.) Consider the RTE 1 pair:

1. Researchers at the Harvard School of Public Health say that people who drink coffee may be doing a lot more than keeping themselves awake — this kind of consumption apparently also can help reduce the risk of diseases.

2. Coffee drinking has health benefits.
1 Mathematically speaking, knowledge-based R&D is logic-based R&D, and for maximum clarity is best viewed that way, but this is a brief, non-technical document, and so we don’t cast things in the form of formal logic. For coverage of logicist or logic-based AI, consult (Bringsjord & Ferrucci 1998a, Bringsjord & Ferrucci 1998b, Nilsson 1991). In addition, for a quick, readable overview of logic-based AI, you may find it profitable to consult the relevant entries on AI in the Stanford Encyclopedia of Philosophy: “Logic and Artificial Intelligence” by Richmond Thomason, and the forthcoming “Artificial Intelligence” by Selmer Bringsjord (available directly from Bringsjord).
Now suppose that 100 undergraduates at a first-rate university, as subjects in a simple experiment (this is the typical type of subject used in experimental psychology), are given this pair and asked to say whether 2. follows logically from 1. What percentage of the group would give the correct response (affirmative, of course)? The answer, obviously, is 100%, or at least very close to that. Furthermore, if asked to provide a justification, the vast majority of subjects would doubtless argue that since coffee consumption reduces the risk of diseases, and any activity that reduces the risk of diseases has health benefits (a fact that is part of their background knowledge), it follows that coffee consumption has health benefits. The justification in this case is a deductively valid argument. It can easily be formalized as a proof.
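By way of illustration, the formalization can be given as follows. The predicate letters are our own choices for this sketch, not part of the original RTE item: let R(x) abbreviate "x reduces the risk of diseases," H(x) abbreviate "x has health benefits," and c denote coffee drinking.

    1. R(c)                    (from sentence 1)
    2. ∀x (R(x) → H(x))        (background knowledge)
    3. R(c) → H(c)             (from 2, by universal elimination)
    4. H(c)                    (from 1 and 3, by modus ponens)

Sentence 2 of the pair is just line 4, so the entailment holds.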
For a stark contrast, consider the following classic problem: the so-called Wason Selection Task (Wason 1966). Suppose that you are dealt four cards out of a larger deck, where each card in the deck has a digit from 1 to 9 on one side, and a capital Roman letter on the other. Here is what appears to you when the four cards are dealt out on a table in front of you, as you look down upon the surface of the table:

    E    K    4    7
Now, your question is this one: Which card or cards should you turn over in an attempt to determine whether the following rule is true?

(R) If a card has a vowel on one side, then it has an even number on the other side.
In contrast to the RTE pair above, fewer than 5% of the college-educated adult population can solve this problem — but, predictably, those sufficiently capable of bias-free, context-independent deductive reasoning are rarely fooled. We have replicated the result countless times over the past 15 years, with subjects ranging from 7th-grade students to illustrious members of the Academy; see (Bringsjord, Bringsjord & Noel 1998). About 30% of subjects do turn over the E card, but that isn't enough: the 7 card must be turned over as well.

As to the justification, consider the following. The rule in question is a so-called conditional, that is, a proposition having an if-then form, which is often symbolized as φ → ψ. As the truth tables routinely taught to young math students make clear (e.g., see Chapter 1 of Bumby, Klutch, Collins & Egbers 1995), a conditional is false if and only if its antecedent, φ, is true, while its consequent, ψ, is false; it's true in the remaining three permutations. So, if the E card has an odd number on the other side, (R) is overthrown. However, if the 7 card has a vowel on the other side, this too would be a case sufficient to refute (R). The other cards are entirely irrelevant, and it's irrational to turn them over.

Children have no chance against the selection task just presented. But children, as we have noted, can almost invariably answer the questions on which non-knowledge-based QA tends to focus. When it comes to the classes of questions we introduce herein, the situation is quite different: Subjects able to correctly answer questions from our three categories, as a matter of empirical fact, are subjects who are largely immune to bias. Such immunity correlates with higher performance on tasks that require bias-free reasoning. IA is such a task.
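To make the truth-table analysis fully explicit, here is a small executable rendering of it; this is a sketch of our own (the function name is merely illustrative, and nothing below is part of Wason's original materials). For each visible face it asks: could some hidden face falsify (R)?

    # (R): if a card has a vowel on one side, it has an even number on
    # the other. A conditional is false only when its antecedent is true
    # and its consequent false, so a card is worth turning over iff some
    # possible hidden face would produce exactly that situation.

    VOWELS = set("AEIOU")

    def worth_turning(visible: str) -> bool:
        if visible.isalpha():
            # Letter showing: the hidden face is a digit 1-9. Only a vowel
            # with an odd digit behind it can falsify (R).
            return visible in VOWELS
        # Digit showing: the hidden face is a letter. An even digit can
        # never falsify (R); an odd digit with a vowel behind it does.
        return int(visible) % 2 == 1

    for card in ["E", "K", "4", "7"]:
        print(card, "turn over" if worth_turning(card) else "irrelevant")
    # Output: E turn over / K irrelevant / 4 irrelevant / 7 turn over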
2 One Part of Bringsjord's Theory of Intelligence Analysis
But why have the questions in QA/RTE R&D undertaken for the IC heretofore been simple? We think the answer is that the technology explored in this R&D is aiming at the very early, pre-reasoning stages of IA. If one is trying to figure out what recommendation should be provided
after receiving a request to explore different responses that the US might make in the wake of a catastrophic terrorist attack sponsored by government G, one may well begin with questions that are simple. The analyst may want to know about the officials in G, their backgrounds and beliefs, and so on. In order to get this information, the analyst may wish to issue questions that are straightforward: so-called "factoid" questions. But later in the process, inevitably, the questions the analyst has will be deep and complex, and the analysis and arguments given in support of a recommendation will by definition be deep, complex, and knowledge-based. For example, the analyst may want to know what the response of G will be to each of the responses that are available to the US, and the analyst should want a justification for answers received. At this point, her questions will require knowledge-based technology, if these questions are to be answered by systems that assist her.

Bringsjord's theory of intelligence analysis is sensitive to variation in the complexity of a question, and to where in the IA process (from initial acquisition of data to polishing and then issuance of a report) a question is issued. More specifically, his theory includes the view that IA, and R&D aimed at supporting and facilitating IA, obeys a certain binary function. This function takes as input two parameters: a temporal one indicating what stage the analyst is at, ranging from tasking to the issuing of a final report; and a question Q falling along a continuum of complexity, ranging from "factoid" questions, to questions featured in current QA and RTE systems, to the kinds of questions seen in the three categories (LIj, ARj, LRj) introduced in the present white paper. The value of the function operating on these inputs indicates whether R&D undertaken in support of the analyst at this point (in Euclidean 2-space) is knowledge-based or non-knowledge-based in nature. Please see Figure 1 for a pictorial view of the function in question.2
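For concreteness, the function can be caricatured in a few lines of code. This is only our sketch: the theory specifies the two inputs and the binary output, and Figure 1 draws the actual boundary; the numeric scales and the threshold below are assumptions made purely for illustration.

    def rd_type(stage: float, complexity: float) -> str:
        """stage: 0.0 (tasking) through 1.0 (final report issued).
        complexity: 0.0 (factoid) through 1.0 (LIj/ARj/LRj-class questions).
        Returns which kind of R&D supports the analyst at this point."""
        THRESHOLD = 0.5  # assumed; the real boundary is the curve in Figure 1
        # Late-stage work and complex questions each push toward
        # knowledge-based support; early factoid gathering does not.
        if max(stage, complexity) > THRESHOLD:
            return "knowledge-based"
        return "non-knowledge-based"

    print(rd_type(0.1, 0.1))  # early factoid gathering: non-knowledge-based
    print(rd_type(0.9, 0.8))  # justifying a recommendation: knowledge-based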
Figure 1: A Graphical View of Part of Bringsjord's Theory of Intelligence Analysis

2 We do of course realize that the time allowed an analyst between tasking and report varies greatly among the different kinds of analysis needed by the United States and its allies. A CIA analyst may spend a long time working on a report that predicts what will happen should the US suddenly pull out of Iraq, but another analyst may need to react very quickly to pieces of data continuously streaming in to him or her. The present paper, and specifically the graph shown in Figure 1, leaves this complexity aside in the interests of exposition within a short white paper. To be neater would require the function to be a ternary one, since the type of analysis would need to be another argument.
3 Additional Theoretical Foundations
Bringsjord and Schimanski (2003, 2004) have argued that the engineering of intelligent systems should be based upon determinate tests. The present white paper is in line with this position.
4 Why Invest in Knowledge-Based R&D?
It has come to our attention that there is a desire, on the part of those in a position to control, or at least influence, funding for R&D intended to help the IC, to receive justification for QA R&D that is knowledge-based (or, again, more accurately, logic-based; see note 1). In light of this desire, the present paper may be helpful, as it is designed to show that without funding for knowledge-based R&D, QA systems responsive to the four-fold motivation in the next section would never arrive. There are compelling reasons to carry out knowledge-based R&D in IC-relevant areas outside QA, but providing these reasons is outside the scope of the present paper.
5 Specific Motivation: Be Responsive to the Nature of Analysis
With respect to the word 'Harder' in our title: We don't mean 'hard' in the way you might expect. For example,

Under what conditions would Kim Jong Il launch a nuclear missile?

is not the kind of question with which we are concerned herein. Such a question is indeed a hard one, intelligence analysts of certain types are certainly in the business of trying to answer it (and report and justify these answers for others), and some engineers (in AI, e.g.; we ourselves are in this category) are in the business of building software intended to help analysts produce justified answers to such questions. But, herein, these are not our questions. Our questions are hard relative to current QA and RTE, and our questions are motivated by four facts about intelligence analysis, viz.,

F1 (Avoid Bias) Intelligence analysts, like most human reasoners, can, with disturbing ease, succumb to bias (e.g., see Heuer 1999). But because the stakes can be rather high in intelligence, bias can be an exceedingly bad thing. If faulty reasoning supports the recommendation to refrain from detaining a potential sleeper, many innocents can die. If faulty reasoning supports the recommendation to carry out a significant military action, and that action backfires, that can be a very nasty situation. And so on.

F2 (Dodge Deception) Intelligence analysts, more than most, must be able to determine when deception is afoot — or, to put it more crudely, when they are being lied to. Before heading out with her to a formal ball, Selmer's wife may tell him that, in his new tuxedo, he looks positively dashing, but if she is lying, no harm (and arguably some good) is done. But when an analyst falls prey to deception, the consequences can be devastating. Unfortunately, the bad guys are often smart enough to know that if they can deceive us, their effectiveness can be greatly enhanced.

F3 (Handle Novelty) Intelligence analysts must be prepared to reason over novel information, in order to thwart novel plans to harm the United States, her allies, her citizens, and other innocents. Unfortunately, our enemies don't hand us a nice neat "training set" in advance of their strikes. On the contrary, as far as is possible given various constraints, our enemies explicitly try to behave in ways that cannot be computed from prior cases.

F4 (Justify) Intelligence analysts must be prepared to justify their recommendations and hypotheses. If the recommendation is made to arrest a suspect, or intervene in the internal affairs of another country,
or search for WMDs in a particular area, the analysts must be able to give, at least in principle, a cogent rationale that supports the recommendation. That rationale, if it is to hold any water, must be of a structure that is normatively correct, and certifiably so.
The questions we urge the IC to sponsor investigation of are directly responsive to these four facts; such sponsorship would therefore be sponsorship of knowledge-based R&D. We turn now to the questions in the three aforementioned categories.
6 The Three Classes of Harder Questions
We introduce three classes of harder questions, each of which requires knowledge-based cognition to correctly answer in the human case, and knowledge-based R&D in order to build computational systems able to correctly answer in the machine case. The three classes are: Logical Illusion with justification required (LIj), Analytical Reasoning with justification required (ARj), and Logical Reasoning with justification required (LRj). (As we shall see, ARj can be partitioned into those questions that are wholly linguistic, ARLj, and those that include some visual information, ARVj.) Items for the second and third categories have occupied a longstanding, venerable position in the history of "high-stakes" testing and psychometrics. For example, both AR and LR have been used for many years on the LSAT, which among high-stakes standardized tests has unparalleled predictive power (e.g., higher than that of the SAT and GRE). We now give some examples of each of the three classes, beginning with LIj.
6.1 Class One: LIj
Here's an example of a question of the LI type:3

LI1
(1) If there is a king in the hand, then there is an ace, or else if there isn't a king in the hand, then there is an ace.
(2) There is a king in the hand.
Given these premises, what can you infer?
Take a moment to answer the question in your mind, or even jot the answer down. Now, Johnson-Laird has recently reported that

Only one person among the many distinguished cognitive scientists to whom we have given [LI1] got the right answer; and we have observed it in public lectures — several hundred individuals from Stockholm to Seattle have drawn it, and no one has ever offered any other conclusion (Johnson-Laird 1997b, p. 430).
The bias-based conclusion often drawn is that there is an ace in the hand.4 Bringsjord (and, doubtless, many others) has time and time again, in public lectures, replicated Johnson-Laird's numbers — presented in (Johnson-Laird & Savary 1995). Why is this conclusion wrong? First, note that 'or else' is to be understood as exclusive disjunction; that is, one of the conditionals (= if-thens) is true, but not both. It follows immediately that one of the conditionals is false. But as we already noted when examining the Wason Selection Task, the only way a conditional φ → ψ is false is when the antecedent (φ) is true, but the consequent (ψ) is false. Since both conditionals have the same consequent (viz., there is an ace in the hand), no matter which conditional is false, it's not the case that there is an ace in the hand.

3 LI1 is from (Johnson-Laird & Savary 1995). Variations are presented and discussed in (Johnson-Laird 1997a).
4 Note that Johnson-Laird's original version, shown just above, doesn't request a justification from the subject.

Lest you think the illusion here hinges on concealing from the average human that the disjunction is exclusive, we point out that even when you make the exclusive disjunction explicit, the results are the same. For example, you still have an illusion if you present

LI′j,1
(1) If there is a king in the hand then there is an ace; or, if there isn't a king in the hand then there is an ace; but not both of these if-thens.
(2) There is a king in the hand.
Given these premises, what can you infer? Why?
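The semantics just described can be checked mechanically. The following sketch uses a propositional encoding of our own (it is not anything from Johnson-Laird's materials): it enumerates the models of LI1's premises and confirms that "there is not an ace in the hand" holds in every one of them.

    from itertools import product

    def implies(p, q):
        # Material conditional: false only when p is true and q is false.
        return (not p) or q

    # K: "there is a king in the hand"; A: "there is an ace in the hand".
    models = []
    for K, A in product([True, False], repeat=2):
        premise1 = implies(K, A) != implies(not K, A)  # exclusive 'or else'
        premise2 = K
        if premise1 and premise2:
            models.append((K, A))

    print(models)                         # [(True, False)]: the only model
    print(all(not A for _, A in models))  # True: 'no ace' follows validly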
Please note that once one has discovered a structure that produces a logical illusion, that structure can be "clothed" in different ways, but the force of the illusion will be preserved. Figure 2 shows the illusion above clothed in a manner that has an intelligence flavor. Subjects lacking an ability to reason in context-independent fashion succumb to this re-clothed illusion. (Of course, any subject presented repeatedly with a question having the very same structure would no doubt become suspicious, and the experiment would be contaminated.) Generalizing, humans and machines able to answer the questions in our three proposed categories would be able to answer questions that are specifically related to subjects analysts are concerned with, in the real world. It wouldn't be difficult to find real questions that correspond to the questions in our three proposed categories, and we have done some work in this area. For example, it's not hard to show that CIA- and DIA-asked questions about what purposes vehicles found in Iraq served before the US invasion are knowledge-based questions confirming the reality and importance of F1–F4.

Suppose that it's true that:
Either
(1) If North Korea has long-range nuclear missiles, then we must invade.
or
(2) If North Korea doesn't have long-range nuclear missiles, then we must invade.
but not both (1) and (2).
What follows logically from this given information? Why?

Figure 2: A "Re-clothed" Logical Illusion (LI″j,1)
6.1.1 Relationship Between LIj and F1–F4
F1 (Avoid Bias) A human or machine unable to correctly answer LIj questions is plagued by bias. But, a human or machine immune to such illusions is in all likelihood immune to bias, at least to a very high degree.
F2 (Dodge Deception) A human or machine who answers an LIj question incorrectly can easily be deceived. (To succumb to an illusion is by definition to be deceived. If we manufacture the illusion that fools you, and do so with the aim of fooling you, we have successfully deceived you.) It is thus very important to make sure that intelligence analysts, and the machines that assist them, don't answer such questions incorrectly. It's also important to engineer systems that help prevent the analyst from slipping into this bias.

F3 (Handle Novelty) LIj questions do not come with any training set. In fact, one of the interesting things about LIj items is that they are novel; each one is different. A human or machine able to handle these questions can, in some sense, handle novelty.

F4 (Justify) In order to answer LIj questions, a sound justification must be provided, at the semantic level.
6.2 AR Questions
A typical ARj question, this one from ETS, is shown in Figure 3. (Technically, this question is in AR, but we assume that a call for a justification is included.) Some readers may remember these items from the old GRE (Graduate Record Exam). On that exam, the Analytical section contained questions of this type, as well as LR questions. Now, the GRE has two sections, neither of which includes LR or AR items. However, as mentioned above, the LSAT has LR and AR items; in fact, such items are the heart of the exam.

Viewed from the standpoint of knowledge representation and reasoning, the problem shown in Figure 3, which we leave to sedulous readers to crack, is a request for a consistency proof. (All questions in ARj are deductive in nature, so proofs are always available, in principle.) That is, only one of the options (the key, to use the argot of psychometrics) is consistent with the given information. A proof (or careful argument) establishing this would show a situation in which all the given information holds, and the information in the key does as well. This approach relies on the fact that a set Φ of propositions is consistent provided that all of the propositions in Φ are true on at least one — to use the relevant term from logic — model.

Problems within AR that are entirely linguistic, such as the one shown in Figure 3, are in the space ARLj. Another very interesting sub-space of ARj is composed of questions in which some of the given information is couched in diagrammatic form. For example, some of the information in the problem shown in Figure 3 might be given diagrammatically as in Figure 4, which shows a set of constraints with respect to the 11–12pm time slot.
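The model-finding view of ARj can be made concrete. The sketch below is ours; the variable names and the brute-force strategy are illustrative assumptions, not ETS methodology. It searches for a model, i.e., a full guard schedule satisfying the given information from Figure 3, in which a candidate option also holds; an option is the key just in case such a model exists.

    from itertools import permutations, product

    DETAILS = ("Papa", "Quebec", "Romeo")
    SECTIONS = ("Junction", "Kilocycle", "Loading")
    SHIFTS = (9, 10, 11)  # p.m.

    def models():
        # One permutation of details per shift; entry i guards SECTIONS[i],
        # so no detail covers two sections in one shift.
        for rows in product(permutations(DETAILS), repeat=3):
            # No detail may guard the same section in more than one shift.
            if any(len({rows[s][i] for s in range(3)}) < 3 for i in range(3)):
                continue
            sched = {SHIFTS[s]: dict(zip(SECTIONS, rows[s])) for s in range(3)}
            # Observed fact 1: Papa guards the Loading dock at 11:00 p.m.
            if sched[11]["Loading"] != "Papa":
                continue
            # Observed fact 2: Papa guards Kilocycle earlier than Quebec does.
            papa = next(t for t in SHIFTS if sched[t]["Kilocycle"] == "Papa")
            quebec = next(t for t in SHIFTS if sched[t]["Kilocycle"] == "Quebec")
            if papa < quebec:
                yield sched

    OPTIONS = {  # candidate Loading-dock assignments at 9, 10, 11 p.m.
        "A": ("Papa", "Romeo", "Papa"),
        "B": ("Quebec", "Papa", "Romeo"),
        "C": ("Romeo", "Papa", "Quebec"),
        "D": ("Romeo", "Quebec", "Papa"),
        "E": ("Romeo", "Romeo", "Papa"),
    }

    for label, option in OPTIONS.items():
        consistent = any(tuple(m[t]["Loading"] for t in SHIFTS) == option
                         for m in models())
        print(label, "consistent" if consistent else "inconsistent")

Running the sketch flags exactly one option as consistent, which is precisely the sense in which the key is "provable" here: the printed witness schedule is the model a careful informal argument would exhibit.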
6.2.1 Relationship Between ARj and F1–F4
F1 (Avoid Bias) A human or machine unable to correctly answer ARj questions would be hampered by bias. One able to crack such problems has, to an appreciable degree, a capacity to reason in context-independent, normatively correct fashion.

F2 (Dodge Deception) ARj problems are not very deceptive. In fact, they are distinctively "what-you-see-is-what-you-get." All the information needed for a correct answer, and a proof that it's indeed correct (as well as for proofs that other options are incorrect), is invariably provided. There is no uncertainty in these items. However, as in all testing of this form, the test designer does premeditatedly construct distractors — the options in multiple-choice tests that are incorrect.
Three security force details — 'Papa', 'Quebec', and 'Romeo' — are assigned to guard three sections — the high-voltage Junction substation, the Kilocycle turbine generators, and the spent fuel-rod Loading dock — of the San Onofre nuclear power plant in three shifts of one hour each. The shifts begin at 9:00 p.m., 10:00 p.m., and 11:00 p.m. Exactly one of the three details is assigned to each of the sections per shift. No detail is assigned to any given section for more than one shift, and no detail is assigned to more than one section per shift.

An Abu Sayyaf strike team, intent on stealing spent fuel-rods, has tracked the movements of the security forces for the past week. From their observations the strike team has learned:

'Papa' always guards the Loading dock beginning at 11:00 p.m.
'Papa' always guards the Kilocycle turbine generators earlier than 'Quebec' guards the Kilocycle turbine generators.

Which of the following does the Abu Sayyaf strike team believe can be the assignment of guards to the Loading dock beginning at 9:00 p.m., 10:00 p.m., and 11:00 p.m., respectively?
     9:00 p.m.   10:00 p.m.   11:00 p.m.
(A)  'Papa'      'Romeo'      'Papa'
(B)  'Quebec'    'Papa'       'Romeo'
(C)  'Romeo'     'Papa'       'Quebec'
(D)  'Romeo'     'Quebec'     'Papa'
(E)  'Romeo'     'Romeo'      'Papa'
Figure 3: An AR Question (AR1)

In the case of ARj, though, it's very difficult to fashion clever distractors (as compared, e.g., to LI and LR).

F3 (Handle Novelty) ARj questions do not come with any training set. However, there are thousands, perhaps millions, of AR questions available. These questions are not only provided by the organizations using AR questions in their tests; they are also provided by the many companies in the business of trying to help test-takers prepare for tests that include AR questions. For example, the Princeton Review is in this category.

F4 (Justify) In order to answer ARj questions, a sound justification must be provided. That justification must come in the form of an informal or formal proof.
6.3 LRj Questions
We come now to the third and final class of questions. An example of such a question is shown in Figure 5, the "Lobster Problem." As these questions go, this particular question is on the more difficult side. Readers are encouraged to ascertain the correct answer for it now, to appreciate the complexity of questions in this class. It will be quickly seen that one of the distractors in this case is very attractive — which is what makes the problem so difficult. If you wish to know the answer to the problem, and what the best distractor is, look at the footnote linked to at the end of the sentence you're presently reading, and if you wish to see proofs confirming this answer, please email Bringsjord directly.5

5 Key: A. Clever distractor: E. (B, C, D easily ruled out.)
Figure 4: Info About the 11pm Slot for AR Question AR1, Given Diagrammatically

Lobsters usually develop one smaller, cutter claw and one larger, crusher claw. To show that exercise determines which claw becomes the crusher, researchers placed young lobsters in tanks and repeatedly prompted them to grab a probe with one claw — in each case always the same, randomly selected claw. In most of the lobsters the grabbing claw became the crusher. But in a second, similar experiment, when lobsters were prompted to use both claws equally for grabbing, most matured with two cutter claws, even though each claw was exercised as much as the grabbing claws had been in the first experiment.

Which of the following is best supported by the information above? Why?

A. Young lobsters usually exercise one claw more than the other.
B. Most lobsters raised in captivity will not develop a crusher claw.
C. Exercise is not a determining factor in the development of crusher claws in lobsters.
D. Cutter claws are more effective for grabbing than are crusher claws.
E. Young lobsters that do not exercise either claw will nevertheless usually develop one crusher and one cutter claw.
Figure 5: The (LRj) Lobster Problem

Engineering a machine able to answer these questions is not doable today. But these questions are an inspiring target for knowledge-based QA technology to aim for. Needless to say, IA-relevant isomorphs could be devised for LR problems, but doing so takes some ingenuity. We will spare the readers re-clothed versions of the lobster problem.
6.3.1 Relationship Between LRj and F1–F4
We will not repeat here the full sequence covered for the previous two classes of questions. We simply note that in the case of LRj , the questions are responsive to each of F1–F4.
7 Who Can Provide Questions in LI, LR, and AR?
At least theoretically, LRj and ARj questions can be provided by the Educational Testing Service (ETS) and the Law School Admissions Council (LSAC). Item writers, as they are called, work both
inside and (as independent contractors providing content back to ETS and LSAC) outside these organizations. The development and testing of these items, to put it mildly, is well-oiled. A very inexpensive way to acquire AR and LR items would be to directly hire relevant item writers. While such items would not have gone through the statistical analysis carried out by ETS and LSAC (using experimental sections in tests given to thousands of test-takers; this is how the difficulty of the questions is divined), that probably doesn't matter for purposes of driving and evaluating knowledge-based QA technology. In addition, many researchers familiar with AR and LR items, and familiar as well with logic, can create such items. Such researchers could be employed to provide such items to ETS and LSAC, and could therefore obviously provide them to DTO or NIST.

LI/LIj questions are more challenging to develop. They are clearly the most interesting class of items of the three, but those who can author them must not only be immune to such illusions: they must also have on hand some algorithms for generating such questions. Such algorithms have not been published anywhere, to our knowledge. They would be exceedingly clever algorithms. At the top of the list of those able to create LI questions, obviously, is Johnson-Laird, who has generated and analyzed many. Bringsjord and Yang (Bringsjord & Yang 2003a) present an algorithm-sketch for generating LI questions. Micah Clark's (2008) dissertation is devoted to presenting and testing algorithms for creating deception by generating the illusions catalyzed by LI questions.
8 For Further Reading
Motivated readers may find it helpful to assimilate content upon which the present paper is based, for example:

• The reigning master of creating questions in category LI is Johnson-Laird, inventor of the mental models theory of human reasoning (e.g., introduced in Johnson-Laird 1983). A nice paper on logical illusions is (Johnson-Laird, Legrenzi, Girotto & Legrenzi 2000).

• As already mentioned, Bringsjord and Yang (Bringsjord & Yang 2003a) present an algorithm-sketch for generating LI questions (neither has endeavored to develop a full-blown algorithm, though that could certainly be done), and Clark's (2008) dissertation presents and tests algorithms for creating deception by generating the illusions catalyzed by LIj questions.

• For a nice introduction to the distinction between System 1 and System 2 cognition, replete with reference to questions of a sort that require System 2 cognition to solve, see (Stanovich & West 2000). A commentary on this piece perhaps worth reading is (Bringsjord & Yang 2003b).

• Presentation of, and evidence for, the view that humans, if appropriately trained, can reason in bias-free, context-independent fashion, can be found in (Bringsjord et al. 1998, Rinella, Bringsjord & Yang 2001).
References

Bringsjord, S., Bringsjord, E. & Noel, R. (1998), In defense of logical minds, in 'Proceedings of the 20th Annual Conference of the Cognitive Science Society', Lawrence Erlbaum, Mahwah, NJ, pp. 173–178.

Bringsjord, S. & Ferrucci, D. (1998a), 'Logic and artificial intelligence: Divorced, still married, separated...?', Minds and Machines 8, 273–308.

Bringsjord, S. & Ferrucci, D. (1998b), 'Reply to Thayse and Glymour on logic and artificial intelligence', Minds and Machines 8, 313–315.

Bringsjord, S. & Schimanski, B. (2003), What is artificial intelligence? Psychometric AI as an answer, in 'Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI–03)', San Francisco, CA, pp. 887–893.

Bringsjord, S. & Schimanski, B. (2004), Pulling it all together via Psychometric AI, in 'Proceedings of the 2004 Fall Symposium: Achieving Human-Level Intelligence through Integrated Systems and Research', Menlo Park, CA, pp. 9–16.

Bringsjord, S. & Yang, Y. (2003a), Logical illusions and the welcome psychologism of logicist artificial intelligence, in D. Jacquette, ed., 'Philosophy, Psychology, and Psychologism: Critical and Historical Essays on the Psychological Turn in Philosophy', Kluwer, Dordrecht, The Netherlands, pp. 289–312.

Bringsjord, S. & Yang, Y. (2003b), 'Problems used in psychology of reasoning are too easy, given what our economy demands', Behavioral and Brain Sciences 26(4), 528–530.

Bumby, Klutch, Collins & Egbers (1995), Integrated Mathematics Course 1, Glencoe/McGraw Hill, New York, NY.

Clark, M. (2008), Cognitive Illusions and the Lying Machine, PhD thesis, Rensselaer Polytechnic Institute (RPI).

Heuer, R. (1999), Psychology of Intelligence Analysis, Center for the Study of Intelligence, Central Intelligence Agency; United States Government Printing Office, Pittsburgh, PA.

Johnson-Laird, P. (1997a), 'Rules and illusions: A critical study of Rips's The Psychology of Proof', Minds and Machines 7(3), 387–407.

Johnson-Laird, P. N. (1983), Mental Models, Harvard University Press, Cambridge, MA.

Johnson-Laird, P. N. (1997b), 'An end to the controversy? A reply to Rips', Minds and Machines 7, 425–432.

Johnson-Laird, P. N., Legrenzi, P., Girotto, V. & Legrenzi, M. S. (2000), 'Illusions in reasoning about consistency', Science 288, 531–532.

Johnson-Laird, P. & Savary, F. (1995), How to make the impossible seem probable, in 'Proceedings of the 17th Annual Conference of the Cognitive Science Society', Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 381–384.

Nilsson, N. (1991), 'Logic and Artificial Intelligence', Artificial Intelligence 47, 31–56.

Rinella, K., Bringsjord, S. & Yang, Y. (2001), Efficacious logic instruction: People are not irremediably poor deductive reasoners, in J. D. Moore & K. Stenning, eds, 'Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society', Lawrence Erlbaum Associates, Mahwah, NJ, pp. 851–856.

Stanovich, K. E. & West, R. F. (2000), 'Individual differences in reasoning: Implications for the rationality debate', Behavioral and Brain Sciences 23(5), 645–665.

Wason, P. (1966), Reasoning, in 'New Horizons in Psychology', Penguin, Harmondsworth, UK.