Journal of Applied Psychology, 1982, Vol. 67, No. 6, 701-713

Copyright 1982 by the American Psychological Association, Inc. 0021-9010/82/6706-0701$00.75

An Evaluation of Polygraphers' Judgments: A Review From a Decision Theoretic Perspective

Gershon Ben-Shakhar, Israel Lieblich, and Maya Bar-Hillel
Hebrew University, Jerusalem, Israel

A critical discussion of the theory, practice, and evaluation process of the Control Questions Technique (CQT) is undertaken. Sources of possible contamination in the use of this technique are described. Data from seven field studies are reviewed and used to estimate the discriminability of the CQT in real-life situations. A signal detection model is applied to a reanalysis of these seven studies. An attempt is made to derive the value system of the polygraphers who participated in the seven studies. It is demonstrated that under an assumption of rationality, the examiners tend to value the detection of guilty suspects highly, even in the presence of a high risk of falsely classifying innocent suspects as deceptive. An index of usefulness of the CQT-based evaluation system is defined, and the CQT's range of usefulness is examined as a function of operating costs, possible payoff ratios, and prior probabilities of guilt. The results of these analyses indicate that it is unlikely that the CQT method of lie detection would be useful for determining guilt in a court of law, or even for preemployment screening, though it might be useful for police investigative procedures.

This study was supported by the Eshkol Institute of the Faculty of Social Sciences at Hebrew University, Jerusalem. We wish to thank Michal Beller, Irving Biederman, and Itamar Gati for their comments, and Yaffa Goldberg for her expert programming. Requests for reprints should be sent to Gershon Ben-Shakhar, Department of Psychology, Hebrew University, Jerusalem, 91905, Israel.

Recently we have been witnessing a debate over the Control Questions Technique (e.g., Backster, 1963; Reid & Inbau, 1966) in the detection of deception (Lykken, 1974, 1978, 1979; Podlesny & Raskin, 1977; Raskin, 1978; Raskin & Podlesny, 1979). This debate has focused mainly on the validity of this technique. The principal debaters, Lykken and Raskin, have cited different studies to support their respective positions. The present article attempts to integrate the results of several of these studies and to evaluate them from a decision-theoretic perspective in hope of clarifying the issues. Additionally, such a perspective could differentiate between the various contexts in which the polygraph is frequently used that are characterized by different utility structures.

Briefly, the Control Questions Technique (CQT) utilizes two types of questions for interrogation: questions deemed relevant, since they pertain to the offense under investigation, and questions that serve as controls. The latter are specially formulated for each suspect in a pretest interview and, though unrelated to the crime, are such that the suspect is very likely to be deceptive or concerned about them. A typical relevant question in a theft case might be: "Did you steal a diamond ring from Jones's safe last night?" A typical control question might ask: "Have you ever stolen anything from someone who trusted you?"

During the interrogation, certain physiological changes are continuously recorded, and at the end, the physiological responses to the relevant questions are compared with those given to the control questions. Typically, a suspect is classified as deceptive if he or she tends to react more to the relevant questions, and as not deceptive if he or she tends to react more to the control questions. When there is no clear tendency in either direction, the result of the test is labeled inconclusive.

The rationale behind this method is based on the assumption that under proper pretesting conditions, control questions can be found about which an innocent suspect would be more concerned than he or she would be about the relevant questions, and so would produce greater physiological responses to them than to the relevant questions. At the same time, the control questions must be such that a guilty suspect would be less responsive to them than to the relevant questions (Podlesny & Raskin, 1977).

This assumption has been criticized by Lykken (1979). Lykken doubts whether the control questions really function as controls in the usual scientific sense of the word. That is, he doubts whether the response of an innocent suspect to a control question yields a reasonable estimate of how a guilty suspect would respond to a relevant question. In general, Lykken expects most suspects in real-life interrogations to be more concerned about the relevant questions than about the control questions, whether they are guilty or innocent. In standard applications of the CQT, the evaluation procedure treats the possible patterns of responding symmetrically. That is, if a given difference between control responses and relevant responses is sufficient to establish an assessment of the suspect as innocent, then the same difference between relevant responses and control responses is sufficient to establish an assessment of him or her as guilty, and vice versa. If Lykken's critique is correct, it would mean that most innocent suspects should "fail" the lie detection test.

The CQT can also be evaluated on pragmatic or empirical grounds. Podlesny and Raskin (1977) estimated that between 88% and 96% of suspects tested by the CQT are correctly classified. Lykken (1979), on the other hand, claimed that when the polygraph charts are scored blindly, without the benefit of clinical impressions of the suspects or access to any other evidence about them, the CQT classifies only 64% to 71% of the suspects accurately, against a chance expectation of 50%. Moreover, Lykken claimed, at least half of the truthful suspects will be erroneously classified as deceptive, so that the CQT is actually biased against them.
In summary, three issues form the core of the Lykken-Raskin debate: (a) the psychological and logical soundness of the theory on which the CQT is based; (b) the empirical value of using physiological recordings obtained by the CQT for the detection of deception in real-life interrogations; and (c) the relationship between the size of the two types of errors that might result from decisions based on the CQT. The present article relates especially to the last two issues. It utilizes a signal detection model in order to shed light on the empirical dispute.

Current estimates of the CQT's accuracy were derived on the basis of arbitrary decision rules rather than optimal ones. That is, no consideration had been given to the essential components of the decision problem. These components are the prior probability that an interrogated suspect is guilty and the payoffs associated with the possible outcomes of the given decision. Lieblich, Ben-Shakhar, Kugelmass, & Cohen (1978) demonstrated how these factors might be combined to produce an optimal decision rule within the context of detection of deception.

The present study utilizes data from seven field studies to extrapolate an index of efficiency of classification that is independent of any specific decision rule. On the basis of this extrapolation, Lykken's predicted asymmetry of the distribution of errors in real-life interrogations will be reconsidered. In addition, an attempt will be made to extract the payoff component on the basis of which decisions had been made in the available field studies and to assess the range of usefulness of the CQT for real-life applications.

Reanalysis of Seven Field Studies

A Summary of Methodological Problems in Field Studies of Lie Detection

Two types of criterion are commonly employed in polygraph validation studies of the CQT: (a) the consensual judgment of a panel of legal experts who are privy to all the information gathered about the case, except for the polygraph results; this was the criterion employed by Barland & Raskin (Note 1) and Bersh (1969); and (b) confession of guilt; this was the criterion employed by Horvath (1977), Horvath & Reid (1971), Hunter & Ash (1973), Slowik & Buckley (1975), and Wicklander & Hunter (1975).
Such criteria must be resorted to since in real life, the truth concerning the suspects' guilt or innocence is rarely known (see, however, Ginton, Daie, Elaad, & Ben-Shakhar, 1982). But these criteria suffer from clear and major drawbacks. Panel decisions are not necessarily accurate and may be even less valid than the polygraphers' evaluations. As to reliance on confessions, this may bias the sampling of cases studied. The manner in which this can come about is elaborated on in the Discussion section.

Another methodological problem concerns the fact that in practice, polygraphers' judgments are rarely based on an objective quantification of the physiological recordings. Typically, all the data available to the experts is integrated intuitively. Szucko & Kleinmuntz (1981) have recently analyzed the judgments of polygraph experts based on an ad hoc assessment of physiological charts. They concluded that the experts' judgments bore little resemblance to the actual content of the physiological data. It should therefore be clear that in the subsequent discussion of the CQT, it is the totality of the cognitive activity of the polygraph experts that we are referring to, rather than to some objective usage of physiological data.

A third methodological problem concerns the possible contamination of the information at the interrogator's disposal. The physiological data typically comes in the company of other information, such as the intelligence gathered about the suspect, the impressions of former interrogators, records of previous convictions, observations of the suspect's behavior during the pretest interview, and the actual polygraph test. Indeed, Barland (1975) has found a high rate of agreement between polygraph examiners' final evaluations and those solicited prior to the actual polygraph test. This factor must be borne in mind whenever interpreting the results of a polygraph validation study, since it implies that the observed validity may not be due exclusively to the physiological data.

A more subtle source of contamination is the "experimenter expectancy effect" (Rosenthal & Rubin, 1978).
The expectations that the polygraph examiner forms, based on any of the above mentioned cues, might affect the suspect in ways that will manifest themselves later in his or her physiological records. This source of contamination is inherent to the CQT, since the CQT is essentially a clinical method that calls for an intense interaction between the interrogator and the suspect.

Finally, the criterion itself might be contaminated directly or indirectly by the polygraph test, as when a suspect who had failed the test is thereby pressured to confess, thus yielding the criterion. Even a panel is vulnerable to this kind of contamination, since it often utilizes some of the same information that the polygrapher used in his or her evaluation.

The Data Base

The present study was limited to field studies only. There exist enough fundamental differences between the simulated laboratory setup (e.g., Raskin & Hare, 1978) and real-life criminal investigations (e.g., Lykken, 1979) to make these two incomparable. Hence we shall deal only with the following studies: Barland and Raskin (Note 1), Bersh (1969), Horvath (1977), Horvath and Reid (1971), Hunter and Ash (1973), Slowik and Buckley (1975), and Wicklander and Hunter (1975). Table 1 presents a summary of the major results of these studies. We classified them according to the type of criterion used, and the combined results within the two combined categories are presented in Table 1 as well.

Table 1
A Summary of the Major Results of Seven Field Validation Studies and Combinations Thereof

Panel of experts criterion [a]

Result                                       Barland & Raskin   Bersh      Combined
                                             (Note 1) [c]       (1969)
Method of chart evaluation                   field              visual     mixed
Number of independent evaluators             1                  1          1
Effective sample size                        92                 243        335
Number of innocents                          17                 112        129
Proportion of innocents classified as [f]:
  Nondeceptive (correct rejection rate)      .294               .893       .814
  Deceptive (false alarm rate)               .353               .107       .140
  Inconclusive                               .353               --         .047
Number of guilty                             47                 104        151
Proportion of guilty classified as [g]:
  Nondeceptive (miss rate)                   .021               .144       .106
  Deceptive (hit rate)                       .830               .856       .848
  Inconclusive                               .149               --         .046
Number of inconclusive by criterion          28                 27 [h]     55

Confessions criterion [b]

Result                                       Horvath    Horvath &     Hunter &     Slowik &     Wicklander &   Combined
                                             (1977)     Reid (1971)   Ash          Buckley      Hunter
                                                                      (1973) [d]   (1975) [e]   (1975) [d]
Method of chart evaluation                   visual     visual        visual       visual       visual         visual
Number of independent evaluators             10         10            7            7            6              6-10
Effective sample size                        56         40            20           30           20             166
Number of innocents                          28         20            10           15           10             83
Proportion of innocents classified as [f]:
  Nondeceptive (correct rejection rate)      .511       .905          .857         .905         .867           .762
  Deceptive (false alarm rate)               .489       .095          .143         .067         .050           .223
  Inconclusive                               --         --            --           .029         .083           .015
Number of guilty                             28         20            10           15           10             83
Proportion of guilty classified as [g]:
  Nondeceptive (miss rate)                   .229       .150          .100         .152         .083           .163
  Deceptive (hit rate)                       .771       .850          .871         .838         .900           .830
  Inconclusive                               --         --            .029         .010         .017           .007
Number of inconclusive by criterion          0          0             0            0            0              0

[a] The panel criterion is based on a majority decision.
[b] The results of the confessions criterion are based on means of decisions for polygraph examiners.
[c] Based on Barland & Raskin, Table 13.
[d] Only the first analysis of the charts is reported.
[e] Only the analysis based on all three physiological charts is reported.
[f] These proportions are computed from the number of innocents times the number of evaluators. Thus, .294 represents 5 cases out of 17 x 1, whereas .511 represents 143 cases out of 28 x 10 = 280.
[g] These proportions are computed from the number of guilty times the number of evaluators.
[h] Eighty additional cases were excluded from the study by Bersh because their files contained insufficient information.

Derivation of the Receiver Operating Characteristic Curves

It is our impression that polygraphers often use decision rules that are arbitrary, if indeed they use any at all. Had other rules been employed in the surveyed studies, other rates might have been obtained for the various outcomes in Table 1. We shall, therefore, examine other rules to see their effects on these outcomes. In order to characterize the set of all possible decision rules for each study, we adopted a scheme that depends on the following assumptions:

1. Decisions are based on an underlying quantitative dimension.[1] Cutoff points are defined on this dimension, such that all values larger than some critical value result in a classification of the suspect as deceptive, and all values smaller than some critical value (that is allowed to be smaller) result in a classification of the suspect as not deceptive. Values lying between the two critical points result in an inconclusive classification.

2. The quantitative dimension is normally distributed in both the innocent and the guilty subpopulations and has the same variance in both.[2] The distance between the means of these two normal distributions can then be estimated for any study from its hit rate and false alarm rate (see Swets, Tanner, & Birdsall, 1961), as follows.

The hit rate corresponds to the area under the guilty curve from the larger cutoff point and upwards, whereas the false alarm rate corresponds to the area under the innocent curve from the same cutoff point upwards. Each of these two areas, in turn, can be translated into the distance, in standard units, between the cutoff point and the respective mean. From these two distances it is easy to derive the distance between the respective means themselves (d'). It is also possible to estimate d' from the smaller cutoff point. In this case, the miss rate corresponds to the area under the guilty curve from that point downwards, and the correct rejection rate corresponds to the area under the innocent curve from that point downwards. Our assumptions imply that these two estimates must coincide. We did not always obtain the same estimate from both cutoff points. However, since the discrepancy was always small and could be plausibly construed as measurement error, we averaged the two d' estimates to obtain a single one. The d' estimates obtained thusly appear in Table 2 for each study separately and for the combined-by-criterion categories.

[1] Backster (1963) developed an objective quantitative method for evaluating physiological charts. It calls for a comparison of the response to each relevant question with the response to the nearest control question, separately for each of the physiological channels. Each comparison is scored on a scale from -3 to 3, depending on the direction and the size of the difference, and the final score is the sum of the partial scores across all channels and all relevant questions.

[2] It is not necessary that the underlying dimension be continuous, since a similar analysis could be done with discrete categories. We chose to assume continuity here since physiological measures are usually treated as continuous variables.

Since, by assumption, d' completely defines the position of our guilty and innocent curves relative to each other, we can now examine the effects of utilizing other cutoff points on the same distributions; each cutoff point defines a hit and a false alarm rate. The function relating them to each other can be specified, resulting in a receiver operating characteristic (ROC) curve (Lieblich, Kugelmass, & Ben-Shakhar, 1970; Swets et al., 1961). The area under each such ROC curve can be computed to measure the degree of separation between the guilty and the innocent distributions on a scale from 0 to 1. On this scale, .5 signifies a complete overlap of the two distributions, whereas 0 or 1 are obtained in the absence of any overlap (and hence are unobtainable in our case, since the two distributions are assumed to be normal). Moving from one point to another on the ROC curve indicates how the hit rate changes with a given change in the false alarm rate. The areas under the ROC curve estimated from the selected studies appear in Table 2.

Table 2
Estimates of d', the Area Under the ROC Curve A, and the Payoff Ratios

                                                 d' estimated from      Area    Payoff ratios
Study                         n     P(N)/P(SN)   X1     X2     both     A       e/-b    -d/a
Barland & Raskin (Note 1)     64    .36          1.49   1.33   1.41     .841    .41     1.89
Bersh (1969)                  243   1.08         2.30   2.30   2.30     .949    1.14    1.14
Horvath (1977)                56    1.00         .77    .77    .77      .707    .76     .76
Horvath & Reid (1971)         40    1.00         2.35   2.35   2.35     .952    1.37    1.37
Hunter & Ash (1973)           20    1.00         2.35   2.20   2.28     .947    .78     .94
Slowik & Buckley (1975)       30    1.00         2.34   2.49   2.42     .957    1.39    1.89
Wicklander & Hunter (1975)    20    1.00         2.50   2.92   2.71     .973    .70     1.69
Panel combined                280   .85          2.14   2.11   2.12     .934    .80     1.24
Confessions combined          166   1.00         1.69   1.71   1.70     .886    .80     .85

Note. ROC = receiver operating characteristic.

To test for the robustness of our estimated areas to violations of the normalcy assumption, we employed uniform and exponential distributions as well. Similar areas resulted. We did not test robustness in a bimodally distributed innocent population, though it is conceivable that X is bimodally distributed there (Lykken, Note 2), since a bimodal distribution that remains mathematically unspecified could result in an increase as well as a decrease of the estimates.

Estimation of the Implicit Payoff of the Polygraph Interrogators

The data provided in the selected studies allow the extraction of the value system of the polygraphers who were engaged in those studies, provided we assume that each one acted so as to maximize expected payoffs. This may not be a very plausible assumption, and we shall later reconsider it, but none of the cited studies offers any other kind of justification, or even discussion, of why the particular decision rule was used.

Let a, b, c, d, e, f denote the utilities associated with all possible outcomes in a polygraph classification problem with an inconclusive category, as in Table 3. Let SN and N denote the guilty and innocent populations, respectively, and let f_SN(X) and f_N(X) be their respective density functions for the quantitative dimension X. Let P(SN) be the prior probability of guilt, and P(N) = 1 - P(SN). The notion of a prior probability of guilt is a highly problematic one (e.g., Tribe, 1971), but here it merely refers to the probability that the polygrapher attaches to the suspect's guilt on the basis of everything he or she knows just prior to obtaining the physiological charts.

Table 3
A Schematic Payoff Matrix for a Polygraph Decision Problem With an Inconclusive Category

                    Polygrapher's classification
Criterion     Deceptive     Not deceptive     Inconclusive
              (X > X2)      (X < X1)          (X1 < X < X2)
Guilty        a             b                 c
Innocent      d             e                 f

Note. Entries can be any real numbers such that a > c > b and e > f > d.

The polygrapher's overall expected utility, U, is the sum of the products of the utility values in Table 3 with their respective probabilities (Wiggins, 1973, p. 258). Hence,

\[
\begin{aligned}
U ={}& aP(SN)P(X > X_2/SN) + bP(SN)P(X < X_1/SN) + cP(SN)P(X_1 < X < X_2/SN) \\
     & + dP(N)P(X > X_2/N) + eP(N)P(X < X_1/N) + fP(N)P(X_1 < X < X_2/N) \\
  ={}& (a-c)P(SN)P(X > X_2/SN) + (c-b)P(SN)P(X > X_1/SN) \\
     & + (d-f)P(N)P(X > X_2/N) + (f-e)P(N)P(X > X_1/N) + bP(SN) + eP(N) \\
  ={}& (a-c)P(SN)\int_{X_2}^{\infty} f_{SN}(x)\,dx + (c-b)P(SN)\int_{X_1}^{\infty} f_{SN}(x)\,dx \\
     & + (d-f)P(N)\int_{X_2}^{\infty} f_N(x)\,dx + (f-e)P(N)\int_{X_1}^{\infty} f_N(x)\,dx + bP(SN) + eP(N).
\end{aligned}
\tag{1}
\]

It can be shown by differentiation that U is maximal for X1 and X2 such that

\[
\frac{f_{SN}(X_1)}{f_N(X_1)} = \frac{(e-f)}{(c-b)}\,\frac{P(N)}{P(SN)}
\quad\text{and}\quad
\frac{f_{SN}(X_2)}{f_N(X_2)} = \frac{(f-d)}{(a-c)}\,\frac{P(N)}{P(SN)}.
\tag{2}
\]
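The differentiation step is not spelled out in the text. A brief sketch, reconstructed here from the integral form of Equation 1 rather than quoted from the original, and assuming the ordering a > c > b and e > f > d given in the note to Table 3, might run as follows:

```latex
\begin{align*}
\frac{\partial U}{\partial X_1}
  &= -(c-b)P(SN)\,f_{SN}(X_1) - (f-e)P(N)\,f_N(X_1) = 0
  &&\Longrightarrow\quad
  \frac{f_{SN}(X_1)}{f_N(X_1)} = \frac{(e-f)}{(c-b)}\,\frac{P(N)}{P(SN)}, \\
\frac{\partial U}{\partial X_2}
  &= -(a-c)P(SN)\,f_{SN}(X_2) - (d-f)P(N)\,f_N(X_2) = 0
  &&\Longrightarrow\quad
  \frac{f_{SN}(X_2)}{f_N(X_2)} = \frac{(f-d)}{(a-c)}\,\frac{P(N)}{P(SN)}.
\end{align*}
```

Each line uses the fact that the derivative of \int_X^\infty f(x)dx with respect to X is -f(X). Under the equal-variance normal assumption the likelihood ratio f_SN(X)/f_N(X) increases with X, so each derivative changes sign from positive to negative at its cutoff, which suggests that these stationary points are maxima.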

To simplify matters, let us assume that the value associated with an inconclusive classification is zero (c = f = 0), that the values associated with a correct classification are positive (a, e > 0), and that the values associated with an erroneous classification are negative (b, d < 0). In this case, Condition 2 reduces to

\[
\frac{f_{SN}(X_1)}{f_N(X_1)} = \frac{e}{-b}\,\frac{P(N)}{P(SN)} = \beta_1
\quad\text{and}\quad
\frac{f_{SN}(X_2)}{f_N(X_2)} = \frac{-d}{a}\,\frac{P(N)}{P(SN)} = \beta_2.
\tag{3}
\]

The likelihood ratio at X_i is f_SN(X_i)/f_N(X_i). Thus, the expected utility is maximized if all X values whose corresponding likelihood ratio is less than β1 entail a not deceptive classification, whereas those for whom it exceeds β2 entail a deceptive classification. In each study, P(N) and P(SN) are taken to be the base rate of innocent and guilty, respectively, and hence the payoff ratios e/-b and -d/a can be readily extracted from Condition 3. These payoff ratios appear in Table 2. The robustness of our payoff ratio estimates was tested using exponential instead of normal distributions. In all cases, this only caused the estimates to shrink further.

The payoff ratio e/-b measures the substitution rate between the benefit of correctly identifying an innocent suspect and the benefit of avoiding a misclassification of a guilty suspect. Thus, the larger e/-b is, the more valuable it is deemed to correctly detect the innocent for a given cost entailed by misclassifying the guilty. Similarly, -d/a gives the substitution rate between the benefit of avoiding a misclassification of an innocent suspect and that of correctly classifying a guilty one.
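The estimation steps described above (d' from the rates on either side of a cutoff, the area under the ROC curve, and a payoff ratio from Condition 3) can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' program: the function names are hypothetical, the innocent mean is placed at 0 and the guilty mean at d' with unit variance, and the cutoff used for the payoff ratio is located from the innocent distribution, which is an assumption of the sketch. The input rates and group sizes are taken from Table 1.

```python
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def d_prime(upper_tail_guilty, upper_tail_innocent):
    """Distance between the guilty and innocent means, in standard units.

    Both arguments are areas above the same cutoff: the hit and false alarm
    rates for the upper cutoff, or (1 - miss rate) and
    (1 - correct rejection rate) for the lower cutoff.
    """
    z = STD_NORMAL.inv_cdf
    return z(upper_tail_guilty) - z(upper_tail_innocent)


def roc_area(d):
    """Area under the ROC curve for two equal-variance normal distributions."""
    return STD_NORMAL.cdf(d / sqrt(2))


def implied_payoff_ratio(upper_tail_innocent, d, prior_ratio):
    """Payoff ratio implied by a cutoff under Condition 3.

    The cutoff is located from the innocent distribution (an assumption of
    this sketch); prior_ratio is P(N)/P(SN).
    """
    cutoff = STD_NORMAL.inv_cdf(1 - upper_tail_innocent)
    likelihood_ratio = exp(d * cutoff - d * d / 2)  # f_SN(cutoff) / f_N(cutoff)
    return likelihood_ratio / prior_ratio


# Bersh (1969): a single cutoff, with 112 innocent and 104 guilty suspects.
bersh_d = d_prime(0.856, 0.107)
bersh_area = roc_area(bersh_d)
bersh_payoff = implied_payoff_ratio(0.107, bersh_d, 112 / 104)

# Barland & Raskin (Note 1): two cutoffs, so two estimates that are averaged.
barland_d_lower = d_prime(1 - 0.021, 1 - 0.294)
barland_d_upper = d_prime(0.830, 0.353)
barland_d = (barland_d_lower + barland_d_upper) / 2

print(round(bersh_d, 2), round(bersh_area, 3), round(bersh_payoff, 2))
print(round(barland_d_lower, 2), round(barland_d_upper, 2), round(barland_d, 2))
```

Under these assumptions the recomputed values fall close to the corresponding entries of Table 2. The payoff ratios for the two-cutoff studies are not attempted here, since the text does not state how the cutoffs were positioned once the two d' estimates had been averaged.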

The Usefulness of Suspect Classification by the CQT

In the previous section, we attempted to extract the value system (i.e., the payoffs) of the polygraph examiners under certain assumptions. Clearly, however, these payoffs do not necessarily represent all field situations. Furthermore, the limited set of prior probabilities used in the previous analysis also need not hold in all field situations. It would thus be of interest to consider a wider range of different payoff ratios and prior probabilities. We shall do so for a simplified situation in which no inconclusive classifications are allowed, that is, when a single cutoff point is used. Hence, only entries a, b, d, and e of Table 3 are relevant. The expected utility of the appropriate payoff matrix (derived as before) is:

\[
U = (a-b)P(SN)\int_{X_c}^{\infty} f_{SN}(x)\,dx - (e-d)P(N)\int_{X_c}^{\infty} f_N(x)\,dx + bP(SN) + eP(N).
\tag{4}
\]

It can be shown, by differentiation, that U is maximal for Xc that satisfies

\[
\frac{f_{SN}(X_c)}{f_N(X_c)} = \frac{(e-d)}{(a-b)}\,\frac{P(N)}{P(SN)} = \beta.
\tag{5}
\]

Given β, and considering our assumptions regarding the normality and the equal variance of f_SN(X) and f_N(X), the value of Xc can be extracted.[3] It may be the case that some combinations of payoff ratios and prior probabilities result in cutoff points that are so extreme as to be of little if any use. It seems, therefore, interesting to define an explicit index of usefulness for a CQT polygraph system. We now do so, in line with Cronbach and Gleser (1965). Assume that the cost of operating a CQT lie detection system is K units per test. This must be subtracted from the expected utility in Condition 4, so that

\[
U = (a-b)P(SN)\int_{X_c}^{\infty} f_{SN}(x)\,dx - (e-d)P(N)\int_{X_c}^{\infty} f_N(x)\,dx + bP(SN) + eP(N) - K.
\tag{6}
\]

[3] Divide f_SN(Xc) by f_N(Xc) and set equal to β, by Condition 5. By taking natural logarithms we obtain Xc = (ln β)/d' + d'/2, and the resulting hit and false alarm rates are

\[
\text{Hit Rate} = \int_{X_c}^{\infty} f_{SN}(x)\,dx
\quad\text{and}\quad
\text{False Alarm Rate} = \int_{X_c}^{\infty} f_N(x)\,dx.
\]

Obviously, a system is useful if its expected utility is larger than the utility that can be expected in its absence, that is, the utility of suspect classifications that are based only on the prior information, denoted U0. U0 is maximal when all suspects are classified into the same category, depending on β. If β > 1, it is optimal to classify all suspects as not deceptive; hence U0 = bP(SN) + eP(N). If β < 1, then it is optimal to classify all suspects as deceptive; hence U0 = aP(SN) + dP(N). The net gain from soliciting the polygraph examiner's expert opinion at the cost of K is:

\[
\Delta U = U - U_0 =
\begin{cases}
(a-b)P(SN)\displaystyle\int_{X_c}^{\infty} f_{SN}(x)\,dx - (e-d)P(N)\displaystyle\int_{X_c}^{\infty} f_N(x)\,dx - K & \text{if } \beta > 1, \\[2ex]
(e-d)P(N)\displaystyle\int_{-\infty}^{X_c} f_N(x)\,dx - (a-b)P(SN)\displaystyle\int_{-\infty}^{X_c} f_{SN}(x)\,dx - K & \text{if } \beta < 1.
\end{cases}
\tag{7}
\]

Dividing by (a - b)P(SN) we obtain: ΔU > 0 if and only if

\[
\begin{aligned}
&\int_{X_c}^{\infty} f_{SN}(x)\,dx - \beta\int_{X_c}^{\infty} f_N(x)\,dx > \frac{K}{a-b}\times\frac{1}{P(SN)} = \frac{K^*}{P(SN)} && \text{when } \beta > 1, \\
&\beta\int_{-\infty}^{X_c} f_N(x)\,dx - \int_{-\infty}^{X_c} f_{SN}(x)\,dx > \frac{K}{a-b}\times\frac{1}{P(SN)} = \frac{K^*}{P(SN)} && \text{when } \beta < 1.
\end{aligned}
\tag{8}
\]

K* is a kind of cost-benefit factor, defined by dividing the cost of each test by the gain from implicating rather than clearing a guilty suspect. To illustrate a numerical computation of ΔU, imagine a bank that wants to use the CQT to identify embezzlers among its employees. Suppose the cost, K, of each polygraph test is $500. In addition, suppose that the identification of an embezzler saves the bank $10,000, so a = $10,000, whereas an embezzler missed costs the bank $10,000, so b = -$10,000. Thus,

\[
K^* = \frac{K}{a-b} = \frac{500}{10{,}000 - (-10{,}000)} = .025.
\]

Suppose that the payoff ratio, r, is symmetrical, that is, (e - d)/(a - b) = 1, so that β = P(N)/P(SN). In this case, the lie detection test would be useful if either

\[
P(X > X_c/SN) - \beta P(X > X_c/N) > \frac{.025}{P(SN)}
\quad\text{or}\quad
\beta P(X < X_c/N) - P(X < X_c/SN) > \frac{.025}{P(SN)}.
\tag{9}
\]

In the panel criterion studies, for example, where d' = 2.12, Condition 9 holds whenever the probability of there being an embezzler in the bank is anywhere between .081 and .920. If d' is only 1.70, as it was in the confessions criterion studies, Condition 9 holds when that probability is between .118 and .883. For every cost-to-benefit ratio, K*, and every payoff ratio, r, it is possible to compute the range of prior probabilities of guilt for which the system is useful (i.e., ΔU > 0). The upper and lower bounds of this range as a function of r were computed for two K* values, .025 and .05, that seem to us fairly characteristic, and for two d' values, 2.12 and 1.70. They can be found in Table 4.
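The range of useful prior probabilities can be approximated numerically. The sketch below is illustrative only: it assumes innocent scores distributed as Normal(0, 1) and guilty scores as Normal(d', 1), places the cutoff at Xc = (ln β)/d' + d'/2 as in Footnote 3, and scans a grid of priors for those satisfying Condition 8. The function names and the grid resolution are hypothetical choices, not part of the original analysis.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def net_gain(p_guilty, d_prime, r=1.0, k_star=0.025):
    """Left side minus right side of Condition 8 (positive means useful).

    r is the payoff ratio (e - d)/(a - b); k_star is K/(a - b).
    """
    beta = r * (1 - p_guilty) / p_guilty
    cutoff = log(beta) / d_prime + d_prime / 2      # optimal cutoff Xc
    hit = 1 - STD_NORMAL.cdf(cutoff - d_prime)      # P(X > Xc / SN)
    false_alarm = 1 - STD_NORMAL.cdf(cutoff)        # P(X > Xc / N)
    if beta > 1:
        gain = hit - beta * false_alarm
    else:
        gain = beta * (1 - false_alarm) - (1 - hit)
    return gain - k_star / p_guilty


def useful_range(d_prime, r=1.0, k_star=0.025, steps=1000):
    """Smallest and largest prior on a grid of 1/steps with positive net gain."""
    useful = [i / steps for i in range(1, steps)
              if net_gain(i / steps, d_prime, r, k_star) > 0]
    return (min(useful), max(useful)) if useful else None


# The bank example: K = $500, a = $10,000, b = -$10,000.
bank_k_star = 500 / (10_000 - (-10_000))

panel_range = useful_range(2.12, k_star=bank_k_star)
confession_range = useful_range(1.70, k_star=bank_k_star)
print(bank_k_star, panel_range, confession_range)
```

With a grid step of .001 the bounds should agree with the ranges quoted in the text to within about one grid step. Passing other values of `r` or `k_star` gives a rough way to explore the kind of bounds that Table 4 tabulates.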

Discussion

The Discriminability of the CQT

It might seem from inspection of Tables 1 and 2 that overall, polygraphers employing the CQT discriminated between guilty and innocent suspects in the field, since the smallest d' value in the selected studies was .77 standard deviations. However, several difficulties necessitate the serious qualification of such a conclusion.

1. Four of the five largest d' values were obtained in studies conducted at the John Reid and Associates Laboratories, a commercial polygraph agency. In these studies, the verification process is not always sufficiently described, although it is crucial for a proper evaluation. Lykken (Note 2) raised the possibility that in all four studies conducted in the Reid Laboratories, the charts chosen for the blind scoring were selected precisely because the original examiner had scored them correctly, or because the experimenters thought, from an inspection of the charts, that they ought to be scored in accordance with the criterion. We would add

Table 4
Ranges of P(SN) for Which the Polygraph Test Is Useful, for Some Values of d', K*, and r

Panel criterion studies ( [...]

Conclusions

At the outset we wish to stress that this paper has analyzed the judgments rendered by polygraph investigators employing the Control Questions Technique. Other methods of polygraph interrogation, such as the Guilty Knowledge Technique (Lykken, 1959, 1960), were not touched upon in this article, nor were any objective properties of the purely physiological data.

The assessments of usefulness adopted here were based on the assumption that the issue under investigation is to be resolved exclusively by the polygrapher's evaluation. Moreover, the ultimate decision on this issue was assumed to actually coincide with the polygrapher's decision. Thus, in our examples above, if the polygrapher classified a suspect as "deceptive," the court, the police, or the employer, as the case may be, were taken to have classified him or her guilty, "hold for further investigation," or "reject," respectively. When used exclusively, we saw that [...] P(Guilt/"not deceptive"). The positive d' values suggest that this is so, but we have spelled out our reasons for suspecting them. (b) Polygraph evaluations must provide information that is not redundant with whatever other items of information are to be integrated with it. If those evaluations are contaminated by nonphysiological evidence, little is to be gained by combining the two.

At present, results of polygraph interrogations are typically phrased in terms of deceptive versus not deceptive, a terminology that is uncomfortably close to the one in which final verdicts, say in a court of law, are phrased, namely guilty versus not guilty. We believe that an important, indeed necessary, step towards putting polygraph information in its proper place would be for polygraphers to couch their professional opinions in more neutral and objective terms. They could be constrained, for example, to just reporting the average difference between responses to relevant versus control questions. This could have two beneficial effects:

1. Polygraphers would be seen in the role of chart producers, rather than that of decision makers. The decisions would be made by those empowered to make them (e.g., a court in legal debates; an employer in candidate selection). The charts would then be but a component in the decision.

2. Consequently, polygraphers would be relieved of the need to take prior probabilities of guilt and payoff considerations into account. Such considerations would be the province of the decision maker. We have already remarked that it is highly unlikely that polygraphers presently take such considerations into account, and it might well even prove a difficult thing to train them to do (Lieblich & Lieblich, 1969). But for information extractors, unlike for information evaluators, this is no longer a handicap.

Even in the more limited role suggested, we are not overly optimistic about the present worth of the CQT. For one, the appropriate weight to assign physiological evidence is difficult, if not impossible, to determine, and we fear that since the polygraph is commonly called a lie detector, its results would tend to be overweighted in investigations of deceit. For another, the decision makers would have to be instructed on the nature of the relationship between physiological indications and guilt or deception, a relationship which is highly controversial.

However, an alternative use of the polygraph comes to mind (Raskin & Podlesny, 1979). Studies of the CQT show the false negative rate to be considerably lower than the false positive rate. Hence, it could be used primarily for screening out innocent suspects. In other words, a suspect who "passed" the test might be released as most probably innocent, whereas one who "failed" would remain under suspicion, pending further investigation. The role of the polygraph findings would end there, however, and they would no longer be considered in the final integration of evidence. This would put the polygraph results in a status similar to, say, that of old-fashioned blood tests in paternity suits. Whereas a mismatch was considered proof of nonpaternity, a match was considered proof of nothing, and was therefore often not even admissible. Such use of diagnostic test results is not optimal from a Bayesian perspective, but since it is a safeguard against misuse of the polygraph, it may be worthwhile from a legal and social perspective.

Finally, we wish to point out that whereas the numerical figures on which our analysis was based may be open to modifications, the method employed itself is independent of the figures. As new and better validity studies become available, our quantitative conclusions can be updated accordingly.
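To make the screening argument concrete, the probability of guilt after a "passed" or "failed" test can be worked out with Bayes' rule. The sketch below is illustrative only: it uses the combined panel-criterion rates from Table 1, ignores inconclusive outcomes, and evaluates three prior probabilities of guilt that are hypothetical values chosen for the example. The function name is likewise hypothetical.

```python
def posterior_guilt(prior_guilt, rate_given_guilty, rate_given_innocent):
    """Bayes' rule: probability of guilt given a particular classification."""
    joint_guilty = prior_guilt * rate_given_guilty
    joint_innocent = (1 - prior_guilt) * rate_given_innocent
    return joint_guilty / (joint_guilty + joint_innocent)


# Combined panel-criterion rates from Table 1.
MISS_RATE, CORRECT_REJECTION_RATE = 0.106, 0.814
HIT_RATE, FALSE_ALARM_RATE = 0.848, 0.140

# Hypothetical priors, chosen only to show how the posterior shifts.
for prior in (0.1, 0.5, 0.9):
    after_pass = posterior_guilt(prior, MISS_RATE, CORRECT_REJECTION_RATE)
    after_fail = posterior_guilt(prior, HIT_RATE, FALSE_ALARM_RATE)
    print(f"prior={prior:.1f}  "
          f"P(guilt | passed)={after_pass:.3f}  "
          f"P(guilt | failed)={after_fail:.3f}")
```

The posterior depends heavily on the prior, which is one way of seeing why a fixed "deceptive" or "not deceptive" label carries different weight in different settings.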

Reference Notes

1. Barland, G. H., & Raskin, D. C. Validity and reliability of polygraph examination of criminal suspects (Report No. 76-1, U.S. Department of Justice Contract 75-N1-99-0001). Salt Lake City: University of Utah, 1976.

2. Lykken, D. T. Personal communication, September 21, 1981.

References

Backster, C. Polygraph professionalization through technique standardization. Law and Order, 1963, 11, 63-64.

Barland, G. H. Detection of deception in clinical suspects: A field evaluation study. Unpublished doctoral dissertation, University of Utah, 1975.

Bersh, P. J. A validation of polygraph examiner judgments. Journal of Applied Psychology, 1969, 53, 399-403.

Cronbach, L. J., & Gleser, G. C. Psychological tests and personnel decisions (2nd ed.). Urbana: University of Illinois Press, 1965.

Cullison, A. D. Probability analysis of judicial fact finding: A preliminary outline of the subjective approach. Toledo Law Review, 1969, 538-598.

Ginton, A., Daie, N., Elaad, E., & Ben-Shakhar, G. A method for evaluating the use of the polygraph in a real-life situation. Journal of Applied Psychology, 1982, 67, 131-137.

Horvath, F. S. The effects of selected variables on interpretation of polygraph records. Journal of Applied Psychology, 1977, 62, 127-136.

Horvath, F. S., & Reid, J. E. The reliability of polygraph examiner diagnoses of truth and deception. Journal of Criminal Law, Criminology and Police Science, 1971, 62, 276-281.

Hunter, F. L., & Ash, P. The accuracy and consistency of polygraph examiners' diagnoses. Journal of Police Science and Administration, 1973, 1, 370-375.

Lieblich, I., Ben-Shakhar, G., Kugelmass, S., & Cohen, Y. Decision theory approach to the problem of polygraph interrogation. Journal of Applied Psychology, 1978, 63, 489-498.

Lieblich, I., Kugelmass, S., & Ben-Shakhar, G. Efficiency of GSR detection of information as a function of stimulus set size. Psychophysiology, 1970, 6, 601-608.

Lieblich, I., & Lieblich, A. Effects of different payoff matrices on arithmetic estimation tasks: An attempt to produce rationality. Perceptual and Motor Skills, 1969, 29, 467-473.

Lykken, D. T. The GSR in the detection of guilt. Journal of Applied Psychology, 1959, 43, 385-388.

Lykken, D. T. The validity of the guilty knowledge technique: The effects of faking. Journal of Applied Psychology, 1960, 44, 258-262.

Lykken, D. T. Psychology and the lie detection industry. American Psychologist, 1974, 29, 725-739.

Lykken, D. T. The psychopath and the lie detector. Psychophysiology, 1978, 15, 137-142.

Lykken, D. T. The detection of deception. Psychological Bulletin, 1979, 86, 47-53.

Podlesny, J. A., & Raskin, D. C. Physiological measures and the detection of deception. Psychological Bulletin, 1977, 84, 782-799.

Raskin, D. C. Scientific assessment of the accuracy of deception: A reply to Lykken. Psychophysiology, 1978, 15, 143-147.

Raskin, D. C., & Hare, R. D. Psychopathy and the detection of deception in a prison population. Psychophysiology, 1978, 15, 126-136.

Raskin, D. C., & Podlesny, J. A. Truth and deception: A reply to Lykken. Psychological Bulletin, 1979, 86, 54-59.

Reid, J. E., & Inbau, F. E. Truth and deception: The polygraph ("lie detector") technique. Baltimore, Md.: Williams & Wilkins, 1966.

Rosenthal, R. R., & Rubin, D. B. Interpersonal expectancy effects: The first 345 studies. The Behavioral and Brain Sciences, 1978, 3, 377-415.

Simon, R. J., & Mahan, L. Quantifying burdens of proof: A view from the bench, the jury and the classroom. Law and Society Review, 1971, 5, 319-330.

Slowick, S. M., & Buckley, J. P. Relative accuracy of polygraph examiner diagnosis of respiration, blood pressure and GSR recordings. Journal of Police Science and Administration, 1975, 3, 305-309.

Swets, J. A., Tanner, W. P., & Birdsall, T. G. Decision processes in perception. Psychological Review, 1961, 68, 301-340.

Szucko, J. J., & Kleinmuntz, B. Statistical versus clinical lie detection. American Psychologist, 1981, 36, 488-496.

Tribe, L. Trial by mathematics: Precision and ritual in the legal process. Harvard Law Review, 1971, 84, 1329-1393.

Wicklander, D. E., & Hunter, F. L. The influence of auxiliary sources of information in polygraph diagnoses. Journal of Police Science and Administration, 1975, 3, 405-409.

Wiggins, J. S. Personality and prediction: Principles of personality assessment. Reading, Mass.: Addison-Wesley, 1973.

Received November 18, 1981
Revision received March 12, 1982
