PROCEEDINGS of the HUMAN FACTORS AND ERGONOMICS SOCIETY 50th ANNUAL MEETING—2006
APPLICATION OF FUZZY SIGNAL DETECTION THEORY TO THE DISCRIMINATION OF MORPHED TANK IMAGES
J.L. Szalma (1), T. Oron-Gilad (2), B. Saxton (1), and P.A. Hancock (1)
(1) University of Central Florida, Orlando, Florida
(2) Ben-Gurion University, Beer-Sheva, Israel
The effect of response set size on performance in a detection task was evaluated using both fuzzy and traditional signal detection theory. Fuzzy categories of stimuli were created using morphing software to blend profile images of American (M1A1) and Iraqi (T55) tanks to different degrees. These combinations were used to create static images varying from 100% T55 to 0% T55 (100% M1A1). Participants were asked to indicate the degree to which each image did not resemble an American tank. Consistent with previous research, results indicated that the FSDT model conforms to the normality assumption of traditional SDT. In addition, forcing observers to make binary decisions impaired performance relative to multi-category response sets in the FSDT analysis but not the traditional analysis. However, there were more model convergence failures in the FSDT analysis relative to the traditional analysis, mostly associated with conditions in which there were 100 response categories.
INTRODUCTION
Combat identification refers to the means by which military units distinguish friend from foe during operations. Accuracy in combat identification improves military effectiveness in two ways. First, it enhances the ability to successfully engage the enemy; second, and just as important, it minimizes the risk of fratricide, which is the accidental destruction of friendly and/or allied forces. Fratricide is a ubiquitous risk in warfare, accounting for up to 10 to 15% of friendly casualties during operations. In land-land operations some estimates have suggested figures closer to 40% (Bourn, 2002). In the first Persian Gulf conflict in Iraq, for example, American forces accidentally destroyed nearly 80% (27 of 35) of the U.S. M1 Abrams and Bradley Fighting Vehicles lost in combat (Briggs & Goldberg, 1995). Attaining an accurate, high-confidence characterization of detected objects in combat environments is very difficult and requires timely application of tactical options.
Signal Detection Theory (SDT) is an extremely useful and powerful analytical tool for evaluating human and machine performance in situations that require detection and/or discrimination. However, traditional SDT artificially forces events into crisp, mutually exclusive categories (e.g., either black or white, you are either with us or against us), which may not be accurate representations of the true states of the world. In many instances events possess properties of both signal and non-signal, at least to some varying degree. Indeed, it is such blending that leads to uncertainty in decision making in many operational settings.
Parasuraman, Masalonis, and Hancock (2000; see also Hancock, Masalonis, & Parasuraman, 2000) have addressed this uncertainty by combining elements of SDT with those of Fuzzy Set Theory, in which category membership is not considered mutually exclusive and stimuli and responses to them can therefore be simultaneously assigned to more than one category. Thus, in Fuzzy SDT (FSDT) a given stimulus may be categorized as both a correct detection and a false alarm depending on the relative degrees of signal-like versus non-signal-like properties. Formally, in FSDT a stimulus dimension, s, and a response dimension, r, are defined from 0 to 1, with 0 representing full membership in the category ‘non-signal’ (or ‘no’ response) and 1 representing full membership in the category ‘signal’ (or ‘yes’ response). For instance, a value of s=.5 indicates that the stimulus possesses characteristics of signal and non-signal to equal degrees, and r=.5 indicates maximal uncertainty in responding (e.g., “not sure”). Each dimension is related to real-world stimulus and response properties via a mapping function, which is the key to successful application of FSDT. For instance, a fuzzy signal of s=.5, a point of maximal uncertainty along the fuzzy dimension, must be defined by a particular characteristic or set of characteristics of the real-world variable (e.g., the percentage of a morphed image that is derived from a T55 Iraqi tank; for more detail on mapping functions, see Parasuraman et al., 2000).
Initial investigations comparing FSDT and standard SDT indicated that FSDT analysis generally reduces the false alarm rate and inflates the hit rate (Masalonis & Parasuraman, 2003; Murphy, Szalma, & Hancock, 2004). In addition, there is evidence that FSDT meets the Gaussian assumptions of traditional SDT and that fuzzy ROC functions are of the same general form as traditional or crisp ROCs (Murphy et al., 2004). The current study was designed to determine whether these ROC results extend to more complex stimuli that simulate real-world objects, and to examine the effect of response set size on fuzzy ROCs. Thus, a 5-category condition was compared to a 100-category condition and a binary response set. Although the latter is not truly a fuzzy response, Parasuraman et al. (2000) noted that there is no constraint that both the s and r dimensions be fuzzy. Indeed, examination of binary response to fuzzy stimuli is critical for evaluating performance for domains in which discrete yes/no decisions are required.

EXPERIMENTAL METHOD
Three levels of response bias instruction set (conservative, unbiased, and lenient) were factorially combined with three response sets (2, 5, or 100 categories) to yield 9 experimental conditions. Response bias was manipulated using a point system as a payoff matrix (see Macmillan & Creelman, 2005). All participants experienced all conditions. The order of conditions was counterbalanced separately for each factor across participants, with the constraint that in any given session observers experienced only one bias condition. This was done to avoid problems associated with shifting criteria within a session. Prior to these experimental conditions participants engaged in a practice session in which they experienced each response set without bias instructions.
Six students from the University of Central Florida (3 men and 3 women) participated in this study for monetary compensation.
The stimuli for this experiment consisted of profile images of an M1A1 (US) tank and a T55 (Iraqi) tank. These images were combined using morphing software, so that as the two images are blended, they assume characteristics of both tanks. Thirty blended images were selected for presentation, ranging from 38% T55 to 60% T55. The selection of this range was based on work in which the psychophysical function of these stimuli was assessed using the method of limits (Oron-Gilad, Morgan, & Hancock, under revision). Images beyond 60% T55 were almost always labeled as ‘Iraqi’ and images below 38% were almost always labeled ‘American.’ Examples of the two extremes and a morphing of 50% are shown in Figure 1. In each condition observers were presented with 480 static images, with each of the 30 images presented 16 times. On each trial observers were asked to indicate the degree to which the image appeared to be an Iraqi tank.
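To make the mapping of stimuli and responses onto the fuzzy dimensions s and r concrete, the following Python sketch assigns each trial a stimulus membership and a response membership and then computes fuzzy outcome memberships and rates. It is an illustration under stated assumptions, not the analysis code used in this study: the linear mapping of percent T55 onto s over the presented range (38% to 60%) and the evenly spaced mapping of response categories onto r are hypothetical choices, and all function names are invented for the example. The outcome memberships use the min/max form commonly attributed to Parasuraman et al. (2000).

```python
def stimulus_membership(pct_t55, lo=38.0, hi=60.0):
    """Hypothetical linear mapping of percent T55 onto s in [0, 1].

    Assumption: the lowest presented morph (38% T55) maps to s = 0 and
    the highest (60% T55) maps to s = 1; the source does not state this.
    """
    return min(1.0, max(0.0, (pct_t55 - lo) / (hi - lo)))


def response_membership(category, n_categories):
    """Hypothetical evenly spaced mapping of category 1..n onto r in [0, 1]."""
    return (category - 1) / (n_categories - 1)


def fuzzy_outcomes(s, r):
    """Degree of membership of one trial in each of the four outcomes."""
    return {
        "hit": min(s, r),
        "miss": max(s - r, 0.0),
        "false_alarm": max(r - s, 0.0),
        "correct_rejection": min(1.0 - s, 1.0 - r),
    }


def fuzzy_rates(trials):
    """Fuzzy hit and false alarm rates for an iterable of (s, r) pairs.

    Assumes the trials contain some signal and some non-signal membership,
    so that neither denominator is zero.
    """
    trials = list(trials)
    hit_sum = sum(fuzzy_outcomes(s, r)["hit"] for s, r in trials)
    fa_sum = sum(fuzzy_outcomes(s, r)["false_alarm"] for s, r in trials)
    signal_sum = sum(s for s, _ in trials)
    noise_sum = sum(1.0 - s for s, _ in trials)
    return hit_sum / signal_sum, fa_sum / noise_sum
```

In this formulation the hit and miss memberships of a trial sum to s, and the false alarm and correct rejection memberships sum to 1 − s, so a single trial can contribute partly to the hit rate and partly to the false alarm rate.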
Figure 1. Examples of the stimuli used in this experiment (image not reproduced; panels labeled 38%, 50%, and 60%). The tank on the far left is an American M1A1, and the tank on the right is an Iraqi T55. The middle image is the 50/50 morphed combination of the two tanks.

RESULTS
Current algorithms for estimation of ROC curves require provision of frequencies of response. However, these programs do not permit application of fractional frequencies such as those associated with an FSDT analysis. Due to this constraint, ROC functions were estimated by rounding the frequencies to the nearest whole number and entering them into FitRoc: Parameter Estimation for Gaussian Signal Detection Model, version 2.1 (Wickens, 2002), a program which provides (1) a test of how well an ROC curve fits the Gaussian model (χ²), (2) an estimate of perceptual sensitivity (Az), and (3) an estimate of response bias (βlog). Additionally, the intercept and slope for the z-score form of the ROC are estimated by the program ($z_H = a + b\,z_F$). Note that methods for determining SDT parameters are the same for both fuzzy and traditional analyses. The two models differ in how hit and false alarm rates are calculated. The formulas for calculating FSDT measures were obtained from Parasuraman et al. (2000), and the method for computing crisp hit and false alarm rates followed procedures outlined by Macmillan and Creelman (2005). For the latter analysis the stimulus and response dimensions were collapsed into two categories in order to facilitate direct comparison of the traditional and fuzzy SDT models. We also compared response set conditions using a z test as described in Wickens (2002). Tables 1 and 2 report goodness of fit, intercept (i.e., ‘a’), slope (i.e., ‘b’), and perceptual sensitivity estimates for the fuzzy and traditional SDT methods, respectively.

Fuzzy SDT Analysis
Binary Response Set. Without exception, the data for every observer met the Gaussian equal variance assumptions of SDT.
5 Category Response Set. The equal variance Gaussian model fit for observers 2, 3, and 6. The unequal variance model fit for observer 5. Neither model fit for observers 1 and 4 (see Table 1). Interestingly, observer 1 only used the two extreme responses (1 and 5) for the lenient and conservative conditions. Although observer 5 did use other categories, the distribution of responses was bimodal, with the majority of responses at the extremes. Thus, the failure of the data of these two individuals to meet the equal variance assumption may be in part due to response strategy.
100 Category Response Set. The data of two observers, 2 and 3, met the equal variance assumption. The unequal variance model fit the data of observer 6. However, the unequal variance model produced an extremely steep slope, a very unusual outcome, given that slopes generally vary from .5 to 2 (Swets, 1996). Neither model fit for observers 1, 4, and 5. However, observers 1 and 5 primarily used extreme categories in responding, as in the 5-category response condition.
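The relation between the z-ROC parameters and the sensitivity estimate can be checked numerically. The sketch below is a minimal illustration, assuming the standard Gaussian-model relation Az = Φ(a / √(1 + b²)), where Φ is the standard normal distribution function; it is not FitRoc itself, and the function names are hypothetical. With the binary-response values for observer 1 in Table 1 (a = 1.126, b = 1.00), this relation gives approximately .787, in line with the tabled A(z).

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_hit(z_false_alarm, a, b):
    """z-score form of the ROC: z_H = a + b * z_F."""
    return a + b * z_false_alarm


def area_under_roc(a, b):
    """Area under the Gaussian ROC from intercept a and slope b.

    Assumption: the standard relation Az = Phi(a / sqrt(1 + b^2)).
    """
    return _STD_NORMAL.cdf(a / sqrt(1.0 + b * b))


def roc_point(false_alarm_rate, a, b):
    """Predicted hit rate at a given false alarm rate (0 < rate < 1)."""
    z_f = _STD_NORMAL.inv_cdf(false_alarm_rate)
    return _STD_NORMAL.cdf(z_hit(z_f, a, b))
```

The same relation can be applied to unequal variance fits, where the slope b differs from 1.00, which is why a very steep slope can coexist with a moderate A(z).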
Comparison of Response Sets: Fuzzy SDT Analysis
2 category vs. 5 category response. For those observers whose data fit the Gaussian SDT model, the ROCs were compared for these two conditions. In the case of observer 5, whose data met the unequal variance assumption, the unequal variance model for the 2 category condition was used for comparison. In each case the ROCs for the two conditions were significantly different, with the 5-category response set producing the higher sensitivity. Thus, constraining response sets impairs detection performance.
2 category vs. 100 category response. For the three observers for whom comparisons could be made, in each case the ROCs were significantly different, with the 100 category response set yielding higher sensitivity.
5 category vs. 100 category response. For the three observers for whom comparisons could be made, a significant difference in ROCs was observed for observer 2, with the 100 category condition yielding the higher sensitivity. In the other two cases, the two conditions did not differ in sensitivity. Indeed, for observer 3 the points were clustered close together in ROC space.
In sum, binary response sets seem to constrain performance as evaluated by FSDT. The number of multiple categories (5 vs. 100) exerts a weaker effect, showing differences for only one person. For that individual, providing more response categories improves detection, presumably because it allows a more precise linkage of response to the degree of signalness observed. However, three observers chose the response strategy of using primarily extreme categories and therefore did not utilize the full range of possible responses.

Table 1. Goodness of fit, sensitivity, and response bias statistics calculated using fuzzy SDT.

Response set   Participant   χ²       A(z)   a        b        Conservative βlog   Unbiased βlog   Lenient βlog
Binary         1             .256e    .787   1.126    1.00     .037                -.007           -.212
Binary         2             .177e    .790   1.139    1.00     .749                .302            .106
Binary         3             .020e    .793   1.157    1.00     .023                -.075           -.186
Binary         4             .208e    .785   1.117    1.00     -.073               -.204           -.656
Binary         5             .398e    .780   1.091    1.00     -.076               -.272           -.690
Binary         6             2.517e   .746   .938     1.00     -.238               -.185           -.587
5 Category     1             cf
5 Category     2             .964e    .910   1.895    1.00     3.616               -.640           1.826
5 Category     3             .449e    .940   2.201    1.00     -.394               -.131           -.454
5 Category     4             cf
5 Category     5             .126u    .923   1.426    .053     -3.485              -3.740          -3.932
5 Category     6             2.823e   .917   1.954    1.00     -.231               -.103           -1.445
100 Category   1             cf
100 Category   2             .734e    .942   2.224    1.00     3.203               .607            .941
100 Category   3             1.045e   .950   2.329    1.00     -.464               -.104           -.698
100 Category   4             cf
100 Category   5             cf
100 Category   6             .224u    .860   15.023   13.896   2.255               2.668           1.872

u = data fits the unequal variance model; e = data fits the equal variance model; cf = convergence failure.

Traditional (Crisp) SDT Analysis
Arguably the most direct comparison of a traditional SDT model to the FSDT analysis described above is to reduce the number of stimulus and response categories to two. Thus, an analysis was computed in which a response of 3 or lower was categorized as a ‘no,’ and a similar analysis in which a response of 3 or higher was considered a ‘yes.’ These analyses differ only in whether the middle category is categorized as ‘yes’ or ‘no.’
Binary Response Set. The Gaussian assumption was met for the data of each participant, and with the exception of participant 6, the equal variance assumption also held.
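The two crisp codings of the 5-category data can be sketched as follows. This is an illustration only: the cutoff used to collapse the stimulus dimension into ‘signal’ and ‘non-signal’ is not stated above, so the `signal_threshold` of 0.5 on s is an assumption, and the function names are hypothetical.

```python
def collapse_response(category, middle_as_yes, middle=3):
    """Collapse a 5-category response (1..5) into a crisp yes (True) / no (False).

    Categories above the middle are 'yes', categories below it are 'no',
    and the middle category follows the chosen coding.
    """
    if category == middle:
        return middle_as_yes
    return category > middle


def crisp_rates(trials, middle_as_yes, signal_threshold=0.5):
    """Crisp hit and false alarm rates from (s, category) pairs.

    Assumption: a stimulus counts as 'signal' when its membership s exceeds
    signal_threshold; the source does not specify the cutoff. The trials are
    assumed to contain at least one signal and one non-signal stimulus.
    """
    hits = misses = false_alarms = correct_rejections = 0
    for s, category in trials:
        said_yes = collapse_response(category, middle_as_yes)
        if s > signal_threshold:
            if said_yes:
                hits += 1
            else:
                misses += 1
        else:
            if said_yes:
                false_alarms += 1
            else:
                correct_rejections += 1
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, false_alarm_rate
```

Running the same trials through both codings shows how a single recoding decision about the middle category can move both the hit rate and the false alarm rate, which is one reason the two crisp analyses are reported separately.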
5-Category Response Set: Middle category as ‘yes.’ The Gaussian equal variance model fit for observers 1, 2, 4, and 5. The unequal variance model fit for observer 3, and neither model fit for observer 6.
5-Category Response Set: Middle category as ‘no.’ The Gaussian equal variance model fit for all participants except observer 6, for whom the unequal variance model fit.
100-Category Response Set. The Gaussian equal variance model fit for observers 1 and 4, and the unequal variance model fit for observers 2, 3, and 6. Neither model fit for observer 5.

Comparison of Response Sets: Traditional SDT Analysis
Pairwise comparisons of response set conditions were computed using the z test described in Wickens (2002). Comparisons were made only when the Gaussian model fit for each condition.
2 category vs. 5 category (middle category as ‘yes’). Comparisons were possible for all observers except number 6. Results were mixed. For observers 1 and 3 there were no significant differences between these conditions. For observer 2 the binary response set yielded significantly greater sensitivity than the 5 category set. However, the opposite was true for observer 4 (see Table 2).
2 category vs. 5 category (middle category as ‘no’). Results for this comparison were the same as those observed when the middle response category was categorized as ‘yes,’ with the exception that in this case the binary response set produced higher sensitivity than the 5-category set for observer 3 as well as for observer 2. Comparison for observer 6 indicated no significant differences between conditions.
2 category vs. 100 category response. Comparisons of these conditions resulted in only one significant difference. The sensitivity of observer 1 was significantly greater in the 100 category condition relative to the binary response set.
5 category vs. 100 category response. Regardless of whether the middle response category was coded as a ‘yes’ or a ‘no,’ z tests revealed that for observers 1 and 2 the 100 category conditions produced significantly higher sensitivities relative to the 5 category condition. Comparisons for the other participants indicated no significant differences between these conditions.

DISCUSSION
The fuzzy analysis yielded similar results for the 5- and 100-category response set conditions. In the crisp analysis a similar pattern was observed with the exception of two participants, for whom the larger response set was related to higher sensitivities. However, there were differences between multiple category response and binary response set conditions for both analyses. In the fuzzy analysis performance was degraded by forcing a binary response. By contrast, results were mixed in the crisp analysis, with multi-category response sets producing greater sensitivities in some cases and the binary response set producing superior performance in other cases.
In a traditional analysis a response is a correct detection, correct rejection, or an error of omission or commission, while in an FSDT analysis observers are awarded a degree of correctness in responding. Permitting multiple response categories and submitting the data to FSDT analysis capture these aspects of performance. That is, partial hits in fuzzy SDT are categorized as full hits or full misses in the crisp analysis, and partial false alarms are categorized in the traditional SDT analyses as full false alarms or correct rejections. The forced assignment of stimuli and responses to crisp categories may account for the inconsistencies in comparisons between conditions. It may be that the loss of information that results from such procedures biases the results in one direction or the other depending on how the participant’s responses are distributed. Recall that a few participants used only extreme response categories in the multiple response set conditions. The outcome of a crisp analysis may therefore be expected to depend in part on the response pattern of the individual regardless of that person’s perceptual sensitivity.
A central question for fuzzy SDT is whether the assumptions of traditional SDT hold for this new model. The current study confirms prior findings that, in general, the assumption of normally distributed noise and signal-plus-noise distributions holds (see χ² tests in Tables 1 and 2), and that these results extend to tasks in which the stimuli are complex visual images.
Future investigations will examine the effect of different distributions of stimuli (e.g., more signal-like than non-signal-like stimuli are presented), analogous to manipulations of signal rate in traditional SDT. In addition, future experiments will manipulate instructions regarding the definition of ‘signal’ and ‘non-signal’ to determine their effect not only on sensitivity but also on criterion setting.
Recent evidence indicates that the fuzzy nature of the stimulus dimension may make stable criterion setting difficult for observers, which could account for cases in which the equal variance assumption was not met (Szalma, Parasuraman, & Hancock, in preparation).

REFERENCES
Bourn, J. (2002). Ministry of Defence: Combat Identification. British National Audit Office Report, retrieved August 1st 2005 from www.nao.gov.uk.
Briggs, R.W., & Goldberg, J.H. (1995). Battlefield recognition of armored vehicles. Human Factors, 37(3), 596-610.
Hancock, P.A., Masalonis, A.J., & Parasuraman, R. (2000). On the theory of fuzzy signal detection: Theoretical and practical considerations. Theoretical Issues in Ergonomics Science, 1, 207-230.
Macmillan, N.A., & Creelman, C.D. (2005). Detection theory: A user’s guide, 2nd ed. Mahwah, NJ: Erlbaum.
Masalonis, A.J., & Parasuraman, R. (2003). Fuzzy signal detection theory: Analysis of human and machine performance in air traffic control, and analytic considerations. Ergonomics, 46, 1045-1074.
Murphy, L., Szalma, J.L., & Hancock, P.A. (2004). Comparison of fuzzy signal detection and traditional signal detection theory: Analysis of duration discrimination of brief light flashes. Proceedings of the Human Factors and Ergonomics Society, 48, 2494-2498.
Oron-Gilad, T., Morgan, J.F., & Hancock, P.A. (under revision). Categorical perception of morphed objects.
Parasuraman, R., Masalonis, A.J., & Hancock, P.A. (2000). Fuzzy signal detection theory: Basic postulates and formulas for analyzing human and machine performance. Human Factors, 42, 636-659.
Swets, J.A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics. Mahwah, NJ: Erlbaum.
Szalma, J.L., Parasuraman, R., & Hancock, P.A. (in preparation). Fuzzy signal detection theory: The form of empirical fuzzy receiver operating characteristics.
Wickens, T.D. (2002). Elementary signal detection theory. Oxford University Press.
ACKNOWLEDGEMENTS
This work was supported by the Department of Defense Multidisciplinary University Research Initiative (MURI) program administered by the Army Research Office under Grant DAAD19-01-1-0621. P.A. Hancock, Principal Investigator. The views expressed in this work are those of the authors and do not necessarily reflect official Army policy. The authors wish to thank Dr. Sherry Tove, Dr. Elmar Schmeisser, and Dr. Mike Drillings for providing administrative and technical direction for the Grant.
Table 2. Goodness of fit, sensitivity, and response bias statistics calculated using traditional SDT.

Response set               Participant   χ²        A(z)   a       b       Conservative βlog   Unbiased βlog   Lenient βlog
Binary                     1             1.023e    .940   2.196   1.00    .113                -.022           -.635
Binary                     2             1.696e    .964   2.540   1.00    2.779               1.221           .452
Binary                     3             .563e     .959   2.461   1.00    .111                -.299           -.695
Binary                     4             .548e     .946   2.273   1.00    -.243               -.667           -2.046
Binary                     5             3.011e    .939   2.191   1.00    -.206               -.828           -2.019
Binary                     6             .651u     .805   2.997   3.342   .570                .826            -1.632
5 Category, 3 as ‘yes’     1             1.077e    .945   2.263   1.00    -.819               -.837           -.931
5 Category, 3 as ‘yes’     2             3.723e    .926   2.044   1.00    5.235               -1.226          2.267
5 Category, 3 as ‘yes’     3             .114u     .955   1.816   .389    -2.509              -2.091          -2.435
5 Category, 3 as ‘yes’     4             .683e     .975   2.783   1.00    -3.034              -2.883          -2.813
5 Category, 3 as ‘yes’     5             4.239e    .944   2.250   1.00    -1.729              -2.450          -2.217
5 Category, 3 as ‘yes’     6             cf
5 Category, 3 as ‘no’      1             1.077e    .945   2.263   1.00    -.819               -.837           -.931
5 Category, 3 as ‘no’      2             3.721e    .921   2.000   1.00    5.231               -.900           2.244
5 Category, 3 as ‘no’      3             .630e     .933   2.117   1.00    .922                .878            .605
5 Category, 3 as ‘no’      4             .217e     .972   2.703   1.00    -.194               -.082           -2.478
5 Category, 3 as ‘no’      5             2.372e    .947   2.285   1.00    .524                .075            -1.241
5 Category, 3 as ‘no’      6             2.599u    .838   6.724   6.749   2.093               2.245           1.133
100 Category               1             5.410e*   .969   2.634   1.00    -.478               -.187           -.620
100 Category               2             .780u     .958   3.930   2.044   3.011               1.199           1.536
100 Category               3             .005u     .878   3.998   3.288   -.652               .692            .229
100 Category               4             2.425e    .964   2.542   1.00    -2.121              -2.125          -2.755
100 Category               5             5.437n    .943   1.795   .539    -.291               -1.317          -2.065
100 Category               6             1.709u    .863   6.757   6.093   .729                1.708           -3.215

u = data fits the unequal variance model; e = data fits the equal variance model; n = neither model fit (values for the unequal variance model are shown); cf = convergence failure; * model fit was marginal (.05