Biol. Cybern. 87, 79–90 (2002) DOI 10.1007/s00422-002-0327-0 © Springer-Verlag 2002

Signal detection theory, detectability and stochastic resonance effects

Jakob Tougaard

Centre for Sound Communication, Institute of Biology, SDU/Odense University, Campusvej 55, 5230 Odense M, Denmark

Received: 3 April 2001 / Accepted in revised form: 8 March 2002

Abstract. Stochastic resonance is a phenomenon in which the performance of certain non-linear detectors can be enhanced by the addition of appropriate levels of random noise. Signal detection theory offers a powerful tool for analysing this type of system, through an ability to separate detection processes into reception and classification, with the former generally being linear and the latter always non-linear. Through appropriate measures of signal detectability it is possible to decide whether a local improvement in detection via stochastic resonance occurs due to the non-linear effects of the classification process. In this case, the addition of noise can never improve detection beyond that of a corresponding adaptive system. Signal detection and stochastic resonance are investigated in several integrate-and-fire neuron models. It is demonstrated that the stochastic resonance observed in spiking models is caused by non-linear properties of the spike-generation process itself. The true detectability of the signal, as seen by the receiver part of the spiking neuron (the integrator part), decreases monotonically with input noise level for all signal and noise intensities.

Correspondence to: J. Tougaard (Tel.: +45-65502222, Fax: +45-65930457, e-mail: [email protected])

1 Introduction

Stochastic resonance is a term used to describe certain non-linear effects in a large and diverse range of detector models as well as in physical and biological systems. Common to all these is some sort of measurement of signal information transferred through a detector. Stochastic resonance is observed as a local increase in information transfer for certain levels of background noise. Gammaitoni et al. (1998) and Mitaim and Kosko (1998) provide recent reviews of stochastic resonance phenomena. The methods applied in the analysis of stochastic resonance are as diverse as the systems analysed, making direct comparisons of results difficult. Some of the different measures of detector performance include output signal-to-noise ratio (e.g. Douglass et al. 1993), direct measures of information transfer (e.g. Levin and Miller 1996), percent-correct classifications (e.g. Collins et al. 1996; Tougaard 1999a, 2000) and detectability (Ward et al. 2002). The use of an index of detectability is appropriate and revealing, especially seen in the light of a central result of signal detection theory, which predicts a monotonic relationship between signal-to-noise ratio and detectability (Peterson et al. 1954; Green and Swets 1966; for a recent presentation see Macmillan and Creelman 1991).

Recent studies have focused on the apparent conflict between stochastic resonance and signal detection theory. Tougaard (1999a, 2000) investigated stochastic resonance in a simple energy detector model and found on one hand a local increase in detection performance (measured as percent-correct classifications) with increased background noise, but on the other hand also a monotonic decrease in detectability (measured as d′) with noise. Ward et al. (2002) investigated stochastic resonance in two more complex neuron models and, contrary to Tougaard (2000), report a local increase in d′ with increased noise variance. The following is a detailed discussion of reasons for this apparent discrepancy, with an added extension of the model of Tougaard (2000) to make it comparable to the models of Ward et al. (2002). Central in this discussion lies the concept of signal detectability and specific measures of it. As a basis for the subsequent discussion, the first part of the text is devoted to a recapitulation of central parts of signal detection theory, with special emphasis on measures of detectability and the rationale for introducing them in the first place. This part attempts to treat signal detection problems in the most general terms available. The second part of the text is devoted to a detailed discussion of stochastic resonance in spiking models in general and the results of Ward et al. (2002) in particular. It is concluded that the results of Ward et al. (2002) do not violate the fundamental dogma of signal detection theory (monotonicity between signal-to-noise ratio and detectability), even though they report a local increase in detectability with increasing input noise. The apparent paradox arises because of hidden non-linearities in the models of Ward et al. (2002), more specifically in the spiking process itself.


2 Signal detection theory

Signal detection theory deals with the general problem of detecting a known signal (or a group of known signals) in a known, random-noise background. A central goal of signal detection theory is to devise methods for evaluating the performance of given detectors discriminating between signal-plus-noise and noise-only inputs. A second objective, outside the scope of the present paper, is the determination of optimal detectors for given signals in given noise backgrounds.

Signal detection theory traditionally considers an observer which evaluates the output of a detector and makes a decision based on this output. This terminology, however, is potentially misleading in suggesting a setting consisting of a human (or animal) observer, whose task it is to discriminate outputs from some sort of man-made device (e.g. sounds from a loudspeaker or, as in the early experiments, blobs on a radar screen). Since it really is the observer who performs the detection task, the detector part is better referred to as a receiver, and the term classifier is preferred over observer. In this way we indicate that the only requirement of the classifier is an ability to classify observations as "Yes" or "No", based on the output of the receiver. This removes the need for a conscious observer. The elements are illustrated schematically in Fig. 1.

The receiver is presented with either background noise alone (N condition) or a mixture of a signal and background noise (SN condition). The input s(t), which is a function of time, is transformed by the receiver into a decision variable i = I(s). The decision variable can have any number of dimensions (e.g. the complex FFT spectrum of a segment of s), but most often only the situation where i is one-dimensional is considered (for an explicit multidimensional example see Licklider 1964).

In an observation event, the classifier is presented with a single value i from the receiver and must then decide whether or not the input s to the receiver contained a signal. This is usually done by means of some criterion, such that the classifier reports "Yes" whenever i belongs to some set A, and reports "No" when i belongs to the complement Ā (with A ∪ Ā being the total sample space of i). There are several ways to evaluate the performance of the receiver/classifier, but a central question is the following: is it possible to evaluate their performance individually, even in a situation with limited or even no knowledge of the function I and the criterion set A?

Several potential parameters for this evaluation are discussed below. The proportion of signals detected is a poor measure, since it completely overlooks the errors committed in classifying the noise-alone inputs. The first step in evaluating the fidelity of the receiver/classifier is thus the proportion of all inputs that are classified correctly.
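The receiver/classifier decomposition can be made concrete with a minimal sketch (Python). The energy-style receiver and the fixed criterion set A = {i : i > criterion} below are illustrative choices only, not a prescription from the text:

```python
import numpy as np

def receiver(s):
    """Receiver: transforms an input waveform s(t) into a one-dimensional
    decision variable i = I(s); here simply the mean energy of the segment."""
    return np.mean(np.asarray(s) ** 2)

def classifier(i, criterion=1.15):
    """Classifier: reports "Yes" whenever i belongs to the criterion set
    A = {i : i > criterion}, and "No" otherwise."""
    return "Yes" if i > criterion else "No"

rng = np.random.default_rng(0)
noise_only = rng.normal(0.0, 1.0, 100)                         # N condition
signal_plus_noise = noise_only + 0.8 * np.sin(np.arange(100))  # SN condition

for label, s in [("N", noise_only), ("SN", signal_plus_noise)]:
    print(label, classifier(receiver(s)))
```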

Fig. 1. Schematic diagram of the detection process. The receiver transforms a noisy input stimulus, s(t), into a decision variable i. The classifier decides, based on i and by means of some suitable criterion set A, whether it was likely that a signal was present in the input

Table 1. Response matrix of the detector, showing the four possible combinations of inputs (column 1) and responses (row 1)

                     Yes            No
Signal plus noise    Hit            Miss
Noise                False alarm    Correct rejection

There are two possible inputs or "states of the world" (N and SN) and two possible outputs of the system ("Yes" and "No"), leading to four possible combinations, two of which are correct and two of which are wrong. These are hits (Yes|SN), correct rejections (No|N), misses (No|SN) and false alarms (Yes|N). The four combinations constitute the response matrix (Table 1). Because there are only two degrees of freedom in the response matrix, only two of the parameters are needed for a complete description of performance. Hits and false alarms are traditionally used and can be combined into the proportion that is classified correctly, P(c):

P(c) = P(Yes|SN)·P(SN) + [1 - P(Yes|N)]·P(N)   (1)

where P(Yes|SN) and P(Yes|N) are the hit and false-alarm rates, respectively. P(c) is a simple and useful parameter for characterising the overall performance of the system. However, it fails to provide information on the separate performance of the receiver and classifier parts.
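As a small numerical illustration of (1) (the rates and priors below are arbitrary, hypothetical values):

```python
# Hypothetical hit and false-alarm rates and a priori probabilities.
p_hit = 0.80          # P(Yes|SN)
p_fa = 0.25           # P(Yes|N)
p_sn, p_n = 0.5, 0.5  # P(SN) and P(N)

# Eq. (1): proportion of all presentations classified correctly.
p_correct = p_hit * p_sn + (1.0 - p_fa) * p_n
print(f"P(c) = {p_correct:.3f}")  # 0.5*0.80 + 0.5*0.75 = 0.775
```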

2.1 Receiver operating characteristics

This separation is achieved by a division of errors into those concerning the N input and those concerning the SN input. This can be done by plotting hit and false-alarm rates as a single point in a receiver operating characteristics (ROC) plot (Fig. 2). This point represents the performance realised with a particular criterion of the classifier. For a given signal and a constant noise background, several criteria of the classifier can be imagined, each resulting in a different performance and hence a different point in the ROC plot. (Note that for this general discussion we need not be concerned with whether these different criteria are physically realisable for a given classifier.) If a sufficiently large number of criteria are evaluated in this way and the individual points connected by lines in an appropriate way, it is intuitively clear that the resulting ROC curve contains information about the receiver part only, with classifier criterion information provided by the location of each individual point. We have then achieved what we set out for: a tool for the separate evaluation of receiver and classifier.


Fig. 2. Receiver operating characteristics (ROC). The upper curve is constructed by employing three different criterion sets (A1, A2 and A3) in the detection of the same signal at the same noise level, and plotting the corresponding false-alarm and hit rates as single points. The position of a point of operation for an intermediate criterion (Amix) between A1 and A3 is also shown. The upper curve represents the best possible performance at the given signal-to-noise ratio, and the lower curve represents the worst possible performance, found by employing the inverse criteria (Ā1, Ā2, Ā3). The shaded area between the two curves represents the area of all possible points of operation, or set of achievable policies

The ROC curve is thus a criterion-free measure of receiver performance and has some important, general features:

1. All ROC curves start in (0, 0) and end in (1, 1). These two points are the results of selecting the most extreme criteria available (say "No" to all values of i, and say "Yes" to all values of i, corresponding to A = Ø and Ā = Ø, respectively).

2. The ROC curve is continuous, even in the case of a finite sample space of i. This is realised by observing that we can select criteria which are linear mixtures of any two other criteria. Given two sets A1 and A2, where A1 ⊂ A2:

   Criterion Amix:  i ∈ A1: say "Yes"
                    i ∈ A2 \ A1: say "Yes" with probability p
                    i ∈ Ā2: say "No"

   Corresponding hit and false-alarm rates for a criterion A are given as

   Hit rate: P_A(Yes|SN) = P(i ∈ A|SN) = ∫_A P(i|SN) di
   False-alarm rate: P_A(Yes|N) = P(i ∈ A|N) = ∫_A P(i|N) di   (2)

   Hit and false-alarm rates for the intermediate criterion, Amix, are

   P_Amix(Yes|SN) = ∫_{A1} P(i|SN) di + p·∫_{A2\A1} P(i|SN) di
                  = P_A1(Yes|SN) + p·[∫_{A2} P(i|SN) di - ∫_{A1} P(i|SN) di]
                  = P_A1(Yes|SN) + p·[P_A2(Yes|SN) - P_A1(Yes|SN)]

   P_Amix(Yes|N) = ∫_{A1} P(i|N) di + p·∫_{A2\A1} P(i|N) di
                 = P_A1(Yes|N) + p·[P_A2(Yes|N) - P_A1(Yes|N)]   (3)

   The point in the ROC plot corresponding to the intermediate criterion thus falls on a straight line connecting the points corresponding to the A1 and A2 criteria (Fig. 2).

3. Guessing at random results in a corresponding point in the ROC plot on the positive diagonal (lower left to upper right). Guessing is equivalent to combining the two most extreme criteria described above (A = Ø and Ā = Ø). The coordinates of the corresponding ROC point can thus be found from (3) to be (p, p), where p is the probability of the answer being "Yes".

4. An optimal ROC curve exists, by which we mean a curve constructed entirely from Neyman–Pearson criteria (Neyman and Pearson 1933), also referred to as optimal criteria. A Neyman–Pearson criterion maximises the hit rate for a given false-alarm rate, and the optimal ROC curve thus defines the best possible performance of the classifier for the given signal and noise. We know from (3) that any two criterion sets can be combined into a third criterion set, resulting in a point located on a straight line between the two original points in the ROC plot. This intermediate criterion must either in itself be optimal, or else a better criterion Ax must exist which has a higher corresponding hit rate. Thus P_Ax(Yes|SN) ≥ P_Amix(Yes|SN) for all intermediate criteria, and it follows that the optimal ROC curve never falls below a straight line connecting two of its points and is thus also monotonically increasing.

5. Just as the optimal-criteria ROC curve represents the best possible performance for a given signal and noise, a similar curve representing the worst possible performance exists. If one optimal criterion is: say "Yes" for i ∈ A, then the worst possible thing to do is the exact opposite: say "Yes" for i ∈ Ā. This yields the following hit and false-alarm rates:


P_Ā(Yes|SN) = ∫_Ā P(i|SN) di = 1 - ∫_A P(i|SN) di = 1 - P_A(Yes|SN)

P_Ā(Yes|N) = ∫_Ā P(i|N) di = 1 - ∫_A P(i|N) di = 1 - P_A(Yes|N)   (4)

The worst possible ROC curve is thus identical to the optimal ROC curve rotated 180° around the point (0.5, 0.5). With this we have defined a region stretching from the lower left to the upper right corner of the ROC plot, bordered upwards by the optimal ROC curve and downwards by the same ROC curve rotated 180°. Since we have shown that it is possible to identify a corresponding criterion for any point on a line connecting the points of two other criteria, it follows that a criterion exists for all points inside the bordered area. This area has been termed the "set of achievable policies", or SOAP (Kaernbach 1991), which indicates that for a given detector presented with a given signal in a given noise background (in the following referred to as a given set-up), all theoretically possible points in the ROC plot fall either within or on the border of the SOAP.

2.2 The ROC curve as a measure of detectability

The optimal ROC curve for a given set-up is a measure of the difference between noise and signal-plus-noise inputs, or more specifically of the overlap between the probability density distributions of i|N and i|SN, as described below. This separation is the detectability of the given set-up. The larger the separation between noise alone and signal plus noise, the further towards the point of perfect discrimination (0, 1) the ROC curve reaches (i.e. the larger the SOAP), and the better the performance that can be realised by an appropriate choice of criterion. This detectability is strictly coupled to the particular set-up (i.e. a particular signal in a particular noise, and specific for the given receiver). A different receiver, more appropriately designed for the particular signal and noise, may thus provide a higher detectability. An example of this is the detection of a signal specified exactly (i.e. all amplitude and phase information known). Such a signal may be perfectly detected at good signal-to-noise ratios by a simple level-crossing detector, but a much higher detectability can be realised by employing a matched-filter type of detector, in which case detection of the signal is possible with good fidelity at considerably poorer signal-to-noise ratios.

The actual performance of the combined receiver and classifier, as measured by P(c), depends on the particular criterion of the classifier. This choice is reflected in the location of the point of operation in the ROC plot. Of this point we know from above that it must be located either on the border of the SOAP or within it. The latter readily identifies the criterion as suboptimal in the Neyman–Pearson sense. If the criterion is identified as optimal, further information can be inferred from the position of the point of operation along the optimal ROC curve, expressed as classifier bias. Green and Swets (1966) and Macmillan and Creelman (1991) provide in-depth discussions of classifier bias.

2.3 Assumed distributions of signal and noise and measures of detectability

In order to proceed, let us assume that i is distributed according to some probability density function in the case of a noise-only input, P(i|N) = f_N(i), and according to a different probability density function in the case of a signal-plus-noise input, P(i|SN) = f_SN(i). A classifier which performs better than chance level must somehow base its classification on knowledge of the functions f_N(i) and f_SN(i); more specifically, it must be able to evaluate the a posteriori probabilities P(SN|i) and P(N|i). In other words: how likely is it that a given observation i was the result of an SN versus an N condition as input? An optimal solution to this problem involves the use of a likelihood-ratio criterion. The likelihood ratio, l, for a given observation i is given as

l(i) = P(SN|i)/P(N|i) = [P(SN)/P(N)] · [P(i|SN)/P(i|N)]   (5)

A simple likelihood-ratio criterion is of the type: say "Yes" for l(i) larger than some value β, otherwise say "No". Such a likelihood-ratio criterion qualifies as a Neyman–Pearson criterion (for proof, see Green and Swets 1966). Classifier bias was mentioned briefly above, and it should be added here that β is a valuable measure of bias. This makes good sense, as β expresses a weighting of the likelihoods of the N and SN conditions. With β = 2, the criterion can be formulated as: say "Yes" only if it is more than twice as likely that an observed value i was the result of a stimulus being present than that it was the result of noise alone.

So far we have not made any assumptions about the nature of the probability density functions f_N(i) and f_SN(i). It is thus well worth noting that the conclusions drawn so far are absolutely general, valid for any combination of receiver and classifier and independent of the exact nature of both signal and noise, as well as of the transformation function I and the classification criterion employed. If a ROC curve can be constructed for a given set-up, important information on detectability and possible limiting factors for detection performance can be inferred, independent of what limited knowledge about the system we may otherwise have.


Important in this context is that it allows us to conclude whether a poor performance in a given situation is caused by a low detectability of the signal (in which case little can be done to improve performance) or by a classifier criterion not suited to achieve the desired goal (in which case improved performance may be possible by altering the classifier criterion).
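The construction of an empirical ROC curve by sweeping a simple criterion over the decision variable can be sketched as follows (Python; the Gaussian samples and the criterion grid are illustrative assumptions, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative samples of the decision variable under the two conditions.
i_n  = rng.normal(0.0, 1.0, 100_000)   # i | N
i_sn = rng.normal(1.0, 1.0, 100_000)   # i | SN

# Sweep the criterion "say Yes for i > i_c" over a grid of i_c values.
criteria = np.linspace(-5.0, 6.0, 200)
hits = np.array([(i_sn > ic).mean() for ic in criteria])[::-1]  # P(Yes|SN)
fas  = np.array([(i_n  > ic).mean() for ic in criteria])[::-1]  # P(Yes|N)

# Area below the empirical ROC curve (trapezoid rule; points run (0,0) to (1,1)).
area = np.sum(0.5 * (hits[1:] + hits[:-1]) * (fas[1:] - fas[:-1]))
print(f"area below ROC curve ≈ {area:.3f}")

# The worst possible curve (inverse criteria) is this curve rotated 180 degrees
# about (0.5, 0.5); together the two curves bound the SOAP.
```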

2.4 Area below ROC curve

A straightforward and useful measure of detectability is the area below the ROC curve. As the detectability of a signal increases, the corresponding ROC curve moves towards the upper left corner of the ROC plot, and it is intuitively clear that the area below the ROC curve contains some information about detectability. It gets better than just this, but in order to show it a detour is required to an experimental paradigm slightly different from the yes/no situation discussed so far, namely the two-alternative forced-choice (2AFC) paradigm.

Consider a receiver/classifier which is presented not with one single stimulus (SN or N), but with two stimuli sequentially. One stimulus is noise alone (N), the other is signal plus noise (SN). The task of the classifier is now not to decide Yes versus No, but whether the signal occurred in the first or the second presentation. Without proof we will accept that an optimal strategy for solving this classification problem involves calculating the likelihood ratio l(i) for both intervals and choosing the interval with the higher likelihood ratio as the interval most likely to contain the signal (Green and Swets 1966). If we denote the two intervals a and b, respectively, the criterion can be formulated as

Criterion: if l(i_a) > l(i_b) then pick a, otherwise pick b

How often will this choice be correct? It will be correct whenever the likelihood ratio for the interval containing the signal plus noise exceeds the likelihood ratio for the interval containing noise only. An observation consists of two measures, i_N and i_SN, from the noise-only and the signal-plus-noise interval, respectively. Say l(i_SN) equals some value c; then the probability of a correct choice is P(l(i_N) < c). To find the total proportion of correct answers, this value must be multiplied by the probability P(l(i_SN) = c) and the product summed over all possible values of c:

P(c)_2AFC = ∫_{-∞}^{∞} P[l(i_SN) = c] · P[l(i_N) < c] dc

Note that one must assume independence between the two observations, which is rarely of concern. Furthermore, the classifier must be unbiased, i.e. not have a preference for selecting one interval over the other. See Green and Swets (1966) for dealing with biased 2AFC data. If l(i) is monotonic with i we obtain P[l(i) = c] = P(i = i_c), where l(i_c) = c. Thus

P(c)_2AFC = ∫_{-∞}^{∞} P(i_SN = i_c) · P(i_N < i_c) di_c = ∫_{-∞}^{∞} f_SN(i_c) F_N(i_c) di_c   (6)

where F(i) is the cumulative probability function (equal to ∫_{-∞}^{i} f(t) dt). This formula for calculating the area below the ROC curve is general, provided that the assumption of monotonicity between i_c and l(i_c) is fulfilled (Egan 1975; see also below). Another instance where (6) is valid, which we will use later, is the situation where the classifier operates with a simple criterion directly on i rather than on l(i). In the 2AFC paradigm, this type of criterion would be: choose a if i_a > i_b, otherwise choose b.

The area below the ROC curve is calculated by the integral

Area = ∫_0^1 F_N(i_c) dy,   where y = 1 - F_SN(i_c)

where integration is along the y-axis for simplicity. From the expression for y it follows that dy = -f_SN(i_c) di_c (with the limits y = 0 to 1 corresponding to i_c = ∞ to -∞), and we obtain by substitution

Area = ∫_{y=0}^{y=1} F_N(i_c) f_SN(i_c) di_c = ∫_{-∞}^{∞} F_N(i_c) f_SN(i_c) di_c   (7)

When compared to (6), it is clear that the area under the ROC curve exactly equals the proportion correct in a 2AFC experiment. This relationship, demonstrated by Green (1964) and known as Green's theorem (Simpson and Fitter 1973) or the area theorem (Egan 1975), is central. It allows us to use the area below a ROC curve from a yes/no experiment as a direct measure of detectability, knowing for certain that a larger area reflects a larger separation of the signal-plus-noise and noise-only distributions. This higher detectability is reflected in a higher proportion correct in a 2AFC experiment. Bamber (1975) also showed that, given i_c is continuous, the area below the ROC curve is closely related to the Mann–Whitney U statistic: U = A · n_SN · n_N, where n_SN and n_N are the numbers of signal-plus-noise and noise-only presentations, respectively.
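The area theorem is easily checked numerically. The sketch below (Python; the decision-variable distributions are arbitrary, chosen non-Gaussian on purpose) estimates the 2AFC proportion correct directly from paired, independent draws and compares it with the area below the yes/no ROC curve from (7):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 200_000

# Illustrative, non-Gaussian decision-variable distributions.
i_n  = rng.gamma(shape=2.0, scale=1.0, size=n_trials)        # i | N
i_sn = rng.gamma(shape=2.0, scale=1.0, size=n_trials) + 1.0  # i | SN

# Unbiased 2AFC: pick the interval with the larger i (equivalent to the
# likelihood-ratio rule whenever l(i) is monotonic with i).
p_correct_2afc = (i_sn > i_n).mean() + 0.5 * (i_sn == i_n).mean()

# Area below the yes/no ROC curve, eq. (7): integral of F_N(i) f_SN(i) di,
# estimated as the average of F_N evaluated at the SN samples.
i_n_sorted = np.sort(i_n)
area = np.searchsorted(i_n_sorted, i_sn, side="left").mean() / n_trials

print(f"2AFC proportion correct ≈ {p_correct_2afc:.4f}")
print(f"area below ROC curve    ≈ {area:.4f}")  # agrees, per the area theorem
```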


When it comes to actually determining detectability, there are two approaches. Either a ROC curve can be found experimentally by manipulation of the criterion (which may not always be possible), or a ROC plot is constructed on the basis of certain assumptions about the distribution of i. Real-life solutions are often a combination of a few measurements backed by theoretical assumptions. Unfortunately, even in a situation where a complete ROC curve can be constructed, no definitive information about the underlying distributions of i|N and i|SN can be inferred. What is needed are more specific assumptions regarding the distribution of the decision parameter i.

2.5 The Gaussian assumption

By far the most common assumption regarding f_N(i) and f_SN(i) is the Gaussian, equal-variance assumption (e.g. Peterson et al. 1954; Green and Swets 1966), which will be described in some detail. The Poisson assumption is another (Egan 1975; Kaernbach 1991), and numerous others are given by Egan (1975). The extensive application of the Gaussian assumption is justified partly by theoretical and experimental results, partly by the convenient behaviour of the model. Under the Gaussian assumption with equal variance it is assumed that f_N and f_SN are both Gaussian with means μ_N and μ_SN, respectively, and identical variance σ². For some combinations of signals, noise and receivers, such as the energy-detector model of Tougaard (2000), f_N and f_SN can be derived analytically and found to be well approximated by two Gaussian distributions with identical variances. In other situations, as in psychophysics, the nature of the transformation function I is largely unknown, but experimental results are often not grossly deviant from the predictions of a Gaussian, equal-variance model. It also seems fair to assume that a large number of independent stochastic processes are involved in a psychophysical detection process, thus justifying the Gaussian assumption through the central limit theorem.

The most convenient consequence of the Gaussian assumption with equal variances is that i and the likelihood ratio l(i) are monotonically related:

l(i) = [(1/(σ√(2π))) exp(-(i - μ_SN)²/(2σ²))] / [(1/(σ√(2π))) exp(-(i - μ_N)²/(2σ²))]
     = exp[((μ_SN - μ_N)/σ²)·i - (μ_SN² - μ_N²)/(2σ²)]   (8)

This result is central because it removes the need for a direct evaluation of the a posteriori probabilities P(SN|i) and P(N|i): a criterion based on i (say "Yes" for i > i_c) is equivalent to the likelihood-ratio criterion (say "Yes" for l(i) > c, where c = l(i_c)). In fact, a criterion based on any order-preserving transformation of the likelihood ratio is optimal in the Neyman–Pearson sense (Bamber 1975). Criteria based on i rather than l(i) are almost universally adopted when data from models or experiments are evaluated. It is crucial, however, to be aware that such criteria are suboptimal unless the assumption of monotonicity is fulfilled. An important example is the situation where f_N(i) and f_SN(i) are Gaussian with unequal variances. In this case l(i) is not monotonic with i. The degree to which the criterion is suboptimal depends on the difference in variance between f_N(i) and f_SN(i), and may or may not be acceptable, depending on the specific application.

2.6 Index of detectability, d′

Under the Gaussian assumption with equal variances, a single-parameter description of the ROC curve, and hence of the detectability, can be defined as

d′ = (μ_SN - μ_N)/σ   (9)

d′ is a dimensionless measure of the separation of the N and SN distributions. It is monotonic with performance, and hence a convenient measure of detectability. Below it is shown that d′ increases monotonically as the area below the ROC curve increases, and hence with performance. The relation of f_N(i) to f_SN(i) is straightforward when expressed via d′ (taking σ as the unit of i, so that σ = 1):

f_N(i) = φ((i - μ_N)/σ) = φ((i - μ_SN)/σ + d′) = f_SN(i + d′)   (10)

where φ(i) is the Gaussian probability density function. The following proof is general for all situations where the assumption of monotonicity between i_c and l(i_c) is fulfilled and (10) applies (i.e. the only difference between f_N(i) and f_SN(i) being in their means). The area below the ROC curve was found in (7), which is rewritten as

Area = ∫_{-∞}^{∞} f_SN(i) F_N(i) di = ∫_{-∞}^{∞} f_SN(i) F_SN(i + d′) di

As F_SN(i) is monotonically increasing with i (by definition), the above integral, and hence the area below the ROC curve and thus performance, increases as d′ increases. The definition of d′ in (9) can be, and is, used for any two distributions f_N(i) and f_SN(i) for which a mean and variance can be calculated. The simple definition breaks down, however, when f_N and f_SN have unequal variances. Several ways to bypass this problem have been suggested. The simplest solution is to use either σ_N or σ_SN in the calculation (termed d′_1 and d′_2, respectively), or one can calculate a mean standard deviation. This mean can be arithmetic (d′_e), geometric (d′_GM) or root mean squared (RMS, d′_a). It would lead too far to enter into a discussion of the pros and cons of the various indices (see Simpson and Fitter (1973) and Macmillan and Creelman (1991) for discussions of this). The central point in this connection is that, as a general rule, the different indices cannot be compared, although they are interrelated, and in the case of two Gaussian distributions with unequal variance simple relations between the various indices can be found (Simpson and Fitter 1973), as can geometric interpretations on the ROC curve (e.g. Macmillan and Creelman 1991).


These relations are by no means general, however. They all express detectability and are in general monotonic with performance and signal-to-noise ratio. The exact behaviour of the different indices, however, depends strongly on the behaviour of the underlying distributions f_N(i) and f_SN(i), and unless one has specific knowledge of these functions, one has little chance of predicting this behaviour.

A short comment on notation: the symbol d′ is commonly used for detectability irrespective of the nature of the underlying distributions. This is unfortunate, as it may create confusion about the comparability of the different indices. d′ should thus be strictly reserved for the Gaussian, equal-variance situation (Eq. 9), and other d′ indices should be marked with an appropriate subscript to signal the deviation from (9).

2.7 Concluding remarks on measures of detectability

As evident from the above, signal detection theory offers powerful tools for evaluating the performance of detector systems. More specifically, by means of the concepts of detectability and bias, it is possible to look into a specific detection problem and pinpoint a poor performance as originating either in the receiver or in the classifier part of the detector.

The most universal tool for evaluating the performance of detectors is the ROC curve. If a ROC curve can be constructed, important information about detectability and classifier bias is immediately available. Most importantly, no information or assumptions regarding the nature of the underlying distributions f_N and f_SN are needed for this evaluation. Conclusions drawn directly from the ROC plot are thus the strongest possible, although also often the most difficult to obtain. If one can demonstrate or assume that classification occurs with a simple criterion employed either directly on the decision variable, or on a likelihood ratio that is monotonic with the decision variable, the area below the ROC curve is a powerful measure of detectability. As evident from the area theorem, its usefulness stems from the relation to 2AFC performance. The 2AFC percent correct is possibly the most direct measure of receiver performance, and since no knowledge about the underlying distributions or the classification criterion is needed, it is probably the best measure of detectability of all. The downside to the 2AFC paradigm is of course the total inability to evaluate the performance of the classifier. If the assumption of two underlying Gaussian distributions with equal variance is fulfilled or well approximated, the standard index of detectability, d′, can be applied. What separates d′ from the area below the ROC curve is that it also provides direct information about the separation of the underlying distributions. For all other distributions where means and variances can be calculated or measured, other indices of detectability can be defined. These are to be considered ad hoc measures, and careful investigation of their behaviour in the given situation is necessary before one is allowed to conclude that they are indeed good measures of detectability. A minimum requirement of a good detectability measure must be monotonicity with performance in an unbiased 2AFC setting, or at least knowledge of the extent to which this monotonicity breaks down.
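For the Gaussian, equal-variance case the measures discussed above are directly related; in particular, the area below the optimal ROC curve (and hence the unbiased 2AFC proportion correct) equals Φ(d′/√2), a standard consequence of the model that is not spelled out in the text. A minimal sketch (Python with SciPy; the means and standard deviation are arbitrary illustration values):

```python
import numpy as np
from scipy.stats import norm

# Illustrative Gaussian, equal-variance parameters of the decision variable.
mu_n, mu_sn, sigma = 0.0, 1.2, 0.8

d_prime = (mu_sn - mu_n) / sigma           # eq. (9)
roc_area = norm.cdf(d_prime / np.sqrt(2))  # = 2AFC proportion correct
p_c_yes_no = norm.cdf(d_prime / 2)         # best yes/no P(c) with equal priors
                                           # (criterion midway between the means)

print(f"d' = {d_prime:.2f}, ROC area = {roc_area:.3f}, "
      f"max yes/no P(c) = {p_c_yes_no:.3f}")
```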

3 Signal detection theory and stochastic resonance in integrate-and-fire neuron models

Stochastic resonance is a non-linear phenomenon. It occurs in many different types of systems, all characterised by a non-linear relation between input and output. Often the non-linear element is in the form of a threshold, below which no information on the input signal is passed through the system. In this simple version, information about a subthreshold signal can be transferred through the system only if adequate levels of random noise are added to the signal. This makes the combined signal plus noise cross the threshold at times correlated with the input signal (equivalent to a dithering process, Wannamaker et al. 2000). However, too much noise saturates the system and removes the correlation.

A key question is where in the detection process stochastic resonance originates. In other words: does the effect reside in the receiver part or in the classifier part of the detector? Signal detection theory, and especially measures of detectability, are ideal for answering this question. A large proportion of stochastic resonance studies deal with real neurons (e.g. Levin and Miller 1996) or neuron models (e.g. Bezrukov and Vodyanoy 1997). Stochastic resonance behaviour in a simple energy detector model was investigated recently by Tougaard (2000). In that study it was clear that the stochastic resonance was linked to a suboptimal criterion of the classifier (which was kept constant). The receiver in the model of Tougaard (2000) did not exhibit any stochastic resonance, evidenced by a monotonic decrease in detectability with increasing input noise. The stochastic resonance effect in this case was thus closely linked to the non-linear classifier. What about the receiver? Is it possible to have stochastic resonance effects originating in this element? If so, this would be revealed by a non-linear relation between detectability and noise level. Such a non-linear relationship has been demonstrated by Ward et al. (2002).

3.1 Integrate-and-fire models

Ward et al. (2002) looked at the behaviour of two different integrate-and-fire neuron models. Integrate-and-fire models convert an input signal into a train of impulse signals (spikes). The temporal pattern of the spikes correlates with the input signal to different degrees, depending on the signal-to-noise ratio as well as the absolute noise level. Integrate-and-fire models are essentially energy detectors, since they integrate stimulus intensity over time, weighted by a decay function which ensures that the most recent parts of the stimulus contribute more to the output than do more distant parts. Whenever the summated energy exceeds some predetermined threshold, a spike is elicited. The integrator is then most often reset (accumulated energy set to zero), and the integration process repeated.
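A minimal caricature of such an integrate-and-fire receiver is sketched below (Python). The leaky integration with reset is generic; the time constant, threshold and input are illustrative assumptions and not the specific models analysed by Ward et al. (2002):

```python
import numpy as np

def integrate_and_fire(stimulus, dt=1e-3, tau=0.02, threshold=0.8):
    """Leaky integration of stimulus intensity with reset on spiking.

    The exponential decay (time constant tau) weights recent input more
    heavily than older input; a spike is emitted, and the integrator reset,
    whenever the summated input exceeds the threshold."""
    v = 0.0
    spike_times = []
    for k, x in enumerate(stimulus):
        v += dt * (-v / tau + x)   # leaky integration of the input
        if v >= threshold:
            spike_times.append(k * dt)
            v = 0.0                # reset
    return spike_times

# Illustrative input: a weak sinusoidal signal plus Gaussian noise.
rng = np.random.default_rng(2)
t = np.arange(0.0, 1.0, 1e-3)
stimulus = 30.0 * (1 + 0.2 * np.sin(2 * np.pi * 10 * t)) \
           + 50.0 * rng.standard_normal(t.size)
print(f"{len(integrate_and_fire(stimulus))} spikes in 1 s")
```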


The FitzHugh–Nagumo neuron model of Ward et al. (2002) receives noisy inputs, and the output, in the form of spikes, is fed to a classifier, which decides "Yes" or "No" based on a spike-count criterion. The decision variable is thus a spike count, which is distributed according to two different distributions resembling Poisson distributions. An index of detectability, d′_a (Simpson and Fitter 1973), was calculated, and a local increase in d′_a is observed for increased input-noise variance, indicating a stochastic resonance effect linked to the receiver part. However, the FitzHugh–Nagumo neuron is in itself non-linear, since it contains a threshold which must be exceeded before a spike is elicited. The FitzHugh–Nagumo neuron can thus also be considered to consist of both a receiver part (the integrator) and a classifier (the spike generator). This is illustrated in Fig. 3. The classifier can be thought of as employing n different criteria simultaneously in order to separate inputs into n + 1 different categories. Given A1 ⊂ A2 ⊂ … ⊂ An:

   i ∈ A1: answer with n spikes
   i ∈ A2 \ A1: answer with n - 1 spikes
   …
   i ∈ An \ An-1: answer with one spike
   i ∈ Ān: answer with no spikes

This type of data can be used to construct a so-called rating ROC curve (see appendix in Cohn et al. 1975). This is done by sequentially evaluating the data with a new set of n criteria:

   Criterion 1: answer "Yes" if one spike or more (i ∈ An)
   Criterion 2: answer "Yes" if two spikes or more (i ∈ An-1)
   …
   Criterion n: answer "Yes" if n spikes (i ∈ A1)   (11)

The area below the ROC curve can then be calculated and used as a detectability measure.

Fig. 3. Detection process based on a spiking neuron. The neuron consists of a linear receiver (R1) and a spike-generation mechanism, comparable to a classifier (C1). This combined detector can be considered a second, non-linear receiver (R2) which delivers spikes to the final classifier (C2)

3.2 Extension of energy-detector model of Tougaard (2000)

To illustrate the above, the model of Tougaard (2000) is extended below to include a "fire" part in addition to the "integrate" part, which it already contains. Details on the integration part can be found in Tougaard (2000). Briefly, the model consists of a simple integrator, integrating signals over a finite time interval T, expressed as a discrete approximation:

I(s) = ∫_0^T s²(t) dt ≈ (1/WT) Σ_{n=1}^{WT} s²(t_n)   (12)

where s(t) is the time-varying input signal and W is the receiver bandwidth. The distribution of the decision parameter i = I(s) is close to Gaussian for not-too-small values of WT when the input is either Gaussian white noise with bandwidth W and power A_N² (N condition), or the same noise plus a sine-wave signal of power A_s² (SN condition). If the classifier employs a fixed criterion of the type: say "Yes" for i > i_c, with A_s² < i_c, a non-monotonic relationship between the proportion classified correctly and input-noise intensity is observed (Fig. 4, solid line), interpreted as a stochastic resonance effect. This effect originates from the constant criterion of the classifier. If instead an optimal likelihood-ratio criterion is used, maximising proportion-correct classifications, performance decreases monotonically with noise intensity (Fig. 4, dashed line).

Fig. 4. Behaviour of the linear energy detector. Detection is measured as the proportion of correct classifications in a yes/no paradigm with P(N) = P(SN). Solid line is the performance of the detector with a constant criterion, above the level of the stimulus, at various levels of input noise. Dashed line is the performance with a variable and optimally adjusted likelihood-ratio criterion. WT = 25, fixed criterion = 3A_s²

The decision variable i in this simple model is continuous, in contrast to integrate-and-fire models. A spike-generating function M(i) is therefore added. This stimulus–response function transforms the integrator output i into a discrete number of spikes:

n = M(i) = int(a·e^(b·i))   (14)

where the coefficients a and b are selected arbitrarily to give a realistic output (Fig. 5A). Real-world stimulus–response curves level out above a certain input level (the neuron saturates), which is ignored in this analysis. The exact form of the stimulus–response function, however, is not important for this general illustration.
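The spiking extension lends itself to a short Monte Carlo sketch (Python). The values WT = 50, A_N² = 36 and A_s² = 4 echo the parameters of Fig. 5, but the coefficients a and b, the sine-wave construction and the trial-based approach are illustrative assumptions rather than the exact procedure behind Figs. 5 and 6:

```python
import numpy as np

rng = np.random.default_rng(3)
WT = 50            # time-bandwidth product
A_s2 = 4.0         # signal power A_s^2
a, b = 0.5, 0.05   # spike-function coefficients (arbitrary illustration)

def integrator_output(A_n2, signal_present, n_trials):
    """Eq. (12): i = (1/WT) * sum of the squared input samples."""
    x = rng.normal(0.0, np.sqrt(A_n2), size=(n_trials, WT))
    if signal_present:
        phase = rng.uniform(0.0, 2 * np.pi, size=(n_trials, 1))
        t = np.arange(WT)
        x = x + np.sqrt(2 * A_s2) * np.sin(2 * np.pi * 0.1 * t + phase)
    return np.mean(x ** 2, axis=1)

def spike_count(i):
    """Eq. (14): n = int(a * exp(b * i))."""
    return np.floor(a * np.exp(b * i)).astype(int)

A_n2 = 36.0        # noise power, i.e. a signal-to-noise ratio of about -10 dB
i_n  = integrator_output(A_n2, signal_present=False, n_trials=100_000)
i_sn = integrator_output(A_n2, signal_present=True,  n_trials=100_000)
n_n, n_sn = spike_count(i_n), spike_count(i_sn)

# d'_a, eq. (16), from the simulated spike-count distributions.
d_a = np.sqrt(2) * (n_sn.mean() - n_n.mean()) / np.sqrt(n_n.var() + n_sn.var())
print(f"d'_a ≈ {d_a:.2f}")
```

Repeating such a simulation over a range of A_N² values traces out curves of the kind shown in Fig. 6.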


The distributions of n|N and n|SN, termed g_N(n) and g_SN(n), can be calculated by sequentially employing the different criteria from (11) on the cumulative distributions F_N and F_SN. Given the spiking criteria i_c,1, i_c,2, …, i_c,n, determined by the inverse spiking function M⁻¹(n): i_c,n = b⁻¹ ln(n/a),

g_N(n) = P(n|N) = F_N(i_c,1)                     for n = 0
                = F_N(i_c,n+1) - F_N(i_c,n)      for n ≥ 1

g_SN(n) = P(n|SN) = F_SN(i_c,1)                  for n = 0
                  = F_SN(i_c,n+1) - F_SN(i_c,n)  for n ≥ 1   (15)

An example of two such resulting spike-count distributions is shown in Fig. 5B, together with the corresponding rating ROC curve (Fig. 5C). The distributions of n|N and n|SN are clearly not Gaussian for low signal and noise intensities, where the number of spikes elicited is low. The standard definition of d′ is thus not usable, but since both a mean and a variance can be computed for the distributions, an index such as d′_a (Simpson and Fitter 1973) can be calculated. This index of detectability is given as

d′_a = √2 (μ_SN - μ_N) / √(σ_N² + σ_SN²)   (16)

Fig. 5A–C. Behaviour of the spiking energy detector model: A spiking function, indicating the number of spikes elicited by the spike generator as a function of the output from the linear integrator; B spike-count distributions for noise alone (filled bars) and signal plus noise (open bars) as input, with a constant signal-to-noise ratio; C receiver operating characteristics plotted from B. Data were evaluated sequentially with a range of spike-count criteria from one or more spikes to 40 or more spikes. WT = 50, A_N² = 36, A_s² = 4 (signal-to-noise ratio = -10 dB)
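Under the Gaussian approximation to the integrator output, the spike-count distributions in (15) and the index in (16) follow directly from the cumulative Gaussian. A sketch (Python with SciPy; the means, standard deviation and spike-function coefficients are again illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

a, b = 0.5, 0.05                      # spike-function coefficients (illustration)
mu_n, mu_sn, sigma = 36.0, 40.0, 7.2  # Gaussian approximation of i|N and i|SN

n_max = 60
n = np.arange(1, n_max + 1)
i_c = np.log(n / a) / b               # spiking criteria, i_c,n = b^-1 ln(n/a)

def spike_count_pmf(mu):
    """Eq. (15): g(0) = F(i_c,1), g(n) = F(i_c,n+1) - F(i_c,n) for n >= 1.
    The last bin lumps together all counts of n_max or more spikes."""
    F = norm.cdf(i_c, loc=mu, scale=sigma)
    return np.concatenate(([F[0]], np.diff(F), [1.0 - F[-1]]))

g_n, g_sn = spike_count_pmf(mu_n), spike_count_pmf(mu_sn)
counts = np.arange(n_max + 1)

# d'_a, eq. (16), from the means and variances of the two count distributions.
m_n, m_sn = (g_n * counts).sum(), (g_sn * counts).sum()
v_n  = (g_n  * (counts - m_n) ** 2).sum()
v_sn = (g_sn * (counts - m_sn) ** 2).sum()
d_a = np.sqrt(2) * (m_sn - m_n) / np.sqrt(v_n + v_sn)
print(f"d'_a ≈ {d_a:.2f}")
```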

When calculated this way, the detectability is found to be non-monotonic with noise intensity (Fig. 6A, solid line). It thus seems that the detectability of the signal displays stochastic resonance, and more than that: d′_a of the spike signal at the optimal noise level is actually higher than d′ of the original signal, s(t) (Fig. 6A, dashed line). Although this may at first seem to indicate that the performance of the spiking model is superior, it in fact only demonstrates that d′ and d′_a are not numerically comparable. To compare detectabilities in a sensible way, we must turn to a measure such as the area below the ROC curve. For the continuous decision variable i, the area below the ROC curve is found from (7). For the spike-count variable n, the area can be found from simple geometry. Only integer criteria can be employed directly, but we know from (3) that neighbouring points in the ROC plot can be connected by straight lines. The area below the ROC curve can thus be found by summing the trapezoid-shaped areas between sequential criteria from 0 spikes to infinity:

Area_s = Σ_{n=0}^{∞} [G_N(n) - G_N(n+1)] · [G_SN(n+1) + (G_SN(n) - G_SN(n+1))/2]

where G_N(n) and G_SN(n) are the cumulated distributions corresponding to g_N and g_SN (i.e. the probabilities of observing n or more spikes).
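The trapezoid sum can be written compactly, with the tail sums G(n) = P(count ≥ n) serving as false-alarm and hit rates for the criteria "n spikes or more". A sketch (Python; the two spike-count distributions are hypothetical):

```python
import numpy as np

def roc_area_from_counts(g_n, g_sn):
    """Trapezoid area below the rating ROC curve built from the spike-count
    distributions g_n and g_sn (arrays over n = 0, 1, 2, ...)."""
    # Tail sums G(n) = P(count >= n), padded with 0 so the curve ends in (0, 0).
    G_N  = np.append(np.cumsum(g_n[::-1])[::-1], 0.0)
    G_SN = np.append(np.cumsum(g_sn[::-1])[::-1], 0.0)
    widths  = G_N[:-1] - G_N[1:]
    heights = G_SN[1:] + 0.5 * (G_SN[:-1] - G_SN[1:])
    return np.sum(widths * heights)

# Hypothetical spike-count distributions (each must sum to 1).
g_n  = np.array([0.60, 0.25, 0.10, 0.04, 0.01])  # noise alone
g_sn = np.array([0.30, 0.30, 0.20, 0.15, 0.05])  # signal plus noise
print(f"area below rating ROC ≈ {roc_area_from_counts(g_n, g_sn):.3f}")
```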


Fig. 6A,B. Performance of the spiking energy detector compared to that of a linear detector. A Detectability of the signal at increasing input noise levels. Solid line, spiking model (d′_a); dashed line, linear model (d′). Calculation of detectability described in text. B Area below the ROC curve, both for the spiking model (solid line) and the linear model (dashed line) at various levels of input noise. Curves for the spiking model in both A and B display a global maximum at a non-zero noise level, indicating stochastic resonance. Signal and noise parameters as in Fig. 5

The area below the ROC curve decreases monotonically with increasing noise for the continuous criterion i (Fig. 6B, dashed line), as expected from the detectability (Fig. 6A, dashed line). The area below the ROC curve for the discrete spike-count criterion shows one global and two local maxima (Fig. 6B, solid line), also paralleling the corresponding detectability curve (Fig. 6A, solid line). Several key points are illustrated in these two curves.

The performance of the spike-count classifier, measured as 2AFC proportion correct (area below the ROC curve), shows a pronounced optimum at a non-zero noise level. At low noise levels the output i of the integrator virtually never exceeds the threshold for eliciting the first spike. The probabilities P(n|N) and P(n|SN) of observing n spikes are thus virtually zero for all n larger than zero, leaving the a posteriori probabilities P(N|0) and P(SN|0) almost equal to the a priori probabilities P(N) and P(SN). Virtually no information about the stimulus is thus transferred to the classifier, which consequently cannot do better than random guessing. As the noise level increases, the likelihood that a spike is elicited also increases. At some optimal noise level a single spike will often be elicited in response to an SN stimulus, whereas no spike will be the predominant output in a noise-only situation, leading to a better separation of the two distributions and hence an increased detectability. At higher noise levels, a spike is also often elicited on noise-only presentations, causing a drop in performance. Since more than one spike can be elicited, the situation is repeated for two and three spikes, leading to a second and third (local) optimum in performance.

Fig. 7A,B. Spike-count distributions and ROC of the A1 auditory receptor cell of a noctuid moth. A Frequency of occurrence of zero to six or more spikes per stimulus presentation. Open bars show spontaneous activity counted in a 50-ms window. Filled bars are spikes evoked by a 5-ms pure-tone signal, also counted in a 50-ms window beginning at stimulus onset. B ROC plot of the data in A. Data were evaluated sequentially with spike-count criteria from one or more spikes to five or more spikes. Criteria are indicated next to the corresponding point of operation. The solid curve is the best-fitting ROC curve assuming Gaussian distributions with equal variances, and corresponds to d′ = 1.3. Data from Tougaard (1999b)

This effect is known as stochastic multiresonance (Vilar and Rubí 1997) and is a consequence of the multiple thresholds for spike generation. However, the variances of the spike-count distributions increase with increasing noise level, leading to a progressively smaller improvement in performance. At higher noise levels there are no more local optima, and performance approaches that of the linear classifier asymptotically. This effect, termed linearisation by noise (Chialvo et al. 1997), follows from the central limit theorem, which dictates that the spike-count distributions approximate Gaussian distributions better and better as the noise variance, and hence the number of degrees of freedom, increases.

For the continuous decision variable i, the situation is drastically different. As the decision variable is continuous, the system is perfectly linear and no lower limit in noise level exists which can prevent the passage of information through to the classifier. As the noise level increases, the signal-to-noise ratio of course decreases, and with it detectability and performance. No local optima are observed, underlining the requirement of a non-linear process in the detector in order to observe stochastic resonance effects.


The most important point from Fig. 6 is that no noise level can be found where the performance of a spike-count classifier exceeds that of the continuous (linear) classifier. Increasing the input noise to overcome a receiver non-linearity is thus suboptimal compared to the linear situation. This again stresses the fact that stochastic resonance cannot do wonders in detection, in the sense that no non-linear system exploiting stochastic resonance can ever do better than the corresponding linear system (Dykman and McClintock 1998; Petracchi 2000; Petracchi et al. 2000).

3.3 Stochastic neuron models

The FitzHugh–Nagumo model of Ward et al. (2002) is deterministic in the sense that the same input always results in the same output. It is thus straightforward to apply the above analysis to this model. It becomes slightly more difficult with the second model of Ward et al. (2002), which is a Bezrukov–Vodyanoy type of model (Bezrukov and Vodyanoy 1997). This model is more realistic due to the presence of a stochastic component. This means that the criterion sets A1 to An are defined only in terms of probabilities, and the exact same input may on repeated presentations be classified sometimes into one class and sometimes into another. This clearly parallels real neurons, where varying numbers of spikes can be elicited by the same repeated stimulus. We can nevertheless still construct a rating ROC curve as above, which will provide information on the average criteria of the classifier. An example of such a rating ROC from a real neuron (an auditory receptor cell from an insect ear) is shown in Fig. 7, together with the spike-count distributions. From this example it is clear that although the classifier as such is inaccessible, and only limited knowledge of the nature of the criterion and decision variable is available, a level of detectability can still be assigned to the stimulus. This is the true detectability of the receiver. An outside observer, however, has no choice but to treat the output number of spikes as the decision variable and base the classification on this, even though this parameter is itself the result of a classification (and thus non-linear) process.
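From rating data of this kind, an equal-variance Gaussian estimate of d′ can be obtained from each point of operation as z(hit rate) - z(false-alarm rate), the standard z-transform underlying fits such as the curve in Fig. 7B. A sketch (Python with SciPy; the spike-count data below are hypothetical, not the moth data of Fig. 7):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical numbers of presentations yielding 0, 1, 2, ... spikes.
spont  = np.array([70, 20, 6, 2, 1, 1])    # spontaneous activity (N)
evoked = np.array([25, 30, 20, 14, 7, 4])  # evoked activity (SN)

# Criteria 'k spikes or more', k = 1..5: hit and false-alarm rates.
fa  = spont[::-1].cumsum()[::-1][1:] / spont.sum()
hit = evoked[::-1].cumsum()[::-1][1:] / evoked.sum()

# Equal-variance Gaussian estimate of d' at each point of operation.
d_primes = norm.ppf(hit) - norm.ppf(fa)
print("d' per criterion:", np.round(d_primes, 2))
print("average d' ≈", round(d_primes.mean(), 2))
```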

4 Conclusion

In Sect. 1 it was asked whether stochastic resonance can improve not only the observed performance of a given system, but also the detectability of the signal itself. The answer for spiking neural models, and likely also for real neurons, is that this is not possible. By using the spikes generated by the neuron as a decision variable, a non-monotonic relation between input noise level and detectability can be generated, indicating a beneficial role of the noise. This local improvement may be demonstrable by an outside observer and may indeed in some situations be beneficial to the receiving central nervous system in the case of a real sensory neuron. However, as the role of the noise is to compensate for the inherent non-linear process of spike generation itself, a happy ending is still within reach, despite the non-monotonic relationship between signal-to-noise ratio and detectability. The detectability as seen by the neuron itself is not influenced by the spike-generation process and decreases monotonically with input noise level, in full correspondence with the central dogma of signal detection theory.

Acknowledgements. The author is financed by the Danish National Research Foundation, Centre for Sound Communication. Special thanks go to Bent Jørgensen for fruitful discussions.

References

Bamber D (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J Math Psychol 12: 387–415
Bezrukov SM, Vodyanoy I (1997) Stochastic resonance in non-dynamical systems without response thresholds. Nature 385: 319–321
Chialvo DR, Longtin A, Müller-Gerking J (1997) Stochastic resonance in models of neuronal ensembles. Phys Rev E 55: 1798–1808
Cohn TE, Green DM, Tanner WPJ (1975) Receiver operating characteristic analysis. Application to the study of quantum fluctuation effects in optic nerve of Rana pipiens. J Gen Physiol 66: 583–616
Collins JJ, Imhoff TT, Grigg P (1996) Noise-enhanced tactile sensation. Nature 383: 770
Douglass JK, Wilkens L, Pantazelou E, Moss F (1993) Noise enhancement of information transfer in crayfish mechanoreceptors by stochastic resonance. Nature 365: 337–340
Dykman MI, McClintock PVE (1998) What can stochastic resonance do? Nature 391: 344
Egan JP (1975) Signal detection theory and ROC analysis. Academic, New York
Gammaitoni L, Hänggi P, Jung P, Marchesoni F (1998) Stochastic resonance. Rev Mod Phys 70: 223–287
Green DM (1964) General prediction relating yes-no and forced-choice results. J Acoust Soc Am 36: 1042
Green DM, Swets JA (1966) Signal detection theory and psychophysics. Wiley, New York
Kaernbach C (1991) Poisson signal-detection theory: link between threshold models and the Gaussian assumption. Percept Psychophys 50: 498–506
Levin JE, Miller JP (1996) Broadband neural encoding in the cricket cercal sensory system enhanced by stochastic resonance. Nature 380: 165–168
Licklider JCR (1964) Theory of signal detection. In: Swets JA (ed) Signal detection and recognition by human observers. Wiley, New York, pp 95–121
Macmillan NA, Creelman CD (1991) Detection theory: a user's guide. Cambridge University Press, Cambridge
Mitaim S, Kosko B (1998) Adaptive stochastic resonance. Proc IEEE 86: 2152–2183
Neyman J, Pearson ES (1933) On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc Lond A 231: 289–337
Peterson WW, Birdsall TG, Fox WC (1954) The theory of signal detectability. Trans IRE Professional Group on Information Theory 4: 171–212
Petracchi D (2000) What is the role of stochastic resonance? Chaos Solitons Fractals 11: 1827–1834
Petracchi D, Gebeshuber IC, DeFelice LJ, Holden AV (2000) Stochastic resonance in biological systems. Chaos Solitons Fractals 11: 1819–1822
Simpson AJ, Fitter J (1973) What is the best index of detectability? Psychol Bull 80: 481–488

Tougaard J (1999a) Stochastic resonance and the role of noise in hearing. In: Proceedings of the 27th Göttingen Neurobiology Conference. Georg Thieme, Stuttgart, p 261
Tougaard J (1999b) Receiver operating characteristics and temporal integration in the moth ear. J Acoust Soc Am 106: 3711–3718
Tougaard J (2000) Stochastic resonance and signal detection in an energy detector – implications for biological receptor systems. Biol Cybern 83: 471–480
Vilar JMG, Rubí JM (1997) Stochastic multiresonance. Phys Rev Lett 78: 2882–2885
Wannamaker RA, Lipshitz SP, Vanderkooy J (2000) Stochastic resonance as dithering. Phys Rev E 61: 233–236
Ward LM, Neiman A, Moss F (2002) Stochastic resonance in psychophysics and in animal behavior. Biol Cybern
