IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 5, MAY 2011
A Parametric Copula-Based Framework for Hypothesis Testing Using Heterogeneous Data

Satish G. Iyengar, Student Member, IEEE, Pramod K. Varshney, Fellow, IEEE, and Thyagaraju Damarla, Senior Member, IEEE
Abstract—We present a parametric framework for the joint processing of heterogeneous data, specifically for a binary classification problem. Processing such a data set is not straightforward as heterogeneous data may not be commensurate. In addition, the signals may also exhibit statistical dependence due to overlapping fields of view. We propose a copula-based solution to incorporate statistical dependence between disparate sources of information. The important problem of identifying the best copula for binary classification problems is also addressed. Computer simulation results are presented to demonstrate the feasibility of our approach. The method is also tested on real data provided by the National Institute of Standards and Technology (NIST) for a multibiometric face recognition application. Finally, performance limits are derived to study the influence of statistical dependence on classification performance.

Index Terms—Copula theory, hypothesis testing, Kullback-Leibler divergence, multibiometrics, multimodal signals, multisensor fusion, statistical dependence.
I. INTRODUCTION
Fig. 1. A multisensor system with common region of interest (ROI). Different shapes for the sensors denote different sensing capabilities.
Multimodal or heterogeneous signal processing offers several advantages and new possibilities for system improvement in many practical applications. For example, audio-visual interactions are particularly significant in human speech communication. Enhancement of speech intelligibility (in the presence of background noise) with the help of visual cues from lip movements is a prime example of such joint information processing. The use of visual cues in addition to the acoustic signal increases redundancy and thus makes speech intelligibility more robust to noise. In medical imaging, images of multiple modalities, such as MRI and CT, are acquired from the same set of subjects. Even within MRI, structural images of high resolution are acquired along with low-resolution images of brain chemistry or blood flow. Thus, the level of detail and
the type of information present in one modality may be different from the other, and these complementarities motivate the acquisition of multimodal images. Another example, discussed in more detail in Section V, is the design of multibiometric systems, where disparate human signatures such as face, iris, and fingerprints are combined for verifying the identity of the user.

In this paper, we propose a parametric framework for fusing information from heterogeneous sources for binary hypothesis testing problems. Our goal is to develop a detection methodology for the canonical heterogeneous system shown in Fig. 1. We also study the influence of intermodal dependence on classification performance. Our proposed solution uses the statistical theory of copulas to model the joint distribution of data from heterogeneous sources.

The topic of heterogeneous signal processing has been previously considered in the literature. Speaker detection is addressed in [1], where a time-delay neural network is used to learn the correlation between a single microphone signal and a camera image. In [2], the tracking problem is addressed by modeling the audio-video interaction as a Bayesian network. The speaker tracking problem is approached as a particle filtering problem in [3]. Kidron et al. [4] describe a canonical correlation analysis (CCA)-based approach to fuse audio-visual information to localize pixels in images (sampled video signal) that correspond to the acquired audio signal. In [5], the authors use CCA to extract audio and lip features for speaker identification. These methods,
however, fail to extract any nonlinear dependence information. Information theoretic approaches [6]–[8] may be more suitable. Besson et al. [7] use an information theoretic framework to extract audio features with respect to corresponding video streams. Fisher and Darrell [8] consider the problem of speaker association using audio-visual information fusion and use mutual information as the metric. In [9], passive infrared (PIR) sensors were used in conjunction with a video sensor to resolve changes in the direction of motion during moments of occlusion for a people tracking application.

Most studies discussed above consider a data association problem where the main interest is in the detection of changes in statistical dependence between heterogeneous sources (e.g., CCA or mutual information-based methods). We consider a more general formulation in this paper. Our goal is to detect both the changes in statistical dependence (common information) between heterogeneous sources as well as variations in the marginal distributions (complementary information).

One of the main difficulties limiting the design of heterogeneous systems is the scarcity of good models to describe the joint statistics of heterogeneous data [8]. Random variables that characterize the data from each source in a heterogeneous signal processing system may follow disparate probability distributions. This is because of the differences in the physics that govern each modality. There may also be differences in signal dimensionality, support, and sampling rate requirements across modalities. Moreover, in most applications, the signals share a common source and thus may exhibit statistical dependence. Thus, a good statistical model for heterogeneous data must meet the following two requirements: 1) the model should allow for disparate marginals, and 2) the model must also account for any statistical dependence that may be present between the marginals. Often one chooses to assume simple tractable models such as multivariate Gaussianity or intermodal independence (also called the product model), leading to suboptimal solutions. In the following sections, we address these modeling issues in detail and propose another approach based on copula theory. While satisfying the two modeling requirements discussed above, the copula approach also allows us to decouple the common and complementary information. We also consider the important problem of copula misspecification and its effect on classification performance, and derive application-specific rules for choosing the best copula.

The use of copula functions is quite prevalent in econometrics and finance. However, it is only recently that their usefulness and applicability to signal processing problems has been realized. For example, in [10], the authors consider the problem of tracking a colored object in video. Application of copulas for texture classification and segmentation of multicomponent images has been shown in [11] and [12], respectively. In [13], we considered the problem of detecting footsteps, where CCA was used in conjunction with copulas to fuse acoustic and seismic measurements. However, the problem of copula selection was not addressed in any of these studies. Copula theory was used to detect changes between two remotely sensed images before and after the occurrence of an event [14]. Here, the authors choose a copula by visual inspection using a graphical method based on
K-plots [15]. In this work, we adopt a more analytical approach which allows us to exactly link classification performance to copula fitting.

The rest of the paper is organized as follows. Heterogeneous data modeling using copulas is discussed in detail in Section II. A binary hypothesis testing problem is then formulated in Section III and a copula-based detector is designed. The important problem of identifying the best copula is considered in Section IV. One of the main contributions of this paper is the derivation of application-specific copula selection rules based on criteria such as the overall loss in detection power (due to model mismatch) and the area under the receiver operating characteristic (ROC). A multibiometrics application is presented in Section V to evaluate the theory developed in the previous sections on real data. Asymptotic performance limits are derived in Section VI to study the impact of intermodal dependence on classification performance before concluding in Section VII.

II. STATISTICAL MODELING OF HETEROGENEOUS DATA USING COPULAS

The discussion in the previous section motivates the following general definition for heterogeneous random vectors that would arise with multimodal signals.

Definition 1: A random vector $\mathbf{Z} = [Z_1, Z_2, \ldots, Z_N]^T$ governing the joint statistics of an $N$-variate data set is termed heterogeneous if the marginals $Z_n$ are nonidentically distributed.¹

Often statistical independence is assumed when dealing with heterogeneous data sets for mathematical tractability, and the joint probability density function (pdf) $f(z_1, \ldots, z_N)$ of the heterogeneous random vector is approximated as a product of the marginals

$$f(z_1, \ldots, z_N) \approx \prod_{n=1}^{N} f_n(z_n). \tag{1}$$

We will refer to the model in (1) as the product model or the independence model. As is evident, the product model allows for disparate marginals but does not account for any joint statistical behavior that may be present across the disparate marginals. Alternatively, one may model $f(z_1, \ldots, z_N)$ using a multivariate Gaussian pdf with a covariance matrix $\Sigma$. Two limitations plague this approach: (a) the covariance matrix can characterize only linear relations between the variables and is, thus, a weak measure of dependence when dealing with non-Gaussian random variables [16], and (b) the assumption of joint normality constrains the marginals $Z_n$ to follow Gaussian distributions. Yet another approach for modeling heterogeneous observations could be the use of the Nataf transform [17], where one transforms the marginals to the standard normal space, i.e., $y_n = \Phi^{-1}(F_n(z_n))$, where $\Phi(\cdot)$ is the standard normal cumulative distribution function (cdf) and $F_n(\cdot)$ is the marginal cdf of $Z_n$. The joint pdf is then estimated as

$$f(z_1, \ldots, z_N) = \frac{\phi_N(y_1, \ldots, y_N; \Sigma_y)}{\prod_{n=1}^{N} \phi(y_n)} \prod_{n=1}^{N} f_n(z_n) \tag{2}$$

where $\phi_N(\cdot; \Sigma_y)$ and $\phi(\cdot)$ are the $N$-dimensional multivariate and univariate standard normal pdfs, respectively, and $\Sigma_y$ is the covariance matrix in the transformed (normal) space. Copula theory, discussed next, provides a generalized framework that includes the Nataf transform as a special case.

¹Definition 1 is, of course, inclusive of the special case when the marginals are identically distributed and/or are statistically independent. It also encompasses the case when the signals, although sharing a common modality (e.g., two acoustic sources or two video sources), may exhibit different statistics (due to different locations, signal strengths, etc.). Hence, we use the term "heterogeneous" instead of "multimodal" in this paper.

TABLE I
COPULA FUNCTIONS USED IN THIS PAPER
A. Copula Theory

Copulas are functions that couple multivariate joint distribution functions to their component marginal distribution functions [18]–[20]. The following theorem by Sklar is central to the statistical theory of copulas.

Theorem 1 (Sklar's Theorem): Let $F$ be an $N$-dimensional cdf with continuous marginal cdfs $F_1, \ldots, F_N$. Then there exists a unique copula $C$ such that for all $(z_1, \ldots, z_N)$,

$$F(z_1, \ldots, z_N) = C\big(F_1(z_1), \ldots, F_N(z_N)\big). \tag{3}$$

Note that the copula function $C$ is itself a cdf with uniform marginals (by the probability integral transform). The joint density can now be obtained by taking the $N$th-order derivative of (3),

$$f(z_1, \ldots, z_N) = \left(\prod_{n=1}^{N} f_n(z_n)\right) c\big(F_1(z_1), \ldots, F_N(z_N)\big) \tag{4}$$

where $c(\cdot)$ denotes the copula density. Thus, the copula density $c$ weights the product distribution appropriately to incorporate dependence between the random variables $Z_1, \ldots, Z_N$.

Theorem 1 also admits the following converse, which is especially useful in practice when the true distribution $f$ (and hence the true copula $C$) is typically unknown. It allows one to construct a statistical model by considering, separately, the univariate behavior of the underlying marginals and the dependence structure specified by some copula $C^*$. This is well suited for modeling a heterogeneous random vector where different distributions might be needed for each marginal.

Theorem 2: If $F_1, \ldots, F_N$ are univariate marginal cdfs and if $C^*$ is an $N$-dimensional copula, then the function

$$F^*(z_1, \ldots, z_N) = C^*\big(F_1(z_1), \ldots, F_N(z_N)\big) \tag{5}$$

is a valid $N$-variate cdf with marginals $F_1, \ldots, F_N$. A copula-based parametric model can be derived by taking the $N$th-order derivative of (5) to obtain

$$\hat{f}(z_1, \ldots, z_N) = \left(\prod_{n=1}^{N} f_n(z_n)\right) c^*\big(F_1(z_1), \ldots, F_N(z_N)\big) \tag{6}$$

$$= \left(\prod_{n=1}^{N} f_n(z_n)\right) c^*\big(F_1(z_1), \ldots, F_N(z_N) \mid \phi\big) \tag{7}$$

where $\phi$ denotes the dependence parameter(s) of the chosen copula density $c^*$. From (2) and (6), it is evident that the Nataf transform is equivalent to the copula approach with the Gaussian copula density as the weighting function. Several copula functions other than the Gaussian copula, with different dependence characteristics, have also been defined in [19] and [20]. Table I provides a list of the copula functions considered in this paper.

It may also be required to estimate the parameters of the chosen copula function (e.g., the dependence parameters listed in Table I) from the acquired data. These parameters control the shape of the copula function and can be estimated by exploiting their relations to other nonparametric measures of association such as Kendall's $\tau$ or Spearman's $\rho$. We, however, use a maximum likelihood (ML)-based approach known as the method of inference functions for margins (IFM) [21] to estimate the copula dependence parameters (see the Appendix).
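As a purely illustrative sketch of the model in (4)–(7), the Python code below evaluates a heterogeneous joint log-pdf as the sum of the marginal log-pdfs plus a Gaussian-copula log-density; with the Gaussian copula as the weighting function this coincides with the Nataf construction in (2). All function names, marginal choices, and the correlation matrix here are assumptions made for the example, not taken from the paper.

```python
import numpy as np
from scipy import stats

def gaussian_copula_logpdf(u, R):
    """Log-density of the Gaussian copula with correlation matrix R,
    evaluated at uniform variates u of shape (K, N): the weighting
    factor that appears in (4)."""
    q = stats.norm.ppf(np.clip(u, 1e-12, 1.0 - 1e-12))   # probit transform
    _, logdet = np.linalg.slogdet(R)
    quad = np.einsum('ki,ij,kj->k', q, np.linalg.inv(R) - np.eye(R.shape[0]), q)
    return -0.5 * (logdet + quad)

def heterogeneous_joint_logpdf(z, marginals, R):
    """Joint log-pdf of a heterogeneous vector per (6): marginal log-pdfs
    plus the copula log-density. marginals is a list of frozen scipy
    distributions, one per component of z (shape K x N)."""
    logf = sum(m.logpdf(z[:, n]) for n, m in enumerate(marginals))
    u = np.column_stack([m.cdf(z[:, n]) for n, m in enumerate(marginals)])
    return logf + gaussian_copula_logpdf(u, R)

# Illustrative usage: disparate marginals coupled by one copula; the
# evaluation points are drawn independently here, only to exercise the code.
marginals = [stats.norm(0, 1), stats.expon(scale=1.0), stats.beta(2, 5)]
R = np.array([[1.0, 0.4, 0.2], [0.4, 1.0, 0.3], [0.2, 0.3, 1.0]])
z = np.column_stack([m.rvs(size=5, random_state=i) for i, m in enumerate(marginals)])
print(heterogeneous_joint_logpdf(z, marginals, R))
```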
It is evident that model mismatch errors are introduced when $c^* \neq c$, i.e., when the selected copula does not represent the true dependence structure. This leads to suboptimal performance. An important question then is: how does one choose $c^*$ from a finite set (say $\mathcal{C}$) of copula densities and, in particular, for the specific problem at hand? A second source of mismatch is related to uncertainties in the marginal distributions. Copula parameters can still be estimated in that case using empirical cdfs, $\hat{F}_n$; this approach is known as the canonical maximum likelihood (CML) method [22]. Uncertain marginals naturally degrade the quality of the copula parameter estimates. However, given a reasonable amount of training data, it may still be possible to obtain fairly accurate and consistent estimates of the marginal distributions. In this paper, we assume full knowledge of the marginal statistics. Thus, mismatch, if any, is only due to copula misspecification. This allows us to test the efficacy of copula modeling relative to the simpler product model in the context of binary classification. Note that such a comparison would not be possible in the absence of parametric forms for the marginal pdfs. The assumption of known marginal pdfs allows us to compare the copula approach with the best performing product model (since there is no uncertainty in the marginal pdfs). Further, as we show in the following sections, the restriction to the known-marginal-pdfs case also allows us to theoretically link errors due to copula misspecification to classification performance.

In the following, we formulate a binary classification problem where, given the measurements, one needs to decide which of the two hypotheses, $H_0$ or $H_1$, is true. We first consider the design of the test assuming knowledge of the true copula density and then consider the important issue of identifying the best copula for binary classification problems in Section IV.

III. THE HYPOTHESIS TESTING PROBLEM

Consider the following problem: Let $\mathbf{z}^K = \{\mathbf{z}_k\}_{k=1}^{K}$ denote a sequence of i.i.d. observations from an $N$-variate heterogeneous random vector $\mathbf{Z}$ with pdfs $f(\mathbf{z} \mid H_0)$ and $f(\mathbf{z} \mid H_1)$ under hypotheses $H_0$ and $H_1$, respectively. An optimal test in both the Neyman–Pearson (NP) and the Bayesian sense computes the test statistic $\Lambda(\mathbf{z}^K)$, the log-likelihood ratio (LLR), and decides in favor of $H_1$ when the ratio is larger than a threshold $\tau$,

$$\Lambda(\mathbf{z}^K) = \log \frac{f(\mathbf{z}^K \mid H_1)}{f(\mathbf{z}^K \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \tau. \tag{8}$$

The two possible errors associated with a binary classification problem ($H_0$ versus $H_1$) as defined above are (a) the Type I error or the false alarm rate, $P_F$, and (b) the Type II error or the probability of miss, $P_M$. Using the copula representation for the pdfs $f(\mathbf{z} \mid H_0)$ and $f(\mathbf{z} \mid H_1)$ as in (4), the copula-based test statistic becomes

$$\Lambda(\mathbf{z}^K) = \sum_{k=1}^{K} \sum_{n=1}^{N} \log \frac{f_n(z_{nk} \mid H_1)}{f_n(z_{nk} \mid H_0)} + \sum_{k=1}^{K} \log \frac{c_1\big(F_1(z_{1k} \mid H_1), \ldots, F_N(z_{Nk} \mid H_1)\big)}{c_0\big(F_1(z_{1k} \mid H_0), \ldots, F_N(z_{Nk} \mid H_0)\big)} \tag{9}$$

where the copula density $c_j$ denotes the statistical dependence structure between the variables under the hypothesis $H_j$, and $f_n(\cdot \mid H_j)$ (respectively, $F_n(\cdot \mid H_j)$) denotes the marginal pdf (respectively, cdf) of $Z_n$ under the hypothesis $H_j$.

It is interesting to note the form of the test statistic in (9). The first term,

$$\sum_{k=1}^{K} \sum_{n=1}^{N} \log \frac{f_n(z_{nk} \mid H_1)}{f_n(z_{nk} \mid H_0)}, \tag{10}$$

corresponds to the differences in the marginal statistics of each modality (or the product distribution) across the two hypotheses, while the cross-modal dependence and interactions are included in the second term. Equation (10) is also the test statistic obtained when (a) the variables $Z_1, \ldots, Z_N$ are statistically independent or (b) dependence between them is deliberately neglected for analytical simplicity or due to lack of knowledge of the dependence structures $c_0$ and/or $c_1$. The latter case naturally results in suboptimal performance. In contrast, we replace the unknown dependence structures $c_0$ and/or $c_1$ by their estimates $c_0^*$ and/or $c_1^*$, respectively. We derive methods to choose the copula densities $c_0^*$ and $c_1^*$ in the next section.
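A minimal sketch of how the detector in (8)–(9) could be computed is given below. It assumes that the per-hypothesis marginals are available as frozen scipy.stats distributions and that the two copula log-densities are supplied as callables (for the product model, the callable can simply return zeros); the function and argument names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def copula_llr(z, marg0, marg1, logc0, logc1):
    """Copula-based LLR of (9) for a block of K i.i.d. observations z (K x N).
    marg0/marg1: lists of frozen scipy.stats marginals under H0 and H1.
    logc0/logc1: log copula densities evaluated on the probability-integral-
    transformed data (input K x N, output length-K array).
    The first term collects marginal evidence, the second dependence evidence."""
    marg_term = sum(m1.logpdf(z[:, n]) - m0.logpdf(z[:, n])
                    for n, (m0, m1) in enumerate(zip(marg0, marg1)))
    u0 = np.column_stack([m.cdf(z[:, n]) for n, m in enumerate(marg0)])
    u1 = np.column_stack([m.cdf(z[:, n]) for n, m in enumerate(marg1)])
    dep_term = logc1(u1) - logc0(u0)
    return float(np.sum(marg_term + dep_term))   # decide H1 if this exceeds tau

# For the product model under H0, for instance: logc0 = lambda u: np.zeros(len(u))
```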
IV. COPULA SELECTION FOR HYPOTHESIS TESTING PROBLEMS

Approaches for copula selection in the past have mostly focused on data modeling; i.e., they select the copula that best fits the data (e.g., [23] and [24]). We, however, propose an application-specific copula selection rule in this paper. For example, the performance of a binary hypothesis test, which is of interest in this paper, depends only on the statistics of the test statistic $\Lambda$ and the resultant hyperplane separating the two hypotheses, rather than on the density estimates of the observations under each hypothesis. Thus, it is possible that even though the density estimates may be coarse, the resultant classification rates may still be within the required limits ([25, p. 124]). Thus, attempts to model pdfs with high accuracy may prove to be an overkill for classification problems. Another advantage of such an application-specific approach is that it allows for the exact quantification of the performance gain (if any), and thus one can study the resultant gain with respect to the added detector complexity due to copula processing.

Our goal, in this section, is to derive a copula selection rule that results in the best classification performance. The most appropriate metric for comparing different models in the context of a binary hypothesis testing problem is the ROC. However, such a comparison would require the derivation of the pdfs of the test statistic for each copula model in contention. This is difficult and often not feasible in most practical scenarios. In the following, we therefore resort to measures that are more tractable, while also providing some performance quantification. We first consider the case when the pdf under one of the hypotheses is known.

A. Distribution Under One of the Hypotheses Is Known

In problems where the distribution of the measurements under one of the hypotheses (say $H_0$) is known, it has been shown in [26] that the loss in detection power when $f(\mathbf{z} \mid H_1)$ (the pdf under $H_1$) in (8) is misspecified as $\hat{f}(\mathbf{z} \mid H_1)$ is equal to the Kullback–Leibler
(KL) divergence², $\Delta P_D = D\big(f(\mathbf{z} \mid H_1) \,\|\, \hat{f}(\mathbf{z} \mid H_1)\big)$. Note that $\Delta P_D$ is, in fact, a global measure obtained by integrating over all possible threshold values $\tau$. In other words, $\Delta P_D$ is the amount by which the area under the $P_D$ versus $\tau$ curve, $A(P_D)$, decreases due to the mismatch in $f(\mathbf{z} \mid H_1)$. Our goal is to select the copula density $c_1^*$ such that $\Delta P_D$ is minimized.

For the product model (i.e., when statistical independence is assumed), the loss in detection power is given as

$$\Delta P_D^{\mathrm{prod}} = D\Big(f(\mathbf{z} \mid H_1) \,\Big\|\, \prod_{n=1}^{N} f_n(z_n \mid H_1)\Big) = I(\mathbf{Z} \mid H_1) \tag{11}$$

where $I(\mathbf{Z} \mid H_1)$ is the multiinformation [27] between the variables $Z_1, \ldots, Z_N$ under $H_1$. The loss when the true dependence structure $c_1$ is approximated by a copula density $c_1^*$ can be derived as

$$\Delta P_D^{c^*} = I(\mathbf{Z} \mid H_1) - \mathbb{E}_{f(\mathbf{z} \mid H_1)}\big[\log c_1^*\big(F_1(z_1 \mid H_1), \ldots, F_N(z_N \mid H_1)\big)\big] = \Delta P_D^{\mathrm{prod}} - \hat{I}_{c^*}(\mathbf{Z} \mid H_1) \tag{12}$$

where $\hat{I}_{c^*}(\mathbf{Z} \mid H_1)$ can be interpreted as an estimate of $I(\mathbf{Z} \mid H_1)$ obtained when the copula density $c_1^*$ is used. Note that $\Delta P_D^{c^*} \geq 0$ since $D(\cdot \,\|\, \cdot) \geq 0$. It is clear from (11) and (12) that

$$\Delta P_D^{c^*} \leq \Delta P_D^{\mathrm{prod}} \iff \hat{I}_{c^*}(\mathbf{Z} \mid H_1) \geq 0. \tag{13}$$

Further, of all the copula functions in the set $\mathcal{C}$ that satisfy the improvability condition (13), the copula density

$$c_{1,\mathrm{opt}}^* = \arg\max_{c^* \in \mathcal{C}} \; \mathbb{E}_{f(\mathbf{z} \mid H_1)}\big[\log c^*\big(F_1(z_1 \mid H_1), \ldots, F_N(z_N \mid H_1)\big)\big] \tag{14}$$

results in the maximum reduction in performance loss [from (12)] and is thus the best choice to model the dependence structure (under $H_1$). The corresponding performance gain (increase in $A(P_D)$) over the product model is equal to

$$\mathrm{Gain} = A_{c^*}(P_D) - A_{\mathrm{prod}}(P_D) \tag{15}$$

$$= \Delta P_D^{\mathrm{prod}} - \Delta P_D^{c^*} \tag{16}$$

$$= \hat{I}_{c^*}(\mathbf{Z} \mid H_1). \tag{17}$$

However, note that it is not possible to evaluate (14) and (17) exactly, since they require an expectation with respect to the true joint pdf $f(\mathbf{z} \mid H_1)$, which is unknown. We therefore approximate them using the sample expectation. For example, the selection rule (14) now becomes

$$c_{1,\mathrm{opt}}^* = \arg\max_{c^* \in \mathcal{C}} \; \frac{1}{K} \sum_{k=1}^{K} \log c^*\big(F_1(z_{1k} \mid H_1), \ldots, F_N(z_{Nk} \mid H_1)\big). \tag{18}$$

²$D(p, q) = \int p(\mathbf{z}) \log\big(p(\mathbf{z})/q(\mathbf{z})\big)\, d\mathbf{z}$.

Fig. 2. Scatter plot of 500 realizations. Nonzero dependence between the observations under $H_1$ (Column 1) is evident from the figure.
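A minimal sketch of the sample-based selection rule (18), together with the improvability check (13), is given below: each candidate copula is summarized by a fitted log-density callable, and the rule keeps the candidate with the largest nonnegative sample-mean log-copula-likelihood on $H_1$ training data. The helper names are illustrative; the Gaussian-copula log-density sketched in Section II could serve as one of the candidates.

```python
import numpy as np

def select_copula(u_h1, candidates):
    """Selection rule (18): u_h1 is a K x N array of probability-integral-
    transformed H1 training data, u_h1[k, n] = F_n(z_nk | H1); candidates maps
    a copula name to a fitted log-density callable log_c(u) -> length-K array.
    Returns the chosen copula and the estimated gain (multiinformation) per copula."""
    gains = {name: float(np.mean(log_c(u_h1))) for name, log_c in candidates.items()}
    admissible = {k: v for k, v in gains.items() if v >= 0.0}   # improvability (13)
    best = max(admissible, key=admissible.get) if admissible else None
    return best, gains   # best is None -> keep the product model
```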
Next, we present an illustrative example.

1) Example 1: Consider a network of three heterogeneous sensors monitoring a region of interest (ROI) for the presence of a target. The sensors communicate their observations to a fusion center whose role is to combine the received information and make a global decision regarding the presence or absence of the target. Sensors usually transmit a compressed version of their observations due to limitations on the available transmission bandwidth and power. We assume analog compression rules at each sensor so that the marginal random variables are continuous under both hypotheses.³ As an example, we assume that the heterogeneous data $Z_1$, $Z_2$, and $Z_3$ received at the fusion center follow, under hypothesis $H_j$, Gaussian, exponential, and beta distributions, respectively,

$$Z_1 \sim \mathcal{N}(\mu_j, \sigma^2), \quad Z_2 \sim \mathrm{Exp}(\lambda_j), \quad Z_3 \sim \mathrm{Beta}(a_j, b_j) \tag{19}$$

where $j \in \{0, 1\}$ indexes the hypothesis. Further, without loss of generality, we assume $c_0 = 1$ (the product copula under $H_0$), i.e., the sensor observations are statistically independent under $H_0$. A common random phenomenon causes a change in the statistics of the three sensors' data, and the sensor measurements are thus statistically dependent under hypothesis $H_1$.
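The paper's own construction of such dependent data uses the auxiliary random variables of (20)–(21) described below. Purely as an alternative illustration, the sketch here draws data with marginals of the form in (19) through a Gaussian copula; the parameter values follow the Fig. 3 caption as reconstructed above, $\lambda$ is interpreted as a rate, and everything else is an assumption for the example.

```python
import numpy as np
from scipy import stats

def sample_dependent_marginals(K, R, rng=None):
    """Draw K samples of (Z1, Z2, Z3) with Gaussian, exponential, and beta
    marginals (as in (19) under H1) and dependence induced by a Gaussian
    copula with correlation matrix R. This is NOT the paper's auxiliary-
    variable construction of (20)-(21); it is only an illustration."""
    rng = np.random.default_rng(rng)
    g = rng.multivariate_normal(np.zeros(3), R, size=K)   # correlated Gaussians
    u = stats.norm.cdf(g)                                 # dependent uniforms
    z1 = stats.norm.ppf(u[:, 0], loc=1.1, scale=1.0)      # Gaussian marginal (mu_1 = 1.1)
    z2 = stats.expon.ppf(u[:, 1], scale=1.0 / 2.2)        # exponential, rate lambda_1 = 2.2
    z3 = stats.beta.ppf(u[:, 2], a=2.0, b=1.0)            # beta(a_1 = 2, b_1 = 1)
    return np.column_stack([z1, z2, z3])
```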
³The compression rule at each sensor can be thought of as a local feature extraction step operating on the individual signals.
Fig. 3. $P_D$ versus $\tau$ obtained for different copula functions. Parameters for the marginal pdfs are $(\mu_0 = 1, \mu_1 = 1.1)$, $(\lambda_0 = 2.1, \lambda_1 = 2.2)$, $(b_0 = b_1 = 1, a_0 = 1.8, a_1 = 2)$. The shaded region represents the increase in the area (over the product model) when the FGM copula is used to model dependence under $H_1$.
Fig. 4. Multiinformation estimates (averaged over 5000 Monte Carlo trials) with symmetric one standard deviation error bars and Gain (15) computed using the trapezoidal rule, for all copula functions. The number of trials (in %) for each copula for which the improvability condition (13) was NOT satisfied is also indicated.
In order to generate dependent data with the marginals given in (19), we define two auxiliary random variables and express the sensor observations as functions of them, as given in (20) and (21). Statistical dependence between the sensor observations is evident from the scatter plot shown in Fig. 2. Their joint pdf under $H_1$ can be derived in closed form, where $\mathbb{1}(\cdot)$ denotes the indicator function and $\Gamma(\cdot)$ the gamma function, and the optimal test at the fusion center can then be obtained as the LLR test in (22).

Fig. 5. Monte Carlo-based receiver operating characteristics. Parameters for the marginal pdfs are $(\mu_0 = 1, \mu_1 = 1.1)$, $(\lambda_0 = 2.1, \lambda_1 = 2.2)$, $(b_0 = b_1 = 1, a_0 = 1.8, a_1 = 2)$.

In cases where the pdf under $H_1$ is unknown, we use a copula-based test in place of (22) at the fusion center. We set the decision window to 20 samples and plot $P_D$ (averaged over 5000 Monte Carlo trials) as a function of the decision threshold $\tau$ for all copula functions in Fig. 3. The plot for the optimal test is also included. The shaded region in the figure represents the increase in $A(P_D)$ (over the product model) when the fusion rule based on the FGM copula is employed. Further, as derived in (17), the increase (or decrease) in $A(P_D)$ is exactly equal to the multiinformation estimate $\hat{I}_{c^*}(\mathbf{Z} \mid H_1)$. This is confirmed in Fig. 4, which shows that the sample-based estimates of $\hat{I}_{c^*}(\mathbf{Z} \mid H_1)$ and the Gain (15) computed using the trapezoidal rule are more or less equal for all copula functions.
The figure also shows that the selection rule (18) chooses the Student's t-copula to model dependence under $H_1$.

Receiver operating characteristic (ROC) curves obtained for the optimal test and the tests using different copula densities are shown in Fig. 5. Note that the optimal test achieves a zero false alarm rate for all detector thresholds. This is because it is highly likely that, for at least one of the observations belonging to $H_0$, the joint pdf under $H_1$ evaluates to zero; thus, the false alarm rate remains zero as $\tau$ increases. The number of trials for which the improvability condition was not satisfied is indicated in Fig. 4. For the Clayton copula model, $\hat{I}_{c^*}(\mathbf{Z} \mid H_1)$ was less than zero for 4833 of the 5000 trials (96.7% of the trials). For the other three copulas, $\hat{I}_{c^*}(\mathbf{Z} \mid H_1)$ was almost never less than zero during the 5000 trials. This is the cause for the Clayton copula-based ROC being lower than that obtained using the product model (Fig. 5). The mismatch between the actual dependence structure and the Clayton copula density is severe enough that it results in a nonconcave ROC. It is thus important to develop good copula selection rules. As evident from Fig. 4, the selection rule derived in this paper [see (18)] correctly discards the Clayton copula and identifies the Student's t-copula as the best choice among the four copulas tested in this example.

B. Distributions Under Both $H_0$ and $H_1$ Unknown

When the distributions under both hypotheses are unknown, exact computation of $\Delta P_D$ and the Gain is not possible, and we propose a copula selection procedure based on a simpler performance criterion, the area under the ROC curve (AUC). The AUC is defined as

$$\mathrm{AUC} = \int_{0}^{1} P_D \, dP_F \tag{23}$$

where $0 \leq \mathrm{AUC} \leq 1$ as it is a portion of the unit square. Application of the AUC as a performance measure can be found in many fields such as mathematical psychology, psychophysics, medical diagnostics, signal processing, and machine learning (e.g., [28]–[30]). The measure is completely independent of the threshold $\tau$. Under the assumption of normality for the pdfs of the test statistic $\Lambda$, it was shown in [31] that

$$\mathrm{AUC}_G = \Phi(\Delta) \tag{24}$$

where the subscript $G$ denotes the normality assumption and

$$\Delta = \frac{\mathbb{E}[\Lambda \mid H_1] - \mathbb{E}[\Lambda \mid H_0]}{\sqrt{\mathrm{Var}[\Lambda \mid H_0] + \mathrm{Var}[\Lambda \mid H_1]}}. \tag{25}$$

From (25), it is clear that $\mathrm{AUC}_G$ is a monotonic function of $\Delta$. We thus have a selection rule where we choose the copula densities $c_0^*$ and $c_1^*$ such that $\Delta$ is maximized. Note that the $\Delta$-based selection rule is more accurate in the case of large $K$, when the pdfs of the test statistic under both hypotheses approach normality (as $K \to \infty$). For the case when the Gaussian assumption for the test statistic is not valid, one can use the Wilcoxon–Mann–Whitney (WMW) statistic, a nonparametric approach to estimating the AUC [28], [32]. Given $\{\Lambda_{0i}\}_{i=1}^{K_0}$ and $\{\Lambda_{1j}\}_{j=1}^{K_1}$, the LLR scores under $H_0$ and $H_1$, respectively, the WMW statistic (also known as the U-statistic) is given as

$$\widehat{\mathrm{AUC}} = \frac{1}{K_0 K_1} \sum_{i=1}^{K_0} \sum_{j=1}^{K_1} \mathbb{1}\big(\Lambda_{1j} > \Lambda_{0i}\big). \tag{26}$$

Note that (26) is the estimate of the probability $P(\Lambda_1 > \Lambda_0)$ [28]. Next, we present an illustrative example.

TABLE II
AUC ESTIMATED USING THE WILCOXON-MANN-WHITNEY STATISTIC

Fig. 6. ROC curves obtained when Clayton and Gumbel copula densities model the dependence under $H_0$ and $H_1$, respectively.

1) Example 2: We consider a two-sensor example where the observations $Z_1$ and $Z_2$ follow Gaussian and exponential distributions, respectively, under the two competing hypotheses. In order to generate dependent data $Z_1$ and $Z_2$ with the prescribed marginals, we use the same method as described in Example 1; the resulting variables are given in (27).
Fig. 7. ROC curves obtained when Gaussian and Clayton copula densities model the dependence under $H_0$ and $H_1$, respectively.
However, unlike Example 1, we consider dependence under both hypotheses and use AUC-based copula selection. For each copula pairing $(c_0^*, c_1^*)$, the AUC estimate $\widehat{\mathrm{AUC}}$ is evaluated using (26). We also calculate the difference $\widehat{\mathrm{AUC}} - \widehat{\mathrm{AUC}}_{\mathrm{prod}}$, where $\widehat{\mathrm{AUC}}_{\mathrm{prod}}$ is the AUC estimated using product models under both $H_0$ and $H_1$, i.e., $c_0^* = c_1^* = 1$. Table II contains $\widehat{\mathrm{AUC}}$, with the corresponding difference in parentheses, for several copula pairings. For example, when Clayton and Gumbel copula densities are used to model dependence under $H_0$ and $H_1$, respectively, the AUC is approximately 64%. Further, a negative difference suggests that the product model outperforms the copula approach. This is confirmed in Fig. 6, which shows the ROCs obtained using both the copula and product models. On the contrary, it can be seen from Table II that the copula approach, when Gaussian and Clayton copula densities model the dependence under $H_0$ and $H_1$, respectively, performs better than the product model (in terms of AUC). The corresponding ROC is shown in Fig. 7.
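A minimal sketch of the WMW estimator in (26), computed directly from the two sets of LLR scores, is shown below. The tie-handling term is a common convention added here for robustness and is not part of (26); the function name is illustrative.

```python
import numpy as np

def wmw_auc(llr_h0, llr_h1):
    """Wilcoxon-Mann-Whitney estimate of the AUC per (26): the fraction of
    (H0, H1) score pairs for which the H1 LLR exceeds the H0 LLR."""
    llr_h0 = np.asarray(llr_h0)[:, None]      # shape (K0, 1)
    llr_h1 = np.asarray(llr_h1)[None, :]      # shape (1, K1)
    wins = (llr_h1 > llr_h0).mean()
    ties = (llr_h1 == llr_h0).mean()          # count ties as one half
    return wins + 0.5 * ties
```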
C. Further Discussion on the $\Delta P_D$- and AUC-Based Copula Selection Methods

It is important to note that, while $\Delta P_D$ and the AUC are attractive because of their simplicity, they are still heuristic measures and their optimization may not always result in the best ROC. Maximizing the Gain in (15) is, in fact, equivalent to minimizing the KL divergence between the true pdf $f(\mathbf{z} \mid H_1)$ and its copula-based estimate (12), an approach widely used for quantifying the distance between two pdfs. However, KL divergence is not a true distance metric as it is asymmetric and fails to satisfy the triangle inequality. This also limits the utility of the $\Delta P_D$ criterion in characterizing the detector performance in terms of the ROC curve. For example, the perceptible gap between the ROCs obtained using the product and Clayton copula models (as well as its nonconcavity) (Fig. 5) is not evident in Fig. 3. Nevertheless, $\Delta P_D$-based copula selection is a useful approach due to its reduced complexity while still providing a systematic KL divergence-based approach for copula selection. Similarly, it may be useful in some applications to focus on the area computed for a portion of the ROC (as opposed to the entire ROC), i.e., a partial AUC.
It is likely that in many cases a closed form for such a partial AUC is not available, and one may need numerical integration in such cases. In the next section, we apply the copula-based selection framework to a real-world application.

V. MULTIBIOMETRICS: AN APPLICATION

Biometrics-based automatic person recognition systems use human physical features such as face, fingerprints, or iris as passwords. For example, in building access control applications, a person's face may be matched to templates stored in a database consisting of all enrolled users who are allowed entry to the building. The decision to allow or deny entry is then taken based on the similarity score generated by the face matching algorithm. In a multibiometric system, not one but several biometric features are fused. Multibiometric data are clearly heterogeneous as per Definition 1. We apply the copula-based method to fuse similarity scores from two different face matching algorithms to classify between genuine ($H_1$) and impostor ($H_0$) users. Our goal is to evaluate the performance of the copula-based test using real data.

We use the publicly available NIST-BSSR 1 database [33]. The database includes similarity scores from two commercial face recognizers and one fingerprint system and is partitioned into three sets. We use the NIST-face dataset, which consists of match scores from three thousand subjects, in our experiment. Two samples (match scores) are available per subject, thus resulting in a total of 2 × 3000 genuine scores and a correspondingly large number of impostor scores. The scores are heterogeneous as the two face recognizers use different algorithms to match the frontal face images of the subjects under test. Face matcher 1 quantifies the similarity between two images using a scale from zero to one, while the second matcher outputs score values ranging from zero to
Fig. 8. Scatter plot of the genuine and impostor scores from the two face matchers.
one hundred, with higher values indicating a better match between the face image of the subject under test and the template recorded previously and stored in the database. A scatter plot of both the genuine and impostor scores is shown in Fig. 8.

The scatter plot shows that a score of $-1$ is sometimes reported by face matcher 1. Surprisingly, this is true even for some genuine match scores, i.e., when the acquired face image of a subject was matched to his own template in the database. This may have been due to errors during data acquisition, image registration, or feature extraction caused by the poor quality of the image. Negative one thus serves as an indicator to flag the incorrect working of the matcher. The fusion performance will thus depend on how this anomaly is handled in the decision making. In our design, the fusion center does not make a global decision upon receiving the error flag. Instead, the subject claiming an identity is requested to provide his/her biometric measurement again. This "request for retrial" design ensures that scores are always in the valid range. We emulate this by deleting all the users whose match scores were reported as $-1$ and present results using the remaining genuine and impostor scores.

The data (under each hypothesis) are partitioned into two subsets of equal size, where the first subset is the training set used for model fitting and the second is the testing set used for the evaluation of the system's performance. Thirty different training-testing sets (resamples or trials) are obtained from the same data by randomizing the partitioning process described earlier. System performance averaged across all trials is then obtained. We now describe the training and the performance evaluation steps in detail.

Given the training set, the joint pdf (of scores from the two face matchers) is estimated by first modeling the marginal pdfs and then estimating the parameters of the selected copula densities $c_0^*$ and $c_1^*$. A Gaussian mixture model (GMM) is used to fit the scores generated by both face matchers [34]. Fig. 9 shows the estimated marginal pdfs for both impostor and genuine scores. This is followed by the copula modeling step. We consider five families of copula functions, namely, the Gaussian copula, the Student's t-copula, the Clayton copula, the Frank copula, and the Gumbel copula. Maximum likelihood estimates of the parameters of each copula function are obtained using the IFM approach (see the Appendix), and the copula functions are compared on the basis of the resultant detection performance.
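A sketch of this two-step (IFM-style) fit, under the simplifying assumption of a Gaussian copula, is given below: a univariate GMM is fitted to each matcher's scores, the scores are probability-integral transformed, and the copula correlation is then estimated from the probit-transformed data using a standard moment-type surrogate for the ML estimate. Component counts and function names are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

def fit_gmm_cdf(scores, n_components=3, seed=0):
    """Fit a univariate GMM to one matcher's scores (1-D numpy array) and
    return its cdf as a callable."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(scores.reshape(-1, 1))
    w, mu = gmm.weights_, gmm.means_.ravel()
    sd = np.sqrt(gmm.covariances_).ravel()
    return lambda x: np.sum(w * stats.norm.cdf((x[:, None] - mu) / sd), axis=1)

def fit_gaussian_copula(scores_fm1, scores_fm2):
    """Two-step fit: GMM marginals first, then the Gaussian-copula correlation
    estimated as the sample correlation of the probit-transformed scores
    (a common approximation to the ML estimate)."""
    u1 = fit_gmm_cdf(scores_fm1)(scores_fm1)
    u2 = fit_gmm_cdf(scores_fm2)(scores_fm2)
    q = stats.norm.ppf(np.clip(np.column_stack([u1, u2]), 1e-9, 1 - 1e-9))
    return np.corrcoef(q.T)[0, 1]
```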
Fig. 9. Gaussian mixture models for match scores from the two face matchers (FM). (a) FM C Impostor scores. (b) FM G Impostor scores. (c) FM C Genuine scores. (d) FM G Genuine scores.
Fig. 10. ROC for the “Request for retrial” design.
Specifically, the AUC criterion is used as the performance metric, and we choose $c_0^*$ and $c_1^*$ so that the AUC [see (25)] is maximized.

Model fitting is then followed by performance evaluation using the testing data. False alarm and detection rates are computed for each training-testing trial. A mean ROC curve is obtained by averaging $P_F$ and $P_D$ across the thirty training-testing sets. This mean ROC is shown in Fig. 10. It can be seen from Fig. 10 that the product fusion rule outperforms the individual face matchers, as expected. Further, the superiority of copula-based processing over the product rule is also evident from the plots. The copula fusion rule, in addition to the differences in the marginal pdfs, is also able to exploit the differences in the dependence structures across the two hypotheses, as evident in Fig. 11, a plot which shows the average (over all thirty resamples) mutual information between the genuine and impostor scores.
Fig. 11. Mutual information estimates averaged over 30 resamples. The plot also shows symmetric one standard deviation error bars.
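One simple way to estimate the mutual information between the two matchers' scores (the quantity plotted in Fig. 11, computed separately for genuine and impostor data) is the plug-in histogram estimator sketched below; the bin count is an arbitrary illustrative choice, and more refined estimators exist.

```python
import numpy as np

def mutual_information_hist(x, y, bins=32):
    """Plug-in (histogram) estimate of the mutual information, in nats,
    between two score streams of equal length."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)       # marginal over x bins
    py = pxy.sum(axis=0, keepdims=True)       # marginal over y bins
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))
```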
Finally, we note here that [35] addressed the biometric score fusion problem using copulas and observed no improvement over the product fusion rule. The authors modeled the marginal pdfs as a mixture of discrete and continuous components to account for the error flags (negative ones). However, copula methods require the marginal distributions to be strictly continuous. Further, their analysis was limited to the use of Gaussian copula densities, which may be insufficient to model the intermodal dependence. In this paper, we have employed a different approach to handle the error flags and have considered the use of a more general family of copula functions. These reasons could explain the differences between our results and those in [35].

VI. ROLE OF DEPENDENCE

Thus far, it was assumed that the ground truth corresponded to nonzero statistical dependence across disparate marginals, and copula functions were used to account for it when modeling heterogeneous data. The impact of model mismatch errors due to incorrect dependence characterization (copula misspecification) on binary classification performance was studied. In this section, we consider a different framework where the goal is to study the impact of dependence on classification performance when there is no mismatch. Specifically, we compare two setups: (a) fusion of statistically dependent
modalities and (b) fusion of independent modalities, and derive performance limits when the optimum fusion rules for each case are employed. It has been observed in many studies (e.g., [36]) that in some cases it is better to fuse independent sources while in other cases fusion of dependent sources may be more beneficial. A formal solution to this problem is essential and would have implications in feature selection [37] and sensor/modality selection applications. Although the literature on information fusion is rich, the exact link between classification performance and statistical dependence has not been established [38]. Some recent contributions have been made in this direction (e.g., [37]–[39]). In [38], the authors assume a multivariate Gaussian model for the observations (genuine and impostor scores) and conclude that a positive value for the correlation coefficient is detrimental. Contrary to this result, in [39], the authors use error exponent analysis and conclude that dependence always enhances classification performance. More recently, in [37], the authors consider the impact of correlation using bivariate Gaussian features. They use the Matusita distance as a measure of separation between the pdfs of the competing classes and show that the conclusions of the above two studies do not hold in general, i.e., they do not extend to arbitrary distributions. As we show next, copula theory allows us to answer this important question and is also general in treatment; i.e., the result holds for all distributions.

A. Performance Limits Using Error Exponents

The asymptotic performance of an LLR test can be quantified using the KL divergence $D\big(f(\mathbf{z} \mid H_1) \,\|\, f(\mathbf{z} \mid H_0)\big)$. For the $K$ i.i.d. $N$-variate measurements $\mathbf{z}_1, \ldots, \mathbf{z}_K$, Stein's lemma [40] gives, for a fixed value of the miss probability,

$$\lim_{K \to \infty} \frac{1}{K} \log P_F = -D\big(f(\mathbf{z} \mid H_1) \,\|\, f(\mathbf{z} \mid H_0)\big). \tag{28}$$

The greater the exponent $D\big(f(\mathbf{z} \mid H_1) \,\|\, f(\mathbf{z} \mid H_0)\big)$, the faster the convergence of the error probability to zero as $K \to \infty$. The KL divergence is thus indicative of the performance of an LLR test for large $K$. Now, using the copula representation (4) under each hypothesis,

$$D\big(f(\mathbf{z} \mid H_1) \,\|\, f(\mathbf{z} \mid H_0)\big) = \sum_{n=1}^{N} D\big(f_n(z_n \mid H_1) \,\|\, f_n(z_n \mid H_0)\big) + \mathbb{E}_{f(\mathbf{z} \mid H_1)}\!\left[\log \frac{c_1\big(F_1(z_1 \mid H_1), \ldots, F_N(z_N \mid H_1)\big)}{c_0\big(F_1(z_1 \mid H_0), \ldots, F_N(z_N \mid H_0)\big)}\right] \tag{29}$$

where the first term is the error exponent corresponding to the fusion of independent modalities and the second term is a function of the statistical dependence under the two hypotheses.

Thus, the change in classification performance (in the KL divergence sense) due only to intermodal dependence is equal to the second term in (29). This shows that fusion of dependent sources is a better approach than using independent sources if and only if this term is greater than zero. The nonnegativity of this term is guaranteed for problems where the dependence between the signals under $H_0$ is zero ($c_0 = 1$). This is usually the case for "noise alone versus signal in noise" detection problems. However, the problem is more complicated when the variables exhibit statistical dependence under both hypotheses, and one needs to examine each scenario to determine the benefits of fusion. Our result agrees with the conclusion drawn by the authors in [37]. Further, our methodology is not limited to linear correlation, thus extending [37] in a way that now allows the analysis of dependence effects on multisensor fusion in non-Gaussian scenarios.

VII. CONCLUSION

In this paper, we have presented a parametric framework for the joint processing of heterogeneous data, such as the fusion of multiple biometric signatures. Modeling such data is difficult as signals across different modalities may be incommensurate. Often, statistical independence (the product model) is assumed for analytical tractability. Here, we have shown that copula functions possess all the properties necessary for modeling heterogeneous data. A binary hypothesis test was formulated and a copula-based detector was obtained which, in addition to the differences in the marginal pdfs, is also able to exploit the differences in the dependence structures across the two hypotheses. The important problem of identifying the best copula was considered. Selection rules that exactly quantify the classification performance enhancement (e.g., area under $P_D$, area under the ROC) due to copula processing were derived. This allows one to compare the performance gain as a function of detector complexity. The framework was successfully tested on real data using the NIST database for a biometric authentication application. Finally, performance limits using error exponents were derived to analyze the effect of statistical dependence on classification performance. The results have implications in feature extraction and sensor/modality selection applications, which are of interest in our future work. It will also be interesting to explore the idea of combining several copula functions. Different copula functions exhibit different behavior, and a combination of multiple copula functions may be able to better characterize dependence between disparate modalities.

APPENDIX
ML estimation methods for copula models are discussed in detail in [22]. The exact ML approach requires the joint estimation of both the marginal parameters (if unknown) and the copula dependence parameter. This could be computationally intensive, especially for higher dimensions. However, the copula approach separates the marginal and dependence parameters and allows one to first estimate only the marginal parameters and then the dependence parameter in the second step. This two-step procedure, known as the method of
inference functions for margins (IFM), helps reduce the computational complexity while still being highly efficient [21]. Given $K$ i.i.d. realizations $\{\mathbf{z}_k\}_{k=1}^{K}$, the copula dependence parameter $\phi$ is estimated as

$$\hat{\phi}_{\mathrm{IFM}} = \arg\max_{\phi} \sum_{k=1}^{K} \log c\big(F_1(z_{1k}), \ldots, F_N(z_{Nk}); \phi\big). \tag{30}$$

The solution to (30) is obtained using golden section search and parabolic interpolation. The copula fitting routine for the bivariate case is available with the MATLAB Statistics Toolbox. We estimate the dependence parameter for the multivariate Clayton and FGM copulas (Section IV, Example 1) similarly. When the marginal cdfs in (30) are unknown, they can be replaced by their empirical estimates

$$\hat{F}_n(z) = \frac{1}{K+1} \sum_{k=1}^{K} \mathbb{1}(z_{nk} \leq z) \tag{31}$$

where $\mathbb{1}(\cdot)$ is the indicator of the event in its argument. This is the so-called canonical maximum likelihood (CML) approach [22],

$$\hat{\phi}_{\mathrm{CML}} = \arg\max_{\phi} \sum_{k=1}^{K} \log c\big(\hat{F}_1(z_{1k}), \ldots, \hat{F}_N(z_{Nk}); \phi\big). \tag{32}$$
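A minimal sketch of the CML procedure in (31)–(32) for a one-parameter copula family is shown below, using SciPy's bounded scalar minimizer (a golden-section/parabolic-interpolation routine analogous to the one mentioned above). The parameter bounds are illustrative and depend on the copula family; the function names are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def empirical_cdf(x):
    """Rescaled empirical cdf as in (31), kept strictly inside (0, 1)."""
    ranks = np.argsort(np.argsort(x)) + 1        # ranks 1..K (ties broken arbitrarily)
    return ranks / (len(x) + 1.0)

def cml_fit(z, copula_logpdf, bounds=(0.05, 20.0)):
    """Canonical ML fit of (32): replace the marginal cdfs by the empirical
    estimates (31) and maximize the copula log-likelihood over a scalar
    dependence parameter. copula_logpdf(u, phi) returns per-sample log-densities."""
    u = np.column_stack([empirical_cdf(z[:, n]) for n in range(z.shape[1])])
    res = minimize_scalar(lambda phi: -np.sum(copula_logpdf(u, phi)),
                          bounds=bounds, method='bounded')
    return res.x
```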
ACKNOWLEDGMENT The authors would like to thank the anonymous referees and the Associate Editor Dr. A. Dogandzic for their valuable comments and suggestions. They also thank Dr. A. K. Jain and Dr. K. Nandakumar for their valuable comments and suggestions for the biometric authentication example and appreciate the thoughtful suggestions provided by Dr. H. Chen, Dr. A. Sundaresan, and A. Subramanian of Syracuse University, Syracuse, NY. REFERENCES [1] R. Cutler and L. Davis, “Look who’s talking: Speaker detection using video and audio correlation,” in Proc. ICME, 2000, pp. 1589–1592. [2] M. J. Beal, N. Jojic, and H. Attias, “A graphical model for audiovisual object tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 7, pp. 828–836, 2003. [3] Y. Rui and Y. Chen, “Better proposal distributions: Object tracking using unscented particle filter,” in Proc. CVPR, 2001. [4] E. Kidron, Y. Y. Schechner, and M. Elad, “Pixels that sound,” in Proc. CVPR, 2005. [5] M. E. Sargin, E. Erzin, Y. Yemez, and A. M. Tekalp, “Lip feature extraction based on audio-visual correlation,” in Proc. EUSIPCO, 2005. [6] T. Butz and J. Thiran, “From error probability to information theoretic (multi-modal) signal processing,” Signal Process., vol. 85, pp. 875–902, 2005. [7] P. Besson, V. Popovici, J. M. Vesin, J. P. Thiran, and M. Kunt, “Extraction of audio features specific to speech production for multimodal speaker detection,” IEEE Trans. Multimedia, vol. 10, pp. 63–73, 2008. [8] J. Fisher, III and T. Darrell, “Speaker association with signal-level audiovisual fusion,” IEEE Trans. Multimedia, vol. 6, pp. 406–413, 2004. [9] A. Prati, R. Vezzani, L. Benini, E. Farella, and P. Zappi, “An integrated multi-modal sensor network for video surveillance,” in Proc. VSSN, 2005, pp. 95–102. [10] W. Ketchantang, S. Derrode, and L. Martin, “Pearson-based mixture model for color object tracking,” Mach. Vis. Appl., vol. 19, pp. 457–466, 2008. [11] Y. Stitou, N. Lasmar, and Y. Berthoumieu, “Copula based multivariate gamma modeling for texture classification,” in Proc. ICASSP, 2009.
[12] N. Brunel, W. Pieczynski, and S. Derrode, “Copulas in vectorial hidden markov chains for multicomponent image segmentation,” in Proc. ICASSP, 2005, pp. 717–720. [13] S. G. Iyengar, P. K. Varshney, and T. Damarla, “On the detection of footsteps using acoustic and seismic sensing,” in Proc. Asilomar Conf. Signals, Syst. Comput., 2007, pp. 2248–2252. [14] G. Mercier, G. Moser, and S. B. Serpico, “Conditional copulas for change detection in heterogeneous remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 5, pp. 1428–1441, May 2008. [15] C. Genest and J.-C. Boies, “Detecting dependence with Kendall plots,” Amer. Statist., vol. 57, no. 4, pp. 275–284, 2003. [16] D. Mari and S. Kotz, Correlation and Dependence. London, U.K.: Imperial College Press, 2001. [17] J. E. Hurtado, Structural Reliability: Statistical Learning Perspectives. New York: Springer, 2004. [18] H. Joe, Multivariate Dependence and Related Concepts. London, U.K.: Chapman and Hall, 1997. [19] R. B. Nelsen, An Introduction to Copulas. New York: SpringerVerlag, 1999. [20] D. Kurowicka and R. Cooke, Uncertainty Analysis With High Dimensional Dependence Modeling. New York: Wiley, 2006. [21] H. Joe and J. J. Xu, The Estimation Method of Inference Functions for Margins for Multivariate Models Dep. Statist., Univ. British Columbia, 1996. [22] E. Bouyé, V. Durrleman, A. Nikeghbali, G. Riboulet, and T. Roncalli, Copulas for Finance—A Reading Guide and Some Applications SSRN eLibrary, 2000 [Online]. Available: http://ssrn.com/paper=1032533; http://ssrn.com/abstract=1032533 [23] C. Genest and L. P. Rivest, “Statistical inference procedures for bivariate archimedian copulas,” J. Amer. Statist. Assoc., vol. 88, no. 423, pp. 1034–1043, Sep. 1993. [24] C. Genest, B. Rémillard, and D. Beaudoin, “Goodness-of-fit tests for copulas: A review and a power study,” Insurance: Math. Econom., vol. 44, pp. 199–213, Apr. 2009. [25] B. W. Silverman, Density Estimation for Statistics and Data Analysis. London, U.K.: Chapman and Hall, 1986. [26] S. Eguchi and J. Copas, “Interpreting Kullback-Leibler divergence with the Neyman–Pearson lemma,” J. Multivar. Anal., vol. 97, no. 9, pp. 2034–2040, Oct. 2006. [27] M. Studený and J. Vejnarová, “Learning in Graphical Models,” in Multiinformation Function as a Tool for Measuring Stachastic Dependence. Cambridge, U.K.: MIT Press, 1999, pp. 261–297. [28] D. Bamber, “The area above the ordinal dominance graph and the area below the receiver operating characteristic graph,” Journal of Mathematical Psychology, vol. 12, 1975. [29] R. Gupta and A. Hero, “High rate vector quantization for detection,” IEEE Trans. Inf. Theory, vol. 49, no. 8, pp. 1951–1969, 2003. [30] A. Bradley, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern Recogn., vol. 30, no. 7, pp. 1145–1159, 1997. [31] A. J. Simpson and M. J. Fitter, “What is the best index of detectibility?,” Psycholog. Bull., vol. 80, pp. 481–488, 1973. [32] C. Cortes and M. Mohri, AUC Optimization vs. Error Rate Minimization, S. Thrun, L. Saul, and B. Scholkopf, Eds. Cambridge, MA: MIT Press, 2004. [33] NIST Biometric Scores Set 2004 [Online]. Available: http://www.itl. nist.gov/iad/894.03/biometricscores/ [34] M. Figueiredo and A. K. Jain, “Unsupervised learning of finite mixture models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, pp. 381–396, 2002. [35] S. C. Dass, K. Nandakumar, and A. K. Jain, “A principled approach to score level fusion in multimodal biometric systems,” Audio and Videobased Biometr. 
Person Authent., 2005. [36] J. Kludas, E. Bruno, and S. Marchand-Maillet, “Can feature information interaction help for information fusion in multimedia problems?,” Multimedia Tools and Appl., vol. 42, no. 1, pp. 57–71, Mar. 2009. [37] K. Kryszczuk and A. Drygajlo, “Impact of feature correlations on separation between bivariate normal distributions,” in Proc. 19th Int. Conf. Pattern Recogn., 2008, pp. 1–4. [38] N. Poh and S. Bengio, “How do correlation and variance of base-experts affect fusion in biometric authentication tasks?,” IEEE Trans. Signal Process., vol. 53, no. 11, pp. 4384–4396, Nov. 2005. [39] O. Koval, S. Voloshynovskiy, and T. Pun, “Analysis of multimodal binary detection systems based on dependent/independent modalities,” in Proc. IEEE 9th Workshop on Multimedia Signal Process., 2007. [40] H. Chernoff, “Large-sample theory: Parametric case,” Ann. Math. Statist., vol. 27, pp. 1–22, 1956.
Satish G. Iyengar (S’06) received the B.E. degree in electronics and telecommunication from the University of Mumbai, Mumbai, India, and the M.S. degree in electrical engineering from Syracuse University, Syracuse, NY. Since 2006, he has been pursuing the Ph.D. degree with the Department of Electrical Engineering and Computer Science, Syracuse University. He is currently an intern with Blue Highway, LLC (a Welch Allyn company), Syracuse, and was a Research Intern with the U.S. Army Research Lab, Adelphi, MD, in 2009. His research interests include statistical signal processing, information fusion in sensor networks, time-series analysis, and computational neuroscience.
Pramod K. Varshney (S’72-M’77-SM’82–F’97) was born in Allahabad, India, on July 1, 1952. He received the B.S. degree in electrical engineering and computer science (with highest honors) and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana-Champaign in 1972, 1974, and 1976, respectively. During 1972–1976, he held teaching and research assistantships with the University of Illinois. Since 1976, he has been with Syracuse University, Syracuse, NY, where he is currently a distinguished Professor of Electrical Engineering and Computer Science and the Director of CASE: Center for Advanced Systems and Engineering. He served as the Associate Chair of the department during 1993–1996. He is also an Adjunct Professor of Radiology with the Upstate Medical University, Syracuse. His current research interests are in distributed sensor networks and data
fusion, detection and estimation theory, wireless communications, image processing, radar signal processing, and remote sensing. He has published extensively. He is the author of Distributed Detection and Data Fusion (New York: Springer-Verlag, 1997). He has served as a consultant to several major companies. Dr. Varshney was a James Scholar, a Bronze Tablet Senior, and a Fellow, all while he was with the University of Illinois. He is a member of Tau Beta Pi and is the recipient of the 1981 ASEE Dow Outstanding Young Faculty Award. He was elected to the grade of Fellow of the IEEE in 1997 for his contributions in the area of distributed detection and data fusion. He was the Guest Editor of the Special Issue on Data Fusion of the PROCEEDINGS OF THE IEEE, January 1997. In 2000, he received the Third Millennium Medal from the IEEE and Chancellor’s Citation for Exceptional Academic Achievement at Syracuse University. He serves as a Distinguished Lecturer for the AES society of the IEEE. He is on the Editorial Board of the International Journal of Distributed Sensor Networks and the Journal of Advances in Information Fusion. He was the President of International Society of Information Fusion during 2001.
Thyagaraju Damarla (S’83–M’84–SM’95) received the B.S. and M.S. degrees from the Indian Institute of Technology (IIT), Kharagpur, and the Ph.D. degree from Boston University, Boston, MA. He is working as an electronics engineer with the US Army Research Laboratory. He was with the Indian Space Research Organization, Sriharikota, from 1973 to 1979, and with the IIT from 1979 to 1982. He worked as an Assistant Professor with the University of Kentucky, Lexington. He published more than 100 technical papers in various journals and conferences and has three U.S. patents. His current interests include signal processing and sensor fusion for situational awareness.