2018 IEEE 31st International Symposium on Computer-Based Medical Systems

Towards FHR biometric identification: a comparison between compression and entropy based approaches

Luísa Castro, Andreia Teixeira, Marcelo Santos, Cristina Costa-Santos

Susana Brás, Inst. Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal

Center for Health Technology and Services Research, Porto Medical School, University of Porto, Porto, Portugal


Abstract: In this study, the fetal heart rate (FHR) signal is used to illustrate the performance of compression and entropy based approaches in biometric identification. A total of 167 pairs of traces from real fetuses are analyzed using the popular normalized compression distance, the recently proposed normalized relative compression measure, and the mutual information measure. The best performance was achieved with the normalized compression distance, with a misclassification rate of 12%. Fetal heart rate could be a relevant feature for biometric identification models, particularly in multiple pregnancies.

Keywords: normalized compression distance; normalized relative compression; mutual information; fetal heart rate; biometric identification

I. INTRODUCTION

In some developing countries, identifying newborns after birth is not straightforward for medical teams. In some crowded public maternity wards in Brazil, for instance, the number of cases of kidnapping, baby swapping and even baby selling is unacceptably high [1]. Most studies on biometric recognition of newborns rely on one or more static characteristics of palmprints, face, ear, footprints and fingerprints [1]. However, static biometrics can be forged to circumvent system security. Some physiological signals, such as EEG [2] and ECG [3], have been extensively studied for the identification of adults. Several studies use heart rate signals and even heart sounds for biometric identification or authentication, employing machine learning and statistical techniques [4]. Normalized relative compression and finite-context models have been successfully applied to a broad variety of problems [5][6], including ECG biometric identification in adults [3]. Although some studies have focused on newborn identification, to our knowledge there are no studies on fetal identification. Therefore, in this exploratory study, we compare the performance of the recently proposed normalized relative compression measure with the well-known normalized compression distance and with the entropy-based mutual information measure for the identification of real fetal heart rate (FHR) signals.

2372-9198/18/$31.00 ©2018 IEEE DOI 10.1109/CBMS.2018.00085

II. METHODS

A. FHR signals

The FHR dataset used is a subset of recordings acquired with Doppler sensors during the pregnancies of 250 mothers, already published and fully described in [7]. FHR signals are provided at a fixed rate of 4 Hz and were included in this study if they were recorded at more than 36 weeks of gestation. When more than one recording from the same mother satisfied this criterion, only the last one was selected. For the analysis, the last 3478 samples of each recording (14.5 minutes at 4 Hz, corresponding to the length of the shortest trace in the dataset) were extracted and divided into two equally sized traces of 1739 samples (about 7.25 minutes) each.

B. Symbolic representation

The normalized relative compression technique requires a symbolic representation of the FHR signals, which was obtained with the SAX algorithm [8]. This algorithm converts a time series into a string, assuming a given statistical distribution of the input samples (Gaussian in the original SAX formulation). To select the best symbolic representation of the signals, mutual information, Euclidean distance and Pearson correlation estimates were computed for all combinations of the SAX parameters alphabet size, in {20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30}, and window size, ws, in {1, 2, 3}.
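The preprocessing of the FHR signals and their symbolic representation can be sketched in Python, assuming each recording is available as a list of FHR samples at 4 Hz. This is a minimal sketch, not the authors' implementation: `split_traces` and `sax` are hypothetical names, and `sax` follows the standard SAX steps (z-normalization, piecewise aggregate approximation over windows of `ws` samples, and equiprobable breakpoints of the standard normal distribution). The symbol encoding (ASCII letters) is also an assumption.

```python
import bisect
import string
from statistics import NormalDist, fmean, pstdev

SEGMENT_LEN = 3478  # samples kept per recording (about 14.5 min at 4 Hz)


def split_traces(recording):
    """Keep the last SEGMENT_LEN samples and split them into two equal traces."""
    if len(recording) < SEGMENT_LEN:
        raise ValueError("recording is shorter than the analysed segment")
    segment = recording[-SEGMENT_LEN:]
    half = SEGMENT_LEN // 2  # 1739 samples, about 7.25 min at 4 Hz
    return segment[:half], segment[half:]


def sax(series, alphabet_size, ws=1):
    """Map a numeric series to a string with standard SAX (z-norm, PAA, Gaussian cuts)."""
    if not 2 <= alphabet_size <= len(string.ascii_letters):
        raise ValueError("alphabet_size must be between 2 and 52")
    mu, sigma = fmean(series), pstdev(series)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]
    # Piecewise aggregate approximation: mean of each window of ws samples;
    # ws=1 keeps every sample, and an incomplete final window is dropped.
    paa = [fmean(z[i:i + ws]) for i in range(0, len(z) - ws + 1, ws)]
    # Breakpoints splitting N(0, 1) into alphabet_size equiprobable regions.
    cuts = [NormalDist().inv_cdf(j / alphabet_size) for j in range(1, alphabet_size)]
    symbols = string.ascii_letters[:alphabet_size]
    # bisect_left counts the breakpoints strictly below v, i.e. the symbol index.
    return "".join(symbols[bisect.bisect_left(cuts, v)] for v in paa)
```

For example, `sax(trace, 29, ws=1)` turns each 1739-sample trace into a 1739-character string, using the alphabet size and window size selected in the results.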

C. Measures

Normalized relative compression (NRC) is a compression-based measure in which the data are represented using finite-context models (FCM) [6]. The NRC of a target x given a reference y is defined as

$$\mathrm{NRC}(x \| y) = \frac{C(x \| y)}{|x|},$$

where |x| is the size of the object x. The compression of target x relative to reference y, denoted by C(x||y), uses a combination of finite-context models of several orders (k) to build an internal model of the reference, which is kept fixed afterwards. When two segments of data from the same source are compared, the NRC is expected to be lower than when data from different sources are compared [5]. Besides k, the model receives two more arguments: a forgetting factor for mixture models, γ, and a parameter α for the probability estimator [6].
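The NRC computation can be sketched in Python for the simplest case of a single finite-context model of order k, leaving out the mixture of models and the forgetting factor γ. This minimal sketch assumes the additive estimator P(s | c) = (n(s, c) + α) / (n(c) + α|A|), where n(s, c) counts how often symbol s followed context c in the reference and |A| is the alphabet size, and it takes |x| as the length of x in symbols times log2|A|, so that C(x||y) and |x| are both in bits. Both choices, and the function names, are assumptions for illustration rather than the exact implementation of [6].

```python
import math
from collections import Counter, defaultdict


def train_fcm(reference, k):
    """Count, for every order-k context in the reference, the symbols that follow it."""
    counts = defaultdict(Counter)
    for i, symbol in enumerate(reference):
        context = tuple(reference[max(0, i - k):i])  # shorter at the start
        counts[context][symbol] += 1
    return counts


def relative_bits(target, model, k, alpha, alphabet_size):
    """C(x||y): bits needed to encode target with the fixed reference model."""
    bits = 0.0
    for i, symbol in enumerate(target):
        context = tuple(target[max(0, i - k):i])
        seen = model.get(context, Counter())  # unseen context: uniform estimate
        total = sum(seen.values())
        p = (seen[symbol] + alpha) / (total + alpha * alphabet_size)
        bits -= math.log2(p)
    return bits


def nrc(target, reference, k, alpha, alphabet_size):
    """Normalized relative compression, with |x| taken as len(x) * log2|A| bits."""
    model = train_fcm(reference, k)  # built once from y, never updated
    c = relative_bits(target, model, k, alpha, alphabet_size)
    return c / (len(target) * math.log2(alphabet_size))
```

As in the text, the reference counts are never updated while the target is encoded, so a target whose context statistics match the reference costs fewer bits and obtains a lower NRC; a value near 1 means the reference model is no better than a uniform code.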

Normalized compression distance (NCD) was applied to the pairs of FHR traces with three distinct compressors: bzip2, ppmd and zlib. Alongside the compression-based techniques NRC and NCD, mutual information (MI), which is based on the concept of entropy, was also tested. For comparisons of normally distributed scale variables, we used paired two-sided Student's t tests (two groups) or analysis of variance, ANOVA (more than two groups). When homogeneity of variance was not satisfied, the Welch test was used instead of ANOVA.

III. RESULTS

A window size of 1 was selected because it produced the lowest mean Euclidean distance and the highest Pearson correlation estimates for all alphabet sizes. The MI criterion was less consistent, but an alphabet size of 29 was selected because it maximized the mutual information for ws = 1.

A total of 334 FHR traces were included in the analysis, two for each of the 167 fetuses, each corresponding to 7.25 minutes of recording time. As in [3], three sets of (reference, target) signal pairs were built: i) self, containing only the second trace of each fetus (target = reference); ii) intra, with the pairs of traces from the same fetus; iii) inter, containing pairs from distinct fetuses.

We analyzed the performance of NRC using single-order FCMs, with the argument pair (k, α) taken from the set {(3, 1/1), (5, 1/100), (7, 1/1000), (9, 1/1000)}. Mixture models using all pairs (k, α) in this set were also applied for γ in {0.1, 0.2, 0.3}, with similar results for the distinct γ values. Summary statistics are presented in Table 1. Overall, the lowest misclassification rate, 12.0%, was obtained with NCD (bzip2), followed by NCD (ppmd) with 15.0% and NCD (zlib) with 17.4%. The NRC measure yielded misclassification rates above 30.5%.

TABLE 1. NRC, NCD AND MI VALUES (MEAN ± STANDARD DEVIATION) WITHIN GROUPS OF PAIRED SIGNALS. BONFERRONI-CORRECTED P-VALUES FOR THE COMPARISONS BETWEEN INTRA AND INTER VALUES (AFTER SIGNIFICANT DIFFERENCES WERE FOUND BETWEEN GROUPS). NRC PARAMETERS ARE GIVEN AS (k; α).

Measure             self         intra        inter        p-value
NRC (3; 1/1)        0.47±0.08    0.74±0.08    0.77±0.07    ≤0.001*
NRC (5; 1/100)      0.15±0.01    0.80±0.09    0.85±0.07    ≤0.001*
NRC (7; 1/1000)     0.06±0.02    0.90±0.06    0.94±0.04    ≤0.001*
NRC (9; 1/1000)     0.03±0.01    0.93±0.04    0.95±0.04    ≤0.001*
NRC (mix; γ=0.1)    0.08±0.02    0.74±0.09    0.78±0.07    ≤0.001*
NCD (bzip2)         0.28±0.02    0.79±0.03    0.84±0.03    ≤0.001*
NCD (ppmd)          0.48±0.06    0.75±0.03    0.81±0.04    ≤0.001*
NCD (zlib)          0            0.89±0.02    0.92±0.03    ≤0.001*
MI                  4.58±0.59    0.91±0.39    0.87±0.29    1.000
* p
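The NCD and MI computations, and one way of turning a pairwise measure into a misclassification rate, can be sketched in Python. This is a minimal sketch under stated assumptions: NCD uses the standard definition NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, with C the compressed length in bytes; only bzip2 and zlib are shown because the Python standard library has no ppmd compressor; MI is a plug-in estimate over the aligned symbols of two equally long strings; and `misclassification_rate` assumes a nearest-reference rule (a target is misclassified when its closest reference belongs to another fetus), which the text does not specify. All function names are hypothetical.

```python
import bz2
import math
import zlib
from collections import Counter


def compressed_size(data: bytes, compressor: str) -> int:
    """Length in bytes of data after compression with a standard-library compressor."""
    if compressor == "bzip2":
        return len(bz2.compress(data, 9))
    if compressor == "zlib":
        return len(zlib.compress(data, 9))
    raise ValueError(f"unsupported compressor: {compressor}")


def ncd(x: bytes, y: bytes, compressor: str = "bzip2") -> float:
    """Normalized compression distance: smaller for similar inputs, close to 1 otherwise."""
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)
    return (cxy - min(cx, cy)) / max(cx, cy)


def mutual_information(x, y) -> float:
    """Plug-in MI estimate, in bits, between two equally long symbol sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def misclassification_rate(references, targets, distance) -> float:
    """Fraction of targets whose nearest reference belongs to another subject.

    references[i] and targets[i] are the two traces of subject i (hypothetical layout).
    """
    errors = 0
    for i, target in enumerate(targets):
        nearest = min(range(len(references)), key=lambda j: distance(target, references[j]))
        if nearest != i:
            errors += 1
    return errors / len(targets)
```

SAX strings must be encoded to bytes (for example with `trace.encode()`) before calling `ncd`. Because MI is a similarity rather than a distance, `distance=lambda t, r: -mutual_information(t, r)` turns it into a quantity to minimize.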
