Diagnosing the signs of pathological states of a human based on the analysis of heart rate variability
Anatolii A. Pulavskyi, Sergey S. Krivenko, Liudmyla S. Kryvenko
BioPromin LTD, Kharkiv National Medical University
Introduction
Improving algorithms that detect the signs of pathologies using only the temporal variability of the heart rhythm (HRV) is a pressing issue. It is also desirable to shorten the measurement time and to reduce the number of ECG leads to one. Such a system could then be used not only by a medical professional but also by an untrained user, and could be connected to cloud services for ECG analysis. For these purposes, we propose an information analysis of the HRV time series. Symbolic analysis of HRV has proven effective in diagnosing congestive heart failure, atrial fibrillation and other cardiac diseases; we propose to apply it to the diagnosis of internal diseases as well. It must be taken into account that HRV behavior is a consequence of autonomic nervous system (ANS) activity and that the signs of disease apparently arise from the ANS reaction to pathological processes occurring in the body.
OBJECTIVE: The objective of this work is to improve the system for diagnosing the signs of several pathological conditions of the human body using the temporal HRV and its symbolic analysis.
Theory/Approach/Methodology
An ECG can be represented as a sequence of heartbeats, each of which can be matched with a timestamp and an R-peak amplitude. The intervals between consecutive timestamps form a time series, the IBI (interbeat interval) sequence. The existing method for detecting the signs of pathologies of internal organs assumes a fixed IBI length of 600 elements (the R-peak amplitudes are also included in the sequence, but only the timestamps are used in this work). Obtaining such a sequence takes 10 minutes on average. In practice, a preset measurement time (e.g. 100 s or 300 s) is more comfortable and easier for unprepared users. From an IBI of 100, 300 or 600 seconds duration, a specific symbolic output sequence (OSeq) can be formed. Each element of this sequence is produced by a particular transform rule (TRule), which converts several IBI elements into a single OSeq element using a sliding window that advances by a step TL (Time Lag). Consecutive OSeq elements are then combined into symbols (words); the number of OSeq elements in a symbol is the word length WL. Each TRule-WL-TL combination therefore yields its own finite alphabet of symbols, and for each OSeq a histogram of the frequencies of every symbol of that alphabet can be computed. The length of OSeq differs between input IBIs, since it depends on the duration of the recorded ECG signal and on the heart rate. Therefore, to handle OSeq sequences of arbitrary length uniformly, the histograms of symbol frequencies are normalized by the length of OSeq. The bins of the resulting histograms are then the relative probabilities of occurrence of the corresponding symbols of the alphabet.
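The IBI, OSeq, word and normalized-histogram steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the four actual TRules are not specified here, so `trule_updown` is a hypothetical rule (emit 1 if the interval at the end of the window is longer than the one at its start, else 0); words are taken with overlap (step 1); and the histogram is divided by the number of words so that its bins sum to 1 (dividing by the OSeq length instead gives nearly the same values when OSeq is much longer than WL). The block also rejects an OSeq that is not longer than the alphabet, as the methodology requires. All function names are illustrative.

```python
from collections import Counter
from itertools import product


def ibi_from_timestamps(timestamps):
    """IBI sequence: differences between consecutive R-peak timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def trule_updown(ibi, window=2, tl=1):
    """Hypothetical TRule: slide a window of `window` IBI elements with step TL
    and emit 1 if the last interval in the window is longer than the first, else 0."""
    return [1 if ibi[i + window - 1] > ibi[i] else 0
            for i in range(0, len(ibi) - window + 1, tl)]


def word_histogram(oseq, wl, alphabet=(0, 1)):
    """Normalized histogram over all len(alphabet)**wl words of length WL."""
    alphabet_size = len(alphabet) ** wl
    # OSeq must be longer than the alphabet so the histogram can fill up.
    if len(oseq) <= alphabet_size:
        raise ValueError(
            f"OSeq length {len(oseq)} must exceed alphabet size {alphabet_size}")
    # Overlapping words: every run of WL consecutive OSeq elements.
    words = [tuple(oseq[i:i + wl]) for i in range(len(oseq) - wl + 1)]
    counts = Counter(words)
    # Relative frequency of every symbol of the alphabet, including absent ones.
    return {w: counts[w] / len(words) for w in product(alphabet, repeat=wl)}


# Usage: timestamps in milliseconds (illustrative values, not study data).
ts = [0, 800, 1700, 2500, 3400, 4200, 5100]
ibi = ibi_from_timestamps(ts)              # [800, 900, 800, 900, 800, 900]
oseq = trule_updown(ibi, window=2, tl=1)   # [1, 0, 1, 0, 1]
hist = word_histogram(oseq, wl=2)
print(hist)  # {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}
```

In the study the histograms of all TRule and WL combinations are concatenated into one property vector; with a binary rule such as this one, each WL contributes 2**WL bins.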
When choosing the minimum length of OSeq, this length must exceed the size of the alphabet, so that the histogram "has enough time" to accumulate its elements. An SVM with an RBF kernel is used to obtain the predictive models. Its input property vector is the combined vector of the histograms for the considered TRule and WL, truncated to the most significant properties. The truncation is carried out by an SVM with a linear kernel, which selects the most significant properties (i.e., finds the dictionary of a particular disease). For the Test Set, the sensitivity (TPR) and specificity (SPC) are determined. The posterior probabilities for the Test Sets are also calculated; they are used to determine the posterior efficiency of the model and the percentage PercH of precedents whose posterior probability exceeds the threshold. In this work, models for signals of 100, 300 and 600 seconds (or heartbeats) are compared. Within this approach, it is difficult to speak of diagnosing a disease in the commonly used sense of the word. Instead, the proposed method determines signs (in our case, specific symbols of specific alphabets, i.e. disease dictionaries) associated with a particular disease, similar to the determination of metabolic parameters. Four TRule methods are used to convert the input sequences into a single vector. All four work only with the temporal variability of the heart rhythm, because a precise determination of the variability of the ECG peak amplitudes involves additional difficulties. In this study, TL = 1. The final property vector, obtained by concatenating the outputs of all TRules, contained 1206 elements.
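The evaluation quantities named above can be computed as in the following sketch. Assumptions: labels are 1 for ill and 0 for healthy, `posteriors` holds the posterior probability of the predicted class for each precedent, and the threshold is passed in as a parameter because its selection rule is not given here. How TPRa and SPCa are derived from the posteriors is not defined in the text, so `thresholded_metrics`, which recomputes TPR and SPC over only the precedents above the threshold, is one hypothetical interpretation. PercH is returned as a share in [0, 1], the form used in Table 1.

```python
import math


def tpr_spc(y_true, y_pred):
    """Sensitivity (TPR) and specificity (SPC); labels: 1 = ill, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # NaN when a class is absent, e.g. an ill-only test set has no SPC.
    tpr = tp / (tp + fn) if tp + fn else math.nan
    spc = tn / (tn + fp) if tn + fp else math.nan
    return tpr, spc


def perc_h(posteriors, threshold):
    """Share of precedents whose posterior probability exceeds the threshold."""
    return sum(p > threshold for p in posteriors) / len(posteriors)


def thresholded_metrics(y_true, y_pred, posteriors, threshold):
    """Hypothetical TPRa/SPCa: TPR and SPC over the precedents above the threshold."""
    keep = [i for i, p in enumerate(posteriors) if p > threshold]
    return tpr_spc([y_true[i] for i in keep], [y_pred[i] for i in keep])


# Usage with made-up values (not study data).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
post = [0.90, 0.60, 0.70, 0.80, 0.55]
print(tpr_spc(y_true, y_pred))                          # TPR = 2/3, SPC = 0.5
print(perc_h(post, 0.65))                               # 0.6
print(thresholded_metrics(y_true, y_pred, post, 0.65))  # (0.5, 1.0)
```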
Results
During the study, models were obtained for dataset records of 100, 300 and 600 seconds duration, and the posterior probabilities were determined for the test sets. Models were built for three pathologies: angina pectoris, benign neoplasm of the thyroid gland and peptic ulcer. Before constructing each Kaggle model, the related dataset was balanced so that both classes of precedents had the same size. Since each group contained more ill patients than healthy ones, the remaining ill patients were used as an additional test set to estimate the sensitivity of the obtained models. The balanced dataset was split 70/30, with 70% used as the training set and 30% as the test set. Additionally, the model efficiency parameters TPRa and SPCa were obtained using the posterior probabilities. The results are presented in the table below (Table 1), where PercH is the share of precedents (of the total number) whose posterior probability exceeds the threshold, calculated according to the principle described above; ICD is the disease code according to the 10th revision of the International Classification of Diseases; and Len is the record length in seconds. The analysis shows that the specificity and sensitivity of the models detecting the signs of the pathologies do not differ significantly between 300- and 600-second signals. When the signal duration is reduced to 100 seconds, the metrics tend to decrease; in particular, specificity drops sharply, in some cases down to 0.44. Therefore, models built on signals of 300 seconds or longer are recommended.
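The dataset preparation described above (class balancing, surplus ill patients kept as an additional test set, 70/30 split) can be sketched as follows. Assumptions: patients are identified by arbitrary IDs, ill patients are undersampled at random with a fixed seed, and the 70/30 split is taken within each class so that the training and test sets stay balanced. `balance_and_split` is a hypothetical helper, not the authors' code.

```python
import random


def balance_and_split(ill, healthy, train_frac=0.7, seed=0):
    """Hypothetical helper: balance classes, split 70/30, keep surplus ill patients.

    Returns (train, test, extra_test) as lists of (patient_id, label) pairs,
    with label 1 = ill and 0 = healthy.
    """
    rng = random.Random(seed)
    ill, healthy = list(ill), list(healthy)
    rng.shuffle(ill)
    rng.shuffle(healthy)
    n = min(len(ill), len(healthy))   # size of each class after balancing
    surplus_ill = ill[n:]             # ill patients left over by balancing
    ill, healthy = ill[:n], healthy[:n]
    k = round(train_frac * n)         # per-class training size
    train = [(p, 1) for p in ill[:k]] + [(p, 0) for p in healthy[:k]]
    test = [(p, 1) for p in ill[k:]] + [(p, 0) for p in healthy[k:]]
    extra_test = [(p, 1) for p in surplus_ill]  # used only to estimate sensitivity
    return train, test, extra_test


# Usage: 15 ill and 10 healthy patients (illustrative IDs).
train, test, extra = balance_and_split([f"ill{i}" for i in range(15)],
                                       [f"h{i}" for i in range(10)])
print(len(train), len(test), len(extra))  # 14 6 5
```

Splitting within each class keeps both the training and the test set balanced; the additional set contains only ill patients, so it can yield sensitivity but not specificity.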
Conclusions
The method for diagnosing the signs of certain pathologies using the temporal variability of the heart rhythm was improved. It was shown that reducing the signal duration to 100 seconds lowers the efficiency of the predictive models. For practical use, a minimum signal duration of 300 seconds is recommended. At this duration, the sensitivity and specificity on the Test Sets when determining the signs of angina pectoris were 0.81 and 0.92, respectively, which is not worse than the corresponding metrics for 600-second signals. For the signs of benign neoplasm of the thyroid gland, the sensitivity and specificity on the Test Sets were 0.76 and 0.74, respectively; for the signs of peptic ulcer, they were 0.80 and 0.71. Importantly, shortening the signal degrades the specificity of the models. Using the posterior probabilities can improve the efficiency of the system by up to 15%.
Table 1
Dataset #  TPR   SPC   TPRa  SPCa  PercH  ICD  Len
1          0.88  0.91  -     -     -      I20  600
2          0.88  -     0.96  -     0.68   I20  600
3          0.79  -     0.87  -     0.57   I20  600
4          -     0.60  -     0.70  0.63   I20  600
1          0.82  0.91  -     -     -      I20  300
2          0.88  -     0.96  -     0.62   I20  300
3          0.90  -     0.97  -     0.68   I20  300
4          -     0.60  -     0.71  0.50   I20  300
1          0.86  0.79  -     -     -      I20  100
2          0.80  -     0.90  -     0.60   I20  100
3          0.83  -     0.89  -     0.63   I20  100
4          -     0.58  -     0.64  0.58   I20  100
1          0.75  0.72  -     -     -      D34  600
2          0.78  -     0.89  -     0.63   D34  600
4          -     0.56  -     0.63  0.56   D34  600
1          0.76  0.74  -     -     -      D34  300
2          0.79  -     0.88  -     0.58   D34  300
4          -     0.48  -     0.57  0.48   D34  300
1          0.72  0.66  -     -     -      D34  100
2          0.68  -     0.81  -     0.58   D34  100
4          -     0.5   -     0.55  0.38   D34  100
1          0.72  0.83  -     -     -      K27  600
2          0.65  -     0.81  -     0.50   K27  600
4          -     0.63  -     0.73  0.46   K27  600
1          0.80  0.71  -     -     -      K27  300
2          0.67  -     0.78  -     0.66   K27  300
4          -     0.44  -     0.55  0.60   K27  300
1          0.73  0.70  -     -     -      K27  100
2          0.65  -     0.66  -     0.53   K27  100
4          -     0.48  -     0.61  0.48   K27  100
MECO’2018 & ECYPS’2018, Budva, Montenegro, June 10th–15th, 2018, www.embeddedcomputing.me