Multiple error measures in ECG data compression
Ranveig Nygaard and Dag Haugland
Høgskolen i Stavanger, Department of Electrical and Computer Engineering
P. O. Box 2557 Ullandhaug, 4004 Stavanger
Phone: +47 51 83 10 51, Fax: +47 51 83 17 50
E-mail:
[email protected]
ABSTRACT
The problem of compressing digital electrocardiograms (ECG) has traditionally been tackled by heuristic approaches. Recently, it has been demonstrated how techniques based on graph theory can be applied in order to yield optimal compressions. Unlike the conventional methods, the graph algorithms guarantee minimal reproduction error under some reasonable assumptions. In the above-mentioned exact algorithms, closeness between original and reproduced data is measured by the sum of squared deviations. However, no bound is assumed on the error at any individual sample, and hence the infinity norm error is beyond the control of the algorithm. In traditional heuristic compression methods the maximum error is accounted for. It is the challenge of this paper to enhance the exact algorithms such that optimal compressions respecting a given maximum error bound are found. Based on the algorithm in [3], we show in the current paper how bounds on the infinity norm error can be incorporated in a computationally efficient way. With reference to the real time application of ECG compression, we stress the need to satisfy rather strict constraints on the execution time of the algorithm. Fortunately, the desired algorithm enhancement is feasible without significant increase in computational complexity, and numerical experiments demonstrate the good performance of the method.
1. INTRODUCTION
Storage and transmission of ElectroCardioGrams (ECG) call for efficient methods to keep such data in manageable size. Most of the well-known compression algorithms in the time domain can be categorized as fast heuristics based on clever considerations of very small sample subsets. Such methods include the frequently applied AZTEC [2] program, the popular reference FAN [1] algorithm, and the heuristics of Tai [6, 7]. As opposed to the methods above, the optimization algorithm in [3] is based on a rigorous mathematical model of the entire sample selection process.
The achievement of such an approach has been a significant reduction in reproduction error. Any time domain algorithm for signal compression is based on the simple philosophy of extracting a subset of the original sampled values. The key to a successful algorithm is a good rule for determining the most significant samples. In [3] this is accomplished by considering the samples as nodes in a directed graph. Any pair of nodes is connected by one arc, the direction of which is consistent with the sample order. Each arc signifies the option to include the samples corresponding to its end nodes as consecutive samples in the extracted subset. In [3] the goal is to minimize the reproduction error when a bound on the number of samples to extract is imposed. It is shown that this is identical to solving the cardinality constrained shortest path problem defined on the graph. This problem refers to a path from the first to the last sample, and the nodes on the path correspond to extracted samples. Furthermore, the length of the path is given as a sum of arc lengths along the path. The length of the arc connecting i and j is defined as the contribution to the reproduction error from eliminated samples recorded between i and j. The measure of error in question is the distortion, defined as the sum of squares of deviations between original and reproduced signal values. An algorithm for solving the compression problem in the case of such an error measure is given in the above paper. It is shown that the algorithm converges in cubic time, and in [5] it is demonstrated how the idea can be implemented in order to comply with the constraints that apply to execution time. Unlike heuristics such as [1] and [2], which are both guided by a bound on the maximum reproduction error, the exact method has so far been unable to deal with such a bound. In this paper we show how this can be overcome. The paper is organized as follows: In the next section we define the problem in strict mathematical terms. Section 3 is devoted to the computational procedure required to deal with the infinity norm error. Finally, we report on computational results from an implementation of the enhanced optimization algorithm.
2. PROBLEM DEFINITION
We here introduce the notation used throughout this paper, and state the problem to be solved concisely. Let $n$ denote the total number of samples, and let $m$ be an upper bound on the number of extracted samples. Let $y_i$ denote the amplitude of sample $i$. Define
$$\varepsilon_{t,ij} = y_i + \frac{y_j - y_i}{j - i}\,(t - i) - y_t \quad \text{for all } i \le t \le j.$$
With each arc $(i, j)$ we associate the length parameters $c^2_{ij} = \sum_{t=i}^{j} \varepsilon_{t,ij}^2$ and $c^\infty_{ij} = \max\{|\varepsilon_{t,ij}| : i \le t \le j\}$. Correspondingly, we define the path lengths $\varepsilon_2^2 = \sum_{(i,j) \in P} c^2_{ij}$ and $\varepsilon_\infty = \max\{c^\infty_{ij} : (i, j) \in P\}$, where $P$ is the set (path) of pairs (arcs) of consecutive retained samples. Let $|P|$ denote the cardinality of $P$. The set $P$ defines the compression uniquely, and we define the restored signal values $\hat{y}_1, \ldots, \hat{y}_n$ by letting $\hat{y}_t = y_t$ if sample $t$ is retained. Otherwise, we let $\hat{y}_t = y_i + \frac{y_j - y_i}{j - i}(t - i)$, where $i$ and $j$ are the consecutive retained samples closest to $t$, for which $i < t < j$. Hence $\varepsilon_{t,ij}$ signifies the local error introduced when replacing $y_t$ by its approximation $\hat{y}_t$. It is required that the absolute value of the difference between original and restored signal nowhere violates the upper bound $c^\infty$. That is, the maximum error is to be bounded from above. The problem thus amounts to extracting a sample selection, represented by $P$, satisfying $\varepsilon_\infty \le c^\infty$ and $|P| \le m - 1$, and such that $\varepsilon_2$ is minimized. When $c^\infty$ is sufficiently large, this essentially becomes the problem addressed in [3].
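As an illustration of these definitions (not the authors' implementation), the following Python sketch computes the local errors $\varepsilon_{t,ij}$ and the two arc length parameters $c^2_{ij}$ and $c^\infty_{ij}$ for a toy signal; the function name arc_lengths is hypothetical.

```python
# Illustration of the arc length parameters defined above (not the authors' code).
def arc_lengths(y, i, j):
    """Return (c2_ij, cinf_ij) for the arc connecting samples i and j (0-based)."""
    slope = (y[j] - y[i]) / (j - i)
    c2 = 0.0
    cinf = 0.0
    for t in range(i, j + 1):
        e = y[i] + slope * (t - i) - y[t]   # local error eps_{t,ij}
        c2 += e * e                          # contribution to the distortion
        cinf = max(cinf, abs(e))             # contribution to the max error
    return c2, cinf

# Toy signal: the arc (0, 4) eliminates samples 1..3 and replaces them by interpolation.
y = [10.0, 12.0, 15.0, 13.0, 11.0]
print(arc_lengths(y, 0, 4))   # (28.375, 4.5)
```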
Figure 1. The convex hull of a set of 2D points
3. COMPUTING THE INFINITY NORM ERROR
In [4], it is proven that only minor changes to the cardinality constrained shortest path problem are necessary in order to cope with bounds on the maximum reproduction error. This can be accomplished by reducing the graph, i.e. eliminating all arcs that would contribute to a violation of this bound. The algorithm in [3] can then be applied to the reduced graph. The main effort in reducing the graph is to compute $c^\infty_{ij}$. This is easily accomplished provided that the convex hull of all sample points from $i$ to $j$ is available. By executing a binary search along the boundary vertices of the convex hull, $c^\infty_{ij}$ can be computed for any combination of $i$ and $j$. In the case depicted in Figure 1 the maximum error occurs at node $k$. The problem of reducing the graph can thus be solved by the following major steps: For each pair of nodes $i$ and $j$, $i = 1, \ldots, n-1$, $j = i+1, \ldots, n$,
1. compute the convex hull containing all samples from $i$ to $j$,
2. compute $c^\infty_{ij}$ by binary search along the boundary vertices of the corresponding convex hull,
3. compare $c^\infty_{ij}$ to $c^\infty$; if $c^\infty_{ij} > c^\infty$, eliminate the corresponding arc from the graph.

Figure 2. Insertion of a new point in an existing convex hull

An important part of the problem of constraining the maximum error is the computation of a dynamic convex hull of a sorted set of 2D points. A general method for this is given in [8]. Denote the convex hull of all samples from $i$ to $j$ by $H(i, j)$. From Figure 1 it is clear that $H(i, j)$ can be uniquely represented by its vertices. The boundary of $H(i, j)$ is described by one convex and one concave function, and is thus naturally subdivided into two parts. When computing $H(1, j)$, $j = 2, \ldots, n$, we verify whether sample $i$ ($1 < i < j$) is a vertex of the convex part, a vertex of the concave part, or an interior point. Such boundary relations are stored in a dynamic data structure, and exploited in the computation of $H(i, j)$, $i = 2, \ldots, n-1$, $j = i+1, \ldots, n$. In the following discussion we focus on how to find the vertices of the convex part of the boundary, but equivalent operations apply to the concave part. Assume $H(i, j)$ is found. This hull is to be updated to $H(i, j+1)$ by taking one new sample, $j+1$, into account. We find $H(i, j+1)$ by backtracking along the boundary of $H(i, j)$, and insert the new sample in such a way that convexity is maintained (see Figure 2).
To analyze how $H(i, j+1)$ will change compared to $H(i, j)$, we must examine the slope of the straight line joining sample $j+1$ and one vertex in $H(i, j)$. Let $\alpha_{ij}$ denote the slope of the straight line joining samples $i$ and $j$. To decide where to insert sample $j+1$ in $H(i, j)$, we backtrack until we find two succeeding vertices $i' < i''$ for which the slope of the edge joining them is smaller than the slope $\alpha_{i'',j+1}$ of the line joining $i''$ and sample $j+1$, and insert sample $j+1$ after vertex $i''$ as illustrated in Figure 2; in this way the slopes along the convex part remain increasing. For the concave part it is the other way around: we backtrack the boundary vertices in $H(i, j)$ until we find two succeeding vertices $i' < i''$ for which the slope of the edge joining them is larger than $\alpha_{i'',j+1}$, and insert sample $j+1$ after vertex $i''$.
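A minimal sketch of this update step, assuming 0-based sample indices and a hypothetical helper extend_convex_part; it maintains the convex part of the boundary by popping vertices until the boundary slopes increase again, which is the backtracking rule described above (the authors' dynamic data structure from [8] is not reproduced here).

```python
# Sketch (not the authors' implementation) of extending the convex part of the
# boundary of H(i, j) to H(i, j+1).  Vertices are kept as a list of sample indices.

def slope(y, a, b):
    return (y[b] - y[a]) / (b - a)

def extend_convex_part(y, hull, j_new):
    """Insert sample j_new into the convex (lower) part of the boundary.

    Backtrack while the last boundary edge has a slope at least as large as
    the slope towards the new sample, so that slopes stay increasing."""
    while len(hull) >= 2 and slope(y, hull[-2], hull[-1]) >= slope(y, hull[-1], j_new):
        hull.pop()
    hull.append(j_new)
    return hull

# For the concave (upper) part the comparison is reversed (<= instead of >=).
y = [10.0, 12.0, 9.0, 11.0, 8.0, 13.0]
hull = [0]
for j in range(1, len(y)):
    extend_convex_part(y, hull, j)
print(hull)   # [0, 4, 5]: vertices of the convex part of H(0, 5)
```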
A total of $n(n-1)/2$ convex hulls have to be made available in order to restrict $\varepsilon_\infty$. Each of these is represented by the vertices of the convex and the concave part of the boundary. When the vertices are found, we make a binary search in both vertex sets in order to find the points where the maximum error occurs. The binary search can be applied in this case because the line segments on the boundary are sorted by slope. By stepping through the vertices of the convex part, we will reach a point where the slope of the straight line connecting two vertices gets larger than $\alpha_{ij}$. This is the point where the maximum error occurs (see Figure 1). We see that the slope of the boundary segment leaving $k$ is larger than $\alpha_{ij}$, whereas the slope of the segment entering $k$ is smaller than $\alpha_{ij}$; thus $k$ is the point of maximum error. By making binary search in both vertex sets of $H(i, j)$, $c^\infty_{ij}$ can be found in $O(\log n)$ time.
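The search over the convex part can be sketched as follows (hypothetical helper names, not the authors' code): since the edge slopes along the convex part increase, the vertex where they first reach the chord slope $\alpha_{ij}$ is located by bisection, and the deviation from the chord is evaluated there. The concave part is handled symmetrically, and $c^\infty_{ij}$ is the larger of the two deviations.

```python
# Sketch of locating the maximum deviation between the chord through samples i and j
# and the convex part of the boundary of H(i, j), using that edge slopes are sorted.
from bisect import bisect_left

def max_deviation_convex_part(y, hull, i, j):
    chord_slope = (y[j] - y[i]) / (j - i)
    # Edge slopes along the convex part (increasing by construction).
    edge_slopes = [(y[hull[r + 1]] - y[hull[r]]) / (hull[r + 1] - hull[r])
                   for r in range(len(hull) - 1)]
    # First edge whose slope reaches the chord slope; its left endpoint is the
    # vertex of maximum deviation (bisect_left returns at most len(hull) - 1).
    k = hull[bisect_left(edge_slopes, chord_slope)]
    return abs(y[i] + chord_slope * (k - i) - y[k]), k

# [0, 4, 5] is the convex part of H(0, 5) for the toy signal of the previous sketch.
y = [10.0, 12.0, 9.0, 11.0, 8.0, 13.0]
print(max_deviation_convex_part(y, [0, 4, 5], 0, 5))   # approximately (4.4, 4)
```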
Figure 3. Test signals: 1) Normal sinus rhythm (MIT100, sample nr. 1 to 500), 2) Supraventricular tachycardia (MIT202, sample nr. 1201 to 1700), 3) Ventricular rhythm (MIT203, sample nr. 601 to 1100)
Computing all $n(n-1)/2$ convex hulls along the lines outlined above requires $O(n^2)$ operations [4]. Hence the graph reduction is performed in $O(n^2 \log n)$ time. Furthermore, the complexity of the optimization algorithm in [3] is $O(mn^2)$, where the dependency upon $n^2$ is explained by the fact that an unreduced graph consists of $n(n-1)/2$ arcs. If the number of arcs after reduction is $p$, the complexity of the new version of the algorithm becomes $O(n^2 \log n + mp)$. Note, however, that in the worst case $p = O(n^2)$.
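To connect the pieces, a rough end-to-end sketch is given below (our illustration, with our own function and parameter names, not the authors' implementation): arcs violating the bound are pruned, and the cardinality constrained shortest path is then solved by a straightforward dynamic program over the surviving arcs. For clarity the sketch recomputes each arc directly, which costs $O(n^3)$ for the reduction; the paper attains $O(n^2 \log n)$ through the convex hull technique described above.

```python
import math

def arc_lengths(y, i, j):
    """Sum of squared deviations and maximum absolute deviation for arc (i, j)."""
    slope = (y[j] - y[i]) / (j - i)
    c2, cinf = 0.0, 0.0
    for t in range(i, j + 1):
        e = y[i] + slope * (t - i) - y[t]
        c2 += e * e
        cinf = max(cinf, abs(e))
    return c2, cinf

def compress(y, m, c_inf_max):
    """Retain at most m samples of y (m >= 2), minimizing the total squared error
    subject to the bound c_inf_max on the maximum reproduction error."""
    n = len(y)
    # Graph reduction: keep only arcs whose infinity norm error respects the bound.
    arcs = {}
    for i in range(n - 1):
        for j in range(i + 1, n):
            c2, cinf = arc_lengths(y, i, j)
            if cinf <= c_inf_max:
                arcs[(i, j)] = c2
    # D[k][j]: least total squared error of a path from sample 0 to sample j
    # using exactly k arcs (hence k + 1 retained samples up to j).
    D = [[math.inf] * n for _ in range(m)]
    pred = [[-1] * n for _ in range(m)]
    D[0][0] = 0.0
    for k in range(1, m):
        for (i, j), c2 in arcs.items():
            if D[k - 1][i] + c2 < D[k][j]:
                D[k][j] = D[k - 1][i] + c2
                pred[k][j] = i
    k_best = min(range(1, m), key=lambda k: D[k][n - 1])
    if math.isinf(D[k_best][n - 1]):
        return None                      # bounds on m and the max error too strict
    # Trace the retained samples backwards from the last sample.
    path, j, k = [n - 1], n - 1, k_best
    while j != 0:
        j, k = pred[k][j], k - 1
        path.append(j)
    return list(reversed(path)), D[k_best][n - 1]
```

The double loop over $k$ and the surviving arcs corresponds to the $O(mp)$ term in the complexity expression above.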
Once $H(i, j)$, $j = i+1, \ldots, n$ are found, the next step is to compute $H(i+1, j)$, $j = i+2, \ldots, n$. We know that interior points of $H(i, j)$ may be on the boundary of $H(i+1, j)$. However, if sample $k$ is a vertex of both $H(i, j)$ and $H(i+1, j)$, then all vertices $k' > k$ of $H(i, j)$ are vertices of $H(i+1, j)$ as well. In order to be computationally efficient, the algorithm has to exploit this information. In the computation of $H(i, j)$, $j = i+1, \ldots, n$, we therefore keep track of all boundary relations, to be utilized later in the computation of $H(i+1, j)$, $j = i+2, \ldots, n$.
Figure 4. PRD versus infinity norm error for test signal 1), for m = 50 and m = 25
4. NUMERICAL EXPERIMENTS
To investigate the performance of the algorithm, three ECG recordings (see Figure 3) from the MIT databases [9] were chosen. In each case, we let $n = 500$. Results from the CCSP algorithm with the maximum error incorporated and results from the traditional FAN algorithm were then recorded. The results are shown in Tables 1, 2 and 3. Input to the CCSP algorithm are $c^\infty$ and $m$, whereas only $c^\infty$ is input to the FAN algorithm. The total error is reported in terms of the Percentage Root-mean-square Deviation (PRD), defined as
$$\mathrm{PRD} = \sqrt{\frac{\sum_{i=1}^{n} [y_i - \hat{y}_i]^2}{\sum_{i=1}^{n} [y_i - \bar{y}]^2}} \cdot 100\%,$$
where $\bar{y}$ is the mean value of the original signal. From the data in Tables 1, 2 and 3, we see that there is a close connection between $c^\infty$ and the number of retained samples, $m$. When $m$ is reduced, $c^\infty$ increases (and the other way around).
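A direct transcription of this error measure (the function name is ours, not taken from the authors' code):

```python
import math

def prd(original, restored):
    """Percentage root-mean-square deviation between original and restored signal."""
    mean = sum(original) / len(original)
    num = sum((o - r) ** 2 for o, r in zip(original, restored))
    den = sum((o - mean) ** 2 for o in original)
    return 100.0 * math.sqrt(num / den)
```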
              CCSP                      FAN
 c∞       ε∞      PRD     m        ε∞      PRD     m
 5.05     5.00    5.20    50       5.00    5.76    74
 5.10     5.07    5.17    50       5.00    5.81    71
 5.30     5.21    5.14    50       5.21    5.88    68
 5.50     5.21    5.14    50       5.50    6.26    66
 5.70     5.21    5.14    50       5.67    6.32    59
 5.90     5.86    5.13    50       5.89    6.46    58
 6.00     5.94    5.12    50       6.00    6.65    54
13.10    13.00   13.21    25      13.04   15.98    28
13.50    13.22   13.16    25      13.50   16.99    26
13.90    13.50    9.99    25      13.73   17.49    25
14.10    13.50    9.99    25      13.82   17.50    25
14.50    13.50    9.99    25      14.26   18.84    25
14.90    14.72    9.92    25      14.73   18.58    25
15.10    14.72    9.92    25      15.00   19.69    25
15.50    15.36    9.80    25      15.27   20.19    25
15.90    15.36    9.80    25      15.87   20.96    24
16.10    15.36    9.80    25      16.00   20.35    23
16.50    16.41    9.79    25      16.50   20.26    21
16.90    16.89    9.47    25      16.55   20.45    21
Table 1. Test signal 1) run with CCSP and FAN
If the bounds on $m$ and $\varepsilon_\infty$ are too strict, there exists no solution. This reflects the fact that, given a desired sample reduction ratio, it is not possible to obtain an arbitrarily small infinity norm error, because no path through the graph satisfies both constraints. For each reported value of $m$, the first entry in Tables 1, 2 and 3 reports the smallest feasible $\varepsilon_\infty$. An example: for test signal 2) with $m = 50$, the smallest possible value of $\varepsilon_\infty$ is 7.4 (see Table 2). We thus have to make a trade-off between $c^\infty$ and the sample reduction ratio in order to find a solution. For the FAN algorithm this problem is of no interest, as we have no control over the number of retained samples; the only parameter we can control is $c^\infty$. Restricting the sample selection by introducing $c^\infty$ implies an increase in PRD. This is shown for test signal 1) in Figure 4. Earlier experiments with the CCSP algorithm, without $c^\infty$ incorporated, show that the CCSP algorithm is superior to conventional time-domain heuristics in the sense that it exhibits much lower PRD, especially when the sample reduction ratio increases [3]. When $c^\infty$ is introduced, the comparison with FAN in Tables 1, 2 and 3 shows that CCSP still exhibits lower PRD, even for higher sample reduction ratios. Another important aspect of restricting the infinity norm error is the total execution time. As pointed out in the previous section, the complexity of the algorithm becomes $O(n^2 \log n + mp)$ when a bound on the maximum error is introduced, as opposed to $O(mn^2)$ in the original version. In cases where $p \ll n^2$, this may be of major importance.
              CCSP                      FAN
 c∞       ε∞      PRD     m        ε∞      PRD     m
 7.40     7.40    3.61    50       7.40    4.06    60
 7.50     7.40    3.61    50       7.40    4.06    59
 7.60     7.56    3.35    50       7.57    4.19    59
 7.70     7.56    3.35    50       7.67    4.23    59
 7.80     7.56    3.35    50       7.75    4.55    59
 7.90     7.86    3.34    50       7.75    4.47    57
 8.00     7.91    3.34    50       8.00    4.55    56
20.50    20.43    9.51    25      20.40   13.56    29
21.00    20.98    9.05    25      21.00   13.66    28
23.00    22.21    8.74    25      23.00   16.17    27
25.00    22.21    8.74    25      25.00   16.86    23
27.00    22.21    8.74    25      26.90   18.37    22
29.00    22.21    8.74    25      28.00   17.65    20
31.00    22.21    8.74    25      30.97   18.06    19
33.00    22.21    8.74    25      32.45   20.73    19
35.00    34.00    8.61    25      34.95   23.48    19
37.00    34.00    8.61    25      36.88   24.30    19
39.00    37.75    8.57    25      38.80   26.20    17
Table 2. Test signal 2) run with CCSP and FAN
              CCSP                      FAN
 c∞       ε∞      PRD     m        ε∞      PRD     m
11.40    11.38    4.59    50      11.29    5.22    63
11.60    11.38    4.59    50      11.60    5.41    63
11.80    11.38    4.59    50      11.75    5.43    61
12.00    11.38    4.59    50      12.00    5.53    59
12.20    12.00    4.55    50      12.11    5.55    59
12.40    12.27    4.51    50      12.23    5.60    58
12.60    12.27    4.51    50      12.60    5.75    58
12.70    12.67    4.50    50      12.69    5.73    57
27.30    27.29   12.32    25      27.10   14.61    36
28.00    27.87   12.19    25      28.00   15.12    35
29.00    28.85   12.07    25      29.00   15.42    35
30.00    29.80   11.94    25      30.00   15.93    34
31.00    30.92   11.76    25      30.64   16.67    33
32.00    31.50   11.32    25      32.00   16.16    31
33.00    31.50   11.32    25      32.89   16.60    31
34.00    31.50   11.32    25      34.00   18.16    26
35.00    31.50   11.32    25      35.00   18.91    27
36.00    35.75   11.29    25      35.63   19.43    27
Table 3. Test signal 3) run with CCSP and FAN
For the data reported in Table 1, we found that $p$ attains the values 8210, 11285, 21648 and 29189 when $c^\infty$ is put equal to 5.05, 6.00, 13.10 and 16.90, respectively, whereas the unreduced graph has 124750 arcs. This implies considerable reduction factors (ranging from 8 to 30), and the actual effect of this will be more carefully analyzed in future work.

In [5], an effective implementation of the CCSP algorithm is suggested. The idea is to divide the input signal into partly overlapping segments, and to process each segment independently. It is shown that an optimal compressor can be implemented by a windowing technique, processing 100 samples at a time with only slightly reduced coding performance. By reducing the execution time of the algorithm, it is possible to process larger blocks of data without loss of efficiency. This will cause the algorithm to produce compressions in which the total reproduction error is reduced.

From the data presented in Tables 1, 2 and 3 we can see that the results from both the CCSP and the FAN algorithm are signal dependent. Extensive tests of the two algorithms all show the same trend: the CCSP algorithm generally gives a lower infinity norm error and a lower PRD than the FAN algorithm, even for higher sample reduction ratios, and the difference in performance increases as the sample reduction ratio increases. One single run of the experiment is shown in Figure 5. In this example we see that, at the same sample reduction ratio, the CCSP algorithm gives a smaller infinity norm error and a significantly smaller PRD than the FAN algorithm.

Figure 5. a) Test signal 1) compressed with CCSP, $\varepsilon_\infty = 14.72$, PRD = 9.92, $m = 25$; b) test signal 1) compressed with FAN, $\varepsilon_\infty = 15.00$, PRD = 19.69, $m = 25$

5. CONCLUSIONS

This paper demonstrates a method for incorporating two error measures in a time domain compression algorithm. The goal is to minimize the total error while respecting bounds on the maximum error and the size of the compressed signal. We outline the idea behind efficient computation of the maximum error, and show how to incorporate a bound on this error in an existing algorithm for minimizing the total error.

The complexity of the algorithm implemented is no worse than cubic in the number of samples, and experiments show that the introduction of the new error bound does not slow down the execution. Unlike previously known methods, the suggested algorithm enables us to control both the maximum error and the sample reduction ratio. When applied to standard test data, our method yields compressions superior to those of the conventional FAN algorithm.

Future work lies in investigating the execution time of the CCSP algorithm. Results in this article indicate that by incorporating the infinity norm error in the algorithm, execution time can actually be reduced. This will make it easier to meet the real time requirements.

REFERENCES

[1] R.C. Barr, Adaptive sampling of cardiac waveforms, Journal of Electrocardiography, 21, 57-60, 1988.
[2] J.R. Cox, F.M. Nolle, H.A. Fozzard and G.C. Oliver, AZTEC: a preprocessing program for real-time ECG rhythm analysis, IEEE Transactions on Biomedical Engineering, BME-15, 128-129, 1968.
[3] D. Haugland, J.G. Heber and J.H. Husøy, Optimisation algorithms for ECG data compression, Medical & Biological Engineering & Computing, accepted for publication, 1997.
[4] D. Haugland, J.G. Heber and J.H. Husøy, Compressing data by shortest path methods, in Operations Research Proceedings 1996, Springer, Berlin, Germany, 1997.
[5] J.G. Heber, D. Haugland and J.H. Husøy, An efficient implementation of an optimal time domain coder, in Proc. IEEE EMBS'96, Amsterdam, Holland, 1996.
[6] S.C. Tai, SLOPE - a real time ECG data compressor, Medical & Biological Engineering & Computing, 29, 175-179, 1991.
[7] S.C. Tai, AZTDIS - a two-phase real-time ECG data compressor, Journal of Biomedical Engineering, 15, 510-515, 1993.
[8] M. Wan and Z. Tang, A new algorithm for constructing dynamic convex hull in the plane, SPIE, 2644, 273-282, 1996.
[9] G. Moody, MIT-BIH Arrhythmia Database CD-ROM (second edition), overview, Massachusetts Institute of Technology, 1992.