RESEARCH PAPER International Journal of Recent Trends in Engineering, Vol 1, No. 3, May 2009

Lossless ECG Compression for Event Recorder Based on Burrows-Wheeler Transformation and Move-To-Front Coder

Sarada Prasad Dakua and Jyotinder Singh Sahambi

Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, India
Email: {sarada, jssahambi}@iitg.ernet.in

Abstract— The field of data compression has produced many algorithms, and the search for better compression schemes continues. The Burrows-Wheeler transform (BWT) has become a crucial tool for data compression. Normally, the BWT attains its maximum efficiency when the input is in text format. In this paper, some real practical difficulties that arise when the input data are converted to text format are explored. The skewness of all the necessary and possible coder combinations is calculated, and it is found that the combination of a move-to-front coder with a Huffman coder gives a better result when the input data are numeric rather than text. In contrast to existing BWT-based algorithms, which use a large amount of input data, all the data used in this work are of small duration, and an average compression ratio of 2.7247 is nevertheless achieved when the size of the output is weighed against that of the input. For this reason the technique may be considered useful for data transfer in an ECG event recorder.

Index Terms— Burrows-Wheeler transform, move-to-front coder, inversion rank, run-length coder.

I. INTRODUCTION

ELECTROCARDIOGRAM (ECG) coding is required in several applications such as ambulatory monitoring, patient databases, medical education systems, and transmission over telephone lines. Current technologies provide sufficient space to store or transmit data, so storage itself is no longer a major problem; the continuous effort to reduce time requirements, however, has kept ECG data compression in focus, and the topic has received much attention. The many papers published over the years are surveyed in [1]. In recent years the wavelet transform, combined with neural networks, artificial intelligence, and related tools, has gained momentum, but these approaches [2]-[5] are essentially lossy compression schemes. With the rapid growth of different diseases, the data obtained for diagnosis should be retained in their original form to ensure a reliable diagnosis; when transmission is necessary, for an expert's opinion or similar purposes, a lossless scheme therefore serves the required compression better. Such an approach was realized with the introduction of the Burrows-Wheeler transform [6]. Michael Burrows and David Wheeler discuss in their paper a complete set of algorithms for compression and decompression. The algorithm takes a block of data and rearranges it using a sorting algorithm. This transformation is reversible, meaning that the original ordering of the data elements can be restored with no loss of fidelity. It has two modest properties, flexibility and robustness, and modification is possible when desired [7], [8].

The Burrows-Wheeler algorithm is relatively recent. An implementation of the algorithm called bzip is currently one of the best overall compression algorithms for text data; it achieves compression ratios within 10 percent of the best algorithms, such as Prediction by Partial Matching (PPM), but runs significantly faster. A similar algorithm has been implemented in [9], achieving a compression ratio of 3.21 bits per sample, but some points still need attention before an overall improvement is possible. That work suggests that a large data size is required for good compression; correspondingly, data of small duration yield an unsatisfactory compression ratio, and the paper mostly uses 30-minute databases to sustain a good compression ratio. In practice, however, for a patient with serious cardiac ischemia every moment is crucial: even a small extra delay before treatment may leave the patient in a worse condition. In such a situation, the time required to decode a huge amount of encoded data is hardly affordable, and to reduce this time the amount of encoded data itself must be reduced. Another important case is an event recorder, where only a small portion of interest within the entire record is to be examined; if that portion is to be transmitted, the method of [9] will not give good compression. A further problem lies with the data format: if any data (which are in megabytes) from the source site [10] are converted into a text format, there is a reduction in size, and in this process some loss, however small, occurs [11]. If this loss includes valuable diagnostic information, any further analysis of that data is useless. To avoid these problems, a simple method is adopted here, which proves more useful in the end.
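To make the block-sorting idea concrete, the following is a minimal sketch of a forward and inverse BWT. It is illustrative only, assuming a sentinel byte that never occurs in the input and using a naive rotation sort for clarity; it is not the implementation of [6] or [9], which use suffix sorting and operate block by block.

```python
# Minimal sketch of the forward and inverse Burrows-Wheeler transform.
# Assumes a sentinel byte (0) absent from the input; O(n^2 log n) and
# purely illustrative -- practical coders such as bzip sort suffixes.

def bwt_forward(data: bytes, sentinel: int = 0) -> bytes:
    """Sort all rotations of data+sentinel and keep the last column."""
    assert sentinel not in data, "sentinel must not occur in the input"
    s = data + bytes([sentinel])
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(rot[-1] for rot in rotations)

def bwt_inverse(last_column: bytes, sentinel: int = 0) -> bytes:
    """Rebuild the sorted rotation table by repeated prepend-and-sort."""
    table = [b""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(bytes([c]) + row for c, row in zip(last_column, table))
    # the original string is the rotation that ends with the sentinel
    original = next(row for row in table if row[-1] == sentinel)
    return original[:-1]

assert bwt_inverse(bwt_forward(b"banana")) == b"banana"
```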

II. PROPOSED STEPS FOR COMPRESSION

The input data are kept in their original format (.mat, after extracting the signal by running the program file given on the same site), exactly as they are available at the source site [10], so that no loss of any kind is introduced. From this point onwards, the equivalent numeric data are used for all further operations. The data flow diagram is given in Fig. 1.

Fig. 1. Data flow diagram of the proposed method

The BWT was initially designed for efficient text data compression and gradually became a useful tool for biomedical signal compression. Used with other tools such as the MTF coder or the inversion rank (IR) coder, the BWT can achieve compression within a percent or so of that achieved by statistical techniques, at speeds comparable to those of algorithms based on Lempel and Ziv (LZ), so it became popular very quickly. Like LZ, the BWT algorithm does not process its input sequentially; instead it processes a block of text as a single unit. The input string is first divided into blocks that together cover the entire string. Applying the BWT to these blocks gathers identical characters together without losing the proximity of a character to its neighbours in the original signal. The transformed blocks are then easy to compress with a locally adaptive algorithm such as an MTF or IR coder. For a detailed study, readers may refer to [6], [9], [14].
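The locally adaptive MTF stage mentioned above can be sketched as follows; the byte alphabet and function names are illustrative assumptions, not the authors' exact coder.

```python
# Minimal sketch of a move-to-front coder over the byte alphabet.
# After the BWT, runs of identical bytes become runs of small ranks
# (mostly zeros), which suit a variable-length/entropy coder.

def mtf_encode(data: bytes) -> list:
    alphabet = list(range(256))
    ranks = []
    for b in data:
        r = alphabet.index(b)                # current rank of the symbol
        ranks.append(r)
        alphabet.insert(0, alphabet.pop(r))  # move the symbol to the front
    return ranks

def mtf_decode(ranks) -> bytes:
    alphabet = list(range(256))
    out = bytearray()
    for r in ranks:
        b = alphabet[r]
        out.append(b)
        alphabet.insert(0, alphabet.pop(r))
    return bytes(out)

assert mtf_decode(mtf_encode(b"aaabbb")) == b"aaabbb"
```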

Finally, a Huffman coder is used for entropy coding, owing to its computational efficiency and stability; the performance obtained with both the arithmetic coder and the Huffman coder is reported in Section III. For the purposes of this paper, the Huffman stage is described briefly below.

Huffman Encoding: Let $B = \{0, 1\}$ and let the map

$$T_H : \mathbb{R}^X \times (B^n)^{Z_m} \to B^Y$$

denote the Huffman encoding, which employs (a) quantization, to map source input values to $Z_m = [0 : m-1]$; (b) a codebook $c \in (B^n)^{Z_m}$, constructed from the input statistics, which maps each quantized sample value to a bit string of maximum length $n$; and (c) concatenation of the bit strings, to yield the compressed version of the source input $a \in \mathbb{R}^X$.
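Steps (b) and (c) can be illustrated with a small codebook-construction sketch; the heap-based routine and helper names below are our illustrative choices, not the paper's implementation.

```python
# Sketch of steps (b) and (c): build a codebook c from input statistics
# (frequent symbols get shorter bit strings) and concatenate code words.
import heapq
from collections import Counter

def huffman_codebook(symbols):
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate one-symbol input
        return {next(iter(freq)): "0"}
    # heap entries: (weight, unique tiebreak, {symbol: partial code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def huffman_encode(symbols, codebook):
    return "".join(codebook[s] for s in symbols)   # concatenation, step (c)

codes = huffman_codebook([0, 0, 0, 0, 1, 1, 2])
assert len(codes[0]) <= len(codes[2])   # frequent symbol gets shorter code
```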

Complexity: If $c$ stores words of average length $k$ bits, then $X$ table look-ups and $k \cdot X / \mathrm{CR}$ input/output operations are required for Huffman compression, where CR denotes the compression ratio.

Decomposition: Let $X$ be divided into $P$ partitions $W_i$, $1 \le i \le P$, of equal size, which can be concatenated in normal scanning order to yield $X$. Let each $W_i$ be input to one of $P$ pipelines, each consisting of (a) a table look-up processor that outputs the bit string in $c$ corresponding to a source sample value from block $b = a|_{W_i}$, and (b) a concatenation processor that assembles the bit strings into a representation of $b$. This would be followed by a concatenation network that assembles the bit string on $Y$, which represents the source input $a$.

Some of the points suggested in [9] were found to be less effective in our experiments. For numeric data compression: 1) the MTF coder is more effective than the IR coder; 2) the Huffman coder is more useful than the arithmetic coder; and 3) predictive coding is of least use in our process, as the original signal plays no direct role in the compression. Recalling the compression steps (MTF) of [6], the input to the variable-length encoder consists simply of positive integers: the ranking values and the numbers of positions between repeated samples.

The data used here comprise channel one of the records x_800, x_418, Cu01, 102, 103, 117, e0103, e0105, and e0107, all publicly available on the PhysioNet site [10]; the corresponding sampling frequencies and bit resolutions are given there.

III. EXPERIMENTAL RESULTS

In this section the performance measures, namely compression ratio, compression gain (CG), and skewness, are calculated for various existing algorithms and for the proposed one. Speed is hard to compare across the algorithms reported to date, as most do not state their run times. The time complexity of the Huffman algorithm is O(n log n), where n is the number of characters involved. The average time complexity of a run-length encoding (RLE) operation is O(u log v log w), where u is the size of the repeated data item (unity in our method), v is the overall length of the sequence, and w is the size of one element of the sequence. Time complexity does not always scale in proportion to execution time, so the execution time (CPU time) has been considered when judging complexity. The present algorithm is found to be at least ten times faster than the algorithm of [9], because the Huffman coder is computationally more efficient than the arithmetic coder [15].
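The RLE stage analysed above admits a very small sketch; the (value, run length) pair representation below is a generic illustration, not necessarily the exact format used in the experiments.

```python
# Sketch of the run-length stage: collapse each run of identical values
# into a (value, run length) pair. After MTF, long runs of zeros are
# typical, which is what makes this stage pay off.

def rle_encode(values):
    pairs = []
    for v in values:
        if pairs and pairs[-1][0] == v:
            pairs[-1] = (v, pairs[-1][1] + 1)   # extend the current run
        else:
            pairs.append((v, 1))                # start a new run
    return pairs

def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

assert rle_encode([0, 0, 0, 5, 5, 1]) == [(0, 3), (5, 2), (1, 1)]
```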

The results obtained from the various methods are shown in two tables. Tables I and II give the compression ratios of the following methods, whose blocks operate in sequence from left to right:
1) Method A uses BWT, IR, and arithmetic coding.
2) Method B uses BWT, MTF, and arithmetic coding.
3) Method C uses BWT, MTF, and Huffman coding.
4) Method D uses BWT, IR, and Huffman coding.
5) Method E uses BWT, IR, RLE, and arithmetic coding.
6) Method F uses BWT, MTF, RLE, and Huffman coding.
A sketch of how these blocks compose is given below.
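The composition for Method F, reusing the illustrative sketches given in the earlier sections (bwt_forward, mtf_encode, rle_encode, huffman_codebook, huffman_encode; none of these are the authors' code), might look like the following. Flattening the (value, count) pairs before the entropy stage is our simplification for illustration.

```python
# Sketch of the Method F chain: BWT -> MTF -> RLE -> Huffman, chaining
# the illustrative stage functions defined in the previous examples.

def method_f_encode(block: bytes):
    ranks = mtf_encode(bwt_forward(block))          # block sort, then rank
    pairs = rle_encode(ranks)                       # collapse runs of ranks
    symbols = [x for pair in pairs for x in pair]   # flatten for the coder
    codebook = huffman_codebook(symbols)            # codebook from statistics
    return huffman_encode(symbols, codebook), codebook
```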


TABLE III
VALUES OF SKEWNESS FOR DIFFERENT LOSSLESS TECHNIQUES

Sl. No.   Data Base     Method E    Method F
1         x_418; ch1    0.086472    0.12609
2         102; ch1      0.073289    0.15392
3         e0103; ch1    0.075751    0.16762
4         103; ch1      0.069534    0.14692
5         e0105; ch1    0.11029     0.17076
6         104; ch1      0.091592    0.14263
7         e0107; ch1    0.089228    0.13867
8         117; ch1      0.064258    0.14363
9         x_800; ch1    0.10032     0.1741
10        Cu01; ch1     0.06801     0.12851

The lower compression ratios obtained in methods A through D with numeric data as input make those methods less attractive, whereas the main objective, precise compression of small-duration data, is fulfilled by methods E and F. To establish the superiority of the proposed method, the skewness values of the last two methods (E and F) are provided. Skewness is an important parameter for comparing the compression capability of different methods: the higher the skewness, the better the compression [12]. The skewness of a distribution is defined as

y=

E(x − µ)

REFERENCES [1] S. M. S. Jalaleddine, C. G. Hutchens, R. D. Strattan, and W. A. Coberly, ”ECG data compression techniques unified approach,” IEEE Trans. On Biomed. Eng., Vol. 37, No. 4, pp. 329343, Apr. 1990. [2] M. Velasco, F. Roldn, J. Llorente and Kenneth E. Barner, ”Wavelet Packets Feasibility Study for the Design of an ECG Compressor,” IEEE Trans. on Biomed. Eng., Vol. 54, No. 4, pp 766-769, Apr. 2007. [3] R. Benzid, F. Marir, A. Boussaad, M. Benyoucef and D. Arar,”Fixed percentage of wavelet coefficients to be zeroed for ECG compression,” Electron. Lett., Vol. 39, No. 11, pp. 830831, May 2003. [4] S. Miaou, H. Yen, and C. Lin, ”Wavelet-based ECG compression using dynamic vector quantization with tree code vectors in single codebook,” IEEE Trans. on Biomed. Eng., Vol. 49, No. 7, pp. 671680, Jul. 2002. [5] A. Chatterjee, A. Nait-Ali, and P. Siarry,”An input-delay neural-network based approach for piecewise ECG signal compression,” IEEE Trans. On Biomed. Eng., Vol. 52, No. 5, pp. 945947, May 2005. [6] M. Burrows and D. J. Wheeler,”A block-sorting lossless data compression algorithm,” SRC Res. Rep. 124, 1994. [7] M. Nelson,”Data Compression with the Burrows-Wheeler Transform,” Dr. Dobb’s Journal, pp.46K, Sept. 1996. [8] B. Balkenhol, S. Kurtz, Y. M. Shtarkov, ”Modifications of the Burrows Wheeler Data Compression Algorithm” Proceedings of the IEEE Data Compression Conf., 1999. [9] Z. Arnavut, ”ECG Signal Compression Based on BurrowsWheeler Transformation and Inversion Ranks of Linear Prediction,” IEEE Trans. on Biomed. Engg., Vol. 54, No. 3, Mar 2007. [10] G. B. Moody and R. G. Mark, ”The impact of the MIT/BIH arrhythmia database,” IEEE Eng. Med. Biol. Mag., Vol. 20, No. 3, pp. 4550, Jun. 2001.

3

(1)

σ3

where µ is the mean of x, σ is the standard deviation of x, and E(t) denotes the expected value of the quantity t. The skewness for both methods after entropy coding (arithmetic and Huffman coding, respectively) is calculated and given in Table III. Every value under method F is greater than the corresponding value under method E, which confirms that the combination of MTF, RLE, and Huffman coding is the most efficient of the combinations considered.
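For concreteness, Eq. (1) can be computed for a block of coded symbols as below; NumPy is our choice of library here, and the helper name is hypothetical.

```python
# Computing the skewness of Eq. (1) for a sequence of coded symbols;
# a minimal sketch, not the paper's evaluation script.
import numpy as np

def skewness(x):
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()     # mean and population std. deviation
    return float(np.mean((x - mu) ** 3) / sigma ** 3)
```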

The compression ratios for all the databases have been calculated from the formula

$$\mathrm{CR} = \frac{\text{sampling freq.} \times \text{duration} \times \text{resolution}}{\text{no. of bits in the compressed output}} \qquad (2)$$
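As a numeric illustration of Eq. (2), the following uses hypothetical record parameters, chosen only to mirror typical MIT-BIH settings; the numbers are not taken from the paper's tables.

```python
# Eq. (2) as code: original bits over compressed bits. Example values
# are hypothetical (360 Hz, 11-bit samples, 3-minute record).

def compression_ratio(sampling_freq, duration_s, resolution_bits, compressed_bits):
    return (sampling_freq * duration_s * resolution_bits) / compressed_bits

print(compression_ratio(360, 180, 11, 262_000))   # about 2.72
```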

A standard and quite comparable average compression ratio of 2.7247 (the ratio of the number of bits in the original data to the number of bits in the compressed data, per Eq. (2)) is achieved by method F when the size of the output is weighed against that of the input. Table II also lists the compression gains (CG) of the two best methods; the positive CG for every database confirms the benefit of using MTF over IR when the input is treated as numbers rather than characters. It should be noted that all these results are obtained from data of 3 minutes' duration, and the data are numeric; a direct comparison with the algorithm reported in [9] would therefore be inappropriate.

IV. CONCLUSION

A modified BWT-based coding scheme has been proposed that replaces the IR block by an MTF block and incorporates a run-length coding block before the conventional entropy coding stage. Instead of converting .mat files to text, the original format is used, thus avoiding any loss in any form. This scheme is especially suitable for transferring ECG data from an event recorder, because the existing algorithm, which requires a huge amount of input data, does not perform satisfactorily there. The experimental results suggest that the MTF coder is more suitable than the IR coder when the input is in its original format. The experiments also indicate scope for further compression, owing to the presence of groups of two- and three-digit numbers after MTF coding; if those numbers can be made smaller before entropy coding, the current compression ratio may be improved further.

REFERENCES

[1] S. M. S. Jalaleddine, C. G. Hutchens, R. D. Strattan, and W. A. Coberly, "ECG data compression techniques - a unified approach," IEEE Trans. Biomed. Eng., Vol. 37, No. 4, pp. 329-343, Apr. 1990.
[2] M. Velasco, F. Roldán, J. Llorente, and K. E. Barner, "Wavelet packets feasibility study for the design of an ECG compressor," IEEE Trans. Biomed. Eng., Vol. 54, No. 4, pp. 766-769, Apr. 2007.
[3] R. Benzid, F. Marir, A. Boussaad, M. Benyoucef, and D. Arar, "Fixed percentage of wavelet coefficients to be zeroed for ECG compression," Electron. Lett., Vol. 39, No. 11, pp. 830-831, May 2003.
[4] S. Miaou, H. Yen, and C. Lin, "Wavelet-based ECG compression using dynamic vector quantization with tree code vectors in single codebook," IEEE Trans. Biomed. Eng., Vol. 49, No. 7, pp. 671-680, Jul. 2002.
[5] A. Chatterjee, A. Nait-Ali, and P. Siarry, "An input-delay neural-network based approach for piecewise ECG signal compression," IEEE Trans. Biomed. Eng., Vol. 52, No. 5, pp. 945-947, May 2005.
[6] M. Burrows and D. J. Wheeler, "A block-sorting lossless data compression algorithm," SRC Res. Rep. 124, 1994.
[7] M. Nelson, "Data compression with the Burrows-Wheeler transform," Dr. Dobb's Journal, Sept. 1996.
[8] B. Balkenhol, S. Kurtz, and Y. M. Shtarkov, "Modifications of the Burrows-Wheeler data compression algorithm," Proc. IEEE Data Compression Conf., 1999.
[9] Z. Arnavut, "ECG signal compression based on Burrows-Wheeler transformation and inversion ranks of linear prediction," IEEE Trans. Biomed. Eng., Vol. 54, No. 3, Mar. 2007.
[10] G. B. Moody and R. G. Mark, "The impact of the MIT-BIH arrhythmia database," IEEE Eng. Med. Biol. Mag., Vol. 20, No. 3, pp. 45-50, Jun. 2001.

[11] E. Johnson and J. Ha, "PDATS: lossless address trace compression for reducing file size and access time," IEEE Conf., 1994.
[12] P. Fenwick, "The Burrows-Wheeler transform for block-sorting text compression: principles and improvements," Comput. J., Vol. 39, No. 9, pp. 731-740, Oct. 1996.
[13] C. D. Giurcăneanu, I. Tabus, and S. Mereuta, "Using contexts and R-R interval estimation in lossless ECG compression," Comput. Meth. Prog. Biomed., Vol. 67, No. 3, pp. 177-186, Mar. 2002.

[14] Z. Arnavut, "Inversion coding," Comput. J., Vol. 47, No. 1, pp. 46-57, Jan. 2004.
[15] X. Kavousianos, E. Kalligeros, and D. Nikolos, "Multilevel Huffman coding: an efficient test-data compression method for IP cores," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., Vol. 26, No. 6, Jun. 2007.

TABLE I
COMPRESSION RATIOS FOR DIFFERENT LOSSLESS TECHNIQUES

Sl. No.     Data Base     Method A   Method B   Method C   Method D
1           x_418; ch1    1.8125     1.9219     1.983      1.9138
2           102; ch1      2.0876     2.3316     2.398      2.3836
3           e0103; ch1    2.348      2.6787     2.7386     2.6985
4           103; ch1      1.9743     2.1591     2.2164     2.2086
5           e0105; ch1    2.4751     2.7972     2.8714     2.8689
6           104; ch1      1.9282     2.0966     2.1587     2.1362
7           e0107; ch1    2.1639     2.3956     2.4519     2.4496
8           117; ch1      1.9816     2.197      2.2494     2.2294
9           x_800; ch1    2.0554     2.3011     2.3522     2.4034
10          Cu01; ch1     1.6861     1.8158     1.865      1.9797
Avg.        -             2.0513     2.2695     2.3285     2.3272
Weig. Avg.  -             2.0443     2.2599     2.3192     2.3146

TABLE II
COMPRESSION RATIOS AND COMPRESSION GAIN FOR THE LAST TWO TECHNIQUES

Sl. No.     Data Base     Method E   Method F   CG (%)
1           x_418; ch1    1.9980     2.4992     20.05
2           102; ch1      2.1971     2.6944     18.46
3           e0103; ch1    2.5213     2.9999     15.95
4           103; ch1      2.0571     2.6433     22.18
5           e0105; ch1    2.6801     3.2335     17.11
6           104; ch1      2.0110     2.5687     21.71
7           e0107; ch1    2.3844     2.7864     14.43
8           117; ch1      2.0966     2.6824     21.84
9           x_800; ch1    2.2022     2.6087     15.58
10          Cu01; ch1     1.9932     2.5309     21.25
Avg.        -             2.2141     2.7247     23.06
Weig. Avg.  -             2.2000     2.7192     23.60
