Lossless Waveform Compression

Giridhar Mandyam, Neeraj Magotra, and Samuel D. Stearns
1 Introduction

Compression of waveforms is of great interest in applications where efficiency with respect to data storage or transmission bandwidth is sought. Traditional methods for waveform compression, while effective, are lossy. In certain applications, even slight compression losses are not acceptable. For instance, real-time telemetry in space applications requires exact recovery of the compressed signal. Furthermore, in the area of biomedical signal processing, exact recovery of the signal is necessary not only for diagnostic purposes but also to reduce potential liability for litigation. As a result, interest has increased recently in the area of lossless waveform compression. Many techniques for lossless compression exist. The most effective of these belong to a class of coders commonly called entropy coders. These methods have proven effective for text compression, but perform poorly on most kinds of waveform data, as they fail to exploit the high correlation that typically exists among the data samples. Therefore, pre-processing the data to achieve decorrelation is a desirable first step for data compression. This yields a two-stage approach to lossless waveform compression, as shown in the block diagram in Figure 1. The first stage is a "decorrelator," and the second stage is an entropy coder [1].
This general framework covers most classes of lossless waveform compression.
2 Compressibility of a Data Stream

The compressibility of a data stream is dependent on two factors: the amplitude distribution of the data stream, and the power spectrum of the data stream. For instance, if a single value dominates the amplitude distribution, or a single frequency dominates the power spectrum, then the data stream is highly compressible. Four sample waveforms are depicted in Figure 2 along with their respective amplitude distributions and power spectra. Their characteristics and compressibility are given in Table 1. Waveform 1 displays poor compressibility characteristics, as no one particular value dominates its amplitude distribution and no one frequency dominates its power spectrum. Waveform 4 displays high compressibility, as its amplitude distribution and its power spectrum are both nonuniform.
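As an illustration of these two indicators, the following Python sketch estimates the amplitude histogram and power spectrum of a sequence. The test signal (a sinusoid in mild noise) is an assumption used only for demonstration and is not one of the waveforms in Figure 2.

import numpy as np

def compressibility_indicators(x, bins=64):
    """Estimate the two indicators discussed above: the amplitude
    distribution (as a normalized histogram) and the power spectrum."""
    hist, edges = np.histogram(x, bins=bins, density=True)
    spectrum = np.abs(np.fft.rfft(x - np.mean(x))) ** 2
    return hist, edges, spectrum

# Hypothetical test signal: a sinusoid plus a small amount of noise,
# i.e., a highly compressible waveform with a strongly peaked spectrum.
n = np.arange(4096)
x = np.sin(2 * np.pi * n / 32) + 0.01 * np.random.randn(n.size)
hist, edges, spectrum = compressibility_indicators(x)
print("fraction of power in strongest bin:", spectrum.max() / spectrum.sum())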
3 Performance Criteria for Compression Schemes

There are several criteria for quantifying the performance of a compression scheme. One such criterion is the reduction in entropy from the input data to the output of the decorrelation stage in Figure 1. The entropy of a set of K symbols \{s_0, s_1, \ldots, s_{K-1}\}, each with probability p(s_i), is defined as [2]

H_p = -\sum_{i=0}^{K-1} p(s_i) \log_2[p(s_i)] \quad \text{bits/symbol} \qquad (1)
Entropy is a means of determining the minimum number of bits required to encode a stream of data symbols, given the individual probabilities of symbol occurrence.
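As a concrete illustration of (1), the sketch below estimates the entropy of a stream of integer samples from its empirical symbol probabilities; the function name and the use of empirical frequencies in place of true probabilities are assumptions made for this example.

import numpy as np

def empirical_entropy(samples):
    """Estimate H_p of (1), in bits/symbol, from the relative frequency
    of each distinct symbol in the stream."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Example: a stream dominated by a single value has low entropy.
stream = np.array([0, 0, 0, 0, 1, 0, 0, 2, 0, 0])
print(empirical_entropy(stream))   # well below log2(3) bits/symbol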
When the symbols are digitized waveform samples, another criterion is the variance (mean-squared value) of the zero-mean output of the decorrelation stage. Given an N-point zero-mean data sequence x(n), where n is the discrete time index, the variance \sigma_x^2 is calculated by the following equation:

\sigma_x^2 = \frac{1}{N-1} \sum_{n=0}^{N-1} x^2(n) \qquad (2)
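A minimal sketch of (2), assuming the sequence has already been made zero-mean (the mean is removed explicitly here as a safeguard):

import numpy as np

def sample_variance(x):
    """Sample variance of (2) for a zero-mean sequence x(n)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()              # enforce the zero-mean assumption
    return np.sum(x * x) / (len(x) - 1)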
The variance is a much easier quantity to calculate than entropy; however, it is not as reliable. For instance, only two values might dominate a sample sequence, yet these two values may not be close to one another. If this is the case, the entropy of the data stream is very low (implying good compressibility), while the variance may be high. Nevertheless, for most waveform data, using the variance of the output of the decorrelator stage to determine compressibility is acceptable, due to the approximately white Gaussian nature of the output of the decorrelation stage. This assumption of Gaussianity results from arguments based on the central limit theorem [3], which states that the distribution of the sum of independent, identically distributed random variables tends to a Gaussian distribution as the number of random variables added together approaches infinity.

The compression ratio, abbreviated as c.r., is the ratio of the length (usually measured in bits) of the input data sequence to the length of the output data sequence for a given compression method. This is the most important measure of performance for a lossless compression technique. When comparing compression ratios for different techniques, it is important to be consistent by noting information that is known globally and not included in the compressed data sequence.
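For example, the compression ratio of a residue-based coder can be tallied in bits, counting any side information (predictor coefficients, start-up samples) that travels with the compressed stream; the bit counts used below are assumptions chosen purely for illustration.

def compression_ratio(n_samples, bits_per_sample, coded_bits, overhead_bits):
    """c.r. = input bits / (coded residue bits + side-information bits)."""
    return (n_samples * bits_per_sample) / (coded_bits + overhead_bits)

# Hypothetical case: 4096 16-bit samples coded at 9 bits/sample on average,
# plus 512 bits of predictor coefficients and start-up values.
print(compression_ratio(4096, 16, 4096 * 9, 512))   # about 1.75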
4 The Decorrelation Compression Stage

Several predictive methods exist for exploiting correlation between neighboring samples in a given data stream. These methods all follow the process shown in Figure 3: the same decorrelating function is used in compression and reconstruction, and this function must take as input a delayed version of the input sequence. Some of these techniques are described in the following sections.
4.1 Linear Prediction

A one-step linear predictor is a nonrecursive system that predicts the next value of a data stream by using a weighted sum of a pre-specified number of samples immediately preceding the sample to be predicted. The linear predictor does not contain the feedback path in Figure 3, and is thus a special case of Figure 3. Given a sample sequence of length K, ix(n) (0 \le n \le K-1), one can design a predictor of order M by using M predictor coefficients \{b_i\} to find an estimate \hat{ix}(n) for each sample in ix(n):

\hat{ix}(n) = \sum_{i=0}^{M-1} b_i\, ix(n-i-1) \qquad (3)
Obviously, M should be much less than K to achieve compression, because \{b_i\} must be included with the compressed data. The estimate \hat{ix}(n) is not the same as the original value; therefore a residue sequence is formed to allow exact recovery of ix(n):

ir(n) = ix(n) - \hat{ix}(n) \qquad (4)
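A minimal sketch of (3) and (4), assuming the predictor coefficients \{b_i\} are already known (their estimation is discussed next); the function names are choices made for this example.

import numpy as np

def predict(ix, b):
    """One-step linear prediction, eq. (3): the estimate at time n is a
    weighted sum of the M samples preceding n.  Samples n < M are left
    unpredicted (they are carried as start-up values)."""
    ix = np.asarray(ix, dtype=float)
    M = len(b)
    est = np.zeros(len(ix))
    for n in range(M, len(ix)):
        est[n] = sum(b[i] * ix[n - i - 1] for i in range(M))
    return est

def residue(ix, b):
    """Residue sequence of eq. (4)."""
    return np.asarray(ix, dtype=float) - predict(ix, b)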
If the predictor coefficients are chosen properly, the entropy of ir(n) should be less than the entropy of ix(n). Choosing the coefficients \{b_i\} involves solving the Yule-Walker equations:

R_{0,0} b_0 + R_{0,1} b_1 + \cdots + R_{0,M-1} b_{M-1} = R_{M,0}
R_{1,0} b_0 + R_{1,1} b_1 + \cdots + R_{1,M-1} b_{M-1} = R_{M,1}
\vdots
R_{M-1,0} b_0 + R_{M-1,1} b_1 + \cdots + R_{M-1,M-1} b_{M-1} = R_{M,M-1} \qquad (5)

where R_{i,j} is the average over n of the product ix(n)\, ix(n+(i-j)). This can be represented as the matrix-vector product

R b = p \qquad (6)

where R is the M by M matrix defined in (5), b is the M by 1 vector of predictor coefficients, and p is the M by 1 vector in (5). Equation (6) can be solved by a variety of techniques involving the inversion of symmetric matrices [4].
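A sketch of one way to set up and solve for the coefficients, assuming R_{i,j} is estimated from sample lag products so that R is a symmetric Toeplitz matrix; the right-hand side below is ordered so that the solution drops directly into the predictor of (3), which under this Toeplitz assumption differs from the ordering in (5) only by a reversal of the coefficient indices. numpy.linalg.solve stands in for the symmetric-matrix methods of [4].

import numpy as np

def lag_product(ix, lag):
    """Average over n of ix(n) * ix(n + lag), an estimate of R_{i,j} for i - j = lag."""
    ix = np.asarray(ix, dtype=float)
    return np.mean(ix[:len(ix) - lag] * ix[lag:]) if lag else np.mean(ix * ix)

def predictor_coefficients(ix, M):
    """Estimate b_0, ..., b_{M-1} for the predictor of (3) by solving a
    Yule-Walker system R b = p built from sample autocorrelations."""
    r = [lag_product(ix, k) for k in range(M + 1)]
    R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
    p = np.array([r[i + 1] for i in range(M)])   # pairs ix(n) with ix(n-i-1)
    return np.linalg.solve(R, p)

# Hypothetical test data: a rounded noisy sinusoid, fit with a 2nd-order predictor.
n = np.arange(2048)
ix = np.round(100 * np.sin(2 * np.pi * n / 64) + np.random.randn(n.size))
print(predictor_coefficients(ix, M=2))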
The original data sequence ix(n) can be exactly recovered using the predictor coefficients \{b_i\}, the residue stream ir(n), and the first M samples of ix(n) [1]. This is accomplished by the recursive relationship

ix(n) = ir(n) + \sum_{i=0}^{M-1} b_i\, ix(n-i-1), \qquad M \le n \le K-1 \qquad (7)
If the original data sequence ix(n) is an integer sequence, then the predictor output can be rounded to the nearest integer and still form an error residual sequence of comparable size. In this case, ir(n) is calculated as

ir(n) = ix(n) - \mathrm{NINT}\left\{ \sum_{i=0}^{M-1} b_i\, ix(n-i-1) \right\} \qquad (8)
where NINT\{\cdot\} is the nearest integer function. Similarly, the ix(n) data sequence is recovered from the residue sequence as

ix(n) = ir(n) + \mathrm{NINT}\left\{ \sum_{i=0}^{M-1} b_i\, ix(n-i-1) \right\}, \qquad M \le n \le K-1 \qquad (9)

where it is presumed that the NINT\{\cdot\} operation is performed (at the bit level) exactly as in (8).
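The sketch below illustrates (8) and (9) end to end: it forms the rounded integer residue and then rebuilds ix(n) exactly from the first M samples and the residue, and it also mirrors the recursion of (7). The same NINT helper is used in both directions; the particular rounding rule chosen here is an implementation assumption for this example, the essential point being that compressor and reconstructor apply it identically.

import numpy as np

def nint(v):
    """One fixed nearest-integer rule, used identically in (8) and (9)."""
    return int(np.floor(v + 0.5))

def integer_residue(ix, b):
    """Eq. (8): integer residue; samples n < M are kept as start-up values."""
    M = len(b)
    ir = list(ix[:M])
    for n in range(M, len(ix)):
        pred = sum(b[i] * ix[n - i - 1] for i in range(M))
        ir.append(ix[n] - nint(pred))
    return ir

def reconstruct(ir, b):
    """Eq. (9): rebuild ix(n) recursively from the residue and start-up values."""
    M = len(b)
    ix = list(ir[:M])
    for n in range(M, len(ir)):
        pred = sum(b[i] * ix[n - i - 1] for i in range(M))
        ix.append(ir[n] + nint(pred))
    return ix

# Round-trip check with hypothetical coefficients.
b = [1.6, -0.9]
ix = [int(100 * np.sin(0.1 * n)) for n in range(500)]
assert reconstruct(integer_residue(ix, b), b) == ix   # lossless recovery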
4.1.1 Determination of Predictor Order

Formulating an optimal predictor order M is crucial to achieving optimal compression [1], because there is a tradeoff between the lower variance of the residual sequence and the increasing overhead due to larger predictor orders. There is no single approach to the problem of finding the optimal predictor order; in fact, several methods exist. Each of the methods entails finding the sample variance of the zero-mean error residuals ir(n), as determined by (4), for an order M:

\sigma_{ir}^2(M) = \frac{1}{K-M-1} \sum_{i=M}^{K-1} ir^2(i) \qquad (10)
One of the easiest methods for finding an optimal predictor order is to increment M starting from M = 1 until \sigma_{ir}^2(M) reaches a minimum, which may be termed the minimum variance criterion (MVC). Another method, called the Akaike Information Criterion (AIC) [5], involves minimizing the following function:
AIC(M) = K \ln \sigma_{ir}^2(M) + 2M \qquad (11)
The 2M term in the AIC serves to "penalize" unnecessarily high predictor orders. The AIC, however, has been shown to be statistically inconsistent, so the minimum description length (MDL) criterion has been formed [5]:
MDL(M) = K \ln \sigma_{ir}^2(M) + M \ln K \qquad (12)
A method proposed by Tan [6] involves determining the optimal number of bits necessary to code each residual, \rho(M) = \frac{1}{2}\log_2 \sigma_{ir}^2(M). Starting with M = 1, M is increased until the following criterion is no longer true:

(K - M)\,\Delta\rho(M) > \Omega(M) \qquad (13)

where \Delta\rho(M) = -[\rho(M) - \rho(M-1)] and \Omega(M) represents the increase in overhead bits for each successive M (due mainly to increased startup values and predictor coefficients). There are several other methods for predictor-order determination; none has proven to be the best in all situations.
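A sketch of comparing predictor orders with the AIC (11) and MDL (12) criteria; the coefficient and residue-variance routines are minimal stand-ins assumed for this example, and the search range max_order is likewise an arbitrary choice.

import numpy as np

def coeffs(ix, M):
    """Yule-Walker style coefficient estimate (same sketch as earlier)."""
    x = np.asarray(ix, dtype=float)
    r = [np.mean(x[:len(x) - k] * x[k:]) if k else np.mean(x * x)
         for k in range(M + 1)]
    R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
    return np.linalg.solve(R, np.array(r[1:]))

def residue_variance(ix, b):
    """sigma_ir^2(M) of (10): variance of the prediction residue for order M."""
    M = len(b)
    ir = [ix[n] - sum(b[i] * ix[n - i - 1] for i in range(M))
          for n in range(M, len(ix))]
    return np.sum(np.square(ir)) / (len(ix) - M - 1)

def select_order(ix, max_order=10):
    """Evaluate AIC (11) and MDL (12) for M = 1 .. max_order and return
    the order minimizing each criterion."""
    K = len(ix)
    aic, mdl = {}, {}
    for M in range(1, max_order + 1):
        v = residue_variance(ix, coeffs(ix, M))
        aic[M] = K * np.log(v) + 2 * M
        mdl[M] = K * np.log(v) + M * np.log(K)
    return min(aic, key=aic.get), min(mdl, key=mdl.get)

# Hypothetical usage on a rounded noisy sinusoid.
n = np.arange(2048)
ix = np.round(100 * np.sin(2 * np.pi * n / 64) + np.random.randn(n.size))
print(select_order(ix))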
4.1.2 Quantization of Predictor Coefficients

Excessive quantization of the coefficients \{b_i\} can reduce the c.r. and affect exact recovery of the original data stream ix(n). On the other hand, unnecessary bits in the representation of \{b_i\} will also decrease the c.r. The representation proposed by Tan et al. [1] is briefly outlined here. Given predictor coefficients \{b_i\}, an integer representation is
ib_i = \mathrm{NINT}\left\{ 2^{N_b - I_i - 1}\, b_i \right\}, \qquad 0 \le i
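A minimal sketch of this style of coefficient quantization, under the assumption that each b_i is scaled by a power of two, rounded to the nearest integer, and that the same dequantized value is then used on both the compression and reconstruction sides so that recovery of ix(n) is unaffected. The parameter names (total_bits, int_bits) are placeholders for this example rather than the notation of [1].

import numpy as np

def quantize_coefficient(b_i, total_bits, int_bits):
    """Scale b_i by a power of two, round to the nearest integer, and return
    both the integer code and the dequantized value actually used by the
    predictor on both ends."""
    shift = total_bits - int_bits - 1          # bits left for the fraction
    code = int(np.floor(b_i * 2 ** shift + 0.5))
    return code, code / 2 ** shift

# Hypothetical coefficient quantized to 12 total bits with 2 integer bits.
code, b_hat = quantize_coefficient(1.873, 12, 2)
print(code, b_hat)                              # 959 and 959/512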