ICICS-PCM 2003, 15–18 December 2003, Singapore
On the Realtime Satellite Image Compression of X-Sat

R. M. Susilo, T. R. Bretschneider
School of Computer Engineering, Nanyang Technological University
Abstract

The transmission of image data acquired by remote sensing missions based on spaceborne platforms is a major bottleneck. It results from the limited power on-board, the huge volume of the data, and the small number of accessible ground receiving stations. In particular these arguments hold true for small satellites, which face additional design constraints in terms of size, mass and cost. To relax the downlink restriction, image compression has to be applied. This paper analyses the architecture and requirements of the minisatellite X-Sat and sets them in relation with selected compression algorithms. Due to the realtime nature of the mission, i.e. simultaneous imaging and downlinking, only hardware-based solutions are considered. The investigation addresses the achievable compression ratios and the question of how lossless compression can be guaranteed.
1. Introduction

One of the most crucial bottlenecks in today's remote sensing missions based on satellite platforms is the limited downlink bandwidth. Although the provided data transmission rates are constantly growing, they cannot keep up with the exponentially increasing data volume delivered by the scanners. The reasons are twofold for the satellite under consideration: firstly, the acquisition rate exceeds the transmission rate, and secondly, the satellite is not in constant visibility of a ground receiving station, which effectively limits the downlinkable data volume per orbit. Data compression relaxes the latter constraint, and although the approach cannot solve the original problem, it is able to boost the mission's benefits. However, the simultaneous data acquisition and downlink to the ground receiving station demands compression without exception to bridge the difference between the acquisition and the transmission rate. This paper investigates the compression possibilities for Singapore's first remote sensing satellite X-Sat and compares different approaches. The main objective is to enable realtime imaging and transmission using the resources available on-board. An in-depth analysis compares different algorithms, namely predictive coding, static and dynamic Huffman coding, arithmetic coding as well as Golomb-Rice coding. The main focus is the assessment of the algorithms' ability to reduce the 81 Mbit/s data stream of the multispectral push-broom camera to the supported
downlink data rates of 12.5, 25 and 50 Mbit/s, respectively. Further issues comprise the suitability of the different compression approaches for implementation in the existing system, i.e. the solution has to be mapped onto an FPGA without external memory.

This paper is organised as follows: Section 2 provides a brief overview of the X-Sat design and highlights the resulting requirements for the compression. The selected algorithms are discussed in Section 3, which also addresses their general suitability in terms of the previously derived constraints and the mapping into hardware. Afterwards Section 4 presents the results. Finally, conclusions are given in Section 5.
2. Overview of the X-Sat

The X-Sat is a mini-satellite with a mass of approximately 110 kg and dimensions of 600 mm × 600 mm × 800 mm. The main payload is the imaging instrument IRIS, which provides multispectral data in the visible and near-infrared wavelength range with a spatial resolution of 10 m at 81 Mbit/s. For the downlink of the image data an X-band transmitter with rates of 12.5, 25 and 50 Mbit/s is available. Additionally the satellite possesses a 2 Gbyte storage device called RAM-Disk, which enables the temporary recording of acquired data. The entire system is controlled by the on-board computer (OBC), which connects to all sub-systems via a redundant controller area network (CAN). For high-speed connections, i.e. for image data, dedicated low-voltage differential signalling (LVDS) links are utilised. Figure 1 provides a connectivity diagram of the X-Sat, whereby only the sub-systems relevant for the focus of this paper are shown. More details about the entire satellite can be found in [1].
Figure 1: Partial connectivity diagram of the X-Sat (on-board computer, camera, RAM-Disk, parallel processing unit, S-band and X-band transmitters, connected via the system and payload CAN buses, LVDS links and direct links)

In addition to the OBC a high-performance computer is part of the X-Sat. The so-called parallel processing unit
(PPU) consists of four fully connected radiation-hardened field programmable gate arrays (FPGA), each hosting five processing nodes (PN) that utilise commercial-off-the-shelf processors and dedicated local memory. A schematic overview of the PPU is shown in Figure 2. The PPU is designed to provide computational resources in space that reach beyond those of radiation-hardened or space-qualified components [2]. However, due to the hostile environment the system is subject to degradation, and the only truly reliable components are the four central FPGAs. Thus only they can be used for the realtime compression, since the mission objective is to provide simultaneous imaging and downlinking throughout the satellite's lifetime. In particular no external memory can be utilised.
Figure 2: Schematic of the PPU (four fully connected FPGAs, each attached to five processing nodes with local flash memory, PROM and links to the RAM-Disk and the CAN bus)

For the described scenario of simultaneous imaging and downlinking the data flow starts at the camera and is passed through the RAM-Disk (without storing the actual image) to the PPU. There the incoming information is compressed by the FPGAs and streamed out in realtime via the second link back to the RAM-Disk. Again the data stream is not recorded but forwarded to the X-band unit. The actual data format provided by the camera is interlaced, i.e. the different spectral bands (G, R, NIR) of a pixel p(i,j) are grouped before the next pixel is coded.

3. Realtime Compression Schemes

Several options and scenarios were considered during the initial stage of the implementation. Foremost they depend on the three different transmission rates supported by the X-band transmitter. Assuming the best case, i.e. 50 Mbit/s, a compression ratio (CR) of 81/50 = 1.62 is required for lossless realtime operation. Similarly, ratios of 3.24 and 6.48 are needed for the 25 Mbit/s and 12.5 Mbit/s options, respectively. While lossy compression can increase the CR, it is often not a suitable option in remote sensing due to the introduced distortions. Therefore, the first step in building a compression module for the X-Sat is to design a lossless coder that sustains a realtime compression ratio of at least 1.62.
The main considerations for the selection of a suitable compression algorithm are the achievable compression ratio, the viability and simplicity of the implementation, and low storage requirements. Moreover the data format is of importance: although the camera interlaces the different spectral bands, after de-interlacing the data basically consists of three endless streams of bytes (8-bit pixel values). In any case it is preferable that the coder is adaptive in order to achieve a sufficient compression ratio for uncertain data distributions. Looking at these initial constraints, dictionary-based coders such as Lempel-Ziv are ruled out right away, mainly due to their storage requirements, the computational complexity of the search operation, and the fact that they are better suited to compressing words rather than byte streams [3].
3.1. Selected Compression Techniques
For the given context the better option is the utilisation of entropy coders such as the Huffman, arithmetic and Golomb-Rice coder. The run-length coder is no real alternative due to its poor performance on satellite images, but it is suitable as a pre-processor. In an investigation using actual imagery with the spatial and spectral resolution expected for the X-Sat, the following schemes were implemented in software and analysed:

1. Predictive Coder: The approach predicts a pixel value from its neighbouring pixels and transmits the difference signal. Advantage: Simple and fast. Disadvantage: Poor compression ratio for real data.

2. Huffman Coder: The static version uses a codebook of variable-length codes that is optimal for a fixed, known source distribution, while the dynamic version does not require any codebook but instead maintains a so-called Huffman tree, which is updated every time a symbol is received. Advantage: Optimal results for the static version applied to a known distribution, and efficient performance over a wide range of data entropies in the case of the dynamic approach. Disadvantage: The probability distribution of the source has to be known (static), or an expensive data structure has to be maintained (dynamic).

3. Arithmetic Coder: It achieves optimal encoding by merging the probabilities of many symbols into a single high-precision fraction that represents the symbol set. The difference with respect to the Huffman coder is the assignment of a unique variable-length tag to a group of symbols rather than to a single symbol. The technique is particularly useful for data with a small alphabet and a highly skewed distribution. Advantage: The coder is optimal and has low memory requirements for low-order models. Disadvantage: Extensive computing power is required.

4. Golomb-Rice Coder: The Rice code is a derivative of the Golomb codes [3] and encodes non-negative integers. These codes are characterised
by an encoding parameter m that is constrained to be a power of two. Non-negative integers are coded in two parts [4], i.e. a unary representation of the integer division n/m and a binary representation of n mod m. Advantage: Efficient implementation, since the modulus and integer division operations reduce to bit masking and shifting. Moreover, the code is optimal for exponentially distributed sources [5]. Disadvantage: At least one bit has to be sent for each pixel, which penalises long runs of bytes with identical values.

It is worth noting that a suitable prediction model is very important to achieve a sufficient compression ratio, i.e. the prediction has to transform the given data distribution into a skewed, Laplacian-like distribution that is centred on zero with quasi-symmetric long tails. For the given image data the distribution before prediction shows gradation and is centred on different values, which suggests a high correlation among neighbouring pixels. Using a simple horizontal predictor, a Laplacian-like distribution is achieved for all given test images, as shown in Figure 3. This increases the CR by 0.4 to 0.6 for all entropy coders, except for the arithmetic coder with an increase of only 0.2. For a better prediction, rudimentary edge detection [6] or an adaptive version [7] can be used. However, this imposes additional costs on the system since extra storage for at least one image row (15'000 bytes) has to be available in the FPGA for the former technique, and for two or more rows if the latter approach is taken. A minimal sketch of the horizontal prediction, the residual mapping and the Rice coding is given after Figure 3.
Figure 3: Data distribution after horizontal prediction
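To make the coding chain concrete, the following Python fragment sketches the horizontal prediction, the folding of the signed residuals into non-negative integers, and the Rice coding with parameter m = 2^k. It is an illustration under our own naming, not the on-board implementation: the first pixel is simply transmitted as a raw reference byte, and the mapping function recommended in [4] is slightly more elaborate.

def rice_encode(n, k):
    # Rice code with m = 2**k: unary-coded quotient followed by k remainder bits
    q = n >> k                       # integer division n // m by shifting
    bits = '1' * q + '0'             # unary part, terminated by '0'
    if k:
        bits += format(n & ((1 << k) - 1), '0%db' % k)   # n mod m in k bits
    return bits

def encode_row(row, k=2):
    # Horizontal DPCM, sign folding and Rice coding of one image row;
    # the first pixel is sent as a raw 8-bit reference value (our choice).
    residuals = [cur - prev for prev, cur in zip(row, row[1:])]
    mapped = [2 * e if e >= 0 else -2 * e - 1 for e in residuals]  # 0,-1,1,-2,... -> 0,1,2,3,...
    return format(row[0], '08b') + ''.join(rice_encode(n, k) for n in mapped)

print(encode_row([120, 121, 119, 119, 124, 130]))

The sign folding turns the zero-centred, Laplacian-like residuals into a monotonically decaying sequence of non-negative integers, which is exactly the source type the Rice code handles efficiently.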
3.2. Suitability for Hardware Implementation
Given the listed options, the main criteria for choosing the right coder are the hardware costs, the achievable compression ratios and the respective speeds. While the hardware implementation of the run-length coder is trivial [8], it cannot achieve the required compression ratio for lossless operation. The predictive coder by itself is obviously not a good compression method, but prediction is useful to modify the data distribution as discussed earlier; the actual hardware requirements depend on the prediction scheme. In the software analysis the Huffman coder provided the best compression ratios for all test images. However, its dynamic version is clearly not a good choice for a hardware implementation since, firstly, pointer-like data structures are generally not synthesisable in hardware description languages, secondly, a periodic reset of the tree is required, and, last but not least, the update operations are time and memory consuming. For the static Huffman coding a reasonably accurate codebook is required a-priori. One way to solve this problem is to use multiple static tables, which are selected based on the result of an on-board analysis algorithm (see the sketch after this subsection). Obviously this approach is sub-optimal since it requires a stream of test data for the selection and cannot compensate for sudden changes in the observed area, for example the transition from a landmass towards an offshore region. However, with an appropriate set of static tables a hardware implementation is cost efficient and fast. The arithmetic coder is not a speed-oriented coder [9] and violates the pre-defined realtime constraint for the particular case of the X-Sat, although data parallelisation can ease the problem by a factor of three as discussed in the following sub-section. The Rice coder is an adaptive version of the Golomb coder. It has been recommended by the consultative committee for space data systems (CCSDS) as a sound lossless data compressor, and its adaptive scheme yields codes close to the source's entropy. In general the performance is comparable with that of Huffman and arithmetic coding [10] and can be extended to any desired entropy range. However, the Golomb-Rice coder needs a good application-dependent predictor. A recent implementation [11] shows that it can be extremely fast in hardware (20–80 Msample/s at 0.11 W); the main operation is counting, which can be implemented near the speed limit of any very large scale integration (VLSI) technology. There are two disadvantages of the method. Firstly, it is inefficient for encoding long runs of identical input bytes, but switching to a run-length mode can solve this problem [6]. Secondly, it requires a certain amount of memory for the mapping function, whose accuracy depends on the context selection [12]. The results in terms of the achieved compression ratios for the three different test images are shown in Figure 4, whereby the values reflect the average over all multispectral bands. As a conclusion, the static Huffman and the Rice coder were selected for the hardware implementation due to their compression performance and their suitability for a straightforward hardware solution.
Figure 4: Comparison of the compression ratios obtained for the run-length, arithmetic, Huffman and Rice coders on the three test scenes
Clearly lossless compression is not possible for the two transmission rates of 12.5 Mbit/s and 25 Mbit/s, since the achieved CRs do not reach the required ratios of 6.48 and 3.24, respectively.
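The multiple-static-table approach for the Huffman coder can be illustrated as follows. The sketch is hypothetical: the table contents are toy values, and the selection criterion is simply the estimated output size over a short test block.

from collections import Counter

# Code length (in bits) per mapped residual value, one entry per static table;
# the values are toy examples, not the actual X-Sat codebooks.
TABLES = {
    'low_entropy':  {0: 1, 1: 2, 2: 3, 3: 4},   # strongly peaked sources
    'high_entropy': {0: 2, 1: 2, 2: 3, 3: 3},   # flatter sources
}

def select_table(test_block):
    # Return the table whose code lengths minimise the encoded size;
    # unlisted symbols are charged 8 bits as an escape (our assumption).
    hist = Counter(test_block)
    cost = {name: sum(count * bits.get(sym, 8) for sym, count in hist.items())
            for name, bits in TABLES.items()}
    return min(cost, key=cost.get)

print(select_table([0, 0, 1, 0, 2, 0, 0, 1]))    # -> 'low_entropy'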
3.3. Parallelisation

The most efficient entropy coders are limited in speed by their fundamental feedback loops. While it is not straightforward to parallelise this inner loop, one possible solution is to replicate the entire coder and parallelise the input stream. A prefixed de-interlacing step separates the input data stream according to the multispectral bands. Thus the speed-up is directly proportional to the number of coders, since no communication among the coders is required. One major problem associated with this type of parallelisation is the existence of multiple code streams. Since only one transmission channel is available, the code streams have to be interleaved again after the compression. One possible solution is to group codewords from different coders into fixed-length words that contain many codewords and parts of codewords. Assuming each coder requires the same number of operations to output one m-bit word, a simple pipelining strategy can be used to interleave the words of the streams in order; a software sketch of this scheme follows below. An example for three coders (each associated with one of the image bands), with n and p being the number of clock cycles needed to output one m-bit word and to write the output to an external entity, respectively, is depicted in Figure 5.
Figure 5: Schedule for parallel encoding
Since data re-ordering is not needed in this case, the additional memory requirement of the scheme is relatively low, i.e. 2m bits for the delayed words of the second and third coder if one word consists of m bits.
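A simplified software model of this parallelisation scheme is given below; the names and the byte-wise word granularity are our own simplifications of the fixed-length word packing described above.

def deinterlace(stream, bands=3):
    # Split the interlaced G,R,NIR byte stream into one stream per band.
    return [stream[b::bands] for b in range(bands)]

def interleave(code_streams, word_bytes=2):
    # Merge the coder outputs round-robin into fixed-length words;
    # the last word of a stream is zero-padded (decoder-side handling omitted).
    merged = bytearray()
    streams = [iter(s) for s in code_streams]
    progress = True
    while progress:
        progress = False
        for it in streams:
            word = bytes(b for _, b in zip(range(word_bytes), it))
            if word:
                merged += word.ljust(word_bytes, b'\x00')
                progress = True
    return bytes(merged)

bands = deinterlace(bytes(range(30)))        # three bands of ten bytes each
streams = [b[:6] for b in bands]             # stand-ins for the coder outputs
channel = interleave(streams)                # single stream for the X-band unit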
3.4. FPGA Implementation

The static Huffman, the Rice and the arithmetic coder were implemented on a Xilinx XCV800 FPGA. Note that this family member provides more resources than the flight model, which will be the radiation-hardened XQVR300. However, there will be no shortage of resources such as configurable logic blocks (CLB), logic cells etc., since the PPU contains four FPGAs as shown in Figure 2.
The Huffman coder is implemented using a four-stage state machine whose main functionality is to construct output bytes from a variable-length bit input. In general the coder is in one of three normal states at any time, i.e. 'load', 'shift' and 'write'. At start-up it loads an input symbol from the external source and finds the corresponding codeword in its static table. Afterwards the state machine changes to the 'shift' state to create an output byte from the variable-length output of the codeword selector. If not enough bits are available to fill a byte, the state is changed back to 'load'. Alternatively the 'write' state is taken, which writes a fully constructed byte to the external world, i.e. the transmitter.

The Rice coder implementation conforms to the CCSDS standard. As shown in Figure 6, the entropy coder consists of a first-in-first-out (FIFO) buffer, which delays the input bits while the winner select module decides which coder option is to be used. The winner select module is a bit counter that counts the output length each option would need for the given input. This length counting can be done efficiently because the fundamental sequence (FS) and split-option codeword lengths of the Rice coder are deterministic, i.e. one clock cycle per byte in a data block. Afterwards the data block is passed to the selected N-split option coder for encoding, and the identifier bits, which indicate the used option, are sent as a prefix of the output data. One interesting aspect is that it is not necessary to provide several parallel coders in this scheme: Yeh [11] suggested that, since all the coders are similar, they can share the hardware needed for their operations. In particular, since the Rice code consists of FSs and split-option sequences, these two sequence generators together with a data formatter module are all that is needed to implement the N-split option coders. Besides, the module performs additional pre-processing steps, which include the prediction and mapping of an input block; the pre-processing stage follows the recommendations given in [4], i.e. a differential pulse code modulation (DPCM) and a standard mapping function. The efficiency of a horizontal predictor for satellite images is discussed by Korany [13]. In the future it is necessary to include an application-specific predictor that approaches the entropy limit of a given data set more closely.

Figure 6: Rice coder architecture
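A behavioural sketch of the winner select logic is given below: for a block of mapped residuals the deterministic code length of every split option k is counted, and the cheapest option wins, with its identifier sent as a prefix. The block size, the option range and all names are our assumptions; the CCSDS recommendation [4] defines the exact block format.

def code_length(block, k):
    # Total bits for split option k: per sample a unary quotient (n >> k),
    # one terminating bit and k remainder bits -- fully deterministic.
    return sum((n >> k) + 1 + k for n in block)

def winner_select(block, k_max=7):
    # Return the split option with the minimal encoded length.
    return min(range(k_max + 1), key=lambda k: code_length(block, k))

block = [2, 3, 0, 10, 12, 1, 0, 4]           # mapped residuals, block size 8
k = winner_select(block)
print(k, code_length(block, k))              # chosen option and its cost in bits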
4. Performance

All the mentioned coders were implemented in software, while the Huffman and Rice coder were also realised for the FPGAs. A module for the arithmetic coder was developed as well, but measurements showed that the set requirements were not met. The compression ratios obtained for the software implementations were already given in
Figure 4, and since the coder mapping into hardware followed the same strategies the results are identical for the hardware counterparts. The hardware specifications of the Rice and Huffman coder are given in Table I. Both coders were written in VHDL and implemented on a Xilinx XCV800 FPGA. As the initial timing analysis suggested, the output data rate of both coders exceeds the required input rate of 81 Mbit/s. Regarding the memory requirements, the Rice coder is particularly efficient and only needs little storage space for the FIFO module. Thus it can easily be fitted into the FPGA and there is no need for external memory. The Huffman coder needs slightly more storage, depending on the number of tables that is intended to be used. The same VHDL code may be synthesised into a number of different hardware architectures based on the imposed constraints, i.e. there is a trade-off between speed and the area used by the synthesised design. If the circuit needs to meet a stringent speed constraint, the synthesis tool may decide to pipeline the circuit, which increases the amount of storage needed in the end. In the actual implementation around 150 bytes were required for the FIFO module for a block size of 16.
              Rice coder      Huffman coder
Clock Rate    100–150 MHz     80 MHz
Data Rate     130 Mbit/s      200 Mbit/s
Memory        150–200 byte    500–4000 byte*

Table I: Hardware specification of the Huffman and Rice coder (* depends on the number of used tables)

In summary both coders meet the time and memory constraints and are therefore suitable for the X-Sat.
5. Conclusion

A selection of lossless compression algorithms was investigated in terms of their suitability for the expected image data and the provided hardware environment. In particular the emphasis was on realtime compression without the utilisation of any additional external memory. The reasons were twofold and derive from the mission requirements, namely the simultaneous imaging and downlinking as well as the utilisation of electronic components that will be operational throughout the mission's lifetime. The analysis of the algorithms revealed that only the Rice and the static Huffman coder fulfilled the imposed requirements, and they were therefore selected for implementation in an FPGA. The measurements proved that the necessary compression was achieved and that realtime lossless compression can be guaranteed for natural scenes.
References

[1] T. Bretschneider, "Singapore's satellite mission X-Sat", Proceedings of the International Academy of Astronautics Symposium on Small Satellites for Earth Observation, pp. 105–108, 2003.
[2] I.V. McLoughlin, V. Gupta, G.S. Sandhu, S. Lim, T.R. Bretschneider, "Fault tolerance through redundant COTS components for satellite processing applications", to be published in the Proceedings of the International Conference on Information, Communications and Signal Processing, Singapore, 2003.
[3] K. Sayood, Introduction to Data Compression, Morgan Kaufmann, second edition, 2000.
[4] CCSDS 121.0-B-1, Lossless Data Compression for Space Data System Standards, Blue Book, Issue 1, Washington D.C., 1997.
[5] CCSDS 121.0-G-1, Lossless Data Compression Report Concerning Space Data System Standards, Green Book, Issue 1, 1997.
[6] M.J. Weinberger, G. Seroussi, G. Sapiro, "LOCO-I: A low complexity, context-based, lossless image compression algorithm", Proceedings of the IEEE Data Compression Conference, pp. 140–149, 1996.
[7] G. Motta, J.A. Storer, B. Carpentieri, "Lossless image coding via adaptive linear prediction and classification", Proceedings of the IEEE, Vol. 88, No. 11, pp. 1790–1796, 2000.
[8] S. Park, I. Park, J. Cha, H. Cho, "Reusable design of a run-length coder for image compression applications", Proceedings of the IEEE Asia Pacific Conference on ASICs, pp. 274–277, 1999.
[9] J.L. Mitchell, W.B. Pennebaker, "Optimal hardware and software arithmetic coding procedures for the Q-coder", IBM Journal of Research and Development, Vol. 32, pp. 727–736, 1988.
[10] P.G. Howard, J.S. Vitter, "Fast and efficient lossless image compression", Proceedings of the IEEE Data Compression Conference, pp. 351–360, 1993.
[11] P.S. Yeh, "Implementation of CCSDS lossless data compression for space and data archival applications", Proceedings of the Space Operations Conference, 2002.
[12] D. Wu, E.C. Tan, "Comparison of lossless image compression algorithms", Proceedings of the IEEE International Conference on Image Processing, Vol. 1, pp. 718–721, 1999.
[13] E.A. Korany, "Prediction model selection for compression of satellite images", Proceedings of the IEEE National Radio Science Conference, pp. 329–338, 1996.