Parallel Algebraic Approach of BCH coding in VHDL

Luís Vitório Cargnini∗, Rubem Dutra Ribeiro Fagundes∗, Eduardo A. Bezerra† and Gabriel M. Almeida†

∗ Instituto de Pesquisas Científicas e Tecnológicas - IPCT - PUCRS - PPGEE - FENG, Porto Alegre, RS, Brasil, 90619-900. Tel: +55 (51) 3320-3565, Fax: +55 (51) 3320-3904. E-mails: [email protected], [email protected]

† Embedded Systems Group, FACIN, Catholic University (PUCRS), Porto Alegre, RS, Brasil, 90619-900. Tel: +55 (51) 3320-3611, Fax: +55 (51) 3320-3904. E-mails: [email protected], [email protected]

Abstract: This work introduces an algebraic approach, described in a Hardware Description Language (HDL), and shows that current microelectronics technology can solve algebraic problems that were considered unsolvable with traditional sequential implementations such as Berlekamp-Massey. An algebraic approach to implementing Error Correcting Codes (ECC) is proposed and implemented in an HDL, specifically VHDL. The ECC chosen for the algebraic HDL implementation is Bose-Chaudhuri-Hocquenghem (BCH), one of the most important cyclic block codes. In this research work we adopted n = 63 and k = 57, BCH(63,57), a usual configuration in many scientific communication systems, such as the CCSDS telecommand systems of the European Space Agency (ESA) and the Agência Espacial Brasileira (AEB). The achieved results clearly show the main idea of our approach: an algebraic implementation is a far better approach, leading to impressive efficiency, and is much more suitable than any sequential algorithm, even one in a hardware version.

I. INTRODUCTION
Error Correcting Codes (ECC) have been applied in a wide range of communication systems, whenever transmitted data must be kept unchanged in a noisy, hostile environment. The Bose-Chaudhuri-Hocquenghem (BCH) code [1], [2] is one of these ECCs, with many applications, such as Consultative Committee for Space Data Systems (CCSDS) systems, the Brazilian Space Agency Telecommand/Telemetry (CCSDS compliant) system, CDMA mobile phones and so on. Being built on linear algebraic operations, BCH is usually performed by sequential computing algorithms such as Berlekamp-Massey, Euclidean, Sugiyama or Sudan-Guruswami, described in [3]. In this work, we will prove that it is possible to perform BCH directly through its fundamental algebraic formulation, provided that the algorithm is realized in hardware described in a Hardware Description Language (HDL), specifically the VHSIC Hardware Description Language (VHDL). This algebraic approach exploits the main advantages of a hardware implementation, such as highly parallel operation and high throughput, which is quite different from the usual computing methods. Furthermore, this new implementation approach leads to smaller hardware and a significant speed-up in contrast with computing algorithms (Berlekamp-Massey, for instance), even in an HDL version.

II. BOSE-CHAUDHURI-HOCQUENGHEM (BCH)
In 1948 Claude Elwood Shannon published the paper "A mathematical theory of communication" [4]. This paper, according to a few authors, is the founding landmark of Information Theory and Coding Theory. BCH coding is a generalization of Hamming codes [5] that permits the correction of more than one wrong bit in a codeword block. It is one of the most important codes in the class of cyclic linear block codes. BCH coders can be built with different configurations of parity bits and correction capabilities, that is, with different block sizes. In [6] there are some possible configurations of block length (n), data length (k) and correction capacity (t) for constructing coders with a maximum block size of 255 symbols. Also, the order of the generator polynomial g(x) is guaranteed to be n − k, where g(x) is based on cyclic code properties. BCH coders are important because they outperform other existing coders of the same block size. Usually, BCH employs a binary alphabet and a codeword block of size $2^m - 1$ [6]. Some possible configurations for BCH coder construction are presented in both [6] and [1]. BCH coders can be designed to correct any number of errors t, according to the coder configuration. The BCH code length is always $n = 2^j - 1$ for $j \ge 3$; in addition, the error detection (r) and correction (cor) capacities are, respectively, $r = n - k$ and $cor = \frac{r-1}{2}$ (also known as 2t).

III. ALGEBRAIC BCH IMPLEMENTATION
An algebraic BCH prototype was implemented in VHDL, which is a hardware description language [7]. This language allowed us to describe our approach at a high level of

abstraction, extracting as much parallelism as possible and employing combinational logic in most of it [8]. The algebraic approach has been adopted in place of classical sequential ECC algorithms (e.g. Berlekamp-Massey, Euclidean, Sugiyama or Sudan-Guruswami [3]), showing that with current VLSI technologies and tools it is feasible to adopt such a purely algebraic approach [9], [10], [11].

Until recently, sequential algorithms were conceived as the way to implement ECC techniques like BCH because of computational hardware limitations, where hardware parallelism could not be applied or well explored. Such sequential algorithms achieved great performance and, even now, there are VHDL versions of them in hardware, usually of the Berlekamp-Massey and Euclidean algorithms. However, since each is an algorithm created to run in a sequential, non-parallel state machine, any VHDL version will use a considerable amount of hardware resources, because the main paradigm behind all those algorithms is the same: "to run efficiently in a sequential state machine".

On the other hand, the main paradigm of our approach is to exploit hardware parallelism to the extreme, because it is the main advantage of VHDL realizations; for that reason the ECC scheme, here BCH, is implemented in this purely algebraic form. This new approach leads to algebraic structures in hardware that are highly parallel and as efficient as an algebraic description of the ECC scheme can be. In this sense, HDLs give us the means to make this new approach real: parallelism is intrinsic, and the result of synthesis is digital hardware that can be implemented using FPGA or CMOS technologies. This is why we use the algebraic approach and an HDL to implement it.

The generator polynomial g(x) defined in [6] is $g(x) = x^6 + x^5 + 1$. This polynomial has a linear algebraic version G, which is denoted by:

$$G_{k,n} = [\, I_{k,k} \mid P_{k,n-k} \,] \tag{1}$$

where the submatrix P is the parity matrix, with the formulation denoted in (2), and the submatrix I is the identity matrix.

$$P_{k,n-k} = \sum_{i=n-k+1}^{k+(n-k)} \left( x^{i} \bmod g(x) \right) \bmod 2 \tag{2}$$

The BCH coding process is fully performed as denoted in (3):

$$\bar{m} = m * G \tag{3}$$

where m is the message vector with k bits, $\bar{m}$ is the coded version with n bits and G is the generator matrix $G_{k,n}$. Regarding (3), the usual linear algebraic operations needed to generate $\bar{m}$ are the main advantage of the VHDL implementation, because such an equation is extremely hard to perform by a computational approach. By the linear algebraic approach, however, equation (3) becomes a hardware implementation that is very suitable for FPGA devices and even for creating an Application Specific Integrated Circuit (ASIC) [12].

In this sense, the submatrix P was created using Maple¹, a mathematical software package, and statically defined in the VHDL code of (3) as part of G. The vector $\bar{m}$ is the coded message vector created by the coding process and is the final result of the BCH error correcting code method. This $\bar{m}$ is the one that will be transmitted over any kind of communication channel, with the ability to detect $n - k$ errors and fix $\frac{(n-k)-1}{2}$ errors.

A. Coding Block
The coding procedure has two main steps:
1) The generation of matrix G;
2) The generation of $\bar{m}$.
The generation of matrix G is performed just once, because G remains unchanged, as constant values, for a given coding scheme. The generation of vector $\bar{m}$ is the coding process itself, because $\bar{m}$ is a coded version of m, created by BCH for every message vector transmitted through the communication channel. These steps are presented here in detail:

Generating the G matrix:
• compute submatrix P with any given mathematical software;
• build G from P and the identity matrix I, as seen in (1);
• build a VHDL matrix multiplication block in order to perform (3).

Generating $\bar{m}$:
• receive a message vector m;
• generate $\bar{m}$ by performing a matrix multiplication according to (3);
• output $\bar{m}$.

The coding block can be seen in figure (1):

Figure 1: Internal concept of encoder block.
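The coding steps can be illustrated in software. The following is a minimal Python sketch of (1) to (3), not the authors' VHDL: it builds a systematic generator matrix for g(x) = x^6 + x^5 + 1 and encodes a message with a vector-matrix product over GF(2). The row ordering of P (row i holds x^(n-1-i) mod g(x)), the bit ordering of the vectors and all function names are assumptions made for illustration; the paper only states that P was computed in Maple.

```python
N, K = 63, 57            # BCH(63,57) block and message lengths
G_POLY = 0b1100001       # g(x) = x^6 + x^5 + 1, bit i is the coefficient of x^i


def poly_mod(value, modulus):
    """Remainder of GF(2) polynomial division, polynomials stored as int bit masks."""
    deg_m = modulus.bit_length() - 1
    while value.bit_length() - 1 >= deg_m:
        value ^= modulus << (value.bit_length() - 1 - deg_m)
    return value


def parity_matrix(n=N, k=K, g=G_POLY):
    """P as k rows of n-k bits; row i is x^(n-1-i) mod g(x) (assumed ordering)."""
    r = n - k
    rows = []
    for i in range(k):
        rem = poly_mod(1 << (n - 1 - i), g)
        rows.append([(rem >> (r - 1 - j)) & 1 for j in range(r)])
    return rows


def generator_matrix(n=N, k=K, g=G_POLY):
    """G = [I | P] as in equation (1), computed once per coding scheme."""
    p = parity_matrix(n, k, g)
    return [[1 if i == j else 0 for j in range(k)] + p[i] for i in range(k)]


def encode(message, g_matrix):
    """Coded vector = message * G over GF(2), as in equation (3)."""
    n = len(g_matrix[0])
    return [sum(m & row[j] for m, row in zip(message, g_matrix)) % 2
            for j in range(n)]


if __name__ == "__main__":
    G = generator_matrix()
    msg = [1, 0, 1, 1] + [0] * 52 + [1]
    codeword = encode(msg, G)
    print(codeword[:K] == msg)   # systematic: the message bits appear unchanged
```

Each output bit is an XOR of a fixed subset of message bits, which is why the product in (3) maps naturally onto parallel combinational logic when G is a constant.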

¹ Maple is a general-purpose commercial computer algebra system. It was first developed in 1981 by the Symbolic Computation Group at the University of Waterloo in Waterloo, Ontario, Canada. Since Maple 6, the language has permitted variables of lexical scope.

B. Decoding Block
The decoding process is denoted by equation (4):

$$S = \sum_{i=0}^{n-k} s(i) = \bar{m} * H^{T} \tag{4}$$

where $\bar{m}$ is the received coded vector, S is usually called the syndrome vector, and $H^{T}$ is the well-known parity check matrix, expressed by equation (5).

$$H^{T} = \begin{bmatrix} P_{k,n-k} \\ I_{k,k} \end{bmatrix} \tag{5}$$
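To make (4) and (5) concrete, here is a hedged Python sketch (again a software illustration, not the authors' VHDL) that builds the transposed parity check matrix for g(x) = x^6 + x^5 + 1, computes a syndrome, and applies the look-up-table correction that the following paragraphs describe. Several things are assumptions: the identity block is taken as (n-k) x (n-k) so that the product in (4) is dimensionally valid, the row ordering matches x^(n-1-i) mod g(x), only single-bit error patterns are stored in the table, and every function name is illustrative.

```python
N, K = 63, 57
G_POLY = 0b1100001       # g(x) = x^6 + x^5 + 1, bit i is the coefficient of x^i


def poly_mod(value, modulus):
    """Remainder of GF(2) polynomial division, polynomials stored as int bit masks."""
    deg_m = modulus.bit_length() - 1
    while value.bit_length() - 1 >= deg_m:
        value ^= modulus << (value.bit_length() - 1 - deg_m)
    return value


def parity_check_transposed(n=N, k=K, g=G_POLY):
    """n rows of n-k bits: the first k rows form P, the last n-k rows an identity."""
    r = n - k
    rows = []
    for i in range(n):
        rem = poly_mod(1 << (n - 1 - i), g)
        rows.append([(rem >> (r - 1 - j)) & 1 for j in range(r)])
    return rows


def syndrome(received, h_t):
    """S = received * H^T over GF(2), as in equation (4)."""
    r = len(h_t[0])
    return [sum(b & row[j] for b, row in zip(received, h_t)) % 2
            for j in range(r)]


def build_lookup(h_t):
    """Syndrome -> error position for single-bit errors (assumed error set).

    A single error at position p added to a valid codeword gives row p of H^T
    as its syndrome, so the rows themselves index the table.
    """
    return {tuple(row): pos for pos, row in enumerate(h_t)}


def correct(received, h_t, lookup):
    """XOR the looked-up error pattern into the received vector."""
    s = tuple(syndrome(received, h_t))
    fixed = list(received)
    if any(s):                      # a zero syndrome means no error detected
        fixed[lookup[s]] ^= 1
    return fixed


if __name__ == "__main__":
    H_T = parity_check_transposed()
    table = build_lookup(H_T)
    codeword = [0] * N
    for pos in (56, 57, 62):        # g(x) itself, written as a 63-bit codeword
        codeword[pos] = 1
    corrupted = list(codeword)
    corrupted[10] ^= 1              # inject one bit error
    print(correct(corrupted, H_T, table) == codeword)
```

Under these assumptions the table has one entry per bit position, because the 63 rows of the matrix are distinct and nonzero, so every single-bit error maps to its own syndrome.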

From (4), where P and I are the same submatrices used to create G, the syndrome vector S plays a main role in the decoding process, because this vector directly identifies any given error that occurred in the transmission channel. When S is the zero vector Ø, there are no errors in the transmission process and the message is obtained by removing the parity bits from $\bar{m}$. However, if S is not the Ø vector, the S resulting from (4) leads to the error pattern $\hat{e}$ stored in the decoder as a look-up table, and this error pattern corrects the received coded vector $\bar{m}$ by:

$$\bar{m} = \bar{m} + \hat{e} \tag{6}$$

The addition operation in (6) is a logical XOR performed bitwise between $\bar{m}$ and $\hat{e}$. The decoding procedure consists of two main tasks: generating the error pattern look-up table, and correcting the received message. The look-up table is generated just once, when the decoder is created, with the following steps:

• Create a given $\bar{m}$ with the coder block;
• Choose a given error pattern, for which we will calculate a syndrome (please note that each syndrome is directly associated with a given error pattern). As an example, for a BCH(7,3) we can choose the error pattern $\hat{e}$: 0000001, which means an error in the least significant bit;
• Generate a corrupted message $\hat{m}$, as denoted by equation (7):

$$\hat{m} = \bar{m} + \hat{e} \tag{7}$$

• Calculate the syndrome vector S by performing (4) with $\hat{m}$, as denoted:

$$S = \hat{m} * H^{T} \tag{8}$$

The error pattern look-up table is built by taking each kind of error pattern and applying (7) and (8) in sequence. Correcting the received message is performed by (6), with a bitwise XOR operation between $\bar{m}$ and $\hat{e}$. Please note that the submatrices P and I are the same ones used in the coding block, and the same submatrices that appear in $H^{T}$. Figure (2) describes the decoding block.

Figure 2: Internal concept of decoder block.

C. Development Environment
The coding and decoding blocks were prototyped on an xc2v500-4fg456 device with an operating frequency above 600 MHz, using XST² included in ISE version 8.1.03i; the clock frequency was obtained from the XST synthesis report. For this project, XST was configured for a one-hot CASE description format, with the CASE style set to full-parallel, the effort set to maximum and the focus set to speed. These configuration features are important because they generate, in the end, a design oriented to maximum operating speed, with no constraints on allocated area.

IV. RESULTS
The tables below show that, despite the heavy blocks necessary to perform all the linear operations, such as matrix multiplication, the algebraic approach has extremely good area efficiency (about 2% of the allocated area) with a maximum operating frequency of around 687.049 MHz. The achieved results are very impressive, as can be seen in the following tables.

Table I: Encoder area summary after synthesis of encoder block
Number of Slices:              72 out of 3072     2%
Number of Slice Flip Flops:   126 out of 6144     2%
Number of 4 input LUTs:        64 out of 6144     1%
Number of bonded IOBs:        122 out of 264     46%
Number of GCLKs:                1 out of 16       6%

Table II: Encoder timing summary after synthesis of encoder block
Minimum period:                              1.456ns
Minimum input arrival time before clock:     5.172ns
Maximum output required time after clock:    5.446ns
Maximum combinational path delay:            No path found

The maximum frequency for the encoder, according to the XST synthesis final report, is: Maximum Frequency: 687.049MHz

² XST is a Xilinx® tool that synthesizes HDL designs to create Xilinx®-specific netlist files called NGC files. The NGC file is a netlist that contains both logical design data and constraints, and takes the place of both EDIF and NCF files.
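As a quick consistency check on the encoder timing figures, the maximum frequency is the reciprocal of the minimum clock period. The short Python sketch below applies this to the 1.456 ns reported in Table II; the function name is illustrative and not part of XST. The small gap to the reported 687.049 MHz is what one would expect if the report rounds the period to three decimals, which is an assumption here rather than something the paper states.

```python
def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum period given in ns."""
    return 1e3 / min_period_ns


if __name__ == "__main__":
    encoder_mhz = max_frequency_mhz(1.456)   # minimum period from Table II
    # Prints a value within a fraction of a MHz of the reported 687.049 MHz.
    print(f"{encoder_mhz:.3f} MHz")
```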

For the decoder block, the synthesis final report summary was the following, as reported in tables (III) and (IV):

Table III: Decoder area summary after synthesis of decoder block
Number of Slices:              79 out of 3072     2%
Number of Slice Flip Flops:   114 out of 6144     1%
Number of 4 input LUTs:       138 out of 6144     2%
Number of bonded IOBs:        122 out of 264     46%
Number of GCLKs:                1 out of 16       6%

Table IV: Decoder timing summary after synthesis of decoder block
Minimum period:                              1.456ns
Minimum input arrival time before clock:     8.292ns
Maximum output required time after clock:    5.446ns
Maximum combinational path delay:            No path found

The maximum frequency for the decoder, according to the XST synthesis final report, is: Maximum Frequency: 687.049MHz.

Figure (3) shows a full simulation running the coder and decoder in parallel, in which four cursors mark the times at which transitions occur in the system. They are named as follows, with their anchors in parentheses: "First Message" (c1), "First Message Codified" (c2), "Message Received" (c3) and "Message Decoded" (c4). Using these cursors and measuring the time difference between c1 and c2, we can see that the time to encode a message is 1600 ps (picoseconds) and that the time to decode a message is 2000 ps. A new message is generated at intervals of 50 ns inside the test bench. An error generator unit was created to intercept the message and add an error to it, as shown in equation (7), and the message is corrected using equations (6) and (8). Using equation (8) we find the syndrome, from which the error pattern can be found, as shown in figure (2). This error pattern is summed with the message vector containing errors, $\hat{m}$, as denoted in equation (7), which gives the recovered message. From the results above, we measured a coding and decoding throughput of about 625M data packages per second.

Figure 3: Both coder and decoder running simultaneously, demonstrating algebraic approach capacity. (a) Complete figure of simulation waveform; (b) Cursors; (c) Cursors c1 and c2; (d) Cursors c3 and c4.

In order to compare our implementation, we synthesized an Euclidean BCH. The Euclidean implementation is recommended as the best choice for a hardware implementation, but this algorithm also has a sequential part, and its hardware implementation takes a sequential form. The synthesis results for the Euclidean BCH encoder, with the same coder configuration (63, 57), device and synthesis parameters, are presented in tables (V) and (VI):

Table V: Euclidean encoder area summary after synthesis
Number of Slices:              14 out of 3072     0%
Number of Slice Flip Flops:    10 out of 6144     0%
Number of 4 input LUTs:        27 out of 6144     0%
Number of bonded IOBs:         18 out of 264      6%
Number of GCLKs:                1 out of 16       6%

Table VI: Euclidean timing summary after synthesis
Minimum period:                              4.521ns
Minimum input arrival time before clock:     4.906ns
Maximum output required time after clock:    5.722ns
Maximum combinational path delay:            No path found

Please note that the maximum operating frequency is 221.214MHz, against 687.049MHz for our algebraic implementation. In figure (4) we can also observe the operation waveforms of both encoders. Both BCH encoders (Euclidean and algebraic) were run at the same operating frequency of 50 MHz. According to synthesis and simulations, our implementation needs 8.292ns of setup time against 4.906ns for the Euclidean one; this is the point where we lose to the Euclidean implementation. However, our maximum operating frequency (considering the device on which the experiments were run) is 687.049MHz, while the Euclidean implementation on the same device can operate at a maximum frequency of 221.214MHz. In figures (4a) and (4b) it is possible to see that, with a 50MHz clock frequency, our approach spends 113ns in the coding process, against 480ns for the Euclidean one, encoding the same amount of data. Comparing both systems, our approach (the purely algebraic hardware realization) has a speed-up of 4.2477. In other words, the algebraic hardware implementation is 4.2477 times faster than the Euclidean BCH coder. Considering long-range systems such as satellite communications as an example, this advantage would represent a large amount of additional data transmitted in the same time window.
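The figures quoted in this section can be reproduced with simple ratios. The Python sketch below is only an arithmetic check using the numbers reported above; the function names are illustrative, and the throughput line assumes one message is completed per 1600 ps encoding latency, which is how the 625M figure appears to be obtained. The quoted speed-up of 4.2477 corresponds to 480/113 truncated to four decimals.

```python
def speed_up(baseline_ns, improved_ns):
    """Ratio of the baseline processing time to the improved processing time."""
    return baseline_ns / improved_ns


def throughput_per_second(latency_ps):
    """Messages per second if one message completes every latency_ps picoseconds."""
    return 1e12 / latency_ps


if __name__ == "__main__":
    # Encoding times at a 50 MHz clock, as read from figures (4a) and (4b).
    print(f"encoding speed-up: {speed_up(480.0, 113.0):.4f}")
    # Ratio of the maximum frequencies reported by synthesis.
    print(f"clock ratio: {687.049 / 221.214:.4f}")
    # Coder throughput derived from the 1600 ps encoding time.
    print(f"throughput: {throughput_per_second(1600.0):.3e} messages/s")
```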

Figure 4: Algebraic and Euclidean encoding waveforms, demonstrating that we implemented a commonly known algorithm to show that our approach is a choice worth considering. (a) Algebraic encoding; (b) Euclidean encoding.

V. CONCLUSIONS
The results presented in the previous section show the main advantage of the proposed algebraic approach: great efficiency in the speed and hardware required to perform a BCH coder-decoder. This new approach should be considered mandatory for any kind of future system requiring the BCH ECC coding algorithm. Until recently, very well-known BCH algorithms, for instance Berlekamp-Massey, Euclidean, Sugiyama or Sudan-Guruswami, as described in [3], have been applied in both software and hardware contexts, usually on a software (microprocessor) platform, because this was the usual way to perform such a task. We are proving in this work that, by targeting BCH from its theoretical algebraic formulation, it is possible to reach incomparable results, provided that the algorithm is implemented in hardware, on an FPGA platform or even in an ASIC. Furthermore, VHDL allows the final results of the synthesized hardware to be described in more detail, showing the throughput, maximum system clock and area allocated for the BCH hardware version. Those results are clearly impressive, because both the speed and the area performance of our algebraic implementation, in comparison with implementations of sequential algorithms such as Berlekamp-Massey, are much better than those of any other kind of implementation; owing to this incomparable efficiency, the algebraic approach should replace them in all future BCH prototypes.

REFERENCES

[1] R. B. Wells, Applied Coding and Information Theory for Engineers, M. Horton, Ed. Prentice-Hall, Inc. - Tom Robbins, 1999.
[2] S. B. Wicker, Error Control Systems for Digital Communication and Storage. Upper Saddle River, New Jersey 075458: Prentice Hall, Inc, 1995.
[3] W. C. Huffman and V. Pless, Fundamentals of Error-Correcting Codes. Cambridge University Press, 2003. [Online]. Available: http://www.cambridge.org/9780521782807
[4] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
[5] R. Hamming, "Error detecting and error-correcting codes," Bell System Technical Journal, vol. 29, pp. 147–160, 1950.
[6] B. Sklar, Digital Communications: Fundamentals and Applications, 2nd ed., ser. Prentice Hall Communications Engineering and Emerging Technologies Series, P. H. PTR, Ed. Bernard Goodwin, January 2001.
[7] P. J. Ashenden, The VHDL Cookbook, 1st ed. Dept. Computer Science, University of Adelaide, South Australia: University of Adelaide, 1990.
[8] P. Banerjee, Parallel algorithms for VLSI computer-aided design, N. Englewood Cliffs, Ed. Prentice-Hall, 1994.
[9] A. Kunzmann, Reuse techniques for VLSI design, R. Seepold, Ed. Kluwer Academic, 1999.
[10] V. K. Madisetti, VLSI digital signal processors: an introduction to rapid prototyping and design synthesis. Butterworth-Heinemann, 1995.
[11] Y. Taur and T. H. Ning, Fundamentals of modern VLSI devices, C. U. Press, Ed. Cambridge University Press, 1998.
[12] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI design: a systems perspective. Addison-Wesley, 1994.