
LONG RANGE REAL-TIME UNDERWATER ACOUSTIC COMMUNICATION AT LOW BIT RATE WITH CHANNEL CODING PROTECTION

André Goalic, Joël Trubuil and Nicolas Beuzelin

TELECOM Bretagne, BP 83818, 29238 Brest Cedex 3, France

G.E.S.M.A., BP 42, 29240 Naval Brest, France

978-1-4244-2677-5/08/$25.00 ©2008 IEEE

ABSTRACT

The TRIDENT project (TRansmission d'Images et de Données EN Temps réel: real-time transmission of images and data) was launched a few years ago by GESMA (Groupe d'Etudes Sous-Marines de l'Atlantique). The first objective was to develop a multiple-rate underwater acoustic link for image, text and data transmission. This link was designed to provide wireless communication to autonomous underwater vehicles (AUVs). Recently the platform was extended to low bit rate speech transmission using a MELP [4] coder working at 2400 bit/s. This low rate allows the use of a lower carrier frequency, which reaches a longer range. In order to increase the link reliability, channel coding was added to the system. Different error correcting schemes, including Convolutional Codes (CC) and Reed Solomon (RS) codes, were checked during river and sea trials. The purpose of this paper is to present the whole platform, including both the emitter and the receiver, and to give an overview of the results obtained during the trials. Among the numerous possible applications, one can note mine warfare, offshore activities with the development of digital acoustic links between divers, boats, submarines, etc., or a digital underwater acoustic phone. Such an acoustic link is likely to increase the safety of underwater activities and to foster their development.

INTRODUCTION

GESMA, in collaboration with TELECOM Bretagne, has for several years conducted a project on a high data rate acoustic link. The goal is to provide autonomous underwater vehicles (AUVs) with a wireless communication system operating between the AUV and the surface vessel. Multipath propagation, noise and the Doppler effect account for most of the impairments affecting underwater acoustic communications. These physical effects cause time dispersion and variability in the received signal. One can note that the carrier frequency and available bandwidth are much lower than those of other communication channels. To mitigate these effects and optimize the spectral efficiency, a blind spatio-temporal equalizer, introduced by J. Labat et al. [1], was chosen. This equalizer is the core of the real-time platform designed by TELECOM Bretagne and called TRIDENT (TRansmission d'Images et de Données EN Temps réel) [2]. With this platform, information can be transmitted at a data rate higher than 20 kbit/s in a horizontal configuration without periodic training sequences. The equalizer has already shown its robustness and reliability in strongly disturbed channels. Moreover, speech transmission was successfully carried out with a 6 kbit/s speech coder in Brest bay over 2 kilometers [3].

To extend the range beyond 4 kilometers for speech transmission, another low bit rate speech coder, the Mixed Excitation Linear Prediction (MELP, 2400 bit/s) coder [4], is under consideration. GESMA also wants to increase the link reliability and provide strongly protected burst transmission for AUVs. The purpose is therefore to choose channel coding schemes able to correct residual errors and further improve the bit error rate (BER). To this end, two kinds of channel coding are evaluated [5]: Convolutional Codes (CC) and Reed Solomon (RS) block codes.

This paper provides an overview of the high data rate acoustic link. First, we present the TRIDENT platform with its different extensions. Second, we describe the very low bit rate speech coder/decoder (2400 bit/s, MELP). Then, different channel coding strategies are presented. The last section describes the sea trials carried out in October 2007 and the first acoustic communication results in the 2.8 to 8.7 kbit/s bit rate range, mainly applied to low bit rate speech transmission.

TRIDENT PLATFORM

As originally designed, the TRIDENT system was an acoustic link able to transmit information, images and text at a data rate higher than 20 kbit/s. Recently its capabilities were extended to underwater speech communication. The TRIDENT system [6] (figure 1) can use four carrier frequencies (11.2, 17.5, 20.0 and 35 kHz). The bit rates under consideration range from 2.8 to 23.3 kbit/s with Quadrature Phase Shift Keying (QPSK) modulation.

Figure 1. TRIDENT acoustic system

The emitter platform consists of a PC and an external box containing a processing board. All algorithms are implemented on a Texas Instruments TMS320C6201 Digital Signal Processor (DSP). So far the emitter includes channel coding, the emission scheme being designed according to the chosen error correcting strategy. The emitter does not include the speech coder. The receiver platform is based on an acquisition board plugged into a personal computer (PC). The architecture of this board is based on an identical DSP.

In reception, the signal coming from a single source emitter is received on several sensors (4 in this version). This space diversity improves the Signal to Noise Ratio (SNR) in comparison with a single-sensor version. After demodulation and synchronization (carrier and timing recovery), an equalizer is used to reduce the different perturbations induced by the underwater acoustic channel. This equalizer works in two modes: a convergence (or starting) mode and a tracking mode. Switching between the modes is automatic and reversible, and is driven by comparing the Mean Square Error (MSE) to a threshold. In its tracking mode, the structure evolves towards a decision feedback equalizer using only filter permutation. The benefit of this adaptivity lies in the possibility of switching from one structure to the other according to the channel severity. The receiver robustness and adaptability were clearly shown during sea trials carried out between 2002 and 2005. The benefit of spatial diversity is confirmed. The tests show that real-time transmission of information is feasible even over harsh channels such as the underwater acoustic channel and in the presence of multiple interferers. In this context, channel coding can improve the transmission robustness and protect the transmitted data from remaining errors.

Output equalizer samples w[n] are then transferred to the PC to be processed by channel and source decoding. Receiver and emitter have to be synchronized. Carrier recovery is classically performed by a Phase Locked Loop (PLL). Since QPSK modulation is invariant to phase shifts of jπ/2, j ∈ {0, 1, 2, 3}, the receiver suffers from a fourfold phase ambiguity: unless this ambiguity is resolved, the PLL can lock to any of four possible phase states, only one of which is the right carrier phase.

SPEECH MELP CODER/DECODER

Figure 2. The MELP model of speech production

The MELP voice coder model (figure 2) is based on the traditional Linear Prediction Coefficient (LPC) vocoder. However, the synthesizer has additional capabilities that allow MELP to mimic more of the characteristics of natural human speech. The speech is segmented into frames of 22.5 ms and the sampling rate is set to 8 kHz. In the MELP coder each speech sample is quantized on 16 bits, giving 2880 bits per frame (180 samples of 16 bits) at the MELP coder input. After compression by a factor close to 53, this is reduced to 54 bits per frame, that is to say 2.4 kbit/s. Synchronization bits are added to the bit stream to retrieve the speech frame at the decoder. In the decoder, parameters from the bit stream (54 bits per 22.5 ms) are unpacked and decoded according to the appropriate scheme (voiced or unvoiced). Comparison with figure 2 shows that the speech production model is embedded within the structure of the decoder.
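The MELP frame figures can be checked with a few lines of Python. This is only an illustrative sketch: the constant and function names are hypothetical, and the numeric values are the ones quoted for the coder (8 kHz sampling, 22.5 ms frames, 16-bit samples, 54 coded bits per frame). With 180 samples per frame, the raw frame size works out to 2880 bits, consistent with a compression factor close to 53.

```python
# Illustrative check of the MELP frame figures quoted in the text.
# All names are hypothetical; only the numeric values come from the paper.
SAMPLE_RATE_HZ = 8000
FRAME_MS = 22.5
BITS_PER_SAMPLE = 16
CODED_BITS_PER_FRAME = 54


def melp_frame_budget():
    """Return (samples per frame, raw bits per frame, compression, bit rate)."""
    samples = SAMPLE_RATE_HZ * FRAME_MS / 1000          # samples in one frame
    raw_bits = samples * BITS_PER_SAMPLE                # uncompressed frame size
    compression = raw_bits / CODED_BITS_PER_FRAME       # raw bits per coded bit
    bit_rate = CODED_BITS_PER_FRAME * 1000 / FRAME_MS   # coded bits per second
    return samples, raw_bits, compression, bit_rate


if __name__ == "__main__":
    samples, raw_bits, compression, bit_rate = melp_frame_budget()
    print(f"{samples:.0f} samples, {raw_bits:.0f} raw bits per frame")
    print(f"compression factor {compression:.1f}, coded rate {bit_rate:.0f} bit/s")
```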


CHANNEL CODING STRATEGIES

Channel coding is used to correct the errors remaining after equalization, the goal being to decrease the BER from 10^-2 down to 10^-4. Different channel coding strategies may be used. In this project we check the use of Convolutional Codes (CC) and Reed Solomon (RS) block codes.

Convolutional codes are generally specified by three parameters (n, k, m): n is the number of output bits, k the number of input bits and m the number of memory registers. The quantity R = k/n, called the code rate, is a measure of the efficiency of the code. Commonly, k ranges from 1 to 8 and n from 2 to 10. CC are also often specified by the parameters (n, k, L), L being the constraint length of the code. It is defined by L = k(m-1) and represents the number of bits in the encoder memory that affect the generation of the n output bits. The decoding process uses the Viterbi algorithm with a trellis representation of 4 or 16 states, depending on the CC coder. Decoding is performed with hard or soft options: hard decoding uses only binary values, whereas the soft option uses the real values delivered by the equalizer output. In the case of CC(7,5), two zero tail bits are added at the end of the frame in order to close the trellis. In the CC(35,23) case, four zero bits close the trellis. Synchronization words S (of 16 and 14 bits respectively) are included to retrieve the 54-bit speech frame at the receiver. Including the trellis tail bits, the information frame is 72 bits long (54 + 16 + 2 or 54 + 14 + 4), whatever the CC in use. With a code rate R equal to 0.5, the emitted coded frame is 144 bits long, so the overall coding rate is 54/144 = 0.375. After the coding process, the synchronization word is 32 bits (respectively 28 bits) long. As a synchronization word is intrinsically present, differential coding is no longer needed to correct the phase ambiguity.

RS codes (n, k, t) are cyclic codes built from n symbols, with a maximum of n = q - 1, where q is the number of elements in the Galois field GF(q), q being a power of two (q = 2^5 = 32 for the length-31 codes used here). t is the correcting power of the code in symbols, so the number of control symbols is 2t and the number of information symbols that can be transmitted is k = n - 2t. The decoding process can also be hard or soft. Based on the Chase algorithm, the soft decoding process also uses the Berlekamp and Chien algorithms to correct received symbols. In our case, we look for the 4 least reliable bits and check among the 16 resulting candidate codewords. Unlike the CC, the RS uses two matrix options to manage the information. A synchronization word S is also included to retrieve the speech frame at the receiver side. Depending on the RS correcting power t, S is 29 bits long in the t = 1 case and 27 bits long in the t = 2 case. To match the size of the chosen RS codes, the information matrix spans two speech frames (108 bits). The UnderWater Acoustic (UWA) channel is also characterized by burst errors. The matrix writing and reading strategies are likely to improve the correcting process by working like an interleaver. The overall coding rate is R ≈ 0.7 whatever the correcting power t used.
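The frame budgets quoted for the two coding schemes can be reproduced with a short Python sketch. It is not the TRIDENT implementation: the function names, the bit ordering of the generators and the placement of the synchronization word at the start of the frame are assumptions made for illustration, and the RS figure assumes 5-bit symbols (n = 31 = 2^5 - 1). The encoder is a generic rate-1/2 feedforward convolutional encoder driven by octal generators such as (7, 5) or (35, 23).

```python
# Sketch only: generic rate-1/2 convolutional encoder plus frame budgets.
# Names, generator bit ordering and sync word placement are assumptions.

def cc_encode(bits, generators=(0o7, 0o5), memory=2):
    """Encode a list of 0/1 bits; return (coded bits, final encoder state)."""
    state = 0
    out = []
    for b in bits:
        reg = (b << memory) | state              # newest bit in the top position
        for g in generators:
            out.append(bin(reg & g).count("1") % 2)   # parity of the tapped bits
        state = reg >> 1                         # shift, dropping the oldest bit
    return out, state


def build_cc_frame(speech_bits, sync_bits, memory):
    """Sync word + speech frame + zero tail bits that close the trellis."""
    return list(sync_bits) + list(speech_bits) + [0] * memory


def overall_rate(speech_bits, emitted_bits):
    """Speech bits carried per emitted bit."""
    return speech_bits / emitted_bits


if __name__ == "__main__":
    speech = [1, 0] * 27                         # placeholder 54-bit speech frame
    # 4-state code: generators (7, 5), 16-bit sync word, 2 tail bits
    coded4, end4 = cc_encode(build_cc_frame(speech, [1] * 16, 2), (0o7, 0o5), 2)
    # 16-state code: generators (35, 23), 14-bit sync word, 4 tail bits
    coded16, end16 = cc_encode(build_cc_frame(speech, [1] * 14, 4), (0o35, 0o23), 4)
    print(len(coded4), end4, len(coded16), end16)
    print("CC overall rate:", overall_rate(54, len(coded4)))
    # RS: two speech frames (108 bits) per codeword of 31 symbols of 5 bits
    print("RS overall rate:", round(overall_rate(108, 31 * 5), 3))
```

Both convolutional configurations give a 144-bit emitted frame ending in the all-zero state, and the two overall rates (0.375 for CC, about 0.697 for RS) are the quantities that matter when the schemes are compared at equal channel bit rate.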

Figure 3. TRIDENT receiver platform

SYNCHRONIZATION STRATEGY

The channel decoder input has to be synchronized by means of a synchronization word S. The bit error rate at the decoder input (BERin) is computed by comparing the emitted data with the data decided after synchronization and phase ambiguity correction. After a sufficiently reliable equalization, which deals with channel interference, and before launching the channel and source decoding processes, it is necessary to look for the synchronization word (figure 4). Among the different strategies, it is possible to track this word S using correlation methods. The word length, which depends on the error correcting code implemented, is known at the receiver side. Let w(n) be the soft complex equalizer output at instant nT (T being the symbol duration) and ŵ(n) the corresponding hard decision output. The correlator then computes the correlation function A(n) (1) and the phase φ(n) (2).

Figure 4. Synchronization process (block diagram: equalizer output w(n), decision ŵ(n), correlator with synchronization word S delivering A(n) and φ(n), phase correction e^{-jφ(n)}, output frame F_n)
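The correlation test of this section (squared modulus of the correlation between the hard decisions ŵ(n) and the word S compared to a threshold, then derotation by the estimated phase) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the receiver code: the function names, the example synchronization word, the payload and the threshold value are hypothetical, and s(l) is stored in the index convention of the correlation sum, so that s(0) is paired with the most recent sample ŵ(n).

```python
import cmath

# Sketch only: names, the example sync word and the threshold are hypothetical.

def hard_decision(w):
    """Nearest QPSK point (+-1 +-1j) to a soft equalizer output sample."""
    return complex(1.0 if w.real >= 0 else -1.0, 1.0 if w.imag >= 0 else -1.0)


def find_sync(w, s, threshold):
    """Return (n, phase) for the first n with A(n) > threshold, else None."""
    w_hat = [hard_decision(x) for x in w]
    for n in range(len(s) - 1, len(w_hat)):
        c = sum(w_hat[n - l] * s[l].conjugate() for l in range(len(s)))
        if abs(c) ** 2 > threshold:              # A(n) > threshold
            return n, cmath.phase(c)             # phi(n) = Arg of the same sum
    return None


def derotate(samples, phase):
    """Undo the constellation rotation estimated by the correlator."""
    return [x * cmath.exp(-1j * phase) for x in samples]


# Example: a 16-symbol word followed by a payload, received with a pi/2 rotation.
P = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
tx_sync = [P[i] for i in (0, 1, 3, 2, 0, 0, 2, 1, 3, 3, 1, 0, 2, 2, 1, 3)]
s = tx_sync[::-1]                                # s(0) = last transmitted symbol
payload = [P[i] for i in (2, 0, 1, 3, 3, 0)]
stream = [x * 1j for x in [P[0]] * 5 + tx_sync + payload]

n, phase = find_sync(stream, s, threshold=0.9 * (2 * len(s)) ** 2)
recovered = derotate(stream[n + 1:], phase)      # payload with rotation removed
```

In this example the word ends at index 20 and the estimated phase is π/2, so the derotated samples match the emitted payload. A lower threshold would tolerate decision errors inside the word at the cost of more false detections, which is the trade-off the user-defined threshold controls.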

$$A(n) = \left| \sum_{l=0}^{L-1} \hat{w}(n-l)\, s^{*}(l) \right|^{2} \qquad (1)$$

$$\varphi(n) = \mathrm{Arg}\left( \sum_{l=0}^{L-1} \hat{w}(n-l)\, s^{*}(l) \right) \qquad (2)$$

The synchronization word is detected when $A(n) > \mathrm{Threshold}$, with

$$S = [\, s(0),\ s(1),\ \ldots,\ s(L-1) \,] \qquad (3)$$

$$F_n = W_n\, e^{-j\varphi(n)} \qquad (4)$$

$$W_n = [\, w(n+L)\ \ldots\ w(n)\ \ldots\ w(n-N+1+L) \,] \qquad (5)$$

N represents the information frame size and L the synchronization word length. The user-defined threshold makes it possible to adjust the detection reliability to match the channel difficulties. When A(n) is greater than the threshold, the value φ(n) indicates the constellation rotation, which can then be corrected in the output frame F_n (4). This procedure was not included in our previous trials in the Penfeld river in March and July 2007. The problem is now resolved, which improves the quality of the synthesized speech.

RESULTS

To improve the performance of the TRIDENT acoustic link, CC and RS block codes are checked. The goal of channel coding is to lower the bit error rate from 10^-2 to 10^-4 and to improve speech synthesis quality. Once synchronization and equalization have been efficiently achieved, channel coding may add extra performance. Underwater channels may be subject to phase jumps, which introduce phase ambiguities. In such cases both channel and source decoding are unable to work. Differential coding degrades performance by doubling the bit error rate, whereas a synchronization word S efficiently solves the phase ambiguity. It also appears that better performance is obtained with the 16-state coder/decoder, at the price of a higher computational load. In the case of the CC coder, only the soft decoding option allows the BER to be lowered from 10^-2 to 10^-4. In the case of RS coding/decoding, the soft option also allows a BER of 10^-4 to be reached [6].

I. River trials

River trials were carried out in March and July 2007 in the Penfeld river [6]. The main goal was to validate and evaluate TRIDENT system performance over horizontal shallow-water transmissions. The transmitter-receiver distance was about 400 m and the water depth around 7 m. The receiver was on board a boat while the emitter was at a depth of 5 m below a buoy. In reception, acoustic signals were received on 4 sensors spaced about 20 cm apart (around 5 wavelengths). The first hydrophone was 5 meters deep (about 1 meter below the ship's draught). Four carrier frequencies (11.2, 17.5, 20.0 and 35.0 kHz) and QPSK modulation were checked during this trial phase, the data bit rate being in the range 2.8 to 14 kbit/s.

Figure 5. Input and output equalizer constellations, MSE for 150 s of signal

For example, a speech transmission protected by an RS(31,27) code was checked for 150 seconds at 6.7 kbit/s, QPSK modulated on a 20 kHz carrier frequency. Figure 5 shows both the equalizer input and output (sensor 1) during a short sequence. The output constellation and the MSE demonstrate the equalization efficiency. Equalizer parameters were adapted to the channel impulse response. Multipath propagation in a time-dispersive channel produces strong Inter Symbol Interference (ISI), visible in the input constellation. After phase ambiguity correction and synchronization, the bit error rate fell from 1.85 × 10^-4 at the decoder input (BERin) to 1.1 × 10^-5 at the output (BERout) in the case of hard decoding and to 9 × 10^-6 in the case of soft decoding. All channel coder/decoder configurations (CC and RS) were successfully tested for four bit rates with each carrier frequency, representing 72 experiments (6 hours) at this site.

II. Sea Trials

Sea trials took place between 5 and 8 October 2007 in the bay of Brest (figure 6). The objective of these trials was to check both the receiver and the emitter operating in moving and realistic conditions. The emitter was on board a boat while the receiver was positioned in a laboratory van on the quay side.

Figure 6. Sea trial site, Brest Bay

Two carrier frequencies were used: 11.2 and 17.5 kHz, with rates of 3.2, 3.8 and 4.5 kbit/s for the first frequency and 5.8 and 8.7 kbit/s for the second. The communication distances were about 1000, 1500, 2000, 3000 and 3500 meters. During the trials, 71 sequences of 3 minutes were emitted, representing 213 minutes (3 hours 33 minutes). Demodulation, timing and carrier recovery, and equalization were computed in real time in the receiver. Channel and source decoding have not been performed in real time so far. The different coding strategies were used during these trials.

Among the 71 sequences checked during the sea trials, we present some interesting results obtained at different distances and with different coding strategies. For each trial the first figure plots the MSE and the phase ambiguity at the equalizer output, while the second figure shows the MSE at the equalizer output, the position of both false frames and lost frames, and the synthesized speech signal obtained from the decoded frames.

The first result is for a transmission using a carrier frequency of 11.2 kHz, a rate of 4.5 kbit/s and a range of 3500 meters.

a) Convolutional Code CC(35, 23), 16 states. In figure 7 a number of peaks appear in the MSE plot, whose values fluctuate around -12 dB. The phase ambiguity is fairly perturbed at the beginning of the sequence. During this trial, 51 frames were lost out of 4628 emitted frames. The BER fell from 1.1 × 10^-2 to 4.25 × 10^-3 after hard decoding and to 3.06 × 10^-3 after soft decoding (figure 8). Note also that 43% of the frames were false at the decoder input.

Figure 7. MSE and phase ambiguity detection at equalizer output (top: output equalizer MSE, level in dB versus time in s; bottom: phase ambiguity detection, phase in rad versus time in s)

Figure 8. MSE, false and lost frames, synthesized speech (top: output equalizer MSE, level in dB; middle: correlator output marking speech frames as false (+1) or lost (-1); bottom: speech signal level; all versus time in s)

b) Reed Solomon, RS(31, 29), correcting power t = 1. In figure 9, note that the MSE is strongly perturbed in the middle of the sequence (between the 50th and the 100th second), resulting in particular in phase jumps, which are detected and fixed by the synchronizer. Note also the strong phase ambiguity variations corresponding to the MSE perturbations over the same interval. During this trial, 162 frames were lost out of 7664 emitted frames. The BER fell from 2.38 × 10^-2 to 1.7 × 10^-2 after hard decoding and to 1.38 × 10^-2 after soft decoding. Note also that 28% of the frames were false at the decoder input.

Figure 9. MSE and phase ambiguity detection at equalizer output (top: output equalizer MSE, level in dB versus time in s; bottom: phase in rad versus time in s)

Figure 10. MSE, false and lost frames, synthesized speech (top: output equalizer MSE, level in dB; middle: correlator output marking speech frames as false (+1) or lost (-1); bottom: speech signal level; all versus time in s)

Figure 11. MSE and phase ambiguity detection at equalizer output (top: output equalizer MSE, level in dB versus time in s; bottom: phase in rad versus time in s)

Figure 12. MSE, false and lost frames, synthesized speech (top: output equalizer MSE, level in dB; middle: correlator output marking speech frames as false (+1) or lost (-1); bottom: speech signal level; all versus time in s)

The second result is for a transmission using a carrier frequency of 17.5 kHz, a rate of 4.5 kbit/s and a range of 2500 meters.

a) Convolutional Code CC(35, 23), 16 states. In figure 11, the MSE is relatively perturbed. These perturbations also appear in the phase ambiguity. During this trial, 143 frames were lost out of 4780 emitted frames (figure 12). The BER fell from 8.9 × 10^-2 to 6.9 × 10^-2 after hard decoding and to 6.6 × 10^-2 after soft decoding. Note also that 68% of the frames were false at the decoder input.

b) Reed Solomon, RS(31, 27), correcting power t = 2. In figure 13, the MSE is perturbed by some peaks, resulting in particular in phase jumps, which are detected and fixed by the synchronizer. During this trial, 121 frames were lost out of 8004 emitted frames. The BER fell from 2.4 × 10^-2 to 2.16 × 10^-2 after hard decoding and to 1.95 × 10^-2 after soft decoding. Note also that 28% of the frames were false at the decoder input (figure 14).

For all the trials, both in the river and at sea, the equalizer requires some time to converge. During equalizer convergence, frames may not be detected.
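The four sea-trial sequences reported above can be put side by side with a small Python sketch. The input figures are the ones quoted in the text; the derived quantities (percentage of lost frames and ratio between input and output BER) are computed here for convenience and are not values stated by the authors. The labels and function name are hypothetical.

```python
# Reported figures for the four sea-trial sequences; derived ratios are
# computed here and are not quoted in the paper. Names are hypothetical.
TRIALS = [
    # (label, lost frames, emitted frames, BER in, BER hard, BER soft)
    ("11.2 kHz, 3500 m, CC(35,23)", 51, 4628, 1.1e-2, 4.25e-3, 3.06e-3),
    ("11.2 kHz, 3500 m, RS(31,29)", 162, 7664, 2.38e-2, 1.7e-2, 1.38e-2),
    ("17.5 kHz, 2500 m, CC(35,23)", 143, 4780, 8.9e-2, 6.9e-2, 6.6e-2),
    ("17.5 kHz, 2500 m, RS(31,27)", 121, 8004, 2.4e-2, 2.16e-2, 1.95e-2),
]


def summarize(trials):
    """Lost-frame percentage and BER reduction factors for each sequence."""
    rows = []
    for label, lost, emitted, ber_in, ber_hard, ber_soft in trials:
        rows.append({
            "label": label,
            "lost_pct": 100.0 * lost / emitted,
            "gain_hard": ber_in / ber_hard,    # input BER over hard-decoded BER
            "gain_soft": ber_in / ber_soft,    # input BER over soft-decoded BER
        })
    return rows


if __name__ == "__main__":
    for row in summarize(TRIALS):
        print(f"{row['label']}: {row['lost_pct']:.2f}% lost, "
              f"BER gain x{row['gain_hard']:.2f} (hard), "
              f"x{row['gain_soft']:.2f} (soft)")
```

In every sequence the soft option gives a slightly larger BER reduction than the hard one, and the reduction shrinks as the input BER rises, which fits the observation that these codes mainly correct residual errors.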

Figure 13. MSE and phase ambiguity detection at equalizer output (top: output equalizer MSE, level in dB versus time in s; bottom: phase in rad versus time in s)

Figure 14. MSE, false and lost frames, synthesized speech (top: output equalizer MSE, level in dB; middle: correlator output marking speech frames as false (+1) or lost (-1); bottom: speech signal level; all versus time in s)

CONCLUSION AND PERSPECTIVES

The need for underwater acoustic communication led GESMA to launch a project called TRIDENT. This project began with the design of a receiver platform. This underwater acoustic link was able to transmit different kinds of information (data, images, text, etc.). Recently the platform has been extended by the design of the emitter part, implemented on a Texas Instruments TMS320C6201 DSP. TRIDENT has also been extended to the transmission of low bit rate speech signals, for which the MELP speech coder working at 2400 bit/s has been chosen. To increase the link robustness, channel coding has been added to the system with the objective of lowering the bit error rate from 10^-2 to 10^-4. Different kinds of channel coding using convolutional codes and Reed Solomon codes have been implemented. Each error correcting code is designed to work on hard and soft values. The CC codes offer two options, 4 and 16 states, while the RS codes offer two correcting powers.

Most of the trials ran correctly and show the equalizer robustness in static conditions and in moving conditions up to 2000 meters. For longer ranges the receiver suffers from the lack of automatic gain control (AGC), from phase jumps and from poor synchronization, the latter recently corrected by a synchronization frame strategy. A comparison between the CC and RS options seems to favour the CC scheme, but the CC coding rate (0.375) is significantly lower than the RS coding rate (0.696). The spectral efficiency has to be taken into account for a fair comparison. Both the CC and RS options are only able to correct residual errors. A bit error rate around 10^-2 at the channel decoder input falls in a region where the error correcting scheme has difficulty operating. The use of interleavers and turbo codes offers interesting prospects for improving the underwater acoustic link. A compromise between spectral efficiency, equalizer performance and channel coding rate also has to be found.

REFERENCES

[1] J. Labat and C. Laot, "Blind adaptive multiple-input decision-feedback equalizer with a self-optimized configuration", IEEE Trans. on Commun., Vol. 49, No. 4, April 2001.
[2] J. Trubuil, G. Lapierre, T. Gall, J. Labat, "Real-time high data rate acoustic link based on spatio-temporal blind equalization: the TRIDENT acoustic system", Proc. OCEANS 2002, Biloxi, Vol. 4, pp. 2438-2443.
[3] A. Goalic, J. Trubuil, G. Lapierre, J. Labat, "Real-time low bit rate speech transmission through underwater acoustic channel", OCEANS 2005, Brest, France.
[4] L. M. Supplee, R. P. Cohn, J. S. Collura and A. V. McCree, "MELP: The New Federal Standard at 2400 bps", IEEE ICASSP'97 Conference, Munich, Germany, pp. 1591-1594.
[5] A. Goalic, J. Trubuil, N. Beuzelin, "Channel coding for underwater acoustic communication system", OCEANS 2006, September 18-21, Boston, MA, USA.
[6] J. Trubuil, A. Goalic and N. Beuzelin, "A Low Bit-Rate Speech Underwater Acoustic Phone Using Channel Coding for Quality Improvement", MILCOM'07, Orlando, FL, Oct. 2007.