Hardware Based Realtime, Fast and Highly Secured Speech Communication Using FPGA
Hussain Mohammed Dipu Kabir1, Syed Bahauddin Alam Department of EEE Bangladesh University of Engineering and Technology Dhaka, Bangladesh E-mail: 1
[email protected]
watermarking approach confirms content integrity. Recently watermark is also used for quality evaluation of speech [12]. Most watermarking algorithms have been developed for software implementations due to ease of use and upgrading. However, the software implementations have the limitations, speed problem and are vulnerable to the off-line attacks. The high density techniques of current FPGAs will be a highly attractive solution for hardware implementation of the software-based watermarking algorithm as they provide flexibility, easy implementation and high performance [13], [14], [15]. Encryption is necessary to ensure protection of data and watermark in the medium. Traditional symmetric key encryption algorithms like Data Encryption Standard (DES) use small blocks size with complex permutations process to give secure output cipher text [16]. Public key algorithms are not suitable for large amount of data due its slow performance [17-18]. AES was announced by National Institute of Standards and Technology (NIST) on 2001. AES is one of the most secure algorithms used in symmetric key cryptography [19-20]. It uses complicate repeated steps to prevent analytic attacks that can discover weakness in the algorithm and so attack any encrypted data. AES use high diffusion to eliminate any prediction of key. AES algorithm is not appropriate for realtime communication because of delay and interdependence of frames. If one frame is lost in AES next frames will be garbage to receiver. In this paper, we presented a FPGA implementation of com- pression, watermarking and encryption algorithm of speech signal. Compression is needed to prevent the increase of the data in transmission channel. Encryption provides secu- rity while the signal traveling via the transmission channel. Encryption is performed using random number. So, it is very difficult to decrypt this without knowing the algorithm/ sequence of Xoring and swapping. In FPGA based encryption system, random number is generated by XORing the nibble bit of a noise with a clock. This clock’s frequency is higher and not an integer multiple sampling frequency. Nibble bit of any noise signal is almost a random number. XORing with clock resultant will be a random number.
Abstract—This paper presents a FPGA (Field Programmable Gate Array) based secured speech signal communication system. The system consists of compression, encryption and water- marking. Watermark provides authentication and ownership verification. Encryption is used for security of watermark and voice signal. As random numbers are used as a key for the encryption process, the signal is secured for both wired and wireless media. FPGA is an efficient device for encryption. Conventional processors contain small number of registers and perform large operations in multiple cycles. FPGA can perform a large number of operations at a time. All the steps described in the algorithm is verified in Matlab and implemented in FPGA. Security level is also analyzed in this paper. Keywords— FPGA, Key, Encryption, Decryption, Compression, Security Attacks.
I.
Watermarking,
INTRODUCTION
Digital watermarking is the technique of inserting specific information (watermark) into audio signal, data, image or video. The specific information is known as watermark and usually used for ownership verification, identification, authen- tication purpose and protection. Due to rapid progress in wireless communication systems, mobile systems, and smart card technology, information is more vulnerable to abuse. For these reasons, it is important to make information systems secure to protect data and resources from malicious acts. The watermark is extracted or detected at any point where identification or integrity is concerned. Traditional digital watermarking schemes are mainly based on spatial-domain or transform-domain, such as discrete cosine transform and discrete wavelet transform. In [1] Z. M. Lu, et al. proposed a novel VQ-based watermarking algorithm. After that, various VQ-based watermarking algorithms emerged, such as the algorithms based on codebook partition [2], codebook expansion [3], code words labeling [4], indices statistical characteristics [5], predictive VQ [6], lattice VQ [7], finite state VQ [8], and MSVQ [9], etc. Robust, fragile and semifragile ap- proaches [10], [11] are the classification of recent watermark- ing techniques. The robust watermarking approach protects the copyright identifier of data in which watermarks are not easily removed by attacks. The fragile ___________________________________ 978-1-4244-6943-7/10/$26.00 ©2010 IEEE
This relation is not true for discarded one sample and this causes noise. According to this relation, some part of main signal may create noise and folded signal and low frequency noise signal are of same amplitude. If we can extract low frequency signal using filter and shift it by 2 FS and then 3 subtracting with entire signal we may eliminate folding noise. Low frequency noise is eliminated using low-pass filter. Watermark can be any identifying sequence or copyright mark. For IP-phone, the watermark may identify the device MAC or IP address or both for proper identification. Copyright information can represent any enterprise or organization. In the proposed system, encryption is performed with 3 successive XORing operation. From the property of XOR operation given in eqn. (1), the encryption and decryption operation is performed. (S Ͱ R) Ͱ R = S (1) Where, S represents sampled signal sequence and R represents random number sequence.
For compression non-linear down-sampling technique is applied. Watermark and random numbers are inserted in the places created due to compression. Entire signal is also encrypted for protection. Security analysis and comparison among different algorithm is also discussed here. II.
MATHEMATICAL BACKGROUND
In 1.5 factor non-linear down-sampling technique, one sam- ple is discarded after taking two samples. When the frequency of the main signal is small compared to Fs, main signal can be reconstructed from down sampling data with very small noise applying only interpolation. At high frequency, two noise signals are found. As it is nonlinear down sampling, we are getting original signal with less amplitude because of one same sampling period and also getting noise of folding frequency, F 2 FS because of one 3 double sampling period. Let us assume the signal of frequency, F 2 FS f 1 Hz. 3 Folding signal, sin[2{(2/3)Fs – f1}(n/Fs)] = sin{(2 /3) – 2f1(n/Fs) } Main Signal, sin[2{(2/3)Fs + f1}(n/Fs)] = sin{(2 /3) + 2f1(n/Fs) }
III.
PROCESS FLOW
A number of samples of speech signal create a frame. Here we are considering frame of 15 samples. Doing so, proposed encryption technique will also be applicable for frames, containing 30, 45, 60 samples. FPGA chip / programmed chip is used for this system. Fig.1 shows the arrangement for encryption in FPGA1 and Fig.2 shows the arrangement for decryption in FPGA2. According to Fig. 1 sampled signal is received synchronously using the CLK used for sampling. One bit random number is added with it.
Fig. 1 Arrangement for transmitting the speech signal in FPGA
Fig. 3. A frame of 15 samples
Fig. 2 Arrangement for receiving the speech signal in FPGA
Folded signal - Main signal= -2cos{(2 n)/3} sin{2f1(n/Fs)} Now, -2cos{(2 n)/3} = 1 for n%3 = 1,2 -2cos{(2 n)/3} = -2 for n%3 = 0 So, for the two samples, Folded signal - Main signal = Low frequency noise signal.
Fig. 4. The frame with block A, B, C, D and E
B BE C CE D DE
A.
Compression First of all, the speech signal is to be compressed. To compress the speech signal, non-linear down-sampling process is used. In this process, mid sample of every three successive samples are dropped and available space is created. For example, considering the system shown in Fig. 1. The input consists of the samples s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14, s15 shown in Fig. 3. The output set will be s1, s3, s4, s6, s7, s9, s10, s12, s13, s15 (10 samples). 11th and 12th sample will contain two watermarkw1, w2 and 13th-15th samples will contain three random numbers- r1, r2 and r3. During reconstruction at the receiving end, higher order interpolation technique is used to reconstruct the dropped samples s2, s5, s8, s11, s14. If the frequency of the main signal is small compared to the sampling frequency Fs, then the main signal can be easily reconstructed with a little noise.
˄3˅ ˄4˅ ˄5˅
The next step is to XOR the signal again 8 X 15 bit secret key (k) given in Eqn. (6). This secret key is available at both transmitting and receiving end. Sig.Frame = Sig.Frame Ͱ k (6) Then the signal is made more randomized using the opera-tions given in Fig. 5. Rotation of 4-bit is performed for even samples of the frame. To undo this operation same operation will be performed in decryption part. Both operation and pseudo code is shown in this figure. Rotation is easier and effective for FPGA based encryption. In the next step, the operations of Eqn. (7) - (9) are executed. ˄7˅ A AD ˄8˅ B BDC ˄9˅ E ECD And in final step of encryption is given in Eqn. (10) - (12). ˄10˅ C CA ˄11˅ D DAB The above two steps (Eqn. (7) - (11)) provides more security of the frame while transmitting in the media. The proper decryption is possible only when correct sequence of operation and block selections are known properly. FPGA will perform entire encryption steps in one positive edge of clock and within 5ns according to wave-shape editor of ’Quartus II’ software.
B. Watermarking After compression, spaces are available for watermark. If 15 successive speech samples are considered, then after compression, 5 samples are available in which two samples are used for watermark. The frame size is not changed and the watermark signal is buried between the voice samples and random number bits. So, it is difficult to identify the watermark in the signal. The watermark signal may be device identifier code like MAC address, IMEI number, IP address or any copyright information. The watermark is placed in 11th and 12th sample position of the 15 sample frame. So the watermark is buried in the frame which is not easily detectable. The watermark signal is available in every successive 15 samples. So, in case of any cropping or any other operation is performed on signal, the watermark signal is still available.
D.
Transmission The transmission media used is lossless wire media. In practical both wired and wireless media can be used for transmitting and receiving signal. The medium is expected to be high bandwidth and high QoS. But there may be possibility of alternation of data during transmission. The altered frames are dropped during FCS. Dropped frames will not cause any problem because frames are not interdependent. As the communication scheme is concerning about voice, so UDP protocol is preferable.
Fig. 5. Rotation of even samples by 4 and pseudo code for this operation
C.
Encryption After inserting the watermark signal, random three bytes are added. They are marked as r1, r2 and r3 (red lines of Fig. 9). The random number is used to make the signal like noise. The first step is to randomize the signal. Firstly, the frame of 15 samples is divided into block A, B, C, D and E shown in Fig.4. The first step is given in Eqn. (2) - (5). ˄2˅ A AE
Fig. 6. Original Signal
Fig. 9
Fig. 7 After Non-linear down sampling operation
Watermark (Green) and random samples (Red) are added into compressed signal
E. Decryption At the receiving end, decryption operation is performed. The decryption process is just the reverse order of sequences of encryption process. So, the decryption sequence for our scheme is given below which starts with Eqn. (11) and reversely ends with Eqn. (2); operation shown in Fig. 5 is also included. F.
Reconstruction During compression, one-third samples are dropped. Here, s2, s5, s8, s11 and s14 are dropped from the successive 15 samples. These signals should be reconstructed. As the middle bit of every 3 bit is dropped, so reconstruction is quite efficient. Here, higher order interpolation technique is used for signal reconstruction. In case of high frequency signal noise elimination is needed after interpolation reconstruction according to “Mathematical Background” section. FPGA chip can handle such complex operation with great performance (least delay and highest throughput). IV.
Fig. 10 Original Speech Signal
B.
Encryption and Decryption of entire voice signal For simulation five male, five female, sound of bird and sound of train is used. Encryption-decryption with watermark is reversible one to one system. For compression some loss of information occurs. For low frequency sound this loss is negligible. Speech signal is of much low frequency and can be reconstructed easily using interpolation. Fig. 10 shows a speech signal. This signal is watermarked and encrypted in Fig. 11. After decryption and elimination of watermark signal is reconstructed. Output signal is shown in Fig. 12.
SIMULATION RESULT
A.
Encryption and Decryption of a Single Frame For encryption of a single frame a pure sine wave is considered as original signal for better realization. Original signal is shown in Fig. 6. The compressed signal of Fig. 8 is found after down-sampling at Fig.7 and compressing. Finally the watermark and random number sequence are added and the final frame is shown in Fig. 9.
V.
RESULT ANALYSIS AND DISCUSSION
In this paper, a secured real-time voice communication scheme is proposed and implemented it using FPGA and analyzed for several cases. A.
Result Analysis Fig. 10 shows the decrypted waveform of original signal shown in Fig. 9. The signal is totally randomize and from the figure main signal can’t be determined. However hackers may determine the process of encryption using their attacking techniques. To know whether hackers will be able to do so, common attack of hackers are discussed.
Fig. 8 Compressed signal
6) Man-In-The-Middle Attack (MIM, or MITM): Attacker knows the input and output data of encryption process and can change input for trial and error. This attack is useless against proposed system as output changes simulation to simulation for same input. 7) Differential Cryptanalysis: Differential cryptanalysis is the attempt to find similarities between various cipher-texts that are derived from similar plaintexts. This similarities may assist in recovering key. In proposed system output varies simulation to simulation for same input. Combination of random numbers can’t be determined by hacker. So sequence of Xoring will be unpredictable to hacker. For comparison between hardware and software based encryption system readers are directed to [22].
Fig. 11 Watermarked and Encrypted signal
C.
Delay and Throughput Delay is important for realtime communication. In proposed system delay for encryption is very small (about 5ns). Encryption is performed using FPGA and delay for FPGA is approximately 5ns, according to waveshape editor of ’Quartus II’ Software. Delay is divided into three partdelay of transmitter, delay of network, delay of receiver [22]. In proposed system Delay for forming frame = 15 × Ts = 1.785 ms [For, Fs = 8K] Delay for adding watermark and encryption = 5ns Delay for removing watermark and decryption = 5ns Total delay = 1.785ms + 5ns + Time for transmission and receiving +5ns Throughput is not important for proposed system. For data encryption methods total file or folder is encrypted. Through- put reduces execution time. For proposed voice encryption method data rate is 8×Fs 64Kb/sec. Throughput of FPGA is much higher than that.
Fig. 12 Reconstructed signal at output
B.
Common Attacks 1) Brute Force Attack: Brute force attack is based on guessing the password or the encryption key. Key/password is determined by trial and error method. Hackers try for millions of time (if needed) for finding key. Proposed system is strong against this attack because random number is used as key and output changes simulation to simulation due to this random key. Also hardware based encryption systems are stronger against this attack. 2) Parallel Attack: Parallel attack is actually a brute force attack in which encrypted data is copied to many computers. All computers are assigned for decrypting data. Proposed system is also strong against this attack as the attack is same to Brute Force Attack. 3) Cold Boot Attack: DRAM memory is used to store data while the system is operating. After power is removed, all content is deleted in a gradual process that can take a few seconds and up to a few minutes. A hacker may read the memory content by cutting power and then performing a cold boot. Hardware based encryption system is not vulnerable to Cold Boot Attack since it does not use the host RAM. 4) Malicious Code: This attack alter the software based encryption software to disable encryption. Hardware based encryption methods are secured against this attack. 5) Known Plaintext Attack: In this attack hackers know the plaintext and corresponding ciphertext of encryption system. Using them hacker tries to recover key.
D.
Discussion The proposed scheme has multi levels of security because encryption is performed using random number and the random bits are not easily guessable. Compression makes space in the frame and facilitates the need of extra space for watermark and random numbers. Again watermark signal provides with necessary information and are hidden in the speech signal. They can be further used for authentication, user verification, copyright information and source tracing. The scheme saves the need for extra space. It is also good for high QoS [23, 24] (Quality of Service) data transmission as there is minimal latency from originating signal to the reconstructed signal. The compression technique used is discussed in details in [25]. The proposed encryption-decryption system is not suitable for data facing bit error. Rather it is suitable for internet data transferring, where any lose of data is identified but corrected data is not sent. That means this method is useable for those data transmission where bit error does not occurs (have very low probability of error) or
[10] C. I. Podilchuk, E. J. Delp, “Digital Watermarking: Algorithms and Applications,” IEEE Signal Process. Mag., Vol. 18, no 4, pp. 33-46, Jul. 2001. [11] E. Izquierdo and V. Guerra, “An Ill-posed Operator for Secure Image Authentication,” IEEE Trans. Circuits Syst. Video Technol, Vol. 13, No. 8, pp. 842852, Aug. 2003. [12] L. Cai, R. Tu, J. Zhao, and Y. Mao, “Speech Quality Evaluation: A New Application of Digital Watermarking,” IEEE Transaction On Instrumentation And Measurement, Vol. 56, No. 1,Page: 45-55 February 2007. [13] P. Kitsos. N. Sklavos. and 0. Koufopavlou, “An Efficient Implementation Of The Digital Signature Algorithm,” 9th IEEE International Conference on Electronics: Circuits and Systems, Pmc. of the . VoL3, San he. USA. 2002. [14] S. 0. Med , A K Katsaggelos, and M. Sartafzadeh, “Analysis and FFGA Implementation of Image Restoration Under Resource ConWainUC ,” IEEE Trans. on Computers, Vol 52. no.3, Mar. 2003. [15] M. Klimerh, V Swton. and D. Walola, “Robust Image Watermarking Algorithm Based on Predictive Vector Quantization,” Proc. ICICIC06, Vol. 3, pp. 491-494, 2006. [16] National Bureau of Standards, Data Encryption Standard, Federal Information Processing Standards Publication, U.S. Government Printing Office, Washington, DC (1977). [17] Anoop MS, “Public Key Cryptography-Applications algorithm and mathematical explanations,” 2007. [18] Teerakanok, Thongpon; Kamolphiwong, Sinchai “Accelerating asymmetric-key cryptography using Parallelkey Cryptographic Algorithm (PCA),” Telecommunications and Information Technology, 6th International Conference on computer, Volume 02, pp.812 815 ,2009. [19] Advanced Encryption System, “Robust Image Watermarking Algorithm Based on Predictive Vector Quantization,” Federal Information Processing Standards Publication, vol. 197, 2001. [20] J. Daemen and V. R. Rijndael, “The advanced encryption standard,” Dr.Dobbs J., Vol. 26 No 3, pp.137139 , 2001. [21] “Assessing the Security of Hardware-Based vs. Software-Based Encryp- tion on USB Flash Drives”, ScanDisk, Whitepaper, June 2008. [22] E. P. Guillen, D. A. Chacon, “VoIP Networks Performance Analysis with Encryption Systems”, World Academy of Science, Engineering and Technology, 5 August 2009. [23] J. Kim, S. H. Han, H. Lee, W. Ryu and M. Hahn, “QoS-Factor Transmission Control Mechanism for Voice over IP Network based on RTCP-XR Scheme,” IEEE Tenth International Symposium on Digital Object Identifier, pp. 16, 2006 [24] Tuoriniemi, A.; Eriksson, G.A.P.; Karlsson, N.; Mahkonen, A. “QoS concepts for ip-based wireless systems,” 3G Mobile Communication Technologies, pp. 229 233, 2002. [25] H. M. D. Kabir, S. Bahauddin, M. I. Azam, M. A. Hussain, K. M. S. Rahman, M. A. Matin, “Non-linear Down-Sampling and Signal reconstruction, Without Folding”, in European Modelling Symposeum (EMS 2010), 17-19 Nov. 2010.
bit error is identified and eliminated by inspection, such as check-sum or parity method. It is also suitable for uploading and downloading voice-data. Frame should be dropped when an error occurs. VI.
CONCLUSIONS
A new watermarking with encryption and compression scheme is described in this paper for real-time speech signal which can be used for identification, ownership verification and authentication. The watermark is not easily detectable as the size of the frame is not increased. If there is any attempt to change the content of the signal in the transmission channel, proposed scheme can detect it and return noise at the end. The encryption is also done with random block selection, random number, rotation and private key. So, if any other decryption algorithm is used, it also returns just noise at end. It is only possible for any system of exact algorithm scheme to get the speech signal. REFERENCES [1]
[2] [3]
[4]
[5]
[6]
[7]
[8]
[9]
Z.M. Lu and S.H. Sun, “Digital Image Watermarking Technique Based on Vector Quantization,” Electronic Letters, Vol. 36, No. 4, pp. 303-305, 2000. C. Lin and J. Pan, “Robust VQ-based Digital Image Watermarking or Noisy Channel,” Proc. ICICIC06, pp. 673-676, 2006. C. C. Chang, C. S. Shieh, J. S. Pan, B. Y. Liao and L. Hong, “An Improved Digital Watermarking Scheme in the Domain of Vector Quantization,” Journal of Computers, Vol. 17, No.2, pp. 19-26, 2006. Zhe-Ming Lu, et al., “Digital Image Watermarking Method Based on Vector Quantization with Labeled Codewords,” EICE Trans. Inf. and Syst., Vol. E86-D, no. 12, pp. 2786-2789, 2003. H. C. Huang, F. H. Wang, and J. S. Pan, “A VQ-based Robust Multiwa- termarking Algorithm,” IEICE Trans. Fundamentals, Vol. E85-A, No. 7, pp. 1719-1726, 2002. Y. N. Li, C. H. Liu, Z. M. Lu, J. S. Pan, “Robust Image Watermarking Algorithm Based on Predictive Vector Quantization,” Proc. ICICIC06, Vol. 3, pp. 491-494, 2006. L. Guillemot, J. M. Moureaux, “Indexing Lattice Vectors in a Joint Watermarking and Compression Scheme,” Proc. ICASSP06, Vol. 2, pp.11-329- 11-332, 2006. Y. C. Lin, Z. K. Huang, R. T. Pong and C. C. Wang, “A Robust Watermarking Scheme Combined with the FSVQ for Images,” Proc. ICITA05, Vol. 2, pp. 597-602, 2005. Z.M. Lu, D.G. Xu, S.H. Sun, “Multipurpose Image Watermarking Algo- rithm Based on Multistage Vector Quantization,” IEEE Trans. on Image Processing, Vol. 14, No. 6, pp. 822-831, 2005.