1320
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 6, JUNE 2006
A Blind Source Separation Based Method for Speech Encryption Qiu-Hua Lin, Fu-Liang Yin, Tie-Min Mei, and Hualou Liang, Senior Member, IEEE
Abstract—The underdetermined problem poses a significant challenge in blind source separation (BSS) where the number of the source signals is greater than that of the mixed signals. Motivated by the fact that the security of many cryptosystems relies on the apparent intractability of the computational problems such as the integer factorization problem, we exploit the intractability of the underdetermined BSS problem to present a novel BSS-based speech encryption by properly constructing the underdetermined mixing matrix for encryption, and by generating the key signals that satisfy the necessary condition for the proposed method to be unconditionally secure. Both extensive computer simulations and performance analyses results show that the proposed method has high level of security while retaining excellent audio quality. Index Terms—Blind source separation (BSS), independent component analysis (ICA), speech encryption.
I. INTRODUCTION
A
S SPEECH communications become more and more widely used and even more vulnerable, the importance of providing a high level of security is dramatically increasing. As such, a variety of speech encryption techniques have been introduced. The analogue encryption has been one of the most popular encryption techniques widely used in speech communication. In general, there are four main categories: frequency-domain scrambling (e.g., the frequency inverter and the band splitter), time-domain scrambling (e.g., the time element scrambling), amplitude scrambling (also known as the masking technique that covers the speech signal by the linear addition of pseudorandom amplitudes), and two-dimensional scrambling that combines the frequency-domain scrambling with the time-domain scrambling [1]. Besides, there are many other analogue speech encryption methods in the transform domain, e.g., fast Fourier transform, discrete cosine transform and wavelet transform, etc. [2]–[4]. Recently, some new speech encryption methods including chaotic cryptosystem [5] and encryption using circulant transformations [6] have also been developed.
Manuscript received March 21, 2005; revised August 29, 2005. This work was supported by the National Natural Science Foundation of China under Grant 60402013, Grant 60372082, and Grant 60172073. This paper was recommended by Associate Editor M. Chakraborty. Q.-H. Lin and F.-L. Yin are with the School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116023, China (e-mail:
[email protected]). T.-M. Mei is with the School of Information Science and Engineering, Shenyang Institute of Technology, Shenyang 110168, China. H. Liang is with the School of Health Information Sciences, University of Texas at Houston, Houston, TX 77030 USA. Digital Object Identifier 10.1109/TCSI.2006.875164
Blind source separation (BSS) aims to recover a set of unknown mutually independent source signals from their observed mixtures without knowing the mixing coefficients. Therefore, it is also well known as independent component analysis (ICA). It has received considerable attention in recent years, and has been successfully applied to many fields such as communications and biomedical engineering [7]–[14]. Its applications for signal encryption thus far have been much focused on the image cryptosystems [15]–[17] but little on the speech encryption [18]. Motivated by the fact that the security of many cryptographic techniques depends upon the apparent intractability of the computational problems such as the integer factorization problem [19], [20], we present in this paper a BSS-based speech encryption scheme, the security of which relies on the difficulty of solving the underdetermined BSS problem where the number of the source signals is greater than that of the mixed signals. The sufficient condition for constructing the underdetermined mixing matrix for encryption is presented based on the source inseparability of BSS [21]. The two necessary conditions for generating the key signals required for this speech encryption are then described by retaining the perfect key characteristics of the one-time pad cipher [19], [20] while considering the BSS source separability. Computer simulations and performance analyses show that the proposed method has high level of security and excellent audio quality. The rest of this paper is organized as follows. In Section II, we briefly introduce the BSS mixing model and the underdetermined BSS problem. Section III describes the BSS-based speech encryption. Section IV presents the sufficient condition for constructing the underdetermined mixing matrix for encryption. Two necessary conditions for generating the key signals to be used for the underdetermined system are presented in Section V. Section VI describes the evaluation experiments and shows the results. Finally, we discuss related issues and conclude this paper in Section VII. II. BSS MIXING MODEL AND UNDERDETERMINED PROBLEM independent source signals Suppose that there exist and observed mixtures of the source signals (usually ). The linear BSS mixing model is represented by the mixing equation as follows: (1) , which is an column where similarly colvector collecting the source signals, vector lects the observed (mixed) signals, and is an mixing matrix that contains the mixing coefficients. The goal of BSS is
1057-7122/$20.00 © 2006 IEEE
LIN et al.: BSS-BASED METHOD FOR SPEECH ENCRYPTION
1321
Fig. 1. Block diagram of BSS-based speech encryption.
to find an vector
demixing matrix
such that
output
(2) where is a permutation matrix and is a diagonal scaling matrix. When the number of the source signals is greater than that , BSS becomes a difficult of the mixed signals, i.e., case of the underdetermined problem, in which the complete separation of the source signals is usually out of the question [21], [22]. Therefore, the underdetermined BSS problem has been routinely considered as an obstacle for source separation [7], [8]. In this paper, the underdetermined problem is rather treated as an advantage to establish a novel speech encryption. The proposed method is largely motivated by the popular use of the formidable intractability of computational problems (e.g., the integer factorization problem) to ensure the security of many cryptosystems [19], [20]. III. PROPOSED METHOD The block diagram of the BSS-based speech encryption is shown in Fig. 1. The central idea of the proposed method is to construct the intractable underdetermined BSS problem in encryption that can only be solved with the key signals in decryption. In the following, we first describe the encryption, and then explain the decryption. A. Encryption There are two main stages, the segment splitter and the underdetermined mixing, in the encryption process. 1) Segment Splitter: Similar to the time element scrambling [1], the segment splitter first divides the original speech segments into frames and then partitions each frame into , where is the segment length, is the frame pointer. Since the parameters and are necessary for the key signals generating algorithm (it will be described in Section V), they are inserted into the head data of the encrypted speech in a definite format for transmission. The proposed method encrypts the original speech frame by frame. 2) Underdetermined Mixing: At this stage, a underdetermined mixing matrix for encryption and independent key signals are first generated
and key signals will be described in (the generation of encrypted segSection IV and V, respectively), and then are the outputs through mixing the ments original segments and the key signals with the underdetermined mixing matrix according to the BSS mixing model (1). Specifically, given , , the encryption can be repreand sented by the following equation: (3) where . Obviously, this encryption process turns BSS into the underdetermined BSS problem since source signals but mixed signals. In such a case, there are by properly constructing the underdetermined mixing matrix , the original segments can not only be well masked by the key signals in the encrypted speech, but also immune from the separation from the encrypted segments by BSS without the key signals. This is exactly the intractability of the underdetermined BSS problem. B. Decryption The decryption of the proposed method also consists of two stages (see Fig. 1): the BSS and the waveform reconstruction. 1) BSS: Once the encrypted segments are received and the key signals are regenerated by the secret seed , they are combined to form mixed signals for decryption , upon which BSS is then performed. It is easy to find that , where is the mixing matrix for the BSS decryption as
(4) zero matrix, is a identity mawhere is a is a square matrix of full rank. trix. Obviously, Accordingly, with the key signals in decryption, the underdetermined BSS problem resulting from the encryption becomes the conventional BSS in which the number of the mixed signals is equal to that of the source signals. As a result, the original segments and the key signals can be well recovered together
1322
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 6, JUNE 2006
through BSS. After discarding the key signals, i.e., the signals with the biggest numbers of zero-crossings, we can obtain , as shown the copies of original segments in Fig. 1. 2) Waveform Reconstruction: Due to the indeterminacy of may have BSS, the BSS decrypted segments permutation and scale ambiguities compared with the original segments . To reconstruct the original speech, need to be reordered and rescaled by incorporating some waveform information of the original segments . In practice, the waveform information is collected by the segment splitter (see Fig. 1), which includes the and the numbers of zero-crossings (nzc), the maximum of each original segment with zero mean. With minimum decrypted segthe waveform information, the order of the ments can be correctly rearranged with nzc first, and their amplitudes can then be reconstructed by using and . The reordered and rescaled segments finally form one frame of the decrypted speech. Note that the waveform information has very small size and can be inserted into the encrypted frame in a definite format for transmission. IV. CONSTRUCTION OF UNDERDETERMINED MIXING MATRIX As mentioned above, the intractability of the underdetermined BSS problem essentially is the difficulty of separating the original segments out from the encrypted segments through BSS without the key signals, which can also be called the inseparability of the original segments. The dependence of source inseparability on the structure of the mixing matrix has been established in [21]. Therefore, with the proper construction of the , the inseparability of the underdetermined mixing matrix original segments, i.e., the intractability of the underdetermined BSS problem, is readily obtained. In the proposed method, is generated based on [21, Th. 1]: matrix is -row decomposable if Theorem 1: An and only if its columns , can be partitioned disjoint groups into , such that for all , if , then , where , is a partition of the set of integers (i.e., and ), is the direct sum of the spaces spanned by except (i.e., ). all matrix is -row decomposIn other words, an able if and only if its columns can be grouped into disjoint groups such that the column vectors in each group are independent of the vectors in all the other groups. Note that the spaces spanned by different groups may overlap, the theorem requires only that the column vectors do not lie in the joint part. Once matrix is -row decomposable, the source the signals are readily partitioned into groups [21]. As a result, two or more source signals recovered in one group by BSS are indeed inseparable. According to Theorem 1, the inseparability of the original segments can be achieved by constructing the underdetermined with proportional columns to make the origmixing matrix inal segments be separated out in groups together with the key signals with much higher level of energy. Specifically, the un-
derdetermined mixing matrix as follows:
for encryption is constructed
(5) where is a matrix of full rank, which is pseudoranand 1, domly generated with uniform distribution between is a scalar value and to make the original speech be well covered by the key signals. in The sufficiency of the construction condition of original seg(5) for ensuring the inseparability of the ments can be clearly shown by using Theorem 1. Let be independent be the last column veccolumn vectors of . According to (5), , tors of are disthen joint groups satisfying Theorem 1. Therefore, is -row decomposable. That is to say, if BSS is performed solely on encrypted signals without the key signals the , the estimated signals are definitely mixed signals corresponding to the groups , which are . Since , it turns out that the mixed are evensignals tually extracted, in which the original segments are still well covered by the key signals with much higher level of energy. As such, none of the original segments can be independently separated out from the encrypted signals without the key is signals. Therefore, the underdetermined mixing matrix sufficient to ensure the intractability of the underdetermined BSS problem, which is then utilized to establish the BSS-based speech encryption system. V. GENERATION OF KEY SIGNALS A. One-Time Pad Cipher The proposed method is a symmetric-key encryption scheme. A necessary condition for such a scheme to be unconditionally secure is that the key must be at least as long as the message [19], [20]. The one-time pad cipher is the one and only unconditionally secure encryption algorithm. It has perfect key characteristics: 1) the key is random, which is a pad with a non-repeating random string of letters and 2) the key has the same length as the message. Consequently, each letter of the key is used only once to encrypt the corresponding plaintext character. Therefore, the message is protected as long as the key remains secure [19], [20]. B. Two Necessary Conditions for Generating Key Signals From Fig. 1, we can see that the key signals have the same length as the original speech signal. Therefore, the BSS-based encryption essentially satisfies the above-mentioned necessary condition for the proposed method to be unconditionally secure. To have the perfect key characteristics of the one-time pad cipher and to meet the BSS requirement for source separation, we generate the key signals according to the following two conditions.
LIN et al.: BSS-BASED METHOD FOR SPEECH ENCRYPTION
1323
Fig. 2. Original speech signal (spoken digits 1-10 with a little music interference) with 110000 samples.
Fig. 3. Example of BSS-based speech encryption with two frames and four segments per frame. (a) Eight original segments s (t); . . . ; s (t) and s (t); . . . ; s (t). (b) Eight key signals s (t); . . . ; s (t) and s (t); . . . ; s (t). (c) Eight encrypted segments x (t); . . . ; x (t) and x (t); . . . ; x (t). (d) Eight decrypted segments u (t); . . . ; u (t) and u (t); . . . ; u (t) by Comon’s algorithm before waveform reconstruction. (e) Eight recovered segments after waveform reconstruction.
Condition 1: The key signals are statistically independent. Condition 2: The key signals are non-Gaussian. The necessity of these two conditions is obvious. Condition 1 is needed by the randomness of the key characteristics of the one-time pad cipher. Besides, it also meets the requirement of BSS for independent sources. Condition 2 is necessary for the correct decryption of BSS. Most BSS algorithms implicitly assume that the mixing matrix is time invariant and the sources are stationary; such assumptions render the BSS incapable of separating more than two Gaussian signals. In practice, the key signals are formed by the key signals generating algorithm (see Fig. 1), in which the secret seed , a pseudorandom number generator (PRNG), the number and the length of the segment within a frame, i.e., and , are used. The algorithm works by initializing the PRNG with the secret seed to generate pseudorandom values according to uniform distribution, which are then used to form the key signals in terms of and the frame pointer increased by the frame length . As such, these key signals accordingly fulfill the above-mentioned two conditions. Note that the secret seed is the key to be securely transmitted over a secure channel or over an insecure channel after encrypted with a public-key technique [19]. The key signals are not necessarily used for subsequent transmissions. They are regenerated for decryption by the key signals generating algorithm
after the secret seed is received and the parameters and are extracted from the encrypted speech on the receiving side. VI. EVALUATIONS AND RESULTS Extensive computer simulations have been performed to validate the proposed method. In most experiments, recorded speech files in wave-format are used and transmitted within local area network. As an example, our simulation is to securely transmit a speech file recording a person saying the digits “one” to “ten” in English while interfered with a little audible music [23]. The speech signal is sampled at 16 kHz and 6.875-s long (110000 samples), as shown in Fig. 2. We mainly used the Comon’s algorithm [24] in the simulations reported below. Other popular BSS algorithms such as the fast ICA algorithm [25] and the Infomax algorithm [26] can also be used. When these algorithms are compared, they generally perform near equally well. A. Encryption The segment splitter divides the original speech signal into two frames and four segments per frame. That is, and (about 0.86 s). Fig. 3(a) shows the waveform of the and eight original segments in the two frames. Their waveform information is collected and shown in Table I. The key signals for encrypting the two
1324
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 6, JUNE 2006
Fig. 4. Comparison of spectrograms of original speech, encrypted speech, and decrypted speech with time on horizontal axis, frequency on vertical axis, and with amplitude displayed using a “gray scale.” (a) Spectrograms of eight original segments. (b) Spectrograms of eight encrypted segments. (c) Spectrograms of eight decrypted segments by Comon’s algorithm.
WAVEFORM INFORMATION INCLUDING nzc, s
TABLE I s ABOUT EIGHT ORIGINAL SEGMENTS IN TWO FRAMES
AND
frames are generated by the key signals generating algorithm and , as shown in as for encrypFig. 3(b). One underdetermined mixing matrix tion is constructed according to (5), as shown in (6), at the is the first four column vectors bottom of the page, where and in (5). By using the encryption (3), two of and are encrypted frames obtained, as shown in Fig. 3(c) In addition, Fig. 4(a) and (b) shows the spectrograms of the eight original segments and the eight encrypted segments with time on the horizontal axis, frequency on the vertical axis, and with the amplitude displayed using a “gray scale” [1]. From Fig. 3(c) and Fig. 4(b), we can see that the original speech has been well masked by the key signals in the encrypted speech to achieve the security goal. B. Decryption Once the two frames of the encrypted speech and the secret seed are available on the receiving side, the key signals and for encrypting the two frames, as shown in Fig. 3(b), are regenerated by the
key signals generating algorithm, the two original frames are then well recovered one after another through performing the Comon’s algorithm on the two mixed signals vectors =1, 2. Fig. 3(d) shows the eight decrypted segments and , which are all of good quality but in different orders and changed scales compared with the eight original segments in Fig. 3(a). To recover the original speech, the waveform reconstruction is immediately performed after the BSS decryption for each frame by using the waveform information in Table I. The BSS decrypted segments are first and . reordered with nzc, and then rescaled by using Fig. 3(e) shows the finally recovered segments of the two frames, with their spectrograms shown in Fig. 4(c). Comparing both the waveforms between the original segments [Fig. 3(a)] and the decrypted segments [Fig. 3(e)] and their corresponding spectrograms[Fig. 4(a) and (c)], it can be seen that the original speech signal has been decrypted with very high quality. Our experiment can also be used to show the inseparability of the original segments in which the key signals are not available.
(6)
LIN et al.: BSS-BASED METHOD FOR SPEECH ENCRYPTION
1325
Fig. 5. Illustration of inseparability of original segments of speech by BSS without key signals. (a) Eight extracted segments by performing BSS with Comon’s algorithm solely on eight encrypted segments without eight key signals. (b) Spectrograms of eight extracted segments.
SNR
TABLE II (dB) OF FOUR ORIGINAL SEGMENTS IN FOUR ENCRYPTED SEGMENTS AND FOUR DECRYPTED SEGMENTS BY COMON’S ALGORITHM, FASTICA ALGORITHM, AND INFOMAX ALGORITHM FOR FRAME 1
SNR
TABLE III (dB) OF FOUR ORIGINAL SEGMENTS IN FOUR ENCRYPTED SEGMENTS AND FOUR DECRYPTED SEGMENTS BY COMON’S ALGORITHM, FASTICA ALGORITHM, AND INFOMAX ALGORITHM FOR FRAME 2
In this case, the BSS is solely performed on the two frames of encrypted speech in Fig. 3(c) without the two frames of key signals in Fig. 3(b). Fig. 5(a) shows two frames of extracted speech by the Comon’s algorithm. Correspondingly, Fig. 5(b) shows their spectrograms. We can see that the original speech is still well masked by the key signals in these extracted signals. Therefore, the in (5) is sufficient to ensure the intractability of the underdetermined BSS problem. C. Performance Analyses 1) Signal-to-Noise Ratio Computation: To quantify the performance of the proposed algorithm, we computed the signal-tonoise ratio (SNR) index for each original segment in the decrypted segments by the three BSS algorithms, the Comon’s algorithm, the fast ICA algorithm, and the Infomax algorithm, respectively. Specifically, the SNR index is defined as follows: (7)
are the original segments, are the key signals, is the demixing matrix obtained from the BSS algorithm used for is the mixing matrix corresponding to the BSS decryption, in (4), and denotes the th row decryption, i.e., . For comparison, we also and the th column of matrix for the original segments in the encrypted segcomputed ments using (7), but with replaced by in (6). Tables II indexes show that the and III show the results. These original segments have been well masked by the key signals in the encrypted segments yet recovered by BSS with very high SNR. For different BSS algorithms, there is no pronounced difference among the SNRs. The results demonstrate that the proposed method is able to separate the key signals out from the encrypted signals through BSS with very high accuracy. Thus, the recovered speech is of excellent voice quality. 2) Listening Test: To further measure the quality of the decrypted speech of the BSS-based encryption method, we conducted a subjective DCR (degradation category rating) testing,
where
1326
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 6, JUNE 2006
TABLE IV WAVEFORM INFORMATION nzc ABOUT ORIGINAL SEGMENTS IN FOUR CASES OF DIFFERENT
SNR (dB)
P
AND
T
WITH FIXED FRAME LENGTH
TABLE V
AND MEAN SNR (dB) OF ORIGINAL SEGMENTS IN DECRYPTED SEGMENTS BY COMON’S IN FOUR CASES OF DIFFERENT AND WITH FIXED FRAME LENGTH = 55000
P
T
in which the subjects are instructed to rate the conditions according to the 5-point degradation category scale as follows: degradation is inaudible (5), degradation is audible but not annoying (4), degradation is slightly annoying (3), degradation is annoying (2), and degradation is very annoying (1) [27]. The mean value of the results is called the degradation mean opinion score (DMOS). Sixteen subjects participated in the listening test. They are teachers and graduate students in the School of Electronic and Information Engineering. Nine of them are male and the other seven are female. In the listening test, all subjects heard recordings of the original speech [see Fig. 3(a)] and the decrypted speech [see Fig. 3(e)], and voted on their opinion of the quality degradation of the decrypted speech with regard to the original speech. As a result, the DMOS rating was obtained as 4.94. Besides, no degradation of intelligibility was detected by the listeners during the test. Therefore, from the listener’s viewpoint, the listening test also shows the excellent voice quality of the proposed method. D. Selection of Segment Number
P 2 T = 55000
and Segment Length T
In the above-mentioned experiment, the original speech was divided into two frames and each four segments, i.e.,
P
2T
ALGORITHM
with the frame length . We asked how the choices of the parameters and affect the performance of our proposed method. To approach the problem of parameter selection, we carried out the following experiments and for with different segment numbers, i.e., the fixed frame length of 55 000. The waveform information nzc about the original segments for each experiment is preand mean sented in Table IV, with corresponding computed for the original segments in the decrypted segments by the Comon’s algorithm shown in Table V. As can be seen becomes larger or becomes in Tables IV and V, when shorter, the nzc becomes more comparable in each segment and and the SNR gets decreased. For the shortest segment ( ) in our experiments, it is observed that the order of the decrypted segment 2 and 4 in frame 2 cannot be correctly recovered due to their similar nzc (1580 and 1577 for segment 2 and 4, respectively). This observation suggests that sufficient length of speech segment is required to achieve the accurate decryption of BSS and the exact reordering of the decrypted are segments. We found in our experiments that appropriate choices for decryption with high SNR and security enhanced by mixing.
LIN et al.: BSS-BASED METHOD FOR SPEECH ENCRYPTION
VII. DISCUSSION AND CONCLUSION Motivated by the fact that the security of many cryptosystems relies on the apparent difficulty of solving the computational problems such as the integer factorization problem, we take advantage of the underdetermined BSS problem to propose a BSS-based method for encrypting speech signals. Since the mixing matrix for encryption can ensure the intractability of the underdetermined BSS problem, and the key signals approximately have the prefect key characteristics of the one-time pad cipher, the BSS-based encryption method has enhanced security performance. The results of the computer simulations and the performance analyses demonstrate the high level of security and the excellent audio quality of the proposed method. Like previous work [15], our system exploits BSS for encryption. However, besides the signal to be encrypted is distinct, our system is substantially different from the previous study in the way such BSS methods are utilized. First, our system takes full advantage of the underdetermined BSS problem to construct the mixing matrix for encryption. Such an underdetermined mixing matrix renders the sources inseparable. Second, our key signals are generated under two explicit necessary conditions such that they retain the perfect key characteristics of the one-time pad cipher while satisfying the source separability of BSS. Third, multiple segments other than one segment are encrypted simultaneously in our system by preprocessing the original speech with the segment splitter in the time element scrambling, thus our method can be viewed as the masking technique to cover the mixture of multiple speech segments, and the BSS mixing is fully utilized to enhance the security. In fact, our system effectively combines two speech encryption techniques, the time element scrambling method and the masking technique, into one integrated system by virtue of BSS, and has substantially improved security performance while retaining excellent audio quality. We recognize that systematic comparisons of our algorithm to other speech encryption methods would require a separate study, and indeed this is a topic we intend to pursue in the future. Nonetheless, our BSS-based system has an extremely large key space; this is its major advantage over other techniques. For an encryption method, the larger the key space is, the higher the security is. The transform-domain speech scrambling, for example, has been significantly outperformed the time-domain scrambling in that the number of effective permutations (key) available in the transform-domain scrambling is much larger than that in the time-domain scrambling [2], [6]. The key space of our method is even larger than that of the transform-domain speech scrambling because the key signals are generated by pseudorandom values with nearly infinite period. Besides, unlike the classical key-based encryption method such as data encryption standard (DES) where the key is of fixed length (e.g., 56 bits) [19], [20], our key signals have the same length as the original speech. As such, as long as the intractability of the underdetermined BSS problem is guaranteed by the mixing matrix for encryption, our system is immune from the attacks such as the ciphertext-only attack, the known-plaintext attack, and the chosen-plaintext attack, which all assume that the private key
1327
used to encrypt all the plaintext is the same [19], [20]. Furthermore, our key signals are generated with the key signals generating algorithm initialized by the secret seed for efficient key management. The secret seed is designed as a composite seed to make an exhaustive search computationally infeasible. Specifically, the composite seed incorporates multiple seed values including initial values for a chaotic system and the fingerprint of a specific user processed by a 160-bit cryptographic hash function [19]. Other biometric data such as hand geometry, iris pattern and facial features can also be used if needed by the security [28]. Owing to the fact that our key signals are completely independent and the condition number of the mixing matrix for the BSS decryption is small, the BSS algorithms such as the Comon’s algorithm, the fast ICA algorithm, and the Infomax algorithm can be used to recover the original speech from the encrypted signal with very high accuracy. As a result, the voice quality of the decrypted speech is excellent in the proposed method. ACKNOWLEDGMENT The authors would like to thank the reviewers for valuable comments to improve the paper. REFERENCES [1] H. J. Beker and F. C. Piper, Secure Speech Communications. London, U.K.: Academic, 1985. [2] B. Goldburg, S. Sridharan, and E. Dawson, “Design and cryptanalysis of transform-based analog speech scramblers,” IEEE J. Select. Areas Commun., vol. 11, no. 5, pp. 735–744, May 1993. [3] A. Matsunaga, K. Koga, and M. Ohkawa, “An analog speech scrambling system using the FFT technique with high-level security,” IEEE J. Select. Areas Commun., vol. 7, no. 4, pp. 540–547, Apr. 1989. [4] F. L. Ma, J. Cheng, and Y. M. Wang, “Wavelet transform-based analogue speech scrambling scheme,” Electron. Lett., vol. 32, no. 8, pp. 719–721, 1996. [5] K. Li, Y. C. Soh, and Z. G. Li, “Chaotic cryptosystem with high sensitivity to parameter mismatch,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 4, pp. 579–583, Apr. 2003. [6] G. Manjunath and G. V. Anand, “Speech encryption using circulant transformations,” Proc. IEEE Int. Conf. Multimedia and Expo, vol. 1, pp. 553–556, 2002. [7] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications. New York: Wiley, 2003. [8] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001. [9] J. F. Cardoso, “Blind signal separation: Statistical principles,” Proc. IEEE , vol. 86, no. 10, pp. 2009–2025, Oct. 1998. [10] S. Amari and A. Cichocki, “Adaptive blind signal processing—Neural network approaches,” Proc. IEEE , vol. 86, no. 10, pp. 2026–2048, Oct. 1998. [11] T. Ristaniemi, K. Raju, J. Karhunen, and E. Oja, “Inter-cell interference cancellation in CDMA array systems by independent component analysis,” in Proc. 4th Int. Symp. Independent Compon. Anal. Blind Signal Separ., 2003, pp. 739–744. [12] V. D. Calhoun, T. Adali, L. K. Hansen, J. Larsen, and J. J. Pekar, “ICA of functional MRI data: an overview,” in Proc. 4th Int. Symp. Independent Compon. Anal. Blind Signal Separ., 2003, pp. 281–288. [13] M. Kawakatsu, “Application of ICA to MEG noise reduction,” in Proc. 4th Int. Symp. Independent Compon. Anal. Blind Signal Separ., 2003, pp. 535–541. [14] J. F. Cardoso, J. Delabrouille, and G. Patanchon, “Independent component analysis of the cosmic microwave background,” in Proc. 4th Int. Symp. Independent Compon. Anal. Blind Signal Separ., 2003, pp. 1111–1116. [15] W. Kasprzak and A. Cichocki, “Hidden image separation from incomplete image mixtures by independent component analysis,” in Proc. 13th Int. Conf. Pattern Recogn., 1996, vol. II, pp. 394–398.
1328
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 6, JUNE 2006
[16] Q. H. Lin and F. L. Yin, “Blind source separation applied to image cryptosystems with dual encryption,” Electron. Lett., vol. 38, no. 19, pp. 1092–1094, 2002. [17] Q. H. Lin and F. L. Yin, “Image cryptosystems based on blind source separation,” in Proc. IEEE Int. Conf. Neural Netw. Signal Process., 2003, vol. II, pp. 1366–1369. [18] Q. H. Lin, F. L. Yin, T. M. Mei, and H. L. Liang, “A speech encryption algorithm based on blind source separation,” in Proc. Int. Conf. Commun., Circuits Syst.: Signal Process., Circuits Syst., 2004, vol. II, pp. 1013–1017. [19] A. Menezes, P. van Oorschot, and S. Vanstone, Handbook of Applied Cryptography. Boca Raton, FL: CRC, 1996. [20] D. E. R. Denning, Cryptograghy and Data Security. Reading, MA: Addison-Wesley, 1982. [21] X. R. Cao and R. W. Liu, “General approach to blind source separation,” IEEE Trans. Signal Process., vol. 44, no. 3, pp. 562–571, Mar. 1996. [22] A. Cichocki, J. Karhunen, W. Kasprzak, and R. Vigario, “Neural networks for blind separation with unknown number of sources,” Neurocomput., vol. 24, pp. 55–93, 1999. [23] T. W. Lee, A. J. Bell, and R. H. Lambert, “Blind separation of delayed and convolved sources,” Adv. Neural Inf. Process. Syst., vol. 9, pp. 758–764, 1996. [24] P. Comon, “Independent component analysis, a new concept?,” Signal Process., vol. 36, no. 3, pp. 287–314, 1994. [25] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 626–634, Jun. 1999. [26] A. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural Comput., vol. 7, pp. 1129–1159, 1995. [27] ITU, Methods for Subjective Determination of Transmission Quality 1996, ITU-T Rec. P.800. [28] M. Peyravian, S. M. Matyas, A. Roginsky, and N. Zunic, “Generating user-based cryptographic keys and random numbers,” Comput. Security, vol. 18, no. 7, pp. 619–626, 1999.
Qiu-Hua Lin received the B.S. degree in wireless communication, the M.S. degree in communications and electronic systems, and the Ph.D.degree in signal and information processing, all from Dalian University of Technology (DUT), Dalian, China. She is currently an Associate Professor in the School of Electronic and Information Engineering, DUT. Her research interests include stochastic signal processing, array signal processing, biomedical signal processing, and secure communication.
Fu-Liang Yin was born in Fushun city, Liaoning province, China, in 1962. He received the B.S. degree in electronic engineering and the M.S. degree in communications and electronic systems from Dalian University of Technology (DUT), Dalian, China, in 1984 and 1987, respectively. He joined the Department of Electronic Engineering, DUT, as a Lecturer in 1987 and became an Associate Professor in 1991. He has been a Professor at DUT since 1994, and the Dean of the School of Electronic and Information Engineering of DUT since 2000. His research interests include stochastic signal processing, speech processing, image processing and pattern recognition, wireless communication, and integrated circuit design.
Tie-Min Mei was born in Liaoning, China, in 1964. He received the B.S. degree in physics, the M.S. degree in biophysics, and the Ph.D. degree in signal and information processing from Sun Yat-Sen University, China Medical University, and Dalian University of Technology, Dalian, China, in 1986, 1991, and 2006, respectively. He was a Visiting Fellow with the School of Electrical Computer and Telecommunications Engineering, The University of Wollongong, Wollongong, NSW, Australia, from 2004 to 2005. He has been an academic staff at Shenyang Institute of Technology since 1996. His current research interests include stochastic signal processing and speech processing.
Hualou Liang (M’00–SM’01) received the M.Sc. degree in electronic engineering from Dalian University of Technology, Dalian, China, and the Ph.D. degree in physics from the Chinese Academy of Sciences, Beijing, China. He is an Associate Professor at the University of Texas Health Science Center, Houston. He was a Postdoctoral Researcher at Tel-Aviv University, Tel-Aviv, Israel, Max-Planck Institute for Biological Cybernetics, Tuebingen, Germany, and the Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL. He has written more than 50 papers, conference proceedings and book chapters. His research experience throughout the years ranges from areas in biomedical signal processing to cognitive and computational neuroscience. He teaches graduate interdisciplinary courses including “Biomedical Signal Processing” and “Computational Biomedicine.”