High capacity, transparent and secure audio

Multimed Tools Appl https://doi.org/10.1007/s11042-018-6213-0

High capacity, transparent and secure audio steganography model based on fractal coding and chaotic map in temporal domain

Ahmed Hussain Ali 1 & Loay Edwar George 2 & A. A. Zaidan 3 & Mohd Rosmadi Mokhtar 1

Received: 7 July 2017 / Revised: 22 April 2018 / Accepted: 24 May 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract Information hiding researchers have been exploring techniques to improve the security of transmitting sensitive data through an unsecured channel. This paper proposes an audio steganography model for secure audio transmission based on fractal coding and a chaotic least significant bit (LSB) scheme, referred to as HASFC. The model enhances the hiding capacity while preserving statistical transparency and security. The HASFC model manages to embed a secret audio into a cover audio of the same size. To achieve this result, fractal coding is adopted, which produces a high compression ratio with an acceptable reconstructed signal. The chaotic map is used to randomly select the cover samples for embedding, and its initial parameters are utilized as a secret key to enhance the security of the proposed model. The HASFC model outperforms existing audio steganography schemes by improving the hiding capacity up to 30% and maintaining the transparency of the stego audio, with average values of SNR at 70.4, PRD at 0.0002 and SDG at 4.7. Moreover, the model shows resistance against brute-force attack and statistical analysis.

* Ahmed Hussain Ali [email protected] Loay Edwar George [email protected] A. A. Zaidan [email protected] Mohd Rosmadi Mokhtar [email protected]

1 Universiti Kebangsaan Malaysia, Bangi, Malaysia
2 University of Baghdad, Baghdad, Iraq
3 Universiti Pendidikan Sultan Idris, Tanjung Malim, Malaysia


Keywords Fractal coding · Least significant bit · Steganography · Information hiding · Logistic map · Statistical steganalysis

1 Introduction Information security is the practice of providing secure transmission of important data and relies on two main techniques, cryptography and information hiding [13]. Cryptography renders the secret data meaningless and unreadable to attackers. Information hiding can be divided into two classes, namely, watermarking and steganography. Watermarking achieves copyright or ownership protection by embedding watermarks inside the media. Steganography, meanwhile, hides confidential data and conceals their very existence during transmission [4, 39]. The most common digital media used in data hiding are image, text, audio, and video files. Audio steganography is an approach that hides secret information inside an audio file. Owing to the high sensitivity of the human auditory system (HAS), hiding secret data in an audio file is more challenging than in other media [15]. The performance of any data hiding technique depends on its hiding capacity, transparency, and robustness [14, 20]. Hiding capacity is the ratio of the size of the secret file to that of the cover file, expressed as a percentage. Alternatively, it can be represented by the number of secret bits that can be embedded per unit of time, measured in bits per second (bps), which is sometimes called the embedding rate or payload [62]. Transparency means the closeness between the stego file and the original cover file, and between the reconstructed file and the original secret file. This parameter also means minimum degradation and is inversely related to hiding capacity; specifically, a high hiding capacity results in high distortion and low transparency [6]. Robustness indicates the resistance of the stego file to various attacks and its capability to retrieve the secret message with minimum error. Robustness is the most significant parameter in watermarking, while transparency and hiding capacity are the most important for steganography [8, 41].
These parameters conflict with one another. In particular, an increase in hiding capacity degrades the robustness of the secret message and the transparency of the stego file. Balancing the trade-off among these parameters is a complicated task [32, 60]. Data hiding techniques, in general, can be classified into three main domains according to the format of the cover file: temporal or time [25, 29], transform [7, 15, 20, 43, 47] and compressed [30, 36, 55] domains. In the temporal domain, the cover data are modified directly to hide the secret data. Thus, the techniques in this domain are considered simple and fast, but they are less robust to signal processing. An example of such a technique is LSB. In the transform domain, the cover samples are transformed into a set of coefficients, and the secret data are concealed inside these coefficients to enhance robustness and security. Examples of transforms used in this domain are the Discrete Wavelet Transform (DWT), Fast Fourier Transform (FFT), and Discrete Cosine Transform (DCT). In addition, some data hiding techniques adopt data compression as a new trend to decrease the bandwidth: the secret data are compressed and then hidden in the cover file. Data hiding techniques that adopt a data compression method are classified under the compressed domain. Vector quantization [30, 56], fractal coding [17], and block truncation coding [33, 55] are the most common lossy or irreversible compression techniques for increasing hiding capacity in image steganography. However, such compression techniques have not yet been implemented in audio steganography. Additional aspects of audio steganography are discussed in greater detail by Djebbar et al. [14] and Ali et al. [5].


In this study, an audio steganography model based on fractal coding and chaotic LSB is proposed. This model exhibits high hiding capacity and preserves the transparency of the stego audio. Because it is a hybrid, it adopts two domains for data hiding, namely, the time and compressed domains. The HASFC model is also a blind method, which means that the embedded secret data can be extracted from the stego audio without referring to the original cover audio [40]. The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 gives a summary of the methods adopted by the proposed model. Section 4 presents the proposed model and describes its phases. Section 5 shows the experimental results and discussion. Section 6 summarizes the main points. Finally, Sections 7 and 8 present the limitations and the conclusion.

2 Related work In this section, the related works in audio steganography are discussed along with their contributions to improving the hiding capacity through different approaches and domains. In recent years, various audio steganography techniques have been proposed and implemented. Nevertheless, they are limited to the temporal and transform domains. These methods aim to minimize the trade-off between hiding capacity and transparency, which are the significant parameters in each steganography technique [47].

2.1 Temporal domain LSB is the most widely used approach for embedding in the temporal domain [35, 54]. LSB is also known as the replacement approach because the secret message is embedded by replacing the rightmost bit of the cover samples. This approach offers simplicity, ease of implementation, low distortion, and low computational cost. Owing to these merits, LSB has been adopted in many steganography and watermarking techniques. However, it is vulnerable to eavesdropping because of the imbalance between odd and even samples caused by the embedding process. LSB has recently been used in the following two studies. Kekre et al. [27] proposed two approaches for embedding an audio signal using different numbers of LSBs depending on the most significant bit (MSB) of the cover audio. They found that up to seven LSBs can be used in embedding, with the number depending on the MSB of the cover samples. The authors used 16-bit cover samples for their approaches. The results showed that the obtained hiding capacity is between 35 and 70% of the cover file size, and the signal-to-noise ratio (SNR) of the stego file is 52 dB on average. Bazyar and Sudirman [10], on the other hand, proposed an embedding technique for increasing the carrying capacity. They used an LSB algorithm for embedding and shifted the embedding layer from the fourth LSB layer to the seventh LSB layer. The results showed that the obtained hiding capacity is between 35 and 55%, and the SNR of the stego file is 62 dB on average.

2.2 Transform domain DWT and DCT are used in the transform domain because of their capability to increase the hiding capacity and robustness of steganography systems. Several other methods are proposed in this particular domain.


Sheikhan et al. [49] in 2010 suggested a method in the wavelet domain. In their method, the secret signal is embedded in the selected coefficients using LSB based on a modified floating three-level Haar wavelet function. A floating number of bits is used in substitution, in which the number of embedded bits depends on the sub-band energy, to produce a good SNR. The findings showed an acceptable hiding capacity of 14.3% and high SNR and mean opinion score (MOS) compared with those of previous studies. Shahadi and Razali [45] proposed a block matching algorithm based on the discrete wavelet packet transform (DWPT). Matching, scaling, and replacement are adopted in data hiding rather than LSB. Their algorithm obtains a hiding capacity of 35%, more than 25 dB resistance to additive white Gaussian noise (AWGN), and recognizes the secret message up to 25 dB [45]. Shahadi and Jidin [46] in the same year also proposed an algorithm based on the wavelet packet transform but with adaptive hiding based on LSB. In this algorithm, the strength of the cover samples and the matching of bit blocks are the two factors that affect the hiding process. The results showed that the embedding capacity can be up to 42% of the cover signal with a minimum SNR of 50 dB. Sheikhan et al. [50] proposed a method for hiding information in wavelet coefficients using the LSB substitution technique. The cover signal is divided into several sub-bands using DWT. The sub-bands whose energy is lower than or equal to the hearing threshold are used in the embedding process. The SNR of the stego file is 76 dB, and the hiding capacity is 34% of the cover file size on average. Shivdas [53] conducted sample segment comparison in the DCT domain for hiding text or audio in an audio carrier. The hiding capacity is up to 25% of the cover file size, and the SNR of the stego file is within 50 dB. Shahadi et al. [48] in 2014 proposed an audio steganography scheme based on the lifting wavelet transform (LWT) and adaptive random embedding using weighted block matching. This scheme increases the hiding capacity to 48% (up to 340 Kbps), with an SNR of above 35 dB and lossless message retrieval. El-Khamy et al. [15] proposed a scheme for concealing RSA-encrypted images in an audio cover by sample comparison in the DWT domain, with coefficients selected using pseudo-random numbers. The experimental results show an embedding rate of 5698 bps, which means a hiding capacity of less than 1%, and a stego SNR of 41.73 dB. Moreover, the results demonstrate that the scheme is robust against some signal processing attacks such as AWGN, MP3 compression and echo addition. El-Khamy et al. [16] proposed an image-in-audio hiding scheme for improving the hiding capacity, security and robustness of audio steganography using a two-level integer wavelet transform, wavelet coefficient modification, XOR, and chaotic map techniques. The scheme obtains a payload of 21,845 bps, a hiding capacity of 25% of the cover file and a stego SNR of 44.6 dB.

3 Main methods This section briefly discusses the three main methods used in the HASFC model. Although there are many techniques for embedding and compression, the proposed model adopts fractal coding, the least significant bit and the chaotic map for the reasons given in the following subsections.

3.1 Fractal coding Data compression techniques in general can be classified into lossless and lossy compression. Lossy techniques provide a higher compression ratio than lossless compression; however, the files after lossy compression are not identical to the files before compression, whereas in lossless compression they are identical [44]. In most cases, the loss of quality in reconstructed audio, image or video files is not considered a critical issue, whereas for other files such as text it is a very important issue because small differences can lead to different meanings [28]. Fractal coding is a lossy compression technique built on the fractal concept introduced by Benoit Mandelbrot in 1975. Fractal geometry is the science concerning the properties of fractal objects found in the real world. The fractal concept is based on the existence of numerous similarities and redundancies in most real-world objects [59]. Fractal coding was first used for image compression by Barnsley [9]. Jacquin [22] then extended Barnsley's work using the mathematics of the Iterated Function System (IFS). In IFS, the output of the first iteration is the input to the second iteration, and the computational complexity is considered high. Jacquin finally established a practical fractal coding algorithm using the Partitioned Iterated Function System (PIFS). In this technique, an image is divided into two types of blocks, namely, overlapped domain blocks and non-overlapped range blocks. Each range block is then represented and encoded by a set of IFS codes. The IFS code consists of a set of coefficients that include the domain block index, the scale and the range mean. Fractal coding is a prominent approach for lossy data compression because of its high compression ratio and the acceptable quality of the reconstructed signal compared with those of other techniques, such as DWT and DCT [21, 38, 51]. Unlike DCT and DWT, fractal coding also has lower computational complexity because its process does not require any transformation. Moreover, fractal coding is asymmetric: the encoding process is time-consuming because of the range-domain matching process, while the decoding is simple and fast [38, 52].
Similar to other compression techniques, fractal coding consists of two main processes, namely, encoding and decoding. The encoding process of fractal coding includes three steps [23, 24]:

1. Partitioning the input signal into non-overlapped range blocks (each block is shifted from the previous one by one block size) and overlapped domain blocks (each block is shifted from the previous one by one pixel). Overlapping increases the number of domain blocks and the probability of finding a domain block that is more similar to a particular range block.

2. Matching between range and domain blocks to produce the optimum IFS coefficients for each range block with minimum error using Eqs. (1) to (4):

$$\chi^2 = \sigma_r^2 + s\left(s\,\sigma_d^2 + 2\,\bar{d}\,\bar{r} - \frac{2}{n}\sum_{i=0}^{n-1} d_i r_i\right) \tag{1}$$

$$s = \begin{cases} \dfrac{\frac{1}{n}\sum_{i=0}^{n-1} d_i r_i - \bar{d}\,\bar{r}}{\sigma_d^2}, & \text{if } \sigma_d^2 > 0 \\[2ex] 0, & \text{if } \sigma_d^2 = 0 \end{cases} \tag{2}$$

$$\bar{r} = \frac{1}{n}\sum_{i=0}^{n-1} r_i, \qquad \bar{d} = \frac{1}{n}\sum_{i=0}^{n-1} d_i \tag{3}$$

$$\sigma_r^2 = \frac{1}{n}\sum_{i=0}^{n-1} r_i^2 - \bar{r}^2, \qquad \sigma_d^2 = \frac{1}{n}\sum_{i=0}^{n-1} d_i^2 - \bar{d}^2 \tag{4}$$

where $\chi^2$ is the error between the current range block and domain block; $s$ is the scale parameter; $d$ and $r$ are the domain and range blocks with $n$ samples, respectively; $d_i$ is the value of the ith sample in the domain block; $r_i$ is the value of the ith sample in the range block; $\bar{d}$ and $\bar{r}$ are the means of the domain and range blocks, respectively; $\sigma_d^2$ and $\sigma_r^2$ are the variances of the domain and range blocks, respectively.

3. Saving the optimum IFS coefficients for the decoding process.

The decoding process is simple and straightforward. In this process, the affine mapping is applied using the retrieved IFS coefficients and arbitrary samples by the following equation:

$$r_i' = s\,(d_i - \bar{d}) + \bar{r} \tag{5}$$

where $r_i'$ is the ith sample of the retrieved range block; $s$ is the scale parameter; $d_i$ is the value of the ith sample of the arbitrary block; $\bar{d}$ and $\bar{r}$ are the means of the stego and range blocks, respectively.
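The matching and decoding relations in Eqs. (1) to (5) can be illustrated with a short Python sketch. It is only an illustration under simplifying assumptions: the helper names `block_stats`, `match` and `decode` are hypothetical, the scale is kept as an unquantized float, and no symmetry operation is applied.

```python
def block_stats(block):
    """Return (mean, variance) of a block, following Eqs. (3) and (4)."""
    n = len(block)
    mean = sum(block) / n
    variance = sum(v * v for v in block) / n - mean * mean
    return mean, variance


def match(range_block, domain_block):
    """Return (scale, error) for one range/domain pair, Eqs. (2) and (1)."""
    n = len(range_block)
    r_mean, r_var = block_stats(range_block)
    d_mean, d_var = block_stats(domain_block)
    # (1/n) * sum of d_i * r_i, shared by both equations
    cross = sum(d * r for d, r in zip(domain_block, range_block)) / n
    scale = (cross - d_mean * r_mean) / d_var if d_var > 0 else 0.0
    error = r_var + scale * (scale * d_var + 2 * d_mean * r_mean - 2 * cross)
    return scale, error


def decode(domain_block, scale, range_mean):
    """Rebuild a range block with the affine mapping of Eq. (5)."""
    d_mean, _ = block_stats(domain_block)
    return [scale * (d - d_mean) + range_mean for d in domain_block]


# Toy blocks (assumed values): the range is an exact affine copy of the domain.
domain = [1, 2, 3, 4]
rng = [10, 12, 14, 16]
scale, error = match(rng, domain)
rebuilt = decode(domain, scale, block_stats(rng)[0])
```

In a full encoder, `match` would be evaluated against every candidate domain block and the pair with the smallest error would be kept, as step 2 describes.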

3.1.1 Fractal coding for data hiding The proposed model adopts the fractal coding proposed in [2, 3, 11], which was designed for image and audio compression, after amending it for use in data hiding. The amendments to the fractal coding algorithm are: (1) adopting two signals as input to the fractal coding algorithm instead of one signal, (2) considering the cover audio as the domain and the secret audio as the range to generate the cover and secret blocks and (3) obtaining the reconstructed secret signal by applying the retrieved IFS codes to the stego data only once instead of starting from a random signal and repeating the process several times. To the best of our knowledge, fractal coding has not been used in audio data hiding.

3.2 Least significant bit Least significant bit (LSB) is one of the conventional substitution methods used in time domain data hiding. The mechanism is to replace the LSBs of the cover samples with the secret bits directly. Although it has several pros such as simplicity, low complexity, and high hiding capacity, it has weak points such as low robustness against statistical analysis and it is vulnerable to attack [14, 29]. The robustness of the LSB will be enhanced in the proposed model by integrating the chaotic map.
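As a minimal illustration of the replacement mechanism described above, the following Python sketch overwrites the lowest bit of consecutive cover samples and reads it back. The function names are hypothetical, samples are assumed to be signed integers, and the sequential order shown here is the plain LSB baseline, not the chaotic selection used by the proposed model.

```python
def embed_lsb(samples, bits):
    """Replace the LSB of the first len(bits) samples with the secret bits."""
    if len(bits) > len(samples):
        raise ValueError("not enough cover samples for the secret bits")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the secret bit (0 or 1).
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples, count):
    """Read back the LSB of the first count samples."""
    return [s & 1 for s in samples[:count]]


cover = [1000, -3, 7, 42]
secret_bits = [1, 0, 1, 1]
stego = embed_lsb(cover, secret_bits)
recovered = extract_lsb(stego, len(secret_bits))
```

Each sample changes by at most one quantization level, which is why the distortion of 1-LSB embedding in 16-bit audio is small.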

3.3 Chaotic map The chaotic map is used because of its high sensitivity to the initial parameters and to scatter the secret data in a way that cannot be exploited by an attacker for detection [25]. The logistic map is the simplest chaotic map and is adopted in the proposed model to randomly select the cover samples for embedding the secret bits. It can be represented by:

$$x_{n+1} = t\,x_n\,(1 - x_n) \tag{6}$$

where $0 \le t \le 4$ and $x_0 \in (0, 1)$.

Fig. 1 Proposed model. The embedding phase comprises pre-processing, fractal encoding, embedding and generating the stego audio; the extraction phase comprises IFS extraction and fractal decoding and reconstruction

Fig. 2 Sub-model (data embedding phase), showing the pre-processing, fractal encoding, embedding and stego file generation stages

The characteristic of the logistic equation depends on the parameters t and x0 [19, 61]. The logistic map is adopted to chaotically select the samples of the cover audio for embedding rather than selecting them sequentially. The two parameters are considered the secret keys in embedding and extraction.
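One way to turn the logistic map into a sample-selection order is sketched below in Python. It is an assumption-laden illustration, not the model's own algorithm: the function names are hypothetical, and deriving the indexes by sorting the chaotic values in ascending order is only one plausible reading of the embedding sub-model. The logistic map behaves chaotically only for values of t close to 4, so the example uses t = 3.99.

```python
def logistic_sequence(x0, t, length):
    """Iterate x_{n+1} = t * x_n * (1 - x_n) and return the generated values."""
    values = []
    x = x0
    for _ in range(length):
        x = t * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_indexes(x0, t, length):
    """Return a permutation of range(length) ordered by the chaotic values."""
    values = logistic_sequence(x0, t, length)
    # Sorting positions by their chaotic value yields a key-dependent order.
    return sorted(range(length), key=lambda i: values[i])


# The pair (x0, t) plays the role of the secret key shared by both sides.
order_a = chaotic_indexes(0.3, 3.99, 3)
order_b = chaotic_indexes(0.4, 3.99, 3)
```

The receiver regenerates the same permutation from the same pair (x0, t), while a different key gives a different visiting order.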

4 Proposed model This section discusses the details of the proposed model. The main objective of the HASFC model is to improve the hiding capacity and maintain the transparency of the cover audio compared with those of other methods. The HASFC model employs fractal coding for encoding and compressing the secret audio, which accordingly increases the hiding capacity relative to the cover audio size. Adopting fractal coding also offers security, because it encodes the secret samples into a set of IFS codes, and these IFS codes are hidden in the cover samples instead of the original secret samples. Any third party who finds the IFS codes will therefore not understand the secret message without knowing the specific method. The logistic map is combined with LSB as the embedding technique to enhance the security of the LSB method. In this model, the initial parameters of the chaotic map are used as a secret key required on both the sender and the receiver sides. Similar to other steganography techniques, the HASFC model consists of two main phases, namely, embedding and extraction, as shown in Fig. 1. The embedding phase is composed of four processes, which are pre-processing, fractal encoding, embedding and generating the stego audio, whereas the extraction phase consists of two processes, IFS extraction and fractal decoding and reconstruction.

4.1 The data embedding phase The data embedding phase is executed by the sender as shown in Fig. 2. In this phase, we denote the cover audio as C = {c(i), 1 ≤ i ≤ L1}, where L1 represents the number of samples that are used for embedding. Accordingly, the secret audio is represented as S = {s(i), 1 ≤ i ≤ L2}, where L2 signifies the number of secret samples to hide. C and S are then partitioned into blocks with a number of samples known as BL. The cover audio C is partitioned into L1/BL blocks, where blc(i), 1 ≤ i ≤ L1/BL, refers to the ith block of C, whereas the secret audio S is partitioned into L2/BL blocks, where bls(j), 1 ≤ j ≤ L2/BL, is the jth block of S. This phase also involves the computation of the mean and the variance of both the cover C and the secret S. The means and variances of the cover blocks are represented as Mc = {mc(i), 1 ≤ i ≤ L1/BL} and Vc = {vc(i), 1 ≤ i ≤ L1/BL}. Similarly, the means and variances of the secret blocks are represented as Ms = {ms(j), 1 ≤ j ≤ L2/BL} and Vs = {vs(j), 1 ≤ j ≤ L2/BL}. The secret blocks are represented by a set of IFS codes, IFS = {ifs(j), 1 ≤ j ≤ L2/BL}, where ifs(j) is the IFS code for the jth secret block bls(j). The next step in the embedding process is the random selection of the cover samples for hiding the IFS codes. The chaotic vector CH = {ch(i), 1 ≤ i ≤ L} is used for this purpose with L indexes, where L = L1. The three factors that are used in the matching process are the scale factor Scl, the approximate error χ² and the predefined error threshold Th.
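The partitioning and the per-block statistics described above can be sketched in Python as follows. This is an illustration only, not the Java implementation used in the experiments; the names `make_blocks` and `block_statistics` and the `step` parameter are hypothetical. A step equal to the block size gives non-overlapped blocks, and a step of one sample gives overlapped blocks.

```python
from statistics import fmean, pvariance


def make_blocks(samples, block_size, step):
    """Split samples into blocks of block_size, advancing by step samples."""
    last_start = len(samples) - block_size
    return [samples[i:i + block_size] for i in range(0, last_start + 1, step)]


def block_statistics(blocks):
    """Return the per-block means and population variances."""
    means = [fmean(block) for block in blocks]
    variances = [pvariance(block) for block in blocks]
    return means, variances


# Toy secret signal (assumed values) with a block size BL of 4 samples.
secret = [3, 5, 7, 9, 2, 4, 6, 8]
secret_blocks = make_blocks(secret, block_size=4, step=4)   # non-overlapped
ms, vs = block_statistics(secret_blocks)
overlapped = make_blocks(secret, block_size=4, step=1)      # overlapped
```

Computing the means and variances once per block lets the matching stage reuse them for every secret-cover comparison instead of recomputing them.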


4.1.1 Pre-processing Pre-processing under the embedding phase consists of the following two main tasks.

Construct blocks and compute mean and variance This subprocess is responsible for constructing the cover and secret blocks. Moreover, the mean and variance for all cover and secret blocks are also computed in this subprocess. Algorithm 1 illustrates the pre-processing process:

Generate chaotic indexes process Chaotic indexes are generated using Eq. 6, and these indexes are used to select the cover samples instead of the sequential order of traditional LSB. In the proposed model, the two initial parameters of the chaotic map serve as secret keys provided on the sender and receiver sides. The algorithm is as follows:


4.1.2 Fractal encoding process In this process, the secret and cover blocks are considered as the range and domain pools, respectively. The IFS coefficients consist of the index, scale, symmetry, and the mean of the secret blocks. The total number of bits of the IFS codes for all secret blocks is smaller than the number of bits required to hide the actual secret samples. The binary sequences of the IFS codes are used in the embedding subprocess. The details of this process are illustrated in Algorithm 3:

An example of the fractal encoding process Suppose the secret and cover audio have the same size of 44,100 samples and each sample has 16 bits. The secret size is then 705,600 bits (44,100 × 16), whereas the cover audio can carry at most 44,100 bits (given that 1 LSB of each 16-bit cover sample is used for embedding), which is insufficient for the raw secret data. Fractal coding is utilized to encode each secret block with the minimum number of bits. In this case, a block size of 32 samples is selected based on Eq. (7). In the encoding process, instead of embedding each secret block as actual samples, which requires 512 bits per block (32 × 16), each block is encoded using fractal coding with only 31 bits. The IFS code consists of 16 bits for the index + 6 bits for the quantized scale + 1 bit for symmetry + 8 bits for the mean of the secret block. In this case, the compression factor of each block is about 16.5 (512/31). Given that the number of secret blocks is 1378 (44,100/32), the total number of bits required to represent the secret data is reduced from 705,600 bits to 42,765 bits (1378 × 31 + 47 bits of header information), with a compression ratio of around 93.9%.
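The arithmetic of this example can be checked with a few lines of Python. The constant names are illustrative; the figures themselves (sample count, bit widths, block size and the 47 header bits) are those quoted in the example.

```python
SAMPLES = 44_100            # samples in both the secret and the cover audio
BITS_PER_SAMPLE = 16
BLOCK_SIZE = 32             # samples per secret block
IFS_BITS = 16 + 6 + 1 + 8   # index + quantized scale + symmetry + mean
HEADER_BITS = 47

raw_bits = SAMPLES * BITS_PER_SAMPLE              # 705,600 bits of raw secret
capacity_bits = SAMPLES                           # 1 LSB per cover sample
blocks = SAMPLES // BLOCK_SIZE                    # 1378 whole blocks
encoded_bits = blocks * IFS_BITS + HEADER_BITS    # 42,765 bits of IFS codes
per_block_factor = (BLOCK_SIZE * BITS_PER_SAMPLE) / IFS_BITS
saving_percent = (1 - encoded_bits / raw_bits) * 100

print(f"{encoded_bits} encoded bits fit in {capacity_bits} LSBs: "
      f"{encoded_bits <= capacity_bits}")
print(f"per-block factor {per_block_factor:.1f}, saving {saving_percent:.1f}%")
```

The encoded payload of 42,765 bits is below the 44,100 bits offered by one LSB per cover sample, which is what allows a secret audio to be hidden in a cover audio of the same size.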


4.1.3 Embedding process Once the encoding process has finished, the IFS coefficients are converted into a sequence of binary bits and embedded in the cover audio samples using 1 LSB of each 16-bit sample. Algorithm 4 explains the embedding process.

4.1.4 Generate stego audio process Generation of the stego audio is the final step in the data embedding phase. The algorithm is as follows:

4.2 The data extraction phase On the recipient side, the data extraction phase takes place. It is fast and simple and is divided into two processes: extraction, and fractal decoding and reconstruction of the secret audio. The process begins with extracting the LSBs from the stego samples and regenerating the IFS codes, followed by applying fractal decoding to create the reconstructed secret audio file, as in Fig. 3. The stego audio st = {st(i), 1 ≤ i ≤ L} is the input file to this process, with L samples. The reconstructed secret blocks are rbls = {rbls(i), 1 ≤ i ≤ L4}, where rbls(i) is the ith of the L4 reconstructed secret blocks. The output from this process is the reconstructed secret samples rec = {rec(i), 1 ≤ i ≤ L3} with L3 samples. The number of samples of rec should equal the number of samples of S, so that L3 = L2.

Fig. 3 Sub-model (data extraction phase), showing the extraction stage and the fractal decoding and reconstruction stage

4.2.1 Extraction process The LSB bits of the stego audio samples are collected in the same chaotic way as in the embedding process by using the secret key that retrieves the IFS coefficients. The retrieved coefficients are then used to reconstruct the secret blocks that are later used in the decoding process to reconstruct the secret audio. Algorithm 6 illustrates the extraction process.

4.2.2 Fractal decoding and reconstruction process Fractal decoding is performed in this process. It is considered fast and simple because it only applies the affine mapping of Eq. (5) to the stego blocks and the IFS codes to retrieve the array of secret samples. After the array of reconstructed secret samples is obtained, the header for the retrieved secret audio is created. Finally, the reconstructed secret audio is generated as shown in Algorithm 7.

Example of the fractal decoding process The decoding process is straightforward: it is performed using the IFS code of each secret block and the particular stego block identified by the index parameter of the IFS code. Using fractal decoding, an approximation of each secret block can be constructed. For instance, suppose the block size is 8 samples and the secret block to be encoded has the values (133, 134, 135, 136, 138, 140, 142, 143). During the encoding phase, suppose the best matching cover block for this secret block under the cover-secret mapping algorithm is (133, 134, 132, 131, 130, 129, 128, 126) and the IFS code of this secret block is (0, 138, 0, −14). In the decoding phase, applying fractal decoding to the particular stego block with this IFS code yields reconstructed secret samples such as (134, 133, 136, 137, 139, 140, 141, 144).

5 Experimental results and discussions The hiding capacity, the transparency of the stego audio and the security of the HASFC model are presented through a series of experiments in this section. The experiments use the objective and subjective metrics presented in Section 5.1 to evaluate the HASFC model in terms of the transparency of the stego audio, the hiding capacity, statistical steganalysis tests (histogram distribution and the first four moments) and security. The results of the HASFC model are also compared with the results reported in related schemes. The HASFC model is developed using Java Eclipse EE IDE for Web Developers Luna SR2 Package 4.4.2 on an Intel® Core™ i5–4590 CPU @ 3.30 GHz with 4 GB RAM and the Windows 7 Professional 64-bit operating system. To evaluate the performance of the proposed model and to compare it with the related work discussed in Section 2, the same audio specifications have been adopted. For this reason, uncompressed audio files, selected from the GTZAN dataset [57, 58], are used as secret and cover audios. The specifications of the audio files used in the experiments are listed in Table 1.

5.1 Measurement metrics To evaluate the performance of the proposed model, two kinds of test are adopted, objective and subjective. In the objective test, Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Signal-to-Noise Ratio (SNR), Percentage Root Mean Square Difference (PRD) and Hiding Capacity (HC) are used. The Subjective Difference Grade (SDG) is used for the subjective listening test. These metrics are used to quantify the transparency and the hiding capacity as follows:

5.1.1 Transparency The following metrics are used to gauge the performance of the proposed model in terms of transparency:

&

MSE is the average square of the differences between the input and output signals and can be defined as [36]

MSE ¼

1 N ∑ ðs1ðiÞ−s2ðiÞÞ2 N i¼1

ð8Þ

where s1(i) and s2(i) are the ith samples of the input and output signals, and N is the number of signal samples. When this value decreases to zero, the fidelity of the input and output signals becomes similar. Table 1 Audio files specification

Specification Bit per sample Sample rate Channel Audio type Duration in Seconds

16 44,100 Mono Speech Music 1–10

Multimed Tools Appl

&

PSNR measures the maximum signal to noise ratio of a given signal. PSNR is given by [36] ! ð2n −1Þ2 ð9Þ PSNR ¼ 10 log 10 MSE

where n is the maximum number of bits used to represent each signal sample.

&

SNR measures the distortion in the fidelity between two signals, input, and output. SNR is expressed as [26] N

∑ s1ðiÞ2 SNR ¼ 10 log10

i¼1 N

ð10Þ

∑ ðs1ðiÞ−s2ðiÞÞ2 i¼1

where s1(i) and s2(i) are the ith samples of the input and output signals, and N is the number of signal samples. The SNR should is more than 20 dB to be acceptable as declared by the International Federation of the Phonographic Industry (IFPI) [12, 31].

- PRD measures the percentage root mean square difference between two signals. The PRD [42] is calculated using the following equation

\[ \mathrm{PRD} = \sqrt{ \frac{\sum_{i} (X_i - Y_i)^2}{\sum_{i} X_i^2} } \tag{11} \]

where Xi is the first signal and Yi is the second signal. PRD values range from 0 to 1, with 0 being the ideal value.

- SDG is used to evaluate the perceptual quality of the stego signal subjectively, based on the judgment of human listeners. The SDG score ranges from 1 to 5, with higher values indicating better audio quality. SDG is similar to PEAQ, except that the latter is implemented by software that simulates the human auditory system [1, 26].
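As a minimal sketch, the objective transparency metrics of Eqs. (8) to (11) could be computed as follows in Python with NumPy. The function names and the toy signals are illustrative assumptions and are not taken from the HASFC implementation; the stand-in "stego" signal simply flips the least significant bit of every sample.

```python
import numpy as np

def mse(s1, s2):
    """Eq. (8): mean of the squared sample differences."""
    s1 = np.asarray(s1, dtype=np.float64)
    s2 = np.asarray(s2, dtype=np.float64)
    return float(np.mean((s1 - s2) ** 2))

def psnr(s1, s2, n_bits=16):
    """Eq. (9): n_bits is the number of bits per sample (16 in Table 1)."""
    peak_power = (2 ** n_bits - 1) ** 2
    return float(10 * np.log10(peak_power / mse(s1, s2)))

def snr(s1, s2):
    """Eq. (10): energy of the input over energy of the difference, in dB."""
    s1 = np.asarray(s1, dtype=np.float64)
    s2 = np.asarray(s2, dtype=np.float64)
    return float(10 * np.log10(np.sum(s1 ** 2) / np.sum((s1 - s2) ** 2)))

def prd(x, y):
    """Eq. (11): square root of the normalized squared difference."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.sqrt(np.sum((x - y) ** 2) / np.sum(x ** 2)))

# Toy data (assumption): random integer samples standing in for a cover signal.
rng = np.random.default_rng(0)
cover = rng.integers(-2000, 2000, size=1000)
# Flip the least significant bit of every sample, so each sample changes by exactly 1.
stego = cover ^ 1

print("MSE :", mse(cover, stego))
print("PSNR:", psnr(cover, stego))
print("SNR :", snr(cover, stego))
print("PRD :", prd(cover, stego))
```

Because every sample changes by exactly one quantization step, the MSE of this toy pair is exactly 1. Identical signals give a zero denominator in PSNR and SNR, so a real evaluation script would need to handle that case separately.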

5.1.2 Hiding capacity

The following metric is used to explain the hiding capacity of the proposed model.

- HC is the essential factor for evaluating any steganography technique and can be calculated by [37]

\[ \mathrm{HC} = \frac{\text{Secret file size}}{\text{Cover file size}} \times 100 \tag{12} \]
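A short worked example of Eq. (12) is sketched below. It assumes that cover and secret share the same bit depth and channel count (as in Table 1), so that the ratio of sample counts equals the ratio of file sizes; the function name is hypothetical, and the sample counts are the ones used in the experiments of this section.

```python
def hiding_capacity(secret_size, cover_size):
    """Eq. (12): secret size as a percentage of the cover size."""
    if cover_size <= 0:
        raise ValueError("cover_size must be positive")
    return 100 * secret_size / cover_size

# Sample counts used in the experiments; equal bit depth is assumed,
# so sample counts stand in for file sizes.
cover_samples = 220_500
for secret_samples in (44_100, 88_200, 176_400, 220_500):
    hc = hiding_capacity(secret_samples, cover_samples)
    print(f"{secret_samples:>7} secret samples -> HC = {hc:.0f}%")
```

Under this assumption, a secret with as many samples as the cover corresponds to a hiding capacity of 100%.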

5.2 Transparency tests

Transparency refers to the perceptual similarity between the cover and stego audio. Two experiments are conducted to explore the transparency of the HASFC model using different audio file types and block sizes.


5.2.1 Various audio file types

The objective of this experiment is to evaluate the transparency using two tests, objective and subjective, with various speech and music files as cover and secret audios. MSE, SNR, PSNR, PRD and SDG are used to assess the transparency achieved by the HASFC model. The experiment highlights the applicability of HASFC to various types of audio files. Different files are used as cover and secret audio with 220,500 and 44,100 samples, respectively. The block size in this experiment is 7 samples, based on Eq. (7). This combination of secret audio size, cover audio size and block size is chosen because a secret audio of this size cannot be hidden inside the cover audio with a block size of fewer than 7 samples. If the size of the secret audio is increased, then the block size must also be increased.

The results of the objective test are shown in Table 2 and Fig. 4. They show that the HASFC model can hide any type of audio file in another, such as speech in music or music in speech, and produce stego audio with high fidelity regardless of the file type, since the average SNR is above 20 dB [12, 31]. The average SNRs over all types of audio file are approximately 70.5 and 41.7 dB for the stego and reconstructed secret audios, respectively. The PSNR is 99.5 dB on average for the stego audios and 47.4 dB for the reconstructed audios, while the average PRD value is 0.0002. The transparency of the stego file generated by the HASFC model is preserved because fractal coding compresses the secret samples before embedding, which reduces the distortion of the cover file.

A subjective listening test is also used to evaluate the perceptual quality of 8 stego audio signals generated by the HASFC model at a hiding capacity of 100%, based on the SDG value. In this test, the cover and the stego audio signals are presented to 7 experts working in an acoustics research group and in the steganography field. They listen to each audio file several times and are asked to evaluate the similarity between the audio signals using a standard measurement. The scores are shown in Table 3. The average SDG value over the 8 stego audio signals is 4.7, with each signal scoring above the minimum acceptable value of 4. These results show that the stego signals generated by the proposed model and the cover signals have similar subjective quality.

Table 2 The fidelity of stego and reconstructed files using several secret and cover audio types

                        Stego                                    Reconstructed
Cover      Secret       MSE    PSNR (dB)  SNR (dB)  PRD          MSE    PSNR (dB)  SNR (dB)
Dialogue   Female       0.46   99.6       73.6      0.0002       1.28   47         41.1
Dialogue   Jazz         0.46   99.6       73.6      0.0002       1.40   46.6       40.7
Dialogue   Vlobos       0.46   99.6       73.6      0.0002       2.15   44.7       38.9
Female     Dialogue     0.46   99.6       69.1      0.0003       2.42   44.2       38.4
Female     Jazz         0.46   99.6       69.1      0.0003       0.57   50.5       44.6
Female     Vlobos       0.46   99.6       69.1      0.0003       0.55   50.6       44.8
Jazz       Dialogue     0.47   99.5       69        0.0003       2.43   44.2       38.4
Jazz       Female       0.47   99.5       69        0.0003       0.71   49.6       43.7
Jazz       Vlobos       0.47   99.5       69        0.0003       0.72   49.5       43.6

[Figure: SNR (vertical axis, 0 to 80) of the Stego and Reconstruction series for each secret and cover file pair (horizontal axis)]

Fig. 4 Fidelity of stego and reconstructed files using several secret and cover audio types

The results of the objective and subjective tests show the high transparency of the HASFC model, which results from adopting fractal coding and the LSB method in the embedding process. Consequently, the embedding process introduces no significant audible distortion that could raise suspicion about the existence of a secret message in the generated stego signal. These results are directly influenced by the block size used in the encoding process of the fractal coding. The effect of block size on transparency is analyzed in the next subsection.

Table 3 The average SDG values

Audio No.   Audio Name   SDG
1           Dialogue     4.6
2           Female       4.8
3           Jazz         5
4           Voice        4.6
5           Undergrad    4.4
6           Dialogue     4.6
7           Undergrad    5
8           Jazz         4.8
Average                  4.7

5.2.2 Different block sizes

This experiment aims to determine the effect of block size on the fidelity of the stego and reconstructed audios. The secret audio used in this test is vlobos, while the cover audios are jazz, female, and voice. The audio sizes are fixed at 220,500 samples for the cover and 44,100 samples for the secret audio. The results in Fig. 5 reflect the effect of block size on the fidelity of the stego and reconstructed secret audios. The SNR of the stego audio increases with block size, whereas that of the reconstructed secret audio decreases with block size. This is also visible in Fig. 6, which shows the differences between the original cover and the stego audio, and between the secret audio and the reconstructed secret audios, for different block sizes.

[Figure: two plots of SNR (vertical axis) against block size (horizontal axis; 10, 15, 20, 25, 35) for the Vlobos_Jazz and Female_Voice series]

Fig. 5 Effect of block size on the fidelity of (left) the stego, (right) reconstructed file

Selecting the block size is an important step for obtaining an acceptable SNR value, and it is tied to the encoding process. When the block size is increased during encoding, the number of IFS is decreased. As a result, fewer secret bits have to be embedded, which causes less distortion to the cover audio and hence yields higher transparency.
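The relation between block size and the amount of data to embed can be illustrated with a small count. The sketch below assumes that the secret audio is split into non-overlapping blocks and that the fractal encoder emits one IFS code per block; both the partitioning and the function name are assumptions made for illustration, since the encoder details are defined elsewhere in the paper.

```python
def ifs_code_count(secret_samples, block_size):
    """Number of blocks (assumed: one IFS code each), using integer ceiling division."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return -(-secret_samples // block_size)

secret_samples = 44_100  # secret size used in the transparency experiments
counts = {b: ifs_code_count(secret_samples, b) for b in (10, 15, 20, 25, 35)}
for block_size, count in counts.items():
    print(f"block size {block_size:>2}: {count} blocks")
```

Under these assumptions the block count falls from 4,410 at a block size of 10 to 1,260 at a block size of 35, which is consistent with fewer bits being embedded at larger block sizes.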

[Figure: panel (a) shows signals (1) to (6); panel (b) shows signals (1) to (6)]

Fig. 6 Effect of block size using the music files Vlobos and Jazz as secret and cover audio, respectively. a Original cover and stego audio. b Original secret and reconstructed files, using different block sizes of 10, 15, 20, 25 and 35

Figure 6 shows the result of hiding the audio signal Vlobos in Jazz. In Fig. 6a, signal (1) is the original cover audio, while signals (2) to (6) are the stego audios after the hiding process. The cover and stego audios are visibly close for block sizes of 10, 15, 20, 25, and 35. Figure 6b shows the original secret audio (1) and the reconstructed secret audios (2) to (6) for the same block sizes. Compared with the original secret audio, some changes appear in the reconstructed signals (2) to (6) as the block size increases.

5.3 Hiding capacity test

The objective of this experiment is to show the hiding capacity that can be achieved using the HASFC model. The HC metric, Eq. (12), is used to quantify the achieved hiding capacity. This experiment also demonstrates the effect of block size on hiding capacity. Here, the cover audios voice, vlobos, and female with 220,500 samples are used, while the secret audios are jazz, voice, and female with different file sizes. Speech into music, music into speech, and speech into itself are tested to investigate the effect of block size on hiding capacity. As shown in Table 4, different block sizes are used; each block size is selected using Eq. (7) and the hiding capacity is measured using Eq. (12).

The results in Table 4 show that when the hiding capacity is increased, the block size must also be increased, which decreases the fidelity of the stego and reconstructed audios. Based on the results, the HASFC model is shown to hide secret audios at 100% hiding capacity, that is, a secret of the same size as the cover, with an average SNR of 37.4 dB for the reconstructed audio. The proposed model also maintains the fidelity of the stego audio at approximately 70.4 dB. These results further justify the adoption of the fractal coding technique in the HASFC model. Integrating fractal coding has significantly improved the hiding capacity, up to 100% of the cover size, because fractal coding compresses the secret samples with a high compression ratio. The block size is therefore the decisive factor in the encoding process, as shown in Section 5.2.2.

Table 4 Effect of block size on hiding capacity with different secret audio sizes using optimum block size samples

Cover    Secret   Cover samples   Secret samples   Block size (samples)   Hiding capacity (%)   Stego SNR (dB)   Reconstructed SNR (dB)
Voice    Jazz     220,500         44,100           7                      20                    71.1             42.6
Voice    Jazz     220,500         88,200           14                     40                    71.1             41.2
Voice    Jazz     220,500         176,400          27                     80                    71               38
Voice    Jazz     220,500         220,500          34                     100                   71               37.2
Vlobos   Female   220,500         44,100           7                      20                    71               42.2
Vlobos   Female   220,500         88,200           14                     40                    71               39.2
Vlobos   Female   220,500         176,400          27                     80                    70.8             38.5
Vlobos   Female   220,500         220,500          34                     100                   70.8             38.1
Voice    Voice    220,500         44,100           7                      20                    71.1             41
Voice    Voice    220,500         88,200           14                     40                    71.1             39.3
Voice    Voice    220,500         176,400          27                     80                    70.9             37.3
Voice    Voice    220,500         220,500          34                     100                   70.9             37.2
Female   Jazz     220,500         44,100           7                      20                    69.1             44.6
Female   Jazz     220,500         88,200           14                     40                    69.1             41.1
Female   Jazz     220,500         176,400          27                     80                    69               38
Female   Jazz     220,500         220,500          34                     100                   69               37.3


5.4 Steganalysis tests

The objective of steganalysis is to find any trace of the presence of secret audio in the cover audio. The HASFC model is a type of blind steganography, in which the extraction process does not need the original cover audio to reconstruct the secret signal. For blind steganography, where no database is available that can be used to extract the secret data, steganalysis depends on statistical analysis of the signal variation to classify a signal as stego or cover audio. Several steganalysis methods [7, 8, 18, 47] have been proposed for audio signals. This section discusses the resistance of the HASFC model against two statistical steganalysis methods, histogram analysis [7, 47] and first four moments analysis [8, 47], since these two methods are typically used against blind steganography techniques similar to the HASFC model.

5.4.1 Histogram attack

For the histogram attack, we conduct two experiments using different cover and secret audios. The Histogram Error Rate (HER) [43, 47], Eq. (13), is adopted to measure the histogram error between the original cover and the stego audio produced by the proposed model. Figure 7 presents the HER value and the histograms of the cover audio before and after embedding the secret audio, using a hiding capacity of 100% of the cover audio with a block size of 50 samples.

\[ \mathrm{HER} = \frac{\sum_{i=1}^{N} \left( \mathrm{His}(c) - \mathrm{His}(s) \right)^2}{\sum_{i=1}^{N} \mathrm{His}(c)^2} \tag{13} \]

where His(c) and His(s) are the histograms of the cover and stego audio, respectively.
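A possible way to evaluate Eq. (13) is sketched below with NumPy. The number of histogram bins and the shared bin range are assumptions, since the text does not specify how the histograms are built; the function name and the toy signals are likewise illustrative.

```python
import numpy as np

def histogram_error_rate(cover, stego, bins=256):
    """Eq. (13): squared histogram difference normalized by the cover histogram energy.

    The bin count and the common range are assumptions, not values from the paper.
    """
    cover = np.asarray(cover)
    stego = np.asarray(stego)
    # Use one common range so that bin i covers the same amplitudes in both histograms.
    lo = min(cover.min(), stego.min())
    hi = max(cover.max(), stego.max())
    his_c, _ = np.histogram(cover, bins=bins, range=(lo, hi))
    his_s, _ = np.histogram(stego, bins=bins, range=(lo, hi))
    his_c = his_c.astype(np.float64)
    his_s = his_s.astype(np.float64)
    return float(np.sum((his_c - his_s) ** 2) / np.sum(his_c ** 2))

# Toy data (assumption): random samples and a deliberately shifted copy.
rng = np.random.default_rng(1)
cover = rng.integers(-2000, 2000, size=5000)
shifted = cover + 500

print("HER(cover, cover)  =", histogram_error_rate(cover, cover))
print("HER(cover, shifted) =", histogram_error_rate(cover, shifted))
```

Identical signals give an HER of zero, and the value grows as the amplitude distribution of the stego signal drifts away from that of the cover.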

[Figure: histograms of the original cover (left) and the stego audio (right) for (a) HER = 0.0431 and (b) HER = 0.1278]

Fig. 7 Histogram error a voice-rock and b jazz-female

Based on the results in Fig. 7, the HER between each cover and its stego audio is less than 0.2 and no visible variation between the histograms is observed, so the proposed model is undetectable through the histogram attack.

5.4.2 First four moments

The HASFC model is also evaluated using the first four moments [8, 47], which are statistical measurements that expose the differences between the cover and stego signals. The moments characterize the distribution of each signal. These moments are the mean (μ), variance (σ²), skewness (sk), and kurtosis (k), as in Eqs. (14) to (17), respectively. This test calculates the difference ratio DR using Eq. (18) for each of these four moments of the cover and stego signals. DR values below 10% indicate that the stego signal can resist statistical steganalysis [8].

\[ \mu = \frac{\sum_{i=1}^{n} s_i}{n} \tag{14} \]

\[ \sigma^2 = \frac{\sum_{i=1}^{n} (s_i - \mu)^2}{n - 1} \tag{15} \]

\[ sk = \frac{\sum_{i=1}^{n} (s_i - \mu)^3}{(n - 1)\,\sigma^3} \tag{16} \]

\[ k = \frac{\sum_{i=1}^{n} (s_i - \mu)^4}{(n - 1)\,\sigma^4} \tag{17} \]

\[ DR = 100 \times \frac{M_c - M_s}{M_c} \tag{18} \]

where si is the ith sample of the input signal, n is the number of samples, and Mc and Ms are the values of any one of the first four moments for the cover and stego audio, respectively. The results listed in Table 5 show the difference ratios using various cover audios. The DR of each of the four moments is at most 0.081 in all cases, well below the 10% threshold. This implies that it is not easy for steganalysis to identify the stego signal based on statistical analysis.

Table 5 Statistical analysis tests for HASFC: DR using first four moments

Cover         Secret   HER      Mean     Variance   Skewness   Kurtosis   Stego SNR   Reconstructed SNR
Voice         Rock     0.0431   0.0810   0.0002     0.0002     0.00005    72.6        35.1
Jazz          Female   0.1278   0.0789   0.0014     0.0035     0.0057     70.5        36.8
The average                     0.0799   0.0008     0.0018     0.0028     71.55       35.95
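The moment test of Eqs. (14) to (18) can be sketched as follows. The function names and the small sample data are illustrative assumptions. Eq. (18) is implemented as written, so the result is signed; whether the absolute value is taken before reporting is not stated in the text.

```python
import numpy as np

def first_four_moments(s):
    """Mean, variance, skewness and kurtosis as defined in Eqs. (14) to (17)."""
    s = np.asarray(s, dtype=np.float64)
    n = s.size
    mu = s.sum() / n                                             # Eq. (14)
    var = np.sum((s - mu) ** 2) / (n - 1)                        # Eq. (15)
    sigma = np.sqrt(var)
    skew = np.sum((s - mu) ** 3) / ((n - 1) * sigma ** 3)        # Eq. (16)
    kurt = np.sum((s - mu) ** 4) / ((n - 1) * sigma ** 4)        # Eq. (17)
    return {"mean": float(mu), "variance": float(var),
            "skewness": float(skew), "kurtosis": float(kurt)}

def difference_ratio(m_cover, m_stego):
    """Eq. (18): signed percentage difference of one moment (m_cover must be non-zero)."""
    return 100 * (m_cover - m_stego) / m_cover

# Toy data (assumption): a short cover and a copy with two samples altered by 1.
cover = [2, 4, 4, 4, 5, 5, 7, 9]
stego = [2, 4, 4, 5, 5, 5, 7, 8]

mc = first_four_moments(cover)
ms = first_four_moments(stego)
for name in mc:
    print(f"DR({name}) = {difference_ratio(mc[name], ms[name]):.4f}%")
```

A cover signal whose mean is zero would make the DR of the mean undefined, so a practical test would have to treat that case separately.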


To compare the proposed model with related methods in terms of hiding capacity and statistical steganalysis, Table 6 presents a comparison of the proposed model with two methods, HT_EWM [8] and LAS_LWD [47]. These two methods are selected because their schemes are evaluated by statistical steganalysis in the time domain. HASFC produces acceptable DR values even at 100% hiding capacity, with only a slight increase in the DR of the mean. The comparison indicates that HASFC outperforms the two methods in hiding capacity while keeping acceptable results under statistical analysis.

5.5 Security test

Robustness against attacks is an important issue in data hiding. The security of an information system should depend on the secret key rather than on the secrecy of the scheme [31]. To enhance the security of HASFC, two secret keys are therefore used with the logistic map function to generate two chaotic sequences, one for selecting the secret bits to be hidden and one for selecting the cover samples for embedding. Each key consists of two values, r and x0, which are the initial parameters of the chaotic map. These values lie within the ranges (0, 4) and (0, 1), respectively, and each is represented as a 64-bit double. Hence the number of possible values for the two keys is 2^256 (2^64 × 2^64 × 2^64 × 2^64), which is approximately 1.1579209 × 10^77.

In a brute-force attack, the attacker systematically checks all possible values until the correct one is found. Assume the attacker has the fastest supercomputer in the world, Sunway TaihuLight, with about 10.65 million processing cores and the ability to perform 93,014 trillion operations per second (https://www.top500.org/lists/2017/06/). Also assume that testing each value takes one operation; in practice each test requires many operations, so this assumption favors the attacker. The time the attacker needs for a brute-force attack is then:

Number of possible values = 1.1579209 × 10^77
Computer speed = 93,014 trillion operations/second
Time = number of possible values / operations per second
Time = 1.2448888 × 10^60 seconds, which is approximately 3.9475 × 10^52 years.
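The key-dependent sequence and the brute-force estimate above can be reproduced with a short sketch. The logistic map recurrence x(k+1) = r·x(k)·(1 − x(k)) is standard; turning the chaotic values into a sample order by ranking them is only one plausible mapping and is an assumption here, as are the function names and the example key values.

```python
def logistic_sequence(r, x0, n):
    """Iterate the logistic map x <- r * x * (1 - x) and return n values."""
    xs = []
    x = x0
    for _ in range(n):
        x = r * x * (1 - x)
        xs.append(x)
    return xs

def chaotic_order(r, x0, n):
    """Hypothetical mapping: rank the chaotic values to obtain a key-dependent permutation."""
    xs = logistic_sequence(r, x0, n)
    return sorted(range(n), key=xs.__getitem__)

# Example key (assumption): r close to 4 keeps the map in its chaotic regime.
order = chaotic_order(r=3.99, x0=0.3, n=10)
print("embedding order:", order)

# Brute-force estimate: four 64-bit values, one operation per tested candidate.
keyspace = 2 ** (4 * 64)
ops_per_second = 93_014 * 10 ** 12          # 93,014 trillion operations per second
seconds = keyspace / ops_per_second
years = seconds / (365 * 24 * 3600)
print(f"keyspace = {keyspace:.7e}")
print(f"time     = {seconds:.7e} s = {years:.4e} years")
```

The estimate treats every 64-bit pattern as a valid key value. Since r and x0 are restricted to (0, 4) and (0, 1), the number of usable keys is smaller than 2^256, so the figure should be read as an upper bound on the search space under the stated assumptions.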

Table 6 Statistical analysis DR comparison for HASFC and some related methods

Methods               HASFC    LAS_LWD [47]   HT_EWM [8]
Hiding capacity %     100      25             33
First four moments
  Mean                0.0799   0.2304
  Variance            0.0008   0
  Skewness            0.0018   0.0004
  Kurtosis            0.0028   0.0005
