FPGA and SoC Based VLSI Architecture of Reversible ... - IEEE Xplore

2014 Annual IEEE India Conference (INDICON)

FPGA and SoC Based VLSI Architecture of Reversible Watermarking Using Rhombus Interpolation By Difference Expansion

Sudip Ghosh, Nachiketa Das, Subhajit Das
School of VLSI Technology, IIEST, Shibpur, Howrah, West Bengal, India. [email protected], [email protected]

Santi P Maity and Hafizur Rahaman
Department of Information Technology, IIEST, Shibpur, Howrah, West Bengal, India. [email protected],[email protected]

Abstract: This paper presents a VLSI architecture for rhombus interpolation based reversible watermarking by difference expansion. The proposed architecture has been implemented and tested on Xilinx Virtex-7 FPGA, Zynq SoC (System on Chip) and UltraScale FPGA platforms. The system uses a modified rhombus interpolation scheme to embed and extract copyright protection data for medical and military imaging applications. In reversible watermarking, the embedded watermark can be completely extracted and the original image recovered in a lossless manner. The proposed encoder and decoder architectures are implemented using VIVADO 2014.2. The quality factors of the original and watermarked images are obtained using MATLAB R2013a and compared with the results generated by the hardware implementations on the FPGA, SoC and UltraScale platforms. The results show the viability of low cost, high speed and real-time use of the proposed VLSI architecture.

Keywords: VLSI Architecture, Reversible Watermarking, FPGA, SoC, Ultra-scale, MATLAB R2013a, Xilinx VIVADO 2014.2.

I. INTRODUCTION

Reversible watermarking (RW) [1] is a data embedding process that embeds invisible data into a digital image in a reversible way. Using the reversibility property, the embedded data can be removed to restore the original image. The watermark itself is also obtained from the extraction process. The embedding and extraction processes are also called the encoding and decoding processes respectively. RW emerged as a technique and a tool to overcome shortcomings of current copyright laws for digital data [11]. The specialty of a watermark is that it remains attached to the cover work even if the work is copied, so ownership or copyright of the data can be proved by extracting and testing the watermark. Over the last decade, a lot of research has been carried out on RW [10]; however, VLSI implementation of the rhombus interpolation based approach is still an area to be explored. To the best of our knowledge, the current literature on hardware implementations of digital watermarking is not rich, although some hardware implementations of non-reversible watermarking have been proposed by several authors.

978-1-4799-5364-6/14/$31.00 ©2014 IEEE

In this paper, the advantages of the rhombus interpolation based watermarking technique are explored and implemented. The main focus is on developing a low cost, high speed VLSI architecture that can be used for real-time applications by exploiting the advantages of the interpolation technique. RW algorithms are generally classified into the following categories: (a) RW using contrast mapping [6], (b) RW using difference expansion [2], (c) RW using histogram operation [7][9], (d) RW using integer transforms [3], (e) RW using prediction error [4],[5], and (f) RW using integer wavelet transform [8].

Thodi and Rodriguez [5] proposed prediction-error expansion (PEE), a new method for expansion embedding RW. PEE combines the advantages of expansion embedding with the superior decorrelating abilities of a predictor, resulting in a higher data-embedding capacity than difference expansion (DE). Since then, a number of PEE methods have been developed using different prediction algorithms.

Early reversible watermarking algorithms are mainly based on lossless compression, in which certain features of the host image are losslessly compressed to save space for embedding the payload. These methods usually provide a low capacity and may lead to severe degradation in image quality. Later, more efficient algorithms were devised that emphasize increasing the capacity while keeping the distortion low. Several valuable techniques have been proposed, e.g. histogram shifting, difference expansion (DE), prediction-error expansion (PEE) and integer transforms. These methods have very low complexity, but they give a good estimate only when the central pixel lies in a uniform region. If the pixel is not in a uniform region, the estimation error can be rather large.
To reduce the estimation error, Dragoi and Coltuc [4] proposed an approach known as adaptive interpolation, in which the same rhombus neighborhood [12] is considered but is additionally split into two groups: horizontal and vertical neighbors. The central pixel is considered as belonging to the most homogeneous group and is estimated as the average of that group. The uniformity of the rhombus is assessed by computing the distance between the averages of the horizontal and vertical groups and checking this distance against a threshold. After calculating the prediction error, the authors in [4] embed a single data bit into the central pixel by expanding its prediction error, where the pixel values are limited to the range $[0, L-1]$. That is, for $k$-bit images, $L = 2^k$; for 9 bit gray level images, L = 512. To reduce the distortion introduced by the watermarking, the authors use a threshold control scheme in which data is embedded into a pixel only if its estimation error is smaller than a threshold T. The watermark size determines the value of T. To find T, the authors start with T = 1 and simulate the embedding process; the threshold value is increased until the watermark can be fully embedded. The pixels with prediction error smaller than T are collected into the embedded group, and the pixels with prediction error not smaller than T form the not-embedded group.

The organization of this paper is as follows: Section II gives the preliminaries, Section III details the digital design methodology of the proposed VLSI architecture for Improved Rhombus Interpolation (IRI) based RW by DE, Section IV presents the analysis and experimental results, and Section V concludes the paper.

II. PRELIMINARY

The steps for writing the MATLAB code of the data embedding and data extraction processes of the IRI based RW by DE explained by Dragoi and Coltuc [4] are as follows.

A. (Data Embedding)

Step 1: Read the image files and get the data matrices of the host image I(i, j) and the watermark image W(i, j); write the size of the watermark into the head of the host image. Here I(i, j), i = 1 : M, j = 1 : N, where M and N are the numbers of rows and columns of the host image respectively (for 9 bit gray images M = 512 and N = 512), and W(i, j), i = 1 : m, j = 1 : n, where m and n are the numbers of rows and columns of the watermark respectively (where m = M-1 and n = N-1).
Step 2: Divide the image pixels into the Cross set region and the Dot set region; this can be done by prediction-error calculation.
Step 3: Calculate the prediction error of each pixel.
Step 4: Select the pixels for data embedding by comparing the prediction error with the threshold.
Step 5: If the payload capacity is at least the size of the watermark, the watermark image can be embedded into the host image in the selected pixels. The selected pixels are therefore expanded.
Step 6: Embed the watermark data into the selected pixels and then export the complete image data matrix together with the embedding threshold.
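The embedding steps above can be sketched for a single pixel as follows. This is a minimal Python sketch under stated assumptions, not the authors' MATLAB code: the function names, the default uniformity threshold, the use of the within-group absolute difference as the homogeneity measure, the four-neighbor average for a uniform rhombus, the shift rule for non-embedded pixels and the capacity count used in place of a full embedding simulation are all hypothetical choices made for illustration. Overflow and underflow handling is omitted.

```python
def rhombus_predict(north, east, west, south, uniformity_threshold=4):
    """Adaptive rhombus prediction of the central pixel (integer arithmetic)."""
    horiz_avg = (west + east) // 2
    vert_avg = (north + south) // 2
    # Uniform rhombus: the two group averages are close, so use all four.
    if abs(horiz_avg - vert_avg) < uniformity_threshold:
        return (north + east + west + south) // 4
    # Otherwise use the more homogeneous group (assumed: smaller spread).
    if abs(west - east) <= abs(north - south):
        return horiz_avg
    return vert_avg


def embed_pixel(pixel, prediction, bit, T):
    """Return (watermarked_pixel, bit_was_embedded) for threshold T."""
    error = pixel - prediction
    if abs(error) < T:
        # Expand the prediction error: e' = 2e + b.
        return prediction + 2 * error + bit, True
    # Assumed shift rule keeping shifted and expanded ranges disjoint.
    if error >= T:
        return pixel + T, False
    return pixel - (T - 1), False


def find_threshold(errors, payload_bits, max_T=256):
    """Smallest T whose embedded group can hold the payload, else None."""
    for T in range(1, max_T + 1):
        capacity = sum(1 for e in errors if abs(e) < T)
        if capacity >= payload_bits:
            return T
    return None
```

For example, with a prediction of 100 and T = 4, a pixel of value 102 carrying bit 1 becomes 105, while a pixel of value 106 carries no data and is only shifted to 110.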

B. (Data Extraction)

The extraction algorithm is the reverse of the embedding process. The steps for writing the MATLAB code of the data extraction process of RW are as follows.

Step 1: Extract the size of the watermark from the head of the host image data.
Step 2: Divide the image pixels into the flat region and the rough region; this can be done by prediction-error calculation.
Step 3: Calculate the prediction error of each pixel.
Step 4: Select the pixels for data extraction by comparing the prediction error with the thresholds.
Step 5: Extract the expanded bytes of watermark data, according to the embedding threshold, until all bytes are extracted.
Step 6: Recover the watermark image and the original image.

After successful simulation of the embedding and decoding processes, we wrote a further function to calculate the Quality of Services of each process, which is briefly discussed in Section IV.

III. DIGITAL DESIGN METHODOLOGY OF PROPOSED VLSI ARCHITECTURE FOR IRI BASED RW BY DE

The structures of the insertion and extraction modules are very similar, as they have almost the same components. The data path of the VLSI architecture for the encoder watermarking chip of IRI based RW by DE is shown in Fig 1. It consists of four distinct modules: module 1, a checker for setting the boundary condition of the Cross set region; module 2, for setting the boundary condition of the Dot set region; module 3, for the embedding process of the Cross set region; and module 4, for the embedding process of the Dot set region. A random access memory (RAM) stores the cover image that is to be watermarked. The cover image data can be written to the image RAM by activating the proper control signals. The watermark RAM serves as storage for the watermark data, which can be supplied as an external input by the user. In this hardware design, we assume that at any point a (512 × 512) cover image can be stored in the cover image RAM and a (512 × 512) watermark can be stored in the watermark RAM. In this architecture it is possible to watermark only a (510 × 510) region of the original image at a time. The region of the original image to be watermarked is described in terms of five parameters (North, East, Center, West and South), and address decoders are used to determine the proper locations. The IRI RW insertion algorithm involves adding (or subtracting) a constant times the image pixel grey value to (from) a constant times the neighborhood function. The four output lines from the cover image RAM provide the pixels I(i, j-1), I(i-1, j), I(i+1, j) and I(i, j+1) that neighbor I(i, j) for the row-column address pair (i, j), where i and j both vary from 2 to 511 under the proper control signals.
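The neighbor addressing described above can be illustrated in software. The following is a minimal Python sketch, assuming the image is held as a list of rows and that addresses are 1-indexed as in the text; the function names are hypothetical and do not correspond to blocks of the actual chip. It enumerates only those addresses whose North, East, West and South neighbors all exist, which for a 512 × 512 image gives the 510 × 510 interior region.

```python
def rhombus_neighbors(image, i, j):
    """Return (north, east, west, south) of the 1-indexed address (i, j)."""
    r, c = i - 1, j - 1  # convert to 0-indexed list positions
    north = image[r - 1][c]
    east = image[r][c + 1]
    west = image[r][c - 1]
    south = image[r + 1][c]
    return north, east, west, south


def interior_addresses(rows, cols):
    """Yield every 1-indexed (i, j) that has all four rhombus neighbors."""
    for i in range(2, rows):      # i = 2 .. rows - 1
        for j in range(2, cols):  # j = 2 .. cols - 1
            yield i, j
```

A hardware address decoder would produce the four neighbor addresses in parallel; the nested loop here only makes the address range explicit.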

Fig 1: Data path for encoding reversible watermarking chip

Fig 2: Data path for module 1

Fig 3: Data path for module 2

Fig 4: Data path for module 3 of encoder chip

The four outputs go to module 1, which checks the boundary condition to set the Cross set region. The data path for module 1, shown in fig 2, consists of an even parity bit checker. When the parity bit is generated, it sets the 1 bit flag register high, which in turn activates the controller to perform the cross embedding process with the current values of i and j. Next, module 2 checks the boundary condition to set the Dot set region, as shown in fig 3. Module 2 operates exactly like module 1 except that it has an odd parity bit checker instead of the even parity checker. Modules 3 and 4 implement the rhombus interpolation for the embedding process of the cross and dot sets respectively, as discussed in section II. The temporary registers serve as temporary storage for the first rhombus interpolation, and a control signal is added to select between the data embedding and data extraction operations. In this architecture the controller is modeled as a synchronizer between the modules. The key or watermark image is read from storage into the input register. Since the embedding processes of the Cross and Dot sets are exactly the same, only the data path for module 3 is shown in fig 4.
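The role of the even and odd parity checkers can be illustrated with a short Python sketch. It assumes that the parity is taken over the row and column addresses, so that even parity of i + j marks a Cross pixel and odd parity marks a Dot pixel, as in the checkerboard partition of [12]; the text does not state exactly which signals feed the parity checkers, so this mapping and the function names are assumptions.

```python
def pixel_set(i, j):
    """Classify address (i, j) as 'cross' (even parity) or 'dot' (odd parity)."""
    parity = (i ^ j) & 1  # XOR of the address LSBs equals the parity of i + j
    return "cross" if parity == 0 else "dot"


def split_sets(rows, cols):
    """Partition the interior 1-indexed addresses into cross and dot lists."""
    cross, dot = [], []
    for i in range(2, rows):
        for j in range(2, cols):
            target = cross if pixel_set(i, j) == "cross" else dot
            target.append((i, j))
    return cross, dot
```

With this partition, the four rhombus neighbors of any Cross pixel are Dot pixels and vice versa, so one set can be predicted from the other while it remains unmodified.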

This architecture can watermark with the adaptive interpolation block using minimum FPGA resources, as it uses the same block for data embedding and data extraction. The architecture of the decoder RW chip is exactly the same as that of the encoder chip, except that it takes the watermarked image and the watermark image data as inputs. After the decoding process we obtain the original cover image as the output. In the decoder chip, the data paths for module 1 and module 2 are the same as the first two modules of the encoder chip. The last two modules of the decoder chip are almost the same as the last two modules of the encoder chip, but they require more subtractors and fewer adders than the encoder chip. For this reason, only the whole decoder RW chip and the data path for module 3 of the decoder chip are shown in fig 5 and fig 6 respectively.
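The subtractor-heavy arithmetic of the decoder can be seen in a per-pixel sketch of the extraction. This is a minimal Python illustration, not the authors' design: it assumes that the encoder expands the prediction error e of an embedded pixel to 2e + b when |e| < T, and shifts non-embedded pixels by +T (for e >= T) or by -(T - 1) (for e <= -T). The function name and this shift rule are assumptions, and the prediction is taken to be the same value the encoder computed from the unmodified neighbors.

```python
def extract_pixel(marked, prediction, T):
    """Return (original_pixel, bit); bit is None for shifted pixels."""
    expanded = marked - prediction
    if -2 * T + 2 <= expanded <= 2 * T - 1:
        bit = expanded % 2     # Python's % returns 0 or 1 here
        error = expanded // 2  # floor division undoes e' = 2e + b
        return prediction + error, bit
    if expanded >= 2 * T:
        return marked - T, None     # undo the positive shift
    return marked + (T - 1), None   # undo the negative shift
```

For example, with a prediction of 100 and T = 4, the watermarked value 105 decodes to the original pixel 102 and the bit 1, while 110 decodes to the original pixel 106 with no embedded bit.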

Fig 5: Data path for decoding reversible watermarking chip

Fig 6: Data path for module 3 of decoder chip

IV. IMPLEMENTATION RESULTS

A. Software implementation

The encoding and decoding processes are first verified using MATLAB R2013a. We take ten (512x512) gray images as the test images. The ten test images and their corresponding watermarked images are shown in fig 7 and fig 8 respectively.

Table 1 shows the quality factors of each test image with respect to its corresponding watermarked image. Here the size of all the test images is taken as 9 bit, or (512x512) in pixels. There are many ways to calculate the quality factors, which are sometimes called Quality of Services [13]. Following that paper, we calculate the Average Difference (AD), Mean Absolute Error (MAE), Maximum Difference (MD), Mean Squared Error (MSE), Normalized Cross-Correlation (NK), Peak Mean Square Error (PMSE), Peak Signal to Noise Ratio (PSNR), Structural Content (SC) and Bit Rate (Br).

Table 1: Comparison results in terms of quality factors for the different tested images

SL. NO.   AD, MAE, MD   MSE      NK       PMSE         PSNR      SC       Br
Image1    0,0,0         0.0965   1.0357   3.545e-06    58.285    1.0726   0.2496
Image2    0,0,0         0.0969   1.0444   3.789e-06    58.2679   1.0909   0.0610
Image3    0,0,0         0.0969   1.0425   3.931e-06    57.6320   1.0870   0.2976
Image4    0,0,0         0.0902   1.0375   2.606e-06    58.5805   1.0766   0.2293
Image5    0,0,0         0.0969   1.0446   8.8975e-05   58.2679   1.0917   0.1936
Image6    0,0,0         0.0956   1.0823   8.9976e-05   58.9589   1.0714   0.3407
Image7    0,0,0         0.0969   1.0508   2.2933e-05   58.2679   1.1046   0.3074
Image8    0,0,0         0.0830   1.0328   1.6112e-06   58.9388   1.0669   0.2748
Image9    0,0,0         0.0989   1.0476   2.9975e-05   58.2699   1.0987   0.1946
Image10   0,0,0         0.0969   1.0422   2.0351e-05   58.1651   1.0864   0.1471
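The quality factors listed in Table 1 can be computed along the following lines. This is a minimal Python sketch using the commonly used definitions of these measures, which may differ in detail from the MATLAB function used for the paper. The function names, the list-of-rows image format, the peak value of 255 in the PSNR and the normalization of PMSE by the squared maximum of the original image are assumptions. Bit Rate (Br) is not included.

```python
import math


def psnr_from_mse(mse, peak=255):
    """PSNR in dB for a given MSE and peak pixel value."""
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def quality_factors(original, watermarked, peak=255):
    """Return a dict of full-reference quality factors for two images."""
    x = [float(v) for row in original for v in row]
    y = [float(v) for row in watermarked for v in row]
    n = len(x)
    diffs = [a - b for a, b in zip(x, y)]
    mse = sum(d * d for d in diffs) / n
    energy_x = sum(a * a for a in x)
    energy_y = sum(b * b for b in y)
    return {
        "AD": sum(diffs) / n,                   # Average Difference
        "MAE": sum(abs(d) for d in diffs) / n,  # Mean Absolute Error
        "MD": max(abs(d) for d in diffs),       # Maximum Difference
        "MSE": mse,                             # Mean Squared Error
        "NK": sum(a * b for a, b in zip(x, y)) / energy_x,
        "PMSE": mse / max(x) ** 2,              # Peak Mean Square Error
        "PSNR": psnr_from_mse(mse, peak),
        "SC": energy_x / energy_y,              # Structural Content
    }
```

Under these assumptions, `psnr_from_mse(0.0965)` gives about 58.285 dB, which agrees with the MSE and PSNR entries reported for Image1.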

Fig 7: The ten Tested Images

Fig 8: The corresponding Watermarked Images of the ten Tested images shown in the above figure

Fig 9: PSNR vs. Size of the cover image (sizes 64x64, 128x128, 256x256 and 512x512)

Fig 10: Bit rate (Br) vs. Size of the cover image

Fig 11: MSE vs. Size of the cover image

Fig 12: NK vs. Size of the cover image

Fig 13: PMSE vs. Size of the cover image

Fig 9 shows the PSNR for the various sizes of the first test image. We conclude that the PSNR increases with the size of the cover image. It implies that by increasing the size of the cover image, the total area of the embedded watermark on the cover image is also increased. Fig 10 shows the bit rate against the size of test image 1. The bit rate increases slightly with the size of the cover image, i.e. more bits can be embedded when a larger cover image is used. From the remaining figures, fig 11 to fig 13, we notice that the errors can be minimized by increasing the size of the cover image.

B. Hardware implementation

The prototype is implemented in VHDL and synthesized using Xilinx VIVADO for Virtex7, Zynq-7000 and Virtex UltraScale technology with the xc72v2000tflg1925-2L, xc7z100ffg1156-2 and xcvu095-ffvd1924-3-e-es1 target devices respectively. The quality factors [13],[14] for each device of the hardware implementation, together with the values of the quality factors measured in software (MATLAB R2013a), are shown in fig. 14 to fig. 18.

Fig 14: PSNR vs. Size of the cover image (using software and using hardware)

Fig 15: Bit rate (Br) vs. Size of the cover image (using software and using hardware)

Fig 16: MSE vs. Size of the cover image (using software and using hardware)

Fig 17: NK vs. Size of the cover image (using software and using hardware)

Fig 18: PMSE vs. Size of the cover image (using software and using hardware)

In the hardware implementation, the quality factors are the same for all three devices. From fig. 14 to fig. 18, we notice that the values of the quality factors measured in hardware are almost the same as the values measured in software, except for PSNR and NK, which are slightly greater than the software-measured values. Table 2 shows the hardware utilization along with a power and timing comparison between Virtex7, Zynq-7000 and Virtex UltraScale for the data embedding process.

Table 2: Comparison between Virtex7, Zynq-7000 and Virtex UltraScale for device utilization along with power and timing information for the embedding process

Resource    Virtex-7     Zynq-7000     Virtex UltraScale
LUT         0.53%        0.26%         0.13%
I/O         87.33%       73.24%        61.12%
Power       7.2 W        3.7 W         1.4 W
Frequency   123.4 MHz    168.23 MHz    204.6 MHz

Table 3 shows the hardware utilization along with a power and timing comparison between Virtex7, Zynq-7000 and Virtex UltraScale for the data extraction process.

Table 3: Comparison between Virtex7, Zynq-7000 and Virtex UltraScale for device utilization along with power and timing information for the decoding process

Resource    Virtex-7     Zynq-7000     Virtex UltraScale
LUT         4.58%        3.26%         1.14%
I/O         97.38%       82.82%        71.12%
Power       9.2 W        6.7 W         3.4 W
Frequency   138.1 MHz    178.33 MHz    210.436 MHz

Hence we can say that the proposed VLSI architecture gives a very suitable realization of the reversible watermarking algorithm in hardware.

V. CONCLUSION

In this paper, we presented a VLSI architecture of reversible watermarking encoder and decoder chips that perform reversible watermarking using the improved rhombus interpolation scheme. To the best of our knowledge, this is the first FPGA, SoC and UltraScale based watermarking VLSI architecture for improved rhombus interpolation by the difference expansion method. The chip can be easily integrated in any existing JPEG encoder to watermark images with different FPGA, SoC and UltraScale FPGA families. The implementation on Xilinx FPGA gives very compatible and suitable results for the reversible watermarking system on chip. As future work, low-power VLSI features such as multiple supply voltages, dynamic clocking and clock gating should be used to reduce the power consumption, which is currently high. The current results show the viability of low cost, high speed and real-time use of the proposed VLSI architecture.

REFERENCES

[1] J. Fridrich, M. Goljan, and R. Du, "Lossless data embedding - new paradigm in digital watermarking," EURASIP J. Appl. Signal Processing, vol. 2002, no. 2, pp. 185-196, Feb. 2002.
[2] J. Tian, "Reversible Data Embedding Using a Difference Expansion", IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 8, pp. 890-896, 2003.
[3] Y. Hu, H.-K. Lee, J. Li, "DE-based reversible data hiding with improved overflow location map", IEEE Trans. on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 250-260, 2009.
[4] Catalin Dragoi, Dinu Coltuc, "Improved rhombus interpolation for reversible watermarking by difference expansion", IEEE EUSIPCO, 2012, pp. 1688-1692.
[5] D. M. Thodi, J. J. Rodriguez, "Prediction-error based reversible watermarking", ICIP'04, pp. 1549-1552, 2004.
[6] D. Coltuc, J. M. Chassery, "Very Fast Watermarking by Reversible Contrast Mapping", IEEE Signal Processing Letters, vol. 14, pp. 255-258, 2007.
[7] Rajendra D. Kanphade and N. S. Narawade, "Forward Modified Histogram Shifting based Reversible Watermarking with Reduced Pixel Shifting and High Embedding Capacity", IJECE, ISSN 0974-2166, vol. 5, no. 2, pp. 185-191, 2012.
[8] G. R. Xuan, C. Y. Yang, Y. Z. Zhen, and Y. Q. Shi, "Reversible data hiding using integer wavelet transform and companding technique", Proc. IWDW, 2004.
[9] Wien Hong, Tung Shou Chen, Kai Yung Lin and Wen Chin Chiang, "A modified histogram shifting based reversible data hiding scheme for high quality images", Asian Network for Scientific Information, Information Technology Journal, 9(1), pp. 179-183, 2010.
[10] A. Van Leest, M. Van der Veen, and F. Bruekers, "Reversible image watermarking", Proceedings of the IEEE International Conference on Image Processing, II, pp. 731-734, 2003.
[11] Juergen Seitz, Digital Watermarking for digital media, University of Cooperative Education Heidenheim, Germany, 2005.
[12] V. Sachnev, H. J. Kim, J. Nam, S. Suresh, Y. Q. Shi, "Reversible Watermarking Algorithm Using Sorting and Prediction", IEEE Trans. on Circuits and Systems for Video Technology, vol. 19, no. 7, pp. 989-999, 2009.
[13] C. Sasi varnan, A. Jagan, Jaspreet Kaur, Divya Jyoti, D. S. Rao, "Image Quality Assessment Techniques pn Spatial Domain", IJCST, vol. 2, issue 3, September 2011.
[14] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, April 2004.
