SWAD: Secure Watermarking for Authentication of Scanned Documents

Fernando Martín-Rodríguez

Juan Manuel Fernández-Montenegro

Signal Theory and Communications Department. ETSET. University of Vigo (www.uvigo.es) Vigo (Pontevedra), Spain.

Signal Theory and Communications Department. ETSET. University of Vigo (www.uvigo.es) Vigo (Pontevedra), Spain.

Abstract—This paper describes an application intended to help citizens in their dealings with their local government. In many bureaucratic processes, citizens are required to provide copies of paper documents such as diplomas or birth certificates, and documents of this kind obviously cannot be provided online. Using this application, users can submit the required documents over the internet, and the administration can check their authenticity with a level of security similar to that offered by an "in person" delivery.

Keywords-Information security, Document Authentication, Encryption, E-government

I. INTRODUCTION

As we explained in the abstract, this system allows users to send secure scanned documents to a (public or private) organization that can check their authenticity. This is achieved using an application that scans physical documents and produces digitally signed images. The user cannot skip the digital signature and, once the digital image has been produced, cannot modify it without the change being detected. In fact, a compliant user need not even be aware that their images are being signed, which makes the system reasonably transparent.

The system consists of three Java [1] applications (we use Java because we want to offer the service to users running any computer platform). The three applications are as follows:

- The client: it scans and authenticates documents. This is the main and most visible part. It will be available to everyone (it is intended to be published on the Web). The user has to download and install this application (this requires a Java virtual machine, because the application and its installer are distributed as a .jar file). The user is not obliged to register anywhere, but must hold a valid digital certificate (issued by an appropriate certification authority).
- The checker: this application is only available to the organization receiving the documents. It allows that organization to check document authenticity.
- The server: this application is needed to generate some keys that are used for correct authentication. As all activity is logged, every authenticated document is registered (in fact, this also registers the users). This server would be run by the organization receiving the documents or by an external contractor.

In the rest of the paper, we describe the technical foundations of our application and state the conclusions we reached after development.

II. TECHNICAL FOUNDATIONS OF THE SYSTEM

The initial idea of this project was to generate a secure hash from the image and hide it in the same image using watermarking techniques. This idea was reinforced by adding the user's digital certificate (in order to also identify the users who authenticate documents) and combining it properly with the digital hash. We combine several encryption and digital signature techniques to make the system robust. We chose Java [1] as the programming language because of its multi-platform nature. This is important because we do not know which platform the user will be running, and a public service should be compatible with all (or at least most) of them. For scanner control we chose the TWAIN standard [2], because it currently gives access to practically any scanner on the market.

A. Authentication Process

The authentication process (user side) consists of the following steps:

1. The user (any person who wants to submit a document) downloads the client application, which is multi-platform (a Java executable .jar file) and very easy to install. It is not necessary to register in any system. It is important that this application be distributed from an institutional website so that users can trust it.

2. The application scans the document (it uses the TWAIN standard to access the scanner [2]) and does not allow direct access to the image until it has been authenticated.

3. A hash signature is computed from the image. We use the SHA-1 algorithm [3], which produces a 160-bit hash (the SHA signature) for any input. The probability of two different files producing the same signature by chance is negligible. The least significant bit is excluded from the computation because it will be used later for the watermark, so any modification of the image outside the least significant bit will produce a different signature when checked.

4. We ask the user to provide a valid digital certificate in .p12 format [4]. We use the private key of that certificate to encrypt the previous hash signature (RSA algorithm [3], [5]¹). This step authenticates both the user and the image.

5. The application then communicates with our secure server and asks for a new cipher key. The server registers this operation (we know who authenticates documents, when, and from which IP address²) and generates a public/private key pair together with a code that identifies it. Only the code and the public key are returned to the user application. Here, the key pair is generated to protect the signature information (to prevent anyone from reading and/or checking it): the user gets the public key, but the private key is only given to authorized users. The server is also a Java application and can run on any internet-connected machine. Communication with the server is also secured via symmetric encryption: RC4 [6].

6. The encoded signature from step 4 and the user certificate are now encrypted using RSA and the public key from step 5. This step completes the protection of the information: an unauthorized user can neither check the documents nor extract user information.

7. The final encoded message (step 6) and the server code (which the server needs to locate the public/private key pair) are embedded into the image using simple watermarking, that is, by writing the information into the least significant bit (LSB) of the pixels. Because of the perceptual characteristics of human vision (which can distinguish only about 60 gray levels), information in the LSB appears only as invisible noise.

Finally, the user gets a regular image file (.png format) that can be sent to anyone who requires it³.
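The LSB embedding of step 7 can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes an 8-bit grayscale image with one payload bit per pixel (so a 512x480 image holds 30720 bytes), writes the bits row by row with the most significant bit of each byte first, and uses hypothetical names (`LsbWatermark`, `embed`, `extract`). The paper does not specify the bit ordering or how color images are handled.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical LSB watermark for an 8-bit grayscale image: one payload bit per pixel. */
final class LsbWatermark {

    /** Payload bytes the image can carry (width x height bits). */
    static int capacityBytes(BufferedImage gray) {
        return gray.getWidth() * gray.getHeight() / 8;
    }

    /** Step 7: writes the payload bits (MSB first) into bit 0 of the pixels, row by row. */
    static void embed(BufferedImage gray, byte[] payload) {
        if (payload.length > capacityBytes(gray)) {
            throw new IllegalArgumentException("payload exceeds image capacity");
        }
        WritableRaster raster = gray.getRaster();
        int width = gray.getWidth();
        for (int i = 0; i < payload.length * 8; i++) {
            int bit = (payload[i / 8] >> (7 - i % 8)) & 1;
            int x = i % width, y = i / width;
            // Keep bits 7..1 of the pixel, replace only bit 0.
            raster.setSample(x, y, 0, (raster.getSample(x, y, 0) & 0xFE) | bit);
        }
    }

    /** Checker side: reads {@code length} payload bytes back from bit 0. */
    static byte[] extract(BufferedImage gray, int length) {
        if (length > capacityBytes(gray)) {
            throw new IllegalArgumentException("requested length exceeds image capacity");
        }
        WritableRaster raster = gray.getRaster();
        int width = gray.getWidth();
        byte[] out = new byte[length];
        for (int i = 0; i < length * 8; i++) {
            int bit = raster.getSample(i % width, i / width, 0) & 1;
            out[i / 8] |= (byte) (bit << (7 - i % 8));
        }
        return out;
    }
}
```

After embedding, the image would be saved in a lossless format, for example with `ImageIO.write(image, "png", file)`; `extract` is the reading side the checker would use when it reads the watermark.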

¹ RSA is an asymmetric encryption scheme; that is, different keys are used for encryption and decryption. In this case, the private key (which belongs to the user certificate and is only known on the user side) is used for encryption. Decryption is performed with the certificate's public key (which is known to anyone who receives the digital certificate). Decryption only succeeds if the private/public key pair is consistent (RSA is designed so that it is computationally infeasible to derive one key from the other). This encryption in fact signs the SHA information, so that we know it was created by the user holding the certificate.
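In the standard Java Cryptography Architecture, the "encryption with the private key" of footnote 1 corresponds to an RSA cipher initialized in encrypt mode with a private key, which applies PKCS#1 v1.5 signature (type 1) padding. The sketch below assumes that transformation (`RSA/ECB/PKCS1Padding`) and a caller-supplied key alias; the paper states neither the padding nor the key size, and `HashSigner` is a hypothetical name. In new code, `Signature.getInstance("SHA1withRSA")` (or a SHA-256 variant) is the idiomatic way to sign; it hashes the data itself and wraps the digest in a DigestInfo structure, so its output differs from the raw scheme shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

final class HashSigner {

    private static final String RSA_PKCS1 = "RSA/ECB/PKCS1Padding";

    /** Loads the user's private key from a .p12 (PKCS#12) file; the alias is an assumption. */
    static PrivateKey loadPrivateKey(Path p12File, char[] password, String alias)
            throws GeneralSecurityException, IOException {
        KeyStore store = KeyStore.getInstance("PKCS12");
        try (InputStream in = Files.newInputStream(p12File)) {
            store.load(in, password);
        }
        return (PrivateKey) store.getKey(alias, password);
    }

    /** Step 4: "encrypt" the SHA-1 hash with the private key (PKCS#1 v1.5 type 1 padding). */
    static byte[] signHash(byte[] hash, PrivateKey userKey) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA_PKCS1);
        rsa.init(Cipher.ENCRYPT_MODE, userKey);
        return rsa.doFinal(hash);
    }

    /** Checker side: recover the hash with the certificate's public key. */
    static byte[] recoverHash(byte[] signedHash, PublicKey certificateKey)
            throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA_PKCS1);
        rsa.init(Cipher.DECRYPT_MODE, certificateKey);
        // With a mismatched key this normally fails with BadPaddingException.
        return rsa.doFinal(signedHash);
    }
}
```

On the checker side, the matching public key would come from the embedded certificate, e.g. `certificate.getPublicKey()`.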

² With this mechanism we obtain a registry of the users who authenticate documents, even though we have not obliged them to register.

³ Our software always generates .png files, but these could be converted to any lossless image file format. The format has to be lossless so that the watermark is not affected (for example, .jpg is not allowed). A very interesting format that is not yet supported is .pdf, which also allows authentication and digital signatures.
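The lossless requirement of footnote 3 can be checked directly: encode the watermarked image, decode it, and compare the bit-0 planes. This sketch assumes an 8-bit grayscale image and uses only the standard `javax.imageio.ImageIO` API; the class name `LosslessCheck` is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class LosslessCheck {

    /** Encodes the image with an ImageIO format name ("png", "jpg", ...) and decodes it again. */
    static BufferedImage roundTrip(BufferedImage image, String format) {
        try {
            ByteArrayOutputStream encoded = new ByteArrayOutputStream();
            if (!ImageIO.write(image, format, encoded)) {
                throw new IllegalArgumentException("no ImageIO writer for " + format);
            }
            return ImageIO.read(new ByteArrayInputStream(encoded.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if both 8-bit gray images have the same size and an identical bit-0 plane. */
    static boolean sameLsbPlane(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int diff = a.getRaster().getSample(x, y, 0) ^ b.getRaster().getSample(x, y, 0);
                if ((diff & 1) != 0) {
                    return false; // the watermark bit of this pixel was altered
                }
            }
        }
        return true;
    }
}
```

With `"png"` the LSB plane survives unchanged. Passing `"jpg"` would typically report a mismatch, because JPEG quantization alters low-order bits, which is why JPEG output is ruled out.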

Obviously, if the image was altered in any way (at any time after scanning), this process will not succeed.

Figure 1. Authentication process. The least significant bit of the image is forced to zero before the SHA-1 computation.
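The hash of Figure 1 can be sketched as follows, assuming an 8-bit grayscale image whose samples are fed to SHA-1 row by row after clearing bit 0. The traversal order, the fact that the image dimensions are not hashed, and the class name `MaskedImageHash` are assumptions; the paper only states that the LSB is forced to zero before SHA-1. SHA-1 is kept here to match the paper; `MessageDigest.getInstance("SHA-256")` would be a drop-in replacement.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class MaskedImageHash {

    /** SHA-1 over the 8-bit gray samples, row by row, with bit 0 of every sample forced to zero. */
    static byte[] sha1IgnoringLsb(BufferedImage gray) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
        Raster raster = gray.getRaster();
        byte[] row = new byte[gray.getWidth()];
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < row.length; x++) {
                row[x] = (byte) (raster.getSample(x, y, 0) & 0xFE); // clear the watermark bit
            }
            sha1.update(row);
        }
        return sha1.digest(); // 20 bytes = 160 bits
    }
}
```

Because bit 0 is cleared, a change confined to the LSB plane leaves this hash unchanged (it only damages the watermark), while a change in any other bit alters it.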

Figure 2. Final data frame to be inserted into the image (via watermarking). The flag indicates whether the image was obtained from the scanner or loaded by the user (an option we decided to include). The user has no control over this flag, and scanned images are not accessible until they are watermarked.
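Figure 2 names the contents of the frame (flag, server code, encrypted message) but not its byte layout. The sketch below assumes a hypothetical length-prefixed layout: one flag byte (1 = scanned, 0 = loaded by the user), the server code written with `DataOutputStream.writeUTF`, then a 4-byte length followed by the encrypted message. The class `WatermarkFrame` and this layout are illustrative, not taken from the paper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical byte layout for the Figure 2 frame. */
final class WatermarkFrame {
    final boolean fromScanner;      // flag: true = scanned, false = loaded by the user
    final String serverCode;        // identifies the key pair on the server
    final byte[] encryptedMessage;  // output of authentication step 6

    WatermarkFrame(boolean fromScanner, String serverCode, byte[] encryptedMessage) {
        this.fromScanner = fromScanner;
        this.serverCode = serverCode;
        this.encryptedMessage = encryptedMessage.clone();
    }

    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(fromScanner ? 1 : 0);
            out.writeUTF(serverCode);              // 2-byte length + modified UTF-8
            out.writeInt(encryptedMessage.length); // 4-byte big-endian length
            out.write(encryptedMessage);
        } catch (IOException e) {
            throw new UncheckedIOException(e);     // not expected for an in-memory stream
        }
        return buffer.toByteArray();
    }

    /** Parses a frame from the start of the extracted LSB bytes; trailing bytes are ignored. */
    static WatermarkFrame fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            boolean flag = in.readByte() == 1;
            String code = in.readUTF();
            int length = in.readInt();
            if (length < 0 || length > data.length) {
                throw new IOException("corrupt length field: " + length);
            }
            byte[] message = new byte[length];
            in.readFully(message);
            return new WatermarkFrame(flag, code, message);
        } catch (IOException e) {
            throw new UncheckedIOException("watermark frame is damaged or missing", e);
        }
    }
}
```

Because every field is self-delimiting, the checker can extract bytes from the start of the LSB plane and parse them without knowing the total frame length in advance.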

B. Verification Process

Authorized users (document receivers) use another Java application to check the authentication, that is, to verify that the scanned document is identical to the one that was presented to the scanner. The verification application is only available to authorized institutions. The processing inverts the authentication process described above:

1. Reading the watermark: the image LSBs are read to get the information. At this point we have a code (from the authenticating server) and an encrypted message.

2. Communicating with the server: the application has to give the code to the server in order to get the key that is necessary for decrypting the message. The server searches its database to retrieve the private key and grants it only to authorized clients (communication is again encrypted with the RC4 algorithm).

3. Deciphering the message: using the private key and the RSA algorithm. At this point we have access to the user certificate (so we positively know who authenticated the document) and to the certificate's public key.

4. Checking the user's digital signature: this consists simply of decrypting with the certificate's public key (at this point we do not have access to the private one). We now have access to the image signature (SHA-1).

5. Checking the image signature: we recompute the SHA signature (again with the LSB forced to zero) and compare it with the one obtained in step 4. If they are equal, we know that the image was not modified after scanning, and we also know who scanned it.
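Authentication step 6 and verification step 3 apply RSA to data (the encoded signature plus the user certificate) that is usually longer than one RSA block: with PKCS#1 v1.5 padding, a 2048-bit key encrypts at most 245 bytes at a time, while a DER-encoded certificate is often around a kilobyte. The paper does not say how this is handled; the sketch below assumes the data is simply split into blocks, each encrypted with the server's key. `ChunkedRsa` is a hypothetical name, and the key size is an assumption. A hybrid scheme (a random AES key encrypted with RSA, the data encrypted with AES) is the more common design, but it is a different technique from the one described in the paper.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.interfaces.RSAKey;
import javax.crypto.Cipher;

final class ChunkedRsa {

    private static final String RSA_PKCS1 = "RSA/ECB/PKCS1Padding";
    private static final int PKCS1_OVERHEAD = 11; // minimum padding bytes per PKCS#1 v1.5 block

    /** Authentication step 6: encrypt arbitrary-length data with the server's public key. */
    static byte[] encrypt(byte[] plain, PublicKey serverKey) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA_PKCS1);
        rsa.init(Cipher.ENCRYPT_MODE, serverKey);
        return processBlocks(rsa, plain, modulusBytes(serverKey) - PKCS1_OVERHEAD);
    }

    /** Verification step 3: decrypt with the private key released by the server. */
    static byte[] decrypt(byte[] encrypted, PrivateKey serverKey) throws GeneralSecurityException {
        int blockSize = modulusBytes(serverKey);
        if (encrypted.length % blockSize != 0) {
            throw new IllegalArgumentException("length is not a multiple of the RSA block size");
        }
        Cipher rsa = Cipher.getInstance(RSA_PKCS1);
        rsa.init(Cipher.DECRYPT_MODE, serverKey);
        return processBlocks(rsa, encrypted, blockSize);
    }

    private static int modulusBytes(Key key) {
        return (((RSAKey) key).getModulus().bitLength() + 7) / 8;
    }

    private static byte[] processBlocks(Cipher rsa, byte[] input, int blockSize)
            throws GeneralSecurityException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int offset = 0; offset < input.length; offset += blockSize) {
            int length = Math.min(blockSize, input.length - offset);
            // doFinal resets the cipher to its initialized state, ready for the next block.
            byte[] block = rsa.doFinal(input, offset, length);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }
}
```

With a 2048-bit key, 1000 bytes of input become five 256-byte blocks, so the encrypted message is larger than its input and counts against the watermark capacity of the image.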

Figure 3. An image and its eight bit planes, from bit 7 (most significant) to bit 0 (least significant). Note that the bit 0 plane is practically a noise signal that can be modified with no perceptual effect.
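The bit planes of Figure 3 can be extracted as follows, assuming an 8-bit grayscale image; each plane is returned as a black-and-white image in which a set bit is shown as white. The class name `BitPlanes` is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

final class BitPlanes {

    /** Returns plane k (0 = least significant, 7 = most significant) as a black/white image. */
    static BufferedImage plane(BufferedImage gray, int k) {
        if (k < 0 || k > 7) {
            throw new IllegalArgumentException("k must be in 0..7");
        }
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster src = gray.getRaster();
        WritableRaster dst = out.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int bit = (src.getSample(x, y, 0) >> k) & 1;
                dst.setSample(x, y, 0, bit * 255); // set bit -> white, clear bit -> black
            }
        }
        return out;
    }
}
```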

We have chosen a scheme in which only authorized users can check an authenticated document. With other formats (such as .pdf), any user can perform the authenticity test. If we wanted to make checking public, we could simply allow free access to the verification application at any moment. As for the encryption algorithms used, they were considered very secure at the time of writing, since they use long keys and no effective attacks against them were known. Since then, however, SHA-1 and RC4 have been deprecated: a practical SHA-1 collision was published in 2017, and RFC 7465 prohibits RC4 in TLS.

Figure 4. Original image (left) and image with a message in the LSB (right). It is impossible for the human eye to distinguish between them; moreover, it is visually impossible to tell that the right one contains a message (the message is the word "MESSAGE" repeated 8 times). In a 512x480 image like this one, we can store a message of 512x480 = 245760 bits (245760/8 = 30720 ASCII characters, more than the whole text of this paper).
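Restricting checking to authorized users relies on the server role of authentication step 5 and verification step 2: issue a key pair, log who asked for it, and release the private key only to authorized checkers. The sketch below models this as an in-memory registry. It is hypothetical: persistence, real client authentication, and the encrypted transport (RC4 in the paper) are omitted, identities are plain strings, the 2048-bit key size is assumed, and the names `KeyServer`, `issue`, `privateKeyFor` and `whoUsed` are illustrative.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.time.Instant;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class KeyServer {

    private final KeyPairGenerator generator;
    private final Map<String, KeyPair> keysByCode = new ConcurrentHashMap<>();
    private final Map<String, String> logByCode = new ConcurrentHashMap<>();
    private final Set<String> authorizedCheckers;

    KeyServer(Set<String> authorizedCheckers) throws GeneralSecurityException {
        this.generator = KeyPairGenerator.getInstance("RSA");
        this.generator.initialize(2048); // assumed key size
        this.authorizedCheckers = Set.copyOf(authorizedCheckers);
    }

    /** Authentication step 5: create and log a key pair; the client gets only the code and public key. */
    Map.Entry<String, PublicKey> issue(String user, String ipAddress) {
        KeyPair pair;
        synchronized (generator) { // KeyPairGenerator is not documented as thread-safe
            pair = generator.generateKeyPair();
        }
        String code = UUID.randomUUID().toString();
        keysByCode.put(code, pair);
        logByCode.put(code, user + " from " + ipAddress + " at " + Instant.now());
        return Map.entry(code, pair.getPublic());
    }

    /** Verification step 2: release the private key only to an authorized checker. */
    PrivateKey privateKeyFor(String code, String checkerId) {
        if (!authorizedCheckers.contains(checkerId)) {
            throw new SecurityException("checker not authorized: " + checkerId);
        }
        KeyPair pair = keysByCode.get(code);
        if (pair == null) {
            throw new IllegalArgumentException("unknown server code: " + code);
        }
        return pair.getPrivate();
    }

    /** Registry lookup: who requested the key pair behind a server code. */
    String whoUsed(String code) {
        return logByCode.get(code);
    }
}
```

The `whoUsed` lookup is what would still identify the user when only the server code survives in a damaged watermark.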

Figure 5. Verification process.

III. SECURITY ANALYSIS

In this section we analyze the security of the system, that is, we consider possible attacks and study the robustness of the system against them. First, we must clarify that this system only guarantees that a scanned document is identical to the one presented to the scanner: if a good counterfeit is scanned, it will be accepted as valid. The system is intended to be a digital version of the registry stamp that many public administrations use to authenticate the physical documents they receive.

Because a very simple (weak) watermarking technique is used, it is easy to destroy the information in the watermark without affecting the perceptual information. This is perhaps not the worst problem, because few users are likely to be interested in erasing an authenticity stamp from their own documents. A solution could be the use of stronger watermarks [7].

The system is very strong at detecting document manipulations, because any slight change outside the least significant bit modifies the image hash. With some manipulations, however, the LSB watermark could be affected so that the entire embedded information is lost. In that case we would not be able to retrieve the certificate and know who authenticated the document (although, if the "server code" of Figure 2 is correctly retrieved, the user can still be identified through the server registry).

IV. CONCLUSIONS

We have implemented a system able to authenticate scanned documents, guaranteeing not only their authenticity but also who scanned them. A registry of authenticated documents is created at a server. This system can be very useful for easing bureaucratic processes, allowing users to submit physical documents over the internet.

V. FUTURE LINES

Although our system is fully functional, we have identified some interesting improvements. Future lines of work could include:

- Generating other output formats, especially signed .pdf files, because this format is on its way to becoming a standard for secure documents.
- Reading multi-page documents using a scanner with an automatic document feeder. In this case we would have to choose the output: several image files, a multi-image file (.tif), or .pdf (which again becomes the most interesting format).
- Using a stronger algorithm for image watermarking [7]. LSB watermarking works fine, but more advanced methods treat the problem as a digital transmission and can create watermarks that are resistant to image processing (such as scaling, rotation or JPEG compression).
- Designing (or using) a hash algorithm that depends on the image content rather than on individual bits. With the described method, any change in the hashed bits of an image changes the hash (re-scanning the same document would yield a different hash). With image hashing [8], the hash changes only when the image changes are "perceptually significant".
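To illustrate the last future line, the sketch below computes a simple 64-bit average hash: the image is reduced to an 8x8 grid of block means, and each bit records whether a block is brighter than the overall mean. This is a swapped-in and much simpler technique, not the dither-based secure hashing of [8]; it uses no key, so on its own it provides no security. It assumes an 8-bit grayscale image at least 8 pixels wide and high, and the class name `AverageHash` is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** 64-bit average hash of an 8-bit grayscale image (a simple perceptual hash). */
final class AverageHash {

    static long of(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        if (w < 8 || h < 8) {
            throw new IllegalArgumentException("image must be at least 8x8");
        }
        WritableRaster raster = gray.getRaster();
        double[] cellMean = new double[64];
        for (int cy = 0; cy < 8; cy++) {
            for (int cx = 0; cx < 8; cx++) {
                // Pixel bounds of this cell of the 8x8 grid.
                int x0 = cx * w / 8, x1 = (cx + 1) * w / 8;
                int y0 = cy * h / 8, y1 = (cy + 1) * h / 8;
                long sum = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        sum += raster.getSample(x, y, 0);
                    }
                }
                cellMean[cy * 8 + cx] = (double) sum / ((x1 - x0) * (y1 - y0));
            }
        }
        double overallMean = 0;
        for (double m : cellMean) {
            overallMean += m / 64;
        }
        long hash = 0;
        for (int i = 0; i < 64; i++) {
            if (cellMean[i] > overallMean) {
                hash |= 1L << i; // bit i set = cell brighter than the image average
            }
        }
        return hash;
    }

    /** Number of differing bits (Hamming distance) between two hashes. */
    static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

Flipping the LSB of one pixel leaves this hash unchanged, while inverting the image changes all 64 bits; comparisons would use `distance` with a tolerance threshold that would have to be tuned.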

Figure 6. Screenshot of the client application while authenticating a document.

ACKNOWLEDGMENT

We wish to thank the engineers Javier Biurrum and Fernando Gil, who suggested the system described in this paper while working for the government of the autonomous region of Galicia.

REFERENCES

[1] Bruce Eckel, Thinking in Java, 4th ed., Prentice Hall: Upper Saddle River (NJ, USA), 2006.
[2] www.twain.org
[3] William Stallings, Cryptography and Network Security: Principles and Practice, 4th ed., Prentice Hall: Upper Saddle River (NJ, USA), 2006.
[4] RFC 2459: "Internet X.509 Public Key Infrastructure. Certificate and CRL Profile".
[5] Andrew S. Tanenbaum, Computer Networks, 4th ed., Prentice Education: Upper Saddle River (NJ, USA), 2003.
[6] RC4 (stream encrypting) official website: http://www.wisdom.weizmann.ac.il/~itsik/RC4/rc4.html
[7] F. Pérez-González, S. Voloshynovskiy, Fundamentals of Digital Image Watermarking, 1st ed., John Wiley & Sons Inc, 2011.
[8] M. Johnson and K. Ramchandran, "Dither-Based Secure Image Hashing Using Distributed Coding", Proc. IEEE International Conference on Image Processing (Barcelona, Spain), September 2003.