Web Pages Tamper-Proof Method using Virus-Based ...

Web Pages Tamper-Proof Method using Virus-Based Watermarking

Cong Jin, Hong-Feng Xu, and Xiao-Liang Zhang
Department of Computer Science, Central China Normal University, Wuhan 430079, P.R. China
E-mail: [email protected]

Abstract

This paper proposes a novel tamper-proof model for web pages based on virus-based watermarking. The model offers good security and accurately determines whether a web page has been tampered with. A virus-inspired classification approach is applied when the watermark is embedded and extracted. The proposed scheme applies to all kinds of HTML and XML files and is not limited to English letters; it handles other characters as well. More importantly, the original file can be restored completely once the watermark is extracted. Combined with 3rd-generation tamper-proof technology for web pages, the scheme therefore offers good real-time performance and security. Experimental results show that it outperforms existing tamper-proof schemes in that it does not increase the file size and does not require the heavy computation of cryptographic methods.

1. Introduction

Websites play an important role in the development of the information society and have spread all over the world. The large number of websites and web users has laid a solid foundation for the rapid development of the information age, and the importance of web pages keeps growing. For an enterprise or a government, its web pages serve as its home on the Internet. However, despite safeguards such as firewalls and intrusion detection, hacker intrusions and homepage tampering occur constantly through system vulnerabilities caused by the complexity and diversity of application systems. More than 30,000 web pages were tampered with during May 2007; in other words, about one website was attacked every 1.5 minutes on average. It is therefore extremely important to develop new schemes that protect websites against tampering.

Tamper-proof technology has now reached its 3rd generation, which combines file filter driver and event-triggered technology. The techniques of time polling and cryptography fall far short of current needs. A primary drawback of Hash algorithms [1] for website tamper-proofing is that they require extra storage and an extra channel to transmit the Hash value. Recently developed watermarking techniques provide alternatives for protecting the integrity of digital documents [2]. Katzenbeisser et al. [3] proposed a watermark-based method that adds spaces and tags to the source code of web pages; however, it increases the file size. The PCA-based watermark scheme [4] does not increase the file size but requires a great deal of computation. The information hiding technology based on web page tags [5] inserts the information at predictable positions, and the tags may be executed by the browser. This paper presents a novel watermark scheme that can be combined with the 3rd-generation technique. It overcomes these drawbacks and offers good real-time performance and security.

2. Theory and application

2.1. Virus-based theory

A computer virus typically splits itself into many pieces and hides them throughout a file [6]. The virus can replicate and reconstruct itself as long as not all of the pieces have been deleted. However, once the order of reconstruction is changed, the virus loses its vitality. Accordingly, the embedded watermark information is an ordered sequence phrase. Once the web page is tampered with, the watermark information can still be extracted, but it comes out as a changed sequence phrase. We can therefore compare the extracted watermark information with the embedded watermark information to check the security of the web page in a timely manner.

2.2. Application of watermark in tamper-proof

The key tamper-detection program is deployed on the web server using file filter driver technology, and detection is then performed automatically in an event-triggered manner. The watermark information is extracted from every file in the folder, with the files sorted by a fast algorithm, and is promptly compared with the information embedded in advance. If they do not match, the corresponding backup file content is copied to the location of the tampered file. The copy is carried out as pure text without going through a protocol, so it is highly secure, and it takes only milliseconds. Running performance and real-time detection thus reach a relatively high standard. When a user wants to browse a web page, a request is sent to the web server. When the server responds, it calls the program to extract the watermark from the corresponding file. The file is then restored to the original, which is sent to the user. See Fig.1.
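The detect-and-restore cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `check_and_restore` and the `is_intact` callback are hypothetical names, the callback stands in for watermark extraction and matching, and the file filter driver and event trigger that would call the function are outside the sketch.

```python
import shutil
from pathlib import Path
from typing import Callable


def check_and_restore(page: Path, backup: Path,
                      is_intact: Callable[[str], bool]) -> bool:
    """Return True if the page passed the check, False if it was restored.

    is_intact is a hypothetical stand-in for extracting the watermark
    from the page and matching it against the embedded one.
    """
    # Decode raw bytes so that no newline translation alters the content.
    content = page.read_bytes().decode("utf-8")
    if is_intact(content):
        return True
    # Mismatch: overwrite the tampered file with the backup copy.
    shutil.copyfile(backup, page)
    return False
```

The function only copies a local file, which mirrors the text's point that restoration does not go through a network protocol.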

3. Watermark embedding and tamper detection

3.1. Watermark embedding and extraction

Fig.2 and Fig.3 show the watermark embedding and extraction methods, respectively.

Fig.2 Watermark embedded model (diagram not reproduced; block labels: Key, Source file, Segment by Key, Classified by Hash, Characteristic statistic, Information embedded)

3.2. Watermark embedding algorithm

We select a character that occurs with high frequency as the key. The key acts as the dividing point, and the text (HTML, XML, and so on) is segmented into many sections, which we call elements. All the elements are divided into 32 classes by Hash classification, so one class may contain several elements. Meanwhile, a random sequence of 32 ASCII values, each in the range 00 to 31, is generated from a seed. The 32 pieces of information to be embedded correspond to this 32-value sequence in a fixed way. Some of the spaces in each class are then replaced by the corresponding ASCII value according to the characteristic statistics of the spaces. A relative table containing the class, the embedded character, and the significant letters is created for extracting the significant information. The key issue is where to embed. In this paper, the location is defined according to the statistics of “” on each aggregate (class). If the aggregate does not contain any “”, all of its spaces are replaced by the corresponding ASCII character. Otherwise, the spaces in front of “” are replaced by the corresponding ASCII character.
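The embedding steps can be illustrated with a short sketch. It rests on several assumptions that are not taken from the paper: the key character is `>`, the Hash is the element length modulo 32 (the paper names no hash), the seeded sequence is a permutation of the codes 0 to 31, and every space of an element is replaced, which corresponds to only one branch of the location rule. The relative table is omitted, and the names `classify`, `symbol_sequence`, and `embed` are hypothetical.

```python
import random

KEY = ">"        # assumed high-frequency key character
N_CLASSES = 32   # number of Hash classes, as in the text


def classify(element: str) -> int:
    """Assumed Hash: element length modulo 32 (the paper names no hash)."""
    return len(element) % N_CLASSES


def symbol_sequence(seed: int) -> list:
    """Seeded random ordering of the codes 0..31, one code per class."""
    return random.Random(seed).sample(range(N_CLASSES), N_CLASSES)


def embed(cover: str, seed: int) -> str:
    """Replace every space of each element with the code of its class."""
    if any(ord(ch) < 32 for ch in cover):
        raise ValueError("cover must not already contain codes below 32")
    symbols = symbol_sequence(seed)
    marked = []
    for element in cover.split(KEY):
        carrier = chr(symbols[classify(element)])
        marked.append(element.replace(" ", carrier))
    return KEY.join(marked)
```

Because each space is replaced by exactly one single-byte character, the marked text has the same length as the cover. The sketch rejects covers that already contain codes below 32 (including line breaks and tabs), since those would be indistinguishable from embedded symbols.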

3.3. Tamper detection

Fig.3 Watermark extracted model (diagram not reproduced; block labels: Key, Secure file, Segment by Key, Classified by Hash, Information extracted, Random sequence)

The tamper detection program consists of two steps:

1) Watermark extraction. The file is classified in the same way as in watermark embedding, and the information on each aggregate is then extracted. In theory, only one kind of information is extracted from each aggregate. If more than one kind is found, the one that appears least frequently is taken. A sequence phrase is then produced according to the relative table.

2) Watermark comparison. The extracted sequence phrase is compared with the embedded phrase. If they do not match, we conclude that the file has been tampered with. The corresponding file is then copied from the backup files at the bottom layer of the OS as quickly as possible.
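A minimal sketch of the two detection steps follows. As assumptions not taken from the paper, the key character is `>`, the Hash is the element length modulo 32, and the embedded phrase is stored as a list with one code per class; the names `extract_phrase` and `is_tampered` are hypothetical, and the relative table lookup is left out.

```python
from collections import Counter
from typing import List, Optional

KEY = ">"        # assumed key character, must match the one used to embed
N_CLASSES = 32


def classify(element: str) -> int:
    """Assumed Hash: element length modulo 32 (the paper names no hash)."""
    return len(element) % N_CLASSES


def extract_phrase(marked: str) -> List[Optional[int]]:
    """Step 1: return one extracted code per class, None for an empty class."""
    per_class = [Counter() for _ in range(N_CLASSES)]
    for element in marked.split(KEY):
        codes = {ord(ch) for ch in element if ord(ch) < 32}
        per_class[classify(element)].update(codes)
    phrase = []
    for counts in per_class:
        if counts:
            # Several kinds found: take the least frequent one
            # (ties go to the smaller code, an arbitrary choice).
            phrase.append(min(counts, key=lambda c: (counts[c], c)))
        else:
            phrase.append(None)
    return phrase


def is_tampered(marked: str, embedded_phrase: List[Optional[int]]) -> bool:
    """Step 2: compare the extracted phrase with the embedded one."""
    return extract_phrase(marked) != embedded_phrase
```

Taking the least frequent code makes a single displaced element visible even when the rest of its class is intact. With the length-based Hash assumed here, an edit that leaves every element's length unchanged modulo 32, or that only adds elements without embedded codes, would pass this simplified check.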

3.4. Original file restoration

When a user requests the web page, the program that restores the original file runs. It traverses the whole file in sequence and replaces every character whose ASCII value is less than 32 with a space. The original file is thus recovered on the server and sent to the user.
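The restoration pass is simple enough to show in a few lines. This sketch takes the embedded symbols to be the codes 00 to 31 given in Section 3.2 and applies the rule literally; `restore_original` is a hypothetical name.

```python
# Map every code point 0..31 to a space (the restoration rule in the text).
_TO_SPACE = {code: " " for code in range(32)}


def restore_original(marked: str) -> str:
    """Return the text with all embedded control codes turned into spaces."""
    return marked.translate(_TO_SPACE)
```

Note that line feed, carriage return, and tab also have codes below 32, so the rule as stated turns them into spaces as well. A practical implementation would presumably need to exclude them from the symbol set, which the paper does not discuss.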

4. Experiments and analysis

In the experiment, we choose a simple HTML file as the original file; its source code is also called the cover file. Before the information is embedded, the cover file is read as a .txt file. The experiment and analysis are as follows.

In the experiment, the detection program is simulated. The program reads the source code of the web page shown in Fig.4. The secure file is obtained once the watermark information is embedded. This is the file that is stored on the server and becomes the target of hacker attacks. Fig.5 presents the watermark-embedded file. There is no visible difference between the original and the watermark-embedded file.

Fig.4 Original cover file

Fig.5 Watermark embedded file

Because the embedded information consists of invisible characters whose ASCII value is less than 32, the editor cannot display them. Fig.6 shows the complete watermark information, extracted cleanly, which consists of meaningful phrase information. It can not only assert copyright but, more importantly, is used to detect whether the web page file has been tampered with by watermark matching. Fig.7 shows the watermark information extracted from the tampered file after adding the tag and . Obviously, three information bits have changed, so the watermarks do not match, and a hint is given that the file has been tampered with. In fact, the watermark information is related to the length of each item in an aggregate, so the watermark changes whenever code is added or deleted.

Fig.6 Watermark

Fig.7 Watermark tampered

5. Conclusions

The paper provides a new watermark scheme for tamper-proofing web pages. It has the following good properties.

(1) It requires less computation than the cryptography commonly used in the second-generation technique [7]. It can also judge accurately whether the file has been tampered with, which cryptography does not do. Table 1 presents the running time of both algorithms.

Table 1 Comparison of two algorithms on running time

Algorithm       Running time
Cryptography    2.163s
Proposed        1.872s

(2) The size of an embedded file does not grow: it remains the same as before even though the watermark is embedded.

(3) Combined with the 3rd-generation technique, the scheme provides good security for the website. The user cannot view a tampered web page, because the restored file is not sent to the user when the extracted watermark does not match the embedded watermark. In addition, the program can check the secure file promptly and, once tampering is detected, copy the file from the bottom layer over the tampered file. Internet traffic moves so fast that it places high demands on security. By contrast, without the 3rd-generation technique, the program would have to match the extracted information against the embedded one every time the server responds to a user, which would add time and slow down web browsing.

(4) It achieves blind detection, which reduces the overhead on the OS. One point worth mentioning is that the new scheme uses a fragile watermark, which makes tamper detection robust and secure. However, it requires a server with good performance and high speed. In addition, the database linked to the file must be copied and updated promptly.

6. References

[1] W. Stallings. Cryptography and Network Security: Principles and Practice. Prentice-Hall, Englewood Cliffs, NJ, 1999.
[2] Guorui Feng, Lingge Jiang, and Chen He. Orthogonal transformation to enhance the security of the still image watermarking system. IEICE Trans. Fundamentals, E87-A(4): 949-951, 2004.
[3] S. Katzenbeisser and F. A. P. Petitcolas. Information Hiding Techniques for Steganography and Digital Watermarking. Artech House, Boston, 2000.
[4] Qijun Zhao and Hongtao Lu. A PCA-based watermarking scheme for tamper-proof of web pages. Pattern Recognition, 38: 1321-1323, 2005.
[5] Changzheng Wang and Jianhui Liu. Research and implementation of the information hiding technology based on web page tags. 2007.
[6] Haiyan Zhou, Fengsong Hu, and Can Chen. English text digital watermarking algorithm based on idea of virus. Computer Engineering and Applications, 43(7): 78-80, 2007.
[7] Liu Gu. Research and implementation of information hiding based on web page. Microcomputer Information, 22: 186-187, 2006.

Acknowledgements

This work was supported by the Natural Science Foundation of Hubei (China), Grant No. 2007ABA119.

Fig.1 Application of watermark in tamper-proof (flowchart not reproduced; labels: User, Request, Respond, The file of information embedded, Watermark extracted, Watermark match (Y/N), Original file, Tamper detection, Backup Files, Copy, End)
