IAAS: An Integrity Assurance Service for Web Page via a Fragile Watermarking Chain Module

Peng Gao
Tokyo Institute of Technology, Japan
[email protected]
Hao Han
National Institute of Informatics, Japan
[email protected]
ABSTRACT
As the main public face of Web-based e-commerce, which is widely regarded as one of the most important application areas of the Internet, the Web page has been given more and more responsibilities. With this trend, the importance of integrity protection for Web pages has grown dramatically, since it affects the business and daily life of a large number of people. Over the past few years, the integrity of Web pages has been under constant threat from unauthorized modifications and malicious code injections, which raise the risk of fraud lurking in page browsing and cause many negative consequences. The situation has become even more urgent since so-called in-flight page changes began to be widely detected in recent years. In this paper, we present the design of an "Integrity as a Service" (IaaS) system that enforces integrity in Web pages. It is based on a novel fragile watermarking chain scheme and covers both the traditional host-target model and the new in-flight-target model of unauthorized modification. Our investigation and analysis show that the proposed system not only offers a one-stop Web page integrity protection service to Web sites and users, but also has practical merits for small and medium enterprises (SMEs), such as reduced system development cost.
Categories and Subject Descriptors C.2 [Computer Systems Organization]: Computer Communication Networks; H.4.3 [Information Systems]: Information Systems Applications—Miscellaneous
General Terms Analysis, Design, Security
Keywords Web page, Web site defacement, Integrity protection service, HTML anti-tampering, Fragile watermarking chain
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICUIMC’12, February 20-22, Kuala Lumpur, Malaysia Copyright 2012 ACM 978-1-4503-1172-4 ...$10.00.
Takehiro Tokuda
Tokyo Institute of Technology, Japan
[email protected]
1. INTRODUCTION

1.1 Web Page Tampering
Nowadays the Web is becoming a core infrastructure of our society. Most companies and institutes use Web-based services for e-commerce to promote their business activities[2]. A large number of people use the Web to get and share news, and also rely on Web services such as online shopping and email to make daily life more convenient. Clearly, the quality of a Web page's presence is a key issue for any organization, since it must convey a sense of trust and dependability to its users. As the public face of an organization, the Web page is generally located at the frontline of its network architecture; consequently, Web pages have become one of the main targets of attacks from the Internet, and Web page defacement has become a common threat for organizations exposed on the Web. A defaced site typically contains only a few messages or images representing a sort of signature of the hacker who performed the defacement. It may also contain disturbing images or texts, political messages, and so on. These unauthorized modifications, which corrupt the contents of Web pages, may cause serious damage to the organization, such as a reduced number of accesses or a degradation of the company's credibility. By hiding or misrepresenting important information, they can also harm the users of Web sites. Several statistics indicate the occurrence rate of these incidents and how long the defacements typically last, which conveys the practical impact of the Web page defacement phenomenon. Defacing a Web site can sometimes be done easily, and incidents of Web page defacement are as common on the Internet as phishing, worms, and denial of service[36, 37, 38, 39, 40, 41]. According to the archive of Zone-H, a public Web site that lists defaced pages, approximately 481,000 defacements happened during 2007 alone, and more than 1.7 million occurred from 2005 to 2007.
Furthermore, the trend has grown constantly in recent years[3, 42, 43]: in April 2010 this Web site registered over 95,000 defacements, compared with only 60,000 in the same period of 2009. The facts also show that not only small Web sites and personal homepages suffer malicious page modification; seriously damaging defacements have also hit major IT companies and important organizations. The homepage of Microsoft France was hacked in 2006 and that of the Italian Air Force on 26 January 2007; even the Web sites of the United Nations and the Spanish Ministry of Housing were defaced[38, 44, 45]. For repairing a site, a CSI survey estimates a total loss of US$725,000 in 2007, because IT companies usually first build a honeypot site to investigate the problem, then fix the security vulnerability, and finally restore the original page content. More depressing is that, up to now, Web content integrity has not received much attention in information security research, partly because the interests of the military and cryptography communities have focused more on data confidentiality[46]. It is stated that 90% of security research has concentrated on confidentiality, mainly because of the availability of military funding, with only 9% on document integrity and 1% on document availability[47]. However, the highest priority of the business sector is document integrity and wide availability. For example, the business world is interested in publishing data such as books, images, e-commerce content, and online business material, and in ensuring that its integrity and authorship are dynamically maintained. In the real world, attacks targeting the integrity of Web pages are increasing dramatically year by year. Not only is the number of page defacements very large; the statistics on the typical duration of a defacement are also crucial[48]. Based on an analysis of a sample of more than 62,000 incidents monitored for approximately two months, researchers found that the typical reaction time for recovering a defaced page is surprisingly long. Nearly 43 percent of the defacements in the sample lasted for at least one week, and more than 37 percent were still in place after two weeks. Furthermore, the experimental results showed that even pages with higher PageRank values tend to exhibit an unacceptable reaction time: fewer than 50 percent of the samples were detected within one day. For example, the Web page of the Company Registration Office in Ireland was defaced and not restored from December 2006 through mid-January 2007. Such a defacement that lasts a few weeks is intrinsically much more harmful than one that lasts a few minutes.
1.2 Research Motivation

On the other side, Web page security threats are evolving. Traditionally, malicious Web page modifications happened when a Web server was intruded. Generally, Web page tampering involves one or more of the following strategies.

• The attacker tries to compromise the system to open a "backdoor" for performing malicious actions, which may involve giving away credentials that can be used later by the attacker to actually carry out the attack.

• The attacker steals a legitimate account and password. The attacker can be, or can collaborate with, an insider intruder.

• The attacker exploits system vulnerabilities, such as implementation errors, design errors, or intrinsic limitations in the targeted system.

The third strategy is the one most commonly used by attackers. It involves exploiting a Web application bug, for instance by performing a script file inclusion, a Web server intrusion, and so on[49]. We name this kind of attack the host-target model. Recently, a more harmful attack method called the Web-based attack has been gaining popularity. A Web-based attack can automatically infect a victim user's system by luring the user to visit a hacked Web page. A representative example is the so-called drive-by-download attack. A large number of Web pages have been compromised by this kind of harmful attack. In the process of infection, however, the attackers still need to perform an unauthorized modification of the original Web page. Therefore, even though the injected malicious code, usually written in JavaScript, is obfuscated with various code-mutating methods, we believe that protecting the integrity of Web pages remains an important issue for mitigating this kind of attack. In 2008, Reis et al. conducted an experiment showing that in-flight changes of Web pages widely exist when pages are transferred from a Web server to a user's Web browser over the Hypertext Transfer Protocol (HTTP)[12]. In the experiment, their Web server found in-flight changes made to the HTML code of a test page for at least 1.3% of visitors from over 50,000 unique IP addresses. The attacker can be any entity between the end user and the Web server, such as a proxy server, a wireless access point, or even an Internet Service Provider (ISP). We name this kind of attack the in-flight-target model; it defeats traditional integrity protection methods and consequently poses potential damage to existing information systems. For example, advertisements injected by ISPs to increase revenue also damage the user experience; pop-ups blocked by a firewall can decrease the potential revenue of the Web site; worse still, malicious code can be injected by malware authors. The Bahama botnet and the Gumblar botnet have been reported to make infected systems display altered advertisements and search results to end users[22]. Consequently, users may click on altered ads and generate revenue for the bot master instead of the Web site.

These facts make us believe that it is increasingly important to develop a countermeasure protecting the integrity of Web pages in both the traditional host-target and the new in-flight-target attack scenarios.
1.3 Challenging Issues

• Web page integrity has not received enough attention in information security research, so we have to investigate the topic from scratch across the many integrity enforcement techniques that have been designed separately for the two models, each with its own advantages and disadvantages. How to classify the attacks and countermeasures targeting Web page integrity, and how to leverage their advantages in our scheme for maximum benefit, is thus the first challenge.

• The proposed scheme for protecting the integrity of Web pages should address the attacks of the host-target and in-flight-target models together. In doing so, we are also concerned with reducing the deployment cost of Web server investments and of purchasing valid digital certificates for SMEs, which usually have very limited human resources to effectively protect their Web sites.
1.4 Our Contributions

In this paper, we mainly focus on investigating the above challenges to develop a system for the integrity enforcement of Web pages. In particular, we make the following contributions:

• We classify attacks targeting the integrity of Web pages into the host-target and in-flight-target models and analyze the advantages and disadvantages of the corresponding countermeasures. We then propose an Integrity as a Service (IaaS) architecture to enforce integrity protection in Web pages that suits both models and can also reduce system development costs for SMEs that want to effectively protect their Web pages.

• We integrate our Fragile Watermarking Chain (FWC) scheme[26] with IaaS and develop an Integrity Assurance Service system. The FWC can verify the integrity of the segments of divided HTML code and thus enforces integrity protection in the Web pages themselves. Therefore, the proposed system not only supports Web cache technology and saves communication channel bandwidth and Web server storage space, but can also locate the tampered line in the HTML code.

This work is organized as follows: Section 2 classifies the existing countermeasures into two models. Section 3 describes basic preliminaries used in our integrity enforcement work. Section 4 states the system and adversary models and the design requirements. Section 5 presents our integrity assurance service design and introduces the implementation. In Section 6 we discuss the merits of the proposed architecture. Section 7 gives the security analysis, system limitations, and a comparison with similar work. Section 8 concludes the paper.
2. COUNTERMEASURE CLASSIFICATION

To the best of our knowledge, there is no countermeasure that protects the integrity of Web pages in both models together. We describe and analyze the previous work separately.
2.1 For the Host-target Model

Security-related methods such as intrusion detection systems (IDSs), honeypot systems, and hash-based integrity checking systems have been used as countermeasures in this model. Usually, these kinds of systems try to warn of unauthorized modifications in order to guarantee that the Web server is presenting the right page to users. IDSs try to directly detect or prevent unauthorized modifications[5, 23, 24]. Ideally, detection would take place before the intrusion occurs, but such methods do not always perform well in the Web environment[25]. Generally speaking, the main disadvantage of this kind of approach is that noise tends to cause high false-positive rates, while the difficulty of covering all intrusion patterns causes high false-negative rates. Hash-based integrity check schemes are used as a metric for efficiently detecting unauthorized modification of Web pages[6, 7, 8, 9, 10, 11]. They compare two digest values computed from the original Web page and the target Web page. However, this kind of scheme has a primary drawback: it needs additional space and channel bandwidth to store and transmit the large number of hash values[16]. We also consider this method limited when the modifications need to be located.
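The digest comparison just described can be sketched in a few lines (a generic illustration of hash-based integrity checking, not the scheme of any particular cited work; function and variable names are ours):

```python
import hashlib

def page_digest(html: str) -> str:
    """Return the SHA-256 digest of a page's HTML source."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def is_unmodified(original_html: str, received_html: str) -> bool:
    """Compare the digest of the original page with the digest
    of the page actually received."""
    return page_digest(original_html) == page_digest(received_html)

original = "<html><body>Hello</body></html>"
tampered = "<html><body>Hacked</body></html>"

assert is_unmodified(original, original)
assert not is_unmodified(original, tampered)
```

Note that a single whole-page digest can only say *that* the page changed, not *where*; this is the locating limitation mentioned above.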
Since the PCA-based Web page watermarking scheme was invented, several schemes have used fragile watermarking techniques to make Web pages tamper-proof[16, 17, 18, 19, 20]. The main advantage of this kind of approach is that the fragile watermark is embedded into the Web page through the upper and lower cases (ULC) of letters in HTML tags. However, some of these schemes suffer from performance, security, and sensitivity problems.
2.2 For the In-flight-target Model

Usually, when users or Web sites want to make sure pages stay safe in flight from the server to the browser, they choose the Hypertext Transfer Protocol Secure (HTTPS)[13], which provides end-to-end security by symmetrically encrypting each document the user requests. However, this secure protocol has some disadvantages. First of all, HTTPS significantly degrades performance because it does not support Web caching, an important Web technology; this also breaks the current distributed Web architecture and increases bandwidth cost on the Internet. Also, for many portal Web sites and Web applications, HTTPS seems too restrictive, since confidentiality is not critically important for them. Moreover, it is costly to deploy: Web sites that want to use HTTPS need to buy a high-priced valid certificate from a certificate authority (CA). In [12], the authors designed a toolkit named "Web Tripwires" to provide integrity for Web pages. The Web server sends three parts to users: a Web page, a JavaScript program (the Web tripwire), and a well-known representation (e.g., a hash value of the page). The toolkit automatically computes the hash value of the received page and compares it with the representation to check whether any changes exist. The advantage of this solution is that it is flexible, less expensive than switching to HTTPS, and requires no changes to current browsers. It also allows the user to detect precisely which modifications have been made by the attacker. However, this approach has a fatal drawback: it is not cryptographically secure. If attackers are aware of the technique, the Web tripwire can easily be removed or modified together with the Web page. In [14], the authors developed a new protocol called "SINE" to protect the integrity of transferred Web contents while still supporting the Web cache technique. The main idea is based on a hash-chain verification scheme for signing digital streams[15].
The Web server first divides the Web contents into k equally sized blocks. It then generates the hash chain from the bottom block to the top block, attaching to each block the hash value of the next one. This means that if block i is assumed to be correct, then block i+1 can be verified by computing the hash value of the (i+1)th block and comparing it with the hash value attached to the ith block. Finally, the hash value of the top block is signed with a digital signature to guarantee the integrity of the entire chain. When a user receives the document, it can be verified from top to bottom, block by block. However, even though this method can guarantee the integrity of the page and also supports Web cache technology, it has several disadvantages. In a limited-bandwidth network environment, it is costly to store and transmit the large number of hash values, and it cannot locate the unauthorized modifications precisely.
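The block-chaining construction above can be sketched as follows (a simplified illustration of the hash-chain idea for signed streams, with our own names; the digital signature over the top hash is omitted and the top hash value is used directly):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_chain(blocks):
    """Working from the bottom block up, attach to each block the hash of
    the block below it (computed over that block plus its own attached hash).
    Returns the chain as (block, attached_hash) pairs plus the top hash,
    which would be signed in the real protocol."""
    chain = []
    next_hash = b""                       # the bottom block has nothing below it
    for block in reversed(blocks):
        chain.append((block, next_hash))
        next_hash = h(block + next_hash)  # hash covers block plus attached hash
    chain.reverse()
    return chain, next_hash

def verify_chain(chain, signed_top_hash):
    """Verify top-down: each block is checked against the hash attached to
    the block above it (the signed top hash covers the first block)."""
    expected = signed_top_hash
    for block, attached in chain:
        if h(block + attached) != expected:
            return False
        expected = attached
    return True

blocks = [b"block-0", b"block-1", b"block-2"]
chain, top = build_chain(blocks)
assert verify_chain(chain, top)

# Tampering with any block breaks verification.
bad = list(chain)
bad[1] = (b"block-X", bad[1][1])
assert not verify_chain(bad, top)
```

The per-block overhead is visible here: every block ships a full digest, which is exactly the bandwidth cost criticized in the text.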
3. PRELIMINARIES

3.1 Web Cache

There are two types of Web caches: a browser cache and a proxy cache. In this paper, when we say cache, we mean the proxy cache, which is located near the users and stores copies of the Web server's contents passing through it, such as HTML pages, images, and even Web applications. When subsequent requests for these contents arrive, the cache delivers the locally stored copy of the content, avoiding a repeated download from the host server. This technology not only efficiently reduces host server bandwidth and workload but also lets users get responses quickly. As service-oriented Web applications become more popular, Web caching plays an increasingly important role in improving service quality for a large range of Internet users.
Figure 1: Adversary Model

3.2 PCA-based Web Page Watermarking

Principal Component Analysis (PCA)[21] is a multivariate technique that transforms a number of correlated variables into uncorrelated variables that still represent the characteristics of the original data. PCA has two main properties: (1) the principal vectors (PVs) are projection axes of the original data, and (2) the projected vectors express most of the features of the original data. In [16], the authors generate a fragile watermark from the HTML source code of a Web page using the PCA algorithm, and then embed the watermark by modifying the case of letters in HTML tags (Upper-Lower Coding, ULC). With this process, the watermarked Web page does not increase in file size. The most important drawback is that as the Web page grows larger, the computing time grows exponentially. Moreover, their scheme makes some HTML functions lose efficacy[18].

3.3 Fragile Watermarking-Chain Scheme

In this section we describe an important component of the integrity assurance service: the fragile watermarking-chain scheme that we proposed in [26]. The main process of authenticator generation (shown in the right part of Figure 4) is as follows.

• Segmentation. The Web server divides the HTML source code of the Web page into n segments with an equal number of rows (S1, S2, ..., Sn).

• Watermarking-chain Generation. The Web server generates the fragile watermark W(Xi) from the bottom to the top of the page with a key Kw. Padding is added to the bottom segment Sn as an END marker, so that the end of the watermarking chain can be detected in the subsequent verification process. W(Xi+1), the watermark of Xi+1, is attached to the segment Si, and this action is repeated until the top of the chain is reached. Here Xi is a data structure that contains the segment Si and its corresponding watermark value. Through this operation, a link between each pair of segments is created. X0, at the top, is a new segment that contains the watermark value of X1 and Kw. This segment is signed by the Web Server (WS)'s digital signature, obtained from the Security Service Provider (SSP) (introduced in detail in Section 5), to guarantee the integrity of the entire chain. (n is the number of segments of a page; T: timestamp; E: expiration date.)

• Watermarking Embedding. Every segment's watermark value is embedded into HTML tags cyclically using the Upper-Lower Coding (ULC) method. For example, with Wi = (1001) and Ti = "Web Page", after embedding, Ti is transformed into a Tiw in which the upper/lower case of each tag letter encodes one bit of Wi.

• Verification. Use the SSP's public key to verify the first segment X0. If it is correct, the correct W(X1) is obtained, which can verify the next segment X1 by generating W(X1) again and comparing it with the correct value. If no error occurs, this is repeated until the END marker is seen, incrementally verifying the integrity of each segment of the Web page.
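A minimal sketch of the chain generation, ULC embedding, and verification steps described above (our simplification of the scheme in [26]: the watermark function is modeled as a truncated keyed HMAC, the digital signature on X0 is omitted, the "|" delimiter and all names are ours, and the 1→uppercase / 0→lowercase ULC mapping is one possible convention):

```python
import hmac, hashlib

KW = b"secret-watermark-key"              # Kw, the watermarking key (hypothetical value)

def watermark(data: bytes, nbits: int = 32) -> str:
    """Fragile watermark W(X): a keyed HMAC truncated to nbits,
    returned as a bit string ready for ULC embedding."""
    mac = hmac.new(KW, data, hashlib.sha256).digest()
    return "".join(f"{b:08b}" for b in mac)[:nbits]

def build_fwc(segments):
    """Generate the chain bottom-to-top: each X_i carries segment S_i
    plus the watermark of X_{i+1} (the bottom segment ends the chain)."""
    chain = []
    w_next = ""                           # END: nothing below the bottom segment
    for seg in reversed(segments):
        x = seg + "|" + w_next            # X_i = (S_i, W(X_{i+1}))
        chain.append(x)
        w_next = watermark(x.encode())
    chain.reverse()
    return "X0|" + w_next, chain          # X0 carries W(X1); signed in the full scheme

def verify_fwc(x0, chain):
    """Verify top-down: regenerate W(X_i) and compare it with the value
    carried by the segment above."""
    expected = x0.rsplit("|", 1)[1]
    for x in chain:
        if watermark(x.encode()) != expected:
            return False
        expected = x.rsplit("|", 1)[1]
    return expected == ""                 # END marker reached

def ulc_embed(tag: str, bits: str) -> str:
    """Upper-Lower Coding: one bit per letter of an HTML tag,
    1 -> uppercase, 0 -> lowercase (browsers ignore tag-name case)."""
    out, i = [], 0
    for ch in tag:
        if ch.isalpha() and i < len(bits):
            out.append(ch.upper() if bits[i] == "1" else ch.lower())
            i += 1
        else:
            out.append(ch)
    return "".join(out)

segments = ["<h1>News</h1>", "<p>Body text</p>", "<div>Footer</div>"]
x0, chain = build_fwc(segments)
assert verify_fwc(x0, chain)

# Any edit to a segment breaks the chain at that point.
bad = list(chain)
bad[1] = "<p>Defaced</p>|" + bad[1].rsplit("|", 1)[1]
assert not verify_fwc(x0, bad)

print(ulc_embed("<body>", "1010"))        # -> <BoDy>
```

Because ULC only flips the case of existing letters, the embedded watermark adds zero bytes to the page, unlike attaching explicit hash values.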
4. SYSTEM AND ADVERSARY MODEL

4.1 Adversary Model

First we present the adversary model considered in our work, shown in Figure 1. The model is typically composed of three principal elements: (a) Web Server (WS), (b) Proxy/Cache Server (PS), (c) User (U). When a user sends a request to the Web server over HTTP, the proxy server checks whether the requested page is available in its own cache. If it is, the proxy server replies directly; otherwise, the proxy server forwards the request to the Web server, and the Web server sends a response back. The proxy then stores the Web page in the cache for the next user request. When necessary, for example under strong security concerns, the user can also establish a direct connection to the Web server through HTTPS, in which case all communication between user and server is encrypted and authenticated under a secret pairwise key. Figure 1 also shows both the traditional host-target and the new in-flight-target Web page attack models.
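The request flow just described can be sketched as a simple cache-aside lookup (illustrative only; the class and function names are ours):

```python
class ProxyCache:
    """Proxy/Cache Server from the adversary model: answer from the local
    cache when possible, otherwise forward the request to the Web Server."""
    def __init__(self, origin_fetch):
        self.cache = {}
        self.origin_fetch = origin_fetch    # callable standing in for the WS

    def get(self, url):
        if url in self.cache:               # cache hit: reply directly
            return self.cache[url], "hit"
        page = self.origin_fetch(url)       # cache miss: forward to the WS
        self.cache[url] = page              # store for the next request
        return page, "miss"

proxy = ProxyCache(lambda url: f"<html>page for {url}</html>")
assert proxy.get("/index.html")[1] == "miss"
assert proxy.get("/index.html")[1] == "hit"
```

In the in-flight-target model, this intermediary is exactly the component that cannot be trusted not to alter the cached copy.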
4.2 Attacker I: Host-target Model

Web servers are usually placed at the vanguard of a Web site's network architecture. This makes the Web server one of the main targets of attacks from various attackers. Among the most common results of such attacks is Web page tampering: totally changing page contents, modifying selected contents of a Web page, inserting tags, and especially inserting spyware or malware downloadable code, which is now called a drive-by-download attack. The Gumblar malware is a well-known example. The attacker compromises the Web server of a legitimate mainstream Web site and injects code into its Web pages to redirect users to malicious content on a pre-prepared malware site. It can thus infect a user's system automatically while the user is visiting the tampered Web page, probing various browser vulnerabilities and installing malware files or illegal content without any user interaction.
4.3 Attacker II: In-flight-target Model

Most Web pages are sent from Web servers to users over HTTP. Based on the measurement study by Reis et al.[12], a large number and variety of in-flight modifications happen to Web pages, and they often have bad consequences for users and publishers. In our model, we suppose that the end-to-end protection of an HTTPS channel is secure, but we do not trust any intermediate proxy servers between the Web server and the user on HTTP. This means a proxy should only be able to cache Web pages.
4.4 Design Requirements

The goal of our design is to enforce integrity protection in both models: the Web pages displayed in the user's browser must be exactly the ones the Web site originally sent. We designed the framework with the following requirements.

• Detecting and Locating. The system should be able to detect any unauthorized modification of a Web page, and also to locate the malicious modification precisely when required.

• Reduced Development Cost. We want to minimize the burden of system deployment expenses for SMEs. Threats from the Internet can cause especially severe losses to SMEs, yet they usually do not have enough resources to effectively enforce security measures.

• Cryptographic Mechanism. Whether or not intentional attacks occur, the mechanism should be neither bypassable nor removable by an attacker.

• Support for Web Caching Technology. The system should keep the Server-Proxy-User architecture for backward compatibility.

• High Performance. It should be faster than an HTTPS solution and should not increase the page size, thereby saving communication channel bandwidth and Web server storage space. Moreover, it should allow users to verify integrity incrementally while the Web page is being downloaded.
5. INTEGRITY ASSURANCE SERVICE

Because the integrity of Web pages is constantly neglected, and because maintaining it is not an easy, cheap, or enjoyable task for most organizations, we absorb the merits of "Security as a Service" (SaaS) to shift the workload of integrity enforcement to a Security Service Provider (SSP). In this section, we describe our integrity assurance service, including its IaaS architecture and the fragile watermarking chain (FWC) scheme.
5.1 System Overview

To guarantee the integrity of a Web page across the Internet, a Web server should have a valid certificate proving its authenticity. However, as mentioned in our design requirements, we want to reduce the system deployment cost for Web sites. Therefore, rather than relying on each Web site's own security software tools or technical expertise, we shift the task of integrity check enforcement to an SSP that stands between Web sites and users. At a high level, the architecture of our system is shown in Figure 2. We assume that most security service providers are major companies that can afford to buy, or already have, a valid certificate. Therefore, Web sites do not need to buy an expensive digital certificate themselves; they only contract with the SSP to obtain a one-stop integrity service at low cost. Every Web site contracts with the security service provider to get the integrity service for its Web pages. First, the Web site generates the original Web page on its own Publish Server (PS), which is on the intranet and is not directly connected to the Internet. The IaaS automatically synchronizes the contents of the PS to the Web server (WS) in the Demilitarized Zone (DMZ) through a one-way self-signed SSL channel. The DMZ then hosts and exposes the Web site's pages to the Internet over HTTP.
5.2 Demilitarized Zone (DMZ)

The DMZ is a subnetwork that contains and exposes an organization's external services to the untrusted Internet. It is a buffer zone for a Web site's local resources: an external attacker only has access to equipment in the DMZ, rather than to any other part of the network. The DMZ therefore contains and exposes the Web site's pages to the untrusted Internet over HTTP, while the external attacker can reach only the WS in the DMZ, not the Web site's PS. In the DMZ, the SSP enforces integrity in the Web page using the Fragile Watermarking Chain (FWC) scheme, an important component of the integrity assurance service, which we presented in [26]. After that process, a new page is generated that can prove its integrity by itself. We name this new page the anti-tamper page (ATP).
5.3 Logic Components

As shown in Figure 3, the system logically consists of three components: the Automatic Publish System, the Management System, and the Web Protection System. The publish server contains the Publish Module of the Automatic Publish System and the Management System. The Web server contains the Synchronization Module of the Automatic Publish System, and the Anti-Tamper Module and Intrusion Prevention Module of the Web Protection System. The following operating environment must be satisfied to secure the whole operation. All requests/replies intended for the Publish Server (PS) have to go via the DMZ, and any connection attempts to the PS from outside are prohibited. Since the PS is the protected target, any connection attempts to the Web should go via the WS in the DMZ, and responses from the PS should go back to the client via the DMZ. Any network connections initiated from the PS to the outside world are prohibited, so as to remove opportunities for installing a backdoor program through which an attacker could build a connection and transmit information to the outside world. These arrangements are reasonable for establishing the above operating environment.

Figure 2: The integrity serving architecture. WS and SSP have a contractual agreement that lets SSP synchronize Web pages from WS's server. The protocol illustrated is given in Section 5.4.

Figure 3: Schematic diagram of system components

5.4 Process Description

When the page goes out to the Internet, the validation process is as shown in Figure 4.

Step 0 The WS automatically synchronizes the contents with the PS and generates the FWC, then verifies the integrity before each publishing.

Step 1 A User (U) sends an HTTP GET request for a Web page.

Step 2 The WS sends the segments of the ATP ((S0)', ..., (Sn)') to U one by one through the Internet, which contains the proxy or cache server. After the first request for a page, the Web cache stores the page and the corresponding watermarking chain.

Step 3 When U gets the first segment X0 of the ATP ((S0)'', ..., (Sn)''), it uses the public key of the WS to verify it. If the verification succeeds, go to Step 4; if it fails, go to Step 6.

Step 4 Get the correct W(Xi) from the upper segment (i = 1, 2, ..., n). Then regenerate the watermark for Xi to obtain W(Xi)'. By comparing W(Xi) with W(Xi)', the user can verify whether Xi has been modified without authorization. If the verification succeeds, the user obtains the correct W(Xi+1) and repeats Step 4 until the last segment is finished. If it fails, go to Step 6.

Step 5 By comparing the number of segments received with the authenticated n from X0, U verifies whether the entire document has been correctly received.

Step 6 If any failure happens, an in-flight Web page change has been found. U stops rendering the page and sends a report to the WS.
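Steps 3 through 6 above can be sketched as an incremental check that runs while segments arrive (our simplification: the signature check on X0 is reduced to a callback, and the watermark function is a generic stand-in; all names are ours):

```python
import hashlib

def wm(payload: str, carried: str) -> str:
    """Stand-in watermark over X_i = (payload, carried watermark)."""
    return hashlib.sha256((payload + carried).encode()).hexdigest()[:8]

def incremental_verify(segments, watermark, signature_ok, expected_n):
    """Verify each segment as it arrives. `segments` yields
    (payload, carried_watermark) pairs with X0 first. Returns True only if
    the signature on X0 holds, every chain link matches, and exactly
    expected_n segments follow X0."""
    stream = iter(segments)
    x0_payload, expected = next(stream)           # Step 3: X0 carries W(X1)
    if not signature_ok(x0_payload):              # signature check on X0
        return False                              # Step 6: stop and report
    count = 0
    for payload, carried in stream:
        if watermark(payload, carried) != expected:   # Step 4: regenerate W(Xi)
            return False
        expected = carried                        # take W(X_{i+1}) from this segment
        count += 1
    return count == expected_n                    # Step 5: segment count matches n

# A tiny two-segment chain built by hand.
s2 = ("bottom segment", "")                       # END: carries no further watermark
s1 = ("top segment", wm(*s2))                     # S1 carries W(X2)
x0 = ("X0", wm(*s1))                              # X0 carries W(X1); signature assumed valid

ok = incremental_verify([x0, s1, s2], wm, lambda p: True, expected_n=2)
assert ok
assert not incremental_verify([x0, ("defaced", s1[1]), s2], wm, lambda p: True, 2)
```

Because each link is checked as soon as its segment arrives, the browser can abort rendering at the first mismatch rather than waiting for the whole page.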
5.5
Implementation and Measurement Introduction
Figure 4: Validation Process

To implement the system, we investigated two popular tamper-resistance technologies: "incident-triggered technology" and "core-embedded technology". Core-embedded technology is among the most popular anti-page-tamper technologies, and many anti-page-tamper systems adopt it because of its merits. When files flow out of the Web site, the core-embedded technology analyzes their integrity and recovers any tampered files, preventing illegal files from leaving the Web site. Combining the core-embedded and incident-triggered technologies is considered an ideal solution for a page anti-tamper system [31]. Therefore, in our work we focus on implementing our proposed architecture based on these technologies.

Since we assume that the bandwidth of the self-signed channel between the Web site and the security service provider is sufficient, the overhead of our system is mainly caused by the watermark generation and embedding function. We have implemented a prototype of the FWC component and measured the segment size and computing time cost [26]. The experimental results show that by choosing an appropriate number of rows per segment of the divided HTML source code, we obtain much better performance than the PCA-based Web page watermarking scheme [16]. For example, when we divided the test page into 11 segments (12 rows per segment, 132 rows of HTML code in total), generating and embedding the FWC into the page took 70.2 ms, whereas computing over the whole page with [16] took 1978.2 ms. The results also show that the proposed scheme achieves the same level of execution-time performance as the hash chain method used in the "SINE" protocol [14]. By adjusting the number of rows per segment, our method can also locate unauthorized modifications in HTML segments with high precision; at maximum precision it can locate a modification in one specific line. Regarding the increase of page size, the result is simple to compute: the SHA-1 hash method generates a 160-bit hash value for each segment, whereas our method adds 0 bits. We then compared our proposed scheme with the existing countermeasures in the in-flight-target model. The comparison shows that our proposal has a main advantage over the SINE protocol: it does not increase the size of the Web page, it locates modifications more precisely, and it matches the performance of the hash-based method. Compared with Web Tripwire, our scheme is cryptographically secure through the guarantee of a digital signature.
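The segment-division trade-off measured above can be illustrated with a small sketch. This is our own assumption of the measurement setup, using plain SHA-1 chaining as a stand-in for the full FWC generation; the function names split_into_segments and chain_cost are illustrative, not from the paper's prototype.

```python
import hashlib
import time

def split_into_segments(html: str, rows_per_segment: int):
    """Divide HTML source by rows, e.g. 132 rows / 12 rows per segment
    yields 11 segments, matching the configuration measured in the paper."""
    lines = html.splitlines(keepends=True)
    return ["".join(lines[i:i + rows_per_segment])
            for i in range(0, len(lines), rows_per_segment)]

def chain_cost(html: str, rows_per_segment: int):
    """Time the hash-chaining of all segments of a page.

    Returns (number of segments, elapsed seconds). Fewer, larger segments
    mean fewer hash invocations but coarser tamper localization.
    """
    segments = split_into_segments(html, rows_per_segment)
    start = time.perf_counter()
    digest = b""
    for seg in reversed(segments):  # chain from the last segment backwards
        digest = hashlib.sha1(seg.encode() + digest).digest()
    return len(segments), time.perf_counter() - start
```

With 12 rows per segment a modification is localized to a 12-row region; shrinking the segment to a single row gives line-level localization at the cost of more hash computations per page.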
6. MERITS OF ARCHITECTURE

The proposed Integrity Assurance Service module has the special advantage that it enforces integrity protection in both the host-target and the in-flight-target models. Furthermore, it has merits under the following evaluation criteria.
6.1 Development Cost
The decision to utilize a Public Key Infrastructure (PKI) solution is often based on a comparison of only the most obvious costs, such as license fees and hardware and software investment. However, these are only part of the total cost of ownership (TCO) of digital signature solutions. It is important to consider not only the initial one-time costs but the recurring annual costs as well. Digital certificates signed by Certification Authorities (CAs) tend to be used mostly by large and profitable Web sites or major companies; indeed, the most prevalent authentication issue with HTTPS is that most Web sites cannot afford certificates signed by a trusted CA. Research shows that, for most organizations of any size, implementing a digital signature solution with PKI could cost close to half a million dollars over a three-year period for only a thousand users. In our proposed architecture, we rely on the fact that most information security companies own digital certificates and can act as trusted third parties who verify the identity of servers and issue valid signed certificates (e.g., X.509 certificates) proving the ownership of a given public key by a server. Our proposed IaaS method can significantly reduce TCO, because with it organizations can eliminate the distribution, deployment, and ongoing upgrade of on-premise hardware. In addition, little electricity and cooling are required, bandwidth costs are lower, and built-in fault tolerance further eliminates the need for additional servers. Labor costs are also reduced: instead of paying for training, installation, management, and ongoing maintenance, the labor costs associated with the solution are focused on minimal staff training and administrative functions. The difference in labor costs alone is often sufficient to justify a move away from an on-premise solution, although other factors, including the virtual elimination of hardware costs, also play a major role. In the best case, our estimate shows that our proposal can provide all of these benefits while the Web site pays only 10% of the cost of an HTTPS solution.
6.2 Reliability

The IaaS approach is an effective architecture for reducing Web-based security threats and the internal data loss risks caused by unplanned downtime. For example, if the Web site's server goes down, the site is cut off and unprotected; in this situation, email can become a torrent of malware, and such a failure could quickly overwhelm an organization's network infrastructure. The Demilitarized Zone (DMZ) hosted by the SSP has the potential to counter this risk, since the DMZ can be deployed in a well-designed data center infrastructure. In order to accomplish this task, an SSP's data center infrastructure must be:

• Geographically dispersed and physically redundant. Even if a disaster destroys an individual data center, the remaining sites have the capacity and functionality to provide uninterrupted service.

• Physically secure. Each data center must employ redundant power and cooling systems, physical access control, server clustering, multiple Internet uplinks, and other security measures.

• Designed for maximum efficiency. The provider should be capable of routing customer traffic to the most appropriate data center, based on geographic location and Internet traffic conditions. This is an especially valuable capability for companies that must secure remote offices and mobile workers [28].
6.3 Effectiveness

Zero-hour attacks [27], including new malware variants and compromised malicious Web sites, can defeat regular security solutions. Shifting the workload of security protection to the SSP provides real-time threat detection and assessment tools and reduces the risk of data loss. The SSP usually has the ability to collect, classify, and correlate massive quantities of security intelligence data, and therefore has the real-time response capability to recognize zero-hour threats faster than a normal Web site can. Moreover, in our proposed architecture, the SSP can integrate the integrity protection with other security modules such as data encryption and static and dynamic malicious code detection, making our proposed system highly flexible.
6.4 Performance

Moving security into the DMZ can improve both performance and the strength of integrity enforcement. The Web site can focus on providing page content and let security professionals handle the integrity protection. Routing Web traffic through an SSP makes a great deal of sense: the unwanted Web traffic never reaches the Web site's local network and never impacts its bandwidth or network integrity. The SSP's global, distributed data center infrastructure is designed to minimize network latency by routing Web traffic based upon both geographic proximity (which data center is closest to a particular customer) and intelligent traffic analysis (how the Internet backbone traffic impacts latency) [28].
6.5 Privacy and Security
A common and serious concern is whether the SSP will expose sensitive data to unauthorized users or whether the SSP's own systems will fail. The best way to assess an SSP's privacy and security measures is through third-party certification procedures. The most relevant certification is ISO 27001 [29], which is designed specifically to "provide a model for establishing, implementing, operating, monitoring, reviewing, maintaining, and improving an information security management system." This certification process focuses on a number of key requirements, including the use of best practices to ensure the privacy, integrity, and availability of customer data, and a provider's willingness to submit its data center and related operations to periodic certification audits.
7. SECURITY ANALYSIS
Logically, the proposed integrity enforcement mechanism is known only to the original Web site and the SSP; in other words, it is unknown to outside attackers. This makes it very hard for an outside attacker to intrude into the system and damage the integrity of a Web page, because the attacker has little knowledge of the internal operations and cannot easily tell whether a page comes from the original Web site or from the SSP. Furthermore, without knowing the private key, any modification of the content of a Web page embedded with the Fragile Watermarking Chain (FWC) is hard to achieve and easy to detect. In the case of an attack from an insider of the Web site, the attacker may not only target the integrity of the Web page but also want to leave a "backdoor port" within the compromised server to gain further benefits; more malicious behavior could then be realized through this vulnerability, which lets the attacker connect to the compromised server. In our approach, this attack will not work, because the compromised server is isolated from the Internet by the Demilitarized Zone (DMZ): an attacker cannot initiate an outside connection to the compromised server unless the DMZ allows access to the relevant ports. In another case, the installed "backdoor port" automatically initiates a connection to an outside port controlled by the attacker. In this situation the connection starts from the WS, and it still cannot bypass the DMZ, which contains an Intrusion Prevention Module.
7.1 System Limitations
The proposed system is based on the assumption that the DMZ with the Intrusion Prevention Module is a secure environment. Moreover, the proposed system can only detect violations of the integrity of a Web page. This means that if an insider attacker of the Web site carries out an intrusion but does not modify the Web page content, the proposed method may fail to detect it. However, even when such an intrusion cannot be detected, the proposed method can still guarantee that the server response is consistent with its original design logic, so the leaking of information to an outside intruder can be mitigated. Our method ensures that the response of the server follows its original design. We do not consider the detection of intrusion techniques such as SQL injection attacks, which usually depend on negligent input validation; the Web server application still follows its pre-designed logic under SQL injection attacks [30]. These kinds of intrusion techniques should be attributed to programming negligence in failing to validate the input [8].
7.2 Comparison of Related Systems

Several existing tools and methods specifically aim at Web page defacement detection, some of which are or have been commercially available. Some of them are meant to run within the site to be monitored, and some to run remotely. A large part of the tools and methods that we considered are based on the same mechanism: a copy of the digest of the resource page to be monitored is kept in a safe environment; the resource page is then compared against the trusted copy, and an alert is generated whenever there is a mismatch. For efficiency reasons, the trusted copy is usually stored in the form of a hash or digital signature of the resource as originally posted. Clearly, whenever a resource is legitimately modified, its trusted copy has to be updated accordingly. Table 1 shows the comparison of our proposal with the related works.
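The trusted-copy mechanism shared by these tools can be sketched as follows. This is a hypothetical minimal monitor of our own design, not any particular tool's implementation; real systems add scheduling, alerting, dynamic-content handling, and a hardened store for the digests.

```python
import hashlib

class TrustedCopyMonitor:
    """Keep a digest of each monitored resource; flag any mismatch."""

    def __init__(self):
        self._digests = {}  # url -> trusted SHA-256 digest (hex)

    def register(self, url: str, content: bytes) -> None:
        """Store the trusted digest when the resource is (re)published.

        Must be called again after every legitimate modification,
        mirroring the trusted-copy update requirement noted above.
        """
        self._digests[url] = hashlib.sha256(content).hexdigest()

    def check(self, url: str, fetched: bytes) -> bool:
        """Return True iff the fetched resource matches its trusted copy.

        Unknown URLs fail the check, so unregistered resources
        cannot silently pass as untampered.
        """
        return self._digests.get(url) == hashlib.sha256(fetched).hexdigest()
```

Storing only the digest rather than the full page keeps the safe environment small, which is the efficiency argument made above; the cost is that the monitor can say a page changed but not where, unlike the segment-level localization of the FWC scheme.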
8. CONCLUSIONS

In this paper, we look back at the history of integrity protection for Web pages, from the conventional host-target model to the new in-flight-target model. We investigate conventional Web page defacement and in-flight page changes in the Web request-response architecture, and then classify and analyze their corresponding countermeasures. For in-flight-target unauthorized page modification, we design and implement a Fragile Watermarking Chain (FWC) scheme to enforce integrity protection for Web pages efficiently. The scheme supports Web caching technology and has two main merits: first, it saves Internet channel bandwidth and Web server storage space; second, it accurately locates the modified target in the HTML source code. We believe that our proposal has a significant advantage over existing approaches in environments with limited network bandwidth. We also investigate the Security as a Service concept and propose an Integrity Assurance Service system that helps Web sites ensure the integrity of Web pages, covering the host-target and in-flight-target models together. We then design and initially implement the Integrity as a Service architecture, in which the anti-tamper component is based on our FWC module, to show its practical aspect. As future work, we want to realize the service for mobile devices, which are battery-constrained and usually lack sufficient computing power.
9. ACKNOWLEDGEMENT We gratefully acknowledge the valuable advice from Professor Kouichi Sakurai, Yoshiaki Hori and Takashi Nishide
(Department of Informatics, Kyushu University, Japan). This work was partially supported by Honors Scholarship for Privately Financed International Students from the Japan Student Services Organization (JASSO) and a Grant-in-Aid for Scientific Research C (No.23500035) from the Japan Society for the Promotion of Science (JSPS).
10. REFERENCES
[1] J. Alpert, N. Hajaj, "We knew the Web was big," http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html, 2008.
[2] D. Choi, E. G. Im and C. W. Lee, "Intrusion-Tolerant System Design for Web Server Survivability," Information Security Applications, LNCS, Vol. 2908, 2003.
[3] Marcelo Almeida, "Defacements Statistics 2008-2009-2010," http://www.zone-h.org/news/id/4735, 2010.
[4] G. McGraw and G. Morrisett, "Attacking malicious code: A report to the Infosec research council," IEEE Software, Vol. 17, pp. 33-41, 2000.
[5] W. Kim, J. Lee, E. Park and S. Kim, "Advanced Mechanism for Reducing False Alarm Rate in Web Page Defacement Detection," The 7th International Workshop on Information Security Applications, 2006.
[6] F. Y. Wang, F. M. Gong, R. Sargor, K. G. Popstojanova, K. Trivedi, F. Jou, "SITAR: A Scalable Intrusion-Tolerant Architecture for Distributed Services - a technology summary," DARPA Information Survivability Conference and Exposition Proceedings, Vol. 2, pp. 153-155, 2003.
[7] F. Y. Wang, R. Uppalli, C. Killian, "Analysis of Techniques For Building Intrusion Tolerant Server System," IEEE Military Communications Conference, Vol. 2, pp. 729-734, 2003.
[8] D. Lin, Y. M. Chen, "Dynamic Web page protection based on content integrity," International Journal of Services and Standards, Vol. 3, No. 1, pp. 120-135, 2007.
[9] ModSecurity, "Open Source Web Application Firewall," http://www.modsecurity.org/.
[10] SecureIIS, "Web Server Protection," http://www.eeye.com/Products/SecureIIS-Web-Server-Security.aspx.
[11] Tripwire, "Software for Use on Web Servers," http://www.tripwire.com/.
[12] C. Reis, S. Gribble, T. Kohno, N. C. Weaver, "Detecting In-Flight Page Changes with Web Tripwires," The 5th USENIX Symposium on Networked Systems Design and Implementation, pp. 31-44, 2008.
[13] Wikipedia, "HTTP Secure (HTTPS)," http://en.wikipedia.org/wiki/HTTPS.
[14] C. Gaspard, E. Bertino, C. Nita-Rotaru, S. Goldberg, W. Itani, "SINE: Cache-Friendly Integrity for the Web," The 5th Workshop on Secure Network Protocols, 2009.
Table 1: Comparison of Web Page Integrity Protection Systems

Tool                    | Support Static | Support Dynamic    | Detect Host-target | Detect In-flight-target | Page Size | System
                        | Content        | Content            | Tampering          | Tampering               | Increment | Location
------------------------+----------------+--------------------+--------------------+-------------------------+-----------+--------------------
Tripwire [11]           | Yes            | No                 | Yes                | No                      | Yes       | Locally
Dynamic Protection [8]  | Yes            | Yes                | Yes                | No                      | Yes       | Remotely (latency)
Webagain [32]           | Yes            | No                 | Yes                | No                      | Yes       | Remotely
SigNet [33]             | Yes            | No                 | Yes                | No                      | Yes       | Remotely (latency)
Read-only Strategy [34] | Yes            | No                 | Yes                | No                      | Yes       | Locally
Goldrake [35]           | Yes            | Yes (low accuracy) | Yes                | No                      | Yes       | Remotely
IaaS (proposed)         | Yes            | Yes                | Yes                | Yes                     | No        | Remotely (latency)

[15] R. Gennaro, P. Rohatgi, "How to Sign Digital Streams," Proceedings of the 17th Annual International Cryptology Conference on Advances in Cryptology, pp. 180-197, 1997.
[16] Q. Zhao, H. Lu, "PCA-based Web page watermarking," Pattern Recognition, Vol. 40, pp. 1334-1341, 2007.
[17] X. Liu, H. Lu, "Fragile Watermarking Schemes for Tamperproof Web Pages," LNCS, Vol. 5264/2008, pp. 552-559, 2008.
[18] C. C. Wu, C. C. Chang and S. R. Yang, "An Efficient Fragile Watermarking for Web Pages Tamper-Proof," LNCS, Vol. 4537/2007, pp. 654-663, 2007.
[19] P. Sun and H. T. Lu, "Two Efficient Fragile Web Page Watermarking Schemes," Fifth International Conference on Information Assurance and Security, Vol. 2, pp. 326-329, 2009.
[20] X. Z. Long, H. Peng and C. Zhang, "A Fragile Watermarking Scheme Based on SVD for Web Pages," Proceedings of the 5th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 5248-5251, 2009.
[21] J. E. Jackson, "A User's Guide to Principal Components," Wiley Series in Probability and Mathematical Statistics, Applied Probability and Statistics, 1991.
[22] N. Vratonjic, J. Freudiger and J. P. Hubaux, "Integrity of the Web Content: The Case of Online Advertising," Workshop on Collaborative Methods for Security and Privacy, 2010.
[23] Y. Hollander, "Behavioral rules vs. signatures: Which should you use?" http://www.computerworld.com/securitytopics/security/story/0,10801,78828,00.html, 2003.
[24] M. Tanase, "The Future of IDS," http://www.securityfocus.com/infocus/1518, 2001.
[25] Y. Hollander, "The Future of Web Server Security," http://www.mcafee.com/us/local-content/white-papers/wp-future.pdf.
[26] P. Gao, T. Nishide, Y. Hori and K. Sakurai, "Integrity for the In-flight Web Page Based on A Fragile Watermarking Chain Scheme," 5th ACM International Conference on Ubiquitous Information Management and Communication (ICUIMC), 2011.
[27] Wikipedia, "Zero-day attack," http://en.wikipedia.org/wiki/Zero-day-attack.
[28] Websense, "Seven Criteria for Evaluating Security-as-a-Service (SaaS) Solutions," http://www.websense.com/assets/white-papers/whitepaper-seven-criteria-for-evaluation-security-as-a-service-solutions-en.pdf.
[29] ISO 27001 Security, http://www.iso27001security.com/.
[30] W. G. J. Halfond, J. Viegas and A. Orso, "A Classification of SQL Injection Attacks and Countermeasures," Proceedings of the IEEE International Symposium on Secure Software Engineering, March 2006.
[31] W. Gaoqi and X. Xiaoyao, "Research and solution of existing security problems in current internet Web site system," 2nd International Conference on Anti-counterfeiting, Security and Identification (ASID), August 2008.
[32] Web Again, http://www.lockstep.com/webagain/index.html.
[33] W. Fone and P. Gregory, "Web page defacement countermeasures," Proceedings of the 3rd International Symposium on Communication Systems Networks and Digital Signal Processing, pp. 26-29, July 2002.
[34] A. Cooks and M. S. Olivier, "Curtailing web defacement using a read-only strategy," Proceedings of the 4th Annual Information Security South Africa Conference, Midrand, South Africa, June/July 2004.
[35] A. Bartoli, E. Medvet, "Automatic Integrity Checks for Remote Web Resources," IEEE Internet Computing, Vol. 10, No. 6, pp. 56-62, Nov/Dec 2006.
[36] D. Pulliam, "Hackers deface federal executive board Web sites," http://www.govexec.com/story page.cfm?articleid=34812.
[37] J. Kirk, "Microsoft's U.K. Web site hit by SQL injection attack," http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9025941, 2006.
[38] G. Smith, "CRO Website hacked," http://www.siliconrepublic.com/news/news.nv?storyid=single7819, 2007.
[39] R. Mcmillan, "Bad things lurking on government sites," http://www.infoworld.com/article/07/10/04/Bad-things-lurking-on-government-sites 1.html, 2007.
[40] D. Dasey, "Cyber threat to personal details," http://www.smh.com.au/news/technology/cyber-threat-to-personal-details/2007/10/13/1191696235979.html, October 2007.
[41] PREFECT, "Congressional Web site defacements follow the state of the union," http://praetorianprefect.com/archives/2010/01/congressional-Web-site-defacements-follow-the-state-of-the-union/, 2010.
[42] L. Gordon et al., "2006 CSI/FBI Computer Crime and Security Survey," Computer Security Institute, 2006.
[43] R. Richardson, "2007 CSI Computer Crime and Security Survey," Computer Security Institute, 2007.
[44] G. Killcrece et al., "State of the Practice of Computer Security Incident Response Teams (CSIRTs)," tech. report CMU/SEI-2003-TR-001, ESC-TR-2003-001, Software Eng. Inst., Carnegie Mellon, 2003.
[45] G. Keizer, "Hackers Deface UN Site," Computerworld, http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9030318, August 2007.
[46] S. Aljawarneh, C. Laing and P. Vickers, "Verification of web content integrity: a new approach to protect servers against tampering," in M. Merabti (ed.), 8th Annual Postgraduate Symposium on the Convergence of Telecommunications, Networking and Broadcasting, 28-29 June, PGNET, pp. 159-164, 2007.
[47] S. Sedaghat, "Web authenticity," Master's Thesis, University of Western Sydney, Australia, 2002.
[48] A. Bartoli, G. Davanzo and E. Medvet, "The Reaction Time to Web Site Defacements," IEEE Internet Computing, Vol. 13, No. 4, pp. 52-58, July/Aug 2009, doi:10.1109/MIC.2009.91.
[49] Eric Medvet, "Techniques for large-scale automatic detection of web site defacements," 2008.