Security Issues in Cloud Computing Environments and Offering a Method Based on Data Mining for Increasing Content Security
1 Talin Azarian*, Department of Computer Engineering, Technical and Engineering Faculty, Saveh Branch, Islamic Azad University, Saveh, Iran, email: [email protected]
2 Amir Shahab Shahabi, Department of Computer Engineering, Technical and Engineering Faculty, South Tehran Branch, Islamic Azad University, Tehran, Iran, P. O. Box 11365/4435, email: [email protected]
Abstract: Cloud computing is one of the most exciting technologies, as it reduces the costs associated with computing while increasing the flexibility and scalability of computer processes. However, cloud-based services raise many major information policy problems, including issues of privacy, security, reliability, access, and regulation, of which security is the most important. The traditional security framework consists of three essential levels: operating system, services, and application. Web services contribute a new level to the framework, the business application layer. Since web services communicate with each other using XML-based messages defined by the Simple Object Access Protocol (SOAP), this research focuses on security at the web application level. It examines the factors that indicate an attack and then proposes a solution based on data mining, which cleans outlier data in order to detect an attack in the XML message. Finally, by recognizing the data distribution of the content, changes in the XML code are detected in order to avoid related attacks.
Key words: Cloud Computing Security, SOAP message, Data mining and Cloud Computing.
1. Introduction
Cloud computing is an Internet-based concept that delivers large, scalable computing resources as services over the Internet. It has many positive features, but it also has problems, of which security is one of the most critical and threatens the success of cloud computing. Web applications exchange information by means of SOAP messages. In a cloud environment, XML Signature is employed to secure SOAP messages; however, it has some weak points. One vulnerability is the modification of a SOAP message through unauthorized access, called an XML rewriting attack, for example by injecting new elements into the XML document in order to modify it. Hence, the main focus of this paper is security at the web application level.
In a previous study, Rahaman et al. introduced an inline approach for protecting the integrity of SOAP messages using a structure called SOAP Account [1]. Later, they presented a solution using a Check SOAP Account module [2]. The authors of [3] proposed an enhancement of the inline approach based on the position of SOAP message elements, using a tree-like structure. In other research, the authors proposed a SOAP Account solution that can be applied for the early detection of XML rewriting attacks, specifically in secure SOAP-based conversations [4]. The authors of [5] presented a way to perform signature wrapping attacks by applying the XML namespace injection technique. They demonstrated that the interplay of XML Signature, XPath, and the XML namespace concept contains severe flaws that can be exploited for an attack, and that XML namespaces in general pose real problems for digital signatures in the XML domain. In [6], researchers first built the structure of the SOAP message elements using an ontology and then attached it to the SOAP message header. Validating the ontology at the receiving end makes it possible to detect attacks early in the validation process. In this approach, all modifications to SOAP messages are also written to a log, so that after a security failure the log can be checked to recover from the effects of a successful attack.
This paper describes a proposed approach to securing XML web services. The solution is based on data mining and uses outlier data cleaning to detect attacks in XML messages. The paper is organized as follows: Section 2 surveys vulnerabilities in web services. Section 3 proposes a method for detecting attacks in XML code using data mining. Section 4 describes the execution of the proposed method. Section 5 summarizes the research results. Section 6 concludes the research. Section 7 describes further work.
2. Survey on vulnerability in web services
The general requirements for a secure system are integrity, confidentiality and availability. Any action that aims to violate one of these is an attack, and the possibility that an attack can occur is called a vulnerability. The traditional security framework essentially consists of three levels: operating system, services, and application. Web services add a new level to the framework, called the business application layer. Firewalls are effective at blocking traffic directed at the operating system and services levels; however, they cannot block web application level attacks, since traffic on ports 80 and 443 is legitimate. Application firewalls typically provide protection only against HTML and browser-based attacks, not against the XML message stream [7]. Web services standards, including SOAP, WSDL and XML Schema, are fundamentally based on XML. Over time XML has emerged as a rich and extensible representation that maximizes flexibility and ease of use. This ease of use and its text-based nature are also a liability, because they make it easy for an attacker to launch an attack [8]. Vulnerabilities are classified into two groups, described below.
2.1 Coercive Parsing
Coercive parsing is one of the simplest attacks to mount. It aims at exhausting the system resources of the attacked web service. The attacker simply sends a SOAP message with an unlimited number of opening tags in the SOAP Body; in other words, the attacker sends a very deeply nested XML document to the attacked web service. This is one of the more devastating denial of service attacks, although countermeasures are available. For the attack to work, the following conditions must hold:
A. The attacker knows the endpoint of the web service. The WSDL is not required, since the attack is focused solely on the XML parser. It does not matter whether the operations within the SOAP message are valid.
B. The attacker can reach the endpoint from their location. Access to the attacked web service is required. If the web service is only available to users within a certain company network, the scope of the attack is limited.
While denial of service attacks usually require a large number of messages, a coercive parsing attack can be launched on a web service with a single 2 KB malformed XML message, as shown: …
2.2 XML Digital Signature Attack
XML rewriting attacks [9] refer to the modification of a message by a malicious attacker while the XML signature remains valid. An attacker may perform such an attack on SOAP messages that use the trust and secure conversation frameworks. In the following, 13 kinds of these attacks are described.
1) XPath Injection
Web applications rely heavily on databases to store and access the data required for their operations. Historically, relational databases have been by far the most common technology for data storage, but in recent years databases that organize data using XML have become increasingly popular. Just as relational databases are accessed via SQL, XML databases use XPath as their standard query language. The XPath attack pattern was first published by Amit Klein [10] and is very similar to the usual SQL injection. XPath is a language designed and developed primarily to address parts of an XML document. XPath injection testing checks whether XPath syntax can be injected into a request interpreted by the application, allowing an attacker to execute user-controlled XPath queries. When successfully exploited, this vulnerability may allow an attacker to bypass authentication mechanisms or access information without proper authorization.
2) More Than Two SOAP Accounts
A SOAP Account can represent any kind of message structure information, so the list of its contents is not exhaustive; it is always extensible, depending on the context. To explain this kind of attack, the scenario described in [4] is used. Requestor A attaches SOAP Account information to the message before sending it to a legitimate receiver. Only two pieces of structure information are added: the number of child elements of the SOAP envelope and the number of header elements in the SOAP header. The SOAP Account is signed by A before the message is sent. Now consider a message excerpt after an attempted attack. Any legitimate receiver of the message (e.g. the STS) that complies with the SOAP Account approach computes the SOAP Account information of the received message as soon as it arrives. The SOAP Account computed by the STS contains the values 2 and 3. Note that the count of 3 includes the element inserted by the attacker. It does not match the attached SOAP Account information, in which the corresponding count is 2. The receiver can immediately detect that there has been a rewriting attack on this message and rejects the request for issuing a security context token.
3) Adding Bogus Tag
Consider another scenario described in [4]. Requestor A intends to have a conversation with service B. Requestor A requires a security context token, which can be acquired from the STS. The STS essentially builds the trust between A and B. To obtain the required security context token, A sends a SOAP message to the STS with an RST (RequestSecurityToken) in the body of the message. Since this is sensitive information, the RST element is signed [11] by requestor A. After the security context token is received, A may engage in a conversation with service B, which resides in a different trust boundary. Any malicious attacker between requestor A and the STS may capture the message, introduce its own element inside the header of the SOAP message, and copy the sensitive request information into it. It is noteworthy that the attacker does not change or modify any sensitive information and thus keeps the signature value intact for the STS. In addition, the attacker adds its own request for a custom token and request type to the body of the message. The STS may process the request by renewing the custom token, on the assumption that the token has been established beforehand. Thus, the attacker may possess a security token once the RSTR arrives in response from the STS. The STS may ignore or reject the request, but this attack enables the attacker to make an invalid request to the STS (Figure 1).
Figure 1: Secure Conversation of multiple messages without and with SOAP Account
4) Number of child elements of the root (Envelope) more than 20
Buffer overflows involve sending large amounts of data as input to the application. Usually, a fixed amount of memory is used to hold the user input. When the input is too long for the memory allocated to it, the application overwrites other instructions and abandons its normal behaviour. Attackers may even be able to get arbitrary code executed using this technique, which may allow them to execute instructions with the same privilege level as the application [12]. So, in XML code, if the number of child elements is greater than usual (for example, more than 20), this indicates the presence of an attack.
If the original XML code is available, it is compared with the requested XML code and the following four points (5-8) are checked; any variance indicates the presence of an attack:
5) Number of child elements of the root.
6) Number of header elements.
7) Number of references for the signing element.
8) Comparison of the requested SOAP message with the original SOAP Account.
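The structural checks in points 4 to 7 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary keys and the sample messages are hypothetical, and the threshold of 20 simply mirrors the example value given in the text. The sketch assumes SOAP 1.1 envelopes and the standard XML Signature namespace.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
MAX_ENVELOPE_CHILDREN = 20  # example threshold mentioned in the text


def structure_counts(soap_xml):
    """Count the structural features compared in checks 5 to 7."""
    envelope = ET.fromstring(soap_xml)
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    return {
        "envelope_children": len(list(envelope)),
        "header_elements": 0 if header is None else len(list(header)),
        "signature_references": len(envelope.findall(f".//{{{DSIG_NS}}}Reference")),
    }


def find_variances(original_xml, requested_xml):
    """Return the names of the checks on which the two messages disagree."""
    original = structure_counts(original_xml)
    requested = structure_counts(requested_xml)
    variances = [name for name in original if original[name] != requested[name]]
    if requested["envelope_children"] > MAX_ENVELOPE_CHILDREN:
        variances.append("too_many_envelope_children")  # check 4
    return variances


# Hypothetical messages: the tampered one carries one extra header element.
original = (
    f'<e:Envelope xmlns:e="{SOAP_NS}"><e:Header><Security/><Timestamp/></e:Header>'
    "<e:Body/></e:Envelope>"
)
tampered = original.replace("<Timestamp/>", "<Timestamp/><BogusHeader/>")
print(find_variances(original, tampered))  # ['header_elements']
```

Any non-empty result would set the corresponding components of the feature vector; an empty list means that none of these structural checks found a variance.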
9) Injection of RSA to Signature Algorithm Tag
An attacker can create a large RSA signing key with a small exponent, thus requiring that the verification key have a large exponent. This forces verifiers to spend considerable computing resources to verify the signature. Verifiers might avoid this attack by refusing to verify signatures that reference selectors with public keys having unreasonable exponents. In general, an attacker might try to overwhelm a verifier by flooding it with messages requiring verification. This is similar to other MTA denial-of-service attacks and should be dealt with in a similar fashion.
10) XSLT Transform Injection
XSLT is a complete programming environment and is therefore unsuitable for use in a digital signature technology. Using the base XSLT syntax, an attacker can specify loops that consume unbounded amounts of system resources or make outbound network connections. More dangerous is the fact that the majority of XSLT processors specify extension mechanisms that allow operations such as scripting, file system operations or even arbitrary code execution. As the XSLT transform is optional and cannot be relied on for interoperability, it should always be disabled or forbidden by schema validation prior to signature verification. If circumstances dictate that XSLT transforms must be used, extensions must be disabled in the XSLT processor, and mechanisms must be put in place to limit the total system resources that may be consumed by signature validation [13].
11) Attacking by Existing User Name and Without Password
Figure 2 shows an excerpt of malicious code that attacks without a password.
xmlns:SOAPSDK2="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAPSDK3="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"> <SOAPSDK4:GetProductInformationByName xmlns:SOAPSDK4="http://sfaustlap/ProductInfo/"> % 551-457-4487 ' or 1=1 or password='
Figure 2: Bypassing Authentication on "GetProductInformationByName"
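To show why the password value in Figure 2 bypasses authentication, the following Python sketch builds an XPath query by string concatenation, which is the vulnerable pattern. The query shape, the user name "alice" and the function names are assumptions made for illustration only; the paper does not show the service's actual query. The detector is a crude heuristic that would need tuning before real use, since legitimate values may also contain quotes or the word "or".

```python
import re


def build_login_query(user, password):
    """Naive query construction by concatenation (the vulnerable pattern)."""
    return f"//user[name='{user}' and password='{password}']"


# Quotes, boolean keywords and '=' let an input value escape its string literal.
SUSPICIOUS = re.compile(r"['\"]|\bor\b|\band\b|=", re.IGNORECASE)


def looks_like_xpath_injection(value):
    """Heuristic check of a single input value before it reaches the query."""
    return SUSPICIOUS.search(value) is not None


injected = build_login_query("alice", "' or 1=1 or password='")
print(injected)
# //user[name='alice' and password='' or 1=1 or password='']
print(looks_like_xpath_injection("' or 1=1 or password='"))  # True
print(looks_like_xpath_injection("551-457-4487"))            # False
```

Because "and" binds more tightly than "or" in XPath, the injected predicate reduces to a condition that is always true (1=1), so every user node matches regardless of the password.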
12) Attacking by Mentioning an Invalid Web Site in xmlns:web1: Tag
As described earlier, XPath injection, much like SQL injection, occurs when a malicious user inserts arbitrary XPath code into form fields and URL query parameters in order to inject this code directly into the XPath query evaluation engine. Doing so allows a malicious user to bypass authentication (if an XML-based authentication system is used) or to access restricted data from the XML data source [14]. Figure 3 shows an attacker sending their own site to the web service in order to confirm a user name and its bogus password. Hence, if a URL that differs from the supported URL appears after the "xmlns:web1=" declaration in the XML code, the code is treated as attacked.
Joe Johnson hacking_isfun
Figure 3: Attacking by Mentioning an Invalid Web Site in xmlns:web1: Tag (excerpt)
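The check for an unsupported URL in a namespace declaration can be sketched as an allow-list comparison. This is a minimal illustration in Python: the allow-list entries, the attacker URL and the function name are hypothetical, and a real service would list its own namespace URIs.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical allow-list of namespace URIs supported by the service.
SUPPORTED_NAMESPACES = {
    "http://schemas.xmlsoap.org/soap/envelope/",
    "http://example.com/webservice",
}


def unsupported_namespaces(xml_text):
    """Return (prefix, uri) pairs whose URI is not on the allow-list."""
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events report each namespace declaration as (prefix, uri).
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if uri not in SUPPORTED_NAMESPACES:
            found.append((prefix, uri))
    return found


message = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" '
    'xmlns:web1="http://attacker.example/login">'
    "<soap:Body><web1:Login/></soap:Body></soap:Envelope>"
)
print(unsupported_namespaces(message))
# [('web1', 'http://attacker.example/login')]
```

A non-empty result corresponds to the situation described for Figure 3, in which the URL bound to the web1 prefix is not the one supported by the service.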
13) Attacking by Extra Long Name or Long Name Space in XML
In a regular SOAP message, the components within an XML tag usually have a length of a few characters. Namespace declarations can be as long as a few hundred characters, but this does not usually pose a problem for an XML parser. However, when used maliciously, the components within an XML tag can be used to mount denial of service attacks. The attack is possible because the XML standard [15] does not limit the size of the components in XML tags, such as the length of an element name or the length of a namespace. The "XML Extra Long Names" attack is very simple to execute: all the attacker has to do is use a very long element name, attribute name or namespace. In a successful attack, the XML parser's buffer for element names, attribute names and namespaces overflows, which results in a denial of service. With regard to long namespaces: before a namespace prefix is declared, all attributes have to be read, because the namespace prefix declaration might later be overwritten by another namespace. If an attacker places many attributes in an element, a buffer overflow occurs in the XML parser before the namespace prefix is declared. An example of each attack is presented in Figures 4 and 5.
<?xml version="1.0" encoding="UTF-8"?> Attack_10000="XXXXXX">
Figure 5: "XML Namespace Prefix Attack"
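Deep nesting (coercive parsing), extra long names and an excessive number of attributes can all be screened with simple structural limits. The Python sketch below streams through a message and reports which limits are exceeded. The limit values and the function name are hypothetical and would have to be derived from the service's legitimate traffic. The sketch also relies on the underlying parser being robust, which is an assumption.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical limits; suitable values depend on the service's real messages.
MAX_DEPTH = 20
MAX_NAME_LENGTH = 256
MAX_ATTRIBUTES = 50


def scan_structure(xml_text):
    """Stream through the document and report which limits are exceeded."""
    problems = set()
    depth = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    try:
        for event, elem in ET.iterparse(source, events=("start", "end")):
            if event == "end":
                depth -= 1
                continue
            depth += 1
            if depth > MAX_DEPTH:
                problems.add("nesting_too_deep")
            # elem.tag holds "{namespace}localname", so this length covers both.
            if len(elem.tag) > MAX_NAME_LENGTH:
                problems.add("name_too_long")
            if any(len(name) > MAX_NAME_LENGTH for name in elem.attrib):
                problems.add("name_too_long")
            if len(elem.attrib) > MAX_ATTRIBUTES:
                problems.add("too_many_attributes")
    except ET.ParseError:
        problems.add("malformed")
    return sorted(problems)


deep = "<a>" * 50 + "</a>" * 50
wide = "<a " + " ".join(f'Attack_{i}="X"' for i in range(100)) + "/>"
print(scan_structure(deep))  # ['nesting_too_deep']
print(scan_structure(wide))  # ['too_many_attributes']
```

Each reported problem maps naturally to one attack factor, so the result can be used to set the corresponding components of a feature vector.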
Next, 40 samples of attacked code were surveyed for the factors described above and their feature vectors were constructed. When any of the aforementioned factors is detected in a sample, the corresponding component of the vector is set to 1; if the code contains no attack pattern, the 15th component is set to 1 and the others remain 0. A normal vector therefore has 14 components set to 0 and one component set to 1, while in a suspected vector one or more of the other components are set to 1. Then 150 normal samples were surveyed, and their normal vectors were created and added to the data. This forms the sample space for applying the T2 criterion, which is based on the following expression [16]:
….
The T2 criterion is a normalized radius of statistical data. Consider data distributed on an ellipse in an n-dimensional space (Figure 6-A); the aim is to convert it to a circle with radius 1 (the unit circle) by rotating and scaling it (Figure 6-B). As shown in Figure 6-B, the positions of the data points change, and the advantage of this conversion is that we deal with numbers between 0 and 1: whatever lies inside the unit circle is normal, and whatever lies outside it is abnormal. This is the basic idea of the T2 criterion.
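The idea of a normalized radius can be sketched in two dimensions. The Python example below assumes that the criterion takes the common Hotelling/Mahalanobis form T2 = (x - m)^T S^-1 (x - m), where m is the sample mean and S the sample covariance matrix; the exact expression from [16] is not reproduced here, so this form, the function names, the sample points and the threshold alpha are all assumptions made for illustration.

```python
def mean_and_covariance(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def t2_distance(point, mean, cov):
    """Squared normalized radius d^T S^-1 d, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def find_outliers(points, reference, alpha):
    """Points whose normalized radius exceeds alpha (assumed threshold convention)."""
    mean, cov = mean_and_covariance(reference)
    return [p for p in points if t2_distance(p, mean, cov) > alpha ** 2]


# Hypothetical reference data stretched along a diagonal ellipse.
reference = [(-2, -1), (-1, -1), (0, 0), (1, 1), (2, 1)]
print(find_outliers([(2, 1), (1, -1)], reference, alpha=2))  # [(1, -1)]
```

In this example the point (1, -1) is closer to the mean in Euclidean terms than (2, 1), yet it is flagged because it lies off the main axis of the ellipse. This is the effect of rotating and scaling the data before measuring the radius.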
3. Proposed Approach for Attack Detection based on Data Mining
First, an n-dimensional vector is constructed from the input XML files. If the code contains any pattern that increases the probability of an attack, the corresponding component of the feature vector is set to 1. If such code is assumed to be correct code, the last component of the vector is also set to 1; if the code is not attacked, only the 15th component of the feature vector is set to 1. The 15th component thus marks the normal codes. Each of the other components corresponds to one of the states described in the previous section.
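The construction of the feature vector can be sketched as follows. This Python example assumes 14 attack factors (coercive parsing plus the 13 XML rewriting kinds) numbered from 1 to 14 and a 15th component that marks normal code; the mapping of factors to component positions and the function name are assumptions, since the paper does not specify them.

```python
NUM_FACTORS = 14       # assumed: coercive parsing plus 13 XML rewriting factors
NORMAL_COMPONENT = 15  # 1-based position of the component marking normal code


def build_feature_vector(detected_factors):
    """Build a 15-component 0/1 vector from 1-based factor numbers."""
    vector = [0] * NORMAL_COMPONENT
    for factor in detected_factors:
        if not 1 <= factor <= NUM_FACTORS:
            raise ValueError(f"unknown factor {factor}")
        vector[factor - 1] = 1
    if not any(vector):
        # No attack pattern found: only the 15th component is set.
        vector[NORMAL_COMPONENT - 1] = 1
    return vector


print(build_feature_vector([]))       # normal code
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(build_feature_vector([3, 12]))  # two attack factors detected
# [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

Vectors built this way for all samples form the sample space to which the outlier criterion is applied.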
Figure 6-A: x^T C x = 1 (ellipse; data point (x1, x2))
Figure 6-B: y^T y = 1 (unit circle, R = 1; data point (y1, y2))
After this statistical criterion is applied to the proposed sample space, the vectors that fall outside the normalized distance based on α = 2σ are outliers, and the remaining points are selected as normal vectors. The outlier points indicate attacked code.
4. Dotted Graph for Feature Vector Created by MATLAB
There are 190 samples, of which 40 are attacked code and the rest are normal code. Based on the method described above, a MATLAB program was written to recognize the outliers in the n-dimensional space. In this program, the sample codes are first divided into 3 groups, of which the first group contains the attacked sample codes and the others are normal. Then, using the T2 criterion as the outlier detection method, the points are normalized with α = 1, and any point that falls outside this normal range is detected as an outlier and shown as a red crossed circle in the graph. The 15th component of a feature vector indicates that the corresponding code is normal; therefore, the other components are each projected against this component in 14 graphs. The red points are visible in all of them, with the outliers well separated from the normal points.
Figure 7: Projection of 1st component to 15th component
Figure 8: Projection of 2nd component to 15th component
Figure 9: Projection of 3rd component to 15th component
Figure 10: Projection of 4th component to 15th component
Figure 11: Projection of 5th component to 15th component
Figure 12: Projection of 6th component to 15th component
Figure 13: Projection of 7th component to 15th component
Figure 14: Projection of 8th component to 15th component
Figure 15: Projection of 9th component to 15th component
Figure 16: Projection of 10th component to 15th component
Figure 17: Projection of 11th component to 15th component
Figure 18: Projection of 12th component to 15th component
Figure 19: Projection of 13th component to 15th component
Figure 20: Projection of 14th component to 15th component
5. Results
According to the study and the execution results in Section 4, attacks were detected in 34 of 37 samples. This corresponds to an accuracy of 91.89% (34/37 ≈ 0.9189) for the presented method, which is reasonably acceptable.
6. Conclusion
In this study, the main focus was on security in XML messages. Web services communicate with each other using XML-based messages defined by the Simple Object Access Protocol (SOAP). SOAP messages are very vulnerable; for example, when a SOAP message is changed through unauthorized access, the attack is called an XML rewriting attack. As mentioned, firewalls block traffic only at the operating system and service layers and cannot block web application layer attacks, because traffic on ports 80 and 443 is allowed. Application firewalls protect only against browser-based and HTML-based attacks and lack protection against XML message attacks. In this research, the suspected cases of attacks on web services were classified into two main groups: the first is Coercive Parsing and the second is the XML Rewriting Attack / XML Digital Signature Attack, of which 13 kinds were identified. In other words, a total of 14 cases were identified. Based on the attacks identified for web services in the cloud computing environment, a method based on cleaning outlier data with data mining, using the T2 criterion and the MATLAB programming language, is proposed in order to detect these attacks and address this security challenge. Given its accuracy of 91.89 percent, the proposed method can be used in the new generation of intelligent firewalls in the cloud computing environment; the execution speed of the program is a further advantage that facilitates its online use.
7. Further work
In further research, machine learning mechanisms and neural networks can be used to distinguish between correct and attacked XML code submitted to the cloud and to show the difference clearly. The results can then be compared, in terms of accuracy and execution speed, with the statistical method presented in this paper.
8. Acknowledgement
The author wishes to thank Dr. Amir Shahab Shahabi for his comments and guidance and Islamic Azad University, Saveh Branch, for its financial support of this project.
9. References
[1] M. A. Rahaman, R. Marten, and A. Schaad. An inline approach for secure SOAP requests and early validation. OWASP AppSec Europe, 2006.
[2] M. A. Rahaman, A. Schaad, and M. Rits. Towards Secure SOAP Message Exchange in a SOA. In Workshop on Secure Web Services, 2006.
[3] Tawfiq S. Barhoom, Raed S. K. Rasheed. Detection of XML Rewriting Attack: Enhance Inline Approach by Element Position, 2011.
[4] Mohammad Ashiqur Rahaman, Andreas Schaad. SOAP-based Secure Conversation and Collaboration, 2007.
[5] Meiko Jensen, Lijun Liao, Jörg Schwenk. The Curse of Namespaces in the Domain of XML Signature, 2009.
[6] Aziz Nasridinov, Jeong-Yong Byun, Young-Ho Park. UNWRAP: An Approach on Wrapping-Attack Tolerant SOAP Messages, 2012.
[7] Srinivas Padmanabhuni, Vineet Singh, K M Senthil Kumar, Abhishek Chatterjee. Preventing Service Oriented Denial of Service (PreSODoS): A Proposed Approach. Proceedings of the IEEE International Conference on Web Services (ICWS'06), 2006.
[8] Weider D. Software Vulnerability Analysis for Web Services Software Systems. Proceedings of the 11th IEEE Symposium on Computers and Communications (ISCC'06), 0-7695-2588-1/06, 2006.
[9] A. D. G. G. O. Karthikeyan Bhargavan, Cédric Fournet. An advisor for web services security policies. 2005 Workshop on Secure Web Services, pages 1–9, Fairfax, VA.
[10] Amit Klein. Blind XPath Injection, 2004.
[11] D. S. Donald Eastlake, Joseph Reagle. XML Signature Syntax and Processing, http://www.w3.org/tr/xmldsig-core.
[12] Irfan Siddavatam and Jayant Gadge. Comprehensive Test Mechanism to Detect Attack on Web Services, 2008.
[13] Brad Hill. A Taxonomy of Attacks against XML Digital Signatures & Encryption, 2007.
[14] Chetan Soni. XPATH Injection, 2013.
[15] Leroy Metin Yaylacioglu. Business value einer web service firewall. Master's thesis, Hochschule für Angewandte Wissenschaften Hamburg, 2008.
[16] Nematolahi Nader. Engineering Probability and Statistics (book), 2007, pp. 220-240.