SQL-IDS: Evaluation of SQLi Attack Detection and Classification Based on Machine Learning Techniques Naghmeh Moradpoor Sheykhkanloo School of Science, Engineering, and Technology (SET) Abertay University Dundee, United Kingdom
[email protected] ABSTRACT A Structured Query Language injection (SQLi) attack is a code injection technique in which malicious SQL statements are inserted into a given SQL database simply by using a web browser. Injected SQL commands can alter the database and thus compromise the security of a web application. In our previous work, we proposed an effective pattern recognition Neural Network (NN) model for the detection and classification of SQLi attacks. The proposed model was built from a Uniform Resource Locator (URL) generator, a URL classifier, and a NN model. The URL generator was implemented to generate thousands of malicious and benign URLs. The URL classifier was employed to identify each generated URL as either benign or malicious, and it also pigeonholed the malicious URLs into seven popular SQLi attack categories. The NN model includes n hidden layers with x input and y output nodes, where the benign and malicious URLs were employed for the training, validating, and testing phases. According to our previously captured results, the proposed pattern recognition NN model demonstrated good performance in terms of accuracy, true-positive rate, and false-positive rate. In this paper, we stress test our previous proposal in order to prove the effectiveness of our proposed approach.
Categories and Subject Descriptors G.4 [Mathematics of Computing]: MATHEMATICAL SOFTWARE (MATLAB) and D.3.2 [PROGRAMMING LANGUAGES]: Language Classifications (C, C++, Java)
General Terms Algorithms, Design, Experimentation, Measurement, Performance, Reliability, Security, Verification
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from
[email protected]. SIN '15, September 08 - 10, 2015, Sochi, Russian Federation © 2015 ACM. ISBN 978-1-4503-3453-2/15/09…$15.00 DOI: http://dx.doi.org/10.1145/2799979.2800011
Keywords Intrusion Detection, SQL injection attacks, machine learning, Artificial Intelligence, pattern recognition, Neural Networks, Web Attacks
1. INTRODUCTION Many organisations store important, sensitive, and confidential information about themselves, their staff, their clients, and their business partners in databases all across the world. The stored data ranges from less sensitive information, such as first name, last name, and date of birth, to more sensitive information, such as username, password, pin code, and credit card information. Therefore, it is important for any organisation to protect its databases in order to prevent any loss. The CIA triad, comprising the three elements of Confidentiality, Integrity, and Availability, is a well-known model that can be used to develop a security policy for a given organisation. If a given database is attacked, for example by an SQLi attack, all three elements of the CIA triad can be violated. For instance, the data in the database can be disclosed to unauthorised users (a failure in Confidentiality), the data can be modified by the hackers (a failure in Integrity), or it can be completely wiped out from the database (a failure in Availability). SQLi attack is a web application vulnerability that arises in applications written with dynamic scripting technologies such as PHP: Hypertext Preprocessor (PHP), Active Server Pages (ASP), Java Server Pages (JSP), and Common Gateway Interface (CGI). It is a code injection technique where hackers try to inject crafted SQL commands via user inputs from a simple/personal web application to the back-end SQL database. SQLi attack has been rated as the number-one attack among the top ten web application threats by the Open Web Application Security Project (OWASP) [13-14]. OWASP is an open community dedicated to enabling organisations to conceive, develop, acquire, operate, and maintain applications that can be trusted. In our previous work [16], we proposed a NN-based model for the detection of SQLi attacks. The model was built from three elements: a URL generator, a URL classifier, and a NN model.
According to the published results, the proposed model successfully distinguished the malicious URLs from the benign URLs. We then improved the functionality of our model in our most recent work [18] by adding another level of intelligence, so that the model not only distinguished the malicious URLs from the benign URLs but also classified the malicious URLs into seven popular SQLi attack categories. According to our published results, the improved model performs well in terms of accuracy, true positive rate, and false positive rate. In this paper, we stress test our previous proposals from [16] and [18] in order to demonstrate the effectiveness of our proposed method. The remainder of this paper is organised as follows. In Sections 2 and 3, we review seven popular types of SQLi attack as well as related work on SQLi attack detection and prevention. Our previous proposal and the related implementation of the NN-based model for the detection and classification of SQLi attacks are briefly discussed in Sections 4 and 5, respectively. Section 6 includes the captured results after stress testing our previous proposal, which is followed by the conclusions of the work in Section 7, acknowledgments, and references.
2. SQL INJECTION ATTACK TYPES In this section, we briefly discuss the popular types of SQLi attack. We try to provide a clear understanding for each type along with the related signature(s) and the possible countermeasure(s). Please refer to [1] for more details on the SQLi attack classifications and countermeasures.
2.1 Tautologies Tautology is a type of SQLi attack where hackers try to bypass authentications, identify injectable parameters, and/or extract data from a targeted database using WHERE clause conditions which are always true in every possible interpretation. For instance: “WHERE password = ‘x’ OR ‘x’ = ‘x’” or “WHERE password = ‘x’ OR 1=1”. Therefore, the possible signatures for this type of attack are: string terminator “‘”, OR, =, LIKE and SELECT. Tautology SQLi attack can be prevented by strictly validating user inputs on user side and blocking queries containing tautological condition WHERE clauses on database side.
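As a minimal sketch of the tautology case, the following Python example uses the standard-library sqlite3 module. The `users` table, its columns, and both helper functions are hypothetical and exist only for illustration. The sketch contrasts a query built by string concatenation with a parameterised query, a commonly used defence that goes beyond the input validation discussed in this section.

```python
import sqlite3

def make_db():
    # Hypothetical in-memory database with a single user (illustrative only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn

def login_concatenated(conn, password):
    # Vulnerable: the user input becomes part of the SQL text.
    query = "SELECT name FROM users WHERE password = '" + password + "'"
    return conn.execute(query).fetchall()

def login_parameterised(conn, password):
    # The input is bound as a value and is never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE password = ?", (password,)
    ).fetchall()

conn = make_db()
payload = "x' OR 'x' = 'x"  # tautology signature from this section
print(login_concatenated(conn, payload))   # the WHERE clause is always true
print(login_parameterised(conn, payload))  # the payload is compared as a plain string
```

With the concatenated query the WHERE clause becomes `password = 'x' OR 'x' = 'x'`, so every row is returned even though the password is wrong.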
2.2 Illegal/logically incorrect queries Illegal/logically incorrect queries SQLi attack is a type of attack where hackers try to identify injectable parameters, perform database fingerprinting, and/or extract data from a database by employing illegal/logically incorrect queries. There are several ways to perform this type of SQLi attack against a given database, covering all the possible incorrect conversions and incorrect logics in SQL. Therefore, the possible signatures are: invalid conversions (CONVERT (TYPE)), incorrect logics, using the AND operator to perform incorrect logics, using ORDERBY, and incorrectly terminating the string using (‘), etc. Strictly validating user inputs on user side and stopping/sanitising the error messages generated by a given database are two effective countermeasures for preventing this attack.
2.3 Piggy-backed query Piggy-backed query SQLi is a type of attack where hackers aim to extract data, add or modify data, perform Denial Of Service (DoS) attacks, and/or execute remote commands on a given database by taking advantage of a misconfiguration in which the database allows multiple statements to be executed in a single query. Since this type of SQLi attack contains at least two queries (benign and malicious) joined together by the delimiter “;”, the signature for piggy-backed query SQLi is the delimiter “;”. Strictly validating user inputs on user side and avoiding multiple statement executions on a database by scanning all queries for the delimiter “;” on database side are two countermeasures for this attack.
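The two database-side ideas above can be sketched in Python with the standard-library sqlite3 module. The `items` table and both helper names are hypothetical. The sketch shows a naive scan for the delimiter and relies on the fact that sqlite3's `execute()` refuses to run more than one statement per call.

```python
import sqlite3

def contains_delimiter(query):
    # Naive scan for the piggy-backed signature ";" described above.
    return ";" in query

def run_single_statement(conn, sql):
    # sqlite3's execute() runs at most one statement; extra statements after
    # ";" raise an exception (the exact class depends on the Python version).
    try:
        conn.execute(sql)
        return True
    except (sqlite3.Warning, sqlite3.Error):
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")

benign = "SELECT * FROM items WHERE id = 1"
piggy_backed = "SELECT * FROM items WHERE id = 1; DROP TABLE items"

print(contains_delimiter(benign), run_single_statement(conn, benign))
print(contains_delimiter(piggy_backed), run_single_statement(conn, piggy_backed))
```

A plain substring scan would also flag a legitimate query that contains “;” inside a string literal, so it is best read as an illustration of the signature rather than a complete filter.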
2.4 Union query Union query SQLi is a type of attack where hackers try to bypass authentications and/or extract data from a given database by merging two separate SQL SELECT queries using the UNION SELECT statement. Thus, the signatures for this type of SQLi attack are the SQL keywords UNION and UNION SELECT. Strictly validating user inputs on user side and blocking multiple query executions at a time on database side are the two countermeasures for preventing this attack.
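A small sketch of the union query case, again using Python's standard-library sqlite3 module. The `products` and `users` tables and the `search_concatenated` helper are hypothetical. The injected UNION SELECT appends rows from a second table to the result of the intended query.

```python
import sqlite3

# Hypothetical schema: a public table and a sensitive one (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO products VALUES ('lamp')")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def search_concatenated(term):
    # Vulnerable: the search term is pasted into the SQL text.
    query = "SELECT name FROM products WHERE name = '" + term + "'"
    return conn.execute(query).fetchall()

# The payload closes the string literal and merges in a second SELECT.
payload = "x' UNION SELECT password FROM users WHERE 'a' = 'a"
rows = search_concatenated(payload)
print(rows)  # contains data taken from the users table
```

The attack only works when both SELECT statements return the same number of columns, which is why the payload selects a single column.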
2.5 Stored procedures Stored procedure SQLi attack is a type of attack where hackers aim to perform privilege escalation, DoS attacks, and/or remote commands using stored procedures. The signature for this attack is the same as that of the piggy-backed query SQLi attack, the delimiter “;”, together with stored procedure keywords such as: SHUTDOWN, exec, xp_cmdshell(), sp_execwebtask(), etc. Strictly validating user inputs on user side, using a low privileged account to run a database on database side, executing stored procedures with a safe interface on database side, and giving proper roles and privileges to stored procedures are some countermeasures to block and/or reduce the chance of a successful stored procedure SQLi attack.
2.6 Inference Inference SQLi attack is a type of attack where hackers aim to identify injectable parameters, extract data from a database, and/or determine the database schema by testing the possible vulnerabilities of a back-end database when no data is returned to an end-user from a slightly secured website. There are two popular types of inference SQLi attack, discussed as follows.
2.6.1 Inference blind SQLi attack
Inference blind SQLi attack is an error-based attack where hackers try to force a back-end database to throw an error message by asking true-false questions. An example is an “IF ELSE” statement in which either a division by zero is executed or else a valid instruction is performed. As division by zero is undefined, running such a query forces the back-end database to throw an error.
2.6.2 Inference timing SQLi attack
Inference timing SQLi attack is a time-based attack where hackers employ a time delay in order to distinguish between true and false responses from a given database. For instance, a true response received from a given database means that the time delay was executed successfully, while a false response means the hackers did not succeed in executing the time delay. Therefore, the possible signatures for this attack, including both inference blind and inference timing SQLi attacks, are: using the delimiter “;” with the AND operator, the IF ELSE conditional operator, and WAITFOR. Strictly validating user inputs on user side, carefully crafting the error messages returned from databases on database side, as well as patching/hardening databases can prevent this attack.
2.7 Alternate encoding Alternate encoding SQLi attack is a type of attack where hackers try to obscure their injected commands using encoding techniques such as ASCII, hexadecimal, and Unicode character encoding. Thus, the possible signatures for this attack are: exec(), Char(), ASCII(), BIN(), HEX(), UNHEX(), BASE64(), DEC(), ROT13(), etc. Strictly validating user inputs on user side, for instance prohibiting any usage of meta-characters such as “Char()”, and treating all meta-characters as normal characters on database side can prevent the alternate encoding SQLi attack. In terms of violating the three elements of the CIA triad, Inference SQLi attack and Alternate encoding are different from the other SQLi categories. For instance, Inference attack does not compromise the CIA of the data; it is rather a preliminary information gathering operation carried out by the attacker. Alternate encoding is a way of masquerading SQLi attacks of the other types. All the above attacks along with their signatures and preventions are listed in Table 1 [18]. Related work on SQLi attack detection and prevention is discussed in the next section.
3. RELATED WORK FOR SQL INJECTION ATTACK In this section, we address a number of papers related to SQLi attack detection and prevention techniques.
Table 1. SQL Injection Attack Types, Signatures, Preventions [18]

| No | Type of SQLi attack | Signature | Prevention |
|---|---|---|---|
| 1 | Tautologies | ‘, OR, =, like, select | strictly validating user inputs on user side; blocking queries containing tautological condition WHERE clauses on database side |
| 2 | Illegal/logically incorrect queries | invalid conversions (CONVERT (TYPE)), incorrect logics, AND, ORDERBY, ‘ | strictly validating user inputs on user side; stopping and/or sanitising the generated error messages (e.g. logical errors, type errors, and syntax errors) from a given database |
| 3 | Piggy-backed query | ; | strictly validating user inputs on user side; avoiding multiple statement executions on a database by scanning all queries for delimiter “;” on database side |
| 4 | Union queries | UNION, UNION SELECT | strictly validating user inputs on user side; blocking multiple query executions in a single statement on database side |
| 5 | Stored procedures | ;, stored procedure keywords (SHUTDOWN, exec, xp_cmdshell(), sp_execwebtask()) | strictly validating user inputs on user side; using a low privileged account to run a database on database side; executing stored procedures with a safe interface on database side; giving proper roles and privileges to stored procedures being used in a given application form |
| 6 | Inference attack | ;, AND, IF ELSE, WAITFOR | strictly validating user inputs on user side; carefully crafting error messages returned from databases on database side; patching/hardening databases |
| 7 | Alternate encoding | exec(), Char(), ASCII(), BIN(), HEX(), UNHEX(), BASE64(), DEC(), ROT13() | strictly validating user inputs on user side, for instance prohibiting any usage of meta-characters e.g. “Char()”; treating all meta-characters as normal characters on database side |
Authors in paper [2] used a Support Vector Machine (SVM) to detect and classify SQLi attacks. They employed parameters such as accuracy, detection time, training time, True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), and False Negative Rate (FNR) to measure the performance of their proposed technique, which shows 96.47% accuracy in the detection of SQLi attacks. Authors in paper [3] proposed a static analysis tool for Java Database Connectivity (JDBC) that verifies the correctness of dynamically generated query strings. Their proposed JDBC checker flags potential errors, or verifies their absence, in dynamically generated SQL queries with a low FPR. Authors in paper [4] proposed a query tokenisation algorithm for the detection of SQLi attacks that uses two arrays: one for the original query and one for the injected query. To detect an SQLi attack, they obtain the lengths of the arrays resulting from the two queries and compare them. If the two arrays have the same length there is no injection; otherwise there is an injection. Authors in paper [5] proposed the Random4 encryption algorithm, in which user inputs are converted into cipher text using a lookup table, for the detection of SQLi attacks. They compared their proposal with existing approaches such as AMNESIA [8], SQLrand [10], SQL DOM [9], WAVES [11], and SQLCHECK [12], taking into account parameters such as encoding, detection, and prevention. Authors in paper [6] proposed a Service Based SQL Injection Detection (SBSQLID) for the detection of SQLi vulnerabilities, where user inputs are retrieved from a web application and passed to a set of injection characters for pattern matching. If pattern matching returns false, the users are able to work with the web application; otherwise they are disallowed.
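The length comparison attributed to [4] above can be sketched as follows. This is one possible reading, not the authors' implementation: the whitespace tokeniser, the `build_query` template, and the harmless reference input are all assumptions made for illustration.

```python
def tokenise(query):
    # Assumption: tokens are whitespace-separated.
    return query.split()

def build_query(password):
    # Hypothetical query template that embeds the user input.
    return "SELECT * FROM users WHERE password = '" + password + "'"

def is_injected(user_input, reference_input="x"):
    # Compare the token array of the query built from a harmless reference
    # input with the token array of the query built from the real input.
    reference_tokens = tokenise(build_query(reference_input))
    actual_tokens = tokenise(build_query(user_input))
    return len(reference_tokens) != len(actual_tokens)

print(is_injected("secret"))           # same number of tokens
print(is_injected("x' OR 'x' = 'x"))   # extra tokens appear
```

With this simple tokeniser a payload written without spaces would produce the same number of tokens and go unnoticed, and a legitimate input containing spaces would be flagged, so the tokenisation rule matters a great deal in practice.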
Authors in paper [7] proposed TransSQL for the detection of SQLi attacks, where SQL requests are automatically translated to Lightweight Directory Access Protocol (LDAP)-equivalent requests. Both queries, the SQL query and the LDAP-equivalent query, are then executed on a SQL database and a LDAP database, respectively. Finally, TransSQL checks the difference between the responses from the two databases in order to detect SQLi attacks. Authors in paper [8] proposed AMNESIA for the detection and prevention of SQLi attacks by combining static analysis and runtime monitoring, where queries that violate the static model are classified as SQLi attacks and are prevented from accessing the back-end database. Authors in paper [9] proposed SQL DOM for compile-time checking, instead of runtime checking, of dynamic SQL statements. By using SQL DOM, application developers are able to build dynamic SQL statements through object manipulations, which are strongly typed to a database, without the need for string manipulations. Authors in paper [10] proposed SQLrand for the detection and prevention of SQLi attacks, where the standard SQL keywords are manipulated by attaching a randomised, hard-to-guess integer to them. Based on their captured results, the latency overhead imposed on each query by SQLrand is negligible, so it does not sacrifice performance. Authors in paper [11] proposed WAVES as a security assessment tool for the detection of SQLi and cross-site scripting attacks, in which a number of software testing techniques are employed. According to their captured results, WAVES is a feasible platform for assessing web application security. Authors in paper [12] proposed SQLCHECK for the prevention of SQLi attacks. Their proposed algorithm was evaluated on real-world web applications with real-world attack data as input, which showed that SQLCHECK produces no false negatives or false positives. SQLCHECK also has low run-time overhead and can be applied straightforwardly to web applications written in different languages. After studying the existing work related to SQLi attack detection and prevention techniques, we noticed that Artificial Intelligence (AI) is rarely employed in this field. AI has been successfully used in a wide range of fields including medical diagnosis, stock trading, robot control, law, remote sensing, scientific discovery, and toys.
AI studies how to create computers and computer software that are capable of intelligent behaviour just like human beings. NN is one of the popular AI algorithms and has been employed in various fields to perform complex functions that are difficult for conventional computers or human beings, for instance pattern recognition, identification, classification, speech, vision, and control systems. This motivates us to bring the SQLi attack detection and prevention problem into the AI field and particularly into the NN algorithm. Our ultimate research objective is to provide a NN-based Intrusion Detection and Prevention (ID/IP) tool that can be easily extended from SQL-IDS to any application level attack, e.g. Denial of Service (DoS), drive-by downloads, and Man-In-The-Middle (MITM) attacks. In our previous work [16], we proposed a NN-based model for the detection of SQLi attacks. Our proposed technique successfully classified a given URL as either a benign URL or a malicious URL by taking into account the popular SQLi attack keywords and URL patterns. In our most recent publication [18], we improved our previous work from [16] by adding another level of intelligence to our proposed NN-based model. Our improved model was able to: 1) distinguish the malicious URLs from the benign URLs and 2) detect the type of SQLi attack for the malicious URLs and classify them accordingly. In this paper, we stress test our previous work from [16] and [18] in order to demonstrate the effectiveness of our proposed method. Our previous proposal from [16] and [18] is briefly discussed in the next section.
Figure 1. Components of the proposed neural network-based model for detection and classification of SQLi attacks [18]
4. THE PROPOSED NEURAL NETWORK-BASED MODEL FOR THE DETECTION AND CLASSIFICATION OF THE SQL INJECTION ATTACKS Our previous proposal of a neural network-based model for the detection and classification of the SQLi attacks was built from three elements: a URL generator, a URL classifier, and a neural network model [16] and [18]. These three components are depicted in Figure 1 [18] and detailed as follows.
4.1 The URL Generator Addressing our previous proposal from [16] and [18], the URL generator has two elements: “Benign URLs” and “Malicious URLs”, Figure 1 [18]. The “Benign URLs” includes the most popular URL addresses in the UK, which have been taken from [17]. The “Malicious URLs” contains the malevolent URLs, which have been generated by adding the popular SQLi attack signatures from Table 1 [18] to the benign URLs.
4.2 The URL Classifier Addressing our previous proposal from [16] and [18], the URL classifier has three elements: “Benign”, “Malicious”, and “Type of attack”, Figure 1 [18]. In our proposal the URL classifier was accountable for: 1) classifying each URL into either a benign URL or a malicious URL and 2) detecting the type of the SQLi attack for each malicious URL. We have mathematically defined the URL classifier’s functionalities as follows.
Let a URL characteristic $r_i$ generated by the URL generator be defined by a random variable $R_i$ as follows:
$$R_i = \begin{cases} 1, & \text{if discovered by the SQLi signature detectors} \\ 0, & \text{if not discovered by the SQLi signature detectors} \end{cases}$$
Let C be a random variable representing the generated URL class, malicious or benign:
$$C \in \{\text{malicious}, \text{benign}\}$$
Every generated URL (both malicious and benign URLs) is assigned a vector defined by $\bar{r} = (r_1, r_2, \ldots, r_n)$ with $r_i$ being the result of the i-th random variable $R_i$.
Let a malicious URL characteristic $t_i$ generated by the URL generator be defined by a random variable $T_i$:
$$T_i = \begin{cases} 1, & \text{if discovered by the SQLi attack type detectors} \\ 0, & \text{if not discovered by the SQLi attack type detectors} \end{cases}$$
Let D be a random variable representing the type of the malicious URLs: “Tautologies”, “Illegal/logically incorrect queries”, “Piggy-backed query”, “Union queries”, “Stored procedures”, “Inference SQLi attack”, or “Alternate encoding”, Table 1 [18]:
$$D \in \{\text{Tautologies}, \text{Illegal/logically incorrect queries}, \text{Piggy-backed query}, \text{Union queries}, \text{Stored procedures}, \text{Inference SQLi attack}, \text{Alternate encoding}\}$$
Every generated malicious URL is assigned a vector defined by $\bar{t} = (t_1, t_2, \ldots, t_n)$ with $t_i$ being the result of the i-th random variable $T_i$.
4.3 The Neural Network (NN) Model Addressing our previous proposal in [16] and [18], the NN model has three elements: “Training phase”, “Validating phase” and “Testing phase”, Figure 1 [18]. This model comprises n layers (n hidden layers or neurons) with x input and y output nodes, Figure 2. The x input nodes are connected to the y output nodes via n hidden nodes using directed arrows. The connection values are called weights. In our proposal, the NN model uses a process called backpropagation, which is short for backward propagation of errors, to learn the weights. This process starts with a given set of input values, a given set of random weights, and a given set of desired output values. Using the random weights, the NN model first lets the nodes calculate some outputs. It then compares the calculated outputs with the desired outputs. The difference is called the network error. Once the network knows the error, it needs to adjust the weights so as to produce smaller errors, that is, outputs closer to the desired outputs. This is where the backpropagation process comes into play. We have mathematically defined the backpropagation process as follows. Let the new weight for the i-th node be defined by a random variable $W_{j,i}$ (left side of the arrow below), where $W_{j,i}$ (right side of the arrow below) is the node’s old weight, $\alpha$ is the learning rate, $a_j$ is the node’s input value, and $\Delta_i$ is the network error. The new weight is calculated and then adjusted as follows.
$$W_{j,i} \leftarrow W_{j,i} + \alpha \times a_j \times \Delta_i$$
The error for the i-th node, $\Delta_i$, is calculated as follows.
$$\Delta_i = (T_i - O_i) \times g'\Big(\sum_j W_{j,i}\, a_j\Big)$$
Here $T_i$ denotes the desired output of the i-th node (not the random variable of Section 4.2), $O_i$ its calculated output, and $g'$ the derivative of the node’s activation function $g$. In our proposal [16] and [18], the neural network model uses the benign and malicious URLs as input vectors for training, validating, and testing. For simplicity, the three elements of the NN model are put together in two groups, training elements and validating/testing elements, as follows.
4.3.1 Training Elements
‘Input’ matrix: this matrix includes the data that the NN model uses in the training stage. It comprises all the benign and malicious URLs which have been generated by the URL generator, Figure 1 [18].
‘Target’ matrix: this matrix includes all the decisions, that is, malicious or benign for all the URLs and the type of the SQLi attack (Tautologies, Illegal/logically incorrect queries, Piggy-backed query, Union queries, Stored procedures, Inference SQLi attack, Alternate encoding) for the malicious URLs. There is one decision for each string of data stored in the ‘Input’ matrix.
Fitness network: this is the NN model with n layers (n hidden layers or neurons) with x input and y output nodes, where the data from the ‘Input’ and ‘Target’ matrices are used for training, validating and testing, accordingly.
4.3.2 Validating/Testing Elements
‘Sample’ matrix: this matrix contains sample data from the ‘Input’ matrix. The trained NN model uses ‘Sample’ data as input values during the validation stage. ‘Output’ matrix: this matrix contains output data for the data in the ‘Sample’ matrix. The trained NN model predicts the output values for ‘Sample’ matrix and stores them in the ‘Output’ matrix.
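The weight update rule from Section 4.3 can be illustrated with a short Python sketch for a single output node. It takes $T_i$ as the desired output, $O_i$ as the calculated output, and $g$ as a sigmoid activation; the sigmoid and the function names are assumptions, since the text does not name the activation function.

```python
import math

def g(x):
    # Sigmoid activation (assumed; the text does not specify g).
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    # Derivative of the sigmoid.
    s = g(x)
    return s * (1.0 - s)

def update_weights(weights, inputs, target, alpha):
    # weights[j] plays the role of W_{j,i} for one node i,
    # inputs[j] plays the role of a_j.
    weighted_sum = sum(w * a for w, a in zip(weights, inputs))
    output = g(weighted_sum)                           # O_i
    delta = (target - output) * g_prime(weighted_sum)  # Delta_i
    new_weights = [w + alpha * a * delta for w, a in zip(weights, inputs)]
    return new_weights, delta

weights = [0.0, 0.0]
inputs = [1.0, 0.0]
new_weights, delta = update_weights(weights, inputs, target=1.0, alpha=0.5)
print(new_weights, delta)
```

Only the weight whose input is non-zero changes, and repeating the update moves the node's output towards the target.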
The implementation of our previous proposal, [16] and [18], along with the changes that we made to the input data before rerunning it, is briefly discussed as follows.
5. IMPLEMENTATIONS In our previous work [16] and [18], we have proposed a NN-based model for the detection and classification of the SQLi attacks. Our proposal includes three components: a URL generator, a URL classifier, and a NN model each with different elements, Figure 1 [18]. These three components have been implemented as follows.
5.1 The URL Generator As discussed before, the URL generator includes two elements: “Benign URLs” and “Malicious URLs”, Figure 1 [18]. The “Benign URLs” contains a list of the benign URL addresses, while the “Malicious URLs” were generated by adding the SQLi attack signatures, Table 1 [18], to the benign URLs.
5.1.1 The “Benign URLs” In this paper, we stress test our previous work, [16] and [18], in order to prove the effectiveness of our previous proposal in terms of accuracy, false-positive rate, and false-negative rate by using different input data. To achieve this, for the “Benign URLs”, Figure 1 [18], we take into account two lists. The first list contains benign URLs with absolutely no SQLi attack signatures, while the second list contains benign URLs that do include SQLi attack signatures, Table 1 [18]. For simplicity, we name the first list List1 and the second list List2. As an example of a benign URL for List1, consider Google’s URL address in the UK [20]. This is a benign URL with absolutely no SQLi attack signature. As an example of a benign URL for List2, consider the European Union’s URL address in Wikipedia [19]. This is a benign URL address, but it contains the keyword “union”, so our proposed NN model could falsely detect it as a potential SQLi attack and classify it as a “Union queries” attack. This is called a false positive. To compile these two lists, List1 and List2, we take into account 500 real website addresses in the UK, most of which have been captured from [17]. This includes 340 URLs for List1 (benign URLs with no SQLi attack signature) and 160 URLs for List2 (benign URLs with SQLi attack signatures).
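The difference between List1 and List2 can be illustrated with a naive keyword check in Python. The three URL strings are illustrative assumptions (the exact addresses behind [19] and [20] are not reproduced in the text), and the substring test is only a stand-in for a signature detector.

```python
def naive_union_match(url):
    # Case-insensitive substring test for the "union" signature.
    return "union" in url.lower()

# Assumed example URLs, for illustration only.
list1_example = "https://www.google.co.uk"
list2_example = "https://en.wikipedia.org/wiki/European_Union"
malicious_example = "https://www.google.co.uk/?id=1 UNION SELECT password FROM users"

for url in (list1_example, list2_example, malicious_example):
    print(url, naive_union_match(url))
```

The List2-style URL matches the signature even though it is benign, which is exactly the kind of input that puts pressure on the false-positive rate.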
5.1.2 The “Malicious URLs”
In order to generate the “Malicious URLs”, Figure 1 [18], we simply add the SQLi attack signatures, Table 1 [18], to the benign URLs. For instance, for the “Union queries” signatures, a generated malicious URL can be a benign URL that has been combined with the word “UNION” or the words “UNION SELECT”. Likewise, adding the “;, AND, IF, ELSE, WAITFOR” signatures to a benign URL generates a malicious URL which can be identified as an “Inference SQLi” attack. Please refer to Table 1 [18] for the SQLi attack signatures. In our scenario, the total number of benign URLs is 500, while the total number of malicious URLs comes to 12,500.
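The generation step described above might be sketched as follows. The original generator was written in PHP and its exact rules are not given, so the way a signature is appended here (as a value after a hypothetical `?id=1` parameter), the example URLs, and the function name are all assumptions.

```python
# Two of the signature groups from Table 1, used for illustration.
SIGNATURES_BY_TYPE = {
    "Union queries": ["UNION", "UNION SELECT"],
    "Inference SQLi attack": [";", "AND", "IF", "ELSE", "WAITFOR"],
}

def generate_malicious_urls(benign_urls, signatures_by_type):
    # Pair every benign URL with every signature and keep the attack type
    # as the label for later classification.
    generated = []
    for url in benign_urls:
        for attack_type, signatures in signatures_by_type.items():
            for signature in signatures:
                generated.append((url + "?id=1 " + signature, attack_type))
    return generated

# Hypothetical benign URLs.
benign_urls = ["https://www.example.co.uk/page", "https://www.example.org.uk/news"]
generated = generate_malicious_urls(benign_urls, SIGNATURES_BY_TYPE)
print(len(generated))
print(generated[0])
```

In the paper's scenario, 500 benign URLs yield 12,500 malicious URLs, an average of 25 per benign URL; how those 25 variants are chosen is not specified, so the sketch does not try to reproduce that count.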
5.2 The URL Classifier As discussed before, the URL classifier includes three elements: “Benign”, “Malicious”, and “Type of attack”, Figure 1 [18]. The URL classifier is accountable for: 1) classifying each URL as either a benign URL or a malicious URL and 2) detecting the type of the SQLi attack for each malicious URL. These two tasks are coded and implemented based on strings of logic, where 1 represents true/malicious and 0 represents false/benign, by allocating two vectors, $\bar{r} = (r_0, r_1, \ldots, r_{31})$ and $\bar{t} = (t_0, t_1, \ldots, t_7)$, to each single URL in our input data set. For instance, addressing the signatures for the “Tautologies” SQLi attack, Table 1 [18], if a URL includes the “ ’ ”, “OR”, “=”, “like” and “select” keywords, it will be classified as a malicious URL with the value “11111000000000000000000000000000” for $\bar{r}$, where $r_0$ represents “ ‘ ”, $r_1$ represents “OR”, $r_2$ represents “=”, $r_3$ represents “like”, and $r_4$ represents “select”. Moreover, given that this is a “Tautologies” SQLi attack, which is attack type 1 in our scenario, the value of the $\bar{t}$ vector is “01000000”. The $\bar{r}$ and $\bar{t}$ components are shown in Table 2 [18] and Table 3 [18], respectively.
Figure 2. An artificial neural network model
Figure 3. Network Architecture for the Neural Network component of the proposed model

Table 2. Assigned Vectors To The SQLi Attack Signatures [18]

| Vectors | SQLi attack signatures |
|---|---|
| r0 | ‘ |
| r1 | or |
| r2 | = |
| r3 | like |
| r4 | select |
| r5 | convert |
| r6 | int |
| r7 | char |
| r8 | varchar |
| r9 | nvarchar |
| r10 | incorrect logics |
| r11 | and |
| r12 | orderby |
| r13 | ; |
| r14 | union |
| r15 | union select |
| r16 | shutdown |
| r17 | exec |
| r18 | xp_cmdshell() |
| r19 | sp_execwebtask() |
| r20 | if |
| r21 | else |
| r22 | waitfor |
| r23 | - |
| r24 | ascii() |
| r25 | bin() |
| r26 | hex() |
| r27 | unhex() |
| r28 | base64() |
| r29 | dec() |
| r30 | rot13() |
| r31 | * |

Table 3. Assigned Vectors To The SQLi Attack Type [18]

| Vectors | SQLi attack type |
|---|---|
| t0 | Benign |
| t1 | Tautologies |
| t2 | Illegal/logically incorrect queries |
| t3 | Piggy-backed query |
| t4 | Union queries |
| t5 | Stored procedures |
| t6 | Inference SQLi attack |
| t7 | Alternate encoding |
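The encoding of a URL into the two logical vectors can be sketched in Python as below. The signature and attack type orders follow Table 2 and Table 3; the function names, and the choice to pass in the set of already detected signatures instead of implementing a detector, are assumptions made to keep the sketch small.

```python
# Signature order follows Table 2 (r0 ... r31). r23 is written as it appears
# in the extracted table; r10 ("incorrect logics") is not a literal keyword.
SIGNATURES = [
    "'", "or", "=", "like", "select", "convert", "int", "char",
    "varchar", "nvarchar", "incorrect logics", "and", "orderby", ";",
    "union", "union select", "shutdown", "exec", "xp_cmdshell()",
    "sp_execwebtask()", "if", "else", "waitfor", "-", "ascii()", "bin()",
    "hex()", "unhex()", "base64()", "dec()", "rot13()", "*",
]

# Attack type order follows Table 3 (t0 ... t7).
ATTACK_TYPES = [
    "Benign", "Tautologies", "Illegal/logically incorrect queries",
    "Piggy-backed query", "Union queries", "Stored procedures",
    "Inference SQLi attack", "Alternate encoding",
]

def encode_r(detected_signatures):
    # One bit per signature: 1 if it was detected in the URL, 0 otherwise.
    return "".join("1" if s in detected_signatures else "0" for s in SIGNATURES)

def encode_t(attack_type):
    # One bit per class, with a single 1 at the position of the class
    # (the one-hot form for "Benign" is an assumption based on Table 3).
    return "".join("1" if t == attack_type else "0" for t in ATTACK_TYPES)

r_vector = encode_r({"'", "or", "=", "like", "select"})
t_vector = encode_t("Tautologies")
print(r_vector)
print(t_vector)
```

For the tautology example in the text, the first five bits of the 32-bit vector are set and the type vector has its single 1 at position t1.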
5.3 The NN Model
As discussed before, the NN model includes a “Training phase”, a “Validating phase” and a “Testing phase”, Figure 1 [18]. It comprises n hidden nodes with x input and y output nodes. The NN model uses the benign and malicious URLs, which are generated by the URL generator and classified by the URL classifier, for the training, validating and testing phases. In this paper, we implement a NN model with 10 hidden nodes, where 70%, 15%, and 15% of the total benign and malicious URLs are used for training, validating and testing, respectively. MATLAB [15], a popular software package for numerical computation with a vast library of functions and algorithms, is used to develop, train, validate, and test our previous proposal with a new input data set. The training and validating components are configured as follows.
5.3.1 Training Elements
‘Input’ matrix: this is a logical n × 32 matrix in which the data is represented as strings of logical values; 1 as true and 0 as false.
‘Target’ matrix: this is a logical n × 8 matrix in which the data is represented as logical values; 1 as malicious and 0 as benign.
Fitness network: this is the NN with 10 hidden nodes, where 70%, 15%, and 15% of the data from the ‘Input’ and ‘Target’ matrices are used for training, validating and testing, respectively.
5.3.2 Validating/Testing Elements
‘Sample’ matrix: a logical n x 32 matrix that contains sample data from the ‘Input’ matrix.
‘Output’ matrix: a logical n x 8 matrix that contains the output data for the data represented in the ‘Sample’ matrix.
The trained NN model predicts the output value for the ‘Sample’ matrix, that is, whether a URL is benign or malicious and, for a malicious URL, the type of SQLi attack. These predictions are stored in the ‘Output’ matrix.
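As a rough illustration of how one row of the ‘Output’ matrix could be read, the Python sketch below picks the position with the largest value and maps it to a label. The function name and the score values are hypothetical, and the mapping (position 0 for benign, position k for attack Type k) is an assumption based on the t-vector layout of Table 3, not a description of the MATLAB implementation.

```python
def decode_output(scores):
    """Map one 8-element output row to 'benign' or 'Type k'.

    Assumes position 0 stands for benign (t0) and position k for Type k,
    and that the predicted class is the position with the largest value.
    """
    if len(scores) != 8:
        raise ValueError("expected 8 output values")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return "benign" if best == 0 else f"Type{best}"


# Made-up output rows, for illustration only.
output_rows = [
    [0.91, 0.02, 0.01, 0.01, 0.02, 0.01, 0.01, 0.01],
    [0.03, 0.88, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01],
    [0.01, 0.01, 0.01, 0.02, 0.02, 0.03, 0.05, 0.85],
]
labels = [decode_output(row) for row in output_rows]
print(labels)
```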
Figure 4. Confusion matrix for training
The captured results are discussed in the next section.
6. RESULTS
As discussed earlier, our previous proposal, [16] and [18], for the detection and classification of the SQLi attacks includes three components: 1) a URL generator, 2) a URL classifier, and 3) a NN model. In this paper, we stress test our previous proposal in order to demonstrate the effectiveness of our technique. For this, we take into account 13,000 URL addresses, including 500 benign URLs and 12,500 malicious URLs. The benign URLs are real URL addresses, mostly captured from [17], while the malicious URLs are malevolent URL addresses comprising the SQLi attack signatures, Table 1 [18]. The malicious URLs are generated by the URL generator using the PHP language.
As discussed in the previous section, in order to stress test our proposals, [16] and [18], we take into account two lists of benign URL addresses for our implemented scenario, List1 and List2. List1 contains 340 benign URLs and List2 contains 160, which comes to 500 benign URLs in total. List1 includes the benign URL addresses that have absolutely no SQLi attack signatures, while List2 contains the benign URLs that do have SQLi attack signatures. The entire set of 13,000 URLs, including 500 benign URLs (340 from List1 and 160 from List2) and 12,500 malicious URLs, is then classified by the URL classifier, which labels each URL as either benign or malicious. The URL classifier also detects the type of SQLi attack for each malicious URL by taking into account the seven popular SQLi attacks, Table 1 [18]. Finally, we re-train, re-evaluate, and re-test our proposed NN-based model for the detection and classification of the SQLi attacks using MATLAB [15]. The NN model has 10 hidden nodes, 32 input features, 7 output layer, and 8 output features, Figure 3. The captured results are as follows.
Figure 5. Confusion matrix for validating
Figure 6. Confusion matrix for testing
The confusion matrices for training, validating, and testing are shown in Figure 4, Figure 5, and Figure 6, respectively, where, for each class, the number of correct responses is shown in green squares and the number of incorrect responses is shown in red squares. In our implementation, we map class 1 to benign URLs and classes 2-8 to Type1-7 malicious URLs, Table 1 [18]. For instance, class 2 represents the Type1 SQLi attack, which is Tautologies, while class 8 represents the Type7 SQLi attack, which is Alternate encoding, Table 1 [18]. The grey squares, which are placed at the end of each row and each column, illustrate the percentages of the accuracies (upper numbers) and inaccuracies (bottom numbers) for both output and target classes. The lower-right blue squares illustrate the overall accuracies (upper numbers) and overall inaccuracies (bottom numbers) by taking into account the total accuracies and inaccuracies in output and target classes.
In our configuration scenario, the 13,000 URLs (500 benign URLs, 340 from List1 and 160 from List2, plus 12,500 malicious URLs) are distributed over the three phases of training, validating, and testing with distribution rates of 70%, 15%, and 15%, respectively. Taking these distribution rates into account, our output results are discussed as follows.
The training phase receives 9,100 out of 13,000 URLs. This includes 0 benign URLs and 9,100 malicious URLs, which are 0% and 100% of the total URLs used in this phase, respectively. The malicious URLs comprise 377 URLs from SQLi type2, 1,821 URLs from SQLi type3, 610 URLs from SQLi type4, 688 URLs from SQLi type5, 1,070 URLs from SQLi type6, 1,759 URLs from SQLi type7, and 2,775 URLs from SQLi type8. The results are 98.1% correct and 1.9% incorrect responses for SQLi type2, 96.5% correct and 3.5% incorrect responses for SQLi type3, 54.6% correct and 45.4% incorrect responses for SQLi type4, 100% correct and 0.0% incorrect responses for SQLi type5, 99.7% correct and 0.3% incorrect responses for SQLi type6, 99.5% correct and 0.5% incorrect responses for SQLi type7, and finally 99.8% correct and 0.2% incorrect responses for SQLi type8.
The validating phase receives 1,950 out of 13,000 URLs. This includes 0 benign URLs and 1,950 malicious URLs, which are 0% and 100% of the total URLs used in this phase, respectively. The malicious URLs comprise 72 URLs from SQLi type2, 392 URLs from SQLi type3, 142 URLs from SQLi type4, 148 URLs from SQLi type5, 208 URLs from SQLi type6, 375 URLs from SQLi type7, and 613 URLs from SQLi type8. The results are 100% correct and 0.0% incorrect responses for SQLi type2, 97.2% correct and 2.8% incorrect responses for SQLi type3, 59.9% correct and 40.1% incorrect responses for SQLi type4, 100% correct and 0.0% incorrect responses for SQLi type5, 100% correct and 0.0% incorrect responses for SQLi type6, 99.7% correct and 0.3% incorrect responses for SQLi type7, and finally 99.3% correct and 0.7% incorrect responses for SQLi type8.
The testing phase receives 1,950 out of 13,000 URLs. This includes 0 benign URLs and 1,950 malicious URLs, which are 0% and 100% of the total URLs used in this phase, respectively. The malicious URLs comprise 58 URLs from SQLi type2, 375 URLs from SQLi type3, 127 URLs from SQLi type4, 168 URLs from SQLi type5, 225 URLs from SQLi type6, 376 URLs from SQLi type7, and 621 URLs from SQLi type8. The results are 100% correct and 0.0% incorrect responses for SQLi type2, 96.3% correct and 3.7% incorrect responses for SQLi type3, 61.4% correct and 38.6% incorrect responses for SQLi type4, 100% correct and 0.0% incorrect responses for SQLi type5, 100% correct and 0.0% incorrect responses for SQLi type6, 100% correct and 0.0% incorrect responses for SQLi type7, and finally 100% correct and 0.0% incorrect responses for SQLi type8.
Considering the percentages of incorrect responses in the red squares, Figure 4 to Figure 6, as well as the overall percentages of accuracies and inaccuracies in the blue squares, we can say that, after stress testing our previous proposal, [16] and [18], the outputs for all three phases stay accurate. The overall accuracy is 96% (4.0% inaccuracy) for the training phase, 96.3% (3.7% inaccuracy) for the validation phase, and 96.8% (3.2% inaccuracy) for the testing phase.
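As a plausibility check on the overall figures, the per-class numbers quoted in this section can be combined into a count-weighted average. The Python sketch below does this under one assumption that the text does not state explicitly: each quoted percentage is the share of correct responses among the URLs counted for that class. The variable and function names are illustrative.

```python
# Per-class URL counts and correct-response percentages, copied from the text
# (SQLi type2 to type8, in that order, for each phase).
PHASES = {
    "training": {
        "counts": [377, 1821, 610, 688, 1070, 1759, 2775],
        "correct_pct": [98.1, 96.5, 54.6, 100.0, 99.7, 99.5, 99.8],
    },
    "validating": {
        "counts": [72, 392, 142, 148, 208, 375, 613],
        "correct_pct": [100.0, 97.2, 59.9, 100.0, 100.0, 99.7, 99.3],
    },
    "testing": {
        "counts": [58, 375, 127, 168, 225, 376, 621],
        "correct_pct": [100.0, 96.3, 61.4, 100.0, 100.0, 100.0, 100.0],
    },
}


def overall_accuracy(counts, correct_pct):
    """Count-weighted mean of the per-class correct-response percentages."""
    total = sum(counts)
    weighted = sum(n * p for n, p in zip(counts, correct_pct))
    return weighted / total


overall = {
    name: overall_accuracy(data["counts"], data["correct_pct"])
    for name, data in PHASES.items()
}
for name, value in overall.items():
    print(f"{name}: {value:.1f}% of {sum(PHASES[name]['counts'])} URLs")
```

Under this assumption the recomputed values come to roughly 96.0%, 96.2%, and 96.8%, close to the reported 96%, 96.3%, and 96.8%; the small gap for the validating phase is plausibly due to rounding of the per-class percentages.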
7. CONCLUSION
In our previous work, [16] and [18], we proposed a pattern recognition neural network model for the detection and classification of the SQLi attacks. It includes three components: 1) a URL generator, 2) a URL classifier, and 3) a neural network model. In this paper, we stress test our previous work, [16] and [18], in order to demonstrate the effectiveness of our previous proposal. To achieve this, we use two lists to compile the benign URLs. The first list, List1, includes the benign URL addresses with absolutely no SQLi attack signatures, while the second list, List2, contains the benign URL addresses with SQLi attack signatures. Based on the results captured in this paper, our previous proposal achieves a good performance in terms of accuracy.
8. ACKNOWLEDGMENT
The author wishes to acknowledge the support of Abertay University in funding this work.
9. REFERENCES
[1] W. G. Halfond, J. Viegas, and A. Orso, “A Classification of SQL-Injection Attacks and Countermeasures”, in Proc. of the Internet Symposium on Secure Software Engineering (ISSSE 2006), Mar. 2006.
[2] R. Romil, R. Shailendra, “SQL injection attack Detection using SVM”, in International Journal of Computer Applications, V.42, N.13, March 2012.
[3] C. Gould, Z. Su, and P. Devanbu, “JDBC checker: A static analysis tool for SQL/JDBC applications”, 2004, pp. 697-698.
[4] N. A. Lambert and K. Song Lin, “Use of Query Tokenization to detect and prevent SQL Injection Attacks”, IEEE, 2010.
[5] A. Srinivas, G. Narayan, S. Ram, “Random4: An Application Specific Randomized Encryption Algorithm to prevent SQL injection”, in Trust, Security and Privacy in Computing and Communications (TrustCom), 2012 IEEE 11th International Conference, pp. 1327 – 133, 25-27 June 2012.
[6] V. Shanmughaneethi, C. Emilin Shyni, and S. Swamynathan, “SBSQLID: Securing Web Applications with Service Based SQL Injection Detection”, 2009 International Conference on Advances in Computing, Control, and Telecommunication Technologies, 978-07695-3915-7/09, 2009 IEEE.
[7] K. Zhang, Ch. Lin, Sh. Chen, Y. Hwang, H. Huang, and F. Hsu, “TransSQL: A Translation and Validation-based Solution for SQL-Injection Attacks”, First International Conference on Robot, Vision and Signal Processing, IEEE, 2011.
[8] W. G. Halfond and A. Orso, “AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks”, in Proceedings of the IEEE and ACM International Conference on Automated Software Engineering (ASE 2005), Long Beach, CA, USA, Nov 2005.
[9] R. McClure and I. Kruger, “SQL DOM: Compile Time Checking of Dynamic SQL Statements”, in Proceedings of the 27th International Conference on Software Engineering (ICSE 05), pages 88–96, 2005.
[10] Stephen W. Boyd, Angelos D. Keromytis, “SQLrand: Preventing SQL injection Attacks”.
[11] Y. Huang, S. Huang, T. Lin, and C. Tsai, “Web Application Security Assessment by Fault Injection and Behavior Monitoring”, in Proceedings of the 11th International World Wide Web Conference (WWW 03), May 2003.
[12] Z. Su and G. Wassermann, “The Essence of Command Injection Attacks in Web Applications”, in The 33rd Annual Symposium on Principles of Programming Languages (POPL 2006), Jan. 2006.
[13] Open Web Application Security Project (OWASP), available at: https://www.owasp.org/index.php/About_OWASP [retrieved: Dec, 2014].
[14] Open Web Application Security Project (OWASP) top 10, available at: https://www.owasp.org/index.php/Top_10_2013-Top_10 [retrieved: Dec, 2014].
[15] MATLAB R2014a, available at: http://www.mathworks.co.uk [retrieved: Dec, 2014].
[16] N. Moradpoor, “Employing Neural Networks for the Detection of SQL Injection Attack”, in The 7th conference on Security of Information and Networks (SIN2014), Sep 2014.
[17] Alexa, Bringing Information into Focus, available at: http://www.alexa.com/topsites/countries/GB [retrieved: Dec, 2014].
[18] N. Moradpoor, “A Pattern Recognition Neural Network Model for Detection and Classification of SQL Injection Attacks”, The 13th International Conference on Information and Communication Engineering (ICICE15), Jun 2015.
[19] European Union specifications in Wikipedia, available at: www. www.en.wikipedia.org/wiki/European_union [retrieved: Apr, 2015].
[20] URL address for Google in the UK, available at: www.google.co.uk [retrieved: Apr, 2015].