Information-Theoretic Detection of SQL Injection Attacks Hossain Shahriar* and Mohammad Zulkernine School of Computing Queen’s University, Kingston, Canada {shahriar, mzulker}@cs.queensu.ca Abstract— SQL Injection (SQLI) is a wide spread vulnerability commonly found in web-based programs. Exploitations of SQL injection vulnerabilities lead to harmful consequences such as authentication bypassing and leakage of sensitive personal information. Therefore, SQLI needs to be mitigated to protect end users. In this work, we present a novel approach to detect SQLI attacks based on information theory. We compute the entropy of each query present in a program accessed before program deployment. During program execution time, when an SQL query is invoked, we compute the entropy again to identify any change in the entropy measure for that query. The approach then relies on the assumption that dynamic queries with attack inputs result in increased or decreased level of entropy. In contrast, a dynamic query with benign inputs does not result in any change of entropy value. The proposed framework is validated with three open source PHP applications that have been reported to contain SQLI vulnerabilities. We implement a prototype tool in Java to facilitate the training and detection phase of the proposed approach. The evaluation results indicate that the approach detects all known SQLI vulnerabilities and can be a complementary technique to identify unknown vulnerabilities. Keywords: SQL injection, software vulnerability, information theory, entropy.
I.
INTRODUCTION
Web-based programs store and retrieve sensitive information from databases by executing SQL queries. These queries often include user supplied inputs that are not sanitized properly before being included in dynamically generated queries. As a result, the intended structures of dynamic queries get altered and result in SQL Injection (SQLI) attacks. The consequence of SQLI attacks could be devastating. Altered queries due to SQLI attacks might (i) select all the records from a table as opposed to one or zero number of record, (ii) run additional queries, (iii) insert, update, or delete new records, and (vi) create or delete arbitrary tables. According to a recent survey performed by the Open Web Application Security Project (OWASP), SQLI is ranked as the number one source of vulnerabilities in webbased programs [1]. Approximately 63% of websites are still believed to be vulnerable to SQLI attacks [31]. Successful exploitations of SQLIV have already caused significant financial losses to organizations [8]. Therefore, mitigating SQLI attacks in web-based programs is essential to avoid malicious consequences. Many techniques have been proposed to detect SQLI attacks, particularly during different stages of the software development life cycle [31]. These include input character
filtering or input validation [34], static analysis [4], combination of static analysis and runtime monitoring [5, 29], fixing of source code by using prepared statements [9], intrusion detection at application [11] and network [28, 32] levels, randomization of SQL keywords [3], etc. However, all these approaches do not view testing of SQLI attacks in terms of the complexity of query measurement. As a result, most of the approaches work well for known malicious inputs and may not detect unknown attacks. Our viewpoint is that an attack query compromises the complexity of an intended query. Thus, measuring a query complexity statically and observing any deviation at runtime should provide us an indication of the occurrence of an SQLI attack. This motivation leads us to the development of necessary measurement as a technique to detect SQLI attacks and complement other existing research work. Information theory is a widely used concept to measure the complexity of real world phenomenon [25, 26] and has been applied to tackle many network security related problems [23, 27, 36]. In this paper, we present an information-theoretic approach to detect SQLI attacks. We compute the complexity of each query present in a program based on the entropy value, which is derived from the token probability distribution of a query. During program execution and when an SQL query is invoked, we compute the complexity (or entropy) again to identify any changes with respect to the known entropy measured earlier. A dynamic query with attack inputs alters its intended structure and hence the entropy level changes significantly. In contrast, a dynamic query with benign inputs does not result in any changes of the entropy values. The proposed framework has been validated with three open source PHP applications reported to contain SQLI vulnerabilities. 1 We implement a prototype tool to facilitate the training and detection of SQLI attacks based on the proposed approach. The evaluation results indicate that the approach can detect all known SQLI attacks. Moreover, the approach reveals several unknown vulnerabilities not reported before. The proposed approach has several advantages. First, it does not rely on the specific type of attack inputs. Second, input filtering mechanisms may not be accurately implemented and often fail to detect new types of malicious inputs. An application may be implemented to become user friendly to accept benign inputs while creating the security holes for SQLI attacks. For example, O’Connor includes a punctuation *The author is currently affiliated with the Department of Computer Science, Kennesaw State University, Kennesaw, GA, USA 30144 and can be contacted at
[email protected].
sign which if allowed by a program may open the door of SQLI attacks. Our approach can be deployed as a solution indepth to complement input filtering approach. The paper is organized as follows. Section II shows an example of SQLI attack followed by a brief discussion on the related works. In Section III, the proposed approach is discussed in details. Section IV discusses the implementation and experimental evaluation. Section V draws the conclusions and discusses limitations and future work. II.
SQL INJECTION ATTACK AND RELATED WORK
A. SQL Injection (SQLI) attack We provide an example of an SQLI attack by using a PHP code snippet shown in Figure 1 (adapted from PhpFusion application [24]). Lines 1 and 2 extract user supplied information from Login and Password fields of a request into $sLogin and $sPassword variables, respectively. The inputs are not filtered and concatenated with other strings to generate a dynamic SQL query at Line 3. Line 4 executes the query and Line 5 performs the authentication. If the provided credential information is matched with the information from the tlogin table, the current session id is set with the obtained id field from the table. 1. $sLogin = $_GET[‘Login’]; 2. $sPassword = $_GET[‘Password’]; 3. $qry = "select id, level from tlogin where login =’”.$sLogin. “’ and password =’”.$sPassword. “’”; 4. $result = mysql_query($sql); 5. while($row = mysql_fetch_array($result)) { // authentication 6. $_SESSION[’ID’] = $user['id']; …….. 7. } Figure 1. PHP code example for authentication TABLE I. Attack types Tautology (T) Union (U) Piggy backed query (P) Inference (I) Hex encoded query (H)
SQL INJECTION ATTACK EXAMPLES Example attack strings i) ’ or 1=1 -ii) 'greg' LIKE '%gr%' -iii) ' UNION SELECT 1,1 -iv) ’; show tables; -v) ’; shutdown; -vi) '; /*! select concat('1','2') */ \g (mySQL) -vii) '; /*! select ‘1’ + ‘2’ */ \g (MS SQL) -viii) 0x206F7220313D31 Here, 0x206F7220313D31 is hexadecimal representation of the string “' or 1=1')”
Let us assume that a user provides valid values of login and password as guest and secret respectively. Then, the generated query at Line 3 becomes, “select id, level from tlogin where login ='guest’ and password = 'secret’”. The database engine executes the query at Line 4 and the user is authenticated by assigning a valid session id at Line 6. However, an attacker might supply input “’ or 1=1 -- ” in the first field and leave the second input field as blank. The resultant query becomes “select id, level from tlogin where login =’’ or 1=1 --’ and password =’’”. The resultant query now becomes a tautology as the query after the comment
symbol “--” will be ignored by the database engine. The syntax of the altered query is correct. Therefore, the attacker will be able to skip the authentication after the query is executed and assume the identity of the first user of the table tlogin. Halfond et al. [19] classify SQLI attacks into seven types: tautology, union query, illegal/logical incorrect query, piggy-backed query, stored procedure, inference attack, and alternate encoding (or Hex encoded query). Tautology is the most common form of SQLIA. However, the form of tautology varies widely. The first two examples of Table I represents tautology (T) attack by adding (i)’ or 1=1 and (ii) 'greg' LIKE '%gr%'. The first case exploits the well known boolean logic (i.e., true or X = true). The second case takes advantage of the regular expression (%gr%) matching against the supplied known word (‘greg’). The union attacks add UNION keyword along with arbitrary column names or supplied values (example (iii)), which will always be selected in the output result set of the executed query. Piggy backed queries are the most harmful type of attacks, where attackers append additional SQL statements at the end of intended queries. These extra statements might reveal table information (example (iv)), shutdown the entire database server (example (v)), create new tables, and delete existing tables. The inference attack is performed to gather information about backend database engines by identifying names, versions etc. through the injection of known functionalities. For example, string concatenation is supported by the concat function in MySQL database (example (vi)), whereas Microsoft SQL server supports the same function by “+” operator (example (vii)). Maor et al. [20] show that it is possible to perform SQLI attack by tracing the error messages returned by a database engine. The Hex encoded query attacks are preformed by providing attack strings in the form of hexadecimal representation to escape input filters that are often placed on the client side of applications. Example (viii) shows the hexadecimal representation of the tautology attack string of example (i). Overall, any attack string can be translated to its hexadecimal form. B. Related work on SQLI attack detection Many works have addressed the detection of SQLI attacks. Buehrer et al. [2] detect SQLI by comparing the parse tree of an intended SQL query before and after the inclusion of user supplied input. The approach can be evaded if user supplied input does not appear at the leaf of the tree. In contrast, information-theoretic approach can overcome this issue. Boyd et al. [3] randomize SQL keywords so that attacker injected keywords become meaningless to the database engine. However, the approach requires randomizing SQL keywords at program code level with large random numbers as well as de-randomizing all the keywords before supplying queries to database engines. If caution is not taken, random numbers could be guessed by attackers and the approach could be thwarted. Further, the approach
may not detect piggy backed query attacks, where additional statements do not use any SQL keywords. Agosta et al. [29] identify SQLI vulnerabilities based on the symbolic code execution and static taint analysis, which include tainted sources (inputs from users or files), transfer functions (SQL related attack input filtering methods), and sinks (query invocation point) identification stages. If a tainted source is able to reach a sink, then vulnerability is identified. Su et al. [4] mark user supplied portions in queries with a special symbol and augment SQL grammar with production rules. A parser is generated based on the augmented grammar rules. The parser successfully parses the generated query at runtime, if there are no malicious inputs in the generated queries after adding user inputs. Halfond et al. [12] develop non-deterministic finite automata to model the structure of implemented SQL queries that needs to be conformed during program runtime. Kim et al. [30] remove query attribute values, which are parameters or input variables that become part of dynamic queries and identify the fixed values present in queries. At runtime, they remove attribute values from generated queries and perform XOR operations between their fixed values and the fixed values of the earlier saved queries. An attack results in a non-zero value during the XOR operation. A zero value indicates that the intended query structure and dynamic query structure remains similar. Unlike their approach, we do not require removing the attribute values from SQL queries and performing query attribute wise operation. Thomas et al. [9] convert each SQL statement into prepared statement so that the structure is not altered during runtime. The solution is very much specific to a few languages such as Java. In contrast, our proposed approach overcomes this limitation and can be applied for a wide variety of scripting languages such as PHP. Sruthi et al. [10] develop the CANDID tool where they initially generate the expected parse trees of SQL queries with benign inputs. They compare them with the trees generated from actual inputs to detect any deviation and SQLI attacks. Lin et al. [11] propose application level security gateway to prevent SQLI attacks. They identify possible entry points of SQLI attacks to be protected by employing meta-programs that can filter out meta-characters to avoid SQLI attacks. Kemalis et al. [13] express SQL queries with Extended Backus Naur Form (EBNF). They embed an architecture in program code to monitor SQLI attacks by matching runtime generated queries with EBNF specifications. A number of works have addressed SQLI attacks from the perspective of effective test case generation ([5, 7, 14, 17]). Interested readers may see the details of further works at [21] based on testing, static analysis, dynamic analysis, and program transformation. We area also aware of a recent project named loginsystem-rd that provides a collection of template page for programmers to securely implement the login functionality [34]. SQLI attacks are common at the login interface of web applications. While the effort is useful at the implementation level, complementary steps for mitigating SQLI attacks at runtime are still needed. Our work can be applied in that direction.
C. Related works on information theory-based attack detection Information theory has been pioneered by Shannon [26] who developed the probabilistic framework to measure the amount of bits required to transmit messages from one end point to another. Our proposed approach has been motivated by some of the works that apply information-theoretic measurement to solve some challenging security related issues such as malicious worm detection [23, 27]. Khayam et al. [23] develop a worm detection technique by comparing the current traffic patterns (between source and destination ports) of a given host to the corresponding benign traffic patterns. The deviation between two traffic patterns is computed using Kullback-Leibler (K-L) divergence (a measurement of computing the distance between two probabilistic distributions). Yu et al. [27] detect worms by comparing the entropy level between benign traffic and malicious traffic containing worm. In comparison to these efforts, we apply the entropy measurement from the information theory to identify the complexity (or randomness of token) of a given query during static and dynamic phases of an application and find the occurrence of any difference to detect SQLI attacks. A number of related works apply information-theoretic model to study Intrusion Detection Systems (IDS). Lee et al. [36] propose a number of information-theoretic measures (entropy, conditional entropy, relative conditional entropy, and information gain) to study the characteristics of audit log data as well as develop an anomaly detection model. Gu et al. [35] develop an information-theoretic model to study the performance of IDS. They apply a number of measures (e.g., entropy, conditional entropy, and mutual information) to detect the effectiveness of IDS (e.g., false positive and false negative rates). We are also aware of several works that apply information theory to tackle non-security related issues such as measuring the complexity of software systems divided into modules (e.g., cohesion and coupling among modules [18]). Information theory-based framework has also been applied to allocate resources to maximize performance of data storage and peer to peer file sharing systems [22]. In contrast to all these efforts, our work develops an entropy computation model to measure the complexity of a given query. The query complexity is denoted as entropy. This entropy should remain intact and any alteration indicates the presence of malicious inputs. III.
INFORMATION THEORY-BASED FRAMEWORK FOR SQLI ATTACK DETECTION
Figure 2 shows the proposed framework for detecting SQLI attacks. It has two phases: training (top level) and detection (bottom level). During the training phase, we analyze server side script code and compute the necessary elements of the model. We compute the query token probability distribution used by queries. This result is used to compute the entropies of each query.
Training phase Program source
Server-side script analyzer
Static query entropy calculator
Instrumented script code
Detection phase Query invocation
Instrumented script code
No Benign query
Dynamic query entropy calculator
Deviation found?
generated dynamic query may have been malformed through SQLI attack inputs. Let Q = {q1, q2, …, qN} be a set of queries present in a program, and P(q) is the probability of the query q, where qεQ. The entropy (denoted as H) of all the queries present in the program can be defined as follows (Equation (i) [25]): H(Q) = -E[logP(Q)] = -∑
Yes Malicious query
Figure 2. SQLI attack detection using information-theoretic framework
Once the entropies are computed, they are recorded at specific program locations (instrumented code) to be compared at a later time. Thus, the training phase works at the white box level compared to most contemporary information theory-based learning techniques that are suitable at black-box level [23, 27, 35, 36]. The detection phase begins with a database query invocation point for a given program instrumented with entropy information. The generated dynamic query is analyzed to compute the entropy. Ideally, the entropy of the dynamic query should match with the pre-recorded entropy level learned during the training phase. If there is a match, then the dynamic query is deemed as benign and no SQLI is present in the generated query. The query is allowed to execute. Otherwise, the query is marked as malicious and the dynamic query includes malicious attack inputs. The query is not executed and a log report is generated. The benefit of this approach is that it does not require tainted data flow analysis or complex static analysis. Moreover, it allows us to overcome the difficulty of implementing correct input filtering mechanisms to prevent SQLI attacks. In the next subsections, we provide the details of the information-theoretic model used for computing the entropy of queries, an example of program instrumentation followed by a working example of SQLI attack detection by the proposed model. A. Information theory model for SQL query We develop an information-theoretic framework to detect SQLI attacks at the server-side. First, we identify static SQL queries present in programs and develop the probabilistic distribution of the tokens present in queries. Then, information content is measured using the standard entropy. The entropy is an indicator of the complexity of the query written by a programmer. Our assumption is that for a given set of benign inputs, the complexity of the generated query should match with that of the complexity identified for the query written during the implementation of a program. In contrast, a set of attack inputs can alter a given query and as a result, the complexity of the implemented query may either increase or decrease. Thus, any deviation indicates that the
Q
(q=qi)… (i)
Here, E is the expected value and the base of the logarithm function is two. The unit of the entropy is measured as bit. In other words, the unit represents the average amount of information that is required to represent all the queries of the program. Let us assume that Ω = {x1, x2, …, xL) be the set of all tokens that can be present in a query q. P(x) is the probability of that the token x appears in the query q, where xεΩ and qεQ. We define the entropy of a query q as follows (Equation (ii)): H(q) = -∑
log P(X=x)… (ii)
Here, H(q) ≥ 0. This implies that the lowest value of entropy of a query q is zero. This might happen if a query is present after the comment symbol (e.g., -- select * from tlogin). For any query this is not within comment symbol, there is no fixed upper bound of H(q). Similarly, we can show that H(Q) ≥ 0. Since the framework relies on token probability distribution, it can be scaled up to cascaded queries (e.g., Select * from tlogin where id in (Select 1, 2, 3)) easily. B. PHP code instrumentation example The obtained entropy information is encoded by the program through instrumentation. We show an example snapshot of an instrumented program in Figure 3 for the vulnerable code shown in Figure 1. Here, Lines 4 and 5 set the entropy of $qry in a global storage (the q1 indicates a query id). 1. $sLogin = $_GET[‘Login’]; 2. $sPassword = $_GET[‘Password’]; 3. $qry = "select id, level from tlogin where login =’”.$sLogin. “’ and password =’”.$sPassword. “’”; 4. if (!isset($_GLOBALS[‘q1’]) 5. $_GLOBALS[‘q1’] = getEntropy($qry); 6. else{ 7. if (getEntropy($qry) != $_GLOBALS[‘q1’]) //entropy comparison 8. raiseWarningSQLI($qry); //SQLI attack detected 9. $result = mysql_query($sql); 10. while($row = mysql_fetch_array($result)) { // authentication 11. $_SESSION[’ID’] = $user['id']; 12. } …….. 13. }//end else Figure 3. PHP code instrumentation for entropy measurement
Lines 7 and 8 perform the runtime check and compare the entropy of the dynamic query with previously computed entropy. A deviation results in the generation of an SQLI attack warning and blockage of the query execution. Note
that if the source code is changed and query structures are modified, then programs need to be instrumented again to recompute the entropy values. C. SQLI attack detection using information theory model We now show an example of SQLI attack detection using the entropy of a query. Let us consider the table tlogin. Table II shows an example of SELECT query (q1). The third column of Table II shows the number of token counts. The token probability distribution is shown in Table III. We can measure the complexity of the query as follows: (xi)*logP(xi) H(q) = H (x1, x2, …, xN) = ∑ TABLE II.
QUERY FOR TLOGIN
Query # code Token count 13 q “select id, level from tlogin where login =’” + sLogin + “’ and password = ‘” + sPassword + “’;”; TABLE III.
TOKEN PROBABILITY DISTRIBUION FOR BENIGN QUERY
x5 x6 x7(logi x8 x9 x1(sele x2 x3 x4 ct) (id (lev (fro (tlogi (wher n) (=) (sLogi ) el m) n) e) n) 1/13 1/1 1/13 1/13 1/13 1/13 1/13 2/1 1/13 3 3 TABLE IV.
x10 x11 x12 (an (passwo (sPasswo d) rd) rd) 1/1 1/13 1/13 3
TOKEN PROBABILITY DISTRIBUION FOR MALICIOUS QUERY
x1(select) x2 x3 x4 x5 x6 x7(login) x8 x9 x10 (id) (level (from) (tlogin) (where) (=) (or) (1) 1/12 1/12 1/12 1/12 1/12 1/12 1/12 2/12 1/12 2/12
For the query q, we can measure the complexity based on the token distribution (shown in Table III) as follows: H(q) = H(1/13, 1/13, 1/13, 1/13, 1/13, 1/13,1/13, 2/13, 1/13, 1/13, 1/13, 1/13) = -11/13*log(1/13) - 2/13*log(2/13) = 3.54 If we consider an attack input such as tautology (described in Section II), the query q is “select id, level from tlogin where login =’’ or 1=1 --’ and password =’’”. Given this dynamic query where tlogin is the target table, we notice that the token distribution has been changed (shown in Table IV). The entropy of q becomes: H(q) = H(1/12, 1/12, 1/12, 1/12, 1/12, 1/12,1/12, 2/12, 1/12, 2/12) = -8/12*log(1/12) – 4/12*log(2/12) = 3.25 Note that we omitted all the tokens present after the comment symbol (--) as well as any blank inputs. The entropy value of the dynamic query with malicious inputs has decreased to 3.25 bits with respect to the intended entropy value of 3.54 bits (decreased by 8%). An attack might result in the increment of entropy value (e.g., an attack including a UNION query increases the number of unique tokens).
IV.
IMPLEMENTATION AND EVALUATION
We implement a prototype tool for testing SQLI. The tool accepts PHP files and computes entropy for each queries present in a program. The entropy information is instrumented in program code and compared during actual program execution time. We reuse the PHP parser developed in [33] to obtain the parse tree of a query and count the tokens from the tree. TABLE V.
CHARACTERISTICS OF THE PHP APPLICATIONS
Program # of files PHP-Address 12 book Serendipity 19 PHP-fusion 22
LOC SELECT INSERT UPDATE DELETE 3,422 25 2 3 2 5,465 6,419
4 62
1 7
1 9
0 1
We evaluate the proposed model using the three open source PHP programs available from sourceforge.net. The chosen programs are ranked among the top five most downloaded applications from the website. The programs include PHP-Address (address and contact information manger) [16], Serendipity (blog management program) [15], and PHP-fusion (content management system) [24]. The second and third columns of Table V show the number of files and the corresponding lines of code (LOC) that we analyze during our evaluation. The programs include SELECT, INSERT, UPDATE, and DELETE types of queries. All the database functionalities are supported by MySQL. TABLE VI. Program
# of tables Average entropy
PHP-Address book Serendipity PHP fusion TABLE VII. Program
PHPAddress book Serendipity PHP fusion
RESULTS FOR BENIGN QUERIES
7 1 18
1.6 2.12 1.68
Min Max Entropy Entropy 1 2.8 1.22 2.92 1 3.9
RESULTS FOR MALICIOUS QUERIES
# of attack # of attacks FN (%) # of # of inputs tested detected vulnerabili vulnerabili ties known ties (OSVDB) unknown 128 128 0 7 8 24 316
24 316
0 0
3 23
0 2
We perform the evaluation in the following two steps. First, we identify SQL queries in each file. Then, we compute the probability distribution of the queries followed by recording the entropy levels. In the second stage, we run the programs by deploying them in an apache web server. Then, we visit the corresponding pages and supply malicious inputs so that the queries are executed. We notice whether the instrumented code with entropy information successfully stops the query execution and logs a warning or not. For
attack test inputs, we choose test inputs from OWASP security testing guide [1] and consider tautology, union, and piggyback queries. We now discuss our findings before and after testing. Table VI shows the entropy information obtained for the queries during the first stage of the evaluation. We show the number of tables, average of entropy for all queries present in a program, and the minimum and maximum entropy found during the evaluation. Among the three applications, PHP Fusion uses the maximum number of tables with an average entropy level of 1.68. The minimum entropy level has been found in Address Book and PHP Fusion. The maximum entropy has been found in PHP Fusion. The maximum entropy is due to the large and complex queries present in the program. Table VII shows the results obtained during the second stage of our evaluation. For each of the files analyzed, we supply malicious inputs (two types of tautology attacks and two types of union attacks). Thus, the total number of attack test cases shown in the second column of Table VII is a multiple of the total number of select, insert, update, and delete types of queries (sum of the last four columns of Table V). We report the number of cases where query invocation points are blocked by instrumented programs. In our evaluation, we notice that all the malicious queries have been blocked by the framework. Thus, the false negative rate in our evaluation is zero. Further, our approach identifies some new cases of vulnerabilities in PHP-Address book and PHP Fusion. For example, in the PHP Address book, we find that the export.php file is vulnerable to SQLI attacks for the following query: SELECT * FROM $month_from_where $table.".id = '$id'". Here, $id is not sanitized properly. Note that we confirm the unknown vulnerabilities by comparing the reported vulnerabilities from Open Source Vulnerability Database (OSVDB) [6]. Figure 4 shows a snapshot of the changes of entropy that we observe during our evaluation with tautology and union attacks for the Serendipity application. Here, the horizontal axis indicates a query number (six queries are numbered from 1 to 6) and the vertical axis indicates the entropy value (bits). It is obvious from the graph that compared to the benign queries, the malicious queries usually have higher entropies. Moreover, depending on the input test cases the amount of entropy can be as high as 10 bits. V.
CONCLUSIONS AND FUTURE WORK
Despite secure programming practices are in place and many proactive countermeasures are available, most of the web-based programs still suffer from SQLI vulnerabilities. Exploitations of SQLI vulnerabilities result in unwanted program behaviors and loss and leakage of confidential information of end users. Thus, SQLI mitigation needs to be considered seriously. This work detects SQLI attacks using an information-theoretic approach. We define entropy as a measure to understand the complexity of the static query written by programmers. When a malicious inputs successfully alters the static nature of the query, the complexity value changes. Thus, we rely on comparing the
statically computed entropies with that of dynamically computed entropies. The deviation indicates the presence of SQLI in a query. The approach has been found to be effective for a set of vulnerable programs implemented in PHP. The approach allows us to identify unknown SQLI attacks. 12
Conditional entropy (union attack) Conditional entropy (tautology attack) Conditional entropy (expected benign)
10
8
6
4
2
0 1
2
3
4
5
6
Figure 4. Entropy for benign and malicious queries for Serendipity application
Currently, our approach does not address the SQLI in stored procedures as it requires our approach to be extended at the database script level. Our future work includes extending the information-theoretic model (e.g., applying conditional entropy and information gain measures) to detect SQLI attacks at the black-box level (HTTP payloads received and sent by web applications). We also plan to apply our developed model for detecting other web-based attacks such as cross-site scripting. ACKNOWLEDGMENT The authors would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC). REFERENCES The Open Web Application Security Project (OWASP), https://www.owasp.org/index.php/Top_10_2010-Main [2] G. Buehrer, B. W. Weide, P. A. G. Sivilotti, “Using parse tree validation to prevent SQL injection attacks,” Proceedings of the 5th International Workshop on Software Engineering and Middleware (SEM '05), Lisbon, Portugal, 2005, pp.106-113. [1]
S. Boyd and A. Keromytis, “SQLrand: Preventing SQL injection attacks”, Proceedings of the 2nd Applied Cryptography and Network Security Conference (ACNS), June 2004, pp. 292-302. [4] Z. Su and G. Wasserman, “The Essence of Command Injection Attacks in Web Applications,” Proc. of the Symposium on Principles of Programming Languages (POPL), January 2006, South Carolina, USA, pp. 372-382. [5] Y. Kosuga, K. Kono, M. Hanaoka, M. Hishiyama, Y. Takahama, “Sania: Syntactic and Semantic Analysis for Automated Testing against SQL Injection,” Proc. of the 23rd Annual Computer Security Applications Conference (ACSAC), Miami, Dec 2007, pp. 107-117. [6] Open Source Vulnerability Database, http://osvdb.org [7] Y. Shin, L. Williams, and T. Xie, “SQLUnitGen: SQL Injection Testing Using Static and Dynamic Analysis,” Proc. of the International Symposium on Software Reliability Engineering (ISSRE), Raleigh, NC, 2006, ISBN 978-0-9671473-3-3-8. [8] L. A. Gordon, M. P. Loeb, W.Lucyshyn, and R. Richardson., “Ninth CSI/FBI computer crime and security survey”, Technical Report RL32331, C.S.I. Computer Security Institute, 2004. [9] S. Thomas and L. Williams, “Using Automated Fix Generation to Secure SQL Statements,” Proc. of the 3rd International Workshop on Software Engineering for Secure Systems (SESS’07), Minneapolis, 2007, pp. 9-14. [10] S. Bandhakavi, P. Bisth, P. Madhusudan, and V. Venkatakrishnan, “CANDID: Preventing SQL Injection Attacks using Dynamic Candidate Evaluations,” Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS’07), Alexandria, Virginia, Oct 2007, pp. 12-24. [11] J. Lin and J. Chen, “The Automatic Defense Mechanism for Malicious Injection Attack,” Proc. of the 7th International Conference on Computer and Information Technology, Fukushima, Japan, October 2007, pp. 709-714. [12] W. Halfond and A. Orso, “AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks,” Proc. of Automated Software Engineering (ASE), Long Beach, California, USA, November 2005, pp. 174-183. [13] K. Kemalis and T. Tzouramanis, “SQL-IDS: A Specificationbased Approach for SQL-Injection Detection”, Proc. of 23rd ACM Symposium on Applied Computing (SAC’08), March 2008, Fortaleza, Brazil, pp. 2153-2158. [14] W.K. Chan, S.C. Cheung, and T.H. Tse, “Fault-based Testing of Database Application Programs with Conceptual Data Model,” Proc. of the 5th International Conference on Quality Software (QSIC), Melbourne, Australia, September 2005, pp. 187-196. [15] Serendipity PHP Weblog System, Accessed from http://www.s9y.org [16] PHP Address Book, http://sourceforge.net/projects/phpaddressbook/?source=directory. [17] H. Shahriar and M. Zulkernine, “MUSIC: Mutation-based SQL Injection Vulnerability Checking,” Proceedings of the 8th IEEE International Conference on Quality Software (QSIC), London, UK, August 2008, pp. 77-86. [18] E. Allen and T. Khoshgoftaar, ‘‘Measuring Coupling and Cohesion: An Information Theory Approach,’’ Proceedings of the 6th International Software Metrics Symposium, November 1999, Boca Raton, FL, pp. 119-127. [19] W. G. Halfond, J. Viegas, and A. Orso, “A Classification of SQL-Injection Attacks and Countermeasures,” Proceedings of the International Symposium on Secure Software Engineering (ISSSE 2006), Mar. 2006. [3]
[20]
SQL Injection Walkthrough, Accessed from http://www.securiteam.com/securityreviews/5DP0N1P76E.ht ml [21] H. Shahriar and M. Zulkernine, “Mitigation of Program Security Vulnerabilities: Approaches and Challenges,” ACM Computing Surveys (CSUR), Vol. 44, No. 3, Article 11, May 2012, pp. 1-46. [22] Y. Geng, ‘‘An information-theoretic model for resourceconstrained systems,’’ Proc. of IEEE International Conference on Systems Man and Cybernetics (SMC), October 2010, Istanbul, Turkey, pp. 4282-4287. [23] S. Khayam, ‘‘Worm Detection at Network Endpoints Using Information-Theoretic Traffic Perturbations,’’ Proc. of the IEEE International Conference on Communications (ICC), Beijing, China, May 2008, pp. 1561 - 1565. [24] PHP-Fusion, http://sourceforge.net/projects/php-fusion [25] T. Cover and J. Thomas, Elements of Information Theory, John Wiley and Sons, 2006. [26] C. Shannon, “A Mathematical Theory of Communication,” Bell Systems Technical Journal, 1948. [27] W. Yu, X. Wang, D. Xuan, and D. Lee, ‘‘Effective Detection of Active Worms with Varying Scan Rate,’’ Proc. of IEEE Securecomm and Workshops, Baltimore, MD, USA, August 2006, pp. 1-10. [28] A. Singh and S. Roy, ‘‘A Network Based Vulnerability Scanner for Detecting SQLI Attacks in Web Applications,’’ st
Proc. of the 1 International Conference on Recent Advances in Information Technology (RAIT), March 2012, Dhanbad, India, pp. 585---590.
[29] G. Agosta, A. Barenghi, A. Parata, G. Pelosi, ‘‘Automated Security Analysis of Dynamic Web Applications through th Symbolic Code Execution,’’ Proc. of the 9 International
Conference on Information Technology: New Generations (ITNG), April 2012, Las Vegas, NV, pp. 189-194. [30] J. Kim, ‘‘Injection Attack Detection Using the Removal of SQL Query Attribute Values,’’ Proc. of the International
Conference on Information Science and Applications (ICISA), Jeju Island, Korea, May 2011, pp. 1-7. [31] N. Antunes and M. Vieira, ‘‘Defending Against Web Application Vulnerabilities,’’ IEEE Computer, Volume 45, Issue 2, February 2012, pp. 66-72. [32] N. Khoury, P. Zavarsky, D. Lindskig, and R. Ruhl, ‘‘An Analysis of Black-Box Web Application Security Scanners rd against Stored SQL Injection,’’ Proc. of the 3 International Conference on Privacy, Security, Risk and Trust (PASSAT), Boston, MA, October 2011, pp. 1095-1101. [33] Y. Minamide, ‘‘Static Approximation of Dynamically Generated Web Pages,’’ th
Proc. of the 14 International World Wide Web Conference (WWW), Chiba, Japan, May 2005, pp. 432-441.
[34] loginsystem-rd Project, Login system to prevent XSS, SQL Injection and CSRF, Accessed from http://code.google.com/p/loginsystem-rd/ [35] G. Gu, P. Fogla, D. Dagon, W. Lee, and B. Skoric, ‘‘Towards an Information-Theoretic Framework for Analyzing Intrusion th Detection Systems,’’ Proceedings of the 11 European Symposium Research Computer Security (ESORICS 2006), Hamburg, Germany, September 2006, pp. 527-546. [36] W. Lee and D. Xiang, ‘‘Information-Theoretic Measures for Anomaly Detection,’’ Proceedings of the IEEE Symposium
on Security and Privacy, Oakland, California, May 2001, pp. 130-143.