DIDAFIT: DETECTING INTRUSIONS IN DATABASES THROUGH FINGERPRINTING TRANSACTIONS
Wai Lup LOW, Joseph LEE, Peter TEOH
DSO National Laboratories, 20 Science Park Drive, Singapore 118230
lwailup, lsinyeun, [email protected]
Key words:
Intrusion detection, Database security, Database misuse
Abstract:
The most valuable information assets of an organization are often stored in databases, and it is imperative for such organizations to ensure the integrity and confidentiality of their databases. With the proliferation of e-commerce sites that are backed by database systems, databases that are available online 24/7 are ubiquitous. Data in these databases ranges from credit card numbers to personal medical records. Failing to protect these databases from intrusions will result in loss of customers’ confidence and might even result in lawsuits. Database intrusion refers to the unauthorized access and misuse of database systems. Database intrusion detection systems identify suspicious, abnormal or downright malicious accesses to the database system. However, there is little existing work on detecting intrusions in databases. We present a technique that can efficiently identify anomalous accesses to the database. Our technique characterizes legitimate accesses by fingerprinting their constituent SQL statements. These fingerprints are then used to detect illegitimate accesses. We illustrate how this technique can be used in a typical client-server database system setup. Experimental results show that the technique is efficient and scales up well. Our contributions include introducing a novel process for fingerprinting SQL statements and developing an efficient technique to detect anomalous database accesses.
1
INTRODUCTION
Almost every e-commerce site offering some kind of online service today has a database at its backend. Database systems also form the core of the information systems infrastructure of large organisations. These databases support a large variety of systems including B2B (business-to-business) workflows, manpower information systems, and hospital information systems. Data found in these databases ranges from personal information and banking transactions to medical records and commercial secrets. Any breach of security to these databases will result in a tarnished reputation for the organization, loss of customers’ confidence and might even result in lawsuits. Clearly, it is important that the data in the database systems be protected from unauthorised access and modification. One mechanism to safeguard the information in these databases is to use intrusion detection systems. These systems aim to detect intrusions as early as possible, so that any damage caused by the intrusions is minimized. They function as sentinels and ensure that any compromise on the integrity and confidentiality of the data is detected and reported immediately to the administrators.
Intrusion detection research is not new and has been ongoing for many years. However, previous efforts were focused on network-based intrusion detection and host-based intrusion detection. Network-based intrusion detection typically works by monitoring network traffic and attempts to discover if an intruder is trying to break into the system. Instead of network traffic, host-based intrusion detection works by monitoring the log files in the hosts. Both network- and host-based intrusion detection systems look for attack signatures, which are specific patterns that usually indicate malicious or suspicious intent, to identify intrusions. There are numerous commercial network- and host-based intrusion detection systems (IDS) in the market today, and the market leaders include (Internet Security Systems, 2001), (NFR Security, 2001), (Enterasys Networks, Inc., 2001), (Cisco Systems, Inc., 2001) and (Symantec Corporation, 2001). There are also IDS that are available for free and are highly acclaimed (e.g. (Roesch, 1999)).
ICEIS 2002 - Databases and Information Systems Integration
However, these intrusion detection systems do not work at the application layer, which can potentially offer more accurate and precise detection for the targeted application. The distinctive characteristics of database management systems (DBMSes), together with their widespread use and the invaluable data they hold, make it vital to detect any intrusion or intrusion attempts made at the databases. Therefore, intrusion detection models and techniques specially designed for databases are becoming imperative. There are recent reports in which SQL injection techniques, which refer to the use of carefully crafted and malicious SQL statements, were used by intruders to gain administrator privileges to sensitive databases. SQL injection is discussed further in Section 3. This reinforces the point that intrusion detection systems should be employed not only at the network and hosts, but also at the database systems where the critical information assets lie. Unfortunately, there is little existing work on intrusion detection systems specifically for databases. Interestingly enough, ideas and results from network- and host-based IDS research can be applied to database intrusion detection. The concepts of system misuse and access anomalies (Castano et al., 1995) have their counterparts in database intrusions. Database management systems have become increasingly complex (some rival the complexity of operating systems). As a result, there is a myriad of definitions for the term database intrusion, with variations for different applications, systems, requirements etc. For our purpose, database intrusion refers to “the act of individuals or groups of individuals who use the database without authorisation, and those who are authorised, but abuse their privileges”. In this work, we present a technique that can identify illegitimate database accesses efficiently.
Our technique works by fingerprinting legitimate access patterns of database transactions, and using them to identify database intrusions. Our contributions include:
1. Developing a fingerprinting process for SQL statements.
2. Introducing an efficient and accurate technique to detect anomalous database accesses with the fingerprints of SQL statements.
The rest of the paper is organized as follows. Section 2 surveys related works in this area. Section 3 discusses the background concepts necessary for the understanding of subsequent sections. In Section 4, we detail our technique and describe how it can be implemented. The system prototype and results of experimental studies are presented in Section 5. Finally, we conclude in Section 6.
2
RELATED WORKS
There is now a proliferation of e-services that require database support. Examples of such services include cyber-shopping, online banking, paying taxes over the web, and facilities to upload personal medical records to servers, just to name a few. Such applications require connecting databases with sensitive information (credit card numbers, medical histories etc.) directly to the network (usually the Internet). Recent reports indicate that there is a large increase in the number of security breaches, which have resulted in theft of transaction information and financial fraud (Atanasov, 2001) (Hatcher, 2001). Existing works on intrusion detection have focused mainly on network and host intrusion. There are two main intrusion detection techniques - misuse detection and anomaly detection. Misuse detection schemes characterize attacks in the form of a pattern or a signature. This method of detecting intrusions/attacks works very well with known attacks, but is of little use for attacks that have not been seen and studied as it depends on known signatures. (Internet Security Systems, 2001), (NFR Security, 2001) and (Tan et al., 1998) are examples of systems that use this method. On the other hand, anomaly detection relies on establishing profiles of “normal” usage. This involves monitoring the system states over a period of time, with the assumption that this monitored profile can characterize the “normal” profile of the system. Future deviations from these profiles are considered “anomalies” and require attention. However, it is not easy to implement an anomaly detection solution. Profiling “normal” usage is a challenging task. There are too many system features to monitor and it may not be easy to determine which are the significant determinants. Anomaly detection methods are also prone to high false-positive rates, especially with over-fitted “normal” profiles. However, relaxing the constraints on these profiles might result in missed attacks. 
Thus, fine-tuning anomaly detection systems to find the optimal thresholds is a major issue. However, anomaly detection has the potential to detect unseen/unknown attacks that result in abnormal system states. (Lee et al., 1999) makes use of data mining techniques to build models that can differentiate “normal” and “anomalous” system states. There is little work on database IDSes. Most of the research on database security revolves around access policies, roles, administration procedures, physical security, security models and data inference (Castano et al., 1995). Database Scanner (Internet Security Systems, 2001) is a product from Internet Security Systems that scans for security loop-holes in database server configurations. The Database Security Manager (Pentasafe, 2001) from Pentasafe is a tool for
auditing the compliance of the database server configuration to security policies. To the best of the authors’ knowledge, there is no report on an intrusion detection system for databases.
3
PRELIMINARIES
In this section, we present the background and concepts used in our fingerprinting technique. We illustrate the concepts with an example of a typical setup for database applications.
[Figure 1: Three-tier model for database applications. Several application users connect to the application server, which accesses the database server through a single database user.]
Consider a typical 3-tier architecture for applications with a database server back-end (Figure 1). This architecture is very common, and is used for many database applications on the Internet. A typical user requests a service offered by the application server via the web/client interface. Examples of this scenario include searching for information on a book as an anonymous user at the Amazon website, and buying an air ticket after logging in as a frequent flier at the United Airlines website. The application server then communicates with the database server back-end to perform the necessary data accesses and updates. The following points are noteworthy:
1. Although the application can have a large number of users, the application server typically communicates with the database server using only a single user that acts as an agent offering the database services. For the rest of this work, we use the term “application user” to denote users accessing the application via the web/client interface. The term “database user” is used to denote the agent connecting and accessing the database on behalf of the application users. Usually, the number of application users is much larger than the number of database users.
2. We assume that persistent database connections are not used for the database applications. (Landrum, 2001) notes that persistent connections may pose security issues for applications. In our context, a database session refers to the sequence of interactions between the database user (on behalf of the application user) and the database server after logging in and before logging out. We note that a single request from the application user may result in one or more database sessions.
3. Application users do not communicate with the database server directly, but through the application server via the database user.
Although there is no direct interaction with the database server, it is possible that unauthorized users can access the database in ways unintended by the developer. This is made possible by carelessly designed applications, database server holes, as well as application server exploits. One technique of exploiting carelessly written database applications is SQL injection (Andrews, 2001). SQL injection refers to crafting SQL statements using “string building” techniques to trick the application server into executing the intruder’s (often malicious) code. Possible results of actions by the injected code include information disclosure, unauthorised data modification, deletion of the database or even escalation of the intruder’s database privileges to those of the administrator. As an illustration, consider the following Perl script:
    ...
    my $passwd = $cgi->param('passwd');
    my $name = $cgi->param('name');
    # User input is interpolated directly into the SQL string.
    $sql = "select * from cust where".
           " name='$name'".
           " and passwd='$passwd'";
    $sth = $dbh->prepare($sql);
    $sth->execute;
    if (!($sth->fetch)) {
        report_illegal_user();
    }
    ...
The script shown is a typical procedure for login checking. It is supposed to verify that the user has supplied the user name and his/her password correctly. However, this script is vulnerable to SQL injection attacks. A malicious user can enter the following text into the password field of the submitted form:
    x' OR 'x'='x
In this case, the prepared SQL statement becomes
    select * from cust where name='alice' and passwd='x' OR 'x'='x'
The where clause of this statement will always be true since the intruder has carefully injected an OR
'x'='x' clause into it. This makes the result set of the query non-empty no matter what password is supplied. The malicious user can now successfully log in as the user “alice”. The main reason for this vulnerability is the carelessly written procedure. The application should have checked the validity of the input before using it (e.g. dangerous characters such as ' should not be allowed in the input). Methods to enforce good programming practices are beyond the scope of this work. We present a technique that can detect such attacks on the database and other anomalous data accesses.
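The effect of the injected clause can be reproduced in a few lines. The following is a minimal sketch and not the authors' code: it uses Python with an in-memory SQLite database in place of the Perl and Oracle setup, the cust table mirrors the one queried by the login script, and the names make_demo_db and login_by_string_building as well as the sample password are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with one customer (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table cust (name text, passwd text)")
    conn.execute("insert into cust values ('alice', 'secret')")
    return conn


def login_by_string_building(conn, name, passwd):
    """Mimic the vulnerable login check: user input is pasted into the SQL text."""
    sql = f"select * from cust where name='{name}' and passwd='{passwd}'"
    # A non-empty result set is treated as a successful login.
    return conn.execute(sql).fetchone() is not None


if __name__ == "__main__":
    conn = make_demo_db()
    print(login_by_string_building(conn, "alice", "wrong"))         # False
    print(login_by_string_building(conn, "alice", "x' OR 'x'='x"))  # True
```

Because AND binds more tightly than OR, the injected statement is evaluated as (name='alice' and passwd='x') OR 'x'='x', which holds for every row of the table.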
4
TECHNIQUE FOR FINGERPRINTING DATABASE ACCESSES
For most applications with database services, the SQL statements submitted to the database are typically generated by server-side scripts or programs. These statements are generated in a predictable manner, and this regularity makes it possible to characterize valid transactions with signatures. For example, assume we have a delete order transaction that deletes an order with the user-specified orderID. The transaction formulates and executes two SQL statements. The first statement deletes the order and the second statement logs the transaction. A valid delete transaction will have the following SQL statements executed:
    delete from myOrders where orderID='X';
    insert into logs values ('delete','X',...);
Now, suppose the database server receives a delete transaction with the following SQL statements:
    delete from myOrders where orderID='X';
    select * from myOrders where orderID='Y';
These SQL statements do not conform to the “signature” of a valid delete transaction and require attention from the administrators. They could have been caused by maliciously crafted SQL statements from an intruder. Our fingerprinting technique derives signatures (which we call fingerprints) to help identify valid SQL statements in the transactions. Our technique involves three components:
1. A utility to log all the SQL statements submitted to the database server.
2. A process to derive the fingerprints for SQL statements of legitimate database transactions. We propose the use of regular expressions to represent the derived fingerprints.
3. A database of “legitimate” fingerprints to be used for database intrusion detection.
We discuss these three components next.
4.1
Logging of SQL Statements
We need to capture the submitted SQL statements for the database transactions in order to compare them with the fingerprints in the “legitimate” fingerprint database. Different database vendors offer different ways to capture the SQL statements. In particular, Oracle provides the sql_trace (Oracle Corporation, 2001) utility that can be used to trace all database operations in a database session of a user. The trace results are logged to a file, but we can channel the data to a monitoring and misuse detection module. Note that the sql_trace utility is not designed for this purpose. Rather, its intended use is to aid in the process of optimizing database performance. However, we make use of its capability to log SQL statements executed by the database engine. This utility works in the kernel of the Oracle system and offers better security than logging the statements at an application level above Oracle. With the logging process enveloped by the Oracle kernel, it is much harder for intruders to hijack or modify the process. A concern with using the trace facility of database systems is its impact on the performance of the databases. We evaluate the performance impact in Section 5.
4.2
Generating the Fingerprint
The crux of this work is developing an efficient process to generate a fingerprint for SQL statements. This fingerprint characterizes one SQL statement and is used to detect anomalous statements. We propose the use of regular expressions for this fingerprint, and the generation process consists of the following steps:
1. The character “*” in the SELECT clause is replaced by “\*” if it is present in the SQL statement.
2. All variables are replaced with the regular expression “[^']*”.
3. For each optional condition C, we enclose it with a pair of parentheses and append the optional operator (i.e. “(C)?”).
4. White space in the SQL statement is preserved.
5. The complete regular expression is preceded with a “^” and appended with a “$”.
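The generation steps can be sketched as a small helper. This is an illustrative sketch in Python and not the authors' implementation. The template notation is an assumption made for this example: $name marks a variable and square brackets enclose an optional condition, so SQL text containing literal square brackets is not supported. Python's re.escape is used for the literal text, which escapes every regular-expression metacharacter (and white space) rather than only “*”. The function name make_fingerprint is hypothetical.

```python
import re

# Hypothetical template tokens: "$name" is a variable, "[" and "]" delimit
# an optional condition. Everything else is literal SQL text.
_TOKEN = re.compile(r"(\$\w+|\[|\])")


def make_fingerprint(template):
    """Turn an SQL template into a regular-expression fingerprint."""
    parts = []
    for piece in _TOKEN.split(template):
        if not piece:
            continue
        if piece.startswith("$"):
            parts.append("[^']*")            # step 2: a variable matches any quote-free text
        elif piece == "[":
            parts.append("(")                # step 3: open the optional condition
        elif piece == "]":
            parts.append(")?")               # step 3: close it and mark it optional
        else:
            parts.append(re.escape(piece))   # steps 1 and 4: escape metacharacters, keep spacing
    return "^" + "".join(parts) + "$"        # step 5: anchor the whole expression


if __name__ == "__main__":
    login = make_fingerprint("select * from cust where name='$name' and passwd='$passwd'")
    ok = "select * from cust where name='alice' and passwd='secret'"
    injected = "select * from cust where name='alice' and passwd='x' OR 'x'='x'"
    print(bool(re.match(login, ok)))        # True
    print(bool(re.match(login, injected)))  # False
```

Since a variable may not contain a single quote, the injected OR clause from Section 3 cannot be absorbed by the passwd variable, and the statement fails to match.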
As an illustration, assume we have the following program:
    $sqlstm = "select * from book ";
    $sqlstm .= "where type='journal' ";
    $sqlstm .= "and author='$author'";
    if (!($pub eq "")) {
        $sqlstm .= " and pub='$pub'";
    }
In the program snippet above, the pub condition in the query is optional. The fingerprinting process will generate the following fingerprint:
    ^select \* from book where type='journal' and author='[^']*'( and pub='[^']*')?$
Notably,
1. The * in the select clause has been replaced with \*.
2. The variable author has been replaced with the expression [^']*.
3. The optional pub condition is represented by ( and pub='[^']*')?.
4. The whole expression begins with a ^ and ends with a $.
Now suppose our database server receives the following SQL statement:
    select * from book where type='SECRET' and author='me' and pub='PBX'
This statement does not match the legitimate fingerprint shown previously, because it tries to access a book whose type is 'SECRET' while our fingerprint requires that the type be 'journal'. Hence, this anomalous statement is detected. There are many software packages and utilities that can process regular expressions. The standard Unix tool egrep is one such utility that can handle the matching of our fingerprints efficiently.
4.3
Imposing Order on the SQL Statements
The fingerprinting technique is very effective in detecting anomalies in SQL statements. It is accurate and does not suffer from a large number of false positives. However, to add another layer of security assurance, we can impose a constraint on the ordering of the SQL statements submitted. In the unlikely event that an intruder manages to craft a malicious SQL statement that passes our fingerprint test, it still has to contend with the imposed order constraint on the statements. The allowed ordering can be captured by an augmented directed graph called an activation graph, G(V, E), where
1. V is the set of valid SQL regular expressions (i.e. fingerprints).
2. Each vertex v in V is augmented with two boolean flags F_begin and F_end. F_begin is true if and only if there exists some SQL statement matching v that is the first SQL statement of some transaction. Likewise, F_end is true if and only if it is possible to have some SQL statement matching v that is executed as the last statement of some transaction.
3. E is the set of directed edges. A regular expression r1 has a directed edge to another regular expression r2 if and only if an SQL statement that matches r1 can be executed just before an SQL statement that matches r2.
For example, consider the following code snippet that increases the freq value by 1 if the client is of VIP status.
    $stm = "select count(*) from order".
           " where oid='". $orderID . "'";
    $sth = $dbh->prepare($stm);
    $sth->execute;
    ...
    if ($vip==1) {
        $stm = "update customer".
               " set freq=freq+1".
               " where vipID='".$vipID."'";
        $sth = $dbh->prepare($stm);
        $sth->execute;
    }
    ...
The activation graph will consist of the following fingerprints:
1. r1: ^select count\(\*\) from order where oid='[^']*'$
2. r2: ^update customer set freq=freq\+1 where vipID='[^']*'$
Since “(”, “)” and “+” are also regular expression metacharacters, they are escaped in the same way as “*”. From the code snippet, we identify that r2 can optionally follow r1. Furthermore, r1 can both start and end the transaction (i.e. both the F_begin and F_end flags of r1 are true), and r2 can only end the database session (i.e. F_begin is false and F_end is true for r2). The activation graph for this scenario is depicted in Figure 2.
From the graph, it is clear that any database session that starts with r2 is an anomaly and invalid.
[Figure 2: Activation Graph. Vertex r1 (F_begin: true, F_end: true) has a directed edge to vertex r2 (F_begin: false, F_end: true).]
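The ordering check over an activation graph can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the authors' implementation: the Vertex class and the functions match_vertex and session_is_valid are hypothetical names, the graph is the two-vertex example of Figure 2, and an empty session is treated as invalid.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Vertex:
    pattern: str                                    # the fingerprint (regular expression)
    f_begin: bool                                   # may start a database session
    f_end: bool                                     # may end a database session
    successors: set = field(default_factory=set)    # names of vertices that may follow


# Activation graph of Figure 2: r1 may be followed by r2.
GRAPH = {
    "r1": Vertex(r"^select count\(\*\) from order where oid='[^']*'$", True, True, {"r2"}),
    "r2": Vertex(r"^update customer set freq=freq\+1 where vipID='[^']*'$", False, True),
}


def match_vertex(graph, statement):
    """Return the name of the first vertex whose fingerprint matches, else None."""
    for name, vertex in graph.items():
        if re.match(vertex.pattern, statement):
            return name
    return None


def session_is_valid(graph, statements):
    """Check fingerprints, the begin/end flags and the allowed ordering."""
    previous = None
    for statement in statements:
        current = match_vertex(graph, statement)
        if current is None:
            return False                            # no legitimate fingerprint matches
        if previous is None:
            if not graph[current].f_begin:
                return False                        # statement may not start a session
        elif current not in graph[previous].successors:
            return False                            # ordering not allowed by an edge
        previous = current
    # Assumption: an empty session is rejected; otherwise the last vertex must allow ending.
    return previous is not None and graph[previous].f_end
```

With this graph, a session consisting of an r1 statement alone, or an r1 statement followed by an r2 statement, is accepted, while any session that starts with an r2 statement is rejected.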
5
THE DIDAFIT SYSTEM PROTOTYPE
In this section, we present a prototype of the DIDAFIT (Detection of Intrusions in DAtabases through FIngerprinting Transactions) system. We detail how the system is prepared, set up and deployed. The results of performance studies done on the prototype are discussed at the end of this section. The architecture of DIDAFIT is depicted in Figure 3. DIDAFIT requires a set of fingerprints which characterizes the set of legitimate database accesses for the application. The process to generate the set of fingerprints has been detailed in Section 4.2. It may require some effort to perform the fingerprinting process on all legitimate transactions. However, we note that this is a one-time effort. Moreover, it is possible to automate this process for large and complex applications. This will not be discussed here due to space constraints. Assuming the database of legitimate fingerprints has been set up, the DIDAFIT system works as follows:
1. The application user issues a service request to the application server. This transaction may or may not be legitimate.
2. The application server formulates the necessary SQL statements and issues them to the database server through the database user.
3. The database user logs into the database. The database session is traced and the SQL statements received from the application are channeled to the misuse detection module.
4. In the misuse detection module, the received SQL statements are matched against the set of fingerprints of legitimate database transactions.
5. Anomalies or intrusions are then channeled to the reaction modules for the appropriate actions to be taken. Actions that can be taken include alerting the administrators, sounding the alarm on the console and paging the duty personnel.
6. Output is returned to the application user (if applicable).
As noted previously, most database applications do not allow users to issue their own SQL queries/statements. Users typically specify their requirements through a client interface and the SQL statements are generated by server-side scripts/programs. This results in a limited number of different SQL statements that are possible with the application (even with optional clauses and variants of the generated statements). Hence, once we characterize the legitimate set of transactions, this fingerprinting technique for database intrusion detection will raise very few false alarms (if any) due to this regularity.
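Steps 4 and 5 of this workflow can be sketched as a small matching loop. The sketch below is an assumption-laden illustration in Python, not the DIDAFIT code: the names LEGITIMATE_FINGERPRINTS, find_anomalies and alert_administrator are hypothetical, the two fingerprints are taken from the examples in Section 4, and the traced statements are passed in as a plain list rather than read from a database trace.

```python
import re

# Illustrative fingerprint database, using two fingerprints from Section 4.
LEGITIMATE_FINGERPRINTS = [
    r"^delete from myOrders where orderID='[^']*'$",
    r"^select \* from book where type='journal' and author='[^']*'( and pub='[^']*')?$",
]


def find_anomalies(statements, fingerprints=LEGITIMATE_FINGERPRINTS):
    """Step 4: return the statements that match no legitimate fingerprint."""
    compiled = [re.compile(fp) for fp in fingerprints]
    return [s for s in statements if not any(p.match(s) for p in compiled)]


def alert_administrator(statement):
    """Step 5: a stand-in reaction module that only prints an alert."""
    print("ALERT: statement matches no legitimate fingerprint:", statement)


if __name__ == "__main__":
    traced = [
        "delete from myOrders where orderID='X'",
        "select * from myOrders where orderID='Y'",
    ]
    for anomaly in find_anomalies(traced):
        alert_administrator(anomaly)
```

Compiling the fingerprints once per batch keeps the per-statement cost to a linear scan over the fingerprint set; a production module would more likely compile them once at start-up.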
5.1
Performance Studies
One concern of implementing a layer of intrusion detection functionality over the database server is the issue of performance. We require that the intrusion detection facility be efficient and that it not affect the performance of the database adversely. Our testbed mimics the setup shown in Figure 3. The machine hosting the database server is a dual processor SUN Ultra-SPARC machine running Solaris 8 with 512 MB of RAM. The database server used is Oracle 8.1.7. The applications are written in Perl, and the connections to the databases are made using the Perl DBI/DBD modules. The first experiment investigates the effect of the size of the database transactions on our system. We are interested in the response times of the system as the transaction size increases, and in comparing them with the timing results when DIDAFIT is not used. We characterize the transaction sizes using load units. A transaction size of x load units is defined as
1. A select statement which will return a result set of x × 500 rows.
2. Inserting x rows.
3. Deleting x rows.
The result of this experiment is shown in Figure 4. As we can see from the results, running DIDAFIT has minimal effect on the performance of the database. This is the case even when the transaction size is 600 load units. This transaction includes a select query whose result set has 300,000 rows. For this transaction size, the difference in response time for the system with and without running DIDAFIT is only 6 seconds. For the next experiment, we monitor the response time from the database as we increase the number of users on the system. Each user issues a transaction of 1 load unit as defined previously. The results are depicted graphically in Figure 5. At the load with 200 users, the difference in time taken is about 4 seconds. The results from the experiments show that the impact of running DIDAFIT on the performance of the database server is minimal. We believe that this small increase in response time can easily be obscured by other delays in operational environments (such as network lags).
[Figure 3: Architecture for DIDAFIT. Components: Application User, Application Server, Database User, Database Server, Monitor and Misuse Detection Module, Fingerprint Database, Actions to be taken. The arrows are numbered 1 to 6, corresponding to the steps listed above.]
[Figure 4: Effect of DIDAFIT on response times with increasing transaction sizes. Panel title: Effect of scaling up transaction sizes. Time (sec) against Transaction Size (units), 0 to 600, for the series “Not running DIDAFIT” and “Running DIDAFIT”.]
[Figure 5: Effect of DIDAFIT on response times with increasing number of users. Panel title: Effect of scaling up number of users. Time (sec) against Number of users, 0 to 200, for the series “Not running DIDAFIT” and “Running DIDAFIT”.]
6
CONCLUSION
Database intrusion is a major threat to any organization storing valuable and confidential data in databases. This is increasingly so as the number of database servers connected to the Internet grows rapidly. Existing network-based and host-based intrusion detection systems are not sufficient for detecting database intrusions.
We have presented a novel technique for fingerprinting SQL statements and using the fingerprints to identify anomalous database accesses. First, we build the set of fingerprints that characterize all legitimate database transactions. This set of fingerprints is then used to match incoming database transactions. If the set of legitimate fingerprints is complete, any incoming transaction that does not match any fingerprint in the set is very likely to be an intrusion. Moreover, the false-positive rate of this technique is very low since database applications typically do not allow users to craft their own queries. Instead, they usually allow for the customization of several standard queries through an interface. Thus, false alarms will be rare since the SQL statements are constrained to the allowed variants of the standard queries. Experimental results show that there is little performance hit on the database with our technique, even with large transaction sizes and number of users.
Currently, the technique detects the illegitimate transactions after they have been processed. Although the detection can be done in near-real time, we are looking into how the intrusion can be detected before the statements are executed. Although persistent connections pose security risks, they are often used for efficiency reasons. It is worthwhile to see how our technique can be extended to support persistent database connections.
REFERENCES
Andrews, C. (2001). SQL injection FAQ. Available at URL http://www.sqlsecurity.com.
Atanasov, M. (2001). The Truth about Internet Fraud. Available at URL http://techupdate.zdnet.com/techupdate/stories/main/0,14179,2688776-11,00.html.
Castano, S., Fugini, M. G., Martella, G., and Samarati, P. (1995). Database Security. Addison-Wesley – ACM Press.
Cisco Systems, Inc. (2001). Cisco intrusion detection. Available at URL http://www.cisco.com/warp/public/cc/pd/sqsw/sqidsz/.
Enterasys Networks, Inc. (2001). The Dragon IDS. Available at URL http://www.enterasys.com/ids/dragonids.html.
Hatcher, T. (2001). Survey: Costs of computer security breaches soar. Available at URL http://www.cnn.com/2001/TECH/internet/03/12/csi.fbi.hacking.report/.
Internet Security Systems (2001). RealSecure Intrusion Detection Solution. Available at URL http://www.iss.net.
Internet Security Systems (2001). Database Scanner. Available at URL http://www.iss.net/securing_e-business/security_products/security_assessment/database_scanner/.
Landrum, D. E. (2001). Web application and databases security. Available at URL http://www.sans.org/infosecFAQ/securitybasics/web_app.htm.
Lee, W., Stolfo, S., and Mok, K. (1999). A data mining framework for building intrusion detection models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy.
NFR Security (2001). NFR network intrusion detection. Available at URL http://www.nfr.com/products/NID/.
Oracle Corporation (2001). Oracle. Available at URL http://www.oracle.com.
Pentasafe (2001). Database security manager. Available at URL http://www.pentasafe.com/products/database-overview.htm.
Roesch, M. (1999). Snort: Lightweight intrusion detection for networks. In Proceedings of the 13th Conference on Systems Administration (LISA-99). USENIX Association.
Symantec Corporation (2001). Enterprise solutions. Available at URL http://enterprisesecurity.symantec.com/.
Tan, Y. T., Tan, W. K., Ong, T. H., and Ting, C. (1998). Nidar: The design and implementation of an intrusion detection system. In Proceedings of the First International Workshop on the Recent Advances in Intrusion Detection (RAID).