(IJCNS) International Journal of Computer and Network Security, Vol. 2, No. 6, June 2010
Clustering Based Machine Learning Approach for Detecting Database Intrusions in RBAC Enabled Databases
Udai Pratap Rao 1, G. J. Sahani 2, Dhiren R. Patel 3
1 Dept. of Computer Engineering, S.V. National Institute of Technology Surat, Gujarat, INDIA
[email protected]
2 Dept. of Computer Engineering, SVIT, Vadodara, Gujarat, INDIA
[email protected]
3 Dept. of Computer Science & Engineering, Indian Institute of Technology Gandhinagar, Ahmedabad, Gujarat, INDIA
[email protected]
Abstract: Database security is an important issue for any organization. Data stored in databases is very sensitive and must therefore be protected from unauthorized access and manipulation. Database management systems provide a number of mechanisms to stop unauthorized access to the database, but intelligent hackers are able to break the security of database systems. Most database systems are vulnerable, or the environment in which the database system resides may be vulnerable, and people who know such vulnerabilities can easily gain access to the database. Unauthorized suspicious activities can be trapped by database management systems, but some authorized users can violate the security constraints, and traditional database mechanisms are not sufficient to handle such attacks. Early detection of any authorized or unauthorized malicious access to a database is very important for database recovery and for limiting the losses that can result from manipulation of data. A number of intrusion detection systems (IDSs) exist for detecting intrusions in network systems, but these IDSs cannot detect database intrusions, and very few IDS mechanisms for databases have been proposed. Here we propose an unsupervised machine learning approach for detecting intrusions in databases enabled with a Role Based Access Control (RBAC) mechanism.
Keywords: Database Security, Malicious Transactions, Clustering Technique
1. Introduction
Databases not only allow the efficient management and retrieval of huge amounts of data, but also provide mechanisms that can be employed to ensure the integrity of the stored data. Data in these databases may range from credit card numbers to personal information such as medical records. Unauthorized access to or modification of such data results in large losses for customers, so database security has become an important issue for most organizations. Recently, a number of database attack incidents have occurred in which many customer records were stolen. Most of these attacks were caused by badly coded database applications or by the exploitation of database system vulnerabilities, and web applications are the main source of database attacks. Attackers may attack databases for several reasons, and they may devise newer techniques of database attack over time.
In today's network environment, it is necessary to protect data from attackers. Database attacks are mainly of two types: 1) intentional unauthorized attempts to access or destroy private data; 2) malicious actions executed by authorized users to cause loss or corruption of critical data. Although a number of approaches are available to detect unauthorized attempts to access data, attackers still succeed in attacking systems because of their vulnerabilities. As database security mechanisms are not primarily designed to detect intrusions, there are many cases where the execution of malicious sequences of SQL commands (transactions) cannot be detected. It therefore becomes necessary to employ an intrusion detection system [1]. If a computer system is compromised, early detection is the key to recovering lost or damaged data without much complexity: when an attacker or a malicious user updates the database, the resulting damage can spread very quickly to other parts of the database. Intrusion Detection Systems (IDSs) provide good protection against attacks aimed at taking down access to the network, such as Distributed Denial of Service attacks and TCP SYN Flood attacks, but such systems cannot detect malicious database activity performed by users. In recent years, researchers have proposed a variety of approaches for increasing intrusion detection efficiency and accuracy [2]-[5], but most of these efforts concentrated on detecting intrusions at the network or operating system level, and such mechanisms are not capable of detecting malicious data corruption. Very few ID mechanisms have been tailored specifically to database systems, so considerable effort is still required in the area of database intrusion detection. A database intrusion detection system determines the normal behavior of users accessing the database; any deviation from such behavior is treated as an intrusion.
There are two main models of intrusion detection: anomaly detection and misuse detection. The anomaly detection model bases its decision on a profile of the user's normal behavior. It analyzes the user's current session and compares it with the profile representing his or her normal behavior; an alarm is raised if a significant deviation is found during this comparison. This type of system is well suited to detecting previously unknown attacks. Its main disadvantages are that it may not be able to describe what the attack is and that it may sometimes have a high false positive rate. In contrast, a misuse detection model makes its decision by comparing the user's session or commands with the rules or signatures of attacks previously used by attackers.
We present an unsupervised machine learning approach for detecting intrusions in databases enabled with a role based access control (RBAC) mechanism. In such databases, a number of roles have been defined and assigned to the users of the database system, and appropriate privileges have been assigned to these roles with database security in view. The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 gives a detailed overview of our approach. Section 4 presents the analysis and results of our approach. Finally, Section 5 concludes the paper, and the references are given at the end.
2. Related Work
Application of machine learning techniques to database security is an emerging area of research, and various approaches use machine learning or data mining techniques to enhance the traditional security mechanisms of databases. Bertino et al. [6] have proposed a framework based on anomaly detection techniques to detect malicious behavior of database application programs. Association rule mining is applied to query traces from database logs to determine the normal behavior of application programs. This scheme may suffer from high detection overhead when there is a large number of distinct template queries, since the number of association rules to be maintained then becomes large. DEMIDS is a misuse-detection system tailored for relational database systems [7]. It uses audit log data to derive profiles describing typical patterns of access by database users. The main drawback of the approach presented in [7] is a lack of implementation and experimentation: the approach has only been described theoretically, and no empirical evidence of its performance as a detection mechanism has been presented. Yi Hu and Brajendra Panda proposed a data mining approach [8] for intrusion detection in database systems. This approach determines the data dependencies among the data items in the database system, and read and write dependency rules are generated to detect intrusions. The approach is novel, but its scope is limited to detecting malicious behavior in user transactions, and within that, to user transactions that conform to the read-write patterns assumed by the authors. Also, the system is not able to detect malicious behavior in individual read-write commands, its false alarm rate may be high, and it does not account for different access roles. Sural et al. [9] have presented an approach for extracting dependencies among the attributes of a database using weighted sequence mining.
They take the sensitivity of data items into consideration in the form of weights. An advantage of this approach is that more rules are generated than with the approach presented in [8], and generating more rules reduces false alarms. However, it is also not well suited to role based database access. Kamra et al. [10] have proposed a role based approach for detecting malicious behavior in RBAC (role based access control) administered databases. A classification technique is used to deduce role profiles of normal user behavior, and an alarm is raised if the role estimated by the classifier for a given user differs from the user's actual role. The approach is well suited to databases that employ a role based access control mechanism, and it directly addresses insider threat scenarios. Its limitation, however, is that it is a query-based approach and cannot extract correlations among the queries of a transaction.
3. Our Approach
The approach we present is a transaction-level approach. Attributes referred to together in the read and write operations of transactions play an important role in defining the normal behavior of users' activities. For example, consider the following transaction:
Begin transaction
select a1, a2, a3 from t1 where a1 = 25;
update t2 set a4 = a2 + 1.2 * a3;
End transaction
where t1 and t2 are tables of the database, a1, a2 and a3 are attributes of table t1, and a4 and a5 are attributes of table t2. This example shows the correlation between the two queries of the transaction: after issuing the select query, the same user should also issue the update query, in the same transaction. The approach presented in [10] can easily detect the attributes that are to be referred to together, but it cannot detect the queries that are to be executed together. Our approach extracts this correlation among the queries of a transaction.
In this approach, the database log is read to extract the list of tables accessed by each transaction and the lists of attributes read and written by it. The extracted information is represented in the following structure:
(Read, TB-Acc[ ], Attr-Acc[ ][ ], Write, TB-Acc[ ], Attr-Acc[ ][ ])
where Read and Write are binary fields, TB-Acc[ ] is a binary vector whose size equals the number of relations in the database, and Attr-Acc[ ][ ] is a vector of N binary vectors, one per relation, N being the number of relations in the database. Read is 1 if the transaction contains a select query and 0 otherwise. Similarly, Write is 1 if the transaction contains an update or insert query and 0 otherwise. Element TB-Acc[i] = 1 if the SQL command at hand accesses the i-th table and 0 otherwise, and element Attr-Acc[i][j] = 1 if the SQL command at hand accesses the j-th attribute of the i-th table and 0 otherwise. Table 1 shows the representation of the example transaction above in this format.
Table 1: Representation of example transaction
Rd  t1  t2  a1  a2  a3  a4  a5
1   1   0   1   1   1   0   0

Table 1 (continued)
Wt  t1  t2  a1  a2  a3  a4  a5
1   0   1   0   0   0   1   0

Where Rd = Read and Wt = Write.
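The encoding of a transaction into this structure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the tables and attributes read and written by a transaction have already been extracted from the log (log parsing is not shown), and the schema dictionary `SCHEMA` and the functions `encode_half` and `encode_transaction` are hypothetical names. Attr-Acc[ ][ ] is flattened table by table, which matches the column order of Table 1.

```python
# Hypothetical toy schema from the running example: table -> ordered attributes.
# Dict insertion order fixes the bit order (t1 before t2, then a1..a5).
SCHEMA = {"t1": ["a1", "a2", "a3"], "t2": ["a4", "a5"]}


def encode_half(accessed):
    """Encode one half of the structure: [flag] + TB-Acc[] + flattened Attr-Acc[][].

    `accessed` maps a table name to the set of its attributes that were read
    (or written); an empty dict means no select (or no update/insert) occurred.
    """
    flag = 1 if accessed else 0
    tb_acc = [1 if table in accessed else 0 for table in SCHEMA]
    attr_acc = [
        1 if attr in accessed.get(table, set()) else 0
        for table, attrs in SCHEMA.items()
        for attr in attrs
    ]
    return [flag] + tb_acc + attr_acc


def encode_transaction(reads, writes):
    """(Read, TB-Acc[], Attr-Acc[][], Write, TB-Acc[], Attr-Acc[][]) as one bit list."""
    return encode_half(reads) + encode_half(writes)


# Example transaction of Table 1: reads a1, a2, a3 of t1; writes a4 of t2.
bits = encode_transaction({"t1": {"a1", "a2", "a3"}}, {"t2": {"a4"}})
print("".join(map(str, bits)))  # -> 1101110010100010
```

Encoding tr1 and tr2 from the similarity example in the same way gives 1101100010100010 and 1101010010100001, i.e. the bit patterns shown for those transactions with the spaces removed.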
The values of the fields of the above structure form the normal behavior of the transactions issued by a user, and a violation of such behavior is detected as anomalous. The overall approach is depicted in Figure 1.
Figure 1. Overview of the proposed approach. Learning phase: Database Log (History Transactions) → Preprocess (Read Items, Write Items) → Clustering → Clusters (Role Profiles). Detection phase: Current Session (User transaction) → Comparison against the role profiles → Outlier: Raise Alarm; otherwise Update, recorded in the New DB Log.
Information about the roles of the users who issued the transactions, and about the data items read and written by these transactions, is gathered from the database log. The history transactions gathered from the log are preprocessed and stored as binary bits representing the items read and the items written by each transaction, in the form of the structure presented above. Data in this form constitutes the dataset for clustering. Clustering forms groups of similar transactions. These groups represent the normal behavior of the users who issued such transactions, i.e. the role profiles of the users who are authorized to issue them. Once the role profiles are generated, the next goal is to predict the group of each new incoming transaction. If the incoming transaction is found to be a member of one of the clusters, it is considered a valid transaction. If it is detected as an outlier, it is considered an invalid transaction and an alarm is generated. Valid transactions are executed on the database and added to the database log.
As the history transactions are to be partitioned into a number of groups, we use the k-means clustering algorithm, which is among the fastest partitioning clustering algorithms. The training tuples generated from the database log have binary fields, so similarity measures for binary variables can be used to cluster them. The similarity measure between two tuples used by the clustering algorithm in our approach is:
$$ simm(t_1, t_2) = \frac{ncount_{11}}{ncount_{11} + ncount_{10} + ncount_{01}} $$
where (here t1 and t2 denote tuples, not the tables of the earlier example):
ncount11 is the number of binary fields in which both tuples t1 and t2 have the value 1;
ncount10 is the number of binary fields in which t1 has the value 1 and t2 has the value 0;
ncount01 is the number of binary fields in which t1 has the value 0 and t2 has the value 1.
Fields in which both tuples have the value 0 do not contribute to the measure.
For example, consider the following transactions:
Transaction tr1
Begin Transaction
select a1, a2 from t1;
update t2 set a4;
End Transaction
Corresponding bit pattern (grouped as Read and TB-Acc, Attr-Acc, Write and TB-Acc, Attr-Acc): 110 11000 101 00010
Transaction tr2
Begin Transaction
select a1, a3 from t1;
update t2 set a5;
End Transaction
Corresponding bit pattern: 110 10100 101 00001
Here ncount11 = 5, ncount10 = 2 and ncount01 = 2, so the similarity of tr1 and tr2 is simm(tr1, tr2) = 5/(5+2+2) = 5/9 ≈ 55.6%.
An advantage of our unsupervised approach is that the role information of the transactions need not be recorded in the database log. Behaviors of users belonging to the same role are grouped into the same cluster. The approach is also well suited for users with more than one role; only the detection phase needs to be generalized.
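The similarity measure and the learning and detection phases can be sketched together as below. This is a hedged sketch, not the authors' code. The paper names k-means but does not say how centroids of binary tuples are formed or when a transaction counts as an outlier, so the sketch assumes (1) a bitwise-majority centroid update, a k-modes-style variant used because the mean of binary vectors is not binary, (2) deterministic farthest-first seeding, and (3) a hypothetical similarity threshold `threshold = 0.8` below which a transaction matches no role profile. The function names `simm`, `majority`, `init_centroids`, `cluster` and `is_outlier`, and the toy data, are illustrative.

```python
def simm(x, y):
    """simm = ncount11 / (ncount11 + ncount10 + ncount01), the Jaccard coefficient.

    Fields where both tuples are 0 are ignored. Two all-zero tuples are treated
    as identical (similarity 1.0) to avoid dividing by zero (an assumption).
    """
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    total = n11 + n10 + n01
    return n11 / total if total else 1.0


def majority(members):
    """Binary centroid (assumed): a bit is 1 if at least half of the members set it."""
    n = len(members)
    return [1 if 2 * sum(column) >= n else 0 for column in zip(*members)]


def init_centroids(data, k):
    """Farthest-first seeding: start from data[0], then repeatedly add the tuple
    whose best similarity to the centroids chosen so far is lowest."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        farthest = min(data, key=lambda x: max(simm(x, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def cluster(data, k, max_iter=20):
    """Learning phase: k-means-style loop using simm as the similarity."""
    centroids = init_centroids(data, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in data:
            best = max(range(k), key=lambda i: simm(x, centroids[i]))
            groups[best].append(x)
        # Keep the old centroid if a cluster ends up empty.
        updated = [majority(g) if g else centroids[i] for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def is_outlier(x, centroids, threshold=0.8):
    """Detection phase: alarm if x is not similar enough to any role profile."""
    return max(simm(x, c) for c in centroids) < threshold


tr1 = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0]  # select a1,a2 from t1; update t2 set a4
tr2 = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]  # select a1,a3 from t1; update t2 set a5
print(round(simm(tr1, tr2), 3))  # -> 0.556

# Toy history (illustrative only): one role issues tr1, another only reads t2.
read_t2 = [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # select a4,a5 from t2
profiles = cluster([tr1] * 5 + [read_t2] * 5, k=2)
select_only = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # tr1 without its update
print(is_outlier(tr1, profiles), is_outlier(select_only, profiles))  # -> False True
```

The value 0.556 matches the hand computation 5/9 above. With the assumed threshold of 0.8, tr1 falls inside a learned profile while the same transaction with its update query removed is flagged; the data and threshold are illustrative, not taken from the paper's experiments.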
4. Results and Analysis
To verify our approach, we generated a number of database tables with a number of attributes. We defined a number of roles and generated a number of transactions for these roles. Based on these transactions, we also generated a large number of tuples as a training dataset. For detection, we generated a number of valid as well as invalid transactions. We tested our approach by supplying both valid and invalid transactions, and the approach detected these transactions with full accuracy. We considered all the possible ways of generating valid and invalid transactions and obtained the correct result in all cases. Our approach correctly detects correlations among the commands of a transaction: when we issued valid transactions with one of their SQL commands removed, they were detected as invalid, and when we issued the transactions with all the required SQL commands, they were detected as valid. As expected, training time varied linearly with the number of training tuples. Figure 2 shows training time versus the number of training tuples.
Figure 2. Training Time vs. Training Data
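A standard cost argument for k-means-style clustering is consistent with the linear trend in Figure 2, assuming the number of clusters k, the number of iterations I and the tuple length d stay fixed as the number of training tuples n grows. Each evaluation of simm scans the d binary fields once, and each iteration compares every tuple with every centroid and then recomputes the centroids. Here N is the number of relations and M (a symbol introduced only for this sketch) is the total number of attributes, so the running example has d = 2(1 + 2 + 5) = 16:

```latex
\begin{aligned}
d &= 2\,(1 + N + M) \\
T_{\mathrm{train}}(n) &= I \cdot \bigl( \underbrace{O(n\,k\,d)}_{\text{assignment}} + \underbrace{O(n\,d)}_{\text{centroid update}} \bigr) = O(I\,k\,d\,n)
\end{aligned}
```

This is a sketch of the expected behavior under those assumptions, not an analysis given by the authors.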
5. Conclusion
In this paper we have proposed a new unsupervised machine learning approach to intrusion detection for databases in which the role based access control (RBAC) mechanism is enabled. It considers the correlations among the queries of a transaction and detects anomalous transactions accordingly. It does not require role information to be recorded in the database log. The clusters of transactions it generates can also guide the database administrator in defining roles.
References
[1] Fredrik Valeur, Darren Mutz, and Giovanni Vigna, "A learning-based approach to the detection of SQL attacks," In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), pages 123-140, 2003.
[2] V. C. S. Lee, J. A. Stankovic, and S. H. Son, "Intrusion Detection in Real-time Database Systems Via Time Signatures," In Proceedings of the Sixth IEEE Real Time Technology and Applications Symposium, pages 121-128, 2000.
[3] Marco Vieira and Henrique Madeira, "Detection of Malicious Transactions in DBMS," IEEE Proceedings 11th Pacific Rim International Symposium on Dependable Computing, pp. 8, Dec 12-14, 2005.
[4] Ashish Kamra, Elisa Bertino, and Evimaria Terzi, "Detecting anomalous access patterns in relational databases," The International Journal on Very Large Data Bases (VLDB), 2008.
[5] Wai Lup Low, Joseph Lee, and Peter Teoh, "DIDAFIT: Detecting intrusions in databases through fingerprinting transactions," ICEIS 2002 - Databases and Information Systems Integration, pages 121-127, 2002.
[6] Elisa Bertino, Ashish Kamra, and James Early, "Profiling database application to detect SQL injection attacks," IEEE International Performance, Computing, and Communications Conference (IPCCC) 2007, pages 449-458, April 2007.
[7] C. Y. Chung, M. Gertz, and K. Levitt, "DEMIDS: a misuse detection system for database systems," In Integrity and Internal Control in Information Systems: Strategic Views on the Need for Control, IFIP TC11 WG11.5 Third Working Conference, pages 159-178, 2000.
[8] Yi Hu and Brajendra Panda, "A data mining approach for database intrusion detection," In SAC '04: Proceedings of the 2004 ACM Symposium on Applied Computing, pages 711-716, New York, NY, USA, 2004.
[9] Abhinav Srivastava, Shamik Sural, and A. K. Majumdar, "Database intrusion detection using weighted sequence mining," Journal of Computers, Vol. 1, No. 4, pages 8-12, July 2006.
[10] Elisa Bertino, Ashish Kamra, and Evimaria Terzi, "Intrusion detection in RBAC-administered databases," In Proceedings of the Annual Computer Security Applications Conference (ACSAC), 2005.