The 3rd International Conference on Computer Science and Computational Mathematics (ICCSCM 2014)

Implementing a CBR Recommender for Honeypot Configuration using jCOLIBRI

Wira Zanoramy Zakaria (1), Miss Laiha Mat Kiah (2)

(1) MyCERT, CyberSecurity Malaysia, Level 7, Sapura@Mines, No. 7 Jalan Tasik, The Mines Resort City, 43300 Seri Kembangan, Selangor, Malaysia
(2) Department of Computer Systems & Technology, Faculty of Computer Science & IT, University of Malaya, 50603 Kuala Lumpur, Malaysia

Abstract: A dynamic and intelligent honeypot can learn the behavior of the network and configure itself automatically. This research proposes a Case-based Reasoning (CBR) methodology to realize a CBR recommender system for the domain of honeypot configuration and deployment. The prototype recommender system is built with jCOLIBRI, a Java-based CBR framework. This paper describes the architecture of the proposed system, the case base, case representation, case retrieval, case reuse and case revision. The case base for this system starts with an initial set of 10 honeypot cases.

Keywords: case-based reasoning system, honeypot, jCOLIBRI

1. Introduction

Computer systems and network infrastructures are the backbone of any organization, especially in banking and government. If this backbone stops working properly, the organization faces serious trouble, and its decision-making capability is affected at the same time. It is therefore very important to secure the organization's network environment as much as possible so that attackers cannot attack and exploit it. If an attacker succeeds, the organization incurs significant losses in cost and reputation, and, worse, sensitive information could be leaked to the Internet. With all sorts of businesses and government sites running on the Internet, the Internet has become a place for cybercriminals. Thousands of them are connected to it, breaking into real production systems and exploiting vulnerabilities on those systems for fun and profit. Furthermore, they actively share their tools and tactics with other attackers on the Internet [1, 2].

In 2013, the Malaysian CERT (MyCERT), through its Cyber999 security incident handling service, received a total of 10636 tickets about security incidents that occurred in Malaysia. MyCERT is a department under CyberSecurity Malaysia that is mandated as the computer emergency response team (CERT) for Malaysia. Of these reported incidents, fraud, intrusion and malicious code were the top three security threats in Malaysia. Table 1 lists the incident categories and their respective numbers of cases [3].

Table 1. List of reported security incidents for 2013.

Categories of Incidents    No. of Cases
Fraud                      4485
Intrusion                  2770
Malicious Codes            1751
Spam                       950
Cyber Harassment           512
Intrusion Attempt          76
Content Related            54
DoS                        19
Vulnerabilities Report     19
Total                      10636

System administrators must always ask themselves these questions. What is the worst thing that could happen if an unauthorized person had full privileged access to their computer systems and network environment without any authorization or detection? Could this event compromise the whole organization, its customers and the country? How much productivity would the organization lose if all of its databases were completely compromised or erased? Even though much effort has been made to secure the Internet from cyberattacks, loopholes for attackers still exist. Attacks that happen without the knowledge of the network administrator are a very serious issue.

For this reason, IT security administrators need a deception-based approach called a "honeypot" to detect malicious activities and, at the same time, to better understand the taxonomy of attacks. A honeypot is an information system resource that is deployed inside the network and purposely designed to be scanned, attacked and compromised [4]. This technology plays an important role because it provides in-depth information about the steps and techniques attackers use when compromising computer systems and networks. This information gives a deep understanding of the attackers' motives and skills, and of the tools they use to intrude into and compromise networks. By identifying the capabilities and tactics of the attackers, network administrators can also discover vulnerabilities in their own network. Precautions and improvements can then be taken to increase the security of the network environment and prevent the same attacks from happening again.


2. Problem Statement

Honeypots are resources that are purposely set up inside the network to lure attackers and capture their tools and techniques without the attackers' knowledge. The captured information is then used to learn about the attacks and as a guide for improving security measures in the future [5, 6]. Honeypots fall into two categories: low-interaction honeypots and high-interaction honeypots. Each type offers a specific level of interaction with the attacker, which determines how much data it can collect from that interaction. Whatever the type, accurate configuration and deployment are essential, because they define how the honeypot behaves towards attackers [7]. Configuring and maintaining honeypots is one of the main challenges of honeypot deployment [8]. The dynamic nature of network environments and ever-changing cyberattack tactics make it tedious for honeypot administrators to configure and maintain their honeypots over time [9]. Configuration issues include which operating system the honeypot will run, which TCP and UDP ports to open, which services to offer attackers, which IP address the honeypot will be attached to, and how the honeypot responds to the attacker. For example, if the network has a production FTP server running Fedora Linux 18 at IP address 10.10.0.13, a properly configured honeypot should run the same OS and offer port 21. The honeypot must also be attached to an IP address near the production host, for example 10.10.0.9. Misconfiguration can lead to missed detections, failure to attract attackers at all, or even a honeypot that becomes a launch point for attackers to cause further damage [8].
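The FTP example can be made concrete. The sketch below is illustrative only and is not part of the proposed prototype. It checks a proposed honeypot against a production host for the mismatches listed above (OS, offered ports, subnet) and picks a free address close to the host. The dictionary fields, the /24 subnet and the helper names `nearest_free_ip` and `mirror_problems` are hypothetical.

```python
import ipaddress


def nearest_free_ip(host_ip, used, prefix=24):
    """Return the free address closest to host_ip inside its subnet, or None (hypothetical helper)."""
    net = ipaddress.ip_network(f"{host_ip}/{prefix}", strict=False)
    host = int(ipaddress.ip_address(host_ip))
    taken = {int(ipaddress.ip_address(a)) for a in used} | {host}
    # Usable range excludes the network and broadcast addresses.
    lo, hi = int(net.network_address) + 1, int(net.broadcast_address) - 1
    # Walk outwards from the host: +1, -1, +2, -2, ...
    for step in range(1, hi - lo + 1):
        for cand in (host + step, host - step):
            if lo <= cand <= hi and cand not in taken:
                return str(ipaddress.ip_address(cand))
    return None


def mirror_problems(host, honeypot, prefix=24):
    """List the ways a proposed honeypot fails to mirror a production host (hypothetical helper)."""
    problems = []
    if honeypot["os"] != host["os"]:
        problems.append("OS personality differs from the production host")
    missing = set(host["ports"]) - set(honeypot["ports"])
    if missing:
        problems.append(f"production ports not offered: {sorted(missing)}")
    net = ipaddress.ip_network(f"{host['ip']}/{prefix}", strict=False)
    if ipaddress.ip_address(honeypot["ip"]) not in net:
        problems.append("honeypot address is outside the host's subnet")
    elif honeypot["ip"] == host["ip"]:
        problems.append("honeypot address collides with the production host")
    return problems


# The FTP example from the text.
ftp_host = {"os": "Fedora Linux 18", "ip": "10.10.0.13", "ports": [21]}
honeypot = {"os": "Fedora Linux 18", "ip": "10.10.0.9", "ports": [21]}
print(mirror_problems(ftp_host, honeypot))  # []
```

The check would report three problems for a misconfigured honeypot, such as one with a Windows personality that offers only port 80 from a different subnet.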
For example, if the network environment consists of Windows-based production servers, deploying a Linux-based honeypot inside it looks out of place. The mismatch gives attackers a free clue that the Linux hosts are actually honeypots. Similarly, if an attack targets a specific application running on a specific port number, deploying a honeypot that offers a different port could lead to disaster, because the attacker might interact with a different host instead of the honeypot. Furthermore, a honeypot listening on a different port might not catch any interactions at all, which defeats the true purpose of a honeypot: logging the attacker's activities. Because of these misconfiguration issues, there is a crucial need for a honeypot that is intelligent and dynamic, in the sense that it maintains itself in a timely manner. Such a honeypot framework learns the changes within the network and immediately configures and deploys honeypots that mimic the real production hosts. In this research, we propose a case-based reasoning (CBR) system that can recommend honeypot configuration and deployment. CBR is a branch of artificial intelligence that solves a current problem using similar past solutions, so that past experiences guide the solution to the current problem. The input to this CBR system is information about a detected valid host in the network, and the output is a recommended low-interaction honeypot configuration. Through the CBR system, any change to the identity of a production host is reflected in the deployed honeypot. In this way, a dynamically configured honeypot framework is achieved. The objectives of this research are:
- To propose a case-based reasoning (CBR) technique for dynamic honeypots.
- To develop a prototype of the proposed system.
- To test and evaluate the prototype.

3. Low-interaction Honeypot: Honeyd

Honeyd is a popular open source low-interaction honeypot framework that offers a simple way to emulate virtual hosts on a single machine [6, 10, 11]. The main advantage of a low-interaction honeypot is its low risk, because it only offers emulated services to the attacker. The attacker's actions are limited to what is emulated within the virtual host, and the attacker can only scan and connect to the offered ports [12]. Another advantage of a low-interaction honeypot is that it is easy to deploy and maintain. Its disadvantage is that the virtual host can collect only minimal information. Honeyd is built as a UNIX daemon and runs on UNIX-based and BSD platforms. It is open source software released under the GNU General Public License. This small daemon can create and deploy low-interaction virtual honeypots across the network. Each virtual honeypot can be configured to run certain services, and the system administrator is free to set its personality. The appearance of each virtual honeypot can be customized to mirror the current network environment. To an attacker, a virtual honeypot looks like an ordinary production host running a certain operating system and some services [12].
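The per-host customization described above is written into a Honeyd configuration file. The sketch below renders one virtual host. It assumes the `create`, `set ... personality`, `set ... default tcp action`, `add ... tcp port`, `set ... uptime` and `bind` directives as documented in the Honeyd man page, so check them against the Honeyd version in use. The function name, the template name and the script path `scripts/ftp.sh` are hypothetical, and the personality string must match a fingerprint name known to Honeyd.

```python
def render_honeyd(template, personality, tcp_ports, uptime, ip, scripts=None):
    """Render a minimal honeyd.conf fragment for one virtual host (sketch)."""
    scripts = scripts or {}
    lines = [
        f"create {template}",
        # The personality must name an entry in Honeyd's fingerprint file.
        f'set {template} personality "{personality}"',
        f"set {template} default tcp action reset",  # unlisted TCP ports answer with RST
    ]
    for port in sorted(tcp_ports):
        if port in scripts:
            # Hand connections on this port to a service-emulation script.
            lines.append(f'add {template} tcp port {port} "{scripts[port]}"')
        else:
            lines.append(f"add {template} tcp port {port} open")
    lines.append(f"set {template} uptime {uptime}")  # reported uptime in seconds
    lines.append(f"bind {ip} {template}")  # attach the template to an address
    return "\n".join(lines)


print(render_honeyd("windowsvista", "Microsoft Windows Vista", [21], 38000,
                    "10.10.0.9", {21: "sh scripts/ftp.sh"}))
```

Ports without an emulation script are simply marked open. This keeps the honeypot answering on them but captures little beyond the connection attempt, which matches the low-interaction trade-off described above.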

4. jCOLIBRI Framework

jCOLIBRI is a free and open source Java-based CBR framework. It is developed and maintained by Juan A. Recio-Garcia and his team of CBR researchers. It is a comprehensive and efficient tool for developing many types of CBR applications, ranging from textual and structured systems to knowledge-intensive and data-intensive ones [13]. This research uses jCOLIBRI version 2.1. The framework also includes a few example CBR classification and recommender systems, complete with sample case bases. When developing a CBR system, jCOLIBRI lets developers create a new application by defining the case attributes, their weights and the CBR functions. To make development more efficient, this research uses Eclipse as the programming IDE. All jCOLIBRI classes are imported into Eclipse as a new Java project, and the proposed CBR system is developed on top of these classes.


Besides jCOLIBRI, many other tools are available for building a CBR system, such as Kate, ReCall, CBR-Works, ReMind, CBR*Tools, CASPIAN, CAT-CBR, the Indiana University Case Based Reasoning Framework (IUCBRF) and myCBR [14, 15]. IUCBRF is technically more straightforward and less complex than jCOLIBRI. However, the project has long been outdated and unsupported, so it is not the best tool to use. Among these tools, jCOLIBRI is the most up to date and best supported, and it has an active community of CBR system developers. Most of the related CBR systems found during this research were implemented with jCOLIBRI.

5. System Architecture

This section describes the architecture of the CBR recommender system. It consists of three modules: the Host Scanner, the CBR module and the Honeypot Framework.

5.1 Host Scanner

This module is responsible for gathering identification information about the production hosts in the network. To realize this module, we use active host fingerprinting, a technique that gathers important information from an active host in the network. In this research we use Nmap, an open source port scanner. Nmap probes the targeted host and, from the host's responses, reports information such as the operating system type, its version and patch level, the IP address, the host uptime and the list of open ports. The results are stored in a log file and later extracted to form the target case, which is the input to the Case-based Reasoner module.

5.2 CBR Recommender

This is the main module and the main contribution of this research. The Case-based Reasoner executes the CBR steps on the target case in order to solve it. The input to this module is the information about a production host collected by the Host Scanner module, and its output is a recommended low-interaction honeypot configuration. The CBR steps involved in this module are case retrieval, case reuse, case adaptation and case retention. The honeypot configuration recommended by this module determines the behavior of the honeypot to be deployed. To recommend an accurate configuration, the module uses similar past cases as a guide to the solution for the target case. The objective of this module is to produce a honeypot configuration that imitates the targeted production host in the network.

5.3 Honeypot Framework

Based on the recommendation provided by the CBR system, this module deploys a low-interaction honeypot inside the network. The Honeypot Framework uses the honeypot deployment software Honeyd. The identity and behavior of the deployed honeypot depend on how accurate the configuration designed by the CBR module is. Figure 1 below shows the system architecture.


Figure 1. Architecture of the proposed system.
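The Host Scanner's extraction step, which turns a scanner log line into a target case, could look like the sketch below. It assumes Nmap's grepable output (`-oG`). In that format each host line carries tab-separated fields such as `Host`, `Ports` and, when OS detection (`-O`) is enabled, `OS`, and each port entry follows the port/state/protocol/owner/service/rpc/version layout. The function name and the returned dictionary are hypothetical. The sample line is hand-written for illustration, and uptime is not extracted here.

```python
def parse_grepable(line):
    """Turn one Nmap -oG host line into a target-case dict, or None (sketch)."""
    fields = {}
    for chunk in line.rstrip("\n").split("\t"):
        key, sep, value = chunk.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    if "Host" not in fields or "Ports" not in fields:
        return None  # e.g. the separate "Status: Up" line
    ports = []
    for entry in fields["Ports"].split(", "):
        # port/state/protocol/owner/service/rpc_info/version/
        parts = entry.split("/")
        if len(parts) >= 3 and parts[1] == "open":
            ports.append(int(parts[0]))
    return {
        "ip": fields["Host"].split()[0],
        "os": fields.get("OS"),  # present only when OS detection (-O) was used
        "ports": ports,
    }


sample = ("Host: 10.10.0.13 ()\tPorts: 21/open/tcp//ftp///, 23/closed/tcp//telnet///"
          "\tIgnored State: filtered (998)\tOS: Linux 3.2 - 3.8")
print(parse_grepable(sample))
```

Only open ports are kept, because those are the ports a mirroring honeypot would need to offer.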

6. Building the CBR Recommender

This section discusses the implementation of the proposed CBR system. Five important components are discussed in detail: case representation, the case base, case retrieval, case reuse and case revision.

6.1 Case Representation

Implementing the honeypot domain as a CBR system involves treating the log of a host scanner together with a Honeyd configuration as a case. The cases are represented with the attribute-value approach. Each case is built from eight attributes that together represent a low-interaction honeypot configuration. These eight attributes are divided into two groups: Problem and Solution. The Problem part describes the identity of a scanned production host in the network; its attributes are Detected Host, IP Address, Uptime and Ports. The Detected Host attribute contains the name of the OS running on the production host, as reported by the network scanner module. The IP Address attribute contains the IP address of the host. The Uptime attribute contains the running time of the host in seconds. The Ports attribute holds the open ports found on the host. The Solution part describes the honeypot recommendation that mimics the identity of the production host described in the Problem part. Its attributes are OS Personality, Uptime, Ports and Service Script. Figure 2 shows the case representation structure for the honeypot configuration domain. The OS Personality attribute determines the type of OS the honeypot will mimic and contains the name of that OS. The Uptime attribute here also holds a time value, but it is the uptime recommended for the honeypot to report. The Ports attribute lists all the port numbers that the honeypot will offer to the attacker. These port numbers are similar, though not necessarily identical, to the open ports found on the production host. The Service Script attribute contains the filename of a network service emulation program.
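The Problem/Solution split can be written down as plain data types. The sketch below is a hypothetical Python rendering of the eight attributes named above, not jCOLIBRI's own case classes. The example values are loosely modelled on Case 1 of the case base under an assumed column mapping, and the IP addresses are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    """Identity of a scanned production host (the Problem part)."""
    detected_host: str          # OS name as reported by the scanner
    ip_address: str
    uptime: int                 # seconds
    ports: List[int] = field(default_factory=list)


@dataclass
class Solution:
    """Recommended low-interaction honeypot configuration (the Solution part)."""
    os_personality: str
    uptime: int                 # uptime the honeypot should report, in seconds
    ports: List[int] = field(default_factory=list)
    service_script: Optional[str] = None


@dataclass
class HoneypotCase:
    case_id: str
    problem: Problem
    solution: Optional[Solution] = None  # None for a target case still to be solved


past = HoneypotCase(
    "Case 1",
    Problem("windows vista", "10.10.0.20", 37000, [21]),  # hypothetical IP
    Solution("Microsoft Windows Vista", 38000, [21], "ftp.sh"),
)
target = HoneypotCase("query", Problem("windows vista", "10.10.0.31", 36000, [21]))
```

A target case differs from a stored case only in having no solution yet. This is why retrieval compares Problem parts and the Solution part is what gets reused.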

Production Host Description (Problem)
- Detected_OS
- Host_IP_Address
- Host_Uptime
- Host_Ports

Honeypot Configuration (Solution)
- OS_Personality
- Bind
- HP_Uptime
- HP_Ports
- Service_Script

Figure 2. The case structure of the proposed CBR system.

6.2 The Honeypot Case-base

The case base is an important part of any CBR system: without it, the CBR cycle cannot reason or make decisions, and its content is critical to the performance of the whole system. The case base is accessed during case retrieval and case retention. In this research, the case base contains a maximum of 10 initial cases of previous honeypot configurations and deployments. The knowledge in the individual cases was extracted from honeypot literature and security forums on the Internet. Most of the honeypot configurations are taken from "Honeypots for Windows" by Roger A. Grimes and "Virtual Honeypots" by Niels Provos and Thorsten Holz [6]. For the proposed CBR system, the case base is implemented as a flat structure using SQL commands, as shown in Figure 3. Figure 4 shows three example cases from the case base.

insert into honeypotCB values('Case 1','windows vista',21,21,37000,2,'April','Microsoft Windows Vista','windowsvista','ftp.sh',38000,1);
insert into honeypotCB values('Case 2','windows vista',80,80,17500,3,'April','Microsoft Windows Vista','windowsvista','iisemul8.pl',25000,2);
insert into honeypotCB values('Case 3','windows 7',80,80,10500,13,'April','Microsoft Windows 7','windows_7','none',25000,11);

Figure 4. Three example cases in the honeypot case-base.
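Because the case base is a flat SQL table, the three Figure 4 statements can be loaded unchanged into an in-memory database for experimentation. Figure 3 is not reproduced here, so the column names below are hypothetical. Only the column count (12) is taken from the statements, and `col6`, `col7` and `col12` stay unnamed because the text does not state their meaning. Python's standard `sqlite3` module stands in for whatever database the prototype uses.

```python
import sqlite3

# Hypothetical column names: the real schema (Figure 3) is not reproduced.
SCHEMA = """
create table honeypotCB (
    case_id text, detected_os text, host_port integer, hp_port integer,
    host_uptime integer, col6 integer, col7 text, os_personality text,
    template text, service_script text, hp_uptime integer, col12 integer
);
"""

# The three statements from Figure 4, unchanged.
ROWS = """
insert into honeypotCB values('Case 1','windows vista',21,21,37000,2,'April','Microsoft Windows Vista','windowsvista','ftp.sh',38000,1);
insert into honeypotCB values('Case 2','windows vista',80,80,17500,3,'April','Microsoft Windows Vista','windowsvista','iisemul8.pl',25000,2);
insert into honeypotCB values('Case 3','windows 7',80,80,10500,13,'April','Microsoft Windows 7','windows_7','none',25000,11);
"""


def load_case_base():
    """Create an in-memory case base holding the example cases."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + ROWS)
    return conn


conn = load_case_base()
web_cases = conn.execute(
    "select case_id from honeypotCB where hp_port = ? order by case_id", (80,)
).fetchall()
print(web_cases)  # [('Case 2',), ('Case 3',)]
```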

6.3 Case Retrieval

The case retrieval approach implemented in this CBR system is the k-nearest neighbour (k-NN) similarity algorithm. This approach was selected for two main reasons. First, the initial case base in this research is small. Second, k-NN is the retrieval method most widely implemented in successful CBR systems. The proposed CBR system receives an input case that contains the information of a production host in the network; this input case is built from the log of the Host Scanner module. The k-NN algorithm uses the information in the input case to find similar honeypot case(s) in the case base by computing the similarity between the input case and each stored case. In this research, only one case is needed from the case base: the case whose values are most similar to those of the input case, that is, the case with the highest similarity score. To implement this in the jCOLIBRI framework, the jcolibri.method.retrieve.NNretrieval.NNScoringMethod class is used. This method performs nearest-neighbour scoring over all of the attributes. The most similar case is then obtained as a jcolibri.method.retrieve.RetrievalResult object, which contains the retrieved case together with its similarity score.

6.4 Case Reuse

After a similar case is found, the solution it contains is used to solve the target case. In our setting, not all attribute values can be reused; some need to be revised or adapted to fit the current problem described by the target case. For this purpose, the attributes in the Solution part of the selected case are divided into two groups. The values of three attributes (OS Personality, Ports and Service Script) are reused without any modification, while the Uptime and Droprate attributes are adapted during case revision.
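A minimal sketch of weighted nearest-neighbour retrieval follows. It is written in plain Python for brevity rather than against the jCOLIBRI API. Each attribute gets a local similarity: exact match for the OS name, Jaccard overlap for ports and range-normalized difference for uptime. The global score is the weighted average of these local similarities, and the case with the highest score is retrieved. The local functions, the weights and the mapping of the Figure 4 rows onto attributes are assumptions for illustration.

```python
def os_sim(a, b):
    """Exact match on the normalized OS name."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def ports_sim(a, b):
    """Jaccard overlap of the two open-port sets."""
    a, b = set(a), set(b)
    return 1.0 if not (a or b) else len(a & b) / len(a | b)


def uptime_sim(a, b, span):
    """1 minus the uptime difference, normalized by the observed range."""
    return max(0.0, 1.0 - abs(a - b) / span)


WEIGHTS = {"os": 2.0, "ports": 2.0, "uptime": 1.0}  # hypothetical weights


def similarity(query, case, span):
    local = {
        "os": os_sim(query["os"], case["os"]),
        "ports": ports_sim(query["ports"], case["ports"]),
        "uptime": uptime_sim(query["uptime"], case["uptime"], span),
    }
    # Global similarity: weighted average of the local similarities.
    return sum(WEIGHTS[k] * v for k, v in local.items()) / sum(WEIGHTS.values())


def retrieve(query, case_base, k=1):
    """Score every case and return the k most similar (score, case) pairs."""
    uptimes = [c["uptime"] for c in case_base] + [query["uptime"]]
    span = (max(uptimes) - min(uptimes)) or 1
    scored = [(similarity(query, c, span), c) for c in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Problem parts of the Figure 4 cases, under an assumed column mapping.
CASE_BASE = [
    {"id": "Case 1", "os": "windows vista", "ports": [21], "uptime": 37000},
    {"id": "Case 2", "os": "windows vista", "ports": [80], "uptime": 17500},
    {"id": "Case 3", "os": "windows 7", "ports": [80], "uptime": 10500},
]
query = {"os": "Windows 7", "ports": [80], "uptime": 12000}
best_score, best_case = retrieve(query, CASE_BASE)[0]
print(best_case["id"], round(best_score, 3))  # Case 3 0.989
```

With k = 1 only the single best case is returned, which matches the one-case retrieval described above. Raising k gives a ranked shortlist instead.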

Figure 3. The structure of the honeypot case-base.

6.5 Case Revise

The most similar case selected from the case base might not be fully accurate for the current problem described by the target case. In this step, the Uptime and Droprate attributes of the selected solution are further adjusted to suit the target case scenario. The initial values of both attributes can serve as a guide for the adaptation algorithm. The Uptime value in the solution needs to be adjusted to a realistic value relative to the Uptime described in the target case, because the old Uptime value from the past solution only fits the scenario described in that particular case.
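The reuse step of Section 6.4 and the Uptime revision can be sketched together. The text does not give the adaptation formula, so the rule below is a hypothetical placeholder: it preserves the honeypot-to-host uptime ratio of the retrieved case. Droprate is simply carried over, and the example values (including a droprate of 5) are invented.

```python
import copy

REUSED = ("os_personality", "ports", "service_script")


def reuse(past_solution):
    """Copy the attributes that are reused without modification."""
    # Deep copies keep the new solution from aliasing lists in the case base.
    return {key: copy.deepcopy(past_solution[key]) for key in REUSED}


def revise_uptime(target_uptime, past_host_uptime, past_hp_uptime):
    """Hypothetical rule: keep the past honeypot-to-host uptime ratio."""
    if past_host_uptime <= 0:
        return target_uptime
    return round(target_uptime * past_hp_uptime / past_host_uptime)


def solve(target, retrieved):
    solution = reuse(retrieved["solution"])
    solution["uptime"] = revise_uptime(
        target["uptime"], retrieved["problem"]["uptime"], retrieved["solution"]["uptime"]
    )
    # The text gives no Droprate rule; keep the past value as a starting point.
    solution["droprate"] = retrieved["solution"]["droprate"]
    return solution


retrieved = {
    "problem": {"uptime": 10500},
    "solution": {"os_personality": "Microsoft Windows 7", "ports": [80],
                 "service_script": "none", "uptime": 25000, "droprate": 5},
}
print(solve({"uptime": 12000}, retrieved))
```

Separating `reuse` from `revise_uptime` mirrors the two attribute groups in the text. A better-founded adaptation rule can replace the placeholder without touching the reuse step.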

7. Conclusion & Future Work

The proposed system will be tested in a real network environment to evaluate the workability and accuracy of the CBR approach. One possible testing approach is to use a group of virtual machines (VMs) to emulate production hosts. Because VMs are very flexible to manage, they make it easy to mimic a dynamic network environment and feed the CBR system with realistic test cases.

References

[1] Nero, P. J., Wardman, B., Copes, H., & Warner, G. (2011). Phishing: Crime that pays. eCrime Researchers Summit (eCrime), 2011, pp. 1-10, 7-9 Nov. 2011.
[2] Benjamin, V., & Chen, H. (2012). Securing cyberspace: Identifying key actors in hacker communities. 2012 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 24-29, 11-14 June 2012.
[3] MyCERT (2013). MyCERT Incident Statistics for the year 2013. http://www.mycert.org.my/en/services/statistic/mycert/2013/main/detail/914/index.html
[4] Kumar, S., Sehgal, R., & Bhatia, J. S. (2012). Hybrid honeypot framework for malware collection and analysis. 2012 7th IEEE International Conference on Industrial and Information Systems (ICIIS), pp. 1-5, 6-9 Aug. 2012.
[5] Spitzner, L. (2003). Honeypots: Tracking Hackers. Boston: Pearson Education.
[6] Provos, N., & Holz, T. (2008). Virtual Honeypots: From Botnet Tracking to Intrusion Detection. Boston: Addison-Wesley.
[7] Rowe, N. C. (2006). Measuring the effectiveness of honeypot counter-counterdeception. Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), vol. 6, p. 129c, 4-7 Jan. 2006.
[8] Wagener, G., State, R., Engel, T., & Dulaunoy, A. (2011). Adaptive and self-configurable honeypots. 2011 IFIP/IEEE International Symposium on Integrated Network Management, pp. 345-352, 23-27 May 2011.
[9] Budiarto, R., Samsudin, A., Heong, C. W., & Noori, S. (2004). Honeypots: Why we need a dynamics honeypots? International Conference on Information and Communication Technologies: From Theory to Applications, pp. 565-566, 19-23 April 2004.
[10] Leita, C., Mermoud, K., & Dacier, M. (2005). ScriptGen: An automated script generation tool for Honeyd. Proceedings of the 21st Annual Computer Security Applications Conference, pp. 203-214.
[11] Liu, X., Peng, L., & Li, C. (2011). The dynamic honeypot design and implementation based on Honeyd. In S. Lin & X. Huang (Eds.), Advances in Computer Science, Environment, Ecoinformatics and Education (pp. 93-98). Wuhan, China: Springer.
[12] Sadasivam, K., Samudrala, B., & Yang, T. A. (2005). Design of network security projects using honeypots. Journal of Computing Sciences in Colleges, 20(4), 282-293.
[13] Recio Garcia, J., Diaz Agudo, B., & Gonzalez Calero, P. (2009). Boosting the performance of CBR applications with jCOLIBRI. 21st IEEE International Conference on Tools with Artificial Intelligence, pp. 276-283.
[14] Govedarova, N., Stoyanov, S., & Popchev, I. (2008). An ontology based CBR architecture for knowledge management in BULCHINO catalogue. Proceedings of the 9th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing (CompSysTech '08), V.5.
[15] Atanassov, A., & Antonov, L. (2012). Comparative analysis of case-based reasoning software frameworks jCOLIBRI and myCBR, pp. 83-90.