FLUKES: Autonomous Log Forensics, Intelligence and Visualization Tool

Monther Aldwairi∗
College of Technological Innovation, Zayed University
P.O. Box 144534, Abu Dhabi, UAE

Hesham H. Alsaadi
College of Technological Innovation, Zayed University
P.O. Box 144534, Abu Dhabi, UAE

ABSTRACT

The number of structured and unstructured log datasets is increasing, and the complexity of analyzing threats in log files poses a challenge to the research community. We propose an intelligent technique, called “FLUKES”, that visualizes and extracts threats from log files using D3.js modules and the standard RegEx API. In this paper, we investigate the text-based ASCII formats of FTP, Snort, Apache, and IIS server logs. When a file in .json, .csv, .log, or .txt format is loaded into FLUKES, it produces a representative summary that includes even the least significant attack traces. FLUKES formalizes and generates a new signature pattern that eases the detection and analysis of threat anomalies in log files. Forensic investigators can then determine the set of fields relevant to the attack according to the corresponding target. We present an example investigation that compares FTP and Apache server logs collected and managed using Snort. The main contribution is to forensically determine a summary of authentication attempts (failed and successful) against secure systems, together with the traces found, without altering the log evidence.

CCS CONCEPTS
• Computer systems organization → Embedded systems; Redundancy; Robotics; • Networks → Network reliability;

KEYWORDS
Intrusion detection, intrusion prevention, log forensics, visualization, D3

ACM Reference format:
Monther Aldwairi and Hesham H. Alsaadi. 2017. FLUKES: Autonomous Log Forensics, Intelligence and Visualization Tool. In Proceedings of ICFNDS ’17, Cambridge, United Kingdom, July 19-20, 2017, 6 pages.
DOI: 10.1145/3102304.3102337

∗ Department of Network Engineering and Security, Jordan University of Science and Technology, P.O. Box 3030, Irbid 22110, Jordan

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICFNDS ’17, Cambridge, United Kingdom
© 2017 ACM. 978-1-4503-4844-7/17/07...$15.00
DOI: 10.1145/3102304.3102337

1 INTRODUCTION

Security logs are a rich source of forensic information for investigating a security breach. Most security defense systems log security-related events in some format, mainly text-based [6]. Users often do not realize that their systems may be the target or the source of a tactical attack, because it is difficult to distinguish normal from abnormal events in log files [10]. Collecting and investigating logs from a centralized production environment provides a good foundation for security monitoring. The volume, complexity, and variety of log formats have been increasing rapidly, so mining logs for threats continues to challenge the research community. Log visualization has become essential for efficient analysis and comprehensive security solutions. It involves representing log entries and data events graphically to make attack detection and forensic analysis simpler and faster, and it may also include log cleansing, correlation, aggregation, and a meaningful timeline of threat-relevant events [2]. However, the diversity of security equipment and advances in data storage limit a human security administrator's ability to analyze security logs [19]. Security Information and Event Management (SIEM) solutions promise to abstract and correlate incidents into security situational awareness reports. These reports help with rapid incident response and log management [17] and provide an effective defense for detecting advanced threats [7]. Despite the introduction of several commercial SIEM solutions, some of which use the Apache Hadoop and Spark ecosystems to process large volumes of logs in parallel (e.g., Cisco MARS [11] and HPE Security ArcSight Data Platform [12]), these solutions often fail to detect subtle inconsistencies or to identify new attack signatures [15].

Unfortunately, these systems are signature-based: signatures for unknown, emerging attacks take days to be generated and configured into the system before the threats can be detected [16]. New types of attacks (e.g., zero-day attacks) therefore go undetected in most systems [8], because the systems cannot distinguish normal from abnormal network activity [4], which also causes them to raise false alarms for benign user actions such as login attempts. In this paper, we investigate and track the signature of Brute-force attacks against FTP server hosts, which can leave the server unable to respond to requests from legitimate users. The goals of this paper are: a) to develop a novel approach for detecting threat signatures by reverse engineering the structure of text-based FTP logs, and b) to correlate events from multiple sources and generate a graphical representation of the threat logs. The main contributions of this work are:
• The proposed approach generates and characterizes new signature patterns and automatically renders the events as graphical representations. All graphical representations are presented in a single dashboard that provides a comprehensive summary view of the attack traces.
• The dashboard can export the different uploaded logs and visualize them. It supports the XML, JSON, and CSV file formats to allow future integration with ecosystems such as Apache Hadoop, Spark, and Splunk.
• The proposed approach is generic in the sense that it is not limited to the presented Brute-force attack scenario.
The rest of the paper is organized as follows. Section 2 discusses related work and existing methods in distributed log management systems. Section 3 describes the methodology and the proposed approach and illustrates the FLUKES dashboard. Section 4 presents the implementation and experimental results and discusses the limitations of FLUKES. Section 5 concludes the paper.

2 RELATED WORK

Security-related logs are poorly structured and maintained, largely because most web servers adhere to the National Center for Supercomputing Applications (NCSA) Common Log Format or the later World Wide Web Consortium (W3C) Extended Log File Format, neither of which was designed to carry the rich data payloads needed to detect attack signatures. Numerous existing security visualization tools are specific to a certain log file format or structure. For instance, ClockView [18] and PeekKernelFlows [25] use NetFlow logs to monitor large IP spaces over long periods. NetBytes Viewer [23], which is also used in projects such as FloVis [24], relies on NetFlow logs as well but focuses only on the communications of a single host. Log files are mostly collected over TCP/IP connections, probably using a common application protocol such as HTTP, yet NetFlow can capture only certain types of log structures, namely those observed in network traffic communications. Although these NetFlow-based tools use modern visualization techniques to monitor activity logs, their signature detection mechanisms draw on a single host network, so they can fail to detect potential threats [3]. SIEM solutions such as Cisco MARS [11] identify real-time attacks using pre-configured batch rules, which are crafted manually and obtained remotely. A SIEM can collect log files at a central location and pass all logs to supported platforms such as Apache Hadoop. One weakness of the SIEM signature mechanism is that it accepts information flows only in the formats of specific centralized data collection units. Another weakness is that it generates false alerts from benign user activity on the network, because it cannot distinguish anomalies and new attacks [5].
Tableau Public and Many Eyes use a distributed log collection model similar to that of SIEM solutions and can be integrated with a Hadoop environment for graphical representation. They accept different log formats from different sources [20]. Their weaknesses are that they do not support complex log formats and that log files must be reformatted and sorted into a readable format before visualization. Another tool, ELVIS (Extensible Log Visualization tool), allows security experts to explore and visualize datasets in specific security log formats: Apache standard, syslog and its variations, auth.log, and Snort IDS. ELVIS restructures the contents of a log file into a specific format for pattern matching; if a pattern match is found, a dataset is created for processing and visualization. Its weaknesses are the complexity of the graphical representation used to detect anomalies and the lack of correlation across multiple log file datasets, since each view is built from only one log file [14]. The aforementioned platforms adopt pre- and post-condition techniques for log signature detection as well as log dataset processing for visualization. Because of the complexity of log file structures, these threat detection mechanisms often fail to detect anomalies in the network. Some support only a specific format, and even those that handle multiple formats cannot correlate more than one log structure. Because attack patterns are detected using pushed or manually generated signature rules, zero-day attacks are difficult to detect in real time. Finally, the lack of a unified format and the shortcomings in batch log processing and streaming signature pipelines leave the door open for new visualization tools [1].

Next, we explain the proposed methodology and reverse engineer sample log files with different structures from Snort/FTP logs and IIS server logs. FLUKES correlates the logs, creates a visual representation, and generates a unique signature pattern for the Brute-force attack.

3 METHODOLOGY

FLUKES uses log files as its data source. In this section, we first show the relational paradigm of the text-based FTP log file structure. Second, we show how new fields are structured from the datasets. Third, we use the regular expression language (RegEx) to characterize the attack signature. Finally, we correlate and visualize the threat events found in two different log files.

3.1 Log Event Management

In this section, we detail how FLUKES manages log files: how the files are organized and how FTP log records are broken into instances. FLUKES assumes that each event is recorded as a single record on its own line and that new events are always appended at the end of the log file. Fig. 1 shows two sample records from an FTP server log file; each record is one line representing one event. Each record consists of four values, called instances, such as the status code and the username of the user authenticated by the server. To manage the structure of the events, FLUKES assigns a virtual value to each instance in a record (line); the two records yield a total of eight instances, which form a virtual list. FLUKES then determines the similarity of the instances, which results in a new signature pattern for the attack. We describe the two records as blocks: block 1 represents the right bond of the first event and block 2 represents the left bond of the second event. Both blocks contain four instances with differently formatted values.


Figure 1: Simple structure of two records/lines in FTP log file

FLUKES applies a reverse process to block 2, which yields the new signature of a successful administrator account login (status code 230). This process loops over all uploaded log files until every record (line) has been processed at upload time. The next section explains the technique FLUKES uses to process FTP log datasets and then shows how FLUKES characterizes the signature pattern of a Brute-force attack from the log file.
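To make the record, instance, and virtual-list terminology of this subsection concrete, the following Python sketch splits two FTP records into four instances each and numbers the eight instances v1 to v8. It is a minimal illustration under stated assumptions, not the FLUKES implementation (FLUKES is built on JavaScript and D3.js): the sample records, the simplified whitespace-separated layout, the field names, and the helpers `split_instances` and `virtual_list` are all hypothetical, and the reading-order numbering may differ from the one FLUKES uses.

```python
from typing import Dict, List

# Hypothetical FTP log records in a simplified
# "<date> <time> <client-ip> <command> <status>" layout (assumed, not Fig. 1).
SAMPLE_RECORDS = [
    "2012-03-16 17:30:54 192.168.16.60 PASS 530",
    "2012-03-16 17:30:54 192.168.16.60 PASS 230",
]

FIELDS = ("timestamp", "client_ip", "command", "status")


def split_instances(line: str) -> Dict[str, str]:
    """Split one record (one event per line) into its four instances."""
    date, time, ip, command, status = line.split()
    return dict(zip(FIELDS, (f"{date} {time}", ip, command, status)))


def virtual_list(lines: List[str]) -> Dict[str, str]:
    """Number every instance of every record v1, v2, ... in reading order."""
    result: Dict[str, str] = {}
    for record_no, line in enumerate(lines):
        for field_no, value in enumerate(split_instances(line).values()):
            result[f"v{record_no * len(FIELDS) + field_no + 1}"] = value
    return result


for name, value in virtual_list(SAMPLE_RECORDS).items():
    print(name, value)
```

Running the module prints the eight instances, from `v1 2012-03-16 17:30:54` to `v8 230`.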

3.2 Log Utilization

FLUKES can restructure different log formats, since all of these logs are recorded as text (ASCII) and saved in files with the .log extension. FLUKES accepts, and can convert, files with the .txt, .log, .json, and .csv extensions. All uploaded log files remain unmodified and retain their original format; FLUKES only appends their contents internally to an existing DOM element for graphical representation.
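The format handling described above can be sketched as a small loader that turns each supported file type into a list of string-valued records while opening the evidence file read-only. This is a hedged Python sketch, not the FLUKES code: the per-format conventions (one event per line for .log and .txt, a header row for .csv, a JSON array of flat objects for .json) and the function names are assumptions.

```python
import csv
import io
import json
from pathlib import Path
from typing import Dict, List

Record = Dict[str, str]


def parse_records(text: str, suffix: str) -> List[Record]:
    """Turn the text of one log file into a list of string-valued records."""
    suffix = suffix.lower()
    if suffix in (".log", ".txt"):
        # Assumption: one event per line; blank lines are skipped.
        return [{"raw": line} for line in text.splitlines() if line.strip()]
    if suffix == ".csv":
        # Assumption: the first row is a header naming the fields.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if suffix == ".json":
        # Assumption: the file holds a JSON array of flat objects.
        return [{k: str(v) for k, v in item.items()} for item in json.loads(text)]
    raise ValueError(f"unsupported log format: {suffix}")


def load_records(path: Path) -> List[Record]:
    """Read a log file and parse it; the file on disk is never modified."""
    text = path.read_text(encoding="ascii", errors="replace")
    return parse_records(text, path.suffix)
```

For example, `load_records(Path("ftp.log"))` would return one `{"raw": ...}` record per non-empty line; the file name is hypothetical.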

Figure 2: Sample of FTP log file

Fig. 2 shows the signature, in an FTP log file, of a Brute-force attack performed with a tool known as “Multi-Thread FTP scanner version 0.25”. The figure shows five failed login attempts, represented by the events with status code 530, and one successful administrator access event with status code 230. Fig. 3 shows the activity of the attacker in the IIS log file of the website. The attacker logged in from different IP addresses on different dates, and the entries show the directories the attacker accessed.

Figure 3: Sample of Web log IIS server log file

In FLUKES, we use the regular expression (RegEx) language to capture anomalies in each uploaded log file. We perform insertion sort and use text pattern recognition to characterize the events of the attack signature. The RegEx script allows us to exploit (restructure) the Snort FTP server log and IIS server log datasets one at a time. FLUKES matches instances against programmed keyword text patterns (RegEx), such as the client IP address, the timestamp, and status code 230. Any detected anomaly is considered a new signature threat. Fig. 4 illustrates the technique FLUKES uses to generate the new signature threat, and the dataset processing is described in detail below.

3.2.1 Semantic Preprocessing. The preprocessing step processes all record fields uploaded to the FLUKES project folder. Fig. 4(b) shows the text pattern recognition (RegEx) process, which aims to find instances whose text-based structure is exactly similar. Once similar text is found, FLUKES assigns integer values to each instance and compiles all suspect records (events) in each log file that show the same behavior. In addition, the script performs data cleansing and deletes unnecessary alphanumeric characters to speed up sorting of the datasets.

3.2.2 Threat Detection. The programmed RegEx script makes FLUKES look only for events that match programmed keyword text patterns, which allows FLUKES to determine the textual structure of the threat accurately. For example, if the script is programmed to detect a text-based structure with status code 230 (successful administrator access), FLUKES flags the event as a threat and compiles its semantics. Fig. 4(a) shows the threat detected from a successful administrator access whose event text exactly matches the programmed RegEx pattern. Definition 1 formalizes the pattern matching process, where T denotes the time-stamp, IP the IP address, E the user HTTP method (or FTP command, such as PASS), and C the status code. We assume that the detected threat is found by the RegEx code in one record of all P (possible) log files uploaded to FLUKES. FLUKES then assigns an integer value to each matched instance, which always evaluates to true.

Figure 4: FLUKES methods for processing FTP log datasets: (a) FTP log file; (b) semantic preprocessing: text pattern recognition with insertion sort; (c) data abstraction, iteration and extraction; (d) the new signature pattern of the Brute-force attack
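The preprocessing and detection steps of Sections 3.2.1 and 3.2.2 can be approximated by the following Python sketch. It extracts the four instances T, IP, E, and C with a named-group regular expression, sorts the matched records by time-stamp with an insertion sort, and keeps only records whose status code is 230. The log layout and the regular expression are assumptions, since the paper does not publish the FLUKES RegEx script; cleansing is reduced to stripping whitespace, and the integer labelling of Definition 1 is not modelled.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed FTP log layout: "<date> <time> <client-ip> <command> [argument] <status>".
FTP_LINE = re.compile(
    r"^(?P<T>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"  # T: time-stamp
    r"(?P<IP>\d{1,3}(?:\.\d{1,3}){3})\s+"  # IP: client address
    r"(?P<E>[A-Z]{3,4})\b.*?\s"  # E: command such as USER or PASS
    r"(?P<C>\d{3})\s*$"  # C: three-digit status code
)


class Instance(NamedTuple):
    T: str
    IP: str
    E: str
    C: int


def preprocess(line: str) -> Optional[Instance]:
    """Match one record; return its four instances, or None if it does not match."""
    m = FTP_LINE.match(line.strip())  # strip() stands in for data cleansing
    if m is None:
        return None
    return Instance(m["T"], m["IP"], m["E"], int(m["C"]))


def insertion_sort(items: List[Instance]) -> List[Instance]:
    """Sort records by time-stamp (ISO strings sort chronologically)."""
    out = list(items)
    for i in range(1, len(out)):
        key, j = out[i], i - 1
        while j >= 0 and out[j].T > key.T:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


def detect(lines: List[str], status: int = 230) -> List[Instance]:
    """Keep only the records whose status code matches the programmed pattern."""
    parsed = [p for p in map(preprocess, lines) if p is not None]
    return [p for p in insertion_sort(parsed) if p.C == status]
```

In practice Python's built-in `sorted()` would replace the hand-written insertion sort; it is kept here only because the paper names that algorithm.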

Definition 1.
$$\mathrm{Regex}_{P}\big(\mathrm{Match}(T_1, IP_2, E_3, C_4)\big) \rightarrow \mathrm{int}\big(\mathrm{true}(T_1, IP_2, E_3, C_4)\big)$$

3.2.3 Sorting datasets. Once each record has been assigned integer values, FLUKES applies a data abstraction technique called iteration and extraction. This process has two parts, direct and recursive:
Direct route: processes the left bond of the first record, which holds two instances, the timestamp and the client IP address in the log file. This route always yields the same result when a Brute-force attack is performed on the FTP server.
Recursive route: recursively reverses the right bond of the second matched record, which holds two instances, the user HTTP method and the status code. This route always yields the successful user authentication status code 230.
Fig. 4(c) shows the proposed data abstraction technique, in which instances are mapped to nodes and each node represents the assigned integer values. As a result of these two processes, Fig. 4(d) shows the new formatted signature pattern of the Brute-force attack. The final refined signature pattern can be written as follows:

Theorem (Sample Node).
$$
\begin{aligned}
v_1 &\xrightarrow{\text{direct}} v_3 \equiv \text{17:30:54}\\
v_2 &\xrightarrow{\text{direct}} v_4 \equiv \text{192.168.16.60}\\
v_5 &\xrightarrow{\text{recurs}} v_7 \equiv \text{[5]PASS}\\
v_6 &\xrightarrow{\text{recurs}} v_8 \equiv 230
\end{aligned}
$$

The sample node shows that when two elements (nodes v3 and v4) have been matched to a specific attack signature pattern on a given event (status code 230), a detection event is raised for visualization.
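One possible reading of the sample node, which we assume here rather than take from the paper, is that the direct route copies the time-stamp and client IP of the successful record, while the recursive route walks back over the same client's earlier records and counts the failed PASS attempts, giving “[5]PASS” for the five failures in Fig. 2. The Python sketch below implements that reading with an iterative backward walk; the event tuple layout, the function name, and the `min_failures` threshold are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One FTP event: (time-stamp, client IP, command, status code). Hypothetical layout.
Event = Tuple[str, str, str, int]


def brute_force_signature(
    events: List[Event], min_failures: int = 3
) -> Optional[Dict[str, object]]:
    """Return a signature for the first successful login preceded by failures.

    `events` must be sorted by time-stamp. `min_failures` is an assumed
    threshold, not a value from the paper.
    """
    for i, (timestamp, ip, command, status) in enumerate(events):
        if command != "PASS" or status != 230:
            continue
        # Walk backwards over earlier events from the same client and count
        # the consecutive failed PASS attempts (status 530) before the success.
        failures = 0
        for _, prev_ip, prev_command, prev_status in reversed(events[:i]):
            if prev_ip != ip or prev_command != "PASS":
                continue
            if prev_status != 530:
                break
            failures += 1
        if failures >= min_failures:
            # Direct part: time-stamp and client IP of the successful record.
            # Backward part: attempt count, command, and status code.
            return {
                "timestamp": timestamp,
                "ip": ip,
                "method": f"[{failures}]PASS",
                "status": status,
            }
    return None
```

The threshold keeps a single mistyped password followed by a successful login from being reported as a Brute-force attack.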

4 IMPLEMENTATION AND EXPERIMENTAL EVALUATION

FLUKES uses standard web technologies to render log datasets (HTML5, JavaScript, CSS, and SVG) and is built with D3.js modules for static and interactive visualization [9]. Because all of these components are open source, the representations can easily be changed. Currently, FLUKES supports the following log formats:
• Snort alert logs.
• FTP server logs.
• Apache web server logs.
• IIS server logs.
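Each supported format needs its own parsing rule. As a hedged illustration covering only two of the four formats, the Python sketch below parses Apache logs in the NCSA Common Log Format and IIS logs in the W3C Extended Log File Format, where field names are read from the `#Fields:` directive. The regular expression assumes the plain common format, without the referer and user-agent fields of Apache's combined format; the Snort and FTP parsers are omitted, and the `PARSERS` dispatch table and function names are hypothetical rather than FLUKES's own.

```python
import re
from typing import Dict, Iterable, List

# NCSA/Apache Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_apache(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse Common Log Format lines; non-matching lines are skipped."""
    records = []
    for line in lines:
        m = CLF.match(line.strip())
        if m:
            records.append(m.groupdict())
    return records


def parse_w3c(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse W3C Extended Log File Format lines, as written by IIS.

    Field names come from the most recent '#Fields:' directive; other '#'
    directives are ignored.
    """
    fields: List[str] = []
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            records.append(dict(zip(fields, line.split())))
    return records


# Hypothetical dispatch table keyed by log type.
PARSERS = {"apache": parse_apache, "iis": parse_w3c}
```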

In order to test the tool, we used the U.S. National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC) 2012 FTP and HTTP datasets [21]. We also used the 2012 Honeynet Exploit Kits Snort logs, generated from various threat classes, to test detection and observation capabilities [22]. The experiments were carried out on a Fujitsu Siemens Lifebook AH532/G21 laptop with 4 GB of memory, a 2.6 GHz Intel Core i5 processor, and Intel HD Graphics 4000 with Nvidia GeForce graphics. FLUKES was tested using Google Chrome version 54 and Firefox version 47. FLUKES found interesting signature patterns in the FTP/IIS logs. Fig. 5 shows the FLUKES dashboard, which presents three views: 1) a static visualization of the FTP/IIS server log correlation, 2) interactive parallel coordinates of the FTP server log Brute-force attack from a single log file, and 3) a summary view of the events in a rendered table. FLUKES detected the anomaly of the successful Brute-force attack and generated a new signature pattern, as well as mapping all events in each log file. Furthermore, FLUKES handled two log files at a time and correlated all of their events without imposing a file-size limit. Nonetheless, the tests revealed the following limitations of FLUKES:
• Signature pattern detection is done with an external RegEx API, which might slow down the processing of large log files.
• FLUKES uses D3.js modules for visualization, which have some data-source limitations when rendering raw data in SVG graphical format.

5 CONCLUSIONS AND FUTURE WORK

This paper presents “FLUKES”, an automatic reverse-engineering threat detection tool. The tool optimizes and extracts intelligent threat modules using a specially designed machine learning technique. It allows forensic experts to visualize and explore different threats monitored by servers, IDPS, and anti-virus software. To select a representation model automatically, FLUKES uses all of the information in each field of the log file. This ensures that the entire log file can be expressively summarized and reconstructed without tampering with the original information. If a new attack pattern is found, the user only has to write a regular expression (RegEx) describing the attack behavior, and FLUKES applies it to all given log fields. The tests also demonstrate that FLUKES is useful for visualizing, correlating, and exploring security-related log files and for uncovering threats hidden in them. However, future work is still necessary to give users graphical control over the selection of data fields in each log file, to add more graphical representations, and to extend the export of standard reports of the findings. Finally, we are developing internal mechanisms to minimize external calls to third-party modules. FLUKES is also being integrated with the Flume Client SDK, which will enable it to connect to Flume and send data into Flume's data flow over RPC [13]. This will allow the tool to transmit log files more effectively and securely.
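The RegEx-based extension point described above might look like the following Python sketch, in which an analyst registers a named regular expression and every field of each parsed record is checked against all registered rules. The registry, the function names, and the two example rules are assumptions for illustration, not the FLUKES interface.

```python
import re
from typing import Dict, List

# Hypothetical registry of analyst-written rules: rule name -> compiled RegEx.
RULES: Dict[str, "re.Pattern[str]"] = {}


def add_rule(name: str, pattern: str) -> None:
    """Register a new attack behaviour as a regular expression."""
    RULES[name] = re.compile(pattern)


def match_rules(record: Dict[str, str]) -> List[str]:
    """Return the names of all rules that match any field of a parsed record."""
    return [
        name
        for name, rx in RULES.items()
        if any(rx.search(value) for value in record.values())
    ]


# Example rules (assumed, for illustration only).
add_rule("ftp_admin_login", r"^230$")
add_rule("ftp_failed_login", r"^530$")
```

Anchoring the patterns (`^...$`) keeps a rule such as `^230$` from firing on unrelated fields that merely contain those digits, such as a byte count of 12300.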

ACKNOWLEDGMENTS
This work was supported by Zayed University Research Office, Research Cluster Award # R17079.


REFERENCES
[1] Monther Aldwairi and Koloud Al-Khamaiseh. 2015. Exhaust: Optimizing Wu-Manber pattern matching for intrusion detection using Bloom filters. In 2015 2nd World Symposium on Web Applications and Networking (WSWAN). IEEE, 1–6.
[2] Monther Aldwairi and Rami Al-Salman. 2011. MALURLs: Malicious URLs Classification System. In Annual International Conference on Information Theory and Applications Canning. GSTF Digital Library (GSTF-DL). Best paper award.
[3] Monther Aldwairi and Niveen Ekailan. 2011. Hybrid Pattern Matching Algorithm for Intrusion Detection Systems. Journal of Information Assurance and Security 6, 6 (2011), 512–521.
[4] Monther Aldwairi, Yaser Khamayseh, and Mohammad Al-Masri. 2015. Application of artificial bee colony for intrusion detection systems. Security and Communication Networks 8, 16 (2015), 2730–2740.
[5] Shadi Aljawarneh, Monther Aldwairi, and Muneer Bani Yassein. 2017. Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. Journal of Computational Science (2017).
[6] Mansour Alsaleh, Abdulrahman Alarifi, Abdullah Alqahtani, and AbdulMalik Al-Salman. 2015. Visualizing web server attacks: patterns in PHPIDS logs. Security and Communication Networks 8, 11 (2015), 1991–2003.
[7] Beth Binde, Russ McRee, and Terrence J. O'Connor. 2011. Assessing outbound traffic to uncover advanced persistent threat. SANS Institute Whitepaper (2011).
[8] Mark Boger, Tianyuan Liu, Jacqueline Ratliff, William Nick, Xiaohong Yuan, and Albert Esterline. 2016. Network traffic classification for security analysis. In SoutheastCon, 2016. IEEE, 1–2.
[9] Mike Bostock. 2015. Data-Driven Documents. (2015). https://d3js.org/
[10] Eoghan Casey. 2010. Handbook of Digital Forensics and Investigation. Vol. 3. Elsevier Academic Press Inc, Oxford, United Kingdom. 150–151 pages.
[11] Cisco Systems, Inc. 2009. User Guide for Cisco Security MARS Local and Global Controllers, Release 6.x. (2009). http://www.cisco.com/en/US/docs/security/security_management/cs-mars/6.0/user/guide/combo/bkMarsUgCombo.pdf
[12] Hewlett Packard Enterprise. 2016. Announcing HPE Security ArcSight Data Platform solution. (2016). http://www8.hp.com/h20195/V2/GetPDF.aspx/4AA6-5106ENW.pdf
[13] Apache Software Foundation. 2012. Data flow model. (2012). https://flume.apache.org/FlumeDeveloperGuide.html
[14] Christopher Humphries, Nicolas Prigent, Christophe Bidan, and Frédéric Majorczyk. 2013. Elvis: Extensible log visualization. In Proceedings of the Tenth Workshop on Visualization for Cyber Security. ACM, 9–16.
[15] Ardymulya Iswardani and Imam Riadi. 2016. Denial of service log analysis using density k-means method. Journal of Theoretical and Applied Information Technology 83, 2 (2016), 299.
[16] Mazen Kharbutli, Monther Aldwairi, and Abdullah Mughrabi. 2012. Function and data parallelization of Wu-Manber pattern matching for intrusion detection systems. Network Protocols and Algorithms 4, 3 (2012), 46–61.
[17] Nan Ju Kim, Hoon Jeong, Hye Jin Pyo, and Eui In Choi. 2014. Security Framework Using Forensic Function and Log Management. In Applied Mechanics and Materials, Vol. 590. Trans Tech Publications, 752–755.
[18] Christopher Kintzel, Johannes Fuchs, and Florian Mansmann. 2011. Monitoring large IP spaces with ClockView. In Proceedings of the 8th International Symposium on Visualization for Cyber Security. ACM, 2.
[19] Jaehee Lee, Jinhyeok Jeon, Changyeob Lee, Junbeom Lee, Jaebin Cho, and Kyungho Lee. 2016. A Study on Efficient Log Visualization Using D3 Component against APT: How to Visualize Security Logs Efficiently? In 2016 International Conference on Platform Technology and Service (PlatCon). IEEE, 1–6.
[20] Kristi Morton, Magdalena Balazinska, Dan Grossman, Robert Kosara, and Jock Mackinlay. 2014. Public Data and Visualizations: How are Many Eyes and Tableau Public Used for Collaborative Analytics? ACM SIGMOD Record 43, 2 (2014), 17–22.
[21] NETRESEC. 2012. U.S. National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC). Netresec. (2012). http://www.netresec.com/?page=MACCDC
[22] Mike Sconzo. 2012. Security Repo Snort Logs. (2012). http://www.secrepo.com/tg/tg_snort_fast.7z
[23] Teryl Taylor, Stephen Brooks, and John McHugh. 2008. NetBytes viewer: An entity-based netflow visualization utility for identifying intrusive behavior. In VizSEC 2007. Springer, 101–114.
[24] Teryl Taylor, Diana Paterson, Joel Glanfield, Carrie Gates, Stephen Brooks, and John McHugh. 2009. FloVis: Flow visualization system. In Cybersecurity Applications & Technology Conference for Homeland Security (CATCH '09). IEEE, 186–198.
[25] Cynthia Wagner, Gérard Wagener, Alexandre Dulaunoy, Thomas Engel, and others. 2010. PeekKernelFlows: Peeking into IP flows. In Proceedings of the Seventh International Symposium on Visualization for Cyber Security. ACM, 52–57.


Figure 5: FLUKES dashboard with a static visualization of the FTP/IIS radial log correlation, interactive parallel coordinates of the FTP server log, and a filtered log view of the Brute-force attack