Efficient Information Visualization for Intrusion Detection in Web Applications

Kok Chin Khor, Siew Kwan Lieong
Faculty of Information Technology, Multimedia University, Cyberjaya 63100 Selangor, Malaysia
{kckhor, lieong.siew.kwan01}@mmu.edu.my

Eugene Ch’ng
Electronic, Electrical and Computer Engineering, University of Birmingham, Edgbaston B15 2TT Birmingham, United Kingdom

Abstract

Efficient information visualization is essential for the prompt detection of intruders. The conventional practice of browsing system logs does not allow immediate action against unauthorized server entries. In this article we propose a system that lets web application administrators identify and act quickly upon intrusions through a three-layered visualization of a web application. Information on the users in each layer can be viewed easily, and when an intrusion is detected the visualization shows which layer is under attack, so that administrators can act immediately to secure the web application. A usability study of the system has also been conducted and its results are reviewed.

Keywords: Web Security, Web Application, Intrusion Detection, Information Visualization.

1. Introduction

Various intrusion detection techniques have been developed to protect connected computer systems. The main purpose of intrusion detection is to detect attacks that may violate computer systems [2]. Conventionally, web administrators detect intrusions by manually analyzing log data stored in log files. System logs are one of the main sources that web administrators rely upon for detecting intrusions. As WWW network traffic rapidly expands, log files are quickly overloaded with information collected from the network. Because of the huge amount of information in the logs, web administrators face difficulties in analyzing the log data to detect intrusions. To ease the exhausting task of browsing raw logs, web administrators often turn to graphical tools to assist in information analysis.

Although computer technologies have evolved tremendously, computer systems today are not fully secured even with the advent of all sorts of protections [11-13]. This is because new methods of attack are continually being devised to compromise the security of computer systems. Although it seems impossible to keep up with new forms of attack, the web security community can take the initiative to produce better techniques for detecting intrusions. As mentioned by Kemmerer and Vigna [2], it is important to develop new techniques to protect computer systems.

A variety of useful information can be found in log data. For example, knowledge discovery [7] and the understanding of WWW visitor behavior [8] are creative uses of log data that acquire information through data mining techniques. Nevertheless, some features of the data may be missed when human interaction and visualization are not well incorporated into such systems. To solve this problem, effective visualization of the information gathered through data mining is a necessity.

Currently, visualization is one of the most popular techniques used in analyzing log information for detecting intrusions. Information visualization is the communication of information using graphics [9]. The information can be in the form of raw data, documents, structures, and so on. There are many ways to visualize information, for example graphs. Although graphs are useful to a certain extent, they may not be the best means of supporting immediate countermeasures in securing a web application. There has to be a more efficient way for web administrators to visualize and interact with the protection software. Incorporating interactive visualization will therefore help web administrators understand the complex behavior of the intrusion detection of a computer system [3]. The objective of this research is to develop a visualization technique that assists web administrators in monitoring user locations within a web application in order to detect security breaches.
Our methodology for visualizing the log data is based on the three authorization layers of a web application: the public access layer, the registered user layer, and the administrator layer. Our system shows that client activities in the different layers can be visualized using session logs as the information source.
In this paper, we begin with a discussion of related work in Section 2, followed by an explanation of the system architecture, the visualization techniques, and the rules for detecting intrusions in Section 3. The usability study of the system is explained in Section 4. Finally, we conclude the paper with contributions and future work.
2. Related Research
There are two essential types of intrusion detection technique: misuse detection and anomaly detection. In an Intrusion Detection System (IDS) that applies misuse detection, attack descriptions or signatures are matched against audit data in order to detect known attacks. Compared to an IDS that applies anomaly detection, an IDS that applies misuse detection is unable to detect new forms of attack. Defining the normal behavior of the various activities in a connected computer system is important for anomaly detection. By understanding what normal behavior is, one can detect obviously unusual activities that may be violating the system [2]. Most IDSs that apply either technique use visualization to assist the administrator in analyzing intrusions.

Various visualization techniques based on log data have been developed in order to analyze the information effectively. Organizing log data into viewable information is not new. For example, WAV [10] is a system that visually associates clients and URLs based on large volumes of web transaction data. In WAV, overlapping of client information is avoided by positioning clients with similar behavior and relationships together in an uncluttered display. Visualization techniques can therefore be integrated into an IDS to achieve efficient information visualization, as seen in NIVA [4], an intrusion detection visual analyzer. Erbacher and Frincke [5] created a visual representation in which the nodes in an IDS database and the data accesses between them are displayed. Although the information is represented clearly enough to identify the behavior of users, their system is only useful in post-mortem analysis.

For detecting intrusion behavior and for forensic analysis, the system in [1] provides a complete environment for data analysis. Various raw data and tools are incorporated in the system, and multiple visual representations are used to view the data. These features provide useful information to the administrator in detecting intrusions. Nevertheless, involving different kinds of raw data may lead to information overload. Administrators need time to analyze a visual representation of a huge amount of information; in this case, summarization techniques can be considered. Tudumi [6] implemented two types of log summarization technique. The first controls the amount of data by summarizing different events. The second summarizes access host information based on domain names. The system applies known rules for detecting intrusions and visualizes information, including network access and user login information, in layered concentric disks. However, the system does not show which parts of the monitored system are being intruded upon. The same feature is also lacking in other systems [1, 4-5]. Our survey of related work indicates that new forms of visualization are needed if visualization is to be applied to monitoring the security of web applications.
3. Web Application Monitoring System (WAMS)
Our survey of the literature in related areas led us to identify two general factors in intrusion detection and information visualization. Firstly, web application administrators need to be aware of intrusions immediately, yet time is required for them to analyze a visual representation generated by a system. As the amount of log data may be very large after a long collection period, the work required to analyze it is tedious. In this case, visual representations created from log data may lose their significance when analyzed late. Secondly, it is difficult for a web administrator to detect which part of a web application is being intruded upon by analyzing visual representations created from log data. Knowing this is important, as it can help a web administrator decide whether a client's activity is normal or a possible attack on the web application. If a suspicious activity is occurring in an important part of a web application, for instance the administrator area, the web administrator can take immediate action to prevent a possible intrusion.
Figure 1 The Three Layers of a Web Application

These two factors have been considered and applied in this research. The objective is to develop a technique to visualize session logs in real time in order to alert web administrators of intrusions. The interactivity of the system also enables them to immediately identify the part of a web application that is being intruded upon. We associate the clients with the different layers of a web application and find out the exact position of the clients in the web application. The two main features integrated into the Web Application Monitoring System (WAMS) are the Information Visualization Interface and the Rules Application Element. In the Rules Application Element, simple rules are set based on session or log-in information to detect possible intrusions. Interactive features are also incorporated into the system. Once an intrusion is detected, the web administrator can interact with the Graphical User Interface (GUI) to retrieve important information about the intrusion. The system architecture is described later in this section.

A web application generally consists of three layers, as presented in Figure 1. The Public Access Layer consists of web pages that can be accessed by the general public without restriction. The Registered Users Layer consists of web pages that can be accessed by registered users only. Registered users are able to view or alter important information, which may or may not be disclosed to the public. The Administrator Layer is for web administrators, who have the right to view and alter all information residing in a web application. Authorization checkpoints must be passed to enter the Registered Users Layer and the Administrator Layer.

Figure 2 System Architecture of the Web Application Monitoring System (WAMS)

WAMS is built to monitor the layers of a web application. It can be embedded in a web application as one of its administrative features or run as a standalone system; both modes require authentication and authorization before the interface appears for use. The system process is illustrated in Figure 2. The logging module of the web application gathers session information and stores it in a database. The current system is hosted on the Apache web server, with the web application written in PHP and a MySQL database storing the session information. However, any other web server and web application language can also be used. Log data is created by the web application itself, which collects session information when users attempt to log into the web application. The system gathers the log data stored in the database for analysis, and rules are applied to determine possible intrusions. The processed information is then sent for visualization in real time. The system is able to analyze updated information from the logs immediately and display it in a web browser.

Figure 3 Visualization of information on different layers of a web application, showing four different scenarios

In Figure 3, the three concentric layers represent the three layers of a web application. The outermost layer is the public access area and the middle layer is the registered user area. The administrator area is the innermost layer. If there are no intrusions, the layers appear green (A and C). As soon as an intrusion occurs, the layers appear red and start to blink (B and D). Yellow dots on the layers represent the number of users visiting the web application. Each dot represents 10 online users. The number of online users that a dot represents can be altered according to the average number of accesses per day. If the average number of accesses is in the thousands, the web administrator is advised to set each dot to represent 100 users to ease the monitoring task. The dots help alert the web administrator to heavy access on the web application. A, C, and D each have at least one authorized administrator in the Administrator Layer. B and D have an intruder in the Registered Users Layer, while A and C do not.

In order to retrieve information on a specific layer that is possibly being intruded upon, the web administrator can select the layer with the mouse pointer to display an information panel. Figure 4 shows the information displayed when the intruded area is selected.
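The mapping from session counts to what the concentric display shows can be sketched in a few lines of Python. This is only an illustration, not the WAMS implementation (which is written in PHP). The names `LayerView` and `layer_views` are hypothetical. Rounding the dot count up, so that a single user still produces one dot, and colouring each layer independently are assumptions made here.

```python
import math
from dataclasses import dataclass

# Layer names, outermost to innermost, as in the concentric display.
LAYERS = ("public", "registered", "administrator")


@dataclass
class LayerView:
    layer: str
    dots: int        # number of yellow dots to draw on this layer
    color: str       # "green" when quiet, "red" when intruded
    blinking: bool   # an intruded layer blinks


def layer_views(online_users, intruded_layers, users_per_dot=10):
    """Build the display state for each layer.

    online_users: dict mapping layer name -> number of online users.
    intruded_layers: set of layer names flagged by the rules.
    users_per_dot: how many users one dot stands for (10 by default,
    e.g. 100 for heavily visited sites).
    """
    views = []
    for layer in LAYERS:
        count = online_users.get(layer, 0)
        # Assumption: round up so a lone user is still visible as a dot.
        dots = math.ceil(count / users_per_dot)
        intruded = layer in intruded_layers
        views.append(
            LayerView(layer, dots, "red" if intruded else "green", intruded)
        )
    return views


if __name__ == "__main__":
    state = layer_views(
        {"public": 95, "registered": 12, "administrator": 1},
        intruded_layers={"registered"},
    )
    for view in state:
        print(view)
```

Keeping `users_per_dot` as a parameter mirrors the adjustable dot scale described above.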
The Rules Application Element applies simple rules in order to detect possible intrusions. Since the public access layer contains information that can be viewed by the general public, intrusion detection is not performed in that layer; it is performed only in the registered user and administrator areas. Users are authenticated using sessions or cookies generated by the server-side scripting language, which is PHP.
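As a minimal sketch of how such rule checks might look, the Python below applies the rules this section gives for the two protected layers: a fixed IP range for the administrator area, an alert for non-administrators in that area, and duplicate logins under one username. It is not the authors' PHP code. The `Session` record, its `role` field, the function `detect_intrusions`, and the address range `192.168.10.0/24` are all hypothetical, since the paper does not specify them.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass

# Hypothetical administrator workstation range; the paper gives no value.
ADMIN_NETWORK = ipaddress.ip_network("192.168.10.0/24")


@dataclass(frozen=True)
class Session:
    username: str
    ip: str
    layer: str  # "public", "registered" or "administrator"
    role: str   # "guest", "user" or "admin" (assumed field)


def detect_intrusions(sessions, admin_network=ADMIN_NETWORK):
    """Return (session, reason) pairs for sessions that break a rule."""
    alerts = []
    # Count concurrent logins per (layer, username) in the protected layers.
    logins = Counter(
        (s.layer, s.username) for s in sessions if s.layer != "public"
    )
    for s in sessions:
        if s.layer == "public":
            continue  # no detection is performed on the public layer
        if s.layer == "administrator":
            if s.role != "admin":
                alerts.append((s, "non-administrator in administrator layer"))
            if ipaddress.ip_address(s.ip) not in admin_network:
                alerts.append((s, "IP outside administrator range"))
        if logins[(s.layer, s.username)] > 1:
            alerts.append((s, "multiple logins with the same username"))
    return alerts


if __name__ == "__main__":
    current = [
        Session("alice", "192.168.10.5", "administrator", "admin"),
        Session("bob", "203.0.113.7", "administrator", "admin"),
        Session("carol", "198.51.100.2", "registered", "user"),
        Session("carol", "198.51.100.9", "registered", "user"),
    ]
    for session, reason in detect_intrusions(current):
        print(session.username, "->", reason)
```

Passing the network as a parameter reflects the point that the administrator can alter the rules to suit the deployment.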
The web administrator is able to alter the rules according to the situation. In the administrator area, we assume that the administrator accesses the server from one of the workstations in an area with a fixed range of IP addresses. Any access to this layer from an IP address outside the defined range is considered an intrusion. Alerts are also generated if a non-administrator accesses this layer. Multiple logins with the same username are considered a possible intrusion. In the registered user area, the rules of the administrator area are applied without the fixed range of IP addresses. This is because some web applications, such as forums and free email services, allow access by international users. As in the administrator area, multiple connections from the same remote workstation are treated as a possible intrusion.

Figure 4 Display of information upon selection of the intruded area

4. Usability Study

We conducted a usability study to see how well our visualization technique can help the web administrator in intrusion detection. Based on Nielsen's study [14], we designed a usability study with six participants, consisting of experienced staff holding positions as senior system analyst, network administrator, Human Computer Interface lecturer, and web programmers. The evaluation was carried out in a computer lab at Multimedia University, Cyberjaya, Malaysia. A full system description and instructions were given to the users in document form before the evaluation. During the study, participants gave scores based on their perception of WAMS for clarity, ease of use, flexibility of the system, system responsiveness, helpfulness in alerting the administrator, and sufficiency of information in tracking intrusions. The ratings are based on a 7-point bipolar Likert-type scale, where 1 represents the worst rating and 7 the best. The target value of 5.25 is within the 95% confidence limits for all scales. The results of the evaluation are shown in Figure 5. The mean score for each criterion is above our target.

Figure 5 The results of the usability study in graph format

Users considered WAMS easy to use for monitoring the activities of clients on the three layers of a web application. The information supplied by WAMS was generally considered sufficient to help users track possible intrusions. The score for "Helpfulness in Alerting Webmaster" was comparatively low, and users suggested that the system needs alerts, such as email or Short Message Service (SMS) messages, that reach the administrator while away from the office. The outcome of the test and these suggestions are being considered for implementation in future versions.
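For readers who want to reproduce this kind of check, the sketch below computes a mean and a two-sided 95% confidence interval for one criterion and compares them with the 5.25 target. The paper does not say how its confidence limits were obtained, so the Student's t interval used here is an assumption. The scores in the usage section are invented purely for illustration and are not the study's data. The function name `mean_ci95` is hypothetical.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365}

TARGET = 5.25  # target value quoted in the usability study


def mean_ci95(scores):
    """Return (mean, lower limit, upper limit) for a small sample."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(n)
    margin = T_CRIT_95[n - 1] * sem
    return mean, mean - margin, mean + margin


if __name__ == "__main__":
    # Hypothetical ratings from six participants on a 7-point scale.
    example_scores = [6, 5, 6, 7, 5, 6]
    mean, low, high = mean_ci95(example_scores)
    print(f"mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
    print("mean above target:", mean > TARGET)
    print("target within limits:", low < TARGET < high)
```

With only six participants the interval is wide, so a target can fall inside the confidence limits even when the mean is above it.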
5. Conclusions and Future Work

In this project, we proposed an alternative technique for visualizing log data in order to ease web administrators' task of monitoring a web application's security. The proposed visualization technique assists web administrators in handling attacks that require a quick response. Identifying the layer of a web application that is being intruded upon is important in refining the security of the web application: the refinement work can be focused on the identified layer, which saves time and maintenance effort. The proposed system is flexible, as it can be integrated into a web application or run as a standalone system. The techniques we have applied and the initial results of the usability study demonstrate the usefulness of the Web Application Monitoring System and support the continued development of the concept.

We are exploring refinements to the visualization techniques that can represent detailed information on each user node and its behavior. At the time of writing, the modules of a web application are being represented as divisions within the concentric circles of the visualization, so that user locations can be identified from the arrangement. An IP Locator is being added to the system to find the exact location of the nodes that launch attacks. This is currently possible for experimental purposes for nodes installed within our research facilities. We are also attempting to integrate WAMS with other IDSs so that more rules can be utilized. The concept and the system we have presented show that integrating information visualization into intrusion detection can provide an indispensable tool for detecting possible intrusions within a web application.
References

[1] Robert F. Erbacher. Intrusion Behavior Detection Through Visualization. Proceedings of the IEEE Systems, Man & Cybernetics Conference. Vol. 3, 2507-2513. October 2003.
[2] Richard A. Kemmerer and Giovanni Vigna. Intrusion Detection: A Brief History and Overview. Computer. Volume 35, Issue 4, 27-30. April 2002.
[3] Soon Tee Teoh, Kwan-Liu Ma, Soon Felix Wu and T.J. Jankun-Kelly. Detecting Flaws and Intruders with Visual Data Analysis. IEEE Computer Graphics and Applications. Volume 24, Issue 5, 27-35. September 2004.
[4] Kofi Nyarko, Tanya Capers, Craig Scott and Kemi Ladeji-Osias. Network Intrusion Visualization with NIVA, an Intrusion Detection Visual Analyzer with Haptic Integration. Proceedings of the 10th Symp. on Haptic Interfaces for Virtual Envir. & Teleoperator Systs. 277. 2002.
[5] Robert F. Erbacher and Deborah Frincke. Visualization in Detection of Intrusion and Misuse in Large Scale Network. IEEE International Conference on Information Visualization. 294-299. July 2000.
[6] Tetsuji Takada and Hideki Koike. Tudumi: Information Visualization System for Monitoring and Auditing Computer Logs. Proceedings of the Sixth International Conference on Information Visualization. 570-576. July 2002.
[7] Feng Tao and Fionn Murtagh. Towards Knowledge Discovery from WWW Log Data. Proceedings of the International Conference on Information Technology: Coding and Computing. 302-307. March 2000.
[8] Juan Velasquez, Hiroshi Yasuda and Terumasa Aoki. Combining the Web Content and Usage Mining to Understand the Visitor Behavior in a Web Site. Proceedings of the 3rd IEEE International Conference on Data Mining. 669-672. November 2003.
[9] Usama Fayyad, Georges G. Grinstein and Andreas Wierse. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann Publishers. 2002.
[10] Ming C. Hao, Pankaj Garg, Umeshwar Dayal, Vijay Machiraju and Daniel Cotting. Visualization of Large Web Access Data Sets. Proceedings of the Symposium on Data Visualization. 201-204. March 2002.
[11] Snort. http://www.snort.org
[12] Sygate Firewall. http://www.sygate.com
[13] Ad-Aware SE. http://www.lavasoftusa.com
[14] Jakob Nielsen. Why You Only Need to Test With 5 Users. http://www.useit.com/alertbox/20000319.html, 2000.