Network Eye: End-to-End Computer Security Visualization

Glenn A. Fink*, Robert Ball*, Chris North*, Nipun Jawalkar*, and Ricardo Correa**

*Dept. of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061-0106 USA, +1 540 961 0159, {finkga | rgb6 | north | njawalka}@cs.vt.edu

**Dept. of Computer Science, The University of Texas at El Paso, Computer Science Building Room 234, 500 W. University Drive, El Paso, TX 79968-0518, [email protected]

Abstract Visibility is crucial to managing and securing today’s computers and networks. Visualization tools are a means to provide visibility into the invisible world of network computing. Many good tools exist that give administrators a view into parts of the total picture, but our year-long study of system administrators and their tools shows a strong need for end-to-end visualization of network activity that preserves the context of the information observed. End-to-end visualization will allow an administrator to focus on one communication end-point and see the applications, processes, sockets, and ports responsible for all communications there. Then, the user can follow an application’s communications across the network and see the remote host and ports involved. If the same user owns the remote point, a tattletale client installed there can give the port-socket-process-application information for the remote end point as well. This end-to-end view will help administrators construct accurate mental models of their networks via a comprehensive overview and drilldown for greater detail. This paper presents the requirements we garnered from system administrators as we developed Network Eye. We then describe two vertical prototypes and discuss users’ reactions to them and to the end-to-end view.

Introduction Recent attacks (especially infamous worms like Slammer, SoBig, Blaster, MyDoom, and PhatBot, to name a few) have cost businesses “an estimated $666 million in 2003, according to a survey of computer security executives released today.” [MacGuire, 2004] The survey “showed that more than 40% of 500 executives polled said hackers have become the greatest cybersecurity threat to business and government networks, compared to 28% who feared internal threats such as disgruntled or recently fired employees.” This is arguably only a subset of the damages that can be both quantified and admitted to. In a recent address, Alan Paller, founder of the SANS Institute, cited system administrators’ lack of awareness and formal training in security as a major reason why hackers succeed. But training is costly, and maintaining awareness is difficult. We studied the activities and tools of local system administrators for over one year to determine their needs and to work with them in designing tools that allow them to work efficiently. Regardless of training or experience, administrators must be able to rapidly form accurate mental models of what is happening in their systems and networks, especially during a crisis. Our interviewees have indicated that they most need concrete visualization tools to succeed. We are participatively designing Network Eye with our user community to provide usable visualizations for network and system awareness. The Requirements Analysis section of this paper presents our findings from interviewing 22 system administrators about what kinds of computer-security visualizations they need. These interviews helped us refine our potential user community and their needs for network visualization tools to enhance awareness. The Solution Design section describes our end-to-end architecture (Network Eye) and two vertical prototypes that we developed as proofs-of-concept for the two key portions of the overall architecture. In the Evaluation section, we analyze the reactions and advice of a select set of interview subjects to the prototypes we developed. The evaluation consists of a usability study and an informally structured cognitive walkthrough. We conclude with lessons learned from the requirements generation, development, and evaluation phases.

Requirements Analysis To determine the needs of our user community, we interviewed 22 professional system administrators, many of whom specialize in computer and network security. The subjects had varied backgrounds and experience levels and were from two large universities. We asked them about their duties, their experience with intrusions and other security incidents, and the tools they use. Most of the subjects we interviewed were very enthusiastic about a graphical approach to network awareness. They showed us examples of tools they used (both graphical and text-based) to help us understand their visualization needs. Interview approach Initially, we identified individuals who were professional system administrators and conducted semi-structured interviews. From these interviews, we gleaned information about the requirements for a potential visualization tool. We formed an overall architecture (Network Eye) and wrote up several prototypes of portions of this architecture. Then we conducted a small usability study (eight subjects) to test the utility of our concepts. Finally, we brought the prototypes to some of the original interviewees (and a few of their associates) to get freeform feedback on the utility of our designs. This section recounts the requirements gleaned from this study. Who are the users? We began to interview system administrators, naively thinking of them as a homogenous group. Instead we found at least four distinct job types in the group of subjects studied: 1. System Administrator for Users (SAU): Manages end-user workstations for individuals or labs; lots of user contact (33% of jobs reported by interviewees). 2. System Administrator for Servers (SAS): Manages web, file, mail, compute, etc. servers with little or no user contact (42% of jobs). 3. Security Officer (SO): Receives data from SAU/SAS, sets security policy, coordinates response to security incidents, forensic

investigation, risk analysis, law enforcement liaison (13% of jobs). 4. Network Analyst or Researcher (NAR): Monitors health and welfare of the enterprise network; investigates ways to improve performance (4% of jobs). Another 8% of duties (programmer, supervisor, etc.) reported by interviewees did not fit any of the above categories. We believe that Network Eye will support the needs of all of these jobs, especially the SO and NAR jobs, whose work intrinsically involves capabilities particular to Network Eye and available nowhere else. System Administrator Activities We identified the following set of security-related system administration activities in the course of our investigation: General Administrative Activities: • Patching software • Managing users (education, individual service, problem-solving, etc.) • Maintenance (log file reading, adjusting configurations, etc.) • Ambient network awareness Pure Security activities: • Staying informed of exploits, etc. • Forensics (figuring out what happened) • Response and recovery • Directed network investigation Of the security-related activities listed, visual network awareness tools such as Network Eye and others can support ambient network awareness (administration) and directed network investigation (pure security). Our findings indicate that these two activities are inherent elements of SO and NAR jobs. SAU and SAS jobs also entail these activities to a lesser extent. System administrator activities have both proactive and reactive facets [Venter and Eloff, 2003]. For instance, an administrator may read log files proactively, to look for and head off suspicious activity, or reactively, to find out what is happening and where during a crisis. For example, computer “worms” have a definite lifecycle that can determine what is most important for a system administrator to do at a given time. Before the outbreak, administrators may be primarily in proactive mode, but during and after a major worm attack, the same administrator may be in reactive mode. Understanding how visualization tools may be used during each of these periods is important, since they typify the kinds of activity administrators will be undertaking and what their needs are. Worm Propagation Model Figure 1 shows an informal propagation model for a typical Internet worm as understood by our interviewees. A worm is an autonomous, malicious program designed to seek out and exploit known security vulnerabilities on networked machines. A more precise model of worm propagation may be found in [Zou, 2002].

Figure 1: Typical Worm Propagation Lifecycle

Although we have seen quite a few of these programs since the original Morris worm of 1988, they are quite complex to engineer correctly, and they usually appear in many different forms as the authors and other “contributors” refine them. The saturation point of a worm may be thought of as the asymptotic maximum number of infected hosts the worm will likely achieve on the Internet at a given time. At saturation, the worm has infected essentially all the hosts that are vulnerable to its exploit. Recent worms have exhibited the ability to mutate, downloading new code and potentially new exploit techniques from sites across the Internet. Mutation would allow a worm to achieve a secondary, higher number of infected hosts, so it is very difficult in practice to estimate in advance how many computers constitute “full” saturation for a particular worm.
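The sigmoidal growth toward saturation described here can be illustrated with a toy susceptible-infected model. The sketch below is purely illustrative: the population size, contact rate, and seed count are arbitrary assumptions, not measurements, and [Zou, 2002] gives far more faithful propagation models.

```python
# Illustrative only: a crude susceptible-infected model of worm spread.
# N, beta, and the seed count are arbitrary assumptions, not data.

N = 100_000        # hosts vulnerable to the worm's exploit
beta = 0.8         # effective contact (successful scan) rate per step
infected = 10.0    # seed infections at release time

for t in range(30):
    susceptible = N - infected
    # New infections are proportional to contacts between infected and
    # still-vulnerable hosts, which yields logistic (sigmoidal) growth.
    new_infections = beta * infected * susceptible / N
    infected = min(N, infected + new_infections)
    print(f"t={t:2d}  infected={infected:9.0f}  ({100 * infected / N:5.1f}% of saturation)")
```

Reducing the vulnerable population or the contact rate (for example, by early patching or port blocking) flattens and delays the curve, which is the intent behind the intercepted and delayed propagation curves in Figure 1.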

As Figure 1 shows, worms propagate on the Internet from zero to full saturation generally following a sigmoidal growth curve. Prior to the main attack, our interviewees have often witnessed diagnostic variants, early versions of the worm that fail to propagate as planned. Sometimes these variants are self-terminating. The release of these diagnostic variants marks the beginning of an early detection window. The early detection window is not easy to define in advance and has no predictable duration. If a worm’s exploit is discovered during this window, our interview subjects believe that critical analysis of the exploit may be completed before the main assault and patches may be applied, so that the worm’s negative effects are greatly reduced and its propagation resembles the intercepted propagation curve shown in the figure. At some point, the worm is released onto the Internet, marking the start of the critical detection time window. This is the last opportunity for administrators to take preventive action to prepare for the worm. During this time, the worm is gaining footholds in enough different networks to begin its most aggressive period of growth. The critical detection window can be very short, as with the MS-SQL Slammer worm [Schultz, et al., 2003], where within about ten minutes the worm had reached full Internet saturation. Our interviewees revealed that the critical detection window is typically much longer, often several days. They believe that making the pre-release phase of worm activity visible might encourage Internet users to patch vulnerable systems, block affected ports, or take other preventive measures. This would slow the propagation speed of the worm, delaying saturation, reducing the absolute infection level, and allowing more time to patch systems that have not yet been affected. Figure 1 illustrates this potential delay as the Delayed Propagation Curve. During the latter two phases, rapid growth and saturation, visualization tools may be used to identify infected hosts and target reactions to the worm’s activity. By the time the worm reaches rapid growth, most of the damage has been done or is unavoidable. During these phases, if the worm hit before preparations were complete, system administrators must spend much time repairing the damage caused. Eventually, the number of infected hosts will diminish, although there are still many instances of “old” worms and viruses that are never completely eradicated from the Internet. The rate of diminishment may also be increased if administrators have tools to identify and repair infected machines. Characteristics of Awareness Tools This section discusses some defining characteristics of security awareness tools and gives examples from related work in computer and network security and management. The following are the defining attributes we use to describe awareness tools: 1. Scope of Visibility: The logical level where observation takes place (network, single host, multi-host, or end-to-end). 2. Location of Intelligence: Expertise is either built into the tool or expected from the user. 3. Time of Observation: Tools may be most suitable for up-to-the-minute (real-time) observation or after-the-fact analysis. 4. Encoding of Data: Information is conveyed to the user visually, textually, or via some other sense. 5. Context of Presentation: A visualization may show objects in their natural context (concrete) or show attributes of objects separated from the context (abstract). Scope of visibility can be defined as an awareness tool’s logical level of observation and perspective. From our survey of administrators and their tools, we have identified four scopes of visibility: 1. Network Visibility: Traffic examiners built on the tcpdump approach. 2. Host Visibility: Tools that group and/or correlate local information from one or more hosts. 3. Application Visibility: Tools that provide a view into the workings of applications running on one or more hosts. 4. Overall Visibility: Tools that allow a user to examine connections from the originating host to the destination and everything in between.

A variety of tools of the first three types are available. An example network-based visibility tool is “The Spinning Cube of Potential Doom” [Lau, 2004], a tool shown at SC (supercomputing) conferences to illustrate attacks in progress on the SCinet trade show network. The purpose is to raise awareness of the ubiquity of attacks by presenting a visual representation of intrusion detection data. Our users, however, neither agree on whether to use a network IDS nor on which one to use. Therefore, we chose to have our tool listen to raw packet traces from tcpdump—a widely used data format. A popular host-based visibility tool is Big Brother [MacGuire, 2003]. This tool uses Simple Network Management Protocol (SNMP) and log data to poll the hosts for their status data, and presents that status on web pages that are accessible from anywhere. The hosts can be grouped hierarchically, and their status is rolled up on a single page so that the user can tell at a glance which hosts and services are available, which ones have warning conditions, and which are down. Our users declare these tools indispensable to their ability to be aware of the state of their systems. A commercial application awareness tool, AppRadar™ [AppRadar, 2004], is a host-based database intrusion protection system for protecting enterprise installations of Microsoft’s SQL Server. AppRadar provides database-specific, active protection and monitoring of the data stored within the databases. Our users need awareness of what applications on their networks are doing but desire a general tool that will treat applications as simple resources. Overall visibility is a broad-spectrum, integrated view of computer activities in the context of the networks they reside on. Such tools would allow a user to see how each process and activity contributes to the overall activity observed on the network at large. This perspective would enable users to better determine why they are seeing the behavior they observe so they can make better judgments about its security implications. Our users did not directly express the need for this kind of tool, but while we were trying to learn how to do system administration work, we found a constant need to gain the perspective and awareness this kind of tool could provide. When we presented the idea to some of our subjects for their critique, they were very enthusiastic about

this approach. In our research, we found no such overall, end-to-end visibility tools. Network Eye is our attempt to fill this void. Location of Intelligence: We have witnessed two basic, contrasting design philosophies that are evident to one degree or another in every tool we have evaluated: 1. Autonomy: Let the tool do as much work as possible to save the cost of human effort. Endow the tool with as much intelligence as possible so that a human decision-maker is needed as little as possible. a. Detection b. Detection and reaction 2. Presentation: People can figure out what’s going on with the network if they can just see it. Make the tool powerful at presenting and manipulating the data so that humans can make informed decisions easily. Both of these approaches have merit, and neither is sufficient by itself to meet computer security needs. This attribute represents a continuum rather than a choice. A plethora of autonomous tools exist, relying on various artificial intelligence engines (expert systems, neural networks, etc.) to make decisions. Recently, so-called “intrusion prevention” systems have entered the market on the premise that security events on the Internet happen too quickly for humans to detect and respond to. These tools detect known bad or anomalous behavior and react “intelligently” to the situation at hand with little or no human intervention. A popular example of an autonomous approach is Internet Security Systems’ Proventia [ISS, 2003]. We do not argue with the effectiveness of this approach for many applications, but we found that most system administrators we interviewed did not trust a program to make decisions about how to react to a potential threat for them. The infamous problem of false positive identification of threats coupled with a speedy but incorrect reaction can be very frustrating. There already exist many programs that adequately provide some degree of autonomous action, and we intend to complement rather than replace these. An example of the presentation approach is EtherApe [Toledo, 2001], which translates

packet traces into circular connection diagrams. EtherApe does no analysis; it simply interprets the packet data and presents it in graphical form to the user. EtherApe was an inspiration to our design, but our users needed a presentation that was more scalable to large numbers of systems. Time of Observation: The extremes of this continuum are real-time (i.e., “What’s happening?”) monitoring, and post hoc (i.e., “What happened?”) analysis. Some tools are most appropriate for real-time monitoring, while others excel at post hoc analysis. Still other tools work best in “near real-time” mode. Our users indicated that approaches across the continuum were useful, but: (1) it is too expensive to provide around-the-clock, human surveillance of their networks except during an attack, and (2) during an attack there is no time for detailed post hoc analysis. A tool that merely shows what is happening right now will require more vigilance than most organizations can afford or than most humans can provide. A tool that provides only analysis of a security event after the fact is a luxury for most, a necessity for a very few. Our users said they would like to have a tool that provides real-time, ambient awareness, but that they would not be able to simply sit and watch it the whole time. Ad hoc analysis (i.e., “What just happened?”) is a greater need than either extreme. Post hoc analysis requires storing a sufficient amount of meaningful data. Virginia Tech’s campus network typically sees around seven gigabytes of packet data per day—more when a major attack is underway. Storing and manipulating this data is cumbersome, especially when one realizes that a sizeable fraction of it is unimportant background chatter. Our users suggested reducing the detail in data as it ages. The general consensus was: • Within the last hour: keep everything. • Within last 24 hours: keep only packet header information. • Within last week: keep only flow information (source, destination IP, ports, protocol, time), possibly sampled. • Earlier: keep statistics only. This approach greatly reduces the data storage requirements for post hoc analysis while gracefully degrading performance as data ages.
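To make the suggested retention policy concrete, here is a minimal sketch of an aging pass over stored capture records. The tier boundaries come from the consensus above; the record fields and the dictionary-based storage are our own assumptions rather than Network Eye’s actual schema.

```python
import time

HOUR, DAY, WEEK = 3600, 24 * 3600, 7 * 24 * 3600

def age_record(record, now=None):
    """Degrade a stored capture record according to its age.

    `record` is assumed to be a dict holding a full packet capture plus
    derived fields; the field names here are illustrative only.
    """
    now = now or time.time()
    age = now - record["timestamp"]
    if age <= HOUR:                      # last hour: keep everything
        return record
    if age <= DAY:                       # last 24 hours: headers only
        return {k: record[k] for k in ("timestamp", "headers")}
    if age <= WEEK:                      # last week: flow tuple only
        keys = ("timestamp", "src_ip", "dst_ip", "src_port",
                "dst_port", "protocol")
        return {k: record[k] for k in keys}
    # Older data: fold into aggregate statistics elsewhere, drop the record.
    return None
```

Running such a pass periodically trades detail for storage exactly as the interviewees suggested, while leaving the most recent traffic fully reconstructable.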

Encoding of Data: There are several ways to encode data for the user’s perception. The majority of the tools in use by our subjects encoded the information as text for the user to read. Examples of text-encoded awareness tools are tcpdump [tcpdump, 2003] and its GUI relations, Ethereal, etc. Text-encoded tools are excellent at providing detail and precision but require a great amount of work and expertise to decode. An alternative encoding presents information visually. While textual encoding uses vision, it is intrinsically an auditory modality rather than visual. [Wickens, et al., 1983] Human vision is our highest bandwidth sensory channel, so it makes sense to present data encoded visually for rapid comprehension. There are many examples of this approach in the literature. Teoh [Teoh et al. 2003] has developed a system to visualize the exchange of autonomous system (AS) routing data on the Internet. EtherApe [Toledo, 2001] visualizes communication data at the local network level. Erbacher [Erbacher, 2003] presents communication data visually to enable administrators to see intrusive behavior patterns from a local host. Our users have reviewed several systems like these (tending to prefer popular, open source projects to commercial or academic projects) and find that they fit various niches. What they need is a tool that would be scalable enough to accommodate broad-spectrum use. Other data encodings might include virtual reality and tactile representation of awareness data. [Nyarko, et al. 2002] Haptic devices can make users aware of the security state of their systems through the sense of touch rather than further taxing the overloaded sense of vision. While haptics and virtual reality are interesting topics for research, our users do not generally have access to the special equipment required to support such encodings, and they are more comfortable with the visual or textual encodings. Context of Presentation: The context of the information presented can be thought of as a continuum between abstract presentation (where attributes are presented abstracted from the context of their source objects) and concrete presentation (where objects are presented in their natural context). An abstract approach might display histograms and line graphs describing statistical

characteristics of communications data. A more concrete presentation might attempt to display a network diagram with hosts, network cables, and routers. Teoh’s work [Teoh, 2004] visualizing originating autonomous systems (ASes) has concrete characteristics because screen locations represent actual physical or logical objects. In the same paper, Teoh describes a more abstract visualization, EventShrubs, which visualizes routing instability events as a set of stacked pie charts ordered on a timeline. We consider this visualization more abstract than the first because attributes of events are being plotted rather than the objects themselves. Luc Girardin [Girardin, 1999] uses self-organizing maps to show an overview of network activity. He has implemented a neural network to reduce the dimensionality of the space of network and logging information down to two-dimensional topological maps that illustrate the state of a network. Girardin’s work uses foreground and background colors, sizes, and relative positions on the map display to show both network state and the deviation of the current state (i.e., quantized error) from the normal. The method does not require any foreknowledge of the network whose state is to be displayed, but the resulting self-organizing maps are somewhat difficult to read. We believe that though the maps were an attempt to make the abstract network data more concrete by making it visible, the result is nearly as abstract as traditional charts and graphs. An example of a completely abstract visualization system is S-Net [Cao, et al., 2002], which generates statistical plots from traffic data. Although the tool provides drill-down to the individual observation, it still presents no physical or logical objects in its visualization. Our interviewees have several abstract statistical plotting tools such as MRTG [Oetiker and Rand, 1998] and a few concrete visualization tools such as EtherApe [Toledo, 2001]. With the exception of a few who were experienced statisticians, our interviewees overwhelmingly preferred concrete visualizations, believing them to be easier to understand and faster to use for constructing a mental model of the situation. Once the mental model is established, they may use abstract tools more easily.

Solution Design We use participative design as part of our development methodology, gleaning wishlists, anecdotes, wisdom, and experience from our users. As we gather this information, we modify the Network Eye architectural plan. To test specific concepts and hypotheses, we construct prototypes of pieces of the system at varying levels of fidelity. We derive our prototypes directly from designs produced in collaboration with the interviewees. Every feature of the overall design and the prototypes discussed in this paper was derived from interactions with interviewees and application of human-computer interface (HCI) knowledge and information visualization techniques. These prototypes stimulate further design discussions with our user community. Network Eye Overview We are currently completing Network Eye’s architecture definition stage. The final architecture has been derived from interviewing system administrators, testing existing security awareness tools, and researching visualization techniques. Our architecture consists of two tightly integrated visualizations that provide an end-to-end view of applications, computers, and networks in context (see Figure 2).

Figure 2: Network Eye’s end-to-end view of applications communicating across the network.

The first visualization, the network communication diagram, displays network data (packet traces, etc.) gathered from passive listeners on the network. The second visualization, the host-application resource map, provides a view of host data (processes, sockets, ports, files in use, etc.). The two are joined to give an integrated, end-to-end system security picture.
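To make the end-to-end join concrete, the following sketch shows one way a flow observed by the passive listener could be correlated with the process that owns the matching socket on each endpoint. The record types and the (IP, port) matching key are assumptions for illustration, not Network Eye’s actual data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Flow:
    """A flow seen by the passive network listener."""
    src: Tuple[str, int]   # (IP, port) of the initiating endpoint
    dst: Tuple[str, int]   # (IP, port) of the responding endpoint
    protocol: str

@dataclass
class SocketOwner:
    """A socket owner reported by a host's tattletale client."""
    pid: int
    process_name: str

# Per-host tables keyed by (local IP, local port).
HostTable = Dict[Tuple[str, int], SocketOwner]

def end_to_end(flow: Flow, tables: Dict[str, HostTable]):
    """Return the owning process at each end of a flow, where known.

    An end stays None when no tattletale client runs on that host, which
    matches the paper's point that remote detail requires a client there.
    """
    src_owner = tables.get(flow.src[0], {}).get(flow.src)
    dst_owner = tables.get(flow.dst[0], {}).get(flow.dst)
    return src_owner, dst_owner

tables = {"10.0.0.5": {("10.0.0.5", 443): SocketOwner(1234, "httpd")}}
flow = Flow(src=("192.0.2.9", 51515), dst=("10.0.0.5", 443), protocol="tcp")
print(end_to_end(flow, tables))  # (None, SocketOwner(pid=1234, process_name='httpd'))
```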

Design Choices: Based on the requirements we derived from our collaboration with system administrators, we can enumerate our design choices made in support of those requirements. 1. Scope of Visibility: We chose to implement a single, integrated tool that would provide overall visibility, end to end, of the network, hosts, and applications running at a given time. 2. Location of Intelligence: We chose to simply present the data to the operator visually and let the decision-making intelligence reside there. We provide filtering and aggregation tools to reduce the tremendous complexity of the data that can be seen with this tool. 3. Time of Observation: We will support both real-time monitoring and post hoc analysis using different views of the data. To enable real-time awareness, we provide an ambient monitoring mode to present network and host state in a short-duration sliding time window. The window is displayed as a constantly updated video loop of network activity, similar to a radar sweep. We will support post hoc analysis by keeping data for a time while saving space by reducing the fidelity of data as it ages. 4. Encoding of Data: We elected to use visual encoding as much as possible because our users have the greatest need of tools that will support quickly building accurate mental models of the current situation. 5. Context of Presentation: We chose concrete presentation over abstract because of the user’s apparent need to construct an accurate mental model before proceeding with further analysis or actions. A few other design decisions are relevant. Several awareness tools use post-processed data from intrusion detection systems (IDSs). Tools like Big Brother [MacGuire, 2003] use active probes of the monitored hosts to gather data about the state of the monitored systems. We have elected to use passive-only techniques wherever possible to avoid corrupting the network communication picture. We must use active techniques to retrieve host data from remote hosts or to correlate the findings of distributed sensors, but we wish to minimize this kind of activity.
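As a minimal sketch of the passive-listening design choice, the following uses the Scapy library and a Berkeley Packet Filter expression to summarize traffic without sending any probes. The choice of Scapy and the flow-counting logic are our assumptions; the paper itself only specifies tcpdump-format packet traces as input.

```python
from collections import Counter
from scapy.all import sniff, IP   # assumes Scapy is installed; capture needs root

flows = Counter()

def record(pkt):
    # Count packets per (source, destination) pair. Nothing is transmitted,
    # so the observed traffic picture is not disturbed.
    if IP in pkt:
        flows[(pkt[IP].src, pkt[IP].dst)] += 1

# The BPF expression keeps kernel-side filtering cheap; this sketch stops
# after 1,000 packets rather than running continuously.
sniff(filter="tcp or udp", prn=record, store=False, count=1000)

for (src, dst), packets in flows.most_common(10):
    print(f"{src} -> {dst}: {packets} packets")
```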

Network Pixel Map: Network Eye will display all the observed IP addresses in a pixel-oriented overview visualization [Keim, 2000] that fits in a single display (Figure 3). The basic network pixel map will accommodate viewing up to about 100,000 hosts at a time. We will use focus+context (fish-eye distortion) to enable selecting a single host from the myriads displayed.

On the network pixel map, each marker (which may be as small as a single pixel) represents a unique IP address observed in the network traffic. Marker placement indicates the trust level and IP address of the host. Marker color is in spectral order from red to violet, where red indicates lower values (near 0.0.0.0) and violet indicates higher values (near 255.255.255.255). We will use pulsating illumination to indicate recent activity of hosts in this view. Pixel (host) arrangement is critical to user insight [Keim, 2000]. We have selected level of trust (e.g., organizational proximity) to guide the placement of hosts in the pixel map view. This method integrates useful information into the sea of multi-colored dots. We classify hosts into five arbitrary trust levels: Home, Trusted, Safe, Untrusted, and Danger. We arrange the trust levels, target fashion, in nested circles, with the most trusted hosts closest to the center and known dangerous hosts in the outermost ring. Trust levels may be assigned from network configuration information and via experience. Dangerous hosts will be assigned from local IP blacklists or can be moved to the outskirts as they are discovered. This arrangement of markers causes groups of hosts in the same class B networks to appear in rings. In Figure 3, the green ring near the center is the set of hosts in the local network. Better use of space could be achieved by moving the markers for dangerous hosts into the corners.

Figure 3: Example network pixel map showing 8,513 IPs observed at a single host over one week.
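A hypothetical placement function for the pixel map just described might map trust level to a ring and the numeric IP value to a hue and an angle. The ring radii, the angular mapping, and the exact hue range are arbitrary assumptions used only to illustrate the scheme.

```python
import colorsys
import ipaddress
import math

# Nested rings, most trusted nearest the center (radii are arbitrary).
RING_RADIUS = {"Home": 0.1, "Trusted": 0.3, "Safe": 0.5,
               "Untrusted": 0.7, "Danger": 0.9}

def marker(ip: str, trust: str):
    """Return (x, y, rgb) for one host marker on the pixel map."""
    value = int(ipaddress.IPv4Address(ip))          # 0 .. 2**32 - 1
    fraction = value / 2**32
    # Spectral color: red near 0.0.0.0, violet near 255.255.255.255.
    rgb = colorsys.hsv_to_rgb(0.8 * fraction, 1.0, 1.0)
    # Hosts with nearby addresses land at nearby angles within their ring,
    # so address blocks cluster together.
    angle = 2 * math.pi * fraction
    r = RING_RADIUS[trust]
    return r * math.cos(angle), r * math.sin(angle), rgb

print(marker("128.173.0.10", "Home"))
print(marker("203.0.113.7", "Untrusted"))
```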

Network Communication Diagram: Mirrored network pixel maps form the heart of Network Eye as the network communication diagram. This visualization shows network traffic flow as lines connecting the mirrored maps (see Figure 4). Point-to-point interconnections and fan-out/fan-in become apparent. In Figure 4, communication lines connect client hosts to servers, or initiators to recipients.

Figure 4: Network communication diagram built from mirrored network pixel maps.

We can allow the user to encode meaning into line thickness, pattern, color, pulsation, and luminosity in a variety of ways as desired. To assist the users in making sense of the tremendous amount of data, we provide three kinds of filtering tools: (1) selection via pointing and clicking, (2) graphical and textual packet filtering based on Unix’s Berkeley Packet Filters (BPF), and (3) rate-based filtering for selecting according to the level of traffic experienced. We will provide tools to zoom in, filter out distracting features, and drill down to the packet level on demand. Another sense-making strategy is aggregation of markers and communication lines. Currently, Network Eye aggregates only host markers and only by the first octet. We can reduce the complexity of the picture much more by aggregating the host markers according to the amount of traffic they generate, thus showing only the “heavy hitters,” as Estan, Savage, and Varghese do in their AutoFocus tool [Estan, et al., 2003]. Our collaboration with Dr. Estan is producing many ways to aggregate effectively. Although packets flow both ways, communication lines are always drawn starting at the client host on the left and going to the server side on the right. With the mirrored maps, elementary patterns are immediately visible: high fan-out shows a single client host contacting many others (potentially scanning), high fan-in shows many hosts contacting one (potentially a distributed denial-of-service attack). Adding animated activity playback will be a powerful aid to forensic analysis of a potential security incident. By filtering out typical traffic, analysts may identify unusual flows as highly likely intrusion paths. Host Resource Map: The second type of visualization displays host processes and their communication resources. It meshes closely with the network data visualization. The host data visualization shows the interaction of applications in the overall network (see Figure 5). An analyst may spot some communication patterns of interest, select a communication link, and drill down to find out what hosts, ports, sockets, processes, applications, and users are responsible. Host drilldown enables quick investigation and reaction to potential intrusions. An application can be seen as a set of processes that use sockets and other resources to operate. The sockets are bound to some port, and these ports may be communicating with remote hosts. The ability to trace a port back to its owning application greatly helps in determining why the communication is happening. Both client-server and peer-to-peer relationships can be discovered in this way, regardless of encryption. To obtain this information from a remote host requires a tattletale client to be installed there. This is not an unreasonable condition since remote administration tools are common. Network Eye’s tattletale client will use secure shell authentication and encryption for security. Our users requested that the host-application resource visualization also be able to show files related to

each application to make it easier to locate and eradicate unwanted or malicious programs.

Figure 5: Host-application resource map showing an application tied into a network pixel map.
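The host-application resource map relies on exactly the process-to-socket-to-port mapping that command-line tools such as lsof expose. A minimal collection sketch using the cross-platform psutil library (our stand-in for the tattletale client, not the authors’ implementation) might look like this:

```python
from collections import defaultdict
import psutil

# Group open inet sockets by the owning process, mirroring the
# application -> sockets -> ports -> remote hosts chain in Figure 5.
by_process = defaultdict(list)
for conn in psutil.net_connections(kind="inet"):
    if conn.pid is None:          # socket with no resolvable owner
        continue
    try:
        name = psutil.Process(conn.pid).name()
    except psutil.NoSuchProcess:  # process exited while we were looking
        continue
    by_process[(conn.pid, name)].append(conn)

for (pid, name), conns in by_process.items():
    print(f"{name} (pid {pid})")
    for c in conns:
        remote = f"{c.raddr.ip}:{c.raddr.port}" if c.raddr else "-"
        print(f"  {c.status:12s} local {c.laddr.ip}:{c.laddr.port} remote {remote}")
```

On most operating systems, enumerating sockets owned by other users’ processes requires elevated privileges, so a deployable client would run with administrative rights much as other remote administration tools do.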

VISUAL: A Prototype Network Communication Diagram The centerpiece of Network Eye is the network communication diagram. We developed this concept from our users’ desire to be able to see all the data at once and filter out the clutter as desired. Before embarking on the expensive development of a fully functional, three-dimensional network communication diagram, we decided it would be best to test the hypothesis that users would be able to see intelligible communication patterns in what appears to be a confusing array of dots and lines. We designed a prototype network communication diagram called Visual Information Security Utility for Administration Live (VISUAL) [Ball, et al., 2004] to test our hypothesis. We restricted VISUAL to a two-dimensional display for more rapid development. Showing communications in only two dimensions reduces the amount of information displayable because the communication lines between some pairs of hosts may occlude other host markers or communication lines. As a result, we scaled back our ambition of displaying hundreds of thousands of hosts at a time to displaying hundreds or perhaps a few thousand. We also changed the layout of the display from the two mirrored pixel maps to a home network nested within the plane where the external hosts were being displayed. This approach is similar to the layout in [Erbacher, 2003],

but we use simple square markers rather than Erbacher’s more complex glyphs. Overview: VISUAL opens with a quick overview of network traffic for a small to mid-size network (on the order of 10,000 hosts) over a predefined period of time. The home (internal) hosts appear in a centrally placed square grid, and external hosts are shown as square markers whose size is determined by the relative amount of bandwidth that host is responsible for over the analysis period (see Figure 6).

Figure 6: Screenshot of VISUAL displaying 80 hours of network data from a home network of 1020 hosts.

We separate internal (home) and external hosts because it is natural for our users to think of the world this way. The user can also get details about a computer by selecting its marker. The details include all the hosts that the selected computer communicates with, the source and destination ports (for Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) communications), and what percentage of the total traffic in the analysis time window the selected host contributes. The diagram shows which home hosts received a large number of connections (fan-in) and which external hosts were contacted by the largest number of unique internal hosts (fan-out). Fan-in and fan-out can be readily seen (see Figure 7). VISUAL provides filtering via mouse selection, limiting displayed IP addresses to any address mask, protocol type, and time. Filtering allows users to reduce the complexity of the display. An important interaction technique VISUAL provides is the timeline, a temporal filtering technique that allows the user to review historical events in context.

Figure 7: (a) Fan-in and (b) fan-out traffic patterns in VISUAL.

Placement of Host Markers: We placed external hosts on the screen roughly according to their Internet Protocol version 4 (IPv4) addresses using a simple mapping function that guaranteed no overlapping markers. We gave no special treatment to private (non-routable) IP addresses or to broadcast, multicast, or other special address classes because this was not important to demonstrate the concept. We arranged the home hosts in a contiguous square grid. Unused address cells within the grid are colored black. The figures show a 32 by 32 grid (1024 cells), of which 1020 cells represent active home addresses. This arrangement preserves their unified identity as the home network. We do not require the home hosts to have similar network addresses or even that they belong to the same address space. The important idea is that they are perceived as a group. Communication Visualization: Each line represents an exchange of one or more packets between the connected hosts without indicating bandwidth. We chose this approach to reduce occlusion of external host markers. VISUAL uses the color of the line to represent the direction of traffic. Incoming-only traffic is red, outgoing-only is green, and bidirectional is blue. An example of incoming traffic would be an Internet Control Message Protocol (ICMP) echo request (ping) packet from an external host to the home network. Another example would be TCP sessions that are initiated by an external host but ignored by the internal destination host. Most TCP traffic is bidirectional and thus colored blue.
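A hypothetical rendering of the placement and coloring rules above follows; the address-based mapping only approximates VISUAL’s guarantee of non-overlapping markers, and the cell size is an arbitrary assumption.

```python
def external_position(ip: str, cell: int = 4):
    """Map an IPv4 address to screen coordinates by its leading octets.

    Hypothetical mapping for illustration: the first octet picks the
    horizontal band and the second octet the vertical band, so hosts from
    different /16 networks never overlap; hosts within the same /16 would
    share a cell and need further separation in practice.
    """
    o1, o2, _, _ = (int(part) for part in ip.split("."))
    return o1 * cell, o2 * cell  # (x, y) in pixels

def line_color(incoming: bool, outgoing: bool) -> str:
    """VISUAL's direction encoding: red in, green out, blue both ways."""
    if incoming and outgoing:
        return "blue"
    return "red" if incoming else "green"

print(external_position("198.51.100.23"))        # (792, 204)
print(line_color(incoming=True, outgoing=True))  # blue
```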

NEMO: A Prototype Host-Application Resource Map NEtwork MOnitor (NEMO) is a network traffic analysis and application resource monitoring tool implemented in Microsoft’s C# language. NEMO was built to test the hypothesis that a concrete visualization of processes and their communications is useful in developing an accurate mental model of what processes are responsible for the communications observed on the network. Our users have reported that knowing what application is responsible for a given communication is much more informative than merely knowing what the port number is. Since there are 65,535 TCP and UDP ports, and since programs may violate the port conventions and use any port for any purpose, the port number is at best an ambiguous indication of why the observed communication is happening. From information stored on the local host regarding running processes, open sockets and ports, and communications with the outside world, NEMO generates a concrete visualization that can convey much information at a glance. We found that our users could easily detect potential intrusions, which show up as deviations from normal patterns of activity. NEMO uses a two-dimensional ragged matrix layout, where each running program that has network connections appears as a row, with the program’s connections and open ports appearing as row elements. This provides a glanceable overview, while allowing the user to drill down to details about specific ports and connections. Figure 8 shows the NEMO application’s main screen. Overview: NEMO shows all open ports on the computer, grouped by the program that owns them. All ports owned by any subprocess of the program are shown together next to the program name. The name of the executable that was used to start each program appears as the header of each row, and the ports used by the program are shown as rectangles. The rectangles representing each open port have subcomponents that give further detail about each connection. Glancing at a row in NEMO’s display, a user will be able to tell how many ports a program has open, the type and relative activity of each port, and the communication state of the port. A marker for a port that

is actively receiving or sending packets flickers so that active ports stand out visually.

Figure 8: Screenshot of NEMO running, showing the port status of a typical Windows XP machine.

Details of the connections appear as tooltip windows under the cursor when the user positions the mouse over a port icon. Details include the process ID, the local and remote port numbers (for TCP and UDP traffic), the local and remote IP addresses, the number of packets sent and received by this connection, and the state of the connection (TCP sockets only). This information could be readily linked to a network pixel map. The Application: As ports are opened and closed, they are added and removed from the port list. Similarly, programs are added and removed from the program list as they start up and exit. A program’s ports are represented by colored squares whose color indicates the status of the port. A letter below the square tells the port type (“U” for UDP and “T” for TCP). A small, stacked bar graph within the square shows the relative counts for packets sent and received by the port. A vertical bar separates the well-known ports (below 1024) from the high-numbered ports. Finally, there is a timeline that can be used to view the network status of the computer at any point in time when NEMO was monitoring the system. The timeline can be used to help mentally reconstruct the activity of any program. NEMO is driven by two Windows events: connection table updates and network activity. A timer is used to update the connection table every

500 msec, when any new connections are added to the list along with a timestamp. Connections that stay open receive a status update based on the latest settings, and connections that are no longer open are removed from the display. Anytime the kernel receives or sends a packet, NEMO captures it and parses it to extract the packet type, the local and remote IP addresses and any port information. If the packet matches the specifications of any of the open connections, it is timestamped and saved to that connection’s list.
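The packet-to-connection matching step NEMO performs can be sketched as follows; the four-tuple key and the Python dictionary are illustrative assumptions rather than NEMO’s actual C# data structures.

```python
import time

# Open connections keyed by the local/remote 4-tuple, refreshed from the
# host's connection table roughly every 500 ms (field names are made up).
connections = {
    ("192.0.2.5", 52344, "198.51.100.7", 443): {"packets": []},
}

def on_packet(local_ip, local_port, remote_ip, remote_port, length):
    """Attribute a captured packet to the tracked connection that owns it."""
    key = (local_ip, local_port, remote_ip, remote_port)
    conn = connections.get(key)
    if conn is None:
        return  # packet does not match any open, tracked socket
    conn["packets"].append({"time": time.time(), "bytes": length})

on_packet("192.0.2.5", 52344, "198.51.100.7", 443, 1460)
print(connections[("192.0.2.5", 52344, "198.51.100.7", 443)]["packets"])
```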

Evaluation With prototypes in hand, we conducted a small usability study and then follow-up interviews with a select subset of the subjects from the original system administrator study and some of their peers as well. The purpose of these follow-up interviews was to review the overall design approach and to evaluate the prototypes. The follow-up interviews were very loosely structured. We presented our overall design and asked subjects for their comments. We then showed them each prototype and asked for further comments. Then we used the presentations as a discussion springboard to reveal more information about the users’ needs. User Evaluation of Network Eye The most frequently cited problem the users had with Network Eye’s concept was the idea of trying to select a single host from an array of about a hundred thousand tiny markers. First, it must be understood that the fact that Network Eye is scalable to 100K hosts does not mean it must display all of them, nor does it mean that a host marker may be no larger than a single pixel. At the upper end of its scale, we expect some difficulties with the pixel-oriented approach. Discussions with the expert group helped us to design a new interaction technique for the network communication diagram. The user could select hosts for filtering via a two-dimensional network pixel map. Either focus+context (fisheye distortion) or overview+detail (magnifying glass) could be used to make selections as small as a single host easier. Then only communications involving selected hosts would be shown. Another request was to be able to click on the communication lines for further information on a

flow. Although we had not considered this interaction before, we believe it could greatly improve the utility and ease of use of Network Eye. More research needs to be done in this area with further vertical prototypes to test the various interaction techniques. The second problem users had with Network Eye was that it would have to collect and manipulate data at a tremendous rate to enable real-time monitoring. This is a potential shortcoming of the design, and a working prototype will be required to determine how heavy the load would be. We may need substantial storage and high-speed graphics to display real-time network situations in practice. We are conducting further research to determine our approach’s limitations. User Evaluation of VISUAL We conducted a usability study with the VISUAL prototype to determine whether users are able to get insight into their network events by viewing graphic traffic patterns. Our main objective was to verify whether users could tell when abnormal traffic occurred. Determination of abnormality can only be based on preexisting knowledge of the network itself. What is normal in one network might be abnormal in another. Most network administrators already have a mental model of what the communication patterns should look like. VISUAL can help them see if what they expected matches what is actually happening. For example, if a computer in the home network is used only internally, and VISUAL shows external communications to that computer, then that activity would be considered abnormal. Since test subjects for our usability study could not have a preexisting idea of what is normal in the test network, we had them compare two or three traffic data sets, one of which they were told was normal traffic. The others may or may not have had abnormalities present. Part of their task was to identify traffic patterns that were markedly different between the datasets. All data sets came from the 1999 DARPA Intrusion Detection System Evaluation [Haines, 2001]. We conducted our usability study with nine university students (not system administrators) who characterized themselves on a range from power users to occasional users of computers.

During each session, we explained to each user how VISUAL works, then allowed each to become comfortable with VISUAL for about five minutes using a training data set that did not have any abnormalities associated with it. Once they were familiar with VISUAL, we presented them with two testing datasets. Each user was able to quickly note abnormalities in the data. Even though each described the abnormalities in his/her own words, every user noted the same set of abnormalities. For example, every user was able to quickly focus on a prominent ping sweep as abnormal. Although not all of the users knew what a ping sweep was, they could clearly see how it differed from the rest of the data presented. We believe that our usability test successfully demonstrated that a graphical visualization of network traffic can provide useful insights without requiring any training. Completely inexperienced users were able to quickly identify important communication patterns. The participants also provided many useful comments that we will use to improve VISUAL and Network Eye in the future. A problem with VISUAL is occlusion. Host markers may be completely hidden by communication lines. We have partially solved this problem by using translucent lines. Network Eye seeks to mitigate occlusion by implementing a three-dimensional network communication display that can be manipulated by users to view from different angles. Improved traffic filtering and aggregation mechanisms will also ameliorate occlusion. After the usability study, we conducted follow-on interviews with subject-matter experts in which we demonstrated VISUAL to get their reactions. The feedback we received was very positive. All interviewees said they could easily identify communication patterns. They were happy with how the tool allowed them to quickly compare the relative activity levels of hosts on the network, and appreciated how the graphical approach complemented their existing suite of tools. The VISUAL prototype can present approximately ten thousand external and a thousand internal hosts in its current form. Some subjects commented that this number was too large for their needs, while others said it was too small. Clearly, scalability of the displays is a key concern. The data set we demonstrated was only about 180,000 packets long, and a few of the subjects noted that this was only a few minutes’ worth of data on their networks. This tells us that high-performance storage and graphics will be needed to present some data sets seen in practice. The subject matter experts suggested that three-dimensional plotting be used because of the added analysis power that a user could gain. We believe three-dimensional tools excel in analytical settings, while two-dimensional displays are better for presenting ambient information at a glance. This would suggest that a two-dimensional display would be better for the earlier detection phases, where awareness rather than in-depth analysis is needed. All of the expert reviewers suggested that line color be used to encode information about the flow. The most popular suggestion was to color by protocol, but since protocol is categorical data, and there are hundreds of popular protocols, no pre-established coloring scheme would produce intuitive results. One excellent suggestion was to color communication lines by their age (e.g., old lines might be colored toward the red end of the spectrum while bluer colors are used to indicate recent communications). In VISUAL, we assumed that our users only cared about traffic going between the home hosts and external hosts. We provided only very limited ability to visualize strictly internal traffic. Our users were quick to point out that although this is true in the normal case, once a worm hits, they turn their attention inward to locate and eradicate it on their own network as quickly as possible. Network Eye will be better at concentrating on internal-only traffic because it is straightforward to simply zoom in on the center of each pixel map that represents the home network. User Evaluation of NEMO Prototype As with VISUAL, NEMO is a prototype of the kinds of functionality requested by the system administrators we interviewed. After its development was complete, we showed NEMO to several of the interviewees and several other system administrators who had not been involved in the earlier interviews. We conducted an informal cognitive walkthrough with five of these expert

users. We briefed them on Network Eye’s concept of end-to-end visualization and the host-application resource map. We then demonstrated NEMO as a prototype of this functionality. We asked how the visualization would help them in their work and how it might be improved. We also asked what other uses NEMO might serve and what functionality needed to be added. The most common scenarios in which NEMO would be helpful were when the administrator notices either (1) unusual traffic emanating from a particular host or (2) a slowdown or other unusual behavior, and decides to investigate the host. One expert compared NEMO to software that finds and removes spyware (programs that surreptitiously contact sites on the Internet and send local information to them). NEMO was much easier to use for detecting running spyware but did not provide a means for removal once it was detected. Our interviewees decided that we should add open files to the list of resources shown to make it easier to eradicate spyware, but that actually providing functions to remove the files was beyond the scope and intent of the program. NEMO currently does nothing with information in the Windows Registry that can be used to hide spyware, but since our intention was to make NEMO (and also the host-application resource map) as portable as possible, we decided not to add such features to the Network Eye architecture at this time. The users liked the fact that the ports were shown alongside the applications that used them. Discovering this information using command-line tools such as the Unix lsof program is possible, but much extraneous information is also presented, and lsof’s command-line options are quite complicated and difficult to remember. All the experts agreed that NEMO was easy enough to use that a non-expert user could use it to monitor his own machine. They agreed that NEMO told them most of what they needed to know at a glance and without any expertise in finding the information. One expert told us that he wanted to correlate information from NEMO’s timeline to anomalies found in the logs. NEMO does not provide drilldown to this level, but such a feature would be a useful addition and is well within the scope of host- and application-level post hoc analysis.

Conclusion This paper illustrates the need for visualization software in computer security circles, describes tools designed to meet this need, provides an example of a workable requirements capture process, and recounts the lessons we’ve learned from our interaction with the user community. In particular, we have benefited from the participative approach to design (users as designers). While it is tempting to construct great visualizations without any user participation in the design, we believe the result will be less usable and less used. We believe our contributions to the state of visualization for computer security are: 1. Classification of the types of system administration and the activities common to administration jobs. 2. Usable prototype visualization tools for host and network awareness. 3. Incorporation of principles and methods of information visualization into computer security software. 4. Close involvement with expert users that documents the need for visualization in computer security. 5. Practical use of a participative design approach in a real project with use of rapid prototyping to refine requirements early and accurately. Interaction with the expert reviewers made it clear that scale is very important when designing network awareness applications. There may not be a single visualization that can scale appropriately across the needs of different kinds of system administrators and administration activities. We hope this paper will encourage broader discussion and critique of visualization techniques to the benefit of those who secure systems and networks. Our intent is not to replace all the great tools out there with Network Eye. Instead, we hope to complement the existing range of tools by providing what is needed but currently absent in the state of the art.

References

AppRadar. Application Security, Inc., 2004. http://www.appsecinc.com/products/appradar/

Ball, R.G., Fink, G.A., Rathi, A., Shah, S., and North, C., “Home-Centric Visualization of Network Traffic for Security Administration,” to be published, 2004.

Cao, J.; Cleveland, W.S.; and Sun, D.X., S-Net: A Software System for Analyzing Packet Header Databases. In Passive and Active Measurement Workshop Proceedings (Fort Collins, Colorado, USA, 2002), 34-44.

Erbacher, R.F., Intrusion behavior detection through visualization. In IEEE International Conference on Systems, Man and Cybernetics (2003), 2507-2513.

Estan, C., Savage, S., and Varghese, G., Automatically inferring patterns of resource consumption in network traffic. In Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (Karlsruhe, Germany, 2003), ACM Press, New York, NY, USA, 137-148.

Girardin, L., An eye on network intruder-administrator shootouts. In Proceedings of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring, 1999. USENIX Association.

Haines, J.W., Lippmann, R.P., Fried, D.J., Tran, E., Boswell, S., and Zissman, M.A., 1999 DARPA Intrusion Detection System Evaluation: Design and Procedures, MIT Lincoln Laboratory, 2001.

Internet Security Systems, Inc., Simplified Deployment, Streamlined Management and Maximized Protection: Proventia™ Dynamic Threat Protection™ Appliances; Protection Without Complexity, Internet Security Systems, Inc. (www.iss.com), 2003.

Keim, D.A., Designing pixel-oriented visualization techniques: theory and applications. IEEE Transactions on Visualization and Computer Graphics, 6 (1), 59-78, 2000.

Lau, S., The Spinning Cube of Potential Doom. Communications of the ACM, 47 (6), 25-26, 2004.

MacGuire, S. and Croteau, R.A., Big Brother, BB4 Technologies Inc. (part of Quest Software Inc.), 2003. Web-based systems and network monitor.

Oetiker, T. and Rand, D., MRTG–The Multi Router Traffic Grapher, Swiss Federal Institute of Technology, Zurich, Switzerland, 1998. http://people.ee.ethz.ch/~oetiker/webtools/mrtg/paper/

Schultz, E.E.; Mellander, J.; and Peterson, D.R., The MS-SQL Slammer Worm. Network Security, Vol. 2003, Issue 3, March 2003, 10-14.

tcpdump. TCPDUMP public repository, 2003.

Teoh, S.T.; Jankun-Kelly, T.J.; Ma, K.-L.; and Wu, S.F., “Visual Data Analysis for Detecting Flaws and Intruders in Computer Network Systems,” submitted to InfoVis 2003.

Toledo, J., EtherApe, July 2001. http://etherape.sourceforge.net/, last accessed June 2004.

Venter, H.S. and Eloff, J.H.P., A taxonomy for information security technologies. Computers & Security, 22 (4), 299-307, 2003.

Wickens, C.D.; Sandry, D.L.; and Vidulich, M., Compatibility and Resource Competition between Modalities of Input, Central Processing, and Output. Human Factors, 25 (2), 227-248, 1983.

Zou, C.C., Gong, W., and Towsley, D., Code Red worm propagation modeling and analysis. In Proceedings of the 9th ACM Conference on Computer and Communications Security (Washington, DC, USA, 2002), ACM Press, New York, NY, USA, 138-147.
