A Distributed Architecture for an Adaptive Computer Virus Immune System

Robert E. Marmelstein, David A. Van Veldhuizen and Gary B. Lamont
Department of Electrical and Computer Engineering
Graduate School of Engineering
Air Force Institute of Technology
Wright-Patterson AFB, OH 45433-7765
{rmarmels, dvanveld, lamont}@afit.af.mil

ABSTRACT

Computer viruses are widely recognized as a significant computer threat. The "birth rate" of new viruses is high and increasing due to global connectivity, and technology improvements can accelerate their spread. In response to this threat, some contemporary research efforts are aimed at creating computer virus immune systems (CVIS). A CVIS uses the human immune system as a model for identifying, attacking, and eradicating viruses from computers and networks. This paper analyzes the requirements of such a computer virus immune system and evaluates current approaches with respect to these requirements. Based on this analysis, we propose a distributed architecture for implementing a CVIS. In particular, we discuss how emerging technologies such as evolutionary algorithms (EAs) and intelligent agents (IAs) can be employed to give the CVIS a self-adaption capability for new viral threats.

1 INTRODUCTION

Computer viruses are widely acknowledged as a significant computer threat. It is difficult to quantify this threat, but indications are that it is becoming more and more widespread. Two qualitative trends are recognized and accepted: the "birth rate" of new viruses is high and increasing, and accelerating computer interconnectivity and interoperability enhances the capabilities of viruses to spread [6]. These factors make it increasingly difficult for existing anti-virus products to keep pace with threat proliferation. In response to this situation, several research efforts are using biological immune systems (BISs) as a model for developing techniques which identify, attack, and eradicate viruses from computer systems. Of particular interest to researchers is computational emulation of BIS self-adaption mechanisms to combat previously unseen viruses.

This paper examines how modeling the human BIS as a distributed information processing system can be used to protect against computer viruses. We begin in Section 2 by summarizing key facets of the human immune system and associate this model with CVIS requirements. Contemporary research in this field is examined and evaluated in Section 3. This analysis culminates in Section 4 where we present a distributed CVIS architecture. Finally, we present our conclusions and planned future work in Section 5.

2 IMMUNE SYSTEM MODELING AND REQUIREMENTS

In this section we explore the suitability of the human BIS as a model for an adaptive CVIS. This is accomplished by identifying the principal components of the BIS and using them to define CVIS requirements.

Biological Immune System Overview

The human BIS is a vast network of molecules and cells whose primary function is to protect against foreign microorganisms (or antigens) such as viruses, bacteria, and parasites. The basic process by which the BIS accomplishes this task is extensively described (with detailed pedagogical diagrams) by a number of authors including Stryer [11], and Marrack and Kappler [10]. Rather than reiterate these discussions, we treat the BIS as the following high level set of interacting components:

Detector: Detectors discriminate between entities that are part of the host (self) and those that are foreign (non-self).

Classifier: Once an antigen is detected, classifiers determine its unique type. Correct classification of the antigen is essential to formulating an effective immune response.

Cleansing Agent: Cleansing agents eliminate specific antigens from the host. In a BIS, white blood cells (or lymphocytes) produce antibodies that attack and eliminate antigens.

Memory: Memory stores successful threat responses for future reference. When faced with another instance of a given antigen, BIS memory is consulted to produce the most effective response.

Adaptation Process: The adaptation process modifies other BIS components to optimize overall BIS system performance. While there is no centralized physical mechanism responsible for adaptation, its effect on BIS components can be observed in a myriad of distributed processes. For example, lymphocytes evolve via reactions to surrounding cells. This process, known as negative selection [5], eliminates lymphocytes that attack host cells.

Nature of the Threat

Like its biological counterpart, a computer virus can spread from host to host through a variety of methods. With regard to these methods, deterministic viruses can be classified into three major categories: file infectors, boot-sector viruses, and macro viruses. These virus types are usually stealth or semi-stealth in nature. File infectors are normally installed in memory during system initialization or when operating system commands are executed. Once resident they can attach themselves to legitimate programs and files. Boot-sector viruses infect their namesake sector on a given disk. When an infected disk is booted, the virus is loaded into memory and spreads by infecting other disks inserted into the machine. Macro viruses inhabit executable "macro" files of widely used application programs (such as Microsoft Excel). Running these macros can cause considerable system damage.

Innovations such as application agents can also undermine computer security. While they make it possible to automate more computer functions, they may also contribute to the spread of viruses while acting on the user's behalf. Thus, an e-mail application agent may automatically unleash a virus by opening infected attachments [3, 9].

The Internet, while being a valuable information resource, is simultaneously a proven virus source. Internet browsers make it all too easy to download infected files; combining browsers and agents further compounds this problem.

CVIS Requirements

Many anti-virus utilities operate by scanning files for bit patterns (called signatures) known to belong to a specific virus; these utilities are called signature scanners. In addition, deductive techniques (such as heuristic scanning) use "rules of thumb" to identify programs exhibiting virus-like behavior. While reliable, these methods use static knowledge bases that may not describe newer viruses. The result is a continual update cycle in order to cope with newer viral threats.
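To make the notion of a signature scanner concrete, the following minimal Python sketch searches a byte string for known patterns. The signature names and byte patterns are invented for illustration and do not correspond to any real virus; a production scanner would also use a far more efficient multi-pattern search.

```python
from typing import Dict, List

# Hypothetical signature database: virus name -> byte pattern.
# Both the names and the patterns are invented for illustration only.
SIGNATURES: Dict[str, bytes] = {
    "example-virus-a": bytes.fromhex("deadbeef"),
    "example-virus-b": b"\x90\x90\xeb\xfe",
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes]) -> List[str]:
    """Return the names of all signatures that occur anywhere in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def scan_file(path: str, signatures: Dict[str, bytes]) -> List[str]:
    """Read a file in binary mode and scan its full contents."""
    with open(path, "rb") as handle:
        return scan_bytes(handle.read(), signatures)


if __name__ == "__main__":
    clean = b"ordinary program bytes"
    infected = b"header" + bytes.fromhex("deadbeef") + b"trailer"
    print(scan_bytes(clean, SIGNATURES))     # no known signature present
    print(scan_bytes(infected, SIGNATURES))  # contains the first pattern
```

The sketch also shows the limitation noted above: a virus whose pattern is absent from the static table is never reported, regardless of how the file behaves.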

This situation implies a more robust computer security system is needed; we suggest a silicon-based immune system which automatically and inductively adapts to new viral threats. Such a system must possess analogous components to those of its biological counterpart in order to perform several functions.

Detect Virus: Detect viral presence (non-self) in the host. Detection must occur regardless of whether or not the virus was previously encountered; viruses are identified through a scanning technique (e.g., signature, correlation) or a system trace approach.

Classify Virus: Isolate the virus and uniquely identify it based on its characteristics.

Extract Signature: If the virus cannot be accurately classified, extract its signature from an infected file.

Purge Virus: Use the virus' signature to locate infected host resources (programs, data, or memory). Purge the virus from all infected resources.

Identify and Repair Damage: Identify damaged system resources (perhaps as part of the purge process) and restore to their uninfected state (if possible) or replace from some integrity database.

Augment Virus Database: If the virus was not previously encountered, store all relevant information needed to recognize it in the future.

CVIS Implementation Challenges

Given BIS complexity, significant technical obstacles stand in the way of CVIS implementation. Perhaps the most challenging of these is the task of replicating a BIS's inherent parallelism. Consider that in a BIS, lymphocytes are distributed throughout the body's circulatory system. Since each lymphocyte acts as an independent agent, the search for antigens (non-self) within the host is highly parallel. In contrast, a number of factors limit the extent to which a search for viruses can be parallelized within most computer architectures. Some of these factors include: the number of available system processors, competition between tasks for processor time, and bottlenecks in accessing shared resources (such as disk drives and memory).

Implementing an artificial adaption mechanism is another major challenge in BIS emulation. Because a CVIS is a discrete system, it lacks the evolutionary adaption mechanism of the BIS. Several factors complicate such a mechanism's design. For example, what aspect of the CVIS should evolve? Candidate components include virus detection, virus purge, and damage repair. Whichever is chosen, it is critical to ensure that the cure is not worse than the disease; in other words, the process must be "fail-safe". In addition, the computational overhead of testing candidate solutions for the large number of existing viruses clearly makes it impractical to implement on a single system.

3 CURRENT COMPUTER VIRUS IMMUNE SYSTEM RESEARCH

As a source of virus fighting techniques for computers, the BIS model has recently generated considerable interest. In his survey of artificial immune system research, Dasgupta [2] cited Forrest et al.'s [4, 5] and Kephart et al.'s [7, 8] CVIS research. This section evaluates each approach's suitability for specific CVIS aspects.

Self - Nonself Determination

Forrest's work concerns methods for distinguishing legitimate computer resources from those corrupted by a computer virus. This is primarily accomplished via detectors which monitor important data. The detectors are randomly generated binary strings of fixed size. Each detector string is compared with equally sized segments of the protected data; the result of the comparison determines whether the data has been improperly modified. Thus, detectors act as self/non-self discriminators for the protected data.

While elegant in its simplicity, Forrest's algorithm has significant drawbacks. Perhaps most serious is the overhead of generating sufficient detectors to monitor a given data size ($N_s$). Even though a large data set can be monitored using a small number of detectors ($N_r$), only a small percentage of an initial randomly generated detector pool ($N_{r_0}$) proves to be useful. This percentage is determined by the probability that two strings of size $l$ match at $r$ contiguous locations ($P_M$), and the desired reliability of the detector ($P_f$). Thus, given Equation 1, with $P_M$ and $P_f$ fixed, $N_{r_0}$ must increase exponentially (as a function of $N_s$) in order to yield $N_r$ valid detectors.

$$N_{r_0} = \frac{-\ln P_f}{P_M \, (1 - P_M)^{N_s}} \tag{1}$$
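The growth described by Equation 1 can be checked numerically. The sketch below is illustrative only: the function names are ours, and the expression for $P_M$ is the approximation given by Forrest et al. [4] for an alphabet of $m$ symbols, which holds when $m^{-r}$ is much smaller than 1.

```python
import math


def match_probability(l: int, r: int, m: int = 2) -> float:
    """Approximate P_M for r-contiguous matching of two random strings.

    Uses P_M ~ m**(-r) * ((l - r) * (m - 1) / m + 1), the approximation
    from Forrest et al. [4]; m is the alphabet size (2 for bit strings).
    """
    return m ** (-r) * ((l - r) * (m - 1) / m + 1)


def initial_pool_size(p_m: float, p_f: float, n_s: int) -> float:
    """Equation 1: size of the random pool needed before censoring."""
    return -math.log(p_f) / (p_m * (1.0 - p_m) ** n_s)


if __name__ == "__main__":
    p_m = match_probability(l=32, r=8)
    # Pool size for a fixed P_f as the number of protected strings grows.
    for n_s in (50, 100, 200, 400):
        print(n_s, round(initial_pool_size(p_m, p_f=0.1, n_s=n_s)))
```

Because $N_s$ appears as an exponent in the denominator, each doubling of the protected data multiplies the required pool by a factor that itself grows with $N_s$, which is the overhead discussed in the text.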

This algorithm's computational overhead increases as it becomes necessary to protect frequently changing data files. To stay current, the system must generate new detectors whenever changes to protected resources occur. However, one way to make this approach more viable is generating the self/non-self detectors from an infected portion of the file. We can then use the resulting detectors to check for the presence of a specific virus. Thus, we shift the purpose of these detectors from protecting a specific piece of data to virus classification and file integrity checking. Since a given virus is unlikely to change, this variation has much lower overhead than Forrest's method.
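As an illustration of the detector scheme discussed in this subsection, the following Python sketch implements r-contiguous matching, random detector generation with censoring against self, and a monitoring check. It follows the general negative selection idea of Forrest et al. [4]; the function names, the fixed segmentation of the data, and the `max_tries` cutoff are our own assumptions, and the sketch does not implement the variation proposed above.

```python
import random
from typing import List


def matches(a: str, b: str, r: int) -> bool:
    """True if equal-length strings agree at r or more contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def split_segments(data: str, l: int) -> List[str]:
    """Split a bit string into consecutive length-l segments (remainder dropped)."""
    return [data[i:i + l] for i in range(0, len(data) - l + 1, l)]


def generate_detectors(self_segments: List[str], l: int, r: int, count: int,
                       rng: random.Random, max_tries: int = 100_000) -> List[str]:
    """Draw random candidates and keep only those matching no self segment."""
    detectors: List[str] = []
    for _ in range(max_tries):
        if len(detectors) == count:
            break
        candidate = "".join(rng.choice("01") for _ in range(l))
        if not any(matches(candidate, s, r) for s in self_segments):
            detectors.append(candidate)
    return detectors


def is_modified(data: str, detectors: List[str], l: int, r: int) -> bool:
    """Report a change if any detector matches any segment of the data."""
    return any(matches(d, s, r) for s in split_segments(data, l) for d in detectors)


if __name__ == "__main__":
    rng = random.Random(0)
    protected = "0110100110010110" * 4
    detectors = generate_detectors(split_segments(protected, 16), 16, 8, 20, rng)
    print(len(detectors), is_modified(protected, detectors, 16, 8))
```

By construction no retained detector matches the unmodified data, so a match can only arise from a change. Detection of any particular change is probabilistic and improves with the number of detectors, which is where the cost in Equation 1 originates.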

Virus Decoy Programs

Kephart's approach uses decoys to detect computer viruses via a "baiting" scheme. Decoys are programs existing specifically to become infected by a virus. This approach enjoys several advantages over Forrest's. Since there is no reason for a decoy program to change, the risk of a false positive detection (inherent in Forrest's algorithm) is practically zero. In addition, because decoy file structure is known, virus code can be automatically isolated and its signature extracted. Lastly, the decoy method avoids the overhead of generating new detectors every time protected data changes.

Despite its advantages this approach is not without risk. There is no guarantee a virus will attack a decoy in a given time period (if at all). Of course, the longer it takes a decoy to become infected, the more damage a virus can cause to other system programs. It is therefore preferable for the decoy to possess attributes maximizing its probability of infection. Because these characteristics are virus dependent, decoys must be generic (attracting many viruses) or specialized (for a specific virus). Another drawback of this method is its usefulness only for detection and not for classification. As a result, it must be used in conjunction with another method to identify a detected virus.
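The decoy idea can be sketched in a few lines of Python. This is not Kephart's implementation; it is a minimal illustration that assumes the pristine decoy content is retained, that any change to a decoy indicates infection, and that the foreign code occupies one contiguous region. The `Decoy` class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decoy:
    name: str
    pristine: bytes  # exact content written when the decoy was created

    @property
    def digest(self) -> str:
        return hashlib.sha256(self.pristine).hexdigest()


def is_infected(decoy: Decoy, current: bytes) -> bool:
    """A decoy never changes legitimately, so any difference is suspicious."""
    return hashlib.sha256(current).hexdigest() != decoy.digest


def isolate_foreign_bytes(decoy: Decoy, current: bytes) -> Optional[bytes]:
    """Return the region of current that differs from the pristine content.

    Strips the longest common prefix and suffix; returns None if unchanged.
    """
    if not is_infected(decoy, current):
        return None
    original = decoy.pristine
    limit = min(len(original), len(current))
    start = 0
    while start < limit and original[start] == current[start]:
        start += 1
    end = 0
    while end < limit - start and original[-1 - end] == current[-1 - end]:
        end += 1
    return current[start:len(current) - end]


if __name__ == "__main__":
    bait = Decoy("bait.com", b"AAAABBBB")
    print(isolate_foreign_bytes(bait, b"AAAABBBB"))       # unchanged decoy
    print(isolate_foreign_bytes(bait, b"AAAAVIRUSBBBB"))  # inserted region
```

Because the original content is known exactly, the isolated region is a natural starting point for signature extraction, which is the advantage over data-protecting detectors noted above.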

4 TOWARDS AN ADAPTIVE CVIS

As discussed earlier, a CVIS must be adaptive in order to effectively protect against an ever changing virus population. However, as indicated in Section 2, significant technical obstacles stand in the way of achieving such a capability. To overcome these problems we propose a multi-level architecture designed to manage the enormous computational burdens associated with a CVIS implementation, yet not be intrusive or obstructive to the user. This is accomplished primarily through coordination of autonomous Intelligent Agents (IAs) at three levels: local, network, and global. Our architecture's pictorial representation is shown in Figure 1, and a comprehensive overview of this interlocking scheme is presented in the following sections.

Local Level

An IA at the local level controls all CVIS-related activities for an individual computer system (or node). Because we desire to minimize CVIS processing overhead on each node, only those functions necessary for local virus protection are performed at this level.

Figure 1: Diagram of Distributed CVIS Architecture

Virus Detection: Virus detection is accomplished using Kephart's decoy approach (refer to Section 3). A local IA manages decoys resident on a given node. In particular, it determines the number and type of decoys needed for the node's protection, schedules them for execution, and periodically checks each decoy for infection. If an infected decoy is found the contaminated file is sent to the network level for virus classification. This approach provides multiple and redundant virus capture mechanisms.

Virus Elimination: After the network IA has classified the virus, it sends the local IA the set of self/non-self detectors needed to identify the detected virus. As discussed in Section 3, this enables the remaining files on the system to be tested for infection.

System Repair: If possible, the local IA attempts to repair infected files. This process is initiated if a repair routine exists (at the global level) for a given virus. If none is available the file is deleted and replaced with its most recent (uninfected) backup.

Vulnerability Analysis: Scanning the system configuration attempts to uncover deficiencies in the current level of virus protection. Since the local IA keeps track of known network threats it uses the audit's results in requesting resources to improve the node's threat readiness.

The local IA's functionality depends heavily on its interaction with the network and global levels. While the local IA performs functions with relatively low processing overhead (e.g., executing and checking decoys), more complex functions (e.g., virus classification, signature extraction, and decoy evolution) are performed by dedicated platforms at higher CVIS levels. In terms of information exchange, the local IA routes all requests for virus fighting resources (e.g., decoys, self/non-self detectors, file repair routines, etc.) through its associated network IA. In turn, the local IA is sent the resources corresponding to its request. Alerts are also received at the local level regarding viruses discovered in the network. These alerts trigger vulnerability audits and are the basis for resource requests. The guiding principle is that the local IA is responsible for maintaining the resource configuration needed to successfully counter viruses on its particular node.

Network Level

Our architecture's network level contains nodes characterized by a high degree of interaction. It is expected that nodes at this level exhibit one or more of the following characteristics: they share a large number of resources, they are connected by a local or wide area network, and they support a common organization. Under these conditions an infection appearing in one node can quickly spread throughout the network. As a result, a key network IA purpose is to ensure its local IAs are informed of any virus found in the network. In addition, the network level acts as a conduit for resource transfers between the local and global CVIS levels. The following activities are performed by the network IA:

Virus Classification: When network IAs receive infected decoy files (from their local IAs), they attempt to classify the embedded virus. This is done by checking the infected decoy portion against the library of self/non-self detectors for each virus on file. If a match is found, the appropriate detector set is sent to the local IAs for virus elimination. Infected files that cannot be identified are forwarded to the global level.

Threat Alerts: A network IA also distributes alerts about viruses detected in the network. An alert is issued when a local IA detects a virus and the infected file is forwarded to the network IA. If the network IA can classify the virus it sends alert messages to all local IAs within its span of control.

Metrics Reporting: The network IA reports its success (or lack thereof) in processing detected infections. For example, a decoy successfully attracting a virus is reported to the global CVIS level. These reports are used as metrics (at the global level) to gauge the overall fitness of a given resource within the CVIS.
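The classification and alert flow at the network level might be organized as in the Python sketch below. It is a simplified illustration, not a specification of the architecture: the class name, the message tuples, and the in-memory `outbox` and `forwarded` lists are hypothetical, and detector matching is reduced to a byte-pattern search rather than the self/non-self detector comparison of Section 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NetworkIA:
    # Library of detectors per known virus (simplified to byte patterns).
    detector_library: Dict[str, List[bytes]]
    # Identifiers of the local IAs within this agent's span of control.
    local_ids: List[str]
    # Messages to local IAs, recorded as (recipient, kind, virus name).
    outbox: List[Tuple[str, str, str]] = field(default_factory=list)
    # Unclassified samples queued for the global level.
    forwarded: List[bytes] = field(default_factory=list)

    def classify(self, sample: bytes) -> Optional[str]:
        """Return the first known virus whose detectors match the sample."""
        for virus, detectors in self.detector_library.items():
            if any(d in sample for d in detectors):
                return virus
        return None

    def handle_infected_decoy(self, sample: bytes) -> Optional[str]:
        """Classify a sample, then distribute detectors and alerts or forward it."""
        virus = self.classify(sample)
        if virus is None:
            self.forwarded.append(sample)  # global level must analyze it
            return None
        for local in self.local_ids:
            self.outbox.append((local, "detectors", virus))
            self.outbox.append((local, "alert", virus))
        return virus


if __name__ == "__main__":
    ia = NetworkIA({"known-virus": [b"\x01\x02"]}, ["node-a", "node-b"])
    print(ia.handle_infected_decoy(b"xx\x01\x02yy"))
    print(ia.handle_infected_decoy(b"unrecognized"))
    print(len(ia.outbox), len(ia.forwarded))
```

Keeping the library lookup at this level means a local node only ever receives the detector set for a virus actually present in its network, which is consistent with the goal of minimizing per-node overhead.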

Global Level

The apex of our architecture hierarchy is the global level. Activity at this level is focused on generating and adapting virus fighting resources. Once developed, these resources can be downloaded by the network IAs. In this sense, the global level acts as a warehouse for resources found to be useful at the lower levels of the architecture.

Application of Evolutionary Algorithms

Evolutionary algorithms (EAs) are utilized at the global level as a means of resource adaption. As an example, we explain how decoys are evolved within our architecture. Recall that decoys are used as a means of detecting the presence of viruses on a host node. We propose a genetic algorithm (GA) to improve the performance of two decoy types: virus specific and generic. This GA is defined using the general EA model provided by Bäck [1, page 122] as follows:

$$GA = (I, \Phi, \Omega, \Psi, s, \iota, \mu, \lambda) \tag{2}$$

where $I$ corresponds to individuals, $\Phi$ the fitness function, $\Omega$ the GA probabilistic operators, $\Psi$ the generation transition function, $s$ the selection operator, $\iota$ the termination criteria, $\mu$ the number of parents, and $\lambda$ the number of offspring.

Our approach is to use all networks linked to the CVIS as a laboratory for evaluating the decoy population. In our GA instantiation, the chromosome ($I$) is composed of attributes affecting the decoy's behavior; these include Processor Duty Cycle (PDC), file size and name, directory location, and priority. Once a population of individuals is generated, they are randomly downloaded to applicable local IAs. At that point we begin to collect statistics on each individual's fitness.

Fitness is computed based on decoy type. In the virus specific case, fitness is dependent upon how quickly the decoy attracts a virus. We define this metric by counting the number of files found to be infected by the virus relative to the PDC amount used by the decoy (Equation 3). Thus a decoy that contains the infection while using a low PDC maximizes its fitness. For the generic decoy case (Equation 4), given $N$ viruses and $M(i)$ nodes infected by the $i$th virus, we compute fitness by averaging $\Phi_1$ for each node affected. Using this approach we maximize $\Phi_2$ for each decoy based on the number of viruses it detects.

$$\Phi_1 = \frac{1 - PDC}{\text{Number of Infected Files} + 1} \tag{3}$$

$$\Phi_2 = \frac{1}{N} \sum_{i=1}^{N} \begin{cases} 0, & \text{if } M(i) = 0 \\ \dfrac{1}{M(i)} \displaystyle\sum_{j=1}^{M(i)} \Phi_1, & \text{if } M(i) > 0 \end{cases} \tag{4}$$

The above process need not be computationally intensive. Consider that weeks (or even months) may elapse before a critical mass of metrics is collected justifying a new generation of decoys. This enables a single, dedicated platform at the global level to manage a large virus population. Virus priorities determine the proportion of processing resources devoted to the GA (with newer viruses receiving highest priority). Since a virus may never become extinct, the GA's priority can be decreased in proportion to the time the CVIS has been free of its associated virus. This approach is substituted for a fixed termination criterion. Other GA parameters are to be defined upon implementation.
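The two fitness measures can be expressed directly in code. The Python sketch below assumes the reading of Equations 3 and 4 in which the virus specific fitness is (1 - PDC) divided by the number of infected files plus one, with PDC expressed as a fraction between 0 and 1, and the generic fitness averages the virus specific values over affected nodes and then over viruses. The function names and the nested-list input format are illustrative choices, not part of the proposed design.

```python
from typing import Sequence


def specific_fitness(pdc: float, infected_files: int) -> float:
    """Equation 3 (as read here): (1 - PDC) / (infected files + 1)."""
    if not 0.0 <= pdc <= 1.0:
        raise ValueError("PDC is assumed to be a fraction in [0, 1]")
    if infected_files < 0:
        raise ValueError("infected_files must be non-negative")
    return (1.0 - pdc) / (infected_files + 1)


def generic_fitness(per_virus_scores: Sequence[Sequence[float]]) -> float:
    """Equation 4 (as read here).

    per_virus_scores[i] holds the virus specific fitness observed on each of
    the M(i) nodes infected by virus i; an empty entry means M(i) = 0.
    """
    n = len(per_virus_scores)
    if n == 0:
        return 0.0
    total = 0.0
    for scores in per_virus_scores:
        if scores:  # M(i) > 0: average over the affected nodes
            total += sum(scores) / len(scores)
        # M(i) = 0 contributes nothing to the sum
    return total / n


if __name__ == "__main__":
    # A light decoy that contained the infection scores higher than a heavy one.
    print(specific_fitness(pdc=0.1, infected_files=0))
    print(specific_fitness(pdc=0.5, infected_files=3))
    # Two viruses: one seen on two nodes, one never detected by this decoy.
    print(generic_fitness([[0.25, 0.75], []]))
```

Under this reading, a generic decoy is penalized for every virus it never encounters, so its score rises with the number of distinct viruses it detects.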

Generation of Virus Detectors

Our CVIS design adapts to new viruses by learning how to classify them through induction. Recall that infected files which cannot be classified are sent to the global level. The global IA isolates infected portions of the decoy and attempts to match them to a known virus; if this fails the infection is cataloged as a new virus. Accordingly, self/non-self detectors for the new virus are generated using the modified version of Forrest's algorithm discussed in Section 3. These detectors are then forwarded to the network IAs, facilitating classification of future instances of the same virus. Of course, this approach is predicated upon static virus signatures.

Modeling Morphisms

In Section 2 we proposed the human BIS as a model for an adaptive CVIS. Our architecture successfully implements key model elements. At a high level of observation, the BIS has five interacting components: detectors, classifiers, cleansing agents, memory, and an adaptation process. CVIS detectors are produced by the global IA, and used by network and local IAs. Classification is performed at both the network and global levels. Local system repair emulates cleansing agent actions. Memory is distributed across the CVIS' levels, and finally, our CVIS implementation adapts to new viral threats. The proposed CVIS architecture in toto thus executes the key BIS components in a silicon environment.

5 CONCLUSIONS AND FUTURE WORK

In this paper we have outlined a distributed CVIS architecture based on its biological counterpart, the BIS. This design makes extensive use of low-level virus fighting approaches developed by other researchers. What distinguishes our work from theirs is that we have modified their techniques (specifically addressing virus detection and classification) and augmented them with new technologies (such as EAs and IAs), designing a comprehensive CVIS capable of adapting to new viral strains. Furthermore, we have illustrated how the distributed nature of our architecture mitigates the enormous computational burden of this task and provides an effective and efficient CVIS.

We plan to continue analyzing and designing various aspects of the proposed CVIS architecture. We suggest comprehensive development based on this architecture be standardized, implemented, and validated by an appropriate agency such as the Air Force Information Warfare Center, the CERT Coordination Center, or the National Computer Security Association.

References

[1] Bäck, Thomas. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996.
[2] Dasgupta, D. and N. Attoh-Okine. "Immunity-Based Systems: A Survey." Proceedings of the IEEE International Conference on Systems, Man and Cybernetics. October 1997.
[3] Etzioni, Oren and Daniel Weld. "A Softbot-Based Interface to the Internet," Communications of the ACM, 37(7):72-80 (July 1994).
[4] Forrest, S., et al. "Self-Nonself Discrimination in a Computer." Proceedings of the IEEE Symposium on Research in Security and Privacy. 1994.
[5] Forrest, Stephanie, et al. "Computer Immunology," Communications of the ACM, 40(10):88-96 (October 1997).
[6] Kephart, Jeffrey O. "A Biologically Inspired Immune System for Computers." Artificial Life IV, Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, edited by Rodney A. Brooks and Pattie Maes. 130-139. Cambridge, MA: MIT Press, 1994.
[7] Kephart, J.O. "A Biologically Inspired Immune System for Computers." Proceedings on the 4th International Workshop on the Synthesis and Simulation of Living Systems and Artificial Life. 130-139. 1994.
[8] Kephart, J.O., et al. "Fighting Computer Viruses," Scientific American, 88-93 (November 1997).
[9] Maes, Pattie. "Agents that Reduce Work and Information Overload," Communications of the ACM, 37(7):31-40 (July 1994).
[10] Marrack, P. and J.W. Kappler. "How the Immune System Recognizes the Body," Scientific American, 269(3):80-89 (September 1993).
[11] Stryer, Lubert. Biochemistry (4th Edition). New York: W. H. Freeman and Company, 1995.