Automatic Management of Logging Infrastructure

Christopher R. Johnson
Department of Computer Science
Knox College
Galesburg, Illinois
[email protected]

Mirko Montanari, Roy H. Campbell
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois
{mmontan2,rhc}@illinois.edu

Abstract—A secure and reliable network logging infrastructure is necessary for anomaly detection systems to be effective. Without reliable logs, an anomaly detection system would be unable to detect malicious activity. An attacker could even target the logging infrastructure before carrying out an attack to make their actions undetectable. Unfortunately, logging infrastructure is very difficult to maintain manually, as compliance with separation of duty policies and constant monitoring are a burden on administrators. In this paper we present an architecture for a system that automatically reacts to problems with the logging infrastructure. This system uses a method we present to detect that a problem exists, determine the best way to fix that problem, and then notify automated agents on the network to take action. It does all of this with little or no administrator involvement, reducing the burden of maintaining a secure logging infrastructure.

I. INTRODUCTION

A secure and reliable logging infrastructure is at the root of most systems designed for detecting malicious insider behavior. If the logging infrastructure fails or becomes compromised, these systems may be unable to detect malicious activity: important events might not be properly logged and personnel might be able to falsify logs to make their actions untraceable. Hence, protecting the logging infrastructure is a priority in the management of any anomaly detection system.

To provide better protection against attack, the logging infrastructure should be resilient to failures and to intentional compromises. Failure resilience is generally obtained using redundant systems, while protection against intentional compromises can be obtained by enforcing separation of duty constraints between the personnel managing the systems and the personnel managing the logging infrastructure: personnel in charge of administering systems should not have privileges for modifying the logs of their actions, while personnel in charge of the logging infrastructure should have no privileges that allow them to perform actions on parts of the system not related to logging.

As enterprise systems frequently experience failures and changes in configuration or policy, the logging infrastructure must be updated accordingly to ensure reliable monitoring of the system. Enforcing reliability constraints and separation of duty in a large enterprise system is challenging. Personnel performing changes on systems may need to wait for a separate person before they act so that corresponding changes can be made in the logging systems.

In this paper, we present an automated system that ensures that these separation of duty constraints and reliability policies on the logging infrastructure are satisfied even after planned or unplanned changes in the system configuration. Our design allows for automatic reconfiguration of the logging infrastructure as a consequence of changes in the system resulting from failures or from administrators' actions. Because reactions to failures are automated, this approach reduces the burden on administrators.

Our system monitors the current system configuration and reacts to changes. The reaction process is composed of two phases. During the first phase, information about the logging and network configurations is verified for compliance with reliability requirements and separation of duty policies. This phase is implemented by creating a Datalog model of the network configuration and by specifying associated inference rules that represent network security policies. Datalog inference is used to verify whether violations of policies can be derived from the current network and logging configuration [1]. Once a violation is detected, the system generates a logical attack graph [2]. The attack graph is a directed graph that describes the logging or network configurations that cause the policy violation.

For example, suppose that we have a policy that all workstations must log to a secure logging server and that we have a network configuration where the logging server is placed behind a firewall to protect vulnerable software that cannot be removed for legacy reasons.
In this setting, an administrator could modify the firewall configuration, expose the vulnerability, and make the logging server vulnerable to attacks. Using our system, we can automatically integrate information about the network configuration, derive that the logging server is no longer a secure server, and hence detect that the system is not configured properly because the logging policy has been violated.

During the second phase, our system analyzes the attack graph and identifies the least-cost set of actions that must be taken to resolve the violation. The resulting set of actions can then be used by the system to change the configuration to a state that satisfies the logging policies. In our previous example, the action could be either to fix the firewall configuration (if permitted by business constraints) or to move the logging service to another server that is currently secure.

The rest of this paper is organized as follows. In Section II, we give an overview of related work. In Section III, we describe the architecture of our system and how it can be used to enforce auditing policies. In Section IV, we describe our method, its underlying algorithms, and how it compares to previous work. In Section V, we look at the performance of our method and present results and analysis of simulations that were run on randomly generated network models. Finally, in Section VI, we present a direction for future work that may increase the performance of our method and its applicability to more areas of insider threat and network security in general.
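As a rough illustration of the first phase applied to the firewall example in this section, the following Python sketch derives a policy violation from configuration facts by naive forward chaining. It is not the Datalog model used by our system: the fact encoding, the predicate names (runs, vulnerable, exposed, logsTo, insecure, violation), and the host names are all hypothetical and chosen only for this example.

```python
# Illustrative sketch only: facts are (predicate, subject, object) triples.
# All predicate and host names are hypothetical, not taken from the system.

def derive(facts):
    """Apply two example rules until no new fact appears (naive forward chaining)."""
    facts = set(facts)
    while True:
        new = set()
        # Rule 1: a host is insecure if it runs vulnerable software
        # and is exposed (no longer protected by the firewall).
        for (pred, host, software) in facts:
            if (pred == "runs"
                    and ("vulnerable", software, "true") in facts
                    and ("exposed", host, "true") in facts):
                new.add(("insecure", host, "true"))
        # Rule 2: the logging policy is violated if a workstation
        # logs to a server that has been derived to be insecure.
        for (pred, workstation, server) in facts:
            if pred == "logsTo" and ("insecure", server, "true") in facts:
                new.add(("violation", workstation, server))
        if new <= facts:  # fixpoint reached: nothing new was derived
            return facts
        facts |= new


def violations(facts):
    """Return the derived violation facts in sorted order."""
    return sorted(f for f in derive(facts) if f[0] == "violation")


# Configuration while the firewall still protects the logging server.
protected = {
    ("runs", "logserver", "legacyd"),
    ("vulnerable", "legacyd", "true"),
    ("logsTo", "ws1", "logserver"),
}

# The same configuration after an administrator opens the firewall.
opened = protected | {("exposed", "logserver", "true")}

print(violations(protected))
print(violations(opened))
```

Under these assumptions, the first configuration yields no violation, while the second yields a violation for workstation ws1, because the logging server can now be derived to be insecure.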
II. RELATED WORK

Insider threat is a growing problem for organizations. Disgruntled employees can use their authorized access to sabotage computer systems, while others may be paid to copy sensitive documents for outsiders. Even honest but careless employees can contribute to this problem by failing to pick secure passwords, thereby giving others their access [3]. A major focus of insider threat research is on anomaly detection systems that compare the current activity of each user with precomputed normal usage profiles [4]. If a user is engaging in activity outside of their normal usage pattern, it draws attention to their abnormal activity on the assumption that it is more likely to be malicious.

Organizations that use anomaly detection systems often use centralized, redundant, or distributed logging systems. The anomaly detection systems that we have surveyed all depended on reliable logs to be effective [4] [5] [6]. However, while techniques for the secure monitoring of machines [7] [8] can be used for generating trustworthy logs on each machine, and cryptographic techniques can be used to prevent log tampering [9], without proper configuration of the logging infrastructure the information collected on each machine cannot be securely analyzed in a central location. If the logging infrastructure is disrupted before an attack begins, then the logs will not be there to begin with. Our solution complements these systems by detecting and by reacting to misconfigurations and failures in the logging infrastructure that would have compromised the ability of anomaly detection systems to collect data reliably and securely.

The motivation for our architecture was to extend configuration model checking solutions to react to the problems that they detect. Configuration checking systems proposed by Montanari et al. and Lobo et al. can be used to infer how changes in configuration may yield complex attack vectors [1] [10]. However, this type of system cannot propose solutions to the detected configuration problems. Our system extends such systems by computing solutions to complex violations.

Noel et al. [11] proposed an algorithm for the computation of minimal reconfiguration actions in polynomial time. However, their system is unable to account for the complex separation of duty policies and the reliability requirements that are common in the configuration of the logging infrastructure of an enterprise. The method presented in this paper can represent these constraints and can quickly compute the reconfiguration actions to perform for managing complex logging configurations.
III. ARCHITECTURE

Our architecture consists of a centralized system that obtains information about the network configuration from a network of agents. These agents are trusted resident programs and/or kernel modules that collect data about the configuration of networked machines and send those data to the centralized system. These trusted agents use secure monitoring techniques for reliably monitoring the configurations of hosts even in the case of compromises [7] [8]. Additionally, each agent is able to perform a specific set of actions that change the configuration of the system.

Our system aggregates data from agents and discovers violations of security or reliability policies, and it sends that data to be processed using our reaction method. The output of that algorithm is a minimal set of actions required to solve the violation, called a minimal hardening set. The minimal hardening set is sent to the relevant agents, which perform the actions they are instructed to carry out and report back to the centralized system with the resulting configuration changes.

A diagram of this architecture is shown in Figure 1. The components surrounded by a dotted line are present in systems for the detection of policy violations (e.g., [1]), while the others are specific to our system.
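To make the notion of a minimal hardening set concrete, the following Python sketch selects a least-cost set of agent actions by exhaustive search. It is a simplified illustration and not the reaction method of Section IV: it assumes that each way of deriving the violation is given as a set of configuration facts, that an action removes a known set of facts, and that a violation is resolved once every derivation has lost at least one fact. The action names, fact strings, and costs are hypothetical.

```python
from itertools import combinations

# Illustrative brute-force sketch; exponential in the number of actions.
# All names and costs below are hypothetical.

def minimal_hardening_set(derivations, actions):
    """Return (cost, action names) for a least-cost set of actions.

    derivations: list of sets of configuration facts; each set is
                 sufficient on its own to derive the violation.
    actions:     dict mapping an action name to a pair
                 (cost, set of facts the action removes).
    Returns None if no combination of actions resolves the violation.
    """
    best = None
    names = sorted(actions)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            removed = set().union(*(actions[a][1] for a in chosen))
            # Resolved only if every derivation loses at least one fact.
            if all(d & removed for d in derivations):
                cost = sum(actions[a][0] for a in chosen)
                if best is None or cost < best[0]:
                    best = (cost, chosen)
    return best


# Two workstations log to a logging server that has become exposed.
derivations = [
    {"exposed(logserver)", "logsTo(ws1, logserver)"},
    {"exposed(logserver)", "logsTo(ws2, logserver)"},
]

actions = {
    "restore_firewall": (5, {"exposed(logserver)"}),
    "repoint_ws1": (2, {"logsTo(ws1, logserver)"}),
    "repoint_ws2": (2, {"logsTo(ws2, logserver)"}),
}

print(minimal_hardening_set(derivations, actions))
```

With these hypothetical costs, repointing both workstations (total cost 4) is cheaper than restoring the firewall rule (cost 5), so the sketch selects the two repointing actions. In the architecture described above, each selected action would then be dispatched to the agent able to perform it.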
is not configured to log to the remote logging server will be found to be in violation of policy. The reaction system will determine the best way to resolve this violation, which will most likely be to change the configuration so that it logs to the correct server. Example rules and policies to enforce this are shown in Figure 3.

[rules]
/*
 * A server has a logging service if
 * it is running that service and that
 * service is a logging service.
 */
[(?Server srg:hasLogging ?Service)

?Service srg:hasAction #Shutdown,
?Machine srg:hasAgent #TRUE :
shutdownService(?Machine, ?Service) [30, 80]
}
Figure 2.

/*
 * A machine is logging to a service if
 * the machine is configured to log to the
 * service and the machine can reach the service.
 */
[(?Machine srg:loggingTo ?Logging)