Generalising Event Forensics Across Multiple Domains - Core

Generalising Event Forensics Across Multiple Domains Bradley Schatz, George Mohay, Andrew Clark Information Security Research Centre Queensland University of Technology {b.schatz, g.mohay, a.clark}@qut.edu.au

Abstract In cases involving computer related crime, event oriented evidence such as computer event logs, and telephone call records are coming under increased scrutiny. The amount of technical knowledge required to manually interpret event logs encompasses multiple domains of expertise, ranging from computer networking to forensic accounting. Automated methods of classifying events and patterns of events into higher level terminology and vocabulary hold promise for assisting investigators to cope with voluminous, low-level event oriented evidence. In a previous paper, we showed that the semantic web language OWL was an effective means of representing domain-specific event based knowledge, and when combined with a rule language, was sufficient to apply standard correlation techniques to the task of automated forensic investigation. We also described a prototype implementation of this approach, called FORE. In this paper, we demonstrate that the approach can be extended to be rapidly applied to events sourced from new domains, enabling cross-domain correlation, and that the new approach will accommodate standardised component ontologies which model the separate domains under consideration.

Keywords Computer Forensics, Event Correlation, Modelling of IT Security, Semantic Forensics

INTRODUCTION

Effective forensic investigation of event-oriented digital evidence involves interpreting the meaning of events that are often very fine grained in nature. Due both to this granularity and the number of potential sources of event evidence, the volume and complexity of events under consideration is burdensome for manual investigation. These observations parallel similar observations made in the fields of network management and intrusion detection, where diagnosis of the causes of network alarms and the detection of network and host intrusions are the goals. Existing approaches to event correlation have focused on single domains of interest only, and have employed models of correlation that are very specific in nature. Repurposing these specific existing approaches to the more general task of cross-domain forensics is difficult for a number of reasons. Existing event pattern languages do not necessarily generalise to application in wider domains. For example, while state machine based event pattern languages may work well for events related to protocols, they do not work well for patterns where time and duration are uncertain (Doyle et al., 2001). Most approaches focus exclusively on events, and ignore context related information such as environmental data and configuration information. Furthermore, few approaches have available implementations in a form that is readily modifiable. Where we have modifiable implementations, we find that extension to multiple domains is made impractical by the software paradigm underlying the implementation (Schatz et al., 2004).

We need systems that will allow us to rapidly integrate knowledge from new types of domains, including their transactional or event logs, in a manner that makes explicit the environmental or implicit concepts embedded within log entries. This is in order to promote both human understanding and automated inference. A general solution is needed.
In a previous paper (Schatz et al., 2004), we presented our approach, which represents event or transaction based knowledge as well as environment-based knowledge by defining an extensible and semantically grounded domain model (a forensic ontology) expressed using the Web Ontology Language (OWL) (McGuinness and Harmelen, 2004). We created our own rule based correlation language, FR3, based on the observation that most rule and signature based correlation techniques are translatable to rules. We demonstrated the application of the approach to the forensic investigation of a scenario in a single, homogeneous domain, using an ad-hoc ontology.

That prototype (called FORE) employs the JENA semantic web framework (McBride, 2002) as a knowledge base and inferencing engine, into which it introduces instances of event related knowledge sourced from the following event sources:
o Windows security logs, records of resource authentication in the Windows OS,
o Apache web server logs, records of accesses of web resources,
o UNIX syslog, the standard logging format for UNIX systems and services.

The main components of the FORE system are a generic event log parser, forensic ontology, correlation rules, a rule parser, event browser and the JENA framework (see Figure 1). The log parser converts event log entries into instances of concepts described in the forensic ontology. Correlation rules expressed in FR3 are parsed into the native format of JENA, and applied to the instances by the JENA inference engine. Investigators may interact with the knowledge base containing the events and entity information using the event browser.

[Figure 1: The FORE Architecture. Components shown: Generic Log Parser (with Apache, Win32, Door and SAP specs), Knowledge Base, Rule Base, FR3 Rule Parser, Correlation Rules, JENA Framework, Forensic Ontology, Event Browser.]

In the work described in this paper, we demonstrate that the approach is extensible and can be generalised to support forensics across multiple heterogeneous domains. The extended approach is applied to a forensic scenario that involves both Enterprise Resource Planning (ERP) security transaction logs, and building facilities management based door logs, in addition to the computer security logs that we considered previously. Furthermore, we demonstrate that our approach can scale by virtue of enabling the separate development and subsequent integration of domain ontologies, event parsers, and correlation rules, by experts in their respective domains. This provides freedom to the expert in advancing forensic understanding within a narrow domain, while also providing the necessary structure to relate and communicate that understanding to less sophisticated practitioners. While the validity of the results produced by forensic tools is of serious import to the forensic and legal community, in this work we do not focus on how the outcomes of this tool would be made acceptable to a court of law. For a comprehensive treatment of this issue see (Guidance Software, 2004). However, there is an extensive body of work on explaining the deductions of expert and rule systems that would provide the foundations for addressing such concerns (Swartout et al., 1991).

RELATED WORK

Web Ontology Language

With the emergence of the Semantic Web and related research in the knowledge representation and reasoning (KR&R, or KR) and agent oriented computing fields (Chen et al., 2004, McGuinness, 2001), ontologies have become widely used as a means for specifying and defining descriptions of concepts and their relationships. Several ontology languages have been developed in recent times, including the Web Ontology Language (OWL), and DAML+OIL (Harmelen et al., 2004). OWL, which is based on DAML+OIL, has recently been standardised by the World Wide Web Consortium (W3C).

Both OWL and DAML+OIL are based on a branch of logics called Description Logics (DL). These logics are a subset of First Order Logic (FOL) that are well suited to expressing terminology and instance information, with efficient and decidable inference characteristics.

Standard Ontologies

The OWL language provides support for merging of ontologies, through the use of language features which enable importing other ontologies and enable expression of conceptual equivalence and disjunction (Smith et al., 2004). This encourages separate ontology development, refinement and re-use. An upper ontology refers to the set of elementary, generalised and abstract concepts that should form the basis of all other ontologies. The two primary efforts towards defining upper ontologies are the Standard Upper Ontology (SUO) and CYC (which stands for enCYClopaedia) (Lenat, 1995). Recently, the CYC upper ontology has been made public as a part of the openCYC.org project (Cycorp, 2004). The SUO working group, under the auspices of IEEE, is currently working at forming this ontology from a number of upper ontologies, including the Suggested Upper Merged Ontology (SUMO) (Niles and Pease, 2001) and CYC. Both efforts further define middle level ontologies which are more domain specific than their upper counterparts. Reed and Lenat (2002) observe that in practice, most work on ontology merging and reuse occurs in the middle and lower levels of ontology, where the defining vocabulary for a domain is located. In the web services arena, an upper ontology called OWL-S (formerly DAML-S) has been defined in order to describe web services (OWL-S Coalition, 2004). OWL-S uses a subset of the DAML-Time ontology (DAML-S Coalition, 2003), called the “entry” sub-ontology of time (Pan and Hobbs, 2004).

Ontology in Computer Security & Forensics

There is little to no published research specifying formal ontologies for computer forensics or computer-related crime. However, we have identified a number of applications of ontology in the computer security field, especially relating to intrusion detection. Raskin et al. (2001) argue for the adoption of ontology as a powerful means for organising and unifying the terminology and nomenclature of the information security field. They observe that the use of ontology in the information security field will increase the systematics, allow for modularity and could make new phenomena predictable within the security domain. Schumacher (2003) focuses on systematic approaches to improving software security, by using Security Patterns, the application of the design patterns approach to security. Ontologies are used as a means to model both the security concepts referred to by the patterns, as well as the patterns themselves. Undercoffer et al. (2004) produced an ontology which can be used to describe a model of computer attack, which they call a “Target Centric Ontology for Intrusion Detection”. Our work is closest to the research performed by Goldman et al. (2001) in their IDS alert fusion prototype, SCYLLARUS. This work used the description logic environment CLASSIC (Borgida et al., 1989) to model a site’s security policy, static network, software configuration, and intrusion events. Only the Network Entity Relationship Diagram (NERD) ontology, which contains concepts focused around network and host, was published.

A DESIGN FOR EVENT FORENSICS ACROSS MULTIPLE DOMAINS

In the work described in this paper, we extend our previous approach so that it will accommodate new domains and reasoning across multiple such domains. We demonstrate its applicability by applying it to two new domains of event based evidence, along with the domain discussed previously. Where we previously employed an ad-hoc ontology, we now refine our approach by integrating third party ontologies as our foundations, demonstrating a means for separate development of domain specific ontologies and forensic correlation rules by experts in their domain. This section describes the results of applying the FORE system to a forensic scenario that integrates event sources drawn from two domains we have not considered before: the spatial domain of door swipe card logs, which are used to control access to rooms, and the security audit logs from the Enterprise Resource Planning (ERP) system SAP.

An Example Scenario – Identity Masquerading

The following potential scenario illustrates the motivation for our work and serves as a test of the success of our approach. We have identified a scenario of potential misuse in an accounting environment where a company is using the SAP ERP system. The scenario consists of the following trace of events, with the sources of the events in question indicated in parentheses:
1. Door log (Door log)
2. Win32 login (Win32 System Log)
3. Win32 Process Start: SAP (Win32 System Log)
4. SAP Login Succeeded/Failed: Username 1 (SAP Security Audit Log)
5. Win32 logoff (Win32 System Log)

Detection of this scenario could indicate a user mistyping their username or password. However, it could also indicate a user attempting to (or succeeding in) logging in as another user. Persistent recurrence of this event could potentially indicate the user methodically guessing the password of another user.

Integration of Standard Ontologies

SUMO provides two middle level ontologies related to our work: distributed computing, and geography. Chen et al. (2004) have defined a set of ontologies collectively referred to as SOUPA for context aware pervasive computing environments, which addresses concerns such as location, places and time. It imports subsets of the OWL-S web services ontologies, and defines a spatial ontology based on a subset of the openCYC spatial ontology. We chose to use the SOUPA ontology for representation of place and space related concepts as SOUPA is more lightweight than the currently defined SUMO ontologies. Lightweight ontologies perform better in automated inference, as the inferencing engine has fewer concepts and instances to consider. Further, the SOUPA efforts have demonstrated this ontology working with the JENA toolkit. The SOUPA ontology imports the time entry ontology of Pan and Hobbs (2004) as its basis of time and events, but we do not use this in our work, as sophisticated temporal reasoners are required to reason about this model of time. Temporal representation will be the subject of future investigation. Of the ontologies related to security, the security ontology of Raskin et al. (2001) appeared to be promising; however the ontology was unavailable at the URL published. Of the available security ontologies, the closest fit to our needs was the NERD ontology. In order to integrate it, we first had to translate it into OWL. This was straightforward, as it is specified using the CLASSIC DL language and OWL is based on DL foundations. The NERD ontology was far more granular than our original ontology in its modelling of the composition of network and host structure. For example, in our original ontology, we modelled the IP address of a host as a property of the Host class. However, in the NERD ontology, we must use a succession of anonymous instances to represent this host. Rather than stating “the host with IP address 131.181.6.3” in our original ad-hoc ontology, we must make the statement “the host whose interface has an ipsetup with IP address 131.181.6.3” using the NERD ontology. This is expressed using this ontology as: 131.181.6.3

This introduces many more entities into the system per log entry, which could quickly overload the information conveyed in the entity view. In response to this, we only present the outermost enclosing instance, with the child properties represented as path elements. For example, in our entity viewer, we would represent the Host above as: [hasInterface.hasIPSetup.hasIPaddress.ipaddress=131.181.6.3]
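The path representation above can be illustrated with a minimal Python sketch. It assumes the nested anonymous instances are available as nested dictionaries keyed by property name; that data structure and the function names `flatten` and `entity_label` are illustrative assumptions, not part of FORE or JENA.

```python
def flatten(instance, prefix=()):
    """Yield (path, value) pairs for every literal reachable from a nested instance."""
    for prop, value in instance.items():
        path = prefix + (prop,)
        if isinstance(value, dict):
            # An anonymous child instance: descend and extend the property path.
            yield from flatten(value, path)
        else:
            yield path, value


def entity_label(instance):
    """Render only the outermost instance, with child properties as dotted paths."""
    parts = [f"{'.'.join(path)}={value}" for path, value in flatten(instance)]
    return "[" + ", ".join(parts) + "]"


# Hypothetical in-memory form of the Host described in the text.
host = {"hasInterface": {"hasIPSetup": {"hasIPaddress": {"ipaddress": "131.181.6.3"}}}}
print(entity_label(host))
```

Collapsing the chain into a single label keeps one entry per host in the entity view, at the cost of hiding the intermediate instances unless the investigator expands them.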

Integrating New Domains

The door log entries contain the date, time, card id, name of assigned owner, the door name, and the zone. In our case, the door is named by both the room it controls access to and the building containing the room. Integrating this knowledge into our prototype first involves identifying the concepts implicit in the event log data, and then determining an appropriate place for the concepts in our ontologies. As we wish to represent Rooms and Buildings, we hook in our Room concept by inheriting from the SOUPA class SpacedInAFixedStructure. Similarly, we inherit Building from FixedStructure. We hooked a DoorEvent into our existing ad-hoc event ontology by inheriting it from our existing Event class. We next write an event parser specification specific to the door logs, which matches the door log syntax and declares the OWL instances that are necessary to represent a door entry. Below we present an example door log event, as created by the parser: GP. S BLOCK GP. S BLOCK RM S826A 42281 RICCO LEE 2004-03-04T20:30:00Z
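As a rough illustration of the parsing step, the following Python sketch turns one door log entry into the properties of a DoorEvent instance. The raw log syntax is not given in this paper, so the comma-separated layout, the " RM " separator between building and room, the UTC timestamps, and the property names `fore:cardId`, `fore:room`, `fore:building` and `fore:zone` are all assumptions made for the example; only `fore:DoorEvent`, `fore:user` and `fore:startTime` appear in the rules presented here.

```python
import csv
from datetime import datetime


def parse_door_line(line):
    """Convert one (assumed) CSV door log entry into DoorEvent properties."""
    # Field order follows the description in the text:
    # date, time, card id, assigned owner, door name, zone.
    date, time, card_id, owner, door, zone = next(csv.reader([line]))
    start = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    # Assumed naming convention: "<building> RM <room>".
    building, _, room = door.partition(" RM ")
    return {
        "rdf:type": "fore:DoorEvent",
        "fore:user": owner,
        "fore:startTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),  # assumes UTC
        "fore:cardId": card_id,      # hypothetical property name
        "fore:room": room,           # hypothetical property name
        "fore:building": building,   # hypothetical property name
        "fore:zone": zone,           # hypothetical property name
    }


line = "2004-03-04,20:30:00,42281,RICCO LEE,GP. S BLOCK RM S826A,GP. S BLOCK"
event = parse_door_line(line)
print(event)
```

In the prototype this mapping is declared in a parser specification rather than written as code, so a new log format needs only a new specification.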

SAP Security Audit Logs record, among other things, the success or failure of logins to SAP, along with the date and time of the event, and the host (or in SAP terminology, terminal) that the user attempted to login from. Addition of SAP related events specific to our scenario required the addition of the following new concepts to our ontology:

Class: ServiceAuthenticationEvent
Meaning: Authentication of a user to a resource, specifically, a resource that is a service.

Class: SAPAuthenticationEvent
Meaning: Authentication of a user by SAP. Login success or failure. Inherits ServiceAuthenticationEvent.

Class: SAPClientLoginSuccessEvent
Meaning: Successful login to SAP. Inherits SAPAuthenticationEvent.

Class: SAPClientLoginFailureEvent
Meaning: Unsuccessful login to SAP. Inherits SAPAuthenticationEvent.

Class: SAPClientProcessCreationEvent
Meaning: The SAP client program has been run on a client terminal.

Class: IdentityMasqueradeEvent
Meaning: Multiple login names have been used to access a service from the context of a single login account.

Table 1: SAP Related Events

We identify a case of identity masquerading by recognising when a user uses multiple identities to access resources. In order to recognise this we look for SAP authentication events that occur within the context of a single user’s OS login session, but where the user identity differs from that of the session. The LoginSessionEvent is a higher level abstraction which represents a user’s interactive login session on a host (this kind of event abstraction is presented in the previous paper). Below we present a correlation rule in our language FR3, which detects instances of this scenario:

Rule:

?e1[rdf:type -> fore:LoginSessionEvent; fore:startTime -> ?t1; fore:finishTime -> ?t3; fore:host -> ?h; fore:user -> ?u1],
?e2[rdf:type -> fore:SAPAuthenticationEvent; fore:startTime -> ?t2; fore:terminal -> ?h; fore:user -> ?u2],
le(?t1, ?t2), le(?t2, ?t3), notEqual(?u1, ?u2), makeTemp(?s)
-> ?s[rdf:type -> fore:IdentityMasqueradeEvent; fore:causality -> ?e1,?e2],
?e2[fore:causality -> ?e1];

Meaning:

Match an event instance of class LoginSessionEvent with an event instance of class SAPAuthenticationEvent where the LoginSessionEvent’s host is the same host as the terminal in the SAPAuthenticationEvent. The SAPAuthenticationEvent must occur within the time boundaries of the LoginSessionEvent, and the users in each event must not be the same user. If this is the case, create an event of type IdentityMasqueradeEvent and link its causality property to the matched events, and link the causality property of the SAPAuthenticationEvent to the LoginSessionEvent.

Table 2: Identity Masquerade Rule
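For readers unfamiliar with rule languages, the logic of the rule in Table 2 can be sketched procedurally in Python. This is not how FORE evaluates rules (the prototype hands them to the JENA inference engine); the dictionary-based events, the `id` field, the set `SAP_AUTH_TYPES` standing in for ontology subsumption, and the timestamps are assumptions of the sketch. Users P and Q and host F follow Figure 2.

```python
# Stand-in for subclass inference: the SAP login success and failure classes
# inherit SAPAuthenticationEvent (Table 1).
SAP_AUTH_TYPES = {
    "SAPAuthenticationEvent",
    "SAPClientLoginSuccessEvent",
    "SAPClientLoginFailureEvent",
}


def identity_masquerades(events):
    """Derive IdentityMasqueradeEvent records from session and SAP events."""
    sessions = [e for e in events if e["type"] == "LoginSessionEvent"]
    sap_auths = [e for e in events if e["type"] in SAP_AUTH_TYPES]
    derived = []
    for session in sessions:
        for auth in sap_auths:
            same_host = session["host"] == auth["terminal"]
            # le(?t1, ?t2), le(?t2, ?t3): fixed-width UTC strings compare correctly.
            within = session["startTime"] <= auth["startTime"] <= session["finishTime"]
            if same_host and within and session["user"] != auth["user"]:
                derived.append({
                    "type": "IdentityMasqueradeEvent",
                    "causality": [session["id"], auth["id"]],
                })
                # Second consequent: link the SAP event back to the session.
                auth.setdefault("causality", []).append(session["id"])
    return derived


events = [
    {"id": "e1", "type": "LoginSessionEvent", "user": "P", "host": "F",
     "startTime": "2004-03-04T20:31:00Z", "finishTime": "2004-03-04T21:00:00Z"},
    {"id": "e2", "type": "SAPClientLoginSuccessEvent", "user": "Q", "terminal": "F",
     "startTime": "2004-03-04T20:35:00Z"},
    {"id": "e3", "type": "SAPClientLoginSuccessEvent", "user": "P", "terminal": "F",
     "startTime": "2004-03-04T20:40:00Z"},
]
found = identity_masquerades(events)
print(found)
```

Only the login as Q inside P's session is flagged; the login as P on the same terminal is not.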

Correlating door entries with interactive logins to a workstation is achieved using the following rule:

Rule:

?e1[rdf:type -> fore:DoorEvent; fore:user -> ?u; fore:startTime -> ?t1],
?e3[rdf:type -> fore:TerminalEvent; fore:user -> ?u; fore:startTime -> ?t3],
fail ( ?e2[rdf:type -> fore:DoorEvent; fore:user -> ?u; fore:startTime -> ?t2], lessThan(?t1, ?t2), lessThan(?t2, ?t3) )
-> ?e3[fore:causality -> ?e1];

Meaning:

Match an event instance of class DoorEvent with an event instance of class TerminalEvent, where the DoorEvent occurs before the TerminalEvent. If they refer to the same user and there is not another door event in between, then link the TerminalEvent’s causality property to the DoorEvent.

Table 3: Door Entry - Login Rule
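The `fail(...)` clause in Table 3 expresses negation as failure: the rule fires only if no intervening door event can be found. A minimal Python sketch of the same logic follows. The event dictionaries, `id` fields and timestamps are illustrative assumptions. The explicit check that the door event precedes the login follows the prose description of the rule; it is added here as an assumption, since the rule as printed does not state that comparison outside the `fail` clause.

```python
def link_logins_to_doors(events):
    """Return (terminal_event_id, door_event_id) causal links."""
    doors = [e for e in events if e["type"] == "DoorEvent"]
    terminals = [e for e in events if e["type"] == "TerminalEvent"]
    links = []
    for login in terminals:
        for door in doors:
            if door["user"] != login["user"]:
                continue
            # Assumption from the prose: the door entry must precede the login.
            if not door["startTime"] < login["startTime"]:
                continue
            # Negation as failure: look for any door event by the same user
            # strictly between this door event and the login.
            intervening = any(
                other["user"] == login["user"]
                and door["startTime"] < other["startTime"] < login["startTime"]
                for other in doors
            )
            if not intervening:
                links.append((login["id"], door["id"]))
    return links


events = [
    {"id": "d1", "type": "DoorEvent", "user": "P", "startTime": "2004-03-04T08:00:00Z"},
    {"id": "d2", "type": "DoorEvent", "user": "P", "startTime": "2004-03-04T20:30:00Z"},
    {"id": "t1", "type": "TerminalEvent", "user": "P", "startTime": "2004-03-04T20:31:00Z"},
    {"id": "d3", "type": "DoorEvent", "user": "P", "startTime": "2004-03-04T22:00:00Z"},
]
links = link_logins_to_doors(events)
print(links)
```

The login is linked only to the most recent preceding door entry (d2), not to the earlier entry (d1) or the later one (d3).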

EXPERIMENTAL RESULTS

We ran our extended software against the previously presented multi-domain scenario with a knowledge base containing some hundreds of events sourced from the three different domains. The event browser immediately identified the scenario, along with a number of false positives. The scenario was identified by instances of the IdentityMasqueradeEvent appearing in the event browser. We provide further means for finding instances by querying for the specific event, or by using high level views which limit the set of events displayed to higher level concepts closer to the concerns and vocabulary of the investigator. The user interface enables the investigator to “drill down” from a high level event to the events which caused it. In this example, the IdentityMasqueradeEvent has causal links to the LoginSessionEvent and the SAPAuthenticationEvent that triggered its creation. In Figure 2, we present a graph of events that correspond to the scenario, which can be explored by an investigator using the drill-down feature of the interface. The causal relationships correlated by the rules above are presented in bold. Other links are correlated by rules not presented here.

[Figure 2: Causal Ancestry Graph of Identity Masquerading Scenario. Nodes shown: IdentityMasqueradeEvent; SAPClientProcessCreationEvent (user=P, host=F); SAPClientLoginSuccessEvent (user=Q, terminal=F); LoginSessionEvent (user=P); TerminalLoginEvent (user=P); TerminalLogoutEvent (user=P); DoorEvent (user=P).]

In our test environment, like many real world deployments of SAP, the SAP username is not necessarily the same as the OS username for the same user. The rule presented in Table 2 resulted in many false positives, as the test for inequality fires the rule for minor differences in username. For example, “jsmith” and “j.smith” are treated as separate users. In order to resolve this kind of problem, we explicitly select the users in question, and indicate that they should be treated as representing the same thing. As a result, IdentityMasqueradeEvent instances based on this kind of identity failure are removed from the knowledge base and event viewer. This approach to hypothetically resolving identity between a user identified from a door log, and a user identified in a login, similarly allowed us to causally correlate door logs with logins to computers.
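The effect of declaring two usernames equivalent can be sketched in Python with a small union-find structure. This is a stand-in chosen for illustration: the paper does not say how the equivalence is recorded in the knowledge base, and the class `IdentityResolver`, its methods, and the username "mjones" are hypothetical.

```python
class IdentityResolver:
    """Tracks which usernames an investigator has declared to be the same person."""

    def __init__(self):
        self._parent = {}

    def _find(self, name):
        # Each unseen name starts as its own representative.
        self._parent.setdefault(name, name)
        while self._parent[name] != name:
            # Path halving keeps lookups short.
            self._parent[name] = self._parent[self._parent[name]]
            name = self._parent[name]
        return name

    def same_as(self, a, b):
        """Record that usernames a and b refer to the same user."""
        self._parent[self._find(a)] = self._find(b)

    def is_same(self, a, b):
        return self._find(a) == self._find(b)


resolver = IdentityResolver()
resolver.same_as("jsmith", "j.smith")  # investigator's explicit selection

# (OS username, SAP username) pairs that the inequality test flagged.
candidates = [("jsmith", "j.smith"), ("jsmith", "mjones")]
remaining = [pair for pair in candidates if not resolver.is_same(*pair)]
print(remaining)
```

Because equivalence is transitive, declaring a door card owner name equivalent to an OS username would also make it equivalent to any SAP username already linked to that OS account.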

CONCLUSION We have demonstrated that the FORE approach is extensible and generalisable to support reasoning across multiple heterogeneous domains. We do so by successfully applying the prototype to a forensic scenario that involves both ERP security transaction logs, and door logs, in addition to computer security logs such as we have considered in our previous efforts. Furthermore, we demonstrate that our approach can scale, by supporting the separate development and subsequent integration of domain models, event parsers, and correlation rules, by experts in their respective domains. We believe that this at the same time provides freedom to the expert in advancing forensic understanding within a narrow domain, while providing the necessary structure to relate and communicate that understanding to less sophisticated practitioners. Our rule language is currently difficult to read due to its low level nature. Future work will focus on finding more suitable abstractions to enable forensic practitioners to more easily express correlation rules. We further wish to explore the time entry ontology as a potential representation of time.

ACKNOWLEDGEMENTS We are very grateful to our colleague Peter Best for his help in identifying misuse scenarios in the SAP environment.

REFERENCES

Borgida, A., Brachman, R. J., McGuinness, D. L. and Resnick, L. A. (1989) 'CLASSIC: A Structural Data Model for Objects', In ACM SIGMOD International Conference on Management of Data, Portland, Oregon, pp. 58-67.
Chen, H., Perich, F., Finin, T. and Joshi, A. (2004) 'SOUPA: Standard Ontology for Ubiquitous and Pervasive Applications', In International Conference on Mobile and Ubiquitous Systems: Networking and Services, Boston, MA.
Cycorp, (2004), Opencyc.org, http://www.opencyc.org/, Accessed 5 June 2004
DAML-S Coalition, (2003), DAML-Time Homepage, http://www.cs.rochester.edu/~ferguson/daml/, Accessed 20 July 2004
Doyle, J., Kohane, I., Long, W., Shrobe, H. and Szolovits, P. (2001) 'Event Recognition Beyond Signature and Anomaly', In IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, New York, pp. 17-23.
Goldman, R., Heimerdinger, W., Harp, S., Geib, C., Thomas, V. and Carter, R. (2001) 'Information Modeling for Intrusion Report Aggregation', In DARPA Information Survivability Conference and Exposition II, IEEE, Anaheim, CA.
Guidance Software, (2004), EnCase Legal Journal, http://www.guidancesoftware.com/corporate/whitepapers/downloads/LegalJournal.pdf, Accessed 30 Sept 2004
Harmelen, F. v., Patel-Schneider, P. F. and Horrocks, I., (2004), 'Reference description of the DAML+OIL (March 2001) ontology markup language', http://www.daml.org/2001/03/reference.html, Accessed 20 July 2004
Lenat, D. B. (1995) 'CYC: a large-scale investment in knowledge infrastructure', Communications of the ACM, vol. 38, pp. 33-38.
McBride, B. (2002) 'Jena: a semantic web toolkit', IEEE Internet Computing, vol. 6, pp. 55-59.
McGuinness, D. L. (2001) 'Description Logics Emerge from Ivory Towers', In International Workshop on Description Logics, Stanford, CA.
McGuinness, D. L. and Harmelen, F. v., (2004), OWL Web Ontology Language Overview, http://www.w3.org/TR/2004/REC-owl-features-20040210/, Accessed 24 Feb 2004
Niles, I. and Pease, A. (2001) 'Towards a Standard Upper Ontology', In 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Welty, C. and Smith, B. (eds), Ogunquit, Maine, October 17-19, 2001.
OWL-S Coalition, (2004), OWL-S 1.0 Release, http://www.daml.org/services/owl-s/1.0/, Accessed 20 July 2004
Pan, F. and Hobbs, J. R. (2004) 'Time in OWL-S', In 2004 AAAI Spring Symposium Series - Semantic Web Services, Stanford University.
Raskin, V., Hempelmann, C. F., Triezenberg, K. E. and Nirenburg, S. (2001) 'Ontology in information security: a useful theoretical foundation and methodological tool', In Workshop on New Security Paradigms, Cloudcroft, New Mexico.
Reed, S. L. and Lenat, D. B. (2002) 'Mapping Ontologies into Cyc', In AAAI workshop on Ontologies and the Semantic Web, Edmonton, Canada.
Schatz, B., Mohay, G. and Clark, A. (2004) 'Rich Event Representation for Computer Forensics', In Asia Pacific Industrial Engineering and Management Systems (APIEMS 2004), Brisbane, Australia.
Schumacher, M. (2003) 'Security Engineering with Patterns', Lecture Notes in Computer Science, vol. 2754.
Smith, M. K., Welty, C. and McGuinness, D. L., (2004), OWL Web Ontology Language Guide, http://www.w3.org/TR/owl-guide/, Accessed 20 July 2004
Swartout, W., Paris, C. and Moore, J. (1991) 'Explanations in knowledge systems: design for explainable expert systems', IEEE Expert, vol. 6, pp. 58-64.
Undercoffer, J., Joshi, A., Finin, T. and Pinkston, J. (2004) 'A Target-Centric Ontology for Intrusion Detection', In 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico.

COPYRIGHT Bradley Schatz, George Mohay, Andrew Clark ©2004. The authors assign the We-B Centre & Edith Cowan University a non-exclusive license to use this document for personal use provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive license to the We-B Centre & ECU to publish this document in full in the Conference Proceedings. Such documents may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World Wide Web. Any other usage is prohibited without the express permission of the authors.
