A Proposed Resilience Framework

Marcus A. Thompson, Michael J. Ryan, and Alan C. McLucas
University of New South Wales
School of Engineering and Information Technology
Australian Defence Force Academy
Northcott Drive, Canberra ACT 2600, Australia
+61 2 6268 8111

ABSTRACT

Despite the critical nature of security in the design of almost all systems, and the increasing criticality of security systems themselves, the adequate specification and design of modern systems is considerably hindered by the lack of a useful set of definitions. The definitions offered by prominent standards bodies lack commonality in meaning and interpretation, are overlapping, and contain language that is very specific to the particular domains of electronic, physical and personnel security. Further, the majority of modern terms tend to be focused almost exclusively on electronic or cyber security. Consequently, the current set of security terms and definitions provides little assistance to stakeholders when articulating requirements, or to systems designers when defining system architectures. Current resilience terminology is similarly confused, with a lack of commonality amongst published standards and guidelines, domain-specific language, and terms used synonymously in ways that blur the distinction between resilience and security. This paper presents a new framework for achieving resilience in the face of security threats. The paper initially summarises the ontology of security proposed in earlier work, before distinguishing between security and resilience. Resilience is presented as being spectral, addressing unknown threats, and as having application after security is lost; whereas security is presented as being brittle, addressing known threats, and as having application before a security breach. Based on the previous work, a definition of resilience is developed, further resilience terms are defined, and a resilience ontology is proposed for broad application across the electronic, physical, and personnel security domains.

Key words: resilience, security, terminology, ontology, framework

INTRODUCTION

Despite the critical nature of security in the design of almost all systems, and the increasing criticality of security systems themselves, the adequate specification and design of modern systems is considerably hindered by the lack of a useful set of definitions. The definitions offered by prominent standards bodies lack commonality in meaning and interpretation, are overlapping, and contain language that is very specific to the particular domains of electronic, physical and personnel security. Further, the majority of modern terms tend to be focused almost exclusively on electronic or cyber security. Consequently, the current set of security terms and definitions provides little assistance to stakeholders when articulating requirements, or to systems designers when defining system architectures (Thompson et al. 2012).

Current resilience terminology is similarly confused, with a lack of commonality amongst published standards and guidelines, domain-specific language, and terms used synonymously in ways that blur the distinction between resilience and security. This paper builds on previous work (see Thompson et al. 2012), which proposed a cohesive set of security definitions that are applicable to security systems engineering across the electronic, physical, and personnel security domains, to present a new framework for achieving resilience in the face of security threats. The paper initially summarises the ontology of security proposed in earlier work, before distinguishing between security and resilience. Based on the previous work, a definition of resilience is developed, further resilience terms are defined, and a resilience ontology is proposed for broad application across the electronic, physical, and personnel security domains.

SECURITY

The ultimate aim of security is to retain a resource of value at some particular nominated state. Whether preserving the availability of a bank balance, ensuring personal safety, preserving the confidentiality of information in a database, or safeguarding the integrity of a territorial border, the required end-state of security is to maintain the nominated state of a designated resource.

A review of current literature indicates that there is broad commonality amongst standards organisations regarding the use of the terms confidentiality, integrity and availability to describe the desired state of security. However, in presenting security terminology, standards organisations such as the International Organization for Standardization, the International Telecommunication Union, the Organisation for Economic Co-operation and Development, the Control Objectives for Information and related Technology, the Internet Engineering Task Force, the US National Institute of Standards and Technology (NIST), and Standards Australia each present unique descriptions of the various constituent elements that comprise security. While some of the terms and associated definitions are similar and common to multiple standards organisations, others lack commonality and are unique to a single standards organisation. Collectively, the elements present an eclectic mix of actions, states, and governance functions with terms that are overlapping, and at times contradictory; see Thompson et al. (2012) for a more complete analysis.

Security Framework

A new definition of security developed in earlier work is proposed as:

Security is the maintenance of the nominated state of a designated resource. (Thompson et al. 2012)

where the nominated state is a specific condition that is determined through a governance process that assesses the intrinsic value of the resource that is designated as requiring security. Such a governance process, which is beyond the scope of this paper, would include thorough threat assessments, risk management processes, and cost-benefit analyses based upon an organisation’s unique requirements and circumstances. The definition of security can be completely elaborated (based on Thompson et al. 2012) to be:

The security of the nominated state of a designated resource is maintained when an authenticated entity is known to perform an action that is accessible.

Setting the levels of authentication, attribution, and accessibility is also a specific function of governance, determined by an organisation’s specific circumstances. Table 1 summarises the full elaboration of the improved definition for security supported by a hierarchy of security services and example security mechanisms.

The nominated state of a designated resource is secure when …

| Secure when … | Security Services | Example Security Mechanisms |
|---|---|---|
| an authenticated entity | Authentication (Identification) | Passwords; Biometrics; Identity Cards; Passports; Pattern of life analyses; Introduction; Physical recognition |
| is known to perform | Attribution (Non-repudiation) | Notarization; Digital Signatures; Logging; Observation; Inspections; Recordings; Registers |
| an action that is accessible. | Access Control (Perimeter Control) | Encryption; Permission Controls; Validation; Firewalls; Data Filters; Routing Control; Proxy Servers; Access Lists; Visas |

Table 1: Summary of Revised Ontology of Security

Relationship Between Security and Resilience

The repeated use of overlapping, common and/or synonymous terminology within the current literature highlights the close relationship between security and resilience. However, there are particular attributes reflected in the literature that highlight important differences between security and resilience terminology. As described earlier, security has a brittle, or binary, connotation where a resource is either secure, or it is not. For example, while the physical security qualities of a safe or security container may be very strong, once the security of the safe is compromised or penetrated, the contents of the safe are exposed, and no longer considered secure.

Conversely, resilience has a spectral connotation, where a resource can lose security, but be resilient to varying degrees. For example, a designated resource may be held in a safe or security container located inside a locked room or vault, which in turn is inside a building to which access is controlled by a system of swipe cards, passwords and biometric inspection. Additionally, the designated resource held within the safe or security container could be fitted with a Radio Frequency Identification Device, so that any movement would be immediately detected and tracked. The designated resource is therefore covered by ‘layered defence’. Jackson and Ferris describe layered defence as addressing the ‘Swiss cheese model’, where the more layers there are, the more resilient the system will be (Jackson and Ferris 2013). Such a model is a contemporary application of the layered suite of defensive measures in a mediaeval castle that separate the least important outer areas from the more important inner areas, providing a degree of resilience back towards an impregnable, or secure, ‘Keep’.
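The contrast between binary security and spectral resilience can be illustrated with a minimal Python sketch. It is an illustration only and not part of the framework: the names `SecurityServices`, `is_secure` and `layers_remaining` are hypothetical, and counting intact layers is an assumed stand-in for a 'degree' of resilience, which the discussion above does not quantify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityServices:
    """Outcome of the three security services in Table 1 for one action."""
    authenticated: bool  # authentication: the entity has been identified
    attributed: bool     # attribution: the entity is known to perform the action
    accessible: bool     # access control: the action is permitted


def is_secure(services: SecurityServices) -> bool:
    """Binary view: security holds only if all three services hold."""
    return services.authenticated and services.attributed and services.accessible


def layers_remaining(layers_intact: list[bool]) -> int:
    """Spectral view (assumed measure): count of defensive layers not yet breached.

    Layers are listed from the outermost (e.g. building access) to the
    innermost (e.g. the safe); True means the layer is still intact.
    """
    return sum(layers_intact)


# A single safe that has been penetrated leaves nothing in reserve ...
print(layers_remaining([False]))                     # 0
# ... whereas building and vault breached, safe and tracking tag intact:
print(layers_remaining([False, False, True, True]))  # 2
```

Under these assumptions, the first function can only answer yes or no, while the second returns a value that falls as each layer is lost, mirroring the castle example.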
The binary nature of security demands that security practitioners consider what could happen after security has failed; that is, after a security breach. Hathaway argues that ‘we need to be adaptive in the face of a persistent, evolving and all-pervasive threat’, and that it is ‘critical that organisations are aware of and understand the threat so that it can be quickly confronted and effectively managed when it arrives’ (Hathaway 2009). By using the term ‘when’ rather than ‘if’ a threat arrives, Hathaway’s statement clearly articulates that the threat will arrive, and implies that consideration of post-event actions is critical. Here, Hathaway is clearly referring to unknown threats, as a comprehensive risk assessment and understanding of known threats will influence pre-event system design and posture. However, as it is not possible to implement proactive security systems to counter all unknown threats, organisations must be able to withstand a security breach and reactively maintain business continuity. That is, the nominated state of security must be maintained.

The distinction between proactive and reactive measures is reinforced by Baskerville et al., who describe a security incident as being the pivot between prevention and response (Baskerville et al. 2014). While exclusively focussed on the electronic domain, their argument that organisations must balance preventative and responsive measures based upon their unique circumstances reinforces the critical nature of resilience to an organisation’s post-event security environment, and the key distinction between security and resilience.

It is the occurrence of a security breach that enables the important distinction between security and resilience. Security is brittle, addresses known threats, and has application before a security breach, while resilience is spectral, addresses unknown threats, and has application after security is lost.

PROPOSED RESILIENCE FRAMEWORK

Current Definitions of Resilience

The term resilience is used in many different fields. In the context of nature and the environment, the Center for Resilience at the Ohio State University defines resilience as: ‘the capacity of complex systems to survive, adapt, evolve and grow in the face of turbulent change’ (Ohio State University 2010). At an organisational level, a 2010 presentation to an Australian public sector conference defined a resilient organisation as being: ‘one that is able to achieve its business objectives and realise opportunities, even in the face of adversity’ (Whitehorn 2010). Similarly, Standards Australia has published two definitions of resilience as being: ‘the ability to resist being affected or to recover from an event’ (Standards Australia 2006); and ‘the adaptive capacity of an organization in a complex and changing environment’ (Standards Australia 2010). These definitions of resilience each possess the common theme of an organisation resisting, adapting and/or recovering following an adverse event.

Jackson and Ferris describe resilience as being ‘an attribute of a system, not an add-on to a system’, and distinguish between inherent resilience and engineered resilience. They note that as an inherent property of an entity, resilience refers to the recovery from a disturbance, whereas in engineered systems, ‘resilience involves a wide range of potential threats and system responses, both pre-emptive and post-event’ (Jackson and Ferris 2013). Jackson and Ferris cite the US National Security Strategy definition of resilience as:

‘… the ability to adapt to changing conditions and prepare for, withstand, and rapidly recover from disruption.’ (The White House 2010)

In this definition, ‘adapt’ is described as meaning to restructure before or during an encounter with a threat; ‘avoid’ means to eliminate contact between the system and the threat and to suffer no damage or disruption to functionality from the threat; ‘prepare for’ means to engineer the system in advance to enable recovery following an encounter with a threat; ‘withstand’ means to retain partial or full functionality following an encounter with a threat; and ‘recover’ means to retain or restore partial or full functionality following an encounter with a threat. (The White House 2010)

Other commentators have also taken a systems view of resilience, with Gibson and Tarrant describing resilience as ‘being founded on good risk management, being multidimensional, and existing over a spectrum’ (Gibson and Tarrant 2010). They specifically describe resilience as ‘being less about bouncing back from adversity, and more concerned with adaptive capacity and determining how to better understand and address uncertainty in our internal and external environments’ (Gibson and Tarrant 2010). This description is contrary to the other definitions of resilience which imply elements of resistance, adaptability and recovery, while maintaining the capacity of an organisation to achieve its core operational objectives.

The Security Risk Management Body of Knowledge (SRMBOK) defines resilience as: ‘the ability of an organisation, individual, or community to minimise the harmful or deleterious consequences of disruptive events and to use the event as a trigger to strengthen and develop’ (Talbot and Jakeman 2009). The SRMBOK further describes resilience as being ‘the capacity to anticipate risk, limit impact, and bounce back rapidly’, and that resilience is ‘the ultimate objective of both economic security and organizational competitiveness’ (Talbot and Jakeman 2009). The SRMBOK also states that resilience is ‘less about creating hardened structures and rigid protections of relying on standard procedures and more about developing a flexible, responsive, and adaptable way of thinking, behaving, and dealing with the impact of change within both the external and internal environment’ (Talbot and Jakeman 2009).

The SRMBOK theme of resilience including responses to unexpected changes or adverse events is consistently reflected throughout the literature. For example, DeBardeleben et al. define resilience as: ‘the ability of a system to keep applications running and maintain an acceptable level of service in the face of transient, intermittent and permanent faults’ (DeBardeleben et al. 2010). Similarly, Dahms describes resilience as: ‘an organisation’s state of being resulting from the management of uncertainty in a complex adaptive system’ (Dahms 2010). Dahms further argues that an indicator of this state of being is ‘an organisation’s adaptive capacity, and that uncertainty is the concept linking risk management, corporate governance and resilience’ (Dahms 2010). This again illustrates the critical role of governance, and closely ties resilience to the resolution of unknown threats that have not been addressed in security planning and preparations.
The World Health Organisation also takes a systems view of resilience, defining resilience in a 2007 report as:

‘the degree to which a system continuously prevents, detects, mitigates or ameliorates hazards or incidents.’ (World Health Organization 2007)

Another United Nations entity, the United Nations Office for Disaster Risk Reduction, presents a similar view of resilience, defining it as: ‘the ability of a system, community or society exposed to hazards to resist, absorb, accommodate to and recover from the effects of a hazard in a timely and efficient manner, including through the preservation and restoration of its essential basic structures and functions’ (United Nations Office for Disaster Risk Reduction 2009). This definition includes the further comment that ‘resilience means the ability to “resile from” or “spring back from” a shock’ (United Nations Office for Disaster Risk Reduction 2009). The ability to quickly recover from an unforeseen event (that is, an event caused by an unknown threat) is determined by the degree to which any system or organisation is able to learn from past experiences for better future protection and to adapt and improve risk reduction measures.

With the exception of Gibson and Tarrant, the element of recovery is a common theme among the definitions and descriptions of resilience in the current literature. Similarly, the literature includes a common theme of security-related resilience as existing on a spectrum and having application following an event, or a security breach. There is also an evident risk management theme to the listed definitions of resilience, which are consistently related to the ability of an organisation to resist and/or respond positively to uncertainty or an unexpected adverse event.

Other Resilience Terms

In addition to the definitions of resilience, the literature includes several other constituent and standalone resilience terms. Gibson and Tarrant describe four broad strategic approaches that can be taken to build improved resilience:

Resistance. Aimed at improving robustness and hardening the organisation to withstand the immediate effects that volatility may impose.
Reliability. Aiming to ensure that key functions, resources, information and infrastructure continue to be available, accessible and fit-for-purpose following an event.
Redundancy. Provides for one or more alternatives to day-to-day operational approaches.
Flexibility. Enables the organisation to adapt to extreme circumstances and sudden shocks that often exceed the design parameters for the other strategies.
(Gibson and Tarrant 2010)

Resistance relates to the ability to maintain a state of security in the presence of a threat. Often used in a manner that is synonymous with security, a fundamental aspect of resistance is to ‘design networks and develop software to minimise the opportunity for threat actors to attack the organisation’s network’ (Nyberg 1988). The Gibson and Tarrant reference to withstanding the immediate effects of a security breach has clear application to resilience. The Gibson and Tarrant definition of redundancy also has a clear relationship with availability, reliability, and flexibility, each of which is a state rather than an approach to achieving resilience.
The Gibson and Tarrant definition of redundancy is very similar to that proffered by Jackson and Ferris, who described functional redundancy as a principle of resilience, defined as having ‘two or more different ways to perform a critical task’ (Jackson and Ferris 2013). Redundancy is also closely related to the concepts of limiting the impact of a security breach, and maintaining essential services.

Like Gibson and Tarrant, Woods defines four essential characteristics of resilience as follows:

Capacity. The ability of the system to survive the threat.
Flexibility. The ability of the system to adapt to the threat.
Tolerance. The ability of the system to degrade gracefully in the face of the threat.
Cohesion. The ability of the system to act as a unified whole in the face of the threat.
(Woods 2007)

Woods’ inclusion of flexibility has an emphasis on adaptability. Yet adaptability has a broader application that enables an organisation or system to apply the lessons learned following a security breach and to realise any procedural or technical improvements that may be required to address any identified vulnerabilities. The continued emphasis on post-event adaptability further highlights the critical role of governance in the design and execution of resilience measures. Based upon its definition, Woods’ characteristic of capacity could be renamed survivability. However, Woods’ description of cohesion is a new concept that seeks to ensure that all security and resilience measures are applied in a cooperative manner in the face of threat(s). This is also clearly a governance function. Woods’ characteristic of tolerance is also closely related to the concept of survivability and is aimed at limiting the impact of a security breach.

A 2001 NIST publication groups underlying security services according to their primary purpose of support, prevent and recover (NIST 2001). In the context of this discussion the supporting security services could be described as governance functions, the prevention security services could be described as security services, and the detection and recovery security services are clearly related to resilience. However, while not specifically defined by NIST, the listed constituent elements of auditing, intrusion detection and containment, proof of wholeness (integrity), and restore the secure state would benefit from additional consideration. As discussed earlier, auditing is considered to be a governance function that contributes to policy compliance. Similarly, integrity is an attribute of a system, rather than an action that can be applied to achieve resilience. This additional consideration leaves intrusion detection and containment, and restore the secure state as relevant elements of a generic resilience ontology.

More recently, the NIST Framework for Improving Critical Infrastructure Cybersecurity described five concurrent functions of:

Identify. Develop the organizational understanding to manage cybersecurity risk to systems, assets, data and capabilities.
Protect. Develop and implement the appropriate safeguards to ensure delivery of critical infrastructure services.
Detect. Develop and implement the appropriate activities to identify the occurrence of a cybersecurity event.
Respond. Develop and implement the appropriate activities to take action regarding a detected cybersecurity event.
Recover. Develop and implement the appropriate activities to maintain plans for resilience and to restore any capabilities or services that were impaired due to a cybersecurity event.
(NIST 2014)

While deliberately focussed on the electronic domain, this contemporary framework and its composite definitions provide a useful insight into broader thinking and research regarding security and resilience. The definition of protect implies the implementation of proactive pre-event measures that highlight the close yet distinct relationship between security and resilience. The definitions of identify and detect also illustrate the close relationship between identification and detection. Similarly, while recovery would no doubt be a part of any respond function, the NIST definition of recover includes specific reference to restoring capabilities or services to the pre-event state.

The recent Rahman and Choo literature survey of information security incident management handling and response also found a lack of consistency in post-event terminology and approaches. After reviewing and analysing standards and guidelines from the Computer Emergency Response Team Coordination Centre, the NIST, the European Union Agency for Network and Information Security, the SANS Institute, the Information Technology Infrastructure Library, and other works, they adopted the NIST solution (Cichonski et al. 2012) to conclude that the four main phases for incident handling are preparation; detection and analysis; incident response (including containment, eradication and recovery); and post-incident (Rahman and Choo 2015). While these phases were not explicitly defined, as in the NIST Framework for Improving Critical Infrastructure Cybersecurity, preparation is a proactive measure that is more closely linked to security. However, detection and analysis; containment, eradication and recovery; and post-incident actions are useful measures that have broader application and utility to resilience, beyond the electronic domain.

Summary of Current Resilience Terminology

The key resilience terminology as evident in the literature can be broadly grouped under the headings of detection, containment and resolution, as illustrated in Table 2.

| Detection | Containment | Resolution |
|---|---|---|
| Breach detection | Breach containment | Accommodation |
| Identification | Absorption | Restoration |
| | Limitation of Impact | Rapid Bounce-Back |
| | Maintenance of Basic Functions | Correction |
| | Survivability | Recovery |
| | Resistance | Eradication |
| | Redundancy | Adaptability |

Table 2: Summary of Key Resilience Terms

REDEFINING RESILIENCE

The ultimate aim of resilience is to maintain a particular state of security for a designated resource. With that in mind, and building upon the security ontology developed in previous work, a new definition of resilience is proposed as:

Resilience is the maintenance of the nominated state of security.

where the nominated state is a specific condition that is determined through a security risk assessment. Security is breached once the nominated state has been changed, and resilience is the ability to redress the change to maintain the nominated state; that is, to restore the nominated state of security.

As discussed earlier, resilience has application after a security breach has occurred. Resilience implies that security must be restored in a timely manner following a security breach. However, for security to be restored, the security breach must first be detected. This could involve the triggering of an alarm system, a physical witness to an intrusion, or the discovery of a failure of a perimeter. Once a security breach is detected, the impact of the security breach on the designated resource must then be contained. This could involve the deployment of a quick reaction force, or the lock-down of a facility or asset to prevent further exploitation. Finally, the security breach must then be resolved in order to remove any residual threat elements and restore the nominated state of a designated resource. Resolution could involve a counter-attack; the capture and/or eviction of intruders; or the repair or reinforcement of a perimeter to restore a previously established level of security. A governance process would also be required to conduct a comprehensive review of the event and to identify appropriate lessons learned to be factored into revised (that is, adapted) security measures.
In summary, the restoration of security requires the detection, containment and resolution of any security breach. That is, any failure of one or more of the security services of authentication, attribution or access control must be detected, contained and resolved. Or more succinctly: The nominated state of a designated resource is resilient when a security breach can be detected, contained and resolved.
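The ordering described above (a breach must be detected before it can be contained, and contained before it is resolved) can be sketched as a small state machine. This is a hedged illustration in Python rather than a prescribed implementation: the class names, the `OCCURRED` starting stage and the choice to raise an error on an out-of-order step are all assumptions introduced for the example.

```python
from enum import Enum


class BreachStage(Enum):
    """Assumed stages of a breach; only the last three come from the text."""
    OCCURRED = 0
    DETECTED = 1
    CONTAINED = 2
    RESOLVED = 3


class SecurityBreach:
    """Tracks one breach through detection, containment and resolution."""

    def __init__(self, description: str) -> None:
        self.description = description
        self.stage = BreachStage.OCCURRED

    def _advance(self, expected: BreachStage, new: BreachStage) -> None:
        # Enforce the ordering: each step requires the previous one.
        if self.stage is not expected:
            raise ValueError(
                f"cannot move to {new.name} from {self.stage.name}"
            )
        self.stage = new

    def detect(self) -> None:
        self._advance(BreachStage.OCCURRED, BreachStage.DETECTED)

    def contain(self) -> None:
        self._advance(BreachStage.DETECTED, BreachStage.CONTAINED)

    def resolve(self) -> None:
        self._advance(BreachStage.CONTAINED, BreachStage.RESOLVED)

    @property
    def nominated_state_restored(self) -> bool:
        """True once the breach has been detected, contained and resolved."""
        return self.stage is BreachStage.RESOLVED


breach = SecurityBreach("failure of a perimeter")
breach.detect()    # e.g. an alarm is triggered
breach.contain()   # e.g. the facility is locked down
breach.resolve()   # e.g. the perimeter is repaired
print(breach.nominated_state_restored)  # True
```

Calling `contain()` on an undetected breach raises `ValueError` in this sketch, which is one way of expressing that containment presupposes detection.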

The detection, containment and resolution of failures of authentication, attribution and/or access control can be collectively described as resilience. The base definition of resilience therefore can be decomposed further by examining the detail of detection, containment and resolution. These three functions can collectively be considered as presenting a robust and hardened approach to security events. So, the definition of resilience can be completely elaborated to be:

Resilience is achieved when a security breach is detected, contained and resolved.

Setting or establishing the levels of detection, containment and resolution is a specific function of governance that will be informed by a comprehensive security risk assessment.

A PROPOSED RESILIENCE ONTOLOGY

Having proposed a new definition for resilience, it is appropriate to consider how this definition fits into a resilience ontology. The literature does not include a structured resilience ontology. Nor does the literature include concepts such as resilience services, or resilience mechanisms. However, there is clear utility in the development of a generic resilience ontology that has application across the physical, electronic and personnel domains. It is considered useful to provide a definition of a resilience service that closely reflects the nature of the proposed definition of resilience. To that end, and building on the security ontology developed previously:

A resilience service is a process that, alone or in combination with others, maintains the nominated state of security.

Since the proposed definition states that resilience is maintained when a security breach is detected, contained and resolved, it follows that detection, containment and resolution are appropriate resilience services.

Detection

The first step required of any system in the event of a security breach is the detection of that breach.
The Concise Oxford Dictionary defines detection as:

‘the action or process of identifying the presence of something concealed.’ (Oxford 1993)

Similarly, the United Nations’ World Health Organisation defines detection as being:

‘an action or circumstance that results in the discovery of an incident.’ (World Health Organization 2007)

These definitions are very similar to the definitions and concepts of resilience proffered by John Johnson in his landmark 1958 publication ‘Analysis of Image Forming Systems’, where detection is defined as:

‘the ability to say that something of interest is present in the image’.

However, Johnson also highlights the important relationship between detection and the functions of recognition and identification, where recognition is defined as:

‘the ability to determine the class of object present, such as a car or aircraft.’

and identification is defined as:

‘the ability to determine the type of object present, such as the make of car or the type of aircraft.’ (Johnson 1958)

Adopting a similar approach to Johnson, Tucker et al. define detection as being:

‘an indication of a state change within a network or host.’ (Tucker et al. 2007)

Like Johnson, Tucker et al. describe important relationships between detection and the other functions of recognition and identification, where recognition is defined as being the ability of a system to declare:

‘the type of attack, such as Distributed Denial of Service (DDoS), reconnaissance, or User to Root (U2R).’

and identification is defined as being the capability of a system to declare:

‘the exploits used to achieve the intrusion, such as buffer overflow or an application-specific vulnerability.’ (Tucker et al. 2007)

More recently in the electronic domain, the NIST recognised the importance of identification as part of detection in their definition of the detect function as being to:

‘develop and implement the appropriate activities to identify the occurrence of a cybersecurity event.’ (NIST 2014)

As a result of the preceding discussion, a definition of detection is proposed as follows:

Detection is the recognition and identification of a security breach.

Additionally, recognition and identification (of a security breach) are considered to be resilience mechanisms subordinate to the resilience service of detection where, as defined by Johnson, recognition relates to the class of a security breach, and identification relates to the type of a breach.

Containment

Once a security breach is detected, the next step is to contain the security breach before a system is overwhelmed or irreparable damage is caused.
The Concise Oxford Dictionary defines containment as being:

‘the action of keeping something harmful under control or within limits.’ (Oxford 1993)

Sridharan et al. discuss both temporal and spatial containment, where containment is defined as setting the circumstances where an error or security breach:

‘cannot propagate outside its associated scope.’ (Sridharan et al. 2008)

The Open Group define containment as being:

‘an organization’s ability to limit the breadth and depth of an event; for example, cordoning off the network to contain the spread of a worm.’ (The Open Group 2009)

In the context of risk containment, the World Health Organisation defines containment as being the:

‘Immediate actions taken to safeguard from a repetition of an unwanted occurrence.’ (World Health Organization 2007)

More recently, Patrick Kral of the SANS Institute described containment as the ability of a system to: ‘limit the damage and prevent any further damage from happening’. (Kral 2011) The definitions are clearly consistent with respect to limiting the impact of a security breach. Cichonski et al. argue that a key element of limiting the impact of a security breach is decision-making, where shutting down a system or disabling certain functions can limit the impact of a security breach (Cichonski et al. 2012). The reference to decision-making again highlights the importance of governance in the execution of resilience measures.

The ability to absorb the initial effect(s) of a security breach and survive any lasting effect(s) is implicit in the above definitions of containment. These two notions, absorption and survival, were also prevalent in the definitions of resilience listed earlier.

As a result of the preceding discussion, a definition of containment is proposed as follows:

Containment is the act of controlling a security breach.

Additionally, absorption, survivability, and impact limitation are considered to be resilience mechanisms subordinate to the resilience service of containment.

Resolution

Following the containment of a security breach, the final step in the delivery of resilience is the resolution of the security breach, including the restoration of security. The Concise Oxford Dictionary defines resolution as: ‘the action of solving a problem or contentious matter.’ (Oxford 1993) Cichonski et al. list eradication and recovery as the final stages in the resolution of a security breach, executed ‘in a phased approach so that remediation steps are prioritized’. (Cichonski et al. 2012) While not specifically defining either term, they describe eradication as being: ‘necessary to eliminate components of the incident, such as deleting malware and disabling breached user accounts, as well as identifying and mitigating all vulnerabilities that were exploited’. (Cichonski et al. 2012) Cichonski et al. further describe recovery as involving: ‘such actions as restoring systems from clean backups, rebuilding systems from scratch, replacing compromised files with clean versions, installing patches, changing passwords, and tightening network perimeter security (e.g., firewall rulesets, boundary router access control lists)’. (Cichonski et al. 2012)

The Open Group describe resolution as including remediation and recovery, where remediation is defined as: ‘an organization’s ability to remove the threat agent.’ and recovery is defined as: ‘the ability to bring things back to normal.’ (The Open Group 2009)

These definitions are very similar and comprise the consistent themes of recovery, correction, normalisation, and eradication. Apart from eradication, these themes can be grouped as the restoration of a previously held state following a security breach.

As a result of the preceding discussion, a definition of resolution is proposed as follows:

Resolution is the application of a solution to a security breach.

Additionally, restoration and eradication are considered to be resilience mechanisms subordinate to the resilience service of resolution.

RESILIENCE MECHANISMS

Applying the same logic used during the derivation of the security ontology in previous work, it follows that:

A resilience mechanism is an activity that, alone or in combination with others, contributes to the provision of a resilience service.

Resilience mechanisms for each of the three resilience services are derived directly from Table 2. For detection, the resilience mechanisms are recognition and identification; for containment, the resilience mechanisms are absorption, survivability and impact limitation; and for resolution, the resilience mechanisms are eradication and restoration. The function performed by each resilience mechanism is as follows:

Recognition: Recognition is a resilience mechanism that determines the class of a security breach.

Identification: Identification is a resilience mechanism that determines the type of a security breach.

Absorption: Absorption is a resilience mechanism that enables the initial effect of an event to be tolerated and normal operations to continue, despite a security breach.

Survivability: Survivability is a resilience mechanism that enables any lasting effect(s) of an event to be tolerated, and includes realising objectives in the face of adversity, and maintaining an acceptable level of service.

Impact Limitation: Impact Limitation is a resilience mechanism that constrains the consequences of an event.

Eradication: Eradication is a resilience mechanism that removes any residual threat agent.

Restoration: Restoration is a resilience mechanism that enables the re-establishment of a previous level of security, and includes recovery and rapid bounce-back.
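The hierarchy of services and mechanisms listed above can be sketched as a small data structure. The following Python fragment is illustrative only: the names `RESILIENCE_ONTOLOGY` and `service_for` are hypothetical and are not part of the proposed framework. It records which mechanisms are subordinate to each resilience service and looks up the service to which a given mechanism contributes.

```python
from typing import Dict, Tuple

# Resilience services mapped to their subordinate resilience mechanisms,
# as listed in the text. The structure and names are illustrative assumptions.
RESILIENCE_ONTOLOGY: Dict[str, Tuple[str, ...]] = {
    "detection": ("recognition", "identification"),
    "containment": ("absorption", "survivability", "impact limitation"),
    "resolution": ("eradication", "restoration"),
}


def service_for(mechanism: str) -> str:
    """Return the resilience service that the named mechanism contributes to."""
    wanted = mechanism.strip().lower()
    for service, mechanisms in RESILIENCE_ONTOLOGY.items():
        if wanted in mechanisms:
            return service
    raise KeyError(f"unknown resilience mechanism: {mechanism!r}")


if __name__ == "__main__":
    for name in ("Recognition", "Survivability", "Restoration"):
        print(f"{name} -> {service_for(name)}")
```

A flat mapping is used here because the ontology has only two levels below resilience itself; a deeper hierarchy would probably call for a tree of objects instead.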
Resilience tools and governance functions are likely to be used in combination to implement any or all resilience mechanisms that are required to contribute to the provision of the resilience service.

Summary of the Proposed Ontology of Resilience

Table 3 summarises the preceding discussion as a hierarchical resilience ontology of definitions supported by a hierarchy of resilience services and resilience mechanisms.

Resilience is achieved when
a security breach is …        detected,         contained,           and resolved.

Resilience Services           Detection         Containment          Resolution

Resilience Mechanisms         Recognition       Absorption           Eradication
                              Identification    Survivability        Restoration
                                                Impact Limitation
Table 3: Summary of Revised Ontology of Resilience

Resilience Mechanisms: Conditions and Means

As noted earlier, setting the required level of each security service and resilience service is a specific governance function that will be derived from the threat assessment and risk management processes within an organisation. Like the security services, each resilience service should be described in both functional and physical terms: first in terms of the conditions required by the organisation for that service, and then in terms of the possible physical means by which the service can be implemented. The organisation has the freedom to choose the condition, based on a security risk assessment, and for any chosen condition there will be a range of options for the means of implementation.

By way of example, consider the resilience service of containment following a physical intrusion into an access-limited building. Responders must first decide what conditions they wish to place on the degree of containment to be enforced by the system. Clearly, those responsible for the building may not want an intruder to have free access to the entire building and its contents, and may seek to contain the intrusion to a particular section, floor, or even room of the building. The possible means of providing the service must then be considered. A high degree of containment can be achieved if the exact location of the intruder is known, and individual rooms, sections and floors can be ‘locked down’. Those responsible for the building must then conduct a risk analysis of any second or third order effects of implementing a resilience mechanism. If, for example, decision makers chose to continue normal operations – that is, absorb the intrusion – they must accept that they cannot guarantee that the intruder will not gain access to other areas of the building and further erode resilience.
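The separation between a condition (a governance decision) and a means (a physical implementation) in the building example can be sketched in code. The fragment below is a minimal sketch under stated assumptions: the building layout, the `ContainmentCondition` class, the scope values, and the `rooms_to_lock_down` function are all hypothetical, and the fall-back to a building-wide lock-down when the intruder's location is unknown is an assumption made for illustration rather than a rule stated in the framework.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical building layout (floor -> rooms); not taken from the paper.
BUILDING: Dict[str, List[str]] = {
    "ground": ["lobby", "mail room"],
    "first": ["office 1", "office 2", "server room"],
}


@dataclass(frozen=True)
class ContainmentCondition:
    """Governance decision on how tightly an intrusion is to be contained."""
    scope: str  # assumed values: "room", "floor", or "building"


def rooms_to_lock_down(condition: ContainmentCondition,
                       intruder_room: Optional[str]) -> List[str]:
    """One possible means: choose the rooms to lock for a given condition."""
    all_rooms = [room for rooms in BUILDING.values() for room in rooms]
    # Assumption: if the location is unknown, fine-grained containment is not
    # possible, so the sketch falls back to locking down the whole building.
    if intruder_room is None or condition.scope == "building":
        return all_rooms
    if intruder_room not in all_rooms:
        raise ValueError(f"unknown room: {intruder_room!r}")
    if condition.scope == "room":
        return [intruder_room]
    if condition.scope == "floor":
        for rooms in BUILDING.values():
            if intruder_room in rooms:
                return list(rooms)
    raise ValueError(f"unsupported scope: {condition.scope!r}")


if __name__ == "__main__":
    print(rooms_to_lock_down(ContainmentCondition("floor"), "server room"))
    print(rooms_to_lock_down(ContainmentCondition("room"), None))
```

The condition is held in its own object so that the governance decision can change without touching the code that implements the means, which mirrors the functional and physical split described above.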
As with security, therefore, there are two governance aspects for each resilience service: the conditions placed on the service, and the selection of an appropriate means. Additionally, the consideration of one resilience service may be entirely dependent on the other services. For example, the containment of a particular security breach is contingent upon that breach being discovered by the detection service. Conversely, the resolution of a security breach may be independent of detection and containment, such as in an automatically over-writing website. However, resolution may also be dependent upon the breach being detected, in a reactive manner, and/or potentially contained before resolution can occur. Using the previous example of an intrusion into an access-controlled building, the resolution of the security breach – that is, the removal of the intruder – would require the intrusion to be detected, and the containment of the intruder to a particular known location, so that security staff can arrest and remove the intruder from the building.

CONCLUSION AND FUTURE WORK

Resilience is an important concept and quality that is applied to address security breaches that were not anticipated in the design and implementation of security measures. Resilience is different to security in that resilience is spectral, addresses unknown threats, and has application after security is lost; while security is brittle, addresses known threats, and has application before a security breach.

Building on previous work by the authors, a new ontology of resilience terminology and definitions, as summarised in Table 3, is proposed. The definitions are presented in a hierarchy developed by functional decomposition from the base definition of resilience. A new definition for resilience is proposed as follows:

Resilience is the maintenance of the nominated state of security.

Using the same functional decomposition approach as used in previous work to develop definitions of security services and security mechanisms, a new definition for a resilience service can be developed:

A resilience service is a process that, alone or in combination with others, maintains the nominated state of security.

Similarly, a new definition of a resilience mechanism can also be developed as:

A resilience mechanism is an activity that, alone or in combination with others, contributes to the provision of a resilience service.

These definitions encapsulate the intent and meanings of current resilience terminology, and are therefore not in conflict with current usage. Like the security ontology developed previously, the resilience terms developed here are applicable across the electronic, physical, and personnel security domains.

The next stage of this research will use quantitative analysis to demonstrate the utility of the new resilience ontology. Simulation will be used to develop a mathematical probabilistic model that will enable the assessment of specific resilience mechanisms and tools in the face of particular threats. In conjunction with ongoing refinement of the security framework developed earlier, the quantitative representation will complete the technical aspect of the security and resilience framework under development.

Additionally, the framework will be incomplete without the incorporation of the governance aspects of security and resilience. As described earlier, nominating the desired state of security, designating the resource(s) to be secured, limiting time, constraining financial and other resources, conducting cost-benefit analyses, and applying lessons learned are critical governance functions that must be addressed as part of this framework.

REFERENCES

Baskerville, R., Spagnoletti, P., and Kim, J. (2014), 'Incident-centred information security: Managing a strategic balance between prevention and response', Information and Management, 51 (1), 138-51.

Cichonski, P., Millar, T., Grance, T., and Scarfone, K. (2012), 'Computer Security Incident Handling Guide: Recommendations of the National Institute of Standards and Technology', in National Institute of Standards and Technology (ed.), (Special Publication 800-61 Revision 2; Washington DC: US Department of Commerce).

Dahms, T. (2010), 'Resilience and Risk Management', The Australian Journal of Emergency Management, 25 (2), 21-26.

DeBardeleben, N., Laros, J., Daly, J., Scott, S., Engelmann, C., and Harrod, B. (2010), 'High-end Computing Resilience: Analysis of Issues Facing the HEC Community and Path-Forward for Research and Development'.

Gibson, C.A., and Tarrant, M. (2010), 'A 'conceptual models' approach to organisational resilience', The Australian Journal of Emergency Management, 25 (2), 6-12.

Hathaway, M.E. (2009), 'Speech to RSA Conference', (San Francisco).

Jackson, S., and Ferris, T.L.J. (2013), 'Resilience Principles for Engineered Systems', Systems Engineering, 16 (2), 152-64.

Johnson, J. (1958), 'Analysis of image forming systems', Image Intensifier Symposium (US Army Engineering Research Development Laboratories, Fort Belvoir, USA).

Kral, P. (2011), 'The Incident Handlers Handbook', (The SANS Institute).

NIST (2001), 'Underlying Technical Models for Information Technology Security', in Gary Stoneburner (ed.), (Special Publication No. 800-33: National Institute of Standards and Technology).

--- (2014), 'Framework for Improving Critical Infrastructure Cybersecurity Version 1.0'.

Nyberg, K.A. (1988), 'Denial of service flaws in SDI software - an initial assessment', Fourth Aerospace Computer Security Applications Conference, 22-29.

Ohio State University, 'What is Resilience?', accessed 13 August 2010.

Oxford (1993), 'The New Shorter English Dictionary', (New York: Oxford University Press).

Rahman, Nurul Hidayah Ab and Choo, Kim-Kwang Raymond (2015), 'A survey of information security incident handling in the cloud', Computers and Security, 49, 45-69.

Sridharan, V., Liberty, D.A., and Kaeli, D.R. (2008), 'A taxonomy to enable error recovery and correction in software', Workshop on Quality-Aware Design.

Standards Australia (2006), 'Security Risk Management', (HB 167:2006; Sydney: Standards Australia).

--- (2010), 'Business continuity – Managing disruption-related risk'.

Talbot, J., and Jakeman, M. (2009), Security Risk Management Body of Knowledge [online text], John Wiley & Sons, Inc.

The Open Group (2009), 'Risk Taxonomy', (C081; Berkshire: The Open Group).

The White House (2010), 'National Security Strategy', in The White House (ed.), (Washington, DC).

Thompson, M., Ryan, M., and McLucas, A. (2012), 'Security Systems Engineering: Using Functional Decomposition to Resolve a Confused Taxonomy', INCOSE International Symposium (Rome).

Tucker, C.J., Furnell, S.M., Ghita, B.V., and Brooke, P.J. (2007), 'A new taxonomy for comparing intrusion detection systems', Internet Research, 17 (1), 88-98.

United Nations Office for Disaster Risk Reduction (2009), Terminology on Disaster Risk Reduction [online text], United Nations International Strategy for Disaster Reduction.

Whitehorn, G. (2010), 'Building Organisational Resilience in the Public Sector', Comcover Insurance and Risk Management Conference (National Portrait Gallery Canberra).

Woods, D.D. (2007), 'Essential Characteristics of Resilience', in Erik Hollnagel, David D. Woods, and Nancy Leveson (eds.), Resilience engineering: Concepts and precepts (Aldershot: Ashgate Publishing, Ltd), 21-34.

World Health Organization (2007), 'Report on the Web-Based Modified Delphi Survey of the International Classification for Patient Safety', in World Alliance for Patient Safety (ed.), (Geneva).

BIOGRAPHIES

Marcus Thompson is a Brigadier in the Australian Army with extensive experience in communications and information systems. He holds bachelor and masters degrees in engineering, business and strategy, and is currently undertaking doctoral research with the University of New South Wales at the Australian Defence Force Academy.

Dr Mike Ryan is a senior lecturer at the University of New South Wales at the Australian Defence Force Academy. He holds bachelor, masters and doctor of philosophy degrees in engineering, and his research interests include project management, systems engineering, requirements engineering and military communications and information systems. He is the author or co-author of nine books, three book chapters, and over a hundred technical papers.

Dr Alan McLucas is a senior lecturer at the University of New South Wales at the Australian Defence Force Academy. He holds bachelor, masters and doctor of philosophy degrees in engineering, management and operations research, and has had extensive experience in management, complex problem solving, and strategy development. Alan is widely published in the systems thinking and system dynamics modeling literature and is the author of two books on these subjects.