This is a preprint of an article published in Computers & Security:
http://dx.doi.org/10.1016/j.cose.2016.09.008
Efficiently Computing the Likelihoods of Cyclically Interdependent Risk Scenarios

Steve Muller (a,b,c), Carlo Harpes (a), Yves Le Traon (b), Sylvain Gombault (c), Jean-Marie Bonnin (c)

a itrust consulting s.à r.l., {steve.muller, harpes}@itrust.lu
b University of Luxembourg, [email protected]
c Telecom Bretagne, {sylvain.gombault, jm.bonnin}@telecom-bretagne.eu
Abstract
Quantitative risk assessment provides a holistic view of risk in an organisation, which is, however, often biased by the fact that risk shared by several assets is encoded multiple times in a risk analysis. An apparent solution to this issue is to take all dependencies between assets into consideration when building a risk model. However, existing approaches rarely support cyclic dependencies, although assets that mutually rely on each other are encountered in many organisations, notably in critical infrastructures. To the best of our knowledge, no author has provided a provably efficient algorithm for computing the risk in such an organisation, notwithstanding that some heuristics exist. This paper introduces the dependency-aware root cause (DARC) model, which is able to compute the risk resulting from a collection of root causes using a poly-time randomised algorithm, and concludes with a discussion on real-time risk monitoring, which DARC supports by design.
Keywords: Cyclic dependencies, cyclic causal graphs, risk analysis, risk assessment, quantitative assessment, dependency graph.
1. Introduction
Risk management constitutes an important aspect of decision making, especially if the outcome is uncertain or has a large-scale impact on an organisation, which is why it forms a basis for many information security standards, including
ISO/IEC 27xxx [1]. Risks can be evaluated in two ways [2]: qualitatively and quantitatively. Furthermore, some authors have suggested combining [3] or converting [4] both methods to get better results, but this topic is beyond the scope of this paper.

In a qualitative assessment, risk scenarios are identified and then estimated in terms of probability and impact on a discrete (and often abstract) scale consisting of a few values, such as `low', `normal', `high'. A previously defined set of unacceptable tuples ⟨probability, impact⟩ permits one to distinguish the risk scenarios for which counter-measures need to be implemented (so as to reduce risk), i.e. the scenarios that are critical for an organisation.
(Figure 1 table: a frequency × impact matrix with frequency levels `very often', `often', `normal', `rarely' as rows and impact levels `low', `normal', `high', `critical' as columns; each cell is marked acceptable or unacceptable.)
Figure 1: Sample table that can be used in a risk analysis. White cells denote acceptable, black cells unacceptable risk scenarios.

In contrast, quantitative assessments focus more on the potential damage that is inflicted on an organisation, e.g. in financial terms. So instead of qualifying a risk scenario as above, its likelihood and impact are expressed numerically; for example, by stating that scenario X is estimated to occur every 5 years and, when it does, it causes a loss of 10 k€, leading to an expected loss of 2 k€ per year. Note that, unlike above, quantitative risk analyses provide an integral view of the risk faced by an organisation, since all scenarios can be inspected and compared to one another at once, thanks to the numerical value of the expected losses. Consequently, the urgency of securing a specific asset can be readily deduced, which is not so obvious to achieve in qualitative analyses.

A major drawback of many risk assessment methods is the fact that the
determined risk is biased; indeed, assets often share the same risk scenarios¹, which are included in the risk analysis of each of these assets individually. Although this is the way to go when considering each asset on its own, the risk scenario in question is going to be accounted for multiple times in the global risk analysis, the outcome of which thereby becomes distorted. It is thus sensible to eliminate any redundancy from the risk assessment, for instance by assigning each scenario to the most related asset. However, by proceeding so, the separate view on an individual asset is no longer complete, since it does not take care of every possible risk scenario. To counteract this behaviour, several authors [5] [6] [7] [8] [9] [10] incorporate the assets and/or risk scenarios, together with their interdependencies, into a hierarchical graph, based on which they deduce the risk for an asset, a group of assets or the whole organisation by reading off all subordinate risk scenarios.
Figure 2: Flaws and strengths of the several risk assessment models (holistic view vs. individual view for the naive analysis, the analysis with redundancies removed, and the dependency model). Check marks (X) indicate correct outcome.

However, hardly any risk assessment model supports cyclic dependencies, although cycles exist in every (sub-)system where the compromise of one component affects the whole (sub-)system. This is especially true in the context of Industrial Control Systems (ICS), where cascading effects can be devastating; for instance, power availability and a voltage control system (requiring electricity to work) constitute a common example of interdependent assets.
1 For illustration, take a server hosting a critical service. The well-functioning of the responsible software is threatened on the one hand by vulnerabilities (bugs, security flaws) of the service itself, but also, on the other hand, by any down-time of the server.
Cases from the non-ICS realm include a web service with a database (containing the administrator password) and an administration interface (permitting the database to be read out); this setting is further elaborated in Figure 3.
Figure 3: Simple example of a cyclic dependency graph represented by the model introduced in this paper. `x → y' means `x can lead to y'. This illustration depicts a poorly designed web service hosting confidential and valuable data (e.g. medical information), where the administrator can change any user password and can retrieve any of the regularly made backups of the user account database. The nodes comprise the user database, the admin web interface, the web interface and the backup location, together with incidents such as password disclosure, modified credentials, XSS, SQL injection, unauthorized access, `users cannot connect', confidential data leak and database dump disclosure. Note the dependency cycle `admin interface → backup location → user database' (dotted lines).

Some authors [11] [6] propose a solution to deal with cyclic dependency graphs using graph unfolding techniques, but they fail at providing a complexity analysis for their methods. This paper introduces a novel approach for computing the risk faced by cyclically dependent assets. Indeed, the proposed algorithm is based on a randomised (non-deterministic) simulation and, in contrast to other algorithms, is provably efficient (which is important when doing real-time risk monitoring).

Section 2 presents related papers dealing with (cyclic) dependencies in risk assessments. The risk model used by the algorithm is defined in Section 3, along with the algorithm itself, whereas the proof of its correctness and running time can be found in the Appendix. The conducted experiments and their results are presented in Section 4, Section 5 deliberates a generalisation of the model to also support more complex dependencies, and the paper closes with a conclusion in Section 6.
2. Related work
Breier [7] encodes information security assets and their dependencies in a directed graph, where edges denote causal relations between nodes. The model supports the use of logical and (for assets that depend on all parents) and or (for assets that depend on one of the parents) operations. In the risk computation, dependencies manifest themselves as added-value impact to the risk of dependent assets.

Aubigny et al. [12] establish a risk ontology for highly interdependent (critical) infrastructures, based on the estimation of quality of service (QoS). The proposed model supports risk prediction and incorporates a data structure which allows QoS information to be shared among interconnected infrastructures.

Xiaofang et al. [13] classify assets into three layers, viz. business, information and system. On the lowest layer, risk is computed traditionally as risk = impact × likelihood. Dependencies appear in the model as weighted impact added to the risk of dependent higher-level assets.
MAGERIT [14] is a risk assessment methodology supported by the Spanish government which also deals with asset dependencies by embedding them into a graph. Assets are characterized by their security objectives, viz. confidentiality, integrity and availability, which are linked together whenever they have an impact on each other. The methodology does not specify an explicit way to perform the risk assessment, though.

Fenz et al. [15] use a very abstract approach by establishing a framework describing in detail how to semi-automatically generate a Bayesian network (thus encoding dependencies in a causal graph) from an ontology.

Rahmad et al. [16] aim at improving on MAGERIT by combining it with thoughts from [15]. A major difference consists in the fact that they use an exhaustive list of threat scenarios instead of security objectives, which considerably increases the size of the model.

Baiardi and Sgandurra [17] provide a methodology (Haruspex) based on attack trees that permits the likelihood of a threat to be deduced. The link to
asset dependency is that, in a certain sense, causal relations can be deduced from the attack path of an intruder compromising one asset after another in order to reach his goal. The model, which is based on a Monte Carlo simulation, requires a lot of parameters and estimates before it can be used.

McQueen et al. [18] propose a methodology to deduce an expected time-to-compromise (representing risk) from attack paths which encode dependencies between services in a SCADA environment.

Utne et al. [9] focus on cascading effects in critical infrastructures by analysing the interdependencies between several high-level services (such as electricity) and their impact on society. They also provide a framework to quantify the several magnitudes involved in the risk assessment, allowing an explicit computation of risk. Most interestingly, the risk itself is expressed as the expected number of people affected by an incident.

Wang et al. [6] present an algorithm for computing probabilities of occurrence of cyclic dependencies, but they do not analyse the running time, which is generally exponential and thus only practical for small or sparse graphs.

Homer et al. [11] adapt the concepts and algorithms known from Bayesian networks to the realm of risk assessments and apply a graph unfolding technique to generalise the model for cyclic dependency graphs. They do not guarantee any running time bounds, either.
3. Modelling the Risk Analysis Context
3.1. Risk Assessment

The context of a risk analysis is primarily determined by the set of assets, which can be virtually anything of value to a company or institution. In terms of risk, each asset is characterized by its impact on business when one of its (security-related) properties can no longer be guaranteed. Such properties are called security aspects in the following, and include² most notably the three

2 Note that on the one hand, some of these aspects might not be applicable to certain assets. On the other hand, it is sometimes sensible to add further properties, or to be more specific about existing ones, especially if the impact considerably changes when doing so. For instance, one usually wants to distinguish between temporary unavailability, causing business interruption, and permanent loss, which may be fatal to business.
notions [19] below.

Confidentiality: the assumption that sensitive information is known only to a well-defined group of people.

Integrity: the property that an asset is guaranteed to remain in a well-defined state.

Availability: the property that an asset can be accessed in the way that was previously defined.
The impact itself is expressed in financial terms; this approach has the notable advantage that estimates have an objective meaning and can thus be easily compared to each other. More precisely, the impact is defined to be the financial damage caused by the threatened security aspect, or the amount of money necessary to recover back to the original state. That way, one can compute the loss expectancy (LE), which represents the total losses to be expected in a given period, typically a year:

    LE = Σ_{s : risk scenario} impact(s) · likelihood(s),          (1)

where likelihood(s) denotes the number of times that scenario s is expected to occur in the given period (i.e., the expected frequency), a quantity usually estimated by a risk assessor, a methodology or a tool. This paper will present an algorithm for efficiently computing the likelihood function, see Section 3.5.
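To make Equation (1) concrete, here is a minimal Python sketch (written for this text, not part of the original methodology; the scenario names and figures are purely illustrative) that computes the loss expectancy from estimated impacts and likelihoods.

```python
# Minimal sketch: loss expectancy (LE) as in Equation (1).
# Impacts are in EUR, likelihoods in occurrences per year (1/y).
scenarios = {
    # scenario name: (impact in EUR, likelihood in 1/y) -- illustrative values
    "fire": (100_000, 1 / 10),        # once every 10 years
    "disk failure": (10_000, 1 / 5),  # once every 5 years
}

def loss_expectancy(scenarios):
    """Sum of impact(s) * likelihood(s) over all risk scenarios, in EUR/y."""
    return sum(impact * likelihood for impact, likelihood in scenarios.values())

print(loss_expectancy(scenarios))  # 12000.0 EUR per year
```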
3.2. Dependencies

A considerable flaw in many risk management models is the lack of understanding of asset dependencies. Indeed, consider a hard disk hosting valuable data (and suppose, for the sake of the example, that no back-up is available),
then a hard disk failure does not only require the physical disk to be replaced (which is cheap), but also implies the complete loss of core data (which may be business-ending). Consequently, dependency-unaware models do not give disk health monitoring the attention it deserves. In fact, asset dependencies represent nothing other than relations of cause and consequence of security incidents on given assets. Causal graphs are thus a natural candidate for encoding these relationships in a mathematical model. Recall that a causal graph on a vertex set of events is a directed graph such that two events are linked whenever one causes the other.
3.3. Likelihoods and Probabilities

Whereas many approaches found in the literature (e.g. [7], [20], [21]) use an abstract scale (like `low', `medium', `high') for describing the likelihood of an event, this paper relies on concrete physical magnitudes that can be used directly in a computation.
Magnitude     Unit    Description
Impact        €       Damage faced if threat occurs
Likelihood    1/y     Expected occurrence per year
LE            €/y     Expected loss per year
Figure 4: Table summarizing the notions involved in the assessment of risk.

A reason why one tends to prefer an abstract scale over a number is that precise values are rarely known, so only a rough estimate can be given. However, to make the assessment task easier, one can still restrict to a given set of discrete values to choose from for orientation, and interpolate to express slight nuances. One possible such mapping is given in Figure 5, but it can (and has to) be adapted to the setting in question.

Note the fundamental difference between `likelihood' and `probability'. Probabilities are only meaningful when characterising a random event a priori: it happens or not with a certain chance. Likelihoods, however, express the a posteriori statistical occurrence of events over time.
level        likelihood        /y
very low     every 30 years    0.0333
low          every 10 years    0.1
moderate     every 3 years     0.333
high         once a year       1
very high    once a month      12

Figure 5: One possible mapping of an abstract scale to a concrete likelihood.
Mathematically, probabilities are unit-less and lie within [0, 1], whereas likelihoods can be arbitrarily large and have 1/time as unit.
Both notions, `probability' and `likelihood', can easily be confused, because they are somewhat related. It is meaningless to say that a risk scenario occurs with a certain probability, though. Indeed, consider the statement `there is a 10% chance of fire' and suppose that the damage caused by fire is 100 k€; then³ it is clearly not obvious to determine the expected damage over a given period. If, however, one estimates the likelihood of fire to be on average once every 10 years, then the expected damage is trivially 10⁵ € · 1/(10 y) = 10 k€/y.
The model introduced in this paper makes use of both notions, that is, on the one hand, the likelihood of an incident or risk scenario, and on the other hand, the probability that it entails another (dependent) incident. In order to distinguish between the two, write P for the probability measure and L for the likelihood.
3.4. The Dependency-Aware Root Cause (DARC) Model

Pick a set of nodes V, each representing a security incident. A possible choice is to opt for V ⊆ A × S, where A is the set of all assets and S the set of security properties. For instance, S := {C, I, A} could comprise confidentiality,
3 Is it 10% per day? Per year? How can one then express events that are estimated to occur very often, like every day? Keep in mind that probabilities have an intrinsic upper bound of 100%.
integrity and availability threat scenarios. In that case, for readability, write a.s instead of (a, s) ∈ A × S to denote the security aspect s of an asset a.

Let E ⊆ V × V be the set of (directed) edges such that (α, β) ∈ E iff a security incident of α has an impact on incident β. For readability, write α → β instead of (α, β) if the set of edges E is understood from the context. Illustrating the notation, the example above can be rewritten in a very short and intuitive form: HDD.A → Data.A, which reads `if the availability of the hard drive is compromised, then so is the availability of any data stored on it'. The tuple (V, E) constitutes a directed graph which is henceforth referred to as the (asset) dependency graph. It is not required to be acyclic.
Whereas manual estimation of the full probability distribution of the several incidents is theoretically possible, statistical experiments in real-world systems are usually infeasible, for it would require the simulation of threats on business processes. Instead, the proposed model uses a slightly simplified approach by basing itself on estimating the probability that a particular incident causes another, independently of possible other causes⁴. Formally, this describes a mapping p : E → [0, 1], where p(α → β) denotes the probability that β is entailed by α. Note that this is not the same as P[β|α], since β could also be entailed by other events. p(α → β) is often denoted P[β | do(α)] or P[β | set(α)] in the literature [22].

Define a root cause to be a vertex without parent nodes in the dependency graph. These are the causes for which an explicit likelihood needs to be specified later on, whereas for non-root nodes it is deduced from the model. An example of a dependency graph is depicted in Figure 6. The edge weights represent the values of p.

4 In particular, the parent causes of an event are related by a boolean OR operation. The model can be extended to support arbitrary boolean formulas as well, see Section 5.
Figure 6: An example illustrating the representation of the model as a graph (nodes DoS, Router.I, Power.A, Server.A and Database.A; edge weights 0.1, 0.01, 0.5, 0.001 and 0.8). A and I stand for the availability and integrity properties of the assets, respectively. Edge weights encode the values of the probability map p. Note how the model does not make a difference between external threats (circled) and security properties of assets (boxed).
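As a minimal illustration of the data structures involved (not taken from the paper; the edge endpoints and weights below are hypothetical, merely inspired by the node names of Figure 6), such a dependency graph with its probability map p could be represented as follows.

```python
# Hypothetical DARC-style dependency graph: nodes are security incidents,
# edges (alpha, beta) carry the probability p(alpha -> beta) that alpha entails beta.
nodes = {"DoS", "Router.I", "Power.A", "Server.A", "Database.A"}

p = {
    # (cause, consequence): probability of entailment -- illustrative values
    ("DoS", "Router.I"): 0.1,
    ("Power.A", "Server.A"): 0.5,
    ("Server.A", "Database.A"): 0.8,
    ("Router.I", "Database.A"): 0.01,
}

# Root causes are vertices without parents in the dependency graph.
def root_causes(nodes, p):
    has_parent = {beta for (_, beta) in p}
    return nodes - has_parent

print(sorted(root_causes(nodes, p)))  # ['DoS', 'Power.A']
```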
3.5. Computing the probability distribution

The complexity of the DARC model lies in the fact that one is interested in the probability that a certain sequence of events cause each other, which is different from the problem of finding such a sequence. Indeed, for the former, one needs to determine all such sequences and compute the probability that one of them occurs. For this, simply listing those sequences and adding up their probabilities of occurrence yields wrong results, since some edges are accounted for twice.
3.5.1. The acyclic case

If the dependency graph is acyclic, well-established theory can be used to describe the full probability distribution. Indeed, by its definition, the dependency graph together with the associated probability distribution constitutes a Bayesian network. This case has been extensively studied [23], notably in [24] and [11]. This paper will thus focus on cyclic dependencies. It is important to note that, already in this simpler case, it is computationally infeasible to determine the full probability distribution of general Bayesian networks [25]. If one assumes further properties of the graph, efficient algorithms do exist, though [26]. A new approach will thus be required to support cyclic graphs as well.
3.5.2. The general case

For cyclically related security incidents, computing their likelihoods constitutes a more delicate problem than it seems.

Figure 7: A simple cyclic dependency graph (on the nodes A, B, C, D, E, F).

By the way the model is defined, an event (i.e., a node) is not triggered multiple times throughout the course of the experiment, but once and for all. For a concrete instance of the random experiment, the issue consists in finding all events that are activated by any of the root causes. See for example Figure 7: event
E can be caused by C or F, but, inspecting the situation in more detail, E is only caused by either of the two event chains A → B → C → E or F → E. In particular, E can only be triggered by C if C is not already triggered by the chain F → E → D → B → C.
By consequence, the probability that a risk scenario occurs cannot be expressed in terms of the probabilities of its direct parents only, but has to involve all cycle-free paths from a root node. The computational effort for enumerating all such paths can be huge⁵, which is also why all efforts to find an efficient deterministic algorithm failed. Instead, the randomized algorithm given in Algorithm 1 aims to give a good approximation; note that risk assessments as introduced in this paper do not require input data to be precise and are stable with respect to fluctuations. The running time of Algorithm 1 is polynomial in its input data.
5 In the worst case, namely in complete graphs, the running time is exponential in the
number of vertices.
Algorithm 1 Compute probability matrix

Input: Graph G = (V, E) with root nodes VR ⊂ V
Input: Probability map p : E → [0, 1]
Input: ε > 0, δ > 0
Output: Probabilities C : VR × V → [0, 1] that a root node causes a node, each value with absolute error at most ε. The algorithm will fail with probability at most δ.

1:  γ := ε / (1 + √ε)
2:  N := 6/(ε²γ) · ln(2n/δ)    where n := |V|
3:  for (vr, v) ∈ VR × V do
4:      C(vr, v) ← 0
5:  end for
6:  loop N times
7:      Sample a random graph G′ from G according to p
8:      for vr ∈ VR do
9:          for v ∈ V(G′) do
10:             if ∃ path in G′ from vr to v then
11:                 C(vr, v) ← C(vr, v) + 1/N
12:             end if
13:         end for
14:     end for
15: end loop
16: return C
More precisely, it is bounded by

    O(n · m · ln(2n/δ) · ε⁻³),

where n is the number of vertices, m is the number of edges, δ is the probability that the algorithm output is wrong and ε is an upper bound for the absolute error of the computed values. Observe the logarithmic dependency on δ, which permits amplifying the algorithm's accuracy without significantly increasing its running time. The proof of correctness, running time and error probability is given in the annex.
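For readers who prefer running code, the following Python sketch mirrors the structure of Algorithm 1 (it is an illustration written for this text, not the authors' implementation): each iteration samples a subgraph by keeping every edge (α, β) independently with probability p(α → β), and then marks every node reachable from each root via breadth-first search.

```python
import math
import random
from collections import defaultdict, deque

def compute_probability_matrix(nodes, p, roots, eps, delta):
    """Monte Carlo estimate of C(vr, v): probability that root vr causes node v."""
    n = len(nodes)
    gamma = eps / (1 + math.sqrt(eps))
    N = math.ceil(6 / (eps ** 2 * gamma) * math.log(2 * n / delta))
    C = defaultdict(float)

    for _ in range(N):
        # Sample a random subgraph: keep each edge independently with probability p(edge).
        adjacency = defaultdict(list)
        for (alpha, beta), prob in p.items():
            if random.random() < prob:
                adjacency[alpha].append(beta)

        # For every root, add 1/N to C(root, v) for each node v reachable in the sample.
        for root in roots:
            seen = {root}
            queue = deque([root])
            while queue:
                current = queue.popleft()
                for succ in adjacency[current]:
                    if succ not in seen:
                        seen.add(succ)
                        queue.append(succ)
            for v in seen:
                C[(root, v)] += 1 / N

    return C
```

Sampling and reachability checks are independent across iterations, which is what makes the parallel and interruptible execution discussed in Section 4 possible.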
3.6. Real-Time Risk Monitoring

The DARC model permits real-time⁶ risk monitoring in the sense that the likelihoods of the root nodes, which usually have to be estimated manually, can be automatically determined by external sources such as intrusion detection systems (IDS) or security information and event management (SIEM) appliances.

Observe that the function C : VR × V → R computed in Algorithm 1 only depends on the probability weights p encoded into the graph, but not on the likelihoods of the root causes. By consequence, since the model (including p) is not supposed to change during the risk monitoring phase, the above algorithm is only invoked once, namely after the model design phase. As such, C can be considered static.

Since C expects two arguments, it can be viewed as a two-dimensional matrix C ∈ R^(VR × V), where each row represents a root node ∈ VR and each column an arbitrary node ∈ V. In fact, an entry of C denotes the probability that a given root node (= row) entails any given node (= column).
6 Note that `real-time' denotes a process where an explicit bound on the running time is
known.
If the vector LR ∈ R^VR denotes the (estimated) likelihoods of the root causes, then the likelihoods L of all nodes can be computed as

    L = LRᵀ · C,

where · denotes matrix multiplication and LRᵀ the transpose of the vector LR. Moreover, let I ∈ R^V be the vector that holds the direct impact caused by each node. The global risk can finally be written as

    risk = LRᵀ · C · I.

In a more explicit fashion,

    risk = Σ_{vr ∈ VR} Σ_{v ∈ V} LR(vr) · C(vr, v) · I(v),

where C is computed from the model by Algorithm 1, the impacts I are manually estimated by a risk assessor and the root cause likelihoods LR are dynamically monitored by external sources (IDS, SIEM ...).
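The matrix formulation lends itself to a direct implementation; the sketch below (illustrative only, with made-up dimensions and values) combines a precomputed matrix C with monitored root-cause likelihoods LR and estimated impacts I using NumPy.

```python
import numpy as np

# Rows of C correspond to root causes, columns to all nodes.
# C[i, j] = probability that root cause i entails node j (output of Algorithm 1).
C = np.array([
    [1.0, 0.10, 0.001],    # hypothetical root cause "DoS"
    [0.0, 0.00, 0.500],    # hypothetical root cause "Power.A"
])

L_R = np.array([2.0, 0.1])               # monitored root-cause likelihoods, in 1/y
I = np.array([0.0, 5_000.0, 50_000.0])   # estimated direct impact per node, in EUR

L = L_R @ C          # likelihoods of all nodes, in 1/y
risk = L_R @ C @ I   # global loss expectancy, in EUR/y

print(L, risk)
```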
4. Experiments
The algorithm is constructed in such a way that it can be interrupted at any point, yielding a less precise, but complete solution. More precisely, if it is aborted after αN steps, for 0 < α < 1, then the relative error ε will increase at most by a factor α^(−1/3): this estimate follows directly from the definition of N (the number of simulation iterations) and is formally proven in Lemma 1 in the appendix. Moreover, since all simulations are run in an independent manner, they can perfectly be run in parallel (profiting from the multi-threading capabilities of a CPU) or in a distributed way.

To test the performance of the algorithm on `average' graphs, dependency graphs have been generated uniformly at random. A typical risk analysis may
(Plot of execution time against the number of nodes n, for n from 200 to 1,000.)

Figure 8: Execution time of Algorithm 1 in seconds, depending on the graph size n, with ε = 0.1 and δ = 0.01 and an average of 5 neighbours per node.
cover up to 50 different assets⁷, each of which generally encounters 35 threats⁸, so a related graph is composed of a few hundred nodes. It is sensible to assume that nodes are (on average) not connected to more than a few edges, so a typical graph will consist of a few thousand edges at most. The simulation was performed on a dual-core 2.5 GHz processor (i7-3537U). The results are depicted in Figures 8, 9, 10 and 11; as expected, they reflect the running time computed in Proposition 1 in the appendix. The precise numbers can be found in Table .1 in the appendix.

In order to compare the performance of Algorithm 1 to other approaches, similar experiments have been conducted for straightforward deterministic algorithms. For example, consider the simple recursive algorithm which conditions on the existence of each edge.

7 Based on experience from past risk analyses performed by itrust consulting.
8 Most often, these threats include the confidentiality, integrity and availability aspects of each asset, which can be further sub-divided (e.g. temporary unavailability vs. permanent loss).
(Plot of execution time against the relative error ε, for ε from 0.1 to 0.5.)

Figure 9: Execution time of Algorithm 1 in seconds, depending on the precision ε of the results, with n = 500 and δ = 0.01 and an average of 5 neighbours per node.
It relies on the mathematical observation that, for each edge e ∈ E,

    Pr[∃ path from v to w] = p(e) · Pr[∃ path from v to w | e] + (1 − p(e)) · Pr[∃ path from v to w | ¬e],

which can easily be turned into a recursive algorithm computing the probability that a path exists between any two nodes v and w. Unfortunately, such algorithms have exponential running time and take more than a few minutes already for small graphs (|V| ≥ 20, |E| ≥ 200).
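A sketch of this deterministic baseline (again an illustration written for this text, not the authors' code): condition on the first still-undecided edge, recurse with the edge forced present or absent, and stop as soon as the target is reachable through already-decided edges. Its running time is exponential in the number of edges.

```python
def path_probability(edges, p, v, w):
    """Exact Pr[there exists a path from v to w], by conditioning on each edge.

    edges: list of (source, target) pairs still undecided
    p:     dict mapping each edge to its probability of existence
    Runs in time exponential in len(edges).
    """
    def reachable(present, source, target):
        # Simple DFS over the edges currently forced to be present.
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            if node == target:
                return True
            for (a, b) in present:
                if a == node and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False

    def recurse(undecided, present):
        if reachable(present, v, w):
            return 1.0
        if not undecided:
            return 0.0
        e, rest = undecided[0], undecided[1:]
        return p[e] * recurse(rest, present + [e]) + (1 - p[e]) * recurse(rest, present)

    return recurse(list(edges), [])
```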
All other attempts to solve the problem in a deterministic way resulted in similarly bad execution times.
5. Extension to boolean formulas
The dependency graph is based on the concept of causality; that is, the parents of a node represent alternative causes, each of which can engender the consequential scenario.
Formally, the dependency relationship of a vertex v0
(Plot of execution time against the correctness parameter δ, for δ from 10⁻⁴ to 10⁻¹.)

Figure 10: Execution time of Algorithm 1 in seconds, depending on the correctness δ of the algorithm output, with n = 500 and ε = 0.1 and an average of 5 neighbours per node.
and its parent nodes Pv0 ⊂ V(G) can be expressed as

    ρ(v0) := ⋁_{x ∈ Pv0} Ix,

where Ix denotes the boolean variable encoding whether the event x occurs or not.

The beauty of Algorithm 1 lies in the fact that it does not depend at all on the topology of the graph or on the form of the dependencies. In fact, generalising the `or' relations to arbitrary boolean expressions ρ(·) is straight-forward and does not change the main lines or the proof of the algorithm; yet considerable adaptations have to be made in order to find whether a node is triggered or not (line 10 in the algorithm). For general boolean formulas the effort for computing the likelihoods is considerably higher, since a recursive search might no longer be possible (see e.g. Figure 12). A different theory, such as boolean satisfiability [27], is required in these matters.

Moreover, the comparatively good running time of Algorithm 1 was due to the fact that evaluating the likelihood (sc. finding all cycle-free paths) can be implemented in an efficient way.
For more complex boolean formulas this
may not necessarily be the case (indeed, the boolean satisfiability problem SAT is NP-complete⁹ [27]), so that deterministic (and possibly even error-free) algorithms could outperform the simulation variant.

(3D plot of execution time against the number of nodes n and the number of edges m.)

Figure 11: Execution time of Algorithm 1 in seconds, depending on the graph size n and m, with ε = 0.1 and δ = 0.01.
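As a minimal sketch of the generalisation discussed in this section (not part of the paper's evaluation; the node names and predicates are hypothetical): instead of the implicit OR over parents, each node can carry an arbitrary boolean predicate over the states of its parents, and the triggering check of line 10 becomes a least-fixed-point computation over the sampled subgraph. Nodes inside a cycle that cannot be entered from outside then stay untriggered, which corresponds to the `likelihood zero' convention mentioned in the caption of Figure 12.

```python
def triggered_nodes(root_events, formulas):
    """Least-fixed-point evaluation of which nodes get triggered.

    root_events: set of root causes that fired in this simulation round
    formulas:    dict mapping each non-root node to a predicate taking the set
                 of currently triggered nodes and returning True/False
    """
    triggered = set(root_events)
    changed = True
    while changed:
        changed = False
        for node, formula in formulas.items():
            if node not in triggered and formula(triggered):
                triggered.add(node)
                changed = True
    return triggered

# Illustrative formulas: an OR-node and an AND-node over hypothetical parents.
formulas = {
    "Server.A": lambda t: "Power.A" in t or "DoS" in t,           # OR, as in the basic model
    "Database.A": lambda t: "Server.A" in t and "Router.I" in t,  # AND, only in the extension
}

print(sorted(triggered_nodes({"Power.A", "Router.I"}, formulas)))
# ['Database.A', 'Power.A', 'Router.I', 'Server.A']
```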
6. Conclusion and Outlook
This paper provides a simple and lightweight approach for encoding asset dependencies into a graph structure. Since that graph is not assumed to be acyclic, the model can also be used in environments with interdependencies, such as Industrial Control Systems (ICS) or Critical Infrastructures (CI). The major contribution of this piece of work (apart from the DARC model) is Algorithm 1, which computes the resulting risk in such a graph in a provably efficient way. Indeed, as it turned out, any deterministic approach
9 Given an arbitrary boolean formula ρ on variables x1, . . . , xn, the SAT problem consists in determining whether there is an assignment α ∈ {0, 1}ⁿ such that ρ(x1 := α1, . . . , xn := αn) = 1.
Figure 12: Endless loop in dependencies for general boolean formulas (nodes A, B, X, Y and composite conditions A ∧ X and Y ∧ B): in fact, in order to evaluate A ∧ X, one needs to evaluate both parents, including X and thus Y ∧ B and Y. But Y can only be evaluated if A ∧ X is known already. It is not so clear how to proceed in such a case: one solution is to set the likelihood to zero for all non-reachable nodes, because in fact the cycle can never be entered; however, this might not be sensible in all use-cases.
that we could think of is computationally too complex¹⁰ to serve as a basis for any usable algorithm.

The DARC model was developed with the intention of creating a tool that continuously computes and monitors the current risk faced by an organisation, taking all dependencies into account. For now, it merely encodes the causal relationships between incidents (i.e. `A causes B'), so that a quantitative risk assessment can only be performed in a very basic way (risk = likelihood × impact); this approach has been discussed in Section 3.6. Therefore, the next steps consist in embedding the DARC model into a whole risk methodology, by including more fine-grained notions into the model (such as threat exposure, vulnerabilities or preventive measures). Doing so will also enable more sources of risk information to be integrated into the monitoring tool, for instance software agents that rate and report the performance of preventive security measures.
10 That is, the running time is exponential in the number of nodes and edges, which rapidly
becomes a problem already for small graphs (|V | ≥ 30).
7. Acknowledgements
This work was supported by the Fonds National de la Recherche, Luxembourg (project reference 10239425).
Appendix
Proposition 1. Algorithm 1 is correct with probability at least 1 − δ and terminates within time

    O(n · m · ln(2n/δ) · ε⁻³).

Moreover, each computed value lies within an interval of ±ε around the true value.

Proof. Fix a root node vr ∈ VR. For v ∈ V and 1 ≤ i ≤ N, let Xi(v) be the indicator variable that there is a path from vr to v in the i-th random experiment. Observe that

    (1/N) Σ_{i=1}^{N} E[Xi(v)] = E[X1(v)] = P[X1(v) = 1] =: µ(v),

that is, the quantity approximated by the algorithm (the left hand-side) equals the probability µ(v) that node v is reachable from vr. So if the random experiments do not deviate too much from their expectations, the algorithm output is correct up to a certain relative error, which is determined in the following.

Fix v ∈ V and let γ and N be defined as in the algorithm. Note that γ < ε < 1. Suppose for now that µ(v) ≥ γ. Using a two-sided Chernoff bound [28],

    P[ |Σ_{i=1}^{N} Xi(v) − N µ(v)| > N µ(v) ε ] ≤ 2 exp(−(ε²/3) N γ) ≤ δ/n²    (by the choice of N).

If however µ(v) ≤ γ < ε, using a one-sided Chernoff bound,

    P[ Σ_{i=1}^{N} Xi(v)/N > ε ]
        = P[ Σ_{i=1}^{N} Xi(v) > (1 + (ε − µ(v))/µ(v)) N µ(v) ]
        ≤ 2 exp(−((ε − µ(v))/µ(v))² N µ(v)/3)
        ≤ 2 exp(−(ε − γ)² N/(3γ))
        = 2 exp(−((ε − γ)/(εγ))² · 2 ln(2n/δ)).    (∗)

By the definition of γ it holds that ε − γ > εγ, and thus (∗) ≤ δ/n². To summarize, with probability at least 1 − δ/n², the following two statements hold:

(i) If µ(v) ≥ γ, then the relative error of the random experiment is at most ε; however, since µ(v) < 1, this also implies that the absolute error e(v) := |(1/N) Σ_{i=1}^{N} Xi(v) − µ(v)| is at most ε.

(ii) If µ(v) ≤ γ, then the absolute error e(v) is at most ε.

Note that the statements above hold for any fixed vertex v0 and any fixed root node vr,0. Using a union bound,

    P[∃ vr ∃ v : e(v) > ε] ≤ n² · P[e(v0) > ε] ≤ δ,

yielding the desired error probability for the algorithm.

Regarding the running time, note that the inner for-loop can be implemented (e.g. using a breadth-first search) in linear time O(m) for each of the at most n root nodes, whereas the sampling requires time O(m), resulting in a total execution time of O(n · m · N).

Lemma 1. For fixed δ, if Algorithm 1 is aborted after αN iterations, for 0