Recalibrating Software Reliability Models
Sarah Brocklehurst, P Y Chan, Bev Littlewood
Centre for Software Reliability, City University, Northampton Square, London EC1V 0HB
John Snell
Computer Science Department, City University, Northampton Square, London EC1V 0HB
Abstract In spite of much research effort, there is no universally applicable software reliability growth model which can be trusted to give accurate predictions of reliability in all circumstances. Worse, we are not even in a position to be able to decide a priori which of the many models is most suitable in a particular context. Our own recent work has tried to resolve this problem by developing techniques whereby, for each program, the accuracy of various models can be analysed. A user is thus enabled to select that model which is giving the most accurate reliability predictions for the particular program under examination. One of these ways of analysing predictive accuracy, which we call the u-plot, in fact allows a user to estimate the relationship between the predicted reliability and the true reliability. In this paper we show how this can be used to improve reliability predictions in a completely general way by a process of recalibration. Simulation results show that the technique gives improved reliability predictions in a large proportion of cases. However, a user does not need to trust the efficacy of recalibration, since the new reliability estimates produced by the technique are truly predictive and so their accuracy in a particular application can be judged using the earlier methods. The generality of this approach would therefore suggest that it be applied as a matter of course whenever a software reliability model is used.
(NASA-CR-186407) RECALIBRATING SOFTWARE RELIABILITY MODELS (City Univ.) 36 p CSCL 09B N90-19763 Unclas G3/61 0270364
1 Introduction

The earliest attempts to measure and predict the reliability of software occurred about twenty years ago. In spite of considerable research work in the intervening years, there is still no definitive method or model which can be universally recommended as 'best'. Perhaps this should not be surprising. Estimating and predicting software reliability is not easy. Perhaps the major difficulty is that we are concerned primarily with design faults.
This situation is very different from that tackled by the conventional hardware reliability theory. Here the dramatic advances of the past quarter century have come from concentration on the random processes of physical failure. Thus, we now have a good understanding of how the reliabilities of complex hardware systems depend upon, on the one hand, the detailed system structure, and on the other, the reliabilities of the constituent components. The very success of this physical theory, however, is now revealing the importance of design faults to the reliability of complex hardware systems. Our ability to use, for example, intelligent strategies to minimise the effects of physical failure of components on overall system reliability results in a higher proportion of failures being caused by flawed designs. Such flaws in hardware systems are very similar to software faults: they represent the result of human misunderstandings. It seems likely, as a result of this, that obtaining good methods for measuring the effect of such flaws on hardware system reliability will be as difficult as measuring software reliability.
Software has no significant physical manifestation. Software failures are merely inherent design faults revealing themselves under appropriate operational circumstances. These faults will have been resident in the software since their creation in the original design or in subsequent changes. We do not currently have good theories of how such faults come into being. Presumably the theories would require better understanding of the human problem solving processes involved in writing software; if so, we should perhaps look to the social and psychological sciences, rather than physics, for solutions. In view of the comparative lack of success of these sciences in arriving at quantitative understanding, it would be wise not to expect any dramatic breakthrough in the short term.

These difficulties notwithstanding, there have been important advances in software reliability modelling recently. In fact, there is now a plethora of models from which the user can choose in order to make reliability estimates and predictions. However, none of these has been shown to be applicable in all circumstances, and we are not presently able to decide a priori which would be the most appropriate to use in a particular context. This presents difficulties for a potential user, who is solely interested in obtaining reliability measures in which he/she can have confidence.

Our own recent work [1] has attempted to tackle this problem by devising techniques whereby the accuracy of past predictions produced by several models on a particular data source (program) can be analysed. The intention is that a user would apply such models to each data source and select the model which has so far performed best by giving the most accurate predictions on that data source. It would then be sensible, in the absence of any other information, to use that model for the next prediction. This 'horses for courses' approach obviates the need for a priori selection of a model: instead, each data source is provided with its own 'best' model. Indeed, this 'best' model may change as more data is collected.

The methods work by analysing the closeness between predicted and actual failure behaviour. In particular, they provide information about two especially important types of prediction error, which we call bias (or ill-calibration) and noise (or variability). The key idea in the present work is that this knowledge of the nature of past errors can be used to improve future predictions. The new techniques to be described here are quite general and are not model-dependent. They will be shown to be effective in improving predictive accuracy in a high proportion of cases, but users need not take this efficacy on trust: just like any other model, their predictive accuracy in a particular case can be analysed using our earlier techniques [1].
2 Reliability growth and predictive accuracy

In its simplest form, the software reliability growth problem concerns random variables $T_1, T_2, \ldots, T_n$, representing the execution times between successive failures as a program is being debugged. It is generally assumed that attempts are made at each failure to fix the fault which caused the failure. Models vary in the way that they represent this fault-finding and fixing operation: details of different approaches can be found elsewhere [1, 8, 15].
At stage $i$, when observations $t_1, t_2, \ldots, t_{i-1}$ have been made of the first $i-1$ inter-failure times, the objective is to predict future failure behaviour represented by the unobserved random variables $T_i, T_{i+1}, \ldots$ Informally, the prediction problem is solved if we can accurately estimate the joint distribution of any finite subset of $T_i, T_{i+1}, \ldots$ This statement, however, begs the question of what we mean by 'accurately', and it is this issue which forms a major part of our earlier work [1]. In practice, of course, a user will be satisfied with much less than a complete description of all future uncertainty. In many cases, for example, it will be sufficient to know the current reliability of the software under examination. This could be presented in many different forms: the reliability function, $P(T_i > t)$; the current rate of occurrence of failures (ROCOF) [3]; the mean (or median) time to next failure (mttf). Alternatively, a user may wish to predict when a target reliability, perhaps to be used as the criterion
for termination of testing, will be achieved.

If we accept that prediction is our goal, it can be seen that the usual discussion of competing software reliability growth 'models' is misleading. We should, instead, be comparing the relative merits of prediction systems. A prediction system which will allow us to predict the future ($T_i, T_{i+1}, \ldots$) from the past ($t_1, t_2, \ldots, t_{i-1}$) comprises:

(i) the probabilistic model which specifies the distribution of any subset of the $T_j$'s conditional on a (unknown) parameter $\alpha$;
(ii) a statistical inference procedure for $\alpha$ involving use of available data (realisations of $T_j$'s);
(iii) a prediction procedure combining (i) and (ii) to allow us to make probability statements about future $T_j$'s.

Of course, the model is an important part of this triad, and it seems unlikely that good predictions can be obtained if the model is not 'close to reality'. However, a good model is not sufficient: stages (ii) and (iii) are vital components of the prediction system. In fact disaster can strike at any of the three stages.

In principle, it ought to be possible to analyse each of the three stages separately so as to gain trust in (or to mistrust) the predictions. Unfortunately, it is our experience that this is not possible. There are several reasons.
In the first place, the models are usually too complicated for a traditional 'goodness-of-fit' approach to be attempted. This should not surprise us: the goodness-of-fit problem is hard in the presence of unknown parameters even for independent identically distributed random variables. The reliability growth context is much worse because of non-stationarity: the simplest exponential order statistic model [14] does not allow this kind of analysis.

Secondly, properties of the estimators at stage (ii) are usually not available. Small sample distributions for maximum likelihood (ML) estimators are invariably hard to obtain, and it might be argued that we cannot even trust the usual asymptotic theory. For example, several models assume that the software contains only a finite number of faults. This implies an upper bound on the number of observable $T_j$'s. Of course, it could be argued that there is a proper approach to stages (ii) and (iii) in the Bayesian predictive framework (see [2]). Unfortunately, this does present analytical difficulties: it involves posterior distributions of the parameters which are usually impossibly hard to obtain for the popular models. However, recent advances in Bayesian numerical techniques [18], coupled with powerful computers, may change this picture for some of these models.

Finally, it might be argued that some models are 'obviously' better than others because of the greater plausibility of their underlying assumptions. We find this a dubious proposition. Certainly, it may be reasonable to discount some models a priori because their assumptions seem overly naive. However, this still leaves others which cannot be rejected. It is our belief that understanding of the processes of software engineering is so imperfect that we cannot even choose an appropriate model when we have an intimate knowledge of the software under study. At some future time it may be possible to match a reliability growth model to a program via the characteristics of that program, or even of the software development methodology used. This is not currently the case.
Where does this leave a user, who merely wants to obtain trustworthy reliability metrics for his current software project? Our view is that there is no alternative to a direct examination and comparison of the quality of the predictions emanating from different complete prediction systems.

In [1] we have described several ways in which this can be done, the most important tools being the u-plot and the prequential likelihood. The key idea in each case is that a comparison is made between what has been predicted and
what is (later) actually observed. We believe that this emulates how a user would informally gain confidence in a sequence of predictions.

For simplicity we shall concentrate on prediction of the next time to failure $T_i$, based on observations $t_1, t_2, \ldots, t_{i-1}$. The u-plot uses the predictor $\hat{F}_i(t)$, the estimate of the distribution function $F_i(t) = P(T_i < t)$, via

$$u_i = \hat{F}_i(t_i) \qquad (1)$$

where $t_i$ is the later-observed realisation of the random variable $T_i$. Thus $u_i$ is the probability integral transform of the observation $t_i$ using the predictive distribution function. If the sequence of predictions is good, the sequence $\{u_i\} = \{\hat{F}_i(t_i)\}$ should look like a random sample from a $U(0,1)$ distribution [1]. There are various types of departure from such an appearance; here we shall be concerned only with whether the $u_i$ values look uniformly distributed. We shall do this via the u-plot, which is the sample cumulative distribution function (cdf) of the $u_i$ sequence. The departure of this plot from the cdf of $U(0,1)$, the line of unit slope, is an indication of departure of the prediction system from accuracy. We can use the Kolmogorov distance, that is the maximum vertical deviation, as a measure of this departure, and use standard tables to determine whether or not it is statistically significant.

Figure 1 shows u-plots for the Jelinski-Moranda (JM) [10] and Littlewood-Verrall (LV) [13] models making predictions on a data set, called S1 [17], analysed in [1]. The plots are each based on 86 predictions: $\hat{F}_{51}(t)$ through $\hat{F}_{136}(t)$. The Kolmogorov distances are 0.205 (JM), which is significant at the 1% level, and 0.150 (LV), which is significant at 5%. This suggests that JM is making very poor predictions, whilst LV is also performing poorly but is somewhat superior to JM.

More importantly for our present purposes, the shape of the plots tells us that the JM predictions are too optimistic, whilst the LV predictions are too pessimistic. This can be seen as follows. The JM plot is everywhere above the line of unit slope (the true $U(0,1)$ cdf), so there are too many small $u_i$ values. This suggests that the model is consistently underestimating the chance of small times between failures, i.e. the model is too optimistic. A similar argument shows that a plot which is everywhere below the line of unit slope, such as LV, is too pessimistic.
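The u-plot and its Kolmogorov distance can be illustrated with a minimal Python sketch. It assumes the $u_i$ values of equation (1) have already been obtained from some prediction system; the function names `u_plot_points` and `kolmogorov_distances` are hypothetical, chosen only for illustration, and the sample values are invented rather than taken from the S1 data.

```python
from typing import List, Sequence, Tuple


def u_plot_points(us: Sequence[float]) -> List[Tuple[float, float]]:
    """Return the (u, height) step points of the sample cdf of the u values."""
    xs = sorted(us)
    n = len(xs)
    return [(u, (k + 1) / n) for k, u in enumerate(xs)]


def kolmogorov_distances(us: Sequence[float]) -> Tuple[float, float]:
    """Return (above, below): the largest vertical distance by which the
    u-plot lies above, and below, the line of unit slope.

    Assumes a non-empty sequence of u values in [0, 1].
    """
    xs = sorted(us)
    n = len(xs)
    # Just after the step at u the sample cdf is (k + 1) / n; just before it is k / n.
    above = max((k + 1) / n - u for k, u in enumerate(xs))
    below = max(u - k / n for k, u in enumerate(xs))
    return above, below


# Invented u values, bunched towards zero (too many small u_i).
us = [0.1, 0.2, 0.3, 0.4]
above, below = kolmogorov_distances(us)
distance = max(above, below)  # the Kolmogorov distance
side = "above" if above > below else "below"
print(f"Kolmogorov distance {distance:.3f}; largest deviation is {side} the line")
```

For these invented values the largest deviation lies above the line of unit slope, which is the pattern the text associates with optimistic predictions. Judging significance would still require standard Kolmogorov tables for the sample size in use.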
If we knew that these deviations between predicted and actual behaviour were consistent, we could attempt to measure the degree of optimism (or pessimism) and improve future predictions by taking account of this tendency. It is this idea which we shall develop in the next section. Before we do that, we shall briefly describe the prequential likelihood function (PL), which is a general mechanism for comparing the accuracy of prediction systems.

The PL is defined as follows. The predictive distribution $\hat{F}_i(t)$ for $T_i$ based on $t_1, t_2, \ldots, t_{i-1}$ will be assumed to have a probability density function (pdf)

$$\hat{f}_i(t) = \hat{F}_i'(t)$$

For predictions of $T_{j+1}, T_{j+2}, \ldots, T_{j+n}$, the prequential likelihood is

$$PL_n = \prod_{i=j+1}^{j+n} \hat{f}_i(t_i) \qquad (2)$$

A comparison of two prediction systems, A and B, over a range of predictions of $T_{j+1}, T_{j+2}, \ldots, T_{j+n}$, can be made via their prequential likelihood ratio

$$PLR_n = \frac{\prod_{i=j+1}^{j+n} \hat{f}_i^{A}(t_i)}{\prod_{i=j+1}^{j+n} \hat{f}_i^{B}(t_i)} \qquad (3)$$
Notice how, in a fashion analogous to the calculation of the u sequence, the individual contributions to the prequential likelihood are obtained by substitution of the later-observed realisation $t_i$ into the predictor pdf for $T_i$. Dawid [7] shows that if $PLR_n \to \infty$ as $n \to \infty$, prediction system B is discredited in favour of A. For the finite samples with which we inevitably have to deal, we shall argue that $PLR_n$ consistently increasing suggests the superiority of A over B.

In [1] we give intuitive reasons why the PL works. Specifically we show that consistent bias or noisiness of a prediction system will tend to give a smaller PL than would otherwise be the case.

To summarise, the PLR can be regarded as a general procedure for choosing the best prediction system for a particular data source. The u-plot is a means of indicating a particular kind of consistent inaccuracy of prediction which could be a contributory factor in poor predictive accuracy. Thus a poor u-plot might suggest that poor predictive accuracy (represented by a poor prequential likelihood) is due to consistent bias. For such a case, we shall show in the next section how it is possible to remove the bias and so improve the accuracy of reliability predictions.
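The prequential likelihood of equation (2) and the ratio of equation (3) can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in [1]: the names `log_prequential_likelihood`, `log_plr` and `exponential_pdf` are hypothetical, the predictive densities are taken to be exponential purely for simplicity, and the rates and failure times are invented. In a genuine prediction system each density $\hat{f}_i$ would be fitted using only $t_1, \ldots, t_{i-1}$.

```python
import math
from typing import Callable, Sequence

Pdf = Callable[[float], float]


def log_prequential_likelihood(pdfs: Sequence[Pdf], times: Sequence[float]) -> float:
    """Log of equation (2): sum of log f_i(t_i) over the predicted failures.

    pdfs[k] is the predictive density issued before times[k] was observed.
    """
    if len(pdfs) != len(times):
        raise ValueError("one predictive pdf is needed per observed time")
    return sum(math.log(f(t)) for f, t in zip(pdfs, times))


def log_plr(pdfs_a: Sequence[Pdf], pdfs_b: Sequence[Pdf], times: Sequence[float]) -> float:
    """Log of equation (3): positive values favour system A over system B."""
    return (log_prequential_likelihood(pdfs_a, times)
            - log_prequential_likelihood(pdfs_b, times))


def exponential_pdf(rate: float) -> Pdf:
    """Assumed (illustrative) predictive density: exponential with given rate."""
    return lambda t: rate * math.exp(-rate * t)


# Invented inter-failure times and two invented prediction systems.
times = [1.0, 2.0, 4.0, 8.0]
system_a = [exponential_pdf(r) for r in (1.0, 0.5, 0.25, 0.125)]  # rates fall over time
system_b = [exponential_pdf(1.0) for _ in times]                  # constant rate

# Track how the log PLR evolves as more predictions are compared.
for n in range(1, len(times) + 1):
    print(n, round(log_plr(system_a[:n], system_b[:n], times[:n]), 3))
```

Working with logarithms avoids numerical underflow when many small density values are multiplied, and a log PLR that keeps increasing with n corresponds to the consistently increasing $PLR_n$ described above.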
3 Recalibration of predictions

Consider a prediction $\hat{F}_i(t)$ of the random variable $T_i$, when the true (unknown) distribution is $F_i(t)$. Let the relationship between these be represented by the function $G_i$ where

$$F_i(t) = G_i[\hat{F}_i(t)] \qquad (4)$$

Obviously, if we knew $G_i$ we could recover the true distribution of $T_i$ from the inaccurate predictor, $\hat{F}_i(t)$. The key notion in our recalibration approach is that in many cases the sequence $\{G_i\}$ is approximately stationary, i.e. it is only changing slowly in $i$.

If the sequence were completely stationary, i.e. $G_i = G$ for all $i$, we would have a more precise interpretation of the idea of 'consistent bias' used in the previous section. We would also have the possibility of estimating the common $G$ from past predictions and using it to improve the accuracy of future predictions.

Of course, such complete stationarity is unlikely to be achieved. However, in practice it does seem to be the case that the sequence changes only slowly in many cases. This opens up the possibility of approximating $G_i$ with an estimate $G_i^*$ and so forming a new prediction

$$\hat{F}_i^*(t) = G_i^*[\hat{F}_i(t)] \qquad (5)$$

A suitable estimator for $G_i$ is suggested by the fact that $G_i$ is the distribution function of $U_i = \hat{F}_i(T_i)$. [...] calculated from predictions [...] formed from the $u_j$'s for $j$