2013 13th International Conference on Quality Software
Bayesian Probabilistic Monitor: A new and efficient probabilistic monitoring approach based on Bayesian statistics

Yuelong Zhu (1), Meijun Xu (1), Pengcheng Zhang (1,2), Wenrui Li (3), Hareton Leung (4)
(1) College of Computer and Information, Hohai University, Nanjing 210098, P.R. China
(2) State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing 210093, P.R. China
(3) School of Mathematics & Information Technology, Nanjing Xiaozhuang University, Nanjing 211171, P.R. China
(4) Department of Computing, Hong Kong Polytechnic University, Hong Kong, P.R. China
Email: [email protected]
Pengcheng Zhang is the corresponding author.
Abstract—Modern software systems face increasing dependability requirements, which specify non-functional aspects of a system's correct operation. Probabilistic properties are usually used to formulate dependability requirements such as performance, reliability, safety, and availability. Probabilistic monitoring techniques, as an important assurance measure, have drawn increasing interest. Although several approaches have been proposed to monitor probabilistic properties, a general and efficient monitoring approach is still lacking. This paper puts forward a novel probabilistic monitoring approach based on Bayesian statistics, called Bayesian Probabilistic Monitor (BaProMon). By calculating the Bayes Factor, the approach checks whether the runtime information provides sufficient evidence to support the null or the alternative hypothesis. We give the corresponding algorithms and validate them via simulation-based experiments. The experimental results show that BaProMon can effectively monitor QoS properties, and also indicate that our approach is superior to other approaches.

Index Terms—Runtime monitoring; Probabilistic Properties; Bayesian Statistics; Web Services

I. Introduction

Modern software applications are increasingly embedded in an open world [2]. They are more and more required to adapt to highly dynamic, distributed and heterogeneous environments. Some software systems are composed dynamically from autonomous, platform-independent services provided by third parties. Moreover, the execution environment or context changes continuously at runtime. All of these factors introduce uncertainty, such as changes of service component interfaces, dynamic selection, and changes in component behavior itself. These kinds of uncertainty increase the possibility that services do not meet their dependability requirements, especially probabilistic quality requirements. Runtime monitoring focuses on verifying the correctness and reliability of software systems at runtime and on detecting shortcomings or anomalies, which may trigger adaptation, self-optimization, and reconfiguration [5]. Recently there has been increasing research on monitoring dependability requirements such as performance, reliability, and availability.

These requirements can generally be formulated as probabilistic quality properties [6]. Examples are "whenever a patient requests a service, the system will in 99.999 percent of the cases respond to the request within 30 seconds" and "the probability that the telecommunication system is unavailable for more than five minutes is less than once per year". However, existing probabilistic monitoring approaches either first estimate a probability and compare it with a predefined threshold θ, which lacks powerful statistical analysis [4], or follow classical statistical procedures such as Wald's Sequential Probability Ratio Test (SPRT) [18], which can hardly make a decision when the actual probability lies inside the indifference region and also requires the monitored probabilities to be constant during the whole monitoring run [13], [8], [7].

To deal with these problems, this paper proposes a novel probabilistic monitoring approach based on Bayesian statistics, called Bayesian Probabilistic Monitor (BaProMon). The approach builds on recent results in probabilistic model checking based on Bayesian sequential hypothesis testing [12]: it calculates the Bayes Factor to decide between two predefined hypotheses H0 and H1. Different from [12], however, BaProMon evaluates probabilities from sample execution paths during actual service execution, while their approach works on full probabilistic models at design time.
The contributions of this paper are summarized as follows:
• The formal syntax and semantics of a specification language called three-valued Probabilistic Linear Temporal Logic (PLTL3) are defined to precisely specify probabilistic properties;
• the first application of Bayesian sequential hypothesis testing to monitoring probabilistic properties, called BaProMon, is proposed;
• BaProMon and its corresponding algorithms are validated by a set of dedicated experiments. The experimental results show that BaProMon is more efficient than current probabilistic monitoring approaches.
The paper is structured as follows: Section II defines the formal syntax and semantics of PLTL3. Section III consists of three parts: basic concepts of the runtime monitoring technique, Bayesian statistical theory, and a series of statistical algorithms that realize the sequential probabilistic monitoring process. In Section IV, we perform a set of experiments to validate our Bayesian probabilistic monitoring approach. Section V compares related work. Section VI concludes the paper and outlines directions for future research.

II. Three-valued Semantics of Probabilistic Linear Temporal Logic (PLTL3)

Several probabilistic temporal logics have been proposed to specify probabilistic properties, such as PCTL (Probabilistic Computation Tree Logic) [9], PFTL (Probabilistic Frequency Temporal Logic) [17], and PTCTL (Probabilistic Timed CTL) [1], [14]. These probabilistic temporal logics were initially proposed as specification languages for probabilistic model checking and are used to specify that certain temporal properties hold within a predefined likelihood. In [3], however, Bauer et al. argue that branching-time logics cannot be used for monitoring purposes: model checking solves a language-inclusion problem, whereas runtime verification deals with the word problem. Moreover, unlike model checking, which collects hundreds of different execution paths to verify probabilistic properties, runtime monitoring only considers a single path. Consequently, inspired by [3], our algorithms verify properties of the system expressed as formulae in Probabilistic Linear Temporal Logic with three-valued semantics (PLTL3).

We first define the syntax and semantics of Probabilistic Linear Temporal Logic (PLTL). The syntax of PLTL is defined as follows:

Φ ::= true | atom | ¬Φ | Φ ∨ Φ | Φ ∧ Φ | Φ → Φ | XΦ | GΦ | Φ U Φ | S⋈p(Φ) | pn⋈p(Φ) | F≤t Φ

where p ∈ [0, 1], ⋈ ∈ {<, ≤, >, ≥}, t ∈ N ∪ {∞}, and atom represents an atomic proposition. The temporal connective X means "next state", G means "all future states" and U means "until". S⋈p(Φ) means that the occurrence probability of paths starting from the given state and satisfying Φ obeys the constraint ⋈ p, which focuses on steady-state properties; pn⋈p(Φ) means that the occurrence probability of subpaths starting from the given state up to the nth state and satisfying Φ obeys the constraint ⋈ p, which focuses on transient properties; F≤t Φ means that Φ occurs within the constraint t.

Let M = (S, →, L) be a model, π = s1 → s2 → s3 → ... an infinite trace in M, si the ith state of the path, and Φ a PLTL formula. The satisfaction relation is defined as follows:

π, s1 |= true
π, s1 |= atom iff atom ∈ L(s1)
π, s1 |= ¬Φ iff s1 ⊭ Φ
π, s1 |= Φ1 ∨ Φ2 iff s1 |= Φ1 or s1 |= Φ2
π, s1 |= Φ1 ∧ Φ2 iff s1 |= Φ1 and s1 |= Φ2
π, s1 |= Φ1 → Φ2 iff s1 |= Φ2 or s1 ⊭ Φ1
π, s1 |= XΦ iff π, s2 |= Φ
π, s1 |= GΦ iff, for all i ≥ 1, si |= Φ
π, s1 |= Φ1 U Φ2 iff there is some i ≥ 1 such that si |= Φ2 and, for all j = 1, ..., i − 1, sj |= Φ1
π, s1 |= S⋈p(Φ) iff P(π |= Φ) ⋈ p
π, s1 |= pn⋈p(Φ) iff, for all j = 1, ..., n, P(sj |= Φ) ⋈ p
π, sj |= F≤t Φ iff there is some j ≤ i ≤ j + t such that si |= Φ

In theory we observe an infinite trace of the system, but in practice we only observe a finite sequence. Hence a problem arises: using only two truth values may lead to misleading results in runtime verification. Consider, for example, the formula G p, stating that every state should satisfy p. Clearly, when ¬p is observed, the value of the formula should be false. Yet even if p has held continuously so far, it is inappropriate to say that the formula is true, since the next observation might violate it. To solve this problem, we employ the symbols ⊤, ⊥ and ?, denoting the three truth values true, false and inconclusive, respectively, and extend the logic to PLTL3. For a finite prefix u = s1, s2, ..., sn of π, the truth value of a PLTL3 formula ϕ with respect to u, denoted [u |= ϕ]PLTL3, is defined as follows:

[u |= ϕ]PLTL3 = ⊥, if uσ ⊭ ϕ for every infinite continuation σ;
              = ⊤, if uσ |= ϕ for every infinite continuation σ;
              = ?, otherwise.

Formulae generated from the above syntax are divided into two kinds:
• Nonprobabilistic formulae consist of two types: a) ordinary propositional formulae, which are verified to be true or false at any point in an execution run; b) formulae with a constraint t, which hold at a state when the relevant subformula becomes true within the constraint t. For example, "within the next day, the doctors will make rounds of the wards" can be specified as F≤1day DoctorsWardRound. It holds when the doctors make their rounds at some time during the next day.
• Probabilistic formulae are evaluated over a number of traces according to whether the relevant nonprobabilistic formula holds with a certain probability. For instance, S>p(Φ1 ∧ Φ2) is true for a sequence of traces if and only if the corresponding nonprobabilistic formula Φ1 ∧ Φ2 holds with probability above p. Some examples of probabilistic formulae are listed below:
1) "The blood bank encountering a blood shortage while a patient needs a blood transfusion may happen concurrently with probability at most 0.0001" can be specified as S≤0.0001(BloodShort ∧ patiBloodTrans).
2) "When a software failure is detected, the system will self-heal within 2 min with probability 0.99" can be specified as FailureDetected → S≥0.99(F≤2min SelfHealing).
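To make the three-valued evaluation concrete, the following Python sketch (an illustration, not part of the original paper; the function and variable names are hypothetical) evaluates a bounded-response pattern Φ1 → F≤t Φ2 over a finite prefix of observed states, returning True (⊤), False (⊥), or None (?, inconclusive) in the spirit of PLTL3.

from typing import Callable, Optional, Sequence

State = dict  # a state is a set of observed variable values

def check_response(prefix: Sequence[State],
                   phi1: Callable[[State], bool],
                   phi2: Callable[[State], bool],
                   t: int) -> Optional[bool]:
    """Three-valued check of 'whenever phi1 holds, phi2 holds within t steps'."""
    inconclusive = False
    for i, state in enumerate(prefix):
        if not phi1(state):
            continue
        window = prefix[i:i + t + 1]
        if any(phi2(s) for s in window):
            continue                      # this obligation is discharged
        if i + t < len(prefix):
            return False                  # the deadline passed inside the prefix
        inconclusive = True               # the deadline lies beyond the prefix
    return None if inconclusive else True

# Example: request/response states observed so far
trace = [{"req": True, "resp": False},
         {"req": False, "resp": True},
         {"req": True, "resp": False}]
print(check_response(trace, lambda s: s["req"], lambda s: s["resp"], t=2))  # inconclusive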
III. Bayesian Statistical Monitoring

In this section, we first provide some basic concepts of the runtime monitoring technique and give the fundamental idea of our approach. Then, the detailed approach and the corresponding algorithms are given.

A. Runtime Monitoring

Runtime monitoring is a technique for checking the correctness of a system at runtime by observing system executions and checking them against the property specifications [10], [13], [15]. An execution trace of software is a continuous state sequence which may be infinite. It is impossible to investigate all traces and all of their states; therefore, runtime monitoring only considers the current running trace and its states. The states are expressed as a set of values of observed variables. Based on these values, the monitor checks whether the system performs according to the relevant specifications. To be specific, an execution trace is a finite prefix of a system run: runtime monitoring supervises the software by collecting a finite prefix of the system execution and comparing it against its specification. Fig. 1 gives an overview of the typical runtime monitoring paradigm.

Fig. 1. Runtime Monitoring Paradigm

B. Fundamental Idea of the Bayesian Approach

In this paper, we use statistical hypothesis testing based on Bayesian theory to determine the satisfiability of probabilistic properties. In runtime monitoring, we only have one execution path. Only if a system exhibits repeating or periodic behaviors, such as network protocols or time schedulers, can we collect the repeated behaviors from one path and evaluate its probabilistic properties, where a successful behavior means that the relevant PLTL3 formula is satisfied. To check probabilistic properties statistically, time is assumed to proceed in discrete steps and each execution run is considered as a series of discrete states. A repeated behavior is regarded as a kind of discrete state.

Consider a property Φ1 → S⋈p(F≤t Φ2), where ⋈ ∈ {<, ≤, >, ≥}, t ∈ N, and Φ1 and Φ2 are two states with repeated behaviors: whenever Φ1 is satisfied, Φ2 should be acquired within t steps or t time units. Let d = {x1, x2, ..., xn} denote the samples of experimental results, where n is the number of repeated experiments during execution and xi is the result of the ith experiment. When Φ1 occurs and Φ2 is satisfied within the constraint t, xi = 1; otherwise xi = 0. x = x1 + x2 + ... + xn is a random variable representing the number of successful experiments.

Each monitoring run is a binary experiment whose probability of success is a constant θ, and the outcomes of different monitoring runs are independent and identically distributed. Probabilistic monitoring based on repeated binary experiments can therefore determine whether the probabilistic property is fulfilled or not. According to Bayesian theory, prior information (such as experience or historical data) is used to infer the prior distribution of the statistical parameter θ. In most cases, however, the distribution of θ is unknown due to a lack of prior information. Bayesian statistics suggests that any unknown parameter can be regarded as a random variable. In that case, assuming θ can take any value in its range and that each value of θ is equally likely, we define the density function g(θ) as follows:

g(θ) = c if θ ∈ Θ, and g(θ) = 0 if θ ∉ Θ    (1)

where the set Θ = {θ1, θ2, ...} is the value field. For monitoring probabilistic properties, θ ∈ (0, 1) and g(θ) is the uniform density on the interval (0, 1). Let xi denote the result of the ith sample. Given θ, the probability density function p(xi | θ) follows the Bernoulli distribution:

p(xi | θ) = θ^xi (1 − θ)^(1−xi)    (2)

p(xi | θ) is a conditional probability: it gives the probability of observing xi for a given θ. The joint probability density function p(d, θ) is defined as:

p(d, θ) = p(d | θ) g(θ)    (3)

In Bayesian theory, the posterior probability measures the likelihood that an event will occur given that a related event has already occurred; it is a modification of the prior probability. According to p(d, θ), the posterior distribution p(θ | d) is:

p(θ | d) = p(d | θ) g(θ) / ∫_Θ p(d | θ) g(θ) dθ    (4)

Reflecting the change of θ after sampling, p(θ | d) is the outcome of using the overall information and the sample information to adjust the prior information. According to the binomial characteristics of the monitoring experiments, we introduce the Beta distribution as the prior distribution (the Beta distribution is the conjugate prior¹ of the binomial distribution). Its probability density function is:

B(u, a, b) = u^(a−1) (1 − u)^(b−1) / Be(a, b), for all u ∈ (0, 1)    (5)

where a > 0 and b > 0, and Be(a, b) is defined as:

Be(a, b) = ∫_0^1 t^(a−1) (1 − t)^(b−1) dt    (6)

¹ In Bayesian probability theory, if the posterior distribution p(θ | d) is in the same family as the prior distribution g(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood.
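As a quick illustration of the conjugacy noted in the footnote (a sketch assuming SciPy is available, not the paper's code; the (n, x) pairs are illustrative): with a Beta(a, b) prior and x successes out of n Bernoulli samples, the posterior is the Beta density B(θ, a + x, b + n − x), which concentrates around the observed success ratio as n grows.

from scipy.stats import beta

a, b = 1, 1  # uniform prior, as used in the paper
for n, x in [(0, 0), (5, 3), (20, 12), (100, 60), (1000, 600)]:
    post = beta(a + x, b + n - x)          # posterior B(theta, a + x, b + n - x)
    lo, hi = post.ppf(0.025), post.ppf(0.975)
    print(f"n={n:4d} x={x:3d}  mean={post.mean():.3f}  95% interval=({lo:.3f}, {hi:.3f})")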
With different parameters a and b, B(u, a, b) can approximate other smooth unimodal densities on (0, 1). In particular, when a = b = 1, B(u, a, b) is the uniform density on (0, 1). From (5), we define the Beta distribution function p_{a,b}(u):

p_{a,b}(u) = ∫_0^u B(t, a, b) dt, for all u ∈ (0, 1)    (7)

Accordingly, the posterior distribution p(θ | d) is the Beta density B(θ, a + x, b + n − x), where n is the number of samples and x the number of successes. Fig. 2 plots this posterior density for increasing sample sizes (n = 0, x = 0; n = 5, x = 3; n = 20, x = 12; n = 100, x = 60; n = 1000, x = 600): the larger n is, the more sharply the posterior concentrates around the observed success ratio.

Fig. 2. Posterior Probability Density Curve B(θ, a + x, b + n − x)

To decide whether a monitored probability p satisfies the bound θ of a probabilistic property, we test the two hypotheses

H0: p ≥ θ,   H1: p < θ.

According to Bayes' theorem, the posterior probabilities of hypotheses H0 and H1 are:

p(H0 | d) = p(d | H0) p(H0) / p(d),   p(H1 | d) = p(d | H1) p(H1) / p(d)

where p(d) = p(d | H0) p(H0) + p(d | H1) p(H1). The ratio of the two posterior probabilities of H0 and H1 given the samples d is:

B = p(H0 | d) / p(H1 | d) = (p(d | H0) p(H0)) / (p(d | H1) p(H1))

This ratio B is the Bayes Factor (BF). BF can be used as a tool to measure the comparative confidence in H0 and H1. Given a threshold T > 1, if BF is greater than T, there is enough evidence to support the null hypothesis; if BF is smaller than 1/T, there is enough evidence in favour of the alternative hypothesis. For a given sample set d = {x1, x2, ..., xn}, BF can be expressed as:

BF = ∫_θ^1 p(x1 | t) ··· p(xn | t) g(t) dt / ∫_0^θ p(x1 | t) ··· p(xn | t) g(t) dt    (14)

where g(t) is the prior density function. According to (7), the Bayes Factor can be simplified as follows:

BF = (1 − p_{a+x, b+n−x}(θ)) / p_{a+x, b+n−x}(θ)    (15)

i.e., BF is obtained from the Beta distribution function with parameters a + x and b + n − x evaluated at θ.

Algorithm 2 BayesDecision
Require: Threshold T > 1, B
1: if B > T then
2:   return H0
3: else if B < 1/T then
4:   return H1
5: end if
6: return U

In Algorithm 1 (BSRM, Bayesian Statistical Runtime Monitoring), the pointer S.current is incremented to point to the last item monitored. Whenever the observed value of a new monitoring run becomes available, the ArrayList S is updated. The backward pointer S.Bac dynamically points to a previously observed value and finally points to the nearest decision point; dec(Bac) decrements S.Bac. In lines 4−10, the algorithm continues the backward analysis until H0 or H1 is accepted or all the observed values of prior monitoring runs have been considered, which indicates that no conclusion can be drawn by the Bayesian test from the limited monitoring results. In this situation the procedure TrandiTestDeci is called (lines 11−13), which uses the procedure described by Sammapun et al. [16]. The algorithm calculates the Bayes Factor according to Eq. (15) by calling a mathematical function from standard mathematical libraries. It then invokes the subroutine BayesDecision (Algorithm 2) to check whether H0 or H1 is satisfied, and BaDeci stores the return value of BayesDecision. With the predefined threshold T, if B is greater than T, H0 is accepted; if B is smaller than 1/T, H1 is accepted; otherwise U is returned, indicating an undecided result. For each monitoring run, the total number Bn and the success number Bx are first set to zero, and BaDeci is initialized to U.
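The following Python sketch illustrates Eqs. (14)/(15) and Algorithm 2 as reconstructed above (an illustration under the stated uniform-prior assumptions, not the authors' implementation; it assumes SciPy's Beta CDF, scipy.stats.beta.cdf, as the "standard mathematical library" call).

from scipy.stats import beta

def bayes_factor(theta: float, n: int, x: int, a: float = 1.0, b: float = 1.0) -> float:
    """Bayes Factor for H0: p >= theta vs H1: p < theta after x successes in n runs,
    under a Beta(a, b) prior; see Eq. (15)."""
    f = beta.cdf(theta, a + x, b + n - x)   # posterior mass below theta
    return (1.0 - f) / f

def bayes_decision(bf: float, T: float) -> str:
    """Algorithm 2: accept H0, accept H1, or stay undecided (U)."""
    if bf > T:
        return "H0"
    if bf < 1.0 / T:
        return "H1"
    return "U"

# Example: theta = 0.88, 95 successes out of 100 monitoring runs, T = 1000
bf = bayes_factor(0.88, n=100, x=95)
print(bf, bayes_decision(bf, T=1000.0))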
BSRM suffers from three problems. First, it uses a Boolean ArrayList, a dynamically growing data structure, to store the results, which violates the principle of limited space overhead; for some embedded systems, such as handheld devices, this is not acceptable. Hence, we need a better data structure to replace the ArrayList. Second, the Bayes Factor vacillates when the success probability of the total samples is close to the value of θ, which makes it impossible to accept H0; thus, we need to preprocess the results. Third, each monitoring run needs to perform the backward analysis and invoke a mathematical function from standard mathematical libraries, which increases the time overhead; thus, we need to optimize Algorithm 1, reduce unnecessary invocations and minimize the time overhead. We therefore propose an improved Bayesian Statistical Runtime Monitoring (iBSRM).

To solve the above problems, Algorithm 3 makes use of a ring buffer RB with size m to store the latest results of the monitoring runs; iBn and iBx are the amplified total number and success number, respectively. B = BayesFactor(θ^i, iBn, iBx) is calculated with the synchronously preprocessed θ, n and x. This changes the original model M (p ≥ θ) to a new model M' (p ≥ θ'): in model M the null hypothesis is p ≥ θ, while in model M' it is p ≥ θ'. For example, in model M with θ = 0.6, suppose 62 samples satisfy the formal specification when the total number of samples is 100. Applying the preprocessing procedure (with amplification exponent i = 2 in this example), let θ' = 0.6²; then the total number is amplified to 100² = 10000 and the number of successes to 62². This increases the gap between the critical value and the observed value, so that the Bayes Factor does not fall into the indifference region, while keeping the two models equivalent. To reduce the time overhead, we reuse the previous result: preBaDeci stores the last Bayesian decision and BaDeci stores the current one. If the new monitoring run yields an undecided result U, Algorithm 4 (BaExpand) is invoked to investigate backward until a decision can be made or the ring buffer RB has been fully inspected, which indicates that no conclusion can be drawn. If BaDeci is not equal to the undecided result U, Algorithm 3 only updates the decision pointer RB.Dec without any backward investigation. RB.current is incremented to point to the last item monitored. In detail (lines 12−30), when the previous decision was undecided U and BaDeci is still undecided U, the algorithm invokes Algorithm 4 to investigate backward; when RB has been fully inspected and BaDeci is still undecided U, the procedure TrandiTestDeci is invoked. When the previous decision accepted H0 and the latest observed value of the monitoring run equals 1, we only advance RB.Dec to point to the later monitoring run; if BaDeci equals U, Algorithm 4 is invoked. Similarly, when the previous decision accepted H1 and the latest observed value of the monitoring run equals 0, we only advance RB.Dec to point to the later monitoring run; if BaDeci equals U, Algorithm 4 is invoked.

In Algorithm 4, BaExpand continues backward until H0 or H1 is accepted (lines 1−10) or all the observed values of previous monitoring runs have been considered (lines 11−15).

Algorithm 4 BaExpand
Require: Threshold T > 1, B, i, n, x
1: while BaDeci == U do
2:   Decr(Dec)
3:   n = n + 1
4:   if RB.dec == 1 then
5:     x = x + 1
6:   end if
7:   iBn = n^i
8:   iBx = x^i
9:   B = BayesFactor(θ^i, iBn, iBx)
10:  BaDeci = BayesDecision(B)
11:  if BaDeci == U and RB.full() then
12:    BaDeci = TrandiTestDeci(S.current)
13:    Dec = mod(current, m) + 1
14:    break
15:  end if
16: end while
17: preBaDeci = BaDeci
18: return BaDeci
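As a rough illustration of the ring-buffer bookkeeping and the amplification preprocessing described above (a sketch only: the paper's Algorithm 3 listing is not reproduced here, and the class layout, the fixed exponent i, and the reuse of bayes_factor/bayes_decision from the earlier sketch are assumptions of this example, not the authors' code):

class IBSRMMonitor:
    """Ring buffer of the latest m monitoring outcomes with an amplified Bayesian test."""

    def __init__(self, theta: float, m: int, T: float, i: int = 2):
        self.theta, self.m, self.T, self.i = theta, m, T, i
        self.buf = [None] * m      # fixed-size ring buffer instead of a growing ArrayList
        self.current = -1          # index of the latest observation

    def observe(self, outcome: int) -> str:
        """Record one monitoring outcome (1 = success, 0 = failure) and decide."""
        self.current = (self.current + 1) % self.m
        self.buf[self.current] = outcome
        # Sweep backward from the newest sample until a decision is reached
        n = x = 0
        idx = self.current
        for _ in range(self.m):
            if self.buf[idx] is None:
                break
            n += 1
            x += self.buf[idx]
            bf = bayes_factor(self.theta ** self.i, n ** self.i, x ** self.i)
            decision = bayes_decision(bf, self.T)
            if decision != "U":
                return decision
            idx = (idx - 1) % self.m
        return "U"   # ring buffer exhausted without a conclusion

# Example usage with the WS1 setting
mon = IBSRMMonitor(theta=0.88, m=200, T=1000.0)
for obs in [1, 1, 0, 1, 1, 1, 1, 1]:
    print(mon.observe(obs))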
IV. Validation

In this section, we validate our algorithms using several Web services and compare their performance with an existing algorithm, iSPRT (improved Sequential Probability Ratio Test) [7], which is the most effective monitoring algorithm in the literature. The experiments are designed to investigate the following four basic questions:
• REQ 1: Can the proposed approach monitor a probabilistic property effectively and realize sequential monitoring?
• REQ 2: How many samples are required to reach a conclusion?
• REQ 3: How much execution time is required?
• REQ 4: Are false negatives (Type I errors) and false positives (Type II errors) within acceptable error margins?

Experimental setup: We invoked four Web services for nearly two months to observe their response times and check whether they accord with the predefined probabilistic properties of the Web services. In these experiments, the response time is measured from sending a request to receiving the response. Table I lists each Web service, its description, URL and response time requirement; the last two columns show the expected response time and the specified probabilistic requirement. For example, for WS1, "when invoking the Domestic Flight service, the result set should be acquired within 3.8 sec with probability 88%", which can be specified as InvokingService → S≥0.88(F≤3800 Response). In this paper, we only show the results of WS1, because the experiments on the other three Web services give similar results.

Parameter setting: Since our Bayesian approach uses the uniform distribution on (0, 1) as the prior distribution, we set the parameters of the Beta distribution to a = b = 1. As the Bayes Factor summarizes the evidence of whether the samples are in favour of the null hypothesis or not, the choice of the threshold directly affects the accuracy and precision of the results. Harold Jeffreys [11] gives a scale for the interpretation of T, as shown in Table II.
TABLE II. Strength of Evidence (Jeffreys' scale [11])
T               Strength of evidence
< 1:1           Negative (supports H1)
1:1 to 3.2:1    Barely worth mentioning
3.2:1 to 10:1   Substantial
10:1 to 32:1    Strong
32:1 to 100:1   Very strong
> 100:1         Decisive
By considering the ratio of H0 and H1 under the prior distribution, we choose T = max(θ/(1 − θ), 1000). Our experiments compare against iSPRT using Type I (false negative) and Type II (false positive) error bounds of 0.03, and we select an appropriate indifference region for iSPRT so that the Bayesian approaches' Type I and Type II errors are less than or equal to those of iSPRT.
TABLE I. Response time requirements of the Web services

WSi | WS Name                  | Description                                                                                      | URL                                                                | Response Time Range (s) | Probabilistic Requirement (%)
WS1 | Domestic Flight Schedule | Provide aircraft flight schedules                                                                | http://www.webxml.com.cn/webservices/DomesticAirline.asmx          | ≤ 3.8                   | 88
WS2 | TV List                  | Provide TV list of Chinese television                                                            | http://www.webxml.com.cn/webservices/ChinaTVprogramWebService.asmx | ≤ 5                     | 90
WS3 | Captcha Code             | Randomly output a picture of four simplified Chinese characters; return data: the verification word according to the picture | http://www.webxml.com.cn/WebServices/ValidateCodeWe5bService.asmx  | ≤ 3                     | 80
WS4 | RMB Instant Quotation    | Provide currency conversion rates from foreign currencies to yen                                 | http://www.webxml.com.cn/WebServices/ForexRmbRateWebService.asmx   | ≤ 2                     | 95
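To connect Table I to the Bernoulli samples used by the monitors, the following sketch (illustrative only; the endpoint URL and deadline come from Table I, but the request code and timeout handling are assumptions, not the authors' measurement harness) turns each observed WS1 response time into a sample xi for the property InvokingService → S≥0.88(F≤3800 Response):

import time
import urllib.request

WS1_URL = "http://www.webxml.com.cn/webservices/DomesticAirline.asmx"
DEADLINE_MS = 3800   # response-time bound for WS1 from Table I

def sample_ws1() -> int:
    """One monitoring run: 1 if the service responds within the deadline, else 0."""
    start = time.monotonic()
    try:
        urllib.request.urlopen(WS1_URL, timeout=DEADLINE_MS / 1000.0)
    except Exception:
        return 0                      # timeout or transport error counts as a failure
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return 1 if elapsed_ms <= DEADLINE_MS else 0

# Each call yields one Bernoulli outcome x_i that can be fed to the Bayesian monitor,
# e.g. mon.observe(sample_ws1()) with the IBSRMMonitor sketch above.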
A. REQ1: Can the proposed approach monitor a probabilistic property effectively and realize sequential monitoring?

In order to answer this question, we systematically inject failures into the original observed values of WS1 with predefined probabilities. For WS1, a correct event is a request that is responded to within 3.8 sec; a failure occurs if there is a timeout. Figure 3 plots the response time of WS1, where we seed failures between index 800 and 1200 with a probability of more than 15%. As shown, the response time of the Domestic Flight service is unstable and varies with time; intuitively, we cannot tell whether it satisfies the requirement. Figure 4 compares the results of BSRM, iSPRT and iBSRM, where "0" means that H0 is rejected, "1" means that H0 is accepted, and "-1" means undetermined; the vertical lines show the changes of the analysis results. The subinterval from 800 to 1200 clearly shows that iBSRM reacts faster and detects the probabilistic change earlier than both iSPRT and BSRM. Moreover, when the other two approaches fail to make a decision, iBSRM can still decide whether to accept H0 or not.

Fig. 3. Domestic Flight's Response Time with Seeded Failures

Fig. 4. Results of Monitoring Runs

With different numbers of failures, we analyze the performance of the three approaches by means of the performance curve presented in Figure 5. A performance curve shows the probability of accepting the null hypothesis for various true probabilities and indicates whether different approaches have equal statistical power. Figure 5 indicates that there is little difference between iSPRT and iBSRM, but the acceptance probability of BSRM is a little lower than that of the others when the true probability is close to the threshold probability θ in the PLTL3 formula. This suggests that both iSPRT and iBSRM are superior to BSRM.

Fig. 5. Performance Curve of BSRM, iSPRT and iBSRM with θ = 0.88

B. REQ2: How many samples are required to reach a conclusion?

Both BSRM and iBSRM sweep backward to find the nearest decision point from the new monitoring run towards the first run; therefore, the required monitoring sample size is not fixed. We investigate the bounds by comparing the results against iSPRT. For iSPRT, the required sample size depends on the actual probability p and is highest when p is inside the indifference region (p − δ, p + δ). We therefore observe two situations: a) p is outside the indifference region; b) p is inside the indifference region. We repeat the experiment ten times, each time collecting 25000 monitoring runs, and observe how many samples fall into the indifference region, which means that the approach has failed to determine whether the probabilistic property is fulfilled or not. More samples within the indifference region means that the specific algorithm needs more samples to make a decision, or is even unable to decide at all with the limited samples. The result is shown in Figure 6. Figure 6(a) shows that when the actual probability p1 is outside the indifference region, all of BSRM, iSPRT and iBSRM can effectively accept H0 or H1; with an increasing number of samples, the failure probabilities of all three approaches approach zero. Figure 6(b) shows that when the actual probability p2 is inside the indifference region, BSRM and iSPRT cannot accept H0 or H1, whereas iBSRM can still make a distinction. Both Figure 6(a) and Figure 6(b) indicate that the expected number of samples for iBSRM is smaller than for the other two approaches.

Fig. 6. Comparison of Failure Sample Size of BSRM, iSPRT and iBSRM with θ = 0.88: (a) p1 = 0.93, (b) p2 = 0.88
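The failure-injection step used for REQ1 can be sketched as follows (an illustration of the described setup, not the authors' scripts; the 15% rate and the 800–1200 index window come from the text above, everything else is assumed):

import random

def inject_failures(outcomes, start=800, end=1200, rate=0.15, seed=42):
    """Flip observed successes to failures inside [start, end) with the given probability."""
    rng = random.Random(seed)
    seeded = list(outcomes)
    for i in range(start, min(end, len(seeded))):
        if seeded[i] == 1 and rng.random() < rate:
            seeded[i] = 0
    return seeded

# Example: a mostly-successful observation sequence for WS1
original = [1] * 2000
with_failures = inject_failures(original)
print(sum(with_failures[800:1200]), "successes remain in the seeded window")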
C. REQ3: How much execution time is required?

The required time is also not fixed. We vary the parameter p between 0.8 and 0.98 in steps of 0.02 and set θ = 0.88; this is designed to observe the effect of different values of p on the execution time. For each p, we simulate 3000 monitoring runs and record the execution time. The result is presented in Figure 7. The execution time of iSPRT is slightly lower than that of BSRM and iBSRM: although the statistics of iSPRT are also calculated from the previous results, iSPRT does not need to call any library function. When BaDeci equals U, BSRM searches backward for a decision point until H0 or H1 is accepted or no conclusion can be drawn from the available samples, so the library function is called more than once; consequently, BSRM's execution time is a little higher than iBSRM's. When p is near 0.98, successful samples occur with high probability, which reduces the need to call the time-consuming procedure BaExpand. In theory, BSRM and iBSRM have the same time complexity: when the call time of the standard mathematical libraries is constant, both of them are O(n).

Fig. 7. Comparison of Execution Time

D. REQ4: Are false negatives (Type I errors) and false positives (Type II errors) within acceptable error margins?

In order to answer this question, we systematically simulate samples with a series of predefined failure probabilities. For each predefined failure probability (fr) we generate 20000 samples and calculate the average ratios of the three approaches. The results are shown in Table III, where the row Expected describes the theoretically expected result (IR stands for indifference region, U means undecided). In Table III, all three approaches accept H0 in all cases when fr is less than or equal to 6% and accept H1 in all cases when fr is greater than or equal to 19%. For BSRM, the Type I error rate is largest (1.84%) at fr = 12%, which is less than α = 3%, and the Type II error rate is largest (0.11%) at fr = 13%, which is also less than β = 3%. For iSPRT, the Type I error rate is largest (0.03%) at fr = 10%, which is less than α = 3%, and the Type II error rate is largest (0.11%) at fr = 14%, which is also less than β = 3%. For iBSRM, between fr = 10% and fr = 12% the Type I error rate exceeds α = 3%, but the probability of accepting H0 is also much higher than for the other two approaches; between fr = 13% and fr = 14% the Type II error rate exceeds β = 3%, but the probability of accepting H1 is also much higher than for the other two approaches. If we increase the threshold, iBSRM can decrease both the Type I and the Type II error rates.
TABLE III. The monitoring results with respect to the injected fr for the probabilistic property θ = 0.88

ALG    Fr (%)     6      7      8      9      10     11     12     13     14     15     16     17     18
BSRM   Expected   H0     H0     H0     H0     H0     H0     H0     H1     H1     H1     H1     H1     H1
BSRM   H0 (%)     99.02  97.77  76.44  52.31  12.70  6.02   0.19   0.11   0      0      0      0      0
BSRM   H1 (%)     0      0      0      0      0      0      1.84   7.80   14.74  39.18  56.98  82.92  95.16
BSRM   U (%)      0.98   2.23   23.56  47.69  87.30  93.98  97.97  92.09  85.26  60.82  43.02  17.08  4.84
iSPRT  Expected   H0     H0     H0     H0     H0     IR     IR     IR     H1     H1     H1     H1     H1
iSPRT  H0 (%)     99.20  98.85  94.73  89.87  52.04  30.12  14.07  2.40   0.11   0      0      0      0
iSPRT  H1 (%)     0      0      0      0      0.03   2.80   10.06  37.87  55.08  78.29  90.94  97.90  99.13
iSPRT  IR (%)     0.8    1.15   5.27   10.13  47.93  67.08  75.87  59.73  44.81  21.71  9.06   2.10   0.87
iBSRM  Expected   H0     H0     H0     H0     H0     H0     H0     H1     H1     H1     H1     H1     H1
iBSRM  H0 (%)     100    99.98  99.93  99.38  92.97  78.16  48.55  17.16  4.47   2.34   0      0      0
iBSRM  H1 (%)     0      0      0      0.30   5.42   18.33  46.67  79.14  93.67  96.89  99.97  99.98  100
iBSRM  U (%)      0      0.02   0.07   0.32   1.61   3.51   4.78   3.70   1.86   0.67   0.03   0.02   0
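The REQ4 experiment can be reproduced in outline with the sketch below (illustrative only; it reuses the bayes_factor/bayes_decision helpers from the earlier sketch and a simple fixed-length loop, whereas the paper's runs use the full BSRM/iSPRT/iBSRM procedures and 20000 samples per fr):

import random

def tally_decisions(fr: float, theta: float = 0.88, T: float = 1000.0,
                    runs: int = 1000, samples_per_run: int = 200, seed: int = 1):
    """Simulate monitoring runs with failure probability fr and count H0/H1/U decisions."""
    rng = random.Random(seed)
    counts = {"H0": 0, "H1": 0, "U": 0}
    for _ in range(runs):
        n = x = 0
        decision = "U"
        for _ in range(samples_per_run):
            n += 1
            x += 1 if rng.random() >= fr else 0   # success with probability 1 - fr
            decision = bayes_decision(bayes_factor(theta, n, x), T)
            if decision != "U":
                break
        counts[decision] += 1
    return {k: 100.0 * v / runs for k, v in counts.items()}

print(tally_decisions(fr=0.06))   # Table III suggests mostly H0 for fr <= 6%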
Threats to validity: Although the experimental results show that our algorithms markedly outperform existing approaches, there are still some threats to the validity of the proposed probabilistic monitoring approach. First, our Bayesian approach is based on the assumption that monitoring consists of Bernoulli trials, in which the events are tested repeatedly under the same conditions and must be independent. In some software systems, however, the outcome of a monitoring run may depend on previous events or states; it is necessary to improve our approach to monitor probabilistic properties for these kinds of systems. Second, some extreme situations, such as "the service needs to respond within 1 sec with probability 99.999%", may lead to BSRM and iBSRM failing to make a decision: for such high precisions the Bayesian approach, like other sequential hypothesis tests, cannot verify the probabilistic property. Third, our Bayesian method compares composite hypotheses and can only report "satisfaction" or "dissatisfaction" rather than reflecting the change of the probability, which is not sufficient for automatic adjustment to a changing probability. Finally, in theory iBSRM can avoid the statistic falling into the indifference region by choosing an appropriate threshold, but as shown in Table III the approach can sometimes amplify the Type I and Type II errors to some extent. Although increasing the threshold decreases the Type I and Type II errors, it also increases the chance of the statistic falling into the indifference region and decreases the capability of making a decision. For applications in which false negatives and false positives must be limited to a certain significance level, the choice of the threshold must be considered carefully.

V. Related Work

Existing approaches to probabilistic runtime monitoring fall into two categories. The first category estimates the probability that the property holds and then compares it with the predefined probabilistic specification. In [4], Chan et al. provide a platform for monitoring PCTL properties in .NET applications; they obtain the statistical evidence by calculating the ratio between the successful (or unsuccessful) monitoring results and the total number of observations. However, due to the lack of statistical analysis of the results, this category of estimation-based approaches may report a probability that differs significantly from the true probability of the property. The second category is based on hypothesis testing and relies on classical statistical procedures. Sammapun et al. [16] introduce runtime verification in the framework of Monitoring and Checking (MaC), which provides a probabilistic extension of the Meta-Event Definition Language (MEDL) [13]. This approach first estimates probabilities by counting successful samples against all samples and then uses hypothesis testing to determine statistically whether a system satisfies a probabilistic property with a given level of confidence. Grunske and Zhang [8] propose a probabilistic monitoring approach called ProMo using CSL_Mon. This approach uses SPRT to determine the outcome of a monitoring run with significance level α and power 1 − β; it does not support continuous monitoring. Zhang et al. [21] propose the Probabilistic Timed Property Sequence Chart (PTPSC), a probabilistic extension of the Timed Property Sequence Chart [20]. Based on PTPSC, a syntax-directed translator automatically generates a probabilistic monitor which combines timed Büchi automata with an SPRT process. Grunske [7] presents an improved and generic statistical decision procedure based on sequential hypothesis testing; the procedure achieves sequential probabilistic monitoring by backward statistical analysis and reuses the results of previous monitoring runs. However, when the actual probability of a system lies in the indifference region of the sequential probability ratio test, these approaches fail to make a decision on the hypothesis. Furthermore, in SPRT the monitored probabilities need to be constant over the lifetime of the system, whereas in real life the probabilistic specification may change according to the demands of the client; once the probability is altered, the results of previous monitoring runs cannot be reused and the monitor needs to start over again. BaProMon relieves these problems in the following ways. i) By calculating the Bayes Factor, the approach can check whether the runtime information provides sufficient evidence to support the null or the alternative hypothesis; the experimental results show that BaProMon is more accurate than iSPRT when the probability is in the indifference region. ii) BaProMon can still work when the actual probability is changing.

VI. Conclusion and Future Work

In this paper, an effective statistical testing method for the monitoring of probabilistic properties, called BaProMon, has been introduced. This statistical testing method can be used in any monitoring framework where a single monitoring run has one of the two outcomes "success" or "failure" and the monitoring outcomes follow the assumption of a Bernoulli trial. The method is an effective test since in most relevant cases it reduces the number of required samples. The approach has been applied to monitor the response times of Web services, and the results also indicate that our approach is superior to the existing statistical tests used in probabilistic monitoring. In Bayesian statistics, choosing a suitable prior distribution can significantly reduce the required sample size; a reasonable prior distribution makes Bayesian statistics more effective. Future work will include improving the parameters of the prior distribution dynamically, and reflecting the fluctuation of the probability in order to supply useful information for automatic adjustment. Furthermore, we would like to extend our Bayesian probabilistic monitoring approach to scenario-based specifications, such as Probabilistic Timed Property Sequence Charts (PTPSC) [19]. Scenario-based specifications would allow a more natural specification of properties for software-intensive systems.

VII. Acknowledgements

The authors would like to thank Lars Grunske and Aldeida Aleti for their constructive comments on a previous version of the paper. The work is supported by the National Natural Science Foundation of China under Grants Nos. 61202097, 61202136 and 51079040, the China Postdoctoral Science Foundation (Grant Nos. 2012T50489 and 2011M500897) and the Doctoral Fund of the Ministry of Education of China (Grant No. 20120094120009).

References
[1] C. Baier, E. M. Clarke, V. Hartonas-Garmhausen, M. Z. Kwiatkowska, and M. Ryan, "Symbolic model checking for probabilistic processes," in ICALP, 1997, pp. 430–440.
[2] L. Baresi, E. D. Nitto, and C. Ghezzi, "Toward open-world software: Issues and challenges," IEEE Computer, vol. 39, no. 10, pp. 36–43, 2006.
[3] A. Bauer, M. Leucker, and C. Schallhart, "Runtime verification for LTL and TLTL," ACM Trans. Softw. Eng. Methodol., vol. 20, no. 4, p. 14, 2011.
[4] K. Chan, I. Poernomo, H. W. Schmidt, and J. Jayaputera, "A model-oriented framework for runtime monitoring of nonfunctional properties," in QoSA/SOQUA, 2005, pp. 38–52.
[5] N. Delgado, A. Q. Gates, and S. Roach, "A taxonomy and catalog of runtime software-fault monitoring tools," IEEE Trans. Software Eng., vol. 30, no. 12, pp. 859–872, 2004.
[6] L. Grunske, "Specification patterns for probabilistic quality properties," in Proc. of ICSE 2008, 2008, pp. 31–40.
[7] L. Grunske, "An effective sequential statistical test for probabilistic monitoring," Information & Software Technology, vol. 53, no. 3, pp. 190–199, 2011.
[8] L. Grunske and P. Zhang, "Monitoring probabilistic properties," in ESEC/SIGSOFT FSE, 2009, pp. 183–192.
[9] H. Hansson and B. Jonsson, "A logic for reasoning about time and reliability," Formal Aspects of Computing, vol. 6, no. 5, pp. 512–535, 1994.
[10] K. Havelund and G. Rosu, "Java PathExplorer - a runtime verification tool," in Proc. of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, 2001.
[11] H. Jeffreys, Theory of Probability, 3rd ed. Oxford University Press, 1961.
[12] S. K. Jha, E. M. Clarke, C. J. Langmead, A. Legay, A. Platzer, and P. Zuliani, "A Bayesian approach to model checking biological systems," in CMSB, 2009, pp. 218–234.
[13] M. Kim, M. Viswanathan, S. Kannan, I. Lee, and O. Sokolsky, "Java-MaC: A run-time assurance approach for Java programs," Formal Methods in System Design, vol. 24, no. 2, pp. 129–155, 2004.
[14] M. Z. Kwiatkowska, G. Norman, D. Parker, and J. Sproston, "Performance analysis of probabilistic timed automata using digital clocks," Formal Methods in System Design, vol. 29, no. 1, pp. 33–78, 2006.
[15] M. Leucker and C. Schallhart, "A brief account of runtime verification," J. Log. Algebr. Program., vol. 78, no. 5, pp. 293–303, 2009.
[16] U. Sammapun, I. Lee, O. Sokolsky, and J. Regehr, "Statistical runtime checking of probabilistic properties," in RV, 2007, pp. 164–175.
[17] T. Tomita, S. Hagihara, and N. Yonezaki, "A probabilistic temporal logic with frequency operators and its model checking," in INFINITY, 2011, pp. 79–93.
[18] A. Wald, "Sequential tests of statistical hypotheses," The Annals of Mathematical Statistics, vol. 16, no. 2, pp. 117–186, 1945.
[19] P. Zhang, L. Grunske, A. Tang, and B. Li, "A formal syntax for probabilistic timed property sequence charts," in ASE, 2009, pp. 500–504.
[20] P. Zhang, B. Li, and L. Grunske, "Timed property sequence chart," Journal of Systems and Software, vol. 83, no. 3, pp. 371–390, 2010.
[21] P. Zhang, W. Li, D. Wan, and L. Grunske, "Monitoring of probabilistic timed property sequence charts," Softw., Pract. Exper., vol. 41, no. 7, pp. 841–866, 2011.