DISCRETE EVENT SIMULATION WITH APPLICATION TO COMPUTER AND COMMUNICATION SYSTEMS PERFORMANCE AND DEPENDABILITY
ECE 557, Spring 2015, Duke University
Kishor Trivedi, [email protected], Room 203, Hudson Hall
Copyright © 2015 Kishor S. Trivedi
Some nice quotes
LECTURE: An art of transmitting information from the notes of the lecturer to the notes of students without passing through the minds of either
CONFERENCE: The confusion of one man multiplied by the number present
Logistics of the course
6-8 homeworks; no late homeworks accepted. Homeworks will be a mixture of paper-and-pencil problems, problems that involve writing a simulation program, and problems that involve using the SHARPE or SPNP software packages
Must work individually on all homeworks
1 project
Grading: 60% HW, 40% project
Class webpage: www.ee.duke.edu/~ktrivedi/ECE5557
TA: Zheng Zheng; e-mail: [email protected]
TA Office Hours: Thursday, 1-4 pm
Three Segments & Two Addenda
1. Segments
  1. Discrete Event Simulation (class notes including Chapters 10 & 11 of Bluebook + DOE + case studies)
  2. DTMC, CTMC, PFQN (Ch. 7, 8, 9 of Bluebook), SPN/GSPN/SRN (Ch. 8 of Bluebook)
  3. MRGP, MRSPN, FSPN would be very nice to cover as well, but this is unlikely due to time constraints
  Bluebook: Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, John Wiley, 2001
2. Addenda
  1. Project: each student can choose an application to study and the simulation package to use. Start the project early and do it in phases
  2. Applications to networks, computing, smart grid, etc. will appear throughout the course
Outline of the Segment on Simulation
Introduction to Simulation
Statistical Analysis of Input Data: Chap10a, Chap10b, Chap10c, Chap10d
Random Variate Generation: Chap3r and additional reading
Output Analysis: Chap10e
Case Studies: NASA Satellite Prob. Rep. data, Bugzilla reports
Simulation Packages: Module 5
Applications of Simulation: Module 6
Regression and ANOVA: Chap11a, Chap11b
Design of Experiments (DOE)
Case Studies: software aging and rejuvenation, software fault tolerance using environmental diversity
Case Studies: cloud, uncertainty propagation, smart grid, healthcare
Model Validation and Verification: Module 7
Variance Reduction Techniques: Module 8
Why do we need Statistics?
Making sense out of measurement data
  NASA satellite data; Bugzilla reports for open source software; software aging data
Developing/solving a Simulation Model [SPNP, CSIM]
  Input Data Analysis
  Simulation Output Data Analysis
While developing an Analytic Model [e.g., using SHARPE]
  Input Data Analysis
  Propagating parametric uncertainty through an analytic model
Online control [e.g., software rejuvenation or performance control]
Need to Model Random Phenomena
Random phenomena in a computer/networking/cloud/web/VANET environment
  Arrival of jobs/messages/requests
  Execution (transmission/processing) time of jobs/messages/requests
  Memory requirement of jobs/messages/requests
  Failure times or times to repair of components/resources/system/service
How to quantify randomness? Use probabilistic models
How to estimate these quantifiers? Use statistical techniques on measurement data
Modeling Random Phenomena
[Figure: modeling workflow. Understand the system; measurement data from a real system -> statistical analysis -> model input parameters -> probability model (PM) -> model outputs; output measurement data from a real system -> model validation]
PM: Structural or white/grey box (Chaps 1-9); or empirical or black box (Chaps 10-11)
Examples
Performance and reliability analysis of composed web services [Sato/Trivedi ICSOC 2007 paper]
Availability/reliability analysis of SIP (Session Initiation Protocol) on IBM WebSphere on BladeCenter [Trivedi et al PRDC 2008 paper; ISSRE 2010 paper]
Determining mean response time/availability/power in cloud [Ghosh et al FGCS 2013, IEEE-TCC and IEEE-TSC 2014]
Transmitting safety messages in a VANET [Yin et al papers]
Deciding the rejuvenation schedule of a system to clean its environment of any software aging effects [many papers]
Propagating parametric uncertainty through probability models [Mishra et al, ISSRE 2011]
MEASURES TO BE EVALUATED
Dependability: "Does it work, and for how long?"
  Reliability: R(t), System MTTF
  Availability: Steady-state, Transient
  Downtime
  Security, safety
Pure (Failure-Free) Performance: "Given that it works, how well does it work?"
  Throughput, Blocking Probability, Response Time
Composite Performance and Dependability (performability): "How much work will be done (lost) in a given interval, including the effects of failure/repair/contention?"
Dependability: An umbrella term
Laprie: Trustworthiness of a computer system such that reliance can justifiably be placed on the service it delivers
Dependability
  Attributes: Availability, Reliability, Safety, Maintainability
  Means: Fault Prevention/Avoidance, Fault Removal, Fault Tolerance, Fault Forecasting
  Threats: Faults, Errors, Failures
Extended Dependability and Security tree
Dependability and Security
  Threats: Faults/Vulnerabilities, Errors/Atomic attacks, Failures/Intrusions
  Attributes: Confidentiality, Integrity, Availability, Reliability, Safety, Maintainability (Confidentiality, Integrity, and Availability are grouped under Security)
  Means: Fault/Intrusion Prevention, Fault/Intrusion Detection, Fault/Intrusion Tolerance, Fault/Vulnerability Removal, Fault/Intrusion Forecasting
Metrics and Methods
Metrics: Performance (utilizations, throughput, goodput, response time, blocking probability), Reliability, Availability, Safety, Security, Power, Performability, Survivability, Resilience
System-oriented vs. user-oriented metrics
Methods of Evaluation
Evaluation vs. Prediction vs. Bottleneck Detection vs. Optimization
Software (Program) Performance Evaluation
Worst-case vs. average-case
Data-structure-oriented (Ch 2,7) vs. control-structure-oriented (Ch 2,3,4,5,7,8)
Sequential vs. concurrent (single-threaded vs. multi-threaded)
Restricted (structured) (Ch 1-5) vs. unrestricted transfer of control (Ch 7)
Unlimited (hardware) resources vs. limited resources
Software architecture: modules, their characteristics (execution time) and interactions (branching, looping)
Business process flows, composed Web services (similar to programs)
Metrics: completion time & response time (mean, variance & distribution)
Measurements, probability models (simulation vs. analytic), or a combination
Analytic models: non-state-space (directed acyclic task precedence graph); state-space: DTMC, SMP, CTMC, SPN; hierarchical
System Performance Evaluation
Workload: Traffic arrival process, service time distributions, pattern of resource requests
Hardware and software architecture
Resource Contention, Scheduling & Allocation
Concurrency, Synchronization, Distributed processing
Timeliness (may have to meet deadlines)
Metrics: Throughput, Goodput, loss (blocking) probability, response time or delay (mean, variance & distribution (Sec 9.6))
Low-level (CPU cache, memory interference: Ch. 7)
System-level (CPU-I/O, multiprocessing: Ch. 8,9)
Network-level (protocols, handoff in wireless: Ch. 7,8)
Measurements, models (simulation or analytic), or a combination
Analytic models: DTMC (Ch 7), CTMC (Ch 8), PFQN (Ch. 9), SPN (Ch 8); NPFQN: Hierarchical (Ch 9), Approximation (Ch 9)
Software Reliability Evaluation
Black-box (measurements + statistical inference) vs. Architecture-based approach (Structural Models)
Black-box approach is called software reliability growth modeling (Ch. 3, 5, 8, 10)
Black-box approaches treat software as a monolithic whole, considering only its interactions with external environment, without an attempt to model its internal structure
With growing emphasis on reuse, software development process moves toward component-based software design
White-box approach may be better to analyze a system with many software components and how they fit together
Software Architecture
Software behavior describes the manner in which different components interact.
May include the information about the execution time of each component.
Control flow graph is used to represent architecture.
Sequential program architecture can be modeled by:
  Discrete Time Markov Chain (DTMC; Ch 7)
  Continuous Time Markov Chain (CTMC; Ch 8)
  Semi-Markov Process (SMP)
  Markov Regenerative Process (MRGP)
Parallel program architecture can be modeled by Stochastic Petri Net (Ch 8)
System Reliability/Availability
Fault load: fault types, failure rates, repair/recovery procedures, delay time distributions, and imperfect coverage for recovery steps
Hardware and software architecture
Minimum resource requirements
Performance/reliability interdependence
Metrics: Reliability, Availability, system MTTF, Downtime
Low-level (physics of failures, chip level)
System-level (CPU-I/O, multiprocessing: Ch 1,3,4,5,6,8,9)
Software and hardware combined together (Ch 8)
Network-level
Measurements, models (simulation or analytic), or a combination
Analytic model types: RBD (Ch 1,3,4,5,6), FTREE (Ch 1,3,4,5), CTMC (Ch 8), SPN (Ch 8), Hierarchical (Ch 8)
Evaluation vs. Optimization
Evaluation of a system for computation of desired metrics given a set of parameters
Sensitivity Analysis
  Parametric (Blake et al. Sigmetrics 1988)
  Bottleneck analysis (Sato & Trivedi ICSOC07, Rubens et al IEEE-TR 2012)
  Reliability importance (Fricks, RAMS 2003)
Optimization (Ch. 11 in 1st ed. white book)
  Static: linear, nonlinear, geometric, integer, multiobjective; constrained or unconstrained
  Dynamic: dynamic programming, Markov decision process, semi-Markov decision process
  Simulated annealing, evolutionary programming
PURPOSE OF EVALUATION
Understanding a system
  Observation
    Operational environment
    Controlled environment
  Reasoning
    A probability model is a convenient abstraction
Predicting behavior of a system
  Need a model
PURPOSE OF EVALUATION (Contd.)
Famous quotes bring out the difficulty of prediction based on models:
"All Models are Wrong; Some Models are Useful" (George Box and Albert Einstein)
"Prediction is fine as long as it is not about the future" (Mark Twain)
Methods of EVALUATION
Measurement-Based
  More accurate, more expensive
  Not always possible or cost-effective during system design
  Statistical techniques are very important here
  An empirical model can be formulated via regression, machine learning, etc.
(Structural) Model-Based
Methods of EVALUATION (Contd.)
Model-Based
  Less accurate, less expensive
  1. Discrete-Event Simulation vs. Analytic solution
  2. State-Space Methods (Ch. 7,8) vs. Non-State-Space Methods (Ch. 1-5,9)
  3. Hybrid: Simulation + Analytic (SPNP)
  4. State Space + Non-State Space (SHARPE)
Methods of EVALUATION (Contd.)
Measurements + Models
Models need input parameters that are estimated from measurements
Models should be validated against measurements
Measurements should be guided by models
References: Vaidyanathan & Trivedi IEEE TDSC, 2005; Hsueh, Iyer & Trivedi IEEE TC, 1988; Gokhale et al, Perf Eval. 2005; Trivedi et al, PRDC 2008; ISSRE 2010
QUANTITATIVE EVALUATION TAXONOMY
Closed-form solution
Numerical solution using a tool
Notes
Both measurements & simulations imply statistical analysis of outputs
  Statistical inference (Ch 10)
  Hypothesis testing (Ch 10)
  Regression (linear, nonlinear) (Ch 11)
  Design of experiments (not in Bluebook)
  Trend detection (Ch 11)
  Analysis of variance (Ch 11)
Distribution-driven simulation requires generation of random deviates (variates). (Ch. 3, 4, 5)
Probability and Statistics are different but highly intertwined.
Probability models need inputs that generally come from measurement data (followed by statistical inference)
Statistics in turn uses probability theory to derive formulas
Introduction to Simulation
MODULE 1
What is Simulation?
An experiment on a system model to empirically determine its characteristics.
A model solution method that mimics or emulates the behavior of a system over time.
Involves generation and observation of an artificial history of the system under study.
Inferences are then drawn from the response of the model concerning the dynamic behavior of the real system.
Computer Simulation
Involves modeling of an actual or theoretical system, executing the model (an experiment) on a digital computer, and (statistically) analyzing the execution output.
The current state of the physical system is represented by state variables (program variables).
State variables are modified to mimic the evolution of the physical system over time.
What is a Model?
A model is a representation of the system under study, developed through the techniques of:
  Abstraction: discarding unimportant details to make a model as simple as possible and as complex as necessary
  Decomposition: divide and conquer
  Idealization (e.g., relaxing unimportant constraints)
All three techniques aim at complexity reduction
More art involved than science
Physical or mathematical (abstract, formal) models
What is a Model? (Contd.)
Frequently, models have random inputs, consisting of a set of sequences of random variables with specified distributions.
Non-determinism can also be introduced by some random operational decisions represented in the model.
Correspondingly, such models have a random output with unknown distribution.
In such cases the goal is to estimate certain characteristics of the output distributions.
Model Solution Types
Model Solution
  Transient (Terminating)
  Steady-state (Non-terminating)
Model Solutions (Transient)
Model Solution: Transient
  Analytic
    Fully-symbolic solution
    Semi-symbolic solution
    Numerical solution
  Simulation (terminating)
Model Solutions (Steady State)
Model Solution: Steady State
  Analytic
    Symbolic solution
    Numerical solution
  Simulation (steady-state)
Nature of Model Solutions
Fully symbolic: closed-form solution of an analytic model, by hand or via Mathematica (Matlab?, others?)
  Exact [Example will follow]
Numerical: solution of an analytic model using one of many packages such as SHARPE or SPNP
  Numerical errors (round-off, truncation, convergence) [Example will follow]
Semi-symbolic (semi-numerical) (transient) solution: symbolic in t (see SHARPE cdf in exponomial form); note that for the steady-state case there is no semi-symbolic solution [Example will follow]
Simulative solution
  Statistical (or sampling) errors (a finite number of paths traversed out of (possibly) infinitely many paths) [Example will follow]
Fully Symbolic Transient Solution (by hand)
Markov Reliability Model With Repair
Consider a 2-component parallel system where we disallow repair from the system-down state. Note that state 0 is an absorbing state. The state diagram is given in the following figure. This is a reliability model with repair, so we need to resort to Markov chains.
Markov Reliability Model With Repair (Contd.)
[Figure: state diagram with states 2, 1, and 0; state 0 is the absorbing state]
The Markov chain has an absorbing state.
In the steady state, the system will be in state 0 with probability 1.
Hence steady-state analysis will yield a trivial answer; transient analysis is of interest. States 1 and 2 are transient states.
Markov Reliability Model With Repair (Contd.)
[Figure: state diagram with states 2, 1, and 0]
Assume that the initial state of the Markov chain is 2, that is, $\pi_2(0) = 1$ and $\pi_k(0) = 0$ for k = 0, 1.
Then the system of differential equations is written based on:
Rate of buildup = Rate of flow in - Rate of flow out, for each state
Markov Reliability Model With Repair (Contd.)
[Figure: state diagram with states 2, 1, and 0]
$$\frac{d\pi_2(t)}{dt} = -2\lambda\,\pi_2(t) + \mu\,\pi_1(t)$$
$$\frac{d\pi_1(t)}{dt} = 2\lambda\,\pi_2(t) - (\lambda+\mu)\,\pi_1(t)$$
$$\frac{d\pi_0(t)}{dt} = \lambda\,\pi_1(t)$$
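The three differential equations of this slide can also be integrated numerically as a sanity check on the symbolic solution. The following is a minimal sketch using a classical fourth-order Runge-Kutta scheme; the class name, method names, step count, and the parameter values (failure rate 0.001 and repair rate 1, borrowed from the SHARPE examples later in the deck) are illustrative assumptions, not part of the course material.

```java
// Sketch only: integrates the Kolmogorov equations of the 3-state
// reliability model with a fixed-step RK4 scheme (names are hypothetical).
final class TransientOdeSketch {
    // Right-hand side of the ODE system; p[i] is the probability of state i.
    static double[] derivative(double[] p, double lambda, double mu) {
        double dp2 = -2 * lambda * p[2] + mu * p[1];
        double dp1 = 2 * lambda * p[2] - (lambda + mu) * p[1];
        double dp0 = lambda * p[1];
        return new double[] {dp0, dp1, dp2};
    }

    // Returns p + h * k, used to build the intermediate RK4 stages.
    static double[] shifted(double[] p, double[] k, double h) {
        double[] r = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            r[i] = p[i] + h * k[i];
        }
        return r;
    }

    // Returns {p0(t), p1(t), p2(t)} for a chain that starts in state 2.
    static double[] solve(double lambda, double mu, double t, int steps) {
        double[] p = {0.0, 0.0, 1.0};
        double h = t / steps;
        for (int n = 0; n < steps; n++) {
            double[] k1 = derivative(p, lambda, mu);
            double[] k2 = derivative(shifted(p, k1, h / 2), lambda, mu);
            double[] k3 = derivative(shifted(p, k2, h / 2), lambda, mu);
            double[] k4 = derivative(shifted(p, k3, h), lambda, mu);
            for (int i = 0; i < 3; i++) {
                p[i] += h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        double[] p = solve(0.001, 1.0, 100.0, 2000);
        System.out.println("R(100) = " + (1.0 - p[0]));
    }
}
```

The step size has to stay small relative to the fastest rate in the model (here the repair rate), which is one face of the stiffness issue raised later for numerical transient solutions.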
Markov Reliability Model With Repair (Contd.)
Using the technique of Laplace transform, we can reduce the above system to:
$$s\,\bar{\pi}_2(s) - 1 = -2\lambda\,\bar{\pi}_2(s) + \mu\,\bar{\pi}_1(s)$$
$$s\,\bar{\pi}_1(s) = 2\lambda\,\bar{\pi}_2(s) - (\lambda+\mu)\,\bar{\pi}_1(s)$$
$$s\,\bar{\pi}_0(s) = \lambda\,\bar{\pi}_1(s)$$
where
$$\bar{\pi}(s) = \int_0^{\infty} e^{-st}\,\pi(t)\,dt$$
Markov Reliability Model With Repair (Contd.)
Solving for $\bar{\pi}_0(s)$, we get:
$$\bar{\pi}_0(s) = \frac{2\lambda^2}{s\,[\,s^2 + (3\lambda+\mu)\,s + 2\lambda^2\,]}$$
After an inversion, we obtain $\pi_0(t)$, the probability that no components are operating at time t ≥ 0. For this purpose, we carry out a partial fraction expansion.
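The partial fraction step can be written out as follows. This is a worked sketch rather than the slide's own derivation: it assumes the quadratic factor is written as $(s+\alpha_1)(s+\alpha_2)$, so that $\alpha_1 + \alpha_2 = 3\lambda + \mu$ and $\alpha_1\alpha_2 = 2\lambda^2$.

```latex
\bar{\pi}_0(s) = \frac{2\lambda^2}{s\,(s+\alpha_1)(s+\alpha_2)}
             = \frac{A}{s} + \frac{B}{s+\alpha_1} + \frac{C}{s+\alpha_2},
\qquad
A = \frac{2\lambda^2}{\alpha_1\alpha_2} = 1,\quad
B = \frac{2\lambda^2}{\alpha_1(\alpha_1-\alpha_2)},\quad
C = -\frac{2\lambda^2}{\alpha_2(\alpha_1-\alpha_2)}

\pi_0(t) = 1 + \frac{2\lambda^2}{\alpha_1-\alpha_2}
           \left( \frac{e^{-\alpha_1 t}}{\alpha_1} - \frac{e^{-\alpha_2 t}}{\alpha_2} \right)
```

Each term inverts through the standard pair $1/(s+a) \leftrightarrow e^{-at}$, and $1 - \pi_0(t)$ is then the reliability.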
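Markov Reliability Model With Repair (Contd.)
Inverting the transform, we get
$$R(t) = 1 - \pi_0(t) = \frac{2\lambda^2}{\alpha_1-\alpha_2}\left(\frac{e^{-\alpha_2 t}}{\alpha_2} - \frac{e^{-\alpha_1 t}}{\alpha_1}\right)$$
where
$$\alpha_1, \alpha_2 = \frac{(3\lambda+\mu) \pm \sqrt{\lambda^2 + 6\mu\lambda + \mu^2}}{2}$$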
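The closed-form reliability expression is straightforward to evaluate in code. The sketch below is one possible way to do it; the class and method names are hypothetical. The smaller root is computed from the product of the roots (which equals twice the squared failure rate) rather than by subtraction, an implementation choice made here to avoid cancellation when the repair rate is much larger than the failure rate.

```java
// Sketch only: evaluates the closed-form R(t) of the 2-component
// parallel system with repair (names are hypothetical).
final class ClosedFormReliability {
    // Returns {alpha1, alpha2}, with alpha1 the larger root.
    static double[] roots(double lambda, double mu) {
        double sum = 3 * lambda + mu;
        double disc = Math.sqrt(lambda * lambda + 6 * lambda * mu + mu * mu);
        double alpha1 = (sum + disc) / 2;
        // alpha1 * alpha2 = 2 * lambda^2, so no subtraction is needed.
        double alpha2 = 2 * lambda * lambda / alpha1;
        return new double[] {alpha1, alpha2};
    }

    // R(t) = 2 lambda^2 / (alpha1 - alpha2)
    //        * (exp(-alpha2 t) / alpha2 - exp(-alpha1 t) / alpha1)
    static double reliability(double lambda, double mu, double t) {
        double[] a = roots(lambda, mu);
        double c = 2 * lambda * lambda / (a[0] - a[1]);
        return c * (Math.exp(-a[1] * t) / a[1] - Math.exp(-a[0] * t) / a[0]);
    }

    public static void main(String[] args) {
        for (int t = 0; t <= 1000; t += 250) {
            System.out.println("R(" + t + ") = " + reliability(0.001, 1.0, t));
        }
    }
}
```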
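Fully Symbolic Transient Solution (by hand)
[Figure: state diagram with states 2, 1, and 0]
$$R(t) = 1 - \pi_0(t) = \frac{2\lambda^2}{\alpha_1-\alpha_2}\left(\frac{e^{-\alpha_2 t}}{\alpha_2} - \frac{e^{-\alpha_1 t}}{\alpha_1}\right)$$
$$\alpha_1, \alpha_2 = \frac{(3\lambda+\mu) \pm \sqrt{\lambda^2 + 6\mu\lambda + \mu^2}}{2}$$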
Fully Symbolic Closed form Transient solution in Mathematica
[Figure: Markov chain with an absorbing state]
Fully Symbolic Transient Solution
What are the fundamental limits of this approach?
  It requires finding the roots of a polynomial in a fully symbolic fashion
  In general this is possible only up to a fourth-degree polynomial (by the Abel-Ruffini theorem, no general formula in radicals exists for degree five or higher)
Semi-Symbolic (semi-numerical) (transient) solution in SHARPE (textual input)

bind lambda 1/1000
bind mu 1/1
markov semi
2 1 2*lambda
1 0 lambda
1 2 mu
end
* Initial Probabilities assigned:
2 1
1 0
0 0
end

echo ******************************************
echo ********* Outputs asked for the model: semi **************
cdf(semi,0)
end
Semi-Symbolic (semi-numerical) (transient) solution in SHARPE
Semi-Symbolic Transient Solution
What are the limits of this approach?
  Only a full-matrix method is known
  When the roots are close to each other, numerical instability occurs
Numerical Transient solution in SHARPE (textual input)

bind lambda 1/1000
bind mu 1/1
markov numeric
2 1 2*lambda
1 0 lambda
1 2 mu
end
* Initial Probabilities defined:
2 1
1 0
0 0
end

echo ************************************************************************
echo ********* Outputs asked for the model: numeric **************
func Reliability(t) 1-tvalue(t;numeric)
loop t,1,991,10
expr Reliability(t)
end
var MTTAb mean(numeric, 0)
expr MTTAb
end
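The mean time to absorption requested as `MTTAb` in this script can be cross-checked against the symbolic reliability function derived earlier. The short derivation below is a sketch under the assumption that `mean(numeric, 0)` returns the mean time to reach state 0; it uses the root relations $\alpha_1 + \alpha_2 = 3\lambda + \mu$ and $\alpha_1\alpha_2 = 2\lambda^2$.

```latex
\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt
  = \frac{2\lambda^2}{\alpha_1-\alpha_2}
    \left( \frac{1}{\alpha_2^2} - \frac{1}{\alpha_1^2} \right)
  = \frac{2\lambda^2(\alpha_1+\alpha_2)}{\alpha_1^2\alpha_2^2}
  = \frac{3\lambda+\mu}{2\lambda^2}

\lambda = \tfrac{1}{1000},\ \mu = 1:\qquad
\mathrm{MTTF} = \frac{0.003 + 1}{2\times 10^{-6}} = 501\,500
```

The result is in whatever time unit the rates are expressed in.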
Numerical Transient solution in SHARPE (textual input)
Numerical Transient Solution
What are the limits of this approach?
  Sparse matrix storage and sparsity-preserving algorithms enable very large Markov models to be solved
  Stiffness of Markov models will slow down the solution
Symbolic Solution (Steady state)
[Figure: state diagrams with states 2, 1, and 0 for the availability model, non-shared repair and shared repair cases]
Steady-state balance equations
For any state: Rate of flow in = Rate of flow out
Consider the shared case:
$$2\lambda\,\pi_2 = \mu\,\pi_1$$
$$(\lambda+\mu)\,\pi_1 = 2\lambda\,\pi_2 + \mu\,\pi_0$$
$$\lambda\,\pi_1 = \mu\,\pi_0$$
$\pi_i$: steady-state probability that the system is in state i, that is,
$$\pi_i = \lim_{t\to\infty} P(X(t) = i)$$
Steady-state balance equations (Contd.)
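Hence
$$\pi_1 = \frac{\mu}{\lambda}\,\pi_0, \qquad \pi_2 = \frac{\mu}{2\lambda}\,\pi_1 = \frac{\mu^2}{2\lambda^2}\,\pi_0$$
Since
$$\pi_0 + \pi_1 + \pi_2 = 1$$
we have
$$\pi_0\left[1 + \frac{\mu}{\lambda} + \frac{\mu^2}{2\lambda^2}\right] = 1$$
or
$$\pi_0 = \frac{1}{1 + \dfrac{\mu}{\lambda} + \dfrac{\mu^2}{2\lambda^2}}$$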
Symbolic steady-state Solution
What are the limits of this approach?
Numerical Solution (steady state)
Shared Case

markov shared
2 1 2*lambda
* Could be also written
* 2 1 2/MTTF
1 0 lambda
1 2 mu
0 1 mu
end

bind
mu 1
lambda 0.1
end
var U prob(shared,0)
var downtime 60*8760*U
loop j, 2, 5, 0.5
bind lambda 1.0*10^-j
expr downtime
end
end
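The same sweep can be reproduced from the closed-form steady-state probability of state 0 for the shared-repair case. The sketch below mirrors the SHARPE loop over the exponent j from 2 to 5 in steps of 0.5; the class and method names are hypothetical, and it assumes the rates are per hour, so that 60*8760*U is downtime in minutes per year.

```java
// Sketch only: closed-form unavailability of the shared-repair model and
// the resulting yearly downtime (names are hypothetical; rates per hour).
final class SharedRepairDowntime {
    // pi_0 = 1 / (1 + mu/lambda + mu^2 / (2 lambda^2))
    static double unavailability(double lambda, double mu) {
        double r = mu / lambda;
        return 1.0 / (1.0 + r + r * r / 2.0);
    }

    // Downtime in minutes per year: 60 minutes * 8760 hours * unavailability.
    static double downtimeMinutesPerYear(double lambda, double mu) {
        return 60.0 * 8760.0 * unavailability(lambda, mu);
    }

    public static void main(String[] args) {
        double mu = 1.0;
        for (int k = 0; k <= 6; k++) {
            double j = 2.0 + 0.5 * k;            // j = 2, 2.5, ..., 5
            double lambda = Math.pow(10.0, -j);  // lambda = 10^-j
            System.out.printf("lambda = %.3e  downtime = %.6f min/yr%n",
                    lambda, downtimeMinutesPerYear(lambda, mu));
        }
    }
}
```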
Markov Availability Model (Contd.)
Numerical steady-state Solution
What are the limits of this approach?
  Sparse matrix storage and sparsity-preserving solution methods are known (mostly iterative methods)
  Some iterative methods are guaranteed to always converge (power method), while others (though faster on average) may fail to converge (SOR, Gauss-Seidel)
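As an illustration of the power method mentioned above, the sketch below applies it to the shared-repair availability model. It uses one common construction, assumed here rather than taken from the slides: the CTMC generator Q is uniformized into a DTMC with P = I + Q/q, where q is slightly larger than the largest exit rate, and the probability vector is multiplied by P until it stops changing. A dense matrix is used for brevity; a real solver would use sparse storage. All names are hypothetical.

```java
import java.util.Arrays;

// Sketch only: power iteration on the uniformized DTMC of a small CTMC
// (dense matrix for brevity; names are hypothetical).
final class PowerMethodSketch {
    // Generator of the shared-repair model, states ordered 0, 1, 2.
    static double[][] sharedRepairGenerator(double lambda, double mu) {
        return new double[][] {
            {-mu, mu, 0.0},
            {lambda, -(lambda + mu), mu},
            {0.0, 2 * lambda, -2 * lambda}
        };
    }

    static double[] steadyState(double[][] q, double tol, int maxIter) {
        int n = q.length;
        double maxRate = 0.0;
        for (int i = 0; i < n; i++) {
            maxRate = Math.max(maxRate, -q[i][i]);
        }
        double unif = 1.02 * maxRate; // strictly above the largest exit rate
        double[] pi = new double[n];
        Arrays.fill(pi, 1.0 / n);
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    double pij = q[i][j] / unif + (i == j ? 1.0 : 0.0);
                    next[j] += pi[i] * pij;
                }
            }
            double diff = 0.0;
            for (int j = 0; j < n; j++) {
                diff = Math.max(diff, Math.abs(next[j] - pi[j]));
            }
            pi = next;
            if (diff < tol) {
                break;
            }
        }
        return pi;
    }

    public static void main(String[] args) {
        double[] pi = steadyState(sharedRepairGenerator(0.1, 1.0), 1e-12, 100000);
        System.out.println("pi = " + Arrays.toString(pi));
    }
}
```

Choosing the uniformization rate strictly above the largest exit rate gives every state a self-loop, which makes the DTMC aperiodic and is what underlies the convergence guarantee.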
Simulation of Markov Model
[Figure: state diagrams with states 2, 1, and 0; in the reliability model, state 0 is the absorbing state]
Useful steps to follow:
1. Simulation flow chart
2. Random variate generation (see Module 3)
3. Write simulation code (Java, C, C++, others)
4. Interpret results
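The steps above can be sketched for the reliability model with the absorbing state. The code below is a minimal illustration, not the course's reference solution: the class name, method names, seed, and replication count are assumptions. Each replication follows the chain from state 2 until it is absorbed in state 0, and the replications are summarized by a sample mean and an approximate 95% confidence interval based on the normal approximation.

```java
import java.util.Random;

// Sketch only: discrete-event simulation of the 3-state reliability
// model, estimating the mean time to failure (names are hypothetical).
final class MarkovReliabilitySim {
    // Inverse-transform sample from an exponential distribution.
    static double expVariate(Random rng, double rate) {
        return -Math.log(1.0 - rng.nextDouble()) / rate;
    }

    // One replication: time until absorption in state 0, starting in state 2.
    static double timeToFailure(Random rng, double lambda, double mu) {
        int state = 2;
        double clock = 0.0;
        while (state != 0) {
            if (state == 2) {
                clock += expVariate(rng, 2 * lambda);  // first failure
                state = 1;
            } else {
                clock += expVariate(rng, lambda + mu); // failure/repair race
                state = (rng.nextDouble() < lambda / (lambda + mu)) ? 0 : 2;
            }
        }
        return clock;
    }

    // Returns {sample mean, half-width of an approximate 95% interval}.
    static double[] estimateMttf(long seed, int replications,
                                 double lambda, double mu) {
        Random rng = new Random(seed);
        double sum = 0.0;
        double sumSq = 0.0;
        for (int r = 0; r < replications; r++) {
            double x = timeToFailure(rng, lambda, mu);
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / replications;
        double variance = (sumSq - replications * mean * mean) / (replications - 1);
        double halfWidth = 1.96 * Math.sqrt(variance / replications);
        return new double[] {mean, halfWidth};
    }

    public static void main(String[] args) {
        double[] est = estimateMttf(12345L, 20000, 0.1, 1.0);
        System.out.printf("MTTF estimate = %.3f +/- %.3f%n", est[0], est[1]);
    }
}
```

With a failure rate of 0.1 and a repair rate of 1, the analytic mean time to failure of this model is 65 time units, so the interval can be compared against a known answer; the gap between the two is the statistical (sampling) error discussed earlier.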
Simulation Flow chart
System flow chart
Random Variate Generation: Java Example

import java.util.Random;

public class homework1a {
    // Create the object before every simulation run to guarantee a new seed.
    // Declared static so that the static method below can use it.
    private static Random generator = new Random(); // uniform random generator
    . . .
    // Inverse-transform method: returns an exponential variate with rate f.
    private static double generateRandomVariate(double f) {
        double u = generator.nextDouble(); // u is uniform on [0, 1)
        return -Math.log(1 - u) / f;
    }
}
Random Variate Generation C++ Example //initialize variables ... while (t