Bayesian Nonparametric Inference for Nonhomogeneous Poisson Processes

LYNN KUO and SUJIT K. GHOSH
Abstract: Several classes of nonparametric priors are employed to model the rate of occurrence of failures of the nonhomogeneous Poisson process used in software reliability or in repairable systems. The classes include the gamma process prior, the beta process prior, and the extended gamma process prior. We derive the posterior distribution for each process. Sampling-based methods are developed for Bayesian inference. Numerical comparisons among the three classes are performed on a real software failure data set.

KEY WORDS: Beta process prior; Extended gamma process prior; Gamma process prior; Gibbs sampling; Infinitely divisible distribution; Lévy process.
Lynn Kuo is Professor, Department of Statistics, U-120, University of Connecticut, Storrs, CT 06269-3120. Sujit K. Ghosh is Assistant Professor, Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203.
1. Introduction

Nonhomogeneous Poisson point processes (NHPP) have been used extensively in modeling the number of failures in repairable systems and in software testing. Most Bayesian inference for the NHPP assumes a parametric model for the rate of occurrence of failures (ROCOF), also called the intensity function, and proceeds with prior assumptions on the unknown parameters. For example, the Goel-Okumoto (1979), Musa-Okumoto (1984), Duane (1964), and Cox-Lewis (1966) processes are special cases of the NHPP, with the ROCOF being the hazard function of an exponential, a Pareto, a Weibull, and a Gompertz distribution respectively (cf. Kuo and Yang, 1996). The parametric assumptions have the advantages of being simple to understand and to fit. However, it is often difficult to justify a specific parametric family.

We take a nonparametric Bayesian approach in this paper. The nonparametric Bayesian approach allows us to incorporate our prior belief on the ROCOF. Moreover, it is more robust than the parametric approach against misspecification of the distribution. We model the ROCOF as an unknown sample path that potentially can have an infinite number of parameters, where we assume specific prior distributions for the increments of the sample path. We usually incorporate our prior belief on the mean and the variance of the cumulative ROCOF. In particular, we assume two types of priors: one type is on the space of cumulative ROCOFs and includes the gamma processes and the beta processes; the other is on the space of monotone ROCOFs and includes the gamma, beta, and extended gamma processes. All the processes have independent increments (or decrements). The gamma process assumes gamma distributions for the increments. The beta process proposed by Hjort (1990) assumes that the infinitesimal increments have beta distributions. The gamma and beta process priors for the cumulative ROCOF are flexible.
However, when the ROCOF is known to be monotone, for example, a decreasing function of time in the "happy" system or an increasing function in the "sad" system, the gamma and beta process priors on the cumulative ROCOF may be too general to be useful. In these cases, we can construct gamma processes, beta processes, and extended gamma processes (Dykstra and Laud, 1981) directly on the space of monotone ROCOFs, not the cumulative ROCOFs. Given that the gamma process is a
special case of the extended gamma process, which is a scale mixture of gamma processes, we will focus on the extended gamma process and comment on the beta process for the space of ROCOFs. On Bayesian updating, we not only discuss how to obtain the posterior distributions, but also develop sampling-based methods. The sampling-based approach has the advantage that any feature of the posterior distribution, for example, mean, variance, quantile, confidence interval, and histogram, can be obtained from the random variates simulated from the posterior distribution. The gamma process prior is the most convenient to use, because it is a natural conjugate prior for the likelihood. For each of the processes, we give two versions of the sampling-based algorithm. One version is what we call the "partitioning approximation" method, where the updating is done by dividing the time interval into a large number of subintervals. The other version, called the "Lévy process generation", makes use of the Lévy representation in the posterior or in the prior, together with sampling techniques for generating the posterior distributions. We compare both versions for a particular data set. If the number of partitions is large enough in the "partitioning approximation" version, the difference between the two versions is negligible. Most readers may prefer the "partitioning approximation" version because it is easy to understand and easy to program. For the gamma and beta process priors on the cumulative ROCOFs, both algorithms sample replications of the random variates from the posterior distributions directly. In fact, for the gamma process prior, neither version is needed in practice, because we know the posterior distribution exactly in analytic form. For the extended gamma process on the ROCOFs, both versions use Gibbs sampling (cf. Gelfand and Smith, 1990, and Tanner and Wong, 1987).
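To make the "partitioning approximation" idea concrete for the gamma process prior on the cumulative ROCOF, the following Python sketch divides $(0, t_U]$ into subintervals and samples posterior draws of the increments, which by conjugacy are again gamma. The failure times, the linear prior mean $\Lambda_0(t) = at$, and all numeric settings are illustrative assumptions of ours, not the data set analyzed in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical failure epochs on (0, t_U]; illustrative only.
x = np.array([0.5, 1.2, 2.0, 2.1, 3.7, 5.5, 8.0])
t_U = 10.0

# Prior guess Lambda_0(t) = a*t with precision c (larger c = more certainty).
a, c = 1.0, 2.0
Lambda0 = lambda t: a * t

# Partition (0, t_U] into K subintervals.
K = 200
grid = np.linspace(0.0, t_U, K + 1)
dLambda0 = np.diff(Lambda0(grid))          # prior mean increments per bin
counts, _ = np.histogram(x, bins=grid)     # observed failures per bin

# Conjugate update: posterior increment in bin k is Gamma(c*dLambda0_k + n_k, c + 1).
B = 5000                                   # number of posterior draws
incs = rng.gamma(shape=c * dLambda0 + counts, scale=1.0 / (c + 1.0), size=(B, K))
paths = np.cumsum(incs, axis=1)            # posterior draws of Lambda on the grid

post_mean = paths.mean(axis=0)
# Analytic posterior mean for comparison: (c*Lambda0(s) + N(s)) / (c + 1).
analytic = (c * Lambda0(grid[1:]) + np.cumsum(counts)) / (c + 1.0)
print(np.max(np.abs(post_mean - analytic)))  # small Monte Carlo error
```

Any posterior summary (quantiles, credible bands, histograms) can be read off the rows of `paths` in the same way.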
Gibbs sampling is a Markov chain Monte Carlo (MCMC) approach which samples variates according to a Markov chain whose stationary distribution is the desired posterior distribution. Data augmentation with a class of multinomially distributed latent variables facilitates the specification of the conditional densities used in the algorithm. Therefore, we use the data augmentation technique in both versions. The "Lévy process generation" version samples the increments in the prior by the Walker (1996) algorithm and updates the increments independently given the latent variables; the "partitioning approximation" version takes a larger number of subintervals and samples the increments by independent
gamma densities conditioning on the latent variables. Again we show numerically that the difference between the two algorithms is negligible when the number of partitions is large in the latter version. The Bayesian computation part of this paper is mostly adapted from the following papers. Damien, Laud, and Smith (1995, 1996) develop sampling-based methods to generate random variates directly from the posterior distributions for the gamma process, Dirichlet process, simple homogeneous, and beta process priors, where the data consist of real observed and right-censored failure times. Laud, Smith, and Damien (1996) develop sampling-based methods for the same type of data with extended gamma processes. While Damien et al. and Laud et al. consider the case where the failure times subject to censoring are i.i.d., we consider a different likelihood where i.i.d. assumptions do not apply. Walker (1996) proposes a different algorithm for sampling from an infinitely divisible distribution. This algorithm is also related to the algorithm given by Wolpert and Ickstadt (1996). We discuss both the algorithms of Damien et al. and of Walker for generating the infinitely divisible increment. In addition to Bayes inference, we also consider issues of model adequacy and model selection. We use "model" in a broad sense to include both the likelihood and the prior. The predictive intervals for the forecast (future failure times) can be constructed from the predictive density of the forecast given the past data by means of the sampling-based method. The model is judged to be adequate if a fraction 1 − α of the actually observed failure epochs are contained in the 1 − α predictive intervals. For model selection, we use a predictive likelihood that modifies the posterior Bayes factor (Aitkin, 1991). The advantage of this approach is given in Kuo and Yang (1996). The posterior Bayes factor compares the marginal likelihoods of the whole data set for two models with respect to their posterior distributions.
To avoid the criticism that the posterior Bayes factor uses the same data twice, we modify the posterior Bayes factor as follows. We split the data into two parts. We evaluate the predictive likelihood for the second part against the posterior distribution of the unknown sample path given the first part of the data. If we think of the second part of the data as the "future" data, this modified posterior Bayes factor compares the predictive power for the future given the past data. On inference for the parametric NHPP, Goel-Okumoto (1979), Musa-Okumoto (1984),
Cox-Lewis (1966), and Lawless and Thiagarajah (1996) derive the maximum likelihood estimators. Crowder, Kimber, Smith, and Sweeting (1991) also discuss maximum likelihood inference for the Weibull (Duane) and Cox-Lewis processes. Kuo and Yang (1996), Kuo et al. (1997), and Yang and Kuo (1995) develop Bayesian inference. Kuo and Yang also provide an extensive literature review for Bayesian parametric inference. On Bayesian nonparametric inference for the NHPP, Lo (1982) considers estimating the cumulative intensity function from independent and identically distributed (i.i.d.) multiple samples. Lo and Weng (1989) develop analytic results for a family of multiplicative point processes with extended gamma process priors. On nonparametric inference for the intensity function of the NHPP, Boswell (1966) derives an isotonic regression estimator for a non-decreasing intensity function. Bartoszynski et al. (1981) describe kernel and penalized likelihood approaches for multiple samples from a NHPP. Clevenson and Zidek (1977) consider a class of histogram estimators, a class of moving averages, and a class of linear estimators for the intensity function. Leemis (1991) proposes a piecewise linear estimator for the cumulative intensity function. He also provides a list of papers in the area.

Section 2 discusses the likelihood of the NHPP. Section 3 defines the nonparametric priors. Section 4 develops the posterior distributions. Section 5 exhibits the sampling-based approaches to simulate variates from the posterior distributions. Section 6 describes predictive inference for the future survival function and for the future mean time between failures. Section 7 discusses methods for checking model adequacy and for model selection. A numerical example is given in Section 8 and some concluding remarks are provided in Section 9.
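The model-adequacy criterion mentioned above — checking whether roughly a fraction 1 − α of the observed failure epochs fall inside their 1 − α predictive intervals — can be sketched as follows. The `coverage` helper and the synthetic predictive draws are our own minimal illustration, not output from any sampler in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def coverage(pred_draws, observed, alpha=0.1):
    """Fraction of observed epochs falling in their (1 - alpha) predictive
    intervals, formed from equal-tailed quantiles of the predictive draws.
    pred_draws: (B, m) array whose column j holds draws of the j-th epoch."""
    lo = np.quantile(pred_draws, alpha / 2, axis=0)
    hi = np.quantile(pred_draws, 1 - alpha / 2, axis=0)
    return np.mean((observed >= lo) & (observed <= hi))

# Synthetic check: predictive draws centered at the "observed" epochs,
# so every epoch sits well inside its interval.
epochs = np.array([1.0, 2.5, 4.0])
draws = epochs + rng.normal(scale=0.3, size=(4000, 3))
cov = coverage(draws, epochs, alpha=0.1)
print(cov)  # 1.0 for this synthetic example
```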
2. Likelihood

Let $M(t)$ denote the number of failures in $(0, t]$. We assume $M(t)$ is a NHPP with mean function $\Lambda(t) = E\,M(t)$. That is, the numbers of failures in disjoint intervals are independent with Poisson distributions of means $\Lambda(t_i) - \Lambda(t_{i-1})$ for the interval $(t_{i-1}, t_i]$ ($t_{i-1} < t_i$). In our definition of the NHPP, we also include disjoint intervals of the form $(t_{i-1}, t_i^-]$, $(t_i^-, t_i]$,
etc., where $t^-$ denotes the left-hand limit of $t$. Let $\lambda(t) = \frac{d}{dt}\Lambda(t)$. Therefore, $\lambda(t)$ denotes the ROCOF of the NHPP, also called the intensity function; and $\Lambda(t)$ denotes the cumulative ROCOF function. If $\Lambda(t)$ has a jump at $t_i$, then we define $\lambda(t_i) = [\Lambda(t_i) - \Lambda(t_i^-)]\,\delta(t_i)$, where $\delta$ is the Dirac delta function. Given the time-truncated model, testing until time $t_U$, the ordered epochs of the $n$ observed failure times are denoted by $x_1, \ldots, x_n$, where $x_1 < x_2 < \cdots < x_n$. We define the data set $D_{t_U}$ to be $\{n, x_1, x_2, \ldots, x_n, t_U\}$. Given the failure-truncated model (observed until the $n$th failure), we define the data set $D_{x_n}$ to be $\{x_1, x_2, \ldots, x_n\}$. Then the probability of observing no failures in $(0, x_1^-)$, one failure in $(x_1^-, x_1)$, no failures in $(x_1, x_2^-)$, and so on up to no failures in $(x_n, t_U)$ is given by
$$L_{NHPP}(\Lambda \mid D_{t_U}) = \left( \prod_{i=1}^{n} e^{-[\Lambda(x_i^-) - \Lambda(x_{i-1})]}\,[\Lambda(x_i) - \Lambda(x_i^-)]\, e^{-[\Lambda(x_i) - \Lambda(x_i^-)]} \right) e^{-[\Lambda(t_U) - \Lambda(x_n)]}, \qquad (1)$$
where $x_0 = 0$. For the failure-truncated model, a similar expression to (1) can be applied with $t_U$ replaced by $x_n$. If we allow the possibility of ties in the failure epochs, then we generalize the likelihood in (1) to
$$L_{NHPPG}(\Lambda \mid D_{t_U}) = \left( \prod_{i=1}^{n} e^{-[\Lambda(x_i^-) - \Lambda(x_{i-1})]}\,[\Lambda(x_i) - \Lambda(x_i^-)]^{d_i}\, e^{-[\Lambda(x_i) - \Lambda(x_i^-)]} \right) e^{-[\Lambda(t_U) - \Lambda(x_n)]}, \qquad (2)$$
where $d_i$ denotes the number of multiplicities of the failures observed at $x_i$. So the total number of failures observed in $(0, t_U]$ is $\sum_{i=1}^{n} d_i$. Both likelihoods suggest that if the prior on the cumulative ROCOF $\Lambda(t)$ has independent increments, then the posterior of $\Lambda$ also has independent increments. In particular, if the prior increments have gamma distributions, then the posterior increments also have gamma distributions. In the following, we define two general classes of nonparametric priors. The first class is constructed on the space of cumulative ROCOFs. We will focus on the gamma and beta processes. The second class is constructed on the space of monotone ROCOFs. We will focus on the extended gamma process.
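When $\Lambda$ is absolutely continuous with intensity $\lambda$, the likelihood in (1) reduces (as a density in the failure epochs) to $\prod_{i=1}^{n} \lambda(x_i)\, e^{-\Lambda(t_U)}$. A minimal sketch of evaluating its logarithm follows; the power-law (Duane-type) intensity and its parameter values are our own illustrative choices, not data from this paper.

```python
import math

def nhpp_loglik(x, t_U, lam, Lam):
    """Log-likelihood of an NHPP observed on (0, t_U] with failure epochs x,
    for an absolutely continuous cumulative ROCOF Lam with intensity lam:
    sum_i log lam(x_i) - Lam(t_U)."""
    return sum(math.log(lam(xi)) for xi in x) - Lam(t_U)

# Power-law intensity lam(t) = a*b*t^(b-1), so Lam(t) = a*t^b.
a, b = 2.0, 0.5
ll = nhpp_loglik([0.5, 1.2, 3.0], 10.0,
                 lambda t: a * b * t**(b - 1),
                 lambda t: a * t**b)
print(ll)  # approximately -6.618
```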
3. Nonparametric Priors

3.1 Priors for the Cumulative ROCOF

Let $\Lambda_t = \Lambda(t)$ be a process with independent increments, non-decreasing almost surely, right continuous almost surely, $\lim_{t \to -\infty} \Lambda_t = 0$ almost surely, and $\lim_{t \to \infty} \Lambda_t = \infty$ almost surely. We allow the possibility that $\Lambda_t = \infty$ with positive probability for finite $t$. For such a process $\Lambda_t$ there exist at most countably many fixed points of discontinuity at $t_1, t_2, \ldots$ with random jump sizes $S_1, S_2, \ldots$ having densities $f_1, f_2, \ldots$. If we remove these fixed jumps of discontinuity from the process $\Lambda_t$, then
$$Z_t = \Lambda_t - \sum_{j} S_j\, I_{[t_j, \infty)}(t)$$
is a process with independent increments and no fixed points of discontinuity. Then the moment generating function of such a process has the following Lévy representation:
$$M_t(\theta) = E\, e^{-\theta Z_t} = \exp\left\{ -\theta\, b(t) + \int_0^{\infty} (e^{-\theta z} - 1)\, dN_t(z) \right\},$$
where $b$ is a non-decreasing and continuous function with $b(t) \to 0$ as $t \to -\infty$, and $N_t$ is a continuous Lévy measure satisfying

1. for every Borel set $B$, $N_t(B)$ is continuous and non-decreasing in $t$,
2. for every real $t$, $N_t(\cdot)$ is a measure on the Borel subsets of $(0, \infty)$,
3. $\int_0^{\infty} z(1+z)^{-1}\, dN_t(z) < \infty$, and
4. $\int_0^{\infty} z(1+z)^{-1}\, dN_t(z) \to 0$ as $t \to -\infty$.

In the following construction, we assume $b(t) \equiv 0$. When the increments of the process $\Lambda_t$ have gamma distributions, we call this process the gamma process. We use the notation $\Gamma(\alpha, \beta)$ to denote a gamma distribution with mean $\alpha/\beta$ throughout the paper. Let us assume $\Lambda_t$ is a gamma process with $\Lambda_t \sim \Gamma(\Lambda_0(t)c, c)$. So the prior guess for $\Lambda_t$ is $E\,\Lambda_t = \Lambda_0(t)$ and the prior variance of $\Lambda_t$ is $\mathrm{Var}(\Lambda_t) = \Lambda_0(t)/c$. So we choose $\Lambda_0(t)$ to represent our prior belief on the cumulative ROCOF and $c$ to represent the prior precision; the bigger $c$ is, the more certain we are of our prior guess. Then the Lévy measure is given by
$$dN_t(z) = c\,\Lambda_0(t)\, e^{-cz} z^{-1}\, dz.$$
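To make the gamma process prior concrete, one can approximate a sample path by independent gamma increments over a fine grid and verify the stated prior moments $E\,\Lambda_t = \Lambda_0(t)$ and $\mathrm{Var}(\Lambda_t) = \Lambda_0(t)/c$ by simulation. The prior mean $\Lambda_0(t) = \sqrt{t}$ and the numeric settings below are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(2)

# Gamma process prior on the cumulative ROCOF: increments over a grid are
# independent Gamma(c * dLambda0, c) (shape, rate), so the path at time t
# is Gamma(c * Lambda0(t), c) with mean Lambda0(t) and variance Lambda0(t)/c.
c = 4.0
Lambda0 = lambda t: np.sqrt(t)      # illustrative prior mean function
grid = np.linspace(0.0, 9.0, 301)
dLam0 = np.diff(Lambda0(grid))      # prior mean increments

B = 20000                           # number of prior sample paths
incs = rng.gamma(shape=c * dLam0, scale=1.0 / c, size=(B, len(dLam0)))
paths = np.cumsum(incs, axis=1)

# Check the moments at t = 9, where Lambda0(9) = 3:
print(paths[:, -1].mean())          # approximately 3.0
print(paths[:, -1].var())           # approximately 0.75 = 3/c
```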
The gamma process is discussed in many books and papers, including Kalbfleisch and Prentice (1980, Section 8.4), Singpurwalla and Youngren (1993), and Singpurwalla (1997). Similarly, we can consider the simple homogeneous process with the following Lévy measure, $dN_t(z) = c\,\Lambda_0(t)\, e^{-cz}(1 - e^{-z})^{-1}\, dz$, and a "Dirichlet" process (Ferguson, 1973), where a Dirichlet process is actually constructed on $F(t) = 1 - e^{-\Lambda_t}$, with the following Lévy measure, $\Gamma($