An Automated Multiresolution Procedure for Modeling Complex Arrival ...

1 downloads 0 Views 285KB Size Report
To automate the multiresolution procedure of Kuhl et al. for modeling and simulating arrival processes that may exhibit a long-term trend, nested periodic ...
INFORMS Journal on Computing

informs

Vol. 18, No. 1, Winter 2006, pp. 3–18 issn 1091-9856  eissn 1526-5528  06  1801  0003

®

doi 10.1287/ijoc.1040.0113 © 2006 INFORMS

An Automated Multiresolution Procedure for Modeling Complex Arrival Processes Michael E. Kuhl, Sachin G. Sumant

Department of Industrial and Systems Engineering, Rochester Institute of Technology, 81 Lomb Memorial Drive, Rochester, New York 14623-5603, USA {[email protected], [email protected]}

James R. Wilson

Edward P. Fitts Department of Industrial Engineering, North Carolina State University, Campus Box 7906, Raleigh, North Carolina 27695-7906, USA, [email protected]

T

o automate the multiresolution procedure of Kuhl et al. for modeling and simulating arrival processes that may exhibit a long-term trend, nested periodic phenomena (such as daily and weekly cycles), or both types of effects, we formulate a statistical-estimation method that involves the following steps at each resolution level corresponding to a basic cycle: (a) transforming the cumulative relative frequency of arrivals within the cycle (for example, the percentage of all arrivals as a function of the time of day within the daily cycle) to obtain a statistical model with approximately normal, constant-variance responses; (b) fitting a specially formulated polynomial to the transformed responses; (c) performing a likelihood ratio test to determine the degree of the fitted polynomial; and (d) fitting to the original (untransformed) responses a polynomial of the same form as in (b) with the degree determined in (c). A comprehensive experimental performance evaluation involving 100 independent replications of eight selected test processes demonstrates the accuracy and flexibility of the automated multiresolution procedure. Key words: simulation; probability: stochastic model applications; statistics: estimation; statistical analysis; nonhomogeneous Poisson processes; time-dependent arrivals History: Accepted by Susan Sanchez, Area Editor for Simulation; received February 2004; revised July 2004; accepted August 2004.

1.

Introduction

the usual Poisson postulates so that the corresponding (cumulative) mean-value function is given by  t t ≡ E N t =

z dz for all t ≥ 0

Time-varying arrival processes are routinely encountered in practical applications of industrial and systems engineering techniques. The following are typical situations in which the arrival rate of relevant entities depends strongly on time: demands for seasonal products such as lawn mowers, arrivals of patrons at an amusement park, arrivals of patients at an emergency room, and arrivals of telephone calls at a customer service center. To analyze or improve system operation in such situations, discreteevent stochastic simulation is often the technique of choice. Consequently, high-fidelity probabilistic input models are frequently needed to perform meaningful simulation experiments. In the past, nonhomogeneous Poisson processes (NHPPs) have been used successfully to model complex time-dependent arrival processes in a broad range of application domains (Lewis and Shedler 1976, Kao and Chang 1988, Lee et al. 1991, Massey et al. 1996, Pritsker et al. 1995). An NHPP N t t ≥ 0 is a counting process such that N t is the number of arrivals in the time interval 0 t ; and t, the instantaneous arrival rate at time t, is a nonnegative, integrable function satisfying

0

The rate or mean-value function of the NHPP N t t ≥ 0 completely characterizes the probabilistic behavior of the process (Çinlar 1975). Both parametric and nonparametric methods have been developed to estimate the rate or mean-value function of the process N t t ≥ 0 from observed arrival times. This paper focuses on the nonparametric “multiresolution” estimation method of Kuhl et al. (1997a) and Kuhl and Wilson (2001), in which progressively increasing levels of detail (or resolution) are used to model a target arrival process that may exhibit a long-term trend or nested cyclic effects; and the latter effects might not necessarily possess the symmetry of sinusoidal oscillations. For example, in developing a large-scale simulation model of the organ-procurement and patient-registration processes for liver transplants in the United States, Pritsker et al. (1995) find that some of the arrival streams of liver donors and liver patients have pronounced longterm trends as well as asymmetric cyclic effects that 3

4

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

depend on the time of year (season), the day of the week, and the hour of the day; see also Pritsker (1998). In the multiresolution procedure of Kuhl et al. (1997a) and Kuhl and Wilson (2001), we begin at resolution 0 by estimating any long-term trend that is present in the overall observation interval 0 S . If p cyclic effects are also present (where p ≥ 1), then we evaluate the arrival process on the scale of the largest cycle (that is, the cycle of length b1 that is associated with resolution 1) to estimate the most prominent periodic features of the arrival process; and we assume S/b1 is an integer so that the observation interval 0 S consists of complete resolution-1 cycles. If p ≥ 2, then we evaluate the periodic phenomena corresponding to cycles of progressively smaller lengths that are nested within each other so that at resolution i (for i = 1     p − 1), each associated cycle of length bi consists of an integral number of cycles of length bi+1 that are associated with resolution i + 1. Thus the cycles of length bp at resolution p are the smallest observation intervals over which we seek to model accurately the periodic behavior of the target arrival process. Note that if p = 0, then there are no cyclic effects; and in this case we simply estimate the longterm trend over the single resolution-0 cycle of length b0 ≡ S. Finally we construct the estimated mean-value function from the estimates of all the cyclic effects as well as the long-term trend effect. Kuhl and Wilson (2001) develop a theoretical basis for the multiresolution procedure, but they do not provide a specific implementation of the procedure that can be readily applied to sample data. In this paper, we develop a completely automated implementation of the multiresolution procedure that can be executed over the web, enabling users to fit their arrival processes and then generate independent realizations of each fitted process interactively without the need for downloading and installing the multiresolution software. The rest of this paper is organized as follows. For completeness, in Section 2 we summarize the mathematical formulation of the multiresolution procedure. In Section 3 we present the automated fitting procedure, which involves the following steps at each resolution level associated with a basic cycle: 1. performing a variance-stabilizing transformation (specifically, the composite of the arc-sine and square-root transformations) of the cumulative relative frequency of arrivals within the cycle so as to obtain independent, normally distributed responses with a constant variance (at least to a reasonable approximation); 2. fitting an appropriate function to the transformed arrival data using a polynomial that has been specially formulated to ensure it has the properties required for a cumulative distribution function;

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

3. performing a likelihood ratio test to determine the appropriate degree of the fitted polynomial in Step 2; and 4. fitting an appropriate function to the original (untransformed) arrival data using a polynomial of the same form as in Step 2 with the degree determined in Step 3. In Section 4 we detail an example that illustrates the automated multiresolution procedure and gives some indication of the flexibility of this method. In Section 5 we present an experimental performance evaluation of the procedure. In Section 6 we demonstrate our web-based input-modeling software by applying it to the example elaborated in Section 4. Finally in Section 7 we summarize conclusions and recommendations for future work. Although this paper is based on Sumant (2003), some preliminary results on the formulation of our multiresolution procedure are presented in Kuhl et al. (2003).

2.

Summary of the Multiresolution Estimation Procedure

The mean-value function is fitted to arrival-time data observed over the time interval 0 S , where the data may contain the following: a long-term trend; p cyclic effects having periods of length bi for i = 1     p, provided p ≥ 1; or both types of effects. The multiresolution procedure of Kuhl et al. (1997a) and Kuhl and Wilson (2001) is designed for modeling and simulation of arrival processes satisfying two key assumptions that are formally stated below. As an aid in explaining the significance of these assumptions, we consider a simple example of an arrival process whose instantaneous arrival rate has a long-term trend over a time horizon of 28 days as well as weekly and daily cyclic effects—that is, periodic rate components with periods of one week and one day. Assumption 1. If p ≥ 1, then there are p distinct cycle lengths (periods) b0 > · · · > bp such that bi−1 is an integral multiple of bi for i = 1     p, where b0 = S. In our example, we take one day as the unit of time so that we have an overall time horizon of S = b0 = 28 days; and within each (weekly) cycle of length b1 = 7 days, there are (daily) cycles of length b2 = 1 day so that Assumption 1 is satisfied. Assumption 1 requires knowledge of the cycle lengths of all the cyclic effects observed in the arrival process; and in our experience, such information is often available in practical applications. On the other hand, in the absence of exact knowledge of the number of cyclic effects and the corresponding cycle lengths, we can estimate these quantities by a standard spectral-analysis procedure (Lewis 1970, Lee et al. 1991, Kuhl 1994, Kuhl and Wilson 1996).

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

5

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

Assumption 2. If p ≥ 1, then within the jth cycle j − 1bi  jbi of length bi ( for i = 1     p), the arrival rate at time t is proportional to a baseline function i s, where s = t mod bi = t − j − 1bi is the offset from the beginning of the cycle so that we have t = i j i t − j − 1bi = i j i s, and i l  l = 1     S/bi  are the constants of proportionality for all cycles of length bi . In our example, Assumption 2 requires that the rate of arrivals within each day (that is, at resolution 2) should follow the same general profile as a function of the time of day; but on successive days within each weekly cycle, the daily arrival rates will have the multipliers 2 l  l = 1     S/b2 = 28, respectively, to account for different numbers of expected arrivals on different days in the overall time horizon of 28 days. Similarly, Assumption 2 requires that the rate of arrivals within each week (that is, at resolution 1) should follow the same general profile as a function of the time of the week; but on successive weeks within the overall time horizon of 28 days, the weekly arrival rates will have the multipliers 1 l  l = 1     S/b1 = 4, respectively, to account for different numbers of expected arrivals on different weeks in the overall time horizon. In Section 4 we elaborate this example to illustrate not only the probabilistic structure of NHPPs satisfying Assumptions 1 and 2 but also the general approach we take to modeling and estimating such arrival processes. See Section 3 of Kuhl and Wilson (2001) for a detailed description of another arrival process that satisfies Assumption 2. If there are no cyclic effects so that p = 0, then Assumptions 1 and 2 are vacuously satisfied. When p ≥ 1, Assumption 2 holds if and only if the target arrival process has the following property: within the jth cycle j − 1bi  jbi of length bi for i = 1     p and j = 1     S/bi , the buildup to time t = j − 1bi + s (where s = t mod bi ) of the cumulative fraction of the expected number of arrivals in the cycle, t − j − 1bi jbi  − j − 1bi



is given by a function Ri s of the time s that has elapsed since the beginning of the cycle; and the same function Ri s describes the buildup of arrivals within every cycle of length bi . Thus to estimate the meanvalue function t, we must first estimate the auxiliary functions s

i z dz Ri s ≡  0b for s ∈ 0 bi and i = 0     p (1) i

i z dz 0 At each resolution i, we must obtain an estima i · of the auxiliary function Ri · with the foltor R i 0 = Ri 0 = 0; lowing properties: the initial value R the final value Ri bi  = Ri bi  = 1; and the derivative

i s ≥ 0 for s ∈ 0 bi  and i = 0     p. If p ≥ 1, then R at resolution 0 we fit a monotone increasing function 0 t, t ∈ 0 S , to the points  jb1  N jb1 /N S T  j = R 1     S/b1 . If p = 0, then at resolution 0 we perform the fitting procedure described for resolution p in the paragraph immediately following Remark 1 below. If p ≥ 2, then at resolution i for i = 1     p − 1, we i s to esticonstruct a monotone increasing function R mate the cumulative percentage of arrivals that are expected to occur during the first s time units of the associated cycle of length bi , where s ∈ 0 bi . Thus if i s to p ≥ 2, then at resolution i we fit the function R T the points  jbi+1  Gi jbi+1   j = 0     bi /bi+1 , where Gi s =

i −1  1 S/b N lbi + s − N lbi  N S l=0

for s ∈ 0 bi and i = 1     p − 1 (2) Remark 1. Theorem 2 of Kuhl and Wilson (2001) provides general conditions under which with probability one the function Gi s converges uniformly to the theoretical auxiliary function Ri s for s ∈ 0 bi  as the length of the overall time horizon tends to infinity. Regardless of the value of p, at resolution p we let k  k = 1     N S denote the observed arrival times in the interval 0 S ; and we define k = k mod bp for k = 1     N S. Let 1 ≤ · · · ≤ N S denote the ordered arrival times within the basic resolution-p cycle 0 bp . We then fit a monotone increasing func p s to the points  k  k/N S T  k = 1     tion R N S subject to the usual boundary conditions that p 0 = 0 and R p bp  = 1. R The final estimate t of the mean-value function is computed as follows: 0 t t = N SQ

for t ∈ 0 S 

(3)

i t i = p     0 are defined where the functions Q iteratively for all t ∈ 0 S as: p t ≡ R p t − jp t − 1bp Q

(4)

and if p ≥ 1, then for i = p − 1     0, we take i ji+1 t −1bi+1 −ji t −1bi i t ≡ R Q  i ji+1 t bi+1 −ji t −1bi + R

 i ji+1 t −1bi+1 −ji t −1bi Q i+1 t (5) −R

where at resolution i the cycle of length bi containing a given time t is given by the cycle index ji t ≡ unique integer j such that j − 1bi ≤ t < jbi for t ∈ 0 S and i = 0     p

(6)

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

6

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

See p. 569 of Kuhl and Wilson (2001) for a complete explanation of equations (1)–(6) and an illustrative example of the multiresolution procedure. Although the multiresolution method proposed by Kuhl et al. (1997a) and Kuhl and Wilson (2001) does not require a specific functional form for the estimator i · of the auxiliary function Ri · at each resolution i, R in the next section we specify a complete methodology for implementing the multiresolution procedure in general applications.

3.

Automated Implementation of the Multiresolution Procedure

The following discussion on computing the estimated i · applies separately to each resauxiliary function R olution i for i = 0     p; and although virtually all the quantities arising in this discussion will depend on the resolution index i, for the sake of notational simplicity we do not explicitly represent the dependence of some quantities on i (through subscripts, etc.). No confusion should result from this simplification; and throughout the rest of this section, we assume that the resolution index i is fixed so that we are considering the problem of constructing a partic i ·. ular auxiliary function estimator R 3.1.

Selecting the Functional Form for Each i · Estimated Auxiliary Function R The multiresolution procedure requires that each fit i · is monotone increasing and that ted function R i 0 = 0 and R i bi  = 1 for i = 0     p. Therefore, R we use a specially formulated polynomial that meets these requirements—namely, a degree-r polynomial of the form  s/b  if r = 1   i

r−1 i s = r−1 R   k r   #k i s/bi  + 1− #k i s/bi   if r > 1 k=1

k=1

(7) for s ∈ 0 bi and i = 0     p. (As noted at the beginning of this section, we write r rather than ri for notational simplicity so that r will always denote the i s for the current current estimate of the degree of R resolution level i.) For the case in which r > 1, we constrain the coefficients #k i  k = 1     r − 1 in (7) i s ≥ 0 for all s ∈ 0 bi . The polynomial (7) to yield R automatically satisfies the required boundary conditions. The simplest probabilistic arrival process has a constant arrival rate within the ith basic cycle 0 bi ; and this situation corresponds to the simplest form of our model (7) wherein we take r = 1 so that we obtain a linear mean-value function over that cycle. To estimate the auxiliary function (1) by a fitted i · of the form (7) at resolution i, we must function R

determine the appropriate degree r and then estimate the associated vector of coefficients, 1 if r = 1 (8) i ≡ T #1 i      #r−1 i  if r > 1 The standard approach for solving such a problem is to perform classical regression analysis using a forward selection, backward elimination, stepwise regression, or “best subset” regression procedure (Draper and Smith 1998). However, the assumptions of classical regression analysis require observations of a dependent variable that are independent and normal with a constant variance so that the corresponding error terms are independent and identically distributed (i.i.d.) normal random variables. In the case of fitting arrival data obtained from an NHPP that satisfies Assumptions 1 and 2, the observed cumulative relative frequencies within each basic cycle are neither normal nor independent; moreover, such responses do not possess a constant variance. Therefore, a variance-stabilizing transformation must be applied to the data before we try to use standard statistical procedures to determine an appro i ·. We chose to priate degree r for the function R formulate a likelihood ratio test (LRT) for this purpose because of its simplicity and the demonstrated effectiveness of similar LRTs in previous studies for which an exponential-polynomial rate component of the appropriate degree is used to model the long-term trend in an NHPP (Lee et al. 1991, Johnson et al. 1994, Kuhl et al. 1997b, Kuhl and Wilson 2000). 3.2.

Likelihood Ratio Test (LRT) for Determining the Degree of Each Estimated Auxiliary i · Function R If p ≥ 1, then to estimate the auxiliary functions defined by (1), we do the following: (a) at each resolution i for i = 0     p − 1, we let mi ≡ bi /bi+1 − 1; (b) we take mp ≡ N S; and (c) for i = 0     p, we fit a poly i · of the form (7) with appropriate nomial function R degree r to a set of points having the general form   (9) Zj  Wj T  j = 1     mi as defined in Section 2. If p ≥ 1 and the resolution i is in the range 0     p − 1, then at the jth point in (9) we take the abscissa Zj = jbi+1 and the ordinate Wj = Gi jbi+1  for j = 1     mi , where Gi · is defined in (2). Regardless of the value of p, at resolution p we take Zj = j and Wj = j/N S for j = 1     N S, where j is defined in the paragraph immediately following Remark 1. Since Wj is always a proportion, we exploit the variance-stabilizing transformation

  (10) for j = 1     mi Yj ≡ sin−1 Wj see Draper and Smith (1998).

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

7

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

Corresponding to the dependent variable Yj defined by (10), we define the independent variable Xj ≡ Zj /bi ∈ 0 1

for j = 1     mi

and in terms of the vector of regression coefficients )/2 if r = 1 Cr ≡ (11) C1      Cr−1 T  if r > 1 for the degree-r polynomial   )     u if r = 1    2  fr u Cr  ≡ r−1

r−1    )     − Ck ur  if r > 1   C k uk +  2 k=1 k=1 for all u ∈ 0 1 , (12) we postulate the following statistical model for Yj as a function of Xj , Yj = fr Xj Cr  + -j

for j = 1     mi 

(13)

Based on (13), we assume that the transformation (10) yields approximately independent normal errors -j  with mean zero and constant variance . 2 so that we have iid

-j  j = 1     mi  ∼ N0 . 2 

(14)

Notice that in (12) we have fr 0 Cr  = 0 and fr 1 Cr  = )/2. Remark 2. In practice we have found that (10) yields approximately constant variance for the transformed responses and errors in (13); however, the assumptions of normal and stochastically independent errors in (14) are generally violated. As discussed in Sections 4–5, it appears that achieving approximately constant variance of the transformed responses and errors in (13) is sufficient to ensure reasonable effectiveness of the LRT for determining the degree of each estimated auxiliary function of the form (7). If we take r = 1 in (12), then no parameter estimation is required for the statistical model (13) of the transformed responses at resolution i. To test the adequacy of the resulting fit, we perform a standard regression analysis based on the ratio of the error sum of squares for the degree-1 fit, 2 mi   )j SSE1 ≡ Yj −  2mi + 1 j=1 to the total sum of squares, SST ≡

mi  j=1

Yj − Y 2 

mi 1  where Y ≡ Y mi j=1 j

If SSE1 /SST < 001, then the regression model (13) of degree 1 accounts for at least 99% of the variation in the transformed dependent variable Yj ; and thus we assume that an acceptable fit is obtained with a polynomial of degree r = 1. On the other hand, if SSE1 /SST ≥ 001, then to obtain an acceptable fit with (13) we must consider larger values of r, provided mi is sufficiently large to permit this.  r , the ordiFor each r r = 2 3   , we see that C nary least squares (OLS) estimator of the coefficient vector Cr in (12), is the solution of the following constrained least-squares regression problem:  r = arg min C r C

mi 

r  2 Yj − fr Xj C

(15)

for all u ∈ 0 1

(16)

j=1

subject to r  = 0 fr u C Observe that r  = fr u C

r−1  k=1

kC k uk−1 + r



 ) r−1 − C k ur−1 2 k=1 for all u ∈ 0 1  (17)

so that (16) can be interpreted as requiring that the zeros of the degree-r − 1 polynomial (17) lie outside 0 1; and thus the constraint (16) ensures that i s is strictly the corresponding function estimator R increasing on the open interval 0 bi  and monotone increasing on the closed interval 0 bi . Associated with (15) is the error sum of squares for the degree-r fit, SSEr =

mi  j=1

 r  2 Yj − fr Xj C

for r = 1 2    

(18)

 1 ≡ )/2 in conformance with (11). where we take C  r is both the MLE and the Now for each r (r ≥ 2), C OLS estimator of Cr when (13)–(14) hold; moreover, we see that . r2 ≡ SSEr /mi

for r = 1 2   

(19)

is the maximum likelihood estimator of . 2 for each postulated value of r (Seber 1977). Given Y ≡ Y1      Ymi T , we see that the associated likelihood function is given by  mi  r  2 Yj − fr Xj C  1 1  r Y = Lr C exp − √ 2 . r2 r 2) j=1 .  −m /2 = 2)e. r2 i  so that the resulting log-likelihood function for the fixed degree r is   2   r Y = − mi ln2) + 1 + ln . (20) r  r  C 2

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

8

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

The degree r of the polynomial (12) is determined by a likelihood ratio test that has been adapted to constrained nonlinear regression. Our approach to formulating an appropriate LRT for determining r is based on a similar technique used by Avramidis and Wilson (1994) in the context of another constrained nonlinear regression problem. At the outset, we assume that (12)–(14) hold for some value of r to be determined. Starting with the degree r = 2 and  r to (15), we seek to computing the optimal solution C test the null hypothesis that r−1  k=1

Ck =

) 2

(21)

(so that the degree of the polynomial (12) is at most r − 1) versus the alternative hypothesis that r−1  k=1

Ck =

) 2

(so that the degree of the polynomial (12) is at least r). If (12)–(14) and (21) hold, then Theorem 4.4.4 of Serfling (1980) ensures that D

 r Y − r−1 C  r−1 Y −→ 2 2 1 2 r C m → i

D

where −→ denotes convergence in distribution as mi mi → tends to infinity and 2 2 1 denotes a chi-squared random variable with 1 degree of freedom. In view of (20), we see that 2  r Y − r−1 C  r−1 Y = −mi ln. r2 / .r−1  2 r C

We perform the likelihood ratio test at the level of significance , where 0 <  < 1, and the final estimate of r at resolution i is  1 if SSE1 /SST < 001 or mi ≤ 2    

2   r 2 r˜ = min r r ≥ 2 −mi ln . ≤ 21− 1 − 1 2  . r−1    otherwise, (22) 2 21− 1

where denotes the 1 −  quantile of the chisquared distribution with 1 degree of freedom. After determining the degree r˜ of the polynomial (12) used to fit the transformed arrival data (10) at resolution i, we take the same degree for the polynomial (7) that is used to fit the original (untransformed) arrival data (9) as follows: Wj = sin2 Yj  = fr˜ Xj i  + 3j

for j = 1     mi  (23)

where the regression coefficient vector i is given ˜ by (8) with dimension r = r.

Remark 3. Although the postulated error structure of the model (23) for the original (untransformed) responses is admittedly inconsistent with the postulated error structure for the model (13) of the transformed responses, we obtained substantially better values of the goodness-of-fit performance measures detailed in Section 5.2 below by fitting the original arrival data directly to the model (23) rather than untransforming the model (13) fitted to the transformed arrival data. For further discussion of this point, see Avramidis and Wilson (1994) and Section 13.6 of Draper and Smith (1998). Kuhl and Wilson (2000) formulate and evaluate an ordinary least squares (OLS) procedure for estimating NHPPs having parametric rate functions of the type called “Exponential-Polynomial-Trigonometric with Multiple Periodicities” (EPTMP); and they demonstrate that the estimation accuracy and computational efficiency of the OLS procedure in this type of application is reasonable. The same OLS procedure is applied  i of the vector i here to yield the final OLS estimator  of regression coefficients in the model (23). Figure 1 provides a formal algorithmic statement of the LRTbased multiresolution estimation procedure. Remark 4. Theorem 3 of Kuhl and Wilson (2001) provides general conditions under which with probability one a mean-value function estimator t based on (1)–(6) will converge uniformly to the theoretical mean value function t for all t ∈ 0 S . Unfortunately the LRT-based multiresolution estimation procedure of Figure 1 does not yield auxil i · i = 0     p that coniary function estimators R verge in any sense to their theoretical counterparts Ri · i = 0     p; and thus the resulting final mean 0       p  is not value function estimator t = t  guaranteed to converge to t for all t ∈ 0 S . Although our LRT-based multiresolution estimation procedure is heuristic, the results of Sections 4–5 show that this procedure can nevertheless yield accurate estimates of complex arrival processes exhibiting long-term trends and multiple periodic effects.

4.

Illustrative Example

To illustrate the automated multiresolution procedure for estimating NHPPs, we elaborate the example introduced in Section 2. Recall that the arrival rate has a long-term trend over a time horizon of 28 days as well as weekly and daily cyclic effects. Since the arrival process has two periodic components, we estimated auxiliary functions defined by (1) at each of the following levels of detail: (a) resolution 0, corresponding to the long-term trend over a period of 28 days; (b) resolution 1, corresponding to the cyclic rate component with a period of one week; and (c) resolution 2, corresponding to the cyclic rate component

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

9

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

LRT-Based Multiresolution Estimation Procedure [0] Initialize the resolution index i ← 0.  0       p  defined by (3)–(6) using the estimated regression [1] If i > p then deliver the fitted mean-value function t = t   0       p  in (7)–(8); and stop. coefficients  If i ≤ p, then take the composite (10) of the arc-sine and square-root transformations of the original arrival data (9) within the basic resolution-i cycle 0 bi . i · as an estimator of Ri ·. [2] Test the adequacy of the degree-1 polynomial R  i ← 1 in the model (7) for the original (untransformed) responses at (a) If SSE1 /SST < 001 or mi ≤ 2, then set r˜ ← 1 and  resolution i; set i ← i + 1; and go to [1]. (b) If SSE1 /SST ≥ 001 and mi > 2, then set r ← 2 and go to [3].  r of the coefficient vector Cr in the model (13) for the transformed responses at resolution i. [3] Compute the OLS estimator C  r−1  0 T to obtain C  r ; for example, to compute C  2 , start with C 2 = C  1  0 T = )/2 0 T . r = C (a) Use starting value C  (b) Compute Cr according to (15) subject to (16). (c) Compute . r2 = SSEr /mi according to (18)–(19). 2 2 [4] If −mi ln. r2 / .r−1  ≤ 21− 1, then set r˜ ← r − 1 and go to [5]; otherwise, set r ← r + 1 and go to [3].

˜ compute the OLS fit to the model (23) for the original (untransformed) data; and save the resulting [5] Using the fitted degree r,  i of the vector (8) of regression coefficients in the model (7) for Ri ·. Set i ← i + 1 and go to [1]. ˜ r-dimensional estimator  LRT-Based Multiresolution Estimation Procedure

with a period of one day. The input data file contained 216 arrival times, and we specified the significance level  = 010 for the LRT (22) at each level of resolution. (The arrival data for this example are available from the web site discussed in Section 6.) At resolution 0, we obtained a data set composed of m0 = 3 points corresponding to the cumulative fraction of arrivals observed at the end of each week, excluding the endpoints of the overall observation interval (where the associated cumulative arrival percentages of 0% and 100% will be enforced automatically by the form of the fitted polynomials). After transforming this data set according to (10) and applying the LRT-based fitting procedure, we found the fitted function f r˜0 · of the form (12) to be linear so that we took r˜0 = 1. Consequently, we used a first 0 · of the form (7) with r = r˜0 = 1 degree polynomial R to approximate the original (untransformed) data on the cumulative fraction of arrivals at the end of each week in the overall observation interval. Figures 2 and 3 respectively show the fitted functions f r˜0 · and 0 · for resolution 0, where we plotted both functions R on the original time scale of resolution 0 so that the two fits can be compared easily. Next, we estimated the resolution-1 auxiliary function R1 · associated with a period of one week. First we superimposed the arrivals over each week. Thus we obtained a data set composed of m1 = 6 points corresponding to the cumulative fraction of arrivals observed at the end of each day of the week, excluding the week’s endpoints—that is, we took observations at the end of each successive resolution-2 cycle nested within the basic resolution-1 cycle, except for the endpoints of the latter cycle. We then transformed these observations using (10) and applied the LRTbased fitting procedure, obtaining a cubic polynomial

of the form (12) so that we took r˜1 = 3. Figure 4 depicts f r˜1 ·, the fitted polynomial at resolution 1 for the transformed data. Next we fitted a cubic polynomial 1 · of the form (7) with r = r˜1 = 3 to the original R (untransformed) data as shown in Figure 5. Notice 1 · that in Figures 4 and 5, we plotted both f r˜1 · and R on the original time scale of the basic weekly cycle associated with resolution 1. The next step in the procedure was to estimate the resolution-2 auxiliary function R2 · associated with a period of one day. We superimposed the m2 = 216 data points corresponding to the cumulative fraction of arrivals at each individual arrival time expressed merely as the time of day (fraction of the full day) at which the arrival occurred. Thus at resolution 2, we fitted a polynomial of degree r˜2 = 5 to obtain the function f r˜2 · shown in Figure 6 as a smooth curve. The smooth curve in Figure 7 depicts the corresponding Transformed cumulative fraction of arrivals

Figure 1

1.6

0.8

0 0

7

14

21

Time (days)

Figure 2

Estimated Function f r˜0 · for Transformed Data at Resolution 0

28

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

10

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

1.0

Cumulative fraction of arrivals

Cumulative fraction of arrivals

1.0

0.5

0

0.5

0 0

7

14

21

28

0

1

2

Figure 3

Estimated Function R 0 · for Original (Untransformed) Data at Resolution 0

Figure 5

1.6

0.8

0 1

2

3

4

5

6

7

5

6

7

Estimated Function f r˜1 · for Transformed Data at Resolution 1

Estimated Function R 1 · for Original (Untransformed) Data at Resolution 1

1.6

0.8

0 0

Time (days)

Figure 4

4

by the step function. We see that in this example, the multiresolution procedure was capable of accurately approximating the empirical mean-value function as well as the empirical counterparts of the auxiliary functions Ri · i = 0     p based on a single realization of the target arrival process. The estimation procedure illustrated in this paper is controllable in the sense that at each resolution, the user can specify the maximum degree of the polynomial fitted to the data as well as the significance level for the likelihood ratio test. The above example with three resolutions is a fairly complex application that clearly illustrates the flexibility of the fitting procedure. At resolution 0, Figure 3 shows that we obtained a reasonable linear fit to the original (untransformed) data. Similarly at resolution 1, Figure 5 reveals that the fitted cubic function provided an excellent fit to the original data. The most difficult task was to obtain an adequate approximation to the 216 arrival times expressed at resolution 2 merely as arrival times Transformed cumulative fraction of arrivals

Transformed cumulative fraction of arrivals

2 · of the form (7) with estimated auxiliary function R r = r˜2 = 5 based on the original (untransformed) data. The step function in Figure 7 represents the empirical cumulative distribution function (c.d.f.) over each day associated with the original arrival times expressed merely as the time of day for each arrival, and the step function in Figure 6 represents the corresponding transformed c.d.f. values. Table 1 displays coefficients for the estimated functions f r˜i · i = 0     p for the transformed data as well as the coefficients for estimated auxiliary func i · i = 0     p for the original data. tions R After fitting the required auxiliary functions at each resolution, we evaluated the corresponding estimate of the mean-value function. Figure 8 displays the following plots: (a) the fitted mean-value function as specified by (3)–(6), which is represented by the smooth curve; and (b) the empirical mean-value function for the given realization of this arrival process— that is, the observed cumulative number of arrivals at each individual arrival time, which is represented

0

3

Time (days)

Time (days)

0.2

0.4

0.6

0.8

Time (days)

Figure 6

Estimated Function f r˜2 · for Transformed Data at Resolution 2

1.0

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

11

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

1.0

Cumulative arrivals

Cumulative fraction of arrivals

200

0.5

150

100

50

0 0

0 0

0.2

0.4

0.6

0.8

7

14

1.0

21

28

Time (days)

Time (days) Figure 7

Estimated Function R 2 · for Original (Untransformed) Data at Resolution 2

during the day. Figure 7 shows that the fitted polynomial of degree 5 provided an excellent approximation to the distribution of arrival times during the day. Finally the fitted mean-value function in Figure 8 reveals that the overall arrival process was modeled with reasonable accuracy and that the LRT-based multiresolution estimation procedure performed acceptably in this application. Notice that the discrepancy at time 14 in Figure 8 is a result of the size of the residual (estimated error) at time 14 for resolution 0 as shown in Figure 3.

5.

Experimental Performance Evaluation

To evaluate the performance of the LRT-based multiresolution estimation procedure, we selected eight test processes that are NHPPs satisfying Assumptions 1 and 2. Each selected test process has a rate function with one or two cyclic components as well as a general trend over time; moreover, we formulated the corresponding theoretical auxiliary functions Table 1

Figure 8

for Illustrative Example Fitted Mean Value Function ·

Ri · i = 0     p defined by (1) so that every Ri · has the functional form prescribed by the right-hand side of (7). For each selected test process, we compared the fitted degree of the estimated auxiliary i · with the actual degree of the correfunction R sponding theoretical auxiliary function Ri · at resolution i for i = 0     p. We chose two types of performance measures to gauge the estimation accuracy achieved by the fitting procedure. Performance measures of the first type describe the accuracy of the fitted rate and meanvalue functions as estimators of the underlying theoretical rate and mean-value functions, respectively. Notice that for each selected test process, the theoretical rate and mean-value functions are related to the theoretical auxiliary functions Ri · i = 0     p through equation (15) of Kuhl and Wilson (2001) and thus can be evaluated exactly (at least to the limits of machine accuracy). Performance measures of the second type describe the accuracy of the fitted mean-value function as an estimator of the empirical mean-value function—that is, the observed buildup of arrivals as a function of time within a single realization of the target arrival process.

Estimated Polynomial Coefficients for Illustrative Example Estimated coefficients for transformed process

Resolution i 0 1 2

C1 i 0 0560998693 0 552856266 7 20057869

C2 i — −0 102613039 −32 8600502

C3 i

C4 i

C5 i

— 0 00795580633 78 1004944

— — −84 660675

— — 33 7904472

Estimated coefficients for original (untransformed) process Resolution i 0 1 2

1 i

2 i

0 0357142857 0 264645542 2 2876811

— −0 0184986202 −5 25465107

3 i — 0 0001571824 11 5634518

4 i — — −13 1874561

5 i — — 5 59097481

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

12

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

5.1. Generation of Experimental Data We considered the following factors in constructing each test process used in our simulation experiments: (a) the number of cyclic effects; (b) the cycle lengths (periods) for the cyclic effects; (c) the degrees of the polynomials characterizing the associated cyclic components; and (d) the degree of the polynomial characterizing the long-term trend. We selected the cycle lengths (periods) of the cyclic effects as multiples of ten so that the LRT-based multiresolution estimation procedure would have at least nine points in the data set to be fitted at each resolution. We selected the polynomial degrees as follows: (i) for long-term trends, we used degrees one, three, and five; and (ii) for cyclic effects, we used degrees three and five. Table 2 displays the coefficients of all the polynomials used in our experimental performance evaluation. These polynomials have been specifically selected to yield a diversity of shapes for the auxiliary functions Ri · i = 0     p. For each of the eight selected test processes specified in Table 2, we generated K = 100 independent realizations of the test process using the simulation procedure of Kuhl and Wilson (2001) based on inversion of the theoretical mean-value function for the process. In light of a similar experimental performance evaluation by Kuhl and Wilson (2000) for OLS estimation of NHPPs having EPTMP-type rate functions, we concluded that by performing K = 100 Table 2 Case (test process)

independent replications of our LRT-based multiresolution estimation procedure applied to each selected test process, we would obtain reasonably stable estimates of the required goodness-of-fit measures for each test process. 5.2. Formulation of Performance Measures Johnson et al. (1994) formulate and use a set of numerical performance measures to evaluate their MLE-based procedure for estimating the rate and mean-value functions for NHPPs having rate functions of the type they called “Exponential-PolynomialTrigonometric" (EPT). Kuhl et al. (1997b) and Kuhl and Wilson (2000) extend this set of performance measures to evaluate their MLE- and OLS-based procedures for estimating an NHPP with an EPTMP-type rate function. In this paper, we use the goodness-offit criteria of Kuhl et al. (1997b) and Kuhl and Wilson (2000) as well as graphical techniques to delineate the performance of the LRT-based multiresolution estimation procedure when it was applied to the selected test problems. For completeness, in this section we explicitly define each performance measure used. The goodness-of-fit statistics encompass (a) absolute measures of error computed across all 100 independent replications of a given test process that constitute a single experiment, and (b) relative performance measures that are also computed across all 100 replications of a given test process but can be compared across experiments involving different test

Polynomial Coefficients Used to Generate Experimental Data Polynomial coefficient Resolution i

1 i

2 i

1

0 1

0 100000 1 629999

— −2 699994

2

0 1

0 100000 3 570480

3

0 1

4

3 i

4 i

5 i

— 2 069995

— —

— —

— −19 049687

— 50 161603

— −55 057099

— 21 374703

0 284523 1 629999

−0 056150 −2 699994

3.769841E−3 2 069995

— —

— —

0 1

0 281048 3 570480

−0 104787 −19 049687

0 021632 50 161603

−1.818530E−3 −55 057099

5.220857E−5 21 374703

5

0 1 2

0 010000 0 284523 1 629999

— −0 056150 −2 699994

— 3.769841E−3 2 069995

— — —

— — —

6

0 1 2

0 010000 0 281048 3 570480

— −0 104787 −19 049687

— 0 021632 50 161603

— −1.818530E−3 −55 057099

— 5.220857E−5 21 374703

7

0 1 2

2.261904E−3 0 284523 1 629999

2.559523E−4 −0 056150 −2 699994

−1.785714E−6 3.769841E−3 2 069995

— — —

— — —

8

0 1 2

5.667226E−3 0 281048 3 570480

7.088293E−5 −0 104787 −19 049687

2.290776E−6 0 021632 50 161603

−7.189598E−8 −1.818530E−3 −55 057099

4.623270E−10 5.220857E−5 21 374703

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

13

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

processes. Let ˜ k t and  k t denote the estimated rate and mean-value functions, respectively, that were computed on replication k of a given test process k = 1     K. The average absolute error 5k and the maximum absolute error 5∗k incurred in estimating the rate function t on the kth replication are given by 5k ≡ and

1 S ˜  k t − t dt S 0

respectively, for k = 1     K. Similarly, the average absolute error 6k and the maximum absolute error 6∗k incurred in estimating the mean-value function t on the kth replication are given by

and

1 S   k t − t dt S 0

  6∗k ≡ max   k t − t 0 ≤ t ≤ S 

respectively, for k = 1     K. We let 5 and V5 respectively represent the sample mean and the sample coefficient of variation of the observations 5k  k = 1     K so that we have   K K  2 1/2  1  1  5 5k − 5 5k and V5 = 5 = K k=1 K − 1 k=1 The statistics 5∗ and V5∗ are calculated similarly from  and V6 the observations 5∗k  k = 1     K. We let 6 denote the analogous performance measures based on the average errors 6k  incurred in estimating the  ∗ and V6∗ mean-value function. Moreover, we let 6 denote the analogous performance measures based on the maximum errors 6∗k  incurred in estimating the mean-value function. To facilitate the comparison of results across different test processes, we also computed the following relative (normalized) goodness-of-fit statistics proposed by Johnson et al. (1994) and Kuhl et al. (1997b): 5 ≡ 6 ≡  S 0

5  S/S

 6 t dt/S



5∗ ≡

5∗  S/S

and 6∗ ≡  S 0

∗ 6 t dt/S

Nk tj k  = j

for j = 1     Nk S

Thus the sum of squared estimation errors and the mean squared estimation error incurred in approximating the empirical mean-value function on the kth replication are given by

  5∗k ≡ max  ˜ k t − t 0 ≤ t ≤ S 

6k ≡

given test process (k = 1     K). Notice that at the time tj k of the jth arrival on the kth replication of a given test process, the empirical mean-value function is given by



Following Kuhl and Wilson (2000), we also computed goodness-of-fit statistics that measure the ability of the estimation procedure to approximate the empirical mean-value function on each individual realization of a given arrival process. Let tj k  j = 1     Nk S denote the arrival epochs observed in the time interval 0 S on the kth replication of a

SSEEk ≡

Nk S

 j=1

 k tjk −j 2 and MSEEk ≡ SSEEk /Nk S

respectively, for k = 1     K. Furthermore, we let SSEE and VSSEE respectively represent the sample mean and the sample coefficient of variation of the observed values SSEEk  k = 1     K. Similarly, we let MSEE and VMSEE respectively represent the sample mean and the sample coefficient of variation of the observed values MSEEk  k = 1     K. On the kth replication of a given test process, the average absolute error and the maximum absolute error incurred in estimating the empirical mean-value function are given by Dk ≡ and

k S 1 N   t  − j Nk S j=1 k j k

  Dk∗ ≡ max   k tj k  − j j = 1     Nk S 

 respectively, for k = 1     K. Furthermore, we let D ∗  and D denote the sample means of the observed values Dk  k = 1     K and Dk∗  k = 1     K, respectively. Kuhl and Wilson (2000) also formulate two types of relative performance measures in which the Dk  and Dk∗  are standardized to facilitate comparison across experiments. The first type of relative performance measure based on the Dk  and Dk∗  uses the grand average level of the empirical mean-value function computed over all K replications of length S  to normalize the average performance measures D ∗  and D so that we have  D ≡ K  S k=1 0

 D Nk t dt/KS

D∗ ≡ K  S k=1 0

and

∗ D Nk t dt/KS



To formulate alternatives to D and D∗ that in some circumstances may be more useful in evaluating the multiresolution procedure, on the kth replication of a

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

14

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

given test process we express the performance measures Dk and Dk∗ as percentages of the average level of the empirical mean-value function within that replication for k = 1     K, and then we average the resulting percentages across all replications of the given test process. Thus we obtain the normalized statistics HD ≡

K Dk 1 S K k=1 Nk t dt/S 0

HD∗ ≡

1 K

K  k=1

S 0

Table 3 Case (test process)

Nk t dt/S



In addition to computing numerical performance measures, we produced graphs to provide a visual means of judging the quality of the fits obtained with the LRT-based multiresolution estimation procedure. We graphed the underlying theoretical rate and mean-value functions along with tolerance bands for those functions that are based on our replicated sample estimators of those functions. For each fixed time t ∈ 0 S , let

˜ 1 t < ˜ 2 t < · · · < ˜ K t denote the ordered estimates of t computed from all K replications of the estimation procedure. Then for 0 < # < 1, an approximate 1001 − #% tolerance interval for t is given by  

˜ K#/2 t ˜ K1−#/2 t  where z denotes the smallest integer greater than or equal to z. Similarly, tolerance intervals are determined for the mean-value function t at a fixed time t ∈ 0 S . 5.3. Presentation and Analysis of Results Table 3 provides a comparison of the actual degree i · for all levof Ri · with the fitted degree of R els of resolution of each test process. Notice that the worst performance of the LRT (22) was observed for resolution 1 in case (test process) 2, for which a polynomial of the correct degree 5 was fitted on 88% of the replications while polynomials of degree 6 or 7 were fitted on 12% of the replications. Notice moreover that the LRT (22) avoided underestimating the degree of the polynomial to be fitted at every resolution for every test process. Thus in every test process we avoided the bias in estimating the auxiliary functions Ri · i = 0     p that is inherent in underestimating the number of terms in these functions; see Chapter 10 of Draper and Smith (1998). It should also be recognized that the errors -j  in the statistical model (13) on which (22) is based are not, in fact, normal and uncorrelated as we have assumed in (14);

Fitted degree

Resolution

Actual degree

1

0 1

1 3

100

2

0 1

1 5

100

3

0 1

3 3

4

0 1

5 5

5

0 1 2

1 3 3

100

0 1 2

1 5 5

100

7

0 1 2

3 3 3

8

0 1 2

5 5 5

and

Dk∗

Frequency of Fitted Polynomial Degree

6

1

2

3

4

5

6

7

88

10

2

100 96

3

1

2

1

100 98

2

100 99 99

1 1

100

99 98

99 95

98 98 100

1 2

1 2

2 2

nevertheless, we believe the results in Table 3 provide good evidence of the effectiveness of the LRT (22) and of its robustness against violations of the basic assumption of normal uncorrelated errors underlying this test procedure. Tables 4 and 5 contain a summary of the goodnessof-fit statistics formulated in Section 5.2. Figures 9–12 display the graphs of 90% tolerance bands for the rate function and mean-value function associated with two representative cases—namely, test processes 1 and 3. Sumant (2003) provides complete experimental results for all eight test process. Since Table 4 shows that 6 ≤ 004 and 6∗ ≤ 008 for all test processes while 5∗ = 204 for test process 8, we see that while the mean-value function was estimated with reasonable accuracy in all test processes, the errors in estimating the rate function were much larger in some cases. Moreover, from the coefficients of variation V5 , V5∗ , V6 , and V6∗ displayed in Table 4, we see that the variability of the estimation errors was not excessive in any test process. Considering all the goodness-offit statistics in Table 4, we concluded that the OLS procedure for obtaining the estimated auxiliary func i · i = 0     p was reasonably accurate. In tions R i · are based, the statistical model (7) on which the R the residuals are highly nonnormal and strongly correlated; thus we concluded that in our applications to test processes 1–8, the OLS procedure for estimating the auxiliary functions Ri · was robust against violations of its underlying assumptions—provided that

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

15

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

Table 4 Case 1 2 3 4 5 6 7 8

Goodness-of-Fit Statistics for Estimating t and t 

V



∗

V∗

∗

 

V



∗ 

V∗

∗

4 63 5 77 5 89 13 00 0 58 1 29 0 68 1 40

0.45 0.34 0.31 0.09 0.34 0.09 0.29 0.09

0.05 0.06 0.06 0.13 0.06 0.13 0.07 0.14

12 76 21 26 40 02 86 44 4 17 8 69 7 11 20 42

0.54 0.49 0.47 0.25 0.51 0.33 0.40 0.29

0.13 0.21 0.40 0.86 0.42 0.87 0.71 2.04

13.47 10.09 14.88 18.40 13.50 13.76 16.15 15.22

0.71 0.88 0.59 0.40 0.71 0.70 0.58 0.55

0.03 0.02 0.03 0.03 0.03 0.03 0.03 0.04

27.36 20.57 27.88 39.42 27.39 28.29 30.48 35.00

0.70 0.84 0.54 0.30 0.70 0.66 0.56 0.46

0.06 0.04 0.06 0.07 0.06 0.06 0.06 0.08

the OLS procedure was supplied at the outset with accurate estimates of the degrees of the polynomials to be approximated. The results in Table 5 provide good evidence that the LRT-based multiresolution estimation procedure generally yielded a precise estimate of the target empirical mean-value function in each data set to which the procedure was applied. For example, in test process 5 we see that HD = 00001 and HD∗ = 00004. This means that on a “typical” realization of test process 5, the average and maximum absolute errors incurred in estimating the empirical mean-value function computed over that realization were, respectively, 0.01% and 0.04% of the time-weighted average level of the empirical mean-value function computed over that realization. Furthermore, we see that MSEE = 013 for test process 5. This provides another perspective on the accuracy with which the empirical mean-value function was estimated on each realization of test process 5. In general, the performance measures MSEE, D , D∗ , HD , and HD∗ in Table 5 show that the LRT-based multiresolution estimation procedure consistently yielded an accurate approximation to the empirical mean-value function in each test process. Remark 5. Although our primary focus is on the use of the LRT-based multiresolution estimation procedure when the arrival rate of the target process exhibits some form of periodic behavior as well as a long-term trend, the procedure can be used effectively for modeling and simulation of NHPPs even when there are no periodic effects so that we take p = 0 as the highest level of resolution. Table 5 Case 1 2 3 4 5 6 7 8

Moreover, we can take p = 0 and simply represent all time-dependent components of the arrival rate—both periodic and nonperiodic—within a single level of resolution. When we use the nonparametric approaches of Arkin and Leemis (2000) and Leemis (1991, 2004) for modeling and simulation of NHPPs, we always obtain a piecewise constant rate function. In contrast, if we use the LRT-based multiresolution procedure with p = 0, then we obtain a rate function that is continuous everywhere in 0 S and continuously differentiable everywhere except at the endpoints 0 and S. In some practical applications, we know that the rate function must be smooth based on prior information about the arrival-generation mechanism; this generally means that the rate function should be continuous everywhere and continuously differentiable everywhere except possibly at the endpoints of each basic cycle (if there are any cyclic effects). When we require meaningful estimates not only of the mean-value function but also of the rate function, the LRT-based multiresolution procedure may provide more useful information about the significant features of the rate function than could be obtained by other popular methods. Figures 9 and 11 clearly illustrate this point, and additional supporting evidence is available in Sumant (2003).

6.

Web-Based Input Modeling

The multiresolution procedure described in the previous sections has been implemented in a web-based input-modeling system that is documented in the Online Supplement to this paper on the journal’s web-

Goodness-of-Fit Statistics for Estimating Nt SSEE

VSSEE

MSEE

VMSEE

D

D∗

D

D ∗

HD

HD ∗

178 30 257 02 280 16 664 49 132 80 137 72 61 50 43 38

0.017 0.037 0.080 0.042 0.022 0.013 0.027 0.046

0.18 0.26 0.28 0.66 0.13 0.14 0.06 0.04

0.018 0.038 0.081 0.043 0.022 0.013 0.027 0.046

10 39 11 51 8 85 14 58 9 99 10 01 6 42 4 79

30.12 32.32 33.74 42.91 27.47 27.75 20.28 17.45

0.021 0.023 0.018 0.026 0.020 0.020 0.012 0.011

0.061 0.064 0.069 0.078 0.055 0.055 0.039 0.040

0.0001 0.0003 0.0002 0.0003 0.0001 0.0001 0.0001 0.0001

0.0005 0.0008 0.0008 0.0009 0.0004 0.0004 0.0003 0.0003

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

16

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

300

700

500

200

Arrival rate

Arrival rate

600

100

400 300 200 100

0 0

2

1

3

4

5

6

7

8

9

0

10

0

1

2

3

4

Time t Figure 9

Fitted Rate Function for Test Process 1

Figure 11

site. The system can be used not only to fit an NHPP to data using the multiresolution procedure but also to generate arrival times from the fitted NHPP that could be used as inputs to a simulation model. The benefits of the web-based input-modeling system include the ability for a simulation analyst to access the software for fitting NHPPs to data from anywhere in the world. In addition, the data analysis can be completed merely by means of a web browser and without the need to install the software on the user’s computer. Furthermore, the system can be used in simulation courses to teach students about NHPPs and data fitting without having to provide students with their own copy of the fitting software. Figure 13 depicts a sample screen from an online session in which we accessed the web-based inputmodeling system. To execute the fitting procedure over the web, the user will need to supply an ASCII (plain-text) file containing a single column of observed arrival times as detailed in the Online Supplement to this paper. The user will also be prompted for the following information about the data and the parameters for the

7

8

9

10

Fitted Rate Function for Test Process 3

1,200

Cumulative mean arrivals

Cumulative mean arrivals

6

fitting procedure: (a) the ending time of the observation interval, S; (b) the number of periodic (cyclic) components, p; (c) the cycle length (period) bi of the ith cyclic effect for i = 0     p; (d) the significance level  that will be used in the LRT (22) for determining the degree of the polynomial; and (e) the maximum degree of the polynomials to fit at each resolution. Once the user has submitted this information, the fitting procedure is executed and the following results are presented: (a) graphs of the fitted auxiliary functions at each level of resolution for both the transformed and original data sets; and (b) a graph of the fitted mean-value function superimposed on the empirical mean-value function. After fitting an NHPP to arrival data, we can use the website to generate independent realizations of the fitted arrival process that can in turn drive a simulation experiment. The resulting arrival times are written into an ASCII (plain-text) file in a single-column format that can be easily read into commercial simulation software packages during the associated simulation run(s) requiring such inputs. See the Online Supplement for screen

1,200

600

0

600

0 0

1

2

3

4

5

6

7

8

9

0

10

Fitted Mean-Value Function for Test Process 1

1

2

3

4

5

6

7

8

Time t

Time t Figure 10

5

Time t

Figure 12

Fitted Mean-Value Function for Test Process 3

9

10

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

17

Figure 13 Web-Based Input-Modeling Software Note. Microsoft product screen shot reprinted with permission from Microsoft Corporation.

shots illustrating each step in the estimation or simulation of NHPPs using our web-based input-modeling system.

7.

Conclusion and Recommendations

In this paper we have developed a completely automated procedure for estimating arrival processes that have time-dependent arrival rates exhibiting longterm trends or nested periodic effects that might not be symmetric. We have also provided some evidence that the procedure is sufficiently flexible to enable accurate estimation of the mean-value function of the target arrival process while using a relatively small number of parameters that can be efficiently estimated from a single realization of the process. Future extensions of the multiresolution procedure will include (a) a graphical user interface for invoking the procedure in widely used simulation software systems, and (b) efficient software for generating independent replications of a fitted arrival process that can be invoked “on the fly” (that is, directly during the course of a simulation run) in the selected simulation software systems without the need for manipulating external files. The inversion scheme of Kuhl and Wilson (2001) for simulating NHPPs fitted by the multiresolution estimation procedure is substantially faster than the corresponding inversion scheme of Kuhl et al. (1997b) for simulating NHPPs that have an EPTMP-type rate

function; and we believe that future development (b) above could coupled with the flexibility of the multiresolution procedure would provide practitioners with a useful tool for modeling and simulation of many time-dependent arrival processes. Another direction for future work is to develop statistical tools for validating the basic assumptions of the multiresolution procedure in arrival processes to which the procedure might be applied. Acknowledgments

This research was partially supported by NSF Grant DMI9900164. The authors thank David Kelton, Susan Sanchez, and the anonymous reviewers for several suggestions that improved the readability of this paper.

References Arkin, B. L., L. M. Leemis. 2000. Nonparametric estimation of the cumulative intensity function for a nonhomogeneous Poisson process from overlapping realizations. Management Sci. 46 989–998. Avramidis, A. N., J. R. Wilson. 1994. A flexible method for estimating inverse distribution functions in simulation experiments. ORSA J. Comput. 6 342–355. Çinlar, E. 1975. Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs, NJ. Draper, N. R., H. Smith. 1998. Applied Regression Analysis, 3rd ed. John Wiley & Sons, New York. Johnson, M. A., S. Lee, J. R. Wilson. 1994. Experimental evaluation of a procedure for estimating nonhomogeneous Poisson processes having cyclic behavior. ORSA J. Comput. 6 356–368.

18

Kuhl, Sumant, and Wilson: An Automated Multiresolution Procedure for Modeling Complex Arrival Processes

Kao, E. P. C., S.-L. Chang. 1988. Modeling time-dependent arrivals to service systems: A case in using a piecewise-polynomial rate function in a nonhomogeneous Poisson process. Management Sci. 34 1367–1379. Kuhl, M. E. 1994. Estimation and simulation of nonhomogeneous Poisson processes having multiple periodicities. M.S. thesis, Department of Industrial Engineering, North Carolina State University, Raleigh, NC, ftp://ftp.ncsu.edu/pub/eos/pub/jwilson/kuhl94ms.pdf. Accessed August 25, 2004. Kuhl, M. E., J. R. Wilson. 1996. User’s manual for mp3mle and mp3sim: Software for estimating and simulating nonhomogeneous Poisson processes. Department of Industrial Engineering, North Carolina State University, Raleigh, NC, ftp://ftp.ncsu.edu/pub/eos/pub/jwilson/mp3-user-man.pdf. Accessed August 25, 2004. Kuhl, M. E., J. R. Wilson. 2000. Least squares estimation of nonhomogeneous Poisson processes. J. Statist. Comput. Simulation 67 75–108. Kuhl, M. E., J. R. Wilson. 2001. Modeling and simulating Poisson processes having trends or nontrigonometric cyclic effects. Eur. J. Oper. Res. 133 566–582. Kuhl, M. E., H. Damerdji, J. R. Wilson. 1997a. Estimating and simulating Poisson processes with trends or asymmetric cyclic effects. S. Andradóttir, K. J. Healy, D. H. Withers, B. L. Nelson, eds. Proc. 1997 Winter Simulation Conf., Institute of Electrical and Electronics Engineers, Piscataway, NJ, 287–295, http://www.informs-sim.org/wsc97papers/0287.PDF. Accessed August 25, 2004. Kuhl, M. E., S. Sumant, J. R. Wilson. 2003. A flexible automated procedure for modeling complex arrival processes. S. Chick, P. J. Sánchez, D. Ferrin, D. J. Morrice, eds. Proc. 2003 Winter Simulation Conf., Institute of Electrical and Electronics Engineers, Piscataway, NJ, 399–407, http://www.informs-sim.org/ wsc03papers/049.pdf. Accessed August 25, 2004. Kuhl, M. E., J. R. Wilson, M. A. Johnson. 1997b. Estimating and simulating Poisson processes having trends or multiple periodicities. IIE Trans. 29 201–211.

INFORMS Journal on Computing 18(1), pp. 3–18, © 2006 INFORMS

Lee, S., J. R. Wilson, M. M. Crawford. 1991. Modeling and simulation of a nonhomogeneous Poisson process having cyclic behavior. Comm. Statist.—Simulation 20 777–809. Leemis, L. M. 1991. Nonparametric estimation of the cumulative intensity function for a nonhomogeneous Poisson process. Management Sci. 37 886–900. Leemis, L. M. 2004. Nonparametric estimation and variate generation for a nonhomogeneous Poisson process from event count data. IIE Trans. 36 1155–1160. Lewis, P. A. W. 1970. Remarks on the theory, computation, and application of the spectral analysis of series of events. J. Sound Vibration 12 353–375. Lewis, P. A. W., G. S. Shedler. 1976. Statistical analysis of nonstationary series of events in a data base system. IBM J. Res. Develop. 20 465–482. Massey, W. A., G. A. Parker, W. Whitt. 1996. Estimating the parameters of a nonhomogeneous Poisson process with linear rate. Telecomm. Systems 5 361–388. Pritsker, A. A. B. 1998. Life and death decisions: Organ transplantation allocation policy analysis. OR/MS Today 25 22–28. Pritsker, A. A. B., D. L. Martin, J. S. Reust, M. A. F. Wagner, O. P. Daily, A. M. Harper, E. B. Edwards, L. E. Bennett, J. R. Wilson, M. E. Kuhl, J. P. Roberts, M. D. Allen, J. F. Burdick. 1995. Organ transplantation policy evaluation. C. Alexopoulos, K. Kang, W. R. Lilegdon, D. Goldsman, eds. Proc. 1995 Winter Simulation Conf., Institute of Electrical and Electronics Engineers, Piscataway, NJ, 1314–1323. Seber, G. A. F. 1977. Linear Regression Analysis. John Wiley & Sons, New York. Serfling, R. J. 1980. Approximation Theorems of Mathematical Statistics. John Wiley & Sons, New York. Sumant, S. G. 2003. An automated procedure for simulating complex arrival processes: A web-based approach. M.S.I.E. thesis, Department of Industrial and Systems Engineering, Rochester Institute of Technology, Rochester, NY, ftp://ftp.ncsu. edu /pub /eos /pub /jwilson /sumant03ms. pdf. Accessed August 25, 2004.

Suggest Documents