Using RAPTOR to Evaluate Communication System Sparing Strategies

SIMULATION http://sim.sagepub.com

Using RAPTOR to Evaluate Communication System Sparing Strategies Stephen P. Chambal and Gerald T Mackulak SIMULATION 2002; 78; 681 DOI: 10.1177/0037549702078011003 The online version of this article can be found at: http://sim.sagepub.com/cgi/content/abstract/78/11/681

Published by: http://www.sagepublications.com

On behalf of:

Society for Modeling and Simulation International (SCS)


Downloaded from http://sim.sagepub.com at PENNSYLVANIA STATE UNIV on February 7, 2008 © 2002 Simulation Councils Inc.. All rights reserved. Not for commercial use or unauthorized distribution.

APPLICATIONS

Using RAPTOR to Evaluate Communication System Sparing Strategies

Stephen P. Chambal
Department of Operational Sciences, Air Force Institute of Technology, Wright-Patterson AFB, OH 45434
stephen.chambal@afit.edu

Gerald T. Mackulak
Department of Industrial Engineering, Arizona State University, Tempe, AZ 85287-5906

Simulation is a powerful tool for evaluating complex systems. RAPTOR (Rapid Availability Prototyping for Testing Operational Readiness) is a generic simulation developed by the Air Force and modified by the authors that predicts reliability, maintainability, and availability (RM&A) performance of systems and components. RAPTOR is uniquely suited for evaluating communication network system reliability, especially systems consisting of expensive components with highly variable availability times. Systems are modeled using a modified version of RAPTOR to compare competing sparing policies to obtain an overall target system availability. The sparing strategies evaluated differ by the order point and delivery time used to populate a common resource pool. The evaluation metric is high-level system availability over the life of the communication system. Alternative options are provided to the company in determining reorder and delivery strategy. This simplified case study demonstrates a strategy selection algorithm incorporating the RAPTOR simulation model.

Keywords: Reliability simulation of complex systems; generic modeling of communication networks; RAPTOR; analyses of competing sparing strategies

SIMULATION, Vol. 78, Issue 11, November 2002 681-691
© 2002 The Society for Modeling and Simulation International

1. Introduction

Growth in the telecommunications industry has been phenomenal. In the near future, systems will exist that provide voice, image, and data throughout a single integrated worldwide network. But with this growth come new challenges for communications systems engineers. Communications systems are becoming increasingly complicated in structure and increasingly diverse in their span of coverage. Current systems have limitations in both bandwidth (number of simultaneous users) and coverage (areas where connection is provided). At issue is whether a user needs to be able to connect 100% of the time, regardless of physical location or current system user level. Certainly, the ability of a user to connect will be a function of the availability designed into the system as well as the subscriber base. Users of future systems will be willing to pay a reasonable fee for service only if the system is available an acceptable percentage of time. The greater the level of bandwidth and coverage, the more users will be willing to pay for the service. However, dropped calls or system unavailability will quickly decrease the perceived advantage of such systems and ultimately lower the charge rate these systems are capable of sustaining.

Also at issue is the definition of the availability metric. For example, is a user willing to perceive two systems as equivalent when one system is down for 1 minute every 3 hours while the other is down 1 month every 15 years? At issue are the intervals between outages, the types of outages, and the duration of their repair.

Communication system designers need to address these issues subject to the cost of initial installation and maintenance of system components. It is a given that system components will fail and need to be replaced. At issue are the redundancy of design, the cost of maintenance and repair, the philosophy of the repair strategy, and the availability designed within the system. For example, a land-based system may be relatively inexpensive to maintain and offer high availability, yet offer only limited coverage, while a space-based system may offer the same level of availability and worldwide coverage but be expensive to maintain. The communication network availability metric depends on sparing strategies, maintenance delays, and a variety of other logistic and administrative considerations.


These constraints are often ignored in modeling networks and predicting system availability. The impact of these factors is critical to deciding on maintenance policies and determining overall system performance.

The purpose of this study is to provide guidance in comparing sparing ordering policies in support of communication networks that involve complicated ordering policies not handled by current reliability software. The timing and quantity of resupply, the feasibility of emergency sparing, and the manipulation of common sparing pools complicate the availability evaluation of communication networks. Our intent is to use a simulation approach to create competing sparing strategies as a function of lifetime availability. We use a modified version of RAPTOR (Rapid Availability Prototyping for Testing Operational Readiness) to complete this analysis.

The modeling methodology is discussed, including background material on generic reliability simulation. The simulation aspect is critical to understanding the capability of applying this approach to reliability and sparing analyses for any system confronted by the reader. Next, three sparing policies are developed and presented as the basis for the comparison case study. The policies are evaluated using the simulation software and modified to ensure equal lifetime availability for the communication network. The advantages and disadvantages of each policy are discussed. Final comments are directed at creating a cost metric for trade-off analyses.

2. Modeling Methodology

Modeling and simulation can be an effective tool for examining strategies that compare competing systems based on availability. Simulation provides the flexibility to include multiple sparing strategies for the same system. A great deal of research has been done in using generic simulation tools to provide effective modeling with reduced programming requirements.
Generic simulation tools allow engineers to model a variety of similar-type systems with maximum use of predefined routines and algorithms common to that class of problems. Often times, these simulation packages are point-and-click tools, which allow the system under test to be modeled with no programming required. Generic simulation tools provide the practicing engineer with a credible method for comparing alternative system configurations. Mackulak and Cochran [1] identified 15 features of using a generic simulation approach in manufacturing. Features for generic simulation tools in reliability, maintainability, and availability (RM&A) parallel many of these same requirements. No user coding, interactive capability through menus and windows, automated model execution, and automated output analysis are a few of the generic requirements. In a separate study, Mackulak, Savory, and Cochran [2] focused on important features of a simulation environment. Friendly user interface, generous graphics, and windows with menus are the top three listed features.


Once again, these requirements transfer from manufacturing to reliability engineering as analysts evaluate complex systems. Manufacturing simulation provides insight into solutions for developing an approach to handle communication networks.

A great deal of literature has been devoted to simulation in the manufacturing arena. Furthermore, software has been specifically developed for this purpose, and a number of these packages are reviewed by Banks, Carson, and Nelson [3]. A more complete review is provided by Banks [4] and covers 20 software packages. These packages are not applicable to telecommunications, and the practitioner would struggle in modeling networks with them. However, the communication industry has learned from manufacturing and developed modeling tools specifically designed for its own system analysis. CACI Products Company applied the same principles of generic modeling to communication networks in developing COMNET III. This package is primarily aimed at computer networks and focuses on call routing, routing protocols, storage devices, and transport resources [5]. Law [6] gives examples of simulation languages and simulators specifically used for communication networks. These include BONeS NetDESIGNER, MODSIM II, and SLAMSYSTEM (languages) and BONeS PlanNet and NETWORK II.5 (simulators). These systems focus on network effectiveness (utilization of links, blocked calls, etc.) rather than network suitability (RM&A). Although effectiveness and suitability are interconnected, this research assumes that a detailed and valid model exists in terms of system structure and behavior. The emphasis thus shifts to incorporating logistical issues of RM&A into the simulation model for evaluation.

Research in the field of reliability emphasizes generic modeling for system-level RM&A analysis. Generic models were written by Gonzales-Vega [7], Gopalakrishnan [8], and the National Aeronautics and Space Administration (NASA) Lewis Research Center [9, 10].
The Air Force Operational Test and Evaluation Center (AFOTEC) maintains ownership of the RAPTOR software [11]. This program was developed to enhance test and evaluation of new weapon systems and support equipment being purchased by the Air Force. RAPTOR leads the field of reliability simulation tools and sets the standard for software capability in performing RM&A analysis. RAPTOR is a modeling framework controlled completely by a point-and-click graphical user interface and is capable of quick analysis on RM&A characteristics of the defined system. The user chooses between 15 distributions to describe the failure and repair behavior of the individual components. The software handles sparing information, administrative and logistic delays, stand-by units, series, parallel, and any combination of reliability block diagrams [11]. The PC Windows-based simulation software executes rapidly depending on the size and complexity of the system. RAPTOR allows for long-term analysis as well as “mission-based” scenarios with specified time duration. The software graphically depicts the changing conditions of the system, allowing the users to visually examine the operating system during simulation [11]. This enhances model building, validation, and presentation capabilities above and beyond other simulation packages.

We use RAPTOR to implement the algorithm required for this study. The generic nature allows for accurate design of telecommunication networks, and the suitability focus provides the desired output metrics. Extensive changes to the code were implemented by Chambal [12] to enhance the logistics view of the software and to include the appropriate modeling behavior to evaluate competing sparing strategies. Additional modifications were programmed to implement the logistical changes in the input, model behavior, and output performance of the software. RAPTOR simulates multiple sparing strategies for identical systems, with the user defining input parameters based on delay times, ordering periods, spare arrivals, and order quantity. Furthermore, detailed reports are available for verification and validation along with additional analysis conducted outside the simulation environment. Output files can be directly loaded into other software tools for more detailed statistical and cost analysis.

Table 1 identifies the required input to build a reliability block diagram simulation. This information sets the input parameters for running the simulation. Changing these parameters allows the analysis to be completed.

3. Creating the Sparing Policies

The trade-off typical to sparing strategy analysis lies in the allocation of maintenance resources. If a company does not want to tie up capital with an inefficient spare pool, it is subject to the increased cost due to producing “rushed” spares. Often, an expensive component that is rushed through production and delivery can have increased cost that can exceed the lost capital tied to spares sitting in reserve. Of course, a company can elect to not rush a spare through production or delivery if it is willing to accept lower system availability.

The communications network in this study is an encoded version of an actual system (coded for confidentiality reasons) composed of nine subsystems or components separated by large physical distances. The system description has been simplified to explain the approach applied in this analysis and to maintain confidentiality of sensitive data. All nine components are independent and identical and experience the same rate of failure. All nine units must be operational for the system to be in an up or available condition. The failure rate is exponential, with a 10-year reliability of 50%. This implies a mean time to failure of 173 months, or just over 14.4 years (from $R(t) = e^{-\lambda t}$, solving for $\lambda$). The network components share pooled spares, which are identical to the original components. The target availability for the system is 93% over a 15-year life for the network.

3.1 Policy 1: The Current Sparing Strategy

The system currently operates under the following rules and will be referred to as Policy 1. One spare is placed in the spare pool at the beginning of system operation. The system operates until one component fails, causing the system to be unavailable. The pooled spare is used to replace the downed component, and the system becomes operational. The spare pool is not replenished until a second component fails. At this time, a spare is ordered and immediately used to replace the failed component upon its arrival. Therefore, the spare pool remains empty for the remainder of system operations once the first spare is required. Figure 1 provides an overview of this order and inventory management strategy.

The main disadvantage to this sparing design is guaranteed downtime (system unavailability). The downtime is exaggerated by failing to replenish the spare pool upon its depletion. The repair time to replace the downed component cannot be avoided; however, the delay time due to ordering a spare can be reduced. The situation is magnified due to the high cost of spare delivery and an added expense of production and delivery due to the “rushed” nature to fill the spare requirement. Furthermore, the unavailability of the system is very costly both in terms of lost revenue and loss of goodwill.

3.2 Policy 2: Routine Ordering Sparing Strategy

If the system were to order a replacement spare (now considered a “routine” spare) upon depletion of the spare pool, it could possibly reduce the downtime associated with the next component failure. If the routine spare arrives before another unit fails, the system only experiences downtime related to replacing the failed unit (maintenance repair time). This change in ordering policy creates a second sparing strategy that will be referred to as Policy 2. Policy 1 strategy uses rushed spares while the Policy 2 strategy uses routine spares and follows the flow diagram given in Figure 2. Policy 2 changes the timing of spare ordering but is still reactive in nature. No communication spares are ordered until individual components fail. The availability will undoubtedly benefit from implementing a periodic ordering policy, providing a more proactive approach to the final sparing policy.

3.3 Policy 3: Scheduled Ordering Sparing Strategy

Policy 3 is proactive in nature and involves ordering “scheduled” spares that arrive to the spare pool at a predetermined time interval. This policy also relies on rushed spares if failures occur prior to the arrival of the scheduled replenishment of the spare pool. The behavior of this policy is very similar to Policy 1, with the exception of scheduled spares. Figure 3 provides a pictorial representation of the sparing policy.
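The failure-rate arithmetic quoted in the system description of this section (a 10-year reliability of 50% implying a mean time to failure of about 173 months) can be checked with a few lines of Python. This is a minimal sketch and not part of RAPTOR; the function name `exponential_rate` and the variable names are illustrative assumptions. The series-system figure at the end relies on the stated independence of the nine identical components and ignores downtime.

```python
import math


def exponential_rate(reliability, t):
    """Solve R(t) = exp(-lambda * t) for lambda (hypothetical helper)."""
    return -math.log(reliability) / t


# Component: 50% reliability at 10 years (120 months), as stated in the text.
rate_per_month = exponential_rate(0.5, 120.0)
mttf_months = 1.0 / rate_per_month
mttf_years = mttf_months / 12.0

# Assumption: with nine independent exponential components in series, the
# time to the next system failure is exponential with nine times the rate.
n_components = 9
system_mtbf_months = 1.0 / (n_components * rate_per_month)

print(f"component MTTF: {mttf_months:.1f} months ({mttf_years:.2f} years)")
print(f"mean time between system failures: {system_mtbf_months:.1f} months")
```

Under these assumptions the nine-unit series system fails about nine times as often as a single component, which is why the ordering policy for the shared spare pool has a visible effect on lifetime availability.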


Table 1. Required input parameters for RAPTOR simulation software

Each entry is listed as Variable: Description, grouped by Category.

Category: Block details
  Name: Block name
  Operating/event block: Boolean variable; is the block a one-shot component (i.e., a switch)?
  Random stream number: Random stream for distribution draws
  Continues to operate: Boolean variable; does the component continue to operate or go into stand-by when the system fails?
  Stream dependency: Boolean variable; is this component dependent on other components?

Category: Failure/repair distributions
  Failure distribution: One of 15 failure distributions
  Parameters: Given parameters for the specified distribution
  Repair distribution: One of 15 repair distributions
  Parameters: Given parameters for the specified distribution
  Increment/decrement factor: Repair better, same, or worse than new condition
  Increment/decrement limit: The limit for the repair increment/decrement factor
  Repair by adjustment: Boolean variable; is the component repaired by adjust or just “remove and replace”?

Category: Maintenance information
  Adjustment repair distribution: One of 15 repair distributions
  Parameters: Given parameters for the specified distribution
  Infinite spares: Boolean variable; yes or no
  Pooled spares (PS) (includes variables below): Pooled sparing is used when multiple components use the same sparing capability
  PS - Initial stock level: Initial spares available
  PS - Rushed spares: Length of time to receive rushed spares and the number ordered
  PS - Order upon depletion: Boolean variable; do you order upon depletion or upon request?
  Custom spares (CS) (includes variables below): Custom sparing is used when each block has its own sparing capability
  CS - ALDT: Administrative and logistic delay time
  CS - Initial stock level: Initial spares available
  CS - Rushed spares: Length of time to receive rushed spares and number ordered
  Cold stand-by: These are different from spares and are returned to the stand-by pool after repair of original component

This information is used during the simulation runs to compare sparing strategies.

Additional changes could affect one or more of the previously discussed sparing policies. However, for this case study, these three policies will be used as a baseline for presenting alternatives to the decision maker. The policies will be implemented using the modified RAPTOR code to predict the lifetime availability. Input parameters are manipulated to ensure that all three policies have equitable availability. Once this is accomplished, the advantages and disadvantages can be addressed and presented to the decision maker with final recommendations.

[Figure 1: flowchart for Policy 1 (upon request).]
Figure 1. The simulation we use follows the flowchart to model sparing for the communications network. Rushed spares are ordered only after a component fails with an empty resource pool. These spares are rushed to the user for maintenance. Scheduled spares are added to provide added flexibility and arrive periodically.

4. Simulating the Policies to Create Equal Availability Options

The sparing strategies are implemented in the RAPTOR simulation model to determine the effect on system performance. RAPTOR accepts any ordering policy or combination of policies for maintaining the communication network. However, the goal of the simulations is to identify the specific levels of the input variables where all three policies have equal lifetime availability.

The initial analysis varies three factors at two levels (eight scenarios), and the system availability is predicted as the evaluation metric. The number of initial spares in the communications network is varied from one to two. The number of replenishment spares ordered is also varied from one to two units. Finally, the sparing policy is manipulated based on the previous discussions. Policies 1 and 2 are simulated, reflecting the rushed spares being ordered when a failed unit finds the spare pool empty and the routine spares being ordered upon depletion of the spare pool.

We assume the communication network is operational at the beginning of the simulation and evaluate the deterioration over the 15-year period (the life of the system) using an availability metric. All nine subsystems or components must be operating for the system to be available. The spare arrival time is set to 30 days for both ordering policies, Policy 1 (upon request) and Policy 2 (upon depletion). Once the spare has arrived, the repair distribution is uniform real, ranging from 15 to 30 days. One thousand replications are performed to calculate average availability.

The eight scenarios are simulated, and their availability results are provided in Table 2. The results of these simulations demonstrate that a high level of availability can be achieved with only one initial spare and a reorder quantity of one when required. The availability numbers may be artificially high due to the short delivery time assumed for this first step of analysis.
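The experiment described above can be approximated outside RAPTOR with a small Monte Carlo model. The following is a simplified sketch under stated assumptions and is not the authors' modified RAPTOR code: the function `simulate_availability`, its parameters, and the 365-day year are hypothetical; components are assumed not to fail while the system is down; the reorder quantity is fixed at one; and at most one routine spare is on order at a time.

```python
import math
import random

DAYS_PER_YEAR = 365.0  # assumption: the text does not give a calendar conversion


def simulate_availability(policy, delivery_days=30.0, initial_spares=1,
                          replications=1000, seed=1):
    """Average 15-year availability for 'request' (Policy 1) or 'depletion' (Policy 2)."""
    if policy not in ("request", "depletion"):
        raise ValueError("policy must be 'request' or 'depletion'")
    rng = random.Random(seed)
    horizon = 15.0 * DAYS_PER_YEAR
    component_rate = -math.log(0.5) / (10.0 * DAYS_PER_YEAR)  # 50% at 10 years
    system_rate = 9 * component_rate  # nine units in series; any failure downs the system
    total = 0.0
    for _ in range(replications):
        clock, uptime = 0.0, 0.0
        pool = initial_spares
        incoming = None  # arrival time of the routine spare on order, if any
        while clock < horizon:
            time_to_failure = rng.expovariate(system_rate)
            if clock + time_to_failure >= horizon:
                uptime += horizon - clock
                break
            uptime += time_to_failure
            clock += time_to_failure
            if incoming is not None and incoming <= clock:
                pool += 1  # the routine spare arrived while the system was up
                incoming = None
            if pool > 0:
                pool -= 1
                wait = 0.0
            elif incoming is not None:
                wait = incoming - clock  # wait for the spare already in transit
                incoming = None
            else:
                wait = delivery_days  # rushed spare ordered at the failure
            if policy == "depletion" and pool == 0 and incoming is None:
                incoming = clock + wait + delivery_days  # reorder once the pool is empty
            clock += wait + rng.uniform(15.0, 30.0)  # repair after the spare is on hand
        total += uptime / horizon
    return total / replications


if __name__ == "__main__":
    for name in ("request", "depletion"):
        print(name, round(100.0 * simulate_availability(name), 1))
```

Because the sketch omits much of what RAPTOR models (order quantities above one, failures during downtime, administrative delays), its numbers should be read only as a rough cross-check on the direction and size of the policy effect, not as a reproduction of Table 2.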


[Figure 2: flowchart for Policy 2 (upon depletion).]
Figure 2. The simulation uses this flowchart to incorporate routine spares as an alternative strategy for sparing policy. The spares are ordered as the resource pool is depleted.

Table 2. Increased availability as the number of initial spares and spares ordered increases from one to two

Number of Initial Spares   Ordering Policy   Number of Spares Ordered   System Availability (%)
1                          Upon request      1                          92.0
1                          Upon request      2                          93.8
2                          Upon request      1                          92.6
2                          Upon request      2                          94.1
1                          Upon depletion    1                          96.1
1                          Upon depletion    2                          96.2
2                          Upon depletion    1                          96.1
2                          Upon depletion    2                          96.2

Ordering spares upon depletion returns high availability, but both strategies are close to the target 93% availability with one initial spare and one spare ordered at a time.
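Since Table 2 is a full two-level design in three factors, the average effect of each factor on availability can be read off directly. The sketch below transcribes the eight reported values and computes simple main effects (mean at the high level minus mean at the low level). The names `TABLE_2` and `main_effect` are illustrative assumptions, and this summary is our own reading of the table rather than an analysis reported by the authors.

```python
# Rows: (initial spares, ordering policy, spares ordered, availability %),
# transcribed from Table 2.
TABLE_2 = [
    (1, "request", 1, 92.0),
    (1, "request", 2, 93.8),
    (2, "request", 1, 92.6),
    (2, "request", 2, 94.1),
    (1, "depletion", 1, 96.1),
    (1, "depletion", 2, 96.2),
    (2, "depletion", 1, 96.1),
    (2, "depletion", 2, 96.2),
]


def main_effect(rows, column, low, high):
    """Mean availability at the high level minus mean at the low level."""
    high_values = [row[3] for row in rows if row[column] == high]
    low_values = [row[3] for row in rows if row[column] == low]
    return sum(high_values) / len(high_values) - sum(low_values) / len(low_values)


effect_initial_spares = main_effect(TABLE_2, 0, 1, 2)
effect_policy = main_effect(TABLE_2, 1, "request", "depletion")
effect_order_quantity = main_effect(TABLE_2, 2, 1, 2)

print(f"initial spares 1 -> 2:       {effect_initial_spares:+.3f} points")
print(f"upon request -> depletion:   {effect_policy:+.3f} points")
print(f"spares ordered 1 -> 2:       {effect_order_quantity:+.3f} points")
```

On these eight values the ordering policy accounts for by far the largest shift (about three percentage points), while adding a second initial spare changes availability by only a fraction of a point, which is consistent with the observation that one initial spare and a reorder quantity of one are sufficient.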


[Figure 3: flowchart for Policy 3 (scheduled ordering).]
Figure 3. The simulation uses this flowchart to incorporate scheduled spares as an alternative strategy for sparing policy. The spares are ordered on a fixed interval, and rushed spares are ordered as required.

The next step of the analysis involves changing the delivery time to receive the appropriate spares. Availability decreases as the delivery time increases. Policy 1 and Policy 2 are simulated as the ordering time is varied from 10 to 180 days. Figure 4 plots the system lifetime availability versus delivery time. As expected, Policy 1 is extremely sensitive, as the delivery time for rushed spares directly affects guaranteed downtime and, therefore, lifetime availability. Policy 2 is almost insensitive to delivery time in the range defined (10-180 days). This is due to the high component reliabilities and the fact that the routine spares may still arrive before the system experiences downtime due to a depleted spare pool.

[Figure 4: plot of availability (percentage, 86 to 98) versus ordering time (days, 0 to 200) for the “upon depletion” and “upon request” policies.]
Figure 4. The availability for “upon request” is far more sensitive to the order time than “upon depletion.” The ordering time for each policy is given as the curve crosses the 93% target value.

This graph provides valuable information in identifying the input setting for the first two sparing policies. The policies are chosen to offer comparable system availability. Policy 1 maintains the system with a 20-day reorder time for an “upon request” ordering strategy with rushed spares. This provides system availability approximately equal to 93%, up from the 92% obtained previously with a 30-day reorder time. Policy 2 uses an “upon depletion” ordering strategy with routine spares and allows the delivery time to increase to 150 days, or 5 months. Although the order time seems excessive, the system availability maintains the target value of 93% due to infrequent failures.

Policy 3 must be analyzed to identify the input setting required to provide 93% lifetime availability. The main focus in this analysis deals with determining the scheduled ordering interval. This creates a predetermined interval for ordering replenishment spares for the resource pool. RAPTOR simulation output provides information for determining an appropriate interval value. Figures 5 and 6 plot the time to first failure and the availability over time. Examining these graphs helps identify a scheduled reorder period for replacement spares required in Policy 3. The mean time to first request for rushed spares is just over 40 months, close to 3.5 years. The system availability drops below 93% around the 5-year point. By ordering spares on a scheduled interval, the availability may increase or stabilize near 93%.

The simulation input for Policy 3 is modified, and scheduled arrivals are added to the spare pool resource. Table 3 gives the availability results as the ordering period is varied from 1 to 5 years. A case with no scheduled arrivals is included for comparison. Table 3 indicates an advantage to scheduling spare arrivals. The disadvantage lies in the number of unused spares remaining at the end of the 15-year period. As the frequency of the arrivals increases, the availability increases and so does the unused number of spares. This is an inefficient use of capital resources. The number of unscheduled spares signifies the number of failures still requiring a rushed spare due to a depleted spare pool. The rushed spares take 30 days to arrive. Using 93% as our assumed goal, a scheduled order period of one spare every 5 years seems appropriate. No spares are left after 15 years, and only six additional spares are required under rushed situations. A spare is not ordered at the 15-year point (the end of the network life), only at the 5-year and 10-year points in time.

This defines the input parameters for Policy 3 used for comparison in the final section of analysis.
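The sensitivity of Policy 1 to delivery time discussed in this section can be roughed out with a steady-state approximation: each failure costs one delivery delay plus one repair, so availability is about MTBF / (MTBF + delivery + repair). The sketch below is a back-of-the-envelope check and not the RAPTOR calculation; the function names, the 365-day year, and the neglect of the initial spare and of end-of-life effects are all assumptions made for illustration.

```python
import math

MEAN_REPAIR_DAYS = 22.5  # mean of the uniform 15-30 day repair time
LIFE_DAYS = 15 * 365     # assumption: 365-day years

# Nine exponential components in series, each with 50% reliability at 10 years.
SYSTEM_MTBF_DAYS = (10 * 365) / (9 * math.log(2))


def steady_state_availability(delivery_days):
    """Approximate Policy 1 availability when every failure waits for a rushed spare."""
    downtime = delivery_days + MEAN_REPAIR_DAYS
    return SYSTEM_MTBF_DAYS / (SYSTEM_MTBF_DAYS + downtime)


def break_even_delivery(target):
    """Delivery time (days) at which the approximation equals the target availability."""
    return SYSTEM_MTBF_DAYS * (1.0 - target) / target - MEAN_REPAIR_DAYS


def expected_failures(delivery_days):
    """Approximate number of failure-and-repair cycles over the network life."""
    return LIFE_DAYS / (SYSTEM_MTBF_DAYS + delivery_days + MEAN_REPAIR_DAYS)


if __name__ == "__main__":
    print(f"availability at 30 days: {100 * steady_state_availability(30.0):.1f}%")
    print(f"delivery time for 93%:   {break_even_delivery(0.93):.1f} days")
    print(f"failures over 15 years:  {expected_failures(30.0):.1f}")
```

Under these assumptions the approximation puts the 93% crossing for rushed spares at a little over 20 days and predicts on the order of eight to nine replacements over the life of the network, which is in line with the 20-day setting chosen for Policy 1 and with the scale of the spare counts discussed for Policy 3.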

5. Comparing Sparing Policies and Introducing Cost

A brief review of the three alternatives provides an overview for the competing strategies for maintaining 93% availability for this communications network. All three policies have one initial spare and the ability to order replenishment spares if the spare pool is empty. Policy 1 orders spares when the resource pool is found empty by a requesting component and is considered “upon request” using rushed spares. Policy 2 orders spares when a repaired component depletes the spare pool and is considered “upon depletion” using routine spares. Policy 3 is a mix of scheduled and unscheduled ordering. A scheduled spare is ordered at the 5th and 10th year, and rushed spares are ordered when the spare pool is found empty, similar to Policy 1. These three alternatives are considered equal on the grounds of availability (all approximately 93%: 93.4%, 92.9%, and 93.1%) and are compared based on advantages and disadvantages. Table 4 summarizes the policy information.

The critical issue in selecting a sparing policy is the delivery time. For many communication networks, the time required to receive an incoming spare can be quite lengthy. This time is increased if the replacement component must be manufactured or acquired through an extensive acquisition process. Communication networks depend on sophisticated subsystems with multiple-year development times. The technology is continually changing, and the capital investments are tremendous. This places more emphasis on the sparing policy to outline the required delivery policy for needed spares. The logistics of delivery can also affect system availability. Depending on the delivery platform, the cost of meeting the delivery schedule may be excessive. Launch facilities can be reserved for months in advance. Requiring a rushed spare can be very costly and at times almost impossible.


Figure 5. This graph plots the time to first failure. The average first failure time is approximately 40 months. (Axes: Months versus Frequency.)

Figure 6. The availability of the system drops below the target 93% at approximately 4.5 years. (Axes: Time, years versus Availability, Percentage.)

This issue is further complicated by weather, politics, reliability of launch facilities, and a host of other concerns. With these factors in mind, the policies are reevaluated.

Policy 1 has tremendous disadvantages centered on guaranteed downtime and the use of rushed spares. The downtime is drastically affected by spare availability, launch access, and the timing of the system failure. Timing can affect the ease of repairing the system, depending on the location or positioning of the failed component. Policy 1 has the advantage of lower capital expenditure: network spares are not left in inventory or resource pools, and the freed capital can be invested in other areas of the company to generate revenue. The cost of rushed spares can be offset by corporate agreements with launch facilities and suppliers. Finally, the entire communication network may not fail when a component is lost; instead, the system performance may be degraded in terms of the number of users the system can support.

Policy 2 has the advantage of ordering routine spares. In most cases, these spares replenish the resource pool prior to the next component failure. The main disadvantage is capital lost due to inventory. In addition, the delivery time is longer and may affect the system in the case of simultaneous failures. Accelerating the delivery schedule is of questionable feasibility and very expensive. Therefore, although extending the delivery time normally has minimal impact on availability, it may create additional downtime in extreme situations.

Policy 3 can be seen as a combination of the first two policies. It has the advantage of a predictable replenishment cycle, and the ability to schedule spare development and delivery can significantly reduce costs. However, when unscheduled spares are required on a rushed basis, Policy 3 is affected in the same manner as Policy 1. This policy is considered the most conservative but also suffers from the potential of having the largest inventory.

Table 3. System availability decreases as the scheduled arrival interval increases

Arrival Interval        Availability (%)   Ending Number of Spares   Unscheduled Spares
1 year                  96.0               6.19                      0.45
2 years                 94.9               1.42                      2.64
3 years                 94.0               0.36                      4.46
4 years                 93.5               0.23                      5.35
5 years                 93.1               0.07                      6.15
No scheduled arrivals   92.0               0                         8.128

Using a 5-year scheduled arrival maintains the system at the desired availability level.

Table 4. Summary information for all three sparing policies

Policy   Ordering Point                                  Delivery Time
1        When failed component finds spare pool empty    20 days
2        When failed component uses last spare           180 days
3        Scheduled intervals and rushed as required      Scheduled, 5 years; rushed, 30 days

The capital expenditures must be analyzed in terms of all three sparing policies. RAPTOR simulation provides a number of output statistics and data files to build a valid cost equation for each of the alternatives. Since the time value of money is considered an influence on the problem, the simulation must report when the spare ordering takes place. Each alternative could be simulated, building a histogram of order times for each spare replenishment activity. Similar to the graph depicting time to first request, RAPTOR provides the data necessary to characterize the time to second request, third request, and so on. Three cost equations are provided as possible metrics to perform trade-off analysis based on the defined policies. Cost is based on a 15-year availability for the communications network and is given in the following equations:

$$\text{Alt1} \rightarrow \text{Cost} := \sum_{i=1}^{j} \sum_{l=2}^{k} A \, v^{n} / j,$$

$$\text{Alt2} \rightarrow \text{Cost} := \sum_{i=1}^{j} \sum_{l=1}^{k} B \, v^{n} / j,$$

$$v := 1 / (1 + roi),$$

where A is the cost factor for ordering “upon request” spares in Alternative 1, B is the cost factor for ordering “upon depletion” spares in Alternative 2, v is the return on investment ratio, roi is the return on investment, n is the order time for the rushed spares, j is the number of simulation runs (in this scenario, j = 1000), and k is the system failure number (notice that Alt2 orders spares after the first failure, so its inner sum starts at l = 1), with the value of k increasing until 90% of the failure times are accounted for.

Alternative 3 requires additional output from the simulation software to separate the cost of ordering scheduled spares and rushed spares. The rushed spares are used to augment a depleted spare pool when the scheduled spares have not arrived. The simulation output reports a cumulative number of maintenance delays throughout the system for the given number of years. These delays represent the cumulative number of rushed spares required to maintain the 93% target availability. The total cost for Alternative 3 is based on ordering the rushed spares at the end of each year, along with the scheduled spares arriving in the 5th and 10th year of system operation. The formula for Alternative 3 is given as follows:

$$\text{Alt3} \rightarrow \text{Cost} := \left( \sum_{i=1}^{j} A \, R_i \, v^{i} \right) + B \, v^{5} + B \, v^{10},$$

$$v := 1 / (1 + roi),$$

where A is the cost factor for ordering a rushed spare in Alternative 3, R_i is the number of rushed spares required in year i, B is the cost factor for ordering scheduled spares in Alternative 3 (with scheduled spares ordered at the 5- and 10-year points), v is the return on investment ratio, roi is the return on investment (in this case, 10%), and i is the number of years in the lifetime of the system (in this case, 15).

6. Conclusion

The sparing policies are compared based on maintaining a predetermined lifetime availability. There are significant cost differences between ordering spares for arrival in 20 to 30 days (rushed spares) versus 180 days (routine spares). Policy 1 can require additional manpower, expensive delivery costs, and other added factors associated with using rushed spares. Policy 2 provides flexibility and a longer window for spares arriving through routine channels, thus reducing the replenishment cost. The scheduled spares for Policy 3 are assumed to be the least expensive and provide some benefit in maintaining a high availability while reducing the number of rushed spares required. Developing an appropriate cost equation may provide the decision maker with an additional metric to compare these sparing policies.

The RAPTOR simulation software provides flexibility for defining the maintenance strategy for achieving high availability of the communications network. As sparing strategies change, the generic simulation input is modified, providing new output to evaluate competing policies.

There are many logistical issues not addressed in this article. Budget issues and contractual arrangements would have to be considered, along with the politics of maintaining an extensive communication network. The impact on the end user and customer satisfaction would also affect the final policy decision. The main goal of this paper, however, is not to provide a guaranteed solution but to demonstrate the insight provided by using simulation to compare sparing strategies. Using a generic simulation approach can be valuable in modeling the overall behavior of a complicated communications network. To download a free copy of this software, including example files, visit www.arinc.com/raptor/.
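As an illustration of how the cost equations of Section 5 could be evaluated from simulation output, the following Python sketch computes the discounted costs. It is not part of RAPTOR. The function names, the argument layout, and the choice to measure order times in years are assumptions made for this example, and the sample inputs are hypothetical. The sketch assumes the Alternative 3 sum runs over the years of the system lifetime. The different starting failure index for Alternatives 1 and 2 is reflected only in which orders appear in each run's list, and the 90% cutoff on k is not modeled.

```python
from typing import Sequence


def discount_factor(roi: float) -> float:
    """Return v := 1 / (1 + roi), the per-year discount factor."""
    return 1.0 / (1.0 + roi)


def unscheduled_cost(order_times_by_run: Sequence[Sequence[float]],
                     cost_factor: float, roi: float) -> float:
    """Alt1/Alt2 form: sum cost_factor * v**n over all orders, averaged over runs.

    order_times_by_run[i] holds the order times n (assumed to be in years)
    of the spares ordered during simulation run i.
    """
    runs = len(order_times_by_run)  # j in the text
    if runs == 0:
        raise ValueError("at least one simulation run is required")
    v = discount_factor(roi)
    total = sum(cost_factor * v ** n
                for run in order_times_by_run
                for n in run)
    return total / runs


def alt3_cost(rushed_per_year: Sequence[float], a: float, b: float, roi: float,
              scheduled_years: Sequence[int] = (5, 10)) -> float:
    """Alt3 form: rushed spares costed at the end of each year, plus scheduled spares."""
    v = discount_factor(roi)
    # rushed_per_year[0] is R_1, so years are numbered from 1.
    rushed = sum(a * r * v ** year
                 for year, r in enumerate(rushed_per_year, start=1))
    scheduled = sum(b * v ** year for year in scheduled_years)
    return rushed + scheduled


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; none of these values come from the study.
    sample_runs = [[4.0, 9.5], [6.2], [3.1, 8.0, 13.4]]
    print(unscheduled_cost(sample_runs, cost_factor=1.0, roi=0.10))
    rushed = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]  # 15 years
    print(alt3_cost(rushed, a=1.0, b=0.5, roi=0.10))
```

Because each order is weighted by v raised to its order time, later orders contribute less to the total, which is how the time value of money enters the comparison.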


Stephen P. Chambal is an assistant professor of operations research in the Department of Operational Sciences at the Air Force Institute of Technology at Wright-Patterson Air Force Base, Ohio. Gerald T. Mackulak is an associate professor of engineering in the Department of Industrial Engineering at Arizona State University, Tempe.
