IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 38, NO. 3, MAY 2008
Using Shared-Resource Capacity for Robust Control of Failure-Prone Manufacturing Systems
Shengyong Wang, Song Foh Chew, and Mark A. Lawley
Abstract—Deadlock-free resource allocation has been an active area of research in flexible manufacturing. Most researchers have assumed that allocated resources do not fail, and thus, little research has addressed the discrete-event supervision of manufacturing systems that are subject to resource failure. In our previous work, we developed supervisory controllers to ensure robust deadlock-free operation for systems with unreliable resources. These controllers guarantee that parts requiring failed resources do not block the production of parts that are not requiring failed resources. This previous work assumes that parts requiring failed resources can be advanced into failure-dependent (FD) buffer space (buffer space exclusively dedicated to parts requiring unreliable resources). Supervisors admit only states for which a sequence of such part advancements is feasible. The research presented in this paper relaxes this assumption because, in some systems, providing FD buffer space might be too expensive or it might be desirable to load the system more heavily with FD parts. In this paper, we concentrate on distributing parts requiring failed resources throughout the buffer space of shared resources so that these distributed parts do not block the production of part types that are not requiring failed resources. The approach presented here requires no state enumeration and is polynomial in stable measures of system size. We also present results from simulation experiments that compare system performance under these new policies with system performance under our previously published supervisors. These results show that our new policies allow better performance if the required part mixes favor FD part types. The systems of interest are single-unit resource allocation systems. Index Terms—Deadlock avoidance, failure-prone systems, flexible manufacturing systems, resource allocation, supervisory control.
NOMENCLATURE
$r_i$    System resource type $i$.
$R$    Set of system resource types.
$R^U$    Set of unreliable resources.
$R^R$    Set of reliable resources.
$R^{FD}$    Set of FD resources.
$R^{NFD}$    Set of NFD resources.
$R^{PFD}$    Set of PFD resources.
$R$    Set of currently failed resources.
$R^{+}$    Set of currently failed resources with one additional failure.
$R^{-}$    Set of currently failed resources with one repair.
$C_i$    Capacity of resource type $i$.
$P$    Set of part types.
$P_j$    Part type $j$.
$P_{jk}$    $k$th stage of part type $P_j$.
$P^{FD}$    FD part-type stages.
$P^{NFD}$    NFD part-type stages.
$T_j$    Route of $P_j$.
$RT_{jk}$    Residual route of $P_{jk}$.
$\rho$    $\rho(P_{jk})$ returns the resource required by $P_{jk}$.
$\Omega_i$    Set of part-type stages supported by $r_i$.
$Q$    Set of system states.
$Q_0$    Set of initial states.
$\Sigma$    Set of system events.
$\Sigma_c$    Set of controllable system events.
$\Sigma_u$    Set of uncontrollable system events.
$\alpha_{jk}$    Event representing allocation of $\rho(P_{jk})$ to $p_{j,k-1}$.
$\beta_{jk}$    Service completion for $P_{jk}$ on resource $\rho(P_{jk})$.
$\kappa_i$    Event representing failure of the server of resource $r_i$.
$\eta_i$    Event representing repair of the server of resource $r_i$.
$\xi$    Event-enabling function.
$\delta$    State transition function.
$\Pi$    Set of parts in the system.
$\Pi_i$    Set of parts located at $r_i$.
$\Pi^{FD}$    Set of FD parts in the system.
$\Pi^{NFD}$    Set of NFD parts in the system.
$\Pi^{FD}_i$    Set of FD parts on $r_i$.
$y_{jk}$    Number of unfinished units of $P_{jk}$ at $\rho(P_{jk})$.
$x_{jk}$    Number of finished units of $P_{jk}$ at $\rho(P_{jk})$.
$z_{jk}$    Sum of $x_{jk}$ and $y_{jk}$.
ANOVA    Analysis of variance.
BA    Banker's algorithm.
DAP    Deadlock avoidance policy.
FD    Failure dependent.
NFD    Non-FD.
NHC    Neighborhood policy.
PFD    Partially FD.
RCO    Region of continuous operation.
RFD    Region of failure dependence.
ROD    Region of distribution.
RO    Resource order policy.
SU-RAS    Single-unit resource allocation system.

Manuscript received January 23, 2006; revised October 7, 2006 and April 5, 2007. This paper was recommended by Associate Editor M. P. Fanti.
S. Wang is with the Department of Systems Science and Industrial Engineering, State University of New York at Binghamton, Binghamton, NY 13902 USA.
S. F. Chew is with the Department of Mathematics and Statistics, Southern Illinois University Edwardsville, Edwardsville, IL 62026 USA.
M. A. Lawley is with the Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMCA.2008.918616
1083-4427/$25.00 © 2008 IEEE
I. INTRODUCTION
REAL-TIME resource allocation is a basic control function in flexibly automated manufacturing systems. Over the past decade, significant research has addressed the deadlock-free allocation of system buffer space, and researchers have developed many useful and practical results [1]–[15]. Comparatively little work has addressed the control of these systems so that they are robust to failures. The concept of robust operation for systems with unreliable resources implies not only that these systems remain deadlock free but also that they continue to operate without disruption when resource failure and repair occur. A nominal requirement would be that they continue to produce part types not requiring failed resources. To this end, a supervisor must control the resource allocation state, so that if failure occurs, parts requiring failed resources can be redistributed or relocated so that they do not stall the production of parts not requiring failed resources. The redistribution must be safe, so that when failed resources are repaired, system operation can continue. Research addressing this class of systems can be found in [16]–[26]. We will first review the work of other researchers and then briefly review our previous work.
In [17], Reveliotis addresses the issue of part blocking in the presence of contingencies due to resource breakdown or introduction of expedient jobs. He combines flexible routing with the DAPs developed in [10]–[14] to accommodate operational contingencies. This approach assumes that each part follows an assigned route until a failure occurs. At that point, a route reassignment is computed, and some set of parts is removed from the system in order to continue operation.
Park and Lim [18] address fault-tolerant supervisory control by deriving necessary and sufficient conditions for the existence of a fault-tolerant supervisor. These conditions are stated in terms of language-theoretic properties and are computationally intensive to verify.
In [20]–[24], Hsieh develops fault-tolerant controllers for assembly processes using the Petri net formalism. Resource failure is modeled as the removal of tokens from the marking of the Petri net, and sufficient conditions for liveness after the tokens are removed are established. The approach uses the concept of “minimal resource requirement” to determine acceptable control actions. That is, the firing of a transition is allowed only if the resultant marking has a reachable marking that covers the minimal resource requirements of the processes in the net. The work also proposes a subclass of Petri nets for modeling systems with flexible routing. Fault-tolerant conditions for these Petri nets are established, and a decomposition method is proposed to test the feasibility of production routes. This paper differs from [17] in that we do not assume outside capacity to be sufficient to remove any number of parts from the system. In other words, our controllers take capacity outside the system (central buffer) into account in making allocation decisions. We differ from [18] in that we focus on developing robust polynomial control policies and not on establishing conditions for their existence. Finally, we differ from [20]–[24] in that we accept that the failure of a resource will prevent our system from achieving its full range of production. Our
objective is to control the system so that if a resource fails, the system can continue to produce part types not requiring that resource. This does not imply that a Petri net model of the system would be live under failure. Our previous research defines the requirements for robust supervision and develops several robust supervisors for SU-RAS. In [16], we develop robust supervisors for systems with a single unreliable resource by modifying and combining the DAPs of [10]–[13]. In [25], we combine the RO [14] with an NHC to develop an algebraic supervisor that is robust to the failure of a single unreliable resource. In [26], we extend the results of [16] to systems with multiple unreliable resources so that if any subset of unreliable resources fails, the residual system can continue to produce all part types not requiring failed resources. Note that the supervisors of [16], [25], and [26] admit only safe states for which there exists a sequence of resource allocations that advances parts requiring unreliable resources into FD buffer space (buffer space dedicated exclusively to parts requiring unreliable resources). This guarantees that when unreliable resources fail, shared resources can be cleared of parts requiring failed resources, so that these parts do not block the production of parts not requiring failed resources. In this paper, we relax this requirement so that the system can be more heavily loaded with FD parts, as might be desirable when required part mixes favor FD part types. In our experience, this assumption is essential for automated systems that produce very large bulky components such as semitrailers for hauling freight, where buffer space is necessarily limited. To relax the assumption, we must develop robust supervisors that distribute parts requiring failed resources throughout the buffer space of shared resources such that these parts do not block the production of part types not requiring failed resources. 
Our approach is to group the resources into three resource regions: the RCO, RFD, and ROD. We then develop supervisors for each of these regions and show that their conjunction satisfies the properties of a robust supervisor. We first do this for systems with a single unreliable resource and, then, extend our results to multiple unreliable resources under the assumption that at most one resource fails at a time. If multiple resources fail simultaneously, the supervisors developed here cannot guarantee robust operation, and some part types not requiring failed resources may be blocked from production until repair events occur. This more limited robustness is the cost of more flexible allocation for FD parts. Finally, we present the results of simulation experiments that illustrate the policies developed in this paper, allowing better system performance when priority is placed on FD parts than the policies presented in our previous work. The remainder of this paper is organized as follows. Section II formally defines the system and the problem that we address. Section III presents the development of the supervisor RO2 for systems with a single unreliable resource, whereas Section IV presents the development of the supervisor RO4 for systems with multiple unreliable resources. These supervisors assume that each part type requires at most one unreliable resource, and they guarantee a continuous operation in the face of a single resource failure. Section V presents the results of
WANG et al.: USING SHARED-RESOURCE CAPACITY FOR ROBUST CONTROL
Fig. 1. Example system with a single unreliable resource.
a simulation experiment that compares system performance under these new policies with system performance under our previously published supervisors. Finally, Section VI discusses future research directions. For the sake of readability, formal definitions and proofs are located in the Appendix.
II. DISCRETE EVENT SYSTEM
The SU-RAS model presented in this section is identical to that of [16], [21], and [22]. For self-containment, we discuss the model based on the example system shown in Fig. 1. We model the system as a nine-tuple vector $S = \langle R, C, P, \rho, Q, Q_0, \Sigma, \xi, \delta \rangle$.
Let $R$ be the set of resource types. $R = R^R \cup R^U$, where $R^R$ is the set of reliable resource types, which are not subject to failure, and $R^U$ is the set of unreliable resource types, which are subject to failure. In the example system in Fig. 1, $R^R = \{r_2, r_3, r_4, r_5, r_6\}$, and $R^U = \{r_1\}$.
Let $C$ be the resource-capacity vector, $C = \langle C_i : i = 1, \ldots, |R| \rangle$, where $C_i$ is the capacity of the buffer associated with resource type $r_i \in R$. In the example system, every resource has four units of capacity; thus, $C = \langle 4, 4, 4, 4, 4, 4 \rangle$.
Let $P$ be the set of part types produced by the system. Each part type $P_j \in P$ represents an ordered set of processing stages $P_j = \langle P_{j1}, \ldots, P_{j|P_j|} \rangle$, where part-type stage $P_{jk}$ represents the $k$th processing stage of $P_j$. Four product types are produced in the example system, and the processing stages are the following: $P_1 = \{P_{11}, P_{12}, P_{13}, P_{14}\}$, $P_2 = \{P_{21}, P_{22}, P_{23}, P_{24}\}$, $P_3 = \{P_{31}, P_{32}, P_{33}\}$, and $P_4 = \{P_{41}, P_{42}, P_{43}\}$. We use $p_{jk}$ to represent an actual instance of $P_{jk}$.
Let $\rho : P_j \to R$ be a function such that $\rho(P_{jk})$ returns the resource required by $P_{jk}$. Thus, the route of $P_j$, in terms of resources used, is $T_j = \langle \rho(P_{j1}), \ldots, \rho(P_{j|P_j|}) \rangle$. The corresponding product routes in the example system are the following: $T_1 = \langle r_4, r_3, r_2, r_1 \rangle$ for $P_1$, $T_2 = \langle r_1, r_2, r_3, r_5 \rangle$ for $P_2$, $T_3 = \langle r_5, r_2, r_6 \rangle$ for $P_3$, and $T_4 = \langle r_6, r_2, r_5 \rangle$ for $P_4$. We will let $\Omega_i$ be the set of part-type stages supported by $r_i$, i.e., $\Omega_i = \{P_{jk} : \rho(P_{jk}) = r_i \in R\}$. In the example system, $\Omega_1 = \{P_{21}, P_{14}\}$, $\Omega_2 = \{P_{13}, P_{22}, P_{32}, P_{42}\}$, and so on (see Fig. 1).
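The example data above can be written down directly, which makes it easy to check the sets $\Omega_i$ by machine. The following is a minimal sketch, assuming that resources and part types are referred to by their integer indices; the names `ROUTES`, `CAPACITY`, `rho`, and `omega` are illustrative choices and do not come from the paper.

```python
# Fig. 1 example, with resources and part types referred to by index.
ROUTES = {            # T_j: resource indices visited by part type P_j
    1: [4, 3, 2, 1],
    2: [1, 2, 3, 5],
    3: [5, 2, 6],
    4: [6, 2, 5],
}
CAPACITY = {i: 4 for i in range(1, 7)}  # C_i = 4 for every r_i
UNRELIABLE = {1}                         # R^U = {r1}
RELIABLE = set(CAPACITY) - UNRELIABLE    # R^R


def rho(j, k):
    """Index of the resource required by stage P_jk (k is 1-based)."""
    return ROUTES[j][k - 1]


def omega(i):
    """Stages (j, k) supported by resource r_i."""
    return {
        (j, k)
        for j, route in ROUTES.items()
        for k in range(1, len(route) + 1)
        if rho(j, k) == i
    }


if __name__ == "__main__":
    for i in sorted(CAPACITY):
        print(f"Omega_{i} =", sorted(omega(i)))
```

Each stage is represented as a pair `(j, k)`, so `omega(1)` corresponds to $\Omega_1 = \{P_{21}, P_{14}\}$.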
We will suppose our system resource types to be workstations consisting of buffer space for staging and storing parts and one server or processor for operating on parts. The capacity of a system resource type indicates the size of the associated buffer. A server will be busy so long as there are unfinished parts in the buffer. A failure of a resource type implies failure of the associated server, not the associated buffer. We assume that when a server fails, we may continue to allocate parts to its associated buffer space up to capacity. Unfinished parts at the buffer space, however, may not be processed and, hence, may not proceed along their respective routes until the server is repaired. Finished parts at the buffer may be advanced out and, hence, may move along their respective routes even if the server fails. We assume that server failure does not damage or destroy a part being processed (although this assumption is not necessary) and that failure can only occur when a server is busy.
Let $Q$ represent the set of system states. For a state $q \in Q$, we have $q = \langle sv_i, y_{jk}, x_{jk} : i = 1, \ldots, |R|,\ j = 1, \ldots, |P|,\ k = 1, \ldots, |P_j| \rangle$, where $sv_i$ is the status of the server of workstation $i$ (0 if failed, and 1 if operational), $y_{jk}$ is the number of unfinished units of $P_{jk}$ (parts waiting for the server and in process) located in the buffer space of $\rho(P_{jk})$, and $x_{jk}$ is the number of finished units of $P_{jk}$ located in the buffer space of $\rho(P_{jk})$. $Q_0$ is the set of initial states, where $q_0 \in Q_0$ is the state in which no resources are allocated and all servers are operational. The dimension of a system state $q$ is $|R| + \sum_{j=1}^{|P|} 2|P_j|$.
Let $\Sigma$ represent the set of system events that can occur. $\Sigma$ is partitioned into two sets, the set of controllable events $\Sigma_c$ and the set of uncontrollable events $\Sigma_u$.
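As a quick check of the state-dimension formula for the Fig. 1 example, the sketch below counts one $sv_i$ entry per resource plus one $y_{jk}$ and one $x_{jk}$ entry per part-type stage. The flat-list layout and the names `state_dimension` and `initial_state` are assumptions made for illustration only.

```python
STAGE_COUNTS = {1: 4, 2: 4, 3: 3, 4: 3}  # |P_j| for the Fig. 1 example
NUM_RESOURCES = 6                         # |R|


def state_dimension(num_resources, stage_counts):
    """|R| + sum_j 2|P_j|: one sv per resource, one y and one x per stage."""
    return num_resources + sum(2 * n for n in stage_counts.values())


def initial_state(num_resources, stage_counts):
    """q0 as a flat list: all servers operational, no parts in the system."""
    num_stages = sum(stage_counts.values())
    return [1] * num_resources + [0] * num_stages + [0] * num_stages


if __name__ == "__main__":
    q0 = initial_state(NUM_RESOURCES, STAGE_COUNTS)
    print(state_dimension(NUM_RESOURCES, STAGE_COUNTS), len(q0))
```

For the example, this gives $6 + 2(4 + 4 + 3 + 3) = 34$ components.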
The controllable events are those that the supervisor can disable; in our model, this could mean preventing the allocation of a unit of requested resource $\rho(P_{jk})$ to a requesting part $p_{j,k-1}$. We use $\alpha_{jk}$ to represent the allocation of a unit of requested resource $\rho(P_{jk})$ to a requesting part $p_{j,k-1}$. In addition, $\alpha_{j,|P_j|+1}$ represents a finished part of part type $P_j$ leaving the system. Thus, in our model, $\Sigma_c = \{\alpha_{jk} : j = 1, \ldots, |P|,\ k = 1, \ldots, |P_j| + 1\}$ is the set of controllable events. In the example in Fig. 1,

$$
\begin{aligned}
\Sigma_c = \{ & \alpha_{11}, \alpha_{12}, \alpha_{13}, \alpha_{14}, \alpha_{15}; && \text{(allocation events for } P_1\text{)} \\
& \alpha_{21}, \alpha_{22}, \alpha_{23}, \alpha_{24}, \alpha_{25}; && \text{(allocation events for } P_2\text{)} \\
& \alpha_{31}, \alpha_{32}, \alpha_{33}, \alpha_{34}; && \text{(allocation events for } P_3\text{)} \\
& \alpha_{41}, \alpha_{42}, \alpha_{43}, \alpha_{44} \}. && \text{(allocation events for } P_4\text{)}
\end{aligned}
$$
The uncontrollable events represent those events that our supervisors cannot disable. These will include part completions, which are denoted as βjk [i.e., the completion of server processing on a part pjk on resource ρ(Pjk )], resource failures, which are denoted as κi , and resource repairs, which are denoted as ηi . That is, κi represents the failure of the server of resource ri , whereas ηi represents the repair of the server of resource ri . More formally, let Σu = Σu1 ∪ Σu2 be the set of uncontrollable events, where Σu1 = {βjk : j = 1, . . . , |P |, k = 1, . . . , |Pj |} represents the completion of service for an instance of Pjk . Σu2 = {κi , ηi : ri ∈ RU } represents the failure (κi ) and repair (ηi ) events of the server of unreliable resource ri . Again, service completions, failures, and repairs are assumed to
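be beyond a controller's influence. In Fig. 1, the uncontrollable event sets are the following:

$$
\begin{aligned}
\Sigma_{u1} = \{ & \beta_{11}, \beta_{12}, \beta_{13}, \beta_{14}; && \text{(completion events for } P_1\text{)} \\
& \beta_{21}, \beta_{22}, \beta_{23}, \beta_{24}; && \text{(completion events for } P_2\text{)} \\
& \beta_{31}, \beta_{32}, \beta_{33}; && \text{(completion events for } P_3\text{)} \\
& \beta_{41}, \beta_{42}, \beta_{43} \} && \text{(completion events for } P_4\text{)} \\
\Sigma_{u2} = \{ & \kappa_1, \eta_1 \} && \text{(failure/repair events for } r_1\text{).}
\end{aligned}
$$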
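The event sets of the example follow mechanically from the number of stages of each part type and from the set of unreliable resources. The sketch below generates them; the tuple encoding of events, such as `("alpha", j, k)`, is an assumption of this illustration and not notation from the paper.

```python
STAGE_COUNTS = {1: 4, 2: 4, 3: 3, 4: 3}  # |P_j| for the Fig. 1 example
UNRELIABLE = [1]                          # indices of resources in R^U

# Controllable events alpha_jk, k = 1..|P_j|+1 (the last one unloads the part).
sigma_c = [("alpha", j, k)
           for j, n in STAGE_COUNTS.items() for k in range(1, n + 2)]

# Uncontrollable service completions beta_jk, k = 1..|P_j|.
sigma_u1 = [("beta", j, k)
            for j, n in STAGE_COUNTS.items() for k in range(1, n + 1)]

# Uncontrollable failure (kappa_i) and repair (eta_i) events.
sigma_u2 = [(kind, i) for i in UNRELIABLE for kind in ("kappa", "eta")]

if __name__ == "__main__":
    print(len(sigma_c), len(sigma_u1), len(sigma_u2))
```

The counts agree with the enumerations in the text: 18 allocation events, 14 completion events, and one failure/repair pair for $r_1$.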
We now require two functions. The first, which we denote as $\xi$, will compute the events that are enabled for a given state (i.e., $\xi : Q \to 2^{\Sigma}$ is a function that, for a given state, returns the set of enabled events). This function is defined for a state $q$ in the following.
1) Events that release new parts into the system are enabled when space is available on the first required workstation in the route, i.e., for $P_{j1} \in \Omega_i$, if $C_i - \sum_{P_{uv} \in \Omega_i} (y_{uv} + x_{uv}) > 0$, then $\alpha_{j1} \in \xi(q)$.
2) If a part is at service, then the corresponding service completion event is enabled, i.e., for $P_{jk} \in \Omega_i$, if $y_{jk} > 0$ and $sv_i = 1$, then $\beta_{jk} \in \xi(q)$.
3) If the server is busy with a part, then the corresponding failure event is enabled, i.e., for $r_i \in R^U$, if $sv_i = 1$ and $y_{jk} > 0$ for some $P_{jk} \in \Omega_i$, then $\kappa_i \in \xi(q)$.
4) If the server is failed, the corresponding repair event is enabled, and the corresponding service completion events are disabled, i.e., for $r_i \in R^U$, if $sv_i = 0$, then $\eta_i \in \xi(q)$ and $\beta_{jk} \notin \xi(q)\ \forall P_{jk} \in \Omega_i$.
5) When a part finishes its current operation and buffer space is available at its next required workstation, the event corresponding to the advancement of the part is enabled, i.e., for $P_{jk} \in \Omega_i$, $1 < k \le |P_j|$, if $x_{j,k-1} > 0$ and $C_i - \sum_{P_{uv} \in \Omega_i} (y_{uv} + x_{uv}) > 0$, then $\alpha_{jk} \in \xi(q)$.
6) If a part has finished all of its operations, the event corresponding to unloading it from the system is enabled, i.e., for $P_{j,|P_j|} \in \Omega_i$, if $x_{j,|P_j|} > 0$, then $\alpha_{j,|P_j|+1} \in \xi(q)$.

The second required function, which is denoted as $\delta$, will compute state transitions, i.e., using the current state plus a selected event for execution, $\delta$ will determine the state that results after the execution of the event. Specifically, let $\delta : Q \times \Sigma \to Q$ such that the following conditions are met:

$$
\begin{aligned}
\delta(q, \alpha_{jk}) &= q - e_{x_{j,k-1}} + e_{y_{jk}}, && \text{advancement of a part } p_{j,k-1} \\
\delta(q, \beta_{jk}) &= q - e_{y_{jk}} + e_{x_{jk}}, && \text{service completion of a part } p_{jk} \\
\delta(q, \kappa_i) &= q - e_{sv_i}, && \text{failure of server } i \\
\delta(q, \eta_i) &= q + e_{sv_i}, && \text{repair of server } i
\end{aligned}
$$

where $e_{x_{j,k-1}}$, $e_{y_{jk}}$, $e_{x_{jk}}$, and $e_{sv_i}$ are the standard unit vectors with components corresponding to $x_{j,k-1}$, $y_{jk}$, $x_{jk}$, and $sv_i$ being one, respectively. Note that $e_{y_{j,|P_j|+1}} = e_{x_{j0}} = 0$, which is the zero vector with the same dimension, and that $P_{j0}$ represents the parts of $P_j$ waiting outside the system.
As an example, in Fig. 1, if $r_1$ is operational and holding a finished unit of $P_{21}$ and an unfinished unit of $P_{14}$, $r_2$ is holding an unfinished unit of $P_{22}$ and an unfinished unit of $P_{32}$, $r_6$ is holding a finished unit of $P_{33}$, and all other resources are idle, then the corresponding state vector is as follows:

$$
\begin{aligned}
q = \langle\, & 1, 1, 1, 1, 1, 1; && \text{(all servers operational)} \\
& 0, 0, 0, 1; && (y_{14} = 1\text{, one unfinished } P_{14}) \\
& 0, 1, 0, 0; && (y_{22} = 1\text{, one unfinished } P_{22}) \\
& 0, 1, 0; && (y_{32} = 1\text{, one unfinished } P_{32}) \\
& 0, 0, 0; && \text{(no unfinished } P_4\text{)} \\
& 0, 0, 0, 0; && \text{(no finished } P_1\text{)} \\
& 1, 0, 0, 0; && (x_{21} = 1\text{, one finished } P_{21}) \\
& 0, 0, 1; && (x_{33} = 1\text{, one finished } P_{33}) \\
& 0, 0, 0 \,\rangle && \text{(no finished } P_4\text{).}
\end{aligned}
$$

Here, the second through fifth rows hold the $y_{jk}$ entries, and the last four rows hold the $x_{jk}$ entries. The corresponding enabled events are the following:

$$
\begin{aligned}
\xi(q) = \{ & \beta_{14}, \beta_{22}, \beta_{32}; && \text{(enabled completion events)} \\
& \alpha_{22}, \alpha_{34}; && \text{(enabled allocation events)} \\
& \kappa_1 \} && \text{(enabled failure event for } r_1\text{).}
\end{aligned}
$$
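The enabling and transition functions can be exercised on the Fig. 1 example with a few lines of code. The following is a minimal sketch, assuming a dictionary-based state and tuple-encoded events; `make_state`, `free`, `xi`, and `delta` are hypothetical helper names, not the authors' implementation. Note that, applied literally, rule 1) also enables the release events $\alpha_{j1}$ in the example state, which the worked example in the text does not list.

```python
import copy

ROUTES = {1: [4, 3, 2, 1], 2: [1, 2, 3, 5], 3: [5, 2, 6], 4: [6, 2, 5]}
CAP = {i: 4 for i in range(1, 7)}
UNRELIABLE = {1}
STAGES = [(j, k) for j, r in ROUTES.items() for k in range(1, len(r) + 1)]


def rho(j, k):
    """Resource index required by stage P_jk (k is 1-based)."""
    return ROUTES[j][k - 1]


def make_state(y=None, x=None, failed=()):
    """Build a state dict; unspecified y/x counts default to zero."""
    q = {"sv": {i: 0 if i in failed else 1 for i in CAP},
         "y": {s: 0 for s in STAGES}, "x": {s: 0 for s in STAGES}}
    q["y"].update(y or {})
    q["x"].update(x or {})
    return q


def free(q, i):
    """Unallocated buffer units at r_i: C_i minus all parts held there."""
    return CAP[i] - sum(q["y"][s] + q["x"][s] for s in STAGES if rho(*s) == i)


def xi(q):
    """Enabled events, following rules 1)-6) of the text."""
    events = set()
    for j, route in ROUTES.items():
        last = len(route)
        if free(q, rho(j, 1)) > 0:                       # rule 1: release
            events.add(("alpha", j, 1))
        for k in range(1, last + 1):
            i = rho(j, k)
            if q["y"][(j, k)] > 0 and q["sv"][i] == 1:   # rules 2 and 4
                events.add(("beta", j, k))
            if k > 1 and q["x"][(j, k - 1)] > 0 and free(q, i) > 0:  # rule 5
                events.add(("alpha", j, k))
        if q["x"][(j, last)] > 0:                        # rule 6: unload
            events.add(("alpha", j, last + 1))
    for i in UNRELIABLE:
        busy = any(q["y"][s] > 0 for s in STAGES if rho(*s) == i)
        if q["sv"][i] == 1 and busy:                     # rule 3: failure
            events.add(("kappa", i))
        if q["sv"][i] == 0:                              # rule 4: repair
            events.add(("eta", i))
    return events


def delta(q, event):
    """State reached by executing one event (assumed to be enabled)."""
    nxt = copy.deepcopy(q)
    kind = event[0]
    if kind == "alpha":
        _, j, k = event
        if k > 1:                       # e_{x_{j0}} = 0 for releases
            nxt["x"][(j, k - 1)] -= 1
        if k <= len(ROUTES[j]):         # e_{y_{j,|P_j|+1}} = 0 for unloads
            nxt["y"][(j, k)] += 1
    elif kind == "beta":
        _, j, k = event
        nxt["y"][(j, k)] -= 1
        nxt["x"][(j, k)] += 1
    elif kind == "kappa":
        nxt["sv"][event[1]] = 0
    elif kind == "eta":
        nxt["sv"][event[1]] = 1
    return nxt


if __name__ == "__main__":
    q = make_state(y={(1, 4): 1, (2, 2): 1, (3, 2): 1},
                   x={(2, 1): 1, (3, 3): 1})
    print(sorted(xi(q)))
    q_next = delta(q, ("alpha", 2, 2))
    print(q_next["y"][(2, 2)], q_next["x"][(2, 1)])
```

Returning a deep copy from `delta` keeps states immutable from the caller's point of view, which is convenient when a supervisor needs to test $\delta(q, \alpha)$ against its constraints before admitting $\alpha$.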
If we execute event $\alpha_{22}$, then we get a new state $\delta(q, \alpha_{22}) = q - e_{x_{21}} + e_{y_{22}} = q'$, where

$$
\begin{aligned}
q' = \langle\, & 1, 1, 1, 1, 1, 1; && \text{(all servers operational)} \\
& 0, 0, 0, 1; && (y_{14} = 1\text{, one unfinished } P_{14}) \\
& 0, 2, 0, 0; && (y_{22} = 2\text{, two unfinished } P_{22}\text{'s}) \\
& 0, 1, 0; && (y_{32} = 1\text{, one unfinished } P_{32}) \\
& 0, 0, 0; && \text{(no unfinished } P_4\text{)} \\
& 0, 0, 0, 0; && \text{(no finished } P_1\text{)} \\
& 0, 0, 0, 0; && \text{(no finished } P_2\text{)} \\
& 0, 0, 1; && (x_{33} = 1\text{, one finished } P_{33}) \\
& 0, 0, 0 \,\rangle && \text{(no finished } P_4\text{).}
\end{aligned}
$$

Note that when $r_1$ fails (the event $\kappa_1$ occurs), the occurrences of events in the set

$$
\begin{aligned}
\{ & \alpha_{11}, \alpha_{12}, \alpha_{13}, \alpha_{14}, \alpha_{15}, \beta_{11}, \beta_{12}, \beta_{13}, \beta_{14}, && \text{(events associated with } P_1\text{)} \\
& \alpha_{21}, \alpha_{22}, \alpha_{23}, \alpha_{24}, \alpha_{25}, \beta_{21}, \beta_{22}, \beta_{23}, \beta_{24} \} && \text{(events associated with } P_2\text{)}
\end{aligned}
$$

are bounded until the repair event $\eta_1$ occurs. Our objective with robust supervisory control is to make sure that these are the only events whose occurrences are bounded by the failure of $r_1$. For example, if we are in the state

$$
\begin{aligned}
q = \langle\, & 1, 1, 1, 1, 1, 1; && \text{(all servers operational)} \\
& 0, 0, 0, 4; && (y_{14} = 4\text{, four unfinished } P_{14}\text{'s}) \\
& 0, 0, 0, 0; && \text{(no unfinished } P_2\text{)} \\
& 0, 0, 0; && \text{(no unfinished } P_3\text{)} \\
& 0, 0, 0; && \text{(no unfinished } P_4\text{)} \\
& 0, 0, 4, 0; && (x_{13} = 4\text{, four finished } P_{13}\text{'s}) \\
& 0, 0, 0, 0; && \text{(no finished } P_2\text{)} \\
& 0, 0, 0; && \text{(no finished } P_3\text{)} \\
& 0, 0, 0 \,\rangle && \text{(no finished } P_4\text{)}
\end{aligned}
$$

and $r_1$ fails (the event $\kappa_1$ occurs), then the occurrence of all system events will be bounded or blocked until the repair event occurs. This is because every part type requires resource $r_2$, which is completely allocated to the parts of type $P_{13}$. These instances of $P_{13}$ are blocked from advancing to $r_1$, because its buffer space is completely allocated to instances of $P_{14}$, which are all unfinished. Note that this prevents $P_3$ and $P_4$ from being produced, although they do not require the failed $r_1$. Thus, the failure of $r_1$ in this state will eventually stall the whole system, which is what we want to avoid through robust supervision.
The properties that we want to ensure through robust supervision are informally stated in the following (the formal statement is given in Appendix A).
Property 2.1: A supervisory controller is said to be robust to resource failures if it satisfies the following.
1) The supervisory controller guarantees deadlock-free operation with no resource failures.
2) The supervisory controller guarantees that the system visits only the states for which continuing operation is possible in the event of a resource failure.
3) The supervisory controller guarantees deadlock-free operation while unreliable resources are failed.
4) The supervisory controller guarantees that while unreliable resources are failed, the system will visit only the states for which deadlock-free operation is possible in the event of repair.

One objective of our research in robust supervisory control is to develop maximally robust supervisory control policies that satisfy the aforementioned property for any possible subset of failed resources. Such policies are presented in [26]. We are also interested in policies that provide some protection against failure but cannot be considered maximally robust (we refer to these as partially robust). This can be advantageous because achieving maximal robustness typically requires greater restrictions on the part mix and resource allocation state than does partial robustness.
This paper presents a new class of robust controllers that is partially robust for systems with multiple unreliable resources. These controllers are robust to resource failures that do not occur simultaneously. However, when more than one resource fails at a time, dependence chains that cannot be resolved until repair events occur can form. These dependence chains are not cyclic, i.e., no deadlock will occur, correct operation will continue after sufficient repair events occur, and dependence chains will work themselves out without human intervention. As previously stated, the advantage is that these policies allow greater allocation flexibility under nominal operation, while still protecting against single resource failures.
More specifically, our maximally robust supervisors [26] constrain the resource allocation state by requiring parts that need future processing on unreliable resources to have buffer capacity reserved on FD resources (resources dedicated exclusively to processing parts requiring unreliable resources).
This assures that if unreliable resources fail, operational resources can be cleared of parts that are waiting for the failed resources to be repaired. The partially robust policies presented here allow such parts to be distributed among the buffer space of resources along their respective routes. This allows greater allocation flexibility because more parts requiring unreliable resources can be allowed in the system. Furthermore, if a single resource is failed, the policy guarantees that the distribution of parts requiring the failed resource does not block the production of parts not requiring that resource. If more than one resource fails at a time, acyclic dependence chains can form, as previously discussed, which cannot be resolved until sufficient repair events occur. After developing the new policies in the next two sections, we analyze simulation experiments that help in revealing, from a performance perspective, when this form of partially robust control is more desirable than maximal robustness.
III. SINGLE UNRELIABLE RESOURCE
In this section, we develop a robust supervisory controller, denoted as RO2, for systems with a single unreliable resource. RO2 is a conjunction of two ROs [14]. Recall that RO is a suboptimal DAP based on the intuition that parts flowing in opposite directions through the same set of workstations must at some point be able to pass (see Appendix F for a brief
example). In this policy, the workstations are ordered, and each part is categorized according to how it flows with respect to that order. Resource allocation is constrained so that there never simultaneously exists a workstation that is low in the order filled with parts moving up the order and a workstation that is high in the order filled with parts moving down the order (this negates a necessary condition for both deadlock and unsafeness). RO is expressible as a set of $O(|R|^2)$ linear inequalities that defines a deadlock-free region of system operation. In [14], Lawley et al. prove the correctness of RO for systems with no unreliable resources. In [25], Lawley illustrates how to apply RO in conjunction with an NHC to guarantee robust operation for systems with a single unreliable resource, assuming that parts requiring that resource can be advanced into FD buffer space. In this paper, we relax this assumption, i.e., we establish robust operation without assuming that shared resources can be cleared of parts requiring a failed resource.
In the following, Section III-A presents the groupings of resources and definitions of different resource regions to which RO will be applied. Section III-B introduces the RO2 and provides examples to illustrate its application. Both sections refer frequently to Fig. 1, which shows a system with six resources, four part types, and one unreliable resource. The Appendix provides formal proofs for the robustness of RO2. Section IV extends our results to systems with multiple unreliable resources.
Fig. 2. Resource regions and RO. (a) Resource regions. (b) RO2 for resource regions.
Fig. 3. Resource regions for the system in Fig. 1.
A. Resource Classification and Resource Regions
Recall that $\Omega_i = \{P_{jk} : \rho(P_{jk}) = r_i\}$ is the set of part-type stages supported by a resource $r_i$. We say that a part-type stage $P_{jk}$ is FD if it requires an unreliable resource in its residual route, i.e., if $\rho(P_{jm}) \in R^U$ for some $m \ge k$. Otherwise, $P_{jk}$ is NFD. Let $P^{FD}$ and $P^{NFD}$ represent the sets of FD and NFD part-type stages, respectively. For example, in Fig. 1, $P^{FD} = \{P_{11}, P_{12}, P_{13}, P_{14}, P_{21}\}$ and $P^{NFD} = \{P_{22}, P_{23}, P_{24}, P_{31}, P_{32}, P_{33}, P_{41}, P_{42}, P_{43}\}$. For the next three paragraphs, the reader should refer to Figs. 2(a) and 3.
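The FD/NFD classification of part-type stages depends only on the residual route of each stage, so it can be computed directly from the routes. The sketch below does this for the Fig. 1 example; the names `is_fd`, `P_FD`, and `P_NFD` and the `(j, k)` encoding of stages are assumptions made for illustration.

```python
ROUTES = {1: [4, 3, 2, 1], 2: [1, 2, 3, 5], 3: [5, 2, 6], 4: [6, 2, 5]}
UNRELIABLE = {1}  # R^U = {r1}


def is_fd(j, k):
    """P_jk is FD if some stage m >= k of P_j requires an unreliable resource."""
    residual_route = ROUTES[j][k - 1:]
    return any(r in UNRELIABLE for r in residual_route)


ALL_STAGES = {(j, k) for j, route in ROUTES.items()
              for k in range(1, len(route) + 1)}
P_FD = {s for s in ALL_STAGES if is_fd(*s)}
P_NFD = ALL_STAGES - P_FD

if __name__ == "__main__":
    print("P_FD  =", sorted(P_FD))
    print("P_NFD =", sorted(P_NFD))
```

Every stage of $P_1$ is FD because $r_1$ is the last resource on its route, whereas only the first stage of $P_2$ is FD because $P_2$ leaves $r_1$ after stage $P_{21}$.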
We say that $r_i$ is an FD resource if $r_i$ supports an FD part-type stage, i.e., if $\Omega_i \cap P^{FD} \neq \emptyset$. Otherwise, $r_i$ is an NFD resource. Furthermore, $r_i$ is a PFD resource if it supports both FD and NFD part-type stages, i.e., if $\Omega_i \cap P^{FD} \neq \emptyset$ and $\Omega_i \cap P^{NFD} \neq \emptyset$. Thus, an FD resource will process FD parts and possibly NFD parts, whereas an NFD resource processes only NFD parts. A PFD resource will process at least one FD and one NFD part.
We can now define three resource sets: $R^{FD} = \{r_i : r_i \text{ is an FD resource}\}$, $R^{NFD} = \{r_i : r_i \text{ is an NFD resource}\}$, and $R^{PFD} = \{r_i : r_i \text{ is a PFD resource}\}$. In Fig. 1, these sets are $R^{FD} = \{r_1, r_2, r_3, r_4\}$, $R^{NFD} = \{r_5, r_6\}$, and $R^{PFD} = \{r_2, r_3\}$. Clearly, the following set relationships are true.
1) All unreliable resources are FD, i.e., $R^U \subseteq R^{FD}$.
2) All PFDs are FD, i.e., $R^{PFD} \subseteq R^{FD}$.
3) A resource cannot be both FD and NFD, i.e., $R^{FD} \cap R^{NFD} = \emptyset$.
4) Each resource is either FD or NFD, i.e., $R^{FD} \cup R^{NFD} = R$.
5) PFD and NFD resources are reliable, i.e., $R^{PFD} \cup R^{NFD} \subseteq R^R$.
Note that a PFD resource is reliable; otherwise, all of its supported parts would be FD. Based on these sets, we define the following three resource regions.
1) The region of continuous operation, RCO $= R^{PFD} \cup R^{NFD}$. These are the resources that must continue operations after the unreliable resource fails. Note that RCO does not contain the unreliable resource.
2) The region of failure dependence, RFD $= R^{FD}$. These are the resources that can hold parts requiring the failed resource. This set includes the unreliable resource.
3) The region of distribution, ROD $= R^{FD} \setminus R^U$. These are the operational resources throughout which FD parts must be distributed when the unreliable resource fails. This set does not contain the unreliable resource.
Fig. 2(a) abstractly shows these regions. Fig. 3 shows the regions for the system in Fig. 1, where RCO $= \{r_2, r_3, r_5, r_6\}$, RFD $= \{r_1, r_2, r_3, r_4\}$, and ROD $= \{r_2, r_3, r_4\}$.
Our intuition is as follows. If $r_1$ fails, we want the resources in RCO $= \{r_2, r_3, r_5, r_6\}$ to keep producing the part types that do not require $r_1$ ($P_3$ and $P_4$). To achieve this, we want to be able to distribute parts requiring $r_1$ ($P_1$ and $P_2$) throughout the buffer space of resources in RFD $= \{r_1, r_2, r_3, r_4\}$ such that no resource in RCO is filled with these FD parts. Operational resources that might have to be shared by FD and NFD parts are contained in ROD $= \{r_2, r_3, r_4\}$. The next section defines and illustrates a supervisor that achieves our control objective.
B. RO2
In this section, we develop a supervisory controller for systems with $|R^U| = 1$. Denoted by RO2, this controller is the conjunction of two constraint sets: $RO_{RCO}$ and $RO_{RFD}$. Fig. 2(b) shows the resource regions to which $RO_{RCO}$ and $RO_{RFD}$ are applied.
Definition 3.1: $RO_{RCO}$ is the set of constraints

$$\sum_{P_{jk} \in \Omega_g} z_{jk} + \sum_{P_{uv} \in \Omega_h} z_{uv} < C_g + C_h$$
much more flexible than this brief explanation implies. For a complete discussion, see [14].) By using logic that is intuitively similar to the aforementioned, RORFD guarantees that a sufficient number of FD parts can be safely advanced out of the shared resources (those in RPFD ⊆ RCO) so that each shared resource has at least one unit of buffer capacity that is not allocated to an FD part. This allows RCO, under the supervision of RORCO , to continue the production of NFD part types. At the same time, RORFD guarantees that this advancement of FD parts out of shared resources and into unshared FD resources (those in RFD\RPFD ) does not cause unsafeness or policy-induced deadlocks in the unshared FD resources. In Appendix B we rigorously establish these results for all possible cases by constructing a corresponding event sequence permitted by the supervisor, by showing that the resulting state is safe, and by constructing a safe sequence permitted by the supervisor. As an example, we enumerate the constraints for the system in Fig. 1 as follows: RORCO :
where zst = xst + yst , rg , rh ∈ RCO, and g = h. RO restricts the number of FD and NFD parts in RCO so that there exists in RCO at most one capacitated resource. Definition 3.2: RORFD is the set of constraints zjk + zuv < Cg + Ch Puv ∈Ωh ∩P F D
where zst = xst + yst , rg , rh ∈ RFD, and g = h.RORFD restricts the number of FD parts in RFD so that there exists in RFD at most one resource filled with FD parts. Definition 3.3: RO2 admits the enabled controllable event α if and only if δ(q, α) satisfies RORCO ∧ RORFD . RORCO guarantees deadlock-free operation for RCO by the correctness of RO [14], which uses the fact that any allocation state with at most one filled resource is safe. In its most conservative form, RO restricts the number of filled resources to one. Intuitively, RO is correct by the following logic: 1) If no resource is filled, then the next required resource of every part is available, and the advancement of any single part can result in at most one filled resource; and 2) if only one resource is filled, then the next required resource of every part on the filled resource is available, and the advancement of any part away from the filled resource can result in at most one filled resource. Thus, if the system is in a resource allocation state with at most one filled resource, then it is possible to move to another resource allocation state with at most one filled resource, implying that the system state is safe. (RO is actually
r2 r3 : z13 +z22 +z32 +z42 +z12 +z23 < 8 r2 r5 : z13 +z22 +z32 +z42 +z24 +z31 +z43 < 8 r2 r6 : z13 +z22 +z32 +z42 +z33 +z41 < 8 r3 r5 : z12 +z23 +z24 +z31 +z43 < 8 r3 r6 : z12 +z23 +z33 +z41 < 8 r5 r6 : z24 +z31 +z43 +z33 +z41 < 8
Puv ∈Ωh RCO
Pjk ∈Ωg ∩P F D
611
RORFD :
r1 r2 : z14 +z21 +z13 < 8 r1 r3 : z14 +z21 +z12 < 8 r1 r4 : z14 +z21 +z11 < 8 r2 r3 : z13 +z12 < 8 r2 r4 : z13 +z11 < 8 r3 r4 : z12 +z11 < 8.
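The enumeration above can be reproduced mechanically. The following is a minimal sketch, not the authors' implementation: the routes are read off the listed constraints (an assumption, since Fig. 1 itself is not reproduced here), a stage is treated as FD when it or a later stage of its route needs the unreliable resource, and all names (`ROUTES`, `pair_constraints`, `ro2_admits`) are hypothetical. It also evaluates the constraints on the three states q, q1, and q2 of the Fig. 4 illustration.

```python
from itertools import combinations

# Assumption: routes read off the enumerated constraints for Fig. 1
# (part type -> resources visited, in stage order); r1 is the unreliable resource.
ROUTES = {1: [4, 3, 2, 1], 2: [1, 2, 3, 5], 3: [5, 2, 6], 4: [6, 2, 5]}
UNRELIABLE = {1}
CAP = {r: 4 for r in range(1, 7)}  # capacity 4 each, since every pair bound is 8

# A stage P_jk is treated as FD if it or a later stage needs an unreliable resource.
STAGES = [(j, k, r, bool(UNRELIABLE & set(route[k - 1:])))
          for j, route in ROUTES.items()
          for k, r in enumerate(route, start=1)]

R_FD = {r for _, _, r, fd in STAGES if fd}
R_NFD = set(CAP) - R_FD
R_PFD = R_FD & {r for _, _, r, fd in STAGES if not fd}
RCO, RFD, ROD = R_PFD | R_NFD, R_FD, R_FD - UNRELIABLE

def pair_constraints(region, fd_only):
    """Map each resource pair (g, h) in region to (stage terms, C_g + C_h)."""
    omega = {r: [(j, k) for j, k, res, fd in STAGES
                 if res == r and (fd or not fd_only)]
             for r in region}
    return {(g, h): (omega[g] + omega[h], CAP[g] + CAP[h])
            for g, h in combinations(sorted(region), 2)}

RO_RCO = pair_constraints(RCO, fd_only=False)
RO_RFD = pair_constraints(RFD, fd_only=True)

def violated(z, constraints):
    """Pairs whose constraint 'sum of z over terms < bound' fails in state z."""
    return [pair for pair, (terms, bound) in constraints.items()
            if sum(z.get(t, 0) for t in terms) >= bound]

def ro2_admits(z):
    return not violated(z, RO_RCO) and not violated(z, RO_RFD)

# z maps stage (j, k) to the number of parts currently at that stage.
q = {(1, 4): 4, (1, 3): 3, (1, 2): 1, (4, 1): 4}
q1 = {**q, (4, 1): 3, (4, 2): 1}   # one p41 advanced from r6 into r2
q2 = {**q, (1, 2): 0, (1, 3): 4}   # the p12 advanced from r3 into r2

print(ro2_admits(q), ro2_admits(q1), ro2_admits(q2))  # True True False
```

Under these assumptions, q2 fails the r2 r6 constraint of $RO^{RCO}$ and the r1 r2 constraint of $RO^{RFD}$, which matches the discussion of that state.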
We illustrate the use of RO2 as follows. Consider Fig. 4, which provides a system state, for example, q, for our example system. In q, r1 and r2 are holding four p14's and three p13's, respectively, whereas r3 and r6 have one p12 and four p41's, respectively. It is easy to verify that q is admissible by RO2. Note, in this case, that the only capacitated resource in RCO is r6 and that the only capacitated resource in RFD is r1. There are a number of feasible part advancements from q. For example, a finished p41 at r6 may be advanced into r2 to become a p42, resulting in a state, for example, q1. In q1, RCO has r2 as the only capacitated resource; hence, $RO^{RCO}$ is not violated. However, RFD contains two capacitated resources, r1 and r2. Because r2 holds the NFD part p42, r2 is not filled with FD parts. In fact, r1 is the only resource filled with FD parts in RFD; thus, $RO^{RFD}$ is not violated. As a result, RO2 permits q1. Let us look at an inadmissible resulting state. Assume, at q, that the completed p12 from r3 is advanced into r2 to become a p13, resulting in a state, for example, q2. Clearly, q2 is not admissible by RO2, because q2 violates $RO^{RCO}$ and, in fact,
Fig. 5. System with three unreliable resources.
Fig. 4. Admissible state by RO2 .
q2 also violates $RO^{RFD}$. Note that q2 is a safe state, given that unreliable resource r1 is operational, meaning that there exists a sequence of resource allocations to empty the system of parts from q2. However, it is possible, at q2, that r1 may fail before it has finished any of its p14's. If this happens, then the system would not be able to continue to produce both P3 and P4 because they are now blocked by the p13's filling r2. As a consequence, q2 is undesirable.

The following theorem establishes that RO2 satisfies Property 2.1 and, thus, is robust for systems in which there exists a single unreliable resource. (Its counterpart is Theorem B.1 in Appendix B.)

Theorem 3.1: If $|R^U| = 1$, RO2 satisfies the requirements of Property 2.1 and is therefore robust to failures of the unreliable resource.

We note that the number of constraints generated by RO2 is proportional to $\binom{|R|}{2}$, which is $O(|R|^2)$. Furthermore, the number of terms in each constraint is loosely bounded by the cumulative route length, $CRL = |P_1| + |P_2| + \cdots + |P_{|P|}|$. Thus, evaluating the constraints requires no more than $O(CRL \cdot |R|^2)$ additions and comparisons, which is polynomial.

We now extend our results to systems with multiple unreliable resources under the assumption that each part type requires at most one unreliable resource. We develop a policy that is robust to the failure of one resource at a time.

IV. MULTIPLE UNRELIABLE RESOURCES

This section develops a controller RO4 that satisfies the requirements of Property 2.1 for systems with multiple unreliable resources if each part type requires at most one unreliable resource and at most one unreliable resource is in the failed state at a time. If multiple resources are down simultaneously, the production of some part types that do not require failed resources may be blocked until repairs occur.
This is a more limited form of robustness than that presented in [26], but, as noted earlier in this paper, this is the cost of a more flexible allocation for FD parts. Appendix C provides formal analysis and robustness proofs for this policy.

For the case of multiple unreliable resources, $|R^U| > 1$, we need to define one additional set. Recall that $P^{FD}$ and $P^{NFD}$ represent the sets of FD and NFD part-type stages, respectively. Let $P_i^{FD}$ represent the set of part-type stages that are FD on $r_i \in R^U$. Fig. 5 shows an example system consisting of eight resources, each with capacity 2, where r4, r6, and r8 are unreliable. This system produces four different part types with their respective routes shown in Fig. 5. Note that

$P^{FD}$ = {P21, P22, P23, P31, P32, P33, P41, P42}
$P^{NFD}$ = {P11, P12, P13, P24, P43}
$P_4^{FD}$ = {P21, P22, P23}
$P_6^{FD}$ = {P41, P42}
$P_8^{FD}$ = {P31, P32, P33}

and for resources

$R^{FD}$ = {r2, r4, r5, r6, r7, r8}
$R^{NFD}$ = {r1, r3}
$R^{PFD}$ = {r2}
RCO = {r1, r2, r3}
RFD = {r2, r4, r5, r6, r7, r8}
ROD = {r2, r5, r7}.

To assure robust operation in this system, we will have to extend $RO^{RFD}$ and define a new RO for ROD, which is $RO^{ROD}$. $RO^{RCO}$ will remain unchanged from the previous section.

Definition 4.1: $RO^{RFD}$ is the set of constraints

$$\sum_{P_{jk} \in \Omega_g \cap P_i^{FD}} z_{jk} + \sum_{P_{uv} \in \Omega_h \cap P_i^{FD}} z_{uv} < C_g + C_h \quad \text{for } r_i \in R^U$$

where $z_{st} = x_{st} + y_{st}$, $r_g, r_h \in$ RFD, and $g \neq h$. $RO^{RFD}$ is different from that of Definition 3.2 in that we now generate a constraint set for each unreliable resource. $RO^{RFD}$ admits states for which at most one resource of RFD is capacitated with $P_i^{FD}$ parts for each $r_i \in R^U$. Note that it does not place any constraint on the total number of RFD resources capacitated, only on the number capacitated by a single FD part type.

Definition 4.2: $RO^{RFD^2}$ is the set of constraints

$$\sum_{P_{jk} \in \Omega_g \cap P^{FD}} z_{jk} + \sum_{P_{mn} \in \Omega_h \cap P^{FD}} z_{mn} + \sum_{P_{uv} \in \Omega_j \cap P^{FD}} z_{uv} < C_g + C_h + C_j$$

where $z_{st} = x_{st} + y_{st}$, $r_g, r_h, r_j \in$ RFD, and $g \neq h \neq j$.
$RO^{RFD^2}$ admits states for which at most two resources of RFD are capacitated with FD parts but does not place any constraint on the total number of RFD resources capacitated.

Definition 4.3: $RO^{ROD}$ is the set of constraints

$$\sum_{P_{jk} \in \Omega_g \cap P^{FD}} z_{jk} + \sum_{P_{uv} \in \Omega_h \cap P^{FD}} z_{uv} < C_g + C_h$$

where $z_{st} = x_{st} + y_{st}$, $r_g, r_h \in$ ROD, and $g \neq h$. $RO^{ROD}$ admits states for which at most one resource of ROD is capacitated with FD parts, although it places no constraint on the number of unreliable resources that are capacitated. As an example, we enumerate the constraints for the system in Fig. 5 as follows:

$RO^{RCO}$:
r1 r2: z11 + z12 + z21 < 4
r1 r3: z11 + z13 + z24 + z43 < 4
r2 r3: z12 + z21 + z13 + z24 + z43 < 4

$RO^{RFD}$:
r2 r4: z21 + z23 < 4
r2 r5: z21 + z22 < 4
r4 r5: z23 + z22 < 4
r5 r6: z41 + z42 < 4
r5 r7: z31 + z32 < 4
r5 r8: z31 + z33 < 4
r7 r8: z32 + z33 < 4

$RO^{RFD^2}$:
r2 r4 r5: z21 + z23 + z22 + z31 + z41 < 6
r2 r4 r6: z21 + z23 + z42 < 6
r2 r4 r7: z21 + z23 + z32 < 6
r2 r4 r8: z21 + z23 + z33 < 6
r2 r5 r6: z21 + z22 + z31 + z41 + z42 < 6
r2 r5 r7: z21 + z22 + z31 + z41 + z32 < 6
r2 r5 r8: z21 + z22 + z31 + z41 + z33 < 6
r2 r6 r7: z21 + z42 + z32 < 6
r2 r6 r8: z21 + z42 + z33 < 6
r2 r7 r8: z21 + z32 + z33 < 6
r4 r5 r6: z23 + z22 + z31 + z41 + z42 < 6
r4 r5 r7: z23 + z22 + z31 + z41 + z32 < 6
r4 r5 r8: z23 + z22 + z31 + z41 + z33 < 6
r4 r6 r7: z23 + z42 + z32 < 6
r4 r6 r8: z23 + z42 + z33 < 6
r4 r7 r8: z23 + z32 + z33 < 6
r5 r6 r7: z22 + z31 + z41 + z42 + z32 < 6
r5 r6 r8: z22 + z31 + z41 + z42 + z33 < 6
r5 r7 r8: z22 + z31 + z41 + z32 + z33 < 6
r6 r7 r8: z42 + z32 + z33 < 6

$RO^{ROD}$:
r2 r5: z21 + z22 + z31 + z41 < 4
r2 r7: z21 + z32 < 4
r5 r7: z22 + z31 + z41 + z32 < 4.
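The three FD-related constraint families listed above can be generated from the routes alone. The sketch below is illustrative only: the routes are inferred from the enumerated constraints (an assumption, since Fig. 5 is not reproduced here), the helper names are hypothetical, and pairs in which one resource holds no relevant stage are skipped, as in the listing, because such constraints cannot bind.

```python
from itertools import combinations

# Assumption: routes inferred from the enumerated constraints for Fig. 5.
ROUTES = {1: [1, 2, 3], 2: [2, 5, 4, 3], 3: [5, 7, 8], 4: [5, 6, 3]}
UNRELIABLE = [4, 6, 8]
CAP = 2  # every resource in the example has capacity 2

def fd_on(i):
    """Map resource -> stage variables that are FD on unreliable resource r_i."""
    out = {}
    for j, route in ROUTES.items():
        for k, r in enumerate(route, start=1):
            if i in route[k - 1:]:          # this stage or a later one needs r_i
                out.setdefault(r, []).append(f"z{j}{k}")
    return out

def constraints(stage_map, region, size):
    """One constraint per size-subset of the region's resources that hold a stage."""
    held = [r for r in sorted(region) if stage_map.get(r)]
    return {combo: ([s for r in combo for s in stage_map[r]], size * CAP)
            for combo in combinations(held, size)}

# All FD stages per resource (each part type needs at most one unreliable resource).
fd_all = {}
for i in UNRELIABLE:
    for r, stage_vars in fd_on(i).items():
        fd_all.setdefault(r, []).extend(stage_vars)

RFD = set(fd_all)
ROD = RFD - set(UNRELIABLE)

ro_rfd = {(i, pair): c for i in UNRELIABLE
          for pair, c in constraints(fd_on(i), RFD, 2).items()}
ro_rfd2 = constraints(fd_all, RFD, 3)
ro_rod = constraints(fd_all, ROD, 2)

print(len(ro_rfd), len(ro_rfd2), len(ro_rod))  # 7 20 3
```

The counts agree with the listing: seven pair constraints (three each for r4 and r8, one for r6), twenty triples over the six RFD resources, and three pairs over ROD.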
Fig. 6. Admissible state by RO4 .
Note that the number of constraints generated is $O(|R|^3)$ and that, as before, the number of terms in each constraint is bounded by CRL. Thus, evaluating these constraints is no worse than $O(CRL \cdot |R|^3)$. We now define RO4.

Definition 4.4: RO4 admits the enabled controllable event α if and only if δ(q, α) satisfies $RO^{RCO} \wedge RO^{RFD} \wedge RO^{RFD^2} \wedge RO^{ROD}$.

Note that if $|R^U| = 1$, then both $RO^{RFD^2}$ and $RO^{ROD}$ are implied by $RO^{RFD}$, i.e., they are redundant, and thus, RO4 is equivalent to RO2. Furthermore, if $R^U = \emptyset$, then both RO2 and RO4 are equivalent to the original RO given in [14].

We illustrate RO4 as follows. Fig. 6 shows a system state, for example, q, for the system in Fig. 5. In q, r2 is holding a p12 and a p21, and r5 and r6 are holding a p41 and two p42's, respectively. It is easy to verify that q is admissible by RO4. There are a few feasible part advancements from q. For instance, we may load a new part into the system, such as loading a p31 into r5, resulting in a state, for example, q1. In q1, r2 is the only capacitated resource in RCO; thus, $RO^{RCO}$ is not violated. There are three capacitated resources, which are r2, r5, and r6, in RFD. However, because p12 is an NFD part, only r5 and r6 are filled with FD parts, rendering no violation of $RO^{RFD^2}$. Furthermore, $RO^{ROD}$ is not violated, because r5 is the only resource of ROD filled with FD parts. Clearly, there do not exist two or more resources filled with parts that are FD on the same unreliable resource; thus, $RO^{RFD}$ is not violated. As a result, q1 is acceptable to RO4. We next look at an undesirable resulting state. For example, if we load a p41 into r5 at q, we get a state, for example, q2, with two resources r5 and r6 filled with parts that are FD on the same unreliable resource r6. Thus, q2 violates $RO^{RFD}$ and is not permitted by RO4.
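The three states of this illustration can be checked against the four constraint families directly. This is a sketch under stated assumptions: the stage sets per resource are copied from the enumerated constraints for Fig. 5, capacity is 2 for every resource, and the function and variable names are hypothetical rather than taken from the paper.

```python
from itertools import combinations

CAP = 2
# Stages supported by each resource, read off the enumerated constraints ("21" = P21).
ALL = {1: ["11"], 2: ["12", "21"], 3: ["13", "24", "43"], 4: ["23"],
       5: ["22", "31", "41"], 6: ["42"], 7: ["32"], 8: ["33"]}
# Stages that are FD on each unreliable resource.
FD_ON = {4: {"21", "22", "23"}, 6: {"41", "42"}, 8: {"31", "32", "33"}}
FD = set().union(*FD_ON.values())
RCO, RFD, ROD = [1, 2, 3], [2, 4, 5, 6, 7, 8], [2, 5, 7]

def load(z, r, keep=None):
    """Parts held at resource r, optionally counting only the stages in `keep`."""
    return sum(z.get(s, 0) for s in ALL[r] if keep is None or s in keep)

def ok(z, region, size, keep=None):
    """True if every size-subset of region holds fewer parts than its total capacity."""
    return all(sum(load(z, r, keep) for r in combo) < size * CAP
               for combo in combinations(region, size))

def ro4_report(z):
    return {
        "RCO": ok(z, RCO, 2),
        "RFD": all(ok(z, RFD, 2, FD_ON[i]) for i in FD_ON),
        "RFD2": ok(z, RFD, 3, FD),
        "ROD": ok(z, ROD, 2, FD),
    }

q = {"12": 1, "21": 1, "41": 1, "42": 2}
q1 = {**q, "31": 1}   # load a p31 into r5
q2 = {**q, "41": 2}   # load a second p41 into r5 instead

for name, state in (("q", q), ("q1", q1), ("q2", q2)):
    print(name, ro4_report(state))
```

With these inputs, q and q1 satisfy all four families, while q2 fails only $RO^{RFD}$ (the r5 r6 pair for unreliable resource r6), in line with the text.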
Note that q2 is a safe state, given that unreliable resource r6 is operational, i.e., there exists a sequence of resource allocations to clear the system of parts from q2 . However, it is possible, at q2 , that r6 may fail before it has completed any p42 . If this happens, clearly, the system cannot continue to produce both P2 and P3 because they are now blocked by the p41 ’s filling r5 , although they do not require r6 in their processing. As a consequence, q2 is undesirable. In essence, under RO4 , we require that FD parts blocked by the failure of an unreliable resource be able to distribute among the buffer space of resources along their respective routes, so that they do not block the production of other parts. For instance, in the example, if r6 fails before it has completed any p42 that it is holding at q, the p41 is then blocked by the failed
r6 and, thus, will be stored at r5. However, the resource-failure-induced part blockage will in no way preclude other part types, P1, P2, and P3, which do not require r6 in their processing, from being produced.

The following theorem guarantees that RO4 is robust for systems where every part type requires at most one unreliable resource and at most one resource is in a failed state at a time.

Theorem 4.1: Supervisor RO4 is robust for systems where $|R^U| \geq 1$ and the number of failed resources does not exceed one.

The intuition behind this theorem is somewhat similar to that of Theorem 3.1, although the setting and context are much more difficult. RO4 ensures that if a shared resource (i.e., a PFD resource) is filled with FD parts, at least one can be advanced out of the shared resources and, thus, out of RCO, which can then operate under $RO^{RCO}$. Furthermore, clearing RCO of this part will not create problems in the FD resources. We now provide a brief and intuitive explanation. To summarize, we have the following.
1) $RO^{RFD}$ allows states with at most one FD resource filled with parts that are FD on the same unreliable resource.
2) $RO^{RFD^2}$ allows states for which at most two FD resources are capacitated with FD parts.
3) $RO^{ROD}$ admits states for which at most one resource of ROD is capacitated with FD parts.
Suppose that a state is allowed by RO4. Then, by 1), it has at most one FD resource filled with parts that are FD on the same unreliable resource. By 2), it has at most two FD resources filled with FD parts, and by 3), it has at most one PFD resource filled with FD parts. Now, suppose that an unreliable resource fails. Roughly speaking, if there is no PFD resource filled with FD parts, then RCO can operate freely under $RO^{RCO}$, as previously discussed.
If there is one PFD resource, for example, $r_i$, which is filled with FD parts [by 3), there can be at most one], then at least one part has to be advanced into the $R^{FD} \setminus R^{PFD}$ resources (out of RCO) without causing unsafeness. At this point, numerous cases have to be considered and resolved. For illustration, let $r_u$ be the failed unreliable resource, and suppose that $r_u$ is filled. Then, by 2), $r_u$ and $r_i$ are the only capacitated resources of RFD, and by 1), $r_i$ holds an FD part, for example, $p_{jk}$, which requires an unreliable resource $r_v \neq r_u$. Note that $r_v$ is not filled, nor is any other FD resource required by $p_{jk}$ (other than $r_i$). Thus, again, roughly speaking, $p_{jk}$ has an open path in the FD resources into $r_v$. Once $p_{jk}$ advances, RCO can operate freely under $RO^{RCO}$. Once we have proven the existence of a sequence of part advances that sufficiently clears RCO of FD parts, we must prove that it is admitted by RO4, that the resulting state, for example, q, is safe, and that q exhibits a safe sequence allowed by RO4. Appendix C provides these proofs for every case.

V. EMPIRICAL INVESTIGATION

In this section, we first design an experiment to compare RO2 with RO and NHC+BA for systems with a single unreliable resource. Then, we design an experiment to compare system
TABLE I EXPERIMENTAL FACTORS (SINGLE UNRELIABLE RESOURCE)
performances under RO4, RO, and NHC+BA for systems with multiple unreliable resources. Appendices D–F provide brief overviews of NHC, BA, and RO, respectively.

A. Experiment for Single-Unreliable-Resource System

The experimental system for simulation is the example in Fig. 1. Our experiment has two levels of buffer sizes for the resources in RCO: a low level (two units) and a high level (ten units). The order release policy uses an equal part mix between FD and NFD parts. When all resources in the system are operational, we load new parts into the system in round-robin order, P1−P2−P3−P4. When the unreliable resource is down, we immediately switch to continuously loading NFD parts in round-robin order as P3−P4 and to loading FD parts as long as the control policy allows. When the unreliable resource is repaired and back to normal status, we immediately switch to the original loading sequence. We also have two levels of the "failure cycle" factor: a short failure cycle of 100 min and a long failure cycle of 1000 min. The last factor is the percentage of downtime, which we test at four levels: 10%, 30%, 50%, and 70%. We take a complete "resource cycle" (uptime + downtime) to be the sum of two exponential random variables whose means add up to either 100 or 1000 min. For example, at 10% downtime and a 1000-min failure cycle, the time to failure is exponential with a mean of 900 min, and the downtime is exponential with a mean of 100 min. Thus, at 10% downtime, the time to failure is longer, and the downtime is shorter, whereas at 70% downtime, the time to failure is short, and the downtime is long. We set the processing time of all part stages to be exponential with a mean of 5 min. Table I summarizes the experimental factors. For performance measures, we look at the production of FD parts during a simulation run of 1000 h. Finally, we perform three replications for each of the 3 × 2 × 2 × 4 = 48 combinations.
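The factor arithmetic above can be written out as a short sketch. It assumes, as an interpretation of the text, that the stated times are means of the exponential distributions and that the design is a full factorial; the names `cycle_means`, `sample_cycle`, and the factor lists are illustrative only.

```python
import random
from itertools import product

def cycle_means(cycle_min, pct_down):
    """Split a failure cycle into (mean time to failure, mean downtime), in minutes."""
    down = cycle_min * pct_down / 100
    return cycle_min - down, down

def sample_cycle(cycle_min, pct_down, rng):
    """Draw one (uptime, downtime) pair; expovariate takes a rate, i.e., 1 / mean."""
    up_mean, down_mean = cycle_means(cycle_min, pct_down)
    return rng.expovariate(1 / up_mean), rng.expovariate(1 / down_mean)

POLICIES = ["RO2", "RO", "NHC+BA"]
BUFFER_SIZES = [2, 10]            # units per RCO resource
FAILURE_CYCLES = [100, 1000]      # minutes
DOWNTIME_PCT = [10, 30, 50, 70]
REPLICATIONS = 3

design = list(product(POLICIES, BUFFER_SIZES, FAILURE_CYCLES, DOWNTIME_PCT))
runs = len(design) * REPLICATIONS
error_dof = runs - len(design)    # replications beyond the first in each cell

print(cycle_means(1000, 10))      # (900.0, 100.0)
print(len(design), runs, error_dof)  # 48 144 96
```

Calling `sample_cycle(1000, 10, random.Random(1))` returns one random uptime and downtime pair for the long cycle at 10% downtime; the seed is arbitrary.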
This provides a total of 96 degrees of freedom for estimating experimental error, which we consider adequate. Table II presents the ANOVA for FD production. Although there are many statistically significant effects, we concentrate our discussion on the main effects (Fig. 7) and the interactions between policy and buffer space, failure cycle, and percentage of downtime (Figs. 8–10, respectively). Although most of the main effects are not surprising, we see in Fig. 7 that RO2 enables a higher overall FD production rate than do NHC+BA and RO [although there is a partial overlap in the 95% confidence interval (CI)], indicating that using shared buffer space to position FD parts when resources fail has advantages for FD parts. This effect is more pronounced when buffer space is higher, as shown in Fig. 8. When buffer size is small, there is little difference, whereas if buffer size is large, production under RO2 is significantly
TABLE II ANOVA FOR FD PRODUCTION (SINGLE UNRELIABLE RESOURCE)
Fig. 7. Main effects for FD production (single unreliable resource).
higher. This is due to the nature of the policies. The NHC+BA policy restricts the number of FD parts allowed in the system by the buffer sizes of FD resources. Even when the buffer sizes of other resources in RCO are high, the policy cannot allow more FD parts into the system. This explains why FD production under NHC+BA changes little between large and small buffer sizes. In contrast, RO2 allows the distribution of FD parts across the buffer spaces of resources in RCO. When buffer sizes are larger, more FD parts can be distributed in the system, and the production rate of these part types increases. As for RO, it is at least as able as RO2 to load FD parts into the system, thus enabling a higher FD production rate than NHC+BA. However, due to its inability to avoid failure-induced blocking, FD production under RO is lower than under RO2.
Figs. 9 and 10 show the policy versus failure cycle and percentage of downtime. In general, RO2 enables a higher FD production than RO (again, with some partial overlap of the 95% CI’s) and NHC+BA, but this is slightly less pronounced for longer failure cycles and higher percentage of downtime. In summary, for systems with a single unreliable resource, when the RCO is sufficiently capacitated, allowing supervisors to use shared-resource capacity to assure robust operation significantly promotes the production of FD part types. B. Experiment for Multiple Unreliable Resources The experimental system for simulation is the system in Fig. 11. It has two unreliable resources, which are r1 and r6 ,
Fig. 8. Interaction plot of policy and buffer size (single-unreliable-resource system).
Fig. 11. Experimental system for multiple-unreliable-resource system.
TABLE III EXPERIMENTAL FACTORS (MULTIPLE UNRELIABLE RESOURCE)
Fig. 9. Interaction plot of policy and failure cycle (single-unreliable-resource system).
Fig. 10. Interaction plot of policy and percentage of downtime (single-unreliable-resource system).
and produces eight part types, where P1 −P4 are NFD parts and P5 −P8 are FD parts. The experimental design is similar to the experimental setting for the single-unreliable-resource system in Section V-A. Table III summarizes the experimental factors.
Instead of comparing RO2 , we compare RO4 with NHC+BA and RO for this system with two unreliable resources. There are two levels of buffer sizes in RCO; specifically, for low level, we use three and six units for NFD and PFD resources, respectively, and for high level, we use 9 and 18 units for NFD and PFD resources, respectively. The order release policy is the same as described in Section V-A. Failure cycle and percentage of downtime follow the same design as in the singleunreliable-resource system. We set the processing time of all part stages to be exponential 5 min. For performance measures, we look at the production of FD parts during a simulation run of 1000 h. Finally, we perform three replications for each of the 3 × 2 × 2 × 4 = 48 combinations. Table IV presents the ANOVA for FD production. Although there are many statistically significant effects, we concentrate our discussion on the main effects (Fig. 12) and the interactions between policy and buffer space, failure cycle, and percentage of downtime (Figs. 13–15, respectively). In general, we observe results similar to those for the single-unreliable-resource system. In the main plot (Fig. 12), NHC+BA and RO4 enable a higher overall average FD production than RO, although they are not discernable in main effect from each other. To understand this, we must look at the interactions. For the effects of buffer space, as shown in Fig. 13, when the buffer size in RCO is high, FD production is higher under RO4 than in NHC+BA (although there is slight overlap in the 95% CI’s). On the other hand, when the buffer size is small, NHC+BA enables a higher FD production than RO4 . In terms of failure cycle, as shown in Fig. 14, RO is always outperformed by both RO4 and NHC+BA. With a short failure cycle, RO4 achieves a better FD production rate, and with a
TABLE IV ANOVA FOR FD PRODUCTION (MULTIPLE UNRELIABLE RESOURCE)
Fig. 12. Main effects for FD production (multiple-unreliable-resource system).
longer failure cycle, NHC+BA becomes the obvious choice for FD production. Fig. 15 shows the interaction between policy and percentage of downtime. Regardless of the percentage of downtime, FD production under RO4 is always higher than RO. When the percentage of downtime is small, RO4 enables a higher FD production rate than NHC+BA. NHC+BA, on the other hand, performs better and better and eventually outperforms RO4 as the percentage of downtime increases. All of the aforementioned observations are due to the natures of the policies. Resource failures cause blocking, and blocking may propagate through the system and thus further stall the production of some portion of the system or the whole system in the worst
case. RO does not consider robust supervision under resource failures. RO4 guarantees continuous production under single resource failure and uses shared buffer space to position FD parts when resources fail. However, RO4 cannot handle simultaneous multiple resource failures. In other words, when multiple resources fail, RO4 may not prevent the propagation of blocking. NHC+BA works with multiple resource failures by restricting the number of FD parts allowed in the system by the buffer sizes of FD resources. A smaller percentage of downtime and a shorter failure cycle indicate less chance for multiple resource failures. By allowing the distribution of FD parts across the buffer spaces of resources in RCO, RO4 admits more FD parts into the system than NHC+BA and thus has a higher FD production rate. With increasing percentage of downtime and failure cycle, multiple resource failures occur more frequently. Under RO4, blocking may propagate, and continuous production may be stalled. Thus, NHC+BA dominates RO4 and RO in FD production.

Fig. 13. Interaction plot of policy and buffer size (multiple-unreliable-resource system).
Fig. 14. Interaction plot of policy and failure cycle (multiple-unreliable-resource system).
Fig. 15. Interaction plot of policy and percentage of downtime (multiple-unreliable-resource system).

VI. CONCLUSION

In this paper, we developed robust supervisory controllers for SU-RASs with unreliable resources. The first policy, which is RO2, ensures robust operation for one unreliable resource, whereas the second, which is RO4, ensures robust operation for several unreliable resources, given that at most one resource is in a failed state at a time. These policies permit part mixes that are more heavily weighted toward FD part types than our previously published work. They do this by allowing parts that require failed resources to be held in the buffer space of resources that produce both FD and NFD part types. We motivated these policies with examples, demonstrated their application, and rigorously established their correctness in the Appendix. Finally, we performed simulation experiments that demonstrate the production advantages that these policies can offer. In future research, we will carry out more extensive experimentation to investigate systems with multiple unreliable resources and to determine the best way to select and configure a robust supervisory controller for a given system. We will also address the idea of condition-based control, where we attempt to develop robust supervisors for systems where probabilistic degradation chains are used to model resource reliability and failure.

APPENDIX A
DESIRED PROPERTIES FOR A ROBUST SUPERVISORY CONTROLLER
This section formally develops and defines a set of desired properties for a robust supervisory controller using language theory. Recall that our system is $S = \langle R, C, P, \rho, Q, Q_0, \Sigma, \xi, \delta \rangle$. Let $L(S) \subseteq \Sigma^*$ be the uncontrolled language generated by S. Furthermore, for a string $\sigma \in L(S)$ and an event $\pi \in \Sigma$, let $\pi_\sigma$ be the score (number of occurrences) of π in σ. The state transition function δ is extended in the usual way, i.e., for $\sigma \in L(S)$ leading from state $q_0 \in Q_0$ to $q \in Q$ and $\pi \in \Sigma \cap \xi(q)$, we have $\delta(q, \pi) = \delta(\delta(q_0, \sigma), \pi) = \delta(q_0, \sigma\pi)$. The controlled language $L(\Delta, S) \subseteq L(S)$ represents the behavior exhibited by the system S under the control of supervisor ∆. Here, ∆ is a function mapping L(S) to the power set of Σ. Specifically, $\Delta : L(S) \rightarrow 2^{\Sigma}$ such that for $\sigma \in L(S)$, $\Delta(\sigma)$ is the control action for S at state $\delta(q_0, \sigma)$ with $q_0 \in Q_0$. S is only allowed to execute an event of $\Delta(\sigma) \cap \xi(\delta(q_0, \sigma))$. Hence, under ∆, $\pi \in \Delta(\sigma) \cap \xi(\delta(q_0, \sigma))$ is "admissible," whereas $\pi \in \xi(\delta(q_0, \sigma)) \setminus \Delta(\sigma)$ is "inadmissible." Note that an inadmissible π must be in $\Sigma_c$ because it is assumed that $(\Sigma_u \cap \xi(\delta(q_0, \sigma))) \subseteq \Delta(\sigma)$, i.e., ∆ is not allowed to disable any enabled uncontrollable events at state $\delta(q_0, \sigma)$. Let $L(S/\Sigma_{u2}) \subseteq \{\Sigma_c \cup \Sigma_{u1}\}^*$ represent the uncontrolled language of S, given that resource failures do not occur. Let $L(\Delta, S/\Sigma_{u2}) \subseteq L(S/\Sigma_{u2})$ represent the controlled language of S under ∆, given that resource failures do not occur. Let $Q_\Delta = \{q_u : \text{for some } \sigma \in L(\Delta, S/\Sigma_{u2}),\ q_u = \delta(q_0, \sigma)\}$. The first required property of the supervisory controller ∆ is that it keeps the system deadlock free in the absence of resource failure, i.e., that it keeps the system safe. In terms of strings
and events, we express this as follows: Assuming no resource failure, ∆ must guarantee that ∀σ1 ∈ L(∆, S/Σu2 ) and n ∈ ℵ (natural numbers), ∃σ2 ∈ L(∆, S/Σu2 ) such that σ1 is a prefix of σ2 and ∀π ∈ Σc ∪ Σu1 , πσ2 > n. This basically states that, in the absence of resource failures, the system can continue to produce all of its part types indefinitely. Now, suppose that the system has executed the event sequence σ1 ∈ L(∆, S/Σu2 ), that the system is in state qu = δ(qo , σ1 ), and that the server of ri ∈ RU is busy in this state. If we append a failure event κi onto σ1 , we get σ1 κi ∈ L(∆, S) and state δ(qu , κi ). The event set that S can generate is now reconfigured; in fact, we will say that we have a modified events generator S i that bounds the occurrence of certain events, at least those in ψi . Furthermore, S i must start in initial state δ(qu , κi ), and in fact, the set of initial states for S i can be defined as Qi = {δ(qu , κi ) : qu ∈ Q∆ and qu enables κi }. Let L(∆, S i /Σu2 ) be the controlled language of S i , assuming no further event failure or repair, and let Qi∆ = {qv : for some qu ∈ Qi there exists σ ∈ L(∆, S i /Σu2 ) such that qv = δ(qu , σ)}. ∆ must now keep S i safe while the failed resource is being repaired. That is, assuming no further resource failure or repair, ∆ must guarantee that ∀σ1 ∈ L(∆, S i /Σu2 ) and n ∈ ℵ, ∃σ2 ∈ L(∆, S i /Σu2 ) such that σ1 is a prefix of σ2 and ∀π ∈ {Σc ∪ Σu1 }/ψi , πσ2 > n. This basically states that, in the absence of additional resource failures or repairs, the system can continue to produce all part types not requiring failed resource ri ∈ RU . This further implies that, while supervising S, ∆ must constrain S to feasible initial states for S i , i.e., the initial states of S i for which continuing operation is possible. 
Now, suppose that the system has executed the event sequence $\sigma_1\kappa_i\sigma_2 \in L(\Delta, S)$, where $\sigma_1 \in L(\Delta, S/\Sigma_{u2})$, $\kappa_i \in \Sigma_{u2}$, and $\sigma_2 \in L(\Delta, S^i/\Sigma_{u2})$, and that the uncontrollable repair event $\eta_i \in \Sigma_{u2}$ occurs. Then, we have the event sequence $\sigma_1\kappa_i\sigma_2\eta_i \in L(\Delta, S)$ and state $\delta(q_0, \sigma_1\kappa_i\sigma_2\eta_i) = \delta(q_u, \kappa_i\sigma_2\eta_i) = \delta(q'_u, \sigma_2\eta_i) = \delta(q_v, \eta_i) = q'_v$, where $q'_u = \delta(q_u, \kappa_i)$ and $q_v = \delta(q'_u, \sigma_2)$. The event set that S can generate has now been restored, and ∆ must once again supervise S, this time, starting in initial state $q'_v$. This implies that, in supervising $S^i$, ∆ must constrain $S^i$ to feasible initial states for S, i.e., the initial states of S from which continuing operation is possible. We can now state the following.

Property A: Supervisory controller ∆ is robust to the failure of resource $r_i \in R^U$ if the following are true.
A.1 $\forall \sigma_1 \in L(\Delta, S/\Sigma_{u2})$ and $\forall n \in \aleph$, $\exists \sigma_2 \in L(\Delta, S/\Sigma_{u2})$ such that $\sigma_1$ is a prefix of $\sigma_2$ and $\forall \pi \in \Sigma_c \cup \Sigma_{u1}$, $\pi_{\sigma_2} > n$.
A.2 For every $q_u \in Q_\Delta$ that enables $\kappa_i$, the state $\delta(q_u, \kappa_i)$ serves as a feasible initial state for $S^i$.
A.3 $\forall \sigma_1 \in L(\Delta, S^i/\Sigma_{u2})$ and $\forall n \in \aleph$, $\exists \sigma_2 \in L(\Delta, S^i/\Sigma_{u2})$ such that $\sigma_1$ is a prefix of $\sigma_2$ and $\forall \pi \in \{\Sigma_c \cup \Sigma_{u1}\}/\psi_i$, $\pi_{\sigma_2} > n$.
A.4 For every $q_v \in Q^i_\Delta$, the state $\delta(q_v, \eta_i)$ serves as a feasible initial state for S.
Property A is the formal statement of Property 2.1.
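The set-theoretic pieces of these definitions are small enough to sketch in code. The block below is only an illustration on hypothetical event names: `score` counts occurrences of an event in a string, `admissible` intersects the enabled events with the control action while never disabling uncontrollable events, and `covers_all` tests the condition that every listed event occurs more than n times in one given string. It does not attempt to decide Property A itself, which quantifies over all strings of the controlled language.

```python
from collections import Counter

def score(event, string):
    """Number of occurrences of `event` in the event string (a list of events)."""
    return Counter(string)[event]

def admissible(enabled, control_action, uncontrollable):
    """Events the plant may execute in the current state.

    Enabled events allowed by the supervisor, plus every enabled uncontrollable
    event, since the supervisor is not allowed to disable those.
    """
    return (enabled & control_action) | (enabled & uncontrollable)

def covers_all(string, events, n):
    """True if every event in `events` occurs more than n times in `string`."""
    return all(score(e, string) > n for e in events)

# Hypothetical events: "load" is controllable; "finish" and "fail" are not.
enabled = {"load", "finish", "fail"}
print(sorted(admissible(enabled, {"finish"}, {"finish", "fail"})))  # ['fail', 'finish']
print(covers_all(["load", "finish", "load", "finish"], {"load", "finish"}, 1))  # True
```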
A PPENDIX B C ORRECTNESS P ROOF FOR RO2 This section establishes that RO2 satisfies Property A for systems with |RU | = 1. For this purpose, we will use a number of lemmas to develop simple results that, when combined, establish the correctness of RO2 . Lemma B.1 establishes that if a part-type stage is NFD, then so are its successors. Lemma B.1: Pjk ∈ P NFD implies that Pjm ∈ P NFD for m ≥ k. Proof: Suppose Pjm ∈ P FD (i.e., Pjm ∈ P NFD ) for some m ≥ k. Then, by definition, Pjk ∈ P FD (i.e., Pjk ∈ P NFD ). Lemma B.2 establishes that if a part-type stage is NFD, then its associated resource is in RCO. Lemma B.2: Pjk ∈ P NFD implies that ρ(Pjk ) ∈ RCO. Proof: Recall that RCO = RNFD ∪ RPFD . ρ(Pjk ) ∈ RCO implies ρ(Pjk ) ∈ (R\RCO) = (R\(RNFD ∪ RPFD )) = (R\RNFD )\RPFD = RFD \RPFD . Thus, Pjk ∈ P FD (i.e., Pjk ∈ P NFD ) because every stage supported by a resource of RFD \RPFD is FD. Lemma B.3 establishes that if a part-type stage is NFD, then its residual route is contained in RCO. Lemma B.3: Pjk ∈ P NFD implies Tjk ⊆ RCO. Proof: Follows directly from Lemmas B.1 and B.2. Lemma B.3 asserts that an NFD part will never visit a resource of R\RCO. Lemma B.4 establishes that an FD part requires a resource of RFD for its processing. Lemma B.4: Pjk ∈ P FD implies ρ(Pjk ) ∈ RFD . Proof: If ρ(Pjk ) ∈ R\RFD = RNFD (i.e., ρ(Pjk ) ∈ FD R ), then Pjk ∈ P NFD (i.e., Pjk ∈ P FD ). Lemma B.5 establishes that, under RO2 , no deadlock structure can arise. Lemma B.5: Suppose q ∈ Q2RO (i.e., state q is admitted by RO2 ), and let D ⊆ R. Then, D is not a deadlock at q ∈ Q2RO . Proof: Let Π be the set of parts present in the system at state q ∈ Q2RO . Let ΠFD be the set of FD parts in the system at state q, ΠNFD be the set of NFD parts in the system at state q, and Πg be the set of parts located at rg at state q. It is clear that, Π = ΠFD ∪ ΠNFD with ΠFD ∩ ΠNFD = ∅. Suppose that D ⊆ R is in deadlock at q ∈ Q2RO . Because D is in deadlock, |D| > 1 and ∀ri ∈ D, |Πi | = Ci . 
Thus, D ⊄ RCO because RORCO allows at most one capacitated resource in RCO. Similarly, D ⊄ R\RCO = RFD\RPFD because RORFD allows at most one resource filled with only FD parts in RFD. These two arguments imply that |D| = 2 with |D ∩ RCO| = 1 and |D ∩ (R\RCO)| = 1 (note that R\RCO is the complement of RCO). Let {rh} = (D ∩ RCO) = (D ∩ (RPFD ∪ RNFD)) and {rg} = D ∩ (R\RCO) = D ∩ (RFD\RPFD). Clearly, rg is an FD resource that processes only FD parts, but rh might be either FD or NFD. Note that |Πh| = Ch, and we claim that rh is filled with FD parts. To see this, note that ∀pjk ∈ Πh, ρ(Pj,k+1) = rg, because rg and rh are the sole resources in deadlock. Thus, by definition of FD parts, all parts in Πh are FD, i.e., Πh ⊆ ΠFD, because ρ(Pj,k+1) = rg ∈ RFD\RPFD. This implies rh ∈ RCO ∩ RFD = RPFD. Thus, in state q ∈ Q2RO, we have two resources rg and rh from
RFD, both filled with FD parts, which violate RORFD . This contradicts the assumption that q ∈ Q2RO . Lemma B.6 establishes that for any state admitted by RO2 , there will be a sequence of resource allocations admitted by RO2 that completes all NFD parts. Lemma B.6: For q ∈ Q2RO , there exists a sequence of resource allocations admitted by RO2 that empties the system of NFD parts. Proof: Let q ∈ Q2RO . We need to prove that, beginning at q, there is a sequence of resource allocations admitted by RO2 to empty the system of all NFD parts. We consider two cases. Case 1—q ∈ Q2RO Exhibits No Capacitated Resource in RCO: Every resource of RCO must have at least one free unit of capacity. By Lemma B.3, every NFD part must be in RCO. Select any NFD part and advance it one step. This is possible because no RCO resource is capacitated. The resulting state has at most one capacitated resource in RCO. Thus, it satisfies RORCO . Also, RORFD remains unaffected because NFD parts do not appear in its constraints. Thus, the resulting state satisfies RO2 . Iterate until either RCO is empty of NFD parts or RCO exhibits one capacitated resource. If the first condition occurs, we are done. If the second condition occurs, go to Case 2. Case 2—q ∈ Q2RO Exhibits One Capacitated Resource in RCO: There are two possibilities. Case 2.1 considers the situation where the capacitated resource, for example, rg , holds at least one NFD part. Case 2.2 considers the situation where the capacitated resource, which is again rg , is filled with FD parts. Case 2.1: The capacitated resource rg holds at least one NFD part. From rg , advance an NFD part, for example, pjk , one step. This is possible because the resource where it is advancing to is again in RCO by Lemma B.3 and is not capacitated. The resulting state has at most one capacitated resource in RCO. Thus, it satisfies RORCO . RORFD remains unaffected because NFD parts do not appear in its constraints. Thus, the resulting state satisfies RO2 . 
This state satisfies either the condition of Case 1 or Case 2.1. If Case 2.1 is satisfied, repeat the procedure of Case 2.1. If Case 1 is satisfied, repeat the procedure of Case 1. Continue until RCO is empty of NFD parts. Case 2.2: The capacitated resource rg is filled with FD parts. Then, by Lemma B.4, rg ∈ RCO ∩ RFD = RPFD . In addition, rg is the only capacitated resource in the system, because its FD parts appear in both RORCO and RORFD . Select any part on rg and advance it one step. The advancement either results in a single new capacitated resource or in no capacitated resource, both of which satisfy RO2 . If the resulting state has no capacitated resource, follow the logic of Case 1. If the resulting state has a capacitated resource with at least one NFD part, follow the logic of Case 2.1. If the resulting state has a capacitated resource filled with FD parts, then there are two possibilities. Either the resource is in the following. 1) RPFD ⊆ RCO. 2) RFD \RPFD (where RCO ∩ (RFD \RPFD ) = ∅). That is, either the resource is in RCO or it is in R\RCO. If 1) is true, iterate the logic of Case 2.2; if 2) is true, follow the logic of Case 1. This procedure will terminate in a finite number of steps with all NFD parts being removed from the system.
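As an illustration of the admission test that the proof of Lemma B.6 applies after every part advancement, the following Python sketch checks the two constraints of RO2 as they are paraphrased in this appendix: RORCO (at most one capacitated resource in RCO) and RORFD (at most one resource of RFD filled with only FD parts). It is a sketch under stated assumptions, not the authors' implementation: the data layout (a dictionary mapping each resource to a list of Boolean FD flags, one per held part), the function names, and the three-resource example are hypothetical, and every capacity is assumed to be at least one.

```python
def is_full(state, cap, r):
    """A resource is capacitated when its buffer holds cap[r] parts."""
    return len(state[r]) == cap[r]


def admitted_by_ro2(state, cap, rco, rfd):
    """Check the two RO2 constraints as paraphrased in the proofs.

    state: hypothetical layout, resource -> list of flags (True = FD part).
    """
    # RO_RCO: at most one capacitated resource in RCO.
    full_rco = [r for r in rco if is_full(state, cap, r)]
    # RO_RFD: at most one resource of RFD filled with only FD parts.
    full_fd = [r for r in rfd if is_full(state, cap, r) and all(state[r])]
    return len(full_rco) <= 1 and len(full_fd) <= 1


# Hypothetical system: r1 in RNFD, r2 in RPFD, r3 in RFD\RPFD (unreliable).
cap = {"r1": 2, "r2": 2, "r3": 2}
rco, rfd = {"r1", "r2"}, {"r2", "r3"}

# Case 2.1 situation: the capacitated RCO resource r2 holds an NFD part.
q = {"r1": [False], "r2": [True, False], "r3": [True]}
# Advancing that NFD part one step (assumed next stage: r1) fills r1 only.
q_next = {"r1": [False, False], "r2": [True], "r3": [True]}
# Two capacitated resources in RCO: rejected by RO_RCO.
q_bad = {"r1": [False, False], "r2": [True, True], "r3": [True]}

print(admitted_by_ro2(q, cap, rco, rfd))
print(admitted_by_ro2(q_next, cap, rco, rfd))
print(admitted_by_ro2(q_bad, cap, rco, rfd))
```

Under these assumptions, q and q_next are admitted while q_bad is rejected, which mirrors the step in Case 2.1 where advancing an NFD part leaves at most one capacitated resource in RCO.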
Lemma B.7: For q ∈ Q2RO with ΠNFD = ∅, there exists a sequence of resource allocations admitted by RO2 that empties the system of FD parts.
Proof: Let q ∈ Q2RO such that ΠNFD = ∅. By Lemma B.4, every FD part must be held by a resource of RFD. If RFD contains a capacitated resource, this must be the only capacitated resource in the system, and it is filled with FD parts. Select any part from the capacitated resource and advance it one step. Else, if no capacitated resource exists, select any part and advance it one step. There are two possible outcomes. 1) The advanced part remains an FD part. 2) The advanced part becomes an NFD part, rendering ΠNFD ≠ ∅. In either case, the resulting state exhibits at most one capacitated resource and thus satisfies both RORCO and RORFD. If 1) is true, then we continue to iterate the aforementioned step. If 2) is true, then we follow the proof of Lemma B.6 to clear the NFD part. After that, if ΠFD ≠ ∅, we continue to iterate the aforementioned steps. It is obvious that a finite number of iterations will empty the system of FD parts.

Lemma B.8: RO2 ensures safety for the system, given that ri ∈ RU does not fail.
Proof: This follows directly from Lemmas B.6 and B.7.

Lemma B.9: ∀qu ∈ Q2RO such that κi ∈ ξ(qu), δ(qu, κi) = q′u is a feasible initial state for the reduced system.
Proof: We need to prove that, beginning at q′u, we are able to continue producing every part type that does not require the failed resource ri. To do this, we will establish a sequence of resource allocations permitted by RO2 that advances all NFD parts out of the system. By Lemma B.6, there exists an admissible sequence of events, for example, σ, beginning at qu, that advances every NFD part out of the system, given that the unreliable resource does not fail. In fact, σ remains valid, beginning at q′u. To see this, note that σ does not contain any service completion event βjk such that Pjk ∈ Ωi.
As a result, the status of unreliable resource ri is irrelevant to σ. In other words, the occurrence of the failure event for ri, which is κi, at qu does not in any way influence σ. Therefore, σ is valid for removing NFD parts from the system, beginning at q′u. Furthermore, by the proof of Case 2.2 of Lemma B.6, δ(q′u, σ) has no capacitated resource in RCO. Now, suppose we load new NFD parts into the system as long as RO2 is not violated. Then, by the aforementioned logic, these parts can be completed. Thus, the production of every part type that does not require ri can continue indefinitely starting from q′u, and hence, q′u is a feasible initial state.

Lemma B.10: RO2 ensures safety for the system, given that ri ∈ RU has failed.
Proof: Follows directly from the proof of Lemma B.9.

Lemma B.11: ∀qv ∈ QiRO2 such that ηi ∈ ξ(qv), δ(qv, ηi) = q′v is a feasible initial state for the upgraded system.
Proof: We need to establish that, beginning at q′v, there is an admissible sequence of resource allocations that empties the system. By Lemma B.10, all NFD parts can be completed by executing an admissible sequence σ that is unaffected by the status of ri ∈ RU. Thus, σ is valid for removing NFD
WANG et al.: USING SHARED-RESOURCE CAPACITY FOR ROBUST CONTROL
parts from the system, beginning at q′v. Because δ(q′v, σ) satisfies RO2, Lemma B.7 guarantees an admissible sequence that completes all FD parts, for example, τ, beginning with δ(q′v, σ). Thus, δ(q′v, στ) = qo ∈ Qo. Now, because the system is empty and all resources are operational, it is obvious that the production of every part type can continue indefinitely, because RO2 guarantees safety for the upgraded system. Theorem B.1 now follows directly.

Theorem B.1: If |RU| = 1, RO2 satisfies the requirements of Property A and is therefore robust to failures of the unreliable resource.
Proof: Follows directly from Lemmas B.8–B.11.

We have now established that controller RO2 is robust to resource failures for systems with |RU| = 1.

APPENDIX C
CORRECTNESS PROOF FOR RO4

This section establishes that RO4 satisfies Property A for systems with |RU| > 1 under the assumption that each part type requires at most one unreliable resource and that we only have one failure at a time. As stated before, if more than one unreliable resource fails, parts that do not require those resources may be blocked from production until repair events occur.

Lemma C.1 ensures the following property: Suppose ΠNFD = ∅ and ROD contains a capacitated resource. Then, ROD holds a part such that every resource required to advance the part to its unreliable resource has at least one free unit of capacity.
Lemma C.1: Let q ∈ Q4RO and ΠNFD = ∅. If rg ∈ ROD with |Πg| = Cg, then ∃pjk ∈ Πg ∩ ΠFD_i for some ri ∈ RU such that ∀rh ∈ ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri distinct from rg, |Πh| < Ch.
Proof: The set Πg ⊆ ΠFD, because ΠNFD = ∅. By ROROD, rg is the only capacitated resource filled with FD parts in ROD. In addition, rg is the only capacitated resource in ROD because there are no NFD parts in ROD. By Lemma B.4, Pjk, ..., Pj,k+c ⊆ PFD_i, and thus, we have ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri ⊆ RFD.
In other words, every resource required for advancing a part pjk at rg into the buffer of its required unreliable resource ri is an FD resource. We now establish the proof by contradiction. Assume that ∀ri ∈ RU, ∀pjk ∈ Πg ∩ ΠFD_i, ∃rh ∈ ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri such that rh ≠ rg and |Πh| = Ch. Because ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri ⊆ RFD, rh ∈ RFD, and thus, rh ∉ ROD, because rg is the only capacitated resource in ROD. Because ROD = RFD\RU, it is clear that rh ∈ RU. By RORFD2, at most two resources of RFD may be capacitated in an admissible state, and thus, rh is the only capacitated resource in RU (because rg and rh are both capacitated and in RFD, no other resource in RFD can be capacitated). As a result, ∀pjk ∈ Πg, pjk ∈ ΠFD_h. We now have |Πg| = Cg and |Πh| = Ch such that Πg ⊆ ΠFD_h and Πh ⊆ ΠFD_h, which is a violation of RORFD (RORFD does not admit a state if it has two resources in RFD filled with the parts that are FD on the same unreliable resource). Thus, we have a contradiction.
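The part whose existence Lemma C.1 guarantees can be searched for directly. The sketch below is only an illustration under assumptions: the state layout (resource mapped to a list of part identifiers), the `paths` dictionary (each part at rg mapped to the resources from rg up to and including its unreliable resource), and all names are hypothetical and do not come from the paper.

```python
def free_path_part(state, cap, rg, paths):
    """Return a part held at rg whose path to its unreliable resource is free.

    A path is free when every resource on it, other than rg itself, has at
    least one unit of unoccupied capacity. Returns None if no such part exists.
    """
    for part in state[rg]:
        if all(len(state[r]) < cap[r] for r in paths[part] if r != rg):
            return part
    return None


# Hypothetical example: rg is capacitated with two FD parts.
cap = {"rg": 2, "ra": 1, "u1": 1, "u2": 2}
state = {"rg": ["p1", "p2"], "ra": [], "u1": ["p3"], "u2": []}
paths = {
    "p1": ["rg", "u1"],        # blocked: u1 is capacitated
    "p2": ["rg", "ra", "u2"],  # free: ra and u2 both have spare capacity
}

print(free_path_part(state, cap, "rg", paths))
```

In this example the search skips p1, whose unreliable resource u1 is full, and returns p2. Lemma C.1 states that, in any state admitted by RO4 with no NFD parts, such a search on a capacitated resource of ROD cannot come back empty.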
Lemma C.2 now establishes that if we have a capacitated resource in RPFD holding only FD parts, then the resource holds at least one part such that every resource required to advance the part to its unreliable resource has at least one free unit of capacity. This result places no restriction on NFD parts.
Lemma C.2: Let q ∈ Q4RO and rg ∈ RPFD with |Πg| = Cg and Πg ⊆ ΠFD. Then, ∃pjk ∈ Πg ∩ ΠFD_i for some ri ∈ RU, such that ∀rh ∈ ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri distinct from rg, |Πh| < Ch.
Proof: Because rg ∈ RPFD ⊆ RFD\RU = ROD, ROROD guarantees that rg is the only capacitated resource in ROD filled with FD parts. Because rg ∈ RPFD ⊆ RCO, RORCO guarantees that rg is the only capacitated resource in RCO. Thus, rg is the only capacitated resource in RCO ∪ ROD = RPFD ∪ RNFD ∪ ROD = RPFD ∪ RNFD ∪ (RFD\RU) = (RFD ∪ RNFD)\RU = R\RU. Thus, if any other resource is capacitated, it must be in RU. To complete the proof, apply the contradiction of the proof of Lemma C.1.

Lemmas C.1 and C.2 guarantee that a part in ROD held by a capacitated resource filled with FD parts can be advanced into the buffer of its required unreliable resource. Lemmas C.3 and C.4 show that this advancement does not violate RO4.

Lemma C.3: Assume that the conditions of Lemma C.1 hold. That is, let q ∈ Q4RO, ΠNFD = ∅, and rg ∈ ROD with |Πg| = Cg and Πg ⊆ ΠFD. Let pjk ∈ Πg ∩ ΠFD_i such that ∀rh ∈ ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri distinct from rg, |Πh| < Ch. Let σj,k+m = αj,k+1 βj,k+1, ..., αj,k+m, for m = 1, ..., c. Then, {δ(q, σj,k+m) : m = 1, ..., c} ⊆ Q4RO. That is, all states encountered during the advancement of pjk from rg to ri will satisfy RO4.
Proof: We first establish that RORFD is not violated by δ(q, σj,k+m) for m = 0, ..., c. For the sake of induction, we assume that σjk = σj,k+0 = ε (the null string) so that δ(q, σjk) = δ(q, σj,k+0) = δ(q, ε) = q. Note that, because δ(q, σj,k+0) = q ∈ Q4RO, q does not violate RORFD. This is our base case.
Now, we show that if δ(q, σj,k+n) satisfies RORFD, then δ(q, σj,k+n+1) for n = 1, ..., c − 1 also satisfies RORFD. Consider state δ(q, σj,k+n). In this state, the advanced part pj,k+n is at resource ρ(Pj,k+n). This is the only resource that can be capacitated along the sequence ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri, because, in state q, only rg was capacitated, and advancing one part from rg along this sequence can result in at most one capacitated resource in the sequence. Because, by assumption, δ(q, σj,k+n) ∈ Q4RO, δ(q, σj,k+n) does not violate RORFD, i.e., δ(q, σj,k+n) has at most one resource, for example, rt, in RFD that is filled with parts that are FD on the same unreliable resource. There are three cases to consider for δ(q, σj,k+n). 1) rt does not exist. 2) rt exists, and rt = ρ(Pj,k+n). 3) rt exists, but rt ≠ ρ(Pj,k+n). For cases 1) and 2), δ(q, σj,k+n+1) has at most one resource in RFD filled with parts that are FD on the same unreliable resource, and this capacitated resource must be ρ(Pj,k+n+1),
because ρ(Pj,k+n+1) is the only resource that could be filled by advancing pj,k+n one step (note that, after advancing pj,k+n one step, ρ(Pj,k+n) cannot be capacitated). Thus, for these two cases, δ(q, σj,k+n+1) satisfies RORFD. For case 3), because ρ(Pj,k+n) ∈ ROD ⊆ RFD and ROROD is not violated in δ(q, σj,k+n), rt ∈ RFD\ROD = RU. Note that rt ≠ ri, because, by Lemma C.1, ri is not capacitated. Because pj,k+n is FD on ri, in state δ(q, σj,k+n+1), ρ(Pj,k+n+1) cannot be filled with parts that are FD on rt, because pj,k+n+1 is not FD on rt. Thus, for this case, δ(q, σj,k+n+1) satisfies RORFD. Thus, δ(q, σj,k+m) for m = 0, ..., c satisfies RORFD.

We now establish that ROROD is not violated by δ(q, σj,k+m) for m = 0, ..., c. Because δ(q, σj,k+0) = q ∈ Q4RO, q does not violate ROROD. Suppose δ(q, σj,k+n) ∈ Q4RO. Then, δ(q, σj,k+n) does not violate ROROD, i.e., δ(q, σj,k+n) has at most one resource, for example, rt, in ROD that is filled with FD parts. There are three cases to consider for δ(q, σj,k+n). 1) rt does not exist. 2) rt exists, and rt = ρ(Pj,k+n). 3) rt exists, but rt ≠ ρ(Pj,k+n). For cases 1) and 2), δ(q, σj,k+n+1) has at most one resource in ROD filled with parts that are FD on the same unreliable resource, and this capacitated resource must be ρ(Pj,k+n+1), because ρ(Pj,k+n+1) is the only resource that could be filled by advancing pj,k+n one step (note that, after advancing pj,k+n one step, ρ(Pj,k+n) cannot be capacitated). Thus, for these two cases, δ(q, σj,k+n+1) satisfies ROROD. For case 3), note that ρ(Pj,k+n) is the only resource in the sequence ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri that can be capacitated in δ(q, σj,k+n). Thus, rt ∉ ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri. Thus, rt is not affected by the sequence of events σj,k+n, which implies that rt was capacitated with FD parts in the original state q.
Thus, for q, we have rt ∈ ROD and rg ∈ ROD, both capacitated with FD parts, which contradicts the induction hypothesis. Thus, we conclude that δ(q, σj,k+n+1) satisfies ROROD.

Next, we establish that RORCO is not violated by δ(q, σj,k+m) for m = 0, ..., c. Because δ(q, σj,k+0) = q ∈ Q4RO, q does not violate RORCO. Suppose δ(q, σj,k+n) ∈ Q4RO. Then, δ(q, σj,k+n) does not violate RORCO, i.e., δ(q, σj,k+n) has at most one resource, for example, rt, in RCO filled with parts. By the assumption of Lemma C.1, these must be FD parts (thus, there are no parts in the resources of RCO\RFD). Thus, rt ∈ RFD ∩ RCO = RPFD. There are three cases to consider for δ(q, σj,k+n). 1) rt does not exist. 2) rt exists, and rt = ρ(Pj,k+n). 3) rt exists, but rt ≠ ρ(Pj,k+n). For cases 1) and 2), δ(q, σj,k+n+1) has at most one resource in RFD ∩ RCO filled with parts, and this capacitated resource must be ρ(Pj,k+n+1), because ρ(Pj,k+n+1) is the only resource that could be filled by advancing pj,k+n one step. Thus, for these two cases, δ(q, σj,k+n+1) satisfies RORCO. For case 3), note that ρ(Pj,k+n) is the only resource in the sequence ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri that can be capacitated in δ(q, σj,k+n). Thus, rt ∉ ρ(Pjk) =
rg, ..., ρ(Pj,k+c) = ri. Thus, rt is not affected by the sequence of events σj,k+n, which implies that rt was capacitated with FD parts in the original state q. Thus, for q, we have rt ∈ RCO ∩ RFD = RPFD ⊆ ROD and rg ∈ ROD\RCO, which violates ROROD and is contrary to the induction hypothesis. Thus, we conclude that δ(q, σj,k+n+1) satisfies RORCO.

We now establish that RORFD2 is not violated by δ(q, σj,k+m) for m = 0, ..., c. Because δ(q, σj,k+0) = q ∈ Q4RO, q does not violate RORFD2. Suppose δ(q, σj,k+n) ∈ Q4RO. Then, δ(q, σj,k+n) does not violate RORFD2, i.e., δ(q, σj,k+n) has at most two resources, for example, rs and rt, in RFD filled with FD parts. Because δ(q, σj,k+n) satisfies ROROD, they cannot both be in ROD = RFD\RU, and thus, at least one, for example, rs, must be in RU. Assume that rt ∈ RFD\RU and rs ∈ RU. By previous logic, rt must be ρ(Pj,k+n). There are two possibilities. 1) ρ(Pj,k+n+1) ∈ RFD\RU. 2) ρ(Pj,k+n+1) ∈ RU. If 1) is true, then δ(q, σj,k+n+1) has at most two resources rs ∈ RU and ρ(Pj,k+n+1) ∈ RFD\RU in RFD filled with FD parts. If 2) is true, then δ(q, σj,k+n+1) has at most two resources rs ∈ RU and ρ(Pj,k+n+1) ∈ RU in RFD filled with FD parts. Thus, δ(q, σj,k+n+1) satisfies RORFD2.

Thus, we have shown that δ(q, σj,k+0) = q ∈ Q4RO, and that if δ(q, σj,k+n) ∈ Q4RO, then δ(q, σj,k+n+1) ∈ Q4RO, and this completes the proof.

Lemma C.4: Assume that the conditions of Lemma C.2 hold. That is, let q ∈ Q4RO and rg ∈ RPFD with |Πg| = Cg and Πg ⊆ ΠFD. Let pjk ∈ Πg ∩ ΠFD_i such that ∀rh ∈ ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri distinct from rg, |Πh| < Ch. Let σj,k+m = αj,k+1 βj,k+1, ..., αj,k+m for m = 1, ..., c. Then, {δ(q, σj,k+m) : m = 1, ..., c} ⊆ Q4RO. That is, all states encountered during the advancement of pjk from rg to ri will satisfy RO4.
Proof: We first establish that RORFD is not violated by δ(q, σj,k+m) for m = 0, ..., c.
For the sake of induction, we assume that σjk = σj,k+0 = ε (the null string) so that δ(q, σjk) = δ(q, σj,k+0) = δ(q, ε) = q. Note that because δ(q, σj,k+0) = q ∈ Q4RO, q does not violate RORFD. This is our base case. Now, we show that if δ(q, σj,k+n) satisfies RORFD, then δ(q, σj,k+n+1) for n = 1, ..., c − 1 also satisfies RORFD. Consider state δ(q, σj,k+n). In this state, the advanced part pj,k+n is at resource ρ(Pj,k+n). This is the only resource that can be capacitated along the sequence ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri, because, in state q, only rg was capacitated, and advancing one part from rg along this sequence can result in at most one capacitated resource in the sequence. Because, by assumption, δ(q, σj,k+n) ∈ Q4RO, δ(q, σj,k+n) does not violate RORFD, i.e., δ(q, σj,k+n) has at most one resource, for example, rt, in RFD that is filled with parts that are FD on the same unreliable resource. There are three cases to consider for δ(q, σj,k+n). 1) rt does not exist. 2) rt exists, and rt = ρ(Pj,k+n). 3) rt exists, but rt ≠ ρ(Pj,k+n).
For cases 1) and 2), δ(q, σj,k+n+1) has at most one resource in RFD filled with parts that are FD on the same unreliable resource, and this capacitated resource must be ρ(Pj,k+n+1), because ρ(Pj,k+n+1) is the only resource that could be filled by advancing pj,k+n one step (note that, after advancing pj,k+n one step, ρ(Pj,k+n) cannot be capacitated). Thus, for these two cases, δ(q, σj,k+n+1) satisfies RORFD. For case 3), because ρ(Pj,k+n) ∈ ROD ⊆ RFD and ROROD is not violated in δ(q, σj,k+n), rt ∈ RFD\ROD = RU. Note that rt ≠ ri, because, by Lemma C.2, ri is not capacitated. Because pj,k+n is FD on ri, in state δ(q, σj,k+n+1), ρ(Pj,k+n+1) cannot be filled with parts that are FD on rt, because pj,k+n+1 is not FD on rt. Thus, for this case, δ(q, σj,k+n+1) satisfies RORFD. Thus, δ(q, σj,k+m) for m = 0, ..., c satisfies RORFD.

We now establish that ROROD is not violated by δ(q, σj,k+m) for m = 0, ..., c. Because δ(q, σj,k+0) = q ∈ Q4RO, q does not violate ROROD. Suppose δ(q, σj,k+n) ∈ Q4RO. Then, δ(q, σj,k+n) does not violate ROROD, i.e., δ(q, σj,k+n) has at most one resource, for example, rt, in ROD that is filled with FD parts. There are three cases to consider for δ(q, σj,k+n). 1) rt does not exist. 2) rt exists, and rt = ρ(Pj,k+n). 3) rt exists, but rt ≠ ρ(Pj,k+n). For cases 1) and 2), δ(q, σj,k+n+1) has at most one resource in ROD filled with parts that are FD on the same unreliable resource, and this capacitated resource must be ρ(Pj,k+n+1), because ρ(Pj,k+n+1) is the only resource that could be filled by advancing pj,k+n one step (note that, after advancing pj,k+n one step, ρ(Pj,k+n) cannot be capacitated). Thus, for these two cases, δ(q, σj,k+n+1) satisfies ROROD. For case 3), note that ρ(Pj,k+n) is the only resource in the sequence ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri that can be capacitated in δ(q, σj,k+n). Thus, rt ∉ ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri.
Thus, rt is not affected by the sequence of events σj,k+n, which implies that rt was capacitated with FD parts in the original state q. Thus, for q, we have rt ∈ ROD and rg ∈ ROD, both capacitated with FD parts, which contradicts the induction hypothesis. Thus, we conclude that δ(q, σj,k+n+1) satisfies ROROD.

Next, we establish that RORCO is not violated by δ(q, σj,k+m) for m = 0, ..., c. Because δ(q, σj,k+0) = q ∈ Q4RO, q does not violate RORCO. Suppose δ(q, σj,k+n) ∈ Q4RO. Then, δ(q, σj,k+n) does not violate RORCO, i.e., δ(q, σj,k+n) has at most one resource, for example, rt, in RCO filled with parts. There are three cases to consider for δ(q, σj,k+n). 1) rt does not exist. 2) rt exists, and rt = ρ(Pj,k+n). 3) rt exists, but rt ≠ ρ(Pj,k+n). For cases 1) and 2), δ(q, σj,k+n+1) has at most one resource in RFD ∩ RCO filled with parts, and this capacitated resource must be ρ(Pj,k+n+1), because ρ(Pj,k+n+1) is the only resource that could be filled by advancing pj,k+n one step. Thus, for these two cases, δ(q, σj,k+n+1) satisfies RORCO. For case 3), note that ρ(Pj,k+n) is the only resource in the sequence ρ(Pjk) = rg, ..., ρ(Pj,k+c) = ri that can be capacitated in δ(q, σj,k+n), implying that rt ∉ ρ(Pjk) =
rg, ..., ρ(Pj,k+c) = ri. Thus, rt is not affected by the sequence of events σj,k+n, which implies that rt was capacitated with parts in the original state q. Thus, for q, we have rt ∈ RCO and rg ∈ RPFD ⊆ RCO, both capacitated, which violates RORCO and is contrary to the induction hypothesis. Thus, we conclude that δ(q, σj,k+n+1) satisfies RORCO.

We now establish that RORFD2 is not violated by δ(q, σj,k+m) for m = 0, ..., c. Because δ(q, σj,k+0) = q ∈ Q4RO, q does not violate RORFD2. Suppose δ(q, σj,k+n) ∈ Q4RO. Then, δ(q, σj,k+n) does not violate RORFD2, i.e., δ(q, σj,k+n) has at most two resources, for example, rs and rt, in RFD filled with FD parts. Because δ(q, σj,k+n) satisfies ROROD, they cannot both be in ROD = RFD\RU, and thus, at least one, for example, rs, must be in RU. Assume that rt ∈ RFD\RU and rs ∈ RU. By previous logic, rt must be ρ(Pj,k+n). There are two possibilities. 1) ρ(Pj,k+n+1) ∈ RFD\RU. 2) ρ(Pj,k+n+1) ∈ RU. If 1) is true, then δ(q, σj,k+n+1) has at most two resources rs ∈ RU and ρ(Pj,k+n+1) ∈ RFD\RU in RFD filled with FD parts. If 2) is true, then δ(q, σj,k+n+1) has at most two resources rs ∈ RU and ρ(Pj,k+n+1) ∈ RU in RFD filled with FD parts. Thus, δ(q, σj,k+n+1) satisfies RORFD2.

Thus, we have shown that δ(q, σj,k+0) = q ∈ Q4RO, and that if δ(q, σj,k+n) ∈ Q4RO, then δ(q, σj,k+n+1) ∈ Q4RO, and this completes the proof.

Lemma C.5 asserts that we can advance every NFD part out of the system.
Lemma C.5: For qu ∈ Q4RO, there exists a sequence of resource allocations admitted by RO4 that empties the system of NFD parts.
Proof: Let qu ∈ Q4RO. We need to establish that, beginning at qu, there exists a sequence σ of resource allocations admitted by RO4 that empties the system of NFD parts. There are two cases.
Case 1: There Exists No Capacitated Resource in RCO. Every resource of RCO has a free unit of capacity at qu.
As a result, based on the arguments presented in Case 1 of Lemma B.6, we will be able to advance all NFD parts out of RCO. Furthermore, advancing an NFD part out of the system will not violate RO4. This is true because advancing an NFD part along its route will give rise to at most one capacitated resource in RCO and, thus, will not violate RORCO. In addition, RORFD, ROROD, and RORFD2 remain intact, because these sets of constraints do not count NFD parts. Therefore, the sequence of the resulting states generated by advancing an NFD part out of the system will be admitted by RO4. After clearing all NFD parts, we have ΠNFD = ∅, and thus, RNFD holds no part.
Case 2: There Exists One Capacitated Resource, for Example, rg, in RCO. There are two possible cases to consider. First, Case 2.1 will deal with the situation where rg holds an NFD part. Case 2.2 will handle the situation where rg holds only FD parts. We show that we can advance all NFD parts out of the system, beginning at qu, without violating RO4.
Case 2.1: rg holds an NFD part. Based on the arguments presented in Case 2.1 of Lemma B.6, we are able to
advance an NFD part from rg out of the system. Furthermore, advancing this NFD part out of the system will not violate RO4. Clearly, RORCO is not violated, because advancing this NFD part will result in at most one capacitated resource in RCO. In addition, RORFD, ROROD, and RORFD2 are not affected, because they do not consider NFD parts. Therefore, the sequence of states generated by advancing this NFD part out of the system will be permitted by RO4. After decapacitating rg, there exist no capacitated resources in RCO. To continue to clear the system of the remaining parts, we will now proceed to Case 1.
Case 2.2: rg holds only FD parts. Lemma B.4 indicates that rg is in RCO ∩ RFD = RPFD. Lemma C.2 ensures that we can advance a part pjk of Πg into the buffer of ri such that ri ∈ RTjk ∩ RU. Lemma C.4 guarantees that no constraints will be violated as we advance pjk along its route. Hence, the sequence of states generated by advancing pjk into the buffer of ri will be acceptable to RO4. Then, there exists no capacitated resource in RCO, and we may now proceed to Case 1 to continue to empty the system of the remaining NFD parts.

Let rg ∈ RU. Lemma C.6 establishes the following: 1) If rg is operational, then every resource of the residual route of every part that is FD on rg is operational; and 2) if no reliable resource is capacitated, then, for every part that is FD on rg, every resource distinct from rg in its residual route is uncapacitated. These results are necessary for us to establish safe sequences for FD parts.
Lemma C.6: Let q ∈ Q4RO and rg ∈ RU, and let R̄ denote the set of currently failed resources. 1) If rg ∉ R̄, then ∀pjk ∈ ΠFD_g and ∀rh ∈ RTjk, rh ∉ R̄. 2) If ∀rf ∈ RR, |Πf| < Cf, then ∀pjk ∈ ΠFD_g, we have ∀rh ∈ RTjk with rh ≠ rg, |Πh| < Ch.
Proof: Recall that ∀Pjk ∈ P, |RTjk ∩ RU| ≤ 1, and thus, 1) follows immediately. For 2), suppose that ∃pjk ∈ ΠFD_g such that ∃rh ∈ RTjk with rh ≠ rg and |Πh| = Ch. Then, rh ∈ RU because ∀rf ∈ RR, |Πf| < Cf.
As a consequence, RTjk ∩ RU = {rg, rh}, which contradicts |RTjk ∩ RU| ≤ 1.

Now, Lemma C.7 provides a sequence that advances FD parts out of the system when there are no NFD parts in the system.
Lemma C.7: Let q ∈ Q4RO such that ΠNFD = ∅. Then, there exists a sequence σ such that ∀rg ∈ RU\R̄, ΠFD_g = ∅ in δ(q, σ).
Proof: We establish the proof by constructing a sequence σ of events to clear every set of FD parts associated with every operational unreliable resource. Suppose ∃rh ∈ ROD such that |Πh| = Ch. Because ΠNFD = ∅, Πh ⊆ ΠFD. Because ROROD allows in ROD at most one capacitated resource filled with FD parts, rh is the only capacitated resource in ROD. Because ΠNFD = ∅, every resource of RR\ROD is empty. Thus, rh is the only capacitated resource of ROD ∪ (RR\ROD) = ROD ∪ RR = (RFD\RU) ∪ RR = RR. First, decapacitate rh by advancing a part pjk of Πh into the buffer of ru ∈ RTjk ∩ RU. This is possible by Lemma C.1. (Note that it is possible for ru ∈ R̄, i.e., for ru to be failed.) Assume that the resulting state obtained after advancing pjk into ru is q1. By Lemma C.3, every state encountered in advancing pjk into the buffer of ru satisfies RO4. Thus, q1 ∈ Q4RO and contains no capacitated resource in RR.
For all ri ∈ RU\R̄, we want to empty ΠFD_i beginning at q1. If ∃rg ∈ RU\R̄ such that |Πg| = Cg, advance a part of Πg out of the system to decapacitate rg. Then, advance any remaining part of ΠFD_g out of the system until ΠFD_g is empty. This is possible by Lemma C.6 (although we have not shown that it satisfies RO4). When ∀rg ∈ RU\R̄, |Πg| < Cg, select any ΠFD_g ≠ ∅ and advance any part of ΠFD_g out of the system. Again, this is possible by Lemma C.6. Repeat until ΠFD_g is empty. Clearly, we can repeat the procedure until ∀rg ∈ RU\R̄, ΠFD_g = ∅.

Lemma C.8 ensures that RO4 is not violated by the procedure of Lemma C.7.
Lemma C.8: Let q ∈ Q4RO such that ΠNFD = ∅. Let σ be a sequence such that ∀rg ∈ RU\R̄, ΠFD_g = ∅ in δ(q, σ), and let σ′ be any prefix of σ. Then, δ(q, σ′) ∈ Q4RO.
Proof: The sequence σ of events obtained in the proof of Lemma C.7 will first decapacitate rh in ROD. Let σ1 be the sequence of events that advances a part from rh into the buffer of its required unreliable resource. Let σ′1 be any prefix of σ1. Then, by Lemma C.3, δ(q, σ′1) ∈ Q4RO. Note that δ(q, σ1) = q1 ∈ Q4RO. The sequence σ will next empty ΠFD_i, ∀ri ∈ RU\R̄, beginning at δ(q, σ1) = q1. Let σ2 be a sequence of events such that δ(q1, σ2) = q2, where ∀ri ∈ RU\R̄, ΠFD_i = ∅ in q2, and if σ′2 is any proper prefix of σ2, ∃ri ∈ RU\R̄ such that ΠFD_i ≠ ∅. Clearly, σ = σ1σ2. We now have to establish that δ(q1, σ′2) ∈ Q4RO. Note that δ(q1, σ′2) does not violate ROROD. To see this, note that q1 has no capacitated resource in RR and that advancing a single FD part out of the system can cause at most one capacitated resource in ROD ⊆ RR. Thus, δ(q1, σ′2) has at most one capacitated resource in ROD. By the same argument, δ(q1, σ′2) does not violate RORCO. Now, suppose δ(q1, σ′2) violates RORFD2. Then, δ(q1, σ′2) exhibits three capacitated resources in RFD.
Because RR contains no capacitated resource at q1, advancing an FD part out of the system can cause at most one capacitated resource in ROD ⊆ RR. We must then have two capacitated resources in RU at δ(q1, σ′2). By the procedure given in the proof of Lemma C.7, these two capacitated resources must be in R̄ (the set of failed resources) at q1, and thus, |R̄| > 1. This contradicts our assumption that ∀q ∈ Q4RO, |R̄| ≤ 1. Therefore, advancing FD parts out of the system will not violate RORFD2. Finally, RORFD will not be violated, by arguments similar to those presented in Lemmas C.3 and C.4. We have now completed the proof.

We will show that supervisor RO4 is robust to a single resource failure of RU at a time. To this end, we will first establish four lemmas as follows. Lemma C.9 indicates that the supervisor ensures safety for the system if no resource failure occurs.
Lemma C.9: RO4 ensures safety for the system, given that R̄ = ∅.
Proof: Follows directly from Lemmas C.5 and C.8.

Lemma C.10: ∀qu ∈ Q4RO such that qu enables κi ∈ Σu2, the state δ(qu, κi) = q′u serves as a feasible initial state for the reduced system.
Proof: It follows directly from Lemmas C.5 and C.8 that if qu ∈ Q4RO with R = ∅ and κi ∈ ξ(qu), then, starting with qu′ = δ(qu, κi), implying that R = {ri} in qu′, it is possible, under the supervision of RO4, to first empty the system of all NFD parts and then of all FD parts that do not require ri ∈ R. Furthermore, in the resulting state, the only possible capacitated resource will be ri ∈ R. Thus, no resource other than ri is capacitated, i.e., each resource other than ri has at least one unit of unoccupied buffer capacity.

Next, we show that we are able to indefinitely produce every part that does not require ri. We can load and advance new parts so long as RO4 is not violated. By Lemma C.5, we can finish all NFD parts, and by Lemma C.8, we can finish all FD parts that do not require ri ∈ R. It is clear that every part that does not require ri ∈ R can continue to be produced indefinitely in the reduced system so long as R = {ri}.

Lemma C.11: RO4 ensures safety for the system, given that R = {ri}.

Proof: Follows from the proof of Lemma C.10.

Lemma C.12: For R = {ri} in qv with qv ∈ Q4RO and ηi ∈ ξ(qv), qv′ = δ(qv, ηi) serves as a feasible initial state for the upgraded system.

Proof: First, note that for qv′, R = ∅. Furthermore, qv and qv′ have exactly the same distribution of parts, i.e., the same resource allocation, because the execution of ηi does not change the resource allocation state. Because Q4RO considers only the resource allocation state and not the status of unreliable resources, qv ∈ Q4RO implies that qv′ ∈ Q4RO. Thus, by Lemma C.9, the system is safe in state qv′ = δ(qv, ηi).

The following theorem is now readily available.

Theorem C.1: Supervisory controller RO4 is robust for systems where |RU| ≥ 1 and |R| ≤ 1.

Proof: It follows directly from Lemmas C.9–C.12. We have now established that controller RO4 is robust to a single resource failure for systems with |RU| ≥ 1.
APPENDIX D
NHC

NHC is a set of neighborhood constraints based on the notion of failure dependence. Informally, a resource is FD if every part that enters its buffer space requires some future processing on a given unreliable workstation. Thus, all unreliable resources are FD. Some reliable resources may also be FD if they only process parts that require future processing on a given unreliable resource. For each FD resource, we generate a neighborhood. The neighborhood of an FD resource is a virtual space of finite capacity that is used to control the distribution of parts requiring that given FD resource. In the following, we illustrate the NHC through an example and refer the reader to [26] for an in-depth discussion.

The system in Fig. 16 has two unreliable resources $\{r_2, r_9\}$. Note that, anytime $r_1$ appears in a route, $r_2$ appears later in the route, and thus, $r_1$ is FD on $r_2$ (and $r_2$ is FD on itself). Also, anytime $r_7$ or $r_8$ appears in a route, $r_9$ appears later in the route; therefore, the resources in the set $\{r_7, r_8, r_9\}$ are FD on $r_9$. Thus, for $r_2$, we set up two neighborhoods, one for $r_1$ and
Fig. 16. System with two unreliable resources.
one for $r_2$ (call these $\mathrm{NH}^2_1$ and $\mathrm{NH}^2_2$), and for $r_9$, we set up three neighborhoods, one for $r_7$, one for $r_8$, and one for $r_9$ (call these $\mathrm{NH}^9_7$, $\mathrm{NH}^9_8$, and $\mathrm{NH}^9_9$). These are defined as follows:

$$\mathrm{NH}^2_1 = \{P_{14}, P_{22}, P_{23}, P_{24}, P_{25}, P_{26}\}$$
$$\mathrm{NH}^2_2 = \{P_{11}, P_{12}, P_{13}, P_{15}, P_{21}, P_{27}\}$$
$$\mathrm{NH}^9_7 = \{P_{32}\}$$
$$\mathrm{NH}^9_8 = \{P_{31}\}$$
$$\mathrm{NH}^9_9 = \{P_{33}, P_{34}, P_{35}\}.$$

To understand this, consider $\mathrm{NH}^2_1$ and $\mathrm{NH}^2_2$. Note that the support set of $r_1$ is $\Omega_1 = \{P_{14}, P_{26}\}$ and that the support set of $r_2$ is $\Omega_2 = \{P_{13}, P_{15}, P_{21}, P_{27}\}$. Thus, $\{P_{14}, P_{26}\} \subseteq \mathrm{NH}^2_1$, and $\{P_{13}, P_{15}, P_{21}, P_{27}\} \subseteq \mathrm{NH}^2_2$. Now, consider $T_1 = \{\rho(P_{11}), \rho(P_{12}), \rho(P_{13}), \rho(P_{14}), \rho(P_{15})\} = \{r_6, r_3, r_2, r_1, r_2\}$. Because $\{r_6, r_3\}$ precede $r_2$ in the route but are not FD on $r_2$, parts in their support set that require $r_2$ in future processing, which are $\{P_{11}, P_{12}\}$, will belong to $\mathrm{NH}^2_2$. Similarly, $T_2 = \{\rho(P_{21}), \rho(P_{22}), \rho(P_{23}), \rho(P_{24}), \rho(P_{25}), \rho(P_{26}), \rho(P_{27}), \rho(P_{28})\} = \{r_2, r_3, r_4, r_6, r_5, r_1, r_2, r_3\}$. Because $\{r_3, r_4, r_6, r_5\}$ are not FD on $r_2$, $\{P_{22}, P_{23}, P_{24}, P_{25}\}$ will belong to $\mathrm{NH}^2_1$. Thus, we get $\mathrm{NH}^2_1 = \{P_{14}, P_{22}, P_{23}, P_{24}, P_{25}, P_{26}\}$ and $\mathrm{NH}^2_2 = \{P_{11}, P_{12}, P_{13}, P_{15}, P_{21}, P_{27}\}$.

We now construct neighborhood constraints. Our intention is to guarantee that every part in the neighborhood of an FD resource has capacity reserved at that resource. Recall that $x_{jk}$ is the number of finished instances, and $y_{jk}$ is the number of
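unfinished instances of $P_{jk}$ located in the buffer of $\rho(P_{jk})$. For the example, we have the following constraints:

$$\mathrm{NHC}^2_1 = \left\{ Z^2_1 = \sum_{P_{jk} \in \mathrm{NH}^2_1} (x_{jk} + y_{jk}) \le C_1,\quad Z^2_2 = \sum_{P_{jk} \in \mathrm{NH}^2_2} (x_{jk} + y_{jk}) \le C_2 \right\}$$

$$\mathrm{NHC}^9_1 = \left\{ Z^9_7 = \sum_{P_{jk} \in \mathrm{NH}^9_7} (x_{jk} + y_{jk}) \le C_7,\quad Z^9_8 = \sum_{P_{jk} \in \mathrm{NH}^9_8} (x_{jk} + y_{jk}) \le C_8,\quad Z^9_9 = \sum_{P_{jk} \in \mathrm{NH}^9_9} (x_{jk} + y_{jk}) \le C_9 \right\}.$$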
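As a rough illustration, the first set of neighborhood constraints for this example can be evaluated on a given state with a few lines of Python. This is only a sketch: the neighborhoods are those listed above, but the state representation (a dictionary from stage name to the pair of finished and unfinished counts), the function names, and any capacity values supplied by the caller are assumptions made for illustration, since the text does not give the capacities of the system in Fig. 16.

```python
# Neighborhoods of the Fig. 16 example, keyed by
# (unreliable resource, FD resource). Taken from the text above.
NEIGHBORHOODS = {
    (2, 1): ["P14", "P22", "P23", "P24", "P25", "P26"],
    (2, 2): ["P11", "P12", "P13", "P15", "P21", "P27"],
    (9, 7): ["P32"],
    (9, 8): ["P31"],
    (9, 9): ["P33", "P34", "P35"],
}


def neighborhood_loads(state):
    """Return Z for every neighborhood: the sum of x_jk + y_jk over its stages.

    `state` maps a stage name such as "P14" to a pair (x, y) of finished
    and unfinished instances; stages that are absent count as (0, 0).
    This representation is an assumption, not the paper's notation.
    """
    return {
        key: sum(sum(state.get(stage, (0, 0))) for stage in stages)
        for key, stages in NEIGHBORHOODS.items()
    }


def satisfies_first_constraint_set(state, capacity):
    """True if no neighborhood load exceeds the capacity of its FD resource.

    `capacity` maps a resource index to its buffer capacity (hypothetical
    values supplied by the caller).
    """
    loads = neighborhood_loads(state)
    return all(load <= capacity[fd] for (_, fd), load in loads.items())
```

For instance, with a hypothetical capacity of 2 at $r_1$, a state holding one $P_{14}$ and one $P_{22}$ is admitted, whereas adding a $P_{26}$ raises the load of the neighborhood of $r_1$ to 3 and the state is rejected.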
These constraints ensure that no neighborhood becomes overcapacitated. On their own, however, they can induce deadlock among FD resources, because if all neighborhoods are capacitated, parts cannot move from one neighborhood to another without overcapacitating a neighborhood. Thus, we develop an additional set of constraints (called $\mathrm{NHC}^i_2$) as follows. For $\mathrm{NHC}^i_2$, it is first necessary to compute the part flows among neighborhoods. For neighborhoods with mutual flow, we develop a constraint that allows only one of the neighborhoods to be filled at a time. For example, we see that $\mathrm{NH}^2_1$ and $\mathrm{NH}^2_2$ have mutual flows, because $\{P_{14}, P_{26}\} \subseteq \mathrm{NH}^2_1$ and $\{P_{13}, P_{27}\} \subseteq \mathrm{NH}^2_2$ ($P_{13}$ moves from $\mathrm{NH}^2_2$ to $\mathrm{NH}^2_1$ where it becomes $P_{14}$, and $P_{26}$ moves from $\mathrm{NH}^2_1$ to $\mathrm{NH}^2_2$ where it becomes $P_{27}$). Thus, we get the constraint $\mathrm{NHC}^2_2 = \{Z^2_1 + Z^2_2 < C_1 + C_2\}$, which guarantees that these two neighborhoods are not simultaneously capacitated. Note that there is no mutual flow between neighborhoods of FD resources for $r_9$, and thus, no additional constraints are generated for these. To summarize, NHC guarantees that no neighborhood is overcapacitated and that neighborhoods with mutual-flow dependencies are not simultaneously capacitated.

APPENDIX E
BA

BA is perhaps the most widely known DAP, and its underlying concepts have influenced the thinking of numerous researchers. BA is a suboptimal DAP in the sense that it achieves computational tractability by sacrificing some safe states. BA avoids deadlock by allowing an allocation only if the requesting processes can be ordered, so that the terminal resource needs for the $i$th process $P_i$ in the ordering can be met by pooling available resources and those released by completed processes $P_1, P_2, \ldots, P_{i-1}$. The ordering is essentially a sequence in which all processes in the system can complete successfully. BA has complexity $O(mn \log n)$, where $m$ is the number of resource types and $n$ is the number of requests.
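The ordering test at the heart of BA can be sketched as follows. This is the textbook form of the safety check, written in Python under stated assumptions: the function name and the list-based representation of available, allocated, and still-needed resources are hypothetical, and the simple repeated scan used here does not attain the O(mn log n) bound quoted above. It is not the modified algorithm of [26].

```python
def find_completion_order(available, allocation, need):
    """Search for an order in which every process can run to completion.

    available  : list with the free units of each resource type
    allocation : allocation[i][k] = units of type k currently held by process i
    need       : need[i][k] = further units of type k that process i requires
    Returns a list of process indices, or None if no such ordering exists.
    """
    work = list(available)              # pooled resources usable so far
    remaining = set(range(len(allocation)))
    order = []
    progressed = True
    while remaining and progressed:
        progressed = False
        for i in sorted(remaining):     # sorted() copies, so removal is safe
            if all(n <= w for n, w in zip(need[i], work)):
                # Process i can finish; it then releases everything it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                remaining.remove(i)
                order.append(i)
                progressed = True
    return order if not remaining else None


def allocation_is_admissible(available, allocation, need):
    """Admit a state only if some completion ordering exists."""
    return find_completion_order(available, allocation, need) is not None
```

A supervisor built on this test would tentatively apply a requested allocation, call `allocation_is_admissible` on the resulting state, and roll the allocation back if the test fails. States rejected this way may still be safe, which is the sense in which BA is suboptimal.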
For our purposes, we modify BA to search for an ordering of parts that advances FD parts (those requiring unreliable
Fig. 17. Counterflow system.
resources) into the resource of their current neighborhood and NFD parts (those not requiring unreliable resources) out of the system. Our modifications are straightforward (see [26] for the detailed algorithm). Again, the ordering is such that the resources required by the first part are all available, those required by the second part are all available after the first part has finished and released the resources it held, and so forth. If the system can be cleared in this way (all FD parts are advanced into FD resources, and all NFD parts are advanced out of the system), then we can guarantee that, if any unreliable resource fails, the system can continue producing parts that do not require the failed resource.

APPENDIX F
RO

RO is a suboptimal DAP based on the intuition that parts flowing in opposite directions through the same set of resources must at some point be able to pass [14]. RO constraints are given as follows:
$$\sum_{P_{ij} \in RU_u} z_{ij} + \sum_{P_{km} \in LU_v} z_{km} < C_u + C_v$$

$\forall r_u, r_v$ s.t. $h(r_u) < h(r_v)$ and $z_{ij} = x_{ij} + y_{ij}$. $RU_u$ ($LU_v$) represents the set of part-type stages on $r_u$ ($r_v$) that are flowing to the “right” (“left”), and $h$ represents an ordering of the resources (resources that are low in the order are “on the left,” and resources that are high in the order are “on the right”). Note that a constraint is generated for each pair of resources. This constraint sums the current number of the rightbound $P_{ij}$’s of the resource that is low in the order (to the left) and the current number of leftbound $P_{km}$’s of the resource that is high in the order (to the right) and ensures that this sum is always less than the combined capacity of the two resources. Applying this policy to the system in Fig. 17 will yield the following:

$RU_1 = \{P_{11}\}$, $LU_1 = \emptyset$
$RU_2 = \{P_{12}\}$, $LU_2 = \{P_{23}\}$
$RU_3 = \{P_{13}\}$, $LU_3 = \{P_{22}\}$
$RU_4 = \emptyset$, $LU_4 = \{P_{21}\}$.

1) $z_{11} + z_{23} < 3 + 4 = 7$ ($r_1$ and $r_2$).
2) $z_{11} + z_{22} < 3 + 3 = 6$ ($r_1$ and $r_3$).
3) $z_{11} + z_{21} < 3 + 2 = 5$ ($r_1$ and $r_4$).
4) $z_{12} + z_{22} < 4 + 3 = 7$ ($r_2$ and $r_3$).
5) $z_{12} + z_{21} < 4 + 2 = 6$ ($r_2$ and $r_4$).
6) $z_{13} + z_{21} < 3 + 2 = 5$ ($r_3$ and $r_4$).
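The six constraints above can be generated mechanically from the rightbound and leftbound stage sets. The Python sketch below does so for the counterflow example; the capacities (3, 4, 3, 2) are read off the right-hand sides of the listed constraints, while the function names, the dictionary-based state, and the choice of the resource index as the ordering $h$ are assumptions made for illustration.

```python
from itertools import combinations

# Data for the counterflow example. Capacities are inferred from the
# right-hand sides of constraints 1) to 6).
CAPACITY = {1: 3, 2: 4, 3: 3, 4: 2}
RIGHTBOUND = {1: ["P11"], 2: ["P12"], 3: ["P13"], 4: []}       # RU_u
LEFTBOUND = {1: [], 2: ["P23"], 3: ["P22"], 4: ["P21"]}        # LU_v


def ro_constraints():
    """Build one constraint per pair (r_u, r_v) with h(r_u) < h(r_v).

    The ordering h is assumed to coincide with the resource index.
    Each constraint is (u, v, rightbound stages on r_u,
    leftbound stages on r_v, strict upper bound C_u + C_v).
    """
    return [
        (u, v, RIGHTBOUND[u], LEFTBOUND[v], CAPACITY[u] + CAPACITY[v])
        for u, v in combinations(sorted(CAPACITY), 2)
    ]


def violated_pairs(z):
    """Return the resource pairs whose RO constraint fails in state z.

    `z` maps a stage name to z_ij = x_ij + y_ij; missing stages count as 0.
    """
    violated = []
    for u, v, right, left, bound in ro_constraints():
        total = sum(z.get(p, 0) for p in right) + sum(z.get(p, 0) for p in left)
        if total >= bound:              # the constraint is a strict inequality
            violated.append((u, v))
    return violated
```

As a hypothetical usage example, a state with three $P_{11}$'s at $r_1$ and two $P_{21}$'s at $r_4$ gives $z_{11} + z_{21} = 5$, so `violated_pairs({"P11": 3, "P21": 2})` reports the pair $(r_1, r_4)$, that is, constraint 3).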
Constraint 1), for example, ensures that the number of $P_{11}$’s at $r_1$ plus the number of $P_{23}$’s at $r_2$ is always less than the combined capacities of $r_1$ and $r_2$. These constraints will disallow states such as that in Fig. 17, which violates constraint 3). For complete details, the reader is referred to [14].

REFERENCES

[1] Z. Banaszak and E. Roszkowska, “Deadlock avoidance in pipeline concurrent processes,” Podst. Ster. (Foundations of Control), vol. 18, no. 1, pp. 3–17, 1988.
[2] Z. Banaszak and B. Krogh, “Deadlock avoidance in flexible manufacturing systems with concurrently competing process flows,” IEEE Trans. Robot. Autom., vol. 6, no. 6, pp. 724–734, Dec. 1990.
[3] N. Viswanadham, Y. Narahari, and T. Johnson, “Deadlock prevention and deadlock avoidance in flexible manufacturing systems using Petri net models,” IEEE Trans. Robot. Autom., vol. 6, no. 6, pp. 713–723, Dec. 1990.
[4] R. Wysk, N. Yang, and S. Joshi, “Detection of deadlocks in flexible manufacturing cells,” IEEE Trans. Robot. Autom., vol. 7, no. 6, pp. 853–859, Dec. 1991.
[5] Y. Leung and G. Sheen, “Resolving deadlocks in flexible manufacturing cells,” J. Manuf. Syst., vol. 12, no. 4, pp. 291–304, 1993.
[6] F. Hsieh and S. Chang, “Dispatching-driven deadlock avoidance controller synthesis for flexible manufacturing systems,” IEEE Trans. Robot. Autom., vol. 10, no. 2, pp. 196–209, Apr. 1994.
[7] J. Ezpeleta, J. Colom, and J. Martinez, “A Petri net based deadlock prevention policy for flexible manufacturing systems,” IEEE Trans. Robot. Autom., vol. 11, no. 2, pp. 173–185, Apr. 1995.
[8] K. Xing, B. Hu, and H. Chen, “Deadlock avoidance policy for Petri-net modeling of flexible manufacturing systems with shared resources,” IEEE Trans. Autom. Control, vol. 41, no. 2, pp. 289–296, Feb. 1996.
[9] M. Fanti, B. Maione, S. Mascolo, and B. Turchiano, “Event-based feedback control for deadlock avoidance in flexible production systems,” IEEE Trans. Robot. Autom., vol. 13, no. 3, pp. 347–734, Jun. 1997.
[10] M. Lawley, S. Reveliotis, and P. Ferreira, “FMS structural control and the neighborhood policy—Part 1: Correctness and scalability,” IIE Trans., vol. 29, no. 10, pp. 877–899, 1997.
[11] M. Lawley, S. Reveliotis, and P. Ferreira, “FMS structural control and the neighborhood policy—Part 2: Generalization, optimization, and efficiency,” IIE Trans., vol. 29, no. 10, pp. 889–899, 1997.
[12] S. Reveliotis, M. Lawley, and P. Ferreira, “Polynomial-complexity deadlock avoidance policies for sequential resource allocation systems,” IEEE Trans. Autom. Control, vol. 42, no. 10, pp. 1344–1357, Oct. 1997.
[13] M. Lawley, S. Reveliotis, and P. Ferreira, “The application and evaluation of Banker’s algorithm for deadlock-free buffer space allocation in flexible manufacturing systems,” Int. J. Flexible Manuf. Syst., vol. 10, no. 1, pp. 73–100, Feb. 1998.
[14] M. Lawley, S. Reveliotis, and P. Ferreira, “A correct and scalable deadlock avoidance policy for flexible manufacturing systems,” IEEE Trans. Robot. Autom., vol. 14, no. 5, pp. 796–809, Oct. 1998.
[15] M. Lawley and S. Reveliotis, “Deadlock avoidance for sequential resource allocation systems: Hard and easy cases,” Int. J. Flexible Manuf. Syst., vol. 13, no. 4, pp. 385–404, Oct. 2001.
[16] M. Lawley and W. Sulistyono, “Robust supervisory control policies for manufacturing systems with unreliable resources,” IEEE Trans. Robot. Autom., vol. 18, no. 3, pp. 346–359, Jun. 2002.
[17] S. Reveliotis, “Accommodating FMS operational contingencies through routing flexibility,” IEEE Trans. Robot. Autom., vol. 15, no. 1, pp. 3–19, Feb. 1999.
[18] S. Park and J. Lim, “Fault-tolerant robust supervisor for discrete event systems with model uncertainty and its application to a workcell,” IEEE Trans. Robot. Autom., vol. 15, no. 2, pp. 386–391, Apr. 1999.
[19] F. Hsieh, “Reconfigurable fault tolerant deadlock avoidance controller synthesis for assembly production processes,” in Proc. IEEE Conf. Man, Syst. Cybern., Nashville, TN, 2000, pp. 3045–3050.
[20] F. Hsieh, “Fault-tolerant deadlock avoidance algorithm for assembly processes,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 34, no. 1, pp. 65–79, Jan. 2004.
[21] F. Hsieh, “Robustness of deadlock avoidance algorithms for sequential processes,” Automatica, vol. 39, no. 10, pp. 1695–1706, Oct. 2003.
[22] F. Hsieh, “Fault tolerant liveness analysis for a class of Petri nets,” in Proc. IEEE Int. Conf. Control Appl., Istanbul, Turkey, 2003, pp. 1046–1051.
[23] F. Hsieh, “Analysis of a class of controlled Petri net based on structural decomposition,” in Proc. 10th IFAC/IFORS/IMACS/IFIP Symp. Large Scale Syst.: Theory Appl., Jul. 2004, pp. 51–57.
[24] F. Hsieh, “Robustness of a class of controlled Petri nets,” in Proc. 36th Southeastern Symp. Syst. Theory, 2004, pp. 92–96.
[25] M. Lawley, “Control of deadlock and blocking in production systems with unreliable workstations,” Int. J. Prod. Res., vol. 40, no. 17, pp. 4563–4582, 2002.
[26] S. Chew and M. Lawley, “Robust supervisory control for production systems with multiple resource failures,” IEEE Trans. Autom. Sci. Eng., vol. 3, no. 3, pp. 309–323, Jul. 2006.
Shengyong Wang received the B.S. degree in mechanical engineering from the Beijing University of Aeronautics and Astronautics, Beijing, China, the M.S. degree in innovation in manufacturing system and technology from the Singapore–Massachusetts Institute of Technology Alliance, Singapore, and the Ph.D. degree in industrial engineering from Purdue University, West Lafayette, IN, in 2000, 2001, and 2006, respectively. He is currently a Research Assistant Professor with the Department of Systems Science and Industrial Engineering, State University of New York, Binghamton. His research interests include healthcare engineering, discrete-event systems, modeling and simulation, production system analysis, and robust supervisory control. Dr. Wang is a member of the Institute for Operations Research and the Management Sciences.
Song Foh Chew received the B.S. degree in mathematics with a minor in physics from Bemidji State University, Bemidji, MN, and the M.S. and Ph.D. degrees in industrial engineering from Purdue University, West Lafayette, IN, in 1992, 1997, and 2005, respectively. He is currently an Assistant Professor of Operations Research with the Department of Mathematics and Statistics, Southern Illinois University Edwardsville, Edwardsville. His research interests are primarily in the areas of deadlock avoidance and robust supervisory control of resource allocation systems.
Mark A. Lawley received the Ph.D. degree in mechanical engineering from the University of Illinois, Urbana–Champaign in 1995. He is currently an Associate Professor with the Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN. Before joining the Weldon School of Biomedical Engineering in 2007, for nine years, he served as an Assistant and Associate Professor of industrial engineering, also at Purdue University, and held engineering positions with Westinghouse Electric Corporation, Emerson Electric Company, and the Bevill Center for Advanced Manufacturing Technology. As a researcher in academics, he has authored over 60 technical papers. He is particularly interested in developing optimal decision policies for system configuration and resource allocation in large healthcare systems. As a Regenstrief Scholar, he has focused on research initiatives with Wishard Hospital, Regenstrief Institute of Indianapolis, Richard L. Roudebush Veterans Administration Medical Center, Ascension Health, and St. Vincent Hospitals. His research has been supported by the National Science Foundation, Union Pacific Railroads, Consilium Software, General Motors, Ascension Health, Indiana State Department of Health, Regenstrief Foundation, St. Vincent Ministry, and many others. Dr. Lawley has won two best paper awards for his work in the control of flexible automation. In January 2005, he was appointed Regenstrief Faculty Scholar in support of Purdue University’s Regenstrief Center for Health Care Engineering.