
IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, VOL. 18, NO. 3, JUNE 2002

Robust Supervisory Control Policies for Manufacturing Systems With Unreliable Resources Mark A. Lawley, Member, IEEE, and Widodo Sulistyono

Abstract—Manufacturing practitioners and researchers have recognized the need to develop effective supervisory controllers for automated manufacturing systems. One active area of research has been developing deadlock avoidance methods for these systems. Almost all work to date has assumed that allocated resources do not fail. In this paper, we consider deadlock and blocking problems in systems with unreliable resources. Our objective is to develop supervisory control policies that allocate resources so that the failure of any given resource does not propagate through blocking to effectively stall other portions of the system. In other words, when a resource fails, we want the system to automatically continue producing all part types that do not require the failed resource. To accomplish this, the supervisory controller must ensure safety for the original system while avoiding states that do not serve as feasible initial states for the reduced system when an unreliable resource fails. The policy must then ensure safety for the reduced system while avoiding states that do not serve as feasible initial states for the original system so that transition to normal operation is smooth when the failed resource is restored. This paper illustrates this class of problems through several examples, identifies properties that controllers must satisfy to deal effectively with these problems, and develops two polynomial control policies that satisfy these properties. Index Terms—Deadlock avoidance, fault-tolerant, flexible manufacturing systems, robust control.

I. INTRODUCTION

Over the past decade, manufacturing practitioners and researchers have recognized the need to develop effective supervisory controllers for automated manufacturing systems. One important and active area of research has been in developing control logic that guarantees deadlock-free operation for these systems. Tremendous progress has been made in understanding both the computational and structural aspects of the deadlock problem for a variety of manufacturing system classifications. Researchers have addressed deadlocking problems in systems with no routing flexibility, in systems with flexible routing through machine and sequence flexibility, in systems with and without resource redundancy, and in those with conjunctive resource allocation needs.

Manuscript received March 26, 2001; revised January 22, 2002. This paper was recommended for publication by Associate Editor M. Jeng and Editor N. Viswanadham upon evaluation of the reviewers’ comments. This work was supported in part by the National Science Foundation under Grant NSF GOALI 0085047. The authors are with the School of Industrial Engineering, Purdue University, West Lafayette, IN 47907-1287 USA. Publisher Item Identifier S 1042-296X(02)06335-8.

One important area where little manufacturing control work has been done is in developing systems of control logic that robustly handle failure of components. Currently, in the majority of automated manufacturing systems, the failure of a single device such as a machine tool or even an individual sensor can cause the whole system to shut down. This is a major problem that the manufacturing research community must begin addressing, especially now that we have a solid foundation of deadlock avoidance research upon which to build.

In this paper, we investigate resource allocation in manufacturing systems with unreliable resources. Resource failure could be caused by damaged tooling, quality out-of-control signals, missing or defective part programs or tools, failure of actuating and sensing systems, and so on. Our objective is to develop supervisory control policies that allocate resources so that the failure of any given resource does not propagate through blocking to effectively stall other portions of the system. In other words, when a resource fails, we want the system to automatically continue producing all part types that do not require that failed resource. Furthermore, when the failed resource is repaired, we want to resume producing the full range of part types supported by the system. This should all occur with a minimum amount of disruption, i.e., with a minimal amount of part shuffling and transport.

The paper is organized as follows. Section II provides the literature review and broadly discusses the unique contributions of this paper. Section III formally defines the properties that we want our controllers to exhibit and provides a taxonomy that classifies manufacturing system control problems with respect to robustness and redundancy. Section IV develops a robust supervisor using a suboptimal deadlock avoidance policy (DAP). Section V develops a robust supervisor using the optimal DAP and discusses the use of a central buffer. Section VI then concludes by discussing future research directions.

II. LITERATURE REVIEW

In this section, we review relevant research on deadlock avoidance and tolerance to resource failure in automated manufacturing systems. System deadlocks were first addressed by computer scientists responsible for developing resource allocation logic in computer operating systems (prominent among them are [1]–[5]). These researchers developed the idea of a resource allocation system (RAS) as being composed of a finite set of resources, either

1042-296X/02$17.00 © 2002 IEEE

LAWLEY AND SULISTYONO: ROBUST SUPERVISORY CONTROL POLICIES FOR MANUFACTURING SYSTEMS WITH UNRELIABLE RESOURCES

reusable or consumable, a set of “processes” that request and use these resources and a “banker” responsible for allocating the resources to the processes so that every process is eventually satisfied. The contributions of this early research include: 1) defining the concepts of “safety” (is deadlock avoidable?) and “safe sequence” (how can resources be allocated to avoid deadlock?) and developing algorithms that search for safe sequences [1], [2]; 2) using digraph-based formalisms to represent resource allocation, identifying necessary and sufficient conditions for deadlock and developing correct detection algorithms for certain RAS classes [3]; and 3) establishing the computational complexity of safety for certain RAS classes [4], [5]. In this literature, the safety problem is in general NP-complete, while the deadlock detection problem is of polynomial complexity. Thus, while an operating system could reasonably be expected to detect deadlock, computing safety in real time is not generally feasible. In the late 1980s, manufacturing researchers began to recognize that deadlock and safety were important problems in automated manufacturing (among the first were [6]–[9]). Manufacturing researchers possess two advantages that the computer scientists did not have. First, a part’s future resource needs (machines remaining to be visited, tools required, etc.) are well documented in the part’s process plan, and second, resource requirements in manufacturing are highly sequential. (The safety problem remains NP-complete, while deadlock detection is of polynomial complexity, [10].) By combining these two advantages with the methodologies developed in the computing research, manufacturing researchers were able to make many additional contributions. First, many researchers developed suboptimal deadlock avoidance policies, that is, deadlock avoidance policies that reject some safe resource allocations in order to achieve tractable computation [11]–[19]. 
Secondly, these researchers began to explicitly analyze and reason about resource allocation states that can be classified as neither safe nor deadlock, i.e., states that contain no deadlock, but from which deadlock cannot be avoided (prominent among this work is [20]–[23]). The analysis of these states allowed the identification of many special structures that render the safety problem polynomial (for a review of these special structures, see [10]). Other work investigates the relationship between flexible routing capabilities and the deadlock avoidance problem in manufacturing systems [24]–[26]. Although the study of systems with unreliable components has existed in computing for many years (see, for example, [27]–[31]), relatively little work exists with regard to manufacturing systems. Results from computing literature are typically not directly applicable to manufacturing systems because redundancy is not easy or inexpensive to achieve in a manufacturing system and the “jobs” that manufacturing systems process are tangible entities that occupy space, represent wealth, and require time consuming physical transport. Thus, we think the control implications of resource failure in manufacturing are best investigated by building on the deadlock avoidance work discussed in the previous paragraphs and that is what we do in this paper. Traditionally, manufacturing researchers have considered only the performance implications of resource failure [32]–[35], while only a few papers address the control implications. [36]


addresses the issue of part blocking in the presence of contingencies due to resource breakdown or introduction of expedient jobs. The author combines flexible routing with the deadlock avoidance policies developed in [17]–[19] to accommodate operational contingencies. This approach assumes that each part follows an assigned route until a failure occurs. At that point, a route reassignment is computed and some set of parts is removed from the system in order to continue operation. The work in [37] addresses fault-tolerant supervisory control by deriving necessary and sufficient conditions for the existence of a fault-tolerant supervisor. These conditions are stated in terms of language theoretic properties and are computationally intensive to compute. Finally, [38] develops a fault-tolerant controller for assembly processes using the Petri net formalism. This approach uses the concept of “minimal resource requirement” to determine acceptable control actions. That is, the firing of a transition is allowed only if the resultant marking has a reachable marking that covers the minimal resource requirements of the processes in the net. The author models resource failure as decrements in the net marking and develops bounds on marking changes for which the Petri net system model remains live. Our work differs from [36] in that we do not assume “outside” capacity to be sufficient to remove any number of parts from the system. In other words, our controllers take capacity outside the system (central buffer) into account in making allocation decisions. We differ from [37] in that we focus on developing robust polynomial control policies and not on establishing conditions for their existence. Finally, this work differs from [38] in that we accept that the failure of a resource will prevent our system from achieving its full range of production. 
Our objective is to control the system so that if a resource fails the system can continue to produce part types not requiring that resource. This does not imply that a Petri net model of the system would be live under failure. Section III formally defines the problems that we address in this paper. Section IV then develops a robust supervisor based on a suboptimal deadlock avoidance policy, while Section V develops a robust supervisor based on an optimal deadlock avoidance policy. Section VI concludes and discusses future work.

III. ROBUST BUFFER SPACE ALLOCATION

In order to clearly and formally define the property that we want our controllers to guarantee, we first present a discrete event model of relevant system components and their logical operations. For more information on these models, the reader is referred to [39]. The system is defined as a tuple $S$. $R$ is the set of resource types in the system, with $R = R^R \cup R^U$, $R^R \cap R^U = \emptyset$, where $R^R$ is the set of resource types not subject to “failure” and $R^U$ is the set of resource types subject to failure. Let $C: R \to \mathbb{Z}^+$ (natural numbers) be a function that returns the capacity of a resource, that is, $C_i$ is the number of identical units of $r_i \in R$. $P$ is the set of part types produced with each $P_j \in P$ represented as an ordered set of processing stages, $P_j = \langle P_{j1}, \ldots, P_{j|P_j|} \rangle$, where $P_{jk}$ represents the $k$th



Fig. 1. Example resource structure.

processing stage of $P_j$. We will use $p_{jk}$ to represent an actual instance of $P_{jk}$. Let $\rho$ be a function from part type stages to $R$ such that $\rho(P_{jk})$ returns the resource required by $P_{jk}$. Thus, the route of $P_j$ is $\langle \rho(P_{j1}), \ldots, \rho(P_{j|P_j|}) \rangle$. Let $\Omega_i = \{P_{jk} : \rho(P_{jk}) = r_i\}$ be the set of part type stages supported by $r_i$.

We will suppose our resource types to be workstations with buffer space for staging and storing parts and a processor or server for operating on parts. We will use the standard assumption from queuing theory that the server is not idle so long as there are unfinished parts in a workstation’s buffer space. The resource units that we are concerned with allocating are instances of the workstation’s buffer space. The controllers that we design are not intended to allocate the server among parts waiting at the workstation. We assume this to be done by some local queue discipline.

Workstation “failure” will imply the failure of the workstation’s server, not any of its buffer space. We will assume that, when the server of a workstation fails, we can continue to allocate its buffer space up to capacity, but that none of the waiting parts can be processed and thus cannot proceed along their respective routes until the server is repaired. We further assume that server failure does not prevent finished parts occupying the workstation’s buffer space from being moved away from the workstation and proceeding along their respective routes. Finally, we assume that server failure does not damage or destroy the part being processed and that failure can only occur when the server is working.

For example, consider two of the resources in Fig. 1, each having three units of buffer capacity and a single server. Suppose that one part is finished at the first resource and next needs to visit the second, while another part is finished at the second resource and next needs to visit the first. Further, suppose that a third part is waiting to be processed at one of these workstations while its server is busy with a fourth part. If that server fails in this state, then both finished parts are still allowed to advance to their next workstations, although a part that advances to the failed workstation cannot be processed until the server is repaired. We are now in a position to define states and events.
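The static part of this model (resource capacities, the set of unreliable resources, and part routes) can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `System`, its field names, and the two-resource example data are assumptions made here and do not come from the paper or from Fig. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    capacity: dict         # resource name -> number of buffer units
    unreliable: frozenset  # names of resources whose server may fail
    routes: dict           # part type -> ordered list of required resources

    def resource_of(self, part_type, stage):
        """Resource required by stage `stage` (1-based) of `part_type`."""
        return self.routes[part_type][stage - 1]

    def stages_supported_by(self, resource):
        """All (part_type, stage) pairs that are processed on `resource`."""
        return {(j, k)
                for j, route in self.routes.items()
                for k, r in enumerate(route, start=1) if r == resource}

    def requires(self, part_type, resource):
        """True if the route of `part_type` visits `resource`."""
        return resource in self.routes[part_type]


# Hypothetical example data (not the system of Fig. 1).
example = System(
    capacity={"r1": 3, "r2": 3},
    unreliable=frozenset({"r2"}),
    routes={"P1": ["r1", "r2"], "P2": ["r2", "r1"], "P3": ["r1"]},
)
```

Keeping routes as ordered lists makes the sequential nature of resource requirements explicit, which is the structural property the deadlock avoidance policies discussed later rely on.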
Let $Q$ represent the set of system states, where each $q \in Q$ is a vector $q = \langle sv_i, x_{jk}, y_{jk} \rangle$, $sv_i$ being the status of the server of an unreliable workstation $r_i \in R^U$ (0 if failed, 1 if operational), $x_{jk}$ being the number of unfinished units of $P_{jk}$ (parts waiting and in-process) located in the buffer space of $\rho(P_{jk})$, and $y_{jk}$ being the number of finished units of $P_{jk}$ located in the buffer space of $\rho(P_{jk})$. $Q_0 \subseteq Q$ is the set of initial states with $q_0 \in Q_0$ being the state in which no resources are allocated and all servers are operational. The dimension of $q$ is $|R^U| + 2 \sum_j |P_j|$.

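$\Sigma = \Sigma_c \cup \Sigma_u$, where $\Sigma_c = \{\alpha_{jk}\}$ is the set of controllable events with $\alpha_{jk}$ representing the allocation of $\rho(P_{jk})$ to an instance of $P_{j,k-1}$, that is, $\alpha_{jk}$ is the event that an instance of a part of type $P_j$ advances into the buffer space of a workstation that will perform its $k$th operation. $\alpha_{j,|P_j|+1}$ represents a finished part of type $P_j$ leaving the system. We assume that the system controls the occurrence of these events through resource allocation decisions. $\Sigma_u = \{\beta_{jk}\} \cup \{\kappa_i, \eta_i\}$ is the set of uncontrollable events where $\beta_{jk}$ represents the completion of service for part stage $P_{jk}$, and $\kappa_i$ and $\eta_i$ represent the failure ($\kappa_i$) and repair ($\eta_i$) of the server of unreliable resource $r_i$. Service completions, failures and repairs are assumed to be beyond the controller’s influence.

Let $\xi: Q \to 2^{\Sigma}$ be a function that, for a given state, returns the set of enabled events. This function is defined for a state, $q$, as follows:

1) For $P_j \in P$ and $r_i = \rho(P_{j1})$, if $\sum_{P_{mn} \in \Omega_i} (x_{mn} + y_{mn}) < C_i$, then $\alpha_{j1} \in \xi(q)$. Events that release new parts into the system are enabled when space is available on the first required workstation in the route.

2) For $P_{jk}$ and $r_i = \rho(P_{jk})$, if $x_{jk} > 0$ and $sv_i = 1$ (the server of a reliable workstation is always operational), then $\beta_{jk} \in \xi(q)$. If unfinished parts are in the buffer space of a workstation and the server is working, then the corresponding completion events are enabled.

3) For $r_i \in R^U$ and $P_{jk} \in \Omega_i$, if $x_{jk} > 0$ and $sv_i = 1$, then $\kappa_i \in \xi(q)$. If unfinished parts are in the buffer space of an unreliable machine and the server is working, then the corresponding failure event is enabled.

4) For $r_i \in R^U$, if $sv_i = 0$, then $\eta_i \in \xi(q)$ and $\beta_{jk} \notin \xi(q)$ for all $P_{jk} \in \Omega_i$. If the server of a machine is failed, the repair event is enabled and completion events corresponding to that machine are not.

5) For $P_{jk}$ and $r_i = \rho(P_{j,k+1})$, for $k < |P_j|$, if $y_{jk} > 0$ and $\sum_{P_{mn} \in \Omega_i} (x_{mn} + y_{mn}) < C_i$, then $\alpha_{j,k+1} \in \xi(q)$. When a part finishes its current operation and space becomes available on its next required machine, the event that advances that part is enabled.

6) If $y_{j|P_j|} > 0$, then $\alpha_{j,|P_j|+1} \in \xi(q)$. If a part has completely finished all of its operations, the event that unloads it from the system is enabled.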
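The six enabling rules can be read as a single function from a state to a set of events. The following Python sketch is one way to express them, under several assumptions made here for illustration: events are encoded as tuples such as `("alpha", part_type, stage)`, the state is split into three dictionaries (`sv` for server status, `x` for unfinished counts, `y` for finished counts), and servers of reliable resources are treated as always operational.

```python
def occupancy(routes, counts, resource):
    """Sum counts[(j, k)] over the stages (j, k) processed on `resource`."""
    return sum(counts.get((j, k), 0)
               for j, route in routes.items()
               for k, r in enumerate(route, start=1) if r == resource)


def enabled_events(capacity, unreliable, routes, sv, x, y):
    """Return the set of events enabled in the state (sv, x, y)."""
    free = {r: capacity[r] - occupancy(routes, x, r) - occupancy(routes, y, r)
            for r in capacity}
    events = set()
    for j, route in routes.items():
        last = len(route)
        if free[route[0]] > 0:                               # rule 1: release
            events.add(("alpha", j, 1))
        for k, r in enumerate(route, start=1):
            # rules 2 and 4: completion needs unfinished work and a live server
            if x.get((j, k), 0) > 0 and sv.get(r, 1) == 1:
                events.add(("beta", j, k))
            if y.get((j, k), 0) > 0:
                if k == last:                                # rule 6: unload
                    events.add(("alpha", j, last + 1))
                elif free[route[k]] > 0:                     # rule 5: advance
                    events.add(("alpha", j, k + 1))
    for r in unreliable:
        if sv[r] == 0:                                       # rule 4: repair
            events.add(("eta", r))
        elif occupancy(routes, x, r) > 0:                    # rule 3: failure
            events.add(("kappa", r))
    return events


# Hypothetical two-resource system: r2 is unreliable, both buffers are full.
capacity = {"r1": 1, "r2": 1}
routes = {"P1": ["r1", "r2"], "P2": ["r1"]}
x = {("P1", 2): 1}   # one unfinished P1 part at its second stage (on r2)
y = {("P1", 1): 1}   # one finished P1 part at its first stage (on r1)
server_up = enabled_events(capacity, {"r2"}, routes, {"r2": 1}, x, y)
server_down = enabled_events(capacity, {"r2"}, routes, {"r2": 0}, x, y)
```

In the example, the finished part on `r1` cannot advance in either state because the buffer of `r2` is full, so only the uncontrollable events on `r2` are enabled.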


Fig. 2. Failure propagation in a simple system. (a) Blocking propagation. (b) Infeasible initial state.

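Let $\delta: Q \times \Sigma \to Q$ be the state transition function defined as follows:

$$\delta(q, \sigma) = \begin{cases} q - e_{y_{j,k-1}} + e_{x_{jk}}, & \sigma = \alpha_{jk} \text{ (advancing a part)} \\ q - e_{x_{jk}} + e_{y_{jk}}, & \sigma = \beta_{jk} \text{ (service completion)} \\ q - e_{sv_i}, & \sigma = \kappa_i \text{ (failure of server)} \\ q + e_{sv_i}, & \sigma = \eta_i \text{ (repair of server)} \end{cases}$$

where $e_{x_{jk}}$, $e_{y_{jk}}$, and $e_{sv_i}$ are the standard unit vectors with the components corresponding to $x_{jk}$, $y_{jk}$, and $sv_i$ equal to one, respectively. Note that $e_{y_{j0}} = e_{x_{j,|P_j|+1}} = 0$, the null vector.

When resource $r_i$ fails, the uncontrollable event $\kappa_i$ occurs and the service completion event $\beta_{jk}$ is disabled for all $P_{jk} \in \Omega_i$. This effectively bounds the occurrence of the events of every part type that requires $r_i$ until the uncontrollable repair event, $\eta_i$, occurs. Our objective is to design resource allocation logic so that when $r_i$ fails the set of bounded events is exactly this set. For example, the small system in Fig. 2 produces three part types, each with its own route through the resources shown in the figure. All four resources have one unit of buffer space and one server, and one of them is unreliable.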

Now suppose that the failure event occurs in Fig. 2(a) as the server of the unreliable resource is processing a part. Then the failure-bounded events will be those of the part types that require the failed resource, that is, these events can occur at most a finite number of times before the repair event occurs. Note, however, that for this state other events are also affected. None of the events of the remaining part types can occur, because parts of one type are blocked by a unit that is waiting for its next resource, while that unit is in turn blocked by a unit waiting for the failed resource. In fact, the failure has effectively stalled the system through the propagation of blocking. The unit waiting for the failed resource cannot be removed since this particular system has no central buffering capacity. (In this work, we will develop resource allocation methods for both systems with and without central buffers. In addressing central buffer systems, we will assume that, like workstations, they accommodate a finite number of parts. This will be discussed more thoroughly later in the paper.)

Fig. 2(b) illustrates an additional requirement for our controller. Suppose the system is in the given state when the server of the unreliable resource fails. In this case, the system can continue to produce the part type that does not require the failed resource by first advancing one part to its next resource. Once this is accomplished, the bounded events are exactly those of the part types that require the failed resource, that is, the system can continue to produce the unaffected part type while the server is being repaired. However, when the repair event occurs, the remaining parts are deadlocked and production of their part types cannot resume. Thus, we add the requirement that in controlling the system between failure and repair events our controller must not allow states that stall the original system after the repair event occurs.

With these motivating examples, we are prepared to formally develop the desired properties of our resource allocation policies. Let $L(S)$ represent the uncontrolled event language of the system and for $\sigma \in \Sigma$ and $s \in L(S)$ let $sc_{\sigma}(s)$ be the score (number of occurrences) of event $\sigma$ in $s$. We extend the state transition function to strings of enabled events in the usual way, that is, $\delta(q, s\sigma) = \delta(\delta(q, s), \sigma)$ for $\sigma \in \xi(\delta(q, s))$. We define a supervisory controller, $\Delta$, as follows: for $\sigma \in \Sigma_c \cap \xi(q)$ and state $q$, let $\Delta(q, \sigma) \in \{0, 1\}$, where $\Delta(q, \sigma) = 1$ means $\sigma$ is admissible and $\Delta(q, \sigma) = 0$ means $\sigma$ is inadmissible. Thus, $\Delta$ determines which of the enabled controllable events (resource allocations) is admissible and which is not. The admissible events may be executed whereas the inadmissible events must not be executed.

Let $L_0(S)$ represent the uncontrolled language of $S$ given that resource failures do not occur. Let $L_0(S, \Delta)$ represent the controlled language of $S$ under $\Delta$ given that resource failures do not occur. Let $Q_{\Delta} = \{q : \delta(q_0, s) = q$ for some $s \in L_0(S, \Delta)\}$ (we assume that the system initially starts in $q_0$).
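A supervisory controller of this kind can be pictured as a filter over the enabled events: uncontrollable events always pass, while a controllable event passes only if the supervisor admits it. The sketch below assumes, for illustration only, that admissibility is decided by testing the successor state with a predicate `accepts`, that controllable events are tuples tagged `"alpha"`, and that the state is a single integer counting parts in one buffer. None of these encodings comes from the paper.

```python
from functools import reduce


def run(state, events, transition):
    """Extend a one-step transition function to a sequence of events."""
    return reduce(transition, events, state)


def admissible_events(state, enabled, transition, accepts):
    """Enabled events that the supervisor allows to occur in `state`."""
    allowed = set()
    for event in enabled:
        controllable = event[0] == "alpha"
        # Uncontrollable events cannot be blocked by the supervisor.
        if not controllable or accepts(transition(state, event)):
            allowed.add(event)
    return allowed


# Toy model (hypothetical): the state counts parts held in a single buffer.
def toy_transition(state, event):
    return state + 1 if event == ("alpha", "load") else state


def toy_accepts(state):
    return state <= 2          # the policy admits at most two parts


toy_enabled = {("alpha", "load"), ("beta", "done")}
at_one = admissible_events(1, toy_enabled, toy_transition, toy_accepts)
at_two = admissible_events(2, toy_enabled, toy_transition, toy_accepts)
```

Separating the admission predicate from the event filter keeps the structure of the supervisor fixed while the policy itself (for instance, a deadlock avoidance test) is swapped in as `accepts`.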


The first required property of the supervisory controller is that it keeps the system deadlock-free in the absence of resource failure, i.e., that it keeps the system safe. In terms of strings and events, we express this as follows: assuming no resource failure, $\Delta$ must guarantee that $\forall s \in L_0(S, \Delta)$ (the failure-free controlled language) and $n \in \mathbb{Z}^+$ (natural numbers), $\exists s' \in L_0(S, \Delta)$ such that $s$ is a prefix of $s'$ and $\forall \alpha_{jk} \in \Sigma_c$, $sc_{\alpha_{jk}}(s') \ge n$. This basically states that, in the absence of resource failures, the system can continue to produce all of its part types indefinitely.

Now suppose that the system has executed event sequence $s \in L_0(S, \Delta)$, that the system is in state $q = \delta(q_0, s)$, and that the server of $r_i \in R^U$ is busy in this state. If we append a failure event, $\kappa_i$, onto $s$, we get event sequence $s\kappa_i$ and state $q' = \delta(q, \kappa_i)$. The event set that $S$ can generate is now reconfigured; in fact, we will say that we have a modified events generator $S'$ that bounds the occurrence of certain events, at least those of the part types that require $r_i$. Furthermore, $S'$ must start in initial state $q'$, and in fact the set of initial states for $S'$ can be defined as $Q'_0 = \{\delta(q, \kappa_i) : q = \delta(q_0, s)$ for some $s \in L_0(S, \Delta)$ and $q$ enables $\kappa_i\}$. Let $L(S', \Delta)$ be the controlled language of $S'$ assuming no further failure or repair event and let $Q'_{\Delta} = \{q :$ there exist $q'_0 \in Q'_0$ and $s \in L(S', \Delta)$ such that $\delta(q'_0, s) = q\}$.

$\Delta$ must now keep $S'$ safe and running while the failed resource is being repaired. That is, assuming no further resource failure or repair, $\Delta$ must guarantee that $\forall s \in L(S', \Delta)$ and $n \in \mathbb{Z}^+$, $\exists s' \in L(S', \Delta)$ such that $s$ is a prefix of $s'$ and, $\forall \alpha_{jk} \in \Sigma_c$ whose part type $P_j$ does not require $r_i$, $sc_{\alpha_{jk}}(s') \ge n$. This basically states that in the absence of additional resource failures or repairs, the system can continue to produce all part types not requiring $r_i$. This further implies that, while supervising $S$, $\Delta$ must constrain $S$ to feasible initial states for $S'$, that is, initial states of $S'$ for which deadlock-free operation is possible.

Now suppose that the system has executed event sequence $s\kappa_i s'$, where $s \in L_0(S, \Delta)$ and $s' \in L(S', \Delta)$, and the uncontrollable repair event, $\eta_i$, occurs. Then, we have event sequence $s\kappa_i s'\eta_i$ and state $q'' = \delta(q_0, s\kappa_i s'\eta_i)$. The event set that $S$ can generate has now been restored and $\Delta$ must once again supervise $S$, this time starting in initial state $q''$. This implies that, in supervising $S'$, $\Delta$ must constrain $S'$ to feasible initial states for $S$, that is, initial states of $S$ from which deadlock-free operation is possible. The following definition is now possible.

Definition 1: Supervisory controller $\Delta$ is operationally robust to the failure of resource, $r_i$, in system $S$ if:

1) $\forall s \in L_0(S, \Delta)$ and $n \in \mathbb{Z}^+$, $\exists s' \in L_0(S, \Delta)$ such that $s$ is a prefix of $s'$ and $\forall \alpha_{jk} \in \Sigma_c$, $sc_{\alpha_{jk}}(s') \ge n$.

2) For every reachable state $q$ that enables $\kappa_i$, the state $\delta(q, \kappa_i)$ serves as a feasible initial state for $S'$.

3) $\forall s \in L(S', \Delta)$ and $n \in \mathbb{Z}^+$, $\exists s' \in L(S', \Delta)$ such that $s$ is a prefix of $s'$ and, $\forall \alpha_{jk} \in \Sigma_c$ whose part type $P_j$ does not require $r_i$, $sc_{\alpha_{jk}}(s') \ge n$.

4) For every $q \in Q'_{\Delta}$, the state $\delta(q, \eta_i)$ serves as a feasible initial state for $S$.

To summarize: 1) guarantees that the system remains deadlock-free if there are no resource failures; 2) guarantees that the system visits only states for which continuing operation is possible in the event of the failure of $r_i$; 3) guarantees deadlock-free operation for the system while $r_i$ is failed; and 4) guarantees that while $r_i$ is failed the system will visit only states for which deadlock-free operation is possible in the event of repair. The objective of this work is to develop supervisory control policies that satisfy these properties and thus maintain system operation in the face of resource failures without manual intervention to clear and reset the system.

Many degrees of operational robustness are possible. A specific supervisory controller may be robust to the failure of one resource, but be unable to guarantee continuing operation if two resources fail simultaneously. Further, the level of resource and processing redundancy in the system will affect operational properties. For example, if every operation can be performed by two or more resources, then higher degrees of operational robustness are possible. We organize these possibilities into a taxonomy that will be used to establish new classes of relevant systems. We begin with the simplest approach: to assume that there is some subset of resources susceptible to failure and that at most one of these resources can be down at a time. This leads to the results shown in Table I.

TABLE I ROBUSTNESS LEVEL

Next, we need some similar taxonomy for resource redundancy. As with robustness, there are many possibilities that need to be explored. One simple example is shown in Table II. The next step is to cross these two taxonomies and identify those systems that are most relevant to manufacturing as shown in Table III.
The deadlock avoidance research reviewed earlier addresses systems falling in the first row of Table III, that is, the reliable RB RD classes. The first of these represents reliable systems with no flexible routing so that safe operation is equivalent to avoiding deadlock. For reliable systems with resource redundancy, the control objective is to avoid deadlock while exploiting flexible routing alternatives made available through resource flexibility. The second and third rows exhibit several interesting systems. For example, with the RB RD combination that has an unreliable resource but no redundancy, the controller must allocate resources so that if a specified unreliable resource fails the resulting system can continue to produce part types not


TABLE II RESOURCE REDUNDANCY

TABLE III CROSS PRODUCT

requiring that resource, since there are no flexible routing opportunities. For the corresponding RB RD combination with redundancy, the controller must allocate resources so that if a specified unreliable resource fails the system can continue producing all part types, since every part type stage requiring the failed resource has an optional resource type that it can use. Since the use of a central buffer can play an important role in handling resource failures, we append a third category to our RB RD taxonomy that represents the capacity of the central buffer. Section IV develops a supervisory controller for one class of this taxonomy, while Section V develops a controller for a second class.

IV. ROBUST SUPERVISORY CONTROL WITH A SUBOPTIMAL DEADLOCK AVOIDANCE POLICY

This section develops a supervisory controller that combines a suboptimal DAP with a set of neighborhood constraints. We will establish that this policy satisfies properties 1)–4) of Definition 1.

A suboptimal DAP is a policy that sacrifices some safe allocation states in order to achieve computational tractability. Banker’s Algorithm (BA), first proposed by Habermann [1] for computer operating systems, is perhaps the most widely known policy in this class. It assumes that as each process enters the system, it declares the maximum number of each resource that it might ever require at one time. It further assumes that, if a process is simultaneously allocated its stated maximum of each resource, then the process will terminate without additional requests and will release all reusable resources allocated to it. These resources then become available for allocation to other processes. The policy avoids deadlock by allowing an allocation only if the processes can be ordered so that the maximal resource needs of the $i$th process, $p_i$, can be met by pooling available resources with those allocated to processes $p_1, \ldots, p_{i-1}$. The order defines a sequence in which all


processes in the system can be terminated successfully. The work in [3] establishes the complexity of BA in terms of the number of resource types and the number of requests. The work in [18] develops a variant of BA by modifying its data structure and introducing the concepts of ordered and partially ordered allocation states that allow incremental increases in the number of admissible states (states accepted by BA are “ordered” while those rejected are “unordered”). Experimental results suggest that the modified algorithm performs well in terms of the ratio (size of admissible region/size of safe region) as compared to several other suboptimal DAPs (such as [7], [17], and [19]).

Most DAPs, including BA, are not robust to resource failure. For example, BA and most other DAPs would allow the states of Fig. 2. Thus, in the following, we develop two policies, one based on a set of neighborhood constraints and the other a modified version of BA. Although neither of these two policies is correct by itself, we will prove that their conjunction satisfies the desired properties.

We begin with the neighborhood constraints. Let $R^U = \{r_u\}$, that is, $r_u$ is the sole unreliable resource and recall that $\Omega_u$ is the set of part type stages supported by $r_u$. Define a resource, $r_i$, to be “failure-dependent” if, for every $P_{jk} \in \Omega_i$, $\exists P_{jm} \in \Omega_u$, where $m \ge k$. That is, $r_i$ is failure-dependent if every part that enters its buffer space requires future processing by the unreliable server. Let $FD$ be the set of failure-dependent resources (clearly, $r_u \in FD$). For each resource $r_i \in FD$, we will develop a neighborhood constraint. The neighborhood of $r_i$ in $FD$ is defined as the set of part type stages that require $r_i$ later in their routes and have no intervening resource in $FD$. More formally, $NH_i = \{P_{jk} : \exists P_{jm} \in \Omega_i$ with $m \ge k$ and $\rho(P_{jn}) \notin FD$ for $k \le n < m\}$. Fig. 3 shows an example system with three such neighborhoods.

A resource allocation state is said to capacitate a neighborhood when the number of part type instances of that neighborhood currently in the system equals the capacity of the neighborhood’s resource. For example, in Fig. 3, a neighborhood $NH_i$ is capacitated if $\sum_{P_{jk} \in NH_i} (x_{jk} + y_{jk}) = C_i$ (recall that $x_{jk}$ is the number of unfinished units of $P_{jk}$ located in the buffer space of $\rho(P_{jk})$ and $y_{jk}$ is the number of finished units of $P_{jk}$ located in the buffer space of $\rho(P_{jk})$).

We will construct neighborhood constraints (NHC) to assure that no neighborhood is over-capacitated. Thus, for the example of Fig. 3, we have one constraint per neighborhood, (1)–(3), each bounding the number of parts in that neighborhood by the capacity of its failure-dependent resource. Equation (1) assures that the first neighborhood is not over-capacitated, while (2) and (3) assure the same for the second and third neighborhoods, respectively. We need one additional constraint. To see this, notice that in Fig. 3 two neighborhoods are capacitated and the system is stalled. The constraint for one of these neighborhoods will not let a unit advance into it. However, any safe sequence for this state requires that one of those units be advanced, which violates that neighborhood’s constraint. Thus, the neighborhood constraints, as they are now


Fig. 3. Example system.

Fig. 4. Example of deadlock state admitted by NHC.

stated, induce deadlock on the system. To alleviate this, we introduce the constraint that at most one neighborhood can be capacitated at a time. For the example of Fig. 3, this yields the pairwise constraints (4)–(6), one for each pair of neighborhoods, each assuring that the two neighborhoods of the pair are not simultaneously filled. The neighborhood constraints (NHC) can be generally stated as

For each $r_i \in FD$: $\sum_{P_{jk} \in NH_i} (x_{jk} + y_{jk}) \le C_i$

For each pair $r_g, r_h \in FD$, $g \ne h$: $\sum_{P_{jk} \in NH_g} (x_{jk} + y_{jk}) + \sum_{P_{jk} \in NH_h} (x_{jk} + y_{jk}) < C_g + C_h$

We note that these constraints alone will not enforce safety of the system, that is, they will permit unsafe states, as illustrated in Fig. 4. Thus, NHC by itself is not a correct supervisor. We will show later that its conjunction with a modified BA will yield a supervisor that is robust with respect to the unreliable resource. We now establish some simple but important properties of NHC.

Property 1: Movements of parts not requiring the unreliable resource are not inhibited by NHC.

Proof: If part type $P_j$ does not require the unreliable resource, then none of its associated state variables appear in NHC. Thus, NHC does not inhibit these parts.

Property 2: Neighborhoods do not overlap, that is, $NH_g \cap NH_h = \emptyset$, $\forall r_g, r_h \in FD$ such that $g \ne h$.

Proof: Suppose $P_{jk} \in NH_g \cap NH_h$, for $g \ne h$. By definition, the unreliable resource $r_u$ appears in the route of $P_j$ and thus, $\rho(P_{jm}) = r_u$ for some $m \ge k$. Also, every part that enters

the buffer space of both $r_g$ and $r_h$ requires future processing by the unreliable server. Without loss of generality, suppose $\rho(P_{ja}) = r_g$ and $\rho(P_{jb}) = r_h$ with $k \le a < b \le m$. Loosely speaking, an instance of $P_{jk}$ needs first to visit $r_g$, then $r_h$, and then $r_u$. $P_{jk} \in NH_g$ implies that $\rho(P_{jn}) \notin FD$ for $k \le n < a$, that is, none of the resources required by $P_{jk}, \ldots, P_{j,a-1}$ are failure-dependent. Similarly, $P_{jk} \in NH_h$ implies that $\rho(P_{jn}) \notin FD$ for $k \le n < b$, that is, none of the resources required by $P_{jk}, \ldots, P_{j,b-1}$ are failure-dependent. This is a contradiction, since $\rho(P_{ja}) = r_g \in FD$ and $k \le a < b$.

Property 3: Movements of parts within a neighborhood do not violate NHC.

Proof: Suppose $P_{jk}, P_{j,k+1} \in NH_i$. Further, suppose state $q$ satisfies NHC and $y_{jk} > 0$. Then, the sum of state variables ($x_{jk} + y_{jk} + x_{j,k+1} + y_{j,k+1}$) appears in the constraint of $NH_i$. Since $y_{jk} > 0$, advancing an instance of $P_{jk}$ to its next step decrements $y_{jk}$ and increments $x_{j,k+1}$. This does not change the sum ($x_{jk} + y_{jk} + x_{j,k+1} + y_{j,k+1}$) and thus does not cause NHC to be violated.

Property 4: Advancing a part out of a capacitated neighborhood will not violate NHC.

Proof: Suppose $P_{jk} \in NH_g$ and $P_{j,k+1} \notin NH_g$. Further, suppose state $q$ satisfies NHC, $y_{jk} > 0$, and $q$ capacitates $NH_g$. Since $y_{jk} > 0$, advancing a unit of $P_{jk}$ will decrement the contents of $NH_g$ and possibly increment the contents of another neighborhood. However, NHC guarantees that only one neighborhood is capacitated in state $q$, $NH_g$. Thus, advancing a part out of $NH_g$ cannot overcapacitate a neighborhood or cause two neighborhoods to be simultaneously capacitated.

Having developed NHC, we will now modify BA. Our objective is to develop a version of BA that finds an ordering of parts in the system so that those not requiring the unreliable resource are advanced out of the system (thus making their currently allocated resource available) and those requiring the unreliable resource are advanced to the failure-dependent resource that corresponds to their current neighborhood (and thus, in case of failure, assuring the supervisor’s ability to clear the system of these parts). In the following, we assume $r_u$ is the unreliable resource. The required notation and the algorithm, Algorithm 1, are given in the Appendix. Furthermore, Fig. 5 provides a detailed

LAWLEY AND SULISTYONO: ROBUST SUPERVISORY CONTROL POLICIES FOR MANUFACTURING SYSTEMS WITH UNRELIABLE RESOURCES

353

Fig. 5. Applying Algorithm 1.

example. Algorithm 1 operates on the part type stages represented in the system and not on the actual parts (if there are several parts of the same type and stage in the system, Algorithm 1 handles them simultaneously). Note that it uses three data structures: AVAILABLE, ALLOCATION, and NEED. AVAILABLE is a vector that records the number of available resource instances of each resource (the amount of available buffer space for each workstation), while ALLOCATION is a two-dimensional (2-D) array that records the allocation of

resources to each part type stage represented in the system. NEED is a 2-D array that records the future resource needs for each part type stage. For stages not requiring the unreliable resource, NEED records the resources required to finish and exit the system. For stages requiring the unreliable resource, NEED records the resources required for the parts of that stage to make it to the first failure-dependent resource in the remaining route (if parts of a stage are already on a failure-dependent resource, the NEED is set to 0).


The first loop of Algorithm 1 eliminates from consideration those part type stages corresponding to parts on failure-dependent resources. Recall that, for parts requiring the unreliable resource, the objective is to push them into the failure-dependent resource of their current neighborhood. If they are already there, the algorithm does not have to move them, so the corresponding stages can be eliminated from consideration.

The next loop identifies part type stages whose future resource NEED is completely available. If the stage requires the unreliable resource, then this means that all resources required to advance the corresponding parts (one at a time) into the failure-dependent resource of the current neighborhood are available. In this case, the resource currently held is deallocated (its place in the AVAILABLE vector is incremented by the number of parts of that stage present) and the stage’s NEED for nonfailure-dependent resources is set to zero. Then the AVAILABLE units of the corresponding failure-dependent resource are decremented by the number of parts of that stage present, and the ALLOCATION of the failure-dependent resource to that stage is incremented by the same number. This changes the resource allocation state from the current one to the one in which all of these parts have finished advancing.

If the stage whose NEED is completely available does not require a failure-dependent resource, then the resource currently held is deallocated (its place in the AVAILABLE vector is incremented by the number of parts of that stage present), and the stage’s NEED and ALLOCATION are set to zero. This changes the resource allocation state from the current one to one in which all of these parts have been completed.
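The bookkeeping described above can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not a transcription of Algorithm 1: the `Stage` record, its field names, and the function `modified_banker` are hypothetical, and the sketch assumes that a stage needs one free unit of each intermediate resource (parts advance one at a time) while the failure-dependent target must have room for every part of the stage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Stage:
    """One part type stage represented in the state (hypothetical record)."""
    count: int                    # parts of this stage currently in the system
    holding: str                  # resource whose buffer these parts occupy
    path: List[str]               # resources still to visit; for stages that need
                                  # the unreliable resource, the path stops at the
                                  # first failure-dependent resource
    target: Optional[str] = None  # that failure-dependent resource, if any


def modified_banker(stages: List[Stage], available: Dict[str, int]) -> bool:
    """Admit (True) if some ordering clears or parks every stage."""
    avail = dict(available)  # work on a copy of AVAILABLE
    # First loop: drop stages already sitting on their failure-dependent resource.
    pending = [s for s in stages if s.target is None or s.holding != s.target]
    progress = True
    while pending and progress:
        progress = False
        for s in pending:
            need = {r: 1 for r in s.path}     # assumption: one unit per visit
            if s.target is not None:
                need[s.target] = s.count      # all parts end up parked here
            held = {s.holding: s.count}
            # NEED - ALLOCATION <= AVAILABLE, resource by resource.
            if all(n - held.get(r, 0) <= avail.get(r, 0) for r, n in need.items()):
                avail[s.holding] = avail.get(s.holding, 0) + s.count
                if s.target is not None:
                    avail[s.target] = avail.get(s.target, 0) - s.count
                pending.remove(s)
                progress = True
                break
    return not pending
```

For instance, two single parts that each wait for the other's buffer are rejected, while a part that can be parked on a failure-dependent resource `rf` frees its buffer for the part behind it and the state is admitted. The resource names used here are illustrative only.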
Algorithm 1 accepts when all parts corresponding to stages not requiring the unreliable resource have been advanced out of the system and all parts corresponding to stages requiring the unreliable resource have been advanced into the failure-dependent resource of their current neighborhood. If this is not possible, then Algorithm 1 rejects. We note that, if there are no unreliable resources, this version of BA reduces to the modified version given in [18] and it guarantees deadlock-free operation. If an unreliable resource exists, then the policy is not correct by itself, for it will allow deadlocks among failure-dependent resources, as we see in Fig. 5. We now show that a conjunctive policy, which combines BA with NHC (a resource allocation will be permitted only if it satisfies both BA and NHC), is robust to the failure of the unreliable resource. In what follows, "the conjunctive policy" refers to this conjunction of BA and NHC.

Lemma 1: The conjunctive policy enforces safety for the system while the unreliable resource is operational.

Proof: First, we note that BA admits a state only if there exists an ordering such that all parts not requiring future processing on the unreliable resource can be finished and all parts requiring future processing on the unreliable resource can be advanced into failure-dependent resources. Note that we will assume parts at the unreliable resource to belong to the latter group. Further, NHC admits a state only if at most one neighborhood is capacitated. The conjunctive policy therefore admits a state only if it satisfies each of these requirements. To prove the lemma, we must establish two things. First, we must show that execution of the ordering computed by BA never requires a violation of NHC. We must then show that the resulting state (all parts in the first group having been completed and all parts in the second group having been advanced into failure-dependent resources) is safe and that all remaining parts can be advanced to completion under the conjunctive policy.

We begin by assuming that the conjunctive policy induces deadlock, that is, there exists a state acceptable to the policy, but for any possible ordering that BA computes, execution of that ordering eventually requires violation of NHC. Thus, in executing the ordering, the policy would induce deadlock at the point where NHC has to be violated to continue. Call this the induced deadlock state. It satisfies the conjunctive policy, that is, it satisfies both BA and NHC. But, if the next part in the order of BA is advanced, NHC will be violated. Consider the next part advanced in the order computed by BA. Advancing it must violate NHC. Thus, by Property 1, it requires the unreliable resource. Further, by Property 4, it must be advancing out of an uncapacitated neighborhood and thus occupies space in a failure-dependent resource. But, if this is true, it cannot be next in the order, since it already occupies space at a failure-dependent resource and the order computed by BA does not advance parts out of their current neighborhoods. Thus, we have a contradiction. Therefore, the ordering computed by BA for a given state can be executed without violating NHC.

We must now show that all parts remaining in the state resulting from executing the order can be finished under the conjunctive policy. First, note that after executing the order, only failure-dependent resources hold parts. Further, by NHC, there can be at most one capacitated neighborhood. Thus, at most one failure-dependent resource can be allocated to capacity. If a failure-dependent resource is allocated to capacity, advance any part at that resource one step in its route; otherwise, select any part to advance one step. Then apply BA to the resulting state. By Property 4, this state does not violate NHC. BA will either move this part out of the system or into another failure-dependent resource.
To see this, note that if the part does not require another failure-dependent resource, then all resources in its remaining route are available. If the part does require another failure-dependent resource, then it is in the neighborhood of that resource and therefore free to advance into the failure-dependent resource’s buffer space. It is clearly possible to repeat this procedure until all parts are completed, and thus the proof is complete.

We will now show that all states admitted by the conjunction of BA and NHC serve as feasible initial states for the reduced system.

Lemma 2: If a state satisfies the conjunction of BA and NHC and enables the failure event, then the state that results from the failure is a feasible initial state for the reduced system.

Proof: Suppose the state satisfies the policy. Then, starting from that state, the policy admits a sequence of events that finishes all parts not requiring the unreliable resource and advances all parts requiring it to a failure-dependent resource. We claim that this same sequence can be executed starting in the state that results from the failure. Consider the set of events in the sequence. It consists of the operation completions and the part advancements required to finish the parts not requiring the unreliable resource. It also consists of the operation completions and part advancements required to move the parts requiring the unreliable resource to the failure-dependent resource of their current neighborhood. It will have no part advancement that corresponds to advancing a part away from a failure-dependent resource, and thus it does not require and will not have any completion event for a failure-dependent resource. Thus, the sequence in no way depends on the occurrence of completion events at the unreliable resource, that is, the state of the unreliable resource’s server is irrelevant to the execution of the sequence. Therefore, the sequence that BA computes for the state before the failure also holds for the state after it. Once this sequence is executed starting from the post-failure state, all resources that are not failure-dependent are free of parts and the system can continue producing those parts not requiring the unreliable resource.

Lemma 3: The conjunction of BA and NHC ensures safety for the reduced system.

Proof: By Lemma 2, the policy assures feasible initial states for the reduced system. Further, it easily follows from the proof of Lemma 1 that if the reduced system is started in a feasible initial state, then the policy enforces safety.

Lemma 4: If a state of the reduced system satisfies the conjunction of BA and NHC and the repair event occurs, then the resulting state is a feasible initial state for the system with all resources operational.

Proof: Suppose the state satisfies the policy. Then BA computes an ordering of parts such that those not requiring the unreliable resource can be emptied from the system and those requiring it can be advanced to the failure-dependent resource of their current neighborhood. By the proof of Lemma 2, this sequence in no way depends on the status of the server of the unreliable resource. Thus, the occurrence of the repair event in no way affects the validity of the order of BA. After the order is executed, no parts of the first group remain and all parts of the second group will be in the buffer space of failure-dependent resources. By the proof of Lemma 1, these parts can then be completed under the policy. Thus, the state that results from the repair is a feasible initial state for the system with all resources operational.

Theorem 1: The conjunction of BA and NHC is robust to the failure of the unreliable resource.

Proof: Follows from Definition 1 and Lemmas 1–4.

These results establish that the conjunction of BA and NHC is robust to the failure of one specified unreliable resource. Thus, a system can be started under the supervision of this policy and, anytime the unreliable resource fails, production can continue smoothly and without major disruption. We want to clarify that it is rarely necessary to follow the sequence computed by BA, even when failure occurs. It is only necessary to assure that each state allowed has such an associated sequence. This, together with NHC, gives us the guarantee that safe operation is possible from that state. In fact, the policy will allow new parts requiring the unreliable resource to be loaded into the system even when that resource’s server is down, just so long as the resulting state satisfies both BA and NHC. The following section develops another robust supervisory policy for RB RD, which makes use of the optimality results cataloged in [10].
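The admission rule of a conjunctive policy can be illustrated with a short Python sketch. It assumes that NHC amounts to the two checks named in the text (no neighborhood over capacity, and at most one neighborhood at capacity); the function names, the dictionary layout, and the idea of passing the BA verdict in as a Boolean are all assumptions made for illustration.

```python
from typing import Dict


def satisfies_nhc(load: Dict[str, int], capacity: Dict[str, int]) -> bool:
    """Check the neighborhood constraints for one state.

    load[nh]     -- number of parts currently inside neighborhood nh
    capacity[nh] -- right-hand side of the constraint for nh (assumed here to be
                    the buffer capacity of its failure-dependent resource)
    """
    # No neighborhood may be over-capacitated.
    if any(load.get(nh, 0) > cap for nh, cap in capacity.items()):
        return False
    # At most one neighborhood may be capacitated (filled exactly to capacity).
    full = sum(1 for nh, cap in capacity.items() if load.get(nh, 0) == cap)
    return full <= 1


def admit(load: Dict[str, int], capacity: Dict[str, int], ba_accepts: bool) -> bool:
    """Conjunctive rule: permit an allocation only if BA and NHC both accept."""
    return ba_accepts and satisfies_nhc(load, capacity)
```

With two neighborhoods of capacity 2 each, a state holding two parts in one and one part in the other passes the check, while a state that fills both is refused even if BA would accept it.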

V. ROBUST SUPERVISORY CONTROL WITH OPTIMAL DEADLOCK AVOIDANCE

As mentioned in the literature review, one branch of deadlock avoidance research seeks to discover special system structures that render the safety question polynomial. If such a structure is present in the system design, then establishing state safety is equivalent to detecting deadlock. That is, if a state exhibits no deadlock, then it is safe. For these systems, single step lookahead for deadlock (SSLA), which allows a part movement so long as the resulting state exhibits no deadlock, keeps the system operational as long as no resource fails (it is easy to show that SSLA is not robust to resource failure). SSLA applies a deadlock detection algorithm to the state that results if some selected part is allowed to advance one step in its route. Deadlock detection is a straightforward polynomial computation, and detection algorithms can be found in [22] and [24].

In [10], several special cases are cataloged for which state safety and deadlock detection are equivalent. Two of the most applicable are: 1) when every resource has at least two units of buffer space and 2) when there is an optional central buffer to which any part may return after any operation, provided the central buffer has unallocated space available. For these types of systems, we now establish that a policy combining NHC and SSLA (a state is allowed only if it has no deadlock and does not violate neighborhood constraints) is robust to the failure of one specified resource. In what follows, we refer to this policy as the conjunction of SSLA and NHC.

Lemma 5: The conjunction of SSLA and NHC ensures safety for the system while the unreliable resource is operational.

Proof: The policy accepts a state only if it is safe and does not violate NHC. Being safe implies the existence of a sequence of resource allocations that finishes all parts in the system, and satisfying NHC implies no over-capacitated neighborhood and at most one capacitated neighborhood. To prove the lemma, we need to establish that execution of the safe sequence does not require violation of NHC.

Assume that, in executing the safe sequence, the policy induces deadlock, that is, we reach a state acceptable to the policy but any safe part movement violates NHC. Divide the parts in the system into three groups: those that can safely advance one step, those that can advance, but not safely (these advance into deadlock), and those that cannot advance because they are blocked. For every part that can safely advance, the advancement must violate NHC. By Property 1, such a part requires the unreliable resource and, by Property 4, it must be advancing out of an uncapacitated neighborhood.
This implies that the part occupies buffer space at a failure-dependent resource, and that this resource has some available buffer space (otherwise, its neighborhood would be capacitated). Thus, the blocked parts are blocked either by other blocked parts or by parts that can advance only unsafely, but not by parts that can advance safely.

Suppose that there are no blocked parts and no parts that can advance only unsafely. Then, every part in the system can advance safely. Thus, if there is a capacitated neighborhood, advancing any part from that neighborhood does not violate NHC. If there is no capacitated neighborhood, advancing any part from one neighborhood to another cannot violate NHC. Thus, we can have no induced deadlock.

Suppose that there are blocked parts but no parts that can advance only unsafely. Then every blocked part is blocked by another blocked part and we have a deadlock (and a contradiction, since such a state would never be accepted by SSLA). Thus, there must be at least one part that can advance only unsafely.

Now, if we advance such a part, we get a deadlock. This part does not advance into the failure-dependent resource occupied by any part that can advance safely. To see this, recall that a part that can safely advance occupies a resource that must have available buffer space. Because advancing the unsafe part to its next resource deadlocks the system, the advance causes every part at that resource to become blocked. However, if a safely advanceable part occupied that resource, then the advance could not capacitate the resource. That is, the advance could not cause the safely advanceable part to become blocked, and thus it could not cause deadlock. Thus, no such part exists. It follows that no resource held by a safely advanceable part is involved in the deadlock created when we advance the unsafe part, that is, the safely advanceable parts are irrelevant to the unsafety of the others. This implies that there is no safe sequence for the parts that cannot advance safely, and thus we have a deadlock-free unsafe state one step from deadlock involving at least the parts that can advance only unsafely and possibly some of the blocked parts. This, however, contradicts the assumption that we have a special structure in place that eliminates the possibility of deadlock-free unsafe states. This establishes the lemma.

Thus, if we have any special structure in place in the system that renders deadlock-free unsafe states impossible, then the policy SSLA combined with NHC keeps the system operating. We next establish that any state reachable under this policy serves as a feasible initial state for the reduced system.

Lemma 6: If a state satisfies the conjunction of SSLA and NHC and enables the failure event, then the state that results from the failure is a feasible initial state for the reduced system.

Proof: Suppose the state satisfies the policy and enables the failure event. Then there exists a sequence of admissible state transitions that empties the system; call this the emptying sequence. Note that the emptying sequence contains events that advance parts into failure-dependent resources, events that finish parts at failure-dependent resources, and events that advance parts away from failure-dependent resources. In fact, for every part in the system that requires future processing on the unreliable resource, the emptying sequence has a subsequence made up of two portions: the first advances the part into its closest failure-dependent resource, and the second advances the part out of this failure-dependent resource, through the rest of its route and out of the system. The subsequence may be interspersed with many other events, but it must appear in the emptying sequence.

Now suppose the failure event occurs in this state. We need to show that the new state is a feasible initial state for the reduced system. It will be a feasible initial state if all parts requiring the unreliable resource can be advanced to failure-dependent resources and held there, without deadlocking other parts in the system.
To achieve this, modify the emptying sequence as follows: for each part requiring the unreliable resource, delete the second portion of its corresponding subsequence, that is, delete the events that advance the part out of its closest failure-dependent resource, through the rest of its route and out of the system. The resulting sequence advances all parts requiring the unreliable resource into their closest failure-dependent resource and all parts not requiring the unreliable resource out of the system. To see that it does this, recall that once a part requiring the unreliable resource occupies a failure-dependent resource, it no longer interferes with the advancement and completion of parts not requiring the unreliable resource. Thus, for each such part, the deleted portion is irrelevant to the safety of the parts not requiring the unreliable resource. Thus, the state that results from the failure is a feasible initial state for the reduced system.

Lemma 7: Given that the reduced system retains the special structure that renders the safety question polynomial, the conjunction of SSLA and NHC ensures safety for the reduced system.

Proof: By Lemma 6, the policy assures feasible initial states for the reduced system and, in fact, demonstrates that all parts not requiring the unreliable resource can be advanced through their remaining routes and out of the system and all parts requiring it can be advanced into failure-dependent resources. If the failure of the unreliable resource does not interfere with the special structure that eliminates deadlock-free unsafe states, then it is clear that the reduced system can have no deadlock-free unsafe states and thus the policy keeps it operating.

To establish that the policy is robust to the failure of the unreliable resource, we now have only to show that every state of the reduced system reachable under the policy serves as a feasible initial state for the system with all resources operational.

Lemma 8: If the repair event occurs in a state of the reduced system admitted by the conjunction of SSLA and NHC, then the resulting state is a feasible initial state for the system with all resources operational.

Proof: For every state it admits, the policy assures a safe sequence for the parts not requiring the unreliable resource and a sequence that advances the parts requiring it into failure-dependent resources. Clearly, this sequence is also valid for the state that results from the repair. Execute this sequence. In the resulting state, no parts of the first group remain and every part of the second group occupies a failure-dependent resource. Further, the resulting state does not violate NHC. We have only to show that every part present in it can be advanced to completion under the policy.

If there exists a capacitated neighborhood, advance any part from that neighborhood; otherwise, advance any part. The part is either leaving the unreliable resource for the last time (in which case it is entering no neighborhood) or entering an uncapacitated neighborhood. If the first, then it can be advanced out of the system. If the second, then it can be advanced to the failure-dependent resource corresponding to that neighborhood. If parts remain, repeat the procedure. Clearly, after this procedure is executed a finite number of times, all parts will be finished and the system empty. The empty state, with all resources operational, is a feasible initial state for the system.

Theorem 2: The conjunction of SSLA and NHC is robust to the failure of the unreliable resource.

Proof: This proof follows from Definition 1 and Lemmas 5–8.

These results establish that the conjunction of SSLA and NHC is robust to the failure of one specified unreliable resource. Thus, a system exhibiting the required special structure can be started under the supervision of this policy and, anytime the unreliable resource fails, production can continue smoothly and without major disruption.
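A single-step lookahead of the kind discussed in this section can be sketched in Python as follows. The sketch uses a simple fixed-point deadlock check for fixed routes; it is an illustrative stand-in and not the detection algorithms of [22] and [24]. The function names and the representation of a part as a pair (current resource, next resource, with `None` meaning that the next step leaves the system) are assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Part = Tuple[str, Optional[str]]  # (current resource, next resource or None)


def deadlocked_resources(capacity: Dict[str, int], parts: List[Part]) -> Set[str]:
    """Return the largest set of full resources whose parts all wait on the set."""
    load = {r: 0 for r in capacity}
    for cur, _ in parts:
        load[cur] += 1
    # Only resources filled to capacity can take part in a deadlock.
    candidates = {r for r in capacity if load[r] > 0 and load[r] >= capacity[r]}
    changed = True
    while changed:
        changed = False
        for r in list(candidates):
            # A resource leaves the set if one of its parts can move elsewhere.
            if any(cur == r and nxt not in candidates for cur, nxt in parts):
                candidates.discard(r)
                changed = True
    return candidates


def ssla_allows(capacity: Dict[str, int], parts_after_move: List[Part]) -> bool:
    """Allow a move only if the state it produces exhibits no deadlock."""
    return not deadlocked_resources(capacity, parts_after_move)
```

A supervisor in the spirit of this section would call `ssla_allows` on the state produced by a candidate move and, separately, check the neighborhood constraints on the same state, permitting the move only if both tests pass.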
As with the policy of the previous section, we want to clarify that it is rarely necessary to explicitly follow the safe sequences guaranteed by the policy, even when failure occurs. It is only necessary to assure that each state allowed has such a sequence. This, together with NHC, gives us the guarantee that safe operation is possible from that state. As with the previous policy, this policy will allow new parts requiring the unreliable resource to be loaded into the system even when that resource’s server is down, just so long as the resulting state satisfies both SSLA and NHC.

We will now discuss the use of a central buffer for helping to handle resource failures. We have established that, for systems with an optional central buffer, the conjunction of SSLA and NHC is robust to the failure of the unreliable resource, assuming that, after any operation, any part may return to the central buffer if it has available space. For this result to hold, the central buffer need hold only one part, that is, the result does not depend on the central buffer being able to accommodate every part requiring the failed resource (for a proof of this, the reader is referred to [24]). Now, suppose we have a central buffer with several spaces. We need at least one unit of the central buffer’s space to assure that SSLA avoids deadlock. The other spaces can be distributed among the neighborhoods of the failure-dependent resources. For example, in Fig. 6, two units of the central buffer are allocated for use by the failure-dependent resource and two units are allocated to the unreliable resource. The other two units are used by SSLA to assure that deadlock-free unsafe states do not arise. In this case, the policy will be robust to the failure of the unreliable resource and the right-hand sides of NHC will be augmented, making the policy more flexible. A more sophisticated approach is to dynamically allocate units of the central buffer to neighborhoods that request those units when the neighborhoods fill up. Thus, the right-hand sides of NHC could be dynamically adjusted based on real-time allocation of central buffer space among failure-dependent resources and SSLA. This is beyond the scope of this paper and is one direction for future work.

Fig. 6. Using a central buffer for SSLA and to augment neighborhoods.

VI. CONCLUSION

In this paper, we developed the notion of robust supervisory control for automated manufacturing systems with unreliable resources. Our objective is to allocate system buffer space so that when an unreliable resource fails the system can continue to produce all part types not requiring the failed resource. We established four properties that such a controller must satisfy, namely, that it ensure safety for the system given no resource failure; that it constrain the system to feasible initial states in case of resource failure; that it ensure safety for the system while the unreliable resource is failed; and that during resource repair it constrain the system to states that will be feasible initial states when the repair is completed. We then developed two polynomial control policies that satisfy these conditions and discussed methods for using a central buffer to increase the flexibility of these policies. We also developed a taxonomy that combines robustness and redundancy to help guide future work in this area.
Future research will address dynamic allocation of central buffer space to neighborhoods, other relevant categories of the taxonomy, and performance-related issues through discrete event simulation.

APPENDIX

Let RT be the remaining route of (note that the current resource is included). For , let

RT and RT . is the set of part type stages represented in whose part instances are either allocated buffer or need to visit before finishing. is the set space at of part type stages represented in whose part instances do not . hold or require buffer space at . Let We want to modify BA so that it accepts a state if and only such that every part of type if there exists an ordering of can be be allocated sufficient resources to be driven from the system and every part of type can be allocated sufficient resources to be driven into some . , where Partition as follows: and . is the set of part type stages in whose part instances are allocated buffer space at a failure-dependent resource. is the set of part type stages in whose part instances are not allocated buffer space at a failure-dependent resource. be an indexing function on so Let is the th member of . Define the following that data structures. and , For all NEED ALLOCATION AVAILABLE (recall is the capacity of ). , For ALLOCATION . RT , For . NEED For , , . NEED For where is smallest integer with , NEED . ALLOCATION AVAILABLE The algorithm is now defined as follows.


Algorithm 1 Begin

Remove parts on failure-dependent resources. Loop such that NEED , Find If no such exists, break from loop, Else End Loop Find parts with resource needs that can be accommodated. Loop , Return Admit If such that , Else find NEED ALLOCATION AVAILABLE If no such exists, Return Reject. Else such that For AVAILABLE AVAILABLE ALLOCATION NEED End For If NEED for some Find such that ALLOCATION AVAILABLE AVAILABLE ALLOCATION ALLOCATION ALLOCATION NEED End If For such that ALLOCATION End For End Else End Loop End

REFERENCES

[1] A. Habermann, “Prevention of system deadlocks,” Commun. ACM, vol. 12, pp. 373–377, 1969.
[2] A. Shoshani and E. Coffman, “Sequencing tasks in multiprocess systems to avoid deadlocks,” in Proc. 11th Annu. Symp. on Switching Automata Theory, 1970, pp. 225–233.
[3] R. Holt, “Some deadlock properties of computer systems,” ACM Computing Survey, vol. 4, pp. 179–196, 1972.
[4] T. Araki, Y. Sugiyama, and T. Kasami, “Complexity of the deadlock avoidance problem,” in Proc. 2nd IBM Symp. the Mathematical Foundations of Computer Science, Tokyo, Japan, 1977, pp. 229–257.
[5] E. Gold, “Deadlock prediction: Easy and difficult cases,” SIAM J. Computing, vol. 7, pp. 320–336, 1978.
[6] Z. Banaszak and E. Roszkowska, “Deadlock avoidance in pipeline concurrent processes,” Podstawy Sterowania (Foundations of Control), vol. 18, pp. 3–17, 1988.
[7] Z. Banaszak and B. Krogh, “Deadlock avoidance in flexible manufacturing systems with concurrently competing process flows,” IEEE Trans. Robot. Automat., vol. 6, pp. 724–734, 1990.
[8] N. Viswanadham, Y. Narahari, and T. Johnson, “Deadlock prevention and deadlock avoidance in flexible manufacturing systems using Petri net models,” IEEE Trans. Robot. Automat., vol. 6, pp. 713–723, 1990.
[9] R. Wysk, N. Yang, and S. Joshi, “Detection of deadlocks in flexible manufacturing cells,” IEEE Trans. Robot. Automat., vol. 7, pp. 853–859, 1991.
[10] M. Lawley and S. Reveliotis, “Deadlock avoidance for sequential resource allocation systems: Hard and easy cases,” Int. J. Flexible Manuf. Syst., vol. 13, no. 4, 2001.
[11] Y. Leung and G. Sheen, “Resolving deadlocks in flexible manufacturing cells,” J. Manuf. Syst., vol. 12, no. 4, pp. 291–304, 1993.
[12] T. Kumaran, W. Chang, H. Cho, and R. Wysk, “A structured approach to deadlock detection, avoidance and resolution in flexible manufacturing systems,” Int. J. Prod. Res., vol. 32, no. 10, pp. 2361–2379, 1994.
[13] F. Hsieh and S. Chang, “Dispatching-driven deadlock avoidance controller synthesis for flexible manufacturing systems,” IEEE Trans. Robot. Automat., vol. 10, pp. 196–209, 1994.
[14] J. Ezpeleta, J. Colom, and J. Martinez, “A Petri-net based deadlock prevention policy for flexible manufacturing systems,” IEEE Trans. Robot. Automat., vol. 11, pp. 173–185, Apr. 1995.
[15] M. Lawley, S. Reveliotis, and P. Ferreira, “Design guidelines for deadlock handling strategies in flexible manufacturing systems,” Int. J. Flexible Manuf. Syst., vol. 9, pp. 5–30, 1997.
[16] M. Fanti, B. Maione, S. Mascolo, and B. Turchiano, “Event-based feedback control for deadlock avoidance in flexible production systems,” IEEE Trans. Robot. Automat., vol. 13, pp. 347–734, 1997.
[17] M. Lawley, S. Reveliotis, and P. Ferreira, “FMS structural control and the neighborhood policy: Parts 1 and 2,” IIE Trans., vol. 29, no. 10, pp. 877–899, 1997.
[18] M. Lawley, S. Reveliotis, and P. Ferreira, “Application and evaluation of Banker’s algorithm for deadlock-free buffer space allocation in flexible manufacturing systems,” Int. J. Flexible Manuf. Syst., vol. 10, pp. 73–100, 1998.
[19] M. Lawley, S. Reveliotis, and P. Ferreira, “A correct and scalable deadlock avoidance policy for flexible manufacturing systems,” IEEE Trans. Robot. Automat., vol. 14, pp. 796–809, 1998.
[20] E. Roszkowska and J. Jentink, “Minimal restrictive deadlock avoidance in FMS’s,” in Proc. Eur. Control Conf. ECC ’93, vol. 2, 1993, pp. 530–534.
[21] K. Xing, B. Hu, and H. Chen, “Deadlock avoidance policy for Petri net modeling of flexible manufacturing systems with shared resources,” IEEE Trans. Automat. Contr., vol. 41, pp. 289–296, 1996.
[22] S. Reveliotis, M. Lawley, and P. Ferreira, “Polynomial complexity deadlock avoidance policies for sequential resource allocation systems,” IEEE Trans. Automat. Contr., vol. 42, pp. 1344–1357, Oct. 1997.
[23] M. Fanti, B. Maione, and B. Turchiano, “Event control for deadlock avoidance in production systems with multiple capacity resources,” Studies in Informat. Contr., vol. 7, no. 4, pp. 343–364, Dec. 1998.
[24] M. Lawley, “Deadlock avoidance for production systems with flexible routing,” IEEE Trans. Robot. Automat., vol. 15, pp. 1–13, June 1999.
[25] M. Lawley, “Integrating flexible routing and algebraic deadlock avoidance policies in automated manufacturing systems,” Int. J. Prod. Res., vol. 38, no. 13, pp. 2931–2950, 2000.
[26] W. Sulistyono and M. Lawley, “Deadlock avoidance for manufacturing systems with flexible sequencing,” in Proc. IEEE Int. Conf. Robotics and Automation, San Francisco, CA, Apr. 2000.
[27] B. Johnson, Design and Analysis of Fault-Tolerant Digital Systems. Reading, MA: Addison-Wesley, 1989.
[28] M. Lyu, Software Fault Tolerance, M. Lyu, Ed. Chichester, N.Y.: John Wiley, 1995.
[29] J. Gray and D. Siewiorek, “High-availability computer systems,” IEEE Computer, vol. 24, pp. 39–48, Sept. 1991.
[30] D. Pradhan, Fault Tolerant Computer System Design. Upper Saddle River, NJ: Prentice-Hall, 1996.
[31] D. Siewiorek and R. Swarz, Reliable Computer Systems: Design and Evaluation. Natick, MA: A.K. Peters, 1998.
[32] J. Buzacott, “Optimal operating rules for automated manufacturing systems,” IEEE Trans. Automat. Contr., vol. AC-27, pp. 80–86, 1982.
[33] O. Maimon and S. Gershwin, “Dynamic scheduling and routing for flexible manufacturing systems that have unreliable machines,” Oper. Res., vol. 36, pp. 279–292, 1986.
[34] A. Kalir and Y. Arzi, “Optimal design of flexible production lines with unreliable machines and infinite buffers,” IIE Trans., vol. 30, pp. 391–399, 1998.
[35] M. Olumolade and D. Norrie, “Reactive scheduling system for cellular manufacturing with failure-prone machines,” Int. J. Comput. Integr. Manuf., vol. 9, no. 2, pp. 131–144, 1996.
[36] S. Reveliotis, “Accommodating FMS operational contingencies through routing flexibility,” IEEE Trans. Robot. Automat., vol. 15, pp. 3–19, Feb. 1999.
[37] S. Park and J. Lim, “Fault-tolerant robust supervisor for discrete event systems with model uncertainty and its application to a workcell,” IEEE Trans. Robot. Automat., vol. 15, pp. 386–391, Apr. 1999.
[38] F. Hsieh, “Reconfigurable fault tolerant deadlock avoidance controller synthesis for assembly production processes,” in Proc. IEEE Conf. Man, Systems and Cybernetics, Nashville, TN, 2000, pp. 3045–3050.
[39] P. Ramadge and W. Wonham, “Supervisory control of a class of discrete event processes,” SIAM J. Control Optim., vol. 25, no. 1, pp. 206–230, 1987.


Mark A. Lawley (M’01) received the B.S. degree in industrial engineering from Tennessee Technological University, Cookeville, the M.S. degree in manufacturing systems engineering from Auburn University, Auburn, AL, and the Ph.D. degree in mechanical engineering from the University of Illinois at Urbana-Champaign. He is currently Assistant Professor of Industrial Engineering at Purdue University, West Lafayette, IN. He has held industrial positions with Westinghouse Electric Corporation and Emerson Electric Company. His research interests are in manufacturing systems modeling, analysis and control. Dr. Lawley is a Registered Professional Engineer in the State of Alabama.


Widodo Sulistyono received the B.S. and M.S. degrees in electrical engineering from Texas A&M University, College Station. He is currently working toward the Ph.D. degree in industrial engineering at Purdue University, West Lafayette, IN. He served as an Applications Engineer for the Agency for the Assessment and Application of Technology—Indonesia, specializing in industrial control automation. He also worked as a Graduate Assistant in the Industrial Engineering Department at Purdue University.