Optimal Job Releasing and Sequencing for a Reentrant Manufacturing Line with Finite Capacity Buffers

José A. Ramírez-Hernández and Emmanuel Fernandez
Department of Electrical & Computer Engineering & Computer Science, University of Cincinnati, OH 45221, USA. Emails: {ramirejs;emmanuel}@ececs.uc.edu

Abstract— This paper presents an optimal policy for the problems of job releasing and sequencing in an adapted version of a benchmark Reentrant Manufacturing Line (RML). We consider a finite state space and an infinite horizon discounted cost optimization criterion. The resulting optimal policy provides a trade-off between throughput maximization (i.e., profits) and minimization of inventory costs. The policy is defined by two indexes that depend on the inventory costs, profits, system parameters, and discount factor. Results show that when no profits are obtained, the policy also presents Blackwell optimality characteristics. In addition, the optimal policy reflects the effect of discounted and undiscounted profits during state transition intervals.
I. INTRODUCTION

The problem of production control in Semiconductor Manufacturing Systems (SMS), also known as Shop Floor Control (SFC) [1], has received great attention from researchers during the last two decades [2], [3], [4], [5]. The interest in this area has increased over the years due to the economic and technological impact of semiconductor devices in different human activities, and the challenging nature of the control of these systems. The most complex operations in SMS are performed in the so-called wafer fabs within the front-end process [6]. In a fab, semiconductor devices are built on silicon wafers through a repetitive manufacturing process that is generally modeled as a queueing network with reentrant lines [3]. In a reentrant line, jobs can return to a particular station several times during the fabrication process. We will refer to these systems as Reentrant Manufacturing Lines (RML). In SMS, the control of RML is particularly complex due to the large state and action spaces resulting from these systems. Therefore, exact optimal control solutions for such systems are generally intractable [7]. However, near-optimal approaches based on simulation-based optimization, such as Approximate Dynamic Programming (ADP), also known in the literature as Neuro-Dynamic Programming (NDP) [8] or Reinforcement Learning (RL) [9], may provide useful methodologies and algorithms to overcome the curses of dimensionality and modeling that are characteristic of real-world RML in SMS. The objective of this paper is then to obtain exact optimal policies for both the problem of job releasing or input regulation [1], [10] and job sequencing [2], [3], [4], [5] for an adapted version of a benchmark RML [11], [5], [12], [13], under the framework of the infinite horizon
discounted cost (DC) criterion [14], [15]. In turn, these optimal solutions can be utilized to assess ADP approaches; such an assessment may provide insight into the implementation of existing ADP algorithms as well as the design of new ones. Likewise, it can be used to understand the limitations and advantages of these methodologies for their application in realistic scenarios. In addition, the DC criterion facilitates the analysis of the optimal control problem, and thus may prove useful for improving short-term performance in SMS. Optimization of short-term production performance could be of significance given the current dynamics of the SMS industry, where new products have short life-cycles [16], [17] and require a rapid ramp-up [18] in the manufacturing process. This paper extends previous work focused on optimal job sequencing for the benchmark RML [13], [19], [20] to the case of both optimal job releasing and sequencing. The work presented here considers finite capacity buffers and general one-stage inventory cost functions that are nonnegative and monotonically nondecreasing with respect to the componentwise partial order. Here the optimal policy provides a trade-off between minimization of inventory costs and maximization of throughput. Other research on the optimal job sequencing control of the benchmark RML has also been presented in [11], [12], where linear inventory costs are considered. In [11] sufficient conditions for optimality are provided under the DC criterion, while in [12] both fluid approximations and numerical solutions by value iteration to the optimal Average Cost (AC) policy are presented. In addition, in [13], [19] preliminary experiments on the application of ADP are presented. The problem of controlling the release of new jobs is modeled in this paper by including a finite capacity buffer at the entrance of the RML which holds jobs to be released into the system. This release is modeled as an on/off switching of a server for this buffer. The difficulty of this task resides in deciding when to release a new job while positively impacting production performance (e.g., minimizing inventory costs). Moreover, the buffer at the entrance can also be seen as modeling a so-called order pool [23], [24], which receives job orders according to a given demand. Orders are assumed to arrive as a Poisson process and wait in the order pool until a decision to release a new job is made according to a given policy. Thus, the order pool serves as a mechanism to regulate the entrance of new jobs into the RML. Furthermore, we assume in our model that all processing times are exponentially distributed. This scheme leads to the overall system being amenable to analysis as
a controlled queueing system via the uniformization procedure [15], [25]. This procedure in turn allows us to obtain a discrete-time Markov Decision Process (MDP), for which an optimal policy can be obtained by using value iteration and induction arguments [14], [15]. Although the above model may depart from actual practice, it is a reasonable mathematical abstraction that allows the use of the analysis tools mentioned above, and one that has also been used by other authors in the analysis of RML systems [5], [11], [12]. The literature indicates that different approaches, based on both heuristics [2] and optimality criteria [22], have been utilized for job releasing. As mentioned in a comprehensive review in [10], earlier methods for job releasing are categorized as either push or pull methods. While push methods pay little attention to capacity and congestion when jobs are released into the fab, pull methods are designed with recognition of capacity limits and the just-in-time production philosophy, among other aspects. Examples of such approaches include Constant WIP (CONWIP) [26], Workload Control [23], [10], Starvation Avoidance [21], Workload Regulation [2], and Linear Control Rules [22]. In general, job releasing strategies or policies are considered of critical importance to improve production performance [10]. Moreover, the Semiconductor Industry Association has recently indicated in [16] the need for additional research, algorithms, and decision methods for lot (job) releasing, especially for high-mix scenarios (i.e., production of several different devices in the same fab). The minimization of cycle-time as well as of inventory levels are common objectives for the above mentioned methods and for SMS in general [2], [4], [5]. In this paper the optimization seeks to minimize the costs derived from inventory levels while maximizing profits (i.e., maximizing throughput and minimizing cycle-time). As a result, an optimal policy is obtained for both the job sequencing and releasing control problems. This policy presents a myopic behavior, and is given by two indexes that depend on the inventory costs, profits, system parameters, and the discount factor. Moreover, when no profits are obtained from completing jobs, the policy presents Blackwell optimality characteristics. In addition, the optimal policy reflects the effect of discounted and undiscounted profits during state transition intervals. Thus, when profits are undiscounted during state transition intervals, either higher inventory costs or higher inventory levels are allowed before switching control decisions.

The organization of this paper is as follows: section II presents the benchmark RML model with job releasing and sequencing control. The optimization model is presented in section III, and the results as well as two examples are given in section IV. Conclusions are provided in section V.

II. BENCHMARK REENTRANT MANUFACTURING LINE MODEL WITH JOB RELEASING AND SEQUENCING CONTROL

The model presented in this paper is an adapted version of a benchmark RML that has been previously presented in, e.g., [5], [11], [12], [13]. As depicted in Figure 1, we
extended the benchmark RML model by including an order pool and a job releasing control station at the entrance of the manufacturing line. This station receives orders for jobs that are later released, in a FIFO fashion, according to an optimization objective.
Fig. 1. Benchmark Reentrant Manufacturing Line with job releasing and sequencing control.
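To fix notation for the sketches that follow, the snippet below collects the parameters and the buffer-level state of the system in Figure 1 into small Python containers. The names RMLParams, State, is_valid, and all numerical values are illustrative assumptions introduced here; they are not part of the benchmark specification.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class RMLParams:
    """Illustrative parameters for the system in Figure 1 (values are arbitrary)."""
    lam: float = 1.0      # arrival rate of new orders (Poisson)
    mu_R: float = 2.0     # job releasing station rate
    mu_1: float = 3.0     # Station 1 rate when serving buffer 1
    mu_2: float = 2.5     # Station 2 rate (buffer 2)
    mu_3: float = 3.5     # Station 1 rate when serving buffer 3
    L: Tuple[int, int, int, int] = (5, 5, 5, 5)   # capacities (L_w, L_i, L_j, L_l)

# A state is the tuple of buffer levels s = (w, i, j, l); a buffer that has
# reached its capacity L_xi blocks any further job from entering it.
State = Tuple[int, int, int, int]

def is_valid(s: State, p: RMLParams) -> bool:
    """True if every buffer level lies in {0, ..., L_xi}."""
    return all(0 <= level <= cap for level, cap in zip(s, p.L))
```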
Similarly to the model presented in [13], the system in Figure 1 corresponds to a Semi-Markov Decision Process (SMDP) with a continuous-time Markov chain where the state is given by the tuple s(t) := (w(t), i(t), j(t), l(t)) corresponding to the buffer levels at time t, with s(t) ∈ S, where S := {(w, i, j, l) | 0 ≤ ξ ≤ Lξ < +∞, with ξ, Lξ ∈ Z∗ and ξ = w, i, j, l} is the state space and Z∗ := Z+ ∪ {0}. Thus, the dimension of S is determined by the finite buffer capacities Lξ, with ξ = w, i, j, l, respectively. If a buffer reaches its maximum capacity Lξ, then a blocking mechanism is activated and no jobs are allowed to be received in that buffer. The production sequence of the system in Figure 1 is as follows: new order arrival → Buffer 0 (order pool), job releasing station → Buffer 1, Station 1 → Buffer 2, Station 2 → Buffer 3, Station 1 → Out (job completed). The processing times at each station are exponentially distributed with means 1/µR, 1/µ1, 1/µ2, and 1/µ3, respectively. Moreover, jobs waiting for service in buffers 1 and 3 are served at rates µ1 and µ3, respectively. Similarly to the research on workload and job release strategies presented in [27], [28], we model the arrival of new orders as a Poisson process with a mean time between arrivals of 1/λ. For this system, control decisions deal with both job releasing and sequencing, into and inside the benchmark RML, respectively. In the former task, when uR = 1 an order is taken from the pool and converted into an effective job that is released into the RML. If uR = 0, then no orders are executed and no jobs are released. In addition, we assume that there is an infinite amount of raw material to cope with the corresponding demand. In the latter task, jobs waiting in buffers 1 and 3 are chosen to be served in Station 1 by selecting us = 1 and us = 0, respectively. Therefore, the control of the system
is defined as a vector u := [uR us] with u ∈ U, U := UR(s) × Us(s), uR ∈ UR(s) ⊆ UR, and us ∈ Us(s) ⊆ Us, where UR := {0, 1}, Us := {0, 1}, and UR(s), Us(s) are constraints for the control actions uR and us, respectively, given s ∈ S. In particular, UR(w, i, j, l) := {0} if w = 0 or i = Li; i.e., the control uR is constrained by the capacity of buffer 1 as well as by the availability of new orders in buffer 0.

III. OPTIMIZATION MODEL: INFINITE HORIZON DISCOUNTED COST

The optimization model considers the minimization of an infinite horizon discounted cost, which is defined as follows:

Definition 1: Given a discount factor β > 0, with β ∈ R, then

J_β^π(s_0) := lim_{N→∞} E_π { ∫_0^{t_N} e^{−βt} g(s(t), u(t)) dt | s(0) = s_0 },   (1)
is the β-discounted cost under policy π ∈ Πad, where Πad is the set of admissible policies and t_N is the time of the N-th state transition. In addition, g(s(t), u(t)) is the continuous-time one-stage cost function, s_0 is the initial state, s(t) ∈ S, and u(t) ∈ U. The optimal β-discounted cost is defined as J_β^∗(s_0) := min_π J_β^π(s_0). Moreover, if J_β^{π∗}(s_0) = J_β^∗(s_0), then π∗ ∈ Πad is said to be an optimal policy.

As mentioned in section II, the model of the RML corresponds to a continuous-time Markov chain. Thus, a uniformization procedure [15], [25] is performed to obtain a discrete-time and statistically equivalent model which facilitates the analysis of the optimal control problem. The elements of the discrete-time optimization model are defined as follows [15]:

Definition 2: Given a uniform version of the SMDP under the discounted cost criterion (1), then

J_α^π(s) := lim_{N→∞} E_π { Σ_{k=0}^{N} α^k g̃(s_k, u_k) | s_0 = s }   (2)
is the α-discounted cost under policy π ∈ Πad, where

α := ν/(β + ν),  and  g̃(s, u) := g(s, u)/(β + ν) + ĝ(s, u),   (3)

are, respectively, the discount factor and the one-stage cost function for the discrete-time model. In (3), the cost function ĝ(s, u) is utilized to model situations where a cost, which is independent of the length of the state transition interval, is imposed at the moment of applying control u at state s [15]. In addition, s_k ∈ S and u_k ∈ U are the state and control at the k-th state transition, ν is the uniform transition rate, with ν ≥ ν_s(u) for all s ∈ S, u ∈ U, and ν_s(u) is the rate of transition [15] associated to state s and control u. For the RML system described in section II, ν is defined as follows:

ν := λ + µR + µ1 + µ2 + µ3.   (4)

The optimal α-discounted cost J_α^∗(s), with s ∈ S, is defined as J_α^∗(s) := min_π J_α^π(s), and from Definition 1 and the uniformization procedure, we have that J_α^∗(s) = J_β^∗(s).
Thus, a policy which is optimal for the discrete-time system is also optimal for the continuous-time system. Therefore, the optimal control analysis is performed with the corresponding discrete-time Bellman's optimality equation:

J_α^∗(s) = min_{u∈U} E { g̃(s, u) + α J_α^∗(f(s, u)) },   (5)
where α is the discount factor, with 0 < α < 1, and the next state function is given by f (s, u), with s ∈ S, and u ∈ U. As indicated in [5], [15], if {τn } is the sequence of times where the continuous-time Markov chain is sampled with τ0 = 0, then each time instant where the system changes its state is considered a sample time in the uniformized version. Thus, it is assumed that the control policy does not change during the interval [τn , τn+1 ). These types of policies are defined as non-interruptive [5]. Similarly, the job sequencing control problem is limited to non-idling policies [5], [12] for which a station is not permitted to remain idle if at least one buffer has one or more jobs to be processed. Figure 2 depicts the state transitions diagram for the uniformized version of the continuous-time Markov chain of the benchmark RML with job releasing and sequencing control.
[Figure 2 depicts the uniformized one-step transitions from a state s: to Rs with probability uR µR/ν, to As with probability λ/ν, to B1s with probability us µ1/ν, to B3s with probability (1 − us) µ3/ν, to B2s with probability µ2/ν, and a self-transition with probability η = 1 − [λ + µR uR + µ1 us + µ3 (1 − us) + µ2]/ν.]
Fig. 2. State transitions diagram for uniformized version of the continuoustime Markov chain associated to the benchmark RML with job releasing and sequencing control.
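As a concrete reading of Figure 2, the sketch below computes the one-step event probabilities of the uniformized chain and the discrete-time discount factor α = ν/(β + ν). It reuses the hypothetical RMLParams container sketched earlier; it is an illustration of the construction, not code from the paper.

```python
def uniformized_step(u_R, u_s, p, beta=0.1):
    """Event probabilities of the uniformized chain in Figure 2, plus alpha.

    u_R, u_s in {0, 1} are the releasing and sequencing controls; the uniform
    rate is nu = lam + mu_R + mu_1 + mu_2 + mu_3, as in eq. (4)."""
    nu = p.lam + p.mu_R + p.mu_1 + p.mu_2 + p.mu_3
    probs = {
        "A  (new order arrives)": p.lam / nu,
        "R  (job released)":      u_R * p.mu_R / nu,
        "B1 (buffer 1 served)":   u_s * p.mu_1 / nu,
        "B3 (buffer 3 served)":   (1 - u_s) * p.mu_3 / nu,
        "B2 (buffer 2 served)":   p.mu_2 / nu,
    }
    probs["self-loop (eta)"] = 1.0 - sum(probs.values())   # fictitious transition
    alpha = nu / (beta + nu)                                # discrete-time discount factor
    return probs, alpha

# The probabilities sum to one for any control pair, and 0 < alpha < 1.
probs, alpha = uniformized_step(u_R=1, u_s=0, p=RMLParams())
assert abs(sum(probs.values()) - 1.0) < 1e-12 and 0.0 < alpha < 1.0
```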
In Figure 2, R, A, B1, B2 and B3 are mappings from S to S [29], as follows:
• Rs = (w − 1, i + 1, j, l),
• As = (min{w + 1, Lw}, i, j, l),
• B1s = (w, (i − 1_j)^+, min{j + 1_{i>0}, Lj}, l),
• B2s = (w, i, (j − 1_l)^+, min{l + 1_{j>0}, Ll}), and
• B3s = (w, i, j, (l − 1)^+),
where (·)^+ := max(·, 0), 1_ξ := 1(ξ < Lξ), 1_{ξ>0} := 1(ξ > 0), and 1(·) is the indicator function. In general, the Bellman's equation in (5) can be expressed in the following way:

J_α^∗(s) = (1/(β + ν)) min_{u∈U} [ g(s, u) + ĝ(s, u)(β + ν) + ν Σ_{s'} p̃(u)_{ss'} J_α^∗(s') ],   (6)

where s, s' ∈ S, s' represents the next state, with s' = s(τ_{n+1}) = f(s, u), and p̃(u)_{ss'} = P{s' | s, u} is the conditional transition probability for the uniform version of the continuous-time Markov chain.
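A direct Python rendering of the five mappings is sketched below; the clipping at the capacities L_ξ and the (·)^+ truncation follow the definitions above, and the RMLParams/State containers are the hypothetical ones introduced earlier.

```python
def next_states(s, p):
    """Next-state mappings R, A, B1, B2, B3 of the benchmark RML (Figure 2)."""
    w, i, j, l = s
    Lw, Li, Lj, Ll = p.L
    pos = lambda x: max(x, 0)                        # (.)^+ := max(., 0)
    return {
        "R":  (w - 1, i + 1, j, l),                  # release an order into buffer 1
        "A":  (min(w + 1, Lw), i, j, l),             # order arrival, blocked at L_w
        "B1": (w, pos(i - (1 if j < Lj else 0)),     # serve buffer 1 unless buffer 2 is full
               min(j + (1 if i > 0 else 0), Lj), l),
        "B2": (w, i, pos(j - (1 if l < Ll else 0)),  # serve buffer 2 unless buffer 3 is full
               min(l + (1 if j > 0 else 0), Ll)),
        "B3": (w, i, j, pos(l - 1)),                 # serve buffer 3: a job leaves the system
    }

# Example: from s = (2, 1, 0, 3), serving buffer 3 yields (2, 1, 0, 2).
assert next_states((2, 1, 0, 3), RMLParams())["B3"] == (2, 1, 0, 2)
```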
IV. OPTIMAL JOB RELEASING AND SEQUENCING POLICY

This section presents the main result of this paper, which is an optimal job releasing and sequencing policy for the benchmark RML with finite capacity buffers. We also provide two examples at the end of this section that consider a linear and a quadratic one-stage cost function, respectively. Before presenting the results, we first give several definitions and assumptions utilized in this section.
A. Definitions and assumptions

As mentioned before, we assume that work starvation is avoided in the benchmark RML; therefore, there is a set of states for which a non-idling stationary policy is applied for the job sequencing task. For the benchmark RML system, this policy is defined as follows:

Definition 3: In the benchmark RML, a stationary job sequencing policy π_NI = {π, π, π, ...}, where π : S → Us, is non-idling if π(w, i, j, l) := 1 for all w ≥ 0, i ≥ 0, j ≥ 0, l = 0, and π(w, i, j, l) := 0 for all w ≥ 0, i = 0, j ≥ 0, l > 0. In addition, the set of states for which the non-idling policy is applied is defined as S_NI. Thus, if (w, i, j, l) ∈ S_NI, then Us(w, i, j, l) = {1} ∀ (w ≥ 0, i > 0, j ≥ 0, l = 0), and Us(w, i, j, l) = {0} ∀ (w ≥ 0, i = 0, j ≥ 0, l ≥ 0). The subset of states for which a decision has to be made from the Bellman's optimality equation is defined as S̄_NI := {(w, i, j, l) ∈ S | w ≥ 0, i > 0, j ≥ 0, l > 0}, S̄_NI ⊆ S.

We also define the optimization objective in terms of two performance indexes commonly utilized in SMS [5]: inventory cost (i.e., WIP or holding costs) and throughput. Thus, we have the following definition:

Definition 4 (Production Optimization Objective): The optimization objective in the control of job releasing and sequencing in the benchmark RML is defined as the trade-off between minimizing the inventory cost and maximizing the profits obtained from completing jobs (i.e., maximizing throughput) under the framework of the infinite horizon discounted cost criterion given in Definitions 1 and 2.

The problem of maximizing throughput (i.e., maximizing profits from completing jobs), which is equivalent to minimizing the cycle-time of the jobs, can be considered in the optimization problem under two different perspectives: either as profits discounted during the duration of state transition intervals, or as profits that are not discounted during such intervals of time. On the one hand, when profits obtained from completing jobs are assumed to be constant during the state transition interval, then a nondiscounted profit approach is utilized. This situation can be modeled in the one-stage cost function (3) by defining an appropriate function ĝ(s, u). On the other hand, when profits obtained are considered to be continuously discounted in time, then a model with discounted profits during state transition intervals is applied. In that case, ĝ(s, u) := 0 in (3). Thus, the following assumption and definition follow:

Assumption 1: Profits obtained per job completed in the benchmark RML are assumed to be either discounted or not during state transition intervals, but not both.

Next, we provide a more specific definition of the one-stage cost function g̃(s, u) considering the previous definition and assumption.

Definition 5: Given s ∈ S and u ∈ U, then
a) If profits are discounted during state transition intervals, then from (3) ĝ(s, u) := 0 and

g̃(s, u) := (1/(β + ν)) ( g(s) − p · 1(JC) ),   (7)

where g(s) represents the inventory cost per unit time, which is independent of the control u, p is the profit received from completing a job, with p ≥ 0, p ∈ R, and JC := {event: one job completed}, which occurs with probability (see Figure 2)

P{ 1(JC) = 1 | s } = (µ3/ν) (1 − us(s)).

b) If profits are not discounted during state transition intervals, then

g̃(s, u) := g(s)/(β + ν) + ĝ(s, u),   (8)

where

ĝ(s, u) := −p · (1 − us(s)).   (9)
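The expected value of the one-stage cost in Definition 5 can be written down directly, as in the sketch below; g is a user-supplied inventory cost function, p_profit plays the role of p, and the RMLParams container is the hypothetical one assumed earlier. This is a sketch under those assumptions, not code from the paper.

```python
def expected_stage_cost(s, u_s, g, p_profit, d, p, beta=0.1):
    """Expected discrete-time one-stage cost E[g~(s, u)] from Definition 5.

    d = 1: profits discounted within transition intervals, eq. (7);
    d = 0: profits not discounted, eqs. (8)-(9)."""
    nu = p.lam + p.mu_R + p.mu_1 + p.mu_2 + p.mu_3
    if d == 1:
        # E[(g(s) - p * 1(JC)) / (beta + nu)], with P{1(JC) = 1 | s} = (mu_3/nu)(1 - u_s)
        return (g(s) - p_profit * (p.mu_3 / nu) * (1 - u_s)) / (beta + nu)
    # g(s)/(beta + nu) + g_hat(s, u), with g_hat(s, u) = -p * (1 - u_s)
    return g(s) / (beta + nu) - p_profit * (1 - u_s)

# Usage with an arbitrary linear inventory cost g(w, i, j, l) = c_w*w + c_i*i + c_j*j + c_l*l.
linear_g = lambda s: 1.0 * s[0] + 2.0 * s[1] + 1.5 * s[2] + 0.5 * s[3]
cost = expected_stage_cost((2, 1, 0, 3), u_s=0, g=linear_g, p_profit=4.0, d=1, p=RMLParams())
```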
The following definition and assumption are related to the structure of the one-stage inventory cost g(s).

Definition 6: The componentwise partial order on S, denoted "≤cw", is defined as follows: for any v = (v1, v2, v3, v4), r = (r1, r2, r3, r4) ∈ S, we say that r ≤cw v iff rq ≤ vq for q = 1, 2, 3, 4. Consider m : S → R, such that m(s) ≥ 0 ∀ s ∈ S. If m(r) ≤ m(v) for any r, v ∈ S with r ≤cw v, then m(·) is said to be non-negative and monotonically nondecreasing with respect to the componentwise partial order "≤cw".

Assumption 2: The one-stage inventory cost function g(s) is non-negative and monotonically nondecreasing with respect to the usual componentwise partial order "≤cw" for all s ∈ S.

B. Results

This section presents the optimal policy for the problem of job releasing and sequencing in the benchmark RML. Before presenting the main result, however, we provide other key results and a definition utilized in the derivation of the optimal policy. First, given s ∈ S and u ∈ U, the Bellman's optimality equation for the job releasing and sequencing problem of the benchmark RML, according to (6) and the state transition probabilities depicted in Figure 2, is as follows:
J_α^∗(s) = min_{u∈U} (1/(β + ν)) [ g(s) − p φ(d)(1 − us(s)) + λ J_α^∗(As) + µR uR(s) J_α^∗(Rs) + µ1 us(s) J_α^∗(B1s) + µ3 (1 − us(s)) J_α^∗(B3s) + µ2 J_α^∗(B2s) + µR (1 − uR(s)) J_α^∗(s) + µ3 us(s) J_α^∗(s) + µ1 (1 − us(s)) J_α^∗(s) ],  ∀ s ∈ S, d ∈ {0, 1},   (10)
where J_α^∗(s) is the α-discounted optimal cost for s ∈ S, and φ(d) is given according to the type of discounting applied to the profits described in Definition 5, by selecting d ∈ {0, 1} as follows:

φ(d) := µ3 d + (1 − d)(β + ν).   (11)

Then, d = 1 corresponds to the case of profits discounted during state transition intervals, and d = 0 to undiscounted profits in such intervals. In addition, notice that φ(0) ≥ φ(1). The Bellman's optimality equation in (10) can also be rewritten as follows:

J_α^∗(s) = (1/(β + ν)) [ g(s) − p φ(d) + λ J_α^∗(As) + µ1 J_α^∗(s) + µ2 J_α^∗(B2s) + µ3 J_α^∗(B3s) + µR J_α^∗(s) + min_{u∈U} { µR uR · ∆_R(s) + us · ∆_s^d(s) } ],  ∀ s ∈ S, d ∈ {0, 1},   (12)

where

∆_R(s) := J_α^∗(Rs) − J_α^∗(s),   (13)

∆_s^d(s) := ∆_s(s) + p φ(d),   (14)

and

∆_s(s) := µ1 [J_α^∗(B1s) − J_α^∗(s)] − µ3 [J_α^∗(B3s) − J_α^∗(s)].   (15)

The next lemma provides the optimality conditions for the job releasing and sequencing problem.

Lemma 1 (Optimality Conditions): Given s ∈ S, then the following are the optimality conditions for both releasing a new job into the benchmark RML and for sequencing the jobs in buffers 1 and 3:
a) It is optimal to release a new job into the benchmark RML (i.e., u∗R = 1) if ∆_R(s) < 0; otherwise it is optimal not to release a job (i.e., u∗R = 0).
b) It is optimal to serve buffer 3 (i.e., u∗s = 0) if ∆_s^d(s) ≥ 0; otherwise it is optimal to serve buffer 1 (i.e., u∗s = 1).

Proof: Follows directly from the Bellman's optimality equation: the right side of (12) is minimized by selecting uR and us according to the signs of ∆_R(s) and ∆_s^d(s), respectively. Notice that, without loss of optimality, we arbitrarily assign the action u∗s(s) = 0 (u∗R(s) = 0) if ∆_s^d(s) = 0 (∆_R(s) = 0). That is, if ∆_s^d(s) = 0 (∆_R(s) = 0), any control action in Us (UR) is equally optimal.

Lemma 2 (Monotonicity of J_α^∗(s)): Let Assumption 2 hold. Then the optimal α-discounted cost J_α^∗(s) is monotonically nondecreasing w.r.t. the componentwise partial order for all s ∈ S.

Proof: The proof follows by induction and value iteration: a sequence of functions J_k(s) is generated from (10) for k = 0, 1, 2, ..., such that J_0(s) := 0 ∀ s ∈ S and lim_{k→∞} J_k(s) = J_α^∗(s) ∀ s ∈ S. Then, it can be proved that J_k(s+) ≥ J_k(s) ∀ k ≥ 0 and ∀ s, s+ ∈ S such that s ≤cw s+. Finally, the result follows as k → ∞.

The following is the main result of this paper, which presents the optimal job sequencing and releasing policy for the benchmark RML.

Theorem 1 (Optimal Policy): For the problem of job releasing and sequencing in the benchmark RML, if s ∈ S̄_NI and d ∈ {0, 1}, then it is optimal to serve buffer 3 if and only if ∆^g(s) + p · φ(d) · (β + ν) ≥ 0, where

∆^g(s) := µ1 [g(B1s) − g(s)] − µ3 [g(B3s) − g(s)], ∀ s ∈ S,   (16)

and, given s ∈ S, it is optimal to release a new job into the benchmark RML if and only if ∆^g_R(s) < 0, where

∆^g_R(s) := g(Rs) − g(s), ∀ s ∈ S.   (17)

Proof: The proof follows by showing that ∆^g(s) + p · φ(d) · (β + ν) ≥ 0 ⇔ ∆_s^d(s) ≥ 0 and ∆^g_R(s) ≥ 0 ⇔ ∆_R(s) ≥ 0 ∀ s ∈ S. Similarly to the proof of Lemma 2, the result can be proved by induction and value iteration by showing that ∆^g(s) + p · φ(d) · (β + ν) ≥ 0 ⇔ ∆_{s,k}^d(s) ≥ 0 and ∆^g_R(s) < 0 ⇔ ∆_{R,k}(s) < 0 for all k ≥ 0 and for all s ∈ S, where ∆_{s,k}^d(s) and ∆_{R,k}(s) are obtained by value iteration from (12), (14), and (13), so the result follows as k → ∞ given that lim_{k→∞} ∆_{s,k}^d(s) = ∆_s^d(s) and lim_{k→∞} ∆_{R,k}(s) = ∆_R(s). The result of Lemma 2 is also utilized in the proof.

The optimal policy in Theorem 1 presents a myopic behavior, i.e., the optimal decision is determined by the control action that leads to the next possible state with minimum one-stage cost. In addition, the optimal job releasing and sequencing policy (for p = 0) does not change for any β ∈ (0, +∞), as it does not depend on β. Thus, from the definition of a Blackwell optimal policy [15, pp. 201], [14, pp. 494], we have the following corollary.

Corollary 1: The optimal job releasing and sequencing policy (for p = 0) is Blackwell optimal; therefore, it is average cost (AC) optimal within the class of all stationary policies.

C. Examples

The following examples present the optimal job sequencing and releasing policies when a linear and a quadratic one-stage cost function g(s) are considered, respectively.

Example 1 (Linear Cost): Consider the benchmark RML with job releasing and sequencing control, and the one-stage inventory cost function g(w, i, j, l) = cw w + ci i + cj j + cl l, for which Assumption 2 holds and cw, ci, cj, and cl are nonnegative cost coefficients. From Theorem 1 the optimal job sequencing policy is as follows: consider s ∈ S̄_NI ⊆ S; then it is optimal to serve buffer 3 (i.e., u∗s = 0) iff µ1 cj + µ3 cl + p · φ(d) · (β + ν) ≥ µ1 ci. This policy is similar to the well-known µc-rule, where the products µc represent savings in waiting cost per unit average service time [15]. When such savings for serving buffer 3, along with the proportion of profits for completing a job, become larger than those obtained from serving buffer 1, then a higher priority is given to the control action of serving buffer 3. Notice that the proportion of profits is larger when d = 0. Similarly, from Theorem 1 and given s ∈ S s.t. w > 0, i < Li, it is optimal to release a new job into the RML iff cw > ci. That is, it is optimal to release new jobs as
long as the holding costs in buffer 0 are higher than those in buffer 1.

Example 2 (Quadratic Cost): In this example consider the one-stage inventory cost function g(w, i, j, l) = cw w² + ci i² + cj j² + cl l². As in Example 1, Assumption 2 holds and cw, ci, cj, and cl are nonnegative cost coefficients. If (w, i, j, l) ∈ S̄_NI ⊆ S, then from Theorem 1 we have that for j = Lj (i.e., buffer 2 is full) it is optimal to serve buffer 3, and for j < Lj it is optimal to serve buffer 3 iff

(1/(2µ1 ci)) [ 2µ1 cj j + 2µ3 cl l + δ(p, d) ] ≥ i,   (18)

where δ(p, d) := µ1 ci + µ1 cj − µ3 cl + p · φ(d) · (β + ν), and δ(p, 0) ≥ δ(p, 1) for all p ≥ 0. The optimal job sequencing policy tends to give priority to serving buffer 3 when the level of inventory in buffer 2 or 3 grows. Similarly, when profits are not discounted during state transition intervals (d = 0), buffer 3 receives more priority to be served. That is, the proportion of profits accumulated in time becomes larger when d = 0, so higher inventory levels (i.e., costs) are allowed in buffer 1. On the other hand, from Theorem 1, and given s ∈ S s.t. w > 0, i < Li, it is optimal to release a new job (i.e., u∗R = 1) into the benchmark RML iff

w > (ci/cw) i + ci/(2cw) + 1/2.   (19)

In this case the optimal policy tends to give priority to the action of releasing new jobs into the RML when both the inventory level and the costs in the order pool increase.
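A small sketch that evaluates the Theorem 1 indexes numerically is given below; it reuses the hypothetical next_states/RMLParams helpers sketched earlier, takes a generic inventory cost g, and can be used to cross-check the closed-form thresholds (18) and (19) of Example 2. It is an illustration of the stated policy, not code from the paper.

```python
def phi(d, p, beta):
    """phi(d) = mu_3 * d + (1 - d)(beta + nu), eq. (11)."""
    nu = p.lam + p.mu_R + p.mu_1 + p.mu_2 + p.mu_3
    return p.mu_3 * d + (1 - d) * (beta + nu)

def theorem1_actions(s, g, p_profit, d, p, beta=0.1):
    """Optimal (u_R, u_s) given by Theorem 1 for a state with i > 0 and l > 0."""
    nu = p.lam + p.mu_R + p.mu_1 + p.mu_2 + p.mu_3
    nxt = next_states(s, p)
    # Sequencing index, eq. (16): serve buffer 3 (u_s = 0) iff the index is nonnegative.
    delta_g = p.mu_1 * (g(nxt["B1"]) - g(s)) - p.mu_3 * (g(nxt["B3"]) - g(s))
    u_s = 0 if delta_g + p_profit * phi(d, p, beta) * (beta + nu) >= 0 else 1
    # Releasing index, eq. (17): release (u_R = 1) iff g(Rs) - g(s) < 0, provided an
    # order is waiting (w > 0) and buffer 1 is not full (i < L_i).
    w, i = s[0], s[1]
    releasable = (w > 0) and (i < p.L[1])
    u_R = 1 if releasable and (g(nxt["R"]) - g(s) < 0) else 0
    return u_R, u_s

# Cross-check against Example 2: quadratic cost g(w, i, j, l) = c_w*w^2 + ... + c_l*l^2.
quad_g = lambda s: 1.0 * s[0]**2 + 2.0 * s[1]**2 + 1.5 * s[2]**2 + 0.5 * s[3]**2
print(theorem1_actions((3, 1, 2, 2), g=quad_g, p_profit=0.0, d=0, p=RMLParams()))
```

With these illustrative coefficients the printed actions agree with the thresholds (18) and (19) evaluated directly, which is the intended use of the sketch.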
V. CONCLUSIONS

This paper presented the solution for the optimal job releasing and sequencing problem in an adapted version of a benchmark RML with finite capacity buffers, under the framework of an infinite horizon discounted cost criterion. For this problem the objective was the maximization of throughput (i.e., minimization of cycle-time) while minimizing inventory costs. The resulting optimal policy is composed of two parts: a job releasing and a job sequencing policy. The policy is defined by two indexes which depend on the inventory cost, profits per job completed, system parameters, and discount factor. Moreover, when no profits are obtained from completing jobs, the policy presents Blackwell optimality characteristics. In addition, the policy reflects the effect of discounted and undiscounted profits during state transition intervals.

REFERENCES

[1] R. Uzsoy, C. Lee, and L. A. Martin-Vega, "A review of production planning and scheduling models in the semiconductor industry part II: Shop-floor control," IIE Transactions, vol. 26, no. 6, pp. 44–55, 1994.
[2] L. M. Wein, "Scheduling semiconductor wafer fabrication," IEEE Transactions on Semiconductor Manufacturing, vol. 1, pp. 115–130, 1988.
[3] P. R. Kumar, "Re-entrant lines," Queueing Systems: Theory and Applications, vol. 13, pp. 87–110, 1993.
[4] C. H. Lu, D. Ramaswamy, and P. R. Kumar, "Efficient scheduling policies to reduce mean and variance of cycle-time in semiconductor manufacturing plants," IEEE Transactions on Semiconductor Manufacturing, vol. 7, no. 3, pp. 374–388, 1994.
[5] S. Kumar and P. R. Kumar, "Queueing network models in the design and analysis of semiconductor wafer fabs," IEEE Transactions on Robotics and Automation, vol. 17, no. 5, pp. 548–561, 2001.
[6] J. D. Plummer, M. D. Deal, and P. B. Griffin, Silicon VLSI Technology. Englewood Cliffs, NJ: Prentice-Hall, 2000.
[7] C. H. Papadimitriou and J. N. Tsitsiklis, "The complexity of optimal queueing network control," Mathematics of Operations Research, vol. 24, no. 2, pp. 293–305, 1999.
[8] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
[9] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[10] J. W. Fowler, G. L. Hogg, and S. J. Mason, "Workload control in the semiconductor industry," Production Planning & Control, vol. 13, no. 7, pp. 568–578, 2002.
[11] J.-B. Suk and C. G. Cassandras, "Optimal control of a storage-retrieval queuing system," in Proceedings of the 28th IEEE Conference on Decision and Control, 1989, pp. 1093–1098.
[12] R.-R. Chen and S. Meyn, "Value iteration and optimization of multiclass queueing networks," Queueing Systems, vol. 32, pp. 65–97, 1999.
[13] J. A. Ramírez-Hernández and E. Fernandez, "A case study in scheduling reentrant manufacturing lines: Optimal and simulation-based approaches," in Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 2005, Seville, Spain, December 12-15, 2005, pp. 2158–2163.
[14] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY: John Wiley & Sons, Inc, 1994, ch. Discounted Markov Decision Problems, pp. 142–266.
[15] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed. Belmont, MA: Athena Scientific, 2000, vol. II.
[16] Semiconductor Industry Association (SIA). (2005) International Technology Road Map for Semiconductors (ITRS) 2005. [Online]. Available: http://public.itrs.net
[17] M. Venables, "Small is beautiful: small low volume semiconductor manufacturing plants," IE Review, pp. 26–27, March 2005.
[18] R. Sturm, J. Dorner, K. Reddig, and J. Seidelmann, "Simulation-based evaluation of the ramp-up behavior of waferfabs," in Advanced Semiconductor Manufacturing Conference and Workshop, March 31–April 1, 2003, pp. 111–117.
[19] J. A. Ramírez-Hernández and E. Fernandez, "Optimal job sequencing control in a benchmark reentrant manufacturing line," July 2006, submitted for publication.
[20] ——, "Optimal job sequencing in a benchmark reentrant line with finite capacity buffers," in Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, July 24-28, 2006, pp. 1214–1219.
[21] C. R. Glassey and M. G. C. Resende, "Closed-loop job release control for VLSI circuit manufacturing," IEEE Transactions on Semiconductor Manufacturing, vol. 1, no. 1, pp. 36–46, 1988.
[22] C. R. Glassey and J. G. Shanthikumar, "Linear control rules for production control of semiconductor fabs," IEEE Transactions on Semiconductor Manufacturing Systems, vol. 9, no. 4, pp. 536–549, 1996.
[23] M. Land and G. Gaalman, "Workload control concepts in job shops: A critical assessment," International Journal of Production Economics, vol. 45-47, pp. 535–548, 1996.
[24] J.-W. Breithaupt, M. Land, and P. Nyhuis, "The workload control concept: theory and practical extensions of load oriented order release," Production Planning & Control, vol. 13, no. 7, pp. 625–638, 2002.
[25] S. A. Lippman, "Applying a new device in the optimization of exponential queuing systems," Operations Research, vol. 23, pp. 687–710, 1975.
[26] M. L. Spearman, D. L. Woodruff, and W. J. Hopp, "CONWIP: a pull alternative to kanban," International Journal of Production Research, vol. 28, no. 5, pp. 879–894, 1990.
[27] L. M. Roderick, D. T. Phillips, and G. L. Hogg, "A comparison of order release strategies in production control systems," International Journal of Production Research, vol. 30, no. 3, pp. 611–626, 1992.
[28] S. T. Enns and M. Prongué Costa, "The effectiveness of input control based on aggregate versus bottleneck work loads," Production Planning & Control, vol. 13, no. 7, pp. 614–624, 2002.
[29] J. Walrand, An Introduction to Queueing Networks. Englewood Cliffs, NJ: Prentice Hall, 1988.