
Data Mining Approach to Production Control in the Computer Integrated Testing Cell

Choonjong Kwak and Yuehwern Yih

Abstract—This paper presents a data mining-based production control approach for the testing and rework cell in a dynamic computer integrated manufacturing system. The proposed Competitive Decision Selector (CDS) observes the status of the system and jobs at every decision point and makes its decision on job preemption and dispatching rules in real time. The CDS, equipped with two algorithms, combines two different knowledge sources: (1) the long-run performance and (2) the short-term performance of each rule on the various status of the system. The short-term performance information is mined by a data-mining approach from large-scale training data generated by simulation with data partition. A decision tree-based module generates classification rules on each partitioned data set that are suitable for interpretation and verification by users and stores the rules in the CDS knowledge bases. Experimental results show that the CDS dynamic control is better than other common control rules with respect to the number of tardy jobs.

Index Terms—Data Mining, Dispatching Rule, Preemption, Production Control, Rework, Scheduling.

Manuscript received November 7, 2001. This work was supported in part by the National Science Foundation under NSF Young Investigator Award (9358158-DMI) and the Purdue Research Foundation. Choonjong Kwak is a graduate student at Purdue University, West Lafayette, IN 47907 USA (e-mail: [email protected]). Yuehwern Yih is with Purdue University, West Lafayette, IN 47907 USA (corresponding author; phone: 765-494-0826; fax: 765-494-1299; e-mail: [email protected]).

I. INTRODUCTION

Modern distributed control and computerized data logging systems collect large volumes of data in real time by using bar codes, sensors, and integrated vision systems in computer integrated manufacturing environments. The data may contain valuable information for operation and control strategies as well as on normal and abnormal operational patterns. However, the use of the accumulated data has been limited, which has led to the “rich data but poor information” problem [36]. Data mining is the application of specific algorithms for fitting models to, or extracting patterns from, observed data; it is one step of Knowledge Discovery in Databases (KDD), the overall process of discovering useful and understandable knowledge from data [8]. The other steps in the KDD process include data preparation, data selection, data cleaning, data transformation, and proper

interpretation and evaluation of mined results. The discovered knowledge can be gleaned from different aspects, and large databases can thereby serve as rich sources for required knowledge generation and verification. Researchers in various fields including database systems, artificial intelligence, machine learning, knowledge-base systems, and statistics have recently explored data mining. A common problem in data mining is data classification [9]. Data classification uses observations on a set of features to categorize instances into different classes. It has also been an important theme in statistics, machine learning, and expert systems.

Wehenkel [37] applied machine learning methods to a power-system security problem in which he screened diverse simulation scenarios of a system to yield a large database. By using data mining techniques, he extracted synthetic information on the simulated system’s main features from the database. Lane and Brodley [17] proposed a data-mining system that monitors a single user’s computer or account and develops a profile of that user’s typical behavior. The proposed system detects anomalies as deviations from the known and expected patterns. Wang and McGreavy [36] presented a data mining system that automatically clusters simulated data into classes corresponding to various operational modes for process plant operation and control. Data were artificially generated in these studies, due to the difficulties of data collection (particularly from important abnormal conditions) in power system, process plant, and computer security problems. McDonald [21], Duvivier [7], and Perfector et al. [26] addressed yield analysis and/or diagnosis problems in semiconductor production environments by using a variety of data mining techniques. Thornhill et al. [34] dealt with a method for automated data mining of large historical data sets for statistical quality control. Despite the popularity of data mining, few results have been reported for production control in discrete manufacturing environments.

Production control in dynamic manufacturing systems is a very challenging problem. A variety of reasons often make rush orders routine [20]. The system operates under a high degree of uncertainty. Constant design changes exacerbate the problem. Cost is often sacrificed to obtain faster service. Job preemption can benefit rush and priority orders; however, it has received less attention in the literature than dispatching rules, particularly in real-time settings [4], [10]. Robb and Rohleder [29] argued that the lack of realism in static and deterministic representations of real environments exaggerated the gap between theory and practice and supported dynamic modeling and real-time heuristics.

Dispatching rules are very effective for shop floor control, and most shop floor control approaches in fact adopt dispatching rules to decide which job should be chosen next when a machine is available [35], [15]. A number of dispatching rules have been studied over the decades. Park et al. [23] used an inductive learning methodology to choose an appropriate dispatching rule at each dispatching point based on the observed pattern of six system parameters and showed the superiority of their proposed approach over the repeated applications of single dispatching rules. As the authors pointed out, however, a possible problem in their approach is that the appropriateness of a dispatching rule at a real-time dispatching point is determined by rule bases that are built on the long-run performance of dispatching rules, although this requires less data. This may not be effective, because the dispatching decisions of the rule bases are made by observing the patterns of the real-time system status. Sun and Yih [32] proposed a neural network-based control system to choose appropriate dispatching rules for machines and a robot based on the real-time system status under multiple objectives. They collected large-scale training data from the whole simulation period and showed the performance of the proposed control system on testing simulation data with different random number streams. However, their neural network-based approach suffers from incomprehensible results, which is the major reason that neural networks are not used for data mining tasks in most cases [19]. Eventual implementation of decision support systems is not likely if their algorithms cannot be easily understood and verified by users [29].

This paper tackles a production control problem, particularly for both job preemption and dispatching, in the testing and rework cell of a dynamic and stochastic CIM system. The proposed Competitive Decision Selector (CDS) observes the status of the system and jobs at every decision point and makes its decision on job preemption and dispatching rules in real time. The CDS is equipped with two algorithms to combine knowledge from two different sources in real time: (1) the long-run performance and (2) the short-term performance of each rule on the different status of the system and jobs. A data mining approach is proposed to obtain symbolic classification rules from large-scale training data generated over the whole simulation period, but ideas are also borrowed from the long-run performance-based induction approach. The data partition concept [33] is introduced in the data mining approach. A decision tree-based module generates symbolic classification rules on each partitioned data set that are suitable for interpretation and verification by human users.

The rest of this paper is organized as follows. The next section states and formulates the production control problem being tackled. In Section III, the proposed methodological framework is presented module by module in each subsection. Section IV explains the simulation test bed that is built to evaluate the proposed approach in dynamic and stochastic environments. The results are presented with statistical analysis in Section V. Finally, general conclusions follow in Section VI.

II. PROBLEM STATEMENT AND FORMULATION

Consider a testing/rework cell with m identical parallel workstations. Jobs come from the previous assembly stage to the single queue of this testing/rework stage. All jobs are served by a service discipline (e.g., on a first-come-first-served basis). Once a workstation gets job j, j = 1, 2, …, n, the job is tested for its testing time $t_j$. A setup time $s_j$ is incurred as a major setup when a tester shifts from one job family to another, and as a minor setup when it shifts from one job type to another within the same family. If job j passes the test, it goes to the next stage. Otherwise, it is reworked for its rework time $r_j$ while staying in the same workstation, so that it can be tested again in the same place with $s_j = 0$ after its rework. It is often desirable in Just in Time (JIT) environments to utilize the skills of multifunction operators in multi-purpose workstations.

Preemption of jobs is allowed at any moment of the testing/rework stage. Once a job is preempted, it is returned to the queue with a service discipline and then either tested or reworked, depending on the job’s stage at the moment of preemption, when it is served again. When the job was in the middle of testing, it is assumed that all the testing progress that had been made on the job is lost and testing has to be done from scratch. When the job was at rework, it is assumed to resume later from the point of preemption. These are reasonable assumptions, because intermediate testing results are stored only temporarily in the electronic or human tester’s memory, while rework is typically done physically on the job. For distinguishing purposes, the former is called interruption while the latter is called timeout in this research. To the best of the authors’ knowledge, no research has considered the two kinds of preemption at the same time. The term timeout is motivated by its similarity to the timeout collaboration protocol in a testing environment with rework [15], [16]. While timeout still collaborates with a service discipline (or, a dispatching rule), it does not involve idle time of workstations, as opposed to the timeout collaboration protocol.

Consider a single relation R with attributes $x_1, x_2, \ldots, x_l, z$ for classification tasks in discrete domains. Each row in the table is called an instance or object. The attributes $x_1, x_2, \ldots, x_l$ are referred to as input features or predictive attributes, and the attribute $z$ represents the category or class of the instance. The classification tasks are part of the CDS function $f_1$ that is defined over a set $X$ of input features. Let $Y$ be the image of $X$ under the mapping $f_1$, that is,

$$Y = \{\, y \in \mathbb{N} \mid y = f_1(\mathbf{x}) \ \text{for some}\ \mathbf{x} \in X \,\} \qquad (1)$$

where decision rule $y \in Y$ and input feature vector $\mathbf{x} = (x_1, x_2, \ldots, x_l)$. Let $|Y| = K$, because this research considers only a finite number of decision rules. Given control function $f_2$ and decision rule $y$, binary variable $z \in Z$ represents the tardy status of a completed job as follows:

$$z = f_2(y) + \varepsilon \qquad (2)$$

where $\varepsilon$ is a random or stochastic term and $z$ is 1 if a completed job is tardy or 0 otherwise.

Production control by using data mining is a very challenging task. While classification tasks typically seek a value of $z$ given input features $x_1, x_2, \ldots, x_l$, production control by data


mining is involved with two mapping functions $f_1$ and $f_2$. Furthermore, the system status of dynamic production control becomes more and more different over time from that of static control, even with the same starting conditions of the system and the same experimental system parameters afterward, because the control function $f_2$ affects the input features $x_1, x_2, \ldots, x_l$ that represent the status of the system and jobs.

Let $e_j$ be the effective processing time of job j for a task, that is, the time a task of job j is completed at a workstation assuming no more preemptions of job j. The effective processing time of job j for testing is

$$e_j = s_j + t_j, \quad \text{for all } j \qquad (3)$$

On the other hand, the effective processing time of job j for rework is

$$e_j = \begin{cases} 0, & \text{if job } j \text{ passes the test,} \\ r_j, & \text{if job } j \text{ is reworked immediately following testing,} \\ r_j - v, & \text{if job } j \text{ is resumed after timeout,} \end{cases} \qquad (4)$$

where job j was previously timeouted after v units of time. The setup works in the form of sequence-independent family setup times as explained earlier. It is assumed that each workstation i can process at most one operation, either testing or rework, at a time. Three 0-1 integer decision variables are defined as

$$\delta_{ij} = \begin{cases} 1, & \text{if job } j \text{ is assigned to workstation } i, \\ 0, & \text{otherwise,} \end{cases} \qquad (5)$$

$$\gamma_{ij} = \begin{cases} 1, & \text{if job } j \text{ is interrupted at workstation } i, \\ 0, & \text{otherwise,} \end{cases} \qquad (6)$$

$$\phi_{ij} = \begin{cases} 1, & \text{if job } j \text{ is timeouted from workstation } i, \\ 0, & \text{otherwise.} \end{cases} \qquad (7)$$

Let $c_j$ and $d_j$ denote the completion time and deadline (or, due date) of job j, respectively. The objective is to minimize the number of tardy jobs as follows:

$$\min \ \sum_{j=1}^{n} \left[ \max\!\left( \frac{c_j - d_j}{|c_j - d_j|},\, 0 \right) \right]. \qquad (8)$$
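The bracketed term in (8) is simply a 0-1 tardiness indicator for job j. A minimal C illustration of that term, under the assumption that a job completed exactly at its deadline counts as on time, is:

/* Illustrative sketch only: the summand of objective (8).
   Returns 1 when job j is tardy (c_j > d_j) and 0 otherwise, which is
   what max((c_j - d_j)/|c_j - d_j|, 0) evaluates to. */
int is_tardy(double c_j, double d_j)
{
    return (c_j > d_j) ? 1 : 0;
}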

When an urgent job arrives, it may be beneficial to preempt one of the current jobs being served, considering the overall objective function. Decision points include each job completion for $\delta_{ij}$ and each job arrival to the queue for $\gamma_{ij}$ and $\phi_{ij}$. If an incoming job j is not urgent at its arrival time, the job is checked again at an appropriate time. It should try to preempt other jobs in the workstations so as not to be late, if it is still not being served at the second decision point for $\gamma_{ij}$ and $\phi_{ij}$.

III. METHODOLOGIES

The hierarchical control architecture in CIM enterprise environments consists of four major levels: factory, cell, workstation, and resources [30]. The plant information system collects and saves the status of the system in the cell and workstation databases in real time by computerized monitoring and reporting devices with bar codes, sensors, and integrated vision systems [6]. Computerized data processing units preprocess the data and deliver the necessary features to the Competitive Decision Selector (CDS). The CDS makes its decision based on collected features that represent the system status and job characteristics and then gives it to controllers. The controllers have three hierarchical levels: plant control system, cell central controller, and local controller; the last two controllers are considered in this research. The controllers implement the CDS decision in the cell and workstations, which affects, in turn, the system status.

The construction process of the CDS consists of four major modules: training data generation, data partition/transformation, rule extraction, and CDS. This research focuses on data selection, data transformation, and proper application and evaluation of mined results in a simulation test bed, as well as data mining, in the overall KDD process. All procedures are iterative in nature. For example, features are selected through an iterative process by a hybrid feature selection approach, as explained in Section III-A. Rule extraction is done by using the decision tree C4.5, which is well known, and its typical process [28] is not explained in detail here. All default parameters of C4.5 are used and its significance test is performed for pruning. The following sections mostly focus on the three other modules to describe how data mining is used for production control.

A. The Training Data Generation Module

Feature extraction refers to the general activities that create new features through transformations or combinations of the original primitive features. Feature selection, on the other hand, is the task of finding the most reasonable subset of features for a classifier to seek fewer features and maximum class separability [14]. Feature selection is a key issue in pattern classification problems, and which feature set is best depends upon the given classification task [2]. Both the filter and wrapper approaches are popular in feature selection [11]. The filter approach selects features independent of the objective function of the problem and ignores the effects of selected features on the performance of the objective function. The wrapper approach takes into account the eventual objective function in its feature selection, but this approach may not be computationally feasible with a large number of features [5]. This research proposes a new hybrid of the two approaches to overcome their limitations in feature selection.

Feature extraction in this research started with collecting initial feature candidates based on the literature [24], [29], [12], [23], [5], because domain-specific knowledge is typically used in this process. However, feature extraction and selection are iterative processes until they find a reasonable set of features that adequately explains a particular problem. Through preliminary experiments, a set of feature candidates was finally extracted in this research as shown in Table I. Since the number of tardy jobs is of interest in this research, more deadline-related statistics and counters were created and added in an effort to intensify deadline-related features, while several flow time-related features were removed from the initial feature candidates. The relative deadline of job j at time t is the difference between its deadline $d_j$ and the current time t, that is, $d_j - t$. A potentially tardy job is a job that is not tardy at the current time t but will most likely be tardy by the deadline $d_j$ considering its expected processing time. Features

V2002-341 were extracted with several considerations. First, relative values were preferred to actual ones to avoid any possible negative effect on performance due to the variation by other factors. For example, relative deadlines were used instead of deadlines because the former cases are time invariant. Second, position information was considered for control purposes to represent the relationship of job status between the queue and servers. When it is known that there are urgent waiting jobs in the queue, control decisions on the servers (i.e., job preemption) may have to be changed. Finally, useful features can often be generated by taking differences, sums, averages, and ratios of primitive features, which as yet is difficult to automate. To represent congestion level, for example, the number of jobs in the system was replaced by the ratio of the number of waiting jobs in the queue to the total number of jobs in the queue and servers. Difference values were used in several cases, when ratio values may have errors (e.g., the value of the denominator is zero). Feature selection is more complicated in this production control problem that consists of two mapping functions. Since the number of the given feature candidates in Table I is still large, the concept of the filter approach is first used to prescreen promising features and then the wrapper approach is used to determine the final set of features. The filter approach sees the performance of classification, that is, classification accuracy, to decide which groups of features should be chosen. Note that, although this method resembles the wrapper approach, the classification accuracy is not the final objective function of this production problem. The forward stepwise group selection that is motivated by forward stepwise selection [22][11][5], is proposed in this research for the filter approach. The forward stepwise group selection considers a group of features instead of an individual feature at each step. The criterion for adding or deleting a group of features is if it improves classification performance. The forward stepwise group selection starts with the empty set of features and either adds or deletes a group of features at each step until it ends with the identification of several good sets of features with similar classification performance. Once several good feature sets are obtained through the filter approach, the wrapper approach is then used to identify the best feature set by checking the performance of the objective function, minimizing the number of tardy jobs. This hybrid heuristic approach is based on an observation from preliminary experiments that the performance of the eventual objective function does not exactly match but is, to some extent, proportional to classification performance. This hybrid approach is effective, when a large number of features are dealt with as in data mining tasks and search cannot be done on line, to compensate for the disadvantage of the wrapper approach. Features can be grouped, for the forward stepwise group selection of the filter approach, by their characteristics such as deadline-related statistics, deadline-related counters, performance-related statistics, differences, average values, etc. as shown in Table I. Table II shows an example of the feature selection process that compares difference and average values

4 starting from the empty set. Congestion level is always included because of its role as a similarity measure in a grouping of the data partition module. Accuracy indicates the classification accuracy within the CDS function f1 on separate test data.
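For illustration only, and not the authors' code, the forward stepwise group selection described above could be sketched in C as follows; the fixed group count and the callback evaluate_accuracy, which stands in for training and testing a classifier (such as C4.5) on the currently selected groups, are assumptions.

/* Hedged sketch of the forward stepwise group selection (filter step).
   Feature groups are greedily added or deleted as long as the resulting
   classification accuracy improves; the congestion-level feature is
   assumed to be always included inside evaluate_accuracy. */
#define GROUPS 4   /* e.g., deadline statistics, deadline counters,
                      performance features, and difference/average values */

double stepwise_group_selection(double (*evaluate_accuracy)(const int sel[GROUPS]),
                                int selected[GROUPS])
{
    for (int g = 0; g < GROUPS; g++)
        selected[g] = 0;                       /* start from the empty set */
    double best = evaluate_accuracy(selected);

    int improved = 1;
    while (improved) {
        improved = 0;
        for (int g = 0; g < GROUPS; g++) {
            selected[g] = !selected[g];        /* try adding or deleting group g */
            double acc = evaluate_accuracy(selected);
            if (acc > best) {
                best = acc;                    /* keep the change */
                improved = 1;
            } else {
                selected[g] = !selected[g];    /* undo it */
            }
        }
    }
    return best;                               /* accuracy of the chosen set */
}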

TABLE I. 36 FEATURE CANDIDATES

Deadline-related Statistics
1: min. relative deadline of jobs in queue
2: avg. relative deadline of jobs in queue
3: min. relative deadline of jobs in servers
4: avg. relative deadline of jobs in servers
5: min. critical ratio of jobs in queue
6: avg. critical ratio of jobs in queue
7: min. critical ratio of jobs in servers
8: avg. critical ratio of jobs in servers
9: min. slack time of jobs in queue
10: avg. slack time of jobs in queue
11: min. slack time of jobs in servers
12: avg. slack time of jobs in servers
13: min. relative deadline of jobs in queue – min. relative deadline of jobs in servers
14: avg. relative deadline of jobs in queue – avg. relative deadline of jobs in servers
15: min. critical ratio of jobs in queue – min. critical ratio of jobs in servers
16: avg. critical ratio of jobs in queue – avg. critical ratio of jobs in servers
17: min. slack time of jobs in queue – min. slack time of jobs in servers
18: avg. slack time of jobs in queue – avg. slack time of jobs in servers

Deadline-related Counters
19: no. of tardy jobs in queue
20: no. of tardy jobs in servers
21: no. of potentially tardy jobs in queue
22: no. of potentially tardy jobs in servers
23: no. of tardy jobs in queue – no. of tardy jobs in servers
24: no. of potentially tardy jobs in queue – no. of potentially tardy jobs in servers

Congestion Level
25: no. of jobs in queue / the number of jobs in queue and servers

Performance-related Features
26: min. current tardiness of jobs in queue
27: avg. current tardiness of jobs in queue
28: max. current tardiness of jobs in queue
29: min. current tardiness of jobs in servers
30: avg. current tardiness of jobs in servers
31: max. current tardiness of jobs in servers
32: mean tardiness in recent 10 jobs
33: no. of tardy jobs in recent 10 jobs / 10
34: min. current tardiness of jobs in queue – min. current tardiness of jobs in servers
35: avg. current tardiness of jobs in queue – avg. current tardiness of jobs in servers
36: max. current tardiness of jobs in queue – max. current tardiness of jobs in servers

Two observations can be drawn from the results in Table II. First, difference values (group a) are worse than average values (group b) with respect to classification performance at this point. Separate job status in the queue and servers seems to give more information than the two combined, because the respective job information in the queue and servers may be buried in the combined features by taking the differences. This is a good example indicating that transformations or combinations of primitive features do not always guarantee better performance and should be done with care. Second, it is shown that more features are not necessarily beneficial for


performance improvement.

TABLE II. AN EXAMPLE OF FEATURE SELECTION

Group | Characteristics of Features                      | Features                                               | Accuracy
a     | Deadline-related Statistics + Congestion Level   | 14, 16, 18, 25                                         | 56.9%
a     | Deadline-related Counters + Congestion Level     | 23, 24, 25                                             | 57.0%
a     | Performance-related Features + Congestion Level  | 35, 25                                                 | 60.7%
a     | All the above four different kinds               | 14, 16, 18, 23, 24, 25, 35                             | 66.0%
b     | Deadline-related Statistics + Congestion Level   | 2, 4, 6, 8, 10, 12, 25                                 | 64.3%
b     | Deadline-related Counters + Congestion Level     | 19, 20, 21, 22, 25                                     | 64.4%
b     | Performance-related Features + Congestion Level  | 27, 30, 32, 33, 25                                     | 73.5%
b     | All the above four different kinds               | 2, 4, 6, 8, 10, 12, 19, 20, 21, 22, 25, 27, 30, 32, 33 | 73.4%

B. The Data Partition/Transformation Module

Data partition by decision rule y is actually made in the module of training data generation. Decision rule y, y = 1, 2, 3, …, K, is applied to each simulation scenario as a static control rule, and the necessary attributes are collected at data collection points and saved in each corresponding file. Initial training data are actually a chronological sequence of x’s and z’s that are collected at the points of job arrival to the queue and job completion, respectively. The initial training data are a collection of training blocks.

DEFINITION: A training block is a matrix with l-dimensional row vectors $\mathbf{x} = (x_1, x_2, \ldots, x_l)$ and $\mathbf{z} = (z, 0, 0, \ldots, 0)$ that is a chronological sequence of consecutive x’s and consecutive z’s, before another sequence of consecutive x’s follows. Thus, training block

$$B = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p, \mathbf{z}_{p+1}, \mathbf{z}_{p+2}, \ldots, \mathbf{z}_{p+q}\}^T \qquad (9)$$

where p is the number of x’s and q is the number of z’s in the block. The minimum training block is $\{\mathbf{x}_1, \mathbf{z}_2\}^T$ with p = q = 1.

The next step of the data partition/transformation module is to transform the given training block B into another matrix B′ in which each instance has the standard data structure of $(\mathbf{x}_t, y_t, \sum z_t)$. The transformed matrix B′ is obtained by the following equation:

$$B' = \begin{bmatrix} \mathbf{x}_1, & y_t, & \sum_{t=p+1}^{p+q} z_t \\ \mathbf{x}_2, & y_t, & \sum_{t=p+1}^{p+q} z_t \\ \vdots & & \vdots \\ \mathbf{x}_p, & y_t, & \sum_{t=p+1}^{p+q} z_t \end{bmatrix} \qquad (10)$$

where $\sum_{t=p+1}^{p+q} z_t$ is equivalent to the number of tardy jobs in the training block B. This transformation is necessary due to the discrepancy of data generation (collection) between feature vector x and its performance z. In classification models, each instance of the training data set requires a known class label as well as a set of features. Obviously, initial training data from production environments do not show any exact match between the class label and a set of features, because the points of data generation for the feature vector x and the class variable z, that is, job arrival to the queue and job completion, are different from each other. From a classification perspective, performance z corresponds to the class variable, and both feature vector x and decision rule y are input features, i.e., predictive attributes, in this transformed matrix B′. It is not surprising that the number q of z’s, or the number of consecutive job completions, in a block can be more than the number m of workstations in the testing/rework cell. This is because the feature vector x is not collected at dispatching points, when a job is completed and a new job is selected for an available workstation, but at each job arrival to the queue. The rationale behind this transformation is that the performance z of a completed job may depend upon the recent status of the system and jobs and the recent control decision y.

The final step of this module is to partition the transformed training data by system congestion level

$$h_t = \begin{cases} 0, & \text{if } N_q(t) = 0, \\ N_q(t)/N(t), & \text{otherwise,} \end{cases} \qquad (11)$$

where N(t) is the number of jobs in the system at time t and $N_q(t)$ is the number of jobs in the queue at time t. N(t) excludes the incoming job to the queue that triggers the collection of feature vector x. This partition is motivated by the fact that when $N_q(t) = 0$, and thus $N(t) - N_q(t) = N(t) \le m$, there is no need for production control, both preemption and dispatching decisions, and by the assumption that, even when $N_q(t) > 0$, and thus $N(t) - N_q(t) = m$, the performance of production control may be related to the system congestion level. Additional advantages of this data partition approach include faster rule extraction from the smaller data for each knowledge base and possibly improved accuracy, because the rule extraction of a decision tree may benefit from data clustering, which gathers data with similar characteristics in the same group. This second partition process is done individually on the partitions that were previously made by decision rule y, and a knowledge base is constructed on each sub-partition in the next rule extraction module.
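A minimal C sketch of this transformation and partition step is given below; it is an illustration of the idea, not the authors' implementation. The data layout (a block stored as p feature vectors followed by q tardy flags), the feature count, the number of congestion-level groups, and the assumption that feature index 4 already holds the ratio of eq. (11) are all assumptions made for the example.

/* Hedged sketch: turn one training block (p feature vectors followed by
   q completed-job tardy flags) into p transformed instances
   (x_t, y, sum of z's), and assign each to a congestion-level partition. */
#include <stdio.h>

#define NFEAT   11   /* selected features per instance                */
#define NGROUPS 3    /* assumed number of congestion-level partitions */

struct instance {
    double x[NFEAT]; /* feature vector collected at a job arrival     */
    int    y;        /* decision rule applied in this scenario        */
    int    sum_z;    /* tardy jobs among the following q completions  */
};

/* congestion level h_t of eq. (11); x[4] is assumed to already hold
   N_q(t)/N(t), i.e., feature 25 of Table I (0 when the queue is empty) */
static int partition_of(const struct instance *inst)
{
    double h = inst->x[4];
    int g = (int)(h * NGROUPS);
    return g >= NGROUPS ? NGROUPS - 1 : g;
}

/* transform one block: xs[p][NFEAT] and z[q] come from the raw log */
void transform_block(int p, int q, double xs[][NFEAT], const int z[],
                     int y, struct instance out[])
{
    int tardy = 0;
    for (int t = 0; t < q; t++)         /* number of tardy completions */
        tardy += z[t];

    for (int t = 0; t < p; t++) {       /* one instance per feature row */
        for (int f = 0; f < NFEAT; f++)
            out[t].x[f] = xs[t][f];
        out[t].y     = y;
        out[t].sum_z = tardy;
        printf("instance %d -> partition %d\n", t, partition_of(&out[t]));
    }
}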


C. The Competitive Decision Selector (CDS) Module

A knowledge base is constructed by the decision tree C4.5 [28] within each sub-partition in the off-line rule extraction module to make on-line dispatching and preemption decisions in the CDS module. Each knowledge base is grouped by the congestion level in this CDS module. When a production control decision is about to be made, the data on the current system status and job characteristics are monitored, collected, and saved in cell and/or workstation databases. The information, such as the necessary features, is obtained by computerized data processing units and transmitted to the CDS module. The CDS module first finds the current congestion level of the system and then activates the corresponding group of knowledge bases. In the corresponding group, each knowledge base individually analyzes the given features and then submits its expected short-term performance, based on a training block, when its decision rule is applied to the current status of the system and job characteristics. Then, each knowledge base with its expected short-term performance competes with the others in the CDS module, which eventually selects, with a tie-breaking vector, the winner, that is, the control rule with the minimum number of expected tardy jobs in the near future. The CDS module consists of two algorithms as follows:

Algorithm CDS-I (off-line)
1) Run the simulation U times for each decision rule y. Start with the result of the 1st run.
2) List all K decision rules with respect to the number of tardy jobs, from the smallest to the largest. Break ties by the mean tardiness.
3) Give the corresponding ranks 1, 2, …, K to the K listed decision rules as their scores $\Delta_{ku}$. If the Uth run is finished, go to Step 4. Otherwise, go back to Step 2 with the result of the next run.
4) Rank all K decision rules based on $\sum_{u=1}^{U} \Delta_{ku}$. The lower, the better.
5) Place the indexes k of all K decision rules into the tie-breaking vector TB = [tb1, tb2, …, tbK] with the best first. Done.

Algorithm CDS-II (on-line)
1) Calculate the congestion level $h_t$ of the feature vector $\mathbf{x}_t$ at decision point t.
2) Activate the corresponding group $G_h$ of knowledge bases by inputting the feature vector $\mathbf{x}_t$ to knowledge bases $KB_{h1}, KB_{h2}, \ldots, KB_{hK}$ of $G_h$.
3) Set the best decision variable BD to 0.
4) Set the best performance variable BP = $\min_k \hat{z}_{hk}$ of expected tardy jobs over $KB_{h1}, KB_{h2}, \ldots, KB_{hK}$.
5) Set TB index w to 1.
6) Start with $KB_{h1}$ by setting decision index k to 1.
7) If $\hat{z}_{hk}$ = BP and $tb_w$ = k, then set BD = k and go to Step 9. Otherwise, set k = k + 1.
8) If k = K, set w = w + 1 and go back to Step 6. Otherwise, go back to Step 7.
9) Return the best decision BD. Go back to Step 1, set t = t + 1, and wait for the next unequally spaced decision point t + 1.
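For illustration only, and not the authors' code, the rank aggregation of CDS-I could be sketched in C as below; U, K, and the per-replication statistics ntardy[][] and meantard[][] are assumed inputs taken from the off-line simulation runs.

/* Hedged sketch of Algorithm CDS-I: rank the K rules in every one of the
   U off-line replications (ties broken by mean tardiness), sum the ranks,
   and fill the tie-breaking vector tb[] with rule indexes, best first. */
#define U 5
#define K 4

void cds1_build_tb(const int ntardy[U][K], const double meantard[U][K],
                   int tb[K])
{
    double score[K];
    for (int k = 0; k < K; k++) score[k] = 0.0;

    /* Steps 2-3: per-run ranks accumulated as scores. */
    for (int u = 0; u < U; u++)
        for (int k = 0; k < K; k++) {
            int rank = 1;
            for (int j = 0; j < K; j++)
                if (ntardy[u][j] < ntardy[u][k] ||
                    (ntardy[u][j] == ntardy[u][k] &&
                     meantard[u][j] < meantard[u][k]))
                    rank++;
            score[k] += rank;
        }

    /* Steps 4-5: order rule indexes by total score, lowest (best) first. */
    for (int w = 0; w < K; w++) {
        int best = 0;
        for (int k = 1; k < K; k++)
            if (score[k] < score[best]) best = k;
        tb[w] = best;
        score[best] = 1.0e9;   /* exclude from further selection */
    }
}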

It is well known that making decisions based on a single replication of each alternative is dangerous [18]. Statistical procedures to rank alternative decisions often require very strict assumptions [3]. Obviously, the comparison of the means of performance outcomes is not appropriate in this case, because that kind of comparison determines only whether the alternatives have the same performance or not, rather than ranking them. Also, the uth performance outcomes over different decision rules are correlated with one another due to the use of common random numbers. The CDS-I algorithm constructs tie-breaking vector TB off-line, by collecting the long-run performance of each decision from U simulation runs, to rank the K decision rules before the test mode actually starts. This is a simplified version of the divide-and-label algorithm for multiple objectives [13]. The CDS-I algorithm does not require any statistical assumptions and offers great flexibility by handling multiple runs well.

The CDS-II algorithm selects the best decision rule with $\min_k \hat{z}_{hk}$ in real time (much less than 1 second) by breaking ties among decision rules with the tie-breaking vector TB. This is particularly effective when the number of tardy jobs is of interest, which inevitably involves many ties among alternative decision rules. The best decision rule under the current status of the system and job characteristics may not be the same as the best rule with respect to the long-run performance over the entire simulation run. The CDS-II algorithm gives the current status-based decision first priority and uses the long run-based decision as the second priority, or the tie-breaking decision. This algorithm works efficiently with frequently occurring ties and large K in real time, without an exhaustive search.
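A minimal C sketch of the CDS-II selection step, illustrative only, is shown below. It collapses the algorithm's inner scan over k into a direct lookup, which yields the same winner: the rule that attains the minimum expected number of tardy jobs and is ranked best in TB. The arrays zhat[] (one expected value per knowledge base) and tb[] (0-based rule indexes, best long-run rule first) are assumed inputs.

/* Hedged sketch of Algorithm CDS-II (on-line): among the rules whose
   expected number of tardy jobs equals the minimum, return the one that
   appears earliest in the tie-breaking vector built by CDS-I. */
int cds2_select(int K, const double zhat[], const int tb[])
{
    double bp = zhat[0];                 /* Step 4: best performance BP */
    for (int k = 1; k < K; k++)
        if (zhat[k] < bp)
            bp = zhat[k];

    for (int w = 0; w < K; w++)          /* Steps 5-8: scan TB, best first */
        if (zhat[tb[w]] == bp)
            return tb[w];                /* Step 9: best decision BD */

    return tb[0];                        /* not reached: some rule attains BP */
}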

IV. EXPERIMENTS

A Surface Mount Technology (SMT) process is chosen as a case study to construct a simulation test bed in this research. It is very difficult or often impossible to model and solve the overall test cell as a dynamic and stochastic analytical model without simulation [38]. Basic system information comes from Piramuthu et al. [27], whose system parameters represent the actual values in an SMT system. The data for a testing stage are adopted because this research focuses on the test/rework cell, unlike Piramuthu et al., which also covers other stages with a different objective function and problem (e.g., no job rework). Reasonable values were assumed for missing parameters based on the given basic ones (e.g., mean process times for each job type).

TABLE III. DATA FOR THE MODELED SYSTEM

System Parameters                      | Values
Mean Testing Time in Family 1          | 0.608 (min.)
Mean Testing Time in Family 2          | 3.008 (min.)
The Number of Testing Stations         | 3
Mean Setup Time between Families       | 7.5 (min.)
Mean Setup Time within the Same Family | 2.5 (min.)
Mean Repair Time in Family 1           | 8.5 (min.)
Mean Repair Time in Family 2           | 6.3 (min.)
Mean Allowance Factor                  | 8.0
Mean Rework Rate in Family 1           | 0.30
Mean Rework Rate in Family 2           | 0.37
Interarrival Time                      | Exponential (4.7 min.)

Table III gives the system parameter values used in the experiment. Two job families, each with seven different job types, are considered. Job interarrival times are modeled by an exponential distribution. An arriving job is assigned with equal probability to one of the 14 job types. The actual processing times are generated from exponential distributions with the means given in Table III but are not known to the CDS, which has only the mean values from the data history.

A. Decision Rules Considered

Four different decision rules are considered: (1) the SEPT rule with interruption and timeout, (2) the SEPT rule, (3) the EDD rule with interruption and timeout, and (4) the EDD rule. The controllers select, by the Shortest Expected Processing Time (SEPT) rule, the job with the shortest expected processing time when a workstation is available. Since the CDS does not know the actual processing time of job j until the job is completed, the total expected processing time $\hat{e}_j$ of job j is approximated in the CDS by

$$\hat{e}_j = (1 + \theta_j)(s_j + t_j) + \theta_j \cdot r_j \qquad (12)$$

where $\theta_j$ is the mean rework rate of job j.
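The following C fragment is only an illustration of how the CDS-side estimates could be computed from the mean values of Table III: ê_j from (12) and, using it, the TWK due date of (13) below. The job-type lookup arrays are assumptions.

/* Hedged sketch: expected processing time (12) and TWK due date (13).
   mean_setup, mean_test, mean_rework and rework_rate are assumed lookup
   tables indexed by job type, filled from the data history (Table III). */
double expected_processing_time(int type, const double mean_setup[],
                                const double mean_test[],
                                const double mean_rework[],
                                const double rework_rate[])
{
    double theta = rework_rate[type];
    return (1.0 + theta) * (mean_setup[type] + mean_test[type])
           + theta * mean_rework[type];            /* eq. (12) */
}

double twk_due_date(double now, double alpha, double e_hat)
{
    return now + alpha * e_hat;                    /* eq. (13): d_j = t + alpha * e_hat_j */
}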

The mean setup time within the same family is used to estimate $s_j$, and all mean values of job j are determined by its job type for equation (12). One rework is assumed, to simplify recursive reworks. The controllers choose, by the Earliest Due Date (EDD) rule, the job with the earliest due date when a workstation is available. Due dates are generated by the TWK method, one of the common due-date assignment methods [1], [13]. The TWK method determines the due date of job j that is generated at time t by

$$d_j = t + \alpha \cdot \hat{e}_j \qquad (13)$$

where $\alpha$ is the allowance factor that is sampled, at every due date generation, from an exponential distribution with the mean value given in Table III.

B. The Features Selected

The final set of features was selected by the proposed hybrid heuristic approach. Most effective were the features characterized by deadline-related counters, congestion level, performance-related features, and minimum and average values. The final set of features includes the following:

No. of tardy jobs in queue
No. of tardy jobs in servers
No. of potentially tardy jobs in queue
No. of potentially tardy jobs in servers
No. of jobs in queue / No. of jobs in queue and servers
Minimum current tardiness of jobs in queue
Average current tardiness of jobs in queue
Minimum current tardiness of jobs in servers
Average current tardiness of jobs in servers
Mean tardiness in recent 10 jobs
No. of tardy jobs in recent 10 jobs / 10

C. Data Generation, Partition, and Transformation

The simulation models were developed using SIMAN [25]. Key parameters used in the models are shown in Table III. The CDS module was written in C and inserted into SIMAN for real-time control purposes. The integrated system was implemented on a Sun Ultra10 Sparc station under SunOS Open Windows Version 8.0. In all simulation replications, applying the SEPT rule without job preemption for 115200 minutes was used as a common initialization period for each 172800-minute run. This common initialization period was introduced to allow each simulation scenario with a single control rule to have the same starting system status at time 115200. Common random number streams were also used to reduce the variance in the estimate of the response variable over multiple replications.

The training data were generated through a run for each simulation scenario. Only one decision rule continued to be applied in each scenario as a static control rule, and the necessary attributes were collected at the data collection points, that is, each job arrival to the queue and each job completion, and saved in each corresponding file through the whole training data generation period (i.e., 115200-172800 minutes). This is actually the data partition by decision rule y. Five replications of each simulation scenario were performed to construct the tie-breaking vector TB by algorithm CDS-I. Testing was performed ten times for each of the static control rules and the

CDS dynamic control, again with the common initialization period. In the testing mode, only feature collection was made at each job arrival to the queue. The data set is large enough to use a single train-test partition as opposed to cross-validation. The testing data were generated with different random number streams from the training mode. Both training and testing data were normalized. Fig. 1 shows examples of the classification rules, for instance:

if (norm_features[27] <= 0.022076 && norm_features[20] > 0.333333 && norm_features[20] <= 0.666667 && norm_features[25] > 0.468377) performance_array[0] = 4;

In addition, the SEPT rule with job preemption is applied when the congestion level exceeds 0.67, to incorporate the well-known heuristic knowledge that the SEPT rule performs well in congested shop floors. The numbers within the square brackets of norm_features correspond to those of Table I, while the numbers within the square brackets of performance_array are the indexes of the decision rules. The values of norm_features are all normalized ones, and the values of performance_array are the expected numbers of tardy jobs in the near future.

D. The Operating Logic in the CDS Module

Production control in dynamic environments raises the question of the frequency of decision making [29]. The CDS is fast enough to make frequent decisions in real time. In this research, feature collection points are synchronized to the point of each job arrival to the queue. The job that comes to the queue is either a new arrival or a preempted job. When a job arrives at the queue, the necessary features are collected to see the current status of the system and jobs and are used for decision making in the CDS. Decisions are made on both dispatching and preemption at the same time in the CDS, right after the feature collection, but their implementation points are different. Preemption is done as soon as a preemption decision is made (timeout, interruption, or nothing). On the other hand, actual dispatching is done at each job completion point by selecting the next job for an available workstation based on the dispatching decision (e.g., SEPT, EDD, FIFO, etc.) that was made at the earlier job arrival point. A preemption decision is more complicated than a dispatching decision. When job j is coming to the queue at


time t with its total expected processing time $\hat{e}_j$, the CDS determines job preemption by the following procedure:

Step 1) If any workstation is idle, no preemption. Done. Otherwise, go to Step 2.
Step 2) If $d_j > t + \hat{e}_j$, go to Step 5. Otherwise, activate algorithm CDS-II and go to Step 3.
Step 3) If the decision by CDS-II is either interruption or timeout, find the job b with the latest due date in the workstations, get its due date $d_b$, and go to Step 4. Otherwise, no preemption. Done.
Step 4) If $d_b \le d_j$, no preemption. Done. Otherwise, preempt job b. Done.
Step 5) Schedule the next decision point of job j for $d_j - \hat{e}_j$. Done.

The preconditions of job preemption in this research are based on the slack time of the incoming job and the latest due date of the jobs being served, and they are independent of the status (either testing or rework) of the jobs in the workstations. Step 2 determines whether the incoming job j is urgent or not based on its due date $d_j$ and its preemption threshold of $t + \hat{e}_j$. If the incoming job j is not urgent at its arrival time, the job is checked again at time $d_j - \hat{e}_j$. It should try to preempt other jobs in the workstations so as not to be late, if it is still not being served at $d_j - \hat{e}_j$.
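As a rough C illustration only, and not the authors' SIMAN/C module, the preemption check above might be coded as follows; the helper functions declared extern and the job identifiers are assumed hooks into the cell controller.

/* Hedged sketch of the preemption decision made at a job arrival at time t. */
enum preemption { NONE, INTERRUPTION, TIMEOUT };

extern int    any_workstation_idle(void);                 /* assumed hook */
extern enum preemption cds2_decision(double t);           /* assumed hook */
extern double latest_due_date_in_service(int *job_b);     /* assumed hook */
extern void   preempt(int job_b);                         /* assumed hook */
extern void   schedule_recheck(int job_j, double when);   /* assumed hook */

void preemption_check(int job_j, double d_j, double e_hat_j, double t)
{
    if (any_workstation_idle())                 /* Step 1 */
        return;

    if (d_j > t + e_hat_j) {                    /* Step 2: job j not yet urgent */
        schedule_recheck(job_j, d_j - e_hat_j); /* Step 5 */
        return;
    }

    enum preemption p = cds2_decision(t);       /* Steps 2-3: ask CDS-II */
    if (p == NONE)
        return;

    int    job_b;
    double d_b = latest_due_date_in_service(&job_b);
    if (d_b > d_j)                              /* Step 4 */
        preempt(job_b);
}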

V. RESULTS

Table IV shows the results of the test mode, where simulation is done with random number streams different from those of the training mode. An industry common standard, the FIFO (First-In, First-Out) rule, was added for comparison purposes. Ten replications were performed, with a different random number stream in the interarrival time distribution each time. It is clearly shown that the FIFO rule is much worse than the other rules. The five other control rules seem to be competitive without any dominant rule, but the CDS dynamic control shows relatively better performance. Among the four static control rules, the SEPT and EDD rules with interruption and timeout are better than the SEPT and EDD rules, respectively. This result is intuitively correct, because more flexibility is obtained to get better performance when job preemption is allowed. The CDS dynamic control is the best eight times out of the 10 replications, and the EDD rule with interruption and timeout is the best in the other two. The standard deviations give interesting results. The FIFO and SEPT rules show relatively low variances. On the other hand, the EDD rules suffer from high variances. The CDS dynamic control remains at average levels with respect to standard deviation.

TABLE IV. THE NUMBER OF TARDY JOBS FOR TEN INDEPENDENT REPLICATIONS OF THE SIX RULES WITH THE SAMPLE MEAN AND STANDARD DEVIATION

Rep.           | FIFO    | SEPT I/T | SEPT    | EDD I/T | EDD     | CDS
1              | 3506    | 2628     | 2774    | 2532    | 2990    | 2537
2              | 3508    | 2706     | 2827    | 2653    | 2867    | 2378
3              | 3541    | 2635     | 2819    | 2578    | 2828    | 2547
4              | 3515    | 2572     | 2719    | 2496    | 2837    | 2413
5              | 3483    | 2687     | 2851    | 2793    | 2844    | 2543
6              | 3516    | 2654     | 2803    | 2776    | 2836    | 2569
7              | 3516    | 2553     | 2792    | 2618    | 2668    | 2445
8              | 3534    | 2638     | 2823    | 2656    | 2827    | 2567
9              | 3538    | 2657     | 2813    | 2464    | 2884    | 2391
10             | 3460    | 2740     | 2785    | 2457    | 2836    | 2469
Mean           | 3511.70 | 2647.00  | 2800.60 | 2602.30 | 2841.70 | 2485.90
Std. Deviation | 25.03   | 56.69    | 36.39   | 119.87  | 78.24   | 75.28

The statistical test shown in Table V gives more strict conclusions. The CDS dynamic control is compared with the other four static control rules by using the paired-t test. Since four comparisons are made, four confidence intervals are constructed, each at the 97.5 percent level, to yield an overall confidence level of 90 percent [18]. Minimum values are actually the biggest superior differences by the CDS over the other rules, because the performance measure is the number of tardy jobs. Maximum values are the smallest superior differences over the other rules, or the largest inferior difference over the EDD rule with job preemption. The paired-t test results show that the CDS dynamic control is significantly different (actually, better) from the other static control rules with respect to the number of tardy jobs at the overall confidence level of 90 percent. This result shows that the CDS approach can take advantage of both long-run simulation results, which are summarized by the tie-breaking vector TB, and short-term performance information, which is mined from huge historical data.

TABLE V. THE RESULTS OF THE PAIRED-T TEST

Comparison     | Mean   | Std. Dev. | Min. | Max. | P value
CDS – SEPT I/T | -161.1 | 93.2      | -328 | -71  | 0.0004
CDS – SEPT     | -314.7 | 73.4      | -449 | -234 | 0.0001
CDS – EDD I/T  | -116.4 | 103.7     | -275 | 12   | 0.0062
CDS – EDD      | -335.8 | 102.3     | -493 | -223 | 0.0001
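The paired-t comparison itself can be reproduced with a few lines of C; the sketch below is illustrative only and leaves the Student-t critical value (here, for a two-sided 97.5 percent interval with n-1 degrees of freedom) as an input rather than hard-coding it.

/* Hedged sketch: paired-t confidence interval for CDS minus a static rule.
   a[] and b[] hold the numbers of tardy jobs of the two rules in n paired
   replications (common random numbers); t_crit is supplied by the caller. */
#include <math.h>
#include <stdio.h>

void paired_t_interval(const int a[], const int b[], int n, double t_crit)
{
    double mean = 0.0, ss = 0.0;

    for (int i = 0; i < n; i++)
        mean += (double)(a[i] - b[i]);
    mean /= n;

    for (int i = 0; i < n; i++) {
        double dev = (a[i] - b[i]) - mean;
        ss += dev * dev;
    }
    double sd   = sqrt(ss / (n - 1));
    double half = t_crit * sd / sqrt((double)n);

    printf("mean diff %.1f, std dev %.1f, CI [%.1f, %.1f]\n",
           mean, sd, mean - half, mean + half);
}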

An interesting observation from preliminary experiments is that when any static control rule starts to dominate the other rules in its performance, it becomes more and more difficult for the CDS dynamic control to beat the dominant rule. This is because, with a dominant rule, it is very difficult for the decision tree approach to learn the better aspects of the other, inferior rules, and, as a result, the CDS will show, at best, performance as good as the dominant rule. In real situations this is practically acceptable. First, when a clearly dominant rule exists, practitioners can directly adopt the rule without any further consideration of dynamic control. Second, individual data may fluctuate a great deal over time in dynamic and stochastic CIM systems, but the distributions from which the individual data come do not change abruptly. Note that the distributions themselves are obtained by approximating data over time. Thus, the CDS knowledge bases can be constructed and used while static production control rules are competing with one another under similar environmental distributions. Although the CDS knowledge bases should be rebuilt when the surrounding environment changes significantly, constructing new knowledge bases is no longer expensive (several hours in this research). Recall that the major premise of data mining is


inexpensive access to high-performance computational resources with high volumes of real-time enterprise data. Finally, as more static production control rules are considered in constructing the CDS knowledge bases, it will become more difficult for a single static rule to dominate all the others, while the performance of the CDS will be improved.

Fig. 2. The number of tardy jobs for ten independent replications of each of the six rules.

Another experiment was performed with a change in the mean rework rates. This time, the mean rework rates in families 1 and 2 are 0.35 and 0.42, respectively. Accordingly, the mean of the exponential interarrival distribution was adjusted to 5.1 minutes to make the four static controls competitive. It is well known that the performance of dispatching rules depends upon shop load conditions, which are often controlled by interarrival time distributions. As shown in Fig. 2, the performance of the FIFO rule is again not even close to that of the other rules.

TABLE VI. THE NUMBER OF TARDY JOBS FOR TEN INDEPENDENT REPLICATIONS IN THE SECOND EXPERIMENT

Rep.           | FIFO    | SEPT I/T | SEPT    | EDD I/T | EDD     | CDS
1              | 3284    | 2593     | 2557    | 2470    | 2587    | 2408
2              | 3219    | 2417     | 2622    | 2644    | 2418    | 2403
3              | 3315    | 2605     | 2646    | 2697    | 2779    | 2423
4              | 3206    | 2536     | 2597    | 2467    | 2405    | 2365
5              | 3235    | 2507     | 2595    | 2417    | 2519    | 2425
6              | 3242    | 2407     | 2617    | 2572    | 2521    | 2315
7              | 3208    | 2406     | 2600    | 2395    | 2719    | 2295
8              | 3190    | 2407     | 2593    | 2330    | 2659    | 2317
9              | 3232    | 2599     | 2680    | 2486    | 2614    | 2514
10             | 3226    | 2597     | 2619    | 2430    | 2588    | 2392
Mean           | 3235.70 | 2507.40  | 2612.60 | 2490.80 | 2580.90 | 2385.70
Std. Deviation | 37.69   | 89.88    | 33.32   | 114.40  | 120.41  | 65.49

TABLE VII. THE RESULTS OF THE PAIRED-T TEST IN THE SECOND EXPERIMENT

Comparison     | Mean   | Std. Dev. | Min. | Max. | P value
CDS – SEPT I/T | -121.7 | 61.0      | -205 | -14  | 0.0001
CDS – SEPT     | -226.9 | 55.0      | -305 | -149 | 0.0001
CDS – EDD I/T  | -105.1 | 113.3     | -274 | 28   | 0.0166
CDS – EDD      | -195.2 | 139.8     | -424 | -15  | 0.0017

Table VI shows that the SEPT or EDD rule with job preemption is still the best among the four static control rules in most cases. However, four cases are observed in which the SEPT and EDD rules without job preemption outperform their counterparts with job preemption, and one case shows that the EDD rule without job preemption is the best among the four static control rules. This clearly indicates that job preemption by itself does not guarantee better results than non-preemption, and that finer production control may be necessary to achieve better performance from job preemption (e.g., adaptive preemption timing, adaptive selection of the preempted job, etc., depending on changing environments). The CDS control is the best one overall, and the standard deviation results also show patterns similar to the previous ones. The statistical test in Table VII also supports that the CDS dynamic control outperforms each of the other individual static control rules at a confidence level of 97.5 percent.

VI. CONCLUSIONS

This paper presents a data mining-based production control approach for the testing and rework cell in a dynamic and stochastic CIM system. The proposed CDS observes the status of the system and jobs at each decision point and makes its decision on job preemption and dispatching rules in real time. The CDS combines two different knowledge sources. First, long-run simulation results are summarized into the tie-breaking vector TB. Second, useful short-term performance information is mined by a data mining approach from huge historical data generated over the whole simulation period with data partition. A decision tree-based module generates symbolic classification rules on each partitioned data set and saves them in the CDS knowledge bases. The CDS dynamic control shows better performance than static control rules, particularly when the static control rules are competing with one another.

Several conclusions can be drawn from this research. First, the CDS approach can take advantage of two different knowledge sources: (1) the long-run performance of each rule and (2) the short-term performance of each rule on the different status of the system and jobs. Second, the CDS approach can extract and learn the good aspects of each competing static rule, while this becomes difficult or meaningless when one static rule dominates the other rules in all aspects. This limitation may be alleviated, however, as more static production control rules are considered in constructing the CDS knowledge bases. Third, a few cases indicate that finer control is necessary for job preemption to achieve better performance than non-preemption. Job preemption itself, without finer control, may not be enough to obtain better performance than non-preemption in all cases. This result casts the finer control issue as a possible future research direction.

It is important to note that blind application of data mining (data dredging) can be dangerous, leading to the use of meaningless patterns. It is always desirable to incorporate appropriate prior knowledge and to properly interpret mined patterns, which is still an open issue because of the difficulty of automation. However, data mining can provide people with useful support for proper interpretation of mined patterns and for strategic decisions, by presenting a set of symbolic rules.

REFERENCES

[1] Anderson, E.J. and J.C. Nyirenda, “Two new rules to minimize tardiness in a job shop,” International Journal of Production Research, Vol. 28, no. 12, pp. 2277-2292, 1990.
[2] Augusteijn, M.F., L.E. Clemens, and K.A. Shaw, “Performance evaluation of texture measures for ground cover identification in satellite images by means of a neural network classifier,” IEEE Transactions on Geoscience and Remote Sensing, Vol. 33, no. 3, pp. 616-626, 1995.
[3] Belz, R. and P. Mertens, “Combining knowledge-based systems and simulation to solve rescheduling problems,” Decision Support Systems, Vol. 17, no. 2, pp. 141-157, 1996.
[4] Bock, D.B. and J.H. Patterson, “A comparison of due date setting, resource assignment, and job preemption heuristics for the multiproject scheduling problem,” Decision Sciences, Vol. 21, no. 2, pp. 387-402, 1990.
[5] Chen, C-C., Y. Yih, and Y-C. Wu, “Auto-bias selection for developing learning-based scheduling systems,” International Journal of Production Research, Vol. 37, no. 9, pp. 1987-2002, 1999.
[6] Classon, F., Surface Mount Technology for Concurrent Engineering and Manufacturing, New York, McGraw-Hill, 1993.
[7] Duvivier, F., “Automatic detection of spatial signature on wafermaps in a high volume production,” Proceedings 1999 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 61-66, Los Alamitos, CA, USA, 1999.
[8] Fayyad, U., G. Piatetsky-Shapiro, and P. Smyth, “The KDD process for extracting useful knowledge from volumes of data,” Communications of the ACM, Vol. 39, no. 11, pp. 27-34, 1996.
[9] Frasconi, P., M. Gori, and G. Soda, “Data categorization using decision trellises,” IEEE Transactions on Knowledge and Data Engineering, Vol. 11, no. 5, pp. 697-712, 1999.
[10] Hoogeveen, H., C.N. Potts, and G.J. Woeginger, “On-line scheduling on a single machine: maximizing the number of early jobs,” Operations Research Letters, Vol. 27, no. 5, pp. 193-197, 2000.
[11] John, G.H., R. Kohavi, and K. Pfleger, “Irrelevant features and the subset selection problem,” Machine Learning: Proceedings of the Eleventh International Conference, pp. 121-129, 1994.
[12] Julien, F.M., M.J. Magazine, and N.G. Hall, “Generalized preemption models for single-machine dynamic scheduling problems,” IIE Transactions, Vol. 29, no. 5, pp. 359-372, 1997.
[13] Kwak, C., T. Morin, and Y. Yih, “A multicriteria approach to timeout collaboration protocol,” Research Memorandum 01-11, School of Industrial Engineering, Purdue University, West Lafayette, IN, 2001.
[14] Kwak, C., J.A. Ventura, and K. Tofang-Sazi, “A neural network approach for defect identification and classification of leather fabric,” Journal of Intelligent Manufacturing, Vol. 11, no. 5, pp. 485-499, 2000.
[15] Kwak, C. and Y. Yih, “Simulation comparison of collaboration protocol-based testing models,” International Journal of Production Research, Vol. 39, no. 13, pp. 2947-2956, 2001.
[16] Kwak, C. and Y. Yih, “Statistical analysis of factors influencing the performance of the protocol-based testing model,” International Journal of Production Research, to appear.
[17] Lane, T. and C.E. Brodley, “Temporal sequence learning and data reduction for anomaly detection,” ACM Transactions on Information and System Security, Vol. 2, no. 3, pp. 295-331, 1999.
[18] Law, A.M. and W.D. Kelton, Simulation Modeling and Analysis, New York, McGraw-Hill, Inc., 1991.
[19] Lu, H., R. Setiono, and H. Liu, “Effective data mining using neural networks,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, no. 6, pp. 957-961, 1996.
[20] Manko, H.H., Soldering Handbook for Printed Circuits and Surface Mounting: Design, Materials, Processes, Equipment, Trouble-shooting, Quality, Economy, and Line Management, New York, Van Nostrand Reinhold, 1995.
[21] McDonald, C.J., “New tools for yield improvement in integrated circuit manufacturing: can they be applied to reliability?” Microelectronics Reliability, Vol. 39, no. 6-7, pp. 731-739, 1999.
[22] Neter, J., W. Wasserman, and M.H. Kutner, Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs, Homewood, IL, Irwin, 1990.
[23] Park, S.C., N. Raman, and M.J. Shaw, “Adaptive scheduling in dynamic flexible manufacturing systems: a dynamic rule selection approach,” IEEE Transactions on Robotics and Automation, Vol. 13, no. 4, pp. 486-502, 1997.
[24] Paterok, M. and M. Ettl, “Sojourn time and waiting time distributions for M/G/1 queues with preemption distance priorities,” Operations Research, Vol. 42, no. 6, pp. 1146-1161, 1994.
[25] Pegden, C.D., R.E. Shannon, and R.P. Sadowski, Introduction to Simulation Using SIMAN, New York, McGraw-Hill, Inc., 1995.
[26] Perfector, E.D., K.S. Desai, and G. McAfee, “MCM-D/C yield improvements through effective diagnostics,” The International Journal of Microcircuits and Electronic Packaging, Vol. 22, no. 4, pp. 411-417, 1999.
[27] Piramuthu, S., N. Raman, and M.J. Shaw, “Learning-based scheduling in a flexible manufacturing flow line,” IEEE Transactions on Engineering Management, Vol. 41, no. 2, pp. 172-182, 1994.
[28] Quinlan, J.R., C4.5: Programs for Machine Learning, San Mateo, Morgan Kaufmann Publishers, Inc., 1993.
[29] Robb, D.J. and T.R. Rohleder, “An evaluation of scheduling heuristics for dynamic single-processor scheduling with early/tardy costs,” Naval Research Logistics, Vol. 43, no. 3, pp. 349-364, 1996.
[30] Sauvaire, P., J.A. Ceroni, and S.Y. Nof, “Information management for FMS and non-FMS decision support integration,” International Journal of Industrial Engineering Applications and Practice, Special Issue on Information Systems, Vol. 5, no. 1, pp. 78-87, 1998.
[31] Strauss, R., Surface Mount Technology, Oxford; Boston, Butterworth-Heinemann, 1994.
[32] Sun, Y-L. and Y. Yih, “An intelligent controller for manufacturing cells,” International Journal of Production Research, Vol. 34, no. 8, pp. 2353-2373, 1996.
[33] Thawonmas, R. and S. Abe, “Function approximation based on fuzzy rules extracted from partitioned numerical data,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 29, no. 4, pp. 525-534, 1999.
[34] Thornhill, N.F., M.R. Atia, and R.J. Hutchinson, “Experiences of statistical quality control with BP Chemicals, Grangemouth,” International Journal of COMADEM, Vol. 2, no. 4, pp. 5-10, 1999.
[35] Uzsoy, R., C.Y. Lee, and L.A. Martin-Vega, “A review of production planning and scheduling models in the semiconductor industry. Part II: Shop-floor control,” IIE Transactions, Vol. 26, no. 5, pp. 44-55, 1994.
[36] Wang, X.Z. and C. McGreavy, “Automatic classification for mining process operational data,” Industrial and Engineering Chemistry Research, Vol. 37, no. 6, pp. 2215-2222, 1998.
[37] Wehenkel, L., “Machine learning approaches to power-system security assessment,” IEEE Intelligent Systems and Their Applications, Vol. 12, no. 5, pp. 60-72, 1997.
[38] Yang, J. and T-S. Chang, “Multiobjective scheduling for IC sort and test with a simulation test bed,” IEEE Transactions on Semiconductor Manufacturing, Vol. 11, no. 2, pp. 304-315, 1998.

Choonjong Kwak received his B.S. in industrial engineering from Korea University in 1996 and his M.S. in industrial engineering from Pennsylvania State University in 1998.

Yuehwern Yih received her B.S. in industrial engineering from National Tsing Hua University, Taiwan, in 1984 and her Ph.D. in industrial engineering from the University of Wisconsin-Madison in 1988. She is currently an Associate Professor in the School of Industrial Engineering at Purdue University. She has published over 70 papers and contributed several book chapters in the areas of real-time scheduling and machine learning in controlling complex production systems. She is also a co-editor of the book Manufacturing Cells: A System Engineering View (London: Taylor & Francis Ltd., 1995). Dr. Yih received the NEC Faculty Fellowship, the NSF Young Investigator Award, and the SME 1998 Dell K. Allen Outstanding Young Manufacturing Engineer Award. She is a member of Omega Rho, IIE, INFORMS, and AAAI.