Model for Extracting Information from Production Schedule Data

Henri Tokola, Esko Niemi
Aalto University, Department of Mechanical Engineering, Esbo, Finland

Abstract

Data collected from production is available in computerised production control systems, but its current utilisation could be improved. To introduce one way to use production data, this paper studies automatic post-analysis of the schedule data of a single-machine system. This kind of estimation is extra information that can be used to strengthen planning and reporting systems. The approach in this paper is different from previous studies as, instead of just considering e.g. bottlenecks or the critical path, our method estimates different causes of tardiness for a tardy job. In the model, we automatically find a job for which the finish time is after the deadline. After selecting the target job, our model finds out what the likely causes of the tardiness are. The following five causes of tardiness are studied: bad scheduling, a rush job, a long job, unavailable capacity and bottleneck congestion. For each cause and each job, there is an index that estimates how significant the cause is. The indices are combined to calculate how much the other individual jobs affect the tardiness of the target job. The paper provides an example where the model is used. Using the model, extra information is revealed from the existing schedule data. The model is easy to understand and implement, and its computational complexity is low.

Keywords: data mining, scheduling, tardiness.

1. Introduction

The analysis of large data sets, i.e. using “Big Data”, is an enabler of increased productivity [1]. Nedelcu [2] writes that raw material for big data is already available in the manufacturing industry and that the manufacturing industry stores more data than any other sector. Thus, manufacturing should be a fruitful area for big data analysis, e.g. for finding new information from the existing data using data mining. According to Sagiroglu and Sinanc [3], manufacturing can also benefit from big data by improving its forecasting, supply chain planning, sales, production operations and web-based applications. However, Brown et al. [4] state that although indices show data is easy to capture from manufacturing, the potential value that can be derived from the data is low. It might be that general data mining tools are not enough and smarter analysis is needed.

As stated above, data from production is available in computerised production control systems, but its current utilisation could be improved. To introduce one way to use production data, this paper studies automatic post-analysis of the schedule data of a single-machine system. The aim of the automatic analysis is to estimate the causes of tardiness of orders from the existing data, which contains only simple schedule information. This kind of estimation is extra information that can be used to strengthen planning and reporting systems; it can help the users of planning and reporting software to quickly find the causes of tardiness from the data by pointing out the jobs and causes that most probably make a specific job tardy.

In the data mining of scheduling data, most of the earlier research focuses on dispatching rules and the automatic analysis of bottlenecks. Dispatching rules determine the order in which new jobs are started. Data mining is used to discover or select these rules from existing production data. Li and Olafsson [5] discover dispatching rules that appear to be used. Thus the constructed rules mimic the human scheduler’s decisions, which are believed to be of high quality. Different dispatching rules are known to perform better than others depending on performance criteria and under different circumstances. Shiue [6] classifies well-known dispatching rules using production data generated using simulation. The developed dynamic rule selection system clearly outperforms the use of static priority rules.

Bottlenecks cause congestion in production, and their removal increases the throughput of the system; it is therefore important to locate them. Bottlenecks are in focus in the well-known theory of constraints; see e.g. Blackstone [7] or Goldratt [8]. The research related to automatic production data analysis has recently focused on bottleneck detection systems. A bottleneck detection system was implemented in the papers of Roser et al. [9] and Roser et al. [10]. In their models, the short-term bottleneck is the machine that has the longest active period at a time. Li et al. [11] present a model that finds the bottlenecks on a production line from the blockage and starvation percentages of machines. Kernan et al. [12] propose a method for calculating a resource constraint metric that takes worker skills into account and studies how they constrain the production. As the above papers are recent and limited in their scope, there is clearly a need for studies that analyse the schedule data further.

Also related to the topic, many papers in the construction industry literature focus on the analysis of project timetables after delays; see e.g. Ibbs and Nguyen [13] or Alkass et al. [14]. The purpose of these papers is to find out how penalty claims can be divided among the tasks of a project after a delay. There are several methods to analyse what the problem in the schedule was. These methods usually use the critical path method (CPM) to find out the subtasks that cause the tardiness. In this way the penalty claims can be addressed. The methods, however, do not consider the causes of the tardiness.

The approach in this paper is different from the above-mentioned papers, as, instead of considering dispatching rules, bottlenecks or the critical path, our method estimates different causes of tardiness for a tardy job. The causes that are considered are bad scheduling, a rush job, a long job, unavailable capacity and bottleneck congestion. These kinds of causes of delays are typical in production, but they are overlooked in the literature, as pointed out by Gupta et al. [15].

The model in this paper is implemented for the analysis of single-machine systems. One reason for this is that production typically has one bottleneck machine, which is the only scheduled machine. If a multi-machine case is considered, the model could be used individually for each of the machines. In future, however, the method could be adjusted for multi-machine cases.

The rest of the paper is organised as follows. Section 2 presents the model used for estimating the causes of tardiness. In Section 3, the model is used in an example situation. Section 4 discusses the usefulness of the model by studying the results of the example. Finally, Section 5 draws conclusions and points out future research directions.

2. Model for estimating causes of tardiness

Notation

The following symbols are used in the paper:
$J$ – Number of jobs
$j$ – Job number, $j \in \{1, 2, \ldots, J\}$
$t$ – Tardy target job under study, $t \in \{1, 2, \ldots, J\}$
$r_j$ – Release time for job j
$d_j$ – Deadline for job j
$s_j$ – Starting time for processing of job j
$f_j$ – Finish time for processing of job j
$p_j$ – Processing duration of job j
$a_j$ – Allowance time for job j
$RJ_j$ – Rush job index for job j
$BS_j$ – Bad scheduling index for job j
$LJ_j$ – Long job index for job j
$UC_j$ – Unavailable capacity index for job j
$BC_j$ – Bottleneck congestion index for job j
$C_j$ – Criticality index for job j
$P_j$ – Probability that job j causes delay of target job

2.1. Single machine system

This paper studies the schedule data of a single-machine system. The information that has to be available is the following. There exist J jobs, numbered from 1 to J. j denotes a single job. For each individual job j there has to be an available release time $r_j$, a deadline $d_j$, a starting time $s_j$ and a finish time $f_j$. Fig. 1 illustrates the notation in a three-job case.

Fig. 1: Example of notation shown in a Gantt chart in the case of three jobs.

It is assumed that the times follow constraints (1) and (2). It is also assumed that the jobs are indexed in the order in which they are processed, so that

$$ f_j \le s_{j+1}, \quad \forall j \in \{1, 2, \ldots, J-1\} \tag{3} $$

This also implies that only one job can be processed at a time. Pre-emption is not allowed. The processing duration $p_j$ can be calculated from the starting time $s_j$ and finish time $f_j$ by

$$ p_j = f_j - s_j \tag{4} $$

Using the $p_j$s, the average processing duration $\bar{p}$ is simply

$$ \bar{p} = \sum_{j \in \{1, 2, \ldots, J\}} p_j / J \tag{5} $$

The time between the deadline and release time, i.e. the allowance time, is denoted by $a_j$, and it can be calculated from the deadline $d_j$ and release time $r_j$ by

$$ a_j = d_j - r_j \tag{6} $$

The average allowance time is denoted by $\bar{a}$. It is calculated as follows:

$$ \bar{a} = \sum_{j \in \{1, 2, \ldots, J\}} a_j / J \tag{7} $$
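As a minimal sketch of the preparatory step described above, the following Python fragment checks the processing-order assumption and derives the processing durations, allowance times and their averages from raw schedule rows. The tuple layout `(release, deadline, start, finish)`, the function names and the three-job schedule are illustrative assumptions and do not come from the paper.

```python
# Each job is a tuple (release, deadline, start, finish); this layout is an
# assumption made for the sketch, not a format prescribed by the paper.

def check_processing_order(jobs):
    """True if every job finishes before the next one starts (one job at a time)."""
    return all(jobs[k][3] <= jobs[k + 1][2] for k in range(len(jobs) - 1))

def derived_quantities(jobs):
    """Return processing durations, allowance times and their plain averages."""
    p = [f - s for (r, d, s, f) in jobs]   # processing duration = finish - start
    a = [d - r for (r, d, s, f) in jobs]   # allowance time = deadline - release
    return p, a, sum(p) / len(jobs), sum(a) / len(jobs)

# Hypothetical three-job schedule, listed in processing order.
schedule = [
    (0.0, 40.0, 0.0, 2.0),
    (0.0, 12.0, 4.0, 8.0),
    (0.0, 8.0, 8.0, 14.0),
]

ordered = check_processing_order(schedule)
p, a, p_bar, a_bar = derived_quantities(schedule)
print(ordered, p, a, p_bar, a_bar)
```

Running the order check before any index is computed is a design choice of this sketch: the later indices rely on gaps between consecutive jobs, which are only meaningful when the jobs are listed in processing order.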

2.2. Model

Next, the model for estimating the causes of tardiness is introduced. First, we need a target job t, for which the cause of the tardiness is estimated. It can be selected automatically by finding a job that is tardy, i.e. finding one such job t for which the following holds:

$$ f_t > d_t \tag{8} $$

After selecting the target tardy job t, our model finds out what the likely causes of the tardiness are. In our model, the following five causes of tardiness are studied: bad scheduling, a rush job, a long job, unavailable capacity and bottleneck congestion. For each cause and each job, there is an index that estimates how significant the cause is. The indices are combined to calculate how much the causes affect the tardiness of the target job. The indices are formulated next, one at a time.

Bad scheduling (BS) occurs when a job that has a loose deadline is processed instead of a job with a tight deadline. This occurs when products are not processed according to their earliest due date (EDD) priority. Thus, when the target job is considered, bad scheduling occurs when a job j with a deadline after the target job t is processed before the deadline of the target job, i.e. if the following occurs:

$$ d_j > d_t \wedge s_j < d_t \tag{9} $$

Further, because it is more harmful if the processing time of job j is longer, bad scheduling is weighted by $p_j / \bar{p}$. Using this, we calculate the index value for bad scheduling, $BS_j$, as follows:

$$ BS_j = \begin{cases} p_j / \bar{p}, & \text{if } d_j > d_t \wedge s_j < d_t \\ 0, & \text{otherwise} \end{cases} \tag{10} $$

A rush job (RJ) can be identified by looking at the allowance time of the jobs. If the allowance time for a job is small, the job is considered a rush job. More precisely, the actual allowance time of the job is compared to the average allowance time to find out if a job is a rush job. If $a_j$ is smaller than $\bar{a}$, the job j is considered a rush job. In that case, the corresponding index value is obtained by calculating the difference from the average allowance time and dividing the difference by the average allowance time. Thus, the actual index value is calculated as follows:

$$ RJ_j = \begin{cases} (\bar{a} - a_j) / \bar{a}, & \text{if } a_j < \bar{a} \\ 0, & \text{if } a_j \ge \bar{a} \end{cases} \tag{11} $$

A long job (LJ) can be the result of a breakdown or other problems. Here, the job is considered a long job if its processing time $p_j$ is longer than the average processing time $\bar{p}$. The corresponding index value is obtained by dividing the difference between the processing time and the average processing time by the average processing time. Thus, the mathematical equation for a long job index is the following:

$$ LJ_j = \begin{cases} (p_j - \bar{p}) / \bar{p}, & \text{if } p_j > \bar{p} \\ 0, & \text{if } p_j \le \bar{p} \end{cases} \tag{12} $$

Problems with unavailable capacity (UC) occur when jobs are not processed in a situation where the machine is free. This can be caused e.g. by breakdowns or the absenteeism of operators. This kind of situation shows as a gap between the processing of the jobs. The length of the gap divided by the average duration of the processing is the value of the corresponding index. It is calculated as follows:

$$ UC_j = \begin{cases} (s_j - f_{j-1}) / \bar{p}, & \text{if } s_j - f_{j-1} > 0 \\ 0, & \text{otherwise} \end{cases} \tag{13} $$

Bottleneck congestion (BC) occurs if the machine under study is overbooked. The jobs are part of the bottleneck congestion if they are processed successively just before the tardy job. The index value for a job is calculated recursively downwards, starting from the target job. If there is slack between the starting time of a job and the finish time of the previous job, the bottleneck congestion is reduced. The amount of the reduction is the slack divided by the average processing time. Of course, the amount of this reduction cannot make the index negative. Thus, the actual index is the following:

$$ BC_j = \begin{cases} \max\left( BC_{j+1} - (s_{j+1} - f_j) / \bar{p},\ 0 \right), & \text{if } j \in \{1, 2, \ldots, t-1\} \\ 1, & \text{if } j = t \\ 0, & \text{otherwise} \end{cases} \tag{14} $$

By taking into account the above indices, a criticality index $C_j$ is constructed for each job j. Bottleneck congestion is important here as it indicates how much the causes of tardiness of a specific job can affect the tardiness of the target job. Therefore, the criticality index is calculated by multiplying the bottleneck congestion index by the sum of the other indices, i.e.

$$ C_j = BC_j \left( BS_j + RJ_j + LJ_j + UC_j \right) \tag{15} $$

After all the $C_j$ have been obtained, the probability $P_j$ that the specific job caused the delay of the target job can be calculated. It is simply $C_j$ divided by the sum of all the $C_j$, i.e.

$$ P_j = C_j \Big/ \sum_{k \in \{1, 2, \ldots, J\}} C_k \tag{16} $$

This number can be used to find out the greatest cause of the tardiness. The job with the greatest number is most probably the cause of the tardiness.
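The five indices, the criticality index and the resulting probabilities can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the `Job` fields, the function name and the three-job schedule are hypothetical, the bottleneck congestion step is read as a subtraction clamped at zero, and the idle gap of the first job is measured from its own release time because the text does not define a predecessor for it.

```python
from dataclasses import dataclass

@dataclass
class Job:
    r: float  # release time
    d: float  # deadline
    s: float  # starting time
    f: float  # finish time

def tardiness_causes(jobs, t):
    """Indices for every job, given the 0-based position t of the tardy target job.

    jobs must be listed in processing order.
    """
    n = len(jobs)
    p = [job.f - job.s for job in jobs]      # processing durations
    a = [job.d - job.r for job in jobs]      # allowance times
    p_bar, a_bar = sum(p) / n, sum(a) / n
    target = jobs[t]

    # Bottleneck congestion: 1 for the target, reduced by slack when walking back.
    bc = [0.0] * n
    bc[t] = 1.0
    for j in range(t - 1, -1, -1):
        slack = jobs[j + 1].s - jobs[j].f
        bc[j] = max(0.0, bc[j + 1] - slack / p_bar)

    rows = []
    for j, job in enumerate(jobs):
        bs = p[j] / p_bar if (job.d > target.d and job.s < target.d) else 0.0
        rj = (a_bar - a[j]) / a_bar if a[j] < a_bar else 0.0
        lj = (p[j] - p_bar) / p_bar if p[j] > p_bar else 0.0
        # Assumption: the first job has no predecessor, so its gap starts at release.
        previous_end = jobs[j - 1].f if j > 0 else job.r
        gap = job.s - previous_end
        uc = gap / p_bar if gap > 0 else 0.0
        rows.append({"BS": bs, "RJ": rj, "LJ": lj, "UC": uc, "BC": bc[j],
                     "C": bc[j] * (bs + rj + lj + uc)})

    total = sum(row["C"] for row in rows)
    for row in rows:
        row["P"] = row["C"] / total if total > 0 else 0.0
    return rows

# Hypothetical schedule: the third job finishes at 14 but is due at 8.
jobs = [Job(0, 40, 0, 2), Job(0, 12, 4, 8), Job(0, 8, 8, 14)]
rows = tardiness_causes(jobs, t=2)
most_likely = max(range(len(rows)), key=lambda j: rows[j]["P"])
print(most_likely, [round(row["P"], 3) for row in rows])
```

In this made-up schedule the second job receives the largest share: it has a later deadline than the target but runs before the target's deadline, it has a short allowance, and it starts after an idle gap. The first job also counts as badly scheduled, but the gap that follows it halves its bottleneck congestion index and therefore its weight.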

3. Example

This section describes an example where the model introduced above is used. The example data has eight jobs that are processed in a single-machine system. The data, which is randomly generated, is shown in Table 1. The same data is again shown as a Gantt chart in Fig. 2. Job 8, the only tardy job, is selected as the target job, and then the results are calculated from Table 1 using the model. The results from the model are shown in Table 2.

Table 1: The example schedule data where eight jobs are processed in a single-machine system.

| j | Release time $r_j$ | Deadline $d_j$ | Starting time $s_j$ | Finish time $f_j$ |
|---|---|---|---|---|
| 1 | 0.00 | 47.53 | 2.00 | 10.27 |
| 2 | 0.00 | 80.87 | 10.27 | 19.54 |
| 3 | 11.22 | 82.53 | 19.54 | 38.86 |
| 4 | 20.69 | 84.66 | 38.86 | 52.09 |
| 5 | 33.44 | 140.00 | 52.09 | 62.00 |
| 6 | 44.80 | 101.47 | 65.00 | 83.25 |
| 7 | 80.85 | 108.49 | 83.25 | 98.75 |
| 8 | 90.88 | 112.49 | 98.75 | 118.79 |
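The automatic selection of the target job can be illustrated directly on the data of Table 1. The sketch below assumes a plain tuple layout `(job, release, deadline, start, finish)` and hypothetical names; it only applies the rule that a job is tardy when its finish time is after its deadline.

```python
# Table 1 as (job, release, deadline, start, finish); the layout is illustrative.
TABLE_1 = [
    (1, 0.00, 47.53, 2.00, 10.27),
    (2, 0.00, 80.87, 10.27, 19.54),
    (3, 11.22, 82.53, 19.54, 38.86),
    (4, 20.69, 84.66, 38.86, 52.09),
    (5, 33.44, 140.00, 52.09, 62.00),
    (6, 44.80, 101.47, 65.00, 83.25),
    (7, 80.85, 108.49, 83.25, 98.75),
    (8, 90.88, 112.49, 98.75, 118.79),
]

def tardy_jobs(rows):
    """Jobs whose finish time is after their deadline."""
    return [job for (job, r, d, s, f) in rows if f > d]

tardy = tardy_jobs(TABLE_1)
# Amount by which each tardy job misses its deadline, rounded to two decimals.
lateness = {job: round(f - d, 2) for (job, r, d, s, f) in TABLE_1 if f > d}
print(tardy, lateness)
```

On this data the list contains only job 8, in line with the text, so the choice of the target job is unambiguous.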

Fig. 2: Example schedule data in a Gantt chart. Job 8 is tardy, and the tardiness is marked in the figure using a darker colour. Job 8 is selected as the target job in the example.

Table 2: The indices given by the model. Job 8 is selected as the target job. Zero values are shown as empty cells.

| j | $BS_j$ | $RJ_j$ | $LJ_j$ | $UC_j$ | $BC_j$ | $C_j$ | $P_j$ |
|---|---|---|---|---|---|---|---|
| 1 | | 0.40 | | 0.13 | 0.81 | 0.43 | 12% |
| 2 | | | | | 0.81 | 0.00 | 0% |
| 3 | | 0.10 | 0.25 | | 0.81 | 0.28 | 8% |
| 4 | | 0.19 | | | 0.81 | 0.16 | 4% |
| 5 | 0.64 | | | | 0.81 | 0.52 | 14% |
| 6 | | 0.29 | 0.18 | 0.19 | 1.00 | 0.66 | 18% |
| 7 | | 0.65 | | | 1.00 | 0.65 | 18% |
| 8 | | 0.73 | 0.29 | | 1.00 | 1.02 | 28% |

4. Discussion

This section analyses the results of the above numerical example and discusses the use of the model. The purpose is to find out how well the model behaves in the example and discuss the advantages of the model.

The results of the example in Table 2 show that most probably (28%) job 8 has caused its tardiness by itself. The indices $RJ_8$ and $LJ_8$ of job 8 have values greater than 0, which means that job 8 is a rush job and a long job. Jobs 7 and 6 have the second greatest probability (18%) of being the cause of the tardiness of job 8. Indices suggest that job 7 is a rush job ($RJ_7 = 0.65$). Job 6 is a rush job ($RJ_6 = 0.29$) and a long job ($LJ_6 = 0.18$) and it has problems with unavailable capacity ($UC_6 = 0.19$). Job 5 has the third greatest probability (14%) of being the cause of the tardiness of job 8. Its actual cause is bad scheduling ($BS_5 = 0.64$), as it could have been processed after job 8. Jobs 1, 2, 3 and 4 have lower probabilities (12%, 0%, 8% and 4%, respectively) of being the cause of the tardiness of job 8.

When the results from the model are compared with the schedule, the model generally seems to give sound results. However, there are some issues that might be seen as problems. First, the model indicates bad scheduling for job 5, which is started without knowing that job 7 and job 8 are released later. So it is not necessarily poorly scheduled. This could be taken into account in extended models in the future. Second, job 1 gets quite a high percentage (12%), but because it was processed much earlier, it is not a likely cause of the tardiness. Thus, it seems that the model could be improved for high numbers of consecutive jobs.

The model has many advantages. It is easy to understand as it uses simple formulae. It needs only a few facts from the production schedule. The parts of the model can also be used with an even smaller amount of information. Table 3 lists the information that is needed by the different causes of tardiness. For example, the rush job index needs only release and deadline information.

Table 3: Information needed for different indices. Rows (detail): release time for job j, deadline for job j, starting time for processing of job j, finish time for processing of job j. Columns: bad scheduling index, rush job index, long job index, unavailable capacity index, bottleneck congestion index, criticality index.

The model reveals new relevant information from the data: rush jobs, bad scheduling and bottlenecks. Data mining with the proposed model can e.g. answer the following questions: Why are there tardy jobs? How likely is it that a rush job causes tardiness? How often is a machine a bottleneck?

The computational complexity of the model is low. Processing all tardy jobs is more than linear in the problem size, because the model needs backtracking from each tardy job. However, the backtracking distance can be limited, which makes the computational complexity linear in terms of the problem size.
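One possible way to limit the backtracking distance is sketched below in Python. It is an assumption of this sketch, not a procedure given in the paper: the walk back from the target job stops either after a fixed number of steps (`max_back`, a hypothetical parameter) or as soon as the bottleneck congestion index reaches zero. The early stop relies on reading the congestion step as a subtraction clamped at zero, in which case every earlier job would also receive zero.

```python
def congestion_window(starts, finishes, t, p_bar, max_back=None):
    """Bottleneck congestion for the target t and the jobs visited before it.

    starts and finishes are in processing order, t is a 0-based index and
    max_back optionally caps how many predecessors are visited.
    """
    bc = {t: 1.0}
    j, steps = t - 1, 0
    while j >= 0 and (max_back is None or steps < max_back):
        slack = starts[j + 1] - finishes[j]
        value = max(0.0, bc[j + 1] - slack / p_bar)
        if value == 0.0:
            break  # earlier jobs cannot regain congestion, so stop here
        bc[j] = value
        j -= 1
        steps += 1
    return bc

# Hypothetical three-job schedules with an average processing time of 4.
full = congestion_window([0, 4, 8], [2, 8, 14], t=2, p_bar=4.0)
capped = congestion_window([0, 4, 8], [2, 8, 14], t=2, p_bar=4.0, max_back=1)
long_gap = congestion_window([0, 10, 14], [2, 14, 20], t=2, p_bar=4.0)
print(full, capped, long_gap)
```

With a fixed `max_back`, each tardy job costs a bounded amount of work, so analysing all tardy jobs grows linearly with the number of jobs. Jobs outside the returned window are simply treated as having no influence on the target job.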

5. Conclusions

This paper studied how the cause of a tardy job can be found from simple schedule data. In order to do that, it introduced a model that can be used to analyse the schedule data of a single-machine system. The model studies a target job that is tardy, and reveals whether the tardiness is caused by having a problem with bad scheduling, by rush jobs or long jobs, by having problems with unavailable capacity or by being part of bottleneck congestion. The benefits of the model are:

• The model is easy to understand and implement. It does not use any kind of distributions or complex optimisation algorithms;
• Using the model, extra information is revealed from the existing schedule data. The value of existing production data might be low without this kind of extra information;
• The computational complexity of the model can be linear in terms of the size of the problem. Thus, the model can be used efficiently for data mining.

In this paper, the model has been shown to work in a simple example. In future, the model could be validated using case studies with large-scale data, where the model can be used to reveal what the typical scheduling problems in a company are. The model can also be extended for other, more complex cases. Possible extensions include a multi-machine case and a case where not all the information is available. It would also be interesting to use a similar approach for other kinds of problems such as setup-time optimisation or supply chain delays. Additionally, it is not clear how the model can be visualised in reporting software. This can be another research direction in the future.

Acknowledgment

This research was carried out as part of the Finnish Metals and Engineering Competence Cluster’s (FIMECC) MANU programme in the LeanMES project.

References

[1] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, A. H. Byers, Big data: The next frontier for innovation, competition, and productivity, http://www.mckinsey.com/business-functions/businesstechnology/our-insights/big-data-the-next-frontier-forinnovation, 2011.
[2] B. Nedelcu, About big data and its challenges and benefits in manufacturing, Database Systems Journal, Vol. 4, pp. 10-19, 2013.
[3] S. Sagiroglu, D. Sinanc, Big data: a review, International Conference on Collaboration Technologies and Systems (CTS), pp. 42-47, 2013.
[4] B. Brown, M. Chui, J. Manyika, Are you ready for the era of ‘big data’, McKinsey Quarterly 4, pp. 24-35, 2011.
[5] X. Li, S. Olafsson, Discovering dispatching rules using data mining, Journal of Scheduling, Vol. 8, pp. 515-527, 2005.
[6] Y-R. Shiue, Data-mining-based dynamic dispatching rule selection mechanism for shop floor control systems using a support vector machine approach, International Journal of Production Research, Vol. 47, pp. 3669-3690, 2009.
[7] J. H. Blackstone, Theory of constraints – a status report, International Journal of Production Research, Vol. 39, pp. 1053-1080, 2001.
[8] E. M. Goldratt, The goal: a process of ongoing improvement, North River Press, 2011.
[9] C. Roser, M. Nakano, M. Tanaka, Shifting bottleneck detection, Proceedings of the Winter Simulation Conference, Vol. 2, pp. 1079-1086, 2002.
[10] C. Roser, M. Nakano, M. Tanaka, Comparison of bottleneck detection methods for AGV systems, Proceedings of the Winter Simulation Conference, Vol. 2, pp. 1192-1198, IEEE, 2003.
[11] L. Li, Q. Chang, J. Ni, Data driven bottleneck detection of manufacturing systems, International Journal of Production Research, Vol. 47, pp. 5019-5036, 2009.
[12] B. Kernan, A. Lynch, D. Tung, C. Sheahan, A novel metric for determining the constraining effect of resources in manufacturing via simulation, International Journal of Production Research, Vol. 49, pp. 3565-3584, 2011.
[13] W. Ibbs, L. D. Nguyen, Schedule analysis under the effect of resource allocation, Journal of Construction Engineering and Management, Vol. 133, pp. 131-138, 2007.
[14] S. Alkass, M. Mazerolle, F. Harris, Construction delay analysis techniques, Construction Management & Economics, Vol. 14, pp. 375-394, 1996.
[15] J. N. Gupta, E. F. Stafford Jr., Flowshop scheduling research after five decades, European Journal of Operational Research, Vol. 169, pp. 699-711, 2006.
