
ScienceDirect Procedia CIRP 41 (2016) 51 – 56

48th CIRP Conference on MANUFACTURING SYSTEMS - CIRP CMS 2015

Improving data consistency in production control

Christina Reuter, Felix Brambring*

Laboratory for Machine Tools and Production Engineering (WZL) of RWTH Aachen University, Steinbachstr. 19, 52074 Aachen, Germany

* Corresponding author. Tel.: +49-241-80-28241; fax: +49-241-80-22293. E-mail address: [email protected]

Abstract

Under volatile market conditions, manufacturing companies which achieve a high adherence to promised delivery dates possess a considerable advantage over their direct competitors. However, due to dynamically changing production circumstances, this logistical target is rather hard to reach. Excellent production planning and control (PPC) processes are a main prerequisite for managing the inevitable turbulences which occur in every production system and which impede following through with detailed scheduled production plans. An often overlooked reason for deviations between the originally planned and the actually realized production program is the inadequate quality of master and transaction data. PPC processes such as detailed scheduling, production control and production monitoring rely heavily on a vast volume of data in order to update short-term production plans, derive conclusions for ad-hoc control interventions and monitor resources' efficiency as well as production job statuses. Typical techniques for dealing with inadequate data quality in the business context involve implementing integrity constraints in databases and defining data quality processes for the whole organization. Evidently, these classic approaches do not successfully prevent manufacturing companies from dealing with inadequate data quality in PPC processes. This paper presents a new approach for mitigating the negative effects of deficient data quality in production control by adapting data mining (DM) algorithms in order to estimate probable values for typical data inconsistencies in data relevant for production control. The algorithms are tested on a real-world data set of a typical mid-sized manufacturing company and their capability of retrieving the correct values is quantified.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of the 48th CIRP Conference on MANUFACTURING SYSTEMS - CIRP CMS 2015.

Keywords: Production Planning and Control; Data mining; Scheduling accuracy; Beyond lean manufacturing

1. Introduction

Manufacturing companies in high-wage countries like Germany operate in a competitive environment under volatile market conditions which give rise to a number of mutually reinforcing challenges [1]: in a globalized economy, competitors from emerging countries strive to catch up in terms of high-quality products and services, while customers' increasing expectations towards highly customized products with short delivery times force these companies to manufacture a high variety of individualized, small-scale series products in job shop production with complex material flows [2, 3]. Studies show that manufacturing companies which manage to handle these challenges while maintaining a high adherence to promised delivery dates possess a considerable competitive advantage [4]. In spite of the importance of this target, manufacturing companies often struggle with

fulfilling this logistical target due to unpredictable deviations caused by machine failures, missing raw materials or short-term customer changes [5]. Excellent production planning and control (PPC) processes can be considered a main prerequisite for coping with these inevitable turbulences which occur in every production system and are reinforced by dynamic market conditions [6]. Consequently, manufacturing companies employ IT systems which support near- and middle-term scheduling tasks and monitoring of production processes. These so-called Advanced Planning and Scheduling (APS) systems integrate master data usually stored in the database of an Enterprise Resource Planning (ERP) system and a wide range of production feedback data, such as time stamps and production volumes, in order to plan and update detailed scheduling of production jobs. The production feedback data is gathered via multiple sources like Production Data Acquisition (PDA) or



Manufacturing Execution (MES) systems, specifically installed sensors or even manual reporting [5, 7]. As a matter of fact, incomplete and inconsistent data are inevitable when gathering live data via multiple sources, and production feedback data does not represent an exception [8, 9]. Evidently, the existing approaches for ensuring high data quality cannot successfully prevent or at least mitigate this problem. In this paper, a new approach for mitigating the negative effects of missing, noisy and inconsistent data in production planning and control by adapting data mining algorithms for estimating correct values is presented and quantified. Results indicate that implementing this approach in the PPC processes of manufacturing companies will reduce laborious data cleaning, increase transparency over current production job statuses and ultimately help to improve the logistical performance of manufacturing companies.

The paper is structured in five sections. First, typical inconsistencies in production feedback data and their negative impact on PPC processes are discussed. In order to distinguish the presented approach from the current state of the art, existing practices are then identified, clustered and reviewed. Subsequently, a detailed description of our approach is given and three specific algorithms are discussed in detail. For assessing its advantageousness, the algorithms' capability of retrieving the correct values is tested. The paper concludes with a brief summary and an outlook on future research activities to further refine and validate the presented approach.

2. Typical inconsistencies in production control relevant data

Detailed scheduling, production control and production monitoring are among the most important PPC processes which heavily rely on a vast amount of production feedback data in order to update short-term production plans, derive conclusions for ad-hoc control interventions and monitor resources' efficiency as well as production job statuses. However, the quality of these data is regularly impaired by a number of typical data inconsistencies and errors. High data quality can be defined as a state in which the data satisfy the requirement of "fitness for use" [10, 11]. Therefore, high data quality must be defined with the underlying data usage in mind. For PPC processes, data can be considered of high quality if they fulfill the following three criteria [12, 13]:

• Consistency: information can be derived from the data without any contradictions surfacing
• Correctness: reality is reflected correctly by the data
• Completeness: no data which should have been gathered is missing

However, various reasons, such as imprecise or incomplete manual feedback from the shop floor, an incorrect aggregation of data in business databases, uncoordinated deviations from planned assignments or failing machine-integrated sensors, lead to noisy, incomplete and inconsistent data and diminish their quality [14].

Fig. 1. Different cases of time inconsistencies in production feedback data. The figure contrasts the start and end time stamps of two consecutive process steps i and i+1 (legend: tS,i = start time stamp of process step i; tE,i = end time stamp of process step i) and distinguishes the following overlap conditions for cases (b) to (f):

tS,i < tS,i+1 < tE,i < tE,i+1
tS,i+1 < tE,i+1 < tS,i < tE,i
tS,i+1 < tS,i < tE,i+1 < tE,i
tS,i < tS,i+1 < tE,i+1 < tE,i
tS,i+1 < tS,i < tE,i < tE,i+1

Typical data inconsistencies and errors in production feedback data include missing information concerning the utilized work stations and incomplete or inconsistent start and end time stamps of individual process steps. These data inconsistencies are labeled (a) to (g) and visualized in Fig. 1:

(a) Missing information concerning the utilized work station
(b) to (f) Different cases of time inconsistencies (cf. Fig. 1)
(g) Missing start and end time stamps of individual process steps
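Such inconsistencies can be flagged automatically with simple integrity rules. The following is a minimal sketch of this idea, not the authors' implementation; the table layout and the column names (job_id, step_no, work_station, t_start, t_end) are assumptions made purely for illustration.

```python
# Minimal sketch: integrity rules flagging inconsistency classes (a), (b)-(f)
# and (g) in a production feedback table with assumed columns.
import pandas as pd

def flag_inconsistencies(feedback: pd.DataFrame) -> pd.DataFrame:
    df = feedback.sort_values(["job_id", "step_no"]).copy()

    # (a) missing information concerning the utilized work station
    df["flag_a"] = df["work_station"].isna()

    # (g) missing start and/or end time stamp of the process step
    df["flag_g"] = df["t_start"].isna() | df["t_end"].isna()

    # (b)-(f), simplified: the next process step of the same job starts
    # before the current one has ended, i.e. the time intervals overlap
    next_step = df.groupby("job_id")[["t_start", "t_end"]].shift(-1)
    df["flag_time"] = next_step["t_start"] < df["t_end"]
    return df

# Usage sketch:
# feedback = pd.read_csv("feedback.csv", parse_dates=["t_start", "t_end"])
# flag_inconsistencies(feedback)[["flag_a", "flag_time", "flag_g"]].mean()  # rates as in Table 1
```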

Table 1 shows the rates of occurrence for these typical data inconsistencies in four real-world production feedback data sets gathered over several months at four mid-sized German manufacturing companies with job shop production.

Table 1. Data inconsistencies in four real-world production feedback data sets

Company        A                      B                      C                      D
Time period    Jan 2012 to Nov 2012   Jan 2013 to Jan 2014   Jan 2010 to Feb 2012   Jan 2010 to Nov 2011
# data sets    92,049                 89,188                 225,937                273,760
(a)            0.1 %                  0.0 %                  0.2 %                  0.1 %
(b)            0.8 %                  5.9 %                  6.5 %                  0.2 %
(c)            0.2 %                  2.9 %                  0.7 %                  0.3 %
(d)            0.1 %                  0.4 %                  0.3 %                  0.1 %
(e)            0.1 %                  0.5 %                  0.4 %                  4.2 %
(f)            0.1 %                  0.1 %                  0.3 %                  0.1 %
(g)            5.7 %                  14.9 %                 6.9 %                  3.1 %

Considering the absolute numbers of recorded data sets, each of which represents an individual production process, these relatively low percentage values still translate into a substantial number of recorded data errors or inconsistencies. Incomplete and inconsistent production feedback data give rise to various problems concerning PPC processes: in production monitoring, the major constraint is the reduction of sample size, which diminishes the power and validity of any calculations concerning the achievement of defined logistical


targets. For example, the occurrence of data inconsistency (a) given in Table 1 inhibits the correct calculation of the actual utilization of a work station and of the adherence to promised delivery dates, since the occupancy duration of the work station cannot be accounted for. In detailed scheduling, production feedback data is used for updating the list of current production jobs and their allocation to the work stations, which implies the chronological order of the processes. Errors and inconsistencies in these input data do not mirror the actual state of production jobs and ultimately result in a production plan which can be considered unrealistic from the start. Following this rationale, unsatisfactory results in the adherence to promised delivery dates are deeply rooted in the planning logic and its flaws.

The problematic nature of errors and inconsistencies in databases is well known in all empirical fields of science [15, 16, 17]. Already in the 1970s, RUBIN formulated three universal mechanisms for missing data [18]:

• MCAR: Missing Completely At Random
• MAR: Missing At Random
• MNAR: Missing Not At Random

In RUBIN's framework, missing data is considered a probabilistic phenomenon with a random distribution. The applicability of statistical methods depends on the underlying missing data mechanism [19]. In a MCAR situation, the probability of missing data in one variable is independent of this variable itself and of all other external influences, which means that the available data can be used to make inferences without introducing a bias into the data. In case of a MAR mechanism, the probability of missing data only depends on the observed data but not on the missing data itself, meaning the cause for the missing data is some external factor. In this case, predictions can still be made solely from the available variables in the data set. In contrast to the previous two cases, in a MNAR situation the missing data is non-random and depends on the missing variable itself. The MNAR mechanism implies that it is not possible to make estimations of the missing data which are exclusively based on the available data. Therefore, no generic method for handling missing data exists in this case; a method must be developed individually by taking into consideration any external data which might be available [20].

The MCAR hypothesis can be tested by conducting univariate t-tests [17]. Applying this approach to the errors listed in Table 1 suggests that the errors are in fact not MCAR, which was to be expected since the probability of an existing underlying cause is high in a manufacturing and assembly environment. For inconsistent data, no comparable classification framework exists which could guide the handling of the inconsistencies. However, in many cases it is possible to decide which of the inconsistent pieces of information is more trustworthy and to consider the remaining data as missing under the MNAR mechanism.

Due to its common occurrence in real-world data, the mechanisms, negative implications and remedies of missing and inconsistent data in general have been widely studied and discussed by researchers and practitioners alike.
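The univariate t-test mentioned above can be sketched as follows; this is an illustrative reading of the test, not the authors' exact protocol, and the column names are assumptions. For each numeric attribute, the records are split into those where the target field (here the work station) is missing and those where it is present; significantly different group means indicate that the data are not MCAR.

```python
# Sketch: univariate t-tests of the MCAR hypothesis (assumed column names).
import pandas as pd
from scipy import stats

def mcar_t_tests(df: pd.DataFrame, target: str, attributes: list) -> pd.Series:
    missing = df[target].isna()
    p_values = {}
    for col in attributes:
        group_missing = df.loc[missing, col].dropna()
        group_present = df.loc[~missing, col].dropna()
        # Welch's t-test: a small p-value means the attribute differs between
        # records with and without the target value, i.e. the data are not MCAR
        _, p = stats.ttest_ind(group_missing, group_present, equal_var=False)
        p_values[col] = p
    return pd.Series(p_values)

# Usage sketch:
# mcar_t_tests(feedback, target="work_station", attributes=["lot_size", "planned_duration"])
```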

The fundamental streams of thought will be reviewed in the next section.

3. State of the art

Existing approaches for reducing data errors and inconsistencies in databases can be clustered into two groups. The approaches of the first group have in common that they focus on defining and implementing organizational processes in order to prevent low data quality, while the approaches of the second group are characterized by intercepting false data through algorithms implemented in the various IT systems.

Organizational approaches concerning data quality strive to design resilient processes which are able to guide the employees' work within the organization. KNIGHT and SMITH propose an organizational structure with three levels: the Enterprise Information Governance Steering Committee is responsible for implementing a data quality vision and financing initiatives in order to improve data quality. The Data Governance Council constitutes the strategic level where standards are set for all departments and where activities for the third level, the Data Stewards, are defined. The Data Stewards are responsible for ensuring high data quality on the tactical level by managing access to and approval of data sets [21]. In a similar concept, RUSSOM also suggests managing data quality in a three-level organization with a Governance Committee acting as the steering board and Corporate Stewards as units coordinating data quality initiatives. In this concept, so-called Domain Stewards correspond to the previously mentioned Data Stewards [22]. Since all organizational approaches have in common that employees must learn and follow the processes and all associated rules, these procedures rely heavily on the employees' discipline and are prone to human error.

Approaches directly focusing on the IT systems and databases try to mitigate the disadvantages of organizational processes by implementing rules and advanced logic directly into the IT systems. One common approach involves implementing integrity constraints into Database Management Systems (DBMS), which are designed to ensure data consistency over the whole data life-cycle. Integrity constraints guarantee that data sets are semantically correct when data is inserted, changed or deleted from a database by specifying a range of admissible values which every data transaction has to adhere to [23, 24]. The simplest integrity constraint can be implemented by defining mandatory fields which the user needs to fill before the program continues. However, such integrity constraints can easily be circumvented by entering just a random letter or a dot. More sophisticated integrity constraints possess the disadvantage that changing circumstances might require an expansion of the admissible range of values, which needs to be adjusted manually.

Data mining (DM) represents another line of research in which cleaning data is a fundamental step and a prerequisite for extracting knowledge from any given database. Although DM is widely applied in fields such as banking, insurance, medicine and marketing, little DM research concerning production and manufacturing has been carried out. Literature


reviews conducted by WANG ET AL. [25], HARDING ET AL. [26], and CHOUDHARY ET AL. [27] state that current research concentrates on product quality improvement, product design, materials planning, and production processes and operations. Most of this research neither exclusively focuses on PPC processes nor explicitly discusses missing or inconsistent data and its negative effects on detailed scheduling and production monitoring.

In their pioneering work, LAKSHMINARAYAN ET AL. problematized incomplete data in industrial databases taking the example of a Honeywell dataset. The authors applied a Bayesian unsupervised learning method and a decision-tree based supervised learning method for imputing values and report that the unsupervised learning approach performs poorly when predicting a single value but has high accuracy when predicting multiple choices for the same target variable. Both machine learning techniques are applied on maintenance data while PPC processes are not explicitly addressed [28]. KOONCE and TSAI deploy a genetic algorithm for optimizing a job shop schedule and apply data mining in order to detect patterns in the optimized detailed schedule. Their goal is to develop a rule-set scheduler which will be able to mimic the genetic algorithm's scheduling results in similar situations. However, since they apply the approach to a simple theoretical scheduling problem, no missing or inconsistent data occurs [29]. ÖZTÜRK ET AL. use a regression tree approach in order to estimate lead times in a make-to-order manufacturing environment. The authors compare their approach to three other lead time estimation methods and deem their approach superior. Since they use simulated data, missing or inconsistent data lies outside the scope of their research [30]. ZENG and GAO discuss the impact of value imputation based on a decision-tree algorithm when predicting control parameters of a blast furnace ironmaking process. The authors conclude that their data preprocessing approach can considerably improve controlling strategies of the ironmaking process [31]. Finally, KWAK and KIM propose a data mining approach for optimizing a semiconductor-manufacturing process under consideration of missing values. The authors conclude that their multiple imputation approach can improve process results. However, the paper concentrates on only one process step of a highly linear value stream which cannot be compared to a job shop production with complex material flows [32].

To summarize, in contrast to the organizational approaches, the methodologies of the second group aim at maintaining the logical integrity of the data retrospectively. This might appear counterintuitive because the root causes of missing and inconsistent data are not directly addressed; rather, inadequate data quality is treated as an inevitable characteristic of empirical data, and the mitigation of its negative effects by estimating and imputing the correct values is considered the main priority [20]. While existing DM approaches in production and manufacturing have proven their applicability in several production-related domains in a missing-data context, no such methodologies exist for PPC processes. However, due to the great importance of PPC

processes for business success, manufacturing companies could profoundly benefit from DM approaches specifically designed for this purpose.

4. Improving data consistency in production control

Taking the challenge of low data quality in PPC processes and the current state of the art into consideration, several requirements for our newly proposed approach can be formulated: First and foremost, the new approach should be automated and robust, so that production control is not slowed down, as well as flexible and modularly extendable in order to ensure adaptability to different data structures. Secondly, the approach should introduce only a minimal bias; notably, the fact that working with unaltered, low-quality data also introduces a bias into any analysis is regularly not considered at all.

The conceptual integration of our approach for improving data consistency in production control into the classic production control loop is depicted in Fig. 2. The planning bases, such as products, volumes and dates, as well as the latest production feedback data are stored in databases most commonly managed by the ERP system. A detailed scheduled production program is compiled by the APS system and serves as the guideline for all upcoming production jobs. Due to technical or organizational disturbances, this production program regularly cannot be followed through exactly as planned. On top of that, the production feedback data quality is usually impaired by the previously described errors in manual documentation or failing IT hardware. By updating the ERP database with these new production feedback data, the classic control loop is closed.

In our approach, a second, inner control loop is implemented which is responsible for automatically detecting and healing typical data inconsistencies and errors in production control. In control system theory, the inner loop of such a cascaded control loop runs faster than the outer one [34], which prevents the faulty production feedback data from becoming the basis for updating the detailed scheduled production program. The inner control loop consists of two sequential steps: first, typical data inconsistencies are automatically identified by implementing integrity rules with which the data have to comply. The identified errors are immediately evaluated quantitatively for their severity (cf. Table 1). In the following step, the inconsistent or missing values are estimated using specifically adapted DM algorithms. In this paper, we focus on data inconsistency (a) (cf. Table 1).

Fig. 2. Enhanced production control loop (following [33]). The outer control loop connects PPC planning (an APS plan of products, volumes and dates based on the planning, transaction and master data of the ERP system), production with its MES (subject to technical and organizational disturbances) and the production feedback data gathered via PDA, MDA, sensors, etc. The inner control loop of this cascaded arrangement runs faster than the outer one and comprises the automated identification of data inconsistencies by implementing integrity rules and the estimation of correct values by implementing DM algorithms.
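The two sequential steps of the inner loop can be sketched as a small pipeline. This is an illustrative composition rather than the authors' implementation: it reuses the flag_inconsistencies sketch from Section 2, assumes an interchangeable classifier with a fit/predict interface (e.g. from scikit-learn) and uses illustrative attribute names.

```python
# Sketch of the inner control loop: identify inconsistencies with integrity
# rules, quantify them, then estimate missing work stations with a DM algorithm.
import pandas as pd

def inner_control_loop(feedback: pd.DataFrame, estimator) -> pd.DataFrame:
    # Step 1: automated identification of data inconsistencies (integrity rules)
    flagged = flag_inconsistencies(feedback)                  # cf. sketch in Section 2
    rates = flagged[["flag_a", "flag_time", "flag_g"]].mean()
    print("Inconsistency rates (cf. Table 1):\n", rates)      # quantitative evaluation

    # Step 2: estimation of correct values, here for inconsistency (a):
    # missing work station information (categorical encoding omitted for brevity)
    features = ["product_category", "lot_size"]               # assumed attributes
    known = flagged[~flagged["flag_a"]]
    unknown = flagged[flagged["flag_a"]]
    estimator.fit(known[features], known["work_station"])
    flagged.loc[unknown.index, "work_station"] = estimator.predict(unknown[features])
    return flagged
```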


For this inconsistency, we present three DM algorithms, namely the Naïve Bayes Classifier (NBC), the Association Rule Induction (ARI) algorithm, and the k-Nearest Neighbor (kNN) algorithm, which have been adapted and evaluated with respect to their performance in retrieving missing information concerning the utilized work station. All three algorithms are briefly introduced in the following paragraphs.

The Naïve Bayes Classifier (NBC) is based on Bayes's rule of conditional probability, which states that the probability P of a hypothesis H given the evidence E computes according to equation (1) [35].

P[H|E] = (P[E|H] · P[H]) / P[E]    (1)

In the case of missing information concerning the utilized work station, the probability of every work station being the correct one can be calculated bearing on the evidence of the known data, such as product category, lot size, etc. The NBC "naïvely" assumes independence of these events, for only then is it valid to multiply the respective probabilities of the events. Evidently, this requirement is not met in the described case since, e.g., product category and lot size will most likely not be independent characteristics of a production job. However, several practical applications have shown that NBC can provide reliable results despite violation of the independence requirement [36, 37]. NBC is considered to be highly transparent and efficient.
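As a rough illustration of this idea (not the authors' implementation), a Naïve Bayes classifier can be trained on the complete feedback records and used to determine the most probable work station given the known attributes. The attribute names and the use of scikit-learn's CategoricalNB are assumptions, and the sketch presumes that the attribute values of the record to be imputed were observed in the training data.

```python
# Sketch: Naïve Bayes estimation of a missing work station from categorical
# attributes (assumed data layout, scikit-learn implementation).
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

FEATURES = ["product_category", "lot_size_class", "preceding_work_station"]  # assumed attributes

def fit_nbc(known: pd.DataFrame):
    enc = OrdinalEncoder()
    X = enc.fit_transform(known[FEATURES].astype(str))
    nbc = CategoricalNB().fit(X, known["work_station"])
    return enc, nbc

def impute_work_station(enc, nbc, record: pd.Series) -> str:
    X = enc.transform(record[FEATURES].astype(str).to_frame().T)
    # nbc.predict_proba(X) yields the probability of every work station given
    # the evidence; predict returns the most probable one
    return nbc.predict(X)[0]
```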

The application of the Association Rule Induction (ARI) algorithm has been extensively discussed in a previous paper [14]. Two indicators are used to determine the best association rule: the support, which represents the percentage of cases in which the specific rule can be applied, and the confidence, which is a measure for choosing the correct rule given the support [38]. For any two rules I and J, the confidence of J being true given that I is true computes according to equation (2).

con(I → J) = sup(J) / sup(I)    (2)

As an extension to the application described in [14], we now allow every sequence of preceding or antecedent work stations to serve as rule I. The ARI algorithm is easy and fast to compute.
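A simplified sketch of how such rules could be induced from historical routings: for every observed antecedent sequence of work stations I, the algorithm counts how often each work station J follows it and selects the continuation with the highest confidence (cf. equation (2)). The data layout, one list of visited work stations per production job, is an assumption.

```python
# Sketch: inducing work-station rules I -> J from historical routing sequences
# and estimating a missing work station from its antecedents.
from collections import Counter, defaultdict

def induce_rules(routings, max_len=3):
    sup_i = Counter()              # support of antecedent sequences I
    sup_ij = defaultdict(Counter)  # how often work station J follows I
    for route in routings:
        for pos in range(1, len(route)):
            for length in range(1, min(max_len, pos) + 1):
                antecedent = tuple(route[pos - length:pos])
                sup_i[antecedent] += 1
                sup_ij[antecedent][route[pos]] += 1
    return sup_i, sup_ij

def estimate_work_station(sup_i, sup_ij, antecedents):
    # use the longest antecedent sequence that was observed in the history
    for length in range(len(antecedents), 0, -1):
        antecedent = tuple(antecedents[-length:])
        if antecedent in sup_ij:
            # the confidence of antecedent -> J is sup_ij[antecedent][J] / sup_i[antecedent];
            # the denominator is constant per antecedent, so the most frequent J maximizes it
            return sup_ij[antecedent].most_common(1)[0][0]
    return None

# Usage sketch:
# sup_i, sup_ij = induce_rules([["saw", "mill", "drill"], ["saw", "mill", "grind"]])
# estimate_work_station(sup_i, sup_ij, ["saw", "mill"])
```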

Finally, the k-Nearest Neighbor (kNN) algorithm is another standard statistical technique which has been adapted to a wide range of classification problems. In this instance-based learning method, every new instance is matched to the class of the closest k neighbors, which are identified using a distance metric [35]. In our adaptation, we implemented two distance metrics: the Heterogeneous Euclidean-Overlap Metric (HEOM) and the Heterogeneous Value Difference Metric (HVDM). These two metrics differ in the computation of the distance of nominal attributes, which can be mapped more precisely with the HVDM approach. However, the computational effort rises accordingly [39]. In the case of missing information on the utilized work station, the kNN algorithm can identify process steps in the historical data which closely resemble the data set with the missing information in terms of product category, lot size, etc. The closest neighbor can then be used to impute the missing data.
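The following sketch illustrates the HEOM variant only: the overlap distance for nominal attributes, a range-normalized absolute difference for numeric ones, and a majority vote among the k closest historical process steps. The attribute names and the split into nominal and numeric attributes are assumptions, and the distance definition follows [39] in simplified form.

```python
# Sketch: kNN imputation of a missing work station with the Heterogeneous
# Euclidean-Overlap Metric (HEOM); assumed nominal/numeric attribute split.
import numpy as np
import pandas as pd
from collections import Counter

NOMINAL = ["product_category", "preceding_work_station"]   # assumed attributes
NUMERIC = ["lot_size", "planned_duration"]

def heom(record: pd.Series, history: pd.DataFrame, ranges: pd.Series) -> pd.Series:
    d_nom = (history[NOMINAL] != record[NOMINAL]).astype(float)          # 0 if equal, 1 otherwise
    d_num = ((history[NUMERIC] - record[NUMERIC]).abs() / ranges).clip(upper=1.0)
    return np.sqrt((d_nom ** 2).sum(axis=1) + (d_num ** 2).sum(axis=1))

def impute_knn(record: pd.Series, history: pd.DataFrame, k: int = 5) -> str:
    ranges = (history[NUMERIC].max() - history[NUMERIC].min()).replace(0, 1)
    dist = heom(record, history, ranges)
    neighbors = history.loc[dist.nsmallest(k).index, "work_station"]
    return Counter(neighbors).most_common(1)[0][0]           # majority vote of the k closest steps
```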

All three algorithms presented here have been employed on the problem of missing information concerning the utilized work station. The comparative results are presented in the next section.

5. Experiments and results

In order to validate the presented approach, all three algorithms have been applied to the real-world data set from company 1 (cf. Table 1). The data set has been prepared by deleting all lines with missing information concerning the utilized work station in order to obtain a gold standard data set which builds the basis for further analysis. For creating the test data set, the missing values were afterwards re-inserted following their MAR pattern. This approach is in line with other research concerning missing data [17, 32]. The experiment results are given in Fig. 3.

The kNN algorithm in combination with the HVDM distance metric delivered the best result with just over 90 % correctly estimated work stations when k was set to 5. However, the HEOM distance metric was only 0.5 % behind while requiring much less computational effort. The Association Rule Induction algorithm can be considered less powerful in terms of estimating the correct work station, with only 84 % correct estimations. With only 68 % correctly estimated work stations, the NBC algorithm performed worst of all algorithms, meaning the introduced bias was the highest. A main characteristic of any production feedback data set, including our test data set, is the high variance of possible attribute values, which implies that possible attribute combinations only manifest scarcely in the data sets. This problem is commonly known as the curse of dimensionality [40], and the NBC algorithm seems to react most sensitively to this phenomenon. To sum up, the kNN algorithms seem to deliver the most promising results while also being the most elaborate to configure, since the distance metric and the best size of the neighborhood (k) must be determined in advance.
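The gold-standard evaluation described above can be sketched as follows; impute stands for an estimation function that takes the incomplete record and the historical data (e.g. the kNN sketch above), the column name is an assumption, and, for simplicity, the sketch hides values at random, whereas the study re-inserted the MAR pattern actually observed in the data.

```python
# Sketch: gold-standard evaluation of an imputation method for the work station.
import pandas as pd

def evaluate(feedback: pd.DataFrame, impute, seed: int = 0) -> float:
    # gold standard: only records whose work station is actually known
    gold = feedback.dropna(subset=["work_station"]).reset_index(drop=True)

    # test set: hide as many values as were missing in the original data
    miss_rate = feedback["work_station"].isna().mean()
    hidden = gold.sample(frac=miss_rate, random_state=seed).index
    history = gold.drop(index=hidden)

    correct = sum(
        impute(gold.loc[idx], history) == gold.loc[idx, "work_station"]
        for idx in hidden
    )
    return correct / len(hidden)   # share of correctly estimated work stations (cf. Fig. 3)
```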

Fig. 3. Experiment results for all algorithms (share of correctly estimated work stations): NBC 67.6 %, ARI 84.0 %, kNN (HEOM) 89.7 %, kNN (HVDM) 90.2 %.

6. Conclusion and further research

The importance of guaranteeing high data quality in PPC processes will increase dramatically in the near future as manufacturing companies continue to move beyond lean approaches and pursue the vision of smart factories and Industrie 4.0. Therefore, typical data inconsistencies in


production feedback data have been discussed in this paper, and three DM algorithms for mitigating one of these data errors have been presented and evaluated. The evaluation results indicate that data quality can be significantly improved by adapting DM algorithms specifically for this purpose. The superior goal is to develop an approach for reducing the negative effects of missing, noisy and inconsistent data in PPC processes. To this end, a concept for implementing DM algorithms into the classic production control loop in order to estimate missing and inconsistent data has been proposed. Realizing this concept will allow manufacturing companies to reduce laborious data cleaning, increase transparency over current production job statuses and ultimately improve their logistical performance.

For further refinement of this approach, additional research is necessary in several directions: the understanding of the negative effects of data inconsistencies on PPC process results needs to be deepened in order to deal with their occurrence efficiently. Additionally, further DM algorithms, such as a Lazy Learning Decision Tree (LDT), might be applicable. Furthermore, all inconsistencies which have only been briefly addressed in this paper must be taken into consideration.

Acknowledgements

The authors would like to thank the German Research Foundation (DFG) for funding this work in the research and development project "Cluster of Excellence - Integrative Production Technology for High Wage Countries".

References

[1] Brecher C (ed.). Integrative production technology for high-wage countries. Berlin: Springer; 2012.
[2] Wiendahl HP. Betriebsorganisation für Ingenieure (Enterprise organisation for engineers). Munich: Hanser; 2008.
[3] ElMaraghy HA, AlGeddawy T, Azab A, ElMaraghy W. Change in Manufacturing - Research and Industrial Challenges. In: ElMaraghy HA (ed.). Enabling manufacturing competitiveness and economic sustainability: Proceedings of the 4th International Conference on Changeable, Agile, Reconfigurable and Virtual Production (CARV 2011), Montreal, Canada, October 3rd-5th 2011. Berlin: Springer; 2011. p. 2-9.
[4] Schuh G. Effizient, schnell und erfolgreich: Strategien im Maschinen- und Anlagenbau (Efficient, fast and successful: strategies in mechanical and plant engineering). VDMA; 2007.
[5] Schuh G, Stich V (eds.). Produktion am Standort Deutschland (Production in Germany). Survey. Aachen; 2013.
[6] Wiendahl H, ElMaraghy HA, Nyhuis P, Zäh MF, Duffie N et al. Changeable Manufacturing - Classification, Design and Operation. CIRP Annals - Manufacturing Technology 2007;56(2):783-809.
[7] Buzacott JA, Corsten H, Gössinger R, Schneider HM. Production Planning and Control: Basics and Concepts. Munich: Oldenbourg; 2013.
[8] Lakshminarayan K, Harp SA, Samad T. Imputation of missing data in industrial databases. Applied Intelligence 1999;11(3):259-275.
[9] Schuh G, et al. Achieving Higher Scheduling Accuracy in Production Control by Implementing Integrity Rules for Production Feedback Data. Procedia CIRP 2014;19:142-147.
[10] Yeganeh NK, Sadiq S, Sharaf MA. A framework for data quality aware query systems. Information Systems 2014;46:24-44.
[11] Hüner KM, Schierning A, Otto B, Österle H. Product data quality in supply chains - the case of Beiersdorf. Electronic Markets 2011;21(2):141-154.
[12] Jarke M, Vassiliou Y. Data warehouse quality - a review of the DWQ project. Proceedings of the Conference on Information Quality. Cambridge, MA; 1997. p. 299-313.

[13] Wang RY, Strong DM. Beyond accuracy - what data quality means to data consumers. J Mgmt Inf Syst 1996;12(4):5-34.
[14] Schuh G, Potente T, Thomas C, Brambring F. Approach for Reducing Data Inconsistencies in Production Control. In: Zaeh MF (ed.). Enabling Manufacturing Competitiveness and Economic Sustainability: Proceedings of the 5th International Conference on Changeable, Agile, Reconfigurable and Virtual Production (CARV 2013), Munich, Germany, October 6th-9th, 2013. Berlin: Springer; 2013. p. 347-351.
[15] Hang Y, Fong S, Chen W. Aerial root classifiers for predicting missing values in data stream decision tree classification. 2011.
[16] Simiński K. Neuro-rough-fuzzy approach for regression modelling from missing data. Int J App Math Comp Sci 2012;22(2):461-476.
[17] Enders CK. Applied missing data analysis. New York: Guilford Publications; 2010.
[18] Rubin DB. Inference and missing data. Biometrika 1976;63(3):581-592.
[19] Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychological Methods 2002;7(2):147-177.
[20] García-Laencina PJ, Sancho-Gómez JL, Figueiras-Vidal AR. Pattern classification with missing data: a review. Neural Computing and Applications 2010;19(2):263-282.
[21] Knight B, Smith D. The Strength is in the Governance. Teradata Magazine 2007:1-3.
[22] Russom P. Taking Data Quality to the Enterprise through Data Governance. The Data Warehousing Institute, TDWI Report Series; 2006.
[23] Schneider M. Implementierungskonzepte für Datenbanksysteme (Implementation concepts for database management systems). Berlin: Springer; 2003.
[24] Kemper A, Eickler A. Datenbanksysteme: Eine Einführung (Database management systems: an introduction). Munich: Oldenbourg; 2011.
[25] Wang K, Tong S, Eynard B, Roucoules L, Matta N. Review on application of data mining in product design and manufacturing. In: Fuzzy Systems and Knowledge Discovery 2007;4:613-618.
[26] Harding JA, Shahbaz M, Srinivas, Kusiak A. Data Mining in Manufacturing - A Review. J of Manuf Sci and Eng 2006;128(4):969-976.
[27] Choudhary AK, Harding JA, Tiwari MK. Data mining in manufacturing - a review based on the kind of knowledge. J of Int Manuf 2008;20:501-521.
[28] Lakshminarayan K, Harp SA, Goldman RP, Samad T. Imputation of Missing Data Using Machine Learning Techniques. KDD 1996:140-145.
[29] Koonce DA, Tsai SC. Using data mining to find patterns in genetic algorithm solutions to a job shop schedule. Computers & Industrial Engineering 2000;38(3):361-374.
[30] Öztürk A, Kayalıgil S, Özdemirel NE. Manufacturing lead time estimation using data mining. Eur J of Operat Res 2006;173(2):683-700.
[31] Zeng JS, Gao CH. Improvement of identification of blast furnace ironmaking process by outlier detection and missing value imputation. J of Proc Contr 2009;19(9):1519-1528.
[32] Kwak DS, Kim KJ. A data mining approach considering missing values for the optimization of semiconductor-manufacturing processes. Expert Systems with Applications 2012;39(3):2590-2596.
[33] Kletti J, Schumacher J. Die perfekte Produktion (Perfect manufacturing). Berlin: Springer; 2011.
[34] Levine WS (ed.). The Control Handbook. New York: CRC Press; 1996.
[35] Witten IH, Frank E. Data Mining: Practical machine learning tools and techniques. San Francisco: Morgan Kaufmann; 2005.
[36] Maimon O, Rokach LS. Data mining by attribute decomposition with semiconductor manufacturing case study. Data Mining for Design and Manufacturing 2001;3:311-336.
[37] Bramer MA (ed.). Principles of data mining. 2nd edition. London: Springer; 2013.
[38] Agrawal R, Srikant R. Fast algorithms for mining association rules. Proc. 20th Int. Conf. Very Large Data Bases 1994;1215:487-499.
[39] Wilson DR, Martinez TR. Improved Heterogeneous Distance Functions. J of Artif Int Res 1997;6:1-34.
[40] Verleysen M, François D. The curse of dimensionality in data mining and time series prediction. Computational Intelligence and Bioinspired Systems. Berlin: Springer; 2005. p. 758-770.
