
International Journal of Advances in Engineering & Technology, May 2013. ©IJAET ISSN: 2231-1963

DATA MINING BASED KNOWLEDGE DISCOVERY FOR QUALITY PREDICTION AND CONTROL OF EXTRUSION BLOW MOLDING PROCESS

E. V. Ramana1 and P. Ravinder Reddy2

1 Department of Mechanical Engineering, Shadan College of Engineering & Technology, Hyderabad, India
2 Department of Mechanical Engineering, Chaitanya Bharati Institute of Technology, Hyderabad, India

ABSTRACT

In practice, when the quality of products falls below the expected level, causal relationships between process variables and product defects are investigated. Quality improvement and control activities in product manufacturing include identifying the factors that significantly affect quality, modeling the relationships between input attributes and the target attribute (yield, quality, performance index, etc.) and predicting quality levels for given input attributes. Data mining tools have created new ways of extracting useful knowledge from existing manufacturing process databases. This paper proposes a data mining based knowledge discovery approach for an Extrusion Blow Molding process database, making use of the Microsoft Clustering, Naïve Bayes and Decision Trees algorithms available in SQL Server 2008. Data mining models have been created using these algorithms from the extrusion blow molding process data set of a typical product (Gum Bottle). These models were trained on a database containing historical process data. The objective of this study was to apply the discovered process knowledge to set optimum process parameters, resulting in high quality products, increased output and reduction of scrap.

KEYWORDS: Data mining, Clustering, Pattern recognition, Decision Trees, Knowledge discovery

I. INTRODUCTION

A new approach such as data mining is required for quality prediction and control to keep pace with the increased complexity of manufacturing. Data mining is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns. [4] Domain knowledge is used to guide the search or to evaluate the interestingness of the resulting patterns. [5] Data mining is able to predict and classify data as well as detect relationships existing between the quality measure (target attribute) and input attributes (manufacturing process data). The quality measure may have nominal values such as “Accepted” / “Rejected”. Data mining tools are useful in many areas of manufacturing such as defect analysis, yield improvement, quality monitoring and process control. [17][1][15] While quality monitoring tries to reduce variability by detection and removal of assignable causes, process control is based on process compensation and regulation to reduce variability. [9]

Classification is a form of supervised learning where class labels for training samples are given and used as examples to supervise the learning of a classification model. Typical classification algorithms used for data mining tasks are Decision Trees (DT), Artificial Neural Networks (ANN), Naïve Bayes, etc. Clustering is the process of grouping data into classes or clusters so that objects within a cluster have high similarity but are very dissimilar to objects in other clusters. [13] Clustering is also known as unsupervised learning. Unlike in supervised learning (classification), the class label of each data object is not known. [21]

Data mining tools can be used to extract knowledge from process data sets. The discovered knowledge can be used to minimize the number of defective products and to achieve the desired level of process performance and product quality. [7] This knowledge shall be represented in a form that is understandable to humans.
[19][18][16][11] Data mining techniques like Artificial Neural Networks (ANN) and Genetic Algorithms (GA) have been used in determining process parameters in injection molding, welding, etc. A trained Radial Basis Network (RBN) combined with the Sequential Quadratic Programming (SQP) method was used to determine an optimal parameter setting of an injection molding process. [14] A genetic algorithm approach was applied to the optimization of process parameters in laser beam welding. [8] In this work, knowledge discovered by the Microsoft (MS) Clustering, Decision Trees and Naïve Bayes algorithms has been used to determine optimal process parameters for the extrusion blow molding process.

This paper consists of eight sections. The extrusion blow molding process and its associated data set are described in Section 2. The MS Clustering algorithm is applied to the data set to form clusters in Section 3. Section 4 presents the use of the MS Naïve Bayes algorithm to predict the class of the data. Section 5 describes the MS Decision Trees algorithm applied to the classification task. Evaluation measures used for assessing the prediction accuracy of the models are presented in Section 6. Conclusions are given in Section 7. Finally, Section 8 presents the scope of future work.

Vol. 6, Issue 2, pp. 703-713

II. EXTRUSION BLOW MOLDING

In extrusion blow molding, a hot tubular parison is extruded continuously and is cut by a knife once it reaches the desired length. [6] The mold then moves towards the extruder center and catches the parison, and the open end of the parison is sealed off as the mold halves close. Air is blown at the desired pressure into the hot parison to expand it against the walls of the mold. After adequate cooling, the product is ejected from the mold.

2.1. Data Set

For this application, a process data set of 60 records for a plastic product (150 ml Gum Bottle) made of HDPE (High Density Polyethylene) is used. The product is manufactured by the extrusion blow molding process. Process attribute data were acquired from the data stored in the machine and from manually recorded data. The process attributes considered for building the data mining models are presented in Table 1.

Table 1 Extrusion blow molding process attributes

Barrel Temperatures in Zone 1 to Zone 5 (°C)
Cycle Time (s)
Ejection Time (s)
Ext ID
Extrusion Die Temperature (°C)
Inflation Pressure (bar)
Mold Temperature (°C)
Parison Length (mm)
Parison Shape
Recycled Material
Surface Finish
Total of Inflation and Cooling Time (s)
Trace of Foreign Material


III. CLUSTERING

Clustering is simple and natural for a human being when dealing with a small set of attributes, but it becomes difficult for the human mind to handle as the number of attributes grows. The MS Clustering algorithm makes it possible to uncover hidden relationships among the large numbers of attributes existing in modern data sets. The most common usage of the MS Clustering algorithm is to detect the clusters in the data and label the data with the clusters that are discovered. This algorithm can be used to predict values as well as to provide natural groupings, but traditionally clustering is not used for prediction. [12] Clustering can be used as a preprocessing step for other algorithms, such as decision trees, in a large analytical project. It is often the first data mining task, used to explore any underlying patterns that exist in the data. Clustering has been performed using the MS Clustering algorithm on the Extrusion Blow Molding process data set to gain insight into the characteristics of each cluster and to focus on a particular set of clusters for further analysis.

3.1 Performing Clustering

The MS Clustering algorithm has been applied without providing the class label in the first instance. The cluster diagram generated by the algorithm shows that Clusters 4, 5 and 6 are isolated from the rest, as presented in Fig-1. Cluster 3, having a population size of one, is not taken into consideration. It is assumed in the first instance that one of Clusters 1 and 2 represents accepted products, since these have significant populations. In the second instance, the same algorithm has been applied to the data set with the class attribute set as predictable, to predict the class label of the clusters and so identify the characteristics of the defective and acceptable product clusters and their associated values.

The MS Clustering algorithm parameters are set to the values given below.

Cluster count: 0 (the algorithm uses heuristics to determine the number of clusters to build)
Clustering method (default: scalable EM)
Maximum attributes (default: 255)
Maximum states (default: 100)
Minimum support (default: 1)
Modeling cardinality (default: 10)
Sample size (default: 50000)
Stopping tolerance: 1 (this value is used to determine when convergence is reached)
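To give a feel for what the clustering step does, the following minimal sketch groups a handful of records with plain k-means. This is a deliberately simpler stand-in for the scalable EM method listed above, not the MS Clustering implementation. The function name `kmeans`, the starting centers and the six (extrusion die temperature, parison length) records are hypothetical and are not taken from the study's data set.

```python
import math


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Plain k-means with hard assignments (a simple stand-in for scalable EM)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        # Stop once no center moves by more than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return labels, centers


# Hypothetical records: (extrusion die temperature in deg C, parison length in mm).
records = [(41.0, 110.0), (42.0, 111.0), (41.5, 112.0),
           (44.0, 115.0), (45.0, 116.0), (44.5, 114.0)]
labels, centers = kmeans(records, centers=[records[0], records[3]])
print(labels)
print(centers)
```

The `tol` argument plays a role loosely comparable to a stopping tolerance: iteration ends when the cluster centers stop moving. EM differs in that each record receives a membership probability for every cluster rather than a single hard label.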

Fig-1 Cluster diagram

Cluster 1 represents the characteristics of accepted products, as assumed in the first instance, while Cluster 4 represents the characteristics of products rejected due to excess flash on both ends. The discrimination scores of Cluster 4 against Cluster 1 are given in Table-2. Extrusion die temperature, parison length and surface finish are the attributes that have an impact on this defect. [6][20]

Table-2 Discrimination scores of Cluster 4 and Cluster 1

Variables | Values | Favors | Score
Extrusion Die Temperature | >= 43 | Cluster 4 | 100.000
Parison Length | >= 113 | Cluster 4 | 100.000
Extrusion Die Temperature | < 43 | Cluster 1 | 100.000
Parison Length | < 113 | Cluster 1 | 100.000
Class | Accepted | Cluster 1 | 64.604
Class | Rejected due to excess flash on both ends of component | Cluster 4 | 64.604
Surface Finish | Average | Cluster 4 | 64.604
Surface Finish | Good | Cluster 1 | 64.604

Trace of foreign material, high mold temperature and average surface finish are the attributes that favor Cluster 3 (representing the components rejected due to discoloration and holes on surface); its discrimination scores against Cluster 1, representing the accepted components, are shown in Fig-2. Fig-3 shows that high mold temperature, parison shape with hooking and poor surface finish are the key influencers in causing poor neck formation and finish.



Fig-2 Discrimination scores for Cluster 3 and Cluster 1

Fig-3 Discrimination scores for Cluster 5 and Cluster 1

IV. NAÏVE BAYES CLASSIFICATION

Naïve Bayes is one of the simplest classifiers and provides a very effective way to explore data. The model is trained in a single pass over the training data, which makes the algorithm suitable for analysis of large data sets with large numbers of attributes. [12] It assumes that the effect of an attribute value on a given class is independent of the other attributes. The ability to explore relationships between attributes can be applied to identify the factors that have a high impact on quality in a manufacturing process. [15] The MS Naïve Bayes algorithm makes it possible to identify the key input attributes (influencers) for a specific target attribute.

An MS Naïve Bayes mining model has been created using the same process attributes that were used by the clustering algorithm, as shown in Table 1. The Naïve Bayes dependency network view in Fig-4 shows the attributes that have a strong impact in predicting the class after eliminating weaker links. Each node represents an attribute; if a node has an outgoing edge, it is predictive of the node at the end of that edge. The MS Naïve Bayes algorithm parameters are set to the values given below.

Maximum input attributes (default: 255)
Maximum output attributes (default: 255)
Maximum states (default: 100)
Minimum dependency probability (0.1; specifies the minimum dependency probability between input and output attributes)
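The single-pass training and the attribute independence assumption described above can be sketched in a few lines. This is a generic categorical Naïve Bayes with Laplace smoothing, not the MS Naïve Bayes implementation; the function names, the simplified "Accepted"/"Rejected" labels and the five training records are hypothetical and only illustrate the mechanics.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Collect all counts needed for prediction in a single pass over the data."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domains = defaultdict(set)           # attribute -> observed values
    for row in rows:
        label = row[target]
        class_counts[label] += 1
        for attr, value in row.items():
            if attr != target:
                value_counts[(attr, label)][value] += 1
                domains[attr].add(value)
    return class_counts, value_counts, domains


def predict(model, case):
    """Return the normalized posterior probability of each class for one case."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for attr, value in case.items():
            # Laplace-smoothed P(value | class); attributes are treated as
            # independent of each other given the class.
            count = value_counts[(attr, label)][value]
            score *= (count + 1) / (n + len(domains[attr]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical training records (not the study's data).
rows = [
    {"Mold Temperature": "Normal", "Trace Of Foreign Material": "No", "Class": "Accepted"},
    {"Mold Temperature": "Normal", "Trace Of Foreign Material": "No", "Class": "Accepted"},
    {"Mold Temperature": "Normal", "Trace Of Foreign Material": "No", "Class": "Accepted"},
    {"Mold Temperature": "High", "Trace Of Foreign Material": "Yes", "Class": "Rejected"},
    {"Mold Temperature": "High", "Trace Of Foreign Material": "No", "Class": "Rejected"},
]
model = train_naive_bayes(rows, target="Class")
posterior = predict(model, {"Mold Temperature": "High",
                            "Trace Of Foreign Material": "Yes"})
print(posterior)
```

Because the per-attribute probabilities are simply multiplied, the attributes that shift the product most between classes are the ones a viewer would surface as key influencers.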

Fig-4 Dependency network view

The characteristics of the attributes of accepted products, with their associated values or ranges of values, are given in the attribute characteristics view of the MS Naïve Bayes viewer, as shown in Table-3. Fig-5 shows that high mold temperature, poor surface finish and a hooking parison shape favor rejection of components due to poor neck formation & finish.

Table-3 Attribute characteristics view

Attributes | Values | Probability
Inflation Pressure | 5-6 | 100.000%
Extrusion Die Temperature | < 43 | 100.000%
Mold Temperature | Normal | 100.000%
Material Used | HDPE | 100.000%
Ejection Time | 1 | 100.000%
Barrel Zone-4 Temperature | 175 - 176 | 100.000%
 | 170 - 172 | 100.000%
Cycle Time | 11 | 100.000%
 | 180 - 181 | 100.000%
Surface Finish | Good | 100.000%
Trace Of Foreign Material | No | 100.000%
 | 165 - 166 | 100.000%
 | 160 - 161 | 100.000%
Total Of Inflation And Cooling Time | 10 | 100.000%
Parison Shape | Straight | 100.000%
Parison Length | < 113 | 100.000%
Recycled Material | No | 57.500%
Recycled Material | Yes | 42.500%


Fig-5 Discrimination scores for accepted components and rejected components due to poor neck formation & finish

Fig-6 shows that a long parison and high mold temperature result in rejections due to excess flash on both ends of the component. The discrimination scores in Table-4 establish that recycled material with trace of foreign matter and high mold temperature can cause the rejection of components due to discoloration and holes on surface.

Fig-6 Discrimination scores for accepted components and rejected components due to excess flash on both ends of component

Table-4 Discrimination scores that favor accepted components and components rejected due to discoloration and holes on surface

Attributes | Values | Favors | Score
Trace Of Foreign Material | No | Accepted | 100.000
Trace Of Foreign Material | Yes | Rejected due to discoloration and holes on surface | 100.000
Surface Finish | Average | Rejected due to discoloration and holes on surface | 100.000
Surface Finish | Good | Accepted | 100.000
Mold Temperature | High | Rejected due to discoloration and holes on surface | 100.000
Mold Temperature | Normal | Accepted | 100.000
Recycled Material | Yes | Rejected due to discoloration and holes on surface | 13.919
Recycled Material | No | Accepted | 13.919

V. DECISION TREES CLASSIFICATION

The decision tree is one of the most popular data mining techniques because of its high degree of accuracy and easily understandable patterns. Decision tree algorithms extract a decision tree from the data, and the tree contains explicit knowledge that can be easily interpreted by the user. [12][2] ID3, C4.5, CART, T2, CAL5 and CN2 are some of the decision tree algorithms that are used in industry. [3][10] The MS Decision Trees algorithm can be used for classification, regression and association tasks.

In this paper, the decision trees algorithm has been applied to the process data associated with the attributes given in Table 1 for the classification task. The dependency network generated by the algorithm, shown in Fig-7, indicates that surface finish and trace of foreign material are the only attributes needed to predict the class label. Since the surface finish of the product (good/average/poor) is itself the outcome of varying the other controllable process attributes, such as barrel temperatures, mold temperature and extrusion die temperature, it has been excluded from the decision trees model irrespective of its prediction accuracy.

The MS Decision Trees algorithm parameters are set to the values given below.

Complexity penalty (0.1)
Maximum input attributes (255)
Maximum output attributes (255)
Minimum support (1; minimum number of cases a leaf node must contain)
Score method (3; Bayesian with K2 prior method to calculate the split score)
Split method (2; split the tree completely on each attribute)
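A split score ranks candidate attributes by how well they separate the classes. The sketch below swaps in entropy-based information gain, a widely used score that is easier to show briefly, in place of the Bayesian K2 score configured above; it is therefore an illustration of the idea of split scoring rather than of the MS Decision Trees calculation. The function names and the four records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting rows on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the class labels inside each branch.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical records (not the study's data).
rows = [
    {"Mold Temperature": "Normal", "Recycled Material": "Yes", "Class": "Accepted"},
    {"Mold Temperature": "Normal", "Recycled Material": "No", "Class": "Accepted"},
    {"Mold Temperature": "High", "Recycled Material": "Yes", "Class": "Rejected"},
    {"Mold Temperature": "High", "Recycled Material": "No", "Class": "Rejected"},
]
gain_mold = information_gain(rows, "Mold Temperature", "Class")
gain_recycled = information_gain(rows, "Recycled Material", "Class")
print(gain_mold, gain_recycled)
```

In this toy data the mold temperature separates the two classes perfectly while recycled material carries no information, so a tree builder would choose mold temperature for the first split.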

Fig-7 Dependency network view of Decision Trees model

The decision tree provided by the MS Tree Viewer, with its mining legend, is shown in Fig-8.

Fig-8 Decision Tree

The following decision rules can be derived from the decision tree.

Rule-1: If Mold Temperature = “Normal” and Extrusion Die Temperature < 43 Then Class = “Accepted” with Probability = 89.13% and Support = 40 cases
Rule-2: If Mold Temperature = “High” and Recycled Material = “Yes” Then Class = “Rejected due to discoloration and holes on surface” with Probability = 66.17% and Support = 5 cases
Rule-3: If Mold Temperature = “Normal” and Extrusion Die Temperature >= 43 Then Class = “Rejected due to excess flash on both ends of component” with Probability = 62.15% and Support = 4 cases
Rule-4: If Mold Temperature = “High” and Recycled Material = “No” Then Class = “Rejected due to poor neck formation & finish” with Probability = 57.14% and Support = 3 cases
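Rules 1 to 4 can be written directly as a small lookup function, which is one way such rules might be checked against a proposed machine setting. The function name `classify` and its argument names are hypothetical; the thresholds, class labels, probabilities and support counts are copied from the rules above.

```python
def classify(mold_temperature, extrusion_die_temperature, recycled_material):
    """Apply Rules 1-4; return (class label, probability in %, support in cases)."""
    if mold_temperature == "Normal":
        if extrusion_die_temperature < 43:
            return ("Accepted", 89.13, 40)                                  # Rule-1
        return ("Rejected due to excess flash on both ends of component",
                62.15, 4)                                                   # Rule-3
    if mold_temperature == "High":
        if recycled_material == "Yes":
            return ("Rejected due to discoloration and holes on surface",
                    66.17, 5)                                               # Rule-2
        if recycled_material == "No":
            return ("Rejected due to poor neck formation & finish",
                    57.14, 3)                                               # Rule-4
    raise ValueError("attribute values not covered by Rules 1-4")


label, probability, support = classify("Normal", 42.0, "No")
print(label, probability, support)
```

Returning the probability and support alongside the label keeps visible how much evidence stands behind each rule; the three rejection rules rest on only 3 to 5 cases each.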

VI. EVALUATION MEASURES

The quality of the knowledge extracted from process data and the prediction accuracy of the models can be assessed by accuracy charts, the classification matrix and cross validation. [3] The standard lift chart contains one line for each model selected and two extra lines: a random line and an ideal line. The lift chart generated with the class attribute as the predictable column and “Accepted” as the prediction value is shown in Fig-9. The mining models based on the Naïve Bayes, Decision Trees and Clustering algorithms capture 60% of the target cases using only 50% of the data, and their respective prediction probabilities are shown in the mining legend. The prediction accuracy of the models has been measured by the lift chart on a test data set of 10% hold-out cases.
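The curve a lift chart draws for one model can be computed as follows: rank the test cases by the predicted probability of the target state, then record what fraction of the true target cases has been captured after each additional case. This is a generic sketch, not the SQL Server charting code; the function name `cumulative_gains` and the six scored cases are hypothetical.

```python
def cumulative_gains(cases, target="Accepted"):
    """cases: list of (predicted probability of target, actual label).

    Returns one (fraction of population, fraction of targets captured)
    point per case, with cases taken in order of decreasing probability.
    """
    ranked = sorted(cases, key=lambda c: c[0], reverse=True)
    total_targets = sum(1 for _, actual in ranked if actual == target)
    if total_targets == 0:
        raise ValueError("no cases with the target state")
    points = []
    captured = 0
    for i, (_, actual) in enumerate(ranked, start=1):
        if actual == target:
            captured += 1
        points.append((i / len(ranked), captured / total_targets))
    return points


# Hypothetical scored test cases (not the study's data).
cases = [(0.9, "Accepted"), (0.8, "Accepted"), (0.7, "Accepted"),
         (0.6, "Rejected"), (0.4, "Rejected"), (0.2, "Accepted")]
points = cumulative_gains(cases)
print(points[2])   # position on the curve after half of the population
```

On the same axes, the random line is the diagonal (50% of the population captures 50% of the targets), and the ideal line is the curve obtained when every target case is ranked ahead of every non-target case.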



Fig-9 Standard lift chart

Multifold cross validation with 10 folds is performed on the three models mentioned above using the entire data set (to make use of all available cases for training), for the target attribute “class” and target state “Accepted”. The training data is split into 10 folds (partitions); for each fold in turn, a model is built on the data from all the other folds and validated against the data from the current fold. Cross validation results are presented in Fig-10. The training data is good enough for all the models, since the results for each partition are similar, with an average value of 4.3 and a standard deviation of 0.45.
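The fold construction described above can be sketched as an index generator. The function name `k_fold_indices` is hypothetical, and the sketch assigns cases to folds in their stored order, which is an assumption; how SQL Server assigns cases to partitions is not stated here, and in practice the cases would usually be shuffled first.

```python
def k_fold_indices(n_cases, n_folds=10):
    """Yield (train_indices, test_indices) once per fold.

    Fold sizes differ by at most one case when n_cases is not a
    multiple of n_folds.
    """
    indices = list(range(n_cases))
    base, extra = divmod(n_cases, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]                   # current fold
        train = indices[:start] + indices[start + size:]     # all other folds
        yield train, test
        start += size


# 60 records, as in the data set of Section 2.1, split into 10 folds.
folds = list(k_fold_indices(60, 10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # prints: 10 54 6
```

Each model is then trained on the 54 training cases of a fold and scored on its 6 held-out cases; similar scores across the 10 folds are what indicate that the result does not depend on one particular partition.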

Fig-10 Cross validation results

VII. CONCLUSION

In the present article, the Microsoft Naïve Bayes, Decision Trees and Clustering algorithms available in SQL Server 2008 have been applied to extract useful and expressive knowledge from the extrusion blow molding process data set of a typical product (150 ml Gum Bottle). The prediction accuracy of the models has been evaluated by the standard lift chart and ten-fold cross validation methods on the test cases. In the evaluation performed by the standard lift chart, the Naïve Bayes and Clustering models were found to have better accuracy than Decision Trees in predicting process parameter values that result in acceptable products. The resulting knowledge-driven, proactive decisions have been used to quickly set process parameters and their ranges of values, which increased the output of high quality products and significantly reduced scrap.



VIII. FUTURE WORK

One important extension to this work will be to build an expert system for the extrusion blow molding process of a specific company from the knowledge discovered by the data mining models. Data mining algorithms such as Artificial Neural Networks (ANN), Genetic Algorithms (GA), Logistic Regression and hybrid algorithms shall be evaluated with a view to evolving more effective models. Data mining models can be made more reliable by building them on a process database accumulated over an adequate period, accommodating a large number of cases covering all possible materials, products, equipment and defects.

REFERENCES

[1] A.K. Choudhary, J.A. Harding, M.K. Tiwari, “Data mining in manufacturing: a review based on kind of knowledge”, Journal of Intelligent Manufacturing, 2008
[2] A. Kusiak, “Data mining: manufacturing and service applications”, International Journal of Production Research, Vol. 44, 2006, pp. 4175-4191
[3] Andrew Kusiak, “Selection of Invariant Objects With a Data-Mining Approach”, IEEE Transactions on Electronics Packaging Manufacturing, Vol. 28, No. 2, April 2005, pp. 187-196
[4] Arun K Pujari, “Data Mining Techniques”, Universities Press, pp. 43-44, 2001
[5] B.N. Lakshmi, G.H. Raghunandhan, “A Conceptual Overview of Data Mining”, Proceedings of the National Conference on Innovations in Emerging Technology-2011, Erode, Tamilnadu, India, 17 & 18 February, 2011, pp. 27-32
[6] Dr. Terry L. Richardson, “Industrial Plastics: Theory Applications”, South-Western Publishing Co., pp. 310-316
[7] E.V. Ramana, P. Ravinder Reddy, “Integration of control charts and Data Mining for Process Control and Quality Improvement”, International Journal of Advances in Engineering & Technology, Vol. 2, Issue 1, 2012, pp. 640-648
[8] G. Harinath Goud, E Venugopal Goud, “A genetic algorithm approach to the optimization of process parameters in laser beam welding”, International Journal of Mechanical Engineering and Technology, Vol. 3, Issue 3, 2012, pp. 459-470
[9] Gulser Koksal, Inci Batmaz, Murat Caner Testik, “A review of data mining applications for quality improvement in manufacturing industry”, Expert Systems with Applications, Vol. 38, 2011, pp. 13448-13467
[10] Hovhannes Sadoyan, Armen Zakarian, Pravansu Mohanty, “Data mining algorithm for manufacturing process control”, International Journal of Advanced Manufacturing Technology, Vol. 28, 2006, pp. 342-350
[11] J.A. Harding, M. Shahbaz, Srinivas, A. Kusiak, “Data mining in Manufacturing: A Review”, Journal of Manufacturing Science and Engineering, Vol. 128, 2006, pp. 969-976
[12] Jamie Maclennan, ZhaoHui Tang, Bogdan Crivat, “Data Mining with Microsoft SQL Server 2008”, pp. 215-317, 2009
[13] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, pp. 383-385, 2008
[14] Jie-Ren Shie, “Optimization of injection molding process for contour distortions of polypropylene composite components by a radial basis neural network”, International Journal of Advanced Manufacturing Technology, Vol. 36, 2008, pp. 1091-1103
[15] Kescheng Wang, “Applying data mining to manufacturing: the nature and applications”, Journal of Intelligent Manufacturing, Vol. 18, 2007, pp. 487-495
[16] Lior Rokach, Oded Maimon, “Data mining for improving the quality of manufacturing: a feature set decomposition approach”, Journal of Intelligent Manufacturing, Vol. 17, 2006, pp. 285-299
[17] Lior Rokach, Roni Romano, Oded Maimon, “Mining manufacturing databases to discover the effect of operation sequence on the product quality”, Journal of Intelligent Manufacturing, Vol. 19, 2008, pp. 313-325
[18] M. A. Karim, G. Russ, and A. Islam, “Detection of faulty products using data mining”, Proceedings of International Workshop on Data Mining and Artificial Intelligence (DMAI '08), 24 December, 2008, Khulna, Bangladesh, pp. 101-107
[19] Mark Ploczynski, Andrzej Kochanski, “Knowledge Discovery and Analysis in Manufacturing”, Quality Engineering, Vol. 22, pp. 169-181
[20] Muralisrinivasan Natamai Subramanian, “The Basics of Troubleshooting in Plastics Processing”, Wiley, pp. 135-148
[21] Vikram Pudi, P. Radha Krishna, “Data Mining”, Oxford University Press, pp. 122-128, 2010

AUTHORS

E. V. Ramana is working as Professor and Head of the Department of Mechanical Engineering, Shadan College of Engineering & Technology, Hyderabad. He received his first M.Tech degree in Energy Systems and his second M.Tech degree in CAD/CAM in 1992 and 1997, respectively, from JNT University, Hyderabad. He has 16 years of experience in teaching and 6 years in industry and research. He is currently pursuing a PhD degree at JNTUH, Hyderabad, India. He has considerable experience in developing mechanical engineering application software. His active area of research is Data Mining in Manufacturing.

P. Ravinder Reddy was born in 1965 and is working as Professor and Head of Mechanical Engineering, Chaitanya Bharathi Institute of Technology, Hyderabad. He has 22 years of teaching, industrial and research experience, and has taught postgraduate and undergraduate engineering subjects. He has published over 132 research papers in international and national journals and conferences. He has guided 10 Ph.D.s, and 4 further Ph.D. scholars have submitted their theses. He has guided over 250 M.E./M.Tech projects and carried out research and consultancy to the tune of Rs. 1.9 Cr sponsored by BHEL, AICTE, UGC, NSTL and other industries. He has organized 23 refresher courses/STTPs/workshops and one international conference, and delivered 92 invited/keynote/special lectures. He received the “UGC Fellowship” award from UGC (1999) and the Raja Rambapu Patil National Award for Promising Engineering Teacher from ISTE for the year 2000, in recognition of his outstanding contribution in the area of Engineering and Technology. His MODROB project sponsored by AICTE was awarded Excellence “A” Grade by the AICTE monitoring committee in 2002. He received the “Engineer of the Year Award-2004” for his outstanding contribution in academics and research from the Govt. of Andhra Pradesh and the Institution of Engineers (India), AP State Centre, on 15th September 2004 on the occasion of the 37th Engineer’s Day, and the Best Technical Paper Award in December 2008 from the National Governing Council of the Indian Society for Non Destructive Testing.
