QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL Qual. Reliab. Engng. Int. 2003; 19:273–283 (DOI: 10.1002/qre.588)

Special Issue

Visually Mining Off-line Data for Quality Improvement‡

Giovanni C. Porzio1,∗,† and Giancarlo Ragozini2

1 Department of Economics, Cassino University, via Mazzaroppi, 03043 Cassino, Italy
2 Department of Sociology ‘Gino Germani’, Federico II University of Naples, Vico Monte di Pietà 1, 80138 Naples, Italy

Highly automated modern manufacturing processes are yielding large databases with records on hundreds of process variables and product characteristics. This large amount of information calls for new approaches to production process analysis. In this paper, we discuss why a data mining framework can be appropriate for this goal, and we propose a visual data mining strategy to mine large and high-dimensional off-line data sets. The strategy allows users to achieve a deeper process understanding through a set of linked interactive graphical devices, and is illustrated within an industrial process case study. Copyright © 2003 John Wiley & Sons, Ltd.

KEY WORDS: dynamic graphics; retrospective CUSUM; statistical process control; visual data mining

1. INTRODUCTION

Both monitoring the production process and improving its quality are of strategic importance to the manufacturing industry. These goals are generally pursued through standard statistical process control (SPC) techniques. Nevertheless, new technologies call for new statistical methods. In the past, data were collected by sampling because of the limited capabilities of measurement systems. The main SPC tools were therefore designed to analyse sampled data, basically exploiting inferential methods. However, new automatic measurement devices have recently allowed continuous monitoring of processes, yielding large amounts of measurements stored in internal databases. Consequently, on-line process monitoring should be revised to manage such complexities [1], and off-line analysis should be enriched through specific tools suitable for exploring large data sets.

Wishing to address the latter issue, we propose a visual data mining strategy for off-line analysis of production processes. The method is designed to improve process and item quality, paying attention to the process level and variability, and to the main causes that lower quality. In order to visually extract interesting information lying in internal SPC databases, we provide a set of easy-to-interpret linked charts. These charts, which exploit some basic statistical methods, are strongly based on interactive and dynamic graphical devices. The procedure has been tested within the production process of a European car manufacturer. Data are collected through optical electronic devices that take measurements on about 70 points of each produced vehicle body.

The paper is organized as follows. In Section 2 we briefly recall the main aims and tools of SPC, while in Section 3 we introduce basic concepts of data mining and discuss a related visual approach. Then, we present a visual data mining strategy to improve quality in Section 4. Methodological details are provided within a case study in Section 5. Finally, Section 6 offers some concluding remarks.

∗ Correspondence to: G. C. Porzio, Department of Economics, Cassino University, via Mazzaroppi, 03043 Cassino, Italy.
† E-mail: [email protected]
‡ This paper is based on a presentation given at the second ENBIS Conference, Rimini, September 2002.

Received 3 March 2003
Revised 25 May 2003

2. IMPROVING QUALITY THROUGH SPC

In the last few decades there has been considerable growth in the use of statistical methods for quality improvement in almost every kind of firm. Even if quality design and customer satisfaction have received much attention in a total quality management perspective, process quality control still remains the main issue for manufacturing industries to tackle. In this context, production can be efficiently managed through SPC, a common practice in industry and a classic, consolidated methodology in statistics.

SPC techniques are used to monitor the ongoing process, to detect out-of-control items, and to improve the overall process and item quality. In other words, SPC is mainly devoted to: (i) evaluating whether the process and items meet specific quality standards (i.e. whether the process level and/or the item characteristics are sufficiently close to given targets); (ii) evaluating the coherence and the closeness to standards (i.e. whether the process variability is sufficiently small); (iii) detecting and identifying process features that decrease quality.

SPC methodology is based on a set of user-oriented descriptive and inferential statistical tools, which usually exploit graphical representations (i.e. charts) to make them easier to use. Among the inferential techniques, control charts are the primary tool. Many kinds of charts are available, from the original Shewhart charts to the more sophisticated CUSUM and EWMA charts. Control charts are essentially a visualization of a series of repeated significance tests, one for each incoming item in the sample. Being tests, they require precise distributional assumptions on models for the observed data. Several charts have been designed under different conditions to monitor process level or variability, and to detect out-of-control items (see e.g. Montgomery [2]).

The increasing complexity of production processes, which requires controlling many product features, has led researchers to develop tools that take into account the multivariate nature of processes. In a parametric setting, the main works consider the multivariate normal as the underlying density function [3–5], while more recently nonparametric control charts have been developed to deal with more complex data structures [6–8]. Multivariate control charts offer higher sensitivity in detecting out-of-control items. However, they do not allow quality to be directly improved, as they do not indicate which variable or variables led to changes in level and/or variability. In other words, multivariate methods generally fail to provide tools for discovering the causes that lower process and item quality. In this direction, some diagnostic techniques have been proposed, for example by Hawkins [9], and Fuchs and Benjamini [10].

3. PROCESS ANALYSIS AND DATA MINING FRAMEWORKS

Thanks to modern database capabilities and new measurement devices, an increasing amount of production data is available from which to extract relevant information for quality improvement. In our opinion, data mining is an adequate statistical framework for managing these large data sets and addressing the new issues of off-line process analysis. Data mining has emerged in response to a need from industry for effective and efficient analysis of large data sets. It was first defined as a non-trivial process of knowledge discovery in databases to identify valid, novel, potentially useful, and ultimately understandable patterns in data [11]. In this framework, new methods and techniques are developed combining both basic and more sophisticated statistical tools in order to extract useful information hidden in data, such as clusters, trends, associations and correlations [12,13].


Data mining has found its primary applications in the business world, for market basket analysis, customer profiling and scoring, Web clickstream analysis, and the like [14]. However, many other applications have emerged in fields such as economics, ecology, chemistry, medicine, and on-line SPC as well [15,16]. Hence, the potential of data mining leads us to believe that integrating statistical process control techniques with data mining can be effective in tackling the emerging issues in process analysis discussed above. In particular, we consider visual data mining the most appropriate framework for people involved daily in SPC, because they are used to dealing with graphical displays.

Visual data mining can be considered a data mining process enriched by visualization methods [17,18]. It consists of a learning process based on new visualization techniques such as dynamic graphics. However, while data mining is generally based on highly automated methods, visual data mining is an interactive process rather than a completely automatic one. The key idea behind the approach is that people complement machines, and hence the capabilities of both can be better exploited for knowledge discovery if visualization complements automatic procedures [19]. In a visual data mining perspective, graphical representations are analysis tools in themselves, and not merely a way of representing numerical results.

4. A VISUAL DATA MINING STRATEGY

With the aim of improving process quality, we thus propose a visual strategy that allows the user to achieve deeper process understanding. Specifically, we design a procedure that aims to identify changes in the process level and variability, and to detect the causes that most affect quality, through an off-line analysis of large, high-dimensional SPC data sets. The procedure makes it easy to extract useful information in an understandable way through simple visualization and exploratory data analysis tools. The proposed methodology is based on a set of control charts related to each other through algebraic linkages [20], and it relies on dynamic graphics tools [21]. The user is called upon to interactively mine the data in three steps: the knowledge acquired on one chart is exploited as an input to decide upon some graphical parameters that allow the next charts to be drawn.

At the first stage, we propose analysing the series of a certain univariate quality index (which summarizes in some way the multivariate measurements taken on each item) by means of a retrospective CUSUM type chart. The latter is a CUSUM chart applied to off-line process data [22], and allows us to identify possible points where the process level changes (generally referred to as change points or breakpoints). The series is then split into groups of data on the basis of the identified change points. As these subgroups are homogeneous with respect to the process level, we denote them as retrospective rational subgroups.

We then suggest a visual inspection of two charts, which we call the Subgroup Mean and Subgroup Interquantile charts, to select the rational subgroups that present heavier quality decay. Looking at differences among means and at variations in interquantile ranges, groups of data with a worse process level and/or wider variability are identified for further analysis.

Finally, in order to discover the main causes of quality decrease, we propose a Pareto-like graphical display to help the user decide upon possible interventions on the production process. The chart is based on an association analysis performed through the computation of appropriate correlation coefficients on the selected subgroups.

We would like to note here that SPC techniques are usually provided along with automatic decision rules, based on critical values of hypothesis tests. However, inferential methods are not suitable in a data mining perspective. With large amounts of data, almost any null hypothesis is rejected, as the power of tests becomes very high. Consequently, in our case, any test to verify the presence of change points or differences among some statistics (means, medians, variances and the like) will yield too many significant results. For this reason, we do not provide any kind of test-based decision rules for the discussed charts: the user is called upon to assess the significance of any differences by looking at the charts, taking into account his/her knowledge of the process.
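The effect of sample size on test-based rules can be illustrated numerically. The sketch below is not taken from the case study: it assumes a two-sided z-test with known standard deviation and a hypothetical, practically negligible shift of 1% of a standard deviation, and the function name `z_test_p_value` is introduced here only for illustration.

```python
import math


def z_test_p_value(shift, sigma, n):
    """Two-sided z-test p-value when the sample mean lies exactly
    `shift` away from the null mean (known sigma, n observations)."""
    z = abs(shift) * math.sqrt(n) / sigma
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical shift of 0.01 standard deviations (assumed, not from the paper).
for n in (100, 10_000, 1_000_000):
    print(n, z_test_p_value(shift=0.01, sigma=1.0, n=n))
# n = 100 gives a p-value of about 0.92; n = 1,000,000 gives about 1.5e-23.
```

The same tiny shift is invisible to the test with 100 items and overwhelmingly "significant" with a million, which is the situation that motivates replacing automatic decision rules with visual assessment.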


5. THE STRATEGY: TECHNICAL DETAILS AND A CASE STUDY

In this section we present the proposed strategy within a case study. First, we briefly introduce the data and the context where they arise, and then we provide the methodological details along with real data charts.

5.1. An industrial process case study

The data we consider derive from a production plant of a European car manufacturer. Bodies of vehicles are monitored through optical electronic devices, taking real-time measurements on 68 key points of the surfaces. Each day over 1000 cars are produced, so that roughly half a million measurements could be stored weekly. At the factory, process level and variability are monitored on-line through classical univariate control charts ($\bar{X}$ and $R$ charts) for around 30 vehicles at a time. Due to the large number of variables, factory engineers chose to monitor those of crucial interest, and hence only a small subset of quality characteristics is actually under inspection. In this way, potentially useful information is overlooked. The whole set of variables is instead exploited to construct a univariate index that summarizes the overall quality of any single car body. For every 30 cars, the index values are visualized to manage the process performance. If we denote with $X = (X^1, \ldots, X^j, \ldots, X^{68})$ the variables under control at the factory, and with $X_0 = (X_0^1, \ldots, X_0^j, \ldots, X_0^{68})$ the corresponding target values, the index will be

\[
QI = f(X, X_0) \tag{1}
\]

i.e. $QI$ is a map from the high-dimensional space $\mathbb{R}^{68}$ to $\mathbb{R}^{1}$. Although we cannot describe the index in detail, note that lower values of the index correspond to higher quality products. An observed series of the index for 1000 cars, recorded in about five days, is represented in Figure 1, where data have been pre-processed. The level and the variability of the series do not appear to be homogeneous for all the items, even if it is quite difficult to determine where such changes occur. In the following, without loss of generality, we will use the $QI$ index and this series to illustrate our methodology.

5.2. Identifying retrospective rational subgroups

In the case of a large number of observations, such as the series in Figure 1, determining visually whether and where breakpoints have occurred can be difficult. However, breakpoints potentially hidden in the plot of the original series can appear more clearly if data are appropriately transformed through retrospective cumulative sums. Displaying this cumulated series in a chart allows easier graphical identification of change points.

In SPC, CUSUM charts are used to detect when a process is going out of control because of a small shift with respect to a given target value. Generally, on-line CUSUM charts display the series of cumulative sums of deviations of the observed values from a given target value. The sum runs up to the last observed data value, and it is updated as soon as a new value comes in. Properties of CUSUM charts have been extensively investigated in the literature by many authors (see e.g. Hawkins and Olwell [23], and references therein). Retrospective CUSUMs differ from on-line CUSUMs as they transform off-line data for a given period, taking as the target value their arithmetic mean. Breakpoints in the transformed series correspond to breakpoints in the original one.
If the latter presents subseries with different mean levels, a plot of the transformed series (the rCUSUM chart) will show a piecewise behaviour. The slope of each piece measures the difference between the subseries process level and the overall mean level. Its sign highlights positive or negative shifts in the mean, while its size measures the magnitude of the difference. As long as the slope is close to zero, the corresponding values of the series are close to the overall average. Breakpoint detection methods based on rCUSUM have been investigated in an inferential framework (see e.g. Antoch et al. [24] and references therein), and some automatic procedures for multiple breakpoint detection are available for small data sets [25].

In our visual data mining perspective, the user dynamically selects the change points in the rCUSUM chart, looking at substantial differences in slopes. In so doing, a partition of the transformed series that corresponds to a partition of the original one is generated. The item groups in the partition will be homogeneous with respect to their mean level in the original series. Due to such homogeneity, we call these groups of items retrospective rational subgroups.

Figure 1. An observed series of the quality index $QI_i$ computed for 1000 vehicle bodies (horizontal axis: Items; vertical axis: Quality Index)

In SPC jargon, sampling by rational subgroups is a way of collecting groups of data that are likely to be homogeneous within and potentially different among them. Hence, rational subgroups are identified before data are collected. In our case, we group off-line data after collection and some preliminary analyses, and this is why we denote them as retrospective rational subgroups.

With respect to the quality index $QI$ (Equation 1), we aim to identify groups of items which are homogeneous with respect to their quality level. In other words, we look for a partition of the observed series $\{QI_i\}_{i=1,\ldots,n}$ taken in a period of interest. Its retrospective cumulative sum up to the $m$th observation, $C(QI)_m$, is

\[
C(QI)_m = \sum_{i=1}^{m} \frac{QI_i - \overline{QI}}{s_{QI}} = C(QI)_{m-1} + \frac{QI_m - \overline{QI}}{s_{QI}}
\]

where $\overline{QI}$ is the arithmetic mean and $s_{QI}$ the standard deviation of $\{QI_i\}_{i=1,\ldots,n}$ (the standardized version facilitates the comparison of charts over different periods).

The rCUSUM chart consists of the point-line index plot of $C(QI)_m$, $m = 1, \ldots, n$. Looking at the piecewise behaviour in this chart, the user breaks the cumulative series into pieces, selecting the main change points. This implies the identification of a corresponding set of retrospective rational subgroups. If the change points are located at $m_1, \ldots, m_k, \ldots, m_{K-1}$, the $K$ retrospective rational subgroups of items will be indexed by

\[
rRS_k = \{\, i \in \mathbb{N} : m_{k-1} < i \le m_k \,\}, \qquad k = 1, \ldots, K,
\]

with $m_0 = 0$ and $m_K = n$. That is, $rRS_k$ is the set of indices of the observations belonging to the $k$th group.

Figure 2 presents the rCUSUM chart for the series discussed above (Figure 1). A piecewise behaviour clearly appears, and hence some change points in the cumulative series can be identified. At a first glance, we select seven of them ($\{m_k\}_{k=1,\ldots,K-1} = \{81, 135, 170, 335, 631, 827, 895\}$), which partition the $C(QI)_i$ series into eight groups (in the plot, vertical lines highlight the partition). Note that alternative choices are possible, which should be dynamically investigated by the user (for example, the first three groups could be aggregated into a single one). The eight retrospective rational subgroups obtained will be further analysed in the next steps of the strategy.

Figure 2. Retrospective CUSUM chart for the quality index series displayed in Figure 1. Vertical lines highlight a partition into eight groups of items homogeneous in level (horizontal axis: Items; vertical axis: Retrospective Cumulative Sums)

5.3. Level and variability analysis

Once a set of retrospective rational subgroups has been identified through the rCUSUM chart, the second step of our visual strategy aims to select those (one or a few more) with the lowest quality level. Selection is performed through a graphical comparison in terms of closeness to the target and coherence with the level. With this goal, we propose two graphical displays: the Subgroup Mean chart and the Subgroup Interquantile chart.

The Subgroup Mean chart is designed to aid the user in the evaluation of differences among the means of the retrospective rational subgroups. The chart displays a stepwise function $SM(x)$ that maps each item to the mean of its subgroup; it is thus the step function plot [26] of the subgroup means. For each group, let $\overline{QI}_k$ be its mean level

\[
\overline{QI}_k = \frac{1}{n_k} \sum_{i \in rRS_k} QI_i, \qquad k = 1, \ldots, K,
\]

with $n_k = (m_k - m_{k-1})$ the cardinality of $rRS_k$. The chart is the plot of the function

\[
SM(x) = \overline{QI}_k \times I_{(m_{k-1}, m_k]}(x), \qquad x \in [1, n], \ \forall k,
\]

where $I_A(\cdot)$ is the indicator function of the set $A$. In addition, a horizontal line representing the overall mean level is superimposed on the plot as a reference.
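The computations behind the rCUSUM chart of Section 5.2 and the Subgroup Mean chart can be sketched as follows. This is a minimal sketch under stated assumptions, not the software used in the case study: it assumes NumPy, 0-based array indexing (so that a change point $m_k$, counted in items, becomes a slice boundary), and the sample standard deviation with divisor $n - 1$ for $s_{QI}$, since the divisor is not specified in the text. The function names are hypothetical, and the change points are supplied by the user after inspecting the chart.

```python
import numpy as np


def rcusum(qi):
    """Retrospective CUSUM C(QI)_m: cumulative sum of the series
    standardized by its own mean and standard deviation."""
    qi = np.asarray(qi, dtype=float)
    # Assumption: sample standard deviation (divisor n - 1).
    z = (qi - qi.mean()) / qi.std(ddof=1)
    return np.cumsum(z)


def rational_subgroups(n, change_points):
    """Index sets rRS_k for change points m_1 < ... < m_{K-1},
    with m_0 = 0 and m_K = n (returned as 0-based indices)."""
    bounds = [0, *change_points, n]
    return [np.arange(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]


def subgroup_mean_step(qi, change_points):
    """Step function SM(x): each item is mapped to its subgroup mean."""
    qi = np.asarray(qi, dtype=float)
    sm = np.empty_like(qi)
    for idx in rational_subgroups(len(qi), change_points):
        sm[idx] = qi[idx].mean()
    return sm
```

With the partition chosen in the case study, the step function would be obtained as `subgroup_mean_step(qi, [81, 135, 170, 335, 631, 827, 895])`, where `qi` holds the 1000 observed index values. By construction the rCUSUM series ends at zero, since the deviations from the overall mean sum to zero.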


The Subgroup Interquantile chart provides a way to graphically compare variability among groups. By analogy with the previous chart, it is a plot of two stepwise functions, $IQR(x)$ and $PR(x)$, which map each item to the interquartile range and to the 10–90 percentile range of its group, respectively. Let $IQR_k = Q^k_{0.75} - Q^k_{0.25}$ ($k = 1, \ldots, K$) be the subgroup interquartile range and $PR_k = Q^k_{0.90} - Q^k_{0.10}$ ($k = 1, \ldots, K$) be its 10–90 percentile range, where $Q^k_{j/m}$ denotes the $j/m$ quantile for the $k$th group, that is, the value below which lie $j$ $m$ths of the data in the group. The chart is a step function plot of the subgroup interquantile ranges, displaying both

\[
IQR(x) = IQR_k \times I_{(m_{k-1}, m_k]}(x), \qquad x \in [1, n], \ \forall k,
\]
\[
PR(x) = PR_k \times I_{(m_{k-1}, m_k]}(x), \qquad x \in [1, n], \ \forall k,
\]

along with two reference lines corresponding to the overall interquartile and 10–90 percentile ranges.

The proposed charts allow the user to select the worst subgroups, as they visualize: (i) which groups of items have better or worse performance with respect to the average level and variability; (ii) how wide the deviations of the group means and ranges from the overall mean and range are; (iii) how large each group is. Combining the information from both charts, users should select one or more rational subgroups to be explored in the next steps of the analysis. They should focus their attention on groups with the widest positive deviations from the reference lines, jointly taking into account the group sizes.

For the data set we are analysing, the proposed charts for the eight groups previously obtained are displayed in Figures 3 and 4. In the Subgroup Mean chart (Figure 3) we note that two groups (the fourth and the sixth from the left) have mean levels substantially higher than the overall level, i.e. the worst quality levels. In addition, the Subgroup Interquantile chart (Figure 4) highlights that one of these (the sixth) also has the widest variability. As the size of this group is quite large with respect to the series size (183 out of 1000 observations), we consider it worthy of further investigation.
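The two step functions of the Subgroup Interquantile chart can be computed along the same lines. The sketch below is illustrative only: it assumes NumPy and the default linear-interpolation quantile estimator of `np.quantile`, whereas the text does not state which quantile estimator was used; the function name `subgroup_ranges` is hypothetical.

```python
import numpy as np


def subgroup_ranges(qi, change_points):
    """Step functions IQR(x) and PR(x): each item is mapped to the
    interquartile and 10-90 percentile range of its subgroup."""
    qi = np.asarray(qi, dtype=float)
    bounds = [0, *change_points, len(qi)]  # m_0 = 0, ..., m_K = n
    iqr = np.empty_like(qi)
    pr = np.empty_like(qi)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Assumption: NumPy's default (linear interpolation) quantiles.
        q10, q25, q75, q90 = np.quantile(qi[lo:hi], [0.10, 0.25, 0.75, 0.90])
        iqr[lo:hi] = q75 - q25
        pr[lo:hi] = q90 - q10
    return iqr, pr
```

The overall reference lines correspond to calling the function with an empty list of change points, which treats the whole series as a single group.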

5.4. Discovering causes

At this stage of our visual data mining strategy it is worth discovering which of the $X^j$ variables move along with the quality index series. In other words, we aim to point out the variables that have the same pattern as the quality index for the group of items presenting the lowest (or lower) quality. The pairwise correlation coefficient seems to us to be a simple and appropriate tool for this aim. It is indeed well known that it is a similarity measure between two series, as it is negatively related to their squared Euclidean distance. Furthermore, as the quality index is a function of all the quality measurements, in our case the correlation coefficient can be interpreted in terms of causality. Hence, the most correlated variables may have caused the quality decay and are good candidates for further investigation.

To visually identify the variables that mainly lead to the decay of process quality, we then propose a Correlation Pareto chart. The chart consists of two side-by-side bar plots, representing respectively the positive and negative correlation values on a quadratic scale. The values are arranged in descending order in both plots, and the same vertical axis scale is used, so that the user can easily identify which variables have mainly contributed to the variation of the index. In addition, to highlight the relative weight of each variable with respect to the others (having the same correlation sign), a Pareto-like line is superimposed on each plot.

The correlation coefficients are computed for the items belonging to two consecutive rational subgroups: the one selected through the inspection of the Subgroup Mean and Interquantile charts, and the previous group. In this way we do not consider patterns along the whole observed series, but we measure similarities only in neighbourhoods of the selected change in level.
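The link between the correlation coefficient and the squared Euclidean distance mentioned above can be made explicit with a short derivation. It assumes that both series are first standardized to zero mean and unit sample variance (divisor $n - 1$), an assumption introduced here for the illustration; $z_i$ and $w_i$ denote the standardized values of the two series over $n$ items, and $r$ their correlation coefficient.

```latex
\begin{align*}
\sum_{i=1}^{n} (z_i - w_i)^2
  &= \sum_{i=1}^{n} z_i^2 + \sum_{i=1}^{n} w_i^2 - 2 \sum_{i=1}^{n} z_i w_i \\
  &= (n-1) + (n-1) - 2(n-1)\, r \\
  &= 2(n-1)(1 - r).
\end{align*}
```

Under this standardization the distance is a decreasing linear function of $r$: it vanishes for $r = 1$ and reaches its maximum, $4(n-1)$, for $r = -1$.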
Figure 3. Subgroup Mean chart for the quality index series displayed in Figure 1, based on the partition drawn in Figure 2. The function displays the group means. The horizontal line represents the overall mean (horizontal axis: Items; vertical axis: Subgroup Means)

Figure 4. Subgroup Interquantile chart for the quality index series displayed in Figure 1, based on the partition drawn in Figure 2. The functions display the group 10–90 percentile ranges and the interquartile ranges, respectively, at the top and at the bottom of the figure. The horizontal lines represent the overall ranges (horizontal axis: Items; vertical axis: Subgroup Interquantiles)

In detail, let $k^*$ be the selected group, and let $r_j^*$ be the Pearson product-moment correlation coefficient between the quality index and its $j$th component variable, computed on the items belonging to the $k^*$th and the previous group. That is,

\[
r_j^* = r_{k^*}(QI, X^j) =
\frac{\sum_{i \in rRS_{I^*}} (QI_i - \overline{QI}_{I^*})(X_i^j - \overline{X}^j_{I^*})}
     {\sqrt{\sum_{i \in rRS_{I^*}} (QI_i - \overline{QI}_{I^*})^2}\,
      \sqrt{\sum_{i \in rRS_{I^*}} (X_i^j - \overline{X}^j_{I^*})^2}}
\]

with $rRS_{I^*} = rRS_{k^*-1} \cup rRS_{k^*}$, $I^* = \{k^* - 1, k^*\}$, and $\overline{QI}_{I^*}$ and $\overline{X}^j_{I^*}$ the means of $QI_i$ and $X_i^j$ for the items in $rRS_{I^*}$.

Figure 5. Correlation Pareto chart for the sixth selected group (horizontal axes: Variables; vertical axes: $r^2$ and cumulative percentage)

The Correlation Pareto chart will display separately the positive and negative correlations (say $r_j^{*+}$ and $r_j^{*-}$), as they provide different information. While positive correlations highlight variables that may have caused the quality lowering, negative correlations point out variables that may have improved the quality level. Furthermore, as the correlation coefficient values are not on a linear scale, we plot them on a quadratic scale so that comparisons are not misleading. The squared correlation coefficient can also be interpreted as the proportion of variance of the $QI$ index explained by the linear relationship with the variable $X^j$.

To visually aid the user in evaluating differences, we draw the two bar plots in analogy with the classical SPC Pareto chart. The left side of the chart displays the sorted sequence of the squared positive coefficients $\{[r^{*+}]^2_{(j)}\}_{j=1,\ldots,J^+}$ (with $[r^{*+}]^2_{(j)} \ge [r^{*+}]^2_{(j+1)}$), where $J^+$ is the number of positive correlations. In addition, a line joining the points

\[
P^+_{(j)} = \frac{\sum_{i=1}^{j} [r^{*+}]^2_{(i)}}{\sum_{i=1}^{J^+} [r^{*+}]^2_{(i)}}, \qquad j = 1, \ldots, J^+,
\]

is superimposed. Negative correlations are represented in an analogous way on the right side of the chart.
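The quantities displayed in the Correlation Pareto chart can be sketched as follows. This is an illustrative sketch rather than the implementation used for Figure 5: it assumes NumPy, a data matrix `X` with one row per item and one column per variable, and an index array `idx` holding the 0-based positions of the items in the selected group and in the previous one. The function name and the returned layout are hypothetical, and variables that are constant over the selected items (for which the correlation is undefined) are assumed to have been dropped beforehand.

```python
import numpy as np


def correlation_pareto(qi, X, idx):
    """Squared correlations between QI and each column of X over the
    items in idx, split by sign, sorted in descending order, together
    with their cumulative shares (the Pareto-like line)."""
    qi = np.asarray(qi, dtype=float)[idx]
    X = np.asarray(X, dtype=float)[idx]
    qc = qi - qi.mean()            # centred quality index
    Xc = X - X.mean(axis=0)        # centred variables, column by column
    r = (Xc.T @ qc) / np.sqrt((qc @ qc) * (Xc ** 2).sum(axis=0))
    panels = {}
    for name, cols in (("positive", np.flatnonzero(r > 0)),
                       ("negative", np.flatnonzero(r < 0))):
        order = cols[np.argsort(-(r[cols] ** 2))]   # descending r^2
        r2 = r[order] ** 2
        panels[name] = (order, r2, np.cumsum(r2) / r2.sum())
    return panels
```

For the partition of the case study, selecting the sixth group means using the items of the fifth and sixth groups, i.e. `idx = np.arange(335, 827)` in 0-based indexing. Each panel returns the column positions in plotting order, the bar heights, and the points of the superimposed cumulative line.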

With respect to our case study, the Correlation Pareto chart for the sixth group identified in the previous section is displayed in Figure 5. Looking at positive correlations (left side of the chart) we note three variables (in order from the left, $X^{57}$, $X^{58}$ and $X^{42}$) with higher correlations. These variables mainly lead to the quality decrease of the selected items. They are, then, candidates for further investigation by the factory engineers.

At this stage of the analysis, further investigations on the selected variables may be performed through partial correlations, so that relationships among them and possible spurious correlations can be taken into account. Alternatively, if a strong multivariate correlation structure is suspected, we suggest constructing a Partial Correlation Pareto chart for all the variables.

Finally, we would like to note that a knowledge discovery process cannot be brought to a close by means of a single procedure or method. The whole strategy should thus be regarded as an exploratory phase of the learning process, which could be integrated with more sophisticated analyses.
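One common way to obtain the partial correlations mentioned above is through the inverse of the covariance matrix. The sketch below shows that standard construction; it is not a procedure described in the text. It assumes NumPy, a data matrix with one column per variable (for example, the quality index together with the few selected variables), and an invertible sample covariance matrix, which may fail if the index is an exact linear function of the included variables. The function name is hypothetical.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of each pair of columns given all the
    other columns, computed from the precision (inverse covariance)
    matrix P as -P_ij / sqrt(P_ii * P_jj)."""
    prec = np.linalg.inv(np.cov(np.asarray(data, dtype=float), rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr
```

A pair of variables whose ordinary correlation is high but whose partial correlation is close to zero is related mainly through the other variables, which is the kind of spurious association the further investigation is meant to reveal.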

6. SOME CONCLUDING REMARKS

In this paper we presented a visual data mining strategy designed to investigate multivariate production processes visually, in an effective and simple way. Effectiveness is pursued through the interaction between human visual learning skills and computer-based numerical methods. The user is called upon to learn from data step by step, exploiting novel information acquired at each stage to achieve deeper insights into the process. The overall strategy is designed so that the graphical procedure is run more than once: the user should change the items to be grouped and investigated, so that different causes of quality lowering can be discovered. In addition, we note that the method, by asking the user to interactively drive the computer in the detection of relevant information, allows some computational gains with respect to other possible automatic techniques.

The simplicity of the strategy is given by both the adopted statistical methods and the way in which results are represented. The basic ideas behind our proposal are indeed quite familiar within the SPC context. However, we revised their use in order to provide answers to the challenging questions arising from the availability of large databases. Furthermore, by combining these basic tools under a visual data mining perspective, we enhanced their capabilities and explanatory power.

Finally, we note that, although the procedure has been illustrated for a given index, it can be performed with any univariate quality index that maps the multivariate structure of a process into a few summary variables (e.g. the Hotelling $T^2$ statistic or the first few principal components).

Acknowledgements

The authors would like to thank Paola Costantini and staff at the company for useful information and discussions.

REFERENCES

1. Montgomery DC. Research in industrial statistics – part I. Quality and Reliability Engineering International 2001; 17(6):iii–iv.
2. Montgomery DC. Introduction to Statistical Quality Control (4th edn). Wiley: New York, 2001.
3. Alt FB. Multivariate quality control. Encyclopaedia of Statistical Sciences, vol. 6, Johnson NL, Kotz S (eds.). Wiley: New York, 1985; 111–122.
4. Alt FB, Smith ND. Multivariate process control. Handbook of Statistics, vol. 7, Krishnaiah PR, Rao CR (eds.). Elsevier: Amsterdam, 1988; 333–351.
5. Mason RL, Young JC. Multivariate Statistical Process Control with Industrial Application (ASA-SIAM Series on Statistics and Applied Probability, vol. 9). Society for Industrial and Applied Mathematics: Philadelphia, PA, 2001.
6. Liu RY. Control charts for multivariate process. Journal of the American Statistical Association 1995; 90:1380–1387.
7. Qiu P, Hawkins DM. A rank-based multivariate CUSUM procedure. Technometrics 2001; 43:120–132.
8. Scepi G. Parametric and nonparametric multivariate quality control charts. Multivariate Total Quality Control, Lauro C, Antoch J, Esposito Vinzi V, Saporta G (eds.). Physica: Heidelberg, 2002; 163–189.
9. Hawkins DM. Multivariate quality control using regression adjusted variables. Technometrics 1991; 33:61–75.
10. Fuchs C, Benjamini Y. Multivariate profile charts for statistical process control. Technometrics 1994; 36:182–195.
11. Fayyad UM, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery. Advances in Knowledge Discovery and Data Mining, Fayyad UM, Piatetsky-Shapiro G, Smyth P (eds.). AAAI Press: Menlo Park, CA, 1996; 37–57.
12. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer: New York, 2001.
13. Perner P. Data Mining on Multimedia Data. Springer: Heidelberg, 2002.
14. Giudici P. Applied Data Mining: Statistical Methods for Business and Industry. Wiley: New York, 2003 (in press).
15. Milne R, Drummond M, Renoux P. Predicting paper making defects on-line using data mining. Knowledge-Based Systems 1998; 11:331–338.
16. Perner P. Advances in Data Mining, Applications in E-commerce, Medicine, and Knowledge Management. Springer: Heidelberg, 2002.
17. Inselberg A. Visual data mining with parallel coordinates. Computational Statistics 1998; 13:47–63.
18. Macêdo M, Cook D, Brown TJ. Visual data mining in atmospheric science data. Data Mining and Knowledge Discovery 2000; 4:69–80.
19. Cox KC, Eick SG, Wills GJ. Visual data mining: recognizing telephone calling fraud. Data Mining and Knowledge Discovery 1997; 1:225–231.
20. Young FW, Faldowsky RA, McFarlane MM. Multivariate statistical visualization. Handbook of Statistics, vol. 9, Rao CR (ed.). Elsevier: Amsterdam, 1993; 959–998.
21. Cleveland WS, McGill R. Dynamic Graphics for Statistics. Wadsworth and Brooks/Cole: Belmont, CA, 1988.
22. Woodward RH, Goldsmith PL. Cumulative Sum Techniques. Oliver and Boyd for ICI: Edinburgh, 1964.
23. Hawkins DM, Olwell DH. Cumulative Sum Charts and Charting for Quality Improvement. Springer: New York, 1998.
24. Antoch J, Huskova M, Jaruskova D. Off-line statistical process control. Multivariate Total Quality Control, Lauro C, Antoch J, Esposito Vinzi V, Saporta G (eds.). Physica: Heidelberg, 2002; 1–86.
25. Taylor AL, Tait SP, Porter MA, Perry MJ, Nicolson RW. Automatic breakpoint detection for retrospective cumulative sum charts. Pharmaceutical Statistics 2002; 1:25–34.
26. Cleveland WS. The Elements of Graphing Data (rev. edn). AT&T Bell Laboratories: Murray Hill, NJ, 1994; 188–189.

Authors’ biographies

Giovanni C. Porzio is an Associate Professor in the Department of Economics at Cassino University in Italy. He received an MSc in Statistics from the University of Minnesota, and a PhD in Computational Statistics from the Federico II University of Naples, Italy. His research interests include multivariate statistical process control, model building and diagnostics, and graphical methods for statistics.

Giancarlo Ragozini is an Assistant Professor in the Department of Sociology at the Federico II University of Naples in Italy. He received a PhD in Computational Statistics from the Federico II University of Naples, Italy. His research interests include multiple outlier detection in multivariate data, nonparametric quality control, graphical methods for statistics, and the application of computational geometry to exploratory data analysis. He is currently involved in the VITAMINS European project on visual data mining.
