MINI-PAPER

A Gentle Introduction to the Analysis of Sequential Data

by Rong Pan, Ph.D., Assistant Professor of Industrial Engineering, Arizona State University

We, applied statisticians and manufacturing engineers, often need to deal with sequential data collected on a manufacturing process by sensors. The data volume can be very large, and it is easy to lose sight of the purpose of the data analysis. In this mini-paper we discuss some basic purposes for analyzing such data and the simple tools that can assist in the analysis. We focus on the analysis of a univariate series. Our purposes include understanding the series' characteristics as exhibited in the observations, predicting a future observation, and controlling the mean value and variance.

1. Independence versus Correlation

Sequential data are often referred to as time series, particularly when the observations are collected over time. However, this type of data need not be indexed by time only; for example, x-ray inspection data on highway pavement are spatially indexed. One of the critical data analysis steps is to check whether the observations are independent of each other. Many statistical techniques assume that observations are independent, which is unlikely in sequential data, where nearby observations tend to be correlated even when the system under study is stable and undisturbed. This type of correlation is called autocorrelation (the prefix "auto-" means "self"); it is the correlation of samples in the series with earlier samples in the same series. Autocorrelation is a function of the time lag, i.e., the number of time units separating a later observation from an earlier one.

Figure 1(a) plots the data from Boyles (2000), a time series of 190 observations of fill weights for a powder food product. Over time, the observations appear to wander around a mean value, but it is still not easy to detect autocorrelation by visual inspection alone. The autocorrelation is revealed by a plot of the sample autocorrelation function, shown in Figure 1(b). Clearly, the autocorrelation at lag 1 is significant and positive, which indicates that each observation tends to resemble the previous observation. Unlike independent observations, where variation is due to pure random noise only, the variation in an autocorrelated series can be decomposed into random noise and a systematic variation pattern. In this example, the primary pattern is the lag-one autocorrelation. Knowing this systematic pattern is essential to time series prediction and control.
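For readers who would like to try this on their own data, the following is a minimal sketch (in Python, using NumPy) of the sample autocorrelation function. The AR(1) series generated at the end is only an illustrative stand-in, not the actual Boyles data; as a rough rule of thumb, a lag-k autocorrelation larger in magnitude than about 2/sqrt(n) is considered statistically significant.

    import numpy as np

    def sample_acf(x, max_lag=20):
        """Sample autocorrelation of a 1-D series at lags 1..max_lag."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        xbar = x.mean()
        c0 = np.sum((x - xbar) ** 2) / n              # lag-0 autocovariance
        acf = []
        for k in range(1, max_lag + 1):
            ck = np.sum((x[:-k] - xbar) * (x[k:] - xbar)) / n
            acf.append(ck / c0)
        return np.array(acf)

    # Illustrative positively autocorrelated series (AR(1) with coefficient 0.5).
    rng = np.random.default_rng(0)
    x = np.zeros(190)
    for t in range(1, len(x)):
        x[t] = 0.5 * x[t - 1] + rng.normal()

    print(np.round(sample_acf(x, max_lag=5), 2))      # lag-1 value will dominate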

Figure 1: (a) time series plot and (b) sample autocorrelation of Boyles' data set; the statistically significant autocorrelation is strongest at lag 1.


2. Stationary versus Nonstationary

Another important characteristic of a time series is stationarity. Loosely speaking, if the observations vary around a constant mean and the variance does not explode over time, the series is stationary. In industry, a nonstationary process is highly undesirable: it indicates that the process needs corrective action to bring it back under control. Detecting the nonstationary pattern can give insights into the mechanism that generates the series. For example, del Castillo (2002, p. 20) gave 40 measurements of a dimension machined on aluminum parts processed on a Fadal computer numerically controlled (CNC) machine tool. The time series plot, shown in Figure 2(a), clearly demonstrates a pattern of increasing mean over time, possibly because of tool wear. We are interested in decomposing the total variation of this series into the sum of an increasing mean value plus pure random noise. After removing (filtering) the increasing trend in the mean by using a drifted random walk model (technically, an "integrated moving average" model), Figure 2(b) shows the series under control. After subtracting the predictable drift, what remains is random noise without apparent patterns. The process can be made stationary through corrective actions calculated from the filter model.
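The article fits an integrated moving average model; as a simplified sketch of the same idea, a random walk with drift can be "filtered" by taking first differences and subtracting the estimated per-step drift. The simulated series below is a stand-in for the Fadal data, which are not reproduced here.

    import numpy as np

    def random_walk_drift_residuals(x):
        """Fit x_t = x_{t-1} + d + e_t and return the one-step residuals e_t.

        The drift d is estimated by the mean of the first differences, so the
        returned residuals are the detrended series, conceptually like Figure 2(b).
        """
        x = np.asarray(x, dtype=float)
        diffs = np.diff(x)            # x_t - x_{t-1}
        drift = diffs.mean()          # estimated per-step drift (e.g., tool-wear rate)
        return diffs - drift          # what remains should look like random noise

    # Illustration with a simulated drifting series of 40 observations.
    rng = np.random.default_rng(1)
    x = np.cumsum(0.05 + rng.normal(scale=0.2, size=40))
    resid = random_walk_drift_residuals(x)
    print(round(resid.std(), 2))      # roughly the noise standard deviation (0.2 here)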

Figure 2: (a) time series plot and (b) residual plot of the Fadal dataset.

Modern manufacturing processes are usually under some type of automatic process control (APC) to prevent nonstationarity/instability. As systematic reactions to past observations, control actions will introduce autocorrelation into the sequential process observations. These automated reactions to trends make APC different from statistical process control (SPC). SPC is more concerned with detecting a sudden shift in the process mean and/or process variance due to assignable causes, and less concerned with mild autocorrelation as long as the process is stationary. Assignable causes may come from outside the process and require investigation to determine an effective reaction; APC reacts to common causes that are part of the process (e.g., tool wear), for which effective reactions are so well understood that they can be automated. However, APC can interfere with SPC. Systematic reactions deliberately push the next observation away from the previous observation, creating negative autocorrelation that may inflate the estimate of the process variance. This leads to inflated control chart limits and makes the statistical process control chart less sensitive to a shift. An active quality control research area in the past two decades is the integration of APC and SPC; for detailed discussions on this topic, see Box and Luceño (1997), del Castillo (2002), and Pan (2009).
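The negative autocorrelation created by overreacting to past observations can be seen in a small simulation (not taken from the article): a stable process of pure noise is "adjusted" after every observation by fully compensating the deviation just seen. The adjusted output ends up with a lag-1 autocorrelation near -0.5 and an inflated variance.

    import numpy as np

    def lag1_autocorr(x):
        x = np.asarray(x, dtype=float) - np.mean(x)
        return np.sum(x[:-1] * x[1:]) / np.sum(x ** 2)

    rng = np.random.default_rng(2)
    noise = rng.normal(size=2000)          # a stable process: independent noise around target 0

    adjusted = np.empty_like(noise)
    setpoint = 0.0
    for t in range(len(noise)):
        adjusted[t] = setpoint + noise[t]  # observed deviation from target
        setpoint -= adjusted[t]            # fully compensate the deviation just observed

    print(round(lag1_autocorr(noise), 2))     # near 0: independent observations
    print(round(lag1_autocorr(adjusted), 2))  # near -0.5: induced negative autocorrelation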


3. Filtering and Smoothing

Given a sequential dataset, we would like to fit a model that consists of a predictable part, such as a deterministic mathematical function, plus random noise. Smoothing and filtering are two approaches to the same problem, i.e., separating the predictable part (the signal) from the noise, but from different perspectives. Filtering comes from signal processing and target tracking, where the task is to filter out the noise and predict the next move of the target; the ability to make on-line predictions in real time is therefore very important. Smoothing comes from statistics, where the major tasks include model fitting and diagnosis. It is typically implemented off-line, so all available data can be utilized to estimate the predictable part of the time series. After removing the noise, the series becomes smoother (shows less variation) than the original one. Oftentimes these methods are known as forward filtering and backward smoothing. For example, Eq. (1) is a linear filter using a moving average, taking the weighted average of the past five observations:

y_t = w_1 x_t + w_2 x_{t-1} + w_3 x_{t-2} + w_4 x_{t-3} + w_5 x_{t-4},   Eq. (1)

where the w_i's are weights and ∑_{i=1}^{5} w_i = 1. The current time index is "t", one lag in the past is "t-1", and so on. A simple moving average filter lets each w_i = 1/5. After averaging, the series {y_t} is smoother than the original series {x_t}, and {y_t} can be used to predict the next observation of the {x_t} series.

Depending on the application, the equal weights used in the above linear filter may not be the best choice. A sensible modification of Eq. (1) is to assign larger weights to recent observations and smaller weights to remote observations. In fact, the exponential smoothing algorithm reduces the weights exponentially to discount the effect of remote observations on the prediction; the same algorithm is used in EWMA control charts. Statisticians are interested in the estimation of the weights in a linear filter. For some special classes of time series models, optimal weights can be found based on certain optimization criteria. For example, we may find the optimal weight of exponential smoothing by minimizing the mean square prediction error. For the Boyles dataset introduced earlier, the optimal weight of exponential smoothing is found to be 0.18. As shown in Figure 3, exponential smoothing with the optimal weight reduces the mean square error to 216, compared to 240 for the moving average. However, optimizing the weights may not be practically important in many applications: exponential smoothing performs well for weights that are roughly near the optimum, which is one reason it is so popular.
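The following sketch implements the equal-weight moving average of Eq. (1), exponential smoothing, and a simple grid search for the weight that minimizes the one-step-ahead mean square prediction error. The function and variable names are our own; with the actual fill-weight series the article reports an optimal weight of about 0.18.

    import numpy as np

    def moving_average(x, window=5):
        """Equal-weight moving average of the last `window` observations (Eq. 1 with w_i = 1/5)."""
        return np.convolve(np.asarray(x, dtype=float), np.ones(window) / window, mode="valid")

    def exp_smooth(x, w):
        """Simple exponential smoothing: y_t = w * x_t + (1 - w) * y_{t-1}."""
        x = np.asarray(x, dtype=float)
        y = np.empty_like(x)
        y[0] = x[0]
        for t in range(1, len(x)):
            y[t] = w * x[t] + (1 - w) * y[t - 1]
        return y

    def one_step_mse(x, w):
        """Mean square error when the smoothed value y_{t-1} is used to predict x_t."""
        x = np.asarray(x, dtype=float)
        y = exp_smooth(x, w)
        return np.mean((x[1:] - y[:-1]) ** 2)

    def best_weight(x, grid=np.linspace(0.01, 0.99, 99)):
        """Weight minimizing the one-step-ahead MSE over a coarse grid."""
        return min(grid, key=lambda w: one_step_mse(x, w))

    # Usage (fill_weights would be the Boyles series, not reproduced here):
    # w_opt = best_weight(fill_weights)   # the article reports roughly 0.18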

Figure 3: (a) moving average and (b) exponential smoothing of Boyles’ dataset.


Eq. (1) can be viewed as a linear regression of y_t on five consecutive observations of x_t. To find the optimal regression coefficients, we can use the least squares method. This implies that other regression techniques can also be applied for data smoothing. Locally weighted scatterplot smoothing, or loess, is one such method. In loess, each smoothed point in the series is estimated by a low-degree polynomial function of its neighboring points. The polynomial is fitted using weighted least squares, giving more weight to points in the close neighborhood of the point being estimated. Note that we may specify the degree of the polynomial model and the weights, but the data fitting is done at the level of a local neighborhood. Typically, we use simple models, such as linear regression models, to fit localized subsets of the data and build the smooth curve point by point; thus, there is no global function describing the deterministic part of the variation in the data, but rather a collection of locally predicted points. Figure 4(a) and (b) give the loess curves of the Boyles dataset with smoothing parameters of 0.2 and 0.4, respectively, which means that for each smoothed point only the nearest 20% or 40% of the points are used in the localized regression. As this percentage increases, the curve becomes flatter, so less variability in the observed data is accounted for by the smooth curve.
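A loess fit of this kind can be obtained with the lowess routine in the statsmodels Python package, sketched below; the simulated series is only a stand-in for the Boyles data, and frac plays the role of the smoothing parameter in Figure 4.

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    # Hypothetical stand-in for the 190-point fill-weight series.
    rng = np.random.default_rng(3)
    t = np.arange(190)
    x = 50 + np.cumsum(rng.normal(scale=0.1, size=190)) + rng.normal(scale=0.5, size=190)

    # frac is the fraction of points used in each local regression.
    smooth_02 = lowess(x, t, frac=0.2, return_sorted=False)   # wigglier curve
    smooth_04 = lowess(x, t, frac=0.4, return_sorted=False)   # flatter curve

    print(round(np.var(smooth_02), 3), round(np.var(smooth_04), 3))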

Figure 4: (a) loess smoothing with smoothing parameter 0.2 and (b) 0.4.

Loess is data oriented and computationally intensive; computer programs are necessary to fit the many local regression equations used to smooth a single series. However, one of the major advantages of this method is its flexibility in fitting the complex, nonlinear patterns exhibited by the data, and the analyst has a great deal of control over the smoothness of the fitted curve by adjusting the smoothing parameter.

4. Prediction and Control

As explained previously, there are many reasons for analyzing sequential data; however, two major general purposes are prediction and control, and often they go hand in hand. In production and inventory management, for example, we may want to predict the production or inventory level of an item, and then apply management tools and strategies to adjust this level closer to a target or to reduce its variation. After decomposing the sequential observations into a predictable function and random noise, it is straightforward to apply the prediction function. Therefore, a good predictor should account for most of the variability in the observations that is due to systematic patterns, while ignoring the random noise; an optimal filter is, thus, an optimal predictor. Based on the predicted process value, a process feedback control scheme applies control actions to some adjustable variables, which in turn move the process value toward its desired target. In control theory, the design of a controller involves the controllable variables and their interactions with the system outputs, which is beyond the scope of this article. Figure 5 depicts a generic process of feedback control and highlights the role of sequential data analysis in this process.
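As a very small illustration of feedback adjustment in code, the sketch below applies an integral-type (EWMA-style) adjustment of the kind discussed by Box and Luceño (1997), under simplified assumptions we have made here: unit process gain, no dead time, and no adjustment cost. It is meant only to show the flow of Figure 5, not a recommended controller design.

    import numpy as np

    def integral_feedback_control(disturbance, target=0.0, weight=0.2):
        """After each observation, change the cumulative adjustment by
        -weight * (observed deviation from target), a minimal integral control action."""
        disturbance = np.asarray(disturbance, dtype=float)
        output = np.empty_like(disturbance)
        adjustment = 0.0
        for t in range(len(disturbance)):
            output[t] = disturbance[t] + adjustment      # what is actually observed
            adjustment -= weight * (output[t] - target)  # counteract the observed deviation
        return output

    # Illustration: a drifting (nonstationary) disturbance is pulled back toward the target.
    rng = np.random.default_rng(4)
    drift = np.cumsum(0.05 + rng.normal(scale=0.2, size=200))
    controlled = integral_feedback_control(drift, target=0.0, weight=0.2)
    print(round(np.mean(np.abs(drift)), 2), round(np.mean(np.abs(controlled)), 2))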


Figure 5: A generic process feedback control scheme

5. Summary

In this mini-paper we have discussed the essence of a sequential data analysis task, which is to decompose the process variation into a systematic pattern and random noise, which are in turn utilized for system prediction or control. The literature on this topic is vast; we only wish to illustrate some basic concepts, such as autocorrelation and nonstationarity, through examples and to demonstrate the ideas behind some simple filtering/smoothing techniques.

References:

Boyles, R. A. (2000). "Phase I analysis for autocorrelated processes," Journal of Quality Technology, 32(4), pp. 395-409.

Del Castillo, E. (2002). Statistical Process Adjustment for Quality Control, Wiley Series in Probability and Statistics, Wiley.

Box, G. E. P. and Luceño, A. (1997). Statistical Control by Monitoring and Feedback Adjustment, Wiley Series in Probability and Statistics, Wiley.

Pan, R. (2009). Statistical Process Adjustment in Short-Run Manufacturing Process, VDM Verlag Dr. Müller.
