Statistical Process Techniques on Water Toxicity Data
Eleni Smeti∗, Demetrios Koronakis∗, Leonidas Kousouris∗
Abstract -- Statistical Process Control (SPC) uses statistical techniques to improve the quality of a process by reducing its variability. The main tools of SPC are control charts. The basic idea of a control chart is to test the hypothesis that only common causes of variability are present against the alternative that special causes are present. Control charts are designed and evaluated under the assumption that the observations from the process are independent and identically distributed (iid) normal. However, the independence assumption is often violated in practice. Autocorrelation may be present in many chemical procedures and may have a significant effect on the properties of the control charts. Traditional SPC charts are therefore inappropriate for monitoring process quality when the data are autocorrelated. In this study, we present methods for process control that deal with autocorrelated data, chiefly a more sophisticated method based on time-series ARIMA models (the Alwan-Roberts method). We apply the typical SPC techniques and the time-series method to toxicity data of water for human consumption from treated water tanks. This application shows the serious effects of autocorrelation when typical SPC control charts are applied to autocorrelated observations.
Index terms -- Statistical process control; Control charts; Autocorrelation; Time series models; Autocorrelated process control
I. INTRODUCTION
The main tools of SPC are control charts. The basic idea of a control chart is to test the hypothesis that only common causes of variability are present against the alternative that special causes are present. In the former case the process is in a state of statistical control; in the latter the process is out of statistical control. Typical control charts (Shewhart, CUSUM, EWMA) are designed and evaluated under the assumption that the observations from the process are independent and identically distributed (iid). However, the independence assumption is often violated in practice. Autocorrelation may be present in many processes and may have a significant effect on the properties of the control charts. When autocorrelation is present, the charts may signal “special causes” that do not exist and fail to detect “special causes” that truly exist, implying a high probability of false positives and/or false negatives. Typical SPC charts are therefore inappropriate for monitoring process quality in such cases. In this study, we present methods for process control that deal with autocorrelated data, chiefly a more sophisticated method based on time-series ARIMA models (the Alwan-Roberts method). Autocorrelated data are common in many analytical systems. A case study on the toxicity of water for human consumption is also presented.
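The false-alarm claim can be illustrated with a small simulation. The sketch below is not the authors' analysis and uses no data from this study: it assumes a stationary AR(1) process with coefficient 0.7, a simulated series of 20000 points, and a fixed random seed, all chosen only for illustration. The function names are hypothetical. It builds individuals-chart limits at three estimated standard deviations, with sigma estimated from the average moving range divided by d2 = 1.128, and compares the fraction of in-control points falling outside the limits for iid and for autocorrelated data.

```python
import numpy as np

D2 = 1.128  # control-chart constant d2 for moving ranges of two observations


def individuals_chart_limits(x):
    """Return (LCL, center, UCL) with sigma estimated as mean moving range / d2."""
    x = np.asarray(x, dtype=float)
    mr_bar = np.mean(np.abs(np.diff(x)))  # average moving range of span 2
    sigma_hat = mr_bar / D2
    center = x.mean()
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def false_alarm_rate(x):
    """Fraction of points outside the 3-sigma limits computed from x itself."""
    x = np.asarray(x, dtype=float)
    lcl, _, ucl = individuals_chart_limits(x)
    return float(np.mean((x < lcl) | (x > ucl)))


def simulate_ar1(n, phi, rng):
    """Stationary AR(1): x[t] = phi * x[t-1] + e[t], with e[t] ~ N(0, 1)."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1.0 - phi**2)  # draw the start from the stationary law
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


rng = np.random.default_rng(0)  # fixed seed (assumption, for repeatability)
n = 20000                       # series length (assumption)
phi = 0.7                       # AR(1) coefficient (assumption)

iid_rate = false_alarm_rate(rng.standard_normal(n))
ar1_rate = false_alarm_rate(simulate_ar1(n, phi, rng))

print(f"iid data:   {iid_rate:.4f} of points outside the limits")
print(f"AR(1) data: {ar1_rate:.4f} of points outside the limits")
```

Neither simulated series contains a special cause, so every point outside the limits is a false alarm. With positive autocorrelation, successive observations are close to each other, the moving range underestimates the process standard deviation, and the limits come out too narrow, so the AR(1) rate should be many times the iid rate.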
All systems and processes exhibit variability. Statistical Process Control (SPC) uses statistical methods to improve the quality of a process. This is achieved by systematically reducing the variability. There are two sources of variability: common causes and special causes. Common causes concern the natural variability that always exists in every process and cannot be avoided. Special causes are not an inherent characteristic of the process and can therefore be identified and eliminated. Statistical Process Control aims to separate the two types of variability. The strategy for eliminating special causes of variation can be summarized as: i) the use of early warning indicators, ii) the search for the cause of trouble whenever there is an indication that a special cause has occurred (what was different on that occasion), and iii) the elimination of the particular temporary or local problem.
∗ Water Quality Control Division, Water Supply & Sewerage Corporation of Athens (E.YD.A.P.). E-mail: [email protected]
II. SPC CONTROL CHARTS FOR VARIABLES
The most important types of control charts for variables (i.e. quality characteristics that are measurable and can be expressed on a numerical scale) are Shewhart control charts, Exponentially Weighted Moving Average (EWMA) charts and CUSUM charts. All types of control charts share some common features. The center line corresponds to average performance, whereas the control limits (the other two lines) correspond to the expected range of variation of the process. If all the points plot between the two control limits and do not exhibit any identifiable pattern, the process is said to be in statistical control.
The available data are daily samples of size n = 1. Therefore, the three types of control charts mentioned above are applied to individual observations.
The first and simplest type of control chart is the Shewhart chart. The X control chart for individuals (I chart) is used to control the process mean. In this chart, the measurements are plotted in time order. The center line is set at the process mean or its estimate, and the control limits are usually placed three standard deviations on either side of the mean. In practice the standard deviation of the process is often unknown and is estimated from the moving range as $\overline{MR}/d_2$ or from the sample standard deviation of the data as $s/c_4$, where $d_2$ can be read from special tables for n = 2 (it is approximately 1.128), and $c_4$ is a constant that depends on the number n of individual observations and approaches 1 as n increases. These charts are appropriate for fast detection of large shifts in the process, but they are insensitive to small shifts. To increase the effectiveness of Shewhart charts, criteria have been recommended that indicate that a process shift or trend has begun. These criteria are known as run rules and are based on runs of consecutive points increasing or decreasing, or oscillating above and below the center line.
Exponentially Weighted Moving Average (EWMA) charts and CUSUM charts are better than Shewhart charts at detecting small changes in the process mean. The EWMA statistic is defined as $z_i = \lambda x_i + (1-\lambda) z_{i-1}$ and is a weighted average of all past and current observations with weights that decrease geometrically. The constant λ can take values in the interval 0