Data mining and knowledge discovery for process monitoring and ...

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING Int. J. Adapt. Control Signal Process. 2006; 20:353–355 Published online in Wiley InterScience (www.interscience.wiley.com)

BOOK REVIEW

DATA MINING AND KNOWLEDGE DISCOVERY FOR PROCESS MONITORING AND CONTROL, by X.Z. Wang, Advances in Industrial Control, Springer, London, 1999, pp. 1–251, ISBN 1-85233-137-2

Effective plant operation and supervision require collecting and collating a large number of process variables at high sampling frequencies, which produces large amounts of quantitative information. These data are vital for automatic control, process monitoring and operational assessment of the plant, but their sheer volume can mask the underlying trends or important components needed to carry out these tasks correctly. In addition, plant operators may be unable to extract fundamental information from such vast datasets.

Data reduction techniques have been proposed to overcome this problem. Several approaches have been formulated to extract the most important information from large databases of correlated or uncorrelated measurements. Statistically based dimension reduction methods such as principal component analysis (PCA) and partial least squares (PLS) are widely used to extract the most pertinent components of the data based on its covariance structure. Wavelet analysis is an expanding discipline developed through signal and image analysis applications, but increasingly applied for data compression and fault detection in the process industries. Fundamentally, wavelet analysis builds on the idea behind the Fourier transform, the decomposition of a function in time into a sum of frequency components, but uses basis functions that are localized in both time and scale. A function can therefore be represented by a family of functions, the wavelets, which may be used to extract features from the original dataset. Artificial intelligence [1] and pattern recognition [2] techniques are also increasingly used in the process industries.
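As a brief illustration of dimension reduction based on covariance structure, the following minimal sketch computes principal components from the eigendecomposition of the sample covariance matrix. It is not taken from the book under review; the function name `pca_reduce`, the synthetic data and the choice of two retained components are assumptions made purely for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the leading principal components.

    X is an (n_samples, n_variables) array. Returns the scores and the
    fraction of total variance explained by each retained component.
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)

    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    scores = X_centered @ eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, explained


# Hypothetical data: six correlated "process variables" driven by two
# underlying factors plus a small amount of measurement noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 6))

scores, explained = pca_reduce(X, n_components=2)
print(scores.shape, explained)
```

Because the six synthetic variables are driven by only two factors, two components capture almost all of the variance, which is the sense in which correlated measurements can be compressed.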

Copyright # 2006 John Wiley & Sons, Ltd.

A number of books cover these separate areas individually and in a general sense, since most of these techniques are applicable to a wide range of disciplines including medicine, chemometrics, economics and neurology. A thorough overview of several approaches to fault handling in industrial processes, focusing mainly on a comparison of PCA, PLS and other multivariate analyses, with examples based on the Tennessee Eastman problem, is provided by Chiang et al. [3]. The origins of modern pattern recognition techniques, as used in current industrial practice, may be found in Duda and Hart [4], and a good introduction to the problems of supervision and control of industrial plants is provided by Sohlberg [5]. However, Data Mining and Knowledge Discovery for Process Monitoring and Control by X.Z. Wang is an attempt to bring together these separate fields in a single review and analysis, with specific application to modern process industries. These fields are grouped under the banner of data mining and knowledge discovery methods, which neatly encapsulates the application and objective of this book.

The book has been organized as follows. Chapter 1 provides an introduction to state-space-based process monitoring and control systems, with emphasis on the problems associated with the large volumes of data acquired by the distributed control system (DCS). A brief description of current approaches to process monitoring and control, traditional statistical monitoring and characteristics of the process data is given. Although data mining (DM) and knowledge discovery in databases (KDD) are not generically defined for all fields of application, Chapter 2 presents an overview of these terms in the context of data analysis for process applications. Essentially, DM and KDD relate to extracting potentially useful patterns or information from data.


The remainder of the chapter is dedicated to providing examples of data mining techniques, such as clustering and classification, and a summary of method selection criteria.

Chapter 3 discusses the requirement for data pre-processing prior to feature extraction, dimension reduction and concept formation. Both PCA and wavelets are introduced as pre-processing as well as data mining methods, and mathematical descriptions of both are given, along with clear examples of their role in pre-processing as a precursor to data mining.

A main benefit of PCA and PLS is their ability to handle multivariate systems with large numbers of correlated variables. Chapter 4 discusses the role of both PCA and PLS in multivariate statistical analysis of data, describing, for example, their advantages over traditional univariate quality control charts. A brief discussion of advanced forms of these tools, such as multiblock PCA/PLS for batch process monitoring and nonlinear PCA, is provided. Finally, an industrial case study of a fluid catalytic cracking (FCC) process is used to highlight the application of PCA to knowledge discovery, using historical product data in an attempt to provide operational strategy guidance for improving product quality.

The concept of machine learning is described in Chapters 5 and 6. Chapter 5 looks at supervised learning for operational support. Focusing on the neural network approach, a concise description of both feed-forward and fuzzy neural networks is given. Examples of fault diagnosis and operational state identification highlight the clear benefits of neural network models in providing condition monitoring that can be used seamlessly with decision-making tools. Case studies are provided for the FCC example, a continuous stirred tank reactor (CSTR) and a wastewater treatment plant. The latter is only a simple study of what is commonly recognized as a complex, highly dynamic, nonlinear process.
Supervised learning is dependent on a posteriori knowledge of certain elements of the data, i.e. the normal operating condition. This information is used in the training phase of the learning method, after which test data are applied and deductions made on the similarity between the test data set and the training model. If no training data are available, i.e. measurements are unavailable or patterns cannot be classified, then unsupervised learning methods can be applied, which is the focus of Chapter 6. The techniques described, adaptive resonance theory (ART) and ARTnet, rely on distance measures of similarity, which are then compared to some threshold value. A third technique, known as Bayesian automatic classification, does not require a threshold value and is therefore considered a truly autonomous clustering procedure. Again, the learning algorithms are succinctly presented and an example using the FCC case study is provided.

In Chapter 7, the problem of identifying the causal explanation for the assignment of data to clusters is addressed. The methods described in earlier chapters focus on assigning data to clusters but cannot indicate the reason for the assignment, which makes effective process monitoring and control difficult. This is analogous to detecting a process fault without knowing the nature or location of the fault, thus restricting the scope for fault remediation via control. Inductive learning for conceptual clustering combines multivariate statistical techniques with clustering, classification and decision trees to achieve operational state identification for real-time monitoring. An example applying this approach to a refinery methyl tertiary butyl ether (MTBE) process clarifies the theory.

Chapter 8 looks at expert systems (ES) as an alternative to, for example, neural networks. ES are used for rule generation via automatic knowledge extraction. Descriptions of rule generation using fuzzy set operation, neural networks and the rough set method are provided.

Chapter 9 concludes the book with a discussion of inferential models as software sensors.
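As a rough illustration of what an inferential model used as a software sensor might look like, the sketch below fits a model from historical operating data and then uses it to estimate a variable that is hard to measure online. This is not the method used in the book (the FCC case study relies on neural networks and automatic knowledge extraction); ordinary least squares, the synthetic data and the names `fit_soft_sensor` and `predict_soft_sensor` are all assumptions chosen to keep the example short.

```python
import numpy as np


def fit_soft_sensor(X, y):
    """Fit y ~ X @ w + b by least squares; returns [w..., b]."""
    A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_soft_sensor(coef, X):
    """Estimate the inferred variable from easily measured variables."""
    A = np.column_stack([X, np.ones(len(X))])
    return A @ coef


# Hypothetical historical data: three routinely measured variables and an
# infrequently sampled laboratory quality measurement.
rng = np.random.default_rng(1)
X_hist = rng.normal(size=(200, 3))
true_w = np.array([0.8, -0.5, 0.3])
y_lab = X_hist @ true_w + 2.0 + 0.05 * rng.normal(size=200)

coef = fit_soft_sensor(X_hist, y_lab)

# Online use: estimate the quality variable from new measurements only.
X_new = rng.normal(size=(5, 3))
y_est = predict_soft_sensor(coef, X_new)
print(coef, y_est)
```

A linear model is the simplest possible choice here; for a strongly nonlinear process, a nonlinear regressor would take its place while the fit-then-estimate structure stays the same.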
The basic concept is that in many complex chemical or biochemical processes, the information needed for fundamental modelling is unavailable or difficult to obtain, so that an accurate model relating measured variables to controlled (inferred) variables is infeasible. The chapter looks at using inferential models, derived from operational data, as software sensors and, subsequently, for process monitoring and control applications such as fault diagnosis. The FCC case study is used as the example for continuous process improvement using a mixture of neural networks and automatic knowledge extraction techniques.

The book is of interest to process and control engineers, as well as plant operators and supervisors, as a reference to the wide range of available techniques that can be applied to large data sets for process monitoring and control. However, it must be noted that the book is an overview of these techniques and also discusses ideas not wholly accepted or applied in practice in modern industry at present. Hence, the book may be used as a reference for further reading on the particular topics discussed or as a basis for further research prior to process implementation. The main contribution is the confluence of many ideas and applications detailed elsewhere into a concise and welcome treatise on their practical use in the process industries. The use of clear examples throughout the book highlights the real benefits that may be achieved with data mining and knowledge discovery.

Finally, it would be interesting to take the concepts to their logical extreme by relating the outputs from the various conceptual and inferential models to actual automatic control theory, such that some interaction between the data mining tools and the plant controller is made, possibly via the supervisory control system.

MATTHEW J. WADE
Environmental Institute, ttz Bremerhaven, Umweltinstitut, An der Karlstadt 6-27568, Bremerhaven, Germany
(DOI: 10.1002/acs.869)

REFERENCES

1. Crowe ER, Vassiliadis CA. Artificial intelligence: starting to realise its practical promise. Chemical Engineering Progress 1995; 91(1):22–31.
2. Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds). Advances in Knowledge Discovery and Data Mining. AAAI Press/MIT Press: Cambridge, MA, 1996.
3. Chiang LH, Russell EL, Braatz RD. Fault Detection and Diagnosis in Industrial Systems. Springer: Berlin, 2001.
4. Duda RO, Hart PE. Pattern Classification and Scene Analysis. Wiley: New York, 1973.
5. Sohlberg B. Supervision and Control for Industrial Processes. Springer: Berlin, 1998.
