DATA-DRIVEN SURGE MAP MODELING FOR CENTRIFUGAL AIR COMPRESSORS

by

Angela A. Sodemann

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

Master of Science in Engineering

at The University of Wisconsin-Milwaukee

August 2006


ABSTRACT

DATA-DRIVEN SURGE MAP MODELING FOR CENTRIFUGAL AIR COMPRESSORS

by
Angela A. Sodemann

The University of Wisconsin-Milwaukee, 2006
Under the Supervision of Dr. Yaoyu Li

Surge is a phenomenon consisting of large pressure and airflow oscillations that can cause severe damage to centrifugal air compressors. Surge can be avoided by operating the compressor in the non-surge region below the surge line on a compressor surge map. In order to avoid surge while also operating the compressor in the highly efficient region near the surge line, it is necessary to know the location of the surge line accurately. In this thesis, the location of the surge line is determined by creating a surge map that incorporates all relevant variables, including ambient air conditions. The surge maps are obtained using data-driven modeling methods, with data from surge tests performed on a compressor testbed at Toyota Motors Manufacturing in Kentucky. Principal components analysis (PCA) is utilized to reduce the data dimensionality for surge mapping. It is found that the relative humidity, the outlet temperature at stage 1, and the inlet guide vane (IGV) opening percentage are the most useful variables for surge modeling. Multilayer perceptron (MLP) and support vector machine (SVM) methods are implemented for surge map modeling. The method of asymmetric support vector machine (ASVM) is then developed to reduce the possibility of damage resulting from an incorrect surge prediction. The obtained ASVM surge map model is then verified on the compressor testbed.


TABLE OF CONTENTS

List of Figures

List of Tables

Acknowledgements

Chapter 1  Introduction
    1.1  Background
    1.2  Problem Statement
    1.3  Research Approach
    1.4  Organization of the Thesis

Chapter 2  Literature Review
    2.1  Surge Literature Review
    2.2  Review of Data-Driven Modeling Methods
        2.2.1  Multilayer Perceptron
        2.2.2  Principal Components Analysis
        2.2.3  Support Vector Machine
    2.3  Review of Data-Driven Modeling for Turbomachinery

Chapter 3  Multilayer Perceptron
    3.1  Introduction to Pattern Classification
    3.2  Multilayer Perceptron

Chapter 4  Principal Components Analysis
    4.1  Introduction to Dimensionality Reduction
    4.2  Principal Components Analysis
        4.2.1  Introduction to Principal Components Analysis
        4.2.2  Data Scaling
        4.2.3  PCA Algorithm
        4.2.4  Summary of PCA Algorithm

Chapter 5  Support Vector Machine
    5.1  Introduction to Support Vector Machine
    5.2  Linearly Separable SVM
        5.2.1  Data Scaling
        5.2.2  SVM Algorithm
    5.3  Nonlinearly Separable SVM

Chapter 6  Asymmetric Support Vector Machine
    6.1  ASVM Motivation
    6.2  ASVM Algorithm
    6.3  Numeric Example for Illustrating ASVM

Chapter 7  Results and Discussion
    7.1  Outline
    7.2  Surge Map Modeling with Historical Data
        7.2.1  Data Collection
        7.2.2  Multilayer Perceptron
        7.2.3  Principal Components Analysis
        7.2.4  Support Vector Machine
    7.3  Surge Map Modeling with Surge Test Data
        7.3.1  Data Collection
        7.3.2  Data Analysis
        7.3.3  Principal Components Analysis
        7.3.4  Support Vector Machine
        7.3.5  Asymmetric Support Vector Machine
        7.3.6  Model Validation

Chapter 8  Conclusions
    8.1  Thesis Research Overview
    8.2  Research Contributions
    8.3  Future Work
    8.4  Concluding Remarks

References

Appendix A  Compressor Operation Data
Appendix B  Multilayer Perceptron Code
Appendix C  Principal Components Analysis Code
Appendix D  Support Vector Machine Code
Appendix E  Surge Map Verification Plots

LIST OF FIGURES

Figure 1.1  Turbocharger Compressor Flow Maps for 3000GT and Stealth Owners
Figure 1.2  Map showing the relationship between the surge limit line and the control line
Figure 1.3  Surge and not-surge data collected from surge tests plotted as airflow vs. pressure ratio
Figure 1.4  Pattern classification hypersurface separating surge and not-surge points
Figure 1.5  Two possible cases of misclassification
Figure 2.1  Speed lines indicating the operating conditions for a centrifugal air compressor at various impeller speeds
Figure 2.2  Actual surge line and surge control line plotted on top of centrifugal compressor speed lines
Figure 2.3  Illustration of neuron structure
Figure 3.1  Structure of a perceptron
Figure 3.2  Activation functions
Figure 3.3  Structure of a multilayer perceptron
Figure 5.1  Comparison of maps obtained using MLP and SVM
Figure 5.2  Example of a set of linearly separable data points
Figure 5.3  Example of a set of nonlinearly separable data points
Figure 5.4  Relationship between support vectors and separating line
Figure 5.5  Illustration of the method of nonlinear support vector machine
Figure 6.1  Uncertain area in the separable pattern classification problem for surge map modeling
Figure 6.2  Illustration of a surge point predicted to not surge
Figure 6.3  Illustration of a not-surge point predicted to surge
Figure 6.4  Numeric example for illustrating ASVM-based surge map modeling
Figure 6.5  3-D ASVM test with d = -10
Figure 6.6  Another view of Figure 6.5
Figure 6.7  Smallest distance from the separating hypersurface under different magnitudes of desired output in the 3-D ASVM simulation
Figure 7.1  Historical surge data with motor current vs. IGV opening
Figure 7.2  Structure of the multilayer perceptron used in this study
Figure 7.3  Two-dimensional multilayer perceptron surge model #1 for historical surge data
Figure 7.4  Two-dimensional multilayer perceptron surge model #2 for historical surge data
Figure 7.5  Historical surge data plotted along the two largest eigenvectors obtained from PCA processing
Figure 7.6  Historical surge data plotted along the three largest eigenvectors obtained from PCA processing
Figure 7.7  Another view of Figure 7.6
Figure 7.8  SVM separation surface for p = 2
Figure 7.9  Another view of Figure 7.8
Figure 7.10  Comparison of MLP, SVM, and “current limit low” lines for two-dimensional surge data
Figure 7.11  Plot of IGV and BOV positions during a surge test
Figure 7.12  Plot of pressure and temperature measurements during a surge test
Figure 7.13  Second surge test performed on January 18, 2006
Figure 7.14  First surge test performed on May 24, 2006
Figure A.1  Diagram of sensor locations within the TMMK testbed compressor
Figure E.1  First surge test performed on January 18, 2006
Figure E.2  First surge test performed on January 13, 2006
Figure E.3  Second surge test performed on January 13, 2006
Figure E.4  Third surge test performed on January 13, 2006
Figure E.5  Second surge test performed on May 24, 2006

LIST OF TABLES

Table 5.1  Typical kernels that may be used for the support vector machine method
Table 6.1  Distances of surge and not-surge points from the separating hypersurface for different values of not-surge desired output
Table 7.1  List of 22 variables recorded/calculated and four sample data points
Table 7.2  Surge and not-surge operating conditions identified for the first surge test performed on September 15, 2005
Table 7.3  Eigenvalues and the eigenvectors corresponding to large eigenvalues for the covariance matrix of the surge data
Table 7.4  SVM output for all training and testing data
Table 7.5  SVM output for all training and testing data, using a different subset of data for training
Table 7.6  Comparison of training and testing point outputs for the SVM model and MLP model #1
Table 7.7  Comparison of training and testing point outputs for the SVM model and MLP model #2
Table 7.8  Comparison of training and testing point outputs for the SVM model and MLP model #3
Table 7.9  ASVM surge map modeling results for all training and testing points
Table A.1  One set of historical data collected during a surge event
Table A.2  Motor current and IGV opening data for the surge and not-surge points identified from historical data, before scaling
Table A.3  Motor current and IGV opening data for the surge and not-surge points identified from historical data, after scaling
Table A.4  Covariance matrix calculated from the scaled historical data

ACKNOWLEDGEMENTS

The author would like to thank the administration, engineers, and co-ops at Toyota Motors Manufacturing in Kentucky and Toyota Motors Manufacturing North America, including Bruce Bremmer, Mark Rucker, Rick Lancaster, Jeff Carnagie, Neal Schimmels, and UC co-ops Andy Inman, Richard Ferguson, and Brian Stebbins, for their work and support in this research. Thanks are also due to the administration and researchers of the Center for Intelligent Maintenance Systems, including Dr. Jay Lee, Daniel Guido, and Dr. Hai Qiu. Assistance and support were also received from Dr. Fred Discenzo and Dr. Peter Schmidt of Rockwell Automation, Inc. and from Jeff Buterbaugh of National Instruments Corp. Rockwell Automation, Inc. also supported this research by lending a PLC for use in testing in the UWM mechatronics laboratory, and National Instruments Corp. assisted by lending a PXI box for data collection at the test site. Thanks are heartily given to all of these individuals, without whom this research would not have been possible.


Chapter 1
Introduction

Chapter 1 will present an introduction to this thesis research. This chapter begins with background information on the research theme in the first section, followed by a formal problem statement in the next section. Section 1.3 will introduce the methods to be used in the research. Finally, Section 1.4 will give an outline of the thesis.

1.1 Background

Compressed air is an important utility source in industry [1]. For manufacturing plants, compressed air is used to drive various pneumatic tools in the production lines. This energy source is widely used because it is clean, easily transported, easily produced, and readily available. The system for producing compressed air is known as a “compressed air system.”

A compressed air system contains many components. The crucial component is the compressor, which is responsible for increasing the air pressure. There are two basic types of air compressors: positive displacement compressors and dynamic compressors. The two typical dynamic compressors are axial compressors and centrifugal compressors. Axial compressors compress air in the direction parallel to the rotational axis, while centrifugal compressors compress air in the direction perpendicular to the rotational axis [2].

Among the different types of compressors, the centrifugal compressor is the most widely used because of its easy adaptability to high performance demands through the use of multiple “stages.” The centrifugal compressor makes use of a rotary assembly of blades known as an “impeller,” which increases the velocity of the air, thus increasing its kinetic energy. The high-speed airflow is then decelerated when passing through a diffuser, where the kinetic energy is converted into potential energy in the form of higher pressure. This combination of an impeller and diffuser is referred to as a stage. Multi-stage operation is often necessary to accelerate the air to sufficient speed when a high compression ratio is desired [3].

The process of air compression is a polytropic process in which Pv^n is constant, so the air temperature rises significantly at the outlet of each stage. This high temperature is detrimental to the components of the following stages. Also, the moisture contained in the hot air is undesirable for the end use. For this reason, each stage is followed by an aftercooler to lower the temperature and a dryer to remove the moisture. Because dirt and oil contained in the air may damage the compressor by becoming deposited on the impeller or compressor piping, filters are also necessary.

Centrifugal compressors are vulnerable to an unstable operating condition known as “surge.” Surge is a phenomenon that occurs in air compressors when the compressor loses the ability to produce sufficient pressure at the diffuser to balance the pressure already existing in the discharge piping of the compression system [4]. This usually occurs when the compressor is operated at a low airflow rate and high pressure. When this occurs, the pressure in the discharge piping can induce reverse airflow through the previous stages, causing instability. This instability is characterized primarily by pressure and airflow fluctuations. In the most destructive form of surge, known as “deep surge,” the airflow and pressure fluctuations are large enough to cause significant noise and vibrations, and consequently severely damage the compressor [2]. Therefore, surge is an operating condition that must be avoided.

In order to prevent surge occurrence, the general solution is to avoid operation in the surge region. This surge avoidance requires a map of surge operating conditions, known as a “surge map.” The limit of surge operating conditions is plotted on the surge map as a “surge limit line.” A surge map is currently found by operating the compressor at a low airflow rate and measuring the pressure oscillations [3]. From this data, surge conditions are located on a map incorporating some combination of variables. Most common is to map surge with pressure versus airflow rate, while other variables such as motor current, impeller speed, and polytropic head are also used. An example of a surge map provided by a compressor manufacturer is shown in Figure 1.1. Operating the compressor in the region to the right of the surge limit line will not result in surge; operating in the region to the left of the line will result in surge.

Figure 1.1: Turbocharger Compressor Flow Maps for 3000GT and Stealth Owners [5]

In addition to the surge limit line, a control line is set a distance away from the surge line. This control line must be set as much as 15-20% away from the surge line when a single-variable surge line is used [6], as illustrated in Figure 1.2. In order to avoid surge reliably, the compressor is operated below the control line [7].

Figure 1.2: Map showing the relationship between the surge limit line and the control line

Previous research has shown that surge conditions are affected by a number of complex and often interrelated factors, such as the gradient of the compressor pressure rise characteristic, impeller blade backsweep and swirl, pressure downstream of the compressor, and other factors [8, 9]. In order to obtain an accurate surge map, the map must account for all the relevant factors. Factors such as ambient temperature, pressure, and humidity change the characteristics of the air and, as a result, change the dynamic behavior of the compressor related to surge. If a surge map is to be accurate, it must take into account the ambient air conditions as well as all other variables that affect surge.

Current industry practice does not account for many of these variables. Typical practice is to use a surge map that incorporates only one or two variables and does not account for ambient air conditions. Only a single surge map is used in practice, whether the air is hot and humid (easy to surge) or cold and dry (difficult to surge).

An accurate surge map is desirable for efficient compressor operation. If the surge line is not accurately known, the compressor must be operated in a region far from the surge line in order to be certain of preventing surge. In this region, the compressor produces a large amount of airflow, and the higher the airflow produced, the lower the efficiency of the compressor. A compressor operated far from the surge boundary therefore runs very inefficiently. To operate the compressor most efficiently, it is most desirable to operate as close as possible to surge conditions without actually entering surge. The necessity of operating far from the surge limit also reduces the dynamic range of compressor operation. Therefore, obtaining surge maps that incorporate ambient air conditions is necessary for the efficient and safe operation of air compressors.

1.2 Problem Statement

In order to safely and efficiently operate an air compressor, the compressor must be operated very close to the surge limit without entering surge. To achieve this, it is necessary to obtain surge maps that are accurate and reliable. An accurate and reliable surge map must incorporate all variables relevant to surge, including ambient air conditions. Since many variables need to be taken into account for surge map generation, the data obtained will be multi-dimensional. Data-driven modeling approaches are effective for processing and modeling multi-dimensional data. The objective of this investigation is to obtain surge maps based on surge test data using data-driven modeling approaches.

To obtain the data for surge modeling, a compressor was operated while collecting data from sensors installed throughout the compressor. The inlet guide vane (IGV) of the compressor was slowly closed, thus reducing airflow, until surge was detected. The data before and after a surge occurrence can thus be obtained as multi-dimensional data. A particular operating condition of the compressor can be thought of as a point in a multi-dimensional space. These dimensions incorporate all variables that are measured in the compressor: pressure and temperature variables across each compressor stage, valve opening, motor current, ambient air conditions, etc. The operating condition points at which surge occurs are called “surge points”; other data points are called “not-surge points.” An example of surge and not-surge data points is plotted on a graph of airflow vs. pressure ratio in Figure 1.3.

Figure 1.3: Surge and not-surge data collected from surge tests, plotted as airflow vs. pressure ratio

The surge mapping problem is to find the curve that separates the surge and not-surge points. In multiple dimensions, the points are separated by a hypersurface. To maintain generality, the separating boundary will hereafter be referred to as the “separating hypersurface.” This hypersurface is then the surge line.

The problem of finding a separating hypersurface between two classes of data points is a pattern classification problem [10]. For pattern classification problems, the data being used may have many dimensions. However, the more dimensions incorporated, the more complex the model becomes. A highly complex model would be difficult to implement for practical applications such as a compressed air system, and would possibly lead to numerical difficulty during the modeling/training process. In order to incorporate all relevant dimensions while keeping the problem of manageable complexity, the most relevant variables must be identified and the irrelevant variables eliminated. This is done by a statistical method called principal components analysis (PCA) [11].

Once the relevant variables have been identified, a separating hypersurface can be found between the surge points and not-surge points using a pattern classification method. However, when collecting data for identifying surge points and not-surge points, only a limited number of points will be available. It is not possible to test an infinitely large number of operating conditions to determine whether or not surge occurs at each condition. For this reason, there will be some uncertainty in the separating hypersurface determined from the pattern classification method, and there may be some misclassification when the model is presented with new testing data. This uncertainty arises from the gap that necessarily exists between surge points and not-surge points. Operating conditions that correspond to points in this gap are unknown points. Unknown points falling to one side of the separating hypersurface will be predicted to surge; unknown points falling to the other side will be predicted to not surge, as shown in Figure 1.4.

Figure 1.4: Pattern classification hypersurface separating surge and not-surge points (axes: airflow vs. discharge pressure; regions labeled “points predicted to surge” and “points predicted to not surge”)

Due to the gap between the surge points and the not-surge points, a point in the area of uncertainty might be incorrectly predicted (misclassified). If a point corresponding to a surge operating condition inside the area of uncertainty falls to the right of the surge line, this surge point will be predicted to not surge. If a point corresponding to a not-surge operating condition inside the area of uncertainty falls to the left of the surge line, this not-surge point will be predicted to surge. These two possibilities are shown in Figure 1.5.

Figure 1.5: Two possible cases of misclassification (axes: airflow vs. discharge pressure; one point in the area of uncertainty is predicted to surge but actually does not surge, another is predicted to not surge but actually surges)

If a not-surge point is predicted to be a surge point, the consequence may be a small loss of efficiency.

If a surge point is predicted to not surge, however, the consequence could be far worse: a surge event would occur. Because of the damage that can be caused by a surge event, it is far more dangerous for a surge point to be predicted to not surge than for a not-surge point to be predicted to surge. For this reason, the separating hypersurface found between the classes should be positioned as close as possible to the not-surge points. This would minimize the probability of missing a surge prediction.

1.3 Research Approach

For this research, data for surge map modeling have been obtained by performing surge tests on a centrifugal air compressor at Toyota Motors Manufacturing in Kentucky, Inc. (TMMK). The compressor operating conditions are changed slowly by incrementally closing the inlet guide vane (IGV), the valve at the inlet of the compressor, while compressor internal variables and ambient air conditions are recorded. The operating condition at which surge occurs is identified as a surge point, and the nearest preceding operating condition is identified as a not-surge point.

In order to apply a data-driven modeling method to the data, the dimensionality of the data must be reduced. Principal components analysis (PCA) is applied for this purpose. PCA processes the obtained data points and identifies which variables contribute most to surge by determining the directions of most separation in the multi-dimensional space in which the surge and not-surge points are contained. The variables that contribute most to surge are retained, while the other variables are discarded. This is the process by which dimensionality reduction is achieved.

The data with reduced dimensionality is then used with the methods of multilayer perceptron (MLP) and support vector machine (SVM). The MLP method is a neural network method for locating a separating hypersurface between classes of data. The SVM method locates the hypersurface that optimally separates the data by equalizing the distance between the hypersurface and the classes of data. A modification to the support vector machine, the method of asymmetric support vector machine (ASVM), is then developed to control the distance between the separating hypersurface and the data points. This method is used to move the separating hypersurface close to the not-surge points, in order to reduce the possibility of dangerous misclassification of surge points.
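To make the research approach concrete, the following is a minimal Python sketch of the kind of pipeline just described: scale the recorded variables, reduce dimensionality with PCA, and fit a kernel SVM to separate surge from not-surge points. It is an illustration under assumed synthetic data, not the thesis's implementation (the author's code is listed in Appendices B-D); note also that the thesis uses PCA to select the most relevant original variables, whereas this sketch simply projects onto the leading principal components.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC

    # Hypothetical stand-in for surge-test data: each row is one operating
    # condition (pressures, temperatures, IGV opening, ambient conditions, ...),
    # labeled +1 for surge points and -1 for not-surge points.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 22))
    y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0.0, 1, -1)

    X_scaled = StandardScaler().fit_transform(X)              # zero mean, unit variance
    X_reduced = PCA(n_components=3).fit_transform(X_scaled)   # keep 3 directions

    # Fit a nonlinear SVM to find the separating hypersurface (surge line).
    model = SVC(kernel="poly", degree=2).fit(X_reduced, y)
    print("training accuracy:", model.score(X_reduced, y))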

1.4 Organization of the Thesis

This thesis is composed of eight chapters and five appendices.

Chapter 2 will provide the literature review on compressor surge and the data-driven modeling methods used in this study. The causes of surge and the current surge avoidance measures will be reviewed, followed by a review of multilayer perceptron, principal components analysis, and support vector machine. Chapter 2 will also review the application of data-driven modeling techniques to turbomachinery.

Chapter 3 presents the multilayer perceptron, a simple neural network method for finding the separating hypersurface between two classes of points. This chapter also gives basic information on neural networks and pattern classification in general.

The methodology of principal components analysis will be described in Chapter 4, and its algorithms will also be presented.

Chapter 5 will describe the basic theory of the support vector machine for finding the optimal separating hypersurface between two classes of data. The method will be described for both linearly separable and nonlinearly separable cases.

The modification to the support vector machine, the method of asymmetric support vector machine, will be introduced in Chapter 6. The methodology and a numeric example will be presented.

Chapter 7 will present the results of applying the above methods of data-driven modeling to the problem of surge map modeling for a centrifugal air compressor.

Chapter 8 will give conclusions, summarize the contributions of this research, and outline possible future research.

Data tables and program codes created for the implementation of the methods used in this research are included in the appendices.

Chapter 2
Literature Review

Chapter 2 will provide a review of previous relevant research. The first section of this chapter will review the background and previous research on compressor surge, its causes, and methods of surge avoidance and control. The second section will give a review of the data-driven modeling methods that will be used in this study. Chapter 2 will then conclude with a review of research applying data-driven modeling techniques to turbomachinery.

2.1 Surge Literature Review

Surge is a phenomenon of oscillating pressure and airflow that occurs in a compressor when it is operated at an airflow rate too low to balance the discharge pressure [2, 8]. These pressure and airflow oscillations can damage the compressor through induced vibrations in the impeller.

Where and when surge begins is a matter of some conjecture and much study. Experiments have been conducted to determine where in the compressed air system surge first begins and which variables first indicate the onset of surge [8, 12-15]. The overall result of these studies has been that surge in a centrifugal air compressor depends on a number of complex and often interrelated factors, such as the gradient of the compressor pressure rise characteristic, inducer incidence, impeller blade backsweep, inlet swirl, the number of diffuser and impeller vanes, and other factors.

It is known that surge occurs when the compressor is operating in an area of positive slope on a speed line of the pressure/airflow map for the compressor [16]. The speed line of a pressure/airflow map indicates the pressure/airflow conditions that exist in the compressor when the impeller is spinning at a given speed. When the compressor is operating on a speed line with positive slope, a decrease of airflow causes a decrease of pressure, which in turn causes another decrease in airflow. The cycle repeats, and the compressor enters surge [17]. Since speed lines on a pressure/airflow map are convex, as shown in Figure 2.1, the limit of stability for a compressor at a given speed occurs at the peak of the speed line [3]. By connecting the peak points of many speed lines, a line is obtained that is known as the “surge line,” the limit of stability for compressor operation. The exact location of the surge line cannot currently be determined accurately with analytical or experimental models, due to the large number of variables affecting the occurrence of surge [8].

Figure 2.1: Speed lines indicating the operating conditions for a centrifugal air compressor at various impeller speeds (axes: airflow rate vs. discharge pressure; convex speed lines from lower to higher impeller speed, with the surge line connecting their peaks)

Since the surge limit line cannot be determined using analytical methods, a surge line is usually obtained by surge tests, i.e., operating the compressor at a low airflow rate and monitoring the operating conditions in each stage [3]. Surge can be identified from the pressure oscillations observed. The surge line for the entire compressor is then extrapolated from the pressure and airflow measurements of each stage. The result of this method of surge mapping is not sufficiently accurate [8].

Different types of surge can be characterized by the “depth,” or intensity, of the surge event. The more intense the surge event, the more damage may result. In less intense forms of surge, airflow in the forward direction is slowed or even halted. In the most intense form of surge, known as deep surge, the airflow reverses, and pressure fluctuations can damage or even destroy components of the compressor, such as the impeller bearing [4]. The oscillating airflow also causes the temperature at the inlet to rise quickly, causing further damage [18]. In normal operation of a compressor, surge must be avoided because of the damage that it can cause.

Avoiding surge is a complicated task because of the simultaneous benefits and dangers of operating near the surge line. The location of the surge line cannot be accurately known using current surge mapping methods, and yet the regions near the surge line are the regions where most pressure rise and, therefore, most efficiency is obtained [7]. Therefore, it is both unsafe and highly desirable to operate the compressor in the regions near the surge line.

There are two methods commonly used to compensate for these difficulties. In one method, known as “surge detection and avoidance,” the compressor is operated as near to the surge line as possible. Sensors detect variations in pressure, airflow, temperature, and other variables to detect when surge begins. When surge is detected, an emergency backup valve, known as a “blow-off valve” (BOV), is opened to quickly lower the pressure and halt surge. This method is not widely used alone, due to the difficulty of accurately detecting surge and the combination of required quick response time and large actuation forces [7]. When this method is used, it serves primarily as a back-up measure to prevent more serious damage if surge does occur.

The second method, known as “surge avoidance,” is widely in use today. In this method, a “control line” is placed 10-25% below the surge line to compensate for the uncertainty in the location of the surge line, as shown in Figure 2.2 [7].

Figure 2.2: Actual surge line and surge control line plotted on top of centrifugal compressor speed lines (axes: airflow vs. discharge pressure)

The compressor is operated below this safety margin, ensuring that the compressor will avoid the unstable operating range. Unfortunately, this method also necessitates that the compressor operate in the less efficient regions far from the surge line [7].

Neither of these methods is a desirable solution to the problem of surge avoidance. The first method is perilous, since it depends on the quality of surge detection to prevent severe damage; a single missed detection would result in serious disaster. Because there may be only a short time available to halt surge, there is no time for a back-up measure should the first prevention fail. Both methods are inefficient: surge avoidance is inefficient because the compressor cannot be operated in the efficient regions near the surge line, and surge detection and avoidance is inefficient because of the practice of “blowing off” useful compressed air. Surge avoidance methods are in need of improvement.

One possible improvement, known as “surge control” [19, 20], seeks to enlarge the stable operating range (“push back” the surge line) by altering the physical construction of the compressor [14] or adding actuation, such as an additional valve [16] or an air injector [21]. While this method is successful at increasing the compressor range, the new location of the surge line is as unknown as the old; a control line is still needed to compensate for the uncertain location of the surge line.

Based on this literature review of compressor surge and surge avoidance control, no work has been reported on obtaining surge map models using data-driven modeling approaches. Therefore, there is a potential for improvement by investigating data-driven modeling methods for surge mapping.

2.2 Review of Data-Driven Modeling Methods

Two data-driven modeling methods will be used in this study: multilayer perceptron and support vector machine. A statistical method for dimensionality reduction, principal components analysis, will also be used to reduce the dimensionality of the data for use with the modeling methods.

The two modeling methods, multilayer perceptron and support vector machine, are neural network methods. A neural network is a mathematical structure that consists of many individual elements known as “neurons” linked together. An illustration of a neuron is shown in Figure 2.3.

Figure 2.3: Illustration of neuron structure (input, weight, bias, summation, and output)

Each neuron receives an input, then performs a simple mathematical calculation to produce an output. By linking many individual neurons together, the structure is able to “learn” from input data [11].

Neural networks are commonly used in function approximation, pattern classification, and artificial intelligence applications [11].

2.2.1 Multilayer Perceptron

The multilayer perceptron is a linkage of neurons arranged in a lattice form. Several neurons linked in parallel form a “layer” of neurons, and the multilayer perceptron is a structure that consists of multiple layers of neurons connected in series. By linking several layers of neurons together, the multilayer perceptron is able to learn nonlinear patterns.

The concept of an artificial neuron was first proposed by W. McCulloch and W. Pitts in 1943. The “perceptron,” a linkage of neurons capable of learning, was invented in 1957 by Frank Rosenblatt. Rosenblatt's perceptron consisted of only a single layer of neurons, and in 1969 Marvin Minsky and Seymour Papert demonstrated that this type of perceptron is not capable of solving a nonlinearly separable problem. Because of this limitation, neural networks lost popularity as a method of pattern classification until 1982, when it was discovered by Rumelhart, Hinton, and McClelland (and also independently by others) that a perceptron with multiple layers can solve a nonlinearly separable problem. Rumelhart, Hinton, and McClelland also proposed the backpropagation algorithm as an effective learning method for multilayer perceptrons. In 1987, Hecht-Nielsen showed that multilayer perceptrons with two hidden layers can represent any continuous mapping, and in 1989 Hornik showed that one hidden layer is sufficient for universal approximation [11, 22-25]. With this new and more advanced capability, neural networks and, particularly, the multilayer perceptron, regained interest.

Multilayer perceptrons are useful because of their ability to “learn” a pattern from input data. The output of the final layer of neurons is compared to a “desired output” defined by the class that the input belongs to. The difference between the desired output and the actual output is used to change the calculation performed by each neuron. By repeatedly presenting inputs, comparing the output, and changing the calculation, a multilayer perceptron can be made to “learn” a pattern. Once a pattern has been learned, the multilayer perceptron is able to correctly classify unseen data according to the learned pattern.

The multilayer perceptron is often used for pattern classification, the problem of classifying data elements by class based on data element features. For example, the pattern classification capabilities of a multilayer perceptron were used in cancer diagnosis by dividing cell features into well and not-well categories: cells that are well were used as a basis, and anything that deviated from this was classified as not well [26]. Multilayer perceptrons have also been used in artificial intelligence applications to determine which rules are executable; current situations are compared to situations in memory, and similar rules are determined to be executable rules [27].

Multilayer perceptrons are also often used in speech and language recognition. Some adaptations, such as a fuzzy adaptation of the multilayer perceptron, have been used to allow the network to recognize uncertain data, such as language, with more consistency [28]. The multilayer perceptron has been compared to other pattern classification methods, such as the single-layer perceptron and Bayes classification methods, and found to be superior in terms of misclassification rate [29]. All of these applications make use of the multilayer perceptron's ability to learn a pattern.

The learning of a multilayer perceptron is achieved through a “training” process. During this process, the multilayer perceptron is repeatedly presented with inputs and updated according to the output it produces. The algorithm used for updating in this manner is known as a learning method. Several learning methods have been used for multilayer perceptrons, including Newtonian methods, the least-mean-square-error method, and quasi-Newtonian methods. However, the most efficient (and, hence, most popular) method is the back-propagation learning algorithm [30, 31], which was chosen for this study. It is generally a slow method if the network and data sets are large, and some study has been devoted to increasing its speed [32]. However, if the network and data sets are not large and speed is not a critical issue, e.g., for off-line modeling applications, the back-propagation method is an accurate learning method for the multilayer perceptron.

2.2.2 Principal Components Analysis

Principal components analysis is a statistical method for reducing the dimensionality of data. The dimensionality reduction is achieved by projecting the data onto a different coordinate system [33]. The new coordinate system is chosen by identifying the vectors along which the data exhibits the most separation. The method was developed by Pearson and Hotelling in the early 1900s [11, 34].

Principal components analysis has often been used in pattern classification problems, where a small number of data samples coupled with high dimensionality results in poor or even impossible classification. When this situation is encountered, principal components analysis allows a smaller number of data samples to result in a better classification [35].

Dimensionality reduction, of course, results in a loss of information, and it may be that the loss results in a mapping that is no longer accurate. However, if the dimensions that are kept are the ones with the most information and the dimensions that are eliminated are the ones with the least, then the information loss will be minimal. It has been shown that principal components analysis is successful at identifying the most “useful” dimensions, so that the dimensionality reduction in many cases results in an insignificant loss of information [36]. A minimal sketch of the computation is given below.
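As an illustration of the PCA procedure described above (the author's full implementation appears in Appendix C), the following Python sketch computes the covariance matrix of scaled data and projects the data onto the eigenvectors with the largest eigenvalues. The array names and data are hypothetical.

    import numpy as np

    def pca_reduce(X, k):
        """Project the rows of X onto the k eigenvectors of the
        covariance matrix with the largest eigenvalues."""
        # Scale each variable to zero mean and unit variance.
        Xs = (X - X.mean(axis=0)) / X.std(axis=0)
        # Covariance matrix of the scaled data (variables as columns).
        C = np.cov(Xs, rowvar=False)
        # Eigendecomposition of the symmetric covariance matrix.
        eigvals, eigvecs = np.linalg.eigh(C)
        # Keep the k eigenvectors with the largest eigenvalues.
        top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
        return Xs @ top

    # Hypothetical example: 50 operating points, 22 recorded variables,
    # reduced to 3 dimensions as in the thesis.
    X = np.random.default_rng(1).normal(size=(50, 22))
    print(pca_reduce(X, 3).shape)  # (50, 3)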

2.2.3 Support Vector Machine

The support vector machine is an implementation of the method of structural risk minimization [11]. The method of structural risk minimization acknowledges that the learning problem of a neural network is underdetermined [37]; in other words, there are many possible separating hypersurfaces for any given set of data. Structural risk minimization makes the learning problem determined by specifying that the desirable separating hypersurface is the least complex solution. This is achieved by defining a cost function that includes terms for misclassification errors, then minimizing the cost function. The surface that is found will be optimal in the sense specified by the cost function [11].

The method of support vector machine is used to find the optimal separating hypersurface between classes of data, as a means of pattern classification. The support vector machine is effective for finding a separating hypersurface even with a small training sample and high dimensionality, which is the case in the surge problem [38]. SVM was first invented by Vapnik and Lerner in 1963 [39]. This early algorithm was only able to separate linearly separable data; it was only much later, in 1992, that Vapnik, Boser, and Guyon discovered the kernel method for separating nonlinearly separable data [40].

Support vector machines have been used in a variety of applications, particularly in image and signal processing [41], and have been found to be a superior method of pattern classification for speech recognition tasks [42]. Classical neural network methods, such as the multilayer perceptron, are based on empirical risk minimization (minimizing error on the training dataset), but the support vector machine is based on structural risk minimization from statistical learning theory. As a result, the support vector machine generalizes better to non-training data, and has been used for a multitude of pattern recognition tasks, including face, handwriting, and voice recognition [38].

The support vector machine as a pattern classifier has been compared to other pattern classification methods, such as radial basis function neural networks and the K-nearest neighbor classifier, and was found to be the most effective [43]. It has also been found to be effective for feature extraction even when the data is heavily distorted [44]. To fit specific applications, the support vector machine method is often modified for a particular benefit; modifications such as the fuzzy support vector machine, v-SVM, and the posterior probability support vector machine have been developed [45]. Support vector machines have been used as feature extractors [46], as well as in pattern recognition and regression estimation [47].
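The following is a minimal illustration of the idea, assuming scikit-learn's SVC as a stand-in for the thesis's own SVM code (Appendix D): a second-order polynomial kernel, the kernel family the thesis later uses with p = 2, is fit to a small synthetic two-class set, and the support vectors that define the separating hypersurface are reported.

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical two-class data: class +1 ("surge") inside a circle,
    # class -1 ("not-surge") outside it, so no straight line separates them.
    rng = np.random.default_rng(2)
    X = rng.uniform(-1, 1, size=(60, 2))
    y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 0.4, 1, -1)

    # A polynomial kernel of degree 2 (p = 2) can separate this pattern.
    svm = SVC(kernel="poly", degree=2, C=10.0).fit(X, y)

    # Only the support vectors, the points closest to the separating
    # hypersurface, determine the decision boundary.
    print("support vectors:", len(svm.support_vectors_))
    print("training accuracy:", svm.score(X, y))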

2.3 Review of Data-Driven Modeling for Turbomachinery

The air compressor is a member of a larger class of machinery known as turbomachinery: machinery that involves transferring energy between a fluid and a rotating element [48]. Turbomachinery includes both machines that transfer energy from a rotating element to a fluid, such as compressors, and those that transfer energy from a fluid to a rotating element, such as turbines. These machines are central to operations in oil refineries, power plants, and chemical engineering plants [49]. This section reviews the application of data-driven modeling techniques to various turbomachinery.

Applications of data-driven modeling techniques to turbomachinery fall into two categories: 1) modeling for the design of efficient turbomachinery, and 2) fault diagnosis and prognosis of turbomachinery.

Within the first category, the Bayesian learning method has been used to identify the behavior of turbine engine components for preliminary design: given a set of desired boundary conditions, a Bayesian machine learning method was used to find the optimum machine design specification [50]. Neural network methods have also been used to design turbine diffuser blades that produce a given velocity distribution on their surfaces [51, 52, 53, 54]. A desired velocity distribution is presented as the target, possible designs are compared with the target, and the neural network methods are used to minimize the error between the candidate design and the target design. This approach has been attempted using a backpropagation neural network [51], a numerical scheme based on the limiter theory [53], and a 3D optimization method based on neural networks [54]. A similar method has also been applied to the design of turbine airfoils, using a multilayer feedforward neural network [55].

Within the second category, modeling for fault diagnosis and prognosis, neural network methods have been used to monitor the condition of the machine based on vibration responses in machine components. “Symptoms” of a problem are linked to problem causes; when symptoms are detected, the problem cause is identified by the neural network. With the prognosis capability developed, maintenance on the turbomachinery can take place before serious damage has occurred [56, 57]. The nonlinear modeling capabilities of the neural network have been utilized to identify unbalanced rotating components of turbomachinery, such as the power transmission shaft [58], by monitoring vibration signatures [58, 59]. The support vector machine has recently also been applied to turbomachinery fault prediction: a modification was developed to apply the method to the multi-class problem of fault detection, and the application was found successful due to the support vector machine's capability to generalize well even with a small training data sample [60].

A few applications of data-driven methods to turbomachinery do not fall into these two categories. One such example is a method combining linear models, neural networks, and fuzzy logic to adjust the turbomachinery control system to improve efficiency and lower emissions [61]. A feedforward neural network was used to model an axial piston pump in order to reduce power loss at high pressures; the neural network model was found to accurately predict the pump behavior [62]. In the work reported in [63], a multilayer perceptron was used to model the performance of a diesel engine given variables such as engine speed and throttle position.

Based on this review, no previous work has been done using data-driven modeling methods to model surge in air compressors. Furthermore, while the multilayer perceptron has been extensively applied to turbomachinery for various purposes, neither principal components analysis nor support vector machine has been applied to turbomachinery in any depth.

Chapter 3
Multilayer Perceptron

Chapter 3 is divided into two sections. The first section will give an introduction to pattern classification: the pattern classification problem will be summarized, and methods used to solve pattern classification problems will be described. Section 3.2 will then focus on one pattern classification method, the multilayer perceptron.

3.1 Introduction to Pattern Classification

The problem of pattern classification occurs frequently in many diverse fields of study, such as medicine, atmospheric sciences, economics, and engineering [63]. The pattern recognition and classification problem can be succinctly stated as follows: given data with some elements belonging to one class and other elements belonging to another class, find a model that correctly assigns each data element to the correct class. The model must correctly classify not only the data that was used to obtain the model, but also data to be obtained in the future: data that the model has not yet been exposed to.

This type of problem arises, for example, in medical diagnosis. A particular set of symptoms exhibited by a patient is an element of data. The disease the patient has is the class to which this data belongs. The pattern classification problem in this case is: given a set of symptoms, identify which disease the patient has. A model that classifies these patterns well will be able to correctly identify the disease of a patient not seen before, given the symptoms the patient experiences.

The pattern classification problem for surge mapping consists of two classes of data: the surge class and the not-surge class. The data elements to be classified are obtained from surge tests performed on the compressor. The operating condition at which surge did occur during a test is identified as a surge point; the preceding operating condition (at which surge did not occur) is identified as a not-surge point. The pattern classification problem in this case is: given a new operating condition of the compressor, identify whether this condition is a surge condition or a not-surge condition.

The data is divided into two sets: the training set and the validation set. The training set is used with a pattern classification method to map the data. Once the mapping, or training, is complete, the acquired map is tested using the validation set. If a high percentage of the data points in the validation set are classified correctly, then the map is determined to be accurate. A minimal sketch of such a split is given below.
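As a concrete illustration of the training/validation division just described, here is a minimal Python sketch of an 80/20 split (the ratio recommended by Kearns and used later in this thesis); the arrays and labels are hypothetical placeholders.

    import numpy as np

    # Hypothetical labeled surge data: X holds operating-condition points,
    # y holds the class labels (+1 = surge, -1 = not-surge).
    rng = np.random.default_rng(3)
    X = rng.normal(size=(30, 3))
    y = rng.choice([-1, 1], size=30)

    # Shuffle the points, then use 80% for training and 20% for validation.
    idx = rng.permutation(len(X))
    n_train = int(0.8 * len(X))
    X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
    X_valid, y_valid = X[idx[n_train:]], y[idx[n_train:]]
    print(len(X_train), "training points,", len(X_valid), "validation points")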

There are multiple methods that are often employed for pattern classification. Three commonly used method categories are the Bayes classifier, decision tree methods (which also include such structures as tables and lattices), and neural network methods.

The Bayes classifier is a statistical approach based on outcome probabilities. A future outcome or classification is predicted by evaluating the frequency of occurrence of outcomes and classifications in the past. The resulting map is located such that the probability of misclassification is small. For this method to be successful, it is necessary to have some a priori knowledge of outcome probabilities [10].

In the decision tree method, a classification is made based on the answers to a series of binary questions asked about the unknown to be classified. Each question corresponds to a “node” of the decision tree, and each answer corresponds to a “branch.” When a branch reaches its termination, a decision has been made [63]. The decision tree method requires complex knowledge of the pattern in order to correctly formulate the questions at the nodes and to correlate the answers of the branches with the correct classification.

These two methods, the Bayes classifier and the decision tree, are useful for pattern classification if the data to be classified is linearly separable; neither is capable of separating data that is nonlinearly separable. The third common pattern classification method, neural networks, allows for the pattern classification of data that is nonlinearly separable. As the surge mapping problem is highly nonlinear, neither the Bayes classifier nor the decision tree is sufficient. Therefore, the neural network methods were chosen over the Bayes classifier and the decision tree.

3.2 Multilayer Perceptron

Neural network methods are so called because their structure imitates the structure of a neurological system. Similar to the linkage of neurons that composes an animal's nervous system, the neural network consists of a linkage of simple mathematical units known as “neurons.”

The most common type of neural network structure is the perceptron. A perceptron is a network of one or more neurons linked to each other and to an input, and producing an output. The structure of a perceptron is illustrated in Figure 3.1.

Figure 3.1: Structure of a perceptron (inputs x1, x2, ..., xm weighted by w1, w2, ..., wm, summed with bias b into v, and passed through a hard-limiting activation function φ(v) to produce output y)

The neuron receives an input x and multiplies that input by a weight w. A “bias” b is added to the result, then the result is transformed with a nonlinear “activation function” that limits the output as follows:

y(x) = φ(wx + b)    (3.1)

An activation function is used to limit the neuron's output. Figure 3.2 is a graph of several typical activation functions: hard-limiting, piecewise-linear, and tangent-sigmoid functions.

Figure 3.2: Activation functions (neuron output vs. neuron input for the hard-limiting, piecewise-linear, and tangent-sigmoid functions)
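To make Eq. (3.1) and the activation functions of Figure 3.2 concrete, here is a small Python sketch (an illustration only, not the thesis's Appendix B code) of a single perceptron evaluated with each of the three activation functions; the inputs, weights, and bias are assumed values.

    import numpy as np

    # Three typical activation functions from Figure 3.2.
    def hard_limit(v):
        return np.where(v >= 0.0, 1.0, 0.0)

    def piecewise_linear(v):
        # Linear between the two turning points, saturated outside them.
        return np.clip(v + 0.5, 0.0, 1.0)

    def tangent_sigmoid(v):
        return np.tanh(v)

    # Perceptron output per Eq. (3.1): y = phi(w.x + b).
    def perceptron(x, w, b, phi):
        return phi(np.dot(w, x) + b)

    x = np.array([0.2, -0.7])   # hypothetical inputs
    w = np.array([1.5, 0.8])    # hypothetical weights
    b = 0.1                     # hypothetical bias
    for phi in (hard_limit, piecewise_linear, tangent_sigmoid):
        print(phi.__name__, perceptron(x, w, b, phi))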

The perceptron shown in Figure 3.1 may be expanded by considering the concept of “layers.” A group of neurons linked in parallel is known as a “layer” of neurons, and a perceptron may have one or more layers. The single-layer perceptron is useful for classifying patterns that are linearly separable, that is, patterns consisting of classes that can be divided by a line. A perceptron that consists of more than one layer, a “multilayer perceptron,” is capable of classifying patterns that are more complex and may not be linearly separable. This is because the additional layers act as feature extractors for identifying significant aspects of the data. Because surge data is nonlinearly separable, the multilayer perceptron was chosen for surge mapping.

The multilayer perceptron is a mathematical structure that can be used for pattern classification and function approximation. A multilayer perceptron consists of a layer of neurons receiving the input data, an output layer producing the output, and at least one “hidden layer” of neurons between the input and output layers [11]. See Figure 3.3.

Figure 3.3: Structure of a multilayer perceptron (the input signal enters the input layer, passes through a hidden layer, and exits the output layer as the output signal)

By repeatedly presenting the network with the inputs and a “desired output” d, the network can be “trained” to recognize a pattern or approximate a function. The activation function for a multilayer perceptron must be nonlinear and differentiable everywhere. If the activation function is not sufficiently nonlinear, the network will behave like a single-layer perceptron, unable to classify nonlinearly separable data. The hard-limiting activation function is not sufficient for multilayer perceptrons because it is not continuous at the origin [11]. The piecewise-linear function is continuous everywhere but not differentiable at the two turning points, and it is also not sufficiently nonlinear. The sigmoid activation function is continuous, differentiable, and sufficiently nonlinear. This function can be written as:

y( x) =

1 1 + e − wx +b

(3.2)

34

where x is the neuron input, y is the neuron output, w is the neuron weight, and b is the neuron bias. When training a neural network, the actual output for a data point is compared with the desired output d. The output of the network is calculated as Eq. (3.1), then the error e is calculated as: e=d−y

(3.3)

The weight w at iteration n+1 is given as: w (n + 1) = w (n) + Δw (n)

(3.4)

Δw (n) = ηδ (n) y (n)

(3.5)

and:

where δ is the local gradient and η is the learning rate parameter. An explanation of these parameters follows. After the weights have been updated, the network is presented with the input set again. One complete presentation of an input set is known as an “epoch.” A number of epochs may be required for a network to sufficiently learn a set of data. From Eq. (3.5), it can be seen that the weight update is a function of the learning rate parameterη , the local gradient δ , and the output signal of the neuron y . A high learning rate parameter will result in faster learning, but will prevent an exact calculation. A smaller learning rate parameter will result in slower learning, but will allow for a “higher resolution” to learn the desired output more exactly. The local gradient is a function of the error as calculated in Eq. (3.3), and can be written as:

    δ = e ϕ′        (3.6)

where ϕ represents the activation function for the network. It can be seen from Eq. (3.6) that the local gradient becomes zero as the error becomes zero, and becomes large for large error.

The sigmoid activation function has benefits in allowing for the use of multiple layers, but it can also cause a problem with the weight updates. As can be seen in Eq. (3.1), the output of a neuron, y, is limited by the activation function. As the output y saturates, the derivative ϕ′, and with it the gradient δ, goes to 0. This, in turn, limits the size of the weight update in Eq. (3.5) and slows the rate of learning. This problem can be remedied by using a particular type of backpropagation called "resilient backpropagation." In this method, only the sign of the gradient is used in the weight update, not its magnitude, i.e.:

    Δw = η sgn(δ) y        (3.7)

By using this method, learning speed is increased. This process of error calculation and weight update results in the convergence of the output of the multilayer perceptron to the desired outputs. Once this convergence has been achieved, the network is said to have been "trained." An input with unknown output may then be presented to the network, and a predicted output will be produced. In this way, the network functions as a pattern classification or function approximation tool.

Several parameters in the back-propagation algorithm must be chosen: the learning rate parameter, the number of neurons, and the number of epochs to use in training. The selection of these parameters is dependent on a number of factors, such as the size of the training set, the complexity of the physical problem to be solved, the amount of time allowed for training, and the accuracy that is desired.
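To make the training procedure concrete, the following is a minimal sketch in Matlab (the tool used for the modeling in Chapter 7) of a one-hidden-layer perceptron trained with the error and sign-based update of Eqs. (3.3)-(3.7). The synthetic two-class data, the initialization, and all variable names are illustrative assumptions rather than the thesis code; the tangent-sigmoid of Figure 3.2 is used so that the network output matches the ±1 desired outputs.

    % Minimal one-hidden-layer perceptron with a resilient-style (sign-only)
    % weight update, Eqs. (3.3)-(3.7). Data and names are made up.
    rng(1);                                        % reproducible initialization
    X = [randn(10,2)+1.5; randn(10,2)-1.5];        % made-up two-class data (20 points)
    d = [ones(10,1); -ones(10,1)];                 % desired outputs, +1 and -1
    nh  = 8;                                       % hidden neurons
    W1  = 0.1*randn(nh,2);  b1 = zeros(nh,1);
    W2  = 0.1*randn(1,nh);  b2 = 0;
    eta = 0.05;                                    % learning rate parameter
    for epoch = 1:500                              % one epoch = one full presentation
        for i = 1:size(X,1)
            x  = X(i,:)';
            h  = tanh(W1*x + b1);                  % hidden-layer outputs
            y  = tanh(W2*h + b2);                  % network output
            e  = d(i) - y;                         % error, Eq. (3.3)
            g2 = e*(1 - y^2);                      % local gradient at the output, Eq. (3.6)
            g1 = (W2'*g2).*(1 - h.^2);             % gradient back-propagated to hidden layer
            W2 = W2 + eta*sign(g2*h');  b2 = b2 + eta*sign(g2);   % sign-only update, Eq. (3.7)
            W1 = W1 + eta*sign(g1*x');  b1 = b1 + eta*sign(g1);
        end
    end
    yhat = sign(tanh(W2*tanh(W1*X' + b1) + b2))';  % classify the training set
    fprintf('training accuracy: %.0f%%\n', 100*mean(yhat == d));

Running the loop for 500 epochs with η = 0.05 mirrors the parameter choices reported for the historical-data modeling in Chapter 7.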


The learning rate parameter determines the speed and accuracy of convergence of the weights to a solution. From Eq. (3.5), it can be seen that a large value for the learning rate parameter will result in large updates to the weights. This will cause faster convergence, but will result in less accuracy and, if the parameter is large enough, may even cause unstable oscillation of the weights. A smaller learning rate parameter will result in small weight updates. This will give higher accuracy of results, but will slow the training process. For these reasons, a smaller learning rate parameter is desired until the training time becomes too long for the practical application.

The number of neurons to use depends primarily on the complexity of the problem to be solved and the desired accuracy. The optimal number of hidden neurons is considered to be the smallest number of hidden neurons that classifies the data well enough [11]. Determination of this value is more or less a trial-and-error process, although a larger number of neurons will result in a larger number of weights and, thus, higher complexity of the decision surface.

The number of epochs to be used depends primarily on the choice of the learning rate parameter. If the learning rate parameter is small, then convergence will be slow and more epochs will be needed to locate a solution. If the learning rate parameter is large, then convergence will be faster and fewer epochs can be used.

Another important consideration for the multilayer perceptron is what size of training set is needed to obtain good results. In general, this can be estimated from the total number of free parameters W and the allowable inaccuracy ε as in:

    N = O(W / ε)        (3.8)

Eq. (3.8) indicates that if, for example, ten percent inaccuracy is allowed, then there should be ten times more training samples than free parameters. Once parameters and data size have been chosen, it is desirable to test the network with data samples that the network has not seen before. This is known as "validation." For validation, the data set is divided into one set for training the network and another set for testing it. It was determined by Kearns that an optimal division of the data set is to use 80% of the available data for training and 20% for validation [11]. The validation data set is presented to the network to determine what percentage of unseen data elements are correctly classified.

The multilayer perceptron is sometimes able to produce adequate results for the pattern classification problem, but its results are not consistent. It is able to find a separating hypersurface, but many such hypersurfaces exist, and the multilayer perceptron does not locate the optimal one; it simply locates some separating hypersurface. Some hypersurfaces located with the perceptron achieve good separation; others do not. To obtain better and more consistent results, we would like to find not merely some separating hypersurface, but the best separating hypersurface.


Chapter 4
Principal Components Analysis

Chapter 4 will establish the dimensionality reduction methods used in this research. Section 4.1 will present the motivation and concepts behind dimensionality reduction, while Section 4.2 will focus on one particular method of dimensionality reduction, principal components analysis. Section 4.2 is divided into four sub-sections: an introduction to principal components analysis, the data scaling method that was used, the principal components analysis algorithm, and finally a summary of the method.

4.1 Introduction to Dimensionality Reduction

The data to be classified in a pattern classification problem may be in multiple dimensions. In the surge mapping classification problem, the data points to be classified span one dimension for each process variable that is monitored. The process variables monitored in this research include pressure and temperature at the outlet of each stage, ambient temperature and humidity, airflow, motor current, and IGV opening percentage. It is not known in advance which process variables are relevant to surge and which are irrelevant. Some variables may have little or no effect on the onset of surge, and some variables may vary with, or be dependent on, other variables that contribute more to the onset of surge.

No variable can be eliminated before its relevance is identified. At the same time, it is neither necessary nor practical to include every possible process variable in the surge map model; such practice results in a high model dimensionality.

High model dimensionality is undesirable for several reasons. The first concern is known as the "curse of dimensionality" [11]. This concept relates to the idea that in order to obtain good classification for data in a high-dimensional space, the data points available for training must be dense. The higher the dimensionality of the space, however, the less dense the data points become. To achieve good classification in a high-dimensional space, the number of data points therefore needs to be large, resulting in higher experimental cost. Secondly, it is not economical to implement a high-dimensional solution. For each dimension in the mapping, a sensor is needed to record the corresponding variable, so the more dimensions in the mapping, the more sensors are needed to implement the model. It is always desirable to use a minimal number of sensors, i.e. the minimal number of dimensions required for modeling.

In order to reduce costs and increase the classification accuracy of the model, the data used for classification should contain the smallest number of dimensions required for good classification. To reduce the number of dimensions, it is necessary to evaluate the consequences of eliminating certain dimensions: some dimensions cannot be removed without significant loss of information, while others can. It is therefore desirable to identify the significant variables and retain only the corresponding dimensions in the modeling process; once identified, the dimensions corresponding to insignificant variables can be eliminated.


4.2 Principal Components Analysis

4.2.1 Introduction to Principal Components Analysis

Principal components analysis (PCA) is a method that is often used to identify the dimensions of the training data that provide the most separation of the data [11]. These can be considered the "most important" dimensions, and they in turn identify the most relevant variables. Once the most relevant variables are known, only the dimensions corresponding to these variables are used in the modeling; other dimensions are eliminated. This results in dimensionality reduction of the data space.

The "most important" variables are identified by finding the composite directions of most separation for the data in the original data space. When the data points are projected onto these composite dimensions, they exhibit more significant separation in the transformed space. Once the composite directions of most separation are identified, their components are examined, because each component corresponds to a different original variable. A large component indicates that its corresponding variable is significant, while a small or zero component indicates that its corresponding variable is insignificant. A composite direction may contain nonzero elements corresponding to many different variables. For example, the composite direction of most separation may contain nonzero components corresponding to the first and fifth variables, but zero components corresponding to the second, third, and fourth variables. In this example, the first and fifth variables would be determined to be significant, while the second, third, and fourth variables would be determined to be insignificant.


Once the significant and insignificant variables are identified, the significant variables are used in the modeling and the insignificant variables are eliminated. These significant variables are the new dimensions used in modeling.

4.2.2 Data Scaling

Prior to processing with PCA, the data must be scaled. Two types of data scaling commonly applied to statistical tasks are mean-centering and variance scaling. Variance scaling consists of scaling each data dimension to have unit variance. This method of scaling is necessary when the data to be processed contain dimensions with widely different dynamic ranges. Since that is not the case in the surge mapping application, variance scaling is not used. In order to preserve the relative variance among data dimensions, mean-centering is the chosen method of data scaling. Mean-centering is accomplished by subtracting the mean x̄_i from a data dimension x_i as:

    x_{i,mc} = x_i − x̄_i        (4.1)

The resulting data are then mean-centered, and the PCA algorithm can begin.

4.2.3 PCA Algorithm

The random variable X will represent one dimension of the data to which PCA will be applied. The goal of the PCA algorithm is to identify the directions in which the data exhibit the most separation. The amount of separation can be expressed using the concept of variance. Variance, represented by the symbol σ², is defined as:

    σ² = Σ_{i=1}^{n} (x_i − x̄)² / (n − 1)        (4.3)

Once variance is defined, another variable called the "variance probe" can be defined. This variance probe can be defined in terms of the covariance between data dimensions. In order to do this, the covariance matrix R must first be defined, with entries:

    R_{i,j} = cov(x_i, x_j) = Σ_{k=1}^{n} (x_{k,i} − x̄_i)(x_{k,j} − x̄_j) / (n − 1)        (4.4)

where n is the total number of data points [11]. Then, allowing q to represent a unit vector, the variance probe ψ can be written as in Eq. (4.5):

    ψ(q) = σ² = q^T R q        (4.5)

Because covariance is a measure of the amount of separation, the variance probe is an indicator of the amount of separation in the direction of the unit vector q. When the variance probe is maximized, the corresponding unit vectors q are the directions of most separation. In order to identify the q vectors of most separation, the "principal components" must also be identified. Principal components indicate how much separation is present in a given q direction. The principal components for a single data vector x can be found by projecting the data vector onto the unit vectors q:

    a_j = q_j^T x = x^T q_j        (4.6)

The vectors q_j and the principal components a_j for the entire data set can be found by using the covariance matrix:

    R q_j = a_j q_j        (4.7)

It can be seen that this is an eigenvalue problem.

4.2.4 Summary of PCA Algorithm

The q vectors, the composite directions of most separation, are found by first calculating the covariance matrix of the original data as in Eq. (4.4). The covariance matrix is a form of the variance probe, as shown in Eq. (4.5), and covariance indicates the amount of separation, as was shown in Eqs. (4.3) and (4.4). The eigenvectors and eigenvalues of the covariance matrix are then calculated. The eigenvectors of the covariance matrix are the q vectors seen in Eq. (4.7). These vectors indicate the composite directions of most separation, and the corresponding eigenvalues a indicate the lengths of the eigenvectors, i.e. the amount of separation the data contain in the composite direction q. The composite direction of most separation is then identified by the largest eigenvalue; the corresponding eigenvector is that composite direction.

To assist in describing the practical application of the PCA method, an example follows. Equation (4.8) shows a sample eigenvector matrix Q and eigenvalue vector a. Rows of the eigenvector matrix represent variables, and columns represent the new composite dimensions. For example, in Eq. (4.8), elements q_11 through q_14 of the eigenvector matrix represent a single original process variable, pressure 1. Elements q_11 through q_41 represent a composite dimension in which the data exhibit a_1 amount of separation.

    Q = [ q_11  q_12  q_13  q_14 ]   (row: Pressure 1)
        [ q_21  q_22  q_23  q_24 ]   (row: Pressure 2)
        [ q_31  q_32  q_33  q_34 ]   (row: Temperature 1)
        [ q_41  q_42  q_43  q_44 ]   (row: Temperature 2)

    a = [ a_1  a_2  a_3  a_4 ]^T        (4.8)

Columns of Q are the composite directions; rows are labeled with the original process variables.

Once the composite directions of most separation are identified, the most significant variables can be identified by examining the elements of the vectors. Since rows of the eigenvectors correspond to variables, the most significant variables are the variables corresponding to the largest elements of the composite directions of most separation. For example, if a_1 is the largest element of the eigenvalue vector in Eq. (4.8), then the first column of the eigenvector matrix (q_11 through q_41) is the direction of most separation. If q_21 is the largest element among these, then "Pressure 2" is the most significant variable. The most significant variables can be retained for modeling, while the other variables are eliminated. The dimensionality of the data has then been reduced.
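As a concrete illustration of Sections 4.2.2-4.2.4, the following Matlab sketch mean-centers a small data matrix, forms the covariance matrix, and solves the eigenvalue problem of Eq. (4.7). The four-variable data set is made up so that one variable is nearly constant and two are strongly related; only base Matlab functions (mean, cov, eig, sort) are used.

    % Sketch of the PCA procedure of Chapter 4 on a made-up 4-variable data set.
    % Rows of X are data points; columns are process variables.
    rng(1);
    t = randn(50,1);
    X = [t, 2*t + 0.1*randn(50,1), randn(50,1), 0.05*randn(50,1)];
    Xmc = X - mean(X,1);                  % mean-centering, Eq. (4.1)
    R   = cov(Xmc);                       % covariance matrix, Eq. (4.4)
    [Q,A]   = eig(R);                     % eigenvalue problem of Eq. (4.7)
    [a,idx] = sort(diag(A),'descend');    % eigenvalues = amount of separation
    Q = Q(:,idx);                         % columns: composite directions, largest first
    [~,v] = max(abs(Q(:,1)));             % largest component of the leading direction
    fprintf('most significant variable: %d (eigenvalue %.2f)\n', v, a(1));
    scores = Xmc*Q(:,1:2);                % data projected onto the two leading directions

The final projection step is what produces plots such as Figures 7.5-7.7 later in this thesis, where the data are viewed along the leading eigenvectors.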


Chapter 5
Support Vector Machine

Chapter 5 will present the method of support vector machine. The motivation behind and an introduction to the method will be given in Section 5.1. This will be followed by a description of the support vector machine method for linearly separable data in Section 5.2. The chapter will close by detailing the support vector machine method for the nonlinearly separable case in Section 5.3.

5.1 Introduction to Support Vector Machine

For any set of data with elements belonging to two different classes, there exist an infinite number of possible separating hypersurfaces. A separating hypersurface may be located very close to one class of points, very close to the other class, or anywhere in between. When attempting to locate a separating hypersurface for a set of data, the question arises as to which of the possible hypersurfaces is the most desirable. The "most desirable" hypersurface may be defined in many different ways depending on the situation at hand; in general, however, the goal is to classify all points correctly. A desirable hypersurface will therefore minimize misclassification for both training and validation data; that is, the most desirable hypersurface will classify as many points correctly as possible.

Misclassification occurs when an unseen data point falls on the wrong side of the separating hypersurface. This is more likely to occur when the separating hypersurface is located very close to a set of training points, and less likely when it is located far from them. For these reasons, the occurrence of misclassification can be minimized by maximizing the distance between the classes of data and the separating hypersurface. When this is accomplished, it is unlikely that an unknown data point will fall on the incorrect side of the separating hypersurface. There may also be additional stipulations on the optimal hypersurface; for example, it is also desirable to minimize the order, or complexity, of the separating hypersurface.

Multilayer perceptron methods do not have the capacity to locate a separating hypersurface that is optimal in this sense; they are capable only of finding some separating hypersurface, which is only sometimes adequate. There is no guarantee that the hypersurface found is optimal in terms of maximizing the distance from either class to the hypersurface. In order to achieve optimality, another method is required.

The multilayer perceptron belongs to a class of techniques known as stochastic approximation. This class of techniques does not achieve optimality in the sense of maximizing the separation between the points and the separating hypersurface, and thus minimizing misclassification. In order to achieve optimality in this sense, the classification problem may instead be viewed as a problem of "structural risk minimization." The principle of structural risk minimization is applied by defining a cost function for minimization. The cost function includes terms representing undesirable outcomes such as misclassification and model complexity. By minimizing the cost function, the undesirable outcomes are avoided in an optimal manner. By viewing the problem in this way, it is possible to reduce the classification error for the training points to zero and also minimize the complexity of the located hypersurface. A comparison of MLP maps and an SVM map is shown in Figure 5.1.


Figure 5.1: Comparison of maps obtained using MLP and SVM (surge and not-surge points, two MLP maps, and the SVM map in the IGV opening vs. motor current plane)

In order to minimize the classification error, the distance between the data points and the separating hypersurface should be maximized. To maximize this distance, the data points that are closest to the separating hypersurface are identified. These points are called "support vectors." By positioning the separating hypersurface so that it is equidistant from the support vectors in each class, the distance between the training points and the separating hypersurface is maximized, thus satisfying the requirement for minimizing misclassification.

A second requirement for optimality is that the hypersurface found be as simple as possible. The desired hypersurface is not only the one that classifies the most points correctly, but the lowest-order hypersurface that does so. In order to achieve this, the weight vector must be "equalized." The weight vector in the support vector machine method is similar to the slope value in a linear equation; by minimizing the weight vector's deviation from zero, the separating hypersurface is kept as simple as possible.

The support vector machine method takes slightly different forms depending on whether the classification problem is linearly separable or nonlinearly separable: the linearly separable case applies to situations that can be correctly classified by a linear separating hypersurface, while the nonlinearly separable case applies to data that can only be correctly classified by a nonlinear separating hypersurface. The difference between these two cases is illustrated in two dimensions in Figures 5.2 and 5.3. The linearly separable case is shown in Figure 5.2, and the nonlinearly separable case is shown in Figure 5.3.

Figure 5.2: Example of a set of linearly separable data points (surge and not-surge points in the discharge pressure vs. airflow plane, with a possible separating line)


Figure 5.3: Example of a set of nonlinearly separable data points (surge and not-surge points in the discharge pressure vs. airflow plane, with a possible separating curve)

The linear method of support vector machine is achieved by directly applying the support vector machine method to the input data. The nonlinear method consists of first transforming the data to a higher-dimensional space in which they are linearly separable, locating the linear separating hypersurface in the higher-dimensional space, and then transforming the data and the hypersurface back to the lower-dimensional space.

5.2 Linearly Separable SVM

5.2.1 Data Scaling

Prior to applying the SVM algorithm, the data must be scaled in order to equalize data vectors measured in different units. A different method of scaling is used here than was described in Chapter 4 for PCA, because it is not necessary for SVM to preserve the amount of variance in the data. However, it is necessary for SVM to preserve as much separation between the classes as possible. For these reasons, variance scaling will be used without mean-centering for SVM. To scale the data, first the average of each dimension must be calculated as:

    x̄ = Σ_{i=1}^{N} x_i / N        (5.1)

where i is an index representing the data vectors, x is one dimension, and N is the number of data vectors to be used in the modeling. The result of Eq. (5.1), the dimension average, is then used in the scaling as in Eq. (5.2): the dimension average is subtracted from the corresponding dimension in each data vector, each difference is squared, the squares are averaged, and the square root is taken. This result is the scaling factor λ used to scale the dimension [66]:

    λ = sqrt( Σ_{i=1}^{N} (x_i − x̄)² / N )        (5.2)

The new data y to be used in SVM are the original dimensions x scaled as:

    y = x / λ        (5.3)

Data scaling is then complete, and the SVM algorithm can begin.
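A minimal Matlab sketch of this scaling, assuming the data are stored with one data vector per row; note that λ in Eq. (5.2) averages the squared deviations by N, so it is computed explicitly rather than with Matlab's default sample standard deviation. The numeric values are made up.

    % Variance scaling of Eqs. (5.1)-(5.3): each column (dimension) of X is
    % divided by its root-mean-square deviation from the column average.
    X = [150 22; 160 25; 170 28; 180 31];        % made-up data: current (A), IGV opening (%)
    xbar   = mean(X,1);                          % dimension averages, Eq. (5.1)
    lambda = sqrt(mean((X - xbar).^2, 1));       % scaling factors, Eq. (5.2)
    Y      = X ./ lambda;                        % scaled data for the SVM, Eq. (5.3)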

5.2.2 SVM Algorithm

The problem of locating the optimal separating hypersurface is a constrained optimization problem. The requirements are to minimize misclassification and to minimize the complexity, or order, of the hypersurface. In order to minimize misclassification, the distance from the data points to the separating hypersurface must be calculated. The algebraic distance g(x) of a point x from the hypersurface with weights w and bias b is:

    g(x) = w^T x + b        (5.4)

Next, the distance is re-scaled so that all support vectors are at a distance of 1 from the separating hyperplane. The scaled distance g(x) can then be written in terms of the actual distance r as:

    g(x) = r ||w|| = 1        (5.5)

The actual distance r is positive for points with a positive desired output, and negative for points with a negative desired output. The margin of separation ρ between the two classes can then be written in terms of r as:

    ρ = 2r = 2 / ||w||        (5.6)

See Figure 5.4 for an illustration of the meaning of r.

Figure 5.4: Relationship between support vectors and the separating line (distance r between the support vectors of each class and the line, in the discharge pressure vs. airflow plane)


The points for which the scaled distance is equal to one are the support vectors. The relationship between the support vectors and the separating hypersurface is illustrated in two dimensions in Figure 5.4. In order to minimize misclassification, the margin of separation ρ must be maximized. From Eqs. (5.5) and (5.6), it can be seen that maximizing the margin of separation is equivalent to finding the weight vector w that minimizes the following quadratic cost function:

    Φ(w) = (1/2) w^T w        (5.7)

Minimizing Eq. (5.7) also serves to minimize the complexity of the separating hypersurface [11]. Meanwhile, this optimization problem is subject to the constraint that all points must be correctly classified. This constraint can be written as:

    w^T x_i + b ≥ +1   for d_i = +1        (5.8)

    w^T x_i + b ≤ −1   for d_i = −1        (5.9)

where i = 1, 2, 3, …, N. Equations (5.8) and (5.9) express the distance between the data points and the hypersurface in terms of the weight vector w, the input data x_i, the bias b, and the desired outputs d_i for the total number of training data inputs N. Notice that this distance must be greater than or equal to +1 for all positive training data points, and less than or equal to −1 for all negative training data points.


Based on Eqs. (5.7), (5.8), and (5.9), the support vector machine method is a constrained optimization problem. The method of Lagrange multipliers is used to convert the problem into an unconstrained optimization problem. With the Lagrange multipliers method, Eq. (5.7) and the constraints (5.8)-(5.9) are combined into a single cost function, the "Lagrangian function" J:

    J(w, b, α) = (1/2) w^T w − Σ_{i=1}^{N} α_i [ d_i (w^T x_i + b) − 1 ]        (5.10)

The variables α_i are auxiliary nonnegative variables known as Lagrange multipliers. This cost function now includes the terms for the classification error as well as the terms that represent the complexity of the function. Recall that the optimum separating hypersurface is the one that minimizes both the classification error and the complexity of the separating hypersurface by maximizing the margin of separation. Eq. (5.10) should be minimized with respect to w and b and maximized with respect to α_i. The necessary conditions for the optimality of Eq. (5.10) lead to the following two equations:

    ∂J(w, b, α)/∂w = 0        (5.11)

    ∂J(w, b, α)/∂b = 0        (5.12)

The weights can then be solved for as:

    w = Σ_{i=1}^{N} α_i d_i x_i        (5.13)

The next step is to find the α_i values so that the weights can be found. To easily find the α_i values, the duality theorem is used. The duality theorem can be stated as follows:


Given the training data {(x_i, d_i)}_{i=1}^{N}, find the α_i values that maximize the function

    Q(α) = Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j d_i d_j x_i^T x_j        (5.14)

subject to the constraints

    Σ_{i=1}^{N} α_i d_i = 0        (5.15)

    α_i ≥ 0        (5.16)

Once the α_i values have been found, they can be used to solve for the weights w as in Eq. (5.13) and for the bias b as:

    b_0 = 1 − w^T x^(s)        (5.17)

where x^(s) represents a support vector (one on the d = +1 side, per the sign convention of Eq. (5.8)). This is the linear support vector machine method, for separating points that are linearly separable.
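Because Eqs. (5.14)-(5.16) form a quadratic program in α, the dual can be solved directly with a general-purpose QP solver. The sketch below uses Matlab's quadprog (Optimization Toolbox) on a made-up linearly separable data set; since quadprog minimizes, the sign of Q(α) is flipped.

    % Linear SVM by solving the dual of Eqs. (5.14)-(5.16) with quadprog.
    X = [1 1; 2 2; 2 0; 4 4; 5 3; 5 5];       % made-up, linearly separable data
    d = [-1; -1; -1; 1; 1; 1];                % desired outputs
    N = size(X,1);
    H = (d*d') .* (X*X');                     % quadratic term of Q(alpha), Eq. (5.14)
    alpha = quadprog(H, -ones(N,1), [], [], d', 0, zeros(N,1), []);  % Eqs. (5.15)-(5.16)
    w  = X' * (alpha .* d);                   % weights, Eq. (5.13)
    s  = find(alpha > 1e-6 & d == 1, 1);      % a support vector with d = +1
    b0 = 1 - w' * X(s,:)';                    % bias, Eq. (5.17)
    g  = @(x) w'*x + b0;                      % decision function, Eq. (5.4)
    fprintf('g([3;1]) = %+.2f\n', g([3;1]));  % the sign gives the predicted class

Only the points with nonzero α_i (the support vectors) contribute to w, which is what makes the resulting hypersurface depend solely on the points nearest the margin.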

5.3 Nonlinearly Separable SVM

If the training data cannot be linearly separated, the method must be modified for the nonlinearly separable case. The method for the nonlinearly separable case is slightly different from the linearly separable case. In order to correctly separate nonlinearly separable points, the data are first transformed to a higher-dimensional "feature space" in which the data are linearly separable. The linear separating hypersurface is found as in the linearly separable case, and then the data and the hypersurface are transformed back to the original space. This process is illustrated in Figure 5.5.


Figure 5.5: Illustration of the method of nonlinear support vector machine (the nonlinearly separable data in the low-dimensionality space are transformed to a high-dimensionality space where they are linearly separable, the linear support vector machine method is applied there, and the mapped result is transformed back to the low-dimensionality space)

As can be seen from Figure 5.5, the method of nonlinear support vector machine consists of performing the method of linear support vector machine on data that have been transformed to a high-dimensionality space. The transformed data can be written as φ(x). The Lagrangian function applied to these data is then written as:

    J(w, b, α) = (1/2) w^T w − Σ_{i=1}^{N} α_i [ d_i (w^T φ(x_i) + b) − 1 ]        (5.18)

The Lagrangian function is then maximized to obtain the cost function Q for the nonlinearly separable case:

    Q(α) = Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j d_i d_j φ^T(x_i) φ(x_j)        (5.19)

Defining the kernel K as:

    K(x_i, x_j) = φ^T(x_i) φ(x_j)        (5.20)

the modified cost function Q is then written as:


    Q(α) = Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j d_i d_j K(x_i, x_j)        (5.21)

The kernel, which defines the transformation, can be defined in different ways depending on the chosen method. Table 5.1 lists three types of kernels that may be used with the support vector machine method.

Table 5.1: Typical kernels that may be used for the support vector machine method [11]

    Type of support vector machine      Kernel K(x_i, x_j)
    Polynomial learning machine         (x_i^T x_j + 1)^p
    Radial-basis function network       exp( −||x_i − x_j||² / (2σ²) )
    Two-layer perceptron                tanh( β_0 x_i^T x_j + β_1 )

For this study, the polynomial learning machine method of nonlinear support vector machine has been chosen, so that the dimensionality of the feature space can be manipulated. The polynomial kernel used for this method is defined as follows:

    K(x_i, x_j) = (x_i^T x_j + 1)^p        (5.22)

The dimensionality of the feature space can then be manipulated by adjusting the value of p. The ability to adjust this parameter allows for the fitting of data that may represent a high-order nonlinear function. The value of p can be set high enough that the data is linearly separable in the feature space, even though it may be highly nonlinear in the original function space. In this way, the method of support vector machine is able to locate the optimum separating hypersurface for linearly or nonlinearly separable data.
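Extending the previous Matlab sketch to the nonlinear case only requires replacing the inner products x_i^T x_j with the kernel of Eq. (5.22), both in the dual and in the decision function. The XOR-style data below are made up precisely because they are not linearly separable, while a p = 2 polynomial kernel separates them; quadprog is again assumed available.

    % Kernel SVM: dual of Eq. (5.21) with the polynomial kernel of Eq. (5.22).
    X = [0 0; 1 1; 0 1; 1 0];                 % XOR-like data, not linearly separable
    d = [-1; -1; 1; 1];
    p = 2;                                     % polynomial order
    K = (X*X' + 1).^p;                         % kernel matrix, Eq. (5.22)
    H = (d*d') .* K;
    alpha = quadprog(H, -ones(4,1), [], [], d', 0, zeros(4,1), []);
    s  = find(alpha > 1e-6, 1);                % one support vector
    b0 = d(s) - sum(alpha .* d .* (X*X(s,:)' + 1).^p);   % bias from d_s*g(x_s) = 1
    g  = @(x) sum(alpha .* d .* (X*x + 1).^p) + b0;      % decision value via the kernel
    fprintf('g([0.9;0.1]) = %+.2f\n', g([0.9;0.1]));     % near (1,0), so class +1 expected

Note that φ never has to be computed explicitly: both training and classification use only kernel evaluations, which is what makes high values of p practical.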


Chapter 6
Asymmetric Support Vector Machine

Chapter 6 will present the method of asymmetric support vector machine (ASVM). This chapter is divided into three sections: Section 6.1 will present the motivation behind the development of the asymmetric support vector machine method, Section 6.2 will present the algorithm for the method, and Section 6.3 will detail a numeric example of the asymmetric support vector machine method.

6.1 ASVM Motivation

In Chapter 5, the method of support vector machine was presented as a way to minimize misclassification by equalizing the distance between the support vectors in each class and the separating hypersurface. This is, in general, a valid way to minimize misclassification error. However, there may exist situations in which equalizing the distance between the points in each class and the separating hypersurface is not the most preferable choice. In some applications, such as fault diagnosis and surge map modeling, misclassification of data in one class is much more costly than misclassification of data in the other class.

For separable cases, a major issue is how to deal with the gap, i.e. the uncertain region, between the different classes. This is particularly important for validation data. The chance of misclassification of validation data in any given class is closely related to the distance between the training data points and the separating hypersurface. If the training points in a class are far from the hypersurface, validation data points are unlikely to fall on the wrong side of the separating hypersurface; in this case, the chance of misclassification of validation data within that class is small. If the training points in a class are close to the hypersurface, then validation data points are more likely to fall on the wrong side of the separating hypersurface and be misclassified; in this case, the chance of misclassification of validation data within that class is large.

The support vector machine approach of equalizing the distance from the support vectors of one class to the separating hypersurface and that from the support vectors of the other class leads to approximately equal chances of misclassification of validation data for both classes. If misclassification of validation data in one class is more acceptable than misclassification in the other class, then this equalization of distances is undesirable. It would be better if the separating hypersurface were farther from the class for which misclassification is more costly, in order to reduce the chances of misclassification for that class.

The problem of surge mapping is a case of this type. Within the gap between the surge and not-surge points is an "area of uncertainty" where it is not known whether a point will actually be surge or not-surge, as shown in Figure 6.1.


Figure 6.1: The area of uncertainty in the separable pattern classification problem for surge map modeling (surge and not-surge points in the discharge pressure vs. airflow plane)

If an unknown point is predicted to surge but surge does not actually occur, as shown in Figure 6.3, no damage is done to the compressor. This can be considered a "false alarm." A small amount of energy will be lost unnecessarily due to operating the compressor farther from the true surge line than is absolutely necessary, but this is preferable to the other possibility. If an unknown point is predicted not to surge but does, in fact, surge, damage will be caused to the compressor by the large pressure oscillations that ensue. This case is illustrated in Figure 6.2. For the case of surge prediction, it is more acceptable for not-surge points to be misclassified than for surge points to be misclassified. It is, therefore, useful to produce a surge line that is very close to the not-surge points and farther from the surge points.


Figure 6.2: Illustration of a point in the area of uncertainty that is predicted to not surge but actually surges

Figure 6.3: Illustration of a point in the area of uncertainty that is predicted to surge but actually does not surge

The standard formulation of the support vector machine method does not allow for distance manipulation of this kind. For this reason, the method of asymmetric support vector machine has been developed to allow the manual adjustment of these distances.


6.2 ASVM Algorithm

The adjustment of the distance between the optimal separating hypersurface and a particular data set can be accomplished by adjusting the desired output value for one of the classes of training data while holding the desired output for the other class constant. This can be seen by recalling some of the equations that were established in Chapter 5. Recall that the actual algebraic distance of a point x from the separating hypersurface with weights w and bias b is written as:

    g(x) = w^T x + b        (6.1)

Recall also that in the linear method of support vector machine, the optimum separating hypersurface is found by minimizing the following Lagrangian function:

    J(w, b, α) = (1/2) w^T w − Σ_{i=1}^{N} α_i [ d_i (w^T x_i + b) − 1 ]        (6.2)

By substituting Eq. (6.1) into Eq. (6.2), we have:

    J(w, b, α) = (1/2) w^T w − Σ_{i=1}^{N} α_i [ d_i g(x_i) − 1 ]        (6.3)

From Eq. (6.3) it can be seen that the desired output d_i is the weight applied to the actual algebraic distance of a point x_i from the separating hypersurface in the Lagrangian function. The distance between the separating hypersurface and a class of points can therefore be adjusted by setting the desired output for that class. A desired output of larger magnitude for a set of inputs will result in a smaller distance between the separating hypersurface and those points, while a desired output of smaller magnitude will result in a larger distance.


The nonlinearly separable case is similar to the linearly separable case, with the exception that the data to be separated are first transformed into a higher-dimensional feature space. The hypersurface in the feature space is then written as:

    w^T φ(x) + b = 0        (6.4)

Similarly, the Lagrangian function for the nonlinearly separable case is formed by replacing the data points x with their transformations:

    J(w, b, α) = (1/2) w^T w − Σ_{i=1}^{N} α_i [ d_i (w^T φ(x_i) + b) − 1 ]        (6.5)

From Eq. (6.4), it can be seen that the distance from a point to the separating hypersurface in the feature space is written as:

    g*(x) = w^T φ(x) + b        (6.6)

It can then be seen from Eqs. (6.5) and (6.6) that in the nonlinearly separable case, the desired output d is the weight placed on the distance from a data point to the separating hypersurface in the higher-dimensional feature space.

This method can now be used to prevent the dangerous case of a surge event that is not predicted. By increasing the magnitude of the desired output for the not-surge class of points, the separating hypersurface will be moved closer to these points and farther from the surge points. This will then reduce the possibility of misclassification of a surge point.
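Because the desired outputs enter the dual of Eqs. (5.14)-(5.16) only through d, a sketch of the asymmetric variant needs no new machinery: the not-surge class is simply assigned a desired output of larger magnitude before the same quadratic program is solved. The value -10 follows the choice discussed in Section 6.3; the Matlab code below uses the same made-up data and quadprog assumption as the Chapter 5 sketches.

    % ASVM sketch: the not-surge class is given desired output -10 instead of -1,
    % which pulls the separating hypersurface toward the not-surge points.
    X  = [1 1; 2 2; 2 0; 4 4; 5 3; 5 5];      % made-up data (first three: not-surge)
    d  = [-10; -10; -10; 1; 1; 1];            % asymmetric desired outputs
    N  = size(X,1);
    H  = (d*d') .* (X*X');                    % same dual as Eqs. (5.14)-(5.16)
    alpha = quadprog(H, -ones(N,1), [], [], d', 0, zeros(N,1), []);
    w  = X' * (alpha .* d);                   % Eq. (5.13) with the asymmetric d
    s  = find(alpha > 1e-6 & d > 0, 1);       % support vector on the surge side
    b0 = 1 - w' * X(s,:)';
    dist_surge    = min( (X(d>0,:)*w + b0) / norm(w));   % distance to surge class
    dist_notsurge = min(-(X(d<0,:)*w + b0) / norm(w));   % distance to not-surge class
    fprintf('surge: %.3f, not-surge: %.3f\n', dist_surge, dist_notsurge);

With -1 for both classes the two printed distances are nearly equal; with -10 for the not-surge class the hypersurface moves toward the not-surge points, matching the trend reported in Table 6.1.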

6.3 Numeric Example for Illustrating ASVM

The proposed ASVM scheme is illustrated with a numeric example of two-dimensional surge map modeling. Figure 6.4 demonstrates the effects of altering the desired output for the not-surge points. This example utilizes the polynomial learning machine kernel shown in Eq. (5.22) with p = 2. The lines pictured, from left to right, represent values of -0.1, -0.5, -1.0, -2.0, and -10.0 for the not-surge desired output; the desired output for surge is held at +1. As the not-surge desired output is decreased relative to the usual desired output of -1, the separating line moves closer to the not-surge points. As the desired output is increased relative to -1, the line moves closer to the surge points.

Figure 6.4: Numeric example for illustrating ASVM-based surge map modeling (surge and not-surge points in the Dimension 1 vs. Dimension 2 plane, with separating lines for not-surge desired outputs of -0.1, -0.5, -1.0, -2.0, and -10.0)

This two-dimensional test was then extended to three dimensions, and the distances from the separating hypersurface to the support vectors of the surge points and to those of the not-surge points were calculated for different not-surge desired outputs. The results appear in Table 6.1.


Table 6.1: Distances of surge and not-surge points from the separating hypersurface for different values of not-surge desired output

    Not-surge Desired Output    Distance to Surge Points    Distance to Not-surge Points
    -1                          0.1637                      0.1585
    -5                          0.2669                      0.0577
    -10                         0.2902                      0.0388
    -20                         0.3007                      0.0329
    -50                         0.3093                      0.0297
    -100                        0.3124                      0.0292
    -200                        0.3136                      0.0291
    -500                        0.3145                      0.0291
    -1000                       0.3148                      0.0291

Two plots of the 3-D test are shown in Figures 6.5 and 6.6.

Figure 6.5: 3-D ASVM test with d = -10 (surge and not-surge points in the Dimension 1, Dimension 2, Dimension 3 space, with the separating surface)



Figure 6.6: Another view of Figure 6.5

Figure 6.7 shows the results from Table 6.1 in graphical form. From this plot, it can be seen that the relationship between distance and desired output is exponential, saturating as the magnitude of the desired output grows. The desired output for the not-surge points is set at -10, at the bend of the exponential curve, in order to obtain most of the distance benefit.


Figure 6.7: Smallest distance from the separating hypersurface under different magnitudes of not-surge desired output in the 3-D ASVM simulation (minimum Euclidean distance to the hyperplane for the surge points and for the not-surge points)

The numeric examples show that the distance from the support vectors of a class to the separating hypersurface can be reduced by increasing the magnitude of the desired output for that class, and that the relationship between desired output and distance is exponential.


Chapter 7
Results and Discussion

Chapter 7 will present the results of implementing the previously described data-driven modeling methods on compressor operation data. This chapter is divided into three sections. Section 7.1 will give an outline of the process that was undertaken in the implementation. Sections 7.2 and 7.3 detail two phases of the implementation: implementation on historical data and implementation on current data, respectively.

7.1 Outline

The data-driven modeling approaches described in the previous chapters were applied to surge map modeling of a centrifugal air compressor. First, the multilayer perceptron was applied to surge mapping. In order to apply the neural network modeling methods in an implementable fashion, principal components analysis was utilized to reduce the data dimensionality. Then, the support vector machine was applied to surge mapping. Finally, the asymmetric support vector machine method was applied to reduce the possibility of a dangerous incorrect prediction. The proposed methods were first applied to historical operating data from an air compressor. Once it was confirmed which surge mapping method, multilayer perceptron or support vector machine, is the more effective, the more effective method was applied to current data collected from the compressor.


It was determined from the mapping applied to the historical data that the multilayer perceptron, while usually adequate for classification, does not provide an optimal solution; the support vector machine method was found to provide a better separating hypersurface. After the methods had been tested on historical data, the effective methods were applied to current data for implementation.

In order to implement neural-network-based surge mapping, five steps were necessary. First, data for training and testing a network were collected from an operating air compressor. Second, the data were analyzed to identify which points belong to the surge class and which points belong to the not-surge class. Third, principal components analysis was used to identify the significant and insignificant variables, in order to reduce the dimensionality of the data for modeling. Fourth, the support vector machine method was applied to locate the optimal separating hypersurface. Finally, the asymmetric support vector machine modification was made to the model in order to reduce the possibility of dangerous misclassification.

7.2 Surge Map Modeling with Historical Data

7.2.1 Data Collection

Historical compressor operation data were provided by Toyota Motors Manufacturing in Kentucky (TMMK). The data were collected from the TMMK three-stage Centac centrifugal air compressor from Ingersoll Rand, Inc. The compressor has a maximum intake capacity of 7349 cfm and a maximum discharge pressure of 130 psi. The data collected consisted of values recorded during compressor operation from sensors installed throughout the air compressor. Variables recorded include current, electric power, IGV opening, BOV opening, inlet pressure, second stage inlet pressure, third stage inlet pressure, outlet pressure, and voltage.

The data were collected during compressor operation at times when the compressor surged. The compressor was determined to have surged when the blow-off valve opened. Currently, surge is determined to have occurred when the motor current drops below 163 A. This criterion is known as "current limit low." It is an approximate value that was determined from compressor calibration data by TMMK engineers.

Data points collected during compressor operation were divided into two classes: the surge class and the not-surge class. Surge points were identified by the blow-off valve measurement: if the blow-off valve opened, then surge was assumed to have occurred, and the point was identified as a surge point. The data point immediately preceding the surge event was identified as a not-surge point. The surge and not-surge points are plotted in Figure 7.1, with motor current versus IGV opening. One set of the historical data can be found in Table A.1 of Appendix A.


Figure 7.1: Historical surge data with motor current vs. IGV opening (surge and not-surge points)

7.2.2 Multilayer Perceptron

The multilayer perceptron was used first as a preliminary measure to find the surge line with two dimensions of the data that exhibit good separation, IGV opening and motor current. Prior to applying the multilayer perceptron method, the data must be scaled; data scaling is done using the same method as was described in Chapter 5. Once the data have been scaled, they can be used as the input to a multilayer perceptron. A table of the original and scaled data is located in Appendix A, Tables A.2 and A.3. As shown in Figure 7.2, the multilayer perceptron network used here contains 2 input nodes corresponding to IGV opening and motor current, 8 hidden neurons, and one output neuron.


Figure 7.2: Structure of the multilayer perceptron used in this study (input nodes for IGV opening and motor current, 8 hidden neurons, and one output neuron producing the surge/not-surge decision)

Since the learning will be done off-line, the speed of learning is not critical. As a result, the learning rate parameter can be set to 0.05, a low value, in order to reduce modeling error. The other values necessary for the multilayer perceptron, the local gradient and the output signal of the neuron, are calculated using the Matlab function "train." The sigmoid-type activation function defined in Eq. (3.2) has been chosen for this network in order to sufficiently approximate a nonlinear function; a plot of this activation function can be seen in Figure 3.2. Because the network uses a sigmoid-type activation function, it is trained using a resilient back-propagation training algorithm. This training algorithm avoids the slow training that results when the output of the network is far from the desired output. In order to achieve a good separating hypersurface, 500 epochs were run. The surge points were given a desired output of +1, and the not-surge points were assigned a desired output of -1. The 20 data points to be used as the input were divided into a training set and a testing set. Since it is suggested to divide the data set into 80% training data and 20% testing data [11], the network was trained with 16 of the 20 available points and tested with the remaining 4 points. The multilayer perceptron mapping program written to perform this mapping can be found in Appendix B.

The multilayer perceptron correctly classified all points, but the separating curve varied widely between trials. Figures 7.3 and 7.4 show the surge lines obtained on two trial runs. The separating hypersurface obtained by a multilayer perceptron varies widely depending on the initial weights and biases used in the network. Typical practice is to randomly generate the weights and biases for network initialization. Due to this random element, the multilayer perceptron does not always produce a good separating curve, and the method does not allow for the determination of the optimum separating hypersurface. The inconsistency of the multilayer perceptron is the main reason why a better solution is sought in the support vector machine method described in Chapter 5.


Figure 7.3: Two-dimensional multilayer perceptron surge model #1 for historical surge data (IGV opening vs. motor current)


Figure 7.4: Two-dimensional multilayer perceptron surge model #2 for historical surge data (IGV opening vs. motor current)

7.2.3 Principal Components Analysis

In order to be accurate, the surge map must incorporate all relevant dimensions. As mentioned in Chapter 4, however, a map using all eleven dimensions would be difficult to implement for surge control: the surge line would be an eleven-dimensional hypersurface and would require eleven input signals from eleven sensors installed in the air compressor. Because of the "curse of dimensionality," a large set of training data would also be necessary. To be effective for surge map modeling, the dimensionality of the surge map must be reduced. The method of principal components analysis was applied for this purpose.


Prior to applying PCA, the data must be scaled. The method of mean-centering, as described in Chapter 4, was used. The scaled eleven-dimensional historical surge data were then used to calculate the covariance matrix, which can be seen in Appendix A, Table A.4. The eigenvectors and eigenvalues of the covariance matrix were then calculated. The eigenvector matrix is shown in Eq. (7.1), with columns q1 through q11 and rows corresponding to the original variables:

           q1     q2     q3     q4     q5     q6     q7     q8     q9     q10    q11
    Q = [  0.00   0.00   0.01  -0.13  -0.06   0.05   0.09   0.95  -0.26  -0.03   0.04
           0.00   0.00   0.00   0.00   0.01  -0.01  -0.01  -0.01   0.11  -0.97   0.23
           0.00   0.08   0.01   0.09  -0.77  -0.24  -0.57   0.03  -0.02   0.00   0.01
           1.00  -0.03   0.01   0.01   0.00   0.01  -0.01   0.00   0.00   0.00   0.00
           0.01   0.16   0.07  -0.81   0.31  -0.25  -0.40  -0.05  -0.04   0.00   0.01
           0.00   0.08  -0.15   0.55   0.53  -0.42  -0.43   0.15  -0.08   0.00   0.01
          -0.01  -0.27  -0.66  -0.07   0.06   0.58  -0.39   0.01   0.00   0.00   0.02
           0.00   0.43   0.55   0.16   0.13   0.60  -0.34   0.02   0.00   0.00   0.02
          -0.03  -0.84   0.49   0.01   0.08   0.01  -0.22   0.02  -0.02   0.00   0.02
           0.00   0.01  -0.01   0.00  -0.01  -0.02   0.03  -0.03   0.00   0.23   0.97
           0.00   0.01  -0.01   0.02  -0.03   0.04   0.04  -0.27  -0.96  -0.10   0.01 ]        (7.1)

Rows of the eigenvector matrix correspond to the original dimensions, and columns correspond to the composite dimensions of separation. Notice that the fourth row of the eigenvector matrix in Eq. (7.1) is composed almost entirely of elements very close to zero. This row corresponds to the motor voltage dimension. The value of 1.00 in this row is located in the column that corresponds to an eigenvalue of 0.00, indicating that this variable has no correlation with surge. This is understandable, as the voltage value does not change in the original data, but remains constant throughout. Other rows to note in Eq. (7.1) are the second row, which corresponds to motor current, and the tenth row, which corresponds to BOV position.


The eigenvectors in Eq. (7.1) correspond to the eigenvalues seen in Eq. (7.2):

    Λ = [ 0.00  0.35  0.54  1.02  9.31  13.47  15.71  90.86  1902.08  74212.30  18755818.07 ]^T        (7.2)

The magnitude of an eigenvalue indicates the "amount of separation," or variance, achieved by the corresponding eigenvector. The two largest eigenvalues in Eq. (7.2), 18755818.07 and 74212.30, located as the last two elements of the vector, indicate that the last two columns of the eigenvector matrix in Eq. (7.1), q10 and q11, are the composite directions of most separation. The data projected onto these two vectors are plotted in Figure 7.5. The eigenvectors corresponding to the three largest eigenvalues, q9, q10, and q11, were also used to transform the data, as shown in Figures 7.6 and 7.7.


Figure 7.5: Historical surge data plotted along the two largest eigenvectors (q11 vs. q10) obtained from PCA processing

Figure 7.6: Historical surge data along the three largest eigenvectors (q9, q10, q11) obtained from PCA processing


Figure 7.7: Another view of Figure 7.6

The eigenvectors can be used to identify the dimensions that contribute most to surge. Rows of the eigenvector matrix correspond to dimensions of the original data, and a large element indicates that the eigenvector has a large component from the corresponding dimension. In these data, the large values near 1 and -1 in the last two columns of Eq. (7.1), q10 and q11, indicate that motor electric power and blow-off valve position are the most significant variables.

7.2.4 Support Vector Machine

The data are first scaled using the method discussed in Chapter 5. Once the data have been scaled, the support vector machine mapping can begin. In order to find the optimum separating hypersurface, the Lagrange multiplier method of nonlinear support vector machine was used. When the p value in the kernel is set to 2, the support vector machine is unable to correctly separate the data. The results are pictured in Figures 7.8 and 7.9.

Figure 7.8: SVM separation surface for p = 2, plotted over the historical data points (axes: electric power (W), motor current (A), and IGV opening (%))


Figure 7.9: Another view of Figure 7.8

In order to correctly classify all of the points, p has to be set to at least 4, so that the feature space is of dimension 38. Using this higher-dimensional space, a good separating hypersurface is found that correctly separates all of the points. Also note that the hypersurface has been generated in scaled coordinates. In order to implement this mapping, either the hypersurface must be transferred to the non-scaled coordinates, or new testing data must be transferred to the scaled coordinates. As can be seen in Eq. (5.3), the first option can be accomplished by multiplying the hypersurface by the scaling factor λ, and the second by dividing a testing point by λ. The second option has been chosen for this study, because it is more efficient and requires less memory space in a program for implementation.
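A sketch of the second option at run time: the stored model (scaling factors λ, scaled training points, multipliers, bias, and kernel order p) is applied to a new operating point by first dividing the point by λ. All numeric values below are made-up placeholders for the quantities produced by training.

    % Run-time classification of a new operating point when the hypersurface
    % was generated in scaled coordinates. In practice lambda, Ys, alpha, d,
    % b0 and p come from the training step described above.
    lambda = [15.2; 4.1];                        % per-dimension scaling, Eq. (5.2)
    Ys     = [150 22; 160 25; 170 28; 180 31] ./ lambda';  % scaled training points
    alpha  = [0.3; 0; 0; 0.3];                   % made-up Lagrange multipliers
    d      = [-1; -1; 1; 1];                     % class labels of the training points
    p      = 4;                                  % kernel order used for these data
    b0     = 0.1;                                % made-up bias
    x_new  = [165; 24];                          % new point: motor current (A), IGV (%)
    y_new  = x_new ./ lambda;                    % Eq. (5.3) applied at run time
    g = sum(alpha .* d .* (Ys*y_new + 1).^p) + b0;   % kernel decision value
    if g >= 0
        disp('predicted: surge');
    else
        disp('predicted: not surge');
    end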


A comparison of results for the different mapping methods is shown in Figure 7.10. In this figure, the solid line represents the SVM model, the dashed line represents the "current limit low" surge line currently used for surge prevention, and the dot-dashed line represents the MLP model.


Figure 7.10: Comparison of MLP, SVM and “current limit low” lines for two-dimensional surge data

7.3 Surge Map Modeling with Surge Test Data

7.3.1 Data Collection

Surge tests were conducted on the TMMK testbed, i.e. the three-stage centrifugal air compressor described in the previous section. First, the following sensors were installed in the testbed: pressure sensors positioned at the inlet and outlet of the IGV and at the outlet of the first, second, and third compression stages; temperature sensors positioned at the inlet of the IGV and at the outlet of the first, second, and third compression stages; a potentiometer measuring the opening position of the IGV; an ambient humidity sensor; an airflow sensor; and a current sensor recording the motor current. The sensors were wired to a PXI data acquisition system (NI 6030E) from the National Instruments Corporation. The sampling frequency for the above sensors was set to 2000 Hz. A diagram of sensor locations within the compressor can be found in Appendix A, Figure A.1.

After these sensors had been installed, surge tests were performed on the testbed. During a surge test, the compressor was first operated in an operating region that is known not to surge, due to the large IGV opening. The IGV was then closed in small decremental steps of 1% opening in order to reduce the airflow and bring the compressor closer to surge. Each step was held for 3-5 seconds to allow the compressor to achieve steady state at the new operating condition, in order to improve the reliability of surge determination. If the compressor did not surge in this time, the IGV was again closed 1% and the process was repeated. During the surge tests, the compressor operator identified the occurrence of a surge event by listening to the surge-induced noise, which is due to the large pressure and airflow oscillations that begin when surge occurs. When a surge event is detected, the operator opens the blow-off valve in order to halt surge, and the surge test is completed.

In order to account for different ambient weather conditions, surge tests must be conducted during different seasons of the year. For this study, 25 surge tests were conducted between August 8 and October 28, 2005. The recorded variables are also used to derive certain additional variables, such as compression ratios and polytropic heads for each stage and for the whole compressor. There are a total of 22 variables. Table 7.1 lists all the variables that are recorded and calculated, along with units and four sample data points.

Table 7.1: List of 22 variables recorded/calculated and four sample data points

Variable                                    8-08-2005   9-06-2005   9-15-2005   9-30-2005
IGV Position (% Open)                        23.729      16.035      23.228      17.110
1st Stage Air Pressure (psig)                15.271      13.492      14.916      14.332
2nd Stage Air Pressure (psig)                49.584      46.867      48.625      47.906
3rd Stage Air Pressure (psig)               123.56      119.71      121.08      120.16
Airflow (scfm)                             4720.3      4656.6      4738.7      4687.9
Motor Current (A)                           150.76      145.73      148.33      145.80
BOV Position (% Closed)                      99.863      99.865      99.861      99.861
Relative Humidity (%)                        54.156      62.643      56.020      36.951
1st Stage Discharge Air Temperature (F)     279.29      260.81      273.74      264.19
2nd Stage Discharge Air Temperature (F)     255.84      250.55      247.64      257.73
3rd Stage Discharge Air Temperature (F)     244.94      245.17      220.53      251.35
IGV Pressure Out (psia)                      13.907      13.423      13.831      13.554
IGV Pressure In (psia)                       14.123      14.211      14.108      14.135
Inlet Air Temperature (F)                    80.240      72.445      81.325      77.984
Pressure Differential Across IGV (psia)       0.2157      0.7884      0.2778      0.5812
1st Stage Compression Ratio                   2.1136      2.0639      2.0986      2.1003
2nd Stage Compression Ratio                   2.1673      2.2047      2.1614      2.1794
3rd Stage Compression Ratio                   2.1612      2.1926      2.1551      2.1647
Total Polytropic Head                      7973.1      7698.5      7906.6      7817.3
1st Stage Polytropic Head                  2064.3      1962.8      2046.6      2036.4
2nd Stage Polytropic Head                  2931.3      2928.7      2897.7      2894.3
3rd Stage Polytropic Head                  2826.6      2864.5      2782.7      2840.7
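The exact formulas used to derive the compression ratios and polytropic heads in Table 7.1 are not reproduced in this chapter; as a hedged sketch, one common form of these derived quantities is shown below, with every numeric value (including the polytropic exponent) purely illustrative.

    % Hedged sketch: a stage compression ratio from absolute pressures, and a
    % polytropic head from the common form Hp = Z*R*T1*(n/(n-1))*((P2/P1)^((n-1)/n)-1).
    P1 = 14.1;                      % stage inlet pressure (psia), assumed
    P2 = 15.3 + 14.6;               % stage outlet pressure: gauge + atmospheric (psia)
    T1 = (80.2 - 32)/1.8 + 273.15;  % stage inlet temperature (K)
    R  = 287;                       % gas constant of air (J/(kg*K))
    n  = 1.5;  Z = 1.0;             % polytropic exponent and compressibility (assumed)
    CR = P2 / P1                                        % compression ratio
    Hp = Z*R*T1 * (n/(n-1)) * ((P2/P1)^((n-1)/n) - 1)   % polytropic head (J/kg)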

7.3.2 Data Analysis

The data acquisition system recorded 32 seconds of data for each surge test. The data collected from the sensors in the testbed compressor must be analyzed to determine when surge occurred. Figures 7.11 and 7.12 are plots of data collected from some of the sensors during a surge test performed on September 15, 2005. Figure 7.11 is a plot of the IGV opening and BOV closing positions during the test. Figure 7.12 is a plot of the inlet pressure of the IGV, the outlet pressure of the IGV, the pressure differential across the IGV, and the air temperature at the IGV. In these plots, the surge event is shown in bold.


Figure 7.11: Plot of IGV and BOV positions during a surge test


Figure 7.12: Plot of pressure and temperature measurement during a surge test

Surge initiation was determined by noting when large pressure oscillations began. In particular, the plot of the pressure differential across the IGV was used most often for this purpose, because this measurement was frequently the first to exhibit the oscillations. Surge was determined to have stopped when the BOV opened. In Figures 7.11 and 7.12, the interval identified as a surge event is plotted in bold; note the large pressure oscillations in this interval.
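As a minimal sketch of this onset criterion (with a synthetic signal and an arbitrary threshold, neither taken from the actual test data):

    % Hedged sketch: flag surge onset as the first time a short moving window of
    % the pressure-differential signal shows a large oscillation level.
    fs = 2000;                                    % sampling frequency (Hz)
    pd = [0.3 + 0.005*randn(1, 10*fs), ...        % 10 s of quiet operation
          0.3 + 1.5*sin(2*pi*8*(0:5*fs-1)/fs)];   % then 5 s of surge-like oscillation
    w  = round(0.25*fs);                          % 0.25 s analysis window
    s  = zeros(1, numel(pd) - w + 1);
    for k = 1:numel(s)
        s(k) = std(pd(k:k+w-1));                  % local oscillation level
    end
    onsetTime = (find(s > 0.2, 1) + w - 1) / fs   % first threshold crossing (s)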


Once the surge event has been identified, a surge operating condition and a not-surge operating condition can be identified. An “operating condition” is considered to be a static state of compressor operation. Since the state of operation of the compressor may exhibit dramatic transients when the IGV position is changed, an operating condition is determined as the average value of each variable while the IGV is stationary (a short code sketch of this averaging follows Table 7.2). The operating condition that occurs nearest in time before the onset of surge oscillations is identified as a surge operating condition, and the condition immediately preceding it is identified as a not-surge operating condition. Table 7.2 lists the surge and not-surge operating conditions identified for the surge test illustrated in Figures 7.11 and 7.12.

Table 7.2: Surge and not-surge operating conditions identified for the first surge test performed on September 15, 2005 (Test 001, 9-15)

Variable                                     Not surge     Surge
Time Duration for Averaging (sec)            11.4 ~ 14.0   14.7 ~ 17.2
IGV Opening (%)                              23.228        22.140
Outlet Pressure Stage 1 (psia)               14.916        14.846
Outlet Pressure Stage 2 (psia)               48.625        48.419
Outlet Pressure Stage 3 (psia)               121.08        120.89
Airflow (scfm)                               4738.7        4667.1
Motor Current (A)                            148.33        145.83
Relative Humidity (%)                        56.020        56.019
Outlet Temperature Stage 1 (°F)              273.74        273.87
Outlet Temperature Stage 2 (°F)              247.64        247.94
Outlet Temperature Stage 3 (°F)              220.53        221.74
IGV Inlet Pressure (psia)                    14.108        14.115
IGV Inlet Temperature (°F)                   81.325        81.319
Pressure Differential Across IGV (psia)      0.2778        0.3147
Compression Ratio 1                          2.0986        2.0985
Compression Ratio 2                          2.1614        2.1593
Compression Ratio 3                          2.1551        2.1590
Total Polytropic Head (J)                    7906.6        7897.8
Polytropic Head Stage 1 (J)                  2046.6        2046.5
Polytropic Head Stage 2 (J)                  2897.7        2894.0
Polytropic Head Stage 3 (J)                  2782.7        2791.3
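The operating-condition vectors of Table 7.2 are formed by this averaging. The following minimal sketch illustrates the computation for one placeholder channel; the signals and window bounds are assumed for illustration.

    % Hedged sketch: an operating condition is the mean of each recorded variable
    % over a window in which the IGV is stationary (here 11.4-14.0 s, as in Table 7.2).
    fs   = 2000;
    t    = (0:32*fs-1)/fs;                   % a 32 s record
    igv  = 25 - floor(t/5);                  % synthetic IGV schedule: -1% every 5 s
    pres = 14.9 + 0.01*randn(size(t));       % one placeholder sensor channel
    win  = (t >= 11.4) & (t <= 14.0);        % window with the IGV stationary
    opCondition = [mean(igv(win)); mean(pres(win))]   % two elements of the vector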


For each surge test performed, a surge operating condition and a not-surge operating condition are identified. For some tests, a not-surge point could not be identified, because the surge event began too close to the beginning of the data window. As can be seen in Table 7.2, the surge and not-surge conditions comprise two vectors with 20 elements each (not including the Time Duration for Averaging entry, which is not an actual variable of the operating condition). These vectors are the surge and not-surge points to be used in creating a surge map. Twenty-five surge points and 22 not-surge points were identified from the surge tests performed between August 8, 2005 and October 28, 2005.

7.3.3 Principal Components Analysis

Each surge or not-surge point is a vector in 20 dimensions. This high dimensionality will cause difficulty in surge mapping, due to the factors discussed in Chapter 4, such as numerical inaccuracy and the difficulty of implementing a high-dimensional model. Principal components analysis was therefore used to reduce the dimensionality of the data points.

Prior to beginning PCA, the data is scaled as described in Chapter 4. Once the data is properly scaled, the PCA algorithm can be applied: the covariance matrix of the scaled data was calculated, and then the eigenvalues and eigenvectors of the covariance matrix were found. The program for the PCA implementation can be found in Appendix C (a minimal code sketch of this step follows Table 7.3). Large eigenvalues indicate a large amount of separation in the direction of the corresponding eigenvectors. Rows of the eigenvector matrix correspond to the original dimensions. “Important” variables to be kept and used in the surge model can therefore be identified by locating large elements of the eigenvectors that correspond to large eigenvalues.

Table 7.3 lists the eigenvalues calculated for the surge data, along with the eigenvectors that correspond to the large eigenvalues. Large eigenvalues and their corresponding eigenvectors are highlighted in light gray; the bottommost eigenvalue corresponds to the rightmost eigenvector. Notice in Table 7.3 that the Relative Humidity entry in the rightmost eigenvector is -0.9554. This very large element indicates that humidity is an important variable, because the data have significant separation in this direction. Other variables identified as important include Outlet Temperature Stage 1 and Outlet Temperature Stage 3, which are highlighted in dark gray in Table 7.3.

Table 7.3: Eigenvalues and the eigenvectors corresponding to large eigenvalues of the covariance matrix of the surge data

Eigenvalue   Eigenvector 1   Eigenvector 2   Eigenvector 3   Eigenvector 4   Variable
  0.0000       0.4514         -0.0442          0.1525          0.0201        IGV Opening
  0.0000       0.1116         -0.0041          0.0079          0.0045        Outlet Pressure Stage 1
  0.0000       0.3854         -0.0534         -0.0157          0.0406        Outlet Pressure Stage 2
  0.0000       0.3000         -0.0283         -0.0441          0.0238        Outlet Pressure Stage 3
  0.0000       0.0248         -0.0067         -0.0026          0.0000        Airflow
  0.0001       0.6752         -0.1661         -0.0805          0.0194        Motor Current
  0.0002       0.0538          0.0181          0.2802         -0.9554        Relative Humidity
  0.0006      -0.0054          0.2736          0.8856          0.2741        Outlet Temperature Stage 1
  0.0013       0.2018          0.3517         -0.0361          0.0451        Outlet Temperature Stage 2
  0.0028       0.0931          0.8747         -0.2870         -0.0779        Outlet Temperature Stage 3
  0.0065      -0.0085         -0.0010         -0.0012          0.0031        IGV Inlet Pressure
  0.5286       0.1872          0.0529          0.1459          0.0297        IGV Inlet Temperature
  1.2108      -0.0303          0.0000         -0.0120         -0.0020        Pressure Differential Across IGV
  2.2982      -0.0003          0.0002         -0.0014         -0.0003        Compression Ratio 1
  4.4152      -0.0004         -0.0001         -0.0015         -0.0003        Compression Ratio 2
  7.2254      -0.0027          0.0000          0.0002          0.0001        Compression Ratio 3
 21.4655       0.0159          0.0000          0.0009          0.0003        Total Polytropic Head
 71.9089       0.0068         -0.0001         -0.0012         -0.0002        Polytropic Head Stage 1
102.3378      -0.0009          0.0008          0.0008          0.0005        Polytropic Head Stage 2
344.9266      -0.0043          0.0014          0.0003          0.0004        Polytropic Head Stage 3
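A minimal sketch of the PCA step summarized above is given below; the full program is in Appendix C, and the data matrix here is a random placeholder.

    % Hedged sketch of the PCA step (placeholder data; the actual program is in
    % Appendix C). Rows of X are scaled operating points; columns are variables.
    X = randn(47, 20);                        % placeholder: 47 points, 20 variables
    C = cov(X);                               % covariance matrix of the scaled data
    [V, D] = eig(C);                          % columns of V are eigenvectors
    [evals, order] = sort(diag(D), 'ascend'); % ascending order, as in Table 7.3
    V = V(:, order);
    % Rows of V index the original variables; a large entry in an eigenvector
    % belonging to a large eigenvalue flags an "important" variable:
    [mx, idx] = max(abs(V(:, end)))           % dominant variable of the top eigenvector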


It is important to include the IGV Opening variable in the surge model, as this is the only variable that can be directly controlled. In order to limit the dimensionality of the model to three dimensions, only two variables were kept in addition to the IGV Opening variable. Relative Humidity and Outlet Temperature Stage 1 were chosen, as these variables correspond to large elements of the eigenvectors associated with large eigenvalues. Once the most significant variables have been identified, they can be used in the surge map modeling while the remaining variables are eliminated. The method of modeling used is the support vector machine.

7.3.4 Support Vector Machine

Before modeling begins, the data is first scaled according to the method described in Chapter 5. Then, the surge data points in three dimensions (IGV Opening, Relative Humidity, and Outlet Temperature Stage 1) were used with the support vector machine method to model surge. In order to use this method, a subset of all surge and not-surge points was reserved for training; this subset consisted of approximately 80% of the surge points and 80% of the not-surge points. It is important to choose the training points carefully, because some of the data are closely clustered together, due to similar ambient air conditions during some of the surge tests, and training with closely clustered data can result in overfitting. The subset was used to train the model, and then all points were used to test the model.

The method of support vector machine was applied to the training subset of points. First, the kernel was calculated according to Eq. (5.19). The value of p was set to 8 in order to transfer the data to a space of sufficiently high dimension for the data to be linearly separable. The kernel used is

    K(x_i, x_j) = (x_i^T x_j + 1)^8                                  (7.3)

The desired output for surge is set at +1, and the desired output for not-surge is set at -1. The support vector machine code can be seen in Appendix D. Due to the nonlinearity of the resulting surge model, it cannot be easily plotted. However, each point can be input into the model to determine the point’s distance from the surge map hypersurface: a positive distance indicates that the point is predicted to surge, and a negative distance indicates that the point is predicted not to surge. Outputs for all data are shown in Table 7.4, where the outputs from the training data are shaded in gray and those from the testing data are not shaded.
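A hedged sketch of the training computation just described is given below. The actual program is in Appendix D; the placeholder data, the hard-margin dual formulation, and the quadprog-based solver here are illustrative stand-ins rather than the thesis's method (quadprog requires the Optimization Toolbox).

    % Hedged sketch: dual SVM training with the p = 8 polynomial kernel of Eq. (7.3).
    n = 40;  p = 8;
    X = [randn(3, n/2) + 1, randn(3, n/2) - 1];  % columns: scaled 3-D training points
    d = [ones(1, n/2), -ones(1, n/2)];           % desired outputs: +1 surge, -1 not-surge
    K = (X.'*X + 1).^p;                          % kernel matrix, Eq. (7.3)
    H = (d.'*d) .* K;                            % Hessian of the dual objective
    alpha = quadprog(H, -ones(n,1), [], [], d, 0, zeros(n,1), []);
    k = find(alpha > 1e-6, 1);                   % a support vector fixes the bias:
    b = d(k) - (alpha .* d.').' * K(:, k);
    x  = randn(3, 1);                            % a new (scaled) operating point
    fx = (alpha .* d.').' * ((X.'*x + 1).^p) + b % > 0: surge; < 0: not-surge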


Table 7.4: SVM output for all training and testing data

Point Number   Surge Points Output   Not-Surge Points Output
 1              1.3276               -1.1726
 2              0.6234               -1.5568
 3              1.0778               -1.2388
 4              0.7120               -1.4238
 5              0.7613               -0.9507
 6              0.7407               -1.2339
 7              0.3477               -0.8724
 8              0.9613               -1.0557
 9              1.1856               -0.9266
10              0.9526               -1.1279
11              1.1384               -1.1180
12              1.1023               -1.1617
13              0.9724               -1.2884
14              1.0309               -0.9750
15              0.7362               -1.1400
16              1.1817               -2.5121
17              4.3038               -3.3625
18             38.0930               -2.6750
19              2.3878               -2.7209
20              0.9486                3.1774
21              9.3465                5.8007
22              5.7033                8.2476
23              4.3349
24              1.8929
25              6.3300

Both the training and validation data for surge points have all positive outputs, indicating that they have been classified correctly. All the training data for not-surge points have negative outputs, indicating that they are correctly classified, while 4 out of 7 validation points for not-surge are correctly classified. The results show that the validation success rate was 100% for surge points and 57.14% for not-surge points. This is a low percentage of correct classification for the not-surge points; however, the training points were selected under the condition that the surge classification must be 100% correct. If the surge classification rate for validation points is allowed to be lower, such as 80%, then a better classification rate for the not-surge validation points, also around 80%, is attainable. The low percentage of correct not-surge classification is therefore due to the required high percentage of correct surge classification.

The accuracy of the map in classifying validation data varies with the choice of the training data set; using a different subset of the data for training will result in a less accurate classification of validation data. Table 7.5 lists the model output using a different subset of data for training.

Table 7.5: SVM output for all training and testing data, using a different subset of data for training

Point Number   Surge Points Output   Not-Surge Points Output
 1              1.0571               -1.0517
 2              1.1379               -1.0577
 3              0.7484               -1.0026
 4              0.7880               -1.1466
 5              0.9241               -0.6987
 6              1.0819               -0.9894
 7              0.9882               -1.1467
 8              1.1149               -0.9763
 9              1.0041               -1.1498
10              0.8463               -0.9508
11              1.1357               -0.9578
12              1.0055               -0.9399
13              1.0104               -1.0136
14              0.9403               -1.0045
15              0.9244                1.0171
16              0.9056               30.5449
17              0.8971               -0.0263
18              4.4785                0.1417
19              7.6515               -2.6361
20              8.9745                3.4389
21              1.3600                7.9400
22             14.8537                8.0684
23             14.1912
24             12.3592
25              2.3865


After testing many possible choices for the training data subset, the set in Table 7.4 was chosen because it provides the most accurate classification. The model output indicates how far the points are from the surge hypersurface. Tables 7.6-7.8 compare the model outputs of the SVM model and three MLP models. It can be seen from these tables that while the SVM model locates the separating hypersurface equidistant from the surge and not-surge points, the MLP models locate the hypersurface close to some points and far from others. In addition, the SVM model is significantly more accurate than the MLP models in classifying the testing points. In Tables 7.6-7.8, model outputs for training data are shaded gray, while outputs for testing points are not shaded.


Table 7.6: Comparison of training and testing point outputs for the SVM model and MLP model #1

Point    SVM Surge    SVM Not-Surge    MLP Surge    MLP Not-Surge
Number   Output       Output           Output       Output
 1        1.3276      -1.1726           0.8036      -1.0643
 2        0.6234      -1.5568           1.2704      -0.9796
 3        1.0778      -1.2388           0.9324      -0.9861
 4        0.7120      -1.4238           0.8268      -1.0247
 5        0.7613      -0.9507           1.2324      -1.0141
 6        0.7407      -1.2339           0.3362      -0.9414
 7        0.3477      -0.8724           0.5447      -0.9580
 8        0.9613      -1.0557           0.1027      -1.3396
 9        1.1856      -0.9266           0.6117      -1.0404
10        0.9526      -1.1279           0.3468      -0.1368
11        1.1384      -1.1180           0.2664       0.0480
12        1.1023      -1.1617           0.2080      -0.5005
13        0.9724      -1.2884           0.6441      -0.1825
14        1.0309      -0.9750           1.2947      -0.2965
15        0.7362      -1.1400           0.9597      -0.5189
16        1.1817      -2.5121           1.0327      -0.0611
17        4.3038      -3.3625          -0.4876       0.4174
18       38.0930      -2.6750          -0.3429       0.4795
19        2.3878      -2.7209          -1.0802       0.6844
20        0.9486       3.1774          -0.8480       0.6106
21        9.3465       5.8007          -0.5201       0.8033
22        5.7033       8.2476          -0.5731       0.5774
23        4.3349                       -0.3251
24        1.8929                        0.7780
25        6.3300                        1.1485


Table 7.7: Comparison of training and testing point outputs for the SVM model and MLP model #2

Point    SVM Surge    SVM Not-Surge    MLP Surge    MLP Not-Surge
Number   Output       Output           Output       Output
 1        1.3276      -1.1726           1.0210      -0.8603
 2        0.6234      -1.5568           1.0815      -0.9045
 3        1.0778      -1.2388           1.1493      -0.8926
 4        0.7120      -1.4238           1.0107      -1.1379
 5        0.7613      -0.9507           1.0630      -0.8519
 6        0.7407      -1.2339           0.2178      -1.1680
 7        0.3477      -0.8724           0.5452      -1.1666
 8        0.9613      -1.0557           0.2194      -1.0354
 9        1.1856      -0.9266           0.5787      -0.1265
10        0.9526      -1.1279           0.4642      -0.2317
11        1.1384      -1.1180           0.6450      -0.6960
12        1.1023      -1.1617           0.1398      -0.1778
13        0.9724      -1.2884           0.8047      -0.5210
14        1.0309      -0.9750           1.2789      -1.1601
15        0.7362      -1.1400           1.0244      -0.1769
16        1.1817      -2.5121           0.9252      -0.3772
17        4.3038      -3.3625          -0.9846       0.3637
18       38.0930      -2.6750          -1.9315       0.6797
19        2.3878      -2.7209          -1.5150       1.0230
20        0.9486       3.1774          -1.5111       1.2999
21        9.3465       5.8007           0.4310       1.0852
22        5.7033       8.2476          -1.4825       1.2107
23        4.3349                       -0.9234
24        1.8929                        0.9431
25        6.3300                        1.0518


Table 7.8: Comparison of training and testing point outputs for the SVM model and MLP model #3

Point    SVM Surge    SVM Not-Surge    MLP Surge    MLP Not-Surge
Number   Output       Output           Output       Output
 1        1.3276      -1.1726           0.7951      -1.1574
 2        0.6234      -1.5568           1.2433      -0.8049
 3        1.0778      -1.2388           0.8833      -1.0227
 4        0.7120      -1.4238           0.9531      -0.9008
 5        0.7613      -0.9507           1.0788      -1.0248
 6        0.7407      -1.2339           0.4145      -0.9508
 7        0.3477      -0.8724           0.5775      -0.7502
 8        0.9613      -1.0557           0.0999      -1.1583
 9        1.1856      -0.9266           0.5132      -0.9948
10        0.9526      -1.1279           0.6241      -0.2323
11        1.1384      -1.1180           0.2205      -0.1214
12        1.1023      -1.1617           0.2292      -0.6058
13        0.9724      -1.2884           0.8067       0.1465
14        1.0309      -0.9750           0.7823       0.0097
15        0.7362      -1.1400           0.9049      -0.3623
16        1.1817      -2.5121           1.0191       0.2501
17        4.3038      -3.3625          -0.6324       0.4814
18       38.0930      -2.6750           0.1915       0.2353
19        2.3878      -2.7209          -0.8957       0.9799
20        0.9486       3.1774           0.0137       0.3298
21        9.3465       5.8007          -0.0798       1.0290
22        5.7033       8.2476          -1.5408       1.1148
23        4.3349                       -1.1716
24        1.8929                        0.4480
25        6.3300                        1.0818

The distance from the surge hypersurface is the absolute value of the support vector machine model output. Notice from Table 7.6 that in the SVM model, both the surge points and the not-surge points lie at an average distance of 1.0 from the surge hypersurface. As was described in Chapter 6, this is not the most desirable case: in order to prevent a missed surge prediction, it is more desirable for the surge hypersurface to be close to the not-surge points. This can be achieved through use of the asymmetric support vector machine.


7.3.5 Asymmetric Support Vector Machine

The method of asymmetric support vector machine, as described in Chapter 6, is to increase the magnitude of the desired output for one class of points; the separating hypersurface will then be located closer to that class. In order to move the surge hypersurface close to the not-surge points, the desired output for not-surge points was decreased from -1.0 to -10.0, while the desired output for surge points was maintained at +1.0. The support vector machine code was then run with the new desired outputs, and all points were input into the model to find the new model output. The new output is shown in Table 7.9, where outputs for training points are shaded gray and outputs for testing points are not shaded.

The ASVM model correctly classifies all 16 training surge points and all 9 validation surge points. It also correctly classifies all 15 not-surge training points and 4 of the 7 not-surge validation points. Notice that the correctly classified not-surge points are now an average of about 0.1 distance away from the surge hypersurface, instead of 1.0 as with the support vector machine model, while the surge points are maintained at the same average distance of 1.0. Thus, the asymmetric support vector machine method successfully located the surge hypersurface closer to the not-surge points.
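One reading of this modification, sketched below, keeps the training program of the previous sketch and simply enlarges the desired output of the not-surge class: because the constraints take the form d_i f(x_i) >= 1, a desired output of -10 shrinks that class's margin to 1/10, which is consistent with the distances reported in Table 7.9. This is an illustrative interpretation with placeholder data, not the Chapter 6 derivation or the Appendix D code.

    % Hedged sketch: ASVM as the same dual QP, with the not-surge desired
    % output changed from -1 to -10 (margin on that side becomes 1/10).
    n = 40;  p = 8;
    X = [randn(3, n/2) + 1, randn(3, n/2) - 1];  % columns: scaled 3-D training points
    d = [ones(1, n/2), -10*ones(1, n/2)];        % +1 for surge, -10 for not-surge
    K = (X.'*X + 1).^p;
    H = (d.'*d) .* K;
    alpha = quadprog(H, -ones(n,1), [], [], d, 0, zeros(n,1), []);
    k = find(alpha > 1e-6, 1);
    b = 1/d(k) - (alpha .* d.').' * K(:, k);     % from d_k * f(x_k) = 1 at a support vector
    x  = randn(3, 1);
    fx = (alpha .* d.').' * ((X.'*x + 1).^p) + b % hypersurface now lies nearer the not-surge class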


Table 7.9: ASVM surge map modeling results for all training and testing points

Point Number   Surge Points Output   Not-Surge Points Output
 1              1.1660               -0.0778
 2              0.8310               -0.3205
 3              1.0187               -0.2097
 4              0.8289               -0.2222
 5              0.9041               -0.0107
 6              0.8726               -0.2829
 7              0.6728               -0.1474
 8              0.9370               -0.1090
 9              1.0668               -0.0402
10              0.9914               -0.2224
11              1.0398               -0.1109
12              1.0462               -0.0719
13              1.0102               -0.1733
14              1.0173               -0.0471
15              0.8531               -0.0943
16              1.0507               -0.6450
17              1.9312               -1.0385
18             18.5469               -0.6791
19              0.8873               -0.8232
20              0.0965                1.9924
21              5.1689                3.5898
22              2.2608                4.7556
23              1.6311
24              1.7033
25              3.8548


7.3.6 Model Validation

The model obtained from the asymmetric support vector machine method was implemented on the same compressor used to conduct the surge tests. The testing occurred in January and May of 2006. The model was written as a program in RSLogix 5000. The program received inputs from the temperature sensor positioned after stage 1, the ambient humidity sensor, and the IGV position potentiometer. The three inputs were processed through the ASVM network to produce a single output, which indicates the distance from the surge line of the operating condition defined by the current temperature, humidity, and IGV position. A positive distance indicates that surge is predicted for the current operating condition, and a negative distance indicates that surge is not predicted. The model output during the first surge test performed on January 18, 2006 is shown in Figure 7.13, along with the relative humidity, IGV opening, first stage outlet temperature, and pressure differential. The actual surge event is circled on the distance-to-surge plot and the pressure differential plot.


Figure 7.13: Second surge test performed on January 18, 2006


It can be seen from Figure 7.13 that as surge is approached, the distance to the surge line becomes less negative, confirming that surge is drawing closer. However, the model output is expected to cross zero before the surge event occurs, and it can be seen from Figure 7.13 that this does not happen. The reason for this incorrect prediction can be found by inspecting the data collected during this test. The first stage discharge air temperature during this test is approximately 236.5 °F, whereas the lowest first stage discharge air temperature in the data used to train the model is 245.8 °F. The ASVM model, trained with the warmer data collected in summer and fall, does not account for the colder temperatures seen during the verification test performed in winter. More training data must be collected during the colder winter months, and the model must be re-trained with the colder temperature data in order to produce a map that is accurate year-round. Plots of data collected during an additional surge test performed on the cold day of January 18 can be seen in Appendix E, Figure E.1.

Surge tests were also performed on January 13 and May 24 of 2006; the tests on these warmer days verify the surge map. Figure 7.14 shows the three variables used for the map (relative humidity, IGV opening, and outlet temperature for stage 1) plotted with the output of the RSLogix program, representing the algebraic distance to surge, and with the pressure differential across the IGV, indicating the onset of surge.


Figure 7.14: First surge test performed on May 24, 2006


When the algebraic distance to surge is negative, surge is not predicted; when it is positive, surge is predicted. From Figure 7.14, it can be seen that at 0 seconds, surge is predicted to be imminent, based on the positive algebraic distance to surge. At approximately 2 seconds, the relative humidity drops and the algebraic distance to surge decreases to approximately 0, indicating that a surge condition is still present but surge is less imminent. At approximately 25 seconds, the IGV closes a small amount, which increases the surge danger, and the algebraic distance to surge increases, moving deeper into the surge region. Approximately 15 seconds after this increase, surge occurs. Additional plots from the tests performed on January 13 and May 24 can be seen in Appendix E, Figures E.2-E.5.


Chapter 8
Conclusions

This chapter presents a summary of this thesis, including the results that have been obtained and the conclusions that have been drawn from them. Section 8.1 reviews the main goals and procedures of the research and outlines its main findings. Section 8.2 isolates the research that is specifically a contribution of this thesis. Section 8.3 summarizes future work that could enhance this research, and Section 8.4 presents some concluding remarks concerning the importance and application possibilities of this research.

8.1 Thesis Research Overview

Surge is a phenomenon consisting of large pressure and airflow oscillations that occurs in centrifugal air compressors. The large pressure and airflow oscillations characteristic of surge can cause damage to the compressor. By accurately determining the conditions that cause surge, this damage can be prevented. While it is known that surge occurs during low-airflow, high-pressure operation, the specific conditions under which surge occurs are not accurately known. Based on data-driven modeling techniques, this thesis research aims to provide more accurate surge maps that are adaptive to ambient conditions.

This research attempts to determine the conditions that cause surge by viewing surge map modeling as a pattern classification problem. Conditions that cause surge are identified as “surge points” and conditions that do not cause surge are identified as “not-surge points.” A pattern classification method is used to locate the separating hypersurface between these points. By viewing the problem in this way, it is possible to account for many variables, including ambient air conditions.

Principal components analysis was used to reduce the dimensionality of the surge and not-surge data used for classification. This was necessary to prevent numerical instability, to identify the variables that are most significant to surge, and to make the resulting model implementable.

Two methods were investigated for modeling surge from the dimensionality-reduced surge data: the multilayer perceptron and the support vector machine. The multilayer perceptron was used as a preliminary solution. The support vector machine was used to locate the optimal separating hypersurface between the surge and not-surge points. The method of asymmetric support vector machine was then developed in order to locate the separating hypersurface closer to the not-surge points, and thus reduce the possibility of a missed surge prediction.

Surge tests were performed on a testbed compressor at Toyota Motors Manufacturing in Kentucky, Inc. by on-site engineers. Sensors were installed throughout the compressor, and measurements were taken while the compressor was put through a surge event. The collected data were used to determine the conditions under which surge occurred, and to model the surge map using the pattern classification methods described. The two most important findings of this research are the identification of the variables that contribute most to surge and the determination of effective methods for modeling nonlinear data such as surge data.


The identification of which variables contribute most to surge was accomplished through the principal components analysis method. First, PCA was used to determine the direction vectors along which most of the separation of the data is achieved. The largest elements of these direction vectors then indicate which variables cause the most separation and contribute most to surge. It was found that the variables contributing most to surge are the ambient humidity and the temperature variables within the compressor. This finding is significant because, previously, air condition variables have not been used in surge mapping. The research done here indicates that not only are these variables important for surge prediction; they are, in fact, the most important variables, and they must be incorporated in order to obtain an accurate mapping.

Two pattern classification methods were tested and compared in the surge mapping process: the multilayer perceptron and the support vector machine. It was found that while the multilayer perceptron is sometimes accurate in the modeling, it is not consistent. The method of support vector machine was found to provide a more consistent model by locating the optimal solution to the modeling problem.

The surge map obtained by the support vector machine model was tested on the same compressor used to perform the surge tests. The surge map was shown to be inaccurate on days when the temperature was colder than the temperatures of the training data, indicating the need for cold-weather data to be incorporated into the model. During the warmer days of testing, when conditions fell within the ranges covered by the training data, the model correctly predicted the onset of surge.


8.2 Research Contributions

The major contributions of this thesis include three aspects: (1) the development of a data-driven modeling approach for obtaining surge maps, (2) the incorporation of ambient air conditions into a surge mapping model, and (3) the development of the method of asymmetric support vector machine to manipulate the distance between data points and the separating hypersurface.

Ambient air conditions have not previously been incorporated into surge maps although, as is empirically known, they have a significant impact on the actual surge map. The lack of such information in past surge maps has led to low efficiency and limited dynamic range for air compressors. Since the location where a compressor is tested to determine a surge map is often not the location where the compressor will be operated, the air compressor operator is forced to adapt surge map information to the climate of the operating location. Also, in locations with significant seasonal change, operators must re-calibrate surge map information seasonally to account for the air conditions of the season. By incorporating humidity and temperature information into the surge model, compressor operators are spared the additional work of frequent re-calibration: compressor operation is safe from the potential dangers of surge during an unseasonably warm day, and the compressor gains efficiency during colder days by exploiting the additional safe operating space gained at cold temperatures.

This research has also contributed the development of the asymmetric support vector machine method. This method improves on the traditional support vector machine by giving the user control over the distance between the points to be classified and the separating hypersurface. In some applications, as in the surge mapping application, a misclassification in one class is more dangerous than a misclassification in the other class. Previously, there has been no way to account for this. With the method of asymmetric support vector machine, the user can decide how much closer the hypersurface should be to one class and change the parameters accordingly. This method gives the user more control over the model output than has previously been possible. It is a general modification to the support vector machine method, and it can be applied to many other applications as well.

8.3 Future Work

In order to make the surge models obtained fully implementable on an air compressor, the models must account for the full range of possible ambient air conditions. The tests performed for this research spanned only the months from August until November, so the conditions covered did not include the typical cold, dry conditions present in the winter months in Kentucky. As a result, the surge model obtained is not accurate when the ambient temperature and humidity fall outside the ranges that occur between August and November; in the colder winter months, the model is not accurate. Surge tests must be performed during the cold winter months, and the data must be incorporated into the model. Once the model is accurate for a sufficient range of ambient conditions, work can begin to extend the surge model to other types of compressor systems and to other climates.

The support vector machine is a method that is applicable only to relatively small data sets. The larger the data set used for training, the more convoluted is the surface that is found [65]. When an overly convoluted surface is found to separate classes of data, the data is said to be “overfit.” To avoid overfitting on a larger data set, a smaller set of training data must be chosen from among the available data [65]; this also solves the long-training-time problem. Work must be done to efficiently choose effective training data from among the available data. In addition to the problem of overfitting, a large data set also causes problems in training due to memory and time constraints: the support vector machine method is a quadratic optimization problem with a number of variables equal to the number of data points, and if the number of data points is large, this optimization problem becomes difficult to solve [67, 68]. Modifications to the SVM method will have to be made in order to allow for a large data set.

The choice of kernel for the support vector machine is another possible area for study. This research has only considered the polynomial kernel, chosen because its dimensionality can be manipulated. However, it is not known whether another kernel, such as the radial-basis function or two-layer perceptron kernels, would produce better results.

Surge mapping is a method of surge avoidance. Due to the unpredictable nature of the factors that affect surge, such as ambient air conditions, it is feasible that surge avoidance may fail on occasion. Should this happen, surge must be halted promptly, so it would be necessary to detect surge early enough to initiate measures to halt it before damage occurs. Further study is needed to improve methods of surge detection.


Use of surge detection could also contribute to an improvement in surge mapping by creating an adaptive surge map. The surge map obtained by neural network methods is only accurate within the conditions accounted for in the training data; if a condition occurs during implementation that is not represented in the training data, the new operating condition may be misclassified. However, if a method of surge detection is employed, such an operating condition would be determined to be a surge point and incorporated as a training point, and the previously unaccounted-for condition would then be accounted for. In this way, the surge map could be made adaptive by integrating a surge detector with the surge map.

The research for this thesis has focused specifically on centrifugal air compressors. Work is needed to investigate how this method may also be applied to other types of compressors, such as axial airflow compressors.

8.4 Concluding Remarks

Compressed air is an energy source that is widely used in many industries, and there is great potential for benefit from improving the efficiency of air compressors. Air compressor efficiency is limited by the operating range restrictions imposed by the threat of surge. Accurate and dependable prediction of surge can effectively increase the operating range of a compressor, thus increasing efficiency. The methods used in this study have proven to be effective for accurately modeling surge. Due to the incorporation of ambient air conditions, the model obtained can be extended to apply not only to compressors in Kentucky but to compressors in any climate. The accurate surge model will allow compressors the additional operating range needed to operate in the highly efficient region near the surge limit. This efficiency increase will result in lower costs for the many industries that depend on compressed air as a primary energy source.


References

[1] Western Interstate Energy Board, March 2001. http://www.westgov.org/wieb/ap2forum/mar2001/aircompress.htm
[2] B. de Jager, “Rotating Stall and Surge Control: A Survey,” Proceedings of the 34th Conference on Decision & Control, 1995, pp. 1857-1862
[3] P. C. Hanlon, Compressor Handbook, New York: McGraw-Hill, 2001
[4] N. P. Cheremisinoff and P. N. Cheremisinoff, Compressors and Fans, Prentice Hall, Englewood Cliffs, New Jersey, 1992, pp. 132-156
[5] J. Lucius, “Turbocharger Compressor Flow Maps for 3000GT and Stealth Owners.” http://www.stealth316.com/2-3s-compflowmaps.htm
[6] Micon Centrifugal Compressor Control Considerations, Micon Systems. http://www.miconsystems.com/papers/centrif.htm
[7] J. T. Gravdahl and O. Egeland, Compressor Surge and Rotating Stall, Springer-Verlag London Limited, 1999, pp. 14-17, 39-43
[8] R. L. Elder and M. E. Gill, “Discussion of the Factors Affecting Surge in Centrifugal Compressors,” Journal of Engineering for Gas Turbines and Power, Vol. 107, 1985, pp. 499-506
[9] J. T. Gravdahl, F. Willems, B. de Jager, and O. Egeland, “Modeling for surge control of centrifugal compressors: comparison with experiment,” Proceedings of the IEEE Conference on Decision and Control, Vol. 2, 2000, pp. 1341-1346
[10] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Second edition, John Wiley & Sons, Inc., New York, 2004, p. 20
[11] S. Haykin, Neural Networks: A Comprehensive Foundation, Second Edition, New Jersey: Prentice Hall, 1999
[12] M. P. Wernet, M. M. Bright, and G. J. Skoch, “An Investigation of Surge in a High-Speed Centrifugal Compressor Using Digital PIV,” ASME Journal of Turbomachinery, Vol. 123, 2001, pp. 418-428
[13] D. A. Fink, N. A. Cumpsty, and E. M. Greitzer, “Surge Dynamics in a Free-Spool Centrifugal Compressor System,” ASME Journal of Turbomachinery, Vol. 114, 1992, pp. 321-332
[14] K. K. Botros, “Transient Phenomena in Compressor Stations During Surge,” Journal of Engineering for Gas Turbines and Power, Vol. 116, 1994, pp. 133-142


[15] A. Elzahby, A. Ghenaiet, and S. Elfeki, “Theoretical Prediction of Radial Flow Compressor Surge Line,” Modeling, Measurement, and Control, Vol. 41, 1994, pp. 53-63
[16] J. T. Gravdahl and O. Egeland, “Speed and Surge Control for a Low Order Centrifugal Compressor Model,” Modeling, Identification and Control, Vol. 19, 1998, pp. 13-29
[17] E. Logan Jr., Turbomachinery, Basic Theory and Applications, Marcel Dekker, Inc., New York, 1981
[18] R. N. Brown, Compressors Selection and Sizing, Houston, TX: Gulf Publishing Company, 1986
[19] G. L. Arnulfi, P. Giannattasio, C. Giusto, A. F. Massardo, D. Micheli, and P. Pinamonti, “Multistage Centrifugal Compressor Surge Analysis: Part II – Numerical Simulation and Dynamic Control Parameters Evaluation,” ASME Journal of Turbomachinery, Vol. 121, 1999, pp. 312-320
[20] F. Willems and B. de Jager, “Modeling and Control of Compressor Flow Instabilities,” IEEE Control Systems Magazine, Vol. 19, No. 5, Oct. 1999, pp. 8-18
[21] A. Stein, S. Niazi, and L. N. Sankar, “Computational Analysis of Stall and Separation Control in Centrifugal Compressors,” Journal of Propulsion and Power, Vol. 16, 2000, pp. 65-71
[22] S. Seung, Multilayer Perceptrons and Backpropagation Learning, September 17, 2002. http://hebb.mit.edu/courses/9.641/lectures/lecture04.pdf#search='multilayer%20perceptrons%20and%20backpropagation%20learning%20Seung'
[23] “Perceptron,” Wikipedia, 14 March 2006. http://en.wikipedia.org/wiki/McCullochPitts_neuron
[24] O. Veksler, “Pattern Recognition,” U. of Western Ontario. http://www.csd.uwo.ca/faculty/olga/Courses/CS434a_541a/Lecture12.pdf#search='history%20of%20the%20Multilayer%20perceptron'
[25] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines, World Scientific, River Edge, NJ, Nov. 2002
[26] C. Moallemi, “Classifying Cells for Cancer Diagnosis Using Neural Networks,” IEEE Expert, Vol. 6, Issue 6, Dec. 1991, pp. 8, 10-12
[27] M. A. Sartori, K. M. Passino, and P. J. Antsaklis, “A Multilayer Perceptron Solution to the Match Phase Problem in Rule-Based Artificial Intelligence Systems,” IEEE Transactions on Knowledge and Data Engineering, Vol. 4, No. 3, June 1992, pp. 290-297


[28] S. K. Pal and S. Mitra, “Multilayer Perceptron, Fuzzy Sets, and Classification,” IEEE Transactions on Neural Networks, Vol. 3, No. 5, September 1992, pp. 683-697
[29] M. S. Obaidat and D. S. Abu-Saymah, “Methodologies for Characterizing Ultrasonic Transducers Using Neural Network and Pattern Recognition Techniques,” IEEE Transactions on Industrial Electronics, Vol. 39, No. 6, December 1992, pp. 529-536
[30] Matlab Neural Network Toolbox Support, MathWorks, Inc. http://www.mathworks.com/access/helpdesk/help/toolbox/nnet/
[31] L. Wehenkel, “Applied Inductive Learning,” U. of Liege, October 2000. http://www.montefiore.ulg.ac.be/~lwh/AIA/notesaia.pdf#search='Applied%20Inductive%20Learning%20Louis%20Wehenkel%20October%202000'
[32] M. G. Bello, “Enhanced Training Algorithms, and Integrated Training/Architecture Selection for Multilayer Perceptron Networks,” IEEE Transactions on Neural Networks, Vol. 3, No. 6, November 1992, pp. 864-875


[40] B.E. Boser, I.M. Guyon, and V.N. Vapnik, “A Training Algorithm for Optimal Margin Classifiers,” Proceedings of the Fifth Annual ACP Workshop on Computational Learning Theory, 1992, pp. 144-152 [41] G. Gomez-Perez, G. Camps-Valls, J. Gutierrez, and J. Malo, “Perceptual Adaptive Insensitivity for Support Vector Machine Image Coding,” IEEE Transactions on Neural Networks, Vol. 16, No. 6, November 2005, pp. 1574-1581 [42] A. Ganapathiraju, J. E. Hamaker, and J. Picone “Applications of Support Vector Machines to Speech Recognition,” IEEE Transactions on Signal Processing, Vol. 52, Issue 8, Aug. 2004, pp. 2348-2355 [43] F. Melgani and L. Bruzzone, “Classification of Hyperspectral Remote Sensing Images With Support Vector Machines,” IEEE Transactions on Geoscience and Remote Sensing, Vol. 42, No. 8, August 2004, pp. 1778-1790 [44] Y. Fu, R. Shen, and H. Lu, “Watermarking Scheme Based on Support Vector Machine for Colour Images,” Electronic Letters, Vol. 40, No. 16, 5 August 2004, pp.986987 [45] Q. Tao, G. Wu, F. Wang, and J. Wang, “Posterior Probability Support Vector Machines for Unbalanced Data,” IEEE Transactions on Neural Networks, Vol. 16, No. 6, November 2005, pp. 1561-1573 [46] D. A. Sadlier and N. E. O’Connor, “Event Detection in Field Sports Video Using Audio-Visual Features and a Support Vector Machine,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 10, October 2005, pp. 1225-1233 [47] L. Zhang, W. Zhou, and L. Jiao, “Hidden Space Support Vector Machines,” IEEE Transactions on Neural Networks, Vol. 15, No. 6, November 2004, pp. 1424-1434 [48] “Turbomachinery,” Wikipedia, 7 May 2006. http://en.wikipedia.org/wiki/Turbomachinery [49] Y. D. Chen, R. Du, and L. S. Qu, “Fault features of large rotating machinery and diagnosis using sensor fusion,” Journal of Sound and Vibration, Vol. 188, No. 2, November 1995, pp. 227-242 [50] S. Nagendra, J. B. Staubach, A. J. Suydam, S.J. Ghunakikar, and V.R. Akula, “Optimal rapid multidisciplinary response networks: RAPIDDISK,” Structural and Multidisciplinary Optimization, Vol. 29, No. 3, March 2005, pp. 213-231 [51] H.Y. Gan, “Inverse Design Method of Diffuser Blades by Genetic Algorithms,” Proceedings of the Institution of Mechanical Engineers, Part A: Journal of Power and Energy, Vol. 212, No. 4, 1998, pp. 261-268


[52] S. Pierret and R.A. Van den Braembussche, “Turbomachinery Blade Design Using a Navier-Stokes Solver and Artificial Neural Network,” Journal of Turbomachinery, Transactions of the ASME, Vol. 121, No. 2, April 1999, pp. 326-332 [53] J.C. Pascoa, A.C. Mendes, L.M.C. Gato, and R. Elder, “Aerodynamic Design of Turbomachinery Cascades Using an Enhanced Time-Marching Finite Volume Method,” Modeling in Engineering and Sciences, Vol. 6, No. 6, December 2004, pp. 537-546 [54] F. Zhou, G. Feng, and H. Jiang, “The Development of Highly Loaded Turbine Rotating Blades by Using 3D Optimization Design Method of Turbomachinery Blades Based on Artificial Neural Network and Genetic Algorithm,” Chinese Journal of Aeronautics, Vol. 16, No. 4, November 2003, pp. 198-202 [55] M.M. Rai and N.K. Madavan, “Applications of Artificial Neural Networks to the Design of Turbomachinery Airfoils,” Journal of Propulsion and Power, Vol. 17, No.1, January/February 2001, pp.176-183. [56] A. E. El-Alfy, N.H. Mostafa, and E.S. El-Mtwally, “On Line Knowledge Base System for Predictive Maintenance of Gas Reforming and Turbo Machinery: Case Study on Chemical Ammonia Plant at Talkha Egypt,” Advances in Intelligent Systems and Computer Science, Advances in Intelligent Systems and Computer Science, 1999, pp. 201-206 [57] F. Huiyuan, X. Guang, and W. Shangjin, “Study on Turbomachinery Performance Prediction with Neural Networks,” Chinese Journal of Mechanical Engineering (English Edition), Vol. 13, No. 1, March 2000, pp. 52-57 [58] M. Kalkat, S. Yildirim, and I. Uzmay, “A Neural Network for Analysis of Vibration in Mechanical Systems Arising from Unbalance,” Neural Network World, Vol. 11, No. 2, 2001, pp. 175-188 [59] E.D. Dimapohonas, “Development of Learning Expert Systems for Early Diagnostics of Turbine Machines on the Basis of Neural Networks,” Teploenergetika, No. 10, October 1993, pp. 68-71 [60] S. Yuan and F. Chu, “Support Vector Machines-Based Fault Diagnosis for TurboPump Rotor,” Mechanical Systems and Signal Processing, Vol. 20, No. 4, May 2006, pp. 939-952 [61] T. Barker, “Westinghouse SmartProcess: Seeking Optimum Operation,” Turbomachinery International, Vol. 41, No. 4, July 2000, pp. 19-20 [62] M.A. Karkoub, O.E. Gad, and M.G. Rabie, “Predicting Axial Piston Pump Performance Using Neural Networks,” Mechanism and Machine Theory, Vol. 34, No. 8, November 1999, pp. 1211-1226


[63] E. Arcaklioglu and I. Celikten, “A Diesel Engine’s Performance and Exhaust Emissions,” Applied Energy, Vol. 80, No. 1, January 2005, pp. 11-22 [64] B. G. Batchelor, ed. Pattern Recognition Ideas in Practice. Plenum Press, New York and London, 1978 [65] Y. Zhan and D. Shen, “Design Efficient Support Vector Machine for Fast Classification,” Pattern Recognition, Vol. 3, Issue 1, 2005, pp. 157-161 [66] W. S. Meisel, Computer-Oriented Approaches to Pattern Recognition, New York, Academic Press, 1972 [67] E. Osuna, R. Freund, and F. Girosi, “An Improved Training Algorithm for Support Vector Machines,” Neural Networks for Signal Processing [1997] VII. Proceedings of the 1997 IEEE Workshop, 24-26 Sept. 1997, pp. 276-285 [68] T. Joachims, “Making Large-Scale SVM Learning Practical,” Advances in Kernel Methods – Support Vector Learning, MIT Press, Cambridge, USA, 1998, pp. 41-56


Appendix A
Compressor Operation Data

Appendix A contains information about data collection from the TMMK centrifugal air compressor. Four tables in this appendix detail the historical data collected from the air compressor; the surge and not-surge data points identified from the historical data, both before and after scaling; and the covariance matrix calculated from the historical data. The one figure in this appendix illustrates the locations of the sensors installed throughout the TMMK testbed centrifugal air compressor.

Table A.1 is a table of one set of data collected during a surge event. In this table, the row marked (s) is the data point identified as a surge point, and the row marked (ns) is the data point identified as a not-surge point.

Table A.1: One set of historical data collected during a surge event (all samples recorded on 18-Mar-05)

Columns: Time; Outlet press (psig); Current (amp); Inlet pressure 2nd stage (psig); Inlet pressure 3rd stage (psig); Inlet temp 2nd stage (F); Inlet temp. 3rd stage (F); Inlet pressure; Elec. power (KW); Voltage (AC); inlet guide vane (%); blow off valve (%).

15:08:00   96.7466  191.7383  33.8472  51.7548  84.0810  82.1888  14.6  1257.4010  4127.2050   39.7973  100
15:08:05   95.9164  196.0324  34.9056  52.6765  84.0810  82.1888  14.6  1311.2678  4128.6699   45.2359  100
15:08:10   95.8432  198.4038  35.4220  53.0549  84.0810  82.1888  14.6  1310.0469  4125.2519   47.6042  100
15:08:15   96.3559  203.8301  36.0007  53.5921  84.0810  82.1888  14.6  1322.8651  4124.2753   48.9470  100
15:08:20   96.8076  203.2533  36.2534  53.7752  84.0810  82.1888  14.6  1349.4171  4128.6699   49.1301  100
15:08:25   97.2349  205.3470  36.2351  53.8729  84.0810  82.1888  14.6  1335.9885  4127.6933   47.6469  100
15:08:30   97.4668  203.8515  36.1215  53.7264  84.0810  82.1888  14.6  1336.1411  4129.1582   44.8025  100
15:08:35   97.1861  199.8779  35.7260  53.3174  84.0810  82.1888  14.6  1311.2678  4130.1347   45.3274  100
15:08:40   96.3803  202.3988  35.8688  53.3174  84.0810  82.1888  14.6  1338.1248  4127.6933   48.4526  100
15:08:45   96.4902  201.9074  36.0483  53.5860  84.0810  82.1888  14.6  1336.9043  4123.2983   49.5330  100
15:08:50   96.7344  202.8901  36.3047  53.8362  84.0810  82.1888  14.6  1349.2645  4128.6699   49.7527  100
15:08:55   97.5401  201.7151  35.7700  53.4334  84.0810  82.1888  14.6  1335.6833  4130.1347   46.8900  100
15:09:00   97.7720  199.8779  35.7919  53.4822  85.3323  82.8297  14.6  1337.2093  4129.6464   44.3447  100
15:09:05   97.6133  198.4038  35.4953  53.0733  85.3323  82.8297  14.6  1309.1314  4123.2983   41.4148  100
15:09:10   97.2227  199.8138  35.3341  52.9023  85.3323  82.8297  14.6  1310.1995  4124.7636   41.7200  100
15:09:15   97.2471  198.1261  35.2865  52.9146  85.3323  82.8297  14.6  1309.5892  4124.2753   41.4637  100
15:09:20   97.9552  196.7161  34.7958  52.5056  85.3323  82.8297  14.6  1284.2580  4129.6464   36.8064  100
15:09:25   98.4190  187.4656  32.5361  51.0407  85.3323  82.8297  14.6  1216.0471  4127.6933   30.3546  100
15:09:30   98.7975  182.8724  32.0417  50.8453  85.3323  82.8297  14.6  1203.0764  4128.6699   28.2793  95.12
15:09:35   98.8585  180.4156  31.7780  50.7172  85.3323  82.8297  14.6  1161.5699  4168.2231   27.3637  79.26
15:09:40   98.2237  177.1470  31.2140  50.0640  85.3323  82.8297  14.6  1163.4011  4125.2519   26.1429  66.09
15:09:45   97.0640  177.2111  30.6720  49.3987  85.3323  82.8297  14.6  1148.1413  4128.6699   25.5325  69.26
15:09:50   96.7466  173.6647  30.3351  49.1912  85.3323  82.8297  14.6  1133.4920  4170.1762   24.3117  89.42
15:09:55   96.7832  172.2547  30.0604  49.0691  85.3323  82.8297  14.6  1132.2712  4127.2050   23.3962  97.02
15:10:00   96.4658  170.5670  29.8773  48.8860  82.1583  81.6089  14.6  1133.4920  4129.1582   24.5925  100
15:10:05   96.2583  172.6820  30.1226  49.1057  82.1583  81.6089  14.6  1132.4238  4166.7583   27.3148  100
15:10:10   96.1850  172.3188  30.1263  49.1485  82.1583  81.6089  14.6  1133.4920  4124.2753   28.5722  100
15:10:15   95.9897  173.9638  30.1849  49.1240  82.1583  81.6089  14.6  1148.1413  4124.7636   32.0637  100
15:10:20   95.6723  174.5620  30.1922  48.9776  82.1583  81.6089  14.6  1149.5147  4172.6181   37.6670  100
15:10:25   95.8310  176.1429  30.2765  49.0691  82.1583  81.6089  14.6  1132.2712  4127.6933   41.4331  100
15:10:30   96.1484  175.3738  30.3827  49.2766  82.1583  81.6089  14.6  1148.9042  4123.2983   42.4769  100
15:10:35   95.9409  175.6943  30.4815  49.2034  82.1583  81.6089  14.6  1163.4011  4126.7167   45.9012  100
15:10:40   95.3061  177.5743  30.4339  49.1301  82.1583  81.6089  14.6  1162.0277  4124.2753   52.3713  100
15:10:45   94.9642  177.7452  30.4083  48.9959  82.1583  81.6089  14.6  1174.6932  4124.7636   55.9543  100
15:10:50   94.3172  179.2406  30.6976  49.0996  82.1583  81.6089  14.6  1176.5244  4127.6933   66.2088  100
15:10:55   94.5858  213.8924  38.9782  55.9604  82.1583  81.6089  14.6  1432.4299  4126.7167   73.5335  100
15:11:00   94.9276  218.2292  39.4286  56.2839  82.1583  79.8693  14.6  1445.7059  4128.1816   77.9710  100
15:11:05   95.9164  219.1265  39.7106  56.6013  82.1583  79.8693  14.6  1447.6896  4129.6464   79.9914  100
15:11:10   96.1728  219.7888  39.8461  56.8271  82.1583  79.8693  14.6  1458.6766  4129.1582   80.2478  100
15:11:15   96.6855  220.3228  39.8608  56.8882  82.1583  79.8693  14.6  1446.1636  4124.7636   80.5285  100
15:11:20   97.2471  218.1010  39.9084  56.9492  82.1583  79.8693  14.6  1447.5371  4127.6933   78.6913  100
15:11:25   97.7232  220.0238  39.9157  56.9614  82.1583  79.8693  14.6  1445.8585  4126.2285   74.7970  100
15:11:30   98.0406  220.2801  39.8278  56.9553  82.1583  79.8693  14.6  1447.5371  4124.2753   73.1550  100
15:11:35   98.9074  216.9474  39.5348  56.6379  82.1583  79.8693  14.6  1434.1085  4129.1582   64.8110  100
15:11:40   99.7741  209.5769  37.7769  55.1913  82.1583  79.8693  14.6  1392.4494  4126.2285   60.9717  100
15:11:45  100.201   207.2483  37.3924  54.8922  82.1583  79.8693  14.6  1366.2027  4122.8100   48.9409  100
15:11:50  100.665   194.3447  34.4552  52.8657  82.1583  79.8693  14.6  1214.6737  4126.2285   35.4452  100
15:11:55  100.506   176.2071  31.2836  50.9491  82.1583  79.8693  14.6  1131.8134  4123.2983   24.0859  100    (ns)
15:12:00   98.4190  134.0139  30.6866  49.1729  85.5459  83.5622  14.6   358.4508  4167.2465   24.0066    0    (s)
15:12:05   95.7211  151.0407  11.7890  27.7971  85.5459  83.5622  14.6   983.3363  4168.7114   14.3929    0
15:12:10   93.0110  131.9843  10.1117  24.8184  85.5459  83.5622  14.6   821.4307  4172.1298   13.4773    0
15:12:15   91.8513  131.4930  10.1117  24.7817  85.5459  83.5622  14.6   834.2489  4172.6181   18.9830    6.732
15:12:20   90.5328  132.2621  10.9979  27.2660  85.5459  83.5622  14.6   861.2586  4124.7636   42.8065   26.88
15:12:25   89.7271  137.8807  16.4511  34.3099  85.5459  83.5622  14.6   874.2293  4130.1347   64.8110   45.19
15:12:30   89.1778  142.4525  25.2847  43.6366  85.5459  83.5622  14.6   915.4306  4128.1816   78.8561   58.02
15:12:35   91.2042  211.8628  39.7070  56.4304  85.5459  83.5622  14.6  1487.5175  4128.6699   97.7903   78.18
15:12:40   91.9123  220.5792  39.8901  56.6440  85.5459  83.5622  14.6  1487.0598  4127.6933  100        85.50
15:12:45   93.4383  224.0615  40.1135  57.0713  85.5459  83.5622  14.6  1487.9753  4128.6699   99.9816  100
15:12:50   94.9032  225.9842  40.2563  57.3399  85.5459  83.5622  14.6  1473.9364  4126.7167  100       100
15:12:55   96.1972  225.3219  40.3552  57.4681  85.5459  83.5622  14.6  1488.5856  4124.7636   99.5361  100
15:13:00   96.9541  224.6597  40.4504  57.7061  85.4239  81.1817  14.6  1474.2415  4129.1582   98.1810  100
15:13:05   97.8453  225.2151  40.5493  57.9136  85.4239  81.1817  14.6  1474.5467  4123.2983   96.3376   97.51
15:13:10   98.0162  223.0360  40.5749  57.9320  85.4239  81.1817  14.6  1474.5467  4123.2983   96.0324   95.91
15:13:15   98.0284  222.6942  40.5859  57.9136  85.4239  81.1817  14.6  1474.6993  4123.2983   94.8116   92.57
15:13:20   98.1383  225.6638  40.5603  57.9197  85.4239  81.1817  14.6  1488.8908  4128.6699   93.5909   89.73
15:13:25   98.9196  225.3433  40.5859  57.9197  85.4239  81.1817  14.6  1476.0727  4126.2285   92.6753   82.88
15:13:30   99.0783  222.8865  40.5310  57.9991  85.4239  81.1817  14.6  1473.9364  4122.3217   91.7597   78.09
15:13:35   98.9562  222.9296  40.5420  57.8038  85.4239  81.1817  14.6  1474.2415  4128.6699   90.5450   71.78
15:13:40   98.8463  223.5060  40.4504  57.6939  85.4239  81.1817  14.6  1488.2805  4123.7866   89.6233   65.69
15:13:45   98.6998  222.3310  40.4431  57.6817  85.4239  81.1817  14.6  1487.6701  4127.6933   89.0130   64.43
15:13:50   98.2481  225.1297  40.3039  57.4925  85.4239  81.1817  14.6  1460.9655  4128.6699   87.7922   60.74
15:13:55   97.6866  222.6087  40.3112  57.3216  85.4239  81.1817  14.6  1474.8519  4128.6699   86.5714   59.55


Tables A.2 and A.3 list the motor current and IGV opening data for all surge and not-surge points collected from historical data. Table A.2 lists the data points before scaling, and Table A.3 lists the same points after scaling. In these tables, the training points are shaded in gray and the testing points are not shaded. A minimal sketch of the scaling step is given after Table A.3.

Table A.2: Motor current and IGV opening data for the surge and not-surge points identified from historical data, before scaling

          Surge Points                      Not Surge Points
  Motor Current   IGV Opening       Motor Current   IGV Opening
      (Amp)           (%)               (Amp)           (%)
    134.0139        24.0066           176.2071        24.0859
    157.7275        22.4379           171.4216        20.0207
    163.9657        23.4023           173.1521        21.8885
    131.1939        19.8865           170.1611        20.9180
    123.7594        23.3901           179.2834        28.4197
    139.6325        20.3992           177.3606        25.6851
    150.1862        28.4319           172.2547        21.0339
    144.2257        22.7858           172.3615        24.3850
    154.4162        28.3831           174.8611        25.3799
    123.0544        21.8763           174.4552        21.3391
    153.3266        34.9997           165.8670        20.1245
    159.2657        20.6495           172.6606        20.8752
    139.4189        19.4287           171.1011        22.3646
    160.5689        17.9576           177.3606        21.0584
    144.4821        23.2436           177.9161        26.6129
    148.7548        20.2771           174.0065        21.1682
    157.4712        19.6484           170.3961        18.8793
    121.6871        18.9709           171.9770        22.1449
    123.2680        20.3321           180.6293        27.6078
    153.7325        18.4643           179.0056        21.2415

Gray fill = Training Points; White fill = Testing Points [shading not preserved in this rendering]


Table A.3: Motor current and IGV opening data for the surge and not-surge points identified from historical data, after scaling

          Surge Points                      Not Surge Points
  Scaled Motor     Scaled IGV        Scaled Motor     Scaled IGV
     Current         Opening            Current         Opening
    7.457655        7.050094           9.805633        7.073398
    8.777278        6.589409           9.539331        5.87956
    9.124422        6.872632           9.635628        6.42808
    7.300727        5.840124           9.469189        6.143064
    6.887007        6.869047           9.976827        8.346107
    7.770323        5.990698           9.869831        7.543045
    8.357614        8.349692           9.585696        6.177123
    8.025925        6.691585           9.59164         7.161232
    8.593007        8.335352           9.730735        7.453418
    6.847775        6.424495           9.708148        6.266751
    8.532375        10.27847           9.230229        5.910033
    8.862875        6.064193           9.608284        6.130517
    7.758434        5.705683           9.521498        6.567899
    8.935395        5.273679           9.869831        6.184293
    8.040191        6.826026           9.900741        7.815513
    8.277961        5.954847           9.683182        6.216559
    8.763012        5.770215           9.482266        5.544353
    6.771688        5.571242           9.570241        6.503367
    6.859663        5.97098            10.05173        8.107698
    8.554963        5.42246            9.961372        6.23807

Gray fill = Training Points; White fill = Testing Points [shading not preserved in this rendering]
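The ratio between each unscaled value in Table A.2 and its scaled counterpart in Table A.3 is constant within each variable (about 17.97 for motor current and 3.41 for IGV opening), which is consistent with dividing each variable by its standard deviation computed over all forty points. The following is a minimal MATLAB sketch of that scaling step; the variable names are illustrative and not taken from the thesis code:

% Scale each variable by its population standard deviation (a sketch;
% surgeCurrent, notsurgeCurrent, surgeIGV, and notsurgeIGV are
% illustrative names for the four columns of Table A.2).
current = [surgeCurrent(:); notsurgeCurrent(:)];  % all 40 motor-current points
igv     = [surgeIGV(:);     notsurgeIGV(:)];      % all 40 IGV-opening points

scaledCurrent = current / std(current, 1);  % std(...,1) normalizes by N
scaledIGV     = igv     / std(igv, 1);      % reproduces Table A.3 if the
                                            % scaling assumption holds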

Figure A.1 is a diagram of the locations of sensors installed throughout the TMMK testbed centrifugal air compressor, unit number ACU 501-2.

Figure A.1: Diagram of sensor locations within the TMMK testbed compressor

Table A.4: Covariance matrix calculated from the scaled historical data.

Cov(x, y) =

⎡  32570   233899    4781   2656    5900    9459   14917   14682   18046    759389   19457 ⎤
⎢ 233899  1729502   34201  19019   42131   67376  106816  105100  129168   5436802  141462 ⎥
⎢   4781    34201     718    393     874    1396    2208    2172    2670    112360    2801 ⎥
⎢   2656    19019     393    219     485     775    1230    1210    1486     62589    1541 ⎥
⎢   5900    42131     874    485    1083    1728    2725    2681    3296    138677    3484 ⎥
⎢   9459    67376    1396    775    1728    2767    4353    4284    5267    221621    5599 ⎥
⎢  14917   106816    2208   1230    2725    4353    6912    6802    8346    351506    8645 ⎥
⎢  14682   105100    2172   1210    2681    4284    6802    6695    8214    345940    8506 ⎥
⎢  18046   129168    2670   1486    3296    5267    8346    8214   10088    424836   10503 ⎥
⎢ 759389  5436802  112360  62589  138677  221621  351506  345940  424836  17894658  440139 ⎥
⎣  19457   141462    2801   1541    3484    5599    8645    8506   10503    440139   13382 ⎦
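Table A.4 is the covariance matrix of the eleven scaled variables and is the input to the PCA algorithm of Chapter 4. A minimal MATLAB sketch of how such a matrix and its principal components could be computed, assuming the scaled historical data are stored in an n-by-11 matrix X with one row per sample (all names here are illustrative):

% PCA sketch: covariance matrix and its eigendecomposition.
Xc = X - repmat(mean(X,1), size(X,1), 1);  % mean-center each variable
C  = cov(X);                               % 11-by-11 covariance matrix, as in Table A.4
[V, D] = eig(C);                           % eigenvectors = principal directions
[lam, order] = sort(diag(D), 'descend');   % order by explained variance
V = V(:, order);                           % principal components, largest first
scores = Xc * V;                           % data projected onto the principal axes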


Appendix B

Multilayer Perceptron Code

%load data
load currentsurge.txt
load currentnotsurge.txt
load IGVsurge.txt
load IGVnotsurge.txt

%Values 2, 3, and 6 are outliers. These points are removed.

xpoints=[currentsurge' currentnotsurge'];
ypoints=[IGVsurge' IGVnotsurge'];
x1sig=0;
x2sig=0;
x1avg=(sum(xpoints)/(length(xpoints)));
x2avg=(sum(ypoints)/(length(ypoints)));
x=1;
while x
[listing truncated in the source]
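The listing breaks off at the start of the while loop. Given that x1sig and x2sig are initialized to zero and that x1avg and x2avg hold the sample means, the loop most plausibly accumulates squared deviations to form the standard deviations used for scaling. The following is a hypothetical reconstruction of that step, not the author's original code:

% Hypothetical continuation: accumulate squared deviations to obtain the
% standard deviation of each variable (x1 = motor current, x2 = IGV opening).
x=1;
while x <= length(xpoints)
    x1sig = x1sig + (xpoints(x) - x1avg)^2;
    x2sig = x2sig + (ypoints(x) - x2avg)^2;
    x = x + 1;
end
x1sig = sqrt(x1sig/length(xpoints));  % population standard deviation
x2sig = sqrt(x2sig/length(ypoints));
xscaled = xpoints/x1sig;              % scaled data, cf. Table A.3
yscaled = ypoints/x2sig;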
