Module 5 Design for Reliability and Quality IIT, Bombay
Lecture 4 Approach to Robust Design
Instructional Objectives
The primary objectives of this lecture are to outline the concept of robust design and the various tools used to achieve it for typical manufacturing processes.
Defining Robust Design
Robust design is an engineering methodology for improving productivity during research and development so that high-quality products can be produced quickly and at low cost. According to Dr. Genichi Taguchi, a robust design is one that is created with a system of design tools to reduce variability in a product or process while simultaneously guiding the performance towards an optimal setting. A product that is robustly designed will provide customer satisfaction even when subjected to extreme conditions on the manufacturing floor or in the service environment.
Tools for Robust Design
The Taguchi method, design of experiments and multiple regression analysis are some of the important tools used in robust design to produce high-quality products quickly and at low cost.
Taguchi Method
The Taguchi method is based on performing evaluations or experiments to test the sensitivity of a set of response variables to a set of control parameters (or independent variables) by arranging the experiments in an “orthogonal array”, with the aim of attaining the optimum setting of the control parameters. Orthogonal arrays provide a best set of well-balanced (minimum) experiments [1]. Table 5.4.1 shows eighteen standard orthogonal arrays along with the number of columns at different levels for these arrays [1]. An array name indicates the number of rows and columns it has, and also the number of levels in each of the columns. For example, the array L4 (2^3) has four rows and three 2-level columns. Similarly, the array L18 (2^1 3^7) has 18 rows, one 2-level column and seven 3-level columns; thus there are eight columns in the array L18. The number of rows of an orthogonal array represents the requisite number of experiments. The number of rows must be at least equal to the degrees of freedom associated with the factors, i.e. the control variables. In general, the number of degrees of freedom associated with a factor (control variable) is equal to the number of levels for that factor minus one. For example, consider a case study with one factor (A) at two levels and five factors (B, C, D, E, F) each at three levels. Table 5.4.2 depicts the degrees of freedom calculated for this case. The number of columns of an array represents the maximum number of factors that can be studied using that array.
Table 5.4.1  Standard orthogonal arrays [1]

Orthogonal  Number of  Maximum number  Maximum number of columns at these levels
array       rows       of factors      2     3     4     5
L4          4          3               3     -     -     -
L8          8          7               7     -     -     -
L9          9          4               -     4     -     -
L12         12         11              11    -     -     -
L16         16         15              15    -     -     -
L16'        16         5               -     -     5     -
L18         18         8               1     7     -     -
L25         25         6               -     -     -     6
L27         27         13              -     13    -     -
L32         32         31              31    -     -     -
L32'        32         10              1     -     9     -
L36         36         23              11    12    -     -
L36'        36         16              3     13    -     -
L50         50         12              1     -     -     11
L54         54         26              1     25    -     -
L64         64         63              63    -     -     -
L64'        64         21              -     -     21    -
L81         81         40              -     40    -     -
The signal-to-noise (S/N) ratios, which are log functions of the desired output, serve as the objective functions for optimization and help in the data analysis and the prediction of the optimum results. The Taguchi method treats optimization problems in two categories: static problems and dynamic problems. For simplicity, only the static problems are explained in detail in the following text. The complete procedure followed to optimize a typical process using the Taguchi method is then explained with an example.
Table 5.4.2  The degrees of freedom for one factor (A) at two levels and five factors (B, C, D, E, F) at three levels

Factor         Degrees of freedom
Overall mean   1
A              2 − 1 = 1
B, C, D, E, F  5 × (3 − 1) = 10
Total          12
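The bookkeeping of Table 5.4.2 is simple enough to script. The sketch below (Python for illustration; the helper name `total_dof` is this author's, not from the text) counts one degree of freedom for the overall mean plus (levels − 1) per factor:

```python
# Degrees of freedom: 1 for the overall mean, plus (levels - 1) per factor,
# following the convention used in Table 5.4.2.
def total_dof(levels):
    """levels: number of levels of each factor, e.g. [2, 3, 3, 3, 3, 3]."""
    return 1 + sum(n - 1 for n in levels)

# One 2-level factor (A) and five 3-level factors (B, C, D, E, F):
dof = total_dof([2, 3, 3, 3, 3, 3])
print(dof)  # 12, so an array with at least 12 rows is required
```

Since the L9 array has only 9 rows it could not accommodate this factor set, whereas the L18 array (18 rows) can.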
Static problems
Generally, a process to be optimized has several control factors (process parameters) which directly decide the target or desired value of the output. The optimization then involves determining the best levels of the control factors so that the output is at the target value. Such a problem is called a "STATIC PROBLEM". This can best be explained using a P-diagram (Figure 5.4.1), where "P" stands for process or product. The noise is shown to be present in the process but should have no effect on the output. This is the primary aim of the Taguchi experiments: to minimize the variation in output even though noise is present in the process. The process is then said to have become ROBUST.
Figure 5.4.1  P-diagram for static problems [1].
Signal to Noise (S/N) Ratio
There are three forms of the signal-to-noise (S/N) ratio that are of common interest for the optimization of static problems.

[1] Smaller-the-better
This is expressed as

η = −10 log10 [mean of sum of squares of measured data]        (1)

This is usually the chosen S/N ratio for all undesirable characteristics, like “defects”, for which the ideal value is zero. When the ideal value is finite and its maximum or minimum value is defined (for example, the maximum purity is 100%, the maximum temperature is 92 K, or the minimum time for making a telephone connection is 1 s), then the difference between the measured data and the ideal value is expected to be as small as possible. The generic form of the S/N ratio then becomes

η = −10 log10 [mean of sum of squares of {measured − ideal}]        (2)

[2] Larger-the-better
This is expressed as

η = −10 log10 [mean of sum of squares of reciprocals of measured data]        (3)

This is often converted to smaller-the-better by taking the reciprocals of the measured data and then taking the S/N ratio as in the smaller-the-better case.

[3] Nominal-the-best
This is expressed as

η = 10 log10 [(square of mean) / variance]        (4)
This case arises when a specified value is the most desired, meaning that neither a smaller nor a larger value is desired.
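The three S/N ratios can be sketched as small functions. This is a Python illustration of equations (1), (3) and (4); the function names are this author's, not from the text:

```python
import math

def sn_smaller_the_better(data):
    """Eq. (1): -10 log10 of the mean of the squared observations."""
    return -10 * math.log10(sum(y * y for y in data) / len(data))

def sn_larger_the_better(data):
    """Eq. (3): -10 log10 of the mean of the squared reciprocals."""
    return -10 * math.log10(sum((1.0 / y) ** 2 for y in data) / len(data))

def sn_nominal_the_best(data):
    """Eq. (4): 10 log10 of (square of mean / variance)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((y - mean) ** 2 for y in data) / (n - 1)
    return 10 * math.log10(mean * mean / variance)
```

In all three cases a larger S/N ratio is better, which is what makes a single maximization rule possible when selecting optimum factor levels.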
Example application of the Taguchi method
Determine the effect of four process parameters: temperature (A), pressure (B), settling time (C) and cleaning method (D), on the formation of surface defects in a chemical vapor deposition (CVD) process used to produce silicon wafers. Also estimate the optimum setting of the above process parameters for minimum defects. Table 5.4.3 depicts the factors and their levels.
Table 5.4.3  Factors and their levels

Factor                    Level 1     Level 2   Level 3
A. Temperature (°C)       T0 − 25     T0        T0 + 25
B. Pressure (mtorr)       P0 − 200    P0        P0 + 200
C. Settling time (min)    t0          t0 + 8    t0 + 16
D. Cleaning method        None        CM2       CM3
Step 1: Select the design matrix and perform the experiments
The present example involves four factors, each at three levels. Table 5.4.1 indicates that the most suitable orthogonal array is L9. Table 5.4.4 shows the design matrix for L9. Next, conduct all nine experiments and observe the surface defect counts per unit area at three locations each on three silicon wafers (thin disks of silicon used for making VLSI circuits), so that there are nine observations in total for each experiment. The summary statistic, ηi, for an experiment i is given by

ηi = −10 log10 Ci        (5)

where Ci refers to the mean squared defect count for experiment i, the mean square being the average of the squares of the nine observations in experiment i. Table 5.4.4 also depicts the observed value of ηi for all nine experiments. This summary statistic ηi is called the signal-to-noise (S/N) ratio.
Table 5.4.4  L9 array matrix experiment table [1]

Expt.  Column number and factor assigned                      Observation,
No.    1 Temperature  2 Pressure  3 Settling   4 Cleaning     η (dB)
       (A)            (B)         time (C)     method (D)
1      1              1           1            1              η1 = -20
2      1              2           2            2              η2 = -10
3      1              3           3            3              η3 = -30
4      2              1           2            3              η4 = -25
5      2              2           3            1              η5 = -45
6      2              3           1            2              η6 = -65
7      3              1           3            2              η7 = -45
8      3              2           1            3              η8 = -65
9      3              3           2            1              η9 = -70
Step 2: Calculation of factor effects
The effect of a factor level is defined as the deviation it causes from the overall mean. Hence, as a first step, calculate the overall mean value of η for the experimental region defined by the factor levels in Table 5.4.4 as

m = (1/9) Σ_{i=1}^{9} ηi = (1/9)(η1 + η2 + ... + η9) = −41.67 dB        (6)

The effect of the temperature at level A1 (experiments 1, 2 and 3) is calculated as the difference between the average S/N ratio for these experiments (mA1) and the overall mean:

Effect of temperature at level A1 = mA1 − m = (1/3)(η1 + η2 + η3) − m        (7)
Similarly,

Effect of temperature at level A2 = mA2 − m = (1/3)(η4 + η5 + η6) − m        (8)

Effect of temperature at level A3 = mA3 − m = (1/3)(η7 + η8 + η9) − m        (9)
Using the S/N ratio data in Table 5.4.4, the average η for each level of the four factors is calculated and listed in Table 5.4.5. These average values are plotted in Figure 5.4.2. They are the separate effects of each factor and are commonly called the main effects.
Table 5.4.5  Average η for different factor levels [1]

Factor                 Level 1   Level 2   Level 3
A. Temperature         -20       -45       -60
B. Pressure            -30       -40       -55
C. Settling time       -50       -35       -40
D. Cleaning method     -45       -40       -40
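Equations (6) and (7) can be checked directly against the data of Table 5.4.4. A minimal sketch (Python; `level_average` is a hypothetical helper name):

```python
# η values of the nine experiments (Table 5.4.4) and the level of factor A
# (temperature) in each experiment, read from the L9 design matrix.
eta = [-20, -10, -30, -25, -45, -65, -45, -65, -70]
levels_A = [1, 1, 1, 2, 2, 2, 3, 3, 3]

m = sum(eta) / len(eta)  # overall mean, eq. (6): -41.67 dB

def level_average(levels, eta, lv):
    """Average η over the experiments run at a given level of a factor."""
    vals = [e for l, e in zip(levels, eta) if l == lv]
    return sum(vals) / len(vals)

m_A1 = level_average(levels_A, eta, 1)  # -20 dB, first entry of Table 5.4.5
effect_A1 = m_A1 - m                    # eq. (7): about +21.67 dB
```

The same helper, called with the level columns of the other three factors, reproduces every entry of Table 5.4.5.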
Figure 5.4.2  Plots of factor effects.
Step 3: Selecting optimum factor levels
Our goal in this experiment is to minimize the surface defect count to improve the quality of the silicon wafers produced through the chemical vapor deposition process. Since −log is a monotonically decreasing function [equation (5)], we should maximize η. Hence the optimum level for a factor is the level that gives the highest value of η in the experimental region. From Figure 5.4.2 and Table 5.4.5, the optimum settings of temperature, pressure, settling time and cleaning method are A1, B1, C2 and D2 or D3. Hence we can conclude that the settings A1 B1 C2 D2 and A1 B1 C2 D3 can give the highest η, or the lowest surface defect count.

Step 4: Developing the additive model for factor effects
The relation between η and the process parameters A, B, C and D can be approximated adequately by the following additive model:

η(Ai, Bj, Ck, Dl) = m + ai + bj + ck + dl + e        (10)

where the term m refers to the overall mean (that is, the mean of η over the experimental region). The terms ai, bj, ck and dl refer to the deviations from m caused by the settings Ai, Bj, Ck and Dl of factors A, B, C and D, respectively. The term e stands for the error. In an additive model, cross-product terms involving two or more factors are not allowed. Equation (10) is utilized in predicting the S/N ratio at the optimum factor levels.
Step 5: Analysis of Variance (ANOVA)
Different factors affect the formation of surface defects to different degrees. The relative magnitudes of the factor effects are listed in Table 5.4.5. A better feel for the relative effects of the different factors is obtained by the decomposition of variance, commonly called analysis of variance (ANOVA). This is obtained by first computing the sums of squares.

Grand total sum of squares = Σ_{i=1}^{9} ηi² = (−20)² + (−10)² + ... + (−70)² = 19425 (dB)²        (11)

Sum of squares due to mean = (number of experiments) × m² = 9 × 41.67² = 15625 (dB)²        (12)

Total sum of squares = Σ_{i=1}^{9} (ηi − m)² = 3800 (dB)²        (13)

Sum of squares due to factor A
= [(number of experiments at level A1) × (mA1 − m)²]
+ [(number of experiments at level A2) × (mA2 − m)²]
+ [(number of experiments at level A3) × (mA3 − m)²]        (14)
= [3 × (−20 + 41.67)²] + [3 × (−45 + 41.67)²] + [3 × (−60 + 41.67)²] = 2450 (dB)²

Similarly, the sums of squares due to factors B, C and D can be computed as 950, 350 and 50 (dB)², respectively. All these sums of squares are tabulated in Table 5.4.6 (the ANOVA table for η [1]).
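The sums of squares of equations (11)-(14) follow from the same nine η values; a short verification sketch (Python, data from Table 5.4.4):

```python
eta = [-20, -10, -30, -25, -45, -65, -45, -65, -70]
n = len(eta)
m = sum(eta) / n

grand_total_ss = sum(e * e for e in eta)       # eq. (11): 19425 (dB)^2
ss_mean = n * m * m                            # eq. (12): 15625 (dB)^2
total_ss = sum((e - m) ** 2 for e in eta)      # eq. (13): 3800 (dB)^2

# eq. (14): sum of squares due to factor A (temperature)
levels_A = [1, 1, 1, 2, 2, 2, 3, 3, 3]
ss_A = 0.0
for lv in (1, 2, 3):
    vals = [e for l, e in zip(levels_A, eta) if l == lv]
    ss_A += len(vals) * (sum(vals) / len(vals) - m) ** 2  # 2450 (dB)^2
```

Repeating the loop with the level columns of B, C and D yields the remaining entries (950, 350 and 50 (dB)²) of the ANOVA table.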
Table 5.4.6  ANOVA table for η [1]

Factor              Degrees of  Sum of   Mean square =          F
                    freedom     squares  (sum of squares)/dof
A. Temperature      2           2450     1225                   12.25
B. Pressure         2           950      475                    4.75
C. Settling time    2           350*     175
D. Cleaning method  2           50*      25
Error               0           0        -
Total               8           3800
(Pooled error)      (4)         (400)    (100)

*Indicates sums of squares added together to estimate the pooled error sum of squares, shown within parentheses. The F ratio is calculated as the ratio of the factor mean square to the error mean square, and is therefore reported only for the factors not used in the pooling.
Degrees of freedom:
• The degrees of freedom associated with the grand total sum of squares are equal to the number of rows in the design matrix.
• The degree of freedom associated with the sum of squares due to the mean is one.
• The degrees of freedom associated with the total sum of squares are equal to the number of rows in the design matrix minus one.
• The degrees of freedom associated with a factor are equal to the number of levels of that factor minus one.
• The degrees of freedom for the error are equal to the degrees of freedom for the total sum of squares minus the sum of the degrees of freedom of the various factors.

In the present case study, the degrees of freedom for the error are zero. Hence an approximate estimate of the error sum of squares is obtained by pooling the sums of squares corresponding to the factors having the lowest mean squares. As a rule of thumb, the sums of squares corresponding to the bottom half of the factors (as defined by the lower mean squares) are used to estimate the error sum of squares. In the present example, the factors C and D are used to estimate the error sum of squares. Together they account for four degrees of freedom and their sum of squares is 400.
Step 6: Interpretation of the ANOVA table
The major inferences from the ANOVA table are given in this section. Referring to the sums of squares in Table 5.4.6, factor A makes the largest contribution to the total sum of squares [(2450/3800) × 100 = 64.5%]. Factor B makes the next largest contribution (25%), whereas factors C and D together contribute only 10.5%. The larger the contribution of a particular factor to the total sum of squares, the greater its ability to influence η. Moreover, the larger the F-value, the larger the factor effect in comparison to the error mean square, or error variance.

Step 7: Prediction of η under optimum conditions
In the present example, the identified optimum condition, or optimum level of the factors, is A1 B1 C2 D2 (step 3). The value of η under the optimum condition is predicted using the additive model [equation (10)] as

ηopt = m + (mA1 − m) + (mB1 − m) = −41.67 + (−20 + 41.67) + (−30 + 41.67) = −8.33 dB        (15)
Since the sums of squares due to the factors C and D are small and have been used to estimate the error variance, these terms are not included in equation (15). Further, using equations (5) and (15), the mean squared defect count at the optimum condition is calculated as y = 10^(−ηopt/10) = 10^0.833 = 6.8 (defects/unit area)². The corresponding root-mean-square defect count is √6.8 = 2.6 defects/unit area.
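The back-transformation from ηopt to a defect count can be sketched as follows (Python; the level averages are taken from Tables 5.4.5 and equation (6)):

```python
# Predicted S/N ratio at the optimum setting, eq. (15), then the
# corresponding defect counts via the inverse of eq. (5): C = 10^(-η/10).
m, m_A1, m_B1 = -41.67, -20.0, -30.0
eta_opt = m + (m_A1 - m) + (m_B1 - m)   # -8.33 dB
mean_sq_count = 10 ** (-eta_opt / 10)   # ~6.8 (defects/unit area)^2
rms_count = mean_sq_count ** 0.5        # ~2.6 defects/unit area
```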
Design of Experiments
A designed experiment is a test or series of tests in which purposeful changes are made to the input variables of a process or system so that we may observe and identify the reasons for changes in the output response. For example, Figure 5.4.3 depicts a process or system under study. The process parameters x1, x2, x3, ..., xp are controllable, whereas the other variables z1, z2, z3, ..., zq are uncontrollable. The term y refers to the output variable. The objectives of the experiment are:
• Determining which variables are most influential on the response, y.
• Determining where to set the influential x's so that y is almost always near the desired nominal value.
• Determining where to set the influential x's so that the variability in y is small.
• Determining where to set the influential x's so that the effects of the uncontrollable variables z1, z2, ..., zq are minimized.
Figure 5.4.3 General model of a process or system [2].
Experimental design is an important tool in numerous applications. For instance, it is vital in improving the performance of a manufacturing process and in engineering design activities. The use of experimental design in these areas results in products that are easier to manufacture, products that have enhanced field performance and reliability, lower product cost, and shorter product design and development time.
Guidelines for designing experiments
• Recognition and statement of the problem.
• Choice of factors and levels.
• Selection of the response variable.
• Choice of experimental design.
• Performing the experiment.
• Data analysis.
• Conclusions and recommendations.
Factorial designs
Factorial designs are widely used in experiments involving several factors where it is necessary to study the joint effect of the factors on a response. For simplicity, the present section presents the design matrix of the 2² factorial design, with a subsequent explanation of the calculation of the main effects, the interaction effects and the sums of squares. Two-level design matrices are well known and very frequently used in everyday engineering applications.

The 2² design
The 2² design is the simplest design in the 2^k factorial family. It involves two factors (A and B), each run at two levels. Table 5.4.7 depicts the 2² design matrix, where “−” refers to the low level and “+” refers to the high level. These are also called the non-dimensional, or coded, values of the process parameters. The relation between the actual and the coded process parameters is given as
xi = [x − (xhigh + xlow)/2] / [(xhigh − xlow)/2]        (16)
where xi is the coded value of the process parameter x. The term y refers to the response parameter.
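Equation (16) is simply a linear rescaling of the operating range onto [−1, +1]; a one-function sketch (Python, hypothetical function name `code`):

```python
def code(x, x_low, x_high):
    """Eq. (16): map an actual setting x in [x_low, x_high] to a coded value in [-1, +1]."""
    center = (x_high + x_low) / 2.0
    half_range = (x_high - x_low) / 2.0
    return (x - center) / half_range

# e.g. a pressure range of 100-300 mtorr (illustrative numbers only):
coded = code(200.0, 100.0, 300.0)  # 0.0, the center of the range
```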
Table 5.4.7  2² factorial design matrix

Expt.  Factors            Response
No.    A      B      AB   y
1      -1     -1     +1   y1
2      +1     -1     -1   y2
3      -1     +1     -1   y3
4      +1     +1     +1   y4
The main effect of factor A is calculated as the difference between the average response at the high level of A and that at the low level:

ȳA+ − ȳA− = (y2 + y4)/2r − (y1 + y3)/2r        (17)

where r is the number of replicates of each experiment and y1, ..., y4 are the totals of the observations at the four runs. Similarly, the main effect of factor B is calculated as

ȳB+ − ȳB− = (y3 + y4)/2r − (y1 + y2)/2r        (18)
The interaction effect AB is calculated as

ȳAB+ − ȳAB− = (y1 + y4)/2r − (y2 + y3)/2r        (19)
The next step is to compute the sums of squares of the main and interaction effects. Before doing that, the contrasts of the factors need to be calculated as follows.

(Contrast)A = (y2 + y4) − (y1 + y3)        (20)
(Contrast)B = (y3 + y4) − (y1 + y2)        (21)
(Contrast)AB = (y1 + y4) − (y2 + y3)        (22)
These contrasts are then utilized in the calculation of the sums of squares as follows.

(Sum of squares)A = SSA = [(Contrast)A]² / (r × number of rows)        (23)
(Sum of squares)B = SSB = [(Contrast)B]² / (r × number of rows)        (24)
(Sum of squares)AB = SSAB = [(Contrast)AB]² / (r × number of rows)        (25)

Total sum of squares = SST = Σ_{i=1}^{2} Σ_{j=1}^{2} Σ_{k=1}^{r} y²ijk − ytot² / (r × number of rows)        (26)

where ytot is the grand total of all the observations. In general, SST has [(r × number of rows) − 1] degrees of freedom (dof). The error sum of squares, with [number of rows × (r − 1)] degrees of freedom, is calculated as

Error sum of squares = SSE = SST − SSA − SSB − SSAB        (27)
Moreover, each process parameter is associated with a single degree of freedom. The complete analysis of variance is summarized in Table 5.4.8, which is called the analysis of variance (ANOVA) table. The term F0 refers to the F ratio, calculated as the ratio of the factor mean square to the error mean square. The interpretation of the ANOVA table is similar to that explained in step 6 of the Taguchi method. The main drawback of two-level designs is their failure to capture any nonlinear influence of the process parameters on the response. Three-level designs are used for this purpose; their explanation is given elsewhere [2].
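The whole 2² analysis, equations (17) through (25), fits in a few lines. The sketch below uses invented response data with two replicates per run (r = 2); none of the numbers come from the text:

```python
# Observations for the four runs of Table 5.4.7, r = 2 replicates each
# (hypothetical data, for illustration only).
y = [[28, 25], [36, 32], [18, 19], [31, 30]]
r = len(y[0])
t = [sum(run) for run in y]  # run totals y1..y4

contrast_A = (t[1] + t[3]) - (t[0] + t[2])   # eq. (20)
contrast_B = (t[2] + t[3]) - (t[0] + t[1])   # eq. (21)
contrast_AB = (t[0] + t[3]) - (t[1] + t[2])  # eq. (22)

n_rows = len(y)
effect_A = contrast_A / (2 * r)              # main effect of A, eq. (17)
SS_A = contrast_A ** 2 / (r * n_rows)        # eq. (23)
```

The B and AB effects and sums of squares follow by substituting the other two contrasts.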
Table 5.4.8  Analysis of variance (ANOVA) table

Source of   Sum of    Degrees of   Mean square     F0
variation   squares   freedom
A           SSA       (dof)A       SSA/(dof)A      (F0)A
B           SSB       (dof)B       SSB/(dof)B      (F0)B
AB          SSAB      (dof)AB      SSAB/(dof)AB    (F0)AB
Error       SSE       (dof)E       SSE/(dof)E
Total       SST       (dof)T
Central composite rotatable design
Even though three-level designs help in understanding the nonlinear influence of the process parameters on the response, the number of experiments increases tremendously with the number of process parameters. For example, the number of experiments involved in three-level designs with three, four and five factors is twenty-seven (3³ = 27), eighty-one (3⁴ = 81) and two hundred and forty-three (3⁵ = 243), respectively. The principle of the central composite rotatable design (CCD) reduces the total number of experiments without a loss of generality [2]. It is widely used, as it can provide a second-order multiple regression model as a function of the independent process parameters with the minimum number of experimental runs [2]. The central composite rotatable design includes 2^f factorial experiments to estimate the linear and interaction effects of the independent variables on the responses, where f is the number of factors or independent process variables. In addition, a number nC of repetitions [nC > f] are made at the center point of the design matrix to calculate the model-independent estimate of the noise variance, and 2f axial runs are used to facilitate the incorporation of the quadratic terms into the model. The term rotatable indicates that the variance of the model prediction is the same at all points located equidistant from the center of the design matrix. The choice of the distance of the axial points (ζ) from the centre of the design is important to make a central composite design (CCD) rotatable; the value of ζ for rotatability of the design scheme is estimated as ζ = (2^f)^(1/4) [2]. The number of experiments is estimated as
2^f + (2 × f) + nC        (28)
The intermediate coded values are calculated as [2]

xi = ζ × [x − (xmax + xmin)/2] / [(xmax − xmin)/2]        (29)

where xi is the coded value of a process variable x between xmax and xmin. For example, the number of experiments in a CCD matrix corresponding to two process variables is calculated as 2² + (2 × 2) + 4 = 12, and the distance of the axial points from the center is ζ = (2 × 2)^(1/4) = 1.414. Table 5.4.9 depicts the CCD for a two-process-parameter application.
Table 5.4.9  Central composite design (CCD) for a two-process-parameter application

Expt.  Process parameters (coded)   Response
No.    x1          x2               variable y
1      -1          -1               y1
2      +1          -1               y2
3      -1          +1               y3
4      +1          +1               y4
5      -1.414      0                y5
6      +1.414      0                y6
7      0           -1.414           y7
8      0           +1.414           y8
9      0           0                y9
10     0           0                y10
11     0           0                y11
12     0           0                y12
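A matrix like Table 5.4.9 can be generated programmatically for any f. A sketch (Python; `ccd_design` is a hypothetical helper) that assembles the 2^f factorial points, the 2f axial points at ±ζ and the nC center points:

```python
from itertools import product

def ccd_design(f, n_center):
    """Central composite rotatable design: 2^f factorial points at +/-1,
    2f axial points at +/-zeta with zeta = (2^f)^(1/4), eq. (28) run count."""
    zeta = (2 ** f) ** 0.25
    runs = [list(p) for p in product([-1.0, 1.0], repeat=f)]
    for i in range(f):
        for s in (-zeta, zeta):
            pt = [0.0] * f
            pt[i] = s
            runs.append(pt)
    runs.extend([0.0] * f for _ in range(n_center))
    return runs

runs = ccd_design(2, 4)  # 2^2 + 2*2 + 4 = 12 runs, as in Table 5.4.9
```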
Regression modeling
Regression models are mathematical estimation equations expressing a response variable as a function of the process parameters. These models are developed statistically by utilizing the measured response variable and the corresponding design matrix. Considering f independent process parameters, a generalized regression model can be represented as
y*m = β0 + Σ_{j=1}^{f} βj x*j + Σ_{j=1}^{f} βjj (x*j)² + ΣΣ_{i<j} βij x*i x*j + ε        (30)
where y*m is the response variable in non-dimensional form, x*i and x*j refer to the independent variables in non-dimensional form, the β's are the regression coefficients and ε is the error term.

Calculation of the regression coefficients and ANOVA terms
The coefficients β in the regression model [equation (30)] are calculated by minimizing the error between the experimentally measured and the corresponding estimated values of the response variable. The least-squares function S to be minimized can be expressed as [3]
f f f f S (β 0 , β1 , , β ij ) = ∑ y ∗m − β 0 − Σ β j x *j − Σ β jj ( x ∗j ) 2 − ∑∑ β ij x ∗i x ∗j j=1 j=1 s =1 i =1 j=1
u
2
(31)
The estimated second-order response surface model is represented as

y*p = β0 + Σ_{j=1}^{f} βj x*j + Σ_{j=1}^{f} βjj (x*j)² + ΣΣ_{i<j} βij x*i x*j        (32)
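The least-squares fit of equations (31)-(32) reduces to solving a linear system. A sketch for f = 2 (Python with NumPy; the column layout and function name are this author's):

```python
import numpy as np

def fit_second_order(X, y):
    """Least-squares coefficients of the f = 2 second-order model, eq. (32).
    Design columns: 1, x1, x2, x1^2, x2^2, x1*x2."""
    x1, x2 = X[:, 0], X[:, 1]
    A = np.column_stack([np.ones(len(y)), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Synthetic check: a known noise-free surface should be recovered exactly.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 2))
y = 2 + 3 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] ** 2
beta = fit_second_order(X, y)  # ~[2, 3, -1, 0.5, 0, 0]
```

In practice X would be the coded CCD matrix (e.g. Table 5.4.9) and y the measured responses.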
Further, the adequacy of the developed estimation model is tested using analysis of variance (ANOVA), as shown in Table 5.4.10.
Table 5.4.10  Analysis of variance (ANOVA) method for testing the significance of the regression model [3]

Source of         Sum of    Degrees of      Mean     F-statistic   P-value
variation         squares   freedom (dof)   square   (F)
Regression        SSR       m-1             MSR      FR            PR
Linear terms      SSR_L     m-1-m'          MSR_L    FR_L          PR_L
Non-linear terms  SSR_NL    m'              MSR_NL   FR_NL         PR_NL
Residual          SSRes     u-m             MSRes
Lack of fit       SSLOF     u-m-nC+1        MSLOF    FLOF          PLOF
Pure error        SSPE      nC-1            MSPE
Total             SST       u-1

The adjusted coefficient of determination, R²Adj, is reported along with the table.
The terms in the ANOVA table are calculated in the following manner, where ȳ*m = (1/u) Σ_{s=1}^{u} (y*m)s is the mean of the measured responses:

SSR = Σ_{s=1}^{u} [(y*p)s − ȳ*m]²  ;  SSRes = Σ_{s=1}^{u} [(y*m)s − (y*p)s]²        (33, 34)

SST = Σ_{s=1}^{u} [(y*m)s − ȳ*m]²  ;  SSR_L = Σ_{s=1}^{u} [(y*p_L)s − ȳ*m]²        (35, 36)

SSR_NL = SSR − SSR_L  ;  SSPE = Σ [(y*m)s − ȳ*C]²        (37, 38)

where the sum in SSPE runs over the nC center-point replicates and ȳ*C is the mean of those replicates.

SSLOF = SSRes − SSPE  ;  MSR = SSR/(m−1)  ;  MSRes = SSRes/(u−m)  ;  MSR_L = SSR_L/(m−1−m′)        (39, 40)

MSR_NL = SSR_NL/m′  ;  MSLOF = SSLOF/(u−m−nC+1)  ;  MSPE = SSPE/(nC−1)  ;
FR = MSR/MSRes  ;  FR_L = MSR_L/MSRes  ;  FR_NL = MSR_NL/MSRes  ;  FLOF = MSLOF/MSPE  ;
R²Adj = 1 − [SSRes/(u−m)] / [SST/(u−1)]        (41 – 48)
where
(a) SSR, SSRes and SST refer to the regression sum of squares, the residual sum of squares and the total sum of squares, with degrees of freedom m−1 (m is the number of terms in the regression model), u−m and u−1, respectively.
(b) SSR_L, SSR_NL, SSPE and SSLOF refer to the regression sum of squares of the model having only linear terms, the regression sum of squares of the model having only non-linear terms, the pure error sum of squares and the lack-of-fit sum of squares, with degrees of freedom m−1−m′, m′ (the number of non-linear terms in the response surface model), nC−1 and u−m−nC+1, respectively.
(c) y*p_L refers to the regression model with only linear terms.
(d) MSR and MSRes refer to the regression mean square and the residual mean square, respectively.
(e) MSR_L, MSR_NL, MSPE and MSLOF refer to the regression mean square of the model having only linear terms, the regression mean square of the model having only non-linear terms, the pure error mean square and the lack-of-fit mean square, respectively.
(f) FR, FR_L, FR_NL and FLOF refer to the F-statistics required for the hypothesis testing of the regression model, the model with only linear terms, the model with only quadratic terms and the lack of fit of the regression model, respectively.
(g) PR, PR_L, PR_NL and PLOF refer to the P-values of the regression model, the model with only linear terms, the model with only non-linear terms, and the lack of fit of the second-order response surface model, respectively. The P-value is the smallest significance level at which the data lead to rejection of the null hypothesis; in other words, if the P-value is less than the level of significance (α), the null hypothesis is rejected. These values are calculated using the corresponding F-statistic value and the F-distribution table.
(h) R²Adj refers to the adjusted coefficient of determination.

Model adequacy checking
The various steps followed to check the adequacy of the regression model are:

[1] Step 1
Initially, the lack-of-fit test is performed to check the lack of fit of the regression model. The hypotheses considered for testing are
H0: The regression model is adequate        (Null hypothesis)        (49)
H1: The regression model is not adequate    (Alternate hypothesis)   (50)

For a given significance level (α), the null hypothesis is rejected if

FLOF > F(α, u−m−nC+1, nC−1), i.e. if PLOF < α        (51)

The critical value F(α, u−m−nC+1, nC−1) and the P-value PLOF are obtained from the F-distribution table. The value of α is taken as 0.1 in the present study [3]. If equation (51) is not satisfied, the null hypothesis is accepted, implying that there is no evidence of lack of fit of the regression model and the model can be used for further analysis.
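The lack-of-fit decision of equation (51) can be sketched with SciPy's F distribution (this assumes SciPy is available; the function name and mean-square values are this author's, for illustration):

```python
from scipy.stats import f as f_dist

def lack_of_fit_test(ms_lof, ms_pe, dof_lof, dof_pe, alpha=0.1):
    """Eq. (51): H0 (model adequate) is rejected when F_LOF exceeds the
    critical F value, equivalently when P_LOF < alpha."""
    F_lof = ms_lof / ms_pe
    p_lof = float(f_dist.sf(F_lof, dof_lof, dof_pe))      # upper-tail P-value
    F_crit = float(f_dist.ppf(1.0 - alpha, dof_lof, dof_pe))
    return F_lof, p_lof, F_lof > F_crit                   # True => lack of fit

# Illustrative mean squares (not from the text):
F_lof, p_lof, reject = lack_of_fit_test(ms_lof=5.0, ms_pe=10.0, dof_lof=3, dof_pe=3)
```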
[2] Step 2
The significance of the quadratic model is checked by hypothesis testing. The hypotheses considered are

H0: β1 = β2 = ... = βj = β11 = β22 = ... = βjj = β12 = β13 = ... = βij = 0 (i < j)        (Null hypothesis)        (52)
H1: β ≠ 0 for at least one term in the regression model        (Alternate hypothesis)

For a given significance level (α), the null hypothesis is rejected if

FR > F(α, m−1, u−m) and the PR-value < α        (53)

The terms F(α, m−1, u−m) and PR are calculated from the F-distribution table. If equation (53) is satisfied, the null hypothesis is rejected, implying that at least one of the regressors in the model is non-zero, i.e. significant.
[3] Step 3
The contributions of the linear and non-linear terms to the model are tested. For a given significance level (α), the linear terms contribute significantly when

FR_L > F(α, m−1−m′, u−m) and the corresponding PR_L-value < α        (54)

and the quadratic terms contribute significantly when

FR_NL > F(α, m′, u−m) and the corresponding PR_NL-value < α        (55)

[4] Step 4
The adjusted coefficient of determination R²Adj is calculated. This represents the proportion of the variation in the response explained by the regression model. If the value of R²Adj is close to 1.0, then most of the variability in the response is explained by the model.
[5] Step 5
The t-statistic and P-value of all the coefficients in the regression model are calculated. If the P-value of any term in the model is greater than α, then that term is insignificant.

[6] Step 6
The significant terms in the regression model are identified using stepwise regression analysis. Stepwise regression analysis involves multiple steps of regression, where in each step a single variable having low P-value (