REF: Perspectives on Contemporary Statistics, D.C. Hoaglin and D.S. Moore, editors, Mathematical Association of America.

Chapter 5
The Statistical Approach to Design of Experiments
Ronald D. Snee, E. I. du Pont de Nemours & Company
Lynne B. Hare, Thomas J. Lipton Company
• How can we design a new medical device that will give accurate measurements when used by patients at home?
• How much data should be collected to understand variability and help identify ways to control it?
• How can we design a food recipe that will perform well regardless of factors beyond our control, such as varying water hardness and oven temperatures?
Much of Western industry has recognized that it must work in new ways, using new approaches to problem solving and learning, if it is to regain a competitive edge. Fundamentally, customer satisfaction has become a requirement for survival. Because customers insist on it, industry is being forced to direct its attention to quality as an overriding concern. Notice that a concern for quality is at the heart of each of the questions above. The key question is how industry can provide products and services that will better satisfy the needs of customers so that, eventually, it can regain the competitive edge. Well-known authors and sources have identified statistics as a body of knowledge that can help provide answers (Deming 1986, Business Week 1987, Penzias 1989).
1. ACQUIRING KNOWLEDGE

Statistical design of experiments is an essential part of the scientific method of inquiry, and it is an extremely powerful approach to experimentation. Using the scientific method, we begin with conjectures based on prior knowledge and spurred by dissatisfaction with current understanding. This is the inductive step. We then design a data collection plan; that is, we design the experiment.
[Figure 3: The iterative nature of experimentation, shown as a cycle: formulate hypothesis, design experiment, experiment, revise hypothesis.]
2. STATISTICS HELP DECIDE WHAT EXPERIMENTS TO RUN

[...]
• Systematic approach to experiment design and to analysis and interpretation of results
• Development of data that meet the needs of the study
• Ability to investigate the effects of a large number of variables
• Ability to control the influence of nuisance variables
• Efficient use of data
• Quantitative estimates of the effects of variables
• Quantitative estimate of experimental variation
• Ability to develop trade-offs among multiple response variables
• Identification of the range of validity of results

Table 2: Benefits of statistically designed experiments.

• Provides unbiased estimates of the factor effects and associated uncertainties
• Enables the experimenter to detect important differences
• Includes the plan for analysis and reporting of the results
• Gives results that are easy to interpret
• Permits conclusions that have wide validity
• Shows the direction of better results
• Is as simple as possible

Table 3: Characteristics of a good experimental design.

The statistical approach also helps identify the range of validity of the results of an experiment. We are thus able to predict what will happen at different combinations of the factors, even when data were not collected at those combinations. Statistical methodology, therefore, provides an overall systematic approach to experimentation that helps set up experiments and plan the data collection process, offers analysis procedures to make sense of the data, and promotes valid interpretation of the results and sound decisions as a basis for action.
4. DESIGNING AN EXPERIMENT

The planning of an experiment must address some key issues. First we must have a statement of the objective that is agreed to by all concerned parties. This is often the most difficult part of the experiment. Next we must consider the factors or variables to be studied and the responses or characteristics to be measured. We must determine whether the factors are quantitative (e.g., temperature, pressure) or qualitative (e.g., catalyst type, variety) and what levels of the factors we want to study. Quantitative factors are usually easier to study. Often, however, designs have only qualitative factors or a mixture of qualitative and quantitative factors. The two-level factorial designs to be discussed later are particularly effective in handling both qualitative and quantitative factors.
It is important that we have a good measurement of the responses or characteristics we are studying. Many experiments have failed because the response measurement was poor. Typically, a highly variable response requires larger numbers of experimental runs at each combination of factors. It is not uncommon to work with attribute data such as go/no-go responses, percent-defective data, and subjective rating scales. Useful data can be obtained in these situations if the measurements are planned properly.

Next we must take into account the amount of resources (money, personnel, equipment) available for the study and the deadline for results. A good strategy avoids using all of the time and other resources on the first experiment. Reserve some resources to confirm the recommended changes developed from the initial experiments. The overall goal is to produce the needed results and information within the allotted time and money. The scope of the experimental program should be designed accordingly.

Two critical issues concerning the conduct of the experiment are replication and randomization. Replication refers to the number of times we will repeat each of the runs in the experimental design. Replication increases the precision of the estimates of the factor effects and provides an estimate of the experimental error that is used to calculate the uncertainty in the estimated factor effects. Randomization is important to ensure that any unplanned changes that take place during the conduct of the experiment do not bias the results. Randomization is also critical to some of the assumptions of the statistical analysis. Complete randomization, in which the experimental runs are made in a completely random order, is often the preferred approach, but there are exceptions.
In agricultural research, for example, treatments such as applications of herbicides, pesticides, and fertilizers are allocated randomly within blocks or field plots, and several blocks are studied. This is done to minimize the impact of "noise" variables such as soil fertility and slope differences throughout the whole plot of land. More information on the use of blocking and other forms of restricted randomization can be found in Box, Hunter, and Hunter (1978) and Steel and Torrie (1980).
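To see what replication and complete randomization look like in practice, here is a minimal Python sketch (not from the chapter; the factors are borrowed from the purity example of Section 5.1, and the replicate count is an illustrative assumption) that enumerates a full factorial design, replicates it, and writes a randomized run sheet:

```python
# A minimal sketch of building a replicated, randomized run order
# for a 2 x 2 x 2 factorial. Factor names and the replicate count
# are illustrative assumptions, not taken from the chapter.
import itertools
import random

factors = {
    "polymer_type": ["A", "B"],
    "polymer_conc_pct": [0.01, 0.04],
    "additive_lbs": [2, 12],
}
n_replicates = 2  # replication: repeat each factor combination

# Full factorial: every combination of the factor levels
runs = [dict(zip(factors, combo))
        for combo in itertools.product(*factors.values())]

# Replicate, then randomize the run order so that unplanned changes
# (drift, ambient conditions) do not bias the factor effects
run_sheet = runs * n_replicates
random.seed(42)          # fixed seed so the sheet is reproducible
random.shuffle(run_sheet)

for i, run in enumerate(run_sheet, start=1):
    print(f"Run {i:2d}: {run}")
```

Under restricted randomization, one would instead add a block label to each run and shuffle only within blocks.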
To summarize the features of a good experimental design, Table 3 (Snee, Hare, and Trout 1985) lists seven main characteristics. Some of these may seem difficult to achieve, but all are worthy goals. Without them an experiment may waste effort and yield erroneous conclusions. Most of these characteristics may seem to be little more than common sense; but, through its statistical and mathematical foundations, the statistical approach ensures that good experimental designs have them.
5. EXAMPLES OF DESIGNED EXPERIMENTATION

The environment in which the experiment is to be conducted dictates the design. Considerations include the objective, the types of factors to be studied and responses to be measured, the available resources (money, people, equipment), and the deadline for results. Several examples in this section illustrate types of environments and the associated designs.
[Figure 4: Outcomes of a 2³ experiment, product purity in percent. Cube plot with axes polymer concentration (0.01, 0.04%), additive (2, 12 lbs), and polymer type (A, B); the corner values are purity percentages.]
5.1 Studying Factors at Two Levels

Two-level factorial designs, in which each factor is studied at two levels, are the most common class of designs because they are easy to use and provide maximum information per experimental run. These designs yield very efficient measures of the linear effects of factors and their interactions. The Easter lily experiment discussed earlier used a 2 x 2 factorial design (Table 1). The results of a 2 x 2 x 2, or 2³ (read "two to the three"), factorial design are shown in Figure 4. This experiment studied the effects of one qualitative and two quantitative factors, each at two levels, on the purity of a product. Two types of polymers (A and B) were involved in making the product. Each was studied at 2 concentrations (0.01, 0.04%). An additive was known to increase purity when used at a 2-pound rate. The question was whether higher purity percentages would result from using higher amounts of the additive, specifically 12 pounds.

The results of the 8 combinations in this 2³ factorial design are shown in Figure 4. The runs were made in a random order. Three of the 8 combinations were replicated. As the experiment progressed, it was decided that further replication was not needed because the differences among the replicates were very small. One beauty of the 2³ design is that the results are easy to interpret. The linear effects of the factors can be studied by computing the changes in the response along the different edges of the cube (Figure 4). The linear effect of each factor (rounded to one decimal) is estimated four times, as follows:
Factor                 | Estimates (four edges)    | Average
Polymer concentration  | +0.8  +0.6  +0.6  +0.6    | +0.65
Polymer type           | -0.1  -0.3  -0.3  -0.3    | -0.25
Additive               | +0.2  +0.0  +0.0  +0.0    | +0.05
We conclude that polymer concentration has a strong positive effect on product purity, a small but consistent difference separates the product purity levels of polymers A and B, and larger amounts of the additive did not further improve product purity. The fact that these factor effects were consistent on each of the four edges of the cube for each factor indicated that there were no interactions among the factors. All these results were verified by a more formal statistical analysis appropriate to the 2³ design. (We do not discuss that analysis here.)

It was found that polymer A should be used along with 2 pounds of the additive. The polymer concentration would be selected based on the application and cost considerations, with 0.04% providing the best results. Further, it was learned that increasing the additive should be avoided because, beyond the standard 2 pounds, it contributes nothing to purity except cost. Of course, it is possible that some additive level between 2 and 12 pounds results in even higher purity than the levels studied; the relationship between purity and additive level may not be linear. It is always wise to carry out additional experiments to confirm assumptions and results.

Many other issues surround the use of two-level factorial designs. When there are many factors, a class of designs called fractional factorial designs can be used to help determine which factors or combinations of factors are most influential. Fractional factorial designs, used as screening designs, help direct attention to areas with the greatest potential gain. More information on this subject can be found in Box, Hunter, and Hunter (1978). Section 5.4 discusses a two-level screening experiment.
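As an illustration of the edge-averaging arithmetic above, the following Python sketch computes each main effect as the average response at the factor's high level minus the average at its low level. The eight purity values are hypothetical stand-ins chosen only to mimic the pattern reported in the text, not the values from Figure 4:

```python
# Illustrative sketch of estimating main effects in a 2^3 factorial.
# The purity values are hypothetical placeholders, not the chapter's data.
from statistics import mean

# (polymer_conc, polymer_type, additive_lbs) -> purity (%)
purity = {
    (0.01, "A", 2): 98.8, (0.01, "A", 12): 98.8,
    (0.01, "B", 2): 98.5, (0.01, "B", 12): 98.6,
    (0.04, "A", 2): 99.5, (0.04, "A", 12): 99.5,
    (0.04, "B", 2): 99.2, (0.04, "B", 12): 99.3,
}

def main_effect(axis, low, high):
    """Average response at the high level minus the average at the low
    level, i.e., the mean of the four edge-to-edge changes on the cube."""
    hi = mean(y for k, y in purity.items() if k[axis] == high)
    lo = mean(y for k, y in purity.items() if k[axis] == low)
    return hi - lo

print("polymer concentration:", round(main_effect(0, 0.01, 0.04), 2))
print("polymer type (B - A): ", round(main_effect(1, "A", "B"), 2))
print("additive (12 - 2 lbs):", round(main_effect(2, 2, 12), 2))
# With these stand-in values: +0.70, -0.25, +0.05, echoing the
# strong concentration effect and negligible additive effect above.
```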
5.2 Studying Factors at Three Levels

Three-level factorial designs can also produce useful results efficiently, especially when the number of factors is small, say two or three. The following example, aimed at improving plastic parts, used a 3 x 3 factorial design in combination with a two-level design similar to that described in the previous section.

The defect rate of molded plastic parts was running around 1.7% and needed to be reduced. The parts were being produced using two different processes (old, new) and two different sources of raw material (A, B). Both processes could be operated over the same ranges of temperature (25-75°C) and pressure (15-45 psi). The operating conditions in use at the time were 50°C and 30 psi. It was decided to study the effects of temperature and pressure separately for each process and type of raw material, using three levels of temperature (25, 50, 75°C) and three levels of pressure (15, 30, 45 psi). The resulting 9 combinations of temperature and pressure form a 3 x 3 factorial design (Figure 5).
[Figure 5: Plastic part process improvement, 3 x 3 factorial design: temperature (25, 50, 75°C) by pressure (15, 30, 45 psi).]

Table 4: Responses (percent defective parts) in the plastic part improvement experiment. The table comprises four 3 x 3 blocks (old and new process, each with raw materials A and B), indexed by pressure (15, 30, 45 psi) and temperature (25, 50, 75°C). [Most individual cell values are too garbled in the source to place reliably; legible values range from 0 to 23.7 percent defective, and every block shows 0 at 50°C with 15 or 30 psi.]
Four such experiments were run, one for each of the 4 combinations of process type (old, new) and type of raw material (A, B). The response was percent defective parts in a sample of 700 parts. The data from this experiment are summarized in Table 4. We see immediately, without any statistical analysis, that no defective parts were observed at 50°C when the pressure was at 15 or 30 psi. These findings applied for both processes and both types of raw material.

The next step was to verify these results by running a second set of 3 x 3 factorial designs in the region of a temperature-pressure combination that produced no defective parts. In the verification test, temperature was studied at 30, 45, and 60°C with pressure at 10, 15, and 20 psi (Figure 6). No defective parts were observed at any of the 9 combinations of temperature and pressure for either process and both types of raw material lots. These were exciting results: a defect-free process had not been seen before.

The initial and verification tests had been run in the laboratory. It was now time to run a plant test to see whether the manufacturing process would behave similarly. The plant test was run at the center point of the 3 x 3 verification test (45°C, 15 psi). No defects were observed in the plant test. A change in the standard operating procedures for the process was issued. In monitoring over subsequent months, the process defect rate dropped from its original 1.7% to 0.1%. This resulted in an annual savings of $500,000, to say nothing of the benefits, financial and otherwise, of increased customer satisfaction.

[Figure 6: Verification test for the plastic part study: pressure (psi) versus temperature (°C), showing the experimental region, the initial test, and the plant test.]

This example illustrates three key points. First, the iterative nature of experimentation (Figure 3) is evident: we progressed from the initial test, to the verification test, and finally to the plant test. Second, it illustrates the power of the 3 x 3 factorial design and a key scientific principle: the need to establish the range of validity of the results by studying different processes and lots of raw material. Finally, it demonstrates that the factorial design does an effective job of sampling the experimental region; hence, inspection of the data makes clear where the process should be operated. This is particularly true when only one response variable is of interest, as in this example (percent defective parts).
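This inspection step lends itself to a simple sketch. The Python below scans a Table-4-style layout for temperature-pressure settings that are defect-free in every process/raw-material block; the defect percentages are hypothetical placeholders, since only the zero cells at 50°C with 15 or 30 psi are reported in the text:

```python
# Sketch of the "inspect the data" step for the plastic part study.
# Defect percentages are hypothetical placeholders: the text reports only
# that every block was defect-free at 50 deg C with 15 or 30 psi, so
# those cells are set to zero here and all others to a nonzero value.
temps = [25, 50, 75]        # deg C
pressures = [15, 30, 45]    # psi
blocks = ["old/A", "old/B", "new/A", "new/B"]

ZERO_CELLS = {(50, 15), (50, 30)}   # reported defect-free settings

# defect[block][(temp, pressure)] = percent defective parts (illustrative)
defect = {
    b: {(t, p): 0.0 if (t, p) in ZERO_CELLS else 1.0
        for t in temps for p in pressures}
    for b in blocks
}

# A candidate operating window must be clean in every block.
clean = [(t, p)
         for t in temps for p in pressures
         if all(defect[b][(t, p)] == 0.0 for b in blocks)]
print("defect-free settings in all blocks:", clean)
# -> [(50, 15), (50, 30)]
```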
5.3 Exploring Response Surfaces

Response-surface methods are powerful tools for exploring experimental regions in which the factors are quantitative. Their graphical nature offers great advantages for easy interpretation and communication of results, as the following example illustrates.

A research effort to develop an assay method for plasma ammonia measurements using an automated instrument (Humphries et al. 1979) involved three key variables known to affect the sensitivity of the instrument: buffer pH, enzyme concentration, and buffer molarity. It was decided to study the effects of these three variables.

[Figure (number illegible): response-surface results from the plasma ammonia assay study; only a buffer pH scale from 7.25 to 7.65 survives in the source.]
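Although the published analysis is not reproduced here, a response-surface study of this kind typically fits a second-order model in coded factor levels. The following Python sketch fits such a model by least squares; the data are synthetic and the coefficients are arbitrary illustrations, with nothing taken from Humphries et al. (1979):

```python
# A minimal sketch of fitting a second-order response-surface model
# y = b0 + sum(bi*xi) + sum(bii*xi^2) + sum(bij*xi*xj)
# for three coded factors (buffer pH, enzyme conc., buffer molarity).
# All data below are synthetic; none come from the cited study.
import numpy as np

rng = np.random.default_rng(0)

# Coded levels (-1, 0, +1) for the three factors: a full 3^3 grid
levels = np.array([-1.0, 0.0, 1.0])
X = np.array([[a, b, c] for a in levels for b in levels for c in levels])

# Synthetic sensitivity response with curvature in the pH direction
y = (10 + 2.0 * X[:, 0] - 1.5 * X[:, 0] ** 2
     + 0.8 * X[:, 1] + 0.3 * X[:, 2]
     + rng.normal(0, 0.1, len(X)))

# Quadratic model matrix: intercept, linear, squared, and cross terms
ones = np.ones((len(X), 1))
squares = X ** 2
cross = np.column_stack([X[:, 0] * X[:, 1],
                         X[:, 0] * X[:, 2],
                         X[:, 1] * X[:, 2]])
M = np.hstack([ones, X, squares, cross])

coef, *_ = np.linalg.lstsq(M, y, rcond=None)
print(np.round(coef, 2))
# Recovers roughly [10, 2, 0.8, 0.3, -1.5, 0, 0, 0, 0, 0]; contour plots
# of the fitted surface are what make these studies easy to communicate.
```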
[...] optimal design theory requires a detailed knowledge of matrix theory (Federov 1972). It is interesting to note that both authors of this chapter have undergraduate degrees in mathematics.

Knowledge of science, technology, and the scientific method is needed so that one can communicate effectively with the team doing the experimentation and understand how the statistical approach fits into the process of experimentation. Knowledge of computer science and of how to use computers to solve problems is needed in design (Snee 1985a), analysis, and interpretation. We know of experiments in which computers constructed the design, directed the experimental apparatus, collected the data, and did the analysis. All of the computer software for this was, of course, constructed by humans, but one got the feeling
that the data were "untouched by human hands." Knowledge of human behavior is critical to seeing that experiments are designed cooperatively and correctly and that the changes suggested by the results are, in fact, implemented by the organization. Statisticians can benefit by obtaining feedback on their consulting ability and interpersonal skills (McCulloch et al. 1985). We all have a natural resistance to change. It can be overcome if we understand the emotional needs of the people we work with and how they are likely to interact with one another. It is also critical to understand how the proposed changes will affect the different groups and how to get the support of all affected groups, particularly the leadership of the organization. Those who practice statistical consulting need to know more about principles of communication and teaching. A working knowledge of the principles of behavioral science is extremely valuable to mathematicians and statisticians and to other scientists and engineers.
8. IMPROVED QUALITY OF EXPERIMENTATION IS OUR GOAL

The statistical approach to design of experiments applies to all types of experimental studies and has been found to speed the progress of experimental programs. The statistical approach integrates well with the scientific method and the iterative nature of experimentation and, as a result, [...]
Ronald D. Snee is Leader of Continuous Improvement Development for Du Pont's Corporate Continuous Improvement Process. He joined Du Pont in 1968 after receiving a B.A. degree in mathematics from Washington and Jefferson College and M.S. and Ph.D. degrees in statistics from Rutgers University. His previous assignments at Du Pont included management of the development and implementation of continuous process improvement systems for project engineering, direction of Du Pont internal consultants, and service as Senior Consultant, Corporate R&D Planning. He also serves as an adjunct professor at the University of Delaware. Dr. Snee has published extensively on statistics, quality, and applications. He has received the American Society for Quality Control's Brumbaugh, Frank Wilcoxon, Jack Youden, Ellis R. Ott, Shewell, and William G. Hunter awards. In 1986 he received the Shewhart Medal, ASQC's highest award. He is a fellow of the American Statistical Association, the American Society for Quality Control, and the American Association for the Advancement of Science.

Lynne B. Hare directs the Technical Services Department at Thomas J. Lipton, where he has been instrumental in introducing the principles of statistical thinking, data-driven decisions, and total quality management. A career statistician, he holds an A.B. degree from the Colorado College and M.S. and Ph.D. degrees from Rutgers University. His research interests are quality management, statistical applications in quality and productivity, and mixture designs, especially as they apply to food formulation and processing. He is a fellow of the American Society for Quality Control, past-chair of its Statistics Division, and a member of its Standing Review Board. He also serves on the Technometrics Management Committee and is active in the American Statistical Association.
REFERENCES

Box, G.E.P., Hunter, W.G., and Hunter, J.S. (1978), Statistics for Experimenters, New York: John Wiley.

Business Week (1987), "The Push for Quality," Business Week, June 8, 1987, 130-143.

Cleveland, W.S. (1985), The Elements of Graphing Data, Monterey, CA: Wadsworth.

Cornell, J.A. (1990), Experiments With Mixtures, second edition, New York: John Wiley.

Deming, W.E. (1986), Out of the Crisis, Cambridge, MA: MIT Center for Advanced Engineering Study.

Draper, N.R., and Smith, H. (1981), Applied Regression Analysis, second edition, New York: John Wiley.

Federov, V.V. (1972), Theory of Optimal Experiments, New York: Academic Press.

Finney, D.J. (1960), The Theory of Experimental Design, Chicago: The University of Chicago Press.

Fisher, R.A. (1935), The Design of Experiments, Edinburgh: Oliver and Boyd.

Hare, L.B. (1974), "Mixture Designs Applied to Food Formulation," Food Technology, 28, 50-56, 62.

Humphries, B.A., Melnychuk, M., Snee, R.D., and Donegan, E.J. (1979), "Automated Enzymatic Assay for Plasma Ammonia," Clinical Chemistry, 25, 26-30.

Hunter, W.G. (1977), "Some Ideas About Teaching Design of Experiments with 2⁵ Examples of Experiments Conducted by Students," The American Statistician, 31, 12-17.

McCulloch, C.E., Boroto, D.R., Meeter, D., Pollard, R., and Zahn, D.A. (1985), "An Expanded Approach to Educating Statistical Consultants," The American Statistician, 39, 159-167.

Myers, G.C., Jr. (1985), "Use of Response Surface Methodology in Clinical Chemistry," in Experiments in Industry: Design, Analysis and Interpretation of Results, eds. R.D. Snee, L.B. Hare, and J.R. Trout, 59-74, Milwaukee, WI: American Society for Quality Control.

Ott, E.R. (1975), Process Quality Control, New York: McGraw-Hill.

Penzias, A. (1989), "Editorial: Teaching Statistics to Engineers," Science, 244, 1025.

Rautela, G.S., Snee, R.D., and Miller, W.K. (1979), "Response Surface Co-optimization of Reaction Conditions in Clinical Chemical Methods," Clinical Chemistry, 25, 1954-1964.

Snee, R.D. (1979), "Experimenting with Mixtures," CHEMTECH, 9, 702-710.

--- (1983), "Statistics in Industry," in Encyclopedia of Statistical Sciences, eds. S. Kotz, N.L. Johnson, and C.B. Read, 69-73, New York: John Wiley.

--- (1985a), "Computer-Aided Design of Experiments: Some Practical Experiences," Journal of Quality Technology, 17, 222-236.

--- (1985b), "Experimenting With a Large Number of Variables," in Experiments in Industry: Design, Analysis and Interpretation of Results, eds. R.D. Snee, L.B. Hare, and J.R. Trout, 25-35, Milwaukee, WI: American Society for Quality Control.

--- (1988), "Mathematics Is Only One Tool That Statisticians Use," College Mathematics Journal, 19, 30-32.

--- (1990), "Statistical Thinking and Its Contribution to Total Quality," The American Statistician, 44, 116-121.

Snee, R.D., Hare, L.B., and Trout, J.R. (eds.) (1985), Experiments in Industry: Design, Analysis and Interpretation of Results, Milwaukee, WI: American Society for Quality Control.

Snee, R.D., and Pfeifer, C.G. (1983), "Graphical Representation of Data," in Encyclopedia of Statistical Sciences, Volume 3, eds. S. Kotz, N.L. Johnson, and C.B. Read, 485-511, New York: John Wiley.

Steel, R.G.D., and Torrie, J.H. (1980), Principles and Procedures of Statistics: A Biometrical Approach, second edition, New York: McGraw-Hill.

Tufte, E.R. (1982), The Visual Display of Quantitative Information, Cheshire, CT: Graphics Press.

Wernimont, G. (1977), "A Ruggedness Evaluation of Test Procedures," ASTM Standardization News, 5, 61-64.

Youden, W.J., and Steiner, E.H. (1975), Statistical Manual of the Association of Official Analytical Chemists, Arlington, VA: Association of Official Analytical Chemists.