The Cooper-Harper Aircraft Handling Qualities Scale, from the Viewpoint of a Human Factors Flight Test Engineer

White paper, 1998, unpublished

James C. Miller, Ph.D., CPE
Director, Human-Environmental Research Center*
USAF Academy CO 80840

The Cooper-Harper Scale (CHS) for the subjective rating of aircraft handling qualities combines a decision tree approach with a 10-point ordinal rating scale (Cooper and Harper, 1969). This rating technique has been taught to student test pilots and test engineers at the AF Test Pilot School at Edwards AFB and the Navy Test Pilot School at Patuxent River NAS. It is used by test pilots in flight test programs.

This is the first subjective scaling technique to which test pilots are exposed. It allows them to become trained psychophysical observers of aircraft handling qualities. The structure of the CHS is influential in shaping test pilots' thoughts about the whole process of quantifying subjective ratings. When we present other subjective rating techniques to test pilots, we need to point out the differences between our desired rating process and the Cooper-Harper process. For example, most other techniques do not include the decision tree used in the Cooper-Harper Scale.

The definition of "satisfactory" advanced by Cooper and Harper (1969) for use in the CHS was, "it isn't necessarily perfect, or even good, but it is good enough that [the pilot] wouldn't ask that it be changed. It meets a standard; it has sufficient goodness; it's of a kind to meet all pilot demands for the intended use." The adjective "unsatisfactory" was defined as indicating "deficiencies and objectionable characteristics that he feels should be corrected." The dimension of "satisfaction," then, was linked to the operation, "change." If the handling qualities were "unsatisfactory," then they should be changed.

"Acceptable" meant that "the flight phase (or task) can be accomplished; it means that the evaluation pilot would agree to use it for the designated role; that such deficiencies as may exist can be endured or tolerated. Use of the term 'acceptable' does not say how good it is, but does say that the pilot considers it good enough for the intended use. With these characteristics, the flight phase (or task) can be accomplished with adequate precision. The task, for example, may be accomplished with considerable effort and concentration on the part of the pilot, but the level of workload required to achieve this performance is tolerable and not unreasonable in context with the intended use."

"Unacceptable" did "not necessarily mean that the designated flight phase (or task) cannot be accomplished; it does mean that the necessary performance cannot be achieved or that the effort, concentration, and workload required are of such magnitude that the evaluation pilot rejects the aircraft for this phase of its intended use."

The dimension of "acceptability," then, was equated with the subjective dimension, "tolerability" (the term used on the scale), and with the operational concepts of mental and physical workload. Cooper and Harper (1969) also defined "uncontrollable" to be a situation that could not even meet the pilot's criterion of maintaining "control only by restricting the tasks and maneuvers he is called upon to perform and by giving the configuration his undivided attention." Workload was just too high to maintain control of the aircraft.

Considering these definitions, Cooper and Harper (1969) offered four categories of handling qualities. First, they defined the category "Uncontrollable," with "change" required. Second, they defined the category "Unacceptable," in which the high workload was "intolerable" and some "changes" were required. Third, they defined the category "Unsatisfactory but Tolerable," in which the workload was "tolerable," but a few "changes" were required. Finally, they defined the fourth category, "Satisfactory," in which workload was "tolerable" and no "changes" were required.

The operational implications of the CHS, then, involve subjective impressions of the mental and physical workload involved in the control of the aircraft and of the need for changes to the system. The adjective "satisfactory" refers to the need for changes, and the adjective "acceptable" refers to impressions of workload.

One problem with the CHS is that it calls for a single rating covering three dimensions. Satisfaction (the need, or lack of need, for change), acceptability (workload), and performance are all rated with a single number. The results of decades of psychophysical research argue against this method of scale design. Few people would agree that a rating of seven for acceptability is the same thing as a rating of seven for satisfaction.

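The four categories described above can be sketched as a simple lookup, which makes it easier to see how one number carries both the workload judgment and the change judgment. This is only an illustrative sketch, not part of the Cooper-Harper procedure itself: the names `Category`, `BANDS`, and `chs_category` are hypothetical, and the cut points (1-3, 4-6, 7-9, 10) are assumed from the standard Cooper-Harper chart. Fractional values, such as a median taken across several pilots, are deliberately left without a category.

```python
from statistics import median
from typing import NamedTuple, Optional


class Category(NamedTuple):
    """One of the four handling-qualities categories (hypothetical type)."""
    label: str
    workload_tolerable: bool
    change_required: bool


# Upper bound of each band, paired with its category.
# Cut points are assumed from the standard Cooper-Harper chart.
BANDS = [
    (3, Category("Satisfactory", True, False)),
    (6, Category("Unsatisfactory but Tolerable", True, True)),
    (9, Category("Unacceptable", False, True)),
    (10, Category("Uncontrollable", False, True)),
]


def chs_category(rating: float) -> Optional[Category]:
    """Map a whole-number rating (1-10) to its category.

    Returns None for fractional values, which fall between ranks
    and therefore have no category in this sketch.
    """
    if not 1 <= rating <= 10:
        raise ValueError("Cooper-Harper ratings run from 1 to 10")
    if rating != int(rating):
        return None
    for upper, category in BANDS:
        if rating <= upper:
            return category
    return None


# Whole-number ratings each resolve to one category.
for r in (2, 5, 8, 10):
    print(r, chs_category(r))

# A median across pilots can land between two categories.
pilot_ratings = [3, 3, 4, 4]
print(median(pilot_ratings), chs_category(median(pilot_ratings)))
```

Note that the two flags collapse three of the ten-point bands into only two distinct workload/change combinations, so the single number cannot be decoded back into independent ratings of each dimension.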
Another psychophysical problem is that the decision tree portion of the CHS confounds the interpretation of medians that fall in the gray areas between the scores three and four, between six and seven, and between nine and ten. The CHS does not suggest an underlying continuum that allows interpretations of medians that fall in these areas. Thus, the decision tree degrades the scale in terms of its quality of measurement. It may be considered as an ordinal scale with four ranks, though we usually analyze it as a 10-rank ordinal scale. For the latter analysis, we may use the statistical convention of assigning fractional scores to neighboring integers.

If the CHS has these problems, why do test pilots continue to use it? They use it because it works in spite of its problems and because it has become a standard method of reference across generations of test pilots. It works for several reasons. First, for aircraft stability, if workload is high then system performance will be low and the system will require change. The three dimensions appear to have monotonic relationships for handling qualities that help to prevent ambiguities in ratings. This is unlike many other kinds of jobs, where performance is best at an optimal workload level and poorer above and below that optimum. Second, trained test pilots may tend to focus almost exclusively on the 10-point aspect of the scale when giving ratings (Roscoe, 19xx), reducing the ambiguity that might occur due to the mixture of four- and ten-point scales. Finally, the decision tree appears useful in forcing pilots to consensus at the end of a flight test project. Test engineers do not care to have final CHS ratings from different pilots in different portions of the scale, for example some threes and some fours. Thus, test pilots force themselves to agree on one portion of the scale or another. This reduces some of the intersubject rating variability that is a natural error source in the subjective rating process.

A Modified Cooper-Harper Scale

Wierwille and Casali (1983) noted that the Cooper-Harper scale represented "a combined handling qualities/workload rating scale." They found it to be sensitive to psychomotor demands on an operator, especially for aircraft handling qualities. They wished to develop an equally useful scale for the estimation of workload associated with cognitive functions such as "perception, monitoring, evaluation, communications, and problem solving." The Cooper-Harper scale terminology was not suited to this purpose. They suggested a modified Cooper-Harper scale (MCH) which might "increase the range of applicability to situations commonly found in modern systems."

Investigations were conducted to validate the MCH. They focused upon perception (e.g., aircraft engine instruments out of limits during simulated flight), cognition (e.g., arithmetic problem solving during simulated flight), and communications (e.g., detection of, comprehension of, and response to own aircraft call sign during simulated flight). The results suggested that the MCH was "a valid, statistically reliable indicator of overall mental work load." They recommended the use of the MCH in experiments where overall mental workload was to be assessed. Proper instructions to subjects were emphasized.

Obviously, the application of the MCH to the test flight environment poses some problems.
First, it was designed for use in experimental situations, not situations requiring an absolute diagnosis of a subsystem. Second, it carries with it the same interpretive difficulties noted for the CHS. Third, it carries with it the underlying assumption that high workload is the only determinant of the need for changing a subsystem.

This final problem is extremely important. Using the CHS, it is possible, in flight test, to equate high workload with the need to change the aircraft's handling qualities. Poor handling qualities are, automatically, a safety-of-flight issue. This monotonic equation does not always apply to the relationship between mental workload and the need for subsystem change.

Bedford Workload Scale

Roscoe (19xx) described a modification of the CHS, created by trial and error with the help of test pilots at the Royal Aircraft Establishment at Bedford, England. The Bedford Workload Scale (BWS) retained the decision tree and the four- and ten-rank ordinal structures of the CHS. It used the Cooper and Harper (1969) definition of pilot workload, "...the integrated mental and physical effort required to satisfy the perceived demands of a specified flight task." This approach was reported to be welcomed by pilots. The concept of "spare capacity" was used on the BWS to help define levels of workload.

Roscoe reported that pilots found the BWS "easy to use without the need to always refer to the decision tree." He found that pilot workload ratings varied appropriately during close-coupled in-flight maneuvers in a BAe 125 twin-jet aircraft. He noted that it was necessary to accept ratings of 3.5 from the pilots. This suggests that the pilots emphasized the 10-rank, rather than the 4-rank, ordinal structure of the BWS, and it supports the idea of using scales of simpler design than the CHS and BWS. He noted the lack of absolute workload information provided by the BWS and suggested the use of short, well-defined flight tasks to enhance the reliability of subjective workload ratings.

References

Cooper, G.E., and Harper, Jr., R.P. The Use of Pilot Rating in the Evaluation of Aircraft Handling Qualities (NASA-TN-D-5153). Washington, D.C., NASA, 1969.

Roscoe, A.H. Assessing pilot workload in flight. In Flight Test Techniques (AGARD-CP-373). Neuilly sur Seine, France, NATO Advisory Group for Aerospace Research and Development (AGARD), pp. 12-1 to 12-13, 1984.

Wierwille, W.W., and Casali, J.G. A validated rating scale for global mental workload measurement applications. Proc. 27th Annual Meeting of the Human Factors Society, pp. 129-133, 1983.

*These notes were part of a larger memorandum concerning workload ratings written while Dr. Miller was chief of the Human Factors Branch, AF Flight Test Center, Edwards AFB, 1987-89.