From Practice to Research: A Plan for Cross-Course Assessment of Instruction in Engineering Ethics

Michael Davis
Illinois Institute of Technology, Chicago, USA
[email protected]

Abstract: This paper has three parts. The first briefly distinguishes the subject of this research from the main line of research in ethics education. The main line is concerned with assessing improvement in ethical judgment (or moral development). The research discussed here is concerned with assessing improvement in ethical sensitivity and ethical knowledge. The second part describes work already done that provides a model for what is proposed. The third part sketches the research program itself.

"All else equal, simplest is best."—Anonymous

Introduction

This paper proposes a novel research program to assess methods of teaching engineering ethics, a program that would allow ordinary instructors, with little effort, to turn ordinary assessment tools (graded exams, homework assignments, and so on) into publishable research, whether the course in question is a stand-alone course in engineering ethics or a technical course in which some engineering ethics has been inserted. This paper has three parts. The first briefly distinguishes the subject of this research from the main line of research in ethics education. The main line is concerned with assessing improvement in ethical judgment (or moral development). The research discussed here is concerned with assessing improvement in ethical sensitivity and ethical knowledge. The second part of this paper describes work already done that provides a model for what is proposed, the use of ratios between scores on course-specific pre- and post-tests to provide a measure allowing assessment across courses, programs, and even institutions. (For a fuller description of that work, see Davis and Feinerman, 2012.) While the use of pre- and post-tests is not new, the use of their ratios across courses, programs, and institutions to do assessment is. The third part of this paper sketches the research program itself—or, rather, a framework for answering a family of research questions.

Moral development versus ethical sensitivity and knowledge

A discussion of teaching ethics may be about "ethics" in one of at least three senses: 1) morality (universal standards of conduct), 2) moral theory (the philosopher's typical use), or 3) the morally binding standards that apply to members of a group simply because they are members of that group (the sense carried in such terms as "code of professional ethics"). My focus here is on the special-standards sense, since engineering ethics only applies to engineers. Teaching engineering ethics (in this special-standards sense) may have at least one of the following four objectives:

• increasing ethical sensitivity
• adding to ethical knowledge
• improving ethical judgment
• enhancing ethical commitment.

Ethical sensitivity is the ability to identify ethical problems in context, for example, noting that a certain design raises a question of public safety or waste. Ethical knowledge is ordinary knowledge relevant to resolving an ethical problem. Some ethical knowledge is propositional ("knowing that"), for example, knowing that one's conduct is governed by law, organizational regulations, and a professional code. But much ethical knowledge is skill, for example, knowing how to use an ethical decision procedure or how to raise an ethical objection in a way likely to win support. Ethical judgment is the ability to design a reasonable course of action for the ethical problem identified. Ethical commitment is the tendency, likelihood, or "will power" to act on one's ethical judgment. What is sometimes called "moral imagination" is either an aspect of sensitivity or an aspect of judgment, depending on whether the term is understood as referring to the ability to appreciate the consequences of one's choice (sensitivity of a sort) or the ability to invent alternatives to the choices with which one has been presented (part of judgment). Given its ambiguity, "moral imagination" seems to be a term better avoided. (For more on judgment, see Davis, 2012.)

Most published attempts to assess the teaching of ethics are concerned with "moral development" as measured by the Defining Issues Test (DIT) or a similar standardized assessment tool. Do the scores of this or that class rise significantly after the "intervention" being assessed? Since the DIT is standardized, there is no difficulty in comparing scores after a lecture, discussion, role-play, or other exercise in Class A with scores after a different exercise in Class B. Such comparisons seem to give us useful insight not only into which exercises work and which do not but also into which work better than others, that is, which contribute more to moral development. The literature on in-course assessment of what ethics students have learned is not large but is growing quickly. (Besides other work cited here, see, for example: Bebeau (2002a, b, 2005), Mumford et al. (2006), Loui (2006), and Kligyte et al. (2008).)

Moral development is, I think, a close relative of (what I have called) ethical judgment, though that remains to be proved. (A test of ethical development in engineering has recently become available; see Borenstein et al., 2010.) If, as I believe, ethical judgment improves with ethical development, the title of their article is not misleading.

In any case, what is certain is that the literature on assessment of teaching engineering ethics, indeed, the literature on teaching professional ethics generally, is almost silent on what methods work best for teaching ethical sensitivity, knowledge, and commitment. That is understandable for ethical commitment: the classroom does not seem to be a good place from which to predict conduct (or gauge will power), especially in the radically different circumstances of professional practice, though future conduct is the chief test of ethical commitment. On the other hand, the failure to assess methods of teaching ethical sensitivity and ethical knowledge just seems odd. The classroom is a good place to do such assessment. After all, those who teach engineering ethics typically ask students to identify ethical issues in a scenario, problem, or case—and grade the results. They also typically provide students with ethical knowledge and then grade students on what they show they know, for example, when asked to resolve the ethical issues they identified. Most of the work necessary for assessing methods of teaching ethical sensitivity and ethical knowledge is already being done in almost any classroom in which engineering ethics is being taught, tested, and graded.

There are obstacles to such assessment, nonetheless. Perhaps the most important is that there is no standardized test for ethical sensitivity or ethical knowledge—and, given how closely tied to course content a course's coverage of ethical sensitivity and knowledge typically is, there is unlikely to be an informative standardized test without first standardizing the ethical sensitivity and knowledge actually taught in the courses to be compared. We can, of course, do some assessment across quite different courses in engineering ethics using student self-reports. But, while a self-report can tell us whether students noticed the ethics, thought they learned something useful, and approved of what they were taught, a self-report cannot tell us how much, if anything, the students actually learned, that is, how much their ethical sensitivity or knowledge actually improved.

The new element in what I am proposing is to measure that improvement across courses without a standardized test and, indeed, without much change in grading procedures. One can simply divide the score that students achieved on a test after they were taught certain material by the score achieved on a test before they were taught the material, representing that ratio as a decimal. Though I think the simplicity of this method and the production of numbers greater than 1.0 are worth the small risk of the pre-test score being zero, statisticians recommend a more complex formula: (Post-test Score − Pre-test Score)/(Maximum Possible Score − Pre-test Score). They are certainly right if a computer is to do all the calculations, since a pre-test score of 0 turns the simple ratio into a division by zero, halting the calculation. The resulting decimal can then be compared to the corresponding decimals from the same course in other semesters, from other courses, or from other institutions. Simple—but will it produce significant results?
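To make the two measures concrete, here is a minimal sketch in Python; the function names and the sample scores are illustrative assumptions of mine, not part of the study described below.

```python
def simple_gain(pre, post):
    """Post-test score divided by pre-test score, as a decimal (1.0 = no change).
    Undefined when the pre-test score is zero."""
    if pre == 0:
        raise ValueError("a pre-test score of 0 makes the ratio undefined")
    return post / pre


def normalized_gain(pre, post, max_score):
    """The formula statisticians recommend:
    (post-test - pre-test) / (maximum possible - pre-test)."""
    if max_score == pre:
        raise ValueError("pre-test score already at the maximum; gain is undefined")
    return (post - pre) / (max_score - pre)


# Hypothetical example: a student scores 15/30 before instruction and 24/30 after.
print(simple_gain(15, 24))          # 1.6
print(normalized_gain(15, 24, 30))  # 0.6
```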

Proof of concept

Under a grant I held from the National Science Foundation (EEC-0629416), Alan Feinerman, an electrical engineer at the University of Illinois at Chicago, taught ECE449, a course in nanofabrication with both graduate and undergraduate students, in which he tried in small ways to increase the ethical sensitivity and knowledge of the students. In the Spring 2008 semester, before covering any ethical material, he gave an exam (T1) in which several of the questions included an ethical element relevant to the course. He did the same (T4) near the end of the semester, after covering the ethical material. In Spring 2009, he made some adjustments and tried again. In 2010, another engineer taught ECE449 in place of Feinerman, omitting the ethics but giving the pre- and post-test, thus providing a control.

T1 had nine questions, three of which concerned ethics in whole or in part. For example, the second question on T1 was (designed to be) entirely about ethics. It read: "List one advantage and one disadvantage MEMS/NEMS has for society (that means you, your relatives, their friends, …)?" Knowing at least one social advantage and at least one social disadvantage of technological breakthroughs in micro-electro-mechanical systems (MEMS) or nano-electro-mechanical systems (NEMS) concerns engineering ethics insofar as engineers have a professional obligation to look after the public health, safety, and welfare, and knowing about such advantages and disadvantages is necessary to do that (and is therefore a form of ethical knowledge).

One reviewer made the following objection to this sample question:

"One concern about the pre/post assessment method described in the paper is whether it effectively assesses gain in knowledge of ethics, at least in the specific way the implementation is described in this paper. As an example of a question used to assess knowledge of ethics, the author provides the question: 'List one advantage and one disadvantage MEMS/NEMS has for society'. The author rightfully states that this topic concerns ethics insofar as engineers [have] a professional obligation to look after public health and safety. It seems possible, however, that a student could address advantages and disadvantages of MEMS without explicitly addressing surrounding ethical issues, and could demonstrate a gain in pre/post test score without having developed an understanding of the underlying and related ethical issues. The author should describe how pre/post gain requires explicit (not just implicit) demonstration of understanding of ethical principles, for example the four specific objectives that are mentioned in the introduction."

The demand that students demonstrate an "understanding of ethical principles" confuses a knowledge of ethics (moral theory) with ethical knowledge in general (a knowledge of what one needs to know to practice ethics reasonably well). Much, perhaps most, ethical knowledge is not a knowledge of ethics. When integrating engineering ethics into technical courses (as in the course under discussion), one need not teach ethical principles to teach a reasonable amount of ethical knowledge. When discussing what a particular course should achieve, we must be careful not to let the ideal (the perfect) bar the way to the useful (the "good enough"). For more on this practical point, see Davis, 2010. That said, I think the reviewer's instincts are right. We could ask questions that do a better job of prompting display of specifically ethical knowledge than this question did. And in fact Feinerman did that the next year (as explained below).

Neither Feinerman's lectures nor the assigned readings for ECE449 had yet discussed the social advantages or disadvantages of MEMS/NEMS when Feinerman gave T1. The corresponding question on T4 was: "List one advantage and one disadvantage that a microfluidic device has for society (that means you, your relatives, their friends, …)?" The only difference between this T4 question and its T1 counterpart is that students should (and probably would)—given what the course covered—have learned something about the social advantages and disadvantages of micro-fluidic devices during the semester (for example, their use in providing diabetics with a constant supply of insulin). The only difference between an ethics question and others on T1 or T4 is in the answers that would be appropriate (for example, because the question concerned "advantages for society" rather than mere control of fluid flow). In this respect at least, Feinerman had integrated the ethics pre-test into the course's ordinary assessment process. Students saw nothing odd in these questions, since they seemed to test student knowledge of the technology (and, in fact, did that too).

Scoring such questions was simple. An answer with no correct item earned a score of 0; one correct item, 5; and at least two correct items, 10. Simple scoring is important for at least three reasons. First, it makes learning to score ethics questions (relatively) easy (or, at least, less daunting) for anyone familiar with the course's technical material but not with grading ethics. A more complex set of rubrics (though providing a better assessment overall) would take a professor of engineering longer to learn to use reliably (and is therefore more daunting). (See, for example, the rubrics used in Sindelar et al., 2003.)
Here again, an instructor need not discard a useful strategy because it is not perfect. Like most engineers, Feinerman had neither the time nor the patience to learn a complex grading strategy; he also did not have the money or time to train a graduate student to do the grading for him. Second, the simple scoring meant that scoring the tests would not be nearly as time-consuming as using a more complex rubric would be. Third, the simplicity of scoring meant that the grading of questions would be (relatively) objective. There would be relatively few judgment calls.

The results of this first attempt to measure increases in ethical sensitivity and knowledge were suggestive but not statistically significant. Of the three pairs of questions, two of the three mean scores for graduate students showed improvement (from 8.3 to 9.2 on one and from 9.2 to 10 on the other). But on a third pair of questions, change went the other way (from a mean of 9.2 to 7.5)—owing to one student whose score went from 10 on T1 to 0 on T4. One student has a significant impact when the total comparison group is six. The mean "improvement" for the grad students was 0.98 (the sum of the T4 means divided by the sum of the T1 means), an overall loss of ethical sensitivity or knowledge. The undergraduate result was better: 1.14. But neither result was statistically significant. The standard deviation on pre- and post-tests for both grad and undergraduate students was well above 3.

These results illustrate a problem of dealing with small numbers of students. The negative result for the graduate students was due entirely to one student getting 10 on one question on the pre-test and 0 on the corresponding question on the post-test. All other grad students got the same score on that T4 question as on the corresponding T1 question (10 in all but one instance)—and all the undergrads but one also got 10 on it (with that one scoring 5 on both). Four undergrads improved over the first test; none did worse. The temptation, then, is to treat the grad student's 0 on that one post-test question (after a 10 on the pre-test) as an outlier (perhaps he misread the question, had just received bad news, or was in some other respect understandably "off" that day). Ignoring that one outlier, the grad students' overall improvement score is the same as the undergrads': 1.1.

The results of this first attempt at assessment of improvement in ethical sensitivity and knowledge were humbling—for at least three reasons, all related to grading. First, students occasionally came up with answers that, though insightful in their way, managed to ignore what had been taught (for example, they identified true social disadvantages not covered in the course). Feinerman felt he had to give credit for such answers (and that credit is represented in the scores reported here). Generally, according to Feinerman, students with more practical experience in engineering (for example, ten years in the field) were more likely to produce such unexpected answers than those with less experience. Second, but related to the first problem, Feinerman came to believe that using different questions on T1 and T4 added substantially to problems of comparison, because one question might produce more unexpected responses than another. We therefore decided to use the same questions on the pre- and post-test in the second trial rather than (as in this first) similar questions that might turn out not to be as similar as supposed. Third, we decided that the questions, though easy to grade (10, 5, 0), did not provide enough information about what students had learned. More open-ended or complex questions might provide more information (without making the grading more complicated).

Though humbling in these ways, the first results are nonetheless suggestive, more suggestive than the usual statistical measure of significance acknowledges.
Except for that one graduate student's slip on one question, all the students in the class, grad as well as undergrad, either improved their ethics score from T1 to T4 or held steady. While inserting small amounts of ethics into a technical course seemed to have had only a small effect (an improvement ratio of about 1.1), it did seem to have some effect, enough at least to measure, even if (given the small numbers involved) the results were not statistically significant. We had reason to be hopeful—and to try an improved pre- and post-test the following year.
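A minimal sketch of the scoring and aggregation just described (the 0/5/10 rubric per question and the class-level improvement ratio computed as the sum of post-test question means over the sum of pre-test question means) might look as follows; the function names and the sample data are illustrative, not the study's.

```python
def question_score(correct_items):
    """Rubric described above: no correct item scores 0, one scores 5, two or more score 10."""
    if correct_items >= 2:
        return 10
    return 5 if correct_items == 1 else 0


def improvement_ratio(pre_means, post_means):
    """Class improvement as a decimal (1.0 = no change): sum of post-test question
    means divided by the sum of pre-test question means."""
    return sum(post_means) / sum(pre_means)


print(question_score(1))  # 5

# Hypothetical per-question mean scores for a small class (three ethics questions):
pre_means = [8.0, 9.0, 9.5]
post_means = [9.0, 10.0, 9.5]
print(round(improvement_ratio(pre_means, post_means), 2))  # ~1.08
```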

I report these negative results for three reasons. First, I think it important for researchers to report their failures. Doing so may help other researchers to avoid repeating those failures, thus saving both their time and the social resources they would otherwise waste. Second, I think it important for other researchers to see that the method I propose using may need adjustment. It is not a mere deduction from theory but an invention designed to fit a variety of practical constraints. Seeing how that fit was achieved may suggest better ways to fit those constraints or better ways to fit somewhat different constraints. Third, the ideal audience for this paper is not social scientists but teachers of engineering ethics, especially engineers and philosophers, those who seem to carry most of the burden of such teaching. For them, this narrative is likely to be more informative than the impersonal, ahistorical format that the social sciences seem to favor for reporting such research.

In Spring 2009, Feinerman made several improvements in the research design. First, rather than using similar questions, Feinerman used the same set of ethics questions in the pre- and post-test (hoping thereby to reduce unexpected variability). Second, the ethics questions were somewhat different. Feinerman came to think that explicitly asking about "social" advantages and disadvantages was too explicit and might make the ethics questions stick out too much. So, he tried to do a better job of integrating the ethics into the technical material. Here, for example, is the first ethics question in the pre-test (and post-test) for Spring 2009:

(a) Why are researchers investigating nanotechnology to correct environmental pollution?
(b) List advantage(s) of using nanotechnology for environmental pollution remediation.
(c) List disadvantage(s) of using nanotechnology for environmental pollution remediation.
(d) Give an example of an acceptable and an unacceptable use of nanotechnology for environmental pollution remediation.

The first part (a) requires technical information (more or less). The fourth part (d) requires a specifically ethical response (assuming "acceptable" to be a reasonable prompt for "ethical"). The middle two parts (b and c) are explicitly open-ended, allowing students to list as many advantages or disadvantages as they can (some of which could be ethical, though they would not be included in the ethical assessment). The pre-test had six such questions (of varying difficulty). The post-test was identical.

The pre- and post-tests were given as freestanding quizzes to between 75 and 100% of the total class (depending on attendance). The pre-test was given near the middle of the semester (April 1, 2009); the post-test, near the end (May 1, 2009). The test questions concerned nano-topics that were covered after the pre-test was given. Feinerman chose to separate the pre- and post-test from his regular sequence of tests for two reasons. First, he wanted to have more questions than would be possible if he integrated the questions into the regular tests (six ethics questions rather than the three he would otherwise have used). The larger number of ethics questions made it likely that the overall standard deviation would be smaller. Second, he had come to think that his students would find the ethics questions enough like his other test questions that he need not bury them in a regular test. The results of this second try were better than the first.
The standard deviation was much smaller than on the first try: 0.59 rather than well above 3 (for undergrads as well as grads). The results were therefore statistically significant. The measured improvement was also much greater: 1.40 for grad students and 1.60 for undergrads.

The control was accomplished during Spring 2010, with pre- and post-tests administered to between 58 and 100% of the class (depending on attendance on that day). The pre- and post-tests were, in most respects, like those in the second try. The chief difference was that Feinerman, being on sabbatical that semester, did not teach the course (though he did administer and grade the tests). The ECE449 instructor that semester did not discuss any of the ethical nano-topics on the pre- and post-tests. He taught the course much as Feinerman would have taught it before he began to integrate ethics. The 2010 version of ECE449 was therefore as clean of ethics as a course in engineering might reasonably be. Each test was administered part-way through a regular lecture. The pre-test was administered on March 15; the post-test, on April 26. The number of students taking the tests was about the same as the year preceding (ten grad students and seven undergrads). Feinerman graded the six ethics questions on each of the two tests just as he had done the year before.

This time, however, the students showed virtually no ethical progress overall (a gain of about .04, as against .4 or .6 in the second trial). The standard deviation was about the same as on the second try (.4 for grad students, .6 for undergrads). The difference between teaching some ethics in a single engineering class and teaching none was, it seemed, both measurable and significant. These results certainly suggest that the effectiveness of various methods of teaching ethical sensitivity and ethical knowledge can be measured in a way allowing comparison across courses—and, therefore, even across institutions.

The proposed research program

Most instructors who teach engineering ethics, whether in a stand-alone course or integrated into an ordinary engineering course, give tests to determine what their students have learned. The tests typically require students (among other things) to identify ethical issues arising in a certain engineering context, explain them, and use information taught in the course to resolve them. Some tests are "objective"; some, short answer or essay. Tests are typically given after the students have been taught the relevant material. That is, the instructors are already using "post-tests" concerned (in part) with assessing the students' ethical sensitivity and knowledge.

What I propose is that students be given a pre-test as well, one identical to the post-test, on the first day of class, on the day before there is any discussion of ethics, or on some other suitable day; that the results of pre-test and post-test be compared much as I just described (using decimals); and that the results be reported along with a description of what was taught and how it was taught.

Doing as I propose would, of course, allow instructors to assess the effectiveness of their teaching of ethical sensitivity and knowledge in any given semester, no small thing in itself, but it would also allow them to experiment with improvements, since improvement could be assigned a definite number, one allowing comparison across different versions of the same course—and, therefore, also across different courses in the same institution or different courses in different institutions. Instructors can turn the results of ordinary tests into publishable research—if they obtain significant results or can offer useful insight into why a certain teaching method failed to produce such results. Most important for my purposes, the research done in this way should add considerably to our knowledge of how to teach engineering ethics, for example, whether real cases are better for teaching ethical sensitivity than fabricated cases or whether case discussion is better than lecture for transmitting ethical knowledge. An individual teacher need not do a comparative study. Another researcher could use the published results of individual classes to make the cross-course comparisons and, indeed, cross-institution comparisons.
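As a sketch of the kind of secondary analysis that last sentence envisions, a researcher might tabulate published improvement ratios by teaching method. The courses, methods, numbers, and the use of a simple mean here are all illustrative assumptions of mine, not part of the proposal itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical published results: (course offering, teaching method, improvement ratio).
published = [
    ("Course A, Fall", "real cases", 1.35),
    ("Course A, Spring", "fabricated cases", 1.20),
    ("Course B, Fall", "real cases", 1.45),
    ("Course B, Spring", "lecture only", 1.05),
]

# Group the reported decimals by teaching method, then compare means across methods.
by_method = defaultdict(list)
for course, method, ratio in published:
    by_method[method].append(ratio)

for method, ratios in by_method.items():
    print(f"{method}: mean improvement {mean(ratios):.2f} across {len(ratios)} course(s)")
```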

References

Bebeau, M. J. (2002a). The defining issues test and the four component model: Contributions to professional education. Journal of Moral Education, 31(3), 271–295.

Bebeau, M. J. (2002b). Outcome measures for assessing integrity in the research environment (Appendix B). In Integrity in scientific research: Creating an environment that promotes responsible conduct. National Academy Press, Washington, D.C. (available on the NAP website: http://www.nap.edu/books/0309084792/html).

Bebeau, M. J. (2005). Evidence-based ethics education. Summons, The Journal for Medical and Dental Defence Union of Scotland (Summer), 13–15.

Borenstein, J., Drake, M. J., Kirkman, R., & Swann, J. (2010). The engineering and science issues test (ESIT): A discipline-specific approach to assessing moral judgment. Science and Engineering Ethics, 16, 387–407.

Davis, M. (2010). The usefulness of moral theory in teaching practical ethics: A reply to Gert and Harris. Teaching Ethics, 11, 51–60.

Davis, M. (2012). A plea for judgment. Science and Engineering Ethics, 18 (December), 789–808.

Davis, M., & Feinerman, A. (2012). Assessment of teaching ethics to graduate students in engineering. Science and Engineering Ethics, 18, 351–367.

Kligyte, V., Marcy, R. T., Sevier, S. T., Godfrey, E. S., & Mumford, M. D. (2008). A qualitative approach to responsible conduct of research (RCR) training development: Identification of metacognitive strategies. Science and Engineering Ethics, 14, 3–31.

Loui, M. C. (2006). Assessment of engineering ethics video: Incident at Morales. Journal of Engineering Education, 95, 85–91.

Mumford, M. D., Devenport, L. D., Brown, R. P., Connelly, S., Murphy, S. T., et al. (2006). Validation of ethical decision making measures: Evidence for a new set of measures. Ethics and Behavior, 16, 319–345.

Sindelar, M., Shuman, L., Besterfield-Sacre, M., Miller, R., Mitcham, C., et al. (2003). Assessing engineering students' abilities to resolve ethical dilemmas. In Proc. 33rd Annual Frontiers in Education Conference, Vol. 3 (November 5–8), S2A 25–31.
