
Teaching Software Engineering to End-users

Medha Umarji1, Mark Pohl1, 2, Carolyn Seaman1, A. Güneş Koru1, and Hongfang Liu3
1 University of Maryland Baltimore County, Baltimore, Maryland, {medha1, cseaman, gkoru}@umbc.edu
2 University of Maryland, School of Medicine, Maryland, [email protected]
3 Georgetown University Medical Center, Washington DC, [email protected]

ABSTRACT

Bioinformatics software is an example of immensely complex and critical scientific software, and this domain provides an excellent illustration of the role of end user computing in the sciences. To explore these characteristics from a software engineering standpoint, we conducted an exploratory survey of bioinformatics developers. The survey included a range of questions about people, processes and products. As software engineering researchers, we realized that the survey results had important implications for the education of bioinformatics software professionals. Through this paper we intend to open an avenue of discussion about the software engineering knowledge that should be taught to end user programmers, based on our findings in the bioinformatics domain. In addition to the survey results, we went through the curricula of more than fifty bioinformatics programs as well as the contents of over fifteen textbooks. We observed that there was no mention of the role and importance of the software engineering practices essential for creating dependable software systems. We present a set of recommendations for improving bioinformatics education in terms of software engineering principles and the ways that they apply in the context of end-user development.

Categories K.3.2. [Computers and Education]: Computer and Information Science Education.

General Terms Design, Documentation, Experimentation, Human Factors

Keywords Bioinformatics, software engineering, education, end-user programming.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. WEUSE IV’08, May 12, 2008, Leipzig, Germany. Copyright 2008 ACM 978-1-60558-034-0/08/05…$5.00.

1. END USER PROGRAMMING AND BIOINFORMATICS

As the ubiquity of software increases, software functionality is becoming more specific to the domain in which it will be used. Over the years several communities of practice have evolved with their own approaches to and philosophies about software development, particularly in domains such as space exploration, geological sciences and life sciences. Development environments within these communities of practice are very diverse and vary from large groups of designers and developers that create professionally developed systems, to individual researchers with little experience who develop or simply modify existing software to accomplish a particular task. Many software developers in these domains can be termed end user programmers.

End user programming is currently the most prevalent form of programming [1]. Software development techniques (such as requirements elicitation), goals and domain expertise are some of the factors that set end user programmers apart from professional programmers [2]. Burnett et al. [1] have discussed the lack of “dependability” of the programs end users create, particularly because faults in these programs have resulted in losses approaching millions of dollars. One discipline that has a well-defined software development community of practice is bioinformatics, which is defined as “any application of computational methods to biological problems, from statistical genetics to design of man-machine interfaces” [3]. The bioinformatics community is traditionally a research community, with a high degree of heterogeneity and openness to sharing of information and software [4]. Much of the software developed in this domain is either developed as open source or is eventually made open source [5, 6]. As with any scientific discipline, the field of bioinformatics has a high need for reliable, error-free and accurate software systems.

We argue that bioinformatics provides an interesting example of domain-specific end user programming. This is not to say that all bioinformatics software is developed without professional programmers. But much of it is (how much, exactly, is an open question) and, even when professional programmers are hired to develop bioinformatics software, the process still requires very close cooperation between those programmers and other bioinformatics professionals. Requirements cannot simply be “handed off” from the domain experts to the degree that is possible in other disciplines. Close cooperation is necessary in order to keep up with changing hypotheses, new algorithms and new methods for handling vast quantities of data [7]. So we believe it is safe to say that many bioinformaticians, who are not trained as software developers, are heavily involved in the development of bioinformatics software. For this reason, we think it is useful to study this community of practice from an end user programming perspective.

We believe that, by taking an end user programming perspective in studying bioinformatics software developers, we can extrapolate our findings to other populations of professional end user programmers, particularly in other scientific computing disciplines. Our exploratory work (which cannot be described in detail here, but some of which is reported in [8]) has included interviews and surveys focusing on such issues as software development practices, software maintenance, quality assurance, and education in the bioinformatics domain. Our results indicate that there is a need for better-quality software, more documentation and better integration of systems. This assessment is based on the perceptions and opinions of practitioners in this domain, as well as on indications that bioinformatics education lacks the content needed to improve the situation. In this position paper, we focus on educational issues.

course in the computer science segment of a bioinformatics program.

In all, out of a total of 79 program offerings, there were only 2 instances in which a software engineering-related course was a required part of the curriculum. As an elective, software engineering appeared 13 times in the degree-program combinations.
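To put these counts in proportion, the short Python sketch below converts them to percentages. It assumes that the 13 elective occurrences are counted against the same 79 degree-program combinations as the 2 required ones, which the text suggests but does not state outright; the variable and function names are illustrative only.

```python
# Counts reported in the text. Using 79 as the denominator for both
# figures is an assumption made for this illustration.
TOTAL_OFFERINGS = 79  # degree-program combinations surveyed
REQUIRED_SE = 2       # offerings that require a software engineering course
ELECTIVE_SE = 13      # offerings that list software engineering as an elective


def percentage(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


required_share = percentage(REQUIRED_SE, TOTAL_OFFERINGS)
elective_share = percentage(ELECTIVE_SE, TOTAL_OFFERINGS)
print(f"required: {required_share}%  elective: {elective_share}%")
```

Under that assumption, roughly 2.5% of offerings require a software engineering course and about 16.5% offer one as an elective.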

2.2 Survey of Leading Textbooks

In order to understand what bioinformaticians are currently taught, we studied the syllabi and contents of bioinformatics programs in universities across the United States, using a list compiled by the International Society for Computational Biology (ISCB). We also studied the tables of contents of textbooks commonly used in bioinformatics curricula. In both cases, we were looking for the extent to which software engineering principles and techniques are included.

We also closely examined the tables of contents of bioinformatics textbooks that focused on computational approaches to solving bioinformatics problems. All the tables of contents were examined in detail online, as it was not feasible to physically go through the content of each book. Again, we made note of the topics commonly covered in such textbooks. It was readily evident that there was no information about software engineering concepts within the bioinformatics textbooks. We observed that there was some emphasis on design patterns in the books related to object-oriented programming, but there was nothing about, for example, techniques for prototyping, testing and quality assurance in bioinformatics projects.

We then compared what we found about bioinformatics curricula to some of our survey findings that point to specific software engineering areas that deserve more attention in this domain. The result is a set of recommendations for improving bioinformatics education in software engineering principles and ways that they apply in the context of end-user development.

The entire list of textbooks can be found at the ISCB website: http://www.iscb.org/bioinformaticsBooks.shtml.

2. WHAT BIOINFORMATICIANS STUDY

In this section we provide detailed information about our survey of the coursework in bioinformatics programs and textbooks.

3. WHAT BIOINFORMATICIANS COULD STUDY

2.1 Survey of Bioinformatics Programs

We conducted a survey of bioinformatics professionals subscribed to OSS project mailing lists listed at the open-bio foundation. The survey questions addressed relevant characteristics of the population and the domain, such as academic background, perceptions of product quality, documentation practices and maintenance-related activities. The survey instrument was created and distributed online. After blank responses were deleted, we had a valid sample of 126 respondents. The response rate was 27.9%.

Bioinformatics is now an established field of study all over the world. This is evidenced by the ubiquity of both graduate and undergraduate bioinformatics programs. We acquired a list of bioinformatics programs within the United States from the International Society for Computational Biology (http://iscb.org/univ_programs/program_board.php), and browsed through each program’s website. We considered all the bioinformatics, computational biology, biomedical informatics and related programs within the United States. We did not, however, distinguish between programs that place more emphasis on tool development and those that focus more on the use of tools for biological analysis. From the website information, we took note of the software engineering courses within each program. Our compiled list is available at: http://userpages.umbc.edu/~medha1/bio/BioinformaticsPrograms.xls

In this section, we present recommendations for software engineering content, based on a) our survey, b) literature on bioinformatics software development and c) general software engineering knowledge. These recommendations should be seen as a starting point for discussion. While they are empirically grounded in our survey of bioinformatics developers, the survey was not originally intended to explore educational issues. We also do not take into account the cost and resource issues involved in incorporating these recommendations, which obviously would have to be considered before implementing any changes, and would probably require some prioritization of the recommendations. Due to the interdisciplinary nature of bioinformatics, end users are overloaded with information, and hence we recommend a concise, handy set of guidelines rather than an exhaustive coverage of topics.

We observed that there is a consistent effort to include computer science courses such as design and analysis of algorithms, databases, and programming in Java and C++ in all the bioinformatics programs [9, 10]. There was, however, little or no training given to these students on basic software engineering principles. Given that the end user programmers in this domain learn primarily through self-teaching (a result from one of our surveys), it is all the more important to incorporate these concepts into bioinformatics curricula, rather than assuming that end users in this domain will take traditional software engineering courses.

Bioinformatics educators mention that skills in programming, databases, artificial intelligence and algorithms are absolutely essential [9, 10]. We observed that software engineering is sometimes offered as a course elective and rarely as a required

It is no doubt important to design different tools and systems to support end user software development. However, if software engineering best practices become an integral part of the education process for (at least some types of) end user programmers, there is a higher chance that they will be incorporated into daily work practices, thus improving the dependability of software created by end user programmers.

We feel bioinformatics end user programmers should be made aware that there are many possible approaches to software design and development, as this knowledge will help them make an informed choice about the programming paradigm they would like to use. Our survey results point to some confusion among respondents about the differences between development paradigms. Although it may seem trivial or obvious, we observed that very few of the surveyed bioinformatics programs and textbooks actually carry this information. Each paradigm can be discussed in some detail along with its advantages and disadvantages.

Bioinformatics research and practice have critical implications for life sciences, and it is absolutely essential to have strong quality assurance (QA) practices such as code reviews and testing to ensure good quality. In our previous study of QA practices in bioinformatics [11], testing and code review practices were found to be severely lacking, particularly in smaller projects. Other authors [12] have also suggested strengthening the QA practices of scientific software development in general. One idea is to include in a programming course a tutorial on writing test cases, as well as on how to perform a code review. A step-by-step process, with the basic elements of each, can be explained.
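As an illustration of what a first tutorial on writing test cases might contain, the following is a minimal Python sketch built on the standard `unittest` module. The function `reverse_complement` and the class `ReverseComplementTest` are hypothetical names chosen for this example; they are not taken from the survey or from any particular bioinformatics library, and the example is only one possible way to introduce the idea.

```python
import unittest

# Hypothetical helper used only for this illustration.
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence made of A, C, G, T."""
    try:
        return "".join(_COMPLEMENT[base] for base in reversed(sequence.upper()))
    except KeyError as err:
        # Report the offending character instead of failing with a bare KeyError.
        raise ValueError(f"unexpected base: {err.args[0]!r}") from None


class ReverseComplementTest(unittest.TestCase):
    """Each test checks one expected behaviour, including an error case."""

    def test_known_sequence(self):
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_empty_sequence(self):
        self.assertEqual(reverse_complement(""), "")

    def test_lowercase_input_is_accepted(self):
        self.assertEqual(reverse_complement("atgc"), "GCAT")

    def test_invalid_base_is_rejected(self):
        with self.assertRaises(ValueError):
            reverse_complement("ATXG")
```

If the code is saved in a file such as `test_revcomp.py` (a hypothetical name), the tests can be run with `python -m unittest test_revcomp`. The same four cases (a typical input, an empty input, an input variant, and an invalid input) also give a reviewer a concrete checklist during a code review.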

Our recommendations are meant to spark a discussion on these issues, and there needs to be further empirical validation of the right type of course material to be provided to end users in different domains.

5. REFERENCES

[1] M. Burnett, C. Cook, and G. Rothermel, "End-user software engineering," Commun. ACM, vol. 47, pp. 53-58, 2004.

Bioinformatics is still a very young field, and software developed in this field has not yet matured enough to be studied from an evolutionary perspective. Our survey had a high incidence of “neutral” responses to questions about evolutionary issues such as system age, complexity, and change-proneness. This may imply a low awareness of the existence of such relationships. As their applications move into legacy status, end user programmers need to know more about the complex relationships between software size, complexity and age, so that they can write well-structured and documented programs.

[2] J. Segal, "When software engineers met research scientists: A case study," Empirical Softw. Engg., vol. 10, pp. 517-536, 2005.
[3] D. Counsell, "A review of bioinformatics education in the UK," Briefings in Bioinformatics, vol. 4, pp. 7-21, 2003.
[4] L. Stein, "Bioinformatics: Gone in 2012," in O’Reilly Bioinformatics Technology Conference (Keynote Address). San Diego CA, 2003.
[5] P. v. Heusden, "Applying software validation techniques to bioperl," in Bioinformatics Open Source Conference, 2004.

Good software documentation contributes significantly to software maintenance and reuse. In our survey, respondents said that having more documentation would be beneficial to them. Baxter et al. [12] also cite documentation as a key practice to be strengthened in scientific software development. We therefore suggest including tips for creating and maintaining documents such as user guides and system descriptions.

[6] M. Pocock, "Biojava toolkit progress," in Bioinformatics Open Source Conference, 2002.
[7] C. Letondal and W. Mackay, "Participatory programming and the scope of mutual responsibility: Balancing scientific, design and software commitment," in Proceedings of the Eighth Conference on Participatory design: Artful integration: interweaving media, materials and practices Volume 1. Toronto, Ontario, Canada: ACM, 2004.

There is a consensus among the thought leaders in bioinformatics that the only way forward is to reuse as many programs and research results as possible [4]. It would be very useful to instruct future bioinformatics software engineers in techniques for reusing programs and documentation and for leveraging the power of open source for mutual benefit.

[8] A. G. Koru, K. El-Emam, A. Neisa, and M. Umarji, "A survey of quality assurance practices in biomedical open source projects," Journal of Medical Internet Research, vol. 9, pp. e8, 2007.
[9] D. T. Burhans and G. R. Skuse, "The role of computer science in undergraduate bioinformatics education," SIGCSE Bull., vol. 36, pp. 417-421, 2004.

We understand that practices from traditional software engineering may not be directly applicable to the end user programming context, due to inherent differences between the two. However, they do provide an excellent starting point for discussion on what software engineering methods should be taught to end user programmers.

[10] T. Doom, M. Raymer, D. Krane, and O. Garcia, "A proposed undergraduate bioinformatics curriculum for computer scientists," in Proceedings of the 2002 ACM Special Interest Group on Computer Science Education (SIGCSE 2002). Covington, Kentucky, 2002.

4. CONCLUSIONS

In this position paper, we have presented some preliminary recommendations about the education of bioinformaticians that are suggested by our exploratory research in characterizing the bioinformatics software development community. Also based on this exploratory work, we argue that this population exhibits many characteristics of end user programmers, and thus can be studied from that perspective. It is interesting, then, to extend this idea to view programmers in other scientific computing disciplines as end user programmers, as others have suggested [13]. Further, we would like to propose that our recommendations for the education of bioinformaticians may also be applicable to these other scientific domains.

[11] A. G. Koru, K. El-Emam, A. Neisa, and M. Umarji, "A survey of quality assurance practices in biomedical open source projects," Journal of Medical Internet Research, vol. 9, pp. e8, May, 2007.
[12] S. M. Baxter, S. W. Day, J. S. Fetrow, and S. J. Reisinger, "Scientific software development is not an oxymoron," PLoS Computational Biology, vol. 2, September, 2006.
[13] J. Carver, "Empirical studies in end-user software engineering and viewing scientific programmers as end-users: Position statement," presented at Dagstuhl Seminar Proceedings 07081: End-User Software Engineering, 2007.
