GLOBAL ENGINEERS & TECHNOLOGISTS REVIEW www.getview.org
SOFTWARE TOOLS IN BIOINFORMATICS: A SURVEY ON THE IMPORTANCE AND ISSUES FACED IN IMPLEMENTATION
AHMAD, T.¹ and AL-SENAIDY, A.M.²
¹,² Department of Biochemistry, College of Science, King Saud University, P.O. Box 2455, Riyadh, SAUDI ARABIA
ABSTRACT
Bioinformatics tools are complex and critical software tools in the life sciences. Today they range from simple basic utilities to very large, sophisticated software packages. Despite their increasing complexity and sophistication, these tools are used more and more in modern biology. Their introduction brings distinct advantages, but it also poses challenges in implementation. This study discusses these advantages and the challenges faced in implementing bioinformatics tools. The paper has two main objectives: first, to discuss the challenges faced in developing bioinformatics tools from a software development point of view; second, to present the results of a survey on the importance of, and the issues with, bioinformatics tools from an end-user point of view. The main audience of the survey was students, educators and implementers of bioinformatics tools.
Keywords: Software, Tools, Bioinformatics, Survey, Implementation.
1.0 INTRODUCTION
Education today is based on technology in every field: science, engineering, mathematics, management and other disciplines cannot advance to higher levels without it. Integrating scientific data and applications is one of the most challenging problems facing bioinformatics tools today. Software engineering develops and maintains software applications and tools by applying computer science theory and engineering methodology. In a specialized environment such as bioinformatics, software developers and users evaluate mathematical formulas and process large numbers of sequences. According to Counsell (2003), bioinformatics is defined as any application of computational methods and analysis to biological problems. Software engineers and systems biologists work with many programming languages to develop biochemistry tools for biologists and other users. Because bioinformatics programming is complex by nature and its volume grows day by day, there is an expected demand to support bioinformatics programmers in developing reliable and maintainable software systems. Programmers and users in specialized domains such as bioinformatics, space exploration and mathematical modelling have developed into communities of practice with their own domain-specific ideas and philosophies about software development. This has encouraged the use of computing and information technology to examine the broad range of biological data generated by many biological experiments. Bioinformatics offers the potential to interpret biological information and process sequences more effectively, reliably and efficiently by using computational technologies to extract knowledge from the vast amount of heterogeneous biological data; at the same time, it opens up a new and complex domain for data analysis.
Hence, bioinformatics tools have become a vital area of study for both biology and computer science. Bioinformatics tools are developed in many programming languages (C, C++, Java, etc.) and run on different operating systems. The first free, open-source bioinformatics tools were developed for Unix/Linux operating systems, whereas life scientists are more familiar with the Windows environment, and bioinformatics tools are not easily adapted to run on computer clusters. As noted by Johnson et al. (2004), with both commercial and open-source bioinformatics software, researchers often end up dealing more with the complexity of the tools than with the scientific and mathematical problems at hand. A biochemistry research project traditionally started by isolating and purifying bulk amounts of a protein from its source organism in order to characterize a particular gene product. Today scientists can use sequence analysis tools to derive all kinds of functional and, perhaps, structural insight from a stretch of DNA. This research conducts a survey and describes some popular bioinformatics tools used in research. The main objective is to determine the level of awareness of bioinformatics tools and which tools are popular. The research addresses the following questions: Do you use software tools in bioinformatics? Why do you use bioinformatics tools? Which tools are important? What are the issues in using bioinformatics tools?
Global Engineers & Technologists Review, Vol.3 No.3 (2013)
2.0 RELATED WORK
Bioinformatics is a comparatively open area of scientific research, so the first step is to understand the information tasks and end-user programming activities of biologists. According to MacMullen and Denn (2005), research into the information problems of biologists revealed three categories: sequence alignment, structure prediction and function prediction. Meanwhile, Bartlett and Toms (2005) worked with biologists to learn how they carry out functional analyses of gene sequences using tools such as the NCBI tools, MLST, GenBank (the genetic sequence database) and BLAST (an analytical tool for calculating similarities between sequences). Tran et al. (2004) conducted a study at six different bioinformatics research centres and discovered four common themes in the way bioinformatics tasks were carried out: a lack of procedural documentation for some high-level tasks, the use of home-grown systems, a lack of awareness of existing bioinformatics tools, and wide variation in individual needs and preferences. Umarji and Seaman (2008) surveyed software developers for bioinformatics in order to inform the design of a search tool that would give biologists easier access to open-source bioinformatics software components. Nearly 50% of their 126 respondents held computer science degrees, and the others were biology-related professionals. Their findings showed that open-source projects and freely available web programs were the most common, that configuration management tools were in use, that there is room for improvement in the testing and maintainability of such software, and that comments, feedback and documentation are potentially useful information that should be exploited.
Javahery et al. (2004) describe how the Human-Centred Software Engineering (HCSE) group at Concordia University developed web-based interfaces to popular bioinformatics portals in order to provide integrated access to the web resources relevant to a set of typical tasks. A similar case can be found in the study by Hochheiser et al. (2003), in which the Human-Computer Interaction Lab at the University of Maryland investigated advanced visualization techniques to access and manipulate large multimedia information sets in biological databases. In the area of end-user programming, Massar et al. (2005) developed BioLingua, an interactive, user-friendly web-based programming environment that gives biologists more control and easier access when analyzing genomic, metabolic and experimental data and higher-level representations. Their system is based on a symbolic programming language and provides a transparent, integrated interface to commonly used bioinformatics tools. Letondal (2006) also applied end-user programming to develop Biok, an integrated programmable application for analyzing DNA and protein sequences and multiple alignments. These contributions shed light on important aspects of improving the user experience of biological databases and the end-user programming activities of biologists. This study fills a gap by looking into the information activities that occur in developing, testing and maintaining bioinformatics software, considering both biologists and software developers with no biological domain knowledge.
3.0 ISSUES IN BIOINFORMATICS SOFTWARE DEVELOPMENT
Bioinformatics software development offers great career opportunities and attractive salaries. Bioinformatics software developers build tools including databases and data warehouses, web-based retrieval and query applications, data mining software and storage applications. At the same time, computer professionals face many issues: they must build and test robust applications and address concerns such as real-time processing, complex formulas, understanding biological terminology, reliability and data integrity. As more and more large-scale biological analyses designed to improve health outcomes are attempted, it becomes increasingly important to translate experimental results accurately. A large variety of genomic data sources emerge every day, so bioinformatics tools are needed that report results to laboratories correctly and rapidly in order to inform treatment strategies. Some important challenges facing software developers who build bioinformatics tools or databases are:
i) Requirements specification: Major problems in the requirements analysis for any bioinformatics tool include developers' lack of familiarity with biological terminology and communication breakdowns between biological scientists and software developers. In addition, a wide range of people use bioinformatics projects, so requirements must be gathered with end-user programming in mind.
ii) Bioinformatics tool design: Design is a very important part of any product's success, and here too there are coordination problems between biological scientists and software developers.
iii) Implementation and integration: Developers must implement correct algorithms and understand the complexity of biological terms so that calculations are carried out correctly.
iv) Testing the bioinformatics tools: Finally, developers must verify that the tool works, that it functions on every target operating system, and that it produces the correct output for a given input. Most of the test effort occurs after the requirements have been defined.
v) Quality assurance: One of the major challenges of software development is that system or software failures are highly costly. Because bioinformatics development is very complex, more effective quality assurance activities should be followed in bioinformatics projects.
© 2013 GETview Limited. All rights reserved
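As a small illustration of the testing step (iv), the following Python sketch checks a simple sequence utility against known input/output pairs and against a property that must always hold. The function `reverse_complement` and the test names are hypothetical examples chosen for illustration, not code from any tool discussed in this paper, and the sketch assumes plain DNA input (IUPAC ambiguity codes such as N are deliberately rejected).

```python
import unittest

# Watson-Crick complements for unambiguous DNA bases (illustrative assumption:
# ambiguity codes such as N or R are not supported and raise ValueError).
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (case-insensitive, upper-case output)."""
    try:
        return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"unsupported base: {err.args[0]!r}") from None


class ReverseComplementTest(unittest.TestCase):
    def test_known_answer(self):
        # Known input/output pair worked out by hand
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_lower_case_input(self):
        self.assertEqual(reverse_complement("aacg"), "CGTT")

    def test_involution(self):
        # Applying the operation twice must give back the original sequence
        seq = "GATTACA"
        self.assertEqual(reverse_complement(reverse_complement(seq)), seq)

    def test_empty_sequence(self):
        self.assertEqual(reverse_complement(""), "")

    def test_invalid_base_rejected(self):
        with self.assertRaises(ValueError):
            reverse_complement("ACGN")
```

Saved to a file, the tests can be run with `python -m unittest <file>`. Combining hand-checked examples with a property such as "applying it twice is the identity" catches different kinds of bugs, and the same tests can be rerun on each target operating system.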
Molecular biologists and other scientists are concerned with data input and user interfaces, analysis and analytical tools, and interpretation in a shared, global environment (Kesh, 2004). Furthermore, some scientists complain that software applications cannot be used with one another because they use different file formats. As a consequence, the output of one bioinformatics tool cannot be used directly as input to another tool without data format conversion. Table 1 shows the issues faced by software developers and their causes.

Table 1: Problems faced by software developers
Issue | Cause
User-friendly or visualization interface | The tool must be user friendly so that it is easy to work with.
Effective bioinformatics tool development | Biologists use the tools to perform data analysis.
Must provide web-based access | Biologists access data and tools over the Internet.
Needs background knowledge of bioinformatics | Developers should exploit biological characteristics and notation as much as possible and use established procedures.
Data encoding and representation | Bioinformatics data must be encoded well for the tools to succeed.
Well structured | Problems are solved with sound logic and algorithms.
Operating-system independent | The tool must be platform independent.
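The format-conversion problem described above can be illustrated with a minimal Python sketch that turns FASTQ reads (sequence plus per-base quality) into FASTA (sequence only), so that output from one tool can be passed to another that accepts only FASTA. It assumes the simple four-line FASTQ layout without line-wrapped sequences; the function name `fastq_to_fasta` is hypothetical and not taken from any tool discussed in this paper.

```python
def fastq_to_fasta(fastq_text: str) -> str:
    """Convert four-line FASTQ records to FASTA, discarding the quality lines.

    Assumes each record is exactly: @header / sequence / + / quality.
    """
    # Drop blank lines; splitlines() also handles Windows line endings.
    lines = [line for line in fastq_text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")

    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Validate the record structure before converting it.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        out.append(">" + header[1:])  # FASTA headers start with '>'
        out.append(seq)
    return "\n".join(out) + "\n"


example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n"
print(fastq_to_fasta(example), end="")
```

For the example input this prints two FASTA records, `>read1` / `ACGT` and `>read2` / `GGCA`. Validating each record, instead of blindly copying every second line, makes malformed input fail loudly rather than silently producing a corrupt file for the next tool. In practice, established libraries handle more edge cases; for example, Biopython provides `Bio.SeqIO.convert` for conversions between the formats it supports.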
When a software developer builds bioinformatics tools, databases or data warehouses, they carry many responsibilities, such as compatibility with every operating system, methods for benchmarking genomics tools, strong cross-functional relationships between R&D and operations, publicly available open-source tools, tracking quality metrics for sequencing and variant-detection workflows, testing bioinformatics methods and algorithms for data analysis, and delivering written and verbal technical reports. Today, with the help of computers, bioinformatics tools give access to much more biological data than ever before. These tools help many scientists and researchers analyze data every day, for example by identifying genes through comparison of genomic data across organisms and by recognizing patterns in the data. Insights into the structure of proteins can be obtained through computer analysis of the protein sequence. The tools support sequence and alignment editing and statistical analysis libraries, and they are user friendly and easy to use. For these reasons bioinformatics tools are becoming more popular day by day.
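The responsibilities above include tracking quality metrics for sequencing workflows. One widely used per-base metric is the Phred quality score, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The following minimal Python sketch decodes a FASTQ quality string and summarizes it. It assumes the common Phred+33 (Sanger / Illumina 1.8+) ASCII encoding, and the function names are hypothetical, chosen only for illustration.

```python
import math

PHRED_OFFSET = 33  # assumption: Phred+33 encoding ('!' = Q0, 'I' = Q40)


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get the per-base error probability P."""
    return 10 ** (-q / 10)


def read_quality_summary(quality: str) -> dict[str, float]:
    """Summarize one read's qualities in both the Q scale and probability space."""
    scores = phred_scores(quality)
    if not scores:
        raise ValueError("empty quality string")
    probs = [error_probability(q) for q in scores]
    mean_error = sum(probs) / len(probs)
    return {
        "mean_q": sum(scores) / len(scores),     # arithmetic mean of Q values
        "expected_errors": sum(probs),           # expected number of wrong bases
        "q_of_mean_error": -10 * math.log10(mean_error),  # Q of the mean error probability
    }


print(read_quality_summary("IIII!"))
```

For the quality string `IIII!` (four Q40 bases and one Q0 base) the mean Q is 32, yet the expected number of errors is about 1.0 and the Q value of the mean error probability is only about 7. Averaging in probability space is therefore the more conservative choice when such a metric is tracked across a workflow.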
4.0 RESULTS AND DISCUSSION
This survey was conducted over a period of four months. The questionnaire was given to a wide range of respondents, including doctors, professionals working in the field of bioinformatics, and students. The survey instrument was distributed online and in hard copy at different universities, colleges and scientific research laboratories. A total of 179 respondents returned completed questionnaires, and collecting the responses took 104 days.
4.1 Importance of Bioinformatics Tools
Figure 1 shows the importance of bioinformatics tools: 93% of respondents agreed that software tools play a vital role in bioinformatics, while 7% were not sure about the use of bioinformatics tools.
Figure 1: Importance of bioinformatics tools result.
4.2 Introduction to Bioinformatics Tools
Figure 2 shows how the 179 respondents were first introduced to bioinformatics tools. 53% obtained bioinformatics training during their formal education, 29% took dedicated bioinformatics training courses, 11% received their training on the job, and 7% had no introduction to bioinformatics tools at all.
Figure 2: Introduction to bioinformatics tools result.
4.3 Commonly Used Tools
Figure 3 shows the most commonly used tools: 65% of respondents reported using BLAST, NCBI tools, FASTA, ExPASy, GenBank, ClustalW and Primer3.
Figure 3: Commonly used tools result.
Sequence analysis and structure analysis are the areas in which the majority (48%) of bioinformatics scientists work. Protein expression analysis, gene expression analysis and mutation analysis are also popular areas of work. According to most respondents, the worst features of bioinformatics tools are that results are sometimes hard to interpret and that documentation is poor. Most scientists also suggested new features for future bioinformatics tools: tutorials and documentation should be improved for all programs; because some programs depend on other programs, every program should be able to work independently; and tools should be able to align sequences and show the percentage homology between them.
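The respondents' request to align sequences and report "% homology" can be sketched with the classic Needleman-Wunsch global alignment algorithm followed by a percent-identity calculation. Strictly speaking, homology denotes shared ancestry and is not a percentage; what tools usually report is percent identity, and definitions of its denominator differ. The scoring scheme (match +1, mismatch -1, linear gap -1), the identity definition (identical columns divided by alignment length, gaps included) and the function names are illustrative assumptions, not the settings of any tool named in the survey.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap columns divided by alignment length (gaps included), as a percentage."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


top, bottom, best = needleman_wunsch("GATTACA", "GATCA")
print(top, bottom, best, round(percent_identity(top, bottom), 1))
```

Aligning GATTACA with GATCA gives an optimal score of 3 with five identical columns out of seven, i.e. about 71.4% identity under this definition (several optimal alignments exist, and the traceback returns one of them). The full dynamic-programming table costs time and memory proportional to the product of the sequence lengths, which is fine for short sequences; database search tools such as BLAST use heuristics instead of filling the whole table.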
5.0 CONCLUSION
The field of bioinformatics is developing rapidly, and new tools and approaches are introduced into modern biology and medicine every day. As new bioinformatics tools are developed, their challenges and complexity are expected to decrease. This study was conducted to measure the current state of these challenges and complexities. The paper first discussed the challenges faced in developing bioinformatics tools from a software engineering point of view. It identified prior knowledge of the subject as one of the issues in bioinformatics tool development, along with user friendliness and mode of access. Second, a survey was conducted among students, educators and implementers of these tools. Respondents were asked about the importance of bioinformatics tools and how they were introduced to them. A majority of 93% of respondents found bioinformatics tools important in modern biology. 53% of respondents said they were first introduced to these tools during their education, while 29% said they had taken some kind of certification course in bioinformatics tools. 48% of respondents named sequence analysis and structure analysis as the main areas in which they use bioinformatics tools. Finally, respondents were asked about the issues they face when using these tools. The majority said that difficulty in interpreting results is one of the main issues, and some attributed this to the poor quality of the documentation provided with the tools. The majority of respondents asked for tutorials and documentation to be added as a new feature. Overall, this study identified many problems faced in bioinformatics software development.
The research can be extended to investigate the tasks of the bioinformatics software development process in depth and to evaluate collaboration between software developers and biologists.
ACKNOWLEDGMENTS
The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group project no. RGP-VPP-151.

REFERENCES
[1] Counsell, D. (2003): A Review of Bioinformatics Education in the UK, Briefings in Bioinformatics, Vol.4, No.1, pp.7-21.
[2] Johnson, C.R., MacLeod, R., Parker, S.G. and Weinstein, D. (2004): Biomedical Computing and Visualization Software Environments, Communications of the ACM, Vol.47, No.11, pp.64-71.
[3] MacMullen, W.J. and Denn, S.O. (2005): Information Problems in Molecular Biology and Bioinformatics, Journal of the American Society for Information Science and Technology, Vol.56, No.5, pp.447-456.
[4] Bartlett, J.C. and Toms, E.G. (2005): Developing a Protocol for Bioinformatics Analysis: An Integrated Information Behavior and Task Analysis Approach, Journal of the American Society for Information Science and Technology, Vol.56, No.5, pp.469-482.
[5] Tran, D., Dubay, C., Gorman, P. and Hersh, W. (2004): Applying Task Analysis to Describe and Facilitate Bioinformatics Tasks, Stud Health Technol Inform, Vol.11, pp.818-822.
[6] Umarji, M. and Seaman, C. (2008): Information Design of a Search Tool for Bioinformatics, in Proceedings of the ICSE Workshop on Software Engineering for Computational Science and Engineering.
[7] Javahery, H., Seffah, A. and Krishnan, S. (2004): Beyond Power: Making Bioinformatics Tools User-Centric, Communications of the ACM, Vol.47, No.11, pp.58-63.
[8] Hochheiser, H., Baehrecke, E.H., Mount, S.M. and Shneiderman, B. (2003): Dynamic Querying for Pattern Identification in Microarray and Genomic Data, in Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, Washington, DC, USA.
[9] Massar, J.P., Travers, M., Elhai, J. and Shrager, J. (2005): BioLingua: A Programming Knowledge Environment for Biologists, Bioinformatics, Vol.21, No.2, pp.199-207.
[10] Letondal, C. (2006): Participatory Programming: Developing Programmable Bioinformatics Tools for End User, Human-Computer Interaction Series, Vol.9, pp.207-242.
[11] Kesh, S. (2004): Critical Issues in Bioinformatics and Computing, Perspectives in Health Information Management, Vol.1, pp.1-9.