The Galaxy service pilot in CSIRO - CSIRO Research Publications ...

A collaboration between science and IT
Steve McMahon, CSIRO, Canberra, Australia

ADVANCED SCIENTIFIC COMPUTING, INFORMATION MANAGEMENT & TECHNOLOGY

Galaxy (1) (2) (3) is a web-based bioinformatics toolkit with the ability to perform, reproduce and share complete analyses. A Galaxy service pilot has been set up in CSIRO for the benefit of biologists and bioinformaticians within the organisation. The service pilot is implemented as a collaboration between CSIRO’s Information Management and Technology (IM&T) staff and key bioinformatics experts within CSIRO. This makes best use of the IT infrastructure and service delivery expertise of the IM&T staff and the domain expertise of the bioinformatics staff. Other projects, such as an enterprise genome browser, are also using this approach. This presentation outlines Galaxy, the way it has been implemented in CSIRO as a service pilot, and some of the outcomes and related experiences.

Problem

Biologists working with sequencing data require various analyses to be done. In CSIRO they had been relying on a limited number of skilled bioinformaticians to carry out these analyses. Alternatively, they may have chosen to educate themselves in these same skills, which is time-consuming. The analysis was not always performed on the best computing resources, nor always in the most efficient way. It was proposed that a service providing easy access to some analysis tools would improve the research throughput of novice bioinformaticians while freeing up the time of experienced bioinformaticians for other work.

The objectives of the project are to deliver:

• A limited set of easy-to-use, reliable bioinformatics tools, available through a Galaxy portal configured to run optimally on suitable compute resources.
• A framework allowing the bioinformatics community to request new features or enhancements to the Galaxy portal.
• A decision on whether to implement a full Galaxy service and, if so, how that full service might be defined.

Project scope

This service pilot project intends to demonstrate how a full Galaxy service might benefit the bioscience community in CSIRO. It involves providing a BLAST service optimised to execute on a high performance compute cluster. The service pilot also delivers a number of other useful bioinformatics tools. The pilot is intended to guide the design and implementation of a full production Galaxy service.

Current status

The project has already shown how CSIRO IT and science staff can work together to achieve project goals. The service pilot has recently been made available to users and the early feedback is positive. The project continues to add tools and other functionality and resolve issues.

What benefits will the project deliver?

• Biologists will be able to do bioinformatics analysis without having to learn about the UNIX command line or write scripts.
• The burden on expert bioinformaticians who are currently asked to do analysis by the broader biology community will be reduced.
• An opportunity to optimise bioinformatics analysis tools so that they maximise the use of available compute resources will be provided.
• A mechanism to re-use and share bioinformatics analysis workflows and data will be provided.
• By partnering with IM&T, bioinformatics groups will gain an expertly managed, reliable and scalable service.

Interaction with the NECTAR Genomics Virtual Laboratory

CSIRO is a partner in the NECTAR Genomics Virtual Laboratory. One of the activities of this project is to produce a Galaxy service to run in the NECTAR Research Cloud. It is expected that future development of the Galaxy service in CSIRO will build on efforts in the NECTAR Genomics Virtual Laboratory. It is also expected that tools or workflows developed within CSIRO would be fed into the NECTAR Genomics Virtual Laboratory.

FOR FURTHER INFORMATION

Steve McMahon
e [email protected]
w www.csiro.au

REFERENCES

1. Galaxy: a web-based genome analysis tool for experimentalists. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J. Current Protocols in Molecular Biology, January 2010, pp. 1-21.
2. Galaxy: a platform for interactive large-scale genome analysis. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A. Genome Research, Vol. 15, No. 10, October 2005, pp. 1451-5.
3. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Goecks J, Nekrutenko A, Taylor J and The Galaxy Team. Genome Biol., Vol. 11, No. 8, August 2010.