BIOINFORMATICS APPLICATIONS NOTE

Vol. 17 no. 9 2001 Pages 845–846

CHROMA: consensus-based colouring of multiple alignments for publication

Leo Goodstadt∗ and Chris P. Ponting

MRC Functional Genetics Unit, University of Oxford, Department of Human Anatomy and Genetics, South Parks Road, Oxford OX1 3QX, UK

Received on April 6, 2001; revised on May 29, 2001; accepted on June 5, 2001

ABSTRACT

CHROMA annotates multiple protein sequence alignments by consensus to produce formatted and coloured text suitable for incorporation into other documents for publication. The package is designed to be flexible and reliable, and has a simple-to-use graphical user interface running under Microsoft Windows. Both the executables and source code for CHROMA running under Windows and Linux (portable command-line only) are freely available at http://www.lg.ndirect.co.uk/chroma. Software enquiries should be directed to [email protected].

∗ To whom correspondence should be addressed.

© Oxford University Press 2001

Much of biology in this post-genome era profits from the alignment of protein sequences. Significantly similar sequences are predicted to be homologous, having arisen from a common ancestral gene, and may possess similarities in function. Demonstrating amino acid conservation among many homologues requires the construction of a multiple alignment. Subtle conservation patterns can be revealed by colouring sequences according to the conservation of amino acid groupings (Taylor, 1986).

Several tools have been developed to colour or shade alignments according to conservation. Some of the most popular of these, including MView (Brown et al., 1998; http://mathbio.nimr.mrc.ac.uk/~nbrown/mview/) and Jalview (unpublished; http://www.ebi.ac.uk/~michele/jalview/), format alignments in HTML for display within a Web page. However, HTML is not well suited to representing text with a large number of different font formats and colours. Other programmes such as Belvu (http://www.cgr.ki.se.cgr.groups/sonnhammer/Belvu.thml) generate colour PostScript files. These are most suitable for printing directly but are not easily incorporated into other documents. The current generation of popular Personal Computer (PC) word processors, such as Microsoft® Word and Corel® WordPerfect®, and presentation programmes such as Microsoft® PowerPoint®, also do not allow easy and faithful import of HTML or PostScript.

By contrast, the command-line tools Boxshade (Hofmann,K. and Baron,M.D., unpublished; http://www.ch.embnet.org/software/BOX_form.html) and JOY version 5.0 (Mizuguchi et al., 1998; http://www-cryst.bioc.cam.ac.uk/~joy/) do produce Rich Text Format (RTF) suitable for inclusion in publications formatted in Word or WordPerfect, yet are considerably less powerful and flexible than MView and Belvu. BioEdit (Hall,T., unpublished; http://www.mbio.ncsu.edu/RNaseP/info/programs/BIOEDIT/bioedit.html) provides extensive facilities to manipulate multiple sequences on PCs, but generates only black-and-white output. The current lack of a PC-based tool dedicated to generating colour multiple alignments in formats compatible with popular word processor packages has compelled many to undertake the laborious task of individually formatting every amino acid symbol in an alignment.

CHROMA (for CHromatic Representation Of Multiple Alignments) is a freely available, interactive, multiple alignment formatting package running under Microsoft Windows. It generates RTF output suitable for inclusion in Word, WordPerfect and PowerPoint (Figure 1), and HTML for display on the internet. Conserved residues are highlighted with the use of different background and font colours, and type attributes, such as bold and italic, according to formatting schemes that can be specified easily with immediate visual feedback. Conservation is defined with respect to a calculated consensus sequence appended to the annotation. Once imported into an editor such as Word, the annotated sequences can be manipulated like any other piece of formatted text. This additional flexibility enables any further necessary modifications or custom reformatting to be carried out at leisure. Thus, for example, the residues in an active site can be highlighted or, perhaps, the spacing adjusted to accommodate the particular publication requirements of a journal.
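To illustrate why RTF suits this task, the following is a minimal Python sketch of how per-residue font colour and bold type can be written as RTF that a word processor will open as ordinary formatted text. It is not CHROMA's own code: the function names `rtf_escape` and `alignment_to_rtf`, the `styles` mapping and the example colours are all assumptions made for illustration, and for brevity formatting is keyed on the residue letter rather than on a match to the consensus.

```python
def rtf_escape(text):
    """Escape the three characters that are special in RTF."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def alignment_to_rtf(rows, styles, colours):
    """Render an alignment as a small RTF document (illustrative sketch).

    rows    -- list of (name, aligned_sequence) pairs
    styles  -- hypothetical mapping: residue letter -> (colour_index, bold)
               colour_index is 1-based; entry 0 of an RTF colour table
               is the default ("auto") colour
    colours -- list of (red, green, blue) tuples filling entries 1..n
    """
    colour_table = "".join(
        "\\red%d\\green%d\\blue%d;" % rgb for rgb in colours
    )
    header = (
        "{\\rtf1\\ansi\\deff0"
        "{\\fonttbl{\\f0\\fmodern Courier New;}}"  # fixed-width font keeps columns aligned
        "{\\colortbl;" + colour_table + "}"
        "\\f0\\fs20 "
    )
    width = max(len(name) for name, _ in rows)
    lines = []
    for name, seq in rows:
        cells = []
        for res in seq:
            if res in styles:
                idx, bold = styles[res]
                # A brace group limits the formatting to this one residue.
                cells.append("{\\cf%d%s %s}" % (idx, "\\b" if bold else "", res))
            else:
                cells.append(rtf_escape(res))
        lines.append(rtf_escape(name.ljust(width)) + " " + "".join(cells) + "\\par")
    return header + "\n".join(lines) + "}"


# Example: acidic residues in bold red, lysine in blue (arbitrary choices).
example = alignment_to_rtf(
    [("seq1", "DKA"), ("s2", "EKG")],
    {"D": (1, True), "E": (1, True), "K": (2, False)},
    [(255, 0, 0), (0, 0, 255)],
)
```

Because each residue is an independent formatted run, the result can be edited further in the word processor, which is the flexibility described above.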
CHROMA accepts sequences in FASTA (Pearson), MSF, ClustalW, PHD (http://maple.bioc.columbia.edu/predictprotein) and DSC formats. The generation of a consensus involves the examination of each column of a multiple alignment to determine whether an above-threshold fraction of the residues belongs to a group of amino acids characterized by similar physicochemical properties. Only matching residues are then highlighted. Both the consensus threshold and the amino acid groups are easily user-defined. The latter is particularly important as the grouping of amino acids depends on their structural context and functional roles, such that often no unambiguous and universally applicable categories can be constructed. Thus, Cys residues might coordinate divalent metal ions such as Zn²⁺, or lie in strongly hydrophobic positions in the interior of structures, or contribute to active sites of enzymes, or oxidize to form disulphide bridges; with the exception of disulphide bridge formation, Cys might be substituted in these roles without significant alteration of function by His, Ala and Ser, respectively.

Four aspects of amino acid groups can be specified interactively using Windows controls (Figure 1): (1) their constituent amino acids; (2) the matching precedence (e.g. specific groups like ‘negatively charged’ might be preferred over more general ones like ‘polar’); (3) the corresponding symbol for the generated consensus line; and (4) the text format applied to matching residues.

Fig. 1. Screenshots from Microsoft Word and CHROMA running side-by-side. The multiple alignment loaded in the CHROMA main window (partly obscured) has been annotated according to the settings in the ‘Edit Residue Group Formats’ dialogue box. The various aspects of the ‘Ser/Thr’ amino acid group are being specified here. The precedence of groups can be set using the ‘Move Up’/‘Move Down’ arrow buttons (those to the top having higher priority). Groups are added or removed using the ‘Add’ and ‘Delete’ buttons, and removed from consideration in the annotation or reintroduced using the ‘Inactivate’ and ‘Activate’ buttons. The resulting automatically-generated annotation is displayed here as a Microsoft Word document.

CHROMA was designed to be flexible, user-friendly and robust. All of the many formatting options can be set on the fly, either directly from menus or via dialogue boxes, without manual editing of parameter files. The software is distributed with an extensive context-sensitive help facility together with a 12-step tutorial describing the main programme features. Errors that occur in parsing multiple alignment data provoke detailed descriptions of the problem, with the offending lines highlighted.

The many additional features of CHROMA are described in more detail on its web page (http://www.lg.ndirect.co.uk/chroma/paper.html). These include the optional replacement of long insertions within an alignment by the number of excised amino acids, partitioning of sequence sets into multiple aligned subfamilies, and the design architecture. This package has recently been incorporated into the SMART web site (http://smart.embl-heidelberg.de).
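The column-by-column consensus procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CHROMA's implementation: the group memberships, the consensus symbols, the `.` placeholder for unconserved columns, the use of `>=` for the threshold test and the counting of gaps as non-matching residues are all choices made here for the example.

```python
# Hypothetical residue groups in order of precedence (most specific first).
# Each entry: (name, consensus symbol, member amino acids).
GROUPS = [
    ("negatively charged", "-", frozenset("DE")),
    ("positively charged", "+", frozenset("KRH")),
    ("Ser/Thr", "*", frozenset("ST")),
    ("polar", "p", frozenset("DEKRHSTNQ")),
]


def column_consensus(column, groups, threshold):
    """Return (symbol, members) for the first group, in precedence order,
    whose members make up at least `threshold` of the column."""
    for _name, symbol, members in groups:
        fraction = sum(res in members for res in column) / len(column)
        if fraction >= threshold:  # assumption: threshold is inclusive
            return symbol, members
    return ".", frozenset()  # no group is conserved in this column


def annotate(alignment, groups=GROUPS, threshold=0.8):
    """Compute the consensus line and a per-residue highlight mask.

    alignment -- list of equal-length aligned sequences (gaps as '-')
    Returns (consensus_string, mask), where mask[row][col] is True when
    that residue belongs to the group chosen for its column.
    """
    length = len(alignment[0])
    consensus = []
    mask = [[False] * length for _ in alignment]
    for col in range(length):
        column = [seq[col] for seq in alignment]
        symbol, members = column_consensus(column, groups, threshold)
        consensus.append(symbol)
        for row, res in enumerate(column):
            mask[row][col] = res in members  # only matching residues are highlighted
    return "".join(consensus), mask


alignment = ["DKSAD", "EKTAK", "DRSGS", "ERT-N"]
consensus, mask = annotate(alignment, threshold=0.75)
print(consensus)  # -+*.p
```

In the last column no specific group reaches the threshold, so the more general ‘polar’ group is used; this is the effect of listing specific groups ahead of general ones in the precedence order.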

REFERENCES

Brown,N.P., Leroy,C. and Sander,C. (1998) MView: a web-compatible database search or multiple alignment viewer. Bioinformatics, 14, 380–381.

Mizuguchi,K., Deane,C.M., Blundell,T.L., Johnson,M.S. and Overington,J.P. (1998) JOY: protein sequence-structure representation and analysis. Bioinformatics, 14, 617–623.

Taylor,W.R. (1986) The classification of amino acid conservation. J. Theor. Biol., 119, 205–218.
