BIOINFORMATICS APPLICATIONS NOTE

Vol. 18 no. 2 2002 Pages 335–336

GenomePixelizer—a visualization program for comparative genomics within and between species
A. Kozik, E. Kochetkova and R. Michelmore
Department of Vegetable Crops, University of California, Davis, CA 95616, USA
Received on August 28, 2001; revised on September 28, 2001; accepted on October 12, 2001

ABSTRACT
Summary: GenomePixelizer is a visualization tool that generates custom images of the physical or genetic positions of specified sets of genes in whole genomes or parts of genomes. Multiple sets of genes can be shown simultaneously with user-defined characteristics displayed. The program allows the analysis of duplication events within and between species based on sequence similarity. It is written in Tcl/Tk and runs on any platform that supports the Tcl/Tk toolkit. GenomePixelizer generates HTML ImageMap tags for each gene in the image, allowing links to databases. Images can be saved and presented on web pages.
Availability: GenomePixelizer is freely available at http://niblrrs.ucdavis.edu/GenomePixelizer/GenomePixelizer_Welcome.html
Contact: [email protected]

© Oxford University Press 2002

INTRODUCTION
The increasing availability of whole-genome sequences has created a need for different types of visualization tools that allow the data to be easily manipulated and compared. Several genome viewers are currently available, for example NCBI Map Viewer (http://www.ncbi.nlm.nih.gov/), TIGR Genome Browser (http://www.tigr.org/), MIPS Arabidopsis Redundancy Viewer (http://mips.gsf.de/proj/thal/db/gv/rv/) and WormBase (AceDB, http://www.wormbase.org/). These tools allow users to view and explore not only whole chromosomes but also the details of genome assembly, ORF prediction and gene annotation. However, existing genome viewers lack the flexibility to work with specific subsets of genes, to analyze the relationships between different chromosomes and to examine patterns of gene duplication. The MIPS Arabidopsis Redundancy Viewer comes closest to achieving this; however, this web-based tool does not allow users to view gene sets from organisms other than Arabidopsis, to focus on regions of interest, or to create images with anything other than the default parameters. We therefore created GenomePixelizer, a highly customizable genome visualization program that works on any computer running the Tcl/Tk toolkit. GenomePixelizer is most similar to the MIPS Arabidopsis Redundancy Viewer and gff2ps (http://www1.imim.es/software/gfftools/GFF2PS.html). GenomePixelizer differs from gff2ps in that it does not require GFF (http://www.sanger.ac.uk/Software/formats/GFF/) as its input format. The GenomePixelizer input file format is simpler, more flexible and more customizable, and it can be created with a spreadsheet editor such as Excel. GenomePixelizer displays relationships between genes on any number of chromosomes; in contrast, gff2ps does not display relationships between chromosomes, and the Arabidopsis Redundancy Viewer displays only two chromosomes at a time. However, these programs are complementary, and using them together is a powerful way to study genome organization and evolution. We are using GenomePixelizer to analyze the evolution of Nucleotide Binding Site–Leucine Rich Repeat (NBS–LRR) encoding genes in Arabidopsis relative to genome duplication events.

PROGRAM CAPABILITIES
GenomePixelizer generates images of one or more genomes. The positions of user-selected sets of genes are displayed along the chromosomes based on either physical or genetic distances (in Mb or cM, respectively). Sequences are not restricted to a single organism, so relationships between different genomes can be displayed (e.g. cytochrome P450 genes in Arabidopsis thaliana and Caenorhabditis elegans, http://niblrrs.ucdavis.edu/GenomePixelizer/Examples/GenoPix_Example_arab-worm-inter.html). Two user-defined characteristics are available for each gene: its position above or below the chromosome (by default, the direction of transcription) and the color of its symbol (e.g. the type of gene or the presence of particular motifs). GenomePixelizer generates HTML ImageMap tags for ‘clickable’ links to databases, such as MIPS, that provide detailed information for each gene. Pairs of genes whose sequence similarity falls within user-adjustable levels are joined by colored lines. The resulting patterns make duplicated genomic regions easy to identify (Figure 1). This allows comparisons between the duplication patterns of different gene families, investigation of large-scale versus local duplications and deletions, and studies of macro- and micro-synteny.

Fig. 1. Screenshot from GenomePixelizer showing cytochrome P450 genes distributed over the five chromosomes of Arabidopsis. Genes with greater than 75% predicted amino acid identity are joined by lines. The lower right corner shows an example dialog box, obtained by clicking on an individual element, that displays the gene ID or additional information.

FEATURES OF GENOMEPIXELIZER
• Displays user-defined features for selected genes throughout whole genome(s).
• Images fit into a single screen without scrolling. Larger images with a built-in scroll bar can also be generated.
• The input file is simple and flexible, and it can be set up, edited and modified with a spreadsheet editor (e.g. MS Excel). Individual genes can easily be added, deleted or modified.
• Minimal modification of the input file provides zoom-in functionality, allowing regions of high gene density to be viewed in greater detail.
• Regions of high gene density can be drawn with automatic or manual correction so that overlapping gene symbols remain visible.
• Source code is freely available, and new features can be added with minimal code modifications.
• Images can be captured by any screenshot program and incorporated into Web pages. Images may also be saved as a PostScript file and then converted to GIF or PNG format.
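The placement rules described above can be illustrated with a minimal Python sketch. These rules are: positions in Mb or cM scaled to the drawing width, genes drawn above or below the chromosome, symbol color taken from a user-defined feature, and zoom-in by restricting the view to one region. The sketch is not GenomePixelizer's Tcl/Tk code. The `Gene` record, its fields and `place_genes` are hypothetical, and the layout is an assumption: one row per chromosome, with a shared scale so that chromosome lengths stay comparable.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical input-file row; GenomePixelizer's real column layout may differ.
    gene_id: str
    chromosome: int   # 1-based chromosome number
    position: float   # Mb for physical maps, cM for genetic maps
    strand: str       # "+" is drawn above the chromosome, "-" below
    color: str        # any user-defined category (gene type, motif, ...)


def place_genes(genes, chrom_sizes, width_px, row_height_px,
                left_margin_px=40, tick_px=8, window=None):
    """Return (gene_id, x, y_top, y_bottom, color) for each drawn gene.

    chrom_sizes maps chromosome -> length (same units as Gene.position).
    window=(chromosome, start, end) zooms into one region: genes outside it
    are dropped and the region is stretched across the full width.
    """
    longest = max(chrom_sizes.values())
    placed = []
    for g in genes:
        if window is None:
            # One shared scale keeps chromosome lengths comparable.
            x = left_margin_px + g.position * width_px / longest
        else:
            chrom, start, end = window
            if g.chromosome != chrom or not start <= g.position <= end:
                continue
            x = left_margin_px + (g.position - start) * width_px / (end - start)
        # Each chromosome gets its own horizontal row; its axis is mid-row.
        axis_y = (g.chromosome - 1) * row_height_px + row_height_px / 2
        if g.strand == "+":
            y_top, y_bottom = axis_y - tick_px, axis_y
        else:
            y_top, y_bottom = axis_y, axis_y + tick_px
        placed.append((g.gene_id, round(x, 1), y_top, y_bottom, g.color))
    return placed


genes = [Gene("geneA", 1, 0.0, "+", "red"), Gene("geneB", 1, 15.0, "-", "blue")]
print(place_genes(genes, {1: 30.0, 2: 20.0}, width_px=600, row_height_px=100))
# [('geneA', 40.0, 42.0, 50.0, 'red'), ('geneB', 340.0, 50.0, 58.0, 'blue')]
```

With `window=(1, 10.0, 20.0)`, only `geneB` is kept. The 10 Mb region is stretched across the full 600-pixel width, so `geneB` again lands at x = 340.0.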

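The similarity lines (pairs of genes joined when their similarity lies between the lower and upper levels) can also be sketched in Python. The sketch assumes a hypothetical tab-delimited distance matrix with one `gene_a, gene_b, percent` row per pair. A line is drawn only when both genes are on screen. The source does not say whether GenomePixelizer treats the bounds as inclusive, so the inclusive bounds used here are an assumption. `read_pairs` and `similarity_links` are illustrative names, not part of the program.

```python
def read_pairs(lines):
    """Parse hypothetical 'gene_a<TAB>gene_b<TAB>percent' rows, skipping blanks and # comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        gene_a, gene_b, percent = line.split("\t")[:3]
        yield gene_a, gene_b, float(percent)


def similarity_links(pairs, lower, upper, positions):
    """Return (x1, y1, x2, y2, percent) segments for pairs with lower <= percent <= upper.

    positions maps gene_id -> (x, y) of its drawn symbol; pairs involving a
    gene that is not drawn (e.g. outside a zoom window) are skipped.
    """
    links, seen = [], set()
    for gene_a, gene_b, percent in pairs:
        if not lower <= percent <= upper or gene_a == gene_b:
            continue
        key = frozenset((gene_a, gene_b))  # (a, b) and (b, a) are the same pair
        if key in seen or gene_a not in positions or gene_b not in positions:
            continue
        seen.add(key)
        (x1, y1), (x2, y2) = positions[gene_a], positions[gene_b]
        links.append((x1, y1, x2, y2, percent))
    return links


rows = ["geneA\tgeneB\t80.5", "geneB\tgeneA\t80.5", "geneA\tgeneC\t60", "geneB\tgeneD\t90"]
positions = {"geneA": (40, 50), "geneB": (340, 150), "geneC": (300, 50)}
print(similarity_links(read_pairs(rows), 75, 100, positions))
# [(40, 50, 340, 150, 80.5)]
```

The symmetric row `geneB/geneA` is collapsed into one line, `geneA/geneC` falls below the 75% level, and `geneD` is not drawn, so only one segment remains. Coloring each segment by similarity band is left to the caller.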

IMPLEMENTATION
GenomePixelizer requires three files. The startup file specifies the names of the input file and the distance matrix file, the number and sizes of the chromosomes, the upper and lower levels of sequence similarity, the horizontal and vertical dimensions of the image, and other optional parameters. The input file contains the gene IDs, gene coordinates and user-defined gene features. The distance matrix file contains pairs of gene IDs together with their percentage similarity or identity, as defined by the user.

GenomePixelizer reads the startup file and draws the chromosomes within the window according to their specified sizes. It then reads the input file and places each gene either above or below the chromosomes. By default, positions above or below the chromosomes correspond to Watson/Crick orientation; however, the user may assign any other binary characteristic. Each gene is represented by a colored element. The color scheme is flexible and customizable; it may reflect any user-defined feature, such as the type of gene or the presence of particular motifs. At the same time, GenomePixelizer writes a separate file of HTML ImageMap tags that can be used to create Web pages with clickable images. Finally, the program reads the distance matrix file and draws lines between genes whose similarity lies within the upper and lower levels defined in the startup file.

Examples and detailed documentation are available at http://niblrrs.ucdavis.edu/GenomePixelizer/GenomePixelizer_Welcome.html.

ACKNOWLEDGEMENTS
This work was supported by grants from the National Science Foundation Plant Genome Program (Award no. DBI9975971) and the USDA IFAFS Plant Genome Program (Award no. 00-52100-9609). We thank Blake Meyers for critical reading of the manuscript.
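The HTML ImageMap file described under IMPLEMENTATION can be sketched in Python rather than the program's Tcl/Tk. The sketch makes two assumptions. First, each gene symbol is a narrow rectangle given as `(gene_id, x, y_top, y_bottom, color)` in image pixels. Second, database links follow a URL template with a `{gene_id}` placeholder. `imagemap`, its parameters and the example.org address are hypothetical. Only the `<map>`/`<area shape="rect" coords="x1,y1,x2,y2">` markup is standard HTML.

```python
from html import escape


def imagemap(map_name, placed, url_template, half_width_px=2):
    """Build an HTML <map> with one clickable <area> per drawn gene.

    placed: (gene_id, x, y_top, y_bottom, color) tuples in image pixels.
    url_template: hypothetical database URL containing '{gene_id}'.
    """
    out = [f'<map name="{escape(map_name)}">']
    for gene_id, x, y_top, y_bottom, _color in placed:
        # Widen each thin gene tick slightly so it is easy to click.
        x1, x2 = int(x) - half_width_px, int(x) + half_width_px
        coords = f"{x1},{int(y_top)},{x2},{int(y_bottom)}"
        # Escape the URL so characters such as '&' are valid inside the attribute.
        href = escape(url_template.format(gene_id=gene_id))
        out.append(f'  <area shape="rect" coords="{coords}" href="{href}" alt="{escape(gene_id)}">')
    out.append("</map>")
    return "\n".join(out)


print(imagemap("genome", [("geneA", 40.0, 42.0, 50.0, "red")],
               "https://example.org/gene?id={gene_id}"))
```

A web page would then attach the map to the saved image, for example with `<img src="genome.png" usemap="#genome">`, so that clicking a gene symbol opens its database entry.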