
UPDATE ON THE ZEBRAFISH GENOME PROJECT
Providing Manual Annotation for the Scientific Community

J P Almeida-King, S Donaldson, G K Laird, D M Lloyd, H K Sehra, J E Collins, K Howe, B Reimholz, J Torrance, S Trevanion, D Stemple, J G R Gilbert, E Griffiths, J E Loveland, R Storey, J L Harrow, T Hubbard, I Barroso

Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK

Overview

The zebrafish genome, which consists of 25 linkage groups and is ~1.4Gb in size, is being sequenced, finished and analysed in its entirety at the Wellcome Trust Sanger Institute to provide an open-source, high-quality reference genome.

Zv8: The Latest Zebrafish Assembly

Our latest assembly was generated using three genetic maps: HS (meiotic), MGH (meiotic) and T51 (radiation hybrid). The maps were weighted according to the hierarchy HS > MGH > T51, with one hit per marker per location.
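The map hierarchy can be pictured as a tie-breaking rule. The sketch below is one minimal reading of it, assuming that when maps disagree about where a contig belongs, the placement from the highest-ranked map wins. The function name, the tuple layout and the sample hits are hypothetical; the actual weighting scheme used to build Zv8 is not described here and may well be more nuanced than strict priority.

```python
# Rank of each genetic map, from the stated hierarchy HS > MGH > T51.
# A lower number means a higher priority.
MAP_PRIORITY = {"HS": 0, "MGH": 1, "T51": 2}


def place_contigs(hits):
    """Pick one placement per contig, preferring the highest-ranked map.

    `hits` is an iterable of (marker, contig, map_name, linkage_group)
    tuples. This layout is an assumption made for illustration.
    """
    placement = {}
    for marker, contig, map_name, linkage_group in hits:
        current = placement.get(contig)
        # Keep the new hit only if the contig is unplaced or the new
        # hit comes from a higher-ranked map than the current one.
        if current is None or MAP_PRIORITY[map_name] < MAP_PRIORITY[current[0]]:
            placement[contig] = (map_name, linkage_group)
    return placement


# Hypothetical marker hits: T51 and HS disagree about contig ctg1.
hits = [
    ("m1", "ctg1", "T51", 7),
    ("m2", "ctg1", "HS", 3),
    ("m3", "ctg2", "MGH", 12),
]
placements = place_contigs(hits)
```

In this toy input, ctg1 is assigned to the linkage group reported by HS even though T51 also has a hit for it, while ctg2 keeps its only (MGH) placement.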

The manual annotation is compiled in close collaboration with the Zebrafish Information Network (http://zfin.org/) and is provided by the Human and Vertebrate Analysis and Annotation (HAVANA) group. It is frequently released to the Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) and may also be viewed as a DAS source in Ensembl (http://www.ensembl.org/Danio_rerio).

The resulting ordered fingerprint contigs (FPC) will eventually form the assembly on which our annotation is based. The Ensembl release of Zv8 is due in late spring 2009.


We approach annotation using two main strategies. The first is the generation of gene lists composed of ZFIN cDNAs that map to our current finished assembly. The second is whole-chromosome, clone-by-clone annotation, through which we have annotated over 4200 clones.


The latest zebrafish assembly (Zv8), which represents the most accurate assembly to date, is available in Pre-Ensembl (http://pre.ensembl.org/Danio_rerio/Info/Index).


Annotation is based on the reference genome assembly sequence, which is derived from a minimal tile path composed of clones finished to a 99.9% sequence standard.
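To put the finishing standard in perspective, the short calculation below reads 99.9% as per-base accuracy (an interpretation, not something stated above) and converts it to an expected error ceiling and a Phred-style quality score. The 150 kb clone length is a hypothetical round figure chosen for illustration.

```python
import math


def max_expected_errors(length_bp, accuracy):
    """Expected number of erroneous bases in a sequence of given length."""
    return length_bp * (1 - accuracy)


def phred_quality(accuracy):
    """Phred-scaled quality, Q = -10 * log10(error probability)."""
    return -10 * math.log10(1 - accuracy)


FINISHING_ACCURACY = 0.999   # the 99.9% standard, read as per-base accuracy
CLONE_LENGTH_BP = 150_000    # hypothetical clone length, for illustration only

errors = max_expected_errors(CLONE_LENGTH_BP, FINISHING_ACCURACY)
quality = phred_quality(FINISHING_ACCURACY)
print(f"up to ~{errors:.0f} errors per clone, roughly Q{quality:.0f}")
```

Under these assumptions, a 150 kb clone could carry up to about 150 erroneous bases, and the standard corresponds to roughly Q30.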

[Figure: HS map position (cM) plotted against genome position (bp), in two panels: HS vs Zv7 and HS vs Zv8.]

The colours represent linkage groups and the dots represent unplaced contigs that fall outside of the assembly. Zv8 is a smaller assembly than Zv7 due to the removal of over 260Mb of haplotypic sequence.

The ZMAP Annotation Interface

Nature Precedings : doi:10.1038/npre.2009.3105.1 : Posted 20 Apr 2009

HAVANA annotation is based on same-species and cross-species evidence (cDNA, EST and protein) sourced from DNA databases and UniProt.

RefSeq, Pfam, CpG islands, markers, tandem repeats and ab initio gene predictions are also displayed.

Building Genes Using Solexa Data

Key to the numbered features in the ZMAP display:
1. HAVANA transcripts
2. Poly-A signal & tail
3. Automated gene predictions
4. Pfam domains
5. Multi-species evidence

The CCNL1 orthologue with associated Solexa gene models. The models suggest 11 potential novel variants.

Solexa gene models are now incorporated alongside our dataset and are used as a guide by annotators. They are derived from over 32Gb of short read pairs obtained by deep sequencing 5 tissue types at 6 stages of development.

The fibroblast growth factor receptor 1 (fgfr1).

Compara View in Otterlace

Variants of fgfr1 in human, zebrafish & mouse (figure panels labelled HUMAN, ZEBRAFISH and MOUSE).

Access To Our Sequencing Pipeline

http://www.sanger.ac.uk/cgi-bin/humpub/chromoview

Chromoview is a resource tool that enables users to view our Tiling Path Format (TPF) alongside the 'A Golden Path' (AGP), or assembly, which is derived from the TPF. Clones may be queried, providing information on assembly location and status in the pipeline.
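For readers who want to run the same kind of clone lookup offline, the sketch below parses assembly lines in the standard tab-delimited, nine-column AGP layout and reports where a clone sits in the assembly. How Chromoview itself stores or serves its data is not described here, so this only assumes that AGP text is already in hand; the function names, the object name chrEx and the clone accessions are hypothetical.

```python
from typing import NamedTuple


class AgpComponent(NamedTuple):
    """One non-gap line of an AGP file (standard nine-column layout)."""
    obj: str             # assembled object, e.g. a chromosome
    obj_beg: int         # start of this part on the object (1-based)
    obj_end: int         # end of this part on the object (inclusive)
    part_number: int
    component_type: str  # e.g. "F" for finished sequence
    component_id: str    # clone accession
    component_beg: int
    component_end: int
    orientation: str     # "+" or "-"


def parse_agp(text):
    """Return the sequence components in AGP text, skipping gaps and comments."""
    components = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if f[4] in ("N", "U"):
            continue  # gap lines use columns 6-9 differently
        components.append(AgpComponent(
            f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
            f[5], int(f[6]), int(f[7]), f[8],
        ))
    return components


def locate_clone(components, accession):
    """Find every placement of a clone accession in the assembly."""
    return [c for c in components if c.component_id == accession]


# Hypothetical example data: two clones separated by a 100 bp gap.
SAMPLE_AGP = "\n".join([
    "# hypothetical example data",
    "chrEx\t1\t150000\t1\tF\tCLONE_A.1\t1\t150000\t+",
    "chrEx\t150001\t150100\t2\tN\t100\tcontig\tno\tna",
    "chrEx\t150101\t290100\t3\tF\tCLONE_B.2\t2001\t142000\t-",
])

components = parse_agp(SAMPLE_AGP)
for hit in locate_clone(components, "CLONE_B.2"):
    print(hit.obj, hit.obj_beg, hit.obj_end, hit.orientation)
```

Pipeline status is not part of the AGP layout, so a lookup like this recovers only the assembly location of a clone, not its stage in the sequencing pipeline.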

Community Directed Annotation

As well as our ongoing genome annotation, we welcome feedback and annotation requests for genes and regions of interest ([email protected]).

We have recently annotated 93 genes to facilitate the production of gene knock-out mutants, with the aim of studying the effect on phenotype in relation to human obesity.

The FTO gene was annotated in human, mouse and zebrafish as a direct response to a request from the community.

Annotation of the core MHC is also planned, which will complement the human and mouse MHC we have already published.

Annotation of the zebrafish MHC will focus on the core region on chromosome 19. Picture obtained from: BMC Genomics. 2005; 6: 152.

Vega: The Manual Annotation Database

The Vega database is a unique central repository for high-quality, frequently updated manual annotation of finished vertebrate genomic sequence.

Vega gives an accurate record of your gene of interest, including details of coding and non-coding variants and pseudogenes.

Vega currently includes the full clone annotation of 12 linkage groups and all mapped ZFIN cDNAs.

MapView showing chromosome 20. Vega release 35.