Computational Structural Genomics of a Complete Minimal Organism

Genome Informatics 13: 390–391 (2002)

John-Marc Chandonia1 [email protected]
David E. Konerding1 [email protected]
Darryl G. Allen1 [email protected]
In-Geol Choi1 [email protected]
Hisao Yokota1 [email protected]
Steven E. Brenner1,2 [email protected]

1 Berkeley Structural Genomics Center, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
2 Department of Plant and Microbial Biology, 111 Koshland Hall, University of California, Berkeley, CA 94720-3102, USA

Keywords: structural genomics, functional prediction from structure

1 Introduction

Structural genomics aims to provide an experimental structure or computational model of every tractable protein in a complete genome. A considerable fraction of the genes in all sequenced genomes have no known function, and have diverged sufficiently from functionally characterized homologues that the evolutionary relationship cannot be detected from sequence alone. Determining the structure of these proteins may reveal distant homology, which can be used to infer cellular and molecular functions. The structure is also important for acquiring a detailed understanding of enzymatic catalysis and interaction with small molecule ligands and other proteins. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways.

Mycoplasma genitalium has the smallest bacterial genome, with only 480 proteins. Its close relative, Mycoplasma pneumoniae, has 677. The small size of these genomes should allow us to obtain structures or models for all tractable proteins in these genomes within 5 years. This is expected to yield important insight into the minimal set of genes necessary for life; many genes in more complex organisms may be variations on genes in the minimal set.

2 Methods and Results

2.1 Project Overview

An overview of the project is shown in Figure 1. We use computational methods to identify and model proteins whose structures can be predicted. The remaining proteins and their homologues (especially those from hyperthermophiles) will be screened computationally to find those likely to be amenable to experimental characterization. Solved structures are analyzed with a variety of computational methods to provide experimentally testable hypotheses about their cellular functions and potential binding partners. Current results are available on the Berkeley Structural Genomics Center website [6].

2.2 Function from Structure

Even when proteins have diverged too far to be recognizably similar by sequence comparison, their folds are conserved. Recognition of homology from structure allows inference of likely functional relatedness, which may then be evaluated precisely. We are assessing and developing several novel techniques for analyzing protein structure. Once these are shown to be robust, we will apply them to the structures determined as part of this project. One method is direct comparison with functionally characterized homologues, using a structural alignment tool such as MINAREA [3]. Another is analysis of the surface properties of the protein, possibly coupled with an analysis of conservation among homologues of the protein from multiple species. A more thorough means of using information from multiple homologues is the Evolutionary Trace method [4]. A dendrogram of related sequences is constructed using multiple homologues of a gene from various organisms. Each residue in the protein can then be traced in the tree, providing an evolutionary perspective in which to evaluate the structural or functional role of that residue. Finally, ligands revealed in the crystal structures themselves can provide clues as to the function of the protein. In the crystal structure of MJ0577, a “hypothetical protein” from Methanococcus jannaschii, a bound ATP was observed. Prior to crystallization, there was no biochemical evidence that the protein bound ATP. This led to the hypothesis that the protein functions as a molecular switch in combination with other proteins [5].
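The core idea behind tracing residues through a dendrogram can be illustrated with a much-simplified sketch. It is not the published Evolutionary Trace implementation [4]: here the branches of the tree are assumed to be supplied by hand as groups of aligned sequences, and the function name `trace_positions`, the group names, and the toy sequences are all hypothetical. Each alignment column is labelled as conserved (identical everywhere), class-specific (invariant within every branch but different between branches), or neutral.

```python
from typing import Dict, List


def trace_positions(groups: Dict[str, List[str]]) -> List[str]:
    """Label each alignment column as 'conserved', 'class-specific', or 'neutral'.

    `groups` maps a branch name to the aligned sequences in that branch.
    Gap characters are treated like any other symbol in this sketch.
    """
    all_seqs = [s for seqs in groups.values() for s in seqs]
    length = len(all_seqs[0])
    if any(len(s) != length for s in all_seqs):
        raise ValueError("all aligned sequences must have the same length")

    labels = []
    for col in range(length):
        # Set of residues seen in this column, one set per branch.
        per_group = [{s[col] for s in seqs} for seqs in groups.values()]
        if not all(len(residues) == 1 for residues in per_group):
            labels.append("neutral")          # varies inside at least one branch
        elif len(set.union(*per_group)) == 1:
            labels.append("conserved")        # same residue in every branch
        else:
            labels.append("class-specific")   # invariant per branch, differs between
    return labels


# Hypothetical toy alignment: two branches, five columns.
toy_groups = {
    "branch_1": ["MKGAL", "MKGSL"],
    "branch_2": ["MRGTL", "MRGAL"],
}
labels = trace_positions(toy_groups)
for position, label in enumerate(labels, start=1):
    print(position, label)
```

In practice the groups would come from cutting the dendrogram at a chosen level, and the conserved and class-specific positions would then be mapped onto the solved structure to look for surface clusters.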

Figure 1: Project overview; more details are provided in [1, 2].

A. Representation of the proteins in a single genome. The proteins are illustrated as points in some arbitrary sequence space. Stars indicate proteins of known structure. Proteins whose structures are not tractable are eliminated.

B. Multiple genomes. We work in the context of all fully-sequenced genomes. Colors indicate different genomes’ proteins.

C. Sequence-identified families. By sequence similarity, it is possible to recognize homology among the proteins and construct families.

D. Structure-identified families. Often structural similarity will reveal homology, even when the families lack significant sequence similarity.

E. Target selection. A family is selected for experimental structural characterization, and a target from within that family is highlighted.

F. Analysis. The solved structure is analyzed and structural similarity identified. The structure is also used to make models of homologs.
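The family construction and target selection steps in Figure 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not specify a clustering method, so single-linkage grouping of pairwise similarity hits is assumed here, and the function names `build_families` and `candidate_families`, the protein identifiers, and the hit list are all hypothetical. Every protein named in a hit is expected to appear in the protein list.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_families(proteins: Iterable[str],
                   hits: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Single-linkage families: proteins joined by any chain of hits share a family."""
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(p: str) -> str:
        # Follow parent links to the family representative, shortening the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in hits:
        parent[find(a)] = find(b)  # merge the two families

    families: Dict[str, Set[str]] = {}
    for p in list(parent):
        families.setdefault(find(p), set()).add(p)
    return list(families.values())


def candidate_families(families: List[Set[str]],
                       known_structures: Set[str]) -> List[Set[str]]:
    """Families in which no member has a known structure."""
    return [fam for fam in families if not fam & known_structures]


# Hypothetical input: proteins from two genomes and their pairwise similarity hits.
proteins = ["g1_p1", "g1_p2", "g1_p3", "g2_p1", "g2_p2"]
hits = [("g1_p1", "g2_p1"), ("g1_p2", "g2_p2")]
known = {"g2_p1"}  # proteins of known structure (the stars in panel A)

families = build_families(proteins, hits)
targets = candidate_families(families, known)
print(sorted(sorted(fam) for fam in targets))
```

Families that already contain a known structure are candidates for modeling, while the remaining families are the pool from which an experimental target would be chosen.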

References

[1] Brenner, S.E., A tour of structural genomics, Nature Reviews Genetics, 2:801–809, 2001.
[2] Brenner, S.E., Target selection for structural genomics, Nature Structural Biology, Structural Genomics Supplement, 7:967–969, 2000.
[3] Falicov, A. and Cohen, F.E., A surface of minimum area metric for the structural comparison of proteins, J. Mol. Biol., 258:871–892, 1996.
[4] Lichtarge, O., Bourne, H.R., and Cohen, F.E., An evolutionary trace method defines binding surfaces common to protein families, J. Mol. Biol., 257:342–358, 1996.
[5] Zarembinski, T.I., Hung, L.W., Mueller-Dieckmann, H.J., Kim, K.K., Yokota, H., Kim, R., and Kim, S.H., Structure-based assignment of the biochemical function of a hypothetical protein: A test case of structural genomics, Proc. Natl. Acad. Sci. USA, 95:15189–15193, 1998.
[6] http://www.strgen.org/ – Berkeley Structural Genomics Center website.
